I've tried conventional methods using the parameters ZC, STE and AC, but I'm getting unexpected results, and I think it may be because the thresholds are not set optimally.
If you only want to separate silence regions from the rest (voiced or unvoiced), it is better to use Voice Activity Detection (VAD) methods, which have already been well investigated.
If the signal is not corrupted by noise, the energy of each speech frame can be used to separate out the silence regions.
Here I have attached an energy-based VAD (MATLAB code) that works at low SNR.
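For readers without access to the attachment, the frame-energy idea can be sketched in a few lines of Python. This is not the attached MATLAB code, only a minimal illustration for a reasonably clean signal. The function names, the 10 dB margin and the use of a low percentile of the frame energies as the noise-floor estimate are all assumptions made for the example, and would need tuning on real recordings.

```python
import math

def frame_energies(signal, frame_len, hop):
    """Short-time energy (mean squared sample value) of each frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def energy_vad(signal, frame_len=200, hop=100, margin_db=10.0, floor_fraction=0.1):
    """Return one boolean per frame: True where the frame is judged active.

    Assumption for this sketch: the noise floor is taken as a low
    percentile (floor_fraction) of the frame energies in dB, and a frame
    counts as active when it exceeds that floor by margin_db.
    """
    energies = frame_energies(signal, frame_len, hop)
    if not energies:
        return []
    eps = 1e-12  # avoids log10(0) on all-zero frames
    log_e = [10.0 * math.log10(e + eps) for e in energies]
    ordered = sorted(log_e)
    floor = ordered[int(floor_fraction * (len(ordered) - 1))]
    return [e > floor + margin_db for e in log_e]
```

With `frame_len=200` and `hop=100` at an 8 kHz sampling rate, each frame covers 25 ms with 50% overlap. Because the threshold is relative to the estimated floor, it adapts to the recording level, but it assumes that at least a small fraction of the frames are silent.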
To separate voiced from unvoiced speech, frequency-domain techniques are more effective.
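As one possible illustration of a frequency-domain cue (not a method named in this thread), voiced frames tend to concentrate their energy at low frequencies, while unvoiced frames such as fricatives carry more energy higher up. The Python sketch below computes the fraction of spectral power below a split frequency. The 1 kHz split, the 0.5 threshold and the function names are assumptions chosen for the example, and the frame is assumed to have already passed a silence detector.

```python
import cmath
import math

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def low_band_ratio(frame, sample_rate, split_hz=1000.0):
    """Fraction of non-DC spectral power that lies at or below split_hz."""
    spec = power_spectrum(frame)
    split_bin = int(split_hz * len(frame) / sample_rate)
    low = sum(spec[1:split_bin + 1])   # bin 0 (DC) is skipped
    total = sum(spec[1:])
    return low / total if total > 0 else 0.0

def is_voiced(frame, sample_rate, split_hz=1000.0, threshold=0.5):
    """Assumed decision rule: voiced when most power is in the low band."""
    return low_band_ratio(frame, sample_rate, split_hz) > threshold
```

The naive DFT keeps the sketch dependency-free; for real use an FFT routine would be the natural replacement. A single band ratio is a crude cue, so in practice it would usually be combined with other features such as the ZC and AC measures mentioned in the question.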
The question is incomplete, because the answer depends strongly on the application. For example, for speaker recognition an energy-based activity detector works better than a pitch-based one, while exactly the opposite holds for accent recognition.
Please read the following article for more details:
1. Contrasting the Effects of Different Frequency Bands on Speaker and Accent Identification,
Saeid Safavi, Abualsoud Hanani, Martin Russell, Peter Jancovic, M. Carey, IEEE Signal Processing Letters.
I do not know whether you still need an answer to your question; if so, you can try the method of landmarks. Please read the article in the attachment. You should use the 'g' landmarks to detect voiced regions of speech. I obtained a detection rate above 90% in my preliminary experiments.