Why does a neural-network-based automatic speech recognition system perform better when trained on the forced-aligned HMM data set (an automatically segmented version of the TIMIT database) than when trained on the manually segmented data set (TIMIT)?
Maybe the manually segmented corpus isn't actually internally coherent - the human annotator may have performed differently in various parts of the database, which introduces inconsistency. ANNs then cannot easily capture this incoherence. On the other hand, the forced alignment is always coherent - maybe not correct, but at least the errors are systematic and perhaps predictable (unlike in the case of the manual annotator).
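To make the "systematic versus inconsistent" distinction concrete, here is a minimal sketch on purely synthetic boundaries (nothing here is measured on TIMIT). The function names, the 2-frame bias and the 2-frame jitter are assumptions chosen only for illustration: a constant offset stands in for the forced aligner, a random per-boundary offset stands in for an inconsistent annotator.

```python
import random
import statistics

def boundary_errors(reference, observed):
    """Signed error (in frames) of each observed boundary against the reference."""
    return [o - r for r, o in zip(reference, observed)]

def simulate(n_boundaries=200, bias_frames=2, jitter_frames=2, seed=0):
    """Build synthetic boundary lists (hypothetical values, not TIMIT data)."""
    rng = random.Random(seed)
    # Reference boundaries: one every 10 frames.
    reference = [10 * (i + 1) for i in range(n_boundaries)]
    # "Forced alignment" stand-in: every boundary is off by the same amount.
    systematic = [b + bias_frames for b in reference]
    # "Inconsistent annotator" stand-in: each boundary is off by a random amount.
    inconsistent = [b + rng.randint(-jitter_frames, jitter_frames) for b in reference]
    return reference, systematic, inconsistent

reference, systematic, inconsistent = simulate()
sys_err = boundary_errors(reference, systematic)
inc_err = boundary_errors(reference, inconsistent)

# Mean = how wrong on average; spread = how unpredictable the error is.
sys_mean, sys_spread = statistics.mean(sys_err), statistics.pstdev(sys_err)
inc_mean, inc_spread = statistics.mean(inc_err), statistics.pstdev(inc_err)

print("systematic:   mean error", sys_mean, "spread", sys_spread)
print("inconsistent: mean error", inc_mean, "spread", inc_spread)
```

In this toy setup the systematic segmentation is wrong on average but has zero spread, so a learner sees the same mapping everywhere; the jittered one is roughly right on average but unpredictable at each individual boundary.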
So, will an automatically segmented corpus always perform better than a manually segmented corpus for ASR applications using ANNs? My assumption was that ANN performance might go up by at least 3-4% when using the more accurate manually segmented TIMIT corpus rather than the forced-aligned corpus.
I wouldn't put it exactly like this... It's quite a complex issue, and I would myself also expect the performance to go up with a more precise manual segmentation. But here it seems to me that the manual corpus contains more information than the forced-aligned data set, i.e. the manual segmentation has higher entropy and is more irregular - because it includes all the nuances added by the human when carefully listening, observing the waveforms and making well-informed and cognitively highly complex judgements. In my opinion this can have various consequences, such as: 1) your ANN generalizes worse on the more irregular manual data set; 2) your feature vectors (MFCC?) aren't discriminative enough for the variability in the manual data set, and the ANN gets "confused" by being trained on more cases of very similar input vectors with very different outputs (this is a combination of the feature vectors not being able to properly describe the actual data, and the ANN itself not being able to cope with such a non-linearity). And since the automatically segmented data set probably has significantly lower entropy, it happens (for your case) that the ANN performs better on it.
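One rough way to check the "similar inputs, different outputs" idea is to estimate the conditional entropy of the frame label given a quantized feature. The sketch below uses synthetic data only: the integer feature stands in for a hypothetical vector-quantized MFCC codeword, and the two labelings (a fixed but biased threshold versus a threshold that wanders from frame to frame) are assumptions made for illustration, not a model of the actual TIMIT annotations.

```python
import math
import random
from collections import Counter, defaultdict

def conditional_entropy(features, labels):
    """H(label | feature) in bits, for discrete features and labels."""
    by_feature = defaultdict(Counter)
    for x, y in zip(features, labels):
        by_feature[x][y] += 1
    total = len(features)
    h = 0.0
    for counts in by_feature.values():
        n = sum(counts.values())
        for c in counts.values():
            p = c / n
            # Weight each feature value's label entropy by how often it occurs.
            h -= (n / total) * p * math.log2(p)
    return h

rng = random.Random(0)
# Hypothetical quantized feature: an integer codeword index in 0..99.
features = [rng.randrange(100) for _ in range(5000)]

# Consistent labeling: the class boundary is misplaced (55 instead of 50),
# but it is misplaced in the same way for every frame.
consistent = [int(x >= 55) for x in features]

# Irregular labeling: the boundary moves by up to 10 codewords per frame,
# so identical inputs near the boundary receive different labels.
irregular = [int(x >= 50 + rng.randint(-10, 10)) for x in features]

h_consistent = conditional_entropy(features, consistent)
h_irregular = conditional_entropy(features, irregular)
print("H(label | feature), consistent:", h_consistent)
print("H(label | feature), irregular: ", h_irregular)
```

A conditional entropy of zero means the features fully determine the label, so a classifier can in principle fit the labeling perfectly even if it is biased. A positive value is a floor on the confusion any classifier faces with those features. On real data one would replace the synthetic lists with quantized feature vectors and the frame labels from each segmentation.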