Language recognition systems commonly use Shifted Delta Coefficients (SDC) as acoustic features.

Some papers use only the SDC (i.e. 49 features per frame), while others use MFCC (c0-c6) + SDC (56 features per frame in total).
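For reference, here is a rough sketch of how the 49 and 56 dimensions could arise. It assumes the common 7-1-3-7 (N-d-P-k) SDC configuration, which the papers in question may or may not use; the function name `sdc`, the edge padding at the utterance boundaries, and the random stand-in for real MFCCs are all illustrative assumptions, not taken from any particular paper or toolkit.

```python
import numpy as np

def sdc(cepstra, d=1, p=3, k=7):
    """Shifted delta coefficients (hypothetical helper, assumed N-d-P-k scheme).

    cepstra: array of shape (T, N), one row of N cepstral coefficients per frame.
    Returns an array of shape (T, N * k).
    Block i at frame t is c[t + i*p + d] - c[t + i*p - d].
    """
    cepstra = np.asarray(cepstra, dtype=float)
    n_frames = cepstra.shape[0]
    # Assumption: repeat the first/last frame so every shifted index exists.
    pad_before = d
    pad_after = (k - 1) * p + d
    padded = np.pad(cepstra, ((pad_before, pad_after), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        start_ahead = pad_before + i * p + d
        start_behind = pad_before + i * p - d
        ahead = padded[start_ahead:start_ahead + n_frames]
        behind = padded[start_behind:start_behind + n_frames]
        blocks.append(ahead - behind)
    return np.hstack(blocks)

# Stand-in for real MFCCs c0-c6: 100 frames, 7 coefficients each.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 7))

sdc_only = sdc(mfcc)                      # 7 coefficients x 7 blocks = 49 per frame
mfcc_sdc = np.hstack([mfcc, sdc_only])    # 7 + 49 = 56 per frame
```

With these assumed parameters, `sdc_only` corresponds to the SDC-only setup and `mfcc_sdc` to the MFCC + SDC setup; whether c0 is replaced by frame energy only changes what goes into the first column of `mfcc`.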

My questions are:

1) Are the SDC alone (i.e. 49 features per frame) enough for language modeling?

2) Is MFCC (c0-c6) + SDC much better? And should c0 be the frame energy, or the plain zeroth cepstral coefficient?
