Language recognition systems commonly use Shifted Delta Cepstral (SDC) coefficients as acoustic features.
Some papers use only SDC (i.e. 49 coefficients per frame), while others use
MFCC (c0-c6) + SDC (56 coefficients per frame in total).
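To make the 49 and 56 dimensions concrete, here is a minimal sketch of how the two feature vectors could be built. It assumes the common N-d-P-k = 7-1-3-7 SDC configuration (7 cepstra, delta spread 1, shift 3, 7 stacked blocks), which the figures above suggest but do not state. The function name `sdc`, the edge handling (clamping to the first/last frame), and the toy input are my own illustrative choices, not taken from any particular paper or toolkit.

```python
def sdc(cepstra, d=1, p=3, k=7):
    """Shifted delta cepstra for a list of per-frame cepstral vectors.

    For frame t, block i (i = 0..k-1) is c(t + i*p + d) - c(t + i*p - d).
    The k blocks are concatenated, giving N*k values per frame.
    Assumption: out-of-range frames are clamped to the first/last frame.
    """
    n_frames = len(cepstra)

    def frame(t):
        # Clamp the index so the edges of the utterance reuse the boundary frame.
        return cepstra[min(max(t, 0), n_frames - 1)]

    out = []
    for t in range(n_frames):
        vec = []
        for i in range(k):
            ahead = frame(t + i * p + d)
            behind = frame(t + i * p - d)
            vec.extend(a - b for a, b in zip(ahead, behind))
        out.append(vec)
    return out


# Toy stand-in for real MFCCs: 30 frames, 7 coefficients (c0-c6) each.
mfcc = [[float(t * 10 + j) for j in range(7)] for t in range(30)]

sdc_only = sdc(mfcc)                                  # 7 * 7 = 49 per frame
mfcc_plus_sdc = [m + s for m, s in zip(mfcc, sdc_only)]  # 7 + 49 = 56 per frame

print(len(sdc_only[0]), len(mfcc_plus_sdc[0]))
```

With these settings the SDC-only vector has 7 x 7 = 49 values per frame, and prepending the 7 static cepstra gives 56.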
My questions are:
1) Are the SDC features alone (i.e. 49 per frame) enough for language modeling?
2) Is MFCC (c0-c6) + SDC much better? And should c0 be the frame energy or the plain zeroth cepstral coefficient?