Which features are better for speaker recognition: MFCC alone, or MFCC+delta+delta-delta? I tested my system with both types of features and got better results with MFCC only. Any comments on this observation?
Usually there is a gain of around 20% in performance when MFCC+D+DD features are used instead of MFCC alone, since the dynamic features convey richer information about each frame's context [1][2].
You might want to check the computation of your dynamic features (or evaluate your two systems on a larger trial set if you are running tests on little data).
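As a sanity check for the dynamic-feature computation, here is a minimal NumPy sketch of the standard regression (HTK-style) delta formula, d_t = sum_{n=1..N} n*(c_{t+n} - c_{t-n}) / (2*sum_{n=1..N} n^2); the window size N=2 and edge-padding behavior are common defaults, not something prescribed by this thread:

```python
import numpy as np

def delta(feat, N=2):
    """Delta features for a (num_frames, num_ceps) matrix via the standard
    regression formula; frame indices are clamped at the edges."""
    num_frames = feat.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Repeat the first/last frame N times so the window is always full.
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for t in range(num_frames):
        out[t] = sum(n * (padded[t + N + n] - padded[t + N - n])
                     for n in range(1, N + 1)) / denom
    return out

# MFCC+D+DD: apply the same operator twice and stack.
# mfcc = ...  # (num_frames, 13), from your front end
# feats = np.hstack([mfcc, delta(mfcc), delta(delta(mfcc))])
```

A quick property to test your own implementation against: on a linearly increasing coefficient track, the interior delta values should equal the slope, and on a constant track they should be exactly zero.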
---
References
[1] B. Hanson and T. Applebaum, "Robust speaker-independent word recognition using static, dynamic and acceleration features: Experiments with Lombard and noisy speech," Proc. IEEE ICASSP, 1990.
[2] S. Furui, "Speaker-independent isolated word recognition using dynamic features of speech spectrum," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 1, pp. 52-59, 1986.
It depends on the later processing. If you project your features into a low-dimensional space in a sensible way, the additional delta features become redundant and only the original features matter. Check the IBM papers from about a decade ago for more information.
Delta and double-delta features can improve accuracy but also add redundancy. I used LDA for feature selection (via the SPSS tool) and it discarded more than 12 coefficients, all of which were first and second derivatives.
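To illustrate the kind of analysis described above, here is a hedged sketch using scikit-learn's LinearDiscriminantAnalysis in place of SPSS. The data is synthetic (random features with hypothetical speaker labels; the "deltas" are deliberately constructed as noisy copies of the statics so they carry little extra discriminant information), so only the workflow, not the numbers, is meaningful:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_frames, n_static = 600, 13
speakers = rng.integers(0, 4, size=n_frames)  # 4 hypothetical speakers

# Synthetic "static" MFCCs: each speaker shifts the mean of every coefficient.
static = rng.normal(size=(n_frames, n_static)) + speakers[:, None]
# Synthetic "deltas": redundant, noise-corrupted copies of the statics.
deltas = 0.1 * static + rng.normal(scale=0.5, size=static.shape)
features = np.hstack([static, deltas])  # (600, 26)

lda = LinearDiscriminantAnalysis()
lda.fit(features, speakers)

# Per-dimension weight magnitude, averaged over classes: a rough proxy for
# how much each input column contributes to the discriminant directions.
importance = np.abs(lda.coef_).mean(axis=0)
```

With redundant columns, one would expect most of the discriminant weight to land on the static coefficients, mirroring the observation that LDA discarded mainly the derivative features.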
What is the significance of delta-delta MFCCs? If delta MFCCs already capture the frame-to-frame change in the static coefficients collected frame by frame, what exactly does delta-delta add? I understand the mathematics behind it, but not why it is needed, or whether we could keep applying further delta operations to the MFCCs.