Feature extraction is the key step in speech emotion recognition: the quality of the features directly influences the accuracy of the classification results. If you are interested in typical hand-crafted features, Mel-frequency cepstral coefficients (MFCCs) are the most widely used representation of the spectral properties of voice signals. You can also try energy, pitch, formant frequencies, linear prediction cepstral coefficients (LPCCs), and modulation spectral features (MSFs).
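To make two of the simpler low-level features concrete, here is a minimal sketch in plain Python of frame-wise short-time energy and an autocorrelation-based pitch estimate. The function names, the frame and hop sizes, the 50-500 Hz search range, and the synthetic 200 Hz test tone are all my own assumptions for illustration, not part of any standard feature set. For MFCCs and the other features you would normally rely on a dedicated toolkit rather than hand-written code.

```python
import math


def frame_signal(signal, frame_len, hop_len):
    """Split a list of samples into overlapping frames (incomplete tail is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop_len)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate pitch in Hz as the lag with the largest autocorrelation.

    The search range fmin..fmax is an assumed range for speech.
    Returns 0.0 if no lag has positive autocorrelation.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


# Synthetic stand-in for a voiced segment: one second of a 200 Hz tone
# (hypothetical test signal, not real speech).
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate)
        for n in range(sample_rate)]

# 50 ms frames with a 25 ms hop (assumed values).
frames = frame_signal(tone, frame_len=400, hop_len=200)
features = [(short_time_energy(f), autocorr_pitch(f, sample_rate))
            for f in frames]

energy, pitch = features[0]
print(f"frames={len(frames)} energy={energy:.4f} pitch={pitch:.1f} Hz")
```

Each frame yields one (energy, pitch) pair. Statistics over these frame-level values (mean, standard deviation, range, and so on) are what usually get passed to the classifier as utterance-level features.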
As for which of your suggested feature sets, IS09 or IS10, is better: both work well and there is no big difference between them. However, I recommend trying high-level features learned with deep learning (DL); they will definitely be better than low-level features.