Hi All,
I have an audio database consisting of various types of signals, and I'm planning to extract features from them. I would like to know whether it's a good idea to extract basic audio features (e.g. MFCC, energy) using a large window (say 5 s wide with 1 s overlap) rather than the conventional small frame size (on the order of milliseconds). I know that the audio signal exhibits homogeneous behavior over a 5 s duration.
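To make the two framings concrete, here is a minimal sketch of what I mean, using frame energy only (MFCCs would need an extra library). The 16 kHz sample rate, the synthetic sine signal, the 25 ms / 10 ms short-frame values, and the helper names `frame_starts` and `frame_energy` are all assumptions for illustration, not my actual data or pipeline.

```python
import math

def frame_starts(n_samples, win, hop):
    """Start indices of all full frames of length `win`, taken every `hop` samples."""
    if n_samples < win:
        return []
    return list(range(0, n_samples - win + 1, hop))

def frame_energy(signal, win, hop):
    """Mean squared amplitude of each full frame."""
    return [sum(s * s for s in signal[i:i + win]) / win
            for i in frame_starts(len(signal), win, hop)]

sr = 16000  # assumed sample rate in Hz
# Hypothetical test signal: 10 s of a 440 Hz sine, standing in for a real recording.
signal = [math.sin(2 * math.pi * 440 * n / sr) for n in range(10 * sr)]

# Large window: 5 s wide with 1 s overlap, so the hop is 4 s.
large = frame_energy(signal, win=5 * sr, hop=4 * sr)

# Conventional short frames: 25 ms wide with a 10 ms hop (assumed typical values).
small = frame_energy(signal, win=round(0.025 * sr), hop=round(0.010 * sr))

print(len(large), "large-window frames,", len(small), "short frames")
```

With the large window I get one feature vector every 4 s instead of one every 10 ms, so any variation inside the 5 s span is averaged out. That is acceptable only if the homogeneity assumption really holds.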
Thanks in advance