I have gone through this paper:
https://wlv.openrepository.com/bitstream/handle/2436/622981/IF2019.pdf?sequence=2
In section 3.2, the authors describe the methods used to extract audio features, but I could not understand the last two parts: 1) how to build the power spectrum with 2048 bins, and 2) how to build the 23-D log-fb vector. After applying the Hamming window, I have an array of shape (3000, 800).
Could you guide me on how to implement these two steps? I am using Python.
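
For reference, here is a rough sketch of one possible reading of the two steps, using only NumPy. It rests on several assumptions that are not confirmed by the paper: that the (3000, 800) array holds 3000 windowed frames of 800 samples each, that "2048-bin" means a 2048-point FFT with zero-padding (which yields 1025 non-negative-frequency bins), that the 23 filters are triangular and mel-spaced, and that the sample rate is 16 kHz. The function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    # Common mel-scale formula (assumed; the paper may use another variant).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frames, n_fft=2048):
    # rfft zero-pads each 800-sample frame to n_fft points and keeps
    # the n_fft // 2 + 1 non-negative-frequency bins.
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return (np.abs(spectrum) ** 2) / n_fft

def mel_filterbank(n_filters=23, n_fft=2048, sample_rate=16000):
    # n_filters + 2 edge points, equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0),
                             hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

    fbank = np.zeros((n_filters, fft_freqs.size))
    for m in range(n_filters):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # Triangle: rises from left to center, falls from center to right.
        fbank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def log_fbank(frames, n_filters=23, n_fft=2048, sample_rate=16000):
    pspec = power_spectrum(frames, n_fft)                # (n_frames, 1025)
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    energies = pspec @ fbank.T                           # (n_frames, 23)
    # Floor the energies so the log never sees zero.
    return np.log(np.maximum(energies, np.finfo(float).eps))
```

With `frames` being the (3000, 800) windowed array, `log_fbank(frames)` would return an array of shape (3000, 23), one 23-D vector per frame. If the paper uses a different sample rate or FFT convention, `sample_rate` and `n_fft` would need to change accordingly.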