The DCT is similar to the DFT, but it uses only cosine basis functions, so a real-valued input gives real-valued coefficients. For this reason, some audio signal processing features, such as Frequency Domain Linear Prediction (FDLP), use the DCT to convert a time-domain signal into the frequency domain. As I understand it, the DCT was meant to approximately diagonalize covariance matrices with Toeplitz structure, giving a fast estimate of their eigenvectors. Transforming a large covariance matrix into a nearly diagonal one is an instance of decorrelation, and in this way the DCT compacts the energy of a feature into a few coefficients. Is my understanding correct?
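For what it's worth, the decorrelation claim can be checked numerically. Below is a minimal sketch using only NumPy. It assumes a Toeplitz covariance of the AR(1) form `R[i, j] = rho ** |i - j|`; the size `n = 16`, the value `rho = 0.9`, and the helper names `dct2_matrix` and `offdiag_energy_fraction` are illustrative choices of mine, not anything taken from FDLP or a particular library.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows are the basis vectors)."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    m = np.arange(n)[None, :]   # time index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)     # rescale the DC row so that C @ C.T = I
    return c

def offdiag_energy_fraction(a):
    """Fraction of the squared Frobenius norm that lies off the main diagonal."""
    total = np.sum(a ** 2)
    diag = np.sum(np.diag(a) ** 2)
    return (total - diag) / total

# Assumed example: AR(1)-style Toeplitz covariance with unit variance.
n, rho = 16, 0.9
idx = np.arange(n)
R = rho ** np.abs(idx[:, None] - idx[None, :])

C = dct2_matrix(n)
R_dct = C @ R @ C.T             # covariance of the DCT coefficients

frac_before = offdiag_energy_fraction(R)
frac_after = offdiag_energy_fraction(R_dct)

# Share of the total variance carried by the first 4 DCT coefficients.
energy_first4 = np.sum(np.diag(R_dct)[:4]) / np.trace(R_dct)

print("off-diagonal fraction before DCT:", frac_before)
print("off-diagonal fraction after DCT: ", frac_after)
print("variance share of first 4 coefficients:", energy_first4)
```

If the understanding above holds, the off-diagonal fraction should drop noticeably after the transform, and a handful of low-order coefficients should carry most of the variance. Note that the diagonalization is only approximate: the exact eigenvectors would come from an eigendecomposition of `R` (the KLT), which the DCT merely approximates for this kind of covariance.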