I am doing research on feature selection for extra-high-dimensional data (big data). Does anyone have good advice about finding redundant features? Please don't mention PCA, decision trees or information gain :(
You could go old school and compute an intercorrelation matrix, i.e. the pairwise correlation coefficients of all features taken two at a time. Then remove a feature if (1) it has very low variance (e.g., less than 0.01) or (2) its pairwise correlation coefficient with another feature is greater than some threshold, for example 0.99.
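A minimal sketch of that two-step filter in Python with pandas, using the illustrative thresholds from the answer above (0.01 for variance, 0.99 for correlation); the function name and defaults are my own, not from any particular library:

```python
import numpy as np
import pandas as pd

def filter_redundant(X, var_thresh=0.01, corr_thresh=0.99):
    """Drop near-constant features, then one of each highly correlated pair.

    X: pandas DataFrame of numeric features. Thresholds are the
    illustrative values from the answer above, not universal defaults.
    """
    # (1) remove features with very low variance
    X = X.loc[:, X.var() > var_thresh]
    # (2) pairwise absolute correlation matrix
    corr = X.corr().abs()
    # keep only the upper triangle so each pair is tested once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # drop a column if it is near-duplicated by an earlier column
    to_drop = [c for c in upper.columns if (upper[c] > corr_thresh).any()]
    return X.drop(columns=to_drop)
```

Note that for truly extra-high-dimensional data the full p-by-p correlation matrix may not fit in memory, so in practice you would compute it in column blocks.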
Please see Figure 4 from the full text of the link provided below.
Hope this helps.
Best regards,
Chanin
Article Unraveling the origin of splice switching activity of hemogl...
See EC-FS (Feature Selection via Eigenvector Centrality); Matlab code is available in the Feature Selection Library: https://it.mathworks.com/matlabcentral/fileexchange/56937-feature-selection-library?requestedDomain=www.mathworks.com
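To illustrate the idea behind eigenvector-centrality feature selection, here is a heavily simplified Python sketch: build a graph whose nodes are features, weighted here by absolute Pearson correlation (the actual EC-FS method uses a richer, supervised adjacency kernel), and score each feature by its entry in the leading eigenvector of that graph:

```python
import numpy as np

def ec_rank(X):
    """Rank features by eigenvector centrality of a feature-similarity graph.

    X: array of shape (n_samples, n_features). Returns feature indices
    ordered from most to least central. This is a toy sketch of the
    concept, not the EC-FS algorithm from the toolbox above.
    """
    # feature-by-feature adjacency: absolute correlation, no self-loops
    A = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(A, 0.0)
    # A is symmetric, so use eigh; the last column is the leading eigenvector
    vals, vecs = np.linalg.eigh(A)
    centrality = np.abs(vecs[:, -1])
    return np.argsort(centrality)[::-1]
```

Highly intercorrelated (i.e., redundant) features form dense subgraphs and come out with high centrality, so the ranking also gives you a direct handle on redundancy.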