Feature selection methods are popular and are definitely useful for removing correlated data that may confuse a classifier, or simply for reducing the size of the feature vector. Is there a threshold for the feature vector dimension, or at least some guidelines?
Redundant features add no relevant information beyond your other features, because they are correlated with them or can be obtained by a [linear] combination of them. Having them in your set will not add anything, but it won't hurt either, information-wise.
It will, however, hurt your training and classification times. Any limits or guidelines? You set them! Can you put up with the little extra time it takes to train your classifier with that additional feature? Then keep it. Does it take a week instead of a day? Remove it!
Bear in mind, however, that some algorithms scale differently with the number of features than with the number of samples. Consider which of the two hurts you most, and then choose your strategy: reducing dimensions and subsampling the data are both pertinent options.
To sum up: estimate in terms of time, and balance "preprocessing effort" against "training/classification time".
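If you do decide a feature costs more time than it is worth, here is a minimal sketch of one way to drop near-duplicate features; the 0.95 correlation threshold and the toy columns are my own assumptions for illustration, not a rule:

```python
import numpy as np
import pandas as pd

def drop_correlated_features(X: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one feature from every pair whose absolute correlation exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)

# Toy example: "b" is a near-linear copy of "a" and gets removed.
rng = np.random.default_rng(0)
X = pd.DataFrame({"a": rng.normal(size=100)})
X["b"] = 2 * X["a"] + 0.01 * rng.normal(size=100)
X["c"] = rng.normal(size=100)
print(drop_correlated_features(X).columns.tolist())  # ['a', 'c']
```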
I am running my final test feature vector through various classifiers (KNN, SVM, LDA, decision tree, Naive Bayes) using grid-search cross-validation. I am using Python, mostly the scikit-learn module.
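For readers with a similar setup, here is a minimal sketch of that kind of multi-classifier grid search in scikit-learn; the parameter grids and the synthetic data are illustrative assumptions, not the actual configuration described above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# One (estimator, parameter grid) pair per classifier; the grids are placeholders.
candidates = {
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 11]}),
    "SVM": (SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
    "LDA": (LinearDiscriminantAnalysis(), {}),
    "Tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, None]}),
    "NB": (GaussianNB(), {}),
}

for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=5)
    search.fit(X, y)
    print(f"{name}: best CV accuracy = {search.best_score_:.3f}")
```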
I have found that dimensionality reduction techniques, especially simple linear ones like PCA, can hurt classification performance. However, this may be specific to my data and to my particular feature-extraction and classification methods. The scientific approach is to test it on your data by doing it both ways!
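As a minimal sketch of that both-ways test, assuming synthetic data and an SVM for concreteness (not the poster's actual setup):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=30, n_informative=5, random_state=0)

# Same classifier, with and without PCA in front; cross-validation decides.
with_pca = make_pipeline(StandardScaler(), PCA(n_components=10), SVC())
without_pca = make_pipeline(StandardScaler(), SVC())

print("with PCA:   ", cross_val_score(with_pca, X, y, cv=5).mean())
print("without PCA:", cross_val_score(without_pca, X, y, cv=5).mean())
```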
It's not your data! I have written this several times in different RG blogs and forums. Using PCA for feature reduction is not a good method, because PCA finds the "best" directions in the sense of variance: it keeps the eigenvectors with the largest eigenvalues, i.e., the most dynamic components, but these are not necessarily the most discriminative features! A small example: if your features are overlaid with noise, which is normal in real-world applications, then PCA will give you components that respond "well" to the noise, whereas what you actually want are stable features. For practical purposes, PCA is therefore not a good reduction method.
Reduction models based on LDA or the like are much easier to handle and give better results than PCA.
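Here is a minimal sketch of how one might compare the two as reduction steps in scikit-learn; the synthetic data, the KNN classifier, and the choice of two components are assumptions for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Three classes with few informative features among many nuisance dimensions.
X, y = make_classification(n_samples=600, n_features=20, n_informative=3,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)

knn = KNeighborsClassifier()
# LDA keeps at most (n_classes - 1) components, here 2; match PCA for fairness.
pca_pipe = make_pipeline(PCA(n_components=2), knn)
lda_pipe = make_pipeline(LinearDiscriminantAnalysis(n_components=2), knn)

print("PCA -> KNN:", cross_val_score(pca_pipe, X, y, cv=5).mean())
print("LDA -> KNN:", cross_val_score(lda_pipe, X, y, cv=5).mean())
```

PCA here picks the highest-variance directions regardless of the labels, while LDA picks directions that separate the classes, which is the distinction the answer above is making.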