I know that feature selection methods keep the most relevant features and reduce redundancy. My question is: does a feature extraction method like PCA do the same?
This question relates to the two dimension reduction approaches, feature selection and feature extraction:
Dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction.
Feature selection approaches try to find a subset of the original variables (attributes). These approaches can be divided into two strategies: information-gain-style strategies, such as ReliefF, which compute a relevance score for each attribute, and optimisation strategies, such as genetic algorithms, which search for a good attribute subset.
These approaches are well suited to the large feature vectors found in big data.
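A minimal sketch of the first (scoring) strategy, assuming scikit-learn and a synthetic toy dataset; mutual information is used here as the per-attribute relevance score, since ReliefF itself is not part of scikit-learn:

```python
# Filter-style feature selection: score each original attribute and keep
# the k highest-scoring ones. The selected columns are a subset of the
# original variables, not transformed combinations of them.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)

selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)               # (200, 10)
print(selector.get_support().sum())   # 10 attributes kept
```

The key property, in contrast to extraction, is that `get_support()` tells you exactly which original attributes survived.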
Feature extraction transforms the data in the high-dimensional space to a space of fewer dimensions, as with PCA, LDA, CCA, ...
In general, we use PCA to visualise feature data in 2 or 3 dimensions.
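The extraction case can be sketched the same way, assuming scikit-learn and the standard iris dataset as a stand-in for real feature data:

```python
# Feature extraction with PCA: project the 4-dimensional iris data onto
# 2 new axes. Each axis is a linear combination of ALL original features,
# so no single original attribute is "selected".
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)    # 150 samples, 4 original features

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                     # (150, 2), ready for a 2-D scatter plot
print(pca.explained_variance_ratio_)  # variance captured by each new axis
```

For iris, the first two components capture well over 90% of the variance, which is why a 2-D scatter of `X_2d` is usually enough to see the class structure.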
Thanks, dear Abdelbar Nasri. You understood my question. I know that feature selection and extraction are both for dimension reduction. My question: if I want to use both, which one comes first? Feature extraction first and then feature selection, or feature selection first and then extraction?
Actually, the feature extraction process does not necessarily reduce dimension. If a set of transformations, such as the FFT, the wavelet transform, higher-order statistics and others, is applied in order to find information not revealed in the original data space, a dimension expansion can result. In that case, the use of a feature selection approach is strongly recommended.
Answering your first question, Farhad Bulbul: PCA reduces data redundancy by making the resulting features uncorrelated, but it is not guaranteed that the PCA-based features are the most relevant ones.
Furthermore, some feature selection methods are not able to perform redundancy reduction. Fisher's Discriminant Ratio (FDR) is one of them.
Answering your last question, it is recommended to perform feature extraction before feature selection.
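The decorrelation claim above is easy to check numerically, assuming scikit-learn and iris as toy data: the covariance matrix of the PCA scores should be (near-)diagonal.

```python
# Numerical check that PCA decorrelates the data: after projection onto
# all principal components, off-diagonal covariances are numerically zero.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
scores = PCA().fit_transform(X)       # keep all components

cov = np.cov(scores, rowvar=False)
off_diagonal = cov - np.diag(np.diag(cov))
print(np.abs(off_diagonal).max() < 1e-8)   # True: components are uncorrelated
```

Note this only shows redundancy reduction; it says nothing about relevance to the class labels, which is exactly the gap a selection step can fill.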
Thanks for all the answers. Now I want to use both mutual information maximization and PCA. First I will select a feature subset using mutual information maximization and then apply PCA. Can I do that? For example, I would apply mutual information maximization to select a subset of 200 out of 10,000 features and then apply PCA on those 200 features to reduce them to only 20. Can I do that, or do I first need to use PCA to go from 10,000 to 200 and then from 200 to 20 by mutual information maximization? I have already run some experiments on the Car dataset of VOC2007, and I got better results using mutual information followed by PCA rather than PCA followed by mutual information.
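The selection-then-extraction order described in the question can be sketched as a scikit-learn Pipeline; the sizes here are scaled down from the question's 10,000 → 200 → 20 (to 1,000 → 200 → 20) purely to keep the toy example fast:

```python
# Selection first (mutual information keeps 200 original attributes),
# extraction second (PCA compresses those 200 into 20 new axes).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=100, n_features=1000,
                           n_informative=20, random_state=0)

pipe = Pipeline([
    ("mi", SelectKBest(mutual_info_classif, k=200)),  # keep 200 attributes
    ("pca", PCA(n_components=20)),                    # compress to 20 axes
])
X_reduced = pipe.fit_transform(X, y)
print(X_reduced.shape)   # (100, 20)
```

Reversing the two steps in the Pipeline gives the PCA-first variant, so both orders from the question can be compared under identical cross-validation.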
An important characteristic which is not preserved by many feature extraction techniques is the order of the data in the primary domain. Some methods, such as using curve fitting in feature extraction, do preserve it.