A binary GA/EA/ACO, with each gene representing a feature and a classifier's output as the fitness function, will support feature selection. Alternatively, you can use ICA for feature selection.
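For illustration, here is a minimal sketch of the binary-GA idea, assuming scikit-learn is available; the kNN fitness classifier, population size, generation count, and mutation rate are arbitrary illustrative choices, not prescriptions:

```python
# A minimal sketch of binary-GA feature selection, assuming scikit-learn;
# each individual is a bit string over the features, and the fitness is the
# cross-validated accuracy of a classifier on the selected columns.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features, pop_size, n_gens = X.shape[1], 20, 15

def fitness(mask):
    """Cross-validated classifier accuracy on the selected columns."""
    if not mask.any():                       # an empty subset is invalid
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

# gene i == 1 means feature i is selected
pop = rng.integers(0, 2, (pop_size, n_features)).astype(bool)
for _ in range(n_gens):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # truncation selection
    children = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_features)            # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child ^= rng.random(n_features) < 0.05       # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```

Swapping in an EA variant or an ACO pheromone update changes only the search operator; the classifier-based fitness stays the same.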
PCA will give you dimensionality reduction with fewer features, which can help classification, but it will not tell you which of the original features are responsible for better classification.
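A short sketch of that caveat, assuming scikit-learn: the PCA components are mixtures of all the original features, so nothing is actually "selected".

```python
# PCA reduces dimensionality but does not select original features.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)          # 2 new features, each a mix of all 4 originals
print(Z.shape)                # (150, 2)
print(pca.components_)        # loadings: every original feature contributes,
                              # so no original feature is "selected"
```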
There are a lot of tools you can use. You can use the well-known first- and second-order statistical parameters (mean, standard deviation, kurtosis, moments, etc.). For classification tasks you can use the unsupervised k-means clustering algorithm. To maximize the performance of the classifier, ROC analysis is recommended.
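A minimal sketch of this pipeline, assuming scikit-learn and SciPy; the synthetic two-class data and the particular statistical features are illustrative only:

```python
# First/second order statistics as features, unsupervised k-means for
# grouping, and ROC AUC to gauge how well the clustering separates the
# (known) classes.
import numpy as np
from scipy.stats import kurtosis
from sklearn.cluster import KMeans
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# two synthetic "image patch" populations, 100 samples x 64 pixels each
healthy = rng.normal(0.0, 1.0, (100, 64))
lesion = rng.normal(0.8, 1.5, (100, 64))
signals = np.vstack([healthy, lesion])
labels = np.r_[np.zeros(100), np.ones(100)]

# first and second order statistical features per sample
feats = np.column_stack([signals.mean(axis=1),
                         signals.std(axis=1),
                         kurtosis(signals, axis=1)])

pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
# cluster ids are arbitrary, so check both labelings for the ROC analysis
auc = max(roc_auc_score(labels, pred), roc_auc_score(labels, 1 - pred))
print(f"ROC AUC of the clustering: {auc:.3f}")
```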
You can take a look at the paper:
Luminita Moraru, Dorin Bibicu, Anjan Biswas, "Standalone functional CAD system for multi-object case analysis in hepatic disorders", Computers in Biology and Medicine, 43 (2013), pp. 967-974
There is a whole host of feature selection methods and approaches, and optimal feature sets are not always feasible.
If you have too many features, applying a feature selection method like forward feature selection will take forever because of the size of the search space.
On the other hand, you can use feature ranking methods like chi-square feature selection, which ranks features by their chi-square value, i.e. their relevance to the class label.
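A minimal example of chi-square ranking, assuming scikit-learn (note that chi2 requires non-negative feature values):

```python
# Chi-square feature ranking: one cheap pass, no classifier in the loop.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_breast_cancer(return_X_y=True)   # all features are non-negative
selector = SelectKBest(chi2, k=10).fit(X, y)
ranking = selector.scores_.argsort()[::-1]   # features ranked by chi2 value
print("top 10 features by chi-square score:", ranking[:10])
```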
As other colleagues explained above, PCA is a feature extraction method that uses orthogonal projections to reduce dimensionality. Try creating a pool of candidate features without using PCA, and then apply a method such as ReliefF or mRMR to select the most relevant and least redundant ones.
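A short sketch of the ReliefF step, assuming the scikit-rebate (skrebate) package is installed; mRMR would be applied analogously through a library of your choice:

```python
# ReliefF relevance ranking over a pool of candidate features,
# using the (assumed) scikit-rebate package.
import numpy as np
from skrebate import ReliefF
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
r = ReliefF(n_neighbors=10).fit(X, y)
top = np.argsort(r.feature_importances_)[::-1][:10]
print("most relevant candidates by ReliefF:", top)
```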
You can find a simple and effective FS method in my article: Dudek G.: Tournament searching method to feature selection problem. In: Rutkowski L. et al. (eds.): Artificial Intelligence and Soft Computing, LNCS 6114, pp. 437-444 (link below)
There are lots of algorithms for computing optimal features, but unfortunately it is an NP-hard problem. I have published two simple heuristic algorithms for computing one optimal feature in related Chinese journals; they are written in Chinese.
Pixel sampling plays a major role in feature selection, so search for algorithms related to that; try dense trajectories (a pixel-by-pixel sampling method).
I think it depends on the task. You can't assume your data will be correctly projected or selected by a linear method. As Dejellali Hayet said, the method you want to use must be thought of as a filter (or projector) that must be designed according to the specific information you want to get from the process. It is indeed suggested that you understand linear methods (PCA, SVD, etc. as basics) and determine whether they are a sufficient approach to your specific, data-driven task. Conversely, if you determine that something more is needed, nonlinear methods can possibly help you, e.g. kernel PCA, entropy, Renyi's entropy, Rademacher complexity, etc.
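To make the linear-vs-nonlinear point concrete, here is a minimal sketch, assuming scikit-learn, on concentric-circle data that no linear projection can separate:

```python
# Linear PCA vs RBF-kernel PCA on data with nonlinear structure.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
Z_lin = PCA(n_components=2).fit_transform(X)          # rings stay concentric
Z_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)
# in Z_rbf the two rings become (nearly) linearly separable
print(Z_lin.shape, Z_rbf.shape)
```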
While there are several methods for feature extraction, as others mentioned, you may be interested in reading a recent paper by Ditzler et al., "A Bootstrap Based Neyman-Pearson Test for Identifying Variable Importance", IEEE Transactions on Neural Networks.
The novelty of the approach presented in that paper is that it determines both the number of relevant features and the identity of those features. It uses any feature selection approach as a base algorithm (such as a filter approach using any of the information-theoretic measures), but then applies a Neyman-Pearson test to identify the features that are truly relevant. The approach works even if the original base algorithm picks more or fewer than the actual number of relevant features.
See http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6823119
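For intuition only, here is a loose sketch of the bootstrap idea, not the paper's exact Neyman-Pearson procedure; the mutual-information base ranker, the number of bootstraps, and the 0.5 threshold are arbitrary stand-ins:

```python
# Rank features on bootstrap resamples with a filter score, then keep the
# features selected far more often than chance would allow.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
n, d = X.shape
k, B = 10, 50                                   # picks per run, bootstraps
rng = np.random.default_rng(0)

counts = np.zeros(d)
for _ in range(B):
    idx = rng.integers(0, n, n)                 # bootstrap resample
    scores = mutual_info_classif(X[idx], y[idx], random_state=0)
    counts[np.argsort(scores)[::-1][:k]] += 1   # base algorithm picks top k

freq = counts / B
# crude threshold: keep features chosen much more often than the k/d rate
# expected under irrelevance (the paper formalizes this with an NP test)
relevant = np.flatnonzero(freq > 0.5)
print("stably selected features:", relevant)
```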
Linear Discriminant Analysis is generally the easiest route to go. It usually shows a clear separation between the classes. This can be especially helpful when you make a 3-dimensional plot of the projected features and manipulate the viewing angle until you can see that separation clearly.
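A minimal sketch of such a plot, assuming scikit-learn and matplotlib; note that LDA yields at most n_classes - 1 components, so a 3D plot needs at least four classes (the digits dataset has 10):

```python
# Project onto the first three LDA discriminants and inspect in 3D.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_digits(return_X_y=True)
Z = LinearDiscriminantAnalysis(n_components=3).fit_transform(X, y)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(Z[:, 0], Z[:, 1], Z[:, 2], c=y, s=8)
# rotate interactively (or set ax.view_init) to find a viewing angle
# where the class clusters separate clearly
plt.show()
```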
If the number of features in your data set is low, it is better to use a wrapper method; if the number of features is large, it is better to use a filter method. To understand the issue better, you should read the following article.
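A sketch of both routes, assuming scikit-learn; the logistic-regression estimator and k = 5 are illustrative choices:

```python
# Wrapper (sequential forward selection) vs filter (univariate ranking).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       f_classif)
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# wrapper: re-trains the classifier for every candidate subset (slow)
wrapper = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000), n_features_to_select=5, cv=3).fit(X, y)

# filter: one pass of univariate scoring, independent of any classifier (fast)
filt = SelectKBest(f_classif, k=5).fit(X, y)

print("wrapper picked:", wrapper.get_support(indices=True))
print("filter picked: ", filt.get_support(indices=True))
```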
A t-test coupled with the leave-one-out method can be used to verify the accuracy of the feature selection procedure. Please see:
L. Moraru et al., "Optimization of breast lesion segmentation in texture feature space approach", Medical Engineering & Physics 36 (2014), pp. 129-135
L. Moraru et al., "Optimization in Breast Lesions Detection via Integrated Statistical Approach", Journal of Scientific Research & Reports 2(1): 460-473, 2013
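A minimal sketch of the verification idea, assuming scikit-learn and SciPy; the 0.01 significance level and the kNN classifier are illustrative choices, not those of the papers:

```python
# Screen features with a two-sample t-test, then verify the selection
# with leave-one-out classification accuracy.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# per-feature t-test between the two classes
_, pvals = ttest_ind(X[y == 0], X[y == 1], axis=0)
selected = np.flatnonzero(pvals < 0.01)

# leave-one-out accuracy on the selected features verifies the procedure
acc = cross_val_score(KNeighborsClassifier(), X[:, selected], y,
                      cv=LeaveOneOut()).mean()
print(f"{selected.size} features kept, LOO accuracy = {acc:.3f}")
```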