You could invoke a search method, such as "Best First" or "GeneticSearch", and then use an evaluator such as "Correlation-based Feature Subset Selection", a wrapper method, etc. to evaluate the worth of each candidate feature subset. Google for the associated references; a rough sketch of the idea follows below.
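As an illustration of that search-plus-evaluator pattern, here is a minimal Python sketch: a greedy forward search (a simplified stand-in for best-first search) scored with a CFS-style merit function. The function names, the stopping rule, and the use of Pearson correlation are assumptions of the sketch, not Weka's implementation.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS-style merit: k * r_cf / sqrt(k + k*(k-1) * r_ff), where r_cf is
    the mean |feature-class correlation| and r_ff the mean |feature-feature
    correlation| within the subset."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_search(X, y, max_features=10):
    """Greedily add the feature that most improves the merit; stop when no
    candidate improves it."""
    selected, remaining, best = [], list(range(X.shape[1])), -np.inf
    while remaining and len(selected) < max_features:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best:
            break
        best = merit
        selected.append(j)
        remaining.remove(j)
    return selected

# Toy usage: two informative features out of 30.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
print("selected:", forward_search(X, y))
```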
We evaluated the performance of different classifiers in combination with various feature selection methods in the context of NMR-based metabolomics. Just have a look at the attached publication.
Article: Performance Evaluation of Algorithms for the Classification ...
This is quite a general question. Do you have a particular problem to which you want to apply feature selection? The choice of method may depend on many factors. Is it for regression or classification? Two classes or more? Noisy data? Highly redundant data? Time series or not? ...
Anyway, here is a nice introduction to feature selection:
You can also find descriptions of some well-known methods on Wikipedia:
https://en.wikipedia.org/wiki/Feature_selection
Concerning supervised selection, if you do not have a lot of data, you may take a look at filter methods, such as minimal-redundancy-maximal-relevance (mRMR). If you have a very large amount of data and not too many features, an embedded method or a wrapper, such as recursive feature elimination with an SVM (embedded), regularized trees (embedded), or a genetic algorithm (wrapper), may be more relevant. A sketch of the SVM variant is given below.
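As a concrete illustration of the embedded option mentioned above, here is a minimal scikit-learn sketch of recursive feature elimination with a linear SVM (SVM-RFE); the synthetic data and every parameter value are assumptions for the example, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

# Synthetic two-class problem standing in for real data.
X, y = make_classification(n_samples=200, n_features=1000, n_informative=20,
                           random_state=0)

# At each iteration, drop the 10% of features with the smallest |SVM weight|
# until 20 features remain.
rfe = RFE(estimator=LinearSVC(C=1.0, dual=False), n_features_to_select=20,
          step=0.1)
rfe.fit(X, y)
print("selected feature indices:", list(rfe.get_support(indices=True)))
```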
If you are coding in MATLAB, here is a feature selection library with a wide variety of methods:
After reading my question again, I think I can be more specific. My label has only two classes, the number of samples is 200, and the number of features is 15,000. So I am interested in feature selection, not feature extraction. I do not know whether this additional information changes your answers?
We can broadly classify feature selection algorithms into two main categories, namely, wrapper methods and filter methods.
Filter methods study the relationship between the features and the class label in order to rank the features before feeding the top-ranked ones into your classification method. They include correlation criteria, mutual information, the Fisher criterion, etc. (a small example follows below).
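As a small example of the filter idea, a mutual-information ranking in scikit-learn might look as follows; the synthetic data and the choice of k = 20 are arbitrary assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=500, n_informative=15,
                           random_state=0)

# Rank every feature by its estimated mutual information with the class
# label and keep only the 20 best-ranked ones.
selector = SelectKBest(score_func=mutual_info_classif, k=20)
X_top = selector.fit_transform(X, y)
print(X_top.shape)  # (200, 20)
```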
Wrapper methods use the prediction performance of a model to select the feature subset: the model is wrapped in a search algorithm that looks for the subset giving the highest classification score in your case. Examples include sequential forward/backward selection, recursive feature elimination, and genetic algorithms (see the sketch below).
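And a minimal wrapper counterpart: greedy forward selection in which each candidate subset is scored by the cross-validated accuracy of the wrapped classifier. The logistic-regression model and all parameter values here are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=50, n_informative=10,
                           random_state=0)

# The search adds one feature at a time, keeping the addition that gives
# the best 5-fold cross-validated accuracy of the wrapped model.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=10, direction="forward",
                                scoring="accuracy", cv=5)
sfs.fit(X, y)
print("selected feature indices:", list(sfs.get_support(indices=True)))
```

Note how the wrapper refits the model for every candidate subset, which is why wrappers are far more expensive than filters on wide data.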
A good review of feature selection approaches is given in Chandrashekar, G., & Sahin, F. (2014). A survey on feature selection methods. Computers & Electrical Engineering, 40(1), 16–28.
More generally, you have to keep in mind that feature selection is about two things: the relevance of your features and their redundancy. In your specific case (15,000 features), I would suggest doing a PCA first. In fact, PCA can be used as a feature selection approach: it will tell you which features are linearly dependent or independent (for PCA-based selection, see https://stats.stackexchange.com/questions/27300/using-principal-component-analysis-pca-for-feature-selection, and the sketch below).
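A minimal sketch of that PCA-based selection, along the lines of the linked discussion: fit a PCA, then keep the original features with the strongest loadings on the leading components. The numbers of components and features retained are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, _ = make_classification(n_samples=200, n_features=500, random_state=0)

pca = PCA(n_components=10).fit(X)
# Score each original feature by its largest absolute loading across the
# 10 retained components, then keep the 20 highest-scoring features.
loading_score = np.abs(pca.components_).max(axis=0)
top = np.argsort(loading_score)[::-1][:20]
print("features with the strongest loadings:", sorted(top.tolist()))
```

Unlike a supervised filter, this selection never looks at the labels, so it captures redundancy but not relevance.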
Van der Maaten, L., Postma, E., & Van den Herik, J. (2009). Dimensionality reduction: A comparative review. Journal of Machine Learning Research, 10(1–41), 66–71.
Sorzano, C. O. S., Vargas, J., & Pascual-Montano, A. (2014). A survey of dimensionality reduction techniques. arXiv preprint, 1–35.
In our work, we included a supervised method in the overall structure of the classifier that selects the features yielding the highest accuracy rates for electromyography signal pattern recognition to predict human movement. I believe this is exactly what you are looking for.
Our paper is available at http://ieeexplore.ieee.org/document/8036844/
A collection of very simple techniques to begin with (e.g., computing the chi-squared statistic between each predictive attribute and the class attribute) can be found in the following paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.9956 (a short example follows below).
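For instance, that chi-squared filter takes only a few lines in scikit-learn; the synthetic data (rescaled to be non-negative, as the chi-squared test requires) is an assumption of the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=200, n_features=100, random_state=0)
X = MinMaxScaler().fit_transform(X)  # chi2 needs non-negative feature values

# One chi-squared statistic (and p-value) per feature against the class.
scores, p_values = chi2(X, y)
top10 = sorted(range(X.shape[1]), key=lambda j: -scores[j])[:10]
print("top 10 features by chi-squared score:", top10)
```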