For classification, suppose some number m of features is selected. Some of these may be cooperative (good for classification) and the rest may not be. What is a statistical measure for discarding the rest of the features?
There are a few validation techniques for evaluating performance; which one applies also depends on the kind of analysis you have done. The most common is cross-validation (including hold-out, k-fold, and leave-one-out). Alternatively, you can use a similarity measure (Rand index / Jaccard index), a stability measure (Kuncheva index), the error rate, etc.
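For illustration, a minimal k-fold cross-validation sketch in Python with scikit-learn (the classifier and the bundled iris dataset are placeholders, not part of the original answer):

```python
# k-fold cross-validation sketch: split the data into k folds, train on
# k-1 folds, test on the held-out fold, and average the scores.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)          # placeholder dataset
clf = KNeighborsClassifier(n_neighbors=5)  # any classifier works here

scores = cross_val_score(clf, X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("per-fold accuracy:", scores)
print("mean accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

Setting n_splits equal to the number of samples gives leave-one-out, while a single train/test split is the hold-out case.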
Leave-one-out cross-validation is most appropriate when features show high variance across the dataset. Larger datasets often have this problem, e.g., the DARPA or KDD Cup 99 intrusion-detection datasets. Note that such posterior criteria for feature selection are applied at testing time. For selecting features, wrapper-based criteria, probabilistic measures, Apriori, and genetic algorithms (GA) are advised, but I think SVM is the best among these selection techniques.
SVM is much better than the other selection techniques for feature selection, but in MATLAB it can easily be applied only to binary classification; I have not seen it used for M-class problems (more than two classes).
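One common way to use an SVM for feature selection is recursive feature elimination (RFE) on a linear SVM's weights. A hedged sketch in Python with scikit-learn, where, unlike the MATLAB limitation mentioned above, LinearSVC handles more than two classes via one-vs-rest (the dataset and the number of retained features are arbitrary choices):

```python
# SVM-RFE sketch: repeatedly fit a linear SVM and drop the feature with
# the smallest weight magnitude until the desired number remains.
from sklearn.datasets import load_wine
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_wine(return_X_y=True)      # 3-class placeholder dataset
X = StandardScaler().fit_transform(X)  # SVM weights are scale-sensitive

rfe = RFE(LinearSVC(C=1.0, max_iter=10000), n_features_to_select=5)
rfe.fit(X, y)
print("selected feature indices:", rfe.get_support(indices=True))
```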
As Marko Tscherepanow suggests, the resulting classification accuracy can be one of the criteria. But the real issue is whether there are redundant features that unnecessarily delay the result, or confusing features that reduce accuracy. I am facing the same problem.
I have also seen papers in deep learning that monitor the weight updates of neurons as a cue for the goodness of representations, but I am not sure whether that is relevant to feature selection.
If you have a set of features and want to keep only the useful ones, you can calculate the correlation between the features and remove those that are highly correlated: having the same information in several features does not enhance the classification, it only increases the computation time.
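A minimal sketch of this correlation filter in Python (the 0.9 threshold is an arbitrary assumption; tune it for your data):

```python
# Drop one feature from each highly correlated pair.
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is tested once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)

# Usage with synthetic features; f2 is a near-duplicate of f1.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
df = pd.DataFrame({"f1": a,
                   "f2": a + 0.01 * rng.normal(size=100),
                   "f3": rng.normal(size=100)})
print(drop_correlated(df).columns.tolist())  # ['f1', 'f3']
```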
Try subset-based feature selection methods such as rough reducts, CFS, etc. These provide a useful feature subset directly, so you do not need to worry about discarding further features. The simplest measure is the classification accuracy, or you may use ROC, etc.
I suggest using ROC curves and cross-validation for this problem. Moreover, with some classification methods you can also use the odds ratio and a stepwise selection procedure.
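As an illustration, a sketch comparing two feature sets by cross-validated ROC AUC in Python (binary labels are assumed, since ROC is defined for two classes; the dataset, classifier, and the "first 10 features" subset are arbitrary):

```python
# Compare feature sets by mean ROC AUC under 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000)

auc_all = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
auc_sub = cross_val_score(clf, X[:, :10], y, cv=5, scoring="roc_auc").mean()
print("AUC, all features:   %.3f" % auc_all)
print("AUC, 10-feature set: %.3f" % auc_sub)
```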
You can use a correlation-based feature selection method. Alternatively, you can use a simple method based on Fisher's Discriminant Ratio, or class-separability measures such as scatter matrices, Bhattacharyya divergence, etc.
Maybe Chapter 4 of the book below will help you:
Book: An Introduction to Pattern Recognition: A MATLAB Approach
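For the Fisher's Discriminant Ratio suggestion above, a minimal two-class sketch in Python (FDR_k = (mu1_k - mu2_k)^2 / (var1_k + var2_k) scores each feature individually; the toy data is an assumption for illustration):

```python
# Rank features by Fisher's Discriminant Ratio (two-class case):
# higher means better class separation along that single feature.
import numpy as np

def fisher_ratio(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0)
    return num / den

# Toy usage: feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.r_[rng.normal(0, 1, 100), rng.normal(3, 1, 100)],
                     rng.normal(0, 1, 200)])
y = np.r_[np.zeros(100), np.ones(100)]
print(fisher_ratio(X, y))  # first score should be much larger
```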
The correlation between one predictor and the class label cannot by itself be considered a sufficient measure for feature selection: two features might have a synergistic dependency in predicting the class labels. For instance, in the XOR problem neither variable alone can predict the output, but combined they predict it perfectly.
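The XOR point can be checked numerically; in the sketch below (scikit-learn's mutual information estimator on synthetic binary data), each feature alone carries essentially zero information about the label, yet both together determine it exactly:

```python
# XOR demo: marginal relevance of each feature is ~0, joint relevance is perfect.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, 2000)
x2 = rng.integers(0, 2, 2000)
y = x1 ^ x2           # the label is the XOR of the two features
X = np.column_stack([x1, x2])

# Per-feature mutual information with the label: both close to 0.
print(mutual_info_classif(X, y, discrete_features=True))
# A model using both features jointly classifies perfectly.
print(DecisionTreeClassifier().fit(X, y).score(X, y))  # 1.0
```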
Ultimately, to make an implicit assumption explicit: can you use the classification to make informed decisions in a (preferably critical) activity to achieve some significant goal? Forgive me for stating the obvious, but in my experience this is sometimes forgotten.
The dimension of the data can be reduced using feature selection methods. Feature selection (also called feature subset selection) can be defined as selecting a subset of the existing features without applying a transformation (such as PCA). In large datasets, reducing the dimension is necessary to decrease evaluation time. Feature selection generally also increases classification performance, because it gets rid of redundant data.
There are various feature selection methods, which differ from each other in their search strategies. Search strategies can be broadly divided into three categories: exponential, sequential, and randomized algorithms. Exponential algorithms (such as exhaustive search, beam search, etc.) evaluate a number of subsets that grows exponentially with the dimensionality of the search space. Sequential algorithms (such as sequential forward selection, sequential backward selection, etc.) add or remove features one at a time, but tend to become trapped in local minima. Randomized algorithms (such as simulated annealing, genetic algorithms, etc.) incorporate randomness into their search procedure to escape local minima. A sketch of a sequential method follows below.
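As one concrete example of a sequential strategy, a sketch of sequential forward selection with scikit-learn's SequentialFeatureSelector (available in scikit-learn >= 0.24; the classifier, dataset, and target subset size are arbitrary choices):

```python
# Greedy forward selection: start empty, repeatedly add the feature that
# most improves cross-validated accuracy (can get stuck in local minima).
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
sfs = SequentialFeatureSelector(KNeighborsClassifier(),
                                n_features_to_select=5,
                                direction="forward", cv=5)
sfs.fit(X, y)
print("selected feature indices:", sfs.get_support(indices=True))
```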
Based on your application, you may try some of these methods and see their effect on classification performance.