I'm looking for results about robustness or stability of feature selection algorithms. I know some results for LASSO associated with differentially private model selection. Are there any theorems, e.g. for MDR?
Just a brief related comment: consider the Bayesian methodology of structure selection (also found under the label of hypothesis testing). It essentially balances model complexity, the amount of processed data, and predictive ability. Searching the huge space of possible hypotheses then becomes an independent problem, solvable by randomly restarted local search.
Stability of variable selection is defined as the insensitivity of the feature selection algorithm to variations of the training set.
Variable selection approaches can express feature preferences in three kinds of representations; in practice, the different methods usually provide their outcome in one of the following forms:
- a weighting or scoring vector
- a ranking vector
- an n-dimensional binary vector, where each component is associated with a feature and its value (0 or 1) indicates, respectively, the absence or presence of that variable in the selected subset.
In order to evaluate the stability of a variable selection method, a suitable similarity measure must be defined for each of the three representations.
As already mentioned, the stability of a feature selection algorithm is usually assessed by measuring its insensitivity to variations in the training set: that is, if you apply your method to two datasets A and B, will it select the same subset of features [1]?
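The comparison above can be sketched as a subsampling protocol: run the selector on several random subsamples of the training set and average the pairwise Jaccard similarity of the selected subsets. Here `select` is a hypothetical stand-in for any feature selection routine that returns a list of feature indices:

```python
import random

def jaccard(a, b):
    """Jaccard similarity between two feature subsets (lists of indices)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def stability_by_subsampling(select, X, y, k_runs=10, frac=0.8, seed=0):
    """Estimate selection stability: run `select` on random subsamples of
    (X, y) and average the pairwise Jaccard similarity of the results."""
    rng = random.Random(seed)
    n = len(X)
    subsets = []
    for _ in range(k_runs):
        idx = rng.sample(range(n), int(frac * n))
        subsets.append(select([X[i] for i in idx], [y[i] for i in idx]))
    sims = [jaccard(subsets[i], subsets[j])
            for i in range(k_runs) for j in range(i + 1, k_runs)]
    return sum(sims) / len(sims)
```

A perfectly stable selector scores 1.0; the lower the average similarity, the more sensitive the method is to training-set variation.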
In this regard, a quite popular measure is the Kuncheva Stability Index [2]. This measure also accounts for subtle issues such as the correction for chance. Furthermore, it has been used to measure the robustness of feature selection algorithms in adversarial environments [3].
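For two equal-size subsets A and B drawn from n features, Kuncheva's index corrects the raw overlap r = |A ∩ B| by its chance expectation k²/n. A minimal sketch:

```python
def kuncheva_index(subset_a, subset_b, n_features):
    """Kuncheva's consistency index for two feature subsets of equal size k.
    Returns 1 for identical subsets; values near 0 mean chance-level overlap."""
    a, b = set(subset_a), set(subset_b)
    k = len(a)
    if len(b) != k or not 0 < k < n_features:
        raise ValueError("need equal-size subsets with 0 < k < n_features")
    r = len(a & b)
    expected = k * k / n_features  # overlap expected by chance alone
    return (r - expected) / (k - expected)
```

Unlike raw Jaccard similarity, the index is not inflated when k is large relative to n, which is exactly the correction for chance mentioned above.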
[1] A Stability Index for Feature Selection
[2] Measuring the Stability of Feature Selection with Applications to Ensemble Methods
[3] Is Feature Selection Secure against Training Data Poisoning?
How can stability and robustness of feature selection be measured? I think they can be obtained from the results of machine learning, such as ROC curves.
Feature selection is key for practical work. At present, in research on Chinese medicine, this work mainly depends on experts, which I would like to avoid. Some statistical, machine learning, and evolutionary computing methods have been shown to be effective for feature selection.
In order to assess how robust a feature selection method is, you could use the Kuncheva stability index or the Jensen-Shannon criterion, as we did in our paper; please see "Infinite Feature Selection" among my publications. We analysed the stability of the ranking as a function of the number of available samples; some feature ranking techniques may suffer when the amount of samples is reduced.
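One way to apply a Jensen-Shannon-type criterion (this framing is my assumption; the paper's exact formulation may differ) is to normalize two feature-score vectors into probability distributions and compute their divergence, which with base-2 logarithms lies in [0, 1]:

```python
from math import log2

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base-2 logs, result in [0, 1]) between two
    non-negative feature-score vectors, normalized to distributions first."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [0.5 * (a + b) for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; 0 * log(0) is treated as 0
        return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A divergence of 0 means the two runs scored the features identically; 1 means the score mass fell on disjoint features.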
Moreover, if the goal is to study the ranking, Spearman's correlation coefficient can also be used.
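For tie-free rankings (each a permutation of 0..n-1), Spearman's rho reduces to the closed form 1 - 6 Σ d² / (n(n² - 1)), where d is the per-feature rank difference between two runs. A minimal sketch:

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation between two tie-free rankings of the same
    n features (rank_a[i] is the rank assigned to feature i in run A)."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A value of 1 means the two runs rank the features identically, -1 means the rankings are exactly reversed, and values near 0 indicate an unstable ranking.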
There are multivariate methods and statistical diagnostic tests able to measure the stability and robustness of feature selection. Of course, it is important to consider the application requirements.