For a one-class classification problem, an optimal set of features can be selected on the basis of statistical properties of the features (e.g. correlation analysis) or with a simple optimisation technique such as dynamic programming or a genetic algorithm (GA).
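As a minimal sketch of the correlation-analysis idea (my own illustration; the helper name and threshold are arbitrary), highly correlated features can be filtered so that only one representative of each correlated group is kept:

import numpy as np

def drop_correlated_features(X, threshold=0.95):
    # X: (n_samples, n_features) target-class samples.
    # Keep a feature only if its absolute Pearson correlation with every
    # already-kept feature stays below the threshold.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep  # indices of retained features

# Example: feature 3 is a near-duplicate of feature 0 and gets dropped
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=200)
print(drop_correlated_features(X))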
A classical approach in one-class classification is to reduce the intra-class distance of your objects in the M-dimensional feature space. An appropriate measure is, e.g., DBSCAN clustering, among others. Furthermore, you can apply a feature selection procedure (see below) and then check whether your cluster radius shrinks, and so on; a small sketch of this check follows the paper reference below.
Paper:
Sensorless drive diagnosis using automated feature extraction, significance ranking and reduction.
Authors: Christian Bayer, Olaf Enge-Rosenblatt, Martyna Bator, Uwe Mönks
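As a rough, self-contained sketch of the "check whether the cluster radius shrinks" step (my own simplification using a centroid-based radius, not the DBSCAN-based measure or the procedure from the cited paper):

import numpy as np

def cluster_radius(X):
    # Intra-class spread: mean distance of target-class samples to their centroid.
    centroid = X.mean(axis=0)
    return np.linalg.norm(X - centroid, axis=1).mean()

def radius_without_feature(X, j):
    return cluster_radius(np.delete(X, j, axis=1))

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
X[:, 5] *= 10.0                       # a spread-out, artefact-laden feature
base = cluster_radius(X)
shrink = [(base - radius_without_feature(X, j), j) for j in range(X.shape[1])]
print(max(shrink))                    # dropping feature 5 shrinks the radius the most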
If you want to rank your features, you may use a filter method of feature selection. These include F-ratio, T-score and mRMR, among others. Of these, I have found the mRMR method to be the best for ranking features, as it uses a mutual information criterion. For details, see the paper by Hanchuan Peng, Fuhui Long, and Chris Ding, "Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 8, pp. 1226-1238, 2005.
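A minimal greedy mRMR sketch, assuming scikit-learn's mutual-information estimators and a labelled data set (the function name and the use of mutual_info_regression for the redundancy term are my own choices, loosely following the max-relevance / min-redundancy idea of Peng et al.):

import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_rank(X, y, n_select):
    # Relevance: I(feature; class label). Redundancy: mean I(feature; already-selected features).
    relevance = mutual_info_classif(X, y, random_state=0)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        scores = []
        for j in remaining:
            if selected:
                redundancy = np.mean([
                    mutual_info_regression(X[:, [k]], X[:, j], random_state=0)[0]
                    for k in selected])
            else:
                redundancy = 0.0
            scores.append(relevance[j] - redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected  # feature indices in ranked order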
Applying it to a one-class data set, however, is difficult. If you have a few samples of the other class, you can treat it as a binary classification problem and apply the method directly. If that is not possible, an evolutionary method may be employed to rank the features, with the aim that inclusion of a feature leads to the smallest increase in the radius of the minimum bounding hypersphere that encloses all the samples.
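To illustrate the radius criterion only (a greedy stand-in, not a full evolutionary search, and using the largest centroid distance as a crude proxy for the minimum enclosing hypersphere):

import numpy as np

def hypersphere_radius(X):
    # Crude proxy for the minimum enclosing hypersphere: largest distance
    # from the centroid to any one-class sample.
    centroid = X.mean(axis=0)
    return np.linalg.norm(X - centroid, axis=1).max()

def rank_by_radius_increase(X):
    # Add, at each step, the feature whose inclusion increases the enclosing
    # radius of the one-class samples the least.
    remaining, ranked = list(range(X.shape[1])), []
    while remaining:
        radii = [hypersphere_radius(X[:, ranked + [j]]) for j in remaining]
        best = remaining[int(np.argmin(radii))]
        ranked.append(best)
        remaining.remove(best)
    return ranked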
I would like to add a general comment on feature selection which I have stated in several blogs.
There are many feature selection methods available, such as LDA, Fisher's discriminant with the Rayleigh coefficient, intra-class-distance minimisers, etc. What usually does not work is PCA. Why? PCA assumes a Gaussian process, measures the variances, and sorts the eigenvalues (which are proportional to the variances) in descending order. The assumption is that the leading eigenvalues (EWs) contain most of the information, and therefore the main components are used for data reduction. So far so good. But applying this approach for feature reduction is risky, because it assumes that each feature is itself stable (invariant) AND that the feature's variance carries all the information needed for classification.
This assumption is wrong!
Real-world features contain artefacts (noise, etc.), and so PCA can place the highest-variance, noise-related directions into its main components. Hence you generate "new" features that are not stable.
Only if you can prove that your features are completely artefact-free and generated by a Gaussian process might PCA work; in all other cases PCA is, as I said, very risky.
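A small synthetic illustration of this risk (my own example): when one feature carries high-variance noise, the first principal component aligns with that noise rather than with the low-variance direction that actually separates the classes.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 500
labels = rng.integers(0, 2, n)
f_informative = labels + 0.1 * rng.normal(size=n)   # small variance, discriminative
f_noise = 10.0 * rng.normal(size=n)                 # huge variance, pure artefact
X = np.column_stack([f_informative, f_noise])

pca = PCA(n_components=1).fit(X)
print(pca.components_)                 # first PC points almost entirely along the noise feature
print(pca.explained_variance_ratio_)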
Classical statistical methods to rank features for classification include greedy forward variable selection, mutual-information-based ranking, backward elimination, Metropolis scanning / MCMC, penalised logistic regression, etc.
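For instance, greedy forward selection can be sketched as a wrapper around any classifier and cross-validated score (the logistic-regression scorer here is just an illustrative choice):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_forward_selection(X, y, n_select, cv=5):
    # At each step add the feature that yields the best cross-validated score.
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        scores = []
        for j in remaining:
            clf = LogisticRegression(max_iter=1000)
            scores.append(cross_val_score(clf, X[:, selected + [j]], y, cv=cv).mean())
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected  # features in the order they were added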
Apart from other optimisation techniques, DPSO (Discrete Particle Swarm Optimisation) has also been used recently by some researchers, who report good results.
Hi. For ranking features in order of importance (rather than compressing them into a new, lower-dimensional vector as PCA does), you can try sensitivity analysis. I do SA easily in NeuroSolutions 5.05.
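I cannot reproduce the NeuroSolutions implementation here, but a related perturbation-based ranking (permutation importance, which shuffles one input at a time and measures the score drop) gives the same kind of importance ordering; the model and data below are only an example:

from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=8, n_informative=3, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)

# Shuffle one feature at a time and measure how much the score drops.
result = permutation_importance(model, X, y, n_repeats=20, random_state=0)
print(result.importances_mean.argsort()[::-1])   # most to least influential feature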
SVDD is specifically designed for describing the boundary of one-class data. Different kernel functions can be used, giving differently shaped boundaries around the data for classification.
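As an illustration (scikit-learn ships OneClassSVM, the nu-SVM formulation, which with an RBF kernel is closely related to SVDD; the data and parameters are arbitrary), different kernels draw different boundaries around the target class:

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))          # target-class samples only
X_test = np.array([[0.0, 0.0], [4.0, 4.0]])  # an inlier-like and an outlier-like point

for kernel in ("rbf", "poly", "sigmoid"):
    clf = OneClassSVM(kernel=kernel, nu=0.05, gamma="scale").fit(X_train)
    print(kernel, clf.predict(X_test))       # +1 = inside the boundary, -1 = outside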