Hi Sajjad. Generally speaking, in one-class classification negative examples are not supposed to be used for parameter tuning; in a truly one-class scenario, parameter tuning based on ROC measures is simply not possible by definition. If, however, you do have a few negatives available, ROC values can provide useful information for parameter tuning. Keep in mind, though, to strictly separate training and test data, which may be a problem when negatives are scarce.
Parameters other than the decision threshold can possibly be tuned using the (threshold-free) area under the ROC curve, computed on the extra validation set.
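For illustration, here is a minimal sketch of what such AUC-based tuning could look like, assuming scikit-learn's OneClassSVM as the one-class model and a small labelled validation set that contains the few available negatives; the data, the parameter grid, and all names are placeholders, not a prescription.

import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score

# Hypothetical data: X_train contains only positive (normal) examples;
# X_val / y_val is a held-out validation set that also includes the few
# available negatives (y_val: 1 = normal, 0 = outlier).
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 5))
X_val = np.vstack([rng.normal(size=(40, 5)),
                   rng.normal(loc=3.0, size=(10, 5))])
y_val = np.array([1] * 40 + [0] * 10)

best_auc, best_params = -np.inf, None
for nu in (0.01, 0.05, 0.1):
    for gamma in (0.01, 0.1, 1.0):
        clf = OneClassSVM(nu=nu, gamma=gamma).fit(X_train)  # trained on positives only
        scores = clf.decision_function(X_val)                # higher = more "normal"
        auc = roc_auc_score(y_val, scores)                   # threshold-free criterion
        if auc > best_auc:
            best_auc, best_params = auc, (nu, gamma)

print("best AUC %.3f with nu=%.2f, gamma=%.2f" % ((best_auc,) + best_params))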
One additional comment: be aware that negative selection algorithms might have problems in high-dimensional feature spaces, and methods describing the positive data might be advantageous (or at least computationally feasible) in those cases.
Thanks a million for your helpful tips. I have both negative and positive data, but I'm interested in using just the normal data (not the abnormal). As you said, maybe I should use ROC analysis.
And regarding negative selection, I actually agree, because I've implemented one of the negative selection algorithms (V-detector) for a high-dimensional data set and it didn't perform well.
Hi Sajjad, I did something similar for my MSc thesis... I split the high-dimensional data up into several subspaces and used AUC (computed on both positive and negative examples, because these were available) to select the best subspaces for one-class classification. The selected subspaces were then used to train an ensemble, which outperformed an ensemble built on all subspaces without any selection. Hope this helps.
Yes, each subspace had a lower dimensionality than the original space, but there was no feature extraction involved (although you can see it as feature selection). Imagine I started with features numbered 1 to 10... I could split these up into 3-dimensional, overlapping subspaces: features [1 2 3], [1 5 6], etc., drawn at random. Of these, I selected several subspaces based on their AUC performance on a validation set.
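As a rough illustration, the following sketch shows this kind of random-subspace selection and score-averaging ensemble; the base one-class model (OneClassSVM), the subspace size, the number of selected subspaces, and all data are my own assumptions, not the exact setup from the thesis.

import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(1)
n_features, subspace_dim, n_subspaces = 10, 3, 20

# Placeholder data: target examples only for training; a labelled
# validation set (1 = target, 0 = outlier) for subspace selection.
X_train = rng.normal(size=(300, n_features))
X_val = np.vstack([rng.normal(size=(60, n_features)),
                   rng.normal(loc=2.5, size=(20, n_features))])
y_val = np.array([1] * 60 + [0] * 20)

# Draw random, possibly overlapping subspaces and score each one by AUC.
subspaces = [rng.choice(n_features, subspace_dim, replace=False)
             for _ in range(n_subspaces)]
scored = []
for feats in subspaces:
    clf = OneClassSVM(nu=0.05, gamma='scale').fit(X_train[:, feats])
    auc = roc_auc_score(y_val, clf.decision_function(X_val[:, feats]))
    scored.append((auc, feats, clf))

# Keep the best few subspaces and combine their scores by averaging.
scored.sort(key=lambda t: t[0], reverse=True)
selected = scored[:5]
ensemble_score = np.mean([clf.decision_function(X_val[:, feats])
                          for _, feats, clf in selected], axis=0)
print("ensemble AUC: %.3f" % roc_auc_score(y_val, ensemble_score))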
Sajjad, you might also consider trying FLD-based dimensionality reduction instead of PCA; sometimes the results are significantly different. Yes, ROC should be a good measure even for a one-class problem.
Hi Tanvi, I have to admit I had some difficulty understanding your advice regarding dimensionality reduction. If the abbreviation FLD stands for Fisher linear discriminant, what classes would you attempt to separate in a one-class setting?
Wow, that was fast! You are right, Michael, I forgot the one-class issue when I mentioned FLD. I guess PCA would be the best solution here. Thank you for catching that!
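For completeness, here is a small sketch of PCA-based dimensionality reduction in front of a one-class classifier. It assumes scikit-learn and placeholder data; since PCA is unsupervised, it can be fitted on the normal data alone (unlike FLD, which needs class labels), and the number of components would have to be chosen for the data at hand, e.g. via the AUC criterion discussed above.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM
from sklearn.pipeline import make_pipeline

# Placeholder target-class data; PCA is fitted on the normal data only.
rng = np.random.RandomState(2)
X_train = rng.normal(size=(500, 50))

model = make_pipeline(PCA(n_components=10), OneClassSVM(nu=0.05, gamma='scale'))
model.fit(X_train)

# Scores for new data; higher values mean "more like the target class".
X_new = rng.normal(size=(5, 50))
print(model.decision_function(X_new))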
I guess the problem with using the outliers as "class 2" may be that the outliers have nothing in common and it may be difficult to generalize them into one class.
I agree with Veronika. Even if you have access to a second class of outliers, it might be very ambitious to separate them from the target class with a mere linear model, as outliers are usually assumed to surround the target class (rather than lie in a single half-space defined by a separating hyperplane).
Thank you all for your helpful advice. I agree with Veronika and Michael too. In addition, generating "class 2" is a challenging issue.