I am currently researching ensemble multiclass classification and outlier detection in data mining. I intend to use Tukey's method to detect outliers. To rank them, I intend to compute the local outlier factor (LOF) for each instance and then sort the instances using the feature outliers as the primary key and LOF as the secondary key. I will build the classifiers using random forest. Using the root mean square error (RMSE) of the predictions, I will generate a learning curve by removing the outliers in rank order, starting with the top-ranked outlier, and recording the RMSE and ROC values after each removal. The process will continue until all the outliers detected by Tukey's method have been removed from the dataset.
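
To make the ranking step concrete, here is a minimal sketch in plain Python. It rests on several assumptions that are mine, not settled parts of the method: "feature outliers" is read as the number of features on which an instance falls outside Tukey's fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR), the function names are hypothetical, the toy data is made up, and the LOF scores are supplied as a precomputed list (in practice they could come from a library such as scikit-learn's LocalOutlierFactor).

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_feature_outliers(rows, k=1.5):
    """For each row, count the features that fall outside that feature's fences."""
    columns = list(zip(*rows))
    fences = [tukey_fences(col, k) for col in columns]
    return [
        sum(1 for value, (low, high) in zip(row, fences) if value < low or value > high)
        for row in rows
    ]


def rank_outliers(rows, lof_scores, k=1.5):
    """Return indices of flagged rows, most outlying first.

    Primary key: number of outlying features (descending).
    Secondary key: LOF score (descending; larger means more anomalous).
    """
    counts = count_feature_outliers(rows, k)
    flagged = [i for i, count in enumerate(counts) if count > 0]
    return sorted(flagged, key=lambda i: (-counts[i], -lof_scores[i]))


# Toy data with two features; the values are made up for illustration.
rows = [
    (1.0, 10.0), (1.2, 11.0), (0.9, 10.5), (1.1, 9.8),
    (1.0, 10.2), (9.0, 10.1), (1.05, 40.0), (8.5, 45.0),
]
# Hypothetical LOF scores, one per row, assumed to be computed elsewhere.
lof_scores = [1.0, 1.1, 1.0, 1.0, 1.0, 3.2, 4.5, 6.0]

ranking = rank_outliers(rows, lof_scores)
print(ranking)  # [7, 6, 5]
```

Row 7 comes first because it is outlying on both features; rows 6 and 5 are each outlying on one feature, so the LOF score decides their order.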

To establish which outliers to retain in the dataset after the experiments, the threshold will be the point on the learning curve where the RMSE is lowest: all outliers ranked below that point will be retained.
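
The threshold rule can be sketched as follows, assuming the RMSE has been recorded after each removal. The function names, the ranking, and the RMSE values are all hypothetical and only illustrate the selection logic; breaking ties in favour of fewer removals is my own assumption.

```python
def select_removal_cutoff(rmse_curve):
    """Return how many top-ranked outliers to remove to reach the lowest RMSE.

    rmse_curve[k] is the RMSE recorded after removing the k top-ranked
    outliers, so rmse_curve[0] is the baseline with nothing removed.
    Ties go to the smallest k, i.e. the fewest removals.
    """
    return min(range(len(rmse_curve)), key=lambda k: rmse_curve[k])


def split_outliers(ranking, rmse_curve):
    """Split ranked outlier indices into (removed, retained) at the RMSE minimum."""
    cutoff = select_removal_cutoff(rmse_curve)
    return ranking[:cutoff], ranking[cutoff:]


# Hypothetical ranked outlier indices, most outlying first.
ranking = [7, 6, 5, 2]
# Made-up RMSE values: baseline, then after removing 1, 2, 3 and 4 outliers.
rmse_curve = [0.42, 0.37, 0.33, 0.35, 0.36]

removed, retained = split_outliers(ranking, rmse_curve)
print(removed, retained)  # [7, 6] [5, 2]
```

Here the RMSE is lowest after two removals, so the two top-ranked outliers are dropped and the remaining two stay in the dataset.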
