02 February 2014

I built my first randomForest model (regression) to predict sensitivity to a particular drug in breast cancer cell lines. There were 35,000 features in total, and I used the default mtry for regression (N/3, where N is the number of features) and ntree = 30001.

Then, I determined the number of predictors that minimized the cross-validated MSE using the ‘rfcv’ function in the randomForest package (50 replicates of five-fold cross-validation). This selected 23 predictors.
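For readers who want to reproduce this step, here is a minimal sketch of replicated five-fold `rfcv`. It runs on small synthetic data as a stand-in for the real expression matrix. The data, the names `x`, `y`, `n_rep`, and the reduced replicate count are all assumptions for illustration, not the settings or code actually used.

```r
library(randomForest)

set.seed(1)
# Hypothetical stand-in data: 60 samples, 40 features, only the first 3 carry signal
n <- 60
p <- 40
x <- matrix(rnorm(n * p), nrow = n, dimnames = list(NULL, paste0("g", seq_len(p))))
y <- x[, 1] + 0.5 * x[, 2] - 0.5 * x[, 3] + rnorm(n, sd = 0.3)

# Repeat five-fold rfcv several times (5 replicates here only to keep the run short)
n_rep <- 5
runs <- replicate(n_rep, rfcv(x, y, cv.fold = 5, step = 0.5), simplify = FALSE)

# Each run reports the CV MSE for a decreasing sequence of predictor counts
n_var <- runs[[1]]$n.var
err <- sapply(runs, function(r) r$error.cv)  # rows: predictor counts, cols: replicates

# Average over replicates and pick the predictor count with the lowest mean CV MSE
mean_err <- rowMeans(err)
best_n <- n_var[which.min(mean_err)]

print(data.frame(n_var = n_var, mean_cv_mse = mean_err))
print(best_n)
```

Averaging over replicates smooths out the fold-to-fold variation that a single `rfcv` run would show.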

I then fit a second randomForest model on these 23 features to recompute the variable importance values, using the default mtry = N/3 = 7 (23/3 rounded down).

The results of this second RF model are very good, but I noticed that lowering the mtry parameter gives a lower MSE, with the best result at mtry=1. That seems like it would amount to using univariate decision trees, and I'm not sure whether this is desirable.
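The comparison described above can be sketched as a sweep over mtry values that records the out-of-bag MSE of each fit. This is only an illustration on synthetic data with 23 predictors. The data, the grid `mtry_grid`, and `ntree = 500` are assumptions chosen for speed, not the values from the actual analysis.

```r
library(randomForest)

set.seed(2)
# Hypothetical stand-in data with 23 predictors, matching the size of the reduced model
n <- 80
p <- 23
x <- matrix(rnorm(n * p), nrow = n, dimnames = list(NULL, paste0("g", seq_len(p))))
y <- x[, 1] + 0.5 * x[, 2] - 0.5 * x[, 3] + rnorm(n, sd = 0.3)

# Package default for regression: max(floor(p / 3), 1), which is 7 when p = 23
default_mtry <- max(floor(p / 3), 1)

# Fit one forest per candidate mtry and keep the OOB MSE after the last tree
mtry_grid <- c(1, 2, 4, default_mtry, 12, p)
oob_mse <- sapply(mtry_grid, function(m) {
  fit <- randomForest(x, y, mtry = m, ntree = 500)
  fit$mse[fit$ntree]
})

print(data.frame(mtry = mtry_grid, oob_mse = oob_mse))
```

Note that with mtry = 1 each split considers a single randomly drawn predictor, but different nodes of the same tree can still draw different predictors, so a tree is not restricted to one variable overall.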

Can someone kindly give me some advice?

Thank you
