I built my first randomForest regression model to predict sensitivity to a particular drug in breast cancer cell lines. There were 35000 features in total, and I used the default mtry for regression (N/3, where N is the number of features) and ntree = 30000+1.
Then I determined the number of predictors that minimized the cross-validated MSE, using the 'rfcv' function in the randomForest package (50 replicates of five-fold cross-validation). This resulted in a selection of 23 predictors.
Next, I fitted a second randomForest model on these 23 features only, in order to recompute the variable importance values, using the default mtry = floor(N/3) = 7.
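A minimal sketch of the three steps above, on simulated data rather than the real cell-line data. The sizes (60 samples, 200 features, 501 trees), the feature names, a single rfcv replicate instead of 50, and the choice to rank predictors by %IncMSE from the full model are all assumptions made for illustration:

```r
library(randomForest)

set.seed(1)
# Simulated stand-in for the real data (assumed sizes, far smaller than 35000 features)
n <- 60; p <- 200
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("feat", seq_len(p))))
y <- x[, 1] + 0.5 * x[, 2] + rnorm(n, sd = 0.5)

# Step 1: full model, regression default mtry = floor(p / 3)
rf_full <- randomForest(x, y, ntree = 501, importance = TRUE)

# Step 2: cross-validated MSE as predictors are dropped (one replicate shown);
# rfcv defaults to sqrt(p) for mtry, so the p/3 rule is passed explicitly
cv <- rfcv(x, y, cv.fold = 5, step = 0.5,
           mtry = function(p) max(1, floor(p / 3)))
n_best <- cv$n.var[which.min(cv$error.cv)]

# Step 3: refit on the top n_best predictors, ranked here by %IncMSE (assumption)
imp <- importance(rf_full, type = 1)
top <- rownames(imp)[order(imp[, 1], decreasing = TRUE)][seq_len(n_best)]
rf_small <- randomForest(x[, top, drop = FALSE], y,
                         ntree = 501, importance = TRUE)
```

Here `cv$error.cv` holds the cross-validated MSE for each number of predictors in `cv$n.var`, and `rf_small$mtry` shows the mtry value the second model actually used.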
The results of this second RF model are very good, but I noticed that lowering the mtry parameter gives a lower MSE, with the best result at mtry=1. With mtry=1, however, each split considers only one randomly selected predictor, which seems to me like using univariate decision trees, and I'm not sure whether this is desirable.
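The comparison across mtry values can be sketched as follows, again on simulated data with 23 features. The mtry grid, the sample size, and the use of the out-of-bag MSE as the error measure are assumptions for illustration, not my actual settings:

```r
library(randomForest)

set.seed(2)
# Simulated stand-in for the 23 selected features (assumed sizes)
n <- 60; p <- 23
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 0.5 * x[, 2] + rnorm(n, sd = 0.5)

# Fit one forest per mtry value and record its out-of-bag MSE
mtry_grid <- c(1, 2, 4, 7, 12, 23)
oob_mse <- sapply(mtry_grid, function(m) {
  fit <- randomForest(x, y, mtry = m, ntree = 501)
  tail(fit$mse, 1)  # OOB MSE after the last tree
})

data.frame(mtry = mtry_grid, oob_mse = oob_mse)
```

In this grid, mtry = 7 corresponds to the regression default for 23 features and mtry = 23 to bagging, where every predictor is a candidate at each split.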
Can someone kindly give me some advice?
Thank you