I want to compare the classification performance of random forest with a variable selection algorithm (method A) and random forest alone (method B). The original dataset has 100 variables. After variable selection, method A used only 27 variables and achieved 89% overall accuracy, while method B achieved 87.2% overall accuracy using all the original variables.

I used McNemar's chi-square test and obtained z = 0.7, so there is no significant difference between methods A and B. Does this mean the statement "the variable selection algorithm can improve classification accuracy with fewer variables" is incorrect?
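
For reference, here is a minimal sketch of how a McNemar z statistic and its two-sided p-value can be computed from the discordant counts of two classifiers evaluated on the same test samples. The function names `mcnemar_z` and `two_sided_p` and the counts `b` and `c` are hypothetical, chosen for illustration; only z = 0.7 comes from the result above. The sketch assumes the uncorrected form z = (b - c) / sqrt(b + c), which may differ from the variant actually used (for example, one with a continuity correction).

```python
import math


def mcnemar_z(b, c):
    """Uncorrected McNemar z statistic.

    b: samples classified correctly by method A but wrongly by method B
    c: samples classified wrongly by method A but correctly by method B
    (both counts are hypothetical inputs, not taken from the question)
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    return (b - c) / math.sqrt(b + c)


def two_sided_p(z):
    """Two-sided p-value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# z = 0.7 is the value reported in the question.
p_value = two_sided_p(0.7)
print(f"two-sided p-value for z = 0.7: {p_value:.3f}")

# Illustrative (made-up) discordant counts, only to show the call.
print(f"z for b=30, c=20: {mcnemar_z(30, 20):.3f}")
```

Under these assumptions, z = 0.7 corresponds to a two-sided p-value of roughly 0.48, far above the usual 0.05 threshold, which is consistent with reporting no significant difference in accuracy.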
