24 November 2023

I am trying to identify diagnosis-relevant biomarkers using recursive feature elimination (RFE), and I chose random forest and logistic regression as the models to fit. However, the number of selected features and the feature-importance rankings were completely inconsistent between the two models. Specifically, although logistic regression reached the higher best accuracy, it selected roughly 3 times as many features as random forest, and that number was close to the number of subjects. So I am wondering whether logistic regression suffered from over-fitting or collinearity. If it did, should I ignore the logistic regression results even though its accuracy was higher?
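One way to probe the over-fitting and collinearity concern is to run RFE inside an outer cross-validation loop, so that the reported accuracy comes from subjects that played no part in feature selection, and then to look at how correlated the selected features are. The following is only a minimal sketch using scikit-learn. The synthetic data from `make_classification`, the fold counts, `step=2`, and the model settings are all assumptions for illustration, not the setup used in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the biomarker data (assumption): few subjects,
# several redundant (collinear) features.
X, y = make_classification(n_samples=60, n_features=20, n_informative=4,
                           n_redundant=6, random_state=0)

# Inner folds choose the number of features; outer folds estimate accuracy.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

models = {
    # Default L2-penalised logistic regression.
    "logistic": LogisticRegression(C=1.0, max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, estimator in models.items():
    selector = RFECV(estimator, step=2, cv=inner_cv, scoring="accuracy")
    pipe = Pipeline([("scale", StandardScaler()), ("rfe", selector)])

    # Nested CV: feature selection is repeated inside every outer training fold.
    outer_scores = cross_val_score(pipe, X, y, cv=outer_cv, scoring="accuracy")

    # Refit on all data only to inspect which features end up selected.
    pipe.fit(X, y)
    fitted = pipe.named_steps["rfe"]
    selected = np.flatnonzero(fitted.support_)

    # Largest absolute pairwise correlation among the selected features.
    corr = np.atleast_2d(np.corrcoef(X[:, selected], rowvar=False))
    max_abs_corr = float(np.max(np.abs(corr - np.eye(len(corr)))))

    results[name] = {
        "n_selected": int(fitted.n_features_),
        "selected": selected,
        "outer_accuracy": float(outer_scores.mean()),
        "max_abs_corr": max_abs_corr,
    }

for name, r in results.items():
    print(name, r["n_selected"], round(r["outer_accuracy"], 3),
          round(r["max_abs_corr"], 3))
```

If the outer-fold accuracy of logistic regression falls well below the accuracy RFE reported during selection, or the selected set shows very high pairwise correlations, that would be consistent with over-fitting or collinearity. A sketch like this cannot settle the question on its own, though.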
