I'm in the process of building classification models on a substantial dataset (roughly an MxM matrix). To improve model performance, I plan to run feature selection as a preliminary step. A common starting point seems to be variance filtering: dropping variables X whose var(X) is close to zero. Since my dataset contains variables on very different orders of magnitude, I'm unsure whether I should standardize the data, [x - mean(x)] / sd(x), before or after applying this variance-based filter.
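To make the two orderings concrete, here is a minimal sketch of what I'm weighing, using scikit-learn's VarianceThreshold and StandardScaler (the toy data and the 1e-3 threshold are just illustrative assumptions on my part):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy data: columns on very different scales, plus one constant column.
X = np.column_stack([
    rng.normal(0, 1000, 500),   # large-scale feature
    rng.normal(0, 0.01, 500),   # small-scale but potentially informative feature
    np.full(500, 5.0),          # constant feature
])

# Order A: filter on raw variances, then standardize.
# The raw-variance filter is scale-dependent: the small-scale column
# gets dropped even though it may carry signal.
filt_a = VarianceThreshold(threshold=1e-3)
X_a = StandardScaler().fit_transform(filt_a.fit_transform(X))
print("Order A kept columns:", filt_a.get_support())

# Order B: standardize first, then filter.
# Every surviving column now has unit variance, so the threshold
# can only catch (near-)constant columns.
X_std = StandardScaler().fit_transform(X)
filt_b = VarianceThreshold(threshold=1e-3)
X_b = filt_b.fit_transform(X_std)
print("Order B kept columns:", filt_b.get_support())
```

As the sketch suggests, the two orders can keep different feature sets, which is exactly why I'm unsure which is correct.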
For context, I'm aiming to fit a batch of models: Logistic Regression, LDA, QDA, k-NN, Naive Bayes, Decision Trees, Random Forest, XGBoost, BART, among others, roughly along the lines of the loop sketched below.
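For completeness, the batch loop I have in mind looks something like this (the synthetic X and y stand in for my actual data; XGBoost and BART are omitted here since they live outside scikit-learn):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Placeholder data so the loop runs end to end.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "lda": LinearDiscriminantAnalysis(),
    "qda": QuadraticDiscriminantAnalysis(),
    "knn": KNeighborsClassifier(),
    "nb": GaussianNB(),
    "tree": DecisionTreeClassifier(random_state=0),
    "rf": RandomForestClassifier(random_state=0),
}

for name, clf in models.items():
    # Same preprocessing order for every model; the ordering of
    # these first two steps is exactly what my question is about.
    pipe = Pipeline([
        ("filter", VarianceThreshold(threshold=1e-3)),
        ("scale", StandardScaler()),
        ("model", clf),
    ])
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```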
I would greatly appreciate any insights into the optimal sequence for these preprocessing steps.