Hi, I am using a regularized XGBoost model that includes industry and location (city) dummies. Through regularization, the model is excluding a few city dummies and a few industry dummies, effectively treating them as insignificant contributors to the prediction.
Is there any literature arguing that we should not drop some of these dummies based purely on the mathematics of the regularization? One of my labmates says we need a stronger rationale than that.
My personal opinion is that this should be fine: the purpose of feature selection is to keep only the predictors that contribute meaningfully to the prediction, so dropping the rest should not cause the kind of side effects we would worry about in econometric modeling.
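To make the exclusion I describe above concrete, here is a minimal sketch of how I understand a dummy ends up unused. It follows the regularized split-gain formula from the XGBoost paper, assuming squared-error loss (so each row's hessian is 1) and no L1 term (reg_alpha = 0). The function name `split_gain` and all the numbers are hypothetical illustrations, not my actual data, and the library's internal constants may differ from the paper's formula.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Regularized gain of one candidate split (XGBoost paper formula, no L1 term)."""
    def score(g, h):
        # Structure score of a node with gradient sum g and hessian sum h.
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    children = score(g_left, h_left) + score(g_right, h_right)
    return 0.5 * (children - parent) - gamma


# Hypothetical toy node with 100 rows (assumed numbers, squared-error loss).
# Rows inside the dummy's group each have gradient -0.5; the remaining rows
# offset them so the node's total gradient is zero.
rare_gain_no_gamma = split_gain(-1.5, 3, 1.5, 97, reg_lambda=1.0, gamma=0.0)
rare_gain_with_gamma = split_gain(-1.5, 3, 1.5, 97, reg_lambda=1.0, gamma=1.0)
common_gain_with_gamma = split_gain(-20.0, 40, 20.0, 60, reg_lambda=1.0, gamma=1.0)

results = [
    ("rare city dummy (3 rows), gamma=0", rare_gain_no_gamma),
    ("rare city dummy (3 rows), gamma=1", rare_gain_with_gamma),
    ("common dummy (40 rows), gamma=1", common_gain_with_gamma),
]
for label, gain in results:
    # A split is only worth making when its penalized gain is positive.
    verdict = "kept" if gain > 0 else "pruned"
    print(f"{label}: gain={gain:.3f} -> split {verdict}")
```

In this sketch the rare dummy stops being used once gamma is raised, even though its per-row effect equals that of the common dummy. If that reading is right, the exclusion reflects group size and penalty strength rather than a formal significance test.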
Constructive suggestions from domain experts are welcome.
Regards,
Sahil