Hi there, I'm looking to assess the importance of different features within a mixed linear model, and to be able to tell statistically (i.e., to be able to talk about significant differences) which features are more important than others. Thanks.
To assess the importance of different features within a mixed linear model, you can use a combination of statistical methods. The following approaches can help you identify the most influential features and assess their significance:
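P-values from Likelihood Ratio Tests (LRT): By fitting nested models with and without a specific feature (or set of features), you can perform an LRT to assess whether including the feature significantly improves the model's goodness of fit. When the two models differ in their fixed effects, fit both by maximum likelihood rather than REML, because REML likelihoods are not comparable across different fixed-effects structures. A small p-value indicates that the feature significantly improves the fit; it does not by itself measure how important the feature is.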
Coefficient estimates and confidence intervals: Examine the fixed-effect coefficient estimates of your mixed model to determine the impact of each feature on the response variable. Confidence intervals for these coefficients can help you assess their statistical significance. If the 95% confidence interval for a coefficient does not contain zero, the effect is statistically significant at the 5% level.
Information criteria: Comparing information criteria, such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC), can help determine the most appropriate model (and thus, features) based on the trade-off between model complexity and goodness of fit. Lower values of AIC or BIC indicate a better balance between fit and complexity, not merely a simpler model.
Standardized coefficients: In some cases, it might be helpful to standardize the features to compare the magnitude of their effects on a common scale. Standardized coefficients allow you to assess the importance of a feature based on its relative effect size compared to other features in the model.
Permutation importance: This is a model-agnostic method that involves shuffling the values of a single feature and assessing the resulting change in model performance. A greater decrease in performance implies higher importance for the permuted feature. This method is often used for nonlinear models or machine learning techniques, but it can also provide useful insights for linear models.
When interpreting the results and discussing the importance of various features, be sure to consider both statistical significance and practical relevance. Additionally, be mindful of potential multicollinearity among the features, which can affect the interpretation of their individual effects.
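The likelihood ratio test described above can be sketched in a few lines. This is only a minimal illustration, not a full mixed-model workflow: it assumes you have already fitted the nested models by maximum likelihood and read off their log-likelihoods (in statsmodels, for instance, that would be the `llf` attribute of models fitted with `reml=False`). The log-likelihood values and the feature names `x1` and `x2` below are made up for illustration, and `chi2_sf` is a small helper written here so that the example needs only the standard library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        k, sf = 1, math.erfc(math.sqrt(half))
    else:
        k, sf = 2, math.exp(-half)
    # Recurrence: sf_{k+2}(x) = sf_k(x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        sf += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return sf


def likelihood_ratio_test(llf_reduced: float, llf_full: float, df_diff: int):
    """Return (statistic, p-value) for two nested models fitted by maximum likelihood."""
    stat = max(2.0 * (llf_full - llf_reduced), 0.0)
    return stat, chi2_sf(stat, df_diff)


# Hypothetical log-likelihoods (illustrative numbers, not real results).
llf_full = -512.3        # y ~ x1 + x2 + (1 | group)
llf_without_x1 = -520.9  # y ~ x2 + (1 | group)
llf_without_x2 = -513.1  # y ~ x1 + (1 | group)

stat_x1, p_x1 = likelihood_ratio_test(llf_without_x1, llf_full, df_diff=1)
stat_x2, p_x2 = likelihood_ratio_test(llf_without_x2, llf_full, df_diff=1)
print(f"x1: LR = {stat_x1:.2f}, p = {p_x1:.4g}")
print(f"x2: LR = {stat_x2:.2f}, p = {p_x2:.4g}")
```

Each test only says whether dropping that one feature worsens the fit. Comparing the two p-values does not amount to a test of whether `x1` is more important than `x2`.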
Ali Abedi Madiseh When you reprint a quotation from ChatGPT or any AI, you should credit that source. Otherwise, people may assume that you wrote this yourself.
Ali Abedi Madiseh ha ha, smiley, ha ha :) ... So you assume people asking questions here are too dumb to use ChatGPT on their own? Okay, this might be the case occasionally (and some don't even know how to google, it seems), but to hold it as a general assumption is quite impolite, imho.
The problem with AI-generated answers is that they are just text that is supposed to sound good, and there is no intention that these answers are correct.* Correctness may be a side effect, but one should not and cannot rely on it. Sure, this is also the case for other answers here, but that shouldn't be an excuse. The problem is that those who don't know and seek help get answers that sound very literate, smart, and structured but that are subtly wrong, quite often and in many small points. So after all, this is more of a disservice.
And ultimately, I'd say that someone who just posts AI-generated answers disqualifies him/herself as a scientist.
Jochen Wilhelm, some commentators may not even know what "of course" means. See their response to David L Morgan.
@Tomer, you asked about "importance." That will likely depend a lot on details that require more information and knowledge of the substantive area. Do you just mean not being nil, or do you mean something else?
Tomer Oz's question made me wonder if anyone has developed a version of dominance analysis for multilevel models. A little digging took me to the Statalist discussion linked below, and it took me to the two resources listed below that.
Luo, W., and R. Azen. 2013. Determining predictor importance in hierarchical linear models using dominance analysis. Journal of Educational and Behavioral Statistics 38: 3-31. https://doi.org/10.3102/1076998612458319.
Snijders, T. A. B., and R. J. Bosker. 1994. Modeled variance in two-level models. Sociological Methods and Research 22: 342-363. https://doi.org/10.1177/0049124194022003004.
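For readers who have not met dominance analysis, here is a rough sketch of the general-dominance calculation in its generic form: for each predictor, average the gain in a fit measure when that predictor is added to every subset of the other predictors, first within each subset size and then across sizes. It is not the specific procedure from the papers above. The predictor names `a`, `b`, `c` and the fit values are invented for illustration, and for a multilevel model you would have to supply a suitable explained-variance measure for every subset model yourself.

```python
from itertools import combinations


def general_dominance(predictors, fit):
    """General dominance weights.

    predictors: list of predictor names.
    fit: dict mapping a frozenset of predictor names to the fit measure
         (e.g. an R^2-type value) of the model containing exactly those predictors.
    """
    p = len(predictors)
    weights = {}
    for x in predictors:
        others = [q for q in predictors if q != x]
        size_means = []
        for k in range(p):  # subset sizes 0 .. p-1 of the other predictors
            gains = [
                fit[frozenset(s) | {x}] - fit[frozenset(s)]
                for s in combinations(others, k)
            ]
            size_means.append(sum(gains) / len(gains))
        # Average the per-size mean gains across all subset sizes.
        weights[x] = sum(size_means) / p
    return weights


# Hypothetical fit values for all 2^3 subset models (made-up numbers).
fit_table = {
    frozenset(): 0.00,
    frozenset({"a"}): 0.20,
    frozenset({"b"}): 0.15,
    frozenset({"c"}): 0.05,
    frozenset({"a", "b"}): 0.30,
    frozenset({"a", "c"}): 0.23,
    frozenset({"b", "c"}): 0.18,
    frozenset({"a", "b", "c"}): 0.32,
}

weights = general_dominance(["a", "b", "c"], fit_table)
for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {w:.4f}")
```

The weights add up to the fit of the full model minus that of the empty model, so they split the explained variance among the predictors. On their own they give a descriptive ranking, not a significance test of the difference between two predictors.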