1. Why do the residuals need to be normal when conducting multilevel modeling?
2. I conducted a square root transformation on my dependent variable to normalize the residuals. However, the residuals were still not normal. Is that a problem?
Hi Alex, one of the big problems with non-normality in the residuals and heteroscedasticity is that the amount of error in your model is not consistent across the full range of your observed data. When you think about your predictor variables, this means that the amount of predictive ability they have (i.e., as calculated in their beta weights) is not the same across the full range of the dependent variable. Thus, your predictors technically mean different things at different levels of the dependent variable. Not so good for interpretation.
Transforming the dependent variable can help to correct for this - but at the same time makes the interpretation of the overall model a little bit more opaque. You have to make the trade-off on what you are comfortable with here.
If the square-root transformation did not fully normalize your residuals, you can also try an inverse transformation. For positive skew, the strength of the common transformations increases roughly from 1. Square Root, 2. Logarithmic, 3. Inverse (1/x). See if that helps; a quick way to compare the candidates is sketched below.
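For illustration, here is a minimal Python sketch (using numpy and scipy, with simulated data; nothing here comes from the original answer) that compares the candidate transformations by skewness and a Shapiro-Wilk test:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
y = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # hypothetical right-skewed outcome

# ladder of transformations, roughly from mildest to strongest
candidates = {
    "raw": y,
    "sqrt": np.sqrt(y),
    "log": np.log(y),
    "inverse": 1.0 / y,
}
for name, z in candidates.items():
    w, p = stats.shapiro(z)
    print(f"{name:8s} skew={stats.skew(z):+.2f}  shapiro_p={p:.3f}")

In practice you would refit the model after each transformation and inspect the residuals themselves, not just the raw outcome.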
When the residuals are not normally distributed, the hypothesis that they are pure random noise is, in effect, rejected.
This means that your (regression) model does not explain all the trends in the dataset. I guess you don't want unknown trends to remain in your data. I would feel uncomfortable with that, because it would mean your model is not fully explaining the behaviour of your system.
The only solution is to find a model that fully explains the behaviour of your system. That means you have to find a model whose residuals are... yes indeed, normally distributed.
From what I understand, normally distributed residuals are required because you are estimating the parameters of your model via maximum-likelihood estimation. To obtain these estimates, you have to make assumptions about the distribution of your residuals, and this assumption is (in linear multilevel modeling) that the residuals are normally distributed. The logic behind this is the same as in (single-level) regression analysis.
But just as you can make other assumptions in a single-level analysis (take, for example, logistic regression), you can also change the distributional assumption on the residuals in multilevel modeling. You "only" have to define an alternative distribution for the residuals via generalized mixed models (for an applied example of these methods, see the attached reference).
As to your question whether this is a problem: strictly speaking yes, because you violate a basic assumption of the model and your parameters might be biased. From an applied perspective, it very likely depends on the degree of the violation.
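As a concrete illustration of the generalized-mixed-model idea, here is a minimal sketch in Python using statsmodels' Bayesian mixed GLM (the answer itself names no software; the data and variable names are hypothetical):

import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import PoissonBayesMixedGLM

# hypothetical count outcome with a group-level random intercept
rng = np.random.default_rng(1)
J, n = 30, 20
df = pd.DataFrame({
    "group": np.repeat(np.arange(J), n),
    "x": rng.normal(size=J * n),
})
u = rng.normal(scale=0.5, size=J)  # true random intercepts
df["y"] = rng.poisson(np.exp(0.2 + 0.5 * df["x"] + u[df["group"].to_numpy()]))

# Poisson multilevel model: the residual behaviour follows from the
# Poisson likelihood rather than from a normality assumption
model = PoissonBayesMixedGLM.from_formula("y ~ x", {"group": "0 + C(group)"}, df)
result = model.fit_vb()  # variational Bayes fit
print(result.summary())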
Best,
Andreas
Attached article: Reactivity to Stressor Pile-Up in Adulthood: Effects on Dail...
I don't believe it is generally true that the residuals need to be normal (it certainly doesn't follow from maximum likelihood estimation). If you have a multilevel generalized linear model, then it depends on how you have set it up. The most common setup is a normal response with random effects that are also assumed to be normal. For other models the response might be assumed binomial or Poisson, but typically the random effects would still be modeled with a normal distribution. However, other models are possible - it just isn't very easy without a bit of extra work (e.g., switching to Bayesian software that has more flexibility in, say, using t distributions).
I'm not aware of work suggesting that the normality assumption is particularly important for multilevel models (as opposed to other regression models). You certainly want to avoid marked skew or kurtosis and consider transformations.
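To make the "extra work" concrete, here is a minimal sketch, assuming PyMC (the answer names no specific Bayesian software), of a random-intercept model with t-distributed group effects; the data are simulated:

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
J, n = 20, 10
group = np.repeat(np.arange(J), n)
# simulated outcome with heavy-tailed group intercepts
y = 1.0 + rng.standard_t(df=3, size=J)[group] + rng.normal(size=J * n)

with pm.Model():
    b0 = pm.Normal("b0", mu=0.0, sigma=5.0)
    sigma_u = pm.HalfNormal("sigma_u", sigma=1.0)
    # t-distributed random intercepts in place of the usual Normal
    u = pm.StudentT("u", nu=4, mu=0.0, sigma=sigma_u, shape=J)
    sigma_e = pm.HalfNormal("sigma_e", sigma=1.0)
    pm.Normal("y_obs", mu=b0 + u[group], sigma=sigma_e, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)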
1. As others have stated, it is quite common to model non-Normal distributions at level 1 using a discrete-outcome model such as Probit, Logit, Poisson, or the negative binomial (NBD) model.
2. So I presume you are talking about higher-level residuals, which are often assumed to be Normal so that they can be summarized in a variance term. If they are not Normal, this estimate could be poor.
3. As usual, this is an assumption of conditional Normality: the level-2 residuals are assumed Normal after taking account of what is in the fixed part. A well-specified fixed part often works wonders.
4. (Consequently) if there is a notable outlier (or indeed a set of outliers), it is possible to include fixed-part dummies and thereby assume (and often achieve) that the rest of the higher-level residuals follow a Normal distribution - see this manual on Rgate for an extended example, and the sketch after this list.
5. There is some literature suggesting that (unless there are marked outliers) you do not need to get worked up about the Normality assumption, e.g. McCulloch, C. E., & Neuhaus, J. M. (2011). Misspecifying the Shape of a Random Effects Distribution: Why Getting It Wrong May Not Matter. Statistical Science, 26(3), 388–402.
6. There is software - e.g. WinBUGS - that allows different distributions at the higher level, e.g. a t distribution with fattened tails.
7. Finally, it is possible to fit non-parametric distributions at the higher level: e.g. GLLAMM (in Stata) makes it possible to put in mass points; see these two papers for the use of this idea to get at latent trajectories in a growth model - that is, discrete random effects rather than a continuous distribution.
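Here is a minimal sketch of the dummy-variable idea from point 4, assuming Python and statsmodels (the answer's extended example lives in the linked manual, not here); the data and the outlying group are simulated:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
J, n = 25, 12
df = pd.DataFrame({
    "school": np.repeat(np.arange(J), n),
    "x": rng.normal(size=J * n),
})
u = rng.normal(scale=1.0, size=J)
u[0] = 6.0  # one aberrant group: an outlying level-2 residual
df["y"] = 1.0 + 0.5 * df["x"] + u[df["school"].to_numpy()] + rng.normal(size=J * n)

# a fixed-part dummy absorbs the outlying group, so the remaining
# random intercepts can plausibly be treated as Normal
df["out0"] = (df["school"] == 0).astype(float)
m = smf.mixedlm("y ~ x + out0", df, groups=df["school"]).fit()
print(m.summary())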
You need to quantify the degree of non-normality in the set of residuals and then diagnose the problem. Non-normality is not all that much of an issue for the modeling itself if you have a large data set, since your estimates will be unbiased. The problem usually arises in the inferences you make from the estimated model: since most of the tests assume asymptotic normality, you run the risk of making errors at the inference stage.
If you have a fairly high degree of non-normality, I would assume that there is some flaw in the underlying specification of your model, or that you have misspecified or even omitted some factor(s) pertinent to the relationship you are trying to model. In addition, if you wish to optimize using your estimates, you may also run into issues if your underlying residuals are non-normal. In that case, I would model the relationship using estimators designed for data that is explicitly non-normal, such as the Weibull distribution, just as a suggestion (from the family of non-normal distributions).
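As a minimal sketch of quantifying the degree of non-normality (assuming Python/scipy; the residuals here are simulated stand-ins):

import numpy as np
from scipy import stats

resid = np.random.default_rng(3).standard_t(df=4, size=400)  # stand-in residuals

print("skew           :", stats.skew(resid))
print("excess kurtosis:", stats.kurtosis(resid))  # 0 for a Normal distribution
print("D'Agostino-Pearson:", stats.normaltest(resid))
print("Jarque-Bera       :", stats.jarque_bera(resid))

Note that with large samples these tests reject for trivial departures, so the skew and kurtosis magnitudes are usually more informative than the p-values.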