Dependent variable. When your sample size is 30+, violation of normality is not a problem (Tabachnick and Fidell, 2007, chap. 4). You can tick the box to overlay a normal curve when you draw a histogram; you obtain the histogram from the Graphs menu in SPSS. Furthermore, in Explore you can use the Kolmogorov-Smirnov Test of Normality, where a non-significant value indicates no evidence against normality.
If you want to see whether the assumption of normally distributed errors is too inappropriate, you should check the residuals. Testing the hypothesis that the residuals are a sample from a normally distributed population is usually not very sensible. It is better to look at some diagnostic residual plots, see if you find strange patterns, and judge whether these patterns are "strong" enough to possibly render the results of your model useless.
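For anyone who wants a concrete starting point, here is a minimal sketch of such diagnostic plots in Python with statsmodels (the data frame and column names are made up for illustration; equivalent plots exist in SPSS, SAS, R, etc.):

```python
# Minimal residual-diagnostics sketch; df, "x" and "y" are hypothetical.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
df = pd.DataFrame({"x": rng.uniform(0, 10, 100)})
df["y"] = 2.0 + 0.5 * df["x"] + rng.normal(0, 1, 100)  # simulated data

model = sm.OLS(df["y"], sm.add_constant(df["x"])).fit()
resid = model.resid

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(model.fittedvalues, resid)          # look for funnels or curvature
ax1.axhline(0, color="grey")
ax1.set(xlabel="fitted values", ylabel="residuals", title="Residuals vs fitted")
sm.qqplot(resid, line="s", ax=ax2)              # points near the line = roughly normal
ax2.set_title("Normal Q-Q plot of residuals")
plt.tight_layout()
plt.show()
```

A funnel shape in the left panel suggests non-constant variance, systematic curvature suggests a misspecified mean, and strong deviation from the line in the Q-Q plot suggests non-normal errors.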
It is hard to get normally distributed residuals if the variables are not normally distributed.
The problem with biological data is that it is seldom normally distributed. People often invoke the Central Limit Theorem, but that only applies if the underlying distributions are equally weighted.
More generally, a failure to find a significant departure from normality can easily be overturned by increasing the sample size.
So I will check for normality, and I will use a transformation that improves the fit as best I can. However, I do not obsess over this issue. Standard parametric tests are fairly robust to departures from normality. Also, a failure to reject the null hypothesis (that the variable is normally distributed) is not sufficient to prove that the variable is normally distributed. It is very important to keep the assumptions of the model in mind. How you handle "success" or "failure" is more problematic.
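If it helps, one common way to pick such a transformation is a Box-Cox fit; here is a sketch in Python with SciPy (the skewed variable y is purely illustrative, and Box-Cox requires strictly positive data):

```python
# Hedged sketch: choosing a normalizing transformation via Box-Cox.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.8, size=200)  # right-skewed "biological" data

y_bc, lam = stats.boxcox(y)                       # lambda chosen by maximum likelihood
print(f"Box-Cox lambda: {lam:.2f}")
print("Shapiro-Wilk p before:", stats.shapiro(y).pvalue)
print("Shapiro-Wilk p after: ", stats.shapiro(y_bc).pvalue)
```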
The assumptions of the model state that the errors are normal. This theoretical assumption is checked by inspecting the residuals. If the errors are normal, then Y, conditional on the predictors, is normal too; it is the only random variable in the model other than the errors. It is a widespread error to check the marginal distribution of Y for normality: Y can look non-normal even when a check of the residuals indicates normality.
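This point is easy to demonstrate with a toy simulation: pooled Y from two groups with very different means is clearly bimodal and fails a normality test, while the residuals around the group means are perfectly well behaved (a Python sketch with made-up numbers):

```python
# Pooled Y looks bimodal/non-normal, yet the residuals of a model
# that accounts for the group effect are normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(0, 1, 100)     # group 1: mean 0
g2 = rng.normal(8, 1, 100)     # group 2: mean 8, far away -> pooled Y is bimodal
y = np.concatenate([g1, g2])

resid = np.concatenate([g1 - g1.mean(), g2 - g2.mean()])  # group-means model residuals

print("Shapiro-Wilk on raw Y:     p =", stats.shapiro(y).pvalue)      # tiny: "non-normal"
print("Shapiro-Wilk on residuals: p =", stats.shapiro(resid).pvalue)  # large: looks fine
```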
The Asymptotic Relative Efficiency of rank-based tests often exceeds 85% against their parametric counterparts, which suggests considering rank-based alternatives if the sample size is too small to allow a meaningful (powerful) test for normality. I often start out by converting all continuous variables to normal quantiles (see van der Waerden, Blom, ...).
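For readers unfamiliar with normal scores, this is all the conversion amounts to; a Python sketch, where the exponential variable x is just an illustrative skewed input and the two formulas are the standard van der Waerden and Blom definitions:

```python
# Rank-to-normal-scores conversion (van der Waerden and Blom).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=50)               # skewed raw variable

r = stats.rankdata(x)                                 # ranks 1..n (ties averaged)
n = len(x)
vdw_scores  = stats.norm.ppf(r / (n + 1))             # van der Waerden scores
blom_scores = stats.norm.ppf((r - 3/8) / (n + 1/4))   # Blom scores

print("skewness before:      ", stats.skew(x))
print("skewness after (Blom):", stats.skew(blom_scores))
```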
Thank you for your response. What I have come to understand: if Y is normally distributed, that means the model is fine; if not, I can check the residuals, and if they are normal then Y is OK as well. Am I correct that the original approach is to test Y, and if that is not fine I can use the residuals?
Abdalla, it usually does not make much sense to check the distribution of Y, because this depends on the predictors (groups, experimental factors, ...). If you have distinct groups (as opposed to continuous predictors), you could check the distribution in each group individually. But the amount of data in each group will be small, making it difficult to judge the distribution. The problem grows quickly with an increasing number of predictors, and for a metric predictor you actually have only a single value in each "group" (i.e. for each value of the predictor), so there it is impossible to get an impression of the distribution.
The general solution is to look at the distribution of the residuals, directly and all together. In any case you get a better picture of the distribution, and it works for any kind of model (categorical or continuous predictors, linear and non-linear effects, interactions, ...).
Appreciated, but I was not able to find any literature to confirm this, and I need to cite a reference to run such a model. Could you advise any well-known research regarding this?
I would suggest reading a textbook rather than searching for a research article. Read chapters on ANOVA and regression analysis. Sometimes there is a chapter on the analysis of residuals. Also try looking up residuals in the index. It should be a few hours of time well spent.
I tested my continuous dependent variables only, and three of them were not normally distributed. Then I tested the dependent and independent variables together, and the result became approximately normal (p > .05).
@ Zuwaina: it is not clear what exactly you are doing. Testing the normality of a given variable should not depend on whether or not you are also testing other variables, be they dependent or independent, unless
1) you apply a multiplicity correction like Bonferroni's method;
2) you're testing the overall multidimensional normality of the random vector of all variables;
3) or you're doing something else that is not clear in your message.
Case 2) seems pointless for multiple regression. Cases 1) and 2) are clearly related to a loss of power for a given variable, which could explain the change in p.
Anyway, without more details, it is impossible to give any real explanation of your findings.
@ Béatrice: just to clarify your answer, to avoid any misinterpretation...
The KS test is not appropriate if you have to estimate the mean or standard deviation from the sample, which is most often the case, unless you use suitable corrections such as those in the Lilliefors test. Even then, it is quite often less powerful than the Shapiro-Wilk test or its variants.
And for completeness: testing normality on the raw dependent variable is a bad idea. If the independent variables are indeed important, there is no reason for the dependent variable to be Gaussian (in fact, in general it should not be). The test should be made on the residuals (and in that case a KS test against a Student distribution with the appropriate degrees of freedom may be used).
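To make the KS / Lilliefors / Shapiro-Wilk distinction concrete, here is a small Python sketch on simulated "residuals" (the data are made up; the point is only to show the three calls side by side):

```python
# A plain KS test with parameters estimated from the same sample is
# miscalibrated (too conservative); Lilliefors corrects the KS statistic,
# and Shapiro-Wilk is usually the more powerful choice.
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(3)
resid = rng.normal(0, 2, 80)                   # pretend these are model residuals

ks = stats.kstest(resid, "norm", args=(resid.mean(), resid.std(ddof=1)))
lf_stat, lf_p = lilliefors(resid, dist="norm") # KS with the Lilliefors correction
sw = stats.shapiro(resid)

print(f"naive KS p = {ks.pvalue:.3f}, Lilliefors p = {lf_p:.3f}, "
      f"Shapiro-Wilk p = {sw.pvalue:.3f}")
```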
What to do in SPSS if we have four independent variables and one dependent variable? Additionally, we have sub-constructs for each variable. In that case, which test would you advise, and which variables should be selected if a Shapiro-Wilk test is to be done? I have more than three items in almost every sub-construct.
So far I have not found a conclusive answer about whether to use the dependent or the independent variables to test for normality. Can anyone clarify this issue? @Srijan Lal Shrestha @Triinu Lukas @Kevin D. Moore @Alexander Kowarik @Johan Erikson @Kutlo noni Oratile @Rustam Simanjuntak @Marika Chrapava
A test of normality for an independent variable is pointless, since there is no assumption about the distribution of independent variables. In fact, they can even be non-random.
A test of normality for the dependent variable is useless, because the assumption is not about the distribution of the dependent variable, but about its *conditional* distribution when each independent variable is fixed.
The only interesting check of normality is on the residuals.
What are residuals, and how do I check the normality of the residuals? When I watch YouTube videos about normality tests, they just enter the dependent and independent variables and look at the value of the Shapiro-Wilk test to declare the data normal.
Most software packages test the residuals directly. For example, in fitting a regression model with SAS (proc reg), the automatically generated diagnostic plots include a graph of the residuals, with a standard normal curve fit over the residuals to show how close (or not) a normal fit is.
You should read an introductory text on linear models and regression methods. Residuals are a basic concept in this context, and you should learn about them before interpreting your results...
Basically, residuals are the differences between the values you actually observed (your data) and the values predicted by your model (for instance, by the straight line in a simple linear regression). They represent the part of the data your model does not explain, so they should be only random noise if your model is correct. *All* model sanity checks involve residuals.
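For instance, in numbers (a tiny made-up dataset, fitted in Python just to show the arithmetic):

```python
# Residuals by hand for a tiny illustrative dataset: residual = observed - predicted.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])         # observed data

slope, intercept = np.polyfit(x, y, 1)     # simple least-squares line
predicted = intercept + slope * x
residuals = y - predicted                  # the "unexplained" part

print("predicted:", np.round(predicted, 2))
print("residuals:", np.round(residuals, 2))
```

It is these residuals, not the raw y values, that go into the normality checks discussed above.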