Currently, I'm screening my data. I am aware that I need to run a normality test before proceeding further. My question is: am I supposed to check normality at both the univariate and the multivariate level?
Michael, you can use the one-sample Kolmogorov–Smirnov or the Shapiro–Wilk test to assess the normality assumption for univariate data. For multivariate normality, tests based on multivariate skewness and kurtosis measures can be used, such as Mardia's test, the Cox–Small test, or Smith and Jain's adaptation.
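Mardia's statistics, for example, are simple enough to compute directly from their skewness and kurtosis definitions. Below is a minimal NumPy/SciPy sketch (the function name mardia_test and the toy data are illustrative assumptions, not a validated implementation):

```python
import numpy as np
from scipy import stats

def mardia_test(X):
    """Mardia's multivariate skewness and kurtosis tests (sketch).

    X: (n, p) data matrix. Returns the two test statistics and
    their approximate p-values under the null of multivariate normality.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                       # center the data
    S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))
    D = Xc @ S_inv @ Xc.T                         # Mahalanobis cross-products

    b1 = (D ** 3).sum() / n**2                    # multivariate skewness
    b2 = (np.diag(D) ** 2).mean()                 # multivariate kurtosis

    # Skewness: n*b1/6 ~ chi-square with p(p+1)(p+2)/6 df under H0
    skew_stat = n * b1 / 6.0
    df = p * (p + 1) * (p + 2) / 6.0
    p_skew = stats.chi2.sf(skew_stat, df)

    # Kurtosis: (b2 - p(p+2)) / sqrt(8p(p+2)/n) ~ N(0, 1) under H0
    kurt_stat = (b2 - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    p_kurt = 2 * stats.norm.sf(abs(kurt_stat))

    return skew_stat, p_skew, kurt_stat, p_kurt

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=200)
print(mardia_test(X))
```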
From the list on the left, move the variable "Data" into the "Dependent List".
Click "Plots" on the right. A new window pops up. Check "None" for boxplots, uncheck everything under "Descriptive", and make sure the box "Normality plots with tests" is checked.
The results now appear in the "Output" window.
For datasets smaller than 2,000 elements we use the Shapiro–Wilk test; otherwise, the Kolmogorov–Smirnov test is used.
If p-value is >0.05, we can reject the alternative hypothesis and conclude that the data comes from a normal distribution.
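Outside SPSS, the same rule can be reproduced in a few lines of Python with scipy.stats (a sketch; the 2,000-element cutoff follows the post above, and the helper name normality_test is my own):

```python
import numpy as np
from scipy import stats

def normality_test(data, cutoff=2000):
    """Shapiro-Wilk for small samples, Kolmogorov-Smirnov otherwise."""
    data = np.asarray(data)
    if data.size < cutoff:
        stat, p = stats.shapiro(data)
        name = "Shapiro-Wilk"
    else:
        # K-S against a normal with estimated parameters; strictly this
        # calls for the Lilliefors correction, so treat p as approximate.
        stat, p = stats.kstest(data, "norm",
                               args=(data.mean(), data.std(ddof=1)))
        name = "Kolmogorov-Smirnov"
    return name, stat, p

rng = np.random.default_rng(1)
print(normality_test(rng.normal(size=150)))    # small sample -> Shapiro-Wilk
print(normality_test(rng.normal(size=5000)))   # large sample -> K-S
```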
This will vary according to the method you use. If your analysis requires multivariate normality (as SEM does), you should examine both univariate and multivariate normality.
You can test the distribution of your data for normality using the Shapiro–Wilk test in SPSS, which is widely used for this purpose. You can also assess normality by plotting your data, or by using the skewness and kurtosis measures from the descriptive statistics.
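For the plotting and skewness/kurtosis route, a short Python sketch (the toy data are illustrative; scipy.stats.probplot draws the normal Q-Q plot):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(loc=10, scale=2, size=300)     # toy data for illustration

# Descriptive measures: both should be near 0 for a normal sample
print("skewness:", stats.skew(data))
print("excess kurtosis:", stats.kurtosis(data))  # Fisher definition: normal -> 0

# Graphical checks: histogram and normal Q-Q plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=30)
ax1.set_title("Histogram")
stats.probplot(data, dist="norm", plot=ax2)      # points near the line -> normal
plt.show()
```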
I beg to differ, but generally you do not need to check the normality of your data. Instead, you need to check the normality of the residuals. Some methods, like linear discriminant analysis, might require or perform better under multivariate normality of the data, though.
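A small sketch of the distinction, assuming a simple linear model with synthetic data: the predictor (and hence the response) is clearly non-normal, yet the residuals, which carry the actual assumption, typically pass the test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=500)       # predictor is clearly non-normal
y = 1.5 * x + rng.normal(scale=1.0, size=500)  # errors are normal

# The raw response inherits the predictor's skew and fails the test...
print("y:        ", stats.shapiro(y))

# ...but the residuals from the fitted model are what the assumption is about
slope, intercept, *_ = stats.linregress(x, y)
residuals = y - (intercept + slope * x)
print("residuals:", stats.shapiro(residuals))
```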
Be cautious of using significance tests of normality. In small samples, where nonnormality is more likely, you may be underpowered to detect it (Type II error). In large samples, where data are more likely to be normal, you may be overpowered (Type I error).
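A quick simulation makes the power problem concrete (a sketch; the t-distribution with 5 degrees of freedom is my choice of a mildly non-normal alternative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def rejection_rate(n, n_sim=500, alpha=0.05):
    """Fraction of simulated t(5) samples that Shapiro-Wilk flags as non-normal."""
    rejections = 0
    for _ in range(n_sim):
        sample = rng.standard_t(df=5, size=n)   # heavy-tailed, hence not normal
        if stats.shapiro(sample).pvalue < alpha:
            rejections += 1
    return rejections / n_sim

print("n =   20:", rejection_rate(20))    # low power: usually misses nonnormality
print("n = 2000:", rejection_rate(2000))  # high power: almost always rejects
```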
Mohamed stated, "If p-value is >0.05, we can reject the alternative hypothesis and conclude that the data comes from a normal distribution." This is not the appropriate phrasing. A better choice would be: if the p-value is > 0.05, then we cannot conclude anything. Being unable either to reject the null hypothesis or to conclude that it is true, we simply behave as if the null hypothesis were true.
A failure to reject the null hypothesis is not equivalent to proving that the null hypothesis is true. There are no exceptions to this rule.
Be careful what you are testing. Do all the variables in your statistical model have to be normally distributed, or just the residuals? See Mehmet's response.
David's answer seems to say that as sample size increases there is a greater probability of rejecting the null hypothesis when in fact the null hypothesis is true. This is false. What is usually argued is that with increasing sample size you are more likely to find a statistically significant effect that is so small that it is of little or no practical value. This is true. The problem is then to quantify what is of little or no practical value.
Also be aware of issues like p-hacking. If you have a response and 80 potential explanatory variables, then it is very likely that you will find at least one that is statistically significant at the 0.05 level or lower. If every null hypothesis is true, the number of significant results follows a Binomial(80, 0.05) distribution, which peaks at 4: there is a 20% chance of finding exactly 4 significant outcomes by chance alone. If the 0.05 level is a sacred number, then we could argue that you need at least 8 significant variables, because there is about a 10% chance that 7 or more variables come out significant, while the chance of 8 or more is only about 4.7%. So, at my discretion, I will remove seven of your "significant" findings. Does your manuscript still carry any weight?
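Those tail probabilities can be checked directly from the Binomial(80, 0.05) distribution, e.g. with scipy.stats:

```python
from scipy import stats

m, alpha = 80, 0.05                 # 80 tests, each at the 0.05 level
null_hits = stats.binom(m, alpha)   # count of "significant" results under H0

print("P(at least one):", 1 - null_hits.pmf(0))   # ~0.98
print("P(exactly 4):   ", null_hits.pmf(4))       # ~0.20, the modal count
print("P(7 or more):   ", null_hits.sf(6))        # ~0.105
print("P(8 or more):   ", null_hits.sf(7))        # ~0.047
```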
It is very likely that you will find more than one. So, in your manuscript, what happens if a reviewer crosses out two or three variables at their discretion? Does the manuscript still work?
Histograms, normality plots with tests (Shapiro–Wilk and Kolmogorov–Smirnov), skewness, kurtosis, ... are all used to check the normality assumption, which is usually required for parametric analysis.
Actually, there are several methods for checking the normality of data. Graphical methods can certainly be used, in addition to formal tests including: D'Agostino's K-squared test, the Jarque–Bera test, the Anderson–Darling test, the Cramér–von Mises criterion, the Lilliefors test, the Kolmogorov–Smirnov test, the Shapiro–Wilk test, and Pearson's chi-squared test.
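Most of these are available in scipy.stats, so a whole battery can be run in a few lines (a sketch on toy data; note that normaltest implements D'Agostino's K-squared, and anderson returns critical values rather than a p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.normal(size=500)   # toy sample for illustration

print("D'Agostino K^2:  ", stats.normaltest(data))
print("Jarque-Bera:     ", stats.jarque_bera(data))
print("Cramer-von Mises:", stats.cramervonmises(data, "norm"))
print("Shapiro-Wilk:    ", stats.shapiro(data))
# Anderson-Darling reports a statistic plus critical values, not a p-value
print("Anderson-Darling:", stats.anderson(data, dist="norm"))
```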
I totally agree with Dr. Rowe. I suggest you check the normality of the distribution with at least two methods (e.g., the Shapiro–Wilk test together with histograms, Q–Q plots, and/or coefficients of skewness and kurtosis). If you get the same result from at least two methods, that is an indication that your variables are normally distributed. Best regards!