In some cases, I encountered different results from different normality tests. For instance, in one of my samples, the Shapiro-Wilk normality test indicated that my data depart significantly from a normal distribution (small p-value).
Someone once said that testing for normality prior to applying a T-test is like sending a life boat in a hurricane to help out a cruise liner.
Point being: if sample sizes are large enough for tests of normality to work accurately, then the central limit theorem is already ensuring that the T-test will work properly.
If there are doubts about normality, then by all means use a nonparametric test, but don't bother testing for normality; little or nothing is gained.
But if you insist, here is a pretty definitive text on this topic:
Dear Turkay, as I have worked with such data before: working with Kolmogorov-Smirnov is nowadays outdated. The best test for large data (n > 10) is D'Agostino-Pearson, as David J. Sheskin says in his book. But for small datasets (5 …
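For reference, here is one way to run that test in R; the package and function name are from memory, so treat them as an assumption and double-check against the documentation:

    # install.packages("fBasics")
    library(fBasics)

    set.seed(1)
    dagoTest(rnorm(100))   # D'Agostino-Pearson omnibus normality test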
I prefer the ideas behind Snow's penultimate normality test, which is implemented in the R package TeachingDemos (but can easily be ported to a statistical package of your taste).
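If you want to try it, a quick sketch in R (the function name is from memory, so check it against the package help):

    # install.packages("TeachingDemos")   # if not installed yet
    library(TeachingDemos)

    set.seed(42)
    x <- rnorm(30)

    # Snow's penultimate normality test; its (deliberately provocative)
    # point is that "is the sample exactly normal?" is the wrong question.
    # See the help page for the reasoning behind it.
    SnowsPenultimateNormalityTest(x)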
In general, goodness-of-fit tests are used to test whether a sample follows a hypothesized distribution, i.e. to test:
H0: the data represent a sample from the theoretical distribution
The chi-square goodness-of-fit test is the most popular of all goodness-of-fit tests; it requires large samples, say at least 50 and preferably at least 100 data points. But the Kolmogorov-Smirnov test can be used as a goodness-of-fit test with small sample sizes.
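As a rough R sketch of both (the bin counts, sample sizes, and distribution parameters are illustrative choices only):

    set.seed(1)
    x <- rnorm(100, mean = 10, sd = 2)

    # Chi-square goodness of fit: bin the data, then compare observed
    # counts with the counts expected under the fitted normal distribution.
    breaks   <- quantile(x, probs = seq(0, 1, by = 0.1))   # 10 equal-count bins
    observed <- table(cut(x, breaks, include.lowest = TRUE))
    p_exp    <- diff(pnorm(breaks, mean = mean(x), sd = sd(x)))
    p_exp    <- p_exp / sum(p_exp)                         # make them sum to 1
    chisq.test(observed, p = p_exp)

    # Kolmogorov-Smirnov for a small sample. Note the textbook KS test
    # assumes the parameters are known in advance; estimating them from
    # the same data inflates the p-value (Lilliefors' correction exists
    # for exactly that situation).
    y <- rnorm(20, mean = 10, sd = 2)
    ks.test(y, "pnorm", 10, 2)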
Dear Huda, unfortunately I have forgotten the reference that would confirm my statement. But in my experience, the Kolmogorov-Smirnov test often gives results far from those of other tests.
I completely agree with John Kern. The normality assumption is not a dichotomous question, normality versus non-normality. The real question is what degree of violation of normality can disrupt the good functioning of a statistical model that requires it (t-test, ANOVA, regression model, etc.). It is very well known that as sample size increases, and with it power, tests of normality are almost always statistically significant. I recommend using exploratory data analysis to assess possible normality violations. Of course, small samples can only be managed with non-parametric models or bootstrapping methods, but in general the problems with small samples are intragroup variability and the lack of external validity.
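To illustrate the point in R (the data are simulated, so treat the numbers as an example only): at a few thousand observations a normality test will flag even a harmless departure, while the exploratory plots look perfectly reasonable.

    set.seed(1)
    x <- rt(3000, df = 10)   # slightly heavier tails than normal; harmless for a t-test

    shapiro.test(x)          # usually "significant" at this sample size
    hist(x, breaks = 40)     # ...yet the histogram looks essentially normal
    qqnorm(x); qqline(x)     # ...and the Q-Q plot shows only mild tail deviation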
I have to ask why you want to test normality. A test for normality will usually reject normality well before the departure would cause problems for a t-test or ANOVA (or general linear model). My point of view has always been that 'tests' are about addressing research questions, not assumptions.
Just to give you an example: in randomized controlled trials it is expected that the researcher will prospectively power their study, that is, select a sample size that allows statistical significance to coincide with a (minimal) clinically important effect. Nowhere in those calculations is any assumption test powered. So the important question is:
Does a 'statistically significant' departure from normality (which may be over- or under-powered) coincide with a departure that would cause problems for your statistical tests?
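One way to answer that question is a quick simulation: take data from a clearly skewed distribution, confirm that a normality test rejects, and then estimate the actual type I error rate of the t-test. A sketch in R (the exponential distribution and n = 200 are arbitrary illustrative choices):

    set.seed(123)
    n <- 200

    # A normality test on one skewed sample of this size rejects essentially always...
    x <- rexp(n) - 1                      # skewed, but with true mean 0
    shapiro.test(x)$p.value

    # ...yet the one-sample t-test of H0: mu = 0 stays close to its
    # nominal 5% type I error rate, courtesy of the central limit theorem.
    pvals <- replicate(5000, t.test(rexp(n) - 1)$p.value)
    mean(pvals < 0.05)                    # should come out near 0.05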
Normality can be tested in different ways. Non-parametric tests can be used for pilot studies and at the starting point of research. For example, to verify the normal distribution of variables and sub-variables, the researcher(s) can carry out the Kolmogorov-Smirnov (K-S) Z test on all dependent and independent variables and sub-variables. If the significance level is more than 5 percent, normality is assumed.
Later on, when using regression, you can use a histogram of the standardized residuals (ZRESID) and a normal probability plot. Before doing regressions, check the main assumptions of regression, which are the following five tests, plus a correlation test.
The coefficient of determination (R2) indicates the goodness of fit of the model: the higher the R2, the better the independent variable(s) explain(s) the variation in the dependent variable. The t-value indicates the significance of the relationships found. The main assumptions of regression are listed below (Norusis, 1993; Berenson et al., 2006; SPSS 16.0, 2007), with an R sketch of the checks after the list:
1- Linearity: states that the relation between the variables is linear. Linearity is tested by plotting the studentized residuals against the predicted values; when there is no relation between the predicted and residual values, the model does not violate this assumption.
2- Independence of errors: states that the errors are independent of one another. The Durbin-Watson test is used to test independence of errors; if the Durbin-Watson statistic D is close to 2, the model does not violate this assumption.
3- Normality: requires that the errors be normally distributed at each value of X. Normality is tested with the histogram of the residuals; if its shape follows the normal distribution, the model does not violate this assumption.
4- Equal variance (homoscedasticity): requires that the error variance be constant for all values of the independent variables. Equal variance is important for making inferences about β0 and β1. It is tested by plotting the studentized residuals against the predicted values; when there is no relation between the predicted and residual values, the model does not violate this assumption.
5- Multicollinearity: refers to a situation in which two or more independent variables are highly correlated. To test for collinearity, the VIF (Variance Inflation Factor) is used; when the VIF is less than 10, the model does not violate this assumption.
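To make the checklist concrete, here is a rough R version of those five checks; the data are simulated stand-ins, and the Durbin-Watson and VIF functions come from the car package:

    library(car)   # provides durbinWatsonTest() and vif()

    # purely illustrative data
    set.seed(7)
    mydata <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
    mydata$y <- 1 + 2 * mydata$x1 - mydata$x2 + rnorm(100)

    fit <- lm(y ~ x1 + x2, data = mydata)

    # 1 (linearity) and 4 (equal variance): studentized residuals vs predicted values
    plot(fitted(fit), rstudent(fit)); abline(h = 0)

    # 2 (independence of errors): a statistic close to 2 is what you want
    durbinWatsonTest(fit)

    # 3 (normality of errors): histogram and normal probability plot of residuals
    hist(rstudent(fit))
    qqnorm(rstudent(fit)); qqline(rstudent(fit))

    # 5 (multicollinearity): VIF below 10
    vif(fit)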
Finally, wide standard deviations may be used as an indicator for normality, but not as a test to confirm normality.
In the end, standardized skewness and standardized kurtosis tests are preferred in the sciences more than in management research.
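In case it is useful, here is a small hand-rolled R helper for those two statistics (a sketch: it divides the sample skewness and excess kurtosis by their approximate large-sample standard errors, sqrt(6/n) and sqrt(24/n); values outside roughly ±2 point to a departure from normality):

    standardized_moments <- function(x) {
      n <- length(x)
      z <- (x - mean(x)) / sd(x)
      skew <- mean(z^3)                # sample skewness
      kurt <- mean(z^4) - 3            # sample excess kurtosis
      c(z_skewness = skew / sqrt(6 / n),
        z_kurtosis = kurt / sqrt(24 / n))
    }

    standardized_moments(rnorm(200))   # both should be small for normal data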
Selecting a particular statistical test depends on several factors, and every test has criteria that your data must meet. Generally, we use parametric tests for normally distributed data. Non-normal data can often be normalized by techniques such as a log transformation, after which a parametric test can be used.
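A minimal R illustration of the log-transform idea (lognormal data chosen purely as an example of right-skewed data):

    set.seed(99)
    x <- rlnorm(100)        # strongly right-skewed data
    shapiro.test(x)         # normality clearly rejected
    shapiro.test(log(x))    # after the log transform the data are normal,
                            # so a parametric test becomes reasonable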
Before selecting any statistical test, you should know its drawbacks. No statistical test is 100% accurate; each rests on your logic and needs, so selecting a specific test requires a sound statistical approach.
It is a matter of sample size: a small sample size pushes you into the corner of non-parametric tests, but the pros and cons of the different non-parametric methods are equally important.
Statistical methods include diagnostic hypothesis tests for normality, and a rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between –1.0 and +1.0.
If the sample size is larger than 50, use the Kolmogorov-Smirnov test; if it is 50 or less, use the Shapiro-Wilk statistic instead. (Reference: www.utexas.edu/courses/.../AssumptionOfNormality_spring2006)
Another reference reports that for datasets smaller than 2000 elements the Shapiro-Wilk test is used; otherwise, the Kolmogorov-Smirnov test is used.
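If one wanted to encode those rules of thumb, a toy R wrapper might look like the sketch below; the 50-observation cutoff follows the first reference, and note again that feeding ks.test() parameters estimated from the same data biases its p-value upward (a Lilliefors-type correction is usually advised):

    normality_test <- function(x, cutoff = 50) {
      if (length(x) <= cutoff) {
        shapiro.test(x)                        # small samples: Shapiro-Wilk
      } else {
        ks.test(x, "pnorm", mean(x), sd(x))    # large samples: Kolmogorov-Smirnov
      }
    }

    normality_test(rnorm(30))     # dispatches to Shapiro-Wilk
    normality_test(rnorm(500))    # dispatches to Kolmogorov-Smirnov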