Currently, I'm screening my data. I am aware that I need to run a normality test before proceeding further. My question is: am I supposed to check normality at both the univariate and the multivariate level?
Michael, you can use the one-sample Kolmogorov–Smirnov or the Shapiro–Wilk test to assess the normality assumption for univariate data. For multivariate normality, tests based on multivariate skewness and kurtosis measures can be used, such as Mardia's test, the Cox–Small test, or Smith and Jain's adaptation.
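Mardia's statistics, for example, are simple enough to compute directly from their skewness and kurtosis definitions. Below is a minimal NumPy/SciPy sketch (the function name mardia_test and the toy data are illustrative assumptions, not a validated implementation):

```python
import numpy as np
from scipy import stats

def mardia_test(X):
    """Mardia's multivariate skewness and kurtosis tests (sketch).

    X: (n, p) data matrix. Returns the two test statistics and
    their approximate p-values under the null of multivariate normality.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                       # center the data
    S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))
    D = Xc @ S_inv @ Xc.T                         # Mahalanobis cross-products

    b1 = (D ** 3).sum() / n**2                    # multivariate skewness
    b2 = (np.diag(D) ** 2).mean()                 # multivariate kurtosis

    # Skewness: n*b1/6 ~ chi-square with p(p+1)(p+2)/6 df under H0
    skew_stat = n * b1 / 6.0
    df = p * (p + 1) * (p + 2) / 6.0
    p_skew = stats.chi2.sf(skew_stat, df)

    # Kurtosis: (b2 - p(p+2)) / sqrt(8p(p+2)/n) ~ N(0, 1) under H0
    kurt_stat = (b2 - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    p_kurt = 2 * stats.norm.sf(abs(kurt_stat))

    return skew_stat, p_skew, kurt_stat, p_kurt

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=200)
print(mardia_test(X))
```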
From the list on the left, move the variable "Data" into the "Dependent List".
Click "Plots" on the right. A new window pops up. Check "None" for boxplots, uncheck everything under "Descriptive", and make sure the box "Normality plots with tests" is checked.
The results now appear in the "Output" window.
For datasets smaller than 2,000 elements we use the Shapiro–Wilk test; otherwise, the Kolmogorov–Smirnov test is used.
If p-value is >0.05, we can reject the alternative hypothesis and conclude that the data comes from a normal distribution.
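Outside SPSS, the same rule can be reproduced in a few lines of Python with scipy.stats (a sketch; the 2,000-element cutoff follows the post above, and the helper name normality_test is my own):

```python
import numpy as np
from scipy import stats

def normality_test(data, cutoff=2000):
    """Shapiro-Wilk for small samples, Kolmogorov-Smirnov otherwise."""
    data = np.asarray(data)
    if data.size < cutoff:
        stat, p = stats.shapiro(data)
        name = "Shapiro-Wilk"
    else:
        # K-S against a normal with estimated parameters; strictly this
        # calls for the Lilliefors correction, so treat p as approximate.
        stat, p = stats.kstest(data, "norm",
                               args=(data.mean(), data.std(ddof=1)))
        name = "Kolmogorov-Smirnov"
    return name, stat, p

rng = np.random.default_rng(1)
print(normality_test(rng.normal(size=150)))    # small sample -> Shapiro-Wilk
print(normality_test(rng.normal(size=5000)))   # large sample -> K-S
```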
This will vary according to the method you use. If your analysis requires multivariate normality (as SEM does), you should examine both univariate and multivariate normality.
You can test the distribution of your data for normality using the Shapiro–Wilk test in SPSS, which is widely used for this purpose. You can also assess normality by plotting your data, or by using the skewness and kurtosis measures from the descriptive statistics.
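For the plotting and skewness/kurtosis route, a short Python sketch (the toy data are illustrative; scipy.stats.probplot draws the normal Q-Q plot):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(loc=10, scale=2, size=300)     # toy data for illustration

# Descriptive measures: both should be near 0 for a normal sample
print("skewness:", stats.skew(data))
print("excess kurtosis:", stats.kurtosis(data))  # Fisher definition: normal -> 0

# Graphical checks: histogram and normal Q-Q plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=30)
ax1.set_title("Histogram")
stats.probplot(data, dist="norm", plot=ax2)      # points near the line -> normal
plt.show()
```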
I beg to differ, but generally you do not need to check the normality of your data. Instead, you need to check the normality of the residuals. Some methods, like linear discriminant analysis, might require or perform better under multivariate normality of the data, though.
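A small sketch of the distinction, assuming a simple linear model with synthetic data: the predictor (and hence the response) is clearly non-normal, yet the residuals, which carry the actual assumption, typically pass the test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=500)       # predictor is clearly non-normal
y = 1.5 * x + rng.normal(scale=1.0, size=500)  # errors are normal

# The raw response inherits the predictor's skew and fails the test...
print("y:        ", stats.shapiro(y))

# ...but the residuals from the fitted model are what the assumption is about
slope, intercept, *_ = stats.linregress(x, y)
residuals = y - (intercept + slope * x)
print("residuals:", stats.shapiro(residuals))
```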
Be cautious of using significance tests of normality. In small samples, where nonnormality is more likely, you may be underpowered to detect it (Type II error). In large samples, where data are more likely to be normal, you may be overpowered (Type I error).
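A quick simulation makes the power problem concrete (a sketch; the t-distribution with 5 degrees of freedom is my choice of a mildly non-normal alternative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def rejection_rate(n, n_sim=500, alpha=0.05):
    """Fraction of simulated t(5) samples that Shapiro-Wilk flags as non-normal."""
    rejections = 0
    for _ in range(n_sim):
        sample = rng.standard_t(df=5, size=n)   # heavy-tailed, hence not normal
        if stats.shapiro(sample).pvalue < alpha:
            rejections += 1
    return rejections / n_sim

print("n =   20:", rejection_rate(20))    # low power: usually misses nonnormality
print("n = 2000:", rejection_rate(2000))  # high power: almost always rejects
```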
Mohamed stated, "If p-value is >0.05, we can reject the alternative hypothesis and conclude that the data comes from a normal distribution." This is not the appropriate phrasing. A better choice would be: if the p-value is > 0.05, then we cannot conclude anything. Being unable either to reject the null hypothesis or to conclude that it is true, we simply behave as if the null hypothesis were true.
A failure to reject the null hypothesis is not equivalent to proving that the null hypothesis is true. There are no exceptions to this rule.
Be careful what you are testing. Do all the variables in your statistical model have to be normally distributed, or just the residuals? See Mehmet's response.
David's answer seems to say that as sample size increases there is a greater probability of rejecting the null hypothesis when in fact the null hypothesis is true. This is false. What is usually argued is that with increasing sample size you are more likely to find a statistically significant effect that is so small that it is of little or no practical value. This is true. The problem is then to quantify what is of little or no practical value.
Also be aware of issues like p-hacking. If you have a response and 80 potential explanatory variables, then it is very likely that you will find at least one that is statistically significant at the 0.05 level or lower. If every null hypothesis is true, the number of significant results follows a Binomial(80, 0.05) distribution, which peaks at 4: there is a 20% chance of finding exactly 4 significant outcomes by chance alone. If the 0.05 level is a sacred number, then we could argue that you need at least 8 significant variables, because there is about a 10% chance that 7 or more variables come out significant, while the chance of 8 or more is only about 4.7%. So, at my discretion, I will remove seven of your "significant" findings. Does your manuscript still carry any weight?
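Those tail probabilities can be checked directly from the Binomial(80, 0.05) distribution, e.g. with scipy.stats:

```python
from scipy import stats

m, alpha = 80, 0.05                 # 80 tests, each at the 0.05 level
null_hits = stats.binom(m, alpha)   # count of "significant" results under H0

print("P(at least one):", 1 - null_hits.pmf(0))   # ~0.98
print("P(exactly 4):   ", null_hits.pmf(4))       # ~0.20, the modal count
print("P(7 or more):   ", null_hits.sf(6))        # ~0.105
print("P(8 or more):   ", null_hits.sf(7))        # ~0.047
```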
It is very likely that you will find more than one. So, in your manuscript, what happens if a reviewer crosses out two or three variables at their discretion? Does the manuscript still work?
Histograms, normality plots with tests (Shapiro–Wilk and Kolmogorov–Smirnov), skewness, kurtosis, ... are all used to check the normality assumption, which is usually required for parametric analysis.
Actually, there are several methods for checking the normality of data. Graphical methods can certainly be used, in addition to formal tests including: D'Agostino's K-squared test, the Jarque–Bera test, the Anderson–Darling test, the Cramér–von Mises criterion, the Lilliefors test, the Kolmogorov–Smirnov test, the Shapiro–Wilk test, and Pearson's chi-squared test.
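Most of these are available in scipy.stats, so a whole battery can be run in a few lines (a sketch on toy data; note that normaltest implements D'Agostino's K-squared, and anderson returns critical values rather than a p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.normal(size=500)   # toy sample for illustration

print("D'Agostino K^2:  ", stats.normaltest(data))
print("Jarque-Bera:     ", stats.jarque_bera(data))
print("Cramer-von Mises:", stats.cramervonmises(data, "norm"))
print("Shapiro-Wilk:    ", stats.shapiro(data))
# Anderson-Darling reports a statistic plus critical values, not a p-value
print("Anderson-Darling:", stats.anderson(data, dist="norm"))
```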
I totally agree with Dr. Rowe. I suggest you check the normality of the distribution with at least two methods (e.g., the Shapiro–Wilk test together with histograms, Q–Q plots, and/or coefficients of skewness and kurtosis). If you get the same result from at least two methods, that is an indication that your variables are normally distributed. Best regards!