Do you mean a parametric test, or a test that assumes the residuals are normally distributed? I'll assume the latter (since otherwise I don't understand the question), but you should clarify. How big a sample? A lot has been written on how procedures like the t-test have low power when there are outliers. A good review is:
Article: How Many Discoveries Have Been Lost by Ignoring Modern Statistical Methods? (Wilcox, 1998, American Psychologist)
Daniel Wright Thanks, Daniel, for your reply. I meant tests that assume normality, e.g. the t-test or ANOVA. I had a discussion with a colleague and he says a sample larger than 30 is sufficient. I will take a look at the article - thanks!
Jochen Wilhelm Thanks for your comment. So if you have a sample size of 200 and you want to compare the means of two unpaired groups, do you use a t-test or a Mann-Whitney test? Do you base your decision on the distribution, i.e., do you draw a histogram or Q-Q plot?
There is no finite n for which the central limit theorem (CLT) always applies, so the brief answer is that one can't simply assume that inference will be accurate with statistical models that assume a normal distribution for the errors.
If the CLT applies (e.g., for averages of well-behaved distributions such as the normal, binomial, etc.), then the rate of convergence towards a normal sampling distribution depends on the shape of the distribution - being faster for symmetrical distributions and slower for distributions with heavy tails (for instance). So if the shape depends on the value of the parameter being estimated - as it does for, say, the binomial or Poisson - then convergence can be very fast or very slow.
e.g.,
- a binomial proportion with mean = 0.5 requires only a fairly low n to be approximately normal
- a binomial proportion with mean = 0.0005 requires a very high n to be approximately normal (see the sketch below)
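A minimal sketch of this convergence difference (my own illustration, assuming NumPy/SciPy; the sample sizes are arbitrary choices):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims = 100_000

for p, n in [(0.5, 30), (0.0005, 30), (0.0005, 10_000)]:
    # simulate the sampling distribution of the sample proportion
    phat = rng.binomial(n, p, size=n_sims) / n
    print(f"p={p}, n={n}: skewness of the sampling distribution = {stats.skew(phat):.2f}")

For p = 0.5 the sampling distribution is essentially symmetric already at n = 30, while for p = 0.0005 it is still clearly skewed even at n = 10,000.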
The "is 30 right" question has been asked before (https://www.researchgate.net/post/What_is_the_rationale_behind_the_magic_number_30_in_statistics). One of the problems is that outliers increase the standard deviation (because the residual is squared before being summed) more than the mean.
Just to show Thom S Baguley's point, here is a simulation sampling 1000 people from an almost-normal distribution (99.8% normal with sd = 1, 0.2% normal with sd = 100): the observed percentage significant is less than half the nominal value, thus showing that the power is low. A point made by Fisher, Tukey, etc.
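Since the code behind this example is not shown, here is a hedged sketch of the kind of simulation described (the details -- two groups of n = 1000, 5000 replications, alpha = 0.05 -- are my assumptions):

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def contaminated_normal(n):
    # mixture: 99.8% N(0, 1), 0.2% N(0, 100^2), both with mean 0
    sd = np.where(rng.random(n) < 0.998, 1.0, 100.0)
    return rng.normal(0.0, 1.0, n) * sd

n_reps, n = 5000, 1000
rejections = 0
for _ in range(n_reps):
    x, y = contaminated_normal(n), contaminated_normal(n)  # H0 is true: same distribution
    rejections += stats.ttest_ind(x, y).pvalue < 0.05

print(f"empirical rejection rate: {rejections / n_reps:.3f} (nominal 0.05)")

As described above, the rare huge values inflate the pooled SD, so the t-test rejects far less often than the nominal rate and is correspondingly weak against real mean differences.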
Yes, you can, but it depends on the nature of the testing problem. With regard to the central limit theorem, some test statistics (for instance, for a test of the randomness of a data set) converge to a normal distribution, while others converge to a chi-square distribution.
Osaid H. Alser, the point is that the MW-test does not "compare the means" (i.e., test hypotheses about mean differences). It does not even "compare medians", as many say. The only test I know that is about mean differences is the t-test. If you are interested in testing mean differences but the assumptions the t-test is based on really make no sense, you may bootstrap the null distribution of the mean difference. A sample size of 200 seems to be ok to go for a reasonably robust bootstrap approach.
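A minimal sketch of one way to bootstrap the null distribution of the mean difference (my own illustration of the idea, not necessarily the exact procedure meant above):

import numpy as np

def bootstrap_mean_diff_test(x, y, n_boot=10_000, seed=0):
    rng = np.random.default_rng(seed)
    observed = x.mean() - y.mean()
    # impose H0 by shifting both groups onto a common (pooled) mean
    pooled_mean = np.concatenate([x, y]).mean()
    x0, y0 = x - x.mean() + pooled_mean, y - y.mean() + pooled_mean
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x0, size=x0.size, replace=True)
        yb = rng.choice(y0, size=y0.size, replace=True)
        diffs[b] = xb.mean() - yb.mean()
    # two-sided p-value: how often the null differences are at least as extreme as observed
    return observed, np.mean(np.abs(diffs) >= abs(observed))

# e.g., two skewed groups of 100 each (total n = 200)
rng = np.random.default_rng(1)
x, y = rng.exponential(1.0, 100), rng.exponential(1.3, 100)
print(bootstrap_mean_diff_test(x, y))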
Regarding the MW-test, I'd like to cite Ronán Michael Conroy's answer:
>>>
It's worth noting the actual title of Mann and Whitney's paper: On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other (1). That's exactly what it tests. In fact, if you divide U by the product of N1 and N2, this gives you the proportion of cases in which an observation from one sample is higher than an observation from the other sample.
t-tests are, in fact, pretty robust to non-normal variables (there's a big simulation literature on this). The real problem is that people who use the Wilcoxon Mann-Whitney don't understand what hypothesis they have just tested!
1. Mann HB, Whitney DR. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann Math Statist. 1947 Jan 1;18(1):50–60.
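A quick numerical check of the U/(N1*N2) interpretation (my own sketch, using scipy.stats.mannwhitneyu; note that recent SciPy versions return the U statistic for the first sample):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x, y = rng.normal(0.5, 1, 40), rng.normal(0.0, 1, 60)   # continuous data, so no ties

u, _ = stats.mannwhitneyu(x, y, alternative="two-sided")
prop_from_u = u / (len(x) * len(y))

# direct count of pairs in which an x observation exceeds a y observation
prop_direct = np.mean(x[:, None] > y[None, :])

print(prop_from_u, prop_direct)   # identical here; ties would each contribute 1/2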
Also worth clarifying: it is the sampling distribution of the statistic (e.g., the mean) that converges to normal under the CLT. The raw data won't change distribution.
Daniel Wright I don't understand your example. Both "populations" have a mean of 0, so H0 is true under the simulation. I don't understand how you can discuss "power" in this context.
The distribution of the p-values under H0 is not uniform, which gives a more conservative test.
An alternative approach to the bootstrap mentioned briefly by Jochen Wilhelm is to do a permutation test based on the mean difference or on the statistic usually identified with the t-test. You just need the permutations to correspond to whether you have a paired or unpaired design. If you can do enough permutations, the probabilities obtained for the null distributions are exact for any distribution of the data and for any sample size. The interpretation of those probabilities may differ from the usual one, as they refer to different probability spaces, but they still provide a valid significance test.
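A sketch of the unpaired version (my own illustration; with a small number of observations one could enumerate all permutations instead of sampling them):

import numpy as np

def permutation_test_unpaired(x, y, n_perm=10_000, seed=0):
    rng = np.random.default_rng(seed)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    nx = len(x)
    diffs = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(pooled)               # random relabelling of the observations
        diffs[i] = perm[:nx].mean() - perm[nx:].mean()
    # two-sided Monte Carlo approximation to the exact permutation p-value
    return np.mean(np.abs(diffs) >= abs(observed))

rng = np.random.default_rng(2)
x, y = rng.lognormal(0.0, 1.0, 30), rng.lognormal(0.3, 1.0, 30)
print(permutation_test_unpaired(x, y))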
"The good news is that if you have at least 15 samples, the test results are reliable even when the residuals depart substantially from the normal distribution".
" For simple regression, the study assessed both the overall F-test (for both linear and quadratic models) and the F-test specifically for the highest-order term.
For multiple regression, the study assessed the overall F-test for three models that involved five continuous predictors:
a linear model with all five X variables
all linear and square terms
all linear terms and seven of the 2-way interactions ".
I don't understand the experiment/simulation they have done.
5 predictor variables + square terms = 10 coefficients, leaving 5 d.f. And this with serious violations of the assumptions?
And what distributions did they use?
Apart from that, the problem is not that the nominal type I error rate would not be met. If you have just one residual d.f., any (proper) test still keeps the nominal type I error rate -- but the power is miserable! They don't mention this. They only mention that "there is a caveat if you are using regression analysis [of data clearly violating the assumptions] to generate predictions." Now what? Hooray, the type I error rate is kept, but there is no power and no useful prediction. Sounds like a bad deal.
(Kelvin, that's not against you. I am just pointing out that such recommendations as given in the link you posted are stupid, and I know that you know that this is stupid. I am afraid that other readers might not)
Mohamedraed Elshami None of the references you cite actually presents a reason for the magic number, and they look more like cookbooks than research. There's a lot of that about, to be fair – simple answers to complicated questions. The book you show extracts from has some odd notions, such as that the Mann-Whitney test is an alternative to the Wilcoxon. It is exactly the same as the Wilcoxon, which is why it gives identical results. It also doesn't actually tell you the hypotheses you are testing (kind of important if you are to run a test!). Not impressed.
There's a lot of literature on the performance of the t-test (which is a simple OLS regression with a binary predictor) and I've never come across 50 as being a threshold for anything. Can you point me to where this figure is actually calculated or to simulation studies?
And as for EFA, sample size is heavily dependent on the number of variables, so there's nothing magic about 50 here either.
Re that Minitab Blog post, I wonder if the author(s) meant 15 observations per variable in the model? I tried to look at the two white papers mentioned near the end of the post, but both links are broken. I've written to Tech Support at Minitab to see if they can provide links that work. Will post them here if I receive them!
An obvious concern when using parametric tests is whether there are any harmful effects of misspecification. This problem can be avoided by estimating the distribution with the nonparametric maximum likelihood technique, or by using a transformation to bring the non-normally distributed data to normal or close to normal.
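As one concrete illustration of the transformation route (Box-Cox is my choice here, not necessarily what the poster had in mind; it requires positive data):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # strongly right-skewed, positive data

x_bc, lam = stats.boxcox(x)                        # lambda estimated by maximum likelihood
print("skewness before:", stats.skew(x))
print("skewness after: ", stats.skew(x_bc), "(lambda =", lam, ")")

For lognormal data the estimated lambda comes out near zero, i.e., roughly a log transform, and the transformed values are close to normal.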
I just received this from Tech Support at Minitab. Emphasis added.
"I’ll pass that error [i.e., broken links to the white papers] onto the web team. In the meantime, all the white papers including the two on Regression are located here:
And the 15 refers to the number of rows of data, regardless of the number of variables you have. Whenever you run regression, all columns must have the same number of rows, and that number should be at least 15."
I am astonished by that claim, and don't believe it for all the reasons Jochen Wilhelm listed in his Nov 29 post.
Details (such as they are) about the simulations for multiple regression are in Appendix C of this document:
The 30 comes from the fact that printed t-tables tend to stop at 30 df. This has nothing to do with the issue in the question, but that's where it comes from. For a little history of this, see this link: https://www.google.com/search?q=n%3E30+implies+the+central+limit+theorem+history&rlz=1C1CHBF_enUS874US874&oq=n%3E30+implies+the+central+limit+theorem++history&aqs=chrome..69i57.33394j0j1&sourceid=chrome&ie=UTF-8
The history of Statistics, Mathematics and the Sciences in general is quite interesting. I recommend it to you. Best wishes, David Booth
@Bruce Weaver Thanks for mentioning the nonparametric maximum likelihood approach. I had no idea that it existed. For any others like me here's a link:
Jochen Wilhelm and Bruce Weaver Yes, it is scary, but software vendors often say things like that; I have seen similar stuff from others. I have also heard colleagues say that since Microsoft is a big, famous company, Excel can't have errors. Well ….. Best, David
Parametric models for non-normal data are, in other words, non-linear statistical models for density estimation problems. In such scenarios, you may use statistics on Stiefel and Grassmann manifolds: Book: Statistics on Special Manifolds (Lecture Notes in Statistics).
Another good reference is directional statistics, i.e., statistics defined on non-linear data: Chapter: In Directional Statistics.
For aspects of multivariate statistical analysis, also see the book by Muirhead: https://www.booktopia.com.au/aspects-of-multivariate-statistical-theory-robb-j-muirhead/book/9780471769859.html?source=pla&gclid=EAIaIQobChMIu4-n8fqO5wIVSgwrCh2gOAEBEAQYASABEgJ_AfD_BwE
The central limit theorem tells us that the sampling distribution of the mean should be approximately normal for large samples. If your data are still strongly non-normal even with a large sample, I suggest you use the nonparametric equivalent of the required parametric test.
There are two trade-offs to consider when assessing any procedure from a frequentist point of view:
1. Robustness for validity: Will the t-test falsely reject the null at a higher rate than the pre-specified alpha? In general, the t-test *is* robust for validity, in that the type I error rate remains near the nominal alpha when the assumptions are false.
This is the "robustness" that secondary sources on research methods tout when they defend the general use of the t-test for non-normal data.
2. Robustness for efficiency: What these proponents fail to consider is that the WMW (Wilcoxon-Mann-Whitney) independent-samples test is nearly as efficient as the t-test under normality (approx. 95.5%), and *at worst* 86.5% as efficient (i.e., for distributions with thinner tails than the normal), but it can be *infinitely* more powerful in certain cases of heavy tails or skew.
I've read a number of papers simulating these results, and they all come to the same general conclusion that the asymptotic analyses conducted in the 1940s and 1950s also reached.
It is very hard to beat the simplicity of the Wilcoxon test without either making an assumption (i.e., Bayesian methods) or peeking at the data (robust or adaptive methods).
R. Clifford Blair discusses this debate in a historical context in this link.
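For readers who want to see this for themselves, here is a small simulation in the spirit of those papers (my own sketch; the lognormal distribution, n = 30 per group, and shift of 0.5 are arbitrary choices):

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, shift, n_reps = 30, 0.5, 2000
reject_t = reject_w = 0

for _ in range(n_reps):
    x = rng.lognormal(0.0, 1.0, n)
    y = rng.lognormal(0.0, 1.0, n) + shift           # pure location shift of a skewed variable
    reject_t += stats.ttest_ind(x, y).pvalue < 0.05
    reject_w += stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < 0.05

print("t-test power:", reject_t / n_reps)
print("WMW power:   ", reject_w / n_reps)

Under this kind of skew the WMW typically rejects considerably more often, while under exact normality it would give up only a few percent of power.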
You say that, in the case of skewed distributions, the power of the MWM test is higher than that of the t-test. But the MWM test tests a different hypothesis than the t-test. How can you compare the power? Isn't this like comparing apples and oranges?
One might argue that both test the same hypothesis (zero expected difference) under the assumption that the distributions can differ only by a location shift. In this case the power of the MWM can be (much) higher for skewed distributions. But if the distributions are skewed, the effect to be tested is almost never a location shift. I would be thankful for a single practical example of a variable with a skewed distribution where the relevant effect is a pure location shift.
The MWW tests stochastic ordering -- to what degree are values in group X larger than those in group Y? The real difference is that the parametric t-test makes a scale assumption (data are interval or ratio), while the MWW only assumes that the data can be ordered.
We can always convert an effect size from one model to another by multiplying by the appropriate scale.
We can also see the decision result under both procedures conditioning on the same data set. That is how the simulation studies work. We specify a distribution, draw samples from that distribution, then see how power and alpha behave empirically.
I don't see the problem with comparing the procedures at all, and neither do the hundreds of papers that compare the two via simulation.
FWIW -- I think mean differences are used in circumstances when the MWM/proportional odds model would be a more appropriate choice.
Misconceptions Leading to Choosing the t Test Over the Wilcoxon Mann-Whitney Test for Shift in Location Parameter
https://digitalcommons.wayne.edu/coe_tbf/12/
Fermat, Schubert, Einstein, and Behrens-Fisher: The Probable Difference Between Two Means When σ_1^2 ≠ σ_2^2
Suppose P(X>Y) > 0.5 (MWM significant) and, at the same time for the same data, E(X-Y) > 0 (t-test significant). Now if you reject "some H0", what do you conclude?
PS: Just because something is said or written or done very often does not make it correct, and this is never a reason to give credit. How many stats books do you find where "probability" is defined as a limiting relative frequency? In how many is written that failing to reject H0 means to accept H0? These things don't become correct just because they are repeated so often.
In principle, shouldn't you pre-specify which test to use before seeing the data, or pre-specify a protocol for an adaptive/robust test?
In reality, there is never a reason to do both tests as far as I can tell. If I were presented with the results of conflicting tests, I'd favor the MWW over the t-test. I'd also prefer to see the actual p-value, rather than "reject/fail to reject." An actual estimate would be best.
If your argument is that these problems/questions are better placed in an estimation framework, you would be in excellent company.
I only objected to the idea that these procedures were not comparable because the hypotheses were different. Both procedures map scientific hypotheses to number systems (reals for the t-test, naturals for the MWW). Both systems share the ordering assumption, but differ on the idea that the scale is equally spaced.
Where you can do a t-test, you can also do a MWW. Do you disagree with this? I'm not exactly sure what your criticism entails.
The most you can conclude from your example (or any hypothesis test, for that matter) is that, from a frequentist perspective and based on the data, the researcher would reject the null model of no effect -- i.e., the data are compatible with the existence of a discernible effect.
What that "rejection" entails behaviorally is context sensitive, and outside the realm of the operating characteristics of the decision procedures.