In the case of small data sets, a test of significance for normality may lack the power to detect a deviation of the variable from normality. Therefore, I advise taking a subjective route and looking at two things: first, what the literature says about the normality of the variable under consideration, and second, the descriptive statistics, namely the mean, median, mode, range, and quartile deviation.
Good question. Unfortunately, just as for other statistical tests, the chance of detecting an effect if an effect is present is greatly reduced by such small samples.
But, are you asking this so that you can decide whether or not to transform your data prior to some other analysis? If so, remember that many statistical tests actually do not require normality of the raw data. It's the normality of the model residuals that you're most concerned about, since this tells you if the model is explaining the distribution of your data or not. In some cases, in order to improve residual normality, you may need to resort to data transformations. But, I would say be very careful of these as well.
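To make this concrete, here is a minimal sketch in R; the data frame and the linear model are made up purely for illustration, and the point is only that the checks are applied to the residuals, not to the raw response.

```r
set.seed(1)
df  <- data.frame(x = 1:10, y = 2 + 0.5 * (1:10) + rnorm(10))  # hypothetical small data set
fit <- lm(y ~ x, data = df)
qqnorm(resid(fit)); qqline(resid(fit))   # visual check of residual normality
shapiro.test(resid(fit))                 # formal test; very low power at n = 10
```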
Budi: could you please clarify why you suggest this particular test?
J. Patrick: One of the reasons I am interested in this is to decide whether I should choose parametric or nonparametric bootstrapping to estimate confidence intervals for the parameters of a model (e.g. nonlinear) fitted to the data set.
I would examine your model residuals to see if they are normal. With such a small data set (the kind that I also deal with), such an assessment of residual normality may be quite subjective (i.e. does the QQ-plot show a straight line?). With such a small sample, you're right that some bootstrapping is important.
I've only worried myself with parametric bootstrapping. So, perhaps others can weigh in here...It seems like the sample size alone would prevent you from adequately making an assumption about the distribution, so wouldn't a nonparametric bootstrapping approach be the most conservative?
The Shapiro-Wilk test was designed to test for normality with small sample sizes (n < 50). This test is more powerful than the Lilliefors, Kolmogorov-Smirnov, Anderson-Darling and other tests for small samples. (See the Shapiro-Wilk test.)
Thank you, Budi, this is the kind of information I was looking for! Perhaps you can suggest a paper about this topic?
Thanks again, J. Patrick! I am also thinking that nonparametric bootstrapping would be a first choice in such situations, but a concerning factor (if I am not mistaken) is the effect of outliers. What method of outlier detection/data transformation would you recommend for small data sets?
Just to clarify, normality of the raw data is not an assumption for models like ANOVAs, GLMMs, etc. You're only concerned about normality of the residuals. Sorry if I'm sounding like a broken record. I just think people spend a good bit of time messing with their raw data unnecessarily to make them normal, and that's not necessarily required. That said, any outlier test may also be sensitive to sample size. The outlier tests I know of still need a set of values against which to assess an individual point's leverage (for example). I can imagine a case - but I haven't simulated this - where an outlier detected with n=10 might turn out NOT to be an outlier with n=100.
Thanks again, J. Patrick! So, does the following procedure make sense for small data sets?
1. Fit the model (e.g. non-linear) to the raw data.
2. Test residuals for normality (e.g. with Shapiro-Wilk test).
3. If residuals are normal, use parametric bootstrapping to estimate model parameter confidence intervals. If not, use nonparametric bootstrapping.
I am still unclear about what is best to use for outlier detection, particularly keeping in mind what you mentioned about sample size effects. What do you generally use in this case?
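A rough sketch of the three-step procedure above in R, assuming a made-up data set and a simple exponential model (the model form, starting values, and 5% cut-off are illustrative assumptions, not a recommendation):

```r
set.seed(1)
df  <- data.frame(x = 1:10, y = 5 * exp(0.2 * (1:10)) + rnorm(10))       # hypothetical n = 10
fit <- nls(y ~ a * exp(b * x), data = df, start = list(a = 1, b = 0.1))  # step 1: fit the model
res <- resid(fit); yhat <- fitted(fit)

refit <- function(new_y)   # refit the model to a bootstrap response
  coef(nls(new_y ~ a * exp(b * x), data = df, start = as.list(coef(fit))))

B <- 2000
if (shapiro.test(res)$p.value > 0.05) {                                  # step 2: test residuals
  # step 3a: parametric bootstrap, simulating Gaussian errors around the fit
  sims <- replicate(B, refit(yhat + rnorm(length(res), 0, sd(res))))
} else {
  # step 3b: nonparametric bootstrap, resampling the observed residuals
  sims <- replicate(B, refit(yhat + sample(res, replace = TRUE)))
}
apply(sims, 1, quantile, probs = c(0.025, 0.975))   # percentile CIs per parameter
```

(In practice a few bootstrap refits may fail to converge; wrapping the refit in try() and discarding failures is a common workaround.)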
That sounds like a great approach. With n=10, I would likely just view the residuals with some level of subjectivity and simply acknowledge that there's going to be noise either in the normality test itself or in my subjective characterization of normality. Of course, that means it would be good to put the QQ-plot in the supplementary materials so that others can see what you judged to be normal, regardless of what your normality test says about your residuals. It's just tricky with small sample sizes.

Re: outliers. I tend to keep all data points, since I usually can't justify why a particular data point should be excluded. We've all noticed that one individual or subject that has some unexpected character (sort of a Black Swan). Those entities are still in the distribution, but they may not look like it. So, that's the reason I keep them around. (I have one set of eggs from a clutch of my focal species that built a nest out of completely different materials and laid a strangely colored egg - with video data of the parents!...it's quite evil for graphs and analyses, but I always keep that nest around.)
First of all, what is your specific need to perform a normality test, if I may ask? I would be able to address your query better if I knew that. Thanks, and I look forward to the questioner's response.
J. Patrick, thanks again! I agree with the reluctance to completely remove points that look like outliers. But what about transformations (e.g. log) to reduce their influence?
Thank you, Murali and Shree! For the purpose of publishing results, would it not be useful to perform and report formal normality testing rather than a subjective analysis only (even though it does make sense of course that power of such tests may be low)? I would be grateful for your suggestions.
Congratulations on your publication (nearly ready, I suppose). When you say "purpose of publishing results," what purpose does the normality test serve in the work you are doing, if I may ask? To avoid any IP conflict, you can keep it very general, e.g. "I am conducting a DOE" or "I am assessing a process improvement." By understanding the statistical purpose within your process (rather than the professional purpose of publishing), I would be better able to articulate my thoughts. My apologies for not clarifying this in my first response. Thanks for your patience; I look forward to hearing from you so I can help you move ahead.
Forget about testing normality when you have 10 statistical units, especially because in the great majority of practical cases knowing that a sample is normally distributed is less important than usually thought...
Just look at this (http://stats.stackexchange.com/questions/13983/is-it-meaningful-to-test-for-normality-with-a-very-small-sample-size-e-g-n) if you wish to disregard the purpose of the test (a statistical explanation for small sample sizes). Knowing the purpose of your test will also tell me whether you need some other test or information in the first place, other than the "normality" test. This is why I was insisting on the purpose from the very start.
Rahul, this is strange advice. Usually, the NP "equivalents" of the parametric tests test something different, which may not actually be what is intended. Also, science is more about modelling than about testing. There are often no reasonable models using just rank information.
...adding to Jochen's answer, to which I subscribe: you probably want to have a rough idea of whether 'something is there', and in that case NP is perfect. As for normality, it is important to stress that any significance test has to do with the sample and not the population variance, and when you have n > 30 the sampling distribution can usually be considered approximately normal. NP methods are also OK for reducing the influence of outliers...
If you have "large" samples, some few outliers don't matter much (one still can analyze their influences). If you have a lot of outliers, you should think harder about the process/model used to explain the responses. Thus, few/sporadic outliers[*] are not a problem, and a larger group/frequent outliers are less a problem but rather an important information you should use appropriately instead of getting rid of them.
If you have "small samples", meither the shape of the error distribution nor the presence of outliers can be judged with resonable confidence. Here only theoretical considerations help. If you think that there is a stable common center and a finite variance, and if this is all you know, then the normal error model is the one with the highest entropy, i.e. adding the least specific information to the analysis. There is nothing wrong with this. If a hypothesis test is performed, it can be done within the Newman-Pearson's regime or withing Fisher's regime. Only the former is illegitimate, because you can't claim that any particular long-run error-rate will be held (this frequentistic feature holds only when the error-frequency distribution matches the error probability model). The latter is still ok, because the p-value is just one of many indicators and is not interpreted all alone in outer space, and it is not used to control fixed error-rates. Fisher would call this a potential "type-III error".
[*] Outliers are meant to be values showing "unexpectedly large" residuals, way off from the majority of other residuals. Clearly, any obviously wrong, unreasonable, or impossible values *must* be excluded from the analysis. Here, a measurement error, typo, or some other physical incident may be responsible, so that the value cannot contain any information about the process under study. Outliers not recognized as "measurement failure", "unphysiological", "unreasonable" or even "impossible" values may tell an important story. A good model should appreciate this.
Formal normality testing is never very useful - either it is underpowered and relevant deviations are not recognized, or it shows "significant" deviations, but then it is still not clear whether these deviations are *relevant*. Many common procedures (linear models, for instance) are quite robust against deviations from normality. The biggest problem is usually a loss of power.
Bootstrapping CIs is the only option if the distribution of residuals is very asymmetric or not unimodal (that is: when it is quite strange). Otherwise, bootstrapping does not perform any better than the usual methods - the small sample is just taken to be perfectly representative of the population, not only for the mean and variance but also for all other properties of the distribution; this is surely at least as hard to justify as assuming it for the mean and variance alone (while further assuming there is no severe asymmetry and/or non-unimodal shape).
Thank you very much, Jochen! I thought that bootstrapping on small data sets is not very difficult or time consuming, and has the advantage of not making any assumptions about the data distribution. Does this make sense to you? Or perhaps an alternative would be to generate synthetic data sets assuming say Gaussian (or log-normal) errors?
My question is: why do you need to perform the normality test, other than it being a statistical test? What is the purpose behind this with respect to the content you are trying to publish? To put it another way, how would your readers challenge you if you were to publish the results without sharing the normality test results? I couldn't be more blunt, I guess. I hope you don't misunderstand this probing.
I don't think it is worth doing a normality test at all, not only for small samples but also for big ones. Normality is an assumption that has been used so widely that it will not add more credibility to your research work. Perhaps you should try alternative ways of judging your data. Just out of curiosity: can you provide us with this small data set?
As far as I know, normality tests such as the K-S test only have power when your data number more than about 30; otherwise you can only compare the individual results with each other, without any formal statistical analysis.
I prefer NON-PARAMETRIC TESTS in this kind of situation. :)
Igor, you might want to check out work by Stephan Morgenthaler (http://statwww.epfl.ch/morgenthaler/people/morgi.shtml), recently he has been working on estimators in small to extremely small samples. He doesn't directly deal with tests of normality, but you may be able to extract something useful from his work.
Shree, thanks for your question: my original thought was to test the residuals of a fitted model for normality. In case they were normal, I could use parametric bootstrapping to generate model parameter confidence intervals. If they were not normal, I would use nonparametric bootstrapping. But after reading the comments here I am now leaning towards skipping the normality testing as not useful, and going straight to nonparametric bootstrapping. Does this make sense to you?
Thank you, Demetris, Scott and Valeriy! After reading the comments here I am indeed much less inclined to use normality testing. I have no particular data set of interest yet - my question is generic. I am trying to figure out what would be a good approach for analyzing small data sets in general and reporting the results.
Just for identifying outliers, the best approach is to draw box plots and remove the outliers, but the real issue is how to analyse the data with the outliers. You may not be able to apply any parametric test if the sample size is too small.
@Igor: "But after reading the comments here I am now leaning towards skipping the normality testing as not useful, and going straight to nonparametric bootstrapping. Does this make sense to you?"
If you have a model (which typically means that you use the data to fit model parameters), then the model provides standard errors and/or confidence intervals for the parameter estimates. If the probability distribution used by the model matches the frequency distribution of the residuals, then the confidence intervals will have the expected frequency properties (i.e., in the long run, not more than 5% of such intervals are expected to miss the "true" values). But this is IMHO not essential; the confidence intervals can be interpreted as "highest likelihood regions", without claiming any frequentist properties. This likelihood function tells us how strongly we can modify our expectations (about the parameters) after knowing these data. In this regard, if the residuals do show a frequency distribution systematically deviating from the underlying probability distribution, then it tells us that the misspecification of the model is already visible in the available data (models are ALWAYS wrong! - but some are useful). This in turn can make us think about a better/different model, i.e., we *learn* something more from the data that we previously might not have thought of (missing predictor, interaction, non-linear relationships, ...). But even if we have no idea or no chance to modify the model (probably because the data just do not provide information about other possibly important predictors), then the model at hand is the best guess we can make, being well aware that there is something more hidden in the data that might bias our conclusions.
Bootstrapping makes an assumption: the data (or the residuals) are representative of the population, in all their properties. Instead of assuming that the population is a normal distribution with a mean and variance to be estimated from the data, the entire shape of the population distribution (including mean, variance and, theoretically, infinitely many more parameters) is assumed to be identical to that of the sample.
The uncertainty of some statistics cannot (or only with great difficulty) be determined analytically. Here, bootstrapping is the only option for getting an estimate of this uncertainty.
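For example, a percentile bootstrap CI for a median (a statistic with an awkward analytic standard error) could look like this in R; the data are simulated just for illustration:

```r
library(boot)
set.seed(1)
x   <- rlnorm(10)                         # hypothetical small, skewed sample
med <- function(d, i) median(d[i])        # statistic(data, resampled indices)
boot.ci(boot(x, med, R = 10000), type = "perc")
```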
Thanks a lot, Jochen! Your explanation is very detailed and useful! I am thinking now of using two methods (1. nonparametric bootstrapping and 2. generating synthetic data assuming Gaussian errors), fitting the model with both techniques, and comparing the results (i.e. the parameter confidence intervals). Does this sound useful? Would be grateful for your suggestions and suggestions from any other contributors!
Nonparametric analysis is a good suggestion. In case you still want to test normality, the Shapiro-Wilk test can be done, and you may also obtain a normal probability plot of the data or residuals.
Igor, have you observed that Lorenz curves of samples from normal distributions and of normal models are very close to the diagonal line from point (0,0) to (1,1)? This is due to their small dispersions. When the sample does not follow normality, you may compare them by graphing the point values of
Fi = ln(Li)/ln(Xi), for data ordered from top to low, with L = cumulative fraction of distributed mass and X = cumulative fraction of population. If you compare the (Xi, Fi) values of two samples of any size, you get a graphic picture that shows whether or not they have a similar distributive structure. Pareto distributions show horizontal graphs. Try it and draw your own conclusions. This has important consequences for nonparametric modelling of any kind of small or big sample. Thanks, emilio
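My reading of this construction, as a hedged sketch in R (assuming positive-valued data; the last point, where ln(X) = 0, is dropped to avoid division by zero):

```r
FX_points <- function(v) {
  v <- sort(v, decreasing = TRUE)      # order the data from top to low
  L <- cumsum(v) / sum(v)              # cumulative fraction of distributed mass
  X <- seq_along(v) / length(v)        # cumulative fraction of population
  keep <- X < 1
  data.frame(X = X[keep], Fi = log(L[keep]) / log(X[keep]))
}
plot(FX_points(rlnorm(100)), type = "b",
     xlab = "X (cumulative fraction of population)", ylab = "Fi = ln(Li)/ln(Xi)")
```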
I have performed the Shapiro-Wilk test on my result set. I do not have a statistics background. I have attached an Excel file. I want to know how I should infer from my data which algorithm is better.
I had a look at your data and the analysis you have done. There is clear evidence of non-normality in your data: the variables you have do not follow a normal distribution. Therefore parametric tests will not be valid. Hence, I advise you to analyze and present your results using non-parametric methods. If you are not that confident in statistics and you are conducting a research study requiring statistical analysis, I advise you to get in touch with an experienced statistician to collaborate with you.
Thank you, that is very true: my data do not follow a normal distribution. I have done the Wilcoxon signed-rank test, one-tailed and two-tailed, but I don't know how I should interpret it. I have attached the file; please comment on it.
Your data show values for the groups "MCT" and "MET", each for several numbers of tasks (100, 1000, 5000, and 10000).
The data within a group are unlikely to be sampled from a normally distributed variable. However, it is not unlikely that such data were sampled from a log-normal distribution. If the Shapiro-Wilk test is applied to the log values, no p-value is even close to any conventional level of significance. The smallest p-value, after Holm's correction for multiple testing, is 0.113.
Apart from testing the *data*, I wonder what you want to analyze. The MET and MCT values are highly correlated. Do you want to show that correlation? Or do you want to analyze whether/how much the MET and MCT values are systematically different (MET > MCT), and whether this depends on the number of tasks? Or do you want to show that both values approach some upper limit with increasing number of tasks, probably different limits or at different speeds? Do you have some model of how MCT, MET and the number of tasks should be related?
I attached diagrams showing log(MET) against log(MCT) for the different numbers of tasks and the normal-QQ plots of the residuals, once for all data and once for the data after the three "outliers" have been removed (just for illustration! you should think why these three values might be outliers and if they conceal or rather reveal important insights!).
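For reference, a sketch of this check in R; since the attached file is not reproduced here, 'dat' below is a synthetic stand-in with the same column layout (tasks, MCT, MET):

```r
set.seed(1)
dat <- data.frame(tasks = rep(c(100, 1000, 5000, 10000), each = 10),
                  MCT   = rlnorm(40, meanlog = 5, sdlog = 0.5))
dat$MET <- dat$MCT * rlnorm(40, meanlog = 0.1, sdlog = 0.1)

# Shapiro-Wilk on the log values, per group and variable, with Holm's correction
p <- by(dat, dat$tasks, function(g)
        c(MCT = shapiro.test(log(g$MCT))$p.value,
          MET = shapiro.test(log(g$MET))$p.value))
p.adjust(unlist(p), method = "holm")
```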
Thank you for the quick reply. I do not know much about all this. I have just started reading statistics, because I have submitted a paper to a journal in which I compared my results using the averages of the above data set and showed graphs (file attached). The reviewers have asked me: "In the simulations, the authors average over 10 trials. This can be sufficient or not, depending on the randomness of simulation parameters. Therefore, confidence intervals are of paramount importance to take any conclusion from the results."
I do not know about confidence intervals. By reading here and there, I understand that if the data are not normal, a non-parametric test should be performed, such as the Wilcoxon signed-rank test for paired samples (instead of the paired t-test). I have performed this using the following tool
First of all, an important cue: do not confuse "tests" and "confidence intervals". A confidence interval (CI) is a region (or set) of non-rejectable hypotheses, whereas a test gives you just a p-value indicating the probability of getting a test statistic more extreme than the one observed, given a particular (null) hypothesis. The connection between a test and a CI is that a test at level alpha will lead to rejection of H0 whenever the (1-alpha)-CI does not include H0.

I strongly recommend that you first learn what empirical science is about (building models), what information is contained in data, and how statistics is used to elaborate the information content of data and to tell us what and how much we can learn from a given set of data. You should (at least) know what a likelihood is, and what maximum likelihood estimates and likelihood intervals are. This includes learning the meaning of probability distributions, how they are derived and what they tell us. Then you can go one step further and plan experiments that give you data that can reasonably be interpreted and deliver the most information about the interesting aspects. Then it will also be obvious how to analyze such data after the experiment has been done or the data have been collected. People in academia should be trained more in what science and learning actually are before they start to do experiments or to somehow analyze data (often in a particular way just because others did it this way). But now to your problem:
I will first show you how I would start (actually not knowing much about anything here, just looking at the data), and then I will also present a solution closer to the analysis you have already started (as shown in your file rg3.xlsx).
As you can see from the picture I attached to my last post, there is a clear linear relationship between log(MCT) and log(MET). The residuals of the regression line through these points can be seen as normally distributed (as shown by the normal QQ-plots). The slope of the regression line is greater than 1. For the line log(MET) = B*log(MCT), the slope (B) is 1.017, with a 95% confidence interval from 1.003 to 1.030. After removal of the three "outliers" with the large negative differences, B is estimated as 1.027 (1.019...1.035); the estimate of the slope is steeper now.
This indicates that the log(MET) values are consistently higher than the log(MCT) values, and that this difference is larger for higher log(MCT) values. Going back to the original scale, it says that the ratio MET/MCT is greater than 1, and becomes greater for higher values of MCT. (Note: log(MET/MCT) = log(MET) - log(MCT))
Now you might want to separate this for the different groups (numbers of tasks). You could calculate the mean difference in logs (i.e. the mean log-ratio) per group and state the confidence intervals (CIs) for these estimates. Since there is no reason to believe that the distribution of the residuals is considerably different from a normal distribution, you can use the "standard procedure" to calculate the CIs of the means. Your data are the log-ratios. Any conventional statistics program will calculate these CIs.
Given all the data (including the three "outliers") I get (using the natural logarithm)
tasks | mean | lower | upper
100 | 0.1174 | 0.0049 | 0.2299
500 | -0.1484 | -0.5829 | 0.2860
5000 | 0.2237 | -0.0228 | 0.4702
10000 | 0.5481 | 0.3846 | 0.7116
You can get the values for the ratios simply by anti-logging these values. For instance, the mean ratio for 10000 tasks is exp(0.5481) = 1.73, so the MET values are expected to be 1.73 times as high as (or 73% higher than) the MCT values. The 95%-CI ranges from 1.47 to 2.04 (or from +47% to +104%).
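A sketch of this group-wise calculation in R, using the synthetic stand-in 'dat' from the earlier sketch (columns tasks, MCT, MET): the mean log-ratio per group with a standard t-based CI, back-transformed to the ratio scale.

```r
by(dat, dat$tasks, function(g) {
  lr <- log(g$MET / g$MCT)                    # log-ratios within the group
  ci <- t.test(lr)$conf.int                   # standard CI for the mean log-ratio
  exp(c(mean = mean(lr), lower = ci[1], upper = ci[2]))   # back to the ratio scale
})
```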
These group-wise analyses estimate the variances only from a subset of the entire available data, which is a waste of resources. We saw that the variance of the log-ratios did not depend on the group (otherwise the distribution of the residuals would clearly not have been normal), so it would be advantageous to estimate the variance (and, hence, the CIs) from all the available data together. That means we can calculate the CI for the residuals as a whole. To avoid a bias here, the residuals are calculated from the regression line including an intercept term (A): log(MET) = B*log(MCT) + A. The CI for the residuals in this model is -0.1423 ... +0.1423 (it is symmetric around zero). So better (more robust) estimates of the CIs of the log-ratios can be obtained by adding 0.1423 to, and subtracting it from, the group means, giving
tasks | mean | lower | upper
100 | 0.1174 | -0.0249 | 0.2597
500 | -0.1484 | -0.2907 | -0.0061
5000 | 0.2237 | 0.0814 | 0.3660
10000 | 0.5481 | 0.4058 | 0.6904
Note two things: 1) the three "outliers" are included, leading to a relatively low log-ratio for the group "500". 2) The conclusion obtained from looking at all the data together ("consistently higher MET values than MCT values") cannot be seen if the data are analysed separately by group (the groups "100" and "500" have CIs that include the zero log-ratio, or a ratio of 1, indicating equality or a non-difference of MET and MCT).
You may calculate this after excluding the three "outliers" to see how much the results change. As I said before: there is no statistic that tells you something about the importance of these "outliers". Including them may invalidate your conclusions, or they might be trying to tell you the actually interesting story.
Now, you have already presented the results as differences (not as (log-)ratios). So you might want to get CIs for these differences. Since the differences are clearly not normally distributed, the CIs cannot be calculated using standard techniques. I would recommend using the bootstrap to get the CIs for the differences directly. You will need software that can calculate bootstrap CIs.
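A minimal sketch of such a bootstrap CI in R (not the exact code used for the result below; 'dat' is the synthetic stand-in from the earlier sketch, and only one group is shown):

```r
library(boot)
d10k  <- with(subset(dat, tasks == 10000), MET - MCT)    # pairwise differences in one group
bmean <- function(x, i) mean(x[i])                       # statistic(data, resampled indices)
boot.ci(boot(d10k, bmean, R = 100000), type = "perc")    # percentile bootstrap CI
```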
My result (obtained with R and 100000 bootstrap-samples on the pairwise differences, including the three "outliers") is
Testing normality with fewer than 10 observations seems to me to be almost nonsensical. With such a small amount of data you could only ever detect rather gross deviations from normality. Certainly it does not make much sense to apply an "omnibus" test - a test having some power against any alternative at all - which pays for this by having low power against every alternative.
If in advance of looking at your data you do have some idea of what deviations from normality might be expected in your field - or what deviations might be particularly harmful to the further analyses which you plan - then you should use a test specially designed to have high power against those particular alternatives. As long as the test is "exact" (does not depend on asymptotic theory) then it is reliable.
I have a similar question. Recently, a researcher published a paper in which he compared two populations of lizards by size. One had n = 2, and for the other the number of samples was not indicated. Normality was assessed with the Shapiro-Wilk test and the groups were compared with Student's t-test. I know this is really bad, but even if one group had an adequate number of samples, is it possible to compare it against another group that has only 2?
Glad to see your question here. In order to better assist with sample size determination, it would help to know what your null hypothesis is. What is it that you are trying to prove or disprove?
Yes, more precisely, a hypothesis is either rejected (disproved) or failed to be rejected (yet to be disproved or accepted as a hypothesis) based on the data provided and assumptions of risk levels. The null or the alternate stays statistically significant till proven otherwise with supporting data.
Shree, I would not use words like "proven" and "disproven" in the context of hypothesis tests. The tests neither prove nor disprove anything. Tests only suggest how you should act when adhering to a strategy that will balance the expected losses in a defined way. Nothing else.
The tests show whether or not a factor that is deemed influential is proved or disproved to be statistically significant at a chosen level of statistical confidence (risk). Does that sound better?
No, not at all. There is no such thing as proof here. Statistical significance is only a measure of how likely the data (or "more extreme" data) are, given a particular probability model and a particular hypothesis.
If what you were saying were correct, then I could easily prove psychic powers in people:
Experiment: play lottery
Null hypothesis: the player is only guessing
Observation: Mrs. F. won the lottery (she's a lucky millionaire now)
Significance = p = P(winning|guessing) < 0.0000001
That is statistically significant at any reasonable level.
You say: this disproves the null hypothesis or proves the alternative (= "not guessing"). So people (at least Mrs. F) have psychic powers and can magically foresee the next lotto numbers. Really?
I'd say: no, by no means is this proof of anything. It just says that I would have been quite sure that Mrs. F would not win the lottery. But she won, so I am quite surprised.
----
NB:
Surely, the mean trick here is that Mrs. F was not mentioned before. She could be one of many people who actually played lotto, and she was just selected because she did win. This is the multiple testing problem, and it demonstrates that the interpretation of a p-value requires a context, and that changing the context changes the interpretation/meaning of the p-value. But if the interpretation depends on a context, it cannot be a proof.
However, we do not need to consider a multiple testing scenario. Consider that I have the hypothesis that Mrs. F has psychic powers, so that she should be able to foresee the lotto numbers. Now we specifically ask Mrs. F to play lotto to test this hypothesis (well, actually to test the null hypothesis). Now she really wins. Wow - we are surely surprised by this result. Very, very surprised, if we think that she was just guessing. But even in this case we would not believe that we have now proven her to have psychic powers. Again, the context actually defines what the result tells us, but the result itself is neither a proof nor a disproof of anything.
It is best to refrain from using tests for assumptions at all. It has long been known that this involves flawed logic. One issue is that with small sample sizes, you will almost never get a "significant" rejection of normality, whereas with very large data sets, minor (i.e. negligible) deviations lead to rejection.
For large data sets the best strategy is visual assessment, i.e. residual or Q-Q plots. For very small data sets, the only viable strategy is to think up front about whether the residuals can plausibly be normally distributed at all. Generally, normality is an idealization that strictly never occurs in this universe. Why? Because all measures have one or two boundaries, and distributions with boundaries are inevitably skewed.
Practically, the researcher should first consider the type of measure (discrete/continuous, one boundary, two boundaries), select a fitting response distribution and use it in a Generalized Linear Model, for example (see the sketch after this list):
Waiting time (not response time): Gamma
Counts: Poisson or negative binomial
Successes in a fixed number of trials: binomial (aka logistic regression)
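A sketch of these choices in R; the data frame 'd' and the single predictor x are made up solely to show the family arguments:

```r
set.seed(1)
d <- data.frame(x         = runif(30),
                time      = rgamma(30, shape = 2, rate = 1),    # waiting times
                count     = rpois(30, lambda = 3),              # counts
                successes = rbinom(30, size = 10, prob = 0.4),  # successes out of 10 trials
                trials    = 10)

fit_time  <- glm(time  ~ x, family = Gamma(link = "log"), data = d)
fit_count <- glm(count ~ x, family = poisson,             data = d)
fit_binom <- glm(cbind(successes, trials - successes) ~ x,
                 family = binomial, data = d)
# Overdispersed counts: negative binomial via MASS::glm.nb(count ~ x, data = d)
```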