Please provide your advice on which statistical test to use for comparing two data sets with unequal sample sizes and non-normal distributions. The data is shown in the attachment.
If your two sets are two independent samples, it is more customary in most statistical software to have a group code in one column and the scores in another. Like this:
Set Y
1 0.3
1 13.8
1 30.3
1 28.7
1 30.4
1 15.9
1 9.6
1 12.9
1 11.8
1 13.5
1 9.8
1 14.5
1 17.0
1 21.0
1 16.8
1 13.4
1 9.8
1 0.0
1 12.2
1 0.0
1 18.2
1 15.4
1 9.5
1 8.9
1 6.0
1 3.1
1 2.5
1 1.4
1 0.4
1 4.0
1 2.5
1 5.9
2 2.0
2 5.3
2 8.3
2 7.1
2 5.9
2 6.4
2 4.1
2 11.9
2 22.1
2 35.3
2 35.5
2 30.9
2 30.0
2 29.3
2 27.0
2 24.6
2 24.5
2 19.8
2 15.0
2 16.6
2 16.1
2 12.2
2 16.3
2 17.7
2 19.1
2 14.0
2 12.0
You say you want to compare the two sets. But what specifically do you want to do? Do you want to compare the group means? Or medians? Or do you have some other kind of comparison in mind? Thanks for clarifying.
Bruce Weaver Dear sir, thank you very much for your kind support. I want to compare the two data sets to determine whether there is a significant difference between them. Using the p-value is also a good option. However, I'm not sure which statistical test would be the best choice. Could you please provide your advice?
Although you did not provide sufficient information, I would suggest the Mann-Whitney U test, since you are comparing two independent samples and the data do not meet the assumptions required for parametric tests. This test will be useful for comparing the median scores of the groups.
Prof. Enoch Tomiwo Oladunmoye Thank you very much for your kind support!
With my attached data, can I use the Mann-Whitney U test? When I tried before, I noticed that the table of critical values for the Mann-Whitney U test only goes up to 20 observations per group. However, in my case, I have more than 20 observations. Am I wrong? Please advise. Thanks a lot!
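On the question of critical-value tables stopping at 20: for larger samples, software typically relies on a normal approximation to the U statistic rather than a printed table. The following is a minimal sketch in Python with SciPy (the choice of language is an assumption; the thread itself mentions R, Stata and SPSS), using the 32 and 27 values from the long-format listing earlier in the thread. The variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Values transcribed from the long-format listing in the thread.
set1 = np.array([
    0.3, 13.8, 30.3, 28.7, 30.4, 15.9, 9.6, 12.9, 11.8, 13.5, 9.8,
    14.5, 17.0, 21.0, 16.8, 13.4, 9.8, 0.0, 12.2, 0.0, 18.2, 15.4,
    9.5, 8.9, 6.0, 3.1, 2.5, 1.4, 0.4, 4.0, 2.5, 5.9,
])
set2 = np.array([
    2.0, 5.3, 8.3, 7.1, 5.9, 6.4, 4.1, 11.9, 22.1, 35.3, 35.5, 30.9,
    30.0, 29.3, 27.0, 24.6, 24.5, 19.8, 15.0, 16.6, 16.1, 12.2, 16.3,
    17.7, 19.1, 14.0, 12.0,
])

# method="asymptotic" uses the normal approximation with a tie correction,
# so no printed critical-value table is needed when n > 20.
res = stats.mannwhitneyu(set1, set2, alternative="two-sided",
                         method="asymptotic")

# U / (n1 * n2) estimates P(Set 1 value > Set 2 value) + 0.5 * P(tie).
prob_superiority = res.statistic / (set1.size * set2.size)

print(f"U = {res.statistic:.1f}, p = {res.pvalue:.4f}")
print(f"Estimated P(Set 1 > Set 2) = {prob_superiority:.3f}")
```

The ratio U / (n1 * n2) is reported alongside the p-value because it states directly how often a value from one set exceeds a value from the other, which is what the test is sensitive to.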
Enoch Tomiwo Oladunmoye why switch to the U test? It does not compare medians without further assumptions, i.e. that both groups have identically shaped distributions, so that only a shift (which would be the same for the mean, median and mode if the shapes are identical) explains the stochastic superiority of one group.
There are plenty of possibilities, but Que Ho has not yet told us what he is really interested in. A p-value is a result of a (frequentist) statistical analysis, not its goal. Typically, you want to investigate mean differences, and a p-value can help to evaluate the signal-to-noise ratio and whether it is strong enough to be confident about the sign of your effect (i.e. whether it is positive or negative).
Further to Rainer Duesing's point about the Wilcoxon-Mann-Whitney test NOT being a test of medians, here is an example Ronán Michael Conroy included in his 2012 Stata Journal article.
Grp Y
0 5
0 5
0 5
0 5
0 5
0 5
0 7
0 8
0 9
0 10
1 1
1 2
1 3
1 4
1 5
1 5
1 5
1 5
1 5
1 5
The medians for the two groups are both 5. But the WMW test yields these results:
z = 2.730
Prob > |z| = 0.0063
Exact prob = 0.0100
P{value(group==0) > value(group==1)} = 0.820
Clearly, therefore, the WMW test is not testing a null hypothesis stating that the population medians are equal. As Rainer noted, it would only do that under the additional and unrealistic assumption that the two population shapes are identical, with the only possible difference between populations being a shift in location.
Article: What Hypotheses do “Nonparametric” Two-Group Tests Actually Test?
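The Conroy example above can be checked with a short sketch. It is written in Python with SciPy as an assumption (the original output comes from Stata), and it switches off the continuity correction so that the asymptotic p-value is comparable with the `Prob > |z|` line quoted above. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

# The two groups from the example quoted in the thread.
grp0 = np.array([5, 5, 5, 5, 5, 5, 7, 8, 9, 10])
grp1 = np.array([1, 2, 3, 4, 5, 5, 5, 5, 5, 5])

# Both sample medians are 5, so a test of medians would have nothing to find.
print("medians:", np.median(grp0), np.median(grp1))

# Normal approximation with tie correction and no continuity correction.
res = stats.mannwhitneyu(grp0, grp1, alternative="two-sided",
                         use_continuity=False, method="asymptotic")

# U counts pairs where the group 0 value is larger, with ties counted as 0.5.
# Here U = 82 out of 100 pairs, i.e. the 0.820 quoted above.
prob_superiority = res.statistic / (grp0.size * grp1.size)

print(f"U = {res.statistic:.0f}, asymptotic p = {res.pvalue:.4f}")
print(f"P(group 0 > group 1) + 0.5 * P(tie) = {prob_superiority:.3f}")
```

The test rejects even though the medians are identical, because group 0 values exceed group 1 values in most pairwise comparisons.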
I'd like to add a few (more fundamental) questions to consider before making a decision:
1. Why are the data non-normally distributed? Are there outliers present? Data entry errors? Are there noticeable subgroups in the data (i.e. multiple peaks)?
2. Should the data be normally distributed? What variable do the data reflect?
I wonder why one would switch to rank-based approaches in the first place. You/we should assess how much and why normality is violated, along the lines of Blaine Tomkins's post. The data do not seem to be ordinal, but maybe you could clarify this.
I fitted three different Bayesian models to your data, with a Gaussian, a Student t and a skew-normal likelihood. The latter because the data looked somewhat skewed and seemed bounded at zero. Although the skewed model fitted best according to leave-one-out cross-validation (LOO), the differences were marginal, all posterior predictive checks looked similar and generally not bad, and the residuals did not fit perfectly, but not badly enough that I would worry much about it. The estimates for the group means were also very similar for all three models.
Therefore, if you want to stay in the frequentist realm, maybe an ordinary t-test with bootstrapping will work?
But I would like to hear other opinions from the people here.
P.S.: I calculated the models at home and do not have the code at hand, but I can upload it, if wanted.
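One possible reading of "a t-test with bootstrapping" is a bootstrap confidence interval for the difference in group means. The sketch below is an assumption about what such an analysis could look like, not the poster's own code: it uses Python with NumPy, a percentile interval, 10,000 resamples and a fixed seed, all of which are illustrative choices.

```python
import numpy as np

# Values transcribed from the long-format listing in the thread.
set1 = np.array([
    0.3, 13.8, 30.3, 28.7, 30.4, 15.9, 9.6, 12.9, 11.8, 13.5, 9.8,
    14.5, 17.0, 21.0, 16.8, 13.4, 9.8, 0.0, 12.2, 0.0, 18.2, 15.4,
    9.5, 8.9, 6.0, 3.1, 2.5, 1.4, 0.4, 4.0, 2.5, 5.9,
])
set2 = np.array([
    2.0, 5.3, 8.3, 7.1, 5.9, 6.4, 4.1, 11.9, 22.1, 35.3, 35.5, 30.9,
    30.0, 29.3, 27.0, 24.6, 24.5, 19.8, 15.0, 16.6, 16.1, 12.2, 16.3,
    17.7, 19.1, 14.0, 12.0,
])

rng = np.random.default_rng(12345)  # fixed seed so the sketch is repeatable
n_boot = 10_000                     # illustrative number of resamples

# Observed difference in means (Set 2 minus Set 1).
obs_diff = set2.mean() - set1.mean()

# Resample each group with replacement, keeping the original group sizes,
# and recompute the mean difference for every bootstrap replicate.
boot1 = rng.choice(set1, size=(n_boot, set1.size), replace=True).mean(axis=1)
boot2 = rng.choice(set2, size=(n_boot, set2.size), replace=True).mean(axis=1)
boot_diff = boot2 - boot1

# 95% percentile interval for the mean difference.
ci_low, ci_high = np.percentile(boot_diff, [2.5, 97.5])

print(f"Observed mean difference: {obs_diff:.2f}")
print(f"95% percentile bootstrap CI: [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval that excludes zero supports confidence in the sign of the effect, which is the question raised earlier in the thread.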
Dear Rainer Duesing Bruce Weaver Blaine Tomkins Sal Mangiafico ! Thank you so much for taking the time to provide me with an answer. I really appreciate it. Thanks to your idea, I found a solution to the issue. I plan to use the Wilcoxon-Mann-Whitney test in R. Have a nice day!
Dear Sir, if you want to compare the means of the two groups, and if you are using SPSS, you can use Welch statistics if the data are not normally distributed.
Que Ho You can use the Mann-Whitney test; just be mindful of what the test is actually comparing and what conclusions you can draw from it. As others have noted, the test does not compare central tendency between the two samples (as a t-test does), but the shapes of the distributions of the two respective groups.
Hello Pathak Abhijit. When you say "Welch statistics", do you mean Welch's (1938) t-test? If so, bear in mind that it is robust to heterogeneity of variance. To my knowledge, it is no more robust to non-normality of the underlying populations than Student's t-test is. (But I am happy to be educated on this point if I am wrong.)
PS- The tests by Welch (1938) and Satterthwaite (1946) are the same, and for that reason, the test is often called the Welch-Satterthwaite t-test. But there is another variation by Welch (1947) that is implemented in Stata. If you are interested, you can find some discussion here:
But note that my initial statement in that thread was wrong. The StackExchange discussion that Sal Mangiafico shared cleared things up very nicely, I think.
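For reference, the Welch (unequal-variance) t-test discussed above can be run as follows. This is a sketch in Python with SciPy, which is an assumption since the post refers to SPSS and Stata. As the post notes, the test addresses heterogeneity of variance; the sketch makes no claim about robustness to non-normality.

```python
import numpy as np
from scipy import stats

# Values transcribed from the long-format listing in the thread.
set1 = np.array([
    0.3, 13.8, 30.3, 28.7, 30.4, 15.9, 9.6, 12.9, 11.8, 13.5, 9.8,
    14.5, 17.0, 21.0, 16.8, 13.4, 9.8, 0.0, 12.2, 0.0, 18.2, 15.4,
    9.5, 8.9, 6.0, 3.1, 2.5, 1.4, 0.4, 4.0, 2.5, 5.9,
])
set2 = np.array([
    2.0, 5.3, 8.3, 7.1, 5.9, 6.4, 4.1, 11.9, 22.1, 35.3, 35.5, 30.9,
    30.0, 29.3, 27.0, 24.6, 24.5, 19.8, 15.0, 16.6, 16.1, 12.2, 16.3,
    17.7, 19.1, 14.0, 12.0,
])

# equal_var=False requests Welch's t-test, which does not pool the variances.
welch = stats.ttest_ind(set2, set1, equal_var=False)

# Sample variances, to see how different the spreads actually are.
var1, var2 = set1.var(ddof=1), set2.var(ddof=1)

print(f"Sample variances: Set 1 = {var1:.1f}, Set 2 = {var2:.1f}")
print(f"Welch t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
```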
Blaine Tomkins, the Mann-Whitney is almost always a test of whether the observations in one group tend to have higher values than those in the other group. A hypothesis, I think, that is often useful.
There is a slight issue, if I remember correctly, with the type I error rate when the sampled populations are heteroscedastic. That’s why some authors either state the hypothesis as being about the distributions in general, or impose a “same shape and spread” assumption. This also occurs with some permutation tests, like Fisher-Pitman.
Interestingly, with the data Que Ho provided, I ran the Mann-Whitney, Fisher-Pitman, and a permutation test on the mean, and the p-values were nearly identical. It’s just one of those cases…
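A comparison of this kind can be sketched as below. This is not the poster's code: it is a minimal Python version, assuming NumPy and SciPy, of a two-sided Monte Carlo permutation test on the difference in means, run next to the Mann-Whitney test. The seed and the number of permutations are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

# Values transcribed from the long-format listing in the thread.
set1 = np.array([
    0.3, 13.8, 30.3, 28.7, 30.4, 15.9, 9.6, 12.9, 11.8, 13.5, 9.8,
    14.5, 17.0, 21.0, 16.8, 13.4, 9.8, 0.0, 12.2, 0.0, 18.2, 15.4,
    9.5, 8.9, 6.0, 3.1, 2.5, 1.4, 0.4, 4.0, 2.5, 5.9,
])
set2 = np.array([
    2.0, 5.3, 8.3, 7.1, 5.9, 6.4, 4.1, 11.9, 22.1, 35.3, 35.5, 30.9,
    30.0, 29.3, 27.0, 24.6, 24.5, 19.8, 15.0, 16.6, 16.1, 12.2, 16.3,
    17.7, 19.1, 14.0, 12.0,
])

rng = np.random.default_rng(2024)  # fixed seed so the sketch is repeatable
n_perm = 10_000                    # illustrative number of random relabellings

obs_diff = set2.mean() - set1.mean()
pooled = np.concatenate([set1, set2])

# Shuffle the group labels and count how often the absolute mean difference
# is at least as large as the observed one.
count = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)
    diff = shuffled[:set2.size].mean() - shuffled[set2.size:].mean()
    if abs(diff) >= abs(obs_diff) - 1e-12:
        count += 1

# Add 1 to numerator and denominator so the estimate is never exactly zero.
p_perm = (count + 1) / (n_perm + 1)

# Mann-Whitney test on the same data, for comparison.
p_mw = stats.mannwhitneyu(set1, set2, alternative="two-sided").pvalue

print(f"Permutation test on the mean: p = {p_perm:.4f}")
print(f"Mann-Whitney:                 p = {p_mw:.4f}")
```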
For most cases with unequal sample sizes and non-normal distributions, the Mann-Whitney U Test is a suitable choice due to its simplicity and robustness. If you are interested in comparing the overall distributions rather than just the central tendency, consider the Kolmogorov-Smirnov Test.
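A minimal sketch of the two-sample Kolmogorov-Smirnov test mentioned above, assuming Python with SciPy. It compares the two empirical distribution functions as a whole, so it responds to differences in shape and spread as well as location.

```python
import numpy as np
from scipy import stats

# Values transcribed from the long-format listing in the thread.
set1 = np.array([
    0.3, 13.8, 30.3, 28.7, 30.4, 15.9, 9.6, 12.9, 11.8, 13.5, 9.8,
    14.5, 17.0, 21.0, 16.8, 13.4, 9.8, 0.0, 12.2, 0.0, 18.2, 15.4,
    9.5, 8.9, 6.0, 3.1, 2.5, 1.4, 0.4, 4.0, 2.5, 5.9,
])
set2 = np.array([
    2.0, 5.3, 8.3, 7.1, 5.9, 6.4, 4.1, 11.9, 22.1, 35.3, 35.5, 30.9,
    30.0, 29.3, 27.0, 24.6, 24.5, 19.8, 15.0, 16.6, 16.1, 12.2, 16.3,
    17.7, 19.1, 14.0, 12.0,
])

# The statistic D is the largest vertical gap between the two empirical CDFs.
ks = stats.ks_2samp(set1, set2)

print(f"KS D = {ks.statistic:.3f}, p = {ks.pvalue:.4f}")
```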