Regarding the Kruskal-Wallis test there still seems to be some confusion. Like the (Mann-Whitney) Wilcoxon test, it is in principle *not* a test of the medians. It is a test of the rank distribution. It *reduces* to a test for location (medians) if and only if all other moments of the distributions are the same in all groups. This is particularly *not* the case when the variances are not equal. Thus, claiming that a KW test should be done if there is heteroscedasticity is bad advice; in this case the test surely will work - but it does actually NOT test what the scientist expects to test!
You may use ANOVA with small sample sizes, too, if the conditions (normality, homoscedasticity, independence) are satisfied and you are studying the differences between the means.
Whether ANOVA is the optimal method or not depends on multiple details.
If you have three or more independent groups, six or more experimental units in each group, normality within each group, and equal variances between groups, the one-way ANOVA can be used. If normality does not hold (even after log transformation) but the variances between groups are equal, the Kruskal–Wallis test can be used. If the data are dependent over time and both normality and sphericity are met, the repeated-measures ANOVA (RM-ANOVA) can be used. If one of those assumptions is not met, the Friedman test is the appropriate statistical analysis.
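A minimal sketch of this decision path in Python with SciPy, assuming three independent groups of n = 6 stored as NumPy arrays (the data values below are made-up placeholders, not anyone's actual measurements):

```python
import numpy as np
from scipy import stats

# Hypothetical example data: three independent groups, n = 6 each
g1 = np.array([4.1, 5.0, 4.7, 5.3, 4.9, 5.1])
g2 = np.array([5.8, 6.2, 5.5, 6.0, 6.4, 5.9])
g3 = np.array([4.5, 4.8, 5.2, 4.9, 5.0, 4.6])

# Check normality within each group (Shapiro-Wilk, low power at n = 6)
# and equal variances between groups (Levene)
normal = all(stats.shapiro(g).pvalue > 0.05 for g in (g1, g2, g3))
equal_var = stats.levene(g1, g2, g3).pvalue > 0.05

if normal and equal_var:
    stat, p = stats.f_oneway(g1, g2, g3)   # one-way ANOVA
else:
    stat, p = stats.kruskal(g1, g2, g3)    # Kruskal-Wallis on ranks

print(f"statistic = {stat:.3f}, p = {p:.3f}")

# For the same six units measured at three time points, the non-parametric
# counterpart of RM-ANOVA would be the Friedman test, e.g.:
# stat, p = stats.friedmanchisquare(t1, t2, t3)
```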
It all depends on what you are trying to analyze. Are you interested in comparing means? If so, ANOVA or Kruskal-Wallis may be the way to go. But, mind you, ANOVA is based on some severe restrictions, as pointed out above (homoscedasticity, asymptotic normality, independence). So you may start by exploring your data, making some boxplots and possibly applying Bartlett's test for variance homogeneity.
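For instance, a short exploratory sketch along these lines in Python with SciPy/Matplotlib, again on made-up example data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical example data: three independent groups, n = 6 each
g1 = np.array([4.1, 5.0, 4.7, 5.3, 4.9, 5.1])
g2 = np.array([5.8, 6.2, 5.5, 6.0, 6.4, 5.9])
g3 = np.array([4.5, 4.8, 5.2, 4.9, 5.0, 4.6])

# Exploratory boxplots of location and spread per group
plt.boxplot([g1, g2, g3])
plt.xticks([1, 2, 3], ["group 1", "group 2", "group 3"])
plt.ylabel("response")
plt.show()

# Bartlett's test for homogeneity of variances (sensitive to non-normality;
# Levene's test is a more robust alternative)
stat, p = stats.bartlett(g1, g2, g3)
print(f"Bartlett: statistic = {stat:.3f}, p = {p:.3f}")
```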
If you provide more details, we will be happy to provide some more specific help.
With n=6 per group, none of the methods may work well, and checking the assumptions of normality and homoscedasticity cannot be done reliably. One simple way may be to use a two-sample t-test with unequal variances and correct the degrees of freedom by the Satterthwaite approach, provided you want to compare the means.
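A minimal sketch of that Welch/Satterthwaite comparison in Python with SciPy, using made-up data for two groups of n = 6:

```python
import numpy as np
from scipy import stats

# Hypothetical data for two of the groups, n = 6 each
g1 = np.array([4.1, 5.0, 4.7, 5.3, 4.9, 5.1])
g2 = np.array([5.8, 6.2, 5.5, 6.0, 6.4, 5.9])

# Welch's t-test: unequal variances, Satterthwaite-approximated df
t_stat, p_val = stats.ttest_ind(g1, g2, equal_var=False)

# Satterthwaite degrees of freedom, written out explicitly
v1, v2 = g1.var(ddof=1), g2.var(ddof=1)
n1, n2 = len(g1), len(g2)
df = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)

print(f"t = {t_stat:.3f}, df = {df:.2f}, p = {p_val:.3f}")
```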
Well, you don't have to teach me what Kruskal-Wallis is, thanks. As for your observation regarding boxplots, you are absolutely right, they are not of much help. I'm sorry to have missed the small sample size the first time.
@Abraham
Increasing the sample size, if possible, may be highly desirable. Bayesian methods can help you handle small sample sizes, if you have considerable information to add to the analysis. Check this:
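As a rough illustration of how prior information enters with such a small n, here is a minimal conjugate normal-normal sketch in Python; the prior, the data values, and the "known" data variance are all made-up assumptions, not part of the original post:

```python
import numpy as np

# Hypothetical prior for the group mean, e.g. from earlier experiments
prior_mean, prior_var = 5.0, 0.5 ** 2

# Small sample (n = 6); its variance is treated as known for this sketch
y = np.array([4.1, 5.0, 4.7, 5.3, 4.9, 5.1])
sigma2 = 0.4 ** 2
n = len(y)

# Conjugate normal-normal update for the mean
post_var = 1.0 / (1.0 / prior_var + n / sigma2)
post_mean = post_var * (prior_mean / prior_var + y.sum() / sigma2)

print(f"posterior mean = {post_mean:.3f}, posterior sd = {np.sqrt(post_var):.3f}")
```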
The Kruskal-Wallis test only operates on the ranks of the data, not the data themselves, so it can NOT calculate or compare means. The suggested use of the test is when the shape of the distributions is the same; in that case, saying that the test compares medians is an expressive way of talking about location.
One could ask, though, if we compare populations where the shapes of the distributions are the same and not normal, why couldn't we phrase the differences in location as differences in means? My answer would be that the means of skewed distributions are strongly influenced by the values in the tail area of the distribution, while rank-based methods like the median and the KW test are not.
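A small illustration of the rank-only point in Python with SciPy, on simulated skewed (log-normal) data: any strictly monotone transformation of the data, e.g. a log, leaves the Kruskal-Wallis statistic unchanged, while the group means change considerably:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.lognormal(mean=0.0, sigma=1.0, size=6)
b = rng.lognormal(mean=0.5, sigma=1.0, size=6)
c = rng.lognormal(mean=1.0, sigma=1.0, size=6)

# Kruskal-Wallis depends only on the pooled ranks, so a monotone transform
# (here: log) yields exactly the same statistic and p-value
print(stats.kruskal(a, b, c))
print(stats.kruskal(np.log(a), np.log(b), np.log(c)))

# The means, in contrast, are pulled by the right tail and change a lot
print([x.mean() for x in (a, b, c)])
print([np.log(x).mean() for x in (a, b, c)])
```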
"One could ask, though, if we compare populations where the shapes of the distributions are the same and not normal [...]"
Can anyone give me a real-world example where this is the case? AFAIK most skewed distributions can reasonably/sensibly be modeled by a log-normal, a gamma, or a beta distribution, and other, rather rare cases by an exponential, Weibull, or other distribution; that is, based on theoretical considerations about the fundamental sources of uncertainty/variability and their propagation to the measured response variable.
@Gabor, I know, I know. There was nothing wrong with it (at least to my understanding). I am honestly just wondering where such a scenario plays a role in practice: what are the cases where KW or MW can be applied to test a location shift (differences in medians)?
I agree with Ujjwal Das. A log transformation to achieve normality and an unpaired two-tailed t-test with Satterthwaite's approximation is an appropriate statistical analysis. In addition, a statistical power analysis and a confidence interval (CI) should be added. The SD and SEM do not have any interest for statistical analysis when the sample size is six (n=6). It should also be remembered that the p-value should be evaluated without any correction of alpha (the type I error rate), and a three-valued logic paradigm should be used.
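A hedged sketch of such a power calculation in Python with statsmodels; the standardized effect size of 1.0 is an assumed placeholder, not something derived from the data in question:

```python
from statsmodels.stats.power import TTestIndPower

# Power of a two-sided, two-sample t-test with n = 6 per group,
# for an assumed standardized effect size (Cohen's d) of 1.0
power = TTestIndPower().power(effect_size=1.0, nobs1=6, alpha=0.05,
                              ratio=1.0, alternative="two-sided")
print(f"power = {power:.2f}")

# Sample size needed per group to reach 80% power for the same effect
n_needed = TTestIndPower().solve_power(effect_size=1.0, power=0.8, alpha=0.05)
print(f"n per group = {n_needed:.1f}")
```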
Why are we complicating issues here? We have 3 factor levels. For each level, 6 responses were collected. If a continuous response variable was collected, then use a one-way ANOVA. If rank data, use Kruskal-Wallis. You can also transform non-parametric data and use the conventional one-way ANOVA.
If "SD and SEM do not have any interest for statistical analysis when sample size (n) is six (n=6)", as you said, which would be the interest in confidence interval and t-tests, since t-tests and confidence intervals are strongly dependent on SEM and this one is based on SD?
For a large sample from a normal distribution, the SD reflects the proportion of data values lying within ± a given number of SDs, e.g. ±1.96 SD for about 95%. If the sample size is small, then this information based on the SD is unreliable. It is well known that SEM = SD/√n. Therefore, the reasoning given above for not using the SD at very small sample sizes obviously also applies to the SEM, while the confidence interval correctly reflects the sample size.
Just to clarify, Kruskal-Wallis can be used to compare, in theory, any ORDER statistic, like the median or other quantiles. As for expectations, it is not appropriate for comparing those. I'm very sorry if this was suggested in my first post.
Anyway, I think we're creating a storm about nothing here. There are plenty of methods to analyze this kind of data. Just explore away and discuss the results from several methods in the light of your knowledge.
I believe that you are still confused about the concept of confidence intervals. When you say that, with large samples, ±1.96 SD contains 95% of the data values, you are using the concept of confidence intervals. This is a confidence interval! Then you say that this does not work with small samples, but confidence intervals do work?
A confidence interval for the mean is "mean ± t(alpha/2) * SEM". So, if you state that the SEM will not work with small samples, the confidence interval should not work either...
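To make that formula concrete, a short sketch in Python with SciPy on a hypothetical n = 6 sample, showing how the CI is built from the SD via the SEM:

```python
import numpy as np
from scipy import stats

y = np.array([4.1, 5.0, 4.7, 5.3, 4.9, 5.1])   # hypothetical n = 6 sample
n = len(y)

mean = y.mean()
sd = y.std(ddof=1)                       # sample SD
sem = sd / np.sqrt(n)                    # SEM = SD / sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)    # t(alpha/2) for a 95% CI

ci = (mean - t_crit * sem, mean + t_crit * sem)
print(f"mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}, 95% CI = {ci}")
```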
Yes! There are plenty of methods to deal with this problem, and there are plenty of methods to deal with all problems in statistics. Our purpose here is not just to point to one of them, but to discuss what seems to be the best alternative and to discuss the weaknesses of the other approaches.
For instance, every statistics class generally says that if the sample size is small, you should use classical non-parametric tests, like Kruskal-Wallis. And I see a lot of people using these tests on very limited sample sizes, where they do not have enough power to find any significant effect, even if it exists. That's the reason for the "storm". After the storm, some wisdom emerges...
There are four conventional types of 'error bars': ranges, SD, standard error (SEM), and CI. If there is a small sample size (n=6), then the better choice is the CI, not one of the other three 'error bars'. I assume that the CI is a more informative indicator than the SEM due to the small sample size. The CI is a "very sensitive" (in a biological sense only) index when the sample size is small.
Again... If the SEM is not valid because it is based on the SD (you said that before), then the CI cannot be valid either, because it is based on the SEM, which is based on the SD, which is not valid... Among the four types of error bars you cited, only ranges are not based on the SD... If the SD is a bad choice, the other three must be bad choices too...