Hello,
I'm preparing a webinar addressing several common statistical misconceptions in clinical trials that I have observed many times. I'm now collecting good resources that clear up these misconceptions (and doing my own simulations to illustrate them).
One of the most common misconceptions I have seen in various textbooks, presentations, discussions, etc. is that "both Mann-Whitney (-Wilcoxon) and Kruskal-Wallis compare medians", which is often stated:
a) without any additional conditions, which is wrong in general and easy to disprove with a counterexample (or, with more effort, formally, as in the 1st book cited below)
b) as a "location-shift" problem, which doesn't translate to medians easily without additional conditions, such as IID samples or symmetry around the medians. This is a very strict condition and practically a "zombie" one, meaning that it is rarely (if ever) checked in practice, as far as I have observed over the years. As a result, researchers are sometimes surprised to learn that:
1) stochastic equality (even at a very high p-value) was claimed despite very different means or medians
2) stochastic superiority (even at a very low p-value) was claimed despite exactly equal means or medians.
They simply forgot to check the variances and shapes of the distributions. They also learned that "happy juggling with tests", as I call it, when the normality assumption is violated may lead to testing a different hypothesis, which may or may not be consistent with the original question. In other words, it's not impossible to obtain a technically valid answer to a question that was never asked.
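Point 2 is easy to illustrate with a small simulation. The following is only a minimal sketch under my own assumptions, not taken from any of the sources listed below: the two distributions (a shifted exponential and a standard normal, both with population median 0), the sample size, the seed, and the helper name mann_whitney_u are all chosen purely for illustration. The p-value uses the large-sample normal approximation without tie correction, which should be adequate for continuous data.

```python
import math
import random
import statistics


def mann_whitney_u(x, y):
    """Return (U_x, p_hat, two-sided p-value) via the normal approximation.

    U_x counts pairs (x_i, y_j) with x_i > y_j (ties count 1/2), so
    p_hat = U_x / (n * m) estimates P(X > Y) + 0.5 * P(X = Y).
    No tie correction is applied to the variance (continuous data assumed).
    """
    n, m = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    total = n + m
    rank_sum_x = 0.0
    i = 0
    while i < total:
        # Find the block of tied values starting at position i.
        j = i
        while j < total and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        for k in range(i, j):
            if pooled[k][1] == 0:
                rank_sum_x += midrank
        i = j
    u_x = rank_sum_x - n * (n + 1) / 2.0
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return u_x, u_x / (n * m), p_value


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
n = 2000

# Skewed sample: Exp(1) shifted by ln(2), so the population median is 0.
x = [rng.expovariate(1.0) - math.log(2.0) for _ in range(n)]
# Symmetric sample: standard normal, population median also 0.
y = [rng.gauss(0.0, 1.0) for _ in range(n)]

u_x, p_hat, p_value = mann_whitney_u(x, y)

print("sample median of x:", round(statistics.median(x), 3))
print("sample median of y:", round(statistics.median(y), 3))
print("estimated P(X > Y):", round(p_hat, 3))
print("two-sided p-value :", p_value)
```

Both sample medians should come out close to 0, yet the estimated P(X > Y) should sit clearly above 0.5 and the test should reject, because the test responds to the difference in shape rather than to the medians. In practice a library routine such as scipy.stats.mannwhitneyu could replace the hand-written helper.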
Funnily enough, the original papers by Wilcoxon ("Individual Comparisons by Ranking Methods") and by Mann and Whitney ("On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other") don't refer to medians.
So far I have found two excellent books explaining this very issue, as well as a couple of articles, software manuals, and forum discussions:
Books:
1. Brunner, E., Bathke, A. C., & Konietschke, F. (2018). Rank and Pseudo-Rank Procedures for Independent Observations in Factorial Designs: Using R and SAS. Springer International Publishing.
This book shows step by step why the MW(W), Fligner-Policello, and Brunner-Munzel tests are not consistent for detecting differences in medians or means.
2. Nussbaum, E. M. (2024). Categorical and Nonparametric Data Analysis: Choosing the Best Statistical Technique (2nd ed.). Routledge.
3. Cook, T. D., & DeMets, D. L. (2007). Introduction to Statistical Methods for Clinical Trials. Chapman and Hall/CRC.
Links:
1. K. Barbe, Statistical Methods I: Nonparametric statistical inference, Master Mathematics Vrije Universiteit Brussel and Universiteit Antwerpen, [2.1 Walsh Average and Wilcoxon rank test] (a copy of the PDF from the Wayback Machine), https://web.archive.org/web/20210524064522/http:/homepages.vub.ac.be/~kbarbe/StatMet1.pdf
2. The Wilcoxon–Mann–Whitney Procedure Fails as a Test of Medians
3. What hypotheses do “nonparametric” two-group tests actually test?
- The Mann-Whitney test doesn't really compare medians
https://www.graphpad.com/guides/prism/latest/statistics/stat_nonparametric_tests_dont_compa.htm
- Example 2014.6: Comparing medians and the Wilcoxon rank-sum test
http://proc-x.com/2014/06/example-2014-6-comparing-medians-and-the-wilcoxon-rank-sum-test/
- Mann-Whitney test is not just a test of medians: differences in spread can be important
https://edisciplinas.usp.br/pluginfile.php/1065042/mod_resource/content/1/Mann%C2%ADWhitney%20test%20is%20not%20just%20a%20test.pdf
- FAQ: Why is the Mann-Whitney significant when the medians are equal?
https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-why-is-the-mann-whitney-significant-when-the-medians-are-equal/
- Wilcoxon signed-rank test null hypothesis statement
https://stats.stackexchange.com/questions/363335/wilcoxon-signed-rank-test-null-hypothesis-statement
- Yoon-Jae Whang, Econometric Analysis of Stochastic Dominance. Concepts, Methods, Tools, and Applications, ISBN: 9781108602204. [page 64: Test of stochastic dominance: Basic results]
Could you please recommend any other titles (books, papers) that cover this very topic in any way, whether more or less formally?