Fisher established his linear discriminant function (Fisher's LDF) under what is now called Fisher's assumption: that the data follow multivariate normal distributions with a common variance-covariance matrix. He had no computing power and opened up a new world through sheer brilliance. After Fisher, most researchers have simply enjoyed the lotus-eating. He never claimed that his theory could handle every type of data. However, there is no good test for Fisher's assumption, so everybody uses discriminant analysis on medical diagnoses, ratings, and so on. My paper shows the serious problems that arise when we discriminate these kinds of data with statistical discriminant functions based on variance-covariance matrices.
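For concreteness, here is a minimal sketch in Python (using scikit-learn and simulated data, which are my own illustrative choices and not anything from the paper mentioned above) of how a linear discriminant is fit from a pooled variance-covariance matrix, i.e. exactly the quantity the statistical discriminant functions above depend on:

```python
# Minimal sketch: Fisher-style linear discriminant analysis relies on the
# pooled variance-covariance matrix of the classes. The data below are
# simulated to SATISFY Fisher's assumption; real medical or rating data
# generally do not.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Two classes drawn from multivariate normals with a COMMON covariance matrix.
cov = np.array([[1.0, 0.5],
                [0.5, 1.0]])
x0 = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=200)
x1 = rng.multivariate_normal(mean=[1.5, 1.0], cov=cov, size=200)
X = np.vstack([x0, x1])
y = np.r_[np.zeros(200), np.ones(200)]

lda = LinearDiscriminantAnalysis()   # uses a pooled covariance estimate
lda.fit(X, y)
print("discriminant coefficients:", lda.coef_)
print("training accuracy:", lda.score(X, y))
```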
I remember expensive equipment being sold to the US Govt, where destructive testing had to be done to show whether performance met specifications. Because the equipment was expensive and would not survive testing, sample sizes tended to be small. That was great for the contractors selling the equipment. I tried to emphasize power analyses, because p-values are driven by sample size. With a very small sample size, if you just looked at a p-value, you could not "prove" that the equipment failed to meet specifications unless it fell extremely far short, so the contractors claimed that the null hypothesis that their equipment met specifications had been 'accepted,' even when the sample size may have been so small that it was virtually impossible to fail.
I understand that the problem now may often be at the other extreme where, with 'big data,' p-values tend to be so small that you can 'reject' almost anything.
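To put a number on the sample-size point, here is a small illustrative simulation of my own (not from the original posts): the same tiny true difference is invisible to a p-value with a handful of observations and "highly significant" with enough of them.

```python
# Illustrative simulation: a fixed, tiny true effect looks completely
# different through the lens of a p-value depending only on sample size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_shift = 0.05          # a tiny true difference in means (0.05 SD)

for n in (10, 100, 10_000, 1_000_000):
    a = rng.normal(loc=0.0, scale=1.0, size=n)
    b = rng.normal(loc=true_shift, scale=1.0, size=n)
    t, p = stats.ttest_ind(a, b)
    print(f"n = {n:>9,d}   p-value = {p:.4f}")

# With n = 10 the effect is essentially undetectable; with n = 1,000,000 the
# p-value is near zero even though the effect is still only 0.05 SD.
```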
The problem is that no decision should ever be made based on a single, isolated p-value. The p-value is driven largely by sample size (it behaves, in effect, like a function of it). Whenever you can use a confidence interval instead, or just look at a relative standard error (or coefficient of variation), that is generally far more interpretable in practical terms.
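As a rough sketch of those alternative summaries (confidence interval, relative standard error, coefficient of variation), again on simulated data of my own choosing:

```python
# Sketch: report an interval and a relative standard error instead of (or in
# addition to) a bare p-value. Simulated data for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=50.0, scale=8.0, size=40)   # one simulated measurement series

mean = x.mean()
se = x.std(ddof=1) / np.sqrt(len(x))           # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=len(x) - 1, loc=mean, scale=se)

print(f"mean = {mean:.2f}")
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print(f"relative standard error    = {se / mean:.1%}")
print(f"coefficient of variation   = {x.std(ddof=1) / mean:.1%}")
```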
People now often look at 'effect size,' but it is still commonplace, and apparently required by many journals according to what I have read on ResearchGate, to use a p-value, often as a stand-alone statistic, and to badly misinterpret its meaning. The p-value is heavily overused and misinterpreted.
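A quick illustrative computation of an effect size alongside the p-value (my own sketch, using Cohen's d as just one common choice of measure):

```python
# Sketch: with a large sample the p-value can be tiny while the effect size
# stays negligible. Simulated data; Cohen's d as the effect-size measure.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200_000
a = rng.normal(0.00, 1.0, size=n)
b = rng.normal(0.02, 1.0, size=n)              # true difference of 0.02 SD

t, p = stats.ttest_ind(a, b)
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = (b.mean() - a.mean()) / pooled_sd

print(f"p-value   = {p:.2e}")                  # likely 'significant'
print(f"Cohen's d = {cohens_d:.3f}")           # but a trivially small effect
```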
Jim
PS - At the very least, the practice of defaulting to 0.05 as the level of "significance" (a very misused word) needs to change. Basically, and depending upon the circumstances, the larger the sample, the smaller that level should be set. A power analysis and/or other sensitivity analysis is needed on a case-by-case basis. We need to remember that the more information we have, as long as it is good information, the more we actually know, which affects both hypothesis testing and confidence intervals.
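Here is a rough sketch of the kind of case-by-case power analysis mentioned above, using statsmodels (my choice of library; the numbers are purely illustrative):

```python
# Sketch: power analysis for a two-sample t-test. The point is that sample
# size, alpha, power, and detectable effect size have to be considered
# together, not a p-value in isolation.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size needed per group to detect a moderate effect (d = 0.5)
# with 80% power at alpha = 0.05.
n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)
print(f"n per group for d = 0.5, power = 0.80, alpha = 0.05: {n_needed:.0f}")

# Conversely: with only 5 units per group (the 'expensive equipment' case),
# what power do we have to detect even a large effect (d = 1.0)?
power_small_n = analysis.power(effect_size=1.0, nobs1=5, alpha=0.05, ratio=1.0)
print(f"power with n = 5 per group and d = 1.0: {power_small_n:.2f}")
```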
On the other hand, I have too often heard statistics referred to as "magic," when it really should be used to help people understand the data better - even if it is not always used that way. When statistics is used properly and people still call it "magic" (and I have really heard that word used), I have found, in my opinion and experience, that it can be because the person saying it thinks they already know what the data are saying (or what the data should say), but they are actually wrong and do not understand the subject matter as well as they think they do. Then it is easy to just blithely say that something you don't understand must be wrong.
I would argue that statistics cannot lie. People can misdirect other people using statistics, but that is not the fault of the mathematical operations and theory. Often what people want are simple answers: it is published in a respected peer-reviewed journal, and therefore it must be true; sometimes it is published in some magazine at the grocery checkout line, and it must be true. So I run one analysis on the data, it gives me the answer I was hoping for (because I know that my theory is true), and I stop, publish, and move on.

How many non-statistical methods research papers have you read that went through half a dozen different ways to analyze the data and then drew conclusions on how these analyses agreed or disagreed? How many times have you found one and only one way to analyze the data? When did your data last satisfy (not merely fail to reject) the null hypothesis for every assumption? I know that I have never had a real data set that satisfies the assumptions of the statistical models (I have used computer-generated data, but that doesn't count), and I can always think of alternative methods with no real way to decide which single approach is best based on a sound theoretical understanding of the system being studied. I take my best guess and try to look at enough alternatives that I don't make some really stupid error. I could present alternative analyses to make a more comprehensive document, but no journal that I know of will publish 80+ page research articles.

I could always design experiments to be as stupidly simple as possible: "The world as a multiple comparison procedure with three treatments." Now which multiple comparison procedure (MCP) would you like: LSD, Tukey, SNK, or any of about two dozen alternatives? If you as a reader/reviewer/publisher only allow me to publish one choice, how do I decide which one to use when they give me slightly different answers? Maybe I should have a 20-page methods section describing the relative merits of each approach as it relates to my data... I have tried that, and it does not work very well with reviewers. So the system is set up to encourage mistakes (deliberate or accidental), and the lying has more to do with people and the social aspects of scientific research, not with the statistics.
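To make the multiple-comparison point concrete, here is a small sketch of my own (a simulated three-treatment example, not from any real study) comparing unadjusted pairwise t-tests, which is roughly what Fisher's LSD does after a significant ANOVA, with Tukey's HSD; the two procedures can disagree on borderline pairs.

```python
# Sketch: three simulated treatments analysed with two different multiple
# comparison procedures. Borderline pairs can be 'significant' under
# unadjusted pairwise t-tests (LSD-style) but not under Tukey's HSD.
import numpy as np
from itertools import combinations
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(3)
groups = {
    "A": rng.normal(10.0, 2.0, size=12),
    "B": rng.normal(11.5, 2.0, size=12),
    "C": rng.normal(12.0, 2.0, size=12),
}

# Unadjusted pairwise t-tests (LSD-style comparisons)
for g1, g2 in combinations(groups, 2):
    t, p = stats.ttest_ind(groups[g1], groups[g2])
    print(f"{g1} vs {g2}: unadjusted p = {p:.3f}")

# Tukey's HSD on the same data
values = np.concatenate(list(groups.values()))
labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```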