
Because of growing concerns about the replication crisis in the scientific community in recent years, many scientists and statisticians have proposed abandoning the concept of statistical significance and the null hypothesis significance testing procedure (NHSTP). For example, the international journal Basic and Applied Social Psychology (BASP) has officially banned the NHSTP (p-values, t-values, and F-values) and confidence intervals since 2015 [1]. Cumming [2] proposed the ‘New Statistics’, which mainly involves (1) abandoning the NHSTP, and (2) using estimation of the effect size (ES) instead.

The t-test, especially the two-sample t-test, is the most commonly used NHSTP. Therefore, abandoning the NHSTP means abandoning the two-sample t-test. In my opinion, the two-sample t-test can be misleading; it may not provide a valid solution to practical problems. To understand this, consider a well-posed example originally given in the textbook by Roberts [3]. Two manufacturers, denoted by A and B, are suppliers of a component. We are concerned with the lifetime of the component and want to choose the manufacturer that affords the longer lifetime. Manufacturer A supplies 9 units for a lifetime test. Manufacturer B supplies 4 units. The test data give sample means of 42 and 50 hours, and sample standard deviations of 7.48 and 6.87 hours, for the units of manufacturers A and B respectively. Roberts [3] discussed this example with a two-tailed t-test and concluded that, at the 90% level, the samples afford no significant evidence in favor of either manufacturer over the other. Jaynes [4] discussed this example with a Bayesian analysis. He argued that our common sense tells us immediately, without any calculation, that the test data constitute fairly substantial (although not overwhelming) evidence in favor of manufacturer B.

For this example, in order to choose between the two manufacturers, what we really care about is (1) how likely is it that the lifetime of one of manufacturer B’s components (an individual unit) is greater than the lifetime of one of manufacturer A’s components? and (2) on average, by how much does the lifetime of manufacturer B’s components exceed the lifetime of manufacturer A’s components? However, according to Roberts’ two-sample t-test, the difference between the two manufacturers’ components is labeled as “insignificant”. This label answers neither of these two questions. Moreover, the true meaning of the p-value associated with Roberts’ t-test is not clear.

I recently revisited this example [5]. I calculated the exceedance probability (EP), i.e. the probability that the lifetime of one of manufacturer B’s components (an individual unit) is greater than the lifetime of one of manufacturer A’s components. The result is EP(XB>XA)=77.8%. In other words, the odds that a component from manufacturer B outlasts a component from manufacturer A are 3.5:1. I also calculated the relative mean effect size (RMES). The result is RMES=17.79%. That is, the mean lifetime of manufacturer B’s components is greater than the mean lifetime of manufacturer A’s components by 17.79%. Based on the values of the EP and RMES, we should prefer manufacturer B. In my opinion, the meaning of the EP is clear and unambiguous; even a person not trained in statistics can understand it. The EP analysis, in conjunction with the RMES, provides a valid solution to this example.
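As a rough illustration of what an exceedance probability measures, the sketch below computes P(XB > XA) under a simple plug-in assumption: the two lifetimes are treated as independent normal variables whose means and standard deviations equal the sample values quoted above. This is an assumption made here for illustration only and is not necessarily the procedure used in [5], which is why the result is close to, but not the same as, the 77.8% quoted above. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def exceedance_probability_normal(mean_a, sd_a, mean_b, sd_b):
    """P(X_B > X_A) for independent normal X_A and X_B.

    Plug-in assumption: the sample means and standard deviations are
    used as if they were the true parameters, so sampling uncertainty
    from the small sample sizes (9 and 4 units) is ignored.
    """
    diff_mean = mean_b - mean_a              # mean of X_B - X_A
    diff_sd = sqrt(sd_a ** 2 + sd_b ** 2)    # SD of X_B - X_A (independence)
    return NormalDist().cdf(diff_mean / diff_sd)


def odds_in_favor(p):
    """Convert a probability p into odds of p : (1 - p)."""
    return p / (1.0 - p)


# Summary statistics from the example (hours).
ep = exceedance_probability_normal(42.0, 7.48, 50.0, 6.87)
print(f"EP(XB > XA) is roughly {ep:.3f}, odds roughly {odds_in_favor(ep):.2f}:1")

# The odds quoted in the post follow directly from EP = 77.8%.
print(f"Odds for EP = 0.778: {odds_in_favor(0.778):.2f}:1")
```

The second print statement shows how the quoted odds of 3.5:1 follow from EP = 77.8%, since 0.778 / 0.222 is about 3.5.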

[1] Trafimow D and Marks M 2015 Editorial Basic and Applied Social Psychology 37 1-2

[2] Cumming G 2014 The New Statistics Psychological Science 25(1) DOI: 10.1177/0956797613504966

[3] Roberts N A 1964 Mathematical Methods in Reliability Engineering McGraw-Hill Book Co. Inc. New York

[4] Jaynes E T 1976 Confidence intervals vs Bayesian intervals in Foundations of Probability Theory, Statistical Inference and Statistical Theories of Science, eds. Harper and Hooker, Vol. II, 175-257, D. Reidel Publishing Company Dordrecht-Holland

[5] Huang H 2022 Exceedance probability analysis: a practical and effective alternative to t-tests. Journal of Probability and Statistical Science, 20(1), 80-97. https://journals.uregina.ca/jpss/article/view/513
