In 2000, Ballico reported a paradoxical behavior of the expanded uncertainty (EU) estimated with the GUM’s WS-t approach (WS stands for Welch-Satterthwaite) in a real-world application. According to Ballico (2000), during a routine calibration and the associated uncertainty calculation at the CSIRO National Measurement Laboratory (NML), Australia, a thermometer was calibrated on two ranges: a 1 mK range (higher precision) and a 10 mK range (lower precision). He observed a counter-intuitive result: the estimated EU for the 1 mK range was 37.39, greater than the 35.07 estimated for the 10 mK range! This result was later named the Ballico paradox (Huang 2016). Ballico (2000) suspected that the paradox was due to a limitation of the WS formula.
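
To make the WS-t calculation concrete, here is a minimal Python sketch. It assumes the usual GUM form: the effective DOF is u_c^4 / sum(u_i^4 / nu_i), and the EU is the combined standard uncertainty u_c multiplied by the two-sided t-quantile at that DOF. The function names and the input numbers are hypothetical, chosen only to show how a finer range can end up with a smaller u_c but a larger EU. They are not Ballico's data and will not reproduce 37.39 or 35.07. The t-quantile is computed numerically with the standard library, under the assumption that the DOF is at least 1 and the coverage is at most 0.99.

```python
import math
from statistics import NormalDist


def ws_effective_dof(components):
    """Welch-Satterthwaite formula: nu_eff = u_c^4 / sum(u_i^4 / nu_i).

    components: list of (u_i, nu_i); nu_i may be math.inf (e.g. a Type B term).
    """
    uc2 = sum(u * u for u, _ in components)
    denom = sum(u ** 4 / nu for u, nu in components if u > 0 and math.isfinite(nu))
    return math.inf if denom == 0 else uc2 ** 2 / denom


def t_cdf(x, nu, steps=2000):
    """CDF of Student's t for x >= 0, by Simpson's rule on [0, x]."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))

    def pdf(t):
        return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + total * h / 3


def t_quantile(p, nu):
    """Upper quantile of Student's t (p > 0.5), found by bisection on the CDF."""
    if nu > 1e6:  # effectively normal, including nu = inf
        return NormalDist().inv_cdf(p)
    lo, hi = 0.0, 100.0  # assumption: the quantile lies below 100
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ws_t_expanded_uncertainty(components, coverage=0.95):
    """Return (u_c, nu_eff, EU) for the WS-t approach."""
    uc = math.sqrt(sum(u * u for u, _ in components))
    nu_eff = ws_effective_dof(components)
    k = t_quantile(0.5 + coverage / 2, nu_eff)
    return uc, nu_eff, k * uc


# Hypothetical inputs in arbitrary units (NOT Ballico's data).
# Fine range: visible scatter from 3 readings (nu = 2) plus a small resolution term.
fine = [(2.0, 2), (1 / math.sqrt(12), math.inf)]
# Coarse range: identical readings (zero scatter), resolution term dominates.
coarse = [(0.0, 2), (10 / math.sqrt(12), math.inf)]

uc_fine, nu_fine, eu_fine = ws_t_expanded_uncertainty(fine)
uc_coarse, nu_coarse, eu_coarse = ws_t_expanded_uncertainty(coarse)

print(f"fine:   u_c={uc_fine:.3f}  nu_eff={nu_fine:.3f}  EU={eu_fine:.3f}")
print(f"coarse: u_c={uc_coarse:.3f}  nu_eff={nu_coarse}  EU={eu_coarse:.3f}")
```

With these illustrative inputs the fine range has the smaller combined standard uncertainty, yet its low effective DOF inflates the t-multiplier enough that its EU exceeds that of the coarse range. This is the same ordering as in the paradox, although the sketch makes no claim about the actual uncertainty budget in Ballico (2000).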

Hall and Willink revisited the Ballico paradox in 2001. They presented a calculation example and used Monte Carlo simulation to generate t-intervals with the effective degrees of freedom (DOF) estimated by the WS formula. Their results for the mean width of the simulated t-intervals showed some anomalous behavior. However, Hall and Willink (2001) did not resolve the Ballico paradox.

The Ballico paradox remained largely ignored and unresolved until 2016, when I revisited it and resolved it with a proposed WS-z approach (Huang 2016). I found that the Ballico paradox essentially invalidates the WS-t approach. However, the paradox is not due to the WS formula, which is valid for estimating the effective DOF; it is due to the use of the t-interval in uncertainty estimation (Huang 2016). I revisited the Ballico paradox in 2018 and 2019 and resolved it with two alternative methods (Huang 2018, 2019).

Nobel laureate Richard Feynman (1964) stated, ‘If a theory disagrees with experiment, it is wrong. In that simple statement is the key to science’. Feynman’s statement can be interpreted to mean that theories should be tested against experiment and only against experiment (White 2016). While frequentists and Bayesians disagree in their views and methodologies, both agree that a statistical method should be judged by the results it gives in practice (Jaynes 1976, Kempthorne 1976).

Therefore, I propose using the Ballico paradox as a standard test for the validity of any method for computing measurement uncertainty. That is, a statistical method, regardless of whether it is derived based on frequentist or Bayesian statistics, must resolve the Ballico paradox. Otherwise, the method is invalid.

References

  • Ballico M 2000 Limitations of the Welch-Satterthwaite approximation for measurement uncertainty calculations Metrologia 37 61-64
  • Feynman R P 1964 Almost Everyone’s Guide to Science ed J Gribben (Hyderabad: Universities Press) (see also www.youtube.com/watch?v=b240PGCMwV0)
  • Hall B D and Willink R 2001 Does “Welch-Satterthwaite” make a good uncertainty estimate? Metrologia 38 9-15
  • Huang H 2016 On the Welch-Satterthwaite formula for uncertainty estimation: a paradox and its resolution Cal Lab the International Journal of Metrology 23 20-28
  • Huang H 2018 A unified theory of measurement errors and uncertainties Measurement Science and Technology 29 125003 https://doi.org/10.1088/1361-6501/aae50f
  • Huang H 2019 Why the scaled and shifted t-distribution should not be used in the Monte Carlo method for estimating measurement uncertainty? Measurement 136 282-288 https://doi.org/10.1016/j.measurement.2018.12.089
  • Jaynes E T 1976 Confidence intervals vs Bayesian intervals Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science Vol. II 175-257 ed Harper and Hooker (Dordrecht-Holland: D. Reidel Publishing Company)
  • Kempthorne O 1976 Comments on paper by Dr. E. T. Jaynes ‘Confidence intervals vs Bayesian intervals’ Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science Vol. II 175-257 ed Harper and Hooker (Dordrecht-Holland: D. Reidel Publishing Company)
  • White D R 2016 In pursuit of a fit-for-purpose uncertainty guide Metrologia 53 S107–24