In 2000, Ballico reported a paradoxical behavior of the expanded uncertainty (EU) estimated with the GUM’s WS-t approach (WS stands for Welch-Satterthwaite) in a real-world application. According to Ballico (2000), during a routine calibration and the associated uncertainty calculation at the CSIRO National Measurement Laboratory (NML), Australia, a thermometer was calibrated on two ranges: the 1 mK range (higher precision) and the 10 mK range (lower precision). He observed a counter-intuitive result: the estimated EU for the 1 mK range was 37.39, which was greater than the estimated EU of 35.07 for the 10 mK range. This result was later designated the Ballico paradox (Huang 2016). Ballico (2000) suspected that the paradox was due to a limitation of the WS formula.
Hall and Willink examined the Ballico paradox in 2001. They presented a calculation example and used Monte Carlo simulation to generate t-intervals with the effective degrees of freedom (DOF) estimated by the WS formula. The mean width of their simulated t-intervals showed some anomalous behavior. However, Hall and Willink (2001) did not resolve the Ballico paradox.
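To make the WS-t calculation concrete, the following is a minimal Python sketch of how the effect can arise. It is not Ballico's data, nor Hall and Willink's simulation: the two-component budgets, the 95 % coverage level, and all function names (`welch_satterthwaite`, `t_quantile`, `ws_t_expanded_uncertainty`) are assumptions made purely for illustration. It uses the standard WS formula, nu_eff = u_c^4 / sum(u_i^4 / nu_i), and obtains the t coverage factor by numerical integration so that only the standard library is needed.

```python
import math
from statistics import NormalDist

def welch_satterthwaite(components):
    """Effective DOF from (standard uncertainty, DOF) pairs.

    A component with DOF = math.inf adds nothing to the denominator.
    """
    u_c2 = sum(u * u for u, _ in components)
    denom = sum(u ** 4 / nu for u, nu in components)
    return math.inf if denom == 0 else u_c2 ** 2 / denom

def t_pdf(x, nu):
    """Density of Student's t with nu (possibly non-integer) DOF."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, steps=2000):
    """CDF for x >= 0 via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_quantile(p, nu):
    """Quantile for p > 0.5, found by bisection on the CDF."""
    if math.isinf(nu):
        return NormalDist().inv_cdf(p)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:   # expand until the quantile is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def ws_t_expanded_uncertainty(components, coverage=0.95):
    """Return (EU, effective DOF) for the WS-t approach."""
    u_c = math.sqrt(sum(u * u for u, _ in components))
    nu_eff = welch_satterthwaite(components)
    k = t_quantile(0.5 + coverage / 2, nu_eff)
    return k * u_c, nu_eff

# Hypothetical budgets (arbitrary units): the same Type A component with
# 2 DOF, plus a Type B component with infinite DOF that is small for the
# "fine" range and large for the "coarse" range.
fine = [(1.0, 2), (0.1, math.inf)]
coarse = [(1.0, 2), (1.0, math.inf)]

U_fine, nu_fine = ws_t_expanded_uncertainty(fine)
U_coarse, nu_coarse = ws_t_expanded_uncertainty(coarse)
print(f"fine:   nu_eff = {nu_fine:.2f}, EU = {U_fine:.2f}")
print(f"coarse: nu_eff = {nu_coarse:.2f}, EU = {U_coarse:.2f}")
```

In this made-up example the budget with the smaller Type B component has the smaller combined standard uncertainty but the larger EU. Its effective DOF stays close to 2, so its t coverage factor is much larger than that of the coarse budget, whose effective DOF is 8.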
The Ballico paradox remained ignored and unresolved until 2016, when I examined it and provided a resolution with a proposed WS-z approach (Huang 2016). I found that the Ballico paradox essentially invalidates the WS-t approach. However, the paradox is not due to the WS formula, which is valid for estimating the effective DOF; it is due to the use of the t-interval in uncertainty estimation (Huang 2016). I revisited the Ballico paradox in 2018 and 2019 and resolved it with two alternative methods (Huang 2018, 2019).
Nobel laureate Richard Feynman (1964) stated, ‘If a theory disagrees with experiment, it is wrong. In that simple statement is the key to science.’ Feynman’s statement can be interpreted to mean that theories should be tested against experiment and only against experiment (White 2016). Although frequentists and Bayesians differ in their views and methodologies, both agree that a statistical method should be judged by the results it gives in practice (Jaynes 1976, Kempthorne 1976).
Therefore, I propose using the Ballico paradox as a standard test of the validity of any method for computing measurement uncertainty. That is, a statistical method, whether derived from frequentist or Bayesian statistics, must resolve the Ballico paradox. Otherwise, the method is invalid.
References