Excellent question - what is the "best" outlier cutpoint for cost analyses so as to retain accurate/appropriate/meaningful results? It does seem that "it depends..." is not a cop-out here. The distribution of costs of care does not follow a bell curve, so the usual "textbook" answers are neither helpful nor relevant. Cost distributions vary widely depending on the specific condition (illness) and the treatment(s) applied. To further complicate things, there are situations in which a cutpoint should also be applied at the low end. Consider the cost of "observational care" such as a periodic ultrasound to follow an abdominal aortic aneurysm (AAA) - the low cost of this "treatment" for this subpopulation of patients with AAA would pull down the mean cost of care for AAA and would also likely distort the cost distribution (as shown in a box-and-whiskers plot) for AAA. Conditions with broad cost curves and large standard deviations (driven by the top end of the curve) would likely require a higher cutpoint (farther out on the upper tail) so that the analysis retains more of the high-cost cases that are a legitimate component of the cost of care for such conditions. On the other hand, for conditions in which the clinical care is homogeneous, with narrow cost curves and small standard deviations, it might be more appropriate to set the cutpoint lower to eliminate the billing/claims/data reporting errors that invariably show up in such studies (third-party claims payment systems often use so-called "medically unlikely edits" to detect such billing errors and fraudulent claims).
Great response, James. Appreciate your insights. Would it be fair to say that designating an individual as an outlier essentially depends on both the shape of the population's distribution and the purpose of the analysis? And going one step further, is the characterization of an individual data point as an outlier always a subjective decision by the researcher?
Bootstrapping the mean is a common technique used to overcome these issues in cost studies. With an adequate sample size, the distribution of the resampled means is approximately normal, and from it you can get a mean and a 95% confidence interval for the estimate. Cost distributions are almost always right-skewed, so statistical techniques that can handle skewed data should be used to analyse this type of data. Outliers should not be removed unless there is a very good reason to do so.
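For anyone who wants to try the bootstrap approach described above, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not a recommended analysis pipeline: the function name `bootstrap_mean_ci`, the number of resamples, the simple percentile interval, and the simulated lognormal "costs" are all my own choices for the example, not anything taken from a real cost study.

```python
import random
import statistics


def bootstrap_mean_ci(costs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of a cost sample.

    Hypothetical helper for illustration; no observations are removed.
    """
    rng = random.Random(seed)
    n = len(costs)
    boot_means = []
    for _ in range(n_boot):
        # Resample n observations with replacement and record the mean.
        sample = rng.choices(costs, k=n)
        boot_means.append(sum(sample) / n)
    boot_means.sort()
    # Approximate positions of the alpha/2 and 1 - alpha/2 percentiles.
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(costs), lower, upper


# Simulated right-skewed costs (lognormal), purely illustrative data.
data_rng = random.Random(42)
costs = [data_rng.lognormvariate(8.0, 1.0) for _ in range(300)]

mean_cost, ci_low, ci_high = bootstrap_mean_ci(costs)
print(f"mean = {mean_cost:.0f}, 95% CI = ({ci_low:.0f}, {ci_high:.0f})")
```

Note that every observation, including the expensive ones, stays in the sample; the skew shows up in the width and asymmetry of the interval instead of being trimmed away. Other interval types (for example bias-corrected ones) may be preferable for heavily skewed data, so treat the percentile interval here as the simplest option only.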
Absolutely agree with Professor Comans - arbitrary "eyeball" exclusion of data points is a crude statistical technique fraught with hazards (neither proportional nor amenable to Cox)! It belongs in the bottom drawer of the "tool box" along with hunch and intuition.