In the undergraduate (and some graduate) statistics courses required for degrees in many of the sciences, it is not uncommon to find height or weight offered as examples right after the definition of a “continuous variable”, or as examples of a continuous variable that is normally distributed (the heights and weights of a population are a fairly common example of the latter). The most common statistical tests used in (null hypothesis) significance testing assume that the variable in question is continuous (otherwise, there wouldn’t be an issue of how robust, e.g., ANOVA or t-tests are to violations of normality assumptions).

Let’s assume that human birth weight (whether of humans now living or of all humans past, present, and future) is actually normally distributed, or simply that it is continuously distributed (a far weaker assumption). The number of human births, even if we consider all human births that ever were or ever will be, is at most countably infinite. Therefore, there is a one-to-one function mapping all birth weights that ever were or will be into the set of rationals in the unit interval, but no one-to-one correspondence (bijection) between that set of birth weights and the interval [0,1] (an uncountable set, as the set of possible values of any continuous probability distribution must be). So it cannot be that birth weights are normally distributed, but there is a more far-reaching issue here.

Consider the probability that a randomly “picked” number from the unit interval will be rational. That probability is 0, because even though the rationals are “dense” (they satisfy the incorrect definition of “continuous” given in many an introductory statistics textbook, namely that there are infinitely many values between any two values in the set), they “fill out” a negligible “amount” of the unit interval (they have measure 0). Thus whenever we say that some variable like “weight”, “height”, etc., is normally distributed, we are asserting:

1) There is no interval of possible values this variable can take in which an irrational number doesn’t appear

2) The condition that between any two values there must exist infinitely many other possible values is wholly insufficient (alternatively, between any two values there are infinitely many rational numbers AND infinitely many irrational numbers)

3) If we remove all rational values from the set of all possible values this variable can take (alternatively, if we remove all the rational points along the x-axis under the normal curve of this variable), what is left over is essentially the same (we have removed an “amount” of measure 0).
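The “measure 0” claim in point 3 can be made concrete with the standard covering argument: list the rationals, put the n-th one inside an interval of length ε/2^(n+1), and the total length of the cover stays below ε however many rationals are listed. The following is only a minimal sketch of that argument; the function names and the cutoff `max_denominator` are assumptions made for illustration, and a finite enumeration can of course only illustrate the bound, not prove it.

```python
from fractions import Fraction
from math import gcd


def rationals_in_unit_interval(max_denominator):
    """Yield each rational in [0, 1] with denominator <= max_denominator once.

    Hypothetical helper: any enumeration of the rationals would do.
    """
    yield Fraction(0)
    yield Fraction(1)
    for q in range(2, max_denominator + 1):
        for p in range(1, q):
            if gcd(p, q) == 1:  # skip duplicates such as 2/4 == 1/2
                yield Fraction(p, q)


def cover_length(epsilon, max_denominator):
    """Return (count, total length) of a cover of the enumerated rationals.

    The n-th rational (n = 0, 1, 2, ...) is covered by an interval of
    length epsilon / 2**(n + 1), so the total is a partial geometric sum
    and is always strictly less than epsilon.
    """
    total = Fraction(0)
    count = 0
    for n, _ in enumerate(rationals_in_unit_interval(max_denominator)):
        total += epsilon / 2 ** (n + 1)
        count += 1
    return count, total


if __name__ == "__main__":
    eps = Fraction(1, 1000)
    for cutoff in (5, 20, 50):
        count, total = cover_length(eps, cutoff)
        print(cutoff, count, float(total), total < eps)
```

With `max_denominator=5` the sketch enumerates 11 rationals, and for every cutoff the covered length stays below the chosen ε. Since ε can be taken as small as we like, the rationals in the unit interval can be covered by intervals of arbitrarily small total length, which is what having measure 0 means.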

Given that we often treat as normally distributed variables that are actually far more clearly “discrete” than something like all present, past, and future birth weights, to what extent are we justified in doing so? Alternatively, to what extent are we justified in using as a basis for hypothesis testing or statistics more generally a formulation of probability theory that isn’t measure-theoretic (i.e., one in which the distinction between continuous and discrete variables is dismissed as artificial and unnecessary)?
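One possible way to put a number on “to what extent” is to compare the distribution function of a normal variable with that of the same variable after it has been rounded to a measurement grid (so that every recorded value is rational). The sketch below is only an illustration under stated assumptions: the function name `rounding_gap`, the truncation parameter `span`, and the values used (mean 3400 g, standard deviation 500 g, resolution 10 g) are hypothetical and are not taken from any data set.

```python
from statistics import NormalDist


def rounding_gap(mu, sigma, resolution, span=8.0):
    """Largest CDF gap between a normal variable and its rounded version.

    The rounded variable takes only multiples of `resolution`; on the cell
    [k*r, (k+1)*r) its CDF is constant and equals the normal CDF evaluated
    at (k + 0.5)*r. The gap is therefore the largest probability mass of
    any half cell. Cells further than `span` standard deviations from the
    mean are ignored (an assumption: their mass is negligible).
    """
    model = NormalDist(mu, sigma)
    k_lo = int((mu - span * sigma) // resolution)
    k_hi = int((mu + span * sigma) // resolution) + 1
    gap = 0.0
    for k in range(k_lo, k_hi + 1):
        step = model.cdf((k + 0.5) * resolution)  # rounded CDF on this cell
        left = step - model.cdf(k * resolution)  # gap at the cell's left end
        right = model.cdf((k + 1) * resolution) - step  # gap at its right end
        gap = max(gap, left, right)
    return gap


if __name__ == "__main__":
    # Hypothetical values: weights in grams recorded to the nearest 10 g or 1 g.
    print(rounding_gap(3400.0, 500.0, 10.0))
    print(rounding_gap(3400.0, 500.0, 1.0))
```

Under these assumed values the gap is small and shrinks as the resolution gets finer. That says nothing about whether the underlying variable is “really” continuous; it only measures how far a discretized variable sits from the continuous model that is used to describe it.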
