Can things like weight or height be normally distributed?

07 July 2015

In undergraduate (and some graduate) statistics courses required for degrees in many of the sciences, it is not uncommon to find height or weight offered as an example right after the definition of a “continuous variable”, or as an example of a continuous variable that is normally distributed (the height and weight of a population are fairly common examples of normally distributed variables). The most common statistical tests used in (null hypothesis) significance testing assume that the variable in question is continuous (otherwise, there wouldn’t be an issue of how robust, e.g., ANOVA or t-tests are to violations of normality assumptions).

Let’s assume that human birth weight (whether of humans currently living or of all humans past, present, and future) is actually normally distributed, or simply that it is continuously distributed (a far weaker assumption). The number of human births, even if we consider all human births that ever were or ever will be, is at most countably infinite. Therefore, there is some one-to-one function that maps all birth weights that ever were or will be into the set of rationals in the unit interval, but there is no one-to-one correspondence between the set of all such birth weights and the uncountable interval [0,1] (the interval in which every continuous cumulative distribution function takes its values). It cannot be that birth weights are normally distributed, but there is a more far-reaching issue here.

Consider the probability that a number “picked” at random from the unit interval will be rational. That probability is 0, because even though the rationals are “dense” (they satisfy the incorrect definition of “continuous” given in many an introductory statistics textbook, namely that there are infinitely many values between any two values in the set), they “fill out” a negligible “amount” of the unit interval (they have measure 0). Thus, whenever we say that some variable like “weight”, “height”, etc., is normally distributed, we are asserting:

1) There is no interval of possible values this variable can take in which an irrational number doesn’t appear

2) The condition that between any two values there must exist infinitely many other possible values is wholly insufficient (alternatively, between any two values there are infinitely many rational numbers AND infinitely many irrational numbers)

3) If we remove all rational values from the set of all possible values this variable can take (alternatively, if we remove all the rational points along the x-axis under the normal curve of this variable), what is left over is essentially the same (we have removed an “amount” of measure 0).
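
Point 3 rests on the fact that a countable set such as the rationals can be covered by intervals whose total length is as small as we like. Here is a minimal Python sketch of that covering argument. The enumeration order, the names `rationals_in_unit_interval` and `cover_length`, and the value of `eps` are illustrative assumptions, not anything taken from a textbook or from the question itself.

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def rationals_in_unit_interval():
    """Yield every rational in [0, 1] exactly once, ordered by denominator."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:  # lowest terms only, so no value repeats
                yield Fraction(p, q)
        q += 1


def cover_length(n_points, eps):
    """Total length of intervals covering the first n_points rationals.

    The k-th rational (k = 0, 1, 2, ...) gets an interval of length
    eps / 2**(k + 1), so the total stays below eps for any n_points.
    """
    eps = Fraction(eps)
    total = Fraction(0)
    for k, _ in enumerate(islice(rationals_in_unit_interval(), n_points)):
        total += eps / 2 ** (k + 1)
    return total


if __name__ == "__main__":
    eps = Fraction(1, 1000)  # hypothetical tolerance, chosen for illustration
    for n in (10, 100, 1000):
        # However many rationals we cover, the total length never reaches eps.
        print(n, float(cover_length(n, eps)))
```

Because `eps` can be made arbitrarily small while the cover still contains every rational in the enumeration, the rationals in [0,1] cannot occupy any positive length, which is what "measure 0" means here.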

Given that we often treat as normally distributed variables that are far more clearly “discrete” than something like all past, present, and future birth weights, to what extent are we justified in doing so? Alternatively, to what extent are we justified in basing hypothesis testing, or statistics more generally, on a formulation of probability theory that isn’t measure-theoretic (the measure-theoretic formulation being one in which the distinction between continuous and discrete variables is dismissed as artificial and unnecessary)?
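
One way to make the "to what extent" question concrete is to measure how far a rounded (hence discrete) normal variable sits from the continuous normal it came from. The sketch below computes the largest gap between the two cumulative distribution functions. The function names, the mean of 3400 g, the standard deviation of 500 g, and the rounding steps are hypothetical values picked for illustration, not data from any study.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def max_cdf_gap(mu, sigma, step, n_sigmas=6):
    """Largest gap between the CDF of N(mu, sigma^2) and the CDF of the
    same variable rounded to the nearest multiple of step.

    The rounded variable's CDF is a step function that jumps at each
    multiple of step, so the gap is largest at, or just below, a jump.
    """
    lo = int((mu - n_sigmas * sigma) // step)
    hi = int((mu + n_sigmas * sigma) // step) + 1
    gap = 0.0
    for k in range(lo, hi + 1):
        x = k * step
        at = normal_cdf(x, mu, sigma)
        below = normal_cdf(x - step / 2, mu, sigma)  # rounded CDF just left of x
        above = normal_cdf(x + step / 2, mu, sigma)  # rounded CDF at x
        gap = max(gap, above - at, at - below)
    return gap


if __name__ == "__main__":
    mu, sigma = 3400.0, 500.0  # hypothetical birth-weight parameters in grams
    for step in (1, 10, 100):  # hypothetical measurement resolutions in grams
        print(step, max_cdf_gap(mu, sigma, step))
```

For small steps the gap shrinks roughly in proportion to step / sigma, so a variable recorded on a fine grid relative to its spread is, in this CDF sense, close to its continuous idealization even though it takes only countably many values.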


More of Andrew Messing's questions

Is there any use in constructing/defining integration over (some subset of) the rationals?

I was working on 2 papers on statistics when I recalled a study I’d read some time ago: “On ‘Rethinking Rigor in Calculus...,’ or Why We Don't Do Calculus on the Rational Numbers”. The answer is...

05 June 2015 7,928 85 View

Which math textbooks do you (or would you like to) use to teach?

Most of the time, I don't get to choose the textbooks I use for undergraduates (the main exception is when I'm asked to tutor, not teach a class). But I am always trying to persuade faculty or...

04 May 2015 8,338 1 View

Likert, Language, Linguistics, and Loss: How do we justify the use of Likert-Type Response Data?

I’m working on a paper on Likert-type scales, as well as a statistical measure/test that sort of emerged by accident whilst working on the paper. However, I was hoping for some preliminary...

04 May 2015 7,245 13 View

Know of any free (legal) resources, libraries, etc.?

I spend a lot of time doing consult work, teaching, tutoring, etc., both volunteer and paid, and over the years (and unintentionally) I started reviewing and documenting freely available sources...

02 March 2015 7,420 7 View

Do you know of any good databases/datasets?

I often see people asking for data, databases, and similar resources, so I thought it would be nice to start a dialogue in which we can share ones we know of and learn of those others use....

02 March 2015 6,677 14 View

Can creationism contribute to science?

I was ranting during a conversation/debate with a family member’s friend when a thought occurred to me (apparently even I can’t stand listening to me). Can we dismiss pseudoscience and fringe...

02 March 2014 8,788 7 View
