The standard deviation is usually quoted alongside the mean, as a matter of convention. However, it rarely tells you anything useful about the dispersion of the data in the sample or the underlying population, and probably never tells you anything more useful than quoting percentiles or an interquartile range.
So why do we use it at all?
One use that is probably justifiable is calculating effect sizes such as Cohen's d. But is anyone prepared to defend its routine appearance in tables of descriptive statistics, and its disempowering appearance on undergrad stats courses?
And while I am at it, why does no-one ever present the confidence interval for the standard deviation?
Singling out the SD seems unfair. The same question can be posed about all other moment-based estimators. Just as the median absolute deviation is a robust estimator of dispersion, the median is robust for centrality while the mean is not. So why quote the mean? Skewness and excess kurtosis are also influenced by the standard deviation, so these are not independent descriptives either. The 'middle' of a skewed distribution is just as ambiguous as a single value for measuring its spread.
I would say that the use of the SD (and the mean) can be justified by the central limit theorem. For the normal distribution, the SD serves as a yardstick with a simple and unique relationship to the (symmetric) confidence interval for a chosen confidence level. If your sample size is large enough, this is usually sufficient to apply it. Also, for small data sets the standard error of the SD is typically smaller than that of the IQR or other quantile-based measures, because metrics that operate closer to the ends of the data set have larger uncertainty as a result of lower density. The closer a measure of spread is defined to the bulk, i.e. the centre, of the data, the lower its uncertainty.
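The claim about the relative uncertainty of the SD versus quantile-based spread measures can be checked with a small simulation. This is only a sketch under an assumed normal model, using the IQR rescaled by 1.349 (the IQR of a standard normal) so that both estimators target sigma:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 20, 5000

# Draw many small normal samples and estimate sigma in two ways:
# the sample SD, and the IQR rescaled by 1.349 (the IQR of a
# standard normal), so that both estimators target sigma = 1.
samples = rng.normal(0.0, 1.0, size=(reps, n))
sd_est = samples.std(axis=1, ddof=1)
q75, q25 = np.percentile(samples, [75, 25], axis=1)
iqr_est = (q75 - q25) / 1.349

# The spread of the estimates across replications approximates
# each estimator's standard error.
print(f"spread of SD-based estimates:  {sd_est.std():.3f}")
print(f"spread of IQR-based estimates: {iqr_est.std():.3f}")
```

Under normality the SD-based estimate varies noticeably less across replications, which is the efficiency argument in the post above.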
Good question! I also think that SD is only used because others use it. If I'd like to give a summary of my *data* I would provide some quantiles (e.g. median and IQR).
I have never found a reference explaining how the SD was "invented" as a dispersion measure. Its first use goes back to Karl Pearson. Interestingly, the SD was not derived simply as the average squared residual but as a summary of the data appearing in the normalization factor of the likelihood function. This normalization factor is the standard error, which was developed much earlier by Gauss(*) and others. It was the use of the normal distribution that turned out to express our expectations about symmetric and independent "errors". This model had, besides a location parameter, a dispersion parameter (the variance). Using this model led to a likelihood function with a dispersion parameter, too (the standard error). This must have been the starting point for Pearson to recognize that the standard error is a function of the square root of the variance of the normal distribution. Pearson was analyzing frequency properties of outcomes, so the normal model for the likelihood was only useful to him when the frequency distribution of the data resembled a normal distribution, too. Unfortunately, the dispersion in this model is expressed in squared units of the original data, which is not very intuitive. He may therefore have felt it easier to report the square root of this variance as a measure of dispersion in the data.
These are just my thoughts, I may be wrong. But I haven't found any better explanation yet.
(*) Gauss never described how he derived the normal distribution. It seems he intuitively recognized it as the solution for which the maximum likelihood estimate for symmetric errors is the sample mean. There are several derivations of the normal distribution available, none of which assumes any "real" frequency distribution of anything. It is a probability distribution, not a frequency distribution. My favourite derivation is that of Maxwell and Herschel. However, Galton and others were so fascinated that many frequency distributions nicely resemble a normal distribution, and this surely influenced Pearson. Neyman's test theory finally requires a match between the frequency distribution and the probability model (only then is a control of error rates possible).
There are two important uses for the SD reported with experimental results. The first is the descriptive function and its special relationship with the CI and the normal distribution. The second is to verify the statistical results reported.
It seems incredible, but many researchers do not know that, with the mean and SD, you can check the p-values reported. It is not surprising to find wrong p-values reported, even leading to wrong decisions about the hypothesis. The check can be performed easily for t-tests and ANOVA.
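This check is straightforward in practice: `scipy.stats.ttest_ind_from_stats` recomputes a two-sample t test from the summary statistics alone. A sketch with made-up numbers of the kind a paper might report:

```python
from scipy import stats

# Hypothetical summary statistics as a paper might report them:
# sample size, mean and SD for two independent groups.
n1, m1, sd1 = 30, 5.2, 1.1
n2, m2, sd2 = 28, 4.6, 1.3

# Recompute the two-sample t test from the summaries alone and
# compare the result with the p-value reported in the paper.
t, p = stats.ttest_ind_from_stats(m1, sd1, n1, m2, sd2, n2)
print(f"t = {t:.2f}, p = {p:.4f}")
```

If the recomputed t and p disagree with the reported values, something in the paper's numbers is inconsistent.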
Irrespective of whether the parent distribution is normal or not, the SD does have a role in some useful inequalities regarding the variable under inspection. See Chebyshev's inequality, for instance.
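A quick numerical illustration of Chebyshev's inequality: even for a strongly skewed sample (exponential here, purely for illustration), at least 1 - 1/k² of the values lie within k SDs of the mean:

```python
import numpy as np

rng = np.random.default_rng(1)
# A strongly skewed, decidedly non-normal sample.
x = rng.exponential(scale=2.0, size=100_000)
m, s = x.mean(), x.std()

# Chebyshev: at least 1 - 1/k^2 of any distribution lies
# within k standard deviations of the mean.
for k in (2, 3):
    within = np.mean(np.abs(x - m) < k * s)
    bound = 1 - 1 / k**2
    print(f"k={k}: observed {within:.3f}, Chebyshev bound {bound:.3f}")
```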
I wanted to point out that the SD is not a "primary statistic" of the variability of values with a more or less practically interpretable meaning in a supposedly normally distributed variable. Rather, the SD is derived indirectly from the standard error (of the mean), which in turn was not a measure of the variability of data but a dispersion measure for expectations, a parameter of the likelihood function. This is just the reverse of the usual textbook line of argument: that (1) the mean deviation from the mean is always zero, that (2) the mean absolute deviation is positive but has "bad mathematical properties", and that finally (3) the mean squared deviation (= variance) is also positive but has "nice" mathematical properties. And because this measure is in the squared unit of the variable, a more intuitive measure is the square root of the variance, which is then called the SD. In further steps the SE is derived from the SD. I learned it this way and I always found it confusing. The most interesting questions were never answered by this. The unfortunate lack of distinction between probability and frequency distributions subsequently caused a lot of misunderstandings and conceptual problems. By all I know now, the connection runs just the other way around, so the SE is a "fundamental" statistic, and the SD is indirectly derived. This way, everything makes much more sense to me.
I never wanted to say that the SD, the mean, the skew or any other characteristic of a distribution model is per se inappropriate, misleading, or useless. They have their places and meanings. But I find it unpropitious to use the SD as a dispersion measure of the data, because its meaning is much deeper and much more indirect than many textbooks suggest. The same is the case for the mean (average), which is also much more than a simple location measure for the data. It took mankind 100+ years to understand this (despite the ease of calculating it, the mean was usually NOT used in the empirical sciences of the 17th/18th century *because* this was not clear!), and now that we know it, textbooks still and excessively introduce the mean as a simple summary statistic of the data (often demonstrated by the example of weights placed on a balance), which I find unfortunate because it hinders a deeper conceptual understanding. This balancing thing is not a "reason" for the mean; it is more an "effect". This introduction confuses "cause" (reason) and "effect", subsequently making statistics so difficult and puzzling for many, and often ends in using statistics to answer questions that were never asked and interpreting results wrongly.
As I understand it, as an ecologist and not a statistician, the main problem is that there are two standard deviations in the literature: the standard deviation of the sample (SD), and the standard deviation of the mean, which is better called the standard error (SE). For a normal distribution and a sample size of n we usually indicate confidence limits on the mean m as m ± SE, where SE = SD / √n. The 95% confidence limits are m ± 1.96 SE. Too many researchers present the sample SD instead of the SE and do not always give the sample size. Note that using the SD in place of the SE always makes your results look worse!
Peter, I have to veto ;)
A minor point is your observation that the SD is used over the SE ("Too many researchers present the sample SD instead of the SE"). I see the opposite. In the biological and medical literature I am vaguely aware of, the SE is very typically presented, and only rarely does someone really show the SD. The problem here is that many readers and authors may not clearly understand the difference and sometimes use either of the two for the wrong purpose. Besides this, I am of the opinion that the SE should actually *not* be presented as such anyway, because its interpretation is difficult and depends on the sample size (for large samples it depicts roughly a 68% CI, but for small samples the confidence depends strongly on the actual sample size), and it is always better to provide an interval for a defined confidence level (e.g. the 95% CI). Btw: your 1.96*SE is only OK for large samples (for the same reason as described before). Better to use t[0.025,df]*SE instead.
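The 1.96 vs t[0.025,df] point can be illustrated with a small hypothetical sample; for n = 8 the t-based interval is noticeably wider than the large-sample z-based one:

```python
import numpy as np
from scipy import stats

# A hypothetical small sample (n = 8).
x = np.array([4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.4, 5.8])
n = len(x)
m = x.mean()
se = x.std(ddof=1) / np.sqrt(n)

z_half = 1.96 * se                          # large-sample approximation
t_half = stats.t.ppf(0.975, df=n - 1) * se  # exact under normality

print(f"95% CI (z): {m - z_half:.2f} .. {m + z_half:.2f}")
print(f"95% CI (t): {m - t_half:.2f} .. {m + t_half:.2f}")
```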
More dangerous, I find, is your statement "Note that using the SD in place of the SE always makes your results look worse!". Science is not a beauty contest; I mean to say that the aim is not to have "nice-looking error bars". Error bars (or intervals) should have a particular meaning and interpretation that the reader understands (or at least *could* understand). As said above, SEs aren't easy to interpret at all; many people are just used to seeing them, so they feel comfortable seeing SEs without actually being able to derive any quantitative information from them (why show them, then?). CIs, in contrast, have a simple, straightforward interpretation (the range of non-rejectable hypotheses) that can often be given a direct Bayesian interpretation as well (as the range of hypotheses covering 95% credibility, or a posterior predictive interval, given that a flat prior is used [which is often not unreasonable]).
So the bottom line, I think, is that we agree: if your aim is to show an estimate for some parameter (a mean or an effect size), provide a confidence interval. If your aim is to give a summary value for the dispersion(*), you can use the SD (but I would still say that giving the IQR or some other inter-quantile range is more appropriate).
(*) The dispersion can be seen as a parameter, too. But here the variance would be the better choice. And again provide a CI for the variance estimate!
Thank you Jochen, I was aware that the 1.96*SE was an approximation to a 95% confidence interval (CI) and your t[0.025,df]*SE is useful. When I said "Too many researchers present the sample SD instead of the SE", I meant that some make the error of presenting the mean ± sample SD, in the place of ± SE, to indicate the CI. Any errors of this sort are "too many". I agree that it would be better to quote the mean ± CI.
About other coefficients of dispersion, some of my colleagues like to use the coefficient of variation (CV), given as 100*SD/mean.
Luis, that is right. In addition to its other uses, the application of the SD in the calculation of the statistical power of a test is one of its underutilized aspects.
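As a sketch of this use, here is a normal-approximation power calculation for a two-sample t test, where the SD enters through Cohen's d (the numbers are hypothetical planning values, e.g. from pilot data):

```python
import numpy as np
from scipy import stats

# Hypothetical planning values: expected mean difference and SD.
# The SD enters as the denominator of the standardized effect size.
diff, sd = 0.5, 1.0
d = diff / sd   # Cohen's d
n = 64          # per group
alpha = 0.05

# Normal approximation to the power of a two-sample t test.
z_alpha = stats.norm.ppf(1 - alpha / 2)
power = stats.norm.cdf(d * np.sqrt(n / 2) - z_alpha)
print(f"approximate power: {power:.2f}")
```

With d = 0.5 and 64 per group this lands near the conventional 80% power target.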
I fully agree with Arnaut. Singling out the SD is unfair.
In my opinion the disempowering aspect of undergrad stats courses is the focus on the mean and SD at the expense of the other moment estimators. An even more disempowering aspect is the focus on the normal distribution, and all the fudging (e.g. transformations) to fit non-normal data to a normal distribution.
Dear colleagues: with the sample size (N), mean (m) and SD we can replicate the results reported. This is a parametric approach. If we use a non-parametric approach (median and interquartile range) we need the whole database to replicate the study. I have written a very simple program in BASIC to do this.
I don't see how you can replicate the study. And the only way you can replicate data from a mean and standard deviation is under the assumption of a particular distribution.
Please explain!
@Javier: Are you saying you can recapture the whole data set from its mean and SD?
Dear colleagues: it is very usual to report results using the number of cases, the mean and the SD for each group.
For instance:
Group Study: N, Mean, SD
Group Control: N2, Mean 2, SD 2
If the variable has a normal distribution you can compare both groups even without the original database. This can be done with a one-way analysis of variance or a Student's t test. That is not the case for non-Gaussian variables. In this situation investigators show N, median and IQR for each group, and then we cannot calculate anything without complete access to the original, raw observations. The Kruskal-Wallis test runs by ranking the original observations. The same is true for the Mann-Whitney or Mood test.
This is the sense of my first answer. Obviously I don't recapture anything.
Thank you for your interest.
The SD is useful for believers in normal distributions, but it is not relevant for non-parametric statistics. I see no clear sense in its definition: it is quadratic, it is not easy to explain to students, and it tends to infinity when extreme values appear in the sample. There are many good critics of it who have written demolishing articles, in the past century and in the 21st.
Luis: The circumstances under which the mean and standard deviation summarise what one wants to know about a problem are rare, in my experience. I want to see the data, I am interested in extremes, in clustering of values, and in the shape and granularity of the distribution.
Emilio: I would be very interested in references to critics of the SD. I have been trying to find critical papers, and find the sort of silence that suggests no-one bothers to think about it!
The reporting of the mean and SD has a historical link to the normal distribution and the concept of sufficient statistics. Although the mean and SD are limited in value and approximate normality will often not hold, good summary statistics such as the mean and SD are very useful.
First, you can tell a lot from the SD. If you have approximate normality then the SD has heuristic value because it tells you the approximate average distance of a typical point from the mean. If the quantity is bounded then the SD can tell you whether the data are skewed and alert you to ceiling effects or other anomalies. For example, a measure from 0 to 100 with mean = 20 and SD = 15 indicates positive skew.
Second, it is useful for all sorts of other computations (effect size, power, etc.).
Third, it is an invaluable data quality check. This could be because you want to check the accuracy of an analysis as a reviewer, to detect fraud, or to check for errors in published work. Several recent prominent cases of fraud hinge on the reported SDs not matching the inferential statistics.
For most research you need, at minimum, to report either the SD or something similar (SE, or t plus differences in means) to do these kinds of checks. One can argue that a good graphical summary and the raw data are even better.
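The bounded-scale check from the first point can be made concrete: a normal model with mean 20 and SD 15 on a 0-100 scale would place an impossible fraction of values below zero, so data with those summaries cannot be symmetric-normal. A sketch:

```python
from scipy import stats

# On a 0-100 scale, a normal model with mean 20 and SD 15 would put
# a non-negligible fraction of values below the impossible value 0,
# so data with these summaries must be positively skewed.
p_below_zero = stats.norm.cdf(0, loc=20, scale=15)
print(f"implied mass below 0: {p_below_zero:.3f}")
```

The implied mass is around 9%, far too much to reconcile with a hard floor at zero.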
As a teaching concept I agree that the SD on its own is not useful - but its importance is bound up with the concept of variance (essential), standard errors (essential), correlation (a necessary evil) and so forth.
As for the CI for the SD - I do cover that in my own book.
Ronan, for the moment two references:
1) "The Myth of the Bell Curve" by Ted Goertzel. Adapted and condensed from: Ted Goertzel and Joseph Fashing, "The Myth of the Normal Curve: A Theoretical Critique and Examination of its Role in Teaching and Research", Humanity and Society 5:14-31 (1981); reprinted in Readings in Humanist Sociology (General Hall, 1986).
2) "Most bell curves have thick tails" by Bart Kosko (Information Scientist and Professor of Electrical Engineering and Law, the University of Southern California; Author, Noise) (2006)
Luis, I certainly think that accepting any theory is an act of faith that includes its premises, tools, methods and terms. Some theories are more rational than others but less realistic. I cannot accept that in normal curves median = mean = mode, nor that half of the total distributed mass remains in the middle of the population when dispersion increases. If you have a uniform distribution and give a tiny piece of one half to the other half, the proportion of "richer" receptors in the taller half is immediately reduced. In a Lorenz curve this is represented by the point (Z, 1/2).
Thanks, emilio
Indeed, a sample standard deviation is not more informative than an interquartile range or some percentiles, and arguably in many situations less informative. So why do people use it? Because everyone else does. And why did people start using it? Because in the old days it was easy to calculate the average, and the average of squared deviations from the average, but sorting the data and computing various percentiles was a lot harder. In other words, what we do now is largely determined by the computing facilities of our ancestors. And sure, one could give confidence intervals for it (and for quantiles etc.) as well... but are we really trying to make inferences about a population? Maybe we are just trying to briefly summarize the data at hand.
Boxplots are excellent graphical summaries of data.
Dear colleagues,
When we use the SD we not only describe the population or sample but also make it possible to do inferences or comparisons between groups without reading the "whole database".
Obviously, if you only need to describe, you can use a boxplot or a stem-and-leaf plot. You can even examine the dataset by hand (not a good idea).
The standard deviation is as informative as the interquartile distance or the mean, median, mode, variance or whatever statistic you choose.
The standard deviation, as already stated several times above, is very attractive because of its relation to the normal distribution and because it can be used in parametric hypothesis tests. But what if I am not working with parametric techniques? What if my data are not even continuous? Then don't use it!
SDs are reported in order to add information. But if you don't know what that information is, don't use it! Yet we now have more and more "researchers" who think they know how to analyse data, do not want to study statistics (for lack of time), and do not want to pay a professional to analyse the data (because the money is needed for more important things). In the end, we spend a lot of money collecting data and then analyse it the way we "know".
Finally, consider the scenario where a statistician decides to perform microarray research without knowing anything about DNA, hybridization and that kind of thing, so he decides to do it on his own. What great research that will be!
But not...
My apprehension: why has no one made any reference to the use of the SD as a measure of volatility, when the entire financial world revolves around studying, benefiting from and overcoming volatility? At least as far as financial data goes, the SD is a very simple parameter to indicate volatility: check the SD pre and post an event, and if the post-event SD is higher, we say we are in higher volatility. Please correct me if I am wrong. Thanks.
Dr Durairajan
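The pre/post-event comparison described above can be sketched as follows (with simulated daily returns standing in for real financial data; the regime SDs are made-up illustration values):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated daily returns: a calm regime, then a turbulent one.
pre = rng.normal(0.0, 0.01, size=250)    # ~1% daily SD before the event
post = rng.normal(0.0, 0.02, size=250)   # ~2% daily SD after the event

vol_pre = pre.std(ddof=1)
vol_post = post.std(ddof=1)
print(f"volatility pre:  {vol_pre:.4f}")
print(f"volatility post: {vol_post:.4f}")
```

In practice real return series are heavy-tailed, which is exactly why the SD's adequacy as a volatility measure is questioned in the reply below this post's vicinity of the thread.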
Possibly the whole (quite unsuccessful) struggle of the financial world to overcome volatility is related to the fact that the SD might not be a good or appropriate measure of volatility ;)
Dear Ronan,
Once upon a time we had to draw our figures on strain differences by hand in the laboratory where we housed Mus musculus L. The SD told us a lot about the differences, even if we recalculated it into the SE to make the error bars fit into the figures.
The standard deviation is a powerful statistic, used in almost all statistical calculations that involve the extent to which one moves away from the mean or from any standard baseline used as a reference to determine a range of acceptability, such as in confidence interval calculations. Under a normal distribution the standard deviation helps one judge how small an error is, by expressing the error one commits in multiples of the standard deviation from the mean. Its practical use shows up in comparison (the coefficient of variation) and in measures of ranges of acceptance (confidence intervals) that make certain conclusions possible in both parametric and non-parametric data analysis.
http://en.wikipedia.org/wiki/Standard_error
Jochen - good point. Not just useless, but potentially toxic to the planet!!
James: I cannot agree. It is not a powerful statistic, just a statistic. It is not used in confidence interval calculations (that's the standard error). However, I am struck by your accidentally calling it the "standard devotion". There is something superstitious about the use of the standard deviation. I am suggesting that we rethink its mindless use.
It does not communicate to most people, and in most situations the inferences people draw from it are wrong.
Beatrice: I agree that calculating measures of dispersion is a useful way of looking at data, but really it's the standard errors that are useful in comparing groups, isn't it? And as for measuring dispersion, the standard deviation is not robust to outliers, so it takes just one determined Mus to mess up your SD.
Ronan, thanks for the correction of "devotion" to "deviation". Seriously, the only mystery about the standard deviation is that it is used to calculate the standard error, and professionals accept only what it helps to calculate (the standard error). So what makes the standard deviation potentially toxic if it takes a prime role in calculating the standard error, which you agree is of high importance since it helps in calculating the confidence interval and other similar statistics and parameters used in data analysis? I think the standard deviation still has a lot to offer.
While I have sympathy for Ronan's point, the standard error is the SD of the sampling distribution... Thus saying the SE is more informative than the SD is saying that the SD can be useful in some contexts.
Dear Ronan,
Very true, in our small mice groups it took only one mouse to mess up our distribution.
Bravo to that mouse, Béatrice! In every batch, no matter how we control for genes and environment, there beats a heart we cannot tame.
My motto used to be "I'm not an outlier; I just haven't found my distribution yet".
Well, and still, with large samples it is much more likely that the distribution model for the data will be demonstrated to be incorrect.
In http://www.nature.com/pr/journal/v74/n6/fig_tab/pr2013156f3.html the authors would expect 1.7% of the neonates to have a negative amount of body fat! With their sample size (581) they should expect 9 neonates with "negative body fat". There was not a single one!
The prevalence of breast cancer in the US is about 0.95% (http://seer.cancer.gov/statfacts/html/breast.html). This is a considerably lower proportion than the expected proportion of neonates with "negative body fat". There are breast-cancer screening programs, courses and clinics, and much money is invested in drugs. But the "negative body fat" problem seems to be much more severe! Why is nothing done there?! Why do we let all these babies die, taking no notice of their metaphysical problem?
This should demonstrate that the discrepancy between model and data needs to be judged for its relevance. Simply stating that a model is wrong is typically not helpful. As Tukey said: "All models are wrong. But some are useful".
Actually, it was George Box who said: "All models are wrong. Some models are useful".
But to get back to the standard deviation, the problems with it are due to lack of robustness. To a lesser extent the same applies to the mean, so should we abandon that as well? I say keep both but be aware that highly skewed or long tailed distributions make both statistics suspect. The lesson is that one should always examine the assumptions that an analysis is based on.
Evangelos: I take exception to the idea of using the standard deviation to define normal. There are a number of very serious reasons why this is wrong:
1. It confuses 'usual' with normal. If you look at "normal" Irish cholesterol levels, they are almost all in the range that places the person at high risk for cardiovascular disease. Worse, it implies that there is a "normal" range for blood lead levels. The normal level for blood lead is zero. Anything else is abnormal.
2. It suggests that the extremes of a parameter are abnormal. In fact, there are no real health problems associated with primary low cholesterol (though it can be low secondary to malignant lung disease, but that's another matter).
I also do not think that you can defend the SD because it's a step in the calculation of the standard error. The standard error is a basic requirement of data reporting. It allows us to measure the uncertainty associated with estimation. But the standard error does not require the calculation of the standard deviation – indeed, the concept of the standard deviation of a proportion or an odds ratio is meaningless.
There is actually a formula for the confidence interval for the standard deviation. The fact that no-one ever calculates the confidence interval suggests strongly to me that no-one cares about the standard deviation, and that its precision is therefore utterly unimportant.
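For reference, that formula inverts the chi-squared sampling distribution of (n-1)s²/σ² under normality; a sketch with simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(50, 10, size=30)   # hypothetical sample
n = len(x)
s2 = x.var(ddof=1)

# Under normality, (n-1)s^2/sigma^2 ~ chi-squared with n-1 df;
# inverting this gives a CI for the variance, and square roots
# give a CI for the SD.
lo = (n - 1) * s2 / stats.chi2.ppf(0.975, df=n - 1)
hi = (n - 1) * s2 / stats.chi2.ppf(0.025, df=n - 1)
print(f"sample SD: {np.sqrt(s2):.2f}")
print(f"95% CI for the SD: {np.sqrt(lo):.2f} .. {np.sqrt(hi):.2f}")
```

Note how wide this interval is for n = 30, which rather supports the point that the SD's precision goes unexamined.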
Nu?
The standard deviation can never be used to define normality. Check out our new paper:
Article TO DETERMINE SKEWNESS, MEAN AND DEVIATION WITH A NEW APPROAC...
It is the spread of the data that must always be considered when testing a hypothesis. Also, data from several studies can be pooled only when the variances of the studies are homogeneous. In medical research, I have seen a very large range of SDs in reports of sperm concentration over a period of 40 years. Such a wide range has been reported in every decade starting from 1970.
The question is: if you have to comment on the trend in sperm concentration, will you consider the SD in reporting the trend or not? If YES, then how? As a biostatistician I feel it is necessary to include the SD when reporting the trend in sperm concentration.
Please comment.
You are working on a problem that economists and social scientists have been working on for years – measuring inequality. There are a number of interesting indices of inequality, and a whole lot of useful theory you can read that will allow you, I think, to come up with a useful measure in your domain. The Wikipedia page is actually a good starting point:
https://en.wikipedia.org/wiki/Income_inequality_metrics
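As a minimal sketch of one such index, here is the Gini coefficient computed via the standard sorted-index (mean absolute difference) identity; the input values are purely illustrative:

```python
import numpy as np

def gini(values):
    """Gini coefficient via the sorted-index identity:
    G = sum_i (2i - n - 1) * x_i / (n * sum x), with i = 1..n
    over the values sorted in ascending order."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * x) / (n * x.sum())

print(gini([1, 1, 1, 1]))      # perfect equality -> 0.0
print(gini([10, 20, 30, 40]))  # moderate inequality -> 0.25
```

A Gini of 0 means perfect equality and values approaching 1 mean extreme concentration, which makes it far more interpretable for this purpose than an SD of the same data.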