First, it is essential to understand the difference between the SD (standard deviation) and the SE(M) (standard error of the mean).
The SD is a dispersion measure of the SAMPLE. For a normally distributed variable, you find roughly 68% of the sample data within ±1 SD of the mean.
The mean estimated from your sample is usually an estimator of the population mean. But when you draw more than one sample from the same population, the mean values will vary slightly. The SE(M) therefore gives you an estimate of that variation. The SE is calculated as SE = SD/sqrt(N).
As you can see, with growing N the SE(M) becomes smaller. That is why the power of statistical tests increases with increasing N: e.g., in a one-sample t-test, the difference between the sample mean and the hypothesized mean is divided by the standard error of the mean.
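A minimal sketch of this relationship (Python, with numpy assumed and made-up, IQ-like data) shows how the SEM shrinks as N grows while the SD stays roughly constant:

```python
import numpy as np

rng = np.random.default_rng(42)

for n in (10, 100, 1000):
    # hypothetical normally distributed trait (true mean 100, true SD 15)
    sample = rng.normal(loc=100, scale=15, size=n)
    sd = sample.std(ddof=1)      # sample SD (ddof=1: unbiased variance)
    sem = sd / np.sqrt(n)        # SE(M) = SD / sqrt(N)
    print(f"N={n:5d}  SD={sd:6.2f}  SEM={sem:5.2f}")
```

The SD hovers around 15 regardless of N, whereas the SEM drops by a factor of sqrt(10) with each tenfold increase in N.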
Thanks for this question! Actually, this is a little bit confusing for me too. I see a lot of publications where "mean±SD" is used, but I'm not sure whether it is mathematically correct. In some discussions I've heard that using "mean±SD" makes no sense from a statistical point of view and that we need to calculate the SE instead, because "mean±SE" is the one that represents the variability of the sampling distribution of a statistic.
Is there a good mathematician who could put things in order here?
If you measure a variable, e.g. a psychological trait, then the mean value of the sample is said to be an unbiased estimator of the population mean. But if you were to draw k samples with the same N, the measured mean values would differ. How large is that variability? The variability of the mean values is represented by the SE. Luckily, it is not necessary to actually draw k samples of size N from the population to determine it; it can be derived from the sample SD --> SE = SD/sqrt(N).
With the SE at hand, you can, for example, calculate a confidence interval for the mean, showing in which range the "true" population mean will lie: e.g. mean±(1.96*SE) for a 95% confidence interval, where 1.96 is the z score for alpha/2.
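As a small illustration (Python with numpy; the data and numbers are simulated, not from any real study), the CI calculation described above could look like this:

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=50, scale=10, size=200)  # simulated measurements

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(sample.size)   # SE = SD / sqrt(N)

# 95% CI: mean +/- 1.96 * SE (1.96 = z score for alpha/2 = 0.025)
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```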
The standard error is a rather useless statistic. As a measure of the precision of measurement of the mean, it is much less useful than the 95% confidence interval (it is, in fact, a sort of 68% confidence interval, which is about as useful as half a hat).
So rather than give standard errors, the recommendation is to give confidence intervals. Douglas Altman lists giving standard errors to describe data as a definite error in his classification of errors in statistical presentation (see page 2666 of http://www.medicine.mcgill.ca/epidemiology/moodie/AGLM-HW/Altman1998.pdf)
Also see a very good note by Altman here: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1255808/
"But I'm not sure if it is mathematically correct." - this is all mathematically correct (unless you don't make some calculation error). This is neither the problem nor the right question.
The right question is: is it reasonable to show the SD, or is it reasonable to show the SE? What is reasonable depends on several things. First of all: if the data are not approximately normally distributed, then the SD is typically unreasonable, and the SE may be reasonable only when the sample size is large. Then you have to ask yourself whether you want to show the dispersion of the DATA or the precision of an ESTIMATE. Both have their own distinct value (their "right to exist", so to say).
It may be important to report how dispersed the values of a variable are. Example: physiological measures like blood pressure, BMI, etc., also IQ and the like. This helps to judge whether any particular value (possibly your own BMI) is "normal" or "rather atypical".
In research, parameters are often estimated (like the mean BMI). It may be of interest to show how precisely such a parameter was estimated from the data used. However, there is a very bad habit that presumably causes a lot of confusion:
A typical research question is of the type "does my treatment have an effect on the BMI?" (e.g. "does a high omega-6 fatty acid diet reduce the BMI?"). The central question is *NOT* the BMI of the controls and *NOT* the BMI of the treated. The central question is about the DIFFERENCE in BMI between these groups. Therefore it makes only very limited sense to present the precision of the mean BMI for the two groups. The really relevant estimate is the difference between the groups, and for this estimate the SE could/should* be (but typically is not!) provided. An estimate like the difference, the treatment effect, does not have an SD; there is only an SE. Often the estimates of the "group means" are rather irrelevant, and so are their SEs. In simpler designs such group means and their SEs can be calculated easily, so they are presented. But when the design gets more complicated (multi-factorial designs, paired or longitudinal designs, designs with nested and/or random effects), the calculation of a simple group mean becomes cumbersome or even impossible, yet the effects are still estimated and the precision of these estimates can be given.
*presenting the CI should be preferred over the SE
I don't think so. It would make some sense to plot mean±2SD, as it covers approximately 95% of normally distributed data. However, most researchers I know are primarily interested in showing error bars that are as small as possible, and therefore they show the SE (simply because it usually gives the smallest error bars; they don't care much about the interpretation of what is shown). It may happen that they accidentally write "SD" instead of "SE"; such mistakes happen, but in that case the intervals shown (wrongly) indicate a higher precision or lower variance than was actually obtained (in your example it would be the other way around, which does not seem as bad).
"If what one wants to describe is the variability in the original sampled population then one should use the standard deviation (s.d.), whose expected value does not depend on the number of replicates. If one wants to compare treatment or group means, one should use con- fidence intervals (c.i.) or the standard error of the mean (s.e.). These last two statistics decrease in size with an increase in the number of replicates, reflecting increased reliability of the estimated means." - Aphalo et al., 2012
Funnily enough, the article about statistical errors is itself not free of errors. For instance, a confidence interval is not a range of values that is expected to contain the estimates of similar studies with a given probability.
You just have to watch these two amazing videos by Joshua Starmer (a geneticist) on his YouTube channel, where he explains statistics in a super easy and funny way...
StatQuickie: Standard Deviation vs Standard Error https://www.youtube.com/watch?v=A82brFpdr9g
StatQuest: The standard error https://www.youtube.com/watch?v=XNgt7F6FqDU
I don't know... maybe because most people don't realize how weird, indirect and counter-intuitive frequentist reasoning is? The mistake is obviously already made when interpreting p-values, and it is consequently transferred to the interpretation of CIs.
Also a nice one: "Most studies report several P values, which increases the risk of making a type I error: such as saying that a treatment is effective when chance is a more likely explanation for the results" - the common but wrong view that P values are related to the type-I error, and the treatment of "chance" as an explanation. This is also a wrong concept I read very, very often.
Another very common "mistake" is that almost everything is reduced to t-tests. The text also focuses too much on t-tests. It gives the (wrong) impression that a "parametric test" is always a t-test. And the "Wilcoxon rank-sum test (or another non-parametric test)" is proposed as an alternative in cases where the t-test is not appropriate (but it then tests an entirely different hypothesis!). This, too, is written very frequently.
Less common is the bad example given in Fig. 5. This is just a case where modelling a linear relationship is perfectly fine (but it is used as a counter-example). One may check whether something went wrong with the single outlying point, but this point does not have much impact (the CI for the slope is [3.1, 4.4] with this point and [3.0, 3.3] without), rather than considering some strange non-linear relationship that explains this point.
I also sometimes see bar charts with broken axes. I don't consider this good practice (using bar charts is often not good practice in the first place!).
However, the manuscript also makes many good and valid points. It's worth reading.
Building on the answers of Prof. Jochen and Prof. Kenneth Carling: as Jochen made clear, the SD is about the variation in a variable, whereas the standard error is about a statistic (calculated on a sample of observations of a variable), and the SEM about the specific statistic "mean". If you want to describe the variation of a (normally distributed) variable, use the SD; if you want to describe the uncertainty of the population mean based on a sample mean (when the central limit theorem is applicable), use the SEM.
With respect to our statistician colleagues, here is my take on the difference between SD and SEM.
Standard deviation (SD) quantifies the dispersion or variability of the "population/dataset" around the mean of that particular "population/dataset". So the SD is a measure of the variability within a "population/dataset".
Standard error of the mean (SEM) is a measure that quantifies how far your "sample" mean is likely to be from the "true" mean of the "population". The SEM is simply the SD of the averages of repeated experiments. The lower the SEM, the more likely it is that your calculated mean is close to the actual mean of the "population". In other words, the SEM quantifies the precision of the mean.
In an ideal world (if we had all the time, energy, and sanity), the SEM could be obtained as the SD of "all of the averages (means)" of repeated samples from a population/dataset. In other words, the SEM measures the variability of a "sample" mean around the true "population" mean by calculating the SD of all those averages.
For example, you can estimate the mean and SD of the pulse rate of an athlete on one day by serial measurements of the pulse rate. The SEM of the pulse rate for the same athlete can then be measured by a) serially calculating mean pulse rates over several assigned days, followed by b) calculating the SD of the averages of those pulse rates across all measurement days!
Thankfully, statisticians provide a formula to estimate the SEM without having to repeat all those experiments, and without compromising the integrity of what it means. The classic formula: the SEM is calculated by dividing the SD by the square root of the sample size of the experiment, SEM = SD/sqrt(N).
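A quick simulation (Python/numpy; the pulse-rate numbers are invented for illustration) confirms that the formula agrees with the "repeat the experiment many times" definition above:

```python
import numpy as np

rng = np.random.default_rng(7)
n_per_day, n_days = 20, 5000   # 20 pulse measurements per day, many "days"

# each row is one day's measurements (true mean 60 bpm, true SD 8 bpm);
# one "experiment" = the mean pulse rate of one day
daily_means = rng.normal(60, 8, size=(n_days, n_per_day)).mean(axis=1)

print("SD of the daily means (direct):", daily_means.std(ddof=1))
print("SEM formula, SD/sqrt(N):       ", 8 / np.sqrt(n_per_day))
```

Both numbers come out near 1.79, so the shortcut SEM = SD/sqrt(N) reproduces what the laborious repeated-experiment route would give.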
The standard deviation (SD) is a measure of the dispersion of the sample. It will generally be larger than the standard error of the mean, which is simply, as Maryam points out, SD/√n. The SE thus corrects the SD for the size of the sample: small samples from a population will have larger SEs than large samples from the same population. It standardizes the dispersion measure by sample size.
Again, the two most popular answers are those of self-appointed experts who do not know their statistics. Thank you, Maryam, for your fine, statistically accurate description, free of opinionated bias and excessive hubris.
The statement "*presenting the CI should be preferred over the SE" is misleading. The SE is a simple nomalization of the SD and the CI is a simple function of the SD. The CI and the SD contain the very same information. They are mathematically confounded.
Could you please elaborate on why the statement "presenting the CI should be preferred over the SE" is misleading? Also, the CI, SE and SD do not contain the same information and have different purposes. You are right insofar as the sample size is implicitly part of the SD calculation, but with the SD value alone you can't say anything about the width of the SE, and hence not of the CI. Both are needed for frequentist inference, and each has its own purpose. Confounded, yes, because the SD is the upper limit of the SE, but there is nothing more you can say, since N is independent of the population standard deviation, for which the SD is an estimate.
Rainer, the statement is misleading because it suggests that the CI has unique information independent of the SD and/or SE. If you know the SD, you can calculate the CI, which is a very convenient way to present the variability around the mean but is mathematically derived from the standard error. The CI is an easy and convenient function of the standard deviation, the critical value and the sample size (in other words, of the SE).
Both the SD and the CI add needed information, as different expressions of the same thing: the variance. The mean is really not useful without a measure of the variability around it. The p-value does not give you the upper and lower limits of the CI, but it does tell you whether the CI covers the values expected under the null.
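A small sketch of this duality (Python with numpy/scipy; the data are simulated and the null value is made up) checks that the one-sample t-test p-value falls below 0.05 exactly when the 95% CI excludes the null value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(loc=5.4, scale=1.0, size=25)
mu0 = 5.0                                    # value expected under the null

p = stats.ttest_1samp(sample, popmean=mu0).pvalue

n, mean = sample.size, sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)        # two-sided 95% critical value
ci = (mean - t_crit * se, mean + t_crit * se)

covers = ci[0] <= mu0 <= ci[1]
print(f"p = {p:.4f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
print("p < 0.05 exactly when the CI excludes mu0:", (p < 0.05) == (not covers))
```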
If you are inferring to the population from the estimates of a sample, the standard error of the mean (SEM) is preferable; but when you simply want to describe the data of a given sample, you can use the standard deviation (SD). By the way, the sample size needs to be considered as well.
"SD and/or SE. If you know the SD, you can calculate the CI, which is a very convenient way to present the variability around the mean but is mathematically derived from the standard error."
Wisam and Leake, thank you for quoting me verbatim in your answer. For the more detailed answer, please see my post of a year ago, where you will find these exact statements.
Furthermore, the two "most popular" answers from Jochen and Ronan still have misleading information. The CI and the SD contain absolutely the same variance (σ) information around the mean. The CI merely gives the upper and lower bounds of the interval most likely to cover the mean. As a simple function of the standard deviation, it is a very convenient way to display the standard error for a given level of confidence (say 95%):
Upper limit CI = mean + 1.96 × (s/√n)
Lower limit CI = mean − 1.96 × (s/√n)
or
CI = (mean − 1.96 × SE, mean + 1.96 × SE)
where s is the sample standard deviation and SE is the standard error of the mean, which corrects for sample size.
Indeed, it can be readily seen that the CI is a function of the standard deviation (and the mean). In other words, the SE and the CI are completely confounded with respect to one another: the CI cannot be defined in the absence of the estimated standard deviation. The CI is simply far easier to interpret than mean ± SE.
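To illustrate the confounding numerically (a sketch in Python; the summary numbers are made up), one can go from (mean, SD, n) to the CI and back again without losing anything:

```python
import numpy as np

mean, sd, n = 24.0, 3.5, 50                 # hypothetical summary statistics

se = sd / np.sqrt(n)                        # SE from SD and n
ci = (mean - 1.96 * se, mean + 1.96 * se)   # CI from mean and SE

# ...and back: the CI width returns the SE, and with n, the SD
se_back = (ci[1] - ci[0]) / (2 * 1.96)
sd_back = se_back * np.sqrt(n)
print(np.isclose(se, se_back), np.isclose(sd, sd_back))  # same information
```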
This is not a trivial question. Many researchers, even statisticians, misuse SD and SEM. In most cases, SD should be reported, instead of SEM. I used to be confused too.
The SD describes the variation of the values of a variable, while the SEM is the standard deviation of the sample means. In other words, the SD is about how spread out the data values in the sample/population are; the SEM is about the uncertainty (or precision) of the sample mean (as means vary when taking a new sample from the same population). Mathematically, SEM = SD/sqrt(N), where N is the sample size. As N >= 1, SD >= SEM. Thus, researchers may tend to report the SEM (smaller values) rather than the SD.
@Ronán Michael Conroy was largely right: in cases where you would report the SEM, you should rather report the CI instead, which is more intuitive, although mathematically they are confounded, as Patrice Showers Corneli pointed out.
Here is some useful literature:
Guidelines for reporting statistics in journals published by the American Physiological Society:
Inappropriate use of standard error of the mean when reporting variability of study samples: https://www.sciencedirect.com/science/article/pii/S1028455914000084
Mean (Standard Deviation) or Mean (Standard Error of Mean): Time to Ponder:
This is just a rhetorical difference. SD is the technically pure term; it doesn't have the pejorative sense of having "erred". But remember, it measures a "standard" departure from the mean, which is also an "error" if you use this value as a predictor of the mean. Keep in mind that if you sample a random variable, you will not observe the value of the mean but something different (even though the value you observe is an unbiased estimator of the mean). If you take the difference between that value and the mean, you can call this an "error" in estimation, since you are using the sample value to predict the mean. If you take the average of these errors, you get a "standard" error, which is also a standard "deviation": the difference, on average, between sample values and the true mean. Stick to the actual meanings of the words and you will be fine.
In statistics, error does not mean mistake; it is simply variation or departure from the mean. Both measures are deviations from the mean (or measures of error). The difference is that the SEM takes into account the sample size, which is a more than rhetorical difference.
Using 3 SD means that essentially all the values lie within the expected range, as in engineering measurements that use mean ± 3SD; the data range covered is then 6 SD.
Using 1 SD means that most of the values lie around the mean, within a data range of 2 SD.
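These coverage figures follow from the normal distribution; a two-line check (Python, with scipy assumed):

```python
from scipy.stats import norm

for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)   # P(mean - k*SD < X < mean + k*SD)
    print(f"mean ± {k} SD covers {coverage:.1%} of a normal distribution")
```

This prints roughly 68.3%, 95.4% and 99.7%, matching the 2 SD and 6 SD data ranges mentioned above.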
In biomedical journals, Standard Error of Mean (SEM) and Standard Deviation (SD) are used interchangeably to express the variability. However, they measure different parameters. SEM quantifies uncertainty in estimate of the mean whereas SD indicates dispersion of the data from mean.
In other words, the SD characterizes the typical distance of an observation from the distribution's center or middle value. If observations are more dispersed, then there is more variability. Thus, a low SD signifies less variability, while a high SD indicates more spread-out data.
On the other hand, SEM by itself does not convey much useful information. Its main function is to help construct confidence intervals (CI). CI is the range of values that is believed to encompass the actual (“true”) population value. This true population value usually is not known, but can be estimated from an appropriately selected sample. Wider CIs indicate lesser precision, while narrower ones indicate greater precision.
In conclusion, SD quantifies the variability, whereas SEM quantifies uncertainty in estimate of the mean. As readers are generally interested in knowing the variability within sample and not proximity of mean to the population mean, data should be precisely summarized with SD and not with SEM.
In general, the use of the SEM should be limited to inferential statistics where the author explicitly wants to inform the reader about the precision of the study, and how well the sample truly represents the entire population.
Kindly refer to these citations for additional information:
1. What to use to express the variability of data: Standard deviation or standard error of mean? https://doi.org/10.4103/2229-3485.100662
2. Empowering statistical methods for cellular and molecular biologists. https://doi.org/10.1091/mbc.E15-02-0076
3. Error bars in experimental biology. https://doi.org/10.1083/jcb.200611141
Under the normality assumption, the SEM is a function of the sample size n, and the SD is the special case where n = 1. So, in measurement science, the SD is sometimes known as the "single measurement standard deviation".
I disagree with these two claims: "The standard error is a rather useless statistic" and "... SEM by itself does not convey much useful information" in the above discussions of Ronán Michael Conroy and Francis Tieng, respectively. In measurement uncertainty analysis (e.g. the GUM uncertainty framework), SEM is known as the Type A standard uncertainty (SU). SU is the fundamental quantity in the Law of Propagation of Uncertainty (LPU).
GUM: Guide to the Expression of Uncertainty in Measurement https://www.bipm.org/en/committees/jc/jcgm/publications
For sure, the SEM indicates the uncertainty associated with a mean of independent measurements, as it is the square root of the variance of the sampling distribution of the mean.
But the statistic (the estimated value from the observed data) can be extremely misleading.
To give an example: you have a Poisson process and you are counting the number of events in a given interval to estimate the expected value. To make an extreme case: you have only 2 observations, 12 and once again 12. So the best guess for the expectation is 12. With the SEM statistic calculated "as usual", the SEM is 0. So there is no uncertainty at all. This is obviously nonsense.
Ok, let's say you have 3 observations and they are not all identical, say 12, 10 and 14. The mean is 12 again, and the SEM is 1.15. Well, now you have some finite estimate of the uncertainty, but it is not really better. Given that the data come from a Poisson process, we know, by theory, that the variance is at least as large as the expectation, so if we estimate the expectation as 12, then for n = 3 the SEM is at least 2.
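A small sketch of this point (Python/numpy; the observations are the ones from the example above) compares the naive plug-in SEM with the lower bound implied by Poisson theory:

```python
import numpy as np

def naive_sem(x):
    """Plug-in SEM: sample SD divided by sqrt(n)."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / np.sqrt(x.size)

for obs in ([12, 12], [12, 10, 14]):
    mean = np.mean(obs)
    # Poisson theory: Var >= expectation, so SEM >= sqrt(mean / n)
    poisson_floor = np.sqrt(mean / len(obs))
    print(obs, "naive SEM:", round(naive_sem(obs), 2),
          "theoretical lower bound:", round(poisson_floor, 2))
```

For [12, 12] the naive SEM is 0 while theory says at least about 2.45; for [12, 10, 14] the naive SEM is 1.15 against a theoretical lower bound of 2.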
The same applies to the normal distribution, where we usually don't know the variance, so we usually don't know the SEM either and must estimate it from the observed data. And in small samples, these estimates can be very wrong, and this is not a very rare event.
Jochen Wilhelm, your points apply to any "statistic" calculated from a small sample. But the limitation, i.e. large uncertainty, of a statistic does not rule out the usefulness of that statistic. Besides, the probability of two identical observations randomly drawn from a Poisson distribution is zero, which won't happen in the real world. A nice property of a statistic is that, on average, the statistic converges to the parameter.
The probability of getting the same number twice in two subsequent realizations of a Poisson variable with mean lambda is P = exp(-2*lambda) * I0(2*lambda), where I0 is the modified Bessel function of the first kind; for larger lambda this is approximately 1/(2*sqrt(pi*lambda)). For lambda = 8 this is about 10%. And this certainly happens in the real world.
Regarding the convergence on average: note that the variance estimator is unbiased, but the square root of the variance (the SD) is not.
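The ~10% figure above can be checked directly, both exactly and by simulation (a sketch in Python; numpy and scipy assumed):

```python
import numpy as np
from scipy.special import ive

lam = 8.0

# exact: P(X1 == X2) = exp(-2*lam) * I0(2*lam) = ive(0, 2*lam),
# where ive is the exponentially scaled modified Bessel function
p_exact = ive(0, 2 * lam)

# Monte Carlo check with a million pairs of Poisson draws
rng = np.random.default_rng(0)
x = rng.poisson(lam, size=(1_000_000, 2))
p_sim = (x[:, 0] == x[:, 1]).mean()

print(f"exact: {p_exact:.4f}, simulated: {p_sim:.4f}")   # both ~ 0.10
```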