Is it normal to have a standard deviation value higher than the average value?

22 January 2015

Hi there, this is the second time I have calculated a standard deviation and found it to be higher than the mean. Is this normal, and what is the simplest explanation for such a result? Thank you in advance.

Sama S. Almaarofi

Correct, Dr. Peter, I see what you mean. I forgot about the negative values. Actually, my data set does not have negative values, but I think the wide range of values in my large data set is the reason. Your explanation was so comprehensive. I appreciate your help.

Edward Shiu

Get back to basics. Remember that the standard normal distribution has mean zero and SD 1 (thus SD > mean). Furthermore, adding or subtracting a constant (say, the mean value of a variable) does not change its standard deviation. Therefore measurement scales (e.g., Likert scales) running from 0 to 6 (such as degree of agreement) can just as well be transformed to -3 to +3, which is likely to yield mean values closer to zero. This transformation does not affect the spread of the distribution, so the SD is more likely to be greater than the 'mean' value.
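To make the shift argument concrete, here is a minimal sketch in Python. The responses are made up for illustration; the only point is that subtracting a constant moves the mean but leaves the SD untouched.

```python
import statistics

# Hypothetical responses on a 0-to-6 agreement scale (illustrative only).
responses = [0, 1, 2, 3, 3, 4, 5, 6, 6, 2]

# Shift the scale to -3..+3 by subtracting a constant from every value.
shifted = [r - 3 for r in responses]

mean_raw = statistics.mean(responses)
sd_raw = statistics.stdev(responses)
mean_shifted = statistics.mean(shifted)
sd_shifted = statistics.stdev(shifted)

print(f"0..6 scale:  mean = {mean_raw:.2f}, SD = {sd_raw:.2f}")
print(f"-3..3 scale: mean = {mean_shifted:.2f}, SD = {sd_shifted:.2f}")
```

On the original scale the SD is below the mean; after the shift the same SD is well above the (now near-zero) mean.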

Of course if you are capturing information such as the respondents' annual income where you expect each value to be large, one would not expect the SD for this measure to be greater than the mean.

Jochen Wilhelm

Assuming your affiliation is correct, then here might be an example of some more "practical relevance":

Consider you count fishes of a particular species in a particular area of a reef. During the counting process, each of the fishes around this reef must have the same and constant probability to be counted. You do not count all the fishes that live in the reef, only those that swim through your counting-area within a fixed time period.

Each time you repeat this experiment you will count a more or less different number of fishes. Because each individual fish has the same, constant probability of being counted, we know what to expect about such counts. This expectation is expressed mathematically by a distribution model, and the model here is the Poisson distribution.

Given that the probabilities are frequencies (the probability of observing a fish is approximated by the relative frequency with which the fish is counted), the Poisson distribution describes the expected relative frequencies (or proportions) of counts in a long series of such experiments.

The Poisson distribution has one parameter, lambda, which equals both the mean and the variance of the counts. The standard deviation is the square root of the variance. This alone demonstrates that for a mean count below 1 the standard deviation will be larger than the mean. I attached a figure showing the Poisson distribution for a mean count of 0.5 fishes (per experiment). The distribution shows that most often (in about 60% of such experiments) not a single fish will be counted (k=0), in about 30% of such experiments one fish will be seen, and in about 10% two or more fishes will be seen.

The mean is 0.5 and the variance is 0.5, too. The standard deviation is sqrt(0.5) = 0.707, which is larger than the mean.
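The numbers for the Poisson example can be checked with a few lines of Python. This is only a sketch using the standard Poisson probability formula; the helper name poisson_pmf is my own.

```python
import math

lam = 0.5  # mean count per experiment, as in the example


def poisson_pmf(k, lam):
    """Probability of counting exactly k fishes when the mean count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


p0 = poisson_pmf(0, lam)       # no fish counted
p1 = poisson_pmf(1, lam)       # exactly one fish counted
p2_plus = 1.0 - p0 - p1        # two or more fishes counted
sd = math.sqrt(lam)            # SD is the square root of the variance (= lam)

print(f"P(k=0)  = {p0:.3f}")
print(f"P(k=1)  = {p1:.3f}")
print(f"P(k>=2) = {p2_plus:.3f}")
print(f"mean = {lam}, SD = {sd:.3f}")
```

The probabilities come out at roughly 0.61, 0.30 and 0.09, and the SD of about 0.707 exceeds the mean of 0.5.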

Now you may not be interested in repeating the counting in the same area all the time. This area may be special and may not be representative of the reef. So it is better to repeat the counting in different areas of the reef to get a better (= more typical for the entire reef) estimate of the average number of fishes. But this can lead to a violation of the assumption that the probabilities of seeing a fish are similar and constant; there may be areas where the fishes are rare and others where the fishes are frequent. Therefore the Poisson model becomes inappropriate. Your counts will vary even more than expected from the Poisson model. This is called "overdispersion": the variance will be larger than the mean. It should be obvious from the previous explanations that for such overdispersed data the standard deviation can be larger than the mean even for higher mean counts.

The second figure shows an example. This is essentially a mixture of two distributions. You can imagine that 25% of the reef is covered with a particular type of coral, and in these areas the mean number is 100, and in the other areas the mean number is 1. If you count a lot of such areas, then the distribution shows what you will get: most of the time you will see zero or one fish, but you will also often have counts between 90 and 110. The mean is 34 and the standard deviation is 47.
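The mean and SD of such a mixture can be worked out directly. The sketch below assumes both components are Poisson (means 100 and 1), which is my reading of the example rather than something stated explicitly; the function name mixture_mean_sd is hypothetical. It uses the law of total variance: overall variance = average within-area variance + variance of the area means.

```python
import math


def mixture_mean_sd(share_a, mean_a, mean_b):
    """Mean and SD of a two-component Poisson mixture (assumed model)."""
    share_b = 1.0 - share_a
    mean = share_a * mean_a + share_b * mean_b
    # Within-area variance: each Poisson component has variance = its mean.
    within = share_a * mean_a + share_b * mean_b
    # Between-area variance of the two component means.
    between = share_a * share_b * (mean_a - mean_b) ** 2
    return mean, math.sqrt(within + between)


mean_25, sd_25 = mixture_mean_sd(0.25, 100, 1)
mean_33, sd_33 = mixture_mean_sd(1 / 3, 100, 1)

print(f"25% coral areas:  mean = {mean_25:.2f}, SD = {sd_25:.2f}")
print(f"1/3 coral areas:  mean = {mean_33:.2f}, SD = {sd_33:.2f}")
```

Under this assumption a 25% share gives a mean of about 25.8 and an SD of about 43.2, while a mean of 34 with an SD of 47 corresponds to a share of about one third. Either way the SD is clearly larger than the mean.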

So you see that it is possible for real data that the standard deviation is larger than the mean. In this particular example it is more interesting that the variance is larger than the mean. If you have a sample of count data and you find that the (sample) variance is larger than the (sample) mean, then this indicates that the Poisson distribution may not be appropriate to model the abundance, and this in turn means that the areas are not as homogeneous as you thought. This inhomogeneity could mean that there are kinds of "preferred places" or "avoided places" of the fishes, and it might be interesting to search for a possible reason for this to better understand the biology of these fishes.
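A quick way to look for overdispersion in a sample is to compare the sample variance with the sample mean. The counts below are invented for illustration, and the ratio is only a rough indicator, not a formal test.

```python
import statistics

# Hypothetical fish counts from 12 different areas (made up for illustration).
counts = [0, 1, 0, 2, 95, 1, 0, 104, 1, 0, 98, 2]

sample_mean = statistics.mean(counts)
sample_var = statistics.variance(counts)   # sample variance (n - 1 denominator)
sample_sd = statistics.stdev(counts)

# For Poisson data this ratio should be close to 1.
dispersion_index = sample_var / sample_mean

print(f"mean = {sample_mean:.2f}, variance = {sample_var:.2f}, SD = {sample_sd:.2f}")
print(f"variance / mean = {dispersion_index:.1f}")
```

A ratio far above 1, as here, suggests that the areas are not homogeneous and that a single Poisson model does not describe the counts well.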

Eshak Mourad El-Hadidy

No, the mean should be larger than the SD or SE; if it is the other way round, there may be an error in the design of the experiment.

Kenneth Hemmerechts

Check your median. The median should be fairly similar to the mean. If it is not then you have skewed data. Geometric means are a solution.
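As a small illustration of this check, here is a sketch with made-up, right-skewed data. It compares the mean, the median and the geometric mean (the latter requires strictly positive values and Python 3.8 or later).

```python
import statistics

# Hypothetical right-skewed, strictly positive data (illustrative only).
data = [1, 2, 2, 3, 4, 5, 8, 120]

mean_value = statistics.mean(data)
median_value = statistics.median(data)
geo_mean = statistics.geometric_mean(data)   # needs all values > 0
sd_value = statistics.stdev(data)

print(f"mean = {mean_value:.2f}, SD = {sd_value:.2f}")
print(f"median = {median_value:.2f}, geometric mean = {geo_mean:.2f}")
```

One large value pulls the mean far above the median and pushes the SD above the mean, while the median and the geometric mean stay close to the bulk of the data.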

Fikrat M Hassan

Try to use SE instead of SD; you may find that it is less than the mean:

SE = SD / sqrt(n)
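The relation between SD and SE can be sketched as follows, again with invented data. Keep in mind that the SE describes the precision of the estimated mean and shrinks as n grows, whereas the SD describes the spread of the individual values, so the two answer different questions.

```python
import math
import statistics

# Hypothetical counts from n = 16 observations (illustrative only).
data = [0, 0, 1, 0, 2, 0, 0, 5, 0, 1, 0, 0, 9, 0, 1, 0]

n = len(data)
mean_value = statistics.mean(data)
sd = statistics.stdev(data)
se = sd / math.sqrt(n)   # standard error of the mean

print(f"n = {n}, mean = {mean_value:.3f}, SD = {sd:.3f}, SE = {se:.3f}")
```

Here the SD is larger than the mean while the SE is smaller, but the spread of the data is exactly the same in both cases.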

Jorge Ortiz Pinilla

Sama, in some distributions this may be a common situation. For example, in the binomial distribution with n and p related by p = 1 / (1 + n * k^2) you will have a standard deviation equal to k times the mean (and more than k times the mean if p is smaller than that). Take n = 3, k = 5 and p = 1 / 76 and you will have a standard deviation 5 times the mean.

If S is the standard deviation and M is the mean, the ratio S / M is known as the coefficient of variation (valid only for ratio scale variables).
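The binomial example can be verified numerically. The sketch below uses the standard binomial formulas (mean = n p, variance = n p (1 - p)) and takes p = 1 / (1 + n k^2), which is where the value p = 1/76 for n = 3 and k = 5 comes from.

```python
import math

n = 3
k = 5
p = 1 / (1 + n * k**2)     # = 1/76 for n = 3, k = 5

mean = n * p                        # binomial mean
sd = math.sqrt(n * p * (1 - p))     # binomial standard deviation
cv = sd / mean                      # coefficient of variation S / M

print(f"p = {p:.5f}, mean = {mean:.5f}, SD = {sd:.5f}, S/M = {cv:.3f}")
```

The ratio S / M comes out as 5, i.e. the standard deviation is five times the mean.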

Rajveer Rawlin

Just an added query here: given that in financial markets the standard deviation is an indicator of total risk, if the standard deviation exceeds the mean, has the risk of the security increased?

Vasilios Pergialiotis

This reflects a non-normal distribution most of the time. You should check normality and, if the data are not normal, use non-parametric analysis and report median (range) values rather than mean and SD.

Cathy Monteith

If your SD is > mean, your data are likely skewed, and median values and interquartile ranges would likely give a better representation of the data.

Đorđe Grozdić

Yes, it's possible to have a higher std than mean value, and it means that your data isn't normally distributed.

Jochen Wilhelm

Đorđe Grozdić, SD > mean does not necessarily imply that the data isn't normally distributed. This is only the case when the data is strictly positive.
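A short sketch of why strict positivity matters: a normal distribution with SD > mean necessarily puts a sizeable share of its probability below zero. The parameter values are arbitrary examples.

```python
from statistics import NormalDist

# Two hypothetical normal distributions with a positive mean.
equal_case = NormalDist(mu=1.0, sigma=1.0)    # SD equal to the mean
larger_case = NormalDist(mu=1.0, sigma=1.5)   # SD larger than the mean

# Probability of observing a negative value under each model.
p_neg_equal = equal_case.cdf(0.0)
p_neg_larger = larger_case.cdf(0.0)

print(f"SD = mean:     P(X < 0) = {p_neg_equal:.3f}")
print(f"SD = 1.5*mean: P(X < 0) = {p_neg_larger:.3f}")
```

With SD equal to the mean about 16% of the values would be negative, and the share grows as the SD increases. Data that can take negative values may therefore be perfectly normal with SD > mean, whereas strictly positive data with SD > mean cannot be.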

Jorge Ortiz Pinilla

In normal distributions the parameters (mean and variance) are independent. You can change the value of one of them without affecting the other. Note that the standard normal distribution has variance 1, which is more than infinite times the mean that is zero!

Keshav Prasad

@ Sama S. Almaarofi

Standard Deviation > Mean: Preponderance of Low values

Standard Deviation < Mean: Preponderance of High values

The larger the difference between the SD and the mean, the greater the frequency of extreme values.

Ikenna Onyekwelu

I have similar results, with SD > mean. I notice that there are more low values than high values. I computed the coefficient of variation (COV) to understand the variability, but it turned out to be > 100%. I guess this implies that the variable I am analysing is widely spread.

Santhalingam Sathees

Your data seem to be non-normally distributed. Consider non-parametric estimates.
