Hi there, this is the second time I have calculated a standard deviation and found it to be higher than the average. I wonder whether this is normal, and what the simplest explanation for such a result is. Thank you in advance...
Correct, Dr. Peter, I see what you mean. I forgot about the negative values. Actually, my data set does not have negative values, but I think the wide range of values in my large data set is the reason. Your explanation was very comprehensive. I appreciate your help.
Get back to basics. Remember that the standard normal distribution has mean zero and SD 1 (thus SD > mean). Furthermore, adding or subtracting a constant value (say, the mean value of a variable) will not change its standard deviation. Therefore measurement scales (e.g., Likert scales) running from 0 to 6 (such as degree of agreement) can just as well be transformed to -3 to +3, and are then likely to yield mean values closer to zero. This transformation does not affect the spread of the distribution, and thus the SD is more likely to be greater than the 'mean' value.
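To make the shift argument concrete, here is a minimal Python sketch. The ten responses are made up purely for illustration (they are not from anyone's data), and the sample SD is used.

```python
import statistics

# Hypothetical responses on a 0-to-6 agreement scale (made-up data).
responses = [0, 1, 2, 3, 3, 3, 4, 5, 6, 3]

# Re-express the same answers on a -3..+3 scale by subtracting the constant 3.
shifted = [r - 3 for r in responses]

mean_raw = statistics.mean(responses)
sd_raw = statistics.stdev(responses)      # sample standard deviation
mean_shifted = statistics.mean(shifted)
sd_shifted = statistics.stdev(shifted)

print("0..6 scale:  mean =", mean_raw, " SD =", round(sd_raw, 3))
print("-3..3 scale: mean =", mean_shifted, " SD =", round(sd_shifted, 3))
```

The SD is identical on both scales, while the mean moves from 3 to 0, so "SD > mean" appears only after the shift.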
Of course if you are capturing information such as the respondents' annual income where you expect each value to be large, one would not expect the SD for this measure to be greater than the mean.
Assuming your affiliation is correct, then here might be an example of some more "practical relevance":
Consider you count fishes of a particular species in a particular area of a reef. During the counting process, each of the fishes around this reef must have the same and constant probability to be counted. You do not count all the fishes that live in the reef, only those that swim through your counting-area within a fixed time period.
Each time you repeat this experiment you will count a more or less different number of fishes. What we can expect about such counts follows from each individual fish having the same, constant probability of being counted. This expectation is mathematically expressed by a distribution model, and the model here is the Poisson distribution.
Given that the probabilities are frequencies (the probability of observing a fish is approximated by the relative frequency with which the fish is counted), the Poisson distribution describes the expected relative frequencies (or proportions) of counts in a long series of such experiments.
The Poisson distribution has one parameter, lambda, which equals both the mean and the variance of the counts. The standard deviation is the square root of the variance. This alone demonstrates that for a mean count below 1 the standard deviation will be larger than the mean. I attached a figure showing the Poisson distribution for a mean count of 0.5 fishes (per experiment). The distribution shows that most often (in roughly 60% of such experiments) not a single fish will be counted (k=0), in roughly 30% of such experiments one fish will be seen, and in roughly 10% two or more fishes will be seen.
The mean is 0.5 and the variance is 0.5, too. The standard deviation is sqrt(0.5) = 0.707, which is larger than the mean.
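The percentages and the SD quoted above can be checked with a few lines of Python. This is only a sketch using the textbook Poisson probability formula; the helper name `poisson_pmf` is my own.

```python
import math

def poisson_pmf(k, lam):
    """Probability of counting exactly k fishes when the mean count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 0.5                      # mean count per experiment, as in the example
p0 = poisson_pmf(0, lam)       # no fish seen
p1 = poisson_pmf(1, lam)       # exactly one fish seen
p2_plus = 1.0 - p0 - p1        # two or more fishes seen
sd = math.sqrt(lam)            # SD = sqrt(variance) = sqrt(lambda)

print(f"P(0) = {p0:.3f}, P(1) = {p1:.3f}, P(>=2) = {p2_plus:.3f}")
print(f"mean = {lam}, SD = {sd:.3f}")
```

The three probabilities come out near 0.61, 0.30 and 0.09, in line with the rounded 60% / 30% / 10% read from the figure.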
Now you may not be interested in repeating the counting in the same area all the time. This area may be special and may not be representative of the reef. So it is better to repeat the counting in different areas of the reef to get a better (= more typical for the entire reef) estimate of the average number of fishes. But this can lead to a violation of the assumption that the probabilities of seeing a fish are similar and constant; there may be areas where the fishes are rare and others where the fishes are frequent. Therefore the Poisson model becomes inappropriate. Your counts will vary even more than expected from the Poisson model. This is called "overdispersion": the variance will be larger than the mean. It should be obvious from the previous explanations that for such overdispersed data the standard deviation can be larger than the mean even for higher mean counts.
The second figure shows an example. This is essentially a mixture of two distributions. You can imagine that about a third of the reef is covered with a particular type of coral, and in these areas the mean number is 100, and in the other areas the mean number is 1. If you count a lot of such areas, then the distribution shows what you will get: most of the time you will see zero or one fish, but you will also often have counts between 90 and 110. The mean is 34 and the standard deviation is 47.
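For anyone who wants to reproduce the quoted mean and SD, here is a rough sketch. It assumes the coral areas make up one third of the counted areas, which is my assumption, chosen because it reproduces the mean of 34 and SD of 47. The function name is also my own. It uses the law of total variance for a mixture of Poisson distributions.

```python
import math

def poisson_mixture_mean_sd(weights, means):
    """Mean and SD of a mixture of Poisson distributions.

    weights: share of areas belonging to each habitat type (must sum to 1)
    means:   mean count (lambda) within each habitat type
    """
    mean = sum(w * m for w, m in zip(weights, means))
    # Law of total variance: Var = E[within-type variance] + Var[type means].
    # For a Poisson the within-type variance equals the type mean, so the
    # first term is just the overall mean.
    mean_of_squared_means = sum(w * m * m for w, m in zip(weights, means))
    variance = mean + (mean_of_squared_means - mean**2)
    return mean, math.sqrt(variance)

# Assumed weights: 1/3 coral areas (mean 100), 2/3 other areas (mean 1).
mix_mean, mix_sd = poisson_mixture_mean_sd([1/3, 2/3], [100, 1])
print(f"mean = {mix_mean:.1f}, SD = {mix_sd:.1f}")
```

Almost all of the variance comes from the difference between the two habitat means, not from the Poisson scatter within each habitat.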
So you see that it is possible for real data that the standard deviation is larger than the mean. In this particular example it is more interesting that the variance is larger than the mean. If you have a sample of count data and you find that for your data the (sample) variance is larger than the (sample) mean, then this indicates that the Poisson distribution may not be appropriate to model the abundance, and this in turn means that the areas are not as homogeneous as you thought. This inhomogeneity could mean that there are kinds of "preferred places" or "avoided places" of the fishes, and it might be interesting to search for a possible reason for this to better understand the biology of these fishes.
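A quick way to look at this in your own data is the variance-to-mean ratio, sketched below. The counts are invented for illustration only, and a ratio well above 1 is just a hint of overdispersion, not a formal test.

```python
import statistics

# Hypothetical fish counts from 12 different areas (made-up numbers).
counts = [0, 1, 0, 2, 1, 0, 95, 1, 0, 104, 2, 1]

sample_mean = statistics.mean(counts)
sample_var = statistics.variance(counts)   # sample variance (n - 1 denominator)

# For Poisson-distributed counts this ratio should be close to 1.
dispersion_index = sample_var / sample_mean

print(f"mean = {sample_mean:.2f}, variance = {sample_var:.2f}")
print(f"variance / mean = {dispersion_index:.1f}")
```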
Sama, in some distributions this may be a common situation. For example, in the binomial distribution with n and p related by p = 1 / (1 + n * k^2) you will have a standard deviation equal to k times the mean (and for any smaller p the standard deviation is more than k times the mean). Take n = 3, k = 5 and p = 1 / 76 and you will have a standard deviation 5 times the mean.
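The relation follows from setting sqrt(n p (1 - p)) = k n p and solving for p. A short numerical check with the values from the example (variable names are mine):

```python
import math

n, k = 3, 5
p = 1 / (1 + n * k**2)                    # = 1/76 for n = 3, k = 5

binom_mean = n * p                        # binomial mean
binom_sd = math.sqrt(n * p * (1 - p))     # binomial standard deviation
ratio = binom_sd / binom_mean

print(f"p = {p:.5f}, mean = {binom_mean:.5f}, SD = {binom_sd:.5f}")
print(f"SD / mean = {ratio:.3f}")
```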
If S is the standard deviation and M is the mean, the ratio S / M is known as the coefficient of variation (valid only for ratio scale variables).
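As a small sketch of how S / M might be computed for a sample (the function name and the data are made up for illustration; the sample SD is assumed):

```python
import statistics

def coefficient_of_variation(values):
    """Return S / M for a sample; only meaningful for ratio-scale data."""
    m = statistics.mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for mean 0")
    return statistics.stdev(values) / m

# Hypothetical, strongly right-skewed positive data.
sample = [1, 2, 2, 3, 40]
cv = coefficient_of_variation(sample)
print(f"CV = {cv:.2f} ({cv:.0%})")
```

A value above 1 (above 100%) is the same statement as "SD larger than the mean".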
Just an added query here: given that in financial markets the standard deviation is an indicator of total risk, if the standard deviation exceeds the mean, has the risk of the security increased?
This reflects a non-normal distribution most of the time. You should check normality and, if the data are not normal, use non-parametric analysis and report median (range) values rather than mean and SD.
Đorđe Grozdić, SD > mean does not necessarily imply that the data aren't normally distributed. That inference holds only when the data are strictly positive.
In normal distributions the parameters (mean and variance) are independent. You can change the value of one of them without affecting the other. Note that the standard normal distribution has variance 1, which is infinitely many times larger than its mean of zero!
I have similar results, with SD > mean. I notice that there are more low values than high values. I computed the coefficient of variation (COV) to understand the variability, and it turned out to be >100%. I guess this implies that the variable I am analysing is widely spread.