What do you consider a good standard deviation?

26 September 2014

What do you consider a good standard deviation? And is there a tolerance for the maximal standard deviation?

Joshka Kaufmann Popular answer

Hi Riki,

For an approximate answer, please estimate your coefficient of variation (CV = standard deviation / mean). As a rule of thumb, a CV >= 1 indicates a relatively high variation, while a CV < 1 can be considered low. This means that distributions with a coefficient of variation higher than 1 are considered to be high variance whereas those with a CV lower than 1 are considered to be low-variance.

Remember, standard deviations aren't "good" or "bad". They are indicators of how spread out your data is. A "good" SD depends if you expect your distribution to be centered or spread out around the mean. This really depends on your data.

Cheers, Josh
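As a minimal sketch of the CV calculation described in the answer above: the measurements and the function name `coefficient_of_variation` are hypothetical, chosen only for illustration, and the CV is only sensible for data measured on a ratio scale with a non-zero mean.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample SD divided by the mean (undefined when the mean is 0)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean

# Hypothetical measurements, made up for illustration only
measurements = [9.8, 10.1, 10.4, 9.9, 10.3, 9.5]
cv = coefficient_of_variation(measurements)
print(f"CV = {cv:.3f}")
```

Whether the resulting value counts as "high" or "low" still depends on the field and the question, as the rest of the thread discusses.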

M. Ricky Ramadhian

When the data have a CV >= 1, should the experiment be repeated, or what should be done?

Joshka Kaufmann

Well, it really depends on your sampling scheme or experiment.

If you have several treatments or different samplings you would like to compare, the overall distribution of your variable might be spread out for example.

M. Ricky Ramadhian

If I have an SD of more than 20% of the mean, can that data be analysed?

Cyril Iaconelli

There is no maximum or minimum SD required to carry out a data analysis. SD is just a way to represent data, but usually we compare and test the variance (which is linked to the SD, I grant you that).

However, the homogeneity of variance between several groups is a crucial point and mainly affects the type II error of the test.

Moreover, as Bernardo said, the distribution of your data is more important than its SD itself.

You can compute any parametric test you want (assuming normality, independence and variance homogeneity of your data); however, you should be careful with the interpretation of the results.

Joshka Kaufmann

I agree with Bernardo that distribution of your data is crucial.

Keep in mind that a general misconception about the assumptions of linear models is that your variable needs to be normally distributed. This is not the case. Your residuals need to be normally distributed, not your response variable.

I agree with Cyril and Bernardo that your data can be analysed whether you have a high or low SD, whether your data is continuous or non-parametric, and so on. Checking the distributions of your variables before running tests is already a great start. Good luck
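The point about residuals can be illustrated with a small simulation. This is a sketch under stated assumptions: the two groups, their means and the helper `excess_kurtosis` are invented for the example, and excess kurtosis is used here only as a rough indicator of departure from normality, not as a formal test.

```python
import random
import statistics

def excess_kurtosis(values):
    """Population excess kurtosis; about 0 for normally distributed data."""
    m = statistics.fmean(values)
    var = statistics.pvariance(values, mu=m)
    fourth = statistics.fmean([(x - m) ** 4 for x in values])
    return fourth / var ** 2 - 3.0

rng = random.Random(42)
# Two hypothetical treatment groups with different means and the same noise SD
control = [rng.gauss(0.0, 1.0) for _ in range(5000)]
treated = [rng.gauss(10.0, 1.0) for _ in range(5000)]

# The pooled response is bimodal, so it is clearly not normal
response = control + treated
# Residuals of the simplest linear model: each value minus its own group mean
residuals = ([x - statistics.fmean(control) for x in control]
             + [x - statistics.fmean(treated) for x in treated])

print(f"response  excess kurtosis: {excess_kurtosis(response):.2f}")
print(f"residuals excess kurtosis: {excess_kurtosis(residuals):.2f}")
```

The pooled response has a strongly negative excess kurtosis, while the residuals stay close to zero, which is the situation a linear model with a group effect actually assumes.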

Joshka Kaufmann

Also keep in mind that SD is only valid for a normal distribution. For data that are non-normal, the standard deviation can be a terrible estimator of scale.

Ronán Michael Conroy

No standard deviation is the best standard deviation. Mostly, people pay no attention whatever to the standard deviation, just copy and paste it into an article without reading it, where others will ignore it too.

If you need to understand the distribution of your data, seeing a graph is the best start. Dotplots are good for relatively small numbers, and density plots for larger ones. Then calculate quantiles and, in parallel, examine outliers.

Many distributions don't have a standard deviation at all, though people calculate one.
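A rough sketch of the "quantiles and outliers" step, using only the Python standard library. The data are hypothetical, and the 1.5 × IQR fences are Tukey's common convention rather than anything proposed in this thread; the median absolute deviation (MAD) is included as one example of a scale estimate that is less sensitive to extreme values than the SD.

```python
import statistics

# Hypothetical small sample with one extreme value
data = [4.1, 4.3, 4.4, 4.6, 4.7, 4.9, 5.0, 5.2, 5.3, 19.0]

# Quartiles (statistics.quantiles uses the "exclusive" method by default)
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Tukey's fences: values outside them are flagged for inspection, not deleted
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low or x > high]

# Median absolute deviation versus the ordinary sample SD
mad = statistics.median([abs(x - median) for x in data])
sd = statistics.stdev(data)

print(f"median = {median:.2f}, IQR = {iqr:.2f}, flagged = {outliers}")
print(f"SD = {sd:.2f}, MAD = {mad:.2f}")
```

In this made-up sample the single extreme value inflates the SD to many times the MAD, which is one reason to look at the data before quoting an SD.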

Jochen Wilhelm

Ronán, could you provide a distribution that has no SD?

The problem with the SD is that it has a particular meaning relating to the dispersion of values in a normal-probability model, but the mathematical definition can be applied to any set of values and any distribution. So one may say that the Poisson distribution does not have an SD because it is not a normal distribution, but one can calculate the square root of E((X-E(X))²), which gives √λ (where λ is the parameter of the Poisson distribution), and this is then called the SD (because, by definition, E((X-E(X))²) is the variance and the square root of the variance is the SD).

I'd say that the SD (and the variance) is in fact a statistical property of any distribution, but that it only has a meaning in the case of a normal distribution. The statistical property is useful for calculating properties of large-sample approximations (where the likelihood function approximates the normal distribution, where it is meaningful to talk about the SD [of the likelihood function!], which is called the SE here). Thus, knowing the "not-meaningful" SD of a binomial distribution allows me to give the meaningful value of the SE of a proportion from a poll with, say, n=1000 or so.

One could surely provide the formula for the SE without referring to the SD, but the formulas for all the different distributions will contain a similar part, and this part is mathematically equivalent to the SD (of the distributions). So it is quite convenient to use this, although it is not particularly meaningful for the distribution itself.
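The poll example can be sketched as follows. The true share of 50%, the number of simulated polls and the function name `se_proportion` are assumptions made for illustration; the formula itself is the usual large-sample standard error of a proportion.

```python
import math
import random
import statistics

def se_proportion(p, n):
    """sqrt(p(1-p)) is the SD of one yes/no answer; dividing by sqrt(n) gives the SE."""
    return math.sqrt(p * (1.0 - p) / n)

p_true, n = 0.5, 1000   # hypothetical poll: true share 50%, 1000 respondents
analytic_se = se_proportion(p_true, n)

# Simulate many polls and look at the spread of the estimated proportions
rng = random.Random(1)
estimates = [sum(rng.random() < p_true for _ in range(n)) / n for _ in range(500)]
simulated_se = statistics.stdev(estimates)

print(f"analytic SE  = {analytic_se:.4f}")
print(f"simulated SE = {simulated_se:.4f}")
```

The two numbers should agree closely, which shows how the SD of the underlying binomial model feeds directly into a meaningful SE.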

Jochen Wilhelm

Nice, thank you, Bernardo. I hadn't thought about distributions with undefined variance, and there are in fact a couple of them, but they do not seem very familiar to me, nor do I think that they are used very often.

http://en.wikipedia.org/wiki/Category:Probability_distributions_with_non-finite_variance

Ronán Michael Conroy

The binomial, negative binomial and Poisson (and therefore also the exponential) distributions have no standard deviation, in the sense that a single parameter – the mean – defines them. So the standard deviation does not provide any information not already inherent in the mean. So yes, more common than you think. And although I don't work with Power Law distributions, the Pareto distribution can reduce to a single parameter also, with no defined variance.

And remember how common ratio variables are – Body Mass Index for example. The Cauchy distribution has, as Bernardo points out, no defined variance.

Jochen Wilhelm

Thank you, Ronán, for your clarification. In my opinion, the fact that a distribution is defined without the need for a dispersion parameter does not mean that it has no dispersion. The SD is a dispersion measure, and it can also be derived for distributions that do not have a dispersion parameter in their definition. I would agree with the convention of saying that the SD of a distribution can be undefined or non-finite (as for the Cauchy, for instance, which also has no defined mean).
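To make the Cauchy case concrete, here is a small simulation sketch. The seed, the sample sizes and the helper `cauchy_sample` are arbitrary choices for the example. A sample SD can always be computed, but for Cauchy data it does not settle towards any fixed value as the sample grows, whereas the sample median does.

```python
import math
import random
import statistics

rng = random.Random(7)

def cauchy_sample(n):
    """Draw n standard Cauchy values by inverse-transform sampling."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

for n in (100, 10_000, 100_000):
    xs = cauchy_sample(n)
    # Median stays near 0; the SD is driven by the few most extreme draws
    print(n, round(statistics.median(xs), 3), round(statistics.stdev(xs), 1))
```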

M. Ricky Ramadhian

So which is best to use, the standard error of the mean or the standard deviation, for example if I want to measure the thickness of the aortic tunica media in different mice?

Jochen Wilhelm

M. Ricky, please search for this question on RG; it has been asked and answered quite often. In short: you would use the SD to give a measure of the dispersion/variability of the data, whereas you would use the SE to give a measure of the expected dispersion of the estimated means of such experiments/studies.

NB: both are only sensible/meaningful when the distribution of the data is Gaussian ("normal"). The SE can be sensible also when the distribution of the data is not Gaussian but when the central limit theorem assures a sufficiently good normal-approximation of the likelihood function/sampling distribution of the mean.

Carmen C W Lim

Standard deviation: how far the individual responses to a question vary or deviate from the mean. It describes the distribution in relation to the mean. A big or small SD does not indicate whether it is good or bad.

Standard error of the mean: This is an indication of the reliability of the mean. A small SE indicates that the sample mean is a more accurate reflection of the actual population mean.

In summary, SD tells us about the shape of the distribution, that is, how close individual data values are to the mean value. SE, on the other hand, tells us how close our sample mean is to the true mean of the overall population.
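A short sketch of the two quantities side by side. The thickness values are invented for illustration (they are not data from this thread), and the SEM is computed with the usual formula SD / √n.

```python
import math
import statistics

# Hypothetical tunica media thickness values (micrometres), one per mouse
thickness = [52.1, 48.7, 55.3, 50.2, 49.8, 53.6, 51.0, 47.9]

n = len(thickness)
mean = statistics.mean(thickness)
sd = statistics.stdev(thickness)   # spread of the individual mice
sem = sd / math.sqrt(n)            # uncertainty of the estimated mean

print(f"mean = {mean:.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```

Reporting the SD describes how much the animals differ from each other; reporting the SEM describes how precisely the mean has been estimated.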

Carl Alexander Sorensen

Dr. Ramadhian:

Is this question in regards to the reliability of your data, or is it more about effect sizes (or something else)? That might help to clarify your question a bit.

Debashis Chakraborty

I appreciate Sorensen's comments. Sample size is absolutely the most important, and although a high SD generally indicates wide scattering of your data away from the mean, this may differ if your sample size is very large. You may expect a large SD when the range of values is wide, so outliers affect the SD the most.

Jonathan Jewell

Debashis, I am not sure that I agree with your point about sample size being absolutely the most important. More important is that the underlying distribution of the data is correctly assumed, or the results, regardless of the sample size, will not be meaningful. As the size of the sample grows (and making the assumption that we are sampling in an appropriate way for the study in question), the marginal benefit of additional data points diminishes rapidly. This is the point of statistics, surely. For smaller sample sizes, using n-1 rather than n as the denominator of the sample standard deviation (Bessel's correction) is generally considered an appropriate correction.
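For readers unfamiliar with Bessel's correction, a minimal sketch using Python's standard library, where `pstdev` divides by n and `stdev` divides by n-1. The sample values are illustrative only.

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values with mean 5

sd_n = statistics.pstdev(sample)          # denominator n
sd_n_minus_1 = statistics.stdev(sample)   # denominator n - 1 (Bessel's correction)

print(f"n denominator:     {sd_n:.3f}")
print(f"n - 1 denominator: {sd_n_minus_1:.3f}")
```

The difference between the two shrinks as the sample grows, which is why the correction matters mostly for small samples.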

Umar Faruq Muhammad

Bernardo dos Santos,

If there is no such thing as good or maximal standard deviation, what does the 68–95–99.7 rule mean? I was thinking these percentages provide some standards against which one compares his SPSS calculated SD!

Beatrice Odongkara Mpora

The 68, 95, 99.7% rule assumes a normal distribution, i.e., skewness and kurtosis are approximately zero, twice the standard deviation should be less than the mean, and the mean, mode and median are similar. Of course, these conditions tend to be true with increasing sample size.
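The percentages in the rule are properties of the normal distribution itself, not targets for a computed SD. A minimal sketch that reproduces them (the function name `coverage_within` is made up for this example):

```python
from statistics import NormalDist

def coverage_within(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.4f}")
```

The same coverage holds for any normal distribution, whatever its SD, so the rule cannot be used to judge whether a particular SD is "too large".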

Mehmet Guven Gunver

if your data is skewed, the standard deviation is weak...

Article TO DETERMINE SKEWNESS, MEAN AND DEVIATION WITH A NEW APPROAC...

Caleb Mackatiani

For better comparison in regression analysis, the standard deviation should be greater. This indicates that distribution of raw scores will not be skewed to the mean.

Sleem Alhabsi

Hi, I am doing quantitative research and SPSS showed a standard deviation of 1.4 (mean 2.99) for some sub-variables (statements). Is it acceptable?

My second question, the variation for the same sub variable is 47%. I read that above 20% is not acceptable. Can you please help. Thanks

Azin Eftekhari

How about the standard deviation of samples of equal size? Then the lowest standard deviation is better. Can anyone say when we have to remove one of the samples because of a high standard deviation? I mean, what is the acceptable standard deviation in this situation?

Jochen Wilhelm

Standard deviation does not depend on sample size.

What do you mean with "removing a sample"? Sounds strange. Why should one want to remove a sample?

Carmen C W Lim

A higher standard deviation means your variable has a greater spread. Lower SD = lower spread. Jochen is right -- it has nothing to do with sample size. There is no acceptable or unacceptable SD. Hope this helps

Carlos Jimenez-Gallardo

Sorry, but what kind of question is that?

The standard deviation is what it is; you should read it in context, and you must understand what it says and its relation to other indicators, such as kurtosis.

Some of the answers consider symmetric distributions, but many variables are not symmetric.

Damien Hall

A question for Joshka Kaufmann about your post three years ago(!) where you said that, for you,

'a [coefficient of variation] >= 1 indicates a relatively high variation, while a CV < 1 can be considered low. This means that distributions with a coefficient of variation higher than 1 are considered to be high variance whereas those with a CV lower than 1 are considered to be low-variance'.

I'm using the CV in a phonetics paper, but it's not often (maybe not at all) used in linguistics as far as I know. So I'd like to put in a reference for this way to interpret it. I can quote your post, but do you know of a published reference that says the same? Any tips appreciated!

I know that in the medical sciences the bar for a 'low-variance' sample is much lower--I've seen CV=0.1 or 0.15 as the upper bound of 'low' in those fields--but I'm guessing that, where lives aren't at stake, we can tolerate a little more variance! I know you're an evolutionary biologist.

Veerendakumar Mustare

I have a question. I am asking the question in this group as my question is related to the topic.

Limits of Normality and Standard Deviation (SD).

In diagnostic tests, following practices are followed.

1. Result (data point) outside the limit (Mean + 2-3 SD value) is considered as abnormal

When one collects multiple sets of data (20 to 100 or more)

2. Result (data point) outside the limit (Mean + 2-3 SD value) is labeled as "outlier" and deemed to be abnormal

3. Result outside the limit (Mean + 4 SD) is considered as artifact

4. Result outside the limit (visual appearance of scatter plot, arbitrary decision, ? Mean + 2 or 3 SD) is considered as artifact and removed from the dataset so that the outlier does not influence the mean. This is the practice followed in at least one test - especially when serial studies are done.

I am in a dilemma about handling data under conditions 3 and 4. The abnormal (outlier) data points are genuine and not artifacts. How justified are we in discarding these data points? If these points influence the mean unduly, they may be excluded from the calculation of the mean. However, they should be included in the overall data and considered as abnormal. One could use the number of "outliers" to interpret the data.

Jochen Wilhelm

You are mixing diagnostic tests and statistical analysis. In a diagnostic test I want to know if a patient's blood glucose level, for instance, is in a normal range. Assuming a normal distribution of blood glucose levels with a mean of 85 mg/dl and an SD of 15 mg/dl, a value of 120 mg/dl would indicate that this particular patient might have a problem. A value of 300 is likely an artifact, at least when the patient from whom this value is taken doesn't have any obvious problems anyway.
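The arithmetic behind this example can be sketched as a z-score against the assumed reference distribution (mean 85 mg/dl, SD 15 mg/dl, as stated in the post). The function name `z_score` is hypothetical, and the numbers illustrate the calculation only; they are not clinical guidance.

```python
from statistics import NormalDist

# Reference distribution from the post above: mean 85 mg/dl, SD 15 mg/dl
reference = NormalDist(mu=85.0, sigma=15.0)

def z_score(value, dist=reference):
    """Distance of a value from the reference mean, in SD units."""
    return (value - dist.mean) / dist.stdev

for glucose in (120, 300):
    z = z_score(glucose)
    # Upper-tail probability under the assumed normal model;
    # for very large z it underflows to 0 in floating point
    upper_tail = 1.0 - reference.cdf(glucose)
    print(f"{glucose} mg/dl: z = {z:.2f}, P(X > value) = {upper_tail:.2e}")
```

A large z-score only says that the value is improbable under the assumed model; whether it is an artifact or a genuine abnormal result has to be decided from context.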

Veerendakumar Mustare

Thanks for the answer.

My confusion is regarding the use of a test for diagnosing a condition. Because of the issues that I have elaborated, I am having difficulty in deciding what I should do. Further, I am not sure that what others are doing is right.

Regarding blood sugar, considering a blood sugar value of 300 mg as artifact is hazardous. This person definitely has diabetes. And if the value is more abnormal eg. 500 or 600 mg, his condition could be serious (diabetic coma).

Jochen Wilhelm

Ok, sorry for having taken a bad example (or inappropriate values). But you got the point, I think.

Veerendakumar Mustare

Dear Jochen Wilhelm,

I am still not sure. Should I say that the prevalent practice of discarding data points (signals) beyond 4 SD as artifacts is wrong, and develop a new algorithm for including these data points in the final analysis (sample of 100 data points)? Kindly clarify.

Jochen Wilhelm

I can't tell you what you should say. That's all very context-specific.

Discarding values from an analysis just because they are "outlying" is almost always a bad thing to do (exception: the values are obviously artefacts, because they are practically impossible or at least extremely implausible). With small samples, an outlier may be the only "guide" telling you that the variance is higher than the other values suggest, and with large samples the presence of an outlier does not have much of an effect anyhow. Having a large sample and many outliers indicates that your understanding of the data distribution is inadequate.
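A small simulation sketch of the sample-size point. The outlier value of 10, the sample sizes and the helper `sd_inflation` are arbitrary assumptions for the example; the base data are standard normal.

```python
import random
import statistics

rng = random.Random(3)

def sd_inflation(n, outlier=10.0):
    """Ratio of the SD with one added outlier to the SD without it."""
    base = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.stdev(base + [outlier]) / statistics.stdev(base)

small = sd_inflation(10)       # one outlier among 10 values
large = sd_inflation(10_000)   # one outlier among 10,000 values

print(f"SD inflation, n = 10:     {small:.2f}x")
print(f"SD inflation, n = 10000:  {large:.3f}x")
```

In the small sample the single value changes the SD substantially, while in the large sample it barely registers.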

Cletus Ukwubile

Standard deviation depends on the sample size. However, it is better the value stays closer to the sample mean than larger values.

Jochen Wilhelm

@Cletus - to avoid confusion by the readers: standard deviation does not depend on sample size. The standard error depends on the sample size.

The standard deviation is a "population parameter" (referring to a population; a constant for a given population; larger samples will allow you a more precise estimation of this constant), the standard error is a "sample statistic" (referring to a sample; it will approach zero when the sample size approaches the population size).
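The distinction can be checked with a quick simulation. The population (mean 100, SD 15), the seed and the sample sizes are assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(11)
population_sd = 15.0   # hypothetical population: mean 100, SD 15

results = {}
for n in (10, 100, 10_000):
    sample = [rng.gauss(100.0, population_sd) for _ in range(n)]
    sd = statistics.stdev(sample)
    results[n] = (sd, sd / math.sqrt(n))   # (estimated SD, standard error)
    print(n, round(sd, 2), round(results[n][1], 3))
```

As n grows, the estimated SD fluctuates around the same population value while the standard error keeps shrinking.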

Andrew Nkhoma

It all depends upon what you are looking at. The spread, which is the SD from the mean, has different interpretations in different fields, more so in the medical field. In qualitative analysis, it has to do with the perceptions of respondents, where the smaller the SD the better.

Robin Junker

Joshka Kaufmann How can the coefficient of variation be bigger than 1? For example, suppose you have a scale ranging from 1 to 5 and your mean is 3. It is just not possible to have an SD that is as big as 3. But isn't that what you would need if you were aiming for a CV of 1? Sorry. Perhaps I just do not get it.

Joshka Kaufmann

Robin, a CV can be bigger than 1 if you have high variance in your data. Your example has relatively low variance. On the other hand, if your data consist of extreme values around the mean (let's say 1000 observations with only the values -5 and 15), your SD will be very high compared to your mean and CV > 1 (e.g. with the two values equally frequent, the mean is 5, the SD is 10 and the CV is 2). This means your mean is not really informative with regard to your distribution.
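Both cases in this exchange can be checked numerically. The sketch assumes the two values -5 and 15 occur equally often (the split is not stated in the reply) and, for the 1-to-5 scale, uses the most spread-out ratings that still average 3. The function name `population_cv` is made up for the example.

```python
import statistics

def population_cv(values):
    """Population SD divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)

# Two-valued data from the reply above, assuming a 50/50 split
two_valued = [-5] * 500 + [15] * 500
# Most spread-out 1-to-5 ratings that still average 3: half 1s, half 5s
likert_extreme = [1] * 50 + [5] * 50

print(f"two-valued data: CV = {population_cv(two_valued):.2f}")     # mean 5, SD 10
print(f"1-to-5 scale:    CV = {population_cv(likert_extreme):.2f}") # mean 3, SD 2
```

So on a bounded 1-to-5 scale with mean 3 the CV cannot exceed about 0.67, whereas data that can take values far from a small mean can easily exceed 1.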

Damien Hall

Joshka Kaufmann, have you published the rule of thumb 'a CV >= 1 indicates a relatively high variation, while a CV < 1 can be considered low' (quoted from your first answer in this thread) anywhere? I find it very useful--to the extent that I used the CV in a recent linguistics paper:

https://www.cambridge.org/core/journals/journal-of-french-language-studies/article/e-in-normandy-the-sociolinguistics-phonology-and-phonetics-of-the-loi-de-position/1B6B9D55EE7865E322D6EEC62D8CD2E9

I haven't found any other uses of the CV in linguistics, so I cited some literature from medical statistics, as well as citing this actual thread. But it would be great if you'd let me know whether you or anyone has published this rule of thumb!

Ambili Thomas

Joshka Kaufmann, I also have the same concern like Damien Hall. If you can respond to his query, it will definitely help me also!

Roberto Molteni

I am not aware of this rule of thumb.

However, we should start from the definition of the coefficient of variation (CV) or relative standard deviation (RSD): ratio of the standard deviation to the mean.

The RSD is also used in analytical chemistry to express the precision and repeatability of an assay, and the interpretation of these values depends on many factors: type of chemical, concentration, etc.

In other fields, the RSD can be higher or lower; therefore I would suggest aiming for a value as low as possible, compatible with your purposes.

Jorge Ortiz Pinilla

The best standard deviation is the true standard deviation. In theory, the square of its value (the variance) is the basis for knowing the quality of estimation procedures for important parameters such as the mean.

For many researchers, the ideal is to have a small standard deviation, but we must be aware that it is not very useful to have a small standard deviation if it does not correspond to the reality of the data. In this sense, it is very important to think twice before eliminating extreme values for the sole purpose of decreasing the dispersion found in the data.

Keshav Prasad

@ M. Ricky Ramadhian

The basic premise of this question is wrong - in the sense that you are doing a statistical analysis to see how your sample data is distributed. You don't want to manipulate a data population to suit your predetermined goals. I don't mean that you want to manipulate .. but sometimes statistics are manipulated - "tweaked" to be politically correct, to paint a rosy picture!

Having said this, ....

CV, as pointed out by Joshka Kaufmann, indicates the variance: high or low depending on whether CV > 1 or CV < 1, respectively.

On the other hand,......

SD > Mean: Preponderance of Low values

SD < Mean: Preponderance of High values

The larger the difference between the SD and the mean, the greater the frequency of extreme values. Conversely, an SD closer to the mean indicates that the sample spread is smaller and the values tend to cluster around the mean.

So the SD can give a better picture of whether the sample data distribution has a spread of extreme values or has central tendency.

Keshav Prasad

Further, if you know that a population data has some features which are evident or you can deduce some common-sense conclusions - but your sample data points otherwise, then you need to have a re-look in to your sampling mechanism - perhaps your sample data is not representative of the population data.

Consider this: you are looking at a large group of people (this whole data set is termed the population) with diverse food habits (vegetarian, non-vegetarian, vegan, and a combination of all these) and their tendency to develop Type 2 diabetes. Your sample should then be representative, meaning you should have more or less proportional sampling of all the diverse food habit groups.

But if your sample data has a lopsided representation of different food groups, the conclusions you may draw from your statistical analysis may be wrong or even counter productive. In such cases, you ignore your sample data and do a fresh sampling.

This is the correct way of preparing a sample and is not deemed to be manipulating statistics.

Stephen OKELO Lucas

A standard deviation is good if it can be interpreted together with the mean, which is derived from the normal curve; how the data deviate from the mean then gives an interpretation.

Pooja Rantnakant Padyal

With respect to the discussion so far on standard deviation, how does one decide, for a given set of data points, whether the data are dispersed close to the mean or far from it?

Roberto Molteni

Dear Pooja Rantnakant Padyal,

Coefficient of variation (CV) or relative standard deviation (RSD), as ratio of the standard deviation to the mean, should be as low as possible, compatible with your purposes.

For example in chemistry: “The Horwitz curve is "one of the most intriguing relationships in modern analytical chemistry"; it is related to the Variance-to-mean ratio” https://en.wikipedia.org/wiki/William_Horwitz

Carlos Jimenez-Gallardo

Dear Pooja Rantnakant Padyal

You can also use kurtosis. A value much greater than 1 would indicate a concentration in the central part of the variable. You should only be careful with outliers, as they can have a relevant impact on the distribution function in the analysis.

Md. Arif Billah

As per my limited knowledge, the rule of thumb here is for general consideration when you are not following any particular statistical distribution. This rule of thumb for the SD only indicates the variability of the data around its mean. What you accept, and what is standard, depends on your research area. With a large sample, variability tends to be low, and vice versa. So, if your data come from a small sample, your variability could be quite high. Again, high variability may lead your research findings in another direction and may have an impact on the interpretation of your study.

My suggestion is that, whatever your SD is, if the other procedures are perfectly okay, you can report your findings together with the SD.

Lastly, if your data follow a particular distribution, please check the rules for that distribution.

Shakir Tuleab

I believe that the value of the standard deviation should not exceed twice the value of the median in the case of a normal distribution of data.

Adrian Albaciete

omg

Alok Saklani

A statistician once mentioned that the SD should not be more than 1/3 of the mean. Does this help?

Lalthazuala Rokhum

Under the same set of conditions, for reproducibility of reactions or inhibition in the case of antimicrobial activity test of drugs or nanoparticles, the SD should be close to the mean, of course.
