What does the standard deviation (SD) tell us about a non-normal distribution? Is it scientifically sound to use the SD to report non-normally distributed data? If not, what can we do instead?
Daryoush, (updated to answer more of your original question, 28-11-2016)
Standard Deviation can be calculated for any distribution if you have the observations. (As mentioned above, there are other measures, each useful in at least some circumstances.)
[START OF UPDATE] What it tells you, even for a non-normal distribution, is the square root of the average squared distance of the data points from the mean of the distribution - in effect, a typical distance of a data point from the mean.
Note, in particular, that the SD of a distribution is in the same units as the distribution's variable, and can be viewed as a line-segment along the x-axis of the distribution.
You can also calculate the variance of the data (by not taking the square root in the expression for the SD). But this will be in units that are the square of the variable's units - sometimes a bit strange to interpret. (The advantage of the variance is that you can add and subtract variances, allowing one to analyse how the variance is spread among different "degrees of freedom" - but you'd need to look that up e.g. via 2nd link below.) By comparison, because of the square root sign for the SD, one cannot analyse the SD in the same way. [END OF UPDATE]
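As a minimal sketch (the data values here are made up, purely for illustration), both quantities can be computed directly from any set of observations, normal or not:

```python
import math

# Hypothetical observations from some skewed, non-normal distribution
observations = [2.1, 3.4, 2.8, 9.7, 4.0, 3.3, 5.6]

n = len(observations)
mean = sum(observations) / n

# Variance: the average of the squared deviations from the mean
# (population form, divisor n - see the note on n vs n-1 below)
variance = sum((x - mean) ** 2 for x in observations) / n

# SD: the square root of the variance, so it is back in the
# same units as the observations themselves
sd = math.sqrt(variance)

print(f"mean = {mean:.3f}, variance = {variance:.3f}, SD = {sd:.3f}")
```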
But, if you're not already aware of it, there is an essential distinction between the SD of a whole population and that of a limited sample (drawn from a whole, but possibly unobservable in practice, population).
The difference arises because, in the case of sample data, your calculated mean is only a sample mean - an estimate of the population mean that would be used in calculating a population SD.*
For a population SD, the divisor under the square root sign is the number of observations, 'n' (see formulas in 1st link below).
To compensate for the potential error when working with sample data, giving only a sample mean, one reduces the n by one, and so uses (n-1) as the divisor. This compensates for the uncertainty introduced, unavoidably, by having to use a sample mean, and it yields the so-called "unbiased" estimate (strictly, unbiased for the variance rather than the SD itself). (The larger the sample size, the smaller the correction.)
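In most statistical software this is simply a choice of divisor. For example, a sketch using NumPy, where the `ddof` argument sets the divisor to n - ddof (0 for the population form, 1 for the sample form); the data are hypothetical:

```python
import numpy as np

data = np.array([2.1, 3.4, 2.8, 9.7, 4.0, 3.3, 5.6])  # hypothetical sample

# Population SD: divisor n (ddof=0 is NumPy's default)
sd_population = data.std(ddof=0)

# Sample SD: divisor n-1 (ddof=1), compensating for the fact that
# the sample mean is only an estimate of the population mean
sd_sample = data.std(ddof=1)

print(sd_population, sd_sample)  # the sample SD is slightly larger
```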
You can find more on this on the web, e.g. link below.
Hoping this helps - Paul
Footnote, re asterisked point above:
* This is where a relationship to the normal distribution creeps in - even for non-normal distributions - in terms of the distribution of the sample mean itself (as outlined by Alexander above). Compare the following distributions on three levels, e.g. sketched one above the other:
a) the distribution of a population, with a population mean and SD
b) the similar (but, crucially, not identical) distribution of a sample drawn from a population: necessarily less widely distributed along the x-axis, and with a sample mean that's only an approximation to the population mean; and of course its sample SD also, based on this same sample mean
c) the - quite different - distribution of the means of samples drawn from a population - far narrower and tending toward a normal distribution (regardless of the structure of the original population or sample distributions) as the sample sizes increase. For a given sample size, imagine sketching many other sample distributions over (b) above, and marking all their sample means on the x-axis; they will all fall some way above or below the true population mean, and it's the distribution of those points that constitutes (c).
If you consider that a sample mean is about as likely to fall slightly above the true population mean as slightly below it, you can get a sense of how the normal distribution gets into the picture, as an approximation to the distribution of the sample means.
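To make the three levels concrete, here is a small simulation sketch (the exponential population, sample size and number of samples are arbitrary choices for illustration): draw many samples from a clearly skewed population and look at the spread of their means.

```python
import numpy as np

rng = np.random.default_rng(0)

# (a) A clearly non-normal (right-skewed) "population": exponential, mean ~ 1
population = rng.exponential(scale=1.0, size=100_000)

sample_size = 50
n_samples = 2_000

# (b) One sample drawn from the population, and its sample mean
one_sample = rng.choice(population, size=sample_size)
print("one sample mean:", one_sample.mean())

# (c) The distribution of many sample means - far narrower than the
# population, and close to normal despite the skewed population
sample_means = np.array([
    rng.choice(population, size=sample_size).mean()
    for _ in range(n_samples)
])
print("mean of sample means:", sample_means.mean())       # close to the population mean
print("SD of sample means:  ", sample_means.std(ddof=1))  # roughly population SD / sqrt(sample_size)
```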
I'll stop here, before going more into "Student's t-distribution".
I've spent years trying to get the above into the heads of some students, so forgive me for laying out the way I've found to be most effective in getting across to them the distinction between the distribution of a sample and the distribution of the related sample means.
The SD should be reported, because it is a measure of the variability of your sample, and a priori you do not know how your data are distributed; in addition, when your data follow a distribution other than the normal, you should additionally report the median and IQR of your data.
The normal distribution is defined by just its mean and standard deviation, with all its higher cumulants being zero. As long as those higher cumulants aren't too far from zero (and they often aren't, as a result of the central limit theorem), we can approximate other distributions using the normal distribution. Most of the time you see standard deviations being used for non-normal distributions, there is an underlying normal approximation being used.
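As a rough illustration of checking that assumption (with hypothetical gamma-distributed data; `scipy.stats.skew` and `scipy.stats.kurtosis` report the standardized third and fourth cumulants, both zero for a normal distribution):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.gamma(shape=5.0, scale=1.0, size=10_000)  # hypothetical, mildly skewed data

print("skewness:       ", stats.skew(data))      # 0 for a normal distribution
print("excess kurtosis:", stats.kurtosis(data))   # 0 for a normal distribution
# If both are close to zero, summarizing the data by mean and SD
# (i.e. by an implicit normal approximation) is usually reasonable.
```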
An alternative is to use the interquartile range (IQR). For normally distributed data the IQR carries essentially the same information as the SD (it is roughly 1.35 times the SD), but for non-normal data the two can diverge considerably. The IQR better represents the dispersion of the data around the middle value in the presence of skewness, because it is not inflated by a long tail.
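A small sketch of the contrast (using hypothetical lognormal data, not data from the question):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical right-skewed data, e.g. lognormal
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

q1, median, q3 = np.percentile(skewed, [25, 50, 75])
iqr = q3 - q1

print("mean:  ", skewed.mean(), " SD: ", skewed.std(ddof=1))
print("median:", median,        " IQR:", iqr)
# For skewed data the median and IQR describe the typical values and spread better;
# the mean and SD are pulled upward by the long right tail.
```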
According to ResearchGate's "Contact Us"/Community Support:
" The reads do not appear to be an error and they seem to be coming from legitimate sources, mostly via google searches for "standard deviation non-normal distribution". "
If it's this popular, perhaps it's serving a useful purpose?