I'm performing a correlational study of two time series of data in order to identify positive or negative correlations between them. Which correlation coefficient is better to use: Spearman or Pearson?
The Pearson correlation coefficient is the most widely used. It measures the strength of the linear relationship between normally distributed variables. When the variables are not normally distributed or the relationship between the variables is not linear, it may be more appropriate to use the Spearman rank correlation method.
There is a very interesting paper about the differences between these two correlation coefficients on the same sets of data:
http://geoinfo.amu.edu.pl/qg/archives/2011/QG302_087-093.pdf
Hope it will be useful!
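To make the distinction concrete, here is a minimal Python sketch (using NumPy, on illustrative made-up data): for a perfectly monotonic but nonlinear relationship, Spearman's coefficient is exactly 1 while Pearson's falls noticeably short.

```python
import numpy as np

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

def spearman(a, b):
    # Spearman's rho is just Pearson's r computed on the ranks
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson(rank(a), rank(b))

x = np.linspace(0.1, 5.0, 100)
y = np.exp(x)                  # perfectly monotonic, but strongly nonlinear in x

r_pearson = pearson(x, y)      # noticeably below 1: the relation is not linear
r_spearman = spearman(x, y)    # exactly 1: the relation is perfectly monotonic
```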
It depends on what kind of data you have. If your data are on a ratio scale, use Pearson's CC; if ordinal, Spearman's CC. The page http://en.wikipedia.org/wiki/Statistical_data_type will be your guide. Spearman's CC is the same as Pearson's, but calculated on ranks instead of the actual data values, so Spearman's CC can also identify both positive and negative correlations.
I agree with Oleg Devinyak, but don't forget another parameter you should keep in mind in order to identify the optimal and correct (for your type of data) result. You should test the distribution of your data, i.e. whether or not they follow a normal distribution (in the first case you should use a parametric test, in the second a non-parametric test). Pearson's is a parametric test, whereas Spearman's is a non-parametric test that assesses how well the relationship between two variables can be described by a monotonic function, and it is flexible enough for your data. If you want to test the distribution of your data, you could apply a normality test such as Kolmogorov-Smirnov; it's easy and quick, providing a direct answer to your question (normal or not).
I agree with Dimitra. Pearson's CC relies on the assumption of bivariate normality for the two variables, while no distributional assumptions are required for Spearman's CC. The main purpose of running a coefficient of correlation is to provide a value that describes the strength of the relationship between two variables, and it may be negative or positive.
Pearson's CC measures the strength of the linear relationship between two variables and does not rely on any assumptions to be correct (as, generally, is true of most other statistics). It is the p-value for the statistical significance of the obtained Pearson CC that is determined under an assumption of normality. The original question asked about the correlation coefficient (its value), not about its significance.
Just some additional points.
The Pearson correlation coefficient is most appropriate for measurements taken from an interval scale,
while the Spearman correlation coefficient is more appropriate for measurements taken from ordinal scales.
Examples of interval scales include "weight in kg" and "height in inches", in which the individual units (1 kg, 1 inch) are meaningful.
Things like "stress scores" tend to be of the ordinal type, since while it is clear that a stress score of 5 indicates more stress than a stress score of 3, the question is how you interpret, say, a stress score of 1.
But when you add up many measurements of the ordinal type, you end up with a measurement which is really neither ordinal nor interval, and is difficult to interpret.
Everything starts with the type of the data you have.
Since Pearson's CC takes into consideration more information (not only the ranks, but the proportionality between ranks), it would be better if your data can meet, at least apparently, the assumptions. If not, Spearman's CC should be used, even if the data are on an interval scale.
But remember to take care with Pearson's CC pitfalls. For instance, if you increase the total variance, you increase the r-value artificially. Try to split your ordered data, examine the r-value for each split, and then compare with the r-value from the full data, and you will see it. Look for the work of Bland and Altman (1983 and 1986) on limits of agreement; there, they discuss some of these pitfalls.
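The variance pitfall above can be demonstrated with simulated data (a hypothetical example, not from any real study): the same linear relationship with the same noise gives a larger r over the full range of x than over either half of the range.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 400))        # ordered predictor over a wide range
y = x + rng.normal(0, 2.0, 400)             # one linear trend, same noise throughout

r_full = np.corrcoef(x, y)[0, 1]            # wide range -> large total variance in x
r_lo = np.corrcoef(x[:200], y[:200])[0, 1]  # lower half of the range only
r_hi = np.corrcoef(x[200:], y[200:])[0, 1]  # upper half of the range only
# r_full exceeds both split r-values, although the underlying relation is identical
```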
Thanks for the answers, folks! I appreciate all of them... In order to advance the discussion, I would like to exemplify with the case in which I am interested: comparing the number of sunspots during several solar cycles (11-year cycles) and the number of solar flares during this same time interval (these phenomena are expected to present in-phase oscillations). I start my analysis by fitting to both datasets the sum of a third-order polynomial (necessary to fit secular trends) and a sinusoidal function (necessary to fit periodic oscillations). After this, I subtract from the two datasets the polynomial parts previously fitted (secular trends), leaving only their oscillatory features. Then, I compare the two datasets by calculating the correlation coefficients between them. As in this example the oscillations are approximately in phase, the obtained correlation coefficient is expected to be near +1. However, when I perform these same procedures comparing sunspot data with other geomagnetic data (which may present a counter-phase relationship with solar cycles), negative values of the correlation coefficients are expected to be found. I have noticed in my analyses that Spearman seems to work better, being considerably less sensitive to outliers...
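That detrending procedure can be sketched roughly as follows (synthetic stand-in data, and a plain cubic fit only, without the sinusoidal term of the full procedure): two in-phase oscillations riding on different secular trends correlate near +1 once the fitted cubic is subtracted.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 44.0, 528)            # four hypothetical 11-year cycles

# two series with in-phase oscillations on top of different secular trends
s1 = 0.02 * t**2 + 50 + 10 * np.sin(2 * np.pi * t / 11) + rng.normal(0, 1, t.size)
s2 = -0.5 * t + 80 + 7 * np.sin(2 * np.pi * t / 11) + rng.normal(0, 1, t.size)

def detrend(series, t, deg=3):
    # subtract a fitted third-order polynomial, keeping the oscillatory residual
    return series - np.polyval(np.polyfit(t, series, deg), t)

r = np.corrcoef(detrend(s1, t), detrend(s2, t))[0, 1]   # close to +1 (in phase)
```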
Are you really sure that your data demand a LINEAR correlation? Take extra care in these situations when working with dependent data (since your data seem to follow a cyclic oscillation). See this interesting classical example, Anscombe's quartet (http://pt.wikipedia.org/wiki/Quarteto_de_Anscombe), where four different data sets show the same correlation coefficient, but...
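Anscombe's quartet is easy to verify numerically; the values below are the published data sets, and all four give a Pearson r of about 0.816 despite looking completely different when plotted.

```python
import numpy as np

# Anscombe's quartet: four very different (x, y) data sets, one shared x for I-III
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = [
    (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
     [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
]

rs = [np.corrcoef(x, y)[0, 1] for x, y in quartet]   # all four are about 0.816
```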
Good remark, Sandro! It is an interesting link you posted... Actually, this is the first time in my research career that I need to correlate two oscillating (periodic) time series. Up to now, all the other cases I studied were between datasets presenting linear or monotonic increases or decreases. Another similar situation can be seen in my recently published article, titled "Suicide seasonality: Evidence of 11-year cyclic oscillations in Brazilian suicide rates" (see Fig. 3 of my article, available here on RG). The preliminary analysis showed that the cyclic oscillations observed in male suicide rates in Brazil are apparently in counter-phase with sunspot cycles. However, at that stage of my research, I only performed a visual comparison of the datasets...
hello friend...
I have gone through all the answers and I somewhat agree with them, but I will expand on Katuli's answer.
As we all know, "the efficiency of the results of any experiment depends on the choice of an appropriate measure". Pearson's CC is generally used when we are dealing with quantitative data that can be measured physically, like height, weight, pressure, etc. On the other hand, Spearman's rank CC is used when we are dealing with qualitative data that cannot be measured quantitatively but can be ordered or ranked, like intelligence, health, etc. So it merely depends upon the problem under study which measure you are going to use, and which will be better for that particular problem depends upon the behavior of the problem itself.
regards...
Given your description, Walter, I wonder if fitting B-splines to each oscillation, along with 95% confidence intervals, might not be better. You could then depict the comparison of the two oscillations graphically, with X = time and Y = number of events.
Sandro's right, a linear estimate of association might not be very useful to you.
Sandro and Mary are right; maybe a non-linear model would be preferable for your data.
This is something you should test first! I am just giving an example through the following link. It's non-linear canonical correlation analysis, which aims to determine how similar two or more sets of variables are to one another. It can be applied in various software packages (in my opinion the best choices are MATLAB or R, but the easiest and quickest way might be with more "commercial" software such as SPSS, Stata or Statistica).
http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.cs%2Foverals_table.htm
Of course, there is a range of non-linear methods that may suit your data better than canonical analysis, e.g. non-linear time series analysis if you wish to do something more with your data, for instance detect noise, dimensionality, or chaos... this would be interesting too as a next step.
Thank you for the advice and suggestions, Dimitra, Mary and Savitri... Despite the results of the Spearman CC being more interesting at first sight, I agree that the Pearson CC is the correct choice to use in my analyses. With respect to the linearity (or not) between the studied datasets, it is necessary to remark that the data do not present a linear relationship with respect to time (they independently follow approximately sinusoidal functions of time), but they may present a linear relationship between them. For instance, supposing the function X = A * sin(wt) and another function Y = B * sin(wt + k) (both with the same frequency w), we note that if k = pi*n (n = 0, 2, 4, ...), Y will vary linearly with X and will be positively correlated (in-phase related) with this variable, that is, Y = (B/A) * X. On the other hand, if k = pi*n (n = 1, 3, 5, ...), then Y will be linearly and negatively correlated (counter-phase related) with X, that is, Y = (-B/A) * X. For all other possible values of k, the dependence between Y and X is not linear. In a case like this example, is it possible to use the Pearson CC to quantify in-phase and counter-phase relationships between X and Y?
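For sinusoids of equal frequency sampled uniformly over whole periods, Pearson's r works out to cos(k), so it does quantify in-phase (r = +1) and counter-phase (r = -1) relationships, with intermediate phase lags giving intermediate values. A quick numerical check (arbitrary amplitudes A and B, which cancel out):

```python
import numpy as np

t = np.linspace(0.0, 22.0, 2201)[:-1]   # two full 11-year periods, uniform sampling
w = 2 * np.pi / 11
A, B = 3.0, 5.0

rs = []
for k in [0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4, np.pi]:
    x = A * np.sin(w * t)               # X = A sin(wt)
    y = B * np.sin(w * t + k)           # Y = B sin(wt + k)
    rs.append(np.corrcoef(x, y)[0, 1])
# over whole periods r equals cos(k): +1 in phase, 0 in quadrature, -1 in counter-phase
```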
As far as I can see, your data are not independent samples, but rather a time series of two parameters (suicide rates and sunspot occurrence). In that case, be careful when simply correlating the data, since both tests also assume the data points are independent, random samples. Also, there might be confounding factors which you don't consider in your simple correlation analyses. In that case the correlation might indicate a relationship between the two, when in fact they are linked by a third, confounding variable. I would at least recommend looking at possibilities to incorporate the time dependence and confounding factors in your analysis. For time series and oscillating relationships see e.g. Chapter 17 in Alain F. Zuur's book "Analysing Ecological Data". Sorry... I don't have a fast and simple answer ;)
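The warning about dependent observations can be illustrated with the classic spurious-correlation experiment (simulated data; the exact fractions depend on the seed and series length): two independent random walks frequently show a large sample |r|, while independent white-noise series of the same length almost never do.

```python
import numpy as np

rng = np.random.default_rng(42)
n, trials = 200, 500

def frac_large_r(make_series, threshold=0.5):
    # fraction of independently generated pairs whose sample |r| exceeds threshold
    hits = 0
    for _ in range(trials):
        if abs(np.corrcoef(make_series(), make_series())[0, 1]) > threshold:
            hits += 1
    return hits / trials

frac_iid = frac_large_r(lambda: rng.normal(size=n))              # essentially zero
frac_walk = frac_large_r(lambda: np.cumsum(rng.normal(size=n)))  # substantial
```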
Ok Christian,
thank you for answering my question.
About confounding factors and the possibility of a third variable linking these two phenomena, I agree with you... In reality, several other authors who are also studying these issues have searched for a possible connection involving the production of the neurohormone melatonin, which supposedly (at this moment still an unproven hypothesis) would act as the linking variable...
Thank you for the reference suggestion. I will try to find the Zuur book you quoted...
Dear Walter. As an ecologist, I would also think of another variable in nature that is linked to solar activity (e.g. temperatures, light intensities, etc.) and which then affects suicide rates. It would be interesting to see, then, whether in another system the same variable has previously been linked to suicide incidence.
I think the question should not be which is better, but which is more applicable to your type of data. Spearman's correlation coefficient is a standardized measure of the strength of the relationship between two variables that does not rely on the assumptions
of a parametric test. It is the equivalent of Pearson's correlation coefficient performed on data that have been converted into ranked scores.
Conversion of the data into ranked scores may actually make your analysis more sensitive for detecting significant relationships than if the data were not ranked.
In fact, as soon as we prepare a frequency distribution table, we cross over from cardinal numbers to ordinal numbers; much statistical analysis is effectively performed on ordinal data, and even Pearson's method operates on grouped data once we construct a frequency distribution table.
Spearman's method is used for ordinal (ranked) data. If the data are already in ranked form, we have to use Spearman's method anyway.
However, if the data are on an interval scale, it is better to go for Pearson's method. This is what I feel.
I would like to add one point: the hypothesis that the relationship is linear must first be statistically non-rejectable. If the relationship is clearly nonlinear, then computing the (Pearson) correlation coefficient is meaningless.
Spearman's is a rank correlation coefficient. Pearson's uses the raw (original) data and is therefore a parametric correlation coefficient.
Always compute them both and compare them; if they are not nearly the same, examine the scatterplots to determine which is the more reliable in each case. You could do the scatterplots first and then choose Spearman vs Pearson, but it is faster to calculate all of the correlations and examine the cases where they differ.
Bad reasoning, Matthew. Spearman correlation is only for data in ranks, and is largely used when the normality assumptions do not hold in the data. Pearson's r is not to be used when your data are in ranks. Read a statistics book!!
Lots of data that appear to be higher than ranks are merely ordinal: body temp as a measure of disease; blood pressure as a measure of health, and so on. And they often have non-gaussian distributions, so p-values that assume normality are incorrect. The way to determine whether the inaccuracy of an assumption matters (linearity, normality) is to do an analysis that assumes them, and compare to an analysis that does not assume them. If the analyses agree, then the departures from the assumptions are not severe enough to affect conclusions. All medical data, and probably all data, do not conform exactly to the usual assumptions, and the only way to determine whether the degree of departure affects the error rate in conclusions is to perform multiple analyses with different errors of approximation.
By coincidence, I just finished reading a statistics book. When it's published you'll be able to read it too.
The theoretical properties of either Pearson's or Spearman's (rank) correlation are derived under the assumption that the data are a random sample. You've acknowledged that the data are time series observations. There is no reason whatever to believe that the observations are independent (or even exchangeable). In fact, in the case of the sunspot series (at least) we know the observations are not independent.
So, shorter answer: you need to use time series methods. As it happens, these are based on Pearson correlations and cross-correlations.
Yep, simple distinction. Pearson's is the commonly used one; however, if the data are only ordinal for at least one of X or Y, then Spearman's correlation is appropriate. Pearson's is parametric, Spearman's non-parametric.
Pearson's measures the linear relationship; Spearman's measures a general (monotonic) relationship, which includes the linear one.
We use Pearson's because the linear model is the simplest empirical model, but in almost all cases we need only the relationship (with the exception of match-up analysis and the q-q normality test), so Spearman's could be the most usable.
Please note that it is not only a question of scale (interval for Pearson's, rank for Spearman's).
You can apply both coefficients to interval data, and in general Spearman's will come out higher (not in every case, but in general). To say that Pearson's is only for interval scales is to say that such data can only be analyzed with parametric statistics; the interval scale and normality are just the two conditions for applying that kind of statistics.
Note that if F is the distribution function of X, and G is the distribution function of Y, then (provided F and G are continuous, increasing functions) both F(X) and G(Y) are uniform on [0,1], and Spearman(X,Y) = Pearson(F(X),G(Y)). In particular this means that the Spearman correlation measures the dependence contained in the copula of the joint distribution of X and Y, while the Pearson correlation is "contaminated" by the marginals.
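An empirical check of this identity, using normalized ranks as the empirical version of F and G (illustrative simulated data): Pearson's r computed on the empirical F(X), G(Y) coincides with Spearman's rho, since correlation is invariant under linear rescaling of the ranks.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=1000)
y = x + rng.normal(size=1000)           # dependent pair for illustration

def ranks(v):
    return np.argsort(np.argsort(v)).astype(float)

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

u, v = ranks(x) / len(x), ranks(y) / len(y)   # empirical analogue of F(X), G(Y)

rho_spearman = pearson(ranks(x), ranks(y))    # Spearman's rho: Pearson on the ranks
rho_copula = pearson(u, v)                    # Pearson on the empirical F(X), G(Y)
# identical, because rank -> rank/n is just a linear rescaling
```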
Pearson is used for parametric data (e.g. large sample size, at least 30 subjects in each group, normal distribution of the data, etc.);
Spearman is used for non-parametric data (as it ranks the data).
You have to read about the differences in detail (I agree with Abdulvahed Khaledi Darvisha); you should read this paper:
http://geoinfo.amu.edu.pl/qg/archives/2011/QG302_087-093.pdf
Spearman's correlation is just Pearson's for ranks. Pearson's should not be used for non-normal distributions, especially for negatively correlated skewed distributions. For example, for negatively dependent exponentially distributed variables, the minimal value of Pearson's correlation is 1 - pi^2/6 = -0.644934. In the case of the Weibull distribution with shape parameter lower than 1, this minimal value is even greater (even close to zero!).
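The 1 - pi^2/6 bound can be checked by Monte Carlo, generating perfectly negatively dependent Exp(1) pairs from a shared uniform (a sketch with an arbitrary seed): even under perfect negative dependence, Pearson's r cannot reach -1 for these marginals.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.uniform(size=500_000)
x = -np.log(u)        # Exp(1)
y = -np.log1p(-u)     # Exp(1), countermonotonic (perfectly negatively dependent) with x

r = np.corrcoef(x, y)[0, 1]    # Monte Carlo estimate of the most negative attainable r
bound = 1 - np.pi ** 2 / 6     # theoretical minimum, about -0.6449
```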
Pearson benchmarks linear relationship, Spearman benchmarks monotonic relationship.
The Pearson correlation method measures the strength of the linear relationship between normally distributed variables. This is appropriate most of the time for financial returns data. When the variables are not normally distributed, or the relationship between the variables is not linear, it may be more appropriate to use the Spearman rank correlation method. The Spearman rank correlation method makes no assumptions about the distribution of the data. It may therefore be more appropriate for data with large outliers that hide meaningful relationships between series, or for series that are not normally distributed.
A Pearson’s correlation is the same as a standardized regression coefficient. It is used to determine the linear relationship between two variables which are normally distributed. Pearson’s correlation can be strongly affected by extreme scores or outliers. Consequently, if the scores are not normally distributed, the scores can be ranked and a Pearson’s correlation carried out on these ranked scores. This type of correlation is known as Spearman’s rank order correlation coefficient.
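A small simulated illustration of that outlier sensitivity (made-up data): one discordant extreme point wrecks Pearson's r, while Spearman's rho, computed on ranks, barely moves.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=50)
y = 0.8 * x + rng.normal(0, 0.6, 50)   # a clean linear relationship

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

def spearman(a, b):
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson(rank(a), rank(b))

r_clean, s_clean = pearson(x, y), spearman(x, y)

x_out = np.append(x, 10.0)             # one wild, discordant outlier
y_out = np.append(y, -10.0)
r_out, s_out = pearson(x_out, y_out), spearman(x_out, y_out)
# Pearson's r collapses (it may even change sign); Spearman's rho barely moves
```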
Pearson's correlation is bad even for linear relationships. Let X be a Cauchy random variable and let Y = a + bX, with a and b given constants. X and Y have a perfect linear relationship, and yet it is not possible to calculate Pearson's coefficient. But even if second moments exist, Pearson is a bad choice, since it is not a copula-based measure. As a consequence of Sklar's theorem (1959), all the information about the dependence between continuous random variables is in the underlying copula, not in the marginal distributions. It is easy to prove that you may keep the underlying copula unchanged (and so the dependence unchanged) and, just by changing one of the marginal distributions, Pearson's coefficient changes. For example, take (X,Y) a random vector of positive continuous random variables; it is easy to check that the underlying copula for (X,Y) is the same as for (log X, log Y) (so the dependence doesn't change), but certainly corr(X,Y) is not equal to corr(log X, log Y). Check the following reference:
Embrechts, P., McNeil, A.J., Straumann, D. (1999). Correlation: pitfalls and alternatives. Risk Magazine 5, 69-71.
So forget about Pearson. Spearman is a copula-based measure, but it still shares a flaw with Pearson: if Spearman (or Pearson) equals zero, this does not necessarily imply independence. It is better to use "real" dependence measures: copula-based ones that are equal to zero IF AND ONLY IF the random variables are independent, for example Hoeffding (1940) or Schweizer and Wolff (1981), or any Lp distance between the underlying copula and the independence copula. Check:
Nelsen, R.B. (2006) An introduction to copulas. Springer
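The copula argument in the post above can be checked directly (simulated positive variables; the construction of y is arbitrary, chosen only to create dependence): applying the strictly increasing log transform leaves the copula, and hence Spearman's rho, exactly unchanged, while Pearson's r does change.

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.lognormal(size=2000)                  # positive, heavy-tailed marginal
y = x * rng.lognormal(sigma=0.5, size=2000)   # positive and dependent on x

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

def spearman(a, b):
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson(rank(a), rank(b))

# log is strictly increasing, so (X,Y) and (log X, log Y) share the same copula
p_raw, p_log = pearson(x, y), pearson(np.log(x), np.log(y))
s_raw, s_log = spearman(x, y), spearman(np.log(x), np.log(y))
# the copula-based Spearman is identical; the marginal-sensitive Pearson is not
```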
I agree with almost all the answers. I would like to add that the formula for computing Pearson's r is the same as the formula used for Spearman's rho, although there is a shorter formula for the latter. Inasmuch as ratio/interval variables are required for Pearson's r (where the distance between sample values can be arbitrarily small) and ordinal variables are required for Spearman's rho (where the distance between adjacent rank values can only be 0.5 or 1), there is a wider range of Pearson r values that can be computed than Spearman rho values.
You should look at distribution of your data. In case of normal distribution (Gauss's distribution), you can use Pearson correlation coefficient. In case of non-normal distribution Spearman's correlation coefficient should be used.
Normal distribution is rarely (if ever) present in real observational data. Usually we only want to know whether the normal distribution is a good enough approximation for our data in our situation. I've calculated lots of these coefficients. In most cases they were very similar (when I calculated both).
See Chapter 8 from "Statistical Methods in Water Resources" by Helsel and Hirsch:
http://pubs.usgs.gov/twri/twri4a3/pdf/twri4a3-new.pdf
I would calculate both correlation coefficients. If they differ very little, go with Pearson's, since it is better known. If they differ a lot, take a good look at the distribution of the data. Usually, in such a case, Spearman's coefficient is more applicable. A plot of the data helps a lot here. Are (x,y) linearly related, or are they related linearly in the ranks? Such statistics are among the most used/abused/overused around us. It is important to clarify to others how and when to use these basic statistics. It is quite bad how frequently we see the expression "x is correlated with y" when the person saying it does not in fact mean a coefficient of correlation.
Although the Spearman coefficient assumes nothing about the data distribution, it is simply the Pearson coefficient applied to ranks, and is derived from it. If both coefficients are applicable to your data set, please select the Pearson coefficient, since it uses more information, which leads to more accuracy.
Verify the validity of Pearson's correlation coefficient. If valid, go with it, otherwise, Spearman's rank correlation coefficient.
The Pearson correlation coefficient is the most widely used. It measures the strength of the linear relationship between normally distributed variables. When the variables are not normally distributed or the relationship between the variables is not linear, it may be more advisable to use the Spearman rank correlation method.
A coefficient of correlation does not have any distributional assumptions. Only the inferential procedures for the correlation coefficient require normality.
(x,y) are linearly related ==>. Pearson
(x,y) are linear in the ranks ==> Spearman
If the relationship between x and y is not monotonic, neither coefficient should be used.
I must agree with Raid. I would only add that Spearman's rho is Pearson's r applied to ranks, and the rank transformation compensates for non-symmetrically distributed data. If the distribution of either X or Y appears to have one long tail, then Spearman's is probably better.
Spearman is Pearson applied to ranks. But any monotone transformation of ranks would be just as meaningful. So why not compute Pearson correlation of the normal quantiles corresponding to the ranks? It will be very close to ordinary Pearson when the data is bivariate normal, but will be much more robust to outliers when the data is close to bivariate normal. But ... your data is time-series data, not an iid sample. So the usual distributional theory, whether of Pearson or Spearman or whatever you like, is not relevant.
You want to summarize the dependence between two variables in a single number. Maybe a one-number statistic does not do justice to the form of dependence in question. Look at a scatterplot; investigate transformations of the two variables. What kind of relationship do you expect, from your scientific knowledge of the field in question?
I suspect that asking whether to use Spearman or Pearson is the wrong question. Study your data first.
I often use normal quantiles transformations first (such as the one by Blom in SAS), and then if I calculate Pearson's correlation coefficient, then in fact the results are similar to those obtained with Spearman's coefficient on the raw data, as Richard has stated above.
Study your data. Exploratory data analysis is very important, as Tukey has pointed out in the past.
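A sketch of that normal-scores approach (a Blom-type transformation written out by hand, on made-up data with one planted outlier): Pearson's r on the normal scores stays close to the clean value, while raw Pearson's r is badly distorted.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(9)
x = rng.normal(size=100)
y = 0.7 * x + rng.normal(0, 0.7, 100)
x[0], y[0] = 8.0, -8.0            # plant one gross, discordant outlier

def normal_scores(v):
    # Blom-style normal scores: Phi^{-1}((rank - 3/8) / (n + 1/4))
    n = len(v)
    rk = np.argsort(np.argsort(v)) + 1
    inv = NormalDist().inv_cdf
    return np.array([inv((r - 0.375) / (n + 0.25)) for r in rk])

r_raw = np.corrcoef(x, y)[0, 1]                                   # hit by the outlier
r_scores = np.corrcoef(normal_scores(x), normal_scores(y))[0, 1]  # much more robust
```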
I do not have much to add to the many exhaustive replies to this question, only the suggestion to check whether the data to correlate are sufficiently close to normality before using the Pearson correlation. Spearman's method is designed for data in the form of ranks: to use this method you have to ascertain whether the data are in ranks and, if not, whether it is possible to rank your data.
The Pearson correlation coefficient will be suitable in the case of normally distributed data; otherwise, use the Spearman test.
Raid has a very good point.
The principal confusion arises because the Pearson coefficient formula is based on the covariance and the standard deviation of each variable; for this reason it is called a parametric coefficient. But, unlike other parametric tests, it does not need a normal distribution (we use the mean only to calculate the SD and covariance).
Please note that you can use it to obtain the linear association between two variables on different scales, for example a variable on an interval scale (like temperature in degrees) and one on a percentage scale (like the relative abundance of some species).
Spearman sees the global association.
The normal distribution and interval scale are requirements for parametric tests based on mean comparisons. Another big requirement of this kind of test is...
If you need to know more about the scales, you can check Siegel's nonparametric statistics book (in particular Table 1): http://www.mun.ca/biology/quant/siegel.pdf
In conclusion: Pearson shows only the linear relationship between two variables,
while Spearman shows any monotonic relationship between two variables, including the linear one.
I think the method to be used depends on the kind of data. I do not want to repeat what has been mentioned above. Whatever you use to test correlation, you have to explain the logical relationships. Ms. Angel, in her conclusion, summarized the differences clearly. Both methods are valid and indicate the relationship's value and significance (one to one).
I assume that you've adjusted your time series for seasonality and other factors that affect such data. It would be important also to examine them graphically. A "number" by itself is not going to convey much.
I think that there is no clear preference for choosing the Spearman or Pearson test; both are statistically meaningful, and the choice depends on whether a rank correlation coefficient (Spearman) or a measure of linear association (Pearson) is more useful for the project being analyzed.
It is time to stop this discussion. Use Spearman only when your data are in ranks, and the Pearson product-moment correlation when the data are normally distributed or approximately normal. This discussion has gone on too long!
My compliments and support to Jeff Jarrett! I want only to add a simple consideration: many statistical software packages do not even need the researcher's choice, because they automatically choose the type of correlation to use according to the distribution of the data entered. So long, G. Ruggieri
In summary, then: the Pearson correlation is for interval/ratio-scale, normally distributed and/or parametric data, while Spearman's rho is for ordinal-scale or non-parametric data. I think the Spearman correlation is good for the relationship between two variables, i.e. one independent variable, while Spearman's rho is for more than one independent variable. What do you think?
I want to add that the Spearman correlation can be very useful for detecting outliers or subgroupings among the data of the variables to be correlated in a linear regression, because of the regular intervals of the rank values compared with the possible anomalies among the items of a variable: a strong difference between the p of the linear regression and the Spearman p of the same data converted into rank scores may suggest the need to check the characteristics of the original data and of their distribution.
It depends. The Pearson correlation coefficient is the most widely used; when the variables are not normally distributed, or the relationship between the variables is not linear, it may be more appropriate to use the Spearman rank correlation method.
Hi. You have to find out if the continuous variables in the study are normally distributed. If normally distributed, Pearson's correlation coefficient is appropriate. If not normally distributed, Spearman correlation coefficient is to be selected.
Normality is not needed for either correlation coefficient. This has been repeatedly discussed above. The normality assumption is only needed for some inferences on the true correlation coefficient. When the association between X and Y is linear, then we can consider Pearson's correlation coefficient. When it is non-linear and monotonic, then Spearman's correlation coefficient may work better.
Sorry about my last answer; of course the information needs correction, because I lost some words. Please see the remarks:
The Pearson product-moment correlation is for continuous, mound-shaped distributions. The normality assumption is required only for inference, as someone said.
The Spearman correlation is used for ordinal data; most of the time we rank the data and then compute the correlation.
Kendall's correlation is likewise for non-parametric data.
I prefer to report both the correlation coefficient and the p-value; these two values strengthen each other and give strong confidence in the result.
According to what I know, it is impossible to use them interchangeably. The reply by Maria José Dos Santos is correct: the Pearson correlation is used on data having a bell-shaped distribution, and it gives us the strength of the correlation and the level of significance of this correlation when regressing. The Spearman correlation gives the correlation of ordinal data and, as I said in a previous comment, we can usefully explore the presence of outliers or subgroups in data files A and B when regressing A versus B, simply by converting their data into ranks and applying the Spearman correlation to them.
If you want to measure linear dependence between y and x: Pearson's correlation coefficient. If you want to measure general (monotonic) association between y and x: Spearman's.
In the case where one has a strictly monotonically increasing relationship which is not very linear, Spearman's coefficient is better at picking up such associations (higher correlation coefficient and lower p-value) than Pearson's.
In exploratory analyses, I initially look for associations between y & x (if one increases X, does Y increase (or decrease)) without paying much importance to the relative size of the increase (or decrease) in Y: hence I prefer Spearman's.
NB That is the absolute size of the correlation coefficient which is important. p-values for correlation coefficients are notorious to become stat sign. even when the association is weak, given enough sample size.
If your two variables are continuous, use Pearson's correlation coefficient, but if your variables are ordinal, use Spearman's correlation coefficient.
I would say that when you assume that for each increase in X there is a fixed increase (or decrease) in Y, then we measure a linear association in the data. When we assume that for each increase in X there is "some increase (or decrease)" in Y, we should be measuring a monotonic (not necessarily linear) association with Spearman's rank correlation.
I have the idea that by now sufficient interventions have defined the difference between the Pearson and Spearman correlations and their different uses. The first is to be used when between two variables A and B there exists a relationship in which B changes in conformity with the change of A, and vice versa, according to a mathematical link expressed by an equation; in the Spearman correlation, the relationship is instead based on the difference in order between the rank items of the two variables. This also allows us to verify whether two variables to be correlated by the Pearson correlation have sufficiently similar distributions to yield a statistically significant correlation. Converting their data into the corresponding ranks and applying the Spearman correlation to them: a significant Spearman p tells us that A and B have significantly concordant distributions, while a non-significant or weak Spearman p points to differing distributions caused by subgroupings of data and/or outliers. I vote up for Raid Amin.
I want to add that another useful result can be obtained by checking a statistically significant Pearson correlation. Applying the Pearson correlation to the ranks of the regressed data can give either a significant or a non-significant p. In the first case we have the information that the data really have a significant correlation of their distributions. A non-significant Spearman p strongly suggests exploring the regressed data, because there is a high probability that the significant Pearson p is due to one or a few outliers masking the actually low correlation of the remaining data; that is to say, after eliminating the outliers, the Pearson correlation p should turn out non-significant.
Both give the magnitude of the relationship between the two variables. I agree with Darvishan.
There is no doubt that the Pearson and Spearman correlations both give the magnitude of the relationship between two variables, but, as I noted before, the two magnitudes are fundamentally different from each other: applied to the same file of data as actual values (Pearson correlation) and as ranks of the same data (Spearman correlation), they could both yield a significant regression (p < 0.05), or the Pearson regression could be significant (p < 0.05) while the Spearman one is not (p > 0.05).
Hello, Giancarlo
Please correct me if I'm wrong, but are these aspects critical if we focus on the phase of exploratory data analysis (EDA)? I understand that EDA worries more about type II errors than type I...
Of course the two techniques are applicable to different situations: one for parametric and the other for non-parametric data. Since others have explained this, I did not write the details. I agree with you.
Although they are not directly in your field of investigation, I suggest further reading on the alternative between exploratory data analysis and modeling in the analysis of correlation between time series, which is where your question about the choice of correlation coefficient falls. For example:
i) for EDA
Detecting trends using Spearman's Rank Correlation by Thomas Gauthier
(available at : http://plaza.ufl.edu/yiz21cn/refer/trend%20detection%20using%20spearman%20rank%20cc.pdf)
and
ii) for modeling :
Correlation Testing in Time Series, Spatial and Cross-Sectional Data by P.M.Robinson
available at http://eprints.lse.ac.uk/25470/1/Correlation_Testing_in_Time_Series,_Spatial_and_Cross-Sectional_Data.pdf
Hi there,
When the data do not meet parametric assumptions, there are two rank correlation coefficients that may be estimated: Spearman's rho and Kendall's tau. How does one best select between the two?
Walter, this may sound strange, but have you tried any regression with your data? For example, a linear regression may give you more details compared to a Pearson correlation. You can also account for confounders/covariates.
In general, if both variables are measured on the ordinal scale, or one is ordinal and the other is on an interval or ratio scale, use Spearman's rho or Kendall's tau. If both are interval or ratio, use Pearson's product-moment coefficient. See also the distance covariance [V] test of Gabor J. Szekely [2005].
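As a rough sketch of how Kendall's tau differs from Spearman's rho: tau counts concordant versus discordant pairs rather than correlating ranks. A minimal Python version of tau-a (which assumes no ties; the example data are invented):

```python
# Kendall's tau-a: (concordant - discordant pairs) / total number of pairs.
# Assumes no ties; with ties, the tau-b correction (not shown) is needed.
from itertools import combinations

def kendall_tau(x, y):
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)      # same sign in x and y => concordant
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]                    # mostly increasing, two swaps
print(kendall_tau(x, y))               # 0.6 : 8 concordant, 2 discordant pairs
```

A common rule of thumb is that tau tends to be smaller in absolute value than rho on the same data but has a more direct probabilistic interpretation (probability of concordance minus probability of discordance).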
Remember: 'correlation is not necessarily causation!'
George Hart
For data exploration, both methods can be used, although Spearman correlation is computed on ranks whereas Pearson correlation is calculated on the actual values and depicts linear relationships.
Spearman's correlation coefficient can be applied when one or both of the two continuous variables are not normally distributed. A second assumption for Spearman's correlation is that there should be a monotonic relationship between the two variables. To check that, draw a scatter diagram to see whether the relationship is monotonic.
Pearson's correlation can be applied when the variables are normally distributed and there are no outliers in either continuous variable.
I am still in doubt. Does Pearson's coefficient of correlation assume a normal distribution or not? There are many data sets which are not normally distributed but free from outliers and measured on interval/ratio scales. Should we use Spearman's rho on them strictly as a rule of thumb, or can we still apply Pearson's?
Moreover, if Pearson's correlation relies so much on normality and it doesn't make any difference whether we use Spearman's or Pearson's, then what is the logic behind the existence of Pearson's coefficient of correlation?
Pearson’s coefficient of correlation was discovered by Bravais in 1846, but Karl Pearson was the first to describe, in 1896, the standard method of its calculation and to show it to be the best one possible. Pearson also offered some comments about an extension of the idea made by Galton (who applied it to anthropometric data). He called this method the “product-moments” method (or the Galton function for the coefficient of correlation r). An important assumption in Pearson’s 1896 contribution is the normality of the variables analysed, which could be true only for quantitative variables. Pearson’s correlation coefficient is a measure of the strength of the linear relationship between two such variables.
Sir, thank you so much for your reply, but I read in many books that a normality test should be performed before applying Pearson's coefficient of correlation in SPSS, and now I am unable to normalize the data on which Prof. Andy Field applied Pearson's correlation.
I'll be really thankful if you can suggest some good article for resolving this problem with a valid proof.
Distributional assumptions are only needed when inferential procedures are used, Jyoti. If your only wish is to obtain the value of Pearson's correlation coefficient, then you need continuous data but not normality.
Many excellent texts have chapters on the correlation coefficient. Use texts with authors known for their excellent reputations.
The correlation coefficient is just a statistic. If we calculate the value of the statistic (sample mean) we do not assume normality to calculate the mean. If we want to use the sample mean in a t-test, we have to assume normality.
Sir, when I applied bivariate correlation, the Pearson's value came out as -0.441 and because of the p value
I agree with Raid; no normality is needed for either. For testing, you may use a permutation test; the normality assumption is still not needed.
Only when using the Fisher transformation or the t-test is the assumption of a bivariate normal distribution (uncorrelated under the null) needed.
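A permutation test for a correlation coefficient needs no normality: shuffle one variable many times and count how often the shuffled correlation is at least as extreme as the observed one. A sketch in Python (the data and the 2,000-permutation count are arbitrary choices for illustration):

```python
# Permutation test for Pearson's r: no distributional assumption, only
# exchangeability of the pairings under the null of no association.
import random
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    hits = 0
    ys = list(y)
    for _ in range(n_perm):
        rng.shuffle(ys)                      # break any X-Y pairing
        if abs(pearson(x, ys)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)         # add-one rule avoids p = 0

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 7, 6, 9, 8]                 # strongly associated with x
print(permutation_pvalue(x, y))              # small p: association is real
```

The same scheme works for Spearman's rho or Kendall's tau by swapping in the corresponding statistic.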
Spearman's Rank Correlation Coefficient is more robust than Pearson Correlation Coefficient, which is affected by outliers.
Say we have data sets
group x y
1 2 1
1 2 2
1 3 4
1 4 2
1 5 5
2 2 1
2 2 2
2 3 4
2 4 2
2 500 500
For Pearson you will get, for group 1: 0.72; for group 2: 0.9995.
For Spearman, there is no difference between the two groups: 0.73.
Which one you should use depends on your data.
If you think (500, 500) is a typo, use Spearman; if (500, 500) gives you valuable information, use Pearson.
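Re-running this example in Python reproduces the pattern (a sketch; ties are ranked with average positions, under which rho comes out near 0.76 rather than 0.73 — the exact figure depends on the tie-handling convention): the (500, 500) point inflates Pearson's r for group 2 towards 1, while Spearman's rho is identical for the two groups because their rank orderings coincide.

```python
# Verifying the two-group example: Pearson is driven by the outlier,
# Spearman is not, because it only sees the rank order.
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

g1_x, g1_y = [2, 2, 3, 4, 5], [1, 2, 4, 2, 5]
g2_x, g2_y = [2, 2, 3, 4, 500], [1, 2, 4, 2, 500]
print(round(pearson(g1_x, g1_y), 2))    # 0.72
print(pearson(g2_x, g2_y))              # very close to 1: outlier-driven
print(round(spearman(g1_x, g1_y), 2))   # 0.76
print(round(spearman(g2_x, g2_y), 2))   # 0.76 -- same ranks, same rho
```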
While studying the relationship between two or more variables, we use a correlation coefficient. The type of correlation we should use is based on the scale of the variable values. If the values are on a ratio or interval scale, we can use a parametric test; if the variable is on an ordinal or nominal scale, we use a non-parametric test. Karl Pearson's is a parametric test and Spearman's rank correlation is a non-parametric one. Spearman's is the appropriate test when the data are on an ordinal scale, and hence it is also called Spearman's rank correlation.
Raid Amin, your solutions are excellent and most helpful. Thanks a lot for the great support.