I do not agree that the answer to your question is to be found in a good basic statistical textbook. In my view, the issues involved in your question are less straightforward than those covered in a basic treatment of the sample size formula.
As I and some other colleagues have written in previous replies to you, p-values are influenced by sample size in every statistical test, and for a p-value to be meaningful the test should be run on a sample size dictated by a power calculation. If this is not done, the exercise could be called a “fishing expedition”.
The power calculation, in turn, has to be framed in terms of the important difference one seeks to detect; in your case, a biologically or medically important difference. When this is done, a p-value signalling statistical significance means the effect is significant.
Yours is a PhD project (not a fishing expedition, as some may have assumed), and one should assume that the effect you are testing in your analysis was judged biologically important at the design stage of your project.
To say that a correlation of 0.15, or any other “small size” effect, is “weak” or must be non-significant is not correct. You need a context for your study before saying anything like that. Think of cancer research, for example. Since we unfortunately do not have, and do not expect to have, breakthroughs in situations like that, progress in this field comes in small steps. This means that clinically important effects tend to be small effects, which in turn translates into large-scale studies (large sample sizes): to see small things we need a more powerful lens. Therefore, in many biological and medical studies a correlation of 0.15 is not considered weak or unimportant. Equally, as in the example I gave in my previous post, a correlation of, say, 0.60 may not reflect a strong relationship. Statistical methods need a context.
I personally do not see that you can move to an ANOVA or any other method to avoid having this small effect show significance. For one thing, the effect size that you test is set at the design stage. The “small size” effect you are dealing with (reflected in a Pearson correlation of 0.15) will translate into a small Cohen’s d anyway. An important effect size does not have to be a large effect size (see the discussion above). If your biologically important effect is a small effect, then your data are simply telling you that this effect is significant.
To cite a standard definition: “A statistically significant finding is one that is determined (statistically) to be very unlikely to happen by chance.” I think your N is big, and so you can "see" even small variations...
The link below explains the difference between r and the p-value.
In short (and in my own wording, so bear with me): you assume a linear correlation, but you probably have a cloud of points with a positive (or negative) correlation in which not all points lie on the line.
You will probably need to look at other variables and try to find your other correlations to describe your data. Did you start with an ANOVA/GLM?
The significance level (the p-value) is not the same as the effect size (your r = .15). It is possible to have very small effect sizes that are "very significant", as well as substantial effect sizes that are far from reaching the significance level. These are two distinct indicators. In your case, the effect size (Pearson's r) is the strength of the relationship between the two variables, which is relatively small, but the significant p-value reflects the fact that, given your sample size, the measurement error associated with these variables is small enough for the correlation to be "reliable" (in statistical terms, the chance of finding an effect of this size in a population in which the true effect is zero is less than 5%).
With an N of 7 billion, every conceivable r different from 0 will be "significant". There's a difference between a strong or meaningful finding (the size of the correlation) and a finding that is merely unlikely to be due to chance (the p-value).
Correlation/regression dependence on sample size may be a problem; I myself have noticed that a correlation/regression is almost always significant (p very small) when N is large (say N > 100). Other researchers have explained the difference between p and r in this forum, which may not be immediate to grasp if one's background in statistics is not that solid. Even if we leave the technical details to the real experts, we can still get the information that we need.
Prairie (1996) tried to fix the problem of low p-values from large ecological datasets (especially from field data, which are inherently more variable than lab data) by adding a correction factor. Prairie's (1996) general conclusion is that, for such large datasets, regressions that produce r2 values below about 0.65 have little real predictive power. Prairie (1996) provides the full mathematical reasoning and proof behind this claim.
I used Prairie's (1996) rationale in a recent paper of mine, in which I had two regressions (using r2 as the coefficient of determination, since a cause-effect relationship was assumed to exist between the two variables), both based on N = 125 and both highly significant despite a high spread of the data around the model for one of them (r2 = 0.098 and p = 0.0004 for that regression, and r2 = 0.791 with a far smaller p for the other).
There is a lot of discussion in a Psychology Journal about simply doing away with inference and p-values. In an age of "Big Data", that may be sound advice. If you have a large sample size, as already said, everything is "statistically significant". But that means little. What you really want is the "effect size". Admittedly, r=0.15 is pretty small for a linear relationship. Always look at your scatterplot before deciding, though...I often see people not doing that.
To put it simply, it's because r describes the strength of the association (how strongly the variables correlate) and p tells you whether r is significantly different from zero. The problem is that p is directly influenced by your sample size N, because the associated t value is calculated as t = r*sqrt(N-2)/sqrt(1-r^2). As you can see, with growing sample size the t value automatically increases (it becomes "more" significant) even though your r value is held constant.
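A quick numerical sketch of that formula (r = 0.15 is taken from the question; the N values are arbitrary and scipy is assumed to be available):

```python
# Illustration only: for a fixed r, the p-value shrinks as N grows.
# Same formula as above: t = r*sqrt(N-2)/sqrt(1-r^2), tested on N-2 degrees of freedom.
import math
from scipy import stats

r = 0.15
for n in (20, 50, 100, 180, 1000):
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
    p = 2 * stats.t.sf(t, df=n - 2)   # two-sided p-value
    print(f"N = {n:4d}   t = {t:5.2f}   p = {p:.4g}")
```

With these numbers, r stays at 0.15 throughout while p falls from roughly 0.5 at N = 20 to far below 0.001 at N = 1000.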
As many researchers suggest, null hypothesis testing by itself can be pointless! P-values are somewhat shallow and pointless in themselves unless you conducted a power analysis a priori. But have a look at
Yes, a regression would be better if I could get more data on further independent variables. We have reached an agreement after a face-to-face discussion with our team members, including several statistical experts: we will focus on the ANOVA analysis.
Greetings Gui-jun Wa,

P-values are influenced by sample size for every statistical test, and that is why a power calculation is always required: it is what makes the p-values meaningful. So keep this in mind as you move to ANOVA or any other test. I don't think there is any reason to move away from the test you have done. (And besides, why leave an unfulfilled emptiness in our thought process?) You only need to realize that there is more to it when using a statistical test to judge the significance of a Pearson correlation coefficient. Let me try to explain.

We all know that two points determine a straight line. So two points, no matter what population correlation they come from, will show a perfect correlation. Likewise, a small sample of points can show a high correlation even if the variables are completely unrelated. On this basis we realize that, for a small set of points (a small sample) to substantiate an important correlation, the correlation has to be high indeed. For example, for a sample of 10 points to substantiate an important correlation, the coefficient would have to be above 0.63; if 10 points gave you a correlation of 0.60, that would not be enough. With a sample of 100 (more like your case), the coefficient required to reach significance at the 5% level would be around the correlation you have: 0.15 (one tail) or 0.19 (two tails). There are tables to check the precise values, but with 150 your correlation is significant at the 5% level. Your data definitely support a significant correlation at the usual 5% level.

Best regards, Ana
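For anyone who wants to reproduce the "minimum r for significance" values Ana quotes from the tables, here is a rough sketch (the N values and alpha = 0.05 are only for illustration):

```python
# Smallest |r| that reaches significance at level alpha for a sample of size n:
# r_crit = t_crit / sqrt(t_crit^2 + n - 2), with t_crit from the t distribution on n-2 df.
import math
from scipy import stats

def critical_r(n, alpha=0.05, two_tailed=True):
    q = 1 - alpha / 2 if two_tailed else 1 - alpha
    t_crit = stats.t.ppf(q, df=n - 2)
    return t_crit / math.sqrt(t_crit**2 + n - 2)

for n in (10, 100, 180):
    print(n, round(critical_r(n, two_tailed=False), 3), round(critical_r(n), 3))
```

For a sample of 10 this returns roughly 0.63 for the two-tailed case, matching the figure quoted above.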
R and p-value statistics are among the most used, but perhaps most misused statistics. As others have pointed out above, they are impacted by various factors, notably sample size, and though they may seem appealing as single-number measures, they need a great deal of interpretation.
For continuous data, I recommend confidence intervals. It is sometimes pointed out that they can be technically misinterpreted, but from a practical point of view they are not really misleading when making decisions about the data. A rough analogy: a confidence interval conveys the kind of information you would otherwise get from a p-value combined with a type II error or power analysis. So if a confidence interval is appropriate to your application, it will be more easily interpreted, more meaningful and more useful.
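As an illustration, here is a minimal sketch of a confidence interval for a Pearson correlation via the Fisher z-transformation (r = 0.15 and N = 180 are placeholder values taken from the thread, not your actual data):

```python
# Approximate 95% confidence interval for a Pearson r using the Fisher z-transformation.
import math
from scipy import stats

r, n = 0.15, 180                        # placeholder values
z = math.atanh(r)                       # Fisher z-transform of r
se = 1 / math.sqrt(n - 3)               # approximate standard error of z
zcrit = stats.norm.ppf(0.975)           # two-sided 95% quantile
lo, hi = math.tanh(z - zcrit * se), math.tanh(z + zcrit * se)
print(f"r = {r}, 95% CI approx. ({lo:.3f}, {hi:.3f})")
```

Here the interval runs from just above 0 to about 0.29, which tells the reader at a glance both that the association is probably real and that it is small.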
Scatterplots are informative for continuous data, where appropriate, also.
A good measure you might also investigate is the variance of the prediction error, found in econometrics texts and elsewhere. Heteroscedasticity is generally a part of this. Rather than transform it away, I suggest keeping it in the error structure; it is usually transformed out so that hypothesis tests can be used, but you have already heard how problematic those are.
Cheers - Jim
Article Practical Interpretation of Hypothesis Tests - letter to the...
Think about the null and alternative hypotheses in your situation. The null hypothesis is r = 0 and the alternative hypothesis is r ≠ 0. You are testing this null hypothesis against the alternative and deciding whether to accept or reject the null hypothesis. You got a p-value less than 0.05, which means you have to reject the null hypothesis (accept the alternative), but that does not mean the observed correlation coefficient is strong. You should draw your conclusion from the coefficient of correlation itself. Here the conclusion is that there is a very weak correlation between the variables and that the observed correlation coefficient is not due to chance (the p-value decides whether the observed value or difference is due to chance or is real). Remember that the significance of a correlation coefficient does not mean that the correlation is strong or perfect. Hope this is clear.
Try to test all the variables that you can theoretically link to the dependent variable as its predictors and perform a regression analysis; this will give you more insight into how the phenomenon could be described. Be careful: when the number of predictor variables increases, you should look for the combination of predictors that best maximises the explanation of the dependent variable (R squared), so a stepwise technique would be helpful, especially backward elimination, because it allows all candidate variables to be considered in the regression model (see the sketch below). This would give you a much better picture than correlation coefficients alone.
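A minimal sketch of that backward-elimination idea with statsmodels (the DataFrame `df`, the response name and the predictor names are all placeholders):

```python
# Sketch of backward elimination: repeatedly drop the predictor with the largest
# p-value until every remaining predictor is significant at 'alpha'.
import statsmodels.api as sm

def backward_eliminate(df, response, predictors, alpha=0.05):
    preds = list(predictors)
    while preds:
        X = sm.add_constant(df[preds])
        model = sm.OLS(df[response], X).fit()
        pvals = model.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:       # all remaining predictors significant: stop
            return model
        preds.remove(worst)             # otherwise drop the weakest predictor
    return None                         # no predictor survived
```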
What is the size of your sample? If N is large enough, even small correlations will be "significant". Remember that effect size and sample size are confounded in the p-value. You can have a "statistically significant" result that is "practically" irrelevant. Also, this is a correlation: all it shows is that the predictor and criterion variable, for whichever reason, consistently vary with each other. In this result, they don't covary all that much.
Actually I agree with Ana, too. Sorry for the confusion: this question was raised by the referees reviewing my manuscript, which is currently under review. In fact, this correlation analysis contributes little to the main idea of the paper; or rather, the correlation analysis was an extension or exploration. The ANOVA is enough to support our main conclusion without doubt in the data analysis. Considering that the weak correlation noted by the kind reviewer is not so important to our main conclusions, and to keep a side issue from overshadowing the main point, we decided to remove this part of the correlation analysis. As you can see from the answers here, the weak correlation in our manuscript attracts too much attention, which may shift readers' focus away from the main conclusion and the other corresponding analyses. We may discuss the result of the correlation analysis in the next paper.
Dear Ana and all the people interested in this question,
Thanks again for your kind answers. Your warm and patient replies have helped me a lot in better understanding my data as well as the future directions of its analysis.
If your p-value is > 0.05, there's no point in dwelling on the relationship between the variables. However, in a case where your p-value is very close to 0.05, including more observations (a larger N) may help.
What's your N? You might end up with p < 0.05 even with an r value of 0.1 if your N is very large. At this juncture, I must point out that the game is not solely about the p's and r's; rather, it's about what the result would mean for your research question and your data.
I'm doing lab-based research right now, and I also encounter the same problem, in which the data show a weak correlation that is not significant enough to support the literature review. The comments from Ana and others have given me more insight into how this happens.
The p-value only tells you the probability of seeing a correlation this large if the true correlation were zero. The slope is directly proportional to the r value: b = r*(Sy/Sx), and if both Y and X are standardized this becomes b = r. So you have a linear relationship, albeit a weak one, and you are more than 95% confident that it is there.
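A quick numerical check of that slope identity on made-up data (numpy assumed, all numbers arbitrary):

```python
# Verify b = r * (Sy / Sx) on simulated data with a weak linear relationship.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.15 * x + rng.normal(size=500)           # arbitrary weak relationship
r = np.corrcoef(x, y)[0, 1]
b = np.polyfit(x, y, 1)[0]                    # ordinary least-squares slope
print(b, r * y.std(ddof=1) / x.std(ddof=1))   # the two values agree
```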
Correlation analysis: Achim Bühl and Peter Zöfel's classification was employed to interpret correlation coefficients (r): up to 0.2, very weak; up to 0.5, weak; up to 0.7, medium; up to 0.9, high; over 0.9, very high correlation.
Achim Bühl, Peter Zöfel. SPSS Version 10. Einführung in die moderne Datenanalyse unter Windows. 7., überarbeitete und erweiterte Auflage. Diasoft, 2005. 602 p.
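If it helps, those bands translate directly into a small helper function (the handling of the boundary values is my own choice, since the classification does not spell it out):

```python
# Hypothetical helper applying the Bühl/Zöfel bands quoted above.
def correlation_strength(r):
    a = abs(r)
    if a < 0.2:
        return "very weak"
    if a < 0.5:
        return "weak"
    if a < 0.7:
        return "medium"
    if a < 0.9:
        return "high"
    return "very high"

print(correlation_strength(0.15))   # -> very weak
```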
In relation to the question: when a correlation analysis is needed, you must have a clear idea of what correlation coefficient would be of interest, depending on the variables and their practical interpretation. I suggest defining this beforehand (in the design of the research); then you will have no problem interpreting the results. Something more: the null hypothesis of the significance test can be changed depending on the coefficient you want to prove. Most statistical software tests the null hypothesis that your coefficient equals zero, but you can change this hypothesis to test that your coefficient equals a given value (Rho0):
Traditional test: Ho: Rho = 0.0
In this case, when you have a large sample size, the probability of rejecting the null hypothesis is greater, so you end up with a significant p-value even with a very low correlation coefficient.
That can be changed to a hypothesis based on the expected Rho (r) value:
Modified test: Ho: Rho = Rho0
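One common way to test Ho: Rho = Rho0 is the Fisher z-transformation; a minimal sketch (r, n and Rho0 below are placeholder values, not the questioner's data):

```python
# Test Ho: Rho = Rho0 (a non-zero null) using the Fisher z-transformation.
import math
from scipy import stats

r, n, rho0 = 0.15, 180, 0.30            # placeholder values
z = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
p = 2 * stats.norm.sf(abs(z))           # two-sided p-value against Rho = Rho0
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these placeholder numbers, an observed r of 0.15 would be significantly different from Rho0 = 0.30, which is a very different statement from being significantly different from zero.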
The value of p depends on the size of the effect and the SIZE OF THE SAMPLE: "Even a small effect will become significant in a large sample." That is in fact why there is this whole movement to get away from significance testing. Your N = 180; that is not a small sample. My guess, without knowing more about your variables, is that you have a really small correlation that, due to the size of the sample, has reached significance. The shared variance (R2) is puny (0.75%). I would say these two variables are not good predictors of each other.