Correlation is between two variables, e.g. age and physical activity. In linear regression you can see how, e.g., depression is linked to age, sex and hours of physical activity, and how much each of these variables explains depression, or whether they explain it at all.
What is the difference between a correlation and linear regression?
Correlation is a bidirectional relationship between two variables: X influences/predicts Y, and Y likewise influences/predicts X. Some tests used include the Pearson correlation (for interval or ratio variables), the Spearman correlation (for ordinal variables), etc.
Linear regression is a one-directional relationship from one variable to another, e.g. X influences/predicts Y. Linear regression can be simple linear regression, as in the example above (X -> Y), or multiple linear regression, e.g. variables A, B and C influence/predict Y.
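As a concrete illustration, here is a minimal Python sketch (with simulated, hypothetical data and made-up coefficients) of the simple vs. multiple case: Y is predicted first from A alone, then from A, B and C jointly, using NumPy's least-squares solver.

```python
# Sketch: simple vs. multiple linear regression on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)          # predictor A
b = rng.normal(size=n)          # predictor B
c = rng.normal(size=n)          # predictor C
# Hypothetical true model: y depends on all three predictors plus noise.
y = 2.0 * a - 1.0 * b + 0.5 * c + rng.normal(scale=0.1, size=n)

# Simple regression: Y predicted from A alone (X -> Y)
X_simple = np.column_stack([np.ones(n), a])
coef_simple, *_ = np.linalg.lstsq(X_simple, y, rcond=None)

# Multiple regression: Y predicted from A, B and C jointly
X_multi = np.column_stack([np.ones(n), a, b, c])
coef_multi, *_ = np.linalg.lstsq(X_multi, y, rcond=None)

print(coef_simple)  # intercept and slope for A only
print(coef_multi)   # intercept and slopes for A, B and C
```

The multiple-regression coefficients recover the assumed values (about 2, -1 and 0.5), while the simple regression can only describe the A-Y relationship.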
Correlation quantifies the degree to which two variables are related. Correlation does not fit a line through the data points; you simply compute a correlation coefficient (r) that tells you how much one variable tends to change when the other one does. When r is 0.0, there is no linear relationship. When r is positive, there is a trend that one variable goes up as the other goes up. When r is negative, there is a trend that one variable goes up as the other goes down.
Linear regression finds the best line that predicts Y from X. Correlation does not fit a line.
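A small Python sketch (simulated data) of this point: the correlation is a single, symmetric number, while the fitted line depends on which variable is being predicted from which.

```python
# Sketch: r is symmetric in x and y, but the regression slope is not.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=1.0, size=500)   # hypothetical relationship

r = np.corrcoef(x, y)[0, 1]                 # same as np.corrcoef(y, x)[0, 1]
slope_y_on_x = np.polyfit(x, y, 1)[0]       # line predicting y from x
slope_x_on_y = np.polyfit(y, x, 1)[0]       # line predicting x from y

print(r, slope_y_on_x, slope_x_on_y)        # one r; two different slopes
```

Here the two slopes differ because x and y have different variances, while r is the same whichever way round you compute it.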
also,
Correlation is described as the analysis that tells us the association, or the absence of a relationship, between two variables 'x' and 'y'. On the other hand,
regression analysis predicts the value of the dependent variable based on the known value of the independent variable, assuming an average mathematical relationship between two or more variables.
Correlation analysis attempts to measure the strength of direct association between two or more variables (Green & Salkind, 2014). Through correlation analysis, a researcher can identify and inspect the correspondence between predictor and criterion variables (Green & Salkind, 2014). The correlation between two variables can be depicted by plotting the data values on a single graph; this pattern forms a scatterplot diagram. If the points tend to form a straight line, there is a high correlation; if the points resemble a random pattern, there is little correlation. In standard statistical notation, the coefficient of correlation is "r" and the coefficient of determination is r-squared. The coefficient of correlation measures the strength of the linear relationship. The coefficient of determination represents the proportion of the total variation in the criterion variable explained by the regression equation.
With regression analysis (also known as least-squares analysis), attributes of the criterion variable are explained in terms of one or more predictor variables. Regression analysis determines functional relationships between quantitative variables, and it permits finding trend lines and developing models based on the calculated association of variables. Simple regression employs only one predictor variable; multiple regression employs more than one. Jointly or separately, regression analysis extends the study of correspondence by seeking a linear equation relating the selected variables (Green & Salkind, 2014).
Correlation and regression analysis are distinct quantitative forecasting techniques (Faul et al., 2009). A premise supporting the use of correlation and regression analysis is that a logical data relationship may exist and persist in the absence of researcher bias or changes in circumstances. However, neither correlation nor linear regression permits inference of a cause-and-effect relationship among the selected variables.
References
Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41, 1149-1160. doi:10.3758/brm.41.4.1149
Green, S. B., & Salkind, N. J. (2014). Using SPSS for Windows and Macintosh: Analyzing and understanding data. Upper Saddle River, NJ: Pearson Education.
Linear regression and correlation are both used to show the relationship between two scale variables, but when you want to predict a dependent variable from an independent variable, you use linear regression.
In fact, there is a tight connection between (Pearson) correlation and linear regression. If beta_1 is the regression slope parameter, then
beta_1 = cor(x,y) * sd(y)/sd(x)
Consequently,
correlations do represent a linear dependence;
if you first z-transform the outcome y and the predictor x, the regression slope equals the correlation again.
Check out the attached R notebook for a demonstration.
The bottom line is that the Pearson correlation is a measure of linear dependence under normalization of variance. The main advantage of linear regression is that the relationship is fully spelled out (including the intercept) and on the original scale, so that one can obtain predictions. The main advantage of correlation is that, by standardizing the variance, one can compare associations across different scales.
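For readers without R, a minimal Python sketch of these identities on simulated data: the OLS slope equals cor(x, y) * sd(y) / sd(x) (note the order of the standard deviations), and after z-transforming both variables the slope equals the correlation itself.

```python
# Sketch: slope = r * sd(y)/sd(x); z-scored slope = r.  Simulated data.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1000)
y = 1.5 * x + rng.normal(scale=3.0, size=1000)   # hypothetical relationship

r = np.corrcoef(x, y)[0, 1]
slope = np.polyfit(x, y, 1)[0]
print(np.isclose(slope, r * y.std(ddof=1) / x.std(ddof=1)))  # True

# z-transform both variables, then regress again
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
slope_z = np.polyfit(zx, zy, 1)[0]
print(np.isclose(slope_z, r))  # True
```

Both checks are algebraic identities, so they hold exactly (up to floating-point error) for any data set, not just this one.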
Furthermore, I would be very careful with statements that correlation is bidirectional whereas regression is not. What to put on the right-hand side of a regression model is purely the choice of the examiner.
Simple correlation shows the degree, direction, and significance of the relation between two variables only, without considering which of the two affects the other, while simple regression shows the relationship between the two variables together with a significance test of one variable's effect on the other.
When talking about a two-dimensional random vector, the answer by @Martin completely explains the difference. Let me just add some details: when the dependences "y on x" and "x on y" are considered, the two corresponding beta coefficients equal, respectively, beta_{y|x} = cor(x,y) * sd(y)/sd(x) and beta_{x|y} = cor(x,y) * sd(x)/sd(y), where cor(x,y) = cor(y,x).
Moreover, the above relations concern the case when the regression is defined as the straight line y = \beta*x + \alpha minimizing the cost function SUM := \sum_{i=1}^n (y_i - \beta*x_i - \alpha)^2.
Remark. There are other definitions of regression, e.g. the best-fitting exponential curve y = A * exp{ b*x }. Then the relation between the correlation and the regression coefficients A and b is not so simple :)
Motivated by one of the answers above, I would like to share a remark: to be independent and to be uncorrelated are different things.
EXAMPLE 1. If the set of observations equals {-1,0,1} \times {0,1}, and if the weights (probabilities of the points) are all the same, then the coordinates are uncorrelated AND independent; in particular, the conditional probability distribution of y is independent of x (it is the classical uniform distribution on {0,1}).
EXAMPLE 2. If the set of observations equals { (-1,0), (0,-1), (0,0), (0,1), (1,-2), (1,0), (1,2) }, and if the weights (probabilities of the points) are all the same, then the coordinates are uncorrelated BUT NOT independent. Indeed, the conditional probability distributions of y depend on x as follows: if x=-1, the distribution is concentrated at 0; if x=0, the conditional distribution of y is uniform on {-1,0,1}; and if x=1, the conditional distribution of y is uniform on {-2,0,2}.
EXAMPLE 3. If the set of observations equals { (-1,0), (-1,1), (0,-2), (0,0), (1,0), (1,1) }, and if the weights (probabilities of the points) are all the same, then the coordinates are uncorrelated AND NOT independent. Additionally, the conditional expectations of y given x ARE NOT THE SAME: E(y|x=-1) = E(y|x=1) = 1/2, whereas E(y|x=0) = -1. In all three examples the correlation equals 0, and in Examples 2 and 3 the equation of the linear regression obtained by the least-squares method is also the same: y = 0.
SUMMARY. One cannot conclude that y does not depend on x when the correlation equals zero; even the conditional expectation may depend on x.
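EXAMPLE 3 above can be checked numerically; a short Python sketch:

```python
# Numerical check of EXAMPLE 3: correlation 0 and fitted line y = 0,
# yet the conditional means of y clearly depend on x.
import numpy as np

pts = np.array([(-1, 0), (-1, 1), (0, -2), (0, 0), (1, 0), (1, 1)], dtype=float)
x, y = pts[:, 0], pts[:, 1]

print(np.corrcoef(x, y)[0, 1])   # 0.0: uncorrelated
print(np.polyfit(x, y, 1))       # slope ~ 0, intercept ~ 0: the line y = 0

for v in (-1.0, 0.0, 1.0):
    print(v, y[x == v].mean())   # conditional means: 0.5, -1.0, 0.5
```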
Really, it is not easy to answer this question briefly, but it may be useful to explain each part of the question, as below:
1. Correlation is used to test relationships between quantitative variables (or, using rank-based coefficients such as Spearman's, ordinal variables). In other words, it is a measure of how things are related. The study of how variables are correlated is called correlation analysis.
A correlation coefficient gives a numerical summary of the degree of association between two variables, e.g., to what degree do high values of one variable go with high values of the other? Correlation coefficients vary from -1 to +1, with positive values indicating an increasing relationship and negative values indicating a decreasing relationship. A 0 means there is no linear relationship between the variables at all, while -1 or +1 means there is a perfect negative or positive correlation (negative or positive here refers to the direction of the trend the relationship produces on a graph).
2. Linear regression model/analysis is a technique used to predict the value of one quantitative variable by using its relationship with one or more additional quantitative variables. For example, if we know the relationship between height and weight in adult males, we can use regression analysis to predict weight given a particular value for height.
The relationship between height and weight is familiar to us; generally, the taller a person is, the more he weighs. Another familiar relationship is that of crop yield and the amount of fertilizer applied to the land: the more fertilizer applied, the greater the yield, up to a point. If too much fertilizer is applied, the crop will be killed off by the fertilizer chemicals; the land will be "burned." An important relationship in business is that between the dollars allocated to advertising effort and the level of sales of a product: in general, the more money spent on advertising, the greater the level of sales.
The simple linear regression model (which is one of the regression models) is a mathematical way of stating the statistical relationship that exists between two variables. The two principal elements of a statistical relationship are:
(1) the tendency of the dependent variable Y to vary in a systematic way with the independent variable X, and
(2) the scattering of points about the "curve" that represents the relationship between X and Y.
These two elements of a statistical relationship are represented in a simple linear regression model by assuming that:
(i) there is a probability distribution of Y for each value of X, and
(ii) the means of these probability distributions fall perfectly on a line.
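A small simulation sketch of assumptions (i) and (ii), with made-up parameter values: for each X there is a whole distribution of Y, and the means of those distributions lie on the line beta0 + beta1*X.

```python
# Sketch: the conditional distributions of Y have means on a line.
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma = 2.0, 0.5, 1.0     # assumed (hypothetical) parameters

for x in (0.0, 2.0, 4.0):
    # a probability distribution of Y for this value of X (assumption i)
    y = beta0 + beta1 * x + rng.normal(scale=sigma, size=100_000)
    # its empirical mean is close to the line beta0 + beta1*x (assumption ii)
    print(x, y.mean(), beta0 + beta1 * x)
```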
I like the wide explanation by @Z. A. Al-Hemyari, except for the last line. Example 3 of my earlier answer shows that the conditional averages do not have to lie on the linear regression line: in that case E(y|x=-1) = E(y|x=1) = 1/2, whereas E(y|x=0) = -1, and even so the equation of the linear regression is y = 0. Best regards, Joachim
Correlation = the degree to which V2 tends to increase or decrease as V1 increases, and vice versa. It only makes sense to talk about "how much V increases" if V is quantitative. Thus, correlation measures the degree of linear association between two quantitative variables, V1 and V2. It is two-directional: if V1 is positively correlated with V2, then V2 is positively correlated with V1, and vice versa; the same applies if the correlation is negative. Indeed, correlation(V1,V2) = correlation(V2,V1).
Regression measures the effect of X1 on Y.
Linear regression measures the effect of X1 on a quantitative Y: how much does Y increase when X1 increases by 1? It is one-directional. Year of birth may affect income, but income is unlikely to affect year of birth.
Y is the effect and X1 is the cause. More Xs can be, and usually are, included; phenomena usually are the outcome of more than one causal factor. Several Xs may not be quantitative.
Simple correlation (if non-spurious) is a measure of LINEAR association between two random variables. Regression (I assume you mean least-squares linear regression (OLS), with some assumptions) is a way to get an equation in which a nominated x-variable predicts the nominated response or y-variable, viz.: y = a + bX + error, where the slope is the correlation multiplied by [SD(y)/SD(x)] and the y-axis intercept = y(mean) - slope*x(mean).
Unfortunately, OLS ignores the measurement uncertainty associated with the predictor variable, with consequences for slope bias (see attached file).
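A minimal simulated sketch of this attenuation effect (the parameter values here are illustrative assumptions): adding measurement error to the predictor biases the OLS slope toward zero.

```python
# Sketch: measurement error in x attenuates the OLS slope.
import numpy as np

rng = np.random.default_rng(4)
x_true = rng.normal(size=5000)
y = 2.0 * x_true + rng.normal(scale=0.5, size=5000)   # true slope 2.0
x_noisy = x_true + rng.normal(scale=1.0, size=5000)   # x measured with error

slope_true = np.polyfit(x_true, y, 1)[0]    # close to 2.0
slope_noisy = np.polyfit(x_noisy, y, 1)[0]  # attenuated toward zero
print(slope_true, slope_noisy)
```

With equal variances for the true predictor and the error, the expected attenuation factor is 1/2, so the noisy-predictor slope comes out near 1.0 rather than 2.0.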
I have tried entering just two variables in SPSS for correlation and regression analysis. The results were the same, i.e., R and p were identical in the correlation and the regression analysis. But when I used two variables to explain one variable, I found different results; therefore, regression is used when one or more variables explain (predict) another, dependent variable.
Correlation could be one- or two-directional and doesn't tell you which variable is the cause and which is the result; it is just an association. Linear regression is a one-way association of linearity between a predictor and an outcome, which should be a continuous variable.
The correlation coefficient is a measure of the degree of linear relationship between two variables, usually labeled X and Y, while in regression the emphasis is on predicting one variable from the other. In correlation, the emphasis is on the degree to which a linear model may describe the relationship between the two variables. In regression the interest is directional: one variable is predicted and the other is the predictor. In correlation the interest is non-directional: the relationship itself is the critical aspect.
The sign of the correlation coefficient (+, -) defines the direction of the relationship, either positive or negative. However, neither correlation nor regression by itself establishes causation; one cannot draw cause-and-effect conclusions based on correlation.
There are two reasons why we can not make causal statements:
1. We don't know the direction of the cause - Does X cause Y or does Y cause X?
2. A third variable "Z" may be involved that is responsible for the covariance between X and Y.
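Reason 2 can be illustrated with a short simulation (hypothetical data): a hidden variable Z drives both X and Y, producing a clear correlation even though neither variable affects the other.

```python
# Sketch: a confounder Z induces correlation between X and Y.
import numpy as np

rng = np.random.default_rng(5)
z = rng.normal(size=2000)                     # hidden common cause
x = z + rng.normal(scale=0.5, size=2000)      # X depends only on Z
y = z + rng.normal(scale=0.5, size=2000)      # Y depends only on Z

r = np.corrcoef(x, y)[0, 1]
print(r)  # clearly positive although X never affects Y
```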
I have one question related to this. I want to check the effect of temperature on the start time of singing of species X. For this I applied a linear mixed-effects model with minutes before sunrise (start time) as the response, temperature as a fixed effect, and site as a random effect. The model shows a correlation of the fixed effect of -0.522, with a significant p value.
In correlation, we only measure the degree of the relationship, while in regression we measure which of the variables drives the relationship (the independent, predictor, or explanatory variable) and which one receives the effect (the dependent or response variable).
In linear models in statistics, we have both relationship and prediction. The correlation coefficient considers only the first, the linear relationship. Linear regression considers the linear relationship but also provides a prediction model for our variables and dataset.
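A closing sketch of this distinction, with made-up height/weight numbers: the correlation summarizes the relationship in one number, while the fitted regression model can predict a new case.

```python
# Sketch: correlation describes; regression predicts.  Simulated data.
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(150, 200, size=100)                    # e.g. height (cm)
y = -100 + 0.9 * x + rng.normal(scale=5, size=100)     # e.g. weight (kg)

r = np.corrcoef(x, y)[0, 1]          # relationship only: one number
slope, intercept = np.polyfit(x, y, 1)

new_x = 180.0                        # a new, unseen height
predicted = intercept + slope * new_x
print(r, predicted)                  # r alone cannot give this prediction
```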