Correlation is between two variables, e.g. age and physical activity. In linear regression you can see how, e.g., depression is linked to age, sex and hours of physical activity, and how much each of these variables explains depression, or whether they explain it at all.
What is the difference between a correlation and linear regression?
Correlation is a bidirectional relationship between two variables: X influences/predicts Y, and Y likewise influences/predicts X. Some tests used include the Pearson correlation (for interval or ratio variables), the Spearman correlation (for ordinal variables), etc.
Linear regression is a one-directional relationship from one variable to another, e.g. X influences/predicts Y. Linear regression can be simple linear regression, as in the example above (X -> Y), or multiple linear regression, e.g. variables A, B and C influence/predict Y.
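As a concrete illustration, here is a minimal Python sketch (with simulated, hypothetical data and made-up coefficients) of the simple vs. multiple case: Y is predicted first from A alone, then from A, B and C jointly, using NumPy's least-squares solver.

```python
# Sketch: simple vs. multiple linear regression on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)          # predictor A
b = rng.normal(size=n)          # predictor B
c = rng.normal(size=n)          # predictor C
# Hypothetical true model: y depends on all three predictors plus noise.
y = 2.0 * a - 1.0 * b + 0.5 * c + rng.normal(scale=0.1, size=n)

# Simple regression: Y predicted from A alone (X -> Y)
X_simple = np.column_stack([np.ones(n), a])
coef_simple, *_ = np.linalg.lstsq(X_simple, y, rcond=None)

# Multiple regression: Y predicted from A, B and C jointly
X_multi = np.column_stack([np.ones(n), a, b, c])
coef_multi, *_ = np.linalg.lstsq(X_multi, y, rcond=None)

print(coef_simple)  # intercept and slope for A only
print(coef_multi)   # intercept and slopes for A, B and C
```

The multiple-regression coefficients recover the assumed values (about 2, -1 and 0.5), while the simple regression can only describe the A-Y relationship.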
Correlation quantifies the degree to which two variables are related. Correlation does not fit a line through the data points; you simply compute a correlation coefficient (r) that tells you how much one variable tends to change when the other one does. When r is 0.0, there is no linear relationship. When r is positive, there is a trend that one variable goes up as the other goes up. When r is negative, there is a trend that one variable goes up as the other goes down.
Linear regression finds the best line that predicts Y from X. Correlation does not fit a line.
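A small Python sketch (simulated data) of this point: the correlation is a single, symmetric number, while the fitted line depends on which variable is being predicted from which.

```python
# Sketch: r is symmetric in x and y, but the regression slope is not.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=1.0, size=500)   # hypothetical relationship

r = np.corrcoef(x, y)[0, 1]                 # same as np.corrcoef(y, x)[0, 1]
slope_y_on_x = np.polyfit(x, y, 1)[0]       # line predicting y from x
slope_x_on_y = np.polyfit(y, x, 1)[0]       # line predicting x from y

print(r, slope_y_on_x, slope_x_on_y)        # one r; two different slopes
```

Here the two slopes differ because x and y have different variances, while r is the same whichever way round you compute it.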
also,
Correlation is described as the analysis that tells us the association, or the absence of a relationship, between two variables 'x' and 'y'. On the other hand,
regression analysis predicts the value of the dependent variable based on the known value of the independent variable, assuming an average mathematical relationship between two or more variables.
Correlation analysis attempts to measure the strength of direct association between two or more variables (Green & Salkind, 2014). Through correlation analysis, a researcher can identify and inspect the correspondence between predictor and criterion variables (Green & Salkind, 2014). The correlation between two variables can be depicted by plotting the data values on a single graph; this pattern forms a scatterplot diagram. If the points tend to form a straight line, there is a high correlation; if the points resemble a random pattern, there is little correlation. In standard statistical notation, the coefficient of correlation is "r" and the coefficient of determination is r-squared. The coefficient of correlation measures the strength of the linear relationship. The coefficient of determination represents the proportion of the total variation in the criterion variable explained by the regression equation.
With regression analysis (also known as least-squares analysis), attributes of the criterion variable are explained in terms of one or more predictor variables. Regression analysis determines functional relationships between quantitative variables, and it permits finding trend lines and developing models based on the calculated association of variables. Simple regression employs only one predictor variable; multiple regression employs more than one. Jointly or separately, regression analysis extends the study of correspondence by seeking a linear equation relating the selected variables (Green & Salkind, 2014).
Correlation and regression analysis are distinct quantitative forecasting techniques (Faul et al., 2009). A premise supporting the use of correlation and regression analysis is that a logical data relationship may exist and persist in the absence of researcher bias or changes in circumstances. However, neither correlation nor linear regression permits inference of a cause-and-effect relationship among the selected variables.
References
Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41, 1149-1160. doi:10.3758/brm.41.4.1149
Green, S. B., & Salkind, N. J. (2014). Using SPSS for Windows and Macintosh: Analyzing and understanding data. Upper Saddle River, NJ: Pearson Education.
Linear regression and correlation are both used to show the relationship between two scale variables, but when you want to predict a dependent variable from an independent variable, you use linear regression.
In fact, there is a tight connection between (Pearson) correlation and linear regression. If beta_1 is the regression slope parameter, then
beta_1 = cor(x,y) * sd(y)/sd(x)
Consequently,
correlations do represent a linear dependence;
if you first z-transform the outcome y and the predictor x, the regression slope equals the correlation again.
Check out the attached R notebook for a demonstration.
The bottom line is that the Pearson correlation is a measure of linear dependence under normalization of variance. The main advantage of linear regression is that the relationship is fully spelled out (including the intercept) and on the original scale, so that one can obtain predictions. The main advantage of correlation is that, by standardizing the variance, one can compare associations across different scales.
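For readers without R, a minimal Python sketch of these identities on simulated data: the OLS slope equals cor(x, y) * sd(y) / sd(x) (note the order of the standard deviations), and after z-transforming both variables the slope equals the correlation itself.

```python
# Sketch: slope = r * sd(y)/sd(x); z-scored slope = r.  Simulated data.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1000)
y = 1.5 * x + rng.normal(scale=3.0, size=1000)   # hypothetical relationship

r = np.corrcoef(x, y)[0, 1]
slope = np.polyfit(x, y, 1)[0]
print(np.isclose(slope, r * y.std(ddof=1) / x.std(ddof=1)))  # True

# z-transform both variables, then regress again
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
slope_z = np.polyfit(zx, zy, 1)[0]
print(np.isclose(slope_z, r))  # True
```

Both checks are algebraic identities, so they hold exactly (up to floating-point error) for any data set, not just this one.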
Furthermore, I would be very careful with statements that correlation is bidirectional whereas regression is not. What to put on the right-hand side of a regression model is purely the choice of the examiner.
Simple correlation shows the degree, direction, and significance of the relation between two variables only, without considering which of the two affects the other, while simple regression shows the relationship between the two variables together with a significance test of one variable's effect on the other.
When talking about a two-dimensional random vector, the answer by @Martin completely explains the difference. Let me just add some details: when the dependences "y on x" and "x on y" are considered, the two corresponding beta coefficients equal, respectively, beta_{y|x} = cor(x,y) * sd(y)/sd(x) and beta_{x|y} = cor(x,y) * sd(x)/sd(y), where cor(x,y) = cor(y,x).
Moreover, the above relations concern the case when the regression is defined as the straight line y = \beta*x + \alpha minimizing the cost function SUM := \sum_{i=1}^n (y_i - \beta*x_i - \alpha)^2.
Remark. There are other definitions of regression, e.g. the best-fitting exponential curve y = A * exp{ b*x }. Then the relation between the correlation and the regression coefficients A and b is not so simple :)
Motivated by one of the answers above, I would like to share a remark: to be independent and to be uncorrelated are different things.
EXAMPLE 1. If the set of observations equals {-1,0,1} \times {0,1}, and if the weights (probabilities of the points) are all the same, then the coordinates are uncorrelated AND independent; in particular, the conditional probability distribution of y is independent of x (it is the classical uniform distribution on {0,1}).
EXAMPLE 2. If the set of observations equals { (-1,0), (0,-1), (0,0), (0,1), (1,-2), (1,0), (1,2) }, and if the weights (probabilities of the points) are all the same, then the coordinates are uncorrelated BUT NOT independent. Indeed, the conditional probability distributions of y depend on x as follows: if x=-1, the distribution is concentrated at 0; if x=0, the conditional distribution of y is uniform on {-1,0,1}; and if x=1, the conditional distribution of y is uniform on {-2,0,2}.
EXAMPLE 3. If the set of observations equals { (-1,0), (-1,1), (0,-2), (0,0), (1,0), (1,1) }, and if the weights (probabilities of the points) are all the same, then the coordinates are uncorrelated AND NOT independent. Additionally, the conditional expectations of y given x ARE NOT THE SAME: E(y|x=-1) = E(y|x=1) = 1/2, whereas E(y|x=0) = -1. In all three examples the correlation equals 0, and in Examples 2 and 3 the equation of the linear regression obtained by the least-squares method is also the same: y = 0.
SUMMARY. One cannot conclude that y does not depend on x when the correlation equals zero; even the conditional expectation may depend on x.
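EXAMPLE 3 above can be checked numerically; a short Python sketch:

```python
# Numerical check of EXAMPLE 3: correlation 0 and fitted line y = 0,
# yet the conditional means of y clearly depend on x.
import numpy as np

pts = np.array([(-1, 0), (-1, 1), (0, -2), (0, 0), (1, 0), (1, 1)], dtype=float)
x, y = pts[:, 0], pts[:, 1]

print(np.corrcoef(x, y)[0, 1])   # 0.0: uncorrelated
print(np.polyfit(x, y, 1))       # slope ~ 0, intercept ~ 0: the line y = 0

for v in (-1.0, 0.0, 1.0):
    print(v, y[x == v].mean())   # conditional means: 0.5, -1.0, 0.5
```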
Really, it is not easy to answer this question briefly, but it may be useful to explain each part of the question, as below:
1. Correlation is used to test relationships between quantitative variables (or, using rank-based coefficients such as Spearman's, ordinal variables). In other words, it is a measure of how things are related. The study of how variables are correlated is called correlation analysis.
A correlation coefficient gives a numerical summary of the degree of association between two variables, e.g., to what degree do high values of one variable go with high values of the other? Correlation coefficients vary from -1 to +1, with positive values indicating an increasing relationship and negative values indicating a decreasing relationship. A 0 means there is no linear relationship between the variables at all, while -1 or +1 means there is a perfect negative or positive correlation (negative or positive here refers to the direction of the trend the relationship produces on a graph).
2. Linear regression model/analysis is a technique used to predict the value of one quantitative variable by using its relationship with one or more additional quantitative variables. For example, if we know the relationship between height and weight in adult males, we can use regression analysis to predict weight given a particular value for height.
The relationship between height and weight is familiar to us; generally, the taller a person is, the more he weighs. Another familiar relationship is that of crop yield and the amount of fertilizer applied to the land: the more fertilizer applied, the greater the yield, up to a point. If too much fertilizer is applied, the crop will be killed off by the fertilizer chemicals; the land will be "burned." An important relationship in business is that between the dollars allocated to advertising effort and the level of sales of a product: in general, the more money spent on advertising, the greater the level of sales.
The simple linear regression model (which is one of the regression models) is a mathematical way of stating the statistical relationship that exists between two variables. The two principal elements of a statistical relationship are:
(1) the tendency of the dependent variable Y to vary in a systematic way with the independent variable X, and
(2) the scattering of points about the "curve" that represents the relationship between X and Y.
These two elements of a statistical relationship are represented in a simple linear regression model by assuming that:
(i) there is a probability distribution of Y for each value of X, and
(ii) the means of these probability distributions fall perfectly on a line.
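A small simulation sketch of assumptions (i) and (ii), with made-up parameter values: for each X there is a whole distribution of Y, and the means of those distributions lie on the line beta0 + beta1*X.

```python
# Sketch: the conditional distributions of Y have means on a line.
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma = 2.0, 0.5, 1.0     # assumed (hypothetical) parameters

for x in (0.0, 2.0, 4.0):
    # a probability distribution of Y for this value of X (assumption i)
    y = beta0 + beta1 * x + rng.normal(scale=sigma, size=100_000)
    # its empirical mean is close to the line beta0 + beta1*x (assumption ii)
    print(x, y.mean(), beta0 + beta1 * x)
```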
I like the wide explanation by @Z. A. Al-Hemyari, except for the last line. Example 3 of my earlier answer shows that the conditional averages do not have to lie on the linear regression line: in that case E(y|x=-1) = E(y|x=1) = 1/2, whereas E(y|x=0) = -1, and even so the equation of the linear regression is y = 0. Best regards, Joachim
Correlation = the degree to which V2 tends to increase or decrease as V1 increases, and vice versa. It only makes sense to talk about "how much V increases" if V is quantitative. Thus, correlation measures the degree of linear association between two quantitative variables, V1 and V2. It is two-directional: if V1 is positively correlated with V2, then V2 is positively correlated with V1, and vice versa; the same applies if the correlation is negative. Indeed, correlation(V1,V2) = correlation(V2,V1).
Regression measures the effect of X1 on Y.
Linear regression measures the effect of X1 on a quantitative Y: how much does Y increase when X1 increases by 1? It is one-directional. Year of birth may affect income, but income is unlikely to affect year of birth.
Y is the effect and X1 is the cause. More Xs can be, and usually are, included; phenomena usually are the outcome of more than one causal factor. Several Xs may not be quantitative.
Simple correlation (if non-spurious) is a measure of LINEAR association between two random variables. Regression (I assume you mean least-squares linear regression (OLS), with some assumptions) is a way to get an equation in which a nominated x-variable predicts the nominated response or y-variable, viz.: y = a + bX + error, where the slope is the correlation multiplied by [SD(y)/SD(x)] and the y-axis intercept = y(mean) - slope*x(mean).
Unfortunately, OLS ignores the measurement uncertainty associated with the predictor variable, with consequences for slope bias (see attached file).
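A minimal simulated sketch of this attenuation effect (the parameter values here are illustrative assumptions): adding measurement error to the predictor biases the OLS slope toward zero.

```python
# Sketch: measurement error in x attenuates the OLS slope.
import numpy as np

rng = np.random.default_rng(4)
x_true = rng.normal(size=5000)
y = 2.0 * x_true + rng.normal(scale=0.5, size=5000)   # true slope 2.0
x_noisy = x_true + rng.normal(scale=1.0, size=5000)   # x measured with error

slope_true = np.polyfit(x_true, y, 1)[0]    # close to 2.0
slope_noisy = np.polyfit(x_noisy, y, 1)[0]  # attenuated toward zero
print(slope_true, slope_noisy)
```

With equal variances for the true predictor and the error, the expected attenuation factor is 1/2, so the noisy-predictor slope comes out near 1.0 rather than 2.0.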
I have tried entering just two variables in SPSS for correlation and regression analysis. The results were the same, i.e., R and p were identical in the correlation and the regression analysis. But when I used two variables to explain one variable, I found different results; therefore, regression is used when one or more variables explain (predict) another, dependent variable.
Correlation could be one- or two-directional and doesn't tell you which variable is the cause and which is the result; it is just an association. Linear regression is a one-way association of linearity between a predictor and an outcome, which should be a continuous variable.
The correlation coefficient is a measure of the degree of linear relationship between two variables, usually labeled X and Y, while in regression the emphasis is on predicting one variable from the other. In correlation, the emphasis is on the degree to which a linear model may describe the relationship between the two variables. In regression the interest is directional: one variable is predicted and the other is the predictor. In correlation the interest is non-directional: the relationship itself is the critical aspect.
The sign of the correlation coefficient (+, -) defines the direction of the relationship, either positive or negative. However, neither correlation nor regression by itself establishes causation; one cannot draw cause-and-effect conclusions based on correlation.
There are two reasons why we can not make causal statements:
1. We don't know the direction of the cause - Does X cause Y or does Y cause X?
2. A third variable "Z" may be involved that is responsible for the covariance between X and Y.
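Reason 2 can be illustrated with a short simulation (hypothetical data): a hidden variable Z drives both X and Y, producing a clear correlation even though neither variable affects the other.

```python
# Sketch: a confounder Z induces correlation between X and Y.
import numpy as np

rng = np.random.default_rng(5)
z = rng.normal(size=2000)                     # hidden common cause
x = z + rng.normal(scale=0.5, size=2000)      # X depends only on Z
y = z + rng.normal(scale=0.5, size=2000)      # Y depends only on Z

r = np.corrcoef(x, y)[0, 1]
print(r)  # clearly positive although X never affects Y
```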
I have one question related to this. I want to check the effect of temperature on the start time of singing of species X. For this I applied a linear mixed-effects model with minutes before sunrise (start time) as the response, temperature as a fixed effect, and site as a random effect. The model shows a correlation of the fixed effect of -0.522, with a significant p value.
In correlation, we only measure the degree of the relationship, while in regression we measure which of the variables drives the relationship (the independent, predictor, or explanatory variable) and which one receives the effect (the dependent or response variable).
In linear models in statistics, we have both relationship and prediction. The correlation coefficient considers only the first, the linear relationship. Linear regression considers the linear relationship but also provides a prediction model for our variables and dataset.
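A closing sketch of this distinction, with made-up height/weight numbers: the correlation summarizes the relationship in one number, while the fitted regression model can predict a new case.

```python
# Sketch: correlation describes; regression predicts.  Simulated data.
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(150, 200, size=100)                    # e.g. height (cm)
y = -100 + 0.9 * x + rng.normal(scale=5, size=100)     # e.g. weight (kg)

r = np.corrcoef(x, y)[0, 1]          # relationship only: one number
slope, intercept = np.polyfit(x, y, 1)

new_x = 180.0                        # a new, unseen height
predicted = intercept + slope * new_x
print(r, predicted)                  # r alone cannot give this prediction
```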