I prefer regression because it is slightly more informative and uses the original units of measurement of x and y. In terms of testing an association they are formally equivalent - at least if a linear regression of y on x is compared to the correlation of x and y.
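For concreteness, here is a minimal sketch in Python (SciPy, with simulated data; all names are illustrative) showing that the two tests agree: the p-value for the slope in a simple linear regression of y on x is identical to the p-value for the Pearson correlation of x and y.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)   # simulated data for illustration

r, p_corr = stats.pearsonr(x, y)    # test of the correlation
reg = stats.linregress(x, y)        # regression of y on x

print(f"correlation p-value: {p_corr:.6f}")
print(f"slope p-value:       {reg.pvalue:.6f}")   # same as the correlation p-value
print(f"r = {r:.4f}, slope = {reg.slope:.4f}")    # slope is in the original units
```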
Testing for association only needs correlation. Regression analysis is required when you need to say how, given one variable, you can predict the other.
Correlation is used to denote association between two quantitative variables, while (linear) regression is used to estimate the best straight line summarising the association. Correlation assumes there is a linear relationship between the two variables and describes the strength of the association. It is symmetrical, i.e. the correlation between A and B is the same as the correlation between B and A.

For regression, if two variables are related it means that when one changes by a certain amount, the other changes on average by a certain amount. If y represents the dependent variable and x the independent variable, this relationship is described as the regression of y on x. Regression simply means that the average value of y is a function of x, i.e. it changes with x. The regression equation is often more useful than the correlation coefficient: it enables us to predict y from x and gives a better summary of the relationship between the two variables. Hope this helps.
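To illustrate the prediction point, a minimal sketch (numpy, simulated data; the new x value is hypothetical) of fitting the line y = a + b*x and using it to predict y:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=30)
y = 3.0 + 0.5 * x + rng.normal(size=30)   # simulated data

b, a = np.polyfit(x, y, deg=1)   # least-squares slope b and intercept a
x_new = 7.5                      # hypothetical new observation
y_pred = a + b * x_new
print(f"y = {a:.2f} + {b:.2f}*x;  predicted y at x = {x_new}: {y_pred:.2f}")
```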
Correlation measures the strength and nature of the relationship between two variables (x, y); it is not necessary for one to depend on the other (corr(x,y) = corr(y,x)). Regression, by contrast, measures the direction of the relationship between two or more variables: one must be a response (dependent) variable while the other variable(s) are explanatory variables for it, y = f(x1, x2, x3, ..., xp).
The relation between the dependent variable and the explanatory variable can also be tested (by ANOVA or a t-test); in the simple linear regression case, correlation and regression give identical results.
Correlation assesses the association between any 2 variables, with no assumptions on the functional relationship between the 2 variables. Regression requires specification of a model, such as linear, to estimate parameters of the regression model; there are no such parameters for correlation.
I'd add a scatter plot to the list of statistics for examining the relationship between two quantitative variables. As Thom says, the relationship between Pearson's correlation and linear OLS regression is straightforward, but there are other versions of both. Are you just interested in linear and least squares? If you specify your research questions more, you may get more directive responses.
Regression and correlation are twin concepts; they go hand in hand and are almost inseparable. The former refers to a mathematical model expressing the dependent variable (DV) in terms of the independent variable(s) (IV). The latter indicates the direction and degree of the association between the DV and the IV.
If your purpose is to estimate an unknown value based on known values (say, to estimate baby weight based on maternal characteristics) then a correlation is useless.
In real-life settings regression is the norm. Aside from its coefficients having a real-life interpretation, it extends to multivariable prediction.
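As a rough sketch of that extension (statsmodels, with fabricated data; the two predictors are hypothetical stand-ins for real covariates):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)   # hypothetical predictor 1
x2 = rng.normal(size=n)   # hypothetical predictor 2
y = 1.0 + 0.8 * x1 + 1.5 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
print(model.params)   # intercept plus one interpretable coefficient per predictor

new_case = np.array([[1.0, 0.5, 1.0]])   # [constant, x1, x2] for a new subject
print(model.predict(new_case))           # multivariable prediction
```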
Both (Pearson) correlation and linear regression can capture only a linear relationship between two variables.
The second mantra is: either result should be supplemented by a graph, illustrating what the associations truly are.
In your case, I would first compute the correlation coefficient (we speak, obviously, about Pearson's correlation coefficient). If it is large (> 0.90), this would truly indicate that your traits exhibit a linear relationship. You should feel lucky, because such relationships are encountered very rarely in practice.
If the computed correlation coefficient is small (in particular, near zero), you know only that your variables are not associated in a linear way. However, they might be associated according to a nonlinear relationship. It is a known fact that points (x, y) following an exact nonlinear functional dependence (like half of a circular dependence), when subjected to correlation evaluation, yield a correlation coefficient equal to zero.
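This is easy to verify numerically; a minimal numpy sketch with points lying exactly on half of a circle:

```python
import numpy as np

x = np.linspace(-1, 1, 201)
y = np.sqrt(1 - x**2)           # exact (nonlinear) functional dependence of y on x
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.2e}")           # essentially zero: x is symmetric, y is even in x
```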
In other words: seeking general associations between two traits cannot be accomplished by either the Pearson correlation coefficient or ordinary regression.
Making a 2D graph of (x, y), one gets inspiration about the kind of relationship between the two considered traits x and y. Apart from the scatterplot (suggested by Daniel), I would subdivide the x-axis into about 10 segments and, for each segment, depict the conditional mean of y with confidence intervals formed from the standard deviations of the y values whose x-es fall in that segment.
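A minimal sketch of such a binned conditional-mean plot (numpy/matplotlib, with simulated nonlinear data; the interval of +/- 2 standard errors is one possible choice):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=300)
y = np.sin(x) + rng.normal(scale=0.3, size=300)   # deliberately nonlinear

edges = np.linspace(x.min(), x.max(), 11)             # 10 segments on the x-axis
centers = 0.5 * (edges[:-1] + edges[1:])
idx = np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2)

means = np.array([y[idx == k].mean() for k in range(len(edges) - 1)])
errs = np.array([2 * y[idx == k].std(ddof=1) / np.sqrt((idx == k).sum())
                 for k in range(len(edges) - 1)])     # ~95% intervals for the means

plt.scatter(x, y, s=8, alpha=0.3)
plt.errorbar(centers, means, yerr=errs, fmt='o-', color='red')
plt.xlabel('x'); plt.ylabel('conditional mean of y')
plt.show()
```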
Of course, there are many other alternatives, depending on your data and your software.
I (and I think not only I) would be glad to hear from you if your data prove to exhibit nonlinear associations.
No - the data could be linearly related with a negative slope.
The correlation is not a test of linearity.
A correlation exists between any 2 variables. In theory, Corr(X,Y) = Cov(X,Y) / (SD(X) SD(Y)), regardless of any relationship. The question is how to estimate the correlation.
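A quick numpy check of that definition (simulated data):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=200)
y = 0.6 * x + rng.normal(size=200)

cov_xy = np.cov(x, y)[0, 1]                              # sample covariance
r_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))
r_builtin = np.corrcoef(x, y)[0, 1]
print(r_manual, r_builtin)                               # agree to floating point
```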
Anna – and Daniel – make important points about plotting your data.
You should note that the human eye is not good at reading scatterplots. I always add a scatterplot smoother (lowess, usually) to check the linearity of the relationship.
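For example, a minimal sketch (statsmodels/matplotlib, simulated data) of overlaying a lowess smoother on a scatterplot:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=200)
y = 0.5 * x + 0.3 * (x - 5)**2 + rng.normal(size=200)   # mildly nonlinear

smoothed = lowess(y, x, frac=0.4)        # sorted (x, fitted) pairs
plt.scatter(x, y, s=8, alpha=0.4)
plt.plot(smoothed[:, 0], smoothed[:, 1], color='red')   # the smoother reveals curvature
plt.show()
```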
What I should also say is that departures from linearity can encourage overfitting. It's important to match your model to your understanding of the data generation process. People are too ready to drop terms like age^2 into a model without considering why anything might be a function of the square of age.
1. You got r = -0.473. I assume the data do not contain repeated measurements, that is, observations measured several times on the same subject. The value r = -0.473 indicates that there is some, not big, inverse relationship between the two variables (let's call them x and y). In other words: if one of the variables tends to increase, then the other tends to decrease. This might be interesting from the user's perspective. However, predictions of one variable from values of the other will be poor and not very usable in practice.
2. Before making a statement on the negative association (trend) between the two considered variables, one should first exclude the following two circumstances, which are likely to be encountered in practice:
(a) the negative trend may be due to some outliers that 'rotate' the linear regression line towards the group of outliers (this happens especially when the sample size is small);
(b) the considered data set might be non-homogeneous and composed, say, of two or three groups of patients for which the investigated x and y variables behave differently.
The easiest way to exclude cases (a) and (b) is to inspect a scatterplot of x and y. A smoother -- see Ronan's advice above -- could be helpful here.
3. Now, returning to the value r = -0.473. Its square, R-squared = 0.224, is called in (linear) regression the coefficient of determination. Suppose we consider the regression model y = a + bx. The R-squared value tells what part of the total scatter of y can be explained by the model. In your case this is only 22.4 percent; the remaining 77.6 percent of the y-scatter remains unexplained. Roughly speaking, more than three quarters of the variation in y is statistically unpredictable from x. It is something, yet not much.
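A minimal numpy sketch of point 3 (simulated data with a moderate negative correlation, standing in for the real sample): for a simple linear regression, R-squared equals the square of r.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100)
y = -0.5 * x + rng.normal(size=100)   # fabricated data with a moderate negative r

r = np.corrcoef(x, y)[0, 1]
b, a = np.polyfit(x, y, 1)
resid = y - (a + b * x)
r2_from_fit = 1 - resid.var() / y.var()   # explained share of the y-scatter
print(f"r = {r:.3f}, r^2 = {r**2:.3f}, R^2 from the fit = {r2_from_fit:.3f}")
```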
4. If this problem is very important to you and you would like to go deeper, you might compute, from the data of size n obtained in your experiment, say about B = 100 bootstrap samples. Each bootstrap sample yields a slightly different correlation coefficient r. A histogram can show their distribution. It is interesting to see how much the computed r coefficient can vary in the neighbourhood of the sample obtained in your experiment.
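A minimal numpy sketch of that bootstrap (fabricated data in place of the real sample; B = 100 as suggested):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
n = 60
x = rng.normal(size=n)
y = -0.5 * x + rng.normal(size=n)   # fabricated data standing in for the experiment

B = 100
rs = np.empty(B)
for b in range(B):
    i = rng.integers(0, n, size=n)              # resample (x, y) pairs with replacement
    rs[b] = np.corrcoef(x[i], y[i])[0, 1]

plt.hist(rs, bins=15)
plt.xlabel("bootstrap correlation coefficient r")
plt.show()
print(f"bootstrap mean r = {rs.mean():.3f}, sd = {rs.std(ddof=1):.3f}")
```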
The correlation coefficient is used to determine the direction and strength of the relationship between two variables, whether quantitative or qualitative, while the regression coefficient is used to determine the effect of an independent variable on the dependent variable and to compute the explained and unexplained variation.
This response is quite in order. In addition to the cause-and-effect issue, regression analysis also gives the probability value (p-value) for the analysis or prediction.
Correlation and regression can be spurious rather than real. Therefore, we should first be sure that there is a real association between the two variables before fitting a regression to them.
Correlation only gives you the amount of association between two variables, which are assumed to be linearly related (r), whereas regression tells you, in the form of an equation, how a change in the predictor variable(s) affects the predicted (response) variable.
I ran a regression between two variables in which one depends on the other: factor A is caused by factor B; in simple words, factor A is dependent on factor B and is influenced by it. Should I report it as "A vs B" or "B vs A"? Which one is correct?
Please help me; if you can provide any literature, it would be really helpful.