Regression fits coefficients to given data, based on some (hopefully appropriate) error model. It is used to analyse "correlations" between predictors ("independent" variables) and a response (the "dependent" variable). Note that t-tests and ANOVAs are also regression models under the hood: regressions with categorical predictors, a metric response and a normal error model, where the focus is not on estimating the coefficients but on other model statistics.
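As a minimal sketch of that point, here is a simulated two-group comparison (group means, sizes and seed are invented for illustration) showing that an independent-samples t-test and a regression with one dummy-coded categorical predictor give the same test statistic:

```python
# A t-test reproduced as a regression with a categorical (dummy) predictor.
# Simulated data; the group means and sample sizes are arbitrary.
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, size=40)   # group A
b = rng.normal(11.5, 2.0, size=40)   # group B

# Classical two-sample t-test (equal variances, matching the OLS model)
t, p = st.ttest_ind(a, b)

# Same model as regression: response ~ intercept + dummy(group == B)
y = np.concatenate([a, b])
x = sm.add_constant(np.concatenate([np.zeros(40), np.ones(40)]))
fit = sm.OLS(y, x).fit()

print(t, p)                            # t statistic and p-value
print(fit.tvalues[1], fit.pvalues[1])  # identical up to the sign of t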
Statistics has no tools to study "causality". Causality enters "by design": the study has to be designed so that any observed relationship can only be explained by a causal interaction. This is possible only in real experiments with explicit manipulation of the predictor of interest (keeping all other influential factors as constant as possible or under statistical control). Data from observational studies can be analyzed with the same methods (regression etc.), but it is usually much more difficult, if not impossible, to infer causal interactions from such data.
You never test a causal relationship; you can only assume one. The data from a designed experiment can then be used to estimate the strength of this relationship.
Can you kindly explain further the difference between correlation and regression? If regression serves the same purpose as correlation, then why is it not enough to test the correlation to assess the association between two variables? Why do we go for regression analysis?
Correlation is a different way to look at linear or monotone relationships between two variables. The correlation coefficient is a kind of "signal-to-noise" ratio: knowing the correlation (coefficient) can tell you whether the signal is (considerably) larger than the noise, but it does not tell you anything about the signal or the noise individually.
Regression is a technique to fit a functional model to data (the fit is obtained by maximizing the likelihood of the observed data based on a given probability model). It estimates regression coefficients, it can model any kind of functional relationship (not just linear or monotone) between a "predictor" variable and a "response" variable, it can consider several different predictor variables at the same time, and it can model interactions between predictor variables (i.e. synergistic or antagonistic effects on the response).
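A small simulated sketch of this contrast (the "true" slope and noise level below are arbitrary choices): the correlation coefficient is one unitless signal-to-noise summary, while regression separates the signal (slope) from the noise (residual spread):

```python
# Correlation gives one unitless number; regression separates signal
# from noise. Simulated data with an arbitrary slope and noise sd.
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + rng.normal(0, 5.0, size=100)   # signal slope 3, noise sd 5

r = np.corrcoef(x, y)[0, 1]                  # signal-to-noise summary
res = st.linregress(x, y)
resid_sd = np.std(y - (res.intercept + res.slope * x), ddof=2)

print(f"r = {r:.3f}")
print(f"slope = {res.slope:.3f} +/- {res.stderr:.3f}, residual sd = {resid_sd:.3f}")

# Rescaling the response changes the slope but not the correlation:
print(np.corrcoef(x, 100 * y)[0, 1])         # r unchanged
print(st.linregress(x, 100 * y).slope)       # slope scales by 100
```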
Suppose you want to find the effect of nitrogen, potassium, phosphate, water, temperature and sunlight on the growth of plants. You could create an observational data set by doing soil analysis and setting up a rain gauge, thermometer and light monitor at dozens of plots of land. This would show a relationship between your predictor variables and your response variable(s).
A better option is to use some type of designed experiment (generally a factorial design) where you control the amounts of N, P, K, H2O, temperature and light. Since you control the nutrients, water, temperature and light, you eliminate a lot of the variability you can't control in nature.
I can say the plant had an average temp of 25C. In nature that could be 25C +/- 30C. In a lab, that could be 25C +/- 2C.
I can say the plant got 100L of water over a 50 week period. In nature, that could be 30L all at once followed by 15 weeks of drought, and so on. In a lab, I can say it was 2L per week for 50 weeks.
These are some of the reasons why you shouldn't say there is a causal relationship between variables in an observational study.
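To make the plant-growth example concrete, here is a hypothetical sketch of such an analysis as one multiple regression with an interaction term; all variable names, effect sizes and the N-by-water synergy below are invented for illustration:

```python
# Hypothetical multiple regression for the plant-growth example,
# including an interaction; simulated data with made-up effect sizes.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "N":     rng.uniform(0, 10, n),    # nitrogen
    "P":     rng.uniform(0, 10, n),    # phosphate
    "K":     rng.uniform(0, 10, n),    # potassium
    "water": rng.uniform(0, 100, n),
    "temp":  rng.normal(25, 2, n),     # tightly controlled, as in a lab
})
# Invented "true" growth process with an N-by-water synergy plus noise
df["growth"] = (2 * df.N + df.P + 0.5 * df.K + 0.1 * df.water
                + 0.05 * df.N * df.water + rng.normal(0, 3, n))

# One model, several predictors, and an interaction term N:water
fit = smf.ols("growth ~ N + P + K + water + temp + N:water", data=df).fit()
print(fit.summary())
```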
You could take a look at Mulaik, S. A. (2009). Linear Causal Modeling with Structural Equations, which gives a well-rounded introduction to the relation between questions about causal relations vs. functional relations between variables.
Under certain restrictions it is possible to learn about the causal structure from observational data alone (see e.g. Pearl, J. (2000). Causality; or Spirtes, P., Glymour, C. N., & Scheines, R. (2000). Causation, Prediction, and Search. MIT Press). These restrictions concern the structure of the causal system (no circular paths between variables: acyclicity), its parameters (if a variable x acts on z, there is no further variable y also acting on z that cancels the effect of x: faithfulness), and the set of variables (no common causes of variables in the system description are omitted from that description: causal sufficiency).
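As a flavour of what such methods do, here is a minimal sketch of the basic primitive that constraint-based search (as in Spirtes et al.) is built on: a conditional-independence test via partial correlation, assuming approximately Gaussian variables. The chain x -> z -> y below is simulated purely for illustration:

```python
# Conditional-independence test via partial correlation, the primitive
# behind constraint-based causal search. Simulated chain x -> z -> y.
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after regressing both on z."""
    z1 = np.column_stack([np.ones_like(z), z])
    rx = x - z1 @ np.linalg.lstsq(z1, x, rcond=None)[0]
    ry = y - z1 @ np.linalg.lstsq(z1, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
z = 2 * x + rng.normal(size=n)       # x -> z
y = 1.5 * z + rng.normal(size=n)     # z -> y

print(np.corrcoef(x, y)[0, 1])       # marginally dependent
print(partial_corr(x, y, z))         # ~0: x independent of y given z
```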
The standard approach to test causal connections between variables is to do experimental manipulations. Say you are interested in whether the relationship between two variables x and y is due to a causal connection from x to y (x->y). Then you manipulate x with a manipulation M (with M->x) in an experiment and check whether there still is a relationship between x and y. If you find such a relationship, you can conclude there is a causal path x->y, provided there are no confounding variables. A confounding variable z is a further cause of y (z->y) which correlates with M (is not statistically independent of M).
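Here is a simulated sketch of that argument (all effect sizes arbitrary): a hidden common cause z makes x and y correlate observationally even though there is no x->y path, and randomizing x removes that spurious relationship:

```python
# A confounder z drives both x and y, so they correlate observationally
# with no x -> y path; randomizing x (the manipulation M) removes this.
import numpy as np

rng = np.random.default_rng(4)
n = 5000
z = rng.normal(size=n)                   # hidden common cause

# Observational regime: z -> x and z -> y, no causal path x -> y
x_obs = z + rng.normal(size=n)
y_obs = 2 * z + rng.normal(size=n)
print(np.corrcoef(x_obs, y_obs)[0, 1])   # clearly nonzero (confounded)

# Experimental regime: x set by randomization, independent of z
x_exp = rng.normal(size=n)
y_exp = 2 * z + rng.normal(size=n)       # same structural equation for y
print(np.corrcoef(x_exp, y_exp)[0, 1])   # ~0: no x -> y path
```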
We often learn regression in an experimental situation. Enclose air in a fixed volume and measure the change in pressure with a change in temperature. Plot the results and fit a straight line by regression. We are confident that the change in temperature caused the change in pressure. We are confident that the scatter about the fitted line is the result of measurement errors.
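A sketch of that experiment with simulated readings (the gas amount, volume and measurement-noise level are made up for illustration; the straight-line theory is the ideal gas law P = nRT/V at fixed volume):

```python
# Simulate pressure readings at fixed volume with small measurement
# noise and recover the slope by regression. Parameters are invented.
import numpy as np
import scipy.stats as st

R, n_mol, V = 8.314, 1.0, 0.1            # J/(mol K), mol, m^3
T = np.linspace(280, 380, 21)            # temperatures in kelvin
rng = np.random.default_rng(5)
P = n_mol * R * T / V + rng.normal(0, 20.0, T.size)  # ideal gas + noise

fit = st.linregress(T, P)
print(f"slope = {fit.slope:.2f} Pa/K (theory: nR/V = {n_mol * R / V:.2f})")
print(f"r^2 = {fit.rvalue**2:.4f}")
```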
It may be argued that the cause is the addition of heat, and that temperature is therefore also a consequence. Under this argument, temperature and pressure are merely correlated; temperature is not a cause.
The temperature-pressure relationship departs from a straight line above a certain temperature (or above a certain pressure). What caused the departure: temperature, pressure, or heat? We might ascribe the departure to a phase transition. Is this a cause or an explanation?
In more complicated systems, such as human health risk, cause is more confusing still. Assume you wish to determine the risk of heart attack from drinking coffee, and assume your collected data show a straight-line relationship. The scatter about the fitted line cannot be only measurement error; the uncertainty includes differences in individual response and many other individual contributing factors.
Is your coffee risk factor a cause? Is it correlated with an untested factor? How much scatter can be tolerated about the fitted line?
I agree with the comments above. I am not such an expert, but I remember that covariance is a concept on which both correlation and regression are based. In addition, it is generally known that CORRELATION (and also REGRESSION, I believe) is not CAUSATION.
I have also heard that structural equation modelling should lead closer to causation, but I have no experience with it yet...