When we use the word 'impact' in a research objective, as in the impact of one variable on another, is it necessary to apply regression analysis, or are there other techniques?
When you say "impact", do you want to imply that there is a cause-and-effect relationship? Or just that there is some kind of relationship (e.g. correlation)? And maybe that knowledge about the real world can bring one to the conclusion of "impact"?
@Annu Annu, I agree with @Salvatore S. Mangiafico. Please clarify whether you mean "correlation" or "causation" by the "impact of one variable on another variable". Regression is recommended for cause-and-effect relationships.
Can we describe the impact of one variable on another using 'correlation' only? I ask because in the objective I didn't specify that I would establish 'causation', i.e. a cause-and-effect relationship.
Since you want to describe the impact in terms of "correlation" only:
Correlation is a two-way street! If variables X and Y are highly correlated, you can say that there is a relationship between them, either positive or negative!
-You can claim for example that for your sample, records with high values of X tend to have high (or low) values of Y.
-But you cannot claim that higher values of X result in higher (or lower) values in Y! Or the opposite!
-Therefore, correlation does not imply "impact" of one variable over the other, it just reveals their relationship!
> If you want to check for possible relationship between two variables, X and Y, the first thing you should do is plot them! The graph would reveal whether there is a relationship or not!
> You can use a correlation coefficient, such as Pearson or Spearman correlation (you can search for them on Wikipedia). These can give you some answers, depending on your question and data.
> You can always use regression techniques to describe their relationship. If the plot suggests a quadratic relationship, you should consider the variable X^2 along with the original X to get a better fit!
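To make the last two points concrete, here is a minimal sketch, assuming NumPy is available. The data are invented, and the helper names (`pearson`, `ranks`, `spearman`, `r_squared`) are hypothetical, not from any particular package. It computes both correlation coefficients and compares a straight-line fit with a fit that also includes X^2.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: strength of the *linear* relationship."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    sorted_vals = values[order]
    result = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        result[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def r_squared(x, y, degree):
    """R^2 of a least-squares polynomial fit of the given degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1 - ss_res / ss_tot)

# Invented data with a mostly quadratic relationship (an assumption for illustration).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 61)
y = x ** 2 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

print("Pearson :", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
print("R^2 using X only     :", round(r_squared(x, y, 1), 3))
print("R^2 using X and X^2  :", round(r_squared(x, y, 2), 3))
```

In data like these, both correlation coefficients come out modest even though Y depends strongly on X, which is why plotting first matters: the coefficients summarise linear or monotone association only.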
I generally agree with Panos Petsas, but I think it's also fair to invoke the word "impact" if you can reasonably assume there's some causal relationship. For example, if I say there's a correlation between ice cream consumption and air conditioner use, no one thinks that one causes the other. But if I say there's a correlation between daily air temperature and air conditioner use, it's fair to say that one impacts the other, because we know from life experience that it's reasonable to think that people use the air conditioner more when the daily air temperature is higher.
Overall, I wouldn't worry about using the word "impact" in your research objective. Instead, just report your findings as fairly as possible, whether you think there's an "impact" or not.
Panos said "...the first thing you should do is plot...." Yup. However, in doing regression, one variable is the predictor and the other is the response, so beware that the regression may well have omitted variable bias. The best predicted-y is from the best combination of predictors - not too many and not too few, and just the right ones. But you can still see how one predictor acts alone, though it may be somewhat different in combination with other predictors. Also, sometimes, one predictor is best.
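A small simulation can illustrate the omitted variable point. This is only a sketch under invented assumptions: the coefficients 1.0 and 2.0, the correlation between the two predictors, and the helper `ols` are all hypothetical, chosen so the effect is easy to see.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Assumed data-generating process (illustration only):
# x2 is correlated with x1, and y depends on both.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
y = 1.0 * x1 + 2.0 * x2 + rng.normal(scale=0.5, size=n)

def ols(predictors, response):
    """Ordinary least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(response))] + list(predictors))
    coeffs, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coeffs

slope_alone = ols([x1], y)[1]      # x2 omitted: slope absorbs part of x2's effect
slope_joint = ols([x1, x2], y)[1]  # x2 included: slope close to the assumed 1.0

print("slope of x1, x2 omitted :", round(float(slope_alone), 2))
print("slope of x1, x2 included:", round(float(slope_joint), 2))
```

With x2 left out, the slope on x1 lands near 1.0 + 2.0 * 0.8 = 2.6 rather than 1.0, so the apparent "impact" of a single predictor depends on which other predictors are in the model.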
James R Knaub You are absolutely right! I assumed that there is only one predictor variable (X). If there are many predictors, your approach is the most suitable!
There are other possibilities besides regression. But selecting an appropriate approach depends both on how much data you have and on what fits into your overall objective for your immediate project.
For example if both:
(i) you have a lot of data;
(ii) you want an easily interpretable report....
then you could aim to show estimated probability density functions of one variable, conditional on another variable being in particular classes. This would allow a visual presentation of the size of any effect in comparison to the size of underlying unexplained variations.
Other versions of this might use box-plots to show a visual assessment. Other visual approaches might include multi-coloured scatter plots to attempt to deal with several variables.
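As a rough sketch of the box-plot version of this idea, the following computes the five-number summary of one variable within classes of another, assuming NumPy. The temperature and usage data, the class edges, and the function name `class_summaries` are invented for illustration. The numbers it prints are the ones a box plot would draw.

```python
import numpy as np

def class_summaries(y, x, edges):
    """Five-number summary (min, Q1, median, Q3, max) of y within each class of x.

    Class 0 is x < edges[0], class 1 is edges[0] <= x < edges[1], and so on.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    classes = np.digitize(x, edges)
    summary = {}
    for c in np.unique(classes):
        summary[int(c)] = np.percentile(y[classes == c], [0, 25, 50, 75, 100])
    return summary

# Invented data: usage rises with temperature, plus unexplained variation.
rng = np.random.default_rng(7)
temperature = rng.uniform(0, 30, size=3000)
usage = 2 + 0.3 * temperature + rng.normal(scale=1.0, size=3000)

summary = class_summaries(usage, temperature, edges=[10, 20])
for label, (low, q1, median, q3, high) in summary.items():
    print(f"class {label}: median={median:.2f}, IQR={q3 - q1:.2f}")
```

Comparing the shift in medians between classes with the interquartile range inside each class gives a direct sense of the size of the effect relative to the unexplained variation.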
But to select a good approach you need to start with thinking about what would be a good way of presenting any results. There may be too much emphasis nowadays on "significance tests" rather than on relating the size of any "impact" to the real-world situation.
verified by a graphical residual analysis and a cross-validation to be reasonable. Then we see the "impact" of x on y, say x^3 predicts y, or whatever other function of x is found to perform well, the simplest being a ratio estimator, to predict for y.
But we may have other predictors needed, so that
y = f(x) + (predicted-y - f(x)) + e
is appropriate. Then we still see the "impact" of x, by whatever function, in whatever combination with other predictors.
Remember that the error term e (or better, epsilon) often has a higher sigma associated with larger predicted-y values.
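To illustrate the ratio estimator and the growing sigma, here is a minimal sketch assuming NumPy. It assumes a single predictor, a line through the origin, and an error variance proportional to x. Those assumptions, the invented data and the function names are mine, for illustration. Under them, the ratio estimator sum(y)/sum(x) is the same as weighted least squares with weights 1/x.

```python
import numpy as np

def ratio_estimator(x, y):
    """Classical ratio estimator of the slope b in y = b*x + e."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(y.sum() / x.sum())

def wls_through_origin(x, y, weights):
    """Slope b minimising sum(weights * (y - b*x)**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * x * y) / np.sum(weights * x * x))

# Invented data: error standard deviation grows with sqrt(x) (an assumption).
rng = np.random.default_rng(3)
x = rng.uniform(1, 100, size=200)
y = 2.5 * x + rng.normal(size=200) * 0.5 * np.sqrt(x)

b_ratio = ratio_estimator(x, y)
b_wls = wls_through_origin(x, y, weights=1 / x)

# Residual spread for smaller versus larger predicted-y values.
predicted = b_ratio * x
residuals = y - predicted
is_large = predicted > np.median(predicted)
spread_low = float(residuals[~is_large].std())
spread_high = float(residuals[is_large].std())

print("ratio estimator:", round(b_ratio, 4))
print("WLS, weights 1/x:", round(b_wls, 4))
print("residual SD, smaller predicted-y:", round(spread_low, 2))
print("residual SD, larger predicted-y :", round(spread_high, 2))
```

A plot of the residuals against predicted-y would show the same widening spread and is the graphical check to prefer in practice.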
The word impact suggests that we are presenting a measure of effect size. And yes, there are many measures of effect size, some derived from regression models of various sorts, and others not so derived.
For example, we can measure the effect of a treatment on a binary outcome such as recovery using the relative risk, the odds ratio, the number needed to treat (and the number treated needlessly!), the preventable fraction in the treated etc.
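As a sketch of how some of these measures are computed from a 2x2 table, here is a small example in plain Python. The counts are invented and the function name `effect_sizes` is hypothetical. Because the outcome here is recovery (a good outcome), the "relative risk" is the ratio of recovery proportions.

```python
import math

def effect_sizes(treated_recovered, treated_total, control_recovered, control_total):
    """Effect-size measures for a binary outcome in two groups."""
    p_treated = treated_recovered / treated_total
    p_control = control_recovered / control_total

    # Odds ratio from the four cell counts of the 2x2 table.
    treated_not = treated_total - treated_recovered
    control_not = control_total - control_recovered
    odds_ratio = (treated_recovered * control_not) / (treated_not * control_recovered)

    difference = p_treated - p_control
    return {
        "relative_risk": p_treated / p_control,
        "odds_ratio": odds_ratio,
        "risk_difference": difference,
        # Number needed to treat for one additional recovery;
        # infinite when the two proportions are equal.
        "number_needed_to_treat": math.inf if difference == 0 else 1 / difference,
    }

# Invented counts: 60 of 100 treated recover versus 40 of 100 controls.
for name, value in effect_sizes(60, 100, 40, 100).items():
    print(f"{name}: {value:.2f}")
```

The same table gives a relative risk of 1.5 but an odds ratio of 2.25, which shows that the choice of measure changes the number reported for the "impact".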
The measure of effect size is determined not by the statistical model but vice versa. We have to specify the question to know the effect size estimate that will best answer it.
The way the question is put implies some potential agency of X over Y, and perhaps also a not so hidden third variable over which both are changing. An obvious current case would be the impact of CO2 (X) on global mean surface temperature (Y), where both are changing through time. In such a case it is helpful to work with delta-X versus delta-Y to help reduce the effect of the third variable, time. It is also useful to work with lagged cases, such as this year's delta-X and last year's delta-Y, to assist with judging the direction of the agency. A lagged correlogram can be very revealing, suggesting, as in this case, a two-way influence where both X and Y feed into and are fed from some wider system.
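The delta-X versus delta-Y idea with lags can be sketched as follows, assuming NumPy. The two series are synthetic, built so that changes in Y follow changes in X by two time steps. They are not CO2 or temperature data, and the function name `lagged_correlations` is hypothetical.

```python
import numpy as np

def lagged_correlations(x, y, max_lag):
    """Correlation between delta-x at time t and delta-y at time t + lag.

    Positive lags pair a change in x with a later change in y;
    negative lags pair it with an earlier change in y.
    x and y must be series of equal length.
    """
    dx = np.diff(np.asarray(x, dtype=float))
    dy = np.diff(np.asarray(y, dtype=float))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = dx[:len(dx) - lag], dy[lag:]
        else:
            a, b = dx[-lag:], dy[:len(dy) + lag]
        result[lag] = float(np.corrcoef(a, b)[0, 1])
    return result

# Synthetic series: both drift upward over time, and changes in y
# echo changes in x two steps later (an assumption for illustration).
rng = np.random.default_rng(1)
n = 400
x_changes = 0.1 + rng.normal(size=n)
# np.roll wraps the last two values to the front; harmless for this demo.
y_changes = 0.1 + 0.8 * np.roll(x_changes, 2) + rng.normal(scale=0.5, size=n)
x = np.cumsum(x_changes)
y = np.cumsum(y_changes)

correlogram = lagged_correlations(x, y, max_lag=4)
for lag, r in correlogram.items():
    print(f"lag {lag:+d}: r = {r:+.2f}")
```

Here the correlogram peaks on one side only, at lag +2. A two-way influence of the kind described above would instead show sizeable correlations at both positive and negative lags.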