If the values of the independent variable are part of the values of the dependent variable, is it statistically okay to perform a regression analysis?
It might be easier to answer this question if you were to give an illustration.
Imagine a study in which a measure is used at baseline, then used again after implementing some treatment. The same measure/variable may be used as both predictor (baseline) and outcome (post-treatment). The usual question in such situations would be whether scores changed in level across occasions. So, testing for the presence of a relationship (as in linear regression) might be anticlimactic.
Unless the measure score (or the trait being measured) is temporally unstable, one would anticipate that there will be a relationship!
Thank you so much for your kind reply @David Morse. For example, consider the relationship between head length and total length for 250 individuals. The question is not whether these two lengths are related; certainly they are. We want to split the data set into two subsets (the first with head length less than or equal to a specific point, the second with head length greater than that point), moving the cut point in 10 cm intervals across the data series, and see how the relationship changes. If we do not find any difference between the two subsets, that's fine. But if we notice a sudden change in the relationship between the two phases, we plan to derive something from it. Now the question is: since head length is contained within total length, is it okay to do such a regression?
You have two dependent groups, so why not start with a t-test for dependent groups? Then select those that are 10 cm longer in one group and run a repeated-measures analysis with your two groups. You can do this separately and then present the results in a figure showing both groups.
Yes, you could use head length as a predictor of total length. The two series could be evaluated by: (a) separately estimating slope and intercept for the two subgroups, then comparing those estimates for equality (analogous to the Potthoff test); or (b) using the cut point as a knot in spline regression to estimate the slope below vs. the slope above it.
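A minimal sketch of both approaches in Python (statsmodels), not the exact analysis above: the column names "head_len" and "total_len", the 10 cm cut point, and the generated data are all placeholders standing in for the real 250 measurements.

```python
# Rough illustration only: compare the head-total relationship below vs. above a cut point.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
head = rng.uniform(2.0, 20.0, 250)                      # placeholder head lengths (cm)
df = pd.DataFrame({"head_len": head,
                   "total_len": 4.0 * head + rng.normal(0.0, 5.0, 250)})
cut = 10.0                                              # hypothetical cut point (cm)

# (a) One model with a group-by-predictor interaction: the head_len:above
#     coefficient tests whether the slopes differ between the two subsets.
df["above"] = (df["head_len"] > cut).astype(int)
fit_a = smf.ols("total_len ~ head_len * above", data=df).fit()
print(fit_a.summary())

# (b) Piecewise (linear spline) regression with a knot at the cut point:
#     the slope below the knot is the head_len coefficient; above the knot it is
#     the head_len coefficient plus the hinge coefficient.
df["hinge"] = np.maximum(df["head_len"] - cut, 0.0)
fit_b = smf.ols("total_len ~ head_len + hinge", data=df).fit()
print(fit_b.params)
```

Approach (a) amounts to fitting the two subgroups jointly and testing slope equality; approach (b) forces the two segments to join at the knot, which matches the idea of looking for a sudden change in the relationship at a specific head length.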
In general, the independent variable is contained in the conditional probability density function of the dependent variable. That is why it is statistically okay to perform a regression analysis.
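In symbols, regression models the conditional distribution of the dependent variable given the independent variable; for a simple linear regression with normal errors, for example,

f(y \mid x) = \mathcal{N}\!\bigl(y;\ \beta_0 + \beta_1 x,\ \sigma^2\bigr),
\qquad
\mathbb{E}[Y \mid X = x] = \beta_0 + \beta_1 x .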
If it is, there will be a problem of multicollinearity.
To fix the problem of multicollinearity, take one of the steps mentioned below:
1. Remove highly correlated predictors from the model. If you have two or more factors with a high VIF, remove one from the model (a short sketch for checking VIFs follows this list). ...
2. Use Partial Least Squares Regression (PLS) or Principal Components Analysis, regression methods that cut the number of predictors to a smaller set of uncorrelated components.
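A minimal sketch of the VIF check from step 1, using deliberately collinear placeholder predictors (the names x1, x2, x3 are hypothetical); this only applies when there are two or more independent variables in the model.

```python
# Rough illustration only: compute a VIF for each predictor and flag large ones.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly a copy of x1, so highly collinear
x3 = rng.normal(size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIFs above roughly 5-10 are commonly taken to signal collinearity;
# here x1 and x2 should both show large values, so one of them would be dropped.
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
print(vifs)
```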
As a point of clarification, Rahman's question did not imply the use of a set of highly correlated independent variables. Instead, he asked about an IV having a strong relationship with the DV because of a functional/structural relationship of the measures. Multicollinearity (and its indicators, such as VIF or tolerance) concerns the condition of too-high correlations among IVs only, and therefore isn't being made more likely by the situation he described.
There's something I don't understand. Why don't you plot your 250 paired measurements in an Excel sheet (head length against total length)? Seeing whether the relationship changes or not should be immediate.
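The same quick check in Python rather than Excel, again with hypothetical column names and placeholder data in place of the real measurements:

```python
# Quick visual check: scatter the 250 pairs and eyeball the relationship.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
head = rng.uniform(2.0, 20.0, 250)                      # placeholder data
df = pd.DataFrame({"head_len": head,
                   "total_len": 4.0 * head + rng.normal(0.0, 5.0, 250)})

plt.scatter(df["head_len"], df["total_len"], s=10)
plt.xlabel("Head length (cm)")
plt.ylabel("Total length (cm)")
plt.title("Head length vs. total length")
plt.show()
```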
Thanks for this informative thread. I have a similar question: I am using an independent variable that is calculated from the dependent variable.
In my case, the independent variable is the number of clients needed to achieve half the profit of a firm (as a measure of the composition/concentration of the client portfolio) and the dependent variable is the profit...