Hello everyone. I want to use rolling regression analysis, but I have a multicollinearity issue and I do not want to delete variables, as I only have 9 variables. Any suggestions?
Nine variables can be too many, depending upon circumstances. I guess you could use principal components, but I'd worry about being misled by noise or by the particular data set each time. You don't want to overfit.
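For illustration only, here is a minimal sketch of principal components regression in Python; the data are synthetic placeholders and the choice of three components is arbitrary, so this shows the general idea rather than a recommendation for your data:

```python
# Minimal principal-components-regression sketch on synthetic placeholder data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))                          # nine hypothetical predictors
X[:, 1] = X[:, 0] + rng.normal(scale=0.05, size=200)   # manufacture collinearity
y = X[:, 0] - 2 * X[:, 2] + rng.normal(size=200)

# Standardize, keep a few components, then regress on the component scores.
pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
scores = cross_val_score(pcr, X, y, cv=5, scoring="r2")
print("cross-validated R^2:", round(scores.mean(), 3))
```

Cross-validating the number of components, rather than fixing it at three, would be one way to guard against the overfitting mentioned above.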
You can compare model performances for a given sample using "graphical residual analysis" scatterplots. (You can search for that on the internet.) Thus you might compare various "reduced" models for a given sample. But you need to worry about overfitting, so "cross-validation" would be particularly important here, I think. Trying the comparison on each of several different samples might help, if you can.
If a scatterplot with y on the y-axis and predicted y on the x-axis does not indicate heteroscedasticity, you may have a model problem. Sometimes heteroscedasticity can be made artificially greater by model and/or data problems, but the complete absence of heteroscedasticity can also signal a problem, particularly for complex models. That is, if you need too many predictors, you may not be able to model very well. I know some models need a lot more predictors, but I don't think that is the ideal situation. See https://www.researchgate.net/publication/354854317_WHEN_WOULD_HETEROSCEDASTICITY_IN_REGRESSION_OCCUR.
To reiterate, if you really do need a lot of predictors, you might not be able to model very well. Can you think of a subject-matter justification for needing so many?
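As an illustration of the comparison described above, here is a minimal Python sketch that cross-validates a few reduced models and draws the y versus predicted-y scatterplot; the data and the candidate subsets are placeholders, not suggestions for which variables to keep:

```python
# Cross-validated comparison of reduced models plus an observed-vs-predicted plot.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 9))                          # nine hypothetical predictors
y = X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.7, size=200)

candidate_subsets = {"x0 only": [0], "x0 and x3": [0, 3], "all nine": list(range(9))}
for name, cols in candidate_subsets.items():
    scores = cross_val_score(LinearRegression(), X[:, cols], y, cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {scores.mean():.3f}")

# Observed y against predicted y; fanning of the points around the 45-degree
# line is the usual visual sign of heteroscedasticity.
y_hat = cross_val_predict(LinearRegression(), X[:, [0, 3]], y, cv=5)
plt.scatter(y_hat, y, s=10)
plt.axline((0.0, 0.0), slope=1.0, color="gray", linestyle="--")
plt.xlabel("predicted y")
plt.ylabel("observed y")
plt.show()
```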
In addition to Dr Knaub's suggestions, I would suggest trying variable selection with one of the lasso approaches. Lasso methods do not usually allow collinear solutions. References can be found in the attached paper. Best wishes, David Booth
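For what such a fit might look like, here is a minimal sketch using scikit-learn's LassoCV on synthetic placeholder data; it illustrates the general lasso idea only, not the specific method in the attached paper:

```python
# Lasso variable selection with a cross-validated penalty, on placeholder data.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 9))                              # nine hypothetical predictors
X[:, 4] = 0.9 * X[:, 5] + rng.normal(scale=0.1, size=200)  # collinear pair
y = 2 * X[:, 0] - X[:, 5] + rng.normal(size=200)

model = make_pipeline(StandardScaler(), LassoCV(cv=5))
model.fit(X, y)
coefs = model.named_steps["lassocv"].coef_
print("coefficients (exact zeros are dropped variables):", np.round(coefs, 3))
```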
James R Knaub David Eugene Booth thank you both for your explanations. My idea is that I want to discover whether there is a change in the relationships between variables, and in the trend of those relationships, over 20 days for 37 stations, meaning that every station has 20 days of data.
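For concreteness, a minimal sketch of that kind of per-station rolling regression using statsmodels' RollingOLS; the synthetic data, the reduced three-predictor set, and the window length are all placeholder assumptions, and with only 20 days per station the fitted coefficients would be fragile:

```python
# Per-station rolling regression sketch: 37 stations x 20 days, placeholder data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

rng = np.random.default_rng(0)
frames = []
for station in range(37):
    x = rng.normal(size=(20, 3))                        # reduced predictor set
    y = x @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=20)
    d = pd.DataFrame(x, columns=["x1", "x2", "x3"])
    d["y"], d["station"], d["day"] = y, station, np.arange(20)
    frames.append(d)
df = pd.concat(frames, ignore_index=True)

window = 10  # must exceed the number of coefficients (3 predictors + intercept)
rolling_coefs = {}
for station, g in df.groupby("station"):
    exog = sm.add_constant(g[["x1", "x2", "x3"]])
    res = RollingOLS(g["y"], exog, window=window).fit()
    rolling_coefs[station] = res.params  # one coefficient row per window end

print(rolling_coefs[0].tail())  # how the relationships drift over the 20 days
```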
I thought you might mean something like that, but you have to decide on the regression equation format you are going to use. Actually, that's the key, isn't it? You might want to use all nine predictors, but that does not sound feasible. In fact, if you have collinearity, that impacts the estimated relationships between variables, sometimes even flipping the sign of a coefficient. If you are going to look at this, then how can you tell much about relationships when they are so fragile?
Regression can be helpful. I also really like scatterplots. You could experiment with them. But like anything else, both regression and scatterplots can be misinterpreted, so you will want to keep an open mind on interpretation. Graphics can help you see the big picture better than various individual measures can.
Perhaps someone else who has worked more specifically on this issue may have more suggestions.
As Dr Knaub has mentioned, scatterplots are good, and as Dr Mohammed mentioned, VIFs are good statistics to look at. I would still consider a lasso approach. Some references are in the attached paper. Best wishes, David Booth
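To make the VIF suggestion concrete, here is a minimal sketch using statsmodels' variance_inflation_factor on synthetic placeholder data (the collinear pair is manufactured for illustration):

```python
# Variance inflation factors for nine placeholder predictors.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(200, 9)), columns=[f"x{i}" for i in range(9)])
df["x1"] = df["x0"] + rng.normal(scale=0.05, size=200)  # strongly collinear pair

exog = sm.add_constant(df)  # VIFs are usually computed with an intercept included
vifs = pd.Series(
    [variance_inflation_factor(exog.values, i) for i in range(1, exog.shape[1])],
    index=df.columns,
)
print(vifs.round(1))  # values far above ~10 are a common rule-of-thumb flag
```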