I asked this because, according to Gujarati (1995), one of the remedial measures for mitigating multicollinearity is to increase the sample size. Can we therefore say that multicollinearity is not a problem in big data?
It is still an issue. With a large sample the estimated correlations will have smaller standard errors, but if the population correlations (the rhos) are non-zero, the collinearity itself remains; indeed, with more data you are more likely to detect it.
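To see why, here is a minimal sketch in Python with simulated data (the population correlation of 0.9 and the sample sizes are assumptions for illustration): as n grows, the sample correlation settles near its population value, so the variance inflation factor (VIF) stays roughly constant rather than shrinking.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.9  # assumed population correlation between the two predictors

for n in (100, 10_000, 1_000_000):
    # Draw two predictors whose population correlation is rho
    cov = [[1.0, rho], [rho, 1.0]]
    x1, x2 = rng.multivariate_normal([0, 0], cov, size=n).T
    r = np.corrcoef(x1, x2)[0, 1]   # sample correlation
    vif = 1.0 / (1.0 - r**2)        # VIF in a two-predictor model
    print(f"n={n:>9,}  r={r:.4f}  VIF={vif:.2f}")
```

The sampling error of r shrinks with n, but the VIF hovers around its population value (about 5.3 here), so a bigger sample does not remove the collinearity itself.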
But why do you think it is a problem? If variables are correlated in nature, then they are. You can apply a transformation (e.g., use per capita variables rather than raw frequencies), and of course you should remember that any individual effects (the betas) are conditional estimates, not unconditional ones. If the collinearity is really high you can exclude a few variables. How many predictors do you have?
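If you want a quick diagnostic before deciding whether to drop anything, a common approach is to compute VIFs. Here is a minimal sketch with statsmodels; the simulated predictors x1, x2, x3 are hypothetical stand-ins for your data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical simulated predictors; x2 is built from x1 so they correlate
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.4 * rng.normal(size=500)  # strongly related to x1
x3 = rng.normal(size=500)                   # unrelated predictor
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

X = sm.add_constant(df)  # include the intercept when computing VIFs
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vifs.drop("const"))  # rules of thumb often flag VIF > 5 or > 10
```

Here x1 and x2 get VIFs around 6 while x3 stays near 1, which is the kind of pattern that would justify excluding or combining variables.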
Thank you, Prof. Daniel, for your prompt response. I am about to begin working on a study on big data and was curious to know whether multicollinearity is also a problem that needs to be handled. Your contribution is helpful.
This problem is more challenging in larger samples, and one way to deal with it is to use principal component analysis. Note that the absence of severe multicollinearity is one of the main assumptions to check before running a regression analysis.
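As a concrete sketch of the PCA remedy (principal component regression), assuming scikit-learn and hypothetical simulated data with two highly correlated predictors:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical simulated data; x2 is nearly a copy of x1
rng = np.random.default_rng(2)
n = 1_000
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.3 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Principal component regression: standardize, rotate to uncorrelated
# components, keep the leading one(s), then regress y on those components
pcr = make_pipeline(StandardScaler(), PCA(n_components=1), LinearRegression())
pcr.fit(X, y)
print("R^2 using the retained component:", pcr.score(X, y))
```

The trade-off is interpretability: the retained components are linear combinations of all the original predictors, so you no longer get a coefficient per variable.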