I'm performing a simple linear regression with one dependent and one independent variable: dependent variable (y): Nighttime lights raster, Independent variable (x): Population raster.
The issue is that the population raster was derived in part from nighttime lights data (among other sources). When I run the regression, I get a relatively high r-squared, which intuitively makes sense—areas with more lights tend to have more people.
However, I'm concerned about circularity: since the independent variable (population) was partially derived from the dependent variable (nighttime lights), does this invalidate the regression results or introduce bias? Does this make the regression model statistically invalid or just less informative? How should I interpret the r-squared in this context?
Any guidance on how to properly frame or address this issue would be appreciated.
PS: Please do not post answers derived from an LLM!