I'm studying the determinants of subjectve well being in my country and I have reported satisfaction with life as my dependent variable and almost 40 independent variables. I ran multicolinearity tests and I dind't find values bigger than 5 (in fact, just two variables had a VIF above 2). Also, my N=22 000, so I dont expect to have an overfitted model. Actually, at thet beggining, all was going well: the variables maintained their signficance and the values of their coefficients when I added or deleted some variables to test the robustness of the model and the adjusted R squared increased with the inclusion of more variables.
However, I finally included some variables that measure the satisfaction with specific life domains (family, work, profession, community, etc.) and there is when the problem started: my adjusted R squared tripled and the significance and even the signs of some variables changed dramatically, in some cases, in a counterintuitive way. I also tested multicolinearity and the correlation of these variables with the other estimators and I didn't find this to be a problem.
The literature says that it is very likely that there are endogeneity problems between satisfaction with life domains and satisfaction with life since it is not too much the objective life conditions that affect life satisfaction but the propensity to self-report satisfaction. Can this be the cause of my problem? If so, how?
PD: I'm not trying to demonstrate causality.