I have a complicated issue. I have a dataset of 44,000 surveyed individuals from 68 countries, observed in different time periods. The survey years are not consecutive: for example, data for Egypt are available for 2008 and 2012, data for Argentina for 2000 and 2005, and so on. This unbalanced pooled dataset is not representative of each country as a whole, because I select only individuals of a certain religion. As a result, I have very few observations from some countries and a very large number from countries where this religion is dominant. Some countries have as few as 1 observation, while others have about 3,000. I am testing the impact of a follower's religiosity on his/her economic outcome under governments of different quality. The government quality measures are macro-level data, so I collapsed my dataset to means in order to merge in the government quality variable. I ended up with 114 observations, on which I ran a fixed-effects model and got convincing results, with an R-squared of about .50.
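To make the collapsing step concrete, here is a minimal Python sketch of what I mean. It assumes the collapse is done by country and year; the records, the variable names, and the helper `collapse_by_cell` are hypothetical and only for illustration. It also keeps the number of individuals behind each mean, which the plain collapse discards.

```python
from collections import defaultdict

# Hypothetical individual-level records: (country, year, religiosity, outcome).
# The values are made up for illustration; they are not from my survey data.
records = [
    ("Egypt", 2008, 0.9, 2.0),
    ("Egypt", 2008, 0.7, 3.0),
    ("Egypt", 2012, 0.8, 4.0),
    ("Argentina", 2000, 0.2, 5.0),
]


def collapse_by_cell(rows):
    """Average each variable within a (country, year) cell and keep the cell size."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum religiosity, sum outcome, count]
    for country, year, religiosity, outcome in rows:
        cell = sums[(country, year)]
        cell[0] += religiosity
        cell[1] += outcome
        cell[2] += 1
    return {
        key: {"religiosity": s[0] / s[2], "outcome": s[1] / s[2], "n": s[2]}
        for key, s in sums.items()
    }


cells = collapse_by_cell(records)
for key in sorted(cells):
    print(key, cells[key])
```

Each collapsed row then counts as one observation in the model, whether it is the mean of 1 individual or of 3,000.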
hettest produced Prob > chi2 = 0.7608, so I cannot reject homoskedasticity. My main model produces Corr(u_i, Xb) = -0.8220, which indicates that the fixed effects are strongly correlated with my explanatory variables and that FE is essential to deal with endogeneity. rho = .7888 (the fraction of variance due to u_i), which I took as a good sign.
Was collapsing such a heavily unbalanced dataset the right thing to do? Is there a way to weight countries by the number of observations they have? If collapsing is not the right approach, what would be the alternative?
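To illustrate what I mean by weighting, here is a minimal Python sketch of a fixed-effects (within) slope in which each collapsed cell is weighted by the number of individuals behind it. This is only one possible weighting scheme, not necessarily the appropriate one; the function `weighted_within_slope`, the single regressor, and the numbers are all hypothetical.

```python
def weighted_within_slope(cells):
    """Slope of y on x with country fixed effects, weighting each cell by n.

    cells: list of (country, x, y, n) tuples, one per collapsed cell.
    """
    # Weighted sums per country, used to form weighted country means.
    totals = {}
    for country, x, y, n in cells:
        t = totals.setdefault(country, [0.0, 0.0, 0.0])
        t[0] += n * x
        t[1] += n * y
        t[2] += n

    # Weighted least squares on the deviations from the country means.
    num = den = 0.0
    for country, x, y, n in cells:
        sum_x, sum_y, sum_n = totals[country]
        dx = x - sum_x / sum_n
        dy = y - sum_y / sum_n
        num += n * dx * dy
        den += n * dx * dx
    if den == 0.0:
        raise ValueError("no within-country variation in x")
    return num / den


# Made-up cells: country A has many respondents, country B very few.
example = [
    ("A", 1.0, 2.0, 10),
    ("A", 3.0, 6.0, 30),
    ("B", 0.0, 5.0, 1),
    ("B", 2.0, 13.0, 1),
]

weighted = weighted_within_slope(example)
# Setting every n to 1 gives the unweighted within estimate for comparison.
unweighted = weighted_within_slope([(c, x, y, 1) for c, x, y, _ in example])
print(weighted, unweighted)
```

In this toy example the two estimates differ because the small country has a different slope from the large one, which is exactly the situation I am worried about in my data.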