In a regression on a dataset with N = 1200, I have an independent dummy variable that indicates whether the respondent is unemployed or employed. The variable has the following distribution:

Unemployment = 0 (employed) - Frequency: 1196

Unemployment = 1 (unemployed) - Frequency: 4

The regression gives me a significant coefficient, but it is also very counterintuitive: specifically, life satisfaction has a positive association with unemployment. However, I think it is wrong to draw a valid conclusion from just 4 cases with Unemployment = 1. I also have other dummy variables where the situation is even less clear. For example:

Dummy = 0 - Frequency: 1170

Dummy = 1 - Frequency: 30

Or, with more than two categories:

Categorical option A = 0 - Frequency: 1150

Categorical option B = 1 - Frequency: 30

Categorical option C = 2 - Frequency: 12

Categorical option D = 3 - Frequency: 8
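As a rough sketch of why such small categories are a problem, assume the simplest case: a regression containing only the dummy (or only the categorical variable's dummies) with equal residual variance in every group. Each dummy coefficient is then just the difference between that category's mean and the reference category's mean, and its standard error is σ·√(1/n_ref + 1/n_cat), where σ is the residual standard deviation. The Python snippet below evaluates that factor for the frequencies listed above. The helper name `se_factor` and the 600/600 balanced benchmark are illustrative choices, and with other covariates in the model the actual standard errors will differ.

```python
import math

def se_factor(n_ref: int, n_cat: int) -> float:
    """Multiplier of the residual SD in the standard error of a dummy coefficient
    (category mean minus reference mean), assuming no other covariates and
    equal residual variance across groups (illustrative simplification)."""
    return math.sqrt(1.0 / n_ref + 1.0 / n_cat)

# Frequencies from the question: (reference-category n, category n)
comparisons = {
    "Unemployment = 1 vs 0": (1196, 4),
    "Dummy = 1 vs 0": (1170, 30),
    "Option B vs A": (1150, 30),
    "Option C vs A": (1150, 12),
    "Option D vs A": (1150, 8),
}

# Benchmark: a perfectly balanced split of the same N = 1200 (600 vs 600)
balanced = se_factor(600, 600)

for label, (n_ref, n_cat) in comparisons.items():
    f = se_factor(n_ref, n_cat)
    print(f"{label}: SE = {f:.3f} * sigma "
          f"({f / balanced:.1f}x the balanced-split SE)")
```

Because the factor is dominated by 1/n of the smaller group, the 4 unemployed respondents give a standard error roughly 8.7 times larger than an even split of the same 1,200 cases would, and adding more employed respondents would barely change that.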

Can I obtain valid conclusions from variables like these? And, more generally, is there a minimum number of observations needed per response category of each independent variable for the conclusions drawn from it to be sound? If so, how can I calculate this number?
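One common way to turn "minimum observations per category" into a number is a power calculation: choose the smallest effect worth detecting, a significance level and a target power, then find the smallest category size that reaches that power. The sketch below does this with a normal approximation in the same simplified two-group setting (no covariates, equal variances), expressing the effect as a standardized difference `d` in life satisfaction (in residual-SD units). The function names and the example value `d = 0.5` are assumptions for illustration, not values from the data; dedicated tools such as the power module in statsmodels (`statsmodels.stats.power`) use the t-distribution and cover more designs.

```python
from statistics import NormalDist
from typing import Optional

def power_two_groups(d: float, n_ref: int, n_cat: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test for a standardized mean
    difference d between a category (n_cat) and the reference (n_ref).
    Hypothetical helper; assumes equal variances and no covariates."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = (1 / n_ref + 1 / n_cat) ** 0.5  # standard error in residual-SD units
    shift = abs(d) / se                  # expected value of the test statistic
    # Probability the statistic falls beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def min_category_size(d: float, n_ref: int, alpha: float = 0.05,
                      power: float = 0.80, n_max: int = 10_000) -> Optional[int]:
    """Smallest category size reaching the target power, or None if no size
    up to n_max does (possible when d is tiny relative to 1/sqrt(n_ref))."""
    for n_cat in range(2, n_max + 1):
        if power_two_groups(d, n_ref, n_cat, alpha) >= power:
            return n_cat
    return None

print(min_category_size(0.5, 1196))
print(round(power_two_groups(0.5, 1196, 4), 3))
```

Under these assumptions, with 1,196 employed respondents as the reference, `min_category_size(0.5, 1196)` returns 33, and `power_two_groups(0.5, 1196, 4)` is below 0.2. At power that low, estimates that do reach significance tend to exaggerate the effect and can even carry the wrong sign, which is one reason a significant but counterintuitive coefficient based on 4 cases deserves caution.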

Posted by Santiago Valdivieso.