06 December 2023

Forgive me if my question is not phrased in the standard way.

I want to analyze the effect of residential environmental factors on depression using linear regression. As the questionnaire is currently designed, the independent variables are measured by many items. Two of those items are: 1) whether there is a sports facility near the residence, and 2) if so, what the environment of that sports facility is like.

The problem is that question 2 is only asked if the answer to question 1 is "yes", so as it stands, question 2 will have many missing values. However, I want to study the effect of both question 1 and question 2 on depression in the same linear regression. How should I change the way the questions are asked, or the way the model is constructed, so that both questions can be included?
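To make the setup concrete, here is a minimal sketch of one possible coding for this skip pattern. It is an assumption on my part, not a settled answer: question 1 enters as a 0/1 indicator, and question 2 enters only through a term that is zero whenever question 1 is "no". The records, the 1 to 5 quality scale, and the name `build_design` are all hypothetical, and the sketch assumes question 2 is always answered when question 1 is "yes".

```python
from statistics import mean

# Hypothetical survey records: (has_facility, facility_quality, depression_score).
# facility_quality is None when question 2 was skipped (no facility nearby).
records = [
    (1, 4, 10.0),
    (1, 2, 14.0),
    (0, None, 16.0),
    (1, 3, 12.0),
    (0, None, 15.0),
]

def build_design(records):
    """Return (X, y) with columns [intercept, has_facility, has_facility * centred quality]."""
    # Centre quality on its mean among respondents who have a facility,
    # so the has_facility coefficient refers to a facility of average quality.
    q_mean = mean(q for has, q, _ in records if has == 1)
    X, y = [], []
    for has, q, dep in records:
        # Respondents without a facility get 0 for the quality term,
        # so no row has a missing value and no row is dropped.
        q_term = (q - q_mean) if has == 1 else 0.0
        X.append([1.0, float(has), float(q_term)])
        y.append(dep)
    return X, y

X, y = build_design(records)
for row, dep in zip(X, y):
    print(row, dep)
```

Under this coding, the second column would capture the difference between having no facility and having one of average quality, and the third column would capture the effect of quality among those who have a facility. The resulting `X` and `y` could then be passed to any ordinary least squares routine together with the other predictors.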
