I am working on my dissertation and will be using multiple imputation in SAS to handle missing data. My dataset has three forms: one for the parent’s evaluation of a child, one for the counselor’s evaluation of the child, and one for the child’s evaluation of him- or herself. Children get an extra 4 questions on their form that are not on the parent or counselor forms. I could collapse all the data so that there is only one record per identifier, but then I’m afraid that I’m not building variability in at the right step. I suppose my question comes down to this: I’ll be analyzing my data at the census tract level, which will require me to collapse (aggregate) my data in several stages, so at which stage do I use multiple imputation? For example, parent/youth/counselor records will be aggregated into one set of records, and then those records will be further aggregated into census tracts.

Just to make things a little more interesting, I’m also adding in data from another source (census stats), and some of this information is missing. If I merge these records onto individuals before imputing, the imputation will give the same census tract multiple different values. I have a feeling that aggregating those individuals back into tracts (which will need to be done) will screw up the variability. On the other hand, I could do the imputation in multiple stages, but I’m not sure which approach is preferred.
