I am working on a project that uses waves 5, 6, and 7 of a longitudinal study (the Millennium Cohort Study). I am trying to understand how to handle missing data and how to weight the data so that the sample is representative. The data include both design weights and non-response weights, and I’m really confused about which weighting variable I should use. I will be using a cross-lagged panel model (a structural equation model, SEM) to examine the bidirectional relationships between the variables of interest across the three waves. My question is: should I weight the data from each wave using that wave's weight (e.g., the wave 5 weight for wave 5 data), or should I use the wave 7 weight for all of the data? I assume I would use the design weight for this.

I am also trying to deal with missing data due to attrition, and multiple imputation is the recommended approach. Does a non-response weight need to be applied when using multiple imputation? If so, should I impute the data for each wave separately using that wave's weight, or impute the entire dataset using the wave 7 weight?

I have read the study documentation, but it is incredibly confusing and I’m still not sure how to handle weighting and missing data.

Any suggestions on how to handle weighting and missing data in longitudinal surveys with a complex design would be much appreciated.
