All,
I have a dataset that contains information on the type of property each individual (or group of individuals) purchased, along with some information on the buyers themselves.
However, I am missing age-related information for a number of years, and I need to account for this because I think not accounting for it will bias the results. My data is not missing at random: I am aware that the missing cohort is generally older.
The variables I'm using in my model to predict age are: the region where the property was purchased, the price of the property, whether the person was a first-time buyer, and whether the property was a house or an apartment. When I run the model I get an R-squared of 0.26.
I also compared the median age of the records that I actually have a value for with the median predicted age. The median actual age is 39, as opposed to 46 for the predicted age.
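To make the comparison above concrete, here is a minimal sketch of the kind of check I mean. It is not my actual model: the numbers are made up, it uses a single predictor (price) instead of all four variables, and it assumes the predicted median is taken over the records with missing age. The names `fit_line`, `r_squared`, `price_known`, `age_known` and `price_missing` are hypothetical and exist only for this illustration.

```python
from statistics import mean, median


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(y, fitted):
    """Share of the variance in y explained by the fitted values."""
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up toy data (assumption): price in thousands, age in years.
price_known = [150, 200, 220, 300, 350, 400]   # records with an observed age
age_known = [28, 35, 31, 42, 39, 50]
price_missing = [380, 420, 450]                # records where age is missing

# Fit on the records with an observed age only.
intercept, slope = fit_line(price_known, age_known)
fitted = [intercept + slope * p for p in price_known]

# Predict age for the records where it is missing.
predicted_missing = [intercept + slope * p for p in price_missing]

r2 = r_squared(age_known, fitted)
median_observed = median(age_known)
median_predicted = median(predicted_missing)

print(f"R-squared: {r2:.2f}")
print(f"Median observed age: {median_observed}")
print(f"Median predicted age (missing records): {median_predicted:.1f}")
```

In this toy setup the missing records are the more expensive purchases, so the predicted median comes out above the observed median, which is the same direction as the 39 versus 46 gap I see in my data.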
Is there any literature that has looked at predicting the age of home buyers?