I am trying to estimate a duration model based on two data sets, A and B. They record the same variables but differ in two ways: (a) A only contains observations on subjects from age X onwards, whereas B covers their entire life span; (b) A covers all subjects, whereas B covers only some of them. Whether a subject appears in B depends on criteria that make B a biased sample.

I can see three options: (1) estimate the model using data set A only, but then I lose all information on subjects who do not survive beyond age X; (2) estimate the model using data set B only, but then I lose all the information contained in A and I am working with a non-representative sample; (3) use both data sets, which gives me more information than either data set alone, but does not solve the problem that all my data before age X come from a biased sample.

Does anyone know of statistical techniques that deal with this type of problem?
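To make the structure of data set A concrete, here is a minimal simulation sketch. It is only an illustration under strong assumptions that are not part of my real data: lifetimes are exponential with a known rate, there is no censoring, X is set to 5, and all function and variable names are hypothetical. It shows how many subjects drop out of A because they do not survive beyond X, and how treating the survivors as if they had been observed from birth gives a different rate than counting their exposure only from age X (what is usually called delayed entry or left truncation).

```python
import random


def simulate_lifetimes(n, rate, seed=0):
    """Draw n exponential lifetimes (an assumed, purely illustrative model)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]


def build_dataset_a(lifetimes, x):
    """Mimic data set A: keep only subjects who survive beyond age x.

    Each record is (entry_age, exit_age); observation starts at age x.
    """
    return [(x, t) for t in lifetimes if t > x]


def exponential_rate(records):
    """Events divided by total time at risk, for (entry_age, exit_age) records.

    With no censoring, every record ends in an event.
    """
    events = len(records)
    exposure = sum(exit_age - entry_age for entry_age, exit_age in records)
    return events / exposure


TRUE_RATE = 0.1  # assumed hazard used to generate the data
X = 5.0          # assumed age from which data set A starts

lifetimes = simulate_lifetimes(20000, TRUE_RATE)
dataset_a = build_dataset_a(lifetimes, X)

# Subjects that never enter A because they do not survive beyond X.
lost = len(lifetimes) - len(dataset_a)

# Exposure counted from age X onwards (delayed entry).
rate_delayed_entry = exponential_rate(dataset_a)

# Same subjects, but wrongly treated as if observed from age 0.
rate_naive = exponential_rate([(0.0, t) for _, t in dataset_a])

print(f"subjects lost from A: {lost} of {len(lifetimes)}")
print(f"rate with delayed entry: {rate_delayed_entry:.4f}")
print(f"rate ignoring the entry age: {rate_naive:.4f}")
```

In this toy setting, counting exposure from age X should land close to the assumed rate, while ignoring the entry age understates it. It says nothing yet about how to correct for the biased selection into B, which is the part I am unsure about.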
