Dear all,

I would like to pick your brains on the following problem.

I want to combine data on an exposure (thyroid function or TSH) and outcomes from 5 studies. In each study, the outcome is measured in a similar way, and is comparable between cohorts. However, the exposure is measured using different assays and the absolute values are not comparable between cohorts.

There are two assumptions under which we have aimed to combine the data.

(1) The assumption from a population-based perspective: individuals at the 5th/50th/95th etc. percentile in each cohort are comparable, so the data can be pooled after calculating percentiles within each cohort. The main issue with this approach is that it assumes the spacing between percentiles is similar across cohorts, which is probably not the case.
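To make approach (1) concrete, here is a minimal sketch of within-cohort percentile ranks. It is in Python purely for illustration, the function name and the TSH values are made up, and the mid-rank handling of ties is just one possible convention.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def within_cohort_percentiles(records):
    """Return each record's mid-rank percentile (0-100) within its own cohort.

    records: list of (cohort, tsh) tuples (hypothetical layout).
    """
    by_cohort = defaultdict(list)
    for cohort, tsh in records:
        by_cohort[cohort].append(tsh)
    sorted_vals = {c: sorted(v) for c, v in by_cohort.items()}

    percentiles = []
    for cohort, tsh in records:
        vals = sorted_vals[cohort]
        below = bisect_left(vals, tsh)          # values strictly below this one
        ties = bisect_right(vals, tsh) - below  # values equal to it (incl. itself)
        percentiles.append(100.0 * (below + 0.5 * ties) / len(vals))
    return percentiles


# Made-up values: cohort B's assay reads 10x higher than cohort A's.
records = [("A", 1.0), ("A", 2.0), ("A", 3.0), ("A", 4.0),
           ("B", 10.0), ("B", 20.0), ("B", 30.0), ("B", 40.0)]
pct = within_cohort_percentiles(records)
```

In this toy example both cohorts get identical percentile ranks, which shows the limitation: the transformation removes the assay scale, but it also forces every cohort onto the same distribution whether or not that is true.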

(2) Under the assumption that a 1 SD difference in TSH represents a similar change in each cohort, cohort-specific SD scores can be pooled. The main issue with this approach is that it (still) does not take into account any differences in the underlying distributions. In other words: in cohort A, 10% of women are by definition below that cohort's 10th percentile, but in cohort B only 5% might fall below that same cut-off if TSH were measured with the same assay as in cohort A.
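As a minimal sketch of approach (2), the SD scores are computed separately within each cohort. Again this is Python for illustration only, with a hypothetical function name and made-up TSH values, and it uses the sample SD.

```python
from collections import defaultdict
from statistics import mean, stdev


def within_cohort_z_scores(records):
    """Return (tsh - cohort mean) / cohort SD for each (cohort, tsh) record."""
    by_cohort = defaultdict(list)
    for cohort, tsh in records:
        by_cohort[cohort].append(tsh)
    # Sample mean and sample SD, computed per cohort.
    stats = {c: (mean(v), stdev(v)) for c, v in by_cohort.items()}
    return [(tsh - stats[c][0]) / stats[c][1] for c, tsh in records]


# Made-up values: cohort B's assay reads 10x higher than cohort A's.
records = [("A", 1.0), ("A", 2.0), ("A", 3.0), ("A", 4.0),
           ("B", 10.0), ("B", 20.0), ("B", 30.0), ("B", 40.0)]
z = within_cohort_z_scores(records)
```

Each cohort ends up with mean 0 and SD 1, so a pure difference in assay scale disappears, but a genuine difference in the populations' TSH distributions would be removed along with it.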

Adding a random intercept or random effect did not improve the model compared with standard linear regression adjusting for cohort.

Can anyone advise me on a strategy for finding the optimal way of combining and harmonizing such data across cohorts? I have been looking for an R package but have failed to identify one that fits my needs.

Many thanks, Tim
