How to optimally combine/harmonize data measured using different methods?

22 November 2016

Dear all,

I would like to pick your brains on the following problem.

I want to combine data on an exposure (thyroid function or TSH) and outcomes from 5 studies. In each study, the outcome is measured in a similar way, and is comparable between cohorts. However, the exposure is measured using different assays and the absolute values are not comparable between cohorts.

We have tried to combine the data under two assumptions.

(1) The assumption from a population-based perspective: individuals at the 5th/50th/95th etc. percentile in each cohort are comparable, so the data can be pooled after calculating population percentiles within each cohort. The main issue with this approach is that it assumes the differences between percentiles are similar across cohorts, which is probably not the case.
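A minimal sketch of approach (1), assuming individual records arrive as hypothetical `(cohort_id, tsh)` tuples: TSH is ranked only within its own cohort, and the resulting percentiles are what get pooled. Mid-ranks are one choice among several for handling ties.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def within_cohort_percentiles(records):
    """Return (cohort, tsh, percentile) tuples, ranking TSH only within its own cohort.

    records: list of (cohort_id, tsh) tuples. Percentiles use mid-ranks, so tied
    values share a percentile and results lie strictly between 0 and 100.
    """
    values_by_cohort = defaultdict(list)
    for cohort, tsh in records:
        values_by_cohort[cohort].append(tsh)
    sorted_by_cohort = {c: sorted(v) for c, v in values_by_cohort.items()}

    result = []
    for cohort, tsh in records:
        ordered = sorted_by_cohort[cohort]
        below = bisect_left(ordered, tsh)            # values strictly lower
        equal = bisect_right(ordered, tsh) - below   # ties, including this value
        pct = 100.0 * (below + 0.5 * equal) / len(ordered)
        result.append((cohort, tsh, pct))
    return result


# Hypothetical toy data: cohort B's assay reads ten times higher than cohort A's.
toy = [("A", 1.0), ("A", 2.0), ("A", 3.0), ("A", 4.0),
       ("B", 10.0), ("B", 20.0), ("B", 30.0), ("B", 40.0)]
for row in within_cohort_percentiles(toy):
    print(row)
```

Here 2.0 in cohort A and 20.0 in cohort B both come out at the 37.5th percentile, which is exactly the "same percentile, same person" assumption described above.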

(2) Under the assumption that a 1 SD difference in TSH represents a similar change in each cohort, population-based SD scores can be pooled. The main issue with this approach is that it (still) does not account for differences in the distribution. In other words, by definition 10% of women in cohort A are below cohort A's 10th percentile, but if TSH in cohort B were measured with cohort A's assay, perhaps only 5% of women in cohort B would fall below that value.
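A minimal sketch of approach (2), with the same hypothetical `(cohort_id, tsh)` records. Computing the scores on ln(TSH) is an assumption made here because TSH is right-skewed; the post does not say which scale was used.

```python
import math
import statistics
from collections import defaultdict


def within_cohort_sd_scores(records, log_transform=True):
    """Return (cohort, tsh, sd_score) tuples standardized within each cohort.

    records: list of (cohort_id, tsh) tuples; each cohort needs at least two values.
    log_transform: by default scores are computed on ln(TSH) (an assumption).
    """
    def scale(v):
        return math.log(v) if log_transform else v

    by_cohort = defaultdict(list)
    for cohort, tsh in records:
        by_cohort[cohort].append(scale(tsh))
    # Cohort-specific mean and sample SD on the chosen scale.
    params = {c: (statistics.mean(v), statistics.stdev(v)) for c, v in by_cohort.items()}

    return [(c, tsh, (scale(tsh) - params[c][0]) / params[c][1]) for c, tsh in records]
```

Only each cohort's location and spread are equalized; any difference in skewness or tails between cohorts passes straight through, which is the limitation described above.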

Adding a random intercept or random effect for cohort did not improve the model compared with standard linear regression adjusting for cohort.
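For reference, "linear regression adjusting for cohort" with a separate intercept per cohort gives the same exposure slope as regressing cohort-centred outcome on cohort-centred exposure (the Frisch-Waugh-Lovell result). A minimal sketch with hypothetical `(cohort, x, y)` records and no other covariates:

```python
from collections import defaultdict


def cohort_adjusted_slope(records):
    """Slope of y on x with a fixed intercept per cohort.

    records: list of (cohort_id, x, y). Centring x and y within each cohort
    removes the cohort intercepts; the pooled slope of the centred data equals
    the coefficient from a regression with cohort indicator variables.
    """
    by_cohort = defaultdict(list)
    for cohort, x, y in records:
        by_cohort[cohort].append((x, y))

    sxy = sxx = 0.0
    for points in by_cohort.values():
        mx = sum(x for x, _ in points) / len(points)
        my = sum(y for _, y in points) / len(points)
        for x, y in points:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx


# Hypothetical data: same slope (2) in both cohorts, very different intercepts.
data = [("A", 0.0, 5.0), ("A", 1.0, 7.0), ("A", 2.0, 9.0),
        ("B", 10.0, 17.0), ("B", 11.0, 19.0), ("B", 12.0, 21.0)]
adjusted = cohort_adjusted_slope(data)                           # 2.0
naive = cohort_adjusted_slope([("all", x, y) for _, x, y in data])  # ignores cohort
```

The naive pooled slope here is about 1.22, while the cohort-adjusted slope recovers 2. A random-intercept version would need a mixed-model library (for example lme4 in R or statsmodels' MixedLM in Python) and is not shown.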

Can anyone advise me on a strategy for finding the optimal way to combine and harmonize such data across cohorts? I have been looking for an R package but have not found one that fits my needs.

Many thanks, Tim

Fadi Al Machot

I recommend using Dempster-Shafer or Kalman fusion. Both can be used to fuse different measurements.
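As a rough illustration of the Kalman side of this suggestion: in the simplest static, scalar case, a Kalman update of one measurement by another reduces to an inverse-variance weighted average. This sketch assumes both values measure the same quantity on the same scale, without bias, with known independent error variances, and the function name is hypothetical. That setup needs each individual measured by both assays, which may not match cohorts that each used a single assay. Dempster-Shafer combination is not shown.

```python
def kalman_fuse(x1, var1, x2, var2):
    """Fuse two noisy measurements of the same quantity (static scalar Kalman update).

    (x1, var1) acts as the prior estimate and (x2, var2) as the new measurement.
    Assumes a common scale, no bias, and known, independent error variances.
    """
    gain = var1 / (var1 + var2)        # Kalman gain: weight given to the new measurement
    fused = x1 + gain * (x2 - x1)      # move the estimate toward the new measurement
    fused_var = (1.0 - gain) * var1    # fused variance is smaller than either input
    return fused, fused_var
```

With equal variances the result is the plain average with half the variance; a noisier second measurement pulls the estimate less.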

Wouter Edeling

Combining several studies sounds like a meta-analysis problem, although I'm not very familiar with this area.

Perhaps you could do a Bayesian analysis on the first data set, and then update the result with the second data set, using the posterior as the prior in the second Bayesian analysis.
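A minimal sketch of the "posterior becomes the next prior" idea, assuming the simplest conjugate case: a normal likelihood for a mean with known noise variance and a normal prior. The data values and function name are hypothetical. In this setting, updating study by study gives the same posterior as analysing the pooled data in one step.

```python
def normal_update(prior_mean, prior_var, data, noise_var):
    """Posterior for a normal mean with known noise variance (conjugate normal prior)."""
    precision = 1.0 / prior_var + len(data) / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


# Hypothetical effect estimates from two data sets.
study1 = [0.8, 1.1, 0.9]
study2 = [1.3, 1.0]
m1, v1 = normal_update(0.0, 100.0, study1, noise_var=1.0)   # vague prior
m2, v2 = normal_update(m1, v1, study2, noise_var=1.0)       # posterior 1 used as prior 2
m_all, v_all = normal_update(0.0, 100.0, study1 + study2, noise_var=1.0)
```

Sequential updating only combines the evidence; it assumes both data sets measure the same quantity on the same scale.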

http://labstats.net/articles/combining_experiments.html

Tim I M Korevaar

Thanks to both of you for helping out. I have been trying to grasp Dempster-Shafer and Kalman fusion, but this seems a bit out of my league, and most of the literature on these techniques deals with differences in questionnaire categories, which is not fully applicable to TSH (i.e., a skewed continuous measurement).

Regarding Wouter's suggestion, this would make sense if I wanted to pool the results; however, I will have individual participant data available and wish to pool the datasets first and then analyze the data. Nonetheless, even in a two-step analysis, if the measurements are dissimilar between the cohorts I will misclassify individuals when pooling, regardless of how I pool. I need to overcome this misclassification, as it will bias my effect estimates towards the null.
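The bias towards the null can be illustrated with the classical regression dilution result: if the observed exposure is the true exposure plus independent random error, the slope shrinks by the reliability ratio var(X) / (var(X) + var(error)). The simulation below is a hypothetical setup, not TSH data. Real assay differences may be systematic (cohort-specific shifts or rescaling) rather than pure random noise, in which case the bias need not follow this formula.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_dilution(n=100_000, beta=0.5, error_sd=1.0, seed=1):
    """Slope using the true exposure vs. using an error-prone version of it.

    Hypothetical setup: true exposure X ~ N(0, 1), outcome Y = beta*X + N(0, 1),
    observed exposure W = X + N(0, error_sd**2) (classical, non-differential error).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    w = [xi + rng.gauss(0.0, error_sd) for xi in x]
    reliability = 1.0 / (1.0 + error_sd ** 2)   # var(X) / var(W), since var(X) = 1
    return ols_slope(x, y), ols_slope(w, y), beta * reliability
```

With an error variance equal to the exposure variance, the estimated slope is roughly halved.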

Jerry Miller

Hi Tim,

Your assumptions sound very logical to me. If test 'A' has a 10th percentile value, then that should mean that 10 percent of cases (in the population used to normalize the test originally) fall below that value and 90 percent above. Test 'B' should be the same: the 10th percentile is the 10th percentile.

With that said, test 'A' and test 'B' might not give the same result for the same patient. There are certainly differences between laboratory tests: the same patient may test in the 'high' range at one laboratory using one test, and test 'normal' in another (or the same) laboratory using a different test. Why this is so isn't clear, but from what I understand there is a lot of lab-to-lab inconsistency (which suggests variation among technicians), as well as variation among the original normative populations on which each company's tests were normalized when they were first developed.

But this is not necessarily a bad thing. If company 'A' and company 'B' used different normative populations to initially validate their tests, these may represent legitimate patient populations (or healthy populations) to compare an individual patient against.

I would say go ahead and use all five tests, but convert the patients' test scores to z-scores (also known as standard scores) so they are comparable to one another: i.e., a value of +1.6 z-units on test 'A' is assumed to be the same as +1.6 z-units on test 'B'. (Note, of course, that you must use the mean and SD of the normative population on which the company originally validated the test, not your own sample's mean and SD, which is a mistake some students make. Thus your entire sample may fall below the test's mean value, for instance.)
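A minimal sketch of the distinction drawn above between standardizing against a test's normative population and against your own sample. The assay names and the (mean, SD) values are hypothetical placeholders, not real manufacturer figures.

```python
import statistics

# Hypothetical normative parameters (mean, SD) per assay; NOT real manufacturer values.
NORMATIVE = {
    "assay_A": (2.0, 1.0),
    "assay_B": (1.5, 0.5),
}


def normative_z(assay, value, norms=NORMATIVE):
    """z-score against the assay's normative population, not the study sample."""
    mean, sd = norms[assay]
    return (value - mean) / sd


def sample_z(values):
    """For contrast: z-scores against the study sample's own mean and SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


sample_b = [1.0, 1.2, 1.4]   # hypothetical sample measured with assay_B
print([normative_z("assay_B", v) for v in sample_b])   # all below the normative mean
print(sample_z(sample_b))                               # centred on zero by construction
```

Normative z-scores keep the information that this whole sample sits below the reference mean; sample-based z-scores discard it.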

In work I have been involved with, we've used a variety of intelligence and development tests to assess patient populations in this way. Most intellectual/developmental tests have a mean of 100 and an SD of 15, but they each measure different things, so we simply reported these scores in z-units (for example, "group X scored on average 1.2 standard deviations lower on all five developmental tests than group Y"). It sounds like the goal of your study isn't to validate one TSH test against another; rather, you wish to assume that they ARE all equal (normally distributed and easily converted to standard scores, i.e., z-scores). You should probably try to verify the distributions of scores that were used to normalize each test as it was developed, to be sure they all come from a similar distribution (hopefully normal), so that you can use the same z-score formula and thus compare apples to apples.

Pooling the scores might carry some risk because of the above-mentioned inconsistencies between tests (which may boil down to differences between the populations they were normalized on). If one company's test is very different from all the rest (perhaps because they used a diseased population to normalize their test while the other companies used healthy people), this could wash out any differences that are real. But that information might not be available. You might want to report your results the way we did, e.g., "four out of five tests for the female subgroup showed TSH z-scores at least one unit lower than in the male subgroup" or similar language. You should probably also report results in a table that breaks down the results by each company's test. Some lab test results are notoriously unreliable; I don't know about TSH tests, but clinician readers might give more weight to some than to others. They'll want to see which tests might be reporting consistently higher or lower TSH values, even if you also pool the results. You will probably have to state some conclusions and/or limitations regarding the comparability of the five tests, which again isn't your goal per se.

I'm not sure how you're analyzing this, but you might have a cohort effect, a lab effect, and a test-type effect to deal with.

Another thought is that you could correct or adjust one test's norms to match the rest, as in your example of one test's 5th percentile being equal to another test's 10th percentile. I think I've seen some literature on this, but it opens another can of worms and may be tangential to your readers' ultimate interests. Good luck!
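One common way to adjust one test's norms to another's is equipercentile linking: a value on test A is mapped to the value on test B that sits at the same percentile. A minimal sketch, assuming hypothetical anchor values at matching percentiles for two assays, linear interpolation between anchors, and clamping outside the anchor range (a simplifying choice):

```python
from bisect import bisect_right

# Hypothetical anchor values at the same percentiles (5th, 25th, 50th, 75th, 95th)
# for two assays; real anchors would come from a comparison study or the cohorts.
ANCHORS_A = [0.4, 0.9, 1.5, 2.4, 4.0]
ANCHORS_B = [0.3, 0.7, 1.2, 2.0, 3.6]


def link_to_other_assay(value, anchors_from=ANCHORS_A, anchors_to=ANCHORS_B):
    """Map a value from one assay's scale to another's by matching percentiles.

    Between anchors, the mapping is linear; values outside the anchor range
    are clamped to the end anchors.
    """
    if value <= anchors_from[0]:
        return anchors_to[0]
    if value >= anchors_from[-1]:
        return anchors_to[-1]
    i = bisect_right(anchors_from, value) - 1   # anchor interval containing value
    x0, x1 = anchors_from[i], anchors_from[i + 1]
    y0, y1 = anchors_to[i], anchors_to[i + 1]
    return y0 + (value - x0) * (y1 - y0) / (x1 - x0)
```

Because the mapping is piecewise, the conversion factor can differ between low and high concentrations instead of being a single ratio or offset.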

Tim I M Korevaar

Thanks for your response, Jerry. The z-scores seem to work all right so far, but I still feel that the misclassification is biasing the estimates towards the null.

Regarding normalisation by the manufacturers, the thing is that we have much better data than the manufacturers. In addition, the manufacturers do not normalize their measurements against another test, but rather against solutions containing a fixed amount of TSH. As such, it is difficult to normalize based on the validation procedures used by the manufacturers.

Although there are some papers describing the correlation between assays, these do not allow adequate standardisation, because the difference between assays at lower concentrations is likely to differ from that at higher concentrations.

One thing I have been thinking of is removing some of the between-population variation that is caused by differences in TSH determinants. For example, ethnicity is a known TSH determinant, and the populations differ in ethnic composition. As such, standardising for ethnicity within each cohort may remove some of that variation.
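A minimal sketch of standardising for ethnicity within each cohort, assuming hypothetical `(cohort_id, ethnicity, tsh)` records: the mean ln(TSH) of each cohort-by-ethnicity group is subtracted, and the residuals are then scaled to SD scores within each cohort. Note that this removes all ethnicity-related differences in TSH, including any genuine ones, so it is a modelling choice rather than a neutral correction.

```python
import math
import statistics
from collections import defaultdict


def ethnicity_adjusted_scores(records):
    """Within-cohort SD scores of ln(TSH) after removing ethnicity-group means.

    records: list of (cohort_id, ethnicity, tsh) tuples. Each (cohort, ethnicity)
    group's mean ln(TSH) is subtracted, then residuals are divided by their SD
    within each cohort (each cohort needs at least two records).
    """
    logs_by_group = defaultdict(list)
    for cohort, ethnicity, tsh in records:
        logs_by_group[(cohort, ethnicity)].append(math.log(tsh))
    group_mean = {k: statistics.mean(v) for k, v in logs_by_group.items()}

    residuals = [math.log(tsh) - group_mean[(c, e)] for c, e, tsh in records]
    resid_by_cohort = defaultdict(list)
    for (cohort, _, _), r in zip(records, residuals):
        resid_by_cohort[cohort].append(r)
    cohort_sd = {c: statistics.stdev(v) for c, v in resid_by_cohort.items()}

    return [r / cohort_sd[c] for (c, _, _), r in zip(records, residuals)]
```

Small ethnicity groups give noisy group means, so in practice a regression on ethnicity (and other determinants) within each cohort may be a more stable way to obtain the residuals.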

In addition, it may prove valuable to find a way to adjust for differences in skewness and kurtosis between cohorts, for example by taking the largest cohort as a reference. Unfortunately, I have not yet managed to find any such techniques.
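One technique that matches this idea is quantile mapping (quantile normalization) to a reference cohort. Each value's rank within its own cohort is converted to a relative position, and the value is replaced by the reference cohort's empirical quantile at that position. A minimal sketch follows; the function name is hypothetical. It forces the whole distribution, including skewness and kurtosis, to match the reference, so it assumes the true TSH distributions are the same across cohorts (after any adjustment such as for ethnicity). That is essentially assumption (1) expressed on the reference cohort's scale.

```python
from bisect import bisect_left, bisect_right


def map_to_reference(values, reference):
    """Quantile-map one cohort's values onto a reference cohort's distribution.

    Each value's within-cohort rank (ties averaged, 0-based) is converted to the
    same relative position among the reference values, and the reference's
    empirical quantile there is returned, interpolating linearly between order
    statistics. Both inputs need at least two values.
    """
    own = sorted(values)
    ref = sorted(reference)
    n, m = len(own), len(ref)
    mapped = []
    for v in values:
        avg_rank = (bisect_left(own, v) + bisect_right(own, v) - 1) / 2
        pos = avg_rank * (m - 1) / (n - 1)   # same relative position on the reference
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        mapped.append(ref[lo] + frac * (ref[hi] - ref[lo]))
    return mapped


# Hypothetical example: a cohort mapped onto a skewed reference cohort.
print(map_to_reference([10.0, 30.0, 20.0, 40.0], [1.0, 2.0, 3.0, 100.0]))
```

When both cohorts have the same size and no ties, each value simply takes the reference value of the same rank, so the mapped cohort inherits the reference's shape exactly.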

Tim I M Korevaar

Hi Rudolf, thanks for thinking along.

I am afraid I do not understand your concerns. TSH is the exposure variable in our case, so the fact that it has a skewed distribution is very unlikely to lead to skewed residuals (i.e., we can use it to study the effects of interest and still meet the normality assumption).

We do not have an issue with reference ranges, but with the comparability of the measurements. In the Katayev study, the labs standardized beforehand: "Standardization to the same methods, reagent lot numbers, calibrators, controls, and standard operating procedures significantly reduces the between-laboratory variability in test results." What they fixed beforehand is exactly what we need to fix retrospectively with our data, but I am having trouble with the how.
