I have a dataset generated via Ultrahigh Performance Liquid Chromatography-Tandem Mass Spectrometry (UPLC-MS/MS).

It is a table of integer readings for 700 samples by 1300 metabolites.

It looks like count data (when plotting histograms).

However, when I plot mean against variance for the 1300 metabolites (so a 1300-point scatterplot), I find that the mean and variance are not equal, as they would be for Poisson counts.

In fact, mean = standard deviation is a good fit, which suggests that each metabolite might follow a gamma distribution.
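
For anyone wanting to reproduce the mean vs. standard deviation check, here is a minimal sketch in Python with NumPy. The function name `mean_sd_relationship` and the synthetic table are hypothetical stand-ins, not my real data. It fits a straight line to log10(sd) against log10(mean) across metabolites; a slope near 1 with a median sd/mean ratio near 1 corresponds to mean = standard deviation. Note that a gamma distribution with mean = sd has shape 1 (i.e. it is exponential), and a log-normal would also give a constant sd/mean ratio, so this check alone does not pin down the distribution.

```python
import numpy as np

def mean_sd_relationship(readings):
    """Fit log10(sd) = slope * log10(mean) + intercept across columns.

    `readings` is a samples x metabolites table (rows = samples).
    Returns (slope, intercept, median sd/mean ratio).
    """
    readings = np.asarray(readings, dtype=float)
    means = readings.mean(axis=0)
    sds = readings.std(axis=0, ddof=1)
    keep = (means > 0) & (sds > 0)  # logarithms need positive values
    slope, intercept = np.polyfit(np.log10(means[keep]), np.log10(sds[keep]), 1)
    median_cv = float(np.median(sds[keep] / means[keep]))
    return float(slope), float(intercept), median_cv

# Synthetic stand-in for the real table (assumption: shape-1 gamma readings,
# with per-metabolite scales spread over several orders of magnitude).
rng = np.random.default_rng(0)
scales = 10 ** rng.uniform(2, 6, size=1300)
fake = np.rint(rng.gamma(shape=1.0, scale=scales, size=(700, 1300)))

slope, intercept, median_cv = mean_sd_relationship(fake)
print(f"slope={slope:.3f} intercept={intercept:.3f} median sd/mean={median_cv:.3f}")
```

On real data, a slope close to 0.5 would instead point towards Poisson-like behaviour (variance proportional to mean).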

That's as far as I've got.

Does anyone know of any good reviews explaining UPLC-MS/MS data? How to QC it? R libraries?

Ultimately, how can I use the metabolites as predictors of a phenotype?

Any help would be appreciated as I'm a total beginner with this type of data.
