I have a dataset generated via Ultrahigh Performance Liquid Chromatography-Tandem Mass Spectrometry (UPLC-MS/MS).
It is a table of integer readings, 700 samples by 1300 metabolites.
Histograms of the readings look like count data.
However, when I plot mean against variance for the 1300 metabolites (so a 1300-point scatterplot), I find that the mean does not equal the variance, as it would for Poisson-distributed counts.
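In fact, mean = standard deviation is a good fit,
which is consistent with (though it does not prove) each metabolite following a gamma distribution with a common shape parameter, i.e. a constant coefficient of variation of about 1.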
That's as far as I've got.
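For concreteness, the mean/variance check described above can be sketched roughly as follows in Python (standard library only). The table here is simulated as a stand-in for the real data: drawing each metabolite from a gamma distribution with shape 1 is an assumption made purely for illustration, and `per_metabolite_summary` is a hypothetical helper name, not a function from any package.

```python
import math
import random
import statistics


def per_metabolite_summary(table):
    """Summarise each column (metabolite) of a samples-by-metabolites table.

    `table` is a list of rows, one row per sample, each row a list of readings.
    Returns one dict per metabolite with mean, variance, sd and cv (sd / mean).
    """
    summaries = []
    for column in zip(*table):  # transpose: iterate over metabolites
        mean = statistics.fmean(column)
        var = statistics.variance(column)  # sample variance (n - 1 denominator)
        sd = math.sqrt(var)
        cv = sd / mean if mean > 0 else float("nan")
        summaries.append({"mean": mean, "var": var, "sd": sd, "cv": cv})
    return summaries


# Simulated stand-in data (ASSUMPTION: gamma with shape 1, scales spread
# over three orders of magnitude, rounded to integers like the real table).
rng = random.Random(0)
n_samples, n_metabolites = 700, 50
scales = [10 ** rng.uniform(2, 5) for _ in range(n_metabolites)]
table = [[round(rng.gammavariate(1.0, s)) for s in scales]
         for _ in range(n_samples)]

summary = per_metabolite_summary(table)

# Poisson counts would give var close to mean; count how often var > mean.
frac_overdispersed = sum(s["var"] > s["mean"] for s in summary) / len(summary)

# If sd is proportional to mean, the cv is roughly constant across metabolites.
median_cv = statistics.median([s["cv"] for s in summary])

# Least-squares slope of log10(sd) on log10(mean); a slope near 1 means
# sd scales linearly with the mean.
log_mean = [math.log10(s["mean"]) for s in summary]
log_sd = [math.log10(s["sd"]) for s in summary]
mx, my = statistics.fmean(log_mean), statistics.fmean(log_sd)
slope = (sum((x - mx) * (y - my) for x, y in zip(log_mean, log_sd))
         / sum((x - mx) ** 2 for x in log_mean))

print(f"fraction with var > mean: {frac_overdispersed:.2f}")
print(f"median cv: {median_cv:.2f}")
print(f"log-log slope of sd on mean: {slope:.2f}")
```

A log-log slope near 1 with a roughly constant coefficient of variation is what a gamma distribution with a fixed shape would produce, but other distributions with a constant coefficient of variation (a log-normal, for example) would give the same picture, so this check alone cannot tell them apart.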
Does anyone know of any good reviews explaining UPLC-MS/MS data? How to QC it? R libraries?
Ultimately, how can I use the metabolite readings as predictors of a phenotype?
Any help would be appreciated as I'm a total beginner with this type of data.