08 June 2018

I have been applying pre-processing steps to metabolomics data prior to statistical analysis (PCA, heat maps and clustering based on correlations). In general:

- I have log-transformed the data to reduce heteroscedasticity

- Then I Pareto-scaled the data to balance high-abundance and low-abundance metabolites
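
For concreteness, the two steps above can be sketched roughly as follows. This is only a minimal illustration in Python with NumPy, not the exact pipeline used: it assumes samples in rows and metabolites in columns, the function names `log_transform` and `pareto_scale` are hypothetical, and the choice of log2 with an offset of 1 to avoid log(0) is an assumption. Pareto scaling is taken here in its usual sense of mean-centering each metabolite and dividing by the square root of its standard deviation.

```python
import numpy as np


def log_transform(X, offset=1.0):
    """Log2-transform intensities; the offset (an assumed value) avoids log(0)."""
    X = np.asarray(X, dtype=float)
    return np.log2(X + offset)


def pareto_scale(X):
    """Mean-center each column, then divide by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    scale = np.sqrt(X.std(axis=0, ddof=1))
    # Constant columns have zero spread; leave them centered at zero instead of dividing by 0.
    scale[scale == 0] = 1.0
    return centered / scale


# Toy intensity matrix (made-up numbers): 3 samples x 3 metabolites.
X = np.array([
    [1000.0, 10.0, 5.0],
    [2000.0, 12.0, 5.0],
    [4000.0, 8.0, 5.0],
])

X_log = log_transform(X)        # step 1: log transformation
X_scaled = pareto_scale(X_log)  # step 2: Pareto scaling
```

After Pareto scaling, each metabolite keeps a spread equal to the square root of its original standard deviation, so high-variance metabolites are down-weighted but not flattened to unit variance as they would be with autoscaling.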

I have been reading papers and talking with colleagues about data pre-processing. Some people argue that scaling alone should be sufficient, while others prefer to transform and then scale before analysis. What are your experiences and suggestions regarding data pre-processing? Of course, the choice also depends on the assumptions of the statistical method, but what is your general view?
