I have been applying pre-processing steps to metabolomics data prior to statistical analysis (PCA, heat maps and clustering based on correlations). In general:
- I have log-transformed the data to reduce heteroscedasticity
- Then I Pareto-scaled the data to balance high-abundance and low-abundance metabolites
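For concreteness, here is a minimal sketch of those two steps as I understand them. It assumes a samples x metabolites NumPy array; the function names, the base-2 log, the pseudo-count `offset`, and the example values are all illustrative assumptions, not my actual pipeline or data. Pareto scaling is taken here to mean mean-centering each metabolite and dividing by the square root of its standard deviation.

```python
import numpy as np

def log_transform(X, offset=1.0):
    # offset is an assumed pseudo-count so zero intensities do not give log(0)
    return np.log2(X + offset)

def pareto_scale(X):
    # Mean-center each column (metabolite), then divide by sqrt(standard deviation)
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd = np.where(sd == 0, 1.0, sd)  # leave constant columns unscaled
    return centered / np.sqrt(sd)

# Made-up intensities: 3 samples (rows) x 2 metabolites (columns)
X = np.array([[100.0, 1.0],
              [400.0, 2.0],
              [1600.0, 4.0]])

X_log = log_transform(X)
X_proc = pareto_scale(X_log)
```

With this definition each metabolite ends up with variance equal to its original standard deviation (on the log scale), so high-variance metabolites are damped but not flattened to unit variance as in autoscaling.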
I have been reading papers and talking with colleagues about data pre-processing, and opinions differ: some argue that scaling alone is sufficient, while others prefer to transform and then scale before analysis. What are your experience and suggestions regarding data pre-processing? Of course, the choice also depends on the assumptions of the statistical method, but what is your general view?