My design is a repeated-measures (within-subject) design: the same subjects (name) are measured in every group (group). The observations are therefore dependent, which is why an ordinary one-way ANOVA cannot be used. The alternative is a one-way repeated-measures ANOVA (RANOVA), which accounts for this dependence in its variance calculations. However, it brings two further assumptions: normality and sphericity.

Some metabolites are not normally distributed, so RANOVA cannot be applied to them. The sphericity assumption must also be satisfied. The method I used to compute RANOVA automatically tests this assumption and applies a correction if it is violated. It also returns the generalized effect size, which adds to the interpretation of the fit.

But how can the non-normal metabolites be handled? We can certainly apply RANOVA to the normally distributed ones, and for the others we could use the Friedman test, the nonparametric equivalent of RANOVA. However, it is not that robust. Instead, it would be interesting to use mixed models.
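For reference, the Friedman statistic mentioned above can be sketched in a few lines. This is only an illustration in Python using the standard library, assuming one metabolite arranged as a subjects x groups table; the function names `friedman_test` and `chi2_sf` and the demo values are hypothetical, and no tie correction is applied, so results may differ slightly from packaged implementations when a subject has tied values.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (integer df, closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range((df - 1) // 2):
        total += term
        term *= x / (2 * i + 3)
    return total

def friedman_test(table):
    """Friedman statistic and p-value; rows are subjects, columns are groups."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        # Rank the values within each subject, averaging ranks over ties.
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2.0 + 1.0
            for m in range(i, j + 1):
                rank_sums[order[m]] += avg_rank
            i = j + 1
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1)

# Hypothetical values for one metabolite: 3 subjects measured in 3 groups.
demo = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [1.0, 5.0, 9.0]]
stat, p_value = friedman_test(demo)
print(stat, p_value)
```

The test would be run once per non-normal metabolite, so the resulting p-values would still need a correction for multiple testing across metabolites.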

Since the observations are dependent, mixed models seem, at first sight, to be a good fit.

The metabolomics data can be transformed into long format:

   name  group  metabolite               value
1  S1    1      1-Methylnicotinamide     0.0190
2  S1    1      2-Aminoadipate           0.410
3  S1    1      2-Hydroxyisobutyrate     0.114
4  S1    1      2-Oxoglutarate           0.266
5  S1    1      3-Hydroxybutyrate        0.0579
6  S1    1      3-Hydroxyisobutyrate     0.303
7  S1    1      3-Hydroxyisovalerate     0.106
8  S1    1      3-Hydroxyphenylacetate   0.0445
9  S1    1      3-Indoxylsulfate         0.615
10 S1    1      4-Hydroxyphenylacetate   0.256
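The reshaping into long format can be sketched as follows. This is a minimal illustration in Python (standard library only), assuming the wide table has one row per subject and group and one column per metabolite; the function name `wide_to_long` is hypothetical, and the values are the first three rows shown above.

```python
def wide_to_long(rows, id_cols=("name", "group")):
    """Turn one-column-per-metabolite rows into one row per measurement."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_cols}
        for col, value in row.items():
            if col in id_cols:
                continue  # identifier columns are repeated, not melted
            long_rows.append({**ids, "metabolite": col, "value": value})
    return long_rows

# Wide layout assumed for illustration: subject S1 in group 1.
wide = [
    {
        "name": "S1",
        "group": 1,
        "1-Methylnicotinamide": 0.0190,
        "2-Aminoadipate": 0.410,
        "2-Hydroxyisobutyrate": 0.114,
    }
]

for record in wide_to_long(wide):
    print(record)
```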

We can arrange them in a hierarchy:

group
  name
    metabolite

From what I have studied, to test whether the metabolites change significantly over time (across the groups), we first fit a baseline model and a main model, and then check whether the main model significantly improves the fit. How is this done in practice?

In the first (baseline) model, the only predictor of the metabolite is the overall average. Imagine a regression with nothing but an intercept. We expect the fit to be poor, right? A flat line (without slope) cannot predict much. In the second (main) model, the regression also includes group as a predictor. If this model fits the metabolite better than the baseline, then we have evidence that the variable group has a significant effect.
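One common way to write this comparison down, for a single metabolite, is given below. It is a sketch under assumptions that are not stated in the question: a random intercept $u_i$ for each subject $i$, $k$ groups indexed by $j$ with the first group as the reference ($\beta_1 = 0$), and both models fitted by maximum likelihood (not REML) so that their log-likelihoods $\ell_0$ and $\ell_1$ are comparable.

```latex
\begin{align}
\text{baseline:}\quad y_{ij} &= \beta_0 + u_i + \varepsilon_{ij} \\
\text{main:}\quad y_{ij} &= \beta_0 + \beta_j + u_i + \varepsilon_{ij} \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2) \\
\Lambda &= -2\,(\ell_0 - \ell_1) \;\overset{\text{approx.}}{\sim}\; \chi^2_{k-1}
\end{align}
```

A large $\Lambda$ relative to the $\chi^2_{k-1}$ distribution would indicate that adding group improves the fit. The chi-square reference distribution is asymptotic, so it should be read with caution when the number of subjects is small.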

The model comparison is then followed by multiple comparisons. Which method is the most robust in this case?

In the hierarchy above, metabolites vary across individuals and across groups, but I'm not sure that this is the correct model. An alternative would be: metabolites vary within each individual, and individuals vary within each group. That makes more sense, right?

Which treatment structure is the most suitable for this case? Is there any script available for R that can be used?
