We recently analysed some 16S metagenomic data using QIIME (v1.9.1) with the SILVA database (release 128) for taxonomy assignment. We then fed the results into LEfSe (Segata et al., 2011 - http://doi.org/10.1186/gb-2011-12-6-r60), which we recently installed locally with Docker, to determine which clades were significantly more abundant in one group of samples than in the other.

We understand most of the results, but some are difficult to interpret when we compare different levels of analysis (e.g. L2 with L5 and with L7) (please click on the question title to view the figures). For example, we see a phylum (Proteobacteria in the attached files) that is significantly enriched in treatment S (sugar) at L2, is not reported as significantly enriched in the L5 analysis, and then appears again (in isolation) at L7. Our reading is that no specific family within the Proteobacteria is enriched, but that some species make the difference, hence the Proteobacteria bar appearing again at L7. We hope this is the right interpretation...

But when the opposite happens, we are at a loss for an explanation. For example, the family Lachnospiraceae appears as a significantly enriched clade in treatment "Control" at L7, but is absent from the L5 results of the same analysis. How can this be explained? If the family Lachnospiraceae is found to be significant at the L7 level, why doesn't it appear at the L5 (family) level using the same LDA cutoff?
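To make the comparison concrete, here is a minimal sketch of the check we have in mind: collapsing an L7 table to family level and comparing the totals with the L5 table. It assumes a table held as a dictionary mapping a semicolon-separated lineage string to per-sample counts. The function name `collapse_to_level`, the lineage strings, and the counts are all invented for illustration; this is not our real data and not part of QIIME or LEfSe.

```python
def collapse_to_level(table, level, sep=";"):
    """Sum per-sample counts of all rows that share the first `level` ranks.

    `table` maps a lineage string (ranks joined by `sep`) to a list of
    per-sample counts. This layout is an assumption made for the sketch.
    """
    collapsed = {}
    for lineage, counts in table.items():
        # Keep only the first `level` ranks, e.g. 5 ranks = family.
        key = sep.join(lineage.split(sep)[:level])
        if key in collapsed:
            collapsed[key] = [a + b for a, b in zip(collapsed[key], counts)]
        else:
            collapsed[key] = list(counts)
    return collapsed


# Invented toy L7 table: two samples, three lineages.
FIRMICUTES = "D_0__Bacteria;D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales"
LACHNO = FIRMICUTES + ";D_4__Lachnospiraceae"
ENTERO = ("D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;"
          "D_3__Enterobacteriales;D_4__Enterobacteriaceae")

toy_l7 = {
    LACHNO + ";D_5__Blautia;D_6__uncultured bacterium": [10, 2],
    LACHNO + ";D_5__Roseburia;D_6__uncultured bacterium": [5, 1],
    ENTERO + ";D_5__Escherichia-Shigella;D_6__uncultured bacterium": [1, 8],
}

toy_l5 = collapse_to_level(toy_l7, 5)  # family level
toy_l2 = collapse_to_level(toy_l7, 2)  # phylum level

for lineage, counts in toy_l5.items():
    print(lineage.split(";")[-1], counts)
```

If the family totals obtained this way from the L7 table match the rows of the L5 table, the input abundances are the same at both levels, and the discrepancy would have to come from how the clades are tested or filtered rather than from the counts themselves.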

Can anybody help us explain this?

We would really appreciate it.
