I recently started working with 16S metabarcoding and have just run into a situation that I can't tell is common or not. I mostly work on the bioinformatics side of the project.

We have five sampling times (different years). After processing the fastq files with dada2 (same parameters for every sampling time) and using phyloseq (I decided to use rarefy_even_depth despite the reservations of the phyloseq authors), we found that observed richness varied significantly with year. In the first year, treatments had 1000 ASVs; in the following three years, 2500-3000 ASVs; in the last year, it was back to around 1000 ASVs.
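
To make clear what I mean by rarefying and by observed richness, here is a minimal Python sketch of the idea. It is not phyloseq's implementation: the function names and the toy counts are made up for illustration, and it assumes subsampling without replacement, which may differ from the settings used in a given rarefy_even_depth call.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a dict of ASV -> read count down to `depth` reads.

    Reads are drawn without replacement (an assumption of this sketch).
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one label per read, then draw `depth` reads.
    pool = [asv for asv, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = rng.sample(pool, depth)
    rarefied = {}
    for asv in drawn:
        rarefied[asv] = rarefied.get(asv, 0) + 1
    return rarefied


def observed_richness(counts):
    """Number of ASVs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical toy sample, not real data.
sample = {"ASV1": 500, "ASV2": 300, "ASV3": 5, "ASV4": 1}
rarefied = rarefy(sample, depth=100)
print(observed_richness(sample), observed_richness(rarefied))
```

Rare ASVs can drop out when a sample is subsampled, so the richness after rarefying can only stay the same or decrease relative to the full sample.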

Other factors involved:

a) The five sampling times were sequenced in different runs and on different machines (same model, but different machines). That doesn't explain why three were in the same ballpark while the other two were also similar to each other.

b) The fastq files for the last sampling time were provided with the primers still attached, so I used cutadapt to remove them. That doesn't explain why the first sampling time (provided without primers) and the last sampling time were similar.

I wonder whether these results are legitimate and the variation is due to edaphoclimatic conditions and the crop cultivated at each sampling time.

P.S.: As a side note, when plotting a PCoA with Bray-Curtis dissimilarity, all sampling times are clearly differentiated.
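
For reference, this is the dissimilarity I mean, written as a small Python sketch. The function name and the toy counts are hypothetical, and it only shows the pairwise formula on raw counts, not the ordination step or how phyloseq computes it internally.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two dicts of ASV -> count.

    Computed as sum(|a_i - b_i|) / sum(a_i + b_i) over all ASVs
    present in either sample.
    """
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Hypothetical toy samples, not real data.
x = {"ASV1": 6, "ASV2": 4}
y = {"ASV1": 2, "ASV2": 8}
print(bray_curtis(x, y))
```

The value is 0 for identical samples and 1 for samples that share no ASVs, so samples from different years separating on the PCoA means their ASV abundances differ, not only their richness.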
