Hi,

I am analyzing 16S rRNA gene sequencing data from a low-biomass sample, using QIIME2 with DADA2 for preprocessing.

Our primary research question is: Is a specific bacterial taxon present in a particular sample?

One challenge we are facing is determining whether an ASV observed at low relative abundance is truly present in the sample or merely a contaminant or artifact (e.g., from PCR/sequencing error). Unfortunately, we do not have any positive or negative controls in this dataset to help identify background noise or contaminants.

We are considering filtering out low-abundance taxa using a relative abundance threshold of 0.01%, 0.1%, or 1%, which are the cutoffs used in previous studies.
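
To make the filtering step concrete, here is a minimal Python sketch of what we have in mind. The counts are made up for illustration (not our data), and the helper name `filter_by_relative_abundance` is hypothetical, not part of QIIME2 or DADA2. It assumes the threshold is applied within each sample.

```python
# Hypothetical ASV read counts for one sample (made-up numbers, not our data).
counts = {"ASV_1": 9000, "ASV_2": 950, "ASV_3": 45, "ASV_4": 5}


def filter_by_relative_abundance(counts, threshold):
    """Keep ASVs whose within-sample relative abundance is >= threshold.

    `threshold` is a fraction, so 0.001 means 0.1%.
    """
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no relative abundances to compare.
        return {}
    return {asv: n for asv, n in counts.items() if n / total >= threshold}


# The three cutoffs we are considering: 0.01%, 0.1%, and 1%.
for threshold in (0.0001, 0.001, 0.01):
    kept = filter_by_relative_abundance(counts, threshold)
    print(f"{threshold:.2%}: kept {sorted(kept)}")
```

With these toy counts, ASV_4 (0.05% of reads) is removed at the 0.1% cutoff, and ASV_3 (0.45%) is also removed at the 1% cutoff, so whether a low-abundance taxon counts as "present" depends directly on the cutoff we pick.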

My specific questions are:

  • Is it appropriate to filter out low-abundance taxa in this context?
  • How can we determine a reasonable threshold for filtering?
  • How would filtering low-abundance taxa affect alpha and beta diversity metrics?
  • Could this filtering introduce bias, especially given the low-biomass nature of the samples?

Any insights or recommendations would be greatly appreciated.
