16S rRNA data preprocessing – can I filter out low abundance ASVs?

Kornelija Rauduvytė @Kornelija_Rauduvyte2

28 July 2025

Hi,

I am analyzing 16S rRNA gene sequencing data from a low biomass sample using QIIME2 and DADA2 for preprocessing.

Our primary research question is: Is a specific bacterial taxon present in a particular sample?

One challenge we are facing is determining whether an ASV observed at low relative abundance is truly present in the sample or merely a contaminant or artifact (e.g., from PCR/sequencing error). Unfortunately, we do not have any positive or negative controls in this dataset to help identify background noise or contaminants.

We are considering filtering out low-abundance taxa using relative abundance thresholds of 0.01%, 0.1%, or 1%, based on what has been done in previous studies.
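To make the candidate thresholds concrete, here is a minimal Python sketch of a per-sample relative abundance filter. It is not a QIIME2 or DADA2 call; the function name `filter_low_abundance`, the ASV labels, and the read counts are all made up for illustration, and the threshold is assumed to apply to each sample's own read total.

```python
def filter_low_abundance(counts, min_fraction):
    """Keep ASVs whose share of the sample's reads is at least min_fraction.

    counts: dict mapping ASV id -> read count for ONE sample (hypothetical data).
    min_fraction: threshold as a fraction, e.g. 0.001 for 0.1%.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {asv: n for asv, n in counts.items() if n / total >= min_fraction}


# Invented read counts for a single sample (10,000 reads in total).
sample = {"ASV_1": 9000, "ASV_2": 900, "ASV_3": 90, "ASV_4": 8, "ASV_5": 2}

# The three thresholds under consideration, written as fractions.
for label, fraction in (("0.01%", 0.0001), ("0.1%", 0.001), ("1%", 0.01)):
    kept = filter_low_abundance(sample, fraction)
    print(f"threshold {label}: {len(kept)} of {len(sample)} ASVs kept")
```

In this toy sample, moving the threshold from 0.01% to 1% changes which ASVs count as "present", which is exactly the decision our presence/absence question depends on.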

My specific questions are:

1. Is it appropriate to filter out low-abundance taxa in this context?

2. How can we determine a reasonable threshold for filtering?

3. How would filtering low-abundance taxa impact alpha and beta diversity metrics?

4. Could this filtering introduce bias, especially given the low-biomass nature of the samples?
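As a rough illustration of the alpha diversity question, the sketch below compares observed richness and the Shannon index before and after dropping rare ASVs from one sample. The counts are invented and the helper names (`observed_richness`, `shannon`, `drop_rare`) are hypothetical; this is plain Python for intuition only, not the QIIME2 diversity pipeline.

```python
import math


def observed_richness(counts):
    """Number of ASVs with at least one read."""
    return sum(1 for n in counts if n > 0)


def shannon(counts):
    """Shannon index (natural log) computed from raw read counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def drop_rare(counts, min_fraction):
    """Remove ASVs below min_fraction of the sample's total reads."""
    total = sum(counts)
    return [n for n in counts if n / total >= min_fraction]


# Invented read counts for one sample; the last two ASVs are rare.
before = [9000, 900, 90, 8, 2]
after = drop_rare(before, 0.001)  # 0.1% threshold

print("richness:", observed_richness(before), "->", observed_richness(after))
print("shannon: ", round(shannon(before), 4), "->", round(shannon(after), 4))
```

In this toy case richness drops sharply while the Shannon index changes only slightly, because richness counts every rare ASV equally whereas Shannon weights taxa by their abundance. Real data may behave differently, so comparing metrics with and without filtering on the actual feature table would be more informative than this sketch.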

Any insights or recommendations would be greatly appreciated.

Guojun Wu

Regarding "artifact (e.g., from PCR/sequencing error)", here is a paper for your reference: "Minimizing spurious features in 16S rRNA gene amplicon sequencing".

Regarding "merely a contaminant", it is better to have negative and positive controls, especially for low-biomass samples.

Abhijeet Singh

This question mixes two different things: data analysis in DADA2 and further downstream analysis.

Filtering low-abundance features is not part of DADA2; it belongs more or less to the visualization stage.

As for the specific questions in points 1-4, these are subjective and cannot, and should not, be generalized. Whether to filter depends on the specific dataset and on the questions behind the project. There are several ways these questions can be handled in practice, but the aim always precedes the practical workflow.