Hello everyone,

I have recently started working with NGS data (amplicon sequencing) for phylogenetic and species-delimitation studies of closely related specimens (typically within a genus or a family). I obtained short sequences (ca. 100 bp) from multiple nuclear protein-coding loci, and I use a custom script that selects the most abundant amplicon per specimen and locus for downstream analyses. For a portion of the loci, there is evidence of paralog amplification, e.g. one or a few sequences that differ strongly from the others. There are also cases of completely different sequences within a locus (i.e. sequences that don't align with the others), which should be removed. Could you recommend any automated method or pipeline that is suitable for amplicon sequencing and allows these unwanted sequences to be detected and removed objectively?
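
To make concrete what I mean by "objective removal", here is a minimal sketch of the kind of filter I have in mind. It is not my actual script and not an established method: it assumes the sequences of one locus are already aligned to equal length, and the function names, the toy sequences and the 0.15 threshold are all hypothetical. It flags any sequence whose median uncorrected distance to the other sequences of the locus exceeds the threshold.

```python
from statistics import median


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or 'N' are ignored.
    """
    pairs = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x not in "-N" and y not in "-N"
    ]
    if not pairs:
        # No comparable sites: treat as maximally distant (an assumption).
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def flag_divergent(seqs, threshold=0.15):
    """Return names whose median distance to all other sequences exceeds threshold.

    `seqs` maps a specimen name to its aligned sequence for one locus.
    The default threshold is an arbitrary illustration, not a recommendation.
    """
    flagged = []
    for name, seq in seqs.items():
        dists = [p_distance(seq, other) for n, other in seqs.items() if n != name]
        if dists and median(dists) > threshold:
            flagged.append(name)
    return flagged


# Toy alignment for one locus (made-up sequences, 20 bp each).
locus = {
    "sp1": "ATGGCTAAAGGTCTGACCGA",
    "sp2": "ATGGCTAAAGGTCTGACCGA",
    "sp3": "ATGGCTAAGGGTCTGACCGA",
    "sp4": "ATGGCAAAAGGTCTGACCGA",
    "sp5": "TTCGATCCAGCTAAGTCAGT",  # unrelated sequence that should be flagged
}

print(flag_divergent(locus))
```

A fixed threshold like this is exactly the subjective part I would like to avoid, and it would presumably miss paralogs that are only moderately divergent, hence my question about more principled tools.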

Thank you for your help!
