24 January 2020

Recently, I have sequenced many libraries on a NovaSeq. I am aware that dark cycles produce long polyG reads because of the NovaSeq's two-color chemistry. However, I am also seeing a fairly consistent overrepresentation of CA repeats and GT repeats, which make up between 0.2% and 0.6% of my total reads. These libraries were made using oligo(dT) priming, so I would not expect many short tandem repeats to be represented in them. Are other people seeing this, and is it an artifact of the technology or possible contamination in the data?

The quality of my reads is good, and I'm not too worried about whether these need to be trimmed out, because STAR usually does a great job mapping regardless of trimming. I'm mostly curious about why I am seeing this occur regularly.
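
For anyone who wants to check their own runs for the same CA/GT signal, here is a minimal sketch of one way to estimate the fraction of repeat-dominated reads in a FASTQ file. It is not the method behind the 0.2-0.6% figure above. The 0.8 coverage threshold, the function names, and the four-lines-per-record FASTQ assumption are all illustrative choices, not anything standard.

```python
import gzip
import re

# Assumed cutoff (illustrative): a read counts as repeat-dominated when one
# uninterrupted run of the dinucleotide covers at least this share of it.
MIN_FRACTION = 0.8

# (CA)n and (AC)n are the same repeat in a different phase, as are (GT)n and
# (TG)n, so one pattern per repeat is enough for long runs.
REPEAT_PATTERNS = {
    "CA": re.compile(r"(?:CA)+"),
    "GT": re.compile(r"(?:GT)+"),
}


def longest_run(seq, pattern):
    """Length in bases of the longest uninterrupted match of pattern in seq."""
    return max((m.end() - m.start() for m in pattern.finditer(seq)), default=0)


def classify_read(seq, min_fraction=MIN_FRACTION):
    """Return 'CA', 'GT', or None depending on which repeat dominates the read."""
    seq = seq.upper()
    if not seq:
        return None
    for name, pattern in REPEAT_PATTERNS.items():
        if longest_run(seq, pattern) / len(seq) >= min_fraction:
            return name
    return None


def repeat_fractions(seqs):
    """Fraction of reads classified as CA- or GT-dominated."""
    counts = {name: 0 for name in REPEAT_PATTERNS}
    total = 0
    for seq in seqs:
        total += 1
        label = classify_read(seq)
        if label is not None:
            counts[label] += 1
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


def read_fastq_sequences(path):
    """Yield sequence lines from a FASTQ file, assuming 4 lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


# Usage, with a hypothetical file name:
# print(repeat_fractions(read_fastq_sequences("sample_R1.fastq.gz")))
```

Running this separately on R1 and R2 would show whether CA and GT reads are simply reverse complements of each other across mates, which might help narrow down where they come from.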
