We are currently aiming to sequence human DNA at 90x coverage on a NovaSeq 6000 system (TruSeq DNA PCR-Free Library Prep). The run parameters all look fine, and at around 1.2 Tb the output should be sufficiently large (3 samples per flow cell). However, when we take a closer look at the coverage of our samples, we see that some regions of the genome are covered at up to 300-400x, whereas a broad range of regions is covered at less than 40x (graphics attached). At the same time, the percentage of duplicates is very small (~6%). Where could this selective amplification come from?
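For reference, under the idealized Lander-Waterman model, reads land uniformly at random, so per-base depth is approximately Poisson-distributed around the mean. The minimal sketch below shows how rare depths under 40x or at/above 300x would be at a 90x mean. The 40x and 300x cut-offs come from the observation above; the function names are illustrative only.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), with each term computed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    return sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k + 1)
    )


def expected_fraction_outside(mean_depth: float, low: int, high: int) -> tuple[float, float]:
    """Expected fraction of positions with depth < low and with depth >= high,
    assuming ideal Poisson sampling at the given mean depth."""
    below = poisson_cdf(low - 1, mean_depth)
    # Clamp at 0: the CDF can sum to slightly above 1.0 through floating-point rounding.
    above = max(0.0, 1.0 - poisson_cdf(high - 1, mean_depth))
    return below, above


if __name__ == "__main__":
    below, above = expected_fraction_outside(90.0, 40, 300)
    print(f"expected fraction < 40x: {below:.2e}, >= 300x: {above:.2e}")
```

At a 90x mean, both fractions come out far below one in a million. Real libraries are overdispersed relative to Poisson, so this is only a lower bound on unevenness. Still, a broad range of regions below 40x lies far outside it.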
Some details and stats for the alignment:
- Aligner: Isaac (Illumina, iSAAC-03.16.02.19)
- Reference genome: Homo sapiens (Ensembl GRCh37)
- Total aligned reads: ~2,000,000,000 (~93% of all reads)
- Fragment length: ~394 bp
- Percent duplicate proper read pairs: ~6%
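As a sanity check that the mean depth really is near the 90x target (so the problem is the distribution of reads rather than the total yield), the numbers above can be converted into expected mean coverage. This is a rough sketch under assumptions not stated in the post: 2x150 bp reads, the ~2 billion figure counting individual reads rather than pairs, and a genome size of ~3.1 Gb.

```python
GENOME_SIZE = 3.1e9  # assumed approximate GRCh37 length, including N gaps
READ_LENGTH = 150    # assumed 2x150 bp run; the read length is not stated above


def mean_coverage(total_bases: float, genome_size: float = GENOME_SIZE) -> float:
    """Lander-Waterman mean depth: total sequenced (or aligned) bases / genome size."""
    return total_bases / genome_size


# Yield budget: ~1.2 Tb per flow cell shared by 3 samples.
budget_depth = mean_coverage(1.2e12 / 3)

# Aligned depth from ~2e9 aligned reads of READ_LENGTH bases each.
aligned_depth = mean_coverage(2.0e9 * READ_LENGTH)

# Remove the ~6% duplicates to approximate the usable depth.
dedup_depth = aligned_depth * (1 - 0.06)

if __name__ == "__main__":
    print(f"budget: {budget_depth:.1f}x, aligned: {aligned_depth:.1f}x, deduplicated: {dedup_depth:.1f}x")
```

Under these assumptions, the deduplicated mean comes out at roughly 91x, in line with the 90x target. Using the ~2.9 Gb non-gap length instead would raise all three values by about 7%.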
Many thanks in advance!