I recently sequenced a number of PCR amplicons on an Illumina MiSeq, from a sample that is a compound heterozygote for two SNPs. The two SNPs are very close together, so in theory any read that covers both positions should carry one of the variants, but never both. However, when I count them manually, only around 80% of reads follow this pattern, while the remaining 20% appear to carry both variants, or neither.
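To make the manual count reproducible, something along these lines could be used. This is only a minimal sketch: the function names (`base_at`, `classify_reads`), the SNP coordinates, and the input format (tuples of 0-based alignment start, CIGAR string, and read sequence taken from the BAM records) are all assumptions for illustration, not part of my actual pipeline.

```python
import re
from collections import Counter

# Matches each CIGAR operation, e.g. "5S35M" -> [("5", "S"), ("35", "M")]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def base_at(ref_pos, read_start, cigar, seq):
    """Return the read base aligned to 0-based reference position ref_pos.

    Returns None if the read does not cover the position or has a
    deletion/skip there. read_start is 0-based (SAM POS minus 1).
    """
    ref = read_start   # current reference coordinate
    query = 0          # current offset into seq
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":            # consumes reference and read
            if ref <= ref_pos < ref + length:
                return seq[query + (ref_pos - ref)]
            ref += length
            query += length
        elif op in "IS":           # consumes read only
            query += length
        elif op in "DN":           # consumes reference only
            if ref <= ref_pos < ref + length:
                return None
            ref += length
        # H and P consume neither reference nor read
    return None

def classify_reads(reads, snp1, snp2):
    """Count reads by which variant alleles they carry.

    Each SNP is a hypothetical (position, ref_base, alt_base) tuple.
    Reads that do not cover both positions are skipped.
    """
    counts = Counter()
    for start, cigar, seq in reads:
        b1 = base_at(snp1[0], start, cigar, seq)
        b2 = base_at(snp2[0], start, cigar, seq)
        if b1 is None or b2 is None:
            continue
        if b1 not in snp1[1:] or b2 not in snp2[1:]:
            counts["other_base"] += 1   # neither ref nor alt, likely an error
        elif b1 == snp1[2] and b2 == snp2[2]:
            counts["both"] += 1
        elif b1 == snp1[2]:
            counts["snp1_only"] += 1
        elif b2 == snp2[2]:
            counts["snp2_only"] += 1
        else:
            counts["neither"] += 1
    return counts
```

For a compound heterozygote, `snp1_only` and `snp2_only` should account for nearly all reads; `both` and `neither` correspond to the unexpected 20%.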

The read length was set to 150 bp, but quite a few of the reads in the unexpected 20% group are much shorter than that, some as short as 40 bp when the BAM files are viewed in IGV. I have toggled the display of soft-clipped bases on, but there don't appear to be many. Why would this be? And would it affect the reliability of any SNP calls on those reads?
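One way to check the read lengths outside IGV would be to summarise each read's CIGAR string. The sketch below is an assumption about how that could be done (the function name `cigar_summary` and the returned keys are hypothetical). It separates soft-clipped bases (`S`, still present in the read sequence) from hard-clipped bases (`H`, which the SAM format removes from the stored sequence, so they would not show up as soft clips).

```python
import re

# Matches each CIGAR operation, e.g. "10H40M" -> [("10", "H"), ("40", "M")]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_summary(cigar):
    """Summarise a CIGAR string.

    stored_len   : bases present in the BAM sequence field (M, I, S, =, X)
    soft_clipped : bases soft-clipped (S)
    hard_clipped : bases hard-clipped (H), absent from the stored sequence
    ref_span     : reference bases covered by the alignment (M, D, N, =, X)
    """
    summary = {"stored_len": 0, "soft_clipped": 0,
               "hard_clipped": 0, "ref_span": 0}
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "MIS=X":
            summary["stored_len"] += length
        if op in "MDN=X":
            summary["ref_span"] += length
        if op == "S":
            summary["soft_clipped"] += length
        elif op == "H":
            summary["hard_clipped"] += length
    return summary

# Hypothetical example: a read stored as 40 bases with no clipping at all
print(cigar_summary("40M"))
```

If the short reads show neither soft nor hard clipping, the bases were presumably removed before alignment (for example by adapter or quality trimming) rather than by the aligner.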
