During library preparation for an NGS experiment, it is customary to ligate adapter sequences to the fragments. Ironically, these additions may contaminate the data headed for computational analysis and can produce misleading results.
In one exceptional paired-end dataset, I found no overrepresented or repeated sequences. Even more puzzling, in a corresponding sample (again paired-end), each read file had overrepresented sequences, and they were attributed to different sources. The scenario is as follows:
One of the read files, I4_R1.fastq, has the following overrepresented sequences:
>>Overrepresented sequences warn
#Sequence Count Percentage Possible Source
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG 36771 0.14170337546219017 TruSeq Adapter, Index 6 (100% over 49bp)
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGC 36534 0.14079005518304247 TruSeq Adapter, Index 6 (100% over 50bp)
>>END_MODULE
The other, I4_R2.fastq, has the following:
>>Overrepresented sequences warn
#Sequence Count Percentage Possible Source
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCC 37938 0.14620061076077806 Illumina Single End PCR Primer 1 (100% over 50bp)
GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCG 35122 0.1353486702287956 Illumina Single End PCR Primer 1 (100% over 50bp)
>>END_MODULE
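One quick check on the two reports: the top sequence in each file starts with the same bases, so it may help to see how far that overlap goes and how many reads carry it. Below is a minimal sketch in Python, assuming plain uncompressed FASTQ with four lines per record. The helper names `common_prefix` and `count_reads_with` are hypothetical and not taken from any library.

```python
from typing import Iterable, Tuple

# Top overrepresented sequence from each report above.
R1_SEQ = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATG"
R2_SEQ = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCC"


def common_prefix(a: str, b: str) -> str:
    """Return the longest prefix shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def count_reads_with(fastq_lines: Iterable[str], motif: str) -> Tuple[int, int]:
    """Return (reads containing motif, total reads).

    Assumes 4-line FASTQ records, so the sequence is the 2nd line of each.
    """
    hits = 0
    total = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # sequence line of the record
            total += 1
            if motif in line:
                hits += 1
    return hits, total


if __name__ == "__main__":
    shared = common_prefix(R1_SEQ, R2_SEQ)
    print(shared, len(shared))  # AGATCGGAAGAGC 13
    # To count reads in a file (path assumed to exist locally):
    #   with open("I4_R1.fastq") as fh:
    #       print(count_reads_with(fh, shared))
```

For these two sequences the shared prefix is AGATCGGAAGAGC (13 bases), after which R1 and R2 diverge. Within each file, the second reported sequence is the first one shifted by a single base.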
I would like to hear opinions on why the two read files report differently sourced sequences.
Thanks.