EDIT 29/10/2018: Added more details to bottom.

This might be a bit weird, but I thought I'd try here anyway.

Is it possible to configure Bowtie2, or a similar program, to match my query sequence only against the end of each sequencing read?





This query should give an exact match score for Read1, no matter what the N's contain, while





Or anything else where the end of the read does not match the query exactly, would result in a mismatch.

I'm currently just using grep to match my reads in this way, but it is terribly slow.
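For comparison, here is a minimal Python sketch of the end-anchored exact matching described above, which avoids one grep pass per query. It assumes the reads and queries are already loaded as plain strings. The function names `build_suffix_index` and `count_end_matches` are made up for illustration and do not come from Bowtie2 or any other tool. Queries are grouped by length, so each read needs only one set lookup per distinct query length.

```python
from collections import Counter, defaultdict


def build_suffix_index(queries):
    """Group the query sequences by length for fast suffix lookups."""
    by_length = defaultdict(set)
    for query in queries:
        if query:  # skip empty strings; read[-0:] would be the whole read
            by_length[len(query)].add(query)
    return by_length


def count_end_matches(reads, queries):
    """Count reads whose 3' end exactly equals one of the queries.

    Everything upstream of the matched suffix (the N/filler part) is ignored.
    """
    index = build_suffix_index(queries)
    counts = Counter()
    for read in reads:
        for length, sequences in index.items():
            if length <= len(read):
                suffix = read[-length:]
                if suffix in sequences:
                    counts[suffix] += 1
    return counts


# Hypothetical toy data, not taken from the real library.
reads = ["NNNNACGT", "NNNNACGA", "TTACGT"]
queries = ["ACGT"]
counts = count_end_matches(reads, queries)
```

This only handles exact matches, so a single mismatch or indel in the matched part still counts as a miss, the same as with grep.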


A bit of background: I'm sequencing an oligo library coding for short peptide sequences. The peptide length varies, so to make them all behave the same way in PCR etc., I have added filler sequence to bring them all to a length of 200 bp. Each peptide-coding sequence starts with a Kozak sequence (GCTAGCCCACC). So each oligo in the library looks like this:



Because of errors in oligo synthesis, PCR, and Illumina sequencing, the reads can contain substitutions and indels. Any mismatches in the N-part (filler) are of no consequence to the peptide being produced, which is why I would like to match the sequencing reads to my library only from the Kozak sequence onward. The filler length varies from 0 to 150 bp, so I can't think of an easy way of trimming it from the reads.
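One way the trimming could work, as a rough sketch only: look for the Kozak sequence in each read and drop everything upstream of it, so that the trimmed reads can then be aligned to the library in the usual way. The code below assumes plain, uncompressed FASTQ with four lines per record and an exact Kozak match. A read with an error inside the Kozak itself is discarded, and a chance occurrence of the Kozak inside the filler would cause the read to be cut at the wrong position. The names `trim_to_kozak` and `trim_fastq` are hypothetical.

```python
KOZAK = "GCTAGCCCACC"  # Kozak sequence as given in the question


def trim_to_kozak(read, kozak=KOZAK):
    """Return the read from the first Kozak occurrence onward, or None if absent."""
    start = read.find(kozak)
    if start == -1:
        return None
    return read[start:]


def trim_fastq(in_path, out_path, kozak=KOZAK):
    """Write a FASTQ with the filler upstream of the Kozak removed.

    Assumes four-line FASTQ records. Reads without an exact Kozak
    match are skipped. Returns the number of reads written.
    """
    kept = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            header = fin.readline()
            if not header:
                break  # end of file
            seq = fin.readline().rstrip("\n")
            fin.readline()  # separator line starting with '+'
            qual = fin.readline().rstrip("\n")
            start = seq.find(kozak)
            if start == -1:
                continue  # no Kozak found, drop this read
            # Trim the sequence and its quality string at the same position.
            fout.write(f"{header.rstrip()}\n{seq[start:]}\n+\n{qual[start:]}\n")
            kept += 1
    return kept
```

After this step every read starts at the Kozak, so the variable-length filler no longer affects the match. Tolerating errors inside the Kozak itself would need an approximate search, which this sketch does not attempt.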
