EDIT 29/10/2018: Added more details to bottom.

This might be a bit weird, but I thought I'd try here anyway.

Is it possible to configure Bowtie2, or similar programs, to match my query-sequence only from the end of the sequencing reads?

Query1:

AAATTTGGG

Read1:

NNNNNNNNNNNNAAATTTGGG

This query should give an exact match score for Read1, no matter what the N's contain, while

Query1:

AAATTTGGG

Read2:

NNNNNNNNNNNNAAATTTGGGNNN

or anything else where the end of the read does not match the query exactly, should result in a mismatch.

I'm currently just using grep to match my reads in this way, but it is terribly slow.
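To make the matching rule concrete, here is a minimal Python sketch of what I mean by "match only from the end". It is not how Bowtie2 works and I have not benchmarked it; the function names are made up for illustration, and it assumes exact matches only, with reads and queries held in memory as plain strings.

```python
def matches_at_end(read, query):
    """True only if the read ends with the query exactly."""
    return read.endswith(query)


def count_suffix_hits(reads, queries):
    """Count, for each query, how many reads end with it.

    Instead of testing every query against every read, take the read's
    suffix at each distinct query length and look it up in a set.
    """
    query_set = {q for q in queries if q}          # ignore empty queries
    lengths = sorted({len(q) for q in query_set})  # distinct query lengths
    counts = {q: 0 for q in query_set}
    for read in reads:
        for n in lengths:
            if n > len(read):
                break  # lengths are sorted, so longer ones cannot fit either
            suffix = read[-n:]
            if suffix in query_set:
                counts[suffix] += 1
    return counts


reads = ["CCCCCCCCCCCCAAATTTGGG", "CCCCCCCCCCCCAAATTTGGGCCC"]
hits = count_suffix_hits(reads, ["AAATTTGGG"])
```

Here the first read plays the role of Read1 and is counted, while the second plays the role of Read2 and is not, because its last bases are not the query.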

EDIT:

A bit of background. I'm sequencing an oligo library coding for short peptide sequences. The peptide length varies, so to make them all behave the same way in PCR etc., I have added filler sequence bringing them all to a length of 200 bp. Each peptide coding sequence starts with a Kozak sequence (GCTAGCCCACC). So each oligo in the library looks like this:

NNNN...NNNN-GCTAGCCCACCATGACCACAGGAGACACCTAGCT

1bp----Filler---------Kozak-----------------peptide-coding-----------200bp

Because of errors in oligo synthesis, PCR, and Illumina sequencing, there can be SNPs/indels in these sequences in the sequencing reads. Any mismatches in the N-part (filler) are of no consequence to the peptide being produced, which is why I would like to match the sequencing reads to my library only starting from the Kozak sequence. The filler length varies from 0-150 bp, so I can't think of an easy way of trimming these from the reads.
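The closest thing to trimming I can think of is cutting each read at the Kozak sequence, roughly as in the Python sketch below. This is only an illustration: trim_filler is a made-up name, it assumes the Kozak sequence is intact and occurs only once in the read, and it would miss any read with an error inside the Kozak itself.

```python
KOZAK = "GCTAGCCCACC"  # Kozak sequence from the library design above


def trim_filler(read, anchor=KOZAK):
    """Return the read from the first exact occurrence of the anchor onward.

    Returns None when the anchor is not found, e.g. when a sequencing
    error falls inside the Kozak sequence (assumption: exact match only).
    """
    idx = read.find(anchor)
    if idx == -1:
        return None
    return read[idx:]


# Hypothetical read: 8 bp of filler followed by the example oligo above.
read = "ACGTACGT" + "GCTAGCCCACCATGACCACAGGAGACACCTAGCT"
trimmed = trim_filler(read)
```

The trimmed reads would then start at the Kozak sequence regardless of filler length, so they could be compared against the Kozak-plus-peptide part of each library member.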
