How can I match my query-sequence to only the ends of reads with Bowtie2 (or similar)?

25 October 2018

EDIT 29/10/2018: Added more details to bottom.

This might be a bit weird, but I thought I'd try here anyway.

Is it possible to configure Bowtie2, or similar programs, to match my query-sequence only from the end of the sequencing reads?

Query1:

AAATTTGGG

Read1:

NNNNNNNNNNNNAAATTTGGG

This query should give an exact-match score for Read1, no matter what the Ns contain, while

Query1:

AAATTTGGG

Read2:

NNNNNNNNNNNNAAATTTGGGNNN

or any other read whose end does not match the query exactly, should result in a mismatch.

I'm currently just using grep to match my reads in this way, but it is terribly slow.
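For reference, the end-anchored exact matching described above can be sketched in plain Python as a set lookup on read suffixes, which avoids scanning every read once per query. This is only a minimal sketch under some assumptions: the function name `count_suffix_matches` is made up for illustration, reads and queries are plain uppercase strings already loaded into memory, and only exact matches at the very end of the read count.

```python
from collections import Counter


def count_suffix_matches(reads, queries):
    """Count reads whose end exactly equals one of the queries.

    Hypothetical helper for illustration: each read is checked against
    the set of queries by slicing off its last n bases, for every
    distinct query length n (longest first).
    """
    query_set = {q for q in queries if q}  # ignore empty queries
    lengths = sorted({len(q) for q in query_set}, reverse=True)
    counts = Counter()
    for read in reads:
        for n in lengths:
            if n > len(read):
                continue
            suffix = read[-n:]
            if suffix in query_set:
                counts[suffix] += 1
                break  # count each read once, for its longest matching query
    return counts


reads = [
    "CCGTACCGTACGAAATTTGGG",     # ends with the query: counted
    "CCGTACCGTACGAAATTTGGGACT",  # query present but not at the end: ignored
]
result = count_suffix_matches(reads, ["AAATTTGGG"])
```

Because each read costs one hash lookup per distinct query length, this should scale much better than one pattern search per query, but it does not tolerate any mismatch or indel in the matched part.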

EDIT:

A bit of background: I'm sequencing an oligo library coding for short peptide sequences. The peptide length varies, so to make them all behave the same way in PCR etc., I have added a filler sequence that brings them all to a length of 200 bp. Each peptide-coding sequence starts with a Kozak sequence (GCTAGCCCACC). So each oligo in the library looks like this:

NNNN...NNNN-GCTAGCCCACCATGACCACAGGAGACACCTAGCT

1bp----Filler---------Kozak-----------------peptide-coding-----------200bp

Because of errors in oligo synthesis, PCR, and Illumina sequencing, the sequencing reads can contain SNPs/indels in these sequences. Any mismatches in the N part (filler) are of no consequence to the peptide being produced, which is why I would like to match the sequencing reads to my library only starting from the Kozak sequence. The filler length varies from 0 to 150 bp, so I can't think of an easy way of trimming the filler from the reads.
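One possible way to remove the variable-length filler is to use the Kozak sequence itself as the landmark and keep everything from it onwards. The sketch below is a rough illustration only, with several assumptions: the function name `trim_to_kozak` is hypothetical, reads are in the same orientation as the oligo, the Kozak sequence is found by exact string search (so a read with a sequencing error inside the Kozak is left untrimmed and returned as `None`), and the first occurrence is taken to be the real one, which presumes the filler never contains the Kozak sequence by chance.

```python
KOZAK = "GCTAGCCCACC"  # Kozak sequence given in the question


def trim_to_kozak(read, kozak=KOZAK):
    """Return the read from the first exact Kozak occurrence onwards.

    Hypothetical helper: returns None when the Kozak sequence is not
    found exactly, so such reads can be handled separately.
    """
    idx = read.find(kozak)
    if idx == -1:
        return None
    return read[idx:]


filler = "TTGACTTGAC"  # stand-in for the variable-length filler
oligo = filler + "GCTAGCCCACCATGACCACAGGAGACACCTAGCT"
trimmed = trim_to_kozak(oligo)
```

The trimmed reads start at the Kozak sequence, so they could then be aligned against a reference made of the Kozak-plus-coding sequences only, for example with Bowtie2 in its default end-to-end mode, so that errors in the filler no longer affect the alignment.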

Abhijeet Singh

* If you align the two sequences, the query will only align to the identical or similar region of your read; it doesn't matter whether there are Ns.

* But if you already know that your reads contain Ns, the obvious approach is to trim the Ns before moving on to MSA.

Mohamed A. El-Esawi

I agree with Abhijeet Singh
