We sent some phytoplankton samples for sequencing and have just received the results. The next step is to run BLAST on the sequences to identify the phytoplankton we sent, i.e. DNA barcoding.
To give some context, when we send our samples to the sequencing facility, they send us back two files, one for the forward sequence and one for the reverse sequence, corresponding to the forward and reverse primers we supplied.
So, the first step is checking the quality of the sequences, specifically looking for low-quality regions, ambiguous base calls, or overlapping peaks in the chromatogram.
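To make the quality check concrete, here is a rough sketch of what I have in mind, assuming the base calls and per-base Phred scores have already been exported from the chromatogram file as a string and a list of integers. The function names, the Q20 cutoff, and the example values are my own assumptions, not something the facility provided.

```python
def quality_summary(seq, phred, min_q=20):
    """Report read length, fraction of non-ACGT calls, and fraction of bases below min_q."""
    if len(seq) != len(phred):
        raise ValueError("sequence and quality list must have the same length")
    n = len(seq)
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    low_q = sum(1 for q in phred if q < min_q)
    return {
        "length": n,
        "ambiguous_fraction": ambiguous / n if n else 0.0,
        "low_quality_fraction": low_q / n if n else 0.0,
    }


def trim_low_quality_ends(seq, phred, min_q=20):
    """Drop bases from both ends until the first and last base reach min_q."""
    start = 0
    while start < len(phred) and phred[start] < min_q:
        start += 1
    end = len(phred)
    while end > start and phred[end - 1] < min_q:
        end -= 1
    return seq[start:end], phred[start:end]


# Made-up example read with noisy ends (assumed values, for illustration only).
example_seq = "NNACGTACGTNN"
example_phred = [5, 8, 30, 35, 40, 40, 38, 36, 30, 25, 9, 4]
trimmed_seq, trimmed_phred = trim_low_quality_ends(example_seq, example_phred)
```

This only trims the noisy ends; low-quality stretches in the middle of a read would still need to be inspected by eye in the chromatogram.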
Now, I'm a bit uncertain about the next steps.
The following step would be sequence trimming. To do this, I need to identify the start of each sequence by locating the primer sequence. This means finding the forward primer sequence in the generated forward sequence and doing the same for the reverse primer in the reverse sequence.
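As a sketch of how I would search for the primer, the snippet below scans a read for the best match to a primer, treating IUPAC degenerate codes in the primer as wildcards and tolerating a few mismatches. The primer and read shown are made up for illustration, and `find_primer`, `trim_after_primer`, and the mismatch limit are hypothetical names and settings of my own.

```python
# Bases that each IUPAC code in the primer is allowed to match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_primer(read, primer, max_mismatches=2):
    """Return the 0-based start of the best primer match in the read, or -1 if none.

    An ambiguous call in the read (e.g. N) is counted as a mismatch.
    """
    read, primer = read.upper(), primer.upper()
    best_pos, best_mm = -1, max_mismatches + 1
    for i in range(len(read) - len(primer) + 1):
        window = read[i:i + len(primer)]
        mm = sum(1 for r, p in zip(window, primer) if r not in IUPAC.get(p, ""))
        if mm < best_mm:
            best_pos, best_mm = i, mm
    return best_pos


def trim_after_primer(read, primer, max_mismatches=2):
    """Remove everything up to and including the primer; return the read unchanged if not found."""
    pos = find_primer(read, primer, max_mismatches)
    if pos == -1:
        return read
    return read[pos + len(primer):]


# Made-up read and degenerate primer (Y stands for C or T).
example_read = "TTGACCAGTCCGATCGGATTACA"
example_primer = "CCAGYCCG"
primer_start = find_primer(example_read, example_primer)
trimmed_read = trim_after_primer(example_read, example_primer)
```

A return value of -1 from `find_primer` is exactly the case I ask about further down.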
Afterward, I perform reverse complementation on the reverse sequence.
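For reference, this is roughly how I do the reverse complementation, as a minimal sketch in plain Python. The helper name is my own, and the translation table also covers the IUPAC ambiguity codes so that N and friends survive the operation.

```python
# Each base (and IUPAC ambiguity code) mapped to its complement, in both cases.
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice should give back the original sequence, which is a quick sanity check.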
Following that, I run a pairwise alignment between the forward sequence and the reverse-complemented reverse sequence, and then generate the consensus sequence from that alignment.
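To illustrate the merging step, here is a simplified sketch. Note that it is not a full pairwise alignment: it slides the reverse-complemented reverse read along the forward read without gaps and keeps the offset with the lowest mismatch rate, so it would fail on reads with insertions or deletions. The function names, thresholds, and the toy amplicon are assumptions of mine, and disagreements are simply written as N rather than resolved by quality score.

```python
def find_overlap_offset(fwd, rev_rc, min_overlap=10, max_mismatch_rate=0.1):
    """Return the position in fwd where rev_rc starts, or None if no acceptable overlap exists."""
    best = None  # (mismatch_rate, -overlap_length, offset); smaller is better
    for offset in range(len(fwd)):
        overlap = min(len(fwd) - offset, len(rev_rc))
        if overlap < min_overlap:
            continue
        mismatches = sum(
            1 for a, b in zip(fwd[offset:offset + overlap], rev_rc) if a != b
        )
        rate = mismatches / overlap
        if rate <= max_mismatch_rate:
            candidate = (rate, -overlap, offset)
            if best is None or candidate < best:
                best = candidate
    return None if best is None else best[2]


def consensus(fwd, rev_rc, offset):
    """Merge two reads at the given offset, writing N wherever they disagree."""
    overlap = min(len(fwd) - offset, len(rev_rc))
    merged = [
        a if a == b else "N"
        for a, b in zip(fwd[offset:offset + overlap], rev_rc)
    ]
    # Whichever read extends past the overlap supplies the tail.
    if len(rev_rc) > overlap:
        tail = rev_rc[overlap:]
    else:
        tail = fwd[offset + overlap:]
    return fwd[:offset] + "".join(merged) + tail


# Toy 40-base amplicon (made up); the two reads share its middle 20 bases.
amplicon = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACG"
fwd_read = amplicon[:30]
rev_rc_read = amplicon[10:]  # reverse read after reverse complementation
offset = find_overlap_offset(fwd_read, rev_rc_read)
merged_seq = consensus(fwd_read, rev_rc_read, offset)
```

For real reads with indels, a proper gapped aligner would be the safer choice; the sketch is only meant to show what I mean by generating a consensus.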
I'm a bit stumped here (apologies in advance, I'm new to bioinformatics), so my questions are: (1) What if neither of the generated sequences contains its primer sequence? Would that mean the generated sequences are of low quality? (2) Is this approach correct, or have I missed a crucial step?
Thank you!