I have been trying to assemble data from an Illumina MiSeq run, analysing a bacterial whole-genome sequence for the first time. I first used the SPAdes genome assembler to assemble the reads and then used Mauve multiple genome alignment to order the contigs, using a very closely related strain as the reference.

Then I tried to submit the sequence to GenBank. They informed me that a foreign contamination screen of the sequences showed that they contain adaptors, which must be removed. In addition, a preliminary annotation of the genome found 31 fragmented rRNAs, indicating that the assembly is incorrect.
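
To make the adaptor problem concrete, here is a minimal Python sketch of how leftover adaptor sequence could be located in the assembled contigs. It is only an illustration under stated assumptions: the two adaptor prefixes are the commonly published TruSeq and Nextera ones, and which of them (if either) applies depends on the library prep kit, which is not stated here. The function names `read_fasta` and `find_adapters` are made up for this example, and only exact matches are reported, so partial or mismatched adaptor copies would be missed.

```python
# Commonly published Illumina adaptor prefixes. Which one is relevant
# depends on the library prep kit used (an assumption, not known here).
ADAPTERS = {
    "TruSeq": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATACACATCT",
}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_adapters(records, adapters=ADAPTERS):
    """Return (contig, adaptor label, strand, 0-based start) for exact matches."""
    hits = []
    for contig, seq in records:
        seq = seq.upper()
        for label, adapter in adapters.items():
            patterns = (("+", adapter), ("-", reverse_complement(adapter)))
            for strand, pattern in patterns:
                start = seq.find(pattern)
                while start != -1:
                    hits.append((contig, label, strand, start))
                    start = seq.find(pattern, start + 1)
    return hits


# Small in-memory example instead of a real assembly file.
example = [">contig1 demo", "ACGTAGATCGGAAGAGCTT", ">contig2", "GGGG", "CCCC"]
for hit in find_adapters(read_fasta(example)):
    print(hit)
```

With a real assembly, an open file handle can be passed to `read_fasta` in place of the `example` list. Hits near contig ends would suggest that the reads were not adaptor-trimmed before assembly.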

Since then I have been working on trimming the original fastq.gz file using cutadapt/skewer. However, after trimming, the FastQC results are still not good.

I'm hoping experts/researchers in this field can give me some guidance on what I should do, or point me to references/tutorials that would help me learn this better.
