I have transcriptome data from plant samples. Since no reference genome is available for my plant or its closest relative, I have to do a de novo transcriptome assembly. Before assembly, do I have to remove rRNAs in addition to quality filtering?
The first step is always to remove the low-quality reads. The second step depends on your purpose. If you want to identify ONLY novel genes, you may remove the reads that map to known genes. If you want to identify novel alternative splicing isoforms of known genes, exon extensions of known genes, or fusion genes, I suggest you use all reads. By using all reads (excluding the low-quality ones) for assembly, you may obtain a MORE complete transcriptome collection.
In my opinion, you need not check for rRNAs, because they fall into the category of other ncRNAs. These RNAs are mostly captured by small RNA sequencing kits. Since you have transcriptome data, you can proceed with low-quality trimming, adapter trimming, and then de novo assembly of transcripts. Hope this helps. Best of luck!
Thank you all for your quick replies. It seems the preprocessing involves only adapter removal and quality filtering. Now I can move on to assembly.
I routinely remove rRNA before fungal transcriptome assembly; SortMeRNA is a good tool for this purpose. My rationale is that rRNA reads can be a large fraction of a set of RNA-Seq reads, depending on how effective poly-A selection was during library preparation. In the best case the assembler wastes a lot of time on the rRNA reads, and in the worst case it incorporates some of them into chimeric contigs. The cleaner the input, the better your assembly should be. You could try assembling the reads with and without rRNA removal to see if it makes a difference; in my experience it usually does. If you want to assemble the rRNA repeat sequence, it would be better to do a separate assembly of the rRNA reads segregated by SortMeRNA.
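To get a feel for how much rRNA is actually in a library before deciding whether it matters, one rough check is to count the reads that the filter set aside. The sketch below assumes the filtering step has already written two FASTQ files, one with rRNA reads and one with everything else; the function names and file layout are only illustrative and are not part of any tool.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file (plain or .gz), assuming 4 lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def rrna_fraction(rrna_fastq, other_fastq):
    """Fraction of all reads that ended up in the rRNA file.

    Both arguments are paths to the (hypothetical) outputs of an rRNA filter:
    reads flagged as rRNA, and the remaining reads.
    """
    n_rrna = count_fastq_reads(rrna_fastq)
    n_other = count_fastq_reads(other_fastq)
    total = n_rrna + n_other
    return n_rrna / total if total else 0.0
```

If the fraction comes out at a few percent or more, that is a reasonable argument for assembling from the filtered reads only.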
When you move on to transcript quantitation, you should explicitly handle the rRNA reads by removing them before read mapping or by providing the rRNA sequence as a sink for those reads during read mapping.
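As a rough illustration of the "sink" option: the idea is to add the rRNA sequences to the mapping reference under recognisable names, then drop whatever lands on them before computing expression values. The sketch below is one way to do the bookkeeping in Python; the `rRNA_sink|` prefix, the function names, and the counts dictionary are assumptions made for the example, not the output format of any particular mapper.

```python
def tag_sink_fasta(fasta_text, sink_prefix="rRNA_sink|"):
    """Prefix every FASTA header so sink sequences are recognisable after mapping."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(">" + sink_prefix + line[1:])
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def split_sink_counts(counts, sink_prefix="rRNA_sink|"):
    """Separate reads absorbed by sink sequences from transcript counts.

    counts: dict mapping reference sequence name -> number of mapped reads.
    Returns (transcript_counts, sink_fraction), where sink_fraction is the
    share of mapped reads that landed on sink sequences.
    """
    transcript_counts = {
        name: n for name, n in counts.items() if not name.startswith(sink_prefix)
    }
    total = sum(counts.values())
    sink_reads = total - sum(transcript_counts.values())
    sink_fraction = sink_reads / total if total else 0.0
    return transcript_counts, sink_fraction
```

The tagged rRNA FASTA would be concatenated with the assembled transcripts before building the mapping index, and only `transcript_counts` would be carried forward into normalisation.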
I agree with Ian: removing the rRNA reads will save a lot of computation time (and likely memory) during de novo assembly and will probably lead to a cleaner assembly. rRNAs are so abundant that they can amount to several percent of your reads even if you used poly-A selection during library preparation.