Having started working with prokaryotic RNA a little while ago in order to perform RNA-Seq analysis, I have finally reached the bioinformatic analysis of the reads.

I prepared the cDNA library following a modified TruSeq protocol for Illumina, and the quality of the preparation, assessed on a High Sensitivity DNA chip in the Bioanalyzer, was very good.

After sequencing, I used CLC Genomics Workbench to perform the RNA-Seq analysis. First, I ran the tool for checking the quality of the reads, and the sequencing run appeared to be almost perfect, so I was very happy. But when I ran the RNA-Seq Analysis tool included in the software (using default parameters), more than 60% of the reads did not map to my reference.

I should mention that as a reference I used a multi-FASTA file containing all the CDS, not the assembled, annotated genome.

I was wondering why only about 30% of the reads are mapped during the analysis. I now think it might be due to the use of a CDS list: reads falling between two CDS, in intergenic regions, cannot be mapped. Am I right? Do any of you have other suggestions?
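
One rough way to sanity-check this idea is to estimate what fraction of the genome the CDS list actually covers. The Python sketch below sums the sequence lengths in a multi-FASTA file and divides by the genome size. The function names, the toy records and the genome size are hypothetical and for illustration only. It also assumes the CDS do not overlap, and it says nothing about how expression is distributed between coding and intergenic regions.

```python
from typing import Iterable


def cds_total_length(fasta_lines: Iterable[str]) -> int:
    """Sum the lengths of all sequences in a multi-FASTA stream.

    Header lines (starting with '>') and blank lines are skipped.
    Overlapping CDS would be counted twice, so this is an upper bound.
    """
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def coding_fraction(fasta_lines: Iterable[str], genome_size: int) -> float:
    """Fraction of the genome covered by the CDS list (assumes no overlaps)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be a positive number of bases")
    return cds_total_length(fasta_lines) / genome_size


# Toy records standing in for a real CDS multi-FASTA file (hypothetical data).
example_records = [
    ">cds_1",
    "ATGAAACCC",
    "GGGTAA",
    ">cds_2",
    "ATGTTTTGA",
]

# Hypothetical genome size in bases; replace with the real value.
example_genome_size = 30

if __name__ == "__main__":
    fraction = coding_fraction(example_records, example_genome_size)
    print(f"CDS cover {fraction:.1%} of the genome")
```

For a real file, the same function can be called as `coding_fraction(open("cds.fasta"), genome_size)`, where `cds.fasta` is a placeholder name. If the CDS cover most of the genome, intergenic reads alone may not account for the unmapped fraction, although this is only a crude indication.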

Thank you very much in advance.
