04 September 2018

I am currently working in a transcriptomics lab, and we are optimizing our RNA-Seq analysis pipeline to get the most out of the data. We count reads with bedtools against gene coordinates. When normalizing these counts, which length should we use: the full gene length or the sum of the exon lengths?

My feeling is that even if we generate the count over the whole gene, we are effectively counting against exons, since RNA-seq is supposed to capture only exons. The sum of exon lengths therefore seems the better choice. However, alternative splicing in the form of exon skipping can produce shortened transcripts, so we may end up with wrong normalized counts for such genes. I would welcome suggestions on this.
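To make the concern concrete, here is a minimal sketch of how the choice of length changes a length-normalized value. It assumes an RPKM-style normalization (reads per kilobase per million mapped reads); the function name `rpkm` and all numbers are hypothetical, chosen only for illustration and not taken from our data.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """RPKM-style value: reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical gene (all numbers are made up for illustration).
read_count = 500               # reads assigned to the gene
total_mapped_reads = 20_000_000
gene_span_bp = 10_000          # full genomic span, introns included
exon_sum_bp = 2_000            # sum of exon lengths of the full-length transcript
skipped_isoform_bp = 1_500     # same transcript with one 500 bp exon skipped

by_gene_span = rpkm(read_count, gene_span_bp, total_mapped_reads)
by_exon_sum = rpkm(read_count, exon_sum_bp, total_mapped_reads)
by_skipped = rpkm(read_count, skipped_isoform_bp, total_mapped_reads)

print(by_gene_span, by_exon_sum, by_skipped)
```

With these made-up numbers the same 500 reads give 2.5 when divided by the gene span, 12.5 when divided by the exon sum, and about 16.7 if the expressed isoform is actually the shorter exon-skipped one, which is the discrepancy I am worried about.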

Also, if we neglect the effect of alternative splicing on the normalized count, my second question is: among all the possible transcripts of a gene, which one should we use to calculate the sum of exon lengths?
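For reference, the candidate lengths I have in mind can be sketched as follows. This is only an illustration under stated assumptions: the transcript names and exon coordinates are hypothetical, coordinates are taken to be BED-style (0-based start, exclusive end), and the "union" length (all exons of all transcripts merged) is just one possible choice alongside the per-transcript lengths, not something our pipeline currently computes.

```python
def transcript_length(exons):
    """Sum of exon lengths for one transcript; exons are (start, end), end exclusive."""
    return sum(end - start for start, end in exons)


def union_exon_length(transcripts):
    """Length covered by at least one exon of any transcript (overlaps counted once)."""
    intervals = sorted(exon for exons in transcripts.values() for exon in exons)
    total = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Gap before this exon: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent exon: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical gene with two transcripts; tx_b skips the middle exon
# and has a longer last exon.
transcripts = {
    "tx_a": [(0, 100), (200, 300), (400, 500)],
    "tx_b": [(0, 100), (400, 550)],
}

per_transcript = {name: transcript_length(ex) for name, ex in transcripts.items()}
union_length = union_exon_length(transcripts)
longest_length = max(per_transcript.values())

print(per_transcript, union_length, longest_length)
```

In this toy gene the per-transcript lengths are 300 and 250, the longest transcript is 300, and the merged exon length is 350, so the three candidates already disagree for a simple exon-skipping case.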

Thanks in advance.
