12 December 2013

I am working with paired-end RNA-seq data from a diploid plant, with root and shoot sampled separately, and I currently have a matrix of read counts for a couple of thousand genes in each tissue. I am now trying to identify genes that are differentially expressed in one tissue relative to the other (e.g. in root with respect to shoot).

I have spent some time experimenting with RPKM and TPM values, but they don't really tell me much. I also tried edgeR and produced a smear plot (attached), but I am at a loss as to what it is really telling me.
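For reference, here is a minimal Python sketch of how RPKM and TPM are usually computed from one sample's counts and gene lengths. The function names are hypothetical, and the RPKM denominator here is the sum of the counts in the matrix, which is an approximation: strictly it is the total number of mapped reads (or fragments, i.e. FPKM, for paired-end data). Both values are per-sample rescalings of the counts, and neither uses any information from biological replicates.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million mapped reads.

    Assumption: the library size is approximated by the sum of the
    counts passed in, not the true total of mapped reads.
    """
    total = sum(counts)
    # count / (length in kb) / (library size in millions)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: normalise by length first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    denom = sum(rates)
    return [r * 1e6 / denom for r in rates]


if __name__ == "__main__":
    counts = [100, 200, 700]
    lengths = [1000, 2000, 1000]
    print(rpkm(counts, lengths))
    print(tpm(counts, lengths))
```

With these toy values the first two genes get the same RPKM and TPM despite different raw counts, because the second gene is twice as long; the TPM values of a sample always sum to one million.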

My questions:

1. Why is edgeR preferred over RPKM/TPM values?

2. What determines the threshold for identifying truly differentially expressed genes?

3. If I use HTSeq rather than a simple script to count mapped reads, does it consider only a small portion of each gene, or the gene's complete length?
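On question 2, one widely used convention (a choice, not a fixed rule) is to control the false discovery rate with the Benjamini-Hochberg (BH) procedure, calling a gene differentially expressed when its adjusted p-value falls below, say, 0.05, sometimes combined with a minimum absolute log2 fold change. edgeR's `topTags` reports BH-adjusted values by default. The sketch below reimplements BH in plain Python for illustration only; `bh_adjust`, `call_de`, and the cut-offs are hypothetical names and values, and in practice the p-values and fold changes would come from an edgeR test.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # indices sorted by ascending p-value
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_de(genes, pvalues, log2fc, fdr=0.05, min_abs_log2fc=0.0):
    """Hypothetical helper: genes passing an FDR and fold-change cut-off."""
    adjusted = bh_adjust(pvalues)
    return [
        gene
        for gene, q, fc in zip(genes, adjusted, log2fc)
        if q < fdr and abs(fc) >= min_abs_log2fc
    ]


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.20]
    print(bh_adjust(pvals))
    print(call_de(["a", "b", "c", "d"], pvals, [2.0, -1.5, 0.5, 3.0]))
```

For the p-values [0.01, 0.04, 0.03, 0.20], the adjusted values are about [0.04, 0.053, 0.053, 0.20], so at an FDR of 0.05 only the first gene is called, even though three of the raw p-values are below 0.05.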
