I am working with paired-end RNA-seq data for a diploid plant, with separate root and shoot samples. I currently have a matrix of read counts for a couple of thousand genes for each tissue, and I am trying to identify genes that are differentially expressed in (e.g.) root relative to shoot.
I have spent some time experimenting with RPKM and TPM values, but these don't really tell me much. I also tried edgeR and have a smear plot (attached), but I am at a loss as to what it is really telling me.
My questions:
1. Why is edgeR preferred over RPKM/TPM values?
2. What determines the threshold to set for identifying truly differentially expressed genes?
3. If I use HTSeq rather than a simple script to count mapped reads, does it count reads over only a small portion of the gene, or over its complete length?
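
For reference, the RPKM and TPM values mentioned above follow the standard definitions, which can be sketched roughly as below. This is only an illustration: the counts and gene lengths are made up, the function names `rpkm` and `tpm` are hypothetical, and the library size is assumed to be the sum of the counts in the matrix rather than the total number of mapped reads.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million reads, for one sample."""
    # Assumption: library size is approximated by the sum of the counts.
    total_reads = sum(counts)
    return [c * 1e9 / (length * total_reads)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million, for one sample."""
    # Length-normalise first, then rescale so the values sum to one million.
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


# Made-up example: three genes in a single sample.
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 7000]

root_rpkm = rpkm(counts, lengths_bp)
root_tpm = tpm(counts, lengths_bp)
```

Both measures rescale a single sample by gene length and sequencing depth. Neither one uses replicate-to-replicate variability, which is part of why I am unsure how to get from these values to a list of differentially expressed genes.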