I received RNA-seq data on a cell line, and the expression level of a certain receptor is reported as "x"% (e.g. 29%). I'm not sure how this percentage was calculated from RPKM. Thanks in advance.
When you talk about expression in this way, you are always referring to another sample. For example, you might say "expression of Gene X in the treated sample is 29% when compared to untreated." Even if the paper you are reading doesn't explicitly state this, it is implied, as it has to be 29% of something.
You would need to normalise your expression data between samples using TMM or another normalisation strategy (RPKM is normalised WITHIN a sample, not BETWEEN samples). Then you would either compare the numbers directly or use a statistical method (e.g. z-scoring, or log fold change in a differential expression workflow) to talk about the expression in this way.
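To make the arithmetic concrete, here is a minimal Python sketch of how a figure like 29% could come out of RPKM values. All of the read counts, the gene length and the library sizes are made up for illustration, and the function names are my own. It also assumes the two libraries are already comparable, which in practice is what a between-sample normalisation such as TMM (e.g. in edgeR) is for, so treat it as a toy example and not as the method your data provider used.

```python
import math

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    This corrects for gene length and sequencing depth WITHIN one sample.
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def log2_fold_change(sample_value, reference_value, pseudocount=1.0):
    """log2 ratio of two expression values.

    The pseudocount (an assumption here, 1.0) avoids dividing by zero
    for genes with no reads.
    """
    return math.log2((sample_value + pseudocount) / (reference_value + pseudocount))

# Hypothetical numbers for one receptor gene in two samples.
treated = rpkm(read_count=580, gene_length_bp=2000, total_mapped_reads=20_000_000)
control = rpkm(read_count=2000, gene_length_bp=2000, total_mapped_reads=20_000_000)

# A percentage only has meaning relative to a reference sample.
percent_of_control = 100 * treated / control
lfc = log2_fold_change(treated, control)

print(f"treated RPKM = {treated:.1f}, control RPKM = {control:.1f}")
print(f"expression is {percent_of_control:.0f}% of control, log2FC = {lfc:.2f}")
```

Without the `control` value there is nothing to divide by, which is why the reference sample has to be known before a percentage can be interpreted.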
Personally, I find that talking about fold changes is much clearer than talking about percentages, even though you are saying the same thing. For example, if a gene is expressed twice as much in one sample as in another, saying "2-fold higher expression" sounds much more natural than saying it is expressed at 200% of the other sample.
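If it helps, the conversion between the two ways of speaking is simple. This is a small sketch with hypothetical helper names, showing that "percent of the reference" and "percent change" are different numbers for the same fold change, which is part of why percentages get confusing.

```python
def percent_of_reference(fold_change):
    """Express a fold change as a percentage of the reference sample."""
    return fold_change * 100.0

def percent_change(fold_change):
    """Express a fold change as a percentage increase or decrease."""
    return (fold_change - 1.0) * 100.0

# Illustrative fold changes: lower, unchanged, and doubled expression.
for fc in (0.29, 1.0, 2.0):
    print(
        f"fold change {fc:g}: "
        f"{percent_of_reference(fc):.0f}% of reference, "
        f"{percent_change(fc):+.0f}% change"
    )
```

So a 2-fold change is 200% of the reference but only a 100% increase, and a value of 29% corresponds to a fold change of 0.29 relative to whatever the reference was.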
Thanks Sophie! This is why I'm a bit confused... I don't know what my dataset is being compared to. It is a cancer cell line. Is there a widely used method that uses internal controls to come up with these percentages? I'll have to clarify with them, I suppose...