Knowing that RNA-Seq data represent absolute counts while RT-qPCR data are relative, is it scientifically accepted to compare RNA-Seq data to RT-qPCR data? Thanks
Yes and no. There is a lot of background information needed to really understand why this is. I'm guessing that you've seen my response to Iddo's question on absolute quantification which is related to this question - so since you've posed it here, let me expand on this a little bit.
There are a couple of different ways that people generally perform qPCR normalization, and each depends on the expression of so-called housekeeping genes. This is a form of combined within-sample and between-sample normalization that provides a common standard against which relative quantification is made. Back when RNA-seq sucked - say, up until around 2012 or 2013, around the time people gave up on technologies such as the SOLiD sequencer - it was common practice to perform RNA-seq - probably using a replicate-free experimental design due to cost - and then also perform qPCR validation on a handful of genes.
Clearly, the only meaningful way to validate RNA-seq using qPCR is to perform within-sample normalization on the RNA-seq data (say, TPM, which corrects for sequencing depth and gene length bias), then perform qPCR on several genes from the same sample that you saved an aliquot from - preferably from the same RT reaction used to create the library, though this is rarely possible when outsourcing - and normalize to one or more housekeeping genes that are strongly expressed. If you see similar patterns between your genes in terms of their relative expression, then you can reasonably confirm/validate your RNA-seq run.
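To make that concrete, here is a minimal sketch of what such a comparison might look like. The counts, gene lengths, and qPCR values are entirely made up for illustration - the only point is that you compare relative patterns, not absolute numbers.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical raw counts and gene lengths (kb) for a handful of genes in ONE sample.
counts = np.array([1500.0, 300.0, 9000.0, 120.0])
lengths_kb = np.array([2.1, 0.8, 4.5, 1.2])

# TPM: length-normalize first, then scale so the sample sums to one million.
rpk = counts / lengths_kb
tpm = rpk / rpk.sum() * 1e6

# Hypothetical qPCR relative quantities for the same genes, normalized to a
# housekeeping gene (e.g. 2^-dCt values) from an aliquot of the same RNA.
qpcr_rel = np.array([3.2, 0.6, 18.0, 0.3])

# If the two platforms agree, the relative ordering/pattern should match.
rho, _ = spearmanr(tpm, qpcr_rel)
print(f"TPM: {np.round(tpm, 1)}, Spearman rho = {rho:.2f}")
```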
In this sense, you can try to compare RNA-seq data to RT-qPCR data and call it scientific. This is still often done - especially in non-model organisms. Sorting through cancer or developmental studies, you'll surely run across plenty of qPCR data. But keep in mind, there are gold standards, and there are 'do what we can with what we have' standards. And we gotta make our science work so we don't lose our jobs. Amirite?
To be clear - raw RNA-seq data (or more specifically, the counts that may be derived from it using software such as HTSeq or Subread featureCounts) is NOT absolute quantification. It's actually a more complex situation than that, and frankly, I often hear people speaking about RNA-seq data in such terms; it is incorrect to do so. There are a variety of natural biases that accompany any RNA-seq experiment that must be corrected for before the counts could be considered absolute quantification.

If you perform within-sample normalization using CPM, TPM, FPKM (paired-end), RPKM (single-end), or any other within-sample method, the results are loosely referred to as 'absolute quantification' and as such allow you to compare one gene to another gene within the same sample. But since these methods do not account for library composition bias, your common divisor (library size) does not get equalized between samples. That means even though you have CPM = 256.7 for Gene A in sample 1 and CPM = 256.7 for Gene A in sample 2, if your library sizes differ by 25% (which is not at all unreasonable to expect), then the 'absolute quantification' gets exposed as not really 'absolute' at all. The values are absolute only within their respective sample, relative to its library size. It is critical to understand this.

Library size normalization is performed using a different set of methods, of which TMM is currently, hands down, the best. But when you normalize for library size, you sacrifice between-gene measurements: you can compare the same gene across multiple samples - relatively - but you cannot compare two genes within the same sample. So again, the RNA-seq quantification loses its 'absolute' character. Furthermore, if sequencing is done across multiple machines or batches, there can be systematic batch effects that need to be corrected where possible - and some pretty gnarly batch effects can be corrected for.
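A toy numerical illustration of the CPM point above (the library sizes and CPM value are invented):

```python
# Gene A has the same CPM in two samples, but the libraries differ in size by 25%,
# so the CPM is only meaningful relative to its own library.
lib_size_1, lib_size_2 = 20_000_000, 25_000_000   # total mapped reads per library
cpm_gene_a = 256.7                                 # identical CPM reported in both samples

counts_1 = cpm_gene_a * lib_size_1 / 1e6           # ~5,134 reads
counts_2 = cpm_gene_a * lib_size_2 / 1e6           # ~6,418 reads
print(counts_1, counts_2)

# Same 'absolute-looking' number, different underlying counts - and neither number
# tells you anything about transcript abundance per cell or per microgram of RNA.
```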
Here is the paper on TMM normalization - but you should spend a few days googling bioinformatics blogs and reading papers comparing normalization methods. There are heaps.
https://www.ncbi.nlm.nih.gov/pubmed/20196867
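If it helps to see roughly what TMM does mechanically, here is a heavily simplified sketch of the scaling-factor calculation from that paper. It omits the precision weights that real implementations (e.g. edgeR's calcNormFactors) apply, and the trim fractions are just commonly used defaults - treat it as a teaching aid, not a reference implementation.

```python
import numpy as np

def simple_tmm_factor(counts_test, counts_ref, trim_m=0.30, trim_a=0.05):
    """Simplified TMM scaling factor (after Robinson & Oshlack, 2010)."""
    lib_test, lib_ref = counts_test.sum(), counts_ref.sum()

    # Only consider genes observed in both samples.
    keep = (counts_test > 0) & (counts_ref > 0)
    p_test = counts_test[keep] / lib_test
    p_ref = counts_ref[keep] / lib_ref

    m = np.log2(p_test / p_ref)          # M: log ratio between samples
    a = 0.5 * np.log2(p_test * p_ref)    # A: average log abundance

    # Trim the most extreme M and A values, then average the surviving M values.
    m_lo, m_hi = np.quantile(m, [trim_m, 1 - trim_m])
    a_lo, a_hi = np.quantile(a, [trim_a, 1 - trim_a])
    trimmed = (m > m_lo) & (m < m_hi) & (a > a_lo) & (a < a_hi)

    return 2 ** np.mean(m[trimmed])

# Usage idea: multiply the test library's size by this factor to get an 'effective'
# library size before computing CPMs, so composition bias is (mostly) cancelled out.
```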
Realistically, even with these bias corrections, I would hardly call RNA-seq or any other technology absolute. People who say this either don't really know what they are talking about, or are speaking haphazardly/loosely (which is OK! But one should not take it literally).
Back to qPCR: it is not possible to directly compare data points generated by qPCR because - as you say - it is a relative measure, i.e. how quickly does my gene of interest amplify during PCR relative to this other gene? There is no count data associated with it. ddPCR (droplet digital PCR) provides a potential solution, but when comparing to RNA-seq, unless the initial equipment investment is already made, it's probably not a cost-effective option. So if you are to compare qPCR to RNA-seq, it's going to be done completely relatively. Furthermore, there is a justified trend towards skipping qPCR validation of RNA-seq altogether. I see nothing wrong with this - so long as the experimental design includes biological and - preferably in addition - technical replicates. This is the home-run approach. But again, costs.
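For what 'completely relatively' usually looks like in practice, most qPCR comparisons boil down to something like the 2^-ddCt (Livak) calculation. A minimal sketch, with made-up Ct values and assuming roughly 100% primer efficiency:

```python
# Made-up Ct values: a target gene and a housekeeping (reference) gene,
# measured in a treated and a control sample.
ct_target_treated, ct_ref_treated = 22.1, 18.0
ct_target_control, ct_ref_control = 24.3, 18.1

d_ct_treated = ct_target_treated - ct_ref_treated   # normalize to the housekeeping gene
d_ct_control = ct_target_control - ct_ref_control
dd_ct = d_ct_treated - d_ct_control                  # normalize to the control condition

fold_change = 2 ** (-dd_ct)                          # assumes ~100% amplification efficiency
print(f"Relative expression, treated vs control: {fold_change:.2f}-fold")
# Note there is no copy number anywhere in this calculation - everything is
# relative to the reference gene and to the control sample.
```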
Strictly speaking, qPCR isn't so bad within a given sample, but it is actually not such a great approach for comparing between samples. Consider a study comparing two conditions in which you've selected a housekeeping gene - how can you be sure the housekeeping gene isn't differentially expressed along with your gene of interest? How can you be sure there isn't natural variance in that gene? This is a particularly difficult problem in time-series studies. I've seen talks where qPCR validation of RNA-seq exposed this problem... and the speaker wasn't even aware of it. The fact of the matter is that qPCR has been used like CRAZY over the past 15-20 years because it's so darn accessible. But we also used to create gigantic BAC libraries to walk through genome sequences. Both of these technologies are nearly obsolete - though they are both still used.
So is it scientifically accepted to compare RNA-seq and qPCR? If you are doing relative comparisons (gene A is higher than gene B in this one sample in both my RNA-seq data and my qPCR data), then yes, it is accepted. But understand the technology that's generating the data and its limitations, the methods used to properly process and analyze the data, and what each method does and does not say about your system. Otherwise, it's not truly a scientific approach.
I hope this was helpful in your understanding. I've attached a few links to get you started on some of the pertinent background information.
As always - if anyone has anything to add or correct or discuss, please make it be known!
qRT-PCR is relative when compared to a housekeeping or control gene. However, you can get an absolute copy number for a specific gene by qRT-PCR if you make a standard curve using an input template of your target sequence. Make serial 10-fold dilutions of a plasmid template that contains your target, and then you can calculate exactly how many copies of cDNA you have.
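The arithmetic behind that standard curve is just a linear fit of Ct against log10(copies). A minimal sketch with invented dilution data (a real curve should also be checked for efficiency and R^2):

```python
import numpy as np

# Invented standard curve: 10-fold plasmid dilutions with known copy numbers
# and their measured Ct values.
copies = np.array([1e7, 1e6, 1e5, 1e4, 1e3])
ct_std = np.array([15.2, 18.6, 22.0, 25.4, 28.9])

# Fit Ct = slope * log10(copies) + intercept.
slope, intercept = np.polyfit(np.log10(copies), ct_std, 1)
efficiency = 10 ** (-1 / slope) - 1      # ~1.0 corresponds to ~100% efficiency

# Read an unknown sample's Ct back off the curve to get an absolute copy number.
ct_unknown = 20.3
copies_unknown = 10 ** ((ct_unknown - intercept) / slope)
print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}, copies = {copies_unknown:,.0f}")
```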
Please refer to this study, "Use of RNA-seq data to identify and validate RT-qPCR reference genes for studying the tomato-Pseudomonas ...", at http://www.nature.com/articles/srep44905
Hi Paul Gradie, thank you for your great answer to this question! It helped me a lot. I was wondering if you could explain a little bit more about why "when you normalize for library size, you sacrifice between-gene measurements: you can compare the same gene across multiple samples - relatively - but you cannot compare two genes within the same sample". Thank you!!!