When using SYBR Green or TaqMan qPCR assays to measure gene expression changes, can differences of less than two-fold be taken as credible? What is the minimum fold-change or percent change that is reliable?
This depends on the precision you obtain in your experiments. This precision in turn depends on the variance of the ct values and the sample size. Thus, in principle, any precision can be achieved just by increasing the sample size.
Just a numerical example: consider (the utopian case) that there is no biological variance (e.g. from a "perfect" ideal cell culture) and that the technical variance is 0.09 cycles² (so the standard deviation, SD, is 0.3 cycles), which seems to be quite common. Also quite common is the habit of measuring in triplicate, so the standard error (SE) for the mean ct is SE = sqrt(0.09/3) = sqrt(0.03). By error propagation, the SE of a mean difference (delta-ct) is sqrt(2*0.03) and the SE for a difference of such differences (delta-delta-ct) is sqrt(2*2*0.03) = sqrt(0.12) = 0.346. This SE is determined on 2*(3-1) = 4 degrees of freedom, so the 95% confidence interval has a half width of t[0.025;4]*0.346 = 2.77*0.346 = 0.96, or roughly 1 cycle. The total width of the interval is thus 2 cycles. Hence, a 2-fold difference is considerably smaller than the 95% CI obtained *without* any biological variance when 3 replicates are measured. With 24 replicates, the CI narrows down to ±0.33 cycles, which is well sufficient to confidently distinguish a 2-fold (i.e., 1-cycle) difference.
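To make that arithmetic easy to re-run with your own numbers, here is a minimal sketch of the same calculation in Python (assuming the 0.09 cycles² technical variance, triplicates, and the degrees of freedom used above; scipy is only needed for the t quantile):

```python
from math import sqrt
from scipy.stats import t

var_ct = 0.09                              # technical variance of single ct values (cycles^2)
n = 3                                      # replicates per mean

se_mean = sqrt(var_ct / n)                 # SE of one mean ct: sqrt(0.03)
se_ddct = sqrt(4 * var_ct / n)             # error propagation to delta-delta-ct: sqrt(0.12)
df = 2 * (n - 1)                           # degrees of freedom as used in the example
half_width = t.ppf(0.975, df) * se_ddct    # half width of the 95% confidence interval

print(f"SE(ddct) = {se_ddct:.3f} cycles, 95% CI half width = {half_width:.2f} cycles")
# SE(ddct) = 0.346 cycles, 95% CI half width = 0.96 cycles
```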
The general "rule of thumb" is that a two-fold difference in conc'n is determinable by the Ct value. Of course, there are sensitivity issues and these can all be worked out in your lab. Try serial dilutions of your template and compare the "theoretical Ct" vs. "observed Ct". You may be able to get down to 1.25 fold difference in amplicon amount with %RSD < 5 on triplicated wells with good pipetting and other typical quality controls.
This depends on your standard and the efficiency of the reaction. If your unknown is within the linear dynamic range of your standard curve, with an efficiency between 80% and 100%, you can safely call a difference of 1 cycle (a 2-fold change).
Also consider the baseline expression of the gene itself (especially with respect to the expression of the housekeeping gene). The higher the cycle number (i.e. the lower the gene expression) the less reliable the measurement. Better to rely on statistical analysis than assume that a particular fold change is evidence of a difference from control values. There is a useful article by Livak and Schmittgen (METHODS 25, 402–408, 2001) which tackles the issue of statistical analysis in a user-friendly way.
The golden rule is that qPCR machines are usually not able to read differences below 1 cycle, so every change below 1 cycle (a 2-fold change) means no change, basically.
I confirm that the best practice is the use of triplicates for all samples. I do it in all my experiments because of the sensitivity of the method. I also discard the data and repeat the experiment if the SD is high (>0.2-0.5). To be sure, when the Ct values are very close, I use a second reference gene and compare the results.
I would reinforce that I find TaqMan results more trustworthy than SYBR Green. No doubt the number of assays will influence the reliability of the result. Repeating the assays with serial dilutions of the samples may be helpful; it is also necessary to look at the standard deviation and do statistical analysis. Performing all assays in triplicate is always needed.
I agree with Kennet and with most of the previous answers.
In my experience, just using SYBR Green assays, we are able to detect statistically significant differences of 1.2-1.3-fold. This is possible by running samples in triplicate and using a quite high number of experimental samples (from 6 to 10 mice in my case, although the result obtained with 4 mice already reflects the final result in many cases).
Small variations are only as reliable as their confirmation in independent experiments.
Another question is, once statistical significance has been obtained, what is the biological significance of such a small variation...
In summary, the main points to be considered with respect to the initial question are:
- number of intra-sample replicates (at least 3)
- number of inter-sample replicates (number of samples for the same experimental condition)
- errors associated with the two previous variables
- amplification efficiency (typically between 90-99% for each primer pair, including the housekeeping gene)
- confirmation of the results with more than one housekeeping gene
Olga, it is not correct that TaqMan results ("TaqMan" is a trademark; the correct name is "hydrolysis probes") are more trustworthy than results obtained with SYBR Green in *quantitative* PCR. The only advantages of sequence-specific probes are the possibility of multiplexing (which comes with its own special problems in quantitative assays) and the clear detection (not quantification!) of the specific product in the presence of other amplified (unspecific) products. For quantification, in contrast, sequence-specific probes tempt people *not* to check for the amplification of unspecific products in quantification assays. However, checking this is vital, because the amplification of unspecific products can change the amplification efficiency for the specific product and thus considerably change the ct values. At best, this problem is negligible, but then both probe systems are just equally trustworthy.
I'm sorry, but I'm going to fall down on the uber-cautious side in this discussion. This seems to me to be one of those cases where the ability of our machines to generate values with many places beyond the decimal point (i.e. Ct values which are not whole numbers) deludes us into believing that more decimal places equals a higher degree of accuracy/precision.
I'm not so confident that this conjecture (about 'measurable' fold-differences of less than two-fold) can be supported by biological/experimental plausibility, using the qPCR methods at hand. Construing any fold-difference of less than 2.0 as "real" and as biologically significant (note the absence of "statistical" here) simply does not acknowledge the basic biological principle of what's actually going on in a PCR reaction tube.
Let's pause for a moment, put aside discussion about the number of replicates, inter/intra-reaction consistency, statistical power, and sensitivity of machines... and consider the basic biological and mathematical principles underlying PCR and qPCR.
Let us assume "the most-ideal" of experiemental conditions (so, 100% PCR reaction efficiency in every reaction you measure, and getting the exact (!) same amount of primers, fluorophores, and RNA/cDNA in every reaction). And so, in this case, one cycle of PCR represents one theoretical exact doubling of your amplified cDNA for every cycle of qPCR run, and there should be exactly twice as much quantified cDNA after cycle #11 as there was after cycle #10.
Typically, your machine is reading fluorescence at one distinct (identical) time-point during each PCR cycle (typically post-elongation, right?). Therefore, no matter what machine value you are getting, either directly or via graphic interpolation, you are using a biological system which is intrinsically incapable of generating anything but whole-number values (therefore, NON-continuous variables), always separated from each other by some power of two.
So, what are the errors of assumption we make when we compare a mean value of Ct=10.20000 to a mean value of Ct=11.00000:
1. That any machine-derived value between, say, a mean value of Ct=10 and Ct=11, is an actual biological, measurable event-fact;
2. That Ct values are actually continuous variables (or that their means can be, or should be, treated as such). Going back to our first basic biostats courses, we are always limited, in any test of significance we use, by the value containing the LEAST number of significant digits that are, indeed, both biologically and mathematically significant. This, regardless of the number of decimal places our machines or our interpolative/analytic calculations give us;
3. That because we can calculate (and thus report) values of difference beyond the decimal point, these represent "real" phenomena;
4. That we can accurately recover and quantify end-concentrations of PCR-amplified cDNA/DNA, given existing technologies, with sufficient accuracy to be able to support our calculations of
Much of the variability in RT-qPCR comes from the sample preparation of the RNA and technical variability. If you work to control these variables, you can usually detect statistically significant changes in gene expression of 0.5 cycles or greater with n = 6 to account for biological variability. This is with a SYBR Green assay and a threshold Ct of less than 30.
Larry, you miss two important points in your line of argument that eventually bring you to your (IMHO) wrong conclusion.
Ct values are not integers. Ct values are the intersection points of a MODEL of the amplification curve with a horizontal threshold line. The most common model used is the linear regression line through 3-5 points within the apparently log-linear phase of the reaction. The intersection point (the ct value) is in fact very sensitive to differences in the starting amount of target sequence in the PCR, because the absolute heights of the measured values "around" the ct *do* depend on this starting amount.

This fact is very simple to visualize: Make a half-log plot. Use the log to base 2 for convenience. Take some value on the log-Y axis as the starting amount. From this point, move one unit to the right on the X axis (1 PCR cycle) and one unit up on the Y axis (on a log[2] scale this is a doubling). Continue this for, say, 40 cycles. You will find 40 points all lying on a straight line. Now do this again, but start from a slightly different value on the Y axis. You get points on a parallel line with the same slope but a slightly shifted intercept. You *measured* only the points, but the exponential *model* can be drawn as a straight line. Now consider that the very low signals at the early cycles are masked by noise - you don't see these points on the line, instead you see only noise there. After some amount of product has accumulated, the noise is negligible and you do see this log-linear increase. Some cycles later, the amount of product is so high that the signal no longer doubles each cycle and finally reaches a plateau, usually defined by the amount of dye in the reaction mix. As a result, there is some window on the Y axis left where the dependency between measured fluorescence and cycle number can be well modeled by a straight line. These lines are parallel, and their horizontal distance is proportional to the log ratio of starting amounts (i.e., to the difference in the intercepts of these lines). This difference *is* a fractional number. Its uncertainty does not come from the fact that the values are only measured after the completion of the individual cycles; it comes from pipetting errors, measurement errors, errors caused by physico-chemical effects in the reactions, and errors in the amplification efficiency.
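A minimal sketch of that half-log picture, assuming perfect doubling and two made-up starting amounts, shows that the horizontal distance between the two parallel lines (the ct difference) is indeed a fractional number:

```python
import numpy as np

cycles = np.arange(40)
start_a, start_b = 1000.0, 1400.0                 # hypothetical starting copy numbers
signal_a = start_a * 2.0 ** cycles                # ideal doubling each cycle
signal_b = start_b * 2.0 ** cycles

threshold = 1e9                                   # arbitrary threshold within the log-linear window
ct_a = np.interp(np.log2(threshold), np.log2(signal_a), cycles)
ct_b = np.interp(np.log2(threshold), np.log2(signal_b), cycles)
print(f"ct difference = {ct_a - ct_b:.3f} cycles (log2(1400/1000) = {np.log2(1.4):.3f})")
```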
The second point you miss is that even if ct values were integers (which is not the case!), even then the averages of replicates could give you any desired precision just by measuring enough replicates (ok, to get 2 decimal places here one would need a huge number of replicates, but I am talking about the principle). An analogy may be seen in the way nerve cells integrate the synaptic input of other cells. The integration may be over time and/or over the number of synapses, but in both cases the individual signals are either "on" or "off", and still the result is a quite precise estimate of the strength of the signal on a continuous scale. These "ons" and "offs" can be translated into "signal in the previous cycle above threshold" and "signal in the next cycle above threshold". Interestingly, this principle is not disturbed by noise - it relies on the presence of noise (it would not work with perfect signals).
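A quick illustrative simulation of this "integration" idea (the true ct, noise SD and replicate number below are made up): even if every single reading were forced to a whole cycle, the average of many noisy, rounded readings still recovers a fractional value - and, as said, it only works because of the noise:

```python
import numpy as np

rng = np.random.default_rng(0)
true_ct, sd, n = 20.3, 0.5, 1000                        # assumed true ct, noise SD, replicate count
readings = np.round(rng.normal(true_ct, sd, size=n))    # force every reading to a whole cycle
print(f"mean of {n} rounded readings = {readings.mean():.2f}")   # comes back close to 20.3
```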
Finally, maybe another point on a different topic: a precise measurement can well be important, because a tiny difference can be biologically relevant. You forget that we do not always measure homogeneous samples. It may be that only a small subpopulation of the cells changes the expression of the gene of interest and thereby induces some important biological effect. If other cells express the same gene as well, then the measured changes are small, although this may be a real "on/off" regulation in the relevant cells. Further, quantitative real-time PCR can also be used to measure genomic DNA. In tumor biology, the detection of allele losses of tumour suppressor genes is such a task, and here you often get samples where only a fraction of the cells are tumor cells, and again only a fraction of these cells may have lost an allele (the same applies to gene amplifications of oncogenes; however, these amplifications are often more than 2-fold and thus a little easier to detect). You may check these papers: Anal Biochem. 2003 Jun 15;317(2):218-25 and Clin Chem. 2003 Feb;49(2):219-29.
Nice to get downvotes. It shows that I do touch people's minds. However, I would find it constructive to get a hint why my answers are downvoted - otherwise I won't be able to learn and probably correct my thoughts.
I didn't downvote you.... I see a lot of sense in your comment. Larry's argument is initially seductive, but it's something of a red herring. Another way to compare qPCR results would be to check the amount of DNA of two samples at the same cycle. This does away with the philosophical argument over continuous or non-continuous distribution.
Doing it that way, you will get the same answers as measuring the threshold cycle, as long as both PCRs are still in the "log-linear" phase of their amplification. And that answer will *not* necessarily be an integer.
I think the reason we use Ct instead is that it overcomes the problems of widely differing expression where the two samples are never in the log-linear region at the same cycle with sufficient amplification over background.
Thank you to both Jochem and Mark for this question and for your comments. I sincerely apologize if my comments have created any sense of uncertainty (or might have seemed like intentional misdirection), but the ideas have kind of 'nagged at my subconscious' for some time. Does anyone have a really good publication reference which empirically discusses (and analyzes) a concrete example of qPCR machine/reaction variability and 95%CI's, in the context of using means derived from varying numbers of reaction replicates?
Hi Larry, I don't have such a publication, and I would guess that there is no such publication. The main interesting part is the variance of replicates. The fact that standard errors decrease with the square root of the number of replicates is well-consolidated textbook knowledge. Unfortunately, the variance depends on 1001 things related to the particular assay, machine, and method of ct determination. There is not much common interest in a particular variance estimation for a very particular setup and assay. The "rule of thumb" of 0.1 square-cycles variance is good enough for most practical purposes. However, your question is in fact (or should be!) of interest for companies selling machines or assays. They should compare the variances obtained for the competing assays or machines. But here, the relation (more or less) is more important than the actual values. You could probably get some information from companies...
I think that the reliability of our results is always a very interesting issue. I appreciate these discussions; they should help us reach a higher level of knowledge.
I use q-RT-PCR to quantify minimal residual disease in acute leukemia, and we usually discuss the precision of our results... and how to interpret them.
I have found interesting papers about quantification in q-RT-PCR from S.A. Bustin, including one intended to state the minimal requirements for publication of results: The MIQE Guidelines: Minimum Information for Publication of Quantitative Real-Time PCR Experiments, Clinical Chemistry 2009;55(4):611–622 (free). You can look for his articles in PubMed.
One good standard to judge the reliability of real-time PCR is the R-squared value of the assay. In our lab, serially diluted standards with triplicate samples are run to draw the standard curve. If the R-squared value is more than 0.99, the data for the unknown samples are considered reliable. However, when the value falls below 0.9, the result should be considered questionable if the difference is less than two-fold. We usually confirm the difference in expression with different primer pairs in such questionable cases.
The analysis software of the real time machine would help you.
The R² is not at all a good measure for the reliability of a qPCR assay (although it is often said and written so!). The R² value (alone) is by far not enough to get an impression of the assay performance. All important questions remain unanswered by a high R², and on the other hand a low R² does not automatically indicate that the assay is useless for the purpose. The questions (depending on the purpose) to be answered are:
What is the limit of detection?
What is the limit of quantification?
What is the amplification efficiency?
What is the variance of technical replicates (depending on the concentration)?
None of these questions can be answered by looking at the R². Further, given the high dynamic range, the method likely produces high R² values for standard curves. Typically, the limits of the standard concentrations largely exceed the range of (differential) quantification of the samples, so high R² values are not very indicative of what to expect for "small fold-changes".
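To illustrate the point, here is a small sketch (with made-up standard-curve numbers) of what one should actually compute from a dilution series - the slope and the amplification efficiency - rather than stopping at R², which comes out near 1 here anyway:

```python
import numpy as np

log10_amount = np.array([5.0, 4.0, 3.0, 2.0, 1.0])   # log10 of standard input (made up)
mean_ct = np.array([15.1, 18.5, 21.8, 25.2, 28.7])   # hypothetical mean ct of the standards

slope, intercept = np.polyfit(log10_amount, mean_ct, 1)
efficiency = 10 ** (-1.0 / slope) - 1                 # 100% efficiency corresponds to a slope of -3.32
r2 = np.corrcoef(log10_amount, mean_ct)[0, 1] ** 2

print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}, R^2 = {r2:.4f}")
# R^2 is near 1 here, yet it says nothing about LOD, LOQ, or replicate variance
```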
Can't help but intuitively agree with Jochen here. The entire qPCR process is an exercise in ever-increasing stochasticity in the face of ever-changing reactants and product accumulation. The dynamic universe inside the reaction is never the same from one moment to the next. In addition, the entire process gets more and more random with decreasing target template. Not only that (and especially with SYBR Green-based reactions), more and more thermodynamically rare events are enticed, like mice, to come out and play when the cat (target template) is away. Primers, if at all thermodynamically possible, will try to bind to 'something' - even themselves, as we know very well. Variance itself is likely not constant over the dynamic range in which a target can possibly amplify, and R² values in the obviously strong range of an assay's good standard curve may be telling a story that is too good to be true. Given that a certain high degree of disagreement in technical replicates is used mathematically (via the Poisson distribution) to determine when the signature of a single-copy reaction has presented itself, it should also follow that departure from ideality along the entire dynamic range of the qPCR could be used to determine the degree of acceptability and/or usefulness of all Cq values, depending on where they appear along the scale from cycle 6 to cycle 40.
I'm not entirely sure what that math looks like - but perhaps all of us have already run across it at some point, in some form, in some discipline or daydream. If and when possible, increasing technical replicates here and there (when appropriate, without going overboard) always provides a direct/concrete way to shed more light on problem moments, as long as one doesn't run out of [in some cases precious] sample along the way.
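For what it's worth, one piece of that math might look like the following sketch (a plain Poisson calculation, not taken from any particular instrument's software): at very low template, a predictable fraction of technical replicates simply contains no target copy at all:

```python
from math import exp

# fraction of wells expected to contain zero target copies (Poisson, P(k = 0))
for mean_copies in (10, 3, 1, 0.5, 0.1):
    print(f"mean {mean_copies:>4} copies/well -> {exp(-mean_copies):.1%} of wells empty")
```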
Several studies by Rutledge-Stewart from 2008 on provide good evidence that amplification reactions are not continuously efficient throughout the log-linear amplification phase - confounding the hopes of a simple, linear grasp of the technique/process even further. The more we think we know about the nature of this assay - the more elusive it seems to become. Poor technique and poor initial designs only add insult to injury when things aren't working well in the first place. I feel I have been lucky (result-wise), but, since I mainly run self-designed hydrolysis probe-based RT-qPCR reactions, I never really know what the primers are doing (by themselves) since I don't have the luxury of melt curve analysis to tell me more about what else is going on behind the curtain so-to-speak. All I can do is name what I use, and use what I name - in materials and methods; and try to stay as MIQE-compliant as possible as it seems a fairly good/trustworthy north star to navigate by for now.
Hi Jack, just one tiny note regarding your problem that you "never really know what the primers are doing": You always have the chance to run a gel after the PCR. Further, since you are using hydrolysis probes, you may use a reporter with an emission spectrum different from FAM/SYBR, so you can add SYBR Green to the reaction and record melting curves, provided your instrument/software is able to measure different wavelengths and possibly correct for fluorescence crosstalk.
It all depends upon the n number and how good your normalising genes are; if you use 3 good normalisers and repeat often enough, you'll see reliable less-than-2-fold changes.
I use a dinosaur GeneAmp 5700 -- but replace the parts myself. So different wavelengths are not possible. Very little time for qPCR these days as well -- onto another project that is keeping our lab 'afloat.' It will eventually involve qPCR - but down the road a little - and the assay I will run for it I've already optimized nicely and it behaves well. I'll keep the Gel-runs in mind for future endeavors - thanks Jochen.
Primer (and probe) design is key, agreed - it is probably rule #1.
When amplifications show good efficiency it is also a sign that primer and probe designs are well-suited to task. I work with a species that has no validated primer-probe sets - so I must design my own based on the best information in all available data bases keeping in mind possible splice variants and intron-exon junctions always. Primer Express has really worked well for this.
Have you noticed that all primer-probe pre-validated sets (for those species that are covered) are all to be used at 900 nM primers and 250 nM hydrolysis probe (from AB)? The take-home point here, is that, apparently, the question of when to use asymmetrical qPCR (different Fwd primer concentration than Rev primer concentration as determined by running a matrix of different concentration schemes up front) becomes less and less of a concern when primer and probe designs are optimal/pre-validated.
In those projects I've assisted where such pre-validated sets were purchased and used (in mouse), we found that reducing these concentrations to half gave better efficiencies: 450 nM primers, 125 nM hydrolysis 6FAM-MGB probe. So, playing with final in-well primer/probe concentrations a bit does still help as well, even though the company suggests one blanket concentration for all sets...
I wonder if Origene uses a different mastermix when they 'optimize' their designs? Your mastermix perhaps is more optimal for your own designs - given different ionic strengths, Mg++, additives, betaine, etc. Effective Tm values of primers I'm sure are influenced by all of these factors. Start your own company! ;]
I've read that primer designing algorithms are improving with time... and the programs you're using are among the best (most up-to-date). Oddly, though, sometimes trial and error does a good job wherein on many occasions primers that shouldn't really be expected (theoretically/thermodynamically) to work, end up working very well. There are certainly other contours at work - secondary structures etc. IDT has a nice graphic that shows the secondary structures of the target involved with each primer choice. But, even with sophisticated physical chemistry-based thermodynamic features figured in, a random choice can still win the triple crown. You may have the insight to know what to look for in your primer sequence choices. Fidelity of primers/probes for the exact target is a patient lonely exercise in tapping all available information in all available data bases and/or publications. Nonetheless, I never trust a "published" primer or probe set or sequence to perform the way it reportedly did in a paper. I have to see the source sequence used with my own eyes, check the date and source, and see exactly where the primers bind, then BLAST all sequences to be absolutely certain high fidelity performance is a reasonable expectation. Only one set (primers and hydrolysis probe) have I trusted in a publication, then ordered and used it - and it worked very well. All the other 200 or so sets, I have designed myself. Probe-based is always too expensive. SYBR Green-based (or Eva Green and other intercalator-based) qPCR/RT-qPCR is most certainly the most cost-effective way to go. But, the clean, specific signal I get with the hydrolysis-probe approach is hard to turn away from. Fortunately, the primers and probes currently on hand will still last a long long time stowed away safely at
Good point, Jack: good primers are found empirically, not theoretically. Primer design software can help, at least to rule out obviously bad primers. I have very often faced the situation that a primer pair *not* recommended by the software (I tried several different programs) eventually "won the triple crown". It is always worth testing several different primer pairs and then selecting the ones that perform best.
One minor correction: SYBR Green and its analogues are *not* intercalators but minor-groove binders. The generation of fluorescence (gain of fluorescence emission) is entirely different. For intercalators, the exciting light is absorbed by the DNA and longitudinally transferred by Dexter transfer until it comes across a dye molecule that emits the energy as fluorescence. Minor-groove binders absorb the light themselves, but free in solution they convert the energy to heat (translational movement of the moieties). When the molecule is fixed in the minor groove, distortions of the molecular shape are no longer possible and so the energy is emitted as fluorescence. Sure, from the application point of view the result is the same :)
Thank you for pointing this out - I've seen SYBR Green called an intercalating dye before - now I know that is not correct. Always good to learn something here! Ethidium bromide would be considered a true intercalator then.
You're welcome. And yes, ethidium bromide *is* an intercalator. The confusion was initiated by the fact that the SYBR dyes replaced EtBr for signal generation (in gels as well as in qPCR), so people took it for granted that the principle of signal generation is the same...
In fact, the best thing to do is to look at the expression level of the corresponding proteins, since the level of mRNA does not correlate with the level of the translated protein.
A Cp value usually means the "crossing point" of the amplification curve of a PCR, i.e. the cycle number at which the amplification curve crosses the threshold (this is also called a ct value, for "threshold cycle"). If you talk about this, then a Cp of zero means that the fluorescence signal of the PCR is already above the threshold even before the amplification started. A cause for actually getting such a value can only be failed signal processing (e.g. background correction) or a problem with the signal noise, because one of the first steps in the analysis of the amplification curves is to estimate and subtract the (average) signal from the first few cycles as "background" (the concrete calculations differ from software to software and are often a bit more complicated - but this is what happens essentially). Therefore the (average) signal in the first few cycles should be (very close to) zero and thus below the threshold.
Even when the PCR is overloaded with an extremely high concentration of template, you will get a relatively high signal initially, but the background correction will set the initial signal (close) to zero. The amplification curve of such PCRs will usually stay quite "flat" (there won't be any visible amplification), usually because the probes are "maxed out". So what can happen is that you "measure" no Cp (i.e. Cp > 40; the background-corrected amplification curve will not increase enough to cross the threshold), which would actually indicate the absence or a very low concentration of the template (although the extreme opposite is the case).
Generally, Cp values below 10-15 are not reliable, because either the processing went wrong or the initial template concentration was too high, which likely causes problems in amplification, signal generation, and background correction.
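To make the mechanism concrete, here is a deliberately simplified sketch of such a baseline/threshold analysis (the function name and the 5-cycle baseline window are assumptions for illustration; real instrument software is more elaborate). A signal that is "above threshold" right at the start is reported as no valid crossing point, which is the "Cp = 0" artefact described above:

```python
import numpy as np

def crossing_point(fluorescence, threshold, baseline_cycles=5):
    """Fractional cycle (0-indexed) at which the background-corrected signal
    first crosses the threshold; None if it never does, or if it is already
    above the threshold at the very start (the 'Cp = 0' artefact)."""
    f = np.asarray(fluorescence, dtype=float)
    corrected = f - f[:baseline_cycles].mean()        # crude background subtraction
    above = np.nonzero(corrected > threshold)[0]
    if above.size == 0 or above[0] == 0:
        return None
    i = above[0]
    # linear interpolation between the two cycles around the crossing
    return (i - 1) + (threshold - corrected[i - 1]) / (corrected[i] - corrected[i - 1])
```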
There are such long replies that I could not read them all, but I just want to emphasize that the level of the transcripts does not correlate with the level of the proteins that govern the gene's function. There are strong messengers that are efficiently translated (and translation is a good amplifier of gene expression) and weak messengers. A strong messenger, even if its level in a cell is low, can produce many functional proteins, and the reverse is true. And RT-qPCR results do not correlate with the level of mature messengers.
In addition, some messengers can, for various reasons, also be silenced in a cell: the mRNA is present but not the protein.
Therefore, even if the level of the messenger is important do not forget the proteins.
I think we should all start to consider using "mRNA expression" instead of "gene expression". Of course, a difference in mRNA levels above 2-fold may be a good suggestion of increased gene expression, in particular at some point of an experiment, especially in a time course using an inducer.
I watched your webinar (direct link: https://www.youtube.com/watch?v=RFqKvqZ8ONE&feature=youtu.be ) and very much liked the presentation of the concepts of Limit of Detection and Limit of Quantitation. The implication for this discussion seems to be that for some samples with amounts below the limit of quantitation, a fold change cannot be measured. And since the limit of detection is lower than the limit of quantitation, there will be some samples where the analyte signal may be detected but there is still no possibility of reliable quantitation - and thus no possibility of reliable relative quantitation.