I want to analyze the results of a plate with a target gene and 2 or 3 housekeeping genes. How does it work? How to combine the housekeeping genes? How would the calculation (delta Ct; delta-delta Ct) work with 2 or 3 housekeeping genes?
In the first step, you calculate the delta Ct (control - sample) for the target gene and for each housekeeping gene. Then you calculate the geometric mean over all housekeeping genes for each sample, and use this mean to calculate the relative expression:
Ratio = E_target^(deltaCt_target(control - sample)) / geometric mean over all reference genes of [ E_reference^(deltaCt_reference(control - sample)) ]
where E is the amplification efficiency, which you can determine with a dilution series (E = 10^(-1/slope), with the slope taken from Ct plotted against log10 of the dilution) or set to 2, as in the delta Ct method.
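A minimal sketch of the ratio described above, assuming hypothetical Ct values and efficiencies. The function names (`efficiency_from_slope`, `relative_expression`) are made up for illustration and do not come from any qPCR package.

```python
import math

def efficiency_from_slope(slope):
    """Amplification efficiency from the slope of Ct against log10(input amount)."""
    return 10 ** (-1.0 / slope)

def geometric_mean(values):
    """Geometric mean, computed as the back-transformed mean of the logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_expression(e_target, ct_target_control, ct_target_sample, references):
    """Efficiency-corrected ratio normalized to several reference genes.

    references: list of (efficiency, ct_control, ct_sample), one tuple per reference gene.
    """
    target_term = e_target ** (ct_target_control - ct_target_sample)
    ref_terms = [e ** (ct_c - ct_s) for e, ct_c, ct_s in references]
    return target_term / geometric_mean(ref_terms)

# Hypothetical numbers, for illustration only
e_goi = efficiency_from_slope(-3.32)  # a slope near -3.32 gives E close to 2
refs = [(2.0, 18.0, 18.2), (2.0, 20.0, 19.9), (2.0, 22.0, 22.1)]
ratio = relative_expression(2.0, 25.0, 23.0, refs)
print(round(e_goi, 3), round(ratio, 3))
```

In this sketch each reference gene enters the denominator with its own efficiency, and the geometric mean is taken over the resulting per-gene terms.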
Nicole, if E is not 2 (or you have reasons to believe that it isn't) and you correct for it with an erroneous estimate from a standard curve, you get some error propagation! Have you ever checked the width of the confidence interval you get then? Substituting one error with another that is subsequently ignored is not always a good way to go.
For Ct values, the geometric mean of the underlying quantities amounts to a simple average (arithmetic mean) of the Cts. The best method with 2 or more housekeeping genes is to take the average of the Ct values of the housekeeping genes and then use that for the further calculation.
Note that the geometric mean simply corresponds to the arithmetic mean (average) of the log-values that is back-transformed to the original scale.
Often, the error distribution of values around a center is not symmetric. In biology, this often coincides with or follows from effects that are multiplicative on the relevant scale. Gene regulation is an example. A cell won't count whether there are 100 mRNA molecules more. If the concentration of this mRNA is high (some billion molecules in the cell), 100 more or less won't matter. If the concentration is low, a change by 100 molecules may have a relevant biological effect. The cell actually "recognizes" changes of the mRNA concentration as proportions. This can be modeled with multiplicative models, where effects are given as factors instead of simple additive terms. However, most statistical procedures were developed for additive models. Fortunately, the logarithm of a multiplicative model *is* an additive model (over the logarithms):
log(A*B*C*...) = log(A) + log(B) + log(C) +...
Now the ct-values measured *are* already logarithms of the mRNA concentration. Actually, ct = -p*log(Conc) (up to an additive constant), where p is a typically unknown proportionality factor (p>0) depending on the instrument, the signal chemistry, the threshold settings, the amplification efficiency and possibly many more things. A dct is the difference of two such logarithms, so actually a log-ratio of concentrations. There are still the proportionality factors, so a dct has no particularly interpretable meaning in itself. But it can be compared to another dct value (from another sample) that was calculated for the same genes in the same way.
Averaging dct-values is mathematically identical to taking the log of the geometric mean of the corresponding ratios.
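The identity between averaging on the log scale and taking a geometric mean on the linear scale can be checked numerically. A small sketch with made-up relative quantities (the numbers are assumptions for illustration, not measured data):

```python
import math

# Hypothetical relative quantities on the linear scale
quantities = [0.5, 2.0, 4.0]

# Arithmetic mean on the log scale (this is the scale ct and dct values live on)
logs = [math.log2(q) for q in quantities]
mean_of_logs = sum(logs) / len(logs)

# Geometric mean on the linear scale
geometric_mean = math.prod(quantities) ** (1 / len(quantities))

# Both routes give the same number once expressed on the same scale
same = math.isclose(mean_of_logs, math.log2(geometric_mean))
print(mean_of_logs, math.log2(geometric_mean), same)
```

So averaging the log-values directly and back-transforming only at the very end (if at all) gives the same result as working with geometric means throughout.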
I do not understand why some authors start confusing the community by emphasizing that taking geometric means is important. They should better stay on the (biologically more relevant) log-scale.
If I am using 2 reference genes, how can I find the relative expression? In the Pfaffl method, which reference gene's efficiency do I have to use in the denominator? Should I calculate the geometric mean of the efficiencies too?
Hi, I think I have the same question as Ahad: what if I have, let's say, 3 control samples and 3 treated samples, each of which were treated and analyzed in parallel. For example, RNA from 3 patients and 3 healthy people exposed to the same drug. If I have data for 3 control genes and the GOI for all my 6 samples how can I compare the two groups (healthy people and patients)?
I assume that your PCR assay is well optimized. Then this is the simplest way to analyze your data:
a) for each sample, average the ct values of the 3 ref genes --> ct[ref] = mean( ct[ref1], ct[ref2], ct[ref3] )
b) for each sample, calculate the dct as the difference dct = ct[ref] - ct[goi]
c) you should now have six dct values, three for each of your two groups (healthy and diseased). You can calculate the ddct as mean(dct[diseased]) - mean(dct[healthy]), and you can compare the dct values of the two groups with a t-test, etc.
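Steps a) to c) can be sketched as follows. All Ct values and sample names are hypothetical, and the last line assumes an amplification efficiency of 2, which is an assumption of this sketch rather than part of the steps above.

```python
from statistics import mean

# Hypothetical Ct values: three reference genes and the gene of interest (goi) per sample
samples = {
    "healthy_1":  {"ref": [18.0, 20.0, 22.0], "goi": 26.0},
    "healthy_2":  {"ref": [18.5, 20.5, 22.5], "goi": 26.5},
    "healthy_3":  {"ref": [19.0, 21.0, 23.0], "goi": 27.5},
    "diseased_1": {"ref": [18.0, 20.0, 22.0], "goi": 24.0},
    "diseased_2": {"ref": [18.5, 20.5, 22.5], "goi": 24.5},
    "diseased_3": {"ref": [19.0, 21.0, 23.0], "goi": 25.5},
}

def dct(sample):
    # a) average the reference Cts, b) dct = ct[ref] - ct[goi]
    return mean(sample["ref"]) - sample["goi"]

# c) collect the per-sample dct values by group
dct_healthy = [dct(s) for name, s in samples.items() if name.startswith("healthy")]
dct_diseased = [dct(s) for name, s in samples.items() if name.startswith("diseased")]

ddct = mean(dct_diseased) - mean(dct_healthy)
fold_change = 2 ** ddct  # only valid if the efficiency is (close to) 2

print(dct_healthy, dct_diseased, ddct, fold_change)
```

With dct defined as ct[ref] - ct[goi], a positive ddct means higher expression in the diseased group. The two lists of dct values are what would go into a t-test (for instance `scipy.stats.ttest_ind`, if SciPy is available).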
I have no control group or test group, but a study with mixed interventions: intervention A, B, C, A and B together, B and C together, or A and C together. I have three housekeeping genes and several target genes. Which formula should I use in this situation: the geometric mean of the 3 housekeeping genes as a normalization factor, or the average of the three housekeeping genes?
And do I need to calculate the dct as ct[ref] - ct[goi], as in the formula above?
Thank you very much, Jochen Wilhelm and the others.
I now have a better idea of why these calculations are performed in qPCR analysis, and of how to do them with more than one reference gene. I understood the model (!)
I have a couple of questions regarding the SD/SE, namely when normalizing to several internal control genes.
Should the standard deviation of the technical replicates be carried forward into downstream calculations? Or should we treat the mean Ct of the technical replicates as our "true" value for that sample, and only calculate the SD of the delta Cts from the biological replicates? I've seen conflicting information regarding this.
I am also not sure how to calculate the SD of delta Cts when we have multiple control genes.
I don't think you need to carry the values of technical replicates into the downstream calculations. It will unnecessarily complicate the analysis. It would be better to calculate the mean of the technical replicates and use that mean as the value of an individual biological replicate. What you can do, though, is calculate the relative standard deviation (RSD) of the technical replicates to see whether your mean ct values are acceptable or not. You can arbitrarily set the RSD threshold somewhere between 5% and 10%. If the RSD of the technical replicates of a sample exceeds that threshold, reject that sample or repeat the PCR on it.
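A minimal sketch of such an RSD check, assuming hypothetical Ct triplicates and using the lower end (5%) of the arbitrary threshold range mentioned above:

```python
from statistics import mean, stdev

RSD_THRESHOLD = 5.0  # percent; arbitrary choice within the 5-10% range

def relative_sd_percent(ct_replicates):
    """Relative standard deviation (sample SD divided by mean) in percent."""
    return 100 * stdev(ct_replicates) / mean(ct_replicates)

# Hypothetical technical replicates (Ct values) per sample
technical_replicates = {
    "sample_1": [24.1, 24.3, 24.2],
    "sample_2": [24.0, 27.5, 24.4],
}

# Samples whose replicates scatter too much to trust the mean Ct
flagged = [name for name, cts in technical_replicates.items()
           if relative_sd_percent(cts) > RSD_THRESHOLD]

# Mean Ct per accepted sample, to be used as one biological replicate downstream
mean_ct = {name: mean(cts) for name, cts in technical_replicates.items()
           if name not in flagged}

print(flagged, mean_ct)
```

Here `sample_2` is flagged because one of its three replicates lies far from the other two, and only the mean Ct of `sample_1` is passed on.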