Please see this link; it also discusses the difference between parametric and non-parametric tests: https://www.biogazelle.com/seven-tips-bio-statistical-analysis-gene-expression-data
"Seven tips for bio-statistical analysis of gene expression data"
I have some problems with the advice regarding "parametric" and "non-parametric" tests. They test different hypotheses, so they are NOT really comparable. I have to decide which hypothesis I want to test - this will essentially fix the test I have to do. Some hypotheses simply cannot be tested validly with certain data. Then there simply is no way to test the desired hypothesis. Just jumping to another, different hypothesis, only because that one can be safely tested with the data, is not what I'd call science.
Another point I am concerned about is the multiple testing. Again, it is NOT the case that "if you analyzed several targets then you have to correct for multiple testing". No. It rather depends on what kind of error rate you want to control (if you want to control an error rate at all; this may sometimes not be sensible in a research setting). Clearly, in screenings you may have to control an error rate - unless you are just seeking a few "top candidates" to process further anyway. And then there is the question whether the FWER or the FDR should be controlled (again depending on the aims: what should actually be done with the list of candidates).
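To make the FWER/FDR distinction concrete, here is a minimal sketch in Python (statsmodels assumed; the p-values are purely hypothetical) contrasting Bonferroni adjustment (FWER control) with Benjamini-Hochberg adjustment (FDR control):

# Minimal sketch: FWER vs. FDR control on hypothetical p-values from several qPCR targets
from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.008, 0.020, 0.040, 0.300]  # hypothetical, one p-value per target

# FWER control (Bonferroni): guards against even a single false positive
reject_fwer, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")

# FDR control (Benjamini-Hochberg): tolerates a small fraction of false positives
# among the targets called "significant" - often more suitable for screenings
reject_fdr, p_bh, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print("Bonferroni rejections:", list(reject_fwer))
print("BH (FDR) rejections:  ", list(reject_fdr))

Typically the FDR-controlling procedure rejects more hypotheses than the FWER-controlling one on the same list of p-values; which of the two is appropriate depends on what the candidate list will be used for.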
The paper from Schmittgen and Livak reproduces the same unfortunate mistake the authors introduced in their first paper from 2001:
http://www.gene-quantification.net/livak-2001.pdf
Further they state:
"When real-time PCR data is to be presented as individual data points it should be presented as 2-DCT or 2-C rather than the raw CT value" (for what they cite their own 2001 paper). - I strongly disagree here. It is also striking that they easily accept the presentation of DCT in heatmaps (but they do not mention why this should be sensible, or why showing DCT values should be non-sensible in other plots/diagrams).
In "Example 2" they present two values for the very same measure: the "Fold change due to treatment" is presented as 0.287 as well as -3.5. They really call it both the same ("Fold change due to treatment"). This is annoying.
In "Example 3" they average fold-changes. Another no-go.
I could continue...
In my opinion, this paper is a great source of things that can really be messed up in real-time PCR. It is embarrassing that such papers make it into Nature Protocols. I really would like to know who the reviewers of this paper were.
I feel sad that the authors did not correct their mistakes, not even after 7 years, and that nobody else did. The 2008 paper about the analysis is still authored by an employee of Applied BioSystems and a pharmacologist. There was no data analyst, no mathematician and no statistician involved, not even mentioned in the acknowledgements. And this in a paper that is supposed to highlight data analysis.
Normalizing to 1 implies you are transforming log values into linear values, that is, you are using the 2^-ΔΔCT formula instead of ΔΔCT. This will possibly change a bit the significance of any gene expression modulation observed. Indeed, any inhibition of expression will be flattened between 0 and 1, whereas increases in expression will occur between 1 and infinity. On the other hand, log values allow you to normalize to 0, therefore displaying negative effects in the 0 to -infinity range and positive effects in the 0 to +infinity range.
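A small Python sketch of this asymmetry (numpy assumed; the ΔΔCt values are hypothetical): on the log2 (ΔΔCt) scale a 2-fold up- and down-regulation are symmetric around 0, while on the linear (2^-ΔΔCt) scale the down-regulation is squeezed into (0, 1) and the up-regulation spreads over (1, infinity).

import numpy as np

ddct = np.array([-1.0, 0.0, 1.0])   # log2 scale: symmetric around 0
linear = 2.0 ** (-ddct)             # linear scale: 2.0, 1.0, 0.5

print(ddct)    # [-1.  0.  1.]
print(linear)  # [2.  1.  0.5]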
As regards statistics, it depends on your sample number and type. Generally, if you have a small sample size from animal tissues, a non-parametric test is more appropriate. Otherwise, for in vitro studies both parametric and non-parametric tests are accepted, also for a limited number of samples. In any case, if you want to be sure your data follow a normal distribution, perform a Shapiro-Wilk test.
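For reference, a minimal sketch of such a Shapiro-Wilk check in Python (scipy assumed; the replicate values are hypothetical):

# Shapiro-Wilk normality check on hypothetical dCt replicate values from one group
from scipy import stats
import numpy as np

dct = np.array([0.8, 1.1, 0.9, 1.4, 1.0, 0.7])  # hypothetical replicates

stat, p = stats.shapiro(dct)
print(f"W = {stat:.3f}, p = {p:.3f}")  # a small p suggests a deviation from normality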
Sabrina, the test to be used is not dictated by the sample size. It is given by the hypothesis you want to test (only if there are several tests testing the desired hypothesis should one choose the one with the highest power). Parametric and non-parametric tests generally do not test the same hypothesis and so they are not simply interchangeable.
If tests are performed "just to get a p-value", well, then it does not matter anyway what test one uses. Given the obviously common practice that tests are interpreted after adjusting the data for the test or selecting the test based on the same data, or that different tests are tried and the one with the lowest p-value is eventually selected, and that experiments are repeated until the p-value is low enough for publication... we do not need to care about anything (except one point: the reviewers have to accept the paper).
Note that your suggestion to perform a Shapiro-Wilk test falls into this category: selecting the test based on the (same) data. This advice is flawed for two reasons: 1) the test does not tell you the important thing: "is the violation of the assumption critical/relevant for the problem under consideration?" (if it is "significant", the violation may still not be a problem at all; if it is "non-significant", you do not know if you made a type-II error) and 2) as indicated above, selecting the test based on a feature of the data to be tested renders the p-values meaningless. A possible remedy would be to have independent data for judging the data characteristics and the appropriate test, but then to perform the test on different, independent data.
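To make point 2 concrete, one can estimate by simulation what the two-stage procedure (pre-test for normality on the same data, then choose a t-test or a Mann-Whitney test accordingly) actually does to the type-I error rate. A minimal sketch in Python (scipy/numpy assumed; the simulation settings are hypothetical):

# Simulate the two-stage procedure under a true null hypothesis and estimate
# its actual type-I error rate, to compare against the nominal 0.05
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sim, n, alpha = 10_000, 8, 0.05
rejections = 0

for _ in range(n_sim):
    # skewed data, identical distributions in both groups -> null hypothesis is true
    a = rng.exponential(scale=1.0, size=n)
    b = rng.exponential(scale=1.0, size=n)

    # stage 1: Shapiro-Wilk pre-test on the pooled data
    if stats.shapiro(np.concatenate([a, b])).pvalue > 0.05:
        p = stats.ttest_ind(a, b).pvalue   # "data look normal" -> t-test
    else:
        p = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue

    rejections += (p < alpha)

print("estimated type-I error:", rejections / n_sim)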
Another great program to use is CLC Genomics Workbench. And you can find their manuals and tutorials online at their website http://www.clcbio.com/products/clc-genomics-workbench/
Very interesting comments, Jochen. However, I thought that parametric tests were to be used for data with a gaussian distribution, whereas non-parametric tests should be used for data for which you don't know whether they are gaussian or not. But in both cases, I thought the test was actually to determine the probability of being mistaken in rejecting the "no difference" hypothesis. Am I wrong about this? It does not seem to be what you say...
A parameter is a property of a "parametric" model. Such models have a "deterministic" part (describing the quantitative relationship between the predictors and the response) and a "stochastic" part (describing our expectations about the variability of the response due to factors we do not specifically consider in the model [for whatever reasons]). The stochastic part determines how uncertainties about model parameters are to be derived (and, thus, how hypotheses about parameter values are "tested"). This may be based on the gaussian distribution, but it may also be based on other probability distributions.
A non-parametric test is not based on such a model. Rather than evaluating a quantitative relationship such non-parametric procedures consider only some stochastic association of ranks (order statistics).
Parametric and non-parametric tests are therefore necessarily about very different hypotheses, and they are not alternatives. They are just different things, answering different kinds of questions.
For instance, if you want to test whether the expectation of the response depends on some predictor and the data (the residuals) have a gaussian distribution (at least approximately), then the t-test is appropriate. If the residuals are considerably non-gaussian, then the t-test is not correct. But there is no other simple test for testing the desired hypothesis. A Wilcoxon test (often proposed as the "alternative") will test a different hypothesis. The only available alternative to really test the desired hypothesis is a bootstrap-based test.
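As an illustration, a minimal sketch of one possible bootstrap-based test for a difference in means in Python (numpy assumed; the data are hypothetical ΔCt values); this shows the general idea, not a definitive recipe:

# Bootstrap test for a difference in means: impose the null hypothesis by
# centering both groups on the pooled mean, resample, and compare the observed
# statistic against the resampled distribution
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.2, 0.8, 1.5, 2.9, 1.1, 0.9])   # hypothetical group 1
y = np.array([2.1, 3.5, 1.9, 2.8, 4.2, 2.4])   # hypothetical group 2

observed = x.mean() - y.mean()

pooled_mean = np.concatenate([x, y]).mean()
x0 = x - x.mean() + pooled_mean   # both groups now share the same mean (null model)
y0 = y - y.mean() + pooled_mean

n_boot = 10_000
boot_stats = np.empty(n_boot)
for i in range(n_boot):
    xb = rng.choice(x0, size=x0.size, replace=True)
    yb = rng.choice(y0, size=y0.size, replace=True)
    boot_stats[i] = xb.mean() - yb.mean()

# two-sided p-value: how often the resampled statistic is at least as extreme
p = np.mean(np.abs(boot_stats) >= abs(observed))
print("bootstrap p-value:", p)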
But the real problem and the good solution are yet something different! The "non-gaussian distribution" of the residuals tells us that our model ignores some obviously relevant information or that it is systematically wrong. Either we are asking some non-sensical question, or we won't get a good answer as long as we ignore what the data try to tell us.
In practice, people usually do not care about the kind of hypothesis they want to test. Actually (I think) most don't want to test any hypothesis at all; they only want a p-value. They also don't care what it really means or what they really do. The relevant part is that the p-value should be smaller than 0.05, so that they can call the result "significant", and that a reviewer won't moan that they used a wrong method to get the p-value. Looking at it this way, using the Wilcoxon test is never the wrong method to get a p-value, since the tested hypothesis is a) not stated and b) will anyway depend on the actual distribution. That the actually tested hypothesis is eventually not what the authors were interested in is a topic that is usually never discussed.
So your point of view seems to come out of this fashion: "If the data are normal, you can use a t-test to get a p-value and no-one will criticize it. But if not, better use a rank-based test to get a p-value, so the reviewers will wave it through".
This is, in my opinion, related to our publish-or-perish culture but has only little to do with good science, which should ask for a good model (rather than for a small p-value).