Hi all, does anyone have an idea how to calculate a p-value from RNA-seq data if we have only one value for the fold change? I have the average of two replicates, not the individual values, so I have only one value.
What sort of RNA-seq data are you looking at? Are you trying to do a differential expression analysis? If so, you can check whether the DESeq vignette has any information (it covers some examples using only 1 sample).
If these are technical replicates, then it won't be worth calculating any p-value. So I assume these are biological replicates.
You know n (2) and the log fold-change. If you additionally knew the standard deviation or the standard error of the log fold-change, you could calculate the t statistic, and you know the number of degrees of freedom for this value (which is n-1 = 1). Since the assumption of a normal distribution is reasonable for log fold-changes, you can get your p-value from the t-distribution with 1 d.f. The p-value for your t statistic tests the hypothesis that the mean log fold-change is 0.
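A minimal sketch of that calculation in Python, using only the standard library. The function name and the numbers are made up for illustration; it assumes you have the mean log fold-change and its standard error from n = 2 biological replicates. With 1 d.f. the t-distribution is the standard Cauchy distribution, so the two-sided p-value has a closed form and no statistics package is needed.

```python
import math

def p_value_t_1df(mean_lfc, se_lfc):
    """Two-sided p-value for H0: mean log fold-change = 0, with 1 d.f."""
    if se_lfc <= 0:
        raise ValueError("standard error must be positive")
    t = mean_lfc / se_lfc
    # The t-distribution with 1 d.f. is the standard Cauchy distribution:
    # CDF(t) = 1/2 + atan(t) / pi, so P(|T| > |t|) = 1 - 2 * atan(|t|) / pi.
    p = 1.0 - 2.0 * math.atan(abs(t)) / math.pi
    return t, p

# Hypothetical values: mean log2 fold-change 2.0 with standard error 0.5
t, p = p_value_t_1df(mean_lfc=2.0, se_lfc=0.5)
print(f"t = {t:.2f}, p = {p:.3f}")
```

With these made-up numbers t is 4 but the p-value is still about 0.16, which shows how little power a test with 1 d.f. has.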
As far as I remember, DESeq2 has a parameter ("blind") for working without replicates. You could look that up in the vignette of the package. However, this is not really recommended. You may also be interested in http://www.bioconductor.org/help/workflows/rnaseqGene/.
BTW, DESeq2 uses a model for the read counts based on the negative binomial distribution, since RNA-seq counts tend to be overdispersed. In the absence of replicates, doing a t-test on the log fold-change is a simplification and probably leads to many false positives compared to tests with biological replicates and a model applied to the count data (DESeq2 or edgeR).
Is DESeq really using a t-test on LFCs when there are no replicates? Actually, one would test the difference of the rate parameter of the Poisson distribution, which is usually achieved by a likelihood ratio test based on the Chi² distribution. This is in fact possible without replication (since the count is an estimate of the rate parameter).
The negative binomial model requires estimating two parameters, which requires at least two replicates, and should thus not be applicable to data without replication.
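To make the Poisson likelihood ratio test concrete, here is a rough Python sketch (standard library only). The function name, the exposure arguments n1 and n2 (for example library sizes) and the counts are assumptions for illustration, not taken from any package. Note that a pure Poisson model ignores biological variability, so the p-values are likely too small for real RNA-seq data.

```python
import math

def poisson_lrt(k1, k2, n1=1.0, n2=1.0):
    """Likelihood ratio test for equal Poisson rates.

    k1, k2: observed counts; n1, n2: exposures (e.g. library sizes).
    Returns the LR statistic and its p-value from Chi-square with 1 d.f.
    """
    if k1 + k2 == 0:
        return 0.0, 1.0
    rate0 = (k1 + k2) / (n1 + n2)  # pooled rate estimate under H0

    def term(k, n):
        # k * log(observed / expected under H0), with 0 * log(0) taken as 0
        return k * math.log(k / (n * rate0)) if k > 0 else 0.0

    g = max(2.0 * (term(k1, n1) + term(k2, n2)), 0.0)
    # Chi-square survival function with 1 d.f.: P(X > g) = erfc(sqrt(g / 2))
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p

# Hypothetical counts for one gene in control vs treatment, equal library sizes
g, p = poisson_lrt(100, 150)
print(f"LR statistic = {g:.2f}, p = {p:.4f}")
```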
I just interpreted your comment as doing this manually on the LFCs.
True, for the negative binomial model both the mean and the dispersion parameter are needed.
The DESeq2 Vignette says under 5.7:
"Can I use DESeq2 to analyze a dataset without replicates?
If a DESeqDataSet is provided with an experimental design without replicates, a message is printed, that the samples are treated as replicates for estimation of dispersion. More details can be found in the manual page for ?DESeq."
My interpretation is: even then DESeq tries to get an estimate of the needed dispersion parameter. Although using samples as replicates imo only holds when there are not too many truly differentially expressed genes in the experiment. The extreme case of a single treatment vs a single control would have uncertain results regardless of which statistic is applied.
Note that in my original comment I referred to the fact that Shilpi was talking about a mean LFC from 2 replicates, and so there might be an estimate of the standard error of that mean LFC, which would allow a t-test to be performed (at least technically). The question remains whether the replicates were biological or technical, and whether it makes sense at all to look for statistics other than the mean itself.
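A small Python illustration of both points (standard library only). This is a plain method-of-moments estimate based on Var = mu + alpha * mu^2, not the procedure DESeq2 actually uses, and the function name and counts are invented. It shows that a dispersion estimate needs at least two values, and that treating a control and a treated sample as replicates inflates the dispersion for a gene that is truly differentially expressed.

```python
import statistics

def moment_dispersion(counts):
    """Method-of-moments estimate of (mean, dispersion alpha) for NB counts."""
    if len(counts) < 2:
        raise ValueError("need at least two values to estimate a variance")
    mu = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    if mu == 0:
        return mu, 0.0
    # Solve Var = mu + alpha * mu^2 for alpha; clip negative estimates to 0
    alpha = max((var - mu) / mu**2, 0.0)
    return mu, alpha

# Hypothetical gene: two real replicates of the same condition
print(moment_dispersion([90, 110]))   # small dispersion
# Hypothetical gene: control vs treated, treated as if they were replicates
print(moment_dispersion([100, 400]))  # true difference ends up in the dispersion
```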
I never meant to say that a t-test is possible for n=1 per group if there is no information available for the variance (->standard error).