Due to the high cost of RNA-seq per sample, do you think it would be acceptable if I pooled three to four biological replicates and sent this pooled sample for RNA-seq?
Of course you can do that, and you might use the results to generate hypotheses about genes that could play a role in your experiment. But you won't get the data published, you won't have any estimate of the (biological) variability of the results, and it won't be possible to assess the statistical significance of the estimated fold-changes. Many genes show considerable variation in expression between biological replicates. These genes may not be reliably linked to your treatments, and/or it may be practically impossible to reliably detect their regulation in follow-up experiments, even when they show a large fold-change in the RNA-seq data.
I would not recommend this. Always make sure you have some information about the variability of your results. I'd consider a sample size of 5 the bare minimum.
If you don't have the resources for a proper experiment, then think of a different method (or even a different research question) instead of running an underpowered experiment that provides essentially useless data. You would spend, say, one-fifth of the actually required money, but that would just mean you wasted all of it right away, so better to spend it on something else, e.g. a different but proper experiment. Saving money is entirely the wrong argument for such "n=1" experiments. The argument runs the other way: if you have a surplus of money, just enough for such an "n=1" experiment, you could do it and see whether it helps in addition to what you do anyway. But you should not depend on the results of this experiment.
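To make the sample-size point concrete, here is a purely illustrative R simulation of statistical power at n = 3 versus n = 5 per group; every number in it (mean count, dispersion, fold-change, the simple t-test) is an assumption chosen for illustration, not a recommendation:

```r
# Hypothetical power simulation: chance of detecting a 2-fold change in
# one gene with negative binomial counts (mu = 100, dispersion = 0.2,
# i.e. a biological CV of roughly 0.45) and a t-test on log2 counts.
set.seed(1)
power_at_n <- function(n, mu = 100, fc = 2, size = 1 / 0.2, nsim = 2000) {
  mean(replicate(nsim, {
    ctrl <- rnbinom(n, mu = mu,      size = size)
    trt  <- rnbinom(n, mu = mu * fc, size = size)
    t.test(log2(trt + 1), log2(ctrl + 1))$p.value < 0.05
  }))
}
power_at_n(3)   # noticeably lower power...
power_at_n(5)   # ...than at n = 5
```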
Hello! I think you can get samples sequenced for around $170.00 per sample from Novogene or BGI, depending on where you are in the world; that is about what plasmid sequencing cost not too long ago. Profiling is not very helpful without a relative comparison against a control, so at a bare minimum you would need three treatment and three control samples. I recommend more than this (four, but six is better; see the paper below), because accidents happen. With the raw data you can, and should, do principal component analysis and hierarchical clustering (e.g. with 'ggdendro'). If there isn't very good separation of your conditions and clustering of your replicates, your differential gene expression analysis will be poor. I highly recommend the R/Bioconductor software package 'limma' and the voom method; a minimal sketch follows. Here is a pretty good paper about replicates: the article "How many biological replicates are needed in an RNA-seq expe..."
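Purely as a hedged illustration of that QC-then-limma-voom workflow (the object names `counts` and `group` are assumptions; your data and design will differ):

```r
# Minimal sketch: QC plots plus a limma-voom analysis, assuming a raw
# count matrix `counts` (genes x samples) and a two-level factor `group`.
library(limma)
library(edgeR)

dge <- DGEList(counts = counts, group = group)
dge <- calcNormFactors(dge)          # TMM normalization

# QC 1: MDS plot (PCA-like); replicates should separate by condition
plotMDS(dge, labels = group)

# QC 2: hierarchical clustering of samples on log-CPM values
# (ggdendro can prettify the dendrogram if you prefer ggplot2 output)
logcpm <- cpm(dge, log = TRUE)
plot(hclust(dist(t(logcpm))))

# Differential expression with voom + limma
design <- model.matrix(~ group)
v   <- voom(dge, design, plot = TRUE)
fit <- eBayes(lmFit(v, design))
topTable(fit, coef = 2)              # top genes for treatment vs control
```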
Thanks for the insightful comments of my colleagues. However, I beg to differ on some details, as this problem goes a bit deeper. (Just my $0.02.)
1. Throwing three samples into one is, of course, a bad idea because you waste resources. Why not make three differently barcoded libraries and send them for sequencing on one lane? You do not lose information, and you can always ask for more reads if your sequencing depth is not sufficient. Thus you save on sequencing costs but keep all the options.
2. Do not make technical replicates. If you master the technique, they will be more or less identical; if you have technical problems, no replicate will help you anyway.
3. If you run biological replicates, I wouldn't use the classical R programs. Most assume that there is a "true" value that you can't measure because of random variation in your method/sample. However, that is not exactly what happens in nature. Imagine you derive three transgenic cell lines with an inducible transcription factor to find target genes. Now you compare TFon/TFoff three times and get the following values:
geneA        TFon    TFoff
sampleA:     1000      100
sampleB:      100       10
sampleC:       10        1
It is clear that geneA is very interesting. However, if you define samples A, B, and C as triplicates, most analysis programs will throw this gene out because the baseline expression has a higher variation than the overall difference between on and off.
Alas, geneA may be a perfect and important target gene, since you do not control the overall concentration of the transcription factor in the transgenic cells; it may be perfectly fine that you see this large variation in base expression. Fold-change is what counts here. In biology, a value very frequently depends on more than one factor, and not all of them can be controlled. Classical statistics fails in these cases.
Therefore I'd recommend running barcoded libraries, evaluating each one individually, and looking for the intersection of genes that come up as interesting in all three instances (see the sketch below). Then follow up on these.
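As a hypothetical sketch of that per-replicate intersection idea (the matrices `tfon` and `tfoff` and the 4-fold cutoff are assumptions for illustration, not part of any standard pipeline):

```r
# Per-pair fold-changes on hypothetical normalized count matrices
# `tfon` and `tfoff` (genes x 3 paired samples, rows named by gene).
lfc  <- log2((tfon + 1) / (tfoff + 1))   # one log2 fold-change per pair
hits <- apply(abs(lfc) >= 2, 1, all)     # at least 4-fold in every pair
rownames(lfc)[hits]                      # candidates to follow up on
```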
In the end no statistics can replace the good old biological confirmatory experiment anyway (although "wet" biology seems to be out of fashion nowadays).
Point #2 is also reasonable, as one should get the assay working reliably and reproducibly before doing hot experiments. Still, technical replicates are a valid approach to reduce or eliminate part of the technical variation in your data. For NGS, however, tech reps are expensive, and the variance added by sequencing is typically negligible compared to the biological variance.
I don't fully subscribe to point #3, since most R packages dealing with NGS (e.g. limma or DESeq) work on the log scale of read counts, and a difference of logs is a log ratio (or log fold-change). So it matches exactly what you say: "Fold-change is what counts here."
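In fact, a blocked (paired) design handles the cell-line example above directly; a minimal sketch, assuming a voom/limma workflow with a DGEList `dge` and hypothetical factors `pair` (cell line A/B/C) and `tf` (levels "off"/"on"):

```r
# Blocking on the cell line lets each line act as its own baseline, so
# geneA's consistent 10-fold induction is detected despite the 100-fold
# spread in baseline expression.
design <- model.matrix(~ pair + tf)   # `pair` and `tf` are assumed factors
v   <- voom(dge, design)
fit <- eBayes(lmFit(v, design))
topTable(fit, coef = "tfon")          # test the on-vs-off coefficient
```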
As others have said it is not a good idea to pool your biological replicates and bulk sequence them. Because you are measuring the expression of tens of thousands of genes in parallel, the odds of false positive results are exceedingly high. For this reason, measures of biological repeatability (only possible with the inclusion of biological replicates) are of central importance.
I agree that barcoding / sequencing on one lane is the way to go. However, the main expense in RNA-seq is in the labour required to prepare the libraries, so separately barcoded libraries run on one lane will only partially fix the issue of cost.
What I suggest is that you find a collaborator that is good at preparing RNA-seq libraries themselves, and include them as a co-author on any work you publish. Then you'll only need to pay for sequencing costs, which should be a lot cheaper than library prep + sequencing.
I don't suggest preparing the libraries yourself unless you plan to do a lot of RNA-seq in the future, as learning to make the libraries takes a bit of trial and error.
Hello! I do think it is best for you to do as much as you can yourself. You don't need an NGS scientist or the world's greatest statistician to help you with your plasmid sequencing. Ask if you can, I guess, but they may blow you off. Lol. Personally, I think every single molecular biologist should be able to isolate and treat their cell type of interest, isolate and characterize high-quality total RNA (or even subset preparations, e.g. miRNA), send it off for library production and sequencing (BGI or Novogene or wherever), and then qualify and characterize the results. Because bulk RNA sequencing is now so affordable, it simply cannot remain the domain of only the "qualified": that takes too long, and no one cares about the results as much as you do. That being said, you should be able to consult the best NGS scientists and statisticians to make sure you aren't messing it up. There are community standards; just make sure you know them.
My experience is probably not typical, but across a number of institutes here, I have personally witnessed very high-ranking PIs literally begging legendary NGS scientists for help with extremely mundane data analysis. The same PIs scoff when you approach them to help. Their results are mediocre and take forever, but they can take that hit no problem and are totally fine over the span of time. My takeaway is that just because someone is gifted or very well educated doesn't mean they are helpful. In addition, the "core" facilities across this area never turn away work. They pretty much know when the experiments are not scaled properly, or when the isolation and/or viability is poor, but they don't argue because it just causes problems; here the success or failure of the PI doesn't depend much on any individual success (or failure). Also, the totally dry labs are buried in garbage. I understand why people want to stay home and just process data, but that will eventually bite you (or not). To stake your own life on the data and its high quality, you pretty much have to generate the data, or the confirmation (in situ or qPCR), in the lab yourself. The only way to navigate this mess is to do it for yourself.