Variation across studies, methodological differences and the limitations of interpretation have prompted most scientists to shift to RNA-seq. But a huge amount of taxpayers' money, researchers' effort and donors' samples has gone into generating the existing microarray data. What is the value of all that now? Do you think something useful can still be produced from existing microarray data? [I think it is possible, and I may be wrong; but I am curious to know what other researchers think.]
Hi everyone
Very interesting point to discuss. In fact, microarray technology has serious advantages over RNA-seq in several respects. The most obvious is the price, but the main advantage of microarrays is their intrinsic robustness. Do not forget that these platforms have been on the market for many years, and today's microarrays are extremely sensitive and reliable. They are easily customizable, reproducible, and can be adapted to many situations.
Another advantage is the lack of dependence on "bioinformaticians", a rare human sub-species that is required for processing RNA-seq data. In fact, I like to be independent, but at some point I always have to ask the bioinformaticians to solve some RNA-seq-derived problem. Along the same lines, microarray processing software has evolved into a very user-friendly family of programs, able to run on a modest laptop yet extremely powerful. Much of this software is free, and maintained by a really excellent cohort of dedicated people. RNA-seq software is not so user-friendly, and it is constantly evolving at a rate that is difficult to assimilate even for the bioinformaticians. RNA-seq is still a developing technique, and the current standards are shaped by a commercial fight between companies like Illumina. I like to compare this with the fight over video systems in the late 70s and 80s. VHS was the winner, but it was not the best system, only the one that was cheaper and more suitable for mass production. In NGS we still do not have a clear winner.
Regarding your question about all the stored data from microarray experiments, I would like to point out that this is a real advantage for labs like mine, with limited resources. You can easily use old data to ask new questions. For instance, you can take data originally generated to study gene expression and analyze alternative splicing, and vice versa. I am currently using old array data to study lincRNA expression, but you can think of other applications. In short, there is a whole universe in GEO that still needs explorers like us.
Best
Paco
Why do you think that RNA-seq data will be any better? The problems are not caused by the technology; they are caused by experimental design and the unreflected idea that data can be validly compared across different experiments, experimental questions, and technologies (probes, amplification, signal generation). A further misconception was that deep, comprehensive and reliable "knowledge" could be obtained from array studies. We must face the fact that biology does not work like that. We now get more and more evidence from array data that we need different ways of thinking about regulatory networks, and that we must consider many more details than we were used to when looking at the set of "key player" genes that are drastically regulated anyway, with a more or less defined meaning of "up" and "down" regulation in the studied contexts. RNA-seq is different only insofar as we do not yet have this enormous body of data telling us, once again, that our understanding of regulatory networks (or "systems biology") is far too simple.
Or, thinking in another direction: "big data". Analysis is typically heuristic and may lead to useful predictions anyway, independent of any useful mechanistic insight.
Why would you think that the development of newer technologies makes the data and information from previous technologies "a waste"?
The data from microarray studies has not lost value, nor have the conclusions drawn from it and the information gained from it. It is not as if the invention of RNA-seq technologies has suddenly demonstrated that the biology gleaned from microarray studies is proven wrong, or fundamentally flawed.
That's like saying that the invention of DNA sequencing somehow made all previous genetics (RFLPs, allozymes, phenotypic studies) useless and thus a waste.
Data generating technologies constantly evolve and change, but the scientific knowledge gained along the way builds on itself. That's the natural progress of science. It's not any particular data per se that has value, it is what you learned from it that matters. And I would say we have certainly learned a very great deal from microarray studies (and will for some time yet - people still frequently publish microarray analyses so the demise of microarray technology seems a bit premature to me).
RNA-seq is, to my mind, an evolution of genetic data technology, not a revolution. It builds on what has been learned from microarrays, but hardly negates the last 15 years of genomic array efforts. And it raises as many new issues (things like unexplored sample bias, proper statistical techniques for analyses, etc) as it solves.
Very relevant question. Well, I have no personal experience with microarrays, but I can say the technology is sometimes used blindly, without giving thought to what the steps after data generation will be, how the data will be analyzed, what the investigators are expecting, and where they will focus within that ocean of data. Without a clear design and pre-planning, microarray data can go to waste.
@Gabriele
Completely agree with your opinion and doubts. However, I still think that microarrays are good enough to drive new conclusions. Of course we must be critical about the conditions and design of the experiments, but this must be applied on a case-by-case basis.
Using old data to drive new ideas is exactly the right use for old microarray data.
Thanks for your post. Best
Thank you all. A few additions from my side:
a) A lot of existing gene expression data has been produced using low-quality probes and array designs (see http://www.biomedcentral.com/1471-2164/14/922 - where we developed a method to identify alternatively spliced forms of expressed mRNAs using existing microarray data, after filtering out poor probes).
You may mislead yourself by using such data from unreliable platforms (I have experienced at least one such case).
b) The question was prompted by the opinions of a few colleagues. I think existing microarray data can actually be useful; we have been trying to put it to better use. It may be better to meta-analyze existing data rather than simply use one or a few data sets from a single study. We have tried a few meta-analysis methods to enable better use of such data (e.g. http://www.biomedcentral.com/1471-2164/11/467).
Keep in mind though that RNA-seq does not necessarily nor inherently overcome all of the limitations of microarrays. Library bias in RNA-seq may in turn lead to significant bias in SNP calling, for example. High variance in low count RNA-seq data makes estimation of differential expression in low expressors problematic (in my own experience, microarrays seem better at detecting low levels of expression, while RNA-seq is superior at high(er) levels of expression). Cost of sequencing often leads studies to include far too few replicates/samples, making their conclusions suspect simply because they used a compromised study design to begin with. Immature analytical methods also mean that sometimes, someone analyzing the exact same NGS data with an alternative software algorithm may counter previous conclusions - that is one thing that, just like in the past with microarrays, I'd expect to eventually go away as tools and analyses mature.
I'd agree that one of the great untapped resources is all the microarray data sitting in public databases, but the exact same situation exists for NGS data. More often than not, any given data accession was only analyzed with a single study or goal in mind, yet it may be useful for all sorts of other questions. If data from various studies are taken together, then even further possibilities exist. And I'm not criticizing the submitters (myself included) - we all generated data with a particular study in mind, and rarely have the funding or the opportunity to go back and see what else that data, with or without other public data, might be used for.
That's one area where cloud computing resources could play a big role - making data instantly accessible to everyone, along with the tools and the compute resources to explore it, add their own data, combine outside data, invite collaborations and even place analyses and results for immediate interactive sharing. Databases for deposition are great, but only go so far. Fully interactive cloud based data centers should be the next step and people are already working towards such things.
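The high variance of low-count RNA-seq data mentioned above can be illustrated with a back-of-the-envelope calculation: even in the best case (pure Poisson counting noise, no biological variability or over-dispersion), the relative error of a count measurement scales as one over the square root of the mean count. A minimal stdlib-only Python sketch, with made-up example counts:

```python
import math

def poisson_cv(mean_count):
    """Coefficient of variation (sd / mean) of a Poisson-distributed count.

    For Poisson data the variance equals the mean, so CV = 1 / sqrt(mean).
    Real RNA-seq counts are over-dispersed, so this is only a lower bound
    on the relative noise.
    """
    return 1.0 / math.sqrt(mean_count)

# A lowly expressed gene (mean 4 reads) vs a highly expressed one (mean 400):
low, high = poisson_cv(4), poisson_cv(400)
print(f"CV at mean 4:   {low:.2f}")   # 0.50 -> 50% relative noise
print(f"CV at mean 400: {high:.3f}")  # 0.05 -> 5% relative noise
```

This is why differential expression calls for low expressors are so unstable: at a mean of a few reads, the counting noise alone is of the same order as the fold changes being tested.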
True: "RNA-seq does not necessarily nor inherently overcome all of the limitations of microarrays"
A very good discussion! We need more of these on conceptual topics such as the advantages and limitations of the platforms currently used for genome-wide assessment of gene expression. I think that with his last comment Michael nailed it! As a person who dabbles in both technologies, I can see the advantages and disadvantages of each, and right now I'd say they balance each other out. From a pure data-generation point of view, RNA-seq is the clear winner. We do get a lot of data; however, that is also its Achilles heel, as the bioinformatics (not to mention statistical) approaches for dealing with this large amount of data are still struggling to identify patterns in the sea of data. A simple example: what is the best approach to normalize the expression data from an RNA-seq experiment? Microarray platforms, on the other hand, are by far more mature and better understood, which is understandable as they have been around for close to two decades. The analytical approaches are much better developed, and any potential issues can be far more easily recognized and dealt with. I believe that microarray platforms will be around for a while longer, especially for clinical diagnostic purposes or for the assessment of experimental conditions with well-defined expectations.
As Michael put it, one perhaps not huge but still very relevant problem with RNA-seq data generation is bias in library preparation, leading to unequal representation of transcripts in the cell, especially low-abundance ones. And that should not be a surprise either; after all, massively parallel signature sequencing was originally developed for sequencing DNA (which is far more static than RNA) and was later adapted for RNA.
I personally think, and of course I may be completely wrong, that RNA-seq in its current form will be supplanted by a sequencing-by-synthesis approach. In contrast with deep sequencing, which was adapted to sequence RNA, sequencing by synthesis is truly best suited for detecting and very precisely quantifying transcript abundance. This approach also bypasses two other issues with RNA-seq: library preparation and the inherent ambiguity of read mapping. Granted, the latter is improving as read lengths increase, but reads are still nowhere near the average transcript length of about 2 kb.
Just my 2 cents :)
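The normalization question raised above is a good example of where RNA-seq practice was (and still is) unsettled. The simplest widely used approach is counts-per-million (CPM), which corrects for sequencing depth only. A hypothetical sketch in plain Python (the gene names and counts are invented for illustration; this is not how any particular package implements it):

```python
def cpm(counts):
    """Counts-per-million: scale each gene's raw count by total library size.

    Corrects for sequencing depth only; it does NOT correct for gene length
    or for composition bias (for those, one would look at TPM or TMM-style
    methods). counts: dict mapping gene name -> raw read count.
    """
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}

# Two libraries sequenced to different depths give comparable CPM values:
lib_a = {"geneX": 500, "geneY": 1500, "geneZ": 8000}     # 10k reads total
lib_b = {"geneX": 5000, "geneY": 15000, "geneZ": 80000}  # 100k reads total
print(cpm(lib_a)["geneX"], cpm(lib_b)["geneX"])  # both 50000.0
```

The point of the toy example: geneX gets ten times more raw reads in the deeper library, yet the same CPM, so depth alone no longer masquerades as differential expression. Which corrections beyond depth are appropriate is exactly the part that remains debated.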
@Vladimir, great point! I am also dealing with the generation of both types of data, and I completely agree with your comment about high-throughput techniques being developed for DNA and then carried over to the field of RNA.
However, as a curious scientist, I got very excited about your idea that current sequencing techniques will be supplanted by sequencing by synthesis. I personally, and the whole forum, would be grateful if you could develop this idea further. If you have references, opinions, texts, etc. to share with us, that would be great.
Thanks Vladimir, we appreciate your participation.
Cheers
Paco
Microarray is a mature technology with an established statistical framework, and a huge amount of data was generated from microarrays in the last decade. RNA-seq is still in a developing phase. It would be more beneficial to discuss how we can correlate the existing microarray data with new RNA-seq data, and how to make RNA-seq data as robust as microarray data, with quality control and standard processes from RNA extraction through library prep to a statistical framework for the analysis.
As in any other field of Science and Technology, we are constantly moving towards more efficient and reliable methods to measure biological parameters; in this particular case, gene expression, or more accurately, RNA quantity.
In this sense, microarray technology was the expected evolution from the old hybridization-based Northern blot, just as RNA-seq probably descends from NGS and pyrosequencing. I think we can all agree on this, and also on the often forgotten fact that whoever uses any technology must be aware of its limitations, biases and flaws. I do not think microarrays brought the definitive answer to gene expression analysis, but they certainly opened the door to genome-wide thinking, and we have learned a lot from those experiments; mind you, from the good experiments. This is an extremely important point, because tons and tons of microarrays have been (and still are) run without proper phenotype descriptions of the samples analyzed, without standardized techniques, and without appropriate comparison groups and sample sizes. This has inflated the amount of data, but lowered the quality of the information and of the conclusions that can be robustly drawn from it.
RNA-seq is a marvelous tool that will help us learn more about the myriad of transcripts that arise from DNA genomes, and can give multi-dimensional information in terms of the quantity and structure of transcripts. Now, how we use this technology, how careful we are with sample processing, comparison group selection, sample size, and everything else that is important in any biological experiment: this will make the real difference between data that is reliable (as are thousands of good Northern blots, RT-PCR experiments, microarrays, etc.) and will remain reliable when this technology is overtaken by the next, and the presumably huge amount of rubbish that will accumulate alongside, as it always has.
Standardized methodologies, robust QC criteria and requirements for publication, publicly available data repositories and, above all, scientists' responsibility are necessary to produce reliable and durable results, whichever technology we use, be it space-age or stone-age.
It is not the science learned from microarrays that will be lost, but the microarrays themselves that will fade away down the line. RNA-seq is evolving, and like any technique it takes time to become established in both accuracy and cost; so it is with NGS techniques. Microarrays, being more reliable and established, have also provided a good amount of help in proving and improving the scientific viability of RNA-seq transcriptomic analysis. As such, there is nothing wrong with microarrays for small-scale analysis, but now that we are moving genome-wide they cannot handle that scope. For small-scale studies narrowed to a particular region or a specific analysis, however, microarrays may still be useful for validating our confidence.
The microarray data generated so far, being more reliable at this point in time while RNA-seq is still evolving, is really useful and can be used to validate our hypotheses and the findings from RNA-seq.
Also, as the scale of analysis widens and we try to uncover many complex phenomena through RNA-seq, the data analysis and tools are going to be more complex compared to microarray analysis, and will need a computational expert.
We work with all the data sets for a particular protein (microarray, RNA-seq and ChIP-seq, and we also plan CLIP-seq in the future), and so have good insight into how all the data should be combined to obtain reliable biological outcomes.
Thanks for this great discussion - very interesting! I was recently speaking on this topic with sales representatives from Affymetrix, and they made the point that microarrays can be a great independent validation for RNA-seq. I am inclined to validate my own RNA-seq data with qRT-PCR, which is fairly labour-intensive and requires cherry-picking a few very interesting genes. This is my best option as I work on non-model organisms. Of course, for model organisms with commercially available chips it is a different story. Given that microarrays and RNA-seq have different chemistries and different limitations, running both of them in parallel really does sound like a great way of validating an entire experiment in this case.
@Helen
Very interesting idea... I think that this is a great point to take into consideration.
RNA-seq is clearly the next step in genomics, and it will be used for far more than differential transcription.
Microarray data is just as accurate and valid today as it was in its prime. As with all approaches, it has both strengths and limitations. The most critical negative is that microarrays require good bioinformatics in making the chip in the first place. Many commercial chips were, shall we say, limited in accuracy.
In our country RNA-seq is still expensive. To assay individuals from a large sample we have, up to this moment, had to use microarray expression profiling, although in some projects we have used RNA-seq.
This is a very informative discussion. As a newcomer to microarray analysis, I am very enthusiastic about its value. Pathway analysis and biomarker prediction are possible with microarray data, and they give valuable leads which need to be confirmed with other methods. As a biochemist, I prefer confirmation of microarray data with proteome data, but this is not freely available. Any suggestions on this last point would be appreciated!
@Venil, it depends what kind of confirmation you are looking for, and in what system. There are freely available public repositories of proteomics data, such as PRIDE, but these may not have data from the particular organisms/cell lines etc. you want, and there can be all sorts of reasons for inter-sample variability. The best plan is to isolate protein and RNA at the same time from the same samples and do parallel proteomics and transcriptomics. (Transcriptomics is great for biomarker discovery but does not always have a great correlation with protein levels and may not be so useful for elucidating the actual molecular mechanics of your system.)
https://www.researchgate.net/post/Gene_expression_and_protein_levels_How_good_are_the_correlations?exp_tc=tprc
The value of expression data depends on fit-for-purpose experimental design and execution; on the coverage and quality of the accompanying information, including biological sample information and measurement characteristics (sensitivity, specificity, accuracy and precision, etc.); and on the data's usability and re-usability for additional mining. This principle applies to data sets generated by any technology. Technology evolves, but good data sets keep their value in biology research.
Currently, re-use of data from microarrays is clearly more frequent and more efficient than re-use of RNA-seq data.
This is because:
- ArrayExpress and GEO provide well-curated microarray data sets
- microarray data analysis pipelines are well developed and are easier today for non-bioinformaticians (making the pool of people who can actually benefit from microarray data much bigger than for RNA-seq)
However, if the cost of sequencing continues to drop, the situation will change soon.
I am also sure that user-friendly software for people who have never heard of Linux, Python, Perl or R is coming, just as it came for microarray data analysis.
RNA-seq and microarrays are both typically dependent on polyadenylation; this is an example of a shared bias. Both are only screening methods. Both can be really valuable if one can select something interesting. Changes in RNA levels by themselves are not very informative, because transcriptional regulation is relatively expensive for the cell.
Dear Kshitish K Acharya,
My direct answer to your question is that DNA microarrays remain crucial until RNA sequencing becomes comparable in terms of money and time.
We can still extract loads of precious information with more advanced analytical techniques that were not available when the microarrays were first run.
As a matter of fact, my lab is analyzing publicly available data sets to complement our findings.
@Przemyslaw, it depends on how you prepare your library. It does not have to use poly-A selection; you can use ribosomal RNA depletion to capture all RNA, including non-coding RNA, for example. RNA-seq is also not just a screening method; it can be used for de novo discovery if done right.
RNA-seq is not just a screening method, and it depends on well-established methods for obtaining the starting material just as microarrays do. RNA-seq is not the future but the present, although microarrays are more versatile and easier to use. An additional strength of microarrays is the ability to draw on independent studies using available data sets.
I almost agree with Alexandre! I am not sure microarrays are more versatile: they do not cover all the possibilities, since one needs to define a priori which part of the genome is going to be covered.
Dear Carlos,
I fully agree that microarrays have the already-known limitations. I have been predicting the problems of the non-coding regions ever since the Human Genome Project made its impact. Maybe RNA-seq can help us with that Pandora's box, where DNA microarrays can no longer help us.
@Tong Zhu Very well said. Your excellent statement is at the core of every study in science. Good-quality data (both measured features and metadata) are the basis of a dataset's value, whatever technology created it.
Well, RNA-seq has a great advantage over microarrays: zero means zero.
As a technique, it is an ancillary method, not a definitive one.
"Zero means zero" is only halfway correct :)
There is still the issue of the limit of detection and the limit of quantification.
I believe that even in RNA-seq, counts of 1 and 2 may really mean zero, and we have to discard those frequencies.
The reality is quite simple to analyze and describe: we have detected some ghost effects with RNA-seq, and the results are everything but 0 or 1. We advised one of our Ph.D. students to abandon his RNA-seq experiments because he was chasing faint signatures.
On the zero matter, of course, one should bear in mind the QC on the RNA preparation. Traces of DNA will eventually bias the calling of 1's and 2's as false positives, but you can estimate the genomic background in order to correct for those. One therefore needs strict QC before library prep to reduce DNA contamination in libraries to acceptable levels. Secondly, I agree that greater sequencing depth combined with more biological replication can turn apparent zeros into true positives if you are looking for rare but significant events.
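In practice, the limit-of-detection concern discussed above is usually handled by filtering: genes whose counts could plausibly be background (DNA traces, mapping artefacts) are dropped before any testing. A minimal sketch of such a filter; the thresholds (3 reads in at least 2 samples) and the gene names are arbitrary choices for illustration, not a recommended standard:

```python
def filter_low_counts(counts_per_sample, min_count=3, min_samples=2):
    """Keep a gene only if it reaches at least min_count reads in at least
    min_samples samples. Isolated counts of 1 or 2 are treated as below
    the limit of quantification, as discussed in the thread above.
    counts_per_sample: dict mapping gene name -> list of raw counts."""
    return {
        gene: counts
        for gene, counts in counts_per_sample.items()
        if sum(c >= min_count for c in counts) >= min_samples
    }

data = {
    "geneA": [0, 1, 2, 0],     # the 1's and 2's that may really mean zero
    "geneB": [12, 9, 15, 11],  # clearly expressed in every sample
}
print(list(filter_low_counts(data)))  # ['geneB']
```

Requiring the threshold in multiple samples, rather than in total, is the key design choice: a single contaminated library then cannot rescue a gene that is absent everywhere else.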
Microarray data are very useful for basic biology if (and only if) one acquires a statistical-mechanics perspective, studying genome dynamics AS A WHOLE without concentrating on single genes... signatures are an illusion...
Alessandro, I would not be that strict. Screening for strongly and consistently regulated genes is a valuable tool, and such genes may provide a "signature" for a particular biological process in a given context. This has been successful in the past for many now well-known (and thus "boring") indicator genes (identified in Northern blots and PCRs). But you are absolutely right: creating "signatures" simply from gene sets that have been selected based on hypothesis tests will presumably only double the trouble.
Correct, dear Jochen; the point I wanted to stress is the emphasis on post-hoc signatures, which is very well criticized here:
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002240#pcbi-1002240-g006
On the contrary, I think microarrays are extremely useful for studying biological regulation in terms of dynamical systems, i.e. from a physically grounded perspective (a two-metre-long molecule compacted into a 10-micron space, as DNA is, cannot reasonably be imagined as the RAM of a computer or an infinitely accessible Turing machine ruled only by logical Watson-Crick interactions between transcription factors and promoters), taking into account the global effect of chromatin folding/refolding dynamics on genome-wide expression and, in general, the definition of macrostates (see Sui Huang's illuminating articles on these arguments). We did some applications too:
http://www.biomedcentral.com/1752-0509/4/85/
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0012116#pone-0012116-g005
Thus I'm convinced microarray studies have great potential to make biologists acquire a different, less genocentric/essentialist, and more physically grounded attitude to their subject of study.
I would like to have a sample of patients with both measurements, microarray and RNA-seq. Does anyone know where to find such a public data bank?
It would be wonderful for us to have a comparison of the two. It would be nice to have patients and controls with both measurements.
Hi Carlos,
Check this one:
http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-197
RNA-sequencing of 60 HapMap CEU individuals
Experiment types: DNA-seq, co-expression, high throughput sequencing, in vitro, individual genetic characteristics
It's from 2010, but it has been updated recently.
I'm sure there are many others out there:
http://bowtie-bio.sourceforge.net/recount/
http://seqanswers.com/forums/showthread.php?t=20469
Hi Carlos,
Take a look at TCGA:
https://tcga-data.nci.nih.gov/tcga/
It's a huge and useful source of data.
Luciano
Do we have a current review of the real solutions that microarrays and RNA-seq have provided in plant science?
Hi T.O. Magomere,
I haven't followed plant biology for a while, but there are some interesting recent papers that answer your question.
For example:
The transcriptome landscape of early maize meiosis.
BMC Plant Biol. 2014 May 3;14:118. doi: 10.1186/1471-2229-14-118.
Dual RNA-seq transcriptional analysis of wheat roots colonized by Azospirillum brasilense reveals up-regulation of nutrient acquisition and cell cycle genes.
BMC Genomics. 2014 May 16;15:378. doi: 10.1186/1471-2164-15-378.
There are many recent papers using both methods.
I will be perusing those publications, plus others. Thanks, Alexandre.
Comparison of independent microarray experiments and extraction of meaningful information from such comparisons is complicated and difficult in most cases, but definitely NOT impossible. Robust statistical methods (developed by that rare human sub-species) can definitely save our lives.
It is difficult to compare independent data, mainly due to the lack of common standards regarding experimental design and sampling stages, and to the intrinsic heterogeneity arising from the various platforms and from the methods used to evaluate the resulting gene expression data. Another difficulty is the sample size needed for controlling the FDR. The difficulties are considerably smaller when comparing large-scale data generated in a single experiment under homogeneous experimental and control conditions. Such a mega-scale experiment, consisting of more than 500 homogeneous genome-scale microarray data sets, was published last year (http://www.ncbi.nlm.nih.gov/pubmed/23447525).
As mentioned above by many others, meta-analysis of microarray data can be tremendously useful. In a way, genome-scale gene expression analyses (irrespective of microarray or RNA-seq) provide an overall idea of the global transcriptomic signatures and are useful for differential pathway or GO analysis. But for any significant biological conclusions, we have to go back again to our specific genes of interest. It is again a reverse journey from holism to reductionism.
Another concern is the highly dynamic nature of the transcriptome. Microarray data at high temporal resolution would be of great advantage; such data can help to develop predictive computational models with enhanced accuracy.
The great strength of RNA-seq technology is its usability for non-model species. Recently, it has also been used for metatranscriptomic studies.
Combining transcriptomics and regulatory genomics is becoming an interesting trend these days. Correlating differential gene expression with sequence variations (SNPs/SNVs) can uncover any potential regulatory role of these SNPs/SNVs. RNA-seq has the advantage in this regard.
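The meta-analysis of independent microarray studies discussed above is often done by combining per-study p-values for a gene. Fisher's method is the classic choice: under the null hypothesis, -2 times the sum of the log p-values follows a chi-square distribution with 2k degrees of freedom for k studies. Since that df is always even, the chi-square survival function has a closed form, so a self-contained sketch needs only the standard library (the p-values below are invented for illustration):

```python
import math

def fisher_combine(pvalues):
    """Fisher's method: combine k independent p-values into one.

    Statistic X = -2 * sum(ln p_i) ~ chi-square with df = 2k under H0.
    For even df = 2k the survival function has the closed form
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # x/2 where x = -2*sum(ln p)
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))

# Three studies, each only weakly significant for the same gene,
# combine into much stronger evidence:
print(fisher_combine([0.04, 0.03, 0.05]))  # roughly 3.5e-3
```

This assumes the studies are genuinely independent; with shared samples or platform-specific biases the combined p-value will be overconfident, which is exactly the standards problem described above.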
Dear Pankaj Barah
I believe your point of view is correct: the statistical comparison makes sense at PRESENT but may not be required in the FUTURE. RNA-seq should be considered a stand-alone technique and kept apart from DNA microarray analysis; each technique may support and complement the results and findings of the other.
Hi Alexandre
Irrespective of present or future, DNA microarray or RNA-seq, statistical comparison is always an unavoidable requirement. It will probably become more automated through user-friendly pipelines with a 'sample in -> data+results out' type of interface; people have already developed such pipelines for both DNA microarrays and RNA-seq. I do agree on the advantages of RNA-seq over microarrays, but I don't completely agree with your point about considering RNA-seq a 'stand-alone technique'. The multifaceted complexity of a biological system cannot be explored with a 'stand-alone technique'.
The evolutionary life cycles of modern omics platforms are comparatively very short. I wish future developments could focus on more flexible designs, with the possibility of sequentially upgrading existing platforms to newer ones instead of completely replacing the old ones.
Hi Pankaj
I understand your interpretation of my comment and I think I was not clear.
When I described RNA-seq as a stand-alone technique, I was not suggesting it is sufficient to verify your point of view or your findings. Just as when we started using DNA microarrays, we always had to use other methods to confirm new findings. I was often asked for qPCRs or just RT-PCRs, not to mention confirmation of gene dosage effects or physiological effects.
Regarding implementing the statistical analysis in a more user-friendly platform, that was always something I had problems with.
Thank you for helping me clarify my comment.
The only wasted microarray data are those that are of poor quality or where there is insufficient annotation (MIAME or clinical) to allow their appropriate use. There are certainly plenty of worthless microarray datasets out there, but these were poor before the advent of RNA-seq. As others have noted, there are still substantial drawbacks to the new technology; it will take a little time to sort these out. Microarrays still tend to produce more robust data at a much lower cost (no doubt we could get a good argument from those who have invested heavily in the latest toys). It's also not clear that you get fundamentally different results (what's up or down with one technology is often also up or down with the other). I am hopeful that eventually the cost and quality/robustness of RNA-seq will overtake microarrays.
For the moment, I have yet to be convinced that there is much to gain (yet) from RNA-seq other than a larger bill and a bigger hole in your grant funds. If you just want differential expression, a well-designed and executed microarray study will cost you less, likely offer better-quality data, and (with a bit of luck) you will find comparable independent datasets in the public domain to allow you to do some good, independent validation of your results.
If you need differential expression and sequence information, then you will likely be better off with RNAseq.
My 2 cents worth...Bob
And what about low-quality RNA-seq datasets that might also be present in our current data repositories without our being aware of it? In fact, other sequencing-based technologies like ChIP-seq replaced the previous ChIP-chip, but their quality still remains to be evaluated. Just so you have an idea of the heterogeneity of such results, you may visit our website, www.ngs-qc.org, where we are collecting quality certificates for all publicly available ChIP-seq and enrichment-related datasets (including RNA-seq). The real issue is that metrics for assessing quality in a universal manner are (or were) missing. We have developed a method that tries to answer this question, and we are using it to certify public datasets!
There is plenty of signal to extract from older datasets; the challenge is to harmonize the data in a computable framework and extract this signal. In GeneWeaver.org, we have created a resource that allows investigators to evaluate sets of gene expression data from multiple species and platforms to find convergent evidence for the roles of genes and gene products in user-defined queries. The database of 75,000+ gene sets includes many legacy microarray datasets, QTL mapping studies from stocks that no longer exist, and other seemingly obsolete data. User data can be integrated into the analyses and kept private if desired. The idea is that convergent positive matching, which we find through combinatorial algorithms, will reduce the impact of false positives and spurious associations.
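The convergent-evidence idea described above can be sketched very simply. The following is not GeneWeaver's actual algorithm, just a toy illustration (with invented study names and gene symbols) of why intersecting independent gene sets suppresses spurious hits: a false positive in one legacy study is unlikely to recur in unrelated studies.

```python
from collections import Counter

# Hypothetical gene sets from three independent legacy studies
study_sets = {
    "microarray_2004": {"Bdnf", "Fos", "Gfap", "Trpv1"},
    "qtl_mapping":     {"Bdnf", "Trpv1", "Oprm1"},
    "rnaseq_2013":     {"Trpv1", "Fos", "Scn9a"},
}

# Count, per gene, how many independent studies support it
support = Counter(g for genes in study_sets.values() for g in genes)

# Convergent evidence: keep genes backed by at least two studies
convergent = sorted(g for g, n in support.items() if n >= 2)
print(convergent)  # ['Bdnf', 'Fos', 'Trpv1']
```

Singleton hits like the hypothetical "Gfap" or "Oprm1" drop out, while genes repeatedly rediscovered across platforms and decades survive, which is exactly the value still locked inside legacy data.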
Regarding the original question, I think that since published microarray data can always be consulted (indeed, there are public repositories), they have not been wasted (except if you consider that the original prices were very high and they are cheaper now). If you wonder about specific responses to a stimulus, you can search the databases and find microarray datasets, which provide first evidence justifying further inquiry with qPCR, Western blot, immunohistochemistry, etc. So no waste there.
RNA-seq and other NGS methods do not really provide the same types of answers, but they allow discovery of novel phenomena, like gene fusions in some pathologies; some had already been described, but rarer types, or types specific to one individual, can now be detected.
I am surprised by the comments on the price of RNA-seq; I am actually opting for that method because I can process more samples for what it would cost to do just one regular microarray.
But one definite disadvantage (although probably not for long, as faster sequencers are coming) is the sequencing time: microarrays take something like a week from scratch, while RNA-seq takes two weeks from when the libraries are made. So with our ever-shorter work contracts, microarrays still have a future.
In our differential gene expression studies it's not uncommon for us to have 100-200 samples (multiple doses, time points, compounds and 4-8 biological replicates for all, plus sometimes multiple species and both sexes). The time factor for RNA-seq is a big issue: it is nowhere near the throughput of microarrays for the same cost (since that time difference equates to people's time, which is expensive).
So for a while yet, for large-scale toxicology or risk-assessment studies, microarrays are still much cheaper and take far less time. Admittedly, NGS data is more informative in terms of insights into mechanisms of toxicity, but for those sorts of studies one could down-size experiments a bit to decrease the time to complete the study.
Also, even for our in-vitro work, cells and the cost of cells are often the limiting factor: getting enough material for an RNA-seq library is often simply not cost-effective, given the number of samples we need to run to cover our desired dose or time range. We can still manage microarrays with much less starting sample. Techniques like laser-capture sampling also often do not yield enough material for RNA-seq experiments, but can be usable for microarrays.
So there are simple practical issues that indicate to me that microarrays will be around for at least some years yet. Sample throughput being just one of them.
The old microarray data is not a waste, it is a mountain with treasures.
Biologists have produced a lot of data, performing experiment after experiment. Often they have neither the time nor the means to analyze it all. These data are waiting for us bioinformaticians to use and explain them. Some well-funded laboratories did a good job producing solid and extensive microarray data, but used it from only one angle; many other angles exist from which to look at these data. As Paco mentioned, our laboratory has also used public microarray data, and we have good and novel results. Of course, the sensitivity of earlier microarrays leaves much to be desired, but we can still use them, applying our knowledge and developing new methods and algorithms to process them.
I agree that existing microarray data are not a waste. With all their particular pros and cons, microarrays have been a major contributor to the development of high-throughput biological research. The microarray files contained in databases are an archive that can be mined with new questions, until a competing technology such as NGS proves clearly superior and equally widespread in the scientific community.
Hi
I think every technique has its biases; you should not ask any of them for the entire and full truth. For microarrays you had to confirm what you saw, but they are useful as a first approach. If you take some freedom with the statistical analysis, and don't rank the PCA axes only by importance but instead look for the axes that underlie your experiment, you can find very good candidate genes even using data from imperfectly replicated conditions. I did this successfully across different array types. The hard part is building a synthetic dataset from different gene lists (including genes with the same name but distinct sequences). Microarrays are not a result in themselves, but a tool that helps find the way.
good evening (sorry for syntax)
We have switched completely to RNA-Seq. It provides quantitative and comprehensive expression data. In fact, we have retired the RT-PCR machines too. Why? It is more cost-effective to add a few extra samples to the RNA-Seq runs, thereby increasing the N and making the statistics more reliable. This also eliminates the need for post-hoc verification by RT-PCR, which is a complete waste of time. Just do more samples.
RNA-Seq provides real counts of sequenced transcript fragments and unequivocal identification of the fragments, rather than a fluorescence reading. Plus, the two ends of the "gene expression spectrum" are well assessed: it excels at showing, quantitatively, highly expressed genes and lowly expressed genes (and non-expressed genes). We frequently see reports of laboratories examining genes that are expressed at an inconsequential level, or that have chased down changes that don't really matter. We also see reports that have been led astray by RT-PCR results due either to contamination or over-amplification; when we cross-reference our RNA-Seq datasets we see little or no expression of the targets! A recent example is an upcoming paper from our lab in Molecular Pain about the lack of expression of the itch peptide gastrin-releasing peptide in dorsal root ganglion.
To continue: We also don't need to go back to subtraction cloning and differential amplification techniques either. No need for these methods for detecting differences in tissue states or induction since all the data are obtained with RNA-Seq.
When you get the expression level for all the transcripts and you obtain multiple time points and multiple manipulations you can discern new patterns and events that no other method is capable of providing.
This is just my two cents after a few trillion bases of sequence information.
MJI
Not a waste at all. The number of microarrays in public databases is growing, and so are the bioinformatics methods for processing and integration. It is true that RNA-seq provides other possibilities; however, that doesn't change the value of the already available information. If a microarray study was done properly and combined with clinical information, its applicability can be increased, as well as its reliability. In fact, there are some papers using this technology in diagnosis/prediction, and the results seem promising.
On the other hand, in the bioinformatics arena, microarrays remain an interesting experimental background for modeling and prediction (not problem-free) and, from my perspective, an interesting and active area of research. However, should we increase RNA-seq assays and their processing methods? Certainly yes, and bioinformaticians should be aware of that.
Microarray is an expensive method that produces massive amounts of data that are difficult to analyze and/or interpret. It can be used as a first approximation to analyze gene expression. Then you should move to more precise and, to some extent, cheaper methods such as SAGE, RNA-seq and tiling arrays for higher plex, or Northern, Western, FISH and RT-PCR for mid and lower plex.
Looks like the answers are so long to read through. Here is my short version.
1) Microarray is cheaper and faster than NGS;
2) Microarray may detect more isoforms;
3) How the samples are prepared largely determines the reliability of the results;
4) Since gene-expression heterogeneity is everywhere, the ultimate solution is single-cell RNA sequencing;
5) Until all of that is available, you have to compromise according to what you have in hand. Again, microarray is just a preliminary screening step; there is a long way to go once you have the results;
6) For the existing array deposits, the genes marked significant may be trustworthy, and those not marked may still be hallmarks in certain microniches.
Many of the comments state that it is worthwhile to reanalyze microarray data, but I see few publications showing results of such data mining. Also, can anyone suggest which journals would accept such work, especially if one lacks wet-lab confirmation of the new microarray results?
To really judge data from array platforms in general, I think it's important to rely on valid test data. Recently, an interesting article was published on the evaluation of quantitative miRNA expression platforms (see below), indicating that there are indeed differences between platforms. One important point is that the authors suggest choosing a platform based on the experimental setting or a specific research question. By this means, I think, one can at least try to avoid or minimize the risk of producing conflicting results.
For those interested in further reading, follow the link below:
http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3014.html#methods
I would agree that, of course, microarray data is neither wasted nor a waste of time.
I would, however, point out that comments such as that provided by Michael Ladarola from the NIH are entirely out of step with the published evidence and extremely unlikely to hold true for many situations where you want an RNA profile. It is such one-sided views that disrupt progress and slow science down.
If you have no limit to your budget and are interested in discovering the RNA landscape, RNA-seq will be excellent. But, as pointed out by others, it is still a method that relies on PCR, and it will simply not yield accurate information on all RNA species. And it is certainly not quantitative across RNA species, e.g. microRNAs. Further, you use data models to determine what you are actually detecting, and those have arbitrary features: you get lots of data, but for some of it you have to estimate (in a statistical sense) what it might be.
If you are interested in technical replication (e.g. r = 0.99), translational medicine, and using RNA profiling to categorise patients or predict clinical outcomes, the Affymetrix array platforms offer superior reproducibility, FDA-approved platforms and processes, and a proven track record of working in clinical-trial settings. The Exon/HTA platform yields data on the transcriptome and enables splicing analysis.
You pick the right tool for the job.
Depending on what questions you want to answer, microarrays can still be very useful, especially for identifying and comparing genome-wide expression patterns, as long as there is a good experimental design and enough technical/biological replicates. The statistical power to detect low expression changes/differences is much lower for microarrays than for RNA-seq, and many use microarrays as a very first step to identify interesting questions or candidate genes/pathways. For multiple sample points, microarray is a cheaper and more efficient way than RT-PCR to survey the expression of many genes.
Hi James and colleagues
First off the last name is Iadarola with an "I" not an "L".
Secondly, my frame of reference with microarrays is spinal cord tissue. This is a very cellularly complex tissue, and frequently the signals after an upregulation paradigm are not large. Some of my colleagues who work with less complicated tissues with a more uniform cellular composition seemed to get good results with microarrays. I attribute these positive results, and the apparently positive voice for microarrays in this discussion, to experimental paradigms using cultured cell lines or cellularly uncomplicated tissues. Nonetheless, comparison of the results obtained with an Affymetrix array in spinal cord and RNA-Seq indicates that RNA-Seq is superlative in all respects for this tissue and for transsynaptic, in vivo up-regulation paradigms.
Thirdly, in terms of cost savings: by the time you finish verifying microarray results with RT-PCR, you could have done the same experiment with RNA-Seq, added in a couple of extra samples or time points, and come out ahead in terms of cost when the time of the researcher is taken into account. And you would have a more statistically reliable result, AND all the genes in the sample too. The value of the latter cannot be underestimated, since unexpected expression signatures show up.
Fourth, obviously you will not get microRNAs if you analyze a polyA+ mRNA sample, but the method needs to be adjusted to the question. In the same way, an array would have to have the miRNA probes included.
Fifth, I do disagree with Zheng about the lower limit of detection: RNA-Seq is far superior to microarrays in this regard. Plus, the method is scalable. Adjusting the depth of sequencing clearly increases the ability to detect lowly expressed transcripts in a reliable fashion; we have done this numerous times. Although it will not solve the general problem of working with lowly expressed genes or proteins.
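The scalability point can be illustrated with a back-of-envelope calculation. Assuming, for the sake of the sketch, that reads land on a transcript roughly as a Poisson process, the expected read count implied by a given RPKM grows linearly with library depth, and with it the probability of observing the transcript at all. The numbers below (a 2 kb transcript at 0.5 RPKM, depths of 10, 50 and 200 million mapped reads, a threshold of at least 5 reads) are invented for illustration.

```python
import math

def expected_reads(rpkm, gene_len_bp, total_reads_millions):
    """Expected read count implied by an RPKM value at a given depth."""
    return rpkm * (gene_len_bp / 1000.0) * total_reads_millions

def detection_prob(expected, min_reads=1):
    """P(observing at least min_reads) under a Poisson(expected) model."""
    p_less = sum(math.exp(-expected) * expected**k / math.factorial(k)
                 for k in range(min_reads))
    return 1.0 - p_less

# A lowly expressed 2 kb transcript at 0.5 RPKM, at increasing depth
for depth in (10, 50, 200):  # million mapped reads
    lam = expected_reads(0.5, 2000, depth)
    print(depth, round(lam, 1), round(detection_prob(lam, min_reads=5), 3))
```

At 10M reads detection of such a transcript is already likely but not guaranteed; by 50M reads it is essentially certain under this toy model, which is the sense in which "adjusting the depth" buys reliability for low expressors.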
Sixth, the ability to map the transcripts completely and accurately does require a well-annotated genome database. We see differences between human and mouse, which are well annotated, versus rat, which is less so but getting better.
Seventh, in terms of archival value, I think RNA-Seq will provide the correct level of defined, sequence-based information that archival datasets require, mainly because the results are based on actual sequence rather than hybridization and fluorescence.
I guess you can tell that my lab will never go back to a microarray.
Hope this helps.
Mike Iadarola
Hello Mike,
I guess we will have to agree to disagree then.
For example, we don't validate microarrays using rtQPCR. Statistically, it has always been complete nonsense to pick the 5 most regulated genes, do rtQPCR, and declare that the array is working. One should pick a large random selection, and that is never done. Plus, rtQPCR has its own problems.
Instead, we aim to rely on independent clinical datasets and confirm either sets of genes (e.g. the performance of classification signatures across studies or array platforms) or, for more biology-oriented analysis, we ensure that we obtain comparable pathway/ontology profiles from the independent datasets. So rtPCR is not a valid factor in evaluating either high-throughput method.
The issue with miRNAs is simpler: they have a very large range of GC content, and this impacts whether RNA-seq or arrays can detect them appropriately. Yet RNA-seq was sold as a solution for measuring miRNA abundance, which is clearly not true; it's a false claim.
I'm glad you are getting good RNA-seq data; however, most don't, and at great cost. I noted a recent adipocyte paper (we work on those too) that could annotate ~9,500 genes using RNA-seq (seq depth of 50). Using Affymetrix Exon arrays we could identify >17,000 genes in the same cell type. Most likely poor-quality informatics explains this (yet one would expect that at Harvard they would know what they were doing...).
Regards
Jamie
Hi Jamie,
Do you work with isolated cells, ex vivo tissues or both?
We seem to agree on using more replicates to increase the N to obtain reliable results.
As for miRNA, I don't have first-hand knowledge comparing the two platforms, so I can't comment in an informed fashion.
We routinely see ~13,000 reliably expressed genes in a tissue like dorsal spinal cord or dorsal root ganglion, or in cells sorted therefrom. I can push it to ~16,500 by adding in very lowly expressed genes, so the number comes quite close to 17K. But reliable expression, e.g. equal to or greater than 1 RPKM, yields 13K, and adding in genes between 0.5 and 1 RPKM gives another couple of thousand. It will be a headache to do Westerns on genes expressed at that level, though.
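For readers less familiar with the RPKM units used in the thresholds above: RPKM normalizes a raw read count by transcript length (in kilobases) and library size (in millions of mapped reads), so counts are comparable across genes and samples. A minimal sketch, with made-up numbers:

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    kb = gene_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kb * millions)

# Hypothetical: 240 reads on a 3 kb gene in a 40-million-read library
print(round(rpkm(240, 3000, 40_000_000), 2))  # 2.0
```

So a "1 RPKM" reliability cutoff on that hypothetical 3 kb gene in a 40M-read library corresponds to about 120 raw reads; the same cutoff on a short gene or a shallow library corresponds to far fewer reads, which is why such thresholds interact with sequencing depth.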
If you analyze a particular differentiated cell type (or a few types, like different neurons), do you think that 13-17K expressed genes (of a total of ~21K) is reasonable?
Hi, Michael,
" The power (statistic) to detect low expression change/difference for microarray is much lower than RNA seq" -- microarray is not good as RNA seq in detecting small difference in expression, so it has lower statistic power, not lower statistic limit. In another word, the scale in RNA seq is much smaller (higher power) than microarray. I did not say microarray has lower limits... just to clarify this!
Hi Guys,
I guess both methods have their supporters.
One thing needs clarifying: if we really want to know isoform information, especially isoforms with different TSSs, or different 3' ends combined with internal cassette exons, full-length coverage is required. Exon arrays cannot detect this, nor can Illumina's current paired-end reads; but long-read RNA-seq can achieve it.
Jochen, regarding gene numbers: Ezkurdia et al. just published a paper showing that the total number of human genes may be as few as 19,000. It's understandable; some genes are expressed at very low levels, and some are not even constantly expressed.
How effervescent this (interesting) discussion is! It is still taking people's time!
Over these past 15-17 years, sequencing-based methods have always been the main "competitors" of microarray technology.
It is not only now, with the emergence of RNA-Seq (NGS), that a group of researchers has sought to overthrow microarrays.
I remember clearly the time when SAGE was the main competitor of microarrays. Almost nobody currently uses SAGE, and microarrays have continued to evolve. Are the results of SAGE a waste? Of course not.
Many important results have been generated using this methodology.
What is interesting is that microarrays have remained on the market and continued to evolve: in their manufacture, coverage and format, and also in data analysis (bioinformatics).
We cannot forget that it is the experimental design and the type of data analysis that make microarray data useful. The same is valid for RNA-Seq.
Today microarrays are very robust and we work with very small amounts of total RNA (a few nanograms). There are several bioinformatics pipelines available and data are usually deposited in public databases (ArrayExpress or GEO) for later use. It is an appropriate and proven mature technology.
The advent of RNA-Seq technology will help microarray technology: the discovery of new sequences, especially poorly expressed ones, will increase the coverage of microarrays.
What I see is that RNA-Seq technology does not nullify microarrays, but will help improve their coverage through the discovery of newly transcribed sequences.
The bottleneck is in bioinformatics to analyze the huge amount of data from RNA-Seq.
Thanks for the discussion !
The field of genomics is an exciting one, and the gradual technology shift from microarray to RNA-Seq is a fragment along the continuum of genomics technology advancement. It is certain that RNA-Seq will be replaced by some other, newer technology in the future. Will all the RNA-Seq data that we are accumulating today be a waste then? All genomics data have their value and, as colleagues have elaborated above, their value is not just historic.
Some would say that today's RNA-seq will soon be displaced anyway, by high-throughput single cell whole transcriptome sequencing. Instead of analyzing a tissue sample, you will soon simultaneously analyze a whole population of individual cells to look at not only the average cellular genomic response of a tissue or sample, but the variance in response across those cells. What we know today as RNA-Seq may only be a short lived transitional step in technology.
for example, see
http://www.rna-seqblog.com/tag/single-cell-rna-seq
http://genome.cshlp.org/content/24/3/496
I do agree with your question in point. For the results generated at each step of the microarray procedure, the possible standard error of the mean ranges from 5-10%. When you compound that 5-10% over 3 to 4 steps, the final standard error of the mean can make the result completely unreliable. That is why the digital next-generation sequencing technique has been adopted to replace the microarray methodology.
I agree about the higher error rates in microarrays, but I am also sure that microarray data can still be put to good use.
I would like to believe that we have demonstrated this through our work. We will soon produce some more useful information using the existing microarray data.
What is the error rate for RNA-Seq? Can one compare RNA-Seq and microarray data from the same set of tissue samples, and then look for false positives and false negatives from each method? Are such datasets available? Is such a comparison enough to answer the queries raised in this debate? Hope this helps.
In RNA-Seq or DNA-seq, we have GenBank data to match against to establish reliability. With microarray data, however, we do not have a sequence analysis for each spot, nor a corresponding data bank to match against in the computer as conveniently and reliably as with the sequence data. The wealth of data-bank records makes the difference between the two.
Venil N Sumantran: others also seem interested in this aspect (e.g. Carlos A de B Pereira mentioned a similar aspect in his comment above, and Alexandre Gonçalves replied to it). There are many other direct comparisons.
Examples include:
PLoS One. 2014 Jan 16;9(1):e78644
BMC Genomics 2012, 13:629
BMC Bioinformatics 2013, 14(Suppl 9):S1
Identifying false positives will not be straightforward, at least (our meta-analysis method [BMC Genomics 2010] can be one of the ways). How do we validate? Helen M Gunter and Robert Clarke have discussed the validation of RNA-seq earlier. Error rates were also discussed earlier, in other contexts, in this answer series.
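On the false-positive/false-negative question raised above: if one does have an independently validated "truth" set (say, from qPCR), tabulating platform concordance is mechanically simple. The sketch below uses invented gene names and calls purely to show the bookkeeping, not results from any real comparison.

```python
# Hypothetical differential-expression calls from two platforms,
# plus a (hypothetical) independently validated truth set
array_de   = {"GeneA", "GeneB", "GeneC", "GeneD"}
rnaseq_de  = {"GeneB", "GeneC", "GeneE"}
qpcr_truth = {"GeneB", "GeneC", "GeneD"}

def confusion(calls, truth, universe):
    """Return (TP, FP, FN, TN) for a set of DE calls against a truth set."""
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    tn = len(universe - calls - truth)
    return tp, fp, fn, tn

universe = array_de | rnaseq_de | qpcr_truth | {"GeneF"}
print("array :", confusion(array_de, qpcr_truth, universe))   # (3, 1, 0, 2)
print("rnaseq:", confusion(rnaseq_de, qpcr_truth, universe))  # (2, 1, 1, 2)
```

The hard part, of course, is the truth set itself: as discussed throughout this thread, qPCR has its own biases, so in practice these tables measure concordance with a reference method rather than absolute error rates.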
I think most scientists would not question the advantages of RNA-seq, but many hesitate to even consider the value in already existing microarray data. This prompted the question here. My take on this issue: while a good amount of microarray data does seem to be a waste, a large amount is very valuable. I have earlier quoted our work to support this statement.
I appreciate the discussions on the merits of microarrays but.......
It is pretty evident that the microarray has now been superseded by RNA-Seq.
This paper on gene regulation in the nervous system in various pain models found very little reproducibility between studies using microarrays.
LaCroix-Fralish ML, Austin JS, Zheng FY, Levitin DJ, Mogil JS: Patterns of pain: meta-analysis of microarray studies of pain. Pain. 2011 Aug;152(8):1888-98. PMID: 21561713
This is not to say that there is no value in many of the datasets generated by microarrays, but there is now a better tool: RNA-Seq has higher precision, more accurate gene identification, and better quantitative results.
These are the reasons we have switched.
Researchers studying different model systems or organisms have compared the two technologies (microarrays and RNA-Seq) and reached the conclusion that, when the objective is to evaluate differential expression, the results offered by the two types of methods are comparable.
Sure, for the discovery of new sequences, RNA-Seq is the method of choice.
I suggest the following references, among several others:
Rudy J and Valafar F (2011) BMC Bioinformatics 12: 467
Hu HY et al (2011) Plos Genetics 7(10): e1002327
Kroster MB et al (2012) BMC Genomics 13: 596
Liu X et al (2012) Genome Res 22: 611
Raghavachari N et al (2012) BMC Medical Genomics 5: 28
Raherison E et al (2012) BMC Genomics 13: 434
Palmblad M et al (2013) BMC Res Notes 6: 428
Rasche A et al (2014) Nucleic Acids Res (June 11)
Black MB et al (2014) Toxicol Sci 137 : 385
The community of researchers who until now mostly used microarrays in their studies has arrived at the conclusion that the RNA-Seq method will contribute to the improvement of arrays.
The more new sequences are discovered, the more the coverage of the arrays increases.
Scientists who have the opportunity to use both types of methods know that ideally we use both. I keep saying the bottleneck is in the data analysis.
For now, what we do know and/or discuss depends on the tools that bioinformaticians are developing for us to do our analyses.
I agree with Geraldo. Last week at ISMB 2014 we presented the first outlook of the results obtained within the SEQC consortium (http://www.fda.gov/ScienceResearch/BioinformaticsTools/MicroarrayQualityControlProject/#MAQC-IIIalsoknownasSEQC).
The general message is that both technologies have their strengths and limitations, and we see that the two are complementary. We are aware of more limitations of microarrays at the moment simply because they are better studied.
In late August, the set of papers from the SEQC consortium will be published online in Nature Biotechnology, and they will appear in print in the September issue. I will post a link to the publications here once they are available. Below I am pasting an ISMB talk abstract:
We present an extensive multi-centre multi-platform study of the US-FDA MAQC/SEQC-consortium, introducing a landmark RNA-Seq reference dataset comprising 30 billion reads. Several next-generation-sequencing, microarray, and qPCR platforms were examined. The study design features known mixtures, wide-dynamic range ERCC spikes, and a nested replication structure -- together allowing a large variety of complementary benchmarks and metrics. We find that none of the examined technologies can provide a ‘gold standard,’ making the built-in truths of this reference set a critical device for the development and validation of novel or improved algorithms and data processing pipelines. In contrast to absolute expression-levels, for relative expression measures, good inter-site reproducibility and agreement across platforms could be achieved with additional filtering steps. Comparisons with microarrays identified complementary strengths, with RNA-Seq at sufficient read-depth detecting differential expression more sensitively, and microarrays achieving higher rank-reproducibility. At the gene level, comparable performance was reached at widely varying read-depths, depending on the application scenario. On the other hand, RNA-Seq has heralded a gold-rush for the study of alternative gene-transcripts. Even at read-depths beyond 100 million, we find thousands of novel junctions, with good agreement between platforms. Remarkably, junctions supported by only ~10 reads achieved qPCR validation-rates >80-100%, illustrating the unique discovery power of RNA-Seq. Finally, the modelling approaches for inferring alternative transcript expression-levels from read counts along a gene can similarly be applied to probes along a gene in high-density next-generation microarrays. We show that this has advantages in quantitative transcript-resolved expression profiling. There is still much to do!
That's a lot of data Pawel! Thanks for sharing.
As many have said, most recently Michael Iadarola, RNA-seq is the current choice of the majority for most transcriptomic analyses. But so many donors' samples and so much scientists' effort have gone into generating huge amounts of microarray data; I was only talking about making better use of that. Claiming that all microarray data is useful is not right, but thinking RNA-sequencing is the only way to answer 'every question on the transcriptome' is not right either. It is possible to make use of available microarray data in some cases, depending on the question in context.
Comparative studies will help others decide the cases where it would be possible to rely on existing microarray data. Across the comments by me, Michael, Geraldo A Passos and others here, we seem to have listed quite a few comparative studies! I am tempted to sit and compile more, analyze them, and extract specific aspects and hidden observations from all such comparisons (we do have our own humble amount of comparative data)! I guess this would make only incremental progress, given that so many have already reviewed/commented in their studies. But a clean job would help gene-expression-related science, wouldn't it? Anyone thinking similarly and willing to collaborate?
I was astonished by the SEQC consortium.
There's so much information to absorb that I don't dare say anything else.
Thank you very much for sharing this information.
Since many of us are drawn to comparisons between RNA-seq and microarray data, I wish to stress that such comparisons are not that straightforward to draw conclusions from. A lot depends on the specific details of the comparisons conducted, and we cannot easily extrapolate and generalize in favour of either side (RNA-seq vs. microarray)!
I surmise that most of us see good value in the existing microarray data, while some (like me) caution that a substantial portion of it may not be usable. Then there is EST data, which is also huge and of course very valuable even today. There has been a strong relationship between ESTs and microarrays. There have also been other kinds of variation in chip designs. All this makes it difficult to extend findings from one study to another (even the comparisons)!
Scientists wanted to explore proteins specific to tissues and conditions, but found it easier to predict them via mRNAs. ESTs were the first effective way to screen the expression of all protein-coding genes. Then came microarrays, many of which used the EST data to design probes. In light of more refined EST assemblies and other recent sequence data, some of the earlier microarray probes appear faulty. The microarray design parameters (e.g. melting-temperature assessment, number and regions of probe selection) have also been incorrect in many cases, I think. These have added to the variation across microarray experiments and their results. The type of microarrays used in comparative studies matters: one cannot extrapolate observations from comparisons of 'Affymetrix exon arrays vs. Illumina RNA-seq data' to what might be the case with cDNA arrays or PacBio sequencing!
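To make the melting-temperature point concrete, here is the crudest of the Tm estimates, the Wallace rule (Tm ≈ 2·(A+T) + 4·(G+C) °C). It is only a rough screen valid for short oligos; real probe design relies on nearest-neighbor thermodynamic models, and the probe sequences below are hypothetical, not from any actual array.

```python
def wallace_tm(probe: str) -> int:
    """Wallace-rule melting temperature in degrees C:
    2 degrees per A/T base, 4 degrees per G/C base.
    Only a rough screen for short (~14-20 nt) oligos."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc

# Hypothetical probe candidates for the same gene.
probes = {
    "probe_balanced": "ATGCGTACGTTAGCAT",  # mixed composition
    "probe_gc_rich":  "GCGCGGCCGCGGGCCG",  # GC-rich: much higher Tm
}
for name, seq in probes.items():
    print(name, wallace_tm(seq), "C")
```

Two probes of identical length can differ by ~20 °C in estimated Tm, so probes selected without consistent Tm criteria hybridize with very different efficiencies -- one plausible source of the cross-experiment variation mentioned above.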
As mentioned a few weeks ago, the SEQC consortium papers are now available online:
A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium
http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2957.html
Detecting and correcting systematic variation in large-scale RNA sequencing data
http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3000.html
The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance
http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3001.html
plus complementary one by ABRF consortium:
Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study
http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2972.html
Dear All
This topic is becoming hotter and hotter. Just take a look at this recent paper published in PLOS ONE. Best, Paco
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0126545