ESTs are single sequence reads of a clone. The theory was that sequencing all (or most of) the clones of a cDNA library could give an idea of the complexity of the transcriptome as well as of the relative abundance of each transcript in the tissue from which the library was derived. Thus the aim was to mass-sequence as many clones as possible, and to accomplish this a single reaction (read) per clone was enough. Interestingly, if you look at EST databases you will notice that read lengths increase chronologically, mirroring the advances in DNA sequencing technology.
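To make the abundance idea concrete, here is a minimal sketch, assuming each EST has already been assigned to a transcript (for example by a BLAST hit). The transcript names and the `relative_abundance` helper are made up for illustration and are not from any real EST pipeline.

```python
from collections import Counter

def relative_abundance(est_assignments):
    """Return the fraction of ESTs assigned to each transcript."""
    counts = Counter(est_assignments)
    total = sum(counts.values())
    return {transcript: n / total for transcript, n in counts.items()}

# Hypothetical data: one entry per sequenced clone (one read each).
ests = ["actin", "actin", "albumin", "actin", "myosin", "albumin"]
freqs = relative_abundance(ests)

for transcript, fraction in sorted(freqs.items()):
    print(f"{transcript}: {fraction:.2f}")
```

The more clones are sequenced, the closer these fractions should get to the real composition of the library, which is why one read per clone was preferred over fully sequencing fewer clones.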
Thanks Estanis for your answer. So the major cause of short ESTs is the sequencing of the clone and not the length of the mRNA sequence that is in the clone. I had read that most ESTs are from the 5' and 3' ends of the gene, but in the NCBI database I see a lot of ESTs that are in the middle of genes. How do these ESTs that only span a small portion of the middle section of a gene arise?
Yes, Peter, most ESTs are not full-length sequences because cDNA libraries did not include many full-length cDNAs, especially from long transcripts (i.e. > 2 kb). They are mostly 5' end reads because 3' reads could include uninformative (long) stretches of the poly-A tail. The ESTs that only span a middle portion of the transcript arose from an incomplete cDNA plus a short 5' read. Furthermore, EST sequences were edited, meaning that the portions of low-quality sequence (ambiguously called bases and "n"s) were removed from the sequence before it was submitted to GenBank.
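The "incomplete cDNA plus a short 5' read" case can be sketched as simple coordinate arithmetic. This is only an illustration: the transcript length, the truncation point, the read length and the `est_span` helper are assumed values and names, not data from a real library.

```python
def est_span(cdna_start, cdna_end, read_length):
    """Transcript coordinates (0-based, end-exclusive) covered by a
    single read taken from the 5' end of the cDNA insert."""
    read_end = min(cdna_start + read_length, cdna_end)
    return cdna_start, read_end

transcript_length = 3000  # hypothetical long transcript (nt)

# Assumed scenario: the cDNA is incomplete and only covers
# positions 1200..3000 of the transcript, i.e. its 5' part is missing.
span = est_span(cdna_start=1200, cdna_end=transcript_length, read_length=400)
print(span)  # the read starts where the truncated cDNA starts
```

Under these assumptions the EST covers positions 1200 to 1600, so it sits in the middle of the transcript even though it is a 5' read of the clone.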
Because they were used as a tool of the Human Genome Project for sequence annotation and for identifying transcripts using BLAST, not for cloning purposes. For a BLAST search, a DNA fragment of 50-100 bp is enough.
Hi Peter, I agree with all the answers above, but in addition I would say that abortive transcription under the stress or stimulus conditions in which the study is carried out may also be one of the reasons for getting only short ESTs during sequencing. Incomplete cDNA synthesis and incomplete sequencing are also possible reasons.
Thanks everyone for your answers. I was wondering, do people generate EST/cDNA libraries using random hexamers? Wouldn't this help in getting the middle portion of large transcripts, instead of having the bias towards the 3' end that you get if you do the reverse transcription with an oligo-dT primer? Or is there a reason why people would not use random hexamers if they wanted to generate an EST/cDNA library?
Hi Peter, this is a good question! The first libraries used for EST-sequencing projects were just plain cDNA libraries, not made specifically for EST sequencing, and most of these were primed with oligo-dT/anchored oligo-dT. I don't think that there's a reason to choose between random primers and oligo-dT for priming cDNA libraries. With random hexamers you got libraries enriched in 5' ends, while with oligo-dT they were enriched in 3' ends, or at least that was the common thinking in the early 90s. Large transcripts have always been notoriously difficult to clone, mainly because early reverse transcriptases had intrinsic RNase H activity and were "lazy" when copying; furthermore, most vectors were not designed to clone large cDNAs efficiently.