I do not understand precisely what a unigene is. I have been searching the internet, but I always find the NCBI UniGene database as a result. Does anyone know the definition of a unigene?
from De Novo Sequencing and Transcriptome Analysis of Wolfiporia cocos to Reveal Genes Related to Biosynthesis of Triterpenoids (S Shu 2013): "Trinity [a program] combined reads with a certain length of overlap to form longer fragments without N, which were called contigs. These contigs were subjected to further processing of sequence clustering to form longer sequences without N. Such sequences were defined as unigenes."
Basically, it is a collection of expressed sequences that align or map to the same position on the genome, but about which not enough is known to call them a gene.
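To make the quoted idea of combining reads "with a certain length of overlap" into a longer fragment concrete, here is a toy sketch. It is only an illustration: the function name, the example reads, and the minimum overlap of 3 are all made up, and real assemblers such as Trinity work on k-mer graphs rather than merging reads pairwise like this.

```python
def merge_by_overlap(left, right, min_overlap=3):
    """Merge two reads if a suffix of `left` equals a prefix of `right`.

    Returns the merged sequence, or None if the longest shared
    suffix/prefix is shorter than `min_overlap`.
    """
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first, then shorter ones.
    for k in range(max_len, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


# Hypothetical short reads that overlap end to end.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]

contig = reads[0]
for read in reads[1:]:
    merged = merge_by_overlap(contig, read)
    if merged is not None:
        contig = merged

print(contig)  # ATGGCGTACCTGA
```

The three reads collapse into one contig with no gaps (no N). In the paper's terminology, contigs like this one are then clustered further to give unigenes.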
UniGene identifies transcripts from the same locus; analyzes expression by tissue, age, and health status; and reports related proteins (protEST) and clone resources.
UniGene is primarily an NCBI database. But the term unigene is also used for a cluster of genes that perform a particular function. Broadly speaking, UniGene clusters ESTs and other mRNA sequences, along with coding sequences (CDSs) annotated on genomic DNA, into subsets of related sequences.
You are correct; UniGene is a database and not a biological concept. It collects the sequences of the RNA molecules (transcripts) produced by cells. This is a pretty cool database, since RNA production is not static.
Thank you all for the answers. But I still do not understand: one article I read said that transcriptome de novo assembly is carried out with the short-read assembly program Trinity, and that the resulting Trinity sequences are known as unigenes. That sounds like a biological concept.
UniGene is an experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location.
As mentioned above, as a biological concept a unigene is a unique transcript that is transcribed from a genome. In the context of transcriptome assembly (Trinity, SOAPdenovo-Trans, etc.), however, unigene often refers to a uniquely assembled transcript (all isoforms from a unique gene). In some cases it is also better to cluster the transcriptome assembly results (isoforms) to avoid redundant transcripts. Tools such as TGICL are used to cluster transcripts into unigenes and individual clusters.
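As a rough illustration of collapsing isoforms into one sequence per gene, here is a minimal sketch that keeps the longest isoform for each Trinity gene. It assumes Trinity-style FASTA headers in which the isoform is marked by a trailing `_i<number>` (for example `TRINITY_DN1_c0_g1_i2`). Choosing the longest isoform as the representative is just one common convention, not the clustering that TGICL performs, and the helper names and example records are hypothetical.

```python
import re


def gene_id(header):
    """Strip the trailing isoform suffix (_i<number>) from a Trinity-style ID."""
    name = header.lstrip(">").split()[0]
    return re.sub(r"_i\d+$", "", name)


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def longest_isoform_per_gene(records):
    """Keep one representative (the longest isoform) per gene ID."""
    best = {}
    for header, seq in records:
        gid = gene_id(header)
        if gid not in best or len(seq) > len(best[gid][1]):
            best[gid] = (header, seq)
    return best


# Made-up example: two isoforms of one gene, plus a second gene.
example = """>TRINITY_DN1_c0_g1_i1 len=8
ATGGCGTA
>TRINITY_DN1_c0_g1_i2 len=12
ATGGCGTACCTG
>TRINITY_DN2_c0_g1_i1 len=6
GGGTTT
"""

unigenes = longest_isoform_per_gene(parse_fasta(example))
for gid, (header, seq) in unigenes.items():
    print(gid, len(seq))
```

Here the two isoforms of `TRINITY_DN1_c0_g1` collapse to the 12-base one, so three assembled transcripts become two "unigenes" in this loose sense.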
What term should be used for an assembled unique sequence? Should I use unigene, contig, scaffold, or transcript? Can anybody explain the difference between these terms?