A hypothetical protein is one predicted from open reading frame (ORF) analysis of a genomic sequence. The possible coding sequence and the putative protein obtained by translating the ORF are searched against known sequences and proteins, and putative homology is assigned by similarity of the sequence match.
It is considered hypothetical because it is merely predicted by computer analysis of the available genomic sequence. Still, it lets one place functional annotations on a genomic build rather than leaving large stretches of raw DNA without any annotation or hint of what they might be.
For an organism that has not been studied much, this may be the only way to begin annotating a genomic assembly once it is available, since there may be few confirmed cDNAs and their products in any database for that specific organism. Many of the early annotations of mammalian genomes began with automated analysis of all possible ORFs, then took their possible coding sequences and translation products and BLASTed them against other known genomes and databases. The resulting annotation has to be considered hypothetical initially, until further curation can confirm it.
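To make the "all possible ORFs" step concrete, here is a minimal sketch in Python. It is not how any particular annotation pipeline works: the function names (`find_orfs`, `reverse_complement`), the `min_codons` cutoff, and the choice of ATG as the only start codon with the standard genetic code are all assumptions made for illustration. Real gene predictors also handle introns, alternative start codons, and statistical coding models.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=3):
    """Scan all six reading frames for ATG...stop ORFs (hypothetical helper).

    Returns tuples (strand, frame, start, end, protein). Coordinates are
    0-based on the scanned strand, so '-' hits are relative to the reverse
    complement, not to the input sequence.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                aa = CODON_TABLE.get(codon, "X")  # X for ambiguous bases
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF at the first ATG
                elif start is not None and aa == "*":
                    if (i - start) // 3 >= min_codons:
                        protein = "".join(
                            CODON_TABLE.get(seq[j:j + 3], "X")
                            for j in range(start, i, 3)
                        )
                        hits.append((strand, frame, start, i + 3, protein))
                    start = None  # close the ORF and keep scanning
    return hits


if __name__ == "__main__":
    # Made-up toy sequence, not taken from any real genome.
    for hit in find_orfs("CCATGAAATTTGGGTAACC"):
        print(hit)
```

On the toy sequence this reports a single forward-strand ORF translating to MKFG. Each translated product would then be searched against protein databases (for example with BLAST), and anything without a characterized match stays labeled hypothetical.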
A hypothetical protein is a protein whose existence has been predicted but for which there is no experimental evidence regarding its function. However, crystal structures of hypothetical proteins are also deposited in the Protein Data Bank (PDB).
From a bioinformatics perspective, hypothetical proteins are predicted computationally, by gene prediction tools run during genome analysis. Gene prediction finds open reading frames whose homologues in the database are uncharacterized, and the tool then labels the product "hypothetical protein". The function of a hypothetical protein can be predicted by running a conserved domain search on its sequence and comparing it against the known domain families of its homologues.
Danish, check the chapter on RefSeq in the NCBI Handbook as an example of the curation process. I seem to remember there have also been descriptions of various centers' curation programs in Nature Proceedings and Bioinformatics.
For example, with the RefSeq database, once a computerized analysis has produced an XM_ record (a model mRNA) and an XP_ record (a model protein), the staff search GenBank for homologous cDNA entries that match that sequence well, especially ones from the same or a very closely related species. A RefSeq entry that begins with NM_ is one that is confirmed by an actual species-specific mRNA entry in GenBank, and it has a corresponding NP_ protein entry.
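As a rough illustration of how those prefixes can be used when sorting through search hits, here is a small Python sketch. It covers only the four prefixes mentioned above (RefSeq defines others), and the function name `classify_refseq` and the labels "model" and "curated" are my own shorthand, not official NCBI status terms.

```python
# Only the four prefixes discussed in this thread; RefSeq has more.
# The "model"/"curated" labels are illustrative shorthand.
REFSEQ_PREFIXES = {
    "XM_": ("mRNA", "model"),       # predicted by computational analysis
    "XP_": ("protein", "model"),    # protein product of an XM_ record
    "NM_": ("mRNA", "curated"),     # supported by a species-specific mRNA
    "NP_": ("protein", "curated"),  # protein product of an NM_ record
}


def classify_refseq(accession):
    """Return (molecule, evidence) for a RefSeq accession, or None.

    Hypothetical helper: it looks only at the first three characters and
    does not validate the rest of the accession.
    """
    prefix = accession.strip().upper()[:3]
    return REFSEQ_PREFIXES.get(prefix)


if __name__ == "__main__":
    # Placeholder accession numbers, not real records.
    for acc in ("XM_000000.1", "NP_000000.1", "AB000000"):
        print(acc, classify_refseq(acc))
```

A hit that comes back as a model record is a hint that the annotation behind it may itself still be a prediction, so it should carry less weight than a hit to a curated record.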
But different genome centers and groups may take somewhat different curation approaches, in particular in what weight of evidence they require to promote a sequence from a model or theoretical entry to a high-confidence or confirmed one.
Ultimately, what you would want is direct experimental confirmation of expression of the gene.