There are thousands of proteins/genes in databases that are uncharacterised, with no specific function reported. How can we suggest a function for them? Which tools or software can we use for this?
Even without sequence similarity, guilt-by-association analyses can be useful. If a gene behaves like known genes or functional groups in transcriptome/proteome datasets, or in genome-wide datasets of knock-down/overexpression/mutant analyses, that is a very useful indicator of similarity in function. Many groups have developed algorithms to look at this, depending on your organism of choice. Basically the only requirement is a large set of datasets: the more datasets, and the more diverse, the better. Of course, these are always only suggestions, and experimental follow-up is required.
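To make the guilt-by-association idea concrete, here is a minimal sketch in Python. It assumes you already have expression profiles across the same set of conditions for your unknown gene and for annotated genes. The gene names, the profiles, the function labels and the choice of Pearson correlation with a top-k vote are all illustrative assumptions, not the method of any particular published tool.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def suggest_function(query_profile, known_profiles, known_functions, k=3):
    """Rank annotated genes by correlation with the query; the top k vote."""
    ranked = sorted(
        known_profiles,
        key=lambda gene: pearson(query_profile, known_profiles[gene]),
        reverse=True,
    )
    votes = Counter(known_functions[gene] for gene in ranked[:k])
    return votes.most_common()


# Toy data (hypothetical): expression of each gene across five conditions.
known_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 4.8],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [2.0, 2.1, 1.9, 2.0, 2.2],
}
known_functions = {
    "geneA": "heat-shock response",
    "geneB": "heat-shock response",
    "geneC": "ribosome biogenesis",
    "geneD": "cell wall synthesis",
}
unknown_profile = [0.9, 2.2, 3.1, 3.9, 5.2]

suggestions = suggest_function(unknown_profile, known_profiles, known_functions, k=2)
print(suggestions)
```

With real data you would use many more conditions and genes, and the vote is still only a hypothesis to test at the bench.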
By searching the amino acid sequence of the protein against the NCBI Conserved Domain Database, you can predict the protein's function from similarity and domain structure with rather good probability (more often than not). Additionally, with predictions of cellular localization and membrane folding pattern (from the online services gathered and used, for instance, by Aramemnon), you can support the putative function by comparing these features with those of published proteins that have that function.
Hope that rough algorithm helps; otherwise, please message me privately for details :)
A similarity search against the known databases is the best-supported option. This will give a near hit for your protein, and taking that hit's functions as a lead, we can further characterize and explore the protein to the fullest :) Not a big deal :P
I agree with Smits. Gene knockdown may help here, and transcriptome analysis of the knockdowns may add good knowledge of the gene's function. I recall an article by Murali et al. in Nature Biotechnology, "The art of gene function prediction", which explains it lucidly. :)
Start with domain similarity and family structure, then search in depth for motifs and active sites to predict the protein's function, and, if doable, run assays to establish the final scope.
My work in prokaryotes is sort of similar at the moment. Basically, my workflow starts with BLAST, BLASTp, ScanProsite, MyHits and SMART. These will tell you the nucleotide similarity, amino acid similarity, motifs, domains and signal peptides.
Similarity in these areas may suggest a function for your gene/protein which can then be tested through knockouts, inhibition or stress studies, gene expression microarrays etc.
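For the motif part of a workflow like this, the core of a PROSITE-style scan can be sketched in a few lines of Python. This is a simplified illustration of the pattern syntax, not how ScanProsite itself is implemented: it ignores edge cases such as anchors inside square brackets, and the toy sequence is made up. The pattern N-{P}-[ST]-{P} is the N-glycosylation site pattern as I remember it from PROSITE, so check it against the database before relying on it.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")                       # patterns may end with '.'
    regex = pattern.replace("-", "")                    # '-' only separates positions
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = any residue except P
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) = two to four residues
    regex = regex.replace("x", ".")                     # x = any residue
    regex = regex.replace("<", "^").replace(">", "$")   # N- and C-terminal anchors
    return regex


def scan(sequence, pattern):
    """Return (1-based position, matched residues) for every hit, overlaps included."""
    # The lookahead lets finditer report overlapping matches.
    regex = re.compile("(?=(%s))" % prosite_to_regex(pattern))
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


toy_sequence = "MKNGSAANPTLNQTW"  # hypothetical sequence for illustration
hits = scan(toy_sequence, "N-{P}-[ST]-{P}.")
print(hits)
```

Short motifs like this match by chance very often, so a hit is a hint to combine with domain and similarity evidence rather than a prediction on its own.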
Sequence similarity searches using BLAST can be pretty effective, which is why so many people use them as a baseline for comparison. A few years ago we used a very simple 1-NN classifier (find the most similar protein sequence in the database and transfer its annotations to the novel protein) and had surprisingly good results. If you're just looking for a quick and easy solution, that may do it for you. As others have noted, you can get a lot more sophisticated if you choose.
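A rough sketch of that 1-NN idea, assuming you have already run BLAST and saved the default 12-column tabular output (-outfmt 6), might look like the following. The sequence IDs, the annotation table and the E-value cutoff of 1e-5 are assumptions made up for illustration, and this is not the exact classifier we used.

```python
def best_hits(blast_tabular_lines):
    """Keep the highest-bitscore subject for each query in BLAST -outfmt 6 output."""
    best = {}
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore, evalue)
    return best


def transfer_annotations(hits, annotations, max_evalue=1e-5):
    """1-NN transfer: copy the nearest neighbour's annotations to each query."""
    predicted = {}
    for query, (subject, _bitscore, evalue) in hits.items():
        if evalue <= max_evalue:
            predicted[query] = list(annotations.get(subject, []))
        else:
            predicted[query] = []  # nearest hit too weak to trust
    return predicted


# Toy BLAST output with hypothetical sequence IDs.
blast_lines = [
    "novel1\tknown_kinase\t62.5\t200\t75\t0\t1\t200\t5\t204\t1e-80\t250",
    "novel1\tknown_transporter\t35.0\t180\t117\t3\t10\t189\t20\t199\t2e-10\t60.1",
    "novel2\tknown_protease\t28.0\t50\t36\t0\t1\t50\t1\t50\t0.5\t25.0",
]
annotations = {
    "known_kinase": ["ATP binding", "protein kinase activity"],
    "known_transporter": ["transmembrane transport"],
    "known_protease": ["proteolysis"],
}

predicted = transfer_annotations(best_hits(blast_lines), annotations)
print(predicted)
```

Here novel1 inherits the annotations of its top hit, while novel2 gets nothing because its only hit is above the cutoff.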
One caveat though: regardless of how you perform guilt-by-association, you should be careful about transferring annotations that were computationally derived, as this introduces bias and may propagate errors. The Gene Ontology annotations include codes you can use to see how a protein's functional annotations were determined.
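As a sketch of how you might act on those evidence codes, the snippet below keeps only annotations backed by experimental evidence from lines in GAF (GO Annotation File) format, where column 2 is the gene product ID, column 5 the GO ID and column 7 the evidence code. The set of codes treated as experimental is my assumption (it leaves out the high-throughput codes), and the toy lines use made-up gene IDs and references.

```python
# Evidence codes treated as experimental here (an assumption; adjust as needed).
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def experimental_annotations(gaf_lines):
    """Map each gene product to the GO IDs supported by experimental evidence."""
    kept = {}
    for line in gaf_lines:
        if line.startswith("!"):  # GAF header and comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # malformed line
        gene_id, go_id, evidence = fields[1], fields[4], fields[6]
        if evidence in EXPERIMENTAL_CODES:
            kept.setdefault(gene_id, set()).add(go_id)
    return kept


# Toy GAF lines (hypothetical genes and references, truncated after column 9).
gaf_lines = [
    "!gaf-version: 2.2",
    "ToyDB\tgene1\tsym1\t\tGO:0004672\tREF:toy1\tIDA\t\tF",
    "ToyDB\tgene1\tsym1\t\tGO:0005524\tREF:toy2\tIEA\t\tF",
    "ToyDB\tgene2\tsym2\t\tGO:0004672\tREF:toy3\tISS\t\tF",
]

trusted = experimental_annotations(gaf_lines)
print(trusted)
```

The IEA (electronic) and ISS (sequence similarity) annotations are dropped, so gene2 ends up with nothing to transfer.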
Something to keep in mind is that about one third of the genes in E. coli are unannotated, and it's not for lack of BLAST searches or guilt-by-association:
Hu P, Janga SC, Babu M, Díaz-Mejía JJ, Butland G, et al. (2009) Global Functional Atlas of Escherichia coli Encompassing Previously Uncharacterized Proteins. PLoS Biol 7(4): e1000096. doi:10.1371/journal.pbio.1000096
Plus a lot of homology-based annotation in the databases is just wrong.
A useful tool for this is the dcGO predictor: http://supfam.cs.bris.ac.uk/SUPERFAMILY/cgi-bin/dcpredictormain.cgi. You can also find other function prediction tools listed in the Critical Assessment of Function Prediction (CAFA) competition.