I am interested in predicting and confirming the biochemical functions of some proteins. Protein modelling results showed some ligand binding sites. How important are these sites, and how can one predict protein function from the ligands?
Generally, it would be helpful if you could provide some additional information for your question.
What is the source of your proteins? I assume you are dealing with sequence data? What organism are you working with, and how did these sequences come to your attention? Is there any hypothesis as to what they might be?
Next you might give a few details about what protein modelling techniques you used.
Did you BLAST the sequences? Did you run domain / motif predictions on a web server, and if so, which one did you use?
To at least start to give you an answer on your question:
Ligand binding sites are probably the most important feature in a protein if you want to characterise its function. Enzymes for example bind their substrates via ligand binding sites.
If, for example, you had identified an ATP binding site, you could be reasonably confident that you are dealing with an enzyme catalysing an ATP-dependent reaction.
However, from sequence data alone it will be very difficult to predict the ligand for the binding site. You would have to characterise the binding properties of the molecule by other, experimental methods.
I agree with Peter that it would be important to know both which proteins you have been looking at and which computer program you used to identify the potential ligand binding sites. There are several commercial programs, many of them used by the pharmaceutical industry in the search for lead compounds for potential drugs.
Are you looking for metal binding sites? Or are you interested in potential protein phosphorylation sites, which are important modifiers of protein/enzyme function? In that case there are freely available programs.
It seems that you have very little information about the protein you are investigating. So it seems to be necessary to lay out a rational plan of action for finding or developing further information:
1. What is the source of the protein? What is its origin? Is it from an animal, plant or fungal source, or is it bacterial or viral?
2. Is it physiologically extracellular or is it tightly associated with cells? If cellular, then what cellular compartment is involved?
3. How was the protein identified and isolated? What is currently known about it?
4. If the protein sequence is not available then obtaining it is the first objective. A tryptic digest of the SDS-PAGE band should be helpful in obtaining a rough sequence from a suitable MS approach. Based on your mention of modeling, it seems that you have at least a partial amino acid sequence. What are its physical properties? How large is the protein? How many residues? Which residues are modified and how?
5. Compare the study protein with known proteins with respect to their sequence homology and domain structure. This is when some (not all) potential ligands may be identified. It is, however, important to know, for example, if the study protein has homology to some known structural protein (tubulin, fibrinogen, etc), a carrier/transport protein (albumin, hemoglobin, transferrin, etc) or some enzyme (kinase, protease, etc).
6. Consider the properties of the potential ligands. Are they ions (Na+, Ca2+, Cu2+, etc), second messenger type molecules (cAMP, cGMP etc), coenzymes/cofactors (Vitamin K, Vitamin B6, etc), or larger macromolecules (GAGs, lipids, prosthetic groups, peptides, other proteins, etc). Needless to say, any potential ligand would need to be verified through actual binding studies to ensure that the interaction actually occurs. Clearly if the interaction is real it would have enormous implications for protein function.
7. Merely because a structural feature is identified with homology to a known interaction, however, does not mean that in the study protein it will also serve a similar role. A variety of reasons why this may not be so may be in play, such as: a) Compartmentalization may limit access; b) The putative binding region may not be sterically accessible to ligand; c) It may not be fully competent for the interaction. There are numerous examples of protein domains with homologies to specific known binding domains in other proteins but without the expected actual interactions (usually due to some critical residue modifications).
Hopefully, the above points will help to organize the approach to this question a little.
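As a rough illustration of point 4, the sketch below performs an in silico tryptic digest and reports peptide lengths and masses, which is the kind of list one would compare against MS peaks. It assumes the usual simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages and no modifications. The function names and the toy sequence are made up for the example and are not from any particular package.

```python
# Monoisotopic residue masses in daltons for the 20 standard, unmodified residues.
MONO_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per free peptide (N-terminal H plus C-terminal OH)


def tryptic_digest(seq):
    """Split seq after every K or R that is not followed by P (no missed cleavages)."""
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        # Whatever remains after the last cleavage site is the C-terminal peptide.
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide, in daltons."""
    return sum(MONO_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQRPLLDGSK"  # hypothetical sequence, for illustration only
    for pep in tryptic_digest(toy_sequence):
        print(f"{pep:<12} length={len(pep):>2} mass={peptide_mass(pep):.4f}")
```

Real digests also produce missed cleavages and modified residues, so a dedicated proteomics tool is the better choice for actual peak matching.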
Thanks guys for the interest you've shown in my question. I am certainly new to this part of research, but willing to learn. At this stage I can only tell you a bit about what I am interested in:
1. It's a Mycoplasma gene cloned from the available genome sequence
2. No homologues found, but of interest because it's in an important operon
3. Purified protein after GST tagging in E. coli
4. Sequenced to confirm protein of interest correct
5. Submitted a.a seq to RaptorX
6. Showed DT (thymidine 5'-monophosphate) and DC (2'-deoxycytidine 5'-monophosphate) ligands.
Maybe this helps in refining the question. Cheers everyone
Ok. So you have the sequence and you have the protein available in the lab. That is a good start since you can test your computational results.
I would also suggest you try and look into a few other computational tools.
www.expasy.org should be a valuable resource for you.
Under proteomics you will find a subpage for protein function and characterisation. There the membrane protein topology prediction could be of interest.
Furthermore, in the proteomics tab there is the patterns menu, which will allow you to search for known patterns from Prosite or similar.
I understand that you confirmed the DT and DC ligands experimentally? In that case, could you develop a hypothesis for the function from the other genes in the operon? You might also try to find out on which side of the membrane your ligand binding site is.
Dear Peter, thanks for responding. I haven't experimentally confirmed the DT and DC ligands. I looked at the protein topology as suggested, and almost all residues are below the lower cutoff except the first 15 a.a., which I guess is the signal peptide. My ultimate aim is to experimentally find out its biochemical function.
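A topology result like the one described above can be sanity-checked locally with a simple sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle scale. The window of 19 residues and the threshold of 1.6 are a commonly cited rule of thumb for membrane-spanning segments and are assumptions here, not the settings of the web server used. The function names are made up for the example.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean Kyte-Doolittle score for every full window along seq."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean score exceeds threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, mean in enumerate(profile) if mean > threshold]


if __name__ == "__main__":
    # Hypothetical toy sequence: a leucine stretch flanked by aspartates.
    toy_sequence = "D" * 10 + "L" * 19 + "D" * 10
    print(hydrophobic_windows(toy_sequence))
```

A short hydrophobic stretch confined to the N-terminus, with nothing above threshold elsewhere, would be consistent with a signal peptide rather than a membrane-spanning protein. A dedicated signal peptide predictor is the more reliable check.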
I forgot to ask earlier: have you tried BLASTP against other genomes from similar species?
Now it would be interesting to check whether you can get dAMP and dGMP to bind as well. Generally, probing similar ligands to learn about the specificity would be an idea.
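Once binding data exist for several nucleotides, comparing them comes down to simple arithmetic. The sketch below assumes a one-site binding model with ligand in excess, where fractional occupancy is [L] / (Kd + [L]). The function names and the ligand labels are placeholders, and the Kd values are arbitrary numbers chosen to show the calculation, not measurements.

```python
def fraction_bound(ligand_conc, kd):
    """Fractional occupancy for one-site binding; both values in the same units."""
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("ligand_conc must be >= 0 and kd must be > 0")
    return ligand_conc / (kd + ligand_conc)


def rank_by_affinity(kd_by_ligand):
    """Ligand names sorted from tightest (lowest Kd) to weakest binder."""
    return sorted(kd_by_ligand, key=kd_by_ligand.get)


def selectivity(kd_reference, kd_other):
    """Fold preference for the reference ligand over another ligand."""
    return kd_other / kd_reference


if __name__ == "__main__":
    # Placeholder Kd values in micromolar, purely to show the arithmetic.
    example_kds = {"ligand_A": 5.0, "ligand_B": 50.0}
    print(rank_by_affinity(example_kds))
    print(fraction_bound(10.0, example_kds["ligand_A"]))
    print(selectivity(example_kds["ligand_A"], example_kds["ligand_B"]))
```

If a single-site model fits the real data poorly, that in itself is informative, for example it may point to more than one site or to cooperative binding.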
Apart from that, you might make a list of the functions you could suspect, e.g. is it a nucleotide importer, or is it a sensory receptor? Does the protein catalyse a reaction, and if so, which?
You might also look at your original source species and see what the phenotype of a knockout mutant is. You said it was from a mycoplasma? Maybe temperature sensitivity or something similar is elevated. Finding out when the protein gets expressed might also give an indication.