Yes, I think so. Do you know a way to pick out the sequences in the PDB database that contain active sites? It would be much the same as running BLASTP against it :)
The core of the question is how to identify peptide sequences across the PDB database (not a single PDB entry, but thousands of them) that contain active sites in two (or more) chains.
So far I have not seen a filter that can be applied to the PDB database to separate out active sites. A custom script is probably needed.
The problem is that active site residues are usually not contiguous in sequence, so BLASTP is not the ideal tool for finding them. In the PDB advanced search (https://www.rcsb.org/search/advanced), you can also search for sequence motifs and structural motifs, or for linked external resources, such as UniProt-mapped annotations, which include things like EC numbers denoting the catalytic function.
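To illustrate why a motif search copes with non-contiguous residues better than BLASTP does, here is a minimal sketch of a gapped motif scan over locally held sequences. The function name find_motif and the toy sequences are made up for illustration. The pattern HE..H is the regular-expression form of the well-known HExxH zinc-binding motif, where each dot matches any residue.

```python
import re


def find_motif(sequences, pattern):
    """Return {name: [(1-based position, matched text), ...]} for each hit.

    `sequences` maps a name to a one-letter amino acid string and
    `pattern` is a regular expression; '.' stands for any residue.
    """
    # A lookahead with a capture group reports overlapping matches too.
    regex = re.compile(f"(?=({pattern}))")
    hits = {}
    for name, seq in sequences.items():
        positions = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    # Toy sequences, not real PDB entries.
    toy = {"p1": "MKHELGHAL", "p2": "MKTAYIAK"}
    print(find_motif(toy, "HE..H"))
```

A hit only says that the pattern occurs in the sequence. Whether those residues actually form an active site still has to be checked against the structure or the annotation.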
Thank you, I am now using the RCSB advanced filters you linked. However, they still cannot limit the results to sequences that contain active sites. But a .pdb file has this information inside it, or at least it can be worked out (counted) from entries that contain interacting molecules.
BLASTP is not actually a requirement; what I want is a database containing only active sites, so that I can practise different search algorithms. I will be setting up my own server with bioinformatics tools this spring and will try to write a program that searches/filters .pdb files for sequences containing active sites and outputs a database of them.
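As a rough sketch of what such a script could look like: legacy PDB-format files may carry SITE records, which list the residues of annotated sites in fixed columns (up to four residues per line). The code below reads those records and reports sites whose residues come from more than one chain. The function names are hypothetical. Note the assumptions: not every entry has SITE records, they often describe ligand-binding sites rather than catalytic residues, and mmCIF files store the equivalent data in the _struct_site_gen category instead.

```python
from collections import defaultdict


def parse_site_records(pdb_text):
    """Collect SITE records from PDB-format text.

    Returns {site_id: [(res_name, chain_id, res_seq), ...]}.
    """
    sites = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("SITE  "):
            continue
        line = line.ljust(80)            # pad so fixed-column slices are safe
        site_id = line[11:14].strip()    # columns 12-14: site name
        # Residue slots start at columns 19, 30, 41 and 52 (0-based 18, 29, 40, 51).
        for start in (18, 29, 40, 51):
            res_name = line[start:start + 3].strip()
            if not res_name:
                continue                 # unused slot on a partly filled line
            chain_id = line[start + 4]
            res_seq = int(line[start + 5:start + 9])
            sites[site_id].append((res_name, chain_id, res_seq))
    return dict(sites)


def multi_chain_sites(sites):
    """Keep only sites whose residues belong to two or more chains."""
    result = {}
    for site_id, residues in sites.items():
        chains = sorted({chain for _, chain, _ in residues})
        if len(chains) > 1:
            result[site_id] = chains
    return result
```

Running parse_site_records over every downloaded file and keeping the entries for which multi_chain_sites is non-empty would give a first, crude list of structures with sites shared between chains.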
Well, you can download a list of all hetero compounds from the PDB. However, not all of these are ligands: the list contains any non-protein/nucleic acid component found in any of the structures, e.g. crystallisation additives, buffer compounds, lipids etc. As for redundancy, you might look at the biological units rather than the asymmetric units of the crystals.
Thank you for the very valuable suggestion of choosing biological units rather than asymmetric units.
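The caveat about additives can be handled, very crudely, by scanning HETATM records and dropping residue names that are usually solvent or crystallisation components. The sketch below does that for PDB-format text. The exclusion set LIKELY_ADDITIVES and the function name are assumptions made for illustration, the set is deliberately incomplete, and some of its members (phosphate, for instance) can be genuine ligands in particular structures.

```python
# Illustrative and incomplete exclusion list (an assumption, not a curated
# standard): water, heavy water, glycerol, ethylene glycol, PEG fragment,
# sulfate, phosphate, acetate, DMSO, MPD and Tris.
LIKELY_ADDITIVES = {
    "HOH", "DOD", "GOL", "EDO", "PEG", "SO4", "PO4", "ACT", "DMS", "MPD", "TRS",
}


def candidate_ligands(pdb_text, excluded=LIKELY_ADDITIVES):
    """Return sorted unique (res_name, chain_id, res_seq) from HETATM records,
    skipping residue names listed in `excluded`."""
    found = set()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()   # columns 18-20: residue name
        if not res_name or res_name in excluded:
            continue
        chain_id = line[21]              # column 22: chain identifier
        res_seq = int(line[22:26])       # columns 23-26: residue number
        found.add((res_name, chain_id, res_seq))
    return sorted(found)
```

A name-based filter like this is only a first pass. A curated resource decides biological relevance case by case, which a fixed list cannot do.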
Meanwhile I've found a very good resource with many protein-related tools, including a database of interactions with a search quite similar to the one described in the first message: PepBDB. http://huanglab.phys.hust.edu.cn/pepbdb/db/1a93_A/
Regarding my own local database: yes, it is definitely an option I am looking into and want to implement.
So what you wanted were peptide interactions? Since you asked for active sites, I assumed you were looking for enzyme-substrate, enzyme-product, enzyme-inhibitor, and enzyme-cofactor interactions, and answered your questions accordingly. If you are still looking for these, you might want to look at BioLiP (http://zhanglab.ccmb.med.umich.edu/BioLiP), which I just came across while searching for something different on the Zhang Lab site.
BioLiP is a semi-manually curated database for high-quality, biologically relevant ligand-protein binding interactions. The data is collected primarily from the Protein Data Bank (PDB), with biological insights mined from literature and other specific databases, followed by both computational and manual verifications. Reference:
Jianyi Yang, Ambrish Roy, and Yang Zhang. BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions, Nucleic Acids Research, 41:D1096-D1103, 2013.
Yes, thank you, I have been using all of the Zhang Lab tools since I discovered them and find them very useful. I am interested specifically in interactions of certain specific protein motifs (which I was looking for and have found) with other molecules, mostly peptides, but not only peptides.
P.S. I thought I had already shared the BioLiP link from the Zhang Lab, because it was actually the first thing I found for my query.