I used to be a heavy user of Pfam and its HMM profiles. Even though InterPro is supposedly its successor, a lot of what was available at Pfam is not yet available on InterPro, so I now have to do it myself by writing my own scripts around HMMER, the Representative Proteomes database, etc. I can usually do most of what I want when I have an HMM available, but I don't know how to deal with IPR entries.

For example, if I have a huge number of sequences (tens of thousands, sometimes hundreds of thousands) that contain a given Pfam domain, I can align them very quickly against the HMM of that domain using HMMER's hmmalign. Is there a similar way of doing this starting from an IPR entry?
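To make the Pfam workflow concrete, here is a minimal sketch of how I typically wrap hmmalign from Python. It assumes HMMER is installed and on the PATH; the function names and the file names in the usage note are hypothetical placeholders, not anything provided by Pfam or InterPro.

```python
import shutil
import subprocess
from pathlib import Path


def build_hmmalign_command(hmm_path, fasta_path, out_path, trim=True):
    """Build the argument list for HMMER's hmmalign.

    Usage of the underlying tool: hmmalign [options] <hmmfile> <seqfile>
    """
    # --amino states that the sequences are protein; -o writes the
    # alignment (Stockholm format by default) to a file instead of stdout.
    cmd = ["hmmalign", "--amino", "-o", str(out_path)]
    if trim:
        # --trim removes unaligned flanking residues outside the domain.
        cmd.append("--trim")
    cmd += [str(hmm_path), str(fasta_path)]
    return cmd


def align_to_profile(hmm_path, fasta_path, out_path, trim=True):
    """Run hmmalign and return the path of the resulting alignment."""
    if shutil.which("hmmalign") is None:
        raise RuntimeError("hmmalign not found on PATH; is HMMER installed?")
    cmd = build_hmmalign_command(hmm_path, fasta_path, out_path, trim=trim)
    # check=True raises CalledProcessError if hmmalign exits with an error.
    subprocess.run(cmd, check=True)
    return Path(out_path)
```

With a profile saved as, say, `domain.hmm` and the sequences in `hits.fasta` (both hypothetical names), calling `align_to_profile("domain.hmm", "hits.fasta", "hits.sto")` would write the alignment to `hits.sto`. Because each sequence is aligned to the profile independently, this scales to very large sequence sets, which is exactly what I am missing for IPR entries.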

Take IPR052750, the InterPro entry for Glycosyl Hydrolase 18 Chitinase. On its webpage ( https://www.ebi.ac.uk/interpro/entry/InterPro/IPR052750/ ) I can get a list of all proteins with this domain and the available PDB structures, but I can't download an alignment as was possible with Pfam. There is no "curation" tab from which I could download an HMM file to build my own alignment. Is there a way to get whatever model defines an IPR entry and, if so, can it be used to create a multiple sequence alignment by fitting sequences to the profile (rather than using regular MSA software, which would be impossible for tens of thousands of sequences)? For some entries (the one I mentioned, for example), it isn't even clear to me which profile is used.
