When downloading alignments for very large protein families, I usually chose one of Pfam's filtered options (Representative Proteomes or Reference Proteomes) to get a manageable number of sequences that still sampled the entire tree of life.

With the shutdown of Pfam, I haven't found an alternative for doing that. If I query UniProt with a Pfam accession, I get every protein containing that domain, but none of the available filters achieves this kind of reduction. Similarly, I can search for a given domain in InterPro, but the results cannot be filtered by such criteria. Is there any way to do this in either of those databases? Alternatively, is there a tool that would take a very large set of proteins (or their UniProt accessions) and keep only those from representative species across the tree of life (low redundancy, and ideally prioritizing the most reliable sequences where there is redundancy)?
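The filtering described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing tool: the `ProteinRecord` class, its field names, the accessions, and the idea of collapsing to one sequence per taxonomic group are all hypothetical, and "most reliable" is approximated here by preferring reviewed entries and then a higher annotation score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinRecord:
    # All fields are hypothetical stand-ins for metadata exported alongside the hits.
    accession: str         # protein identifier
    taxon_group: str       # group to de-duplicate on (e.g. a phylum or order)
    reviewed: bool         # True for a manually reviewed entry
    annotation_score: int  # higher means better annotated


def _rank(rec: ProteinRecord) -> tuple:
    # Reviewed entries beat unreviewed ones; annotation score breaks ties.
    return (rec.reviewed, rec.annotation_score)


def pick_representatives(records):
    """Keep one record per taxon_group, preferring the most reliable one."""
    best = {}
    for rec in records:
        current = best.get(rec.taxon_group)
        # Strict comparison: on an exact tie, the first record seen is kept.
        if current is None or _rank(rec) > _rank(current):
            best[rec.taxon_group] = rec
    return sorted(best.values(), key=lambda r: r.taxon_group)


# Made-up example data, for illustration only.
records = [
    ProteinRecord("A0001", "Bacteria", False, 2),
    ProteinRecord("A0002", "Bacteria", True, 3),
    ProteinRecord("A0003", "Archaea", False, 4),
    ProteinRecord("A0004", "Archaea", False, 1),
    ProteinRecord("A0005", "Eukaryota", True, 5),
    ProteinRecord("A0006", "Eukaryota", True, 2),
]

representatives = pick_representatives(records)
for rec in representatives:
    print(rec.taxon_group, rec.accession)
```

Choosing a finer or coarser `taxon_group` (species, genus, phylum) would control how aggressively the set is reduced. Note that this sketch does not remove sequence-level redundancy within or between groups, which would need a separate clustering step.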
