I am trying to find homologs of my protein of interest in other species before doing a phylogenetic analysis. For this I use protein BLAST (BLASTp), and I would like to know: what would be a good cutoff for percent identity and query coverage?
Hey Claire, unfortunately the answer would be "It depends" and "lots of legwork".
A common rule of thumb for protein homology is to look for around 30-40% identity over the whole sequence length. But I'd recommend not getting stuck on that number and instead taking some time to look at your protein of interest (POI). Do you already have a set of known protein sequences for the POI, and how similar are they to each other? What does the protein structure look like? Does the POI have domains that also occur in other proteins you are not interested in, which might drive up similarity?
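If it helps, here is a minimal sketch of applying that rule of thumb to BLASTp tabular output in Python. It assumes you ran BLASTp with a custom tabular format, `-outfmt "6 qseqid sseqid pident qcovs evalue"`. The 30% identity default follows the rule above, while the 70% coverage default and the example rows are made up for illustration, so treat both as starting points only.

```python
import csv
import io
from dataclasses import dataclass

# Column order assumed from: blastp ... -outfmt "6 qseqid sseqid pident qcovs evalue"
COLUMNS = ["qseqid", "sseqid", "pident", "qcovs", "evalue"]


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    pident: float  # percent identity of the HSP
    qcovs: float   # query coverage per subject, in percent
    evalue: float


def parse_blast_tab(text: str) -> list[Hit]:
    """Parse tab-separated BLAST output laid out as in COLUMNS."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(COLUMNS, row))
        hits.append(Hit(rec["qseqid"], rec["sseqid"], float(rec["pident"]),
                        float(rec["qcovs"]), float(rec["evalue"])))
    return hits


def filter_hits(hits, min_pident=30.0, min_qcov=70.0):
    """Keep hits that pass both the identity and the query-coverage cutoff."""
    return [h for h in hits if h.pident >= min_pident and h.qcovs >= min_qcov]


# Invented example rows, for illustration only.
example = (
    "poi\tspA\t45.2\t95\t1e-80\n"
    "poi\tspB\t28.0\t90\t1e-20\n"
    "poi\tspC\t60.1\t35\t1e-15\n"
)
kept = filter_hits(parse_blast_tab(example))
print([h.sseqid for h in kept])  # ['spA']
```

Keep in mind that `pident` is reported per HSP while `qcovs` is per subject, so a subject with several HSPs shows up on several lines and you may want to deduplicate by `sseqid`.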
Also take some time to look over the matches. Are there lots of long gap stretches? Are the mismatches plausible in that region, i.e. could the protein still do the same things as the POI? What do the matching proteins do? Use the answers to iterate over a few different cutoffs.
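To iterate over a few cutoffs and spot gappy alignments, something like the sketch below could work. It assumes a different custom format, `-outfmt "6 sseqid pident qcovs length gaps"`, where `length` is the alignment length and `gaps` the total number of gap positions. The cutoff grid, the 20% gap threshold and the example rows are assumptions for illustration. It only shows which hits are sensitive to the cutoff or have lots of gaps; you still have to look at those alignments yourself.

```python
from itertools import product

# Assumed tabular layout: blastp ... -outfmt "6 sseqid pident qcovs length gaps"
def parse_rows(text):
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        sseqid, pident, qcovs, length, gaps = line.split("\t")
        rows.append({
            "sseqid": sseqid,
            "pident": float(pident),
            "qcovs": float(qcovs),
            "gap_frac": int(gaps) / int(length),  # share of alignment columns that are gaps
        })
    return rows


def sweep(rows, identities=(25, 30, 35, 40), coverages=(50, 70, 90)):
    """Map each (min_identity, min_coverage) pair to the sorted subject IDs passing both."""
    result = {}
    for min_id, min_cov in product(identities, coverages):
        result[(min_id, min_cov)] = sorted(
            r["sseqid"] for r in rows
            if r["pident"] >= min_id and r["qcovs"] >= min_cov
        )
    return result


def gappy(rows, max_gap_frac=0.2):
    """Subjects whose alignment is more than max_gap_frac gaps; worth inspecting by eye."""
    return sorted(r["sseqid"] for r in rows if r["gap_frac"] > max_gap_frac)


# Invented example rows, for illustration only.
example = (
    "spA\t45.2\t95\t300\t6\n"
    "spB\t32.0\t88\t280\t70\n"
    "spC\t27.5\t72\t310\t12\n"
)
rows = parse_rows(example)
for cutoffs, ids in sweep(rows).items():
    print(cutoffs, ids)
print("gappy:", gappy(rows))  # gappy: ['spB']
```

Hits that appear or disappear between neighbouring cutoffs are the ones most worth checking by hand.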
If you have a good set of sequences for your POI, HMMER (a profile HMM search instead of a single-sequence search) might also be an option.
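For the HMMER route, the usual workflow is to build a profile with `hmmbuild` from an alignment of your known sequences, search a proteome with `hmmsearch --tblout`, and then go through the per-sequence table. Below is a minimal Python sketch that parses that table. The file names in the comments, the 1e-5 E-value cutoff and the example lines are placeholders, and HMMER's own inclusion thresholds (`--incE`) are not used here.

```python
# Parses the per-sequence table written by e.g.:
#   hmmbuild poi.hmm poi_alignment.sto
#   hmmsearch --tblout hits.tbl poi.hmm proteome.fasta
# (file names are placeholders)

def parse_tblout(text):
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.split(maxsplit=18)  # 19th column (description) may contain spaces
        hits.append({
            "target": fields[0],
            "evalue": float(fields[4]),  # full-sequence E-value
            "score": float(fields[5]),   # full-sequence bit score
        })
    return hits


def significant(hits, max_evalue=1e-5):
    """Target names whose full-sequence E-value is at or below max_evalue."""
    return [h["target"] for h in hits if h["evalue"] <= max_evalue]


# Invented example table, for illustration only.
example = (
    "# target name  accession  query name  accession  E-value  score ...\n"
    "spA  -  poi  -  2.1e-45  150.3  0.2  3.0e-45  149.8  0.2  "
    "1.0  1  0  0  1  1  1  1  putative kinase\n"
    "spB  -  poi  -  0.003  12.1  1.5  0.005  11.4  1.5  "
    "1.3  1  0  0  1  1  1  1  hypothetical protein\n"
)
print(significant(parse_tblout(example)))  # ['spA']
```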
If anybody has more tips, please add them! I always find this a very challenging task as well.