Gene models for this genome have been predicted using three different programs. If only blastp is used, is there a possibility of missing hits because of poorly assembled regions, regions with low sequence coverage, or scaffold termini?
While blastp is helpful when searching only among proteins, you are scanning genome sequences to locate protein families, so I would suggest you also try tblastn and/or tblastx. tblastn is the right choice if your query is an amino acid (protein) sequence. If your query is a nucleotide sequence, use tblastx against the genome; blastx is for a nucleotide query searched against a protein database.
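As a quick reference for the choice above, here is a minimal sketch that maps the query type and database type to the matching BLAST program. The function name and the translate_both argument are just illustrative, not part of any BLAST tool.

```python
# Which BLAST program compares which kind of query to which kind of database.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("protein", "nucleotide"): "tblastn",    # database is translated in six frames
    ("nucleotide", "protein"): "blastx",     # query is translated in six frames
    ("nucleotide", "nucleotide"): "blastn",
}


def pick_blast_program(query_type, db_type, translate_both=False):
    """Return the BLAST program for a query/database type pair.

    translate_both=True selects tblastx, which translates both a nucleotide
    query and a nucleotide database and compares them at the protein level.
    """
    if translate_both:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, db_type)]


# Protein family members searched against a genome assembly:
print(pick_blast_program("protein", "nucleotide"))
# Nucleotide (e.g. transcript) queries searched against a genome assembly:
print(pick_blast_program("nucleotide", "nucleotide", translate_both=True))
```

For a protein family searched against scaffolds, this comes down to tblastn, which does not depend on the gene predictions at all.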
How distant is your target genome from the reference you are using, evolutionarily? If the distance is too large, it might help to do the following:
Extract proteins from the predicted genes in your target assembly. From known members of your family of interest, create an HMM profile. Use the profile to search the proteins predicted from the target genome assembly. You can adjust the sensitivity of the profile by limiting the divergence among the sequences in your initial data set.
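The steps above could be sketched with HMMER roughly as follows. This assumes HMMER 3 (hmmbuild, hmmsearch) is installed and that you already have a multiple alignment of known family members; all file names and the E-value cutoff are placeholders I made up. The script only prints the commands so you can check them before running anything.

```python
import shlex


def hmmer_commands(alignment, profile, proteome, table, evalue=1e-5):
    """Build the two HMMER command lines: make a profile, then search with it.

    alignment: multiple alignment of known family members (e.g. Stockholm)
    profile:   output path for the HMM profile
    proteome:  FASTA of proteins extracted from the predicted gene models
    table:     output path for the per-sequence tabular hit report
    """
    build = ["hmmbuild", profile, alignment]
    search = ["hmmsearch", "--tblout", table, "-E", str(evalue), profile, proteome]
    return [build, search]


# Placeholder file names; replace with your own.
commands = hmmer_commands(
    "family_seed.sto", "family.hmm", "predicted_proteins.faa", "family_hits.tbl"
)
for cmd in commands:
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

A seed alignment with fewer, more closely related sequences gives a narrower profile; adding more divergent members widens it.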
While this is a roundabout way of searching for related proteins, it is also quite sensitive. The BLAST family of tools is great, but not very sensitive over long evolutionary distances.
Thanks for your input. I am using HMMs from Pfam for this family as queries and performing blastp against the target predicted proteome as well. My dilemma is whether running multiple blastp and psi-blast searches with different query sets is appropriate to maximize the hits. If I perform tblastn as well, will it be a redundant analysis, or could it yield some additional hits?
Yes, it could increase your set of candidate sequences. Most of the results will be identical, though. In fact, I would go for a custom psi-tblastn search, that is, a tblastn search driven by a PSSM built with psi-blast. The BLAST manual should help you out.
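A rough sketch of what such a psi-tblastn run could look like with BLAST+: psiblast builds a PSSM against a protein database and saves it as a checkpoint file, then tblastn uses that checkpoint to search the genome. The file and database names, the number of iterations and the E-value are placeholders, and which protein database to build the PSSM against is your call. Again this only prints the commands.

```python
import shlex


def psi_tblastn_commands(query, protein_db, genome_db, pssm, out,
                         iterations=3, evalue=1e-5):
    """Build the two BLAST+ command lines for a PSSM-driven tblastn search."""
    # Step 1: iterate psiblast against a protein database and save the PSSM.
    build_pssm = [
        "psiblast", "-query", query, "-db", protein_db,
        "-num_iterations", str(iterations), "-out_pssm", pssm,
    ]
    # Step 2: search the translated genome with the saved PSSM, tabular output.
    search_genome = [
        "tblastn", "-in_pssm", pssm, "-db", genome_db,
        "-evalue", str(evalue), "-outfmt", "6", "-out", out,
    ]
    return [build_pssm, search_genome]


# Placeholder names; both databases must first be formatted with makeblastdb.
psi_commands = psi_tblastn_commands(
    "family_member.faa", "protein_db", "genome_db",
    "family.pssm", "psi_tblastn_hits.tsv",
)
for cmd in psi_commands:
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Hits from the second step that fall outside all of your predicted gene models are the candidates that blastp against the proteome could not have found.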