I ran a BLAST search with the sequence of my protein of interest and got around 4000 sequences in the result. How do I eliminate redundant sequences and multiple entries for the same organism from a BLAST search result? Using CD-HIT with a 90% cutoff still gives a huge number of redundant sequences. I have even gone down to a 40% cutoff and still ended up with multiple redundant entries. When I run a sequence search in the Pfam database, I also get around 6000 sequences, but there are only 12 seed sequences. What is the right way to obtain a good number of homologous sequences? Thanks in advance.
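To make the "multiple entries for the same organism" part concrete, here is a minimal sketch of what I mean by keeping one entry per organism. It assumes the hits are exported as FASTA with NCBI-style headers that end in the organism name in square brackets (e.g. `[Homo sapiens]`), and it keeps the longest sequence for each organism. The function names are hypothetical, and picking the longest sequence is only one possible rule, not an established method.

```python
import re


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def organism_of(header):
    """Return the last '[...]' tag in the header, or None if absent.

    Assumption: NCBI-style headers such as '... protein [Homo sapiens]'.
    """
    matches = re.findall(r"\[([^\[\]]+)\]", header)
    return matches[-1] if matches else None


def one_per_organism(text):
    """Keep the longest sequence per organism (illustrative rule only)."""
    best = {}
    for header, seq in parse_fasta(text):
        # Headers without an organism tag are kept as their own group.
        org = organism_of(header) or header
        if org not in best or len(seq) > len(best[org][1]):
            best[org] = (header, seq)
    return list(best.values())
```

Something like this could be run on the FASTA output either before or after CD-HIT clustering; it only collapses entries by organism name and does not look at sequence similarity at all.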
