I have a group of orthologous proteins such as A A' A" B C D D' E F G H, where each letter denotes a different species; A A' A" and D D' are inparalogs. I want to remove the inparalogs from this group and keep only the one inparalog per species that is closest (most similar) to the other group members. The final group should look like A* B C D* E F G H (where A* and D* are the inparalogs closest to the other species' sequences). I can do this manually by building a phylogenetic tree and then pruning the inparalogs, but the problem is that I have around 23-33 species and >1000 groups. Strictly, each species may occur only once per group.
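
To make the selection rule above concrete, here is a minimal Python sketch of one way it could be automated without building a tree. It rests on several assumptions that are not part of the question: pairwise similarity scores (for example percent identity or bit scores from an all-vs-all search) are already available for every group, "closest to the other group members" is read as the highest mean of the best score against each other species, and missing pairs count as 0. The function name `pick_representatives`, the sequence IDs and the scores are all hypothetical toy values.

```python
from collections import defaultdict


def pick_representatives(species_of, pair_scores):
    """Keep one sequence per species from an orthologous group.

    species_of  : dict mapping sequence ID -> species label
    pair_scores : dict mapping (seq_a, seq_b) -> similarity score
                  (key order does not matter; missing pairs count as 0.0)
    Assumed rule: for each species, keep the sequence with the highest mean
    of its best score against every other species in the group.
    """
    # Make lookups independent of the order of the two IDs in each key.
    similarity = {frozenset(pair): score for pair, score in pair_scores.items()}

    by_species = defaultdict(list)
    for seq_id, species in species_of.items():
        by_species[species].append(seq_id)

    def closeness(seq_id):
        own = species_of[seq_id]
        best_per_species = [
            max(similarity.get(frozenset((seq_id, other)), 0.0) for other in members)
            for species, members in by_species.items()
            if species != own
        ]
        if not best_per_species:
            return 0.0
        return sum(best_per_species) / len(best_per_species)

    # sorted() makes ties deterministic: the first ID in sort order wins.
    return {
        species: max(sorted(members), key=closeness)
        for species, members in by_species.items()
    }


# Toy group: species A has three inparalogs, species D has two.
toy_species = {
    "A1": "A", "A2": "A", "A3": "A",
    "B1": "B", "C1": "C",
    "D1": "D", "D2": "D",
}
# Made-up similarity scores, for illustration only.
toy_scores = {
    ("A1", "B1"): 60, ("A1", "C1"): 55, ("A1", "D1"): 50, ("A1", "D2"): 52,
    ("A2", "B1"): 80, ("A2", "C1"): 78, ("A2", "D1"): 70, ("A2", "D2"): 75,
    ("A3", "B1"): 65, ("A3", "C1"): 60, ("A3", "D1"): 58, ("A3", "D2"): 59,
    ("B1", "C1"): 85, ("B1", "D1"): 72, ("B1", "D2"): 77,
    ("C1", "D1"): 70, ("C1", "D2"): 76,
    ("D1", "D2"): 90,  # same-species pair, ignored by the rule
}

chosen = pick_representatives(toy_species, toy_scores)
print(chosen)  # {'A': 'A2', 'B': 'B1', 'C': 'C1', 'D': 'D2'}
```

Taking the best score per other species before averaging keeps a species with many inparalogs from dominating the mean. Whether this matches what a tree-based pruning would choose is not guaranteed, so it may be worth checking a handful of groups by hand before running it over all >1000 groups.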
