I am revisiting a phylogeny I built years ago, but RefSeq in GenBank now has more than 15,000 relevant protein sequences. I want to filter those down to a more manageable set. I am using T-Coffee, as I did before, but it is taking a long time, and I am wondering what people use these days. Are there other tools that automatically remove sequences that are very similar to each other? Since I am interested in the deep branches, I don't need all of the sequences, just the few hundred most divergent ones (many of them are different strains of E. coli, for example). Any suggestions?
