I want to construct a phylogenetic tree from almost 2000 sequences from both prokaryotes and eukaryotes. To simplify the tree, I would like to shortlist the sequences. What is the best way to do this without introducing bias into the selection?