Hi all!

I want to construct a phylogenetic tree of all earthworm (Lumbricidae) COI sequences available on GenBank.

My search on the NCBI Nucleotide database retrieved about 10,000 sequences. Obviously, I can't construct a tree using all of these. I also realize the results will include redundant or duplicate records and unverified sequences.

What would be my next step when dealing with these sequences? Should I clean my dataset of 10,000 sequences? If yes, how would I do that? Which tools or software are commonly recommended?
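
To make the question concrete, here is a minimal sketch of the kind of first-pass cleaning I have in mind, assuming the search results are downloaded as a single FASTA file. It drops records flagged as UNVERIFIED in the header, records shorter than a cutoff, and exact duplicate sequences. The function names and the 500 bp cutoff are my own placeholders, not an established protocol, and collapsing identical sequences also discards the information that several specimens share a haplotype, so I am not sure this is the right approach.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are uppercased so duplicates compare equal.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def clean_records(records, min_length=500):
    """Keep the first copy of each sequence that passes the basic filters.

    min_length is a hypothetical cutoff (assumption); adjust it to the
    COI fragment length you actually expect.
    """
    seen = set()
    kept = []
    for header, seq in records:
        if "UNVERIFIED" in header.upper():
            continue  # record flagged as unverified in its header
        if len(seq) < min_length:
            continue  # too short to be a useful COI fragment
        if seq in seen:
            continue  # exact duplicate of a sequence already kept
        seen.add(seq)
        kept.append((header, seq))
    return kept
```

Usage would be something like `clean_records(parse_fasta(open("sequences.fasta").read()))`, where `sequences.fasta` is a placeholder file name.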

Any insight on the logical next steps would be immensely appreciated.
