Hi, I have over 2 million nucleotide sequences, each about 30 kb long on average. I want to run a sequence identity and clustering analysis on them. I tried the CD-HIT suite, but it fails every time. Is there any tool that can run a sequence identity analysis on 2 million sequences? Or any other suggestions?
