We have a large dataset of mtDNA COI sequences: around 1500 sequences from around 30 insect species. We used DAMBE (http://dambe.bio.uottawa.ca/dambe.asp) to condense the sequences into haplotypes. However, after aligning with ClustalX and trimming all the poorly aligned regions with Gblocks (http://molevol.cmima.csic.es/castresana/Gblocks_server.html), we only achieved a compression of about 50% (800 haplotypes). Many of the remaining sequences differ only slightly, sometimes by less than 1%. Does anyone know of a program that can condense such slightly divergent sequences into consensus sequences? The alternative we are considering is to build a phylogenetic tree and select sequences that cluster together with short branch lengths between them, but that would certainly be more time-consuming.
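To make the request concrete, here is a minimal Python sketch of the kind of condensing I have in mind. It assumes the sequences are already aligned and trimmed to the same length. The function names, the gap/ambiguity characters, and the 1% threshold are illustrative choices of mine, not part of DAMBE, ClustalX, or Gblocks. It uses a simple greedy scheme (each sequence joins the first cluster whose representative is within the threshold), so the result can depend on input order.

```python
from collections import Counter

# Characters treated as missing data (an assumption; adjust to your alignment).
MISSING = set("-N?")


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    skipping any site where either sequence has missing data."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in MISSING or y in MISSING:
            continue
        compared += 1
        if x != y:
            diffs += 1
    # With no comparable sites, treat the pair as maximally distant.
    return diffs / compared if compared else 1.0


def greedy_cluster(seqs, threshold=0.01):
    """seqs maps name -> aligned sequence. Returns a list of clusters,
    each a list of names; the first member acts as the representative."""
    clusters = []  # list of (representative sequence, member names)
    for name, seq in seqs.items():
        for rep, members in clusters:
            if p_distance(rep, seq) <= threshold:
                members.append(name)
                break
        else:
            clusters.append((seq, [name]))
    return [members for _, members in clusters]


def consensus(aligned):
    """Majority-rule consensus of equal-length sequences, ignoring
    missing data; a column with no data becomes '-'."""
    out = []
    for column in zip(*aligned):
        counts = Counter(c for c in (ch.upper() for ch in column)
                         if c not in MISSING)
        out.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(out)


if __name__ == "__main__":
    # Toy data, not real COI sequences.
    toy = {
        "a1": "ACGTACGTAC",
        "a2": "ACGTACGTAT",
        "b1": "TTGTACCTAA",
    }
    for members in greedy_cluster(toy, threshold=0.15):
        print(members, consensus([toy[m] for m in members]))
```

For real data the threshold would need to stay well below the smallest between-species divergence, otherwise distinct species could be merged into one consensus.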
Thank you in advance for your help!