I have a set of 2000 sequences. Two of the sequences in that set are:
>Seq1
AAAAAAAAAAAAAAAAAAAAAAA
>Seq2
UUUUUUUUUUUUUUUUUUUUU
When I cluster the set with cd-hit-est using default options, these two are put in the same cluster, with Seq1 as the representative and Seq2 as.... -/100%
Is this correct?
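
My guess is that the "-" means Seq2 was matched against the reverse strand of Seq1. The small Python sketch below only checks that guess on these two sequences; it is not how cd-hit-est works internally. It assumes U is read as T, which I have not confirmed for cd-hit-est, and `reverse_complement` is my own hypothetical helper, not part of any tool.

```python
# Hypothetical helper for illustration only; not part of cd-hit.
# Assumption: U is treated the same as T when comparing sequences.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of seq, written in DNA letters."""
    return seq.upper().translate(COMPLEMENT)[::-1]


seq1 = "AAAAAAAAAAAAAAAAAAAAAAA"
seq2 = "UUUUUUUUUUUUUUUUUUUUU"

# Rewrite Seq2 in DNA letters so both sequences use the same alphabet.
seq2_as_dna = seq2.upper().replace("U", "T")

# True if Seq2 occurs exactly somewhere on the reverse strand of Seq1.
minus_strand_match = seq2_as_dna in reverse_complement(seq1)
print(minus_strand_match)
```

If that check comes out true, a 100% match on the opposite strand would at least be consistent with the "-/100%" I see in the cluster output.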