Dear all, I am trying to use CD-hit to remove the duplicates from the file that is the output from trinity (RNA seq assembly).

I used the following parameters:

cd-hit-est -i in.fasta -o out_cdhit90.fasta -c 0.90 -n 9 -d 0 -M 0 -T 0

But the output file still contains lots of small or fragmented sequence plus the best one. How can I remove those small or fragmented duplicates by changing the parameters?

thanks

ZQ

More Zhiqiang Wu's questions See All
Similar questions and discussions