06 June 2014

I am working with a plant species that has no reference genome. I assembled the reads de novo into contigs, clustered the contigs by sequence similarity, and took the longest contig of each cluster to build a unigene set.
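To make the unigene step concrete, here is a minimal sketch of "keep the longest contig per cluster". It is not the exact pipeline used here. The function name `pick_unigenes`, the contig IDs, and the two input dictionaries are hypothetical, and it assumes the clustering tool's output has already been parsed into a contig-to-cluster mapping.

```python
def pick_unigenes(contigs, clusters):
    """Return {cluster_id: contig_id} holding the longest contig of each cluster.

    contigs  -- dict mapping contig ID to its sequence (hypothetical input)
    clusters -- dict mapping contig ID to its cluster ID (hypothetical input)
    """
    best = {}
    for contig_id, seq in contigs.items():
        cluster_id = clusters[contig_id]
        current = best.get(cluster_id)
        # Keep the first contig seen; replace it only with a strictly longer one.
        if current is None or len(seq) > len(contigs[current]):
            best[cluster_id] = contig_id
    return best


# Toy data for illustration only, not real assembly output.
toy_contigs = {"c1": "ATGC", "c2": "ATGCGT", "c3": "GGA"}
toy_clusters = {"c1": "k1", "c2": "k1", "c3": "k2"}
unigenes = pick_unigenes(toy_contigs, toy_clusters)
```

With this selection rule, every contig that is not the longest in its cluster is dropped from the unigene set, which is where the two datasets differ.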

My question is: which of the two datasets should be used for downstream analysis (such as GO annotation and expression analysis), the full contig assembly or the unigene set? I have seen people use either one, or both.
