I am working with a plant species that has no reference genome. I assembled the reads de novo into contigs, clustered the contigs by sequence similarity, and took the longest contig from each cluster to build a unigene set.
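To make the unigene step concrete, here is a minimal sketch of "longest contig per cluster". It assumes the contigs and cluster memberships have already been loaded into plain dictionaries; the function name `pick_unigenes` and the contig and cluster IDs are hypothetical and do not correspond to any particular assembler or clustering tool.

```python
from typing import Dict, List


def pick_unigenes(contigs: Dict[str, str],
                  clusters: Dict[str, List[str]]) -> Dict[str, str]:
    """Return {cluster_id: contig_id} with the longest contig of each cluster.

    contigs  maps contig ID -> nucleotide sequence (hypothetical input shape).
    clusters maps cluster ID -> list of member contig IDs (hypothetical).
    """
    unigenes = {}
    for cluster_id, members in clusters.items():
        if not members:
            continue  # skip empty clusters rather than fail
        # Longest sequence wins; ties go to the alphabetically smallest ID
        # so the result is deterministic.
        unigenes[cluster_id] = min(
            members, key=lambda cid: (-len(contigs[cid]), cid)
        )
    return unigenes


# Toy data for illustration only.
example_contigs = {"c1": "ATGC", "c2": "ATGCATGC", "c3": "GG"}
example_clusters = {"k1": ["c1", "c2"], "k2": ["c3"]}
example_unigenes = pick_unigenes(example_contigs, example_clusters)
```

In this sketch every contig that is not the longest in its cluster is dropped from the unigene set, which is the difference between the two datasets I am asking about.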
My question is: which of the two datasets should be used for downstream analysis (such as GO annotation and expression analysis), the full contig assembly or the unigene set? I have seen people use either one, or both.