I recently did a transcriptome assembly using Trinity and got one FASTA file. I want to do further analyses on unigenes only. My question is: how do I identify the unigenes among the transcripts and produce a FASTA file containing only the unigenes?
Dear Lungelo Khanyile, in the context of a transcriptome assembly, such as one produced by Trinity, a unigene refers to a unique assembled transcript that stands in for all the isoforms of a single gene. The purpose is basically to reduce the redundancy in the assembled contigs.
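You can identify the unigenes from your assembled contigs by clustering them with the CD-HIT package (https://github.com/weizhongli/cdhit). For nucleotide sequences such as Trinity transcripts, the relevant program in that package is cd-hit-est; the representative sequence of each cluster it writes out can be used as your unigene FASTA file.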
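As a complement to clustering, here is a minimal Python sketch of a different but commonly used shortcut: keeping only the longest isoform for each Trinity gene. It is not CD-HIT and not part of Trinity. It assumes your headers follow the usual Trinity naming, where the transcript ID ends in an isoform suffix such as `_i1` and everything before that suffix is the gene ID (for example `TRINITY_DN1000_c115_g5_i1`). The function names and file names are made up for illustration, so please check them against your own data.

```python
import re

# Assumption: Trinity transcript IDs end with an isoform suffix like "_i1".
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_isoform_per_gene(records):
    """Map each gene ID to the (header, sequence) of its longest isoform."""
    best = {}
    for header, seq in records:
        transcript_id = header.split()[0]
        gene_id = ISOFORM_SUFFIX.sub("", transcript_id)
        # Keep the first isoform seen unless a strictly longer one appears.
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (header, seq)
    return best


def write_unigenes(in_path, out_path):
    """Write one sequence per gene to out_path; return the number written."""
    with open(in_path) as fin:
        best = longest_isoform_per_gene(read_fasta(fin))
    with open(out_path, "w") as fout:
        for header, seq in best.values():
            fout.write(f">{header}\n{seq}\n")
    return len(best)
```

You would call it as `write_unigenes("Trinity.fasta", "unigenes.fasta")`, where both file names are placeholders. Keep in mind that the longest isoform is not always the most highly expressed or the most biologically relevant one, so this is a simplification rather than a replacement for the clustering step.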