Hello everyone,

Lately I've been reading many papers on RNA-seq and its analysis. I've noticed that, when it comes to the assembly step, almost every paper reports how many contigs and scaffolds were assembled from the clean reads, with the scaffolds generally being called 'unigenes'.

Since I'm not familiar with the assembly process, I searched the internet and found a picture illustrating the procedure, which I've attached below. In the picture, four contigs, which I'll call 1, 2, 3 and 4, make up two scaffolds.

My first question: why can't contigs 1, 2, 3 and 4 themselves be recognized as 'unigenes', rather than the larger sequences (scaffolds) formed by combining them?

My second question: how do you determine which two contigs in the picture should be joined into a scaffold that represents a real transcript? Contigs 2 and 3 could, in principle, also be joined into a scaffold, since all of the contigs have gaps between them.

Many thanks in advance.
