I am assembling a genome from an organism that we can't cultivate. We don't know whether we sequenced multiple strains or a single polymorphic strain. Whichever it is, the assemblies I get from ABySS, Ray and SOAPdenovo always show high redundancy. By redundancy I mean that long sequences (up to 45,000 nt in one of the assemblies) have been placed by the assembler in more than one scaffold. I know of at least one tool for reducing this (Simplifier), but I don't think it is appropriate in this situation. Instead, I would like to choose the assembly with the least redundancy. How can I "measure" redundancy? Some people have used the total number of non-self alignments among scaffolds, but that says nothing about the extent of those alignments. The total number of nucleotides in those alignments would capture their extent, but would not tell me how many scaffolds contain redundant sequence.
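To make the trade-off between these measures concrete, here is a minimal Python sketch that reports them side by side for one assembly: the number of non-self alignments, the number of nucleotides they cover, and the number of scaffolds involved. It is only an illustration under stated assumptions, not an established method. The function names (`merge_intervals`, `redundancy_summary`) and the input layout are hypothetical. It assumes you have already run an all-vs-all scaffold alignment with whatever aligner you prefer and reduced the hits to `(query, q_start, q_end, target)` tuples with 0-based, half-open query coordinates. Overlapping hits on the same scaffold are merged so that no nucleotide is counted twice.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def redundancy_summary(scaffold_lengths, alignments):
    """Summarise redundancy of one assembly.

    scaffold_lengths: dict mapping scaffold name -> length in nt.
    alignments: iterable of (query, q_start, q_end, target) tuples
        (assumed layout; 0-based, half-open query coordinates).
    """
    covered = defaultdict(list)
    n_alignments = 0
    for query, q_start, q_end, target in alignments:
        if query == target:
            continue  # assumption: ignore every hit of a scaffold to itself
        n_alignments += 1
        covered[query].append((q_start, q_end))

    # Redundant nt per scaffold, counting each position at most once.
    redundant_nt = {
        name: sum(end - start for start, end in merge_intervals(ivs))
        for name, ivs in covered.items()
    }
    total_nt = sum(scaffold_lengths.values())
    total_redundant = sum(redundant_nt.values())
    return {
        "non_self_alignments": n_alignments,
        "redundant_nt": total_redundant,
        "redundant_fraction_of_assembly": total_redundant / total_nt,
        "scaffolds_with_redundancy": len(redundant_nt),
        "fraction_of_scaffolds_with_redundancy":
            len(redundant_nt) / len(scaffold_lengths),
    }


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    lengths = {"s1": 1000, "s2": 500, "s3": 200, "s4": 300}
    hits = [
        ("s1", 0, 100, "s2"), ("s2", 400, 500, "s1"),
        ("s1", 50, 150, "s3"), ("s3", 0, 100, "s1"),
        ("s1", 0, 1000, "s1"),  # self hit, skipped
    ]
    for key, value in redundancy_summary(lengths, hits).items():
        print(key, value)
```

Comparing assemblies on the two fractions rather than on raw counts keeps the numbers comparable when the assemblies differ in total size or scaffold count. Note that an all-vs-all search usually reports each pair in both directions, so the alignment count is roughly doubled, and that skipping every self hit also hides repeats within a single scaffold.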
