Many bacterial genomes contain repeated regions or genes. The copies of a repeat can be completely identical across the genome, or they can differ slightly from one another. How do genome assembly tools treat these regions?
I am experiencing a problem with a sample that I am still unable to resolve:
I know that my sample has at least two non-identical copies of the ITS region (already confirmed by PCR + Sanger sequencing). Between the copies, the first 237 nucleotides are identical, the next 195 nucleotides vary, and the final 135 nucleotides are identical again.
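To make the layout of the copies concrete, here is a minimal Python sketch of how the identical flanks can be measured from two sequences. The function name `shared_flanks` and the toy sequences are only illustrative, not my real data; with the actual Sanger sequences I would expect it to report the identical start and end described above.

```python
from typing import Tuple


def shared_flanks(copy_a: str, copy_b: str) -> Tuple[int, int]:
    """Return the lengths of the identical prefix and identical suffix."""
    a, b = copy_a.upper(), copy_b.upper()
    limit = min(len(a), len(b))

    # Walk from the 5' end until the first mismatch.
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1

    # Walk from the 3' end, without overlapping the prefix already counted.
    suffix = 0
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1

    return prefix, suffix


# Toy sequences standing in for the two ITS copies (hypothetical, not real data).
copy_1 = "ACGTACGT" + "AAAA" + "TTGGCC"
copy_2 = "ACGTACGT" + "CCCC" + "TTGGCC"

prefix_len, suffix_len = shared_flanks(copy_1, copy_2)
print(prefix_len, suffix_len)  # 8 6
```

Everything between the two flanks is the variable part, which is the stretch a short-read assembler cannot assign to a single copy.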
When assembling the sequenced genome (Illumina paired-end sequencing) with SPAdes, one of the largest scaffolds contains the ITS region, with the variable part replaced by “NNN”. But a small scaffold (