Hello everyone,

I'm working on a viral genomics project and have encountered an interesting assembly result on which I would appreciate some expert advice.

I have sequenced the genome of an alfamovirus (Alfalfa mosaic virus, AMV), which is known to have a tripartite RNA genome (RNA1, RNA2 and RNA3). However, my de novo assembly pipeline produced a single large contig of approximately 6.2 kb.

Analysis showed that this contig contains two large ORFs. BLAST searches with these ORFs returned strong matches to the two replicase subunits, P1 (methyltransferase and helicase domains) and P2 (RNA-dependent RNA polymerase), which are normally encoded on the separate RNA1 and RNA2 segments, respectively. The total length of the contig (~6.2 kb) matches almost exactly the sum of the expected lengths of RNA1 (~3.6 kb) and RNA2 (~2.6 kb).
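For illustration, the ORF check described above can be reproduced with a short forward-strand scan that also reports the stretch between the two longest ORFs, which is where a segment boundary would be expected if the contig is two segments joined end to end. This is a minimal sketch, not the pipeline actually used: it assumes an ORF runs from ATG to the first in-frame stop, scans only the three forward frames (AMV genomic RNAs are positive-sense), and uses an arbitrary `min_aa` cutoff. The names `find_orfs` and `junction_window` are hypothetical, and the toy sequence is not real AMV sequence.

```python
"""Sketch: locate the two longest forward-strand ORFs in a contig."""

# Assumptions (not from the original post): ATG-to-stop ORFs, forward frames
# only (positive-sense genome), and an arbitrary minimum length of 100 codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=100):
    """Return (start, end, frame) tuples for forward-strand ORFs, sorted by start.

    start is the 0-based index of the A in ATG; end is one past the stop codon,
    so seq[start:end] is the complete ORF including its stop.
    """
    seq = seq.upper().replace("U", "T")  # accept either RNA or DNA letters
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop in this frame
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons before the stop
                if n_codons >= min_aa:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)


def junction_window(orfs):
    """Return (upstream, downstream, gap) for the two longest ORFs, in contig order.

    gap is the half-open interval between the end of the upstream ORF and the
    start of the downstream one; if the two ORFs overlap, gap[0] > gap[1].
    """
    if len(orfs) < 2:
        raise ValueError("need at least two ORFs")
    longest_two = sorted(orfs, key=lambda o: o[1] - o[0], reverse=True)[:2]
    upstream, downstream = sorted(longest_two)
    return upstream, downstream, (upstream[1], downstream[0])


if __name__ == "__main__":
    # Toy contig: two ORFs separated by a short spacer (not real AMV sequence).
    toy = "CC" + "ATG" + "GCT" * 150 + "TAA" + "GGGG" + "ATG" + "GCT" * 120 + "TAG" + "CC"
    upstream, downstream, gap = junction_window(find_orfs(toy))
    print("upstream ORF:", upstream)
    print("downstream ORF:", downstream)
    print("candidate junction region:", gap)
```

On a real contig, the gap interval only narrows down where to look; the actual boundary still has to be supported by read evidence and by comparison with known RNA1 and RNA2 termini.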

My question is twofold:

  • Is this a common assembly artifact for tripartite or multi-segmented viral genomes?
  • For the manuscript and the GenBank submission, how should I handle this? Should I manually split the contig into two separate sequences (RNA1 and RNA2), or is there a standard protocol for submitting such assembly artifacts? I want my data to reflect the biological reality of the virus accurately while remaining transparent about the assembly process.

Any advice or shared experiences on this topic would be greatly appreciated. Thank you in advance!
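The manual split raised in the second question is mechanically simple; the hard part is choosing the breakpoint. A minimal sketch, assuming the breakpoint has already been fixed from independent evidence (for example read coverage across the junction, or alignment to reference RNA1 and RNA2 termini). `split_contig`, `to_fasta`, the breakpoint value and the `isolate_X_RNA1`/`isolate_X_RNA2` headers are hypothetical placeholders, not GenBank requirements.

```python
"""Sketch: split one contig at a chosen breakpoint and write two FASTA records."""


def split_contig(seq, breakpoint):
    """Return (seq[:breakpoint], seq[breakpoint:]) using a 0-based breakpoint."""
    if not 0 < breakpoint < len(seq):
        raise ValueError("breakpoint must fall strictly inside the contig")
    return seq[:breakpoint], seq[breakpoint:]


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text wrapped at `width` columns."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    contig = "ACGT" * 10  # toy stand-in for the ~6.2 kb contig
    breakpoint = 24  # hypothetical; must come from evidence, not from this code
    rna1_part, rna2_part = split_contig(contig, breakpoint)
    print(to_fasta([("isolate_X_RNA1", rna1_part), ("isolate_X_RNA2", rna2_part)]))
```

Writing FASTA by hand keeps the sketch dependency-free; Biopython's `SeqIO.write` would do the same job if it is already part of the toolchain.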

Mesele Tilahun Belete