At the HIV Sequence and Immunology Databases, where I work, we have used a bit of creativity to solve some difficult problems in multiple sequence alignment. We often want to produce an alignment of gene sequences from more than 20,000 different isolates of HIV-1 in a few minutes or less. We are very good at "deep" multiple alignment: thousands of copies of the same small genome.

My problem comes when I want to align the genomes of other viruses, or similarly sized gene regions (for example, the complete mitochondrial genomes of vertebrates, which are roughly 16 to 17 kb in size): they do not always have the same gene order. A good example is the mitochondrial genomes of birds and mammals, which are mostly co-linear, but with the NADH6 gene moved to a different location (see attached mitochondrial genome maps).
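To make the gene-order problem concrete, here is a minimal sketch of how one might flag genes that break co-linearity between two annotated genomes. It assumes the gene names have already been pulled out of the annotation as ordered lists; the function name `moved_genes` and the toy gene orders are only illustrative, and `difflib` uses a heuristic matcher rather than a guaranteed longest common subsequence, so treat the result as a first pass.

```python
from difflib import SequenceMatcher


def moved_genes(order_a, order_b):
    """Return genes from order_a that fall outside the co-linear blocks
    shared with order_b (heuristic, based on difflib matching blocks)."""
    matcher = SequenceMatcher(None, order_a, order_b, autojunk=False)
    colinear = set()
    for block in matcher.get_matching_blocks():
        colinear.update(order_a[block.a:block.a + block.size])
    shared = set(order_b)
    return [g for g in order_a if g in shared and g not in colinear]


# Simplified toy gene orders (illustrative only, not read from real GenBank records).
mammal_toy = ["ND5", "ND6", "CYTB", "tRNA-Thr", "tRNA-Pro"]
bird_toy = ["ND5", "CYTB", "tRNA-Thr", "tRNA-Pro", "ND6"]

print(moved_genes(mammal_toy, bird_toy))
```

This treats the genomes as linear lists, so a circular genome would first need a common starting point.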

In other cases (I think it is the primate mitochondrial genomes), the authors used different sites as "base #1" of the circular genome. So, although the primate mitochondrial genomes are 100% co-linear with those of other vertebrates, we have to chop several thousand bases off the right end and paste them onto the left end (the 5' end, or beginning) to make them align with the mt-genomes of other mammals.
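The chop-and-paste step is just a rotation of a circular sequence. A minimal sketch, assuming the sequence is held as a plain string and that a short anchor motif (for example, the start of a gene that every genome shares) is known; the function names `rotate_circular` and `rotate_to_anchor` are my own, not from any existing tool.

```python
def rotate_circular(seq, new_origin):
    """Rotate a circular sequence so that 0-based index new_origin becomes position 0."""
    if not seq:
        return seq
    k = new_origin % len(seq)
    return seq[k:] + seq[:k]


def rotate_to_anchor(seq, anchor):
    """Rotate so the first occurrence of anchor starts the sequence.

    The sequence is extended by len(anchor) - 1 bases from its own start
    so that an anchor spanning the arbitrary "base #1" join is still found.
    """
    pos = (seq + seq[:len(anchor) - 1]).find(anchor)
    if pos == -1:
        raise ValueError("anchor not found in sequence")
    return rotate_circular(seq, pos)


# Toy example with letters standing in for bases.
print(rotate_circular("ABCDEFGH", 5))
print(rotate_to_anchor("BCDEFGHA", "AB"))
```

This only handles the forward strand; a genome deposited as the reverse complement would have to be flipped first.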

So, it seems to me that there ought to be a multiple sequence alignment tool that can read GenBank files with their annotation and use the annotation to help with the alignment process. One tool that I am aware of, which can help a lot, is the "Artemis Comparison Tool" (ACT) and its associated DOUBLE-ACT server. The DOUBLE-ACT server uses BLAST to find regions on a pair of genomes that are homologous or similar, and creates a table of these matched regions. The Artemis Comparison Tool then loads both genomes into an ARTEMIS Genome Browser tool and uses the BLAST hit table to keep both genomes "in sync" with each other as you browse them. Although the DOUBLE-ACT BLAST step is not dependent on annotations at all, the annotations are visible when browsing the genomes in ACT.
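As a rough illustration of what "using the annotation" could mean, the sketch below groups annotated segments by gene name so that each gene could be handed to an aligner separately, whatever its position in each genome. It assumes the annotation has already been parsed into (gene, start, end) tuples with 0-based, end-exclusive coordinates; the function name `segments_by_gene` and the toy data are hypothetical, and strand, features spanning the origin, and inconsistent gene names are all ignored.

```python
from collections import defaultdict


def segments_by_gene(genomes):
    """Group sub-sequences by gene name.

    genomes maps a genome name to (sequence, features), where features is a
    list of (gene, start, end) tuples with 0-based, end-exclusive coordinates.
    Returns {gene: {genome_name: sub-sequence}}.
    """
    groups = defaultdict(dict)
    for name, (seq, features) in genomes.items():
        for gene, start, end in features:
            groups[gene][name] = seq[start:end]
    return dict(groups)


# Toy data: same three "genes", different order in the second genome.
toy = {
    "genome_1": ("AAAACCCCGGGG", [("geneA", 0, 4), ("geneB", 4, 8), ("geneC", 8, 12)]),
    "genome_2": ("AAAAGGGGCCCC", [("geneA", 0, 4), ("geneC", 4, 8), ("geneB", 8, 12)]),
}

for gene, segments in segments_by_gene(toy).items():
    print(gene, segments)
```

Each per-gene group could then be aligned on its own and the alignments concatenated in a chosen reference order.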

I am quite sure that I am not the only one in the world who needs this type of tool. I am increasingly seeing large multiple sequence alignments being done for classification of organisms, where the authors could have used such a tool.

Please let me know if you have any ideas about where to look for such a tool, or which groups of bioinformatics workers might be able to develop one.
