I am having trouble understanding the nomenclature for mutations in the gyrB CDS of M. tuberculosis. The gyrB gene of strain H37Rv is 2028 bp long, corresponding to a 675 aa protein (http://www.ncbi.nlm.nih.gov/gene/887081). The first reported gyrB sequence for M. tuberculosis is somewhat longer (http://aac.asm.org/content/38/4/773.full.pdf+html). Neither sequence seems to correspond to the numbering most commonly used in papers reporting gyrB mutations (e.g. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0120470#sec011, or http://aac.asm.org/content/55/10/4524.full.pdf+html). I also found some records (http://www.ncbi.nlm.nih.gov/nuccore/378408791?from=1&to=2145&sat=4&sat_key=67263822&report=gbwithparts) reporting a longer gyrB sequence, but I don't understand how the same gene, which is also highly conserved, can have different lengths within the same species.
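To make the length comparison concrete, here is a minimal Python sketch of the arithmetic. It rests on assumptions: I take 2145 bp as the length of the longer CDS (read off the `from=1&to=2145` range in the last link), I assume both CDS lengths include the stop codon, and I assume the two annotations differ only in where the start codon is placed, which I have not confirmed. The function names are my own and purely illustrative.

```python
def protein_length(cds_bp: int) -> int:
    """Amino acids encoded by a CDS whose length includes the stop codon."""
    if cds_bp % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return cds_bp // 3 - 1  # subtract the stop codon


def codon_offset(long_cds_bp: int, short_cds_bp: int) -> int:
    """Codon numbering shift between two annotations of the same gene.

    Assumption: the extra bases all sit at the 5' end (an alternative start codon).
    """
    diff = long_cds_bp - short_cds_bp
    if diff % 3 != 0:
        raise ValueError("Length difference is not a whole number of codons")
    return diff // 3


short_bp = 2028  # H37Rv gyrB as given on the NCBI Gene page
long_bp = 2145   # assumed from the from=1&to=2145 range in the nuccore link

print(protein_length(short_bp))         # 675
print(protein_length(long_bp))          # 714
print(codon_offset(long_bp, short_bp))  # 39
```

If that assumption holds, a residue numbered n in the 675 aa protein would be n + 39 in the longer one, which might explain why the codon numbers in the papers do not match the H37Rv annotation.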
Thanks for your help.