Hello,

I am working on a project that aims to identify the exact strain of bacteria obtained from isolated cultures. I have whole-genome data that I used to generate genome assemblies. Now I need to determine the exact strain to which each isolate belongs.

The strains were initially characterized by their 16S sequences. However, when I compare the phylogeny obtained from 16S with the one obtained from the whole genome, I get different results. For the genome-based phylogeny, I identified groups of single-copy orthologous genes and used all of them. In my experience, 16S alone might not be enough to discriminate strains because it can be too conserved. Do you agree with this?

My questions are:

1) Is there a consensus on the minimum sequence similarity and coverage needed to assign a strain ID to a genomic sequence?

2) How can you be sure that the strain you find is exactly your strain?

3) The database may not contain the specific sequence of your strain, so what is the minimum difference between two bacterial genomes needed to say that they belong to different strains?
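To make clear what I mean by similarity, coverage, and difference in the questions above, here is a minimal sketch in Python. It assumes two sequences that are already aligned to the same length with "-" as the gap character. The function name `identity_and_coverage` is hypothetical and only for illustration, and the sketch does not imply any particular threshold.

```python
def identity_and_coverage(aligned_query, aligned_ref, gap="-"):
    """Return (percent identity, percent coverage, mismatches).

    Assumption: both strings come from the same pairwise alignment,
    so they have equal length and use `gap` for alignment gaps.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")

    ref_positions = 0  # reference bases (non-gap columns in the reference)
    covered = 0        # reference bases aligned to a query base
    matches = 0        # covered columns where both bases are identical

    for q, r in zip(aligned_query.upper(), aligned_ref.upper()):
        if r == gap:
            continue  # insertion relative to the reference, not counted
        ref_positions += 1
        if q == gap:
            continue  # reference base not covered by the query
        covered += 1
        if q == r:
            matches += 1

    if covered == 0:
        raise ValueError("no aligned positions to compare")

    identity = 100.0 * matches / covered
    coverage = 100.0 * covered / ref_positions
    mismatches = covered - matches
    return identity, coverage, mismatches


if __name__ == "__main__":
    # Toy sequences, not real data.
    ident, cov, diff = identity_and_coverage("ACGT-ACGTA", "ACGTTACGAA")
    print(f"identity={ident:.2f}% coverage={cov:.2f}% mismatches={diff}")
```

Here identity is computed only over the covered positions, and coverage is the fraction of the reference that the query aligns to. Whether gaps should also count as differences is a choice I am unsure about.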

Any advice?
