I have a question about schema design for cgMLST analysis: why are the genomes with the best assembly quality generally selected to build the schema? Which approach is more reliable: building the schema from only the 73 best-quality genomes (complete genomes and chromosome-level assemblies; variant 1), or from all 615 genomes (complete genomes, chromosomes, scaffolds and contigs; variant 2), regardless of assembly quality?

In my case, the scheme built from the 73 genomes contains more loci, about 1900, while the scheme built from all 615 genomes contains about 1700. On that basis variant 1 seems better. However, when I then analyse the similarity of all genomes plus my test strains against each scheme, I am no longer so sure.

Would the larger number of loci in variant 1 distort the picture the analysis gives, or is it better to use variant 2, which has fewer loci but ones that were identified in a larger number of genomes? Please tell me if I have not explained this clearly enough. I would appreciate any help with this problem.
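To make the two variants concrete, here is a minimal sketch of how I understand core-locus selection. It assumes a locus counts as "core" when it is present in at least a given fraction of the genomes used to build the schema. The function `core_loci`, the genome names and the locus names are all made up for illustration and do not come from any particular cgMLST tool.

```python
from typing import Dict, Iterable, Set


def core_loci(
    presence: Dict[str, Set[str]],
    genomes: Iterable[str],
    threshold: float = 1.0,
) -> Set[str]:
    """Return loci present in at least `threshold` (fraction) of `genomes`.

    `presence` maps a genome name to the set of loci found in it.
    The default threshold of 1.0 (strict core) is an assumption;
    real pipelines may use a lower cutoff.
    """
    genomes = list(genomes)
    if not genomes:
        raise ValueError("at least one genome is required")

    # Count in how many of the selected genomes each locus was found.
    counts: Dict[str, int] = {}
    for genome in genomes:
        for locus in presence[genome]:
            counts[locus] = counts.get(locus, 0) + 1

    needed = threshold * len(genomes)
    return {locus for locus, n in counts.items() if n >= needed}


# Toy data (hypothetical): two complete genomes and two draft assemblies
# in which one locus each was not found.
presence = {
    "complete_A": {"loc1", "loc2", "loc3", "loc4"},
    "complete_B": {"loc1", "loc2", "loc3", "loc4"},
    "draft_C": {"loc1", "loc2", "loc3"},
    "draft_D": {"loc1", "loc2", "loc4"},
}

high_quality = ["complete_A", "complete_B"]  # analogous to variant 1
all_genomes = list(presence)                 # analogous to variant 2

core_v1 = core_loci(presence, high_quality)
core_v2 = core_loci(presence, all_genomes)

print(sorted(core_v1))
print(sorted(core_v2))
```

With a strict 100% threshold, the loci found in all genomes of the larger set are necessarily also found in all genomes of the smaller subset, so variant 2 can never yield more loci than variant 1. That would be consistent with the roughly 1900 versus 1700 loci I see. What the sketch cannot tell me is whether the loci lost in variant 2 are truly absent from some strains or simply missing from the draft assemblies.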