Aim:
Are the features we are interested in more conserved across a set of virus strains than expected by chance?
Input:
1) 100 virus genomes (strains of the same virus), available as pairwise alignments (no MSA).
2) The features we are interested in, given as locations in one of the genomes (e.g. as GFF).
3) Randomly selected regions of the same size as the features in 2), given in the same way as 2).
Intuitively, I would just count the differences within these features (2) across all genomes and compare the percentage of nucleotide changes to that of the randomly selected regions (3).
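To make that intuition concrete, here is a minimal sketch of it as a resampling test. It is only an illustration, not an established method or tool. It assumes (my assumptions, not part of the data described above) that every strain has been projected onto the coordinates of the annotated genome, so that each strain is a string of the same length as the reference with `-` for deletions, and that features are 0-based, half-open `(start, end)` intervals. All function names are hypothetical.

```python
import random


def per_site_divergence(reference, aligned_strains):
    """Fraction of strains that differ from the reference at each position."""
    n = len(aligned_strains)
    return [
        sum(1 for strain in aligned_strains if strain[i] != ref_base) / n
        for i, ref_base in enumerate(reference)
    ]


def mean_divergence(divergence, regions):
    """Mean per-site divergence over all positions covered by the regions."""
    total = sum(sum(divergence[start:end]) for start, end in regions)
    length = sum(end - start for start, end in regions)
    return total / length


def random_regions(genome_length, sizes, rng):
    """One random region per feature size (regions may overlap: a simplification)."""
    regions = []
    for size in sizes:
        start = rng.randrange(genome_length - size + 1)
        regions.append((start, start + size))
    return regions


def conservation_test(reference, aligned_strains, features, n_perm=1000, seed=0):
    """Return (observed divergence, empirical one-sided p-value).

    The p-value estimates how often randomly placed regions of the same
    sizes are at least as conserved as the features.
    """
    rng = random.Random(seed)
    divergence = per_site_divergence(reference, aligned_strains)
    observed = mean_divergence(divergence, features)
    sizes = [end - start for start, end in features]
    hits = 0
    for _ in range(n_perm):
        null = mean_divergence(
            divergence, random_regions(len(reference), sizes, rng)
        )
        if null <= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `conservation_test(reference, strains, [(120, 480), (900, 1050)])` would return the mean fraction of changed sites in the two features together with the empirical p-value. The sketch lets random regions overlap the features and each other, and it treats sites as independent, so it ignores the shared phylogeny of the strains.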
How should this actually be done according to the current state of the art, and are there any tools that perform this calculation?