12 December 2016

Aim:

Are the features we are interested in more conserved across a set of virus strains than expected by chance?

Input:

1) 100 virus genomes (strains of the same virus) as pairwise-aligned sequences (no multiple sequence alignment, MSA).

2) The features we are interested in, given as locations in one of the genomes (e.g. in GFF format).

3) As in 2), but randomly selected regions of the same size as the features in 2).
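To make input 3) concrete, here is a minimal sketch of how size-matched random regions could be drawn. It assumes the feature locations have already been converted from GFF (1-based, inclusive) to 0-based, half-open `(start, end)` pairs, and that random regions should not overlap any real feature. The function name `sample_matched_regions` and its parameters are hypothetical, chosen only for illustration.

```python
import random


def sample_matched_regions(features, genome_length, rng, max_tries=1000):
    """Draw one random region per feature, with the same length as that feature.

    Assumptions (not from the original post): coordinates are 0-based and
    half-open, and a random region must not overlap any real feature.
    """
    def overlaps_feature(start, end):
        return any(start < f_end and f_start < end for f_start, f_end in features)

    sampled = []
    for f_start, f_end in features:
        length = f_end - f_start
        for _ in range(max_tries):
            start = rng.randrange(0, genome_length - length + 1)
            if not overlaps_feature(start, start + length):
                sampled.append((start, start + length))
                break
        else:
            # Give up rather than loop forever on a densely annotated genome.
            raise RuntimeError(f"no free region of length {length} found")
    return sampled


# Usage: two features on a hypothetical 1000-nt reference genome.
rng = random.Random(0)
print(sample_matched_regions([(10, 20), (50, 80)], 1000, rng))
```

Calling this repeatedly with the same `rng` gives many independent random sets rather than a single one, which is what a null distribution would need.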

Intuitively, I would just count the nucleotide differences within these features (2) across all genomes and compare the percentage of changed nucleotides with that of the randomly selected regions (3).
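The intuitive approach could be sketched roughly as follows. This is only an illustration under stated assumptions, not an established method: each pairwise alignment is taken to be a pair of equal-length gapped strings (reference first, `-` for gaps), regions are 0-based, half-open reference coordinates, insertions relative to the reference are ignored, and deletions count as changes. All function names are hypothetical. The comparison against random regions is written as a one-sided empirical p-value, where `null_values` would hold the divergence of many random region sets.

```python
def region_mismatch_fraction(ref_aln, qry_aln, start, end):
    """Fraction of reference positions in [start, end) that differ in the query."""
    ref_pos = 0
    compared = 0
    mismatches = 0
    for r, q in zip(ref_aln, qry_aln):
        if r == "-":
            continue  # insertion in the query: no reference coordinate (assumption: ignored)
        if start <= ref_pos < end:
            compared += 1
            if r.upper() != q.upper():
                mismatches += 1  # substitutions and deletions both count as changes
        ref_pos += 1
        if ref_pos >= end:
            break
    return mismatches / compared if compared else float("nan")


def mean_divergence(alignments, regions):
    """Average mismatch fraction over all (alignment, region) combinations."""
    values = [
        region_mismatch_fraction(ref_aln, qry_aln, start, end)
        for ref_aln, qry_aln in alignments
        for start, end in regions
    ]
    return sum(values) / len(values)


def empirical_p_value(observed, null_values):
    """One-sided p-value: how often a random set is at least as conserved.

    The +1 in numerator and denominator keeps the estimate away from zero.
    """
    hits = sum(1 for v in null_values if v <= observed)
    return (hits + 1) / (len(null_values) + 1)


# Usage with a toy alignment (reference on top, one strain below).
alignments = [("ACGT-ACGT", "ACGTTACCT")]
observed = mean_divergence(alignments, [(0, 4)])
print(observed, empirical_p_value(observed, [0.25, 0.5, 0.0]))
```

A small p-value here would only say that the features change less than size-matched random regions under these simplifying assumptions; it does not account for differences in local mutation rate or sequence context.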

How should this actually be done according to the current state of the art, and are there any tools that perform this calculation?
