I am trying to calculate per-gene nucleotide diversity (π) from whole-genome resequencing data. I have data for several hundred individuals with ~1.2 million SNPs, from a well-annotated species with about 36k genes. Is there a method that calculates π per gene for this kind of data, ideally directly from a VCF file and on the command line?

So far, I have tried calculating π with vcftools, using both --window-pi and --site-pi. The window approach is not useful because my genes and the gaps between them do not line up with fixed-size windows. For --site-pi, I could not find an explanation of how the value is calculated and, more problematically, the output contains more positions than exist. Another option was DnaSP6, but there I would need to produce a separate VCF file for each gene, and since piping or merging is not possible, every file would have to be loaded manually.
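To make clear what I am after, here is a minimal Python sketch of the calculation, not a tested pipeline. It assumes biallelic SNPs, uses the usual unbiased per-site estimator π = 2j(n−j) / (n(n−1)), where j is the alternate allele count and n the number of called chromosomes at that site, and divides the per-gene sum by the gene length. That last step treats every non-variant position in the gene as a callable invariant site, which may not hold for real data. The function names and the input tuples are hypothetical; parsing the VCF and the annotation file is left out.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def site_pi(alt_count, n_chrom):
    """Unbiased per-site diversity for a biallelic SNP."""
    if n_chrom < 2:
        return 0.0
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def per_gene_pi(sites, genes):
    """Average per-site pi over each gene interval.

    sites: iterable of (chrom, pos, alt_count, n_chrom), pos is 1-based
    genes: iterable of (gene_id, chrom, start, end), 1-based inclusive
    Assumption: all positions without a SNP are invariant and callable.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, alt_count, n_chrom in sites:
        by_chrom[chrom].append((pos, site_pi(alt_count, n_chrom)))

    # Per chromosome: sorted positions plus prefix sums of per-site pi,
    # so each gene is answered with two binary searches.
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        positions = [pos for pos, _ in entries]
        prefix = [0.0]
        for _, value in entries:
            prefix.append(prefix[-1] + value)
        index[chrom] = (positions, prefix)

    result = {}
    for gene_id, chrom, start, end in genes:
        positions, prefix = index.get(chrom, ([], [0.0]))
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        result[gene_id] = (prefix[hi] - prefix[lo]) / (end - start + 1)
    return result
```

Called as per_gene_pi(sites, genes), this returns a dictionary mapping each gene ID to its π. Summing per-site values over the exact gene coordinates is what I cannot get out of fixed windows.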
