Dear experts,
I ran into some issues while analyzing genome-wide nucleotide diversity and would greatly appreciate your advice.
I processed my sequencing data into BAM files and then used Picard and GATK to produce the corresponding VCF.gz files. Next, I used the tools from https://github.com/simonhmartin/genomics_general to merge the VCF.gz files and convert them into a single geno.gz file.
I am using this geno.gz file to estimate nucleotide diversity (pi) in different populations. The command I used is:

python popgenWindows.py -w 1000000 -m 100 -g input.geno.gz -o output.csv.gz -f phased -T 5 --popsFile pops.txt
Because the per-sample sequencing depth in my data is only about 0.3x, I chose a small value for the -m parameter (the minimum number of sites per window). However, many windows in the output have pi values close to 1. I am not sure whether such values are plausible, and I would like advice on how to make my data and results more reliable.
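To see how a window could reach pi close to 1, here is a simplified sketch of per-site nucleotide diversity: at each site, the fraction of pairs of called alleles that differ, averaged over the usable sites in the window. It is an illustration under stated assumptions, not the code used by popgenWindows.py, which handles missing data in its own way; the function names are hypothetical. It shows that pi approaches 1 when nearly every usable site in a window is variable and carries very few called alleles, for example if invariant sites are absent from the input or if very few sites have data.

```python
from itertools import combinations


def site_diversity(alleles):
    """Fraction of allele pairs that differ at one site; None if fewer than two alleles."""
    pairs = list(combinations(alleles, 2))
    if not pairs:
        return None  # fewer than two called alleles, so no pairwise comparison
    return sum(a != b for a, b in pairs) / len(pairs)


def window_pi(sites):
    """Mean per-site diversity over the sites that have at least two called alleles.

    `sites` is a list of allele lists (missing data already removed), one per site.
    The denominator is the number of usable sites in the window, so it matters
    whether invariant sites are included.
    """
    values = [d for d in (site_diversity(s) for s in sites) if d is not None]
    if not values:
        return None
    return sum(values) / len(values)


# Two variable sites, each with only two called alleles (as can happen at very low depth).
variable_only = [["A", "G"], ["C", "T"]]
# The same window, but with 98 invariant sites also present in the input.
with_invariant = variable_only + [["A", "A"]] * 98

print(window_pi(variable_only))   # 1.0
print(window_pi(with_invariant))  # 0.02
```

Including the invariant sites lowers the estimate from 1.0 to 0.02 because the same two differences are spread over 100 usable sites instead of 2.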
I would be very grateful for any advice or suggestions on this issue. Thank you very much for your time and consideration.
Best regards