06 December 2023

Dear experts,

I ran into some problems while analyzing genome-wide nucleotide diversity, and I would greatly appreciate your advice.

I processed my sequencing data into BAM files and then used Picard and GATK to produce the corresponding VCF.gz files. Next, I used tools from https://github.com/simonhmartin/genomics_general to merge the VCF.gz files and convert them into a single geno.gz file.
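Before computing diversity, it may be worth checking what the merged geno.gz actually contains. The Python sketch below is a hypothetical helper (`summarize_geno` is not part of genomics_general). It assumes the geno layout used by genomics_general as we understand it: a `#CHROM POS sample...` header followed by one tab-separated row per site, with genotypes such as `A|G` or `A/G` and `N` marking a missing allele. It counts sites where every genotype is missing, sites where all called alleles agree (invariant), and sites with more than one allele (variable).

```python
import gzip
import sys
from typing import Dict, Iterable


def summarize_geno(lines: Iterable[str]) -> Dict[str, int]:
    """Count total, fully missing, invariant and variable sites in geno-format lines.

    Assumed layout (hypothetical, check against your own file): a '#CHROM POS ...'
    header, then tab-separated rows 'chrom pos gt1 gt2 ...' where each genotype
    looks like 'A|G', 'A/G' or 'N|N', and 'N' means a missing allele.
    """
    counts = {"sites": 0, "all_missing": 0, "invariant": 0, "variable": 0}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header and any blank lines
        fields = line.rstrip("\n").split("\t")
        called = set()
        for gt in fields[2:]:  # genotype columns start after CHROM and POS
            for allele in gt.replace("|", "/").split("/"):
                if allele and allele != "N":
                    called.add(allele)
        counts["sites"] += 1
        if not called:
            counts["all_missing"] += 1
        elif len(called) == 1:
            counts["invariant"] += 1
        else:
            counts["variable"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_geno.py input.geno.gz
    with gzip.open(sys.argv[1], "rt") as handle:
        print(summarize_geno(handle))
```

If the file turns out to contain almost no invariant sites, the VCFs were probably written without non-variant positions, which matters for any statistic averaged per site.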

I am using this geno.gz file to estimate nucleotide diversity (pi) in each of several populations, with the following command:

python popgenWindows.py -w 1000000 -m 100 -g input.geno.gz -o output.csv.gz -f phased -T 5 --popsFile pops.txt
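For intuition about what a windowed pi estimate involves, here is a simplified Python re-implementation of the idea. It is not the popgenWindows.py code and may treat missing data differently; the function names and the `window` and `min_sites` arguments are hypothetical stand-ins for the `-w` and `-m` options. In this sketch each site contributes the mean pairwise difference among the haplotypes called there (`N` = missing; sites with fewer than two called haplotypes are skipped), and a window's pi is the sum of those contributions divided by the number of usable sites, reported only when that count reaches `min_sites`.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


def site_diversity(alleles: Sequence[str]) -> Optional[float]:
    """Mean pairwise difference among the non-missing haplotypes at one site."""
    called = [a for a in alleles if a != "N"]
    if len(called) < 2:
        return None  # fewer than two haplotypes: nothing to compare
    pairs = list(combinations(called, 2))
    return sum(a != b for a, b in pairs) / len(pairs)


def windowed_pi(
    sites: List[Tuple[int, Sequence[str]]], window: int, min_sites: int
) -> Dict[int, Optional[float]]:
    """Map each 1-based window start to pi, or None if it has < min_sites usable sites."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for pos, alleles in sites:
        d = site_diversity(alleles)
        if d is None:
            continue  # site not usable, so it is not counted in the denominator
        start = (pos - 1) // window * window + 1  # 1-based start of the window
        totals[start] += d
        counts[start] += 1
    return {
        start: totals[start] / counts[start] if counts[start] >= min_sites else None
        for start in sorted(counts)
    }
```

In this sketch, windows with fewer than `min_sites` usable sites come back as `None`, so raising the threshold drops sparsely covered windows instead of reporting an estimate based on a handful of sites.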

Because each sample is sequenced to only about 0.3x depth, I chose a small value for -m, the minimum number of usable sites required per window. However, many windows in the output have nucleotide diversity (pi) close to 1. I am not sure whether such values are plausible, and I would like to know how to make my data and results more reliable.
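Two properties of pi, viewed as an average of pairwise differences per site, could push window values toward 1. The Python sketch below illustrates both with made-up numbers; the function names and every value are hypothetical and not taken from the data above. First, at a biallelic site the largest possible mean pairwise difference depends on how many haplotypes were called there, and it equals 1 when only two haplotypes have data, a situation that may be common at 0.3x. Second, a window value divides the summed differences by the number of sites counted, so if invariant positions are absent from the VCF or geno file, the denominator shrinks to the variable sites alone.

```python
from math import comb


def max_site_diversity(n_haplotypes: int) -> float:
    """Largest mean pairwise difference at a biallelic site with n called haplotypes."""
    k = n_haplotypes // 2  # the most even split of the two alleles maximizes differences
    return k * (n_haplotypes - k) / comb(n_haplotypes, 2)


def window_pi(summed_site_diversity: float, n_sites: int) -> float:
    """Per-site pi for a window: summed per-site diversity over the sites counted."""
    return summed_site_diversity / n_sites


# Hypothetical window: 200 variable sites, each seen in only two differing haplotypes,
# plus 9,800 invariant sites that would be missing from a variant-only VCF.
variable_sites = 200
invariant_sites = 9_800
summed = variable_sites * max_site_diversity(2)

pi_all_sites = window_pi(summed, variable_sites + invariant_sites)
pi_variant_only = window_pi(summed, variable_sites)

for n in (2, 4, 20):
    print(f"max per-site diversity with {n} haplotypes: {max_site_diversity(n):.3f}")
print(f"pi using all sites: {pi_all_sites:.3f}")
print(f"pi using variable sites only: {pi_variant_only:.3f}")
```

With these illustrative numbers the script prints 1.000, 0.667 and 0.526 for 2, 4 and 20 haplotypes, then 0.020 when all sites are counted versus 1.000 when only variable sites are.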

I would be immensely grateful for any advice or suggestions you could provide to help address this issue. Thank you very much for your time and consideration.

Best regards,
Song Shiwen