The objective is to calculate a genetic risk score from genotyping data, given the effect allele and effect size for each SNP. A total of 1.5 million SNPs are included in the bfile's .bim file, and all of them have variant IDs of the form Chr:BP, e.g., 1:12345. I run the following command:

/projects/bsi/gentools/bin/plink2 --bfile GenotypingData --score ScoreFile header sum --threads 6

I get the following result:

     FID   IID   PHENO       CNT      CNT2   SCORESUM
      01    01      -9   3067556   2438692      9.411
      02    02      -9   3067556   2440321    9.16466
      03    03      -9   3067556   2440784    9.50342
      04    04      -9   3067556   2443276     10.615

About 1.5 million SNPs contribute to the score, so I can understand that CNT is approximately 2 * 1.5 million (for diploid genomes). However, how should I interpret CNT2? What does "Sum of named allele counts" mean in the plink.profile description?

Second question: the documentation for the PLINK allelic scoring function says "Also, note that scores are multiplied by 0..1 dosages, not 0..2 diploid allele counts, unless the 'double-dosage' modifier is present". What does this mean? What is the difference between 0..1 dosages and 0..2 diploid allele counts?

Third question: to use --score ScoreFile, the ScoreFile needs the SNP ID, effect allele and effect size. If the effect allele is the minor allele, then the score contributed by the SNP should be effect size * dosage(effect allele). However, if the effect allele is not the minor allele, how is the contribution of the SNP calculated? Is it effect size * (2 - dosage(effect allele)), or effect size * -dosage(effect allele)? I think the third question is also related to the second one.

I hope somebody can help.
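To make the questions concrete, here is a minimal Python sketch of one possible reading of the three columns. It is a toy illustration with made-up genotypes and weights, not PLINK's actual implementation. It assumes that "named allele" means the effect allele listed in the score file, that the copies of that allele are counted directly whether it is the minor or the major allele, and that missing genotypes are simply skipped. The function name `score_person` and all the data are hypothetical.

```python
# Toy illustration only: hypothetical data, not PLINK's source code.

def score_person(effect_allele_counts, effect_sizes):
    """Return (cnt, cnt2, score_sum) for one individual.

    effect_allele_counts: copies of the effect allele per SNP (0, 1 or 2),
                          or None when the genotype is missing.
    effect_sizes:         per-SNP weights taken from the score file.
    """
    cnt = 0          # number of non-missing alleles (2 per genotyped SNP)
    cnt2 = 0         # total copies of the effect ("named") allele
    score_sum = 0.0  # sum of effect size * effect-allele count
    for count, beta in zip(effect_allele_counts, effect_sizes):
        if count is None:
            continue  # assumption: missing genotypes are skipped
        cnt += 2
        cnt2 += count
        score_sum += beta * count
    return cnt, cnt2, score_sum


# Four SNPs for one individual; the last genotype is missing.
counts = [2, 1, 0, None]
betas = [0.5, -0.2, 0.3, 0.1]

cnt, cnt2, score_sum = score_person(counts, betas)
print(cnt, cnt2, round(score_sum, 6))
```

Under this reading, CNT = 3067556 in the output above would correspond to 1533778 non-missing SNPs, and CNT2 would be the number of effect-allele copies the individual carries across those SNPs. Whether PLINK really behaves this way, and how it handles missing genotypes by default, is exactly what I would like to confirm.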
