The objective is to calculate a genetic risk score from genotyping data, given each SNP's effect allele and effect size. In total, about 1.5 million SNPs are included in the .bim file, and all of them have variant IDs of the form Chr:BP, e.g. 1:12345. I ran the following command:
/projects/bsi/gentools/bin/plink2 --bfile GenotypingData --score ScoreFile header sum --threads 6
and got this result:
FID IID PHENO     CNT    CNT2  SCORESUM
 01  01    -9 3067556 2438692   9.411
 02  02    -9 3067556 2440321   9.16466
 03  03    -9 3067556 2440784   9.50342
 04  04    -9 3067556 2443276  10.615
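Reading CNT2 the way the plink.profile description puts it ("sum of named allele counts"), the three numeric columns can be reproduced by hand for a toy sample. This is only a sketch under stated assumptions: each hard-call genotype is coded as the number of copies (0, 1 or 2) of the named allele listed in the score file, no genotypes are missing, and the 'sum' modifier makes SCORESUM a plain weighted total. The helper name score_columns is hypothetical and does not call PLINK.

```python
# Hypothetical helper: reproduce CNT, CNT2 and SCORESUM for one diploid sample.
# Assumptions: each genotype is the count (0, 1 or 2) of the named allele
# listed in the score file, and no genotypes are missing.


def score_columns(counts, weights):
    """Return (CNT, CNT2, SCORESUM) for one sample."""
    cnt = 2 * len(counts)  # two alleles observed per scored SNP
    cnt2 = sum(counts)  # total copies of the named alleles
    scoresum = sum(w * c for w, c in zip(weights, counts))  # weighted total
    return cnt, cnt2, scoresum


# Four toy SNPs with named-allele counts 0, 1, 2, 1 and arbitrary weights.
print(score_columns([0, 1, 2, 1], [0.5, -0.2, 0.1, 0.3]))  # prints (8, 4, 0.3)
```

Under this reading, sample 01 above has 3067556 / 2 = 1,533,778 scored SNPs, and CNT2 / CNT = 2438692 / 3067556 ≈ 0.795, i.e. roughly 80% of that sample's scored alleles are copies of the named alleles.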
About 1.5 million SNPs contribute to the score, so I understand CNT to be roughly 2 × 1.5 million (two alleles per SNP in a diploid genome). But how should CNT2 be interpreted? What does "Sum of named allele counts" mean in the plink.profile description?

Second question: the PLINK allelic scoring documentation says "Also, note that scores are multiplied by 0..1 dosages, not 0..2 diploid allele counts, unless the 'double-dosage' modifier is present." What does this mean? What is the difference between 0..1 dosages and 0..2 diploid allele counts?

Third question: to use --score ScoreFile, the ScoreFile needs the SNP ID, effect allele and effect size. If the effect allele is the minor allele, the SNP's contribution to the score should be effect size * dosage(effect allele). But if the effect allele is not the minor allele, how is the contribution calculated? Should it be effect size * (2 - dosage(effect allele)), or effect size * -dosage(effect allele)? I think the third question is related to the second one.

Hope somebody can help.
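The second and third questions can be explored with a toy calculation. This sketch assumes that a 0..1 dosage is simply the 0..2 allele count divided by 2, and that --score multiplies each weight by the count of whichever allele is named in the score file, whether or not it is the minor allele. Both are assumptions, not statements about PLINK internals, and the function and parameter names are hypothetical. Under them, scoring allele B with weight -beta instead of allele A with weight beta shifts every sample's score by the same constant 2 * beta, while beta * (2 - count) on its own equals 2 * beta - beta * count and so reverses the direction of the effect.

```python
# Toy illustration (no PLINK call). Assumptions: a SNP has alleles A and B,
# `count_a` is the number of A copies (0, 1 or 2), and `beta` is the effect
# per copy of A. Function and parameter names are hypothetical.


def contribution(beta, count_a, diploid_counts=True):
    """Contribution of one SNP when A is the named allele.

    diploid_counts=True multiplies beta by the 0..2 allele count;
    False multiplies by the 0..1 dosage (count / 2), halving the result.
    """
    dosage = count_a if diploid_counts else count_a / 2
    return beta * dosage


def flipped_contribution(beta, count_a):
    """Same SNP scored with B as the named allele and weight -beta."""
    count_b = 2 - count_a  # copies of B in a diploid genotype
    return -beta * count_b


for count_a in (0, 1, 2):
    # The difference is 2 * beta regardless of genotype; prints 0.8 each time.
    print(count_a, contribution(0.4, count_a) - flipped_contribution(0.4, count_a))
```

Under the same assumption, switching from 0..2 counts to 0..1 dosages halves every sample's score but leaves the ranking of samples unchanged.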