How to interpret --score results in Plink?

08 August 2018

The objective is to calculate a genetic risk score from genotyping data, effect alleles and effect sizes. A total of 1.5 million SNPs are included in bfile.bim, and they all have variant IDs of the form Chr:BP, e.g., 1:12345. I ran the following command:

/projects/bsi/gentools/bin/plink2 --bfile GenotypingData --score ScoreFile header sum --threads 6

This gave the following result:

FID  IID  PHENO  CNT      CNT2     SCORESUM
01   01   -9     3067556  2438692  9.411
02   02   -9     3067556  2440321  9.16466
03   03   -9     3067556  2440784  9.50342
04   04   -9     3067556  2443276  10.615

About 1.5 million SNPs are involved in the score, so I can understand that CNT is approximately 2 * 1.5 million (for diploid genomes).

First question: how should I understand CNT2? What does "Sum of named allele counts" mean in the plink.profile description?

Second question: the PLINK allelic scoring documentation says "Also, note that scores are multiplied by 0..1 dosages, not 0..2 diploid allele counts, unless the 'double-dosage' modifier is present". What does this mean? What is the difference between 0..1 dosages and 0..2 diploid allele counts?

Third question: to use --score ScoreFile, the ScoreFile needs the SNP ID, effect allele and effect size. If the effect allele is the minor allele, then the score contributed by the SNP should be effect size * dosage(effect allele). However, if the effect allele is not the minor allele, how is the contribution of the SNP calculated? Should it be effect size * (2 - dosage(effect allele)), or effect size * -dosage(effect allele)? I think the third question is also related to the second one.

I hope somebody can help.

Kévin Vervier

Dear Iowan fellow,

You might already have read this page: http://zzz.bwh.harvard.edu/plink/profile.shtml, but my understanding is that:

i) CNT2 represents the total number of ref alleles observed in an individual. The ratio of CNT2 to CNT (e.g., 2438692 / 3067556 for the first sample) is about 80%, meaning that about 80% of the alleles you observed were reference alleles (as expected).

ii) When using dosage data, the score is obtained by summing the effects of the observed alleles only; an allele that is observed twice is not counted twice.

iii) in this case, I think that only the effect allele is counted in the score, meaning that if two non-effect alleles are observed for a locus, zero is added to the score.
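To make points i) and iii) concrete, here is a toy sketch in Python. It is not PLINK's implementation: the data are made up, the function name sum_score is hypothetical, and it assumes hard-call genotypes stored as 0/1/2 copies of the allele named in the score file, with missing genotypes simply skipped (PLINK's own handling of missing calls and the 0..1 dosage scaling are not modeled). Under those assumptions, each SNP contributes effect size * count of the effect allele, whether or not that allele is the minor one.

```python
# Toy illustration only: hypothetical data, not PLINK's actual implementation.
# Assumption: each genotype is the number of copies (0, 1 or 2) of the allele
# named in the score file; None marks a missing call, which is skipped here.

def sum_score(effect_allele_counts, effect_sizes):
    """Return (cnt, cnt2, scoresum) for one individual."""
    cnt = 0         # number of non-missing alleles (2 per genotyped SNP)
    cnt2 = 0        # number of named (effect) alleles carried
    scoresum = 0.0  # sum of effect size * effect allele count
    for count, beta in zip(effect_allele_counts, effect_sizes):
        if count is None:
            continue
        cnt += 2
        cnt2 += count
        scoresum += beta * count
    return cnt, cnt2, scoresum

# Five made-up SNPs; the fourth genotype is missing.
counts = [2, 1, 0, None, 2]
betas = [0.10, -0.05, 0.30, 0.20, 0.02]

cnt, cnt2, scoresum = sum_score(counts, betas)
print(cnt, cnt2, round(scoresum, 4))

# Ratio discussed in i), using the first sample of the question's output.
ratio = 2438692 / 3067556
print(round(ratio, 3))
```

In this toy case, the third SNP has two copies of the non-effect allele and adds zero to the score, as described in iii).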

HTH,

Kevin

Baosheng He

Thanks, @kevin vervier
