I want to analyse my data to see if there is an association between a disease and an SNP. Should I use an additive model? How can I perform the analysis using R or SPSS?
Choose which allele you want to code as your effect allele. E.g. let's assume your genotypes are AA, AG, GG. If G is the minor allele and also the effect allele, then code AA = 0, AG = 1, and GG = 2. This is now your additively coded SNP. Then set up a logistic regression model with disease as the outcome variable and your additively coded SNP as the predictor variable. In R you can use the glm function: glm(outcome ~ snp, family = binomial)
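A minimal sketch of that additive model in R, on simulated data. The sample size, genotype frequencies and the per-allele effect of 0.5 on the log-odds scale are all assumptions made up for illustration, not values from this thread.

```r
# Hypothetical data: genotypes and disease status are simulated for illustration only.
set.seed(1)
n <- 500
genotype <- sample(c("AA", "AG", "GG"), n, replace = TRUE,
                   prob = c(0.49, 0.42, 0.09))

# Additive coding: number of copies of the effect allele G (AA = 0, AG = 1, GG = 2).
snp <- unname(c(AA = 0, AG = 1, GG = 2)[genotype])

# Simulated outcome (1 = disease, 0 = no disease); the effect size is an assumption.
outcome <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * snp))

# Logistic regression with the additively coded SNP as the only predictor.
fit <- glm(outcome ~ snp, family = binomial)
summary(fit)$coefficients                  # the 'snp' row is the per-allele test

or <- exp(coef(fit)["snp"])                # odds ratio per extra copy of G
ci <- exp(confint.default(fit)["snp", ])   # Wald 95% confidence interval
```

With real data, `outcome` and `snp` would be columns of your own data frame, and covariates can be added on the right-hand side of the formula.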
If you have a single disease (e.g. measles, yes or no) and a single SNP (e.g. AA, AG, GG), then you don't need SPSS or R; you simply need a chi-square test to tell you whether or not there is an association between the SNP and the disease.
You can do this two ways:
Set up a 2x2 table thus:

              Disease +ve   Disease -ve
Allele A      n             m
Allele G      o             p

This will quite simply tell you if there is a significant association between one or other of the alleles and the presence of the disease. Note that the cells here are allele counts, so each subject contributes two alleles.
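As a rough sketch, the 2x2 allele table can be tested in R with `chisq.test`. The counts below are invented for illustration (160 alleles from 80 cases and 600 alleles from 300 controls); they are not data from this thread.

```r
# Hypothetical allele counts; each subject contributes two alleles.
allele_tab <- matrix(c(60, 140,
                       100, 460),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(allele  = c("A", "G"),
                                     disease = c("positive", "negative")))

# Pearson chi-square test (1 degree of freedom), without continuity correction.
res <- chisq.test(allele_tab, correct = FALSE)
res$p.value

# Odds ratio for allele A versus allele G: (n * p) / (m * o) in the table's notation.
odds_ratio <- (allele_tab["A", "positive"] * allele_tab["G", "negative"]) /
              (allele_tab["A", "negative"] * allele_tab["G", "positive"])
```

The test does not require the disease-positive and disease-negative groups to be the same size.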
Alternatively, set up a 3x2 table, thus:

              Disease +ve   Disease -ve
Genotype AA   a             b
Genotype AG   c             d
Genotype GG   e             f

This will tell you if there is an association between the presence or absence of disease and one of the possible genotypes. Note that the association may be protective, in which case the genotype will be less well represented in the disease group.
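A sketch of the 3x2 genotype test in R, again with invented counts (80 cases, 300 controls). The trend test at the end is an addition to what is described above: `prop.trend.test` is a table-based test for a linear trend in the proportion of cases across genotypes scored 0, 1 and 2.

```r
# Hypothetical genotype counts, for illustration only.
geno_tab <- matrix(c(10,  20,
                     40, 100,
                     30, 180),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(genotype = c("AA", "AG", "GG"),
                                   disease  = c("positive", "negative")))

# Pearson chi-square test of genotype versus disease status (2 degrees of freedom).
res <- chisq.test(geno_tab)
res$p.value
res$expected   # expected counts; very small values make the approximation unreliable

# Test for a linear trend in the proportion of cases across 0, 1, 2 copies of G.
trend <- prop.trend.test(x = geno_tab[, "positive"],
                         n = rowSums(geno_tab),
                         score = c(0, 1, 2))
trend$p.value
```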
Now - think about this and come back and tell me how you can use the data to determine whether your allele is having a dominant or recessive effect, and whether you need to correct for multiple comparisons.
Well… to determine whether there are additive effects *between different SNPs*, one of course needs data from two or more SNPs. If they are on the same chromosome, then the degree of linkage disequilibrium becomes important. If they are in complete linkage disequilibrium, then they behave as if they were a single allele, and the above applies. If they are not on the same chromosome, or are on the same chromosome but not linked haplotypically, then they can be studied for additive effects.
R is excellent at this - simply include each allele as an independent variable, where any person can have 0, 1 or 2 copies of the allele.
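A minimal sketch of such a model with two SNPs, using simulated data. The variable names `snp1` and `snp2`, the allele frequencies and the effect sizes are all hypothetical.

```r
# Hypothetical data: two independent SNPs coded as 0, 1 or 2 copies of the allele.
set.seed(2)
n <- 400
snp1 <- rbinom(n, size = 2, prob = 0.3)
snp2 <- rbinom(n, size = 2, prob = 0.2)

# Simulated disease status; the effect sizes are assumptions for illustration.
outcome <- rbinom(n, size = 1, prob = plogis(-1 + 0.4 * snp1 + 0.3 * snp2))

# Crude check of correlation between the two SNPs before entering both in one model.
cor(snp1, snp2)

# Each SNP enters as its own independent variable.
fit <- glm(outcome ~ snp1 + snp2, family = binomial)
summary(fit)$coefficients
```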
Good Luck
G
See Phillip's answer for an analysis of allele-dosing (additive) effects - a much better answer to your question.
For a SNP (e.g. AA, AG, GG), if A is the minor allele, then:
for the dominant model, both the AA and AG genotypes increase disease risk;
for the recessive model, only the AA genotype increases disease risk;
for the additive model, we assume each additional copy of the "A" allele increases disease risk, so AA > AG > GG. This can be tested by logistic regression after you set AA = 2, AG = 1 and GG = 0.
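The three codings can be sketched in R as follows. The genotype vector is a made-up example, and the column names are only illustrative.

```r
# Hypothetical genotypes for six subjects; A is the minor (risk) allele.
genotype <- c("AA", "AG", "GG", "AG", "GG", "AA")

# Additive: number of copies of A.
additive <- unname(c(GG = 0, AG = 1, AA = 2)[genotype])

# Dominant: one or two copies of A versus none.
dominant <- as.integer(genotype %in% c("AA", "AG"))

# Recessive: two copies of A versus one or none.
recessive <- as.integer(genotype == "AA")

coding <- data.frame(genotype, additive, dominant, recessive)
coding
```

Each coded column can then be used as the predictor in a logistic regression, for example glm(outcome ~ dominant, family = binomial), where `outcome` is your own disease indicator.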
If you are trying a QTL mapping approach, R/qtl is great: http://www.rqtl.org . You can try CIM (no documentation is presented online, but you can obtain information by typing ?cim).
Hi, I want to address a similar but different question within this topic. I have a study population (n=50 subjects) and found n=19 SNPs in the candidate gene. Now I would like to determine some kind of association between my observed genotypes and the clinical phenotype. However, I do not have data on controls, as we did not include these in the study for genotyping. Therefore, I was wondering if it is allowable/justified to use representative population genotype data from HapMap or 1000 Genomes and compare these to our genotype data. In addition, I would like to use the SNPs that deviate from controls to determine associations. Any thoughts on this?
I'm struggling with this issue too, wanting to perform analyses for additive inheritance of several SNPs (n=10). I have coded a SNP as suggested above and performed multivariable logistic regression (adding some clinically important variables), using Stata 14.
So my question: I get p-values for each genotype, for the clinical variables and for the overall model, but how do I get a p-value for this SNP having an additive effect on the (dichotomous) outcome examined?
From your answer I suddenly realise that it's fairly simple:
by adding the prefix i. to each SNP, I will calculate p-values for associations between the reference genotype (often homozygous WT) and the individual genotypes of the SNP (heterozygous and homozygous variant, respectively).
by omitting this prefix, I will get the p-values for an additive effect (assuming SNPs are coded as 0, 1, and 2).
Yes, that's true, but creating dummy variables will help you interpret how carrying AA, AG or GG puts someone at risk of your phenotype. You can check the RRR, the OR or the intercepts. If you don't use the dummies, interpreting the overall 'effect' can be tricky.
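For readers working in R rather than Stata, the same distinction can be sketched with `factor()`, which plays roughly the role of Stata's i. prefix. The data are simulated and the effect size is an assumption. The likelihood-ratio comparison at the end is an extra step, not something described above.

```r
# Hypothetical data: one SNP coded 0, 1, 2 and a simulated binary outcome.
set.seed(3)
n <- 600
snp <- rbinom(n, size = 2, prob = 0.3)
outcome <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * snp))

# Numeric coding: one coefficient, the additive (per-allele) effect.
fit_additive <- glm(outcome ~ snp, family = binomial)

# Factor (dummy) coding: one coefficient per non-reference genotype.
fit_genotypic <- glm(outcome ~ factor(snp), family = binomial)
exp(coef(fit_genotypic))   # odds ratios for 1 and 2 copies versus 0 copies

# Likelihood-ratio test of the genotypic model against the additive model (1 df);
# a small p-value suggests the effect departs from additivity.
lrt <- anova(fit_additive, fit_genotypic, test = "Chisq")
lrt
```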
Do we need equal sample sizes for the case and control groups for this analysis? If I have 80 cases and 300 controls, how can I do the chi-square test and odds ratio calculation?