I have the allele frequencies of both the case and control groups, and I want to test whether they differ significantly. I want to do the analysis in R.
You simply have 2 groups and want to compare frequencies?
(1) BY NORMAL APPROXIMATION
For n*p > 10 and n*(1-p) > 10 the normal approximation works well. Then you can calculate the difference between the estimated proportions and divide it by its standard error
SE(p1-p2) = sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
to get the z-value. The confidence interval and the test can be done using the normal distribution (function qnorm(.) for the CI and pnorm(.) to get a p-value for the test).
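As a minimal sketch of this calculation in R, assuming made-up allele counts (x1, n1, x2, n2 are purely illustrative and not taken from any real data set):

```r
# Hypothetical allele counts (illustrative only)
x1 <- 120; n1 <- 400   # copies of allele A and total alleles in cases
x2 <- 90;  n2 <- 400   # copies of allele A and total alleles in controls

p1 <- x1 / n1          # estimated allele frequency in cases
p2 <- x2 / n2          # estimated allele frequency in controls
d  <- p1 - p2          # difference in proportions

# Unpooled standard error of the difference, as in the formula above
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z       <- d / se
p_value <- 2 * pnorm(-abs(z))                # two-sided p-value
ci      <- d + c(-1, 1) * qnorm(0.975) * se  # 95% confidence interval

print(c(z = z, p = p_value, lower = ci[1], upper = ci[2]))
```

Base R's prop.test(c(x1, x2), c(n1, n2)) runs a closely related chi-squared test that pools the two proportions under the null hypothesis, so its p-value will differ slightly from the one computed here.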
(2) BY FISHER'S EXACT TEST
An exact comparison of the proportions can be done with Fisher's exact test:
fisher.test(.)
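For illustration, a call on a 2x2 table of allele counts might look like the sketch below. The counts and the dimnames are hypothetical; with real data the table would come from your own allele counts.

```r
# Hypothetical 2x2 table of allele counts (illustrative only).
# matrix() fills column-wise: first the case column, then the control column.
tab <- matrix(c(120, 280,    # cases:    allele A, allele G
                 90, 310),   # controls: allele A, allele G
              nrow = 2,
              dimnames = list(allele = c("A", "G"),
                              group  = c("case", "control")))

ft <- fisher.test(tab)

ft$p.value    # two-sided exact p-value
ft$estimate   # odds ratio (conditional maximum likelihood estimate)
ft$conf.int   # 95% confidence interval for the odds ratio
```

Note that fisher.test(.) reports the conditional maximum likelihood estimate of the odds ratio, which can differ slightly from the simple cross-product ratio of the four cells.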
The odds ratio from Fisher's exact test can be transformed to a relative risk by
RR = OR / (1 - p + (p*OR))
where OR is the odds ratio, RR the relative risk and p the proportion in the control group.
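The conversion can be written as a small helper. The function name or_to_rr and the proportions below are assumptions made for this sketch. The check uses the plain cross-product odds ratio, for which the formula recovers p1/p0 exactly.

```r
# Hypothetical helper: convert an odds ratio to a relative risk,
# given the proportion p0 in the control group
or_to_rr <- function(or, p0) {
  or / (1 - p0 + p0 * or)
}

# Illustrative proportions (not from real data)
p1 <- 0.30    # proportion in the case group
p0 <- 0.225   # proportion in the control group

# Cross-product odds ratio computed directly from the two proportions
or <- (p1 / (1 - p1)) / (p0 / (1 - p0))

rr <- or_to_rr(or, p0)
print(c(OR = or, RR = rr, direct_ratio = p1 / p0))
```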
(3) BY A LOGISTIC REGRESSION
If you have a more complicated design with some covariates, a logistic regression (i.e. a generalized linear model of the binomial family with logit link) would be most appropriate. The R function is glm(.)
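A rough sketch of such a model on simulated data is shown below. The column names (dis, n_minor, age), the additive 0/1/2 coding of the genotype and the simulated values are all assumptions made for illustration, not a prescribed analysis.

```r
set.seed(1)   # make the simulated example reproducible

n <- 200
dat <- data.frame(
  dis = rep(c(0, 1), each = n / 2),           # 0 = control, 1 = case
  age = round(rnorm(n, mean = 50, sd = 10))   # hypothetical covariate
)

# Number of copies of the minor allele per person (0, 1 or 2),
# simulated with a higher allele frequency in cases (assumed values)
dat$n_minor <- rbinom(n, size = 2,
                      prob = ifelse(dat$dis == 1, 0.30, 0.225))

# Logistic regression: binomial family with logit link
fit <- glm(dis ~ n_minor + age,
           family = binomial(link = "logit"),
           data = dat)

summary(fit)$coefficients                  # estimates on the log-odds scale
exp(coef(fit)["n_minor"])                  # odds ratio per copy of the minor allele
exp(confint.default(fit)["n_minor", ])     # Wald 95% CI for that odds ratio
```

Here the genotype enters as a count of minor alleles, so the coefficient is a per-allele log odds ratio. Treating the genotype as a factor instead would give separate estimates for each genotype.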
Thank you, Jochen Wilhelm. My data are in an Excel sheet consisting of 2 columns: one with the genotypes from the case group and the other with the genotypes from the control group. (The data look like this: A/A, A/G, A/A, G/G ...)
Now, how can I analyze the data in R to detect a difference in genotype distribution (if any) between these 2 groups?
Thank you very much, Jochen Wilhelm, for your explanation; I really appreciate it. I put all genotype data from the case and control groups in a single column (header = all) in Excel, and in another column (header = dis) I entered "0" for each genotype from the control group and "1" for each genotype from the case group. Then I imported it into R and analyzed it with the "SNPassoc" package using the command "association(dis~all, data=mydat)". Did I do the right thing?