Is it advisable to estimate the SNP allele frequency in a population by testing it against a control group before starting SNP genotyping? Is there an optimal way to do so?
Before starting *genotyping*? No, not really. However, it is important to understand population allele frequencies when you're selecting SNPs to genotype, and post-genotyping when deciding on what the results mean.
A particular problem with genetic association studies is that the most strongly associated SNP is unlikely to be the causative variant, and any association may not carry over to other populations. A false-positive link (often due to population stratification) arises when a SNP variant is specific to a particular population rather than linked to the phenotype of interest. An understanding of SNP frequencies in populations (or population subgroups) can go a long way towards filtering out these problems before they become entrenched in the literature.
Marker (SNP) selection remains the most challenging aspect of genetic association study design. There are several ways to choose candidate SNPs to study in clinical samples. When you are limited by technique or funding, it is usual to select markers based on previous similar studies (i.e. doing a replication study, which is different from a duplicate study).
If you are going to select a novel marker, consider choosing the one with the most potential to be causative, based on criteria such as SNP location. SNPs can occur anywhere in the genome: promoters, exons, introns, splice sites and... As expected, SNPs in regulatory regions have the potential to affect expression levels, whereas SNPs in coding regions may affect folding and ...
Once a SNP has been chosen as a candidate, check population-genetic metrics such as minor allele frequency (MAF). MAF is calculated from population allele frequency data from genome projects such as HapMap and, mainly, the 1000 Genomes Project. As a convention for genetic association studies, it is usual to select SNPs with MAF > 15-20%.
You can easily check the MAF of each SNP in dbSNP. The global MAF is a standard estimate of a SNP's allele frequency. Depending on the ancestry of the study population, it may also help to check the MAF for a database population that is close to the study population.
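As a rough illustration of how a MAF and the 15-20% cut-off fit together, here is a minimal Python sketch, assuming a biallelic SNP and genotype counts (AA, AB, BB) from some reference sample. The function names, the example counts and the default threshold of 0.15 are hypothetical choices for illustration; in practice you would normally take the MAF reported by dbSNP or the 1000 Genomes Project rather than compute it yourself.

```python
def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Return the MAF of a biallelic SNP from genotype counts.

    Each individual carries two alleles, so homozygotes contribute two
    copies of their allele and heterozygotes one copy of each.
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("need at least one genotyped individual")
    freq_a = (2 * n_aa + n_ab) / total_alleles
    # The minor allele is whichever allele is rarer in this sample.
    return min(freq_a, 1.0 - freq_a)


def select_common_snps(genotype_counts: dict[str, tuple[int, int, int]],
                       min_maf: float = 0.15) -> list[str]:
    """Keep SNPs whose MAF exceeds the (hypothetical default) cut-off."""
    return [snp for snp, counts in genotype_counts.items()
            if minor_allele_frequency(*counts) > min_maf]


if __name__ == "__main__":
    # Made-up genotype counts for two hypothetical SNPs, for illustration only.
    counts = {"snp_1": (50, 40, 10), "snp_2": (90, 9, 1)}
    for snp, c in counts.items():
        print(snp, round(minor_allele_frequency(*c), 3))
    print(select_common_snps(counts))
```

Here `snp_1` has a MAF of 0.3 and passes the cut-off, while `snp_2` (MAF 0.055) is dropped.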
> As a convention for genetic association studies, it is usual to select SNPs with MAF > 15-20%.
>
> You can easily check the MAF of each SNP in dbSNP. The global MAF is a standard estimate of a SNP's allele frequency. Depending on the ancestry of the study population, it may also help to check the MAF for a database population that is close to the study population.
I did this with the HapMap CEU vs JPT/CHB populations as a model for European vs Maori populations, selecting SNPs with a MAF > 20%, and a little under half of the SNPs had a similar MAF in Europeans and Maori.
It sort of works as a selection procedure, but it's not wonderful. You'd be better off running a few SNP chips on a small sample of the target population (e.g. 20-30 individuals), estimating the MAF from that, and using those MAFs to select SNPs.
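To see why a 20-30 person pilot gives a usable but noisy MAF, here is a minimal Python sketch that estimates the MAF from a small diploid sample together with a Wilson score interval. The choice of the Wilson interval, the 95% level and the function name are assumptions for illustration, not part of the procedure described above.

```python
import math
from statistics import NormalDist


def pilot_maf_estimate(minor_allele_count: int, n_individuals: int,
                       confidence: float = 0.95) -> tuple[float, float, float]:
    """Return (MAF estimate, lower bound, upper bound) via a Wilson score interval.

    Assumes diploid individuals, so the sample contains 2 * n_individuals alleles.
    """
    n = 2 * n_individuals
    if n == 0 or not 0 <= minor_allele_count <= n:
        raise ValueError("allele count must be between 0 and 2 * n_individuals")
    p = minor_allele_count / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # 25 hypothetical pilot individuals carrying 10 copies of the minor allele.
    print(pilot_maf_estimate(10, 25))
```

With 25 people (50 alleles) and a point estimate of 0.20, the interval spans roughly 0.11 to 0.33, so a SNP close to the 15-20% cut-off could land on either side of it; a larger pilot narrows the interval.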
Regarding SNP analysis, any idea how I can calculate effect size in an unmatched case-control study? I ask because I will have 700 controls without the disease and need to work out how many patients I need to analyse to see a significant SNP allele frequency effect.
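One common way to frame this is to express the effect size as an allelic odds ratio (OR) and then ask how many cases give adequate power against a fixed 700 controls. The Python sketch below does both under stated assumptions: a per-allele (allele-counting) test using the normal approximation for two proportions, alleles treated as independent (Hardy-Weinberg equilibrium), a single SNP tested at alpha = 0.05, and a control MAF and target OR that you would have to choose yourself. All function names are hypothetical; dedicated genetic power calculators handle genotype models and multiple testing more carefully.

```python
import math
from statistics import NormalDist
from typing import Optional

_STD_NORMAL = NormalDist()


def allelic_odds_ratio(case_minor: int, case_major: int, ctrl_minor: int,
                       ctrl_major: int, confidence: float = 0.95):
    """Allelic OR from allele counts, with a Woolf (log-scale) CI.

    All four counts must be positive.
    """
    or_ = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    se_log = math.sqrt(1 / case_minor + 1 / case_major
                       + 1 / ctrl_minor + 1 / ctrl_major)
    z = _STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    return or_, math.exp(math.log(or_) - z * se_log), math.exp(math.log(or_) + z * se_log)


def case_allele_freq(p_ctrl: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control frequency and OR."""
    return odds_ratio * p_ctrl / (1 - p_ctrl + odds_ratio * p_ctrl)


def allelic_test_power(n_cases: int, n_controls: int, p_ctrl: float,
                       odds_ratio: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion test on allele counts."""
    p_case = case_allele_freq(p_ctrl, odds_ratio)
    m1, m0 = 2 * n_cases, 2 * n_controls  # allele counts (assumes HWE)
    p_bar = (m1 * p_case + m0 * p_ctrl) / (m1 + m0)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / m1 + 1 / m0))
    se_alt = math.sqrt(p_case * (1 - p_case) / m1 + p_ctrl * (1 - p_ctrl) / m0)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf((abs(p_case - p_ctrl) - z_crit * se_null) / se_alt)


def cases_needed(n_controls: int, p_ctrl: float, odds_ratio: float,
                 alpha: float = 0.05, power: float = 0.8,
                 max_cases: int = 100_000) -> Optional[int]:
    """Smallest number of cases reaching the target power, or None if never reached."""
    for n in range(1, max_cases + 1):
        if allelic_test_power(n, n_controls, p_ctrl, odds_ratio, alpha) >= power:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical planning values: control MAF 0.20, target allelic OR 1.5.
    print(cases_needed(n_controls=700, p_ctrl=0.20, odds_ratio=1.5))
    # Hypothetical observed allele counts: cases 120 minor / 280 major,
    # controls 280 minor / 1120 major.
    print(allelic_odds_ratio(120, 280, 280, 1120))
```

Because the number of controls is fixed, power levels off as cases are added; for a small OR or a rare allele `cases_needed` can return `None`, meaning no number of cases reaches the target power with 700 controls.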