I am interested in finding the clinically relevant SNPs in a gene. I need to select a few SNPs and check their significance in male infertile patients. Where should I start, and what are the criteria for selecting SNPs?
You may want to combine two approaches. First, you can run Haploview to identify possible LD blocks in the gene. Once you have defined the blocks, you can run Tagger (also in Haploview) and select tag SNPs that cover these blocks. Second, you can look at the 1000 Genomes browser to see the predicted effect of each variant along the gene (http://browser.1000genomes.org/Homo_sapiens/Gene/Variation_Gene/Table?db=core;g=ENSG00000166411;r=15:78423840-78464291#missense_variant_tablePanel).
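To make the tag-SNP idea concrete, here is a minimal sketch of greedy pairwise tagging. It is not Haploview's Tagger, only an illustration in the same spirit: it assumes genotypes coded as 0/1/2 allele counts, uses the squared correlation of those counts as a stand-in for r², and applies a threshold of 0.8. The SNP names and genotypes are made up.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as not in LD
    return cov * cov / (var_x * var_y)


def greedy_tag_snps(genotypes, threshold=0.8):
    """Repeatedly pick the SNP that captures the most still-untagged SNPs."""
    snps = list(genotypes)
    covers = {s: {s} for s in snps}  # every SNP captures itself
    for a, b in combinations(snps, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            covers[a].add(b)
            covers[b].add(a)
    untagged = set(snps)
    tags = []
    while untagged:
        best = max(snps, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical toy data: one genotype per individual, eight individuals.
toy_genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp2": [0, 1, 2, 0, 1, 2, 0, 1],  # perfectly correlated with snp1
    "snp3": [0, 0, 1, 1, 2, 2, 1, 1],  # only weakly correlated with snp1
}
tags = greedy_tag_snps(toy_genotypes)
print(tags)
```

In this toy set snp1 captures snp2, so genotyping snp1 and snp3 would be enough. On real data you would let Haploview do this from phased or reference-panel genotypes rather than hand-rolled code.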
Especially for the non-coding SNPs, I would recommend using several tools, such as the ones listed below. Remember, though, that a significant result for a SNP in one of them does not make it clinically significant. You need to find several layers of evidence to build a good hypothesis that can be tested either in the lab or in the clinic.
- eQTL databases (eQTLs are basically SNPs that associate with significant differences in mRNA levels). Genevar from the Sanger Institute is a good one (http://www.sanger.ac.uk/resources/software/genevar/).
- ENCODE data. The ENCODE project aims to map the functional regions of the genome. You can check whether your SNP resides in a potential regulatory region, such as a DNase I hypersensitive site or a region pulled down with a transcription factor (ChIP). You can view both in the UCSC Genome Browser or Ensembl.
- Conservation. Several algorithms, also available at UCSC, are designed to look for genomic regions that are conserved in other species such as mouse or fly.
- Natural selection. You can use tools that screen for signatures of natural selection to pick out potentially functional haplotypes. There are several online tools. Two of the most commonly used statistics are FST and iHS.
- The GWAS Catalog can help you find out whether a SNP within or close to your gene of interest has been associated with a given phenotype, which can be either physiological, such as height, or pathological, such as cancer risk.
- Transcription factor binding sites. Something a bit more laborious is to find SNPs that alter a transcription factor binding motif. There are several online tools you can use, but doing this "agnostically" makes the results almost impossible to interpret. Nevertheless, these tools are great once you have a SNP in a regulatory region such as an enhancer or promoter.
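As an illustration of what FST measures, here is a minimal sketch for a single biallelic SNP in two populations. It assumes the simple Wright-style definition FST = (HT - HS) / HT with equal weight for both populations and no sample-size correction; real tools use more careful estimators. The allele frequencies are invented.

```python
def wright_fst(p1, p2):
    """FST for one biallelic SNP from the allele frequency in two populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within populations
    if h_total == 0:
        return 0.0  # SNP is fixed for the same allele in both populations
    return (h_total - h_within) / h_total


# Hypothetical frequencies: strongly differentiated vs. identical populations.
fst_divergent = wright_fst(0.1, 0.9)
fst_identical = wright_fst(0.3, 0.3)
print(fst_divergent, fst_identical)
```

A value near 0 means the two populations have similar allele frequencies at that SNP, while a value approaching 1 means they differ strongly, which is one of the patterns that can point to local selection.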
Normally, you would look up all of the SNPs in the gene of interest in Entrez dbSNP, which marks the clinically significant ones with links to PubMed. I checked it (http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?chooseRs=all&locusId=3419 - 3419 is the Gene ID for IDH3A), but no SNP is marked as clinically significant yet. It is best to do your own SNP selection, choosing the most likely functional SNPs or haplotype-tagging SNPs to include in a study, as other colleagues have suggested. You may also use empirical evidence in eQTL databases (GTEx, SCAN, GWASdb) or RegulomeDB to assess SNPs in your gene for functionality. Try also NCBI PheGenI, which lists a lot of information including known eQTLs in your gene (http://www.ncbi.nlm.nih.gov/gap/phegeni?tab=1&gene=3419). You can find links to bioinformatics tools useful for your purpose on this page: http://www.dorak.info/mtd/bioinf.html (go down to SNP Resources).