I probably wouldn't do a GWAS in the first place, but it depends on the resources and the phenotype of interest. CNVs perform similarly to SNPs in the GWAS design since common CNVs are generally in LD with SNPs, and rare or de novo CNVs won't produce a signal. Burden tests are what the field is doing with CC or population study designs these days, even for SNPs.
Rare variants are of interest right now, but I think that burden tests of common variants make sense as well, since functional common variants are usually observed to have small effect sizes. A lot of different approaches to burden testing are being worked on right now, and I don't think a consensus best approach has been established. As to whether the variants all affect the phenotype the same way, some tests are flexible about the direction of the effect. I have heard good things about SKAT, but I haven't used it myself. Here is a link: http://www.hsph.harvard.edu/research/skat/.
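To make the idea of a burden test concrete, here is a minimal sketch in Python of the simplest flavor: collapse the chosen variants into one score per individual and compare cases with controls by permutation. This is not SKAT and not any group's published method. The function names and the toy genotypes are made up for illustration, and the test is one-sided, so it assumes every variant pushes risk in the same direction (the assumption that direction-flexible tests relax).

```python
import random


def burden_scores(genotypes, variant_idx):
    """Sum minor-allele counts (0/1/2) over the chosen variants, per individual."""
    return [sum(row[j] for j in variant_idx) for row in genotypes]


def permutation_burden_test(genotypes, is_case, variant_idx, n_perm=10000, seed=0):
    """One-sided permutation test for excess variant burden in cases.

    Returns (observed mean burden difference, permutation p-value).
    """
    scores = burden_scores(genotypes, variant_idx)
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    total = sum(scores)

    def mean_diff(labels):
        case_sum = sum(s for s, c in zip(scores, labels) if c)
        return case_sum / n_case - (total - case_sum) / n_ctrl

    observed = mean_diff(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link
        if mean_diff(labels) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (invented): 8 individuals x 4 variants, first 4 are cases.
toy_genotypes = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
toy_is_case = [True] * 4 + [False] * 4
diff, p_value = permutation_burden_test(toy_genotypes, toy_is_case, [0, 1, 2])
print(f"mean burden difference = {diff:.2f}, permutation p = {p_value:.3f}")
```

In a real analysis you would restrict `variant_idx` to one gene or region at a time, probably weight variants by frequency, and adjust for covariates such as ancestry, none of which this sketch does.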
Some approaches are gene-based, and others are genome-wide. Evan Eichler's group's work with mental disabilities has produced some good examples of how burden tests can be an effective approach. If a GWAS hasn't been done for your phenotype of interest, it is certainly a justifiable first step. If you did that, I would also impute genotypes from the 1000 Genomes Project using IMPUTE2:
Dear August, Thank you very much for these very interesting remarks. Actually, I think pretty much the same as you. To gain power, a study would combine the analysis of rare and more common variants, the former having stronger effects and the latter a moderate impact on the phenotype. However, I was not able to name the method as you did. I will have a close look at the work you mentioned.
Actually, no GWAS has been done yet for the complex disease I have in mind. I am pleased to know your opinion about that, since I was not convinced that a "classical" GWAS approach, based on hybridization to arrays that include only common variants, would be the right choice. As you said, it would probably be a relevant first step, but it would certainly not be sufficient to unravel a large part of the heritability.
As August mentioned, GWAS is very much dependent on the phenotype you are looking at. Of course, when you are looking at a phenotype like breast cancer or the common cold, you cannot go for linkage studies, and the load of common variants will be much greater than that of rare variants. Nowadays labs like Eichler's and people from NHGRI, NIH concentrate on CNVs, but a complementary approach is always good. Case-control analysis of variant allele burden is one option. If you have good funding, then exome-sequence small families or sib pairs and confirm by Sanger sequencing. GWAS with CNVs through CGH arrays is helpful. Low-coverage genome sequencing will give you false positives, so you will simply be fishing in a huge pool of variants, and the confirmation will take a long time.
SNP arrays and CGH arrays should both be feasible on Affymetrix. For power, your sample size matters, and you must be aware that no one knows how big is really big!
A GWAS can miss rare variants or alleles with moderate effect; only common variants can be picked up with good confidence. Your sample access and phenotype should decide the approach. The University of Washington and 1000 Genomes data show allele frequencies in populations all over the world. Check whether the variants reported in the genes relevant to your disorder are common or rare according to their SNPs, then decide accordingly.
Thank you Kalpita. You are right that low-coverage sequencing would probably produce numerous false positives. My limited experience with exome sequencing confirms your observation; even with supposedly high-coverage sequencing, many artefacts can be ruled out by checking with Sanger sequencing.
Based on your comments, it could be a better idea to use arrays combining SNPs and CGH (on the same array or on separate ones). I wondered about the relevance of using a chip that includes rare variants inferred from exome sequencing of thousands of people. I believe Illumina offers such a product. We'll have to find the balance between the goal we want to reach and our funding. As you mentioned, sample size can be quite limiting, all the more so because our study will concern a complex disease.
Yes Sabastien, since you are looking at cancer, I'm sure your sample sizes will be limiting. And you are looking for rare variants, whereas nowadays in cancer you can see that common SNPs are also being associated with high confidence. No approach is foolproof, but you cannot ignore CNVs at this point. If you have good funding, one option is whole-genome sequencing, where you cover introns, UTRs, and coding regions plus miRNAs, because the link between cancer and miRNAs is growing.
You can certainly pick the best-associated SNPs or genes and have a custom chip made. You cannot ignore coverage, so at least 3 samples need to be run. Also, since these samples are not related to each other, you need to sequence a larger number to keep the genetics right. Otherwise you will simply get a lot of variants without a good correlation.
You are indeed right about regulation by miRNAs and about the usefulness of analyzing CNVs. As you said, none of the methods available to date is foolproof.
Well, if we had unlimited funding, or if costs dropped dramatically, it would be informative to sequence exomes or genomes from every individual in the cohort. It wouldn't be easy to analyze, but we would then have most of the genetic information.
Obviously, even that would not give all the answers, as we can't neglect the epigenetic mechanisms that play a part in the pathogenesis of complex diseases.
Yes, that's why in the cancer field people go for promoter methylation scans, either globally or at the genomic or subgenomic level. You could use the kind of chips already available for some cancers that have been really well worked out, and then go on to sequence your patients. The best would still be to do that for at least a few cases, if there are any, where there might be positive heritability.
There is no getting around the fact that most CNVs have a de novo origin, yet you cannot rule out that they might be inherited too. It is complex, with modifiers and heavy epigenetic regulation.
Cancer, as we speak, is a really variable system. Papers in Cell have already shown that single-cell sequencing of one tumor cell gives a different profile from that of another cell in neighboring tissue! What do we expect here?
It's definitely a tough field of research. Anyhow, it would be great if studies like the one we are going to start could help, even very modestly, to unravel a bit of the complex mechanisms of carcinogenesis. We wouldn't conduct this kind of study if we didn't expect so.
Thanks again for your very interesting comments, Kalpita.
Well, it depends on the question you want to ask. I think GWAS are useless for certain common complex traits with a strong gene-environment effect, because the effect sizes are ridiculously small and the sample sizes needed are ridiculously large (see the results from GWAS for major depression). That said, if you have a "good question" I would suggest the opposite order: low-coverage genome sequencing + custom SNP array (custom for your population and variants of interest).
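To put rough numbers on "small effect sizes need large samples", here is a back-of-the-envelope sketch in Python using the textbook normal-approximation sample-size formula for comparing two proportions, applied to allele frequencies in cases versus controls. The assumptions are mine, not the posters': a simple allelic test, equal numbers of cases and controls, Hardy-Weinberg equilibrium, a per-allele odds ratio, and the conventional genome-wide threshold of 5e-8. The function names are hypothetical, and a dedicated power calculator should be preferred for real study planning.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    return odds_ratio * control_freq / (1 + control_freq * (odds_ratio - 1))


def cases_needed(control_freq, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate number of cases (with as many controls) for an allelic test.

    Assumes odds_ratio != 1, otherwise the frequency difference is zero.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return ceil(alleles_per_group / 2)  # two alleles per diploid individual


# Illustrative inputs only: a risk allele at 20% frequency in controls.
for odds_ratio in (1.1, 1.2, 1.5):
    print(odds_ratio, cases_needed(0.2, odds_ratio))
```

The required number of cases grows steeply as the odds ratio approaches 1, which is the point being made about traits where only very small effects are expected.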