Is it correct to use Genomic Control (GC) adjustment in association analysis of candidate SNPs?

Nick VL Serão

Dear Amanda,

Good question!

To begin with, what is PCA? Principal Component Analysis (PCA) is a dimensionality reduction procedure. In simple words, it takes "N" variables (~700 in your case) that are somewhat correlated and creates up to "N" new variables (again ~700 in your case) that are uncorrelated with each other. For example, food intake and body weight are highly correlated (let's say r = 0.8). Because of this, you expect to obtain similar results (for whatever you are doing) whether you analyze intake or weight as your response variable. If you use PCA, you will create two Principal Components (PCs) that are uncorrelated with each other. The first PC will account for the vast majority of the variation in these two traits (intake and weight). So, instead of performing the analysis twice, you may use PC 1 to obtain the overall results.
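To make the intake/weight example concrete, here is a minimal sketch in Python with NumPy. The data are simulated (the sample size, the seed, and the variable names are assumptions for illustration only), with a true correlation of 0.8 between the two traits. For two standardized variables with correlation r, PC 1 explains (1 + r) / 2 of the total variance, so about 90% here.

```python
import numpy as np

# Simulated stand-ins for food intake and body weight (illustrative only).
rng = np.random.default_rng(0)
n = 1000
r = 0.8
cov = np.array([[1.0, r], [r, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# Standardize each trait, then eigen-decompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)

# eigh returns eigenvalues in ascending order; put the largest first.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# PC scores: one column per principal component.
scores = Z @ eigvecs

# Share of the total variance carried by each PC.
explained = eigvals / eigvals.sum()

# The two PCs are uncorrelated by construction.
pc_corr = np.corrcoef(scores, rowvar=False)[0, 1]

print("variance explained by PC1, PC2:", explained)
print("correlation between PC1 and PC2:", pc_corr)
```

The printed correlation between the two PCs is zero up to rounding error, which is the "uncorrelated" property described above.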

In your case, you are using PCA on your SNPs. SNPs that are closer to each other on the genome are more likely to be correlated with each other because of linkage. As SNPs get further away from each other, the chance of crossing-over between them increases, which decreases the correlation between SNPs. Likewise, members of the same family share more of the same SNP alleles, creating population structure. In your analysis, you want to identify SNPs associated with the trait regardless of the genetic background of your populations. Therefore, by performing PCA on your SNPs, you are summarizing the genetic information of your individuals into fewer variables that are supposed to capture the population structure in your data.
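As a rough illustration of how PCs of the SNP data can pick up population structure, here is a sketch on simulated genotypes. Everything in it is an assumption for the example: two subpopulations of 100 individuals each, 700 SNPs coded 0/1/2, and subpopulation allele frequencies drawn around a shared ancestral frequency with a divergence parameter `fst`. It is not meant to reproduce your data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_pop, n_snps, fst = 100, 700, 0.1  # illustrative values

# Ancestral allele frequencies, then one frequency vector per subpopulation
# (a Beta draw around the ancestral frequency; larger fst = more divergence).
p_anc = rng.uniform(0.1, 0.9, size=n_snps)
a = p_anc * (1 - fst) / fst
b = (1 - p_anc) * (1 - fst) / fst
freqs = rng.beta(a, b, size=(2, n_snps))

# Genotypes coded as allele counts (0, 1, 2); rows are individuals.
labels = np.repeat([0, 1], n_per_pop)
G = rng.binomial(2, freqs[labels])

# Drop monomorphic SNPs, then standardize each SNP column.
G = G[:, G.std(axis=0) > 0]
Z = (G - G.mean(axis=0)) / G.std(axis=0)

# PCA via singular value decomposition of the standardized matrix.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
pcs = U * S                          # PC scores, one column per PC
explained = S**2 / np.sum(S**2)      # share of SNP variance per PC

# How well does PC1 track subpopulation membership?
pc1_vs_pop = np.corrcoef(pcs[:, 0], labels)[0, 1]

print("variance explained by first 5 PCs:", explained[:5])
print("correlation of PC1 with subpopulation:", pc1_vs_pop)
```

The `explained` vector is the "% of the variation" that each PC accounts for in the SNP data; it describes the genotypes only and says nothing about the trait.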

That being said, although you did not ask about it, I suggest using more than 2 PCs in your model. By the way, what % of the variation in the SNP data do these two PCs explain? Please note that this % of the variation has nothing to do with your association analysis!

Now that you have an idea of what PCA means without a stats connotation, let's go to FDR and GC.

These two methods are NOT doing the same thing. FDR controls for false positives that arise by random chance when many tests are performed, some of which COULD be due to population structure. In contrast, GC is related to PCA in the sense that it deals with false positives caused by population stratification. By fitting 2 PCs in the model, you are already accounting for population structure (my question/comment above about the % of variation reflects my belief that you are NOT accounting for much of it). Thus, an additional correction of your test statistics using GC is not appropriate, as you are already adjusting your data for population structure.
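For reference, this is roughly what a GC adjustment does to the test statistics. The sketch below assumes 1-df chi-square statistics and uses the common median-based inflation factor; the function names and the simulated statistics are hypothetical, and the inflation factor of 1.5 is chosen only to show the effect.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(chi2_stats):
    """Inflation factor lambda: observed median / expected median under the null."""
    return float(np.median(chi2_stats) / CHI2_1DF_MEDIAN)

def gc_adjust(chi2_stats):
    """Divide every statistic by lambda; lambda is floored at 1 so nothing is inflated."""
    lam = max(genomic_inflation(chi2_stats), 1.0)
    return np.asarray(chi2_stats, dtype=float) / lam

# Simulated 1-df chi-square statistics for 700 SNPs (illustrative only).
rng = np.random.default_rng(2)
null_stats = rng.standard_normal(700) ** 2
inflated_stats = 1.5 * null_stats    # mimic uniform inflation from stratification

lam_null = genomic_inflation(null_stats)
lam_inflated = genomic_inflation(inflated_stats)
lam_after = genomic_inflation(gc_adjust(inflated_stats))

print("lambda (null):", lam_null)
print("lambda (inflated):", lam_inflated)
print("lambda after GC:", lam_after)
```

If the PCs in the model already absorb the stratification, lambda computed on the PC-adjusted statistics should be close to 1, and dividing by it changes very little.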

FDR can be a problem when a small number of tests are being performed. In your case, I would not say that 700 is too few (granted, it is not that many either).
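As one concrete way to apply FDR control, here is a minimal sketch of the Benjamini-Hochberg procedure. This is an assumption about which FDR method is meant; the function name and the four example p-values are made up for illustration.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)

    # Scale the i-th smallest p-value by m / i.
    ranked = p[order] * m / np.arange(1, m + 1)

    # Enforce monotonicity, working from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]

    # Put the adjusted values back in the original order, capped at 1.
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

# Four hypothetical SNP p-values.
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
print(qvals)
```

A SNP would then be declared significant when its adjusted value falls below the chosen FDR level (for example 0.05).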

My suggestion is to stick to FDR while adding more PCs in the model.
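A single-SNP test with PCs as covariates could look like the sketch below. It assumes a quantitative trait and an ordinary least squares model with an intercept, one SNP, and the PCs; the function name, the number of PCs (4), and the simulated data are all hypothetical. Your actual model may include other effects.

```python
import numpy as np

def snp_t_statistic(y, snp, pcs):
    """t statistic for the SNP effect in the model y ~ intercept + SNP + PCs."""
    X = np.column_stack([np.ones(len(y)), snp, pcs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Residual variance and the standard error of the SNP coefficient.
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[1] / np.sqrt(cov_beta[1, 1]))

# Simulated data: 200 individuals, one SNP with a real effect, 4 PCs.
rng = np.random.default_rng(3)
n = 200
snp = rng.binomial(2, 0.3, size=n).astype(float)
pcs = rng.standard_normal((n, 4))
y = 1.0 * snp + 0.5 * pcs[:, 0] + rng.standard_normal(n)

t_snp = snp_t_statistic(y, snp, pcs)
print("t statistic for the SNP:", t_snp)
```

Running this once per SNP gives one test per SNP; the resulting p-values are what the FDR step would then be applied to.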

In addition to one SNP at a time and the 2 PCs, are there other effects in the model?

I hope this discussion helps you!

Thanks, Nick
