I have (Linkage Disequilibrium) LD data for two SNPs - r2 is about 0.14, D' is around 0.8. Could these SNPs be said to be in strong LD?

09 September 2012

Is it better to cite D' or r2 values when considering LD?

Hi Kerry

Sorry for my late answer, but I just came across your question. To give you a quick and dirty answer, a D´ of 0.8 is high disequilibrium. Basically the two SNPs are coinherited roughly 80% of the time. The reason your r2 is low is that this takes account of allele frequency.

D´ and r2 values are widely used but poorly understood. The current “trend” seems to be to take more notice of the r2 value, whereas I feel that the D´ is more meaningful and easier to understand. The idea of disequilibrium values is that they are a measure of the non-random association of alleles at two or more loci, i.e. how often alleles are coinherited. If two loci are not coinherited at all (they are independent) then both the D´ and r2 values will be 0.0, irrespective of either allele frequency. As another example, if you had two polymorphisms both with a 50% allele frequency and in total disequilibrium, then both the D´ and r2 values would be 1.0. However, the story changes when the allele frequencies are not the same. For example, if you had two polymorphisms, one with a 50% allele frequency and the other with a 1% allele frequency, that were still in total disequilibrium, then the D´ value would be 1.0 but the r2 value would only be 0.01. Basically, the D´ is saying that when the rare allele is present it is always inherited with one particular allele of the 50% polymorphism, whereas the r2 is saying that it is a rare allele, so the vast majority of the time the common allele is not found with it (but only because it is rare, not because it is not in disequilibrium). However, even with such a low r2, this SNP adds nothing to an association study because the two are in complete disequilibrium.
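The dependence of r2 on allele frequency described above can be checked numerically. What follows is only a minimal sketch, assuming the standard two-locus definitions (D' = D / Dmax and r2 = D^2 / (pA qA pB qB)) and positive D; the function name `r2_from_dprime` and its parameter names are made up for illustration and do not come from any LD package.

```python
def r2_from_dprime(p_a, p_b, d_prime):
    """Return r^2 for two biallelic loci given allele frequencies and D'.

    p_a, p_b: frequency of one allele at each locus (hypothetical names).
    d_prime:  D' in [0, 1]; this sketch assumes D >= 0 and that both
              loci are polymorphic (no frequency equal to 0 or 1).
    """
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    # For D >= 0 the largest attainable D is min(pA*qB, qA*pB).
    d_max = min(p_a * q_b, q_a * p_b)
    d = d_prime * d_max
    return d * d / (p_a * q_a * p_b * q_b)


equal = r2_from_dprime(0.5, 0.5, 1.0)     # both loci at 50%, D' = 1
unequal = r2_from_dprime(0.5, 0.01, 1.0)  # 50% vs 1%, D' = 1
print(f"equal frequencies:   r2 = {equal:.2f}")
print(f"unequal frequencies: r2 = {unequal:.4f}")
```

With D' held at 1, r2 drops from 1 to roughly 0.01 when the second allele frequency goes from 50% to 1%, in line with the figures quoted above.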

I don’t want to labour my answer, but another example may help, where the two SNPs are coinherited about half the time. With both SNPs having a 50% allele frequency, the D´ value would be about 0.5 and the r2 value would be about 0.25, but if one SNP had a 1% allele frequency then the D´ value would still be about 0.5 while the r2 value would only be about 0.0025.

So to me the D´ is more meaningful. I hope that helps.

André Scherag

Dear Kerry, for what purpose? If you use them in a gene marker mapping experiment to replace one with the other, you should go for r2, and an r2 of 0.14 is very small (so the markers are not good proxies for each other).

Fabio Marroni

I agree with André. I wouldn't say that an r2 of 0.14 is high! r2 and D' have different properties, with D' suffering from a ceiling effect (it easily tends to 1). You can read a recent paper I published in Tree Genetics and Genomes, "Nucleotide diversity and linkage disequilibrium in Populus nigra cinnamyl alcohol dehydrogenase (CAD4) gene", in which I discuss the differences briefly.

You could also read previous excellent work on LD properties by Muller "Linkage disequilibrium for different scales and applications" or by Hedrick "Mutation and linkage disequilibrium in human mtDNA". Good luck!

Carolina Sanchez-Jimeno

D' = 1 marks the SNPs with the strongest LD, so 0.8 is strong LD. But with only these two SNPs you can't construct haplotypes, because r2 is very low (0.14) and you would not capture all the haplotypes that exist. The optimal r2 is 1.

Adela Mansilla

I agree with Fabio. Judging by D', they are in LD, but because r2 is so low they are not good predictors of each other; therefore, for mapping or association purposes, you may want to use them both.

Joaquin Zuñiga

I agree, r2=0.14 is low and suggests that these SNPs are not in high linkage disequilibrium. I have obtained data for neighboring SNPs with r2 values between 0.60 and 0.80; these are indicators of high LD.

Hanan Ramadan Mohamed

I agree with Margit, r2=0.14 is low indicating weak correlation between SNPs. D' isn't suitable for predicting LD and r2 is better for LD prediction.

Abhimanyu --

To answer the original question: yes, they are in strong LD, but as explained by earlier respondents, they can't serve as proxies for each other. As mentioned, r2 and D' serve different purposes. Typically, r2 is preferred when the focus is on the predictability of one polymorphism given the other (and hence it is often used in power studies for association designs); D', instead, is the measure of choice to assess recombination patterns (haplotype blocks have often been defined on the basis of D'). So choose as applies to you. Good luck...

Kerry Andrea Pettigrew

Thanks everyone for your helpful replies. I've been trying to determine the phase between these two SNPs (as opposed to seeing if one could tag the other), and was looking for an explanation of how r2 can be low while D' is fairly high (thanks to Margit and Abhimanyu for answering this particular aspect). I've been trying to work out haplotype frequencies for these 2 SNPs using HapMap genotype data but the two SNPs are in non-adjacent blocks (1 and 5), so I'm wondering now if this is possible or a bit too complicated to estimate. Any advice on this further question would be much appreciated.

Margit Burmeister

Can someone explain this? ResearchGate sent me several "XYZ voted up your answer" notifications, and I have seen responses to my answer here, but now it is gone. I am not narcissistic, but it seems strange. Can someone delete another person's answer?

Fabio Marroni

I think that no one should be able to delete your answers unless they know your credentials. I also read your answer before it disappeared and I see no reason why administrators should remove it. If I were you I would change my password, just in case...

Hanan Ramadan Mohamed

I also read your answer before it disappeared, you should ask administrators why your answer disappeared.

C. Phillip Morris

Hi Kerry

So to me the D´ is more meaningful. I hope that helps.

Kerry Andrea Pettigrew

Charles - thanks, that was by far the clearest explanation yet.

Robert C Elston

I believe all the answers so far referred to a homogeneous population. As far as I know, nobody has carefully studied what happens to the two measures when your population is a mixture of two populations, but that might give another clue to what is going on in your particular situation. As Sir Ronald Fisher once wrote: "It is a statistical commonplace that the interpretation of a body of data requires a knowledge of how it was obtained" (Fisher (1934) The effect of methods of ascertainment upon the estimation of frequencies. Annals of Eugenics 6, 13-25).

Margit Burmeister

Here I am reposting what I said earlier, which some commented on before, at a little more length. A D' of 0.8 is fairly high LD, but because of the r2 of 0.14, the SNPs can't substitute for each other. This is most likely due to one SNP being much rarer than the other. For 2 SNPs, there are 4 possible haplotypes (call them AB, Ab, aB, ab). If one of the haplotypes is completely missing, for example because a new mutation, b, arose on the background of a, then only 3 haplotypes exist: AB, aB, and ab, the newest and rarest haplotype. In that case, D' will be 1 and r2 small. Here, b completely predicts a, but a does not completely predict b (knowing b, you can conclude a, but knowing a, it can be either B or b). For population genetics, D' can be very important, but for tagging or substituting one SNP for another, r2 is the important measure. r2 will be 1 only if there are only 2 of the 4 possible haplotypes, and the allele frequencies of the two SNPs are the same.
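The three-haplotype case can be worked through with a few lines of code. This is a rough sketch under the standard definitions (D = f(AB) - pA pB, D' = |D| / Dmax, r2 = D^2 / (pA qA pB qB)); the function name `ld_from_haplotypes` and the haplotype frequencies are hypothetical, chosen only so that Ab is absent and ab is the rarest haplotype.

```python
def ld_from_haplotypes(f_AB, f_Ab, f_aB, f_ab):
    """Return (D, D', r^2) from four haplotype frequencies that sum to 1.

    Assumes both loci are polymorphic, so no denominator is zero.
    """
    p_A = f_AB + f_Ab          # frequency of allele A
    p_B = f_AB + f_aB          # frequency of allele B
    q_A, q_B = 1.0 - p_A, 1.0 - p_B
    d = f_AB - p_A * p_B       # raw disequilibrium coefficient
    # Dmax depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * q_B, q_A * p_B)
    else:
        d_max = min(p_A * p_B, q_A * q_B)
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * q_A * p_B * q_B)
    return d, d_prime, r2


# Hypothetical frequencies: Ab never occurs, ab is the new, rare haplotype.
d, d_prime, r2 = ld_from_haplotypes(f_AB=0.60, f_Ab=0.0, f_aB=0.30, f_ab=0.10)
print(f"three haplotypes: D' = {d_prime:.2f}, r2 = {r2:.3f}")

# Only two haplotypes with matching allele frequencies: r2 reaches 1.
_, d_prime_two, r2_two = ld_from_haplotypes(0.70, 0.0, 0.0, 0.30)
print(f"two haplotypes:   D' = {d_prime_two:.2f}, r2 = {r2_two:.3f}")
```

With these made-up frequencies D' comes out at 1 while r2 is only about 0.17; r2 reaches 1 only in the second call, where two haplotypes remain and the allele frequencies match.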

Ashish Kumar

I am getting a (D',R^2)= (-0.206,-0.018),(-1.000,-0.159),(-0.683,-0.165) in Linkage disequilibrium in three populations....what that means..?

C. Phillip Morris

Ashish, you ask "I am getting a (D',R^2)= (-0.206,-0.018),(-1.000,-0.159),(-0.683,-0.165) in Linkage disequilibrium in three populations....what that means..?"

In the same order the polymorphisms are 1: in low disequilibrium (coinherited about 20% of the time) with one SNP having a significantly lower allele frequency; 2: complete disequilibrium (coinherited 100% of the time) with one SNP having a significantly lower allele frequency and; 3: in modest disequilibrium (coinherited about 700% of the time) with one SNP having a slightly lower allele frequency.

C. Phillip Morris

Make that 70% not 700%

Ashish Kumar

Thank you, C. Phillip Morris. What do the negative and positive signs indicate?

Ashish Kumar

And how did you tell the allele frequency from r^2?

C. Phillip Morris

Hi Ashish,

The negative sign was in your original question but I did not notice it. D' and r^2 should be positive.

To understand where I get an idea of the allele frequency read my answer at the top of this question. Basically the r^2 is affected by allele frequency but the D' is not.

Let me know if that is not clear.

Phil.

Ashish Kumar

What does the negative sign mean?

Ashish Kumar

Does the negative sign mean that they are not in linkage disequilibrium?

C. Phillip Morris

Hi Ashish,

I don't know where your negative signs came from but they should not be there and they certainly do not mean there is no LD. D' and r^2 must always be positive. Take a look at the equations for D' and r^2 in the LD entry on Wikipedia (http://en.wikipedia.org/wiki/Linkage_disequilibrium). This should make it clear that they are always positive.

Phil.

Ashish Kumar

Can you tell me which software you prefer for LD calculation? I have used DnaSP.

C. Phillip Morris

Hi Ashish,

For a limited number of SNPs I prefer JLIN (see download at http://www.gohad.uwa.edu.au/software/jlin-download).

Phil.

David Andrew Eccles

With D' of 0.8 and r^2 of 0.14 I would say that the markers are in LD, but I wouldn't call it "strong" LD. You should specify both D' and r^2 values when reporting LD, because you get an incomplete picture using only one of the two values.

Juan P Steibel

Strong or weak is a totally arbitrary classification and, if in doubt, you are better off avoiding the labeling. Besides, strong or weak depends on the intended use of such LD. For statistical purposes, such as imputation, r2 is the measure to go by in my opinion. An r2 of 0.14 is not enough to use one SNP to impute the other, for example (as was mentioned before), but if you have a set of several SNPs (say 10 or 20) with r2 of 0.14 with each other, you could probably impute one of them with very high accuracy. Why? Because of the "weak LD" with each other, as a group they "almost add" their % of explanation of any single SNP. This is the basis of one method used to select tagSNPs for genomic prediction. We have a paper on this: http://www.biomedcentral.com/1471-2156/14/8 and we used r2 as low as 0.1 to select tagSNPs. It works because we use MANY SNPs simultaneously in our analysis.

On the other hand, if you want to select SNPs to do GWA one SNP at a time using the SAME algorithm, you need to set the minimum LD to a high value, perhaps 0.8 or more, to consider it useful (for statistical purposes).

Also, weak or strong, in my view, depends on distance. In some species (I work with pigs) we see r2=0.1 at several Mb distance. To me that is "not a weak LD" considering the distance.

I know I am not directly answering your question, but this is the way I think of LD: is it usable or not for a certain purpose?

Collet Dandara

What is the best method for calculating sample size for a case-only study looking at the effects of 5 SNPs on the observed plasma drug levels, with the SNPs having the following frequencies: 0.05, 0.10, 0.15, 0.20 and 0.25, to reach a minimum power of 80%? Does one assume a p-value significance threshold after correction (e.g. Bonferroni), or is the standard 0.05 appropriate?

Kanwal Naqvi

C. Phillip Morris Hi

I have genotyped two SNPs in the TNF alpha gene and tested the association of their frequencies with OA disease. I have also calculated their haplotypes manually. Now one of the reviewers has asked me to report linkage disequilibrium r2 values for these SNPs in my population. I have genotype data, haplotype frequencies and allele frequency data for these SNPs. Can I calculate r2 directly from these data? Is there an easy method?

What about this formula

$r = D / (p_1 p_2 q_1 q_2)^{1/2}$

Any other manual way? Any help would be highly appreciated.
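
For reference, the formula quoted above appears to be the standard two-locus correlation, written out here as a sketch. The notation is an assumption about what the poster intended: p1 and q1 are the two allele frequencies at the first SNP, p2 and q2 those at the second, and x11 (a symbol introduced here) is the frequency of the haplotype carrying the p1 and p2 alleles together.

```latex
D = x_{11} - p_1 p_2, \qquad
r = \frac{D}{\sqrt{p_1 q_1 p_2 q_2}}, \qquad
r^2 = \frac{D^2}{p_1 q_1 p_2 q_2}
```

Under these definitions, haplotype and allele frequencies are enough to obtain r2 by hand; unphased genotype data would first need the haplotype frequencies to be estimated.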

Samanta Zelasco

Hi, I am using DnaSP v6 for linkage disequilibrium analysis, but the output shows negative values for D' and R (R instead of R squared). What does this mean?

I know that D' and R are always positive values so I can not understand why these kind of results came out . Thank you for your replies.

David Andrew Eccles

Samanta Zelasco the sign of the statistic indicates whether it is a positive or negative correlation (i.e. both variants increasing in proportion vs one variant increasing in proportion and the other variant decreasing in proportion).

In many cases people are only interested in whether or not SNPs are correlated, which is why the absolute values (or squares) are used.
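
The effect of the sign can be illustrated with a small sketch. It assumes the usual signed correlation r = D / sqrt(pA qA pB qB); the function name `signed_r` and the haplotype frequencies are hypothetical and chosen only for illustration. Swapping which allele is labelled B at the second SNP flips the sign of r while leaving its square unchanged.

```python
import math


def signed_r(f_AB, f_Ab, f_aB, f_ab):
    """Signed correlation r between alleles A and B from haplotype frequencies."""
    p_A = f_AB + f_Ab
    p_B = f_AB + f_aB
    d = f_AB - p_A * p_B  # positive if A and B occur together more than expected
    return d / math.sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))


# Hypothetical haplotype frequencies (AB, Ab, aB, ab).
r_original = signed_r(0.40, 0.10, 0.20, 0.30)
# Same data with the labels B and b swapped at the second SNP.
r_relabelled = signed_r(0.10, 0.40, 0.30, 0.20)

print(f"r = {r_original:+.3f}, r^2 = {r_original ** 2:.3f}")
print(f"r = {r_relabelled:+.3f}, r^2 = {r_relabelled ** 2:.3f}")
```

The two calls describe the same population, so the sign carries information about which alleles travel together, not about the strength of LD.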

Samanta Zelasco

Thank you very much

Oluwayemi Bamikole

How about using Haploview instead to check the linkage disequilibrium under the confidence color scheme?
