Sorry for my late answer, but I just came across your question. To give you a quick and dirty answer, a D´ of 0.8 is high disequilibrium. Basically the two SNPs are coinherited roughly 80% of the time. The reason your r2 is low is that this takes account of allele frequency.
D´ and r2 values are widely used but poorly understood. The current “trend” seems to be to take more notice of the r2 value, whereas I feel that the D´ is more meaningful and easier to understand. The idea of disequilibrium values is that they are a measure of the non-random association of alleles at two or more loci, i.e. how often alleles are coinherited. If two loci are not coinherited at all (they are independent) then both the D´ and r2 values will be 0.0, irrespective of either allele frequency. As another example, if you had two polymorphisms, both with a 50% allele frequency and in total disequilibrium, then both the D´ and r2 values would be 1.0. However, the story changes when the allele frequencies are not the same. For example, if you had two polymorphisms, one with a 50% allele frequency and the other with a 1% allele frequency, that were still in total disequilibrium, then the D´ value would be 1.0 but the r2 value would only be about 0.01. Basically the D´ is saying that when the rare allele is present it is always inherited with one particular allele of the 50% polymorphism, whereas the r2 is saying it is a rare allele, so the vast majority of the time the common allele is not found with it (but only because it is rare, not because it is not in disequilibrium). However, even with such a low r2, this SNP adds nothing to an association study because they are in complete disequilibrium.
I don’t want to labour my answer, but another example may help, where the two SNPs are coinherited about half the time. With both SNPs having a 50% allele frequency, the D´ value would be about 0.5 and the r2 value would be about 0.25, but if one SNP had a 1% allele frequency then the D´ value would still be about 0.5 while the r2 value would only be about 0.0025.
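For anyone who wants to check these numbers, here is a minimal Python sketch. It assumes the usual textbook definitions (D´ = D/Dmax and r2 = D^2 / (pA qA pB qB)) and only handles positive D; the function names are my own, not from any package.

```python
def d_max_positive(p_a: float, p_b: float) -> float:
    """Largest positive D that the two allele frequencies allow."""
    return min(p_a * (1 - p_b), (1 - p_a) * p_b)


def r2_from_dprime(p_a: float, p_b: float, d_prime: float) -> float:
    """r2 implied by a given (positive) D' and the two allele frequencies."""
    d = d_prime * d_max_positive(p_a, p_b)
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Allele frequencies and D' values taken from the examples in the text.
for p_a, p_b, d_prime in [(0.5, 0.5, 1.0), (0.5, 0.01, 1.0),
                          (0.5, 0.5, 0.5), (0.5, 0.01, 0.5)]:
    print(p_a, p_b, d_prime, round(r2_from_dprime(p_a, p_b, d_prime), 4))
```

Under these definitions the four cases come out at roughly 1.0, 0.01, 0.25 and 0.0025, so r2 falls with the mismatch in allele frequency even though D´ is held fixed.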
So to me the D´ is more meaningful. I hope that helps.
Dear Kerry, for what purpose? If you use them in a gene marker mapping experiment to replace one with the other, you should go by r2, and an r2 of 0.14 is very small (so the markers are not good proxies of each other).
I agree with André. I wouldn't say that an r2 of 0.14 is high! r2 and D' have different properties, with D' suffering from a ceiling effect (it easily tends to 1). You can read a recent paper I published in Tree Genetics and Genomes: "Nucleotide diversity and linkage disequilibrium in Populus nigra cinnamyl alcohol dehydrogenase (CAD4) gene", in which I discuss the differences briefly.
You could also read previous excellent work on LD properties by Muller "Linkage disequilibrium for different scales and applications" or by Hedrick "Mutation and linkage disequilibrium in human mtDNA". Good luck!
D' = 1 means the strongest possible LD, so 0.8 is strong LD. But with only these two SNPs you can't construct haplotypes, because r2 is very low (0.14) and you will not capture all the haplotypes that exist. The optimal r2 is 1.
Agree with Fabio. By D' they are in LD, but because r2 is so low they are not good predictors of each other; therefore, for mapping or association purposes you may want to use them both.
I agree, r2 = 0.14 is low and suggests that these SNPs are not in high linkage disequilibrium. I have obtained data for neighboring SNPs with r2 values between 0.60 and 0.80; these are indicators of high LD.
To answer the original question: yes, they are in strong LD, but as earlier posters explained, they can't serve as proxies for each other. As mentioned, r2 and D' have different purposes. Typically, r2 is preferred when the focus is on the predictability of one polymorphism given the other (and hence it is often used in power studies for association designs). D', instead, is the measure of choice to assess recombination patterns (haplotype blocks have often been defined on the basis of D'). So choose whichever applies to you. Good luck...
Thanks everyone for your helpful replies. I've been trying to determine the phase between these two SNPs (as opposed to seeing if one could tag the other), and was looking for an explanation of how r2 can be low while D' is fairly high (thanks to Margit and Abhimanyu for answering this particular aspect). I've been trying to work out haplotype frequencies for these 2 SNPs using HapMap genotype data but the two SNPs are in non-adjacent blocks (1 and 5), so I'm wondering now if this is possible or a bit too complicated to estimate. Any advice on this further question would be much appreciated.
Can someone explain this? ResearchGate sent me several "XYZ voted up your answer" notifications, and I have seen responses to it here, but now the answer is gone. I am not narcissistic, but it seems strange. Can someone delete another person's answer?
I think that no one should be able to delete your answers unless they know your credentials. I also read your answer before it disappeared and I see no reason why administrators should remove it. If I were you I would change my password, just in case...
I believe all the answers so far referred to a homogeneous population. As far as I know, nobody has carefully studied what happens to the two measures when your population is a mixture of two populations, but that might give another clue to what is going on in your particular situation. As Sir Ronald Fisher once wrote: "It is a statistical commonplace that the interpretation of a body of data requires a knowledge of how it was obtained" (Fisher (1934) The effect of methods of ascertainment upon the estimation of frequencies. Annals of Eugenics 6, 13-25).
Here I am reposting what I said earlier and some commented on before, and a little longer. A D' of 0.8 is fairly high LD, but because of the r2 of 0.14, the SNPs can't substitute each other. This is most likely due to one SNP being much rarer than the other. For 2 SNPs, there are 4 possible haplotypes (call them AB, Ab, aB, ab). If one of the haplotypes is completely missing, for example because a new mutation, b, arose on the background of a, then only 3 haplotypes exist, AB, aB, and ab, the newest and rarer haplotype. In that case, D' will be 1, and r2 small. In this case, b completely predicts a, but a does not completely predict b (knowing b, you can conclude a, but knowing a, it can be either B or b). For population genetics, D' can be very important, but for tagging or substituting one SNP for another r2 is the important measure. r2 will be 1 only if there are only 2 of the 4 possible haplotypes, and the allele frequencies of the two SNPs are the same.
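To make the three-haplotype case concrete, here is a small Python sketch that computes D, D' and r2 from the four haplotype frequencies. The frequencies are made up for illustration, and the function name is my own; it assumes the standard definitions (D = f(AB) - pA pB, D' = D/Dmax, r2 = D^2 / (pA qA pB qB)).

```python
def ld_from_haplotypes(f_AB: float, f_Ab: float, f_aB: float, f_ab: float):
    """Return (D, D', r2) for two biallelic SNPs from haplotype frequencies."""
    total = f_AB + f_Ab + f_aB + f_ab
    f_AB, f_Ab, f_aB, f_ab = (f / total for f in (f_AB, f_Ab, f_aB, f_ab))
    p_A = f_AB + f_Ab                      # frequency of allele A
    p_B = f_AB + f_aB                      # frequency of allele B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one SNP is monomorphic; LD is undefined")
    d = f_AB - p_A * p_B
    # Dmax depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return d, d / d_max, d * d / denom


# Hypothetical case: b arose on an a background, so haplotype Ab is missing.
print(ld_from_haplotypes(0.50, 0.00, 0.45, 0.05))
# Hypothetical case: only two haplotypes exist and allele frequencies match.
print(ld_from_haplotypes(0.30, 0.00, 0.00, 0.70))
```

With these made-up numbers the first case gives D' of 1 with r2 of about 0.05, and the second gives D' and r2 both equal to 1, which matches the argument above.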
Ashish, you ask "I am getting a (D',R^2)= (-0.206,-0.018),(-1.000,-0.159),(-0.683,-0.165) in Linkage disequilibrium in three populations....what that means..?"
In the same order, the polymorphisms are: 1) in low disequilibrium (coinherited about 20% of the time), with one SNP having a significantly lower allele frequency; 2) in complete disequilibrium (coinherited 100% of the time), with one SNP having a significantly lower allele frequency; and 3) in modest disequilibrium (coinherited about 70% of the time), with one SNP having a slightly lower allele frequency.
The negative sign was in your original question but I did not notice it. D' and r^2 should be positive.
To understand where I get an idea of the allele frequency read my answer at the top of this question. Basically the r^2 is affected by allele frequency but the D' is not.
I don't know where your negative signs came from but they should not be there and they certainly do not mean there is no LD. D' and r^2 must always be positive. Take a look at the equations for D' and r^2 in the LD entry on Wikipedia (http://en.wikipedia.org/wiki/Linkage_disequilibrium). This should make it clear that they are always positive.
With D' of 0.8 and r^2 of 0.14 I would say that the markers are in LD, but I wouldn't call it "strong" LD. You should specify both D' and r^2 values when reporting LD, because you get an incomplete picture using only one of the two values.
Strong or weak is a totally arbitrary classification and, if in doubt, you are better off avoiding the labeling. Besides, strong or weak depends on the intended use of such LD. For statistical purposes, such as imputation, r2 is the measure to go by in my opinion. An r2 of 0.14 is not enough to use one SNP to impute the other, for example (as was mentioned before), but if you have a set of several SNPs (say 10 or 20) with r2 of 0.14 with each other, you could probably impute one of them with very high accuracy. Why? Because of the "weak LD" with each other, as a group they "almost add" their % of explanation of any single SNP. This is the basis of one method used to select tagSNPs for genomic prediction. We have a paper on this: http://www.biomedcentral.com/1471-2156/14/8 and we used r2 as low as 0.1 to select tagSNPs. It works because we use MANY SNPs simultaneously in our analysis.
On the other hand, if you want to select SNPs to do GWA one SNP at a time using the SAME algorithm, you need to set the minimum LD to a high value, perhaps 0.8 or more, to consider it useful (for statistical purposes).
Also, weak or strong, in my view, depends on distance. In some species (I work with pigs) we see r2 = 0.1 at several Mb distance. To me that is "not a weak LD" considering the distance.
I know I am not directly answering your question, but this is the way I think of LD: is it usable or not for a certain purpose?
What is the best method for calculating sample size for a case-only study looking at the effects of 5 SNPs on observed plasma drug levels, with the SNPs having the following frequencies: 0.05, 0.10, 0.15, 0.20 and 0.25? To reach a minimum power of 80%, does one assume a p-value significance threshold after correction (e.g. Bonferroni), or is the standard 0.05 appropriate?
I have worked on genotyping two SNPs in the TNF alpha gene and associating their frequency with OA disease. I have also calculated their haplotypes manually. Now one of the reviewers has asked me to report the linkage disequilibrium r2 values for these SNPs in my population. I have genotypic data, haplotype frequencies and allele frequency data for these SNPs. Can I calculate r2 values from this data directly? Is there an easy method?
What about this formula:
r = D / (p1 p2 q1 q2)^(1/2)
Any other manual way? Any help would be highly appreciated.
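A worked example by hand may help here. This is only a sketch with made-up numbers: it assumes p1 and q1 = 1 - p1 are the allele frequencies at the first SNP, p2 and q2 = 1 - p2 those at the second, and P(AB) is the frequency of the haplotype carrying the p1 and p2 alleles together (all values hypothetical).

```latex
% Hypothetical inputs: P(AB) = 0.40, p_1 = 0.50, p_2 = 0.60
\begin{align*}
D   &= P(AB) - p_1 p_2 = 0.40 - (0.50)(0.60) = 0.10 \\
r   &= \frac{D}{\sqrt{p_1 q_1 p_2 q_2}}
     = \frac{0.10}{\sqrt{(0.50)(0.50)(0.60)(0.40)}}
     = \frac{0.10}{\sqrt{0.06}} \approx 0.41 \\
r^2 &= \frac{D^2}{p_1 q_1 p_2 q_2} = \frac{0.01}{0.06} \approx 0.17
\end{align*}
```

So with haplotype and allele frequencies in hand, r2 needs only one subtraction, one product and one division.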
Hi, I am using DnaSP v6 for linkage disequilibrium analysis, but the output shows negative values for D' and R (R instead of R squared). What does this mean?
I thought that D' and R were always positive values, so I cannot understand why these kinds of results came out. Thank you for your replies.
Samanta Zelasco, the sign of the statistic indicates whether it is a positive or negative correlation (i.e. both variants increasing in proportion vs. one variant increasing in proportion and the other variant decreasing in proportion).
In many cases people are only interested in whether or not SNPs are correlated, which is why the absolute values (or squares) are used.
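A quick way to see this is to compute r before squaring it. The Python sketch below uses made-up haplotype frequencies and a function name of my own, and assumes the usual definition r = D / sqrt(pA qA pB qB). Swapping which allele at the second SNP is called "B" flips the sign of r but leaves r2 unchanged.

```python
import math


def signed_r(f_AB: float, f_Ab: float, f_aB: float, f_ab: float) -> float:
    """Signed correlation r between alleles A and B from haplotype frequencies."""
    p_A = f_AB + f_Ab
    p_B = f_AB + f_aB
    d = f_AB - p_A * p_B
    return d / math.sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))


# Hypothetical haplotype frequencies (AB, Ab, aB, ab); they sum to 1.
r_original = signed_r(0.40, 0.10, 0.20, 0.30)
# Same data, but the two alleles at the second SNP have swapped labels.
r_relabelled = signed_r(0.10, 0.40, 0.30, 0.20)

print(r_original, r_relabelled)          # equal size, opposite sign
print(r_original ** 2, r_relabelled ** 2)  # identical
```

So a negative value from a program that reports signed D' or r is not an error in itself; it depends on which allele at each SNP the program treats as the reference.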