About 10 years ago I published a paper in Aging Cell suggesting that the field of comparative studies of ageing had become confused. Large numbers of papers had compared species-specific values of some physiological trait to species differences in longevity without taking into account the shared variation due to body mass differences between the species (and, incidentally, their phylogenetic relationships, although that is a separate issue).
The paper is
Speakman, J.R. (2005) Correlations between physiology and lifespan – two widely ignored problems with comparative studies. Aging Cell 4: 167-175
The full text is available on ResearchGate, or on my website at
www.abdn.ac.uk/energetics-research/publications
where the pdf can be downloaded (paper number 223).
Gustavo Barja has just published a short paper (letter to editor) suggesting that in fact this approach is flawed, and we should not remove this covariance due to body mass.
His paper is
Barja, G. (2014) Correlations with longevity and body size: to correct or not correct? Journals of Gerontology A Biol Sci Med Sci doi:10.1093/gerona/glu020
Having read his letter I remain convinced that removing body size is the right way to do the analysis, and that any real association of physiology with lifespan, one that was not an artefact of size effects, would be detected in the residuals. However, what does everyone else think? Should we remove body size effects or not?
If we are only interested in a descriptive, correlative inference, then it is easy to make the argument that there is no need for body mass adjustment. If we are really implying causality, then it is not so obvious. Is body mass a confounder of the relation between physiology and longevity? It would appear so since rate of physiological processes would seem to scale with body mass, and it is well known that longevity also varies proportional to body mass. Is that the most important confounder? I don't know. But, if it is, then it would make sense to account for an obvious and important confounder. Note that the scaling of physiological processes with body mass could be highly nonlinear, so the typical covariance removal approaches might only remove a part of, but not all of, the confounding.
When we say that the relationship between physiology and longevity is causal, what we are really implying is that if we alter the physiology of a given organism, then its lifespan would change. To establish causality, body mass adjustment alone would not suffice. We need an understanding of the mechanisms, as well as a knowledge of other potential confounders; or alternatively, conduct a good experiment where physiology can be altered and lifespan be observed.
Please allow me to add another comment. Body mass in this context seems analogous to chronological age in aging studies. No one believes that age is causally linked to health outcomes, and yet the need for age adjustment to eliminate the most important source of confounding is virtually undisputed. How can one make sense of this? Age "represents" a host of known and unknown factors that are potentially causally linked to the exposure and the outcome of interest, hence it is accounting for confounding by "proxy", if you wish. It seems to me that body mass might play a similar role in the context under discussion.
To me, the field of comparative physiology undoubtedly benefits from large datasets and from phylogenetic regression, and it is obvious that this approach is the more appropriate one. Just as necessary is accounting for body weight, "the most pervasive trait in physiology" as John put it in his paper back then. How could we now just decide not to include it in equations where a lot of variance is clearly explained by body weight? Is it not mandatory for us scientists to come up with valid models that account for all effects that are known to date? At least for me, this approach is the only acceptable one, so the quick and clear answer is yes, we should remove body size effects. Teresa
P.S. To increase my citation rate and justify my emotional comment: following up on John's paper in ACE in 2005, we compiled data on membrane fatty acid composition and lifespan and re-analysed the relationships between membrane unsaturation, lifespan and metabolic rate, accounting for both body mass and phylogeny just as was suggested (Aging Cell 6: 15-25). It turned out that many, but not all, of the correlations fell apart. Still, one relationship between membrane fatty acid composition and lifespan remained robust (n-3:n-6 ratio and MLSP), and we concluded that there might be a causal relationship between the two traits, though a different one from that previously suggested.
Ravi/Theresa
Thanks for your comprehensive answers. I agree.
Playing devil’s advocate for a moment however. Barja’s argument is basically this. The relationship between longevity and body size must come about by some physiological mechanism. It doesn’t just happen. And that physiological mechanism must also change with body size or else it wouldn’t generate the longevity effect. Hence to detect that impact we need to analyse without removing body size effects. That is because removing such effects removes the very things we are trying to understand. There is a sort of logic to this argument.
My view is that it is true that the relationship of body size to longevity must be caused by some aspect of physiology that also changes with body size. However, with an uncorrected analysis we can’t distinguish things that are important from things that are related to body size for completely different reasons, and hence end up related to longevity despite having no causal association. The examples in my original paper are eyeball diameter and leg length, both of which end up related to longevity because they are related to body size. I think that if something is causally related then it will also have a relationship when size effects are removed. For example, low metabolic rate per gram of tissue is associated with greater body size and hence greater longevity. If this was causal we would also anticipate that at a fixed body size those with higher metabolic rates should live shorter lives than those with low metabolic rates. In fact, when using basal metabolic rate as the metric, they don’t (Speakman 2005 Body size, energy metabolism and lifespan. Journal of Experimental Biology 208: 1717-1730). I wonder if there is any situation, however, where Barja might be correct? Is there a condition where a link may be present in the uncorrected data that isn’t also expected in the corrected data if the link is causal? I can imagine, for example, that in the corrected data detecting a link may be more difficult because the range of values is necessarily smaller, so a larger sample may be necessary to detect the smaller effect size, but if you measured enough you would find it. Does that logic make sense, or are there situations where an effect could be in the uncorrected data and not found in the corrected data, no matter what the sample size or power issues?
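The leg-length point can be illustrated with a minimal simulation sketch in Python. Everything in it is invented for illustration (the slopes 0.33 and 0.20, the noise levels, the number of species); it is not a reanalysis of any real dataset. A hypothetical trait and lifespan both depend on body mass but not on each other, and the sketch compares the raw correlation with the correlation between the mass-corrected residuals.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of y after an ordinary least-squares regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


rng = random.Random(1)          # fixed seed so the sketch is repeatable
n_species = 2000                # hypothetical number of species

# Hypothetical log10 body mass, spanning six orders of magnitude.
log_mass = [rng.uniform(0.0, 6.0) for _ in range(n_species)]
# Hypothetical "leg length": scales with mass, no causal link to lifespan.
log_leg = [0.33 * m + rng.gauss(0.0, 0.3) for m in log_mass]
# Hypothetical lifespan: depends on mass (plus noise), not on leg length.
log_life = [0.20 * m + rng.gauss(0.0, 0.3) for m in log_mass]

r_raw = pearson(log_leg, log_life)
r_corrected = pearson(residuals(log_leg, log_mass),
                      residuals(log_life, log_mass))
print(f"uncorrected r = {r_raw:.2f}, mass-corrected r = {r_corrected:.2f}")
```

Under these assumptions the uncorrected correlation is substantial while the mass-corrected one sits close to zero, which is the pattern expected for a trait that is related to longevity only through size.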
I fully agree with all that has been stated above, and particularly with the point that it is about the size of the respective data set. I'd like to quickly provide three available examples where the basic relationships remained even after correcting for both body weight and phylogeny.
1) again the inverse relationship between the n-3:n-6 polyunsaturated fatty acid ratio and maximum lifespan (MLSP), see ACE 6, 15-25.
2) the relationship between (low) rates of H2O2 production in isolated mitochondria of long-lived 'vertebrate homeotherms' (Lambert et al., ACE 6, 607-618)
3) the inverse relationship between longevity and methionine (Aledo et al. 2011, ACE 10, 198-207)
Also, I'd like to point out that even researchers who follow the logic of the "devil's" argument presented above have attempted to include body size allometries in their studies by comparing size-matched creatures, e.g. the pigeon-rat comparison or, similarly, the naked mole rat vs. mouse comparison, with somewhat interesting results.
Is the consensus therefore still to apply both corrections, but to large datasets comprising many different species?
It seems to me the best solution, when one cannot really determine cause-and-effect relationships, is to present both results with correction for body size and results without correction.
I really just have 2 things to add that might be worth considering. Both relate to the idea that there are many different ways to alter size, thus we shouldn't expect the effects of size on either physiological traits or lifespan to be simple or consistent for all comparisons. First, although cross-species comparisons usually find a positive correlation between size and lifespan, there appears to be a negative correlation within species. Aside from domestic dog breeds, this is actually not well established, but I find the Austad chapter (Animal size, metabolic rate, and survival, among and within species) in the 2009 book Comparative Biology of Aging to be a good summary of what is known. Second, despite a long history of trying to tie metabolic rate to a single scaling factor (usually 2/3 or 3/4), the empirical data have never really backed this up. Kozlowski et al. (2003. PNAS 100(24):14080-14085) suggested metabolic scaling depended upon whether size was related to just changes in cell number, cell size, or a mixture. Some empirical data seem to back up this idea (Chown et al. 2007. Functional Ecology 21(2):282-290, Maciak et al. 2011. Functional Ecology 25(5):1072-1078, and Maciak et al. 2014. J. Evol. Biol. 27(3):478-487).
I guess what this boils down to in my mind is that the role of size will depend upon just what comparison is being made. It might be causative, a by-product, or a spurious correlation. I would also note that, until recently, I would have thought the role of cell size was less important for mammals than other groups (esp. invertebrates). However, the last paper cited above and a recent review in Science (vol 314 pp. 725-727) indicate that variation in cell size among mammals is probably quite important.
I agree with Dr. Gustavo Barja that body mass is an important variable which influences body energy expenditure. The latter is strongly associated with free-radical production and longevity. Before 70 years of age, having a lower body mass is associated with an increased life span. Notwithstanding, after that age a lower body mass has been associated with increased mortality risk.
Carlos. Thanks for your comment. I agree body size (mass) has a major effect on energy expenditure. The question is whether the relationship between body size and longevity comes about because of the effect of size on metabolism and free-radical production, as you suggest, and as is generally inferred from the simple analyses advocated by Barja, in which body size effects are not removed.
If it does then you would expect that at a fixed body mass those individuals with high metabolism would live shorter lives, and vice versa those with low metabolic rates for their size would live longer. In fact if you do this analysis there is no significant association between metabolism and longevity in either mammals or birds (see Speakman 2005 Body size, energy metabolism and lifespan. Journal of Experimental Biology 208: 1717-1730). In addition, if energy expenditure is experimentally elevated there is also no link to free-radical production/damage or longevity (Selman, C., McLaren, J.S., Collins, A.R., Duthie, G.G. and Speakman, J.R. (2008) The impact of experimentally elevated energy expenditure on oxidative stress and lifespan in the short-tailed field vole Microtus agrestis. Proceedings of the Royal Society of London B 275: 1907-16). Finally, as noted above by Jeff Arendts, within species it is the individuals that are smaller, and have higher metabolic rate, that live longer (e.g. in dogs: Speakman, J.R., van Acker, A., and Harper, E.J. (2003) Age related changes in the metabolism and body composition of three dog breeds and their relationship to life expectancy. Aging Cell 2: 265-279. In mice: Speakman, J.R., Talbot, D.A., Selman, C., Snart, S., McLaren, J.S., Redman, P., Krol, E., Jackson, D.M., Johnson, M.S. & Brand, M.D. (2004) Uncoupled and surviving: individual mice with high metabolism have greater mitochondrial uncoupling and live longer. Aging Cell 3: 87-95).
My feeling is that this example perfectly exemplifies the confusion that can be generated by performing the analysis without removing the effects of body size.
John
John,
I entirely agree that we must control for body mass (and phylogeny) in order to see whether residual variation in some trait, such as metabolic rate, is associated with longevity.
However, even if we need to statistically eliminate the confounding effect of body mass, we still need to explain the correlation between body mass and longevity. Which factor X that affects longevity is correlated with body mass? There are, of course, ideas out there to explain this. For instance, decreasing body mass is associated with increasing mortality of young adults, which is thought to reflect extrinsic mortality, namely predation (review in Ricklefs (2008) Functional Ecology 22, 379–392). This is important in the context of the disposable soma theory of aging (Kirkwood TB (1977) Nature 270:301–304), which predicts that investment into somatic maintenance (e.g. telomere lengthening, antioxidant defences), i.e., delaying aging, pays only if the risk of death by extrinsic causes (e.g. predators) is low. This theory explains a number of observations. For example, it has been argued that birds have lower initial mortality (and higher longevity) because they are more readily able to escape predators via flight than similar-sized non-flying mammals (e.g., Ricklefs 2008). Further, it was shown (Turbill et al. (2011) Proc Biol Sci 278(1723):3355–3363) that hibernators live ~50% longer than non-hibernators of the same size. This is also thought to be due to predator avoidance by retreat into underground hibernacula or caves, which leads to extremely low mortality during the hibernation season (Turbill et al. 2011, Lebl et al. (2011) Ecography 34(4):683–692). By combining two avenues of predator avoidance, hibernation and flight, bats achieve even lower extrinsic mortality and extremely high longevity (Austad & Fischer (1991) Journal of Gerontology 46(2), B47-B53; Turbill et al. 2011). I am mentioning these observations and theories because I had the impression that the above discussion was largely dominated by purely physiological arguments.
Although I am a physiologist by training, I am convinced that we need to consider ecological factors that shape life histories, and that variation in longevity will never be explained by physiological constraints alone.
Many thanks for your perceptive input Thomas.
I guess the difference may also be characterised as looking for ultimate versus proximate causes. For example, the low mortality of the bats may 'explain' their longevity at an ultimate evolutionary level but tells us nothing about the proximate physiological reasons that mediate this effect. Plus, I absolutely agree the body size effect on longevity must have a cause, both ultimate and proximate, but distinguishing these effects from spurious things that are also related to body size will be very tricky.
John,
You asked a question: "I wonder if there is any situation, however, where Barja might be correct? Is there a condition where a link may be present in the uncorrected data that isn’t also expected in the corrected data if the link is causal?"
Let us use P for the putatively causal physiological process of interest, M for body mass, and L for longevity. You are asking whether it is possible for P to be a cause of L and yet, when M is adjusted for, for P to no longer appear causal.
Yes, this is likely. If M is a mediator of the effect of P on L, then adjusting for M can change the effect of P on L. In other words, if some of the effect of P on L occurs by affecting M, then adjusting for M can dampen or heighten the unconditional effect of P on L. Statistically speaking: Prob(L | P, M) need not equal Prob(L | P), when M mediates the action of P on L. This is sometimes referred to as "overadjustment." In your case, however, I don't think it is sensible to view body mass, M, as a mediator.
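The mediation scenario can be sketched with a small Python simulation. It is only an illustration under invented assumptions: the causal chain P -> M -> L is imposed by construction, and the coefficients (0.8, 0.7) and noise levels are hypothetical. Adjustment is done here by correlating the residuals of P and L after each is regressed on M.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of y after an ordinary least-squares regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


rng = random.Random(2)   # fixed seed so the sketch is repeatable
n = 2000                 # hypothetical sample size

# Assumed causal chain: P affects M, and M affects L.
# P has no direct path to L; its whole effect runs through M.
P = [rng.gauss(0.0, 1.0) for _ in range(n)]
M = [0.8 * p + rng.gauss(0.0, 0.6) for p in P]
L = [0.7 * m + rng.gauss(0.0, 0.7) for m in M]

r_unadjusted = pearson(P, L)
r_adjusted = pearson(residuals(P, M), residuals(L, M))
print(f"Corr(P, L) = {r_unadjusted:.2f}; "
      f"after adjusting for M = {r_adjusted:.2f}")
```

In this constructed case P really does cause L, yet the association disappears once M is adjusted for, because M carries the entire effect. Whether body mass can sensibly be treated as such a mediator is a separate, biological question.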
I find that it is often useful to draw a causal cartoon (also known as a DAG, or directed acyclic graph) that reflects the causal relationships in the problem. Then, the rules of DAGs can help you figure out the impact of adjustment or "conditioning" (as statisticians would call it). Epidemiologists have thought deeply about causality. You may want to read this paper in Amer J Epidemiology:
http://aje.oxfordjournals.org/content/155/2/176.full.pdf
In particular, take a look at DAG in Figure 5. I think this represents your situation more closely.
In my opinion, it is easy to separate the impact of mass on physiology and longevity to see whether physiology has an effect on longevity. This must be done because there is every reason to believe that longevity is associated with mass, and therefore a relationship between longevity and physiology may exist simply because both correlate with mass, which does not imply a functional relationship between these factors. Each of these two characteristics must be made independent of mass. This can be done by determining the mathematical relationship of each to body mass. Then, assuming that one is trying to determine the potential relationship between longevity and energy expenditure in a series of species, each species must have its measured metabolic rate divided by the one calculated from the scaling relationship, and a similar calculation should be done for longevity. Now these factors are mass independent. Then the question is: does the mass-independent rate of metabolism correlate with the mass-independent longevity? The most convenient metabolic rate would be the basal rate of metabolism, which has been measured in more species of mammals than any other available rate of metabolism.
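The procedure can be sketched in a few lines of Python. This is a minimal illustration, not the analysis from any published paper: the data are simulated, the scaling constants (3.5, 0.72, 5.0, 0.20) are hypothetical, and the helper names `fit_power_law` and `relative_to_expected` are made up for this sketch. The power function y = a * mass^b is fitted by least squares on log10-transformed values, and each observed value is divided by the value expected for its mass.

```python
import math
import random


def fit_power_law(mass, y):
    """Fit y = a * mass**b by least squares on log10 data; return (a, b)."""
    lx = [math.log10(m) for m in mass]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = 10 ** (my - b * mx)
    return a, b


def relative_to_expected(mass, y):
    """Observed value divided by the value the scaling fit predicts."""
    a, b = fit_power_law(mass, y)
    return [v / (a * m ** b) for m, v in zip(mass, y)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)   # fixed seed so the sketch is repeatable
# Simulated species: mass in grams, BMR and lifespan in arbitrary units.
mass = [10 ** rng.uniform(1.0, 6.0) for _ in range(1000)]
bmr = [3.5 * m ** 0.72 * 10 ** rng.gauss(0.0, 0.10) for m in mass]
life = [5.0 * m ** 0.20 * 10 ** rng.gauss(0.0, 0.15) for m in mass]

a_bmr, b_bmr = fit_power_law(mass, bmr)
rel_bmr = [math.log10(v) for v in relative_to_expected(mass, bmr)]
rel_life = [math.log10(v) for v in relative_to_expected(mass, life)]
log_mass = [math.log10(m) for m in mass]

print(f"fitted BMR exponent: {b_bmr:.3f}")
print(f"mass-independent BMR vs mass: r = {pearson(rel_bmr, log_mass):.3f}")
print(f"mass-independent BMR vs lifespan: r = {pearson(rel_bmr, rel_life):.3f}")
```

On the log scale the ratio of observed to expected is simply the regression residual, so it is uncorrelated with log mass by construction. In the simulated data BMR and lifespan were generated independently given mass, so any correlation between the two mass-independent measures is sampling noise; with real data that correlation is the quantity of interest.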
Thanks for your input Brian. That is exactly what was done in my 2005 paper in JEB and the answer is no relationship in either birds or mammals once the mass effects are removed if we use BMR as the metric.
In reply to Ravi: mediation is possible. Especially if the effect of P on L is fully mediated via M, one could (assuming equal measurement error) compare the correlations among them; if the correlations are equal, mediation cannot be excluded. Please correct me if I am wrong, Ravi. Whether such physiology would be interesting, I do not know; it will likely be little different from the physiology regulating interspecific differences in body mass (M), which are correlated with unknown 'causal' processes affecting lifespan.
One important point, however, away from statistics, is that body mass is not size (and vice versa). This means that if one corrects for body mass, the residual variance that is left might be due to differences in the size/body mass ratio. This might very well be important in shaping life histories and lifespan, especially because it changes heat dissipation/retention and the associated metabolic processes. Moreover, starvation/predation trade-offs will be different, and these likely shape the trade-off of somatic investment decreasing intrinsic mortality rate (in response to extrinsic mortality pressure). Therefore, as with any analysis, correcting or not correcting for unknown or known variables can change the outcome. However, when one does NOT correct for body mass, all axes converge on the size or body mass continuum, and neither can be distinguished.
Hence it might seem worthwhile reporting the relationship without body mass correction and with body mass correction, usually the two will differ and one can find many spurious correlations with lifespan that can be misinterpreted to be causal. Like for example size.. ;). Therefore I still think correcting for body mass is the sensible (conservative) thing to do.
It remains intriguing, however, that the body mass-lifespan correlation is so strong. Would we really expect such strong covariation with, for example, extrinsic mortality rates, or is there something about the physiology of size that we do not know? For example, giraffes might have evolved body size to reach leaves high up in trees, whereas the size of a deer is based on reproductive and male competition. These two will likely have different physiological outcomes or trade-offs with lifespan. Therefore a more causal relationship between processes regulating (at least interspecifically) body mass and lifespan is plausible. This will be very worthwhile to investigate, but I doubt the comparative route is the way to go there, except if one could distinguish such selective pressures on body mass.
Sorry, the post became longer than I anticipated.
Thanks for your comments Mirre. I confess I am among the guilty who have (mis)used size and mass as synonymous, when of course they are not. As regards the last point: the r2 values for the relationships between mass and longevity are 0.39 for mammals and 0.458 for birds. This compares to the equivalent values for the mass v BMR relationships, which are 0.915 and 0.958 for mammals and birds respectively. So in fact the relationships are NOT very strong at all.
John - Yes, you are right, 'strong' is of course subjective. But the fact that the correlation of mass v BMR is stronger does not tell you that the relationship between mass and longevity is not 'strong' compared to other associations with lifespan.
So, for example, if the relationship between BMR and lifespan is stronger than the relationship between mass and lifespan, one might conclude that BMR tells you something more than mass about lifespan.
Note that when you take the residuals of mass (or include mass in the model) you look at a differential axis in the data derived for mass. My point is that this residual variation need not be the same as the variation on the main axis of BMR vs lifespan (for example). It would be interesting to just do a comparison of effect sizes without correcting for mass, and when correcting for mass (or taking the partial correlations). This might give a clue. One will need to ignore measurement error of the variables of interest but with most core "life-history" measures this should be OK. There are also different scalings of variance that might help here and/or partial correlations in path analysis.
I must admit I have not read the Barja paper, as I should. Some of the arguments I give here might be in there as well (or not).
Again, I still think correcting for mass makes sense in comparative analyses, but there are potential pitfalls and there is room for additional approaches. But we should agree that simply reporting the association between BMR and lifespan without mentioning mass at all does not help us at all (I would say..).
I just did some rank correlations (to disregard any differences in distribution causing the effects, not including phylogeny) in a mammal dataset.
The relationships with body mass are stronger than the relationships with litter size, not looking at significance (all variables uncorrected for mass). Quite interesting, but of course not conclusive or anything. Might be a nice project to really get to the bottom of this.
All values are rank correlation coefficients (R):
Max longevity - litter size -0.39
Max longevity - litters per year -0.57
Max longevity - Birth weight 0.69
Max longevity - Weaning weight 0.81
Max longevity - Adult weight 0.57
Max longevity - Growth rate -0.18
Max longevity - Metabolic rate 0.47
The issue of correcting for body mass may need to be considered more generally. Often in scaling studies body mass is functionally being used as a surrogate for metabolic rate. At least for mammals using mass as a surrogate is confounding analyses as metabolic rate does not scale linearly with body mass (e.g. most recently Kolokotrones et al, 2010). Therefore, residuals calculated based on linear analyses of nonlinear data reflect variation due to both an inappropriate statistical model and biological variation. I bring this issue up as recently my own analyses trying to directly compare birds, mammals and reptiles for a number of different variables using metabolic rate data showed pattern differences from earlier analyses where mass was used as the independent variable.
The standard for estimating the mass-independent effect of metabolic rate must be a power function, which turns out to be linear only after logging the data. Furthermore, mass only accounts for some of the variation in quantitative characters: to get a more complete analysis of the factors 'determining', e.g., basal rate, one must bring in the various behavioral and ecological factors that distinguish one species from another. (See McNab, 2003, CBP, 135A: 357-368)
If one had a physiological characteristic X that completely determined both lifespan and body size/mass and one used size/mass as a covariate wouldn't the residuals be expected to be random? If there was some pattern to the residuals wouldn't that suggest that another characteristic had an effect on both lifespan and characteristic X but not on size/mass in at least some species?
It depends on what you mean by the residuals being random. If X 'completely determined' lifespan and body size, i.e. with correlations equal to 1.0, then the residuals would all be equal to 0 and there would be no space for another variable to explain any residual variance.
If, as in practice, our measurements, etc., have 'error' then there would be residuals; those residuals are unlikely to be related to X.
That's true - but still if the residuals only reflected measurement error there would also still be no variance available to be explained by anything else.
Yes, I agree, and in this unnatural example there should not be. In this example, all the variance would have been 'explained' by the covariate. The 'true' relationship between X and lifespan would not be seen. In practice, in cases where the physiological characteristic affects both lifespan and body size/mass, and body size/mass is used as the covariate, isn't one relying on the relationship between the physiological variable and lifespan being much stronger than that between the physiological variable and body size/mass in order to find it? Wouldn't one risk missing all those cases where the physiological characteristic was correlated with both lifespan and body size/mass, more closely with lifespan, but not necessarily much more closely?
What would happen (happens?) if instead of using body size/mass as a covariate one looks at partial correlation coefficients between lifespan, the physiological variables and body size/mass?
I think you are correct that this is an implicit assumption in the underlying model. But since longevity cannot in any sense be directly 'caused' by body size, but must be mediated via a physiological variable, this seems a reasonable assumption. If a physiological variable x contributes causally to lifespan variation but also is correlated with body size (or even contributes causally to body size variations) the assumption is that removing the covariance due to body size will expose the true relationship of x to lifespan. I take the point regarding the above scenarios but I think the scenarios you raise where this wouldn't happen seem unrealistic biologically - ie they hinge on a strong causal link of lifespan to body size. You are correct that partial correlations achieve the same thing as including size as a covariate.
Although partial correlations have a similar goal as using a covariate I don't think that they necessarily produce the same mathematical results. I'm not a mathematician so when faced with statistical analyses that intuitively cause me doubts I create examples, run analyses and modify the examples, run analyses, etc.
In this case, in the examples I have analyzed so far, when I regress body size against lifespan and then correlate the residuals against the physiological characteristic the strength of that correlation seems to depend on the size of the difference between the correlation of lifespan with body size (r1) and that of lifespan with the physiological characteristic (r2). The stronger/'larger' r2 is in comparison to r1 the stronger the correlation between the residuals and the physiological characteristic. However, the partial correlation between lifespan and the physiological variable (body size 'constant') is larger than the equivalent correlation between the physiological variable and the residuals. The partial correlation can be significantly larger.
I assume that using body size as a covariate removes (accounts for) as much of the variance as possible through body size leaving a minimum for the physiological variable but that partial correlation (or multiple regression) shares the variance between body size and the physiological variable more equitably, placing less importance on the relationship between body size and lifespan than does the covariate analysis.
My misgivings may be touched upon in these papers ("Bias due to 2-stage residual-outcome regression analysis in genetic association studies" Genet Epidemiol. 35(7): 592–596, 2011. "Risk Factors, Confounding, and the Illusion of Statistical Control" Psychosomatic Medicine 66:868–875, 2004)
It appears that the partial correlation of a physiological characteristic with lifespan (statistically controlling for body size/mass) would measure the extent to which the residual of lifespan and the residual of the physiological characteristic are correlated with each other. A semipartial correlation (confusingly sometimes called the part correlation) measures the extent to which the residual of lifespan (statistically controlling for body size/mass) and the physiological characteristic are correlated with each other [Semipartial Correlation, M. Lauriola, In The SAGE Encyclopedia of Social Science Research Methods, Volume 1 edited by M. S. Lewis-Beck, A. E Bryman, T. Futing Liao, Sage Publications, pg 1018-1020, 2003].
The semipartial correlation would typically be lower than the relevant partial correlation [Lauriola, 2003]. The semipartial correlation seems to correspond to using a covariate.
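As a minimal sketch of the distinction drawn above, both statistics can be computed from the three pairwise Pearson correlations using the standard textbook formulas. The function names and the three input values are hypothetical, chosen only for illustration. Here x is the physiological characteristic, y is lifespan and z is body size/mass, and the semipartial removes z from lifespan only, as in the residuals analysis described above.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after z has been removed from BOTH x and y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def semipartial_corr(r_xy, r_xz, r_yz):
    """Correlation of raw x with the residual of y after regressing y on z."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz**2)

# Hypothetical pairwise correlations (assumptions, not data from any study):
# x = physiology, y = lifespan, z = body mass
r_xy, r_xz, r_yz = 0.60, 0.50, 0.70

pr = partial_corr(r_xy, r_xz, r_yz)
sr = semipartial_corr(r_xy, r_xz, r_yz)

print(f"partial     r(x, y | z)   = {pr:.3f}")
print(f"semipartial r(x, y.res z) = {sr:.3f}")
```

The two formulas share a numerator, and the semipartial equals the partial multiplied by sqrt(1 - r_xz^2), so its magnitude can never exceed that of the partial. The gap widens as the physiological variable becomes more strongly correlated with body size.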
A thought experiment modifying the leg and body size example.
For a group of animals I measure the amount of tree leaves eaten (absolute amount - ratios or proportions are not of interest), I measure leg length and I measure body size/mass. I'm interested in determining if there is a positive correlation between leg length and the amount of tree leaves eaten. However, body size and the amount of food eaten are positively correlated and body size and leg length are positively correlated. I want to statistically control for body size since my interest is in whether taller animals eat more tree leaves (perhaps because they can reach more leaves or it is easier to reach them, etc.) and not because they simply eat more food (including tree leaves). That is, is there a positive correlation between amount of tree leaves eaten and leg length once body size/mass is controlled?
Then I measure the amount of tree leaves eaten, body size/mass and the cellular concentration of an enzyme that is involved specifically in the metabolism of tree leaves. I'm interested in whether there is a relationship (correlation) between the amount of tree leaves eaten and the concentration of the enzyme. I already know that there is a positive correlation between body size and the amount of tree leaves eaten. Should I statistically control for body size/weight? I don't think I should.
Maurice - thanks for your input and for drawing attention to these two papers. I am however having trouble sourcing them. Are you the author? If not, could you send the authors' details so that I can locate them more easily?
With respect to the scenario in your last post: I think this is interesting but not quite analogous to the ageing problem where the issues being discussed here originated. The situation in your post is one where the enzyme "is specifically involved" in the metabolism of leaves, i.e. there is prior knowledge of its function. Hence the link to the dependent variable (eating leaves) is already established, and the question is whether this enzyme responds to the greater intake of leaves in larger animals. Clearly in that scenario one wouldn't remove the size effect, but it isn't directly relevant to the ageing question.
Let me modify the scenario slightly so it matches the issues we are debating more closely, and then see if you maintain your position that you should not statistically control for size/weight. In the new scenario, just like before, you measure the amount of tree leaves eaten and body size/mass, but this time you measure the activity of an enzyme whose function you don't know, but which you think could be related in some way to eating the leaves. What you do know is that, like all enzymes, its activity is related to body mass/size. You are interested to know whether the enzyme is related to digestion or metabolism of tree leaves. Should you statistically control for the effect of size/mass?
I think the answer is unequivocally yes. Otherwise you will get an enormously high probability of a false positive. Anything related to mass/size (and almost everything in biology is!), will end up related to eating leaves - only because both are related to mass/size.
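The false-positive risk described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variable names, the slopes of 0.75, the noise levels and the sample size are all hypothetical, and both leaf intake and enzyme activity are constructed to depend on (log) body mass and on nothing else.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def residuals(y, x):
    """Residuals of y after a least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 2000
log_mass = [rng.gauss(0, 1) for _ in range(n)]
# Assumption: each trait depends on mass plus its own independent noise;
# there is NO direct link between leaf intake and the enzyme.
leaves = [0.75 * m + rng.gauss(0, 0.3) for m in log_mass]
enzyme = [0.75 * m + rng.gauss(0, 0.3) for m in log_mass]

raw_r = pearson(leaves, enzyme)
resid_r = pearson(residuals(leaves, log_mass), residuals(enzyme, log_mass))

print(f"raw correlation          = {raw_r:.2f}")
print(f"size-removed correlation = {resid_r:.2f}")
```

In this constructed case the raw correlation is strong even though the two traits share nothing except mass, while the correlation between the two sets of residuals is close to zero. The sketch does not address the opposite worry raised elsewhere in the thread, that removing size might hide a real effect.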
I'm sorry; I should have included the complete references.
("Bias due to 2-stage residual-outcome regression analysis in genetic association studies" Serkalem Demissie and L. Adrienne Cupples, Genet Epidemiol. 35(7): 592–596, 2011. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3201714/pdf/nihms305270.pdf
"Risk Factors, Confounding, and the Illusion of Statistical Control", Christenfeld NJ, Sloan RP, Carroll D, Greenland S, Psychosomatic Medicine 66:868–875, 2004, http://www.psychosomaticmedicine.org/content/66/6/868.full.pdf).
I'm running a large number of examples using the formula for semipartial correlations from Lauriola (2003, above).
I will post again, including revisiting the leaf-enzyme example later today hopefully.
My apologies but in this post I may be very 'long winded'. It seems much more difficult to explain in writing than in speaking.
Some of the questions that provide information which I would use to determine the answer to the modified tree-leaf enzyme example.
In the modified enzyme-tree leaves example, how do I know that the enzyme's activity is related to body mass/size? Have I experimentally modified body size/mass and observed a change in the enzyme concentration? Have I experimentally modified enzyme concentration and observed a change in body size/mass? Or have I observed that there is a correlation between body size/mass and the enzyme concentration under some set of circumstances? If the relationship is an observed correlation, is the correlation present and the same under all circumstances?
Whether I would automatically apply a statistical correction depends on all the information that would be available concerning any possible relationships between the enzyme concentration and the other variables. If I had no knowledge other than a general trend of correlations between physiological characteristics and body size/mass I don't think I would automatically use a covariate. I might provide simple correlations, partial correlations, and the covariate analysis.
In the case of the leg length, body size/mass example there is an apparent reasonable relationship between the two characteristics, that is, they do not seem to be independent. The relationship between body size/mass and lifespan seems less reliable - it does not seem to be present or the same under all circumstances. It does not seem obvious why those two characteristics might or could not be independent.
My idea in posting two examples, involving leg length and enzyme concentration was to provide one example of circumstances where it was obvious that statistically controlling for body size/mass was appropriate and a second example where it was obvious that there are circumstances where it is inappropriate (and therefore would produce a false negative).
I could modify the enzyme example: thus I don't know a priori that the enzyme is involved in tree leaf metabolism. When I statistically control for body size there is no correlation between the absolute amount of tree leaves eaten and the enzyme concentration. Many years later another researcher experimentally feeds large animals small and large amounts of tree leaves and feeds smaller animals small and large amounts of tree leaves. The finding is that the absolute amount of tree leaves eaten is positively correlated to the enzyme concentration - body size/mass is not relevant. By statistically controlling for body size I had produced a false negative.
Say a physiological characteristic is correlated with lifespan and the physiological characteristic is correlated with body size. When that is the case, body size will be correlated to lifespan through its relationship with the physiological characteristic. That would seem to be the case no matter what other relationships there might be either directly or indirectly between body size/mass and lifespan.
Something like: B is related to A at 0.9, B is related to C at 0.8, and C is therefore related to A at 0.9 x 0.8 = 0.72. That will not be the way the correlation coefficients are actually mathematically related but it is my attempt at an analogy. There could be other factors and relationships that modify (increase or decrease) that 0.72 value. If I apply a statistical method that removes that 0.72 contribution from the 0.9 value between B and A then I have removed too much, or something I should perhaps not have removed.
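For what it is worth, there is one simple set of assumptions (which may well not match the biology) under which the product in this analogy is exact. Assume all three variables are standardized, that B is the only common cause, and that the error terms e_A and e_C are uncorrelated with B and with each other. The worked numbers below use the 0.9 and 0.8 from the analogy and the standard formulas for the partial correlation and for the semipartial correlation (A residualized on C, then correlated with raw B).

```latex
% Assumed model (illustrative only): B is a common cause of A and C
A = r_{AB}\,B + e_A, \qquad C = r_{BC}\,B + e_C
% Implied correlation between A and C
r_{AC} = r_{AB}\,r_{BC} = 0.9 \times 0.8 = 0.72
% Partial correlation of A and B, controlling for C
r_{AB \cdot C} = \frac{r_{AB} - r_{AC}\,r_{BC}}{\sqrt{(1 - r_{AC}^{2})(1 - r_{BC}^{2})}}
             = \frac{0.9 - 0.576}{\sqrt{0.4816 \times 0.36}} \approx 0.78
% Semipartial correlation: A residualized on C, correlated with raw B
r_{B(A \cdot C)} = \frac{r_{AB} - r_{AC}\,r_{BC}}{\sqrt{1 - r_{AC}^{2}}}
               = \frac{0.324}{\sqrt{0.4816}} \approx 0.47
```

Under these assumptions the quantity subtracted from 0.9 is r_AC x r_BC = 0.576, not 0.72, and the residuals approach (0.47) reduces the B-A relationship noticeably more than the partial correlation (0.78) does.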
The question in my mind is how often does using body size/mass as a covariate result in false negatives?
In the examples I have run using semipartial correlation (covariate) versus partial correlation it seems that the vast majority of the time the semipartial correlation (covariate) is lower than the corresponding partial correlation. That seems a bias that would produce more false negatives than partial correlations would do (and partial correlations must themselves produce false negatives at some unknown frequency).
Given the large number of variables that are correlated with body mass/size, all of which would be found related to lifespan if no correction was made, and the relatively small number of variables that are causally related to lifespan in a manner that would not be revealed by analysis of residuals, once size/mass effects had been removed, the risk of a false positive must be considerably greater (probably by orders of magnitude) than the risk of a false negative.
The difficulty I have is with the statement " the relatively small number of variables that are causally related to lifespan in a manner that would not be revealed by analysis of residuals, once size/mass effects had been removed".
Please bear with me as I try another example/model.
I have three characteristics I can measure: lifespan, physiology (some single physiological characteristic), and body size.
The biology of the system (which is unknown to the experimenter) is that the variability in the physiological characteristic causes the variability in the lifespan and it causes the variability in the body size. The relationships are linear. There is no causal relationship at all between the variability in body size and lifespan. Unless I add 'error' to the measured values the biological relationship between physiology and lifespan should be 1.0 and that between physiology and body size should be 1.0 while that between body size and lifespan should be 0.0. If the analysis of the observational study produced that finding then the analysis would represent the biology present. It is easy to model that system, generate some sets of values and calculate the three correlations. The unavoidable problem is that although there is no biological causal relationship between body size and lifespan there is a mathematical one. Instead of producing two correlations of 1.0 and one of 0.0 the observed result is three correlations of 1.0. Body size is automatically mathematically correlated with lifespan through its relationship with the physiology.
Correlation is not causation. If I calculate partial correlations they will be equal and less than 1.0. If I calculate the semipartial correlation to remove the effect of body size (residuals analysis) then the correlation between physiology and lifespan would be 0. None of these hint at the underlying 'true' biological relationships.
To make the model more realistic I could add 'error' independently to the two sets of causal relationships. That would change the expected relationship between physiology and lifespan from 1.0 to some value less than 1. It would change the relationship between physiology and body size to less than 1.0 independently to some value less than 1. There is no a priori reason to require that the amount of 'error' between physiology and lifespan is smaller or larger than the amount of 'error' between physiology and body size. That is, the relationship between physiology and lifespan might drop from 1.0 to 3.0 while that between physiology and body size might drop from 1.0 to 0.6. Due to the 'error' introduced into the causal relationships between the physiology and lifespan and between physiology and body size, the mathematical relationship between body size and lifespan will also drop from 1.0 to some value dependent on the causal relationships. Calculating semipartial and partial correlations now would produce more natural values. The residual analysis would probably produce a reduced correlation between physiology and lifespan but not zero. The partial correlations would no longer be equal. The results would still not represent the biology.
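The model with 'error' added can be generated in a few lines, so that anyone can check what the simple, partial and semipartial correlations do in it. This is a sketch only: the noise standard deviations (1.0 and 0.5), the sample size and all names are hypothetical choices, not values from any study.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def partial(r_xy, r_xz, r_yz):
    """Correlation of x and y with z removed from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def semipartial(r_xy, r_xz, r_yz):
    """Correlation of raw x with y residualized on z."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz**2)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 5000
# Assumed causal structure: physiology drives lifespan AND body size;
# body size has no direct effect on lifespan.
phys = [rng.gauss(0, 1) for _ in range(n)]
lifespan = [p + rng.gauss(0, 1.0) for p in phys]
body = [p + rng.gauss(0, 0.5) for p in phys]

r_pl = pearson(phys, lifespan)
r_pb = pearson(phys, body)
r_bl = pearson(body, lifespan)

pr_pl_given_b = partial(r_pl, r_pb, r_bl)      # physiology-lifespan, body removed from both
sr_pl_given_b = semipartial(r_pl, r_pb, r_bl)  # physiology vs lifespan residuals (residuals analysis)
pr_bl_given_p = partial(r_bl, r_pb, r_pl)      # body-lifespan, physiology removed from both

print(f"simple: r_PL={r_pl:.2f}  r_PB={r_pb:.2f}  r_BL={r_bl:.2f}")
print(f"physiology-lifespan | body: partial={pr_pl_given_b:.2f}  semipartial={sr_pl_given_b:.2f}")
print(f"body-lifespan | physiology: partial={pr_bl_given_p:.2f}")
```

With these particular settings body size and lifespan are clearly correlated although neither affects the other, the physiology-lifespan relationship is reduced but not eliminated by controlling for body size (more so in the residuals analysis than in the partial correlation), and the body size-lifespan partial correlation controlling for physiology is close to zero. Other causal structures among the three variables would give different patterns.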
With the three characteristics of lifespan, physiology and body size there are at least 36 ways I can model the causal relationships depending on whether I use two direct causal relationships (leaving the third to be indirect) or three causal relationships. I have attached a file with 12 of those models.
My knowledge of physiology is ancient (1968). So I can blithely consider the potential models and come to the conclusion that the majority of the time model number 1 is likely to be the biological situation, perhaps sometimes model 2, perhaps sometimes model 3 and sometimes model 4, etc. Thus statistically controlling for body size would not be automatic. Objectively, I should consider that each of the 36 models has some frequency or probability of occurring, with those probabilities depending on the circumstances of each study (species, physiological variable, etc.). Then statistically controlling for body size (semipartial correlation - residuals analysis) for the correlation between lifespan and physiology will have some effect (possibly different) in each model. Finally I would consider whether that effect is positively conducive to learning about the 'true' underlying biological situation or negatively.
I doubt that is going to happen.
The end result for me is that I would examine the circumstances for each individual study. Where it is clear/obvious/etc. that body size can affect the physiological variable in a manner that is undesirable then I would provide simple correlations, partial correlations and semipartial correlations (residuals analysis). But not necessarily otherwise. So if I was measuring whole animal oxygen consumption as the physiological variable I would partial out body size. If I was measuring the actual oxygen consumption of individual cells I would not. (That might cause a problem if variability in body size was determined by variability in cell size or cell size and cell number rather than cell number alone - and perhaps there are other similar situations that would cause problems, but those would be other observations that would be desirable in such a study.)
Another paper of possible interest and which provides a number of references into the literature of statistical control:
The Illusion of Statistical Control: Control Variable Practice in Management Research
K. D. Carlson and J. Wu
Organizational Research Methods 2012 15(3): 413-435
https://umdrive.memphis.edu/dsherrll/public/SCMS8540/Carlson%20%26%20Wu-2012.pdf
I must apologize for belabouring the tree leaves example and possibly for a reductio ad absurdum argument.
In the tree leaves example, if I independently experimentally modify body size/mass by surgically adding or removing muscle or fat there will be a change in the amount of food eaten (presumably) and therefore potentially in the amount of tree leaves eaten. Therefore it seems correct to statistically control for body size/mass. After such an experimental manipulation would there be a change in the actual amount of enzyme measured in individual cells? If I cannot simply/reasonably? assume that there would be, (or I don't know) then I don't think I should automatically statistically control for body size/mass for the/an enzyme study.
There are obvious situations where a similar surgical experimentation would presumably change some physiological variables, such as total animal oxygen consumption. However, would it affect the majority of the physiological variables that have been measured in such research, particularly those variables that do not seem to have an obvious relationship with body size/mass?
My apologies, in the post three above (with the attached file), the 3.0 should have been 0.3 and the text "to some value less than 1" after "independently" should be deleted.
Thinking about cause and effect: there are at least some physiological variables which can be experimentally manipulated while controlling for all other potential factors. For example, by genetically engineering extra copies of one gene to increase cellular enzyme concentrations and using gene silencing to reduce cellular enzyme concentrations. But how can body size/mass be experimentally manipulated while controlling physiology? One could surgically implant extra muscle or fat, add extra skin, perhaps extra liver lobes. That is manipulating components of body size/mass and most organisms are unlikely to be able to produce changes in body size/mass using a similar method. Body size/mass is cell size/mass and cell number. It is difficult to think of how cause and effect can be tested by experimentally modifying cell size/mass and cell number without affecting cell physiology. Cell fusion? That makes me consider that body size/mass cannot normally be separated from physiology in most circumstances and cannot be considered as a cause but is an effect of cellular physiology. Body size/mass encompasses the fundamental components of all cell sizes and all the cells of an organism. It seems to be a surrogate variable for all the physiological aspects that determine cell size and cell number. It is then not surprising that many aspects of physiology are correlated with body size/mass.
That leads me to re-examine what may be the origin of the situation.
Body size/mass is found to be correlated with lifespan. The next step would be to resolve body size/mass into its components. I am going to substitute weight for size/mass below to save typing.
body weight = the sum of all organ and tissue weights, etc.
I measure liver weight. I correlate it with lifespan. There is a correlation. Should I statistically control (residuals analysis) for body weight? No. That would defeat the entire purpose. Having found a correlation between body weight and lifespan the next question is surely how is that correlation determined? I do not want to effectively remove that correlation from analysis. Or put another way, if I statistically control for body weight I am removing that part of lifespan that is affected by body weight leaving that part that is unrelated to body weight. If I statistically control for body weight and then attempt to correlate for example, liver weight with that part of lifespan unrelated to body weight I should expect no correlation. When I do that I am asking, is liver weight correlated with that part of lifespan unaffected by body weight. However, the correlation between body weight and lifespan is the correlation of lifespan with each component of body weight altogether. In the simplest case where body weight and all its components were related to lifespan identically and I statistically control for body weight the end result would be that although body weight and lifespan were correlated none of the components of body weight would be correlated with lifespan.
Decomposing liver weight would produce the liver cell sizes and number. If I correlate those with lifespan there is no change in the rationale for not statistically controlling for body weight. Decomposing liver cell sizes and number to physiological variables (such as cellular enzyme concentrations) the rationale does not change.
If I find a correlation between body weight and lifespan I am surely trying to understand how that correlation is created. I would be trying to partition that correlation amongst all the components of body weight at all levels of organization down to cellular physiology. I cannot do that if I effectively remove that correlation from the analyses.
As an aside, in the example of liver weight it is obvious that the correlation of body weight with lifespan is not mathematically independent of liver weight. Any correlations calculated would have to involve the variables (body weight minus liver weight) and liver weight. That correction does not remove all dependencies.
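The part-whole dependency mentioned in this aside is easy to demonstrate. In the sketch below (all names, means, standard deviations and the sample size are hypothetical), liver weight is generated completely independently of the weight of the rest of the body, yet it still correlates with total body weight simply because it is part of that total.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(7)  # fixed seed so the sketch is repeatable
n = 3000
# Assumption: liver weight and the weight of everything else are independent.
liver = [rng.gauss(5.0, 1.0) for _ in range(n)]
rest = [rng.gauss(95.0, 3.0) for _ in range(n)]
body = [l + r for l, r in zip(liver, rest)]  # whole = part + remainder

r_part_whole = pearson(liver, body)  # inflated by the shared liver term
r_part_rest = pearson(liver, rest)   # liver vs (body weight minus liver weight)

print(f"liver vs body weight           = {r_part_whole:.2f}")
print(f"liver vs (body minus liver)    = {r_part_rest:.2f}")
```

Using (body weight minus liver weight) removes this purely arithmetic component. As noted above, it does not remove every dependency, and in real animals the two quantities would of course not be independent.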
Rephrased one last time.
Having found a correlation between body weight and lifespan should I then only devote effort looking for other completely independent relationships or delve deeper into the relationship already discovered. Or both?
Maurice, I think it is important, reiterating some of the earlier posts, that you distinguish intraspecific effects of size – which you seem to be mostly discussing, and inter-specific effects which have been mostly the discussion in the ageing field, by for example Barja and also Pamplona etc. Regarding intra-specific effects there is no need to think about manipulations such as implanting or removing organs. Generally these are not very successful because simply putting tissue into an animal does not work very well because there is no nervous or blood supply to the implanted organ (see eg studies recently implanting brown adipose tissue and other tissues into mice) unless some very fancy surgery is considered. But this is unnecessary because it is very easy to generate body size impacts by disrupting the growth hormone axis during development. There is a very large literature on such effects and generally the results are clear that the impact of making animals smaller, by reducing GH or IGF-1 levels is that they live longer – completely the opposite of the inter-specific effect of body size. The mechanisms involved are still under debate but I have for example shown recently a link to fatty acid composition in the tissues of such dwarf mice (Phospholipid composition and longevity: lessons from Ames dwarf mice. AGE (Dord) 2013 Vol. 35, 2303-2313).
Studies of interspecific correlations between body size/mass and lifespan are different. There is no credible experimental manipulation that will turn one species into another so these relationships will always remain correlations. Given that is the case removing the body size/mass effect, for example as in my earlier study of fatty acid composition, referred to in an earlier post, is essential unless we are to generate a string of false targets to pursue. Please also see our most recent publication on this topic in mitochondrial membranes (Making heads or tails of mitochondrial membranes in longevity and aging: a role for comparative studies, 2014, Longev & Healthspan in press). Perhaps the cause of the body size correlation with lifespan between species will just remain impossible to find - and we should perhaps then accept that trying to find it is not the best approach to elucidating the causes of ageing.
Regards, Teresa
Teresa,
Thank you for the comments and very importantly for the reference to the open access paper (as I do not have academic access to scientific literature other than that available to the general public).
Yes, I am not looking at correlations between species or any other groupings. Those basically involve correlations between 'means' more or less. And they can be positive, negative or non-existent between groupings while correlations based on individuals are consistent. I would describe correlations between species as examining the way the relationship between two characteristics has evolved. Statistically, such correlations are termed "ecological correlations" and introduce their own set of things to look out for.
I will look at your paper, but I don't think that it will help resolve the fundamental differences involved in the answer to the question. From above:
When examining relationships between traits such as physiology and longevity should we remove the covariance due to body mass?
I am repeating my basic answer here.
It depends.
If one wants to find a new completely independent (from body size) relationship between lifespan and some variable x then remove the covariance.
If one wants to learn more about the relationship between body size and lifespan then no, partition the covariance.
Lifespan = body size + "error" (this does not mean just measurement error but includes all other factors in the model that affect lifespan).
If one wants to find the other factors that are present in the "error" then one removes the covariance and "checks" the relationship between the factor being examined and adjusted lifespan.
If one wants to find the factors that are part of the body size - lifespan relationship then one does not remove the covariance and does not adjust lifespan; one partitions the covariance.
I would remove the covariance if I wanted to investigate the lifespan - "error" relationships. I would partition the covariance if I wanted to investigate the lifespan - body size relationships.
Let me take this a step further.
When I am interested in the lifespan - body size relationship I decompose body size.
body size = liver size + skin size + heart size + kidney size + blood size + ... + skeletal size. If I measure all the body parts then I have equality. An animal that weighs 30 g when measured alive will have all its parts total 30 g when each is measured separately (disregard surgical losses for the example).
That means
lifespan = body size (1) is replaced completely by
lifespan = liver size + skin size + heart size + ... + skeletal size (2).
If the correlation in (1) is 0.32 then the correlation in (2) is 0.32. (a given).
I can do a multiple regression using (2). The variable 'body size' does not enter in any direct way- it does not get statistically controlled - I can decompose each organ and tissue weight to cell size and cell counts and then further down to physiological variables, such as cellular enzyme concentrations and body size is not statistically controlled because its 'covariance' is being partitioned (more or less).
I find I am repeating myself.
"There is a very large literature on such effects and generally the results are clear that the impact of making animals smaller, by reducing GH or IGF-1 levels is that they live longer "
I don't disagree with this. The question generated by such a finding is:
Did the animals live longer because they were smaller?
Or
Did the animals live longer because reducing GH makes them smaller and reducing GH makes them live longer?
To examine those two questions one would have to work out all the intermediate steps on the physiological/biosynthetic? path from reduced GH to increased lifespan and from reduced GH to reduced body size, step by step.
OK. I accept that I am failing to get my point across. I will try another example. Perhaps removing it from the subject area of lifespan will help.
I go into a field. I count the number of seeds on individual plants and I measure their height. I find a correlation. I also measure the concentration of nitrogen in the soil in which each plant is growing. I find there is a correlation (simple Pearson) between nitrogen concentration and the number of seeds. I statistically control for the relationship between height and seed count and then find that there is no correlation of seed count with nitrogen.
Over the following years I repeat the purely observational study many times (changing fields, etc.). After controlling for height the relationship between seed count and nitrogen never remains.
Then someone else repeats the basic study design; finds a correlation between seed count and height; finds a correlation between nitrogen and seed count. When height is statistically controlled, the correlation between nitrogen and seed count remains. They suggest a hypothesis that explains my results and their results and indicates that I should not have statistically corrected for height.
The purpose of this example is to show that there are times when statistical correction should be applied and other times when it should not. In practice, I think each study should be examined to determine if applying statistical control is appropriate with respect to its research goals.
The hypothesis that explains all the results does not rely on experimental/methodological errors. I will post the answer in two days.
There are several ways of explaining the differences between the two situations in the post above.
1) There were fewer other factors correlated with height in situation 1 than in situation 2.
2) The correlation between height and nitrogen was stronger in situation 1 than it was in situation 2.
Perhaps an interesting way that might produce such results is if inbred strains were used in situation 1 and individuals from 'wild' populations were used in situation 2.
Phenotypic correlation = genetic correlation + environmental correlation.
Genetic and environmental correlations can have opposite signs.
Phenotypic correlations of characters in inbred strains are environmental correlations as there are no (or next to no) genetic differences between individuals.
Phenotypic correlations in populations where genetic variation is segregating (e.g. wild populations) have both genetic and environmental correlations as components.
Genetic correlations are based on genetic linkage or pleiotropy.
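The additive statement above is shorthand. In the standard quantitative-genetics form the two components are weighted by the heritabilities of the two characters. The symbols below are labels introduced here for illustration: h_X^2 and h_Y^2 are the heritabilities of characters X and Y, r_A is the (additive) genetic correlation and r_E is the environmental correlation.

```latex
r_P \;=\; h_X\,h_Y\,r_A \;+\; \sqrt{(1 - h_X^{2})(1 - h_Y^{2})}\;\, r_E
```

When both heritabilities are close to zero, as would be expected within an inbred strain, the first term vanishes and r_P is approximately r_E. When r_A and r_E have opposite signs, the sign of r_P depends on these weights, which is one way the same pair of characters could show different phenotypic correlations in different populations.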
A similar difference may affect the intra-species and interspecies correlations of body mass with lifespan. 'Domesticated' strains/breeds typically have less genetic variability than wild populations. Phenotypic correlations within those groups may be weighted towards environmental correlations. Phenotypic correlations across species may be weighted more towards genetic correlations.
Dobzhansky (Genetics and the Origin of Species, 1951, 3rd edition) wrote:
"The survival values of the different phenotypes which can arise on the basis of a given genotype are often unequal. The phenotypes which develop in response to environmental influences which recur regularly in the normal habitats of a species are usually adaptive and conducive to survival. The reactions to environmental stimuli which the species encounters rarely or never in its normal habitats, are on the contrary, seldom adaptive."
The environments domesticated species experience are likely to include many influences that those species have never encountered in their evolutionary histories. Perhaps their phenotypes and the correlations between them are maladaptations. Or perhaps they are what happens when one extrapolates outside of the evolutionary 'normal' range of conditions.
This reference may be of interest (J. K.Christians, "Controlling for Body Mass Effects: Is Part-Whole Correlation Important?", Physiological and Biochemical Zoology 72(2):250–253. 1999) - even apparently obvious situations are not necessarily so when it comes to statistical analyses.
My research has found about 36 biological parameters and physiological factors that vary with body size measures such as height, weight and BMI. In some cases only two body size factors apply.
Prey species are small and have short lifespans. Their populations expand rapidly to match the carrying capacity of the environment, and then crash when conditions are less favorable. Short lifespans help the species to match the carrying capacity of the environment.
Predator species, especially top predators, are larger and live longer. By controlling for body size, you are also removing an ecological influence.
Metabolic data should not be indexed to body size.