Can linear regression show that a method (or methods) is valid even though the same method is not valid according to Bland-Altman analysis and a paired-sample t-test?
If the observations are paired (paired t-test or Bland-Altman), then this pairing should be taken into account with the regression analysis. This might entail a mixed effects model, or using the difference or ratio of observations as the dependent variable.
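For illustration, here is a minimal sketch of both options in R, using simulated paired data (the lme4 package, the variable names, and the numbers are assumptions for the example, not the poster's actual data or code):

library(lme4)

# Simulated paired data for 15 subjects (illustrative only):
# a shared subject effect plus measurement noise, with a small systematic offset
set.seed(1)
subj_eff <- rnorm(15, 0, 25)
ct <- 320 + subj_eff + rnorm(15, 0, 10)   # gold standard
cp <- 310 + subj_eff + rnorm(15, 0, 10)   # candidate method, ~10 W lower on average

dat <- data.frame(subject = factor(rep(1:15, 2)),
                  method  = factor(rep(c("CT", "CP"), each = 15)),
                  value   = c(ct, cp))

# Option 1: mixed-effects model; the pairing is handled by a random subject intercept
m1 <- lmer(value ~ method + (1 | subject), data = dat)
summary(m1)       # the 'method' coefficient estimates the systematic difference

# Option 2: analyse the within-subject differences directly
d  <- cp - ct
m2 <- lm(d ~ 1)   # the intercept is the mean bias; equivalent to a paired t-test
confint(m2)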
Thank you for the answers. I would like to make my question clearer. I have one gold standard method and 5 different methods.
Bland-Altman analysis shows low agreement between the methods, and there are also significant differences between them. Only one method (CP5) shows high agreement and no significant difference (please see the Bland-Altman results).
Note: if the CoV (SEE%) is lower than 5%, it is considered acceptable for reliability and validity (or method comparison) studies in sports sciences.
What is the reason for the conflicting results?
1) Could the sample size be a reason? There were only 15 participants, even though 15 is generally considered sufficient in sports science studies.
2) CT (the gold standard) is defined as the highest exercise intensity at which VO2 can be stabilized before reaching VO2peak. It is a very sensitive physiological threshold: an increase of 10 or 15 W can affect it. Could it be that linear regression is simply not sensitive enough to detect a 10 W change?
3) I may be misinterpreting the linear regression results in the context of method comparison studies. How should I interpret them correctly for such studies?
What do you think about the hypothetical reasons above?
What other analysis methods would you suggest for method comparison or agreement studies?
I have shared a doc and a PDF to explain this better.
Note: I only included one set of linear regression results (CT vs. CP3) in the doc.
If my explanation is not clear, please let me know and I will explain further.
All of your CP methods are well correlated with the CT method. And you are right that your treatments, except CP5, are lower in value than CT. The low %CV and high r-square of your regression reflect the correlation but don't account for the lack of agreement. This is the whole point of the Bland-Altman analysis: it separates the idea of correlation from that of agreement. So there's no conflict in your results; it just matters whether you care about the methods being correlated or being in agreement.
Thank you for your answer. It has left me a little confused, and I may have misunderstood it, so I would like to explain why. Yes, linear regression is based on correlation. But, as far as I have read, regression analysis and the CV are common methods in agreement studies in sports science. I am sharing one study to express myself better.
You might find these articles interesting to read: http://sportsci.org/jour/04/amb.htm and https://sportsci.org/jour/04/wghbias.htm
Since this is a murky area, my advice is not to disregard either Bland-Altman or linear regression. Given unclear results (and, of course, no significant difference), follow this up with an informed decision about what constitutes a meaningful difference in your measurements: what is acceptable and what is not (you can use the 5% or 10% CV rule here), although in reality the difference might still be large. Have you calculated individual SEE values?
I don't want to disregard either of them. I think the SEE or CV% should be evaluated according to knowledge of the specific field (here, critical power). I would like to explain further.
According to our results, the SEE between CT (our gold standard, i.e. the true CP according to the definition of critical power) and CP1, CP2, CP3, CP4, and CP5 was between 13 and 15 W.
Because the cyclists' power outputs were high, the CVs were under 5%.
As we know, critical power is a very sensitive threshold, and some studies have shown that the metabolic steady state is lost when athletes exercise above critical power (e.g., 15 W above it). In our own study, VO2 at CT+15 W reached VO2peak.
As you said, "in reality this might be a large difference."
How can we conclude that an SEE of 15 W is unimportant, and that the method is valid, just because the models have a CV under 5%?
Hence, might it be better to evaluate the SEE or CV% results in the context of the specific topic (such as critical power)?
If I understand you correctly, yes, I calculated the individual SEE derived from each mathematical model. Each model fits well (R² > 0.96 and SEE% < 5%).
Refik Cabuk, I think it's important to be clear about what the CV means in a linear model. It assesses the variation between the predicted y values and the actual y values. But it doesn't say anything about whether the x values and y values are similar in magnitude.
As an example, take x = (1, 2, 3, 4, 5) and y = (6, 7, 8, 9, 10.1). The two are strongly correlated. The r-square of the linear model will be high, and the CV is < 1%. But none of this tells you whether x and y are similar in magnitude.
Bettina Karsten, thanks for the articles. I have read almost all of them.
But I need to note an important point (as implied by Salvatore). One's aim in a method comparison study is to compare the two methods (x: candidate and y: gold standard) and see whether they agree. The aim is not to find out whether one method (x) "predicts" the gold standard (y), or to quantify the degree of prediction.
So the question of how close the predicted values (yhat) are to the real gold-standard values (y), which is what the SEE reveals, is not the objective. Rather, one wants to reveal the bias (the differences between x and y, not between yhat and y). The SEE is about prediction and has only limited use in a method comparison framework.
A close look at the BA plots provided by the poster shows that methods CP1-4 underestimate the gold standard CT (look at the graphs that include the 45-degree lines). I am curious whether, and how, regression would reveal these insights.
If the standard error rate is below 5 percent, the x method is considered an acceptable alternative to the gold-standard method. The 5% CV rule in linear regression is an established and popular approach for agreement, validity, and reliability studies, just like Bland-Altman. When I evaluated the results of the different analysis methods (linear regression, Bland-Altman, and the paired-sample t-test) against my data, I began to think that when the difference is relatively small, as in my results (312 W vs. 326 W; please see the previous messages for details), the 5% CV rule can give results that conflict with Bland-Altman or the paired t-test. But no study has reported such conflicting results, and I wonder whether I am the first to encounter them.
As you mentioned in your previous answer, the regression CV may not reflect similarity in magnitude, because linear regression is based on correlation. Hence, Sal Mangiafico and Mehmet Sinan Iyisoy, as statisticians, would you say the CV rule is not a suitable method for demonstrating agreement or validity?
If I am wrong on any point, please feel free to correct me.
EDIT: This response has been heavily edited from the original.
What's clear is that the CV from linear regression doesn't really address the question of "absolute agreement", where I'm defining "absolute agreement" here as the values of the two methods being equal in absolute value.
This is easy to see with toy data. Take for example X=(1, 2, 3, 4, 5) and Y=(6, 7, 8, 9, 10.1). The CV from the linear regression is very low. But the "absolute agreement" is terrible.
Some confusion may come from the fact that there are different things for which you can be calculating the CV.
If you want to measure repeatability, you might use sd/mean of a single set of measurements. (I think this is what the paper you cited is doing).
And the CV of a linear model has been mentioned (which I think is what you were doing).
If you want to assess the "absolute agreement" of paired measurements, I think the way I would assess this is to calculate the RMSE of (Method - Gold Standard) and divide that by the mean of Gold Standard. This is sometimes called CV(RMSE), and is analogous to the way CV is calculated for linear models.
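A minimal base-R sketch of these three calculations with the toy data above (the percentages in the comments are approximate, and Y is treated as the gold standard here purely for illustration):

# Toy data: X is the candidate method, Y the gold standard
X <- c(1, 2, 3, 4, 5)
Y <- c(6, 7, 8, 9, 10.1)

# (1) Repeatability-style CV: sd/mean of a single set of measurements
sd(Y) / mean(Y) * 100                 # ~20%

# (2) CV of the linear model: residual standard error / mean of Y
fit <- lm(Y ~ X)
summary(fit)$r.squared                # ~0.9996, an almost perfect fit
summary(fit)$sigma / mean(Y) * 100    # ~0.5%, which looks excellent

# (3) CV(RMSE) for absolute agreement:
#     RMSE of the paired differences divided by the mean of the gold standard
rmse <- sqrt(mean((X - Y)^2))
rmse / mean(Y) * 100                  # ~63%, so absolute agreement is terrible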
"But what if you want to assess the agreement of paired measurements, like you have? I think the way I would assess this is to calculate the RMSE of (Method - Gold Standard) and divide that by the mean of Gold Standard. This is sometimes called CV(RMSE), and is analogous to the way CV is calculated for linear models. "
As far as I understand, you don't calculate CV(RMSE) from the linear regression for agreement. So may I ask where, or how, you calculate the RMSE between the two methods?
It's important to ask whether you want two methods to agree absolutely or to be concordant. In concordant measurements, the methods rank the targets in the same order. Bland and Altman addresses absolute agreement. You might look at intraclass correlation here, case 3.
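For example, a minimal sketch with the psych package and simulated values (the package choice and the numbers are assumptions for illustration; which ICC form to report depends on the study design):

library(psych)

# Simulated paired values (illustrative only, not the study data)
set.seed(2)
ct  <- rnorm(15, 320, 25)            # gold standard
cp5 <- ct + rnorm(15, 0, 8)          # random error only, negligible bias
cp1 <- ct - 14 + rnorm(15, 0, 8)     # same noise plus a ~14 W systematic bias

# ICC() reports all Shrout-Fleiss forms in one table; the "consistency" forms
# (e.g. ICC3) ignore a systematic offset, while the "agreement" forms are
# pulled down by it
ICC(cbind(ct, cp5))
ICC(cbind(ct, cp1))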
Refik Cabuk , I added some R code and results for my toy data to my response above. The calculation for RMSE is pretty simple. Hopefully that makes sense.
This has a better interpretation and value for agreement studies than SEE/mean(GoldStandard) or the CV that you described.
I think the ICC is also more relevant than SEE/mean(GoldStandard), RMSE, and CV.
I need to point out that Bland-Altman graphs show the mean differences (bias), which are crucial to interpret. In Refik's case, those biases are very large for CP1-4 and very close to 0 for CP5. This is another aspect of Bland-Altman analysis that regression cannot capture.
I believe Refik's data are a good example of why such regression methods are not well suited to an agreement study. (Ronan, we are assuming absolute agreement is intended.)
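For reference, the bias and limits of agreement behind a Bland-Altman plot are simple to compute directly; a minimal sketch in R with simulated values (illustrative only, to be replaced with the real paired measurements):

# Simulated paired values (illustrative only)
set.seed(3)
ct <- rnorm(15, 320, 25)                  # gold standard
cp <- ct - 14 + rnorm(15, 0, 8)           # candidate with a ~14 W systematic bias

d    <- cp - ct
bias <- mean(d)                           # mean difference: the Bland-Altman bias
loa  <- bias + c(-1.96, 1.96) * sd(d)     # 95% limits of agreement

plot((ct + cp) / 2, d,
     xlab = "Mean of the two methods (W)", ylab = "CP - CT (W)")
abline(h = c(bias, loa), lty = c(1, 2, 2))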
The only interpretation I can make for SEE/mean(GoldStandard)
Sal Mangiafico, CP1 does not have good agreement with CT. There is also a substantial systematic bias. People wrongly use CV-type measures to assess agreement.