I have seen many papers that use eGFR to produce CKD/non-CKD labels. eGFR is usually calculated from sex, race, serum creatinine, and age. These papers then typically train ML models on more than 20 features to predict labels that were produced from only these four factors. Why don't they just use eGFR directly? An ML model can only be as good as the data it has seen. If the labels are produced by an equation, the best an ML model can do is fit that equation exactly; it cannot outperform it. If they used measured GFR, I would completely understand, but with estimated GFR, I don't.
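To make the point concrete, here is a minimal sketch of how such labels are typically produced. It assumes the 2009 CKD-EPI creatinine equation (some papers use MDRD or the race-free 2021 CKD-EPI instead), serum creatinine in mg/dL, and a CKD cutoff of eGFR < 60 mL/min/1.73 m². The function names and the small cohort are hypothetical, for illustration only.

```python
def ckd_epi_2009(scr_mg_dl: float, age: float, female: bool, black: bool) -> float:
    """eGFR (mL/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation (assumed here)."""
    kappa = 0.7 if female else 0.9      # sex-specific creatinine reference
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def ckd_label(egfr: float, threshold: float = 60.0) -> int:
    """1 = CKD, 0 = non-CKD. The cutoff of 60 is an assumption."""
    return int(egfr < threshold)


# Hypothetical cohort: (serum creatinine mg/dL, age, female, black)
cohort = [
    (0.9, 50, False, False),
    (3.0, 60, False, False),
    (0.6, 30, True, False),
    (1.4, 72, True, True),
]
labels = [ckd_label(ckd_epi_2009(*p)) for p in cohort]
```

Because each label is a deterministic function of these four inputs, the "model" `ckd_label(ckd_epi_2009(...))` reproduces every label with zero error. Any additional features can at best help a learner approximate this same function; they cannot make its predictions agree with the labels more often than that.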

Thanks
