
Dear colleagues,

I hope all is well.

I have developed a machine learning model to predict survival among kidney patients.

My boss is keen to run a subgroup analysis by ethnicity, to see whether the model's performance metrics hold for minority ethnic groups.

My question:

As I understand it, I cannot take the subgroup from the whole dataset, because this would lead to data leakage. Is that right?

In that case, the only way to do the subgroup analysis is to take the subgroup from the test dataset, not from the training dataset.

Is that right?
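To make the proposed approach concrete, here is a minimal sketch of what "subgroup from the test dataset" would look like: the model is fitted on the training set only, and the held-out test rows are then grouped by ethnicity and scored per group, without refitting anything. It assumes a binary survival outcome evaluated with AUC (a time-to-event model would more typically use a concordance index instead), and the field names `ethnicity`, `features` and `survived`, as well as the function names, are hypothetical.

```python
from collections import defaultdict


def auc(labels, scores):
    """Rank-based AUC: chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when a subgroup contains only one outcome class
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_auc(test_rows, predict):
    """Score an already-trained model on the held-out test set, one ethnicity at a time.

    test_rows: iterable of dicts with hypothetical keys 'ethnicity', 'features', 'survived'.
    predict:   the model fitted on the training set only; it is never refitted here.
    """
    groups = defaultdict(lambda: ([], []))  # ethnicity -> (labels, scores)
    for row in test_rows:
        labels, scores = groups[row["ethnicity"]]
        labels.append(row["survived"])
        scores.append(predict(row["features"]))
    return {g: {"n": len(y), "auc": auc(y, s)} for g, (y, s) in groups.items()}
```

The group size `n` is returned alongside each metric because estimates from small minority subgroups can be very unstable, so the two are best read together.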

Hatem Ali