Dear Colleagues,
I hope all is well.
I have built a machine learning model to predict survival among kidney patients.
My boss is keen to do a subgroup analysis by ethnicity, to see whether the model's performance metrics hold for minority ethnic groups.
My question:
As I understand it, I cannot evaluate subgroups drawn from the whole dataset, because that would include patients the model was trained on and so lead to data leakage. Is that right?
In this case, the only way to do the subgroup analysis is to take each subgroup from the test dataset, not the training dataset.
Is that right?
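
To make the question concrete, this is roughly what I have in mind. It is only a minimal sketch, assuming the model has already been fitted on the training data alone; the function name `subgroup_accuracy`, the ethnicity codes "A" and "B", and all labels and predictions are made up for illustration, and accuracy stands in for whatever metric we actually report.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Return accuracy per subgroup, using only the rows passed in."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical held-out test set: none of these rows were used for
# fitting or tuning the model.
test_survived = [1, 0, 1, 1, 0, 1]    # observed outcome
test_predicted = [1, 0, 0, 0, 0, 0]   # model prediction
test_ethnicity = ["A", "A", "A", "B", "B", "B"]

per_group = subgroup_accuracy(test_survived, test_predicted, test_ethnicity)
for group, accuracy in sorted(per_group.items()):
    print(group, round(accuracy, 3))
```

With very few test patients in a minority group, the per-group estimate will be noisy, so I would report the subgroup size alongside each metric.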