I have trained a VGG-16 model for a binary classification task. The model was trained on equal numbers of abnormal and normal images (n=2000). The literature generally presents model calibration as a remedy for imbalanced training data: the predicted probabilities are rescaled so that they reflect the true likelihood of each class. I wanted to see whether calibration also affects performance when the model is trained on a balanced dataset. Despite the balanced training set, the model underpredicted the positive class, as seen in the reliability diagram below, where the uncalibrated outputs lie above the y=x diagonal. Applying various calibration methods reduced the expected calibration error (ECE) relative to the uncalibrated output (uncalibrated ECE: 0.039 vs. Platt-scaled ECE: 0.02), and the calibrated outputs closely followed the y=x diagonal. After calibration, precision, Cohen's kappa, F-score, and MCC all increased compared to the values obtained with the uncalibrated outputs.
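For reference, here is a minimal sketch of the ECE computation and Platt scaling I am referring to. This is not my actual pipeline: the logits and labels are synthetic stand-ins for held-out validation and test data, and the binning scheme is the standard reliability-diagram variant of ECE.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE: bin predictions by predicted probability and take the
    size-weighted average of |observed positive rate - mean prediction|."""
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (y_prob > lo) & (y_prob <= hi)
        if in_bin.any():
            gap = abs(y_true[in_bin].mean() - y_prob[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Synthetic stand-ins for held-out validation/test labels and logits.
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, 500)
logit_val = rng.normal(loc=y_val * 1.5 - 0.75, scale=1.0)
y_test = rng.integers(0, 2, 500)
logit_test = rng.normal(loc=y_test * 1.5 - 0.75, scale=1.0)

# Uncalibrated probabilities: the network's raw sigmoid outputs.
p_uncal = 1.0 / (1.0 + np.exp(-logit_test))

# Platt scaling: fit a 1-D logistic regression on the validation logits,
# then map the test logits through the learned sigmoid.
platt = LogisticRegression().fit(logit_val.reshape(-1, 1), y_val)
p_cal = platt.predict_proba(logit_test.reshape(-1, 1))[:, 1]

print("ECE (uncalibrated):", expected_calibration_error(y_test, p_uncal))
print("ECE (Platt-scaled):", expected_calibration_error(y_test, p_cal))
```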
I have the following questions: