I trained a VGG-16 model on a highly imbalanced dataset in which the positive samples (class 1) were only 20% of the negative samples (class 0): 100 positive samples versus 500 negative samples. The trained model was evaluated on a test set with an equal number of positive and negative samples (n = 100 each). I calibrated the model outputs using temperature scaling so that the predicted probabilities better reflect the true distribution of positive samples. The table below shows the performance of the baseline (uncalibrated) model and the performance obtained after calibration with temperature scaling.
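For context, this is a minimal sketch of the temperature-scaling step I am describing, not my exact code: a single scalar temperature is fitted on held-out validation logits by minimizing the negative log-likelihood. The arrays `val_logits` and `val_labels` are hypothetical placeholders for the validation data.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit  # sigmoid

rng = np.random.default_rng(0)
val_logits = rng.normal(size=200)           # placeholder: validation logits from the binary head
val_labels = rng.integers(0, 2, size=200)   # placeholder: 0/1 validation labels

def nll(temperature):
    # Temperature-scaled probabilities for the positive class
    p = expit(val_logits / temperature)
    eps = 1e-12
    return -np.mean(val_labels * np.log(p + eps) + (1 - val_labels) * np.log(1 - p + eps))

# Fit the temperature by minimizing the negative log-likelihood on the validation set
res = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded")
T_fitted = res.x
calibrated_probs = expit(val_logits / T_fitted)
```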
I observed that the baseline model at its default threshold (T = 0.5) performed poorly. After temperature scaling of the baseline model's outputs, however, performance improved greatly on all metrics: the log loss, Brier score, expected calibration error (ECE), and maximum calibration error (MCE) obtained with the recalibrated probabilities were the lowest. I then identified the optimal operating threshold (T = 0.18) for the baseline model using the geometric mean of sensitivity and specificity (G-mean), to obtain the best trade-off between the two, and the baseline model's performance improved greatly here as well. After recalibrating the probabilities with temperature scaling, I identified the optimal threshold (T = 0.48) using the same G-mean criterion, as sketched below. I am surprised to see that the performance obtained with the optimal threshold (T = 0.18) on the baseline model's uncalibrated output probabilities and the performance obtained with the optimal threshold (T = 0.48) on the temperature-scaled, recalibrated output probabilities were exactly the same. How could I interpret this behavior?
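This is roughly how I select the operating threshold via the G-mean, shown here as a sketch using scikit-learn's ROC utilities; `y_true` and `y_prob` stand in for the test labels and the (uncalibrated or recalibrated) predicted probabilities, and are not my actual variable names.

```python
import numpy as np
from sklearn.metrics import roc_curve

def best_gmean_threshold(y_true, y_prob):
    """Return the threshold that maximizes sqrt(sensitivity * specificity)."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    gmeans = np.sqrt(tpr * (1 - fpr))   # geometric mean of sensitivity and specificity
    best = np.argmax(gmeans)
    return thresholds[best], gmeans[best]

# Example usage (placeholder data):
# t_base, g_base = best_gmean_threshold(y_test, baseline_probs)
# t_cal,  g_cal  = best_gmean_threshold(y_test, calibrated_probs)
```

I apply this search once to the baseline probabilities (giving T = 0.18) and once to the temperature-scaled probabilities (giving T = 0.48), and then compare the resulting classification metrics at those thresholds.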