I am training a Mask-RCNN model with multiple classes. For evaluation purposes I need to know how to correctly calculate the mAP (mean Average Precision), mAR (mean Average Recall), and F1 score with k-fold cross-validation. I have noticed different code segments addressing this in the issues section of the official repository, but there are mainly two approaches to calculating the F1 score, and the discussion about which one is correct is still ongoing. The source code below is extracted from the issues section of the Mask-RCNN repository (link: https://github.com/matterport/Mask_RCNN/issues/2474). Whichever approach turns out to be correct, to my knowledge F1 is defined as follows.
PR (Precision)
RC (Recall)
F1 score = (2 x PR x RC) / (PR + RC), multiplied by 100 if expressed as a percentage
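Written out as plain Python (independent of the Mask-RCNN code), this is how I read that definition; f1_score here is just my own helper, not something from mrcnn:

def f1_score(pr, rc):
    # Harmonic mean of precision and recall; guard against division by zero
    if pr + rc == 0:
        return 0.0
    return (2 * pr * rc) / (pr + rc)

# e.g. PR = 0.8, RC = 0.6 -> F1 ~= 0.686 (or 68.6 as a percentage)
print(f1_score(0.8, 0.6))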
So I need to know:
1) Does PR = mAP and RC = mAR?
2) If yes, does calculating PR for a model mean calculating the mAP, and does calculating RC for a model mean calculating the mAR? Is my argument correct?
3) What do the precisions and recalls arrays contain?
4) What's the correct way to calculate the mAP, mAR, and F1 metrics?
5) If I am using k-fold cross-validation, should I calculate each of these values at the end of each fold and then take the average (roughly as in the sketch right after this list)?
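To make question 5 concrete, this is roughly the loop I have in mind. It is only a sketch: fold_datasets, train_model, and build_inference_model are placeholders for my own data splitting and training code, and evaluate_model is the function shown below under Method 1.

from numpy import mean

fold_mAPs, fold_mARs, fold_F1s = [], [], []
for dataset_train, dataset_val in fold_datasets:    # one (train, val) split per fold
    train_model(dataset_train)                       # placeholder: train on this fold's training split
    model = build_inference_model()                  # placeholder: reload weights in inference mode
    mAP, mAR, F1_scores = evaluate_model(dataset_val, model, inference_config)
    fold_mAPs.append(mAP)
    fold_mARs.append(mAR)
    fold_F1s.append(mean(F1_scores))                 # per-fold mean of the per-image F1 values

# Is averaging across folds like this the right way to report the final numbers?
print("cross-validated mAP: %.3f" % mean(fold_mAPs))
print("cross-validated mAR: %.3f" % mean(fold_mARs))
print("cross-validated F1:  %.3f" % mean(fold_F1s))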
Method 1
from mrcnn.model import load_image_gt
from mrcnn.model import mold_image
from mrcnn.utils import compute_ap, compute_recall
from numpy import expand_dims
from numpy import mean
from mrcnn import utils

def evaluate_model(dataset, model, cfg):
    APs = list()
    ARs = list()
    F1_scores = list()
    for image_id in dataset.image_ids:
        # Load the ground-truth boxes, class ids, and masks for this image
        # image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
        image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id)
        # Mold the image into the format the model expects and add a batch dimension
        scaled_image = mold_image(image, cfg)
        sample = expand_dims(scaled_image, 0)
        # Run detection and take the results for the first (only) image in the batch
        yhat = model.detect(sample, verbose=0)
        r = yhat[0]
        # Average precision for this image plus its precision/recall arrays
        AP, precisions, recalls, overlaps = utils.compute_ap(gt_bbox, gt_class_id, gt_mask,
                                                             r["rois"], r["class_ids"], r["scores"], r['masks'])
        # Recall of the predicted boxes against the ground truth at the given IoU threshold
        AR, positive_ids = compute_recall(r["rois"], gt_bbox, iou=0.2)
        ARs.append(AR)
        F1_scores.append((2 * (mean(precisions) * mean(recalls))) / (mean(precisions) + mean(recalls)))  # Method 1
        APs.append(AP)
    mAP = mean(APs)
    mAR = mean(ARs)
    return mAP, mAR, F1_scores
Method 2
mAP, mAR, F1_scores = evaluate_model(dataset_val, model, inference_config)
print("mAP: %.3f" % mAP)
print("mAR: %.3f" % mAR)
# Method 1: one F1 value per image, returned as a list
print("first way to calculate f1-score: ", F1_scores)
# Method 2: a single F1 computed from the dataset-level mAP and mAR
F1_score_2 = (2 * mAP * mAR) / (mAP + mAR)
print("second way to calculate f1-score: ", F1_score_2)
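To show why I think the two methods can give different answers, here is a toy example with made-up precision/recall values for two imaginary images. The mean of the per-image F1 values (Method 1) is not the same as the F1 computed from the averaged precision and recall (Method 2):

from numpy import mean

# Made-up (precision, recall) means for two imaginary images
per_image = [(0.9, 0.3), (0.4, 0.8)]

# Method 1 style: F1 per image, then averaged
f1_per_image = [(2 * p * r) / (p + r) for p, r in per_image]
print("mean of per-image F1:", mean(f1_per_image))          # ~0.492

# Method 2 style: average precision and recall first, then one F1
mean_p = mean([p for p, r in per_image])
mean_r = mean([r for p, r in per_image])
print("F1 of the averages:", (2 * mean_p * mean_r) / (mean_p + mean_r))  # ~0.596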