1-During prediction, how is the process actually applied? Would each bounding box with a different aspect ratio be allowed to output a prediction and an offset?
2-How does the convolution layer actually produce exactly 5 outputs (4 offsets and 1 confidence)?
3-How is the scale related to the dimensions of the feature map? Would I apply bounding boxes with different scales on the same feature map?