I am working on an image classification competition for an internship. The dataset images are segmented buildings pasted onto a black background, and the task is to predict what the top of each building is made of (concrete, plastic, and so on), for a total of 6 classes. One challenge is that the images vary wildly in aspect ratio: some are very wide, some very tall. I padded them onto a square canvas to preserve the aspect ratio, but I am not getting anywhere.

Train, val, and test accuracies are all in sync, which seems too good to be true, and the confusion matrix does not show any major shift either. I get about 74% accuracy across all three sets, plus or minus 2%.
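For reference, my padding step is roughly equivalent to this (a minimal sketch, not my exact code; the 288 px target size, bilinear resize, and centered placement are placeholder choices):

```python
from PIL import Image

def pad_to_square(img: Image.Image, size: int = 288, fill=(0, 0, 0)) -> Image.Image:
    """Resize the longer side to `size`, then pad the shorter side with black
    so the original aspect ratio is kept on a square canvas."""
    w, h = img.size
    scale = size / max(w, h)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = img.resize((new_w, new_h), Image.BILINEAR)

    # Paste the resized image centered on a black square canvas.
    canvas = Image.new("RGB", (size, size), fill)
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas
```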
But when I submit predictions on the unlabeled data for the ranking, I get an F1 score of only 16%. I am using a frozen EfficientNet-B2 backbone. I have never worked with satellite images before, so what am I missing?
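For context, the model setup is roughly the following, written here as a PyTorch/torchvision sketch (the framework details, dropout value, and pretrained-weights choice are illustrative assumptions, not necessarily my exact code):

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 6  # concrete, plastic, etc.

# Load an ImageNet-pretrained EfficientNet-B2 and freeze every backbone weight.
backbone = models.efficientnet_b2(weights=models.EfficientNet_B2_Weights.IMAGENET1K_V1)
for p in backbone.parameters():
    p.requires_grad = False

# Swap in a fresh 6-way head; only these parameters are trained.
in_features = backbone.classifier[1].in_features
backbone.classifier = nn.Sequential(
    nn.Dropout(p=0.3),
    nn.Linear(in_features, NUM_CLASSES),
)
```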