UNet is 'relatively' recent and was designed mainly for biomedical images. For a limited dataset of images, is it also possible to use convolution filters + Random Forest instead of UNet?
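You can take any pre-trained network, such as VGG16 or ResNet50, and use the output of its last conv layer as a feature representation. (In practice, apply global average pooling to that output to get one vector per image. ResNet50 has this layer built in; the original VGG16 does not, so you add it yourself, for example with the pooling='avg' option in Keras.) Then train a Random Forest classifier on that representation for your classification task.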
Also, this approach works best when you have very few images. If you have even around 1000 images, try training or fine-tuning the network instead.
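A minimal sketch of the feature-extraction idea described above, assuming Keras (TensorFlow) and scikit-learn are available. The function names `extract_features` and `fit_random_forest`, the 224x224 RGB input size, the number of trees, and the 3-fold cross-validation are illustrative choices of mine, not something prescribed by the answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def extract_features(images):
    """Return one feature vector per image from a pre-trained ResNet50.

    Assumption: `images` is an array of shape (n, 224, 224, 3), RGB order,
    pixel values in [0, 255]. Keras is imported lazily so the Random Forest
    part of this sketch can be used without TensorFlow installed. The
    ImageNet weights are downloaded the first time the model is built.
    """
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

    # include_top=False drops the ImageNet classifier head; pooling="avg"
    # applies global average pooling to the last conv output -> (n, 2048).
    backbone = ResNet50(weights="imagenet", include_top=False, pooling="avg")
    batch = preprocess_input(np.array(images, dtype="float32"))  # works on a copy
    return backbone.predict(batch, verbose=0)


def fit_random_forest(features, labels, n_trees=200, seed=0):
    """Fit a Random Forest on precomputed features.

    Returns the fitted classifier and its 3-fold cross-validation accuracies,
    which give a rough idea of whether the frozen features are good enough.
    """
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    scores = cross_val_score(clf, features, labels, cv=3)
    clf.fit(features, labels)
    return clf, scores
```

Typical use would be `features = extract_features(images)` followed by `clf, scores = fit_random_forest(features, labels)`. The network stays frozen here, so only the Random Forest is trained, which is why this can work with very few images.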