Deep Learning has become an indispensable tool in Artificial Intelligence. Among its many applications, it is now commonly used to solve complex Computer Vision tasks through supervised learning.
For zero-shot image classification, a Siamese neural network can be used; however, models such as CLIP (Contrastive Language-Image Pre-training), UNITER (UNiversal Image-TExt Representation), ViLBERT (Vision-and-Language BERT), and GPT (Generative Pre-trained Transformer) are the state of the art (SOTA) for this task.
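To make the idea concrete, here is a minimal sketch of the scoring step that a CLIP-style model performs at inference time. The image and each candidate label prompt are embedded into a shared space, and the label whose embedding is most similar to the image embedding is chosen. The embeddings below are hand-written toy vectors, not outputs of a real encoder, and the names `zero_shot_classify`, `label_embeddings`, and the temperature value of 0.07 are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, temperature=0.07):
    """Pick the label whose text embedding is closest to the image embedding.

    `label_embeddings` maps a text prompt to its embedding vector. In a real
    system both embeddings would come from trained image and text encoders;
    here they are supplied by the caller (hypothetical toy values).
    """
    labels = list(label_embeddings)
    similarities = [
        cosine_similarity(image_embedding, label_embeddings[label])
        for label in labels
    ]
    probabilities = softmax(similarities, temperature)
    best_index = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best_index], dict(zip(labels, probabilities))


# Toy embeddings standing in for encoder outputs (assumed values).
label_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]

predicted_label, probabilities = zero_shot_classify(image_embedding, label_embeddings)
print(predicted_label)
```

Because the set of labels is just a dictionary of text prompts, new classes can be added at inference time without retraining, which is what makes the approach zero-shot.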