I'm working on multimodal emotion detection, using a dataset of paired images and text. Each image comes with a related caption, like the posts people share on Instagram or Facebook, where the caption expresses the person's feelings.