Text-to-image models are a type of machine learning model that takes an input natural language description and produces an image matching that description. These models have made great strides in recent years, as evidenced by models like GLIDE, DALL-E 2, Imagen, and more. The most effective models generally combine a language model, which transforms the input text into a latent representation, and a generative image model, which produces an image conditioned on that representation.
https://abdulkaderhelwan.medium.com/text-to-image-generation-model-with-cnn-ca904427d1e7