Text-to-Image Model

From GM-RKB

A Text-to-Image Model is a generative model that accepts a text input and produces an image.



References

2023

  • chat
    • Researchers and organizations have developed several text-to-image models. Here are a few examples:
      • DALL-E: DALL-E is a neural network-based generative model developed by OpenAI that can generate images from textual input by combining various objects, animals, and scenes in novel and creative ways.
      • CLIPDraw: CLIPDraw is a model by Kevin Frans, L.B. Soros, and Olaf Witkowski (2021) that generates drawings from textual descriptions. It builds on OpenAI's CLIP (Contrastive Language-Image Pre-training) model, which scores how well an image matches a text, and optimizes a set of vector strokes so that the drawing corresponds to the input text.
      • StackGAN: StackGAN is a model that generates high-resolution images from textual descriptions by using a two-stage generative approach. The model first generates a low-resolution image from the text input and then refines it to generate a high-resolution image.
      • AttnGAN: AttnGAN is a model that generates images from textual descriptions by using an attention mechanism that attends to the most relevant words in the description when synthesizing each region of the image. This word-level conditioning is intended to improve fine-grained detail compared with conditioning on a single sentence-level vector.
      • Generative Adversarial Text to Image Synthesis (2016) by Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee.
      • TAC-GAN (2017) by Ayushman Dash, John Cristian Borges Gamboa, Sheraz Ahmed, Marcus Liwicki, and Muhammad Zeshan Afzal.
      • MirrorGAN (2019) by Tingting Qiao, Jing Zhang, Duanqing Xu, and Dacheng Tao.
      • DM-GAN (2019) by Minfeng Zhu, Pingbo Pan, Wei Chen, and Yi Yang.
      • VQGAN+CLIP (2021) by Katherine Crowson. This model is a combination of a generative model called VQGAN and a language-image pre-trained model called CLIP, which allows it to generate images from text inputs.
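
The two-stage approach described for StackGAN above (a coarse image first, then a refinement to higher resolution) can be illustrated with a toy sketch. This is not StackGAN itself, which uses trained conditional GANs at both stages; the functions `stage_one` and `stage_two`, the fixed `embedding` vector, and the nearest-neighbour upsampling are all assumptions chosen only to show the data flow.

```python
from typing import List

Image = List[List[float]]  # grayscale image with values in [0, 1]


def stage_one(text_embedding: List[float], size: int = 4) -> Image:
    """Stage I stand-in: build a coarse size x size image from the text embedding.

    Assumes a non-empty embedding with values in [0, 1].
    """
    n = len(text_embedding)
    return [
        [text_embedding[(y * size + x) % n] for x in range(size)]
        for y in range(size)
    ]


def stage_two(low_res: Image, text_embedding: List[float], scale: int = 2) -> Image:
    """Stage II stand-in: upsample the coarse image, then add text-conditioned detail."""
    n = len(text_embedding)
    high_res: Image = []
    for y in range(len(low_res) * scale):
        row = []
        for x in range(len(low_res[0]) * scale):
            base = low_res[y // scale][x // scale]      # nearest-neighbour upsampling
            detail = 0.1 * text_embedding[(x + y) % n]  # small refinement term
            row.append(min(1.0, max(0.0, base + detail)))  # clamp to [0, 1]
        high_res.append(row)
    return high_res


embedding = [0.2, 0.8, 0.5, 0.0]     # hypothetical text embedding
coarse = stage_one(embedding)        # 4 x 4 low-resolution image
fine = stage_two(coarse, embedding)  # 8 x 8 refined image
```

The point of the sketch is that the second stage receives both the low-resolution result and the text representation, so it can correct or add detail rather than only enlarge the image.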

2022

  • (Wikipedia, 2022) ⇒ https://en.wikipedia.org/wiki/Text-to-image_model Retrieved:2022-12-12.
    • A text-to-image model is a machine learning model which takes as input a natural language description and produces an image matching that description. Such models began to be developed in the mid-2010s, as a result of advances in deep neural networks. In 2022, the output of state-of-the-art text-to-image models, such as OpenAI's DALL-E 2, Google Brain's Imagen, and StabilityAI's Stable Diffusion, began to approach the quality of real photographs and human-drawn art.

      Text-to-image models generally combine a language model, which transforms the input text into a latent representation, and a generative image model, which produces an image conditioned on that representation. The most effective models have generally been trained on massive amounts of image and text data scraped from the web.
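
The composition described above (a text encoder producing a latent representation, followed by an image generator conditioned on it) can be sketched in a few lines. This is a minimal, untrained toy under stated assumptions: `encode_text`, `generate_image`, `text_to_image`, and `LATENT_DIM` are hypothetical names, the "language model" is just word hashing, and the "generator" is a fixed function of the latent. Real systems use trained neural networks for both parts.

```python
import hashlib
from typing import List

LATENT_DIM = 8  # hypothetical latent size for this toy example


def encode_text(prompt: str, dim: int = LATENT_DIM) -> List[float]:
    """Toy stand-in for the language model: prompt -> fixed-size latent vector.

    Each word is hashed into one slot of the vector, and the result is
    scaled to unit length. An all-zero vector is returned for an empty prompt.
    """
    latent = [0.0] * dim
    for word in prompt.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        latent[digest[0] % dim] += 1.0
    norm = sum(v * v for v in latent) ** 0.5
    return [v / norm for v in latent] if norm > 0 else latent


def generate_image(latent: List[float], height: int = 4, width: int = 4) -> List[List[int]]:
    """Toy stand-in for the image model: latent -> grayscale pixels in 0..255.

    Every pixel depends only on the latent, so the image is conditioned
    on the text representation and nothing else.
    """
    dim = len(latent)
    return [
        [int(round(255 * latent[(y * width + x) % dim])) for x in range(width)]
        for y in range(height)
    ]


def text_to_image(prompt: str, height: int = 4, width: int = 4) -> List[List[int]]:
    """Compose the two parts: encode the text, then generate from the latent."""
    return generate_image(encode_text(prompt), height, width)


image = text_to_image("a red bird on a branch")
```

Keeping the encoder and generator as separate functions mirrors the description: either part could be swapped for a trained component without changing the other's interface.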