Text-to-* Model Prompt Programming Task


A Text-to-* Model Prompt Programming Task is a programming task that requires the creation of AI model text prompts (for a text-to-* model) in order to solve a prompt-based text-to-* model inference task.
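
A minimal, library-free sketch of such a task is shown below; the generate callable and the summarization template are hypothetical stand-ins for whichever text-to-* model and inference task are actually in use.

    # Minimal sketch of a prompt programming task (assumption: `generate`
    # stands in for any text-to-* model inference call; the template
    # wording is illustrative, not from any particular system).

    def build_summarization_prompt(document: str, max_words: int = 50) -> str:
        """Embed the task description directly in the model input."""
        return (
            f"Summarize the following text in at most {max_words} words.\n\n"
            f"Text:\n{document}\n\nSummary:"
        )

    def solve_task(document: str, generate) -> str:
        """Run the prompt through the (hypothetical) text-to-* model."""
        return generate(build_summarization_prompt(document))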



References

2023

  • (Wikipedia, 2023) ⇒ https://en.wikipedia.org/wiki/Prompt_engineering Retrieved:2023-6-17.
    • Prompt engineering is a concept in artificial intelligence, particularly natural language processing. In prompt engineering, the description of the task that the AI is supposed to accomplish is embedded in the input, e.g. as a question, instead of it being explicitly given. Prompt engineering typically works by converting one or more tasks to a prompt-based dataset and training a language model with what has been called "prompt-based learning" or just "prompt learning".
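
The "converting one or more tasks to a prompt-based dataset" step described above can be sketched as follows; the sentiment template and verbalizer are illustrative assumptions rather than a fixed standard.

    # Sketch: turn (text, label) pairs into (prompt, target) pairs for
    # prompt-based learning. Template and verbalizer are assumptions.

    SENTIMENT_TEMPLATE = "{text} Overall, the experience was {answer}."
    VERBALIZER = {0: "terrible", 1: "great"}  # label id -> answer word

    def to_prompt_example(text: str, label: int) -> dict:
        return {
            "prompt": SENTIMENT_TEMPLATE.format(text=text, answer="___"),
            "target": VERBALIZER[label],
        }

    examples = [("The plot was gripping from start to finish.", 1)]
    prompt_dataset = [to_prompt_example(t, y) for t, y in examples]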

2023a

Overall, prompt engineering is a key component of developing and optimizing text-to-* models, as it can help to improve their accuracy, relevance, and effectiveness for a given task or application.

2023b

  • (ChatGPT-OpenAI, 2023) ⇒ https://chat.openai.com
    • ... Another term that more specifically reflects the AI and NLP-focused nature of prompt engineering is “prompt programming”. Prompt programming refers to the process of creating prompts or queries that are used to elicit specific responses from NLP models. The term “programming” emphasizes the technical nature of the task and suggests a more structured approach to designing prompts tailored to the needs of specific NLP models. ...
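
The "more structured approach to designing prompts" mentioned above can be made concrete with a small sketch; the PromptSpec class and its field names are hypothetical, not taken from any particular framework.

    # Sketch of structured prompt programming: a prompt is assembled
    # from explicit parts rather than written ad hoc. All names here
    # are illustrative assumptions.

    from dataclasses import dataclass, field

    @dataclass
    class PromptSpec:
        role: str                       # persona the model should adopt
        task: str                       # what the model is asked to do
        constraints: list = field(default_factory=list)
        input_text: str = ""

        def render(self) -> str:
            rules = "\n".join(f"- {c}" for c in self.constraints)
            return (f"You are {self.role}.\n{self.task}\n{rules}\n\n"
                    f"Input:\n{self.input_text}")

    spec = PromptSpec(
        role="a careful technical editor",
        task="Rewrite the input in plain English.",
        constraints=["Keep it under 100 words.", "Preserve all numbers."],
        input_text="The aforementioned modalities were heretofore unexamined.",
    )
    prompt = spec.render()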

2023c

  • (Liu et al., 2023) ⇒ Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. (2023). “Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.” In: ACM Computing Surveys, 55(9).
    • QUOTE: ... Now, as of this writing in 2021, we are in the middle of a second sea change, in which the “pre-train, fine-tune” procedure is replaced by one in which we dub “pre-train, prompt, and predict.” In this paradigm, instead of adapting pre-trained LMs to downstream tasks via objective engineering, downstream tasks are reformulated to look more like those solved during the original LM training with the help of a textual prompt. For example, when recognizing the emotion of a social media post, “I missed the bus today,” we may continue with a prompt “I felt so ___” and ask the LM to fill the blank with an emotion-bearing word. Or if we choose the prompt “English: I missed the bus today. French: ___”, then an LM may be able to fill in the blank with a French translation. In this way, by selecting the appropriate prompts we can manipulate the model behavior so that the pre-trained LM itself can be used to predict the desired output, sometimes even without any additional task-specific training (Table 1(d); e.g., Brown et al. [9], Petroni et al. [100], Radford et al. [105], Schick and Schütze [120]). The advantage of this method is that, given a suite of appropriate prompts, a single LM trained in an entirely unsupervised fashion can be used to solve a great number of tasks [9, 131]. However, as with most conceptually enticing prospects, there is a catch — this method introduces the necessity for prompt engineering, finding the most appropriate prompt to allow a LM to solve the task at hand. ...
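
The cloze-style prompt in the quote above can be reproduced almost directly with a masked language model; the sketch below assumes a recent version of the Hugging Face transformers library and bert-base-uncased, though any fill-mask model would do.

    # Sketch of "pre-train, prompt, and predict": emotion recognition is
    # reformulated as fill-in-the-blank, with no task-specific training.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    prompt = "I missed the bus today. I felt so [MASK]."
    for candidate in fill_mask(prompt, top_k=5):
        # Each candidate carries a predicted word and its probability;
        # an emotion-bearing word ("bad", "sad", ...) signals the label.
        print(candidate["token_str"], round(candidate["score"], 3))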

2023d

2022

  • (Zhou et al., 2022) ⇒ Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. (2022). “Learning to Prompt for Vision-language Models.” In: International Journal of Computer Vision, 130(9).
    • ABSTRACT: Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from the traditional representation learning that is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task via prompting, i.e., classification weights are synthesized from natural language describing classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming—one needs to spend a significant amount of time on words tuning since a slight change in wording could have a huge impact on performance. Inspired by recent advances in prompt learning research in natural language processing (NLP), we propose Context Optimization (CoOp), a simple approach specifically for adapting CLIP-like vision-language models for downstream image recognition. Concretely, CoOp models a prompt’s context words with learnable vectors while the entire pre-trained parameters are kept fixed. To handle different image recognition tasks, we provide two implementations of CoOp: unified context and class-specific context. Through extensive experiments on 11 datasets, we demonstrate that CoOp requires as few as one or two shots to beat hand-crafted prompts with a decent margin and is able to gain significant improvements over prompt engineering with more shots, e.g., with 16 shots the average gain is around 15% (with the highest reaching over 45%). Despite being a learning-based approach, CoOp achieves superb domain generalization performance compared with the zero-shot model using hand-crafted prompts.
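
A schematic PyTorch sketch of the CoOp idea described above is given below: learnable context vectors are prepended to class-name embeddings while the pre-trained encoder is kept frozen. The text_encoder argument, tensor shapes, and dimensions are stand-ins, not the actual CLIP interface.

    # CoOp-style prompt learning (schematic): optimize only the context
    # vectors; the pre-trained text encoder stays frozen.
    import torch
    import torch.nn as nn

    class PromptLearner(nn.Module):
        def __init__(self, text_encoder: nn.Module, class_embeddings: torch.Tensor,
                     n_ctx: int = 16, dim: int = 512):
            super().__init__()
            # Unified context: one shared set of learnable "context words".
            self.ctx = nn.Parameter(0.02 * torch.randn(n_ctx, dim))
            # Pre-computed class-name token embeddings: [n_cls, n_tok, dim].
            self.register_buffer("cls_emb", class_embeddings)
            self.text_encoder = text_encoder
            for p in self.text_encoder.parameters():  # freeze pre-trained weights
                p.requires_grad_(False)

        def forward(self) -> torch.Tensor:
            n_cls = self.cls_emb.shape[0]
            ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)  # [n_cls, n_ctx, dim]
            prompts = torch.cat([ctx, self.cls_emb], dim=1)    # context + class tokens
            return self.text_encoder(prompts)  # per-class text features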