InstructGPT LLM Model

From GM-RKB
(Redirected from InstructGPT)
Jump to navigation Jump to search

An InstructGPT LLM Model is a finetuned LLM model that is trained to follow instructions and complete requests thoughtfully.



References

2022

  • https://openai.com/blog/instruction-following/
    • QUOTE: We’ve trained language models that are much better at following user intentions than GPT-3 while also making them more truthful and less toxic, using techniques developed through our alignment research. These InstructGPT models, which are trained with humans in the loop, are now deployed as the default language models on our API.

2022

  • https://github.com/openai/following-instructions-human-feedback/blob/main/model-card.md
    • QUOTE: InstructGPT is a GPT-style language model. Researchers at OpenAI developed the model by fine-tuning GPT-3 to follow instructions using human feedback. There are three model sizes: 1.3B, 6B, and 175B parameters.
      • Model date: January 2022
      • Model type: Language model
    • InstructGPT is then further fine-tuned on a dataset labeled by human labelers. The labelers comprise a team of about 40 contractors whom we hired through Upwork and ScaleAI. Our aim was to select a group of labelers who were sensitive to the preferences of different demographic groups, and who were good at identifying outputs that were potentially harmful. Thus, we conducted a screening test designed to measure labeler performance on these axes. We selected labelers who performed well on this test. We collaborated closely with the labelers over the course of the project. We had an onboarding process to train labelers on the project, wrote detailed instructions for each task, and answered labeler questions in a shared chat room.

      The dataset consists of input prompts (from the OpenAI API or written by labelers), demonstrations of the desired model behavior written by our labelers, and labeler rankings of outputs from multiple models. The text prompts submitted to the OpenAI API were from an earlier version of the InstructGPT models (trained via supervised learning on a subset of our demonstration data on the Playground interface). Customers using the Playground were informed that their data could be used to train further models via a recurring notification any time InstructGPT models were used. To reduce the risk of the models learning potentially sensitive customer details, we filtered all prompts in the training split for personally identifiable information (PII). Some of our prompts were also written by contractors themselves, because we needed an initial source of instruction-like prompts to bootstrap the process, and these kinds of prompts weren't often submitted to the regular GPT-3 models on the API.

2022

  • (Ouyang et al., 2022) ⇒ Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. (2022). “Training Language Models to Follow Instructions with Human Feedback.” arXiv preprint arXiv:2203.02155
    • ABSTRACT: Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.