OpenAI GPT-2 Large Language Model (LLM)


An OpenAI GPT-2 Large Language Model (LLM) is a transformer-based language modeling system, developed by OpenAI, that can predict the next word or phrase for a given natural language input sequence.
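As an informal illustration of this next-word prediction task, the sketch below asks a publicly released GPT-2 checkpoint for its single most likely next token. It assumes the Hugging Face transformers library, PyTorch, and the "gpt2" hub checkpoint (the smallest released model); none of these names appear in the definition above.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    # Load the smallest publicly released GPT-2 checkpoint and its tokenizer.
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    prompt = "The quick brown fox"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(input_ids).logits  # shape: (1, sequence_length, vocab_size)

    # The logits at the last position give the distribution over the next token.
    next_token_id = int(torch.argmax(logits[0, -1]))
    print(prompt + tokenizer.decode(next_token_id))

Sampling from this distribution one token at a time, rather than always taking the argmax, is how GPT-2 produces longer continuations.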



References

2023

  • chat
    • GPT-2, or Generative Pre-trained Transformer 2, is an autoregressive language model developed by OpenAI. It is based on the Transformer architecture introduced by Vaswani et al. in 2017. Like GPT-3, GPT-2 also employs a single stack of Transformer layers without separate encoder and decoder components. The architecture mainly consists of self-attention mechanisms and feed-forward layers.

      The full GPT-2 model has 1.5 billion parameters. However, OpenAI released several smaller versions of GPT-2 with fewer parameters, allowing users to choose a model that best fits their computational resources and performance requirements. Here's a list of the published GPT-2 model versions along with their number of parameters:

      • GPT-2 Small (also known as "117M"): 117 million parameters, the smallest GPT-2 model, designed for lower-resource tasks and faster response times.
      • GPT-2 Medium (also known as "345M"): 345 million parameters, offering a balance between performance and computational requirements.
      • GPT-2 Large (also known as "774M"): 774 million parameters, a larger model with improved performance compared to the smaller variants.
      • GPT-2 Extra Large (also known as "1.5B"): 1.5 billion parameters, the largest and most powerful GPT-2 model, delivering the highest-quality results for various NLP tasks.
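      To get a concrete sense of these size differences, the sketch below loads each released checkpoint and reports its parameter count and basic configuration. It assumes the Hugging Face transformers library and the hub names "gpt2", "gpt2-medium", "gpt2-large", and "gpt2-xl", which are not part of the text above, and it downloads each checkpoint (several gigabytes in total).

        from transformers import GPT2LMHeadModel

        # Hub names for the four released GPT-2 checkpoints (an assumption of this sketch).
        for name in ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]:
            model = GPT2LMHeadModel.from_pretrained(name)
            n_params = sum(p.numel() for p in model.parameters())
            cfg = model.config
            print(f"{name}: {n_params / 1e6:.0f}M parameters, "
                  f"{cfg.n_layer} layers, hidden size {cfg.n_embd}")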
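      For the architecture itself, the following is a simplified PyTorch sketch of one GPT-2-style decoder block: masked self-attention followed by a position-wise feed-forward network, each wrapped in a residual connection with layer normalization. The hyperparameter defaults and exact layer ordering are simplifying assumptions of this example, not OpenAI's released code.

        import torch
        import torch.nn as nn

        class GPT2Block(nn.Module):
            """One simplified GPT-2-style decoder block (pre-layer-norm)."""
            def __init__(self, d_model=768, n_heads=12, d_ff=3072):
                super().__init__()
                self.ln1 = nn.LayerNorm(d_model)
                self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                self.ln2 = nn.LayerNorm(d_model)
                self.ff = nn.Sequential(
                    nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

            def forward(self, x):
                # Causal mask: each position attends only to itself and earlier positions.
                seq_len = x.size(1)
                mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
                h = self.ln1(x)
                attn_out, _ = self.attn(h, h, h, attn_mask=mask)
                x = x + attn_out              # residual connection around attention
                x = x + self.ff(self.ln2(x))  # residual connection around feed-forward
                return x

        x = torch.randn(1, 10, 768)           # (batch, sequence length, hidden size)
        print(GPT2Block()(x).shape)           # torch.Size([1, 10, 768])

      The full model stacks many such blocks on top of token and position embeddings (12 blocks in the smallest released variant, 48 in the 1.5-billion-parameter model).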

2019b

  • (OpenAI, 2019) ⇒ https://openai.com/blog/better-language-models/
    • QUOTE: Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper.

       GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. (We created a new dataset which emphasizes diversity of content, by scraping content from the Internet. In order to preserve document quality, we used only pages which have been curated/filtered by humans; specifically, we used outbound links from Reddit which received at least 3 karma. This can be thought of as a heuristic indicator for whether other users found the link interesting (whether educational or funny), leading to higher data quality than other similar datasets, such as CommonCrawl.) ... GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data.
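       The quoted filtering heuristic can be read as a simple rule over Reddit submissions. The sketch below is a hypothetical illustration of that rule only; OpenAI's actual scraping pipeline and data structures were not released, so the class and field names here are assumptions.

         from dataclasses import dataclass

         @dataclass
         class RedditSubmission:
             outbound_url: str  # link the submission points to
             karma: int         # net upvotes the submission received

         def select_webtext_urls(submissions, min_karma=3):
             """Keep only outbound links whose submissions received at least
             min_karma, the proxy for document quality described above."""
             return {s.outbound_url for s in submissions if s.karma >= min_karma}

         sample = [
             RedditSubmission("https://example.org/essay", karma=12),
             RedditSubmission("https://example.org/low-quality", karma=1),
         ]
         print(select_webtext_urls(sample))  # only the 3+ karma link survives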

2019c

  • (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/OpenAI#GPT2 Retrieved:2019-9-8.
    • GPT2 (2019) is an AI system that generates text matching its input in subject and tone. For example, when fed the first sentence of George Orwell's novel Nineteen Eighty-Four it produces plausible futuristic fiction set in China. Unlike previous OpenAI products, GPT2 has not been released to the public out of concerns of potential misuse, including applications for writing fake news. Much of the academic community is skeptical that GPT2 poses a significant threat. The Allen Institute for Artificial Intelligence followed up with a tool to detect "neural fake news". Other researchers, like Jeremy Howard, warn of "the technology to totally fill Twitter, email, and the web up with reasonable-sounding, context-appropriate prose, which would drown out all other speech and be impossible to filter".
