OpenAI GPT-2 Large Language Model (LLM)
An OpenAI GPT-2 Large Language Model (LLM) is a transformer-based language modeling system, developed by OpenAI, that can predict the next word or phrase for an input natural language sequence.
- Context:
- It can be a successor to GPT-1 and a predecessor to GPT-3.
- It can have source code available at: https://github.com/openai/gpt-2
- It can be evaluated by a GPT-2 Benchmark Task.
- System's Architecture - It is based on an OpenAI-GPT LM Neural Network with the following modifications:
- a layer normalization (Ba et al., 2016) moved to the input of each sub-block, as in a pre-activation residual network (He et al., 2016);
- an additional layer normalization added after the final self-attention block;
- a modified weight initialization that accounts for the accumulation on the residual path with model depth;
- a scaling of the residual layers' weights at initialization by a factor of $1/\sqrt{N}$, where $N$ is the number of residual layers;
- an extended vocabulary with 50,257 items;
- a context size of 1024 tokens (increased from 512) and a batch size of 512.
- Training System and other ML tools:
- It trains a GPT-2 LM Neural Network to solve several NLP and NLG tasks, such as modelling of common names, named entities, and long-range dependencies in text, as well as neural machine translation, text summarization, and question-answering generation.
- It uses a byte pair encoding (BPE) for constructing input vector representations.
- It uses a GPT-2 web scraper to create the WebText dataset.
- It can (typically) be a Trained GPT-2 Model such as
- …
- Example(s):
- one as described in (Radford et al., 2019).
- GPT-2 Small (also known as "117M"; not to be confused with DistilGPT-2, which is a separately distilled, smaller model).
- GPT-2 Medium (also known as "345M").
- GPT-2 Large (also known as "774M").
- GPT-2 Extra Large (also known as "1.5B").
- GPT-2 Explorer Online System (https://gpt2.apps.allenai.org/).
- a Trained GPT-2 ONNX Format Model [1].
- …
- Counter-Example(s):
- See: Glutamic-Pyruvic Transaminase 2, OpenCog, Open Neural Network Exchange, Open-Source Robotics, Transformer Network, Language Model, Artificial Intelligence.
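The architectural modifications listed under System's Architecture can be illustrated with a minimal, dependency-free sketch. It is not OpenAI's implementation: the function names are hypothetical, the layer normalization omits the learned gain and bias, the sub-layer is a stand-in for self-attention or the feed-forward network, and the base standard deviation of 0.02 is an assumed value chosen only for illustration.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and (approximately) unit variance.

    Simplification: no learned gain or bias parameters.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_ln_residual(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Pre-activation sub-block: normalize the input, apply the sub-layer,
    then add the result back onto the untouched residual path."""
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]


def residual_init_std(base_std: float, n_residual_layers: int) -> float:
    """Standard deviation for residual-layer weights, scaled by 1/sqrt(N)."""
    return base_std / math.sqrt(n_residual_layers)


# Usage: one sub-block applied to a random 8-dimensional activation.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]
out = pre_ln_residual(x, lambda h: [0.5 * v for v in h])  # stand-in sub-layer

# Assumed base std of 0.02; N = 4 is an arbitrary illustrative depth.
scaled_std = residual_init_std(0.02, 4)
```

Because the normalization sits on the sub-layer input rather than on the sum, the residual path carries `x` forward unchanged, which is why the initialization is scaled down as the number of residual layers grows.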
References
2023
- chat
- GPT-2, or Generative Pre-trained Transformer 2, is an autoregressive language model developed by OpenAI. It is based on the Transformer architecture introduced by Vaswani et al. in 2017. Like GPT-3, GPT-2 also employs a single stack of Transformer layers without separate encoder and decoder components. The architecture mainly consists of self-attention mechanisms and feed-forward layers.
The full GPT-2 model has 1.5 billion parameters. However, OpenAI released several smaller versions of GPT-2 with fewer parameters, allowing users to choose a model that best fits their computational resources and performance requirements. Here's a list of the published GPT-2 model versions along with their number of parameters:
- GPT-2 Small (also known as "117M"): 117 million parameters, the smallest GPT-2 model, designed for lower-resource tasks and faster response times.
- GPT-2 Medium (also known as "345M"): 345 million parameters, offering a balance between performance and computational requirements.
- GPT-2 Large (also known as "774M"): 774 million parameters, a larger model with improved performance compared to the smaller variants.
- GPT-2 Extra Large (also known as "1.5B"): 1.5 billion parameters, the largest and most powerful GPT-2 model, delivering the highest-quality results for various NLP tasks.
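The parameter counts in the list above can be roughly cross-checked from the model dimensions. The sketch below is an estimate under stated assumptions, not an official count: the layer counts and hidden sizes are taken from Table 2 of Radford et al. (2019), the block layout (fused query/key/value projection, a feed-forward layer four times the hidden width, two layer normalizations per block, and input embeddings shared with the output layer) is assumed, and `GPT2Config` and `count_parameters` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPT2Config:
    n_layer: int              # number of Transformer blocks
    d_model: int              # hidden width
    vocab_size: int = 50257   # vocabulary size stated for GPT-2
    n_ctx: int = 1024         # context size stated for GPT-2


def count_parameters(cfg: GPT2Config) -> int:
    """Estimate trainable parameters, assuming tied input/output embeddings."""
    d = cfg.d_model
    attention = 4 * d * d + 4 * d   # Q/K/V and output projections, with biases
    mlp = 8 * d * d + 5 * d         # d -> 4d -> d feed-forward, with biases
    layer_norms = 4 * d             # two layer norms per block (gain and bias)
    per_block = attention + mlp + layer_norms
    embeddings = cfg.vocab_size * d + cfg.n_ctx * d  # token plus position tables
    final_layer_norm = 2 * d
    return cfg.n_layer * per_block + embeddings + final_layer_norm


# Dimensions assumed from Table 2 of Radford et al. (2019).
CONFIGS = {
    "small": GPT2Config(12, 768),
    "medium": GPT2Config(24, 1024),
    "large": GPT2Config(36, 1280),
    "xl": GPT2Config(48, 1600),
}

estimates = {name: count_parameters(cfg) for name, cfg in CONFIGS.items()}
for name, n in estimates.items():
    print(f"{name}: about {n / 1e6:.0f}M parameters")
```

Under these assumptions the two larger configurations land close to their "774M" and "1.5B" labels, while the two smaller ones come out somewhat above the nominal "117M" and "345M" figures, so those labels are best read as names rather than exact counts.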
2019b
- (OpenAI, 2019) ⇒ https://openai.com/blog/better-language-models/
- QUOTE: Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper.
GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages (a new dataset which emphasizes diversity of content, created by scraping content from the Internet. In order to preserve document quality, we used only pages which have been curated/filtered by humans; specifically, we used outbound links from Reddit which received at least 3 karma. This can be thought of as a heuristic indicator for whether other users found the link interesting (whether educational or funny), leading to higher data quality than other similar datasets, such as CommonCrawl.). ... GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data.
2019c
- (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/OpenAI#GPT2 Retrieved:2019-9-8.
- GPT2 (2019) is an AI system that generates text matching its input in subject and tone. For example, when fed the first sentence of George Orwell's novel Nineteen Eighty-Four it produces plausible futuristic fiction set in China. Unlike previous OpenAI products, GPT2 has not been released to the public out of concerns of potential misuse, including applications for writing fake news. Much of the academic community is skeptical that GPT2 poses a significant threat. The Allen Institute for Artificial Intelligence followed up with a tool to detect "neural fake news". Other researchers, like Jeremy Howard, warn of "the technology to totally fill Twitter, email, and the web up with reasonable-sounding, context-appropriate prose, which would drown out all other speech and be impossible to filter".
2019d
- (Lee & Hsiang, 2019) ⇒ Jieh-Sheng Lee, and Jieh Hsiang. (2019). “Patent Claim Generation by Fine-Tuning OpenAI GPT-2.”
- QUOTE: Deep learning and pre-training models have demonstrated excellent results in several language tasks recently. Particularly, fine-tuning the pre-trained models such as ELMO (Embeddings from Language Models) [1], OpenAI GPT (Generative Pre-Training) [2], GPT-2 [3] and BERT (Bidirectional Encoder Representations from Transformers) [4] has become the best practice for state-of-the-art results. GPT-2 is the successor to GPT. Although both GPT-2 and BERT are capable of text generation, Wang and Cho [5] found that GPT-2 generations are of better quality. In fact, GPT-2 is claimed to be so powerful that the risk of its malicious use is high. For this reason, OpenAI decided to keep its largest model (1.5B parameters) closed so that there is more time to discuss its ramifications.
2019a
- (Radford et al., 2019) ⇒ Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. (2019). “Language Models Are Unsupervised Multitask Learners.” In: OpenAI Blog Journal, 1(8).
- QUOTE: We trained and benchmarked four LMs with approximately log-uniformly spaced sizes. The architectures are summarized in Table 2. The smallest model is equivalent to the original GPT, and the second smallest equivalent to the largest model from BERT (Devlin et al., 2018). Our largest model, which we call GPT-2, has over an order of magnitude more parameters than GPT. The learning rate of each model was manually tuned for the best perplexity on a 5% held-out sample of WebText. All models still underfit WebText and held-out perplexity has as of yet improved given more training time.