Pretrained Large Language Model (LLM)
A [[Pretrained Large Language Model (LLM)]] is a [[pretrained language model]] that is a [[large language model]].
* <B>Context:</B>
** It can be an input to an [[In-Context Learning System]] (see the usage sketch below the definition).
** It can be an input to an [[LLM Fine-Tuning System]].
** ...
** It can range from being a [[Pure Pretrained LLM]] to being a [[Finetuned LLM]] (such as an [[instruction-tuned LLM]]).
** ...
* <B>Example(s):</B>
** a [[General Purpose Pretrained LLM]], such as [[GPT-4]].
** a [[Domain-Specific Pretrained LLM]], such as:
*** a [[Pretrained Biomedical LLM]] (e.g. [[BioGPT]]) or a [[Pretrained Protein LLM]].
*** a [[Pretrained Software LLM]], such as [[Codex LLM]].
*** a [[Pretrained Finance LLM]], such as [[Bloomberg LLM]].
*** a [[Pretrained Legal LLM]].
** a [[Proprietary Pretrained LLM]], such as:
*** a [[Google Pretrained LLM]], [[Azure Pretrained LLM]], ...
** a [[Base LLM]], such as [[llama31-405b-base-bf-16]].
** …
* <B>Counter-Example(s):</B>
** a [[Pre-Trained Small Language Model]].
** a [[Pre-Trained Image Generation Model]].
* <B>See:</B> [[Language Model Metamodel]], [[LLM Architecture]], [[ULMFiT]].
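The following is a minimal, illustrative sketch (not drawn from the cited sources) of the two usage paths named in the Context section: a [[Pretrained Large Language Model (LLM)]] serving as input to an [[In-Context Learning System]] (few-shot prompting with frozen weights) and to an [[LLM Fine-Tuning System]] (further gradient updates on task data). It assumes the Hugging Face <code>transformers</code> and <code>torch</code> libraries and uses the openly licensed [[GPT-2]] checkpoint <code>gpt2</code> as a stand-in for a larger pretrained LLM; the prompt and training text are hypothetical placeholders.
<pre>
# Minimal sketch: one pretrained LLM, two downstream uses.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # pretrained tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")   # pretrained LLM weights

# (1) In-context learning: the pretrained weights stay frozen;
#     the task is specified purely through a few-shot prompt.
prompt = "Translate English to French:\nsea otter => loutre de mer\ncheese =>"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# (2) Fine-tuning: the same pretrained weights are the starting point
#     for further optimization on task-specific text (one step shown).
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
batch = tokenizer("Example task-specific training text.", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss  # standard causal-LM loss
loss.backward()
optimizer.step()
</pre>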
----
----
== References ==
=== 2023 ===
* (Wikipedia, 2023) ⇒ https://en.wikipedia.org/wiki/Large_language_model#List_of_large_language_models Retrieved:2023-3-19.
{| class="wikitable sortable"
|+ List of large language models
|-
! Name !! Release date{{efn|This is the date that documentation describing the model's architecture was first released.}} !! Developer !! Number of parameters{{efn|In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.}} !! Corpus size !! License{{efn|This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated.}} !! Notes
|-
| [[BERT (language model)|BERT]] || 2018 || [[Google]] || 340 million<ref name=bert-paper/> || 3.3 billion words<ref name=bert-paper/> || {{yes|Apache 2.0}}<ref name=bert-web>{{Cite web|url=https://github.com/google-research/bert|title=BERT|date=March 13, 2023|via=GitHub}}</ref>
| early and influential language model<ref name=Manning-2022/>
|-
| [[GPT-2]] || 2019 || [[OpenAI]] || 1.5 billion<ref name="15Brelease"/> || 40GB<ref>{{cite web |title=Better language models and their implications |url=https://openai.com/research/better-language-models |website=openai.com}}</ref> (~10 billion tokens)<ref name="LambdaLabs">{{cite web |title=OpenAI's GPT-3 Language Model: A Technical Overview |url=https://lambdalabs.com/blog/demystifying-gpt-3 |website=lambdalabs.com |language=en}}</ref> || {{yes|MIT}}<ref>{{cite web|work=GitHub|title=gpt-2|url=https://github.com/openai/gpt-2|access-date=13 March 2023}}</ref>
| general-purpose model based on transformer architecture
|-
| [[GPT-3]] || 2020 || OpenAI || 175 billion || 499 billion tokens<ref name="LambdaLabs"/> || {{public web API}}
| A fine-tuned variant of [[GPT-3]], termed GPT-3.5, was made available to the public through a web interface called [[ChatGPT]] in 2022.<ref name=chatgpt-blog/>
|-
| [[GPT-Neo]] || March 2021 || [[EleutherAI]] || 2.7 billion<ref name="gpt-neo">{{Cite web|url=https://github.com/EleutherAI/gpt-neo|title=GPT Neo|date=March 15, 2023|via=GitHub}}</ref> || 825 GiB<ref name="Pile">{{cite arxiv |last1=Gao |first1=Leo |last2=Biderman |first2=Stella |last3=Black |first3=Sid |last4=Golding |first4=Laurence |last5=Hoppe |first5=Travis |last6=Foster |first6=Charles |last7=Phang |first7=Jason |last8=He |first8=Horace |last9=Thite |first9=Anish |last10=Nabeshima |first10=Noa |last11=Presser |first11=Shawn |last12=Leahy |first12=Connor |title=The Pile: An 800GB Dataset of Diverse Text for Language Modeling |arxiv=2101.00027|date=31 December 2020 }}</ref> || {{yes|MIT}}<ref name=vb-gpt-neo/>
| The first of [[EleutherAI#GPT-3 Replications|a series of free GPT-3 alternatives]] released by EleutherAI. GPT-Neo outperformed an equivalent-size [[GPT-3 model]] on some benchmarks, but was significantly worse than the largest GPT-3.<ref name=vb-gpt-neo/>
|-
| [[GPT-J]] || June 2021 || [[EleutherAI]] || 6 billion<ref>{{Cite web |title=GPT-J-6B: An Introduction to the Largest Open Source GPT Model {{!}} Forefront |url=https://www.forefront.ai/blog-posts/gpt-j-6b-an-introduction-to-the-largest-open-sourced-gpt-model |access-date=2023-02-28 |website=www.forefront.ai |language=en}}</ref> || 825 GiB<ref name="Pile"/> || {{yes|Apache 2.0}}
| GPT-3-style language model
|-
| Ernie 3.0 Titan || December 2021 || [[Baidu]] || 260 billion<ref>{{Cite web|url=https://www.wired.co.uk/article/chinas-chatgpt-black-market-baidu|title=China's ChatGPT Black Market Is Thriving|first=Condé|last=Nast|via=www.wired.co.uk}}</ref><ref>{{Cite journal|url=http://arxiv.org/abs/2112.12731|title=ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation|first1=Shuohuan|last1=Wang|first2=Yu|last2=Sun|first3=Yang|last3=Xiang|first4=Zhihua|last4=Wu|first5=Siyu|last5=Ding|first6=Weibao|last6=Gong|first7=Shikun|last7=Feng|first8=Junyuan|last8=Shang|first9=Yanbin|last9=Zhao|first10=Chao|last10=Pang|first11=Jiaxiang|last11=Liu|first12=Xuyi|last12=Chen|first13=Yuxiang|last13=Lu|first14=Weixin|last14=Liu|first15=Xi|last15=Wang|first16=Yangfan|last16=Bai|first17=Qiuliang|last17=Chen|first18=Li|last18=Zhao|first19=Shiyong|last19=Li|first20=Peng|last20=Sun|first21=Dianhai|last21=Yu|first22=Yanjun|last22=Ma|first23=Hao|last23=Tian|first24=Hua|last24=Wu|first25=Tian|last25=Wu|first26=Wei|last26=Zeng|first27=Ge|last27=Li|first28=Wen|last28=Gao|first29=Haifeng|last29=Wang|date=December 23, 2021|via=arXiv.org|arxiv=2112.12731}}</ref> || 4 Tb || {{no|Proprietary}}
| Chinese-language LLM. [[Ernie Bot]] is based on this model.
|-
| [[Claude]]<ref>{{cite web |title=Product |url=https://www.anthropic.com/product |website=Anthropic |access-date=14 March 2023 |language=en}}</ref> || December 2021 || [[Anthropic]] || 52 billion<ref name="AnthroArch">{{cite arxiv |last1=Askell |first1=Amanda |last2=Bai |first2=Yuntao |last3=Chen |first3=Anna |last4=Drain |first4=Dawn |last5=Ganguli |first5=Deep |last6=Henighan |first6=Tom |last7=Jones |first7=Andy |last8=Joseph |first8=Nicholas |last9=Mann |first9=Ben |last10=DasSarma |first10=Nova |last11=Elhage |first11=Nelson |last12=Hatfield-Dodds |first12=Zac |last13=Hernandez |first13=Danny |last14=Kernion |first14=Jackson |last15=Ndousse |first15=Kamal |last16=Olsson |first16=Catherine |last17=Amodei |first17=Dario |last18=Brown |first18=Tom |last19=Clark |first19=Jack |last20=McCandlish |first20=Sam |last21=Olah |first21=Chris |last22=Kaplan |first22=Jared |display-authors=3 |title=A General Language Assistant as a Laboratory for Alignment |arxiv=2112.00861 |date=9 December 2021 }}</ref> || 400 billion tokens<ref name="AnthroArch"/> || {{Closed beta}}
| fine-tuned for desirable behavior in conversations<ref>{{cite arxiv |last1=Bai |first1=Yuntao |last2=Kadavath |first2=Saurav |last3=Kundu |first3=Sandipan |last4=Askell |first4=Amanda |last5=Kernion |first5=Jackson |last6=Jones |first6=Andy |last7=Chen |first7=Anna |last8=Goldie |first8=Anna |last9=Mirhoseini |first9=Azalia |last10=McKinnon |first10=Cameron |last11=Chen |first11=Carol |last12=Olsson |first12=Catherine |last13=Olah |first13=Christopher |last14=Hernandez |first14=Danny |last15=Drain |first15=Dawn |last16=Ganguli |first16=Deep |last17=Li |first17=Dustin |last18=Tran-Johnson |first18=Eli |last19=Perez |first19=Ethan |last20=Kerr |first20=Jamie |last21=Mueller |first21=Jared |last22=Ladish |first22=Jeffrey |last23=Landau |first23=Joshua |last24=Ndousse |first24=Kamal |last25=Lukosuite |first25=Kamile |last26=Lovitt |first26=Liane |last27=Sellitto |first27=Michael |last28=Elhage |first28=Nelson |last29=Schiefer |first29=Nicholas |last30=Mercado |first30=Noemi |last31=DasSarma |first31=Nova |last32=Lasenby |first32=Robert |last33=Larson |first33=Robin |last34=Ringer |first34=Sam |last35=Johnston |first35=Scott |last36=Kravec |first36=Shauna |last37=Showk |first37=Sheer El |last38=Fort |first38=Stanislav |last39=Lanham |first39=Tamera |last40=Telleen-Lawton |first40=Timothy |last41=Conerly |first41=Tom |last42=Henighan |first42=Tom |last43=Hume |first43=Tristan |last44=Bowman |first44=Samuel R. |last45=Hatfield-Dodds |first45=Zac |last46=Mann |first46=Ben |last47=Amodei |first47=Dario |last48=Joseph |first48=Nicholas |last49=McCandlish |first49=Sam |last50=Brown |first50=Tom |last51=Kaplan |first51=Jared |display-authors=3 |title=Constitutional AI: Harmlessness from AI Feedback |arxiv=2212.08073 |date=15 December 2022 }}</ref>
|-
| [[GLaM]] (Generalist Language Model) || December 2021 || Google || 1.2 trillion<ref name=glam-blog/> || 1.6 trillion tokens<ref name=glam-blog/> || {{no|Proprietary}}
| sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3
|-
| [[LaMDA]] (Language Models for Dialog Applications) || January 2022 || Google || 137 billion<ref name=lamda-blog/> || 1.56T words<ref name=lamda-blog/> || {{no|Proprietary}}
| specialized for response generation in conversations
|-
| [[Megatron-Turing NLG]] || October 2021<ref>{{cite web |last1=Alvi |first1=Ali |last2=Kharya |first2=Paresh |title=Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model |url=https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/ |website=Microsoft Research |date=11 October 2021}}</ref> || [[Microsoft]] and [[Nvidia]] || 530 billion<ref name=mtnlg-preprint/> || 338.6 billion tokens<ref name=mtnlg-preprint/> || {{no|Restricted web access}}
| standard architecture but trained on a supercomputing cluster
|-
| [[GPT-NeoX]] || February 2022 || [[EleutherAI]] || 20 billion<ref name="gpt-neox-20b">{{cite conference |title=GPT-NeoX-20B: An Open-Source Autoregressive Language Model |conference=Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models |date=2022-05-01 |last=Black |first=Sidney |last2=Biderman |first2=Stella |last3=Hallahan |first3=Eric |display-authors=etal |volume=Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models |pages=95-136 |url=https://aclanthology.org/2022.bigscience-1.9/ |accessdate=2022-12-19 }}</ref> || 825 GiB<ref name="Pile"/> || {{yes|Apache 2.0}}
| based on the Megatron architecture
|-
| [[Chinchilla AI|Chinchilla]] || March 2022 || [[DeepMind]] || 70 billion<ref name=chinchilla-blog/> || 1.3 trillion tokens<ref name=chinchilla-blog/><ref>{{cite arxiv |last1=Hoffmann |first1=Jordan |last2=Borgeaud |first2=Sebastian |last3=Mensch |first3=Arthur |last4=Buchatskaya |first4=Elena |last5=Cai |first5=Trevor |last6=Rutherford |first6=Eliza |last7=Casas |first7=Diego de Las |last8=Hendricks |first8=Lisa Anne |last9=Welbl |first9=Johannes |last10=Clark |first10=Aidan |last11=Hennigan |first11=Tom |last12=Noland |first12=Eric |last13=Millican |first13=Katie |last14=Driessche |first14=George van den |last15=Damoc |first15=Bogdan |last16=Guy |first16=Aurelia |last17=Osindero |first17=Simon |last18=Simonyan |first18=Karen |last19=Elsen |first19=Erich |last20=Rae |first20=Jack W. |last21=Vinyals |first21=Oriol |last22=Sifre |first22=Laurent |title=Training Compute-Optimal Large Language Models |arxiv=2203.15556 |date=29 March 2022}}</ref> || {{no|Proprietary}}
| reduced-parameter model trained on more data
|-
| [[PaLM]] (Pathways Language Model) || April 2022 || Google || 540 billion<ref name=palm-blog/> || 768 billion tokens<ref name=chinchilla-blog/> || {{no|Proprietary}}
| aimed to reach the practical limits of model scale
|-
| [[OPT (Open Pretrained Transformer)]] || May 2022 || [[Meta Platforms|Meta]] || 175 billion<ref>{{cite web |title=Democratizing access to large-scale language models with OPT-175B |url=https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/ |website=ai.facebook.com |language=en}}</ref> || 180 billion tokens<ref>{{cite arxiv |last1=Zhang |first1=Susan |last2=Roller |first2=Stephen |last3=Goyal |first3=Naman |last4=Artetxe |first4=Mikel |last5=Chen |first5=Moya |last6=Chen |first6=Shuohui |last7=Dewan |first7=Christopher |last8=Diab |first8=Mona |last9=Li |first9=Xian |last10=Lin |first10=Xi Victoria |last11=Mihaylov |first11=Todor |last12=Ott |first12=Myle |last13=Shleifer |first13=Sam |last14=Shuster |first14=Kurt |last15=Simig |first15=Daniel |last16=Koura |first16=Punit Singh |last17=Sridhar |first17=Anjali |last18=Wang |first18=Tianlu |last19=Zettlemoyer |first19=Luke |title=OPT: Open Pre-trained Transformer Language Models |arxiv=2205.01068 |date=21 June 2022}}</ref> || {{Non-commercial research}}{{efn|The smaller models including 66B are publicly available, while the 175B model is available on request.}}
| GPT-3 architecture with some adaptations from Megatron
|-
| YaLM 100B || June 2022 || [[Yandex]] || 100 billion<ref name=":0">{{Citation |last=Khrushchev |first=Mikhail |title=YaLM 100B |date=2022-06-22 |url=https://github.com/yandex/YaLM-100B |access-date=2023-03-18 |last2=Vasilev |first2=Ruslan |last3=Petrov |first3=Alexey |last4=Zinov |first4=Nikolay}}</ref> || 1.7TB<ref name=":0" /> || {{yes|Apache 2.0}}
| English-Russian model
|-
| [[BLOOM (language model)|BLOOM]] || July 2022 || Large collaboration led by [[Hugging Face]] || 175 billion<ref name=bigger-better/> || 350 billion tokens (1.6TB)<ref>{{cite web |title=bigscience/bloom · Hugging Face |url=https://huggingface.co/bigscience/bloom |website=huggingface.co}}</ref> || {{yes|Responsible AI}}
| Essentially GPT-3 but trained on a multi-lingual corpus (30% English excluding programming languages)
|-
| [[AlexaTM (Teacher Models)]] || November 2022 || [[Amazon (company)|Amazon]] || 20 billion<ref>{{cite web |title=20B-parameter Alexa model sets new marks in few-shot learning |url=https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning |website=Amazon Science |language=en |date=2 August 2022}}</ref> || 1.3 trillion<ref>{{cite arxiv |last1=Soltan |first1=Saleh |last2=Ananthakrishnan |first2=Shankar |last3=FitzGerald |first3=Jack |last4=Gupta |first4=Rahul |last5=Hamza |first5=Wael |last6=Khan |first6=Haidar |last7=Peris |first7=Charith |last8=Rawls |first8=Stephen |last9=Rosenbaum |first9=Andy |last10=Rumshisky |first10=Anna |last11=Prakash |first11=Chandana Satya |last12=Sridhar |first12=Mukund |last13=Triefenbach |first13=Fabian |last14=Verma |first14=Apurv |last15=Tur |first15=Gokhan |last16=Natarajan |first16=Prem |display-authors=3|title=AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |arxiv=2208.01448 |date=3 August 2022}}</ref> || {{public web API}}<ref>{{cite web |title=AlexaTM 20B is now available in Amazon SageMaker JumpStart {{!}} AWS Machine Learning Blog |url=https://aws.amazon.com/blogs/machine-learning/alexatm-20b-is-now-available-in-amazon-sagemaker-jumpstart/ |website=aws.amazon.com |access-date=13 March 2023 |date=17 November 2022}}</ref>
| bidirectional sequence-to-sequence architecture
|-
| [[LLaMA]] (Large Language Model Meta AI) || February 2023 || [[Meta Platforms|Meta]] || 65 billion<ref name=llama-blog/> || 1.4 trillion<ref name=llama-blog/> || {{Non-commercial research}}{{efn|Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.}}
| trained on a large 20-language corpus to aim for better performance with fewer parameters.<ref name=llama-blog/>
|-
| [[GPT-4]] || March 2023 || OpenAI || Unknown{{efn|As stated in Technical report: "Given both the competitive landscape and the safety implications of large-scale models like [[GPT-4]], this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."<ref name="GPT4Tech">{{Cite web |date=2023 |title=GPT-4 Technical Report |url=https://cdn.openai.com/papers/gpt-4.pdf |website=[[OpenAI]] |access-date=March 14, 2023 |archive-date=March 14, 2023 |archive-url=https://web.archive.org/web/20230314190904/https://cdn.openai.com/papers/gpt-4.pdf |url-status=live }}</ref>}} || Unknown || {{public web API}}
| Available for ChatGPT Plus users. Microsoft confirmed that [[GPT-4 model]] is used in [[Bing Chat]].<ref>{{Cite web |date=March 14, 2023 |url=https://techcrunch.com/2023/03/14/microsofts-new-bing-was-using-gpt-4-all-along/ |title=Microsoft’s new Bing was using [[GPT-4]] all along |last=Lardinois |first=Frederic |website=TechCrunch |access-date=March 14, 2023 |archive-date=March 15, 2023 |archive-url=https://web.archive.org/web/20230315013650/https://techcrunch.com/2023/03/14/microsofts-new-bing-was-using-gpt-4-all-along/ |url-status=live }}</ref>
|}
=== 2023 ===
* ([[Zhao, Zhou et al., 2023]]) ⇒ [[Wayne Xin Zhao]], [[Kun Zhou]], [[Junyi Li]], [[Tianyi Tang]], [[Xiaolei Wang]], [[Yupeng Hou]], [[Yingqian Min]], [[Beichen Zhang]], [[Junjie Zhang]], [[Zican Dong]], Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and [[Ji-Rong Wen]]. ([[2023]]). “[https://arxiv.org/pdf/2303.18223.pdf A Survey of Large Language Models].” In: arXiv preprint arXiv:2303.18223. [http://dx.doi.org/10.48550/arXiv.2303.18223 doi:10.48550/arXiv.2303.18223]
=== 2022 ===
* ([[Li, Tang et al., 2022]]) ⇒ [[Junyi Li]], [[Tianyi Tang]], [[Wayne Xin Zhao]], [[Jian-Yun Nie]], and [[Ji-Rong Wen]]. ([[2022]]). “A Survey of Pretrained Language Models for Text Generation.” In: arXiv preprint arXiv:2201.05273. [https://doi.org/10.48550/arXiv.2201.05273 doi:10.48550/arXiv.2201.05273]
** ABSTRACT: Text Generation aims to produce plausible and readable text in a human language from input data. The resurgence of deep learning has greatly advanced this field, in particular, with the help of neural generation models based on [[pre-trained language models (PLMs)]]. Text generation based on [[PLM]]s is viewed as a promising approach in both academia and industry. In this paper, we provide a survey on the utilization of [[PLM]]s in text generation. We begin with introducing three key aspects of applying [[PLM]]s to text generation: 1) how to encode the input into representations preserving input semantics which can be fused into PLMs; 2) how to design an effective PLM to serve as the generation model; and 3) how to effectively optimize [[PLM]]s given the reference text and to ensure that the generated texts satisfy special text properties. Then, we show the major challenges arisen in these aspects, as well as possible solutions for them. We also include a summary of various useful resources and typical text generation applications based on PLMs. Finally, we highlight the future research directions which will further improve these [[PLM]]s for text generation. This comprehensive survey is intended to help researchers interested in text generation problems to learn the core concepts, the main techniques and the latest developments in this area based on PLMs.
----
__NOTOC__
[[Category:Concept]]
[[Category:Quality Silver]]