Transformer-based Large Language Model (LLM)

From GM-RKB

A Transformer-based Large Language Model (LLM) is a transformer-based language model that is also a large language model.



References

2023

  • chat
    • Large Language Models (LLMs) like BERT and GPT have revolutionized the field of Natural Language Processing (NLP) by leveraging the Transformer architecture. The Transformer architecture is built from encoder and decoder stacks, both of which rely on the attention mechanism. Here's a description of encoder-only, decoder-only, and encoder-decoder Transformer models with respect to LLMs:
      • Encoder-only Transformer-based LLM models:

        Encoder-only models focus on encoding input text into meaningful contextualized representations. These models are particularly well-suited for tasks where the main goal is to understand and extract information from the input text. BERT (Bidirectional Encoder Representations from Transformers) is a prominent example of an encoder-only LLM. BERT uses the Transformer's encoder layers to pretrain deep bidirectional representations, capturing the context from both left and right directions. This allows BERT to excel in tasks like sentence classification, named entity recognition, and question-answering.

      • Decoder-only Transformer-based LLM models:

        Decoder-only models concentrate on autoregressive language modeling and text generation. These models predict the next word in a sequence given the previous words while attending only to the left context. GPT (Generative Pre-trained Transformer) is a well-known example of a decoder-only LLM. GPT uses the Transformer's decoder layers and is trained to generate text by predicting the next token in a sequence. This makes GPT particularly effective for tasks such as text generation, summarization, and translation.

      • Encoder-decoder Transformer-based LLM models: Encoder-decoder models combine both the encoding and decoding components of the Transformer architecture. The encoder is responsible for encoding the input text into a contextualized representation, while the decoder generates output text based on this representation. These models are particularly useful for tasks that require mapping input text to output text, such as machine translation and text summarization. A notable example of an encoder-decoder LLM is T5 (Text-to-Text Transfer Transformer), which frames all NLP tasks as a text-to-text problem. In T5, the input and output are both sequences of text, and the model is fine-tuned on various tasks by providing appropriate input-output pairs.
    • In conclusion, encoder-only, decoder-only, and encoder-decoder transformer models are different instantiations of the Transformer architecture in LLMs. Each type of model excels in specific NLP tasks, with encoder-only models focusing on understanding input text, decoder-only models focusing on text generation, and encoder-decoder models focusing on mapping input text to output text.
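  • The three model families described above can be illustrated with a short Python sketch. This is a minimal illustrative example and not part of the quoted answer; it assumes the Hugging Face transformers library (with torch installed) and the public bert-base-uncased, gpt2, and t5-small checkpoints, which stand in for the BERT, GPT, and T5 families mentioned in the text.

      # Illustrative sketch: encoder-only, decoder-only, and encoder-decoder LLMs.
      # Assumes: pip install torch transformers (checkpoints are illustrative choices).
      from transformers import (
          AutoTokenizer, AutoModel,        # encoder-only (BERT-style)
          AutoModelForCausalLM,            # decoder-only (GPT-style)
          AutoModelForSeq2SeqLM,           # encoder-decoder (T5-style)
      )

      # 1) Encoder-only: encode text into contextualized hidden states (e.g., for classification).
      bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
      bert = AutoModel.from_pretrained("bert-base-uncased")
      enc_inputs = bert_tok("Transformers encode text bidirectionally.", return_tensors="pt")
      hidden = bert(**enc_inputs).last_hidden_state           # shape: (1, seq_len, 768)

      # 2) Decoder-only: autoregressive generation, attending only to the left context.
      gpt_tok = AutoTokenizer.from_pretrained("gpt2")
      gpt = AutoModelForCausalLM.from_pretrained("gpt2")
      gen_inputs = gpt_tok("Large language models", return_tensors="pt")
      gen_ids = gpt.generate(**gen_inputs, max_new_tokens=20)
      print(gpt_tok.decode(gen_ids[0], skip_special_tokens=True))

      # 3) Encoder-decoder: map an input sequence to an output sequence (text-to-text).
      t5_tok = AutoTokenizer.from_pretrained("t5-small")
      t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
      t5_inputs = t5_tok("translate English to German: The model reads the input.", return_tensors="pt")
      out_ids = t5.generate(**t5_inputs, max_new_tokens=40)
      print(t5_tok.decode(out_ids[0], skip_special_tokens=True))

  • Note how only the decoder-only and encoder-decoder models have a generate step, while the encoder-only model stops at contextualized hidden states that a task-specific head would then consume.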

2020

  • https://towardsdatascience.com/gpt-3-transformers-and-the-wild-world-of-nlp-9993d8bb1314
    • QUOTE: 2.2 Architecture
      • In terms of architecture, transformer models are quite similar. Most of the models follow the same architecture as one of the “founding fathers”: the original Transformer, BERT, or GPT. They represent three basic architectures: encoder only, decoder only, and both.
        • Encoder only (BERT): The encoder is usually a stack of attention and feed-forward layers, which encode the input text sequence into contextualised hidden states. To generate the different output formats of language tasks, a task-specific head is often added on top of the encoder. For example, a causal language model (CLM, or simply LM) head to predict the next word, or a feed-forward (linear) layer to produce classification labels.
        • Decoder only (GPT): In many ways, an encoder with a CLM head can be considered a decoder. Instead of outputting hidden states, decoders are wired to generate sequences in an auto-regressive way, whereby the previously generated word is used as input to generate the next one.
        • Both (Transformer): The distinction between encoder and decoder makes the most sense when they both exist in the same structure, as in the original Transformer. In an encoder-decoder structure, the input sequence is first “encoded” into hidden states and then “decoded” to generate an output sequence. The encoder and decoder can even share the same weights to make training more efficient.
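  • The “encoder plus task-specific head” pattern in the quote above can be sketched in plain PyTorch. The sketch below is illustrative only: the class name EncoderWithHead and all sizes (vocabulary, hidden width, layer count, number of classes) are arbitrary assumptions, not the configuration of BERT, GPT, or any other published model.

      # Minimal sketch of the "encoder + task-specific head" pattern.
      # All sizes are illustrative assumptions.
      import torch
      import torch.nn as nn

      class EncoderWithHead(nn.Module):
          def __init__(self, vocab_size=30000, d_model=256, nhead=4,
                       num_layers=2, num_classes=3):
              super().__init__()
              self.embed = nn.Embedding(vocab_size, d_model)
              layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                                 batch_first=True)
              self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
              # Task-specific heads on top of the encoder's hidden states:
              self.cls_head = nn.Linear(d_model, num_classes)   # classification labels
              self.lm_head = nn.Linear(d_model, vocab_size)     # next-word (CLM) logits

          def forward(self, token_ids):
              hidden = self.encoder(self.embed(token_ids))      # (batch, seq, d_model)
              cls_logits = self.cls_head(hidden[:, 0, :])       # label from first position
              lm_logits = self.lm_head(hidden)                  # per-position vocabulary logits
              return cls_logits, lm_logits

      model = EncoderWithHead()
      tokens = torch.randint(0, 30000, (1, 12))                 # a fake 12-token input
      cls_logits, lm_logits = model(tokens)
      print(cls_logits.shape, lm_logits.shape)                  # (1, 3) and (1, 12, 30000)

  • As the quote notes, the same encoder with its LM head starts to behave like a decoder once a causal attention mask is applied and generated tokens are fed back in one at a time.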