Decoder-only Transformer-based Large Language Model (LLM)


A Decoder-only Transformer-based Large Language Model (LLM) is a transformer-based LLM that uses only the decoder stack of the transformer architecture, generating output tokens autoregressively under a causal self-attention mask (as in the GPT model family).
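The property that makes the decoder autoregressive is its causal self-attention mask: position i may attend only to positions at or before i, so each token's representation depends only on the tokens preceding it. Below is a minimal single-head NumPy sketch of this masking; the function name, weight matrices, and dimensions are illustrative assumptions, not taken from any particular model.

    import numpy as np

    def causal_self_attention(x, w_q, w_k, w_v):
        """Single-head self-attention with a causal (lower-triangular) mask."""
        q, k, v = x @ w_q, x @ w_k, x @ w_v      # project tokens to queries/keys/values
        d_k = q.shape[-1]
        scores = (q @ k.T) / np.sqrt(d_k)        # (seq_len, seq_len) attention scores

        # Causal mask: positions strictly above the diagonal are future tokens.
        seq_len = x.shape[0]
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)  # block attention to the future

        # Row-wise softmax over keys, then weighted sum of values.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    # Toy usage: 5 tokens with a 16-dimensional model width (arbitrary sizes).
    rng = np.random.default_rng(0)
    x = rng.standard_normal((5, 16))
    w_q, w_k, w_v = [rng.standard_normal((16, 16)) for _ in range(3)]
    out = causal_self_attention(x, w_q, w_k, w_v)  # shape (5, 16)

Because row i of the mask hides every column j > i, the output at each position is a function of earlier tokens only, which is what permits left-to-right generation.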



References

2020

  • https://towardsdatascience.com/gpt-3-transformers-and-the-wild-world-of-nlp-9993d8bb1314
    • QUOTE: 2.2 Architecture
      • In terms of architecture, transformer models are quite similar. Most of the models follow the same architecture as one of the “founding fathers”, the original transformer, BERT and GPT. They represent three basic architectures: encoder only, decoder only and both.
        • Decoder only (GPT): In many ways, an encoder with a CLM head can be considered a decoder. Instead of outputting hidden states, decoders are wired to generate sequences in an auto-regressive way, whereby the previous generated word is used as input to generate the next one.
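The auto-regressive scheme the quote describes (feed each generated token back in as input) reduces, in its simplest form, to a greedy decoding loop. A minimal sketch follows, where model is a hypothetical callable mapping a token-id sequence to next-token logits; it is not any specific library's API.

    import numpy as np

    def greedy_generate(model, prompt_ids, max_new_tokens, eos_id=None):
        """Greedy autoregressive decoding with a hypothetical next-token model."""
        ids = list(prompt_ids)
        for _ in range(max_new_tokens):
            logits = model(ids)               # logits over the vocabulary for the next token
            next_id = int(np.argmax(logits))  # greedy choice: most likely token
            ids.append(next_id)               # the generated token becomes part of the input
            if eos_id is not None and next_id == eos_id:
                break
        return ids

    # Toy stand-in for a trained decoder (purely illustrative).
    def toy_model(ids):
        rng = np.random.default_rng(len(ids))
        return rng.standard_normal(100)       # pretend vocab_size = 100

    print(greedy_generate(toy_model, [1, 2, 3], max_new_tokens=5))

In practice the argmax step is often replaced by temperature, top-k, or nucleus sampling, but the loop structure, in which each step conditions on everything generated so far, stays the same.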