Transformer-based Neural Network Architecture

A Transformer-based Neural Network Architecture is a feedforward, attention-based neural network architecture that processes sequential input data in parallel through stacks of transformer-based neural network blocks, each built around a self-attention mechanism.
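
The structure named in this definition can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch rendering of one transformer block (self-attention plus a position-wise feed-forward sub-layer, each wrapped in a residual connection and layer normalization) and a stack of such blocks; the dimensions, activation, and post-norm arrangement are common defaults assumed here, not part of the definition.

  import torch
  import torch.nn as nn

  class TransformerBlock(nn.Module):
      def __init__(self, d_model=512, n_heads=8, d_ff=2048):
          super().__init__()
          # Multi-head self-attention over the whole input sequence.
          self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
          # Position-wise feed-forward sub-layer.
          self.ff = nn.Sequential(
              nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
          self.norm1 = nn.LayerNorm(d_model)
          self.norm2 = nn.LayerNorm(d_model)

      def forward(self, x):
          # Residual connection around self-attention, then layer normalization.
          attn_out, _ = self.attn(x, x, x)
          x = self.norm1(x + attn_out)
          # Residual connection around the feed-forward sub-layer.
          return self.norm2(x + self.ff(x))

  # The architecture proper is a stack of such blocks.
  blocks = nn.Sequential(*[TransformerBlock() for _ in range(6)])
  x = torch.randn(2, 10, 512)   # (batch, sequence length, model dimension)
  y = blocks(x)                 # same shape; all positions are processed in parallel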



References

2023

  • Chat
    • A Transformer Model Architecture is a blueprint or template for building Transformer-based neural networks. It defines the overall structure and components of the network, including the arrangement of transformer blocks, self-attention mechanisms, feed-forward layers, and other architectural details. The architecture serves as a foundation for creating specific neural network models with different configurations, hyperparameters, and training data.

      Example: The GPT (Generative Pre-trained Transformer) architecture is a Transformer Model Architecture. It consists of a decoder-only structure composed of a stack of transformer blocks. The architecture can be used to create various Transformer-based neural networks for different tasks, such as language modeling and text generation. GPT-3 is one of the models based on the GPT architecture, and the "Davinci" model is a specific instance within the GPT-3 family.
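
The decoder-only pattern described in this excerpt differs from a generic transformer block mainly in its use of a causal attention mask, so that each position attends only to earlier positions. A minimal PyTorch sketch of that masking, with illustrative dimensions:

  import torch
  import torch.nn as nn

  d_model, n_heads, seq_len = 512, 8, 10
  attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

  # Causal (upper-triangular) mask: True marks positions a query may NOT attend to.
  causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

  x = torch.randn(2, seq_len, d_model)            # (batch, sequence, model dimension)
  out, _ = attn(x, x, x, attn_mask=causal_mask)   # position t attends only to positions <= t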

2017

  • (Vaswani et al., 2017) ⇒ Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. (2017). "Attention Is All You Need." In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017).
    • NOTES: Introduced the transformer architecture, demonstrating that models based entirely on attention mechanisms could achieve state-of-the-art performance on machine translation tasks without recurrence or convolution.
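
The cited paper's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A short, self-contained Python/NumPy sketch of that formula, with illustrative shapes:

  import numpy as np

  def scaled_dot_product_attention(Q, K, V):
      d_k = K.shape[-1]
      scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # query-key similarity, scaled by sqrt(d_k)
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
      return weights @ V                               # attention-weighted sum of values

  Q = K = V = np.random.randn(10, 64)                  # 10 positions, d_k = 64
  out = scaled_dot_product_attention(Q, K, V)          # shape (10, 64)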