Multi-Head Attention Mechanism


A Multi-Head Attention Mechanism is an attention mechanism that runs several attention functions in parallel, allowing a model to jointly attend to information from different representation subspaces at different positions.
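
Formally, using the notation of Vaswani et al. (2017) (cited below), with $h$ attention heads, learned projection matrices $W_i^Q$, $W_i^K$, $W_i^V$, $W^O$, and per-head key dimension $d_k$:

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O,
\qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^Q,\; K W_i^K,\; V W_i^V),

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V.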



References

2017

  • (Vaswani et al., 2017) ⇒ Ashish Vaswani, Noam Shazeer, ..., Łukasz Kaiser, and Illia Polosukhin. (2017). “Attention Is All You Need.” In: Advances in Neural Information Processing Systems, 30 (NeurIPS 2017). arXiv:1706.03762
    • NOTE: Introduced the Multi-Head Attention Mechanism as a means of allowing the model to jointly attend to information from different representation subspaces at different positions, enhancing its ability to capture complex relationships in the input.
    • NOTE: The mechanism projects queries, keys, and values multiple times with different learned linear projections and computes attention over each projection in parallel, which contributes to both efficiency and model performance (a minimal code sketch is given below).
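
The projection-and-parallel-attention structure described in the note above can be illustrated with a minimal NumPy sketch. The shapes, random weight initialization, and the omission of masking and dropout are simplifying assumptions made here for illustration; this is not the paper's reference implementation.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, key, value, weights, num_heads):
    """query/key/value: (seq_len, d_model); weights: dict of (d_model, d_model) matrices."""
    seq_len, d_model = query.shape
    d_k = d_model // num_heads  # per-head dimensionality

    # Linear projection, then split the model dimension into `num_heads` subspaces.
    def project_and_split(x, w):
        return (x @ w).reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)  # (h, seq, d_k)

    q = project_and_split(query, weights["W_q"])
    k = project_and_split(key,   weights["W_k"])
    v = project_and_split(value, weights["W_v"])

    # Scaled dot-product attention, computed for all heads in parallel.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)  # (h, seq, seq)
    attn = softmax(scores, axis=-1)                   # attention weights per head
    heads = attn @ v                                  # (h, seq, d_k)

    # Concatenate the heads and apply the output projection W_o.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ weights["W_o"]

# Example usage: self-attention over random inputs with random projections.
rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 64, 8, 10
weights = {name: rng.standard_normal((d_model, d_model)) * 0.1
           for name in ("W_q", "W_k", "W_v", "W_o")}
x = rng.standard_normal((seq_len, d_model))
out = multi_head_attention(x, x, x, weights, num_heads)
print(out.shape)  # (10, 64)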
