Longformer Model


A Longformer Model is a transformer-based model designed to handle long input contexts by replacing full self-attention with a sparse pattern of sliding-window (local) attention plus global attention on a few selected tokens.



References


2022

  • https://huggingface.co/docs/transformers/model_doc/longformer
    • QUOTE: ... Since the Longformer is based on RoBERTa, it doesn’t have token_type_ids. You don’t need to indicate which token belongs to which segment. Just separate your segments with the separation token tokenizer.sep_token (or </s>).

      A transformer model replacing the attention matrices by sparse matrices to go faster. Often, the local context (e.g., what are the two tokens left and right?) is enough to take action for a given token. Some preselected input tokens are still given global attention, but the attention matrix has way less parameters, resulting in a speed-up. See the local attention section for more information.

    • Longformer self attention employs self attention on both a “local” context and a “global” context. Most tokens only attend “locally” to each other meaning that each token attends to its $\frac{1}{2}w$ previous tokens and $\frac{1}{2}w$ succeeding tokens with $w$ being the window length as defined in config.attention_window. Note that config.attention_window can be of type List to define a different $w$ for each layer. A selected few tokens attend “globally” to all other tokens, as it is conventionally done for all tokens in BertSelfAttention. ...
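
    The segment-separation convention quoted above can be illustrated with a short sketch. The snippet below is not from the cited page; it assumes the HuggingFace transformers library and the allenai/longformer-base-4096 checkpoint, and only shows that segment pairs are joined with the separator token rather than distinguished by token_type_ids.

```python
# Sketch (assumes the HuggingFace transformers library and the
# allenai/longformer-base-4096 checkpoint): Longformer is RoBERTa-based,
# so segment pairs are marked with the separator token, not token_type_ids.
from transformers import LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

segment_a = "What does a Longformer Model do?"
segment_b = "It handles long input contexts with sparse attention."

# Either let the tokenizer join the pair itself ...
encoding = tokenizer(segment_a, segment_b, return_tensors="pt")

# ... or join the segments manually with tokenizer.sep_token (i.e. </s>).
manual_text = segment_a + tokenizer.sep_token + segment_b
manual_encoding = tokenizer(manual_text, return_tensors="pt")

print(tokenizer.decode(encoding["input_ids"][0]))
# e.g. <s>What does a Longformer Model do?</s></s>It handles ...</s>
```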
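
    The local-plus-global attention pattern described in the quote can likewise be sketched. The example below is only illustrative (the checkpoint name, the input text, and the use of global_attention_mask follow the HuggingFace Longformer documentation): every token attends within a window of width $w$ taken from config.attention_window, and tokens flagged in global_attention_mask additionally attend to, and are attended by, all other tokens.

```python
# Sketch (assumes the HuggingFace transformers library and the
# allenai/longformer-base-4096 checkpoint) of local windowed attention plus
# a few globally-attending tokens.
import torch
from transformers import LongformerModel, LongformerTokenizer

model_name = "allenai/longformer-base-4096"
tokenizer = LongformerTokenizer.from_pretrained(model_name)
model = LongformerModel.from_pretrained(model_name)

# config.attention_window is the window length w; it may also be a per-layer
# list so that each layer uses a different w.
print(model.config.attention_window)

inputs = tokenizer("A very long document " * 200, return_tensors="pt")

# 0 = local (sliding-window) attention, 1 = global attention.
# Here only the first (<s>) token attends globally; every other token only
# attends to the 1/2 w tokens before it and the 1/2 w tokens after it.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```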


2020

  • (Beltagy et al., 2020) ⇒ Iz Beltagy, Matthew E. Peters, and Arman Cohan. (2020). “Longformer: The Long-Document Transformer.” In: arXiv:2004.05150.