spaCy Lemmatizer

From GM-RKB

A spaCy Lemmatizer is a word lemmatisation system within spaCy (that assigns base forms to tokens using rules based on part-of-speech tags or lookup tables).

  • Context:
    • It can be implemented for different languages via language-specific factories.
    • It can be used as a standalone pipeline component that is added to a spaCy NLP pipeline.
    • It can determine the lemma of a word based on the part of speech assigned to it within a sentence (in "rule" mode) or on its surface form alone (in "lookup" mode).
    • It can operate in different modes, such as "rule" or "lookup," depending on the configuration and available language-specific lemmatizer.
    • It can be configured to overwrite existing lemmas or to use specific modes for lemmatization.
    • It can utilize the spacy-lookups-data extension package for its default lookup tables.
    • It can be customized when it is added to a pipeline, by passing a config that sets the mode (e.g., "lookup" or "rule") and the overwrite flag.
    • It can be part of a customizable pipeline where it is positioned after components that assign coarse-grained POS tags, which its "rule" mode relies on.
    • ...
  • Example(s):
  • Counter-Example(s):
  • See: spaCy, Natural Language Processing, Lemma, Part-of-Speech Tagging.
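
The mode and overwrite settings described in the Context items can be illustrated with a minimal sketch. It assumes spaCy v3 is installed and, to avoid depending on spacy-lookups-data, supplies a tiny hand-made "lemma_lookup" table. The helper name build_lookup_pipeline, the table contents, and the "feline" placeholder lemma are illustrative assumptions, not part of spaCy or of this page.

```python
import spacy
from spacy.lookups import Lookups


def build_lookup_pipeline(overwrite):
    """Hypothetical helper: blank English pipeline with a lookup-mode lemmatizer."""
    nlp = spacy.blank("en")
    lemmatizer = nlp.add_pipe(
        "lemmatizer",
        config={"mode": "lookup", "overwrite": overwrite},
    )
    # Toy table for illustration only; by default the data would be
    # loaded from the spacy-lookups-data package instead.
    lookups = Lookups()
    lookups.add_table(
        "lemma_lookup", {"cats": "cat", "were": "be", "running": "run"}
    )
    lemmatizer.initialize(lookups=lookups)
    return nlp


# Lookup mode: each token's text is mapped through the table.
nlp = build_lookup_pipeline(overwrite=False)
lemmas = [token.lemma_ for token in nlp("cats were running")]

# With overwrite=False, a lemma that is already set is left alone.
preset = nlp.make_doc("cats")
preset[0].lemma_ = "feline"
kept = nlp.get_pipe("lemmatizer")(preset)[0].lemma_

# With overwrite=True, the existing lemma is replaced by the table entry.
nlp_overwrite = build_lookup_pipeline(overwrite=True)
preset_again = nlp_overwrite.make_doc("cats")
preset_again[0].lemma_ = "feline"
replaced = nlp_overwrite.get_pipe("lemmatizer")(preset_again)[0].lemma_

print(lemmas, kept, replaced)
```

Lookup mode is used here because it needs no POS tags. A "rule" mode pipeline would additionally need an earlier component that sets coarse-grained POS tags, which typically means loading a trained pipeline.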