spaCy NLP System

From GM-RKB

A spaCy NLP System is a Python/Cython-based natural language processing library.

  • Context:
    • It is designed for production use and aims for high processing speed and state-of-the-art accuracy.
    • It can be used for various NLP tasks such as tokenization, named-entity recognition, part-of-speech tagging, and syntactic parsing.
    • It is known for its non-destructive tokenization, meaning the original text can be fully reconstructed from the tokenized output.
    • It supports tokenization for over 25 languages, provides statistical models for 8 of them, and ships pre-trained word vectors (counts as of the v2.x releases; the 2018 README quoted below cites 20+ languages).
    • It provides convolutional neural network models for tagging, parsing, and named entity recognition, and it can be integrated with deep learning frameworks.
    • It provides built-in visualizers for syntax and named entities, aiding in the analysis and interpretation of text data.
    • It is designed with a focus on efficiency, scalability, and integration into existing Python-based software stacks.
    • It is released under the MIT license, making it freely available for commercial and non-commercial use.
    • ...
  • Example(s):
    • v3.x (2021-present): Enhanced models with transformer support, improved pipeline customization, and added features for machine learning workflows.
    • v2.x (2017-2020): Introduced convolutional neural network models for NLP tasks and improved the API for training and updating models.
      • v2.0.11 (2018-04-04).
    • v1.x (~2015-2017): Initial releases focusing on providing a solid foundation for NLP tasks with efficiency and ease of use.
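
The non-destructive tokenization property listed under Context can be illustrated with a small pure-Python sketch. This is a toy under stated assumptions, not spaCy's tokenizer: the `Token` tuple, the `tokenize` and `reconstruct` helpers, and the regular expression are all hypothetical names and rules chosen for illustration. The only idea carried over is that each token keeps the whitespace that followed it, so nothing from the input is discarded.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical token record (not spaCy's Token class)."""
    text: str        # the token's characters, without trailing whitespace
    whitespace: str  # whitespace that followed the token in the original text


# A word, a single punctuation mark, or a run of whitespace, followed by
# any trailing whitespace. Every character falls into one of the three
# alternatives, so matching never skips part of the input. The whitespace
# alternative only fires for leading whitespace, because every other token
# already consumes the whitespace after it.
_TOKEN_RE = re.compile(r"(\w+|[^\w\s]|\s+)(\s*)")


def tokenize(text: str) -> List[Token]:
    """Split text into tokens while remembering the whitespace after each."""
    return [Token(m.group(1), m.group(2)) for m in _TOKEN_RE.finditer(text)]


def reconstruct(tokens: List[Token]) -> str:
    """Rebuild the original string from tokens and their trailing whitespace."""
    return "".join(tok.text + tok.whitespace for tok in tokens)


if __name__ == "__main__":
    sample = "Hello, world!  spaCy is fast.\n"
    toks = tokenize(sample)
    print([t.text for t in toks])
    print(reconstruct(toks) == sample)
```

In spaCy itself, the corresponding check would compare `doc.text` against the concatenation of each token's `text_with_ws` attribute; the sketch above only mirrors that idea with the standard library.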



References

2018a

  • https://github.com/explosion/spaCy
    • QUOTE: spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 20+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition and easy deep learning integration. It's commercial open-source software, released under the MIT license.
