General Architecture for Text Engineering (GATE) Framework


A General Architecture for Text Engineering (GATE) Framework is a Java-based NLP framework.



  • (Wikipedia, 2020) ⇒ Retrieved:2020-6-23.
    • General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages. [1] GATE has been compared to NLTK, R and RapidMiner. As well as being widely used in its own right, it forms the basis of the KIM semantic platform. The GATE community and research have been involved in several European research projects including TAO, SEKT, NeOn, Media-Campaign, Musing, Service-Finder, LIRICS and KnowledgeWeb, as well as many other projects. As of May 28, 2011, 881 people were on the gate-users mailing list, and 111,932 downloads from SourceForge had been recorded since the project moved to SourceForge in 2005. The paper "GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications" [2] has received over 800 citations in the seven years since publication (according to Google Scholar). Books covering the use of GATE, in addition to the GATE User Guide, include "Building Search Applications: Lucene, LingPipe, and Gate", by Manu Konchady, [3] and "Introduction to Linguistic Annotation and Text Analytics", by Graham Wilcock.



    • the Eclipse of Natural Language Engineering, the Lucene of Information Extraction, a leading toolkit for Text Mining
    • used worldwide by thousands of scientists, companies, teachers and students
    • comprised of an architecture, a free open source framework (or SDK) and graphical development environment
    • used for all sorts of language processing tasks, including Information Extraction in many languages
    • funded by the EPSRC, BBSRC, AHRC, the EU and commercial users
    • 100% Java reference implementation of ISO TC37/SC4 and used with XCES in the ANC
    • 10 years old in 2005, used in many research projects and compatible with IBM's UIMA
    • based on MVC, mobile code, continuous integration, and test-driven development, with code hosted on SourceForge


  • K. Bontcheva, V. Tablan, D. Maynard, H. Cunningham. Evolving GATE to Meet New Challenges in Language Engineering. Natural Language Engineering. 10 (3/4), pp. 349-373. (2004).