2010 LTPAChineseLanguageTechnologyPl


Subject Headings:

Notes

Cited By

Quotes

Abstract

LTP (Language Technology Platform) is an integrated Chinese processing platform that includes a suite of high-performance natural language processing (NLP) modules and relevant corpora. Its syntactic and semantic parsing modules in particular achieved good results in relevant evaluations such as CoNLL and SemEval. Because data is represented internally in XML, users can easily use these modules and corpora by invoking DLL (Dynamic Link Library) or Web service APIs (Application Programming Interfaces), and can view the processing results directly with the visualization tool.

1 Introduction

A Chinese natural language processing (NLP) platform typically includes modules for lexical analysis (word segmentation, part-of-speech tagging, named entity recognition), syntactic parsing, and semantic parsing (word sense disambiguation, semantic role labeling). Developing a full NLP platform is laborious and time-consuming for researchers, especially for Chinese, for which fewer NLP tools exist. Building an integrated Chinese processing platform is therefore particularly worthwhile. Such a platform must address several key problems: providing high-performance language processing modules, integrating these modules smoothly, making processing results convenient to use, and displaying processing results directly. LTP (Language Technology Platform), a Chinese processing platform, was built to solve these problems. It uses XML to pass data between modules and provides a wide range of high-performance Chinese processing modules, DLL and Web service APIs, visualization tools, and relevant corpora.
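
The integration problem can be illustrated with a minimal sketch: each module declares which annotation layers it needs and which layer it produces, and a driver derives a valid execution order. The module names, layer names, and dependencies below are hypothetical illustrations chosen to mirror the lexical, syntactic, and semantic stages listed above; they are not LTP's actual interface.

```python
from graphlib import TopologicalSorter

# Hypothetical module registry: module name -> (layers required, layer produced).
# These names are illustrative assumptions, not LTP's actual module identifiers.
MODULES = {
    "segmentation": (set(), "words"),
    "pos_tagging": ({"words"}, "pos"),
    "ner": ({"words", "pos"}, "entities"),
    "parsing": ({"words", "pos"}, "dependencies"),
    "wsd": ({"words", "pos"}, "senses"),
    "srl": ({"words", "pos", "dependencies"}, "roles"),
}


def execution_order(target: str) -> list[str]:
    """Return the modules needed to run `target`, ordered so dependencies run first."""
    producer = {layer: name for name, (_, layer) in MODULES.items()}
    graph: dict[str, set[str]] = {}  # module -> modules it depends on
    stack = [target]
    while stack:
        name = stack.pop()
        if name in graph:
            continue
        required, _ = MODULES[name]
        deps = {producer[layer] for layer in required}
        graph[name] = deps
        stack.extend(deps)
    return list(TopologicalSorter(graph).static_order())


print(execution_order("srl"))  # ['segmentation', 'pos_tagging', 'parsing', 'srl']
```

Only the modules that semantic role labeling transitively depends on are scheduled; named entity recognition and word sense disambiguation are skipped because nothing in this request needs their layers.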

2 Language Technology Platform

LTP (Language Technology Platform) is an integrated Chinese processing platform. Its architecture is shown in Figure 1. From bottom to top, LTP comprises six components: 1) corpora, 2) various Chinese processing modules, 3) XML-based internal data representation and processing, 4) a DLL API, 5) a Web service, and 6) a visualization tool. The following sections introduce these components in detail.
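
To illustrate how an XML-based internal representation lets modules share data, here is a minimal sketch in which a segmentation stage creates one element per word, a tagging stage enriches those same elements, and a downstream consumer reads back a single layer. The element and attribute names (`sent`, `word`, `form`, `pos`) and the POS tags are hypothetical assumptions, not LTP's documented XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: "sent", "word", "form", and "pos" are illustrative names,
# not LTP's documented XML format.


def segmenter_output(sentence_id: int, words: list[str]) -> ET.Element:
    """Word segmentation stage: create one <word> element per token."""
    sent = ET.Element("sent", id=str(sentence_id))
    for i, word in enumerate(words):
        ET.SubElement(sent, "word", id=str(i), form=word)
    return sent


def add_pos_layer(sent: ET.Element, tags: list[str]) -> ET.Element:
    """POS tagging stage: add a pos attribute to the existing <word> elements."""
    words = sent.findall("word")
    if len(words) != len(tags):
        raise ValueError("expected one tag per segmented word")
    for word, tag in zip(words, tags):
        word.set("pos", tag)
    return sent


def read_layer(xml_text: str, attribute: str) -> list[str]:
    """Downstream consumer: parse the serialized XML and extract one annotation layer."""
    sent = ET.fromstring(xml_text)
    return [word.get(attribute) for word in sent.iter("word")]


sent = add_pos_layer(segmenter_output(0, ["我", "爱", "北京"]), ["r", "v", "ns"])
xml_text = ET.tostring(sent, encoding="unicode")
print(read_layer(xml_text, "pos"))  # ['r', 'v', 'ns']
```

Because each stage only adds attributes to a shared document, later modules can read any earlier layer without knowing which module produced it.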

2.1 Corpora

Many NLP tasks rely on annotated corpora. We distribute two key corpora used by LTP. First, WordMap is a Chinese thesaurus containing 100,093 words. In WordMap, each word sense belongs to a five-level category hierarchy: there are 12 top-level categories, about 100 second-level categories, 1,500 third-level categories, and more categories at the fourth and fifth levels.
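
A five-level hierarchy like WordMap's can be modeled by giving each word sense a fixed-length category path. The sketch below uses made-up category labels (real WordMap codes are not reproduced) and scores two words by the deepest category shared by any pair of their senses. This shared-prefix depth is a common thesaurus-based relatedness heuristic, assumed here for illustration rather than taken from the paper.

```python
from dataclasses import dataclass
from itertools import product

LEVELS = 5  # WordMap organizes word senses in a five-level category hierarchy


@dataclass(frozen=True)
class SenseCategory:
    """Category path of one word sense, top-level label first (labels are hypothetical)."""
    path: tuple[str, ...]

    def __post_init__(self) -> None:
        if len(self.path) != LEVELS:
            raise ValueError(f"expected {LEVELS} levels, got {len(self.path)}")


def shared_depth(a: SenseCategory, b: SenseCategory) -> int:
    """Number of leading levels two sense categories have in common (0 to 5)."""
    depth = 0
    for x, y in zip(a.path, b.path):
        if x != y:
            break
        depth += 1
    return depth


# Toy entries with made-up category labels, not actual WordMap data.
THESAURUS = {
    "苹果": [SenseCategory(("B", "h", "07", "A", "01")),   # fruit sense
             SenseCategory(("D", "m", "02", "B", "03"))],  # company sense
    "香蕉": [SenseCategory(("B", "h", "07", "A", "02"))],
    "公司": [SenseCategory(("D", "m", "02", "C", "01"))],
}


def word_relatedness(w1: str, w2: str) -> int:
    """Deepest category level shared by any pair of senses of the two words."""
    return max(shared_depth(s1, s2) for s1, s2 in product(THESAURUS[w1], THESAURUS[w2]))


print(word_relatedness("苹果", "香蕉"))  # 4
print(word_relatedness("苹果", "公司"))  # 3
```

Taking the maximum over sense pairs lets an ambiguous word such as 苹果 relate to 香蕉 through one sense and to 公司 through another.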

References

Ting Liu, Wanxiang Che, and Zhenghua Li. (2010). "LTP: A Chinese Language Technology Platform."