2000 LTTTT


Subject Headings: Surface Word Segmentation Task, Tokenization Algorithm, Tokenization System, LT TTT Tokenization System.

Notes

Quotes

Abstract

We describe LT TTT, a recently developed software system which provides tools to perform text tokenisation and mark-up. The system includes ready-made components to segment text into paragraphs, sentences, words and other kinds of token but, crucially, it also allows users to tailor rule-sets to produce mark-up appropriate for particular applications. We present three case studies of our use of LT TTT: named-entity recognition (MUC-7), citation recognition and mark-up and the preparation of a corpus in the medical domain. We conclude with a discussion of the use of browsers to visualise marked-up text.
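
The idea of a user-tailorable rule set that drives both tokenisation and mark-up can be illustrated with a minimal Python sketch. This is not LT TTT's rule language or interface, which the abstract does not show; the rule names (NUM, W, PUNCT, DATE), the regular expressions, and the functions tokenise and mark_up are all assumptions made for illustration only.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical rule set: (token type, regex) pairs, tried in order.
# These names and patterns are illustrative, not taken from LT TTT.
DEFAULT_RULES = [
    ("NUM", r"\d+(?:\.\d+)?"),
    ("W", r"[A-Za-z]+(?:[-'][A-Za-z]+)*"),
    ("PUNCT", r"[^\sA-Za-z0-9]"),
]


def tokenise(text, rules=DEFAULT_RULES):
    """Return (type, string) pairs for every token matched by the rule set."""
    # Combine the rules into one alternation of named groups; earlier rules win.
    combined = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in rules))
    return [(m.lastgroup, m.group()) for m in combined.finditer(text)]


def mark_up(text, rules=DEFAULT_RULES):
    """Wrap each token in an XML-style element named after the rule it matched."""
    return " ".join(f"<{t}>{escape(s)}</{t}>" for t, s in tokenise(text, rules))


# Tailoring the rule set: put a more specific rule ahead of the defaults.
DATE_RULES = [("DATE", r"\d{4}-\d{2}-\d{2}")] + DEFAULT_RULES

if __name__ == "__main__":
    print(mark_up("LT TTT costs 3.5!"))
    print(tokenise("Released 2000-05-31.", DATE_RULES))
```

Because rules are tried in order, placing the hypothetical DATE rule first makes a date come out as a single token instead of a run of numbers and hyphens, which is one simple way to adapt the mark-up to a particular application.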


References


2000 LTTTT
  Title: LT TTT - A flexible tokenisation tool
  Venue: Proceedings of LREC Conference
  URL: http://www.ltg.ed.ac.uk/papers/00tttlrec.pdf
  Year: 2000