Text Segmentation Task
A Text Segmentation Task is a sequence segmentation task that requires the text annotation of coherent text segments.
- AKA: Text Segmentation, General Chunking Task, GCT, Text Chunking, Text Chunk Tagging
- Context:
- It can range from being a Full Text Segmentation Task to being a Partial Text Segmentation Task.
- It can range from being a Supervised Text Segmentation Task (if provided with a tagged text) to being an Unsupervised Text Segmentation Task.
- It can be solved by a Text Segmentation System that implements a Text Segmentation Algorithm.
- Example(s):
- a Syntactic-Phrase Chunking Task, such as: CoNLL-2000 Shared Task.
- a Word Segmentation Task, such as:
- a Morph Segmentation Task, such as:
- MST("Famous notaries public include ex-attorney generals.") ⇒ ([?Famous] [notarie-] [-s] [public] [in-] [-clude] [ex-] [attorney] [general-] [-s]).
- MST("The wolves' den was empty.") ⇒ ([The] [wolv] [es'] [den] [was] [empty])
- a Written Sentence Segmentation Task.
- a Written Phrase Segmentation Task.
- a Written Word Mention Segmentation Task.
- an Orthographic Word Segmentation Task.
- a Morph Segmentation Task.
- a Grapheme Segmentation Task.
- Counter-Example(s):
- a Text Tagging Task, such as POS Tagging.
- a Speech Segmentation Task.
- a DNA Segmentation Task.
- See: Text, Text Segment.
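As a rough illustration of the definition above, a text segmentation can be represented as a list of character-offset spans over the input text. The sketch below is a naive rule-based approach to a Written Sentence Segmentation Task and an Orthographic Word Segmentation Task; the function names `segment_sentences` and `segment_words`, the `Span` type, and the boundary rules are assumptions made for illustration, not a method taken from the references.

```python
import re
from typing import List, Tuple

# A segment is a (start, end) pair of character offsets, end exclusive.
Span = Tuple[int, int]


def segment_sentences(text: str) -> List[Span]:
    """Naive sentence segmentation (illustrative assumption):
    a sentence ends at '.', '!' or '?' followed by whitespace or end of text.
    Abbreviations such as 'e.g.' will be split incorrectly."""
    spans: List[Span] = []
    start = 0
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        end = match.end()
        spans.append((start, end))
        # Skip the whitespace between sentences so it belongs to no segment.
        start = end
        while start < len(text) and text[start].isspace():
            start += 1
    if start < len(text):
        # Trailing text without final punctuation forms a last segment.
        spans.append((start, len(text)))
    return spans


def segment_words(text: str) -> List[Span]:
    """Naive orthographic word segmentation: maximal runs of non-whitespace."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


if __name__ == "__main__":
    text = "The wolves' den was empty. Nobody was home."
    for s, e in segment_sentences(text):
        print((s, e), repr(text[s:e]))
    print([text[s:e] for s, e in segment_words(text)])
```

Returning offsets rather than substrings keeps the segments aligned with the original text, which is what a supervised variant of the task would compare against tagged boundaries.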
References
2011
- (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Text_segmentation
- QUOTE: Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental processes used by humans when reading text, and to artificial processes implemented in computers, which are the subject of natural language processing. The problem is non-trivial, because while some written languages have explicit word boundary markers, such as the word spaces of written English and the distinctive initial, medial and final letter shapes of Arabic, such signals are sometimes ambiguous and not present in all written languages.
Compare speech segmentation, the process of dividing speech into linguistically meaningful portions.
2005
- (McDonald et al, 2005) ⇒ Ryan McDonald, Koby Crammer, and Fernando Pereira. (2005). "Flexible Text Segmentation with Structured Multilabel Classification." In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT/EMNLP 2005).
2000
- (McCallum et al, 2000) ⇒ Andrew McCallum, Dayne Freitag, and Fernando Pereira. (2000). "Maximum Entropy Markov Models for Information Extraction and Segmentation." In: Proceedings of ICML-2000.
- (Choi, 2000) ⇒ Freddy Y. Y. Choi. (2000). "Advances in Domain Independent Linear Text Segmentation." In: Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (NAACL 2000).
1999
- (Beeferman et al, 1999) ⇒ Doug Beeferman, Adam Berger, and John D. Lafferty. (1999). "Statistical Models for Text Segmentation." In: Machine Learning, 34(1–3).
- QUOTE: This paper introduces a new statistical approach to automatically partitioning text into coherent segments. ... Assessment of our approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains, Wall Street Journal news articles and television broadcast news story transcripts. Quantitative results on these domains are presented using a new probabilistically motivated error metric, which combines precision and recall in a natural and flexible way. This metric is used to make a quantitative assessment of the relative contributions of the different feature types, as well as a comparison with decision trees and previously proposed text segmentation algorithms.
1988
- (Hobbs et al, 1988) ⇒ Jerry R. Hobbs, Mark Stickel, Paul Martin, and Douglas Edwards. (1988). "Interpretation as Abduction." In: Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics (ACL 1988).