Word Segmentation Algorithm

From GM-RKB

A Word Segmentation Algorithm is a sequence segmentation algorithm that can be applied by a word segmentation system (to solve a word segmentation task).
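To make the definition concrete, here is a minimal sketch of one common family of word segmentation algorithms: a dynamic-programming (Viterbi-style) search over a lexicon with unigram word scores. The input is treated as a character sequence, and the algorithm chooses the boundary positions whose words maximize the summed score. The function name `segment`, the toy log-probabilities, the `max_len` cap, and the fixed penalty for unknown characters are illustrative assumptions, not taken from the references below.

```python
import math

def segment(text, word_logprob, max_len=10, unk_logprob=-20.0):
    """Split `text` into words, maximizing the summed unigram log-probability.

    word_logprob: dict from known words to log-probabilities (hypothetical scores).
    Characters missing from the lexicon may stand alone as one-character words
    at the fixed penalty unk_logprob, so every input has some segmentation.
    """
    n = len(text)
    # best[i] = (best score for prefix text[:i], start offset of its last word)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        # Only consider candidate words of at most max_len characters
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_logprob:
                lp = word_logprob[word]
            elif end - start == 1:
                lp = unk_logprob
            else:
                continue
            score = best[start][0] + lp
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the stored start offsets backwards to recover the words
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


lexicon = {"the": -3.0, "table": -5.0, "theta": -9.0,
           "tab": -7.0, "le": -8.0, "ble": -9.0}
print(segment("thetable", lexicon))  # ['the', 'table']
```

Here "the" + "table" (score -8.0) beats alternatives such as "theta" + "ble" or "the" + "tab" + "le" (both -18.0). Replacing the scores with a constant cost per word would instead favor segmentations with the fewest words.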



References

2007

  • (Schmid, 2007) ⇒ Helmut Schmid. (2007). “Tokenizing.” In: Corpus Linguistics: An International Handbook. Walter de Gruyter, Berlin.

1999

  • (Brent, 1999) ⇒ Michael R. Brent. (1999). “An Efficient, Probabilistically Sound Algorithm for Segmentation and Word Discovery.” In: Machine Learning, 34(1-3). doi:10.1023/A:1007541817488.

1997

  • (Palmer, 1997) ⇒ David D. Palmer. (1997). “A Trainable Rule-based Algorithm for Word Segmentation.” In: Proceedings of the ACL 1997 Conference. doi:10.3115/976909.979658.
    • QUOTE: This paper presents a trainable rule-based algorithm for performing word segmentation. The algorithm provides a simple, language-independent alternative to large-scale lexical-based segmenters requiring large amounts of knowledge engineering. As a stand-alone segmenter, we show our algorithm to produce high performance Chinese segmentation. In addition, we show the transformation-based algorithm to be effective in improving the output of several existing word segmentation algorithms in three different languages.
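The quoted abstract describes a transformation-based approach: start from an initial segmentation, then learn an ordered list of correction rules, each chosen because it most reduces errors on training data. The sketch below is a heavily simplified, hypothetical version of that idea, not Palmer's actual rule templates. It assumes a rule only inserts or deletes a word boundary between one specific pair of adjacent characters, and that the baseline treats every character as a word. The names `learn_rules`, `apply_rule`, `every_char`, and `segment` are invented for illustration.

```python
from collections import Counter

def boundaries(words):
    """Return the set of word-boundary offsets inside the joined text."""
    cuts, pos = set(), 0
    for w in words[:-1]:
        pos += len(w)
        cuts.add(pos)
    return cuts

def to_words(text, cuts):
    """Split text at the given boundary offsets."""
    edges = [0] + sorted(cuts) + [len(text)]
    return [text[a:b] for a, b in zip(edges, edges[1:])]

def every_char(text):
    """Baseline segmenter (assumed): every character is its own word."""
    return set(range(1, len(text)))

def apply_rule(text, cuts, rule):
    """Insert or delete the boundary at every site between the rule's character pair."""
    action, left, right = rule
    new = set(cuts)
    for i in range(1, len(text)):
        if text[i - 1] == left and text[i] == right:
            if action == "insert":
                new.add(i)
            else:
                new.discard(i)
    return new

def learn_rules(corpus, initial, max_rules=10):
    """Greedily learn an ordered list of boundary rules from gold segmentations."""
    texts = ["".join(ws) for ws in corpus]
    gold = [boundaries(ws) for ws in corpus]
    current = [initial(t) for t in texts]
    rules = []
    for _ in range(max_rules):
        # Net gain of each candidate rule = errors it fixes - errors it introduces
        gains = Counter()
        for t, g, c in zip(texts, gold, current):
            for i in range(1, len(t)):
                pair = (t[i - 1], t[i])
                if i in g and i not in c:
                    gains[("insert",) + pair] += 1   # missing boundary would be fixed
                elif i in c and i not in g:
                    gains[("delete",) + pair] += 1   # spurious boundary would be removed
                elif i in g:
                    gains[("delete",) + pair] -= 1   # correct boundary would be lost
                else:
                    gains[("insert",) + pair] -= 1   # spurious boundary would be added
        if not gains:
            break
        rule, gain = max(gains.items(), key=lambda kv: (kv[1], kv[0]))
        if gain <= 0:
            break  # no rule reduces the error count any further
        rules.append(rule)
        current = [apply_rule(t, c, rule) for t, c in zip(texts, current)]
    return rules

def segment(text, rules, initial):
    """Run the baseline segmenter, then apply the learned rules in order."""
    cuts = initial(text)
    for rule in rules:
        cuts = apply_rule(text, cuts, rule)
    return to_words(text, cuts)


corpus = [["我们", "好"], ["我们", "走"], ["好", "我们"]]
rules = learn_rules(corpus, every_char)
print(rules)                                    # [('delete', '我', '们')]
print(segment("好我们走", rules, every_char))   # ['好', '我们', '走']
```

Passing a different `initial` function (for example, a dictionary-based segmenter) lets the same learner correct another segmenter's output, which loosely mirrors the abstract's point about improving existing word segmentation algorithms.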