1999 DiscoveringChineseWordsFromUnsegText
- (Ge et al., 1999) ⇒ Xianping Ge, Wanda Pratt, Padhraic Smyth. (1999). “Discovering Chinese Words from Unsegmented Text.” In: Proceedings of SIGIR-1999. doi:10.1145/312624.313472
Subject Headings: Surface Word Segmentation Task.
Notes
Cited By
~69 http://scholar.google.com/scholar?cites=5555599676164448189
Quotes
Abstract
- In English written text, words are separated by spaces, but in written Chinese text, there are no such separators between words. (See Figure 1.) Thus, effective information retrieval of Chinese text first requires good word segmentation. In this paper, we investigate an efficient algorithm to discover the words and their occurrence probabilities from a corpus of unsegmented text without using a dictionary. Using the probabilities of the words, word segmentation is done according to the maximum likelihood principle. Comparing the segmentation output by the algorithm with the correct segmentation, recall/precision of 65.65%/71.91% is achieved. If some simple post-processing is performed, recall/precision can be boosted up to 97.72%/91.05%.
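- Illustration (not part of the paper): the maximum likelihood segmentation step mentioned in the abstract can be sketched as a dynamic program over a unigram word model. This is a minimal sketch under stated assumptions, not necessarily the authors' exact procedure: the function name `segment`, the `max_word_len` limit, and the probabilities in `TOY_PROBS` are hypothetical, and the sketch assumes the word probabilities have already been estimated.

```python
import math

# Hypothetical toy word probabilities, invented for illustration only.
TOY_PROBS = {
    "中国": 0.30, "人民": 0.29, "人": 0.20, "民": 0.10,
    "中": 0.05, "国": 0.05, "国人": 0.01,
}

def segment(text, word_probs, max_word_len=4):
    """Return the most probable split of `text` into known words,
    treating words as independent (unigram model)."""
    n = len(text)
    # best[i] holds the highest log-probability of any segmentation of text[:i].
    best = [-math.inf] * (n + 1)
    # back[i] holds the start index of the last word in that segmentation.
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            prob = word_probs.get(text[start:end])
            if prob is None or best[start] == -math.inf:
                continue  # unknown word, or prefix cannot be segmented
            score = best[start] + math.log(prob)
            if score > best[end]:
                best[end] = score
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be covered by the given vocabulary")
    # Walk the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

print(segment("中国人民", TOY_PROBS))
```

With these toy values, the split 中国 | 人民 has probability 0.30 × 0.29 = 0.087, which is higher than any other split of the same string, so it is the one selected. Working in log-probabilities avoids numerical underflow on long texts.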
References
| Author | title | titleUrl | doi |
---|---|---|---|---|
1999 DiscoveringChineseWordsFromUnsegText | Padhraic Smyth, Xianping Ge, Wanda Pratt | Discovering Chinese Words from Unsegmented Text | http://ictclas.cn/otherdocs/Discovering Chinese words from unsegmented text.pdf | 10.1145/312624.313472 |