Written Word Segmentation Task
- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Text_segmentation#Word_segmentation
- Word segmentation is the problem of dividing a string of written language into its component words. In English and many other modern languages that use some form of the Latin alphabet, dividing text at the space character is a good approximation to word segmentation. (Cases where the space character alone may not be sufficient include contractions such as can't for cannot.) However, not all written scripts have an equivalent of this character, and without it word segmentation is a difficult problem. Languages that do not have a trivial word segmentation process include Chinese, Japanese and Thai.
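
The contrast described above can be sketched in a few lines of Python. This is only an illustration, not a method taken from the cited source: the function names `whitespace_segment` and `max_match_segment` are hypothetical, the tiny lexicons are made up for the example, and greedy longest-match is just one simple baseline among many approaches to segmenting text that has no word delimiter.

```python
def whitespace_segment(text):
    """Split on runs of whitespace: a rough approximation for space-delimited scripts."""
    return text.split()


def max_match_segment(text, lexicon):
    """Greedy longest-match segmentation for text written without spaces.

    `lexicon` is an assumed set of known words. Any character that starts no
    known word is emitted as a one-character token.
    """
    longest = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink down to a single character.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Space-delimited text: the contraction stays a single token.
print(whitespace_segment("I can't go"))

# Text without delimiters, using a toy lexicon (hypothetical example data).
print(max_match_segment("我喜欢猫", {"我", "喜欢", "猫"}))
```

Whitespace splitting keeps `can't` as one token, which shows why spaces are only an approximation. The longest-match sketch depends entirely on the coverage of its lexicon and can pick the wrong split when several readings are possible, which is part of what makes the problem difficult for scripts such as Chinese, Japanese and Thai.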