A Chinese Language is a Natural Language based in China.



  • (Sproat et al., 1996) ⇒ Richard Sproat, William A. Gale, Chilin Shih, and Nancy Chang. (1996). “A Stochastic Finite-state Word-Segmentation Algorithm for Chinese.” In: Computational Linguistics, 22(3).
    • The first point we need to address is what type of linguistic object a hanzi represents. Much confusion has been sown about Chinese writing by the use of the term ideograph, suggesting that hanzi somehow directly represent ideas. The most accurate characterization of Chinese writing is that it is morphosyllabic (DeFrancis 1984): each hanzi represents one morpheme lexically and semantically, and one syllable phonologically.
    • Thus in a two-hanzi word like 中國 zhong1-guo2 (middle country) 'China' there are two syllables, and at the same time two morphemes. Of course, since the number of attested (phonemic) Mandarin syllables (roughly 1400, including tonal distinctions) is far smaller than the number of morphemes, it follows that a given syllable could in principle be written with any of several different hanzi, depending upon which morpheme is intended: the syllable zhong1 could be 中 'middle,' 鐘 'clock,' 終 'end,' or 忠 'loyal.' A morpheme, on the other hand, usually corresponds to a unique hanzi, though there are a few cases where variant forms are found. Finally, quite a few hanzi are homographs, meaning that they may be pronounced in several different ways, and in extreme cases apparently represent different morphemes: The prenominal modification marker 的 de0 is presumably a different morpheme from the second morpheme of 目的 mu4-di4, even though they are written the same way.
    • (In Chinese, numerals and demonstratives cannot modify nouns directly, and must be accompanied by a classifier. The particular classifier used depends upon the noun.)
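The morphosyllabic relation described in the quotation (one hanzi, one syllable; one syllable, many possible hanzi; occasionally one hanzi, several readings) can be pictured as a pair of lookup tables. The sketch below is a hypothetical toy model, not part of Sproat et al.'s segmentation algorithm: its lexicon holds only the examples from the quoted passage, written in traditional characters, with short glosses added for illustration, and the names `LEXICON`, `build_indexes`, and `transcribe` are invented here.

```python
from collections import defaultdict

# Hypothetical toy lexicon of (hanzi, syllable with tone number, gloss).
# Entries are the examples quoted above; glosses are illustrative only.
LEXICON = [
    ("中", "zhong1", "middle"),
    ("鐘", "zhong1", "clock"),
    ("終", "zhong1", "end"),
    ("忠", "zhong1", "loyal"),
    ("國", "guo2", "country"),
    ("的", "de0", "prenominal modification marker"),
    ("目", "mu4", "eye"),
    ("的", "di4", "target"),  # second morpheme of 目的 mu4-di4
]


def build_indexes(lexicon):
    """Index the toy lexicon in both directions."""
    by_syllable = defaultdict(list)  # syllable -> hanzi (homophones)
    by_hanzi = defaultdict(list)     # hanzi -> syllables (homographs)
    for hanzi, syllable, _gloss in lexicon:
        if hanzi not in by_syllable[syllable]:
            by_syllable[syllable].append(hanzi)
        if syllable not in by_hanzi[hanzi]:
            by_hanzi[hanzi].append(syllable)
    return dict(by_syllable), dict(by_hanzi)


def transcribe(word, by_hanzi):
    """Return the candidate syllables for each hanzi, one slot per hanzi."""
    return [by_hanzi[h] for h in word]


by_syllable, by_hanzi = build_indexes(LEXICON)
print(by_syllable["zhong1"])         # ['中', '鐘', '終', '忠']
print(by_hanzi["的"])                # ['de0', 'di4']
print(transcribe("中國", by_hanzi))  # [['zhong1'], ['guo2']]
```

Indexing by syllable exposes the homophones of zhong1, while indexing by hanzi exposes the homograph 的. Because each hanzi contributes exactly one syllable slot, a word such as 中國 yields as many syllables as it has hanzi.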