A lexeme is an abstract entity that maps a morphosyntactic word to a wordsense set.

Word Lemmatisation Task.




  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/lexeme Retrieved:2015-4-10.
    • A lexeme is a unit of lexical meaning that exists regardless of the number of inflectional endings it may have or the number of words it may contain. It is a basic unit of meaning, and the headwords of a dictionary are all lexemes. [1] Put more technically, a lexeme is an abstract unit of morphological analysis in linguistics, that roughly corresponds to a set of forms taken by a single word. For example, in the English language, run, runs, ran and running are forms of the same lexeme, conventionally written as RUN. [2] A related concept is the lemma (or citation form), which is a particular form of a lexeme that is chosen by convention to represent a canonical form of a lexeme. Lemmas are used in dictionaries as the headwords, and other forms of a lexeme are often listed later in the entry if they are not common conjugations of that word.

      A lexeme belongs to a particular syntactic category, has a certain meaning (semantic value), and in inflecting languages, has a corresponding inflectional paradigm; that is, a lexeme in many languages will have many different forms. For example, the lexeme RUN has a present third person singular form runs, a present non-third-person singular form run (which also functions as the past participle and non-finite form), a past form ran, and a present participle running. (It does not include runner, runners, runnable, etc.) The use of the forms of a lexeme is governed by rules of grammar; in the case of English verbs such as RUN, these include subject-verb agreement and compound tense rules, which determine which form of a verb can be used in a given sentence.

      A lexicon consists of lexemes.

      In many formal theories of language, lexemes have subcategorization frames to account for the number and types of complements. They occur within sentences and other syntactic structures.

      The notion of a lexeme is very central to morphology, and thus, many other notions can be defined in terms of it. For example, the difference between inflection and derivation can be stated in terms of lexemes:

      • Inflectional rules relate a lexeme to its forms.
      • Derivational rules relate a lexeme to another lexeme.
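The inflection/derivation contrast above can be sketched in code. This is only an illustrative toy, not part of the quoted source: the `Lexeme` class, the feature labels, and the `derive_agent_noun` rule (which is only meant to work for "run"-like verbs) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    """An abstract lexical unit: citation form (lemma), category, paradigm."""
    lemma: str            # citation form, e.g. "run"
    category: str         # syntactic category, e.g. "verb"
    paradigm: tuple = ()  # (feature label, word-form) pairs

    def forms(self):
        """Inflection: relate the lexeme to the set of its word-forms."""
        return {form for _, form in self.paradigm}


RUN = Lexeme(
    lemma="run",
    category="verb",
    paradigm=(
        ("present non-3sg", "run"),
        ("present 3sg", "runs"),
        ("past", "ran"),
        ("past participle", "run"),
        ("present participle", "running"),
    ),
)


def derive_agent_noun(verb):
    """Derivation: relate a verb lexeme to a *different* lexeme.

    Hypothetical toy rule: double the final consonant and add -er.
    """
    noun = verb.lemma + verb.lemma[-1] + "er"
    return Lexeme(
        lemma=noun,
        category="noun",
        paradigm=(("singular", noun), ("plural", noun + "s")),
    )


RUNNER = derive_agent_noun(RUN)
```

Here `RUN.forms()` collects only inflected forms of RUN, while `RUNNER` is a separate lexeme with its own paradigm, mirroring the remark that RUN does not include runner or runners.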




  • http://folk.uio.no/hhasselg/terms.html
    • lexeme (lexeme): an item of vocabulary; a 'family' of words that are related to each other in that they are inflected forms of the same stem, and carry the same core meaning. E.g. draw, draws, drew, drawn, drawing are all instances of the same lexeme ('draw'). However, the noun drawing represents another lexeme (that can be realized by drawing, drawings). A lexeme is usually cited as the base form of a word; the citation form which is what is recorded in dictionaries. See also word.


  • http://www.phon.ucl.ac.uk/home/dick/enc/words.htm Encyclopedia of English Grammar and Word Grammar
    • A 'dictionary-word', which may comprise several different inflections; e.g. DOG is the lexeme which covers both dog and dogs. Where there is no inflectional variation (e.g. for prepositions, adverbs and the like) it makes no difference whether you write them as lexemes (e.g. OF) or as simple word-forms (e.g. of).

      In general each lexeme belongs to just one word-class (or combination of word-classes - see mixed category), so a lexeme does not necessarily bring together all words which we feel to be closely related - e.g. the verb and noun walk as in (1) belong to different lexemes, WALKv and WALKn. (1.a) We walk everywhere. (1.b) We had a nice walk. This distinction is forced on us by the need to distinguish verbs and nouns. The links between similar lexemes are shown in the network of the grammar by links such as `nominalization of' which handle `word-formation' rules. The general principle, then, is that we must recognise distinct lexemes whenever two (or more) characteristics co-vary - e.g. if meaning and morphology co-vary. This general principle allows us to group lexemes into clusters, with a 'super-lexeme' and various 'sub-lexemes' which inherit most of the super-lexeme's characteristics but differ in specific ways. For example, we can recognise a super-lexeme STANDv, which is a verb and has the irregular ed-form and en-form stood; it is different from STANDn, the noun in examples like: “We erected a stand for the spectators.” Beneath STANDv we recognise two sub-lexemes, the intransitive STANDvi and the transitive STANDvt, found in (3a) “We stand on our feet.” and (3b) “We can't stand the noise.” Each of these has a distinct syntactic valency (object or no object) paired with a distinct meaning (`be upright' versus `tolerate').

      One special kind of lexeme is used for idioms (e.g. `hot dog', `kick the bucket'). Most idioms are built round one word, whose dependents change its normal meaning. This can be handled by recognising the parent word as a special sub-lexeme of the lexeme concerned: DOGsausage or KICKdie, which have the idiomatic meaning and require the dependent which forces this meaning - e.g. DOGsausage means `sausage-filled roll' and requires hot as its preadjunct.
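The super-lexeme/sub-lexeme clustering described above behaves like default inheritance. A minimal sketch, assuming a hypothetical `LexemeNode` class and flat property names (neither comes from the source, which uses a network notation):

```python
class LexemeNode:
    """A lexeme in an inheritance network (hypothetical representation)."""

    def __init__(self, name, parent=None, **properties):
        self.name = name
        self.parent = parent          # the super-lexeme, if any
        self.properties = properties  # characteristics stated on this node

    def get(self, key):
        """Look up a property locally, then fall back to the super-lexeme."""
        node = self
        while node is not None:
            if key in node.properties:
                return node.properties[key]
            node = node.parent
        raise KeyError(key)


# Super-lexeme: carries word class and the irregular forms.
STAND_V = LexemeNode("STANDv", word_class="verb", ed_form="stood", en_form="stood")

# Sub-lexemes: inherit the above, but differ in valency and meaning.
STAND_VI = LexemeNode("STANDvi", parent=STAND_V,
                      valency="no object", meaning="be upright")
STAND_VT = LexemeNode("STANDvt", parent=STAND_V,
                      valency="object", meaning="tolerate")
```

Both sub-lexemes answer `get("ed_form")` with the inherited value, while `get("meaning")` differs between them, which is the co-variation of valency and meaning the passage uses to motivate distinct sub-lexemes.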


  • (Sag et al., 2003) ⇒ Ivan A. Sag, Thomas Wasow, and Emily M. Bender. (2003). “Syntactic Theory: A Formal Introduction, 2nd edition." CSLI Publications.
    • NOTES: It includes a Type Hierarchy between Lexeme, Expression, Phrase, and Word.
    • NOTES: It suggests that Lexical Entries serve as the basis for constructing Words and Words serve as the building blocks for Syntactic Structures.
    • But we also want to capture what people have in mind when they use 'word' in the second sense. That is, we want to be able to express the relationship between runs and ran (and run and running). We do this by means of a new type lexeme. A lexeme can be thought of as an abstract proto-word, which, by means to be discussed in this chapter, gives rise to genuine words (that is, instances of the type word).

      We incorporate the notion of lexeme into our theory by first revising a high-level distinction in our type hierarchy - the types that distinguish among the syntactic-semantic complexes we have been referring to as expressions, words and phrases. We will refer to the most general such type of feature structure simply as synsem (indicating that it is a complex of syntactic and semantic information). The type expression will then be an immediate subtype of synsem, as will the new type lexeme.

      The lexical entries, taken together with the constraints inherited via the lexeme hierarchy characterize the set of basic lexical elements of the language .... Thus, lexical entries serve as the basis for constructing words and words serve as the building blocks for syntactic structures.

      Among lexemes, we draw a further distinction between those that give rise to a set of inflected forms and those that do not show any morphological inflection. That is, we posit inflecting-lexeme (infl-lxm) and constant-lexeme (const-lxm) as two subtypes of lexeme. The type hierarchy we will assume for nominal and verbal lexemes in English is sketched …

      lexeme: The term 'word' is used ambiguously to mean either a particular form, such as sees, or a set of related forms such as see, sees, saw, seen, and seeing. To avoid this ambiguity, linguists sometimes posit an abstract entity called a 'lexeme' that gives rise to a family of related words. See also word.

      lexical entry: Information about individual words [q.v.] that must be stipulated is put into the lexicon [q.v.] in the form of descriptions that we call lexical entries. They are ordered pairs, consisting of a phonological form (description) and partial feature structure description. Fully resolved lexical sequences [q.v.] consistent with lexical entries can serve as the INPUT values to lexical rules [q.v.].

      lexical rule: Lexical rules are one of the mechanisms (along with the type hierarchy [q.v.]) used to capture generalizations within the lexicon. Families of related words - such as the different inflectional forms of a verb - can be derived from a single lexical entry [q.v.] by means of lexical rules. We formalize lexical rules as a type of feature structure with features INPUT and OUTPUT. There are three subtypes of lexical rules: derivational (relating lexemes [q.v.] to lexemes), inflectional (relating lexemes to words [q.v.]), and post-inflectional (relating words to words).

      lexical rule instantiation: Our lexical rules [q.v.] are descriptions, specifying the value of some features and leaving others unspecified. A lexical rule instantiation is a fully resolved feature structure that is consistent with the specification of some lexical rule.

      lexical sequence: Ordered pairs that can serve as the INPUT and OUTPUT values of lexical rules [q.v.] are called lexical sequences. They consist of a phonological form and a fully resolved feature structure.

      lexicon: The list of all words [q.v.] (or lexemes [q.v.]) of a language is called its 'lexicon'. The lexicon is the repository of all idiosyncratic information about particular words including syntactic, semantic, and phonological information. In some theories of grammar, the lexicon can also contain a great deal more systematic information organized by a type hierarchy [q.v.] and/or lexical rules.
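The glossary entries on lexical sequences and lexical rules can be illustrated with a rough sketch. It is a deliberate simplification and not the formalism of Sag et al.: feature structures are flattened to feature/value pairs, the rule is a plain function rather than a typed constraint, and the names `LexicalSequence`, `make_sequence`, and `third_singular_rule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalSequence:
    """A phonological form paired with a (flattened) feature structure."""
    form: str
    features: tuple  # sorted (feature, value) pairs

    def feature(self, name):
        return dict(self.features)[name]


def make_sequence(form, **features):
    """Build a lexical sequence from a form and keyword features."""
    return LexicalSequence(form, tuple(sorted(features.items())))


def third_singular_rule(input_seq):
    """Toy inflectional rule: INPUT is a lexeme, OUTPUT is a word.

    Assumes regular -s suffixation; irregular verbs are not handled.
    """
    if input_seq.feature("type") != "lexeme" or input_seq.feature("pos") != "verb":
        raise ValueError("rule applies only to verbal lexemes")
    output_seq = make_sequence(
        input_seq.form + "s",
        type="word", pos="verb", agr="3sg", tense="present",
    )
    # The returned pair plays the role of a lexical rule instantiation.
    return {"INPUT": input_seq, "OUTPUT": output_seq}


WALK = make_sequence("walk", type="lexeme", pos="verb")
instantiation = third_singular_rule(WALK)
```

A derivational rule would, in the same style, map a lexeme-typed INPUT to a lexeme-typed OUTPUT, and a post-inflectional rule would map word to word.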



  • (Manning and Schütze, 1999) ⇒ Christopher D. Manning and Hinrich Schütze. (1999). “Foundations of Statistical Natural Language Processing." The MIT Press.
    • QUOTE: The major types of morphological processes are inflection, derivation, and compounding. Inflections are the systematic modifications of a root form by means of prefixes and suffixes to indicate grammatical distinctions like singular and plural. Inflection does not change word class or meaning significantly, but varies features such as tense, number, and plurality. All the inflectional forms of a word are often grouped as manifestations of a single lexeme.

      Note that this means that we will often have multiple forms, perhaps some treated as one word and others as two, for what is best thought of as a single lexeme (a single dictionary entry with a single meaning).

      Another question is whether one wants to keep word forms like sit, sits and sat separate or to collapse them. The issues here are similar to those in the discussion of capitalization, but have traditionally been regarded as more linguistically interesting. At first, grouping such forms together and working in terms of lexemes feels as if it is the right thing to do. Doing this is usually referred to in the literature as stemming in reference to a process that strips off affixes and leaves you with a stem. Alternatively, to find the lemma or lexeme of which one is looking at an inflected form. These latter terms imply disambiguation at the level of lexemes, such as whether a use of lying represents the verb lie-lay 'to prostrate oneself' or lie-lied 'to fib.'
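The difference between stripping affixes and finding the lexeme behind a form can be shown with a small sketch. The suffix list, the tiny `LEXICON` table, and the lexeme labels are assumptions invented for this example; they do not correspond to any particular stemmer or lexical resource.

```python
SUFFIXES = ("ing", "ed", "s")  # toy list, checked in this order


def stem(word):
    """Crude stemming: strip the first matching suffix, no lexicon lookup."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word


# Hypothetical lexicon: word-form -> candidate lexemes it may realise.
LEXICON = {
    "sit": ["SIT"],
    "sits": ["SIT"],
    "sat": ["SIT"],
    "lying": ["LIE (lie-lay)", "LIE (lie-lied)"],
}


def lemmatise(word):
    """Lemmatisation: look up the lexeme(s) a form may belong to."""
    return LEXICON.get(word, [])
```

With these toy definitions, `stem("sits")` reduces to the stem of sit, but `stem("sat")` leaves the irregular form unchanged, whereas `lemmatise("sat")` groups it with SIT. `lemmatise("lying")` returns two candidates, so choosing between them still requires the lexeme-level disambiguation the quote mentions.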


  • (Carter, 1998) ⇒ Ronald Carter. (1998). “Vocabulary: Applied Linguistic Perspectives; 2nd edition." Routledge.
    • QUOTE: One theoretical notion which may help us to resolve some of the above problems is that of the lexeme. A lexeme is the abstract unit which underlies some of the variants we have observed in connection with 'words'. Thus BRING is the lexeme which underlies different grammatical variants: 'bring', 'brought', 'brings', 'bringing' which we can refer to as word-forms (note a lexeme is conventionally represented by upper-case letters and that quotation marks are used for its word-forms). Lexemes are the basic, contrasting units of vocabulary in a language. When we look up words in a dictionary we are looking up lexemes rather than words. That is, 'brought' and 'bringing' will be found under an entry for BRING. The lexeme BRING is an abstraction. It does not actually occur itself in texts. Instead, it realizes different word-forms. Thus, the word-form 'bring' is realized by the lexeme BRING; the lexeme GO realizes the word-form 'went'. In a dictionary each lexeme merits a separate entry or sub-entry.

      The term lexeme also embraces items which consist of more than one word-form. Into this category come lexical items such as multi-word verbs (to catch up on), phrasal verbs (to drop in) and idioms (kick the bucket). Here, KICK THE BUCKET is a lexeme and would appear as such in a single dictionary entry even though it is a three-word form. …

      QUOTE: We can also see that the notion of lexeme helps us to represent the polysemy - or the existence of several meanings - in individual words: that is, fair (n.), fair (adj. as in good, acceptable) and fair (adj. as in light in colour, especially of hair) would have three different lexeme meanings for the same word-form. The same applies to the different meanings of lap … But there are numerous less clear-cut categories. For example, in the case of line (draw a line; rail line; clothes line) is the same surface form realized by one, two, or three separate underlying lexemes? And are the meanings of chair (professional appointment; seat) or paper (newspaper; academic lecture) or dressing (sauce; manure; bandages) specializations of the same basic lexeme or not?

      An important question which also arises here concerns our own metalanguage in this book. Should we talk of words or word-forms or lexemes or lexical items? It is clear that the uses of the words word or vocabulary have a general common-sense validity and are serviceable when there is no real need to be precise. They will continue to be used for general reference. The terms lexeme and the word-forms of a lexeme are valuable theoretical concepts and will be used when theoretical distinctions are necessary. Lexical item(s) (or sometimes vocabulary items or simply items) is a useful and fairly neutral hold-all term which captures and, to some extent, helps to overcome instability in the term word, especially when it becomes limited by orthography.

      In this chapter there is a distinct shift from examining lexical items at the level of the orthographic ‘word’ or in the patterns which occur in fixed expressions towards a consideration of lexis in larger units of language organization.

  • (Roelofs et al., 1998) ⇒ Ardi Roelofs, Antje S. Meyer, and Willem J.M. Levelt. (1998). “Case for the Lemma/Lexeme Distinction in Models of Speaking: Comment on Caramazza and Miozzo (1997).” In: Cognition, 69.
    • QUOTE: In lexical access, speakers draw on stored knowledge about words. This stored information comprises the meanings of words, their syntactic properties (such as the word class, subcategorization features for verbs, and grammatical gender for nouns), and information about their morphological structure and phonological form. The received view holds that lexical access consists of two major steps, corresponding to the formulation stages of syntactic encoding and morphophonological encoding, respectively. During the first step, often called lemma retrieval, a word’s syntactic properties and, on some views, its meaning are retrieved from memory. During the second step, information about the word’s morphophonological form, often called its lexeme, is recovered.


  • (Breidt et al., 1996) ⇒ Elisabeth Breidt, Frédérique Segond, and Giuseppe Valetto. (1996). “Formal Description of Multi-Word Lexemes with the Finite-State Formalism IDAREX.” In: Proceedings of COLING 1996.



  • (Aronoff, 1994) ⇒ Mark Aronoff. (1994). “Morphology by Itself." Cambridge, MA: MIT Press.
    • http://www.facstaff.bucknell.edu/rbeard/lexbase.html
    • QUOTE: Aronoff (1994) distinguishes a lexeme based morphology from morpheme based theories. The latter 'reduce[s] language to simplex signs, each of which is an arbitrary union of sound and meaning', i.e. the 'morpheme'. Lexeme-based morphology, on the other hand, 'starts from two decidedly unstructuralist assumptions: that the morpheme is not the basic unit of language and that morphology and syntax are not one and the same.' Morpheme-based morphology, in other words, assumes that language contains only one type of meaningful unit, the morpheme, which includes stems and affixes, all of which are signs. Lexeme-based morphology assumes that only lexemes, derived or underived, are signs, and that affixes, reduplication, revowelling, metathesis, subtraction, stem mutation, and the like, are means of phonologically marking independent derivational operations which a lexeme might have undergone.


  • (Beard, 1986) ⇒ Robert Beard. (1986). “Neurological Evidence for Lexeme/Morpheme-based Morphology.” In: Proceedings of the International Conference "Theoretical Approaches to Word Formation". Acta Linguistica Academia Scientiarum Hungarica, 36.


  1. The Cambridge Encyclopedia of The English Language. Ed. David Crystal. Cambridge: Cambridge University Press, 1995. p. 118. ISBN 0521401798
  2. RUN is here intended to display in small caps. Software limitations may result in its display either in full-sized capitals (RUN) or in full-sized capitals of a smaller font; either is anyway regarded as an acceptable substitute for genuine small caps.