Formal Language

Jump to navigation Jump to search

A Formal Language is a subset of all possible strings that can be composed from an alphabet.


  • (Wikipedia, 2009) ⇒
    • A formal language is a set of words, i.e. finite strings of letters, or symbols. The inventory from which these letters are taken is called the alphabet over which the language is defined. A formal language is often defined by means of a formal grammar. Formal languages are a purely syntactical notion, so there is not necessarily any meaning associated with them. To distinguish the words that belong to a language from arbitrary words over its alphabet, the former are sometimes called well-formed words (or, in their application in logic, well-formed formulas).
    • Formal languages are studied in the fields of logic, computer science and linguistics. Their most important practical application is for the precise definition of syntactically correct programs for a programming language. The branch of mathematics and computer science that is concerned only with the purely syntactical aspects of such languages, i.e. their internal structural patterns, is known as formal language theory.
    • Although it is not formally part of the language, the words of a formal language often have a semantical dimension as well. In practice this is always tied very closely to the structure of the language, and a formal grammar (a set of formation rules that recursively defines the language) can help to deal with the meaning of (well-formed) words. Well-known examples for this are "Tarski's definition of truth" in terms of a T-schema for first-order logic, and compiler generators like lex and yacc.


  • (Kakkonen, 2007) ⇒ Tuomo Kakkonen. (2007). “Framework and Resources for Natural Language Evaluation." Academic Dissertation. University of Joensuu.
    • Definition 3-1. Symbol, terminal and alphabet.
      • A symbol is a distinguishable character, such as “a”, “b” or “c”.
      • Any permissible sequence of symbols is called a terminal (also referred to as a word).
      • A finite, nonempty set ∑ of terminals is called an alphabet.
    • Definition 3-2. String and sets of strings.
      • Let Σ be an alphabet.
      • A finite sequence of symbols S=(x1 x2… xn), n≥0, x∈Σ is called a string in alphabet Σ.
      • The length |S| of string S is n.
      • The empty string is the sequence of length 0; written ε.
      • Σ* is the set of all strings in Σ.
      • In addition, Σ+ = Σ*- {ε}.
    • Definition 3-3. Language and sentence.
      • Let Σ be an alphabet.
      • Any subset [math]\displaystyle{ L }[/math] of Σ* is called a language over alphabet Σ.
      • Sequence δ = (α1 α2 … αn), where αiLi, 1≤in, n' ∈ natural numbers, is called a sentence in language L.
    • A language follows the rules of a given grammar and is represented by using a particular grammar formalism.
    • Definition 3-4. Grammar, rules.
      • A grammar [math]\displaystyle{ G }[/math] is a description of a language L.
      • A grammar [math]\displaystyle{ G }[/math] consists of a lexicon and rules.
      • A lexicon is a structure that defines the terminals in a language.
      • Rules describe how the terminals combine into larger entities.