# Formal Language

A Formal Language is a subset of all possible strings that can be composed from an alphabet.

**AKA:**Language**Context:**- It can be an Empty Set, a Finite Set, or an Infite Set.
- It can be generated by a Formal Grammar (Formal System?).
- It can have Well-Formed Words.
- It can be the Union of two Languages. L = L1 union L2
- It can be the Intersection of two Languages. L = L1 intersect L2
- It can be the Complement of a Languages. L = Σ* - L1
- It can be the Difference of two Languages. L = L1 - L2

**Example(s):**- {0,10010,1,1101}
- {(I), (walk), (talk), (I walk), (I talk), (talk I), (walk I), (talk walk), (walk talk)}
- A Programming Language, such as LISP, Prolog, Java, or Perl.
- A Logic Language, such as a Propositional Logic Language.
- A Mathematical Language, such as an Arithmetic Language.

**Counter-Example(s):****See:**Terminal, Formal String, Grammar, Production Rule, Regular Language, Regular Expression, Formal Ontology.

## References

- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Formal_language
- A formal language is a set of words, i.e. finite strings of letters, or symbols. The inventory from which these letters are taken is called the alphabet over which the language is defined. A formal language is often defined by means of a formal grammar. Formal languages are a purely syntactical notion, so there is not necessarily any meaning associated with them. To distinguish the words that belong to a language from arbitrary words over its alphabet, the former are sometimes called well-formed words (or, in their application in logic, well-formed formulas).
- Formal languages are studied in the fields of logic, computer science and linguistics. Their most important practical application is for the precise definition of syntactically correct programs for a programming language. The branch of mathematics and computer science that is concerned only with the purely syntactical aspects of such languages, i.e. their internal structural patterns, is known as formal language theory.
- Although it is not formally part of the language, the words of a formal language often have a semantical dimension as well. In practice this is always tied very closely to the structure of the language, and a formal grammar (a set of formation rules that recursively defines the language) can help to deal with the meaning of (well-formed) words. Well-known examples for this are "Tarski's definition of truth" in terms of a T-schema for first-order logic, and compiler generators like lex and yacc.

### 2007

- (Kakkonen, 2007) ⇒ Tuomo Kakkonen. (2007). “Framework and Resources for Natural Language Evaluation." Academic Dissertation. University of Joensuu.
- Definition 3-1. Symbol, terminal and alphabet.
- A symbol is a distinguishable character, such as “a”, “b” or “c”.
- Any permissible sequence of symbols is called a terminal (also referred to as a word).
- A finite, nonempty set ∑ of terminals is called an alphabet.

- Definition 3-2. String and sets of strings.
- Let Σ be an alphabet.
- A finite sequence of symbols
*S=(x1 x2… xn), n≥0, x∈Σ*is called a*string*in alphabet Σ. - The
*length*|S| of string S is*n*. - The
*empty string*is the sequence of length 0; written ε*.* - Σ* is the set of all strings in Σ.
- In addition, Σ+ = Σ*- {
*ε*}.

- Definition 3-3. Language and sentence.
- Let Σ be an alphabet.
- Any subset [math]L[/math] of Σ* is called a
*language*over alphabet Σ. - Sequence δ = (
*α1 α2 … αn*), where*αi*∈*L*∀*i*, 1≤*i*≤*n*, n' ∈ natural numbers, is called a*sentence*in language*L*.

- A language follows the rules of a given grammar and is represented by using a particular grammar formalism.
- Definition 3-4. Grammar, rules.

- Definition 3-1. Symbol, terminal and alphabet.