Levenshtein Edit Distance Measure

From GM-RKB
Jump to navigation Jump to search

A Levenshtein Edit Distance Measure is a string edit distance function that is based on the minimum number of string edit operations required to transform one string into another string.



References

2011

  • (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Levenshtein_distance
    • In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e. an edit distance). The term edit distance is often used to refer specifically to Levenshtein distance. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965.[1]

      For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:

      1. kitten → sitten (substitution of 's' for 'k')
      2. sitten → sittin (substitution of 'i' for 'e')
      3. sittin → sitting (insertion of 'g' at the end).
    • Levenshtein distance is not the only popular notion of edit distance. Variations can be obtained by changing the set of allowable edit operations: for instance,
    • Edit distance in general is usually defined as a parametrizable metric in which a repertoire of edit operations is available, and each operation is assigned a cost (possibly infinite). This is further generalized by DNA sequence alignment algorithms such as the Smith–Waterman algorithm, which make an operation's cost depend on where it is applied.
  1. В.И. Левенштейн (1965). "Двоичные коды с исправлением выпадений, вставок и замещений символов". Доклады Академий Наук СCCP 163 (4): 845–8.  Appeared in English as: Levenshtein VI (1966). "Binary codes capable of correcting deletions, insertions, and reversals". Soviet Physics Doklady 10: 707–10. http://www.scribd.com/doc/18654513/levenshtein?secret_password=1aycnw239qw4jqjtsm34#full. 

2009

  • http://www.algorithmic-solutions.info/leda_manual/distance.html
    • An instance D of the data type distance is an object maintaining two strings and providing several string distance functions.
      • The Levenshtein distance or edit distance between two strings is given by the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character.
      • The General Levenshtein distance is given by the minimum sum of the costs over a sequence of operations needed to transform one string into the other, where operation is an insertion, deletion, or substitution and a cost is assigned to each of the operations.
      • The Damerau-Levenshtein distance is an extension of Levenshtein distance that counts transposition as a single edit operation. Strictly speaking, the Damerau-Levenshtein distance is equal to the minimal number of insertions, deletions, substitutions and transpositions needed to transform one string into the other.

2003