Damerau-Levenshtein Distance Function

From GM-RKB
(Redirected from Damerau-Levenshtein)
Jump to navigation Jump to search

A Damerau-Levenshtein Distance Function is a variant of the Levenshtein Distance Function where the Transposition Operation counts as a single operation.



References

2015

  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Damerau–Levenshtein_distance Retrieved:2015-8-17.
    • In information theory and computer science, the Damerau–Levenshtein distance (named after Frederick J. Damerau and Vladimir I. Levenshtein ) is a distance (string metric) between two strings, i.e., finite sequence of symbols, given by counting the minimum number of operations needed to transform one string into the other, where an operation is defined as an insertion, deletion, or substitution of a single character, or a transposition of two adjacent characters. In his seminal paper, Damerau not only distinguished these four edit operations but also stated that they correspond to more than 80% of all human misspellings. Damerau's paper considered only misspellings that could be corrected with at most one edit operation. The Damerau–Levenshtein distance differs from the classical Levenshtein distance by including transpositions among its allowable operations. The classical Levenshtein distance only allows insertion, deletion, and substitution operations. Modifying this distance by including transpositions of adjacent symbols produces a different distance measure, known as the Damerau–Levenshtein distance.[1] While the original motivation was to measure distance between human misspellings to improve applications such as spell checkers, Damerau–Levenshtein distance has also seen uses in biology to measure the variation between DNA. [2]
  1. . The isbn produces two hits: a 2007 work and a 2010 work at World Cat.
  2. The method used in:

2009

  • http://alias-i.com/lingpipe/demos/tutorial/stringCompare/read-me.html
    • String comparison attempts to measure the similarity between strings. This is useful for applications ranging from database deduplication and record linkage to terminology extraction, spell checking, and k-nearest-neighbors classifiers. In this tutorial, we demonstrate the ways in which string comparisons are used in LingPipe.
    • Damerau-Levenstein Distance The simplest form of edit distances were introduced in the by Damerau (1964) and Levenshtein (1965). The distance between two strings is defined as the minimal number of edits required to convert one into the other. In the simpler Levenshtein distance, the allowable edit operations are the deletion of a single character, the insertion of a single character and the substitutione of one character for another. In Damerau's version, transposition is also an allowable edit.
  • http://www.algorithmic-solutions.info/leda_manual/distance.html
    • An instance D of the data type distance is an object maintaining two strings and providing several string distance functions.
    • The Damerau-Levenshtein distance is an extension of Levenshtein distance that counts transposition as a single edit operation. Strictly speaking, the Damerau-Levenshtein distance is equal to the minimal number of insertions, deletions, substitutions and transpositions needed to transform one string into the other.