String Edit Distance Function

From GM-RKB
Jump to navigation Jump to search

A String Edit Distance Function is a string metric that is a edit distance function (based on string edit operations).



References

2012

  1. Algorithms and Theory of Computation Handbook, CRC Press LLC, 1999, "Levenshtein distance", in Dictionary of Algorithms and Data Structures, Paul E. Black, ed., U.S. National Institute of Standards and Technology, 14 August 2008 (accessed 31 October 2011).
  2. See p. 190 in Jacobs, Nico “Relational Sequence Learning and User Modelling”. Katholieke Universiteit Leuven, Faculteit Wetenschappen, Faculteit Toegepaste Wetenschappen; Departement Computerwetenschappen, Leuven, Oct. 2004, xvi + 235 pp.

2009

  • Wiktionary.
    • String distance: (computer science) Any of several metrics that represent the degree of similarity betwen two strings of characters. The string distance between 'here' and 'there' is 1 (insertion of a t).


  • www.stanford.edu/class/cs276/handouts/lecture3-tolerantretrieval.ppt
    • Given two strings S1 and S2, the minimum number of operations to convert one to the other Operations are typically character-level Insert, Delete, Replace, (Transposition) E.g., the edit distance

      from dof to dog is 1

      From cat to act is 2 (Just 1 with transpose.)

      from cat to dog is 3.

    • Generally found by dynamic programming.

      Weighted edit distance can encode background knowledge such as that the letter "m" is more likely to be mis-typed as the letter "n" than as the letter "q".

2005

  1. One could also straight-forwardly imagine a different regression-based scenario in which [math]\displaystyle{ z }[/math] is real-valued, or also a ranking-based criteria, in which two pairs are provided and [math]\displaystyle{ z }[/math] indicates which pair of strings should be considered closer.

2003

2000

  • (Zhu & Ungar, 2000) ⇒ J. J. Zhu, and Lyle H. Ungar. (2000). “String Edit Analysis for Merging Databases].” In: KDD 2000 Workshop on Text Mining.

1997

  • (Porter & Winkler, 1997) ⇒ Edward H. Porter, and William E. Winkler. (1997). “Approximate String Comparison and its Effect on an Advanced Record Linkage Systems. U.S. Bureau of the Census, Research Report.

1990

  • (Winkler, 1990) ⇒ William E. Winkler. (1990). “String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage.” In: Proceedings of the Section on Survey Research Methods, American Statistical Association, 354-359.

1981

  • (Smith & Waterman, 1981) ⇒ T.F. Smith, and M.S. Waterman. (1981). “Identification of Common Molecular Subsequences.” In: Journal of Molecular Biology, 147.