Jaro-Winkler Distance Measure

From GM-RKB

A Jaro-Winkler Distance Measure is a unit edit distance measure that modifies the weights of poorly matching string pairs that share a common string prefix.
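As a reference point, the measure is usually written in two stages. The first stage is the Jaro similarity. The second stage is Winkler's prefix adjustment. In the notation below:

  • m is the number of matching characters. Two characters match only if they are equal and no farther apart than ⌊max(|s1|, |s2|)/2⌋ − 1 positions.
  • t is half the number of matched characters that appear in a different order.
  • ℓ is the length of the common prefix, capped at 4.
  • p is a scaling factor, conventionally 0.1.

```latex
\mathrm{sim}_j =
\begin{cases}
  0 & \text{if } m = 0 \\
  \dfrac{1}{3}\left( \dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m} \right) & \text{otherwise}
\end{cases}
\qquad
\mathrm{sim}_w = \mathrm{sim}_j + \ell \, p \, (1 - \mathrm{sim}_j),
\qquad
d_w = 1 - \mathrm{sim}_w
```

The prefix term ℓp(1 − sim_j) closes a fraction of the remaining gap to 1. This is how pairs sharing a prefix are reweighted. For MARTHA versus MARHTA, m = 6, t = 1 and ℓ = 3. These give sim_j = 17/18 ≈ 0.944 and sim_w ≈ 0.961.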



References

2015

  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Jaro–Winkler_distance Retrieved: 2015-3-24.
    • QUOTE: In computer science and statistics, the Jaro–Winkler distance (Winkler, 1990) is a measure of similarity between two strings. It is a variant of the Jaro distance metric (Jaro, 1989, 1995), a type of string edit distance, and was developed in the area of record linkage (duplicate detection) (Winkler, 1990). The higher the Jaro–Winkler distance for two strings is, the more similar the strings are. The Jaro–Winkler distance metric is designed and best suited for short strings such as person names. The score is normalized such that 0 equates to no similarity and 1 is an exact match.
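The following is a minimal Python sketch of the measure described in the quote above. It is illustrative only and is not a specific library's implementation. It makes three assumptions:

  • It uses the standard matching window.
  • It uses the conventional scaling factor p = 0.1.
  • It caps the common prefix at 4 characters.

The names `jaro` and `jaro_winkler` are hypothetical. The sketch returns a similarity in [0, 1], where higher means more similar. A distance can be taken as 1 minus this value. Some implementations apply the prefix bonus only when the Jaro score exceeds a boost threshold. This sketch omits that step.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1] (illustrative sketch)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Equal characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            # Each character of s2 may be matched at most once.
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk the matched characters of both strings in order and count mismatches.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2  # t: half the out-of-order matches
    m = float(matches)
    return (m / len1 + m / len2 + (m - transpositions) / m) / 3.0


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler similarity: Jaro score boosted by the common prefix length."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    # The bonus closes a fraction (prefix * p) of the remaining gap to 1.
    return sim + prefix * p * (1.0 - sim)
```

For example, `jaro_winkler("DWAYNE", "DUANE")` gives about 0.84. The Jaro score of about 0.822 is raised slightly by the shared one-character prefix "D".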

2009


  • http://alias-i.com/lingpipe/demos/tutorial/stringCompare/read-me.html
    • QUOTE: String comparison attempts to measure the similarity between strings. This is useful for applications ranging from database deduplication and record linkage to terminology extraction, spell checking, and k-nearest-neighbors classifiers. In this tutorial, we demonstrate the ways in which string comparisons are used in LingPipe.

      ... Jaro-Winkler Distance: There are a family of distance measures defined by the U.S. Census Bureau for comparing single person names. The original metric was defined by Matt Jaro and later refined by Bill Winkler.

2006

2003

1999

1997

  • (Porter & Winkler, 1997) ⇒ Edward H. Porter, and William E. Winkler. (1997). "Approximate String Comparison and its Effect on an Advanced Record Linkage System." U.S. Bureau of the Census, Research Report.

1995

1990

  • (Winkler, 1990) ⇒ William E. Winkler. (1990). "String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage." In: Proceedings of the Section on Survey Research Methods, American Statistical Association.

1989