Jaro-Winkler Distance Measure


A Jaro-Winkler Distance Measure is a string edit distance measure that modifies the weights of poorly matching string pairs that share a common string prefix.



  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Jaro–Winkler_distance Retrieved:2015-3-24.
    • In computer science and statistics, the Jaro–Winkler distance (Winkler, 1990) is a measure of similarity between two strings. It is a variant of the Jaro distance metric (Jaro, 1989, 1995), a type of string edit distance, and was developed in the area of record linkage (duplicate detection) (Winkler, 1990). The higher the Jaro–Winkler distance for two strings is, the more similar the strings are. The Jaro–Winkler distance metric is designed and best suited for short strings such as person names. The score is normalized such that 0 equates to no similarity and 1 is an exact match.


  • http://alias-i.com/lingpipe/demos/tutorial/stringCompare/read-me.html
    • QUOTE: String comparison attempts to measure the similarity between strings. This is useful for applications ranging from database deduplication and record linkage to terminology extraction, spell checking, and k-nearest-neighbors classifiers. In this tutorial, we demonstrate the ways in which string comparisons are used in LingPipe.

      ... Jaro-Winkler Distance: There are a family of distance measures defined by the U.S. Census Bureau for comparing single person names. The original metric was defined by Matt Jaro and later refined by Bill Winkler.
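
The quoted descriptions above (a Jaro score normalized to the range 0 to 1, then adjusted for a shared prefix) can be illustrated with a minimal Python sketch. It is not taken from LingPipe or from Winkler's papers. It assumes the commonly cited formulation: a match window of half the longer string's length minus one, a prefix scaling factor of 0.1, and a prefix capped at four characters. The function and parameter names (`jaro_similarity`, `jaro_winkler_similarity`, `prefix_scale`, `max_prefix`) are hypothetical, and the sketch applies the prefix adjustment to every pair, whereas some implementations only apply it above a threshold score.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Assumed match window: characters count as matching only if they are
    # equal and at most this many positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order;
    # each transposition involves two such characters.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(s1: str, s2: str,
                            prefix_scale: float = 0.1,  # assumed default
                            max_prefix: int = 4) -> float:  # assumed cap
    """Jaro score raised in proportion to the length of the common prefix."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1.0 - jaro)
```

Under these assumptions, `jaro_winkler_similarity("MARTHA", "MARHTA")` evaluates to about 0.961, compared with a plain Jaro score of about 0.944: the shared prefix "MAR" moves the score closer to 1.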





  • Edward H. Porter, and William E. Winkler. (1997). "Approximate String Comparison and its Effect on an Advanced Record Linkage System." U.S. Bureau of the Census, Research Report.



  • (Winkler, 1990) ⇒ William E. Winkler. (1990). "String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage." In: Proceedings of the Section on Survey Research Methods, American Statistical Association.