Person Record Duplicate Detection Task

A Person Record Duplicate Detection Task is a Domain Specific Duplicate Record Detection Task that requires the detection of Person Records in a Person Record Set that refer to the same person.



  • (Ferreira et al., 2012) ⇒ Anderson A. Ferreira, Marcos André Gonçalves, and Alberto H. F. Laender. (2012). “A Brief Survey of Automatic Methods for Name Disambiguation.” In: SIGMOD Record, 41(2).
    • QUOTE: In case of the author names attribute, a component corresponds to the name of a single unique author and is a reference [math]r_j[/math] to a real author. In case of the other attributes, a component corresponds to a word/term. The objective of a disambiguation method is to produce a function that is used to partition the set of references to authors [math]\{r_1, \ldots, r_m\}[/math] into [math]n[/math] sets [math]\{a_1, \ldots, a_n\}[/math], so that each partition [math]a_i[/math] contains (all and ideally only all) the references to a same author.

      To disambiguate the bibliographic citations of a DL, first we may split the set of references to authors into groups of references whose values of the author name attribute are ambiguous. These are called ambiguous groups (i.e., groups of references having the value of the author name attribute with similar names). The ambiguous groups may be obtained by using blocking methods [37] which address scalability issues avoiding the need for comparisons among all references.
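
The blocking step described in the quote can be illustrated with a minimal sketch. It is not the method of any surveyed paper: the blocking key (lowercased surname plus first initial), the function names, and the sample references are all assumptions made for illustration. References whose author names share a key fall into the same ambiguous group, and only pairs inside a group are compared.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(author_name: str) -> str:
    """Hypothetical blocking key: lowercased surname plus first initial."""
    parts = author_name.lower().replace(".", " ").split()
    if not parts:
        return ""
    return f"{parts[-1]}_{parts[0][0]}"


def ambiguous_groups(references):
    """Group reference ids by the blocking key of their author name."""
    blocks = defaultdict(list)
    for ref_id, name in references:
        blocks[blocking_key(name)].append(ref_id)
    return dict(blocks)


def candidate_pairs(blocks):
    """Return only the pairs of references that share a block."""
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(ids, 2))
    return pairs


# Illustrative references (id, author name); not taken from any data set.
references = [
    ("r1", "Anderson Ferreira"),
    ("r2", "A. A. Ferreira"),
    ("r3", "Alberto Laender"),
    ("r4", "A. H. F. Laender"),
    ("r5", "Marcos Goncalves"),
]

blocks = ambiguous_groups(references)
pairs = candidate_pairs(blocks)
all_pairs = len(references) * (len(references) - 1) // 2

print(blocks)
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

With these five references, blocking leaves two candidate pairs to compare instead of all ten. A disambiguation method would then decide, within each ambiguous group, which references belong to the same author; that step is outside this sketch.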