Toponym Normalization Task

From GM-RKB

A Toponym Normalization Task is an Entity Mention Normalization Task where Toponym Mentions in a Document need to be mapped to Toponym Records in a Gazetteer.



References

2007

  • (Leidner, 2007) ⇒ Jochen L. Leidner. (2007). “Toponym Resolution in Text: Annotation, Evaluation and Applications of Spatial Grounding of Place Names.” PhD Thesis, The University of Edinburgh.
    • The problem of automatic toponym resolution, or computing the mapping from occurrences of names for places as found in a text to an unambiguous spatial footprint of the location referred to, such as a geographic latitude/longitude centroid, is difficult to automate due to insufficient and error-prone geographic databases, and a large degree of place name ambiguity: common words need to be distinguished from proper names (geo/non-geo ambiguity), and the mapping between names and locations is ambiguous (London can refer to the capital of the UK or to London, Ontario, Canada, or to about forty other Londons on earth).
    • This thesis investigates how referentially ambiguous spatial named entities can be grounded, or resolved, with respect to an extensional coordinate model robustly on open-domain news text by collecting a repertoire of linguistic heuristics and extra-linguistic knowledge sources such as population. I then investigate how to combine these sources of evidence to obtain a superior method. Noise effects introduced by the named entity tagging that toponym resolution relies on are also studied. While few attempts have been made to solve toponym resolution, these were either not evaluated, or evaluation was done by manual inspection of system output instead of creating a re-usable reference corpus. A systematic comparison leads to an inventory of heuristics and other sources of evidence. In order to carry out a comparative evaluation procedure, an evaluation resource is required, so a reference gazetteer and an associated novel reference corpus with human-labelled referent annotation were created for this thesis, to be used to benchmark a selection of the reconstructed algorithms and a novel re-combination of the heuristics catalogued in the inventory. Performance of the same resolution algorithms is compared under different conditions, namely applying it to the output of human named entity annotation and automatic annotation using an existing Maximum Entropy sequence tagging model.
    • More formally, the task can be described as follows. We start with a corpus D comprising a set of documents D = {D_1, ..., D_|D|} as input. Each document D_i comprises a sequence of tokens TOKENS = (TOKEN[1], ..., TOKEN[|TOKENS|]). We further need a gazetteer G, i.e. an inventory that lists all candidate referents R = {R_1, ..., R_|R|}. A gazetteer entry G(T_i) for a toponym T_i is a tuple containing a feature type and set of referents R_G for T_i. Here, referents are represented by the centroid of the location’s latitude and longitude, respectively. A toponym resolver is a function F_G(·, ·) that maps from a document D_i ∈ D in which the toponyms are not resolved yet, to a document with the same content in which the toponyms are resolved, i.e. where for each toponym (or for some toponyms, in the case of a partial toponym resolver) a referent from the set of candidate referents has been chosen. Referents can be represented in various ways, including polygons or simply pairs of latitude and longitude of the centroid.
  • Jochen L. Leidner. (2007). “Toponym resolution in text: annotation, evaluation and applications of spatial grounding.” In: SIGIR Forum 41(2): 124-126.
    • Concentrating on geographic names for populated places, I define the task of automatic Toponym Resolution (TR) as computing the mapping from occurrences of names for places as found in a text to a representation of the extensional semantics of the location referred to (its referent), such as a geographic latitude/longitude footprint. The task of mapping from names to locations is hard due to insufficient and noisy databases, and a large degree of ambiguity: common words need to be distinguished from proper names (geo/non-geo ambiguity), and the mapping between names and locations is ambiguous (London can refer to the capital of the UK or to London, Ontario, Canada, or to about forty other Londons on earth). In addition, names of places and the boundaries referred to change over time, and databases are incomplete.
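
The formal definition quoted from the thesis above (a resolver F_G that takes a document and returns the same document with a referent chosen for each toponym) can be illustrated with a minimal Python sketch. This is not Leidner's method: it uses only a largest-population heuristic, which is one of the extra-linguistic knowledge sources the abstract mentions. The names Referent, GAZETTEER and resolve are hypothetical, the gazetteer values are rough illustrative figures, and the sketch assumes every token that matches a gazetteer key is a toponym, so it ignores geo/non-geo ambiguity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Referent:
    """A candidate location, represented by its centroid (hypothetical layout)."""
    name: str
    lat: float
    lon: float
    population: int


# Toy gazetteer G: toponym string -> candidate referents.
# Coordinates and populations are rough, illustrative values only.
GAZETTEER: Dict[str, List[Referent]] = {
    "London": [
        Referent("London, UK", 51.51, -0.13, 8_000_000),
        Referent("London, Ontario, Canada", 42.98, -81.25, 400_000),
    ],
}


def resolve(
    tokens: List[str], gazetteer: Dict[str, List[Referent]]
) -> List[Tuple[str, Optional[Referent]]]:
    """Partial toponym resolver: pair each token with a chosen referent or None.

    Assumption: a token is a toponym exactly when it is a gazetteer key, and
    the most populous candidate wins. Real systems combine more evidence.
    """
    resolved = []
    for token in tokens:
        candidates = gazetteer.get(token, [])
        # default=None covers tokens with no candidate referents.
        best = max(candidates, key=lambda r: r.population, default=None)
        resolved.append((token, best))
    return resolved
```

Calling resolve(["Flights", "to", "London"], GAZETTEER) leaves the first two tokens unresolved and assigns the more populous candidate to "London". A population-only choice would mislabel any text that actually refers to London, Ontario, which is the kind of ambiguity the quoted passages describe.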

2006

  • Jochen L. Leidner. (2006). “An Evaluation Dataset for the Toponym Resolution Task.” Computers, Environment and Urban Systems 30(4): 400-417.
    • "Toponym resolution is the task of linking place name instances in a text with spatial footprints, given the context in which they occur. Whereas a lot of work on the evaluation of temporal resolution is ongoing (e.g. [Setzer, A., & Gaizauskas, R. (2000). On the importance of annotating temporal event–event relations in text. In LREC 2000 Workshop on annotation standards for temporal information in natural language, Vol. 3 (pp. 1281–1286). Athens, Greece]), to date no reference resource is available to evaluate competing algorithms for toponym resolution. It is thus argued that a shareable, reusable evaluation resource is necessary."

2003

  • Huifeng Li, Rohini K Srihari, Cheng Niu, and Wei Li. (2003). “InfoXtract location normalization: a hybrid approach to geographic references in information extraction.” In: András Kornai and Beth Sundheim, editors, HLT-NAACL 2003 Workshop: Analysis of Geographic References, pages 39–44. Association for Computational Linguistics.

2002

  • Huifeng Li, Rohini K Srihari, Cheng Niu, and Wei Li. (2002). “Location normalization for information extraction.” In: Nineteenth International Conference on Computational Linguistics (COLING 2002).