1997 DisambigOfProperNamesInText

From GM-RKB

Subject Headings: Named Entity Mention Resolution Algorithm, Named Entity Mention Detection.

Notes

Cited By

Quotes

Abstract

  • Identifying the occurrences of proper names in text and the entities they refer to can be a difficult task because of the many-to-many mapping between names and their referents. We analyze the types of ambiguity (structural and semantic) that make the discovery of proper names in text difficult, and describe the heuristics used to disambiguate names in Nominator, a fully implemented module for proper name recognition developed at IBM.
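The many-to-many mapping mentioned in the abstract can be pictured as a two-way index: one name may point to several referents, and one referent may be known by several names. The minimal Python sketch below is an illustration only, not Nominator's internal data structure; the class `NameIndex`, its methods, and the entity IDs are hypothetical.

```python
from collections import defaultdict


class NameIndex:
    """Hypothetical many-to-many index between surface names and referent IDs."""

    def __init__(self):
        self.referents_of = defaultdict(set)  # name string -> set of referent IDs
        self.names_of = defaultdict(set)      # referent ID -> set of name strings

    def add(self, name, referent):
        # Record the link in both directions so either side can be queried.
        self.referents_of[name].add(referent)
        self.names_of[referent].add(name)

    def is_ambiguous(self, name):
        # A name is ambiguous when it may refer to more than one entity.
        # dict.get does not create an entry for unseen names.
        return len(self.referents_of.get(name, ())) > 1


index = NameIndex()
# Hypothetical entries: one entity known by two names,
# and one name ("Carnegie") shared by two entities.
index.add("Carnegie Hall", "LOC:carnegie_hall")
index.add("Carnegie", "LOC:carnegie_hall")
index.add("Carnegie", "PER:carnegie")

print(index.is_ambiguous("Carnegie"))               # True
print(sorted(index.names_of["LOC:carnegie_hall"]))  # ['Carnegie', 'Carnegie Hall']
```

Resolving a mention then means choosing one referent from `referents_of[name]`, which is where disambiguation heuristics come in.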

1. Introduction

  • The need to identify proper names has two aspects: the recognition of known names and the discovery of new names. Since obtaining and maintaining a name database requires significant effort, many applications need to operate in the absence of such a resource.
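Without a name database, new names have to be discovered from evidence in the text itself. As a rough illustration, assuming the simplest possible cue (capitalization), a baseline candidate finder could collect maximal runs of capitalized words. This is a hypothetical sketch, not the procedure Nominator uses; `candidate_names` and the regular expression are assumptions.

```python
import re

# Assumed baseline heuristic: a candidate name is a maximal run of
# capitalized words (letters, apostrophes, and hyphens), separated by whitespace.
CAP_RUN = re.compile(r"[A-Z][a-zA-Z'-]*(?:\s+[A-Z][a-zA-Z'-]*)*")


def candidate_names(text):
    """Return capitalized word sequences as candidate proper names."""
    return CAP_RUN.findall(text)


print(candidate_names("He visited Carnegie Hall for Irwin Berlin."))
# ['He', 'Carnegie Hall', 'Irwin Berlin']
```

Even this toy example shows why further heuristics are needed: the sentence-initial "He" is picked up as a candidate, and a name containing a lowercase word such as "for" is split into two candidates.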
  • A PP may be attached to the preceding NP and form part of a single large name, as in NP[Midwest Center PP[for NP[Computer Research]]]. Alternatively, it may be independent of the preceding NP, as in NP[Carnegie Hall] PP[for NP[Irwin Berlin]], where "for" separates two distinct names, Carnegie Hall and Irwin Berlin.
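One simple way to attack the attachment ambiguity above is to inspect the head word to the left of "for". The sketch below assumes a small, hypothetical list of organizational head nouns (`ORG_HEADS`) that license attaching the PP; it illustrates the idea and is not the heuristic the authors describe, which may draw on other evidence.

```python
# Hypothetical keyword list: heads that often take a "for" complement
# inside an organization name. Assumed for illustration only.
ORG_HEADS = {"Center", "Institute", "Foundation", "Committee", "Council", "Association"}


def resolve_for(left, right):
    """Decide whether 'left for right' is one name or two.

    left, right: capitalized sequences on either side of "for".
    Returns a list containing one joined name or two separate names.
    """
    head = left.split()[-1]  # last word of the left-hand sequence
    if head in ORG_HEADS:
        # Attach the PP to the preceding NP: one large name.
        return [f"{left} for {right}"]
    # Otherwise treat "for" as separating two independent names.
    return [left, right]


print(resolve_for("Midwest Center", "Computer Research"))
# ['Midwest Center for Computer Research']
print(resolve_for("Carnegie Hall", "Irwin Berlin"))
# ['Carnegie Hall', 'Irwin Berlin']
```

A keyword list like this is easy to read and extend, but it is a hand-built, domain-dependent resource of exactly the kind whose cost the paper's conclusion discusses.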

Conclusion

  • In another sense, however, development of a module like Nominator still requires considerable human effort to discover reliable heuristics, particularly when only minimal information is used. These heuristics are somewhat domain dependent: different generalizations hold for names of drugs and chemicals than for names of people or organizations. In addition, as the heuristics depend on linguistic conventions, they are language dependent, and need updating when stylistic conventions change. Note, for example, the recent popularity of software names which include exclamation points as part of the name. Because of these difficulties, we believe that for the foreseeable future, practical applications to discover new names in text will continue to require the sort of human effort invested in Nominator.
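The point about exclamation points can be made concrete with a toy tokenizer. Assuming a hypothetical capitalized-token pattern, updating it to accept a trailing "!" recovers an invented product name such as "Zapp!", but it also starts absorbing sentence-final punctuation, so the change trades one error for another. The patterns and example strings are assumptions for illustration.

```python
import re

# Hypothetical name-token patterns: the "old" one stops at letters,
# the "new" one also allows a single trailing "!" as part of a name.
OLD_TOKEN = re.compile(r"[A-Z][A-Za-z]*")
NEW_TOKEN = re.compile(r"[A-Z][A-Za-z]*!?")


def name_tokens(pattern, text):
    """Return capitalized tokens matched by the given pattern."""
    return pattern.findall(text)


print(name_tokens(OLD_TOKEN, "We tried Zapp! today."))  # ['We', 'Zapp']
print(name_tokens(NEW_TOKEN, "We tried Zapp! today."))  # ['We', 'Zapp!']
print(name_tokens(NEW_TOKEN, "I love Boston!"))         # ['I', 'Boston!']
```

The updated pattern cannot tell a name-internal "!" from an exclamation at the end of a sentence; distinguishing them needs further evidence, which is the kind of human tuning the conclusion anticipates.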


1997 DisambigOfProperNamesInText
  • Author: Nina Wacholder, Yael Ravin, Misook Choi
  • Title: Disambiguation of Proper Names in Text
  • Journal: Proceedings of the fifth Conference on Applied Natural Language Processing
  • Title URL: http://acl.ldc.upenn.edu/A/A97/A97-1030.pdf
  • Year: 1997