2007 LeveragingCtxtInUserCentEntDetection


Subject Headings: Entity Detection, Content Syndication, Contextual, Context, Contextual Shortcuts, Information Extraction

Quotes

Abstract

A user-centric entity detection system is one in which the primary consumer of the detected entities is a person who can perform actions on the detected entities (e.g. perform a search, view a map, shop, etc.). We contrast this with machine-centric detection systems where the primary consumer of the detected entities is a machine. Machine-centric detection systems typically focus on the quantity of detected entities, measured by precision and recall metrics, with the goal of correctly identifying every single entity in a document.

However, the simple precision/recall scores of machine-centric entity detection systems fail to accurately reflect the quality of detected entities in user-centric systems, where users may not necessarily want to “see” every possible entity. We posit that not all of the detected entities in a given piece of text are necessarily relevant to the main topic of the text, nor are they necessarily interesting enough to the user to warrant further action. In fact, presenting all of the detected entities to a user may annoy the user to the point where he decides to turn this capability off completely, an undesirable outcome. Therefore, we propose to measure the quality and utility of user-centric entity detection systems in three core dimensions: the accuracy, the interestingness, and the relevance of the entities they present to the user. We show that leveraging surrounding context can greatly improve the performance of such systems in all three dimensions by employing novel algorithms for generating a concept vector and for finding concept extensions using search query logs.

We extensively evaluate the proposed algorithms within Contextual Shortcuts - a large-scale user-centric entity detection platform - using 1,586 entities detected over 1,519 documents. The results confirm the importance of using context within user-centric entity detection systems, and validate the usefulness of the proposed algorithms by showing how they improve the overall entity detection quality within Contextual Shortcuts.
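
As a reference point for the precision and recall metrics that the abstract associates with machine-centric systems, the following is a minimal sketch of the standard set-based definitions. The function name `precision_recall` and the example entity sets are hypothetical and are not taken from the paper.

```python
def precision_recall(detected, gold):
    """Return (precision, recall) of detected entities against a gold set.

    Standard set-based definitions; an illustrative sketch, not the
    evaluation procedure used in the paper.
    """
    detected, gold = set(detected), set(gold)
    true_positives = len(detected & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: entities annotated in a document vs. entities detected.
gold = {"Yahoo", "Sunnyvale", "California", "CIKM"}
detected = {"Yahoo", "Sunnyvale", "Monday"}

p, r = precision_recall(detected, gold)
print(f"precision={p:.3f} recall={r:.3f}")
```

Both scores count only whether an entity was correctly identified. They carry no signal about whether the entity is interesting to the user or relevant to the main topic of the text, which is the gap the abstract describes.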




Author: Vadim von Brzeski, Utku Irmak, Reiner Kraft
Title: Leveraging Context in User-Centric Entity Detection Systems
Published in: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
URL: http://portal.acm.org/citation.cfm?id=1321537
DOI: 10.1145/1321440.1321537
Year: 2007