2010 OntologyBasedIE

From GM-RKB

Subject Headings: Ontology-based Information Extraction.

Notes

Cited By

Quotes

Abstract

Information extraction (IE) aims to retrieve certain types of information from natural language text by processing it automatically. For example, an IE system might retrieve information about geopolitical indicators of countries from a set of web pages while ignoring other types of information. Ontology-based information extraction (OBIE) has recently emerged as a subfield of information extraction. Here, ontologies, which provide formal and explicit specifications of conceptualizations, play a crucial role in the IE process. Because of the use of ontologies, this field is related to knowledge representation and has the potential to assist in the development of the Semantic Web. In this paper, we provide an introduction to ontology-based information extraction and review the details of different OBIE systems developed so far. We attempt to identify a common architecture among these systems and classify them based on different factors, which leads to a better understanding of their operation. We also discuss the implementation details of these systems, including the tools they use and the metrics used to measure their performance. In addition, we attempt to identify possible future directions for this field.
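To make the idea of an ontology guiding extraction more concrete, here is a minimal, hypothetical Python sketch. It is not the method of this paper or of any surveyed system. The `ONTOLOGY` dict, its classes, and its lexical labels are invented for illustration and stand in for a real (e.g. OWL) ontology. Extraction uses one simple technique, gazetteer (lexicon) lookup, in which each class's labels are matched in the text. The output is then scored with precision, recall, and F-measure, which are common IE evaluation metrics.

```python
import re

# Hypothetical toy ontology: class -> (superclass or None, lexical labels).
# A real OBIE system would typically load this from an ontology file (e.g. OWL).
ONTOLOGY = {
    "Location": (None, []),
    "Country": ("Location", ["France", "India", "Japan"]),
    "Indicator": (None, []),
    "GDP": ("Indicator", ["GDP", "gross domestic product"]),
    "Population": ("Indicator", ["population"]),
}


def ancestors(cls):
    """Return cls followed by all of its superclasses, most specific first."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = ONTOLOGY[cls][0]
    return chain


def extract(text):
    """Gazetteer-style extraction: tag every label occurrence with its ontology class."""
    found = set()
    for cls, (_parent, labels) in ONTOLOGY.items():
        for label in labels:
            # \b keeps "India" from matching inside "Indian".
            pattern = r"\b" + re.escape(label) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.add((match.group(0), cls))
    return found


def precision_recall_f1(predicted, gold):
    """Score a set of (mention, class) pairs against a gold-standard set."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    text = "India's GDP grew faster than Germany's, while the population of Japan stabilised."
    predicted = extract(text)
    # Hypothetical gold annotation: "Germany" is a Country but is missing from the gazetteer.
    gold = predicted | {("Germany", "Country")}
    print(sorted(predicted))
    print(precision_recall_f1(predicted, gold))
    print(ancestors("Country"))
```

On the sample sentence, the sketch finds India, Japan, GDP, and population, each tagged with its class. Because "Germany" is absent from the toy gazetteer, recall drops to 0.8 while precision stays at 1.0. This shows how the coverage of an ontology's lexical layer directly limits what such a system can extract. The `ancestors` helper shows how the class hierarchy lets an instance of `Country` also be treated as a `Location`.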

References

  • 1. Stuart J. Russell, Peter Norvig, Artificial Intelligence: A Modern Approach, Pearson Education, 2003
  • 2. Ellen Riloff, Information extraction as a stepping stone toward story understanding, Understanding language understanding: computational models of reading, MIT Press, Cambridge, MA, 1999
  • 3. Thomas R. Gruber, A translation approach to portable ontology specifications, Knowledge Acquisition, v.5 n.2, p.199-220, June 1993 [doi>10.1006/knac.1993.1008]
  • 4. Rudi Studer, V. Richard Benjamins, Dieter Fensel, Knowledge engineering: principles and methods, Data & Knowledge Engineering, v.25 n.1-2, p.161-197, March 1998 [doi>10.1016/S0169-023X(97)00056-6]
  • 5. C. Hwang, Incompletely and imprecisely speaking: using dynamic ontologies for representing and retrieving information. In: E. Franconi and M. Kifer (eds), Proceedings of the 6th International Workshop on Knowledge Representation Meets Databases (ACM, New York, 1999).
  • 6. B. Adrian, G. Neumann, A. Troussov and B. Popov, Proceedings of the First International and KI-08 Workshop on Ontology-Based Information Extraction Systems (DFKI, Kaiserslautern, Germany, 2008).
  • 7. Ermelinda Oro, Massimo Ruffolo, Towards a System for Ontology-Based Information Extraction from PDF Documents, Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part II on On the Move to Meaningful Internet Systems, November 09-17, 2008, Monterrey, Mexico [doi>10.1007/978-3-540-88873-4_38]
  • 8. Yaoyong Li, Kalina Bontcheva, Hierarchical, perceptron-like learning for ontology-based Information Extraction, Proceedings of the 16th International Conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada [doi>10.1145/1242572.1242677]
  • 9. Fei Wu, Daniel S. Weld, Autonomously semantifying Wikipedia, Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, November 06-10, 2007, Lisbon, Portugal [doi>10.1145/1321440.1321449]
  • 10. Daniel S. Weld, Raphael Hoffmann, Fei Wu, Using Wikipedia to bootstrap open information extraction, ACM SIGMOD Record, v.37 n.4, December 2008 [doi>10.1145/1519103.1519113]
  • 11. David W. Embley, Toward semantic understanding: an approach based on information extraction ontologies, Proceedings of the 15th Australasian database conference, p.3-12, January 01, 2004, Dunedin, New Zealand
  • 12. Alexander Maedche, Günter Neumann, Steffen Staab, Bootstrapping an ontology-based information extraction system, Intelligent exploration of the web, Physica-Verlag GmbH, Heidelberg, Germany, 2003
  • 13. Burcu Yildiz, Silvia Miksch, ontoX - a method for ontology-driven Information Extraction, Proceedings of the 2007 International Conference on Computational science and its applications, August 26-29, 2007, Kuala Lumpur, Malaysia
  • 14. Paul Buitelaar, Philipp Cimiano, Peter Haase, Michael Sintek, Towards Linguistically Grounded Ontologies, Proceedings of the 6th European Semantic Web Conference on The Semantic Web: Research and Applications, May 31-June 04, 2009, Heraklion, Crete, Greece [doi>10.1007/978-3-642-02121-3_12]
  • 15. L. McDowell and M.J. Cafarella, Ontology-driven information extraction with OntoSyphon. In: Proceedings of the 5th International Semantic Web Conference (Springer, Berlin, 2006).
  • 16. Fei Wu, Raphael Hoffmann, Daniel S. Weld, Information extraction from Wikipedia: moving down the long tail, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24-27, 2008, Las Vegas, Nevada, USA [doi>10.1145/1401890.1401978]
  • 17. J. Kietz, A. Maedche and R. Volz, A method for semi-automatic ontology acquisition from a corporate intranet. In: Proceedings of the EKAW'00 Workshop on Ontologies and Text (Springer, Berlin, 2000).
  • 18. Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, Oren Etzioni, Open information extraction from the web, Proceedings of the 20th International Joint Conference on Artificial Intelligence, p.2670-2676, January 06-12, 2007, Hyderabad, India
  • 19. M. Vargas-Vera, E. Motta, J. Domingue, S.B. Shum and M. Lanzoni, Knowledge extraction by using an ontology-based annotation tool. In: Proceedings of the Workshop on Knowledge Markup and Semantic Annotation (ACM, New York, 2001).
  • 20. Thierry Declerck, Christian Federmann, Bernd Kiefer, Hans-Ulrich Krieger, Ontology-Based Information Extraction and Reasoning for Business Intelligence Applications, Proceedings of the 31st annual German conference on Advances in Artificial Intelligence, September 23-26, 2008, Kaiserslautern, Germany [doi>10.1007/978-3-540-85845-4_48]
  • 21. Daya C. Wimalasuriya, Dejing Dou, Using multiple ontologies in Information Extraction, Proceedings of the 18th ACM Conference on Information and Knowledge Management, November 02-06, 2009, Hong Kong, China [doi>10.1145/1645953.1645985]
  • 22. T. Berners-Lee, J. Hendler and O. Lassila, The Semantic Web, Scientific American 284(5) (2001).
  • 23. Philipp Cimiano, Siegfried Handschuh, Steffen Staab, Towards the self-annotating web, Proceedings of the 13th International Conference on World Wide Web, May 17-20, 2004, New York, NY, USA [doi>10.1145/988672.988735]
  • 24. D. Maynard, W. Peters and Y. Li, Metrics for evaluation of ontology-based information extraction. In: Proceedings of the WWW 2006 Workshop on Evaluation of Ontologies for the Web (ACM, New York, 2006).
  • 25. David I. Seidman, John J. Ritsko, Preface, IBM Systems Journal, v.43 n.3, p.449-450, July 2004 [doi>10.1147/sj.433.0449]
  • 26. Borislav Popov, Atanas Kiryakov, Damyan Ognyanoff, Dimitar Manov, Angel Kirilov, KIM – a semantic platform for information extraction and retrieval, Natural Language Engineering, v.10 n.3-4, p.375-392, September 2004 [doi>10.1017/S135132490400347X]
  • 27. B. Popov, A. Kiryakov, A. Kirilov, D. Manov, D. Ognyanoff and M. Goranov, KIM - semantic annotation platform. In: Proceedings of the 2nd International Semantic Web Conference (Springer-Verlag, Berlin, 2003).
  • 28. G.A. Miller and C. Fellbaum, WordNet: A Lexical Database for the English Language (2006). Available at: http://wordnet.princeton.edu (accessed 25 June 2009).
  • 29. M. Musen, N. Noy, M. O'Connor, T. Redmond, D. Rubin, S. Tu, T. Tudorache and J. Vendetti, Protégé Ontology Editor and Knowledge Acquisition System (2005). Available at: http://protege.stanford.edu (accessed 25 June 2009).
  • 30. M. Dean, G. Schreiber, S. Bechhofer, F.V. Harmelen, J. Hendler, I. Horrocks, D.L. McGuinness, P.F. Patel-Schneider and L.A. Stein, OWL Web Ontology Language Reference (2004). Available at: www.w3.org/TR/owl-ref (accessed 25 June 2009).
  • 31. T. Berners-Lee, Cleaning up the User Interface (1997). Available at: www.w3.org/DesignIssues/UI.html (accessed 25 June 2009).
  • 32. Jing Lu, Li Ma, Lei Zhang, Jean-Sébastien Brunner, Chen Wang, Yue Pan, Yong Yu, SOR: a practical system for ontology storage, reasoning and search, Proceedings of the 33rd International Conference on Very large data bases, September 23-27, 2007, Vienna, Austria
  • 33. Horacio Saggion, Adam Funk, Diana Maynard, Kalina Bontcheva, Ontology-based information extraction for business intelligence, Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference, November 11-15, 2007, Busan, Korea
  • 34. M. Moens, Information Extraction: Algorithms and Prospects in a Retrieval Context (The Information Retrieval Series) (Springer-Verlag, Secaucus, NJ, 2003).
  • 35. D.E. Appelt, J.R. Hobbs, J. Bear, D.J. Israel and M. Tyson, FASTUS: A Finite-state Processor for Information Extraction from Real-world Text. In: Ruzena Bajcsy (ed.), Proceedings of the 13th International Joint Conference on Artificial Intelligence (Morgan Kaufmann, Chambéry, France, 1993).
  • 36. H. Cunningham, K. Bontcheva, V. Tablan and D. Maynard, General Architecture for Text Engineering (GATE) (2003). Available at: www.gate.ac.uk (accessed 25 June 2009).
  • 37. Hans-Michael Müller, Eimear E. Kenny, Paul W. Sternberg, Textpresso: an ontology-based information retrieval and extraction system for biological literature, PLoS Biology, v.2 n.11, e309, 2004 [doi>10.1371/journal.pbio.0020309]
  • 38. Stephen Soderland, David Fisher, Jonathan Aseltine, Wendy Lehnert, CRYSTAL: inducing a conceptual dictionary, Proceedings of the 14th International Joint Conference on Artificial Intelligence, p.1314-1319, August 20-25, 1995, Montreal, Quebec, Canada
  • 39. T.M. Mitchell, Generalization as search, Artificial Intelligence, 18(2) (1982) 203-226.
  • 40. W. Bruce Croft, Lab report special section: the University of Massachusetts Center for Intelligent Information Retrieval, ACM SIGIR Forum, v.29 n.1, p.1-7, Spring 1995 [doi>10.1145/207556.207557]
  • 41. R. Romano, L. Rokach and O. Maimon, Automatic discovery of regular expression patterns representing negated findings in medical narrative reports. In: Proceedings of the 6th International Workshop on Next Generation Information Technologies and Systems (Springer, Berlin, 2006).
  • 42. E.W. Myers, An O(ND) difference algorithm and its variations, Algorithmica 1(2) (1986) 251-266.
  • 43. P. Buitelaar and M. Siegel, Ontology-based information extraction with SOBA. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation (European Language Resources Association, Genoa, Italy, 2006).
  • 44. Yaoyong Li, Kalina Bontcheva, Hamish Cunningham, Using uneven margins SVM and perceptron for Information Extraction, Proceedings of the Ninth Conference on Computational Natural Language Learning, June 29-30, 2005, Ann Arbor, Michigan
  • 45. Ofer Dekel, Joseph Keshet, Yoram Singer, Large margin hierarchical classification, Proceedings of the twenty-first International Conference on Machine learning, p.27, July 04-08, 2004, Banff, Alberta, Canada [doi>10.1145/1015330.1015374]
  • 46. Jerry R. Hobbs, Mark Stickel, Paul Martin, Douglas Edwards, Interpretation as abduction, Proceedings of the 26th annual meeting on Association for Computational Linguistics, p.95-103, June 07-10, 1988, Buffalo, New York [doi>10.3115/982023.982035]
  • 47. A. Maedche and S. Staab, The Text-To-Onto Ontology Learning Environment. In: Software Demonstration at the Eighth International Conference on Conceptual Structures (Springer-Verlag, Berlin, 2000).
  • 48. Amalia Todirascu, Laurent Romary, Dalila Bekhouche, Vulcain - An Ontology-Based Information Extraction System, Proceedings of the 6th International Conference on Applications of Natural Language to Information Systems-Revised Papers, p.64-75, June 27-28, 2002
  • 49. P. Lopez, Robust Parsing with Lexicalized Tree Adjoining Grammars (PhD Thesis, INRIA, Nancy, France, 1999).
  • 50. Marti A. Hearst, Automatic acquisition of hyponyms from large text corpora, Proceedings of the 14th conference on Computational linguistics, August 23-28, 1992, Nantes, France [doi>10.3115/992133.992154]
  • 51. Philipp Cimiano, Günter Ladwig, Steffen Staab, Gimme' the context: context-driven automatic semantic annotation with C-PANKOW, Proceedings of the 14th International Conference on World Wide Web, May 10-14, 2005, Chiba, Japan [doi>10.1145/1060745.1060796]
  • 52. Fei Wu, Daniel S. Weld, Automatically refining the Wikipedia infobox ontology, Proceedings of the 17th International Conference on World Wide Web, April 21-25, 2008, Beijing, China [doi>10.1145/1367497.1367583]
  • 53. T.Q. Dung and W. Kameyama, Ontology-based information extraction and information retrieval in health care domain. In: Proceedings of the 9th International Conference on Data Warehousing and Knowledge Discovery (Springer, Berlin, 2007).
  • 54. Benjamin Adrian, Jörn Hees, Ludger Van Elst, Andreas Dengel, iDocument: using ontologies for extracting and annotating information from unstructured text, Proceedings of the 32nd annual German conference on Advances in artificial intelligence, September 15-18, 2009, Paderborn, Germany
  • 55. E. Prud'hommeaux and A. Seaborne, SPARQL Query Language for RDF (2008). Available at: www.w3.org/TR/rdf-sparql-query/ (accessed 25 June 2009).
  • 56. W. Drozdzynski, M. Becker, H.-U. Krieger, J. Piskorski, U. Schäfer and F. Xu, SProUT (Shallow Processing with Unification and Typed Feature Structures) (2002). Available at: http://sprout.dfki.de (accessed 25 June 2009).
  • 57. C. Manning and D. Jurafsky, The Stanford Natural Language Processing Group (1999). Available at: http://nlp.stanford.edu/index.shtml (accessed 25 June 2009).
  • 58. W.B. Croft, J. Allen, A. McCallum, R. Manmatha and D.A. Smith, The Center for Intelligent Information Retrieval (CIIR) (1992). Available at: http://ciir.cs.umass.edu/ (accessed 25 June 2009).
  • 59. E. Hinrichs, P. Gupta, L. Lemnitzer, R. Barkey, C. Frey, M. Hinrichs and C. Kunze, GermaNet - The German Wordnet (1997). Available at: www.sfs.uni-tuebingen.de/GermaNet/ (accessed 25 June 2009).
  • 60. P. Bhattacharyya and P. Pande, Hindi WordNet: A Lexical Database for Hindi (2001). Available at: www.cfilt.iitb.ac.in/wordnet/webhwn/ (accessed 25 June 2009).
  • 61. D. Fensel and F. van Harmelen, OntoEdit (2002). Available at: www.ontoknowledge.org/about.shtml (accessed 25 June 2009).
  • 62. B. Adrian, H. Maus, M. Kiesel and A. Dengel, Towards ontology-based information extraction and annotation of paper documents for personalized knowledge acquisition. In: Proceedings of the First International Workshop on Personal Knowledge Management (Bonner Köllen Verlag, Solothurn, Switzerland, 2009).
  • 63. L. Sauermann, L. van Elst and A. Dengel, PIMO - a framework for representing personal information models. In: Proceedings of the International Conferences on New Media Technology and Semantic Systems (JUCS, Graz, Austria, 2007).
  • 64. L. Sauermann, A. Bernardi and A. Dengel, Overview and outlook on the semantic desktop. In: Proceedings of the 1st Workshop on the Semantic Desktop at the ISWC 2005 Conference (CEUR-WS, Galway, Ireland, 2005).
  • 65. PROTON Ontology (2005). Available at: http://proton.semanticweb.org/ (accessed 25 June 2009).
  • 66. Jiawei Han, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers Inc., San Francisco, CA, 2005
  • 67. Precision and Recall (2009). Available at: http://en.wikipedia.org/wiki/Precision_and_recall (accessed 25 June 2009).
  • 68. Udo Hahn, Klemens Schnattinger, Towards text knowledge engineering, Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence, p.524-531, July 1998, Madison, Wisconsin, United States
  • 69. Xin Dong, Alon Halevy, Jayant Madhavan, Reference reconciliation in complex information spaces, Proceedings of the 2005 ACM SIGMOD International Conference on Management of data, June 14-16, 2005, Baltimore, Maryland [doi>10.1145/1066157.1066168]
  • 70. D. Dou, D.V. McDermott and P. Qi, Ontology translation on the semantic web, Journal of Data Semantics 2(1) (2005) 35-57.
  • 71. B.C. Grau, B. Parsia and E. Sirin, Working with multiple ontologies on the semantic web. In: Proceedings of the 3rd International Semantic Web Conference (Springer, Berlin, 2004).


2010 OntologyBasedIE
 title: Ontology-based Information Extraction: An introduction and a survey of current approaches
 titleUrl: http://ix.cs.uoregon.edu/~dou/research/papers/jis09.pdf
 doi: 10.1177/0165551509360123