2008 OCASOntBasCorpusandAnnotSchTowanOBIE...

From GM-RKB

Subject Headings: Topic Hierarchy Generation, Text Segment, Clustering, Partitioning, Search-Result Snippet, Web Data Mining.

Notes

Cited By

Quotes

Abstract

This paper presents strategies and lessons learned from the creation of a corpus. It suggests a gold standard for evaluating ontology-based information extraction (OBIE) systems. This OBIE gold standard is called OCAS2008 and consists of: (i) an OBIE layer cake for comparing OBIE systems by subtasks; (ii) a document corpus of 121 documents with 31,000 words about a closed domain; (iii) a compact domain ontology including more than 40,000 instances; (iv) two annotation scenarios that extend traditional template-based evaluations; (v) an annotation set that contains typed annotations according to the ontology and the OBIE layer cake; (vi) annotations that concern text phrases, symbols, instances, explicitly written facts, and implicit facts; and (vii) human-created annotations made according to predefined specifications. We claim that the use of OCAS2008 provides a basis for comparable and significant evaluations of OBIE systems.
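A gold standard of typed annotations is typically used by comparing a system's output against it. The sketch below illustrates that idea with plain exact-match precision, recall and F1. It is an assumption for illustration only: the `Annotation` record, its fields (`doc_id`, `start`, `end`, `onto_type`), the `score` function and the example types "Company", "Person" and "Location" are all hypothetical and do not reproduce the OCAS2008 annotation format or the evaluation metrics used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A typed text annotation (hypothetical schema, not the OCAS2008 format)."""
    doc_id: str
    start: int       # character offset where the annotated phrase begins
    end: int         # character offset just past the annotated phrase
    onto_type: str   # ontology class the annotation is typed with (assumed)


def score(gold: set[Annotation],
          predicted: set[Annotation]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 of predicted vs. gold annotations.

    A prediction counts as correct only if document, span and type all match.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: the second prediction has the right span
    # but the wrong ontology type, so it is counted as an error.
    gold = {Annotation("d1", 0, 8, "Company"), Annotation("d1", 20, 27, "Person")}
    pred = {Annotation("d1", 0, 8, "Company"), Annotation("d1", 20, 27, "Location")}
    print(score(gold, pred))
```

Exact matching is the strictest choice; a benchmark that, like OCAS2008, also covers implicit facts would likely need looser or ontology-aware matching, which this sketch does not attempt.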

2008 OCASOntBasCorpusandAnnotSchTowanOBIE...
Author: Alexander Grothkast, Benjamin Adrian, Kinga Schumacher, Andreas Dengel
Title: Ontology-Based Corpus and Annotation Scheme. Towards an OBIE Gold Standard that Contains Even Implicit Facts
Journal title: Proceedings of the High-level Information Extraction Workshop
URL: http://www.ecmlpkdd2008.org/sites/ecmlpkdd2008.org/files/pdf/workshops/hlie/3.pdf
Year: 2008