2006 ClassiSemRelsBetNominalsTask4Descr


Subject Headings: SemEval-1 Task 4, SemEval, Relation Recognition from Text Task.


Quotes

Description of the Task

There is growing interest in the task of classifying semantic relations between pairs of words. However, many different classification schemes have been used, which makes it difficult to compare the various classification algorithms. We will create a benchmark dataset and evaluation task that will enable researchers to compare their algorithms.

Rosario and Hearst (2001) classify noun-compounds from the medical domain, using a set of 13 classes that describe the semantic relation between the head noun and the modifier in a given noun-compound. Rosario et al. (2002) classify noun-compounds using a multi-level hierarchy of semantic relations, with 15 classes at the top level. Nastase and Szpakowicz (2003) present a two-level hierarchy for classifying noun-modifier relations in general domain text, with 5 classes at the top and 30 classes at the bottom. Their class scheme and dataset have been used by other researchers (Turney and Littman, 2005; Turney, 2005; Nastase et al., 2006). Moldovan et al. (2004) use a 35-class scheme to classify relations in noun phrases. The same scheme has been applied to noun compounds (Girju et al., 2005). Chklovski and Pantel (2004) use a 5-class scheme, designed specifically for characterizing verb-verb semantic relations. Stephens et al. (2001) use a 17-class scheme created for relations between genes. Lapata (2002) uses a 2-class scheme for classifying relations in nominalizations.

Algorithms for classifying semantic relations have potential applications in Information Retrieval, Information Extraction, Summarization, Machine Translation, Question Answering, Paraphrasing, Recognizing Textual Entailment, Thesaurus Construction, Semantic Network Construction, Word Sense Disambiguation, and Language Modeling. As the techniques for semantic relation classification mature, some of these applications are being tested. Tatu and Moldovan (2005) applied the 35-class scheme of Moldovan et al. (2004) to the PASCAL Recognizing Textual Entailment (RTE) challenge, obtaining significant improvement in a state-of-the-art algorithm.

There is no consensus on schemes for classifying semantic relations, and it seems unlikely that any single scheme could be useful for all applications. For example, the gene-gene relation scheme of Stephens et al. (2001) includes relations such as “X phosphorylates Y”, which are not very useful for general domain text. Even if we focus on general domain text, the verb-verb relations of Chklovski and Pantel (2004) are unlike the noun-modifier relations of Nastase and Szpakowicz (2003) or the noun phrase relations of Moldovan et al. (2004).

We will create a benchmark dataset for evaluating semantic relation classification algorithms, embracing several different existing classification schemes, instead of attempting the daunting chore of creating a single unified standard classification scheme. We will treat each semantic relation separately, as a single two-class (positive-negative) classification task, rather than taking a whole N-class scheme of relations as an N-class classification task (Nastase and Szpakowicz, 2003).
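The per-relation binary framing described above can be illustrated with a minimal sketch. The relation names, the nominal pairs, and the helper `to_binary_tasks` are all hypothetical; in particular, deriving negatives from the examples of other relations is an assumption made for illustration, not necessarily how the task's negative examples are built.

```python
# Hypothetical toy data: (nominal pair, relation label) examples.
# The relation names are illustrative, not the task's actual inventory.
examples = [
    (("virus", "flu"), "Cause-Effect"),
    (("wheel", "car"), "Part-Whole"),
    (("smoke", "asthma"), "Cause-Effect"),
    (("apple", "basket"), "Content-Container"),
]

def to_binary_tasks(examples):
    """Split one N-class labeled set into N independent positive/negative tasks."""
    relations = sorted({label for _, label in examples})
    tasks = {}
    for relation in relations:
        # An example is positive for this task only if it carries this relation.
        tasks[relation] = [(pair, label == relation) for pair, label in examples]
    return tasks

binary_tasks = to_binary_tasks(examples)
for relation, labeled_pairs in binary_tasks.items():
    print(relation, labeled_pairs)
```

Each entry in `binary_tasks` can then be handed to any two-class learner on its own, which is what allows relations drawn from different classification schemes to sit side by side in one benchmark.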

To constrain the scope of the task, we have chosen a specific application for semantic relation classification, relational search (Cafarella et al., 2006). We describe this application in Section 2. The application we envision is a kind of search engine that can answer queries such as “list all X such that X causes asthma” (Girju, 2001). Given this application, we have decided to focus on semantic relations between nominals (i.e., nouns and base noun phrases, excluding named entities).
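As a rough illustration of the kind of query such a search engine might answer, the sketch below filters a small store of relation triples. The triples, the relation names, and the function `relational_search` are hypothetical and do not describe the system of Cafarella et al. (2006).

```python
# Hypothetical store of (nominal, relation, nominal) triples, such as a
# relation classifier might produce from annotated sentences.
triples = [
    ("pollen", "causes", "asthma"),
    ("smoke", "causes", "asthma"),
    ("wheel", "part_of", "car"),
    ("virus", "causes", "flu"),
]

def relational_search(triples, relation, target):
    """Return every X such that (X, relation, target) is in the store."""
    return sorted(x for x, rel, y in triples if rel == relation and y == target)

# "list all X such that X causes asthma"
answers = relational_search(triples, "causes", "asthma")
print(answers)
```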

The dataset for the task will consist of annotated sentences. We will select a sample of relation classes from several different classification schemes and then gather sentences from the Web using a search engine. We will manually mark up the sentences, indicating the nominals and their relations. Algorithms will be evaluated by their average classification performance over all of the sampled relations, but we will also be able to see whether some relations are more difficult to classify than others, and whether some algorithms are best suited for certain types of relations.
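The evaluation described above, a per-relation score plus an average over all sampled relations, could be computed along the following lines. The text does not name a metric, so the use of accuracy here is an assumption, and the relation names and label lists are made-up toy values.

```python
def accuracy(gold, predicted):
    """Fraction of examples whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def average_over_relations(gold_by_relation, predicted_by_relation):
    """Score each binary relation task separately, then take the unweighted mean."""
    per_relation = {
        relation: accuracy(gold, predicted_by_relation[relation])
        for relation, gold in gold_by_relation.items()
    }
    macro_average = sum(per_relation.values()) / len(per_relation)
    return per_relation, macro_average

# Toy gold and predicted labels (True = the relation holds for that sentence).
gold = {"Cause-Effect": [True, False, True, True], "Part-Whole": [True, False]}
pred = {"Cause-Effect": [True, False, False, True], "Part-Whole": [True, True]}

per_relation, macro = average_over_relations(gold, pred)
print(per_relation, macro)
```

Keeping the per-relation scores alongside the average is what makes it possible to see which relations are harder than others and which algorithms suit which relations.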



References

  • Cafarella, M.J., Banko, M., and Etzioni, O. (2006). Relational Web Search. University of Washington, Department of Computer Science and Engineering, Technical Report 2006-04-02.
  • Chklovski, T., and Pantel, P. (2004). VerbOcean: Mining the Web for fine-grained semantic verb relations. In: Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP-04). pp. 33-40. Barcelona, Spain.
  • Gildea, D., and Jurafsky, D. (2002). Automatic labeling of semantic roles. Computational Linguistics, 28(3):245-288.
  • Girju, R. (2001). Answer fusion with on-line ontology development. In: Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) - Student Research Workshop (NAACL 2001), Pittsburgh, PA.
  • Girju, R., Moldovan, D., Tatu, M., and Antohe, D. (2005). On the semantics of noun compounds. Computer Speech and Language, 19:479-496.
  • Hearst, M. (1992). Automatic acquisition of hyponyms from large text corpora. In: Proceedings of the 14th International Conference on Computational Linguistics (COLING-92), pp. 539-545.
  • Lapata, M. (2002). The disambiguation of nominalisations. Computational Linguistics, 28(3):357-388.
  • Moldovan, D., Badulescu, A., Tatu, M., Antohe, D., and Girju, R. (2004). Models for the semantic classification of noun phrases. In: Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004, pp. 60-67, Boston, MA.
  • Nastase, V., and Szpakowicz, S. (2003). Exploring noun-modifier semantic relations. In: Fifth International Workshop on Computational Semantics (IWCS-5), pp. 285-301. Tilburg, The Netherlands.
  • Nastase, V., Sayyad-Shirabad, J., Sokolova, M., and Szpakowicz, S. (2006). Learning noun-modifier semantic relations with corpus-based and WordNet-based features. In: Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006). Boston, MA.
  • Rosario, B., and Hearst, M. (2001). Classifying the semantic relations in noun-compounds via a domain-specific lexical hierarchy. In: Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP-01), pp. 82-90.
  • Rosario, B., Hearst, M., and Fillmore, C. (2002). The descent of hierarchy, and selection in relational semantics. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL’02), Philadelphia, PA, pp. 417-424.
  • Stephens, M., Palakal, M., Mukhopadhyay, S., Raje, R., and Mostafa, J. (2001). Detecting gene relations from MEDLINE abstracts. In: Proceedings of the Sixth Annual Pacific Symposium on Biocomputing, pp. 483-496.
  • Tatu, M., and Moldovan, D. (2005). A semantic approach to recognizing textual entailment. In: Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005), pp. 371-378, Vancouver, Canada.
  • Turney, P.D. (2005). Measuring semantic similarity by latent relational analysis. In: Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), pp. 1136-1141, Edinburgh, Scotland.
  • Turney, P.D., and Littman, M.L. (2005). Corpus-based learning of analogies and semantic relations. Machine Learning, 60(1–3):251-278.
