2002 AutomaticLabelingOfSemanticRoles


Subject Headings: Semantic Role Labeling, FrameNet


Cited By



We present a system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame. Given an input sentence and a target word and frame, the system labels constituents with either abstract semantic roles, such as AGENT or PATIENT, or more domain-specific semantic roles, such as SPEAKER, MESSAGE, and TOPIC. The system is based on statistical classifiers trained on roughly 50,000 sentences that were hand-annotated with semantic roles by the FrameNet semantic labeling project. We then parsed each training sentence into a syntactic tree and extracted various lexical and syntactic features, including the phrase type of each constituent, its grammatical function, and its position in the sentence. These features were combined with knowledge of the predicate verb, noun, or adjective, as well as information such as the prior probabilities of various combinations of semantic roles. We used various lexical clustering algorithms to generalize across possible fillers of roles. Test sentences were parsed, were annotated with these features, and were then passed through the classifiers.

Our system achieves 82% accuracy in identifying the semantic role of presegmented constituents. At the more difficult task of simultaneously segmenting constituents and identifying their semantic role, the system achieved 65% precision and 61% recall. Our study also allowed us to compare the usefulness of different features and feature combination methods in the semantic role labeling task. We also explore the integration of role labeling with statistical syntactic parsing and attempt to generalize to predicates unseen in the training data.
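The pipeline described in the abstract (extract features for each constituent, then classify) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simple count-based classifier that backs off from a target-specific feature tuple to a target-independent one and finally to the role prior. The toy data and the names `TRAIN`, `RoleClassifier`, and `precision_recall` are hypothetical. The second function shows one way precision and recall could be computed when constituent boundaries are predicted together with roles.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (target word, phrase type, grammatical function,
# position relative to the target, semantic role).
TRAIN = [
    ("say", "NP", "subject", "before", "SPEAKER"),
    ("say", "SBAR", "object", "after", "MESSAGE"),
    ("say", "PP", "other", "after", "TOPIC"),
    ("tell", "NP", "subject", "before", "SPEAKER"),
    ("tell", "SBAR", "object", "after", "MESSAGE"),
    ("speak", "NP", "subject", "before", "SPEAKER"),
]


class RoleClassifier:
    """Count-based role classifier with a simple backoff chain (assumed design)."""

    def __init__(self):
        self.specific = defaultdict(Counter)  # keyed by (target, pt, gf, pos)
        self.general = defaultdict(Counter)   # keyed by (pt, gf, pos)
        self.prior = Counter()                # role frequencies overall

    def train(self, rows):
        for target, pt, gf, pos, role in rows:
            self.specific[(target, pt, gf, pos)][role] += 1
            self.general[(pt, gf, pos)][role] += 1
            self.prior[role] += 1

    def predict(self, target, pt, gf, pos):
        # Try the most specific distribution first, then back off.
        candidates = (
            self.specific.get((target, pt, gf, pos)),
            self.general.get((pt, gf, pos)),
            self.prior,
        )
        for counts in candidates:
            if counts:
                return counts.most_common(1)[0][0]
        return None  # untrained classifier


def precision_recall(predicted, gold):
    """Score sets of (start, end, role) triples; a match needs all three equal."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


clf = RoleClassifier()
clf.train(TRAIN)

# "announce" is unseen in TRAIN, so prediction backs off to the general counts.
print(clf.predict("announce", "SBAR", "object", "after"))

# Joint segmentation and labeling: one of three predictions matches the gold set.
predicted = {(0, 1, "SPEAKER"), (2, 5, "MESSAGE"), (6, 7, "TOPIC")}
gold = {(0, 1, "SPEAKER"), (2, 6, "MESSAGE")}
print(precision_recall(predicted, gold))
```

Backing off to less specific feature combinations is one simple way to handle predicates that never appeared in training, which the abstract names as a goal. The combination methods the paper actually compares may differ from this chain.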


  • Baayen, R.H., R. Piepenbrock, and L. Gulikers. (1995). The CELEX Lexical Database (Release 2) [CD-ROM]. Linguistic Data Consortium, University of Pennsylvania [Distributor], Philadelphia, PA.
  • Baker, Collin F., Charles J. Fillmore, and John B. Lowe. (1998). The Berkeley FrameNet project. In: Proceedings of COLING/ACL, pages 86–90, Montreal, Canada.
  • Blaheta, Don and Eugene Charniak. (2000). Assigning function tags to parsed text. In: Proceedings of the 1st Annual Meeting of the North American Chapter of the ACL (NAACL), pages 234–240, Seattle, Washington.
  • Carroll, Glenn and Mats Rooth. (1998). Valence induction with a head-lexicalized PCFG. In: Proceedings of the 3rd Conference on Empirical Methods in Natural Language Processing (EMNLP 3), Granada, Spain.
  • Charniak, Eugene. (1997). Statistical Parsing with a Context-Free Grammar and Word Statistics. In AAAI-97, pages 598–603, Menlo Park, August. AAAI Press.
  • Collins, Michael. (1997). Three generative, lexicalised models for statistical parsing. In: Proceedings of the 35th Annual Meeting of the ACL, pages 16–23, Madrid, Spain.
  • Collins, Michael. (1999). Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia.
  • Dahiya, Yajan Veer. (1995). Panini as a linguist: Ideas and Patterns. Eastern Book Linkers, Delhi, India.
  • Defense Advanced Research Projects Agency, editor. (1998). Proceedings of 7th Message Understanding Conference.
  • Dowty, David R. (1991). Thematic proto-roles and argument selection. Language, 67(3):547–619.
  • Fellbaum, Christiane, editor. (1998). WordNet: An Electronic Lexical Database. MIT Press, Cambridge, Massachusetts.
  • Fillmore, Charles J. (1968). The case for case. In Emmon W. Bach and Robert T. Harms, editors, Universals in Linguistic Theory. Holt, Rinehart & Winston, New York, pages 1–88.
  • Fillmore, Charles J. (1971). Some problems for case grammar. In R. J. O’Brien, editor, 22nd annual Round Table. Linguistics: developments of the sixties – viewpoints of the seventies, volume 24 of Monograph Series on Language and Linguistics. Georgetown University Press, Washington D.C., pages 35–56.
  • Fillmore, Charles J. (1976). Frame semantics and the nature of language. In Annals of the New York Academy of Sciences: Conference on the Origin and Development of Language and Speech, volume 280, pages 20–32.
  • Fillmore, Charles J. (1986). Pragmatically controlled zero anaphora. In BLS-86, pages 95–107, Berkeley, California.
  • Fillmore, Charles J. and Collin F. Baker. (2000). FrameNet: Frame semantics meets the corpus. In Poster presentation, 74th Annual Meeting of the Linguistic Society of America, January.
  • Gildea, Daniel and Thomas Hofmann. (1999). Topic-based language models using EM. In EUROSPEECH-99, pages 2167–2170, Budapest.
  • Hearst, Marti. (1999). Untangling text data mining. In: Proceedings of the 37th Annual Meeting of the ACL, pages 3–10, College Park, Maryland.
  • Hobbs, Jerry R., Douglas E. Appelt, John Bear, David Israel, Megumi Kameyama, Mark E. Stickel, and Mabry Tyson. (1997). FASTUS: A cascaded finite-state transducer for extracting information from natural-language text. In Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing. MIT Press, Cambridge, MA, pages 383–406.
  • Hofmann, Thomas and Jan Puzicha. (1998). Statistical models for co-occurrence data. Memo, Massachusetts Institute of Technology Artificial Intelligence Laboratory, February.
  • Jackendoff, Ray. (1972). Semantic Interpretation in Generative Grammar. MIT Press, Cambridge, Massachusetts.
  • Jelinek, Frederick and Robert L. Mercer. (1980). Interpolated estimation of Markov source parameters from sparse data. In: Proceedings, Workshop on Pattern Recognition in Practice, pages 381–397, Amsterdam. North Holland.
  • Johnson, Christopher R., Charles J. Fillmore, Esther J. Wood, Josef Ruppenhofer, Margaret Urban, Miriam R. L. Petruk, and Collin F. Baker. (2001). The FrameNet project: Tools for lexicon building. Version 0.7, http://www.icsi.berkeley.edu/~framenet/book.html.
  • (Kipper et al., 2000) ⇒ Karin Kipper, Hoa Trang Dang, William Schuler, and Martha Palmer. (2000). “Building a Class-based Verb Lexicon Using TAGs.” In: TAG+5 Fifth International Workshop on Tree Adjoining Grammars and Related Formalisms, pages 147–154.
  • Lapata, Maria and Chris Brew. (1999). Using subcategorization to resolve verb class ambiguity. In Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora, pages 266–274, College Park, Maryland.
  • Levin, Beth. (1993). English Verb Classes And Alternations: A Preliminary Investigation. University of Chicago Press, Chicago.
  • Levin, Beth and Malka Rappaport Hovav. (1996). From lexical semantics to argument realization. manuscript.
  • Marcus, Mitchell P., Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. (1994). The Penn Treebank: Annotating predicate argument structure. In ARPA Human Language Technology Workshop, pages 114–119, Plainsboro, NJ. Morgan Kaufmann.
  • Marcus, Mitchell P., Beatrice Santorini, and Mary Ann Marcinkiewicz. (1993). Building a large annotated corpus of English: The Penn treebank. Computational Linguistics, 19(2):313–330, June.
  • McCarthy, Diana. (2000). Using semantic preferences to identify verbal participation in role switching alternations. In: Proceedings of the 1st Annual Meeting of the North American Chapter of the ACL (NAACL), pages 256–263, Seattle, Washington.
  • Miller, Scott, Heidi Fox, Lance Ramshaw, and Ralph Weischedel. (2000). A novel use of statistical parsing to extract information from text. In: Proceedings of the 1st Annual Meeting of the North American Chapter of the ACL (NAACL), pages 226–233, Seattle, Washington.
  • Miller, Scott, David Stallard, Robert Bobrow, and Richard Schwartz. (1996). A fully statistical approach to natural language interfaces. In: Proceedings of the 34th Annual Meeting of the ACL, pages 55–61, Santa Cruz, California.
  • Misra, Vidya Niwas. (1966). The Descriptive Technique of Panini. Mouton, The Hague.
  • Pereira, Fernando, Naftali Tishby, and Lillian Lee. (1993). Distributional clustering of English words. In: Proceedings of the 31st ACL, pages 183–190, Columbus, Ohio. ACL.
  • Pietra, Stephen Della, Vincent Della Pietra, and John D. Lafferty. (1997). Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380–393, April.
  • Pollard, Carl and Ivan A. Sag. (1994). Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago.
  • Riloff, Ellen. (1993). Automatically constructing a dictionary for information extraction tasks. In: Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI), pages 811–816, Washington, D.C.
  • Riloff, Ellen and Mark Schmelzenbach. (1998). An empirical approach to conceptual case frame acquisition. In: Proceedings of the Sixth Workshop on Very Large Corpora, pages 49–56, Montreal, Canada.
  • Rocher, Rosane. (1964). “Agent” et “Objet” chez Panini. Journal of the American Oriental Society, 84:44–54.
  • Rooth, Mats. (1995). Two-dimensional clusters in grammatical relations. In: Proceedings of AAAI Symposium on Representation and Acquisition of Lexical Knowledge, Stanford, California.
  • Rooth, Mats, Stefan Riezler, Detlef Prescher, Glenn Carroll, and Franz Beil. (1999). Inducing a semantically annotated lexicon via EM-based clustering. In: Proceedings of the 37th Annual Meeting of the ACL, pages 104–111, College Park, Maryland.
  • Schank, Roger C. (1972). Conceptual dependency: a theory of natural language understanding. Cognitive Psychology, 3:552–631.
  • Siegel, Sidney and N. John Castellan, Jr. (1988). Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York, second edition.
  • Somers, Harold L. (1987). Valency and Case in Computational Linguistics. Edinburgh University Press, Edinburgh, Scotland.
  • Stallard, David. (2000). Talk’n’travel: A conversational system for air travel planning. In: Proceedings of the 6th Applied Natural Language Processing Conference (ANLP’00), pages 68–75.
  • Van Valin, Robert D. (1993). A synopsis of role and reference grammar. In Robert D. Van Valin, editor, Advances in Role and Reference Grammar. John Benjamins Publishing Company, Amsterdam, pages 1–166.
  • Winograd, Terry. (1972). Understanding natural language. Cognitive Psychology, 3(1):1–191. Reprinted as a book by Academic Press, 1972.



    author = "Daniel Gildea and Daniel Jurafsky",
    title = "Automatic Labeling of Semantic Roles",
    journal = "Computational Linguistics",
    pages = "245",
    volume = "28",
    year = "2002"  }


Page: 2002 AutomaticLabelingOfSemanticRoles
Author: Daniel Gildea, Daniel Jurafsky
title: Automatic Labeling of Semantic Roles
journal: Computational Linguistics Research Area
titleUrl: http://www.cs.rochester.edu/~gildea/gildea-cl02.pdf
year: 2002