2007 AddingNPStructToThePennTreebank

(Vadas & Curran, 2007) ⇒ David Vadas, and James R. Curran. (2007). “Adding Noun Phrase Structure to the Penn Treebank.” In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-2007).

Subject Headings: Penn Treebank, noun phrase, Noun-Phrase Premodifier, NP Bracketing Task.

Notes

It aims to annotate Noun Phrase Structure in the Penn Treebank, which the original annotation leaves flat.
It tests the effect of the added annotation on performance in the NP Bracketing Task and in the Full Parsing Task.
The Annotation Guidelines are at 2008_NounPhraseBracketingGuidelinesV1.
The authors plan to transfer their annotation to CCGbank.
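
To make the contrast between flat and bracketed noun phrases concrete, here is a minimal Python sketch. It is not taken from the paper: the example phrase, the part-of-speech tags, the nested-tuple representation, the helper names, and the use of an "NML" label for the added internal node are all assumptions made for illustration.

```python
# Illustrative sketch only. The phrase, tags, and the "NML" label for the
# added internal node are assumptions, not quoted from the paper.
# A tree is a tuple (label, child, ...); a leaf is a tuple (tag, word).

# Flat structure: all three tokens are siblings under NP.
flat = ("NP", ("NN", "lung"), ("NN", "cancer"), ("NNS", "deaths"))

# Bracketed structure: "lung cancer" is grouped as a left-branching modifier.
bracketed = ("NP", ("NML", ("NN", "lung"), ("NN", "cancer")), ("NNS", "deaths"))


def is_leaf(node):
    """A leaf pairs a tag with a word string."""
    return len(node) == 2 and isinstance(node[1], str)


def flatten(node, internal_label="NML"):
    """Remove internal nodes with the given label, splicing their children into the parent."""
    if is_leaf(node):
        return node
    label, *children = node
    spliced = []
    for child in children:
        child = flatten(child, internal_label)
        if not is_leaf(child) and child[0] == internal_label:
            spliced.extend(child[1:])  # lift the grandchildren up one level
        else:
            spliced.append(child)
    return (label, *spliced)


def to_brackets(node):
    """Render a tree in Penn Treebank style bracket notation."""
    if is_leaf(node):
        return f"({node[0]} {node[1]})"
    return "(" + node[0] + " " + " ".join(to_brackets(c) for c in node[1:]) + ")"


print(to_brackets(flat))
print(to_brackets(bracketed))
```

Flattening the bracketed tree recovers the flat one, which shows that the added annotation only inserts structure and leaves the original tokens and tags untouched.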

Cited By

~31 …

Quotes

Abstract

The Penn Treebank does not annotate within base noun phrases (NPs), committing only to flat structures that ignore the complexity of English NPs. This means that tools trained on Treebank data cannot learn the correct internal structure of NPs.

This paper details the process of adding gold-standard bracketing within each noun phrase in the Penn Treebank. We then examine the consistency and reliability of our annotations. Finally, we use this resource to determine NP structure using several statistical approaches, thus demonstrating the utility of the corpus. This adds detail to the Penn Treebank that is necessary for many NLP applications.

References

Ann Bies, Mark Ferguson, Karen Katz, and Robert MacIntyre. (1995). Bracketing guidelines for Treebank II style Penn Treebank project. Technical report, University of Pennsylvania.
Daniel M. Bikel. (2004). On the Parameter Space of Generative Lexicalized Statistical Parsing Models. Ph.D. thesis, University of Pennsylvania.
Thorsten Brants and Alex Franz. (2006). Web 1T 5-gram version 1. Linguistic Data Consortium.
Ted Briscoe and John Carroll. (2006). Evaluating the accuracy of an unlexicalized statistical parser on the PARC DepBank. In: Proceedings of the Poster Session of COLING/ACL-06. Sydney, Australia.
Michael Collins. (1999). Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
Usama M. Fayyad and Keki B. Irani. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI–93), pages 1022–1029. Chambery, France.
Roxana Girju, Dan Moldovan, Marta Tatu, and Daniel Antohe. (2005). On the semantics of noun compounds. Journal of Computer Speech and Language - Special Issue on Multiword Expressions, 19(4):479–496.
Julia Hockenmaier. (2003). Data and Models for Statistical Parsing with Combinatory Categorial Grammar. Ph.D. thesis, University of Edinburgh.
Tracy Holloway King, Richard Crouch, Stefan Riezler, Mary Dalrymple, and Ronald M. Kaplan. (2003). The PARC700 dependency bank. In: Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora (LINC-03). Budapest, Hungary.
Seth Kulick, Ann Bies, Mark Liberman, Mark Mandel, Ryan McDonald, Martha Palmer, Andrew Schein, and Lyle Ungar. (2004). Integrated annotation for biomedical information extraction. In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. Boston.
Mirella Lapata and Frank Keller. (2004). The web as a baseline: Evaluating the performance of unsupervised web-based models for a range of NLP tasks. In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 121–128. Boston.
(Lauer, 1995a) ⇒ Mark Lauer. (1995). “Corpus Statistics Meet the Noun Compound: Some empirical results.” In: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. doi:10.3115/981658.981665
Mitchell Marcus. (1980). A Theory of Syntactic Recognition for Natural Language. MIT Press, Cambridge, MA.
Mitchell Marcus, Beatrice Santorini, and Mary Marcinkiewicz. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
Preslav Nakov and Marti Hearst. (2005). Search engine statistics beyond the n-gram: Application to noun compound bracketing. In: Proceedings of CoNLL-2005, Ninth Conference on Computational Natural Language Learning. Ann Arbor, MI.
Lance A. Ramshaw and Mitchell P. Marcus. (1995). Text chunking using transformation-based learning. In: Proceedings of the Third ACL Workshop on Very Large Corpora. Cambridge MA, USA.
Mark Steedman. (2000). The Syntactic Process. MIT Press, Cambridge, MA.
Ralph Weischedel and Ada Brunstein. (2005). BBN pronoun coreference and entity type corpus. Technical report, Linguistic Data Consortium.


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2007 AddingNPStructToThePennTreebank	David Vadas James R. Curran			Adding Noun Phrase Structure to the Penn Treebank		Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics	http://www.cs.usyd.edu.au/~james/pubs/pdf/acl07nps.pdf			2007