1995 StatDecTreeModelsForParsing

From GM-RKB

Subject Headings: Decision Tree Algorithm, Statistical NLP, Syntactic Parsing, SPATTER Algorithm.

Notes

Cited By

Quotes

Abstract

Syntactic natural language parsers have shown themselves to be inadequate for processing highly-ambiguous large-vocabulary text, as is evidenced by their poor performance on domains like the Wall Street Journal, and by the movement away from parsing-based approaches to text-processing in general. In this paper, I describe SPATTER, a statistical parser based on decision-tree learning techniques which constructs a complete parse for every sentence and achieves accuracy rates far better than any published result. This work is based on the following premises: (1) grammars are too complex and detailed to develop manually for most interesting domains; (2) parsing models must rely heavily on lexical and contextual information to analyze sentences accurately; and (3) existing n-gram modeling techniques are inadequate for parsing models. In experiments comparing SPATTER with IBM's computer manuals parser, SPATTER significantly outperforms the grammar-based parser. Evaluating SPATTER against the Penn Treebank Wall Street Journal corpus using the PARSEVAL measures, SPATTER achieves 86% precision, 86% recall, and 1.3 crossing brackets per sentence for sentences of 40 words or less, and 91% precision, 90% recall, and 0.5 crossing brackets for sentences between 10 and 20 words in length.
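The PARSEVAL figures quoted above compare the constituent bracketings of a proposed parse against the treebank parse. As a purely illustrative sketch (not code from the paper; the span representation and function names are assumptions), unlabeled bracket precision, recall, and crossing brackets can be computed roughly as follows:

```python
# Minimal sketch of PARSEVAL-style bracket scoring (illustrative only).
# A constituent is a (start, end) span over word positions; labels are
# ignored here, matching the unlabeled variant of the measures.

def bracket_scores(gold_spans, test_spans):
    """Return (precision, recall, crossing_brackets) for one sentence."""
    gold = list(gold_spans)
    test = list(test_spans)

    matched = sum(1 for span in test if span in gold)
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0

    def crosses(a, b):
        # Two spans cross if they overlap without one containing the other.
        (s1, e1), (s2, e2) = a, b
        return (s1 < s2 < e1 < e2) or (s2 < s1 < e2 < e1)

    crossing = sum(1 for t in test if any(crosses(t, g) for g in gold))
    return precision, recall, crossing


# Example: gold parse brackets vs. a test parse with one crossing bracket.
gold = [(0, 5), (0, 2), (2, 5)]
test = [(0, 5), (0, 3), (3, 5)]
print(bracket_scores(gold, test))  # -> (0.333..., 0.333..., 1)
```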

1 Introduction

The claim of this work is that statistics from a large corpus of parsed sentences combined with information-theoretic classification and training algorithms can produce an accurate natural language parser without the aid of a complicated knowledge base or grammar. This claim is justified by constructing a parser, called SPATTER (Statistical PATTErn Recognizer), based on very limited linguistic information, and comparing its performance to a state-of-the-art grammar-based parser on a common task.

3.1 SPATTER Representation

In SPATTER, a parse tree is encoded in terms of four elementary components, or features: words, tags, labels, and extensions. Each feature has a fixed vocabulary, with each element of a given feature vocabulary having a unique representation. The word feature can take on the value of any word in the vocabulary. The tag feature can take on any value in the part-of-speech tag set. The label feature can take on any value in the non-terminal set. The extension feature can take on any of the following five values (a schematic encoding of these features is sketched after the list):

right - the node is the first child of a constituent;

left - the node is the last child of a constituent;

up - the node is neither the first nor the last child of a constituent;

unary - the node is a child of a unary constituent;

root - the node is the root of the tree.
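As an illustration of this four-feature encoding, the following is a minimal sketch in Python; the class and field names are assumptions for exposition, not identifiers from the paper:

```python
# Illustrative sketch of the four-feature node encoding described above.
# Names (Extension, Node) are assumptions, not from the paper.
from dataclasses import dataclass
from enum import Enum


class Extension(Enum):
    RIGHT = "right"   # first child of a constituent
    LEFT = "left"     # last child of a constituent
    UP = "up"         # neither first nor last child of a constituent
    UNARY = "unary"   # child of a unary constituent
    ROOT = "root"     # root of the tree


@dataclass
class Node:
    word: str              # word, or lexical head for internal nodes
    tag: str               # part-of-speech tag from the tag set
    label: str             # non-terminal label from the label set
    extension: Extension   # one of the five extension values


# A leaf node for the word "bank" tagged NN inside an NP, as a first child:
leaf = Node(word="bank", tag="NN", label="NP", extension=Extension.RIGHT)
```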

For an [math]\displaystyle{ n }[/math]-word sentence, a parse tree has [math]\displaystyle{ n }[/math] leaf nodes, where the word feature value of the [math]\displaystyle{ i }[/math]th leaf node is the [math]\displaystyle{ i }[/math]th word in the sentence. The word feature value of an internal node is intended to contain the lexical head of the node's constituent. A deterministic lookup table based on the label of the internal node and the labels of its children is used to approximate this linguistic notion.
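The paper does not reproduce the lookup table itself; as a hypothetical illustration of such a deterministic head-child rule (the rules and names below are invented for exposition and are not the table used in SPATTER), one might write:

```python
# Hypothetical sketch of a deterministic head-child lookup, in the spirit of
# the table described above. The rules are invented for illustration only.

HEAD_RULES = {
    # parent label -> child labels to search for, in priority order
    "NP": ["NN", "NNS", "NP"],
    "VP": ["VB", "VBD", "VP"],
    "PP": ["IN"],
    "S":  ["VP"],
}


def head_child_index(parent_label, child_labels):
    """Pick the child whose word becomes the parent's word-feature value."""
    for candidate in HEAD_RULES.get(parent_label, []):
        for i, label in enumerate(child_labels):
            if label == candidate:
                return i
    return len(child_labels) - 1  # default: rightmost child


# Example: an S whose children are NP and VP takes its head word from the VP.
print(head_child_index("S", ["NP", "VP"]))  # -> 1
```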

References

  • L. R. Bahl, P. F. Brown, P. V. deSouza, and R. L. Mercer. (1989). A tree-based statistical language model for natural language speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 36, No. 7, pages 1001-1008.
  • L. E. Baum. (1972). An inequality and associated maximization technique in statistical estimation of probabilistic functions of Markov processes. Inequalities, Vol. 3, pages 1-8.
  • E. Black, Steven P. Abney, S. Flickenger, C. Gdaniec, C. Grishman, P. Harrison, D. Hindle, R. Ingria, F. Jelinek, J. Klavans, M. Liberman, M. Marcus, S. Roukos, B. Santorini, and T. Strzalkowski. (1991). Procedure for quantitatively comparing the syntactic coverage of English grammars. In: Proceedings of the Workshop on Speech and Natural Language, pages 306-311, Pacific Grove, California. doi:10.3115/112405.112467.
  • E. Black, R. Garside, and G. Leech. (1993). Statistically-driven computer grammars of English: the IBM/Lancaster approach. Rodopi, Atlanta, Georgia.
  • Leo Breiman, Jerome H. Friedman, R. A. Olshen, and C. J. Stone. (1984). Classification and Regression Trees. Wadsworth and Brooks, Pacific Grove, California.
  • Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. (1992). Class-based n-gram models of natural language. Computational Linguistics, Vol. 18, No. 4, pages 467-479.
  • David M. Magerman. (1994). Natural language parsing as statistical pattern recognition. Ph.D. thesis, Stanford University, Stanford, CA.


Author: David M. Magerman
Title: Statistical Decision Tree Models for Parsing
Venue: Proceedings of ACL-1995
URL: http://mti.ugm.ac.id/~adji/courses/resources/doctor/parsing-alg/StatTreeParse95.pdf
DOI: 10.3115/981658.981695
Year: 1995