1995 TheProblOfCompTheMostProbTreeInDOPParsing

(Bod, 1995) ⇒ Rens Bod. (1995). “The Problem of Computing the Most Probable Tree in Data-Oriented Parsing and Stochastic Tree Grammars.” In: Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics (EACL 1995). doi:10.3115/976973.976989

Subject Headings: Syntactic Parser, Data Oriented Parsing.

Notes

Cited By

~20 http://scholar.google.com/scholar?q=%22The+Problem+of+Computing+the+Most+Probable+Tree+in+Data-Oriented+Parsing+and+Stochastic+Tree+Grammars%22+1995

Quotes

Abstract

We deal with the question as to whether there exists a polynomial time algorithm for computing the most probable parse tree of a sentence generated by a data-oriented parsing (DOP) model. (Scha, 1990; Bod, 1992, 1993a). Therefore we describe DOP as a stochastic tree-substitution grammar (STSG). In STSG, a tree can be generated by exponentially many derivations involving different elementary trees. The probability of a tree is equal to the sum of the probabilities of all its derivations.

We show that in STSG, in contrast with stochastic context-free grammar, the Viterbi algorithm cannot be used for computing a most probable tree of a string. We propose a simple modification of Viterbi which allows by means of a "select-random" search to estimate the most probable tree of a string in polynomial time.

Experiments with DOP on ATIS show that only in 68% of the cases, the most probable derivation of a string generates the most probable tree of that string. Therefore, the parse accuracy obtained by the most probable trees (96%) is dramatically higher than the parse accuracy obtained by the most probable derivations (65%).

It is still an open question whether the most probable tree of a string can be deterministically computed in polynomial time.

References

M. van den Berg, R. Bod & R. Scha, (1994). “A Corpus-based Approach to Semantic Interpretation", Proceedings Ninth Amsterdam Colloquium, Amsterdam.
E. Black, R. Garside and G. Leech, 1993. Statistically-Driven Computer Grammars o/English: The IBM/Lancaster Approach, Rodopi: Amsterdam- Atlanta.
R. Bod, (1992). “A Computational Model of Language Performance: Data Oriented Parsing", Proceedings COLING'92, Nantes.
R. Bod, 1993a. “Using an Annotated Corpus as a Stochastic Grammar", Proceedings European Chapter fo the ACL'93, Utrecht.
R. Bod, 1993b. “Monte Carlo Parsing", Proceedings Third International Workshop on Parsing Technologies, Tilburg/Durbuy.
R. Bod, 1993c. “Data Oriented Parsing as a General Framework for Stochastic Language Processing", in:
K.Sikkel & A. Nijholt (eds.), Parsing Natural Language, TWLT6, Twente University.
R. Bod, (1995). Enriching Linguistics with Statistics: Performance Models of Natural Language. PhD-thesis, University of Amsterdam (forthcoming).
T. Briscoe and J. Carroll, (1993). “Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-based Grammars", Computational Linguistics 19(1), 25-59.
T. Fujisaki, F. Jelinek, J. Cocke, E. Black and T. Nishino, (1989). “A Probabilistic Method for Sentence Disambiguation", Proceedings 1st International Workshop on Parsing Technologies, Pittsburgh.
J.M. Hammersley and D.C. Handscomb, 1964. Monte Carlo Methods, Chapman and Hall, London.
C.T. Hemphill, J.J. Godfrey and G.R. Doddington, 1990. “The ATIS spoken language systems pilot corpus". Proceedings DARPA Speech and Natural Language Workshop, Hidden Valley, Morgan Kaufmann.
F. Jelinek, J.D. Lafferty and R.L. Mercer, 1990. Basic Methods of Probabilistic Context Free Grammars, Technical Report IBM RC 16374 (#72684), Yorktown Heights.
M. Kay, (1980). Algorithmic Schemata and Data Structures in Syntactic Processing. Report CSL-80-12, Xerox PARC, Palo Alto, Ca.
D. Magerman and C. Weir, (1992). “Efficiency, Robustness and Accuracy in Picky Chart Parsing", Proceedings A CL'92, Newark, Delaware.
M. Marcus, B. Santorini and M. Marcinkiewicz, 1993. “Building a Large Annotated Corpus of English: the Penn Treebank", Computational Linguistics 19(2).
D. Rumelhart and J. McClelland, (1986). Parallel Distributed Processing, The MIT Press, Cambridge, Mass.
R. Scha, (1990). “Language Theory and Language Technology; Competence and Performance" (in Dutch), in Q.A.M. de Kort & G.L.J. Leerdam (eds.), Computertoepassingen in de Neerlandistiek, Almere: Landelijke Vereniging van Neerlandici (LVVNjaarboek).
Y. Schabes, (1992). “Stochastic Lexicalized Tree Adjoining Grammars", Proceedings COLING'92, Nantes.
Y. Schabes and R, Waters, (1993). “Stochastic Lexicalized Context Free Grammars", Proceedings Third International Workshop on Parsing Technologies, Tilburg/Durbuy.
K. Sima'an, R. Bod, S. Krauwer and R. Scha, 1994. “Efficient Disambiguation by means of Stochastic Tree Substitution Grammars", Proceedings International Conference on New Methods in Language Processing, UMIST, Manchester.
A. Viterbi, 1967. “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm", IEEE Trans. Information Theory, IT-13, 260-269.

,

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
1995 TheProblOfCompTheMostProbTreeInDOPParsing	Rens Bod			The Problem of Computing the Most Probable Tree in Data-Oriented Parsing and Stochastic Tree Grammars		Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics	http://acl.ldc.upenn.edu/E/E95/E95-1015.pdf			1995