1995 TheProblOfCompTheMostProbTreeInDOPParsing

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Syntactic Parser, Data Oriented Parsing.

Notes

Cited By

Quotes

Abstract

We deal with the question as to whether there exists a polynomial time algorithm for computing the most probable parse tree of a sentence generated by a data-oriented parsing (DOP) model. (Scha, 1990; Bod, 1992, 1993a). Therefore we describe DOP as a stochastic tree-substitution grammar (STSG). In STSG, a tree can be generated by exponentially many derivations involving different elementary trees. The probability of a tree is equal to the sum of the probabilities of all its derivations.

We show that in STSG, in contrast with stochastic context-free grammar, the Viterbi algorithm cannot be used for computing a most probable tree of a string. We propose a simple modification of Viterbi which allows by means of a "select-random" search to estimate the most probable tree of a string in polynomial time.

Experiments with DOP on ATIS show that only in 68% of the cases, the most probable derivation of a string generates the most probable tree of that string. Therefore, the parse accuracy obtained by the most probable trees (96%) is dramatically higher than the parse accuracy obtained by the most probable derivations (65%).

It is still an open question whether the most probable tree of a string can be deterministically computed in polynomial time.


References

  • M. van den Berg, R. Bod & R. Scha, (1994). “A Corpus-based Approach to Semantic Interpretation", Proceedings Ninth Amsterdam Colloquium, Amsterdam.
  • E. Black, R. Garside and G. Leech, 1993. Statistically-Driven Computer Grammars o/English: The IBM/Lancaster Approach, Rodopi: Amsterdam- Atlanta.
  • R. Bod, (1992). “A Computational Model of Language Performance: Data Oriented Parsing", Proceedings COLING'92, Nantes.
  • R. Bod, 1993a. "Using an Annotated Corpus as a Stochastic Grammar", Proceedings European Chapter fo the ACL'93, Utrecht.
  • R. Bod, 1993b. "Monte Carlo Parsing", Proceedings Third International Workshop on Parsing Technologies, Tilburg/Durbuy.
  • R. Bod, 1993c. "Data Oriented Parsing as a General Framework for Stochastic Language Processing", in:
  • K.Sikkel & A. Nijholt (eds.), Parsing Natural Language, TWLT6, Twente University.
  • R. Bod, (1995). Enriching Linguistics with Statistics: Performance Models of Natural Language. PhD-thesis, University of Amsterdam (forthcoming).
  • T. Briscoe and J. Carroll, (1993). “Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-based Grammars", Computational Linguistics 19(1), 25-59.
  • T. Fujisaki, F. Jelinek, J. Cocke, E. Black and T. Nishino, (1989). “A Probabilistic Method for Sentence Disambiguation", Proceedings 1st International Workshop on Parsing Technologies, Pittsburgh.
  • J.M. Hammersley and D.C. Handscomb, 1964. Monte Carlo Methods, Chapman and Hall, London.
  • C.T. Hemphill, J.J. Godfrey and G.R. Doddington, 1990. "The ATIS spoken language systems pilot corpus". Proceedings DARPA Speech and Natural Language Workshop, Hidden Valley, Morgan Kaufmann.
  • F. Jelinek, J.D. Lafferty and R.L. Mercer, 1990. Basic Methods of Probabilistic Context Free Grammars, Technical Report IBM RC 16374 (#72684), Yorktown Heights.
  • M. Kay, (1980). Algorithmic Schemata and Data Structures in Syntactic Processing. Report CSL-80-12, Xerox PARC, Palo Alto, Ca.
  • D. Magerman and C. Weir, (1992). “Efficiency, Robustness and Accuracy in Picky Chart Parsing", Proceedings A CL'92, Newark, Delaware.
  • M. Marcus, B. Santorini and M. Marcinkiewicz, 1993. "Building a Large Annotated Corpus of English: the Penn Treebank", Computational Linguistics 19(2).
  • D. Rumelhart and J. McClelland, (1986). Parallel Distributed Processing, The MIT Press, Cambridge, Mass.
  • R. Scha, (1990). “Language Theory and Language Technology; Competence and Performance" (in Dutch), in Q.A.M. de Kort & G.L.J. Leerdam (eds.), Computertoepassingen in de Neerlandistiek, Almere: Landelijke Vereniging van Neerlandici (LVVNjaarboek).
  • Y. Schabes, (1992). “Stochastic Lexicalized Tree Adjoining Grammars", Proceedings COLING'92, Nantes.
  • Y. Schabes and R, Waters, (1993). “Stochastic Lexicalized Context Free Grammars", Proceedings Third International Workshop on Parsing Technologies, Tilburg/Durbuy.
  • K. Sima'an, R. Bod, S. Krauwer and R. Scha, 1994. "Efficient Disambiguation by means of Stochastic Tree Substitution Grammars", Proceedings International Conference on New Methods in Language Processing, UMIST, Manchester.
  • A. Viterbi, 1967. "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm", IEEE Trans. Information Theory, IT-13, 260-269.

,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
1995 TheProblOfCompTheMostProbTreeInDOPParsingRens BodThe Problem of Computing the Most Probable Tree in Data-Oriented Parsing and Stochastic Tree Grammarshttp://acl.ldc.upenn.edu/E/E95/E95-1015.pdf