2004 FastDeepLinguisticStatisticalDepParsing

Subject Headings: Dependency Grammar, Long-Distance Relationship.


  • It describes an implemented Dependency Parser.
  • It has a freely available implementation [1], though it would likely take some time to install and integrate.
  • Pro3Gres stands for PRObability-based, PROlog-implemented Parser for RObust Grammatical Relation Extraction System. It is a fast, broad-coverage, deep-syntactic parsing system. It is a flexible and perspicuous hybrid parser that combines easily editable hand-written rules with statistical lexicalization from the Penn Treebank. Its performance is at or near the state of the art. Its statistical model is based on the decisions that a parser (human or machine) has to take during the parsing process.

  • We present and evaluate an implemented statistical minimal parsing strategy exploiting DG characteristics to permit fast, robust, deep-linguistic analysis of unrestricted text, and compare its probability model to (Collins, 1999) and an adaptation, (Dubey and Keller, 2003). We show that DG allows for the expression of the majority of English LDDs in a context-free way and offers simple yet powerful statistical models.
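
The minimal parsing strategy can be illustrated with a toy sketch: tag, group tokens into base chunks with a finite-state-style pass, then attach only the chunk heads to each other. Everything below is an assumption made for illustration and is not taken from Pro3Gres: the tag sets, the function names (`chunk`, `chunk_heads`, `attach`), the two attachment rules, and the "last token is the head" heuristic are all hypothetical, and the statistical disambiguation step is omitted entirely.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank style POS tag), supplied by hand here

# Hypothetical tag sets for the toy chunker (not the system's actual grammar).
NOUN_CHUNK_TAGS = {"DT", "JJ", "NN", "NNS", "NNP"}
VERB_CHUNK_TAGS = {"MD", "VB", "VBD", "VBZ", "VBP", "VBN", "VBG"}


def chunk(tagged: List[Token]) -> List[Tuple[str, List[Token]]]:
    """Single left-to-right pass that groups adjacent tokens into base chunks.

    Simplification: two adjacent noun chunks would be merged into one.
    """
    chunks: List[Tuple[str, List[Token]]] = []
    for word, tag in tagged:
        if tag in NOUN_CHUNK_TAGS:
            kind = "NP"
        elif tag in VERB_CHUNK_TAGS:
            kind = "VP"
        else:
            kind = "O"  # token outside any chunk
        if chunks and kind != "O" and chunks[-1][0] == kind:
            chunks[-1][1].append((word, tag))  # extend the current chunk
        else:
            chunks.append((kind, [(word, tag)]))  # open a new chunk
    return chunks


def chunk_heads(chunks: List[Tuple[str, List[Token]]]) -> List[Tuple[str, str]]:
    """Keep one head word per chunk (assumed here to be the last token)."""
    return [(kind, toks[-1][0]) for kind, toks in chunks if kind != "O"]


def attach(heads: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Toy hand-written rules applied between chunk heads only.

    An NP head before the first verb head is labelled subj, one after it obj.
    Returns (relation, governor, dependent) triples.
    """
    verb_positions = [i for i, (kind, _) in enumerate(heads) if kind == "VP"]
    if not verb_positions:
        return []
    v = verb_positions[0]
    verb = heads[v][1]
    deps = []
    for i, (kind, word) in enumerate(heads):
        if kind == "NP":
            deps.append(("subj" if i < v else "obj", verb, word))
    return deps


sentence = [("The", "DT"), ("old", "JJ"), ("man", "NN"), ("has", "VBZ"),
            ("seen", "VBN"), ("a", "DT"), ("small", "JJ"), ("dog", "NN")]
print(attach(chunk_heads(chunk(sentence))))
# prints [('subj', 'seen', 'man'), ('obj', 'seen', 'dog')]
```

The point of the sketch is the division of labour: the eight tokens are reduced to three chunk heads before any attachment decision is made, so the parsing step proper works on a much shorter sequence.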

1 Introduction

  • We present a fast, deep-linguistic statistical parser that profits from DG characteristics and that uses a minimal parsing strategy. First, we rely on finite-state based approaches as long as possible; secondly, where parsing is necessary, we keep it context-free as long as possible. For low-level syntactic tasks, tagging and base-NP chunking are used; parsing only takes place between heads of chunks. Robust, successful parsers (Abney, 1995; Collins, 1999) have shown that this division of labour is particularly attractive for DG.
  • Deep-linguistic, Formal Grammar parsers have carefully crafted grammars written by professional linguists. But unrestricted real-world texts still pose a problem to NLP systems that are based on Formal Grammars. Few handcrafted, deep linguistic grammars achieve the coverage and robustness needed to parse large corpora (see Riezler et al., 2002; Burke et al., 2004; and Hockenmaier and Steedman, 2002, for exceptions), and speed remains a serious challenge. The typical problems can be grouped as follows.


