2006 MultilingualDependencyAnalysisw

Subject Headings: Correlated Label, Dependency Parse.

Notes

Cited By

Quotes

Abstract

We present a two-stage multilingual dependency parser and evaluate it on 13 diverse languages. The first stage is based on the unlabeled dependency parsing models described by McDonald and Pereira (2006) augmented with morphological features for a subset of the languages. The second stage takes the output from the first and labels all the edges in the dependency graph with appropriate syntactic categories using a globally trained sequence classifier over components of the graph. We report results on the CoNLL-X shared task (Buchholz et al., 2006) data sets and present an error analysis.

3. Label Classification

The simplest labeler would be to take as input an edge [math]\displaystyle{ (i, j) \in y }[/math] for sentence x and find the label with highest score,

[math]\displaystyle{ l_{(i,j)} = \operatorname*{arg\,max}_{l} \; s(l, (i,j), y, x) }[/math]
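As a rough illustration of this per-edge argmax, the following minimal Python sketch labels a single edge. The names `label_edge`, `toy_score`, `SENT`, `TREE` and `LABELS` are hypothetical, and the hand-written scorer merely stands in for a trained model; none of this is taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[int, int]  # (head index i, dependent index j)
ScoreFn = Callable[[str, Edge, List[Edge], Sequence[str]], float]


def label_edge(edge: Edge, tree: List[Edge], sentence: Sequence[str],
               labels: Sequence[str], score: ScoreFn) -> str:
    """Return the label l that maximizes score(l, (i, j), y, x)."""
    return max(labels, key=lambda l: score(l, edge, tree, sentence))


# Hypothetical toy data: token 0 is an artificial root.
SENT = ["<root>", "John", "saw", "Mary"]
TREE = [(0, 2), (2, 1), (2, 3)]
LABELS = ["ROOT", "SBJ", "OBJ"]


def toy_score(label: str, edge: Edge, tree: List[Edge],
              sentence: Sequence[str]) -> float:
    """Hand-written stand-in for a trained scorer (an assumption)."""
    head, dep = edge
    if head == 0:
        return 1.0 if label == "ROOT" else 0.0
    if dep < head:
        return 1.0 if label == "SBJ" else 0.0
    return 1.0 if label == "OBJ" else 0.0


print(label_edge((2, 1), TREE, SENT, LABELS, toy_score))  # SBJ
```

Here the scorer sees only one edge at a time, so the choice for one dependent cannot influence the choice for another.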

Doing this for each edge in the tree would produce the final output. Such a model could easily be trained using the provided training data for each language. However, it might be advantageous to know the labels of other nearby edges. For instance, if we consider a head [math]\displaystyle{ x_i }[/math] with dependents [math]\displaystyle{ x_{j_1}, \ldots, x_{j_M} }[/math], it is often the case that many of these dependencies will have correlated labels. To model this we treat the labeling of the edges [math]\displaystyle{ (i, j_1), \ldots, (i, j_M) }[/math] as a sequence labeling problem,

[math]\displaystyle{ (l_{(i,j_1)}, \ldots, l_{(i,j_M)}) = \bar{l} = \operatorname*{arg\,max}_{\bar{l}} \; s(\bar{l}, i, y, x) }[/math]

We use a first-order Markov factorization of the score

[math]\displaystyle{ \bar{l} = \operatorname*{arg\,max}_{\bar{l}} \sum_{m=2}^{M} s(l_{(i,j_m)}, l_{(i,j_{m-1})}, i, y, x) }[/math]

in which each factor is the score of labeling the adjacent edges [math]\displaystyle{ (i, j_m) }[/math] and [math]\displaystyle{ (i, j_{m-1}) }[/math] in the tree y. We attempted higher-order Markov factorizations but they did not improve performance uniformly across languages and training became significantly slower.
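With a first-order factorization, the argmax over label sequences for one head can be computed by Viterbi-style dynamic programming. The sketch below is only an illustration under stated assumptions: `label_siblings`, `toy_pair_score` and the toy data are hypothetical names, the pairwise scorer is hand-written rather than trained, and the position `m` is passed explicitly to the scorer for convenience. Because the sum runs over adjacent pairs only, a head with a single dependent has an empty sum, and the sketch then simply falls back to the first label.

```python
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[int, int]  # (head index i, dependent index j)
PairScoreFn = Callable[
    [str, str, int, int, Sequence[int], List[Edge], Sequence[str]], float]


def label_siblings(head: int, dependents: Sequence[int], tree: List[Edge],
                   sentence: Sequence[str], labels: Sequence[str],
                   pair_score: PairScoreFn) -> List[str]:
    """Maximize the sum of pairwise scores over adjacent sibling edges."""
    if not dependents:
        return []
    n = len(labels)
    best = [0.0] * n  # best[k]: best prefix score ending in labels[k]
    back: List[List[int]] = []  # back-pointers, one row per position
    for m in range(1, len(dependents)):
        new_best, pointers = [], []
        for cur in labels:
            cands = [best[p] + pair_score(cur, labels[p], m, head,
                                          dependents, tree, sentence)
                     for p in range(n)]
            p_star = max(range(n), key=cands.__getitem__)
            new_best.append(cands[p_star])
            pointers.append(p_star)
        best = new_best
        back.append(pointers)
    # Recover the best sequence by following back-pointers from the end.
    k = max(range(n), key=best.__getitem__)
    seq = [k]
    for pointers in reversed(back):
        k = pointers[k]
        seq.append(k)
    seq.reverse()
    return [labels[k] for k in seq]


# Hypothetical toy data: token 0 is an artificial root.
SENT = ["<root>", "John", "saw", "Mary"]
TREE = [(0, 2), (2, 1), (2, 3)]
LABELS = ["SBJ", "OBJ", "NMOD"]


def toy_pair_score(cur: str, prev: str, m: int, head: int,
                   dependents: Sequence[int], tree: List[Edge],
                   sentence: Sequence[str]) -> float:
    """Hand-written stand-in for a trained pairwise scorer (an assumption)."""
    if prev == cur:
        return -1.0  # discourage repeating a label on adjacent siblings
    if (prev, cur) == ("SBJ", "OBJ") and dependents[m - 1] < head < dependents[m]:
        return 2.0  # subject before the head, object after it
    return 0.0


print(label_siblings(2, [1, 3], TREE, SENT, LABELS, toy_pair_score))
# ['SBJ', 'OBJ']
```

The dynamic program takes time proportional to the number of dependents times the square of the number of labels. A higher-order factorization would enlarge the state to tuples of labels, which is consistent with the slower training reported above.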

References

Ryan T. McDonald, Fernando Pereira, and Kevin Lerman. (2006). "Multilingual Dependency Analysis with a Two-stage Discriminative Parser."