2002 Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron

From GM-RKB

Subject Headings: Voted Perceptron Algorithm, Boosting Algorithm, Named Entity Recognition Task.

Notes

Cited By

2005

  • (Collins & Koo, 2005) ⇒ Michael Collins, and Terry Koo. (2005). “Discriminative Reranking for Natural Language Parsing.” In: Computational Linguistics, 31(1). doi:10.1162/0891201053630273
    • QUOTE: Perceptron-based algorithms, or the voted perceptron approach of Freund and Schapire (1999), are another alternative to boosting and LogLoss methods. See Collins (2002a, 2002b) and Collins and Duffy (2001, 2002) for applications of the perceptron algorithm. Collins (2002b) gives convergence proofs for the methods; Collins (2002a) directly compares the boosting and perceptron approaches on a named entity task; and Collins and Duffy (2001, 2002) use a reranking approach with kernels, which allow representations of parse trees or labeled sequences in very-high-dimensional spaces.

Quotes

Abstract

This paper describes algorithms which rerank the top N hypotheses from a maximum-entropy tagger, the application being the recovery of named-entity boundaries in a corpus of web data. The first approach uses a boosting algorithm for ranking problems. The second approach uses the voted perceptron algorithm. Both algorithms give comparable, significant improvements over the maximum-entropy baseline. The voted perceptron algorithm can be considerably more efficient to train, at some cost in computation on test examples.

1. Introduction

Recent work in statistical approaches to parsing and tagging has begun to consider methods which incorporate global features of candidate structures. Examples of such techniques are Markov Random Fields (Abney 1997; Della Pietra et al. 1997; Johnson et al. 1999), and boosting algorithms (Freund et al. 1998; Collins 2000; Walker et al. 2001). One appeal of these methods is their flexibility in incorporating features into a model: essentially any features which might be useful in discriminating good from bad structures can be included. A second appeal of these methods is that their training criterion is often discriminative, attempting to explicitly push the score or probability of the correct structure for each training sentence above the score of competing structures. This discriminative property is shared by the methods of (Johnson et al. 1999; Collins 2000), and also the Conditional Random Field methods of (Lafferty et al. 2001).

In a previous paper (Collins 2000), a boosting algorithm was used to rerank the output from an existing statistical parser, giving significant improvements in parsing accuracy on Wall Street Journal data. Similar boosting algorithms have been applied to natural language generation, with good results, in (Walker et al. 2001). In this paper we apply reranking methods to named-entity extraction. A state-of-the-art (maximum-entropy) tagger is used to generate 20 possible segmentations for each input sentence, along with their probabilities. We describe a number of additional global features of these candidate segmentations. These additional features are used as evidence in reranking the hypotheses from the max-ent tagger. We describe two learning algorithms: the boosting method of (Collins 2000), and a variant of the voted perceptron algorithm, which was initially described in (Freund & Schapire 1999). We applied the methods to a corpus of over one million words of tagged web data. The methods give significant improvements over the maximum-entropy tagger (a 17.7% relative reduction in error-rate for the voted perceptron, and a 15.6% relative improvement for the boosting method).
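The reranking setup described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: each candidate segmentation carries the base tagger's log-probability plus a dictionary of global features, and the reranker picks the candidate with the highest combined score. The feature names and weights below are hypothetical.

```python
import math

def rerank(candidates, weights, log_prob_weight=1.0):
    """candidates: list of (log_prob, feature_dict) pairs for one sentence.
    Returns the index of the highest-scoring candidate under the reranker."""
    def score(cand):
        log_prob, feats = cand
        # combine the base tagger's score with the weighted global features
        return log_prob_weight * log_prob + sum(
            weights.get(f, 0.0) * v for f, v in feats.items())
    return max(range(len(candidates)), key=lambda i: score(candidates[i]))

# Two hypothetical candidate segmentations: the tagger prefers the first,
# but a learned feature weight pulls the reranker toward the second.
cands = [
    (math.log(0.6), {"WHOLE_ENTITY=Smith": 1}),
    (math.log(0.3), {"WHOLE_ENTITY=John Smith": 1}),
]
w = {"WHOLE_ENTITY=John Smith": 2.0}
best = rerank(cands, w)  # selects the second candidate
```

The point of the sketch is that the base model's probability is just one more feature: the learning algorithm (boosting or the voted perceptron) is what sets the weights.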

One contribution of this paper is to show that existing reranking methods are useful for a new domain, named-entity tagging, and to suggest global features which give improvements on this task. We should stress that another contribution is to show that a new algorithm, the voted perceptron, gives very credible results on a natural language task. It is an extremely simple algorithm to implement, and is very fast to train (the testing phase is slower, but by no means sluggish). It should be a viable alternative to methods such as the boosting or Markov Random Field algorithms described in previous work.
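The simplicity claimed for the voted perceptron can be seen in a short sketch. This is an illustrative adaptation of Freund and Schapire (1999)'s method to the reranking setting, with an assumed data layout (each training example is a list of candidates, each a feature dictionary flagged correct or not); it is not the paper's exact implementation. Each intermediate weight vector is stored with a count of how many examples it survived, and those counts weight its vote at test time, which is why testing is slower than training per example.

```python
from collections import defaultdict

def train_voted_perceptron(train, T=5):
    """train: list of examples; each example is a list of
    (feature_dict, is_correct) candidates. Returns (weights, count)
    snapshots for voting."""
    w = defaultdict(float)
    snapshots = []          # (weight-vector copy, survival count) pairs
    count = 1
    for _ in range(T):
        for candidates in train:
            # predict the highest-scoring candidate under current weights
            scores = [sum(w[f] * v for f, v in feats.items())
                      for feats, _ in candidates]
            pred = max(range(len(candidates)), key=scores.__getitem__)
            gold = next(i for i, (_, ok) in enumerate(candidates) if ok)
            if pred == gold:
                count += 1  # current weights survive another example
            else:
                # standard perceptron update: promote gold, demote prediction
                snapshots.append((dict(w), count))
                for f, v in candidates[gold][0].items():
                    w[f] += v
                for f, v in candidates[pred][0].items():
                    w[f] -= v
                count = 1
    snapshots.append((dict(w), count))
    return snapshots

def voted_score(snapshots, feats):
    """Each stored weight vector votes, weighted by its survival count."""
    return sum(c * sum(ws.get(f, 0.0) * v for f, v in feats.items())
               for ws, c in snapshots)
```

Training touches each example once per pass with a constant-time update, which accounts for the speed; the vote over all snapshots at test time accounts for the extra cost on test examples.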

2.2 The baseline tagger

The problem can be framed as a tagging task – to tag each word as being either the start of an entity, a continuation of an entity, or not part of an entity at all (we will use the tags S, C and N respectively for these three cases). As a baseline model we used a maximum-entropy tagger, very similar to the ones described in (Ratnaparkhi 1996; Borthwick et al. 1998; McCallum et al. 2000). Max-ent taggers have been shown to be highly competitive on a number of tagging tasks, such as part-of-speech tagging (Ratnaparkhi 1996), named-entity recognition (Borthwick et al. 1998), and information extraction tasks (McCallum et al. 2000). Thus the maximum-entropy tagger we used represents a serious baseline for the task.
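The S/C/N encoding above maps directly onto entity spans: an S opens an entity, each following C extends it, and an N closes it. The small helper below illustrates the decoding step; it is not from the paper, just a sketch of how the tag sequence recovers entity boundaries.

```python
def decode_entities(words, tags):
    """Recover entity spans from an S/C/N tag sequence:
    S starts an entity, C continues the open one, N is outside."""
    entities, current = [], []
    for word, tag in zip(words, tags):
        if tag == "S":
            if current:                     # close any open entity
                entities.append(" ".join(current))
            current = [word]
        elif tag == "C" and current:
            current.append(word)
        else:                               # N (or a stray C with nothing open)
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

words = ["John", "Smith", "visited", "New", "York", "today"]
tags = ["S", "C", "N", "S", "C", "N"]
# decode_entities(words, tags) recovers "John Smith" and "New York"
```

Note that, unlike a plain binary in/out scheme, the distinct S tag lets two adjacent entities be kept separate.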

References

Michael Collins. (2002). “Ranking Algorithms for Named-entity Extraction: Boosting and the Voted Perceptron.” doi:10.3115/1073083.1073165