2005 WebBasedModelsForNLP

From GM-RKB

Subject Headings: Language Model, Noun Compound Bracketing.

Notes

Cited By

Quotes

Abstract

Previous work demonstrated that web counts can be used to approximate bigram counts, thus suggesting that web-based frequencies should be useful for a wide variety of NLP tasks. However, only a limited number of tasks have so far been tested using web-scale data sets. The present paper overcomes this limitation by systematically investigating the performance of web-based models for several NLP tasks, covering both syntax and semantics, both generation and analysis, and a wider range of n-grams and parts of speech than have been previously explored. For the majority of our tasks, we find that simple, unsupervised models perform better when n-gram counts are obtained from the web rather than from a large corpus. In some cases, performance can be improved further by using backoff or interpolation techniques that combine web counts and corpus counts. However, unsupervised web-based models generally fail to outperform supervised state-of-the-art models trained on smaller corpora. We argue that web-based models should therefore be used as a baseline for, rather than an alternative to, standard supervised models.

1. INTRODUCTION

The web is being increasingly used as a data source in a wide range of natural language processing (NLP) tasks. Several researchers have explored the potential of web data for machine translation, either by creating bilingual corpora [Resnik and Smith 2003] or by using the web to filter out or post-edit translation candidates [Grefenstette 1998; Cao and Li 2002; Way and Gough 2003]. Other work discovers semantic relations by querying the web for lexico-syntactic patterns indicative of hyponymy [Modjeska et al. 2003; Shinzato and Torisawa 2004], entailment [Szpektor et al. 2004], similarity, antonymy, or enablement [Chklovski and Pantel 2004]. A number of studies have investigated the usefulness of the web for word sense disambiguation [Mihalcea and Moldovan 1999; Rigau et al. 2002; Santamaría et al. 2003], question answering [Dumais et al. 2002; Hildebrandt et al. 2004; Soricut and Brill 2004], and language modeling [Zhu and Rosenfeld 2001; Keller and Lapata 2003; Bulyko et al. 2003].

Keller and Lapata [2003] have undertaken several studies to examine the validity of web counts for a range of predicate-argument bigrams (verb-object, adjective-noun, and noun-noun bigrams). They presented a simple method for retrieving bigram counts from the web by querying a search engine and demonstrated that web counts (a) correlate with frequencies obtained from a carefully edited, balanced corpus such as the 100M words British National Corpus (BNC), (b) correlate with frequencies recreated using smoothing methods in the case of unseen bigrams, (c) reliably predict human plausibility judgments, and (d) yield state-of-the-art performance on pseudo-disambiguation tasks.

Keller and Lapata's [2003] results suggest that web-based frequencies can be a viable alternative to bigram frequencies obtained from smaller corpora or recreated using smoothing. However, they do not demonstrate that realistic NLP tasks can benefit from web counts. In order to show this, web counts would have to be applied to a diverse range of NLP tasks, both syntactic and semantic, involving analysis (e.g., disambiguation) and generation (e.g., selection among competing outputs). Also, it remains to be shown that the web-based approach scales up to larger n-grams (e.g., trigrams), and to combinations of different parts of speech (Keller and Lapata [2003] only tested bigrams involving nouns, verbs, and adjectives). Another important question is whether web-based methods, which are by definition unsupervised, can be competitive alternatives to supervised approaches used for most tasks in the literature. Finally, Keller and Lapata's [2003] work raises the question whether web counts (noisy, but less sparse) can be fruitfully combined with corpus counts (less noisy, but sparse) into a single model.
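The idea of combining the two estimates can be illustrated with simple linear interpolation. This is a minimal sketch, not the paper's actual model: the mixing weight lam and the two relative-frequency inputs are hypothetical, and in practice such a weight would be tuned on held-out data.

```python
def interpolated_estimate(web_est, corpus_est, lam=0.5):
    """Combine a web-based estimate (less sparse, noisier) with a
    corpus-based estimate (less noisy, sparser) via linear interpolation.
    lam is an illustrative mixing weight, not a value from the paper."""
    return lam * web_est + (1 - lam) * corpus_est


# Example: a web relative frequency of 10.0 (per million) and a corpus
# relative frequency of 2.0 mix to 6.0 with an even weight.
combined = interpolated_estimate(10.0, 2.0, lam=0.5)
```

A backoff scheme would instead fall back to the web estimate only when the corpus count is zero; interpolation always blends both sources.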

The present paper aims to address these questions. We start by exploring the performance of web counts on two generation tasks for which the use of large data sets has previously shown promising results: (a) target language candidate selection for machine translation [Grefenstette 1998] and (b) context-sensitive spelling correction [Banko and Brill 2001a; 2001b]. Then we investigate the generality of the web-based approach by applying it to a range of analysis and generation tasks, involving both syntactic and semantic knowledge: (c) ordering of prenominal adjectives, (d) compound noun bracketing, (e) compound noun interpretation, (f) noun countability detection, (g) article restoration, and (h) PP attachment disambiguation. Table I gives an overview of these tasks and their properties. As the table illustrates, our choice of tasks covers n-grams of different sizes and includes a wide variety of parts of speech.

7. BRACKETING OF COMPOUND NOUNS

The first analysis task we consider is the syntactic disambiguation of compound nouns, which has received a fair amount of attention in the NLP literature [Pustejovsky et al. 1993; Resnik 1993; Lauer 1995]. The task can be summarized as follows: given a three-word compound n1 n2 n3, determine the correct binary bracketing of the word sequence (see (4) for an example).

Previous approaches typically compare different bracketings and choose the most likely one. The adjacency model compares [n1 n2] against [n2 n3] and adopts a right branching analysis if [n2 n3] is more likely than [n1 n2]. The dependency model compares [n1 n2] against [n1 n3] and adopts a right branching analysis if [n1 n3] is more likely than [n1 n2].
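The two models can be sketched as follows. The compound and its bigram counts here are illustrative stand-ins for counts a search engine or corpus would supply, not figures from the paper:

```python
# Hypothetical bigram counts for the compound "brick factory worker"
# (values are made up for illustration, not real web counts).
counts = {
    ("brick", "factory"): 700,
    ("factory", "worker"): 450,
    ("brick", "worker"): 20,
}


def adjacency_bracketing(n1, n2, n3, counts):
    """Adjacency model: right-branching if [n2 n3] is more likely
    than [n1 n2], otherwise left-branching."""
    return "right" if counts.get((n2, n3), 0) > counts.get((n1, n2), 0) else "left"


def dependency_bracketing(n1, n2, n3, counts):
    """Dependency model: right-branching if [n1 n3] is more likely
    than [n1 n2], otherwise left-branching."""
    return "right" if counts.get((n1, n3), 0) > counts.get((n1, n2), 0) else "left"
```

With these counts both models choose the left-branching analysis [[brick factory] worker], since (brick, factory) outnumbers the competing bigram in each comparison.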

The simplest model of compound noun disambiguation compares the frequencies of the two competing analyses and opts for the most frequent one [Pustejovsky et al. 1993]. Lauer [1995] proposes an unsupervised method for estimating the frequencies of the competing bracketings based on a taxonomy or a thesaurus. He uses a probability ratio to compare the probability of the left-branching analysis to that of the right-branching analysis.
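The ratio-based comparison reduces to a small decision rule. This sketch assumes the two probabilities are already estimated; the convention of breaking ties toward the left-branching analysis is an assumption for illustration, not necessarily Lauer's exact choice:

```python
def choose_bracketing(p_left, p_right):
    """Compare the two analyses via the ratio R = P(left) / P(right):
    R >= 1 favors left-branching, R < 1 favors right-branching.
    Tie-breaking toward 'left' is an illustrative convention."""
    if p_right == 0:
        return "left" if p_left > 0 else "undecided"
    return "left" if p_left / p_right >= 1 else "right"
```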

References


Mirella Lapata, and Frank Keller. (2005). "Web-based Models for Natural Language Processing." http://homepages.inf.ed.ac.uk/mlap/Papers/tslp05.pdf