1999 DomainSpecificKeyphraseExtraction

From GM-RKB

Subject Headings: Keyphrase Extraction Task, Computer Science Literature.

Notes

Cited By

~310 http://scholar.google.com/scholar?cites=17586814598882386185

2000

Quotes

Abstract

1 Introduction

  • Keyphrases give a high-level description of a document's contents that is intended to make it easy for prospective readers to decide whether or not it is relevant for them. But they have other applications too. Because keyphrases summarize documents very concisely, they can be used as a low-cost measure of similarity between documents, making it possible to cluster documents into groups by measuring overlap between the keyphrases they are assigned. A related application is topic search: upon entering a keyphrase into a search engine, all documents with this particular keyphrase attached are returned to the user. In summary, keyphrases provide a powerful means for sifting through large numbers of documents by focusing on those that are likely to be relevant.
  • Unfortunately, only a small fraction of documents have keyphrases assigned to them (mostly because authors only provide keyphrases when they are explicitly instructed to do so), and manually attaching keyphrases to existing documents is a very laborious task. Therefore, ways of automating this process using artificial intelligence (more specifically, machine learning techniques) are of interest. There are two different ways of approaching the problem: keyphrase assignment and keyphrase extraction. In keyphrase assignment, also known as text categorization [Dumais et al., 1998], it is assumed that all potential keyphrases appear in a predefined controlled vocabulary: the categories. The learning problem is to find a mapping from documents to categories using a set of training documents, which can be accomplished by training a classifier for each category, using documents that belong to it as positive examples and the rest as negative ones. A new document is then processed by each of the classifiers and assigned to those categories whose classifiers identify it as a positive example. The second approach, keyphrase extraction, which we pursue in this paper, does not restrict the set of possible keyphrases to a selected vocabulary. On the contrary, any phrase in a new document can be identified (extracted) as a keyphrase. Using a set of training documents, machine learning is used to determine which properties distinguish phrases that are keyphrases from ones that are not.
  • The main finding of this paper is that performance can be boosted significantly if Kea is trained on documents that are from the same domain as those from which keyphrases are to be extracted.
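
The first quote mentions two uses of keyphrases: measuring overlap between the keyphrases of two documents as a cheap similarity signal, and topic search, which returns every document with a given keyphrase attached. The sketch below illustrates both. It is not taken from the paper: the quote does not say how overlap is measured, so Jaccard overlap is an assumption, and the function names and the three sample documents are hypothetical.

```python
from collections import defaultdict


def keyphrase_overlap(a, b):
    """Jaccard overlap between two keyphrase sets (an assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_topic_index(docs):
    """Map each keyphrase to the sorted list of document ids that carry it."""
    index = defaultdict(set)
    for doc_id, phrases in docs.items():
        for phrase in phrases:
            index[phrase].add(doc_id)
    return {phrase: sorted(ids) for phrase, ids in index.items()}


# Hypothetical documents with hand-assigned keyphrases.
docs = {
    "doc1": {"keyphrase extraction", "machine learning"},
    "doc2": {"machine learning", "text categorization"},
    "doc3": {"data compression"},
}

index = build_topic_index(docs)

# Similarity: doc1 and doc2 share one of three distinct keyphrases.
print(keyphrase_overlap(docs["doc1"], docs["doc2"]))

# Topic search: every document that has this keyphrase attached.
print(index.get("machine learning", []))
```

Pairwise overlap scores of this kind could feed any standard clustering routine. Exact set matching also treats near-duplicate phrases as different, so a practical system would probably normalize case and word forms first.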

References


Author: Ian H. Witten, Eibe Frank, Gordon W. Paynter, Carl Gutwin, Craig G. Nevill-Manning
Title: Domain-Specific Keyphrase Extraction
Title URL: http://ijcai.org/Past Proceedings/IJCAI-99 VOL-2/PDF/002.pdf
Year: 1999