1999 DomainSpecificKeyphraseExtraction
- (Frank et al., 1999) ⇒ Eibe Frank, Gordon W. Paynter, Ian H. Witten, Carl Gutwin, Craig G. Nevill-Manning. (1999). “Domain-Specific Keyphrase Extraction.” In: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI 1999)
Subject Headings: Keyphrase Extraction Task, Computer Science Literature.
Notes
- It proposes the use of a Naive Bayes Classifier for the Supervised Keyphrase Extraction Task.
Cited By
~310 http://scholar.google.com/scholar?cites=17586814598882386185
2000
- (Turney, 2000) ⇒ Peter D. Turney. (2000). “Learning Algorithms for Keyphrase Extraction.” In: Journal of Information Retrieval, 2(4). doi:10.1023/A:1009976227802
- … Frank et al. (1999) have implemented a system, Kea, which builds on our work (Turney, 1997, 1999). It treats keyphrase extraction as a supervised learning problem, but it uses a Bayesian approach instead of a genetic algorithm approach. Their experiments indicate that Kea and GenEx have statistically equivalent levels of performance. The same group (Gutwin et al., 1999) has evaluated Kea as a component in a new kind of search engine, Keyphind, designed specially to support browsing. Their experiments suggest that certain kinds of tasks are much easier with Keyphind than with conventional search engines. The Keyphind interface is somewhat similar to the interface of Tetranet’s Wisebot.
Quotes
Abstract
- Keyphrases are an important means of document summarization, clustering, and topic search. Only a small minority of documents have author-assigned keyphrases, and manually assigning keyphrases to existing documents is very laborious. Therefore it is highly desirable to automate the keyphrase extraction process. This paper shows that a simple procedure for keyphrase extraction based on the naive Bayes learning scheme performs comparably to the state of the art. It goes on to explain how this procedure's performance can be boosted by automatically tailoring the extraction process to the particular document collection at hand. Results on a large collection of technical reports in computer science show that the quality of the extracted keyphrases improves significantly when domain-specific information is exploited.
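The abstract's "simple procedure ... based on the naive Bayes learning scheme" can be illustrated with a minimal sketch. This is not the authors' implementation: the two discretized features (a term-weight bucket and a position bucket), the toy training data, the function names, and the Laplace smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count class and per-feature value frequencies.

    examples: list of (features, is_keyphrase) pairs, where features is a
    tuple of discrete values describing one candidate phrase.
    """
    class_counts = Counter()
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values = defaultdict(set)              # feature index -> values seen in training
    for features, label in examples:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(label, i)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def keyphrase_probability(model, features):
    """Return P(keyphrase | features) under the naive independence assumption."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        score = class_counts[label] / total  # class prior
        for i, value in enumerate(features):
            # Laplace smoothing keeps unseen values from zeroing the product.
            numerator = feature_counts[(label, i)][value] + 1
            denominator = class_counts[label] + len(values[i])
            score *= numerator / denominator
        scores[label] = score
    return scores[True] / (scores[True] + scores[False])


# Hypothetical training data: (term-weight bucket, position bucket) -> keyphrase?
training = [
    (("high", "early"), True),
    (("high", "early"), True),
    (("high", "late"), False),
    (("low", "early"), False),
    (("low", "late"), False),
    (("low", "late"), False),
]
model = train_naive_bayes(training)

# Rank candidate phrases of a new document by their keyphrase probability.
candidates = {"naive bayes": ("high", "early"), "this paper": ("low", "late")}
ranked = sorted(candidates, key=lambda p: keyphrase_probability(model, candidates[p]),
                reverse=True)
print(ranked)
```

Ranking candidates by probability and keeping the top few is one plausible way to turn the classifier into an extractor; the cutoff is left to the caller.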
1 Introduction
- Keyphrases give a high-level description of a document's contents that is intended to make it easy for prospective readers to decide whether or not it is relevant for them. But they have other applications too. Because keyphrases summarize documents very concisely, they can be used as a low-cost measure of similarity between documents, making it possible to cluster documents into groups by measuring overlap between the keyphrases they are assigned. A related application is topic search: upon entering a keyphrase into a search engine, all documents with this particular keyphrase attached are returned to the user. In summary, keyphrases provide a powerful means for sifting through large numbers of documents by focusing on those that are likely to be relevant.
- Unfortunately, only a small fraction of documents have keyphrases assigned to them — mostly because authors only provide keyphrases when they are explicitly instructed to do so — and manually attaching keyphrases to existing documents is a very laborious task. Therefore, ways of automating this process using artificial intelligence — more specifically, machine learning techniques — are of interest. There are two different ways of approaching the problem: keyphrase assignment and keyphrase extraction. In keyphrase assignment, also known as text categorization [Dumais et al., 1998], it is assumed that all potential keyphrases appear in a predefined controlled vocabulary — the categories. The learning problem is to find a mapping from documents to categories using a set of training documents, which can be accomplished by training a classifier for each category, using documents that belong to it as positive examples and the rest as negative ones. A new document is then processed by each of the classifiers and assigned to those categories whose classifiers identify it as a positive example. The second approach, keyphrase extraction, which we pursue in this paper, does not restrict the set of possible keyphrases to a selected vocabulary. On the contrary, any phrase in a new document can be identified — extracted — as a keyphrase. Using a set of training documents, machine learning is used to determine which properties distinguish phrases that are keyphrases from ones that are not.
- …
- The main finding of this paper is that performance can be boosted significantly if Kea is trained on documents that are from the same domain as those from which keyphrases are to be extracted.
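The introduction's two applications, keyphrase overlap as a low-cost document similarity and topic search by keyphrase, can be sketched as follows. The paper excerpt does not say which overlap measure is meant; Jaccard overlap is assumed here, and the document identifiers, keyphrases, and function names are hypothetical.

```python
from collections import defaultdict


def keyphrase_overlap(phrases_a, phrases_b):
    """Jaccard overlap between two keyphrase lists (case-insensitive)."""
    set_a = {p.lower() for p in phrases_a}
    set_b = {p.lower() for p in phrases_b}
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def build_topic_index(documents):
    """Map each keyphrase to the set of documents it is attached to."""
    index = defaultdict(set)
    for doc_id, phrases in documents.items():
        for phrase in phrases:
            index[phrase.lower()].add(doc_id)
    return index


# Hypothetical documents with their assigned keyphrases.
documents = {
    "report-1": ["machine learning", "naive Bayes"],
    "report-2": ["Naive Bayes", "text categorization"],
    "report-3": ["data compression"],
}

similarity = keyphrase_overlap(documents["report-1"], documents["report-2"])
index = build_topic_index(documents)
hits = sorted(index["naive bayes"])
print(similarity, hits)
```

A clustering step could group documents whose pairwise overlap exceeds a chosen threshold; the threshold would need tuning per collection.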