2000 UsingNounPhraseHeadsToExtrDocKeyPhr

(Barker & Cornacchia, 2000) ⇒ Ken Barker, Nadia Cornacchia. (2000). “Using Noun Phrase Heads to Extract Document Keyphrases.” In: Canadian Conference on AI (CAI 2000). doi:10.1007/3-540-45486-1.

Subject Headings: Base Noun Phrase.

Notes

Cited By

(Hulth, 2003) ⇒ Anette Hulth. (2003). “Improved Automatic Keyword Extraction Given More Linguistic Knowledge.” In: Proceedings of ACL.
- Finding potential terms — when no machine learning is involved in the process — by means of POS patterns is a common approach. For example, Barker and Cornacchia (2000) discuss an algorithm where the number of words and the frequency of a noun phrase, as well as the frequency of the head noun is used to determine what terms are keywords.

Quotes

Abstract

Automatically extracting keyphrases from documents is a task with many applications in information retrieval and natural language processing. Document retrieval can be biased towards documents containing relevant keyphrases; documents can be classified or categorized based on their keyphrases; automatic text summarization may extract sentences with high keyphrase scores.
This paper describes a simple system for choosing noun phrases from a document as keyphrases. A noun phrase is chosen based on its length, its frequency and the frequency of its head noun. Noun phrases are extracted from a text using a base noun phrase skimmer and an off-the-shelf online dictionary.
Experiments involving human judges reveal several interesting results: the simple noun phrase-based system performs roughly as well as a state-of-the-art, corpus-trained keyphrase extractor; ratings for individual keyphrases do not necessarily correlate with ratings for sets of keyphrases for a document; agreement among unbiased judges on the keyphrase rating task is poor.

3 Extracting Keyphrases

Our system for extracting keyphrases from documents proceeds in three steps: it skims a document for base noun phrases; it assigns scores to noun phrases based on frequency and length; it filters some noise from the set of top scoring keyphrases.
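The scoring step could be sketched roughly as follows. This is an illustration only, not the authors' implementation: the excerpt says just that a noun phrase is chosen based on its length, its frequency and the frequency of its head noun. The particular combination used here (keep phrases whose head is among the most frequent heads, then rank by frequency times length), the function name `score_phrases`, and the parameters `n_heads` and `k` are all assumptions. The head is taken to be the last word of a base noun phrase.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def score_phrases(
    phrases: Sequence[Sequence[str]],
    n_heads: int = 10,  # hypothetical cutoff on the number of head nouns kept
    k: int = 5,         # hypothetical number of keyphrases to return
) -> List[Tuple[str, int]]:
    """Rank noun phrases by (phrase frequency * length in words).

    Assumed scheme, not confirmed by the source: only phrases whose head
    noun (last word) is among the n_heads most frequent heads are scored.
    """
    phrases = [tuple(p) for p in phrases if p]  # drop empty phrases
    phrase_freq = Counter(phrases)
    head_freq = Counter(p[-1] for p in phrases)
    top_heads = {head for head, _ in head_freq.most_common(n_heads)}

    scored = [
        (" ".join(p), freq * len(p))
        for p, freq in phrase_freq.items()
        if p[-1] in top_heads
    ]
    # Highest score first; break ties alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


example = [
    ["natural", "language", "processing"],
    ["natural", "language", "processing"],
    ["language", "processing"],
    ["keyphrase"],
    ["keyphrase", "extraction"],
    ["system"],
]
top = score_phrases(example, n_heads=1, k=3)
```

In this toy input "processing" is the only head kept, so the longer and more frequent phrase ranks first. The noise-filtering step mentioned in the excerpt is not modelled here.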

3.1 Skimming for Base Noun Phrases

… A base noun phrase is a non-recursive structure consisting of a head noun and zero or more premodifying adjectives and/or nouns. The base noun phrase does not include noun phrase postmodifiers such as prepositional phrases or relative clauses. A base noun phrase skimmer proceeds through a text word-by-word looking for sequences of nouns and adjectives ending with a noun and surrounded by non-noun/adjectives.
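A minimal sketch of such a skimmer is given below, assuming the text has already been tagged so that each word carries a coarse part-of-speech label. The tag names `"N"` and `"ADJ"` and the function `skim_base_noun_phrases` are hypothetical; the paper's system looks words up in an online dictionary, which is not reproduced here. The sketch reads the description literally: a maximal run of nouns and adjectives is kept only if it ends with a noun.

```python
from typing import Iterable, List, Tuple

NOUN, ADJ = "N", "ADJ"  # hypothetical tag names, not taken from the paper


def skim_base_noun_phrases(tagged: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Collect maximal noun/adjective runs that end with a noun."""
    phrases: List[List[str]] = []
    run: List[Tuple[str, str]] = []  # current run of (word, tag) pairs

    def flush() -> None:
        # Keep the run only if it is non-empty and its last word is a noun.
        if run and run[-1][1] == NOUN:
            phrases.append([word for word, _ in run])
        run.clear()

    for word, tag in tagged:
        if tag in (NOUN, ADJ):
            run.append((word, tag))
        else:
            flush()  # a non-noun/adjective word closes the current run
    flush()  # close a run that reaches the end of the text
    return phrases


sentence = [
    ("the", "DET"), ("empirical", "ADJ"), ("language", "N"),
    ("processing", "N"), ("is", "V"), ("fun", "ADJ"),
    ("for", "P"), ("students", "N"),
]
found = skim_base_noun_phrases(sentence)
```

Here the run "fun" is discarded because it does not end with a noun. A variant that trims trailing adjectives instead of discarding the whole run would be an equally plausible reading.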

7 Future Considerations

… A more ambitious project would be to plug the different keyphrase extractors into a larger system. How would different keyphrases affect sentence extraction in a text summarization system, for example? It would also be interesting to adjust the keyphrase selection algorithm to allow for compound heads: theoretical natural language processing and empirical natural language processing are kinds of natural language processing, not just kinds of processing.

8 Conclusions

In this paper we have presented a simple system for extracting keyphrases automatically from documents. It requires no training and makes use of publicly available lexical resources only. Despite its lack of sophistication, it appears to perform no worse than the state-of-the-art, trained Extractor system in experiments involving human judges.
More importantly, however, experiments show that judges do not necessarily consider the quality of sets of keyphrases as a simple function of the quality of individual keyphrases. This suggests that neither experiments involving the rating of individual keyphrases only (as reported in [11]) nor experiments rating the quality of sets of keyphrases only (as proposed in [12]) are sufficient for evaluating the performance of a keyphrase extraction system.


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2000 UsingNounPhraseHeadsToExtrDocKeyPhr	Ken Barker Nadia Cornacchia			Using Noun Phrase Heads to Extract Document Keyphrases		Canadian Conference on AI	https://www.cs.utexas.edu/~kbarker/papers/canai00-keyphrase.pdf	10.1007/3-540-45486-1		2000