2000 UsingNounPhraseHeadsToExtrDocKeyPhr

Subject Headings: Base Noun Phrase.


Cited By

  • (Hulth, 2003) ⇒ Anette Hulth. (2003). “Improved Automatic Keyword Extraction Given More Linguistic Knowledge.” In: Proceedings of ACL.
    • Finding potential terms — when no machine learning is involved in the process — by means of POS patterns is a common approach. For example, Barker and Cornacchia (2000) discuss an algorithm where the number of words and the frequency of a noun phrase, as well as the frequency of the head noun is used to determine what terms are keywords.



  • Automatically extracting keyphrases from documents is a task with many applications in information retrieval and natural language processing. Document retrieval can be biased towards documents containing relevant keyphrases; documents can be classified or categorized based on their keyphrases; automatic text summarization may extract sentences with high keyphrase scores.
  • This paper describes a simple system for choosing noun phrases from a document as keyphrases. A noun phrase is chosen based on its length, its frequency and the frequency of its head noun. Noun phrases are extracted from a text using a base noun phrase skimmer and an off-the-shelf online dictionary.
  • Experiments involving human judges reveal several interesting results: the simple noun phrase-based system performs roughly as well as a state-of-the-art, corpus-trained keyphrase extractor; ratings for individual keyphrases do not necessarily correlate with ratings for sets of keyphrases for a document; agreement among unbiased judges on the keyphrase rating task is poor.

3 Extracting Keyphrases

Our system for extracting keyphrases from documents proceeds in three steps: it skims a document for base noun phrases; it assigns scores to noun phrases based on frequency and length; it filters some noise from the set of top scoring keyphrases.
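The scoring and filtering steps can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the base noun phrases have already been skimmed from the text, treats the last word of each phrase as its head, and uses a hypothetical combination rule (keep phrases whose head is among the most frequent heads, then score each by frequency times length in words). The excerpt above does not give the exact formula, and the function names, the `top_heads` and `limit` parameters, and the noise filter are all assumptions.

```python
from collections import Counter


def head_noun(phrase):
    """Assume the head of an English base noun phrase is its last word."""
    return phrase.split()[-1]


def score_noun_phrases(noun_phrases, top_heads=2):
    """Rank noun phrases by an assumed score: frequency * length in words.

    Only phrases whose head is among the `top_heads` most frequent heads
    are kept. This combination rule is illustrative, not taken from the paper.
    """
    phrases = [p.lower() for p in noun_phrases]
    phrase_freq = Counter(phrases)
    head_freq = Counter(head_noun(p) for p in phrases)
    kept_heads = {h for h, _ in head_freq.most_common(top_heads)}
    scores = {
        p: freq * len(p.split())
        for p, freq in phrase_freq.items()
        if head_noun(p) in kept_heads
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def filter_noise(ranked, limit=3):
    """Hypothetical filter: drop a single-word phrase that already occurs
    as a word of a higher-ranked phrase, then keep the top `limit`."""
    kept = []
    for phrase, score in ranked:
        is_single = len(phrase.split()) == 1
        if is_single and any(phrase in k.split() for k, _ in kept):
            continue
        kept.append((phrase, score))
    return kept[:limit]


# Toy input standing in for the output of the noun phrase skimmer.
skimmed = [
    "keyphrase extraction", "keyphrase extraction", "extraction",
    "noun phrase", "base noun phrase", "noun phrase",
    "human judges", "document",
]
ranked = score_noun_phrases(skimmed)
keyphrases = filter_noise(ranked)
print(keyphrases)
```

In this toy run, "human judges" and "document" are dropped because their heads are infrequent, and the bare word "extraction" is filtered out because it is already covered by "keyphrase extraction".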

3.1 Skimming for Base Noun Phrases

7 Future Considerations

  • … A more ambitious project would be to plug the different keyphrase extractors into a larger system. How would different keyphrases affect sentence extraction in a text summarization system, for example? It would also be interesting to adjust the keyphrase selection algorithm to allow for compound heads: theoretical natural language processing and empirical natural language processing are kinds of natural language processing, not just kinds of processing.
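One way to picture the compound-head idea is sketched below. It is purely illustrative and not something the paper proposes: it assumes a compound head can be approximated as the longest word suffix shared by at least `min_support` distinct noun phrases, falling back to the last word otherwise. The helper names and the `min_support` parameter are hypothetical.

```python
from collections import Counter


def suffixes(phrase):
    """Return all word suffixes of a phrase, longest first."""
    words = phrase.lower().split()
    return [" ".join(words[i:]) for i in range(len(words))]


def compound_heads(noun_phrases, min_support=2):
    """Map each distinct phrase to an assumed (possibly compound) head.

    The head is the longest suffix shared by at least `min_support`
    distinct phrases; otherwise it is the final word.
    """
    distinct = sorted({p.lower() for p in noun_phrases})
    support = Counter(s for p in distinct for s in suffixes(p))
    heads = {}
    for p in distinct:
        heads[p] = p.split()[-1]  # fallback: single-word head
        for s in suffixes(p):     # longest suffix is tried first
            if support[s] >= min_support:
                heads[p] = s
                break
    return heads


heads = compound_heads([
    "theoretical natural language processing",
    "empirical natural language processing",
    "image processing",
])
print(heads)
```

Under this assumption, the two "natural language processing" phrases share a three-word head, while "image processing" keeps the single-word head "processing".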

8 Conclusions

  • In this paper we have presented a simple system for extracting keyphrases automatically from documents. It requires no training and makes use of publicly available lexical resources only. Despite its lack of sophistication, it appears to perform no worse than the state-of-the-art, trained Extractor system in experiments involving human judges.
  • More importantly, however, experiments show that judges do not necessarily consider the quality of sets of keyphrases as a simple function of the quality of individual keyphrases. This suggests that neither experiments involving the rating of individual keyphrases only (as reported in [11]) nor experiments rating the quality of sets of keyphrases only (as proposed in [12]) are sufficient for evaluating the performance of a keyphrase extraction system.

2000 UsingNounPhraseHeadsToExtrDocKeyPhr

  • Author: Ken Barker, Nadia Cornacchia
  • Title: Using Noun Phrase Heads to Extract Document Keyphrases
  • Journal: Canadian Conference on AI
  • URL: https://www.cs.utexas.edu/~kbarker/papers/canai00-keyphrase.pdf
  • DOI: 10.1007/3-540-45486-1
  • Year: 2000