Keyphrase Extraction Task

Jump to navigation Jump to search

A Keyphrase Extraction Task is an IE from text task that requires the extraction of keyphrases from a text corpus.



  • (Wikipedia - Keyphrase Extraction) ⇒
    • QUOTE: The task is the following. You are given a piece of text, such as a journal article, and you must produce a list of keywords or key[phrase]s that capture the primary topics discussed in the text.[1] In the case of research articles, many authors provide manually assigned keywords, but most text lacks pre-existing keyphrases. For example, news articles rarely have keyphrases attached, but it would be useful to be able to automatically do so for a number of applications discussed below.

      Consider the example text from a news article: "The Army Corps of Engineers, rushing to meet President Bush's promise to protect New Orleans by the start of the 2006 hurricane season, installed defective flood-control pumps last year despite warnings from its own expert that the equipment would fail during a storm, according to documents obtained by The Associated Press".

      A keyphrase extractor might select "Army Corps of Engineers", "President Bush", "New Orleans", and "defective flood-control pumps" as keyphrases. These are pulled directly from the text. In contrast, an abstractive keyphrase system would somehow internalize the content and generate keyphrases that do not appear in the text, but more closely resemble what a human might produce, such as "political negligence" or "inadequate protection from floods". Abstraction requires a deep understanding of the text, which makes it difficult for a computer system.

      Keyphrases have many applications. They can enable document browsing by providing a short summary, improve information retrieval (if documents have keyphrases assigned, a user could search by keyphrase to produce more reliable hits than a full-text search), and be employed in generating index entries for a large text corpus. Depending on the different literature and the definition of key terms, words or phrases, keyword extraction is a highly related theme.


  • (Hasanaidul & Ng, 2014) ⇒ Kazi S. Hasanaidul, and Vincent Ng. (2014). “Automatic Keyphrase Extraction: A Survey of the State of the Art.” In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1262-1273.





  1. Alrehamy, Hassan H; Walker, Coral (2018). "SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation". Advances in Computational Intelligence Systems. Advances in Intelligent Systems and Computing. 650. pp. 222–235. doi:10.1007/978-3-319-66939-7_19. ISBN 978-3-319-66938-0.