Document Index Creation Task

From GM-RKB

A Document Index Creation Task is an Index Creation Task that requires the creation of a Document Index (to facilitate a Search Task for Documents in a Corpus).



References

2008

2007

  • http://www.db.dk/bh/Lifeboat_KO/CONCEPTS/indexing.htm
    • Indexing: Indexing is the representation of a document (or a part of a document or an "information object") in a record or in an index for the purpose of retrieval. Common forms of indexes appear in library catalogs, bibliographical databases and back-of-the-book indexes.
  • http://www.db.dk/bh/Lifeboat_KO/CONCEPTS/human_indexing.htm
    • Human indexing is often contrasted to automatic indexing. It is also termed "manual indexing" (cf., automation). Machine-aided indexing is an overlapping form combining human skills with computer power.
  • http://www.db.dk/bh/Lifeboat_KO/CONCEPTS/automatic_indexing.htm
    • Automatic indexing is indexing made by algorithmic procedures. The algorithm works on a database containing document representations (which may be full text representations or bibliographical records or partial text representations and in principle also value added databases). Automatic indexing may also be performed on non-text databases, e.g. images or music.
    • In text databases, the algorithm may perform string searching, but it is mostly based on searching for words in the single document representation as well as in the total database (via inverted files). The use of words is mostly based on stemming. Algorithms may count co-occurrences of words (or references), they may consider levels of proximity between words, and so on.
  • http://www.db.dk/bh/Core%20Concepts%20in%20LIS/articles%20a-z/text_categorization.htm
    • Text categorization: "Text categorization is a machine learning approach, in which also information retrieval methods are applied. It involves manually categorizing a number of documents to pre-defined categories (which normally lack devices for the control of polysemy, synonymy and homonymy). By learning the characteristics of those documents the automated categorization of new documents takes place. Text categorization is known as supervised learning, since the process is 'supervised' by learning categories' characteristics from manually categorized documents". (Golub, p. 52).
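
The automatic indexing description above (word lookup via inverted files, stemming, proximity between words) can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the functions naive_stem, build_inverted_index, search, and within_proximity, and the three-document corpus. The suffix stripper is a crude stand-in for a real stemming algorithm, not the method used by any of the cited sources.

```python
import re
from collections import defaultdict


def naive_stem(word):
    """Crude suffix stripping; an assumed stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(corpus):
    """Map each stemmed term to {doc_id: [token positions]}."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for position, token in enumerate(tokenize(text)):
            index[naive_stem(token)].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = [naive_stem(token) for token in tokenize(query)]
    if not terms:
        return set()
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


def within_proximity(index, doc_id, term_a, term_b, window):
    """True if the two terms occur within `window` tokens of each other."""
    pos_a = index.get(naive_stem(term_a.lower()), {}).get(doc_id, [])
    pos_b = index.get(naive_stem(term_b.lower()), {}).get(doc_id, [])
    return any(abs(a - b) <= window for a in pos_a for b in pos_b)


# Hypothetical toy corpus, for illustration only.
corpus = {
    "d1": "Automatic indexing of documents uses inverted files",
    "d2": "Human indexing is performed by indexers",
    "d3": "Searching a document index",
}
index = build_inverted_index(corpus)

print(sorted(search(index, "document index")))
print(within_proximity(index, "d1", "indexing", "documents", 2))
```

Storing token positions alongside document ids is one possible design choice: it lets the same structure answer both "which documents contain these words" and "how close together do the words occur", at the cost of a larger index.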

2005

2001