Document Index Creation Task

A Document Index Creation Task is an Index Creation Task that requires the creation of a Document Index (to facilitate a Search Task over the Documents in a Corpus).

AKA: Document Cataloging.
Context:
- It can be solved by a Document Index Creation System (that implements a Document Index Creation Algorithm).
- It can range from being a Manual Document Index Creation Task to being an Automated Document Index Creation Task.
- It can support a Document Classification Task.
- It can involve the collection and organization of Meta-Information.
- …
Counter-Example(s):
See: Information Retrieval, CiteSeer, Open Archives Initiative, Image Indexing Task, Term Indexing Task.

References

2008

(Dextre Clarke et al., 2008) ⇒ Stella Dextre Clarke, Alan Gilchrist, Ron Davies and Leonard Will. (2008). “Glossary of Terms Relating to Thesauri and Other Forms of Structured Vocabulary for Information Retrieval." Willpower Information
- indexing
  - intellectual analysis of the subject matter of a document to identify the concepts represented in it, and allocation of the corresponding preferred terms to allow the information to be retrieved
  - The term "subject indexing" is often used for this concept, but within a context that does not deal with other elements such as authors or dates, "indexing" is sufficient.

2007

http://www.db.dk/bh/Lifeboat_KO/CONCEPTS/indexing.htm
- Indexing: Indexing is the representation of a document (or a part of a document or an "information object") in a record or in an index for the purpose of retrieval. Common forms of indexes appear in library catalogs, bibliographical databases and back-of-the-book indexes.
http://www.db.dk/bh/Lifeboat_KO/CONCEPTS/human_indexing.htm
- Human indexing is often contrasted to automatic indexing. It is also termed "manual indexing" (cf., automation). Machine-aided indexing is an overlapping form combining human skills with computer power.
http://www.db.dk/bh/Lifeboat_KO/CONCEPTS/automatic_indexing.htm
- Automatic indexing is indexing made by algorithmic procedures. The algorithm works on a database containing document representations (which may be full text representations or bibliographical records or partial text representations and in principle also value added databases). Automatic indexing may also be performed on non-text databases, e.g. images or music.
- In text databases the algorithm may perform string searching, but it is mostly based on searching for words in the single document representation as well as in the total database (via inverted files). The use of words is mostly based on stemming. Algorithms may count co-occurrences of words (or references), they may consider levels of proximity between words, and so on.
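The word-based procedure described above (stemming plus an inverted file) can be illustrated with a minimal sketch. It is not taken from the quoted source: the function names (`naive_stem`, `build_inverted_index`, `search`), the crude suffix-stripping rule, and the three sample documents are all assumptions made for illustration, and a real system would use an established stemmer and a persistent index.

```python
import re
from collections import defaultdict


def naive_stem(word):
    """Crude suffix stripper (an illustrative assumption, not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_inverted_index(docs):
    """Map each stemmed word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            index[naive_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every stemmed query word."""
    terms = [naive_stem(t) for t in re.findall(r"[a-z]+", query.lower())]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample corpus.
docs = {
    "d1": "Indexing documents for retrieval",
    "d2": "Automatic indexes of images",
    "d3": "Retrieval of music documents",
}
index = build_inverted_index(docs)
print(search(index, "indexes"))
print(search(index, "documents retrieval"))
```

Because both the documents and the query are stemmed, the query "indexes" matches documents containing "Indexing" as well as "indexes". Co-occurrence counts and proximity, also mentioned above, would require storing word positions, which this sketch omits.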
http://www.db.dk/bh/Core%20Concepts%20in%20LIS/articles%20a-z/text_categorization.htm
- Text categorization: "Text categorization is a machine learning approach, in which also information retrieval methods are applied. It involves manually categorizing a number of documents to pre-defined categories (which normally lack devices for the control of polysemy, synonymy and homonymy). By learning the characteristics of those documents the automated categorization of new documents takes place. Text categorization is known as supervised learning, since the process is 'supervised' by learning categories' characteristics from manually categorized documents". (Golub, p. 52).
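The supervised process in the quotation above (learn from manually categorized documents, then categorize new ones) can be sketched as follows. The choice of a multinomial naive Bayes classifier with add-one smoothing is an assumption made for illustration, since the quoted text does not name a specific learner; the function names, category labels, and training sentences are likewise hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Collect per-category word counts from manually categorized documents."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the category with the highest naive Bayes log-score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids log(0).
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical manually categorized training set.
training = [
    ("thesaurus terms and controlled vocabulary", "vocabulary"),
    ("preferred terms in a thesaurus", "vocabulary"),
    ("inverted files and string searching", "retrieval"),
    ("searching words with inverted files", "retrieval"),
]
model = train(training)
print(classify(model, "inverted files for searching"))
print(classify(model, "a thesaurus of preferred terms"))
```

The pre-defined categories here are plain labels, which reflects the remark in the quotation that such categories normally lack devices for controlling polysemy, synonymy and homonymy.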

2005

(ANSI Z39.19, 2005) ⇒ ANSI. (2005). “ANSI/NISO Z39.19 - Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies." ANSI.
- QUOTE: "indexing
  - 1. A method by which terms or subject headings from a controlled vocabulary are selected by a human or computer to represent the concepts in or attributes of a content object. The terms may or may not occur in the content object.
  - 2. An operation intended to represent the results of the content analysis of a document by means of a controlled indexing language or by natural language. [ISO 5127/1]"
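The first sense above notes that the assigned terms "may or may not occur in the content object". A minimal sketch of that behaviour, assuming a tiny hypothetical controlled vocabulary that maps entry terms to preferred terms (the `ENTRY_TERMS` table and the `assign_preferred_terms` function are illustrative and do not come from the standard):

```python
import re

# Hypothetical controlled vocabulary: entry term -> preferred term.
ENTRY_TERMS = {
    "car": "automobiles",
    "cars": "automobiles",
    "automobile": "automobiles",
    "film": "motion pictures",
    "movie": "motion pictures",
    "movies": "motion pictures",
}


def assign_preferred_terms(text, entry_terms=ENTRY_TERMS):
    """Return the sorted preferred terms whose entry terms occur in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sorted({entry_terms[t] for t in tokens if t in entry_terms})


print(assign_preferred_terms("A movie about fast cars"))
```

In this example the content object is indexed under "automobiles" and "motion pictures" although neither string appears in it. A human indexer or a more capable system would also handle multi-word entry terms and ambiguity, which this lookup does not.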
(Woodley, 2005b) ⇒ Mary S. Woodley, Gail Clement, and Pete Winn. (2005). “DCMI Glossary." Dublin Core Metadata Initiative.
- indexing: The process of evaluating information entities and creating terms that aid in finding and accessing the entity. Index terms may be in natural language or controlled vocabulary or a classification notation.

2001

(Jacquemin, 2001) ⇒ Christian Jacquemin. (2001). “Spotting and Discovering Terms Through Natural Language Processing." MIT Press. ISBN:0262100851
- Automatic indexing: Automatic indexing is the association of descriptors to documents for the purpose of information retrieval.
