2005 ANSI Z39.19

From GM-RKB

Quotes

Abstract

This Standard presents guidelines and conventions for the contents, display, construction, testing, maintenance, and management of monolingual controlled vocabularies. This Standard focuses on controlled vocabularies that are used for the representation of content objects in knowledge organization systems including lists, synonym rings, taxonomies, and thesauri. This Standard should be regarded as a set of recommendations based on preferred techniques and procedures. Optional procedures are, however, sometimes described, e.g., for the display of terms in a controlled vocabulary. The primary purpose of vocabulary control is to achieve consistency in the description of content objects and to facilitate retrieval. Vocabulary control is accomplished by three principal methods: defining the scope, or meaning, of terms; using the equivalence relationship to link synonymous and nearly synonymous terms; and distinguishing among homographs.
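The three methods of vocabulary control named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the Standard: the class and method names (`Term`, `ControlledVocabulary`, `resolve`) are hypothetical, and the sample terms "quicksilver" and "Mercury (planet)" are invented for illustration alongside the glossary's own examples, Mercury (metal) and salinity/saltiness.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A preferred term carrying the three controls named in the abstract."""
    label: str            # preferred form; homographs carry a parenthetical qualifier
    scope_note: str = ""  # defines the scope, or meaning, of the term
    used_for: list = field(default_factory=list)  # synonyms and near-synonyms


class ControlledVocabulary:
    def __init__(self, terms):
        self.terms = {t.label: t for t in terms}
        # Equivalence relationship: each entry term points to its preferred term.
        self.use = {}
        for t in terms:
            for variant in t.used_for:
                self.use[variant.lower()] = t.label

    def resolve(self, text):
        """Return the preferred term(s) matching a label typed by an indexer or searcher."""
        key = text.lower()
        if key in self.use:
            return [self.use[key]]
        # Homographs share a spelling and differ only in their qualifier.
        return sorted(
            label for label in self.terms
            if label.lower() == key or label.lower().startswith(key + " (")
        )


vocab = ControlledVocabulary([
    Term("Mercury (metal)", scope_note="The chemical element Hg.", used_for=["quicksilver"]),
    Term("Mercury (planet)", scope_note="The planet nearest the Sun."),
    Term("salinity", used_for=["saltiness"]),
])

print(vocab.resolve("quicksilver"))  # ['Mercury (metal)']
print(vocab.resolve("Mercury"))      # ['Mercury (metal)', 'Mercury (planet)']
print(vocab.resolve("saltiness"))    # ['salinity']
```

An unqualified homograph returns every qualified candidate, so the indexer or searcher has to choose among them, which is the purpose of distinguishing homographs.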

1. Introduction

Optimizing the production, maintenance and extension of electronic lexical resources is one of the crucial aspects impacting human language technologies (HLT) in general and natural language processing (NLP) in particular, as well as human-oriented translation technologies. A second crucial aspect involves optimizing the process leading to their integration in applications. Lexical Markup Framework (LMF) is an abstract metamodel that provides a common, standardized framework for the construction of computational lexicons. LMF ensures the encoding of linguistic information in a way that enables reusability in different applications and for different tasks. LMF provides a common, shared representation of lexical objects, including morphological, syntactic, and semantic aspects.

The goals of LMF are to provide a common model for the creation and use of electronic lexical resources ranging from small to large in scale, to manage the exchange of data between and among these resources, and to facilitate the merging of large numbers of different individual electronic resources to form extensive global electronic resources. The ultimate goal of LMF is to create a modular structure that will facilitate true content interoperability across all aspects of electronic lexical resources.

The LMF core package describes the basic hierarchy of information of a lexical entry, including information on the form. The core package is supplemented by various resources that are part of the definition of LMF. These resources include:

  • Specific data categories used by the variety of resource types associated with LMF, both those data categories relevant to the metamodel itself, and those associated with the extensions to the core package;
  • The constraints governing the relationship of these data categories to the metamodel and to its extensions;
  • Standard procedures for expressing these categories and thus for anchoring them on the structural skeleton of LMF and relating them to the respective extension models;
  • The vocabularies used by LMF to express related informational objects for describing how to extend LMF through linkage to a variety of specific resources (extensions) and methods for analyzing and designing such linked systems.

Extensions of the core package which are documented in this standard in annexes include:

  • Machine Readable Dictionaries
  • Natural Language Processing lexical resources

LMF extensions are expressed in a framework that describes the reuse of the LMF core components (such as structures, data categories, and vocabularies) in conjunction with the additional components required for a specific resource.

Types of individual instantiations of LMF can include such electronic lexical resources as fairly simple lexical databases, NLP and machine-translation lexicons, as well as electronic monolingual, bilingual and multilingual lexical databases. LMF provides general structures and mechanisms for analyzing and designing new electronic lexical resources, but LMF does not specify the structures, data constraints, and vocabularies to be used in the design of specific electronic lexical resources. LMF also provides mechanisms for analyzing and describing existing resources using a common descriptive framework. For the purpose of both designing new lexical resources and describing existing lexical resources, LMF defines the conditions that allow the data expressed in any one lexical resource to be mapped to the LMF framework, and thus provides an intermediate format for lexical data exchange.

Glossary

...

  • acronym An abbreviation composed of the first letters of a compound term or phrase; e.g. Automatic Teller Machine = ATM, United Nations = UN.
  • associative relationship A relationship between or among terms in a controlled vocabulary that leads from one term to other terms that are related to or associated with it; begins with the words SEE ALSO or related term (RT).
  • asymmetric Lacking symmetry. In the context of controlled vocabularies, reciprocal relationships are asymmetric when the relationship indicator used between a pair of linked terms is different in one direction than it is in the reverse direction, e.g. BT / NT. See also symmetric and reciprocity.
  • authority file A set of established headings and the cross-references to be made to and from each heading, often citing the authority for the preferred form or variants. Types of authority files include name authority files and subject authority files.
  • authorization / authorizing body The process (authorization) or oversight group (authorizing body) responsible for selecting terms and establishing relationships for a controlled vocabulary.
  • blind reference
    • 1. A term in a controlled vocabulary that has not been assigned to any content objects. These may be needed in some instances as place holders in taxonomies and other structured vocabularies.
    • 2. A preferred term used in a SEE or USE reference where the term pointed to does not exist in the vocabulary.
  • bound term A term consisting of a compound term or phrase that indicates a single concept. (The phrase was originated by Mortimer Taube in his Studies in Coordinate Indexing, vol. 1, 1953, p. 43.) See also compound term.
  • broader term A term to which another term or multiple terms are subordinate in a hierarchy. In thesauri, the relationship indicator for this type of term is BT.
  • browsing The process of visually scanning through organized collections of representations of content objects, controlled vocabulary terms, hierarchies, taxonomies, thesauri, etc.
  • candidate term A term under consideration for admission into a controlled vocabulary because of its potential usefulness. Also known as provisional term.
  • category A grouping of terms that are semantically or statistically associated, but which do not constitute a strict hierarchy based on genus/species, parent/child, or part/whole relationships. See also tree structure.
  • classification scheme A method of organization according to a set of pre-established principles, usually characterized by a notation system and a hierarchical structure of relationships among the entities.
  • compound term A term consisting of more than one word or a phrase that represents a single concept. Compound terms must be constructed according to the guidelines of this Standard. See also bound term and precoordination.
  • concept A unit of thought, formed by mentally combining some or all of the characteristics of a concrete or abstract, real or imaginary object. Concepts exist in the mind as abstract entities independent of terms used to express them.
  • concept map A representation in two dimensions of the conceptual relationships among terms and the concepts they represent.
  • content object An entity that contains data/information. A content object can itself be made up of content objects. For example, a journal is a content object made up of individual journal articles, which can each be a content object. The text, figures, and photographs included in a journal article can also be separate content objects. Paintings, sculpture, maps, photographs, and other non-textual objects are also content objects. The metadata for a content object can itself be a content object.
  • controlled vocabulary A list of terms that have been enumerated explicitly. This list is controlled by and is available from a controlled vocabulary registration authority. All terms in a controlled vocabulary must have an unambiguous, non-redundant definition.
    • NOTE: This is a design goal that may not be true in practice; it depends on how strict the controlled vocabulary registration authority is regarding registration of terms into a controlled vocabulary. At a minimum, the following two rules must be enforced:
    • 1. If the same term is commonly used to mean different concepts, then its name is explicitly qualified to resolve this ambiguity. NOTE: This rule does not apply to synonym rings.
    • 2. If multiple terms are used to mean the same thing, one of the terms is identified as the preferred term in the controlled vocabulary and the other terms are listed as synonyms or aliases.
  • cross-reference A direction from one term to another. See associative relationship; equivalence relationship; hierarchical relationship.
  • descriptor See preferred term.
  • difference See modifier.
  • document Any item, printed or otherwise, that is amenable to cataloging and indexing. The term applies not only to written and printed materials in paper or microform versions (e.g., books, journals, maps, diagrams), but also to non-print media (e.g., machine-readable records, transparencies, audiotapes, videotapes) and, by extension, to three-dimensional objects or realia (e.g., museum objects and specimens). A document is a content object.
  • drop-down menu See pick list.
  • entry term The non-preferred term in a cross reference that leads to a term in a controlled vocabulary. Also known as “lead-in term.” In thesauri, the relationship indicator for this type of term is U (USE); its reciprocal is UF (USED FOR). See also preferred term.
  • entry vocabulary The set of non-preferred terms (USE references) that lead to terms in a controlled vocabulary.
  • eponym A term incorporating the name of a real or mythical person, generally the discoverer of a phenomenon or inventor of an object, e.g., Herculean labor, Parkinson’s disease, pasteurization.
  • equivalence relationship A relationship between or among terms in a controlled vocabulary that leads to one or more terms that are to be used instead of the term from which the cross-reference is made; begins with the word SEE or USE.
  • facet A grouping of concepts of the same inherent category. Examples of categories that may be used for grouping concepts into facets are: activities, disciplines, people, materials, places, etc.
  • facet indicator See node label.
  • false hit A content object retrieved whose content does not match the intent of the concepts represented by the search terms used. Previously called false drop.
  • federated searching See metasearching.
  • filing rules A set of guidelines that determine how letters and numbers, spaces, and special characters will be treated in assembling an alphabetical or other listing.
  • flat format An alphabetical display format of controlled vocabularies in which only one level of broader terms and one level of narrower terms are shown for each term.
  • focus In a compound term, the noun component that identifies the class of concepts to which the term as a whole refers. Also known as head noun. See also modifier.
  • free text Antonym of controlled vocabulary. Natural language terms appearing in content objects, which can complement controlled vocabulary terms in an information storage and retrieval system. In free text searching, controlled vocabulary terms can also be retrieved. See also keyword.
  • generic posting
    • 1. In controlled vocabularies, the treatment of narrower terms as equivalents, e.g., furniture UF beds; UF sofas. See also up-posting.
    • 2. In indexing and subject cataloging, the assignment of a broader term instead of a specific term, e.g., furniture to a content object on sofas.
  • generic structure A controlled vocabulary format that indicates all hierarchical levels of terms within an alphabetic display by means of codes, indentation, and/or punctuation marks.
  • gloss An explanation or definition of an obscure or ambiguous word in a text. See also qualifier.
  • graphics display A method of representing information that uses space and distance in addition to words.
  • head noun See focus.
  • heading A preferred name or term. Types of headings include proper name headings (which may be called identifiers), subject headings, and terms. A heading may include a qualifier.
  • hierarchical relationship A relationship between or among terms in a controlled vocabulary that depicts broader (generic) to narrower (specific) or whole-part relationships; begins with the words broader term (BT), or narrower term (NT).
  • hierarchy Broader (generic) to narrower (specific) or whole-part relationships, which are generally indicated in a controlled vocabulary through codes or indentation. See also broader term; narrower term.
  • history note A note in a term record in a controlled vocabulary that provides the date of entry of a term as well as the history of modifications to its scope, relationships, etc.
  • homograph One of two or more words that have the same spelling, but different meanings and origins. In controlled vocabularies, homographs are generally distinguished by qualifiers.
  • HTML (Hyper Text Markup Language) A markup language used to describe the layout and presentation of a document on the World Wide Web.
  • hyperlink A method of using embedded links to connect different parts of a content object to one another.
  • identifier
    • 1. A proper name (or its abbreviation or acronym) of an institution, person, place, object, or process, optionally treated as a category of heading distinct from terms. Identifiers may be held in a separate file (compare authority file), and their form may be controlled (e.g., the name of an international organization having different names in various languages, only one of which is selected).
    • 2. In some systems, a provisional term that may be upgraded to approved status, or a highly specific term that is not eligible for term status, but which is considered useful for retrieval and is assigned to one or more content objects without vocabulary control.
  • indexing
    • 1. A method by which terms or subject headings from a controlled vocabulary are selected by a human or computer to represent the concepts in or attributes of a content object. The terms may or may not occur in the content object.
    • 2. An operation intended to represent the results of the content analysis of a document by means of a controlled indexing language or by natural language. [ISO 5127/1]
  • indexing language A controlled vocabulary or classification system and the rules for its application. An indexing language is used for the representation of concepts dealt with in documents [content objects] and for the retrieval of such documents [content objects] from an information storage and retrieval system. [ISO 5127/1]
  • indexing term The representation of a concept in an indexing language, generally in the form of a noun or noun phrase. Terms, subject headings, and heading-subheading combinations are examples of indexing terms.
  • information storage and retrieval system A set of operations and the associated equipment, software, and documentation by which content objects are indexed and the data are stored, so that selected content objects can be retrieved in response to requests employing commands that can be handled by the system.
  • initialism A set of initials by which something is known in preference to the full form of its name. Example: IBM, ICBM. See also acronym.
  • interoperability The ability of two or more systems or components to exchange information and use the exchanged information without special effort on the part of either system.
  • keyword A word occurring in the natural language of a document that is considered significant for indexing and retrieval. See also free text.
  • KWIC (Key Word In Context) index A type of index, arranged alphabetically, in which each significant word in a string of text serves as an access point, by being graphically emphasized and surrounded by the rest of the string. The keyword is generally in a centered column and is followed on the right by the continuation of the string, which provides the context. The balance of the string, if any, is positioned to the left of the keyword.
  • KWOC (Key Word Out Of Context) index A type of index, arranged alphabetically, in which each significant word in a string of text serves as an access point, usually positioned in the left-hand column of a page, followed by the complete string. The keyword may therefore not be in the immediate context of the words that surround it.
  • lexeme A fundamental unit of the vocabulary of a language.
  • lexical database/lexical resource A database containing terms as well as information about the terms such as part of speech, type of term, etc.
  • lexicographer A person who is knowledgeable about terms, their uses, parts of speech, etc. Lexicographers often construct controlled vocabularies.
  • literary warrant Justification for the representation of a concept in an indexing language or for the selection of a preferred term because of its frequent occurrence in the literature. See also organizational warrant and user warrant.
  • mapping A set of correspondences between categories, schema element names, or controlled terms. Mappings are used for transforming data or queries from one vocabulary for use with another.
  • metasearching The simultaneous searching across multiple databases, sources, platforms, and protocols. Also known as broadcast searching, cross-database searching, federated searching, or parallel searching.
  • microcontrolled vocabulary A subset of a controlled vocabulary, covering a limited range of topics within the domain of the controlled vocabulary. A microcontrolled vocabulary may contain highly specialized terms that are not in the broad controlled vocabulary. Such terms should map to the hierarchical structure of the broad controlled vocabulary. A microcontrolled vocabulary is internally consistent with respect to relationships among terms.
  • modifier In a compound term, one or more components that serve to narrow the extension of a focus and specify one of its subclasses. Also known as difference.
  • multilevel hierarchy A set of hierarchical relationships among terms that has multiple levels of specificity extending from the most broadly defined terms to the most specific.
  • narrower term A term that is subordinate to another term or to multiple terms in a hierarchy. In thesauri, the relationship indicator for this type of term is NT.
  • natural language A language used by human beings for verbal communication. Words extracted from natural language texts for indexing purposes without vocabulary control are often called keywords.
  • navigation The process of moving through a controlled vocabulary or an information space via some pre-established links or relationships. For example, navigation in a controlled vocabulary could mean moving from a broader term to one or more narrower terms using the predefined relationships.
  • near-synonym A term whose meaning is not exactly synonymous with that of another term, yet which may nevertheless be treated as its equivalent in a controlled vocabulary. Example: salinity, saltiness.
  • node label A “dummy” term, often a phrase, that is not assigned to documents when indexing, but which is inserted into the hierarchical section of some controlled vocabularies to indicate the logical basis on which a class has been divided. Node labels may also be used to group categories of related terms in the alphabetic section of a controlled vocabulary.
  • non-preferred term See entry term. See also preferred term.
  • OPAC (Online Public Access Catalog) A library or other catalog of content objects that is accessible online. The catalog may or may not be accessible to the public, but it is still called an OPAC.
  • organizational warrant Justification for the representation of a concept in an indexing language or for the selection of a preferred term due to characteristics and context of the organization. See also literary warrant and user warrant.
  • orphan term A term that has no associative or hierarchical relationship to any other term in a controlled vocabulary.
  • orthography The art of writing words with the proper letters according to standard usage.
  • PDF (portable document format) A file format developed by Adobe Systems that provides hardware- and software-independent viewing of a formatted document.
  • permuted display A type of index where individual words of a term are rotated to bring each word of the term into alphabetical order in the term list. See also KWIC and KWOC.
  • pick list A graphical user interface device that allows the user to select from a pre-set list of terms. Typically the list of terms is shown when the user clicks on a down arrow next to the entry box for the term.
  • polyhierarchy A controlled vocabulary structure in which some terms belong to more than one hierarchy. For example, rose might be a narrower term under both flowers and perennials in a horticulture vocabulary.
  • polyseme A word with multiple meanings. In spoken language, polysemes are called homonyms; in written language they are called homographs. Only the latter are relevant to controlled vocabularies designed for textual information.
  • postcoordination The combining of terms at the searching stage rather than at the subject heading list construction stage or indexing stage. See also precoordination.
  • postings The number of content objects to which a term is assigned.
  • precision A measure of a search system's ability to retrieve only relevant content objects. Usually expressed as a percentage calculated by dividing the number of retrieved relevant content objects by the total number of content objects retrieved.
    • A high-precision search ensures that, for the most part, the content objects retrieved will be relevant. However, a high-precision search may not retrieve all relevant content objects. See also recall. Recall and precision tend to be inverse ratios. When one goes up, the other usually goes down.
  • precoordination The formulation of a multiword heading or the linking of a heading and subheadings to create a formally controlled, multi-element expression of a concept or object. Precoordination is often used to ensure logical sorting of related expressions. Examples of precoordinated headings:
    • New England — Genealogy — Handbooks, Manuals, etc.
    • Searching, Bibliographic
    • United States — History — Civil War, 1861-1865
    • See also postcoordination.
  • preferred term One of two or more synonyms or lexical variants selected as a term for inclusion in a controlled vocabulary. See also non-preferred term.
  • provisional term See candidate term.
  • qualifier A defining term, used in a controlled vocabulary to distinguish homographs. A qualifier is considered part of a term, subject heading, or entry term, but is separated from it by punctuation. The qualifier is generally enclosed in parentheses. Example: Mercury (metal). See also gloss.
  • quasi-synonym See near-synonym.
  • recall A measure of a search system's ability to retrieve all relevant content objects. Usually expressed as a percentage calculated by dividing the number of retrieved relevant content objects by the number of all relevant content objects in a collection.
    • A high recall search retrieves a comprehensive set of relevant content objects from the collection. However, high recall increases the possibility that less relevant content objects will also be retrieved. See also precision. Recall and precision tend to be inverse ratios. When one goes up, the other usually goes down.
  • reciprocity Semantic relationships in controlled vocabularies must be reciprocal, that is each relationship from one term to another must also be represented by a reciprocal relationship in the other direction. Reciprocal relationships may be symmetric, e.g. RT / RT, or asymmetric e.g. BT / NT. See also asymmetric and symmetric.
  • related term A term that is associatively but not hierarchically linked to another term in a controlled vocabulary. In thesauri, the relationship indicator for this type of term is RT.
  • relationship indicator A word, phrase, abbreviation, or symbol used in thesauri to identify a semantic relationship between terms. Examples of relationship indicators are UF (USED FOR), and RT (related term).
  • romanization The conversion of a non-roman script by means of transcription or transliteration or a combination of the two methods.
  • rotated listing See permuted display.
  • running head A page heading indicating the first and last entries that appear on that page. The heading changes on each page to reflect the changed content.
  • scope note A note following a term explaining its coverage, specialized usage, or rules for assigning it.
  • semantic linking A method of linking terms according to their meaning or meanings.
  • semantic web A representation in two (or possibly three) dimensions of the semantic relationships between and among terms and the concepts they represent.
  • sibling A term that shares the same broader term (one level higher) as other terms.
  • stop list A list of words considered to be of no value for retrieval. It consists primarily of function words — articles, conjunctions, and prepositions — but may also include words that occur very frequently in the literature of a domain.
  • subheading A term appended to a heading in order to modify or delimit the heading by indicating a particular aspect or relationship pertaining to it. A term with a subheading may be subject to further modification. See also precoordination.
  • subject heading A word or phrase, or any combination of words, phrases, and modifiers used to describe the topic of a content object. Precoordination of terms for multiple and related concepts is a characteristic of subject headings that distinguishes them from controlled vocabulary terms. See also precoordinated term and precoordination.
  • subject heading list An alphabetical list of subject headings with cross-references from non-preferred terms and links to related terms. These lists often include separate sequences of standardized subheadings that may be combined with all or only some subject headings. Rules for applying subheadings usually accompany such lists.
  • symmetric Having symmetry. In the context of controlled vocabularies reciprocal relationships are symmetric when the relationship indicator used between a pair of linked terms is the same in one direction as it is in the reverse direction, e.g. RT / RT. See also asymmetric and reciprocity.
  • synonym A word or term having exactly or very nearly the same meaning as another word or term.
  • synonym ring A group of terms that are considered equivalent for the purposes of retrieval.
  • systematic display See tree structure.
  • taxonomy A collection of controlled vocabulary terms organized into a hierarchical structure. Each term in a taxonomy is in one or more parent/child (broader/narrower) relationships to other terms in the taxonomy.
  • term One or more words designating a concept. See also compound term, entry term, and precoordinated term.
  • term record A collection of information associated with a term in a controlled vocabulary, including the history of the term, its relationships to other terms, and, optionally, authorities for the term.
  • thesaurus (plural: thesauruses, thesauri)
    • A controlled vocabulary arranged in a known order and structured so that the various relationships among terms are displayed clearly and identified by standardized relationship indicators. Relationship indicators should be employed reciprocally.
    • Its purpose is to promote consistency in the indexing of content objects, especially for postcoordinated information storage and retrieval systems, and to facilitate browsing and searching by linking entry terms with terms. Thesauri may also facilitate the retrieval of content objects in free text searching.
    • NOTES: The term “Thesaurus” is the Latin form of the Greek word thesauros, originally meaning “treasure store.” In the 16th century, it began to be used as a synonym for “dictionary” (a treasure store of words), but later it fell into disuse. Peter Mark Roget resurrected the term in 1852 for the title of his dictionary of synonyms. The purpose of that work is to give the user a choice among similar terms when the one first thought of does not quite seem to fit. A hundred years later, in the early 1950s, the word “thesaurus” began to be employed again as the name for a word list, but one with the exactly opposite aim: to prescribe the use of only one term for a concept that may have synonyms. A similarity between Roget’s Controlled Thesaurus and thesauri for indexing and information retrieval is that both list terms that are related hierarchically or associatively to terms, in addition to synonyms.
  • top term The broadest term in a controlled vocabulary hierarchy, sometimes indicated by the abbreviation TT.
  • transcription The process of recording the phonological and/or morphological elements of a language in terms of a specific writing system.
  • transliteration The process of recording the graphic symbols of one writing system in terms of the corresponding graphic symbols of another writing system.
  • tree structure A controlled vocabulary display format in which the complete hierarchy of terms is shown. Each term is assigned a tree number or line number which leads from the alphabetical display to the hierarchical one. The hierarchical display is also known as a systematic display.
  • typography The style, arrangement, appearance, or typeface used to represent information.
  • up-posting The automatic assignment of broader terms in addition to the specific term by which a document is indexed. Also known as autoposting. See also generic posting.
  • user interface The way in which a user interacts with a computer-based system.
  • user warrant Justification for the representation of a concept in an indexing language or for the selection of a preferred term because of frequent requests for information on the concept or free-text searches on the term by users of an information storage and retrieval system. See also literary warrant and organizational warrant.
  • vocabulary control The process of organizing a list of terms (a) to indicate which of two or more synonymous terms is authorized for use; (b) to distinguish between homographs; and (c) to indicate hierarchical and associative relationships among terms in the context of a controlled vocabulary or subject heading list. See also controlled vocabulary.
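The precision and recall entries above each define a percentage as a simple ratio over sets of content objects. The following Python sketch computes both as worded in the glossary; the function names, the document identifiers, and the choice to return 0.0 when a denominator is empty are assumptions made for illustration.

```python
def precision(retrieved, relevant):
    """Retrieved relevant objects divided by all retrieved objects, as a percentage."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0  # assumption of this sketch: the glossary does not cover an empty result set
    return 100.0 * len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Retrieved relevant objects divided by all relevant objects in the collection, as a percentage."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0  # assumption of this sketch: nothing relevant exists in the collection
    return 100.0 * len(retrieved & relevant) / len(relevant)


# Hypothetical collection in which four content objects are relevant to the request.
relevant = {"doc1", "doc2", "doc3", "doc4"}
narrow_search = {"doc1", "doc2"}
broad_search = {"doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"}

print(precision(narrow_search, relevant), recall(narrow_search, relevant))  # 100.0 50.0
print(precision(broad_search, relevant), recall(broad_search, relevant))    # 50.0 100.0
```

The two sample searches show the trade-off the glossary describes: the narrow search retrieves only relevant objects but misses half of them, while the broad search finds all of them at the cost of retrieving as many irrelevant ones.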

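The reciprocity entry in the glossary requires every relationship to be matched by one in the reverse direction, symmetric for RT / RT and asymmetric for BT / NT and USE / UF. A vocabulary maintainer might test for gaps along the following lines. This is a sketch only: the triple layout, the `RECIPROCAL` table, and the function name are hypothetical, and the sample relationships mix the glossary's examples (furniture, beds, sofas, salinity, saltiness) with an invented boats/ships pair.

```python
# Reciprocal indicator for each relationship indicator named in the glossary.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


def missing_reciprocals(relations):
    """Return the (source, indicator, target) triples needed to make `relations` reciprocal."""
    present = set(relations)
    missing = set()
    for source, indicator, target in present:
        inverse = (target, RECIPROCAL[indicator], source)
        if inverse not in present:
            missing.add(inverse)
    return sorted(missing)


relations = [
    ("sofas", "BT", "furniture"),
    ("furniture", "NT", "sofas"),     # reciprocal pair is complete
    ("beds", "BT", "furniture"),      # furniture NT beds is absent
    ("salinity", "UF", "saltiness"),  # saltiness USE salinity is absent
    ("boats", "RT", "ships"),         # ships RT boats is absent
]

for triple in missing_reciprocals(relations):
    print(triple)
# ('furniture', 'NT', 'beds')
# ('saltiness', 'USE', 'salinity')
# ('ships', 'RT', 'boats')
```

Storing relationships as plain triples keeps the check independent of any particular thesaurus software; the same table could drive automatic insertion of the missing reciprocals instead of merely reporting them.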

References


American National Standards Institute (2005). ANSI/NISO Z39.19 - Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies. http://www.niso.org/kst/reports/standards?step=2&gid=None&project key=7cc9b583cb5a62e8c15d3099e0bb46bbae9cf38a