2009 AReviewofAuditingMethodsApplied


Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Terminology; Auditing method; Quality factor; Knowledge source; Manual; Automated; Systematic; Heuristic

Abstract

Although controlled biomedical terminologies have been with us for centuries, it is only in the last couple of decades that close attention has been paid to the quality of these terminologies. The result of this attention has been the development of auditing methods that apply formal methods to assessing whether terminologies are complete and accurate. We have performed an extensive literature review to identify published descriptions of these methods and have created a framework for characterizing them. The framework considers manual, systematic and heuristic methods that use knowledge (within or external to the terminology) to measure quality factors of different aspects of the terminology content (terms, semantic classification, and semantic relationships). The quality factors examined included concept orientation, consistency, non-redundancy, soundness and comprehensive coverage. We reviewed 130 studies that were retrieved based on keyword search on publications in PubMed, and present our assessment of how they fit into our framework. We also identify which terminologies have been audited with the methods and provide examples to illustrate each part of the framework.

1. Introduction

The quality of a controlled terminology can be characterized from any of several different perspectives. The design of a terminology can, from the outset, determine much about the future capabilities of the terminology. Many aspects of terminology design have been identified and characterized as desirable or undesirable [1] and [2]. Standards development organizations have paid much attention to creating guidelines for quality control in terminology development. For example, the ISO/TC215 WG3 (Health Informatics – Semantic Content) has been working on such guidelines, and the latest American National Standards Institute guidelines for designing controlled terminologies (ANSI/NISO Z39.19-2005) serve as a comprehensive reference [3]. In some cases, there is a lack of consensus about the desirability of particular design features (for example, some desire multiple hierarchy [1] and [2] while others feel it should be avoided [4]).

The structure of a terminology can be studied to determine whether it supports or contradicts the stated design principles of the terminology. For example, Logical Observation Identifiers Names and Codes (LOINC) is designed to have meaningless identifiers; its use of sequential integers with check digits satisfies this requirement [5]. Similarly, the relationships in the Unified Medical Language System (UMLS) are designed to be reciprocal; the MRREL file provides a mechanism for delivering this information, as described in [6].
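A structural audit of the reciprocity requirement can be sketched in a few lines. The example below is a minimal illustration, not the method of any reviewed study: the `Rel` tuple, the `INVERSE` table, and the concept identifiers are hypothetical stand-ins for relationship rows, and the inverse pairs shown are only a small, assumed subset of relationship labels.

```python
from typing import Iterable, List, NamedTuple


class Rel(NamedTuple):
    """One directed relationship row (hypothetical, simplified layout)."""
    source: str
    rel: str
    target: str


# Assumed inverse pairs for illustration only; a real audit would load
# the full inverse table published with the terminology.
INVERSE = {"PAR": "CHD", "CHD": "PAR", "RB": "RN", "RN": "RB", "SIB": "SIB"}


def missing_reciprocals(rels: Iterable[Rel]) -> List[Rel]:
    """Return the inverse rows that should exist but are absent."""
    present = set(rels)
    missing = []
    for r in sorted(present):
        inverse_label = INVERSE.get(r.rel)
        if inverse_label is None:
            continue  # no known inverse for this label; skip it
        expected = Rel(r.target, inverse_label, r.source)
        if expected not in present:
            missing.append(expected)
    return missing


rows = [
    Rel("C1", "PAR", "C2"),
    Rel("C2", "CHD", "C1"),
    Rel("C3", "RB", "C4"),  # its reciprocal RN row is absent
]
print(missing_reciprocals(rows))
```

An empty result means every audited relationship has its reciprocal, so the structure supports the stated design principle; any rows returned are candidates for correction.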

Finally, the content of a terminology can be assessed to determine if it is comprehensive and accurate from lexical and semantic (as opposed to structural) standpoints. For example, the list of all laboratory tests contained in LOINC can be evaluated to identify whether it in fact contains all the terms used by hospital laboratories.

To illustrate these distinctions, consider the assessment of a terminology with respect to multiple hierarchies. A terminology can be designed to include multiple hierarchies, but can be found to have a structural characteristic that interferes with true multiple hierarchies, such as the tree addresses used in the Medical Subject Headings (MeSH), as described in [7]. Even when the terminology has a high-quality structure to support multiple hierarchies, its content might be deficient if a term that should have two parents is found to have only one.

A great deal of thoughtful planning is generally applied to terminology design, construction and maintenance. Design decisions (however controversial) are made with care, while the structural integrity can generally be guaranteed through good programming and database design. The quality of a terminology’s content, on the other hand, is often not immediately obvious. However well-intentioned, authoritative, and cautious a terminology builder may be, there is always the chance for errors of omission or commission.

At the very least, good quality assurance practices dictate that assessment for errors should be a standard part of terminology management [8]. However, these practices, collectively referred to as auditing, can be challenging. Manual, expert review of a large terminology may provide little confidence that all errors have been detected. For example, any manual attempt to identify redundant terms in a large (>100,000 term) terminology will likely require memory that goes beyond human capacity.
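To make the redundancy example concrete, the following is a minimal sketch of an automated lexical check, assuming that terms differing only in case, punctuation, or word order are candidate duplicates. The function names, term codes, and sample terms are hypothetical, and the normalization is deliberately crude; it is one possible heuristic, not a technique attributed to any particular study.

```python
import re
from collections import defaultdict
from typing import Dict, List


def normalize(term: str) -> str:
    """Lowercase, drop punctuation, and sort the words of a term."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(tokens))


def candidate_duplicates(terms: Dict[str, str]) -> List[List[str]]:
    """Group term codes whose names normalize to the same key."""
    groups = defaultdict(set)
    for code, name in terms.items():
        groups[normalize(name)].add(code)
    # Only groups with more than one code are potential redundancies.
    return sorted(sorted(codes) for codes in groups.values() if len(codes) > 1)


sample = {
    "T1": "Myocardial infarction, acute",
    "T2": "Acute myocardial infarction",
    "T3": "Heart failure",
}
print(candidate_duplicates(sample))
```

Because ignoring word order can conflate terms with different meanings, the output is best treated as a list of candidates for expert review rather than as confirmed redundancies.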

To address this problem, informatics researchers and terminology developers have devised a number of methods to audit terminologies in systematic ways. Their methods often use knowledge in the terminology itself to perform the assessment and use computers to support – and in some cases entirely automate – the assessment. This paper reviews the major efforts in this area and organizes them into a framework that considers the aspects of terminology content that are audited, the methods used in the audits, and the terminology content that is employed to actually support the auditing process.

2. Quality factors to be audited

We first identify quality factors by which terminology content can be assessed. We consider intrinsic quality factors that are inherent to terminology content and that can be audited independently from external reference standards. Intrinsic factors include concept orientation, consistency, soundness, and non-redundancy. We also consider extrinsic quality factors that are contingent on comprehensive coverage of external user requirements, domain-specific contextual needs, or other external reference standards. Both types of quality factors can be further applied to the content and knowledge structure of a terminology. We describe these factors below and summarize them in Table 1.

6. Conclusions

The last 20 years have witnessed a proliferation of terminology auditing methods that employ a variety of creative techniques and exploit a variety of terminological knowledge to better evaluate and improve the terminologies that are emerging today as important components of biologic, clinical and public health systems. Much of the work has gone beyond the experimental stage to become key components of standards development and information system maintenance.

References

Xinxin Zhu, Jung-Wei Fan, David M Baorto, Chunhua Weng, and James J Cimino. (2009). "A Review of Auditing Methods Applied to the Content of Controlled Biomedical Terminologies."