Curated Database

A Curated Database is a quality-controlled, maintained, annotated dataset that contains canonical records, which undergo active curation by domain experts to ensure data accuracy and information reliability.

AKA: Curated Dataset, Curated Record Set, Curated Artifact, Curated Data, Maintained Database, Expert-Curated Database, Quality-Controlled Dataset.
Context:
- It can typically undergo Continuous Curation Processes through expert review, data validation, and quality assessment.
- It can typically maintain Canonical Record Sets of Curated Data Records with authoritative annotations.
- It can typically enforce Data Quality Standards through curation guidelines and validation protocols.
- It can typically provide Reliable Reference Data for scientific research, clinical decision-making, or knowledge discovery.
- It can typically incorporate Expert Knowledge through domain specialist curation and peer review processes.
- ...
- It can often require Manual Curation by Curators with domain expertise and specialized training.
- It can often employ Curation Systems and Curation Tools for data management and quality control.
- It can often integrate Multiple Data Sources through data harmonization and standardization processes.
- It can often undergo Version Control with release cycles and change tracking.
- It can often support Data Provenance Tracking through curation history and annotation lineage.
- ...
- It can range from being a Small Curated Database to being a Large-Scale Curated Database, depending on its record volume.
- It can range from being a Single-Curator Database to being a Community-Curated Database, depending on its curation model.
- It can range from being a Manually Curated Database to being a Semi-Automatically Curated Database, depending on its curation automation level.
- It can range from being a Domain-Specific Curated Database to being a Cross-Domain Curated Database, depending on its subject scope.
- It can range from being a Static Curated Database to being a Dynamically Updated Database, depending on its update frequency.
- It can range from being a Lightly Curated Database to being a Deeply Curated Database, depending on its curation depth.
- ...
- It can be produced by Curation Tasks following curation protocols and quality guidelines.
- It can be managed by Database Curators using curation workflows and validation procedures.
- It can be maintained through Quality Assurance Processes including error detection and consistency checking.
- It can be enhanced via Literature Curation extracting knowledge from scientific publications.
- It can be distributed through Database Portals with access control and usage licenses.
- ...
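The version control and provenance tracking properties listed above can be illustrated with a minimal sketch. All names here (`CurationEvent`, `CuratedRecord`, `annotate`) are hypothetical and do not correspond to the schema of any particular curated database; the rule that every annotation must cite a source is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class CurationEvent:
    """One curator judgment, kept as provenance (hypothetical structure)."""
    curator: str   # who made the judgment
    action: str    # e.g. "annotate", "correct", "merge"
    source: str    # cited evidence, e.g. a publication identifier
    on: date       # when the judgment was recorded


@dataclass
class CuratedRecord:
    """A canonical record with annotations, curation history, and a version."""
    record_id: str
    annotations: dict[str, str] = field(default_factory=dict)
    history: list[CurationEvent] = field(default_factory=list)
    version: int = 1

    def annotate(self, key: str, value: str, event: CurationEvent) -> None:
        # Assumed validation rule: an annotation without a source is rejected.
        if not event.source:
            raise ValueError("a curated annotation needs a cited source")
        self.annotations[key] = value
        self.history.append(event)  # provenance: who, what, why, when
        self.version += 1           # change tracking: each edit bumps the version


# Illustrative usage with placeholder values.
record = CuratedRecord("REC-0001")
record.annotate(
    "function",
    "placeholder functional annotation",
    CurationEvent("curator_a", "annotate", "placeholder-source-1", date(2009, 1, 1)),
)
```

Keeping the history as an append-only list is one simple way to make every annotation traceable to a curator and a source; production systems typically store this in a database with release snapshots instead.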
Example(s):
- Biological Curated Databases, such as:
  - Protein Sequence Curated Databases, such as:
    - Swiss-Prot Database with manually annotated protein sequences and functional annotations.
    - RefSeq Database with curated reference sequences for genes, transcripts, and proteins.
    - UniProtKB/Swiss-Prot with expert-reviewed protein entries.
  - Gene Function Curated Databases, such as:
  - Protein Interaction Curated Databases, such as:
- Medical Curated Databases, such as:
  - Disease Curated Databases, such as:
  - Drug Curated Databases, such as:
    - DrugBank Database with curated drug information and drug target data.
    - ChEMBL Database with curated bioactive molecules and pharmacological data.
    - PharmGKB Database with curated pharmacogenomic knowledge.
- Canonical Entity Databases, such as:
  - Wikidata with community-curated structured data about entities.
  - DBpedia with curated structured information from Wikipedia.
  - Freebase (historical) with curated entity relationships.
- Literature Curated Databases, such as:
- Chemical Curated Databases, such as:
  - PubChem Database with curated chemical compound information.
  - ChemSpider Database with curated chemical structures and property data.
  - Cambridge Structural Database with curated crystal structures.
- Genomic Curated Databases, such as:
  - Ensembl Database with curated genome annotations.
  - UCSC Genome Browser with curated genomic data tracks.
  - FlyBase with curated Drosophila genetic data.
- Archaeological Curated Databases, such as:
  - Open Context with curated archaeological research data.
  - tDAR (Digital Archaeological Record) with curated archaeological datasets.
- Linguistic Curated Databases, such as:
  - WordNet with curated lexical semantic networks.
  - FrameNet with curated semantic frames and lexical units.
  - Universal Dependencies with curated syntactic annotations.
- Model Organism Curated Databases, such as:
  - WormBase for C. elegans curated data.
  - TAIR for Arabidopsis thaliana curated data.
  - SGD (Saccharomyces Genome Database) for yeast curated data.
- ...
Counter-Example(s):
- Raw Data Repositories, which lack active curation and quality control.
- Automatically Generated Databases, which rely on algorithmic extraction without human oversight.
- Crowd-Sourced Databases, which lack expert validation or quality assurance.
- Static Archives, which preserve historical data without ongoing curation.
- Web Scraping Databases, which aggregate unverified information from web sources.
See: Curation Task, Database Curator, Curation System, Data Quality Control, Annotated Dataset, Canonical Record, Expert Annotation, Literature Curation, Data Validation, Quality Assurance, Version Control System, Data Provenance, Peer Review Process, Swiss-Prot Database, Gene Ontology, UniProt, RefSeq, OMIM, ClinVar, DrugBank, PubMed, Wikidata, Knowledge Base, Reference Database, Gold Standard Dataset, Authoritative Source, Data Harmonization, Metadata Standard, Controlled Vocabulary, Ontology, Data Integration.

References

2009

(Cusick et al., 2009) ⇒ Michael E Cusick, Haiyuan Yu, Alex Smolyar, Kavitha Venkatesan, Anne-Ruxandra Carvunis, Nicolas Simonis, Jean-François Rual, Heather Borick, Pascal Braun, Matija Dreze, Jean Vandenhaute, Mary Galli, Junshi Yazaki, David E Hill, Joseph R Ecker, Frederick P Roth, and Marc Vidal. (2009). “Literature-Curated Protein Interaction Datasets.” In: Nature Methods, 6, 39-46.
- Why is reliability of literature curation so low? Our findings of large error rates in curated protein interaction databases, at least for yeast and human, are consistent with recent hints that the quality of literature-curated datasets may not be as high as widely perceived [23,29,43–45]. Perhaps occasionally curator error is responsible. However, we suggest that the errors are due not so much to curators but to the simple reality that extracting accurate information from a long free-text document can be extremely difficult. Gene name confusion is particularly thorny [30,46]. An example from our curated yeast sample illustrates the difficulties. A purification with a tandem affinity purification tag with Vps71/Swc6 (slash separates synonymous approved names) as bait [47] pulls down a protein named Swc3, but double-checking this finds that the corresponding open reading frame is actually SWC3 (locus name YAL011w), and not the ALR1/SWC3 (locus name YOL130w) open reading frame curated in the database. A shared synonym thoroughly muddled the curation.
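
The shared-synonym problem in the quoted example can be sketched as a consistency check that refuses to resolve a gene name mapping to more than one locus. This is an illustrative sketch only, not the method used by Cusick et al. or by any curated database: the name table contains just the two loci mentioned in the quote, and `build_name_index` and `resolve` are hypothetical helpers.

```python
from collections import defaultdict

# Locus name -> approved names/synonyms, taken from the quoted example only.
LOCUS_NAMES = {
    "YAL011w": ["SWC3"],
    "YOL130w": ["ALR1", "SWC3"],
}


def build_name_index(locus_names):
    """Invert the table: each (case-folded) gene name -> set of loci using it."""
    index = defaultdict(set)
    for locus, names in locus_names.items():
        for name in names:
            index[name.upper()].add(locus)
    return index


def resolve(name, index):
    """Return the single locus for a name, or raise if unknown or ambiguous."""
    loci = index.get(name.upper(), set())
    if len(loci) != 1:
        # Zero or several candidate loci: leave the decision to a curator.
        raise LookupError(f"{name!r} maps to {sorted(loci)}; needs curator review")
    return next(iter(loci))


index = build_name_index(LOCUS_NAMES)
unambiguous = resolve("ALR1", index)  # only YOL130w carries this name
try:
    resolve("Swc3", index)            # shared by both loci in the table
except LookupError as err:
    flagged = str(err)
```

Flagging ambiguous names for review, rather than picking one locus automatically, keeps the final judgment with the curator.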

2003

http://www.inproteo.com/nwglosbc.html
- Curated database: Annotated database created under the supervision of a curator, who makes judgments as data are cleaned up and merged.
