- AKA: Curated Dataset, Curated Record Set, Curated Artifact, Curated Data.
- See: Annotated Artifact.
- (Cusick et al., 2009) ⇒ Michael E Cusick, Haiyuan Yu, Alex Smolyar, Kavitha Venkatesan, Anne-Ruxandra Carvunis, Nicolas Simonis, Jean-François Rual, Heather Borick, Pascal Braun, Matija Dreze, Jean Vandenhaute, Mary Galli, Junshi Yazaki, David E Hill, Joseph R Ecker, Frederick P Roth, and Marc Vidal. (2009). “Literature-Curated Protein Interaction Datasets.” In: Nature Methods 6, 39–46.
- Why is reliability of literature curation so low? Our findings of large error rates in curated protein interaction databases, at least for yeast and human, are consistent with recent hints that the quality of literature-curated datasets may not be as high as widely perceived [23,29,43–45]. Perhaps occasionally curator error is responsible. However, we suggest that the errors are due not so much to curators but to the simple reality that extracting accurate information from a long free-text document can be extremely difficult. Gene name confusion is particularly thorny [30,46]. An example from our curated yeast sample illustrates the difficulties. A purification with a tandem affinity purification tag with Vps71/Swc6 (slash separates synonymous approved names) as bait [47] pulls down a protein named Swc3, but double-checking this finds that the corresponding open reading frame is actually SWC3 (locus name YAL011w), and not the ALR1/SWC3 (locus name YOL130w) open reading frame curated in the database. A shared synonym thoroughly muddled the curation.
- Curated database: Annotated database created under the supervision of a curator, who makes judgments as data are cleaned up and merged.
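
The shared-synonym problem described in the Cusick et al. excerpt can be illustrated with a small lookup that refuses to resolve a gene name when it maps to more than one locus. This is only a sketch: the `LOCI` table, `build_synonym_index`, and `resolve` are hypothetical names introduced here, and the table contains just the two loci named in the excerpt, not a real nomenclature database.

```python
from collections import defaultdict

# Illustrative records: (locus name, names used for that locus).
# Assumption: only the two loci mentioned in the excerpt are listed.
LOCI = [
    ("YAL011w", ["SWC3"]),
    ("YOL130w", ["ALR1", "SWC3"]),
]

def build_synonym_index(loci):
    """Map each upper-cased gene name to the set of loci that use it."""
    index = defaultdict(set)
    for locus, names in loci:
        for name in names:
            index[name.upper()].add(locus)
    return index

def resolve(name, index):
    """Return (locus, None) if the name is unambiguous,
    otherwise (None, sorted list of candidate loci)."""
    candidates = sorted(index.get(name.upper(), ()))
    if len(candidates) == 1:
        return candidates[0], None
    return None, candidates

index = build_synonym_index(LOCI)
print(resolve("Swc3", index))  # (None, ['YAL011w', 'YOL130w'])
print(resolve("ALR1", index))  # ('YOL130w', None)
```

Returning the candidate loci instead of picking one leaves the judgment to the curator, which is the point at which the error in the excerpt could have been caught.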