2008 BigData


Subject Headings: Biocuration Task, Biocurator.


Cited By

  • (Cusick et al., 2009) ⇒ Michael E Cusick, Haiyuan Yu, Alex Smolyar, Kavitha Venkatesan, Anne-Ruxandra Carvunis, Nicolas Simonis, Jean-François Rual, Heather Borick, Pascal Braun, Matija Dreze, Jean Vandenhaute, Mary Galli, Junshi Yazaki, David E Hill, Joseph R Ecker, Frederick P Roth, and Marc Vidal. (2009). “Literature-Curated Protein Interaction Datasets.” In: Nature Methods 6, 39–46.



To thrive, the field that links biologists and their data urgently needs structure, recognition and support.


The exponential growth in the amount of biological data means that revolutionary measures are needed for data management, analysis and accessibility. Online databases have become important avenues for publishing biological data. Biocuration, the activity of organizing, representing and making biological information accessible to both humans and computers, has become an essential part of biological discovery and biomedical research. But curation increasingly lags behind data generation in funding, development and recognition.

We propose three urgent actions to advance this key field. First, authors, journals and curators should immediately begin to work together to facilitate the exchange of data between journal publications and databases.

Second, in the next five years, curators, researchers and university administrations should develop an accepted recognition structure to facilitate community-based curation efforts.

Third, curators, researchers, academic institutions and funding agencies should, in the next ten years, increase the visibility and support of scientific curation as a professional career.

Failure to address these three issues will cause the available curated data to lag further behind current biological knowledge. Researchers will increasingly encounter obvious gaps in knowledge. As these gaps expand, resources will become less effective for generating and testing hypotheses, and the usefulness of curated data will be seriously compromised.

When all the data produced or published are curated to a high standard and made accessible as soon as they become available, biological research will be conducted in a way quite unlike the way it is done now. Researchers will be able to process massive amounts of complex data much more quickly. With the help of inference programs, they will rapidly gain insight into their areas of interest. Digesting information and generating hypotheses at the computer screen will be so much faster that researchers will return to the bench sooner for more experiments. Experiments will be designed with more insight and specificity, driving exponential growth in knowledge, much as we are experiencing exponential growth in data today.

Data avalanche

Such data, produced at great effort and expense, are only as useful as researchers' ability to locate, integrate and access them. In recent years, this challenge has been met by a growing cadre of biologists — 'biocurators' — who manage raw biological data, extract information from published literature, develop structured vocabularies to tag data and make the information available online.



Page: 2008 BigData
Authors: Doug Howe, Maria Costanzo, Petra Fey, Takashi Gojobori, Linda Hannick, Winston Hide, David P. Hill, Renate Kania, Mary Schaeffer, Susan St Pierre, Simon Twigger, Owen White, Seung Yon Rhee
Title: Big Data: The future of biocuration
Journal: Nature
DOI: 10.1038/455047a
Year: 2008