2009 CallingInternationalRescue


Subject Headings: Scientific Literature, Semantic Annotation.



  • We live in interesting times. Portents of impending catastrophe pervade the literature, calling us to action in the face of unmanageable volumes of scientific data. But it isn't so much data generation per se, but the systematic burial of the knowledge embodied in those data that poses the problem: there is so much information available that we simply no longer know what we know, and finding what we want is hard – too hard. The knowledge we seek is often fragmentary and disconnected, spread thinly across thousands of databases and millions of articles in thousands of journals. The intellectual energy required to search this array of data-archives, and the time and money this wastes, has led several researchers to challenge the methods by which we traditionally commit newly acquired facts and knowledge to the scientific record. We present some of these initiatives here – a whirlwind tour of recent projects to transform scholarly publishing paradigms, culminating in Utopia and the Semantic Biochemical Journal experiment. With their promises to provide new ways of interacting with the literature, and new and more powerful tools to access and extract the knowledge sequestered within it, we ask what advances they make and what obstacles to progress still exist? We explore these questions, and, as you read on, we invite you to engage in an experiment with us, a real-time test of a new technology to rescue data from the dormant pages of published documents. We ask you, please, to read the instructions carefully. The time has come: you may turn over your papers…

Elsevier Grand Challenge

  • In 2008, to stimulate further efforts to improve the way scientific information is communicated and used, Elsevier announced its Grand Challenge of Knowledge Enhancement in the Life Sciences. The focus of the contest was to develop tools for semantic annotation of journals and text-based databases, to improve access to, and sharing of, the knowledge contained within them: in short, to change the way that science is published. The winners of the contest developed a tool (Reflect) that addresses the routine need of life scientists to be able both to jump from gene or protein names to their molecular sequences, and to understand more about particular genes, proteins or small molecules encountered in the literature [44]. ...
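The augmented browsing that Reflect provides rests on matching known gene, protein and small-molecule names in page text against a dictionary of database identifiers. The following is only a minimal sketch of that kind of dictionary-based tagging, using an invented three-entry lexicon and illustrative identifiers rather than Reflect's actual resources or API:

```python
import re

# Toy synonym dictionary mapping literature terms to database identifiers.
# The entries are illustrative placeholders, not Reflect's actual lexicon.
TERM_DICT = {
    "p53": "UniProt:P04637",
    "BRCA1": "UniProt:P38398",
    "aspirin": "CHEBI:15365",
}

def tag_entities(text):
    """Return (term, identifier, offset) for each dictionary hit in text."""
    hits = []
    for term, ident in TERM_DICT.items():
        # Word boundaries avoid matching terms embedded inside longer tokens.
        for m in re.finditer(r"\b%s\b" % re.escape(term), text):
            hits.append((term, ident, m.start()))
    return sorted(hits, key=lambda h: h[2])

sentence = "Mutations in BRCA1 modulate p53 activity."
for term, ident, pos in tag_entities(sentence):
    print(term, ident, pos)
```

A production tagger such as Reflect handles millions of synonyms, name ambiguity and case normalization; the sketch shows only the core match-and-link step that turns a plain sentence into clickable, database-anchored entities.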

The Semantic Biochemical Journal experiment

  • The Semantic Biochemical Journal (BJ) experiment was a collaborative project involving the BJ editorial staff and the developers of Utopia [73], a software suite that semantically integrates visualization and data-analysis tools with document-reading and document-management utilities. The principal aim of the project was to make the content of BJ electronic publications and supplementary data richer and more accessible. To achieve this, Utopia was integrated with in-house editorial and document management workflows, allowing copy editors to mark up content prior to publication; this removed the mark-up burden from submitting authors, and ensured rigour and consistency from the outset. …


The PDF debate

  • In recent years, the literature has seen the value of PDF as a mechanism for digitizing the printed page rather hotly contested. PDF, although easy for humans to read, is not regarded as an efficient medium for gathering information, nor for sharing, integrating and interacting with knowledge; it is considered semantically limited by comparison with XML, and antithetical to the spirit of the Web [11,34,35,37,77].
  • Notwithstanding the critics, PDFs are still the dominant means of dissemination of scientific papers. For the human reader, they are like ‘electronic paper’ – they generally inherit the standard typesetting conventions of the original journal and hence feel ‘natural’ to read. People also like to have their own copies of documents, which can be read offline, with the added comfort of knowing that the PDF won’t disappear even if its originating website does.
  • Adobe’s PDF has therefore become the de facto standard for document dissemination (although technically a proprietary format, it is sufficiently open to be supported by all platforms). It supports basic annotation and hyperlinking (within a document, and to external sources), and also allows inclusion of metadata. Interestingly, earlier this year, the Charlesworth Group, working with Nature Publishing Group, completed a project to incorporate eXtensible Metadata Platform (XMP) metadata within Nature’s online PDFs (the metadata include article titles, author details, keywords, images, DOIs and so on; http://www.nature.com/press_releases/charlesworth.html). This has the dual advantage of presenting scholarly information both in a human-readable form and in a format accessible to software applications. However, although all new Nature research articles will contain embedded XMP metadata as they are published, there are no plans for retrospective mark-up of the Nature archives. Moreover, as the metadata are embedded at the point of publication, they are effectively as fixed as the original PDF and are unavailable for future modification. This is in contrast with the approach taken with Utopia Documents (UD), which vivifies the static PDF document by overlaying dynamic, customizable metadata, in turn adding evolvable, interactive content to the underlying file. As mentioned above, this system also yields the potential for sharing community comments and annotations on any document (past and present), storing them on a common server and making them accessible to future semantic Web applications.
  • Clearly, the technology to add value to PDF documents, whether with links to websites, links to interactive analysis tools or to live online commentaries or blogs, is with us now; the time is therefore ripe to exploit it. On a technical level, the ultimate goal is effective ‘knowledge management’ [11,78]; on a human level, it is to deliver to the research community a tangible way not simply to bring sanity to the sprawling mass of scientific data and literature, but to rescue the knowledge being systematically entombed in world-wide literature and data archives.
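The XMP metadata discussed above is an RDF/XML packet embedded in the PDF file, readable by any XML-aware tool. A small sketch of reading such a packet with Python's standard library follows; the packet itself is a made-up example (the title and DOI are placeholders, not taken from any real Nature PDF):

```python
import xml.etree.ElementTree as ET

# A minimal XMP packet of the kind embedded in a PDF's metadata stream.
# The article details below are invented placeholders.
XMP_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/">
      <dc:title>An example article title</dc:title>
      <prism:doi>10.1000/example.doi</prism:doi>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}

def read_xmp(packet):
    """Extract the article title and DOI from an XMP packet string."""
    root = ET.fromstring(packet)
    title = root.find(".//dc:title", NS).text
    doi = root.find(".//prism:doi", NS).text
    return title, doi

print(read_xmp(XMP_PACKET))
```

Because the packet is plain RDF/XML, the same metadata serves both audiences described above: a human can read it, and software can query it by namespace without any publisher-specific tooling.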

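The overlay approach described above, in which the static PDF stays untouched while dynamic metadata lives elsewhere, can be sketched as follows: a content hash identifies the document, and annotations are kept in an external store keyed by that hash. The store and function names below are illustrative assumptions, not Utopia's actual protocol:

```python
import hashlib

# In-memory annotation store keyed by document fingerprint. A real system,
# such as the shared server described for Utopia, would be a remote service.
ANNOTATION_STORE = {}

def fingerprint(pdf_bytes):
    """Content hash identifying a static document, independent of filename."""
    return hashlib.sha256(pdf_bytes).hexdigest()

def add_annotation(pdf_bytes, note):
    """Attach a note to a document without modifying the document itself."""
    ANNOTATION_STORE.setdefault(fingerprint(pdf_bytes), []).append(note)

def annotations_for(pdf_bytes):
    """Overlay lookup: fetch the current annotations for a document."""
    return ANNOTATION_STORE.get(fingerprint(pdf_bytes), [])

doc = b"%PDF-1.4 ... static published content ..."
add_annotation(doc, "Link: updated sequence record at UniProt")
add_annotation(doc, "Reader comment: see follow-up study")
print(annotations_for(doc))
```

The design point is that the annotations can evolve indefinitely after publication, since nothing is embedded in the file: any copy of the same PDF, wherever it was downloaded, hashes to the same key and sees the same live overlay.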

  • [44] E. Pafilis, S. I. O’Donoghue, L. J. Jensen, H. Horn, M. Kuhn, N. P. Brown, and R. Schneider. (2009). “Reflect: Augmented browsing for the life scientist.” In: Nat. Biotechnol. 27, 508–510
  • [70] E. van Mulligen, M. Diwersy, B. Schijvenaars, M. Weeber, C. van der Eijk, R. Jelier, M. Schuemie, J. Kors, and B. Mons. (2004). “Contextual annotation of web pages for interactive browsing.” In: MEDINFO 2004, 94–97
  • [73] S. R. Pettifer, D. Thorne, P. McDermott, J. Marsh, A. Villeger, D. B. Kell, and T. K. Attwood. (2009). “Visualising biological data: a semantic approach to tool and database integration.” In: BMC Bioinform. 10, S18
  • [77] A. H. Renear, and C. L. Palmer. (2009). “Strategic Reading, Ontologies, and the Future of Scientific Publishing.” In: Science, 325(5942). doi:10.1126/science.1157784


Author: Teresa K. Attwood, Douglas B. Kell, Philip McDermott, James Marsh, Steve R. Pettifer, David Thorne
Title: Calling International Rescue: knowledge lost in literature and data landslide!
Journal: Biochemical Journal
URL: http://www.biochemj.org/bj/424/0317/4240317.pdf
DOI: 10.1042/BJ20091474
Year: 2009