2014 WikidataAFreeCollaborativeKnowl

From GM-RKB

Subject Headings: Wikidata KB, Wikidata Service.

Notes

Cited By

Quotes

Author Keywords

Abstract

This collaboratively edited knowledgebase provides a common source of data for Wikipedia, and everyone else.

Introduction

Unnoticed by most of its readers, Wikipedia continues to undergo dramatic changes, as its sister project Wikidata introduces a new multilingual "Wikipedia for data" (http://www.wikidata.org) to manage the factual information of the popular online encyclopedia. With Wikipedia's data becoming cleaned and integrated in a single location, opportunities arise for many new applications.

Originally conceived in 2001 as a mainly text-based resource, Wikipedia [1] has collected increasing amounts of structured data, including numbers, dates, coordinates, and many types of relationships, from family trees to the taxonomy of species. It has become a resource of enormous value, with potential applications across all areas of science, technology, and culture. This development is hardly surprising, given that Wikipedia is committed to "a world in which every single human being can freely share in the sum of all knowledge," according to its vision statement (https://wikimediafoundation.org/wiki/Vision). There is no question this must include data that can be searched, analyzed, and reused.

It may be surprising that Wikipedia does not provide direct access to most of this data, through either query services or downloadable data exports. Actual use of the data is rare and often restricted to specific pieces of information (such as geo-tags of Wikipedia articles used in Google Maps). The reason for this striking gap between vision and reality is that Wikipedia's data is buried in 30 million Wikipedia articles in 287 languages, from which extraction is inherently very difficult.

This situation is unfortunate for anyone wanting to use the data but is also an increasing threat to Wikipedia's main goal of providing up-to-date, accurate, encyclopedic knowledge. The same information often appears in articles in many languages and in many articles within a single language. Population numbers for Rome, for example, can be found in English and Italian articles about Rome but also in the English article "Cities in Italy." The numbers are all different.

Wikidata aims to overcome such inconsistencies by creating new ways for Wikipedia to manage its data on a global scale; see the result at http://www.wikidata.org. The following essential design decisions characterize the Wikidata approach.

Open editing
As in Wikipedia, Wikidata allows every user to extend and edit the stored information, even without creating an account. A form-based interface makes editing easy.
Community control
Not only is the actual data controlled by the contributor community, so, too, is the schema of the data. Contributors edit the population number of Rome but also decide whether there is such a number in the first place.
Plurality
It would be naive to expect global agreement on the "true" data, since many facts are disputed or simply uncertain. Wikidata allows conflicting data to coexist and provides mechanisms to organize this plurality.
Secondary data
Wikidata gathers facts published in primary sources, together with references to these sources; for example, there is no "true population of Rome" but rather a "population of Rome as published by the city of Rome in 2011."
Multilingual data
Most data is not tied to a single language; numbers, dates, and coordinates have universal meaning, while labels like "Rome" and "population" are translated into many different languages. Wikidata is multilingual by design. While Wikipedia has independent editions for each language, there is only one Wikidata site.
Easy access
Wikidata's goal is to allow data to be used both in Wikipedia and in external applications. Data is exported through Web services in several formats, including JavaScript Object Notation, or JSON, and Resource Description Framework, or RDF. Data is published under legal terms that allow the widest possible reuse.
Continuous evolution
In the best tradition of Wikipedia, Wikidata grows with its community of editors and developers and the tasks they give it. Rather than developing a perfect system to be presented to the world in a couple of years, the project deploys new features incrementally and as early as possible.
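The "Plurality", "Secondary data", and "Multilingual data" decisions above can be made concrete with a small data-structure sketch. This is only an illustration under stated assumptions, not Wikidata's actual data model: the class names `Statement` and `Item`, their fields, the source names, and the population figures are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    """One claim about an item, tied to the source that published it."""
    prop: str      # property name, e.g. "population"
    value: object  # the claimed value
    source: str    # who published the value (secondary data)
    year: int      # year of publication


@dataclass
class Item:
    """An entity with per-language labels and possibly conflicting claims."""
    labels: dict = field(default_factory=dict)      # language code -> label
    statements: list = field(default_factory=list)  # claims are never overwritten

    def add(self, statement):
        # Plurality: a new claim is appended, existing claims are kept.
        self.statements.append(statement)

    def values(self, prop):
        # Every recorded claim for a property, most recent source first.
        claims = [s for s in self.statements if s.prop == prop]
        return sorted(claims, key=lambda s: s.year, reverse=True)

    def label(self, lang, fallback="en"):
        # Multilingual labels: one item, many names.
        return self.labels.get(lang, self.labels.get(fallback))


rome = Item(labels={"en": "Rome", "it": "Roma", "de": "Rom"})
# Placeholder numbers and sources, not real census figures.
rome.add(Statement("population", 2_700_000, "source A (hypothetical)", 2010))
rome.add(Statement("population", 2_800_000, "source B (hypothetical)", 2011))
```

In this sketch, asking for `rome.values("population")` returns both claims with their sources rather than a single "true" number, and `rome.label("fr")` falls back to the English label when no French one has been entered.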

These properties characterize Wikidata as a specific kind of curated database [8].
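The "Easy access" point above, export through Web services in JSON and RDF, might look like the following from a client's perspective. This is a hedged sketch: the `Special:EntityData` URL pattern is the one the public site exposes, but the helper names, the list of accepted formats, the use of `Q220` as the item for Rome, and the hand-written, heavily trimmed sample document are assumptions made for illustration. No network request is made.

```python
import json

BASE = "https://www.wikidata.org/wiki/Special:EntityData/"
FORMATS = {"json", "rdf", "ttl", "nt"}  # assumed subset of export formats


def entity_data_url(entity_id, fmt="json"):
    """Build the export URL for one entity in the requested format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    number = entity_id[1:]
    if entity_id[:1] not in ("Q", "P") or not (number.isascii() and number.isdigit()):
        raise ValueError(f"not an entity id: {entity_id}")
    return f"{BASE}{entity_id}.{fmt}"


def label_from_entity_json(text, entity_id, lang):
    """Read one language's label out of an entity JSON document."""
    doc = json.loads(text)
    labels = doc["entities"][entity_id]["labels"]
    entry = labels.get(lang)
    return entry["value"] if entry else None


# Hand-written, trimmed stand-in for a real response (illustration only).
SAMPLE = json.dumps({
    "entities": {
        "Q220": {
            "labels": {
                "en": {"language": "en", "value": "Rome"},
                "it": {"language": "it", "value": "Roma"},
            }
        }
    }
})

print(entity_data_url("Q220"))
print(label_from_entity_json(SAMPLE, "Q220", "it"))
```

A real client would fetch the URL and pass the response body to `label_from_entity_json`; requesting the `rdf` or `ttl` variant of the same URL yields the RDF export instead.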

References

Author: Denny Vrandečić, Markus Krötzsch
Title: Wikidata: A Free Collaborative Knowledgebase
DOI: 10.1145/2629489
Year: 2014