2009 TheARTCorpus


Subject Headings: ART Corpus.

Notes

Cited By

Quotes

Abstract

The ART Corpus consists of 225 papers manually annotated with CISP labels (e.g. "Goal", "Method", "Result"). It contains more than 1 million words in 35,040 sentences. These papers cover topics in physical chemistry and biochemistry and were provided by Royal Society of Chemistry (RSC) Publishing. The corpus was developed primarily to add value to scientific papers through semantic markup, which makes it easier for natural language processing and semantic web applications to automatically extract information about core scientific concepts. The ART Corpus can also serve as a training set for machine learning algorithms that automate the annotation of papers with CISP metadata. The corpus is distributed as a collection of 225 .xml files, each corresponding to a separate paper whose sentences have been individually annotated with core scientific concepts.
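Because the corpus is a set of per-paper XML files with sentence-level labels, a natural first step before training a classifier is to read the files and tally the label distribution. The Python sketch below shows one way to do that with the standard library. The XML layout it assumes (a <paper> root, one <s> element per sentence, and a "label" attribute holding the CISP category) is hypothetical and is not taken from the actual ART Corpus files, so the element and attribute names would need to be adapted to the real schema. The function names and the sample paper are likewise illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# HYPOTHETICAL schema for illustration: one <paper> root per file, each
# sentence an <s> element whose "label" attribute holds its CISP category.
# The real ART Corpus XML may use different element and attribute names.
SAMPLE_PAPER = """<paper id="example">
  <s label="Goal">We aim to characterise the reaction mechanism.</s>
  <s label="Method">Spectra were recorded at 298 K.</s>
  <s label="Result">The rate constant increased with temperature.</s>
  <s label="Result">No intermediate was detected.</s>
</paper>"""


def labelled_sentences(root: ET.Element) -> Iterator[Tuple[str, str]]:
    """Yield (label, sentence text) pairs from one parsed paper."""
    for s in root.iter("s"):
        label = s.get("label")
        if label is None:
            continue  # skip sentences that carry no annotation
        yield label, "".join(s.itertext()).strip()


def label_counts(roots: Iterable[ET.Element]) -> Counter:
    """Count how often each label occurs across all given papers."""
    counts: Counter = Counter()
    for root in roots:
        counts.update(label for label, _ in labelled_sentences(root))
    return counts


def load_corpus(directory: str) -> Iterator[ET.Element]:
    """Parse every .xml file in a directory (one file per paper)."""
    for path in sorted(Path(directory).glob("*.xml")):
        yield ET.parse(path).getroot()


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_PAPER)
    print(label_counts([root]))
```

Applied to a local copy of the corpus, `label_counts(load_corpus("path/to/corpus"))` would give the per-label totals, and the (label, text) pairs from `labelled_sentences` could be fed to any sentence classifier as training examples.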



Author: Maria Liakata, Larisa Soldatova
Title: The ART Corpus
Year: 2009
URL: http://cadair.aber.ac.uk/dspace/handle/2160/1979