2010 CorporaForTheConceptuAndZonOfSciPaps


Subject Headings:

Notes

Cited By

Quotes

Abstract

We present two complementary schemes for sentence-based annotation of full scientific papers, CoreSC and AZ-II, applied to primary research articles in chemistry. AZ-II is the extension of AZ for chemistry papers. AZ has been shown to be reliably annotated by independent human coders and to be useful for various information access tasks. Like AZ, AZ-II follows the rhetorical structure of a scientific paper and the knowledge claims made by the authors. The CoreSC scheme takes a different view of scientific papers, treating them as the human-readable representations of scientific investigations. It seeks to retrieve the structure of the investigation from the paper as generic high-level Core Scientific Concepts (CoreSC). CoreSCs have been annotated by 16 chemistry experts over a total of 265 full papers in physical chemistry and biochemistry. We describe the differences and similarities between the two schemes in detail and present the two corpora produced using each scheme. The corpora share 36 papers, which allows us to quantitatively compare aspects of the annotation schemes. We show the correlation between the two schemes and their strengths and weaknesses, and discuss the benefits of combining a rhetoric-based analysis of the papers with a content-based one.
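As an illustration of the kind of quantitative comparison that shared papers make possible, the following minimal sketch cross-tabulates two sentence-level label sequences over the same sentences. It is not the authors' method. The function names are hypothetical, and the label values are illustrative placeholders that are not taken from this page.

```python
from collections import Counter


def cross_tabulate(labels_a, labels_b):
    """Count how often each pair of labels co-occurs on the same sentence."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both schemes must label the same sentences")
    return Counter(zip(labels_a, labels_b))


def conditional_distribution(table, label_a):
    """Share of each scheme-B label among sentences that carry label_a in scheme A."""
    row = {b: n for (a, b), n in table.items() if a == label_a}
    total = sum(row.values())
    return {b: n / total for b, n in row.items()} if total else {}


# Hypothetical labels for five sentences of one shared paper (assumed values).
coresc = ["Background", "Method", "Method", "Result", "Conclusion"]
az2 = ["OTHR", "OWN_MTHD", "USE", "OWN_RES", "OWN_CONC"]

table = cross_tabulate(coresc, az2)
print(conditional_distribution(table, "Method"))
```

A table of this kind shows which categories of one scheme tend to coincide with which categories of the other, which is one simple way to examine how the two schemes relate.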




Record: 2010 CorporaForTheConceptuAndZonOfSciPaps
Author: Maria Liakata, Simone Teufel, Advaith Siddharthan, Colin R. Batchelor
Title: Corpora for the Conceptualisation and Zoning of Scientific Papers
URL: http://www.lrec-conf.org/proceedings/lrec2010/pdf/644_Paper.pdf
Year: 2010