2010 IdentifyingTheInfStructOfSciAbstracts

(Guo et al., 2010) ⇒ Yufan Guo, Anna Korhonen, Maria Liakata, Ilona Silins, Lin Sun, Ulla Stenius. (2010). “Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes.” In: Proceedings of the BioNLP Workshop on Linking Natural Language Processing and Biology (BioNLP 2010).

Subject Headings: Scientific Paper Abstract, Scientific Paper.

Notes

Cited By

~2 http://scholar.google.com/scholar?cites=6014723020024385131

Quotes

Abstract

Many practical tasks require accessing specific types of information in scientific literature; e.g. information about the objective, methods, results or conclusions of the study in question. Several schemes have been developed to characterize such information in full journal papers. Yet many tasks focus on abstracts instead. We take three schemes of different type and granularity (those based on section names, argumentative zones and conceptual structure of documents) and investigate their applicability to biomedical abstracts. We show that even for the finest-grained of these schemes, the majority of categories appear in abstracts and can be identified relatively reliably using machine learning. We discuss the impact of our results and the need for subsequent task-based evaluation of the schemes.

References

J. Cohen. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20:37–46.
J. R. Curran, S. Clark, and J. Bos. (2007). Linguistically motivated large-scale NLP with C&C and Boxer. In: Proceedings of the ACL 2007 Demonstrations Session, pages 33–36.
K. Hirohata, N. Okazaki, S. Ananiadou, and M. Ishizuka. (2008). Identifying sections in scientific abstracts using conditional random fields. In: Proceedings of 3rd International Joint Conference on Natural Language Processing.
Anna Korhonen, L. Sun, I. Silins, and U. Stenius. (2009). The first step in the development of text mining technology for cancer risk assessment: Identifying and organizing scientific evidence in risk assessment literature. BMC Bioinformatics, 10:303.
J. Lafferty, A. McCallum, and F. Pereira. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML.
J. R. Landis and G. G. Koch. (1977). The measurement of observer agreement for categorical data. Biometrics, 33:159–174.
Maria Liakata and L.N. Soldatova. (2008). Guidelines for the annotation of general scientific concepts. Aberystwyth University, JISC Project Report http://ie-repository.jisc.ac.uk/88/.
Maria Liakata, Claire Q, and L.N. Soldatova. (2009). Semantic annotation of papers: Interface & enrichment tool (SAPIENT). In: Proceedings of BioNLP-09, pages 193–200, Boulder, Colorado.
Maria Liakata, S. Teufel, A. Siddharthan, and C. Batchelor. (2010). Corpora for the conceptualisation and zoning of scientific papers. To appear in the 7th International Conference on Language Resources and Evaluation.
J. Lin, D. Karakos, D. Demner-Fushman, and S. Khudanpur. (2006). Generative content models for structural analysis of medical abstracts. In: Proceedings of BioNLP-06, pages 65–72, New York, USA.
J. Lin. (2009). Is searching full text more effective than searching abstracts? BMC Bioinformatics, 10:46.
S. Merity, T. Murphy, and J. R. Curran. (2009). Accurate argumentative zoning with maximum entropy models. In: Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries, pages 19–26. Association for Computational Linguistics.
Y. Mizuta, Anna Korhonen, T. Mullen, and N. Collier. (2005). Zone analysis in biology articles as a basis for information extraction. International Journal of Medical Informatics on Natural Language Processing in Biomedicine and Its Applications.
T. Mullen, Y. Mizuta, and N. Collier. (2005). A baseline feature set for learning rhetorical zones using full articles in the biomedical domain. Natural language processing and text mining, 7:52–58.
P. Ruch, C. Boyer, C. Chichester, I. Tbahriti, A. Geissbuhler, P. Fabry, J. Gobeill, V. Pillet, D. Rebholz-Schuhmann, C. Lovis, and A. L. Veuthey. (2007). Using argumentation to extract key sentences from biomedical abstracts. Int J Med Inform, 76:195–200.
H. Shatkay, F. Pan, A. Rzhetsky, and W. J. Wilbur. (2008). Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users. Bioinformatics, 18:2086–2093.
S. Siegel and N. J. Jr. Castellan. (1988). Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, Berkeley, CA, 2nd edition.
L. Sun and Anna Korhonen. (2009). Improving verb clustering with automatically acquired selectional preference. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing.
I. Tbahriti, C. Chichester, Frederique Lisacek, and P. Ruch. (2006). Using argumentation to retrieve articles with similar citations. Int J Med Inform, 75:488–495.
S. Teufel and M. Moens. (2002). Summarizing scientific articles: Experiments with relevance and rhetorical status. Computational Linguistics, 28:409–445.
S. Teufel, A. Siddharthan, and C. Batchelor. (2009). Towards domain-independent argumentative zoning: Evidence from chemistry and computational linguistics. In: Proceedings of EMNLP.
I. H. Witten. (2008). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. http://www.cs.waikato.ac.nz/ml/weka/.

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2010 IdentifyingTheInfStructOfSciAbstracts	Anna Korhonen Maria Liakata Yufan Guo Ilona Silins Lin Sun Ulla Stenius			Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes		Proceedings of the BioNLP Workshop on Linking Natural Language Processing and Biology	http://www.aclweb.org/anthology/W/W10/W10-1913.pdf			2010
