2004 CitationSentenForSemantAnalysisOfBioscienceText


Subject Headings: Paraphrase Generation.


Cited By



We propose the use of the text of the sentences surrounding citations as an important tool for semantic interpretation of bioscience text. We hypothesize several different uses of citation sentences (which we call citances), including the creation of training and testing data for semantic analysis (especially for entity and relation recognition), synonym set creation, database curation, document summarization, and information retrieval generally. We illustrate some of these ideas, showing that citances to one document in particular align well with what a human curator extracted by hand. We also show preliminary results on the problem of normalizing the different ways that the same concepts are expressed within a set of citances, using and improving on existing techniques in automatic paraphrase generation.
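The abstract does not describe how citances are located in running text. As a rough illustration only, the following is a minimal Python sketch that assumes numbered bracket citations such as [3] or [2, 4-6] and a naive sentence splitter; the function names `cited_ids` and `extract_citances` are hypothetical, and this is not the authors' pipeline. Real papers also use author-year and superscript styles, which this sketch ignores.

```python
import re

# Assumption: citations appear as numeric brackets like [3], [2, 5] or [4-6].
_BRACKET = re.compile(r"\[(\d+(?:\s*[-,–]\s*\d+)*)\]")
# Naive splitter: break after . ? or ! when followed by whitespace and a capital letter.
_SENT_SPLIT = re.compile(r"(?<=[.?!])\s+(?=[A-Z])")


def cited_ids(marker_body: str) -> set[int]:
    """Expand a marker body such as '2, 4-6' into the reference numbers it cites."""
    ids = set()
    for part in re.split(r"\s*,\s*", marker_body):
        bounds = re.split(r"\s*[-–]\s*", part)
        if len(bounds) == 2:
            # A range like 4-6 cites every reference from 4 through 6.
            ids.update(range(int(bounds[0]), int(bounds[1]) + 1))
        else:
            ids.add(int(bounds[0]))
    return ids


def extract_citances(text: str, target: int) -> list[str]:
    """Return the sentences containing a citation marker that points at reference `target`."""
    citances = []
    for sentence in _SENT_SPLIT.split(text):
        for match in _BRACKET.finditer(sentence):
            if target in cited_ids(match.group(1)):
                citances.append(sentence.strip())
                break  # keep each sentence at most once
    return citances


if __name__ == "__main__":
    sample = ("Protein A binds protein B [3]. "
              "This interaction was later confirmed [2, 3-5]. "
              "An unrelated assay is described in [8].")
    for citance in extract_citances(sample, 3):
        print(citance)
```

With the placeholder text above, the citances for reference 3 are the first two sentences, since the range 3-5 also covers reference 3. Collecting citances this way for every paper that cites a single target document would yield the kind of citance set the abstract proposes to normalize.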


  • F. R. Bach and M. I. Jordan. Learning spectral clustering. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.
  • Regina Barzilay and L. Lee. Learning to paraphrase: An unsupervised approach using multiple-sequence alignment. In: Proceedings of HLT-NAACL, pages 16–23, 2003.
  • Regina Barzilay and Kathleen R. McKeown. Extracting paraphrases from a parallel corpus. In: Proceedings of ACL, pages 50–57, 2001.
  • G. Bhalotia, P. Nakov, A. Schwartz, and M. Hearst. BioText team report for the TREC 2003 Genomics Track. In: Proceedings of TREC, 2003.
  • S. Bradshaw. Reference directed indexing: Redeeming relevance for subject search in citation indexes. In: Proceedings of the 7th European Conference on Research and Advanced Technology for Digital Libraries, 2003.
  • S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117, 1998.
  • N. Cancedda, Éric Gaussier, C. Goutte, and J.-M. Renders. Word-sequence kernels. Journal of Machine Learning Research, 3:1059–1082, 2003.
  • S. Chakrabarti, B. Dom, Prabhakar Raghavan, S. Rajagopalan, D. Gibson, and J. Kleinberg. Automatic resource compilation by analyzing hyperlink structure and associated text. In: Proceedings of the seventh International Conference on World Wide Web 7, pages 65–74. Elsevier Science Publishers B. V., 1998.
  • N. Craswell, D. Hawking, and S. Robertson. Effective site finding using link anchor information. In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages 250–257. ACM Press, 2001.
  • J. Fürnkranz. Exploiting structural information for text classification on the WWW. In: Proceedings of the Third International Symposium on Advances in Intelligent Data Analysis, pages 487–498. Springer-Verlag, 1999.
  • E. Garfield. Citation indexes for science: A new dimension in documentation through association of ideas. Science, 122(3159):108–111, 1955.
  • E. Garfield. Can citation indexing be automated? National Bureau of Standards Miscellaneous Publication, 269:189–192, 1965.
  • C. L. Giles, K. D. Bollacker, and S. Lawrence. CiteSeer: An automatic citation indexing system. In: Proceedings of the third ACM conference on Digital libraries, pages 89–98. ACM Press, 1998.
  • Gregory Grefenstette. Sextant: Exploring unexplored contexts for semantic extraction from syntactic analysis. In: Proceedings of ACL, pages 324–326, 1992.
  • Gregory Grefenstette. Explorations in Automatic Thesaurus Discovery. Kluwer Academic Publishers, 1994.
  • A. Ibrahim, B. Katz, and J. Lin. Extracting structural paraphrases from aligned monolingual corpora. In: Proceedings of Second International Workshop on Paraphrasing (IWP 2003), pages 57–64, 2003.
  • Dekang Lin. Dependency-based evaluation of MINIPAR. In: Proceedings of Workshop on the Evaluation of Parsing Systems, First International Conference on Language Resources and Evaluation, 1998.
  • Dekang Lin and P. Pantel. Discovery of inference rules for question answering. Natural Language Engineering, 7(4):343–360, 2001.
  • B. A. Lipetz. Improvements of the selectivity of citation indexes to science literature through inclusion of citation relationship indicators. American Documentation, 16:81–90, 1965.
  • M. Liu. Progress in documentation. The complexities of citation practice: A review of citation studies. Journal of Documentation, 49(4):370–408, 1993.
  • R. E. Mercer and C. Di Marco. A design methodology for a biomedical literature indexing tool using the rhetoric of science. In: BioLink workshop in conjunction with NAACL/HLT, pages 77–84, 2004.
  • M. J. Moravcsik and P. Murugesan. Some results on the function and quality of citations. Social Studies of Science, 5:86–92, 1975.
  • H. Nanba, N. Kando, and M. Okumura. Classification of research papers using citation links and citation types: Towards automatic review article generation. In: American Society for Information Science SIG Classification Research Workshop: Classification for User Support and Learning, pages 117–134, 2000.
  • B. Pang, K. Knight, and D. Marcu. Syntax-based alignment of multiple translations: Extracting paraphrases and generating new sentences. In: Proceedings of HLT-NAACL, pages 181–188, 2003.
  • J. Rennie and A. McCallum. Using reinforcement learning to spider the web efficiently. In: Proceedings of the Sixteenth International Conference on Machine Learning, pages 335–343. Morgan Kaufmann Publishers Inc., 1999.
  • Matthew Richardson and Pedro Domingos. The intelligent surfer: Probabilistic combination of link and content information in PageRank. In: Advances in Neural Information Processing Systems, volume 14. MIT Press, 2002.
  • Y. Shinyama and S. Sekine. Paraphrase acquisition for information extraction. In: Proceedings of Second International Workshop on Paraphrasing (IWP 2003), 2003.
  • Y. Shinyama, S. Sekine, K. Sudo, and R. Grishman. Automatic paraphrase acquisition from news articles. In: Proceedings of HLT, pages 40–46, 2002.
  • S. Teufel and M. Moens. Summarizing scientific articles – experiments with relevance and rhetorical status. Computational Linguistics, 28(4):409–445, 2002.
  • H. D. White. Citation analysis and discourse analysis revisited. Applied Linguistics, 25(1):89–116, 2004.
  • J. Whitfield, S. Neame, L. Paquet, O. Bernard, and J. Ham. Dominant-negative c-Jun promotes neuronal survival by reducing BIM expression and inhibiting mitochondrial cytochrome c release. Neuron, 29:629–643, 2001.
  • http://www.mitre.org/public/biocreative/.
  • http://biotext.berkeley.edu/.
  • http://www.ncbi.nlm.nih.gov/locuslink/.
  • http://trec.nist.gov/.

2004 CitationSentenForSemantAnalysisOfBioscienceText
Author: Preslav Nakov, Ariel Schwartz, Marti Hearst
title: Citation Sentences for Semantic Analysis of Bioscience Text
journal: Proceedings of the Workshop on Search and Discovery in Bioinformatics at SIGIR 2004
titleUrl: http://biotext.berkeley.edu/papers/citances-nlpbio04.pdf
year: 2004