2007 RuleBasedProteinTermIdentification

(Wang, 2007) ⇒ Xinglong Wang. (2007). “Rule-based Protein Term Identification with Help from Automatic Species Tagging.” In: Proceedings of CICLING. doi:10.1007/978-3-540-70939-8_26

Subject Headings: Organism Component Semantic Relation Recognition Task, ITI TXM Corpora, Organism Mention Normalization Task, Organism NER.

Notes

Cited By

2008

(Wang and Grover, 2008) ⇒ Xinglong Wang and Claire Grover. (2008). “Learning the Species of Biomedical Named Entities from Annotated Corpora.” In: Proceedings of LREC-2008.
- Our previous work (Wang, 2007) reported initial results of a species disambiguation system and the performance of TI with the system integrated. The accuracy of species tagging was 56.0% as tested by 10-fold cross validation on the training data and was 75.0% on the development test data. This species tagging component also improved the performance of a rule-based TI system by 10%. Note that those experiments were conducted on a different dataset using a different species ontology from the ones reported in this paper, and therefore the results are not comparable to those presented in this paper.

Quotes

Abstract

In biomedical articles, terms often refer to different protein entities. For example, an arbitrary occurrence of the term p53 might denote thousands of proteins across a number of species. A human annotator is able to resolve this ambiguity relatively easily, by looking at its context and, if necessary, by searching an appropriate protein database. However, this phenomenon may cause much trouble to a text mining system, which does not understand human languages and hence cannot identify the correct protein that the term refers to. In this paper, we present a Term Identification system which automatically assigns unique identifiers, as found in a protein database, to ambiguous protein mentions in texts. Unlike other solutions described in the literature, which only work on gene/protein mentions on a specific model organism, our system is able to tackle protein mentions across many species, by integrating a machine-learning based species tagger. We have compared the performance of our automatic system to that of human annotators, with very promising results.
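The abstract describes resolving an ambiguous protein mention by combining a database lookup with a species tagger. The Python sketch below illustrates that idea only under stated assumptions: the lexicon, the identifiers (e.g. `ID_MOUSE_P53`) and the species cue words are all hypothetical, and a simple keyword count stands in for the paper's machine-learning species tagger. It is not the author's rule set.

```python
from collections import Counter

# Hypothetical toy lexicon: protein term -> list of (identifier, species).
# The identifiers are invented for illustration; a real system would take
# them from a protein database.
LEXICON = {
    "p53": [("ID_HUMAN_P53", "human"), ("ID_MOUSE_P53", "mouse"), ("ID_RAT_P53", "rat")],
}

# Hypothetical cue words. This keyword count is only a stand-in for the
# machine-learning species tagger described in the abstract.
SPECIES_CUES = {
    "human": {"human", "patient", "patients", "hela"},
    "mouse": {"mouse", "mice", "murine"},
    "rat": {"rat", "rats"},
}


def tag_species(context):
    """Return the species whose cue words occur most often in context, or None."""
    tokens = [t.strip(".,;:()") for t in context.lower().split()]
    counts = Counter()
    for species, cues in SPECIES_CUES.items():
        counts[species] = sum(1 for t in tokens if t in cues)
    species, n = counts.most_common(1)[0]
    return species if n > 0 else None


def identify(term, context):
    """Map a protein mention to candidate identifiers, filtered by the tagged species."""
    candidates = LEXICON.get(term.lower(), [])
    species = tag_species(context)
    matching = [pid for pid, sp in candidates if sp == species]
    # Fall back to every candidate when no species is tagged or none matches.
    return matching if matching else [pid for pid, _ in candidates]


print(identify("p53", "Expression of p53 was reduced in knockout mice."))
```

Here the context word "mice" narrows the three hypothetical p53 entries to the mouse one; with no species cue, the sketch keeps all candidates rather than guessing, which mirrors why a species tagger helps term identification.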

References

1. Michael Krauthammer, Nenadic, G.: Term identification in the biomedical literature. Journal of Biomedical Informatics (Special Issue on Named Entity Recognition in Biomedicine) 37(6) (2004) 512–526
2. Lynette Hirschman, Morgan, A.A., Yeh, A.S.: Rutabaga by any other name: extracting biological names. J Biomed Inform 35(4) (2002) 247–259
3. Tuason, O., Chen, L., Liu, H., Blake, J.A., Friedman, C.: Biological nomenclature: A source of lexical knowledge and ambiguity. In: Proceedings of Pac Symp Biocomput. (2004). 238–249
4. Nenadic, G., Ananiadou, S., McNaught, J.: Enhancing automatic term recognition through term variation. In: Proceedings of 20th International Conference on Computational Linguistics (Coling 2004), Geneva, Switzerland (2004)
5. Chen, L., Liu, H., Friedman, C.: Gene name ambiguity of eukaryotic nomenclatures. Bioinformatics (2005) 248–256
6. (Fang et al., 2006) ⇒ Haw-ren Fang, Kevin P. Murphy, Yang Jin, Jessica S. Kim, and Peter S. White. (2006). “Human Gene Name Normalization Using Text Matching with Automatically Extracted Synonym Dictionaries.” In: Proceedings of the BioNLP Workshop on Linking Natural Language Processing and Biology (BioNLP 2006).
7. Lynette Hirschman, Colosimo, M., Morgan, A., Colombe, J., Yeh, A.: Task 1B: Gene list task BioCreAtIvE workshop. In: BioCreative: Critical Assessment for Information Extraction in Biology. (2004)
8. Hanisch, D., Fundel, K., Mevissen, H.T., Zimmer, R., Fluck, J.: ProMiner: Organism-specific protein name detection using approximate string matching. BMC Bioinformatics 6(Suppl 1):S14 (2005)
9. Crim, J., McDonald, R., Fernando Pereira: Automatically annotating documents with normalized gene lists. BMC Bioinformatics 6(Suppl 1):S13 (2005)
10. Fundel, K., Güttler, D., Zimmer, R., Apostolakis, J.: A simple approach for protein name identification: prospects and limits. BMC Bioinformatics 6(Suppl 1):S15 (2005)
11. Tamames, J.: Text detective: A rule-based system for gene annotation. BMC Bioinformatics 6(Suppl 1):S10 (2005)
12. Hachey, B., Nguyen, H., Nissim, M., Alex, B., Grover, C.: Grounding gene mentions with respect to gene database identifiers. In: BioCreAtIvE Workshop Handouts. (2004). Granada, Spain.
13. Liu, H.: BioTagger: A biological entity tagging system. In: BioCreAtIvE Workshop Handouts. (2004). Granada, Spain.
14. Morgan, A., Lynette Hirschman, Colosimo, M., Yeh, A., Colombe, J.: Gene name identification and normalization using a model organism database. J Biomedical Informatics 37 (2004) 396–410
15. Hanisch, D., Fluck, J., Mevissen, H., Zimmer, R.: Playing biology’s name game: identifying protein names in scientific text. Pac Symp Biocomput (2003) 403–414
16. Rada Mihalcea, T. Chklovski, A. Kilgarriff. (2004). “The Senseval-3 English lexical sample task.” In: Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (Senseval-3).
17. Schwartz, A., Hearst, M.: A simple algorithm for identifying abbreviation definitions in biomedical texts. In: Proceedings of the Pacific Symposium on Biocomputing. (2003)
18. Ghanem, M., Guo, Y., Lodhi, H., Zhang, Y.: Automatic scientific text classification using local patterns: KDD Cup (2002). In: ACM SIGKDD Explorations Newsletter. Volume 4(2). (2003). 95–96

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2007 RuleBasedProteinTermIdentification	Xinglong Wang			Rule-based Protein Term Identification with Help from Automatic Species Tagging			http://www.ltg.ed.ac.uk/np/publications/ltg/papers/Wang2007Rulebased.pdf	10.1007/978-3-540-70939-8_26