2003 InvestSemantSimMeasAcrossTheGeneOntoTheRelBetwSeqAndAnno

Subject Headings: Semantic Similarity Measure.

Notes

Cited By

Quotes

Abstract

Motivation: Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content or ‘semantic similarity’ between entries in a data resource. These allow a bioinformatician to perform a similarity measure over annotation in an analogous manner to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses.

Results: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases.

References

  • Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
  • Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.
  • Blagosklonny,M.V. and Pardee,A.B. (2002). Unearthing the gems. Nature, 416, 373.
  • Budanitsky,A. and Hirst,G. (2001) Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Workshop on WordNet and Other Lexical Resources, Second meeting of the North American Chapter of the Association for Computational Linguistics. Pittsburgh.
  • Camon,E., Magrane,M., Barrell,D., Binns,D., Fleischmann,W., Kersey,P., Mulder,N., Oinn,T. and Apweiler,R. (2002). The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL and InterPro. Genome Res., 13, 666–672.
  • Chang,J., Raychaudhuri,S. and Altman,R. (2001) Including biological literature improves homology search. Pac. Symp. Biocomput., 6, 374–383.
  • Fellbaum,C. (ed.) (1998) WordNet. An electronic lexical database. Massachusetts, Cambridge, MIT Press.
  • Jiang,J.J. and Conrath,D.W. (1998) Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of International Conference on Research in Computational Linguistics. ROCLING X, Taiwan.
  • Lin,D. (1998) An information-theoretic definition of similarity. In: Proceedings of the 15th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296–304.
  • Lord,P., Stevens,R., Brass,A. and Goble,C. (2003). Semantic similarity measures as tools for exploring the Gene Ontology. Pac. Symp. Biocomput., 8, 601–612.
  • MacCallum,R.M., Kelley,L.A. and Sternberg,M.J. (2000) SAWTED: structure assignment with text description–enhanced detection of remote homologues with automated SWISS-PROT annotation comparisons. Bioinformatics, 16, 125–129.
  • Odell,J. (1998) Six Different Kinds of Aggregation. In: Advanced Object-Oriented Analysis and Design Using UML. Cambridge University Press, pp. 139–149.
  • Rada,R., Mili,H., Bicknell,E. and Blettner,M. (1989) Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 1, 17–30.
  • Resnik,P. (1999) Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intelligence Res., 11, 95–130.
  • Stevens,R., Goble,C. and Bechhofer,S. (2000) Ontology-based Knowledge Representation for Bioinformatics. Briefings in Bioinformatics, 1, 398–416.
  • The Gene Ontology Consortium (2001) Creating the Gene Ontology resource: design and implementation. Genome Res., 11, 1425–1433.
  • Wilbur,W.J. and Yang,Y. (1996) An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts. Comput. Biol. Med., 26, 209–222.
  • Winston,M., Chaffin,R. and Herrmann,D. (1987) A taxonomy of part-whole relations. Cognitive Science, 11, 417–444.


2003 InvestSemantSimMeasAcrossTheGeneOntoTheRelBetwSeqAndAnno
Author: Phillip W. Lord, Robert D. Stevens, Andy Brass, Carole A. Goble
title: Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation
journal: Bioinformatics Subject Area
titleUrl: http://www.cs.man.ac.uk/~stevensr/papers/bioinformatics-semantic-similarity.pdf
year: 2003