2005 ProteinFunctionPredictionViaGraphKernels


Subject Headings: Graph Kernel Function

Notes

Cited By

2006

Quotes

Abstract

Motivation

Computational approaches to protein function prediction infer protein function by finding proteins with similar sequence, structure, surface clefts, chemical properties, amino acid motifs, interaction partners or phylogenetic profiles. We present a new approach that combines sequential, structural and chemical information into one graph model of proteins. We predict functional class membership of enzymes and non-enzymes using graph kernels and support vector machine classification on these protein graphs.

Results

Our graph model, derivable from protein sequence and structure only, is competitive with vector models that require additional protein information, such as the size of surface pockets. If we include this extra information into our graph model, our classifier yields significantly higher accuracy levels than the vector models. Hyperkernels allow us to select and to optimally combine the most relevant node attributes in our protein graphs. We have laid the foundation for a protein function prediction system that integrates protein information from various sources efficiently and effectively.
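The abstract does not spell out how a graph kernel compares two protein graphs. As a rough illustration only, the sketch below implements a simple label-matching random walk kernel, one common member of the graph kernel family, in plain Python. The graph encoding (a dict of node labels plus a set of undirected edges), the labels `H` and `S` for helices and sheets, the function name `random_walk_kernel`, and the parameters `max_len` and `decay` are all assumptions made for this example; they are not taken from the paper.

```python
from itertools import product


def random_walk_kernel(g1, g2, max_len=3, decay=0.5):
    """Weighted count of label-matching walks shared by two graphs.

    Each graph is a pair (labels, edges): `labels` maps node -> label and
    `edges` is a set of undirected (u, v) pairs. Walks of length 0..max_len
    are counted, and a walk of length k is weighted by decay**k.
    This is an illustrative sketch, not the kernel used in the paper.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2

    def adjacency(edges):
        adj = {}
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        return adj

    adj1, adj2 = adjacency(edges1), adjacency(edges2)

    # Nodes of the direct product graph: node pairs with identical labels.
    pairs = [(a, b) for a, b in product(labels1, labels2)
             if labels1[a] == labels2[b]]
    pair_set = set(pairs)

    # walks[p] = number of matching walks of the current length ending at p.
    walks = {p: 1.0 for p in pairs}
    total = float(len(pairs))  # contribution of length-0 walks
    weight = 1.0
    for _ in range(max_len):
        weight *= decay
        extended = {p: 0.0 for p in pairs}
        for (a, b), count in walks.items():
            for a2 in adj1.get(a, ()):
                for b2 in adj2.get(b, ()):
                    if (a2, b2) in pair_set:
                        extended[(a2, b2)] += count
        walks = extended
        total += weight * sum(walks.values())
    return total


# Two toy "protein graphs" (hypothetical data): helix-sheet-helix chains.
protein_a = ({0: "H", 1: "S", 2: "H"}, {(0, 1), (1, 2)})
protein_b = ({0: "H", 1: "S"}, {(0, 1)})
similarity = random_walk_kernel(protein_a, protein_b)
```

Evaluating this function on every pair of graphs gives a kernel matrix, which could then be handed to any support vector machine implementation that accepts precomputed kernels (for example, scikit-learn's `SVC(kernel="precomputed")`).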





2005 ProteinFunctionPredictionViaGraphKernels
Author: Hans-Peter Kriegel, Alexander J. Smola, Karsten M. Borgwardt, Cheng Soon Ong, Stefan Schönauer, S.V.N. Vishwanathan
Title: Protein function prediction via graph kernels
Journal: ISMB
Url: http://bioinformatics.oxfordjournals.org/cgi/reprint/21/suppl 1/i47
Doi: 10.1093/bioinformatics/bti1007
Year: 2005