2017 ClipperALowLatencyOnlinePredict


Subject Headings: Trained Predictive Model Deployment, Prediction Serving System, TensorFlow Serving System.

Notes

Cited By

Quotes

Abstract

Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment.

In this paper, we introduce Clipper, a general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks and applications. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluate Clipper on four common machine learning benchmark datasets and demonstrate its ability to meet the latency, accuracy, and throughput demands of online serving applications. Finally, we compare Clipper to the Tensorflow Serving system and demonstrate that we are able to achieve comparable throughput and latency while enabling model composition and online learning to improve accuracy and render more robust predictions.



2017 ClipperALowLatencyOnlinePredict
Author: Ion Stoica, Michael J. Franklin, Daniel Crankshaw, Xin Wang, Giulio Zhou, Joseph E. Gonzalez
Title: Clipper: A Low-latency Online Prediction Serving System
Year: 2017