2017 ClipperALowLatencyOnlinePredict

(Crankshaw et al., 2017) ⇒ Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, and Ion Stoica. (2017). “Clipper: A Low-latency Online Prediction Serving System.” In: Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation. ISBN:978-1-931971-37-9

Subject Headings: Trained Predictive Model Deployment, Prediction Serving System, TensorFlow Serving System.

Notes

Cited By

Quotes

Abstract

Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment.

In this paper, we introduce Clipper, a general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks and applications. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluate Clipper on four common machine learning benchmark datasets and demonstrate its ability to meet the latency, accuracy, and throughput demands of online serving applications. Finally, we compare Clipper to the TensorFlow Serving system and demonstrate that we are able to achieve comparable throughput and latency while enabling model composition and online learning to improve accuracy and render more robust predictions.
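The abstract credits part of Clipper's latency reduction to caching. A minimal sketch of prediction caching follows, assuming predictions are deterministic for a given (model, query) pair and that queries are hashable. The class PredictionCache and its methods are hypothetical illustrations, not Clipper's API. The reference list cites Corbató's Multics paging experiment (reference 17), which is commonly credited with the CLOCK replacement policy; this sketch swaps in simpler LRU eviction built on collections.OrderedDict.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Tuple


class PredictionCache:
    """Memoizes predictions keyed by (model_id, query) with LRU eviction.

    Hypothetical sketch: LRU stands in for whatever policy a real system uses,
    and model_fn is assumed to be deterministic for a given query.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: "OrderedDict[Tuple[str, Hashable], object]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def predict(self, model_id: str, query: Hashable,
                model_fn: Callable[[Hashable], object]) -> object:
        key = (model_id, query)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = model_fn(query)  # cache miss: evaluate the model
        self._entries[key] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return result


if __name__ == "__main__":
    def double(x: int) -> int:  # stand-in for an expensive model
        return 2 * x

    cache = PredictionCache(capacity=2)
    print(cache.predict("model-a", 21, double))  # miss: evaluates the model
    print(cache.predict("model-a", 21, double))  # hit: served from the cache
    print(cache.hits, cache.misses)
```

Keying on the model identifier as well as the query keeps entries from different models apart, so deploying a new model version under a new identifier never returns stale predictions from the old one.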
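Batching amortizes per-call overhead, but larger batches take longer to evaluate, so batch size trades throughput against latency. One way to choose the size online is additive-increase/multiplicative-decrease (AIMD), the scheme analyzed by Chiu and Jain (reference 15): grow the batch by a fixed step while a latency objective is met, and cut it by a constant factor when it is missed. The sketch below illustrates that general technique under assumed parameters; the names AIMDBatchSizer, serve_batch, slo_seconds, additive_step, backoff_factor, and max_batch are hypothetical, and this is not Clipper's implementation.

```python
import time
from typing import Callable, List, Sequence


class AIMDBatchSizer:
    """Adapts a batch size with additive increase / multiplicative decrease.

    Hypothetical parameters: slo_seconds is the per-batch latency objective,
    additive_step the growth after a batch that meets it, and backoff_factor
    the multiplier applied after a batch that misses it.
    """

    def __init__(self, slo_seconds: float, additive_step: int = 1,
                 backoff_factor: float = 0.9, initial: int = 1,
                 max_batch: int = 1024) -> None:
        if not 0.0 < backoff_factor < 1.0:
            raise ValueError("backoff_factor must be in (0, 1)")
        self.slo_seconds = slo_seconds
        self.additive_step = additive_step
        self.backoff_factor = backoff_factor
        self.max_batch = max_batch
        self.batch_size = max(1, min(initial, max_batch))

    def update(self, observed_latency: float) -> int:
        """Adjust and return the batch size after observing one batch's latency."""
        if observed_latency > self.slo_seconds:
            # Objective missed: shrink multiplicatively, never below one query.
            self.batch_size = max(1, int(self.batch_size * self.backoff_factor))
        else:
            # Objective met: probe a slightly larger batch, up to the cap.
            self.batch_size = min(self.max_batch,
                                  self.batch_size + self.additive_step)
        return self.batch_size


def serve_batch(sizer: AIMDBatchSizer, queue: List[object],
                model_fn: Callable[[Sequence[object]], List[object]]) -> List[object]:
    """Pop up to sizer.batch_size queued queries, evaluate them, and adapt."""
    batch = queue[:sizer.batch_size]
    del queue[:len(batch)]
    start = time.perf_counter()
    outputs = model_fn(batch)
    sizer.update(time.perf_counter() - start)  # feed measured latency back
    return outputs
```

The multiplicative cut reacts quickly to a missed objective while the additive step probes upward slowly, so the batch size tends to settle just below the largest size that still meets the latency target.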
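For adaptive model selection, the reference list includes Auer et al.'s work on the nonstochastic multiarmed bandit problem (reference 6), which introduced the Exp3 algorithm. The sketch below treats each deployed model as an arm: it samples a model to serve each query, then reweights that model with an importance-weighted reward once feedback (for example, whether the prediction turned out correct) arrives. Rewards in [0, 1], the exploration rate gamma, and the class name Exp3Selector are assumptions for illustration; this shows the textbook algorithm, not Clipper's interface.

```python
import math
import random
from typing import List, Optional


class Exp3Selector:
    """Exp3 bandit over K candidate models (hypothetical wrapper)."""

    def __init__(self, num_models: int, gamma: float = 0.1,
                 rng: Optional[random.Random] = None) -> None:
        if num_models < 1:
            raise ValueError("need at least one model")
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.k = num_models
        self.gamma = gamma
        self.weights: List[float] = [1.0] * num_models
        self.rng = rng or random.Random()

    def probabilities(self) -> List[float]:
        """Mix the weight distribution with uniform exploration."""
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def select(self) -> int:
        """Sample the index of the model that will serve the next query."""
        return self.rng.choices(range(self.k), weights=self.probabilities(), k=1)[0]

    def update(self, chosen: int, reward: float) -> None:
        """Feed back a reward in [0, 1] for the model chosen by select().

        Assumes no other update happened since that select(), so the current
        probabilities equal those used when the model was chosen.
        """
        if not 0.0 <= reward <= 1.0:
            raise ValueError("reward must be in [0, 1]")
        prob = self.probabilities()[chosen]
        estimated = reward / prob  # importance-weighted, unbiased estimate
        self.weights[chosen] *= math.exp(self.gamma * estimated / self.k)
        # Rescale to avoid overflow; the probabilities are scale-invariant.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


if __name__ == "__main__":
    # Simulated feedback: model 1 is right 90% of the time, model 0 only 50%.
    feedback_rng = random.Random(0)
    selector = Exp3Selector(num_models=2, gamma=0.1, rng=random.Random(1))
    accuracy = [0.5, 0.9]
    for _ in range(3000):
        chosen = selector.select()
        correct = feedback_rng.random() < accuracy[chosen]
        selector.update(chosen, 1.0 if correct else 0.0)
    print(selector.probabilities())  # most mass should end up on model 1
```

Keeping every model's selection probability at least gamma / K means the selector never stops exploring, so it can shift traffic if a previously weaker model starts performing better.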

References

1. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. TensorFlow: Large-scale Machine Learning on Heterogeneous Systems, 2015. Software available from tensorflow.org.
2. A. Agarwal, S. Bird, M. Cozowicz, L. Hoang, J. Langford, S. Lee, J. Li, D. Melamed, G. Oshri, O. Ribas, et al. A Multiworld Testing Decision Service. arXiv preprint arXiv:1606.03966, 2016.
3. Deepak Agarwal, Bo Long, Jonathan Traupman, Doris Xin, Liang Zhang, LASER: A Scalable Response Prediction Platform for Online Advertising, Proceedings of the 7th ACM International Conference on Web Search and Data Mining, February 24-28, 2014, New York, New York, USA
4. Sharad Agarwal, Jacob R. Lorch, Matchmaking for Online Games and Other Latency-sensitive P2P Systems, ACM SIGCOMM Computer Communication Review, v.39 n.4, October 2009
5. Amr Ahmed, Mohamed Aly, Joseph Gonzalez, Shravan Narayanamurthy, Alexander J. Smola, Scalable Inference in Latent Variable Models, Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, February 08-12, 2012, Seattle, Washington, USA
6. Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, v.32 n.1, p.48-77, 2003
7. J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: A CPU and GPU Math Expression Compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), volume 4, page 3. Austin, TX, 2010.
8. Christopher M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag New York, Inc., Secaucus, NJ, 2006
9. Leo Breiman, Bagging Predictors, Machine Learning, v.24 n.2, p.123-140, Aug. 1996
10. C. Chelba, D. Bikel, M. Shugrina, P. Nguyen, and S. Kumar. Large Scale Language Modeling in Automatic Speech Recognition. arXiv preprint arXiv:1210.8440, 2012.
11. J. Chen, R. Monga, S. Bengio, and R. Jozefowicz. Revisiting Distributed Synchronous SGD. arXiv.org, Apr. 2016.
12. T. Chen and C. Guestrin. XGBoost: A Scalable Tree Boosting System. arXiv.org, Mar. 2016.
13. T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv preprint arXiv:1512.01274, 2015.
14. Trishul Chilimbi, Yutaka Suzue, Johnson Apacible, Karthik Kalyanaraman, Project Adam: Building An Efficient and Scalable Deep Learning Training System, Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, October 06-08, 2014, Broomfield, CO
15. Dah-Ming Chiu, Raj Jain, Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks, Computer Networks and ISDN Systems, v.17 n.1, p.1-14, June 10, 1989
16. R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A Matlab-like Environment for Machine Learning. In BigLearn, NIPS Workshop, number EPFL-CONF-192376, 2011.
17. F. J. Corbato. A Paging Experiment with the Multics System. 1968.
18. Microsoft Cortana. https://www.microsoft.com/en-us/mobile/experiences/cortana/.
19. D. Crankshaw, P. Bailis, J. E. Gonzalez, H. Li, Z. Zhang, M. J. Franklin, A. Ghodsi, and M. I. Jordan. The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox. In CIDR 2015, 2015.
20. James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, Dasarathi Sampath, The YouTube Video Recommendation System, Proceedings of the Fourth ACM Conference on Recommender Systems, September 26-30, 2010, Barcelona, Spain
21. Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Andrew Y. Ng, Large Scale Distributed Deep Networks, Proceedings of the 25th International Conference on Neural Information Processing Systems, p.1223-1231, December 03-06, 2012, Lake Tahoe, Nevada
22. J. Donahue. CaffeNet. https://github.com/BVLC/caffe/tree/master/models/bvlc_reference_caffenet.
23. Aditya Ganjam, Junchen Jiang, Xi Liu, Vyas Sekar, Faisal Siddiqi, Ion Stoica, Jibin Zhan, Hui Zhang, C3: Internet-scale Control Plane for Video Quality Optimization, Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation, p.131-144, May 04-06, 2015, Oakland, CA
24. J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM, 1993.
25. Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, Carlos Guestrin, PowerGraph: Distributed Graph-parallel Computation on Natural Graphs, Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, October 08-10, 2012, Hollywood, CA, USA
26. Google Now. https://www.google.com/landing/now/.
27. Thore Graepel, Joaquin Quiñonero Candela, Thomas Borchert, Ralf Herbrich, Web-scale Bayesian Click-through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine, Proceedings of the 27th International Conference on Machine Learning, p.13-20, June 21-24, 2010, Haifa, Israel
28. H2O. http://www.h2o.ai.
29. K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. arXiv preprint arXiv:1512.03385, 2015.
30. G. Hinton, O. Vinyals, and J. Dean. Distilling the Knowledge in a Neural Network. arXiv.org, Mar. 2015.
31. Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell, Caffe: Convolutional Architecture for Fast Feature Embedding, Proceedings of the 22nd ACM International Conference on Multimedia, November 03-07, 2014, Orlando, Florida, USA
32. A. Krizhevsky and G. Hinton. CIFAR-10 Dataset. https://www.cs.toronto.edu/~kriz/cifar.html.
33. Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Proceedings of the 25th International Conference on Neural Information Processing Systems, p.1097-1105, December 03-06, 2012, Lake Tahoe, Nevada
34. J. Langford, L. Li, and A. Strehl. Vowpal Wabbit Online Learning Project, 2007.
35. Y. LeCun, C. Cortes, and C. J. Burges. MNIST Handwritten Digit Database. 1998.
36. X. Lei, A. W. Senior, A. Gruenstein, and J. Sorensen. Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices. INTERSPEECH, pages 662-665, 2013.
37. Romain Lerallut, Diane Gasselin, Nicolas Le Roux, Large-Scale Real-Time Product Recommendation at Criteo, Proceedings of the 9th ACM Conference on Recommender Systems, September 16-20, 2015, Vienna, Austria
38. Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, Bor-Yiing Su, Scaling Distributed Machine Learning with the Parameter Server, Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, October 06-08, 2014, Broomfield, CO
39. H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, Jeremy Kubica, Ad Click Prediction: A View from the Trenches, Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 11-14, 2013, Chicago, Illinois, USA
40. Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Zadeh, Matei Zaharia, Ameet Talwalkar, MLlib: Machine Learning in Apache Spark, The Journal of Machine Learning Research, v.17 n.1, p.1235-1241, January 2016
41. Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, Recurrent Models of Visual Attention, Proceedings of the 27th International Conference on Neural Information Processing Systems, p.2204-2212, December 08-13, 2014, Montreal, Canada
42. Deep MNIST for Experts. https://www.tensorflow.org/versions/r0.10/tutorials/mnist/pros/index.html.
43. Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012
44. John Nagle, Congestion Control in IP/TCP Internetworks, ACM SIGCOMM Computer Communication Review, v.14 n.4, p.11-17, October 1984
45. nginx [engine x]. http://nginx.org/en/.
46. Vivek S. Pai, Peter Druschel, Willy Zwaenepoel, Flash: An Efficient and Portable Web Server, Proceedings of the Annual Conference on USENIX Annual Technical Conference, p.15-15, June 06-11, 1999, Monterey, California
47. Portable Format for Analytics (PFA). http://dmg.org/pfa/index.html.
48. PMML 4.2. http://dmg.org/pmml/v4-2-1/GeneralStructure.html.
49. (Russakovsky et al., 2015) ⇒ Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. (2015). “ImageNet Large Scale Visual Recognition Challenge.” In: International Journal of Computer Vision (IJCV). DOI:10.1007/s11263-015-0816-y.
50. Douglas C. Schmidt, Reactor: An Object Behavioral Pattern for Concurrent Event Demultiplexing and Event Handler Dispatching, Pattern Languages of Program Design, ACM Press/Addison-Wesley Publishing Co., New York, NY, 1995
51. scikit-learn: Machine Learning in Python. http://scikit-learn.org.
52. D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison, Hidden Technical Debt in Machine Learning Systems, Proceedings of the 28th International Conference on Neural Information Processing Systems, p.2503-2511, December 07-12, 2015, Montreal, Canada
53. J. Sill, G. Takács, L. Mackey, and D. Lin. Feature-weighted Linear Stacking, 2009.
54. K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-scale Image Recognition. arXiv preprint arXiv:1409.1556, 2014.
55. Apple Siri. http://www.apple.com/ios/siri/.
56. Skype Real Time Translator. https://www.skype.com/en/features/skype-translator/.
57. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going Deeper with Convolutions. In CVPR, pages 1-9, 2015.
58. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception Architecture for Computer Vision. arXiv preprint arXiv:1512.00567, 2015.
59. TensorFlow Serving. https://tensorflow.github.io/serving.
60. Turi. https://turi.com.
61. Matt Welsh, David Culler, Eric Brewer, SEDA: An Architecture for Well-conditioned, Scalable Internet Services, Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, October 21-24, 2001, Banff, Alberta, Canada
62. Eric P. Xing, Qirong Ho, Wei Dai, Jin-Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu, Petuum: A New Platform for Distributed Machine Learning on Big Data, Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 10-13, 2015, Sydney, NSW, Australia
63. S. J. Young, G. Evermann, M. J. F. Gales, T. Hain, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. C. Woodland. The HTK Book, Version 3.4. Cambridge University Engineering Department, Cambridge, UK, 2006.
64. Jeong-Min Yun, Yuxiong He, Sameh Elnikety, Shaolei Ren, Optimal Aggregation Policy for Reducing Tail Latency of Web Search, Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, August 09-13, 2015, Santiago, Chile
65. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica, Resilient Distributed Datasets: A Fault-tolerant Abstraction for in-memory Cluster Computing, Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, April 25-27, 2012, San Jose, CA.

	Author	title	year
2017 ClipperALowLatencyOnlinePredict	Ion Stoica, Michael J. Franklin, Daniel Crankshaw, Xin Wang, Giulio Zhou, Joseph E. Gonzalez	Clipper: A Low-latency Online Prediction Serving System	2017
