Uber Big Data Platform

From GM-RKB

An Uber Big Data Platform is an in-house Big Data platform used by Uber, Inc.



References

2018

  • Reza Shiftehfar. (2018). "Uber’s Big Data Platform: 100+ Petabytes with Minute Latency." Blog post, October 17, 2018
    • QUOTE: ... Over time, the need for more insights has resulted in over 100 petabytes of analytical data that needs to be cleaned, stored, and served with minimum latency through our Hadoop-based Big Data platform. Since 2014, we have worked to develop a Big Data solution that ensures data reliability, scalability, and ease-of-use, and are now focusing on increasing our platform’s speed and efficiency. In this article, we dive into Uber’s Hadoop platform journey and discuss what we are building next to expand this rich and complex ecosystem. ...

      ... By early 2017, our Big Data platform was used by engineering and operations teams across the company, enabling them to access new and historical data all in one place. Users could easily access data in Hive, Presto, Spark, Vertica, Notebook, and more warehouse options all through a single UI portal tailored to their needs. With over 100 petabytes of data in HDFS, 100,000 vcores in our compute cluster, 100,000 Presto queries per day, 10,000 Spark jobs per day, and 20,000 Hive queries per day, our Hadoop analytics architecture was hitting scalability limitations and many services were affected by high data latency.