Big Data Platform

From GM-RKB
 
 
** a [[Small Data Platform]].
** a [[Machine Learning Platform]].
* <B>See:</B> [[Spark Framework]], [[Big Data Architecture]].
 
----
 

Latest revision as of 05:16, 4 December 2019

A Big Data Platform is a data platform for Big Data datasets that supports Big Data tasks.



References

2018

  • Reza Shiftehfar. (2018). "Uber’s Big Data Platform: 100+ Petabytes with Minute Latency." Blog post, October 17, 2018.
    • QUOTE: ... Over time, the need for more insights has resulted in over 100 petabytes of analytical data that needs to be cleaned, stored, and served with minimum latency through our Hadoop-based Big Data platform. Since 2014, we have worked to develop a Big Data solution that ensures data reliability, scalability, and ease-of-use, and are now focusing on increasing our platform’s speed and efficiency. In this article, we dive into Uber’s Hadoop platform journey and discuss what we are building next to expand this rich and complex ecosystem. ...

      ... By early 2017, our Big Data platform was used by engineering and operations teams across the company, enabling them to access new and historical data all in one place. Users could easily access data in Hive, Presto, Spark, Vertica, Notebook, and more warehouse options all through a single UI portal tailored to their needs. With over 100 petabytes of data in HDFS, 100,000 vcores in our compute cluster, 100,000 Presto queries per day, 10,000 Spark jobs per day, and 20,000 Hive queries per day, our Hadoop analytics architecture was hitting scalability limitations and many services were affected by high data latency.
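To give a sense of the scale described in the quote, the following back-of-the-envelope sketch converts the quoted early-2017 figures into average per-second rates and stored data per vcore. It uses only the numbers quoted above and assumes decimal units (1 PB = 10^15 bytes) and a workload spread evenly over 24 hours. Real query traffic is bursty, so peak rates would be higher. The names `daily_workload`, `per_second`, and `summarize` are illustrative only.

```python
# Back-of-the-envelope scale figures for Uber's early-2017 Hadoop platform,
# using only the numbers quoted in the 2018 blog post above.
# Assumptions: decimal units (1 PB = 10**15 bytes) and each daily count
# spread uniformly over a 24-hour day (real traffic is bursty).

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

hdfs_bytes = 100 * 10**15  # "over 100 petabytes of data in HDFS" (lower bound)
vcores = 100_000           # "100,000 vcores in our compute cluster"

# Quoted daily workload per engine.
daily_workload = {
    "presto_queries": 100_000,
    "spark_jobs": 10_000,
    "hive_queries": 20_000,
}


def per_second(count_per_day: int) -> float:
    """Average rate if the daily count were spread evenly over the day."""
    return count_per_day / SECONDS_PER_DAY


def summarize() -> dict:
    """Combine the per-engine counts and relate stored data to compute."""
    total = sum(daily_workload.values())
    return {
        "total_per_day": total,
        "total_per_second": per_second(total),
        "bytes_per_vcore": hdfs_bytes // vcores,
    }


if __name__ == "__main__":
    for name, count in daily_workload.items():
        print(f"{name}: {count:,}/day, about {per_second(count):.2f}/s")
    s = summarize()
    print(f"all workloads: {s['total_per_day']:,}/day, "
          f"about {s['total_per_second']:.2f}/s")
    print(f"stored data per vcore: {s['bytes_per_vcore'] / 10**12:.0f} TB")
```

Run as a script, this reports about 1.16 Presto queries, 0.12 Spark jobs, and 0.23 Hive queries per second (about 1.50/s combined), and roughly 1 TB of stored data per vcore. Because the post says "over 100 petabytes", the per-vcore figure is a lower bound.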