2017 DataIngestionfortheConnectedWor

From GM-RKB
(Redirected from Meehan et al., 2017)

Subject Headings: Data Ingestion Task, Data Ingestion System.

Notes

Cited By

Quotes

Abstract

In this paper, we argue that in many "Big Data" applications, getting data into the system correctly and at scale via traditional ETL (Extract, Transform, and Load) processes is a fundamental roadblock to being able to perform timely analytics or make real-time decisions. The best way to address this problem is to build a new architecture for ETL which takes advantage of the push-based nature of a stream processing system. We discuss the requirements for a streaming ETL engine and describe a generic architecture which satisfies those requirements. We also describe our implementation of streaming ETL using a scalable messaging system (Apache Kafka), a transactional stream processing system (S-Store), and a distributed polystore (Intel's BigDAWG), as well as propose a new time-series database optimized to handle ingestion internally.
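As a rough illustration of the push-based idea the abstract refers to, the sketch below transforms and loads each record the moment the source delivers it, rather than collecting a batch first. It is not the authors' implementation and does not use Apache Kafka, S-Store, or BigDAWG; the names `StreamingETL`, `on_record`, and `clean_reading`, and the sensor record layout, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Record = dict


@dataclass
class StreamingETL:
    """Hypothetical push-based pipeline: every arriving record is
    transformed and loaded immediately, with no batch window."""
    transform: Callable[[Record], Optional[Record]]
    store: list = field(default_factory=list)     # stand-in for the target database
    rejected: list = field(default_factory=list)  # records that failed cleaning

    def on_record(self, raw: Record) -> None:
        # The source calls this for each new record (push),
        # instead of the pipeline polling for a batch (pull).
        cleaned = self.transform(raw)
        if cleaned is None:
            self.rejected.append(raw)
        else:
            self.store.append(cleaned)


def clean_reading(raw: Record) -> Optional[Record]:
    # Assumed cleaning rule: require a sensor id and a numeric temperature.
    if not raw.get("sensor_id"):
        return None
    try:
        temp = float(raw["temp_c"])
    except (KeyError, TypeError, ValueError):
        return None
    return {"sensor_id": raw["sensor_id"], "temp_c": temp}


etl = StreamingETL(transform=clean_reading)

# Simulated message stream; a real deployment would receive these from a message queue.
for msg in [
    {"sensor_id": "a1", "temp_c": "21.5"},
    {"temp_c": "19.0"},
    {"sensor_id": "b2", "temp_c": "n/a"},
]:
    etl.on_record(msg)
```

After each call to `on_record`, a clean record is already in `etl.store` and available for querying, which is the property that separates this style from a periodic batch load.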

References

2017 DataIngestionfortheConnectedWor
Author: John Meehan, Cansu Aslantas, Stan Zdonik, Nesime Tatbul, Jiang Du
Title: Data Ingestion for the Connected World.