AWS Data Pipeline
An AWS Data Pipeline is a data transfer service that moves data between AWS compute services and AWS storage services.
- …
- Counter-Example(s):
- See: Amazon SimpleDB, Amazon DynamoDB, NoSQL, Solid-State Drive, Memcached, Redis, Amazon Relational Database Service.
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Amazon_Web_Services#Database Retrieved:2015-8-21.
- AWS Data Pipeline provides a reliable service for data transfer between different AWS compute and storage services (e.g., Amazon S3, Amazon RDS, Amazon DynamoDB, Amazon EMR). In other words, this service is a data-driven workload management system that provides a simple management API for managing and monitoring data-driven workloads in cloud applications.
- http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html
- QUOTE: AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data. With AWS Data Pipeline, you can define data-driven workflows, so that tasks can be dependent on the successful completion of previous tasks. You define the parameters of your data transformations and AWS Data Pipeline enforces the logic that you've set up.
The following components of AWS Data Pipeline work together to manage your data:
- A pipeline definition specifies the business logic of your data management.
- A pipeline schedules and runs tasks. You upload your pipeline definition to the pipeline, and then activate the pipeline. You can edit the pipeline definition for a running pipeline and activate the pipeline again for it to take effect. You can deactivate the pipeline, modify a data source, and then activate the pipeline again. When you are finished with your pipeline, you can delete it.
- Task Runner polls for tasks and then performs those tasks. For example, Task Runner could copy log files to Amazon S3 and launch Amazon EMR clusters. Task Runner is installed and runs automatically on resources created by your pipeline definitions. You can write a custom task runner application, or you can use the Task Runner application that is provided by AWS Data Pipeline.
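The pipeline definition and activation lifecycle described above can be sketched in Python using the boto3 `datapipeline` client. This is a minimal illustration, not AWS's reference code: the object IDs, names, and field values below are illustrative assumptions, and only the local construction of the definition is exercised here.

```python
# Minimal sketch of an AWS Data Pipeline definition, expressed in the
# pipelineObjects shape accepted by put_pipeline_definition.
# All IDs, names, and field values are illustrative assumptions.

pipeline_objects = [
    {
        "id": "Default",
        "name": "Default",
        "fields": [
            {"key": "scheduleType", "stringValue": "ONDEMAND"},
            {"key": "failureAndRerunMode", "stringValue": "CASCADE"},
        ],
    },
    {
        "id": "CopyLogsToS3",  # hypothetical activity, e.g. copying logs to Amazon S3
        "name": "CopyLogsToS3",
        "fields": [
            {"key": "type", "stringValue": "CopyActivity"},
            {"key": "input", "refValue": "SourceData"},       # hypothetical data node
            {"key": "output", "refValue": "DestinationBucket"},  # hypothetical data node
        ],
    },
]

# With AWS credentials configured, the definition could be uploaded and the
# pipeline activated roughly like this (not executed in this sketch):
#
#   import boto3
#   client = boto3.client("datapipeline")
#   pipeline_id = client.create_pipeline(
#       name="log-copy", uniqueId="log-copy-1")["pipelineId"]
#   client.put_pipeline_definition(
#       pipelineId=pipeline_id, pipelineObjects=pipeline_objects)
#   client.activate_pipeline(pipelineId=pipeline_id)

# Inspect the activity object locally.
activity = next(o for o in pipeline_objects if o["id"] == "CopyLogsToS3")
print(activity["fields"][0]["stringValue"])  # → CopyActivity
```

Once activated, Task Runner (or a custom task runner) polls the pipeline for the tasks this definition generates and executes them.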