AWS Glue Service

From GM-RKB

An AWS Glue Service is a fully managed, serverless AWS ETL service.



References

2019

  • https://aws.amazon.com/glue/faqs/#AWS_Glue_Data_Catalog/
    • QUOTE: ... Q. What are the main components of AWS Glue?

      AWS Glue consists of a Data Catalog which is a central metadata repository, an ETL engine that can automatically generate Scala or Python code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. Together, these automate much of the undifferentiated heavy lifting involved with discovering, categorizing, cleaning, enriching, and moving data, so you can spend more time analyzing your data. …
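
As a rough illustration of how these three components surface in the AWS Glue API, the sketch below only builds the keyword arguments for the boto3 Glue client's `create_crawler`, `create_job`, and `create_trigger` calls. Because it makes no AWS calls, it runs without credentials. All resource names (role ARN, database, bucket, job and trigger names) are hypothetical placeholders. A crawler populates the Data Catalog. `MaxRetries` asks Glue to retry a failed job run. A `CONDITIONAL` trigger expresses a dependency by starting one job only after another has succeeded.

```python
"""Build request payloads for the boto3 Glue client (no AWS calls are made).

All resource names below are hypothetical placeholders.
"""
from typing import Any, Dict

ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleGlueRole"  # hypothetical IAM role


def crawler_request(name: str, database: str, s3_path: str) -> Dict[str, Any]:
    """Crawler that infers table definitions under s3_path into the Data Catalog."""
    return {
        "Name": name,
        "Role": ROLE_ARN,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def job_request(name: str, script_location: str, max_retries: int = 2) -> Dict[str, Any]:
    """Spark ETL job; Glue retries a failed run up to max_retries times."""
    return {
        "Name": name,
        "Role": ROLE_ARN,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "MaxRetries": max_retries,
    }


def dependency_trigger_request(name: str, upstream_job: str, downstream_job: str) -> Dict[str, Any]:
    """Conditional trigger: start downstream_job once upstream_job has succeeded."""
    return {
        "Name": name,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {
            "Logical": "AND",
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": upstream_job, "State": "SUCCEEDED"}
            ],
        },
        "Actions": [{"JobName": downstream_job}],
    }


if __name__ == "__main__":
    import json

    requests = {
        "create_crawler": crawler_request("raw-orders-crawler", "sales_db", "s3://example-bucket/raw/orders/"),
        "create_job": job_request("clean-orders", "s3://example-bucket/scripts/clean_orders.py"),
        "create_trigger": dependency_trigger_request("after-clean-orders", "clean-orders", "load-orders"),
    }
    print(json.dumps(requests, indent=2))
```

With credentials configured, each dict would be passed as keyword arguments, for example `boto3.client("glue").create_crawler(**crawler_request(...))`. Keeping the payloads as plain data makes them easy to review before anything is created.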

2017

  • https://console.aws.amazon.com/glue/home?region=us-east-1#get-started:
    • Build your AWS Glue Data Catalog: AWS Glue automatically stores metadata in a central data catalog. It can create table definitions for many common data stores, including S3 buckets, web logs, and AWS databases. AWS Glue recognizes, infers, organizes, and classifies your data.
    • Generate and edit transformations: PySpark transformation scripts are auto-generated using source and target metadata. You can store customized versions to transform your data to meet your business needs. AWS Glue provides an environment to modify your jobs.
    • Schedule and run your jobs: AWS Glue runs your ETL jobs in a serverless environment. You don't need to set up the infrastructure; you just use Amazon's infrastructure and pay for the resources you use. You can define triggers to run jobs based on a schedule or event. AWS Glue enables you to monitor your jobs.
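
The generated scripts described above follow a recognizable pattern. They read a source table from the Data Catalog, map its columns to the target schema, and write the result. The sketch below is a hypothetical generator, not AWS's own code generator, that renders such a skeleton from source and target metadata. The emitted script uses the `awsglue` library (`GlueContext`, `ApplyMapping`), which is normally available only inside a Glue job environment. For that reason the script is produced here as text and only syntax-checked. The database, table, column, and bucket names are placeholders.

```python
"""Render a Glue-style PySpark ETL script from source/target metadata.

Hypothetical helper for illustration; it only produces text.
"""
from typing import List, Tuple

# (source_column, source_type, target_column, target_type): the tuple shape ApplyMapping expects
Mapping = Tuple[str, str, str, str]

TEMPLATE = '''import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: table definition previously stored in the Data Catalog
source = glue_context.create_dynamic_frame.from_catalog(
    database={database!r}, table_name={table!r})

# Transform: rename/cast columns to the target schema
mapped = ApplyMapping.apply(frame=source, mappings={mappings!r})

# Target: write to S3 in the requested format
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={{"path": {target_path!r}}},
    format={fmt!r})

job.commit()
'''


def generate_script(database: str, table: str, mappings: List[Mapping],
                    target_path: str, fmt: str = "parquet") -> str:
    """Fill the template; repr() keeps the emitted literals valid Python."""
    return TEMPLATE.format(database=database, table=table, mappings=mappings,
                           target_path=target_path, fmt=fmt)


if __name__ == "__main__":
    script = generate_script(
        database="sales_db",
        table="raw_orders",
        mappings=[("order_id", "string", "order_id", "long"),
                  ("amt", "string", "amount", "double")],
        target_path="s3://example-bucket/clean/orders/",
    )
    compile(script, "generated_job.py", "exec")  # syntax check only; nothing is imported or run
    print(script)
```

Generating the script from metadata, rather than writing it by hand, mirrors the workflow the console describes. The output is a starting point that can then be customized and stored.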
