Apache Airflow Workflow Management Platform

From GM-RKB

An Apache Airflow Workflow Management Platform is an open-source automated workflow management platform.



References

2022

  • HTTP://chat.openai.com
    • A: ... Apache Airflow is a platform for managing and scheduling workflows. It is designed to allow users to define, schedule, and monitor the execution of workflows, typically in the context of data processing pipelines or other batch processing tasks. It includes a web-based interface for managing and monitoring workflows, as well as a rich set of features for defining and scheduling tasks, including support for dependencies and error handling. ...

      Overall, Apache Airflow and Zapier are both useful tools for managing and automating workflows, but they have different capabilities and are intended for different use cases. Apache Airflow is better suited for complex, data-intensive workflows, while Zapier is more focused on automating tasks and processes across multiple web-based applications and services.

2021

  • https://airflow.apache.org/docs/apache-airflow/2.1.0/
    • QUOTE: Airflow is a platform to programmatically author, schedule and monitor workflows.

      Use Airflow to author workflows as Directed Acyclic Graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.

      When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.
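The idea quoted above, a workflow authored as a DAG of tasks that run in dependency order, can be sketched in plain Python. This is an illustration only and does not use Airflow's API: the `run_workflow` helper and the task names (`extract`, `transform`, `load`, `report`) are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the scheduler.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task once, after all of its upstream tasks have run.

    tasks: dict mapping task name -> zero-argument callable.
    dependencies: dict mapping task name -> set of upstream task names.
    Returns the task names in the order they were executed.
    """
    # Include every task, even those with no upstream dependencies.
    graph = {name: dependencies.get(name, set()) for name in tasks}
    order = []
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical four-task pipeline; each task only records that it ran.
log = []
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
    "report": lambda: log.append("report"),
}
dependencies = {
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}
executed = run_workflow(tasks, dependencies)
```

Because the workflow is ordinary code, it can be versioned and unit-tested like any other module. In this sketch `load` and `report` both depend only on `transform`, so their relative order is not fixed by the dependencies.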

2017a

  • https://airflow.incubator.apache.org/
    • QUOTE: Airflow is a platform to programmatically author, schedule and monitor workflows.

      Use airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed. When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.

      Airflow is not a data streaming solution. Tasks do not move data from one to the other (though tasks can exchange metadata!). Airflow is not in the Spark Streaming or Storm space, it is more comparable to Oozie or Azkaban. Workflows are expected to be mostly static or slowly changing. You can think of the structure of the tasks in your workflow as slightly more dynamic than a database structure would be. Airflow workflows are expected to look similar from a run to the next, this allows for clarity around unit of work and continuity.

2017b

  • https://medium.com/airbnb-engineering/airflow-a-workflow-management-platform-46318b977fd8
    • QUOTE: … Architecture

      … While you can get up and running with Airflow in just a few commands, the complete architecture has the following components:

      • The job definitions, in source control.
      • A rich CLI (command line interface) to test, run, backfill, describe and clear parts of your DAGs.
      • A web application, to explore your DAGs definition, their dependencies, progress, metadata and logs. The web server is packaged with Airflow and is built on top of the Flask Python web framework.
      • A metadata repository, typically a MySQL or Postgres database that Airflow uses to keep track of task job statuses and other persistent information.
      • An array of workers, running the jobs task instances in a distributed fashion.
      • Scheduler processes, that fire up the task instances that are ready to run.
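How the last three components in the list above fit together can be sketched as a toy, single-process model. It is not Airflow's implementation: an in-memory SQLite table stands in for the metadata repository, a thread pool for the array of workers, and a loop for the scheduler. The `run_dag` function, the `task_instance` table layout, and the state names used here are assumptions made for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def run_dag(tasks, upstream, max_workers=2):
    """Toy scheduler: run tasks on a worker pool, tracking state in a database.

    tasks: dict mapping task id -> zero-argument callable.
    upstream: dict mapping task id -> set of task ids it depends on.
    Returns a dict mapping task id -> final state.
    """
    # Metadata repository: one row per task instance with its current state.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE task_instance (task_id TEXT PRIMARY KEY, state TEXT)")
    db.executemany(
        "INSERT INTO task_instance VALUES (?, 'queued')", [(t,) for t in tasks]
    )

    def ids_in_state(state):
        rows = db.execute(
            "SELECT task_id FROM task_instance WHERE state = ?", (state,)
        )
        return {row[0] for row in rows}

    with ThreadPoolExecutor(max_workers=max_workers) as workers:
        while ids_in_state("queued"):
            done = ids_in_state("success")
            # A task is ready once every upstream task has succeeded.
            ready = sorted(
                t for t in ids_in_state("queued") if upstream.get(t, set()) <= done
            )
            if not ready:
                # Nothing runnable: an upstream task failed or there is a cycle.
                break
            # Hand the ready tasks to the workers, then record each outcome.
            futures = {t: workers.submit(tasks[t]) for t in ready}
            for task_id, future in futures.items():
                try:
                    future.result()
                    state = "success"
                except Exception:
                    state = "failed"
                db.execute(
                    "UPDATE task_instance SET state = ? WHERE task_id = ?",
                    (state, task_id),
                )

    result = dict(db.execute("SELECT task_id, state FROM task_instance"))
    db.close()
    return result


# Hypothetical three-step pipeline whose tasks do nothing.
states = run_dag(
    tasks={"extract": lambda: None, "transform": lambda: None, "load": lambda: None},
    upstream={"transform": {"extract"}, "load": {"transform"}},
)
```

In this sketch only the scheduler loop touches the database, and tasks downstream of a failure are simply left in the `queued` state. A real deployment separates these roles into distinct processes and persists the metadata in a MySQL or Postgres database, as the quoted list describes.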