Apache Oozie Platform

From GM-RKB

An Apache Oozie Platform is a workflow orchestration platform for scheduling and managing Apache Hadoop jobs.



References

2015

  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Oozie Retrieved:2015-7-15.
    • Oozie is a workflow scheduler system to manage Hadoop jobs. It is a server-based Workflow Engine specialized in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs. Oozie is implemented as a Java Web-Application that runs in a Java Servlet-Container.

      For the purposes of Oozie, a workflow is a collection of actions (e.g. Hadoop Map/Reduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph). A "control dependency" from one action to another means that the second action can't run until the first action has completed.

      The workflow actions start jobs in remote systems (Hadoop or Pig). Upon action completion, the remote systems call back Oozie to notify the action completion; at this point Oozie proceeds to the next action in the workflow.

      Oozie workflows contain control flow nodes and action nodes.

      Control flow nodes define the beginning and the end of a workflow (start, end and fail nodes) and provide a mechanism to control the workflow execution path (decision, fork and join nodes).

      Action nodes are the mechanism by which a workflow triggers the execution of a computation/processing task. Oozie provides support for different types of actions: Hadoop MapReduce, Hadoop file system, Pig, SSH, HTTP, eMail and Oozie sub-workflow. Oozie can be extended to support additional types of actions.

      Oozie workflows can be parameterized (using variables like ${inputDir} within the workflow definition). When submitting a workflow job, values for the parameters must be provided. If properly parameterized (using different output directories), several identical workflow jobs can run concurrently.

      Oozie is distributed under the Apache License 2.0.
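
The control-dependency DAG and the ${...} parameterization described in the passage above can be illustrated with a small sketch. This is not Oozie's API or its workflow definition format; it only models the two ideas in Python. The action names, command strings, and the helper functions execution_order and resolve_parameters are hypothetical, chosen for illustration.

```python
from graphlib import TopologicalSorter
from string import Template

# Hypothetical workflow: each action maps to the actions that must complete first.
CONTROL_DEPENDENCIES = {
    "ingest": set(),
    "clean_logs": {"ingest"},      # fork: both cleaning actions follow ingest
    "clean_events": {"ingest"},
    "report": {"clean_logs", "clean_events"},  # join: waits for both branches
}

# Hypothetical action definitions that use ${...} parameters.
ACTION_TEMPLATES = {
    "ingest": "copy ${inputDir} to ${outputDir}/raw",
    "clean_logs": "clean ${outputDir}/raw/logs",
    "clean_events": "clean ${outputDir}/raw/events",
    "report": "summarize ${outputDir} into ${outputDir}/report",
}


def execution_order(dependencies):
    """Return one valid order in which every action follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


def resolve_parameters(templates, parameters):
    """Fill in ${name} placeholders; a missing parameter raises KeyError."""
    return {
        name: Template(text).substitute(parameters)
        for name, text in templates.items()
    }


if __name__ == "__main__":
    params = {"inputDir": "/data/in", "outputDir": "/data/out/run1"}
    commands = resolve_parameters(ACTION_TEMPLATES, params)
    for action in execution_order(CONTROL_DEPENDENCIES):
        print(action, "->", commands[action])
```

Calling resolve_parameters twice with different outputDir values yields two independent sets of commands, which mirrors the point that identical workflow jobs can run concurrently when they write to different output directories.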

2013

  • http://oozie.apache.org/
    • Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

      Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions.

      Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.

      Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).

      Oozie is a scalable, reliable and extensible system.
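
The idea of a coordinator job triggered by time (frequency) and data availability can be sketched as follows. This is a conceptual model only, not Oozie's coordinator implementation or configuration format. The functions due_runs and triggerable_runs, the /data/YYYY-MM-DD path layout, and the set of available paths are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def due_runs(start, frequency, now):
    """Return every scheduled time from start up to and including now."""
    runs = []
    current = start
    while current <= now:
        runs.append(current)
        current += frequency
    return runs


def triggerable_runs(start, frequency, now, input_path_for, available_paths):
    """Return the due runs whose input data is already available."""
    return [
        run_time
        for run_time in due_runs(start, frequency, now)
        if input_path_for(run_time) in available_paths
    ]


def daily_input_path(run_time):
    """Hypothetical dataset layout: one directory per day."""
    return f"/data/{run_time:%Y-%m-%d}"


if __name__ == "__main__":
    start = datetime(2013, 1, 1)
    now = datetime(2013, 1, 3, 12, 0)
    available = {"/data/2013-01-01", "/data/2013-01-03"}
    for run_time in triggerable_runs(
        start, timedelta(days=1), now, daily_input_path, available
    ):
        print("trigger workflow for", run_time.date())
```

In this sketch, a run whose input is missing is simply left out of the result; checking again later with an updated set of available paths would pick it up once the data arrives.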