
The blueprint for Enterprise Hadoop includes Apache™ Hadoop's original data storage and data processing layers, and adds components for the services that enterprises need in a modern data architecture: data integration and governance, security, and operations. Apache Oozie provides some of the operational services for a Hadoop cluster, specifically job scheduling within the cluster.

What Oozie Does

Apache Oozie is a Java web application used to schedule Apache Hadoop jobs. Oozie combines multiple jobs sequentially into one logical unit of work. It is integrated with the Hadoop stack, with YARN as its architectural center, and supports Hadoop jobs for Apache MapReduce, Apache Pig, Apache Hive, and Apache Sqoop. Oozie can also schedule jobs specific to a system, like Java programs or shell scripts.

Apache Oozie is a tool for Hadoop operations that allows cluster administrators to build complex data transformations out of multiple component tasks. Oozie Bundle provides a way to package multiple coordinator and workflow jobs and to manage the lifecycle of those jobs.

An Oozie Workflow is a collection of actions arranged in a Directed Acyclic Graph (DAG). Control nodes define job chronology, setting rules for beginning and ending a workflow. Decision, fork, and join nodes are also control nodes, and through them Oozie controls the workflow execution path. Action nodes trigger the execution of tasks.

Oozie triggers workflow actions, but Hadoop MapReduce executes them. This allows Oozie to leverage other capabilities within the Hadoop stack to balance loads and handle failures.

Oozie detects completion of tasks through callback and polling. When Oozie starts a task, it provides a unique callback HTTP URL to the task, and the task notifies that URL when it completes. If the task fails to invoke the callback URL, Oozie can poll the task for completion.

Workflows often need to run at regular time intervals while also being coordinated with data availability or events that arrive unpredictably.
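The callback-then-polling approach to detecting task completion can be illustrated with a small sketch. This is not Oozie code: the `Task` class, the `wait_for_completion` function, and the timeout values are hypothetical names chosen for illustration, and an in-process Python callable stands in for the unique callback HTTP URL that Oozie hands to a task.

```python
import threading
import time
from typing import Callable


class Task:
    """Hypothetical stand-in for a task launched by a scheduler."""

    def __init__(self, duration: float, invokes_callback: bool = True) -> None:
        self.duration = duration                  # seconds of simulated work
        self.invokes_callback = invokes_callback  # False simulates a lost callback
        self._done = threading.Event()

    def start(self, callback: Callable[[], None]) -> None:
        """Run the task in the background and notify the callback when finished."""
        def run() -> None:
            time.sleep(self.duration)
            self._done.set()
            if self.invokes_callback:
                callback()

        threading.Thread(target=run, daemon=True).start()

    def is_done(self) -> bool:
        """Status check used by the polling fallback."""
        return self._done.is_set()


def wait_for_completion(
    task: Task,
    callback_timeout: float = 0.2,
    poll_interval: float = 0.05,
    max_polls: int = 100,
) -> str:
    """Return "callback" or "polling", depending on how completion was detected."""
    notified = threading.Event()
    task.start(callback=notified.set)

    # Preferred path: the task reports its own completion through the callback.
    if notified.wait(timeout=callback_timeout):
        return "callback"

    # Fallback path: the callback never arrived, so ask the task directly.
    for _ in range(max_polls):
        if task.is_done():
            return "polling"
        time.sleep(poll_interval)
    raise TimeoutError("task did not complete within the polling budget")
```

With these assumed defaults, a task that invokes its callback is detected through the callback path, while a task created with `invokes_callback=False` is detected only once the scheduler falls back to polling.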
