Azkaban

Azkaban from LinkedIn

Solve the problem of Hadoop job dependencies

Now Voldemort can easily manage his Hadoop jobs

Anatoliy Nikulin

Overview

Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs.
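Handling job dependencies comes down to starting each job only after everything it depends on has finished. As an illustration of that idea (a sketch, not Azkaban's own code), the snippet below orders a small flow with Python's standard `graphlib` module. The job names and the `execution_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical flow: each job maps to the set of jobs it depends on.
flow = {
    "import_logs": set(),
    "import_users": set(),
    "join_data": {"import_logs", "import_users"},
    "build_report": {"join_data"},
}


def execution_order(dependencies):
    """Return one order in which every job runs after all of its dependencies.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    since such a flow could never be executed.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(execution_order(flow))
```

`import_logs` and `import_users` do not depend on each other, so a scheduler is free to run them in parallel; `TopologicalSorter` also offers `get_ready()` and `done()` for that style of incremental execution.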

Features:
● Compatible with any version of Hadoop
● Easy-to-use web UI
● Simple web and HTTP workflow uploads
● Project workspaces
● Scheduling of workflows
● Modular and pluggable
● Authentication and authorization
● Tracking of user actions
● Email alerts on failure and success
● SLA alerting and auto-killing
● Retrying of failed jobs
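Retrying failed jobs can be pictured as a loop around a job's exit code. The sketch below only illustrates the concept and is not Azkaban's implementation. The `run_with_retries` helper is hypothetical; its `retries` and `backoff_ms` parameters loosely mirror Azkaban's per-job `retries` and `retry.backoff` properties.

```python
import time
from typing import Callable


def run_with_retries(job: Callable[[], int], retries: int, backoff_ms: int) -> int:
    """Run `job` and retry it up to `retries` extra times while it fails.

    Hypothetical helper: `job` returns an exit code (0 means success).
    Sleeps `backoff_ms` milliseconds between attempts and returns the
    exit code of the last attempt.
    """
    exit_code = job()
    attempt = 0
    while exit_code != 0 and attempt < retries:
        attempt += 1
        time.sleep(backoff_ms / 1000)  # wait before the next attempt
        exit_code = job()
    return exit_code


if __name__ == "__main__":
    outcomes = iter([1, 1, 0])  # hypothetical job: fails twice, then succeeds
    print(run_with_retries(lambda: next(outcomes), retries=3, backoff_ms=10))
```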

Plugins

Azkaban Plugins

● HDFS Browser
● Job Types Plugins
● Azkaban Security Manager
● Job Summary
● Pig Visualizer
● Reportal

Azkaban Pros/Cons

Pros:
● Simple workflow configuration
● Rich DAG visualization
● User-friendly web UI
● Job history
● Easy access to log files
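To show what simple workflow configuration looks like in practice: in Azkaban's classic format a flow is a set of `.job` files holding `key=value` properties, where `type=command` runs a shell command and `dependencies` names the jobs that must finish first. The files are zipped and uploaded as a project. The sketch below builds such a zip in memory; the job names, commands, and helper functions are made up for the example.

```python
import io
import zipfile

# Hypothetical two-step flow in the classic ".job" properties format:
# one file per job; "dependencies" lists the jobs that must finish first.
JOBS = {
    "extract": {"type": "command", "command": "echo extracting"},
    "report": {
        "type": "command",
        "command": "echo reporting",
        "dependencies": "extract",
    },
}


def job_file_text(props):
    """Render one job's properties as key=value lines."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def build_project_zip(jobs):
    """Pack every job into <name>.job inside an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, props in jobs.items():
            archive.writestr(f"{name}.job", job_file_text(props))
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_project_zip(JOBS)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            print(f"--- {name}")
            print(archive.read(name).decode(), end="")
```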

Cons:
● Small community (mostly LinkedIn)
● Only time-based scheduling
● Unable to run non-Hadoop tasks in distributed mode

Architecture

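There are two modes:
● Solo server mode: everything runs in one process, with an embedded H2 database instead of MySQL. A good choice for evaluation.
● Two server mode: for production use, with a separate web server and executor server sharing a MySQL database.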

What about non-Hadoop jobs?

Azkaban can handle them:
● It can run command-line processes
● It is a good alternative to crontab
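Moving a crontab entry to Azkaban mostly means keeping the command and dropping the time fields, because the timing is configured when the flow is scheduled in Azkaban rather than in the job file. The hypothetical helpers below (`parse_crontab_line`, `to_command_job`) sketch that split for standard five-field entries; special strings such as `@daily` are deliberately rejected.

```python
from dataclasses import dataclass


@dataclass
class CronEntry:
    schedule: str  # the five time fields, e.g. "0 2 * * *"
    command: str   # the shell command to run


def parse_crontab_line(line: str) -> CronEntry:
    """Split a standard five-field crontab line into schedule and command.

    Hypothetical helper; "@daily"-style shortcuts are not supported.
    """
    fields = line.strip().split(None, 5)  # at most 6 parts; command keeps its spaces
    if len(fields) < 6:
        raise ValueError(f"not a five-field crontab entry: {line!r}")
    return CronEntry(schedule=" ".join(fields[:5]), command=fields[5])


def to_command_job(entry: CronEntry) -> str:
    """Render the command as a command-type .job file.

    The schedule is intentionally left out: it would be set when
    scheduling the flow in Azkaban, not inside the job file.
    """
    return f"type=command\ncommand={entry.command}\n"


if __name__ == "__main__":
    entry = parse_crontab_line("0 2 * * * /opt/scripts/backup.sh --full")
    print(entry.schedule)
    print(to_command_job(entry), end="")
```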

Nice UI, isn’t it?

What about a native Hadoop scheduler?

Oozie: a workflow scheduler framework for Hadoop. Also a good tool.

Pros:
● Rich and very powerful workflow configuration
● Rich API (REST, command line)
● Integrated with Cloudera (CDH)
● Large community
● Good documentation

Cons:
● Complex XML configuration ("XML hell")
● Poor visualization of workflows
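To give a feel for the "XML hell" point, the sketch below generates a minimal Oozie `workflow.xml` for a chain of shell commands using Python's standard `xml.etree.ElementTree`. The workflow name, action names, scripts, and the `${jobTracker}`/`${nameNode}` parameters are hypothetical placeholders; a real deployment would also need a `job.properties` file and HDFS paths.

```python
import xml.etree.ElementTree as ET


def oozie_shell_chain(name, commands):
    """Build a workflow.xml string that runs shell commands in sequence.

    `commands` is a non-empty list of (action_name, executable) pairs.
    Every action needs explicit ok/error transitions, plus start, kill
    and end nodes for the workflow as a whole.
    """
    if not commands:
        raise ValueError("at least one command is required")
    app = ET.Element("workflow-app", {"name": name, "xmlns": "uri:oozie:workflow:0.5"})
    ET.SubElement(app, "start", {"to": commands[0][0]})
    for i, (action_name, executable) in enumerate(commands):
        # Next node: the following action, or "end" after the last one.
        next_node = commands[i + 1][0] if i + 1 < len(commands) else "end"
        action = ET.SubElement(app, "action", {"name": action_name})
        shell = ET.SubElement(action, "shell", {"xmlns": "uri:oozie:shell-action:0.2"})
        ET.SubElement(shell, "job-tracker").text = "${jobTracker}"
        ET.SubElement(shell, "name-node").text = "${nameNode}"
        ET.SubElement(shell, "exec").text = executable
        ET.SubElement(action, "ok", {"to": next_node})
        ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Workflow failed"
    ET.SubElement(app, "end", {"name": "end"})
    ET.indent(app)  # pretty-print (Python 3.9+)
    return ET.tostring(app, encoding="unicode")


if __name__ == "__main__":
    print(oozie_shell_chain("demo", [("extract", "extract.sh"), ("report", "report.sh")]))
```

Even this two-step chain needs explicit start, kill and end nodes and an ok/error transition per action, whereas Azkaban needs one short `key=value` job file per step.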

Any questions?

Resources

http://azkaban.github.io/
http://oozie.apache.org/
