Azkaban from
Solve the problem of Hadoop job dependencies
Now Voldemort can easily manage his Hadoop jobs
Anatoliy Nikulin
Overview
Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs.
Features:
● Compatible with any version of Hadoop
● Easy-to-use web UI
● Simple web and HTTP workflow uploads
● Project workspaces
● Scheduling of workflows
● Modular and pluggable
● Authentication and authorization
● Tracking of user actions
● Email alerts on failure and success
● SLA alerting and auto killing
● Retrying of failed jobs
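As an illustration of what "scheduling a workflow" involves, here is a minimal Python sketch of resolving job dependencies into a run order. It does not use Azkaban's code or API; the job names are hypothetical and the standard-library `graphlib` module stands in for whatever the scheduler does internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def run_order(deps):
    """Return an order in which every job comes after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(workflow)
print(order)
```

A cycle in the dependencies (for example, "extract" depending on "report") would make `static_order()` raise `graphlib.CycleError`, which is the kind of mistake a workflow scheduler has to reject.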
Plugins
Azkaban Plugins
● HDFS Browser
● Job Types Plugins
● Azkaban Security Manager
● Job Summary
● Pig Visualizer
● Reportal
Azkaban Pros/Cons
Pros:
● Simple workflow configuration
● Rich DAG visualization
● User-friendly web UI
● Job history
● Easy access to log files
Cons:
● Small community (mostly LinkedIn)
● Only time-based scheduling
● Unable to run non-Hadoop tasks in distributed mode
Architecture
There are two modes:
● Solo server mode - everything runs in one process (H2 instead of MySQL). A good choice for trying Azkaban out
● Two server mode - for production work
What about non-Hadoop jobs?
Azkaban is able to handle them:
● It can run command-line processes
● It is a good alternative to crontab
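A minimal sketch of what such command-line jobs might look like, written in Python. It assumes Azkaban's `.job` properties format with the `type=command`, `command` and `dependencies` keys, and that a project is uploaded as a zip of those files; the job names, commands and helper functions are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def write_job(directory, name, command, dependencies=()):
    """Write <name>.job describing one command-line job (assumed Azkaban format)."""
    lines = ["type=command", f"command={command}"]
    if dependencies:
        # Names of jobs that must finish before this one starts.
        lines.append("dependencies=" + ",".join(dependencies))
    path = Path(directory) / f"{name}.job"
    path.write_text("\n".join(lines) + "\n")
    return path


def build_project_zip(directory, zip_path):
    """Pack every .job file in the directory into one archive for upload."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for job_file in sorted(Path(directory).glob("*.job")):
            archive.write(job_file, arcname=job_file.name)
    return zip_path


# Hypothetical two-step flow: "cleanup" runs only after "backup" has finished.
workdir = Path(tempfile.mkdtemp())
write_job(workdir, "backup", "echo backing up")
write_job(workdir, "cleanup", "echo cleaning up", dependencies=["backup"])
project_zip = build_project_zip(workdir, workdir / "project.zip")
```

The resulting `project.zip` would then be uploaded through the web UI and put on a schedule, much like a crontab entry but with explicit dependencies between the steps.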
Nice UI. Isn’t It?
What about a native Hadoop scheduler?
Oozie - a scheduler framework. Also a good tool
Pros:
● Rich and very powerful workflow configuration
● Rich API (REST, command-line)
● Integrated with Cloudera
● Large community
● Good documentation
Cons:
● Complex configuration (XML hell!)
● Poor visualization of workflows
Any questions?
Resources
http://azkaban.github.io/
http://oozie.apache.org/