by Ran Silberman
DevOps for Big Data: Cluster management tools
20.4.2015
Hosted by:
FullStack Developers Israel
● Input: photos and locations
● Batch: display statistics by bird, location and photographer
● Real-time: count how many birds of each species were seen in the last minute
Application requirements
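The real-time requirement above can be sketched as a sliding-window counter. This is only an illustration in plain Python, not the Spark streaming job the deck has in mind; the class name `LastMinuteCounter`, its methods and the species names are hypothetical, and it assumes sightings arrive in timestamp order.

```python
from collections import Counter, deque


class LastMinuteCounter:
    """Counts sightings per species over a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, species), oldest first
        self.counts = Counter()  # species -> sightings currently inside the window

    def _expire(self, now):
        # Drop sightings that are at least `window_seconds` older than `now`.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, species = self.events.popleft()
            self.counts[species] -= 1
            if self.counts[species] == 0:
                del self.counts[species]

    def add(self, timestamp, species):
        # Assumption: timestamps are in seconds and never go backwards.
        self.events.append((timestamp, species))
        self.counts[species] += 1
        self._expire(timestamp)

    def snapshot(self, now):
        # Per-species counts for the window that ends at `now`.
        self._expire(now)
        return dict(self.counts)
```

A streaming engine would apply the same idea per partition, with the window handled by the framework rather than by hand.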
● Volume growth
● Velocity of Streaming and Batch
● Same env from DEV to PROD
● Data from PROD to test on DEV
● Manage Deployment of many
applications on many nodes
Big Data lifecycle considerations
● HDFS for storing the data
● Hive for batch processing
● Solr/elasticsearch for search
● Spark for streaming
● ...Home-grown applications
Choosing the Infrastructures
● Hortonworks Ambari
or
● Cloudera Manager
Choosing the Management tool
● All platforms & infrastructures are
installed by the tool
● Monitoring, Audits & logs are
built-in
● Easy installation and upgrade
● Save scripting work
What's new for the DevOps pipeline?
● Manage cluster with GUI or API
● Hadoop installation and setup
● System monitoring & alerts
● Built-in systems: Zookeeper, Spark, Hive, Impala and more
● Ability to add parcels
CM features
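Managing the cluster through the API rather than the GUI might look like the sketch below, which lists cluster names over the Cloudera Manager REST API using only the Python standard library. The host name, the credentials and the API version `v10` are assumptions; port 7180 is the usual CM default, and the version a given server supports can be read from its `/api/version` endpoint. The helper names are hypothetical.

```python
import base64
import json
import urllib.request


def cm_url(host, path, port=7180, api_version="v10"):
    # api_version is an assumption; check /api/version on the CM server.
    return f"http://{host}:{port}/api/{api_version}/{path.lstrip('/')}"


def basic_auth_header(user, password):
    # The CM API accepts HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def cluster_names(payload):
    # The clusters endpoint returns a JSON object with an "items" list.
    return [item["name"] for item in json.loads(payload).get("items", [])]


def list_clusters(host, user, password):
    # Performs the actual HTTP call; needs a reachable CM server.
    request = urllib.request.Request(
        cm_url(host, "clusters"), headers=basic_auth_header(user, password)
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return cluster_names(response.read().decode("utf-8"))
```

Keeping URL building and response parsing separate from the network call makes the same script usable against a one-node Dev cluster and a Prod cluster by changing only the host.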
Custom Service Descriptors
● CSD is a descriptor for a service used by CM
● Defines how to install, start and stop a service, and the logic used by CM
CSD
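To give a feel for what a CSD describes, the sketch below generates a minimal service descriptor as JSON for a home-grown service. The field names (`name`, `label`, `runAs`, `roles`, `startRunner`) follow the publicly available CSD examples as best recalled and should be checked against the CSD reference before use; the service name, user, group and script path are hypothetical.

```python
import json


def build_service_descriptor(name, label, start_script):
    # Field names are an assumption based on public CSD examples;
    # verify them against the CSD reference for your CM version.
    return {
        "name": name.upper(),
        "label": label,
        "description": f"{label} service managed by CM",
        "version": 1,
        "runAs": {"user": "birds", "group": "birds"},  # hypothetical OS account
        "roles": [
            {
                "name": f"{name.upper()}_WORKER",
                "label": "Worker",
                "pluralLabel": "Workers",
                # Script that CM would invoke to start the role process.
                "startRunner": {"program": start_script, "args": ["start"]},
            }
        ],
    }


if __name__ == "__main__":
    descriptor = build_service_descriptor("birdstats", "Bird Stats", "scripts/control.sh")
    print(json.dumps(descriptor, indent=2))
```

The point of the descriptor is that CM, not a hand-written script, decides when and where the start command runs.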
● Archive data in Hadoop
● Growing data affects DWH
performance & capabilities
● Creating realistic testing data
● Dev and Prod env. may differ in
cluster size (dev may be 1 node)
More DevOps considerations
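One way to create realistic testing data for a small Dev cluster is to take a stable sample of production records and mask personal fields. The sketch below is an assumption about how that could be done, not a method from the talk; the record fields `photo_id` and `photographer`, the salt and the helper names are hypothetical.

```python
import hashlib


def stable_bucket(key, buckets=100):
    # Hash-based bucket, so the same record is always in or out of the sample.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def pseudonymize(value, salt):
    # Replace a personal value with a short, repeatable token.
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def sample_for_dev(records, percent, salt):
    # Keep roughly `percent` percent of the records and mask the photographer.
    for record in records:
        if stable_bucket(record["photo_id"]) < percent:
            masked = dict(record)  # copy, so production records stay untouched
            masked["photographer"] = pseudonymize(record["photographer"], salt)
            yield masked
```

Because the sample is driven by a hash of the record key instead of a random draw, repeated exports give the same Dev data set, which keeps test results comparable between runs.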
Tools Comparison
                  CM                        Ambari
Licence           Paid Enterprise edition   Free, Apache open source
Technology        Cloudera                  Puppet, Ganglia, Nagios
Dependency        CDH                       HDP
Manage cluster    Parcels                   Yum
REST API          +                         +
Extra Features: Rolling Upgrade, 3rd-party management, extendable by REST API
CM features
                             Express   Enterprise
Subscription                 Free      Annual
Deployment & Configuration   +         +
Management                   +         +
Monitoring                   +         +
Diagnostic                   +         +
Extra Features: Reports, Rollbacks, Rolling Upgrade, AD Kerberos, Kerberos wizard, Backup & DR
● Fast Deploy
● Easy management by GUI
● Built-in monitoring and alerts
● Simple upgrades
● Same management and deploy
in Dev and Prod
Pros of Hadoop Management tools
● Tied to a specific vendor's proprietary system
● Tied to a system version by Parcels
● Less flexibility for low-level management
Cons of Hadoop Management tools
Recommended