26
Creating a Multinode Hadoop cluster in 4 mins using docker containers Rachit Arora [email protected] IBM, India Software Labs

Creating a MultinodeHadoop cluster in 4 mins using docker ... · PDF fileIBM BigInsights on Cloud – Basic Plan Cheap – Pay for only what you need Scalable –Add or remove nodes

Embed Size (px)

Citation preview

CreatingaMultinode Hadoopclusterin4mins usingdocker

containersRachitArora

[email protected],IndiaSoftwareLabs

Asadatascientist

• Iwanttorunmyanalyticsjobs• Socialmediaanalytics• Textanalytics(StructureandUnstructured)

• Iwanttorunqueriesondemand• IwanttorunRscripts• IwanttosubmitSparkjobs

• IneedaHadoopclusterondemandtofulfilmyjobs• IdonotwanttospendtimetosetuptheHadoopcluster

IhaveaHadoopCluster

• Idonotwanttomanageit• IdonotwanttoPatchthemforupgrades• Idonotneedthisclusterrunningallthetime.• Iwanttoscaleanddescaletheclusterbasedonjobload

TechnicalInterpretationofcustomerneeds

• MultinodeHadoopclusterinminutes• Elasticity– Addorremovedatanodesondemand• Economical• FullyManaged

• Choosewhattoinstallandkeepitrunningallthetime• Repeatable,scalable&Highlyavailableprovisioninginfra• Minimizedisruptionsduringpatching• SupportforServicecomposition• AutoHealServices

TypicalHadoopCluster- Setup

• Getthesuitablehardware• Preparehostmachine• Setupvariousnetworks

• Private• Public• Management

• Fetchthebinariesfortheinstall• Preparetheblueprint/config filefortheinstall• Starttheinstall• Manyatimesinstallfails,debugandretryagain.

BluemixIBM'sOpenCloudArchitectureimplementationbasedontheCloudFoundryproject

Bluemix isanopen-standard,cloud-basedplatformforbuilding,managing,andrunningapplicationsofalltypes(web,mobile,bigdata,newsmartdevices,andsoon).

Bluemix Services

BigInsights oncloud(Basic)

ComparisonoftimetoSetupHadoopcluster

0

50

100

150

200

250

4Node8Node(HA)

12Node(HA)

HadoopClusterSetup

DockerBased Pre-configEnvironment NativeInstallation

EarlierExperiments

Option OS Provisioning ConfigCluster Management /

Updates

1 Bare metal Chef Chef

2 xCAT – Stateful(Create your own VMs) PostScripts xCAT - updateNode

3 xCAT – Sysclone(Image from current system) Not Needed xCAT - updateNode

4 Bare metal PostScripts xCAT - updateNode

5 Cloud Provider Specific Images Not Needed Manual/Scripts

6 Standard-ISO Image Anaconda -post scripts Manual/Scripts

GuidingPrinciples

• Virtualizationhelpsrepeatability,lesserfailures&speed• Maintenance• Performance(EquivalenttoBaremetal)• Useopensourcefromanactivecommunity• Cloud-agnostic

ThinkContainers!!

Docker :ThinkingasVMs?(mistake)

• KeyConcepts• Containerssharehostkernel• Images&Imageregistry• Build-ableImages(makelike)• Imagesarelayered&hencecanbeextended

• Relevantconcepts• HostFSdirectorycanbemountedas’volumes’

• IPspecificport-forwarding

Docker inHadoopClusteronCloud

• EachclusternodeisavirtualnodepoweredbyDocker.EachnodeoftheclusterisaDocker container

• Docker containersrunonabunchofbaremetalhosts(Docker-hosts)• EachHadoopclusterwillhavemultiplenodes/Docker containersspanningmultiplehosts

• Docker• Containermanagement- Custom• Multihostnetworking– Weave(Pluggablearchitecturetosupportothersolutions)

• Registry– Private

TypicalClusters

LDAP

KMS

MySQL

SSH

Master Data1

Data2 Data3LDAP

KMS

MySQL

SSH

Master Data1

Data2

LDAP

KMS

MySQL

SSH

Master Data1

Data2 Data3

Data4 Data5Cluster1

Cluster2

Cluster3

NetworkBoundary

Docker Images

• Masternode• DataNode• EdgeNode• Auxiliaryserviceimages

• Ldap• Mysql• Ambari server• KMS

MultihostDocker networkingwithweave

• Weave basedoverlaynetworkamongIOPnodes• One/26privatesubnetpercluster(172.x.x.x)• MasternodetohaveapublicIP– ports-forwarding

• PortablepublicIPs• Networkspeed(sharedwithothermasters)

• EdgenodewillbeaccessibleusingapublicIP• UsercanSSHandrunHive,Hbase,Hadoop&Sparkshells

• Privatenetwork• HighSpeed• Secure

NetworkArchitecture

A A A B

bond1 bond0 eth-weave

docker0

10Gbps Softlayerprivatenetwork

10Gbps Softlayerpublic network

B B B C

eth-weave

docker0

C C C C

eth-weave

docker0

bond1 bond0 bond1 bond0

dockerportforwarding

*dockerICC=false(nointercontainercommunicationoverdocker0network)*Allinter-containercommunication isthroughweavenetwork*Oneweave’sprivatesubnetpercluster(Nocommunication acrosssubnets)

weaveoverlaynetworkonSoftlayer privatenetwork

Host1 Host2 Host3

Masternode

B BBMasternode

Edgenode

Datanode

ClusterProvisioning

ProvisioningInfrastructure

• Provisioninginfrastructureconsistsof• ClusterManagerthatprovidesRESTAPItocreatecluster• APIGatewayapplication• Deploymentagent• Deployer scriptsthatactuallydoallthework• Andofcoursethedatabasethatholdsallmetadatarelatedtoclusters,hosts,nodesetc.

Phasesinvolved

1. AcquireHardware2. DeployProvisioningInfra3. AddHardwaretoResourcePool4. PrepareHostMachines5. OrchestrateClusterlifecycle

1. CreateCluster,AddNodes,RemoveNodes,DeleteCluster

Howisaclustercreated?

ClusterManager DBAPIGateway

2.Manageresources1.CreateCluster

DeployerAgent

DeployerAgent

DeployerAgent

3.CreateCluster

4.PrepareNode

7.InstallIOP

5.GetNodedetails

6.GetBlueprint

Summary

• ClusterCreationinaround90secs ,2-3mins tostartservices• Reliable,repeatableclusterprovisioning• Fullymanaged• HighlySecure• Closetozerofailurerates• EasyPatching• Optimaluseofresources• Costeffectiveservices

IBMBigInsightsonCloud– BasicPlan

Cheap – Pay for only what you need

Scalable – Add or remove nodes as per your workload

Modular – Compose and deploy applications using data and analytics services on Bluemix

Reliable – Durable SWIFT Object Store service always persists data

Secure, flexible, managed Apache Hadoop clusters

IBMOpenPlatform

Knox

Ambari

OozieFlume

Pig

HadoopHDFS/MapReduce/YARN

Zookeeper

Parquet

HBase Spark

Hive

Sqoop

Avro

Betaopennow!

References• Bluemix :http://www.ibm.com/cloud-computing/bluemix• BigInsights OnCloud

• http://www-03.ibm.com/software/products/en/ibm-biginsights-for-apache-hadoop

• https://developer.ibm.com/clouddataservices/docs/biginsights/get-started-in-bluemix/what-is-basic-plan/•

https://www.youtube.com/watch?v=0DN1cDEs6ME

• Tutorials• https://www.youtube.com/watch?v=S3n9L2X91xM• https://www.youtube.com/watch?v=t1Nuy_zrL7U

• IOP:http://www.ibm.com/analytics/us/en/technology/hadoop/• Docker• Weave