Upload
nguyenliem
View
216
Download
1
Embed Size (px)
Citation preview
CreatingaMultinode Hadoopclusterin4mins usingdocker
containersRachitArora
[email protected],IndiaSoftwareLabs
Asadatascientist
• Iwanttorunmyanalyticsjobs• Socialmediaanalytics• Textanalytics(StructureandUnstructured)
• Iwanttorunqueriesondemand• IwanttorunRscripts• IwanttosubmitSparkjobs
IhaveaHadoopCluster
• Idonotwanttomanageit• IdonotwanttoPatchthemforupgrades• Idonotneedthisclusterrunningallthetime.• Iwanttoscaleanddescaletheclusterbasedonjobload
TechnicalInterpretationofcustomerneeds
• MultinodeHadoopclusterinminutes• Elasticity– Addorremovedatanodesondemand• Economical• FullyManaged
• Choosewhattoinstallandkeepitrunningallthetime• Repeatable,scalable&Highlyavailableprovisioninginfra• Minimizedisruptionsduringpatching• SupportforServicecomposition• AutoHealServices
TypicalHadoopCluster- Setup
• Getthesuitablehardware• Preparehostmachine• Setupvariousnetworks
• Private• Public• Management
• Fetchthebinariesfortheinstall• Preparetheblueprint/config filefortheinstall• Starttheinstall• Manyatimesinstallfails,debugandretryagain.
BluemixIBM'sOpenCloudArchitectureimplementationbasedontheCloudFoundryproject
Bluemix isanopen-standard,cloud-basedplatformforbuilding,managing,andrunningapplicationsofalltypes(web,mobile,bigdata,newsmartdevices,andsoon).
ComparisonoftimetoSetupHadoopcluster
0
50
100
150
200
250
4Node8Node(HA)
12Node(HA)
HadoopClusterSetup
DockerBased Pre-configEnvironment NativeInstallation
EarlierExperiments
Option OS Provisioning ConfigCluster Management /
Updates
1 Bare metal Chef Chef
2 xCAT – Stateful(Create your own VMs) PostScripts xCAT - updateNode
3 xCAT – Sysclone(Image from current system) Not Needed xCAT - updateNode
4 Bare metal PostScripts xCAT - updateNode
5 Cloud Provider Specific Images Not Needed Manual/Scripts
6 Standard-ISO Image Anaconda -post scripts Manual/Scripts
GuidingPrinciples
• Virtualizationhelpsrepeatability,lesserfailures&speed• Maintenance• Performance(EquivalenttoBaremetal)• Useopensourcefromanactivecommunity• Cloud-agnostic
Docker :ThinkingasVMs?(mistake)
• KeyConcepts• Containerssharehostkernel• Images&Imageregistry• Build-ableImages(makelike)• Imagesarelayered&hencecanbeextended
• Relevantconcepts• HostFSdirectorycanbemountedas’volumes’
• IPspecificport-forwarding
Docker inHadoopClusteronCloud
• EachclusternodeisavirtualnodepoweredbyDocker.EachnodeoftheclusterisaDocker container
• Docker containersrunonabunchofbaremetalhosts(Docker-hosts)• EachHadoopclusterwillhavemultiplenodes/Docker containersspanningmultiplehosts
• Docker• Containermanagement- Custom• Multihostnetworking– Weave(Pluggablearchitecturetosupportothersolutions)
• Registry– Private
TypicalClusters
LDAP
KMS
MySQL
SSH
Master Data1
Data2 Data3LDAP
KMS
MySQL
SSH
Master Data1
Data2
LDAP
KMS
MySQL
SSH
Master Data1
Data2 Data3
Data4 Data5Cluster1
Cluster2
Cluster3
NetworkBoundary
Docker Images
• Masternode• DataNode• EdgeNode• Auxiliaryserviceimages
• Ldap• Mysql• Ambari server• KMS
MultihostDocker networkingwithweave
• Weave basedoverlaynetworkamongIOPnodes• One/26privatesubnetpercluster(172.x.x.x)• MasternodetohaveapublicIP– ports-forwarding
• PortablepublicIPs• Networkspeed(sharedwithothermasters)
• EdgenodewillbeaccessibleusingapublicIP• UsercanSSHandrunHive,Hbase,Hadoop&Sparkshells
• Privatenetwork• HighSpeed• Secure
NetworkArchitecture
A A A B
bond1 bond0 eth-weave
docker0
10Gbps Softlayerprivatenetwork
10Gbps Softlayerpublic network
B B B C
eth-weave
docker0
C C C C
eth-weave
docker0
bond1 bond0 bond1 bond0
dockerportforwarding
*dockerICC=false(nointercontainercommunicationoverdocker0network)*Allinter-containercommunication isthroughweavenetwork*Oneweave’sprivatesubnetpercluster(Nocommunication acrosssubnets)
weaveoverlaynetworkonSoftlayer privatenetwork
Host1 Host2 Host3
Masternode
B BBMasternode
Edgenode
Datanode
ProvisioningInfrastructure
• Provisioninginfrastructureconsistsof• ClusterManagerthatprovidesRESTAPItocreatecluster• APIGatewayapplication• Deploymentagent• Deployer scriptsthatactuallydoallthework• Andofcoursethedatabasethatholdsallmetadatarelatedtoclusters,hosts,nodesetc.
Phasesinvolved
1. AcquireHardware2. DeployProvisioningInfra3. AddHardwaretoResourcePool4. PrepareHostMachines5. OrchestrateClusterlifecycle
1. CreateCluster,AddNodes,RemoveNodes,DeleteCluster
Howisaclustercreated?
ClusterManager DBAPIGateway
2.Manageresources1.CreateCluster
DeployerAgent
DeployerAgent
DeployerAgent
3.CreateCluster
4.PrepareNode
7.InstallIOP
5.GetNodedetails
6.GetBlueprint
Summary
• ClusterCreationinaround90secs ,2-3mins tostartservices• Reliable,repeatableclusterprovisioning• Fullymanaged• HighlySecure• Closetozerofailurerates• EasyPatching• Optimaluseofresources• Costeffectiveservices
IBMBigInsightsonCloud– BasicPlan
Cheap – Pay for only what you need
Scalable – Add or remove nodes as per your workload
Modular – Compose and deploy applications using data and analytics services on Bluemix
Reliable – Durable SWIFT Object Store service always persists data
Secure, flexible, managed Apache Hadoop clusters
IBMOpenPlatform
Knox
Ambari
OozieFlume
Pig
HadoopHDFS/MapReduce/YARN
Zookeeper
Parquet
HBase Spark
Hive
Sqoop
Avro
Betaopennow!
References• Bluemix :http://www.ibm.com/cloud-computing/bluemix• BigInsights OnCloud
• http://www-03.ibm.com/software/products/en/ibm-biginsights-for-apache-hadoop
• https://developer.ibm.com/clouddataservices/docs/biginsights/get-started-in-bluemix/what-is-basic-plan/•
https://www.youtube.com/watch?v=0DN1cDEs6ME
• Tutorials• https://www.youtube.com/watch?v=S3n9L2X91xM• https://www.youtube.com/watch?v=t1Nuy_zrL7U
• IOP:http://www.ibm.com/analytics/us/en/technology/hadoop/• Docker• Weave