Hadoop: Past, Present and Future - v2.2 - SQLSaturday #326 - Tampa BA Edition

  • Published on
    01-Jul-2015

DESCRIPTION

Presentation given at SQLSaturday #326, Tampa, FL (BA Edition): https://www.sqlsaturday.com/326/schedule.aspx

Transcript

  • 1. HADOOP: PAST, PRESENT AND FUTURE. BIG DATA INTELLIGENCE PRACTICE. 2014 Trace3, All rights reserved.

  • 2. Roadmap (~1 hour): 1 - What Makes Up Hadoop 1.x? 2 - What's New In Hadoop 2.x? 3 - The Future Of Hadoop

  • 3. WHAT MAKES UP HADOOP 1.0?

  • 4. What's a Node? Processes / Daemons / Services. Node, aka Server: Operating System, Compute, Storage, Memory.

  • 5. Hadoop 1.0: HDFS + MapReduce. [Diagram: a Client, a NameNode, a JobTracker, and four DataNode/TaskTracker nodes.]

  • 6. Hadoop 1.0: HDFS + MapReduce. [Diagram: a Client, a NameNode, a JobTracker, and four DataNode/TaskTracker nodes holding data blocks labeled 1-1 through 4-3, with Map and Reduce tasks running on the nodes.]

  • 7. MapReduce v1 Limitations. Scalability: maximum cluster size is 4,000 nodes and maximum concurrent tasks is 40,000. Availability: JobTracker failure kills all queued and running jobs. Resources Partitioned into Map and Reduce: hard partitioning of Map and Reduce slots led to low resource utilization. No Support for Alternate Paradigms/Services: only MapReduce batch jobs, nothing else.

  • 8. Hadoop 1.0: Single Use System. Batch Apps (Pig, Hive) on MapReduce (cluster resource management and data processing) on HDFS (redundant, reliable storage).

  • 9. WHAT'S NEW IN HADOOP 2.0?

  • 10. YARN: Yet Another Resource Negotiator. YARN replaces MapReduce as the cluster resource manager. YARN will be the de facto distributed operating system for Big Data.

  • 11. YARN = BIG DATA

  • 12. YARN: No Longer Just Batch Apps. Store DATA in one place. Interact with that data in MULTIPLE WAYS with Predictable Performance and Quality of Service. Applications Run Natively IN Hadoop: BATCH (MapReduce), INTERACTIVE (Tez), ONLINE (HBase), STREAMING (DataTorrent), GRAPH (Giraph), all on YARN (cluster resource management) on HDFS2 (redundant, reliable storage).

  • 13. YARN: Applications Online. Running all on the same Hadoop cluster to give applications access to all the same source data! MapReduce v2, Real-Time Stream Processing, Master-Worker, In-Memory, Apache Storm.

  • 14. YARN: Quickly Maturing. Timeline from 2010 to today (2014): Conceived at Yahoo!, Alpha Releases (2.0), Beta Releases (2.1), GA Released (2.2), Version 2.3, Version 2.4, Version 2.5. 200,000+ nodes, 800,000+ jobs daily, 10 million+ hours of compute daily.

  • 15. YARN: What Has Changed? YARN vs. MRv1: ResourceManager (RM) and ApplicationMaster (AM) vs. JobTracker (JT); Scheduler vs. Scheduler; NodeManager (NM) vs. TaskTracker (TT); Container vs. Map & Reduce Slot. [Diagram: on the YARN side, a ResourceManager with its Scheduler and NodeManagers hosting an ApplicationMaster and Containers; on the MRv1 side, a JobTracker with its Scheduler and TaskTrackers running Map and Reduce tasks.]

  • 16. The 6 Benefits Of YARN: Scale; New programming models and services; Improved cluster utilization; Agility; Backwards compatible with MapReduce v1; Mixed workloads on the same source of data.

  • 17. THE FUTURE OF HADOOP

  • 18. SQL on Hadoop. Speed: deliver interactive query performance. SQL: support an array of SQL semantics for analytic applications running against Hadoop. Scale: SQL interface to Hadoop designed for queries that scale from Terabytes to Petabytes.

  • 19. SQL on Hadoop: Hive on Apache Tez (Hortonworks HDP 2); Hive on Apache Spark (Cloudera CDH 5); Apache Drill (MapR M7); Cloudera Impala (Cloudera CDH 5); Pivotal HAWQ (Pivotal Big Data Suite).

  • 20. Apache Spark. Apache Spark (Databricks) on YARN (cluster resource management) on HDFS2 (redundant, reliable storage). Programming Languages: Java, Scala, Python, R*. Interactive Shell: ability to write code and get output. Faster by ~100x: due to how it handles data in memory.

  • 21. Apache Spark Wordcount

  • 22. HOYA: HBase (NoSQL) on YARN. Dynamic Scaling: on-demand cluster size; increase and decrease the size with load. Easier Deployment: APIs to create, start, stop and delete HBase clusters. Availability: recover from Region Server loss with a new container.

  • 23. Apache REEF (Retainable Evaluator Execution Framework). Machine Learning: framework well suited for building machine learning jobs. Scalable / Fault Tolerant: makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models. Maintain State: users can build jobs that utilize data from where it's needed and also maintain state after jobs are done.

  • 24. Real-Time Stream Processing: Apache Storm, Streaming.

  • 25. Heterogeneous Storage. THEN: NameNode, Storage. NOW: NameNode, SATA, SSD, Fusion IO.

  • 26. Hadoop Roadmap. Apache Hadoop 2.5 (Q3 2014): NodeManager restart w/o disruption. Apache Hadoop 2.6 (Q4 2014): Memory As Storage Tier; Dynamic Resource Configuration; Support For Docker Containers.

  • 27. I KNOW YOU HAVE QUESTIONS

  • 28. THANK YOU! http://bigdatajoe.io/ http://bigdatacentric.com/ @bigdatajoerossi bigdatajoerossi@gmail.com
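
The transcript names a word count example (slide 21) but its code did not survive extraction. As a rough illustration of the map and reduce phases the Hadoop 1.0 slides describe, here is a minimal sketch in plain Python. It does not use the Spark or Hadoop APIs and is not the presenter's code; the function names map_phase, shuffle_phase, reduce_phase and word_count, and the sample lines, are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted counts by word (a stand-in for the framework's shuffle)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum all counts collected for a single word."""
    return (word, sum(counts))


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines read from HDFS.
    sample = ["Hadoop stores data", "YARN schedules work", "Hadoop runs YARN"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

On a real cluster the map and reduce functions would run as separate tasks on different nodes and the grouping step would move data across the network; here everything runs in one process so the three phases are easy to see.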