Hadoop - Past, Present and Future - v2.0

  • Published on
    26-Jan-2015

DESCRIPTION

A session focused on ramping you up on what Hadoop is, how it works and what it's capable of. We will also look at what Hadoop 2.x and YARN bring to the table and some future projects in the Hadoop space to keep an eye on.

Transcript

  • 1. BIG DATA INTELLIGENCE PRACTICE. HADOOP: PAST, PRESENT AND FUTURE
       2014 Trace3, All rights reserved.

2. Roadmap (~1 hour)
     1 - What Makes Up Hadoop 1.x?
     2 - What's New In Hadoop 2.x?
     3 - The Future Of Hadoop

3. WHAT MAKES UP HADOOP 1.0?

4. What's a Node?
     Node, aka Server: Compute, Storage, Operating System, Memory

5. Hadoop 1.0: HDFS + MapReduce
     [Diagram: Client, NameNode, JobTracker and four DataNode/TaskTracker nodes; file blocks 1-1, 1-2, 1-3]

6. Hadoop 1.0: HDFS + MapReduce
     [Diagram: Client, NameNode, JobTracker and four DataNode/TaskTracker nodes; file blocks 1-1 through 4-3 spread across the nodes; Map and Reduce phases]

7. MapReduce v1 Limitations
     Scalability: Maximum cluster size is 4,000 nodes and maximum concurrent tasks is 40,000
     Availability: JobTracker failure kills all queued and running jobs
     Resources Partitioned into Map and Reduce: Hard partitioning of Map and Reduce slots led to low resource utilization
     No Support for Alternate Paradigms/Services: Only MapReduce batch jobs, nothing else

8. Hadoop 1.0: Single Use System
     HADOOP 1.0, Single Use System: Batch Apps
     HDFS (redundant, reliable storage)
     MapReduce (cluster resource management and data processing)
     Pig, Hive

9. WHAT'S NEW IN HADOOP 2.0?

10. YARN
     YARN Replaces MapReduce
     Yet Another Resource Negotiator
     YARN will be the de-facto distributed operating system for Big Data

11. YARN: No Longer Just Batch Apps
     Store DATA in one place
     Interact with that data in MULTIPLE WAYS with Predictable Performance and Quality of Service
     Applications Run Natively IN Hadoop
     HDFS2 (redundant, reliable storage)
     YARN (cluster resource management)
     BATCH (MapReduce), INTERACTIVE (Tez), ONLINE (HBase), STREAMING (DataTorrent), GRAPH (Giraph)

12. YARN: Applications
     Running all on the same Hadoop cluster to give applications access to all the same source data!
     MapReduce v2, Stream Processing, Master-Worker, Online, In-Memory, Apache Storm

13. YARN: Quickly Maturing
     Timeline: 2010, 2011, 2012, 2013, 2014, Today
     Conceived at Yahoo!
     Alpha Releases 2.0
     Beta Releases 2.1
     GA Released 2.2
     Version 2.3
     Version 2.4
     100,000+ nodes, 400,000+ jobs daily
     10 million+ hours of compute daily

14. YARN: Dr. Evil Approved

15. YARN: What Has Changed?
     YARN: RM (ResourceManager), AM (ApplicationMaster); MRv1: JT (JobTracker)
     YARN: Scheduler; MRv1: Scheduler
     YARN: NM (NodeManager); MRv1: TT (TaskTracker)
     YARN: Container; MRv1: Map & Reduce Slot
     [Diagram: a ResourceManager with its Scheduler, and NodeManagers hosting an ApplicationMaster and Containers, shown beside a JobTracker with its Scheduler and TaskTrackers hosting Map and Reduce slots]

16. The 6 Benefits Of YARN
     Scale
     New programming models and services
     Improved cluster utilization
     Agility
     Backwards compatible with MapReduce v1
     Mixed workloads on the same source of data

17. THE FUTURE OF HADOOP

18. SQL on Hadoop
     Speed: Deliver interactive query performance.
     SQL: Support array of SQL semantics for analytic applications running against Hadoop.
     Scale: SQL interface to Hadoop designed for queries that scale from Terabytes to Petabytes

19. SQL on Hadoop
     Hive on Apache Tez: Hortonworks HDP 2
     Hive on Apache Spark: Cloudera CDH 5
     Apache Drill: MapR M7
     Cloudera Impala: Cloudera CDH 5
     Pivotal HAWQ: Pivotal Big Data Suite

20. HOYA: HBase (NoSQL) on YARN
     Dynamic Scaling: On-demand cluster size. Increase and decrease the size with load.
     Easier Deployment: APIs to create, start, stop and delete HBase clusters.
     Availability: Recover from RegionServer loss with a new container.

21. Microsoft REEF
     Retainable Evaluator Execution Framework
     Machine Learning: Framework well suited for building machine learning jobs.
     Scalable/Fault Tolerant: Makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models.
     Maintain State: Users can build jobs that utilize data from where it's needed and also maintain state after jobs are done.

22. Heterogeneous Storage
     THEN: NameNode, Storage
     NOW: NameNode, SATA, SSD, Fusion IO

23. Hadoop Roadmap
     Apache Hadoop 2.5 (Q3 2014): NodeManager Restart w/o disruption, Dynamic Resource Configuration
     Apache Hadoop 2.6 (Q4 2014): Memory As Storage Tier, Support For Docker Containers

24. HADOOP: PAST, PRESENT & FUTURE
     I KNOW YOU HAVE QUESTIONS
     NO SUCH THING AS A STUPID QUESTION.

25. ONE LAST THING
     SD Big Data Meetup
     meetup.com/sdbigdata
     2nd Wednesday Of The Month
     Next: August 13th @ 5:45P
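
Illustrative sketch for slides 7 and 15: the deck states that hard partitioning of Map and Reduce slots led to low resource utilization, and that YARN replaces those slots with generic containers. The toy model below is one way to see why. It is not taken from the slides: the `Node` class, the function names and the 8/4 slot split are hypothetical, and it assumes every container is the same size as one slot, whereas real YARN containers are sized by the resources they request.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A worker node with a fixed MRv1-style split of task slots (hypothetical numbers)."""
    map_slots: int
    reduce_slots: int

    @property
    def capacity(self) -> int:
        return self.map_slots + self.reduce_slots


def running_with_slots(node: Node, map_tasks: int, reduce_tasks: int) -> int:
    """MRv1 model: map tasks can only use map slots, reduce tasks only reduce slots."""
    return min(map_tasks, node.map_slots) + min(reduce_tasks, node.reduce_slots)


def running_with_containers(node: Node, map_tasks: int, reduce_tasks: int) -> int:
    """YARN-like model (simplified): any pending task can take any free container."""
    return min(map_tasks + reduce_tasks, node.capacity)


def utilization(running: int, node: Node) -> float:
    """Fraction of the node's task capacity that is busy."""
    return running / node.capacity


# Assumed workload: a job still in its map phase, so no reduce tasks are pending.
node = Node(map_slots=8, reduce_slots=4)
pending_maps, pending_reduces = 12, 0

slots_busy = running_with_slots(node, pending_maps, pending_reduces)
containers_busy = running_with_containers(node, pending_maps, pending_reduces)

print(f"slots:      {utilization(slots_busy, node):.0%}")       # prints "slots:      67%"
print(f"containers: {utilization(containers_busy, node):.0%}")  # prints "containers: 100%"
```

In this model the four reduce slots sit idle during the map phase, while the container model lets the pending map tasks use the whole node.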