Apache Hadoop YARN: Present and Future

  • Published on
    26-Jan-2015

Transcript

Apache Hadoop YARN: Present and Future
Vinod Kumar Vavilapalli, Hortonworks

Vinod Kumar Vavilapalli: vinodkv [at] apache.org, @tshooter
Hortonworks Inc. 2014
Architecting the Future of Big Data

A quick show of hands: Hadoop 2?
[Image: real-life Hadoop logo]

Who am I?
  • 6.75 Hadoop-years old
  • Last thing at school: a two-node Tomcat cluster. Three months later, first thing at my job: brought down an 800-node cluster ;)
  • Previously @Yahoo!, now @Hortonworks
  • Two hats:
    - Hortonworks: Hadoop MapReduce and YARN development lead
    - Apache: Apache Hadoop YARN lead; Apache Hadoop PMC member; Apache Member
  • Worked/working on YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, and Hadoop security
  • Apache Ambari: kick-started the project and its first release
  • Stinger: high-performance data processing with Hadoop/Hive
  • Lots of troubleshooting on clusters
  • 99%+ of my code is in Apache Hadoop

Agenda
  • Apache Hadoop 2: overview
  • Past
  • Present
  • Future

Apache Hadoop 2: Next Generation Architecture

What is YARN?
  • Resource management platform
  • MapReduce v2
  • Beyond MapReduce with Tez, Storm, and Spark, in Hadoop!
  • Did I mention services like HBase and Accumulo on YARN with HoYA/Slider?
  • How is it different from Hadoop 1?

Hadoop 1 vs Hadoop 2 (layers listed top to bottom)
  • Hadoop 1.0: a single-use system for batch apps
    - MapReduce (cluster resource management & data processing)
    - HDFS (redundant, reliable storage)
  • Hadoop 2.0: a multi-purpose platform for batch, interactive, online, streaming, ...
    - MapReduce (data processing) and others
    - YARN (cluster resource management)
    - HDFS2 (redundant, highly-available & reliable storage)

Key Benefits of YARN
  • Scale
  • New programming models & services
  • Improved cluster utilization
  • Agility
  • To infinity and beyond...
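
One commonly cited source of YARN's "improved cluster utilization" is that Hadoop 1 partitioned each node into fixed map slots and reduce slots, whereas YARN allocates generic containers that any task type can use. The toy model below is a sketch under stated assumptions, not a measurement: the slot counts and task counts are hypothetical, every task is assumed to need exactly one slot or container, and only a single resource dimension is modeled.

```python
# Toy model with hypothetical numbers: Hadoop 1 splits a node into fixed
# map slots and reduce slots, while YARN hands out generic containers.
# Assumption: every task needs exactly one slot or one container.


def slot_utilization(map_slots: int, reduce_slots: int,
                     pending_maps: int, pending_reduces: int) -> float:
    """Fraction of slots busy when map tasks may only use map slots and
    reduce tasks may only use reduce slots (Hadoop 1 style)."""
    busy = min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)
    return busy / (map_slots + reduce_slots)


def container_utilization(containers: int,
                          pending_maps: int, pending_reduces: int) -> float:
    """Fraction of containers busy when any task may run in any
    container (YARN style, reduced to a single resource dimension)."""
    busy = min(containers, pending_maps + pending_reduces)
    return busy / containers


if __name__ == "__main__":
    # Map-heavy phase on one node: 10 map tasks waiting, no reduces yet.
    # Hadoop 1 node with 4 map + 2 reduce slots: the reduce slots sit idle.
    print(slot_utilization(4, 2, pending_maps=10, pending_reduces=0))
    # The same capacity as 6 generic containers: all of them can be used.
    print(container_utilization(6, pending_maps=10, pending_reduces=0))
```

In this map-heavy phase the slot model keeps about two thirds of the node busy, while the container model keeps all of it busy. Real YARN scheduling also weighs memory and CPU per container, so the gap on an actual cluster depends on the workload.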
Why Migrate?
  • 2.0 >= 2 * 1.0
  • HDFS: lots of ground-breaking features
  • YARN: next-generation architecture
  • Return on investment: 2x throughput on the same hardware!
  • Ready for improvements in hardware
  • Not convinced? Let's see what others are saying!

Yahoo!
  • Leader/visionary on all things Hadoop!
  • On YARN (0.23.x)
  • Moving fast to 2.x
  • http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html

Twitter

eBay
  • Has one of the largest Hadoop clusters in the industry, with many petabytes of data
  • Migrated production clusters to Hadoop 2
  • Go to Mayank's talk, "Hadoop-2 @ eBay"! Thursday, April 3, Track: Deployment and Operations
  • You should be convinced by now... No?

YARN: the Data Operating System

Present

Apache Hadoop releases: Apache Hadoop 2.2 (15 October 2013)
  • The first GA release of Apache Hadoop 2.x
  • YARN
    - First stable and supported release of YARN
    - Binary compatibility for MapReduce applications built on hadoop-1.x
    - YARN-level APIs solidified for the future
    - Performance
    - Scale!
  • HDFS
    - High Availability for HDFS
    - HDFS Federation
    - HDFS Snapshots
    - NFSv3 access to data in HDFS
  • Support for running Hadoop on Microsoft Windows
  • Substantial amount of integration testing with the rest of the projects in the ecosystem

Apache Hadoop releases (contd): Apache Hadoop 2.3 (24 February 2014)
  • First post-GA release of 2014
  • YARN: alpha features
    - ResourceManager HA
    - Application History (covered with the 2.4 content)
  • HDFS: details follow
  • A number of bug fixes and enhancements
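
ResourceManager HA, listed above among the alpha features, relies on the ResourceManager persisting application state so that a restarted or standby instance can pick it up. The following is a minimal sketch of that idea only. `InMemoryStateStore` and `ToyResourceManager` are hypothetical names, not YARN's API. The dict stands in for a durable shared store (Hadoop 2.x ships, for example, ZooKeeper-based and filesystem-based ResourceManager state stores). Recovered applications are simply re-admitted rather than resumed mid-run.

```python
# Minimal sketch with hypothetical classes (not YARN's API): persist each
# application submission so a restarted or standby ResourceManager can
# recover the applications that had not finished yet.


class InMemoryStateStore:
    """Stand-in for a durable store shared by ResourceManager instances."""

    def __init__(self) -> None:
        self._apps = {}

    def store_app(self, app_id: str, submission: dict) -> None:
        self._apps[app_id] = submission

    def remove_app(self, app_id: str) -> None:
        self._apps.pop(app_id, None)

    def load_apps(self) -> dict:
        return dict(self._apps)


class ToyResourceManager:
    def __init__(self, store: InMemoryStateStore) -> None:
        self.store = store
        self.running = {}

    def submit(self, app_id: str, submission: dict) -> None:
        # Persist first, so the app survives a ResourceManager crash.
        self.store.store_app(app_id, submission)
        self.running[app_id] = submission

    def finish(self, app_id: str) -> None:
        self.running.pop(app_id, None)
        self.store.remove_app(app_id)

    def recover(self) -> list:
        # Re-admit every unfinished app found in the store. Simplification:
        # apps are relaunched from their submission, not resumed mid-run.
        self.running = self.store.load_apps()
        return sorted(self.running)


if __name__ == "__main__":
    store = InMemoryStateStore()
    rm1 = ToyResourceManager(store)
    rm1.submit("app_1", {"queue": "default"})
    rm1.submit("app_2", {"queue": "default"})
    rm1.finish("app_1")
    # rm1 "crashes"; a second instance sharing the same store takes over.
    rm2 = ToyResourceManager(store)
    print(rm2.recover())  # ['app_2']
```

The important design point is that the store is shared and written before the application is considered admitted. Whichever ResourceManager becomes active next can then rebuild its view of unfinished work from the store alone.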
HDFS: Heterogeneous Storage

HDFS: DataNode caching

Apache Hadoop releases (contd): Apache Hadoop 2.4 (very soon!)
  • YARN: details follow
    - ResourceManager restart and fail-over for high availability
    - Preemption
    - Application history and timeline
  • HDFS
    - FileSystem ACLs
    - Rolling upgrades

ResourceManager Restart and fail-over
[Diagram: ZooKeeper]

Capacity Scheduler Preemption

Application History and Timeline
  • A few MR-specific implementations: history and web UI
  • Not just MR anymore!
  • History
    - MapReduce-specific Job History Server
    - Beyond ResourceManager restart
  • Timeline
    - Framework-specific event collection and UIs
    - Run analytics on historical apps!

Future

Future: Operational enhancements
  • Rolling upgrades
    - No/minimal impact to users
    - Ideal: always rolling!
  • HDFS in YARN

Future: Enabling more apps
  • Beyond MR
  • Long-running services (discussed next)
  • Isolation
  • Multi-dimensional resource scheduling

Future: Long-running services
  • You can run them already!
  • A few enhancements needed:
    - Logs
    - Security
    - Management/monitoring
    - Resource sharing across workload types
  • Project Slider

Fine-grain isolation for multi-tenancy
  • Custom memory-monitoring
  • Cgroups
  • Linux Containers
  • VMs
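
The "custom memory-monitoring" item presumably refers to the NodeManager's container monitor. In Hadoop 2.x it can kill a container whose process tree exceeds its physical-memory allocation, or exceeds a virtual-memory limit equal to that allocation multiplied by `yarn.nodemanager.vmem-pmem-ratio` (2.1 by default). The sketch below shows only that decision rule. `ContainerUsage`, `over_limit`, and `containers_to_kill` are hypothetical names. The real monitor samples process trees periodically instead of receiving numbers as input.

```python
from dataclasses import dataclass
from typing import Optional

# Simplified sketch (hypothetical names): the real NodeManager monitor
# samples each container's process tree periodically; here the measured
# numbers are passed in directly.

# Assumed to mirror yarn.nodemanager.vmem-pmem-ratio (default 2.1).
VMEM_PMEM_RATIO = 2.1


@dataclass
class ContainerUsage:
    container_id: str
    allocated_mb: int  # physical memory granted to the container
    rss_mb: int        # measured physical (resident) memory
    vmem_mb: int       # measured virtual memory


def over_limit(u: ContainerUsage, ratio: float = VMEM_PMEM_RATIO,
               check_pmem: bool = True,
               check_vmem: bool = True) -> Optional[str]:
    """Return why the container is over its limits, or None if it is fine."""
    if check_pmem and u.rss_mb > u.allocated_mb:
        return f"physical memory {u.rss_mb} MB > {u.allocated_mb} MB"
    vmem_limit_mb = u.allocated_mb * ratio
    if check_vmem and u.vmem_mb > vmem_limit_mb:
        return f"virtual memory {u.vmem_mb} MB > {vmem_limit_mb:.1f} MB"
    return None


def containers_to_kill(usages: list) -> list:
    """IDs of containers a monitor would terminate in this round."""
    return [u.container_id for u in usages if over_limit(u) is not None]


if __name__ == "__main__":
    samples = [
        ContainerUsage("c1", allocated_mb=1024, rss_mb=900, vmem_mb=2000),
        ContainerUsage("c2", allocated_mb=1024, rss_mb=1100, vmem_mb=1500),
        ContainerUsage("c3", allocated_mb=1024, rss_mb=500, vmem_mb=2200),
    ]
    for s in samples:
        print(s.container_id, over_limit(s))
    print(containers_to_kill(samples))  # ['c2', 'c3']
```

With a 1024 MB allocation, the virtual-memory limit works out to about 2150 MB. `c1` stays within both limits, `c2` exceeds its physical allocation, and `c3` stays within physical memory but exceeds the virtual limit. This kind of monitoring is coarser than kernel-enforced isolation, which is why the slide also lists cgroups, Linux containers, and VMs.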
Multi-resource scheduling
  • Today: memory & CPU
    - Physical memory / virtual memory
    - CPU: cores / virtual cores; more CPU work to bake in
  • Disks: space, IOPS
  • Network

Other features
  • Application SLAs
  • Node labels
  • Node affinity/anti-affinity
  • Better online queue management

YARN Ecosystem
Beyond the core YARN project: briefly

Eco-system: Applications Powered by YARN
  • Apache Giraph: graph processing
  • Apache Hama: BSP
  • Apache Hadoop MapReduce: batch
  • Apache Tez: batch/interactive
  • Apache S4: stream processing
  • Apache Samza: stream processing
  • Apache Storm: stream processing
  • Apache Spark: iterative applications
  • HOYA: HBase on YARN
  • YARN frameworks: Apache Twill, REEF by Microsoft, Spring support for Hadoop 2
  • There's an app for that... YARN App Marketplace!

Apache Tez
  • Moving beyond MR
  • A data processing framework that can execute a complex DAG of tasks
  • Talk: "Apache Tez: A New Chapter in Hadoop Data Processing" by Siddharth Seth (YARN & Tez Committer/PMC Member), Thursday, April 3 (4:20-5:00pm)

Recap
  • Apache Hadoop 2 is, at least, twice as good!
  • An exciting journey with Hadoop for this decade
  • Hadoop is no longer a one-trick pony, err, elephant
    - Beyond just HDFS & MapReduce
    - Architecture for the future
    - Centralized data
    - Exciting spectrum of application types, workloads, and use cases

A couple more things...

The Book is out!

Thank you!
  • Download the Sandbox and experience Apache Hadoop: both 2.x and 1.x versions available! http://hortonworks.com/products/hortonworks-sandbox/
  • Questions time!
