
Apache Hadoop YARN: Present and Future

Vinod Kumar Vavilapalli, Hortonworks


© Hortonworks Inc. 2014

Apache Hadoop YARN: Present and Future

Vinod Kumar Vavilapalli

vinodkv [at] apache.org

@tshooter


A quick show of hands..

• Hadoop 2

Architecting the Future of Big Data

Real life Hadoop Logo


Who am I?

• 6.75 Hadoop-years old
• Last thing at school: a two-node Tomcat cluster. Three months later, first thing at my job: brought down an 800-node cluster ;)
• Previously @Yahoo!
• Now @Hortonworks
• Two hats
  – Hortonworks: Hadoop MapReduce and YARN development lead
  – Apache: Apache Hadoop YARN lead, Apache Hadoop PMC member, Apache Member
• Worked/working on
  – YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security
  – Apache Ambari: kickstarted the project and its first release
  – Stinger: high-performance data processing with Hadoop/Hive
• Lots of troubleshooting on clusters
• 99%+ code in Apache Hadoop


Agenda

• Apache Hadoop 2: Overview
• Past
• Present
• Future


Apache Hadoop 2: Next Generation Architecture


What is YARN?

• Resource Management Platform
  – MapReduce v2
  – Beyond MapReduce with Tez, Storm, Spark; in Hadoop!
  – Did I mention services like HBase and Accumulo on YARN with HoYA/Slider?
• How is it different from Hadoop 1?


Hadoop 1 vs Hadoop 2

HADOOP 1.0: Single Use System (Batch Apps)
– MapReduce (cluster resource management & data processing)
– HDFS (redundant, reliable storage)

HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
– MapReduce (data processing) and Others
– YARN (cluster resource management)
– HDFS2 (redundant, highly-available & reliable storage)


Key Benefits of YARN

• Scale

• New Programming Models & Services

• Improved cluster utilization

• Agility

• To infinity and beyond ..


Why Migrate?

• 2.0 >= 2 * 1.0
  – HDFS: Lots of ground-breaking features
  – YARN: Next generation architecture
• Return on Investment: 2x throughput on same hardware!
• Ready for improvements in hardware
• Not convinced? Let’s see what others are saying!


Yahoo!

• Leader/Visionary on all things Hadoop!
• On YARN (0.23.x)
• Moving fast to 2.x


http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html


Twitter


eBay

• Has one of the largest Hadoop clusters in the industry with many petabytes of data

• Migrated production clusters to Hadoop-2
• Go to Mayank’s talk
  – “Hadoop-2 @ ebay”!
  – Thursday, April 3
  – Track: Deployment and Operations
• Should be convinced by now... No?


YARN: the Data Operating System


Present


Apache Hadoop releases: Apache Hadoop 2.2

• 15 October, 2013
• The 1st GA release of Apache Hadoop 2.x
• YARN
  – First stable and supported release of YARN
  – Binary compatibility for MapReduce applications built on hadoop-1.x
  – YARN-level APIs solidified for the future
  – Performance
  – Scale!
• HDFS
  – High Availability for HDFS
  – HDFS Federation
  – HDFS Snapshots
  – NFSv3 access to data in HDFS
• Support for running Hadoop on Microsoft Windows
• Substantial amount of integration testing with the rest of the projects in the ecosystem


Apache Hadoop releases (contd): Apache Hadoop 2.3

• 24 February, 2014
• First post-GA release for the year 2014
• Alpha features in YARN
  – ResourceManager HA
  – Application History
  – Covered with the 2.4 content
• HDFS
  – Details follow
• A number of bug fixes and enhancements


HDFS: Heterogeneous Storage


HDFS: DataNode caching


Apache Hadoop releases (contd): Apache Hadoop 2.4

• Very soon!
• YARN
  – Details follow
  – ResourceManager restart and fail-over for high availability
  – Preemption
  – Application History and Timeline
• HDFS
  – FileSystem ACLs
  – Rolling upgrades


ResourceManager Restart and fail-over


ZooKeeper


Capacity Scheduler Preemption


Application History and Timeline

• Few MR-specific implementations: History and web UI
• Not just MR anymore!
• History
  – MapReduce-specific Job History Server
  – Beyond ResourceManager restart
• Timeline
  – Framework-specific event collection and UIs
• Run analytics on historical apps!


Future


Future: Operational enhancements

• Rolling upgrades
  – No/minimal impact to users
  – Ideal: always rolling!
• HDFS in
• YARN


Future: Enabling more apps

• Beyond MR
• Discussing next
  – Long running services
  – Isolation
  – Multi-dimensional resource scheduling


Future: Long running services

• You can run them already!
• A few enhancements needed
  – Logs
  – Security
  – Management/monitoring

• Resource sharing across workload types

• Project Slider


Fine-grain isolation for multi-tenancy

• Custom memory-monitoring
• Cgroups
• Linux Containers
• VMs
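The "custom memory-monitoring" item presumably refers to the NodeManager's existing container memory check: a container whose physical memory use exceeds its allocation, or whose virtual memory use exceeds the allocation multiplied by `yarn.nodemanager.vmem-pmem-ratio` (2.1 by default), is killed. The function below is a simplified, hypothetical sketch of that decision only; the name `memory_violation` is invented here, and the real monitor periodically samples the container's whole process tree and applies further rules.

```python
from typing import Optional

# Mirrors the default value of yarn.nodemanager.vmem-pmem-ratio.
DEFAULT_VMEM_PMEM_RATIO = 2.1


def memory_violation(pmem_used_mb: float, vmem_used_mb: float,
                     allocated_mb: float,
                     vmem_pmem_ratio: float = DEFAULT_VMEM_PMEM_RATIO) -> Optional[str]:
    """Hypothetical sketch: return why a container would be killed, or None."""
    # Physical memory may not exceed what the scheduler allocated.
    if pmem_used_mb > allocated_mb:
        return "physical memory limit exceeded"
    # Virtual memory may exceed the allocation, but only up to a fixed ratio.
    vmem_limit_mb = allocated_mb * vmem_pmem_ratio
    if vmem_used_mb > vmem_limit_mb:
        return "virtual memory limit exceeded"
    return None


# A 1024 MB container may use up to 1024 * 2.1 = 2150.4 MB of virtual memory.
print(memory_violation(900, 2000, 1024))   # None
print(memory_violation(1100, 1500, 1024))  # physical memory limit exceeded
print(memory_violation(900, 2500, 1024))   # virtual memory limit exceeded
```

A fixed ratio is a coarse tool, which is part of why the list goes on to cgroups, Linux containers and VMs for finer-grained isolation.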


Multi-resource scheduling

• Today: memory & CPU
  – Physical memory / virtual memory
  – CPU cores: virtual cores
• CPU: more to bake in
• Disks
  – Space
  – IOPS
• Network
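Scheduling over several dimensions changes how many containers a node can host: a request has to fit along every dimension, so the scarcest one sets the limit. The sketch below models this with plain dictionaries. The dimension names (`memory_mb`, `vcores`, `disk_iops`) and the helper functions are hypothetical illustrations, not YARN's Java `Resource` API, and a real scheduler layers queues, locality and fairness on top.

```python
from typing import Dict

# Hypothetical resource vector: dimension name -> amount (MB, cores, IOPS, ...).
Resources = Dict[str, int]


def fits(request: Resources, available: Resources) -> bool:
    """A request fits only if it fits along every dimension it asks for.
    Dimensions the node does not advertise count as zero capacity."""
    return all(amount <= available.get(dim, 0) for dim, amount in request.items())


def allocate(available: Resources, request: Resources) -> Resources:
    """Return the node's remaining capacity after placing one container."""
    if not fits(request, available):
        raise ValueError("request does not fit on this node")
    return {dim: amount - request.get(dim, 0) for dim, amount in available.items()}


def max_containers(request: Resources, capacity: Resources) -> int:
    """How many identical containers fit: the scarcest dimension is the bound."""
    limits = [capacity.get(dim, 0) // amount
              for dim, amount in request.items() if amount > 0]
    if not limits:
        raise ValueError("request must ask for a positive amount of something")
    return min(limits)


node = {"memory_mb": 8192, "vcores": 4}
request = {"memory_mb": 1024, "vcores": 2}

print(max_containers({"memory_mb": 1024}, node))  # 8: memory-only view
print(max_containers(request, node))              # 2: CPU is the bottleneck
print(allocate(node, request))                    # {'memory_mb': 7168, 'vcores': 2}
```

With memory alone the node looks like it can host 8 containers, which would oversubscribe its CPUs. Adding disk or network is just another key in the vector, and `fits` rejects a request for a dimension the node does not advertise.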


Other features

• Application SLAs
• Node labels
• Node affinity/anti-affinity
• Better online queue management


YARN Ecosystem: Beyond the core YARN project, briefly


Eco-system


Applications Powered by YARN

Apache Giraph – Graph Processing

Apache Hama – BSP

Apache Hadoop MapReduce – Batch

Apache Tez – Batch/Interactive

Apache S4 – Stream Processing

Apache Samza – Stream Processing

Apache Storm – Stream Processing

Apache Spark – Iterative applications

HOYA – HBase on YARN

YARN Frameworks

Apache Twill

REEF by Microsoft

Spring support for Hadoop 2

There's an app for that...

YARN App Marketplace!


Apache Tez

• Moving beyond MR
• A data processing framework that can execute a complex DAG of tasks
• “Apache Tez - A New Chapter in Hadoop Data Processing”
  – By Siddharth Seth: YARN & Tez Committer/PMC Member
  – Thursday, April 3 (4:20-5:00pm)
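To make "execute a complex DAG of tasks" concrete: MapReduce fixes every job to a map stage followed by a reduce stage, while a DAG engine can run any dependency graph, starting a task once all of its inputs are complete. The sketch below is not Tez's API; it uses Python's standard `graphlib` module (3.9+) to group a hypothetical query plan into waves of tasks that could run concurrently. The task names are invented for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_waves(dag: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; every task in a wave has all its inputs done.

    `dag` maps each task to the set of tasks whose output it consumes.
    Raises graphlib.CycleError if the graph is not acyclic.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose inputs are all complete
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished successfully
    return waves


# Hypothetical plan: two scans feed a join, followed by an aggregation and a sort.
plan = {
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
    "sort": {"aggregate"},
}
print(execution_waves(plan))
# [['scan_customers', 'scan_orders'], ['join'], ['aggregate'], ['sort']]

# A classic MapReduce job is the special case of a two-node DAG.
print(execution_waves({"reduce": {"map"}}))  # [['map'], ['reduce']]
```

Expressing the same plan in MapReduce would take a chain of separate jobs, each writing intermediate output back to HDFS; running it as one DAG avoids those extra job boundaries.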


Recap



• Apache Hadoop 2 is, at least, twice as good!

• Exciting journey with Hadoop for this decade…
  – Hadoop is no longer a one-trick pony, err, elephant
  – Beyond just HDFS & MapReduce
• Architecture for the future
  – Centralized data
  – Exciting spectrum of application types, workloads and use cases


A couple more things...


The Book is out!


http://yarn-book.com/


Thank you!


Download Sandbox: Experience Apache Hadoop

Both 2.x and 1.x Versions Available!

http://hortonworks.com/products/hortonworks-sandbox/

Questions Time!
