YARN – The Data Operating System
Tom Benton
© Hortonworks Inc. 2011 – 2014. All Rights Reserved

YARN - Strata 2014

DESCRIPTION

Part of the core Hadoop project, YARN is the architectural center of Hadoop that allows multiple data processing engines such as interactive SQL, real-time streaming, data science and batch processing to handle data stored in a single platform, unlocking an entirely new approach to analytics. It is the foundation of the new generation of Hadoop and is enabling organizations everywhere to realize a Modern Data Architecture.

Page 1: YARN - Strata 2014

YARN – The Data Operating System
Tom Benton

Page 2: YARN - Strata 2014

In the beginning…

[Diagram: Traditional Hadoop, 2006. Cluster nodes 1 … N running HDFS (Hadoop Distributed File System), with MapReduce on top for largely batch processing.]

Traditional Hadoop allowed early adopters to deal with data at scale via:
•  Single-purpose clusters, specific data sets
•  Primarily batch-oriented applications using MapReduce

However…
•  No direct way to integrate interactive and real-time applications
•  Limited enterprise capabilities: Operations, Security & Governance

Page 3: YARN - Strata 2014

…with increased adoption and breadth of use cases, a new approach was needed

[Diagram: Traditional Hadoop (cluster nodes 1 … N, HDFS, MapReduce for largely batch processing) placed on a timeline.]

•  2006: Traditional Hadoop
•  Jan 2008: MAPREDUCE-279 outlines a new architecture for Hadoop which allows for efficient use of resources across many types of apps
•  2011: Hortonworks founded; work accelerates on Hadoop’s next-gen architecture

Page 4: YARN - Strata 2014

Traditional Hadoop, challenges & limitations

[Diagram: a Traditional Hadoop cluster (nodes 1 … N, HDFS, MapReduce for largely batch processing) shown next to the existing data architecture, in three layers:
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: RDBMS, EDW, MPP
SOURCES: EXISTING Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured]

Architectural Limitations
•  Primarily a batch system using MapReduce
•  Single-purpose clusters, specific data sets

Enterprise Challenges
•  Limited enterprise capabilities: Operations, Security & Governance
•  Created additional silos

Interoperability Challenges
•  Difficult to natively integrate existing applications

Page 5: YARN - Strata 2014

YARN Has Fundamentally Changed Hadoop

YARN, and with it Enterprise Hadoop, enables:
•  More Workloads: from batch to interactive & real-time
•  More Data: multiple data sets of varying types and structures
•  More Value: hosting multiple business cases in a single Hadoop cluster

Page 6: YARN - Strata 2014

Timeline:
•  2006: Traditional Hadoop (cluster nodes 1 … N, HDFS, MapReduce for largely batch processing)
•  2008: MAPREDUCE-279
•  2011
•  October 23, 2013: Enterprise Hadoop Era Begins with Hadoop 2 & YARN

[Diagram: YARN : Data Operating System. Cluster nodes 1 … N running HDFS (Hadoop Distributed File System), with Batch, Interactive and Real-Time workloads on top. Core of Enterprise Hadoop.]

Architected & led development of YARN to enable the Modern Data Architecture

Page 7: YARN - Strata 2014

Benefits Enabled by MDA and YARN

SOLUTION: A single set of data across the entire cluster with multiple access methods using “zones” for processing

[Diagram: one cluster (nodes 1 … n) divided into processing zones: Batch (Pig), Interactive (Hive), Real Time (HBase), Real Time Streams (Storm), In Memory (Spark).]

Single Cluster, Multiple Workloads
•  Maximize compute resources to lower TCO
•  No standalone, siloed clusters
•  Simple management & operations

…all enabled by YARN
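The “zones” idea on this slide can be illustrated with a small sketch. It is a toy model only and does not use YARN's scheduler or any Hadoop API. The `Zone` class, the `allocate` function, the percentage shares and the 1000 GB cluster size are all hypothetical values chosen for illustration; only the workload names come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str          # workload label taken from the slide
    capacity_pct: int  # hypothetical share of the cluster, in percent


def allocate(total_memory_gb: int, zones: list[Zone]) -> dict[str, float]:
    """Split one cluster's memory among zones by percentage share."""
    total_pct = sum(z.capacity_pct for z in zones)
    if total_pct != 100:
        raise ValueError(f"zone capacities must sum to 100, got {total_pct}")
    return {z.name: total_memory_gb * z.capacity_pct / 100 for z in zones}


# Assumed shares; the slide gives no numbers.
zones = [
    Zone("batch (Pig)", 40),
    Zone("interactive (Hive)", 30),
    Zone("real-time streams (Storm)", 20),
    Zone("in-memory (Spark)", 10),
]

shares = allocate(1000, zones)
for name, gb in shares.items():
    print(f"{name}: {gb:.0f} GB")
```

Every zone draws on the same pool, so no capacity is stranded in a standalone, siloed cluster. That is the point the slide makes about lowering TCO.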

Page 8: YARN - Strata 2014

YARN and HDP Enable the Modern Data Architecture

[Diagram: HDP (Hortonworks Data Platform), with YARN: Data Operating System at the center, running over HDFS (Hadoop Distributed File System) on cluster nodes 1 … N. The surrounding blocks are:

GOVERNANCE & INTEGRATION
Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS

BATCH, INTERACTIVE & REAL-TIME DATA ACCESS
Script: Pig
SQL: Hive, HCatalog
NoSQL: HBase, Accumulo
Stream: Storm
Search: Solr
In-Memory: Spark
Other ISVs
Tez (labelled twice in the diagram)

DATA MANAGEMENT
YARN: Data Operating System, HDFS

SECURITY
Authentication, Authorization, Accounting, Data Protection
Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox

OPERATIONS
Provision, Manage & Monitor: Ambari, Zookeeper
Scheduling: Oozie]

YARN is the architectural center of Hadoop and HDP
•  YARN enables a common data set across all applications
•  Batch, interactive & real-time workloads
•  Supports multi-tenant access & processing

HDP enables Apache Hadoop to become an enterprise-viable data platform with centralized services
•  Security
•  Governance
•  Operations
•  Productization

Enabled broad ecosystem adoption

Hortonworks drove this innovation of Hadoop through YARN

Page 9: YARN - Strata 2014

Page 10: YARN - Strata 2014

Thank You! Questions?

[Diagram: YARN: Data Operating System. Cluster nodes 1 … N running HDFS (Hadoop Distributed File System), with Interactive, Real-Time and Batch workloads on top.]