
Combine SAS High-Performance Capabilities with Hadoop YARN


© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Q&A box is available for your questions

Webinar will be recorded for future viewing

Thank you for joining!

We’ll get started soon…

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Combine SAS High-Performance Capabilities with Hadoop YARN

We do Hadoop.

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Your speakers…

Arun Murthy, Founder and Architect, Hortonworks (@acmurthy)

Paul Kent, Vice President, Big Data, SAS (@hornpolish)

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Agenda

•  Introduction to YARN
•  SAS Workloads on the Cluster
•  SAS Workloads: Resource Settings
•  SAS and YARN
•  YARN Futures
•  Next Steps

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

The 1st Generation of Hadoop: Batch

HADOOP 1.0: Built for Web-Scale Batch Apps

[Diagram: three separate silos, each a single application (BATCH, INTERACTIVE, ONLINE) running on its own HDFS cluster]

•  All other usage patterns must leverage that same infrastructure

•  Forces the creation of silos for managing mixed workloads

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Hadoop MapReduce Classic

JobTracker

§ Manages cluster resources and job scheduling

TaskTracker

§  Per-node agent

§  Manages tasks

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

MapReduce Classic: Limitations

Scalability
§  Maximum cluster size: 4,000 nodes
§  Maximum concurrent tasks: 40,000
§  Coarse synchronization in JobTracker

Availability
§  Failure kills all queued and running jobs

Hard partition of resources into map and reduce slots
§  Low resource utilization

Lacks support for alternate paradigms and services
§  Iterative applications implemented using MapReduce are 10x slower

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Our Vision: Hadoop as Next-Gen Platform

[Diagram: Hadoop 1 vs. Hadoop 2]

Hadoop 1
•  Silos & largely batch
•  Single processing engine: MapReduce handles both cluster resource management and data processing on HDFS, with Script (Pig), SQL (Hive) and other engines layered on top

Hadoop 2 with YARN
•  Multiple engines, single data set
•  Batch, Interactive & Real-Time
•  YARN: Data Operating System (cluster resource management) on HDFS (Hadoop Distributed File System), running Script (Pig), SQL (Hive), Java (Cascading) and other engines on Tez; Real-time (HBase); HBase, Accumulo, Storm, Solr, Spark and ISV engines

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

YARN: Taking Hadoop Beyond Batch

Applications Run Natively IN Hadoop

BATCH (MapReduce)  |  INTERACTIVE (Tez)  |  STREAMING (Storm, S4, …)  |  GRAPH (Giraph)  |  IN-MEMORY (Spark)  |  HPC MPI (OpenMPI)  |  ONLINE (HBase)  |  OTHER (Search, Weave, …)

YARN (Cluster Resource Management)

HDFS2 (Redundant, Reliable Storage)

Store ALL DATA in one place…

Interact with that data in MULTIPLE WAYS

with Predictable Performance and Quality of Service

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

YARN

Hortonworks Data Platform

[Diagram: Hortonworks Data Platform — engines on the YARN Data Operating System]

YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System), running:
•  Batch: MR (MapReduce)
•  Script (Pig), SQL (Hive), Java (Cascading) and other engines on Tez
•  NoSQL (HBase, Accumulo), Stream (Storm) and other engines on Slider
•  In-Memory: Spark
•  PaaS: Kubernetes
•  SAS LASR, HPA

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

5 Key Benefits of YARN

1.  Scale

2.  New Programming Models & Services

3.  Improved cluster utilization

4.  Agility

5.  Beyond Java

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Concepts

Application
§  An application is a temporal job or a service submitted to YARN
§  Examples
   – MapReduce job (job)
   – HBase cluster (service)

Container
§  Basic unit of allocation
§  Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.)
   – container_0 = 2 GB, 1 CPU
   – container_1 = 1 GB, 6 CPU
§  Replaces the fixed map/reduce slots
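A quick way to see containers as the unit of allocation is the DistributedShell sample application that ships with YARN. A minimal sketch, assuming a Hadoop 2.x install with the example jar at the path shown (the path is a placeholder): it asks YARN for two 2 GB / 1-vcore containers that each run a shell command.

yarn jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-*.jar \
  org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-*.jar \
  -shell_command "hostname" \
  -num_containers 2 \
  -container_memory 2048 \
  -container_vcores 1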

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Design Centre

Split up the two major functions of JobTracker
§  Cluster resource management
§  Application life-cycle management

MapReduce becomes user-land library

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

YARN Architecture – Walkthrough

[Diagram: a ResourceManager (with its Scheduler) and a grid of NodeManagers. Client2 submits an application to the ResourceManager; each application runs its own ApplicationMaster (AM 1, AM 2) in a container on a NodeManager, and the AMs negotiate further containers (Containers 1.1–1.3 and 2.1–2.4) that NodeManagers launch across the cluster.]
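To follow this walkthrough on a live cluster, the stock YARN CLI shows the same moving parts (standard Hadoop 2.x commands; the application id is a placeholder):

yarn application -list                       # running applications and their ApplicationMasters
yarn application -status <application_id>    # detail for a single application
yarn node -list                              # NodeManagers and the number of containers on each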

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Multi-Tenancy with YARN

Economics as queue-capacity
§  Hierarchical Queues

SLAs
§  Preemption

Resource Isolation
§  Linux: cgroups
§  MS Windows: Job Control
§  Roadmap: Virtualization (Xen, KVM)

Administration
§  Queue ACLs
§  Run-time re-configuration for queues
§  Charge-back

Capacity Scheduler – Hierarchical Queues
[Diagram: the ResourceManager Scheduler with an example queue tree — root: Adhoc 10%, DW 70%, Mrkting 20% — each subdivided into nested queues (Dev 10%, Reserved 20%, Prod 70%; Dev 20%, Prod 80%; P0 70%, P1 30%)]
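As a sketch of how these queues are used (the queue name and jar path below mirror the example tree above and are placeholders, not defaults): jobs are submitted to a leaf queue, and queue definitions can be re-read at run time.

# submit a MapReduce example job to the 'prod' leaf queue
hadoop jar hadoop-mapreduce-examples.jar wordcount \
  -Dmapreduce.job.queuename=prod /data/in /data/out

# re-read queue definitions after editing capacity-scheduler.xml
yarn rmadmin -refreshQueues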

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

YARN Applications

Data processing applications and services
§  Services – Slider
§  Real-time event processing – Storm, S4, other commercial platforms
§  Tez – Generic framework to run a complex DAG
§  MPI: OpenMPI, MPICH2
§  Master-Worker
§  Machine Learning: Spark
§  Graph processing: Giraph
§  Enabled by allowing the use of paradigm-specific application masters

Run all on the same Hadoop cluster!
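On a shared cluster you can see the different frameworks side by side; the application-type strings below are the common built-in ones and will vary by framework:

yarn application -list -appTypes MAPREDUCE,TEZ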

Copyright © 2014, SAS Institute Inc. All rights reserved.

SHARE!

Customers are:

wrapping up POCs

building Bigger Clusters

assembling their Data { Lake, Reservoir }

wanting their software to SHARE the cluster

Copyright © 2014, SAS Institute Inc. All rights reserved.

SAS Workloads on the Cluster

Copyright © 2014, SAS Institute Inc. All rights reserved.

SAS Workloads on the Cluster - Video

Copyright © 2014, SAS Institute Inc. All rights reserved.

SAS Workloads on the Cluster

Some Requests are for a significant slice of the cluster
•  Reservation will be ALL DAY, ALL WEEK, ALL MONTH?
•  Memory typically fixed (15% of cluster)
•  CPU floor, would like the spare capacity when available

Some Requests are more short term
•  Memory can be estimated
•  Duration can be capped
•  CPU floor, would like spare capacity

Copyright © 2014, SAS Institute Inc. All rights reserved.

SAS Workloads on the Cluster

Copyright © 2014, SAS Institute Inc. All rights reserved.

How much should you reserve?
•  Not a perfect science yet

Long Running?
•  LASR server: by percent of total memory

More like a batch request?
•  HPA procedure: by anecdotal experience
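A back-of-the-envelope sketch of the "percent of total memory" rule of thumb (the numbers are illustrative, not SAS sizing guidance):

TOTAL_CLUSTER_GB=2048     # aggregate RAM across the worker nodes
LASR_PERCENT=15           # long-running LASR server share from the slide above
echo "Reserve ~$(( TOTAL_CLUSTER_GB * LASR_PERCENT / 100 )) GB for the LASR server"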

SAS Workloads – Resource Settings

Copyright © 2014, SAS Institute Inc. All rights reserved.

if [ "$USER" = "lasradm" ]; then # Custom settings for running under the lasradm account. export TKMPI_ULIMIT="-v 50000000” export TKMPI_MEMSIZE=50000 export TKMPI_CGROUP="cgexec -g cpu:75” fi # if [ "$TKMPI_APPNAME" = "lasr" ]; then # Custom settings for a lasr process running under any account. # export TKMPI_ULIMIT="-v 50000000" # export TKMPI_MEMSIZE=50000 # export TKMPI_CGROUP="cgexec -g cpu:75"

SAS Workloads – Resource Settings

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

YARN: Taking Hadoop Beyond Batch

Applications Run Natively IN Hadoop

BATCH (MapReduce)  |  INTERACTIVE (Tez)  |  STREAMING (Storm, S4, …)  |  GRAPH (Giraph)  |  IN-MEMORY (Spark)  |  ONLINE (HBase)

YARN

HDFS2 (Redundant, Reliable Storage)

Store ALL DATA in one place…

Interact with that data in MULTIPLE WAYS

with Predictable Performance and Quality of Service

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

YARN Futures

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

YARN – Delegated Container Model

[Diagram, steps 1–3: AM 1 sends an allocate request to the ResourceManager Scheduler (1), is granted a container (2), and issues startContainer to a NodeManager, which launches Container 1.1 (3).]

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

YARN – Delegated Container Model

[Diagram, step 4: after obtaining a container through the same allocate/container exchange (1–3), AM 1 calls delegateContainer so the already-running ServiceX can use that container's resources (4).]

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

YARN – Delegated Container Model

[Diagram, step 5: ServiceX now runs with the delegated container's resources on its NodeManager.]

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

YARN – Delegated Container Model

[Diagram, step 6: ServiceX continues as a long-running service, holding the delegated resources tracked by the ResourceManager Scheduler.]

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

PaaS – Kubernetes-on-YARN
YARN as the default enterprise-class scheduler and resource manager for Kubernetes and OpenShift 3

•  First-class support for containerization and mainstream PaaS

•  Updated Go language bindings for YARN

•  Uses the container delegation model

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Labels – Constraint Specifications

[Diagram: a cluster in which some NodeManagers are labeled "w/ GPU". The MapReduce application (MR AM 1 with map 1.1, map 1.2, reduce 1.1) runs on unlabeled nodes, while a second application (DL-AM with containers DL 1.1–1.3) is constrained by the ResourceManager Scheduler to the GPU-labeled NodeManagers.]
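The label CLI was still a roadmap item when this deck was written; in later Hadoop releases (2.6+) the admin commands look roughly like the sketch below, with the hostname as a placeholder:

yarn rmadmin -addToClusterNodeLabels gpu
yarn rmadmin -replaceLabelsOnNode "gpu-node1.example.com=gpu"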

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Reservations - SLAs via Allocation Planning

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

YARN

Hortonworks Data Platform

[Diagram: Hortonworks Data Platform — engines on the YARN Data Operating System]

YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System), running:
•  Batch: MR (MapReduce)
•  Script (Pig), SQL (Hive), Java (Cascading) and other engines on Tez
•  NoSQL (HBase, Accumulo), Stream (Storm) and other engines on Slider
•  In-Memory: Spark
•  PaaS: Kubernetes
•  SAS LASR, HPA

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Next Steps…

Download the Hortonworks Sandbox

Learn Hadoop

Build Your Analytic App

Try Hadoop 2

More about SAS & Hortonworks http://hortonworks.com/partner/SAS/

Contact us: [email protected]