Introduction to YARN Apps

Sandy Ryza

Introduction

•  What’s YARN?
•  YARN apps
•  Building YARN apps

The OS analogy

Traditional Operating System

Storage: File System

Execution/Scheduling: Processes/Kernel Scheduler

The OS analogy

Hadoop

Storage: Hadoop Distributed File System (HDFS)

Execution/Scheduling: YARN!

Goal: Multitenancy

•  Different types of applications on the same cluster

•  Different users and organizations on the same cluster

ResourceManager (RM)

•  Central service that tracks
   o  Nodes
      §  Resources
   o  Applications
   o  Containers

•  Houses scheduler, which is in charge of all container placement decisions

NodeManager (NM)

•  One on every node
•  Launches container processes
•  Enforces resource allocations
•  Monitors liveness
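The enforcement role can be pictured with a small toy model. This is only a sketch under stated assumptions: ToyNodeManager and its methods are hypothetical names, not YARN classes, and the only rule modelled is "a container that uses more memory than it was granted gets killed".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of per-node enforcement; not the real YARN NodeManager.
class ToyNodeManager {
    // container id -> memory granted by the scheduler, in MB
    private final Map<String, Integer> allocatedMb = new LinkedHashMap<>();

    // Record a newly launched container and its memory allocation.
    void launch(String containerId, int memoryMb) {
        allocatedMb.put(containerId, memoryMb);
    }

    // Compare observed usage with each allocation, stop tracking every
    // container that exceeds its allocation, and return the killed ids.
    List<String> enforce(Map<String, Integer> observedMb) {
        List<String> killed = new ArrayList<>();
        for (Map.Entry<String, Integer> e : observedMb.entrySet()) {
            Integer limit = allocatedMb.get(e.getKey());
            if (limit != null && e.getValue() > limit) {
                killed.add(e.getKey());
            }
        }
        for (String id : killed) {
            allocatedMb.remove(id);
        }
        return killed;
    }

    boolean isRunning(String containerId) {
        return allocatedMb.containsKey(containerId);
    }
}
```

With two containers granted 1024 MB each, reporting 900 MB for one and 1500 MB for the other kills only the second.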

Application Master (AM)

•  User/application code
•  Every application instance has one
•  Runs inside a container on the cluster
•  Requests resources from ResourceManager

YARN

[Diagram: a Client and a JobHistoryServer alongside a ResourceManager and two NodeManagers; the NodeManagers host containers running the Application Master, a Map Task, and a Reduce Task]

Processing Frameworks / YARN apps

•  MapReduce
   o  Batch processing, fault tolerant
•  Impala
   o  Low-latency SQL on Hadoop
•  Spark
   o  Load data into memory, great for iterative algorithms
•  Storm
   o  Stream processing

YARN app models

•  Application master (AM) per job
•  Simplest model for batch
•  Used by MapReduce

YARN app models

•  Application master per session
•  Runs multiple jobs on behalf of the same user
•  Recently added in Tez
•  Spark interactive mode

YARN app models

•  Singleton AM as permanent service
•  Always on, waits around for jobs to come in
•  Used for Impala

YARN/MR Scheduling

•  ResourceManager (Fair Scheduler): decides which jobs to give resources to
•  MapReduce Application Master: decides which tasks to give resources to within a job
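The split between the two levels can be sketched with a toy model. Everything here is an assumption made for illustration: ToyTwoLevelScheduler is a hypothetical class, "fewest containers held" is a stand-in for a real fair-share calculation, and each job's own policy is plain FIFO.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of two-level scheduling; not the real Fair Scheduler.
class ToyTwoLevelScheduler {
    // job name -> containers currently held by that job
    private final Map<String, Integer> held = new TreeMap<>();
    // job name -> that job's own queue of pending tasks
    private final Map<String, Deque<String>> pending = new TreeMap<>();

    void submit(String job, String... tasks) {
        held.put(job, 0);
        Deque<String> queue = new ArrayDeque<>();
        for (String task : tasks) {
            queue.add(task);
        }
        pending.put(job, queue);
    }

    // Level 1 (cluster scheduler role): among jobs that still have work,
    // pick the one holding the fewest containers; ties go to name order.
    String pickJob() {
        String best = null;
        for (String job : held.keySet()) {
            if (pending.get(job).isEmpty()) {
                continue;
            }
            if (best == null || held.get(job) < held.get(best)) {
                best = job;
            }
        }
        return best;
    }

    // Level 2 (application master role): the chosen job decides which of
    // its tasks receives the container. Returns "job/task", or null if idle.
    String assignNextContainer() {
        String job = pickJob();
        if (job == null) {
            return null;
        }
        String task = pending.get(job).poll();
        held.put(job, held.get(job) + 1);
        return job + "/" + task;
    }
}
```

The cluster-level code never looks inside a job's task queue, and the per-job code never sees other jobs, which is the separation the slide describes.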

Scheduling on Hadoop

[Diagram sequence: ResourceManager, Application Master 1 (AM1), and Node 1, Node 2, Node 3 exchange the following messages in order]

•  AM1 to ResourceManager: I want 2 containers with 1024 MB and 1 core each
•  ResourceManager to AM1: Noted
•  Node 1 to ResourceManager: I’m still here
•  ResourceManager: I’ll reserve some space on node1 for AM1
•  AM1 to ResourceManager: Got anything for me?
•  ResourceManager to AM1: Here’s a security token to let you launch a container on Node 1
•  AM1 to Node 1: Hey, launch my container with this shell command
•  A container starts on Node 1
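The point of the exchange above is that a request is only recorded when it arrives; placement happens when a node checks in, and the application master learns about it on its next poll. A minimal sketch of that timing, assuming a hypothetical ToyResourceManager class (not a YARN API) and ignoring memory sizes and security tokens:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the request / heartbeat / allocate timing.
class ToyResourceManager {
    // Containers asked for but not yet placed on a node.
    private int pendingRequests = 0;
    // Containers placed on a node but not yet handed to the application master.
    private final List<String> ready = new ArrayList<>();

    // AM to RM: "I want n containers". The request is only recorded ("Noted").
    void request(int n) {
        pendingRequests += n;
    }

    // Node to RM: "I'm still here". Pending requests are placed on this node
    // while it still has free slots.
    void nodeHeartbeat(String node, int freeSlots) {
        while (pendingRequests > 0 && freeSlots > 0) {
            ready.add("container@" + node);
            pendingRequests--;
            freeSlots--;
        }
    }

    // AM to RM: "Got anything for me?". Hands over whatever has been placed
    // since the last call.
    List<String> allocate() {
        List<String> out = new ArrayList<>(ready);
        ready.clear();
        return out;
    }
}
```

Calling allocate() right after request(2) returns nothing; containers only show up after some node has sent a heartbeat with free capacity.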

Should you build a YARN app?

•  MapReduce can’t run arbitrary DAGs?
   o  Use Spark
•  MapReduce can’t store data in memory?
   o  Use Spark
•  Iterative processing?
   o  Use Spark
•  Have an existing distributed app that runs all tasks at once?
   o  Use distributed shell

When to build a YARN app

•  Allocating and releasing containers dynamically

•  Weird scheduling requirements
   o  Gang
   o  Complex locality
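As an illustration of the gang case, one way an application could get all-or-nothing behaviour is to hold the containers it receives and start nothing until the whole group is present. This is a sketch under assumptions, not a YARN feature: GangLauncher is a hypothetical helper, and containers are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that buffers containers until a full gang is available.
class GangLauncher {
    private final int gangSize;
    private final List<String> held = new ArrayList<>();

    GangLauncher(int gangSize) {
        this.gangSize = gangSize;
    }

    // Intended to be called from an allocation callback. Returns the
    // containers to start now: empty until gangSize containers are held,
    // then exactly gangSize of them, oldest first.
    List<String> onContainersAllocated(List<String> containers) {
        held.addAll(containers);
        if (held.size() < gangSize) {
            return new ArrayList<>();
        }
        List<String> gang = new ArrayList<>(held.subList(0, gangSize));
        held.subList(0, gangSize).clear();
        return gang;
    }
}
```

A real application would also need a policy for giving held containers back if the gang never fills, which this sketch leaves out.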

What YARN does for you

•  Deploys your bits
•  Runs your processes
•  Monitors your processes
•  Kills your processes when they misbehave

What YARN does not do for you

•  Communication between your processes

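AMRMClientAsync

    import java.util.List;
    import org.apache.hadoop.net.NetUtils;
    import org.apache.hadoop.yarn.api.records.*;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

    // Callbacks fire as the ResourceManager answers the client's heartbeats.
    AMRMClientAsync.CallbackHandler handler = new AMRMClientAsync.CallbackHandler() {
      public void onContainersAllocated(List<Container> containers) {
        for (Container container : containers) {
          startTask(container);
        }
      }
      [... more methods]
    };

    // Heartbeat every 1000 ms. The slide omits init/start; conf is an assumed Configuration.
    AMRMClientAsync<ContainerRequest> amClient =
        AMRMClientAsync.createAMRMClientAsync(1000, handler);
    amClient.init(conf);
    amClient.start();
    amClient.registerApplicationMaster(NetUtils.getHostname(), -1, "");

    // One container: 1024 MB, 1 vcore, preferring node1/node2, then rack1.
    amClient.addContainerRequest(
        new ContainerRequest(
            Resource.newInstance(1024, 1),
            new String[] {"node1", "node2"}, new String[] {"rack1"},
            Priority.newInstance(2)));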
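The container request above names preferred nodes and a preferred rack. How an offer might be judged against such a request can be sketched in plain Java. ToyLocalityRequest is a hypothetical class written for this illustration; only the three level names (node-local, rack-local, off-switch) follow YARN's usual terminology.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of locality matching for a container request.
class ToyLocalityRequest {
    enum Match { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    private final Set<String> nodes;
    private final Set<String> racks;

    ToyLocalityRequest(String[] nodes, String[] racks) {
        this.nodes = new HashSet<>(Arrays.asList(nodes));
        this.racks = new HashSet<>(Arrays.asList(racks));
    }

    // Classify an offered node: best if it is one of the requested nodes,
    // next best if it sits in a requested rack, otherwise anywhere else.
    Match classify(String offeredNode, String offeredRack) {
        if (nodes.contains(offeredNode)) {
            return Match.NODE_LOCAL;
        }
        if (racks.contains(offeredRack)) {
            return Match.RACK_LOCAL;
        }
        return Match.OFF_SWITCH;
    }
}
```

For a request built from {"node1", "node2"} and {"rack1"}, an offer on node2 is node-local, an offer on another node in rack1 is rack-local, and anything else is off-switch.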

NMClientAsync

    import org.apache.hadoop.yarn.client.api.async.NMClientAsync;

    NMClientAsync.CallbackHandler nmHandler = new NMClientAsync.CallbackHandler() {
      [... listen for containers stopped and started]
    };

    NMClientAsync nmClient = NMClientAsync.createNMClientAsync(nmHandler);

Launching Containers

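    public void startContainer(Container container) {
      // The launch context bundles everything the NodeManager needs to start the process.
      ContainerLaunchContext launchContext =
          ContainerLaunchContext.newInstance(
              localResources,               // files to localize onto the node
              environment,                  // environment variables
              Arrays.asList("sleep 1000"),  // shell command(s) to run
              serviceData,
              tokens,
              acls);
      nmClient.startContainerAsync(container, launchContext);
    }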

Local resources

[Diagram: file.txt in HDFS, with a copy of file.txt on each of two nodes; each node runs two containers]
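The picture suggests that a file is fetched from shared storage once per node and then used by the containers on that node. A minimal sketch of that bookkeeping, assuming a hypothetical ToyLocalizer class and assuming the resource is one that containers on a node are allowed to share:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of per-node resource localization.
class ToyLocalizer {
    // node name -> files already copied onto that node
    private final Map<String, Set<String>> cachedOnNode = new HashMap<>();
    private int downloads = 0;

    // Called when a container needing `file` starts on `node`. Returns true
    // if the file had to be copied from shared storage, false if a copy
    // already on the node was reused.
    boolean localize(String node, String file) {
        Set<String> cache = cachedOnNode.computeIfAbsent(node, n -> new HashSet<>());
        if (cache.add(file)) {
            downloads++;
            return true;
        }
        return false;
    }

    // Total number of copies made from shared storage so far.
    int downloads() {
        return downloads;
    }
}
```

Starting two containers on each of two nodes, all needing file.txt, results in two copies from shared storage rather than four.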
