About Me
• Apache Hadoop Committer
• YARN/MapReduce Contributor
• Senior Software Engineer @Intel Corporation
Agenda
• Objectives
• HDFS Applications with HPC File Systems
• YARN Application
• MapReduce Job
• HPC Schedulers
• YARN Protocols
• Log Aggregation
• Shuffle Implementation
• Q&A
Objectives
• Use an existing HPC cluster for running Hadoop applications
• Use any of the HPC file systems, such as Lustre, PVFS, IBRIX Fusion, etc.
• Use any of the HPC schedulers, such as Slurm, Moab, PBS Pro, etc.
• Combine Hadoop workloads with HPC workloads
• No code changes to existing Hadoop (HDFS/YARN/MR) applications
• Minimal Hadoop configuration changes
HDFS Applications Using HPC File Systems
public class <HPC>Adapter extends AbstractFileSystem {
    ……
}

[Diagram: an HDFS application runs unchanged on top of the HPC file system through the adapter.]
Hadoop Configurations for File System

<property>
  <name>fs.AbstractFileSystem.${hpc-uri}.impl</name>
  <value>HPCFileSystemAdapter</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>${hpc-uri}:///</value>
</property>
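Since HPC file systems such as Lustre or PVFS are typically POSIX-mounted on every node, one minimal way to realize such an adapter is to delegate to Hadoop's local file system over the mount point. The sketch below is illustrative only, not the actual implementation: the "hpc" scheme is an assumption standing in for whatever ${hpc-uri} resolves to.

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DelegateToFileSystem;
import org.apache.hadoop.fs.RawLocalFileSystem;

// Minimal sketch of an HPC file system adapter. It delegates every
// AbstractFileSystem call to RawLocalFileSystem, which works when the
// HPC file system (Lustre, PVFS, ...) is POSIX-mounted on all nodes.
public class HPCFileSystemAdapter extends DelegateToFileSystem {
  public HPCFileSystemAdapter(URI uri, Configuration conf)
      throws IOException, URISyntaxException {
    // "hpc" is an assumed scheme; it must match the
    // fs.AbstractFileSystem.<scheme>.impl configuration key above.
    super(uri, new RawLocalFileSystem(), conf, "hpc", false);
  }
}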
YARN Application
[Diagram: a Client submits to the Resource Manager; Node Managers on each node host the App Master and its Containers.]
YARN Application with HPC Scheduler
[Diagram: the Client submits to the HPC Scheduler Master; HPC Scheduler Slaves on each node host the App Master and its Containers, in place of the Resource Manager and Node Managers.]
YARN Application Submission with HPC Scheduler

[Diagram: the YARN application's Yarn Client talks to the HPC Scheduler through HPCApplicationClientProtocolImpl; the Application Master's AMRM Client and NM Client talk through HPCApplicationMasterProtocolImpl and HPCContainerManagementProtocolImpl; each Container runs a Yarn Child.]

1. submitApplication()
2. submit()
3. run
4. Launch App Master
5. Allocate
6. Run Child Task
7. Launch Yarn Child
8. Report Progress
MapReduce Job with HPC Scheduler

[Diagram: Job Client → Job Submitter → Yarn Client → ResourceMgrDelegate → ApplicationClientProtocolProxy. LocalClientProtocolProvider selects the Local Job Runner; YarnClientProtocolProvider selects the YARN Runner. With the HPC scheduler, HPCApplicationClientProtocolImpl talks to the HPC Scheduler in place of the Resource Manager; completed jobs report to the Job History Server.]

1. submit()
2. submitJobInternal()
3. submitJob()
4. submit()
5. submit()
6. submit()
7. run()
YARN Protocols Configurations

RPC Class Configuration

<property>
  <description>RPC class implementation</description>
  <name>yarn.ipc.rpc.class</name>
  <value>HadoopYarnHPCRPC</value>
</property>
YARN Protocols Configurations

public class HadoopYarnHPCRPC extends HadoopYarnProtoRPC {
  @Override
  public Object getProxy(Class protocol, InetSocketAddress address,
      Configuration conf) {
    Object proxy;
    if (protocol == ApplicationClientProtocol.class) {
      proxy = new HPCApplicationClientProtocolImpl(conf);
    } else if (protocol == ApplicationMasterProtocol.class) {
      proxy = new HPCApplicationMasterProtocolImpl(conf);
    } else if (protocol == ContainerManagementProtocol.class) {
      proxy = new HPCContainerManagementProtocolImpl(conf);
    } else {
      // Fall back to the default Hadoop RPC proxy for all other protocols.
      proxy = super.getProxy(protocol, address, conf);
    }
    return proxy;
  }
}
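Existing client code needs no changes to pick this up: YarnRPC.create() reads yarn.ipc.rpc.class and instantiates the configured class, so every proxy request is routed through the HPC implementations. A minimal usage sketch follows; the RM address is only a placeholder here, on the assumption that the HPC implementations ignore it.

import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
import org.apache.hadoop.yarn.ipc.YarnRPC;

public class HPCRpcUsage {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // With yarn.ipc.rpc.class=HadoopYarnHPCRPC, create() returns the
    // HPC-aware factory instead of the default HadoopYarnProtoRPC.
    YarnRPC rpc = YarnRPC.create(conf);
    ApplicationClientProtocol client =
        (ApplicationClientProtocol) rpc.getProxy(
            ApplicationClientProtocol.class,
            new InetSocketAddress("localhost", 8032), conf);
    // client is an HPCApplicationClientProtocolImpl; its calls map to
    // HPC scheduler commands rather than ResourceManager RPCs.
  }
}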
Application Client Protocol
YARN Application Client Protocol Flow

[Diagram: the YARN application client's RMClient calls the Resource Manager through RMApplicationClientProtocolProxy.]

1. getNewApplication()
2. submitApplication()
3. forceKillApplication()
4. getClusterMetrics()
5. getClusterNodes()
6. getQueueInfo()
…
Application Client Protocol
YARN Application Client HPC Scheduler Flow

[Diagram: the YARN application client's RMClient issues commands to the HPC Scheduler through HPCApplicationClientProtocolImpl.]

1. Allocate Resources cmd
2. Submit Job (Batch) cmd
3. Cancel Job cmd
4. Cluster Info cmd
5. Get Jobs Report cmd
…
Application Client Protocol
APIs for interaction

1. getNewApplication()
The interface used by clients to obtain a new ApplicationId for submitting new applications.
2. submitApplication()
The interface used by clients to submit a new application to the ResourceManager (see the sketch after this list).
3. forceKillApplication()
The interface used by clients to request the ResourceManager to abort a submitted application.
4. getClusterMetrics()
The interface used by clients to get metrics about the cluster from the ResourceManager.
5. getClusterNodes()
The interface used by clients to get a report of all nodes in the cluster from the ResourceManager.
6. getQueueInfo()
The interface used by clients to get information about queues from the ResourceManager.
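As an illustration of how HPCApplicationClientProtocolImpl could translate submitApplication() into a scheduler command, here is a hedged sketch using Slurm's sbatch. The class and method names are hypothetical, not part of the actual implementation; Moab or PBS Pro would substitute msub or qsub.

import java.io.IOException;

// Hypothetical helper: maps a YARN application submission onto a Slurm
// batch job. Resource asks from the submission context become sbatch
// options; the script starts the Application Master.
public class HPCJobSubmitter {

  /** Submits the AM launch script as a batch job; returns sbatch's exit code. */
  public int submitAsBatchJob(String amLaunchScript, int memoryMb, int vcores)
      throws IOException, InterruptedException {
    ProcessBuilder pb = new ProcessBuilder(
        "sbatch",
        "--mem=" + memoryMb,          // AM container memory (MB)
        "--cpus-per-task=" + vcores,  // AM container vcores
        amLaunchScript);              // script that execs the AM main class
    pb.inheritIO();
    return pb.start().waitFor();      // 0 means the job was queued
  }
}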
Application Master Protocol
YARN Application Master Flow Diagram

[Diagram: the Application Master's RMClient calls the Resource Manager through RMApplicationMasterServiceProtocolProxy.]

1. registerApplicationMaster()
2. allocate()
3. finishApplicationMaster()
Application Master Protocol
HPC Scheduler Application Master Flow Diagram

[Diagram: the Application Master's NMClient issues commands to the HPC Scheduler through HPCApplicationMasterProtocolImpl.]

1. Cluster Info cmd
2. Resource Allocation cmd
3. Finish Tasks cmds
Application Master Protocol
APIs for interaction (a usage sketch follows the list)

1. registerApplicationMaster()
The interface used by a new ApplicationMaster to register with the ResourceManager.
2. allocate()
The main interface between an ApplicationMaster and the ResourceManager.
3. finishApplicationMaster()
The interface used by an ApplicationMaster to notify the ResourceManager about its completion (success or failure).
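Because these three calls sit behind the RPC factory shown earlier, an unmodified Application Master keeps working against the HPC scheduler. Below is a minimal sketch of the standard AMRMClient interaction; the resource sizes are illustrative, and the HPC routing is assumed to come entirely from the yarn.ipc.rpc.class setting.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class MinimalAppMaster {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    AMRMClient<ContainerRequest> amrm = AMRMClient.createAMRMClient();
    amrm.init(conf);
    amrm.start();
    // 1. registerApplicationMaster()
    amrm.registerApplicationMaster("", 0, "");
    // 2. allocate(): ask for one 1 GB / 1 vcore container, then heartbeat
    amrm.addContainerRequest(new ContainerRequest(
        Resource.newInstance(1024, 1), null, null, Priority.newInstance(0)));
    amrm.allocate(0.0f);
    // 3. finishApplicationMaster()
    amrm.unregisterApplicationMaster(
        FinalApplicationStatus.SUCCEEDED, "done", null);
    amrm.stop();
  }
}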
Container Management Protocol
YARN Container Management Flow

[Diagram: the Application Master's NMClient calls the Node Manager through NMContainerManagementProtocolProxy.]

1. startContainers()
2. stopContainers()
3. getContainerStatuses()
Container Management Protocol
HPC Scheduler Task Management Flow

[Diagram: the Application Master's NMClient issues commands to the HPC Scheduler through HPCContainerManagementProtocolImpl.]

1. Start Containers cmd
2. Stop Containers cmd
3. Get Container Statuses cmd
Container Management Protocol
APIs for interaction

1. startContainers()
The ApplicationMaster provides a list of StartContainerRequests to a NodeManager to start the Containers allocated to it using this interface (see the sketch after this list).
2. stopContainers()
The ApplicationMaster requests a NodeManager to stop a list of Containers allocated to it using this interface.
3. getContainerStatuses()
The API used by the ApplicationMaster to request the current statuses of Containers from the NodeManager.
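In the HPC implementation, startContainers() has to become a task launch on a specific node. Here is a hedged sketch of one possible mapping onto Slurm's srun; the class and method names are hypothetical, and the launch script is a stand-in for the container launch context.

import java.io.IOException;

// Hypothetical mapping of startContainers() onto Slurm: launch the
// container's command on the node the scheduler allocated.
public class HPCContainerLauncher {

  /** Runs one container launch script on the given node via srun. */
  public Process startContainer(String nodeHost, String launchScript)
      throws IOException {
    ProcessBuilder pb = new ProcessBuilder(
        "srun",
        "--nodes=1",
        "--nodelist=" + nodeHost,  // pin the task to the allocated node
        "--ntasks=1",
        launchScript);             // script that execs the YarnChild JVM
    pb.inheritIO();
    return pb.start();             // stopContainers() would kill this process
  }
}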
YARN Log Aggregation

Log Aggregation by Node Manager

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

Log Aggregation with HPC Scheduler
• Issue an HPC scheduler command to execute on all nodes (where application tasks executed) as part of ApplicationMasterProtocol.finishApplicationMaster() to aggregate the application logs, as sketched below.
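A hedged sketch of such a command, again assuming Slurm: srun with the application's job id fans the copy out across the job's node allocation. The class name and all paths are illustrative assumptions.

import java.io.IOException;

// Hypothetical log aggregation step for the HPC scheduler: run a copy
// command on the job's nodes, moving local container logs onto the
// shared parallel file system.
public class HPCLogAggregator {

  /** Copies each node's container logs into the shared file system. */
  public int aggregateLogs(String slurmJobId, String appId)
      throws IOException, InterruptedException {
    ProcessBuilder pb = new ProcessBuilder(
        "srun",
        "--jobid=" + slurmJobId,   // reuse the job's node allocation
        "sh", "-c",
        "cp -r /tmp/hadoop-logs/" + appId + " /lustre/app-logs/");
    pb.inheritIO();
    return pb.start().waitFor();
  }
}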
Shuffle Handling – Hadoop
[Diagram: the MRAppMaster assigns the Map Task (on Node 1, under a Node Manager) and the Reduce Task (EventFetcher + Shuffle Consumer); the map writes its final output to local dirs, from which the reducer reads.]

1. Assign Task
2. Write Final Map O/P
3. Task Completed
4. Get Map Completion Events
5. Map Completed
6. Read Map O/P
Shuffle Handling – HPC File Systems
[Diagram: the MRAppMaster assigns the Map Task and the Reduce Task (EventFetcher + Shuffle Consumer); the map writes its final output to the parallel file system, from which the reducer reads directly.]

1. Assign Task
2. Write Final Map O/P
3. Task Completed
4. Get Map Completion Events
5. Map Completed
6. Read Map O/P
Shuffle Handling

Shuffle Handler

<property>
  <name>mapreduce.job.map.output.collector.class</name>
  <value>org.apache.hadoop.mapred.MapTask$MapOutputBuffer</value>
  <description>
    The MapOutputCollector implementation(s) to use. This may be a
    comma-separated list of class names, in which case the map task will try
    to initialize each of the collectors in turn. The first to successfully
    initialize will be used.
  </description>
</property>
Shuffle Handling

Shuffle Consumer

<property>
  <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
  <value>org.apache.hadoop.mapreduce.task.reduce.Shuffle</value>
  <description>
    Name of the class whose instance will be used to send shuffle
    requests by reduce tasks of this job. The class must be an instance of
    org.apache.hadoop.mapred.ShuffleConsumerPlugin.
  </description>
</property>
Summary
• HDFS configuration for a new file system
• HPC schedulers
• YARN protocols
• M/R shuffle implementation
• YARN log aggregation
Q & A
Notices and Disclaimers
• Copyright © 2014 Intel Corporation. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. See Trademarks on intel.com for a full list of Intel trademarks.
• All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
• Performance tests are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
• For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
• Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
• Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
• Intel technologies may require enabled hardware, specific software, or services activation. Check with your system manufacturer or retailer.
• No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any damages resulting from such losses.
• You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
• No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
• The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications.