About Me
• Apache Hadoop Committer
• YARN/MapReduce Contributor
• Senior Software Engineer @Intel Corporation
Agenda
• Objectives
• HDFS Applications with HPC File Systems
• YARN Application
• MapReduce Job
• HPC Schedulers
• YARN Protocols
• Log Aggregation
• Shuffle Implementation
• Q&A
Objectives
• Use an existing HPC cluster for running Hadoop applications
• Use any of the HPC file systems, such as Lustre, PVFS, IBRIX Fusion, etc.
• Use any of the HPC schedulers, such as Slurm, Moab, PBS Pro, etc.
• Combine Hadoop workloads with HPC workloads
• No code changes to existing Hadoop (HDFS/YARN/MR) applications
• Minimal Hadoop configuration changes
HDFS Applications Using HPC File Systems
public class <HPC>Adapter extends AbstractFileSystem {
    ……
}

[Diagram: an HDFS application runs unchanged on top of the HPC file system through the adapter.]
Hadoop Configurations for File System

<property>
  <name>fs.AbstractFileSystem.${hpc-uri}.impl</name>
  <value>HPCFileSystemAdapter</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>${hpc-uri}:///</value>
</property>
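Since HPC file systems such as Lustre or PVFS are typically POSIX-mounted on every node, one minimal way to realize such an adapter is to delegate to Hadoop's local file system over the mount point. The sketch below is illustrative only, not the actual implementation: the "hpc" scheme is an assumption standing in for whatever ${hpc-uri} resolves to.

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DelegateToFileSystem;
import org.apache.hadoop.fs.RawLocalFileSystem;

// Minimal sketch of an HPC file system adapter. It delegates every
// AbstractFileSystem call to RawLocalFileSystem, which works when the
// HPC file system (Lustre, PVFS, ...) is POSIX-mounted on all nodes.
public class HPCFileSystemAdapter extends DelegateToFileSystem {
  public HPCFileSystemAdapter(URI uri, Configuration conf)
      throws IOException, URISyntaxException {
    // "hpc" is an assumed scheme; it must match the
    // fs.AbstractFileSystem.<scheme>.impl configuration key above.
    super(uri, new RawLocalFileSystem(), conf, "hpc", false);
  }
}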
YARN Application
[Diagram: a Client submits to the Resource Manager; Node Managers on each node host the App Master and its Containers.]
YARN Application with HPC Scheduler
[Diagram: the Client submits to the HPC Scheduler Master; HPC Scheduler Slaves on each node host the App Master and its Containers, in place of the Resource Manager and Node Managers.]
YARN Application Submission with HPC Scheduler

[Diagram: the YARN application's Yarn Client talks to the HPC Scheduler through HPCApplicationClientProtocolImpl; the Application Master's AMRM Client and NM Client talk through HPCApplicationMasterProtocolImpl and HPCContainerManagementProtocolImpl; each Container runs a Yarn Child.]

1. submitApplication()
2. submit()
3. run
4. Launch App Master
5. Allocate
6. Run Child Task
7. Launch Yarn Child
8. Report Progress
MapReduce Job with HPC Scheduler

[Diagram: Job Client → Job Submitter → Yarn Client → ResourceMgrDelegate → ApplicationClientProtocolProxy. LocalClientProtocolProvider selects the Local Job Runner; YarnClientProtocolProvider selects the YARN Runner. With the HPC scheduler, HPCApplicationClientProtocolImpl talks to the HPC Scheduler in place of the Resource Manager; completed jobs report to the Job History Server.]

1. submit()
2. submitJobInternal()
3. submitJob()
4. submit()
5. submit()
6. submit()
7. run()
YARN Protocols Configurations

RPC Class Configuration

<property>
  <description>RPC class implementation</description>
  <name>yarn.ipc.rpc.class</name>
  <value>HadoopYarnHPCRPC</value>
</property>
YARN Protocols Configurations

public class HadoopYarnHPCRPC extends HadoopYarnProtoRPC {
  @Override
  public Object getProxy(Class protocol, InetSocketAddress address,
      Configuration conf) {
    Object proxy;
    if (protocol == ApplicationClientProtocol.class) {
      proxy = new HPCApplicationClientProtocolImpl(conf);
    } else if (protocol == ApplicationMasterProtocol.class) {
      proxy = new HPCApplicationMasterProtocolImpl(conf);
    } else if (protocol == ContainerManagementProtocol.class) {
      proxy = new HPCContainerManagementProtocolImpl(conf);
    } else {
      // Fall back to the default Hadoop RPC proxy for all other protocols.
      proxy = super.getProxy(protocol, address, conf);
    }
    return proxy;
  }
}
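Existing client code needs no changes to pick this up: YarnRPC.create() reads yarn.ipc.rpc.class and instantiates the configured class, so every proxy request is routed through the HPC implementations. A minimal usage sketch follows; the RM address is only a placeholder here, on the assumption that the HPC implementations ignore it.

import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
import org.apache.hadoop.yarn.ipc.YarnRPC;

public class HPCRpcUsage {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // With yarn.ipc.rpc.class=HadoopYarnHPCRPC, create() returns the
    // HPC-aware factory instead of the default HadoopYarnProtoRPC.
    YarnRPC rpc = YarnRPC.create(conf);
    ApplicationClientProtocol client =
        (ApplicationClientProtocol) rpc.getProxy(
            ApplicationClientProtocol.class,
            new InetSocketAddress("localhost", 8032), conf);
    // client is an HPCApplicationClientProtocolImpl; its calls map to
    // HPC scheduler commands rather than ResourceManager RPCs.
  }
}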
Application Client Protocol
YARN Application Client Protocol Flow

[Diagram: the YARN application client's RMClient calls the Resource Manager through RMApplicationClientProtocolProxy.]

1. getNewApplication()
2. submitApplication()
3. forceKillApplication()
4. getClusterMetrics()
5. getClusterNodes()
6. getQueueInfo()
…
Application Client Protocol
YARN Application Client HPC Scheduler Flow

[Diagram: the YARN application client's RMClient issues commands to the HPC Scheduler through HPCApplicationClientProtocolImpl.]

1. Allocate Resources cmd
2. Submit Job (Batch) cmd
3. Cancel Job cmd
4. Cluster Info cmd
5. Get Jobs Report cmd
…
Application Client Protocol
APIs for interaction

1. getNewApplication()
The interface used by clients to obtain a new ApplicationId for submitting new applications.
2. submitApplication()
The interface used by clients to submit a new application to the ResourceManager (see the sketch after this list).
3. forceKillApplication()
The interface used by clients to request the ResourceManager to abort a submitted application.
4. getClusterMetrics()
The interface used by clients to get metrics about the cluster from the ResourceManager.
5. getClusterNodes()
The interface used by clients to get a report of all nodes in the cluster from the ResourceManager.
6. getQueueInfo()
The interface used by clients to get information about queues from the ResourceManager.
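As an illustration of how HPCApplicationClientProtocolImpl could translate submitApplication() into a scheduler command, here is a hedged sketch using Slurm's sbatch. The class and method names are hypothetical, not part of the actual implementation; Moab or PBS Pro would substitute msub or qsub.

import java.io.IOException;

// Hypothetical helper: maps a YARN application submission onto a Slurm
// batch job. Resource asks from the submission context become sbatch
// options; the script starts the Application Master.
public class HPCJobSubmitter {

  /** Submits the AM launch script as a batch job; returns sbatch's exit code. */
  public int submitAsBatchJob(String amLaunchScript, int memoryMb, int vcores)
      throws IOException, InterruptedException {
    ProcessBuilder pb = new ProcessBuilder(
        "sbatch",
        "--mem=" + memoryMb,          // AM container memory (MB)
        "--cpus-per-task=" + vcores,  // AM container vcores
        amLaunchScript);              // script that execs the AM main class
    pb.inheritIO();
    return pb.start().waitFor();      // 0 means the job was queued
  }
}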
Application Master Protocol
YARN Application Master Flow Diagram

[Diagram: the Application Master's RMClient calls the Resource Manager through RMApplicationMasterServiceProtocolProxy.]

1. registerApplicationMaster()
2. allocate()
3. finishApplicationMaster()
Application Master Protocol
HPC Scheduler Application Master Flow Diagram

[Diagram: the Application Master's NMClient issues commands to the HPC Scheduler through HPCApplicationMasterProtocolImpl.]

1. Cluster Info cmd
2. Resource Allocation cmd
3. Finish Tasks cmds
Application Master Protocol
APIs for interaction (a usage sketch follows the list)

1. registerApplicationMaster()
The interface used by a new ApplicationMaster to register with the ResourceManager.
2. allocate()
The main interface between an ApplicationMaster and the ResourceManager.
3. finishApplicationMaster()
The interface used by an ApplicationMaster to notify the ResourceManager about its completion (success or failure).
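Because these three calls sit behind the RPC factory shown earlier, an unmodified Application Master keeps working against the HPC scheduler. Below is a minimal sketch of the standard AMRMClient interaction; the resource sizes are illustrative, and the HPC routing is assumed to come entirely from the yarn.ipc.rpc.class setting.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class MinimalAppMaster {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    AMRMClient<ContainerRequest> amrm = AMRMClient.createAMRMClient();
    amrm.init(conf);
    amrm.start();
    // 1. registerApplicationMaster()
    amrm.registerApplicationMaster("", 0, "");
    // 2. allocate(): ask for one 1 GB / 1 vcore container, then heartbeat
    amrm.addContainerRequest(new ContainerRequest(
        Resource.newInstance(1024, 1), null, null, Priority.newInstance(0)));
    amrm.allocate(0.0f);
    // 3. finishApplicationMaster()
    amrm.unregisterApplicationMaster(
        FinalApplicationStatus.SUCCEEDED, "done", null);
    amrm.stop();
  }
}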
Container Management Protocol
YARN Container Management Flow

[Diagram: the Application Master's NMClient calls the Node Manager through NMContainerManagementProtocolProxy.]

1. startContainers()
2. stopContainers()
3. getContainerStatuses()
Container Management Protocol
HPC Scheduler Task Management Flow

[Diagram: the Application Master's NMClient issues commands to the HPC Scheduler through HPCContainerManagementProtocolImpl.]

1. Start Containers cmd
2. Stop Containers cmd
3. Get Container Statuses cmd
Container Management Protocol
APIs for interaction

1. startContainers()
The ApplicationMaster provides a list of StartContainerRequests to a NodeManager to start the Containers allocated to it using this interface (see the sketch after this list).
2. stopContainers()
The ApplicationMaster requests a NodeManager to stop a list of Containers allocated to it using this interface.
3. getContainerStatuses()
The API used by the ApplicationMaster to request the current statuses of Containers from the NodeManager.
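In the HPC implementation, startContainers() has to become a task launch on a specific node. Here is a hedged sketch of one possible mapping onto Slurm's srun; the class and method names are hypothetical, and the launch script is a stand-in for the container launch context.

import java.io.IOException;

// Hypothetical mapping of startContainers() onto Slurm: launch the
// container's command on the node the scheduler allocated.
public class HPCContainerLauncher {

  /** Runs one container launch script on the given node via srun. */
  public Process startContainer(String nodeHost, String launchScript)
      throws IOException {
    ProcessBuilder pb = new ProcessBuilder(
        "srun",
        "--nodes=1",
        "--nodelist=" + nodeHost,  // pin the task to the allocated node
        "--ntasks=1",
        launchScript);             // script that execs the YarnChild JVM
    pb.inheritIO();
    return pb.start();             // stopContainers() would kill this process
  }
}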
YARN Log Aggregation

Log Aggregation by Node Manager

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

Log Aggregation with HPC Scheduler
• Issue an HPC scheduler command to execute on all nodes (where application tasks executed) as part of ApplicationMasterProtocol.finishApplicationMaster() to aggregate the application logs, as sketched below.
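A hedged sketch of such a command, again assuming Slurm: srun with the application's job id fans the copy out across the job's node allocation. The class name and all paths are illustrative assumptions.

import java.io.IOException;

// Hypothetical log aggregation step for the HPC scheduler: run a copy
// command on the job's nodes, moving local container logs onto the
// shared parallel file system.
public class HPCLogAggregator {

  /** Copies each node's container logs into the shared file system. */
  public int aggregateLogs(String slurmJobId, String appId)
      throws IOException, InterruptedException {
    ProcessBuilder pb = new ProcessBuilder(
        "srun",
        "--jobid=" + slurmJobId,   // reuse the job's node allocation
        "sh", "-c",
        "cp -r /tmp/hadoop-logs/" + appId + " /lustre/app-logs/");
    pb.inheritIO();
    return pb.start().waitFor();
  }
}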
Shuffle Handling – Hadoop
[Diagram: the MRAppMaster assigns the Map Task (on Node 1, under a Node Manager) and the Reduce Task (EventFetcher + Shuffle Consumer); the map writes its final output to local dirs, from which the reducer reads.]

1. Assign Task
2. Write Final Map O/P
3. Task Completed
4. Get Map Completion Events
5. Map Completed
6. Read Map O/P
Shuffle Handling – HPC File Systems
[Diagram: the MRAppMaster assigns the Map Task and the Reduce Task (EventFetcher + Shuffle Consumer); the map writes its final output to the parallel file system, from which the reducer reads directly.]

1. Assign Task
2. Write Final Map O/P
3. Task Completed
4. Get Map Completion Events
5. Map Completed
6. Read Map O/P
Shuffle Handling

Shuffle Handler

<property>
  <name>mapreduce.job.map.output.collector.class</name>
  <value>org.apache.hadoop.mapred.MapTask$MapOutputBuffer</value>
  <description>
    The MapOutputCollector implementation(s) to use. This may be a
    comma-separated list of class names, in which case the map task will try
    to initialize each of the collectors in turn. The first to successfully
    initialize will be used.
  </description>
</property>
Shuffle Handling

Shuffle Consumer

<property>
  <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
  <value>org.apache.hadoop.mapreduce.task.reduce.Shuffle</value>
  <description>
    Name of the class whose instance will be used to send shuffle
    requests by reduce tasks of this job. The class must be an instance of
    org.apache.hadoop.mapred.ShuffleConsumerPlugin.
  </description>
</property>
Summary
• HDFS configuration for a new file system
• HPC schedulers
• YARN protocols
• M/R shuffle implementation
• YARN log aggregation
Q & A
Notices and Disclaimers
• Copyright © 2014 Intel Corporation. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. See Trademarks on intel.com for a full list of Intel trademarks.
• All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
• Performance tests are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
• For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
• Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
• Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
• Intel technologies may require enabled hardware, specific software, or services activation. Check with your system manufacturer or retailer.
• No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any damages resulting from such losses.
• You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
• No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
• The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications.