Implementing Babel RMI with ARMCI
Jian Yin, Khushbu Agarwal, Daniel Chavarría, Manoj Krishnan
Ian Gorton, Vidhya Gurumoorthi
Patrick Nichols
Motivation
Remote Method Invocation provides a useful abstraction for distributed computing
Example: event service for CCA framework
Existing TCP/IP-based implementation has performance problems
Question: can we speed up Babel RMI with high-performance communication protocols?
Objectives
Demonstrate that it is feasible to build a high-performance Babel RMI
Prototype a Babel RMI with ARMCI and measure its performance experimentally
Produce a quality implementation of high performance RMI
Outline
Motivation
Objectives
Background
Babel RMI
ARMCI
Preliminary performance results
Future work
Babel RMI
Babel supports Remote Method Invocation
Transparent
Flexible
Implemented with extensive code marshalling and a runtime library
Existing TCP/IP-based implementation incurs high overhead
Multiple copying
Context switching
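The copying cost mentioned above can be made concrete with a minimal marshalling sketch. The wire layout, the `rmi_buffer` type, and the `pack_call`/`unpack_call` helpers are hypothetical and are not Babel's actual format or API; they only illustrate that each call's arguments are copied once into a send buffer and once more out of it on the receiving side, before any transport-level copies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire layout (NOT Babel's): [int32 value][uint32 length][bytes]. */
typedef struct {
    unsigned char data[256];
    size_t used;
} rmi_buffer;

#define RMI_HEADER_SIZE (sizeof(int32_t) + sizeof(uint32_t))

/* Copy 1: the caller's arguments are serialized into the send buffer. */
int pack_call(rmi_buffer *buf, int32_t value, const char *text)
{
    assert(buf != NULL && text != NULL);
    size_t text_len = strlen(text);
    if (text_len > sizeof buf->data - RMI_HEADER_SIZE)
        return -1;                      /* arguments do not fit */
    uint32_t len = (uint32_t)text_len;
    memcpy(buf->data, &value, sizeof value);
    memcpy(buf->data + sizeof value, &len, sizeof len);
    memcpy(buf->data + RMI_HEADER_SIZE, text, len);
    buf->used = RMI_HEADER_SIZE + len;
    return 0;
}

/* Copy 2: the receiver deserializes the buffer into its own variables. */
int unpack_call(const rmi_buffer *buf, int32_t *value,
                char *text, size_t text_cap)
{
    assert(buf != NULL && value != NULL && text != NULL);
    uint32_t len;
    if (buf->used < RMI_HEADER_SIZE)
        return -1;                      /* truncated header */
    memcpy(value, buf->data, sizeof *value);
    memcpy(&len, buf->data + sizeof *value, sizeof len);
    if (buf->used < RMI_HEADER_SIZE + len || (size_t)len + 1 > text_cap)
        return -1;                      /* truncated body or small output */
    memcpy(text, buf->data + RMI_HEADER_SIZE, len);
    text[len] = '\0';
    return 0;
}
```

A socket-based transport would typically add further copies between user space and the kernel on both sides, which is the overhead a remote-memory-access transport aims to avoid.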
TCP RMI Performance
ARMCI
Middleware for remote memory access (RMA)
Supports many networks and HPC systems
Myrinet, InfiniBand, Quadrics, Giganet, …
Cray XT4, XT, X1, IBM BlueGene, …
Efficient
Minimal copying
Truly one-sided communication protocol
Put, get, accumulate
Atomic read-modify-write, mutexes
Blocking and non-blocking interfaces
Experiment Setup
Hardware
Cluster with 11 nodes
4-core 2.4 GHz Intel Xeon processor
InfiniBand DDR network
Software
Babel 1.4.0
ARMCI 1.4
OpenMPI 1.2.6
Implementation
Implemented an extensive set of functions in the runtime library
InstanceHandle, Server, Invocation, Response, Call, Return, …
Usage Examples
hello_World h = hello_World__createRemote("armcihandler://<process_id>:<mutex_id>", &_ex);
hello_World h2 = hello_World__connect("armcihandler://<process_id>:<mutex_id>/<object_id>", &_ex);
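To illustrate how the URL form in the usage examples might be decomposed, here is a minimal parsing sketch. It is not the implementation's code: the `armci_url` struct and `parse_armci_url` function are hypothetical, and the sketch assumes that the process and mutex IDs are decimal integers and that the optional object ID is an opaque string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of armcihandler://<process_id>:<mutex_id>[/<object_id>]. */
typedef struct {
    int process_id;
    int mutex_id;
    char object_id[64];   /* empty string when the URL names no object */
} armci_url;

/* Returns 0 on success, -1 if the URL does not match the assumed form. */
int parse_armci_url(const char *url, armci_url *out)
{
    static const char prefix[] = "armcihandler://";
    const size_t prefix_len = sizeof prefix - 1;
    int consumed = 0;

    assert(url != NULL && out != NULL);
    if (strncmp(url, prefix, prefix_len) != 0)
        return -1;                       /* wrong scheme */
    url += prefix_len;

    /* %n records how many characters the two integers and ':' consumed. */
    if (sscanf(url, "%d:%d%n", &out->process_id, &out->mutex_id, &consumed) != 2)
        return -1;
    url += consumed;

    out->object_id[0] = '\0';
    if (*url == '\0')
        return 0;                        /* createRemote form: no object id */
    if (*url != '/' || url[1] == '\0')
        return -1;                       /* trailing garbage or empty id */
    if (strlen(url + 1) >= sizeof out->object_id)
        return -1;                       /* object id too long for the sketch */
    strcpy(out->object_id, url + 1);     /* connect form: keep the object id */
    return 0;
}
```

Under these assumptions, the first usage example would yield an empty `object_id` (a new remote object is created), while the second would carry the ID of an existing object to connect to.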
ARMCI RMI Performance
Next Step
Reduce protocol overhead
Reduce function call overhead
Reduce copying
Batch RMI calls
Reduce RDMA overhead
Prefetch in the background
Preload libraries
Prefetch arguments
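The batching idea in the list above could look roughly like the following sketch, which queues several marshalled calls in one buffer and sends them with a single transfer. Everything here is hypothetical (the `rmi_batch` type, the 1024-byte capacity, and the transfer counter that stands in for a real network operation); it only shows how batching trades per-call transfers for one larger transfer.

```c
#include <assert.h>
#include <string.h>

#define BATCH_CAPACITY 1024   /* assumed buffer size for this sketch */

typedef struct {
    unsigned char data[BATCH_CAPACITY];
    size_t used;      /* bytes currently queued */
    int calls;        /* calls currently queued */
    int transfers;    /* simulated network transfers performed so far */
} rmi_batch;

void batch_init(rmi_batch *b)
{
    memset(b, 0, sizeof *b);
}

/* Stand-in for one network transfer carrying every queued call. */
void batch_flush(rmi_batch *b)
{
    if (b->calls == 0)
        return;               /* nothing to send */
    b->transfers++;
    b->used = 0;
    b->calls = 0;
}

/* Queue one marshalled call; flush first if it would not fit. */
int batch_add(rmi_batch *b, const void *payload, size_t len)
{
    assert(b != NULL && payload != NULL);
    if (len > BATCH_CAPACITY)
        return -1;            /* a single call larger than the buffer */
    if (b->used + len > BATCH_CAPACITY)
        batch_flush(b);
    memcpy(b->data + b->used, payload, len);
    b->used += len;
    b->calls++;
    return 0;
}
```

With three 400-byte calls, this sketch performs two transfers instead of three; the saving grows as calls get smaller relative to the buffer, at the cost of added latency for calls that wait in the queue.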
Where to Use High Performance Babel RMI
Applications for high-performance RMI
Fine-grained distribution
Hybrid computing
Suggestions …