Implementing Babel RMI with ARMCI Jian Yin Khushbu Agarwal Daniel Chavarría Manoj Krishnan Ian Gorton Vidhya Gurumoorthi Patrick Nichols

Implementing Babel RMI with ARMCI

Jian Yin, Khushbu Agarwal, Daniel Chavarría, Manoj Krishnan, Ian Gorton, Vidhya Gurumoorthi, Patrick Nichols


Motivation

Remote Method Invocation provides a useful abstraction for distributed computing

Example: event service for CCA framework

Existing TCP/IP-based implementation has performance problems

Question: can we speed up Babel RMI with high-performance communication protocols?


Objectives

Demonstrate that it is feasible to build high performance Babel RMI

Prototype a Babel RMI with ARMCI and measure its performance experimentally

Produce a quality implementation of high performance RMI


Outline

Motivation

Objectives

Background

Babel RMI

ARMCI

Preliminary performance results

Future work


Babel RMI

Babel supports Remote Method Invocation

Transparent

Flexible

Implemented with extensive code marshalling and runtime library

Existing TCP/IP-based implementation incurs high overhead

Multiple copying

Context switching
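The copying overhead above starts with marshalling: before a call can be sent, its arguments are packed into a contiguous buffer, and the receiver unpacks them again. The C sketch below illustrates that step only. The function names `marshal_call` and `unmarshal_call` and the byte layout are assumptions made for this example; they are not Babel's generated code or wire format.

```c
#include <stdint.h>
#include <string.h>

/* Pack a method name and one int32 argument into buf.
 * Layout (assumed for this sketch): [name_len:u32][name bytes][arg:i32].
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t marshal_call(unsigned char *buf, size_t cap,
                    const char *method, int32_t arg)
{
    uint32_t name_len = (uint32_t)strlen(method);
    size_t need = sizeof name_len + name_len + sizeof arg;
    if (need > cap)
        return 0;
    /* Each memcpy is one copy of user data into the message buffer. */
    memcpy(buf, &name_len, sizeof name_len);
    memcpy(buf + sizeof name_len, method, name_len);
    memcpy(buf + sizeof name_len + name_len, &arg, sizeof arg);
    return need;
}

/* Reverse of marshal_call. Returns 0 on success, -1 if the message is
 * truncated or the method name does not fit into method[method_cap]. */
int unmarshal_call(const unsigned char *buf, size_t len,
                   char *method, size_t method_cap, int32_t *arg)
{
    uint32_t name_len;
    if (len < sizeof name_len)
        return -1;
    memcpy(&name_len, buf, sizeof name_len);

    size_t remaining = len - sizeof name_len;
    if (remaining < sizeof *arg || name_len > remaining - sizeof *arg)
        return -1;
    if ((size_t)name_len >= method_cap)
        return -1;

    memcpy(method, buf + sizeof name_len, name_len);
    method[name_len] = '\0';
    memcpy(arg, buf + sizeof name_len + name_len, sizeof *arg);
    return 0;
}
```

A socket-based transport would typically copy this buffer again when handing it to the kernel, which is one plausible source of the multiple copying noted above.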


TCP RMI Performance


ARMCI

Middleware for remote memory access (RMA)

Supports many networks and HPC systems

Myrinet, Infiniband, Quadrics, Giganet, …

Cray XT4, XT, X1, IBM BlueGene, …

Efficient

Minimal copying

Truly one-sided communication protocol

Put, get, accumulate

Atomic read-modify-write, mutexes

Blocking and non-blocking interfaces
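To make the one-sided operations concrete, here is a toy, single-process C model of their semantics. It does not use the ARMCI API. The names `rma_put`, `rma_get`, `rma_acc`, and `rma_fetch_add` and the `window` array are hypothetical stand-ins for remotely accessible memory. The point it illustrates is that the initiator names the target process and offset, and the target runs no matching receive.

```c
#include <string.h>

#define NPROC 2   /* number of simulated processes */
#define WORDS 8   /* words of remotely accessible memory per process */

/* Toy stand-in for remotely accessible memory: one window per "process". */
static long window[NPROC][WORDS];

/* put: copy local data into the target's window. */
void rma_put(const long *src, int proc, int offset, int count)
{
    memcpy(&window[proc][offset], src, (size_t)count * sizeof(long));
}

/* get: copy data out of the target's window into local memory. */
void rma_get(long *dst, int proc, int offset, int count)
{
    memcpy(dst, &window[proc][offset], (size_t)count * sizeof(long));
}

/* accumulate: target[i] += scale * src[i], applied at the target. */
void rma_acc(long scale, const long *src, int proc, int offset, int count)
{
    for (int i = 0; i < count; i++)
        window[proc][offset + i] += scale * src[i];
}

/* read-modify-write (fetch-and-add): returns the old value.
 * In this single-threaded model atomicity is trivial; a real RMA
 * library has to guarantee it across processes. */
long rma_fetch_add(int proc, int offset, long inc)
{
    long old = window[proc][offset];
    window[proc][offset] = old + inc;
    return old;
}
```

For example, `rma_put(src, 1, 2, 3)` followed by `rma_get(dst, 1, 2, 3)` returns the same three words, without process 1 taking part in either call.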


Experiment Setup

Hardware

Cluster with 11 nodes

4-core 2.4 GHz Intel Xeon processor

Infiniband DDR network

Software

Babel 1.4.0

ARMCI 1.4

OpenMPI 1.2.6


Implementation

Implemented an extensive set of functions in the runtime library

InstanceHandle, Server, Invocation, Response, Call, Return, …

Usage examples

hello_World h = hello_World__createRemote("armcihandler://<process_id>:<mutex_id>", &_ex);

hello_World h2 = hello_World__connect("armcihandler://<process_id>:<mutex_id>/<object_id>", &_ex);
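The URLs in the usage examples carry a process id, a mutex id, and optionally an object id. The C sketch below shows one way such a URL could be split into its fields. The `armci_url` struct and the `parse_armci_url` helper are hypothetical and not part of Babel or ARMCI. Treating the two ids as decimal integers and the object id as a short string is also an assumption.

```c
#include <stdio.h>

/* Hypothetical container for the fields of an armcihandler URL. */
typedef struct {
    int  process_id;     /* process hosting the object (assumed integer) */
    int  mutex_id;       /* mutex named in the URL (assumed integer) */
    char object_id[64];  /* empty string when the URL names no object */
} armci_url;

/* Parse "armcihandler://<process_id>:<mutex_id>[/<object_id>]".
 * Returns 0 on success, -1 if the URL does not match that shape. */
int parse_armci_url(const char *url, armci_url *out)
{
    int consumed = 0;
    out->object_id[0] = '\0';

    /* %n records how many characters were consumed up to that point. */
    if (sscanf(url, "armcihandler://%d:%d%n",
               &out->process_id, &out->mutex_id, &consumed) != 2)
        return -1;

    const char *rest = url + consumed;
    if (*rest == '\0')
        return 0;                      /* createRemote form: no object id */
    if (*rest != '/' || rest[1] == '\0')
        return -1;                     /* trailing junk or empty object id */

    /* connect form: everything after the slash is the object id
     * (silently truncated to the buffer size in this sketch). */
    snprintf(out->object_id, sizeof out->object_id, "%s", rest + 1);
    return 0;
}
```

With this helper, the `createRemote` form yields an empty `object_id`, while the `connect` form fills in all three fields.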


ARMCI RMI Performance


Next Step

Reduce protocol overhead

Reduce function call overhead

Reduce copying

Batch RMI calls

Reduce RDMA overhead

Prefetch in the background

Preload libraries

Prefetch arguments
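As an illustration of the batching idea, the C sketch below queues several small call records in one buffer and ships them with a single transfer when the buffer fills or is flushed. The `rmi_batch` type, the function names, the 256-byte capacity, and the length-prefixed record layout are all assumptions for this example; the actual transfer is replaced by a counter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BATCH_CAPACITY 256   /* bytes per batch; arbitrary for this sketch */

typedef struct {
    unsigned char buf[BATCH_CAPACITY];
    size_t used;       /* bytes currently queued */
    int    calls;      /* calls currently queued */
    int    transfers;  /* how many times a batch has been shipped */
} rmi_batch;

void batch_init(rmi_batch *b)
{
    memset(b, 0, sizeof *b);
}

/* Stand-in for the single network transfer that ships the whole batch. */
void batch_flush(rmi_batch *b)
{
    if (b->used == 0)
        return;
    b->transfers++;
    b->used = 0;
    b->calls = 0;
}

/* Queue one marshalled call as [len:u32][payload].
 * Flushes first if the record would not fit.
 * Returns 0, or -1 if the record can never fit into a batch. */
int batch_add(rmi_batch *b, const void *payload, uint32_t len)
{
    size_t need = sizeof len + len;
    if (need > BATCH_CAPACITY)
        return -1;
    if (b->used + need > BATCH_CAPACITY)
        batch_flush(b);
    memcpy(b->buf + b->used, &len, sizeof len);
    memcpy(b->buf + b->used + sizeof len, payload, len);
    b->used += need;
    b->calls++;
    return 0;
}
```

Queuing twenty 24-byte calls this way and flushing once at the end takes three transfers instead of twenty. The trade-off is added latency for calls that wait in the buffer.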


Where to Use High Performance Babel RMI

Applications for high-performance RMI

Fine-grained distribution

Hybrid computing

Suggestions …
