Open Fabrics workshop, March 2015

The State of libfabric in Open MPI

Jeffrey M. Squyres

16 March 2015

What is the Message Passing Interface (MPI)?

A standards document

Using MPI

Hardware and software implement the interface in the MPI standard (book)

MPI implementations

There are many implementations of the MPI standard

Some are closed source

Others are open source

Open MPI

Open MPI is a free, open source implementation of the MPI standard

www.open-mpi.org

How I think of MPI

MPI abstracts away the underlying network

[Diagram: two servers, with MPI_Send(…) and MPI_Recv(…) between them, joined by a network labeled "MAGIC"]

Open MPI multiplexes to the underlying network stack

[Diagram: two servers, with MPI_Send(…) and MPI_Recv(…) between them, over any of these network stacks]

•  TCP
•  Shared memory
•  Verbs
•  MXM
•  Portals
•  SCIF
•  Loopback
•  uGNI
•  PSM
•  libfabric
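
One way to picture the multiplexing is as a per-peer choice of transport based on where the peer lives. The sketch below is a toy model in Python, not Open MPI code: the `Peer` type, the `pick_transport` function, and the default of TCP for off-server traffic are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    """Hypothetical description of an MPI process: its server and process ID."""
    host: str
    pid: int


def pick_transport(me: Peer, peer: Peer, network: str = "TCP") -> str:
    """Return the name of the transport used for traffic from `me` to `peer`."""
    if peer == me:
        return "Loopback"        # a process sending to itself
    if peer.host == me.host:
        return "Shared memory"   # different process on the same server
    return network               # different server: use the network stack


me = Peer("server-a", 100)
for other in (Peer("server-a", 100), Peer("server-a", 101), Peer("server-b", 7)):
    print(other, "->", pick_transport(me, other))
```

Passing a different `network` value (for example `"libfabric"`) changes only the off-server case, which is the point of the abstraction: the caller of MPI_Send(…) never sees the choice.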

Two major types of transports

Byte Transport Layer (BTL) plugins

Matching Transport Layer (MTL) plugins

MPI_Send(…)

Byte Transport Layer (BTL) plugins

•  Inherently multi-device
•  Round-robin for small messages
•  Striping for large messages
•  Major protocol decisions and MPI message matching driven by an Open MPI engine
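
The multi-device behavior above can be sketched as a small scheduler: small messages rotate across devices, large messages are split across all of them. This is a toy model, not Open MPI's implementation. The class name `ToyBtlScheduler`, the device names, the `eager_limit` cutoff value, and the equal-sized split are assumptions for illustration; a real engine may weight fragments differently.

```python
from itertools import cycle


class ToyBtlScheduler:
    """Toy model of round-robin (small) and striping (large) across devices."""

    def __init__(self, devices, eager_limit=4096):
        self.devices = list(devices)
        if not self.devices:
            raise ValueError("at least one device is required")
        self.eager_limit = eager_limit    # hypothetical small/large cutoff, in bytes
        self._next = cycle(self.devices)  # round-robin cursor for small messages

    def schedule(self, nbytes):
        """Return a list of (device, offset, length) fragments for one message."""
        if nbytes <= self.eager_limit:
            # Small message: send it whole on the next device in rotation.
            return [(next(self._next), 0, nbytes)]
        # Large message: stripe contiguous, near-equal chunks over every device.
        base, extra = divmod(nbytes, len(self.devices))
        fragments, offset = [], 0
        for i, device in enumerate(self.devices):
            length = base + (1 if i < extra else 0)
            fragments.append((device, offset, length))
            offset += length
        return fragments


sched = ToyBtlScheduler(["usnic_0", "usnic_1"])
print(sched.schedule(100))     # first small message goes to usnic_0
print(sched.schedule(200))     # next small message goes to usnic_1
print(sched.schedule(10001))   # large message is split across both devices
```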

Matching Transport Layer (MTL) plugins

•  Most details hidden by network API
   §  MXM
   §  Portals
   §  PSM
•  As a side effect, must handle:
   §  Process loopback
   §  Server loopback (usually via shared memory)
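
To make "matching" concrete: MPI pairs each incoming message with a posted receive by source and tag, where a receive may use a wildcard for either. With an MTL, that work is done beneath the network API rather than by Open MPI itself. The sketch below is a toy matcher, not Open MPI or any network library's code. `ToyMatcher` and its method names are assumptions, the `ANY_SOURCE` / `ANY_TAG` values are stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG, and communicators are ignored for brevity.

```python
ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE (real value is implementation-defined)
ANY_TAG = -1     # stand-in for MPI_ANY_TAG


class ToyMatcher:
    """Toy (source, tag) matcher with a posted-receive and an unexpected queue."""

    def __init__(self):
        self.posted = []      # receives waiting for a message: (source, tag)
        self.unexpected = []  # messages that arrived early: (source, tag, payload)

    @staticmethod
    def _matches(recv_source, recv_tag, msg_source, msg_tag):
        return (recv_source in (ANY_SOURCE, msg_source)
                and recv_tag in (ANY_TAG, msg_tag))

    def post_recv(self, source, tag):
        """Post a receive; return a queued payload if one matches, else None."""
        for i, (msg_source, msg_tag, payload) in enumerate(self.unexpected):
            if self._matches(source, tag, msg_source, msg_tag):
                del self.unexpected[i]
                return payload
        self.posted.append((source, tag))
        return None

    def incoming(self, source, tag, payload):
        """Handle an arriving message; return True if a posted receive matched."""
        for i, (recv_source, recv_tag) in enumerate(self.posted):
            if self._matches(recv_source, recv_tag, source, tag):
                del self.posted[i]
                return True
        self.unexpected.append((source, tag, payload))
        return False


m = ToyMatcher()
m.post_recv(source=ANY_SOURCE, tag=7)              # nothing queued yet
print(m.incoming(source=3, tag=7, payload=b"hi"))  # matches the posted receive
print(m.incoming(source=3, tag=9, payload=b"early"))  # no receive yet: queued
print(m.post_recv(source=3, tag=ANY_TAG))          # picks up the queued message
```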

BTL and MTL plugins

Byte Transport Layer (BTL) plugins

•  IB / iWARP (verbs)
•  Portals
•  SCIF
•  Shared memory
•  TCP
•  uGNI
•  usNIC (verbs)

Matching Transport Layer (MTL) plugins

•  MXM
•  Portals
•  PSM

Now featuring 200% more libfabric

Byte Transport Layer (BTL) plugins

•  IB / iWARP (verbs)
•  Portals
•  SCIF
•  Shared memory
•  TCP
•  uGNI
•  usNIC

Matching Transport Layer (MTL) plugins

•  MXM
•  Portals
•  PSM
•  ofi

libfabric

Linux linker: fun fact

[Diagram: an MPI process containing libmpi.so, the ofi MTL, and libfabric.so]

Linker auto-loads dependency

[Diagram: the same MPI process containing libmpi.so, the usnic BTL, the ofi MTL, and a single libfabric.so]

Linker does not re-load
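
The "does not re-load" behavior can be observed from user space: opening an already-loaded shared library again returns the same handle instead of a second copy. The sketch below uses Python's `ctypes` on Linux as a stand-in for two plugins that both depend on libfabric.so. The helper name `load_twice` is hypothetical, and libm is used only because it is widely available.

```python
import ctypes
import ctypes.util


def load_twice(name):
    """dlopen() the same shared library twice and return both system handles."""
    first = ctypes.CDLL(name)
    second = ctypes.CDLL(name)
    return first._handle, second._handle


# libm is only a stand-in for libfabric.so. If find_library() cannot locate it,
# it returns None, and CDLL(None) opens the running program itself instead.
h1, h2 = load_twice(ctypes.util.find_library("m"))
print("same handle:", h1 == h2)
```

On Linux the two handles are expected to compare equal, because the dynamic loader keeps a reference count per loaded object rather than mapping it again.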

Libfabric-based plugins

usnic BTL

•  Cisco developed
•  usNIC-specific
•  OFI point-to-point / UD
•  Tested with usNIC

ofi MTL

•  Intel developed
•  Provider neutral
•  OFI tag matching
•  Tested with PSM

First experiment
usnic BTL: verbs → libfabric

Can loosely classify the usnic BTL into two parts

•  verbs bootstrapping
•  verbs message passing

sideband bootstrapping

1.  Find the corresponding ethX device
2.  Obtain MTU
3.  Open usNIC-specific configuration options

•  verbs bootstrapping → libfabric bootstrapping
•  verbs message passing → libfabric message passing

Comparison results

•  verbs bootstrapping → libfabric bootstrapping
   §  Bootstrapping sequence totally different
   §  libfabric requires no sideband bootstrapping
•  verbs message passing → libfabric message passing
   §  Pretty much a 1:1 swap of verbs → libfabric calls

Second experiment
Two different libfabric usage models

usnic BTL

•  For a specific provider
   §  Ask fi_getinfo() for prov_name="usnic"
•  Use usNIC extensions
   §  Netmask, link speed, IP device name, etc.
•  usNIC-specific error messages

ofi MTL

•  For any tag-matching provider
•  No extension use
   §  100% portable
•  Generic error messages
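
The difference between the two models comes down to what each plugin asks for: one names a provider, the other names a capability. The sketch below is a toy filter in Python that mimics that idea; it does not call libfabric. The `PROVIDERS` inventory, its field names, and the `select` function are hypothetical, and real code would pass hints to fi_getinfo() instead.

```python
# Hypothetical inventory; in real code this information comes from fi_getinfo().
PROVIDERS = [
    {"prov_name": "usnic", "tag_matching": False},
    {"prov_name": "psm", "tag_matching": True},
    {"prov_name": "sockets", "tag_matching": True},
]


def select(providers, prov_name=None, need_tag_matching=False):
    """Return the providers that satisfy the hints, in their original order."""
    return [
        p for p in providers
        if (prov_name is None or p["prov_name"] == prov_name)
        and (not need_tag_matching or p["tag_matching"])
    ]


# usnic BTL model: ask for one specific provider by name.
print([p["prov_name"] for p in select(PROVIDERS, prov_name="usnic")])

# ofi MTL model: accept any provider that offers tag matching.
print([p["prov_name"] for p in select(PROVIDERS, need_tag_matching=True)])
```

Asking by capability is what keeps the second model portable: adding a new tag-matching provider to the inventory requires no change to the caller.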

libfabric performance vs. Linux verbs

[Plot: Open MPI with usNIC: IMB PingPong Latency. X axis: buffer size. Y axis: time (microseconds), ticks from 1.9 to 2.4. Series: imb-pingpong-ompi-1.8-verbs.out, imb-pingpong-ompi-1.8-libfabric.out]

libfabric performance vs. Linux verbs

[Plot: Open MPI with usNIC: IMB SendRecv Bandwidth. X axis: buffer size. Y axis: bandwidth (megabits/second), ticks from 61000 to 69000. Series: imb-sendrecv-ompi-1.8-verbs.out, imb-sendrecv-ompi-1.8-libfabric.out]

Version roadmap

[Timeline diagram, Past / Present / Future, with "libfabric" marked three times and the date Dec 2014]

•  Git master: main development
•  v1.8: stable release series
•  v1.9: feature series

Currently embedding libfabric

•  openmpi-master
   §  opal
      •  mca
         §  common
            •  libfabric
               •  include
               •  prov
               •  src
               •  …

Because there is no public libfabric release (yet)
Will be removed before Open MPI v1.9 release

Periodic refresh from libfabric GitHub

•  openmpi-master
   §  opal
      •  mca
         §  common
            •  libfabric
               •  include
               •  prov
               •  src
               •  …

Moar new libfabric goodness!

Can also build against external libfabric

•  openmpi-master
   §  opal
      •  mca
         §  common
            •  libfabric
               •  include
               •  prov
               •  src
               •  …

libfabric (e.g., installed under $HOME, or in /usr, or …)

Will be the only model in v1.9

•  openmpi-master

libfabric (e.g., installed under $HOME, or in /usr, or …)

Feedback loop = good

•  Using libfabric in its (first) intended environment was quite useful
   §  Resulted in libfabric pull requests, minor changes, etc.
•  Biggest thing missing is the mmunotify functionality
   §  …will file a PR/RFC about this soon

Questions?