
Page 1

Presented by

Open MPI on the Cray XT

Richard L. Graham, Tech Integration

National Center for Computational Sciences

Page 2

Why does Open MPI exist?

Maximize all MPI expertise: research/academia, industry, …elsewhere.

Capitalize on (literally) years of MPI research and implementation experience.

The sum is greater than the parts.

[Diagram: overlapping expertise drawn from research/academia and industry]

Page 3

Current membership

14 members, 6 contributors

4 US DOE labs, 8 universities, 7 vendors, 1 individual

Page 4

Key design feature: Components

Formalized interfaces

Specifies “black box” implementation

Different implementations available at run-time

Can compose different systems on the fly

[Diagram: a single caller selecting among interchangeable implementations behind Interface 1, Interface 2, and Interface 3]
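As a concrete illustration (a sketch based on standard Open MPI usage, not on anything specific to this talk): ompi_info lists the components built into an installation, and mpirun selects among them at launch time with MCA parameters of the form --mca <framework> <component>, so a different implementation can be substituted without recompiling the application, e.g., --mca btl tcp,self to force the TCP byte-transfer component.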

Page 5

Point-to-point architecture

[Diagram: point-to-point component stack]

MPI
  PML-OB1/DR over BML-R2, which multiplexes byte-transfer layers:
    BTL-GM (MPool-GM, Rcache)
    BTL-OpenIB (MPool-OpenIB, Rcache)
  PML-CM over matching transport layers:
    MTL-MX (Myrinet)
    MTL-Portals
    MTL-PSM (QLogic)
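To make the layering concrete (a hedged example, assuming the names above are also the run-time MCA component names): mpirun --mca pml ob1 --mca btl openib,self ./app would drive point-to-point traffic through OB1, BML-R2, and the OpenIB BTL, while mpirun --mca pml cm --mca mtl mx ./app would route it through CM and the Myrinet MX MTL.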

Page 6

Portals port: OB1 vs. CM

OB1: matching in main memory
  Short message: eager, buffered on the receive side
  Long message: rendezvous
    Rendezvous packet carries a 0-byte payload
    Data is retrieved (get) after the match

CM: matching possibly on the NIC
  Short message: eager, buffered on the receive side
  Long message: eager, all data sent up front
    If matched: delivered directly into the user buffer
    If not matched: payload discarded; user data retrieved with get() after the match
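On the XT itself (a hedged note, assuming the Portals component names used by Open MPI releases of this era), the two paths above correspond to run-time selections such as mpirun --mca pml ob1 ./app for the OB1/Portals BTL path and mpirun --mca pml cm --mca mtl portals ./app for the CM/Portals MTL path.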

Page 7

Collective communications component structure

[Diagram: MPI Component Architecture (MCA) stack]

User application
MPI API
MPI Component Architecture (MCA)

Frameworks and components:
  PML: OB1, DR, CM, CRCPW
  BTL: TCP, Shared Memory, InfiniBand
  MTL: Myrinet MX, Portals, PSM
  Allocator: Basic, Bucket
  Topology: Basic, Utility
  Collective: Basic, Tuned, Hierarchical, Intercomm., Shared Memory, Non-blocking
  I/O: Portals
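As a usage note (a sketch assuming standard MCA parameter handling, not anything shown in this talk): the collective components listed above are chosen per communicator at run time; ompi_info shows which coll components are available, and a particular one can be excluded at launch, for example mpirun --mca coll ^tuned ./app to run without the tuned collectives.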

Page 8

Benchmark Results

Page 9

NetPipe bandwidth data (MB/sec)

[Figure: NetPipe bandwidth (MBytes/sec, 0 to 2000) vs. data size (0.0001 to 10000 KBytes) for Open MPI CM, Open MPI OB1, and Cray MPI]

Page 10

Zero byte ping-pong latency

Open MPI CM: 4.91 µsec
Open MPI OB1: 6.16 µsec
Cray MPI: 4.78 µsec
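Numbers like these are typically measured with a simple ping-pong loop. The sketch below is a generic zero-byte ping-pong microbenchmark in C (an illustration of the measurement, not the code used for the results in this talk):

#include <mpi.h>
#include <stdio.h>

/* Zero-byte ping-pong sketch: ranks 0 and 1 bounce an empty message
 * back and forth; the average one-way latency is reported by rank 0. */
int main(int argc, char **argv)
{
    int rank, i, iters = 10000;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg one-way latency: %f usec\n",
               (t1 - t0) * 1e6 / (2.0 * iters));

    MPI_Finalize();
    return 0;
}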

Page 11

VH1—Total runtime

[Figure: VH-1 wall clock time (200 to 250 sec) vs. log2 processor count (3.5 to 8.5) for Open MPI CM, Open MPI OB1, and Cray MPI]

Page 12

GTC—Total runtime

[Figure: GTC wall clock time (800 to 1150 sec) vs. log2 processor count (1 to 11) for Open MPI CM, Open MPI OB1, and Cray MPI]

Page 13

POP—Step runtime

[Figure: POP time-step wall clock time (sec, log scale, 128 to 2048) vs. log2 processor count (3 to 11) for Open MPI CM, Open MPI OB1, and Cray MPI]

Page 14

Summary and future directions

Support for XT (Catamount and Compute Node Linux) within standard distribution

Performance (application and micro-benchmarks) comparable to that of Cray MPI

Support for recovery from process failure is being added

Page 15

Contact

Richard L. Graham
Tech Integration
National Center for Computational Sciences
(865) [email protected]

www.open-mpi.org
