Open MPI on the Cray XT
Presented by Richard L. Graham, Tech Integration
National Center for Computational Sciences
Why does Open MPI exist?
Maximize all MPI expertise: research/academia, industry, …elsewhere.
Capitalize on (literally) years of MPI research and implementation experience.
The sum is greater than the parts.
Current membership
14 members and 6 contributors: 4 US DOE labs, 7 vendors, 8 universities, and 1 individual
Key design feature: Components
Formalized interfaces specify "black box" implementations
Different implementations are available at run-time
Different systems can be composed on the fly
[Diagram: a caller invoking interchangeable component implementations through Interface 1, Interface 2, and Interface 3]
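For instance, components can be selected when a job is launched. The lines below are a sketch assuming a stock Open MPI installation; component names and availability vary by build and release:

  mpirun --mca coll basic,tuned,self -np 64 ./app   # restrict the collective components used
  mpirun --mca btl tcp,sm,self -np 64 ./app          # restrict byte-transfer layers to TCP + shared memory

The same binary runs with different component stacks; no recompilation is needed.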
Point-to-point architecture
[Diagram: layered point-to-point stack]
MPI
  PML OB1/DR
    BML R2
      BTL GM (with MPool GM and Rcache)
      BTL OpenIB (with MPool OpenIB and Rcache)
  PML CM
    MTL MX (Myrinet)
    MTL Portals
    MTL PSM (QLogic)
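On the XT, either point-to-point path in the diagram can be requested explicitly at launch time. A hedged example, with component names as shown above (exact names may differ by release):

  mpirun --mca pml cm  --mca mtl portals -np 64 ./app        # CM path over the Portals MTL
  mpirun --mca pml ob1 --mca btl portals,self -np 64 ./app   # OB1 path over the Portals BTL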
Portals port: OB1 vs. CM
OB1: matching in main memory
  Short message: eager; buffered on the receive side
  Long message: rendezvous
    Rendezvous packet carries a 0-byte payload
    Data is fetched (get) after the match

CM: matching may be done on the NIC
  Short message: eager; buffered on the receive side
  Long message: eager; all data is sent immediately
    If matched: delivered directly into the user buffer
    If no match: payload is discarded, and the user data is fetched (get) after the match
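One practical consequence of CM's eager long-message protocol: a receive that is already posted when the data arrives can be matched immediately and delivered straight into the user buffer, while a late receive forces the discard-and-refetch path. A minimal sketch in plain MPI (application-level code, not Open MPI internals; run with at least 2 ranks):

  #include <mpi.h>
  #include <stdlib.h>

  /* Pre-post the receive before the matching send can arrive, so an
   * eagerly sent long message finds a posted match and lands directly
   * in user memory instead of being discarded and re-fetched later. */
  int main(int argc, char **argv)
  {
      int rank;
      const int n = 1 << 20;                  /* "long" message: 1M doubles; contents irrelevant here */
      double *buf = malloc(n * sizeof(double));
      MPI_Request req;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 1)                          /* receiver posts early */
          MPI_Irecv(buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);

      MPI_Barrier(MPI_COMM_WORLD);            /* sender starts only after the receive is posted */

      if (rank == 0)
          MPI_Send(buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
      else if (rank == 1)
          MPI_Wait(&req, MPI_STATUS_IGNORE);

      free(buf);
      MPI_Finalize();
      return 0;
  }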
Collective communications component structure
[Diagram: MPI Component Architecture, layered top to bottom: User application, MPI API, MPI Component Architecture (MCA), which hosts the component frameworks and their components listed below]
  PML: OB1, DR, CM, CRCPW
  BTL: TCP, Shared Mem., InfiniBand
  MTL: Myrinet MX, Portals, PSM
  Allocator: Basic, Bucket
  Topology: Basic, Utility
  Collective: Basic, Tuned, Hierarchical, Intercomm., Shared Mem., Non-blocking
  I/O: Portals
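A simple way to see which frameworks and components a particular build actually provides is ompi_info. The commands below are a sketch; exact flags and output format vary by Open MPI release:

  ompi_info | grep "MCA"        # list every framework/component compiled into this build
  ompi_info --param coll all    # show the run-time parameters of all collective components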
Benchmark Results
NetPipe bandwidth data (MB/sec)
[Chart: bandwidth in MBytes/sec (0 to 2000) vs. data size in KBytes (0.0001 to 10000, log scale) for Open MPI—CM, Open MPI—OB1, and Cray MPI]
Zero byte ping-pong latency
Open MPI—CM:   4.91 µsec
Open MPI—OB1:  6.16 µsec
Cray MPI:      4.78 µsec
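For reference, numbers like these come from a 0-byte ping-pong micro-benchmark. A minimal sketch in standard MPI (run with exactly 2 ranks; iteration count is illustrative):

  #include <mpi.h>
  #include <stdio.h>

  /* 0-byte ping-pong: half the average round-trip time is the one-way latency. */
  int main(int argc, char **argv)
  {
      const int iters = 10000;
      int rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Barrier(MPI_COMM_WORLD);
      double t0 = MPI_Wtime();
      for (int i = 0; i < iters; i++) {
          if (rank == 0) {
              MPI_Send(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
              MPI_Recv(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          } else if (rank == 1) {
              MPI_Recv(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              MPI_Send(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
          }
      }
      double t1 = MPI_Wtime();

      if (rank == 0)
          printf("one-way latency: %.2f usec\n", (t1 - t0) / (2.0 * iters) * 1e6);

      MPI_Finalize();
      return 0;
  }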
VH1—Total runtime
[Chart: VH-1 wall clock time in seconds (200 to 250) vs. log2 processor count (3.5 to 8.5) for Open MPI—CM, Open MPI—OB1, and Cray MPI]
GTC—Total runtime
[Chart: GTC wall clock time in seconds (800 to 1150) vs. log2 processor count for Open MPI—CM, Open MPI—OB1, and Cray MPI]
POP—Step runtime
[Chart: POP time-step wall clock time in seconds (log scale, 128 to 2048) vs. log2 processor count for Open MPI—CM, Open MPI—OB1, and Cray MPI]
Summary and future directions
Support for the XT (Catamount and Compute Node Linux) is included in the standard distribution
Performance (application and micro-benchmarks) comparable to that of Cray MPI
Support for recovery from process failure is being added
Contact
Richard L. Graham, Tech Integration
National Center for Computational Sciences
(865) [email protected]
www.open-mpi.org