Open MPI on the Cray XT

Presented by Richard L. Graham
Tech Integration
National Center for Computational Sciences
2 Graham_OpenMPI_SC07
Why does Open MPI exist?
• Maximize all MPI expertise:
research/academia,
industry,
…elsewhere.
• Capitalize on (literally) years of MPI research and implementation experience.
• The sum is greater than the parts.
Current membership
• 8 universities
• 7 vendors
• 4 US DOE labs
• 1 individual

14 members, 6 contributors
Key design feature: Components
Formalized interfaces
• Specifies “black box” implementation
• Different implementations available at run-time
• Can compose different systems on the fly
[Diagram: a caller invoking interchangeable implementations behind Interfaces 1, 2, and 3]
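The component idea on this slide can be sketched as a minimal registry in Python. The class names, priorities, and `select` logic below are illustrative assumptions, not Open MPI's actual MCA code; in practice, components are chosen via MCA parameters (e.g. `mpirun --mca pml cm`).

```python
# Minimal sketch of component-based selection, loosely modeled on
# Open MPI's MCA: each framework is a formalized interface with
# interchangeable "black box" components, one chosen at run time.
# All names and priorities here are illustrative.

class Component:
    def __init__(self, name, priority, available):
        self.name = name
        self.priority = priority      # higher wins
        self.available = available    # e.g., is the interconnect present?

class Framework:
    """A formalized interface with pluggable implementations."""
    def __init__(self, name):
        self.name = name
        self.components = []

    def register(self, component):
        self.components.append(component)

    def select(self):
        # Choose the highest-priority component usable on this system.
        usable = [c for c in self.components if c.available]
        if not usable:
            raise RuntimeError(f"no usable component for {self.name}")
        return max(usable, key=lambda c: c.priority)

# Compose a system on the fly: register candidate PMLs, pick one.
pml = Framework("pml")
pml.register(Component("ob1", priority=20, available=True))
pml.register(Component("cm", priority=30, available=True))  # MTL present
print(pml.select().name)  # -> cm
```

This mirrors how run-time composition works in spirit: the caller codes only against the framework interface, so swapping components requires no change to the caller.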
Point-to-point architecture
MPI sits on one of two PMLs:
• PML-OB1/DR → BML-R2 → BTL-GM (MPool-GM, Rcache) or BTL-OpenIB (MPool-OpenIB, Rcache)
• PML-CM → MTL-MX (Myrinet), MTL-Portals, or MTL-PSM (QLogic)
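The OB1 layering above (PML over a BML that multiplexes BTLs) can be sketched as follows. The routing rule and all names are illustrative; the real BML-R2 schedules and stripes traffic across BTLs with far more machinery.

```python
# Sketch of the OB1-style send path: PML -> BML -> BTL.
# The BML (BTL Management Layer) picks among the BTLs that can
# reach the destination peer. Names here are illustrative.

class BTL:
    """A byte-transfer layer bound to one interconnect."""
    def __init__(self, name, reachable_peers):
        self.name = name
        self.reachable_peers = reachable_peers

    def send(self, peer, data):
        return f"{self.name} -> {peer}: {len(data)} bytes"

class BML:
    """Routes each send over a BTL that can reach the peer."""
    def __init__(self, btls):
        self.btls = btls

    def route(self, peer):
        for btl in self.btls:
            if peer in btl.reachable_peers:
                return btl
        raise RuntimeError(f"no BTL reaches {peer}")

bml = BML([BTL("btl-gm", {"nodeA"}), BTL("btl-openib", {"nodeB"})])
print(bml.route("nodeB").send("nodeB", b"hello"))
# -> btl-openib -> nodeB: 5 bytes
```

The CM path skips this layer entirely: matching and transfer are delegated to a single MTL that maps directly onto the interconnect's native matching interface.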
Portals port: OB1 vs. CM
OB1
• Matching in main memory
• Short message: eager, buffered on the receive side
• Long message: rendezvous
  – Rendezvous packet: 0-byte payload
  – Get message after match

CM
• Matching may be on the NIC
• Short message: eager, buffered on the receive side
• Long message: eager (send all data)
  – If match: deliver directly to user buffer
  – No match: discard payload, and get() user data after match
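The two long-message strategies above can be sketched as threshold logic. The eager limit and function names are illustrative assumptions (real cutoffs are tunable MCA parameters), not Open MPI internals.

```python
# Sketch of the two protocols described above. EAGER_LIMIT and the
# helper names are illustrative; real limits are MCA parameters.

EAGER_LIMIT = 64 * 1024  # assumed short/long cutoff, in bytes

def ob1_send(nbytes):
    """OB1: eager short messages, rendezvous for long ones."""
    if nbytes <= EAGER_LIMIT:
        return "eager: send data, receiver buffers until matched"
    # Rendezvous: a 0-byte-payload packet announces the message;
    # the receiver get()s the data only after the match succeeds.
    return "rendezvous: send header only, receiver gets data after match"

def cm_send(nbytes, match_found):
    """CM: always eager -- all data is sent, even for long messages."""
    if nbytes <= EAGER_LIMIT:
        return "eager: receiver buffers until matched"
    if match_found:
        return "match: deliver directly to user buffer"
    return "no match: discard payload, get() user data after match"

print(ob1_send(1024))           # short: eager
print(ob1_send(1 << 20))        # long: rendezvous
print(cm_send(1 << 20, False))  # long, no preposted receive
```

The trade-off this encodes: CM saves a round trip when the receive is preposted, at the cost of resending (via get) when it is not; OB1 never wastes long-message bandwidth but always pays the rendezvous handshake.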
Collective communications component structure

Frameworks and components in the MPI Component Architecture (MCA):
• PML: OB1, CM, DR, CRCPW
• Allocator: Basic, Bucket
• BTL: TCP, Shared Mem., InfiniBand
• MTL: Myrinet MX, Portals, PSM
• Topology: Basic, Utility
• Collective: Basic, Tuned, Hierarchical, Intercomm., Shared Mem., Non-blocking, Portals
• I/O

These components plug into the MCA, which sits beneath the MPI API called by the user application.
NetPipe bandwidth data (MB/sec)
[Figure: bandwidth (0–2000 MBytes/sec) vs. data size (0.0001–10000 KBytes) for Open MPI—CM, Open MPI—OB1, and Cray MPI]
Zero byte ping-pong latency
Cray MPI: 4.78 µsec
Open MPI—OB1: 6.16 µsec
Open MPI—CM: 4.91 µsec
VH1—Total runtime
[Figure: VH-1 wall-clock time (200–250 sec) vs. log2 processor count (3.5–8.5) for Open MPI—CM, Open MPI—OB1, and Cray MPI]
GTC—Total runtime
[Figure: GTC wall-clock time (800–1150 sec) vs. log2 processor count (1–11) for Open MPI—CM, Open MPI—OB1, and Cray MPI]
POP—Step runtime
[Figure: POP time-step wall-clock time (log scale, 128–2048 sec) vs. log2 processor count (3–11) for Open MPI—CM, Open MPI—OB1, and Cray MPI]
Summary and future directions
• Support for XT (Catamount and Compute Node Linux) within the standard distribution
• Performance (application and micro-benchmarks) comparable to that of Cray MPI
• Support for recovery from process failure is being added
Contact
Richard L. Graham
Tech Integration
National Center for Computational Sciences
(865) …
[email protected]
www.open-mpi.org