Mellanox InfiniBand QDR 40Gb/s
The Fabric of Choice for High Performance Computing
Gilad Shainer, [email protected], 2008
Birds of a Feather Presentation
Performance Roadmap
[Chart: InfiniBand bandwidth roadmap, in gigabits per second]
InfiniBand Delivers Ultra Low Latency
InfiniBand Technology Leadership
Industry Standard
• Hardware, software, cabling, management
• Designed for clustering and storage interconnect
Price and Performance
• 40Gb/s node-to-node
• 120Gb/s switch-to-switch
• 1us application latency
• Most aggressive roadmap in the industry
Reliable, with congestion management
Efficient
• RDMA and transport offload (a verbs-level sketch follows after this list)
• Kernel bypass
• CPU focuses on application processing
Scalable for Petascale computing & beyond
End-to-end quality of service
Virtualization acceleration
I/O consolidation, including storage
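As a minimal sketch of what "kernel bypass" and "RDMA" mean on the data path, the fragment below posts one RDMA write through the standard libibverbs API and polls for its completion entirely in user space. All setup (device open, protection domain, completion queue, queue pair connection, memory registration, and out-of-band exchange of the remote address and rkey) is assumed to have happened elsewhere; the function name rdma_write_once is invented for illustration and is not a Mellanox API.

```c
#include <stdint.h>
#include <infiniband/verbs.h>

/* Hypothetical helper: one RDMA write on an already-connected queue pair. */
int rdma_write_once(struct ibv_qp *qp, struct ibv_cq *cq,
                    struct ibv_mr *mr, void *local_buf, uint32_t len,
                    uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = len,
        .lkey   = mr->lkey,              /* key from ibv_reg_mr() */
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE, /* HCA moves the data, not the CPU */
        .send_flags = IBV_SEND_SIGNALED,
    };
    wr.wr.rdma.remote_addr = remote_addr; /* obtained out of band */
    wr.wr.rdma.rkey        = rkey;

    struct ibv_send_wr *bad = NULL;
    if (ibv_post_send(qp, &wr, &bad))     /* user-space doorbell, no syscall */
        return -1;

    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)  /* poll completion, still no kernel */
        ;
    return wc.status == IBV_WC_SUCCESS ? 0 : -1;
}
```

Because both the post and the completion poll stay in user space, the host CPU is free to run application code while the adapter handles transport and data movement.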
InfiniBand in the TOP500
InfiniBand makes the most powerful clusters
• 5 of the top 10 (#1, #4, #7, #8, #10) and 49 of the Top100
• The leading interconnect for the Top200
• InfiniBand clusters responsible for ~40% of the total Top500 performance
InfiniBand enables the most power efficient clusters
• InfiniBand QDR expected Nov 2008
• No 10GigE clusters exist on the list
Top500 Interconnect Placement
[Chart: number of clusters in each Top500 ranking band (1-100, 101-200, 201-300, 301-400, 401-500) for InfiniBand, all proprietary high speed interconnects, and GigE]
InfiniBand Clusters - Performance
[Chart: aggregate Top500 performance (Gflops) of InfiniBand clusters from Nov 05 through June 08 – 360% CAGR]
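For reference, the compound annual growth rate quoted in the chart is defined as below; P_0 and P_n stand for the aggregate InfiniBand performance at the start and end of the window and n for the number of years between the two lists (the chart's actual data values are not reproduced here):

```latex
\mathrm{CAGR} = \left(\frac{P_n}{P_0}\right)^{1/n} - 1,
\qquad \text{so a 360\% CAGR means } \frac{P_n}{P_0} = (1 + 3.6)^{\,n} .
```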
Mellanox InfiniBand End-to-End Products
High throughput - 40Gb/s
Low latency - 1us
Low CPU overhead
Kernel bypass
Remote DMA (RDMA)
Reliability
[Diagram: end-to-end data path – adapters in blade/rack servers and storage, connected through the switch]
Adapter ICs & Cards
Cables
Switch ICs
Software
End-to-End Validation
Maximum Productivity
ConnectX IB QDR 40Gb/s MPI Bandwidth (PCIe Gen2)
[Chart: uni-directional and bi-directional IB QDR MPI bandwidth (MB/s) vs. message size, 1 byte to 4194304 bytes]
ConnectX - Fastest InfiniBand Technology
Performance-driven architecture
• MPI latency 1us, ~6.5GB/s with 40Gb/s InfiniBand (bi-directional; see the ping-pong sketch below)
• MPI message rate of >40 million messages/sec
Superior real application performance
• Automotive engineering, oil & gas, financial analysis, etc.
ConnectX IB MPI Latency
[Chart: PCIe Gen2 IB QDR MPI latency (usec) vs. message size, 1 to 1024 bytes – 1.07us latency, 6.47GB/s]
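Latency and bandwidth curves of this kind come from ping-pong micro-benchmarks. The sketch below is a generic OSU-style MPI ping-pong in C, not the exact tool behind the slide's numbers; results depend on the fabric, the MPI library, and the hosts.

```c
/* Minimal MPI ping-pong: run with two ranks, e.g. "mpirun -np 2 ./pingpong". */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 1000;
    char *buf = malloc(4 * 1024 * 1024);          /* largest message: 4 MB */

    for (int size = 1; size <= 4 * 1024 * 1024; size *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = (MPI_Wtime() - t0) / iters;    /* average round-trip time */
        if (rank == 0)
            printf("%8d bytes  latency %.2f us  bandwidth %.1f MB/s\n",
                   size, t / 2 * 1e6, size / (t / 2) / 1e6);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```

One-way latency is taken as half the round trip, and uni-directional bandwidth as message size divided by the one-way time.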
Mellanox ConnectX MPI Latency - Multi-core Scaling
[Charts: MPI latency (usec) vs. number of CPU cores (# of processes), for 1-8 cores and 1-16 cores]
ConnectX Multi-core MPI Scalability
Scalability to 64+ cores per node, to 20K+ nodes per subnet
Guarantees the same low latency regardless of the number of cores (see the multi-pair sketch below)
Guarantees linear scalability for real applications
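The multi-core scaling plots above are typically produced with a multi-pair test: 2N ranks split across two nodes (for example "mpirun -np 2N -map-by node ..."), with each pair (r, r+N) running an independent ping-pong so that every additional core adds another concurrent message stream. The sketch below shows that structure; pingpong() is a hypothetical helper holding the loop from the previous sketch and returning one-way latency in seconds.

```c
#include <mpi.h>
#include <stdio.h>

double pingpong(int peer, int size, int iters);   /* hypothetical helper, see above */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int half = nprocs / 2;                        /* ranks 0..half-1 on node A */
    int peer = (rank < half) ? rank + half : rank - half;

    double lat = pingpong(peer, 1, 1000);         /* 1-byte one-way latency */

    double max_lat;                               /* worst pair is the plotted point */
    MPI_Reduce(&lat, &max_lat, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d pairs, worst one-way latency %.2f us\n", half, max_lat * 1e6);

    MPI_Finalize();
    return 0;
}
```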
InfiniScale IV Switch: Unprecedented Scalability
36 40Gb/s or 12 120Gb/s InfiniBand ports
• Adaptive routing and congestion control
• Virtual Subnet Partitioning
6X switching and data capacity
• Vs. using 24-port 10GigE Ethernet switch devices
4X storage I/O throughput
• Critical for backup, snapshot and quickly loading large datasets
• Vs. deploying 8Gb/s Fibre Channel SANs
10X lower end-to-end latency
• Vs. using 10GigE/DCE switches and iWARP-based adapters
3X the server and storage node cluster scalability when building a 3-tier CLOS fabric (see the fat-tree arithmetic below)
• Vs. using 24-port 10GigE Ethernet switch devices
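The 3X scalability figure can be sanity-checked with the standard fat-tree port count: a non-blocking three-tier fat tree (folded Clos) built from k-port switch elements supports at most k^3/4 end ports. This is the generic construction, not a statement about any specific Mellanox reference design:

```latex
N_{\max}(k) = \frac{k^{3}}{4}:\qquad
N_{\max}(36) = \frac{36^{3}}{4} = 11{,}664,\qquad
N_{\max}(24) = \frac{24^{3}}{4} = 3{,}456,\qquad
\frac{11{,}664}{3{,}456} \approx 3.4
```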
Addressing the Needs for Petascale Computing
Faster network streaming propagation
• Network speed capabilities
• Solution: InfiniBand QDR
Large clusters
• Scaling to many nodes, many cores per node
• Solution: High density InfiniBand switch
Balanced random network streaming
• "One to one" random streaming
• Solution: Adaptive routing (see the routing sketch after this list)
Balanced known network streaming
• "One to one" known streaming
• Solution: Static routing
Un-balanced network streaming
• "Many to one" streaming
• Solution: Congestion control
Designed to handle all communications in HW
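The sketch below is a toy illustration of the difference between the static-routing and adaptive-routing bullets above. In a real fabric the forwarding tables are programmed by the subnet manager and adaptive decisions are made in switch silicon (with congestion control throttling "many to one" senders); the names and structures here are invented purely to show the idea.

```c
#include <stdint.h>

#define NUM_UPLINKS 18   /* e.g. the uplink ports of a 36-port leaf switch */

/* Static routing: a destination LID always maps to the same uplink, which keeps
 * "known" (predictable) traffic patterns collision-free once the tables are
 * balanced by the subnet manager. */
static int route_static(uint16_t dest_lid)
{
    return dest_lid % NUM_UPLINKS;
}

/* Adaptive routing: pick the least-loaded uplink at injection time, spreading
 * "one to one" random streams that a fixed table could pile onto one link. */
static int route_adaptive(const uint32_t queue_depth[NUM_UPLINKS])
{
    int best = 0;
    for (int p = 1; p < NUM_UPLINKS; p++)
        if (queue_depth[p] < queue_depth[best])
            best = p;
    return best;
}
```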
HPC Applications Demand Highest Throughput
Fluent Message Size Profiling
[Chart: total data sent (bytes, log scale) per MPI message-size bin, from [0..256] bytes up to [262145..1048576] bytes, for 2-server and 7-server runs; annotated "The need for latency" and "The need for bandwidth"]
LS-DYNA Profiling
[Chart: total data transferred (bytes, log scale) per MPI message-size bin, from [0..64] bytes up to [4194305..infinity], for 16-core and 32-core runs; annotated "The need for bandwidth"]
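Message-size histograms like the Fluent and LS-DYNA profiles above can be collected with the standard MPI profiling (PMPI) interface. The sketch below intercepts MPI_Send and bins the bytes sent; a real profiler would also wrap the non-blocking, sendrecv and collective calls, and the bin edges here simply mirror the charts. Link this file with the application (or build it as a preloaded shared object).

```c
#include <mpi.h>
#include <stdio.h>
#include <stdint.h>

#define NBINS 8
static uint64_t bytes_in_bin[NBINS];            /* [0..256], (256..1K], ... */
static const long bin_limit[NBINS] = {256, 1024, 4096, 16384, 65536,
                                      262144, 1048576, 0 /* infinity */};

/* Intercept MPI_Send via PMPI: record the bytes, then forward the call. */
int MPI_Send(const void *buf, int count, MPI_Datatype type, int dest,
             int tag, MPI_Comm comm)
{
    int size;
    MPI_Type_size(type, &size);
    long bytes = (long)count * size;
    int b = 0;
    while (b < NBINS - 1 && bytes > bin_limit[b])
        b++;
    bytes_in_bin[b] += bytes;
    return PMPI_Send(buf, count, type, dest, tag, comm);
}

/* Dump the per-rank histogram when the application shuts down MPI. */
int MPI_Finalize(void)
{
    for (int b = 0; b < NBINS; b++)
        printf("bin %d: %llu bytes\n", b, (unsigned long long)bytes_in_bin[b]);
    return PMPI_Finalize();
}
```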
Scalability Mandates Highest Bandwidth, Lowest Latency
HPC Advisory Council
Distinguished HPC alliance (OEMs, IHVs, ISVs, end-users)
Member activities
• Qualify and optimize HPC solutions
• Early access to new technology, and mutual development of future solutions
• Explore new opportunities within the HPC market
• HPC targeted joint marketing programs
A community effort support center for HPC end-users
• Mellanox Cluster Center
• Latest InfiniBand and HPC Advisory Council member technology
• Development, testing, benchmarking and optimization environment
• End-user support center – [email protected]
• For details – [email protected]
Providing advanced, powerful, and stable high performance computing solutions