CONFIDENTIAL
Mellanox Technologies: Maximize World-Class Cluster Performance
April, 2008
Gilad Shainer – Director of Technical Marketing
Mellanox Technologies
Fabless semiconductor supplier founded in 1999
• Business in California
• R&D and Operations in Israel
• Global sales offices and support
• 250+ employees worldwide
Leading server and storage interconnect products
• InfiniBand and Ethernet leadership
Shipped over 2.8M 10 & 20Gb/s ports as of Dec 2007
$106M raised in Feb 2007 IPO on NASDAQ (MLNX)
• Dual listed on Tel Aviv Stock Exchange (TASE: MLNX)
• Profitable since 2005
Revenues: FY06=$48.5M, FY07=$84.1M
• 73% yearly growth
• 1Q08 guidance ~$24.8M
Customers include Cisco, Dawning, Dell, Fujitsu, Fujitsu-Siemens, HP, IBM, NEC, NetApp, Sun, Voltaire
Interconnect: A Competitive Advantage
[Diagram: end-to-end product line – adapter ICs & cards, switch ICs, cables, reference designs, software, end-to-end validation – spanning blade/rack servers, switches and storage (adapter – switch – adapter)]
Providing end-to-end products
InfiniBand in the TOP500
Number of Clusters in Top500
InfiniBand shows the highest yearly growth of 52% compared to Nov 06
• InfiniBand strengthens its leadership as high-speed interconnect of choice
• 4+ times the total number of all other high-speed interconnects (1Gb/s+)
InfiniBand 20Gb/s connects 40% of the InfiniBand clusters
• Reflects the ever growing performance demands
InfiniBand makes the most powerful clusters
• 3 of the top 5 (#3, #4, #5) and 7 of the Top20 (#14, #16, #18, #20)
• The leading interconnect for the Top100
TOP500 - InfiniBand Performance Trends
[Chart: InfiniBand Clusters - CPU Count; number of InfiniBand-connected CPUs per list, Nov 05 through Nov 07]
[Chart: InfiniBand Clusters - Performance; aggregate performance (Gflops) per list, Nov 05 through Nov 07]
180% CAGR, 220% CAGR
[Chart: InfiniBand In The Top500; number of clusters in the Nov 05, Nov 06 and Nov 07 lists, split into IB SDR and IB DDR]
Ever growing demand for compute resources
Explosive growth of IB 20Gb/s; IB 40Gb/s anticipated in the Nov 08 list
InfiniBand is the optimized interconnect for multi-core environments: maximum scalability, efficiency and performance
ConnectX: Leading IB and 10GigE Adapters
[Diagram: ConnectX 10/20/40 Gb/s InfiniBand adapter (dual port, PCIe Gen1/2, hardware IOV) attached to an InfiniBand switch; ConnectX 10GigE adapter (dual port, PCIe Gen1/2, hardware IOV, FCoE, CEE) attached to an Ethernet switch]
Server and Storage Interconnect
• Highest InfiniBand and 10GigE performance
• Single chip or slot optimizes cost, power, footprint, reliability
• One device for 40Gb/s IB, FCoIB, 10GigE, CEE, FCoE
• One SW stack for offload, virtualization, RDMA, storage
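As a small illustration of that common software stack, the sketch below uses the standard libibverbs API to enumerate RDMA-capable devices and print a few of their attributes. It is a generic, minimal example assuming an OFED-style verbs installation; it is not Mellanox-specific code and shows only a tiny slice of what the stack exposes.

/* Minimal sketch: enumerate RDMA devices with libibverbs (compile with -libverbs). */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devices = ibv_get_device_list(&num_devices);
    if (!devices) {
        perror("ibv_get_device_list");
        return 1;
    }

    for (int i = 0; i < num_devices; i++) {
        struct ibv_context *ctx = ibv_open_device(devices[i]);
        if (!ctx)
            continue;

        struct ibv_device_attr attr;
        if (ibv_query_device(ctx, &attr) == 0)
            printf("%s: %d port(s), max_qp=%d\n",
                   ibv_get_device_name(devices[i]),
                   attr.phys_port_cnt, attr.max_qp);

        ibv_close_device(ctx);
    }

    ibv_free_device_list(devices);
    return 0;
}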
ConnectX Multi-core MPI Scalability
[Chart: Mellanox ConnectX MPI Latency - Multi-core Scaling; latency (usec) vs. number of CPU cores (1-16 parallel processes)]
[Chart: ConnectX IB QDR versus DDR Bandwidth (PCIe Gen2); IB write bandwidth and bidirectional bandwidth (MB/s) vs. message size, 2 bytes to 2 MB]
• Scalability to 64+ cores per node
• Scalability to 20K+ nodes per cluster
• Guarantees same low latency regardless of the number of cores
• Guarantees linear scalability for real applications
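Latency figures like the ones in the chart above are usually produced with a ping-pong microbenchmark between two MPI ranks. The sketch below is a minimal, generic version of such a test (message size and iteration count are arbitrary); it is not the exact benchmark behind the chart.

/* Minimal MPI ping-pong latency sketch: reports half round-trip time for small messages.
 * Run with two ranks, e.g.: mpirun -np 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000, msg_size = 8;
    char buf[8] = {0};
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("average one-way latency: %.2f usec\n",
               (t1 - t0) / iters / 2.0 * 1e6);

    MPI_Finalize();
    return 0;
}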
ConnectX EN 10GigE Performance
World leading 10GigE performance: TCP, UDP, CPU utilization
Performance testing on the same HW platform; bandwidth measured with an MTU of 9600B, CPU utilization measured for Rx
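A UDP bandwidth point like those charted below is typically generated by a sender that streams fixed-size datagrams while the receiver counts the bytes it actually gets. The sketch below is a minimal, generic sender side only; the receiver address, port, datagram size and duration are placeholder values, and this is not the tool used for these measurements.

/* Minimal sketch of a UDP bandwidth-test sender: streams fixed-size datagrams
 * for a few seconds and reports the offered rate in Gb/s. The receiver side
 * (not shown) simply counts the bytes it receives over the same window. */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    const char *receiver_ip = "192.168.0.2";   /* placeholder address */
    const int port = 5001, msg_size = 1472, seconds = 5;
    char buf[1472];
    memset(buf, 0, sizeof buf);

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    inet_pton(AF_INET, receiver_ip, &dst.sin_addr);

    long long bytes = 0;
    time_t end = time(NULL) + seconds;
    while (time(NULL) < end) {
        if (sendto(fd, buf, msg_size, 0,
                   (struct sockaddr *)&dst, sizeof dst) == msg_size)
            bytes += msg_size;
    }

    printf("offered load: %.2f Gb/s\n", bytes * 8.0 / seconds / 1e9);
    close(fd);
    return 0;
}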
[Chart: UDP Bandwidth Comparison; bandwidth (Gb/s) vs. message size (1-256 bytes) for Vendor A, Vendor B and Mellanox ConnectX EN]
[Chart: 10GigE TCP CPU Utilization (%) vs. message size (64-32768 bytes) for Vendor A, Vendor A TOE, Vendor B and Mellanox ConnectX EN]
[Chart: 10GigE TCP Bandwidth (Gb/s) vs. message size (64-32768 bytes) for Vendor A, Vendor A TOE, Vendor B and Mellanox ConnectX EN]
InfiniScale IV: Unprecedented Scalability
~3 terabits per second of switching capacity in a single silicon device!
• Up to 36 40Gb/s or 12 120Gb/s InfiniBand ports
• 60-70ns switching latency
• Adaptive routing and congestion control
• Systems available latter part of 2008
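The ~3 Tb/s figure is consistent with the port count and speed, assuming both directions of each full-duplex port are counted toward aggregate switching capacity (the usual convention): 36 ports × 40 Gb/s × 2 = 2,880 Gb/s ≈ 3 Tb/s, and equivalently 12 ports × 120 Gb/s × 2 = 2,880 Gb/s.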
InfiniBand Addresses the Needs of Petascale Computing
Balanced random network streaming
• “One to One” random streaming
• Solution: Dynamic routing (InfiniScale IV)
Balanced known network streaming
• “One to One” known streaming
• Solution: Static routing (Now)
Un-balanced network streaming
• “Many to one” streaming
• Solution: Congestion control (Now)
Faster network streaming propagation
• Network speed capabilities
• Solution: InfiniBand QDR (InfiniScale IV)
40/80/120Gb/s
IB designed to handle all communications in HW
Hardware Congestion Control
Congestion spots cause catastrophic loss of throughput
• Old techniques are not adequate today
• Over-provisioning – applications demand high throughput
• Algorithmic predictability – virtualization drives multiple simultaneous algorithms
InfiniBand HW congestion control
• No a priori network assumptions needed
• Automatic hot spot discovery
• Data traffic adjustments
• No bandwidth oscillation or other stability side effects
Ensures maximum effective bandwidth
Simulation results: 32-port, 3-stage fat-tree network; high input load, large hot spot degree
“Solving Hot Spot Contention Using InfiniBand Architecture Congestion Control”, IBM Research; IBM Systems and Technology Group; Technical University of Valencia, Spain
[Charts: throughput before congestion control vs. after congestion control]
[Diagram: congestion control loop between two HCAs across a switch – the congested switch marks forward packets with FECN, the destination HCA returns BECN to the source, and the source HCA adjusts its IPD index (injection rate), with a timer restoring the rate over time]
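As a rough, software-only illustration of the feedback loop in the diagram above: on each BECN the source steps up an IPD (inter-packet delay) index into a congestion control table, and a recovery timer steps it back down. The table values, step sizes and function names below are invented for illustration; the real mechanism is implemented in the HCA hardware.

/* Conceptual sketch (not the HCA implementation): BECN-driven injection throttling.
 * Each BECN raises the IPD index; a periodic timer lowers it again, so throughput
 * recovers once the hot spot clears. All constants are illustrative only. */
#include <stdio.h>

#define CCT_ENTRIES 16

/* Illustrative congestion control table: IPD index -> inter-packet delay (usec). */
static const double cct_delay_us[CCT_ENTRIES] = {
    0, 1, 2, 4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128, 192, 256
};

static int ipd_index = 0;

/* Called when BECN-marked feedback arrives for this flow. */
void on_becn(void)
{
    if (ipd_index < CCT_ENTRIES - 1)
        ipd_index++;
}

/* Called by a periodic recovery timer. */
void on_timer(void)
{
    if (ipd_index > 0)
        ipd_index--;
}

/* Delay to insert before the next packet of this flow. */
double current_delay_us(void)
{
    return cct_delay_us[ipd_index];
}

int main(void)
{
    /* Simulate a burst of congestion feedback followed by partial recovery. */
    for (int i = 0; i < 5; i++) on_becn();
    printf("delay after 5 BECNs: %.0f usec\n", current_delay_us());
    for (int i = 0; i < 3; i++) on_timer();
    printf("delay after 3 timer ticks: %.0f usec\n", current_delay_us());
    return 0;
}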
InfiniBand Adaptive Routing
Simulation model (Mellanox): 972-node cases, hot spot traffic
[Chart: Hot Spot Traffic - Average Performance; average offered load (Bps) vs. message size (kB), static routing vs. adaptive routing, 972 nodes]
• Fast path modifications
• No throughput overhead
• Maximum flexibility for routing algorithms
• Random based decision
• Least loaded based decision
• Greedy Random based solution – least loaded out of a random set
• Maximizes “One to One” random traffic network efficiency
• Dynamically re-routes traffic to alleviate congested ports
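The “Greedy Random” policy above (least loaded out of a random set) can be sketched as follows. The port count, sample size and load values are invented for illustration; in practice the selection is made in switch hardware from live queue state.

/* Conceptual sketch of "greedy random" output-port selection:
 * sample a small random subset of candidate ports and pick the least loaded. */
#include <stdio.h>
#include <stdlib.h>

#define NUM_CANDIDATE_PORTS 12   /* ports that can reach the destination */
#define RANDOM_SAMPLE        3   /* size of the random subset */

/* Hypothetical per-port queue occupancy (maintained by the switch in reality). */
static unsigned port_load[NUM_CANDIDATE_PORTS] = {5, 0, 9, 3, 7, 2, 8, 1, 6, 4, 10, 11};

static int select_output_port(void)
{
    int best = rand() % NUM_CANDIDATE_PORTS;

    /* Greedy step: look at a few more random candidates, keep the least loaded. */
    for (int i = 1; i < RANDOM_SAMPLE; i++) {
        int candidate = rand() % NUM_CANDIDATE_PORTS;
        if (port_load[candidate] < port_load[best])
            best = candidate;
    }
    return best;
}

int main(void)
{
    printf("selected output port: %d\n", select_output_port());
    return 0;
}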
InfiniBand QDR 40Gb/s Technology
Superior performance for HPC applications
• Highest bandwidth, 1us node-to-node latency
• Low CPU overhead, MPI offloads
• Designed for current and future multi-core environments
Addresses the needs of Petascale Computing
• Adaptive routing, congestion control, large-scale switching
• Fast network streaming propagation
Consolidated I/O for server and storage
• Optimize cost and reduce power consumption
Maximizing cluster productivity
• Efficiency, scalability and reliability
Commercial HPC Demands High Bandwidth
A 4-node InfiniBand cluster demonstrates higher performance than a GigE cluster of any size
[Chart: LS-DYNA Productivity; jobs per day vs. number of cores (8, 16, 20, 32, 64) for InfiniBand and GigE, with the IB-vs-GigE difference ranging from 11% to 1034%]
[Chart: LS-DYNA Profiling; total data transferred (bytes, log scale) vs. MPI message size for 16-core and 32-core runs]
Scalability mandates InfiniBand QDR and beyond
Mellanox Cluster Center
http://www.mellanox.com/applications/clustercenter.php
Neptune cluster
• 32 nodes
• Dual core AMD Opteron CPUs
Helios cluster
• 32 nodes
• Quad core Intel Clovertown CPUs
Vulcan cluster
• 32 nodes
• Quad core AMD Barcelona CPUs
Utilizing “Fat Tree” network architecture (CBB) – see the sketch after this list
• Non-blocking switch topology
• Non-blocking bandwidth
InfiniBand 20Gb/s (40Gb/s May 2008)
InfiniBand based storage
• NFS over RDMA, SRP
• GlusterFS cluster file system (Z Research)
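As background on the “Fat Tree” (CBB) point above: in a two-level non-blocking fat tree built from k-port switch elements, each leaf splits its ports evenly between hosts and uplinks, giving at most k*k/2 host ports. The sketch below evaluates that bound for two illustrative switch radixes; it is a back-of-the-envelope calculation, not a description of the Cluster Center fabrics.

/* Illustrative calculation: end-node capacity of a two-level non-blocking
 * ("fat tree" / CBB) fabric built from k-port switch elements.
 * Leaves use k/2 ports for hosts and k/2 for uplinks; with k/2 spine
 * switches and up to k leaves, the fabric supports k*k/2 host ports. */
#include <stdio.h>

static void fat_tree_capacity(int radix)
{
    int leaves = radix;              /* limited by spine port count */
    int spines = radix / 2;          /* one uplink from every leaf to every spine */
    int hosts  = leaves * (radix / 2);
    printf("%2d-port switches: %d leaves, %d spines, %d non-blocking host ports\n",
           radix, leaves, spines, hosts);
}

int main(void)
{
    fat_tree_capacity(24);   /* e.g. a 24-port switch element */
    fat_tree_capacity(36);   /* e.g. a 36-port switch element */
    return 0;
}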
Summary
Market-wide adoption of InfiniBand
• Servers/blades, storage and switch systems
• Data Centers, High-Performance Computing, Embedded
• Performance, price, power, reliability, efficiency, scalability
• Mature software ecosystem
4th Generation adapter extends connectivity to IB and Eth
• Market leading performance, capabilities and flexibility
• Multiple 1000+ node clusters already deployed
4th Generation switch available May 08
• 40Gb/s server and storage connections, 120Gb/s switch-to-switch links
• 60-70nsec latency, 36 ports in a single switch chip, 3 Terabits/second
Driving key trends in the market
• Clustering/blades, low latency, I/O consolidation, multi-core CPUs and virtualization
Accelerating HPC Applications
[Chart: PAM-CRASH elapsed time (sec) at 16, 32 and 64 CPUs, GigE vs. InfiniBand; lower is better]
[Chart: LS-DYNA neon_refined_revised rating vs. number of servers (8, 16), Mellanox InfiniHost III Ex SDR vs. QLogic InfiniPath SDR]
[Chart: Vector Distribution Benchmark time (sec), GigE vs. InfiniBand (57% difference); lower is better]
[Chart: Schlumberger ECLIPSE elapsed runtime (seconds) at 16, 32 and 64 servers, Myrinet vs. InfiniBand; lower is better]