Michael Kagan, CTO
HPC Advisory Council Stanford, 2014
The Future of Interconnect Technology
© 2014 Mellanox Technologies 2
Exponential Data Growth – Best Interconnect Required
0.8 zettabytes (2009) growing to 35 zettabytes (2020): a 44X increase
Source: IDC
© 2014 Mellanox Technologies 3
The Power of Data
Data-Intensive Simulations, Internet of Things, National Security, Healthcare, Smart Cars, Congestion-Free Traffic, Business Intelligence
© 2014 Mellanox Technologies 4
Data Must Always Be Accessible, in Real Time
[Diagram: compute, storage, archive, and sensor-data tiers connected through the interconnect]
Smart Interconnect Required to Unleash the Power of Data
Lower Latency, Higher Bandwidth, RDMA, Offloads, NIC/Switch Routing, Overlay Networks
© 2014 Mellanox Technologies 5
InfiniBand’s Unsurpassed System Efficiency
TOP500 systems listed according to their efficiency
InfiniBand is the key element responsible for the highest system efficiency
Mellanox delivers efficiencies of up to 96% with InfiniBand
© 2014 Mellanox Technologies 6
FDR InfiniBand Delivers Highest Return on Investment
[Charts: application performance comparisons; higher is better]
Source: HPC Advisory Council
© 2014 Mellanox Technologies 7
Real-Time Fraud Detection: 13 million financial transactions per day, 4 billion database inserts
235 supermarkets across 8 states (USA): reacting to customers' needs in real time, reducing data queries from 20 minutes to 20 seconds, with accuracy, detail, and fast response
Microsoft Bing Maps: 10X higher performance, 50% CAPEX reduction
Tier-1 Fortune 100 company, Web 2.0 application: 97% reduction in database recovery time, from 7 days to 4 hours
Business Success Depends on Fast Interconnect
© 2014 Mellanox Technologies 8
Helping to Make the World a Better Place
SANGER
• Sequence Analysis and Genomics Research
• Genomic Analysis for pediatric cancer patients
Challenge: An individual patient’s RNA analysis took 7 days
Goal: Reduce it to 5 days
InfiniBand reduced the RNA-sequence data analysis time per patient to only 1 hour!
Fast interconnect for fighting pediatric cancer
© 2014 Mellanox Technologies 9
Mellanox InfiniBand Paves the Road to Exascale Computing
Accelerating Half of the World’s Petascale Systems
Mellanox Connected Petascale System Examples
© 2014 Mellanox Technologies 10
NASA Ames Research Center: Pleiades
20K InfiniBand nodes
Mellanox end-to-end FDR and QDR InfiniBand
Supports a variety of scientific and engineering projects
• Coupled atmosphere-ocean models
• Future space vehicle design
• Large-scale dark matter halos and galaxy evolution
[Images: Asian monsoon water cycle; high-resolution climate simulations]
© 2014 Mellanox Technologies 11
InfiniBand Enables Lowest Application Cost in the Cloud (Examples)
Microsoft Windows Azure: 90.2% cloud efficiency, 33% lower cost per application
Cloud application performance improved up to 10X
3x increase in VMs per physical server through consolidation of network and storage I/O
32% lower cost per application, 694% higher network performance
© 2014 Mellanox Technologies 12
Dominant in Storage Interconnects
SMB Direct
Market Leading Performance with RDMA Interconnects
© 2014 Mellanox Technologies 13
Technology Roadmap
[Roadmap: interconnect generations of 10Gb/s, 20Gb/s, 40Gb/s, 56Gb/s, 100Gb/s, and 200Gb/s across 2000-2020 (200Gb/s beyond 2015), tracking mega supercomputers from terascale through petascale to exascale; Mellanox Connected milestones include "Roadrunner" (1st on the TOP500) and the 2003 Virginia Tech (Apple) system (3rd on the TOP500)]
© 2014 Mellanox Technologies 14
Architectural Foundation for Exascale Computing
Connect-IB
© 2014 Mellanox Technologies 15
Mellanox Connect-IB: The World’s Fastest Adapter
The 7th generation of Mellanox interconnect adapters
World’s first 100Gb/s interconnect adapter (dual-port FDR 56Gb/s InfiniBand)
Delivers 137 million messages per second – 4X higher than competition
Supports the new innovative InfiniBand scalable transport – Dynamically Connected
© 2014 Mellanox Technologies 16
Connect-IB Provides Highest Interconnect Throughput
Source: Prof. DK Panda
[Charts: interconnect throughput of Connect-IB FDR (dual port), ConnectX-3 FDR, ConnectX-2 QDR, and a competing InfiniBand adapter; higher is better]
Gain Your Performance Leadership With Connect-IB Adapters
© 2014 Mellanox Technologies 17
Connect-IB Delivers Highest Application Performance
200% Higher Performance Versus Competition, with Only 32 Nodes
Performance Gap Increases with Cluster Size
© 2014 Mellanox Technologies 18
Solutions for MPI/SHMEM/PGAS
Fabric Collective Accelerations
© 2014 Mellanox Technologies 19
Collective Operation Challenges at Large Scale
• Collective algorithms are not topology aware and can be inefficient
• Congestion due to many-to-many communications
• Slow nodes and OS jitter affect scalability and increase variability
[Chart: ideal vs. actual collective performance at scale]
© 2014 Mellanox Technologies 20
CORE-Direct
• US Department of Energy (DOE) funded project – ORNL and Mellanox
• Adapter-based hardware offloading for collective operations
• Includes floating-point capability on the adapter for data reductions
• CORE-Direct API is exposed through the Mellanox drivers
FCA
• FCA is a software plug-in package that integrates into available MPIs
• Provides scalable topology aware collective operations
• Utilizes powerful InfiniBand multicast and QoS capabilities
• Integrates CORE-Direct collective hardware offloads
Mellanox Collectives Acceleration Components
© 2014 Mellanox Technologies 21
Minimizing the impact of system noise on applications – critical for scalability
The Effects of System Noise on Application Performance
[Chart: runtime under ideal conditions vs. with system noise vs. with CORE-Direct (offload)]
© 2014 Mellanox Technologies 22
Provides support for overlapping computation and communication
CORE-Direct Enables Computation and Communication Overlap
[Chart: synchronous execution vs. CORE-Direct asynchronous execution]
© 2014 Mellanox Technologies 23
Nonblocking Alltoall (Overlap-Wait) Benchmark
CORE-Direct offload allows the Alltoall benchmark to run with almost 100% of the time available for computation
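The overlap-wait pattern behind this result can be sketched with a standard MPI-3 nonblocking collective. The sketch below is illustrative only: the buffer sizes, the dummy compute loop, and the choice of MPI_Ialltoall are assumptions rather than the actual benchmark code; with CORE-Direct the collective can progress in the adapter while the loop runs on the CPU.

/* Overlap-wait sketch: post a nonblocking alltoall, compute while the
   collective progresses, then wait for completion. Illustrative only. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int count = 1024;                      /* elements exchanged with each peer */
    double *sendbuf = malloc((size_t)count * nprocs * sizeof(double));
    double *recvbuf = malloc((size_t)count * nprocs * sizeof(double));
    for (int i = 0; i < count * nprocs; i++) sendbuf[i] = (double)i;

    /* Post the collective without blocking; with hardware offload it can
       progress while the CPU does unrelated work. */
    MPI_Request req;
    MPI_Ialltoall(sendbuf, count, MPI_DOUBLE,
                  recvbuf, count, MPI_DOUBLE, MPI_COMM_WORLD, &req);

    /* Computation that does not touch recvbuf overlaps the communication. */
    double acc = 0.0;
    for (int i = 0; i < count * nprocs; i++) acc += 0.5 * sendbuf[i];

    MPI_Wait(&req, MPI_STATUS_IGNORE);           /* collective is now complete */
    printf("overlapped compute result: %f\n", acc);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}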
© 2014 Mellanox Technologies 24
Accelerator and GPU Offloads
© 2014 Mellanox Technologies 25
GPUDirect 1.0
[Diagrams: transmit and receive data paths across CPU, chipset, GPU memory, system memory, and InfiniBand. Without GPUDirect, data crosses system memory twice because the GPU and InfiniBand drivers use separate buffers with a CPU copy in between; with GPUDirect 1.0 the drivers share one pinned system-memory buffer, removing the extra copy]
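To make the diagram concrete, the staging pattern below copies GPU data into a pinned host buffer and sends it with MPI. This is a hedged sketch, not vendor code: the buffer size, tag, and two-rank layout are assumptions. The application code is the same with or without GPUDirect 1.0; what changes is that the pinned buffer can be registered with both the CUDA and InfiniBand drivers, so the extra host-to-host copy of the "non GPUDirect" path goes away.

/* Staging sketch: GPU -> pinned host buffer -> MPI send (and the reverse
   on the receiver). Run with at least two ranks. Illustrative only. */
#include <mpi.h>
#include <cuda_runtime.h>

#define N (1 << 20)

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 0; }  /* needs a sender and a receiver */

    float *d_buf, *h_buf;
    cudaMalloc((void **)&d_buf, N * sizeof(float));
    cudaMallocHost((void **)&h_buf, N * sizeof(float)); /* pinned host memory */

    if (rank == 0) {
        /* stage GPU data through the pinned host buffer, then send it */
        cudaMemcpy(h_buf, d_buf, N * sizeof(float), cudaMemcpyDeviceToHost);
        MPI_Send(h_buf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(h_buf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpy(d_buf, h_buf, N * sizeof(float), cudaMemcpyHostToDevice);
    }

    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}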
© 2014 Mellanox Technologies 26
GPUDirect RDMA
[Diagrams: transmit and receive data paths. With GPUDirect 1.0, data is still staged through a pinned buffer in system memory; with GPUDirect RDMA, the InfiniBand adapter reads and writes GPU memory directly over PCIe, bypassing system memory altogether]
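From the application side, this path is usually reached through a CUDA-aware MPI such as MVAPICH2 (next slide), which accepts device pointers directly. The two-rank sketch below is an assumption-laden illustration (the message size and tag are arbitrary); in MVAPICH2 the CUDA path is typically switched on via its CUDA support (e.g. the MV2_USE_CUDA environment variable).

/* GPUDirect RDMA sketch as seen by the application: pass device pointers
   straight to a CUDA-aware MPI and let the adapter move the data to and
   from GPU memory without staging it in system memory. Illustrative only. */
#include <mpi.h>
#include <cuda_runtime.h>

#define N (1 << 20)

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 0; }  /* needs a sender and a receiver */

    float *d_buf;
    cudaMalloc((void **)&d_buf, N * sizeof(float));

    if (rank == 0)
        MPI_Send(d_buf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);   /* device pointer */
    else if (rank == 1)
        MPI_Recv(d_buf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}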
© 2014 Mellanox Technologies 27
Performance of MVAPICH2 with GPUDirect RDMA
[Charts: GPU-GPU internode MPI latency and bandwidth vs. message size, 1 byte to 4 KB]
GPU-GPU internode MPI latency: 67% lower latency, down to 5.49 usec (lower is better)
GPU-GPU internode MPI bandwidth: 5X increase in throughput (higher is better)
Source: Prof. DK Panda
© 2014 Mellanox Technologies 28
Performance of MVAPICH2 with GPUDirect RDMA
[Chart: execution time of the HSG (Heisenberg Spin Glass) application on 2 GPU nodes vs. problem size]
Source: Prof. DK Panda
© 2014 Mellanox Technologies 29
Remote GPU Access through rCUDA
rCUDA provides remote access from every node to any GPU in the system (GPU as a Service)
[Diagram: on the client side the CUDA application runs against the rCUDA library over the network interface; on the server side the rCUDA daemon passes requests through the CUDA driver and runtime to the GPU servers, presenting physical GPUs to client CPUs as virtual GPUs]
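The point of the architecture is that the application is unchanged. The sketch below is an ordinary CUDA runtime program (a hedged illustration; the buffer sizes are arbitrary, and the environment variable names in the comment are recalled from rCUDA documentation, so treat them as assumptions). Linked against the rCUDA client library instead of the local CUDA runtime, the same calls are forwarded over the interconnect to the rCUDA daemon on a GPU server, so even a GPU-less node sees virtual GPUs.

/* Plain CUDA runtime program; under rCUDA the same calls are served by a
   remote GPU selected through the client library's configuration. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    /* Under rCUDA, settings along the lines of:
     *   RCUDA_DEVICE_COUNT=1
     *   RCUDA_DEVICE_0=<gpu-server-hostname>:0
     * tell the client library which remote GPU backs device 0
     * (variable names quoted from memory; treat as assumptions). */
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    printf("visible CUDA devices: %d\n", ndev);   /* local or remote (virtual) GPUs */

    float *d_buf;
    cudaMalloc((void **)&d_buf, 1 << 20);          /* allocated on the (remote) GPU */

    float host[256] = {0};
    cudaMemcpy(d_buf, host, sizeof(host), cudaMemcpyHostToDevice);

    cudaFree(d_buf);
    return 0;
}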
© 2014 Mellanox Technologies 30
rCUDA Performance Comparison
© 2014 Mellanox Technologies 31
Other Developments
© 2014 Mellanox Technologies 32
RDMA Accelerates OpenStack Storage
RDMA Accelerates iSCSI Storage
[Diagram: compute servers running VMs on a KVM hypervisor use Open-iSCSI with iSER through the adapter and switching fabric to reach storage servers running an iSCSI/iSER target (tgt) with an RDMA cache and local disks, managed through OpenStack Cinder]
Utilizes OpenStack built-in components and management (Open-iSCSI, the tgt target, and Cinder) to accelerate storage access
Cinder / Volume Storage Performance*: 1.3 GBytes/s with iSCSI over TCP versus 5.5 GBytes/s with iSER
* iSER patches are available on the OpenStack branch: https://github.com/mellanox/openstack
© 2014 Mellanox Technologies 33
Next Generation Enterprises: The Generation of Open Ethernet
[Diagram: a closed Ethernet switch is a locked-down vertical solution with proprietary software and proprietary management; an Open Ethernet switch is an open platform with software of choice and management of choice]
Freedom to Choose and Create Any Software, Any Management
Enables Vendor / User Differentiation, No Limitations
© 2014 Mellanox Technologies 34
Open Ethernet Solutions – The Freedom to Choose
Software options: open source software, 3rd party software, switch vendor software, or home-grown software
© 2014 Mellanox Technologies 35
Futures
Thank You