100 Gb/s InfiniBand Transport over up to 100 km
Klaus Grobe and Uli Schlegel, ADVA Optical Networking, and David Southwell, Obsidian Strategics, TNC2009, Málaga, June 2009
Agenda
InfiniBand in Data Centers
InfiniBand Distance Transport
InfiniBand in Data Centers
Connectivity performance
Bandwidth requirements follow Moore’s Law (the number of transistors on a chip). So far, both Ethernet and InfiniBand have outperformed Moore’s growth rate.
Adapted from: Ishida, O., “Toward Terabit LAN/WAN” Panel, iGRID2005
[Figure: Fiber link capacity (b/s, 100M to 10T) vs. year (1990 to 2010) for WDM, FC, Ethernet, and InfiniBand, compared against Moore’s Law (doubling every 18 months).]
[Figure: InfiniBand roadmap: bandwidth per direction (Gb/s, 10 to 640) vs. time (2008 to 2011) for QDR, EDR, and HDR at x1, x4, and x12 lane widths.]
InfiniBand Data Rates
InfiniBand IBx1 IBx4 IBx12
Single Data Rate, SDR 2.5 Gb/s 10 Gb/s 30 Gb/s
Double Data Rate, DDR 5 Gb/s 20 Gb/s 60 Gb/s
Quad Data Rate, QDR 10 Gb/s 40 Gb/s 120 Gb/s
IB uses 8B/10B coding, e.g., IBx1 DDR (5 Gb/s line rate) has 4 Gb/s effective throughput (see the sketch below)
Copper
Serial (x1, not much seen on the market)
Parallel copper cables (x4, x12)
Fiber Optic
Serial for x1 and SDR x4 LX (serialized I/F)
Parallel for x4, x12
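A minimal sketch of the line-rate vs. throughput arithmetic implied by the table and the 8B/10B note above (the Python names and structure are illustrative, not from the slides):

```python
# Hedged sketch: effective InfiniBand throughput from per-lane signaling rate,
# lane count, and 8B/10B coding overhead (rates taken from the table above).

CODING_EFFICIENCY = 8 / 10  # 8B/10B: 8 data bits carried per 10 line bits

# Per-lane signaling rate in Gb/s for each generation.
LANE_RATE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}

def effective_throughput_gbps(generation: str, lanes: int) -> float:
    """Effective data throughput of an IBx<lanes> link of the given generation."""
    return LANE_RATE_GBPS[generation] * lanes * CODING_EFFICIENCY

if __name__ == "__main__":
    # e.g. IBx1 DDR -> 4.0 Gb/s, IBx4 QDR -> 32.0 Gb/s
    for gen in ("SDR", "DDR", "QDR"):
        for lanes in (1, 4, 12):
            print(f"IBx{lanes} {gen}: {effective_throughput_gbps(gen, lanes):.1f} Gb/s")
```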
Converged Architectures
SRP – SCSI RDMA Protocol
[Figure: Storage protocol stacks from the Operating System / Application and SCSI layer downward: iSCSI over TCP/IP/Ethernet (lossy), FCP over FCIP/TCP/IP/Ethernet (lossy), FCP over iFCP/TCP/IP/Ethernet (lossy), FCP over FCoE/DCB (lossless), and SRP over InfiniBand (lossless). Latency decreases and performance increases toward the lossless DCB and InfiniBand stacks.]
HPC Networks today
Typical HPC Data Center today
Dedicated networks / technologies for LAN, SAN, CPU (server) interconnect
Consolidation required (management complexity, cables, cost, power)
FC and GbE HBAs and IB HCAs
[Figure: Today’s HPC data center with separate FC SAN, Ethernet LAN, and InfiniBand server-cluster networks; each server is equipped with FC, Ethernet, and IB adapters.]
Relevant parameters: LAN HBAs based on GbE/10GbE; SAN HBAs based on 4G/8G-FC; HCAs based on IBx4 DDR/QDR
InfiniBand Distance Transport
Generic NREN
[Figure: Generic NREN topology: a large, dispersed metro campus, or cluster of campuses, with many data centers (DC), core (backbone) routers, Layer-2 switches, and OXCs/ROADMs; connections to the backbone (NREN) and dedicated (P2P) connections to large data centers.]
InfiniBand-over-Distance: Difficulties and solution considerations
Technical difficulties:
IB-over-copper – limited distance (<15 m)
IB-to-XYZ conversion – high latency
No IB buffer credits in today’s switches for distance transport
High-speed serialization and E-O conversion needed
Requirements:
Lowest latency, and hence highest throughput, is a must
Interworking must be demonstrated
InfiniBand Flow Control
InfiniBand is credit-based, per virtual lane (16 VLs)
On initialization, each fabric end-point declares its capacity to receive data
This capacity is described as its buffer credit
As buffers are freed up, end points post messages updating their credit status
InfiniBand flow control happens before transmission, not after it – lossless transport
Optimized for short signal flight times; the small buffers inside the ICs limit the effective range to ~300 m (see the utilization sketch below)
[Figure: Credit-based flow over an IB link between HCA A and HCA B: (1) data is read from system memory, (2) sent across the IB link, (3) written into system memory at the receiver, and (4) a credit update is returned to the sender.]
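To illustrate why the small on-chip credit pools described above cap throughput as flight time grows, here is a toy sliding-window model (my own simplification: real IB credits are granted per virtual lane in 64-byte units, and all constants below are assumptions):

```python
# Toy model: link utilization when a sender may only have 'credits' packets
# in flight and each credit returns roughly one round trip after its packet
# was sent (classic sliding-window behavior, not vendor code).

def link_utilization(credits: int, packet_time_us: float, rtt_us: float) -> float:
    """Approximate fraction of the link kept busy with a fixed credit pool."""
    window_us = credits * packet_time_us            # time to drain all credits
    return min(1.0, window_us / (rtt_us + packet_time_us))

# Assumed example: 2 kB packets at an 8 Gb/s payload rate -> ~2 us per packet.
packet_time_us = 2_000 * 8 / 8e9 * 1e6
for dist_km in (0.3, 10, 100):
    rtt_us = 2 * dist_km * 5.0                      # ~5 us/km one-way in fiber
    u = link_utilization(credits=16, packet_time_us=packet_time_us, rtt_us=rtt_us)
    print(f"{dist_km:>5} km: ~{u:.0%} utilization")
```

With these assumed numbers, utilization is ~100% at 300 m but collapses to a few percent at 100 km, which matches the throughput behavior shown on the next slide.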
InfiniBand Throughput vs. Distance
Only sufficient buffer-to-buffer credits (B2B credits), in conjunction with error-free optical transport, can ensure maximum InfiniBand performance over distance
Without additional B2B credits, throughput drops significantly after a few tens of meters; this is caused by an inability to keep the pipe full by restoring receive credits fast enough
Buffer credit size depends directly on the desired distance (see the sizing sketch below)
[Figure: Throughput vs. distance, without and with additional B2B credits.]
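As noted above, buffer credit size depends directly on distance; a rough bandwidth-delay-product sizing sketch follows (the 10 Gb/s line rate is from the slides, while the ~5 µs/km fiber propagation figure and the round-trip assumption are mine):

```python
# Rough sizing sketch: credit buffer needed to keep a 10 Gb/s link full over
# distance, assuming credits must cover one full round trip. Constants are
# illustrative assumptions, not vendor specifications.

LINE_RATE_BPS = 10e9           # 10G serial optical line rate
FIBER_DELAY_US_PER_KM = 5.0    # ~5 microseconds per km in standard fiber

def buffer_bytes_for_distance(km: float) -> float:
    """Bandwidth-delay product (round trip) in bytes for the given distance."""
    rtt_s = 2 * km * FIBER_DELAY_US_PER_KM * 1e-6
    return LINE_RATE_BPS * rtt_s / 8

for km in (0.3, 10, 50, 100):
    print(f"{km:>5} km -> ~{buffer_bytes_for_distance(km) / 1e3:.0f} kB of buffer credits")
```

At ~300 m this is only a few kilobytes, which on-chip buffers can cover; at 100 km it is on the order of a megabyte, which is why a dedicated reach extender with large external buffers is needed.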
InfiniBand-over-Distance Transport
Point-to-point
Typically <100 km, but can be extended to arbitrary distances
Low latency (dominated by fiber distance)
Transparent infrastructure (should support other protocols)
[Figure: Two data centers, each with a CPU/server cluster attached via IB HCAs to an IB switch fabric (IB SF), plus an FC SAN and a LAN, interconnected over a redundant 80 x 10G DWDM link; a gateway connects to the NREN at 10GbE…100GbE.]
IB SF – InfiniBand Switch Fabric
IB Transport Demonstrator Results
[Chart: SendRecV throughput (GB/s, 0.2 to 1.0) vs. message length (0 to 4000 kB) for link lengths of 0.4, 25.4, 50.4, 75.4, and 100.4 km.]
[Chart: SendRecV throughput (GB/s, 0.2 to 1.0) vs. distance (0 to 100 km) for message sizes of 32, 128, 512, and 4096 kB.]
N x 10G InfiniBand Transport over >50 km Distance demonstrated
ADVA FSP 3000 DWDM: up to 80 x 10 Gb/s transponders, <100 ns latency per transponder, max. reach 200/2000 km
Obsidian Campus C100: 4x SDR copper to serial 10G optical, 840 ns port-to-port latency, buffer credits for up to 100 km (test equipment ready for 50 km)
[Diagram: Demonstrator setup: a Campus C100 (B2B credits, SerDes) at each end, connected through FSP 3000 transponders over an 80 x 10G DWDM link.]
Solution Components

WCA-PC-10G WDM Transponder
Bit rates: 4.25 / 5.0 / 8.5 / 10.0 / 10.3 / 9.95 / 10.5 Gb/s
Applications: IBx1 DDR/QDR, IBx4 SDR, 10GbE WAN/LAN PHY, 4G-/8G-/10G-FC
Dispersion tolerance: up to 100 km w/o compensation
Wavelengths: DWDM (80 channels) and CWDM (4 channels)
Client port: 1 x XFP (850 nm MM, or 1310/1550 nm SM)
Latency: <100 ns

Campus C100 InfiniBand Reach Extender
Optical bit rate: 10.3 Gb/s (850 nm MM, 1310/1550 nm SM)
InfiniBand bit rate: 8 Gb/s (4x SDR v1.2 compliant port)
Buffer credit range: up to 100 km (depending on model)
InfiniBand node type: 2-port switch
Small-packet port-to-port latency: 840 ns
Packet forwarding rate: 20 Mp/s
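A back-of-envelope one-way latency budget using the figures quoted above (<100 ns per transponder, 840 ns per Campus C100); the assumption of one C100 and one transponder per end, and the ~5 µs/km fiber propagation figure, are mine:

```python
# Hedged sketch: one-way latency of the extended IB link. Device latencies are
# the values quoted on this slide; the per-end component count and the fiber
# propagation constant are assumptions for a single unprotected path.

FIBER_DELAY_US_PER_KM = 5.0    # ~5 us per km in standard single-mode fiber

def one_way_latency_us(distance_km: float) -> float:
    c100_us = 2 * 0.840         # one Campus C100 per end, 840 ns each
    transponder_us = 2 * 0.100  # one WCA-PC-10G per end, <100 ns each
    fiber_us = distance_km * FIBER_DELAY_US_PER_KM
    return c100_us + transponder_us + fiber_us

for km in (1.5, 50, 100):
    print(f"{km:>5} km -> ~{one_way_latency_us(km):.1f} us one-way")
```

Under these assumptions, fiber propagation dominates beyond a few kilometers: roughly 9 µs at 1.5 km and about 500 µs one-way at 100 km.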
Solution: 8x10G InfiniBand Transport

FSP 3000 DWDM System (~100 km, dual-ended)
Chassis, PSUs, Controllers: ~€10.000,-
10G DWDM Modules: ~€100.000,-
Optics (Filters, Amplifiers): ~€10.000,-
Sum (budgetary): ~€120.000,-

16 x Campus C100 (100 km): ~€300.000,-
System total (budgetary): ~€420.000,-
An Example…
NASA's largest supercomputer uses 16 Longbow C102 devices to span two buildings, 1.5 km apart, at a link speed of 80 Gb/s and a memory-to-memory latency of just 10 µs.
Thank you
IMPORTANT NOTICE
The content of this presentation is strictly confidential. ADVA Optical Networking is the exclusive owner or licensee of the content, material, and information in this presentation. Any reproduction, publication or reprint, in whole or in part, is strictly prohibited.
The information in this presentation may not be accurate, complete or up to date, and is provided without warranties or representations of any kind, either express or implied. ADVA Optical Networking shall not be responsible for and disclaims any liability for any loss or damages, including without limitation, direct, indirect, incidental, consequential and special damages, alleged to have been caused by or in connection with using and/or relying on the information contained in this presentation.
Copyright © for the entire content of this presentation: ADVA Optical Networking.