Improving Cluster Interconnect Reliability Otto Liebat April 14, 2008 Intel ® Connects Cables Technology


Improving Cluster Interconnect Reliability

Otto Liebat

April 14, 2008

Intel® Connects Cables Technology

Revision - 01

Legal Disclaimers

Copyright © 2008 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Other names and brands may be claimed as the property of their respective owners.


The Cluster Interconnect Problem

24 AWG Copper Cables

• Distance limited @DDR: 8-10 meters*
• Heavy: 1.2 kg for a 10 meter cable
• Bulky: blocks airflow, affects cooling

Copper cables limit HPC cluster design

*Depends on BER requirement


Intel® Connects Cables

What Are They?

[Diagram: Electrical Socket | Optical Transceiver in Plug | Optical Cable, up to 100 m | Optical Transceiver in Plug | Electrical Socket]

A Drop-In Replacement for Copper Cables

• Compatible: Use existing copper sockets
• Reliable: No user-accessible optical interface
• Low Cost: No separate optical transceivers

*Source: Intel internal testing


Intel® Connects Cables

High-Performance 20 Gbps Optical Cables

Longer, Faster, Lighter, Thinner, … and More Reliable


Intel® Connects Cables Technology

Highly Reliable: Real World Case Studies

Ohio State University

“One can now put switches and racks where they are needed, rather than within the typical 8 or 10-meter reach of traditional cables. This will let us significantly expand the size of InfiniBand clusters, and that is one of our primary goals”
- Dr. DK Panda, Head of the OSU Network-Based Computing Research Group

SCinet ’07

“The Intel cables were a clear success in our deployment. Every cable worked as intended, and that helped both the OpenFabrics effort and SCinet deliver on our promises.”
- Doug Fuller, Co-Team Lead, SCinet OpenFabrics Subcommittee, SC ’07


Simple Cable Latency for Intel® Connects Cables

• Optical/electrical (O/E) conversion = 0.275 nanoseconds at each end

• Propagation delay of light through the fiber = 4.99 nanoseconds per meter

• Latency of a 10 meter Intel Connects Cable (ns)
– First O/E conversion: 0.275
– Propagation through the fiber, 10 m × 4.99 ns/m: 49.9
– Second O/E conversion: 0.275
– Total: 50.45 ns

• Latency of a 100 meter Intel Connects Cable (ns)
– First O/E conversion: 0.275
– Propagation through the fiber, 100 m × 4.99 ns/m: 499.0
– Second O/E conversion: 0.275
– Total: 499.55 ns

Note: 1 ns = 1 × 10^-9 sec
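The arithmetic on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the two constants are the figures quoted above, while the function and constant names are illustrative and not part of any Intel tool.

```python
# Figures quoted on the slide; names are hypothetical, chosen for illustration.
OE_CONVERSION_NS = 0.275      # optical/electrical conversion delay at each end
FIBER_DELAY_NS_PER_M = 4.99   # propagation delay of light through the fiber


def cable_latency_ns(length_m: float) -> float:
    """Simple one-way cable latency: two O/E conversions plus fiber propagation."""
    return 2 * OE_CONVERSION_NS + length_m * FIBER_DELAY_NS_PER_M


if __name__ == "__main__":
    for length_m in (10, 100):
        print(f"{length_m:>3} m cable: {cable_latency_ns(length_m):.2f} ns")
```

The fixed conversion cost is paid once per cable regardless of length, so the per-meter propagation term dominates for anything longer than a few centimeters.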


Source: DK Panda, OSU


Effective Latency

Function of:

– Simple cable latency

– Bit error rate

– The time required to find and fix those bit errors
> Many things affect this:

• Other physical delays (e.g. passing through switches)

• Where the error is detected

• Whether the bit error is random or caused by a bad link

• Whether the data to be resent is in a buffer or has to be re-accessed from slower media

• The system and application tolerance for bit errors
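The slide gives no formula for effective latency, so the following is only a rough sketch of how the three inputs might combine. It assumes independent random bit errors, a single retransmission that always succeeds, and a fixed recovery penalty; the function names, packet size, and penalty value are all hypothetical.

```python
import math


def packet_error_probability(ber: float, packet_bits: int) -> float:
    """Probability that at least one bit in a packet is corrupted.

    Equals 1 - (1 - ber)**packet_bits, computed with log1p/expm1 so that
    very small BER values are not lost to floating-point rounding.
    """
    return -math.expm1(packet_bits * math.log1p(-ber))


def effective_latency_ns(cable_ns: float, ber: float,
                         packet_bits: int, recovery_ns: float) -> float:
    """Expected latency: cable latency plus the expected recovery cost.

    Assumption (not from the slides): each corrupted packet costs a fixed
    recovery_ns to detect and resend, and the resend succeeds.
    """
    return cable_ns + packet_error_probability(ber, packet_bits) * recovery_ns


if __name__ == "__main__":
    packet_bits = 2048 * 8   # hypothetical 2 KB packet
    recovery_ns = 10_000.0   # hypothetical 10 microsecond recovery penalty
    for ber in (1e-12, 1e-15):
        total = effective_latency_ns(50.45, ber, packet_bits, recovery_ns)
        print(f"BER {ber:.0e}: {total:.6f} ns expected")
```

In practice the recovery penalty varies with the factors listed above (where the error is detected, whether the data is still buffered), so a single constant is a simplification.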


Intel® Connects Cables

Better Quality: At 10 m ... at 100 m

[Eye diagrams, four panels: 1 Meter Intel® Connects Cable; 10 Meter Intel® Connects Cable; 100 Meter Intel® Connects Cable; 5 Meter 24 AWG Copper Cable]

Superior signal quality from 1 to 100 meters

Source: Tektronix Lab Evaluation


Intel® Connects Cables

Actual Bit Error Rates May Even Be Lower*

[BER plots, two panels: 10 Meter Intel® Connects Cable**; 100 Meter Intel® Connects Cable**; plot label: 10^-25]

Extremely low BER for high HPC compute fabric stability

*Note: Specified BER for Intel® Connects Cables is 10^-15
**Source: Tektronix Lab Evaluation


Intel® Connects Cables

More Reliable: 10^-15 BER

                                    10^-12 BER @ 20 Gbps    10^-15 BER @ 20 Gbps
Errors per day for a single link    1,728                   1.7
Errors per day for 1000 links       1,728,000               1,728

[Chart: Bit Error Rate at 20 Gbps per link. Errors per day for 1000 links: Intel® Connects Cable, 1,728; 10^-12 interconnects, 1,728,000]

1000 times lower BER than interconnects at 10^-12

*Source: Intel
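The errors-per-day figures follow directly from the line rate and the BER. A minimal sketch, assuming every link carries traffic at the full 20 Gbps for all 86,400 seconds of the day (the function and constant names are illustrative):

```python
SECONDS_PER_DAY = 86_400
LINE_RATE_BPS = 20e9  # 20 Gbps per link, as quoted on the slide


def errors_per_day(ber: float, links: int = 1,
                   line_rate_bps: float = LINE_RATE_BPS) -> float:
    """Expected bit errors per day across `links` fully loaded links."""
    bits_per_day = line_rate_bps * SECONDS_PER_DAY  # 1.728e15 bits per link
    return ber * bits_per_day * links


if __name__ == "__main__":
    for ber in (1e-12, 1e-15):
        for links in (1, 1000):
            print(f"BER {ber:.0e}, {links:>4} link(s): "
                  f"{errors_per_day(ber, links):,.1f} errors/day")
```

Links that are idle part of the day would see proportionally fewer errors, so these numbers are an upper bound for a given BER.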


Intel® Connects Cables Technology

Highly Reliable: Real World Case Studies

Naval Research Laboratory

“Almost all our reliability problems went away when we went with the Intel optical cables.”
- Henry D. Dardy, Ph.D., Chief Scientist, Center for Computational Science, Naval Research Laboratory

Computational Research Laboratories

“We had to overcome significant reliability issues, but virtually all our reliability problems went away when we went with the Intel optical cables.”
- Ashrut Ambastha, EKA Team Member, Computational Research Laboratories


Intel® Connects Cables

High-Performance 20 Gbps Optical Cables

Enable Large Cluster Scale Out

• High data rate: 20 Gbps per cable*
• InfiniBand or 10 GbE: CX4 connector
• Long distance: Up to 100 meters
• Low bit error rate: 10^-15 BER
• Low conversion latency: 550 picoseconds**

Reduced Installation, Maintenance

• Less weight: 84% lighter
• Less volume, better airflow: 83% smaller
• Smaller bend radius: 40% less
• Low electromagnetic interference
• No ground loops

*Source: All claims based on Intel internal testing
**Per pair of connectors


Intel® Connects Cables Technology

40 Gbps Ready

Shown with Mellanox ConnectX* 40 Gbps InfiniBand* HCA

*Other names and brands may be claimed as the property of their respective owners.


Intel® Connects Cables

Longer, Faster, Lighter, Thinner, … and More Reliable

For more information visit www.intelconnects.com*