Handling High-Bandwidth Applications in a Modern DC design Alexis Dacquay ([email protected]) Arista

PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in a modern DC design


DESCRIPTION

Alexis Dacquay is a CCIE with over 10 years of experience in the networking industry. He has designed, deployed, and supported large corporate LAN/WAN networks. For the last 4 years he has specialised in high-performance datacenter networking to satisfy the needs of cloud providers, web 2.0, big data, HPC, HFT, and any other enterprise for which a high-performing network is critical to the business. Originally from Bretagne, privately a huge fan of Polish cuisine.

Topic of Presentation: Handling high-bandwidth-consumption applications in a modern DC design

Language: English

Abstract: A modern Data Centre requires proper handling of high-bandwidth-consuming applications such as Big Data or IP Storage. To achieve this, next-generation Ethernet speeds of 25, 50 and 100Gbps are being pursued. We will show _why_ these new Ethernet speeds are vital from a technology standpoint and _how_ networking hardware can cope with these new requirements. We will share Ethernet switch design considerations, with the biggest emphasis on the importance of big buffers and how they accommodate bursty traffic. Throughout the presentation we will also elaborate on the evolution of a variety of modern applications, and how they can be handled with properly designed hardware, software, and the Data Centre itself.


Page 1: PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in a modern DC design

Handling High-Bandwidth Applications in a Modern DC design

Alexis Dacquay ([email protected]) Arista

Page 2

BANDWIDTH-HUNGRY APPLICATIONS

Page 3

Drivers for bandwidth increase

✓  Application Clustering

✓  High Density non-blocking scale

✓  ECMP to provide scale and fault tolerance

✓  IP Storage / Big Data and Hadoop

✓  2 tier active / active with low oversubscription ratio

✓  Dual home or single home server

✓  Distributed traffic, Mesh, anything-anywhere

✓  Fan-in, Fan-out

✓  Virtualized Cloud – Scale

✓  VXLAN with Equal Cost Multipathing

Page 4

•  39.5 Gb/s utilization per 40G Ethernet link, on all 8 simultaneously (=316Gbps)

•  30 GB/s GPFS aggregate throughput, with some disk drawers still unpopulated

•  Low latency, large buffers. Highest performance without tuning on the network

Storage – Arista European customer case: 40G Ethernet storage

[Diagram labels: 6PB Storage (GPFS); 8 x 40G Ethernet; High density 10/40/100G Ethernet; 1000+ compute; Compute Workstations; Users; Replication]
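The link figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below only restates the slide's numbers; the function names are illustrative, and the bits-to-bytes conversion ignores protocol overhead, which is a simplifying assumption.

```python
# Figures quoted on the slide.
LINKS = 8
LINK_SPEED_GBPS = 40.0
MEASURED_PER_LINK_GBPS = 39.5


def aggregate_gbps(per_link_gbps: float, links: int) -> float:
    """Total throughput across all links, in Gb/s."""
    return per_link_gbps * links


def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


aggregate = aggregate_gbps(MEASURED_PER_LINK_GBPS, LINKS)
link_utilisation = MEASURED_PER_LINK_GBPS / LINK_SPEED_GBPS

print(f"aggregate: {aggregate:.0f} Gb/s")
print(f"per-link utilisation: {link_utilisation:.2%}")
print(f"aggregate in bytes: {gbps_to_gbytes_per_s(aggregate):.1f} GB/s")
```

The aggregate matches the 316Gbps quoted on the slide, and each 40G link is running within about 1.25% of line rate.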

Page 5

[Figure: Non stressful traffic. One plot shows packet segments over time (0 to 10) for Eth1, Eth2, Eth3 and Eth 4 against the buffer limit (axis 0 to 2000). A second plot shows utilisation (%) over time (ms), with average throughput, current throughput and buffer usage.]

Page 6

Buffering Visibility with LANZ (trigger-based)

•  Offers visibility of µburst

•  Impact of congestion on latency, drops

•  Trigger-based

•  Guaranteed visibility (vs Polls)

•  Configurable high/low threshold

[Diagram: ports Eth1, Eth2, Eth3 and Eth8, Eth9, Eth10, with the LANZ Agent in EOS. Packet buffering on the Eth8 queue due to a temporary µburst from Eth1 and Eth2. A congestion event is triggered by an over-threshold event; the high threshold and low threshold are marked.]

Arista 7150S#show queue-monitor length drops
Report generated at 2013-01-16 20:48:09
Time                     Interface    TX Drops
-----------------------------------------------------------------
0:02:32.18999 ago        Et46         32755054
0:02:35.29710 ago        Et46         53552534
0:02:40.29720 ago        Et46         53552633
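The high/low threshold idea can be sketched as a small hysteresis detector. This is not how LANZ is implemented; it is a minimal illustration under assumed names (`ThresholdMonitor`, `congestion_events`) and made-up queue depths, showing why two thresholds give one clean start/end pair per congestion episode instead of a flurry of events around a single threshold.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class ThresholdMonitor:
    """Hypothetical trigger: start above `high`, end below `low`."""
    high: int
    low: int
    congested: bool = False

    def observe(self, depth: int) -> Optional[str]:
        # Enter congestion only when the queue rises above the high threshold.
        if not self.congested and depth > self.high:
            self.congested = True
            return "start"
        # Leave congestion only when the queue falls below the low threshold.
        if self.congested and depth < self.low:
            self.congested = False
            return "end"
        return None


def congestion_events(
    samples: Iterable[Tuple[int, int]], high: int, low: int
) -> List[Tuple[int, str, int]]:
    """Return (time, kind, depth) for every threshold crossing."""
    monitor = ThresholdMonitor(high=high, low=low)
    events = []
    for t, depth in samples:
        kind = monitor.observe(depth)
        if kind is not None:
            events.append((t, kind, depth))
    return events


# Made-up queue depths (in segments), one sample per time step.
depths = [0, 10, 120, 300, 280, 90, 40, 5, 0, 250, 20]
events = congestion_events(enumerate(depths), high=200, low=50)
for event in events:
    print(event)
```

Note that the dip to 90 segments does not end the first episode, because it is still above the low threshold.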

Page 7

What Causes Congestion?
Buffer starvation → TCP collapse

•  Oversubscribed networks with bursts > available bandwidth

•  Multiple nodes trying to read/write to one node (e.g. Storage)

•  Lack of buffers means drops, which result in lower goodput

[Diagram labels: Client; Switch; Storage Servers; Data Block split into units 1, 2, 3, 4; SRU (Server Request Unit)]
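The many-to-one pattern described above can be illustrated with a toy discrete-time queue. This is a sketch under strong assumptions: packets are all the same size, every sender bursts at the same rate as the egress port, and there is no TCP behaviour (no retransmission, no backoff). The function name and parameters are hypothetical.

```python
def simulate_incast(senders: int, burst_packets: int, buffer_packets: int,
                    sender_rate: int = 1, drain_rate: int = 1) -> dict:
    """Toy model of several senders bursting into one egress port.

    Each tick, every sender with data left offers `sender_rate` packets,
    the port accepts what fits in the buffer and drops the rest, then
    transmits `drain_rate` packets.
    """
    if drain_rate < 1:
        raise ValueError("drain_rate must be at least 1")
    remaining = [burst_packets] * senders
    queue = delivered = dropped = peak_queue = 0
    while any(remaining) or queue:
        arrivals = 0
        for i in range(senders):
            sent = min(sender_rate, remaining[i])
            remaining[i] -= sent
            arrivals += sent
        accepted = min(arrivals, buffer_packets - queue)
        dropped += arrivals - accepted       # tail drop when the buffer is full
        queue += accepted
        peak_queue = max(peak_queue, queue)
        transmitted = min(drain_rate, queue)
        queue -= transmitted
        delivered += transmitted
    return {"delivered": delivered, "dropped": dropped, "peak_queue": peak_queue}


# Four storage servers answering one client, 100 packets each.
shallow = simulate_incast(senders=4, burst_packets=100, buffer_packets=32)
deep = simulate_incast(senders=4, burst_packets=100, buffer_packets=400)
print("shallow buffer:", shallow)
print("deep buffer:   ", deep)
```

In this model the shallow buffer fills within a few ticks and drops most of the burst, while the deep buffer absorbs it and drains afterwards. With real TCP, the drops would additionally trigger backoff and retransmission, which is what reduces goodput.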

Page 8

[Figure: Bursty traffic on shallow buffer. One plot shows packet segments over time (0 to 10) for Eth1, Eth2, Eth3 and Eth 4 against the buffer limit (axis 0 to 2000). A second plot shows utilisation (%) over time (ms), with average throughput, current throughput and buffer usage.]

Page 9

Why are deep buffers required?

[Figure: bandwidth utilization (up to 100%) over time, annotated with packet loss, backoff and slow start, and window increasing.]

Greater than 3 second screen paint time will cause you to lose 43% of your customers! (Akamai report on page response time)

[Figure labels: 3 second response time; 5 second response time]

Page 10

Deep Buffers Matter! … Fairness

•  20 node test with 10 flows per node (200 flows)

•  Two tests:

    •  4MB buffer

    •  256MB buffer

Results:

•  Complete fairness with large buffers

•  Small buffers caused confusing flow transmission rates
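One common way to put a number on fairness across flows is Jain's fairness index, which is 1.0 when every flow gets the same rate and falls towards 1/n when a single flow takes everything. The slide does not say which metric the test used, so treat this as an illustrative sketch only; the per-flow rates below are invented, not the test results.

```python
from typing import Sequence


def jain_fairness_index(rates: Sequence[float]) -> float:
    """Jain's index: (sum of rates)^2 / (n * sum of squared rates)."""
    if not rates:
        raise ValueError("at least one flow rate is required")
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        return 1.0  # all flows equally idle
    return (total * total) / (len(rates) * squares)


# Invented per-flow rates in Mb/s for 200 flows (not measured data).
even_rates = [50.0] * 200
skewed_rates = [1000.0] + [0.0] * 199

fair = jain_fairness_index(even_rates)
unfair = jain_fairness_index(skewed_rates)
print(f"even split:   {fair:.3f}")
print(f"one flow hog: {unfair:.3f}")
```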

Page 11

[Figure: Bursty traffic on deep buffer. One plot shows packet segments over time (0 to 10) for Eth1, Eth2, Eth3 and Eth 4 against the buffer limit (axis 0 to 2000). A second plot shows utilisation (%) over time (ms), with average throughput, current throughput and buffer usage.]

Page 12

BUFFERING: HOW MUCH IS NEEDED ?

Page 13

Deep Buffers Matter! … Hadoop Test

[Chart: Packets dropped per TeraGen (axis 0 to 1,000,000) for three configurations: 1MB at 4:1, 1MB at 5.33:1, and 48MB at 5.33:1. Callout: "Zero!". Side label: 1k TCP slow start/sec.]

[Diagram: two groups of 16 hosts x 10G. Uplinks: 4x10G ⇒ 4:1; 3x10G ⇒ 5.33:1]
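The two oversubscription ratios in the test setup follow directly from the host and uplink counts on the slide. A minimal sketch of that arithmetic, with hypothetical function names:

```python
from fractions import Fraction


def oversubscription_ratio(hosts: int, host_gbps: int,
                           uplinks: int, uplink_gbps: int) -> Fraction:
    """Downstream (host-facing) bandwidth divided by uplink bandwidth."""
    return Fraction(hosts * host_gbps, uplinks * uplink_gbps)


def format_ratio(ratio: Fraction) -> str:
    """Render a ratio in the 'N:1' form used on the slide."""
    return f"{float(ratio):.2f}:1"


four_uplinks = oversubscription_ratio(hosts=16, host_gbps=10, uplinks=4, uplink_gbps=10)
three_uplinks = oversubscription_ratio(hosts=16, host_gbps=10, uplinks=3, uplink_gbps=10)

print("16 x 10G hosts over 4 x 10G uplinks:", format_ratio(four_uplinks))
print("16 x 10G hosts over 3 x 10G uplinks:", format_ratio(three_uplinks))
```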

Page 14

Buffer Impact to High Performance

•  Use Cases:

✓  Optimizing multi-speed: 40→10G, 100G→10,40G, 10G→1G

✓  Improving uplink contention in mixed speed networks

✓  High Density in core/spine (many-to-one, in-cast, fan-in)

[Chart: goodput versus oversubscription, comparing shallow buffer and deep buffer]

Page 15

Buffer Utilization per Port – High Perf Networking

[Chart: buffer consumed (MB) per port (axis 0 to 130) by percentile (2nd to 98th). Annotations: Trident+ ASIC, 9MB per 64 ports; Arista 7500E, 125MB per 10G.]

Page 16

How much Buffer Memory do you need ?

Customer Real Buffer Utilization Observations (Real World Data)

Customer / workload                                      Max Buffer Used per Port
HPC Storage Cluster – Medium                             33 MB
Animation Storage Filer (NFS)                            6.2 MB
Software vendor Engineering Build Servers (Perforce)     14.9 MB
Online shopping Hadoop 2K servers – Big Data             52.3 MB
Educational Enterprise Data Center (Virtualization)      52.4 MB

NS3 network simulations match real-world data showing the TCP incast issue: a large number of TCP flows creates microburst congestion
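The observed per-port maxima can be compared against the buffer figures quoted on the previous slide (9MB shared across 64 ports, and 125MB per 10G port). The sketch below is a rough comparison only: treating the shared 9MB pool either as an even per-port share or as entirely available to one port are both simplifying assumptions, since real shared-buffer allocation is dynamic. The dictionary keys are shortened labels for the table rows.

```python
# Max buffer used per port, in MB, as listed in the table.
OBSERVED_MAX_MB = {
    "HPC storage cluster (medium)": 33.0,
    "Animation storage filer (NFS)": 6.2,
    "Software vendor build servers (Perforce)": 14.9,
    "Online shopping Hadoop, 2K servers": 52.3,
    "Educational enterprise DC (virtualization)": 52.4,
}

SHARED_POOL_MB = 9.0      # whole shared pool, quoted per 64 ports
SHARED_POOL_PORTS = 64
DEEP_BUFFER_MB = 125.0    # quoted per 10G port


def workloads_exceeding(budget_mb: float) -> list:
    """Names of workloads whose observed peak is above the given budget."""
    return sorted(name for name, used in OBSERVED_MAX_MB.items() if used > budget_mb)


even_share_mb = SHARED_POOL_MB / SHARED_POOL_PORTS

print(f"even share of shared pool: {even_share_mb:.3f} MB per port")
print("exceed even share:  ", len(workloads_exceeding(even_share_mb)))
print("exceed whole pool:  ", len(workloads_exceeding(SHARED_POOL_MB)))
print("exceed 125MB budget:", len(workloads_exceeding(DEEP_BUFFER_MB)))
```

Even if one port could claim the whole 9MB pool, four of the five observed peaks would not fit, while all five fit inside 125MB.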

Page 17

LANZ Revolutionizes Network Visibility

Precision analysis of queues, ports and buffers + congestion capture!

[Diagram: a network of 7150S switches and NMS applications. Labels: Switch detects potential congestion; NMS identifies hotspots; Application reacts to conditions.]

Page 18

How to catch microbursts?
LANZ – Trigger-based vs Polling

[Figure: SNMP polling rate (1/sec) with two poll markers. Label: "Average utilization based on 1 second polling:"; axis tick 0%.]

At 10Gbps 1 Second = ~30 Million Packets!

•  Microbursts occur over very short periods, microseconds or even nanoseconds, so they are undetectable using standard polling methods.

•  LANZ on the 7150 is event-driven and offers real-time visibility of microbursts.
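Two quick calculations help show why one-second polling misses microbursts. The first computes the maximum frame rate of a 10Gbps link for minimum-size frames; this gives roughly 14.88 million frames per second in one direction, so the slide's "~30 million" is consistent with counting both directions of a full-duplex link (that reading is an assumption, the slide does not state its derivation). The second averages a made-up utilisation trace in which a 2 ms burst at 100% sits inside an otherwise quiet second.

```python
def max_frames_per_second(link_bps: float, frame_bytes: int = 64,
                          overhead_bytes: int = 20) -> float:
    """Line-rate frame count; overhead is preamble (8 bytes) plus inter-frame gap (12 bytes)."""
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


def polled_average(samples: list) -> float:
    """What a poller sees: one average over the whole polling interval."""
    return sum(samples) / len(samples)


one_way = max_frames_per_second(10e9)
print(f"64-byte frames at 10Gbps: {one_way / 1e6:.2f} M/s one way, "
      f"{2 * one_way / 1e6:.2f} M/s both ways")

# Invented trace: 1000 samples of 1 ms each, 5% background utilisation,
# with a 2 ms microburst at 100% in the middle.
utilisation = [5.0] * 1000
utilisation[500:502] = [100.0, 100.0]

average = polled_average(utilisation)
peak = max(utilisation)
print(f"1 s polled average: {average:.2f}%  peak: {peak:.0f}%")
```

The one-second average barely moves above the background level even though the link was saturated for 2 ms, which is the gap that an event-driven trigger is meant to close.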

Page 19

100G

Page 20

Cloud Data Center 100G Requirements

Customers will only deploy 100G Ethernet in volume once it is cost-competitive, i.e. when 100 GigE costs less than or equal to 10 x 10 GigE

Increasing port density choice and transceiver distance will accelerate 100G adoption

Page 21

100G Deployment in Data Center

Use cases shown on the slide:

•  100G Rack to DC Spine

•  100G Any-scale Pods

•  100G at the PoP

•  100G to the ToR

•  Interconnecting Data Centers & POPs (10km)

Callouts:

•  High Performance Storage

•  Mix 1km and 10km

•  Broadest choice of 100G ports

•  Highest Density for DC Spines

•  Mix and match 40G and 100G

•  Long Distances

•  Smaller Footprint Option

•  IEEE LR4 and SR10 Optical interconnect

•  Metro, Core, Edge Routers

•  Mix and Match SM and MM

•  Leaf and Spine Mix 10/40/100G

•  400m reaches

•  Investment Protection

•  10G to 40G Server Transition

•  Server/Storage Expansion

•  Scale Built Spine

[Diagram labels: Data Center; Data Center; Small DC or PoP; link speeds 1/10G, 10G, 40G, 100G; Up to 400m]

Page 22

The Foundation for Virtualized Clouds
Arista 7500E Series

Highest Density 10/40/100GbE Switch

•  Architected to operate at massive network scale

•  Designed and Optimized for Virtualization and Cloud

•  Energy Efficient

•  1,152 10GbE / 288 40GbE / 96 100GbE

•  30 Tbps

Page 23

Pay As You Grow 100G Deployment Flexibility

7500E-12CM-LC

•  Cost Effective MXP

•  Integrated Triple Speed SR10 Optics 10/40/100G

7500E-6C2-LC

•  Flexible short and long reach CFP2

•  LR4 - 100GbE over 10km / SR10 - 300m

7500E-12CQ-LC

•  High Density QSFP-100G

•  Broad 10/40/100G QSFP Optics

Dense 100/40/10G • Deep Buffers • Feature Parity • Investment Protection

Page 24

Wire Speed 10/40/100G with Deep Buffers
7280SE Fixed Configuration Switches

•  900 Million Packets per Second

•  1.44 Terabits per second

•  Less than 4usec Latency

•  Ultra deep 9GB packet buffers

•  VOQ architecture for lossless forwarding

•  Wire speed L2 and L3 forwarding

•  40G and 100G uplinks for HPC and CDN

•  Leaf and Spine 40/100G ECMP and MLAG

•  Integrated SSD for local traffic analysis

•  Reversible airflow and AC / DC Power Options

Page 25

Flexible Optics: 100G CFP2 & QSFP

QSFP100

•  Smallest form factor transceiver for 40/100GbE

•  Support for IEEE 100G standards – SR, CR, LR

•  Interoperable with IEEE compliant 40G and 100G optics

•  Power efficient with only 3.5W/port

•  Low power and size allows for high 100G density

Highest density, lowest cost  

CFP2

•  Hot pluggable transceiver for 100GbE

•  Full support for IEEE 100G standards – SR, CR, LR, ER

•  Interoperable with IEEE compliant 100G optics

•  Half the size of CFP – allows higher density

•  Lower power consumption than CFP reducing concerns on optic cooling

Broad MM and SM choice

Page 26

7500E-12CQ Direct Data Center Interconnect

Use Case: Long Distance Single Mode

[Diagram: Interconnecting Data Centers & POPs. Labels: Data Center; Data Center; Small DC or PoP; 10km]

•  Up to 10km reach over Single Mode Fiber

•  Connect to optical transport and core routers

•  IEEE Standards for multi-vendor interoperability

•  Broad range of pluggable CFP2 optics

•  Lowest cost solution for cross-site 100GbE

7280SE-68 Small Data Center/PoP Interconnect

•  QSFP100 drives up to 10km distance

•  Provide up to 2x100G bandwidth

•  1RU form factor ensures minimal space and very low power requirements

Page 27

SPEED UP : 25G AND 50G ETHERNET

Page 28

25G and 50G Ethernet Consortium

•  Founded by Arista, Broadcom, Google, Mellanox & Microsoft

•  25gEthernet.org consortium website

•  An open specification for the new speeds

•  Consortium open to everyone in the industry

Page 29

Cloud Applications that drive bandwidth

25G

•  Compute/BigData that needs lowest cost per Gbps

•  Servers can push more than 10Gbps, but customers are not willing to pay a premium

•  Need same port density as 10G

50G

•  IP Storage

•  2x25G is most cost effective

•  Higher port density than 40G, so single Leaf switch sufficient

•  Easier to scale on NICs too

Arista is leading the industry here

25G and 50G support needed in silicon

Products expected in the next 18 to 36 months on both switches and NICs

Page 30

Why do we need another speed?

[Diagram: lanes per speed. 1G; 10G; 40G: 4x10G; 100G: 4x25G]

•  1G and 10G use single lanes (1 pair)

•  40G and 100G use parallel lanes (4 pairs)

•  40G and 100G ports need more SerDes, consume more power and reduce port density

•  The Cloud needs to hit the sweet-spot of lowest price per Gigabit vs optimal performance

Page 31

25G and 50G Ethernet

[Diagram: lanes per speed. 25G; 50G: 2x25G]

•  25G is a single lane specification, just like 10G

•  Leverages IEEE 802.3 Ethernet framing

•  Offers 2.5X the speed at a cost structure closer to 10G

•  Same port density & connectors as 10G SFP+

•  50G is dual-lane

•  Offers 1.25X the speed of 40G

•  Cost structure is closer to 2X of 10G

•  2X the port density of 40G using splitter cables from QSFP
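The speed and density claims above follow from the lane counts. The sketch below models each speed as a number of lanes times a per-lane rate, using the lane counts given in the slides; the class and function names are illustrative, and the "ports per four lanes" helper assumes a four-lane QSFP port broken out with a splitter cable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EthernetSpeed:
    name: str
    lanes: int
    lane_gbps: float

    @property
    def total_gbps(self) -> float:
        return self.lanes * self.lane_gbps


SPEEDS = {s.name: s for s in (
    EthernetSpeed("10G", lanes=1, lane_gbps=10),
    EthernetSpeed("25G", lanes=1, lane_gbps=25),
    EthernetSpeed("40G", lanes=4, lane_gbps=10),
    EthernetSpeed("50G", lanes=2, lane_gbps=25),
    EthernetSpeed("100G", lanes=4, lane_gbps=25),
)}


def speedup(new: str, old: str) -> float:
    """How many times faster `new` is than `old`."""
    return SPEEDS[new].total_gbps / SPEEDS[old].total_gbps


def ports_per_four_lanes(name: str) -> int:
    """Ports of this speed that fit in one four-lane (QSFP-style) port."""
    return 4 // SPEEDS[name].lanes


print("25G vs 10G:", speedup("25G", "10G"))
print("50G vs 40G:", speedup("50G", "40G"))
print("40G ports per four lanes:", ports_per_four_lanes("40G"))
print("50G ports per four lanes:", ports_per_four_lanes("50G"))
```

A single 25G lane gives 2.5X the speed of 10G with the same lane count, and two 50G ports fit in the four lanes that one 40G port consumes.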

Page 32

The Sweet Spots

[Chart: Price per Gbps (axis 0 to 120) for 1G, 10G, 25G, 40G, 50G and 100G]

Page 33

Thank You