Alexis Dacquay is a CCIE with over 10 years of experience in the networking industry. He has designed, deployed and supported large corporate LAN/WAN networks, and for the last 4 years has specialised in high-performance data centre networking to satisfy the needs of cloud providers, web 2.0, big data, HPC, HFT, and any other enterprise for which a high-performing network is critical to the business. Originally from Bretagne; privately a huge fan of Polish cuisine.
Topic of presentation: Handling high-bandwidth-consumption applications in a modern DC design
Language: English
Abstract: A modern data centre requires proper handling of high-bandwidth-consuming applications such as big data or IP storage. To achieve this, next-generation Ethernet speeds of 25, 50 and 100 Gbps are being pursued. We will show why these new Ethernet speeds are vital from a technology standpoint and how networking hardware can cope with these new requirements. We will share Ethernet switch design considerations, with the biggest emphasis on the importance of big buffers and how they accommodate bursty traffic. Throughout the presentation we will additionally elaborate on the evolution of a variety of modern applications, and how we can handle them with properly designed hardware, software, and the data centre itself.
Handling High-Bandwidth Applications in a Modern DC design
Alexis Dacquay ([email protected]) Arista
BANDWIDTH-HUNGRY APPLICATIONS
Drivers for bandwidth increase:
• Application clustering
• High-density non-blocking scale
• ECMP to provide scale and fault tolerance (see the sketch after this list)
• IP storage / big data and Hadoop
• 2-tier active/active designs with low oversubscription ratios
• Dual-homed or single-homed servers
• Distributed traffic, mesh, anything-anywhere
• Fan-in, fan-out
• Virtualized cloud scale
• VXLAN with equal-cost multipathing
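Several of these drivers rest on ECMP hashing, which pins each flow to one of several equal-cost paths while spreading the flow population across all of them. A minimal Python sketch of the idea, with a made-up hash and spine names (illustrative only, not Arista's implementation):

    import hashlib

    def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
        # Hash the flow 5-tuple so every packet of a flow picks the same
        # path (no reordering) while many flows spread across all paths.
        key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
        digest = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
        return next_hops[digest % len(next_hops)]

    spines = ["spine1", "spine2", "spine3", "spine4"]  # hypothetical fabric
    print(ecmp_next_hop("10.0.0.1", "10.0.1.9", 49152, 2049, 6, spines))

This is also why ECMP gives fault tolerance: when a path fails, only the flows hashed to it move to the surviving paths.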
Storage: an Arista European customer case (40G Ethernet storage)
• 39.5 Gb/s utilization per 40G Ethernet link, on all 8 links simultaneously (= 316 Gbps)
• 30 GB/s GPFS aggregate throughput, with some disk drawers still unpopulated
• Low latency, large buffers: highest performance without tuning on the network
[Diagram: 6 PB of GPFS storage attached over 8 x 40G Ethernet to a high-density 10/40/100G Ethernet fabric, serving 1000+ compute nodes, user workstations, and replication traffic]
[Charts: non-stressful traffic: packet segments queued per interface (Eth1-Eth4) over time stay well under the buffer limit; a second plot of utilisation (%) over time (ms) tracks average throughput, buffer usage and current throughput]
Buffering Visibility with LANZ (trigger-based)
• Offers visibility of µbursts
• Shows the impact of congestion on latency and drops
• Trigger-based: guaranteed visibility (vs. polling)
• Configurable high/low thresholds
[Diagram: a congestion event is triggered when packet buffering on the Eth8 queue, caused by a temporary µburst from Eth1 and Eth2, crosses the high threshold; the event closes when the queue falls back below the low threshold]
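The high/low threshold pair shown above behaves as hysteresis: an event opens when the queue depth crosses the high threshold and closes only once it falls back below the low one, so a hovering queue produces one event rather than a flood. A minimal Python sketch of that trigger logic (the sample values and thresholds are invented for illustration):

    def congestion_events(queue_depths, high, low):
        # Yield (start, end) sample indices of over-threshold events. An event
        # opens when depth >= high and closes when depth < low, so a queue
        # hovering between the thresholds produces one event, not a flood.
        start = None
        for i, depth in enumerate(queue_depths):
            if start is None and depth >= high:
                start = i
            elif start is not None and depth < low:
                yield (start, i)
                start = None
        if start is not None:
            yield (start, len(queue_depths))

    samples = [10, 80, 950, 1200, 900, 600, 300, 90, 40]  # queue depth per sample
    print(list(congestion_events(samples, high=1000, low=100)))  # [(3, 7)]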
EOS LANZ agent output:

    Arista 7150S#show queue-monitor length drops
    Report generated at 2013-01-16 20:48:09
    Time                 Interface   TX Drops
    -----------------------------------------------------------------
    0:02:32.18999 ago    Et46        32755054
    0:02:35.29710 ago    Et46        53552534
    0:02:40.29720 ago    Et46        53552633
What causes congestion? Buffer starvation → TCP collapse
• Oversubscribed networks with bursts greater than the available bandwidth
• Multiple nodes trying to read/write to one node (e.g. storage)
• Lack of buffers means drops, which result in lower goodput
[Diagram: TCP incast: a client requests a data block through the switch, and multiple storage servers (1-4) answer simultaneously, each returning its Server Request Unit (SRU) and converging on one switch port]
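Putting rough numbers on the incast pattern: if N servers answer at line rate simultaneously, the egress port drains only about one SRU while the burst arrives, so roughly (N - 1) SRUs must queue. A back-of-the-envelope Python sketch (server count, SRU size and buffer size are hypothetical):

    def incast_buffer_demand(servers, sru_bytes, buffer_bytes):
        # All servers answer at line rate simultaneously. Each SRU takes the
        # same wire time, so while the burst arrives the egress port drains
        # roughly one SRU; the other (N - 1) SRUs must queue.
        need = (servers - 1) * sru_bytes          # bytes that want buffering
        dropped = max(0, need - buffer_bytes)
        return need, dropped

    # Hypothetical: 32 storage servers, 256 kB SRUs, 1 MB shallow buffer
    need, dropped = incast_buffer_demand(32, 256_000, 1_000_000)
    print(f"queue demand ~{need / 1e6:.2f} MB, dropped ~{dropped / 1e6:.2f} MB")

With a 1 MB shallow buffer, nearly 7 MB of a synchronized reply burst is dropped, and the lost SRUs stall the whole block read: exactly the goodput collapse described above.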
[Charts: bursty traffic on a shallow buffer: packet segments queued per interface (Eth1-Eth4) repeatedly hit the buffer limit; a second plot of utilisation (%) over time (ms) tracks average throughput, buffer usage and current throughput]
Why are deep buffers required?
[Chart: the TCP sawtooth: bandwidth utilization over time oscillates below 100% as packet loss triggers backoff and slow start before the window increases again]
A greater-than-3-second screen paint time will cause you to lose 43% of your customers (Akamai report on page response time, comparing 3-second and 5-second response times).
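The sawtooth can be reproduced with a toy AIMD window model: grow the window each RTT until a loss, back off, and climb again. A deeper buffer raises the loss point, so utilization stays closer to 100%. A minimal Python sketch (a toy model, not a faithful TCP implementation):

    def sawtooth(rtts, capacity_pkts):
        # Toy AIMD window: +1 packet per RTT while under the path capacity
        # (link plus buffer); halved when capacity is exceeded and loss occurs.
        cwnd, trace = 1.0, []
        for _ in range(rtts):
            trace.append(min(cwnd, capacity_pkts))   # utilization proxy
            if cwnd > capacity_pkts:                 # loss: multiplicative backoff
                cwnd /= 2
            else:                                    # additive increase
                cwnd += 1
        return trace

    # A deeper buffer raises capacity_pkts, so post-loss troughs stay closer
    # to link rate and average utilization rises.
    print([round(x) for x in sawtooth(rtts=20, capacity_pkts=10)])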
Deep Buffers Matter! …Fairness
• 20-node test with 10 flows per node (200 flows)
• Two tests: a 4 MB buffer and a 256 MB buffer
Results:
• Complete fairness with large buffers
• Small buffers caused erratic flow transmission rates
[Charts: bursty traffic on a deep buffer: packet segments queued per interface (Eth1-Eth4) are absorbed below the buffer limit; a second plot of utilisation (%) over time (ms) tracks average throughput, buffer usage and current throughput]
BUFFERING: HOW MUCH IS NEEDED?
Deep Buffers Matter! …Hadoop Test
[Chart: packets dropped per TeraGen run (y-axis 0 to 1,000,000) for three configurations: a 1 MB buffer at 4:1 oversubscription, 1 MB at 5.33:1, and 48 MB at 5.33:1; the 48 MB deep-buffer case drops zero packets]
[Diagram: two groups of 16 hosts x 10G; 4x10G uplinks give 4:1 oversubscription, 3x10G give 5.33:1; the workload generates roughly 1k TCP slow starts/sec]
Buffer Impact on High Performance
• Use cases:
• Optimizing multi-speed transitions: 40G → 10G, 100G → 10/40G, 10G → 1G
• Improving uplink contention in mixed-speed networks
• High density in core/spine (many-to-one, incast, fan-in)
[Chart: goodput vs. oversubscription for shallow vs. deep buffers]
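The speed step-downs in the use cases above create a simple buffering identity: while a burst arrives at the fast rate, the slow egress drains only a fraction of it, and the rest must queue. A rough Python sketch (the burst size is a hypothetical example):

    def stepdown_buffer_bytes(burst_bytes, in_gbps, out_gbps):
        # While the burst arrives at in_gbps, the egress drains only the
        # out/in fraction of it; the remainder must be buffered or dropped.
        return burst_bytes * (1 - out_gbps / in_gbps)

    # Hypothetical 10 MB burst crossing a 40G -> 10G speed step-down
    print(f"{stepdown_buffer_bytes(10e6, 40, 10) / 1e6:.1f} MB must be buffered")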
Buffer Utilization per Port: High-Performance Networking
[Chart: buffer consumed (MB) per port plotted against percentile; a Trident+ ASIC offers 9 MB shared across 64 ports, while the Arista 7500E offers 125 MB per 10G port]
How much buffer memory do you need?
Real buffer utilization observed at customers (max buffer used per port):
• HPC storage cluster (medium): 33 MB
• Animation storage filer (NFS): 6.2 MB
• Software vendor engineering build servers (Perforce): 14.9 MB
• Online shopping Hadoop, 2K servers (big data): 52.3 MB
• Educational enterprise data center (virtualization): 52.4 MB
ns-3 network simulations match this real-world data, showing the TCP incast issue: a large number of TCP flows creates microburst congestion.
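For comparison, a common sizing rule of thumb from the literature (Appenzeller et al., not cited on the slide) divides the bandwidth-delay product by the square root of the number of long-lived flows. A Python sketch with hypothetical inputs; note how small the result is for intra-DC round-trip times, which is exactly why the incast-driven peaks in the table above argue for buffers far beyond the classic rule:

    import math

    def rule_of_thumb_buffer_bytes(link_gbps, rtt_ms, flows):
        # Classic sizing for N long-lived TCP flows: BDP / sqrt(N)
        # (Appenzeller et al.); one flow degenerates to the full BDP.
        bdp_bytes = link_gbps * 1e9 / 8 * (rtt_ms / 1e3)
        return bdp_bytes / math.sqrt(flows)

    # Hypothetical 10G port, 2 ms intra-DC RTT, 200 concurrent flows
    print(f"{rule_of_thumb_buffer_bytes(10, 2, 200) / 1e6:.2f} MB")  # ~0.18 MB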
LANZ Revolutionizes Network Visibility
Precision analysis of queues, ports and buffers, plus congestion capture.
[Diagram: 7150S switches feed LANZ data to NMS applications: a switch detects potential congestion, the NMS identifies hotspots, and the application reacts to conditions]
How to catch microbursts? LANZ: trigger-based vs. polling
[Diagram: with an SNMP polling rate of 1/sec, a microburst falling between two polls yields an average utilization of 0%]
• At 10 Gbps, 1 second = ~30 million packets!
• Microbursts occur over very short periods, microseconds or even nanoseconds, making them undetectable with standard polling methods.
• LANZ on the 7150 is event-driven, offering real-time visibility of microbursts.
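The blind spot is plain arithmetic: a burst at full line rate lasting microseconds barely moves a counter sampled once per second. A quick Python sketch (burst durations are illustrative):

    def polled_average_util(burst_us, poll_interval_s=1.0):
        # Average utilization a once-per-second counter poll reports for a
        # burst running at 100% line rate for burst_us microseconds.
        return 100.0 * (burst_us / 1e6) / poll_interval_s

    for burst_us in (100, 500, 5000):
        print(f"{burst_us:>5} us at line rate -> poll shows "
              f"{polled_average_util(burst_us):.2f}% utilization")
    # Even a 5 ms full-rate burst averages to 0.50%: invisible to SNMP
    # polling, yet long enough to fill a shallow buffer and drop packets.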
100G
Cloud Data Center 100G Requirements
Customers will only deploy 100G Ethernet in volume once it is cost-competitive, i.e. 100 GigE priced at or below 10 x 10 GigE.
Increasing port-density choice and transceiver distance will accelerate 100G adoption.
100G Deployment in the Data Center
• 100G rack to DC spine: high-performance storage, mixing 1 km and 10 km reaches; broadest choice of 100G ports, highest density for DC spines, mix and match 40G and 100G
• 100G any-scale pods: 10G to the servers, 100G between pods
• 100G at the PoP: long distances with a smaller-footprint option, IEEE LR4 and SR10 optical interconnect; metro, core and edge routers, mixing and matching SM and MM fiber; interconnecting data centers and PoPs over 10 km spans (data center to data center to small DC or PoP)
• 100G to the ToR: leaf and spine mixing 10/40/100G with 400 m reaches; investment protection across the 10G-to-40G server transition; up to 400 m, 1/10G servers with 10G and 40G uplinks, server/storage expansion
Arista 7500E Series: The Foundation for Virtualized Clouds
A scale-built spine:
• Architected to operate at massive network scale
• Designed and optimized for virtualization and cloud
• Energy efficient
• 1,152 x 10GbE / 288 x 40GbE / 96 x 100GbE
• 30 Tbps
Highest-Density 10/40/100GbE Switch: Pay-As-You-Grow 100G Deployment Flexibility
• 7500E-12CM-LC: cost-effective MXP with integrated triple-speed SR10 optics, 10/40/100G
• 7500E-6C2-LC: flexible short and long reach CFP2 (LR4 for 100GbE over 10 km, SR10 over 300 m)
• 7500E-12CQ-LC: high-density QSFP-100G, broad 10/40/100G QSFP optics
Dense 100/40/10G, deep buffers, feature parity, investment protection.
7280SE Fixed-Configuration Switches: Wire-Speed 10/40/100G with Deep Buffers
• 900 million packets per second
• 1.44 terabits per second
• Less than 4 µs latency
• Ultra-deep 9 GB packet buffers
• VOQ architecture for lossless forwarding
• Wire-speed L2 and L3 forwarding
• 40G and 100G uplinks for HPC and CDN
• Leaf and spine 40/100G ECMP and MLAG
• Integrated SSD for local traffic analysis
• Reversible airflow and AC/DC power options
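To put the 9 GB figure in perspective, divide the buffer by the rate at which arrivals exceed the drain rate. A quick Python sketch; the excess-rate number is an arbitrary assumption, not a product specification:

    def absorb_time_ms(buffer_gbytes, excess_gbps):
        # How long the buffer can soak up traffic arriving excess_gbps faster
        # than the egress can drain it before packets must drop.
        return buffer_gbytes * 8 / excess_gbps * 1e3

    # Hypothetical sustained 100 Gb/s of excess incast load
    print(f"{absorb_time_ms(9, 100):.0f} ms of overload absorbed")  # 720 ms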
Flexible Optics: 100G CFP2 & QSFP
QSFP100 (highest density, lowest cost):
• Smallest form-factor transceiver for 40/100GbE
• Support for IEEE 100G standards: SR, CR, LR
• Interoperable with IEEE-compliant 40G and 100G optics
• Power-efficient at only 3.5 W/port
• Low power and size allow high 100G density
CFP2 (broad MM and SM choice):
• Hot-pluggable transceiver for 100GbE
• Full support for IEEE 100G standards: SR, CR, LR, ER
• Interoperable with IEEE-compliant 100G optics
• Half the size of CFP, allowing higher density
• Lower power consumption than CFP, reducing optic-cooling concerns
Use Case: Long-Distance Single-Mode Interconnect
7500E-12CQ: direct data center interconnect; 7280SE-68: small data center/PoP interconnect
[Diagram: interconnecting data centers and PoPs over 10 km spans, data center to data center to a small DC or PoP]
• Up to 10 km reach over single-mode fiber
• Connect to optical transport and core routers
• IEEE standards for multi-vendor interoperability
• Broad range of pluggable CFP2 optics
• Lowest-cost solution for cross-site 100GbE
• QSFP100 drives up to 10 km distance
• Provides up to 2x100G bandwidth
• 1RU form factor ensures minimal space and very low power requirements
SPEED UP: 25G AND 50G ETHERNET
25G and 50G Ethernet Consortium
• Founded by Arista, Broadcom, Google, Mellanox & Microsoft
• Consortium website: 25gEthernet.org
• An open specification for the new speeds
• Consortium open to everyone in the industry
Cloud applications that drive bandwidth
25G:
• Compute/big data that needs the lowest cost per Gbps
• Servers can push more than 10 Gbps but are not willing to pay a premium
• Needs the same port density as 10G
50G:
• IP storage
• 2x25G is the most cost-effective option
• Higher port density than 40G, so a single leaf switch is sufficient
• Easier to scale on NICs too
Arista is leading the industry here:
• 25G and 50G support is needed in silicon
• Products are expected in the next 18 to 36 months, on both switches and NICs
Why is another speed needed?
[Diagram: 1G and 10G run over a single lane; 40G is 4x10G and 100G is 4x25G over parallel lanes]
• 1G and 10G use single lanes (1 pair)
• 40G and 100G use parallel lanes (4 pairs)
• 40G and 100G ports need more SerDes, consume more power and reduce port density
• The cloud needs to hit the sweet spot of lowest price per gigabit versus optimal performance (see the lane-math sketch below)
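The lane counts above can be tabulated to show why 25G and 50G hit the price/density sweet spot. A Python sketch; the cost-scales-with-lanes model is an illustrative assumption, not consortium pricing:

    # speed name -> (Gbps, SerDes lanes per port)
    LANES = {"1G": (1, 1), "10G": (10, 1), "25G": (25, 1),
             "40G": (40, 4), "50G": (50, 2), "100G": (100, 4)}

    for name, (gbps, lanes) in LANES.items():
        # Crude model: port cost and faceplate/SerDes consumption scale with
        # lane count, so more Gbps per lane means better price per Gbps.
        print(f"{name:>4}: {lanes} lane(s), {gbps / lanes:5.1f} Gbps per lane")
    # 25G delivers 2.5x the bandwidth of 10G on the same single lane; 50G
    # gets 1.25x a 40G port's bandwidth from half the lanes.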
25G and 50G Ethernet (25G: single lane; 50G: 2x25G)
25G:
• A single-lane specification, just like 10G
• Leverages IEEE 802.3 Ethernet framing
• Offers 2.5x the speed at a cost structure closer to 10G
• Same port density and connectors as 10G SFP+
50G:
• Dual-lane
• Offers 1.25x the speed of 40G
• Cost structure closer to 2x that of 10G
• 2x the port density of 40G using splitter cables from QSFP
The Sweet Spots
0"
20"
40"
60"
80"
100"
120"
1G" 10G" 25G" 40G" 50G" 100G"
Price&per&Gbps&
Thank You