Upload
others
View
8
Download
0
Embed Size (px)
Citation preview
Cloud Done Right!Bringing Web-Scale Innovations to Every Data Center
David Iles – Senior Director of Ethernet Switching
Mellanox Technologies
2
Mellanox Leadership Across Industries
10 of Top 10 Automotive
Manufacturers
3 of Top 5 Pharmaceutical
Companies
9 of Top 10Oil and GasCompanies
5 of Top 6Global Banks
9 of Top 10Hyperscale Companies
Mellanox Interconnect Solutions Deliver Highest Return on Investment
3
What have Cloud Titans taught the Industry?
SDN
Disaggregation
Modern Topology
Open Network Operating System
Smart Telemetry
Pure Layer 3 Network
We bring Cloud Titan innovations to you!
4
Web-Scale Innovation: Leverage Open Platforms
Mainframe-like Networks:- Vendor lock-in- Higher switch prices- Higher support prices
Open Networking:- Best of breed hardware- Best of breed software- No hidden licenses/royalties
hardware
operating system
app app appApp App
Operating System
Switch Hardware
Optics
App
optics cables
Cables
5
Web-Scale Innovation:From Layer 2 to Layer 3
Layer 3
Layer 2
Layer 3
Layer 2
Layer 3
Layer 2
Layer 3
Trend over Time
Fault
Domain
Fault
Domain
Fault
Domain Fault
Domain
Web-Scale Innovation:VXLAN without compromise
EVPN
+
VM2 VM3
VM’s migrate seamlessly
VM1When VMs are deployed:VLAN Auto-configured
& mapped to VXLAN
Web-Scale Innovation:Leaf/Spine Networks
Scale Out NetworkScale Up Network
25G
Leaf Switches
25G25G25G
Superior Price, Performance, and Resiliency
ToR Switches
100G100G
Leaf Switches
25G25G
100G
Web-Scale Innovation:Trend from Modular to Fixed Port Switches
78% of Data Center ports
Crehan Quarterly Market Shares - Data Center Ethernet, July 2019
31%
28% 28%
21%
17%
13%12% 12%
13% 14% 13%
43%
47%48%
57%
64%
69%
73%74% 73% 74%
78%
2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
Modular Ports
Fixed Ports
Data Center Switch Port Share
Web-Scale Innovation:Trend from Modular to Fixed Port Switches
Scales to over 100K Nodes
31%
28% 28%
21%
17%
13%12% 12%
13% 14% 13%
43%
47%48%
57%
64%
69%
73%74% 73% 74%
78%
2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
Modular Ports
Fixed Ports
Data Center Switch Port Share
Crehan Quarterly Market Shares - Data Center Ethernet, July 2019
10
Web-Scale Innovation:Simplified Configs
Cumulus EVPN Config - 11 Lines Other’s EVPN Config
feature bgp
feature pim
feature nv overlay
feature vn-segment-vlan-
based
feature lacp
feature vpc
vpc domain 1
peer-switch
peer-gateway
ipv6 nd synchronize
ip arp synchronize
peer-keepalive destination
10.255.255.2
nv overlay evpn
vlan 10
no shutdown
vn-segment 10
rd auto
address-family ipv4
unicast
route-target import
65535:101 evpn
route-target export
65535:101 evpn
route-target import
65535:101
route-target export
65535:101
address-family ipv6
unicast
route-target import
65535:101 evpn
route-target export
65535:101 evpn
route-target import
65535:101
route-target export
65535:101
interface loopback0
ip address 192.168.1.1/32
ip pim sparse-mode
ip pim rp-address
192.168.1.100 group-list
224.0.0.0/4
ip pim ssm range
232.0.0.0/8
route-map permitall permit
10
set ip next-hop unchanged
interface port-channel 20
vpc peer-link
interface ethernet4/48
channel-group 20
interface port-channel 20
vpc 1
interface ethernet4/47
no shutdown
ip address 10.255.255.1/30
interface ethernet4/2
ip address 10.1.1.1/30
ip pim sparse-mode
no shutdown
interface ethernet4/3
ip address 10.1.2.1/30
ip pim sparse-mode
no shutdown
router bgp 65001
address-family l2vpn evpn
nexthop route-map
permitall
retain route-target all
neighbor 192.168.1.2
remote-as 65002
update-source loopback0
ebgp-multihop 255
address-family l2vpn evpn
disable-peer-as-check
send-community extended
route-map permitall out
neighbor 192.168.1.3
remote-as 65003
update-source loopback0
ebgp-multihop 255
address-family l2vpn evpn
disable-peer-as-check
send-community extended
route-map permitall out
neighbor 10.1.1.2 remote-
as 65111
address-family ipv4
unicast
allowas-in
disable-peer-as-check
neighbor 10.1.2.2 remote-
as 65111
address-family ipv4
unicast
allowas-in
disable-peer-as-check
evpn
vni 10 l2
hardware access-list tcam
region arp-ether 256
double-wide
interface nve1
no shutdown
source-interface loopback1
host-reachability protocol
bgp
member vni 10
mcast-group 239.0.0.1
interface e1/47
switchport
switchport access vlan 10
channel-group 50 mode
active
interface port-channel 50
vpc 1
11
Web-Scale Innovation:Simplified Configs
Other’s RoCE Config
Step 1 – Ingress Traffic Classificationclass-map type qos match-all CNP
match dscp 48
class-map type qos match-all
RDMA
match dscp 26
policy-map type qos QOS_MARKING
class RDMA
set qos-group 3
class CNP
set qos-group 6
Step 2 – QoS Policiespolicy-map type network-qos
QOS_NETWORK
class type network-qos c-8q-nq3
pause pfc-cos 3
mtu 2240
policy-map type queuing
QOS_QUEUEING
class type queuing c-out-8q-q3
random-detect minimum-threshold
150 kbytes maximum-threshold
1500 kbytes drop-probability 100
weight 0 ecn
bandwidth remaining percent 20
class type queuing c-out-8q-q6
priority level 1
policy-map type queuing
INPUT_QOS_QUEUEING
class type queuing c-in-q3
queue-limit dynamic 3
system qos
service-policy type queuing
input INPUT_QOS_QUEUEING
service-policy type queuing
output QOS_QUEUEING
service-policy type network-qos
QOS_NETWORK
Step 3 –Resource Allocationhardware access-list tcam region
e-racl 0
hardware access-list tcam region
vpc-convergence 0
hardware access-list tcam region
racl-lite 768
hardware access-list tcam region
l3qos-intra-lite 0
hardware access-list tcam region
qos 256
hardware access-list tcam region
e-qos 256
Mellanox “Do RoCE”
switch (config) # roce
12
Reduction in OpEx with Web-Scale Networking75%Up to
TRADITIONAL NETWORKING Web-Scale NETWORKING
Operational Leverage 1 admins : 4 Switches 1 admin : 500 Switches
Provisioning Weeks Minutes
Supply Chain Locked-in Open Supply Chain
3rd Party Integration Vendor Determines Customer Determines
Management Tools Vendor Driven Customer Choice
Robustness / ReliabilityManual &
Highly Error Prone
Automated &
Reduced Network Downtime
Web-Scale Innovation:Automate everything
13
Web-Scale Innovation:Moving Intelligence to the Edge
Network
Services
Trend over Time
Network
Services
Network
Services
Network
ServicesTromboning
East/West
Traffic
Software Defined EverythingCreates Bottlenecks
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Ap
plic
atio
n P
roce
ssin
gBare Metal
Available for Application Processing
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core Core Core
Core
Ap
plic
atio
n P
roce
ssin
g
Net
wo
rkin
g &
Se
curi
ty
Bare Metal
Virtualized &
Software Defined
Available for Application Processing
Core
Core
Software Defined Everything (SDX) Consumes CPU cores for Packet Processing
• Virtualization, Storage, Switching, Routing, Load Balancing
Security: Consumes CPU cores for Security Processing
• Layer 4 Firewall, encryption, host introspection
• Intrusion detection & prevention
Core Core Core
Core Core Core
Core Core Core Core
Software Defined Everything Creates Bottlenecks
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core Core Core Core
Ap
plic
atio
n P
roce
ssin
g
Net
wo
rkin
g &
Se
curi
ty Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Ap
plic
atio
n P
roce
ssin
g
Bare Metal
Virtualized &
Software Defined
Accelerated
Bare Metal Cloud
Available for Application Processing
Core
Core
Software Defined Everything (SDX) Consumes CPU cores for Packet Processing
• Virtualization, Storage, Switching, Routing, Load Balancing
Security: Consumes CPU cores for Security Processing
• Layer 4 Firewall, encryption, host introspection
• Intrusion detection & prevention
Core
Core
Core Core Core
Core Core Core
Core Core Core Core
Software Defined Everything Creates Bottlenecks
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core Core Core Core
Ap
plic
atio
n P
roce
ssin
g
Net
wo
rkin
g &
Se
curi
ty Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Core
Ap
plic
atio
n P
roce
ssin
g
Bare Metal
Virtualized &
Software Defined
Accelerated
Bare Metal Cloud
Available for Application Processing
Core
Core
Software Defined Everything (SDX) Consumes CPU cores for Packet Processing
• Virtualization, Storage, Switching, Routing, Load Balancing
Security: Consumes CPU cores for Security Processing
• Layer 4 Firewall, encryption, host introspection
• Intrusion detection & prevention
SDX in SmartNIC
Security in SmartNIC
Core
Core
Core Core Core
Core Core Core
Core Core Core Core
Software Defined Everything Creates Bottlenecks
18
Cloud ready virtual switch
Open vSwitch (OVS)
VM VM VM
Network Adapter Open vSwitch
Monitoring: Netflow, sFlow, SPAN, RSPAN
Automated Control: OpenFlow, OVSDB mgmt. protocol
QoS: traffic queuing and traffic shaping
Security: VLAN isolation, traffic filtering
Virtual Function
(VF)Virtual Function
(VF)
Virtual Function
(VF)
19
OVS Performance Challenges
OVS performance burdens:
Higher LatencyHigh CPU Utilization Limited Throughput
20
Accelerated Switching and Packet Processing ASAP2
Best of both worlds:
Hardware Accelerated Data Plane
+
Software Define Control Plane
Virtual Switch Control Plane
Hardware Accelerated Data Plane
Standard Hardware
Abstraction Interface
ASAP2
21
Software vs. Hardware OVS
OVS-vswitchd
OVS Kernel Module
User Space
Kernel
OVS-vswitchd
OVS Kernel Module
User Space
Kernel
ConnectX eSwitchHardware
First flow packet
Fallback path
Hardware forwarded packets
Legacy
OVS Software Implementation ASAP2 on ConnectX Hardware
▪ High latency
▪ Low bandwidth
▪ CPU intensive
▪ Low latency
▪ High bandwidth
▪ Efficient CPU
Improving OVS Performance
▪ Mellanox OVS Offload - ASAP2
▪ 20X higher performance than vanilla OVS
▪ 10X better performance than OVS-DPDK
▪ Line rate performance at 25/40/50/100Gbps
7.6 MPPS
66 MPPS
0
10
20
30
40
50
60
70
OVS over DPDK OVS Offload
Mil
lio
n P
acket
Per
Seco
nd
ASAP2
ASAP2: 10X packet rate with Zero CPU Load
2 CPU Cores
Zero CPU Load!
10X
Better
Performance
23
Storage Networking Trend
Source: Crehan Research, Host Adapter Port Shipments, January 2018
1.5
2.0
2.5
3.0
3.5
10
20
30
40
50
60
70
Fibre Channel Port Shipments (in Millions) Ethernet Port Shipments (in Millions)
24
Storage Networking Trend
Feature Fibre Channel Ethernet
Bandwidth 1 G 100 M
Supports Block Block, file
Lossless Yes No
Cost High $$$$ Medium $$
Cloud / HCI No / No No / No
Vendors Several Many
SDS / Scale-out No / No No / No
1997 2019
Feature Fibre Channel Ethernet
Bandwidth 8/16/32 G 10/25/40/100 G
Supports Block Block, file, object
Lossless Yes Yes
Cost Medium $$ Low $
Cloud / HCI No / No Yes / Yes
Vendors 2 / 2 Many / Many
SDS / Scale-out Rare / No Yes / Yes
Yesterday: Storage Network = FC
- Fibre Channel offered best performance
- All interesting storage was tier-1 block
- No cloud or hyperconverged
Today: Ethernet for storage networks
25
Flash Storage is Getting a Lot Faster!
Network Bottleneck
Web-Scale Innovation:Matching Network & Storage Latency
26
Web-Scale Innovation:Accelerate Scale-out Storage with ROCE
Storage is Getting Faster!
Source: Outstanding S2D Efficiency
3x
More!
27
Performance
Automated
Cost Efficient
High Availability
Simplicity
Scalability
✓ 2 Switches in 1RU
✓ Ideal Port Count for Storage /HCI / ML
✓ Zero Packet Loss
✓ Low Latency
✓ RoCE optimized
✓ Network automation/visibility
✓ Cost optimized
Web-Scale Innovation:Storage Optimized Switch Form Factors
28
Web-Scale Innovation: Measure Everything!
Legacy Mindset Webscale Mindset
Protocols Telemetry Features
PIM
HSRP
LACP
VPC
OSFPv2
RIPv2
EIGRP
SNMP
TACACS
UFD
PVRST/ MSTP
Private VLAN
Loop/Root/BPDU Guard
QOS
VRRP
VTP
GVRP
IGMP
TRILL
SPB
FabricPath
VCS
Qfabric
BGP
FCoE
BFD
FEX
OVSDB/VTEP
MLAG
QinQ
EVPN
LACP
BGP/BFD
SNMP
RMON
SPAN
CDP
SNMP
SPAN
ERSPAN
sFlow
IPFIX
SYSLOG
Packet Brokering
LLDP
Real-time Visibility Snapshots
Streaming Telemetry (GPB)
Packet Brokering
Buffer Histograms
Mirror Congestion
Mirror Drops
In Band Telemetry
In-situ OAM
RoCE Telemetry
Watermarks
SYSLOG
ERSPAN
SPAN
sFlow
LLDP
WJH
29
Why Do We Need Telemetry?
Faster Time to Innocence Faster Time To Resolution Get more out of the Network
30
WJH™ Accelerates the Time to Root-Cause
SNMP SYSLOG
???
???? ??
?
!
31
WJH™ – How Does It Work?
SDK/SAI
Network OS
Packet’s Header +
very detailed description
The Important Questions1. SDK generates:
WJH messages
2. WJH Agent:
Streams to a Database
3. Presentation layer shows:
What Just Happened
WHO is being impacted
WHAT is causing the problem
WHEN it happened
WHERE is the problem
WHY it is happening
32
L1
• Bad CRC
• Flaky cable
L2/L3/Overlay
• BGP
• VLAN
Buffer
• Incast
• Rate Limit
ACLs
• Deny based on IP
• Deny based on VLAN
Congestion
• Incast
• Busy storage device
Latency
• Pause frames
• Congestion➔ latency
Suboptimal Route
• Packet doesn’t reach the firewall
• Packet go through a sub-optimized path
Suboptimal Load Balance
• Suboptimal ECMP
• Suboptimal LAG
Packet Drop No Packet Drop
Web-Scale Innovation: Extreme Visibility
33
What have Cloud Titans taught the Industry?
SDN
Disaggregation
Modern Topology
Open Network Operating System
Smart Telemetry
Pure Layer 3 Network
We bring Cloud Titan innovations to you!
Thank you!