25
NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud) Think Outside of Cloud - Alankar Sharma Sr. Principal Architect, DC Networking Comcast - Vincent Celindro Globals Architect - Networking 1

SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

NANOG-77 [2019]

SONiC at Comcast Datacenters(Software for Open Networking in Cloud)

Think Outside of Cloud

- Alankar SharmaSr. Principal Architect, DC NetworkingComcast

- Vincent CelindroGlobals Architect - Networking

1

Page 2: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

Agenda

Ø Comcast Datacenter Network OverviewØ Current Challenges & Motivation for New NOS: SONiCØ What is SONiC?

§ BYO Container Apps§ SONiC Subsystem§ BGP Update Processing

Ø SONiC MaturityØ SONiC BenefitsØ SONiC at Comcast

§ Current State of Deploymento Featureso Roadmap

Ø Comcast DC Quad DesignØ DC Core running SONiCØ Monitoring & Management

§ Telemetry & Data Sources§ Grey Failures: ML Based Anomaly Detection

Ø Advanced Features: Fast Reboot CLI Demo2

Page 3: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

Comcast DC Network Overview

• 8 National DC

• 14 Regional DC

• 54 Local DC

• 8 Syndication DC

• 4 Video Distribution DC

• + POPs and Head Ends

• 9500+ Network device

• 300,000 server ports

• 600+ Changes/mo

• Multi Vendor: 8+ vendors for n/w infra

• 117 Hardware Models

• 184 versions of NOS Images

• MGMT Tools- (you name it!) -CVP, JunOS,

DCNM, HPNA, TAIL-f, Home Grown

Complex Eco-systemLarge Footprint

3

Page 4: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

Comcast DC Network Overview

• Heterogeneous Infra types- Various Cloud Platforms (OpenStack, VMW, CaaS,Cloud Foundry), Storage Systems, Voice Systems, Baremetal Servers, VideoSystems, Appliances, Legacy Appliances

• Applications: Video, Multicast, DVR, Encoders, Big Data, Voice, HSD, CDN, X1

• Security (Permissions, Access Control, Asset protection, Copyrights), PCI/PIIZones- DMZ, Internal, PCI Sensitive, Office (IT)

• Unlike web-scaler cookie-cutter, but more customized, siloed deployments duevariety of applications and appliances

• Virtualized but considerable Baremetal servers

4

Page 5: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

Why a new NOS at Comcast? Our MotivationWhat if the whole world spoke just English…

• Automation Challenges: Needed to build an abstraction layer that can consume common templatesand services. Attempts were made with Tail-f and OpenConfig

• Normalize the hardware variations. Empower to share same configuration, provisioning and mgmttools

• Minimize the software interop issues, even with same vendor

• Reduce cost. Pay only where you need to. Lean DC core and features pushed down to the edge.(Avoid any bandwidth-based NOS charges)

• Break-free: No more hostage to vendors. Feature Velocity and portability

• Multi-Vendor and Chip Diversity

• Decouple not only the HW/SW, also the SW/SW (containers)

• Commercially supported but easy to pull the plug, unlike proprietary NOS. And no perpetual licensebaggage.

5

Page 6: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

What is SONiC?

Software for Open Networking in Cloud

• Linux software and integration.

Full interaction with native Linux shell

• Strong and growing support from

ODM and ASIC vendors

• SAI (Switch Abstraction Interface):

Hardware independent

• Containerized Network Apps

• Advanced management tools

Switch Abstraction Layer (SAI)

SWSS Utility

DB Platform LLDPTeamD

SNMP DHCP IPv6 BGP LAG Apps

Management Tools

Ansible K8S Jenkins CHEF PuppetDockerSwarm

SONiC

SyncDRedisDB

6

Page 7: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

BYO Container Apps

• Components Isolation

• Select Building blocks

• Easy deployment

• Transactional

Benefits• Serviceability

• Extensibility

• Development agility

Database Platform

SWSS Utility

TeamD LLDP

RedisDB SyncD

SNMP DHCP BGP IPv6 LAG

Quagga FRR GOBGP Arista BGP

Juniper CRPD

SONiC Base

Network Apps

BYO

7

Page 8: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

SONiC Subsystem

Fans Transceivers ASICLED Power Sensors

Platform Drivers Network Drivers ASIC Drivers

portsyncdintfsyncd

neighsyncd

orchagentintfmgrd

vlanmgrd

dhcprelay

Redis Server

SWSS container DB container

sensordfancontrol

smnpdsnmp agent

teamdteamsyncd

bgpdzebrafpmsyncd

lldpdlldpmgrdlldpsyncd

syncdsai apiasic sdk

CLIConfig-gen

Kernel Space

Hardware

User Space

Syncd container

DHCP PMON SNMP LLDPBGPTeamD

8 Ref Link: https://github.com/Azure/SONiC/wiki/Architecture

Page 9: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

BGP Routing Innerworkings Example

Ref Link: http://azure.github.io/SONiC/

1. Neighbor sends new BGP prefix as an update,which is received at BGP's socket in kernel

2. BGP socket propagates the update to the bgpdprocess

3. BGPd interprets the update and notifies Zebraof the new route

4. Zebra validates the new route and delivers thenetlink-route message to fpmsyncd

5. fpmsyncd updates the APP DB6. Orch Agent, subscribes to APP DB, realizes the

state change7. Orch Agent, processes the updates of the App

DB and do corresponding changes in the SAIDB via SAI Redis.

8. Syncd receives the new state, generated by theorchagentd

9. Syncd uses SAI API to push the route into ASICvia the ASIC driver

1

2

3 4

56

78

9

9

Page 10: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

SONiC Maturity

• Community Ecosystem• 3 releases per year• 120-250 commits/months• ~850 community members• ~200 active code contributors• 68+ supported platforms

• Monitoring Tools Available• Proven Management Tools

• Major Customers are running it in Prod• Contributing & Hardening• Ali Baba, Tencent, Linkedin, Dell,

Broadcom, Mellanox

• Commercial Support Available

BGP

ECMP

VLANTrunki

ng

LLDP

QoS

FlowContro

l

WRED

SNMP

Syslog

NTP

Mirroring

DHCP

VLAN

ACL

IPv6

TunnelDecap

COPP

BGP Gracef

ul

BGP MP

FasrReboo

t

TACACS

LAG

ConfigDB

MACAging

Critical Res.

Monitoring

FIBAccn

RDMA

gRPC

AsymmetricPFC

PFCWatermark

WarmReboo

t

VRFVxLA

N

PMON

FRR

BGPEVPN

Dtel

10

Page 11: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

SONiC Advantages

• Designer NOS: Operators can select best building blocks. Containerization gives choiceof per component level. E.g. BGP- FRR, Quagga, GOBGP, Arista BGP, Juniper CRPD

• Not everyone has high negotiation power. SONiC is same cost for all, favoring small-midsize companies

• No carry forward baggage from monolithic codes

• Dedup: All NOS vendors are building same L2/L3 stack

• Zero downtime upgrade through docker swarm, patches in hours and new features rolledout quickly

• Hitless upgrades: Upgrade thousands of switches in a day• Fast Reboot, Warm Reboot

• CI/CD practices

11

Page 12: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

SONiC Comcast Deployment

• Deployed in DC Lean Core (v4/v6 L3 underlay)• Team: Five engineers + Ops (24/7) as the team to manage/deploy • Plans to expand into Leaf Layer

• Deployed FeaturesBGP, ECMP, LAG, SYSLOG, LLDP, DHCP, TACACS, CoPP, IPv6, Fast Reboot, Everflow, Telemetry, Warm Reboot, ACL, NTP(Unlike other vendor NOS with monolithic codes)

• Roadmap (Testing)VLAN, VLAN Trunk, VxLAN, VRF, MLAG, sFLOW, BGP EVPN, INT

• DesiredBGP Unnumbered

12

Page 13: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

Comcast Datacenter (HLD)DC Quad Design

DC Core

SAS01 SAS02

Nx100G Nx100G

AS01 AS02 AS03 AS04

S01 S02

sw001a

sw001b

sw001c

Server(n)

mgmt

Backbone

S(x) S(x+1)

Sw(n)csw002

asw002

b

Server01

sw002c

Server(n)

mgmt

sw002a

sw002b

sw002c

Server(n)

mgmt

sw(n)a sw(n)b

Server01

sw(n)c

Server(n)

mgmt

sw001a

sw001b

Server01

sw001c

Server(n)

mgmt

sw002a

sw002b

Server01

sw002c

Server(n)

mgmt

sw002a

sw002b

Server01

sw002c

Server(n)

mgmt

sw(n)a sw(n)b

Server01

sw00(n)c

Server(n)

mgmt

SPINE65003

LEAF65004

AGG SPINE65002

SUPER AGG SPINE65001

EBGP

EBGP

EBGP

EBGP

L3P2P

L3P2P

L3P2P

cs(x)a cs(x)b

sw(x)c

Server

mgmt

LBNAT

PolicySecurity

Default Route

Specific Routes

Specific Routes

Default Route

Default Route

Specific Routes

Default Route

Specific Routes

Server01 Server01

S(n) S(n+1)

13

Page 14: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

SONiC in DC Core

• T2S (Tier 2 Spine) = TH3, 32x400G• S (Spine) = TH3, 32x400G• 204.8T Non-Blocking Fabric• 2048 x 100G ports for Leaf Switches• Lean Core, Underlay Only• 40% Cost Reduction• Standardize Optics (DR4, SM, 500m)• Low Latency

L3 P2PEBGP

400G400G

T2S01 T2S02 T2S08

S01 S02 S15 S16 S31 S32

T2S09 T2S16T2S15

1

2

3

4

S01

S16

T2S01 S17

S32T2S16

S01

S16

T2S01 S17

S32T2S16

S01

S16

T2S01 S17

S32T2S16

S01

S16

T2S01 S17

S32T2S16

14

Page 15: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

SONiC Monitoring and Management

• Augtera Networks provides AI based network monitoring with SONiC integration

• Machine learning based real-time anomaly detection and auto-correlation along with notifications

• Ad-Hoc queries on real-time and historical data to troubleshoot incidents and analyze impact to customers / applications

• On demand or AI triggered flow collection and analytics leveraging Everflow*

Data collection from SONiC

Comcast Data Center

ML Anomalies & Incidents

Augtera AI Network Pulse

DC Ops & NOC Tools

SAS01 SAS02

Nx100G Nx100G

AS01 AS02 AS03 AS04

S01 S02

sw001a sw001b

Server01

sw001c

Server(n)

mgmt

S(n) S(n+1)

Backbone/ISP

S(x) S(x+1)

Sw(n)csw002a sw002b

Server01

sw002c

Server(n)

mgmt

sw002a sw002b

Server01

sw002c

Server(n)

mgmt

sw00(n)a

sw00(n)b

Server01

sw00(n)c

Server(n)

mgmt

sw001a sw001b

Server01

sw001c

Server(n)

mgmt

sw002a sw002b

Server01

sw002c

Server(n)

mgmt

sw002a sw002b

Server01

sw002c

Server(n)

mgmt

sw00(n)a

sw00(n)b

Server01

sw00(n)c

Server(n)

mgmt

SPINE65003

LEAF65004

AGG SPINE65002

SUPER AGG SPINE65001

EBGP

EBGP

EBGP

EBGP

Default Route

Default Route

Default Route

Default Route

L3P2P

L3P2P

L3P2P

Specific Routes

Specific Routes

Specific Routes

Specific Routes

cs00(x)a cs00(x)b

sw00(x)c

Server

mgmt

LB

NATPolicy

Security

AI & Operator Driven Config Changes

15

Page 16: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

Telemetry

Data Analytics

SONiC Switch

SNMP CLI Syslog gRPC Mirror Everflow

Data Collection

Ops Monitoring Troubleshooting

16

Page 17: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

SONiC Data Sources LeveragedData Type API Type NotesNetwork Topology SNMP L2 and L3

System Events Syslog

Control Plane State & Metrics

SNMP Comcast is using BGP

*Sensor data

(Roadmap)

Sonic GRPC based Telemetry streaming

CPU, temperature, fan speed etc.

Port counters SonicTelemetry streaming inPackets128To255Octets ,inOversizePacket, outOversizePackets, inBroadcastPackets, ifInDiscards, ifInErrors,

inMulticastPackets, ifHCInOctets, inUnicastPackets, inUnknownProtos, outBroadcastPackets, ifOutDiscards, ifOutErrors,outMulticastPackets, ifHCOutOctets, outUnicastPackets

Queue counters Sonic Telemetrystreaming queueStatPackets, queueStatBytes, queueStatDroppedPackets, queueStatDroppedBytes

*Buffer Statistics and Tracking (BST)

(Roadmap)

SonicTelemetry Streaming

Coming soon

Buffer count and queue entries for port-priority-group, service-pool, port-service-pool and queue categoriesTop-drops, top-port-queue-drops, port-drops, port-queue-drops

*Packet Telemetry

(Roadmap)

Sonic Everflow On demand and AI triggered

17

Page 18: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

ML Based Anomaly Detection with SONiC

• Grey failures and traffic anomalies detection using Augtera AI Network Pulse

• No thresholds configured (e.g., BST thresholds)

-ML learns the patterns and automatically triggers anomalies on anomalous queue utilization or drops

• Congestion detection benefits from existing and additional counters in the roadmap

-Port counters: Finds anomalies on traffic, discards, oversize packets and 128To255Octet packets

-Queue counters and BST metrics are in progress

18

Page 19: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

CLI/Demo Screenshots

• Fast Reboot – Minimal Disruption (30 Seconds)• Warm Reboot – Sub-second Disruption

19

Page 20: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

DEMO Setup

sjc-z9100-01

server37

server39

sjc-z9100-02150.0.0.0/31

.0 .1

100.0.0.0/24 200.0.0.0/24

.1 .1

.2 .2

iperf

Fast-reboot DUT

20

Page 21: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

Setup in Default State- iperf initiated from window 1

1

21

Page 22: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

Fast-reboot in progress

Control plane goes for reboot with no disruption to data plane traffic (Window 1, 3)

1 2

3 4

22

Page 23: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

Warm reboot in progressData plane rebooted in 15s after control plane is up

Only 15 second data plane time out

23

Page 24: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

Links & References

• Command-Referencehttps://github.com/Azure/sonic-utilities/blob/master/doc/Command-Reference.md

• Virtual Sonic setuphttps://github.com/Azure/sonic-mgmt/blob/master/ansible/doc/README.testbed.VsSetup.md

• Mailing [email protected]

• Githubhttps://github.com/Azure/SONiC/

• Wikihttps://github.com/Azure/SONiC/wiki

• SAIhttps://github.com/opencomputeproject/SAI

24

Page 25: SONiC at Comcast Datacenters · NANOG-77 [2019] SONiC at Comcast Datacenters (Software for Open Networking in Cloud)Think Outside of Cloud-Alankar Sharma Sr. Principal Architect,

SONiC at Comcast Datacenters

- Alankar [email protected]

- Vincent CelindroNetwork /R/evolutionist

25

Thank You