Experiences with multi-layer network modeling
Anees Shaikh, Google Global Networking, on behalf of many, many contributors

For more details: “Experiences with Modeling Network Topologies at Multiple Levels of Abstraction,” Jeffrey C. Mogul, Drago Goricanec, Martin Pool, Anees Shaikh, Douglas Turk, Bikash Koley, Xiaoxue Zhao, USENIX Symposium on Networked Systems Design and Implementation (NSDI 2020)

Experiences with multi-layer network modeling · 2020. 10. 20. · Experiences with multi-layer network modeling Anees Shaikh, Google Global Networking on behalf of many, many contributors



A common standard for representing network structure

Motivation

Design choices

Lessons / experience that others may find useful


External view of Google Cloud Network

134 points of presence and 14 subsea cable investments around the globe

[Map legend: network, edge point of presence. Subsea cables shown (landing countries or cities, year):
Equiano (PT, NG, ZA), 2021
Dunant (US, FR), 2020
SJC (JP, HK, SG), 2013
JGA-S (GU, AU), 2019
Indigo (SG, ID, AU), 2019
Havfrue (US, IE, DK), 2019
Monet (US, BR), 2017
Junior (Rio, Santos), 2018
Tannat (BR, UY, AR), 2018
Curie (CL, US), 2019
Faster (US, JP, TW), 2016
PLCN (HK, US), 2019
Unity (US, JP), 2010
HK-G (HK, GU), 2020
Grace Hopper (US, UK, ES), 2022]


Google production networks

[Diagram labels: Enterprise; User; Internet; Peering Metro (Espresso); User-facing WAN (B2); Server-Server WAN (B4); Data Center (Jupiter); Virtualization (Andromeda)]


Automation is a requirement for all networks

● Scale: do more with the resources that we have
● Consistency: consistent deployments, not snowflakes
● Correctness: eliminate human inconsistencies

[Diagram labels: IXP, VPN Backbone]

Common automation targets: device and link provisioning, configuration changes, monitoring / probing, troubleshooting ...

For large-scale networks, we need to automate all aspects of management:

● demand forecasting and capacity planning
● high-level network design
● detailed network design
● ordering materials (racks, switches, cables, etc.)
● installing the physical network (for human operators)
● configuring devices and controllers
● monitoring the state of all components of the network
● diagnosing problems


Automation needs data

In order to automate safely, we need precise and accurate data about our networks.

Examples

● high-level plans for connectivity (future)

● low-level details of connectivity (soon / current)

● device and controller configuration

● access control policies

● routing policies

● IP address assignments


A common standard for representing network topology

Multi-Abstraction-Layer Topology (MALT):

● Google's internal standard for (almost) all representations of network topology / structure
● provides interoperability between many software systems
● multiple layers of abstraction
● extensibility and evolution
● used to implement declarative network management systems
● supported by an extensive software ecosystem


Why a standard representation?

Prior to adopting MALT, we had lots of ad hoc producer-consumer agreements:
● knowledge was often hidden in code

[Diagram: n producers (P1 ... Pn) and m consumers (C1 ... Cm). No standard: m*n agreements. With a single standard: m+n agreements.]

A standard representation:
● decouples producers and consumers
● exposes knowledge in the data, rather than hiding it in code
● enables the development of shared infrastructure
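The m*n versus m+n arithmetic is easy to check with a small Python sketch. The producer and consumer counts used below are hypothetical, chosen only to show how quickly pairwise agreements grow.

```python
def agreements_without_standard(num_producers: int, num_consumers: int) -> int:
    # Every producer negotiates a separate format with every consumer.
    return num_producers * num_consumers


def agreements_with_standard(num_producers: int, num_consumers: int) -> int:
    # Each producer and each consumer agrees only with the shared standard.
    return num_producers + num_consumers


# Hypothetical example: 10 producers and 20 consumers of topology data.
print(agreements_without_standard(10, 20))  # 200
print(agreements_with_standard(10, 20))     # 30
```

The gap widens as both sides grow: the pairwise count scales with the product, the standard with the sum.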


Example -- intent-driven configuration of the WAN

[Diagram components: intent API, workflows, configuration orchestration, config generation service, config DB, network model; output: config and operational commands]


Other network-model clients: health checking, monitoring configuration, peer management, network planning, and more.


Basics of MALT

MALT is an entity-relationship model:
● entities represent things: real or abstract
● entities have entity-kinds, names, and attributes
● relationships connect entities, and don't have attributes

Examples:
● real entities: routers, connectors, fibers, server machines, racks, buildings
● abstract entities: Clos networks, tunnel / trunk links, groupings of all sorts
● relationships: contains, aggregates, controls, configured_on

MALT today has:
● >250 entity-kinds
● ~20 relationship-kinds
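A minimal Python sketch of these basics, assuming an in-memory representation: entities carry a kind, a name, and attributes, while relationships carry only a kind and two endpoints. The class and method names (`Model`, `relate`, `targets`) are hypothetical; the slides do not show MALT's actual programming API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityId:
    kind: str  # entity-kind, e.g. "EK_ROUTER"
    name: str  # entity name, e.g. "X"


@dataclass
class Entity:
    eid: EntityId
    attrs: dict = field(default_factory=dict)  # only entities carry attributes


@dataclass(frozen=True)
class Relationship:
    kind: str  # relationship-kind, e.g. "RK_CONTAINS"; deliberately no attributes
    src: EntityId
    dst: EntityId


class Model:
    """Hypothetical in-memory entity-relationship graph (not MALT's real API)."""

    def __init__(self):
        self.entities: dict[EntityId, Entity] = {}
        self.relationships: set[Relationship] = set()

    def add_entity(self, kind: str, name: str, **attrs) -> EntityId:
        eid = EntityId(kind, name)
        self.entities[eid] = Entity(eid, dict(attrs))
        return eid

    def relate(self, kind: str, src: EntityId, dst: EntityId) -> None:
        # Reject dangling relationships: both endpoints must already exist.
        if src not in self.entities or dst not in self.entities:
            raise KeyError(f"unknown endpoint in {kind}: {src} -> {dst}")
        self.relationships.add(Relationship(kind, src, dst))

    def targets(self, src: EntityId, kind: str) -> list[EntityId]:
        # Entities reachable from src by exactly one relationship of this kind.
        found = [r.dst for r in self.relationships if r.src == src and r.kind == kind]
        return sorted(found, key=lambda e: (e.kind, e.name))


# A fragment of the one-L3-link example (router X, interface X:1.0, port X:1).
m = Model()
x = m.add_entity("EK_ROUTER", "X")
x_if = m.add_entity("EK_INTERFACE", "X:1.0")
x_port = m.add_entity("EK_PORT", "X:1")
m.relate("RK_CONTAINS", x, x_if)
m.relate("RK_TRAVERSES", x_if, x_port)
print(m.targets(x, "RK_CONTAINS"))  # [EntityId(kind='EK_INTERFACE', name='X:1.0')]
```

Keeping attributes off relationships means anything worth describing (such as a link) becomes an entity in its own right, which is why links appear as EK_* entities in the examples that follow.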


Trivial entity-relationship graph (one L3 link)

[Diagram, entities highlighted. L3 elements: EK_ROUTER X and Y; EK_INTERFACE X:1.0 and Y:1.0; EK_LOGICAL_PACKET_LINK X:1.0 - Y:1.0 and Y:1.0 - X:1.0. L2 elements: EK_PORT X:1 and Y:1; EK_PHYSICAL_PACKET_LINK X:1 - Y:1 and Y:1 - X:1. Relationship kinds: RK_CONTAINS, RK_TRAVERSES, RK_ORIGINATES, RK_TERMINATES.]


"This looks too verbose"

MALT is meant for computers, not for humans!
● computers are good at processing graphs with millions of entities
● software is bad at making inferences -- better to be explicit and have too much detail

But we can still express MALT graphs in text, when we have to:

    EK_ROUTER/X RK_CONTAINS EK_INTERFACE/X:1.0
    EK_INTERFACE/X:1.0 RK_TRAVERSES EK_PORT/X:1
    EK_ROUTER/Y RK_CONTAINS EK_INTERFACE/Y:1.0
    EK_INTERFACE/Y:1.0 RK_TRAVERSES EK_PORT/Y:1
    EK_LOGICAL_PACKET_LINK/"X:1.0 - Y:1.0" RK_TRAVERSES EK_PHYSICAL_PACKET_LINK/"X:1 - Y:1"
    EK_PORT/X:1 RK_ORIGINATES EK_PHYSICAL_PACKET_LINK/"X:1 - Y:1"
    EK_PORT/Y:1 RK_TERMINATES EK_PHYSICAL_PACKET_LINK/"X:1 - Y:1"
    EK_INTERFACE/X:1.0 RK_ORIGINATES EK_LOGICAL_PACKET_LINK/"X:1.0 - Y:1.0"
    EK_INTERFACE/Y:1.0 RK_TERMINATES EK_LOGICAL_PACKET_LINK/"X:1.0 - Y:1.0"

(this is about 80% of the previous diagram)
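Because every relationship is explicit, a consumer can validate a model mechanically instead of inferring missing structure. The Python sketch below assumes the relationships above are already available as tuples (not parsed from the text form), and assumes a hypothetical rule that each link has exactly one originating and one terminating endpoint.

```python
from collections import Counter

# The relationships from the text form above, as (source, relationship-kind, target)
# tuples whose endpoints are (entity-kind, name) pairs.
TRIPLES = [
    (("EK_ROUTER", "X"), "RK_CONTAINS", ("EK_INTERFACE", "X:1.0")),
    (("EK_INTERFACE", "X:1.0"), "RK_TRAVERSES", ("EK_PORT", "X:1")),
    (("EK_ROUTER", "Y"), "RK_CONTAINS", ("EK_INTERFACE", "Y:1.0")),
    (("EK_INTERFACE", "Y:1.0"), "RK_TRAVERSES", ("EK_PORT", "Y:1")),
    (("EK_LOGICAL_PACKET_LINK", "X:1.0 - Y:1.0"), "RK_TRAVERSES",
     ("EK_PHYSICAL_PACKET_LINK", "X:1 - Y:1")),
    (("EK_PORT", "X:1"), "RK_ORIGINATES", ("EK_PHYSICAL_PACKET_LINK", "X:1 - Y:1")),
    (("EK_PORT", "Y:1"), "RK_TERMINATES", ("EK_PHYSICAL_PACKET_LINK", "X:1 - Y:1")),
    (("EK_INTERFACE", "X:1.0"), "RK_ORIGINATES",
     ("EK_LOGICAL_PACKET_LINK", "X:1.0 - Y:1.0")),
    (("EK_INTERFACE", "Y:1.0"), "RK_TERMINATES",
     ("EK_LOGICAL_PACKET_LINK", "X:1.0 - Y:1.0")),
]

LINK_KINDS = {"EK_PHYSICAL_PACKET_LINK", "EK_LOGICAL_PACKET_LINK"}


def check_link_endpoints(triples):
    """Return problems found; each link needs exactly one originator and one terminator."""
    links = {e for s, _, d in triples for e in (s, d) if e[0] in LINK_KINDS}
    counts = Counter((d, rk) for _, rk, d in triples
                     if rk in ("RK_ORIGINATES", "RK_TERMINATES"))
    problems = []
    for link in sorted(links):
        for rk in ("RK_ORIGINATES", "RK_TERMINATES"):
            if counts[(link, rk)] != 1:
                problems.append(f"{link[0]}/{link[1]}: {counts[(link, rk)]} x {rk}")
    return problems


print(check_link_endpoints(TRIPLES))  # []
# Dropping the last relationship is reported explicitly rather than guessed around:
print(check_link_endpoints(TRIPLES[:-1]))
```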


Abstractions go deep

[Diagram labels, from the router and physical_packet_link down to the fiber plant: optical transponder/muxponder card (client port, line port); ODU; OTU; OCH / OMSG; OMS; OTS; optical common equipment (mux/demux, ROADM) with local port and DWDM port; Media_link; fiber segments (SMF28 fiber, LEAF fiber, fiber joint); optical amplifier]

Example: Optical Transport Network hierarchy (used in WANs)


Modeling a simple switched network topology

[Diagram: two EK_MACHINEs and two EK_PACKET_SWITCHes, each containing EK_PORTs, connected by pairs of EK_PHYSICAL_PACKET_LINKs (one per direction). Relationship kinds: RK_CONTAINS, RK_ORIGINATES, RK_TERMINATES.]


Entity attributes

Attributes allow us to express intent and status for specific points in the topology.

partial examples for EK_PORT and EK_INTERFACE (protobuf notation):

    port_attr: <
      device_port_name: "port-1/24"
      openflow: <
        of_port_number: 24
      >
      port_role: PR_SINGLETON
      port_attributes: <
        physical_capacity_bps: 40000000000
      >
      dropped_packets_per_second: 3
    >
    interface_attr: <
      address: <
        ipv4: <
          address: "10.1.2.3"
          prefixlen: 32
        >
        ipv6: <
          address: "1111:2222:3333:4444::"
          prefixlen: 64
        >
      >
    >


In this example, dropped_packets_per_second is an observed attribute; the other fields are intent attributes.
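A minimal Python sketch of how a consumer might keep intent and observed attributes apart and validate the interface addresses. It assumes the attributes have been copied into plain dictionaries (real consumers would presumably use generated protobuf classes, which the slides do not show) and assumes a hand-written list of which fields are observed status.

```python
import ipaddress

# The EK_PORT and EK_INTERFACE attributes above, copied into plain dicts.
port_attr = {
    "device_port_name": "port-1/24",
    "openflow": {"of_port_number": 24},
    "port_role": "PR_SINGLETON",
    "port_attributes": {"physical_capacity_bps": 40_000_000_000},
    "dropped_packets_per_second": 3,
}
interface_attr = {
    "address": {
        "ipv4": {"address": "10.1.2.3", "prefixlen": 32},
        "ipv6": {"address": "1111:2222:3333:4444::", "prefixlen": 64},
    }
}

# Assumption: the observed (status) fields are listed by hand for this sketch.
OBSERVED_PORT_FIELDS = {"dropped_packets_per_second"}


def split_intent_observed(attrs: dict, observed_fields: set) -> tuple[dict, dict]:
    """Separate what was requested (intent) from what was measured (observed)."""
    intent = {k: v for k, v in attrs.items() if k not in observed_fields}
    observed = {k: v for k, v in attrs.items() if k in observed_fields}
    return intent, observed


def interface_addresses(attr: dict) -> list:
    # Build ipaddress interface objects from address/prefixlen pairs; a malformed
    # address or prefix length raises ValueError instead of passing through.
    return [ipaddress.ip_interface(f"{a['address']}/{a['prefixlen']}")
            for a in attr["address"].values()]


intent, observed = split_intent_observed(port_attr, OBSERVED_PORT_FIELDS)
print(sorted(observed))  # ['dropped_packets_per_second']
print(intent["port_attributes"]["physical_capacity_bps"] / 1e9, "Gb/s")  # 40.0 Gb/s
for iface in interface_addresses(interface_attr):
    print(iface.version, iface.network)
```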


MALT's software ecosystem

MALT's representation would be useless without a rich software ecosystem:
● libraries to support common operations and hide some details
● autogenerated schema documentation
● model visualization and network visualization UIs
● a domain-specific query language
● a scalable, reliable storage system


MALT queries

Most applications navigate small regions of a model, not an entire graph
● e.g., generate config for a single device; figure out what fails if a rack dies

MALT has a query language to make this reasonably efficient
● challenging tradeoff between expressive power and usability
● raw query language is still confusing to many programmers
  ○ added a layer of "canned queries" with specific semantics
    ■ e.g., "all L2 links between a pair of switches" or "rack that contains a line card"
  ○ canned queries also insulate clients against many kinds of schema changes

Why didn't we use SQL queries?
● reduce client coupling to the underlying SQL schema (more details in paper)
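To illustrate what a canned query buys a client, here is a Python sketch of "all L2 links between a pair of switches" over a tiny in-memory graph. The function name, the graph representation, and the assumption that L2 links are EK_PHYSICAL_PACKET_LINKs whose endpoint ports are contained by the switches are all hypothetical; MALT's real canned queries run against its own query language and storage.

```python
from collections import defaultdict

# Hypothetical toy model: (source, relationship-kind, target), entities as (kind, name).
EDGES = [
    (("EK_PACKET_SWITCH", "s1"), "RK_CONTAINS", ("EK_PORT", "s1:1")),
    (("EK_PACKET_SWITCH", "s1"), "RK_CONTAINS", ("EK_PORT", "s1:2")),
    (("EK_PACKET_SWITCH", "s2"), "RK_CONTAINS", ("EK_PORT", "s2:1")),
    (("EK_PACKET_SWITCH", "s2"), "RK_CONTAINS", ("EK_PORT", "s2:2")),
    (("EK_PORT", "s1:1"), "RK_ORIGINATES", ("EK_PHYSICAL_PACKET_LINK", "s1:1 - s2:1")),
    (("EK_PORT", "s2:1"), "RK_TERMINATES", ("EK_PHYSICAL_PACKET_LINK", "s1:1 - s2:1")),
    (("EK_PORT", "s2:1"), "RK_ORIGINATES", ("EK_PHYSICAL_PACKET_LINK", "s2:1 - s1:1")),
    (("EK_PORT", "s1:1"), "RK_TERMINATES", ("EK_PHYSICAL_PACKET_LINK", "s2:1 - s1:1")),
]


class Graph:
    def __init__(self, edges):
        self.out = defaultdict(list)  # (source, relationship-kind) -> targets
        self.inc = defaultdict(list)  # (target, relationship-kind) -> sources
        for s, rk, d in edges:
            self.out[(s, rk)].append(d)
            self.inc[(d, rk)].append(s)

    def container_of(self, entity):
        parents = self.inc[(entity, "RK_CONTAINS")]
        return parents[0] if parents else None


def l2_links_between(g, switch_a, switch_b):
    """Canned query: links whose two endpoint ports sit on the two given switches."""
    result = set()
    for port in g.out[(switch_a, "RK_CONTAINS")] + g.out[(switch_b, "RK_CONTAINS")]:
        for link in g.out[(port, "RK_ORIGINATES")]:
            terminators = g.inc[(link, "RK_TERMINATES")]
            ends = {g.container_of(port)} | {g.container_of(p) for p in terminators}
            if ends == {switch_a, switch_b}:
                result.add(link)
    return sorted(result)


s1, s2 = ("EK_PACKET_SWITCH", "s1"), ("EK_PACKET_SWITCH", "s2")
for link in l2_links_between(Graph(EDGES), s1, s2):
    print(link[1])
```

The client only calls `l2_links_between`; if the schema later inserted, say, an intermediate entity between switches and ports, only the canned query would need to change.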


Storage: MALTshop

Single (replicated) service for storing MALT models:
● implement and operate just one high-availability service, not lots of them
● promote controlled sharing between applications and teams
● ensure there's an easy way to find anything across all of our network models

MALTshop:
● supports many, many named "shards" with ACLs + immutable-version semantics
● efficient support for incremental updates, queries, etc.
● based on Spanner for scale and geo-consistency
● currently: thousands of shards, millions of entities/shard, 1000s of queries/sec
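The "named shards with ACLs + immutable-version semantics" bullet can be illustrated with a toy in-memory Python sketch: each incremental update publishes a new version, and older versions stay readable unchanged. The class name, the reader/writer ACL model, and the string entity keys are hypothetical simplifications; MALTshop itself is built on Spanner, and its API is not described in these slides.

```python
import copy


class Shard:
    """Hypothetical named shard: every update publishes a new immutable version."""

    def __init__(self, name, writers, readers):
        self.name = name
        self.writers = set(writers)
        self.readers = set(readers) | self.writers  # writers may also read
        self._versions = [{}]  # version 0 is the empty model

    def latest_version(self):
        return len(self._versions) - 1

    def read(self, principal, version=None):
        if principal not in self.readers:
            raise PermissionError(f"{principal} may not read {self.name}")
        v = self.latest_version() if version is None else version
        # Hand out a copy so no caller can mutate a published version.
        return copy.deepcopy(self._versions[v])

    def apply_update(self, principal, upserts, deletes=()):
        """Incremental update: start from the latest version, apply a delta, publish."""
        if principal not in self.writers:
            raise PermissionError(f"{principal} may not write {self.name}")
        new = dict(self._versions[-1])  # a new dict; older versions are never modified
        for key in deletes:
            new.pop(key, None)
        new.update(copy.deepcopy(upserts))
        self._versions.append(new)
        return self.latest_version()


shard = Shard("hypothetical/wan-l3", writers={"l3-designer"}, readers={"config-gen"})
v1 = shard.apply_update("l3-designer", {"EK_ROUTER/X": {}, "EK_ROUTER/Y": {}})
v2 = shard.apply_update("l3-designer", {}, deletes=["EK_ROUTER/Y"])
print(sorted(shard.read("config-gen", v1)))  # ['EK_ROUTER/X', 'EK_ROUTER/Y']
print(sorted(shard.read("config-gen")))      # ['EK_ROUTER/X']
```

Because versions never change after publication, a consumer can pin a version number and get repeatable reads while producers keep writing.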


We learned a lot of lessons, the hard way

● schema design principles (and the need to be rigorous about them)

● support for schema evolution

● structure design pipelines as dataflow graphs, not shared-database updates

● use different models for different phases of a network's lifecycle

● migrating users from older representations (it's really hard)

● the dangers of string-parsing (avoid!)

● using human-readable names for entities (not our best idea)

● a good representation doesn't save you from dirty data


Schema design principles
● fewer entity-kinds do not make the schema simpler
  ○ overloaded concepts lead to ambiguity, which leads to complex/fragile code
● instead, favor orthogonality and separation of aspects
  ○ orthogonality: two "things" with mostly-disjoint attributes/relationships should be two EKs
  ○ separation of aspects: complex things (e.g., routers) can be modeled as multiple EKs (data plane, chassis, etc.)
● bias toward explicit relationships rather than name-based attributes
  ○ there are some interesting trade-offs, however
● use relationship-kinds consistently
  ○ otherwise, it's harder to create straightforward queries
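The preference for explicit relationships over name-based attributes (and the later lesson about string parsing) can be illustrated with a small Python sketch. The "<device>:<port>" naming convention, the device name "Z", and the lookup table are hypothetical; "port-1/24" is the device port name from the attributes example.

```python
# Two ways for a consumer to find which device a port belongs to.

# (1) Name-based: infer the device from the port's name. This only works if
# every producer follows a hypothetical "<device>:<port>" naming convention.
def device_from_name(port_name: str) -> str:
    device, sep, _ = port_name.partition(":")
    if not sep:
        raise ValueError(f"cannot infer device from port name {port_name!r}")
    return device


# (2) Explicit: look up the RK_CONTAINS relationship recorded by the producer,
# simplified here to a port -> containing-device table (hypothetical data).
CONTAINED_BY = {
    "X:1": "X",
    "port-1/24": "Z",
}


def device_from_relationship(port_name: str) -> str:
    return CONTAINED_BY[port_name]


print(device_from_relationship("port-1/24"))  # Z
print(device_from_name("X:1"))                # X
try:
    device_from_name("port-1/24")  # a vendor-style name breaks the convention
except ValueError as err:
    print(err)
```

The explicit lookup works for any name; the parser silently encodes an assumption that only some producers honor.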


Schema evolution

Networks are complex and we're constantly adding new modeling use cases
● MALT schema needs to continually evolve

We use multiple processes to manage evolution:
● curation of schema changes via expert "review board" + a written Style Guide
● versioned "profiles" to further constrain schema for specific parts of our networks
  ○ machine-checkable profile enforces contract between producers + consumers
  ○ automated model generation allows producers to create the same data for multiple profiles
● "canned queries" insulate most consumers from much of our evolution

Abstraction is vital, but taxonomy is hard -- even for experts
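A machine-checkable profile can be as simple as an allow-list of entity-kinds and of (source kind, relationship kind, target kind) combinations. The Python sketch below assumes exactly that; the profile name, version, and allowed combinations are invented for illustration and are not MALT's actual profile format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical versioned profile: the subset of the schema a model may use."""
    name: str
    version: int
    entity_kinds: frozenset
    relationships: frozenset  # allowed (source-kind, relationship-kind, target-kind)


def check(profile, entities, edges):
    """Return violations; an empty list means the model conforms to the profile."""
    errors = []
    for kind, name in entities:
        if kind not in profile.entity_kinds:
            errors.append(f"entity {kind}/{name} not allowed by {profile.name} v{profile.version}")
    for (sk, sn), rk, (dk, dn) in edges:
        if (sk, rk, dk) not in profile.relationships:
            errors.append(f"{sk}/{sn} {rk} {dk}/{dn} not allowed")
    return errors


WAN_L3_V1 = Profile(
    name="wan-l3", version=1,
    entity_kinds=frozenset({"EK_ROUTER", "EK_INTERFACE", "EK_PORT"}),
    relationships=frozenset({
        ("EK_ROUTER", "RK_CONTAINS", "EK_INTERFACE"),
        ("EK_INTERFACE", "RK_TRAVERSES", "EK_PORT"),
    }),
)

entities = [("EK_ROUTER", "X"), ("EK_INTERFACE", "X:1.0"), ("EK_PORT", "X:1")]
edges = [
    (("EK_ROUTER", "X"), "RK_CONTAINS", ("EK_INTERFACE", "X:1.0")),
    (("EK_INTERFACE", "X:1.0"), "RK_TRAVERSES", ("EK_PORT", "X:1")),
    # Not in this hypothetical profile's allow-list, so it is flagged.
    (("EK_ROUTER", "X"), "RK_CONTAINS", ("EK_PORT", "X:1")),
]
for problem in check(WAN_L3_V1, entities, edges):
    print(problem)
```

A producer that passes `check` for a given profile version and a consumer written against that version then share an explicit, testable contract.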


Why we prefer dataflow design pipelines to databases

[Diagram: dataflow-style design pipeline. Demand forecast and human inputs -> automated high-level designer -> high-level network design -> automated L3 designer (with L3 design rules) -> detailed L3 network design -> automated L1 designer (with L1 design rules and spatial data) -> detailed L1 network design. Each detailed design feeds its own consumers (L3 consumers, L1 consumers).]

[Diagram: database-style design pipeline. The same human inputs, automated designers, design rules, and spatial data, but the designers read from and write to a shared topology database, from which the L3 and L1 consumers read.]

Dataflow-style pipeline:

● Clear ownership of data at each stage

● Clear producer-consumer contracts

● Easy to create test datasets

● Easy to re-run the pipeline when things change

● Easy to insert validations at each step


Database-style pipeline:

● Stages are unclear

● Ownership is global

● Fuzzy producer-consumer contracts

● Hard to create test datasets

● Hard to re-run the pipeline, because you first have to undo the previous updates
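A minimal Python sketch of the dataflow style, assuming each stage can be written as a function from its input dataset to a new output dataset. The stage names loosely mirror the diagram, but the design rules inside them are invented placeholders, not Google's designers.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def high_level_designer(inputs: dict) -> dict:
    # Placeholder rule: one trunk per site pair with nonzero forecast demand.
    return {"trunks": sorted(p for p, gbps in inputs["demand"].items() if gbps > 0)}


def l3_designer(high_level: dict) -> dict:
    # Placeholder rule: two parallel L3 links per trunk.
    return {"l3_links": [(a, b, i) for a, b in high_level["trunks"] for i in range(2)]}


def validate_l3(design: dict) -> dict:
    # A validation step is just another stage that checks and passes data through.
    if any(a == b for a, b, _ in design["l3_links"]):
        raise ValueError("self-loop in L3 design")
    return design


def run_pipeline(data: dict, stages: list[Stage]) -> list[tuple[str, dict]]:
    # Each stage reads only its predecessor's output and returns a new value;
    # nothing is updated in place, so re-running never requires an undo step.
    outputs = []
    for stage in stages:
        data = stage(data)
        outputs.append((stage.__name__, data))
    return outputs


forecast = {"demand": {("sfo", "nyc"): 400, ("nyc", "lon"): 0}}
for name, out in run_pipeline(forecast, [high_level_designer, l3_designer, validate_l3]):
    print(name, out)
```

When the forecast changes, the whole pipeline is simply run again on the new input, and any single stage can be tested by handing it a small hand-made dataset.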


Thanks!

● automation requires both low-level detail and abstraction

● abstraction is hard and requires support for controlled evolution

● a data-exchange format needs a full software ecosystem

● network models tie together all of our network management automation

● network management: it's about the whole lifecycle, not just the running network


Additional material


Example: MALT for a multi-phase network design pipeline

[Diagram: demand forecast and human inputs -> automated high-level designer -> high-level network design -> automated L3 designer (L3 design rules) -> detailed L3 network design -> automated L1 designer (L1 design rules, spatial data) -> detailed L1 network design. L3 consumers: device config, SDN controllers, etc. L1 consumers: materials ordering, fiber installation, etc. Key: MALT data, process step, other data.]

Generate network designs automatically:
● Start with high-level abstractions
● Expand detail at each step, based on additional data


Example MALT query

    # Given a device, find its geographical information and
    # the ports and interfaces it contains.
    cmd { find { match { id { kind: EK_DEVICE name: 'foo' }}}}
    cmd {
      branch {
        # Expand backwards.
        sequence {
          cmd {
            follow_until {
              kind: RK_CONTAINS dir: DIR_BACKWARDS
              target { match { id { kind: EK_CONTINENT }}}
            }
          }
        }
        # Expand forwards.
        sequence {
          cmd {
            follow_until {
              kind: RK_CONTAINS
              target {
                match { id { kind: EK_PORT } }
                match { id { kind: EK_INTERFACE } }
              }
            }
          }
        }
      }
    }
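To make the query's two branches concrete, here is a Python sketch that mimics them over a tiny in-memory containment graph. It assumes follow_until means "walk RK_CONTAINS edges in the given direction, recording each entity visited and stopping at entities of a target kind"; the real query engine's semantics may differ. The EK_METRO and EK_BUILDING kinds and every name other than 'foo' are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical RK_CONTAINS data as (container, contained) pairs.
CONTAINS = [
    (("EK_CONTINENT", "europe"), ("EK_METRO", "dub")),
    (("EK_METRO", "dub"), ("EK_BUILDING", "dub1")),
    (("EK_BUILDING", "dub1"), ("EK_DEVICE", "foo")),
    (("EK_DEVICE", "foo"), ("EK_PORT", "foo:1")),
    (("EK_DEVICE", "foo"), ("EK_INTERFACE", "foo:1.0")),
]

children, parents = defaultdict(list), defaultdict(list)
for container, contained in CONTAINS:
    children[container].append(contained)  # forwards direction
    parents[contained].append(container)   # backwards direction


def follow_until(start, adjacency, target_kinds):
    """Breadth-first walk from start; entities of a target kind are recorded, not expanded."""
    seen, visited, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt in seen:
                continue
            seen.add(nxt)
            visited.append(nxt)
            if nxt[0] not in target_kinds:
                queue.append(nxt)
    return visited


device = ("EK_DEVICE", "foo")  # stands in for the result of the find step
upward = follow_until(device, parents, {"EK_CONTINENT"})
downward = follow_until(device, children, {"EK_PORT", "EK_INTERFACE"})
print([name for _, name in upward])    # ['dub1', 'dub', 'europe']
print([name for _, name in downward])  # ['foo:1', 'foo:1.0']
```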
