Networking Challenges for the Next Decade
Amin Vahdat
On behalf of Google Technical Infrastructure and Google Cloud Platform
APRIL 4, 2017

Google Network: More than a collection of data centers
[Map: network fiber, points of presence (>100), Google Global Cache edge nodes, and the cables Unity (US, JP) 2010, SJC (JP, HK, SG) 2013, FASTER (US, JP, TW) 2016]

Google Cloud Regions: Adding 11 new regions
[Map of current and future regions with the number of zones in each: Frankfurt, Singapore, S Carolina, N Virginia, Belgium, London, Taiwan, Mumbai, Sydney, Oregon, Iowa, São Paulo, Finland, Tokyo, Montreal, California, Netherlands]

Ubiquitous Cloud... 10x Scaling
• Datacenter (10x): Next-gen disaggregation of storage, memory and compute
• Campus & Metro (10x): Cloud regions and campus expansion driving DC interconnect
• WAN (10x): Cloud replication and bandwidth-intensive cloud services (e.g., turnkey video, IoT)
Step Function Disruptions: Bandwidth, Latency, Availability, Predictability

The Pillars of SDN @ Google
• B4: WAN interconnect
• Andromeda: NFV and network virtualization
• Jupiter: Datacenter networking

The Pillars of SDN @ Google
• B4: WAN interconnect
• Andromeda: NFV and network virtualization
• Jupiter: Datacenter networking
• Espresso: SDN for the public Internet

B4: Google's Software Defined WAN
References: B4 [Jain et al., SIGCOMM 2013]; BwE [Kumar et al., SIGCOMM 2015]

B4: From Copy Network to Business Critical
[Chart: B4 traffic, 2012 to 2016]
References: B4 [Jain et al., SIGCOMM 2013]; BwE [Kumar et al., SIGCOMM 2015]

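Andromeda
[Diagram: tenant virtual networks (VNET: 10.1.1/24, VNET: 192.168.32/24, VNET: 5.4/16) served by Andromeda NFV functions (Load Balancing, DoS, ACLs, VPN) and Google Infrastructure Services over the internal network, whose hosts sit behind ToR switches in subnets 10.1.1/24, 10.1.2/24, 10.1.3/24, and 10.1.4/24]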
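The Andromeda diagram shows tenant virtual networks whose address ranges (for example VNET 10.1.1/24) can coincide with each other and even with the physical 10.1.x/24 subnets behind the ToRs. A minimal sketch of why that is safe, assuming a forwarding table scoped per VNET that maps a virtual destination to the physical host running the VM. The class, method names, VNET ids, and host addresses are hypothetical illustrations, not Andromeda's actual data structures.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


class VnetForwardingTable:
    """Hypothetical per-VNET table: virtual prefix -> physical host running the VM.

    Routes are scoped by VNET id, so two tenants can both use 10.1.1.0/24
    (or reuse the physical 10.1.x.0/24 ranges) without colliding.
    """

    def __init__(self) -> None:
        self._routes: Dict[str, List[Tuple[IPNetwork, IPAddress]]] = {}

    def add_route(self, vnet: str, prefix: str, physical_host: str) -> None:
        entry = (ipaddress.ip_network(prefix), ipaddress.ip_address(physical_host))
        self._routes.setdefault(vnet, []).append(entry)

    def lookup(self, vnet: str, dst: str) -> Optional[IPAddress]:
        """Longest-prefix match on dst, searching only the sender's own VNET."""
        addr = ipaddress.ip_address(dst)
        best: Optional[Tuple[IPNetwork, IPAddress]] = None
        for net, host in self._routes.get(vnet, []):
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, host)
        return best[1] if best is not None else None


if __name__ == "__main__":
    table = VnetForwardingTable()
    # Two tenants reuse the same virtual prefix; each maps to a different physical host.
    table.add_route("vnet-a", "10.1.1.0/24", "10.1.2.17")
    table.add_route("vnet-b", "10.1.1.0/24", "10.1.3.42")
    print(table.lookup("vnet-a", "10.1.1.5"))      # 10.1.2.17
    print(table.lookup("vnet-b", "10.1.1.5"))      # 10.1.3.42
    print(table.lookup("vnet-b", "192.168.32.1"))  # None
```

Keying the lookup on the VNET id first is what provides isolation here: the same destination address resolves differently depending on which tenant sent the packet.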

Google Datacenter Network Innovation: And hardware scale that we could not buy
[Chart: capacity over time across datacenter fabric generations: 4 Post, Firehose 1.0, Firehose 1.1, Watchtower, Saturn, Jupiter]
1.3 Pb/s clusters in 2013

The Pillars of SDN @ Google
• B4: WAN interconnect
• Andromeda: NFV and network virtualization
• Jupiter: Datacenter networking
• Public Internet?

The Pillars of SDN @ Google
• B4: WAN interconnect
• Andromeda: NFV and network virtualization
• Jupiter: Datacenter networking
• Espresso: SDN for the public Internet

Espresso in Context
[Diagram: Google Jupiter data center and B4]

Espresso in Context
[Diagram: Google Jupiter data centers, B4, B2, and a peering metro]

Espresso in Context
[Diagram: Google Jupiter data centers, B4, B2, Espresso in the peering metro, the Internet, and the user]

Espresso: Before and After
• Cloud 1.0, Router-Centric Protocols: local view, connectivity first, coarse fault recovery
• Espresso, SDN Peering: per-metro and global view, application signals, real-time optimization

Espresso Architecture Overview
[Diagram: an Espresso metro with a label-switched fabric, a peering fabric, and a BGP speaker holding eBGP peering with an external peer]

Espresso Architecture Overview
[Diagram: an Espresso metro with a label-switched fabric, a peering fabric, a BGP speaker with eBGP peering to an external peer, and hosts running packet processors]
Labeled packets specify egress

Espresso Architecture Overview
[Diagram: an Espresso metro with a label-switched fabric, a peering fabric, a BGP speaker with eBGP peering to an external peer, hosts running packet processors, local control, and a global controller that receives application signals]
Labeled packets specify egress
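The architecture slides split the work: a global controller uses application signals to decide where traffic should leave the metro, and host packet processors stamp a label that specifies that egress. A minimal sketch of that split under stated assumptions: each candidate egress carries a hypothetical measured RTT (the "application signal") and spare capacity, and the controller greedily assigns one label per destination prefix. The `Egress` fields, the greedy rule, and the label values are illustrative inventions, not Espresso's actual controller algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Egress:
    label: int          # hypothetical label naming one peering port
    peer: str           # external peer reached through that port
    rtt_ms: float       # application signal: measured RTT to clients via this egress
    spare_gbps: float   # unused capacity on the egress link


def build_label_map(options: Dict[str, List[Egress]],
                    demand_gbps: Dict[str, float]) -> Dict[str, int]:
    """Controller side: pick one egress label per destination prefix.

    Greedy rule (an assumption, not Espresso's algorithm): place the largest
    demands first, choose the lowest-RTT egress that still has room, and
    fall back to the egress with the most remaining capacity if none fits.
    """
    remaining: Dict[int, float] = {}
    for candidates in options.values():
        for e in candidates:
            remaining.setdefault(e.label, e.spare_gbps)

    label_map: Dict[str, int] = {}
    for prefix in sorted(options, key=lambda p: demand_gbps[p], reverse=True):
        need = demand_gbps[prefix]
        candidates = options[prefix]
        fits = [e for e in candidates if remaining[e.label] >= need]
        if fits:
            best = min(fits, key=lambda e: e.rtt_ms)
        else:
            best = max(candidates, key=lambda e: remaining[e.label])
        remaining[best.label] -= need
        label_map[prefix] = best.label
    return label_map


def label_for(label_map: Dict[str, int], prefix: str, default_label: int) -> int:
    """Host side: stamp the controller-chosen label on outgoing packets.

    Uses an exact prefix key for brevity; a real host would do longest-prefix match.
    """
    return label_map.get(prefix, default_label)


if __name__ == "__main__":
    a = Egress(label=101, peer="peer-a", rtt_ms=12.0, spare_gbps=40.0)
    b = Egress(label=102, peer="peer-b", rtt_ms=20.0, spare_gbps=100.0)
    options = {"203.0.113.0/24": [a, b], "198.51.100.0/24": [a, b]}
    demand = {"203.0.113.0/24": 30.0, "198.51.100.0/24": 25.0}
    print(build_label_map(options, demand))
    # {'203.0.113.0/24': 101, '198.51.100.0/24': 102}
```

In the example, both prefixes would prefer the lower-RTT peer-a, but after the larger demand is placed there, only peer-b has room for the second. That is the kind of capacity-aware, signal-driven choice a router with only a local view cannot make.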

Next Decade Challenges in Networking
The next wave in computing
• Serverless compute in Cloud 3.0
• IoT
• Tightly coupled, general purpose distributed computing
It’s time to put it all together
• Agile Scale
• Jitter
• Isolation
• Performance is great, but only meaningful with availability, manageability, and velocity

Cloud 1.0 (Last Decade)
Virtualization delivers capex savings to enterprise DCs

Cloud 2.0 (Now): HW on Demand
• Public cloud frees enterprise from private HW infrastructure
• Scheduling, load balancing primitives, “big data” query processing

The Third Wave of Cloud Computing
Cloud 3.0: Compute, not servers
• Serverless compute, real-time intelligence, and machine learning
• Not data placement, load balancing, OS configuration and patching

The Third Wave of Cloud Computing
Networking should be aiming for Cloud 3.0

Networking and Cloud 3.0
• Storage disaggregation: the datacenter is the storage appliance
• Seamless telemetry and scale up/down
• Transparent live migration
• Open Marketplace of services, securely placed and accessed

Networking and Cloud 3.0
• Applications + Functions, not VMs
• Policy, not middleboxes
• Actionable Intelligence, not data processing
• SLOs, not placement/load balancing/scheduling

Next Decade Challenges in Networking
• The network will enable next-generation compute infrastructure
• The network can define next-generation storage infrastructure
• The right network infrastructure can deliver fundamental new capability

How we Prioritize Infrastructure Work
• Availability
• Manageability
• Velocity
• Stranding
• Performance

Availability is Paramount
• First things first: an insecure infrastructure is an unavailable infrastructure
• Stability is more important than efficiency
• Network management is critical
• Configuration is hard
• Automation matters but can be counter to availability
Reference: “Evolve or Die: High-Availability Design Principles Drawn from Google’s Network Infrastructure.” SIGCOMM 2016.

Build for Velocity
• Velocity is the speed of iteration
• Retrospective on “Tussle in Cyberspace: Defining Tomorrow’s Internet”
• Build for hitless upgrades and self-validation
• Debugging and tracing matter
  ○ Without visibility, performance does not matter
• Network fabrics built for expansion and evolution
• Launch and Iterate

Isolation is Critical; Stranding is Terrible
Isolation with reservations is easy but leads to huge resource stranding
● General-purpose, shared infrastructure that approximates custom-built, reserved infrastructure
Isolation has many components
● Latency and bandwidth, but also the control plane
● Accounting and chargeback are big missing pieces
Congestion Control is still really hard
● Rationalizing multiple control loops: flow, endpoint, flow group, Traffic Engineering
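The claim that reservations strand resources comes down to arithmetic. If each tenant reserves its own peak, provisioned capacity is the sum of the individual peaks. A shared pool only needs the peak of the combined demand, because tenants rarely peak at the same moment. A small worked sketch with made-up demand numbers; the tenant names and values are hypothetical, and a real shared fabric still needs the isolation mechanisms listed above to make that sharing safe.

```python
from typing import Dict, List


def reserved_capacity(demand: Dict[str, List[float]]) -> float:
    """Isolation by reservation: every tenant is provisioned for its own peak."""
    return sum(max(series) for series in demand.values())


def shared_capacity(demand: Dict[str, List[float]]) -> float:
    """Shared pool: provision for the peak of the combined demand.

    Assumes every tenant's series covers the same time slots.
    """
    combined = [sum(slot) for slot in zip(*demand.values())]
    return max(combined)


def stranded_fraction(demand: Dict[str, List[float]]) -> float:
    """Share of reserved capacity that a shared pool would not have needed."""
    reserved = reserved_capacity(demand)
    return (reserved - shared_capacity(demand)) / reserved


if __name__ == "__main__":
    # Hypothetical per-tenant bandwidth demand (Gb/s) over four time slots.
    tenants = {
        "batch": [10, 80, 20, 10],
        "serving": [60, 20, 30, 70],
        "analytics": [20, 10, 70, 20],
    }
    print(reserved_capacity(tenants))           # 220
    print(shared_capacity(tenants))             # 120
    print(f"{stranded_fraction(tenants):.0%}")  # 45%
```

With these illustrative numbers, reservations provision 220 Gb/s for a combined peak of 120 Gb/s, so almost half the reserved capacity sits idle.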

Performance only Matters if End to End
Amdahl’s law applies: a localized optimization, however impressive, barely moves end-to-end performance, so one that takes any effort to adopt will be ignored
1. Scale
2. Jitter
3. Storage Disaggregation
Must optimize from the application all the way to the end user
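Amdahl's law makes the end-to-end point quantitative. If an optimization speeds up a fraction p of end-to-end time by a factor s, the overall speedup is 1 / ((1 - p) + p / s), which can never exceed 1 / (1 - p). A short sketch; the 10% fraction and 100x factor are illustrative values, not measurements from the talk.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of end-to-end time is made s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    # A 100x faster component that accounts for 10% of request latency:
    print(round(amdahl_speedup(0.10, 100.0), 3))  # 1.11
    # Even an infinitely fast component caps out at 1 / (1 - p):
    print(round(amdahl_speedup(0.10, float("inf")), 3))  # 1.111
```

A 100x local improvement yields roughly an 11% end-to-end gain. This is why the slide argues for optimizing the whole path from application to end user rather than one hop.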

How we Prioritize Infrastructure Work
• Availability
• Manageability
• Velocity
• Stranding
• Performance

Next Decade Challenges in Networking
The next wave of computing
• Serverless compute in Cloud 3.0
• IoT
• Tightly coupled, general purpose distributed computing
It’s time to put it all together
• Agile Scale
• Jitter
• Isolation
• Performance is great, but only meaningful with availability, manageability, and velocity

Thank You!

Open Source
• Google MapReduce
• Google Bigtable
• Google Borg
• Google Dremel

Open Source
• TCP BBR
• gRPC
• OpenConfig
• QUIC ...