A Cloud Gateway - A Large Scale Company’s First Line of Defense

Mikey Cohen

Manager - Edge Gateway

Netflix


Today, more than 36% of North America’s internet traffic is controlled by systems in the Amazon Cloud

Global Streaming of TV Shows and Movies

Nearly 70 Million Subscribers

In over 80 Countries

Netflix accounts for over 36% of Downstream Traffic in North America

From the Internet to Services in the Cloud

[Diagram labels: Gateway, "?????", Origin (API), API, Website]

Our Edge Gateway @ Netflix

Handles most netflix.com hosts

Over 20 production Zuul clusters

~50 ELBs

Gateway handles ~10 origin services

Netflix Gateway Scale

Tens of billions of requests per day

3 AWS regions

Over 1000 device types

Hundreds of permutations of protocols and device versions

Success

Evolution

Scale

Failure

Our Journey

So What!? - Change your perspective!!

Traditional Cloud Proxy Mission

Simple static rule-based routing

API portal

Request authentication

Throttling - request caps

Monitoring

Caching

The Gateway - a grown-up proxy!

● Dynamic routing

● Deep Insights

● Load balancing

● Availability focused

● Service protection

● Quality assurance tool

Evolving to a Gateway

Netflix’s Public API

Late 2008

Mashery

Datacenter

Streaming Devices using public API

Early Streaming Devices - 2009

Windows Media Center

Xbox

PS3

Migration to AWS

2010

Sonoa / Apigee proxy

Device traffic, not public

Controlling DC -> cloud migration

Running in AWS

Under Netflix control

Streaming Success

2011

Chaos

Complexity

Failure

Success

Leveraging Cloud benefits

Anti-patterns of most cloud proxies

Static configurations

Service push needed to change behavior

Limited range of functionality

Limited to HTTP

Zuul Created

2012

Dynamically injected and compiled filters

Manipulate requests and responses

Headers / Body / etc.

Change routing

Add metrics and other functions

Built on Netflix’s OSS stack

Open Sourced
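The slide describes filters that are injected and compiled at runtime and that can manipulate requests and change routing. Zuul itself runs on the JVM; the following is only a minimal Python sketch of the general idea, not Zuul's API. The names `FilterRegistry`, `load`, and `run`, the `apply(request)` convention, and the origin names are all assumptions made for illustration.

```python
from typing import Callable, Dict

Request = Dict[str, object]


class FilterRegistry:
    """Holds request filters that can be added or replaced while running."""

    def __init__(self) -> None:
        self._filters: Dict[str, Callable[[Request], Request]] = {}

    def load(self, name: str, source: str) -> None:
        # Compile filter source at runtime; the source must define apply(request).
        # exec() runs arbitrary code, so only trusted source should reach here.
        namespace: Dict[str, object] = {}
        exec(compile(source, f"<filter:{name}>", "exec"), namespace)
        apply = namespace.get("apply")
        if not callable(apply):
            raise ValueError(f"filter {name!r} does not define apply(request)")
        self._filters[name] = apply  # replaces any earlier version of the filter

    def run(self, request: Request) -> Request:
        # Apply filters in name order so the chain is deterministic.
        for name in sorted(self._filters):
            request = self._filters[name](request)
        return request


registry = FilterRegistry()
registry.load("10_add_header", """
def apply(request):
    headers = dict(request.get("headers", {}))
    headers["X-Gateway"] = "sketch"
    return {**request, "headers": headers}
""")
registry.load("20_route", """
def apply(request):
    target = "api-v2" if request["path"].startswith("/v2/") else "api-v1"
    return {**request, "origin": target}
""")

routed = registry.run({"path": "/v2/titles", "headers": {}})
```

Calling `load` again with an existing name swaps the filter without restarting the process, which is the property the slide contrasts with static configurations that need a service push.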

Zuul - A Victim of Success

Easy and convenient

Instant results

High adoption

Happy customers

Business logic in proxy

Affects system resiliency

Zuul team in critical path

Creating a Gateway

Strategy

Principles of Netflix’s Gateway Strategy

Creative Routing

Dynamic Routing

Delivery Focused

Traffic Shaping

React Fast

Insights

Creative Routing - Subclusters with Purpose

[Diagram: Gateways in front of Origin (API) subclusters labeled v1, v2, test, debug, Instrumented, squeeze, canary, baseline, “sticky” canary, “sticky” baseline, FIT]

Red / Green Deployments


Developer Test Branches


Instrumented Clusters


Squeeze Testing


Targeted Routing

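Targeted routing can be pictured as an ordered list of rules that send matching requests to a named subcluster. This is a minimal sketch under assumptions: the `Rule` type, the `pick_subcluster` function, the request fields `debug` and `device_type`, and the device name are hypothetical and not taken from the talk. Only the subcluster names come from the diagram.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Request = Dict[str, str]


@dataclass
class Rule:
    subcluster: str                       # destination subcluster name
    matches: Callable[[Request], bool]    # predicate over request attributes


def pick_subcluster(request: Request, rules: List[Rule], default: str = "v1") -> str:
    """Return the subcluster of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(request):
            return rule.subcluster
    return default


# Hypothetical rules: debug-flagged traffic first, then one device type.
rules = [
    Rule("debug", lambda r: r.get("debug") == "true"),
    Rule("test", lambda r: r.get("device_type") == "example-tv"),
]
```

Because the first match wins, more specific rules belong earlier in the list.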

Service “Canarying”


“Sticky” Canary

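One plausible reading of a “sticky” canary is that the same customers keep landing on the canary (or its baseline) for the length of the test, rather than being re-sampled on each request. The sketch below gets that property by hashing a stable identifier into a bucket. The function name `assign_cluster`, its parameters, and the use of a customer ID as the key are assumptions for illustration.

```python
import hashlib


def assign_cluster(customer_id: str, canary_percent: float, baseline_percent: float) -> str:
    """Map a customer to a cluster deterministically, so the choice is sticky."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # First 8 bytes of the hash become a bucket in the range 0.00 .. 99.99.
    bucket = int.from_bytes(digest[:8], "big") % 10000 / 100.0
    if bucket < canary_percent:
        return "sticky-canary"
    if bucket < canary_percent + baseline_percent:
        return "sticky-baseline"
    return "main"
```

Giving the baseline the same share as the canary keeps the two groups comparable in size, and a customer's bucket does not change between requests.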

Failure Injection Testing

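A common way to run failure injection through a gateway is to tag selected requests at the edge and have downstream code fail on purpose when it sees the tag. The slide does not spell out the mechanism, so this is a sketch under stated assumptions: the header name `x-fit-fail`, the `account` field, the `ratings` service, and all function names are hypothetical.

```python
from typing import Dict, Set

FIT_HEADER = "x-fit-fail"  # hypothetical header name


class InjectedFailure(Exception):
    """Raised in place of a real dependency error during a test."""


def tag_request(headers: Dict[str, str], test_accounts: Set[str], fail_service: str) -> Dict[str, str]:
    """Gateway side: mark only requests from opted-in test accounts."""
    tagged = dict(headers)
    if headers.get("account") in test_accounts:
        tagged[FIT_HEADER] = fail_service
    return tagged


def call_service(name: str, headers: Dict[str, str]) -> str:
    """Origin side: fail deliberately when the tag names this dependency."""
    if headers.get(FIT_HEADER) == name:
        raise InjectedFailure(f"simulated failure of {name}")
    return f"{name}: ok"


def handle(headers: Dict[str, str]) -> str:
    """Serve a degraded response instead of an error when the call fails."""
    try:
        return call_service("ratings", headers)
    except InjectedFailure:
        return "ratings: fallback"
```

Limiting the tag to a small set of accounts keeps the blast radius of the experiment small while still exercising the fallback path end to end.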

Degraded Experience Testing


Traffic Shaping

A Global Cloud Deployment

[Diagram: three identical stacks, each with a Network Tier, Presentation Tier, Business services Tier, and Persistence Tier; components labeled Proxy, Websites, API, DB]

Global Cloud Routing


A Failing region


Gateway routing to other regions

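Routing around a failing region can be sketched as a lookup: serve from the home region when it is healthy, otherwise use the first healthy region in a fallback order. The region names, the `FALLBACKS` table, and the `route_region` function are hypothetical, and how region health is determined is outside this sketch.

```python
from typing import Dict, List, Set

# Hypothetical fallback order for each home region.
FALLBACKS: Dict[str, List[str]] = {
    "region-a": ["region-b", "region-c"],
    "region-b": ["region-c", "region-a"],
    "region-c": ["region-a", "region-b"],
}


def route_region(home: str, healthy: Set[str], fallbacks: Dict[str, List[str]]) -> str:
    """Pick the region that should serve a request whose home is `home`."""
    if home in healthy:
        return home
    for candidate in fallbacks.get(home, []):
        if candidate in healthy:
            return candidate
    raise RuntimeError(f"no healthy region available for {home}")
```

In practice the receiving regions also need enough spare capacity to absorb the redirected traffic, which this sketch does not model.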

Attack prevention


Smart Load Balancing

[Diagram: Gateways in front of Origin (API)]

Smart Load Balancing - Bad Nodes


Gateway Backoff and Blacklists Bad Nodes

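Backing off from bad nodes can be sketched as a load balancer that counts consecutive failures per node and, past a threshold, blocks the node for an exponentially growing interval. This is an illustrative sketch, not the gateway's actual algorithm. The `NodeBalancer` class, its thresholds, and the round-robin choice are assumptions. Time is passed in explicitly as `now` (seconds) so the behavior is easy to test.

```python
from typing import Dict, List


class NodeBalancer:
    """Round-robin balancer that temporarily blacklists failing nodes."""

    def __init__(self, nodes: List[str], failure_threshold: int = 3,
                 base_backoff: float = 1.0, max_backoff: float = 60.0) -> None:
        self.nodes = list(nodes)
        self.failure_threshold = failure_threshold
        self.base_backoff = base_backoff
        self.max_backoff = max_backoff
        self._failures: Dict[str, int] = {n: 0 for n in self.nodes}
        self._blocked_until: Dict[str, float] = {n: 0.0 for n in self.nodes}
        self._next = 0

    def record_success(self, node: str) -> None:
        # A success clears the failure streak and any remaining block.
        self._failures[node] = 0
        self._blocked_until[node] = 0.0

    def record_failure(self, node: str, now: float) -> None:
        self._failures[node] += 1
        over = self._failures[node] - self.failure_threshold
        if over >= 0:
            # Double the block time for each failure past the threshold.
            backoff = min(self.base_backoff * (2 ** over), self.max_backoff)
            self._blocked_until[node] = now + backoff

    def available(self, now: float) -> List[str]:
        return [n for n in self.nodes if self._blocked_until[n] <= now]

    def pick(self, now: float) -> str:
        # If every node is blocked, fall back to all nodes rather than fail.
        candidates = self.available(now) or self.nodes
        node = candidates[self._next % len(candidates)]
        self._next += 1
        return node
```

The same bookkeeping could be keyed by availability zone instead of node to take a whole zone out of rotation.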

Zone Failure - Blacklist the Zone automatically


React Quickly - Runtime Filter changes

Runtime Policy Injection

A Room with a View - Insights


Insights

What’s Next for Netflix’s Gateway?

Gateway as a service

Self-service dynamic routing / route validation

Control APIs for special routing functions

Netty Based Zuul (using RxNetty)

Handling persistent connections

Non-blocking, async

Transport protocol agnostic routing

Reactive Socket http://reactivesocket.io/

Top Ten Lessons Learned

Build for handling Failures

Expect the Unexpected

Using Routing Creatively

Shard to Reduce Blast Radius

Devices are Weird

Protocols are Weird

Devices are Forever

Protocols are Forever

It will be built “wrong”

Keep Business Logic out of your Gateway