A Cloud Gateway - A Large Scale Company’s First Line of Defense. Mikey Cohen, Manager - Edge Gateway, Netflix

Rethinking Cloud Proxies

Page 1: Rethinking Cloud Proxies

A Cloud Gateway - A Large Scale Company’s First Line of Defense

Mikey Cohen
Manager - Edge Gateway

Netflix

Page 2: Rethinking Cloud Proxies

Today, more than 36% of North America’s internet traffic is controlled by systems in the Amazon Cloud

Page 3: Rethinking Cloud Proxies
Page 4: Rethinking Cloud Proxies

Global Streaming of TV Shows and Movies

Page 5: Rethinking Cloud Proxies

Nearly 70 Million Subscribers

In over 80 Countries

Page 6: Rethinking Cloud Proxies

Netflix accounts for over 36% of Downstream Traffic in North America

Page 7: Rethinking Cloud Proxies

From the Internet to Services in the Cloud

[Diagram labels: Gateway, ?????, Origin (API), API, Website]

Page 8: Rethinking Cloud Proxies

Our Edge Gateway @ Netflix

● Handles most netflix.com hosts
● Over 20 production Zuul clusters
● ~50 ELBs
● Gateway handles ~10 origin services

Page 9: Rethinking Cloud Proxies

Netflix Gateway Scale

● Tens of billions of requests per day
● 3 AWS regions
● Over 1000 device types
● Hundreds of permutations of protocols and device versions

Page 10: Rethinking Cloud Proxies

Our Journey

● Success
● Evolution
● Scale
● Failure

Page 11: Rethinking Cloud Proxies

So What!? - Change your perspective!!

Page 12: Rethinking Cloud Proxies

Traditional Cloud Proxy Mission

● Simple static rule-based routing
● API portal
● Request authentication
● Throttling - request caps
● Monitoring
● Caching

Page 13: Rethinking Cloud Proxies

The Gateway - a grown-up proxy!

● Dynamic routing
● Deep Insights
● Load balancing
● Availability focused
● Service protection
● Quality assurance tool

Page 14: Rethinking Cloud Proxies

Evolving to a Gateway

Page 15: Rethinking Cloud Proxies

Netflix’s Public API

● Late 2008
● Mashery
● Datacenter

Page 16: Rethinking Cloud Proxies

Streaming Devices using public API

Early Streaming Devices - 2009

● Windows Media Center
● Xbox
● PS3

Page 17: Rethinking Cloud Proxies

Migration to AWS

● 2010
● Sonoa / Apigee proxy
● Device traffic, not public
● Controlling DC -> cloud migration
● Running in AWS
● Under Netflix control

Page 18: Rethinking Cloud Proxies

Streaming Success

● 2011
● Chaos
● Complexity
● Failure
● Success
● Leveraging Cloud benefits

Page 19: Rethinking Cloud Proxies

Anti-patterns of most cloud proxies

● Static configurations
● Service push needed to change behavior
● Limited range of functionality
● Limited to HTTP

Page 20: Rethinking Cloud Proxies

Zuul Created

● 2012
● Dynamically injected and compiled filters
● Manipulate requests and responses: Headers / Body / etc
● Change routing
● Add metrics and other functions
● Built on Netflix’s OSS stack
● Open Sourced
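
The idea of filters that are injected and compiled while the proxy is running can be sketched as follows. This is an illustration in Python, not Zuul's actual API (Zuul runs on the JVM); `FilterRegistry`, `load_filter`, and the convention that each filter defines `apply(request)` are hypothetical names chosen for this example.

```python
from typing import Callable, Dict

Request = Dict[str, object]


class FilterRegistry:
    """Holds filters that can be added or replaced while the proxy runs."""

    def __init__(self) -> None:
        self._filters: Dict[str, Callable[[Request], Request]] = {}

    def load_filter(self, name: str, source: str) -> None:
        # Compile the filter source at runtime; it must define apply(request).
        # Assumption: the source comes from a trusted filter store.
        namespace: Dict[str, object] = {}
        exec(compile(source, name, "exec"), namespace)
        self._filters[name] = namespace["apply"]

    def run(self, request: Request) -> Request:
        # Apply filters in name order so the result is deterministic.
        for name in sorted(self._filters):
            request = self._filters[name](request)
        return request


# Hypothetical filter: manipulate headers.
ADD_HEADER = """
def apply(request):
    headers = dict(request.get("headers", {}))
    headers["X-Gateway"] = "edge"
    return {**request, "headers": headers}
"""

# Hypothetical filter: change routing based on the path.
REROUTE_TEST = """
def apply(request):
    if request.get("path", "").startswith("/test"):
        return {**request, "route": "test"}
    return request
"""

registry = FilterRegistry()
registry.load_filter("10_add_header", ADD_HEADER)
registry.load_filter("20_reroute_test", REROUTE_TEST)
result = registry.run({"path": "/test/titles", "route": "v1", "headers": {}})
```

Calling `load_filter` again with an existing name replaces that filter without a restart or redeploy, which is the property the slide contrasts with static proxy configurations.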

Page 21: Rethinking Cloud Proxies

Zuul - A Victim of Success

● Easy and convenient
● Instant results
● High adoption
● Happy customers

● Business logic in proxy
● Affects system resiliency
● Zuul team in critical path

Page 22: Rethinking Cloud Proxies

Creating a Gateway Strategy

Page 23: Rethinking Cloud Proxies

Principles of Netflix’s Gateway Strategy

● Creative Routing
● Dynamic Routing
● Delivery Focused
● Traffic Shaping
● React Fast
● Insights

Page 24: Rethinking Cloud Proxies

Creative Routing - Subclusters with Purpose

[Diagram labels: Gateway, Origin (API); subclusters v1, v2, test, debug, baseline, canary, “sticky” canary, “sticky” baseline, FIT, Instrumented, squeeze]

Page 25: Rethinking Cloud Proxies

Red / Green Deployments

[Diagram labels: Gateway, Origin (API); subclusters v1, v2, test, debug, baseline, canary, “sticky” canary, “sticky” baseline, FIT, Instrumented, squeeze]

Page 26: Rethinking Cloud Proxies

Developer Test Branches

[Diagram labels: Gateway, Origin (API); subclusters v1, v2, test, debug, baseline, canary, “sticky” canary, “sticky” baseline, FIT, Instrumented, squeeze]

Page 27: Rethinking Cloud Proxies

Instrumented Clusters

[Diagram labels: Gateway, Origin (API); subclusters v1, v2, test, debug, baseline, canary, “sticky” canary, “sticky” baseline, FIT, Instrumented, squeeze]

Page 28: Rethinking Cloud Proxies

Squeeze Testing

[Diagram labels: Gateway, Origin (API); subclusters v1, v2, test, debug, baseline, canary, “sticky” canary, “sticky” baseline, FIT, Instrumented, squeeze]

Page 29: Rethinking Cloud Proxies

Targeted Routing

[Diagram labels: Gateway, Origin (API); subclusters v1, v2, test, debug, baseline, canary, “sticky” canary, “sticky” baseline, FIT, Instrumented, squeeze]

Page 30: Rethinking Cloud Proxies

Service “Canarying”

[Diagram labels: Gateway, Origin (API); subclusters v1, v2, test, debug, baseline, canary, “sticky” canary, “sticky” baseline, FIT, Instrumented, squeeze]

Page 31: Rethinking Cloud Proxies

“Sticky” Canary

[Diagram labels: Gateway, Origin (API); subclusters v1, v2, test, debug, baseline, canary, “sticky” canary, “sticky” baseline, FIT, Instrumented, squeeze]
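
The slides do not define “sticky”. One common reading, assumed here, is that a given device is pinned to the same subcluster on every request, so canary and baseline populations stay stable for comparison. A minimal sketch of that assignment uses a hash of the device identifier; `assign_cluster`, the cluster names, and the default percentages are hypothetical.

```python
import hashlib


def assign_cluster(device_id: str, canary_percent: float = 1.0,
                   baseline_percent: float = 1.0) -> str:
    """Map a device to a subcluster; the same device always gets the same answer."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # Use the first 4 bytes as a bucket in [0, 10000), i.e. 0.01% resolution.
    bucket = int.from_bytes(digest[:4], "big") % 10_000
    canary_cutoff = int(canary_percent * 100)
    baseline_cutoff = canary_cutoff + int(baseline_percent * 100)
    if bucket < canary_cutoff:
        return "sticky-canary"
    if bucket < baseline_cutoff:
        return "sticky-baseline"
    return "main"
```

Because the bucket depends only on the device identifier, no session state is needed at the gateway, and any gateway instance makes the same decision for the same device.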

Page 32: Rethinking Cloud Proxies

Failure Injection Testing

[Diagram labels: Gateway, Origin (API); subclusters v1, v2, test, debug, baseline, canary, “sticky” canary, “sticky” baseline, FIT, Instrumented, squeeze]
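
The slides name Failure Injection Testing without describing the mechanism. One plausible shape, assumed here, is a gateway filter that tags requests from selected devices with a simulated failure, which downstream call sites then honor by taking their fallback path. The class, the header name `X-Simulated-Failure`, and `call_ratings_service` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Request:
    device_id: str
    headers: Dict[str, str] = field(default_factory=dict)


class FailureInjectionFilter:
    """Tags requests from selected devices with a simulated failure."""

    HEADER = "X-Simulated-Failure"  # hypothetical header name

    def __init__(self, failure_point: str, target_devices: Set[str]) -> None:
        self.failure_point = failure_point
        self.target_devices = target_devices

    def apply(self, request: Request) -> Request:
        # Only the targeted devices are affected; all other traffic is untouched.
        if request.device_id in self.target_devices:
            request.headers[self.HEADER] = self.failure_point
        return request


def call_ratings_service(request: Request) -> str:
    # A downstream call site honors the tag by skipping the real call
    # and returning its fallback, as it would if the dependency were down.
    if request.headers.get(FailureInjectionFilter.HEADER) == "ratings":
        return "fallback: default ratings"
    return "live ratings"


fit = FailureInjectionFilter("ratings", target_devices={"test-device-1"})
tagged = fit.apply(Request(device_id="test-device-1"))
untouched = fit.apply(Request(device_id="customer-device"))
```

Scoping the tag to an explicit device set keeps the test's impact limited to the devices that opted in.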

Page 33: Rethinking Cloud Proxies

Degraded Experience Testing

[Diagram labels: Gateway, Origin (API); subclusters v1, v2, test, debug, baseline, canary, “sticky” canary, “sticky” baseline, FIT, Instrumented, squeeze]

Page 34: Rethinking Cloud Proxies

Traffic Shaping

Page 35: Rethinking Cloud Proxies

A Global Cloud Deployment

[Diagram: three regions, each showing a Persistence Tier, Business services Tier, Presentation Tier and Network Tier, with Websites, API, Proxy and DB components]

Page 36: Rethinking Cloud Proxies

Global Cloud Routing

[Diagram: three regions, each showing a Persistence Tier, Business services Tier, Presentation Tier and Network Tier, with Websites, API, Proxy and DB components]

Page 37: Rethinking Cloud Proxies

A Failing region

[Diagram: three regions, each showing a Persistence Tier, Business services Tier, Presentation Tier and Network Tier, with Websites, API, Proxy and DB components]

Page 38: Rethinking Cloud Proxies

Gateway routing to other regions

[Diagram: three regions, each showing a Persistence Tier, Business services Tier, Presentation Tier and Network Tier, with Websites, API, Proxy and DB components]
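
Routing a failing region's traffic to the other regions can be sketched as a small planning function. This is an illustration under stated assumptions, not Netflix's implementation: `plan_region_routing`, the region names, and the notion of a per-region spare-capacity number are hypothetical.

```python
from typing import Dict


def plan_region_routing(home_region: str, healthy: Dict[str, bool],
                        capacity: Dict[str, int]) -> Dict[str, float]:
    """Return the share of home_region traffic each region should receive."""
    if healthy.get(home_region, False):
        # Normal case: the region serves its own traffic.
        return {home_region: 1.0}
    # Failing region: split its traffic across healthy regions,
    # in proportion to the spare capacity each one reports.
    survivors = {r: capacity[r] for r, ok in healthy.items()
                 if ok and r != home_region and capacity.get(r, 0) > 0}
    total = sum(survivors.values())
    if total == 0:
        raise RuntimeError("no healthy region with spare capacity")
    return {region: spare / total for region, spare in survivors.items()}


# Region names below are illustrative only.
plan = plan_region_routing(
    "us-east-1",
    healthy={"us-east-1": False, "us-west-2": True, "eu-west-1": True},
    capacity={"us-west-2": 300, "eu-west-1": 100},
)
```

Weighting by spare capacity rather than splitting evenly is a design choice made for this sketch; it avoids overloading a smaller region when a larger one fails.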

Page 39: Rethinking Cloud Proxies

Attack prevention

[Diagram labels: Gateway, Origin (API), API, Website]

Page 40: Rethinking Cloud Proxies

Smart Load Balancing

[Diagram labels: Gateway, Origin (API)]

Page 41: Rethinking Cloud Proxies

Smart Load Balancing - Bad Nodes

[Diagram labels: Gateway, Origin (API)]

Page 42: Rethinking Cloud Proxies

Gateway Backoff and Blacklists Bad Nodes

[Diagram labels: Gateway, Origin (API)]
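
Backing off from and blacklisting bad nodes can be sketched as a round-robin balancer that stops sending traffic to a node after repeated failures and retries it after a timeout. This is a minimal sketch, not the gateway's actual load balancer; `BlacklistingBalancer`, the failure threshold, and the backoff values are hypothetical. The current time is passed in explicitly so the behavior is easy to follow and test.

```python
import itertools
from typing import Dict, List


class BlacklistingBalancer:
    """Round-robin balancer that skips nodes after repeated failures."""

    def __init__(self, nodes: List[str], max_failures: int = 3,
                 base_backoff: float = 5.0) -> None:
        self.nodes = nodes
        self.max_failures = max_failures
        self.base_backoff = base_backoff
        self.failures: Dict[str, int] = {n: 0 for n in nodes}
        self.blocked_until: Dict[str, float] = {n: 0.0 for n in nodes}
        self._cycle = itertools.cycle(nodes)

    def choose(self, now: float) -> str:
        # Try each node at most once per call, skipping blacklisted ones.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if now >= self.blocked_until[node]:
                return node
        raise RuntimeError("all nodes are blacklisted")

    def report(self, node: str, ok: bool, now: float) -> None:
        if ok:
            # A success clears the failure count.
            self.failures[node] = 0
            return
        self.failures[node] += 1
        if self.failures[node] >= self.max_failures:
            # Exponential backoff: each further failure doubles the timeout.
            extra = self.failures[node] - self.max_failures
            self.blocked_until[node] = now + self.base_backoff * (2 ** extra)


lb = BlacklistingBalancer(["node-a", "node-b"], max_failures=2, base_backoff=10.0)
lb.report("node-a", ok=False, now=0.0)
lb.report("node-a", ok=False, now=0.0)   # node-a is now blocked until t=10
first = lb.choose(now=1.0)
second = lb.choose(now=2.0)
after_backoff = lb.choose(now=10.0)
```

The same bookkeeping could be keyed by availability zone instead of by node to skip a whole failing zone.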

Page 43: Rethinking Cloud Proxies

Zone Failure - Blacklist the Zone automatically

[Diagram labels: Gateway, Origin (API)]

Page 44: Rethinking Cloud Proxies

React Quickly - Runtime Filter changes

[Diagram labels: Gateway, Origin (API), API, Website, Runtime Policy Injection]

Page 45: Rethinking Cloud Proxies

A Room with a View - Insights

[Diagram labels: Gateway, Origin (API), API, Website, Insights]

Page 46: Rethinking Cloud Proxies

What’s Next for Netflix’s Gateway?

Gateway as a service
● Self-service dynamic routing / route validation
● Control APIs for special routing functions

Netty Based Zuul (using RxNetty)
● Handling persistent connections
● Non-blocking, async

Transport protocol agnostic routing
● Reactive Socket http://reactivesocket.io/

Page 47: Rethinking Cloud Proxies

Top Ten Lessons Learned

Page 48: Rethinking Cloud Proxies

Build for handling Failures

Page 49: Rethinking Cloud Proxies

Expect the Unexpected

Page 50: Rethinking Cloud Proxies

Using Routing Creatively

Page 51: Rethinking Cloud Proxies

Shard to Reduce Blast Radius

Page 52: Rethinking Cloud Proxies

Devices are Weird
Protocols are Weird

Page 53: Rethinking Cloud Proxies

Devices are Forever
Protocols are Forever

Page 54: Rethinking Cloud Proxies

It will be built “wrong”

Page 55: Rethinking Cloud Proxies

Keep Business Logic out of your Gateway