A Cloud Gateway - A Large-Scale Company’s First Line of Defense
Mikey Cohen
Manager - Edge Gateway
Netflix
Today, more than 36% of North America’s internet traffic is controlled by systems in the Amazon Cloud
Global Streaming of TV Shows and Movies
Nearly 70 Million Subscribers
In over 80 Countries
Netflix accounts for over 36% of Downstream Traffic in North America
From the Internet to Services in the Cloud
[Diagram: Internet → Gateway → origin clusters for the API and the Website]
Our Edge Gateway @ Netflix
● Handles most netflix.com hosts
● Over 20 production Zuul clusters
● ~50 ELBs
● Gateway handles ~10 origin services
Netflix Gateway Scale
● Tens of billions of requests per day
● 3 AWS regions
● Over 1000 device types
● Hundreds of permutations of protocols and device versions
Our Journey
● Success
● Evolution
● Scale
● Failure
So What!? - Change your perspective!!
Traditional Cloud Proxy Mission
● Simple static rule-based routing
● API portal
● Request authentication
● Throttling - request caps
● Monitoring
● Caching
The Gateway - a grown-up proxy!
● Dynamic routing
● Deep insights
● Load balancing
● Availability focused
● Service protection
● Quality assurance tool
Evolving to a Gateway
Netflix’s Public API
Late 2008
● Mashery
● Datacenter
Streaming Devices using public API
Early Streaming Devices - 2009
● Windows Media Center
● Xbox
● PS3
Migration to AWS
2010
● Sonoa / Apigee proxy
● Device traffic, not public
● Controlling DC -> cloud migration
● Running in AWS
● Under Netflix control
Streaming Success
2011
● Chaos
● Complexity
● Failure
● Success
● Leveraging cloud benefits
Anti-patterns of most cloud proxies
● Static configurations
● Service push needed to change behavior
● Limited range of functionality
● Limited to HTTP
Zuul Created
2012
● Dynamically injected and compiled filters
● Manipulate requests and responses: headers, body, etc.
● Change routing
● Add metrics and other functions
● Built on Netflix’s OSS stack
● Open sourced
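The filter model on the slide (filters injected at runtime that manipulate requests, change routing, and add metrics) can be sketched as below. This is a minimal Python sketch, loosely modeled on the pre / route / post filter types that Zuul 1 exposes; it is not Zuul's actual API. `RequestContext`, `Filter`, and `FilterRegistry` are hypothetical names, and "dynamic injection" is approximated by adding or replacing filters in a registry while it is in use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RequestContext:
    """Per-request state shared by every filter (hypothetical shape)."""
    path: str
    headers: Dict[str, str] = field(default_factory=dict)
    route: str = ""
    metrics: List[str] = field(default_factory=list)


@dataclass
class Filter:
    name: str
    filter_type: str  # "pre", "route" or "post"
    order: int        # lower values run first within the same type
    should_filter: Callable[[RequestContext], bool]
    run: Callable[[RequestContext], None]


class FilterRegistry:
    """Filters can be added, replaced or removed while the gateway keeps running."""

    PHASES = ("pre", "route", "post")

    def __init__(self) -> None:
        self._filters: Dict[str, Filter] = {}

    def put(self, f: Filter) -> None:
        # Re-using a name swaps in the new version of that filter.
        self._filters[f.name] = f

    def remove(self, name: str) -> None:
        self._filters.pop(name, None)

    def execute(self, ctx: RequestContext) -> RequestContext:
        # Run each phase in turn; within a phase, run filters in ascending order.
        for phase in self.PHASES:
            phase_filters = [f for f in self._filters.values() if f.filter_type == phase]
            for f in sorted(phase_filters, key=lambda item: item.order):
                if f.should_filter(ctx):
                    f.run(ctx)
        return ctx
```

For example, a "route" filter with order 10 could send everything to a default origin, and a second route filter with order 20 could override that for paths starting with `/api`; removing the second filter at runtime restores the default behavior without redeploying the gateway.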
Zuul - A Victim of Success
● Easy and convenient
● Instant results
● High adoption
● Happy customers
● Business logic in the proxy
● Affects system resiliency
● Zuul team in the critical path
Creating a Gateway Strategy
Principles of Netflix’s Gateway Strategy
● Creative Routing
● Dynamic Routing
● Delivery Focused
● Traffic Shaping
● React Fast
● Insights
Creative Routing - Subclusters with Purpose
[Diagram: the Gateway routes traffic to purpose-built Origin (API) subclusters: v1, v2, test, debug, baseline, canary, “sticky” baseline, “sticky” canary, FIT, Instrumented, squeeze]
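Choosing among purpose-built subclusters amounts to matching each request against an ordered set of rules. The sketch below assumes a first-match-wins rule table; the header names (`x-debug`, `x-test-branch`, `app-version`) and the mapping to the `debug`, `test`, `v2`, and `v1` subclusters are hypothetical illustrations, not Netflix's real routing rules.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, str]

# Hypothetical rule table, evaluated top to bottom; the first match wins.
RULES: List[Tuple[Callable[[Request], bool], str]] = [
    (lambda r: r.get("x-debug") == "true", "debug"),
    (lambda r: "x-test-branch" in r, "test"),
    (lambda r: r.get("app-version", "").startswith("2."), "v2"),
]


def choose_subcluster(request: Request, default: str = "v1") -> str:
    """Pick the Origin (API) subcluster that should serve this request."""
    for matches, subcluster in RULES:
        if matches(request):
            return subcluster
    return default
```

Keeping the rules as data rather than code paths is what makes them cheap to change at runtime: adding a new subcluster means adding one row.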
Red / Green Deployments
[Diagram: Gateway → Origin (API) subclusters, same layout as the previous slide]
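A red / green switch between the v1 and v2 subclusters can be expressed as a weighted split whose weights the gateway changes at runtime: 100/0 before the switch, 0/100 after, and back again to roll back. This is a minimal sketch assuming weighted random selection; `WeightedRouter` is a hypothetical name.

```python
import random
from typing import Dict, Optional


class WeightedRouter:
    """Hypothetical router that splits traffic across subclusters by weight."""

    def __init__(self, weights: Dict[str, float], rng: Optional[random.Random] = None) -> None:
        self.rng = rng or random.Random()
        self.set_weights(weights)

    def set_weights(self, weights: Dict[str, float]) -> None:
        # Swapping weights at runtime is what flips (or ramps) traffic between colors.
        if sum(weights.values()) <= 0:
            raise ValueError("at least one weight must be positive")
        self.names = list(weights)
        self.weights = [weights[n] for n in self.names]

    def pick(self) -> str:
        # A zero weight means that subcluster is never chosen.
        return self.rng.choices(self.names, weights=self.weights, k=1)[0]
```

Intermediate weights such as 90/10 give a gradual cut-over instead of a hard flip; the rollback path is the same call with the old weights.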
Developer Test Branches
[Diagram: Gateway → Origin (API) subclusters, same layout as the previous slide]
Instrumented Clusters
[Diagram: Gateway → Origin (API) subclusters, same layout as the previous slide]
Squeeze Testing
[Diagram: Gateway → Origin (API) subclusters, same layout as the previous slide]
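The slides do not say how squeeze results are evaluated. One plausible reading, sketched here under assumptions: the gateway raises the load sent to the squeeze subcluster in steps, latency and errors are recorded at each step, and the useful result is the highest load reached before a threshold was first breached. `Step`, `max_sustainable_rps`, and the 500 ms / 1% thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    rps: float         # requests per second routed to the squeezed instances
    p99_ms: float      # observed 99th percentile latency at that load
    error_rate: float  # fraction of failed requests at that load


def max_sustainable_rps(steps: List[Step], max_p99_ms: float = 500.0,
                        max_error_rate: float = 0.01) -> Optional[float]:
    """Return the highest load reached before the first step that breached a threshold."""
    last_good = None
    for step in sorted(steps, key=lambda s: s.rps):
        if step.p99_ms > max_p99_ms or step.error_rate > max_error_rate:
            break
        last_good = step.rps
    return last_good
```

Stopping at the first breach, rather than taking the best step overall, avoids reporting a load level that only looked healthy after the instances had already started to degrade.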
Targeted Routing
[Diagram: Gateway → Origin (API) subclusters, same layout as the previous slide]
Service “Canarying”
[Diagram: Gateway → Origin (API) subclusters, same layout as the previous slide]
“Sticky” Canary
[Diagram: Gateway → Origin (API) subclusters, same layout as the previous slide]
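A “sticky” canary keeps the same customers on the canary for every request, so a customer's whole session is exercised against the new code. A common way to get that stickiness is to hash a stable customer identifier into a fixed number of buckets. The sketch below assumes SHA-256 bucketing and an equal-sized “sticky” baseline slice for comparison; `bucket_for`, `sticky_route`, the salt, and the subcluster names are hypothetical.

```python
import hashlib

BUCKETS = 10_000


def bucket_for(customer_id: str, salt: str = "sticky-canary") -> int:
    """Map a customer to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def sticky_route(customer_id: str, percent: float) -> str:
    """Send `percent`% of customers to the canary and an equal slice to a baseline."""
    slice_size = int(BUCKETS * percent / 100)
    b = bucket_for(customer_id)
    if b < slice_size:
        return "sticky-canary"
    if b < 2 * slice_size:
        return "sticky-baseline"
    return "production"
```

Giving the baseline the same size as the canary keeps the comparison fair: both slices see similar traffic volumes, and only the code under test differs. Changing the salt reshuffles which customers land in each slice.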
Failure Injection Testing
[Diagram: Gateway → Origin (API) subclusters, same layout as the previous slide]
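Failure injection (and degraded experience) testing works best when only selected requests are affected. The sketch assumes the gateway tags chosen requests with a scenario name and that downstream code asks whether a failure should be simulated for a given dependency. The `x-fit-scenario` header, the scenario names, and the `ratings` / `bookmarks` services are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Failure:
    kind: str          # "error" or "latency"
    status: int = 500  # status to return for "error" failures
    delay_ms: int = 0  # extra delay for "latency" failures


# Hypothetical scenarios: scenario name -> {dependency name -> failure to simulate}.
SCENARIOS: Dict[str, Dict[str, Failure]] = {
    "ratings-down": {"ratings": Failure("error", status=503)},
    "slow-bookmarks": {"bookmarks": Failure("latency", delay_ms=2000)},
}


def injected_failure(headers: Dict[str, str], service: str) -> Optional[Failure]:
    """Return the failure to simulate for `service`, or None for normal behavior."""
    scenario = SCENARIOS.get(headers.get("x-fit-scenario", ""))
    if scenario is None:
        return None
    return scenario.get(service)
```

Because untagged requests never match a scenario, the blast radius of a test is limited to the traffic the gateway chose to tag.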
Degraded Experience Testing
[Diagram: Gateway → Origin (API) subclusters, same layout as the previous slide]
Traffic Shaping
A Global Cloud Deployment
[Diagram: three AWS regions, each with a Network Tier (Proxy), Presentation Tier (Websites, API), Business services Tier, and Persistence Tier (DB)]
Global Cloud Routing
[Diagram: the three-region deployment from the previous slide]
A Failing region
[Diagram: the three-region deployment, with one region failing]
Gateway routing to other regions
[Diagram: the three-region deployment, with the Gateway routing the failing region’s traffic to the other regions]
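Routing around a failing region reduces to walking a per-region preference list and taking the first healthy entry. In this sketch the AWS region IDs and their failover order are assumptions chosen for illustration, and the health flags are assumed to come from some separate health-checking system.

```python
from typing import Dict, List

# Hypothetical failover preferences: home region first, then fallbacks in order.
FAILOVER_ORDER: Dict[str, List[str]] = {
    "us-east-1": ["us-east-1", "us-west-2", "eu-west-1"],
    "us-west-2": ["us-west-2", "us-east-1", "eu-west-1"],
    "eu-west-1": ["eu-west-1", "us-east-1", "us-west-2"],
}


def target_region(home: str, healthy: Dict[str, bool]) -> str:
    """Return the region that should serve traffic whose home region is `home`."""
    for region in FAILOVER_ORDER[home]:
        if healthy.get(region, False):  # unknown health is treated as unhealthy
            return region
    raise RuntimeError("no healthy region available")
```

Healthy regions keep serving their own traffic; only the failing region's share moves, which keeps the extra load on the surviving regions predictable.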
Attack prevention
[Diagram: Internet → Gateway → origin clusters for the API and the Website]
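The slides list request caps under throttling but do not name an algorithm. A token bucket per client is one standard way to implement such caps, so the sketch below uses it as a swapped-in technique, not as Netflix's stated method. The client key (for example, a source IP or device ID), rate, and burst capacity are assumptions; the injectable `clock` exists only to make the behavior testable.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ClientThrottle:
    """Keeps one bucket per client key (hypothetical keying scheme)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = self.buckets[client] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.allow()
```

A production version would also need to evict idle buckets, since this dictionary grows with every new client key.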
Smart Load Balancing
[Diagram: Gateway → Origin (API) nodes]
Smart Load Balancing - Bad Nodes
[Diagram: Gateway → Origin (API) nodes]
Gateway Backs Off and Blacklists Bad Nodes
[Diagram: Gateway → Origin (API) nodes]
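Backing off from and blacklisting bad nodes can be sketched as a per-node failure counter: after a run of consecutive failures the node is skipped for a backoff period that doubles each time it is blacklisted again, and a success resets its record. `NodeHealth`, its thresholds, and the doubling policy are assumptions for illustration, not Zuul's or Ribbon's actual implementation.

```python
import random
import time
from typing import Callable, Dict, List, Optional


class NodeHealth:
    """Hypothetical per-node tracker: repeated failures put a node on a timed blacklist."""

    def __init__(self, failure_threshold: int = 3, base_backoff: float = 1.0,
                 max_backoff: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.base_backoff = base_backoff
        self.max_backoff = max_backoff
        self.clock = clock
        self.failures: Dict[str, int] = {}  # consecutive failures since the last reset
        self.strikes: Dict[str, int] = {}   # times the node has been blacklisted
        self.until: Dict[str, float] = {}   # clock time when the node may be retried

    def record_success(self, node: str) -> None:
        self.failures[node] = 0
        self.strikes[node] = 0

    def record_failure(self, node: str) -> None:
        self.failures[node] = self.failures.get(node, 0) + 1
        if self.failures[node] >= self.failure_threshold:
            strikes = self.strikes.get(node, 0)
            # Each repeat blacklisting doubles the backoff, capped at max_backoff.
            backoff = min(self.max_backoff, self.base_backoff * 2 ** strikes)
            self.until[node] = self.clock() + backoff
            self.strikes[node] = strikes + 1
            self.failures[node] = 0

    def is_available(self, node: str) -> bool:
        return node not in self.until or self.clock() >= self.until[node]

    def choose(self, nodes: List[str], rng: random.Random) -> Optional[str]:
        """Pick a random node that is not currently blacklisted."""
        candidates = [n for n in nodes if self.is_available(n)]
        return rng.choice(candidates) if candidates else None
```

The growing backoff means a node that keeps failing is retried less and less often, while a node that recovers is returned to rotation after a single success.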
Zone Failure - Blacklist the Zone automatically
[Diagram: Gateway → Origin (API) nodes]
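Blacklisting a whole zone automatically can build on per-node health: when too large a share of a zone's nodes is bad, the zone stops receiving traffic. The zone names, the 50% threshold, and the fail-open rule (keep every zone if all of them would be excluded, so the gateway never routes to nothing) are assumptions in this sketch.

```python
from typing import Dict, List, Set


def eligible_zones(zone_nodes: Dict[str, List[str]], bad_nodes: Set[str],
                   max_bad_fraction: float = 0.5) -> List[str]:
    """Return zones that should still receive traffic (hypothetical policy).

    A zone is blacklisted when more than max_bad_fraction of its nodes are bad.
    If that would remove every zone, fail open and keep them all.
    """
    populated = {zone: nodes for zone, nodes in zone_nodes.items() if nodes}
    eligible = [
        zone for zone, nodes in populated.items()
        if sum(n in bad_nodes for n in nodes) / len(nodes) <= max_bad_fraction
    ]
    return eligible or list(populated)
```

Failing open is a deliberate trade-off: if every zone looks bad, the problem is more likely in the health signal than in every zone at once.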
React Quickly - Runtime Filter changes
[Diagram: Internet → Gateway → origin clusters for the API and the Website]
Runtime Policy Injection
A Room with a View - Insights
[Diagram: Internet → Gateway → origin clusters for the API and the Website]
Insights
What’s Next for Netflix’s Gateway?
Gateway as a service
● Self-service dynamic routing / route validation
● Control APIs for special routing functions
Netty-based Zuul (using RxNetty)
● Handling persistent connections
● Non-blocking, async
Transport-protocol-agnostic routing
● Reactive Socket http://reactivesocket.io/
Top Ten Lessons Learned
Build for handling Failures
Expect the Unexpected
Use Routing Creatively
Shard to Reduce Blast Radius
Devices are Weird
Protocols are Weird
Devices are Forever
Protocols are Forever
It will be built “wrong”
Keep Business Logic out of your Gateway
For More Info...
● Zuul OSS
● Netflix Tech Blog
● RxNetty
● Jobs