November 2016
Slaying Monoliths with Docker & Node.js
Yunong Xiao, Principal Engineer
@yunongx, http://yunong.io
#netflixeverywhere
Subscriber Growth
[Chart: subscribers by year, 2011 to 2016 (axis range 20M to 85M)]
API Evolution
So You Want to Watch Netflix
Watch Anywhere
In The Beginning…
Java Web Server
❖ Java based web server
❖ Renders UI
❖ Accesses data
❖ Individual clients for each service
❖ Different behavior for each client
[Diagram: a Java web server hosts Routes A, B, C, D, …, N; the routes call Client Libraries A, B, C, …, N, which talk to Backend Services A, B, C, …, N.]
Spot the Monolith
[Same diagram, labeled MONOLITH.]
New Devices
API Evolution
Java Web Server
[Same diagram, labeled MONOLITH.]
REST API
[Diagram: a REST API in front of Backend Services A, B, C, …, N.]
REST API
❖ Inflexible: waiting for weeks between API changes.
❖ Inefficient: multiple round trips
❖ Complex API: hard to maintain
API Evolution
Design for Innovation
❖ Rapid innovation
❖ More A/B tests and devices
❖ Customized API
❖ Performance matters
API.NEXT
[Diagram: an API server hosts Scripts A, B, C, D, …, N; the scripts call Client Libraries A, B, C, …, N, which talk to Backend Services A, B, C, …, N. Labeled MONOLITH.]
Scale
❖ 42.5 billion hours watched in 2015
❖ “Massive” RPS: Billions/day
❖ 1000s of scripts active in prod, 10000s in test
❖ 100s of changes/day
❖ 100s of A/B tests with many variants per test
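To put “Billions/day” on a per-second footing: the slide gives no exact count, so take one billion requests per day as the unit. Spread evenly over the 86,400 seconds in a day, that works out to:

```latex
\frac{10^{9}\ \text{requests/day}}{86\,400\ \text{s/day}} \approx 11\,574\ \text{requests/s}
```

That is a daily average per billion requests; peak rates exceed it whenever traffic is uneven across the day, as streaming traffic typically is.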
All Scripts Live in One Process
❖ Vertical Scale: Running out of headroom
❖ Memory
❖ I/O
❖ Instance cost: the largest instances money can buy
Happy/Sad Together?
❖ Resource contention
❖ 1 bad script takes out everyone
❖ Conflicting dependencies
Developer Ergonomics
[Diagram: the API server with Scripts A, B, C, D, …, N; Client Libraries A, B, C, …, N; and Backend Services A, B, C, …, N, annotated with UI Engineering and Systems Engineering.]
API Evolution
Requirements
❖ Scalability
❖ Availability
❖ Developer productivity
Runtime Scalability & Availability
❖ Process isolation
❖ Separation of data access scripts and API servers
❖ Reduce infrastructure costs
❖ Horizontally scalable architecture
❖ Faster startup times
❖ Immutable deployment artifacts
Developer Productivity
❖ JS to rule them all
❖ Run and debug scripts locally, set breakpoints, step through code
❖ Fast, incremental builds
❖ Local environment that mirrors production as closely as possible
API Evolution
[Diagram: the API server with Scripts A, B, C, D, …, N; Client Libraries A, B, C, …, N; and Backend Services A, B, C, …, N, labeled MONOLITH.]
Natural Separation
[Diagram: the same API server stack, divided between UI Engineering and Systems Engineering.]
Next Generation Data Access API
[Diagram: Clients (TV, iOS, Android, Windows, Browsers) -> Node API -> Edge API -> Backend Services (Search, MAP, GPS, Playback, …), with a Remote Service Layer.]
Node API Platform
❖ Set of JS data access scripts
❖ Running Node.js + restify
❖ Inside a Docker container
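The Node API platform is described as a set of JS data access scripts. As a minimal sketch of what one such script might do, assume a hypothetical home-screen endpoint: the three backend clients (`getProfile`, `getRecommendations`, `getContinueWatching`) and the `HomeScreen` response shape are illustrative stand-ins, not Netflix APIs, and the restify route registration is left out. It shows the idea behind moving past the generic REST API: one client request fans out to several backend calls in parallel on the server, and the device gets back a single payload shaped for its UI instead of making multiple round trips.

```typescript
// Stub data types for this sketch (hypothetical, not Netflix schemas).
interface Profile {
  userId: string;
  displayName: string;
}

interface TitleRef {
  id: string;
  title: string;
}

// Hypothetical backend clients. In a real data access script these would be
// network calls through the platform's RPC layer; here they resolve with stubs.
async function getProfile(userId: string): Promise<Profile> {
  return { userId, displayName: "Test User" };
}

async function getRecommendations(_userId: string): Promise<TitleRef[]> {
  return [
    { id: "t1", title: "Title One" },
    { id: "t2", title: "Title Two" },
  ];
}

async function getContinueWatching(_userId: string): Promise<TitleRef[]> {
  return [{ id: "t3", title: "Title Three" }];
}

// Response shaped for one device's UI: only the fields that screen renders.
interface HomeScreen {
  greeting: string;
  rows: { label: string; titleIds: string[] }[];
}

// The data access script: one client request, three backend calls issued
// concurrently with Promise.all, one aggregated response.
async function homeScreenScript(userId: string): Promise<HomeScreen> {
  const [profile, recommendations, continueWatching] = await Promise.all([
    getProfile(userId),
    getRecommendations(userId),
    getContinueWatching(userId),
  ]);
  return {
    greeting: `Hi, ${profile.displayName}`,
    rows: [
      { label: "Continue Watching", titleIds: continueWatching.map((t) => t.id) },
      { label: "Recommended", titleIds: recommendations.map((t) => t.id) },
    ],
  };
}
```

Because the fan-out happens server-side, next to the backend services, the client pays for one round trip no matter how many services the script touches.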
[Diagram: Node API apps serving /browse, /search, /account/signup and /bootstrap, /search, /account/login, each through the Unified Remote Service Layer.]
Evolutionary Traits
❖ Runtime platform
❖ Application management
❖ Container infrastructure
❖ Developer tools
“Production”
“A full-stack developer is one who can add technical debt to any layer of the application”
Aim: Paved Path for Data Access Apps
❖ Metrics
❖ Alerts
❖ Autoscaling
❖ Load balancing
❖ Discovery
❖ Analytics
Node Runtime: Platform as a Service
❖ Production ready Node platform
❖ Just bring JS business logic
❖ Everything else is “free”
❖ No servers/infrastructure to manage
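One way to read “just bring JS business logic; everything else is free” is that the platform wraps each app-supplied handler with cross-cutting concerns. The sketch below assumes that model and covers only request, error, and latency counting; `withPlatformMetrics`, `RouteMetrics`, `searchHandler`, and the in-memory registry are hypothetical names, and a real platform would publish to its metrics system rather than keep a `Map`.

```typescript
// Per-route counters the platform keeps on the app's behalf (hypothetical;
// a real platform would publish these to its metrics system).
interface RouteMetrics {
  requests: number;
  errors: number;
  latenciesMs: number[];
}

// What an app author supplies: a plain async function from request to response.
type Handler<Req, Res> = (req: Req) => Promise<Res>;

// Platform-owned wrapper: counts requests, errors, and latency around the
// app's handler without the app writing any instrumentation code.
function withPlatformMetrics<Req, Res>(
  route: string,
  handler: Handler<Req, Res>,
  registry: Map<string, RouteMetrics>,
): Handler<Req, Res> {
  return async (req: Req): Promise<Res> => {
    let metrics = registry.get(route);
    if (metrics === undefined) {
      metrics = { requests: 0, errors: 0, latenciesMs: [] };
      registry.set(route, metrics);
    }
    metrics.requests += 1;
    const startMs = Date.now();
    try {
      return await handler(req);
    } catch (err) {
      metrics.errors += 1;
      throw err; // record the failure, but let it propagate to the caller
    } finally {
      metrics.latenciesMs.push(Date.now() - startMs);
    }
  };
}

// App-side business logic only: no metrics, logging, or server setup.
const searchHandler: Handler<{ query: string }, string[]> = async (req) => {
  if (req.query === "") {
    throw new Error("empty query");
  }
  return [`result for ${req.query}`];
};
```

The design choice is that instrumentation lives in one platform-owned place, so every app gets the same metrics without copying boilerplate into each script.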
[Diagram: Node runtime components]
❖ Properties: nf-iso-properties
❖ Discovery: nf-eureka-client
❖ RPC: reactive-datasource
❖ Insight: nf-atlas-client, bunyan-suro (data-pipeline), bunyan (logging), nf-salp
❖ Web server / Runtime: restify / Node.js
❖ HTTP Client: reactive-socket-lb
Aim: Simple App Management
❖ Versioning
❖ Deployment
❖ Operational Insights
Versioning: Current Problems
❖ APIs change all the time
❖ 100,000s of different versions
❖ 1,000s live in prod
Versioning: Inconsistency
❖ api.netflix.com/tvui/1469577600021 (build timestamp)
❖ api.netflix.com/web/6dbd361 (Git SHA)
❖ api.netflix.com/ios/1.3.2 (app version)
❖ api.netflix.com/android/1234 (integer)
Aim: Consistent Versions & Reproducible Builds
Solution: Use SemVer
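A minimal sketch of why a single scheme helps: once every app uses SemVer, one parser can validate and order every version. It handles only the `MAJOR.MINOR.PATCH` core from the SemVer 2.0.0 specification (pre-release and build metadata are left out), and `parseSemVer` and `compareSemVer` are illustrative names. Of the four identifiers on the inconsistency slide, only `1.3.2` parses.

```typescript
// A MAJOR.MINOR.PATCH version, per the SemVer 2.0.0 specification.
interface SemVer {
  major: number;
  minor: number;
  patch: number;
}

// Core version only: three non-negative integers without leading zeros.
// Pre-release ("-beta.1") and build metadata ("+sha.6dbd361") are not handled.
const SEMVER_CORE = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$/;

function parseSemVer(version: string): SemVer | null {
  const match = SEMVER_CORE.exec(version);
  if (match === null) {
    return null;
  }
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

// Negative if a < b, zero if equal, positive if a > b.
function compareSemVer(a: SemVer, b: SemVer): number {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// The four identifiers from the "Versioning: Inconsistency" slide.
const legacyIds = ["1469577600021", "6dbd361", "1.3.2", "1234"];
const validSemVer = legacyIds.filter((id) => parseSemVer(id) !== null); // ["1.3.2"]
```

Comparing numerically, component by component, is what makes `1.10.0` sort after `1.9.9`; a plain string comparison would get that wrong.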
Versioning: Node API Index
[Diagram: routing through the Node API index for the same four versioned paths: api.netflix.com/tvui/1469577600021, /web/6dbd361, /ios/1.3.2, /android/1234.]
Problem: API Upgrades
api.netflix.com/ios/1.3.2 keeps routing to 1.3.2 even after 1.3.3 is deployed:
the path is immutably baked into the client.
Solution: SemVer Routing
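Clients request a version range instead of an exact version: api.netflix.com/ios/^1.0.0
Deployed versions: 1.3.2, 1.3.3, 1.4.0, 1.6.5
[Diagram: routing via nq.netflix.com, with requests for api.netflix.com/ios/1.3.2 and api.netflix.com/ios/^1.0.0]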
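The slides do not spell out how a range is resolved. A minimal sketch of SemVer routing, assuming the router sends a caret range to the highest deployed version that satisfies it (the way npm picks a version for a `^` dependency); an index could equally pin a range to an operator-chosen version instead. Only exact versions and caret ranges are handled, and `resolveRoute`, `satisfiesCaret`, and `parseVersion` are hypothetical names.

```typescript
interface Version {
  major: number;
  minor: number;
  patch: number;
}

function parseVersion(text: string): Version {
  const match = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$/.exec(text);
  if (match === null) {
    throw new Error(`not a MAJOR.MINOR.PATCH version: ${text}`);
  }
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

function compareVersions(a: Version, b: Version): number {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// Caret semantics as in npm's semver package: ^X.Y.Z allows any change that
// keeps the left-most non-zero component fixed.
//   ^1.2.3 means >=1.2.3 <2.0.0
//   ^0.2.3 means >=0.2.3 <0.3.0
//   ^0.0.3 means >=0.0.3 <0.0.4
function satisfiesCaret(candidate: Version, base: Version): boolean {
  if (compareVersions(candidate, base) < 0) {
    return false;
  }
  if (base.major > 0) {
    return candidate.major === base.major;
  }
  if (base.minor > 0) {
    return candidate.major === 0 && candidate.minor === base.minor;
  }
  return candidate.major === 0 && candidate.minor === 0 && candidate.patch === base.patch;
}

// Hypothetical router: an exact version routes only to itself; a caret range
// routes to the highest deployed version that satisfies it (assumed policy).
function resolveRoute(requested: string, deployed: string[]): string | null {
  if (!requested.startsWith("^")) {
    return deployed.includes(requested) ? requested : null;
  }
  const base = parseVersion(requested.slice(1));
  let best: string | null = null;
  for (const candidate of deployed) {
    const version = parseVersion(candidate);
    if (!satisfiesCaret(version, base)) {
      continue;
    }
    if (best === null || compareVersions(version, parseVersion(best)) > 0) {
      best = candidate;
    }
  }
  return best;
}
```

With the deployed versions from the slide, `resolveRoute("^1.0.0", ["1.3.2", "1.3.3", "1.4.0", "1.6.5"])` returns `"1.6.5"`, while a client pinned to `1.3.2` still gets `1.3.2`. Deploying a 1.7.0 would move `^1.0.0` traffic without shipping a new client.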
Operational Insights
❖ List and view deployed apps and routes
❖ Deployment history
❖ Metrics: RPS, latency, errors, …
❖ Analytics
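Dashboards for RPS, latency, and errors need each route's raw request stream reduced to a few numbers per time window. A sketch of that reduction, assuming hypothetical per-request samples (`RequestSample`) and using the nearest-rank percentile definition, which is only one of several common definitions; production metrics systems often approximate percentiles with histograms instead.

```typescript
// One recorded request (hypothetical shape for this sketch).
interface RequestSample {
  timestampMs: number;
  latencyMs: number;
  ok: boolean;
}

interface WindowSummary {
  rps: number;
  p50Ms: number;
  p99Ms: number;
  errorRate: number;
}

// Nearest-rank percentile over an ascending-sorted, non-empty array: the
// smallest value such that at least p percent of samples are <= it.
function nearestRankPercentile(sortedAsc: number[], p: number): number {
  const rank = Math.max(1, Math.ceil((p * sortedAsc.length) / 100));
  return sortedAsc[rank - 1];
}

// Reduce raw samples in [windowStartMs, windowEndMs) to dashboard numbers.
function summarizeWindow(
  samples: RequestSample[],
  windowStartMs: number,
  windowEndMs: number,
): WindowSummary {
  const inWindow = samples.filter(
    (s) => s.timestampMs >= windowStartMs && s.timestampMs < windowEndMs,
  );
  if (inWindow.length === 0) {
    return { rps: 0, p50Ms: 0, p99Ms: 0, errorRate: 0 };
  }
  const latencies = inWindow.map((s) => s.latencyMs).sort((a, b) => a - b);
  const errors = inWindow.filter((s) => !s.ok).length;
  const windowSeconds = (windowEndMs - windowStartMs) / 1000;
  return {
    rps: inWindow.length / windowSeconds,
    p50Ms: nearestRankPercentile(latencies, 50),
    p99Ms: nearestRankPercentile(latencies, 99),
    errorRate: errors / inWindow.length,
  };
}
```

Reporting p99 alongside the median matters because a handful of slow requests can hide behind a healthy p50.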
Generated Dashboards
Titus: Container Management & Scheduling
Fenzo
Aim: Developer Productivity
❖ Run and debug scripts locally
❖ Fast, incremental builds
❖ Local “prod” environment
Local Development: Builds are Slow
Commit to SCM -> Build deps -> DocumentJS NQ Scripts -> Build Docker Image
Tens of minutes
Rapid Local Development: Debug in Seconds
[Diagram: Developer laptop (Mac OS X) -> VirtualBox (Linux) running the Docker host -> Docker server -> container running the MyApp image. MyApp image layers, top to bottom: MyApp scripts & config, NodeQuark image, Prana image, NodeJS image, Ubuntu image.]
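Why the layered image shortens the local loop can be modeled in a few lines. This is a simplified model of Docker's build cache, not Docker's implementation: layers below the first changed one are reused, and the changed layer plus everything stacked on it must be rebuilt. The layer names come from the slide; `layersToRebuild` is a hypothetical helper.

```typescript
// MyApp image layers from the slide, listed bottom (base) to top.
const myAppLayers: string[] = [
  "Ubuntu",
  "NodeJS",
  "Prana",
  "NodeQuark",
  "MyApp scripts & config",
];

// Simplified model of Docker's build cache: layers below the first changed
// layer are reused as-is; the changed layer and everything above it are rebuilt.
function layersToRebuild(layers: string[], changed: Set<string>): string[] {
  const firstChanged = layers.findIndex((layer) => changed.has(layer));
  return firstChanged === -1 ? [] : layers.slice(firstChanged);
}
```

Editing only app scripts rebuilds a single layer, `["MyApp scripts & config"]`, while changing the NodeJS layer forces four layers to rebuild. Keeping the frequently edited code in the top layer is what makes the everyday edit-and-debug cycle cheap.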
Recap: Containers
❖ Process isolation
❖ Layered dependency management
❖ Portability across environments: prod -> test
❖ Fast deployment
❖ Single deployment artifact: Docker image
Recap: Node.js
❖ JS everywhere: client & server
❖ Performant
❖ Lightweight & efficient: run locally
❖ Non-blocking
❖ Superb ecosystem (npm)
❖ Built for the web
Recap: Node Platform
❖ Developer productivity
  ❖ Fast incremental builds
  ❖ Run, debug, and test locally
  ❖ Local prod-like environment
❖ Scalability & availability
  ❖ Monolith -> micro-services
  ❖ Process isolation: better availability
  ❖ Horizontally scalable architecture
  ❖ Immutable deployment artifacts
[Diagram: Unified Remote Service Layer in front of Backend Services A, B, C, …, N.]
Thanks!
❖ Interested? Netflix is hiring!
❖ @yunongx
❖ yunong.io