20140708 - Jeremy Edberg: How Netflix Delivers Software

How Netflix Delivers Software

July 8th, 2014

Email: jedberg@{gmail,netflix}.com

Twitter: @jedberg

Web: www.jedberg.net

Facebook: facebook.com/jedberg

Linkedin: www.linkedin.com/in/jedberg

When your software fails...

will your system survive?

The Netflix way

• Fully automated build tools to test and make packages

• Fully automated machine image bakery

• Fully automated image deployment

The Netflix way

• Everything is “built for three”

• Independent teams responsible for both Dev and Ops

• Redundancy through multi-region deployment

Philosophy

Freedom and Responsibility

• We hire responsible adults and keep rules and policies to a minimum

• Developers can change any code in production at any time

• And things don’t break (usually)

Automate all the things!

http://hyperboleandahalf.blogspot.com/2010/06/this-is-why-ill-never-be-adult.html

Automate all the things!

• Application startup

• Configuration

• Code deployment

• System deployment

Automation

• Standard base image

• Tools to manage all the systems

• Reduce errors through reproducibility

Shared state should be stored in a shared service

Data on an instance should be replicated to other instances

“Build for three”

We hold a boot camp for new engineers to teach them how to build for a highly distributed environment.

12B outbound requests per day to API dependencies

Movie Ratings

Personalization Engine

User Info

Movie Metadata

Similar Movies

Reviews

A/B Test Engine

2B requests per day into the Netflix API

Discovery API

Streaming API

Movie Ratings

Personalization Engine

User Info

Movie Metadata

Similar Movies

Reviews

A/B Test Engine

Discovery API

Streaming API

Content Encoding

CDN Management

QOS Logging

DRM

OpenConnect Edge Locations

Browse

Play

Watch

Highly aligned, loosely coupled

• Services are built by different teams who work together to figure out what each service will provide.

• The service owner publishes an API that anyone can use.

Advantages to a Service Oriented Architecture

• Easier auto-scaling

• Easier capacity planning

• Identify problematic code-paths more easily

• Narrow down the effects of a change

• More efficient local caching

Freedom and Responsibility

• Developers deploy when they want

• They also manage their own capacity and autoscaling

• And fix anything that breaks at 4am!

All system choices assume that some part will fail at some point.

The Monkey Theory

• Simulate things that go wrong

• Find things that are different

Execution

AWS

Netflix OSS

Netflix Application Code

AWS

Netflix OSS

YOUR Application Code

What AWS Provides

• Instances

• Machine Images

• Elastic IPs

• Load Balancers

• Security groups / Autoscaling

AWS

Netflix built a global PaaS

• Service Oriented Architecture

• HTTP/REST interfaces between services

Netflix OSS

Netflix PaaS features

• Supports all regions and zones

• Multiple accounts

• Cross region/account replication

• Internationalized, localized and GeoIP routed

• Advanced key management

• Autoscaling with 1000s of instances

• Monitoring and alerting on millions of metrics

Netflix OSS

Open Source at Netflix

Netflix OSS

Be liberal in what you accept, strict in what you send

Circuit Breakers (Hystrix)
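
The circuit-breaker idea named on this slide can be illustrated with a minimal sketch. This is not Hystrix's API (Hystrix is a Java library); the class name `CircuitBreaker`, the thresholds, and the fallback convention are all assumptions made for illustration.

```python
import time


class CircuitBreaker:
    """Toy circuit breaker: opens after repeated failures, retries after a cooldown."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # hypothetical threshold
        self.reset_after = reset_after    # hypothetical cooldown in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the dependency entirely until the cooldown has passed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_ratings, lambda: cached_ratings)`, so that a failing dependency degrades the response instead of stalling it. Both function names in that call are hypothetical.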

The Monkey Theory

• Simulate things that go wrong

• Find things that are different

The Simian Army

• Chaos -- Kills random instances

• Chaos Gorilla -- Kills zones

• Chaos Kong -- Kills regions

• Latency -- Degrades network and injects faults

• Conformity -- Looks for outliers

• Circus -- Kills and launches instances to maintain zone balance

• Doctor -- Fixes unhealthy resources

• Janitor -- Cleans up unused resources

• Howler -- Yells about bad things like Amazon limit violations

• Security -- Finds security issues and expiring certificates
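
The first monkey in the list, the one that kills random instances, can be sketched in a few lines. This is not the Simian Army code; the fleet dictionary, the instance ids, and the rule of skipping single-instance groups are assumptions for illustration, and nothing is actually terminated.

```python
import random


def pick_victims(groups, rng=None):
    """Pick one random instance from each group that can afford to lose one.

    `groups` maps a group name to a list of instance ids (hypothetical data).
    Groups with fewer than two instances are skipped (an assumed safety rule).
    """
    rng = rng or random.Random()
    victims = {}
    for name, instances in groups.items():
        if len(instances) >= 2:
            victims[name] = rng.choice(instances)
    return victims


fleet = {
    "api": ["i-a1", "i-a2", "i-a3"],
    "ratings": ["i-r1", "i-r2"],
    "batch": ["i-b1"],
}
victims = pick_victims(fleet, random.Random(42))
for group, instance in victims.items():
    print(f"would terminate {instance} in {group}")
```

Running something like this continuously in production is what forces every service to tolerate the loss of any single instance.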

Netflix OSS

• Blueprint for the rest of the platform libraries

• Pluggable architecture

• On instance software load balancer

• Zone aware / Zone affinity

• Handles retry logic
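
The three properties on this slide (on-instance balancing, zone affinity, retries) can be combined in a small sketch. The class name `ZoneAwareBalancer`, the server tuples, and the retry count are assumptions; this is not the API of any Netflix library.

```python
import random


class ZoneAwareBalancer:
    """Toy client-side balancer: prefer servers in the caller's zone, retry on failure."""

    def __init__(self, servers, my_zone, max_retries=2, rng=None):
        self.servers = servers  # list of (address, zone) tuples
        self.my_zone = my_zone
        self.max_retries = max_retries
        self.rng = rng or random.Random()

    def candidates(self):
        # Same-zone servers first (zone affinity), then the rest as overflow.
        local = [s for s in self.servers if s[1] == self.my_zone]
        remote = [s for s in self.servers if s[1] != self.my_zone]
        self.rng.shuffle(local)
        self.rng.shuffle(remote)
        return local + remote

    def request(self, send):
        """Call `send(address)` on successive servers until one succeeds."""
        last_error = None
        for address, _zone in self.candidates()[: self.max_retries + 1]:
            try:
                return send(address)
            except ConnectionError as exc:
                last_error = exc
        if last_error is None:
            raise ConnectionError("no servers available")
        raise last_error
```

Because the balancer runs inside the calling process, there is no extra network hop and no central load balancer to fail.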

• Global variables

• Support for staged rollout

• Feature flags
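
A rough sketch of how runtime properties can drive feature flags and a staged rollout follows. The `DynamicConfig` class, the `.rollout_percent` key suffix, and the hashing scheme are hypothetical; they only illustrate the idea on the slide.

```python
import zlib


class DynamicConfig:
    """Toy runtime configuration store with percentage-based feature flags."""

    def __init__(self):
        self._props = {}

    def set(self, key, value):
        # In a real system this update would arrive from a central source at runtime.
        self._props[key] = value

    def get(self, key, default=None):
        return self._props.get(key, default)

    def is_enabled(self, flag, user_id):
        """True if `user_id` falls inside the flag's rollout percentage (0-100)."""
        percent = self.get(flag + ".rollout_percent", 0)
        # A stable hash keeps each user in the same bucket as the percentage grows.
        bucket = zlib.crc32(f"{flag}:{user_id}".encode()) % 100
        return bucket < percent
```

Raising the percentage from 25 to 50 only adds users; nobody who already had the feature loses it, which is what makes the rollout staged rather than random.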

Netflix OSS

• Application to instance mapping

• Heartbeat to keep track of health
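
The two bullets above describe a service registry. A minimal sketch, assuming a hypothetical `ServiceRegistry` class and a 90-second expiry chosen only for illustration:

```python
import time


class ServiceRegistry:
    """Toy registry: maps application names to instances and expires silent ones."""

    def __init__(self, ttl=90.0, clock=time.monotonic):
        self.ttl = ttl  # seconds an instance may go without a heartbeat (assumed)
        self.clock = clock
        self._last_seen = {}  # (app, instance_id) -> time of last heartbeat

    def heartbeat(self, app, instance_id):
        # Registering and renewing are the same operation in this sketch.
        self._last_seen[(app, instance_id)] = self.clock()

    def instances(self, app):
        """Return instance ids for `app` whose heartbeat is still fresh."""
        now = self.clock()
        return sorted(
            inst
            for (name, inst), seen in self._last_seen.items()
            if name == app and now - seen <= self.ttl
        )
```

Clients look up an application by name and only ever see instances that have checked in recently, so dead instances drop out without any manual cleanup.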

DQ Transport Routing

Suro

etc

Eventbus

Druid

Netflix OSS

Why Bake?

Traditional (Generic AMI to Instance):

• launch OS

• install packages

• install app

Netflix (App AMI to Instance):

• launch OS+app

Getting Baked

Perforce / Git

libraries

source

Ant targets

Ivy

Groovy all over

app bundles

Jenkins

sync

resolve

build

compile

report

publish

test

Artifactory

snapshot / release libraries / apps

Base Image Baking

Yum / Apt

Linux: CentOS, Fedora, Ubuntu

RPMs: Apache, Java...

ec2 slave instances

S3 / EBS

foundation AMI

base AMI

Bakery

mount

install

Ready for app bake

snapshot

AWS

App Image Baking

Jenkins / Yum / Artifactory

Linux, Apache, Java, Tomcat

AWS

app bundle

ec2 slave instances

S3 / EBS

base AMI

app AMI

Bakery

mount

install

Ready to launch!

snapshot

app AMI

Linux Base AMI (CentOS or Ubuntu)

Java

Tomcat

Optional Apache

Monitoring

Log Rotation to S3

monitoring

GC and thread dump logging

Application war file, base servlet, platform, interface jars for dependent services

Healthcheck, status servlets, JMX interface, Servo autoscale

app AMI

Linux Base AMI (CentOS or Ubuntu)

Java

JBoss

Optional Apache

Monitoring

Log Rotation to S3

monitoring

GC and thread dump logging

Application war file, base servlet, platform, interface jars for dependent services

Healthcheck, status servlets, JMX interface, Servo autoscale

app AMI

Linux Base AMI (CentOS or Ubuntu)

Python

Bottle

Optional Apache

Monitoring

Log Rotation to S3

monitoring

logging

Application file, base server, platform, interface libs for dependent services

Netflix OSS

Deploying Code; Step 1

Auto Scaling Group

Launch Configuration

Security Group

Amazon Machine Image

Instances

Load Balancer

Netflix has moved the granularity from the instance to the cluster

Data is the most important asset Netflix has. It’s what differentiates us from our competitors.

Netflix OSS

EVCache

• Wrapper on top of memcached

• Automatically replicates writes to multiple regions

• Pulls cache data intelligently via zone affinity
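
The write-replication and local-read behaviour described above can be sketched as follows. Plain dictionaries stand in for memcached nodes, and the class and method names are hypothetical rather than EVCache's real client API.

```python
class ReplicatedCache:
    """Toy cache that writes to every location and reads from the local one first."""

    def __init__(self, zones, local_zone):
        self.stores = {zone: {} for zone in zones}  # stand-ins for memcached nodes
        self.local_zone = local_zone

    def set(self, key, value):
        # Replicate each write everywhere so any location can serve the read later.
        for store in self.stores.values():
            store[key] = value

    def get(self, key):
        # Zone affinity: try the local store, then fall back to the others.
        order = [self.local_zone] + [z for z in self.stores if z != self.local_zone]
        for zone in order:
            if key in self.stores[zone]:
                return self.stores[zone][key]
        return None
```

Reads normally stay local and fast, and losing one copy of the cache costs only a slower lookup.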

Cassandra

Why Cassandra?

• Availability over consistency

• Writes over reads

• We know Java

• Open source + support

Using Cassandra at Netflix

• Priam

    • Zero touch auto-config

    • State management

    • Token assignment

    • Node replacement

    • Backup/restore to/from S3

• Astyanax

    • OO abstraction to Cassandra

    • Multi-region support
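
Token assignment and node replacement, two of the items listed under Priam, can be illustrated with a small sketch. It assumes a token space of 2^127 and evenly spaced tokens; Priam's actual scheme is not shown on the slide, so both functions are illustrative only.

```python
RING_SIZE = 2 ** 127  # assumed token space for this sketch


def assign_tokens(node_count, ring_size=RING_SIZE):
    """Return evenly spaced tokens so each node owns an equal slice of the ring."""
    if node_count < 1:
        raise ValueError("need at least one node")
    return [i * ring_size // node_count for i in range(node_count)]


def replacement_token(tokens, dead_index):
    # A replacement node takes over the dead node's token, so no data is reshuffled.
    return tokens[dead_index]
```

Automating both steps is what makes "zero touch" possible: a new instance can work out its place in the ring without an operator.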

Cassandra Architecture

Going Multi-region

Leveraging Multi-region

• 100% uptime is theoretically possible.

• You have to replicate your data

• This will cost money

us-east-1, us-west-2, eu-west-1, etc

What’s going on?!

Atlas

alerting

api

api

Central Event Gateway

Paging Service

Amazon SES

CORE Agent

Other Team’s Agent

CORE Agent

Alert Systems

Central Event Gateway

• Parse raw alerts, match application to owner

• Add image captures and links to related graphs for easy mobile use

• Send to the right service based on priority

• Register the event in Chronos, the timeline application

• Correlate low priority alerts and generate new high priority alerts
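
The gateway steps above (parse, match owner, route by priority, correlate) can be sketched in miniature. The `app|priority|message` wire format, the `OWNERS` table, and the threshold of three are all assumptions; the real gateway's formats are not shown in the slides.

```python
from collections import Counter

OWNERS = {"api": "edge-team", "ratings": "ratings-team"}  # hypothetical mapping


def parse_alert(raw):
    """Parse 'app|priority|message' (an assumed format) and attach the owner."""
    app, priority, message = raw.split("|", 2)
    return {"app": app, "priority": priority, "message": message,
            "owner": OWNERS.get(app, "unowned")}


def route(alert):
    # High-priority alerts page someone; everything else goes to email.
    return "pager" if alert["priority"] == "high" else "email"


def correlate(alerts, threshold=3):
    """Escalate: many low-priority alerts for one app yield a new high-priority alert."""
    low_counts = Counter(a["app"] for a in alerts if a["priority"] == "low")
    return [
        {"app": app, "priority": "high", "owner": OWNERS.get(app, "unowned"),
         "message": f"{count} correlated low-priority alerts"}
        for app, count in low_counts.items() if count >= threshold
    ]
```

The correlation step is the interesting one: several individually harmless alerts about the same application are treated as a signal worth waking someone up for.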

Metrics in Production

• 796B daily metric points

• Peaks at 1.4B / min

• 50% daily metric churn
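
To put the two volume figures in proportion, a short calculation compares the daily total with the stated peak. Only the two numbers from the slide are used; the variable names are illustrative.

```python
DAILY_POINTS = 796e9      # from the slide: 796B metric points per day
PEAK_PER_MINUTE = 1.4e9   # from the slide: peak of 1.4B per minute

MINUTES_PER_DAY = 24 * 60
average_per_minute = DAILY_POINTS / MINUTES_PER_DAY
peak_to_average = PEAK_PER_MINUTE / average_per_minute

print(f"average: {average_per_minute / 1e6:.0f}M points/min")
print(f"peak is {peak_to_average:.1f}x the average")
```

The average works out to roughly 553 million points per minute, so the peak is about 2.5 times the average load.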

What is a metric?

com.netflix.eds.nccp.successful.requests.uiversion.nccprt-authorization.devtypid-101.clver-PHL_0AB.uiver-UI_169_mid.geo-US
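
One way to read a name like this is as a base metric followed by key-value dimensions. The sketch below assumes that any dotted segment of the form `key-value` is a dimension; that convention is inferred from the example, not stated in the slides.

```python
def parse_metric(name):
    """Split a dotted metric name into a base name and key-value dimensions.

    Assumes segments shaped like 'key-value' are dimensions; the rest form the base.
    """
    base, tags = [], {}
    for segment in name.split("."):
        key, sep, value = segment.partition("-")
        if sep:
            tags[key] = value
        else:
            base.append(segment)
    return ".".join(base), tags
```

Applied to the name on the slide, this yields the base `com.netflix.eds.nccp.successful.requests.uiversion` plus five dimensions, including `geo` set to `US`. Each distinct combination of dimension values is a separate series, which helps explain the volumes involved.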

How we built it

• Built our own big data system

• Based on S3 and EMR

• Fewer copies, lower resolution, and slower retrieval as data ages
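
The age-based trade-off in the last bullet can be sketched as a tier table plus a downsampling step. The tier boundaries, copy counts, and resolutions here are invented for illustration; the slides give no actual values.

```python
# (max age in days, copies kept, resolution in seconds): hypothetical tiers
TIERS = [(1, 3, 60), (30, 2, 300), (365, 1, 3600)]


def tier_for(age_days):
    """Return (copies, resolution_seconds) for data of the given age."""
    for max_age, copies, resolution in TIERS:
        if age_days <= max_age:
            return copies, resolution
    return TIERS[-1][1], TIERS[-1][2]  # oldest tier applies beyond the last cutoff


def downsample(points, factor):
    """Average consecutive groups of `factor` points to lower the resolution."""
    return [
        sum(points[i:i + factor]) / len(points[i:i + factor])
        for i in range(0, len(points), factor)
    ]
```

Recent data stays detailed and heavily replicated for debugging live incidents, while older data is kept cheaply for trend analysis.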

Self Serve is the Key

• Developers choose what metrics to submit

• What graphs they put on their dashboards

• What to alert on

Example Alert Config

Atlas

When something breaks...

Breakdown of an outage

Is something wrong? Alerting

Where is the problem? Telemetry and Dashboards

What changed? ???

Breakdown of an outage

Is something wrong? Alerting

Where is the problem? Telemetry and Dashboards

What changed? Change control?

Change control, the good

• Tells you what changed

• Tells you what’s about to change

• Great for coordination when one change gates another change

Change control, the bad

• It’s manual

• It expresses intent, not reality

• It forces you to serialize your changes to an extent

Breakdown of an outage

Is something wrong? Alerting

Where is the problem? Telemetry and Dashboards

What changed? Chronos
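
The slides do not show how Chronos works internally. As a rough sketch of the idea, a timeline that systems write to automatically can answer "what changed?" by looking back from the moment an alert fired. The `Timeline` class and its methods are hypothetical.

```python
from datetime import datetime, timedelta


class Timeline:
    """Toy change timeline: systems record what actually happened, as it happens."""

    def __init__(self):
        self.events = []  # list of (timestamp, source, description)

    def record(self, when, source, description):
        self.events.append((when, source, description))

    def changes_before(self, alert_time, window=timedelta(minutes=30)):
        """Return events in the window leading up to an alert, newest first."""
        start = alert_time - window
        hits = [e for e in self.events if start <= e[0] <= alert_time]
        return sorted(hits, key=lambda e: e[0], reverse=True)
```

Unlike manual change control, entries here are written by the deployment and configuration tools themselves, so the timeline reflects reality rather than intent.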

Just a quick reminder...

(Some of) Netflix is open source:

https://netflix.github.io

Netflix is hiring!

If you like what you see here, feel free to reach out!

Questions?

Getting in touch

Email: jedberg@{gmail,netflix}.com

Twitter: @jedberg

Web: www.jedberg.net

Facebook: facebook.com/jedberg

Linkedin: www.linkedin.com/in/jedberg