How Netflix directs 1/3rd of

Haley Tucker, Mohit Vora

QCon San Francisco, Nov 16, 2015

Playback Overview

DATA PLANE (CDN)

CONTROL PLANE

STREAM

NETFLIX DEVICE

Project 366 #59; 280212 Days Gone By..., CC BY-SA, Pete 2012, Flickr

AUDIO, VIDEO, TEXT

STREAMS

How do we build a streaming “tape”?

Determine the preferred experience

DEVICE

TITLE

CONNECTIONS

COUNTRY

NETWORK

Broadband - wired or wifi

Cellular - Edge, 3G, LTE, ...

CUSTOMER

That’s exactly what I want... now where can I get it?

Point the device to appropriate locations

Steering

GENERATE PLAYBACK MANIFEST

PLAYBACK MANIFEST

Uh-oh, the content is encrypted!

Keymaster, CC BY-SA, Sean McGrath 2007, Flickr

LICENSE

And...Action!

SESSION (START, STOP, PAUSE, RESUME, KEEPALIVE)

SESSION EVENTS

PLAYBACK LIFECYCLE

● GENERATE PLAYBACK MANIFEST
● PLAYBACK MANIFEST
● LICENSE
● SESSION (START, STOP, PAUSE, RESUME, KEEPALIVE)

Data Plane (CDN)

What is a Content Delivery Network?

Open Connect

A NETFLIX ORIGINAL

[Chart: BYTES STREAMED vs. CONTENT RANK]

PREDICTABLE VIEWING PATTERNS

Content Delivery Mechanisms

[Diagram: DATA PLANE (CDN), CONTROL PLANE, STREAM, NETFLIX DEVICE]

ISP CO-LOCATION

[Diagram: STREAM, ISP DATA CENTER, ISP ROUTER, NETFLIX DEVICE]

IXP INTERCONNECTION

[Diagram: STREAM, NETFLIX, IXP DATA CENTER, NFLX ROUTER, ISP ROUTER, ISP ROUTER, ISP DATA CENTER, NETFLIX DEVICE]

Control Plane

OPEN CONNECT

STREAM

NETFLIX DEVICE

CDN CONTROL PLANE

DEVICE CONTROL PLANE

DON’T KEEP SECRETS

Network Proximity

Content Positioning

Load Distribution

Network Proximity

Social Network in a Course, CC BY-SA, Hans Põldoja 2010, Flickr

By Specification?

Doesn’t scale

Border Gateway Protocol

BGP ROUTE 175.231.128.0/24 (+ proximity attributes)

TAKEAWAY: Use BGP

[Diagram: ISP1 DATA CENTER, ISP1 BGP ROUTES, ISP2 DATA CENTER, ISP2 BGP ROUTES, IXP DATA CENTER, CONTROL PLANE, ISP1, NFLX]

BGP ROUTE 175.231.128.0/24 (+ proximity attributes)
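
One way to picture BGP-based proximity is a longest-prefix match over the routes each location has learned. The sketch below is only an illustration under stated assumptions: the function name, the route tuples, the mapping of prefixes to locations, and the use of AS path length as the "proximity attribute" are all hypothetical and are not taken from the talk.

```python
import ipaddress


def closest_location(client_ip, routes):
    """Return the location advertising the most specific route to client_ip.

    routes: iterable of (prefix, location, as_path_length) tuples.
    as_path_length is a hypothetical stand-in for the "proximity attributes".
    """
    address = ipaddress.ip_address(client_ip)
    best = None
    for prefix, location, as_path_length in routes:
        network = ipaddress.ip_network(prefix)
        if address not in network:
            continue
        # A longer prefix wins; a shorter AS path breaks ties.
        score = (network.prefixlen, -as_path_length)
        if best is None or score > best[0]:
            best = (score, location)
    return None if best is None else best[1]


# Illustrative routes only; the prefix-to-location mapping is made up.
ROUTES = [
    ("175.231.128.0/24", "ISP1 DATA CENTER", 1),
    ("175.231.0.0/16", "IXP DATA CENTER", 2),
]
```

With these routes, a client inside 175.231.128.0/24 resolves to the ISP1 location, while a client that only matches the /16 falls back to the IXP location.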

Content Positioning

LOCALIZE TRAFFIC

ISP DATA CENTER

SERVE CACHE MISS

HOW DO WE DETERMINE WHAT CONTENT WILL BE POPULAR TOMORROW?

CHANGING CATALOG

EVOLVING MEMBER TASTES

MINIMIZE FILL CHURN

ISP DATA CENTER

OFF PEAK FILL

USE HISTORICAL DATA

[Chart: BYTES STREAMED vs. CONTENT RANK]

bytesStreamed/bytesStored
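
The ratio above can be read as a popularity score: bytes streamed per byte stored. A minimal sketch of ranking by that ratio follows; the record layout and file names are assumptions made for illustration, and every file is assumed to have a non-zero stored size.

```python
def rank_content(files):
    """Sort files from most to least popular by bytes streamed per byte stored."""
    def popularity(entry):
        # Assumes bytes_stored > 0 for every entry.
        return entry["bytes_streamed"] / entry["bytes_stored"]
    return sorted(files, key=popularity, reverse=True)


# Hypothetical one-day history for three files.
CATALOG = [
    {"file": "FILE2", "bytes_streamed": 900, "bytes_stored": 300},
    {"file": "FILE3", "bytes_streamed": 50, "bytes_stored": 100},
    {"file": "FILE1", "bytes_streamed": 900, "bytes_stored": 100},
]
```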

IS ONE DAY OF HISTORY ENOUGH?

EXPONENTIALLY WEIGHTED MOVING AVERAGE

[Chart: WEIGHT vs. DAYS AGO, 0 to 40; parameter = 0.9]

TAKEAWAY: Weigh Recent Data Higher
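
A rough sketch of an exponentially weighted average over daily history. It assumes the 0.9 on the chart is the per-day decay, so a value from k days ago gets weight 0.9^k; that reading, the function name, and the normalization by the sum of weights are assumptions rather than details from the talk.

```python
def ewma(daily_values, decay=0.9):
    """Weighted average where daily_values[k] is the value from k days ago.

    Each older day counts `decay` times as much as the day after it.
    """
    weights = [decay ** days_ago for days_ago in range(len(daily_values))]
    weighted_sum = sum(w * v for w, v in zip(weights, daily_values))
    return weighted_sum / sum(weights)
```

With this weighting, a title that was popular today scores higher than one that was equally popular yesterday, while a long steady history still contributes.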

HOW SHOULD CONTENT BE ALLOCATED?

MILLIONS OF FILES

THOUSANDS OF SERVERS

[Diagram: SVR1, SVR2, SVR3, SVR4; FILE1, FILE3, FILE1]

TAKEAWAY: Consistent Hashing

● ALLOCATE MULTIPLE REPLICAS
● RESILIENT TO CLUSTER CHANGES
● REPEATABLE
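
The three properties listed for consistent hashing can be seen in a small hash ring. This is a generic textbook sketch, not the Open Connect implementation: the class name, the virtual-node count, and the choice of SHA-256 are all assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server owns `vnodes` points on the ring to even out the spread.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._server_count = len(set(servers))

    def locate(self, file_id, replicas=2):
        """Walk clockwise from the file's position, collecting distinct servers."""
        wanted = min(replicas, self._server_count)
        start = bisect.bisect(self._points, _hash(file_id))
        chosen = []
        for step in range(len(self._ring)):
            server = self._ring[(start + step) % len(self._ring)][1]
            if server not in chosen:
                chosen.append(server)
                if len(chosen) == wanted:
                    break
        return chosen
```

The same inputs always give the same placement (repeatable), `replicas` controls how many distinct servers hold a file, and removing a server only moves the files that had a copy on it.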

ISP2 DATA CENTER

WHAT TO FILL?

CONTROL PLANE

IXP DATA CENTER

WHERE TO FILL FROM?

ISP1 DATA CENTER

S3

FILL OVER HTTP

Load Distribution

[Chart: BYTES STREAMED vs. CONTENT RANK]

LOTS OF THROUGHPUT

LOTS OF STORAGE

CONTENT WITH CONFLICTING CONSTRAINTS

SSD BASED

SPINNING DISK BASED

WITHIN CLUSTERS ON EACH SERVER

[Chart: BYTES STREAMED vs. CONTENT RANK, with regions labeled MEMORY, SSD, SPINNING DISK]

TAKEAWAY: Tier Infrastructure
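
One simple way to think about tiering is to walk the ranked catalog and fill the fastest tier first. The sketch below is a greedy illustration under assumed capacities and file sizes; the function name and the rule of moving to the next tier as soon as one file does not fit are assumptions, not the allocation logic described in the talk.

```python
def assign_tiers(ranked_files, tier_capacities):
    """Place files on tiers, most popular files on the fastest tier.

    ranked_files: list of (file_id, size) tuples, most popular first.
    tier_capacities: list of (tier_name, capacity) tuples, fastest first.
    Files that fit on no remaining tier are left unplaced.
    """
    remaining = [[name, capacity] for name, capacity in tier_capacities]
    placement = {}
    tier_index = 0
    for file_id, size in ranked_files:
        # Move on to slower tiers until one has room for this file.
        while tier_index < len(remaining) and remaining[tier_index][1] < size:
            tier_index += 1
        if tier_index == len(remaining):
            break
        placement[file_id] = remaining[tier_index][0]
        remaining[tier_index][1] -= size
    return placement
```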

BALANCE ACROSS SERVERS WITHIN CLUSTERS

BALANCE ACROSS EQUIDISTANT CLUSTERS

HOW DO WE BALANCE LOAD?

OPEN CONNECT

NETFLIX DEVICE

CDN CONTROL PLANE

DEVICE CONTROL PLANE

LOAD BALANCER

STREAM

USING CONTENT DISTRIBUTION

AND WHEN WE HAVE EQUALLY ATTRACTIVE LOCATIONS TO SERVE FROM: FLIP A COIN
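
Flipping a coin among equally attractive locations amounts to a uniform random choice over the candidates that tie for best. A minimal sketch, assuming each candidate carries a numeric distance where lower is better; the names and the distance values are hypothetical.

```python
import random


def pick_location(candidates, rng=random):
    """Choose uniformly at random among the candidates tied for lowest distance.

    candidates: list of (name, distance) tuples.
    """
    best = min(distance for _, distance in candidates)
    tied = [name for name, distance in candidates if distance == best]
    return rng.choice(tied)


# Two equidistant clusters and one that is farther away.
CANDIDATES = [("CLUSTER A", 1), ("CLUSTER B", 1), ("CLUSTER C", 3)]
```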

[Chart: SYSTEM METRICS vs. INCIDENT LOAD, with a MAX level and SANE and INSANE regions]

HOW DO WE LOAD SERVERS OPTIMALLY?

… AMIDST EVER CHANGING INTERNET WEATHER

[Chart: TRAFFIC vs. t]

… AND DAILY TRAFFIC EBBS AND FLOWS

WE INTRODUCE A FEEDBACK LOOP

[Diagram: CONTROL, SERVE STREAMS, TRAFFIC EFFECT ON SYSTEM METRICS, FEEDBACK, with + and - inputs]

TAKEAWAY: PID CONTROLLER

                    DC MOTOR         LOADING SERVERS
Process Variable    Current RPM      System Metrics
Set Point           Desired RPM      System Metrics Max
Control Variable    Input Voltage    Controlled Traffic

ISP2 DATA CENTER

CONTROL TO 80%

CONTROL PLANE

IXP DATA CENTER

NO CONTROL

ISP1 DATA CENTER

0.0 < CONTROL VAR < 1.0

[Chart: TRAFFIC vs. t, with NEXT HOP]

TRAFFIC SHIFTS TO NEXT HOP LOCATION
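
The mapping above (system metrics as process variable, system metrics max as set point, controlled traffic as control variable) can be sketched with a textbook PID loop. The gains, the toy server model in `simulate`, and the names below are all assumptions chosen for illustration; the 0.8 set point and the 0.0 to 1.0 control range echo the slide labels.

```python
class PIDController:
    def __init__(self, kp, ki, kd, set_point, low=0.0, high=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.set_point = set_point
        self.low, self.high = low, high
        self._integral = 0.0
        self._previous_error = None

    def update(self, process_variable, dt=1.0):
        """Return the new control variable, clamped to [low, high]."""
        error = self.set_point - process_variable
        self._integral += error * dt
        if self._previous_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._previous_error) / dt
        self._previous_error = error
        output = self.kp * error + self.ki * self._integral + self.kd * derivative
        return max(self.low, min(self.high, output))


def simulate(demand, steps=50):
    """Toy model: utilization = demand * fraction of traffic admitted."""
    pid = PIDController(kp=0.2, ki=0.5, kd=0.0, set_point=0.8)
    control = 1.0
    utilization = demand * control
    for _ in range(steps):
        control = pid.update(utilization)
        utilization = demand * control
    return control, utilization
```

In this toy model, demand at 120% of capacity settles with roughly two thirds of the traffic admitted, which holds utilization near the 80% set point; the remainder would be the traffic that shifts to the next hop. The sketch has no anti-windup or tuning, which a real controller would need.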

Steering

STREAM

NETFLIX DEVICE

CDN CONTROL PLANE

PLAYBACK SERVICES

STEERING

Got URLs for f1, f2, …, fn?

Yes, here are the URLs

PROXIMITY

HEALTH

CONTENT

CASS

KAFKA

OPEN CONNECT
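
The steering exchange can be pictured as combining the three inputs named above: proximity, health, and content. The sketch below assumes a flat list of server records with made-up hosts under the reserved `.example` domain; the record fields, the URL shape, and the ordering rule are hypothetical.

```python
def steer(file_ids, servers):
    """Return candidate URLs per file, nearest healthy server first."""
    urls = {}
    for file_id in file_ids:
        # Keep only healthy servers that actually hold the file.
        eligible = [s for s in servers if s["healthy"] and file_id in s["files"]]
        eligible.sort(key=lambda s: s["proximity"])
        urls[file_id] = [f"https://{s['host']}/{file_id}" for s in eligible]
    return urls


# Hypothetical view of PROXIMITY, HEALTH, and CONTENT for three servers.
SERVERS = [
    {"host": "isp1.example", "proximity": 1, "healthy": True, "files": {"f1"}},
    {"host": "ixp.example", "proximity": 2, "healthy": True, "files": {"f1", "f2"}},
    {"host": "isp2.example", "proximity": 1, "healthy": False, "files": {"f2"}},
]
```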

Architecture Evolution

5 CHALLENGES

[Architecture: API, STEERING, SESSION, MANIFEST, DRM LICENSE]

How did we evolve from here...

[Architecture: API, CLIENT SCRIPTS, SERVICE LAYER, STEERING, SESSION, MANIFEST, DRM LICENSE, RULES, INSIGHTS, CACHE]

...to here.

5 SOLUTIONS

DEVICE

CUSTOMER

TITLE

NETWORK

Broadband - wired or wifi

Cellular - Edge, 3G, LTE, ...

CONNECTIONS

COUNTRY

CHALLENGE: High dimensionality

How can we quickly alter the playback experience in a targeted manner?

USE CASE: Stream Filtering

[Diagram: ALL STREAMS FOR CONTENT, RULES ENGINE, BEST STREAMS FOR SESSION]

EXAMPLE RULES

UPDATING RULES

[Diagram: CONFIGURATION MANAGEMENT UI, PUBLISH, TOPIC, SUBSCRIBE, RULES ENGINE]

[Architecture: API, STEERING, SESSION, MANIFEST, DRM LICENSE, RULES]

TAKEAWAY: Dynamic Business Rules
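
Stream filtering with dynamic rules can be sketched as rules held as data and evaluated per session, so that publishing a new rule set changes behavior without a code change. The `Rule` shape, the bitrate cap, and the session attributes below are assumptions for illustration, not the rule format used by Netflix.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    when: dict             # session attributes that must all match
    max_bitrate_kbps: int  # hypothetical action: cap the stream bitrate


def filter_streams(streams, session, rules):
    """Return the streams allowed for this session under the active rules."""
    caps = [
        rule.max_bitrate_kbps
        for rule in rules
        if all(session.get(key) == value for key, value in rule.when.items())
    ]
    if not caps:
        return list(streams)
    cap = min(caps)
    return [s for s in streams if s["bitrate_kbps"] <= cap]


ALL_STREAMS = [{"bitrate_kbps": 235}, {"bitrate_kbps": 1750}, {"bitrate_kbps": 5800}]
RULES = [Rule(when={"network": "cellular"}, max_bitrate_kbps=2000)]
```

Swapping `RULES` for a newly published list is the whole update path in this sketch, which is the property the takeaway points at.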

CHALLENGE: Pinpoint what is broken

Haystacks, CC BY-SA, John Pavelka 2008, Flickr

3:00 AM : Pager goes off

METRICS AND ALERTING

OK...error code 105 is elevated. But why?

Indexed Logging

[Architecture: API, STEERING, SESSION, MANIFEST, DRM LICENSE, RULES, INSIGHTS]

TAKEAWAY: Detailed Domain Insights
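
The value of indexed, domain-aware logging is that an elevated error code can be broken down by dimension. A small sketch of that kind of breakdown over in-memory events follows; the event fields and values are invented for illustration and say nothing about the actual logging stack.

```python
from collections import Counter


def breakdown(events, error_code, dimensions):
    """Count events with the given error code along each requested dimension."""
    matching = [e for e in events if e["error_code"] == error_code]
    return {
        dimension: Counter(e[dimension] for e in matching).most_common()
        for dimension in dimensions
    }


# Hypothetical structured log events.
EVENTS = [
    {"error_code": 105, "device": "XBOX", "country": "US"},
    {"error_code": 105, "device": "XBOX", "country": "CA"},
    {"error_code": 105, "device": "iPHONE", "country": "US"},
    {"error_code": 200, "device": "HTML5 PLAYER", "country": "US"},
]
```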

CHALLENGE: Large amount of state

How can we enable faster UIs and low-end devices?

We introduced a server-side caching tier

[Diagram: MANIFESTS, CUSTOMER A, CUSTOMER A, CUSTOMER B]

Watch out for resiliency issues!!

Ping Pong project, CC BY-SA, Michael Knowles 2008, Flickr

[Architecture: API, STEERING, SESSION, MANIFEST, DRM LICENSE, RULES, INSIGHTS, CACHE]

TAKEAWAY: Reduce client state
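
A server-side cache of manifests can be sketched as a cache-aside lookup. To reflect the resiliency warning, this version treats a cache failure as a miss and regenerates the manifest; that fallback policy, the key shape, and the function names are assumptions rather than the design described in the talk.

```python
def get_manifest(customer_id, title_id, cache, generate):
    """Return a cached manifest, regenerating it on a miss or a cache failure.

    cache: any object supporting .get(key) and cache[key] = value.
    generate: callable(customer_id, title_id) that builds a fresh manifest.
    """
    key = (customer_id, title_id)
    try:
        cached = cache.get(key)
    except Exception:
        cached = None  # treat a cache outage as a miss
    if cached is not None:
        return cached
    manifest = generate(customer_id, title_id)
    try:
        cache[key] = manifest
    except Exception:
        pass  # serving the request matters more than populating the cache
    return manifest
```

Passing a plain `dict` as `cache` is enough to try the sketch.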

CHALLENGE: Managing device protocols

Square peg, round hole, CC BY-SA, Simon Law 2006, Flickr

Can we allow devices to define their own protocols?

DYNAMIC SCRIPTING PLATFORM

[Diagram: devices XBOX, iPHONE, HTML5 PLAYER; scripts xbox.groovy, iphone.groovy, html5.groovy; JAVA SERVICE LAYER with SESSION, LICENSE, MANIFEST]

[Architecture: API, CLIENT SCRIPTS, SERVICE LAYER, STEERING, SESSION, MANIFEST, DRM LICENSE, RULES, INSIGHTS, CACHE]

TAKEAWAY: Client-driven protocols
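
The idea of per-device scripts over a shared service layer can be sketched as a registry of adapter functions. The slides name Groovy scripts on a Java service layer; the Python below only mirrors the shape, and the service function, response formats, and registry are hypothetical.

```python
def manifest_service(title):
    """Stand-in for the shared service layer's MANIFEST call."""
    return {"title": title, "streams": ["video", "audio"]}


def xbox_script(request):
    # Hypothetical protocol: return the manifest nested under one key.
    return {"manifest": manifest_service(request["title"])}


def iphone_script(request):
    # Hypothetical protocol: return a flattened summary instead.
    manifest = manifest_service(request["title"])
    return {"title": manifest["title"], "stream_count": len(manifest["streams"])}


# Each device team owns its own script; the service layer stays unchanged.
SCRIPTS = {"XBOX": xbox_script, "iPHONE": iphone_script}


def handle(device_type, request):
    """Dispatch a request to the script registered for the device type."""
    return SCRIPTS[device_type](request)
```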

CHALLENGE: Enabling high-velocity innovation

CC BY-SA, Nathan E Photography 2008, Flickr

How can we expose new data with the least amount of churn?

API, MANIFEST, CLIENT SCRIPT

Stream
● Bitrate
● Framerate
● Dynamic Data

Stream’
● Bitrate
● Dynamic Data

Stream’’
● Dynamic Data

This works from API:
● stream.getBitrate()
● stream.getDynamicData().get("FRAME_RATE")

This works from CLIENT SCRIPT!
● stream.getDynamicData().get("BIT_RATE")
● stream.getDynamicData().get("FRAME_RATE")

Works both ways!

[Architecture: API, CLIENT SCRIPTS, SERVICE LAYER, STEERING, SESSION, MANIFEST, DRM LICENSE, RULES, INSIGHTS, CACHE]

TAKEAWAY: Data pass-thru
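
Data pass-thru can be pictured as a typed object that also exposes a generic key-value view, so a layer that has not yet added a typed field can still read it. The Python sketch below mirrors the slide's `getBitrate()` and `getDynamicData()` calls with snake_case names; the class layout is an assumption, not the actual Netflix types.

```python
class Stream:
    """A stream with one typed field plus pass-through dynamic data."""

    def __init__(self, bitrate, dynamic_data=None):
        self._bitrate = bitrate
        self._extra = dict(dynamic_data or {})

    def get_bitrate(self):
        return self._bitrate

    def get_dynamic_data(self):
        # Typed fields are mirrored into the generic view, so callers that
        # only know the key-value interface can still read them.
        data = dict(self._extra)
        data["BIT_RATE"] = self._bitrate
        return data


# FRAME_RATE has no typed getter here, yet it still travels with the stream.
stream = Stream(bitrate=1750, dynamic_data={"FRAME_RATE": 24})
```

A new attribute can then be added at the source and read downstream without changing every intermediate type.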

TAKEAWAYS

● BGP based proximity
● Tiered Infrastructure
● PID Controller
● EWMA for historical data
● Consistent Hashing

● Dynamic business rules
● Detailed domain insights
● Reduce client state
● Client-driven protocols
● Data pass-thru

Questions?

Haley Tucker @hwilson1204

Mohit Vora @mohitvora

Image Attributions

● Background image from https://www.flickr.com/photos/centralasian/4099515384, Image was cropped and red lines and dots were drawn on top, https://creativecommons.org/licenses/by/2.0/.

● Image from https://www.flickr.com/photos/28705377@N04/4142872268, No modifications made, https://creativecommons.org/licenses/by/2.0/.

● Image of cassette is from https://www.flickr.com/photos/comedynose/6939206771, Image was cropped, https://creativecommons.org/licenses/by/2.0/.

● Image of speaker is from https://www.flickr.com/photos/av_hire_london/5578975575, No changes made, https://creativecommons.org/licenses/by/2.0/.

● Image of television is from https://www.flickr.com/photos/jvcamerica/3660897684/, No changes made, https://creativecommons.org/licenses/by/2.0/.

● Image of text is from https://www.flickr.com/photos/dno1967b/5754743006, No changes made, https://creativecommons.org/licenses/by/2.0/.

● Background image from https://www.flickr.com/photos/mcgraths/866572532, Image was cropped, https://creativecommons.org/licenses/by/2.0/.

● Image from https://www.flickr.com/photos/thatguyfromcchs08/2300190277, Image is dimmed, https://creativecommons.org/licenses/by/2.0/.

● Image from https://www.flickr.com/photos/mknowles/3134373590, Image was cropped, https://creativecommons.org/licenses/by-sa/2.0/.