The way to deal with Big data problems Monal Daxini March 2016

BDX 2016- Monal daxini @ Netflix

Page 1: BDX 2016-  Monal daxini  @ Netflix

The way to deal with Big data problems

Monal Daxini

March 2016

Page 2: BDX 2016-  Monal daxini  @ Netflix

Monal Daxini
Real Time Data Infrastructure

Senior Software Engineer, Netflix
https://www.linkedin.com/in/monaldaxini

@monaldax

#Netflix #Keystone

Page 3: BDX 2016-  Monal daxini  @ Netflix

We help Produce, Store, Process, Move Events @ scale

Page 4: BDX 2016-  Monal daxini  @ Netflix

Tell me more...

● Big Data Ecosystem @ Netflix
● How we built a scalable event pipeline - Keystone - in a year
  ○ Replaced legacy system without service disruption
  ○ Small team 8 + 1
● Netflix Culture
  ○ Relevant tenets tagged on the slides

Page 5: BDX 2016-  Monal daxini  @ Netflix

Global Launch - Jan 6, 2016

Page 6: BDX 2016-  Monal daxini  @ Netflix

Over 75M Members

190 Countries

125M hours/day → 11B hours / quarter

14,269 years / day → 1,255,707 years / quarter

1000+ devices

37% of Internet traffic at peak
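
The hours-to-years conversion on this slide can be checked with a little arithmetic. This is a minimal sketch, assuming a 365-day year (8,760 hours); the variable names are illustrative and not from the talk.

```python
# Assumption: one year = 365 days = 8,760 hours.
HOURS_PER_YEAR = 24 * 365

hours_per_day = 125_000_000          # 125M hours/day (from the slide)
hours_per_quarter = 11_000_000_000   # 11B hours/quarter (from the slide)

years_per_day = hours_per_day / HOURS_PER_YEAR
years_per_quarter = hours_per_quarter / HOURS_PER_YEAR

print(int(years_per_day))      # 14269
print(int(years_per_quarter))  # 1255707
```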

Page 7: BDX 2016-  Monal daxini  @ Netflix

Netflix Is a Data Driven Company

Content

Product

Marketing

Finance

Business Development

Talent

Infrastructure

← Culture of Analytics →

Page 8: BDX 2016-  Monal daxini  @ Netflix

Data @ Netflix

Data at Rest (batch)

Data in Motion (streaming)

Page 9: BDX 2016-  Monal daxini  @ Netflix

Big Data Systems - batch

Ingestion / Kafka -> Ursula, Aegisthus

Storage / S3, Teradata, Redshift, Druid

Processing / Pig, Hive, Presto, Spark

Reporting / Microstrategy, Tableau, Sting

Scheduling / UC4

Interface / Big Data Portal, Kragle

Open source & Community Driven

Page 10: BDX 2016-  Monal daxini  @ Netflix

Big Data Systems - batch

Page 11: BDX 2016-  Monal daxini  @ Netflix

Scale - batch

AWS S3 (instead of HDFS)

40 PB (S3) Compressed

Of which 13 PB events data

Page 12: BDX 2016-  Monal daxini  @ Netflix

Big Data Systems - streaming

Data Pipeline - Keystone

Playback & operational insight - Mantis

Stream Processing* - Spark Streaming

Metrics & monitoring - Atlas

Highly Aligned, Loosely Coupled

Open source & Community Driven

Page 13: BDX 2016-  Monal daxini  @ Netflix

What does culture have to do with big data?

Page 14: BDX 2016-  Monal daxini  @ Netflix

Netflix Culture Deck

Netflix Culture: Freedom & Responsibility

Page 15: BDX 2016-  Monal daxini  @ Netflix

"It may well be the most important document ever to come out of the Valley." 1

Sheryl Sandberg
COO, Facebook

1 Business Insider, 2013

Page 16: BDX 2016-  Monal daxini  @ Netflix

A NETFLIX ORIGINAL SERVICE

How we built an internal facing 1 trillion / day stream processing cloud platform in a year, and how culture played a pivotal role

Freedom & Responsibility

Page 17: BDX 2016-  Monal daxini  @ Netflix

Years ago...

Page 18: BDX 2016-  Monal daxini  @ Netflix

In the Old Days ...

EMR

Event Producers

Page 19: BDX 2016-  Monal daxini  @ Netflix

Chukwa/Suro + Real-Time Branch

Page 20: BDX 2016-  Monal daxini  @ Netflix

About a year ago ...

Page 21: BDX 2016-  Monal daxini  @ Netflix

Chukwa / Suro + Real-Time Branch

[Architecture diagram. Components: Event Producer, Suro, Suro Proxy, Kafka, Suro Router, Consumer Kafka, Druid, EMR, Stream Consumers]

Page 22: BDX 2016-  Monal daxini  @ Netflix

Support at-least-once processing

Scale, Ease of Operations

Replace dormant open source software - Chukwa

Enable future value adds - Stream Processing As a Service

Seamless transition to the new platform

Context Not Control

Page 23: BDX 2016-  Monal daxini  @ Netflix

Migrate events to a new pipeline in flight, while not losing more than 0.1% of them

Context Not Control

Highly Aligned, Loosely Coupled

Page 24: BDX 2016-  Monal daxini  @ Netflix

Jan 2016

Page 25: BDX 2016-  Monal daxini  @ Netflix

Keystone

[Architecture diagram. Components: Event Producer, KS Proxy, Fronting Kafka, Samza Router, Consumer Kafka, EMR, Stream Consumers, Control Plane]

Page 26: BDX 2016-  Monal daxini  @ Netflix

1 trillion events ingested per day during holiday season

1+ trillion events processed every day

600+ billion events ingested per day (350 billion a year ago)

Keystone - Scale - Streaming

Page 27: BDX 2016-  Monal daxini  @ Netflix

11 million events per second (24 GB per second) at peak

Up to 10 MB payload / Avg 4 KB

1.3 PB / day

Keystone - Scale - Streaming

Page 28: BDX 2016-  Monal daxini  @ Netflix

Events & Producers

Page 29: BDX 2016-  Monal daxini  @ Netflix

Keystone

[Architecture diagram. Components: Event Producer, Fronting Kafka, Samza Router, Consumer Kafka, EMR, Stream Consumers, Control Plane]

Page 30: BDX 2016-  Monal daxini  @ Netflix

Event Payload is Immutable

At-least-once semantics*

* Applies once the event makes it to Kafka; there are disaster scenarios where this breaks.

Page 31: BDX 2016-  Monal daxini  @ Netflix

Injected Event Metadata

● GUID

● Timestamp

● Host

● App
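
One way to picture injected metadata alongside an immutable payload is an envelope that carries the four fields and leaves the payload bytes untouched. This is a minimal sketch only: the `Envelope` class, the `wrap_event` function, and the field names are hypothetical and are not Keystone's actual types.

```python
import socket
import time
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical wrapper: injected metadata plus the untouched payload."""
    guid: str
    timestamp_ms: int
    host: str
    app: str
    payload: bytes  # bytes are immutable, so the payload cannot change in place


def wrap_event(payload: bytes, app: str) -> Envelope:
    """Attach GUID, timestamp, host, and app without modifying the payload."""
    return Envelope(
        guid=str(uuid.uuid4()),
        timestamp_ms=int(time.time() * 1000),
        host=socket.gethostname(),
        app=app,
        payload=payload,
    )
```

Under at-least-once delivery an event may arrive more than once, and a unique identifier such as the GUID is the kind of field a consumer could use to detect duplicates; the slides do not say whether Keystone does so.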

Page 32: BDX 2016-  Monal daxini  @ Netflix

Keystone Extensible Wire Protocol

● Backwards and forwards compatibility

● Supports JSON, AVRO on the horizon

● Invisible to source & sinks

● Efficient - 10 bytes overhead per message

○ because message size - hundreds of bytes to 10MB
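
To make the "10 bytes overhead" figure concrete, here is a sketch of what a fixed 10-byte header could look like. The layout below (magic, version, payload type, flags, length) is entirely hypothetical; the slides do not describe Keystone's actual wire format, and the names `HEADER`, `encode`, and `decode` are invented for illustration.

```python
import struct

# Hypothetical 10-byte header layout (NOT Keystone's actual format):
# magic (2 bytes) | version (1) | payload type (1) | flags (2) | payload length (4)
HEADER = struct.Struct(">2sBBHI")
MAGIC = b"KS"
PAYLOAD_JSON = 0


def encode(payload: bytes, version: int = 1, payload_type: int = PAYLOAD_JSON) -> bytes:
    """Prefix the payload with the fixed-size header."""
    return HEADER.pack(MAGIC, version, payload_type, 0, len(payload)) + payload


def decode(frame: bytes):
    """Return (version, payload_type, payload) from a framed message."""
    magic, version, payload_type, _flags, length = HEADER.unpack_from(frame)
    if magic != MAGIC:
        raise ValueError("not a framed message")
    # Flags are ignored and the length field bounds the payload, so this reader
    # does not depend on flag bits it does not know about.
    payload = frame[HEADER.size:HEADER.size + length]
    return version, payload_type, payload
```

A version field and a payload-type field are a common way to leave room for new encodings (such as the Avro support the slide mentions) without breaking existing readers. On a message of a few kilobytes, 10 bytes of framing is a fraction of a percent.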

Page 33: BDX 2016-  Monal daxini  @ Netflix

Netflix Kafka Producer

● Best effort delivery - ack = 1

● Prefer dropping an event to disrupting the producer app

● Resume event production after Kafka cluster restore

● Integration with Netflix Ecosystem

● Configurable routing of topics to Kafka clusters
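
The "prefer dropping to disrupting" policy can be sketched as a bounded buffer that never blocks the calling application. This is an illustration, not the Netflix producer: `BestEffortBuffer` and its methods are hypothetical names, and no Kafka client is involved. In Kafka's own producer configuration, "ack = 1" corresponds to `acks=1`, where the partition leader acknowledges a write without waiting for followers.

```python
import queue


class BestEffortBuffer:
    """Bounded in-memory buffer that drops events instead of blocking the caller."""

    def __init__(self, capacity: int):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def offer(self, event: bytes) -> bool:
        """Try to enqueue; never blocks the producing application."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1  # count the drop so it can be reported as a metric
            return False

    def drain(self, max_events: int) -> list:
        """Called by a background sender to pull up to max_events for sending."""
        batch = []
        while len(batch) < max_events:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

Once the background sender drains the buffer again (for example after a cluster is restored), `offer` starts succeeding without any action from the application.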

Page 34: BDX 2016-  Monal daxini  @ Netflix

Fronting Kafka Clusters

Page 35: BDX 2016-  Monal daxini  @ Netflix

Keystone

[Architecture diagram. Components: Event Producer, Fronting Kafka, Samza Router, Consumer Kafka, EMR, Stream Consumers, Control Plane]

Page 36: BDX 2016-  Monal daxini  @ Netflix

● Pioneer Tax
● Started with 0.7
● In prod with 0.8.2
● Move to 0.9 & VPC in progress

Kafka in the Cloud

Page 37: BDX 2016-  Monal daxini  @ Netflix

Based on topics assigned

● Normal-priority (majority)
● High-priority (streaming activities etc.)

Fronting Kafka Topic Classification

Page 38: BDX 2016-  Monal daxini  @ Netflix

● ≅3200 d2.xl brokers for regular, failover, & consumer

● 125 Zookeeper nodes
  ○ Independent zookeeper cluster per Kafka cluster

● 24 island clusters, 8 per region
  ○ 3 ASGs per cluster, 1 ASG per zone
  ○ 24 warm-standby, 3-node failover clusters

Scale - Kafka (prod)

Page 39: BDX 2016-  Monal daxini  @ Netflix

● No dynamic topic creation
● Two copies
● Zone-aware assignment of topic partitions and replicas

Fronting Kafka Topics
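
Zone-aware assignment means the two copies of a partition never share an availability zone, so losing one zone leaves a copy intact. This is a simplified round-robin sketch under that assumption; `assign_replicas` and its arguments are hypothetical and this is not the assignment logic Netflix contributed to Kafka.

```python
def assign_replicas(num_partitions, brokers_by_zone, copies=2):
    """Return {partition: [broker, ...]} with each copy in a different zone.

    brokers_by_zone maps a zone name to its list of broker ids.
    """
    zones = sorted(brokers_by_zone)
    if copies > len(zones):
        raise ValueError("need at least as many zones as copies")
    assignment = {}
    for p in range(num_partitions):
        replicas = []
        for r in range(copies):
            zone = zones[(p + r) % len(zones)]   # rotate the starting zone per partition
            brokers = brokers_by_zone[zone]
            # Spread partitions across the brokers inside the chosen zone.
            replicas.append(brokers[(p // len(zones)) % len(brokers)])
        assignment[p] = replicas
    return assignment
```

With three zones of two brokers each, `assign_replicas(6, {"a": [1, 2], "b": [3, 4], "c": [5, 6]})` places partition 0 on brokers 1 and 3, which sit in zones "a" and "b".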

Page 40: BDX 2016-  Monal daxini  @ Netflix

In a distributed system make sure you understand limitations and failures,

even if you don’t know all the features.

- Monal

Page 41: BDX 2016-  Monal daxini  @ Netflix

In addition, we do Kafka Kong once a week

Page 42: BDX 2016-  Monal daxini  @ Netflix

Fronting Kafka Failover

Self Service Tool

Blameless Culture

Page 43: BDX 2016-  Monal daxini  @ Netflix

Fronting Kafka Failover

Page 44: BDX 2016-  Monal daxini  @ Netflix

Fronting Kafka Failover

Page 45: BDX 2016-  Monal daxini  @ Netflix

Kafka Management UI (Beta)
Open sourcing on the road map

Open source & Community Driven

Page 46: BDX 2016-  Monal daxini  @ Netflix
Page 47: BDX 2016-  Monal daxini  @ Netflix
Page 48: BDX 2016-  Monal daxini  @ Netflix

Kafka Auditor
Open sourcing on the road map

Open source & Community Driven

Page 49: BDX 2016-  Monal daxini  @ Netflix

Kafka Auditor - One per cluster

● Broker monitoring

● Consumer monitoring

● Heart-beat & Continuous message latency

● On-demand Broker performance testing

● Built as a service deployable on single or multiple instances

Page 50: BDX 2016-  Monal daxini  @ Netflix

Kafka Cluster Size - Tips

● Per Cluster Stay under 10k partitions & 200 brokers

● Leave approx. 40% free disk space on each broker
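
These two tips translate directly into a simple check. The thresholds come from the slide; the `check_cluster` function and its input shape are hypothetical and only illustrate how such limits could be monitored.

```python
MAX_PARTITIONS = 10_000   # "stay under 10k partitions" (from the slide)
MAX_BROKERS = 200         # "stay under 200 brokers" (from the slide)
MIN_FREE_DISK = 0.40      # "approx. 40% free disk space" (from the slide)


def check_cluster(partitions, free_disk_by_broker):
    """Return a list of warnings; free_disk_by_broker maps broker id -> free fraction."""
    warnings = []
    if partitions >= MAX_PARTITIONS:
        warnings.append(f"{partitions} partitions: stay under {MAX_PARTITIONS}")
    if len(free_disk_by_broker) >= MAX_BROKERS:
        warnings.append(f"{len(free_disk_by_broker)} brokers: stay under {MAX_BROKERS}")
    for broker, free in sorted(free_disk_by_broker.items()):
        if free < MIN_FREE_DISK:
            warnings.append(f"broker {broker}: only {free:.0%} disk free")
    return warnings
```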

Page 51: BDX 2016-  Monal daxini  @ Netflix

● Started with AWS zone aware partition assignments

● We have discovered and filed several bugs

○ Details - Upcoming in Netflix Tech blog

Kafka Contributions

Open source & Community Driven

Page 52: BDX 2016-  Monal daxini  @ Netflix

Routing Service

Page 53: BDX 2016-  Monal daxini  @ Netflix

Keystone

[Architecture diagram. Components: Event Producer, Fronting Kafka, Samza Router, Consumer Kafka, EMR, Stream Consumers, Control Plane]

Page 54: BDX 2016-  Monal daxini  @ Netflix

Routing Infrastructure

[Diagram. Labels: Checkpointing Cluster; 0.9.1; Go; C language]

Page 55: BDX 2016-  Monal daxini  @ Netflix

[Diagram. Labels: Router Job Manager (Control Plane); EC2 Instances; Zookeeper (Instance Id assignment); ksnode running three Jobs; Checkpointing Cluster; ASG]

Page 56: BDX 2016-  Monal daxini  @ Netflix

What’s running in ksnode?

[Diagram. Labels: Custom Go Executor running three ./runJob processes; Logs; Snapshots; Attach Volumes; Reconcile Loop - 1 min; Health Check; Zookeeper (Instance Id assignment)]
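
A reconcile loop, like the one-minute loop labeled on this slide, typically compares the jobs that should be running with the jobs that are running and corrects the difference. The sketch below shows a single pass of that idea in Python rather than Go; what the actual executor reconciles is not stated in the slides, so the `reconcile` function, its arguments, and the job names are assumptions.

```python
def reconcile(desired, running, start, stop):
    """Bring the set of running jobs in line with the desired set.

    desired and running are sets of job names; start and stop are callbacks.
    Returns (started, stopped) as sorted lists.
    """
    to_start = sorted(desired - running)
    to_stop = sorted(running - desired)
    for job in to_stop:
        stop(job)    # stop jobs that are no longer assigned to this node
    for job in to_start:
        start(job)   # start jobs that are assigned but not yet running
    return to_start, to_stop


# One pass with in-memory callbacks; a real executor would repeat this on a timer.
log = []
started, stopped = reconcile(
    desired={"job-a", "job-b"},
    running={"job-b", "job-c"},
    start=lambda job: log.append(("start", job)),
    stop=lambda job: log.append(("stop", job)),
)
```

Because each pass recomputes the difference from scratch, a missed or failed action is simply retried on the next pass, which suits the self-healing goal described later in the talk.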

Page 57: BDX 2016-  Monal daxini  @ Netflix

Ksnode Tooling

[Diagram. Labels: Custom Go Executor running three ./runJob processes; Logs; ZFS Volume; Snapshots; Go Tools Server; Client Tools]

● Stream logs
● Browse through rotated logs by date

Page 58: BDX 2016-  Monal daxini  @ Netflix

Yes! You inferred right!

No Mesos & No Yarn

Page 59: BDX 2016-  Monal daxini  @ Netflix

Distributed Systems are Hard

Keep it Simple

Minimize Moving Parts

Page 60: BDX 2016-  Monal daxini  @ Netflix

● 13,000 docker containers (samza jobs)
  ○ 7,000 - S3 Sink
  ○ 4,500 - Consumer Kafka sink
  ○ 1,500 - Elasticsearch sink

● 1,300 AWS C3-4XL instances

Scale - Routing Service
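
The figures on this slide are consistent with each other, and dividing them gives the average container density per instance. This is a quick arithmetic check; the variable names are illustrative.

```python
# Container counts per sink type and instance count, as given on the slide.
sinks = {"S3": 7_000, "Consumer Kafka": 4_500, "Elasticsearch": 1_500}
instances = 1_300

total_containers = sum(sinks.values())
containers_per_instance = total_containers / instances

print(total_containers)         # 13000
print(containers_per_instance)  # 10.0
```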

Page 61: BDX 2016-  Monal daxini  @ Netflix

More Info - Samza Meetup (10/2015)

Samza ver 0.9.1 Contributions

Open source & Community Driven

Page 62: BDX 2016-  Monal daxini  @ Netflix

Target & achieved: <= 0.1% difference between the Chukwa and Keystone pipelines, over 2.6 PB of data / day

Chukwa & Keystone Pipeline Shadowing
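
Shadowing runs both pipelines on the same traffic and compares what each delivers. A check against the 0.1% target could look like the sketch below; comparing event counts is an assumption (the slides do not say what was measured), and `relative_diff` and `within_target` are hypothetical names.

```python
def relative_diff(legacy_count, new_count):
    """Relative difference of the new pipeline's count against the legacy one."""
    if legacy_count == 0:
        raise ValueError("legacy count must be non-zero")
    return abs(legacy_count - new_count) / legacy_count


def within_target(legacy_count, new_count, target=0.001):
    """True when the two pipelines differ by no more than the target (0.1%)."""
    return relative_diff(legacy_count, new_count) <= target
```

For example, `within_target(1_000_000, 999_500)` is true (a 0.05% difference), while `within_target(1_000_000, 998_000)` is false (0.2%).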

Page 63: BDX 2016-  Monal daxini  @ Netflix

Metrics & Monitoring

Page 64: BDX 2016-  Monal daxini  @ Netflix

Keystone

[Architecture diagram. Components: Event Producer, KS Proxy, Fronting Kafka, Samza Router, Consumer Kafka, EMR, Stream Consumers, Control Plane]

Page 65: BDX 2016-  Monal daxini  @ Netflix

Customer Facing per topic end-to-end dashboard

Page 66: BDX 2016-  Monal daxini  @ Netflix

Dev facing infrastructure end-to-end dashboard

Page 67: BDX 2016-  Monal daxini  @ Netflix

Scaling Avenues

Page 68: BDX 2016-  Monal daxini  @ Netflix

● Exposed cost attribution per event producers & topic

○ E.g. one producer reduced throughput by 600%

● Automation - frees up additional resources

Scaling Up by Scaling Down

Page 69: BDX 2016-  Monal daxini  @ Netflix

● No dedicated product or project managers

● No separate devops or operational team

● This does not mean we are constantly overworked

○ we make wise and simple choices and

○ lean towards automation & self-healing systems.

We build and run what you saw today!

You build it! You run it!

High Performance

Page 70: BDX 2016-  Monal daxini  @ Netflix

Not DevOps, but move towards NoOps

You build it! You run it!

Page 71: BDX 2016-  Monal daxini  @ Netflix

● High Performance culture
● Communication
● No culture of process adherence
  ○ Creativity & Self Discipline
  ○ Freedom and Responsibility

Page 72: BDX 2016-  Monal daxini  @ Netflix

Looking into the future?

Page 73: BDX 2016-  Monal daxini  @ Netflix

Streaming Processing As a Service

● Multi-tenant polyglot support of streaming engines like Spark Streaming, Mantis, Samza, and maybe Flink

Future steps

Open source & Community Driven

Page 74: BDX 2016-  Monal daxini  @ Netflix

Messaging As a Service

● Kafka & Others
● Spark Streaming, Mantis, Samza, and maybe Flink.

Future steps

Open source & Community Driven

Page 75: BDX 2016-  Monal daxini  @ Netflix

Data thruway

● Support for schemas - registry, discovery, validation.

Self Service Tooling

Future steps

Open source & Community Driven