Storm@Twitter, SIGMOD 2014

Page 1: Storm@Twitter, SIGMOD 2014

Storm @Twitter

KARTHIK RAMASAMY @KARTHIKZ

#TwitterAtSigmod #TwitterDataStorm

Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Jignesh Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal and Dmitriy Ryaboy

Page 2: Storm@Twitter, SIGMOD 2014

TALK OUTLINE

I. STORM OVERVIEW
II. STORM INTERNALS
III. OPERATIONAL OVERVIEW
IV. OPERATIONAL EXPERIENCES
V. STORM EXPERIMENTS

Page 3: Storm@Twitter, SIGMOD 2014

I. OVERVIEW

Page 4: Storm@Twitter, SIGMOD 2014

WHAT IS STORM?

Streaming platform for analyzing real-time data as it arrives, so you can react to data as it happens.

- GUARANTEED MESSAGE PROCESSING
- HORIZONTAL SCALABILITY
- ROBUST FAULT TOLERANCE
- CONCISE CODE: FOCUS ON LOGIC

Page 5: Storm@Twitter, SIGMOD 2014

STORM DATA MODEL

TOPOLOGY
Directed acyclic graph: vertices = computation, edges = streams of data tuples

SPOUTS
Sources of data tuples for the topology. Examples: Kafka/Kestrel/MySQL/Postgres

BOLTS
Process incoming tuples and emit outgoing tuples. Examples: filtering/aggregation/join/arbitrary function

Page 6: Storm@Twitter, SIGMOD 2014

STORM TOPOLOGY

[Diagram: an example topology in which SPOUT 1 and SPOUT 2 feed a DAG of BOLT 1 through BOLT 5.]

Page 7: Storm@Twitter, SIGMOD 2014

WORD COUNT TOPOLOGY

TWEET SPOUT → PARSE TWEET BOLT → WORD COUNT BOLT

Input: live stream of Tweets. Output: running counts, e.g. #worldcup: 1M, soccer: 400K, ….
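The per-tuple logic of the two bolts above can be modeled in a few lines of plain Python. This is a sketch only, not the Storm API; the function names are hypothetical stand-ins for each bolt's execute step.

```python
from collections import Counter

def parse_tweet_bolt(tweet: str) -> list[str]:
    """PARSE TWEET BOLT: emit one tuple (word) per word in the tweet."""
    return tweet.lower().split()

def word_count_bolt(counts: Counter, word: str) -> None:
    """WORD COUNT BOLT: maintain a running count per word."""
    counts[word] += 1

# Simulate the spout feeding a small stream of tweets through the bolts.
counts = Counter()
for tweet in ["#worldcup goal", "soccer #worldcup"]:
    for word in parse_tweet_bolt(tweet):
        word_count_bolt(counts, word)

print(counts["#worldcup"])  # 2
```
In the real topology the spout never ends and the counts are continuously updated as tuples arrive.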

Page 8: Storm@Twitter, SIGMOD 2014

WORD COUNT TOPOLOGY

TWEET SPOUT TASKS → PARSE TWEET BOLT TASKS → WORD COUNT BOLT TASKS

Each component runs as multiple parallel tasks. When a parse tweet bolt task emits a tuple, which word count bolt task should it be sent to?

Page 9: Storm@Twitter, SIGMOD 2014

STREAM GROUPINGS

SHUFFLE GROUPING: random distribution of tuples
FIELDS GROUPING: group tuples by one or more fields
ALL GROUPING: replicate tuples to all tasks
GLOBAL GROUPING: send the entire stream to one task
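Fields grouping is what makes the word count correct: every tuple carrying the same word must land on the same word count task. The routing idea can be sketched as a hash-modulo in Python; `zlib.crc32` here is an assumption standing in for Storm's actual hash function, and `NUM_TASKS` is a hypothetical parallelism setting.

```python
import zlib

NUM_TASKS = 8  # hypothetical number of word count bolt tasks

def fields_grouping(word: str, num_tasks: int = NUM_TASKS) -> int:
    """Route a tuple to a task index by hashing its grouping field."""
    return zlib.crc32(word.encode()) % num_tasks

# Deterministic: the same word always routes to the same task,
# so that task sees every occurrence and its count is complete.
assert fields_grouping("soccer") == fields_grouping("soccer")
assert 0 <= fields_grouping("#worldcup") < NUM_TASKS
```
Shuffle grouping, by contrast, would pick a random task per tuple, which is fine for the stateless parse bolt but would scatter a word's count across tasks.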

Page 10: Storm@Twitter, SIGMOD 2014

II. STORM INTERNALS

Page 11: Storm@Twitter, SIGMOD 2014

STORM ARCHITECTURE

MASTER NODE: runs Nimbus, which receives topology submissions, writes assignment maps to the ZK CLUSTER, and syncs code to the slave nodes.

SLAVE NODES: each runs a SUPERVISOR that manages worker processes W1–W4.

Page 12: Storm@Twitter, SIGMOD 2014

STORM WORKER

A worker is a single JVM PROCESS running multiple EXECUTOR threads; each executor runs one or more TASKs.

Page 13: Storm@Twitter, SIGMOD 2014

DATA FLOW IN STORM WORKERS

A Global Receive Thread reads tuples from the kernel's TCP Receive Buffer and routes them to per-executor In Queues. Each executor's User Logic Thread consumes from its In Queue and emits results to an Out Queue; a Send Thread drains the Out Queue into the Outgoing Message Buffer, and a Global Send Thread writes that buffer to the kernel's TCP Send Buffer. The per-executor queues are Disruptor queues; the transfer buffers are 0mq queues.
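The receive queue / logic thread / send queue pipeline above can be sketched as a toy Python model, assuming one executor; `queue.Queue` stands in for the Disruptor and 0mq queues, which behave very differently in practice.

```python
import queue
import threading

in_q, out_q, sent = queue.Queue(), queue.Queue(), []
STOP = object()  # sentinel to shut the pipeline down

def receive_thread(tuples):
    """Global Receive Thread: push incoming tuples onto the In Queue."""
    for t in tuples:
        in_q.put(t)
    in_q.put(STOP)

def user_logic_thread():
    """User Logic Thread: run the bolt's logic, emit to the Out Queue."""
    while (t := in_q.get()) is not STOP:
        out_q.put(t.upper())  # stand-in for the bolt's execute()
    out_q.put(STOP)

def send_thread():
    """Send Thread: drain the Out Queue toward the network."""
    while (t := out_q.get()) is not STOP:
        sent.append(t)

threads = [threading.Thread(target=receive_thread, args=(["a", "b"],)),
           threading.Thread(target=user_logic_thread),
           threading.Thread(target=send_thread)]
for th in threads: th.start()
for th in threads: th.join()
print(sent)  # ['A', 'B']
```
The point of the stage separation is that network I/O and user logic never block each other except through the queues.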

Page 14: Storm@Twitter, SIGMOD 2014

III. OPERATIONAL OVERVIEW

Page 15: Storm@Twitter, SIGMOD 2014

STORM METRICS

1. SUPPORT AND TROUBLESHOOTING
2. CONTINUOUS PERFORMANCE
3. CLUSTER AVAILABILITY

Page 16: Storm@Twitter, SIGMOD 2014

COLLECTING TOPOLOGY METRICS

TWEET SPOUT → PARSE TWEET BOLT → WORD COUNT BOLT

A dedicated METRICS BOLT collects metrics from the topology's components and writes them to SCRIBE.

Page 17: Storm@Twitter, SIGMOD 2014

SAMPLE TOPOLOGY DASHBOARD

Page 18: Storm@Twitter, SIGMOD 2014

IV. OPERATIONAL EXPERIENCES

Page 19: Storm@Twitter, SIGMOD 2014

OVERLOADED ZOOKEEPER

Shared configuration: one ZooKeeper cluster (zk) served both Storm workers (W) and other services (S1, S2, S3).

Storm quickly exceeded the allowed number of clients and impacted the uptime of the other systems.

Page 20: Storm@Twitter, SIGMOD 2014

OVERLOADED ZOOKEEPER

Detached configuration: Storm moved to its own dedicated ZooKeeper cluster, separate from the one serving S1, S2, S3.

This allowed up to 300 workers per cluster; beyond 300, workers get killed and relaunched. Worker heartbeats are written to a znode every 15 secs.
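The worker limit follows from simple arithmetic on the heartbeat interval stated above. A back-of-the-envelope sketch (counting write operations only; the slide gives no payload sizes, so bytes are out of scope):

```python
def heartbeat_writes_per_sec(workers: int, interval_s: float = 15.0) -> float:
    """ZooKeeper znode writes per second from worker heartbeats,
    assuming one heartbeat write per worker every interval_s seconds."""
    return workers / interval_s

print(heartbeat_writes_per_sec(300))   # 20.0 writes/sec at the 300-worker limit
print(heartbeat_writes_per_sec(1200))  # 80.0 writes/sec after scaling ZooKeeper up
```
Since every heartbeat is a ZooKeeper write, the load grows linearly with worker count, which is why the eventual fix was to move heartbeats off ZooKeeper entirely.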

Page 21: Storm@Twitter, SIGMOD 2014

OVERLOADED ZOOKEEPER

Scale up: scaling up the dedicated ZooKeeper cluster increased the limit to 1200 workers per cluster.

Page 22: Storm@Twitter, SIGMOD 2014

OVERLOADED ZOOKEEPER

Analyzing ZooKeeper traffic: 67% came from the KAFKA SPOUT (offset/partition is written every 2 secs) and 33% from the STORM RUNTIME (workers write heartbeats every 3 secs).

Page 23: Storm@Twitter, SIGMOD 2014

OVERLOADED ZOOKEEPER

Heartbeat daemons: workers write their heartbeats to dedicated heartbeat daemons backed by key-value stores, instead of to ZooKeeper.

Result: 5000 workers per cluster and still growing!

Page 24: Storm@Twitter, SIGMOD 2014

STORM OVERHEADS

EXPT 1: JAVA PROGRAM. Read from a Kafka cluster and deserialize in a "for loop"; sustain an input rate of 300K msgs/sec from a Kafka topic.

EXPT 2: 1-STAGE TOPOLOGY. No acks, to achieve at-most-once semantics; Storm processes were co-located using the isolation scheduler.

EXPT 3: 1-STAGE TOPOLOGY WITH ACKS. Acks enabled, for at-least-once semantics.
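The semantic difference between EXPT 2 and EXPT 3 can be illustrated with a toy Python model: without acks a tuple lost in flight is simply gone (at most once), while with acks the spout replays any tuple it never received an ack for (at least once). The failure pattern below is hypothetical, not from the experiments.

```python
def run(tuples, acks_enabled, fail_once=("t2",)):
    """Process tuples, losing each tuple in fail_once exactly once."""
    processed, pending = [], list(tuples)
    failed = set(fail_once)
    while pending:
        t = pending.pop(0)
        if t in failed:            # simulated in-flight loss
            failed.discard(t)
            if acks_enabled:       # no ack arrives, so the spout replays
                pending.append(t)
            continue
        processed.append(t)
    return processed

print(run(["t1", "t2", "t3"], acks_enabled=False))  # ['t1', 't3'] (t2 lost)
print(run(["t1", "t2", "t3"], acks_enabled=True))   # ['t1', 't3', 't2'] (replayed)
```
Note that at-least-once replay can reorder and duplicate tuples, which is part of why acking adds overhead in EXPT 3.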

Page 25: Storm@Twitter, SIGMOD 2014

STORM OVERHEADS

[Bar chart: machines used (0–3) and average CPU utilization (0%–80%) for JAVA, 1-STAGE, and 1-STAGE-ACK; CPU values shown: 77%, 58.2%, 58.3%; machine counts shown: 1 and 3.]

Page 26: Storm@Twitter, SIGMOD 2014

V. STORM EXPERIMENTS

Page 27: Storm@Twitter, SIGMOD 2014

STORM EXPERIMENTS

Examine resiliency and efficiency during machine failures.

CLIENT EVENT SPOUT → (SHUFFLE GROUPING) → DISTRIBUTOR BOLT → (FIELDS GROUPING) → USER COUNT BOLT → (FIELDS GROUPING) → AGGREGATOR BOLT

COMPONENT            # TASKS
client event spout   200
distributor bolt     200
user count bolt      300
aggregator bolt      20

Page 28: Storm@Twitter, SIGMOD 2014

STORM THROUGHPUT

Page 29: Storm@Twitter, SIGMOD 2014

STORM LATENCY

Page 30: Storm@Twitter, SIGMOD 2014

#ThankYou FOR LISTENING

Page 31: Storm@Twitter, SIGMOD 2014

QUESTIONS and ANSWERS

Go ahead. Ask away.