69
February 5, 2014 Pablo Barrera <[email protected]> Reliable RT processing @ Spotify

Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

February 5, 2014

Pablo Barrera <[email protected]>

Reliable RT processing @ Spotify

Page 2: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Spotify

Page 3: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

�3

Spotify

• the right music for every moment • over 6 million paying customers • over 24 million active users each month • over 20 million songs • over 1.5 billion playlists created so far • available in 55 markets

Page 4: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

i/o tribe

responsible for building the awesome infrastructure that supports the Spotify experience

�4

Page 5: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Our goal

this looks easy

Page 6: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20
Page 7: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20
Page 8: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20
Page 9: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

That was easy !

!

MISSING FIGURE!!!

�7

Page 10: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

but we have a problem...

�8

Page 11: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Naïve approach (tm)

�9

SYSLOG FILE SCP LOG

ARCHIVER CURL HDFS PROXY HADOOP

Page 12: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

�10

SCP CURL

Page 13: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

�10

SCP CURL

Page 14: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Scalability�11

SCP CURL

Page 15: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Scalability�11

SCP CURLfor(;;) { (file) }

for(;;) { (file) }

Page 16: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

�12

Page 17: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

We have a problem...

�13

thousands of servers several data centres millions of users

10 TB each day!

Page 18: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Our Needs !

•reliable delivery •fast data transfer •per-service subscription •low cpu overhead

�14

Page 19: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

�15

Page 20: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Other options !

•active mq/rabbit mq •flume/flume-ng •others: scribe, chukwa, bookkeeper

�16

Page 21: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Apache Kafka !

!

distributed pub/sub system

Page 22: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Kafka coolness !

•at least once read •O(1) •network bounded

�18

Page 23: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Kafka architecture

�19

!KAFKA BROKER

TOPIC A

TOPIC B

TOPIC C

TOPIC D

TOPIC E

KAFKA PRODUCER

KAFKA CONSUMER

Page 24: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Cons !

•no reliability •no replication •manual tuning

�20

Page 25: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Spotify <3 Kafka

running in production!

Page 26: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Kafka at Spotify !

•key component of our log delivery system •kafka 0.7.1 •java 7

�22

Page 27: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Custom extensions !

•end-to-end reliable delivery •compression/encryption service

�23

Page 28: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

End-to-end reliable delivery

Page 29: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

production server

�25

syslog file

syslog file

syslog file

Page 30: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

production server

�25

syslog file

syslog file

syslog file

KAFKA SYSLOG

PRODUCER

Page 31: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

!KAFKA BROKER

Service

production server

�25

syslog file

syslog file

syslog file

KAFKA SYSLOG

PRODUCER

Page 32: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

!KAFKA BROKER

Service

production server

�25

syslog file

syslog file

syslog file

KAFKA SYSLOG

PRODUCER

KAFKA SYSLOG

CONSUMER

Page 33: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

!KAFKA BROKER

Service

production server

�25

syslog file

syslog file

syslog file

HADOOP

KAFKA SYSLOG

PRODUCER

KAFKA SYSLOG

CONSUMER

Page 34: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

!KAFKA BROKER

Service

production server

�25

syslog file

syslog file

syslog file

HADOOP

KAFKA SYSLOG

PRODUCER

KAFKA SYSLOG

CONSUMERACK

Page 35: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

!KAFKA BROKER

Service

production server

�25

syslog file

syslog file

syslog file

HADOOP

KAFKA SYSLOG

PRODUCER

Checkpoint

KAFKA SYSLOG

CONSUMERACK

Page 36: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

is that all?

�26

Page 37: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Piece of cake

right?

Page 38: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

�28

Page 39: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

�29

Kafka Producer

Kafka Broker

Zookeeper

Kafka Consumer Hadoop

Page 40: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

�29

Kafka Producer

Kafka Broker

Zookeeper

Kafka Consumer Hadoop

Cross-site problems

Page 41: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

TCP window !

•TCP parameters for big latency •linux TCP scaling algorithm

�30

Page 42: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

IPSEC !

•linux IPSEC + firewall is slow •major drop in throughput •can not tweak it at app level

�31

Page 43: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

production server

�32

syslog file

syslog file

syslog file

!KAFKA BROKER

Service

HADOOP

KAFKA SYSLOG

PRODUCER

Checkpoint

KAFKA SYSLOG

CONSUMERACK

Page 44: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

production server

�32

syslog file

syslog file

syslog file

!KAFKA BROKER

Service

HADOOP

KAFKA SYSLOG

PRODUCER

Checkpoint

KAFKA SYSLOG

CONSUMERACK

Page 45: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

production server

�32

syslog file

syslog file

syslog file

!KAFKA BROKER

Service

HADOOP

KAFKA SYSLOG

PRODUCER

Checkpoint

KAFKA SYSLOG

CONSUMERACK

KAFKA SYSLOG

ENCRYPTION

Page 46: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

production server

�32

syslog file

syslog file

syslog file

!KAFKA BROKER

Service

HADOOP

KAFKA SYSLOG

PRODUCER

Checkpoint

KAFKA SYSLOG

CONSUMERACK

KAFKA SYSLOG

ENCRYPTIONCompressed

Page 47: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

production server

�32

syslog file

syslog file

syslog file

!KAFKA BROKER

Service

HADOOP

KAFKA SYSLOG

PRODUCER

Checkpoint

KAFKA SYSLOG

CONSUMERACK

KAFKA SYSLOG

ENCRYPTIONCompressed

Page 48: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20
Page 49: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Garbage collector !

•50% of performance drop •25% of cpu time •young generation tuning

�34

Page 50: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

�35

0

20

40

60

80

100

0 2 4 6 8 10 12 14

Tim

e sp

ent o

n Fu

ll GC

(%)

Time (minutes)

% of time spent doing Full GC before tuning

Page 51: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

�36

0

20

40

60

80

100

0 200 400 600 800 1000

Tim

e sp

ent o

n Fu

ll GC

(%)

Time (minutes)

% of time spent doing Full GC after tuning

Page 52: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Hadoop replication factor !

•stochastic failure mode •no real ack from Hadoop •files open for a long time

�37

Page 53: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20
Page 54: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Apache Storm !

!

distributed computation framework

Page 55: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Storm !

•abstractions: topology, bolt, stream, tuple, grouping •great community •ack + retries •but not for reliable apps

•use Hadoop instead

�40

Page 56: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Kafka integration !

•reliable data for reporting •low latency data for RT

�41

Page 57: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

!KAFKA BROKER

ACK

production server

�42

syslog file

syslog file

syslog file

Checkpoint

KAFKA SYSLOG

CONSUMERService

KAFKA SYSLOG

PRODUCER

Page 58: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

!KAFKA BROKER

ACK

production server

�42

syslog file

syslog file

syslog file

Checkpoint

KAFKA SYSLOG

CONSUMERService

STORM

KAFKA SYSLOG

PRODUCER

Page 59: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

!KAFKA BROKER

ACK

production server

�42

syslog file

syslog file

syslog file

Checkpoint

KAFKA SYSLOG

CONSUMERService

Retries

STORM

KAFKA SYSLOG

PRODUCER

Page 60: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

RT apps

Page 61: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20
Page 62: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20
Page 63: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20
Page 64: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Body copy large

Page 65: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20
Page 66: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20
Page 67: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Storm !

�49

Page 68: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

Storm !

�49

Page 69: Reliable RT processing @ Spotify - Jfokus · 3 Spotify •the right music for every moment •over 6 million paying customers •over 24 million active users each month •over 20

February 5, 2014

Thanks!

Pablo Barrera <[email protected]> !

!

Want to join the band?spotify.com/jobs