23
Apache Kafka... ...”a system optimized for writing” Bernhard Hopfenmüller 26. August 2018

Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Apache Kafka......”a system optimized for writing”

Bernhard Hopfenmüller

26. August 2018

Page 2: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

whoami

Bernhard HopfenmüllerIT Consultant @ ATIX AG

IRC: Fobhepgithub.com/Fobhep

#atix #froscon2018

Page 3: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

whoarewe

Why do we need a messaging system?

The Linux & Open Source CompanyUnterschleißheim @ München

over 15 Jahrendatacenter automation, Linux

Consulting, Engineering, Support,Training

#atix #froscon2018

Page 4: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Kafka

Quora.com What is the relation between Kafka, the writer, andApache Kafka, the distributed messaging system?

Jay Kreps: I thought that since Kafka was a system optimized forwriting using a writer’s name would make sense. I had taken a lotof lit classes in colleague and liked Franz Kafka. Plus the namesounded cool for an OS project

#atix #froscon2018

Page 5: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

I developed by LinkedIn, Open Source since 2011

I 2014 foundation of Confluent

#atix #froscon2018

Page 6: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Messaging-Systems

Why do we need a messaging system?

I Challenge 1: System B notavailable no caching

I Challenge 2: System A sending toomuch (DoS)

I Challenge 3: System B crashingupon processing Source[1]

#atix #froscon2018

Page 7: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Queues vs Topics

Supermarket vs Television

Supermarket Wait until it’s your turn

Each topic is received once!Source[1]

Television Choose what you want toreceive

topics are received any number of times!Source[1]

#atix #froscon2018

Page 8: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Kafka-Basic structure

Source[2]

#atix #froscon2018

Page 9: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Topics I

I core component of Kafka

I is filled by producer

I consists of one or more partitions

1 102 3 4 5 6 7 8 9

1 2 3 4 5 6 7 8

1 2 3 4 5 6 7 8 9partition 2

old

0

0

0

partition 1

partition 0

producernew data

new

#atix #froscon2018

Page 10: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Topics II

I producer can choose partition

I partition has running offset

I message is identified by offset

1 102 3 4 5 6 7 8 9

1 2 3 4 5 6 7 8

1 2 3 4 5 6 7 8 9partition 2

old

0

0

0

partition 1

partition 0

producernew data

new

#atix #froscon2018

Page 11: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Topics III

1 102 3 4 5 6 7 8 90 11 12

segment 0 4 8 12

1314 15

*.log *.index

directory

partition

I messages are stored physically!

I key-value principle

I Clean-Up policies:

#atix #froscon2018

Page 12: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Topics IV

1 102 3 4 5 6 7 8 90 11offset

1 102 3 4 5 6 7 8 90 11

key

value

9 118

9 118 10

10

1 2 30 1 13 33 220

time/size retention

certain time/size

1 33 2

I Clean-Up policies:

I default: Retention-time(delete old data after x days)

I Retention-size(delete old data if datamemory > x)

#atix #froscon2018

Page 13: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Topics V

1 102 3 4 5 6 7 8 90 11offset

1 102 3 4 5 6 7 8 90 11

key

value

8 105

8 105 11

11

1 2 30 1 13 33 220

1 2 30

log compaction

I Clean-Up policies:

I default: Retention-time(delete old data after x days)

I Retention-size(delete old data if datamemory > x)

I Log-Compaction(replace old value to key withnew)

#atix #froscon2018

Page 14: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Topic consumption

I topics are pulled! (no DoS)

I any existing data can be pulled

#atix #froscon2018

Page 15: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Consumer Groups

I 1 message is read 1 time in agroup

I 1 consumer can read x partitions

I 1 partition can be read by 1consumer

I parallelism allows high throughput

I never more consumers thanpartitions

Source[1]

#atix #froscon2018

Page 16: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Replication

I implemented on partition level

I replicum = redundant unit ofpartition

I replica distributed over cluster

I replica have 1 leader and nfollowers

I producer writes to leader!

I leader copies to followersSource[3]

#atix #froscon2018

Page 17: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Did somebody hear my message?

Producer decides if message was successfully sentConfiguration possibilities:

I as soon as sent

I as soon as received by first broker

I as soon as desired number of replica existKafka features exactly-once-semantics!

#atix #froscon2018

Page 18: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Broker and ZooKeeper

I Brokers are stateless!

I Which Broker is alive?

I Broker communication?

I → ZooKeeper!

Source[4]

#atix #froscon2018

Page 19: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

ZooKeeper

I distributed, hierachical file system

I management of znodes()

I HA via ensemble (=ZooKeepercluster)

I ZooKeeper leader ! = Kafka leader

Source[4]

#atix #froscon2018

Page 20: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Use Cases

I Messaging (ActiveMQ or RabbitMQ)

I Website Activity Tracking

I Metrics

I Log Aggregation

I Stream Processing

I Apache Storm and Apache Samza.

I Commit Log

#atix #froscon2018

Page 21: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Who likes Kafka?

I Apple Inc.

I Cisco Systems

I Cloudflare

I eBay

I Netflix (Monitoring!)

I The New York Times ( Kafka as data storage! Super awesome blogpost)

I PayPal

I Spotify

I Twitter

I Uber (Kafka = Backbone!!!)

#atix #froscon2018

Page 22: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Summary

streams...

I ... publish, subscribe

I ... persistent, failsafe

I ... real time processing

I combining streams with DFS

Kafka moves complication to application

#atix #froscon2018

Page 23: Apache Kafka - 'a system optimized for writing' · Kafka Quora.comWhatistherelationbetweenKafka,thewriter,and ApacheKafka,thedistributedmessagingsystem? JayKreps:IthoughtthatsinceKafkawasasystemoptimizedfor

Sources

1 https://www.informatik-aktuell.de/betrieb/verfuegbarkeit/apache-kafka-eine-schluesselplattform-fuer-hochskalierbare-systeme.html

2 https://thecattlecrew.net/2017/09/28/apache-kafka-im-detail-teil-1/ andhttps://thecattlecrew.net/2017/09/28/apache-kafka-im-detail-teil-2/

3 https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/

4 https://www.infoq.com/articles/apache-kafka

#atix #froscon2018