Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Apache Kafka......”a system optimized for writing”
Bernhard Hopfenmüller
26. August 2018
whoami
Bernhard HopfenmüllerIT Consultant @ ATIX AG
IRC: Fobhepgithub.com/Fobhep
#atix #froscon2018
whoarewe
Why do we need a messaging system?
The Linux & Open Source CompanyUnterschleißheim @ München
over 15 Jahrendatacenter automation, Linux
Consulting, Engineering, Support,Training
#atix #froscon2018
Kafka
Quora.com What is the relation between Kafka, the writer, andApache Kafka, the distributed messaging system?
Jay Kreps: I thought that since Kafka was a system optimized forwriting using a writer’s name would make sense. I had taken a lotof lit classes in colleague and liked Franz Kafka. Plus the namesounded cool for an OS project
#atix #froscon2018
I developed by LinkedIn, Open Source since 2011
I 2014 foundation of Confluent
#atix #froscon2018
Messaging-Systems
Why do we need a messaging system?
I Challenge 1: System B notavailable no caching
I Challenge 2: System A sending toomuch (DoS)
I Challenge 3: System B crashingupon processing Source[1]
#atix #froscon2018
Queues vs Topics
Supermarket vs Television
Supermarket Wait until it’s your turn
Each topic is received once!Source[1]
Television Choose what you want toreceive
topics are received any number of times!Source[1]
#atix #froscon2018
Kafka-Basic structure
Source[2]
#atix #froscon2018
Topics I
I core component of Kafka
I is filled by producer
I consists of one or more partitions
1 102 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 9partition 2
old
0
0
0
partition 1
partition 0
producernew data
new
#atix #froscon2018
Topics II
I producer can choose partition
I partition has running offset
I message is identified by offset
1 102 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 9partition 2
old
0
0
0
partition 1
partition 0
producernew data
new
#atix #froscon2018
Topics III
1 102 3 4 5 6 7 8 90 11 12
segment 0 4 8 12
1314 15
*.log *.index
directory
partition
I messages are stored physically!
I key-value principle
I Clean-Up policies:
#atix #froscon2018
Topics IV
1 102 3 4 5 6 7 8 90 11offset
1 102 3 4 5 6 7 8 90 11
key
value
9 118
9 118 10
10
1 2 30 1 13 33 220
time/size retention
certain time/size
1 33 2
I Clean-Up policies:
I default: Retention-time(delete old data after x days)
I Retention-size(delete old data if datamemory > x)
#atix #froscon2018
Topics V
1 102 3 4 5 6 7 8 90 11offset
1 102 3 4 5 6 7 8 90 11
key
value
8 105
8 105 11
11
1 2 30 1 13 33 220
1 2 30
log compaction
I Clean-Up policies:
I default: Retention-time(delete old data after x days)
I Retention-size(delete old data if datamemory > x)
I Log-Compaction(replace old value to key withnew)
#atix #froscon2018
Topic consumption
I topics are pulled! (no DoS)
I any existing data can be pulled
#atix #froscon2018
Consumer Groups
I 1 message is read 1 time in agroup
I 1 consumer can read x partitions
I 1 partition can be read by 1consumer
I parallelism allows high throughput
I never more consumers thanpartitions
Source[1]
#atix #froscon2018
Replication
I implemented on partition level
I replicum = redundant unit ofpartition
I replica distributed over cluster
I replica have 1 leader and nfollowers
I producer writes to leader!
I leader copies to followersSource[3]
#atix #froscon2018
Did somebody hear my message?
Producer decides if message was successfully sentConfiguration possibilities:
I as soon as sent
I as soon as received by first broker
I as soon as desired number of replica existKafka features exactly-once-semantics!
#atix #froscon2018
Broker and ZooKeeper
I Brokers are stateless!
I Which Broker is alive?
I Broker communication?
I → ZooKeeper!
Source[4]
#atix #froscon2018
ZooKeeper
I distributed, hierachical file system
I management of znodes()
I HA via ensemble (=ZooKeepercluster)
I ZooKeeper leader ! = Kafka leader
Source[4]
#atix #froscon2018
Use Cases
I Messaging (ActiveMQ or RabbitMQ)
I Website Activity Tracking
I Metrics
I Log Aggregation
I Stream Processing
I Apache Storm and Apache Samza.
I Commit Log
#atix #froscon2018
Who likes Kafka?
I Apple Inc.
I Cisco Systems
I Cloudflare
I eBay
I Netflix (Monitoring!)
I The New York Times ( Kafka as data storage! Super awesome blogpost)
I PayPal
I Spotify
I Twitter
I Uber (Kafka = Backbone!!!)
#atix #froscon2018
Summary
streams...
I ... publish, subscribe
I ... persistent, failsafe
I ... real time processing
I combining streams with DFS
Kafka moves complication to application
#atix #froscon2018
Sources
1 https://www.informatik-aktuell.de/betrieb/verfuegbarkeit/apache-kafka-eine-schluesselplattform-fuer-hochskalierbare-systeme.html
2 https://thecattlecrew.net/2017/09/28/apache-kafka-im-detail-teil-1/ andhttps://thecattlecrew.net/2017/09/28/apache-kafka-im-detail-teil-2/
3 https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/
4 https://www.infoq.com/articles/apache-kafka
#atix #froscon2018