Apache Kafkacs237/project2020/Kafka.pdf · 2020-05-11 · Publish/subscribe messaging pattern...

Apache Kafka

Yinhao He, Jiaqi Xiao, Ananth Gottumukkala

Publish/subscribe messaging pattern

● Publisher: classifies each message (e.g. by topic) without knowing whether any subscribers exist
● Subscriber: subscribes to a class of messages without knowing whether any publishers exist
● Broker: decouples publishers from subscribers

(Similar to a bulletin board)
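The decoupling described above can be illustrated with a toy in-memory broker. This is a minimal sketch, not Kafka's API: the `Broker` class and its `subscribe`/`publish` methods are hypothetical names chosen for illustration, and unlike Kafka this broker pushes messages to callbacks instead of letting consumers pull.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Toy in-memory broker (hypothetical): routes messages by topic name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        # The subscriber registers interest in a topic, not in a publisher.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        # The publisher only names a topic; it never sees the subscribers.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)  # number of deliveries made


broker = Broker()
received: List[str] = []
broker.subscribe("news", received.append)

delivered = broker.publish("news", "hello")               # one subscriber
ignored = broker.publish("sports", "nobody is listening")  # no subscribers
```

Publishing to a topic with no subscribers succeeds silently, which is the point of the pattern: neither side depends on the other being present.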

What is Kafka?

● Open source publish/subscribe messaging system
● Distributed event log (persisted on disk)
● Hybrid between a messaging system and a database
● High-throughput platform
● Real-time data streams
● Originally developed at LinkedIn; also used by Twitter and Netflix

Kafka structure

Message

● Single unit of data (byte array)
● Batch
  ○ Collection of messages produced for the same topic and partition
  ○ Trade-off between latency and throughput
  ○ Can be compressed
● Additional structure
  ○ E.g. JSON, XML, Avro or Protobuf
● Message ordering is not guaranteed across multiple partitions
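One way to see why ordering holds within a partition but not across partitions is to route messages by key. The sketch below is an illustration under assumptions: `partition_for` and `append_all` are hypothetical helpers, and CRC32 is used only because it is deterministic and in the standard library (Kafka's default partitioner uses a different hash).

```python
import zlib
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3  # assumed partition count for this example


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash of the key, so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_all(messages: List[Tuple[str, str]]) -> Dict[int, List[Tuple[str, str]]]:
    # Each partition is an append-only list; order is preserved per partition only.
    partitions: Dict[int, List[Tuple[str, str]]] = {p: [] for p in range(NUM_PARTITIONS)}
    for key, value in messages:
        partitions[partition_for(key)].append((key, value))
    return partitions


messages = [
    ("user-a", "login"),
    ("user-b", "login"),
    ("user-a", "click"),
    ("user-a", "logout"),
]
partitions = append_all(messages)

# All events for one key sit in one partition, in the order they were produced.
user_a_events = [v for k, v in partitions[partition_for("user-a")] if k == "user-a"]
```

Events for different keys may land in different partitions, and nothing in the structure records their relative order, which is why consumers cannot rely on a global ordering.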

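Producer & Consumer

Producer
● Creates new messages and sends them to a specific topic

Consumer
● Reads messages
  ○ In order (within each partition)
● Offset
  ○ Created when a message is written to Kafka
  ○ The consumer remembers which offset it has reached in each partition
  ○ Stored in ZooKeeper by older consumers; newer clients commit offsets to Kafka itself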
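The role of the offset can be sketched with a single partition modelled as a list. This is a simplified illustration, not the Kafka client API: `PartitionLog` and `Consumer` are hypothetical classes, and the position is kept in memory where a real consumer would commit it durably.

```python
from typing import List


class PartitionLog:
    """Append-only log for one partition (hypothetical model)."""

    def __init__(self) -> None:
        self._messages: List[bytes] = []

    def append(self, message: bytes) -> int:
        # The offset is assigned at write time: the message's index in the log.
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int) -> List[bytes]:
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Pull-based reader that remembers its own position in the log."""

    def __init__(self, log: PartitionLog) -> None:
        self._log = log
        self.position = 0  # next offset to read

    def poll(self, max_messages: int = 2) -> List[bytes]:
        batch = self._log.read(self.position, max_messages)
        self.position += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        # Rewinding is just moving the position; the log itself is unchanged.
        self.position = offset


log = PartitionLog()
offsets = [log.append(m) for m in (b"m0", b"m1", b"m2")]

consumer = Consumer(log)
first = consumer.poll()
second = consumer.poll()
consumer.seek(0)
replayed = consumer.poll()
```

Because reading does not remove anything from the log, a second consumer with its own position could read the same messages independently.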

Consumer Group

● Each partition is consumed by only one member of a consumer group
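The rule that each partition has exactly one owner within a group can be sketched as a round-robin assignment. This is an assumption-laden illustration: `assign` is a hypothetical function, and Kafka's real assignment strategies are pluggable and more involved.

```python
from typing import Dict, List


def assign(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Round-robin sketch: every partition gets exactly one owner in the group."""
    if not members:
        raise ValueError("a consumer group needs at least one member")
    ordered = sorted(members)
    assignment: Dict[str, List[int]] = {m: [] for m in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


balanced = assign([0, 1, 2, 3], ["c1", "c2"])
# With more members than partitions, the extra member receives nothing.
oversized = assign([0, 1], ["c1", "c2", "c3"])
```

One consequence is that adding consumers beyond the number of partitions adds no parallelism; the surplus members stay idle until a partition becomes available.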

Broker

● A Kafka cluster consists of multiple servers called brokers
● The controller broker is responsible for administrative operations
  ○ Assigns partitions to brokers
  ○ Monitors for broker failures
● Partitions are replicated across brokers for redundancy
  ○ Messages survive the failure of a broker

Retention

● Messages are stored durably for a configurable period
● Retention can be limited by:
  ○ Time
  ○ Size
● Individual topics can also configure their own retention settings
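The time and size limits can be sketched as a pruning pass over a partition's records. This is a simplified model: `Record` and `apply_retention` are hypothetical names, the parameters only mirror the idea of a time limit and a size limit, and records are dropped one at a time here, whereas Kafka removes whole log segments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    timestamp_ms: int
    payload: bytes


def apply_retention(
    records: List[Record],
    now_ms: int,
    retention_ms: Optional[int] = None,
    retention_bytes: Optional[int] = None,
) -> List[Record]:
    """Return the records that survive the time and size limits (oldest first)."""
    kept = list(records)
    if retention_ms is not None:
        # Time limit: drop anything older than the retention window.
        kept = [r for r in kept if now_ms - r.timestamp_ms <= retention_ms]
    if retention_bytes is not None:
        # Size limit: drop the oldest records until the total fits.
        total = sum(len(r.payload) for r in kept)
        while kept and total > retention_bytes:
            total -= len(kept[0].payload)
            kept.pop(0)
    return kept


records = [Record(0, b"aaaa"), Record(1000, b"bbbb"), Record(2000, b"cccc")]

by_time = apply_retention(records, now_ms=2500, retention_ms=2000)
by_size = apply_retention(records, now_ms=2500, retention_bytes=4)
```

Either limit alone is enough to remove a record, so a message is only guaranteed to be readable while it satisfies both.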

Reliability Guarantees

● Guarantees the order of messages within one partition
● Committed messages won't be lost as long as at least one replica remains alive and the retention policy holds
● Consumers can only read committed messages
● At-least-once message delivery semantics
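At-least-once delivery follows from committing the offset only after a message has been processed. The sketch below simulates a consumer that crashes between processing and committing; `run_consumer` and its `crash_before_commit_at` parameter are hypothetical and exist only to make the failure reproducible.

```python
from typing import List, Optional


def run_consumer(
    log: List[str],
    committed_offset: int,
    processed: List[str],
    crash_before_commit_at: Optional[int] = None,
) -> int:
    """Process from the committed offset; return the last committed offset."""
    offset = committed_offset
    while offset < len(log):
        processed.append(log[offset])  # side effect happens first
        if offset == crash_before_commit_at:
            return committed_offset    # simulated crash: this commit is lost
        offset += 1
        committed_offset = offset      # commit only after processing
    return committed_offset


log = ["a", "b", "c"]
processed: List[str] = []

# First run crashes after processing "b" but before committing it.
committed = run_consumer(log, 0, processed, crash_before_commit_at=1)
after_crash = committed

# The restart resumes from the last committed offset, so "b" is processed again.
committed = run_consumer(log, committed, processed)
```

No message is skipped, but "b" is handled twice, so downstream processing needs to tolerate duplicates (for example by being idempotent).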

Advantages of Kafka

● Deals with integration complexity
● High throughput and fairly low latency
● Handles big data
● Many configuration options
● Data retention
● Multiple producers/consumers

Disadvantages of Kafka

● Steep learning curve
● Latency not low enough for some applications
● Susceptible to data loss
  ○ Split-brain
  ○ Partition leader failover

Kafka vs JMS/ActiveMQ

Kafka | JMS/ActiveMQ
Real-time data stream | Traditional messaging
Consumers pull messages from brokers | Messages pushed to consumers
Implements backpressure | Hard to achieve backpressure
Data retention to disk | No data retention
Guarantees message ordering within a partition | No ordering guarantees
Can rewind and re-consume data | Consumer does not track offset

Kafka vs Kinesis

Kafka | Kinesis
Requires setting up your own cluster, nodes, replicas, partitions, etc. | AWS manages infrastructure, configuration, etc.
Flexible configuration, but producers (amount of data to send to the broker) and consumers (# replicas, # consumers per partition/topic) need tuning | Configuration is less flexible, but AWS ensures availability/durability for 7 days; configure # shards for throughput
Higher maintenance/risk-management cost | Pay-as-you-go, per # shards

Thank you
