Kafka: a Distributed Messaging System for Log Processing Jay Kreps, Neha Narkhede, Jun Rao LinkedIn

Page 1: Kafka: a Distributed Messaging System for Log Processing

Kafka: a Distributed Messaging System for Log Processing

Jay Kreps, Neha Narkhede, Jun Rao

LinkedIn

Page 2: Kafka: a Distributed Messaging System for Log Processing

AGENDA

• Kafka usage at LinkedIn

• Kafka design

• Kafka roadmap

Page 3: Kafka: a Distributed Messaging System for Log Processing

ABOUT LINKEDIN

• Professional social network platform

• among the 50 largest sites in the world by traffic

• 100M+ members

Page 4: Kafka: a Distributed Messaging System for Log Processing

LOGGING OVERVIEW

• Many types of events

• user activity events: impression, search, ads, etc

• operational events: call stack, service metrics, etc

• High volume: billions of events per day

• Both online and offline use cases

• reporting, batch analysis

• security, news feeds, performance dashboard, ...

Page 5: Kafka: a Distributed Messaging System for Log Processing

DEPLOYMENT

[Figure: deployment diagram spanning a main site and an analysis site. Labeled components: Frontend (x3), VIP, Kafka (x3), Realtime service (x2), a second Kafka cluster (x3), Hadoop, Oracle, Aster Data.]

Page 6: Kafka: a Distributed Messaging System for Log Processing

KAFKA DESIGN PRINCIPLES

• Simple API

• Efficient

• Distributed

Page 7: Kafka: a Distributed Messaging System for Log Processing

PRODUCER API

void send(String topic, ByteBufferMessageSet messages)

producer = new KafkaProducer(…);
message = new Message("test message str".getBytes());
set = new ByteBufferMessageSet(message);
producer.send("test", set);

Page 8: Kafka: a Distributed Messaging System for Log Processing

CONSUMER API

streams[] = Consumer.createMessageStreams("test", 1)

for (message : streams[0]) {
  bytes = message.payload()
  // do something with bytes
}

Page 9: Kafka: a Distributed Messaging System for Log Processing

EFFICIENCY #1: SIMPLE STORAGE

• Each topic has an ever-growing log

• A log == a list of files

• A message is addressed by a log offset
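A minimal sketch of the storage idea above, not Kafka's actual code. It assumes that an offset is a byte position in the logical log and that each segment file is named by the offset of its first message; the class and method names (SegmentedLog, append, segmentBaseFor, positionInSegment) are hypothetical. Only segment sizes are tracked, which is enough to show how an offset maps to a file and a position within it.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a log stored as a list of segment files.
final class SegmentedLog {
    // Base offset of each segment -> bytes written to that segment so far.
    private final TreeMap<Long, Long> segmentSizes = new TreeMap<>();
    private final long maxSegmentBytes;
    private long nextOffset = 0;

    SegmentedLog(long maxSegmentBytes) {
        this.maxSegmentBytes = maxSegmentBytes;
        segmentSizes.put(0L, 0L); // the first segment starts at offset 0
    }

    // Appends a message of the given size and returns the offset that addresses it.
    long append(int messageBytes) {
        Map.Entry<Long, Long> active = segmentSizes.lastEntry();
        // Roll to a new segment when the active one is non-empty and would overflow.
        if (active.getValue() > 0 && active.getValue() + messageBytes > maxSegmentBytes) {
            segmentSizes.put(nextOffset, 0L);
            active = segmentSizes.lastEntry();
        }
        long offset = nextOffset;
        segmentSizes.put(active.getKey(), active.getValue() + messageBytes);
        nextOffset += messageBytes;
        return offset;
    }

    // Finds the segment holding an offset: the greatest base offset <= offset.
    long segmentBaseFor(long offset) {
        if (offset < 0 || offset >= nextOffset) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return segmentSizes.floorKey(offset);
    }

    // Byte position of the message inside its segment file.
    long positionInSegment(long offset) {
        return offset - segmentBaseFor(offset);
    }
}
```

Because the lookup is a floor search over sorted base offsets, no per-message index is needed in this model; the offset itself carries the address.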

Page 10: Kafka: a Distributed Messaging System for Log Processing

EFFICIENCY #2: CAREFUL TRANSFER

• Batch send and fetch

• No message caching in Kafka layer

• Rely on file system page cache

• mostly sequential access patterns

• Zero-copy transfer: file -> socket
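The file -> socket transfer can be sketched with Java NIO's FileChannel.transferTo, which lets the operating system move bytes from the page cache to the target channel without copying them through an application buffer when the platform supports it. This is an illustration, not the broker's code: the class ZeroCopySend and the method sendRange are hypothetical names. In a broker the target would be a SocketChannel; any WritableByteChannel works, though other targets may fall back to ordinary copying.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: send a byte range of a log file to a channel.
final class ZeroCopySend {
    // Sends up to `count` bytes starting at `position`; returns bytes actually sent.
    static long sendRange(Path file, long position, long count, WritableByteChannel target)
            throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long end = Math.min(position + count, in.size());
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position + sent < end) {
                long n = in.transferTo(position + sent, end - position - sent, target);
                if (n <= 0) {
                    break; // target cannot accept more right now
                }
                sent += n;
            }
            return sent;
        }
    }
}
```

A batch fetch then amounts to one call with the consumer's offset as `position` and the maximum fetch size as `count`.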

Page 11: Kafka: a Distributed Messaging System for Log Processing

EFFICIENCY #3: STATELESS BROKER

• Each consumer maintains its own state

• Message deletion driven by retention policy, not by tracking consumption

• acceptable in practice

• rewindable consumer
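The points above can be sketched as a small in-memory model, under stated assumptions: offsets here are simple message sequence numbers, retention is purely time-based, and the names RetainedLog and RewindableConsumer are hypothetical rather than Kafka classes. The log never records who has read what; the consumer alone holds its position and can move it backwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical broker-side log: knows nothing about consumers.
final class RetainedLog {
    private final List<String> messages = new ArrayList<>();
    private final List<Long> timestamps = new ArrayList<>();
    private long firstOffset = 0; // offset of the oldest retained message

    void append(String message, long timestampMs) {
        messages.add(message);
        timestamps.add(timestampMs);
    }

    // Deletes messages older than the retention window, read or not.
    void enforceRetention(long nowMs, long retentionMs) {
        while (!timestamps.isEmpty() && nowMs - timestamps.get(0) > retentionMs) {
            timestamps.remove(0);
            messages.remove(0);
            firstOffset++;
        }
    }

    long firstOffset() { return firstOffset; }

    long endOffset() { return firstOffset + messages.size(); }

    // Returns up to `max` messages starting at `offset`.
    List<String> fetch(long offset, int max) {
        if (offset < firstOffset || offset > endOffset()) {
            throw new IllegalArgumentException("offset not retained: " + offset);
        }
        int from = (int) (offset - firstOffset);
        int to = Math.min(messages.size(), from + max);
        return new ArrayList<>(messages.subList(from, to));
    }
}

// Hypothetical consumer: owns its position, so it can rewind at will.
final class RewindableConsumer {
    private final RetainedLog log;
    private long position;

    RewindableConsumer(RetainedLog log) {
        this.log = log;
        this.position = log.firstOffset();
    }

    List<String> poll(int max) {
        List<String> batch = log.fetch(position, max);
        position += batch.size();
        return batch;
    }

    void seek(long offset) { position = offset; }

    long position() { return position; }
}
```

Rewinding is just `seek` to an earlier offset followed by `poll`; it works as long as the offset is still inside the retention window.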

Page 12: Kafka: a Distributed Messaging System for Log Processing

AUTO CONSUMER LOAD BALANCING

• brokers and consumers register in ZooKeeper

• consumers listen to broker and consumer changes

• each change triggers consumer rebalancing

[Figure: diagram showing four brokers, two consumers, and ZooKeeper.]
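The slide does not say how work is divided during a rebalance. One possible scheme, shown here purely as an assumption, is a deterministic range assignment: every consumer reads the same membership lists from the registry, sorts them, and computes its own share locally, so all consumers agree without talking to each other. The names RangeAssignor and assign, and the partition labels used with it, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical rebalance step, run independently by each consumer.
final class RangeAssignor {
    // Returns the partitions that consumer `self` should own.
    static List<String> assign(List<String> partitions, List<String> consumers, String self) {
        List<String> sortedPartitions = new ArrayList<>(partitions);
        List<String> sortedConsumers = new ArrayList<>(consumers);
        // Sorting gives every consumer the same view regardless of listing order.
        Collections.sort(sortedPartitions);
        Collections.sort(sortedConsumers);

        int index = sortedConsumers.indexOf(self);
        if (index < 0) {
            throw new IllegalArgumentException("unknown consumer: " + self);
        }
        int base = sortedPartitions.size() / sortedConsumers.size();
        int extra = sortedPartitions.size() % sortedConsumers.size();
        // The first `extra` consumers take one additional partition each.
        int start = index * base + Math.min(index, extra);
        int length = base + (index < extra ? 1 : 0);
        return new ArrayList<>(sortedPartitions.subList(start, start + length));
    }
}
```

When a broker or consumer joins or leaves, each consumer would rerun `assign` with the updated lists; the ranges shift, but they still cover every partition exactly once.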

Page 13: Kafka: a Distributed Messaging System for Log Processing

PRODUCER PERFORMANCE

[Figure: producer performance chart]

Page 14: Kafka: a Distributed Messaging System for Log Processing

CONSUMER PERFORMANCE

[Figure: consumer performance chart]

Page 15: Kafka: a Distributed Messaging System for Log Processing

ROADMAP

• New Kafka features

• compression

• replication

• stream processing (online M/R)

• http://sna-projects.com/kafka/