Kafka Audit - Kafka Meetup - January 27th, 2015

KAFKA AUDIT

January 27th, 2015 - LinkedIn Meetup

Producer, Kafka Cluster

Mp = {Plain old Kafka message}

[Diagram, built up over several slides: Producer, Kafka Cluster, and Aggregate Kafka Cluster in Datacenter A and Datacenter B, plus Offline processing]

Producer, Kafka Cluster

Ma = {
  Plain old Kafka message
  Producer creation timestamp
  Producer identification string
}

Mm = {
  Count of messages
  The topic this count is for
  Tier identification string
  Time bucket interval
}

Ma = Message with audit data

Mm = Monitoring message
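
The two message shapes above can be sketched in code. This is a minimal illustration, not the implementation from the talk: the class names, field names, the 10-minute bucket width, and the `AuditCounter` helper are all assumptions made for the example. It shows a tier counting Ma messages per topic and time bucket, then emitting one Mm per bucket.

```python
from collections import Counter
from dataclasses import dataclass

BUCKET_SECONDS = 600  # assumed bucket width (10 minutes); not stated in the slides


@dataclass(frozen=True)
class AuditedMessage:
    """Ma: a plain Kafka message plus audit data (field names are hypothetical)."""
    payload: bytes
    created_at: float   # producer creation timestamp, epoch seconds
    producer_id: str    # producer identification string


@dataclass(frozen=True)
class MonitoringMessage:
    """Mm: a count for one topic, tier, and time bucket (field names are hypothetical)."""
    count: int
    topic: str
    tier: str
    bucket_start: int   # start of the time bucket interval, epoch seconds


def bucket_start(timestamp: float) -> int:
    """Round a timestamp down to the start of its time bucket."""
    return int(timestamp) // BUCKET_SECONDS * BUCKET_SECONDS


class AuditCounter:
    """Counts Ma messages seen at one tier and emits Mm messages on flush."""

    def __init__(self, tier: str) -> None:
        self.tier = tier
        self._counts: Counter = Counter()

    def observe(self, topic: str, message: AuditedMessage) -> None:
        # Bucket by the producer timestamp, not by the time the tier saw the message.
        self._counts[(topic, bucket_start(message.created_at))] += 1

    def flush(self) -> list:
        out = [
            MonitoringMessage(count, topic, self.tier, bucket)
            for (topic, bucket), count in sorted(self._counts.items())
        ]
        self._counts.clear()
        return out
```

Each tier (producer, local cluster, aggregate cluster, offline) would run its own counter, so the counts for the same topic and bucket can be compared across tiers.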

[Diagram, built up over several slides: Kafka Cluster, Aggregate Kafka Cluster, and Offline processing, with a Consumer; the final build adds App, REST API, Audit MySQL, and Audit UI]

AUDIT UI

(for each topic and time window)

Tier                Count
Local               123
Aggregate
Aggregate Offline
Producer            123

We lost 4 messages between local and aggregate!
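
The comparison the Audit UI makes for one topic and time window can be sketched as follows. This is an illustrative sketch only: the tier names, their order, and the `find_losses` function are assumptions, and the aggregate count of 119 in the usage note is made up to match the "lost 4 messages" caption.

```python
# Assumed tier names, listed in the order messages flow through them.
TIER_ORDER = ["producer", "local", "aggregate", "aggregate-offline"]


def find_losses(counts: dict) -> list:
    """Compare adjacent tiers for one topic and time window.

    `counts` maps tier name to message count. Returns a list of
    (upstream_tier, downstream_tier, difference) for every adjacent pair
    whose counts differ. A positive difference means messages were lost;
    a negative one means the downstream tier saw extra messages
    (for example, duplicates).
    """
    present = [tier for tier in TIER_ORDER if tier in counts]
    losses = []
    for upstream, downstream in zip(present, present[1:]):
        difference = counts[upstream] - counts[downstream]
        if difference != 0:
            losses.append((upstream, downstream, difference))
    return losses
```

For example, calling `find_losses({"producer": 123, "local": 123, "aggregate": 119, "aggregate-offline": 119})` reports a single gap of 4 between the local and aggregate tiers.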

CAVEATS

• Audit consumers need to consume everything.

• Intermediate tiers are tough to drill down into.

QUESTIONS?

users@kafka.apache.org

https://kafka.apache.org/

irc://irc.freenode.net/#apache-kafka

Many folks on the mailing list know the details of how Kafka Audit works.

LATE MESSAGE RESOLUTION

[Timeline figure: tiers Producer, Aggregate, Hadoop; time buckets 10:10, 10:20, 10:30, 10:40; counts shown: 299, 337 and 337, 326; marker: Current time]

From the 10:10 to 10:20 time bucket, 53 messages were lost from the producer to the Kafka local cluster. Unhealthy!

[Timeline figure: tiers Producer, Aggregate, Hadoop; time buckets 10:10, 10:20, 10:30, 10:40; counts shown: 299+53; 299, 337; 337, 326; marker: Current time]

Another message Mm arrives later with the missing count of 53!

[Timeline figure: tiers Producer, Aggregate, Hadoop; time buckets 10:10, 10:20, 10:30, 10:40; counts shown: 352, 337 and 337, 326; marker: Current time]

All time periods match after arrival of late Mm message. Healthy state now.

The producer timestamp determines the time bucket the message is placed into, so bucketing is deterministic.

Mm = {
  Count of messages
  The topic this count is for
  Tier identification string
  Time bucket interval
}
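
Because every Mm names its own time bucket, a late Mm can simply be added to the running total for that bucket. The sketch below illustrates this under stated assumptions: the `AuditStore` class, its method names, and the tier labels are hypothetical, and the counts reuse the 299 + 53 = 352 figures from the slides.

```python
from collections import defaultdict


class AuditStore:
    """Accumulates Mm counts per (topic, tier, time bucket). Hypothetical helper."""

    def __init__(self) -> None:
        self._totals = defaultdict(int)

    def apply(self, topic: str, tier: str, bucket: str, count: int) -> None:
        # Counts are additive, so arrival order and lateness do not matter.
        self._totals[(topic, tier, bucket)] += count

    def total(self, topic: str, tier: str, bucket: str) -> int:
        return self._totals.get((topic, tier, bucket), 0)

    def is_healthy(self, topic: str, bucket: str, tiers: list) -> bool:
        # A bucket is healthy when every tier reports the same total.
        return len({self.total(topic, tier, bucket) for tier in tiers}) == 1


store = AuditStore()
tiers = ["producer", "local"]  # assumed tier labels
store.apply("topic-a", "producer", "10:10", 352)
store.apply("topic-a", "local", "10:10", 299)
unhealthy_before = not store.is_healthy("topic-a", "10:10", tiers)

# A late Mm for the same bucket arrives with the missing count.
store.apply("topic-a", "local", "10:10", 53)
healthy_after = store.is_healthy("topic-a", "10:10", tiers)
```

After the late message is applied, both tiers total 352 for the 10:10 bucket and the bucket reads as healthy.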

TRANSPORT TIME

[Diagram: Producer, Kafka Cluster, Aggregate Kafka Cluster, Consumer, and Metrics (e.g. RRDs)]

Tt = {
  Time Ma seen by audit consumer
  Topic name
}

Tt can be sampled, no need to emit for all messages.

Tt[time] = <Audit Consumer NTPd Time> - Ma[time]
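
The transport-time formula above, combined with sampling, can be sketched like this. The function names, the 1% default sample rate, and the returned dictionary shape are assumptions for illustration; the slides only specify the subtraction and that Tt may be sampled.

```python
import random
import time


def transport_time(ma_created_at: float, now: float = None) -> float:
    """Tt[time] = audit consumer clock time minus the Ma producer timestamp.

    Both clocks are assumed to be NTP-synchronized; clock skew shows up
    directly as error in the result.
    """
    if now is None:
        now = time.time()
    return now - ma_created_at


def maybe_emit(topic: str, ma_created_at: float, sample_rate: float = 0.01,
               now: float = None, rng=random.random):
    """Return a Tt record for roughly `sample_rate` of messages, else None."""
    if rng() >= sample_rate:
        return None
    return {"topic": topic, "transport_seconds": transport_time(ma_created_at, now)}
```

A metrics system (the slides mention RRDs as an example) would then receive only the sampled records rather than one per message.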

CAVEATS

• Depends on the Audit Consumer lag.

• Producer batching can skew timestamps.

SCHEMA RESOLUTION

WHAT IS A SCHEMA?

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]}
  ]
}

Every message should be formatted to a schema!

SCHEMA REGISTRY

A REST API to go from schema to ID, and ID to schema.

Schema ID = hash(Raw Schema)

Schema Registry Database

Registration Timestamp | Schema ID | Raw Schema

History of registrations is maintained.

[Diagram: Producer and Schema Registry]

1. Producer registers schema.

2. Registry returns schema ID (hash of schema).

3. Schema ID prepended to all Kafka messages.

Ms = { <Schema ID> + Mall }
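
The three steps above can be sketched with an in-memory stand-in for the registry. This is not the registry's REST API: the class and function names are hypothetical, and SHA-256 is an assumed choice because the slides only say the ID is a hash of the raw schema.

```python
import hashlib
import time

ID_BYTES = 32  # SHA-256 digest length; the hash function is an assumption


class SchemaRegistry:
    """In-memory stand-in for the schema registry (hypothetical interface)."""

    def __init__(self) -> None:
        self._by_id = {}
        # Rows of (registration timestamp, schema ID, raw schema).
        self.history = []

    def register(self, raw_schema: str) -> bytes:
        """Steps 1 and 2: register a schema and return its ID (hash of the schema)."""
        schema_id = hashlib.sha256(raw_schema.encode("utf-8")).digest()
        self._by_id[schema_id] = raw_schema
        self.history.append((time.time(), schema_id, raw_schema))
        return schema_id

    def lookup(self, schema_id: bytes) -> str:
        """Go from ID back to the raw schema."""
        return self._by_id[schema_id]


def encode(schema_id: bytes, payload: bytes) -> bytes:
    """Step 3: prepend the schema ID to the message bytes."""
    return schema_id + payload


def decode(registry: SchemaRegistry, message: bytes):
    """Split a message into its schema (looked up by ID) and payload."""
    schema_id, payload = message[:ID_BYTES], message[ID_BYTES:]
    return registry.lookup(schema_id), payload
```

Because the ID is derived from the schema content, registering the same schema twice yields the same ID, while the history still records both registrations.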
