KAFKA AUDIT January 27th, 2015 - LinkedIn Meetup

Page 1: Kafka Audit - Kafka Meetup - January 27th, 2015

KAFKA AUDIT

January 27th, 2015 - LinkedIn Meetup

Page 2: Kafka Audit - Kafka Meetup - January 27th, 2015

+

Page 3: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer sends Mp to the local Kafka cluster.]

Mp = {Plain old Kafka message}

Page 4: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer, local Kafka cluster and aggregate Kafka cluster, with Mp on each hop.]

Mp = Plain old Kafka message

Page 5: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Datacenter A and Datacenter B, each with a local and an aggregate Kafka cluster. Mp flows from the Producer through the clusters.]

Mp = Plain old Kafka message

Page 6: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer, local Kafka cluster and aggregate Kafka cluster, with Mp on each hop.]

Mp = Plain old Kafka message

Page 7: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer, local Kafka cluster and two aggregate Kafka clusters, with Mp on each hop.]

Mp = Plain old Kafka message

Page 8: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer, local Kafka cluster, two aggregate Kafka clusters and offline processing, with Mp on each hop.]

Mp = Plain old Kafka message

Page 9: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer sends Ma to the Kafka cluster.]

Ma = {
  Plain old Kafka message
  Producer creation timestamp
  Producer identification string
}

Page 10: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer sends Ma and Mm to the Kafka cluster.]

Ma = {
  Plain old Kafka message
  Producer creation timestamp
  Producer identification string
}

Mm = {
  Count of messages
  The topic this count is for
  Tier identification string
  Time bucket interval
}
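As a rough illustration of the two message shapes listed on this slide, here is a minimal Python sketch. The class names, field names, millisecond units and sample values are assumptions made for the example; the slides only name the fields, not their types or encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditedMessage:
    """Ma: a plain Kafka message plus the audit fields named on the slide."""
    payload: bytes        # the plain old Kafka message
    created_at_ms: int    # producer creation timestamp (epoch ms is an assumption)
    producer_id: str      # producer identification string


@dataclass(frozen=True)
class MonitoringMessage:
    """Mm: how many messages one tier saw for one topic in one time bucket."""
    count: int
    topic: str
    tier: str             # tier identification string
    bucket_start_ms: int  # time bucket interval, start (inclusive)
    bucket_end_ms: int    # time bucket interval, end (exclusive)


# Hypothetical values: a ten-minute bucket and a message created one minute into it.
ma = AuditedMessage(b"page-view", created_at_ms=1_422_353_460_000,
                    producer_id="web-frontend-01")
mm = MonitoringMessage(count=1, topic="page_views", tier="producer",
                       bucket_start_ms=1_422_353_400_000,
                       bucket_end_ms=1_422_354_000_000)
```

Keeping the audit fields next to the payload means any tier that reads Ma can count it without consulting the producer.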

Page 11: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer, local Kafka cluster, two aggregate Kafka clusters and offline processing. Messages shown: Ma on each hop, plus one Mm.]

Ma = Message with audit data

Mm = Monitoring message

Page 12: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer, local Kafka cluster, two aggregate Kafka clusters and offline processing, now with an Audit Consumer. Messages shown: Ma on each hop, plus Mm.]

Ma = Message with audit data

Mm = Monitoring message

Page 13: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer, local Kafka cluster, two aggregate Kafka clusters and offline processing, with three Audit Consumers. Messages shown: Ma on each hop, plus Mm.]

Ma = Message with audit data

Mm = Monitoring message

Page 14: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer, local Kafka cluster, two aggregate Kafka clusters and offline processing, with three Audit Consumers and an Audit App. Messages shown: Ma on each hop, plus Mm.]

Ma = Message with audit data

Mm = Monitoring message

Page 15: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer, local Kafka cluster, two aggregate Kafka clusters and offline processing, with three Audit Consumers, an Audit App with a REST API, Audit MySQL and the Audit UI. Messages shown: Ma on each hop, plus Mm.]

Ma = Message with audit data

Mm = Monitoring message

Page 16: Kafka Audit - Kafka Meetup - January 27th, 2015

AUDIT UI

Tier               Count
Producer           123
Local              123
Aggregate          123
Aggregate Offline  123

(for each topic and time window)

Page 17: Kafka Audit - Kafka Meetup - January 27th, 2015

AUDIT UI

Tier               Count
Producer           123
Local              123
Aggregate          119
Aggregate Offline  119

We lost 4 messages between local and aggregate!

(for each topic and time window)
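The comparison the Audit UI makes for one topic and one time window can be sketched as below. The function name and the dictionary layout are hypothetical, and the tier order is assumed from the table on this slide.

```python
def find_losses(counts_by_tier, tier_order):
    """Compare each tier with the next one downstream.

    Returns a list of (upstream, downstream, difference) for every adjacent
    pair whose counts differ. A positive difference means messages were lost;
    a negative one would mean the downstream tier counted more than upstream.
    """
    losses = []
    for upstream, downstream in zip(tier_order, tier_order[1:]):
        difference = counts_by_tier[upstream] - counts_by_tier[downstream]
        if difference != 0:
            losses.append((upstream, downstream, difference))
    return losses


tier_order = ["Producer", "Local", "Aggregate", "Aggregate Offline"]
counts = {"Producer": 123, "Local": 123, "Aggregate": 119, "Aggregate Offline": 119}

print(find_losses(counts, tier_order))  # [('Local', 'Aggregate', 4)]
```

Comparing adjacent tiers rather than every tier against the producer pins the loss to a single hop, which matches the message shown on the slide.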

Page 18: Kafka Audit - Kafka Meetup - January 27th, 2015

CAVEATS

• Audit consumers need to consume everything.

• Intermediate tiers are tough to drill down into.

Page 19: Kafka Audit - Kafka Meetup - January 27th, 2015

QUESTIONS?

[email protected]

https://kafka.apache.org/

irc://irc.freenode.net/#apache-kafka

Many folks on the mailing list know the details of how Kafka Audit works.

Page 21: Kafka Audit - Kafka Meetup - January 27th, 2015

LATE MESSAGE RESOLUTION

Page 22: Kafka Audit - Kafka Meetup - January 27th, 2015

LATE MESSAGE RESOLUTION

Tier        10:00-10:10  10:10-10:20  10:20-10:30  10:30-10:40
Producer    341          352          337          326
Local       341          299          337          326
Aggregate   341          299          337          326
Aggregate   341          299          337          326
Hadoop      341          299          337          326

From the 10:10 to 10:20 time bucket, 53 messages were lost from the producer to the Kafka local cluster.

Unhealthy!

[Marker on the time axis: Current time]

Page 23: Kafka Audit - Kafka Meetup - January 27th, 2015

LATE MESSAGE RESOLUTION

Tier        10:00-10:10  10:10-10:20  10:20-10:30  10:30-10:40
Producer    341          352          337          326
Local       341          299+53       337          326
Aggregate   341          299          337          326
Aggregate   341          299          337          326
Hadoop      341          299          337          326

Another message Mm arrives later with the missing count of 53!

[Marker on the time axis: Current time]

Page 24: Kafka Audit - Kafka Meetup - January 27th, 2015

LATE MESSAGE RESOLUTION

Tier        10:00-10:10  10:10-10:20  10:20-10:30  10:30-10:40
Producer    341          352          337          326
Local       341          352          337          326
Aggregate   341          352          337          326
Aggregate   341          352          337          326
Hadoop      341          352          337          326

All time periods match after arrival of late Mm message.

Healthy state now.

[Marker on the time axis: Current time]
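One way to picture late message resolution is as counts that accumulate per topic, time bucket and tier, so that a late Mm simply adds to the bucket it belongs to. The sketch below is an assumption about how such bookkeeping could look, not the Audit App's actual code. The class name, the topic name and the reduced tier list are hypothetical.

```python
class AuditCounts:
    """Accumulates Mm counts keyed by (topic, bucket, tier)."""

    def __init__(self):
        self._counts = {}

    def record(self, topic, bucket, tier, count):
        # Each Mm adds to the running total, so a late Mm tops up its bucket.
        key = (topic, bucket, tier)
        self._counts[key] = self._counts.get(key, 0) + count

    def is_healthy(self, topic, bucket, tiers):
        # Healthy means every tier reports the same total for this bucket.
        seen = {self._counts.get((topic, bucket, tier), 0) for tier in tiers}
        return len(seen) == 1


tiers = ["Producer", "Local", "Aggregate", "Hadoop"]  # simplified tier list
audit = AuditCounts()
audit.record("events", "10:10-10:20", "Producer", 352)
for tier in tiers[1:]:
    audit.record("events", "10:10-10:20", tier, 299)
healthy_before = audit.is_healthy("events", "10:10-10:20", tiers)

# Late Mm messages arrive carrying the missing count of 53 for each tier.
for tier in tiers[1:]:
    audit.record("events", "10:10-10:20", tier, 53)
healthy_after = audit.is_healthy("events", "10:10-10:20", tiers)

print(healthy_before, healthy_after)  # False True
```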

Page 25: Kafka Audit - Kafka Meetup - January 27th, 2015

The producer timestamp determines the time bucket the message is placed into, so bucketing is deterministic: every tier assigns a given message to the same bucket.

Mm = {
  Count of messages
  The topic this count is for
  Tier identification string
  Time bucket interval
}
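A minimal sketch of deterministic bucketing follows. It assumes ten-minute buckets (as in the 10:10 to 10:20 example) and epoch-millisecond timestamps; `bucket_for` is a hypothetical helper name. Because the bucket depends only on the producer timestamp carried in Ma, the result is the same regardless of when a tier happens to read the message.

```python
from datetime import datetime, timezone

BUCKET_MS = 10 * 60 * 1000  # ten-minute buckets (assumed bucket size)


def bucket_for(created_at_ms, bucket_ms=BUCKET_MS):
    """Return (start_ms, end_ms) of the bucket containing the producer timestamp."""
    start = created_at_ms - (created_at_ms % bucket_ms)
    return (start, start + bucket_ms)


# A message created by the producer at 10:17:30 UTC on the day of the talk.
created = datetime(2015, 1, 27, 10, 17, 30, tzinfo=timezone.utc)
created_ms = int(created.timestamp() * 1000)

start_ms, end_ms = bucket_for(created_ms)
start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
end = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)
print(start.strftime("%H:%M"), end.strftime("%H:%M"))  # 10:10 10:20
```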

Page 27: Kafka Audit - Kafka Meetup - January 27th, 2015

TRANSPORT TIME

Page 28: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer, local Kafka cluster and two aggregate Kafka clusters, each cluster read by an Audit Consumer. Ma flows on each hop; each Audit Consumer emits Tt to Metrics (e.g. RRDs).]

Tt = {
  Time Ma seen by audit consumer
  Topic name
}

Page 29: Kafka Audit - Kafka Meetup - January 27th, 2015

Tt = {
  Time seen by audit consumer
  Topic name
}

Tt can be sampled; there is no need to emit it for every message.

Tt[time] = <Audit Consumer NTPd Time> - Ma[time]
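The formula above, together with sampling, can be sketched as follows. The function names, the dictionary returned for Tt and the default 1% sample rate are assumptions made for illustration. The audit consumer's clock is taken from `time.time()`, which only gives meaningful results if the host is NTP-synchronised, as the slide notes.

```python
import random
import time


def transport_time_ms(ma_created_at_ms, now_ms=None):
    """Tt[time]: audit consumer clock minus the producer timestamp in Ma."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # assumes an NTP-synchronised host
    return now_ms - ma_created_at_ms


def maybe_emit(topic, ma_created_at_ms, sample_rate=0.01, now_ms=None, rng=random):
    """Emit a Tt record for roughly sample_rate of messages, else return None."""
    if rng.random() >= sample_rate:
        return None
    return {"topic": topic,
            "transport_ms": transport_time_ms(ma_created_at_ms, now_ms)}


# With sample_rate=1.0 every message is measured.
print(maybe_emit("page_views", 1000, sample_rate=1.0, now_ms=1250))
# {'topic': 'page_views', 'transport_ms': 250}
```

Sampling keeps the metrics volume small while still tracking how transport time changes over time.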

Page 30: Kafka Audit - Kafka Meetup - January 27th, 2015

CAVEATS

• Depends on the Audit Consumer lag.

• Producer batching can skew timestamps.

Page 32: Kafka Audit - Kafka Meetup - January 27th, 2015

SCHEMA RESOLUTION

Page 33: Kafka Audit - Kafka Meetup - January 27th, 2015

WHAT IS A SCHEMA?

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]}
  ]
}

Every message should be formatted to a schema!

Page 34: Kafka Audit - Kafka Meetup - January 27th, 2015

SCHEMA REGISTRY

A REST API to go from schema to ID, and ID to schema.

Schema ID = hash(Raw Schema)

Schema Registry Database columns: Schema ID, Raw Schema, Registration Timestamp

History of registrations is maintained.
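To make the two-way mapping concrete, here is an in-memory stand-in for the registry. It is a sketch under several assumptions: the real service is a REST API backed by a database, the slide does not say which hash function is used (SHA-256 is chosen here only for illustration), and the class and method names are hypothetical.

```python
import hashlib
import time


class InMemorySchemaRegistry:
    """Toy registry: schema -> ID via a hash, and ID -> schema via lookup."""

    def __init__(self):
        self._schemas = {}  # schema ID -> raw schema
        self._history = []  # (registration timestamp, schema ID), oldest first

    def register(self, raw_schema):
        # Schema ID = hash(Raw Schema); SHA-256 is an assumed choice of hash.
        schema_id = hashlib.sha256(raw_schema.encode("utf-8")).hexdigest()
        self._schemas[schema_id] = raw_schema
        self._history.append((time.time(), schema_id))
        return schema_id

    def lookup(self, schema_id):
        return self._schemas[schema_id]

    def history(self):
        return list(self._history)


registry = InMemorySchemaRegistry()
raw = '{"type":"record","name":"User","fields":[{"name":"name","type":"string"}]}'
first_id = registry.register(raw)
second_id = registry.register(raw)  # registering again yields the same ID
```

Because the ID is derived from the schema text, registering the same schema twice returns the same ID while the history still records both registrations.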

Page 35: Kafka Audit - Kafka Meetup - January 27th, 2015

[Diagram: Producer, Schema Registry and Kafka, with arrows numbered 1 to 3.]

1. Producer registers schema.
2. Registry returns schema ID (hash of schema).
3. Schema ID prepended to all Kafka messages.

Ms = { <Schema ID> + Mall }
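Step 3 can be sketched as a fixed-length prefix on the message bytes. The 32-byte SHA-256 digest, the helper names and the sample payload are assumptions for the example; the slides say only that the schema ID is a hash and that it is prepended.

```python
import hashlib

ID_LEN = 32  # length of a SHA-256 digest in bytes (assumed hash function)


def schema_id_bytes(raw_schema):
    """Compute the binary schema ID as a hash of the raw schema text."""
    return hashlib.sha256(raw_schema.encode("utf-8")).digest()


def wrap(schema_id, payload):
    """Build Ms by prepending the schema ID to the serialized message."""
    if len(schema_id) != ID_LEN:
        raise ValueError("unexpected schema ID length")
    return schema_id + payload


def unwrap(ms):
    """Split Ms back into (schema ID, payload) on the consumer side."""
    return ms[:ID_LEN], ms[ID_LEN:]


sid = schema_id_bytes('{"type":"record","name":"User","fields":[]}')
ms = wrap(sid, b"serialized-user-record")
recovered_id, recovered_payload = unwrap(ms)
```

A consumer that reads `recovered_id` can ask the registry for the matching schema before decoding `recovered_payload`.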
