NYC Kafka Meetup 2015: When Bad Things Happen to Good Kafka Clusters


1

When Bad Things Happen to Good Kafka Clusters

True stories that actually happened to production Kafka clusters
As told by

Gwen Shapira, System Architect
@gwenshap

2

Disclaimer
I am talking about other people’s systems.
Not yours.
I am sure you had perfectly good reasons to configure your system the way you did.
This is not personal criticism,
just some stories and a few lessons we learned the hard way.

3

POCs are super easy.
It’s time to go to production.

4

We keep our data in /tmp/logs

What could possibly go wrong?
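The sample `config/server.properties` shipped with Kafka sets `log.dirs=/tmp/kafka-logs`, and the broker's built-in default for `log.dir` is also `/tmp/kafka-logs`; many systems clean `/tmp` on reboot. A minimal pre-deployment check might look like this. The helper names are hypothetical, and the properties parser is deliberately simplified (only `key=value` lines, no `:` separators or line continuations).

```python
# Sketch: flag Kafka broker log directories that live under /tmp.
# Simplified .properties parsing: only "key=value" lines, no ':' separators or continuations.

KAFKA_DEFAULT_LOG_DIR = "/tmp/kafka-logs"  # broker default for log.dir when nothing is set


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def volatile_log_dirs(props: dict) -> list:
    """Return the configured log directories that sit under /tmp."""
    # log.dirs (comma-separated) takes precedence over log.dir; fall back to the broker default.
    configured = props.get("log.dirs") or props.get("log.dir") or KAFKA_DEFAULT_LOG_DIR
    dirs = [d.strip() for d in configured.split(",") if d.strip()]
    return [d for d in dirs if d == "/tmp" or d.startswith("/tmp/")]


config = """
broker.id=0
# copied straight from the sample config
log.dirs=/tmp/kafka-logs,/data/kafka
"""
print(volatile_log_dirs(parse_properties(config)))  # ['/tmp/kafka-logs']
```

Note that an empty config is flagged as well, because the broker would then fall back to `/tmp/kafka-logs`.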

5

A replication factor of 3 is way too much
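A rough way to see what a replication factor buys: the arithmetic below counts how many broker losses a single partition can absorb. It assumes all replicas start in sync, unclean leader election is disabled, and producers use `acks=all` together with the topic's `min.insync.replicas`. The function name is hypothetical.

```python
def failure_tolerance(replication_factor: int, min_insync_replicas: int) -> dict:
    """Broker losses a partition can absorb, assuming all replicas start in sync
    and unclean leader election is disabled."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min.insync.replicas must be between 1 and the replication factor")
    return {
        # Committed data survives while at least one in-sync replica remains.
        "without_data_loss": replication_factor - 1,
        # acks=all writes keep succeeding while the ISR has >= min.insync.replicas members.
        "while_accepting_acks_all": replication_factor - min_insync_replicas,
    }


for rf, min_isr in [(1, 1), (2, 2), (3, 2)]:
    print(rf, min_isr, failure_tolerance(rf, min_isr))
```

With a replication factor of 3 and `min.insync.replicas=2`, one broker can be down (for example, during a rolling restart) while `acks=all` writes continue, and a second failure still does not lose committed data.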

6

__consumer_offsets topic?

Never heard of it, so it’s probably OK to delete.

7

8

What’s wrong with running Kafka 0.7?

9

Remember that time when…
we accidentally lost all our data?

10

We added new partitions…
and immediately ran out of memory.
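Consumer fetch buffers are sized per partition. The 0.9 consumer documentation describes the maximum memory for a fetch request as roughly `#partitions * max.partition.fetch.bytes` (default 1 MiB), so memory grows linearly with partition count. Below is a back-of-the-envelope sketch; the partition counts and the 512 MiB budget are hypothetical.

```python
MIB = 1024 * 1024
DEFAULT_MAX_PARTITION_FETCH_BYTES = 1 * MIB  # consumer default for max.partition.fetch.bytes


def worst_case_fetch_bytes(partitions: int,
                           max_partition_fetch_bytes: int = DEFAULT_MAX_PARTITION_FETCH_BYTES) -> int:
    """Upper bound on one fetch response: a full buffer for every assigned partition."""
    return partitions * max_partition_fetch_bytes


def max_partitions_for_budget(budget_bytes: int,
                              max_partition_fetch_bytes: int = DEFAULT_MAX_PARTITION_FETCH_BYTES) -> int:
    """Largest partition count whose worst-case fetch still fits in budget_bytes."""
    return budget_bytes // max_partition_fetch_bytes


# Hypothetical: one consumer goes from 100 to 2,000 assigned partitions.
for partitions in (100, 2000):
    print(partitions, "partitions ->", worst_case_fetch_bytes(partitions) // MIB, "MiB")
print("fits in 512 MiB:", max_partitions_for_budget(512 * MIB), "partitions")
```

The same reasoning applies on the producer side, where batches are also kept per partition.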

11

We wanted to look up records by time.
The smaller the segments, the more accurate the lookups.

So we created 10k segments.
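Why smaller segments looked attractive: in the Kafka versions of that era, an offset-for-time lookup could only answer at segment granularity, based on each segment's last-modified time. The sketch below is a simplified model of that behavior, not Kafka's code. It also estimates the cost, assuming each segment keeps a `.log` and an `.index` file open on the broker. All names and numbers are hypothetical.

```python
from typing import List, Optional, Tuple

# Simplified model (assumption): a time lookup returns the base offset of the newest
# segment last modified at or before the target time.


def offset_before(segments: List[Tuple[int, int]], timestamp_ms: int) -> Optional[int]:
    """segments: (base_offset, last_modified_ms) pairs, oldest first."""
    result = None
    for base_offset, last_modified_ms in segments:
        if last_modified_ms <= timestamp_ms:
            result = base_offset
    return result


def open_segment_files(partitions: int, segments_per_partition: int,
                       files_per_segment: int = 2) -> int:
    """Rough file-handle count, assuming each segment keeps a .log and an .index file open."""
    return partitions * segments_per_partition * files_per_segment


segments = [(0, 1_000), (500, 2_000), (900, 3_000)]
print(offset_before(segments, 2_500))  # 500, up to a whole segment earlier than the target time
print(open_segment_files(partitions=50, segments_per_partition=10_000))  # 1000000
```

Shrinking segments does tighten the lookup, but the broker's open files (and memory-mapped indexes) grow in direct proportion.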

12

We need REALLY LARGE messages
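Raising the broker's `message.max.bytes` alone is not enough: replication (`replica.fetch.max.bytes`) and consumers (`max.partition.fetch.bytes` on the 0.9 consumer, `fetch.message.max.bytes` on the older one) must be able to fetch the largest accepted message, or they can stall on it. Here is a minimal consistency check over a hypothetical flat dict of settings.

```python
MIB = 1024 * 1024


def large_message_problems(cfg: dict) -> list:
    """Check that replication and consumers can carry the largest message the broker accepts."""
    largest = cfg["message.max.bytes"]  # broker: largest message it will accept
    problems = []
    if cfg["replica.fetch.max.bytes"] < largest:
        problems.append("replica.fetch.max.bytes < message.max.bytes: "
                        "followers may fail to replicate large messages")
    if cfg["max.partition.fetch.bytes"] < largest:
        problems.append("max.partition.fetch.bytes < message.max.bytes: "
                        "consumers may get stuck on a large message")
    return problems


# Hypothetical settings: the broker limit was raised to 10 MiB, but the replica fetch size was not.
cfg = {
    "message.max.bytes": 10 * MIB,
    "replica.fetch.max.bytes": 1 * MIB,
    "max.partition.fetch.bytes": 10 * MIB,
}
for problem in large_message_problems(cfg):
    print(problem)
```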

13

We just serialize JSON and throw it into a topic.
It’s easy.
The consumers will figure something out.
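Free-form JSON pushes every schema decision onto every consumer. One alternative is to enforce an explicit, versioned schema before anything reaches the topic. A production setup would more likely use a schema system such as Avro with Confluent Schema Registry. This standard-library sketch only illustrates the idea, and the schema and field names are hypothetical.

```python
import json

# Hypothetical event schema for illustration: field name -> exact expected Python type.
PAGE_VIEW_V1 = {"schema_version": int, "user_id": str, "url": str, "ts_ms": int}


def encode_checked(record: dict, schema: dict) -> bytes:
    """Serialize record as JSON only if every schema field is present with the right type."""
    for field, expected_type in schema.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        # Exact type match, so True is not accepted where an int is required.
        if type(record[field]) is not expected_type:
            raise TypeError(f"{field}: expected {expected_type.__name__}, "
                            f"got {type(record[field]).__name__}")
    return json.dumps(record, sort_keys=True).encode("utf-8")


event = {"schema_version": 1, "user_id": "u-42", "url": "/home", "ts_ms": 1_443_000_000_000}
print(encode_checked(event, PAGE_VIEW_V1))
```

The producer fails fast on a malformed event, instead of every downstream consumer discovering the problem separately.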

14

Log4J is a great way to reliably send data to Kafka

15

Keep your Kafka safe!

“When it absolutely, positively has to be there: Reliability guarantees in Apache Kafka”

Wednesday, 11:20am, Room 3D

16

Thank you

Visit Confluent in booth #929
Books, Kafka t-shirts & stickers, and more…

Gwen Shapira | gwen@confluent.io | @gwenshap