
Kafka at Peak Performance




Todd Palino
Staff Site Reliability Engineer
LinkedIn, Data Infrastructure Streaming


Who Am I?


Kafka At LinkedIn

1100+ Kafka brokers, over 32,000 topics, 350,000+ partitions
875 billion messages per day, 185 terabytes in, 675 terabytes out
Peak load (whole site):
– 10.5 million messages/sec
– 18.5 gigabits/sec inbound
– 70.5 gigabits/sec outbound

1800+ Kafka brokers, over 79,000 topics, 1,130,000+ partitions
1.3 trillion messages per day, 330 terabytes in, 1.2 petabytes out
Peak load (single cluster):
– 2 million messages/sec
– 4.7 gigabits/sec inbound
– 15 gigabits/sec outbound


What Will We Talk About?

Picking Your Hardware

Monitoring the Cluster

Triaging Broker Performance Problems

Conclusion


Hardware Selection


What’s Important To You?

Message Retention - Disk size

Message Throughput - Network capacity

Producer Performance - Disk I/O

Consumer Performance - Memory


Go Wide

Kafka is well-suited to horizontal scaling

RAIS - Redundant Array of Inexpensive Servers

Also helps with CPU utilization
– Kafka needs to decompress and recompress every message batch
– KIP-31 will help with this by eliminating recompression

Don’t co-locate Kafka


Disk Layout

RAID
– Can survive a single disk failure (not RAID 0)
– Provides the broker with a single log directory
– Eats up disk I/O

JBOD (see the log.dirs sketch below)
– Gives Kafka all the disk I/O available
– Broker is not smart about balancing partitions
– If one disk fails, the entire broker stops

Amazon EBS performance works!
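A minimal sketch of how the two layouts differ in server.properties; the mount points here are hypothetical:

  # RAID: one volume, so the broker gets a single log directory
  log.dirs=/mnt/kafka-data

  # JBOD: one log directory per disk; the broker spreads new partitions across them
  log.dirs=/mnt/disk1/kafka,/mnt/disk2/kafka,/mnt/disk3/kafka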


Operating System Tuning

Filesystem Options
– EXT4 or XFS
– Using unsafe mount options (see the fstab sketch below)

Virtual Memory
– Swappiness
– Dirty Pages

Networking
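To illustrate the “unsafe mount options” point, a hypothetical fstab entry for a Kafka data volume, trading some crash safety for lower latency (validate against your own durability needs):

  /dev/sdb1  /mnt/kafka-data  ext4  defaults,noatime,data=writeback,nobarrier  0 2

The virtual memory and networking sysctls we use are listed in the OS Tuning Parameters appendix.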


Java

Only use JDK 8 now

Keep heap size small
– Even our largest brokers use a 6 GB heap
– Save the rest for page cache

Garbage Collection - G1 all the way
– Basic tuning only
– Watch for humongous allocations
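Kafka’s start scripts honor the KAFKA_HEAP_OPTS and KAFKA_JVM_PERFORMANCE_OPTS environment variables, so a sketch of applying a small heap plus G1 looks like this (the full flag set is in the JDK Options appendix):

  export KAFKA_HEAP_OPTS="-Xmx6g -Xms6g"
  export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35"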


How Much Do You Need?


Buy The Book!

Early Access available now.

Covers all aspects of Kafka, from setup to client development to ongoing administration and troubleshooting.

Also discusses stream processing and other use cases.


Kafka Cluster Sizing

How big for your local cluster?
– How much disk space do you have?
– How much network bandwidth do you have?
– CPU, memory, disk I/O

How big for your aggregate cluster?
– In general, multiply the number of brokers by the number of local clusters
– May have additional concerns with lots of consumers
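A back-of-the-envelope disk sizing sketch, with every number assumed for illustration:

  100 MB/sec average inbound × 86,400 sec/day × 7 days retention ≈ 60 TB of log data
  × 2 replication factor ≈ 120 TB on disk
  ÷ 6 TB usable per broker (10 TB of disk, staying under ~60% full) ≈ 20 brokers

Run the same arithmetic for network: peak inbound plus replication and consumer traffic outbound must fit within each broker’s interfaces.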


Topic Configuration

Partition Counts for Local
– Many theories on how to do this correctly, but the answer is “it depends”
– How many consumers do you have?
– Do you have specific partition requirements?
– Keep partition sizes manageable

Partition Counts for Aggregate
– Multiply the number of partitions in a local cluster by the number of local clusters
– Periodically review partition counts in all clusters

Message Retention
– If aggregate is where you really need the messages, only retain them in local long enough to cover mirror maker problems
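Two illustrative partition-count checks, numbers assumed: a 100 MB/sec topic whose consumers each process about 10 MB/sec needs at least 10 partitions to let consumers keep up, and at 10 MB/sec per partition a 4-day retention means roughly 3.5 TB per partition, which argues for more partitions or less retention. A local-cluster retention override might look like this (0.9-era syntax, names hypothetical):

  kafka-topics.sh --zookeeper zk.example.com:2181 --alter --topic page-views --config retention.ms=3600000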


Possible Broker Improvements

Namespaces
– Namespace topics by datacenter
– Eliminate local clusters and just have aggregate
– Significant hardware savings

JBOD Fixes
– Intelligent partition assignment
– Admin tools to move partitions between mount points
– Broker should not fail completely with a single disk failure


Administrative Improvements

Multiple cluster management
– Topic management across clusters
– Visualization of mirror maker paths

Better client monitoring
– Burrow for consumer monitoring
– No open source solution for producer monitoring (audit)

End-to-end availability monitoring


Keeping An Eye On Things


Monitoring The Foundation

CPU Load

Network inbound and outbound

Filehandle usage for Kafka

Disk
– Free space - where you write logs, and where Kafka stores messages
– Free inodes
– I/O performance - at least average wait and percent utilization (commands sketched below)

Garbage Collection
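A quick sketch of the shell checks behind the disk items above (iostat is from the sysstat package; paths and devices are assumptions):

  df -h /var/log/kafka /mnt/kafka-data   # free space for application logs and message logs
  df -i /mnt/kafka-data                  # free inodes
  iostat -x 5                            # per-device average wait (await) and %util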


Broker Ground Rules

Tuning (see the server.properties sketch below)
– Stick (mostly) with the defaults
– Set default cluster retention as appropriate
– Default partition count should be at least the number of brokers

Monitoring
– Watch the right things
– Don’t try to alert on everything

Triage and Resolution
– Solve problems, don’t mask them
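A sketch of those tuning defaults in server.properties; the values are placeholders, not recommendations:

  log.retention.hours=168      # default cluster retention (one week)
  num.partitions=12            # default partition count, at least the number of brokers
  default.replication.factor=2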


Too Much Information!

Monitoring teams hate Kafka
– Per-Topic metrics
– Per-Partition metrics
– Per-Client metrics

Capture as much as you can
– Many metrics are useful while triaging an issue

Clients want metrics on their own topics

Only alert on what is needed to signal a problem


Broker Monitoring

Bytes In and Out, Messages In
– Why not messages out?

Partitions
– Count and Leader Count
– Under Replicated and Offline

Threads
– Network pool, Request pool
– Max Dirty Percent

Requests
– Rates and times - total, queue, local, and send


Topic Monitoring

Bytes In, Bytes Out
Messages In, Produce Rate, Produce Failure Rate
Fetch Rate, Fetch Failure Rate

Partition Bytes

Log End Offset
– Why bother?
– KIP-32 will make this unnecessary

Quota Throttling

Provide this to your customers for them to alert on


Client Monitoring

For consumers, use Burrow
– Monitor all partitions for all consumers
– Provides an easy-to-digest “good, warning, bad” state, with detail available
– Fast and free

Producers are a little harder
– Several internal implementations of message auditing
– The community needs a good open source standard

Cluster availability monitoring
– kafka-monitoring is coming soon from LinkedIn!
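Burrow is queried over HTTP; a sketch of checking one group’s status (the path follows Burrow’s v2 API at the time, and the host, port, cluster, and group names are all assumptions):

  curl http://burrow.example.com:8000/v2/kafka/local-cluster/consumer/my-group/status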


It’s Broken! Now What?


All The Best Ops People…

Know more of what is happening than their customers

Are proactive

Fix bugs, don’t work around them

This applies to our developers too!


Anticipating Trouble

Trend cluster utilization and growth over time

Use default configurations for quotas and retention that require customers to talk to you before they get more

Monitor request times
– If you can develop a consistent baseline, this is an early warning


Under Replicated Partitions

A count of the partitions that are not fully replicated within the cluster

Also referred to as “replica lag”

Primary indicator of problems within the cluster
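Besides the UnderReplicatedPartitions MBean (listed in the appendix), the stock admin tool can enumerate the affected partitions; the ZooKeeper address is a placeholder:

  kafka-topics.sh --zookeeper zk.example.com:2181 --describe --under-replicated-partitions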


Broker Performance Checks

Are you still running 0.8?

Are all the brokers in the cluster working?

Are the network interfaces saturated? (remediation tools sketched below)
– Reelect partition leaders
– Rebalance partitions in the cluster
– Spread out traffic more (increase partitions or brokers)

Is the CPU utilization high? (especially iowait)
– Is another process competing for resources?
– Look for a bad disk

Do you have really big messages?
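The leader-reelection and rebalance remedies map to the stock tools, sketched here with 0.9-era invocations and placeholder names; the kafka-assigner script in linkedin/kafka-tools (linked later) can generate reassignments for you:

  kafka-preferred-replica-election.sh --zookeeper zk.example.com:2181
  kafka-reassign-partitions.sh --zookeeper zk.example.com:2181 --reassignment-json-file move.json --execute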


Kafka’s OK, Now What?

If Kafka is working properly, it’s probably a client issue
– Don’t throw it over the fence. Help your customers understand

Common producer issues (configs sketched below)
– Batch size and linger time
– Receive and send buffers
– Sync vs. async, and acknowledgements

Common consumer issues
– Garbage collection problems
– Min fetch bytes and max wait time
– Not enough partitions
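A sketch of the client knobs above as properties files; every value is an assumed illustration of the trade-off, not a recommendation:

  # producer.properties
  batch.size=65536            # bigger batches amortize compression and request overhead
  linger.ms=10                # wait briefly so batches can fill
  acks=all                    # durability vs. latency; acks=1 is faster but riskier
  send.buffer.bytes=1048576
  receive.buffer.bytes=1048576

  # consumer.properties
  fetch.min.bytes=1024        # don't respond until this much data is ready...
  fetch.max.wait.ms=500       # ...or this much time has passed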


Conclusion


One Ecosystem

Kafka can scale to millions of messages per second, and more
– Operations must scale the cluster appropriately
– Developers must use the right tuning and go parallel

Few problems are owned by only one side
– Expanding partitions often requires coordination
– Applications that need higher reliability drive cluster configurations

Either we work together, or we fail separately


Would You Like To Know More?

Presentations: http://www.slideshare.net/toddpalino
– More Datacenters, More Problems
– Kafka As A Service
– Always download the originals for slide notes!

Blog Posts: https://engineering.linkedin.com/blog
– Development and SRE blogs on Kafka and other topics

LinkedIn Open Source: https://github.com/linkedin/streaming
– Burrow Consumer Monitoring - https://github.com/linkedin/Burrow
– Kafka Admin Tools - https://github.com/linkedin/kafka-tools


Getting Involved With Kafka

http://kafka.apache.org

Join the mailing lists
– users@kafka.apache.org
– dev@kafka.apache.org

irc.freenode.net - #apache-kafka

Meetups
– Apache Kafka - http://www.meetup.com/http-kafka-apache-org
– Bay Area Samza - http://www.meetup.com/Bay-Area-Samza-Meetup/

Contribute code


Data @ LinkedIn is Hiring!

Streams Infrastructure
– Kafka pub/sub ecosystem
– Stream Processing Platform built on Apache Samza
– Next Generation change capture technology (incubating)

LinkedIn
– Strong commitment to open source
– Do cool things and work with awesome people

Join us in working on cutting edge stream processing infrastructures
– Please contact [email protected]
– Software developers and Site Reliability Engineers at all levels


Appendix


JDK Options

Heap Size:
-Xmx6g -Xms6g

Metaspace:
-XX:MetaspaceSize=96m -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80

G1 Tuning:
-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M

GC Logging:
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:/path/to/logs/gc.log -verbose:gc

Error Handling:
-XX:-HeapDumpOnOutOfMemoryError -XX:ErrorFile=/path/to/logs/hs_err.log


OS Tuning Parameters

Networking:
net.core.rmem_default = 124928
net.core.rmem_max = 2048000
net.core.wmem_default = 124928
net.core.wmem_max = 2048000
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_max_syn_backlog = 1024


OS Tuning Parameters (cont.)

Virtual Memory:
vm.oom_kill_allocating_task = 1
vm.max_map_count = 200000
vm.swappiness = 1
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 500
vm.dirty_ratio = 60
vm.dirty_background_ratio = 5


Kafka Broker Sensors

kafka.server:name=BytesInPerSec,type=BrokerTopicMetrics
kafka.server:name=BytesOutPerSec,type=BrokerTopicMetrics
kafka.server:name=MessagesInPerSec,type=BrokerTopicMetrics
kafka.server:name=PartitionCount,type=ReplicaManager
kafka.server:name=LeaderCount,type=ReplicaManager
kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager
kafka.server:name=RequestHandlerAvgIdlePercent,type=KafkaRequestHandlerPool
kafka.controller:name=ActiveControllerCount,type=KafkaController
kafka.controller:name=OfflinePartitionsCount,type=KafkaController
kafka.log:name=max-dirty-percent,type=LogCleanerManager
kafka.network:name=NetworkProcessorAvgIdlePercent,type=SocketServer
kafka.network:name=RequestsPerSec,request=*,type=RequestMetrics
kafka.network:name=RequestQueueTimeMs,request=*,type=RequestMetrics
kafka.network:name=LocalTimeMs,request=*,type=RequestMetrics
kafka.network:name=RemoteTimeMs,request=*,type=RequestMetrics
kafka.network:name=ResponseQueueTimeMs,request=*,type=RequestMetrics
kafka.network:name=ResponseSendTimeMs,request=*,type=RequestMetrics
kafka.network:name=TotalTimeMs,request=*,type=RequestMetrics


Kafka Broker Sensors - Topics

kafka.server:name=BytesInPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=BytesOutPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=MessagesInPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=TotalProduceRequestsPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=FailedProduceRequestsPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=TotalFetchRequestsPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=FailedFetchRequestsPerSec,type=BrokerTopicMetrics,topic=*
kafka.log:type=Log,name=LogEndOffset,topic=*,partition=*