More Datacenters, More Problems

Page 1: More Datacenters, More Problems

More Datacenters, More Problems
Multi-Tier Architectures

Page 2: More Datacenters, More Problems

Todd Palino
Staff Site Reliability Engineer
LinkedIn, Data Infrastructure Streaming

Page 3: More Datacenters, More Problems

Who Am I?

Kafka, Samza, and Zookeeper SRE at LinkedIn

Site Reliability Engineering
– Administrators
– Architects
– Developers

Keep the site running, always

Page 4: More Datacenters, More Problems

Data @ LinkedIn is Hiring!

Streams Infrastructure
– Kafka pub/sub ecosystem
– Stream Processing Platform built on Apache Samza
– Next Generation change capture technology (incubating)

Join us in working on cutting-edge stream processing infrastructure
– Please contact [email protected]
– Software developers and Site Reliability Engineers at all levels

LinkedIn Data Infrastructure Meetup
– Where: LinkedIn @ 2061 Stierlin Court, Mountain View
– When: May 11th at 6:30 PM
– Registration: http://www.meetup.com/LinkedIn-Data-Infrastructure-Meetup

Page 5: More Datacenters, More Problems

Kafka At LinkedIn

1100+ Kafka brokers, over 32,000 topics, 350,000+ partitions
875 billion messages per day, 185 TB in, 675 TB out
Peak load (whole site):
– 10.5 million messages/sec
– 18.5 Gigabits/sec inbound
– 70.5 Gigabits/sec outbound

1800+ Kafka brokers, over 95,000 topics, 1,340,000+ partitions
1.3 trillion messages per day, 330 TB in, 1.2 PB out
Peak load (single cluster):
– 2 million messages/sec
– 4.7 Gigabits/sec inbound
– 15 Gigabits/sec outbound

Page 6: More Datacenters, More Problems

What Will We Talk About?

Tiered Cluster Architecture

Multiple Datacenters

Performance Concerns

Conclusion

Page 7: More Datacenters, More Problems

Tiered Cluster Architecture

Page 8: More Datacenters, More Problems

One Kafka Cluster

Page 9: More Datacenters, More Problems

Single Cluster – Remote Clients

Page 10: More Datacenters, More Problems

Single Cluster – Spanning Datacenters

Page 11: More Datacenters, More Problems

Multiple Clusters – Local and Remote Clients

Page 12: More Datacenters, More Problems

Multiple Clusters – Message Aggregation

Page 13: More Datacenters, More Problems

Why Not Direct?

Network Concerns
– Bandwidth
– Network partitioning
– Latency

Security Concerns
– Firewalls and ACLs
– Encrypting data in transit

Resource Concerns
– A misbehaving application can swamp production resources
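For reference, the tiers are typically connected with Kafka's mirror maker tool rather than direct cross-datacenter clients. A minimal sketch of a 0.9-era invocation, with hypothetical hostnames and file names:

    # consumer.properties points at the source (local) cluster, e.g.:
    #   zookeeper.connect=zk.local-dc.example.com:2181
    #   group.id=mm-local-to-aggregate
    # producer.properties points at the target (aggregate) cluster, e.g.:
    #   bootstrap.servers=kafka.agg-dc.example.com:9092
    bin/kafka-mirror-maker.sh \
        --consumer.config consumer.properties \
        --producer.config producer.properties \
        --whitelist '.*' \
        --num.streams 4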

Page 14: More Datacenters, More Problems

What Do We Lose?

You may lose message ordering
– Mirror maker breaks apart message batches and redistributes them

You may lose key-to-partition affinity
– Mirror maker partitions based on the key
– Differing partition counts in source and target will result in differing distribution: a key that hashes to partition 3 of 8 in the source may land on partition 11 of 16 in the target
– Mirror maker does not (without extra work) honor custom partitioning

Page 15: More Datacenters, More Problems

Multiple Datacenters

Page 16: More Datacenters, More Problems

Why Multiple Datacenters?

Disaster Recovery
Geolocalization
Legal / Political

Page 17: More Datacenters, More Problems

Planning For the Worst

Keep your datacenters identical
– Snowflake services are hard to fail over

Services that need to run only once in the infrastructure need to coordinate
– Zookeeper across sites can be used for this (requires at least 3 sites; a sketch follows below)
– Moving from one Kafka cluster to another is hard
– KIP-33 (time-based log indexing) will make it easier to resume consuming at a point in time after a move

What about AWS (or Google Cloud)?
– Disaster recovery can be accomplished via availability zones
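A minimal sketch of a ZooKeeper ensemble spanning three sites, with hypothetical hostnames: five voters split 2/2/1, so losing any single datacenter still leaves a quorum.

    # zoo.cfg (identical on every node)
    server.1=zk1.dc-a.example.com:2888:3888
    server.2=zk2.dc-a.example.com:2888:3888
    server.3=zk1.dc-b.example.com:2888:3888
    server.4=zk2.dc-b.example.com:2888:3888
    server.5=zk1.dc-c.example.com:2888:3888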

Page 18: More Datacenters, More Problems

Planning For the Best

Not much different than designing for disasters

Consider having a limited number of aggregate clusters
– Every copy of a message costs you money
– Have 2 downstream (or super) sites that process aggregate messages
– Push results back out to all sites
– Good place for stream processing, like Samza

What about AWS (or Google Cloud)?
– Geolocalization can be accomplished via regions

Page 19: More Datacenters, More Problems

Segregating Data

All the data, available everywhere
The right data, only where we need it

Topics are (usually) the smallest unit we want to deal with
– Mirroring a topic between clusters is easy
– Filtering that mirror is much harder

Have enough topics that consumers do not have to read and throw away messages they do not want

Page 20: More Datacenters, More Problems

Retaining Data

Retain data only as long as you have to

Move data away from the front lines as quickly as possible.

Page 21: More Datacenters, More Problems

Performance Concerns

Page 22: More Datacenters, More Problems

Buy The Book!

Early Access available now.

Covers all aspects of Kafka, from setup to client development to ongoing administration and troubleshooting.

Also discusses stream processing and other use cases.

Page 23: More Datacenters, More Problems

Kafka Cluster Sizing

How big for your local cluster?
– How much disk space do you have?
– How much network bandwidth do you have?
– CPU, memory, and disk I/O

How big for your aggregate cluster?
– In general, multiply the number of brokers by the number of local clusters
– May have additional concerns with lots of consumers
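As a back-of-envelope example (all numbers hypothetical), disk is often the first limit you hit for a local cluster:

    # inbound MB/s x retention seconds x replication factor, in TB
    # e.g. 200 MB/s inbound, 48 hours of retention, replication factor 3
    echo "$(( 200 * 48 * 3600 * 3 / 1000000 )) TB of log data, before headroom"
    # => 103 TB; leave extra room for partition moves and growth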

Page 24: More Datacenters, More Problems

Topic Configuration

Partition Counts for Local
– Many theories on how to do this correctly, but the answer is "it depends"
– How many consumers do you have?
– Do you have specific partition requirements?
– Keep partition sizes manageable

Partition Counts for Aggregate
– Multiply the number of partitions in a local cluster by the number of local clusters
– Periodically review partition counts in all clusters

Message Retention
– If aggregate is where you really need the messages, retain them in local only long enough to cover mirror maker problems
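A sketch of what that looks like with 0.9-era tooling, assuming a hypothetical topic, three local clusters of 8 partitions each, and illustrative retention values:

    # Local cluster: retain only long enough to ride out a mirror maker outage
    bin/kafka-topics.sh --zookeeper zk.local-dc.example.com:2181 \
        --create --topic page-views \
        --partitions 8 --replication-factor 3 \
        --config retention.ms=14400000     # 4 hours

    # Aggregate cluster: full retention, partitions = 8 x 3 local clusters
    bin/kafka-topics.sh --zookeeper zk.agg-dc.example.com:2181 \
        --create --topic page-views \
        --partitions 24 --replication-factor 3 \
        --config retention.ms=604800000    # 7 days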

Page 25: More Datacenters, More Problems

Where Do I Put Mirror Makers?

Best practice was to keep the mirror maker local to the target cluster
– TLS can make this a new game

In the datacenter with the produce cluster (the target)
– Fewer issues from networking
– Significant performance hit when using TLS on consume

In the datacenter with the consume cluster (the source)
– Highest-performance TLS
– Potential to consume messages and then drop them on produce

Page 26: More Datacenters, More Problems

Mirror Maker Sizing

Number of servers and streams
– Size the number of servers based on the peak bytes per second
– Co-locate mirror makers
– Run more mirror makers in an instance than you need
– Use multiple consumer and producer streams

Other tunables to look at (illustrated below)
– Partition assignment strategy
– In-flight requests per connection
– Linger time
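Those tunables map to standard client settings. A sketch with illustrative values, not recommendations:

    # producer.properties
    linger.ms=100                               # wait up to 100 ms to fill batches
    max.in.flight.requests.per.connection=1     # preserves ordering on retry; raise for throughput
    batch.size=65536

    # consumer.properties (old consumer)
    partition.assignment.strategy=roundrobin    # spread partitions evenly across streams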

Page 27: More Datacenters, More Problems

Segregation of Topics

Not all topics are created equal

High Priority Topics
– Topics that change search results
– Topics used for hourly or daily reporting

Low Latency Topics

Run a separate mirror maker for these topics (see the sketch below)
– One bloated topic won't affect reporting
– Restarting the mirror maker takes less time
– Less time to catch up when you fall behind
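A sketch of the split, with hypothetical topic names (the old consumer's --blacklist option makes the catch-all instance easy):

    # High-priority pipeline: only the topics that feed search and reporting
    bin/kafka-mirror-maker.sh --consumer.config local.properties \
        --producer.config aggregate.properties \
        --whitelist 'search-updates|report-.*'

    # Everything else
    bin/kafka-mirror-maker.sh --consumer.config local.properties \
        --producer.config aggregate.properties \
        --blacklist 'search-updates|report-.*'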

Page 28: More Datacenters, More Problems

Conclusion

Page 29: More Datacenters, More Problems

Broker Improvements

Namespaces
– Namespace topics by datacenter
– Eliminate local clusters and just have aggregate
– Significant hardware savings

JBOD Fixes
– Intelligent partition assignment
– Admin tools to move partitions between mount points
– Broker should not fail completely on a single disk failure

Page 30: More Datacenters, More Problems

Mirror Maker Improvements

Identity Mirror Maker
– Messages in source partition 0 get produced directly to partition 0
– No decompression of message batches
– Keeps key-to-partition affinity, supporting custom partitioning
– Requires mirror maker to track downstream partition counts

Multi-Consumer Mirror Maker
– A single mirror maker that consumes from more than one cluster
– Reduces the number of copies of mirror maker that need to be running
– Forces a produce-local architecture, however

Page 31: More Datacenters, More Problems

Administrative Improvements

Multiple cluster management
– Topic management across clusters
– Visualization of mirror maker paths

Better client monitoring
– Burrow for consumer monitoring
– No open source solution for producer monitoring (audit)

End-to-end availability monitoring

Page 32: More Datacenters, More Problems

Getting Involved With Kafka

http://kafka.apache.org

Join the mailing lists
– [email protected]
– [email protected]

irc.freenode.net - #apache-kafka

Meetups
– Apache Kafka - http://www.meetup.com/http-kafka-apache-org
– Bay Area Samza - http://www.meetup.com/Bay-Area-Samza-Meetup/

Contribute code

Page 33: More Datacenters, More Problems

Data @ LinkedIn is Hiring!

Streams Infrastructure
– Kafka pub/sub ecosystem
– Stream Processing Platform built on Apache Samza
– Next Generation change capture technology (incubating)

Join us in working on cutting-edge stream processing infrastructure
– Please contact [email protected]
– Software developers and Site Reliability Engineers at all levels

LinkedIn Data Infrastructure Meetup
– Where: LinkedIn @ 2061 Stierlin Court, Mountain View
– When: May 11th at 6:30 PM
– Registration: http://www.meetup.com/LinkedIn-Data-Infrastructure-Meetup

Page 34: More Datacenters, More Problems