Building Stream Infrastructure across Multiple Data Centers with Apache Kafka

When One Data Center is not Enough

Guozhang Wang, Strata San Jose, 2016

Building large-scale stream infrastructure across multiple data centers with Apache Kafka

• Why across Data Centers?

• Design patterns for Multi-DC

• Kafka for Multi-DC

• Conclusion


Why across Data Centers?

Why across Data Centers

• Catastrophic / expected failures

• Routine maintenance

• Geo-locality (Example: CDNs)

Why NOT across Data Centers

• Low bandwidth (10 Mbps to 1 Gbps)

• High latency (50 ms to 450 ms)

• Much more $$$
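To make these ranges concrete, here is a back-of-the-envelope sketch. It assumes an idealized link (no protocol overhead, no compression) and treats the quoted latency as round-trip time; the helper names are hypothetical.

```python
# Back-of-the-envelope WAN costs under idealized assumptions (hypothetical helpers).


def transfer_seconds(num_bytes: float, bandwidth_bps: float) -> float:
    """Seconds to send num_bytes over a link of bandwidth_bps bits/s, ignoring overhead."""
    return num_bytes * 8 / bandwidth_bps


def sync_write_delay_ms(rtt_ms: float, round_trips: int) -> float:
    """Extra latency for a write that must wait for `round_trips` cross-DC round trips."""
    return rtt_ms * round_trips


if __name__ == "__main__":
    one_gb = 1e9  # bytes
    for bandwidth in (10e6, 1e9):  # low and high ends of the bandwidth range
        print(f"1 GB at {bandwidth / 1e6:g} Mbps: {transfer_seconds(one_gb, bandwidth):g} s")
    for rtt in (50, 450):  # low and high ends of the latency range
        print(f"2 round trips at {rtt} ms RTT: {sync_write_delay_ms(rtt, 2):g} ms per write")
```

At the low end, copying 1 GB takes about 800 s, and a write that waits for two round trips at 450 ms RTT pays roughly 900 ms before it is acknowledged. Real links only do worse, so treat these as lower bounds.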

Why NOT across Data Centers

• … is hard and expensive

• … with real-time writes? Harder

• … consistently? Oh My!

• Weak

• Eventual

• Strong

[Figure: the three consistency levels arranged along axes labeled Latency and Guarantee.]

Weak (or No) Consistency

• Now you see my writes, now you don’t

• Best effort only, data can be stale

• Examples: think of “caches”, VoIP

Eventual Consistency

• You will see my writes, … eventually

• May need to resolve conflicts (manually)

• Examples: think of “emails”, SMTP

Strong Consistency

• You get what you write, for sure

• Variants, strongest first: External > Sequential > Causal (Session)

• Examples: RDBMS, file systems

Latency vs. Consistency

• LAN: consistency over latency

• WAN: latency over consistency

Design patterns for Multi-DC


Option I: Don’t do it

• Bunkerize the single data center

• Expect data loss at failures

• Examples: ??

Option II: Primary with Hot Standby

• Failover to hot standby (maybe inconsistent)

• Window of data loss at failures

• Examples: MySQL binlog

Option III: Active-Active

• Accept writes in multiple DCs

• Resolve conflicts (strong / weak consistency)

• Examples: Amazon Dynamo (vector clocks), Google Spanner (2PC), Mesa (Paxos)

Ordering is the Key!

Ordering is Key

• Vector clocks: partial ordering

• Paxos, 2PC: global ordering

• Log shipping: logical ordering (per-partition)
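To illustrate what "partial ordering" means for vector clocks, here is a minimal sketch (the function names and per-DC node ids are hypothetical). Two writes are ordered only if one clock dominates the other; otherwise they are concurrent and the system has to resolve the conflict. A single log partition sidesteps this by giving every write one position, its offset, in a single sequence.

```python
from typing import Dict

# A vector clock maps a node (here: a data center) to how many of its writes have been seen.
VectorClock = Dict[str, int]


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """a -> b iff every entry of a is <= the matching entry of b, and at least one is smaller."""
    nodes = set(a) | set(b)
    all_le = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_lt = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return all_le and some_lt


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' (no order: a conflict to resolve)."""
    if happened_before(a, b):
        return "before"
    if happened_before(b, a):
        return "after"
    nodes = set(a) | set(b)
    if all(a.get(n, 0) == b.get(n, 0) for n in nodes):
        return "equal"
    return "concurrent"


if __name__ == "__main__":
    w1 = {"dc1": 1}             # first write, accepted in DC 1
    w2 = {"dc1": 1, "dc2": 1}   # DC 2 saw w1, then wrote
    w3 = {"dc1": 2}             # DC 1 wrote again without seeing w2
    print(compare(w1, w2))      # prints: before
    print(compare(w2, w3))      # prints: concurrent
```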

Apache Kafka

• A distributed messaging system

… that stores messages as a log!

Store Messages as a Log

[Figure: a log with offsets 4, 5, 6, 7, 8, 9, 10, 11, 12, …; the producer writes at the end, Consumer1 reads at offset 7 and Consumer2 reads at offset 10.]
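A toy, in-memory model of this picture: the producer only appends, each consumer keeps its own offset, and reading does not remove anything. The `Log` class is hypothetical; Kafka's real log is split into segment files with indexes and retention, none of which is modeled here, and offsets start at 0.

```python
from typing import Dict, List


class Log:
    """In-memory append-only log: a message's offset is its position in the log."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def append(self, message: str) -> int:
        """Producer write: always at the end; returns the offset assigned to the message."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[str]:
        """Consumer read from `offset`; reading never removes anything from the log."""
        return self._messages[offset:offset + max_messages]

    def end_offset(self) -> int:
        """Offset the next appended message will receive."""
        return len(self._messages)


if __name__ == "__main__":
    log = Log()
    for i in range(13):
        log.append(f"m{i}")
    # Each consumer owns its position; the log does not track its readers.
    positions: Dict[str, int] = {"consumer1": 7, "consumer2": 10}
    for name in positions:
        batch = log.read(positions[name], max_messages=2)
        positions[name] += len(batch)
        print(name, batch, "next offset:", positions[name])
```

Because consumers only hold an offset, many independent readers can share one log at different speeds without coordinating with each other.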



Partition the Log across Machines

[Figure: Topic 1 and Topic 2, each split into partitions spread across machines.]
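A minimal sketch of keyed partitioning: each record is routed to a partition by hashing its key, so all records with the same key land in one partition and keep their relative order there. The hash is `zlib.crc32` purely for illustration; Kafka's default partitioner hashes keyed records with murmur2, so these partition numbers will not match a real cluster. The function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition (CRC32 stands in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(records: List[Record], num_partitions: int) -> Dict[int, List[Record]]:
    """Append each record to its partition's log, preserving order within a partition."""
    partitions: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    events = [("user-a", "login"), ("user-b", "login"), ("user-a", "click"), ("user-a", "logout")]
    for partition, records in sorted(produce(events, num_partitions=4).items()):
        print(partition, records)
```

Ordering is only guaranteed per partition, and changing the partition count or the hash function moves keys to different partitions.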







Configurable ISR Commits

ACK mode    Latency                   On failures
"no"        no network delay          some data loss
"leader"    1 network round trip      little data loss
"all"       ~2 network round trips    no data loss
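A toy model of the table, assuming a single leader crash after which an in-sync follower becomes the new leader, so only writes that already reached the followers survive. It ignores retries, `min.insync.replicas` and unclean leader election. The enum values mirror the Java producer's `acks` setting (0, 1, all); the functions are hypothetical.

```python
from enum import Enum


class Acks(Enum):
    NO = "0"        # do not wait for any acknowledgement
    LEADER = "1"    # wait until the leader has written the message
    ALL = "all"     # wait until all in-sync replicas have the message


def producer_sees_success(acks: Acks, reached_leader: bool, replicated: bool) -> bool:
    """Whether the producer treats the send as successful in this toy model."""
    if acks is Acks.NO:
        return True  # fire-and-forget: success is assumed
    if acks is Acks.LEADER:
        return reached_leader
    return reached_leader and replicated


def lost_on_leader_failure(acks: Acks, reached_leader: bool, replicated: bool) -> bool:
    """True if a write the producer considered successful is gone after the leader
    crashes and an in-sync follower takes over (only replicated writes survive)."""
    return producer_sees_success(acks, reached_leader, replicated) and not replicated


if __name__ == "__main__":
    # The write reached the leader, but the leader crashed before followers copied it.
    for mode in Acks:
        print(mode.name, lost_on_leader_failure(mode, reached_leader=True, replicated=False))
    # prints: NO True / LEADER True / ALL False
```

Under "all" the producer never saw success for the unreplicated write, so it can retry instead of silently losing it; that safety is what the extra round trip buys.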

Kafka for Multi-DC


Option I: Active-Passive Replication

• Asynchronous replication across DCs

• May lose data on failover

• Example: ETL to data warehouse / HDFS

[Figure: a Kafka local cluster in DC 1 and a Kafka replica cluster in DC 2, each with a consumer.]
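A small simulation of why asynchronous replication "may lose data on failover". It assumes the replica copies the source log in order, a batch at a time, roughly the way a mirroring process would; `mirror`, `fail_over` and the batch size are hypothetical, not a Kafka API.

```python
from typing import List, Tuple


def mirror(source: List[str], replica: List[str], max_batch: int) -> None:
    """One asynchronous replication step: copy up to max_batch messages the replica lacks."""
    start = len(replica)
    replica.extend(source[start:start + max_batch])


def fail_over(source: List[str], replica: List[str]) -> Tuple[List[str], List[str]]:
    """DC 1 is gone: consumers switch to the replica; anything not yet copied is lost."""
    lost = source[len(replica):]
    return list(replica), lost


if __name__ == "__main__":
    source = [f"m{i}" for i in range(10)]  # messages written in DC 1
    replica: List[str] = []
    mirror(source, replica, max_batch=3)
    mirror(source, replica, max_batch=3)   # replica is now 4 messages behind
    surviving, lost = fail_over(source, replica)
    print("lost on failover:", lost)  # prints: lost on failover: ['m6', 'm7', 'm8', 'm9']
```

The size of the loss window is simply the replication lag at the moment of failure, which is why this pattern suits ETL-style consumers that can tolerate it.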

Option II: Active-Active Replication

• Global view on the aggregate cluster

• Resuming after failover requires choosing offsets on the other cluster

• Example: store materialization, index updates

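[Figure: in each of DC 1 and DC 2, producers write to a Kafka local cluster; MirrorMaker copies both local clusters into a Kafka aggregate cluster in each DC, which consumers read. On DC 1 failure, consumers move to the aggregate cluster in DC 2.]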
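A sketch of why offsets are not portable between the two aggregate clusters. It assumes each DC's aggregate cluster is filled by copying from both local clusters and models the copy timing with an explicit, hypothetical `schedule`; it does not reproduce MirrorMaker's behavior, only the ordering consequence: each source keeps its own order, but the interleaving, and therefore every offset, can differ per DC.

```python
from typing import Dict, List


def mirror_into_aggregate(local_logs: Dict[str, List[str]], schedule: List[str]) -> List[str]:
    """Build an aggregate log by copying the next message from the DC named at each step.

    `schedule` stands in for the timing of the mirroring processes: each local
    log keeps its own order, but the interleaving depends on who copies first.
    """
    cursors = {dc: 0 for dc in local_logs}
    aggregate: List[str] = []
    for dc in schedule:
        aggregate.append(local_logs[dc][cursors[dc]])
        cursors[dc] += 1
    return aggregate


if __name__ == "__main__":
    local_logs = {"dc1": ["a1", "a2"], "dc2": ["b1", "b2"]}
    agg_in_dc1 = mirror_into_aggregate(local_logs, ["dc1", "dc1", "dc2", "dc2"])
    agg_in_dc2 = mirror_into_aggregate(local_logs, ["dc2", "dc1", "dc2", "dc1"])
    # Same messages, but the same message sits at a different offset in each DC:
    print(agg_in_dc1.index("a2"), agg_in_dc2.index("a2"))  # prints: 1 3
```

A consumer that committed offset 2 in DC 1's aggregate cluster therefore cannot simply reuse that number against DC 2's aggregate cluster.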

Caveats: offsets across DCs

• Offsets are not identical between Kafka clusters

• Duplicates during failover

• Partition selection may be different

• Solutions

  • Resume from the log end offset (suitable for real-time apps)

  • Resume from a timestamp (ListOffsets, time-based index: KIP-33)
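A sketch of the two resume strategies. `offset_for_time` returns the earliest offset whose timestamp is greater than or equal to the target, which is the documented behavior of the Java consumer's `offsetsForTimes` lookup; here a linear scan stands in for Kafka's time index, and the in-memory `(timestamp, value)` log and both function names are hypothetical.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # (timestamp_ms, value); the list index is the offset


def offset_for_time(log: List[Record], timestamp_ms: int) -> Optional[int]:
    """Earliest offset whose timestamp is >= timestamp_ms, or None if there is none."""
    for offset, (ts, _value) in enumerate(log):
        if ts >= timestamp_ms:
            return offset
    return None


def resume_offset(log: List[Record], last_processed_ts_ms: Optional[int]) -> int:
    """Pick where to resume on the surviving cluster after failover."""
    if last_processed_ts_ms is None:
        return len(log)  # log end offset: skip the backlog (real-time apps)
    found = offset_for_time(log, last_processed_ts_ms)
    return len(log) if found is None else found


if __name__ == "__main__":
    log = [(100, "a"), (105, "b"), (110, "c"), (110, "d"), (120, "e")]
    print(resume_offset(log, None))  # prints: 5
    print(resume_offset(log, 110))   # prints: 2
```

Because the lookup uses >=, records that share the last processed timestamp are read again, which is one source of duplicates during failover; consumers that resume this way should be idempotent or de-duplicate.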

Option III: Deploy across DCs

• Multi-tenancy support

  • Security (0.9)

  • Quota management (0.9)

• Latency optimization

  • Rack-aware replica assignment (0.10)

  • Read affinity (future?)


[Figure: a single Kafka cluster deployed across DC 1 and DC 2, with producers and consumers in both DCs.]
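A simplified sketch of the idea behind rack-aware replica assignment, treating each DC (or AZ) as a "rack": brokers are interleaved by rack, and each partition takes consecutive entries from that list, so its replicas land on different racks. Kafka's own algorithm differs in details, for example in how it shifts follower placement to balance load. The sketch assumes every rack has the same number of brokers and a replication factor no larger than the number of racks. In Kafka a broker's rack is set with the `broker.rack` property; the functions below are hypothetical.

```python
from itertools import zip_longest
from typing import Dict, List


def rack_alternating(brokers_by_rack: Dict[str, List[int]]) -> List[int]:
    """Interleave brokers so that neighboring entries come from different racks."""
    racks = [brokers_by_rack[rack] for rack in sorted(brokers_by_rack)]
    return [b for group in zip_longest(*racks) for b in group if b is not None]


def assign_replicas(brokers_by_rack: Dict[str, List[int]],
                    num_partitions: int,
                    replication_factor: int) -> Dict[int, List[int]]:
    """Partition p gets `replication_factor` consecutive brokers from the interleaved list."""
    ordered = rack_alternating(brokers_by_rack)
    n = len(ordered)
    return {
        p: [ordered[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


if __name__ == "__main__":
    # Hypothetical layout: two brokers in each of three DCs.
    racks = {"dc1": [1, 2], "dc2": [3, 4], "dc3": [5, 6]}
    for partition, replicas in assign_replicas(racks, 6, 3).items():
        print(partition, replicas)
```

With one replica per DC, losing a whole DC still leaves two copies of every partition.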

Example: EC2 Multi-AZ Deployment

• Same region: essentially the same network

• Asymmetric network partitioning is rare; latency is low

• Need at least 3 DCs (AZs) for ZooKeeper

• Reserved instances to reduce churn

• EIPs for external clients, private IPs for internal communication

• Reserved instances, local storage
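Why ZooKeeper needs at least 3 DCs: an ensemble stays available only while a strict majority of its servers are up. With two DCs, one of them must host at least half the servers, so losing that DC loses the quorum. A quick check, with hypothetical function names:

```python
from typing import Dict


def has_quorum(alive: int, ensemble_size: int) -> bool:
    """A ZooKeeper ensemble needs a strict majority of its servers to be up."""
    return alive > ensemble_size // 2


def survives_any_single_dc_loss(servers_per_dc: Dict[str, int]) -> bool:
    """True if the ensemble keeps a majority no matter which single DC goes down."""
    total = sum(servers_per_dc.values())
    return all(has_quorum(total - lost, total) for lost in servers_per_dc.values())


if __name__ == "__main__":
    print(survives_any_single_dc_loss({"dc1": 2, "dc2": 1}))            # prints: False
    print(survives_any_single_dc_loss({"dc1": 1, "dc2": 1, "dc3": 1}))  # prints: True
```

Adding servers does not help with only two DCs: a 2 + 2 split also fails, because 2 of 4 is not a majority.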

Take-aways

• Multi-DC: a trade-off between latency and consistency

• Kafka: replicated log streams for multihoming

Thank you

Guozhang | @guozhangwang

Meet Confluent in booth #838

Confluent University ~ Kafka training ~ confluent.io/training

Join the Stream Data Hackathon, Apr 25, SF: kafka-summit.org/hackathon/

Download Apache Kafka & Confluent Platform
