jyrki-pulliainen
Taming Pythons with ZooKeeper
PyCon Finland 2012
@nailor
Content Squad
• Consume boatloads of XML
• Build database out of it
• Transcode audio
• Enrich data
• Build indexes
• Ship indexes
Day to day operations
Shipping
Pre-defined order
• Central orchestrator
• Assume all machines are running
• Run remote commands
• We need to know all the quirks
• Fail fast
The naïve solution
Guess what? It breaks
...and replacing isn't simple
• No central orchestrator
• Assume some machines are always down
• Run locally
• Shift responsibility to system owners
• Fail gracefully
New tool is needed
CAP Theorem
The CAP Theorem
1. Consistency
2. Availability
3. Partition tolerance
Pick two. Any two will do.
We went for CA
• Users are mostly contained in single DC
• Inside a single DC connections are quite robust
• Remember the order? We need consistency
• Availability is everything
Why?
How?
Apache ZooKeeper
• A distributed tree-like data structure
• Simple primitives
• Automatic leader elections
• Guaranteed hard consistency
• Ephemeral nodes
What?
Tree-like structure
/
/dir/
/
/subdir/
/dir2/
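The tree on this slide can be sketched as a toy, in-memory model: every node is addressed by an absolute path, and a node can only be created under a parent that already exists. `ToyTree` and its methods are hypothetical names for illustration and are not a ZooKeeper client.

```python
class ToyTree:
    """In-memory stand-in for a tree of path-addressed nodes (illustration only)."""

    def __init__(self):
        # Map of absolute path -> data; the root always exists.
        self.nodes = {"/": b""}

    def create(self, path, data=b""):
        # The parent of "/dir/subdir" is "/dir"; the parent of "/dir" is "/".
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError("parent missing: " + parent)
        if path in self.nodes:
            raise KeyError("node exists: " + path)
        self.nodes[path] = data

    def get_children(self, path):
        # Direct children only: strip the parent prefix and reject deeper paths.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ToyTree()
tree.create("/dir")
tree.create("/dir/subdir")
tree.create("/dir2")
print(tree.get_children("/"))
print(tree.get_children("/dir"))
```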
Simple primitives
● Guaranteed atomic operations
● Counters
● Change notifications
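One way these primitives fit together, as a toy sketch: a write that only succeeds when the caller has seen the latest version gives an atomic update, a counter is a retry loop around that write, and a change notification is a callback that fires once on the next write. `VersionedNode`, `BadVersion` and `increment` are hypothetical names; this is an in-memory model, not ZooKeeper itself.

```python
class BadVersion(Exception):
    """Raised when a write is based on a stale version."""


class VersionedNode:
    """Toy node with versioned writes and one-shot watches (illustration only)."""

    def __init__(self, data=b"0"):
        self.data = data
        self.version = 0
        self._watchers = []

    def get(self):
        return self.data, self.version

    def watch(self, callback):
        # The callback fires once, on the next successful write.
        self._watchers.append(callback)

    def set(self, data, version):
        # Compare-and-set: refuse the write if someone else wrote first.
        if version != self.version:
            raise BadVersion("expected %d, got %d" % (self.version, version))
        self.data = data
        self.version += 1
        watchers, self._watchers = self._watchers, []
        for callback in watchers:
            callback(data)


def increment(node, delta=1):
    """Counter built from read, modify, conditional write, retry on conflict."""
    while True:
        data, version = node.get()
        new_value = int(data.decode()) + delta
        try:
            node.set(str(new_value).encode(), version)
            return new_value
        except BadVersion:
            continue  # lost the race, read again and retry
```

The retry loop is the design choice worth noting: no lock is held, and a conflicting writer only costs the loser one extra read.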
Automatic leader election
● Nodes know who is the most up to date
● If no leader can be picked, ZooKeeper refuses to work
Guaranteed hard consistency
● Every change is sent to every node!
● Quorum for all operations is always required
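The quorum requirement comes down to simple arithmetic, assuming a quorum means a strict majority of the ensemble (which is how ZooKeeper's default quorum works). The function names below are hypothetical helpers for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


def can_serve(ensemble_size, alive):
    """Without a majority no leader can be picked, so the ensemble refuses to work."""
    return alive >= quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Note that a fourth server does not buy any extra failure tolerance over three, which is why ensembles are usually sized with an odd number of servers.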
Ephemeral nodes
● Node is present only if the client is alive
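Ephemeral nodes make a simple liveness registry: each machine creates a node under a shared parent, and the node disappears when that machine's session ends. A minimal sketch, assuming `zk` is an already started Kazoo-style client (for example a `kazoo.client.KazooClient`) offering `ensure_path`, `create(..., ephemeral=True)` and `get_children`; the helper names and the `/shippers` path are made up for illustration.

```python
def register(zk, group, member, data=b""):
    """Announce this process as <group>/<member> using an ephemeral node."""
    # The parent is a regular, persistent node; only the member is ephemeral.
    zk.ensure_path(group)
    return zk.create(group + "/" + member, data, ephemeral=True)


def live_members(zk, group):
    """Members whose sessions are still alive, as seen by the server."""
    return sorted(zk.get_children(group))
```

With a real client, usage would look roughly like `register(zk, "/shippers", "host-1", b"ready")` on each machine, with any machine calling `live_members(zk, "/shippers")` to see who is up.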
Library: zkPython
The Good:
● Thin
● Comes with ZooKeeper
● Maintained by the Apache ZooKeeper project
The bad:
● Thin
● C bindings only, no PyPy for you
The ugly:
● No documentation :(
There are others: Kazoo
The Good:
● Pure Python
● Recipes implemented
● Used by many (Quora, Mozilla, reddit, Zope)
The bad:
● Not many recipes done
● Not owned by the mainline
The ugly:
● Own implementation of the protocol
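As an example of what a recipe buys you, Kazoo ships a lock exposed as `client.Lock(path, identifier)`, usable as a context manager. The sketch below wraps it so that a step runs on only one machine at a time; `run_exclusively` and the lock path are hypothetical, and `zk` is assumed to be a started `kazoo.client.KazooClient`.

```python
def run_exclusively(zk, lock_path, identifier, action):
    """Run action() while holding a distributed lock at lock_path."""
    lock = zk.Lock(lock_path, identifier)
    # Entering the block waits for the lock; leaving it releases the lock,
    # even if action() raises.
    with lock:
        return action()
```

A call such as `run_exclusively(zk, "/locks/ship-index", "host-1", ship_index)` (with `ship_index` being whatever local function does the work) keeps the work running locally while ZooKeeper decides whose turn it is.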
Dos and Don'ts
Don't ship large chunks
Monitor the ZooKeeper
Don't write there all the time
Stay in one DC
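For "don't ship large chunks", one common pattern is to keep the bulky artifact elsewhere and store only a small pointer to it in ZooKeeper. A sketch under assumptions: `index_pointer`, the field names, and the 1 KiB budget are all made up for illustration; ZooKeeper's own default limit is roughly 1 MB per node, and staying far below it is the point.

```python
import json

# Hypothetical self-imposed budget, far below ZooKeeper's default per-node limit.
MAX_PAYLOAD_BYTES = 1024


def index_pointer(url, sha256, version):
    """Build a small payload that points at an index instead of containing it."""
    payload = json.dumps(
        {"url": url, "sha256": sha256, "version": version},
        sort_keys=True,
    ).encode("utf-8")
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload too large for coordination data")
    return payload


pointer = index_pointer("http://storage.example/indexes/42.tar", "ab" * 32, 42)
print(len(pointer) <= MAX_PAYLOAD_BYTES)
```

Consumers read the pointer, fetch the index from wherever it actually lives, and verify the checksum, so ZooKeeper only ever carries a few hundred bytes per update.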
Spotify & ZooKeeper
Summary time!
Concurrency == hard
Distributed consistency == hard
No partitions? Go ZooKeeper!
Pick your weapon library
Remember tradeoffs
Thank you