Transcript
Page 1: Distributed Coordination with Python

DISTRIBUTED COORDINATION WITH PYTHON

Ben Bangert, Mozilla

Page 2: Distributed Coordination with Python

Tools of the Trade

Page 3: Distributed Coordination with Python

DISTRIBUTED COORDINATION IS NOT...

• Distributed Databases (Cassandra, Riak)

• Distributed Computing (Hadoop, etc.)

• Distributed Event Analysis (Storm)

Page 4: Distributed Coordination with Python

The Common Element

Page 5: Distributed Coordination with Python

Apache ZooKeeper

Page 6: Distributed Coordination with Python

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

Page 7: Distributed Coordination with Python

ZOOKEEPER

Page 8: Distributed Coordination with Python

WHY NOT USE...

• Memcached?

• MongoDB?

• Postgres/MySQL?

Page 9: Distributed Coordination with Python
Page 10: Distributed Coordination with Python
Page 11: Distributed Coordination with Python

Hierarchical data structure in znodes

Page 12: Distributed Coordination with Python
Page 13: Distributed Coordination with Python

• Session Based

• Znode watches

• Ephemeral and Sequential Znodes

Page 14: Distributed Coordination with Python

EPHEMERAL ZNODES

• Last for the duration of the client session

• Session ends when the client closes it or the session expires

• Can’t have child znodes

Page 15: Distributed Coordination with Python

SEQUENTIAL ZNODES

• Supply a node name (or not), get node name back with a trailing sequence number (0001, 0002, 0003, etc.)

• Can be combined with ephemeral flag
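A minimal sketch of combining both flags through kazoo's create call. It assumes `client` is an already started KazooClient; the helper name `register_worker` and the path `/workers/worker-` are hypothetical.

```python
def register_worker(client, prefix="/workers/worker-", payload=b""):
    """Create an ephemeral, sequential znode and return its full path.

    `client` is assumed to be a started kazoo.client.KazooClient.
    """
    # ephemeral=True: the znode is removed when this client's session ends.
    # sequence=True:  ZooKeeper appends a zero-padded counter to the name.
    # makepath=True:  missing parent znodes (here /workers) are created first.
    return client.create(
        prefix, payload, ephemeral=True, sequence=True, makepath=True
    )
```

ZooKeeper pads the counter to ten digits, so the returned path looks like /workers/worker-0000000003 rather than the shortened form on the slide.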

Page 16: Distributed Coordination with Python

BASIC COMMANDS

• create(PATH, DATA...)

• get(PATH...)

• get_children(PATH...)

• set(PATH, DATA...)

• delete(PATH...)
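A short sketch of the five commands as kazoo exposes them, assuming `client` is a started KazooClient. The function name `roundtrip` and the path `/demo/greeting` are made up for illustration.

```python
def roundtrip(client, path="/demo/greeting"):
    """Exercise create, get, get_children, set and delete on one znode.

    `client` is assumed to be a started kazoo.client.KazooClient.
    """
    client.create(path, b"hello", makepath=True)  # znode data is bytes
    data, stat = client.get(path)                 # returns (data, stat)
    parent = path.rsplit("/", 1)[0]
    children = client.get_children(parent)        # child names, not full paths
    client.set(path, b"goodbye")                  # overwrite the data
    updated, _ = client.get(path)
    client.delete(path)
    return data, children, updated
```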

Page 17: Distributed Coordination with Python
Page 18: Distributed Coordination with Python

PYTHON CLIENTS

• txzookeeper

• kazoo

  • unified client that works with gevent

  • implements wire protocol in pure Python

Page 19: Distributed Coordination with Python

USE KAZOO

Page 20: Distributed Coordination with Python

EASY TO USE

from kazoo.client import KazooClient

# With no arguments, kazoo connects to 127.0.0.1:2181
client = KazooClient()
client.start()

Page 21: Distributed Coordination with Python

USE CASES

Page 22: Distributed Coordination with Python

CONFIGURATION

• Store settings in node data

• Organize node structure

• Set watches on nodes of interest
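One way to watch a setting is kazoo's DataWatch recipe. The sketch below assumes `client` is a started KazooClient; `watch_setting` and the `on_change` callback are hypothetical names.

```python
def watch_setting(client, path, on_change):
    """Call on_change(text) now and after every change to the znode at `path`.

    `client` is assumed to be a started kazoo.client.KazooClient.
    """
    client.ensure_path(path)  # create the znode (and parents) if missing

    @client.DataWatch(path)
    def _changed(data, stat):
        # kazoo calls this once right away and again after each change.
        # data is None when the znode has been deleted.
        if data is not None:
            on_change(data.decode("utf-8"))
```

The recipe re-registers the underlying one-shot ZooKeeper watch after each event, so the callback keeps firing without extra code.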

Page 23: Distributed Coordination with Python
Page 24: Distributed Coordination with Python

PARTY MEMBERSHIP

• Join a party, find out who else is around

• Elect a leader if desired

• Recipe in Kazoo
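A sketch of the Party and Election recipes, assuming `client` is a started KazooClient. The helper names and the identifiers passed in are illustrative only.

```python
def join_party(client, path, identifier):
    """Join the party at `path`; return the party and current member ids."""
    party = client.Party(path, identifier)
    party.join()                   # registers an ephemeral member znode
    return party, sorted(party)    # iterating a Party yields identifiers


def run_as_leader(client, path, identifier, lead):
    """Block until this client wins the election, then call lead()."""
    election = client.Election(path, identifier)
    election.run(lead)
```

Because membership is held in ephemeral znodes, a member that crashes drops out of the party once its session expires.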

Page 25: Distributed Coordination with Python

LOCKS

• Lock a resource for a single client

• Lock a resource for multiple clients (Semaphore)

• Hard to write properly

• Recipe in Kazoo
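A sketch of the Lock and Semaphore recipes used as context managers, assuming `client` is a started KazooClient. The wrapper names and the `work` callable are hypothetical.

```python
def with_exclusive_lock(client, path, identifier, work):
    """Run work() while holding the lock at `path`."""
    lock = client.Lock(path, identifier)
    with lock:  # blocks until acquired, released even if work() raises
        return work()


def with_lease(client, path, identifier, max_leases, work):
    """Run work() while holding one of `max_leases` semaphore leases."""
    semaphore = client.Semaphore(path, identifier, max_leases=max_leases)
    with semaphore:
        return work()
```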

Page 26: Distributed Coordination with Python

BUILDING HIGHER LEVEL ABSTRACTIONS ON ZOOKEEPER

Page 27: Distributed Coordination with Python

CAVEAT

Page 28: Distributed Coordination with Python

DO NOT IMPLEMENT YOURSELF, USE THE RECIPE

Page 29: Distributed Coordination with Python
Page 30: Distributed Coordination with Python

BASIC STEPS

• Create lock parent node if needed

• Create ephemeral+sequence node under parent, store node name returned

• Get children of lock node

• Sort children list by sequence number

• First child in the list has the lock!
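The steps above can be sketched as follows, purely to illustrate the idea. This is not kazoo's Lock implementation and it skips all failure handling. It assumes `client` is a started KazooClient; the parent path `/locks/resource` and the function names are hypothetical.

```python
def sequence_number(name):
    """Return the trailing counter of a sequential znode name."""
    return int(name[-10:])  # ZooKeeper pads the counter to ten digits


def try_acquire(client, parent="/locks/resource"):
    """Return (my_name, holds_lock) after one pass through the basic steps."""
    client.ensure_path(parent)                          # create parent if needed
    path = client.create(parent + "/lock-", b"",
                         ephemeral=True, sequence=True)  # our contender node
    my_name = path.rsplit("/", 1)[1]
    children = client.get_children(parent)              # order is not guaranteed
    ordered = sorted(children, key=sequence_number)
    return my_name, ordered[0] == my_name               # lowest number wins
```

Sorting on the numeric suffix rather than the whole name keeps the order correct even when contenders use different name prefixes.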

Page 31: Distributed Coordination with Python

THINGS TO WATCH OUT FOR

• Avoid the thundering herd, use watches only when needed

• When our node isn’t the lowest, watch the one in front of us

• Only one client wanting a lock is ‘woken’ when the lock is released by a different client
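A small sketch of picking the node to watch, assuming child names end in ZooKeeper's ten-digit sequence counter. The function name `node_to_watch` is hypothetical.

```python
def node_to_watch(children, my_name):
    """Return the child directly ahead of us, or None if we hold the lock."""
    ordered = sorted(children, key=lambda name: int(name[-10:]))
    position = ordered.index(my_name)
    return ordered[position - 1] if position > 0 else None
```

Each waiter watches only its immediate predecessor, so a release or a crashed holder notifies a single client instead of every contender.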

Page 32: Distributed Coordination with Python

HANDLING FAILURE

Page 33: Distributed Coordination with Python

ROBUST CODE TAKES EFFORT

• What happens when a server fails?

• What happens when the client fails?

• What happens when we don’t know if the server has failed?
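kazoo reports these situations through connection states: CONNECTED, SUSPENDED (connection dropped, session may still be valid) and LOST (session gone). The sketch below tracks them with a listener. It assumes `client` is a KazooClient; `track_connection` is a hypothetical helper, and the string literals equal the values of kazoo's KazooState constants.

```python
import threading


def track_connection(client):
    """Return (uncertain, lost) Events driven by kazoo connection states.

    `lost` stays set once any session has been lost.
    """
    uncertain = threading.Event()
    lost = threading.Event()

    def listener(state):
        # Runs on kazoo's connection thread: keep it short and non-blocking.
        if state == "SUSPENDED":      # KazooState.SUSPENDED
            uncertain.set()           # we do not know if the session survives
        elif state == "LOST":         # KazooState.LOST
            uncertain.clear()
            lost.set()                # ephemeral znodes and locks are gone
        elif state == "CONNECTED":    # KazooState.CONNECTED
            uncertain.clear()

    client.add_listener(listener)
    return uncertain, lost
```

Work that depends on a lock can pause while `uncertain` is set. Exiting is best done from the main thread after checking `lost`, since calling sys.exit inside the listener would only end the connection thread.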

Page 34: Distributed Coordination with Python

STOPPING WHEN UNCERTAIN

Page 35: Distributed Coordination with Python

A BIT BETTER VERSION...

Page 36: Distributed Coordination with Python

EVEN BETTER

Page 37: Distributed Coordination with Python

FAILURE WILL HAPPEN

• Fail fast, fail completely.

• Session expiration is a good time to sys.exit

• Always include jitter (kazoo includes jitter on its connection and command retry operations)

• Consider what exceptions can occur in any code relying on a distributed system
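To show what jitter means for retries, here is a generic exponential backoff with full jitter. It is not kazoo's own retry code, and the name `backoff_delays` with its parameters is made up for this sketch.

```python
import random


def backoff_delays(attempts, base=0.1, factor=2.0, cap=5.0, rng=random.random):
    """Yield sleep durations with exponential backoff and full jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * factor ** attempt)  # grows, then levels off
        yield rng() * ceiling                         # random point below it
```

Randomizing each delay keeps clients that failed at the same moment from retrying at the same moment, which would otherwise hit a recovering server all at once.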

Page 38: Distributed Coordination with Python

• Distributed systems are hard

• Use existing battle-proven tools (ZooKeeper, Kazoo)

• Always consider everything that can fail, and how

• Be wary of tools that don’t tell you how they fail

• Read Kyle Kingsbury’s Jepsen posts to see examples of systems failing: http://aphyr.com/tags/jepsen

Page 39: Distributed Coordination with Python

FIN

Page 40: Distributed Coordination with Python

QUESTIONS?

