Presentation given at TriHUG (Triangle Hadoop User Group) on May 22, 2012. Gives a basic overview of Apache ZooKeeper as well as some common use cases, 3rd-party libraries, and "gotchas". Demo code available at https://github.com/mumrah/trihug-zookeeper-demo
Apache ZooKeeper: An Introduction and Practical Use Cases
Who am I?
● David Arthur
● Engineer at Lucid Imagination
● Hadoop user
● Python enthusiast
● Father
● Gardener
Play along!
Grab the source for this presentation on GitHub: github.com/mumrah/trihug-zookeeper-demo
You'll need Java, Ant, and bash.
Apache ZooKeeper
● Formerly a Hadoop sub-project
● ASF TLP (top-level project) since Nov 2010
● 7 PMC members, 8 committers - most from Yahoo! and Cloudera
● Ugly logo
One-liner
"ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers" - ZooKeeper wiki
Who uses it?
Everyone*
● Yahoo!
● HBase
● Solr
● LinkedIn (Kafka, Hedwig)
● Many more
* https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy
What is it good for?
● Configuration management - machines bootstrap their config from a centralized source, facilitating simpler deployment/provisioning
● Naming service - like DNS, mappings of names to addresses
● Distributed synchronization - locks, barriers, queues
● Leader election - a common problem in distributed coordination
● Centralized and highly reliable (simple) data registry
Namespace (ZNodes)
parent : "foo"
|-- child1 : "bar"
|-- child2 : "spam"
`-- child3 : "eggs"
    `-- grandchild1 : "42"
Every znode has data (given as byte[]) and can optionally have children.
Sequential znodes
Nodes created in "sequential" mode get a 10-digit, zero-padded, monotonically increasing number appended to their name.
create("/demo/seq-", ..., ..., PERSISTENT_SEQUENTIAL) x4
/demo
|-- seq-0000000000
|-- seq-0000000001
|-- seq-0000000002
`-- seq-0000000003
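To make the naming rule concrete, here is a small illustration of how the sequential suffix is formed. This is plain Java, not a ZooKeeper API; `sequentialName` is a made-up helper mimicking what the server does:

```java
// Illustration only (not a ZooKeeper API): the server appends a 10-digit,
// zero-padded counter to the requested name prefix.
public class SequentialName {
    static String sequentialName(String prefix, long counter) {
        return prefix + String.format("%010d", counter);
    }

    public static void main(String[] args) {
        // Four creates against "/demo/seq-" yield four distinct names:
        for (long i = 0; i < 4; i++) {
            System.out.println(sequentialName("/demo/seq-", i));
        }
    }
}
```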
Ephemeral znodes
Nodes created in "ephemeral" mode are deleted when the creating client's session ends.
create("/demo/foo", ..., ..., PERSISTENT);
create("/demo/bar", ..., ..., EPHEMERAL);

Connected        Disconnected
/demo            /demo
|-- foo          `-- foo
`-- bar
Simple API
Pretty much everything lives under the ZooKeeper class:
● create
● exists
● delete
● getData
● setData
● getChildren
Synchronicity
Sync and async versions of the API methods exist:

exists("/demo", null);

exists("/demo", null, new StatCallback() {
    @Override
    public void processResult(int rc, String path,
                              Object ctx, Stat stat) {
        ...
    }
}, null);
Watches
Watches are a one-shot callback mechanism for changes in connection and znode state:
● Client connects/disconnects
● ZNode data changes
● ZNode children change
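Because watches are one-shot, a client that wants continuous notifications has to re-register inside the callback. The following is a toy model in plain Java - not the real ZooKeeper API - that shows why a watch set once only ever fires once:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one-shot watches (NOT the real ZooKeeper API): delivering an
// event consumes the watch, so the callback must re-register to keep watching.
public class OneShotWatchDemo {
    interface Watcher { void process(String event); }

    static class WatchedNode {
        private final List<Watcher> watchers = new ArrayList<>();
        void watch(Watcher w) { watchers.add(w); }
        void setData(String data) {
            List<Watcher> fired = new ArrayList<>(watchers);
            watchers.clear(); // one-shot: every watch is consumed on delivery
            for (Watcher w : fired) w.process("NodeDataChanged");
        }
    }

    public static void main(String[] args) {
        final WatchedNode node = new WatchedNode();
        node.watch(new Watcher() {
            public void process(String event) {
                System.out.println("got " + event + "; resetting watch");
                node.watch(this); // re-register, or we miss the next change
            }
        });
        node.setData("a");
        node.setData("b"); // seen only because the callback re-registered
    }
}
```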
Demo time!
For those playing along, you'll need to get ZooKeeper running. Using the default port (2181), run:

    ant zk

Or specify a port:

    ant zk -Dzk.port=2181
Things to "watch" out for
● Watches are one-shot - if you want continuous monitoring of a znode, you have to reset the watch after each event
● Too many clients watching a single znode creates a "herd effect" - lots of clients get notified at the same time and cause spikes in load
● Changes can be missed in the window between an event firing and the watch being reset
● All watches are executed in a single, separate thread (be careful about synchronization)
Building blocks
● Hierarchical nodes
● Parent and leaf nodes can have data
● Two special node types - ephemeral and sequential
● Watch mechanism
● Consistency guarantees
    ○ Order of updates is maintained
    ○ Updates are atomic
    ○ Znodes are versioned for MVCC
    ○ Many more
The Fun Stuff
Recipes:
● Lock
● Barrier
● Queue
● Two-phase commit
● Leader election
● Group membership
Demo Time!
Group membership (i.e., the easy one)
Recipe:
● Members register a sequential ephemeral node under the group node
● Everyone keeps a watch on the group node for new children
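The recipe above can be sketched as a data model without a live server. This is an illustrative simulation, not the demo code from the repo: each member is a sequential ephemeral child of the group node, and membership is just the sorted child list. All names and helpers here are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the group-membership data model (no real ZooKeeper involved).
public class GroupMembershipSketch {
    private final SortedMap<String, String> children = new TreeMap<>();
    private long counter = 0;

    // Mimics create("/group/member-", data, ..., EPHEMERAL_SEQUENTIAL)
    String join(String memberData) {
        String name = String.format("member-%010d", counter++);
        children.put(name, memberData);
        return name;
    }

    // Mimics the ephemeral node vanishing when the member's session ends
    void leave(String name) { children.remove(name); }

    // Mimics getChildren("/group", watch) -- the list everyone watches
    List<String> members() { return new ArrayList<>(children.keySet()); }
}
```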
Lots of boilerplate
● Synchronizing the asynchronous connection (using a latch or something)
● Handling disconnects/reconnects
● Exception handling
● Ensuring paths exist (nothing like mkdir -p)
● Resetting watches
● Cleaning up
What happens?
● Everyone writes their own high-level wrapper/connection manager
    ○ ZooKeeperWrapper
    ○ ZooKeeperSession
    ○ (\w+)ZooKeeper
    ○ ZooKeeper(\w+)
Open Source, FTW!
Luckily, some smart people have open-sourced their ZooKeeper utilities/wrappers:
● Netflix Curator - Netflix/curator
● LinkedIn - linkedin/linkedin-zookeeper
● Many others
Netflix Curator
● Handles the connection management
● Implements many recipes
    ○ leader election
    ○ locks, queues, and barriers
    ○ counters
    ○ path cache
● Bonus: a service discovery implementation (we use this)
Demo Time!
Group membership refactored with Curator
● EnsurePath is nice
● Robust connection management is awesome
● Exceptions are more sane
Thoughts on Curator
i.e., my non-expert, subjective opinions
● Good level of abstraction - doesn't do anything "magical"
● Doesn't hide ZooKeeper
● Weird API design (builder soup)
● Extensive, well-tested recipe support
● It works!
ZooKeeper in the wild
Some use cases
Use case: Solr 4.0
Used in "SolrCloud" mode for:
● Cluster management - what machines are available and where they are located
● Leader election - used for picking a shard as the "leader"
● Consolidated config storage
● Watches allow for a very non-chatty steady state
● Herd effect is not really an issue
Use case: Kafka
● LinkedIn's distributed pub/sub system
● Queues are persistent
● Clients request a slice of a queue (offset, length)
● Brokers are registered in ZooKeeper; clients load-balance requests among live brokers
● Client state (last consumed offset) is stored in ZooKeeper
● Client rebalancing algorithm, similar to leader election
Use case: LucidWorks Big Data
● We use Curator's service discovery to register REST services
● Nice for SOA
● Took one dev (me) one day to get something functional (mostly reading Curator docs)
● So far, so good!
Review of "gotchas"
● Watch execution is single-threaded and synchronized
● Can't reliably get every change to a znode
● Excessive watchers on the same znode (herd effect)

Some new ones:
● GC pauses: if your application is prone to long GC pauses, make sure your session timeout is sufficiently long
● Catch-all watchers: if you use one Watcher for everything, it can be tedious to infer exactly what happened
Four-letter words
The ZooKeeper server responds to a few "four-letter word" commands via TCP or telnet*:

> echo ruok | nc localhost 2181
imok

I'm glad you're OK, ZooKeeper - really, I am.
* http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkCommands
Quorums
In a multi-node deployment (aka a ZooKeeper quorum), it is best to use an odd number of machines. ZooKeeper uses majority voting, so it can tolerate ceil(N/2)-1 machine failures and still function properly.
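The arithmetic behind the odd-number advice, worked out in a few lines of Java: a majority (floor(N/2) + 1) of servers must be reachable, so the ensemble tolerates N minus that majority, which equals ceil(N/2) - 1:

```java
// Majority voting: the ensemble functions while floor(N/2) + 1 servers are
// up, so it tolerates N - majority failures, i.e. ceil(N/2) - 1.
public class QuorumMath {
    static int majority(int n)          { return n / 2 + 1; }
    static int toleratedFailures(int n) { return n - majority(n); }

    public static void main(String[] args) {
        for (int n = 3; n <= 6; n++) {
            System.out.println(n + " servers tolerate "
                    + toleratedFailures(n) + " failure(s)");
        }
        // Note: 4 servers tolerate the same single failure as 3 --
        // even ensemble sizes add cost without adding fault tolerance.
    }
}
```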
Multi-tenancy
ZooKeeper supports "chroot" at the session level. You can add a path to the connection string that will be implicitly prefixed to everything you do:

new ZooKeeper("localhost:2181/my/app", ...);

Curator also supports this, but at the application level:

CuratorFrameworkFactory.builder()
    .namespace("/my/app");
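To make the chroot behavior concrete, here is a trivial illustration of the path rewriting it implies. This is a made-up helper, not anything from the ZooKeeper client:

```java
// Illustration only: with a chroot of "/my/app" in the connection string,
// every client path is implicitly prefixed before it reaches the server.
public class ChrootDemo {
    static String serverSidePath(String chroot, String clientPath) {
        return chroot.equals("/") ? clientPath : chroot + clientPath;
    }

    public static void main(String[] args) {
        // A client connected with "localhost:2181/my/app" asking for "/demo"
        // actually operates on "/my/app/demo" server-side:
        System.out.println(serverSidePath("/my/app", "/demo"));
    }
}
```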
Python client
A dumb wrapper around the C client; not very Pythonic:

import zookeeper
zk_handle = zookeeper.init("localhost:2181")
zookeeper.exists(zk_handle, "/demo")
zookeeper.get_children(zk_handle, "/demo")

The stuff in contrib didn't work for me; I used a statically linked version: zc-zookeeper-static
Other clients
Included in ZooKeeper under src/contrib:
● C (this is what the Python client uses)
● Perl (again, using the C client)
● REST (JAX-RS via Jersey)
● FUSE? (strange)
3rd-party client implementations:
● Scala, courtesy of Twitter
● Several others
Overview
● Basics of ZooKeeper (znode types, watches)
● High-level recipes (group membership, et al.)
● Lots of boilerplate for basic functionality
● 3rd-party helpers (Curator, et al.)
● Gotchas and other miscellany
Questions?
David Arthur
[email protected]
github.com/mumrah/trihug-zookeeper-demo