Getting Rid of Zookeeper
MesosCon Asia 2016
1
Jay GuoSoftware Developer @[email protected]
Kapil AryaMesos Committer @Mesosphere
Motivation
Zookeeper is...
● Mature
● Feature-rich
● ...
But!
● Primitive K/V store
○ Provide your own tooling for other abstractions!
● Heavy
● Hard dependencies
● Language bindings instead of a RESTful API
● ...
It’s all about having options!
● Chocolate
● Strawberry
● Vanilla
● ...
Mesos HA: An Overview
High Availability
First of all, we need a distributed key-value store...
Mesos HA
● At least three Mesos Masters
● One leading Master
○ Leader election
○ Leader detection
● Replicated Log
Zookeeper as the distributed key-value store
[Diagram: three Mesos Masters, one marked as leader, connected to a three-node Zookeeper cluster, alongside Mesos Agents / Frameworks]
Mesos HA: Leader Election
● All Masters “contend” to be the leader!
● Only one succeeds; others fail
● We have a leading Master!
[Diagram, three steps: (1) all three Mesos Masters contend through ZK; (2) one contender succeeds and two fail; (3) the winner holds leadership while the other two watch]
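The contend step above can be sketched in a few lines of C++. This is a minimal illustration, not the Mesos implementation: `KVStore`, `createIfAbsent`, the `Master` class and the `/mesos/leader` key are all hypothetical names, and an in-memory map stands in for the distributed store. It only shows the core idea that a single atomic "create if absent" lets exactly one contender win.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for the distributed KV store.
class KVStore {
public:
  // Creates `key` only if it does not exist yet; returns true on creation.
  bool createIfAbsent(const std::string& key, const std::string& value) {
    return data_.emplace(key, value).second;
  }

  // Returns the stored value, or an empty string if `key` is absent.
  std::string get(const std::string& key) const {
    std::map<std::string, std::string>::const_iterator it = data_.find(key);
    return it == data_.end() ? std::string() : it->second;
  }

private:
  std::map<std::string, std::string> data_;
};

// Hypothetical Master that contends by trying to claim the leader key.
class Master {
public:
  Master(std::string address, KVStore& store)
    : address_(std::move(address)), store_(store) {}

  // Returns true if this Master became the leader.
  bool contend() { return store_.createIfAbsent("/mesos/leader", address_); }

private:
  std::string address_;
  KVStore& store_;
};

// Usage: three Masters contend; only the first claim succeeds.
void electionExample() {
  KVStore store;
  Master m1("10.0.0.1:5050", store);
  Master m2("10.0.0.2:5050", store);
  Master m3("10.0.0.3:5050", store);

  assert(m1.contend());   // Success
  assert(!m2.contend());  // Fail
  assert(!m3.contend());  // Fail
  assert(store.get("/mesos/leader") == "10.0.0.1:5050");
}
```

A real store would also tie the key to the holder's session, so that the key disappears when the leader is lost.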
Mesos HA: Losing a Leader
● Suppose the leading Master is “lost”
● All other Masters are notified
● The remaining Masters contend again
● One of them succeeds
● A new leader is elected!
[Diagram, five steps: (1) the leading Master's connection to ZK is lost while the other two watch; (2) ZK notifies the two remaining Masters; (3) both contend; (4) one succeeds and one fails; (5) the new leader holds leadership while the other watches]
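The failover sequence above (watch, notify, contend again) can be sketched as follows. This is an illustration under stated assumptions: the in-memory `KVStore` with one-shot watches, the `Master` class, `fail()` and the `/mesos/leader` key are hypothetical, and removing the key by hand simulates the store noticing that the leader's connection was lost.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory KV store with one-shot watches on key removal.
class KVStore {
public:
  typedef std::function<void()> Watcher;

  bool createIfAbsent(const std::string& key, const std::string& value) {
    return data_.emplace(key, value).second;
  }

  std::string get(const std::string& key) const {
    std::map<std::string, std::string>::const_iterator it = data_.find(key);
    return it == data_.end() ? std::string() : it->second;
  }

  // Registers a callback that fires once, when `key` is removed.
  void watch(const std::string& key, Watcher watcher) {
    watchers_[key].push_back(std::move(watcher));
  }

  // Removes `key` and notifies its watchers in registration order.
  void remove(const std::string& key) {
    if (data_.erase(key) == 0) return;
    std::vector<Watcher> pending;
    pending.swap(watchers_[key]);  // callbacks may register new watches
    for (std::size_t i = 0; i < pending.size(); ++i) pending[i]();
  }

private:
  std::map<std::string, std::string> data_;
  std::map<std::string, std::vector<Watcher> > watchers_;
};

const std::string kLeaderKey = "/mesos/leader";  // hypothetical key name

class Master {
public:
  Master(std::string address, KVStore& store)
    : address_(std::move(address)), store_(store), leading_(false) {}

  // Try to become leader; on failure, watch the key and retry on removal.
  void contend() {
    leading_ = store_.createIfAbsent(kLeaderKey, address_);
    if (!leading_) store_.watch(kLeaderKey, [this]() { contend(); });
  }

  // Simulates the leader being lost: its key disappears from the store.
  void fail() {
    if (!leading_) return;
    leading_ = false;
    store_.remove(kLeaderKey);
  }

  bool leading() const { return leading_; }

private:
  std::string address_;
  KVStore& store_;
  bool leading_;
};

// Usage: m1 leads, is lost, and one of the remaining Masters takes over.
void failoverExample() {
  KVStore store;
  Master m1("10.0.0.1:5050", store);
  Master m2("10.0.0.2:5050", store);
  Master m3("10.0.0.3:5050", store);
  m1.contend(); m2.contend(); m3.contend();
  assert(m1.leading() && !m2.leading() && !m3.leading());

  m1.fail();  // watchers are notified and contend again
  assert(!m1.leading() && m2.leading() && !m3.leading());
  assert(store.get(kLeaderKey) == "10.0.0.2:5050");
}
```

In this sketch the winner of the second round is simply the first watcher to be notified; a distributed store would decide the race itself.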
What about Agents/Frameworks?
Mesos HA: Leader Detection
● Framework/Agent connects to Zookeeper to “detect” the current leading Master
● Zookeeper provides the Master’s location
○ I.e., IP:Port
● Framework/Agent connects to the “leader”
[Diagram, three steps: (1) Mesos Agents / Frameworks ask ZK to detect the leader while one Master holds leadership and two watch; (2) ZK returns the leader's IP:Port; (3) Agents / Frameworks connect to the leading Master]
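The detection steps above can be sketched as a small polling loop. The names here (`KVStore`, `Detector`, `Detection`, `Agent`, the `/mesos/leader` key) are hypothetical, and the synchronous `detect` is a simplification: it reports whether the leader differs from the one the caller already knows, instead of waiting for a change.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for the distributed KV store.
class KVStore {
public:
  void set(const std::string& key, const std::string& value) {
    data_[key] = value;
  }

  // Returns the stored value, or an empty string if `key` is absent.
  std::string get(const std::string& key) const {
    std::map<std::string, std::string>::const_iterator it = data_.find(key);
    return it == data_.end() ? std::string() : it->second;
  }

private:
  std::map<std::string, std::string> data_;
};

// Result of one detection: the leader's "IP:Port" (empty if there is none)
// and whether it differs from what the caller knew before.
struct Detection {
  bool changed;
  std::string leader;
};

class Detector {
public:
  explicit Detector(const KVStore& store) : store_(store) {}

  Detection detect(const std::string& previous) const {
    std::string current = store_.get("/mesos/leader");
    Detection result = {current != previous, current};
    return result;
  }

private:
  const KVStore& store_;
};

// Hypothetical Agent that (re)connects whenever the leader changes.
class Agent {
public:
  explicit Agent(const Detector& detector) : detector_(detector) {}

  void poll() {
    Detection d = detector_.detect(connectedTo_);
    if (d.changed) connectedTo_ = d.leader;  // "connect" to the new leader
  }

  const std::string& connectedTo() const { return connectedTo_; }

private:
  const Detector& detector_;
  std::string connectedTo_;
};

// Usage: the Agent follows the leader as the stored location changes.
void detectionExample() {
  KVStore store;
  Detector detector(store);
  Agent agent(detector);

  agent.poll();
  assert(agent.connectedTo().empty());  // no leader yet

  store.set("/mesos/leader", "10.0.0.1:5050");
  agent.poll();
  assert(agent.connectedTo() == "10.0.0.1:5050");

  store.set("/mesos/leader", "10.0.0.2:5050");  // a new leader was elected
  agent.poll();
  assert(agent.connectedTo() == "10.0.0.2:5050");
}
```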
What about Replicated Log?
Replicated Log lets you create replicated fault-tolerant append-only logs. The Mesos master uses Replicated Log to store cluster state in a replicated, durable way.
Mesos HA: Replicated Log
● Each replica registers its own pid in ZK and maintains its presence.
● When a new replica joins the cluster, the existing ones are notified and learn the pid of the new replica.
● Every replica knows all nodes in the cluster and runs Paxos.
[Diagram, three steps: (1) two replicas register with ZK and hold their registration; (2) a third replica registers and the existing two are notified with its info; (3) the three replicas run Paxos among themselves]
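The membership part of the steps above can be sketched like this. It does not implement Paxos; it only shows how registration and notification let every replica learn the full set of peers. `Registry`, `Replica`, `learn` and the pid strings are hypothetical, and computing a majority from the known membership is an illustrative assumption rather than a description of how Mesos sizes its quorum.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical replica that tracks the pids of all known cluster members.
class Replica {
public:
  explicit Replica(std::string pid) : pid_(std::move(pid)) {
    peers_.insert(pid_);  // a replica always knows itself
  }

  const std::string& pid() const { return pid_; }

  // Called when this replica is told about another member.
  void learn(const std::string& pid) { peers_.insert(pid); }

  std::size_t clusterSize() const { return peers_.size(); }

  // Majority of the known members (assumption made for this sketch).
  std::size_t majority() const { return peers_.size() / 2 + 1; }

private:
  std::string pid_;
  std::set<std::string> peers_;
};

// Hypothetical registry standing in for the group kept in the KV store.
class Registry {
public:
  // Registers `newcomer`, notifies existing members, and hands the
  // newcomer the current membership.
  void join(Replica& newcomer) {
    for (std::size_t i = 0; i < members_.size(); ++i) {
      members_[i]->learn(newcomer.pid());
      newcomer.learn(members_[i]->pid());
    }
    members_.push_back(&newcomer);
  }

private:
  std::vector<Replica*> members_;
};

// Usage: after three joins, every replica knows all three members.
void membershipExample() {
  Registry registry;
  Replica r1("replica@10.0.0.1:5050");
  Replica r2("replica@10.0.0.2:5050");
  Replica r3("replica@10.0.0.3:5050");

  registry.join(r1);
  assert(r1.clusterSize() == 1);

  registry.join(r2);
  assert(r1.clusterSize() == 2 && r2.clusterSize() == 2);

  registry.join(r3);
  assert(r1.clusterSize() == 3 && r2.clusterSize() == 3 &&
         r3.clusterSize() == 3);
  assert(r1.majority() == 2);
}
```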
Replacing Zookeeper
? = ZK | Etcd | Consul | ...
[Diagram: three Mesos Masters, one marked as leader, connected to a distributed KV store whose implementation is left open (“?”)]
Three Key Components
● Master Contender for leader election
● Master Detector for discovery
● PIDGroup for initialization
bool contend();
MasterInfo detect(MasterInfo previous);
void initialize(pid_t pid);
[Diagram: a Mesos Master reaches the distributed KV store (“?”) through its Contender, Detector and PIDGroup components]
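The three signatures above suggest three small interfaces that a KV-store backend implements. The sketch below is one way to picture that split; it is not the Mesos API. The abstract classes mirror the slide's simplified signatures, `std::string` replaces `pid_t` for readability, and `FakeStore` with its `Fake*` implementations is a hypothetical in-memory backend.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Simplified stand-in for the MasterInfo named on the slide.
struct MasterInfo {
  MasterInfo() : port(0) {}
  MasterInfo(std::string ip_, int port_) : ip(std::move(ip_)), port(port_) {}
  std::string ip;
  int port;
};

inline bool operator==(const MasterInfo& a, const MasterInfo& b) {
  return a.ip == b.ip && a.port == b.port;
}

// The three components, following the slide's simplified signatures.
class Contender {
public:
  virtual ~Contender() {}
  virtual bool contend() = 0;
};

class Detector {
public:
  virtual ~Detector() {}
  virtual MasterInfo detect(const MasterInfo& previous) = 0;
};

class PIDGroup {
public:
  virtual ~PIDGroup() {}
  virtual void initialize(const std::string& pid) = 0;
};

// Hypothetical shared state standing in for ZK, Etcd, Consul, ...
struct FakeStore {
  FakeStore() : hasLeader(false) {}
  bool hasLeader;
  MasterInfo leader;
  std::set<std::string> pids;
};

class FakeContender : public Contender {
public:
  FakeContender(FakeStore& store, MasterInfo self)
    : store_(store), self_(std::move(self)) {}

  bool contend() override {
    if (store_.hasLeader) return false;
    store_.hasLeader = true;
    store_.leader = self_;
    return true;
  }

private:
  FakeStore& store_;
  MasterInfo self_;
};

class FakeDetector : public Detector {
public:
  explicit FakeDetector(FakeStore& store) : store_(store) {}

  // A real detector would wait until the leader differs from `previous`;
  // this sketch returns the current leader immediately.
  MasterInfo detect(const MasterInfo& previous) override {
    (void)previous;
    return store_.leader;
  }

private:
  FakeStore& store_;
};

class FakePIDGroup : public PIDGroup {
public:
  explicit FakePIDGroup(FakeStore& store) : store_(store) {}
  void initialize(const std::string& pid) override { store_.pids.insert(pid); }

private:
  FakeStore& store_;
};

// Usage: callers only see the interfaces, so the backend can be swapped.
void componentsExample() {
  FakeStore store;
  FakeContender c1(store, MasterInfo("10.0.0.1", 5050));
  FakeContender c2(store, MasterInfo("10.0.0.2", 5050));
  FakeDetector detector(store);
  FakePIDGroup group(store);

  Contender& first = c1;
  Contender& second = c2;
  assert(first.contend());
  assert(!second.contend());
  assert(detector.detect(MasterInfo()) == MasterInfo("10.0.0.1", 5050));

  group.initialize("replica@10.0.0.1:5050");
  assert(store.pids.count("replica@10.0.0.1:5050") == 1);
}
```

Because the callers depend only on `Contender`, `Detector` and `PIDGroup`, a different backend can be supplied without touching them.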
A Case for Modularization!
● There are already clear-cut interfaces between:
○ Master and Contender
○ Agent and Detector
○ Framework and Detector
● For a new distributed KV store implementation, we just write the module, without having to modify Mesos itself!
[Diagram: a Mesos Master reaches the distributed KV store (“?”) through its Contender, Detector and PIDGroup components]
Let’s Talk about Modules!
Mesos Modules
● Module/Plugin/Extension
● Add/replace a Mesos component
○ Isolators
○ Authenticators
○ …
● Hook modules:
○ Listen to interesting events
○ Modify/enhance certain code paths
○ Prepare/enhance task environment
○ ...
How are Modules Used?
● Compiled as shared libraries
○ E.g., libmesos_network_overlay.so
● Specified when launching Master/Agent/Framework
    mesos-agent.sh <master-parameters> \
      --modules=file:///path/to/modules.json \
      --isolation="my_isolator"
● Gets loaded during initialization
○ E.g., the "my_isolator" isolator will be loaded into the Agent to provide task isolation
Community Modules
I just wrote a Mesos module that provides a really cute feature.
How do I make it useful for others?
Modules are Tricky!
● Developing
● Building
● Testing
● Using
● Hosting
● How can we make it all better for the community?
Writing Modules
● Doesn’t require intimate Mesos knowledge
○ Just the details of the subsystem being implemented (e.g., Isolators)
● Familiarity with the Mesos model is required
○ E.g., libprocess, events, futures and promises, etc.
● Closely tied to the Mesos version
○ To ensure mutual compatibility
Building Modules: Issues
● Build Mesos first!
○ Install all Mesos dependencies
○ Takes a long time to build
○ Version dependencies
Building Modules: Good News!
● Starting with the Mesos 1.0 release, pre-compiled Mesos deb/rpm packages contain everything needed to build modules
Testing Modules
I just wrote a simple Mesos module that provides a cute feature and I know how to build it!
Can I write unit tests for it?
Testing Modules
● Key questions:
○ How to get good test coverage?
○ How can we solicit help from the community?
● Good news!
○ Efforts are under way to create a “libmesos_test” library that can be used to create/run gmock-style tests, just like with Mesos itself.
Community-Driven Modules
How do we, as a community, make third-party modules available for general consumption?
While making sure the developers and consumers can seamlessly test/integrate into their environments!
Community Modules: Proposal
● A central registry that contains pointers:
○ E.g., github.com/mesos/modules
○ Each module (or a set of related modules) in its own repository
● Make Mesos version-specific binary rpm/deb modules available
○ E.g., lib_my_module_<module-version>_<mesos_version>.so
Module CI: Coming Soon!
● Builds binary packages for every registered module
○ Across a given set of Mesos versions
○ Work-in-progress!
● Automatic build/testing for upcoming Mesos releases
○ Catch incompatibilities sooner!
● Run tests!
Let’s take a look at Etcd!
Etcd: A Distributed KV Store
● HTTP API (no language bindings)
● May already exist in your environment
Etcd in a Mesos Cluster
● Create Etcd-specific modules for:
○ Master Detector
○ Master Contender
○ PIDGroup
● No need to modify/rebuild Mesos
[Diagram: a Mesos Master reaches Etcd, acting as the distributed KV store, through its Contender, Detector and PIDGroup modules]
Again, it’s all about having options!
● Chocolate: Zookeeper
● Strawberry: Etcd
● Vanilla: Consul
● ...
Demo!
Module CI: A Glimpse!
Acknowledgments!
● Shuai Lin
● Cody Maloney
● Benjamin Hindman
● Joseph Wu
Thanks!
● Etcd modules:
○ https://github.com/guoger/mesos-etcd-module/tree/1.1.x
○ https://github.com/guoger/mesos/tree/pid-group-on-1.1.x