Myself & Instaclustr
• Adam Zegelin — Founding Software Engineer & Co-founder of [email protected] · @zegelin
• Managed DataStax Enterprise and Apache Cassandra in the ☁ (AWS, Azure, SoftLayer)
• Self-service dashboard — create, manage & monitor clusters
• 24/7/365 support, on-call engineers, uptime guarantee
• Focus on developing your awesome apps — we handle the Cassandra
• Grew from a need for Cassandra in a project
2© 2015. All Rights Reserved.
Nodes — Software Stack
• CoreOS — lightweight OS
• Docker — containerisation of everything
• systemd — service management
• journald — logging
• D-Bus — controlling systemd from Java from inside containers
Initial Implementation
• Amazon Web Services only
• Custom Ubuntu AMI (Amazon Machine Image)
  • Based on stock Ubuntu AMI
  • 2 AMIs (PV/HVM) × 9 regions = 18 images per version! (became unmaintainable very quickly)
• Custom cloud-init scripts — RAID disks, fetch config, etc.
• Cassandra installed with apt-get install cassandra / dse
Initial Implementation — AWS
• We selected instance-storage-backed AWS instances
  • Instance storage is fast (SSDs) and low latency (local disk) but is volatile — terminate the instance and all your data is gone
  • The alternative, EBS (Elastic Block Store), is basically SAN — slower, higher latency and originally shared instance network bandwidth
  • The newer c4.x and m4.x instances are “EBS optimised” and don’t share these limitations
• Only way to change AMI is to start a new machine
  • Not possible to use immutable images with persistent ephemeral data
• Only feasible solution for updates is apt-get install
CoreOS
• One of the first “Docker Operating Systems”
• Available on every provider we support — AWS, Azure, SoftLayer
  • CoreOS has pre-built images
• Small and minimalist — not much userland (not even man!)
• Other useful software — etcd, fleet, etc. (we currently don’t use them — but maybe in the future)
• In use by some big players (Rackspace, PlayStation, Instaclustr 😀 )
• Recent funding from Google Ventures
Docker
• Container runtime + standardised image distribution & hosting + ecosystem
  • Private image hosting options available, such as quay.io
• Immutable images — Yay! 🎉
  • Images running in dev, test and production environments are equal
  • Software installs, upgrades and uninstalls are clean
  • Components are isolated — potentially conflicting components (different library versions, JVM versions, etc.) can co-exist
  • Even different userland layouts (Ubuntu, Debian, CentOS, etc.)
• We containerise everything — C*, internal services, node management and monitoring apps
• Single, well understood, image build and deploy process — docker build & docker push
• Executed via Makefiles — one Make target per image — make push-all builds and pushes everything
• Helps that all our internal apps are Java-based too
• Docker gives us immutable images for our components without instance replacement
• CoreOS handles the rest (OS-level) via in-place updates
• Docker is provider agnostic
• CoreOS runs on all major cloud providers and bare-metal
• The result ☞ Instaclustr-managed C* can run anywhere
systemd
• CoreOS uses systemd for service management
• systemd supports inter-service dependencies
  • e.g. cassandra-backups.service “wants” cassandra.service
  • i.e., starting cassandra-backups also starts cassandra (Wants is the weak form; Requires is the strict one)
• systemd can automatically restart services
  • Instaclustr services are fail-fast
  • Cassandra not so much — in some cases — watchdog?
systemd cont’d
• Manages units of different types — service, timer, target, etc.
  • service units manage processes
  • timers start services on a schedule (à la cron)
  • targets are for grouping/sync points
    • cassandra.target “wants” cassandra.service, monitoring.service, datastax-agent.service, backups.timer, etc.
• All units can define dependencies and conflicts
  • Dependencies of different “strengths” — Wants vs. Requires
  • In both directions — Requires and RequiredBy
Basic Integration
• Cassandra runs as PID 1 in the container
  • 1 primary process per container model
  • Runs in foreground mode (-f)
  • Responds to SIGTERM via docker stop, systemctl stop, etc.
• Cassandra data and configuration is persistent on host
  • Survives container restart
  • Cassandra data and configuration directories mounted from host
docker run -v /var/lib/instaclustr/etc/cassandra:/etc/cassandra …
Basic Integration cont’d
• Docker containers managed via systemd
  • cassandra.service execs docker run cassandra …
  • systemctl [start|stop|restart|status|…] cassandra
• Cassandra logging configured to write only to stdout
  • systemd logging best practice
  • Cassandra ⇢ Docker ⇢ systemd ⇢ journald
  • journalctl -u cassandra
Basic Integration — Issues
• systemd starts dependent units when state is active
  • process running = service active — unless configured otherwise
  • ∴ dependent units start immediately
• process can hang but service stays active
Cassandra Startup
• JVM starts quickly
• JMX (nodetool) connectivity is available early
  • Objects are exposed where they are constructed
• CQL/Thrift available late
  • Can be toggled via cassandra.yaml or JMX/nodetool
• When is Cassandra “running”?
  • When does cassandra.service transition from activating to active?
  • When do dependent services start?
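The gap between "process running" and "CQL available" can be observed from outside with a plain TCP probe. The sketch below is only an illustration, not part of the Instaclustr tooling: the class name CqlProbe is hypothetical, and it assumes CQL is served on Cassandra's default native transport port, 9042.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether something is accepting TCP connections. */
final class CqlProbe {
    private CqlProbe() {}

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isAccepting(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // refused, unreachable or timed out: treat all as "not available yet"
            return false;
        }
    }

    public static void main(String[] args) {
        // 9042 is Cassandra's default native transport (CQL) port.
        boolean up = isAccepting("127.0.0.1", 9042, 1000);
        System.out.println(up ? "CQL port is accepting connections" : "CQL port is not available");
    }
}
```

A JVM that started seconds ago will typically fail this probe for a while, which is exactly the window in which dependent units should not yet be started.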
D-Bus
• RPC between processes
• Notifications
• Socket-based (typically UNIX sockets, but can be TCP)
• Accessible inside a container — mount the socket
  docker run -v /run/dbus:/run/dbus -v /run/systemd:/run/systemd …
• Multiple language bindings, including Java
D-Bus cont’d
• systemd is controllable via D-Bus
• Control host systemd inside a Docker container
• No need to fork/exec to run systemctl and co. (in fact, systemctl is a wrapper around D-Bus calls)
D-Bus cont’d
• Java bindings — dbus-java
systemctl restart cassandra ≝ systemdManager.RestartUnit("cassandra.service", "replace");
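The mapping from systemctl verbs to D-Bus method calls can be sketched without any D-Bus library. In the example below, SystemdManager is a hypothetical local stand-in for the org.freedesktop.systemd1.Manager interface: the method names StartUnit, StopUnit and RestartUnit and the "replace" job mode are real, but the real methods return a job object path, and a dbus-java proxy would be declared differently. Systemctl and RecordingManager are illustration-only names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for org.freedesktop.systemd1.Manager. */
interface SystemdManager {
    void StartUnit(String name, String mode);
    void StopUnit(String name, String mode);
    void RestartUnit(String name, String mode);
}

/** Translates a few systemctl verbs into the equivalent manager calls. */
final class Systemctl {
    private final SystemdManager manager;

    Systemctl(SystemdManager manager) {
        this.manager = manager;
    }

    void run(String verb, String unit) {
        // like systemctl, assume ".service" when no unit suffix is given
        String name = unit.contains(".") ? unit : unit + ".service";
        switch (verb) {
            case "start":   manager.StartUnit(name, "replace");   break;
            case "stop":    manager.StopUnit(name, "replace");    break;
            case "restart": manager.RestartUnit(name, "replace"); break;
            default: throw new IllegalArgumentException("unsupported verb: " + verb);
        }
    }
}

/** Test double that records calls instead of talking to D-Bus. */
final class RecordingManager implements SystemdManager {
    final List<String> calls = new ArrayList<>();

    @Override public void StartUnit(String name, String mode)   { calls.add("StartUnit(" + name + ", " + mode + ")"); }
    @Override public void StopUnit(String name, String mode)    { calls.add("StopUnit(" + name + ", " + mode + ")"); }
    @Override public void RestartUnit(String name, String mode) { calls.add("RestartUnit(" + name + ", " + mode + ")"); }
}
```

Keeping the manager behind an interface means the same code can be exercised in tests with RecordingManager and wired to a real D-Bus proxy in production.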
Enhanced Integration
• Service status = “active” — process running, or something more?
  • Cassandra java process running vs. C* accepting CQL connections
• CQL clients are dependencies, but shouldn’t start until CQL is available
• Clients could fail-fast on no connectivity
  • Will be automatically restarted
  • Service will oscillate between active and failed — hard to detect actual failures
  • systemd will eventually time out or give up — configurable
  • JVM startup can be expensive — CPU usage spikes
Enhanced Integration cont’d
• systemd targets for CQL & Thrift — cassandra-cql.target
  • Life-cycle tracks internal C* service
  • i.e., Starts when CQL is available — not immediate
  • nodetool disablebinary implies systemctl stop cassandra-cql.target
• Services that require CQL connectivity use WantedBy=cassandra-cql.target
  • Starting cassandra-cql.target starts these services too
  • Inverse of Wants
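The "inverse of Wants" relationship can be illustrated with a toy model of start propagation. This is a deliberately simplified sketch, not how systemd is implemented: it ignores ordering, Requires, conflicts and failures, and it assumes units declaring WantedBy have been enabled. UnitGraph and the client unit names used with it are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy model: which units get pulled in when a unit is started. */
final class UnitGraph {
    // unit -> units it wants
    private final Map<String, Set<String>> wants = new HashMap<>();

    /** Models Wants=wanted declared in unit. */
    void wants(String unit, String wanted) {
        wants.computeIfAbsent(unit, k -> new LinkedHashSet<>()).add(wanted);
    }

    /** Models WantedBy=target declared in unit: the same edge, stored in reverse. */
    void wantedBy(String unit, String target) {
        wants(target, unit);
    }

    /** Returns the unit plus everything transitively wanted by it. */
    Set<String> start(String unit) {
        Set<String> started = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(unit);
        while (!queue.isEmpty()) {
            String next = queue.poll();
            if (started.add(next)) {
                queue.addAll(wants.getOrDefault(next, Collections.<String>emptySet()));
            }
        }
        return started;
    }
}
```

With two hypothetical clients registered via wantedBy(..., "cassandra-cql.target"), starting the target pulls in both clients, while starting a client on its own does not start the target.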
Enhanced Integration cont’d
• Java Agent side-loaded into Cassandra JVM
  • Hooks into CQL/Thrift service life-cycle
  • Implemented using runtime byte-code modification
  • Controls systemd via D-Bus to start/stop associated target units
• But Cassandra is open-source — why not modify it‽
  • Agents work with DSE & Apache Cassandra
Java Agent
• Java Agents (java.lang.instrument)
  • java -javaagent:instaclustr-agent.jar …
  • premain(…) method called at JVM startup
  • can hook into JVM class-loading, transform byte-code, etc.
• Javassist, ASM — byte-code modification libraries
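Before any byte-code is modified, the agent mechanism itself can be tried with the standard library alone. The sketch below registers a transformer that only observes class loading and returns null, which tells the JVM to leave the class unchanged. LoggingAgent and Watcher are hypothetical names; the target class name is the one used in this talk. To be loaded with -javaagent, the jar's manifest must name the agent class in a Premain-Class attribute.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

/** Hypothetical observe-only agent: reports when the target class is loaded. */
final class LoggingAgent {
    // internal (slash-separated) name, as passed to transformers by the JVM
    static final String TARGET = "org/apache/cassandra/transport/Server";

    static final class Watcher implements ClassFileTransformer {
        volatile boolean seen;

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (TARGET.equals(className)) {
                seen = true;
                System.err.println("agent: loading " + className
                        + " (" + classfileBuffer.length + " bytes)");
            }
            return null; // null means "no transformation"
        }
    }

    /** Called by the JVM before main() when started with -javaagent. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new Watcher());
    }
}
```

A named class is used instead of a lambda so the sketch compiles on Java 8 and on later releases, where ClassFileTransformer is no longer a functional interface.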
Hooks
    public interface Server {
        public void start();
        public void stop();
        ⋮
    }

    // in CassandraDaemon:

    // Thrift
    thriftServer = new ThriftServer(rpcAddr, rpcPort, listenBacklog);
    ⋮
    thriftServer.start();
    ⋮
    thriftServer.stop();

    // CQL
    nativeServer = new org.apache.cassandra.transport.Server(nativeAddr, nativePort);
    ⋮
    nativeServer.start();
    ⋮
    nativeServer.stop();
Hooks
    import java.lang.instrument.ClassFileTransformer;
    import java.lang.instrument.Instrumentation;
    import java.security.ProtectionDomain;
    import javassist.ClassPool;
    import javassist.CtClass;
    import javassist.CtMethod;

    public static void premain(String agentArgs, Instrumentation inst) {
        // anonymous class rather than a lambda: on Java 9+ ClassFileTransformer
        // is no longer a functional interface
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain, byte[] classfileBuffer) {
                if (!"org/apache/cassandra/transport/Server".equals(className))
                    return null; // not the class to patch: leave it unchanged

                final ClassPool pool = ClassPool.getDefault();
                try {
                    final CtClass ctClass = pool.get("org.apache.cassandra.transport.Server");

                    // patch start() and stop() methods of the Server class
                    {
                        final CtMethod method = ctClass.getDeclaredMethod("start");
                        method.insertAfter("com.instaclustr.Agent.serverStarted($0);");
                    }
                    {
                        final CtMethod method = ctClass.getDeclaredMethod("stop");
                        method.insertAfter("com.instaclustr.Agent.serverStopped($0);");
                    }

                    byte[] byteCode = ctClass.toBytecode();
                    ctClass.detach();
                    return byteCode; // return the modified byte-code
                } catch (final Exception e) {…}
                return null;
            }
        });
    }

    // called when Server started — call systemd via dbus-java to start cassandra-cql.target
    public static void serverStarted(final CassandraDaemon.Server server) {…}

    // called when Server stopped — call systemd via dbus-java to stop cassandra-cql.target
    public static void serverStopped(final CassandraDaemon.Server server) {…}
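One way the two callbacks might be structured is sketched below. This is an assumption-laden illustration, not Instaclustr's agent: UnitController and CqlTargetBridge are hypothetical names, the server parameter is typed as Object to keep the example free of Cassandra classes, and the actual D-Bus call is hidden behind the interface. The one design point it tries to show is that the hooks run inside Cassandra's own start() and stop(), so a failure to reach systemd is caught and logged rather than allowed to propagate into the server.

```java
/** Hypothetical abstraction over "ask systemd to start/stop a unit". */
interface UnitController {
    void startUnit(String name, String mode);
    void stopUnit(String name, String mode);
}

/** Hypothetical agent callbacks that mirror the CQL server life-cycle onto a target unit. */
final class CqlTargetBridge {
    static final String CQL_TARGET = "cassandra-cql.target";

    // default controller only logs; a real agent would install a D-Bus backed one
    private static volatile UnitController controller = new UnitController() {
        @Override public void startUnit(String name, String mode) { System.err.println("would start " + name); }
        @Override public void stopUnit(String name, String mode)  { System.err.println("would stop " + name); }
    };

    private CqlTargetBridge() {}

    static void setController(UnitController c) {
        controller = c;
    }

    /** Invoked by the patched start() after the CQL server is up. */
    public static void serverStarted(Object server) {
        try {
            controller.startUnit(CQL_TARGET, "replace");
        } catch (RuntimeException e) {
            // never let a systemd problem break Cassandra's own startup
            System.err.println("agent: could not start " + CQL_TARGET + ": " + e);
        }
    }

    /** Invoked by the patched stop() after the CQL server has shut down. */
    public static void serverStopped(Object server) {
        try {
            controller.stopUnit(CQL_TARGET, "replace");
        } catch (RuntimeException e) {
            System.err.println("agent: could not stop " + CQL_TARGET + ": " + e);
        }
    }
}
```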
Docker Limitations and Sore Spots
• docker run is just a TTY proxy — actual container process is under the docker dæmon process/cgroup
  • systemd requires startup & watchdog notifications to originate from started process, child, or process in same cgroup
• docker crash = all containers go bye-bye
• docker … everything — inc. image downloads & builds — runs as root in the dæmon!
  • processes inside containers are run un-elevated
Future
• Development versions of systemd can now launch Docker containers natively via machinectl
  • Tighter integration with systemd
  • Process hierarchy is correct — right cgroup and parents
  • Java Agent can notify systemd for startup, status & watchdog — via JNA + libsystemd
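The watchdog side of such an agent could look roughly like the sketch below. It relies on two documented systemd behaviours: a service with a watchdog configured receives the timeout in the WATCHDOG_USEC environment variable, and it is expected to send WATCHDOG=1 keep-alives, with half the timeout being the recommended interval. Everything else is an assumption for illustration: WatchdogHeartbeat and Notifier are hypothetical names, and the actual delivery (sd_notify via JNA + libsystemd) is left behind the Notifier interface.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical keep-alive scheduler for a systemd watchdog. */
final class WatchdogHeartbeat {
    /** Hypothetical hook; a real one would call sd_notify(3), e.g. through JNA. */
    interface Notifier {
        void send(String state);
    }

    private WatchdogHeartbeat() {}

    /** Half of WATCHDOG_USEC in milliseconds, or -1 if the value is unset or invalid. */
    static long intervalMillis(String watchdogUsec) {
        if (watchdogUsec == null) return -1;
        try {
            long usec = Long.parseLong(watchdogUsec.trim());
            return usec > 0 ? Math.max(1, usec / 2 / 1000) : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Starts periodic keep-alives; returns null when no watchdog is configured. */
    static ScheduledExecutorService start(Notifier notifier, BooleanSupplier healthy) {
        long interval = intervalMillis(System.getenv("WATCHDOG_USEC"));
        if (interval < 0) return null;
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "sd-watchdog");
            t.setDaemon(true); // do not keep the JVM alive just for heartbeats
            return t;
        });
        // only ping while the health check passes; otherwise systemd's timeout fires
        executor.scheduleAtFixedRate(() -> {
            if (healthy.getAsBoolean()) notifier.send("WATCHDOG=1");
        }, interval, interval, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

Tying the keep-alive to a health check is what turns a hung but still running JVM into a state systemd can detect and restart.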