Containing the Database Nick Stott Platform Engineer, Compose

Scylla Summit 2016: Compose on Containing the Database


Nick Stott
Platform Engineer @Compose

Specialties: Containers and the databases that go in them.

Compose – Database as a Service

• We already host MongoDB, PostgreSQL, Redis, Elasticsearch, RethinkDB, etcd and RabbitMQ

• Today, we’re announcing ScyllaDB on Compose
  § On Compose Enterprise today
  § On Compose generally very soon

How Compose Got There

Compose – Phase 1

• Founded as MongoHQ in 2009
  § The first hosted MongoDB-as-a-service company
  § Y-Combinator Class of 2011

• Early adopters of containers
  § Containers helped build the infrastructure to offer DBaaS

Early problems…

• Static deployments, lost a lot of the magic of containers

• Pre-provisioned deployments

• Ran a full system inside the container: cron, sshd, everything

• Everything ran on the public internet

• This only worked for Mongo

This sucked…

… but we already knew what we had to do to fix it.

Elastic Deployments

Compose – Phase 2

• Elastic deployments
  § More dynamic containers
  § Deploy & destroy containers on demand

• Bonus:
  § Orchestration becomes the authority for business/DB logic

But…

• Everything was still facing the public internet

• Orchestration layer became bloated

• This infrastructure limited us

• Still running the entire OS inside the container

• And, of course…

Still Only MongoDB

WE WANT MORE

Into a post-MongoDB world

What needed to change…

• At Compose, each database needs:
  § To be on a private network
  § Not accessible to the internet
  § Deploy & destroy containers on demand
  § Logic-less orchestration
  § Lightweight dynamic containers

The Bad News

• From the outside, each database looks different.
  § Different query languages
  § Different set of drivers
  § Configured differently
  § Needs a different environment to run in

• Finding common ground is a daunting problem.

The Good News

• From the outside, all databases need the same things:
  § Scaling
  § Clustering and failover
  § Backup and restore of data
  § Quiescing nicely
  § Private networks
  § Operational health checks

Working for the future

• The next platform we built would need all those things

• So we refined the concepts

• And we took inspiration from the ideas in the Twelve-Factor App (https://12factor.net/)

• So we created…

The Twelve Factors of Stateful Apps

What is a stateful app?

• A database is an application with state – the data

• A database without data is just a base

12 Factor Stateful Apps

• Configuration
• Deployments & Processes
• Disposability
• Affinity & Storage
• Network & Portal Access
• Fixed Network Identity
• Scaling
• Logs & Metrics
• Database Administration
• Recipes
• Codebase
• Tools & Versions

Configuration

• Store configurable parameters in the container’s environment
  § This encourages reuse of container images

• DBs need to be run with one or more configuration files
  § Configuration files can be built quickly and repeatably during the pre-start `Configure` process by putting environment data into templates

• For Scylla, we write the scylla.yaml file and configure the listen addresses and cluster names.
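As a rough illustration of the `Configure` step described above, the sketch below fills a small template from environment-style values. It is not Compose's actual tooling: the variable names (`SCYLLA_CLUSTER_NAME`, `SCYLLA_LISTEN_ADDRESS`, `SCYLLA_RPC_ADDRESS`) and the helper `render_scylla_yaml` are hypothetical, and only a few well-known scylla.yaml keys are shown.

```python
from string import Template

# Hypothetical template: a real scylla.yaml has many more settings.
SCYLLA_YAML_TEMPLATE = Template(
    "cluster_name: '$cluster_name'\n"
    "listen_address: $listen_address\n"
    "rpc_address: $rpc_address\n"
)


def render_scylla_yaml(env):
    """Build the config text from an environment-style mapping.

    The variable names are assumptions made for this sketch.
    """
    listen = env["SCYLLA_LISTEN_ADDRESS"]
    return SCYLLA_YAML_TEMPLATE.substitute(
        cluster_name=env["SCYLLA_CLUSTER_NAME"],
        listen_address=listen,
        # Fall back to the listen address when no RPC address is given.
        rpc_address=env.get("SCYLLA_RPC_ADDRESS", listen),
    )
```

In a pre-start hook this could be called as `render_scylla_yaml(os.environ)` and the result written to the config path before the database process starts, so the same image can be reused for any deployment.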

Deployments and Processes

• A usefully redundant database contains more than one moving piece.
  § Decompose the database into a collection of useful and distinct processes
  § Run these processes as possibly stateful services in their own environment/container

• A Scylla deployment is 3 data nodes and 3 proxy nodes

Disposability

• Database containers should be entirely ephemeral
  § Easily created and destroyed

• Destroying a container should not destroy the data

• The database and the data have different lifecycles
  § Whatever happens to the database instance, the data has to live on

Affinity and Storage

• There needs to be affinity between a database and the storage
  § Database nodes can have one or more attached volumes that are persistent on container restart
  § These volumes have a different lifecycle than the container
  § Data volumes should be accessible from the host

Network and Portal Access

• Databases should live on their own private network

• Access should only happen through specialized “portal” containers
  § Do not unnecessarily expose things on public networks
  § Only expose specific, hardened entry points on portal containers via port binding
  § Portal containers should terminate SSL

• For Scylla, each data node is matched with a portal controlling outside access to it

Fixed Network Identity - 1

• The naming and addressing of the entire system should be fixed before creation

• Container addresses should be static across container restarts

• This includes all the elements that make up the network’s configuration

Fixed Network Identity - 2

• Discovering the network after the fact is problematic.

• Scylla, because of client driver auto-discovery, needs the same number of portals as there are database nodes.

• Each portal needs to know the address of its partner database node.

• Each database node also needs to know the address of its portal.
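One way to picture "fixed before creation" is a plan that assigns every name and address up front, with each data node paired to one portal. The sketch below is illustrative only: the function `plan_deployment`, the naming scheme and the example subnet are assumptions, not Compose's implementation.

```python
import ipaddress


def plan_deployment(name, subnet, nodes=3):
    """Assign names and addresses for data nodes and their portals.

    The plan is a pure function of its inputs, so it is known before
    any container exists and is identical after every restart.
    """
    hosts = list(ipaddress.ip_network(subnet).hosts())
    if len(hosts) < 2 * nodes:
        raise ValueError("subnet too small for the requested deployment")
    plan = []
    for i in range(nodes):
        plan.append({
            # Each pair carries both addresses, so the data node knows
            # its portal and the portal knows its data node.
            "data": {"name": f"{name}-data-{i}", "address": str(hosts[i])},
            "portal": {"name": f"{name}-portal-{i}",
                       "address": str(hosts[nodes + i])},
        })
    return plan
```

For example, `plan_deployment("scylla", "10.0.0.0/28")` yields three data/portal pairs, and calling it again with the same arguments yields the same pairs.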

Scaling

• First scale up the container
  § In a hosting environment, there should be plenty of leeway to expand a container’s resources.
  § Databases prefer to stay up, so just add resources

• Then scale deployments horizontally
  § Scylla lets us do this easily
  § PostgreSQL, Redis and MySQL are difficult to scale horizontally
  § Scaling horizontally and moving storage can be costly
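The "up first, then out" ordering can be written as a small decision function. This is a sketch only: the function name, the memory-only resource model and the return values are assumptions chosen for illustration.

```python
def next_scaling_step(current_mb, needed_mb, host_headroom_mb):
    """Pick the cheapest action that satisfies the memory need.

    Returns a (action, new_container_size_mb) tuple.
    """
    if needed_mb <= current_mb:
        return ("none", current_mb)
    if needed_mb - current_mb <= host_headroom_mb:
        # The host has room: grow the container in place, no restart
        # and no data movement.
        return ("scale_up", needed_mb)
    # The host is out of room: add nodes instead, accepting the cost
    # of moving data.
    return ("scale_out", current_mb)
```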

Logs & Metrics

• Database-specific metrics and log collection should be done within the deployment from an extra container

• Collect logs and metrics from all nodes

• Each node should provide a stream of logs to stdout
  § Easier to collect on container hosts
  § Standardized practice across all nodes
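For a helper process running alongside the database, logging to stdout can look like the sketch below, which uses Python's standard `logging` module. The function name `make_node_logger` and the log format are assumptions made for this example.

```python
import logging
import sys


def make_node_logger(node_name, stream=None):
    """Return a logger that writes one line per event to stdout.

    `stream` is only there so the output can be redirected in tests.
    """
    logger = logging.getLogger(node_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.handlers = [handler]  # same format on every node
    return logger
```

Because every node writes the same line format to the same place, the container host can collect all streams with one mechanism.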

Database Orchestration

• Encapsulate administration functions within the container
  § Push the logic out of the orchestration and into the containers

• Manage the deployment through the use of recipes
  § A recipe is a sequence of ordered operations
  § Recipes don’t know about the internal state of the database
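A recipe in this sense can be modelled as an ordered list of steps that the orchestrator runs without interpreting them. The sketch below is an assumption about shape, not Compose's recipe format: `Step`, `run_recipe` and the `execute` callback are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    container: str  # which container runs the operation
    command: str    # admin function packaged inside that container


def run_recipe(
    steps: List[Step],
    execute: Callable[[str, str], bool],
) -> Tuple[List[Step], Optional[Step]]:
    """Run steps in order and stop at the first failure.

    The runner holds no database logic: it only passes each command to
    `execute` and looks at the success flag that comes back.
    """
    completed: List[Step] = []
    for step in steps:
        if not execute(step.container, step.command):
            return completed, step
        completed.append(step)
    return completed, None
```

The return value lists the steps that completed and the step that failed, if any, so the caller can decide whether to retry or roll back.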

Codebase

• Use one image
  § With a deterministic build process
  § That can be used to create many running instances

• Different images should be provided for different database versions
  § For example, Scylla 1.0 vs 1.2 vs 1.3 would be three separate images

Tools and Versions

• DB administration tools
  § Should be versioned
  § Kept in lock-step with the version of the database
    - Avoids exposure to changes in how admin tools function
  § Thrift support in Scylla was released in 1.3
    - The tools to administer that were added only to that version

• No restarts on upgrades to administration tooling
  § Consider overlaying your tooling on top of running databases

That’s the twelve factors…

• Configuration
• Deployments & Processes
• Disposability
• Affinity & Storage
• Network & Portal Access
• Fixed Network Identity
• Scaling
• Logs & Metrics
• Database Orchestration
• Recipes
• Codebase
• Tools & Versions

In practice at Compose today

• We apply these factors throughout our platform

• It works for a wide range of database technologies

• It gives us reliable, repeatable, resilient systems

• And now Scylla is coming to that platform

Thank You!

Contact: [email protected] | @composeio