Rebuilding your Cloud, Multiple Times a Day
10/07/2015
Vilmos Nebehaj
Sauce Labs
Sauce Labs
HQ in San Francisco, 2nd office in Vancouver
~100 employees, almost 50% are engineers
Main product: Selenium and Appium testing in the cloud
Private cloud which runs tens of millions of jobs every month
>500 combinations of OSes, desktop browsers, mobile emulators/simulators and real mobile devices for testing browser, native and hybrid applications
We’re hiring
Immutable Infrastructure
"A large fraction of the flaws in software development are due to programmers not fully understanding all the possible states their code may execute in. In a multithreaded environment, the lack of understanding and the resulting problems are greatly amplified, almost to the point of panic if you are paying attention.
Programming in a functional style makes the state presented to your code explicit, which makes it much easier to reason about, and, in a completely pure system, makes thread race conditions impossible."
Without mutable variables, testing becomes trivial: if we transform a given input with a side-effect-free function, we always get the same output (referential transparency).
Note: this is just an abstraction, of course. If you drill deep enough, at the latest at the CPU instruction level, you find side effects (caches, the TLB, etc.). But as an abstraction, it is still pretty useful.
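A minimal sketch of the difference, in Python. The function names and the sample settings are made up for illustration; nothing here comes from the Sauce codebase.

```python
def normalize_pure(settings: dict) -> dict:
    """Return a new dict with lower-cased keys; the input is left untouched."""
    return {key.lower(): value for key, value in settings.items()}


def normalize_in_place(settings: dict) -> None:
    """Mutating variant: rewrites the caller's dict, so callers share state."""
    for key in list(settings):
        settings[key.lower()] = settings.pop(key)


original = {"Port": 22, "User": "root"}

# The pure function can be called any number of times with the same input
# and always returns the same output, without changing `original`.
first = normalize_pure(original)
second = normalize_pure(original)
```

Testing `normalize_pure` only needs an input and an expected output. Testing `normalize_in_place` also requires knowing what else holds a reference to the same dict.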
So what are the downsides?
In several cases, performance is worse than when simply mutating a data structure in place.
What does it have to do with my infrastructure?
In Ops/DevOps, we have exactly the same issue as in application development. A large fraction of the problems we face are due to the almost incomprehensible state space of the configuration on our servers.
Think about it: how many configuration files are there on the average server, and how many possible settings in each? What interactions and interference are possible between them?
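To get a feel for how fast this state space grows, here is a back-of-the-envelope calculation. The numbers (20 on/off settings per file, 10 files) are purely hypothetical and chosen only to show the growth.

```python
from math import prod


def state_space_size(options_per_setting):
    """Number of distinct configurations: the product of the option counts."""
    return prod(options_per_setting)


# Hypothetical: one file with 20 independent on/off settings.
one_file = state_space_size([2] * 20)

# Hypothetical: 10 such files on the same server.
ten_files = one_file ** 10

print(f"one file:  {one_file} possible states")
print(f"ten files: about 10^{len(str(ten_files)) - 1} possible states")
```

Real settings are rarely independent, which makes things worse rather than better: the interactions between them are exactly what is hard to reason about.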
NO MUTATING STATE
in your infrastructure?
The primary goal of treating your infrastructure as code:
"Enable the reconstruction of the business from nothing but a source code repository, an application data backup, and bare metal resources."
Model | Configuration | Enabling technology
pets | manual, minimal scripting | internet, IP, server hosting
cattle | automated | configuration management software, infrastructure as code
immutable infrastructure | automated; no modification, rebuilding for any change | virtualization, cloud services
Containers vs VMs
Containers
● Security concerns
● Lock you into a specific OS
● No (or minimal) performance penalty
● Lightweight
● Very fast startup times
Virtual Machines
● Fully isolated at the hardware level
● Another layer of security
● Different operating environment (kernel/OS)
● Performance overhead
● Slower boot times
Infrastructure as Code at Sauce
Two repositories:
● sauce-ansible with our inventory, playbooks and roles
● vmbuilder with Packer templates for our VMs/containers
We use branch builds for pull requests in sauce-ansible.
A commit/merge into sauce-ansible master kicks off new image builds for all templates in vmbuilder.
Packer builders
{
  "builders": [
    {
      "type": "virtualbox-iso",
      "guest_os_type": "Ubuntu_32",
      "iso_checksum": "1214cd22448338b60bb24f583dd8741a",
      "iso_url": "http://releases.ubuntu.com/14.04/...",
      ...
    },
    {
      "type": "qemu",
      "format": "qcow2",
      "iso_checksum": "1214cd22448338b60bb24f583dd8741a",
      "iso_url": "http://releases.ubuntu.com/14.04/...",
      ...
    }
  ],
  ...
}
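Both builders above repeat the same ISO settings. One way to keep them in sync is to generate the template from a small script. The sketch below is an assumption about how that could be done, not how vmbuilder does it. It only uses the fields visible on the slide, and the elided parts (the `...` entries and the truncated URL) are left as they are.

```python
import json

# Settings shared by every builder, copied from the slide (URL elided there).
COMMON = {
    "iso_checksum": "1214cd22448338b60bb24f583dd8741a",
    "iso_url": "http://releases.ubuntu.com/14.04/...",
}

# Builder-specific settings, also from the slide.
SPECIFIC = [
    {"type": "virtualbox-iso", "guest_os_type": "Ubuntu_32"},
    {"type": "qemu", "format": "qcow2"},
]


def make_builders(common, specific):
    """Merge the shared settings into each builder-specific entry."""
    return [{**entry, **common} for entry in specific]


template = {"builders": make_builders(COMMON, SPECIFIC)}
rendered = json.dumps(template, indent=2)
print(rendered)
```

A real template needs more keys per builder (the slide elides them), so the output here is not a complete Packer template.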
Packer provisioners
{
  "provisioners": [
    {
      "type": "shell",
      "inline": ["sudo pip install ansible"]
    },
    {
      "type": "file",
      "source": "../ansible",
      "destination": "/tmp/ansible"
    },
    {
      "type": "shell",
      "inline": ["cd /tmp/ansible && ansible-playbook -c local -i inventory chef.yml"]
    }
  ],
  ...
}
Building an LXC image is as simple as:
rm -rf output-lxc
PACKER_CONFIG=/etc/packer.conf packer build -only=lxc ./packer.json
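When the same invocation is driven from a script, it can help to build the argument list in one place. The helper below is a hypothetical sketch: `packer_command` and `packer_env` are names invented here, and the only Packer options used are the ones that appear on these slides (`-only`, `-var`, and the `PACKER_CONFIG` environment variable).

```python
import os


def packer_command(builder, template="./packer.json", variables=None):
    """Return the argv list for `packer build` restricted to one builder."""
    argv = ["packer", "build"]
    for name, value in (variables or {}).items():
        argv += ["-var", f"{name}={value}"]
    argv += [f"-only={builder}", template]
    return argv


def packer_env(config_path="/etc/packer.conf"):
    """Copy the current environment and point PACKER_CONFIG at config_path."""
    env = dict(os.environ)
    env["PACKER_CONFIG"] = config_path
    return env


lxc_argv = packer_command("lxc")
# "example-image" is a placeholder value for the basename variable.
qemu_argv = packer_command("qemu", variables={"basename": "example-image"})
print(" ".join(lxc_argv))
```

To actually run a build, the list can be passed to `subprocess.run(lxc_argv, env=packer_env(), check=True)` on a machine where Packer is installed.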
Building a QEMU image:
# Parse command line arguments.
# ...
# Remove output directory in case we get killed.
trap "rm -rf ${OUTDIR}; exit" SIGHUP SIGINT SIGTERM
# Remove any previous build.
rm -rf ${OUTDIR}
# Build image.
packer build -var basename=${NAME} -only=qemu ./packer.json
# Convert image to desired format.
# ...
# Jenkins has problems transferring large images when they are larger than
# 8GB. Split it up into smaller chunks.
# ...
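The splitting step is elided in the script above. As an illustration only, here is one way to split a file into fixed-size chunks and join them again. It is written in Python rather than shell, the function names are invented, and the real build script may well use a different tool.

```python
from pathlib import Path


def split_file(path: Path, chunk_size: int) -> list:
    """Write path.part000, path.part001, ... and return the chunk paths.

    Each chunk is read into memory in one go, which is fine for a sketch;
    for multi-gigabyte chunks you would copy through a smaller buffer.
    """
    parts = []
    with path.open("rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            part = path.with_name(f"{path.name}.part{index:03d}")
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts


def join_files(parts, target: Path) -> None:
    """Concatenate the chunks, in the order given, into target."""
    with target.open("wb") as dst:
        for part in parts:
            dst.write(part.read_bytes())
```

On the receiving side, the chunks are joined in the same order to restore the original image. Comparing a checksum of the result against the original is a cheap way to catch a missing or truncated chunk.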
Long image builds
● Relying on a new image being built for any change means you want to minimize image build times
● CI infrastructure you can scale out is key
● VM images especially can take a long time to build
● We split our longest running builds into multiple Jenkins jobs
○ Install base OS
○ Configure system and application(s) in the image
● Jenkins makes it easy to create build pipelines
Deploying images
● Central artifact store (Jenkins)
● Images have unique build numbers
● There are several hundred hypervisors in the Sauce Cloud
● We deploy images to hypervisors in smaller batches via Ansible
● The control plane for the Cloud tells hypervisors which image to boot, so rolling back is easy
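The two ideas in this list, batched rollout and rollback by switching a pointer, can be sketched as follows. Everything here is hypothetical: the class, the function names, the host names and the build numbers are invented for illustration, and the real deployment is done with Ansible rather than Python.

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def deploy(build, hypervisors, batch_size, copy_image):
    """Copy image `build` to every hypervisor, one batch at a time."""
    for batch in batches(hypervisors, batch_size):
        for host in batch:
            copy_image(host, build)


class ControlPlane:
    """Remembers which build number hypervisors are told to boot."""

    def __init__(self, initial_build):
        self._history = [initial_build]

    @property
    def current_build(self):
        return self._history[-1]

    def promote(self, build):
        """Start booting VMs from `build`; older images stay on disk."""
        self._history.append(build)

    def rollback(self):
        """Go back to the previously promoted build, if there is one."""
        if len(self._history) > 1:
            self._history.pop()
        return self.current_build
```

Because images are immutable and identified by build number, rolling back is just a matter of telling hypervisors to boot the previous number. Nothing on the hypervisor has to be undone.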
Tools recap
Jenkins: CI software, also our artifact (image) store
Ansible: Automation and configuration management
Packer: Building images from a common configuration for different backends
LXC: Linux kernel level containment library and tools
QEMU/KVM: Full virtualization solution for Linux with hardware acceleration
Runtime temporary storage
[Diagram: the VM reads unchanged blocks from the base image and reads/writes changed blocks in a CoW image, which is a snapshot of the base image]
● Images are immutable; VMs are always started from this clean state
● Temporary storage is provided on the hypervisors via per-VM copy-on-write (CoW) images, snapshotted from the immutable image
● For containers, we use aufs for CoW
● Assets created during tests are uploaded to S3
● When a job ends, the VM and its CoW image are destroyed
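The copy-on-write idea can be modelled in a few lines. This is a toy in-memory model for illustration, not how QEMU or aufs implement it: the base image is an immutable tuple of blocks, and each VM gets its own overlay that stores only the blocks it has changed.

```python
class CowImage:
    """Per-VM overlay on top of a shared, immutable base image."""

    def __init__(self, base: tuple):
        self._base = base      # shared, never modified
        self._changed = {}     # block index -> data written by this VM

    def read(self, index: int) -> bytes:
        """Changed blocks come from the overlay, the rest from the base."""
        return self._changed.get(index, self._base[index])

    def write(self, index: int, data: bytes) -> None:
        """Writes land in the overlay only; the base stays untouched."""
        if not 0 <= index < len(self._base):
            raise IndexError("block index out of range")
        self._changed[index] = data


base_image = (b"boot", b"system", b"empty")

vm_a = CowImage(base_image)
vm_a.write(2, b"test-output")

# A second VM started from the same base does not see vm_a's writes.
vm_b = CowImage(base_image)
```

Destroying a VM's overlay throws away every change it made, so the next VM starts from the same clean state.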
Testing
[Diagram: Repo PR -> Inventory tests, Playbook tests, Role tests -> Join]
● For end-to-end testing, we have a main integration build
● Several thousand Selenium tests: we’re eating our own dog food
● No continuous delivery: image deployment into production is not automatic
TL;DR
Containers are cool. VMs are also cool. Both of them have their use cases.
We use continuous integration for building immutable VM/container images for our cloud.
Images are built on Jenkins in a fully automated fashion. Long builds are split up into multiple jobs and chained together.
Testing is key. Our infra codebase (vmbuilder + sauce-ansible) is tested for any change. We test both our Ansible codebase (via unit and integration tests) and the image artifacts (via end-to-end tests).
Packer is a great tool; you create your image template once, and can use various builders to produce an output image for many different cloud providers and virtualization solutions.
Questions?