A Tale of Two Workflows - ChefConf 2014

DESCRIPTION

Watch this talk here: https://www.youtube.com/watch?v=L__8o02od6Q

For an example of the code we used in our CI pipeline to make a Chef Environment from a Berksfile.lock, check out this project: https://github.com/petecheslock/berks2env

One of the biggest advantages of Chef is its flexibility, allowing you to customize it at will to fit your infrastructure needs. While this makes Chef incredibly powerful, it can also be challenging to develop a workflow to manage the day-to-day usage of Chef. Should I use a single repo for all my cookbooks? One cookbook per repo? Berkshelf? Librarian? Test-Kitchen? Where does Jenkins (CI) fit in? What about testing? How does this work with my small team? What about my large team? What about my distributed team?

Over the past few years I have been a part of two distinct Chef workflows that take opposite paths to solving issues around collaboration, versioning, testing, etc. During the course of this talk I will share:

• Details about the requirements that led us down these two paths.
• What worked. What didn't.
• How we use many of the tools available to safely test code changes.
• How we deploy cookbook changes safely and quickly (and keep uptime our highest priority).

A Tale of Two Workflows

Pete Cheslock @petecheslock

Age of Wisdom?

Age of Foolishness?

Who Am I?

Pete Cheslock

Currently - Rabble Rouser at Dyn

Previously at Sonian - one of the very early Opscode Chef™ Customers (probably?). Also Sensu.

Disclaimer

WARNING: THIS TALK FEATURES TWO CRAZY ASS WAYS YOU CAN USE CHEF AND IS INTENDED FOR A MATURE AUDIENCE. PETE CHESLOCK DOES NOT CONDONE THE WORKFLOWS USED AND DISCOURAGES ANYONE FROM ATTEMPTING THEM.

THIS TALK MAY ANGER YOU - I’M HERE IF YOU NEED A HUG AFTERWARDS

Double Disclaimer

For the love of all that is DevOps..

Please don’t Cargo Cult this.

What do you do here?

I’m a people person - I swear.

Biases rule everything around me

Chef

The cause of... and solution to... all of life's problems.

Environments

Databags

Roles are good

Roles are bad

WTF is a Berkshelf ?

Librarian?

Chef Server

Chef Zero

Vagrant-Berkswhat?

Hosted Chef

LWRPs

Don’t Use Definitions!

Definitions are Awesome!

Pick Your Poison

Sonian

Founded 2008

2008 AWS Startup Challenge Finalist

I joined in 2009

Very early Chef user - Originally with Puppet (before Opscode existed)

Pre-Databags, Roles, etc, etc.

Massive growth in a short time - reaching 100s of TBs of ElasticSearch and well over a PB of S3 storage.

https://github.com/opscode/chef-repo

.chef/knife.rb

cookbooks

data_bags

environments

roles

Soon - business started to pick up - very quickly.

Speed picked up, things moved fast and we broke stuff

To close some deals we had contracts signed that would limit when we could push changes to the systems.

Customer A: HEAD

sonian/chef-repo:master

Customer B: fd50a5c

Customer C: sonian/chef-repo:tag-v0.1.1

a1add77

Now imagine that scenario with 20 environments - Each environment living either on AWS, Rackspace Cloud, HP Cloud or IBM “SmartCloud”

Each environment has a different contracted deployment schedule.

I know what you are thinking - system changes aren’t a “deploy” - well next time I’ll bring you to meet with the lawyers on that.

How did this work in practice?

In the past we’d push a small change to Prod - everything would break terribly. Lots of technical debt - scenarios that no one could ever believe could happen

This is email archiving - in some cases customers would have mail forwarded to us via their mail server. We CAN NOT drop that mail. If they are audited and we are proven to be missing data - that is really, really bad. Srs super bad.

We liked our single Chef-repo

Every story had a branch - and we got into the cycle of commit, merge, push and test

Represented our pre-prod environments as branches in git - using some internal tooling to manage.

eng-9999

HEAD (master)

QA (Daily)

Dev (Daily)

Cut a new branch from master

Developer adds commits and tests locally

Developer merges to dev branch for dev testing

If things “work” and nothing breaks - merge to QA

If it passes regression testing - merge into master (with others)

• roles/stack.rb
• base.rb
• nonprod.rb
• cloud.rb (ec2, rackspace)

• roles/application.rb
• application.rb
• service.rb
• etc.rb

“Hold on a minute. I’m just going to push this small change to this one role.”

It’s roles all the way down

We got burned all the time.

“Move Fast and Break Everything”

Needed something that worked for today & the future

Let’s create a Git branching strategy!

Wut?

I know.

Seriously. I know.

We were trying to answer this one question.

“How do you version the cookbooks, roles, and databags as one singular asset.”

Branches and tags shown: master (HEAD), release/2011-07-01, release/2011-08-01, base/2011-07-01, base/2011-08-01, eng-9999

Cut a new branch for the release

At the same time create a base/release tag.

QA

New code constantly hitting master

Checkout a branch from the Base Tag

Merge code into Release branch

Merge into master if you want it to advance

Make individual commits and Cherry-pick forward

Rebase & Squash commits branches

Backwards

That sounds overly complex

We had some git experts - and it leveled up all of our game.

Extensive tooling around our branching strategy.

We were Release Engineering.

https://github.com/sniperd/mise-en-place

So What Happened?

It actually worked.

Not only that - it really worked well.

20+ Stacks, upgrading 4 per night (6pm to 12am if you are lucky)

Before “Deploy Week” - we deployed all the time - and things broke all the time.

Over the course of about 12 months we went from:

Deploy whenever - things break randomly (little testing)

Create a multi-page deploy checklist of mostly manual items

“Deploy Week” - 20 Stacks over 5 days (6pm to 12am - hopefully)

“Deploy Day” - 20 Stacks over one night - 6pm to 9pm

“Deploy Day” - Saturday (contracts) - Best time was 20+ stacks ~1 hour

Deploys were drama free

They were drama free because we tested all the pieces that changed together. And not just unit and integration testing, but full-on regression testing and user acceptance testing.

DataBags, Roles, Cookbooks, Application Code - It all moved together.

Tooling was built to support the support team (who eventually did the deploys)

High communication and tight teamwork allowed this to work.

“If I could do it all over again I would do it very differently”

Dyn

Incorporated in 2001, Dyn’s global presence services more than four million enterprise, small business and personal customers.

We specialize in Traffic Management & Message Management

I joined early in 2013 to run the System Automation and Release Engineering Team

(We call it DevTools)

There is always technical debt in the banana stand

Chef

CFEngine

Puppet

NIH

Develop a pipeline that allows for simple usage by plugging it into a CI system for automated testing and deployment.

But the hardest challenge is that change is dangerous. It’s even more frightening when you have a MASSIVE chunk of the internet depending on you to stay running ALL THE TIME.

Do it w/o taking down the internet

If we don’t build in the necessary gates and levers to allow for lots of testing and controlled deploy options out to our edge systems, bad things can happen.

Scope of bad

Initial Challenges

We have lots of FreeBSD

Change is hard - especially to unknown systems.

We really wanted to deploy a solution that was going to bring in Zero Dependencies.

I heard you like FreeBSD…

Now that FreeBSD problem is solved - we were able to start deploying Chef out to all our nodes.

We created a role[base] - which includes a run list of the things we wanted in place.

About a month or so later - we wanted to push a change to that role - at the same time it was linked to some specific cookbook versions.

So basically we wanted a versioned run list - but we also wanted to set and override some attributes.

So we decided to move away from roles (since we were not using them much yet) and just focus on using wrapper recipes.

The bonus here is that any person can just clone a cookbook - and run Test-Kitchen & Serverspec on that “role” to get a node just like it. No dealing with roles from other cookbooks.

Roles vs. No Roles

The wrapper recipe idea made sense to us because we wanted to make sure that when we used community cookbooks - we never edited them. So for example we have a dyn_ci recipe which wraps the functionality inside of the Jenkins recipe.

When Jenkins updates from 1.0 to 2.0 - we simply update and refactor our wrapper cookbook and set the version constraint in the metadata as appropriate.

Circular Dependency

We use the default chef-full template and it has a section that looks like this:

Where are most community cookbooks stored? github.com & community.opscode.com. Who does their DNS? You see where we are going.

So - we created a new organization on our Enterprise Chef Server - called the cookbook repo, where we stored community cookbooks we used.

Later we moved those to Github Enterprise locally for 2 reasons.

1. It allowed anyone to easily see which cookbooks we already had locally.

2. It allowed us to run short-lived forks of those cookbooks while we pushed the changes upstream to the owner (and for people to see those changes).

Remove the humans from the equation

Foodcritic, chefspec, rubocop, serverspec

thor-scmversion to automate versioning and git tagging.

Run will execute - if the tests pass - thor will version based on #patch, #minor, #major
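The tag-driven bump above can be sketched roughly like this. This is not thor-scmversion's code, just a minimal Python illustration of the idea: look for #major, #minor or #patch in a commit message and bump a three-part version to match. The function names and the "no tag means no bump" behavior are assumptions made for this sketch.

```python
import re

# Most significant bump wins if a message carries more than one tag.
BUMP_ORDER = ("major", "minor", "patch")


def bump_kind(commit_message):
    """Return 'major', 'minor', 'patch', or None if no tag is present."""
    tags = set(re.findall(r"#(major|minor|patch)\b", commit_message))
    for kind in BUMP_ORDER:
        if kind in tags:
            return kind
    return None


def next_version(current, commit_message):
    """Bump a MAJOR.MINOR.PATCH string according to the commit message tag.

    Assumption for this sketch: a message with no tag leaves the version alone.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    kind = bump_kind(commit_message)
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    if kind == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current


if __name__ == "__main__":
    print(next_version("1.0.2", "Fix template path #patch"))
    print(next_version("1.0.2", "Add new recipe #minor"))
    print(next_version("1.0.2", "Drop old attributes #major"))
```

In a CI job this would run only after the tests pass, with the result used for the git tag and the cookbook version.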

So we try to speed up the iteration to master

So - now the development cycle looks like

User cuts a branch - makes changes - runs tests locally (we hope) - then submits a pull request.

Jenkins tests the PR - if good - report back to GH:E with Green.

When merged - Jenkins runs the tests again - if they pass then Jenkins will tag the release and upload it to the cookbookrepo.

Development Deployment

How has this worked?

We are the product owner

On-Demand support internally

Training

Mentoring

All new apps come with cookbooks.

They even come with tests. (Yay!)

Test Kitchen and Berkshelf for our local development and deploy

github.com/dyninc/cookbookapi

So we built our own cookbook api to use (with Berks 2) that let us use our own site with our own cookbooks (and the community cookbooks in our site repo)

So how do you get it to production?

So - the requirements were such that we wanted a few things

Easily be able to deploy to a single node in a site

Easily be able to deploy to a single node in many sites

Easily be able to deploy to a single node in every site

Easily be able to deploy to a single node in a region

…… you get the point. EVERY POSSIBLE DEPLOY SCENARIO.
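To make the scenarios above concrete, here is a small Python sketch of picking one node per site, optionally narrowed to a set of sites or a region. The inventory layout, the site and region names, and the function name are all hypothetical; they are not from our tooling, just one way to picture the selection.

```python
def pick_canaries(nodes, sites=None, region=None):
    """Pick one node per site, optionally limited to some sites or a region.

    `nodes` is a list of dicts with hypothetical keys: name, site, region.
    Returns a dict mapping site -> chosen node name (first by name order).
    """
    chosen = {}
    for node in sorted(nodes, key=lambda n: n["name"]):
        if sites is not None and node["site"] not in sites:
            continue
        if region is not None and node["region"] != region:
            continue
        # setdefault keeps the first node seen for each site.
        chosen.setdefault(node["site"], node["name"])
    return chosen


# Hypothetical inventory for illustration only.
INVENTORY = [
    {"name": "app-1", "site": "site-a", "region": "us"},
    {"name": "app-2", "site": "site-a", "region": "us"},
    {"name": "app-3", "site": "site-b", "region": "us"},
    {"name": "app-4", "site": "site-c", "region": "eu"},
]

if __name__ == "__main__":
    print(pick_canaries(INVENTORY, sites={"site-a"}))  # a single node in a site
    print(pick_canaries(INVENTORY, region="us"))       # one node per site in a region
    print(pick_canaries(INVENTORY))                    # one node in every site
```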

Represent state of chef org in Git

Act as single source of truth

Have Jenkins manage the upload of those cookbooks to prod

Ensure the environment locks those cookbooks explicitly

So, I already told you we didn’t use roles because we really wanted to be able to version the run list (many people other than us could be touching that).

We have thor-scmversion auto bumping the versions of cookbooks (and freezing on upload to the package server) As one does.

We knew that when we ran a node in production - we wanted it in an environment with very specific cookbook version locks.

And we wanted those environments to be immutable. Created and uploaded in an automated way.
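A rough Python sketch of what "an environment with very specific cookbook version locks" looks like when generated by a script. It assumes the resolved cookbook versions have already been read out of the lockfile into a plain dict, so no lockfile parsing is shown, and it is not the berks2env code. The function name and the dyn_ci version are made up for illustration; the JSON keys are the standard Chef environment fields.

```python
import json


def build_environment(name, locked_versions, description=""):
    """Build a Chef environment document that pins every cookbook exactly.

    `locked_versions` maps cookbook name -> resolved version string.
    """
    return {
        "name": name,
        "description": description,
        "json_class": "Chef::Environment",
        "chef_type": "environment",
        "default_attributes": {},
        "override_attributes": {},
        # "= x.y.z" is an exact-match constraint, so nothing can float.
        "cookbook_versions": {
            cookbook: f"= {version}"
            for cookbook, version in sorted(locked_versions.items())
        },
    }


# Hypothetical locked versions for illustration only.
env = build_environment(
    "1_4_125",
    {"dyn_myface": "1.0.3", "dyn_ci": "0.2.0"},
    description="Generated by CI - do not edit by hand",
)

if __name__ == "__main__":
    print(json.dumps(env, indent=2))
```

Writing the result to a JSON file and uploading it from the CI job keeps humans out of the loop, and giving each generated environment a new name rather than editing an old one is what makes it behave as immutable.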

We’ve been using thor-scm for versioning our cookbooks - why not our servers too?

Virtual = Real

1_5_LATEST = 1_5_0

1_4_LATEST = 1_4_125

Other environments: 1_4_123, 1_4_124-alpha_1

Nodes: app-2, app-1 (1_4_LATEST)

6ead49d Deploy dyn_myface v1.0.3

d6b0b7e Deploy dyn_myface v1.0.3 to all #patch

7db580b Deploy dyn_myface v2.0.0 #minor

Limited allow list for deploy

Anyone can propose a change to production - but the ops team will approve those changes. (for #patch or greater that is)

The same workflow applies to pre-release environments.

Databags?

Since we version all of our cookbooks using Thor-scmversion

And we do the same with chef environments.

And we need lots of flexibility with our code deployment process due to the nature of the system

We built a tool that allows us to version our databags for deploy. https://github.com/Vanders/knife-databag-version

Version your databags?

Seriously - what is wrong with you?

We use databags pretty sparingly - mostly just encrypted databags for shared secrets and other info.

Our engineers ask us for the flexibility - we build the tools. The tools enable the workflow.

What’s this all look like?

Assume we have a simple data bag item:

With knife data bag version this becomes a template:

knife data bag version can then create a JSON file using this template:

knife data bag version will emit a JSON file:

All managed by Jenkins - hands off for the developer

Databags the same as cookbooks - and allow for more flexible deploy options for us.

We still use standard databags - this is just another lever to pull

Room for improvement?

#minor and #major

Site to abstract changing cookbook versions.

Upload cookbooks early - control with environment version locks

Thank You

Pete Cheslock

petecheslock@gmail.com

@petecheslock
