Cheffing Etsy

Do too many cooks really spoil the soup?

@jonlives

Jon Cowie

Staff Operations Engineer

So what is Chef, anyway?

What is Chef?

• Desired State Configuration Management
• Thin Server
  • Datastore, API and Search
• Thick Client
  • Does all the work!

Chef Vocabulary Primer

• Node
  • Your server, state saved on Chef server
• Cookbook
  • Main (versioned) artefact type in Chef
  • List of recipes and other stuff

Chef Vocabulary Primer - Continued

• Environment
  • A list of cookbook version constraints
• Knife
  • A CLI interface to Chef Server
  • Extensible with Plugins
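
To make "a list of cookbook version constraints" concrete, here is a minimal Ruby sketch of how an environment narrows the versions a node may use. It is not Chef's own resolver: it borrows RubyGems' `Gem::Requirement`, whose operators (`=`, `>=`, `~>`) resemble Chef's constraint syntax. The cookbook names, versions, and the `resolve` helper are all hypothetical.

```ruby
require "rubygems" # Gem::Version and Gem::Requirement ship with Ruby

# Hypothetical data: cookbook versions that have been uploaded to the server.
AVAILABLE = {
  "apache" => %w[1.0.0 1.1.0 1.2.0],
  "php"    => %w[2.0.0 2.0.1]
}.freeze

# Hypothetical environments: cookbook name => version constraint.
PRODUCTION = { "apache" => "= 1.1.0", "php" => "~> 2.0.0" }.freeze
TESTING    = {}.freeze # an unconstrained environment

# Return the highest available version the environment allows,
# or nil when no uploaded version satisfies the constraint.
def resolve(cookbook, environment)
  requirement = Gem::Requirement.new(environment.fetch(cookbook, ">= 0"))
  best = AVAILABLE.fetch(cookbook, [])
                  .map { |v| Gem::Version.new(v) }
                  .select { |v| requirement.satisfied_by?(v) }
                  .max
  best && best.to_s
end

puts resolve("apache", PRODUCTION) # pinned version
puts resolve("apache", TESTING)    # newest uploaded version
```

In this sketch a node in the pinned environment stays on the pinned version, while a node in the unconstrained environment picks up whatever was uploaded most recently.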

There is no magic pill.

You are the expert.

Chef at Etsy

Chef at Etsy

• Chef Server

• ~2000 Nodes

• CentOS, some Mac OS X

Chef at Etsy - Continued

• Everything from OS to “Below Code”
• Code deployed using “Deployinator”
• Single git repository
• 2 sources of truth…
• So far, so normal!

Chef at Etsy

Chef at Etsy - Continued

• Translates to ~35 deploys per day
• What exactly is a Chef deploy?
  • Updating “production” version constraint!
• Many less-experienced users

Cookbook Workflow

$> review -r jcowie --cc ops

Push Change

• knife-spork
  • Helps multiple chefs avoid clashing
  • Visibility into changes
  • Plugins

Push Change - Continued

• knife spork bump
• knife spork upload
• Test change*
• knife spork promote --remote
• git commit and push
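
The bump and promote steps can be pictured with plain Ruby hashes. This is only a sketch of the idea, not knife-spork's implementation: `bump_patch` and `promote` are hypothetical helpers, and the cookbook name and versions are made up. The `cookbook_versions` key does mirror the field Chef uses in environment JSON.

```ruby
# Bump the patch component of an "x.y.z" version string.
def bump_patch(version)
  major, minor, patch = version.split(".").map { |part| Integer(part, 10) }
  [major, minor, patch + 1].join(".")
end

# Return a copy of the environment with the cookbook pinned to one exact
# version. The original environment hash is left untouched.
def promote(environment, cookbook, version)
  pinned = environment.fetch("cookbook_versions", {})
                      .merge(cookbook => "= #{version}")
  environment.merge("cookbook_versions" => pinned)
end

# Hypothetical starting state for the "production" environment.
production = {
  "name" => "production",
  "cookbook_versions" => { "apache" => "= 1.1.0" }
}

new_version = bump_patch("1.1.0")
updated = promote(production, "apache", new_version)
puts updated["cookbook_versions"]["apache"]
```

Seen this way, a "deploy" is nothing more than changing one constraint string in the production environment, which is why clashes between people editing the same environment matter.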

Test Change

• Move node to unconstrained environment

• knife node flip foo.etsy.com testing

• knife role flip MyRole testing

Downsides of Existing Approach

• No unit tests…
• Holding cookbook in testing is blocking
• Testing env affects all cookbooks
• “Upgrade” envs often used
• How to make it more “Etsy”?

chef-whitelist

• Driven by JSON data

• Cookbook library

• Feature flags!

chef-whitelist

{
  "id": "php-5-5-17",
  "patterns": [
    "statsd*.ny5.etsy.com",
    "deploy*.ny5.etsy.com",
    <snip>
  ]
}

chef-whitelist

if node.is_in_whitelist? "php-5-5-17"
  package "php-pecl-opcache" do
    action :remove
  end
end
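
For readers wondering how a hostname might be tested against those patterns, here is a minimal, self-contained Ruby sketch. It assumes the patterns are shell-style globs and matches them with `File.fnmatch`. The actual chef-whitelist library may work differently, and the `WHITELISTS` constant, the `in_whitelist?` helper, and the example hostnames are hypothetical.

```ruby
# Hypothetical in-memory copy of the whitelist data, keyed by id.
# Only the two patterns visible on the slide are included.
WHITELISTS = {
  "php-5-5-17" => ["statsd*.ny5.etsy.com", "deploy*.ny5.etsy.com"]
}.freeze

# True when the hostname matches any glob pattern in the named whitelist.
# An unknown whitelist id matches nothing.
def in_whitelist?(fqdn, whitelist_id)
  WHITELISTS.fetch(whitelist_id, []).any? do |pattern|
    File.fnmatch(pattern, fqdn)
  end
end

# Hypothetical hostnames, for illustration only.
%w[statsd01.ny5.etsy.com web01.ny5.etsy.com].each do |host|
  puts "#{host}: #{in_whitelist?(host, 'php-5-5-17')}"
end
```

Because the check is driven by data rather than by cookbook versions, a change can be rolled out to a few hosts first and widened by editing the pattern list, much like a feature flag.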

Monitoring & Debugging

knife-spork & CI Job

<irccat> CHEF: Jon Cowie uploaded [email protected]
<irccat> CHEF: Jon Cowie promoted [email protected] to production
<snip>
<irccat> Git PUSH -> Sysops/chef
<snip>
<Jenkins> Starting build #5649 for job chef-server-git-sync
<Jenkins> Project chef-server-git-sync build #5649: SUCCESS in 2 min 36 sec: http://ci.etsycorp.com/job/chef-server-git-sync/5649/

IRC Handler

<irccat> Chef run failed on officebackup01.office.etsy.com gist failed, see /var/log/chef/client.log on the host

<irccat> Still Failing on dbnest01.ny4.etsy.com since 2 days ago https://github.etsycorp.com/gist/656d8914fbef5a6bd9aa

“Lastrun” Data

% knife node lastrun dbnest01.ny4.etsy.com
Status        failed
Elapsed Time  29.055892
Start Time    2014-10-06 12:54:51 +0000
End Time      2014-10-06 12:55:20 +0000

<snip>

Exception
<snip>
Installed package backupd-1.4-1.365657d.el5.centos is newer than candidate package backupd-1.2-1.99ddb8e.el5

Dashboards

Links - Workflow Tools

• https://github.com/jonlives/knife-spork
• https://github.com/jonlives/knife-flip
• https://github.com/jgoulah/knife-lastrun
• https://github.com/etsy/chef-whitelist

Links - Monitoring

• https://github.com/etsy/chef-handlers
• https://github.com/etsy/dashboard
• https://github.com/bmarini/knife-inspect

So, how’s it all going?

Some Pain Points

• Change Clashes
• Confusion over state of changes
• People forget things
• Testing pains

We can rebuild it. We have the technology.

(201)6 Million Dollar Workflow

• Deployinator-based workflow
• Push queue
• Unit tests
• “try” based testing
• More like existing CD workflows

Watch this space!

http://jonliv.es/book

Discount Code: AUTHD

40% off Print, 50% off Digital

And now, a brief rant…

“Before I [tweet|open source|go to an event], I first have to consider my personal safety.”

We also have the privilege to say THIS IS NOT OK!

If we don’t speak out, our inaction says “We see nothing wrong with this.”

“Assuming that an arbitrary woman wants to do the work of educating you about sexism is not the most effective choice, any more than assuming any random open-source contributor wants to provide tech support for you, on demand, on your timeline.”

http://bridgetkromhout.com/blog/2015/05/31/let-me-google-that-for-you/

“Seems Hard? It is. Welcome to being a minority with an opinion. Now, do the work.”

http://www.catehuston.com/blog/2015/07/08/pitfalls-for-men-talking-about-diversity/

Thanks! Questions?

@jonlives / http://jonliv.es / [email protected]