Scaling Puppet Deployments
Matthew Mosesohn, Senior Deployment Engineer
© MIRANTIS 2013

Matthew Mosesohn - Configuration Management at Large Companies

DESCRIPTION

Fresh from PuppetConf, which brought many engineers together in San Francisco, Matt shares what he learned about configuration management at large companies, along with his own opinions and criticism, which you are welcome to discuss.

Configure by hand

● Insert media into system
● Install OS
● Install software
● Configure software
● Verify
● Done?

Automate

● PXE installation
  – Imaging
  – Cobbler
  – Foreman
  – Razor
● Configuration
  – Puppet
  – Chef
  – Salt
  – Ansible

Puppet

● Powerful tool written in Ruby

● Extensible

● Built-in syntax checking

● Large community

● Used in many major companies, including:

  – Google
  – Cisco
  – PayPal
  – VMware

Our purpose

● FUEL is a tool designed to deploy OpenStack

● FUEL consists of:

  – Astute: orchestration library built on MCollective
  – Library: Puppet manifests
  – Web: Python web app to deliver a rich user experience
  – Cobbler: provisioning of bare metal
  – Bootstrap: lightweight install environment for node discovery

Tiny example

● 1 master Cobbler and Puppet server
● 2 node OpenStack cluster
● OS deployment: 5 minutes
● Puppet configuration: 15 minutes each
● Total time: ~40 minutes

Typical example

● 1 master Cobbler and Puppet server
● 10 node OpenStack cluster
● OS deployment: 30 minutes total
● Puppet configuration: 15 minutes each
● Total time: ~2hr 45min

Stretching the limits

● 1 master Cobbler and Puppet server
● 100 node OpenStack cluster
● OS deployment: ?? minutes total
● Puppet configuration: 15 minutes each
● Total time: Maybe 24 hours?
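
The totals on these three slides roughly follow a serial model: one OS deployment phase, then one Puppet run per node, one after another. A minimal Python sketch of that arithmetic, assuming strictly sequential Puppet runs and no other overhead (the function name is illustrative, and the 100-node case counts Puppet time only because the slide leaves the OS deployment time open):

```python
def serial_deploy_minutes(os_minutes, puppet_minutes_per_node, nodes):
    """Minutes needed when OS deployment is followed by one Puppet run per node, in sequence."""
    return os_minutes + puppet_minutes_per_node * nodes


tiny = serial_deploy_minutes(5, 15, 2)                # 2-node cluster
typical = serial_deploy_minutes(30, 15, 10)           # 10-node cluster
puppet_only_100 = serial_deploy_minutes(0, 15, 100)   # 100 nodes, Puppet runs only

print(tiny, typical, puppet_only_100 / 60)  # minutes, minutes, hours
```

The slide totals are rounded estimates, so they will not match this simple model to the minute. The point is that the Puppet term grows linearly with the node count.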

How to get to 1,000?

● Physical limitations of physical disks
● Physical limitations of network
● Puppet limitations
● Cobbler limitations
● Messaging/orchestration limitations
● Durability/patience of client applications

Approach: Scale the server!

● Pure speed. Don't care about anything else.
● Buy expensive system with 2 SSDs in RAID-0, 12 cores, 256 GB memory, and bonded NICs
● Peak I/O: ~800 MB/s

How crowded is your network segment?

● More than 500 nodes on one network is bad
● Broadcast traffic will hinder normal traffic
● One lost packet means TFTP must fail and start over
● Make a second network and set up a DHCP relay
● Update your PXE server's DHCP configuration

err: Could not retrieve catalog from remote server: Connection refused - connect(2)

Puppet load

● Catalog compile time
  – 12s per node
● Serve files: 12 MB each host
● Receive and store 500 KB report in YAML format
● Store in PuppetDB
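
To see what these per-node figures mean for a single master, they can be multiplied out to the 1,000-node target. A rough Python sketch, assuming catalogs compile one at a time on a single worker and using decimal units (both are simplifying assumptions, and the names are illustrative):

```python
COMPILE_SECONDS = 12   # catalog compile time per node (from the slide)
FILES_MB = 12          # files served to each host (from the slide)
REPORT_KB = 500        # YAML report stored per host (from the slide)


def master_load(nodes):
    """Aggregate load on one Puppet master for a full pass over `nodes` hosts."""
    return {
        "compile_hours": nodes * COMPILE_SECONDS / 3600,
        "files_gb": nodes * FILES_MB / 1000,
        "reports_mb": nodes * REPORT_KB / 1000,
    }


load = master_load(1000)
print(load)
```

For 1,000 nodes this comes to a little over three hours of pure compile time, 12 GB of served files, and 500 MB of reports per pass.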

How to avoid failure

● IPMI control of all nodes (expensive)
● Orchestration that can reset a host if it gets “stuck” along the way
● Staggered approach to avoid overload on master
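
The staggered approach can be sketched as splitting the host list into fixed-size batches, so that only one batch talks to the master at a time. This is a minimal illustration, not the orchestration used in the talk; the host names and batch size are hypothetical:

```python
def stagger(hosts, batch_size):
    """Yield successive batches so that at most `batch_size` hosts hit the master at once."""
    for start in range(0, len(hosts), batch_size):
        yield hosts[start:start + batch_size]


hosts = [f"node-{i:03d}" for i in range(1, 26)]  # hypothetical host names
batches = list(stagger(hosts, 10))

for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch[0]} .. {batch[-1]} ({len(batch)} hosts)")
```

An orchestrator would wait for each batch to finish, or time out and reset stuck hosts, before starting the next one.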

How the pros do it

● Large US bank
● 2 Puppet CA servers
● 3 Puppet catalog masters
● DNS round robin for catalog servers
● 2000 hosts
● Must stagger initial deployments
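
As a rough picture of what round robin buys in this setup, the sketch below rotates 2000 hosts across three catalog masters. It models an ideal rotation only; real DNS round robin depends on resolver behavior and caching, and the server names are hypothetical:

```python
from collections import Counter
from itertools import cycle

MASTERS = ["puppet1", "puppet2", "puppet3"]  # hypothetical catalog masters


def round_robin(hosts, servers):
    """Assign each host to the next server in a fixed rotation."""
    rotation = cycle(servers)
    return {host: next(rotation) for host in hosts}


assignment = round_robin(range(2000), MASTERS)
per_master = Counter(assignment.values())
print(per_master)
```

Even with a perfectly even split, each master still carries several hundred hosts, which is consistent with the slide's note that initial deployments must be staggered.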

Conclusion

● Not fast enough
● Too much data
● Still a bottleneck
● Expensive hardware

Approach: Ditch Puppetmaster!

● Still need to provision a base OS
● Still need package repository
● Still need to be fast
● Still need to have some “brain” to identify servers

Speed up provisioning

● Install every nth server to serve as a provisioning mirror, running entirely in RAM

● TFTP still must come from master server, but 30 minutes of pain for bootstrap is okay

● HTTP for OS installation can be balanced via DNS round robin to each mirror

● Provision mirror hosts last
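
The mirror scheme above can be sketched as a planning step: pick every nth host as a mirror, install the remaining hosts through the mirrors, and install the mirror hosts last. A minimal Python sketch, assuming n = 50 (an illustrative value that yields 20 mirrors for 1,000 hosts) and hypothetical host names:

```python
def plan_mirrors(hosts, n):
    """Split hosts into mirrors (every nth host) and the remaining regular nodes."""
    mirrors = hosts[n - 1::n]
    regular = [host for index, host in enumerate(hosts, start=1) if index % n != 0]
    return mirrors, regular


def install_order(hosts, n):
    """Regular nodes first; mirrors keep serving and are provisioned last."""
    mirrors, regular = plan_mirrors(hosts, n)
    return regular + mirrors


hosts = [f"node-{i:04d}" for i in range(1, 1001)]  # hypothetical host names
mirrors, regular = plan_mirrors(hosts, 50)         # n = 50 is an assumption
order = install_order(hosts, 50)
print(len(mirrors), len(regular), order[-1])
```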

Package repository

● YUM repository should be located close to cluster
● Mirror via Cobbler/Foreman
● Or somewhere in your organization with fast disks

External Node Classifiers

Arbitrary script to tell nodes what resources to install

ENC providers include:

– Puppet Dashboard
– Foreman
– Hiera
– LDAP
– Amazon CloudFormation
– YAML file carried by pigeon

External Node Classifiers

● What they can provide:
  – Puppet master hostname
  – Environment name (production, devel, stage)
  – Classes to use
  – Puppet facts needed for installation
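
An ENC is just an executable that receives a node name and prints a YAML document with `classes`, `parameters`, and `environment` keys. A minimal Python sketch of such a script follows. The role table, class names, and the `puppetmaster` parameter are hypothetical, and JSON is printed because it is also valid YAML, which avoids a third-party YAML library:

```python
import json

# Hypothetical mapping from hostname prefix to role data; a real ENC would
# look this up in Foreman, LDAP, a database, or a file.
ROLES = {
    "controller": {"classes": ["role::controller"], "environment": "production"},
    "compute": {"classes": ["role::compute"], "environment": "production"},
}
DEFAULT_ROLE = {"classes": [], "environment": "production"}


def classify(node_name):
    """Return the ENC document for a node, keyed the way Puppet expects."""
    prefix = node_name.split("-", 1)[0]
    role = ROLES.get(prefix, DEFAULT_ROLE)
    return {
        "classes": role["classes"],
        "environment": role["environment"],
        # Hypothetical parameter carrying the master hostname to the node.
        "parameters": {"puppetmaster": "puppet.example.com"},
    }


print(json.dumps(classify("compute-042.example.com"), indent=2))
```

In a real deployment the node name would come from the first command-line argument instead of being hard-coded.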

Getting Puppet manifests to nodes

● How do you place manifests on a node?
● Without relying on one host, pick the most robust system available

Getting Puppet manifests to nodes

● Plain Git
  – Version control system
  – Widely implemented
  – Simple to get started
  – Fits into Puppet's environment structure via branches

Getting Puppet manifests to nodes

● Puppet Librarian (librarian-puppet)
  – Created by Tim “Rodjek” Sharpe from GitHub
  – Flexible manifest sources
  – Can specify a Puppet “forge”
  – Can retrieve from Git repositories
  – Dependency handling
  – Version specification optional
  – Creates a local Git repository to track changes

Getting Puppet manifests to nodes

● RPM format
  – Technique used by Sam Bashton
  – Versioned as well
  – As easy to deploy as any other package
  – Requires a clever build process

Getting Puppet manifests to nodes

● RPM format magic
  – Jenkins job to take Git code with manifests
  – Run puppet-lint on all Puppet code
  – Create tarball of Puppet manifests and Hiera data
  – Wrap inside a package with a new version number
  – Push ready package to software repository
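
The tarball step of this pipeline can be sketched in a few lines of Python. This covers only the "create a versioned tarball" step; linting, the RPM build, and the push to a repository are left out. The package name, version, and directory layout are hypothetical:

```python
import tarfile
import tempfile
from pathlib import Path


def build_tarball(source_dir, output_dir, name, version):
    """Create <name>-<version>.tar.gz from source_dir and return its path."""
    target = Path(output_dir) / f"{name}-{version}.tar.gz"
    with tarfile.open(target, "w:gz") as archive:
        # Store everything under a versioned top-level directory.
        archive.add(str(source_dir), arcname=f"{name}-{version}")
    return target


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as work:
    source = Path(work) / "puppet"
    (source / "manifests").mkdir(parents=True)
    (source / "manifests" / "site.pp").write_text("node default {}\n")

    package = build_tarball(source, work, "puppet-manifests", "1.0.42")
    with tarfile.open(package) as archive:
        members = sorted(archive.getnames())

print(package.name, members)
```

In a CI job the version would normally come from the build number, so that every commit produces a new, installable package version.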

Running local is better

● Deploying on great new hardware

● Faster catalog build

● No waiting for manifests or uploading reports

● No timeouts or connections refused

What about my precious logs?!

Rsyslog

● Scaling rsyslog requires lots of disks, but they don't have to be fast
● Rsyslog can throttle clients effectively
● Clients can hold logs until server is ready to receive
● Everybody wins

Doing the math

Stage                 Before                          After
Bootstrap OS          10min                           10min (but that's okay)
Base OS provision     8hrs (10 concurrent)            30min to set up 20 mirrors
                                                      25-40min to install (200 concurrent)
                                                      30min to install mirrors
Puppet provisioning   10d 10hr (15min x 1000 hosts,   45 mins for all 3 controllers, one at a time
                      one at a time)                  20 mins for compute nodes
Totals:               12 days                         2-3 hours
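
The figures in this table can be checked with a few lines of arithmetic. The sketch below reproduces the "before" Puppet time and sums the "after" stages, assuming the "after" stages run one after another (an assumption; the slide does not say whether any of them overlap):

```python
from datetime import timedelta

HOSTS = 1000

# Before: one 15-minute Puppet run per host, strictly one at a time.
before_puppet = timedelta(minutes=15 * HOSTS)

# After: bootstrap, mirror setup, install, mirror install, controllers, computes.
after_low = 10 + 30 + 25 + 30 + 45 + 20    # minutes, fast install case
after_high = 10 + 30 + 40 + 30 + 45 + 20   # minutes, slow install case

print(before_puppet.days, before_puppet.seconds // 3600)  # days, hours
print(after_low / 60, after_high / 60)                    # hours
```

The serial Puppet phase alone accounts for 10 days and 10 hours, while the "after" stages add up to just under three hours.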

References

● http://www.tomshardware.com/reviews/ssd-raid-benchmark,3485-3.html

● http://www.masterzen.fr/2012/01/08/benchmarking-puppet-stacks/

● http://theforeman.org/manuals/1.3/index.html#3.5.5FactsandtheENC

● https://github.com/rodjek/librarian-puppet

● http://www.slideshare.net/PuppetLabs/sam-bashton

Ref commands

    puppet agent --{summarize,test,debug,evaltrace,noop} | perl -pe 's/^/localtime().": "/e'

    Time:
      ....
      Nova paste api ini: 0.02
      Package: 0.03
      Notify: 0.03
      Nova config: 0.10
      File: 0.40
      Exec: 0.56
      Service: 1.39
      Augeas: 1.56
      Total: 11.85
      Last run: 1379522172
      Config retrieval: 7.73
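
To find where a run spends its time, the "Time:" section of the summary can be parsed and sorted. A minimal Python sketch, assuming the raw summary lines without the timestamp prefix added by the perl filter (the function names are illustrative, and the sample is copied from the output above):

```python
SAMPLE = """\
Time:
....
Nova paste api ini: 0.02
Package: 0.03
Notify: 0.03
Nova config: 0.10
File: 0.40
Exec: 0.56
Service: 1.39
Augeas: 1.56
Total: 11.85
Last run: 1379522172
Config retrieval: 7.73
"""


def parse_times(text):
    """Map each resource type to its seconds, skipping headers and summary rows."""
    times = {}
    for line in text.splitlines():
        name, separator, value = line.rpartition(":")
        name = name.strip()
        if not separator or name in ("Total", "Last run"):
            continue
        try:
            times[name] = float(value)
        except ValueError:
            continue  # e.g. the "Time:" header has no number after the colon
    return times


def slowest(times, count=3):
    """Return the `count` most expensive entries, slowest first."""
    return sorted(times.items(), key=lambda item: item[1], reverse=True)[:count]


times = parse_times(SAMPLE)
print(slowest(times))
```

In this sample, config retrieval dominates the run at 7.73 of 11.85 seconds, which is the cost that local runs without a Puppet master are meant to avoid.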