Multi-tenant Puppet Automation for everyone

PuppetConf 2016: Multi-Tenant Puppet at Scale – John Jawed, eBay, Inc


JJ
John Jawed, github.com/johnj
Dogs, anything with an ocean

Gap up


Linear

Exponential

Change as a function of time

2014
118,000 hosts
13,000 environments
fewer puppetmasters
baremetal, VM, containers

Cha-cha-cha-changes unavoidable

happen everywhere

Oops: change does not always go according to plan

48 minutes

Goals performance & scale

policy: seamless onboarding

Bottlenecks? Try giving up. capacity, abilities

paradigms (epoll vs select) insanity

Classification Catalog Reports/Facts

average puppet run 8 seconds

Classification

node_terminus = /enc_script.rb

320ms - loading gems, files, certs
only 100ms for API call to ENC
Optimize: ENC run time as close to 100ms as possible

Classification

paradigm shift

from: exec /enc_script.rb fqdn
to: write fqdn to ENC workers

Classification

a little dash of bash

node_terminus = /enc_handler.sh

    $ cat enc_handler.sh
    ...
    echo $1 | nc -U /unix.sock
    ...

Classification

a little go go

William Kennedy’s workpool (github.com/goinggo/workpool)
go server listening on /unix.sock
workpool routes requests to an idle worker

Classification

exec/exit to listen/process

    $ cat /enc_script.rb
    …
    while certname = $stdin.gets do
      enc(certname)
    end
    …
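To make the exec/exit to listen/process change concrete, here is a minimal sketch of a long-lived ENC worker. It assumes the worker answers each certname with a YAML document in the standard ENC shape (classes, environment, parameters). The `enc` body, the class name `base`, and the `serve` helper are hypothetical placeholders, not the script used in the talk.

```ruby
require "yaml"
require "stringio"

# Hypothetical classifier: the real worker would call the ENC API here.
def enc(certname)
  {
    "classes" => ["base"],
    "environment" => "production",
    "parameters" => { "certname" => certname }
  }
end

# Read one certname per line and answer each with a YAML document.
# The process starts once, so gems, files and certs are loaded a single time.
def serve(input, output)
  while (line = input.gets)
    certname = line.strip
    next if certname.empty?

    output.write(enc(certname).to_yaml)
    output.flush
  end
end
```

Calling `serve($stdin, $stdout)` would turn this into a worker process; passing `StringIO` objects instead makes the loop easy to exercise without a puppet master.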

Classification

PPM calls node_terminus

node_terminus writes request to socket

go handles request, workpool routes
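The request flow above can be sketched in a few lines. The talk used a Go server with the workpool package; the version below is a Ruby stand-in with a thread pool and a queue, meant only to illustrate the routing idea. `classify`, `start_enc_pool`, and `ask_enc` are hypothetical names.

```ruby
require "socket"

# Hypothetical stand-in for the real ENC lookup.
def classify(certname)
  "classes for #{certname}"
end

# Listen on a Unix socket and hand accepted connections to a fixed pool.
# Connections wait in a queue; whichever worker is idle pops the next one.
def start_enc_pool(socket_path, workers: 4)
  server = UNIXServer.new(socket_path)
  jobs = Queue.new

  pool = Array.new(workers) do
    Thread.new do
      while (client = jobs.pop)
        certname = client.gets.to_s.strip
        client.write(classify(certname))
        client.close
      end
    end
  end

  acceptor = Thread.new do
    loop { jobs << server.accept }
  end

  [server, acceptor, pool]
end

# Client side: roughly what `echo $1 | nc -U /unix.sock` does.
def ask_enc(socket_path, certname)
  UNIXSocket.open(socket_path) do |sock|
    sock.puts(certname)
    sock.close_write
    sock.read
  end
end
```

After `start_enc_pool("/tmp/enc.sock")` (path chosen for illustration), `ask_enc("/tmp/enc.sock", "web01.example.com")` returns the worker's answer. The workers stay resident, so no request pays the start-up cost again.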

Classification

end result

gets close to 100ms goal – 110ms
CPU usage – no constant bootstrapping frees up resources, puppet master process
at scale, 200ms per run adds up quickly (30 for every 60 seconds of CPU time)
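A rough way to see how per-run overhead adds up is to multiply it by the run rate. The sketch below pairs the 200ms figure with the 4k runs per minute quoted for report processing later in the deck. Combining those two numbers is an assumption made here for illustration, not a calculation from the talk.

```ruby
# CPU-seconds spent per minute of wall-clock time on a fixed per-run cost.
def cpu_seconds_per_minute(runs_per_minute, ms_per_run)
  runs_per_minute * ms_per_run / 1000.0
end

# Assumed inputs: 4,000 runs per minute, 200 ms of overhead per run.
puts cpu_seconds_per_minute(4_000, 200) # prints 800.0
```

Under those assumptions the overhead is 800 CPU-seconds every minute, which is more than thirteen cores kept permanently busy.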

catalogs

Catalog compilation – low hanging fruit, difficult

Catalog

source: http://www.isrubyfastyet.com

agents

everything is SSL, that is good
everything is SSL, that is expensive
use yum.puppetlabs.com or apt.puppetlabs.com to make sure you run 3.7+
runtime savings: 40%

Catalog

post run woes

after agent runs, the real fun begins
puppetmaster and agent both wait for report processors to finish
slow report collection will cause your infrastructure to fall over – some just avoid it

Reports/Facts

foreman

foreman report/fact processing – need to spread read I/O
fact processing is read heavy, reports are write heavy
ruby activerecord: makara
postgresql: local read slaves, pg_shard

Reports/Facts

reports

4k run reports per minute
using pg_shard:

    psql> SELECT master_create_distributed_table(table_name := 'reports', partition_column := 'report_id');
    psql> SELECT master_create_worker_shards(table_name := 'reports', shard_count := 365);

Reports/Facts

facts

most of the workload is read I/O, kept local

facts updated immediately after puppet runs
Master DB loadavg 2

Reports/Facts

Classification Catalog Reports/Facts

average puppet run 2 seconds

runinterval is not your friend

pvc

Open source, github.com/johnj/pvc
Basis of orchestration in 2014

pvc

pvc.conf

pvc

    host_endpoint=your.pvcbackend.com/host

simple is hard

“Simple can be harder than complex: You have to work hard to get your thinking clean to make it simple. But it’s worth it in the end because once you get there, you can move mountains.”

- Steve Jobs

Host Infrastructure

Host events

most systems have audit frameworks
files (inotify)
processes (audit)
network
puppet needs to react to these events

osquery

osquery

services, files, and any resource that can be tracked as a host event
event information can also be recorded (doorman, zentral, etc)
event info is stored in tables (sqlite)

file monitoring

    {
      "file_paths": {
        "homes": [
          "/root/.ssh/%%",
          "/home/%/.ssh/%%"
        ],
        "binaries": [
          "/usr/bin/%%",
          "/sbin/%%"
        ],
        "etc": [
          "/etc/%%"
        ],
        "tmp": [
          "/tmp/%%"
        ]
      }
    }
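In osquery path patterns, `%` matches a single path level and `%%` matches recursively. The helper below approximates that with a regular expression so you can check which paths a pattern from the config above would cover. It is a simplified illustration with a hypothetical function name, not osquery's own matcher.

```ruby
# Translate an osquery-style path pattern into an anchored Regexp.
#   %%  -> anything, including further path separators (recursive)
#   %   -> anything within one path level (no "/")
def osquery_pattern_to_regexp(pattern)
  parts = pattern.split(/(%%|%)/).map do |token|
    case token
    when "%%" then ".*"
    when "%"  then "[^/]*"
    else Regexp.escape(token)
    end
  end
  Regexp.new("\\A#{parts.join}\\z")
end

homes = osquery_pattern_to_regexp("/home/%/.ssh/%%")
puts homes.match?("/home/alice/.ssh/authorized_keys")  # prints true
puts homes.match?("/home/alice/nested/.ssh/id_rsa")    # prints false
```

The second path is rejected because the single `%` covers only one directory level between `/home/` and `/.ssh/`.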

Infrastructure events

code releases, package upgrades, access changes
puppet needs to be told to run when these events occur

pvc and foreman

foreman’s puppetrun API to set flag
pvc queries foreman to trigger run
logical separation with host groups

runinterval is an afterthought

puppet runs instantly when it needs to
runinterval can be 3 minutes or 3 hours
frees up puppet masters, allows more resources for other things
your infrastructure is still kept honest
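The resulting decision on each host is small: run at once when an event has flagged the host, otherwise fall back to the periodic run. A minimal sketch of that rule follows. The function name and its parameters are hypothetical and do not reflect pvc's actual interface.

```ruby
# Decide whether the agent should run now.
#   flagged: true when a host or infrastructure event requested a run
#   seconds_since_last_run / runinterval: the periodic safety net, in seconds
def run_puppet?(flagged:, seconds_since_last_run:, runinterval:)
  flagged || seconds_since_last_run >= runinterval
end

three_hours = 3 * 60 * 60
puts run_puppet?(flagged: true,  seconds_since_last_run: 60, runinterval: three_hours) # prints true
puts run_puppet?(flagged: false, seconds_since_last_run: 60, runinterval: three_hours) # prints false
```

With events driving the urgent runs, the interval only bounds how long drift can go unnoticed, which is why it can be stretched to hours.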

git

“I pummel people with questions, because I need to know what they're thinking, what they're trying to achieve, what they believe the final outcome is going to be.”

- Tim Gunn