Monitoring and Tuning your Chef Server Andrew DuFour and Nathan Cerny

Monitoring and tuning your chef server - chef conf talk



Andrew DuFour, Engineer, Chef Software, @andrewdufour

Nathan Cerny, Team Manager, Chef Software, @ndcerny

The Art of Monitoring

“There is no instance of a nation benefitting from prolonged warfare.”

― Sun Tzu, The Art of War

Problem Statement

To make effective decisions and to effectively respond to incidents, we must have visibility into our systems.

Start Small

Simplicity > Perfection

“Everything should be made as simple as possible. But not simpler.” 

― Albert Einstein

Continuous Improvement

Kaizen (改善)

Alert Fatigue

Monitor Everything

Monitoring just your Chef Server is low value.

The Science of Monitoring

“who wishes to fight must first count the cost” 

― Sun Tzu, The Art of War

What should you monitor?

Operating System
• Disk
• CPU
• Memory
• System Logs

Supporting Services
• RabbitMQ
• Solr
• PostgreSQL
• Nginx

Application Services
• Erchef
• Bifrost
• Application Logs

Tools 101
• StatsD - A network daemon that runs on the Node.js platform and listens for statistics, like counters and timers. https://github.com/etsy/statsd
• Grafana - Beautiful dashboards
• TICK Stack - A series of tools that comprise the ‘Influx Data Platform’, including an easily scalable time series database. https://influxdata.com/time-series-platform/
• Sensu - Monitoring that doesn't suck. https://sensuapp.org/
• Splunk - Centralized logging, operational intelligence, big machine data tool. http://www.splunk.com/

Instrumenting our Erlang Based Services

[Diagram: Erchef and Bifrost each emit metrics through Stats Hero to Statsd.]

Instrumenting our Erlang Based Services - StatsHero

• Example metrics emitted in Statsd format:

    test_hero.upstreamRequests.rdbms:1200|h

  Reading left to right: namespace (test_hero), category (upstreamRequests), metric (rdbms), measurement (1200), metric type (h = histogram).

• Enabling StatsHero in your chef-server.rb:

    estatsd['enabled'] = true
    estatsd['protocol'] = 'statsd'
    estatsd['vip'] = '<statsd server>'
    estatsd['port'] = '<statsd port>'
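To make the metric layout concrete, here is a minimal Ruby sketch that splits the example line into the five labelled parts. The struct and method names are made up for illustration, and the sketch assumes every bucket has exactly three dot-separated parts, which may not hold for every metric Stats Hero emits.

```ruby
# Hypothetical helper: split a Statsd-format line into the parts labelled on the slide.
# Assumption: the bucket is always namespace.category.metric.
StatsdSample = Struct.new(:namespace, :category, :metric, :measurement, :type)

def parse_statsd_line(line)
  bucket, rest = line.strip.split(':', 2)
  raise ArgumentError, "missing ':' in #{line.inspect}" if rest.nil?

  value, type = rest.split('|', 2)
  raise ArgumentError, "missing '|' in #{line.inspect}" if type.nil?

  namespace, category, metric = bucket.split('.', 3)
  StatsdSample.new(namespace, category, metric, Integer(value), type)
end

sample = parse_statsd_line('test_hero.upstreamRequests.rdbms:1200|h')
puts sample.to_a.inspect
```

A parser like this is mostly useful for sanity-checking what arrives at your Statsd listener before you build dashboards on it.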

Instrumenting our Erlang Based Services

[Diagram: Erchef and Bifrost each emit metrics through Stats Hero to Statsd, with Folsom-Graphite feeding Graphite.]

Instrumenting our Erlang Based Services - Folsom Metrics

• Example metrics:

    pooler.chef_depsolver.in_use_count
    pooler.chef_depsolver.free_count
    pooler.sqerl.in_use_count
    pooler.sqerl.free_count

• Enabling folsom metrics in your chef-server.rb:

    folsom_graphite['enabled'] = true
    folsom_graphite['host'] = '<your graphite host>'
    folsom_graphite['port'] = '<your graphite port>'
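One way to use the pooler gauges above is to turn each in_use_count / free_count pair into a utilization figure. The Ruby sketch below is only an illustration: it assumes in-use plus free approximates the pool size, and the readings are invented numbers, not measurements from a real Chef server.

```ruby
# Fraction of pool members currently checked out, computed from the two gauges.
# Assumption: in_use_count + free_count approximates the total pool size.
def pool_utilization(in_use_count, free_count)
  total = in_use_count + free_count
  return 0.0 if total.zero?

  in_use_count.to_f / total
end

# Hypothetical readings for illustration only.
readings = {
  'pooler.sqerl' => { in_use: 18, free: 2 },
  'pooler.chef_depsolver' => { in_use: 1, free: 4 }
}

readings.each do |pool, gauges|
  percent = (pool_utilization(gauges[:in_use], gauges[:free]) * 100).round
  puts "#{pool}: #{percent}% in use"
end
```

A pool that sits near 100% for long stretches is a candidate for a larger pool size; one that is mostly idle probably is not the bottleneck.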

Instrumenting our Erlang Based Services

[Diagram: Erchef and Bifrost each emit metrics through Stats Hero to Statsd, with Folsom-Graphite feeding Graphite, and send their logs to a log collector.]

Instrumenting our Erlang Based Services - Collecting Logs

• Use a full featured log collector like Splunk to centralize logs.
• All of our services log into a common directory structure: /var/log/opscode/<service name>
• The two most important files within that directory are:
    current
    error
• There are also request logs, which repeat information available elsewhere.
• All services shipped with the omnibus package, not just Erlang services, log here.
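Because every service follows the same directory layout, a log shipper can discover its inputs instead of hard-coding them. The Ruby sketch below is a rough illustration: the helper name is made up, it only looks for the file named current, and the demo runs against a temporary directory with placeholder service names rather than a real /var/log/opscode.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: map each service directory under log_root to its "current" file.
def current_logs(log_root)
  Dir.glob(File.join(log_root, '*', 'current')).sort.to_h do |path|
    [File.basename(File.dirname(path)), path]
  end
end

# Demo against a throwaway directory; on a Chef server you would pass '/var/log/opscode'.
Dir.mktmpdir do |root|
  %w[service-a service-b].each do |service| # placeholder service names
    FileUtils.mkdir_p(File.join(root, service))
    FileUtils.touch(File.join(root, service, 'current'))
  end
  puts current_logs(root).keys.inspect
end
```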

Tuning

Client Side Tuning

USE THE SPLAY, LUKE!

Sometimes Ohai tuning is needed (e.g. Centrify)

ALWAYS USE PARTIAL SEARCH! (and look at SafeSearch)

Know what a dependency graph is… and manage it.
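To illustrate the splay advice above: chef-client's client.rb has `interval` and `splay` settings, and splay adds a random delay between zero and the splay value to each interval so that nodes do not all check in at once. The Ruby sketch below only simulates that arithmetic; the method name and the 1800/300 second values are illustrative assumptions, not recommendations.

```ruby
# Delay before a node's next run: the fixed interval plus a random splay.
# This mimics the idea behind client.rb's interval and splay settings; it is not chef-client code.
def next_run_delay(interval, splay, rng: Random.new)
  interval + rng.rand(0..splay)
end

# Five simulated nodes with a 30 minute interval and up to 5 minutes of splay.
rng = Random.new(42)
delays = Array.new(5) { next_run_delay(1800, 300, rng: rng) }
puts delays.inspect
```

Without splay, every node started from the same image or cron schedule hits the server in the same second; with it, the load spreads across the splay window.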

Server Side Tuning

Almost Everything is Tunable

Chef-server.rb
• https://docs.chef.io/config_rb_server.html
• https://docs.chef.io/config_rb_server_optional_settings.html
• https://github.com/chef/chef-server/blob/master/omnibus/files/private-chef-cookbooks/private-chef/attributes/default.rb
• How does chef-server.rb work?
  The Chef server's reconfigure is driven by a cookbook called PrivateChef. PrivateChef is a cookbook just like any other, with some helper libraries to read your chef-server.rb and make sense of it.
• Actually tuning a setting:

    opscode_erchef['db_pool_size'] = "20"

A quick look at PrivateChef

You can see we’re creating a new module called PrivateChef.

The configuration attributes are defined as new Mashes. When you say opscode_erchef['key'] = value, you’re truly just assigning a value to the Mash created in the PrivateChef module.
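The mechanism described above can be sketched in a few lines of Ruby. This is not the real PrivateChef source: the module name, the list of services, and the load_string helper are all hypothetical, and a plain Hash stands in for Chef's Mash. It only shows why a line in chef-server.rb can look like a bare assignment.

```ruby
# Hypothetical stand-in for the PrivateChef module; NOT the real implementation.
module PrivateChefSketch
  SERVICES = %i[opscode_erchef postgresql].freeze # illustrative subset

  # One settings container per service (a plain Hash standing in for a Mash).
  @config = SERVICES.to_h { |name| [name, {}] }

  class << self
    # Define opscode_erchef, postgresql, ... as methods returning their container.
    SERVICES.each do |name|
      define_method(name) { @config[name] }
    end

    # Evaluate config text with the module as self, so that
    # opscode_erchef['key'] = value assigns into the matching container.
    def load_string(text)
      instance_eval(text)
      @config
    end
  end
end

config = PrivateChefSketch.load_string("opscode_erchef['db_pool_size'] = 20")
puts config.inspect
```

Once the settings are plain data on a module, the rest of the cookbook can read them like any other attributes when it renders service configuration.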

Looking at the Low Hanging Fruit

Chef Front-end Server
• Bifrost
• Erchef
• Nginx

Nginx
• Enable cookbook cache
• S3 URL expiry

Bifrost
• Db pooler timeout
• Db pooler queue size
• Db pool size

Authz
• Initial pool count
• Max pool count
• Max queue size


Erchef
• Depsolver workers
• Depsolver timeout
• Authz
• Db pooler timeout
• Db pooler queue size
• Db pool size
• Keygen_cache_size

Chef Back-end Server
• RabbitMQ
• PostgreSQL
• Solr

PostgreSQL
• Checkpoint segments
• Checkpoint completion target
• Log min duration statement

Solr
• Heap size
• New size

RabbitMQ
• Analytics max length
• Dark launch
• Max connections

More Useful Tools
• PGBadger - https://github.com/dalibo/pgbadger
• Monitor Postgresql: https://wiki.postgresql.org/wiki/Monitoring
• How to Monitor Nginx: https://www.scalyr.com/community/guides/how-to-monitor-nginx-the-essential-guide
• Pgtune - http://pgfoundry.org/projects/pgtune
  pgtune takes the wimpy default postgresql.conf and expands the database server to be as powerful as the hardware it's being deployed on. Be careful about shared resources: Pgtune assumes you have a dedicated Postgres server.
• GCViewer - http://www.tagtraum.com/gcviewer.html
  Helps you analyze your GC activity, so you can make decisions on tuning.

Alternative Tools
• ELK: https://www.elastic.co/webinars/introduction-elk-stack
• Graylog: https://www.graylog.org/
• Loggly: https://www.loggly.com/
• Graphite: https://github.com/graphite-project/
• Datadog: https://www.datadoghq.com/
• So many more...

Special Thanks
• Irving Popovetsky and his "Tuning the Chef Server for Scale" blog post: http://irvingpop.github.io/blog/2015/04/20/tuning-the-chef-server-for-scale/
• Mark Harrison, Paul Mooring and the Chef server team. The dashboards are heavily based on their dashboards for hosted Chef.
• Phil Dibowitz and Facebook, for teaching Andrew a lot about tuning the Chef server for a scale that almost none of our other customers hit.