God - Process and Task Monitoring Done Right

god: process and task monitoring done right

Jesse Newland
jnewland.com
jesse@railsmachine.com

FAILWHALE NEEDS NO INTRODUCTION

Like it or not, the web is 24/7/365

But who wants to be online 24/7/365?

Sometimes, you’ve just gotta take a walk

ZOMG WHAT NOW?

Process monitoring

sudo gem install god

written by: Tom Preston-Werner

Follow along at home:

git clone git://github.com/jnewland/god_examples.git

The Basics

$ ruby scripts/crashy.rb
Wed Jul 09 13:53:13 -0400 2008
Wed Jul 09 13:53:14 -0400 2008
Wed Jul 09 13:53:15 -0400 2008
/Users/jnewland/src/god_examples/lib/god_test.rb:28:in `crash': Crash! (RuntimeError)
        from /Users/jnewland/src/god_examples/lib/god_test.rb:20:in `run'
        from /Users/jnewland/src/god_examples/lib/god_test.rb:19:in `loop'
        from /Users/jnewland/src/god_examples/lib/god_test.rb:19:in `run'
        from /Users/jnewland/src/god_examples/lib/god_test.rb:15:in `initialize'
        from scripts/crashy.rb:4:in `new'
        from scripts/crashy.rb:4

# simple.god
# The simplest possible watch
God.watch do |w|
  w.name = 'crashy'
  w.interval = 1.seconds
  w.start = 'ruby scripts/crashy.rb'

  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.running = false
    end
  end
end
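The `:process_running` condition boils down to asking the operating system whether a pid is still alive. A minimal sketch of that check in plain Ruby, assuming a POSIX system; `process_running?` is a hypothetical helper for illustration, not part of god's API:

```ruby
# Hypothetical helper (not god's API): is a process with this pid alive?
def process_running?(pid)
  # Signal 0 performs the existence and permission checks
  # without actually delivering a signal.
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false   # no such process
rescue Errno::EPERM
  true    # process exists but belongs to another user
end

puts process_running?(Process.pid)
```

With `c.running = false`, the watch fires its `start` command whenever a check like this comes back false.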

$ god -h

...

Options:
  -c, --config-file CONFIG    Configuration file
  -p, --port PORT             Communications port (default 17165)
  -b, --auto-bind             Auto-bind to an unused port number
  -P, --pid FILE              Where to write the PID file
  -l, --log FILE              Where to write the log file
  -D, --no-daemonize          Don't daemonize
  -v, --version               Print the version number and exit

$ god -c simple.god -D
[... 20:19:33 #10897] INFO: Using pid file directory: /Users/jnewland/.god/pids
[... 20:19:34 #10897] INFO: Started on drbunix:///tmp/god.17165.sock
[... 20:19:34 #10897] INFO: crashy move 'unmonitored' to 'up'
[... 20:19:34 #10897] INFO: crashy moved 'unmonitored' to 'up'
[... 20:19:34 #10897] INFO: crashy [trigger] process is not running (ProcessRunning)
[... 20:19:34 #10897] INFO: crashy move 'up' to 'start'
[... 20:19:34 #10897] INFO: crashy start: ruby scripts/crashy.rb
[... 20:19:34 #10897] INFO: crashy moved 'up' to 'up'
[... 20:19:34 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:35 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:36 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:37 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:38 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:39 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:40 #10897] INFO: crashy [trigger] process is not running (ProcessRunning)
[... 20:19:40 #10897] INFO: crashy move 'up' to 'start'
[... 20:19:40 #10897] INFO: crashy start: ruby scripts/crashy.rb
[... 20:19:40 #10897] INFO: crashy moved 'up' to 'up'
[... 20:19:40 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:41 #10897] INFO: crashy [ok] process is running (ProcessRunning)

$ god -c simple.god
$ ps ax | grep ruby
12512   ??  Ss     0:00.03 ruby /Users/jnewland/src/god_examples/scripts/crashy.rb
12484 s001  S      0:00.36 /usr/bin/ruby /usr/bin/god -c simple.god
$ god -h
...
Commands:
  start <task or group name>       start task or group
  restart <task or group name>     restart task or group
  stop <task or group name>        stop task or group
  monitor <task or group name>     monitor task or group
  unmonitor <task or group name>   unmonitor task or group
  remove <task or group name>      remove task or group from god
  load <file>                      load a config into a running god
  log <task name>                  show realtime log for given task
  status                           show status of each task
  quit                             stop god
  terminate                        stop god and all tasks
  check                            run self diagnostic

$ god status
crashy: up
$ god restart crashy
Sending 'restart' command

The following watches were affected:
  crashy
$ god stop crashy
Sending 'stop' command

The following watches were affected:
  crashy
$ god status
crashy: unmonitored
$ god start crashy
Sending 'start' command

The following watches were affected:
  crashy
$ god status
crashy: up

Controlling Leaky Processes

# leaky.god
God.watch do |w|
  w.name = "leaky"
  w.interval = 5.seconds
  w.start = 'ruby scripts/leaky.rb'

  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.running = false
    end
  end

  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.above = 2.megabytes
    end
  end
end
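A sketch of the comparison a memory condition like this performs, assuming memory is sampled as resident set size in kilobytes via `ps`. The `rss_kb` and `over_limit?` helpers and the `MEGABYTE_IN_KB` constant are hypothetical names for illustration, not god's internals:

```ruby
MEGABYTE_IN_KB = 1024

# Resident set size of +pid+ in kilobytes, read from ps (POSIX systems).
def rss_kb(pid)
  Integer(`ps -o rss= -p #{Integer(pid)}`.strip)
end

# True when a sample exceeds the limit, i.e. when a restart would trigger.
def over_limit?(sample_kb, limit_kb)
  sample_kb > limit_kb
end

limit = 2 * MEGABYTE_IN_KB          # stands in for 2.megabytes
puts over_limit?(1_500, limit)      # 1500 KB is under the 2048 KB limit
puts over_limit?(3_000, limit)      # 3000 KB is over the limit
```

Separating the sampling from the comparison keeps the threshold logic easy to reason about, whatever the platform uses to read memory.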

CPU Usage

w.restart_if do |restart|
  restart.condition(:cpu_usage) do |c|
    c.above = 50.percent
    c.times = [3, 5]
  end
end
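`c.times = [3, 5]` reads as "trigger when the condition held in at least 3 of the last 5 checks", which keeps a single CPU spike from causing a restart. A sketch of that sliding-window tally; `Tally` is a hypothetical class written for this example, not god's implementation:

```ruby
# Hypothetical sliding-window tally: fires when at least +required+
# of the last +window+ samples were true.
class Tally
  def initialize(required, window)
    @required = required
    @window = window
    @history = []
  end

  # Record one check result and report whether the threshold is met.
  def push(flag)
    @history << flag
    @history.shift if @history.size > @window
    @history.count(true) >= @required
  end
end

tally = Tally.new(3, 5)
samples = [true, false, true, false, true, false]
results = samples.map { |over| tally.push(over) }
p results
```

Only the fifth sample trips the tally here: it is the first point at which three of the last five checks were over the limit.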

HTTP Status Codes

w.restart_if do |restart|
  restart.condition(:http_response_code) do |c|
    c.host = 'localhost'
    c.port = '80'
    c.path = '/heartbeat'
    c.code_is_not = %w(200 304)
  end
end
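The idea behind this condition can be sketched with Ruby's standard `Net::HTTP`: fetch the path, then restart if the status code is not one of the accepted ones. Both helper names below (`response_code`, `restart_needed?`) and the five-second timeout are assumptions made for this example, not god's own code:

```ruby
require 'net/http'

# Fetch +path+ and return the status code as a String, or nil when the
# request fails (connection refused, timeout, and so on).
def response_code(host, port, path, timeout: 5)
  http = Net::HTTP.new(host, Integer(port))
  http.open_timeout = timeout
  http.read_timeout = timeout
  http.get(path).code
rescue StandardError
  nil
end

# Mirrors code_is_not: restart unless the code is in the accepted list.
# A nil code (no response at all) also counts as a failure.
def restart_needed?(code, accepted_codes)
  !accepted_codes.include?(code.to_s)
end

puts restart_needed?('200', %w(200 304))
puts restart_needed?('500', %w(200 304))
```

In a real check the two would be combined as `restart_needed?(response_code('localhost', 80, '/heartbeat'), %w(200 304))`.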

Notifications

# email_contacts.god
God::Contacts::Email.message_settings = {
  :from => 'god@jnewland.com'
}

God::Contacts::Email.server_settings = {
  :address => "smtp.jnewland.com",
  :port => 25,
  :domain => "jnewland.com",
  :authentication => :plain,
  :user_name => "god",
  :password => ""
}

God.contact(:email) do |c|
  c.name = 'jesse'
  c.email = 'jnewland@gmail.com'
end

# http://github.com/mojombo/god/tree/master/lib/god/contacts/jabber.rb
require 'jabber'

God::Contacts::Jabber.settings = {
  :jabber_id => 'bot@jnewland.com',
  :password => ' '
}

God.contact(:jabber) do |c|
  c.name = 'jesse'
  c.jabber_id = 'jnewland@gmail.com'
end

w.restart_if do |restart|
  restart.condition(:cpu_usage) do |c|
    c.above = 50.percent
    c.times = [3, 5]
    c.notify = "jesse"
  end
end

Monitoring Mongrels

Putting it all together

• Process Running

• Memory Usage

• CPU Usage

• HTTP Response Code

• Notifications

• Capistrano?

• Web Interface?

# rails/config/god/app.god
RAILS_ROOT = ENV['RAILS_ROOT'] ||= "/var/www/apps/test/current"
RUBY = `which ruby`.chomp
MONGREL_RAILS = `which mongrel_rails`.chomp
RAILS_ENV = ENV['RAILS_ENV'] ||= 'production'
MONGRELS = 2
MONGREL_START_PORT = 3000
USER = GROUP = 'deploy'

# one watch per mongrel, on consecutive ports
0.upto(MONGRELS-1) do |n|
  port = MONGREL_START_PORT+n
  God.watch do |w|
    w.group = 'mongrels'
    w.name = "mongrel_#{port}"
    w.uid = USER
    w.gid = GROUP
    w.interval = 30.seconds
    w.start = "#{RUBY} #{MONGREL_RAILS} start --environment #{RAILS_ENV} --chdir #{RAILS_ROOT} --port #{port}"
    w.start_grace = 90.seconds
    w.restart_grace = 90.seconds
    w.log = File.join(RAILS_ROOT, "log/mongrel_#{port}.log")

    # process running

    # memory usage

    # cpu usage

    # http response code
  end
end
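The loop defines one watch per Mongrel port. A plain-Ruby sketch of what it expands to, with no god required; `mongrel_watch_specs` is a hypothetical helper, and the start command omits the `which` lookups for brevity:

```ruby
# Hypothetical helper: list the settings each generated watch would get.
def mongrel_watch_specs(count:, start_port:, rails_root:, rails_env:)
  (0...count).map do |n|
    port = start_port + n
    {
      name: "mongrel_#{port}",
      group: 'mongrels',
      start: "mongrel_rails start --environment #{rails_env} " \
             "--chdir #{rails_root} --port #{port}",
      log: File.join(rails_root, "log/mongrel_#{port}.log")
    }
  end
end

specs = mongrel_watch_specs(
  count: 2,
  start_port: 3000,
  rails_root: '/var/www/apps/test/current',
  rails_env: 'production'
)
specs.each { |spec| puts "#{spec[:name]} -> #{spec[:log]}" }
```

Because every watch shares the `mongrels` group, commands such as `god restart mongrels` act on all of them at once.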

class PulseController < ApplicationController
  session :off

  def pulse
    if (ActiveRecord::Base.connection.execute("select 1").num_rows rescue 0) == 1
      render :text => "OK #{Time.now.utc.to_s(:db)}"
    else
      render :text => 'ERROR', :status => :internal_server_error
    end
  end
end

Pulse Controller
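The pulse action is a database round trip wrapped in a health check: any failure, including an exception, turns into a 500 that the HTTP response code condition can catch. The same decision in framework-free Ruby, as a sketch only; the `pulse` function and its `db_probe` argument are stand-ins invented for this example:

```ruby
# Hypothetical stand-in for the controller action. +db_probe+ is any
# callable returning the row count of a trivial query such as "select 1".
def pulse(db_probe, now: Time.now.utc)
  rows = begin
    db_probe.call
  rescue StandardError
    0   # a database error counts as an unhealthy response
  end

  if rows == 1
    [200, "OK #{now.strftime('%Y-%m-%d %H:%M:%S')}"]
  else
    [500, 'ERROR']
  end
end

p pulse(-> { 1 })
p pulse(-> { raise 'database is down' })
```

Pointing the watch's `c.path` at a route for this action makes a dead database connection look the same to god as a dead Mongrel.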

Capistrano

# rails/config/deploy.rb
role :app, "test.jnewland.com"

require 'san_juan'
san_juan.role :app, %w(mongrels)

# overwrite the default start, stop, and restart tasks to use god
namespace :deploy do

  desc "Use god to restart the app"
  task :restart do
    god.all.reload
    god.app.mongrels.restart
  end

  desc "Use god to start the app"
  task :start do
    god.all.start
  end

  desc "Use god to stop the app"
  task :stop do
    god.all.terminate
  end

end

$ cap -T

...

cap god:all:quit               # Quit god, but not the processes it's monitoring
cap god:all:reload             # Reloading God Config
cap god:all:start              # Start god
cap god:all:start_interactive  # Start god interactively
cap god:all:status             # Describe the status of the running tasks on ...
cap god:all:terminate          # Terminate god and all monitored processes
cap god:app:mongrels:log       # Log mongrels
cap god:app:mongrels:remove    # Remove mongrels
cap god:app:mongrels:restart   # Restart mongrels
cap god:app:mongrels:start     # Start mongrels
cap god:app:mongrels:stop      # Stop mongrels
cap god:app:mongrels:unmonitor # Unmonitor mongrels
cap god:app:quit               # Quit god, but not the processes it's monitoring
cap god:app:reload             # Reload the god config file
cap god:app:start              # Start god
cap god:app:start_interactive  # Start god interactively
cap god:app:status             # Describe the status of the running tasks
cap god:app:terminate          # Terminate god and all monitored processes

...

ZOMG WHAT NOW?

#rails/config/god/app.god

...

require 'god_web'
GodWeb.watch(:port => 3003)

...

Advanced Features

# jabber_bot.god
w.restart_if do |restart|
  restart.condition(:lambda) do |c|
    c.interval = 15.seconds
    c.lambda = lambda do
      require 'xmpp4r-simple'
      im = Jabber::Simple.new(
        'god@jnewland.com',
        PASSWORDS['god@jnewland.com']
      )
      # ping the bot, then give it a few seconds to answer
      im.deliver('bot@jnewland.com', 'ping')
      sleep(5)
      return true unless im.received_messages?
      chat = im.received_messages.find { |msg| msg.type == :chat }
      # restart when no chat reply arrived or the reply is not a pong
      return true unless chat && chat.body =~ /pong/
      false
    end
  end
end

Lambda Conditions
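A lambda condition returns true when the watch should be acted on. The ping/pong decision from the Jabber bot example can be isolated and tried without a Jabber server; `Message` and `pong_missing?` below are hypothetical names used only for this sketch:

```ruby
# Hypothetical message type standing in for a received Jabber message.
Message = Struct.new(:type, :body)

# True (restart) when no chat message arrived or its body is not a pong.
def pong_missing?(messages)
  chat = messages.find { |msg| msg.type == :chat }
  chat.nil? || chat.body !~ /pong/
end

puts pong_missing?([])
puts pong_missing?([Message.new(:chat, 'pong')])
puts pong_missing?([Message.new(:chat, 'hello')])
```

Handling the "no chat message at all" case explicitly avoids calling `body` on `nil` when the bot answers with some other message type.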

# custom_behavior.god
module God
  module Behaviors
    class Speak < Behavior

      def before_start
        `say "Starting now"`
        'announced start'
      end

      def before_stop
        `say "Stopping now"`
        'announced stop'
      end

    end
  end
end

God.watch do |w|
  ...
  w.behavior(:speak)
  ...
end

Behaviors

# mongrel_cluster.god
require 'lib/god_mongrel_cluster'

Dir.glob('/etc/mongrel_cluster/*.conf').each do |mongrel_cluster|
  cluster = GodMongrelCluster.new(mongrel_cluster)
  cluster.watch
end

mongrel_cluster
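The `GodMongrelCluster` class lives in the god_examples repository and is not shown here. As a rough sketch of the idea only, a class like it would need to read a mongrel_cluster YAML config and work out one port per server. The `cluster_ports` helper and the sample config below are assumptions made for illustration:

```ruby
require 'yaml'

# Hypothetical helper: derive the port of each Mongrel from a
# mongrel_cluster-style config with "port" and "servers" keys.
def cluster_ports(conf_yaml)
  conf = YAML.safe_load(conf_yaml)
  first_port = Integer(conf.fetch('port'))
  Array.new(Integer(conf.fetch('servers'))) { |i| first_port + i }
end

SAMPLE_CONF = <<~YAML
  cwd: /var/www/apps/test/current
  environment: production
  port: "8000"
  servers: 3
YAML

p cluster_ports(SAMPLE_CONF)
```

Each resulting port would then get its own `God.watch`, much like the loop over `MONGRELS` earlier.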

Questions?

http://www.flickr.com/photos/stuckincustoms/522313332/
http://www.flickr.com/photos/91499534@N00/2335651912/
http://www.flickr.com/photos/code_martial/1411893703/
http://www.flickr.com/photos/extranoise/163847669/
http://www.flickr.com/photos/vanz/2480741207/
http://www.flickr.com/photos/smartjunco/281071006/
http://www.flickr.com/photos/davesag/8312984/
http://www.flickr.com/photos/gaetanlee/298178764/
http://www.flickr.com/photos/vrogy/511644410/
http://www.flickr.com/photos/jeffsmallwood/299208539/
http://www.flickr.com/photos/cjdaniel/2240123159/
http://www.flickr.com/photos/bobbygreg/139080175/
http://www.flickr.com/photos/lordelo/12958772/

Hooray Flickr! (And Creative Commons)
