The good part The less good part The solution
(RMLL 2010) Nagios® and large environments monitoring
Gabes Jean
RMLL 2010, Bordeaux, July 7
<SolutionLinux> Nagios® is the industry standard in system monitoring </SolutionLinux>
In real life
Nagios® is a daemon that schedules checks and reacts to the results, for example by sending an email or trying to correct the problem. (And that's all!)
Simple Nagios® daemon
Green IT slide
The good part
Size problems
For 500 servers and 7000 services ⇒ 60K lines of configuration
"A good computer scientist is lazy"
Sysadmins are (very) good computer scientists
So the configuration needs to be factored
Simple inheritance
Nagios® has templates that hosts and services can inherit from.
Simple inheritance example
Template
name web_tpl
register 0
check_command check_http
contact_groups AdmWeb
Defined service
use web_tpl
service_description Http
host_name web_1
Service in memory
service_description Http
host_name web_1
check_command check_http
contact_groups AdmWeb
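The merge shown on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the Nagios® implementation: the `resolve` helper and the dictionary representation are assumptions made for the example, while the property names and values come from the slide.

```python
# Hypothetical helper (not Nagios code): resolve a single-template "use".
def resolve(obj, templates):
    """Return a copy of obj with missing properties filled from its template."""
    resolved = dict(templates.get(obj.get("use"), {}))
    resolved.update(obj)  # the object's own values win over the template's
    # 'use', 'name' and 'register' only drive the resolution; drop them
    for key in ("use", "name", "register"):
        resolved.pop(key, None)
    return resolved


templates = {
    "web_tpl": {
        "name": "web_tpl",
        "register": "0",
        "check_command": "check_http",
        "contact_groups": "AdmWeb",
    }
}
service = {"use": "web_tpl", "service_description": "Http", "host_name": "web_1"}

in_memory = resolve(service, templates)
print(in_memory)
```

The resulting dictionary holds the same four properties as the "Service in memory" column.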
Not enough
What about crossed criteria?
GNU/Linux - Windows ⇒ check command
file server - web server ⇒ contacts
prod - qualif ⇒ notification period
Define 2³ = 8 templates?
Change 1 parameter ⇒ 4 modifications!
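The counting behind these two numbers can be checked with a short sketch. The dictionary keys (`os`, `role`, `env`) are labels chosen for the example; the values are the ones listed on the slide.

```python
from itertools import product

# Three two-valued criteria, as on the slide.
criteria = {
    "os": ["GNU/Linux", "Windows"],         # drives the check command
    "role": ["file server", "web server"],  # drives the contacts
    "env": ["prod", "qualif"],              # drives the notification period
}

# One combined template per combination of values: 2 * 2 * 2.
combined = list(product(*criteria.values()))

# Changing the notification period of "prod" touches every template built on it.
touched = [combo for combo in combined if "prod" in combo]

print(len(combined), len(touched))
```

With single inheritance every combination needs its own template, so each value is repeated in half of them.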
Multiple inheritance
An element can inherit from several templates
The first value found is used
Defined host
define host{
host_name web_1
address 209.85.227.104
use Linux,Web,Prod
}
Host in memory
define host {
host_name web_1
address 209.85.227.104
check_command check_tcp!22
contacts AdminWeb1
notification_period 24x7
}
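A minimal sketch of the "first value found" rule follows. The slide does not show the contents of the Linux, Web and Prod templates, so the ones below are assumptions picked to reproduce the host in memory; the extra `check_command` in Prod is hypothetical and is only there to show that an earlier template takes precedence.

```python
# Hypothetical template contents (not shown on the slide).
templates = {
    "Linux": {"check_command": "check_tcp!22"},
    "Web": {"contacts": "AdminWeb1"},
    "Prod": {"notification_period": "24x7", "check_command": "check_ping"},
}


def resolve_multi(obj, templates):
    """Fill missing properties from the templates listed in 'use', left to right."""
    resolved = {key: value for key, value in obj.items() if key != "use"}
    names = [n.strip() for n in obj.get("use", "").split(",") if n.strip()]
    for name in names:
        for key, value in templates[name].items():
            # setdefault keeps the first value found and ignores later ones
            resolved.setdefault(key, value)
    return resolved


host = resolve_multi(
    {"host_name": "web_1", "address": "209.85.227.104", "use": "Linux,Web,Prod"},
    templates,
)
print(host)
```

Because Linux is listed before Prod, its `check_command` is kept and the one from Prod is ignored.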
GMO inside (implicit inheritance)
Why redefine a service property that is already set on the host?
An empty service parameter is filled with the host's value
Fewer duplicated services
Only available for the notification parameters :(
Implicit inheritance
Hosts defined
define host{
host_name web_1
contacts AdmWeb
}
define host{
host_name file_1
contacts AdmFile
}
Defined service
define service{
host_name web_1,file_1
check_command check_tcp!22
#contacts none
}
Services in memory
define service{
host_name web_1
check_command check_tcp!22
contacts AdmWeb
}
define service{
host_name file_1
check_command check_tcp!22
contacts AdmFile
}
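The expansion from one defined service to two services in memory can be sketched as follows. The `expand` function and its `inherited` parameter are assumptions made for the illustration; only `contacts` is treated as implicitly inherited here, in line with the remark that this applies to the notification parameters only.

```python
hosts = {
    "web_1": {"contacts": "AdmWeb"},
    "file_1": {"contacts": "AdmFile"},
}
service = {"host_name": "web_1,file_1", "check_command": "check_tcp!22"}


def expand(service, hosts, inherited=("contacts",)):
    """Create one service per host and fill empty parameters from the host."""
    result = []
    for host_name in service["host_name"].split(","):
        svc = dict(service, host_name=host_name)
        for key in inherited:
            # Only fill the parameter when the service does not set it itself
            if key not in svc and key in hosts[host_name]:
                svc[key] = hosts[host_name][key]
        result.append(svc)
    return result


services = expand(service, hosts)
for svc in services:
    print(svc)
```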
Service on host groups
Apply services to host groups instead of to hosts one by one
Fewer errors
Far fewer configuration lines!
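As a rough illustration, a service attached to a host group expands to one service per member. The group name `web-servers` and its members are hypothetical; `hostgroup_name` is the directive Nagios® uses in a service definition for this purpose, and the `expand_hostgroup` helper is an assumption made for the sketch.

```python
# Hypothetical host group and members.
hostgroups = {"web-servers": ["web_1", "web_2", "web_3"]}

service = {
    "hostgroup_name": "web-servers",
    "service_description": "Http",
    "check_command": "check_http",
}


def expand_hostgroup(service, hostgroups):
    """Return one service per member of the service's host group."""
    members = hostgroups[service["hostgroup_name"]]
    base = {k: v for k, v in service.items() if k != "hostgroup_name"}
    return [dict(base, host_name=member) for member in members]


expanded = expand_hostgroup(service, hostgroups)
print(len(expanded), [svc["host_name"] for svc in expanded])
```

One definition covers every member, and a host added to the group picks up the service without any new service definition.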
Number of lines at the end
All these kinds of inheritance can be used together
From 60K lines to 7K
Lazy admin is happy :)
The less good part
Distributed environments
When one Nagios® is not enough (more than 10k checks in 5 minutes)
Add more daemons :)
But there are some difficulties with configuration and data management
More hacks than real solutions
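To put the threshold in perspective, the figure of 10k checks in 5 minutes can be converted to a sustained rate. This is plain arithmetic on the number quoted on the slide, not a benchmark.

```python
checks = 10_000
window_seconds = 5 * 60

# Average number of checks the daemon must launch every second
rate = checks / window_seconds
print(f"{rate:.1f} checks per second")
```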
Manual management
DNX: distributed checks
NDO: all data in a database
Warning: NDO is no longer maintained
Centreon: easier to manage
High Availability
Murphy’s law is the only stable thing in the world...
The monitoring solution must be at least as available as the most available element it monitors
Active/active
Active/passive
And with distributed setups...
Manual management: even harder :)
DNX: does not change much
NDO: quite hard (connections must be dropped)
Centreon: not handled (for now)
A not so happy world
Technical difficulties in managing huge environments
Code that is hard to hack
Serious competitors (Zenoss, Zabbix, ...)
Some problems with the Nagios® copyright and the community
Is the project moving from Open Source to Open Core?
Already forked (Icinga), but no real changes
The solution
As easy as in marketing cloud slides
How does it work?
No longer monolithic: the Unix way!
Arbiter: reads, cuts and dispatches the configuration
Schedulers: schedule... and that's all!
Pollers: get checks from the schedulers and launch them
Reactionner: launches notifications and event handlers
Broker: grabs all the data and exports it
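The division of labour between these roles can be sketched as a toy, single-process model. Nothing here is taken from the Shinken code base: every class and function name is hypothetical, the round-robin cut is just one possible way to split a configuration, and `fake_check` stands in for a real plugin.

```python
from collections import deque


def arbiter_dispatch(hosts, scheduler_count):
    """Arbiter: cut the configuration into parts, one per scheduler."""
    return [hosts[i::scheduler_count] for i in range(scheduler_count)]


class Scheduler:
    """Scheduler: decides what to check and queues the follow-up work."""

    def __init__(self, hosts):
        self.hosts = hosts
        self.checks = deque()   # waiting for a poller
        self.actions = deque()  # waiting for the reactionner
        self.events = deque()   # waiting for the broker

    def schedule(self):
        self.checks.extend(self.hosts)

    def consume(self, host, state):
        self.events.append((host, state))
        if state != "OK":
            self.actions.append(f"notify: {host} is {state}")


def poller_run(scheduler, launch):
    """Poller: take checks from a scheduler, launch them, hand back results."""
    while scheduler.checks:
        host = scheduler.checks.popleft()
        scheduler.consume(host, launch(host))


def reactionner_run(scheduler):
    """Reactionner: launch the notifications queued by a scheduler."""
    sent = list(scheduler.actions)
    scheduler.actions.clear()
    return sent


def broker_run(scheduler):
    """Broker: grab all the data so that it can be exported elsewhere."""
    exported = list(scheduler.events)
    scheduler.events.clear()
    return exported


def fake_check(host):
    # Stand-in for a real plugin: hosts named *_down are reported CRITICAL.
    return "CRITICAL" if host.endswith("_down") else "OK"


parts = arbiter_dispatch(["web_1", "web_2_down", "file_1", "file_2"], 2)
schedulers = [Scheduler(part) for part in parts]
notifications, exported = [], []
for sched in schedulers:
    sched.schedule()
    poller_run(sched, fake_check)
    notifications += reactionner_run(sched)
    exported += broker_run(sched)

print(notifications)
print(exported)
```

In this model the scheduler never runs a check or sends a notification itself; it only hands out work and records results, which is the point of splitting the monolithic daemon.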
A little illustration?
What if a node dies?
Example of mixed architecture
GNU/Linux vs Windows
And for DMZ
LAN vs DMZ
Multisite? A storm in the clouds!
An Asian poller takes a check for a US host from an EU scheduler...
A poller of customer A takes a job from a scheduler of customer B
Multisite? OK, as long as there is still a single place for administration and data
Realms
What are Realms?
Think of a realm as a resource pool in which to put hosts or host groups
A way to split by organization
Realms can form a hierarchy
Still a single Arbiter
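One way to picture a realm hierarchy is as a small tree. The sketch below is an assumption-laden illustration, not Shinken's data model: the `Realm` class, the realm and host names, and the rule in `may_take` (a poller only takes checks for hosts in its own realm or in a sub-realm) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Realm:
    name: str
    parent: Optional["Realm"] = None
    hosts: list = field(default_factory=list)

    def is_within(self, other):
        """True if this realm is `other` or one of its descendants."""
        realm = self
        while realm is not None:
            if realm is other:
                return True
            realm = realm.parent
        return False


# Hypothetical hierarchy: one top-level realm with two sub-realms.
world = Realm("World")
europe = Realm("Europe", parent=world, hosts=["web_eu"])
asia = Realm("Asia", parent=world, hosts=["web_asia"])


def may_take(poller_realm, host_realm):
    # Assumed rule: a poller serves its own realm and the realms below it.
    return host_realm.is_within(poller_realm)


print(may_take(asia, europe))    # an Asia poller and a Europe host
print(may_take(europe, europe))  # a Europe poller and a Europe host
print(may_take(world, europe))   # a top-level poller and a Europe host
```

Under this rule the cross-continent and cross-customer mix-ups described earlier cannot happen, while a poller placed in the top-level realm can still serve every site.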
Distinct realms
Global realm
Imagine...
One answer
Project with a strange history
Started as a proof of concept for Nagios®
Huge performance! More than 150K checks in 5 minutes
Multiplatform (runs everywhere Python runs!)
Too far along to stop
THE proposal(s)
THE... no (was Python too much?)
Export modules?
NDO: MySQL/Oracle for Centreon
Merlin: for Ninja
Status.dat: if you still use the old CGI
LiveStatus: for Thruk
Log: all logs in one file
CouchDB, sqlite: for... nothing in fact :)
How to consider it?
Not a fork
New monitoring tool compatible with Nagios® configuration and plugins
How to test it?
(easy) install from source
(lazy) get the demo VM (with the Ninja and Thruk interfaces)
http://www.shinken-monitoring.org
Thanks to Monitoring-fr.org for the VM hosting :)
Monitoring-fr, formerly Nagios-fr
Remember the ®?
Hint: do not joke with the Nagios® trademark
A new start for the community (OpenNMS, Zabbix, ...)
Come see our demos :)
Questions
(Big thanks to my wife for the diagrams)
Questions?
Examples:
Why Python?
Why the Affero GPL license?
Why the name Shinken?
Vi or Emacs?