The good part The less good part The solution

(RMLL 2010) Nagios® and large environments monitoring

Gabès Jean

RMLL 2010, Bordeaux, July 7

<SolutionLinux> Nagios® is the industry standard in system monitoring </SolutionLinux>

In real life

Nagios® is a daemon that schedules checks and reacts to the results, for example by sending an email or trying to correct the problem. (And that’s all!)

Simple Nagios® daemon

Green IT slide

The good part

Size problems

For 500 servers and 7000 services ⇒ 60K lines

“A good computer scientist is lazy”

Sysadmins are (very) good computer scientists

They need to factorize the configuration

Simple inheritance

Nagios® has templates that hosts and services can inherit from.

Simple inheritance example

Template

define service{
    name                 web_tpl
    register             0
    check_command        check_http
    contact_groups       AdmWeb
}

Defined service

define service{
    use                  web_tpl
    service_description  Http
    host_name            web_1
}

Service in memory

define service{
    service_description  Http
    host_name            web_1
    check_command        check_http
    contact_groups       AdmWeb
}
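The step from the template and the defined service to the service in memory can be sketched in a few lines of Python. This is only an illustration under stated assumptions: objects are modelled as plain dictionaries, and the `resolve` helper is a hypothetical name, not part of Nagios®.

```python
# Hypothetical in-memory model of the slide's objects (plain dicts, not a Nagios API).
objects = [
    {"name": "web_tpl", "register": "0",
     "check_command": "check_http", "contact_groups": "AdmWeb"},
    {"use": "web_tpl", "service_description": "Http", "host_name": "web_1"},
]


def resolve(obj, templates):
    """Return obj completed with the values of the template named in 'use'."""
    resolved = dict(templates.get(obj.get("use"), {}))  # template values first
    resolved.update(obj)                                # the object's own values win
    for key in ("use", "name", "register"):             # bookkeeping keys are dropped
        resolved.pop(key, None)
    return resolved


# Templates are the objects that carry a 'name'; 'register 0' keeps them out of the result.
templates = {o["name"]: o for o in objects if "name" in o}
services = [resolve(o, templates) for o in objects if o.get("register") != "0"]
print(services[0])
```

Only the registered object ends up in `services`; the template itself is never monitored.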

Not enough

What about crossed criteria?

GNU/Linux - Windows ⇒ check command

file server - web server ⇒ contacts

prod - qualification ⇒ notification period

Define 2^3 = 8 templates?

Change 1 parameter ⇒ 4 modifications!
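The two numbers on this slide can be checked by enumerating the combinations. The sketch below assumes one template per combination of the three criteria; the dictionary layout is illustrative only.

```python
from itertools import product

# The three crossed criteria from the slide, each with two possible values.
criteria = {
    "check_command": ["GNU/Linux", "Windows"],
    "contacts": ["file server", "web server"],
    "notification_period": ["prod", "qualification"],
}

# One template per combination: 2 * 2 * 2 = 8.
combinations = list(product(*criteria.values()))
print(len(combinations))

# Changing the parameter tied to "Windows" touches every template that contains it.
affected = [c for c in combinations if "Windows" in c]
print(len(affected))
```

Each value appears in half of the combinations, which is where the 4 modifications come from.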

Multiple inheritance

An element can inherit from several templates

The first value found is taken

Multiple inheritance

Defined host

define host{
    host_name            web_1
    address              209.85.227.104
    use                  Linux,Web,Prod
}

Host in memory

define host{
    host_name            web_1
    address              209.85.227.104
    check_command        check_tcp!22
    contacts             AdminWeb1
    notification_period  24x7
}
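A minimal sketch of the "first value found is taken" rule follows. The contents of the `Linux`, `Web` and `Prod` templates are assumptions (the slides only show the host before and after resolution), and `resolve_multiple` is a hypothetical helper name.

```python
# Assumed template contents; the extra 'contacts' in Prod is there to show precedence.
templates = {
    "Linux": {"check_command": "check_tcp!22"},
    "Web": {"contacts": "AdminWeb1"},
    "Prod": {"notification_period": "24x7", "contacts": "AdminProd"},
}

host = {"host_name": "web_1", "address": "209.85.227.104", "use": "Linux,Web,Prod"}


def resolve_multiple(obj, templates):
    """Fill missing keys from the templates in 'use', scanned left to right."""
    resolved = {k: v for k, v in obj.items() if k != "use"}
    for name in obj.get("use", "").split(","):
        for key, value in templates.get(name.strip(), {}).items():
            resolved.setdefault(key, value)  # the first value found is kept
    return resolved


in_memory = resolve_multiple(host, templates)
print(in_memory)
```

Because `Web` is listed before `Prod`, its `contacts` value wins, so the order of the names in `use` matters.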

GMO inside (implicit inheritance)

Why redefine a service property that is already set on the host?

An empty service parameter is filled with the host’s value

Fewer service duplications

Available for the notification parameters only :(

Implicit inheritance

Hosts defined

define host{
    host_name      web_1
    contacts       AdmWeb
}

define host{
    host_name      file_1
    contacts       AdmFile
}

Defined service

define service{
    host_name      web_1,file_1
    check_command  check_tcp!22
    #contacts none
}

Implicit inheritance

Services in memory

define service{
    host_name      web_1
    check_command  check_tcp!22
    contacts       AdmWeb
}

define service{
    host_name      file_1
    check_command  check_tcp!22
    contacts       AdmFile
}
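The expansion shown on these two slides can be sketched as follows. The list of implicitly inherited keys is an assumption made for illustration (the slides only say that the notification part is covered), and `expand` is a hypothetical helper, not a Nagios® function.

```python
hosts = {
    "web_1": {"contacts": "AdmWeb"},
    "file_1": {"contacts": "AdmFile"},
}
service = {"host_name": "web_1,file_1", "check_command": "check_tcp!22"}

# Assumption: an illustrative set of notification-related keys, not an official list.
IMPLICIT_KEYS = ("contacts", "contact_groups",
                 "notification_period", "notification_interval")


def expand(service, hosts):
    """Create one service per host and fill empty notification keys from the host."""
    expanded = []
    for host_name in service["host_name"].split(","):
        copy = dict(service, host_name=host_name)
        for key in IMPLICIT_KEYS:
            if key not in copy and key in hosts[host_name]:
                copy[key] = hosts[host_name][key]
        expanded.append(copy)
    return expanded


in_memory = expand(service, hosts)
for svc in in_memory:
    print(svc)
```

A value set explicitly on the service is left untouched, since only missing keys are filled.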

Service on host groups

Apply services to host groups instead of to hosts one by one

Fewer errors

Far less configuration lines!
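As a rough sketch of why this saves lines: one service attached to a host group stands for one service per member. The group name and its members below are hypothetical, and `expand_on_hostgroup` is an illustrative helper.

```python
# Hypothetical host group; the member names are made up for the example.
hostgroups = {"web-servers": ["web_1", "web_2", "web_3"]}

service = {
    "hostgroup_name": "web-servers",
    "service_description": "Http",
    "check_command": "check_http",
}


def expand_on_hostgroup(service, hostgroups):
    """Return one concrete service per member of the service's host group."""
    members = hostgroups[service["hostgroup_name"]]
    base = {k: v for k, v in service.items() if k != "hostgroup_name"}
    return [dict(base, host_name=member) for member in members]


per_host = expand_on_hostgroup(service, hostgroups)
print(len(per_host), [s["host_name"] for s in per_host])
```

Adding a host to the group is then enough to give it every service attached to that group.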

Number of lines at the end

All these kinds of inheritance can be used together

From 60K lines to 7K

The lazy admin is happy :)

The less good part

Distributed environments

When one Nagios® is not enough (more than 10k checks in 5 minutes)

Add more daemons :)

But there are difficulties with configuration and data management

More hacks than real solutions

Manual management

DNX: distributed checks

NDO: all data in a database

Warning: NDO is no longer maintained

Centreon: easier to manage

High Availability

Murphy’s law is the only stable thing in the world...

The monitoring solution must be at least as available as the most available element it monitors

Active/active

Active/passive

And with distributed...

Manual management: even harder :)

DNX: does not change much

NDO: quite hard (connections must be dropped)

Centreon: not handled (for now)

A not so happy world

Technical difficulties in managing huge environments

Code that is hard to hack

Serious competitors (Zenoss, Zabbix, ...)

Some problems with the Nagios® copyright and the community

Is the project moving from Open Source to Open Core?

Already forked (Icinga), but no real changes

The solution

As easy as in marketing cloud slides

How does it work?

No longer monolithic: the Unix way!

Arbiter: reads, cuts and dispatches the configuration

Schedulers: schedule... and that’s all!

Pollers: get checks from the schedulers and launch them

Reactionner: launches notifications and event handlers

Broker: grabs all the data and exports it
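The division of roles listed above can be illustrated with a toy, single-process simulation. This is a sketch under strong assumptions, not the tool's actual code: every class and function name is hypothetical, the configuration is just a list of host names, and the "check" is a fake function.

```python
from collections import deque


def arbiter_dispatch(hosts, n_schedulers):
    """Arbiter: cut the configuration into one part per scheduler (round-robin here)."""
    parts = [[] for _ in range(n_schedulers)]
    for i, host in enumerate(hosts):
        parts[i % n_schedulers].append(host)
    return parts


class Scheduler:
    """Scheduler: only hands out the checks that are due and records their results."""

    def __init__(self, hosts):
        self.to_run = deque(hosts)
        self.results = {}

    def get_checks(self, n):
        return [self.to_run.popleft() for _ in range(min(n, len(self.to_run)))]

    def put_result(self, host, state):
        self.results[host] = state


def poller_run(scheduler, check, batch=2):
    """Poller: take checks from a scheduler, launch them, give the results back."""
    for host in scheduler.get_checks(batch):
        scheduler.put_result(host, check(host))


def reactionner_run(scheduler, notify):
    """Reactionner: launch a notification for every result that is not OK."""
    for host, state in scheduler.results.items():
        if state != "OK":
            notify(host, state)


def broker_export(schedulers):
    """Broker: grab the data of all schedulers and export it as one status table."""
    return {h: s for sched in schedulers for h, s in sched.results.items()}


def fake_check(host):
    # Stand-in for a real plugin call; only file_2 is considered broken.
    return "CRITICAL" if host == "file_2" else "OK"


hosts = ["web_1", "web_2", "file_1", "file_2"]
schedulers = [Scheduler(part) for part in arbiter_dispatch(hosts, 2)]
notifications = []
for sched in schedulers:
    while sched.to_run:
        poller_run(sched, fake_check)
    reactionner_run(sched, lambda h, s: notifications.append((h, s)))
status = broker_export(schedulers)
print(notifications, status)
```

Since a scheduler only holds a queue and results, adding pollers or schedulers does not change the other roles, which is the point of splitting the monolithic daemon.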

A little illustration?

What if a node dies?

Example of mixed architecture

GNU/Linux vs Windows

And for a DMZ

LAN vs DMZ

Multisite? Tempest in the clouds!

A poller in Asia takes a check for a US host from an EU scheduler...

A poller of customer A takes a job from a scheduler of customer B

Multisite? OK as long as there is still a single place for administration and data

Realms

What are Realms?

Think of a realm as a resource pool in which to put hosts or host groups

A way to split by organization

Realms can form a hierarchy

Still one Arbiter
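A minimal sketch of realms as a hierarchy of resource pools is shown below. The realm names, the host placement and both helper functions are hypothetical; in particular, letting a poller of a parent realm also serve its sub-realms is an assumption of this sketch, not something the slides state.

```python
# Hypothetical realm tree: child -> parent (None for the top realm).
realm_parent = {"World": None, "Europe": "World", "Asia": "World", "Paris": "Europe"}

# Hypothetical placement of hosts into realms.
host_realm = {"web_1": "Paris", "file_1": "Europe", "db_1": "Asia"}


def in_realm(realm, ancestor):
    """True if realm is ancestor itself or one of its sub-realms."""
    while realm is not None:
        if realm == ancestor:
            return True
        realm = realm_parent[realm]
    return False


def can_take(poller_realm, host):
    """A poller only takes checks for hosts of its own realm (or its sub-realms)."""
    return in_realm(host_realm[host], poller_realm)


def hosts_of(realm):
    return sorted(h for h, r in host_realm.items() if in_realm(r, realm))


print(hosts_of("Europe"))
print(can_take("Asia", "web_1"))
```

With such a rule, the Asian poller from the multisite slide would never pick up a check for a European host, while the configuration is still read by a single Arbiter.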

Distinct realms

Global realm

Imagine...

One answer

Project with a strange history

Started as a proof of concept for Nagios®

Huge performance! More than 150K checks in 5 minutes

Multiplatform (runs everywhere Python runs!)

Too far along to stop

THE proposal(s)

THE... no... (was Python too much?)

Export modules?

NDO: MySQL/Oracle for Centreon

Merlin: for Ninja

Status.dat: if you still use the old CGI

LiveStatus: for Thruk

Log: all logs in one file

CouchDB, SQLite: for... nothing, in fact :)
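The idea of one broker feeding several export modules can be sketched as a simple fan-out. Everything here is hypothetical (class names, the `manage` method, the event layout); the two modules only mimic the spirit of the Log and Status.dat entries above and do not reproduce their real formats.

```python
import io


class LogModule:
    """Appends every event as one line to a single stream (all logs in one place)."""

    def __init__(self, stream):
        self.stream = stream

    def manage(self, event):
        self.stream.write(f"{event['host']};{event['state']}\n")


class StatusModule:
    """Keeps only the latest state per host, like a status table."""

    def __init__(self):
        self.status = {}

    def manage(self, event):
        self.status[event["host"]] = event["state"]


class Broker:
    """Forwards each event it receives to every registered export module."""

    def __init__(self, modules):
        self.modules = modules

    def push(self, event):
        for module in self.modules:
            module.manage(event)


log = io.StringIO()
status = StatusModule()
broker = Broker([LogModule(log), status])
for host, state in [("web_1", "OK"), ("file_1", "CRITICAL"), ("web_1", "CRITICAL")]:
    broker.push({"host": host, "state": state})
print(log.getvalue(), status.status)
```

Supporting another front end then means writing one more module with a `manage` method, without touching the rest of the chain.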

How to consider it?

Not a fork

A new monitoring tool, compatible with Nagios® configuration and plugins

How to test it?

(easy) install from source

(lazy) get the demo VM (with the Ninja and Thruk interfaces)

http://www.shinken-monitoring.org

Thanks to Monitoring-fr.org for the VM hosting :)

Monitoring-fr, formerly Nagios-fr

Remember the ®?

Hint: do not joke with the Nagios® trademark

A new start for the community (OpenNMS, Zabbix, ...)

Come see our demos :)

Questions

(Big thanks to my wife for the diagrams)

Questions?

Examples:

Why Python?

Why the Affero GPL license?

Why the name Shinken?

Vi or Emacs?
