OSMC 2014: Introduction into collectd | Florian Forster

collectd: An introduction

About me

● Florian "octo" Forster

● Open-source work since 2001

● Started collectd in 2005

Agenda

● collectd

● Aggregation of metrics

● Alerting with Icinga

collectd

● Daemon

● collect metrics

● mangle / transport metrics

● store metrics (no retrieval)

collectd

● Open-source project

○ MIT and GPL licensed

● Platform independent

○ Linux, BSD, Solaris, AIX, HP-UX, …

○ Windows via SSC Serv (non-free)

collectd

● Agent-based design

○ Runs on each host

● Extensible via plugins

○ Language bindings (Perl, Python, Java)

○ "exec" plugin, e.g. shell scripts
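With the "exec" plugin, collectd starts an external program and reads values from its standard output. The program writes them in collectd's plain-text protocol, e.g. `PUTVAL "<identifier>" interval=<seconds> N:<value>`, where `N` means "now". Below is a minimal sketch of such a program in Python. It assumes the `COLLECTD_HOSTNAME` and `COLLECTD_INTERVAL` environment variables that the exec plugin sets for its child process. The plugin instance `example`, the type instance `load1` and the use of the load average are illustrative choices only.

```python
import os
import sys
import time


def putval_line(host, plugin, type_, value, interval,
                plugin_instance="", type_instance=""):
    """Format one value as a PUTVAL line of collectd's plain-text protocol."""
    plugin_part = f"{plugin}-{plugin_instance}" if plugin_instance else plugin
    type_part = f"{type_}-{type_instance}" if type_instance else type_
    identifier = f"{host}/{plugin_part}/{type_part}"
    # "N" asks collectd to use the current time as the timestamp.
    return f'PUTVAL "{identifier}" interval={interval:.15g} N:{value:.15g}'


def run():
    # The exec plugin passes the host name and interval to its child process.
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = float(os.environ.get("COLLECTD_INTERVAL", "10"))
    while True:
        load1 = os.getloadavg()[0]  # 1-minute load average (Unix only)
        print(putval_line(host, "exec", "gauge", load1, interval,
                          plugin_instance="example", type_instance="load1"))
        sys.stdout.flush()  # don't let lines sit in the output buffer
        time.sleep(interval)


# Only enter the loop when started by collectd's exec plugin.
if __name__ == "__main__" and "COLLECTD_INTERVAL" in os.environ:
    run()
```

Flushing after every line matters because collectd reads the program's output as it arrives. A block-buffered pipe would deliver values late and in bursts.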

collectd

● 95+ "read" (input) plugins

○ System metrics (e.g. CPU, memory)

○ Application metrics (e.g. MySQL)

○ Other (Xeon Phi, SNMP, OneWire)

collectd

● 15+ "write" (output) plugins

○ Graphite

○ RRDtool

○ RRDCacheD

○ Riemann

○ MongoDB

○ HTTP (generic)

collectd

# Input
LoadPlugin cpu
LoadPlugin memory
LoadPlugin df
<Plugin df>
  MountPoint "/"
  ValuesPercentage true
</Plugin>

# Output
LoadPlugin write_graphite
<Plugin write_graphite>
  <Node "default">
    Host "graphite.example.com"
  </Node>
</Plugin>

Example configuration

collectd

● collectd's write_graphite plugin

○ Sends metrics to Graphite

○ TCP or UDP transport

○ Metric names somewhat adjustable

→ Talk "Monitoring mit Graphite" ("Monitoring with Graphite"), 15:30 in this room, in German
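Graphite's plaintext protocol expects one line per value: `<metric path> <value> <unix timestamp>`. The sketch below approximates how write_graphite flattens a collectd identifier into such a path with its default settings:

- Dots and whitespace inside a name part are replaced by `EscapeCharacter` (default `_`).
- Plugin and plugin instance (likewise type and type instance) share one path level, joined by a dash.

Options such as `Prefix`, `Postfix` and `SeparateInstances` change the result. This is a simplified illustration, not the plugin's code.

```python
import time


def graphite_path(host, plugin, type_, plugin_instance="", type_instance="",
                  escape="_"):
    """Flatten a collectd identifier into a dotted Graphite metric path."""
    def esc(part):
        # A dot would start a new path level and whitespace is not allowed,
        # so both are replaced by the escape character.
        return "".join(escape if ch == "." or ch.isspace() else ch for ch in part)

    plugin_part = esc(plugin) + (f"-{esc(plugin_instance)}" if plugin_instance else "")
    type_part = esc(type_) + (f"-{esc(type_instance)}" if type_instance else "")
    return f"{esc(host)}.{plugin_part}.{type_part}"


def plaintext_line(path, value, timestamp=None):
    """Format one line of Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value:.15g} {int(timestamp)}\n"
```

For example, `graphite_path("example.com", "cpu", "cpu", "0", "idle")` yields `example_com.cpu-0.cpu-idle`. Such lines can be written to Carbon's plaintext listener, which uses port 2003 by default.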

Agenda

● collectd

● Aggregation of metrics

● Alerting with Icinga

Aggregation

● Aggregates often more useful for alerting

○ e.g. sum over CPUs, minimum RTT

● Metric storage often I/O bound

● Dashboards require "sane" amount of information

Aggregation

[Diagram: metrics (CPU, Disk, Memory, …) flowing from collectd to Graphite, with an Aggregation step]

Aggregation

● Load the Aggregation plugin

● Select (filter) applicable metrics

● Group by metric type and other fields

● Aggregate functions (e.g. sum)

Aggregation

LoadPlugin aggregation
<Plugin aggregation>
  <Aggregation>
  </Aggregation>
</Plugin>

example.com/battery/percent-charged
example.com/cpu-0/cpu-idle
example.com/cpu-0/cpu-user
example.com/cpu-0/cpu-wait
example.com/cpu-1/cpu-idle
…
example.com/df-root/df_complex-free
example.com/df-root/df_complex-used
example.com/df-root/df_complex-reserved

Load the aggregation plugin

Aggregation: Selection

● Five fields usable for selection

○ Host

○ Plugin

○ PluginInstance

○ Type (mandatory)

○ TypeInstance
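The five fields come from the identifier format `host/plugin[-plugin_instance]/type[-type_instance]`. The sketch below splits an identifier into these fields and checks it against a selection in which fields left unset match anything. It assumes the first dash separates a name from its instance, as in `cpu-0` or `df_complex-free`. The aggregation plugin also accepts regular expressions written as `/…/`; this sketch does exact matching only, and its helper names are hypothetical.

```python
from typing import NamedTuple


class Identifier(NamedTuple):
    host: str
    plugin: str
    plugin_instance: str
    type_: str
    type_instance: str


def parse_identifier(text: str) -> Identifier:
    """Split 'host/plugin[-instance]/type[-instance]' into the five fields."""
    host, plugin_part, type_part = text.split("/")
    # Only the first dash separates a name from its instance.
    plugin, _, plugin_instance = plugin_part.partition("-")
    type_, _, type_instance = type_part.partition("-")
    return Identifier(host, plugin, plugin_instance, type_, type_instance)


def is_selected(ident: Identifier, **wanted: str) -> bool:
    """True if every field named in `wanted` matches exactly; others match anything."""
    return all(getattr(ident, field) == value for field, value in wanted.items())


metrics = [
    "example.com/battery/percent-charged",
    "example.com/cpu-0/cpu-idle",
    "example.com/df-root/df_complex-free",
]
# Equivalent of: Plugin "cpu" / Type "cpu"
cpu = [m for m in metrics if is_selected(parse_identifier(m), plugin="cpu", type_="cpu")]
```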

Aggregation: Selection

LoadPlugin aggregation
<Plugin aggregation>
  <Aggregation>
    Plugin "cpu"
    Type "cpu"
  </Aggregation>
</Plugin>

example.com/cpu-0/cpu-idle
example.com/cpu-0/cpu-user
example.com/cpu-0/cpu-wait
example.com/cpu-1/cpu-idle
example.com/cpu-1/cpu-user
example.com/cpu-1/cpu-wait
example.com/cpu-2/cpu-idle
example.com/cpu-2/cpu-user
example.com/cpu-2/cpu-wait

Select metrics

Aggregation: Grouping

● Four fields usable for grouping

○ Host

○ Plugin

○ PluginInstance

○ TypeInstance

● At least one field not grouped by (values are aggregated across it)
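Grouping can be pictured as building a key from the GroupBy fields, plus the fields already fixed by the selection, and collapsing every other field into a wildcard. The slides write this wildcard as `???`. The sketch below assumes identifiers are given as five-tuples `(host, plugin, plugin_instance, type, type_instance)`. The `???` marker, the `fixed` parameter and the function name are illustrative and do not reflect collectd's internal data structures.

```python
from collections import defaultdict

FIELDS = ("host", "plugin", "plugin_instance", "type", "type_instance")
GROUPABLE = {"host", "plugin", "plugin_instance", "type_instance"}


def group(identifiers, group_by, fixed=("type",)):
    """Map each group key to the identifiers that fall into it.

    group_by: fields kept per group (only the four groupable ones).
    fixed:    fields pinned by the selection; they are constant anyway.
    Every other field is collapsed to the placeholder "???".
    """
    unknown = set(group_by) - GROUPABLE
    if unknown:
        raise ValueError(f"cannot group by {sorted(unknown)}")
    keep = set(group_by) | set(fixed)
    groups = defaultdict(list)
    for ident in identifiers:
        key = tuple(value if name in keep else "???"
                    for name, value in zip(FIELDS, ident))
        groups[key].append(ident)
    return dict(groups)
```

With three CPUs selected by `Plugin "cpu"` and `Type "cpu"`, calling `group(ids, ["host", "type_instance"], fixed=("plugin", "type"))` produces one group per CPU state. The keys correspond to `example.com/cpu-???/cpu-idle` and its siblings, each holding one member per CPU.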

Aggregation: Grouping

LoadPlugin aggregation
<Plugin aggregation>
  <Aggregation>
    Plugin "cpu"
    Type "cpu"
    GroupBy Host
    GroupBy TypeInstance
  </Aggregation>
</Plugin>

example.com/cpu-???/cpu-idle
example.com/cpu-???/cpu-user
example.com/cpu-???/cpu-wait

Configure grouping

Aggregation: Functions

● Up to six aggregate functions

○ Count

○ Sum

○ Minimum

○ Maximum

○ Average

○ Standard deviation
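For one group, each of the six functions reduces the members' current values to a single number. The sketch below computes all six at once. Using the population standard deviation (dividing by n) is an assumption of this example. The dictionary keys were chosen here and only loosely mirror the plugin's options such as `CalculateSum`.

```python
import math


def aggregate(values):
    """Compute the six aggregates over one group's values (one per member)."""
    n = len(values)
    if n == 0:
        raise ValueError("empty group")
    total = math.fsum(values)
    mean = total / n
    # Population standard deviation: divide by n, not n - 1.
    variance = math.fsum((v - mean) ** 2 for v in values) / n
    return {
        "count": n,
        "sum": total,
        "minimum": min(values),
        "maximum": max(values),
        "average": mean,
        "stddev": math.sqrt(variance),
    }
```

For example, `aggregate([2, 4, 4, 4, 5, 5, 7, 9])` gives a count of 8, a sum of 40, an average of 5 and a standard deviation of 2.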

Aggregation

LoadPlugin aggregation
<Plugin aggregation>
  <Aggregation>
    Plugin "cpu"
    Type "cpu"
    GroupBy Host
    GroupBy TypeInstance
    CalculateSum true
  </Aggregation>
</Plugin>

example.com/cpu-sum/cpu-idle
example.com/cpu-sum/cpu-user
example.com/cpu-sum/cpu-wait

Select aggregate function(s)

Aggregation

● Creates additional metrics

● Use filter chains to drop unwanted "raw" metrics

● Usable on client and/or server.

Agenda

● collectd

● Aggregation of metrics

● Alerting with Icinga

Alerting

● Load the Unixsock plugin

● Query and check values with collectd-nagios

● Both come with collectd

Alerting

LoadPlugin unixsock
<Plugin unixsock>
  SocketFile "/var/run/collectd-unixsock"
  SocketGroup "collectd-nagios"
  SocketPerms "0660"
  DeleteSocket true
</Plugin>

Load the Unixsock plugin

Alerting

-> GETVAL example.com/cpu-average/cpu-wait
<- 1 Value found
<- value=8.540017e+00

Query values with the Unixsock plugin
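The Unixsock plugin speaks a line-based text protocol:

- The client sends a command such as `GETVAL <identifier>`.
- collectd answers with a status line whose leading number is the count of lines that follow (negative on error).
- Then comes one `name=value` line per data source.

Below is a minimal Python client sketch. It assumes the socket path configured above and a platform with Unix domain sockets. The function names are hypothetical.

```python
import socket


def parse_getval(reply: str) -> dict:
    """Parse a GETVAL reply: a status line, then one 'name=value' line per data source."""
    lines = reply.splitlines()
    status, _, message = lines[0].partition(" ")
    count = int(status)
    if count < 0:
        raise RuntimeError(f"collectd returned an error: {message}")
    values = {}
    for line in lines[1:1 + count]:
        name, _, value = line.partition("=")
        values[name] = float(value)  # float() also accepts "nan"/"NaN"
    return values


def getval(identifier: str, path: str = "/var/run/collectd-unixsock") -> dict:
    """Send one GETVAL command over the Unixsock socket and return the values."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        with sock.makefile("rw", encoding="ascii", newline="\n") as f:
            f.write(f"GETVAL {identifier}\n")
            f.flush()
            header = f.readline()
            # The leading number tells how many value lines follow.
            count = max(int(header.split(" ", 1)[0]), 0)
            body = "".join(f.readline() for _ in range(count))
    return parse_getval(header + body)
```

Calling `getval("example.com/cpu-average/cpu-wait")` would perform the query shown above. The reader process needs permission to open the socket, which the `SocketGroup` and `SocketPerms` settings control.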

Alerting

● collectd-nagios queries and checks metrics

● Ranged -w (warn) and -c (critical) options

● Conforms to Icinga's best practices
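The -w and -c ranges use the usual Nagios plugin syntax `min:max`:

- A value outside the range triggers the warning or critical state.
- `~` stands for infinity.
- An omitted min (and colon) means 0.
- An omitted max with the colon present means +∞.
- A leading `@` inverts the range.

The sketch below implements that rule as described for Nagios-style plugins. It is written from the syntax, not from collectd-nagios's source, and its function names are hypothetical.

```python
import math


def parse_range(spec: str):
    """Parse a Nagios-style range 'min:max' into (low, high, inverted)."""
    inverted = spec.startswith("@")
    if inverted:
        spec = spec[1:]
    if ":" in spec:
        low_s, high_s = spec.split(":", 1)
    else:
        low_s, high_s = "", spec  # "10" means "0:10"
    if low_s == "~":
        low = -math.inf
    elif low_s == "":
        low = 0.0
    else:
        low = float(low_s)
    # "10:" (or "10:~") means "10 up to +infinity".
    high = math.inf if high_s in ("", "~") else float(high_s)
    return low, high, inverted


def alerts(value: float, spec: str) -> bool:
    """True if the value should raise an alert for this range."""
    low, high, inverted = parse_range(spec)
    inside = low <= value <= high
    return inside if inverted else not inside


def status(value: float, warn: str, crit: str) -> int:
    """Nagios exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL."""
    if alerts(value, crit):
        return 2
    if alerts(value, warn):
        return 1
    return 0
```

Checking critical before warning means a value outside both ranges is reported with the more severe state.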

Alerting

$ collectd-nagios -s /var/run/collectd-unixsock \
> -n cpu-average/cpu-wait -H example.com \
> -w '0:10' -c '0:25'
OKAY: 0 critical, 0 warning, 1 okay | value=8.540017;;;;

Example: collectd-nagios
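Everything after the `|` in the check output is performance data. It follows the Nagios plugin format `label=value[UOM];[warn];[crit];[min];[max]`, which is why `value=8.540017;;;;` ends in four empty fields. The parsing sketch below handles only unquoted, space-separated labels, and its function name is hypothetical.

```python
def parse_perfdata(output: str) -> dict:
    """Extract {label: (value, unit, warn, crit, min, max)} from plugin output."""
    _, sep, perf = output.partition("|")
    result = {}
    if not sep:
        return result  # no performance data present
    for item in perf.split():
        label, _, data = item.partition("=")
        # Pad to exactly five ';'-separated fields; unset ones stay "".
        fields = (data.split(";") + [""] * 5)[:5]
        raw = fields[0]
        # Split the number from an optional unit of measurement, e.g. "%".
        end = len(raw)
        while end > 0 and not (raw[end - 1].isdigit() or raw[end - 1] == "."):
            end -= 1
        result[label] = (float(raw[:end]), raw[end:], *fields[1:])
    return result
```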

Alerting

commands.cfg:

define command{
    command_name    check_cpuio_collectd
    command_line    collectd-nagios -s /var/run/collectd-unixsock \
                    -H $HOSTNAME$ \
                    -n cpu-average/cpu-wait \
                    -w $ARG1$ -c $ARG2$
}

services.cfg:

define service{
    use                   generic-service
    host_name             example.com
    service_description   I/O wait
    check_command         check_cpuio_collectd!0:10!0:25
}

Alerting

● What's next?

○ Use "passive checks"

○ Let collectd push metrics to Icinga 2?

○ Bring on the patches!

Thank you!

Questions?

It's time for

Questions
