collectd: An introduction

OSMC 2014: Introduction into collectd | Florian Forster

DESCRIPTION

Periodically measuring performance metrics of production systems allows administrators and developers to analyze system behavior during and after outages, quantify performance improvements, and detect trends and take proactive measures before problems arise. Performance metrics are also interesting for alerting, because they can be aggregated meaningfully, thereby basing an alert on a group of hosts rather than each host individually, for example. This talk will give an introduction to collectd, an open-source tool to gather, process and store performance metrics. A sample setup which aggregates a couple of metrics and stores the aggregate in Graphite will be presented. Afterwards, we will show how the collectd-nagios utility can be used to define alerts in Icinga based on this data.

About me

● Florian "octo" Forster

● Open-source work since 2001

● Started collectd in 2005

Agenda

● collectd

● Aggregation of metrics

● Alerting with Icinga

collectd

● Daemon

● collect metrics

● mangle / transport metrics

● store metrics (no retrieval)

collectd

● Open-source project

○ MIT and GPL licensed

● Platform independent

○ Linux, BSD, Solaris, AIX, HP-UX, …

○ Windows via SSC Serv (non-free)

collectd

● Agent-based design

○ Runs on each host

● Extensible via plugins

○ Language bindings (Perl, Python, Java)

○ "exec" plugin, e.g. shell scripts
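
As a rough illustration of the "exec" route: the exec plugin starts a program and reads PUTVAL lines from its standard output. The sketch below uses Python rather than a shell script. The plugin name "demo", the type instance "answer", and the constant value 42 are hypothetical and chosen only for illustration.

```python
import os
import time


def putval_line(host, interval, value):
    """Format one PUTVAL line of collectd's plain-text protocol."""
    # Identifier layout: host/plugin[-instance]/type[-instance].
    # "demo", "gauge" and "answer" are hypothetical names for this sketch.
    identifier = f"{host}/demo/gauge-answer"
    # "N" as the timestamp asks collectd to use the current time.
    return f'PUTVAL "{identifier}" interval={interval} N:{value}'


def run(iterations):
    """Emit `iterations` readings, one per interval."""
    # The exec plugin exports these two variables to the program it starts.
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = float(os.environ.get("COLLECTD_INTERVAL", "10"))
    for _ in range(iterations):
        print(putval_line(host, interval, 42), flush=True)
        time.sleep(interval)
```

A real script would keep looping for as long as collectd runs it; `run` takes a count only so that the sketch terminates.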

collectd

● 95+ "read" (input) plugins

○ System metrics (e.g. CPU, memory)

○ Application metrics (e.g. MySQL)

○ Other (Xeon Phi, SNMP, OneWire)

collectd

● 15+ "write" (output) plugins

○ Graphite

○ RRDtool

○ RRDCacheD

○ Riemann

○ MongoDB

○ HTTP (generic)

collectd

Example configuration:

  # Input
  LoadPlugin cpu
  LoadPlugin memory
  LoadPlugin df
  <Plugin df>
    MountPoint "/"
    ValuesPercentage true
  </Plugin>

  # Output
  LoadPlugin write_graphite
  <Plugin write_graphite>
    <Node "default">
      Host "graphite.example.com"
    </Node>
  </Plugin>

collectd

● collectd's write_graphite plugin

○ Sends metrics to Graphite

○ TCP or UDP transport

○ Metric names somewhat adjustable

→ Monitoring mit Graphite (15:30 in this room, German)
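
To make the transport concrete: Graphite's plaintext protocol expects one line per data point, consisting of a dotted metric path, a value, and a Unix timestamp. The sketch below shows roughly how a collectd identifier could be turned into such a line. The naming rules here (dots in the host name replaced by "_", an optional prefix) are a simplified assumption and not a faithful copy of what write_graphite does.

```python
def graphite_name(identifier, prefix="", escape="_"):
    """Turn 'host/plugin/type' into a dotted Graphite path (simplified)."""
    host, plugin_part, type_part = identifier.split("/")
    # Dots separate path components in Graphite, so dots inside the
    # host name are replaced (assumed escape character: "_").
    host = host.replace(".", escape)
    return f"{prefix}{host}.{plugin_part}.{type_part}"


def graphite_line(identifier, value, timestamp, prefix=""):
    """Format one data point: '<path> <value> <timestamp>' plus newline."""
    return f"{graphite_name(identifier, prefix)} {value} {int(timestamp)}\n"
```

For example, `graphite_line("example.com/cpu-0/cpu-idle", 97.5, 1416400000, prefix="collectd.")` yields a line whose path starts with `collectd.example_com.`; the prefix is a made-up value.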

Aggregation

● Aggregates often more useful for alerting

○ e.g. sum over CPUs, minimum RTT

● Metric storage often I/O bound

● Dashboards require "sane" amount of information

Aggregation

[Figure: collectd (CPU, Disk, Memory, …, Aggregation) sending metrics to Graphite]

Aggregation

● Load the Aggregation plugin

● Select (filter) applicable metrics

● Group by metric type and other fields

● Aggregate functions (e.g. sum)

Aggregation

Load the aggregation plugin:

  LoadPlugin aggregation
  <Plugin aggregation>
    <Aggregation>
    </Aggregation>
  </Plugin>

Metrics:

  example.com/battery/percent-charged
  example.com/cpu-0/cpu-idle
  example.com/cpu-0/cpu-user
  example.com/cpu-0/cpu-wait
  example.com/cpu-1/cpu-idle
  …
  example.com/df-root/df_complex-free
  example.com/df-root/df_complex-used
  example.com/df-root/df_complex-rsvd

Aggregation: Selection

● Five fields usable for selection

○ Host

○ Plugin

○ PluginInstance

○ Type (mandatory)

○ TypeInstance
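
The five fields map directly onto the identifiers used throughout these slides, such as example.com/cpu-0/cpu-idle. A minimal sketch of how such an identifier splits into its fields, assuming the host name contains no "/" and that the first "-" separates a name from its instance:

```python
from typing import NamedTuple


class Identifier(NamedTuple):
    host: str
    plugin: str
    plugin_instance: str
    type: str
    type_instance: str


def parse_identifier(text):
    """Split 'host/plugin[-instance]/type[-instance]' into five fields."""
    host, plugin_part, type_part = text.split("/")
    # Everything after the first "-" is the instance; it may be empty.
    plugin, _, plugin_instance = plugin_part.partition("-")
    type_, _, type_instance = type_part.partition("-")
    return Identifier(host, plugin, plugin_instance, type_, type_instance)
```

With this, "example.com/df-root/df_complex-free" has the plugin "df", the plugin instance "root", the type "df_complex", and the type instance "free".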

Aggregation: Selection

Select metrics:

  LoadPlugin aggregation
  <Plugin aggregation>
    <Aggregation>
      Plugin "cpu"
      Type "cpu"
    </Aggregation>
  </Plugin>

Metrics:

  example.com/cpu-0/cpu-idle
  example.com/cpu-0/cpu-user
  example.com/cpu-0/cpu-wait
  example.com/cpu-1/cpu-idle
  example.com/cpu-1/cpu-user
  example.com/cpu-1/cpu-wait
  example.com/cpu-2/cpu-idle
  example.com/cpu-2/cpu-user
  example.com/cpu-2/cpu-wait

Aggregation: Grouping

● Four fields usable for grouping

○ Host

○ Plugin

○ PluginInstance

○ TypeInstance

● One or more fields are left unspecified

Aggregation: Grouping

Configure grouping:

  LoadPlugin aggregation
  <Plugin aggregation>
    <Aggregation>
      Plugin "cpu"
      Type "cpu"
      GroupBy Host
      GroupBy TypeInstance
    </Aggregation>
  </Plugin>

Metrics:

  example.com/cpu-???/cpu-idle
  example.com/cpu-???/cpu-user
  example.com/cpu-???/cpu-wait

Aggregation: Functions

● Up to six aggregate functions

○ Count

○ Sum

○ Minimum

○ Maximum

○ Average

○ Standard deviation
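
For reference, the six functions amount to the following arithmetic over the values in one group. This is only a sketch: the dictionary keys are made up, and the use of the population standard deviation is an assumption, since the slides do not say which variant the plugin computes.

```python
import math


def aggregate(values):
    """Compute the six aggregates over one non-empty group of values."""
    count = len(values)
    total = sum(values)
    average = total / count
    # Assumption: population variance (divide by count, not count - 1).
    variance = sum((v - average) ** 2 for v in values) / count
    return {
        "count": count,
        "sum": total,
        "minimum": min(values),
        "maximum": max(values),
        "average": average,
        "stddev": math.sqrt(variance),
    }
```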

Aggregation

Select aggregate function(s):

  LoadPlugin aggregation
  <Plugin aggregation>
    <Aggregation>
      Plugin "cpu"
      Type "cpu"
      GroupBy Host
      GroupBy TypeInstance
      CalculateSum true
    </Aggregation>
  </Plugin>

Metrics:

  example.com/cpu-sum/cpu-idle
  example.com/cpu-sum/cpu-user
  example.com/cpu-sum/cpu-wait
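
The three steps (select, group, sum) can be mimicked in a few lines of Python. This is a simulation of the effect of the configuration above, not the plugin's implementation; the function name and the sample values in the usage note are hypothetical.

```python
from collections import defaultdict


def sum_by_host_and_type_instance(metrics, plugin="cpu", type_="cpu"):
    """Sum values over plugin instances, grouped by host and type instance.

    `metrics` maps identifiers such as 'example.com/cpu-0/cpu-idle'
    to their current value.
    """
    totals = defaultdict(float)
    for identifier, value in metrics.items():
        host, plugin_part, type_part = identifier.split("/")
        plugin_name, _, _plugin_instance = plugin_part.partition("-")
        type_name, _, type_instance = type_part.partition("-")
        # Selection: only metrics with the requested plugin and type.
        if plugin_name != plugin or type_name != type_:
            continue
        # Grouping: host and type instance are kept; the plugin
        # instance (the CPU number) is replaced by "sum".
        totals[f"{host}/{plugin}-sum/{type_}-{type_instance}"] += value
    return dict(totals)
```

Fed with idle values of 90.0 and 80.0 for cpu-0 and cpu-1, the result contains example.com/cpu-sum/cpu-idle with the value 170.0, while metrics from other plugins such as df are ignored.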

Aggregation

● Creates additional metrics

● Use chains to filter out unwanted "raw" metrics.

● Usable on client and/or server.

Alerting

● Load the Unixsock plugin

● Query and check values with collectd-nagios

● Both come with collectd

Alerting

Load the Unixsock plugin:

  LoadPlugin unixsock
  <Plugin unixsock>
    SocketFile "/var/run/collectd-unixsock"
    SocketGroup "collectd-nagios"
    SocketPerms "0660"
    DeleteSocket true
  </Plugin>

Alerting

Query values with the Unixsock plugin:

  -> GETVAL example.com/cpu-average/cpu-wait
  <- 1 Value found
  <- value=8.540017e+00
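
The same exchange can be scripted. The sketch below assumes the socket path from the configuration shown earlier and a response made of a status line ("<number> <message>", negative on error) followed by that many "name=value" lines. The function names are hypothetical.

```python
import socket


def parse_getval_response(lines):
    """Parse a status line plus 'name=value' lines into a dict."""
    status, _, message = lines[0].partition(" ")
    count = int(status)
    if count < 0:
        # A negative status signals an error; the message explains it.
        raise RuntimeError(message)
    values = {}
    for line in lines[1:1 + count]:
        name, _, raw = line.partition("=")
        values[name] = float(raw)
    return values


def getval(identifier, socket_path="/var/run/collectd-unixsock"):
    """Send GETVAL over the UNIX socket and return the parsed values."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
            stream.write(f'GETVAL "{identifier}"\n')
            stream.flush()
            status_line = stream.readline().rstrip("\n")
            count = int(status_line.split(" ", 1)[0])
            # Read exactly as many value lines as the status announces.
            lines = [status_line]
            lines += [stream.readline().rstrip("\n") for _ in range(max(count, 0))]
    return parse_getval_response(lines)
```

Calling `getval("example.com/cpu-average/cpu-wait")` requires a running collectd with the Unixsock plugin loaded and read access to the socket.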

Alerting

● collectd-nagios queries and checks metrics

● Ranged -w (warn) and -c (critical) options

● Conforms to Icinga's best practices
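
A "ranged" option has the form min:max, and a value outside the range triggers the corresponding state. The following sketch spells out that logic under the usual Nagios plugin conventions (an omitted min means 0, an omitted max means infinity, "~" means infinity). It leaves out the "@" inversion and is not the tool's actual code. The state names other than OKAY are assumed.

```python
import math


def parse_range(text):
    """Parse 'min:max' into a (low, high) pair of floats."""
    if ":" in text:
        low_text, high_text = text.split(":", 1)
    else:
        # A bare number is the upper bound; the lower bound is 0.
        low_text, high_text = "", text
    if low_text == "":
        low = 0.0
    elif low_text == "~":
        low = -math.inf
    else:
        low = float(low_text)
    high = math.inf if high_text in ("", "~") else float(high_text)
    return low, high


def check(value, warn, crit):
    """Return the state for `value` given warning and critical ranges."""
    crit_low, crit_high = parse_range(crit)
    warn_low, warn_high = parse_range(warn)
    if not crit_low <= value <= crit_high:
        return "CRITICAL"
    if not warn_low <= value <= warn_high:
        return "WARNING"
    return "OKAY"
```

With the ranges '0:10' and '0:25', a value of 8.54 is inside both ranges, a value of 12 is outside the warning range only, and a value of 30 is outside both.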

Alerting

Example: collectd-nagios

  $ collectd-nagios -s /var/run/collectd-unixsock \
  >   -n cpu-average/cpu-wait -H example.com \
  >   -w '0:10' -c '0:25'
  OKAY: 0 critical, 0 warning, 1 okay | value=8.540017;;;;

Alerting

commands.cfg:

  define command{
    command_name  check_cpuio_collectd
    command_line  collectd-nagios \
                    -s /var/run/collectd-unixsock \
                    -H $HOSTNAME$ \
                    -n cpu-average/cpu-wait \
                    -w $ARG1$ -c $ARG2$
  }

services.cfg:

  define service{
    use                  generic-service
    host_name            example.com
    service_description  I/O wait
    check_command        check_cpuio_collectd!0:10!0:25
  }

Alerting

● What's next?

○ Use "passive checks"

○ Let collectd push metrics to Icinga 2?

○ Bring on the patches!

Thank you!

It's time for questions.