Open Source Monitoring in 2015

From #MonitoringSucks to #MonitoringLove (and back)

@KrisBuytaert

T-Dose 2015, Eindhoven, .nl

Kris Buytaert
● I used to be a Dev,
● Then became an Op
● Chief Trolling Officer and Open Source Consultant @inuits.eu
● Everything is an effing DNS Problem
● Building Clouds since before the bookstore
● Organising Conferences
● Evangelizing devops

An opinionated talk about the Open Source Monitoring tooling landscape

In which I hope to learn from YOU

#devops=~C(L)AMS
● Culture
● (Lean)
● Automation
● Monitoring and Measurement
● Sharing

Damon Edwards and John Willis

Gene Kim

Monitoring is usually an afterthought

ENOBUDGET, ENOTIME

A 2008 OLS Paper
● We have bloated Java tools
● Some open Core stuff
● DIY folks want traditional Nagios
● DBA Required

#monitoringsucks
● John Vincent (@lusis), June 2011
● A sub #devops movement
● https://github.com/monitoringsucks/

Why #monitoringsucks
● Manual config (gui)
● Not in sync with reality
● Hosts only
● Services sometimes
● Application never
● Chaos or out of sync with reality
● Alert Fatigue

Let's forget about
● Tools with no (stable) API
● Tools with strong focus on GUI
● Unless you are an SME with < 100 nodes
● Zenoss, Hyperic, GroundWork, ....
● P.S.: don't even mention proprietary software to me

What we want
● Small, well-suited components
  • Collect
  • Transport / Mangle
  • Store
  • Analyse
  • Act / Alert
  • Visualize

#monitoringlove
• Ulf Mansson #devopsdays Rome 2011
• A new era of tooling
• #monitoringlove hacksessions @inuits
• #monitorama

Icinga
• 2009 Fork
• I consider Nagios dead
• Vibrant Community (or they stalk me)
• Throw great parties in Nürnberg
• Nobody can pronounce it anyhow
• https://github.com/Inuits/puppet-icinga/

Automation

#monitoringlove
But the love was about:

Sensu
● Awesome for non static environments
● Scaling a clustered RabbitMQ ?
● This is Europe, U no do cloud

Automation of #monitoring brought back the #love

Monitoring a service
vs
Monitoring a Service

definition of done:
monitored and in production

A software project is not done until your last end user is dead

Culture,
Automation,
Measurement: measure all the things
Sharing

Deploy Statistics
● Time To Deploy
● Deploy Frequency
● Lifecycle frequency
● Map to other metrics
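The first two deploy statistics above can be derived from nothing more than start and finish timestamps per deploy. The following is a minimal sketch under that assumption; the function name `deploy_stats` and the input shape are hypothetical, not taken from any tool in the talk.

```python
from datetime import datetime, timedelta

def deploy_stats(deploys):
    """Return (mean time to deploy, deploys per day).

    deploys: list of (started, finished) datetime pairs, one per deploy.
    """
    durations = [finished - started for started, finished in deploys]
    mean_time_to_deploy = sum(durations, timedelta()) / len(durations)

    # Frequency over the observed span: first start to last finish.
    span = max(f for _, f in deploys) - min(s for s, _ in deploys)
    # Treat any span shorter than one day as one day to avoid inflated rates.
    days = max(span.total_seconds() / 86400, 1.0)
    return mean_time_to_deploy, len(deploys) / days
```

Both numbers are plain values, so they could be shipped to the same metrics store as everything else and mapped against other metrics there.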

CollectD all the metrics, at high intervals

Oldschool graphite
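For anything a collector does not already cover, Graphite's plaintext protocol is simple enough to speak by hand: one line per data point, `path value timestamp`, sent to carbon (port 2003 by default). A minimal sketch follows; the host name `graphite.example.org` and the metric path in the usage note are placeholders, and the helper names are made up for illustration.

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one data point as 'path value timestamp' plus a trailing newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"

def send_metrics(lines, host="graphite.example.org", port=2003):
    """Send already-formatted lines to a carbon plaintext listener over TCP.

    host is a placeholder; 2003 is carbon's default plaintext port.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))
```

Usage would look like `send_metrics([graphite_line("servers.web01.load.shortterm", 0.42)])`, assuming a reachable carbon listener.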

Self Service
Gdash based pipelines

Puppetized Templates (wip)

Gdash

Grafana

Graphite++
● Dashboards
  • Grafana
● Engines:
  • InfluxDB
  • Cyanite

Triggers on Graphs
● Export Java Metrics
● JMXTrans
● Export JMXConfigs
● Configure NRPE Check
● Export NagiosCheck
● Collect JMX Exports on JMXTransNode
● Graph Em
● Collect Icinga Configs on Icinga

Aggregation
● Alert on streams
● Alert on aggregated metrics
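Alerting on an aggregate means the check looks at the fleet as a whole instead of paging for every single noisy host. A minimal sketch, assuming the latest value per host is already available as a dict; `aggregate_alert` and its parameters are hypothetical names, not part of Icinga, Riemann or any other tool mentioned here.

```python
from statistics import mean

def aggregate_alert(samples, threshold, min_hosts=1):
    """Decide whether to alert based on the fleet-wide mean.

    samples: mapping of host name -> latest metric value.
    Returns (should_alert, aggregated_value).
    """
    if len(samples) < min_hosts:
        # Too few hosts reporting is itself worth an alert.
        return True, None
    avg = mean(list(samples.values()))
    return avg > threshold, avg
```

With `{"web01": 90, "web02": 20, "web03": 10}` and a threshold of 50, one hot host does not trigger anything because the mean is 40; a per-host check would have fired.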

Riemann
● I still don't get it ?
● Distributed Top
● Do you like Clojure ?
● Riemann Health plugin ?
● s/riemann-health/collectd/g;
● Output to graphite

Graphs to Knowledge

Skyline
• Oculus
• Creating Information out of this data
• Big data
• Machine Learning
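To give a feel for what "creating information out of this data" can mean at its simplest, here is a three-sigma check on the latest point of a time series. This is a deliberately naive stand-in written for illustration; it is not Skyline's or Oculus's code, and `is_anomalous` is a made-up name.

```python
from statistics import mean, pstdev

def is_anomalous(series, sigmas=3.0):
    """Flag the latest point if it lies more than `sigmas` standard
    deviations away from the mean of all earlier points."""
    history, latest = series[:-1], series[-1]
    if len(history) < 2:
        return False  # not enough data to judge
    spread = pstdev(history)
    if spread == 0:
        # Perfectly flat history: any change at all is a deviation.
        return latest != history[0]
    return abs(latest - mean(history)) > sigmas * spread
```

A real system would typically combine several such tests and account for seasonality, which is where the big data and machine learning bullets come in.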

But I have log files..

Logs and Metrics
● Graylog2
● ELSA (Enterprise Log Search and Archive)
● ELK Stack

● Collect from anywhere
● Filter
● Send anywhere
● Queueing

APM
But what about my apps ?

Half the world cheers about SaaS tools :(

Packetbeat
● Traffic Flow through network
● Transactions causing errors
● SQL per HTTP
● API call usage

PacketBeat

So your DC fails

Whom to alert when ?

'New' kids on the block
● Flapjack
  flapjack.io
  monitoring notification routing + event processing system
● OpenDuty
  github.com/szechuen/OpenDuty
  Duty management

My Alerting Strategy

Is still in beta

And back :(

In 2014 I'm still running the same check for
- service registration (consul)
- high availability (pacemaker/corosync)
- monitoring (icinga)

But I love where Monitoring is heading

We have far fewer false positives

And we have a Maintainable Monitoring Infra

Kinda

Contact
Kris.Buytaert@inuits.eu

Further Reading
@krisbuytaert
http://www.krisbuytaert.be/blog/
http://www.inuits.eu/

Inuits
Duboistraat 50
2060 Antwerpen
Belgium
891.514.231
+32 475 961221
