Monitoring by Zabbix: The Final Frontier

Detect problems way before end users notice

Agenda

Programming languages we use to build our software

Standard approach to monitoring

How does Zabbix do it?

Who am I?

Alexei Vladishev

Creator of Zabbix

CEO and Architect

@avladishev

Riga | Tokyo | New York

Runtime issues

Memory leaks

Uninitialised pointers

Require discipline!

Runtime issues

Memory leaks

Require discipline!

Runtime issues

Out of memory

GC affects execution

Runtime issues

Out of memory

Slow execution

Hard to predict resource usage

No guarantees: performance, resource usage, availability, etc.

Confluence KB: How to fix out of memory errors by increasing available memory?

We aren't really able to give a concrete recommendation for the amount of memory to allocate, because that will depend greatly on your server setup, the size of your user base, and their behaviour. You will need to find a value that works for you, ie no noticeable GC pauses, and no OutOfMemory errors.

Solution: Increase Xmx in small increments (eg 512mb at a time), until you no longer experience the OutOfMemory error.

Too many bad things may happen at runtime

That’s why we need monitoring!

Monitoring is about describing abnormal behaviour of our systems

How to detect it?

Typical approach

Trigger: CPU load > 5

[Figure: CPU load, time axis 10:00 to 10:50, with Problem raised three times and Recovery twice]

Too sensitive: flapping
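
To make the flapping concrete, here is a minimal Python sketch of a plain threshold check. It is an illustration only, not Zabbix code: the function name `naive_events` and the sample readings are hypothetical, chosen so that the load hovers around the threshold of 5.

```python
def naive_events(samples, threshold=5.0):
    """Return (index, event) pairs for a plain 'value > threshold' check."""
    events = []
    in_problem = False
    for i, value in enumerate(samples):
        if value > threshold and not in_problem:
            # The value crossed above the threshold: raise a problem.
            in_problem = True
            events.append((i, "Problem"))
        elif value <= threshold and in_problem:
            # The value dropped back: recover immediately.
            in_problem = False
            events.append((i, "Recovery"))
    return events


# Hypothetical CPU load readings hovering around 5.
cpu_load = [4.8, 5.2, 4.9, 5.3, 4.7, 5.1]
events = naive_events(cpu_load)
for index, event in events:
    print(index, event)
```

Every small crossing of the threshold produces a new event, so six readings already generate five notifications.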

Zabbix does it the smart way

Zabbix server

Data collection

History

Analysis

Alerts

Analyse history

Trigger: CPU load for the last 10 minutes > 5

[Figure: CPU load, time axis 10:00 to 11:10, with one Problem and one Recovery marked]
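
A rough Python sketch of a history-based check follows. It assumes that "for the last 10 minutes > 5" means every sample in the window is above 5 (the minimum over the window exceeds the threshold); an average over the window would be another reasonable reading. The function name `window_exceeds` and the sample data are hypothetical, and this is not how Zabbix itself is implemented.

```python
def window_exceeds(history, now, window_s=600, threshold=5.0):
    """True if every sample in the last `window_s` seconds is above `threshold`.

    `history` is a list of (timestamp_in_seconds, value) pairs.
    """
    recent = [value for ts, value in history if now - window_s < ts <= now]
    # With no data in the window we cannot claim a problem.
    return bool(recent) and min(recent) > threshold


# Hypothetical CPU load samples taken every 5 minutes (300 seconds).
history = [(0, 4.0), (300, 6.0), (600, 4.5), (900, 5.5), (1200, 6.1), (1500, 5.8)]

print(window_exceeds(history, now=600))   # a single spike at t=300 is ignored
print(window_exceeds(history, now=1500))  # sustained load raises a problem
```

A short spike no longer raises a problem; only load that stays high for the whole window does.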

Problem disappeared != problem is resolved

Problem: free disk space <= 10%

Now free disk space is 10.001%

Have we resolved our problem?

Different conditions

Problem: CPU load > 5

Recovery: CPU load < 1

[Figure: CPU load, time axis 10:00 to 10:50, with one Problem and one Recovery marked]

No flapping!
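
Separate problem and recovery conditions amount to hysteresis. The sketch below is a minimal Python illustration under that reading; `hysteresis_events` and the readings are hypothetical names and data, with the thresholds 5 and 1 taken from the slide.

```python
def hysteresis_events(samples, problem_above=5.0, recover_below=1.0):
    """Return (index, event) pairs using separate problem/recovery thresholds."""
    events = []
    in_problem = False
    for i, value in enumerate(samples):
        if not in_problem and value > problem_above:
            in_problem = True
            events.append((i, "Problem"))
        elif in_problem and value < recover_below:
            # Recovery needs the load to fall well below the problem level.
            in_problem = False
            events.append((i, "Recovery"))
    return events


# Hypothetical readings: the load hovers around 5, then really calms down.
cpu_load = [4.8, 5.2, 4.9, 5.3, 4.7, 5.1, 2.0, 0.8]
events = hysteresis_events(cpu_load)
for index, event in events:
    print(index, event)
```

While the load wobbles between the two thresholds the state does not change, so the same noisy readings produce a single problem and a single recovery.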

Smarter approach

Problem if free disk space < 10%

Recovery if free disk space > 30% for the last 15 minutes

Problem if 3 consecutive checks of REST service failed

Recovery if 10 consecutive checks of REST service are OK
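
The consecutive-check rule for the REST service can be sketched in Python as a small streak counter. This is an illustration only: `consecutive_events` and the list of check results are hypothetical, and the counts 3 and 10 come from the slide.

```python
def consecutive_events(results, fail_count=3, ok_count=10):
    """Return (index, event) pairs for streak-based problem/recovery rules.

    `results` is a list of booleans: True for a successful check.
    """
    events = []
    in_problem = False
    streak_value = None  # outcome of the current streak
    streak_len = 0       # number of identical outcomes in a row
    for i, ok in enumerate(results):
        if ok == streak_value:
            streak_len += 1
        else:
            streak_value, streak_len = ok, 1
        if not in_problem and not ok and streak_len >= fail_count:
            in_problem = True
            events.append((i, "Problem"))
        elif in_problem and ok and streak_len >= ok_count:
            in_problem = False
            events.append((i, "Recovery"))
    return events


# Hypothetical check results: two isolated failures, a real outage,
# a short-lived improvement, then a stable run of successes.
checks = [True, False, False, True] + [False] * 3 + [True] * 4 + [False] + [True] * 10
events = consecutive_events(checks)
for index, event in events:
    print(index, event)
```

Two failed checks in a row are ignored, and four successful checks followed by another failure do not count as a recovery.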

Anomaly detection

Compare current system state with the past

[Figure: metric over time, 10:00 to 11:10, with one point marked "Anomaly!"]
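
One simple way to compare the current state with the past is to measure how far the current value sits from a historical baseline. The slides do not say which method Zabbix uses, so the sketch below is an assumption: it flags a value that is more than three standard deviations from the mean of past readings. The name `is_anomaly` and the data are hypothetical.

```python
from statistics import mean, stdev


def is_anomaly(current, past_values, max_sigma=3.0):
    """True if `current` is more than `max_sigma` standard deviations
    away from the mean of `past_values`."""
    if len(past_values) < 2:
        return False  # not enough history to judge
    baseline = mean(past_values)
    spread = stdev(past_values)
    if spread == 0:
        # A perfectly flat history: any change is unusual.
        return current != baseline
    return abs(current - baseline) / spread > max_sigma


# Hypothetical CPU load at the same time of day on previous days.
past = [1.9, 2.1, 2.0, 2.2, 1.8]

print(is_anomaly(2.3, past))  # within normal variation
print(is_anomaly(4.0, past))  # far outside the usual range
```

Unlike a fixed threshold, the limit here adapts to what is normal for the system being monitored.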

Forecasting

[Figure: metric over time, 7:00 to 21:00, with fitted trend line y = -2.9455x + 48.309]

Predict when a threshold will be reached, and what the value will be after a given period of time

Problem in the future
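
The trend line on the slide is a straight-line fit, which can be reproduced with ordinary least squares. The Python sketch below is an illustration, not Zabbix code: `fit_line`, `forecast` and `time_to_reach` are hypothetical helper names, and the units of x and y are not stated on the slide, so they are left abstract.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(slope, intercept, x):
    """Value the fitted line predicts at position x."""
    return slope * x + intercept


def time_to_reach(slope, intercept, threshold):
    """Position x at which the fitted line reaches `threshold`,
    or None if the line is flat and never reaches it."""
    if slope == 0:
        return None
    return (threshold - intercept) / slope


# Fit a line to hypothetical, steadily falling free-space readings.
slope, intercept = fit_line([0, 1, 2, 3], [10, 8, 6, 4])
print(slope, intercept)

# Using the coefficients shown on the slide: where does the line hit zero?
print(time_to_reach(-2.9455, 48.309, 0.0))
```

`forecast` answers "what will the value be after a period of time", and `time_to_reach` answers "when will the threshold be crossed", so a problem can be raised before it actually happens.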

Conclusion

Monitoring by Zabbix is your best friend

Use smart problem detection, do not spam DevOps

Detect problems way before end users notice

Anomalies

Forecasting

Thank you!

Learn more about Zabbix at our booth!

@avladishev

Email: alex@zabbix.com
