How to build a container monitoring solution - David Gildeh, CEO and Co-Founder of Outlyer


Monitoring for Cloud-Scale Microservices

www.outlyer.com | @outlyerapp

How to Build a Container Monitoring Solution

04-May-2017

Some Fun Facts About Docker

• The average host runs 8 containers = 8 times more metrics per host

• The average container runs for 2 days = more metric churn

• 12% of hosts run containers, and the percentage is growing fast

Source: Outlyer Customers
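A back-of-the-envelope sketch of what these two numbers imply for a metrics backend. It assumes every container emits the same number of series (the 50 below is a hypothetical placeholder, not Outlyer data) and that an exited container is immediately replaced by a new one with a new container ID.

```python
# Back-of-the-envelope estimate of metric series per host.
# Assumptions (hypothetical, for illustration only):
#   - every container emits the same number of metric series
#   - an exited container is immediately replaced by a new one with a
#     new container ID, which starts a fresh set of series

CONTAINERS_PER_HOST = 8       # slide: the average host runs 8 containers
CONTAINER_LIFETIME_DAYS = 2   # slide: the average container runs for 2 days
SERIES_PER_CONTAINER = 50     # hypothetical placeholder, not Outlyer data


def active_series_per_host() -> int:
    """Series being written at any one moment on a single host."""
    return CONTAINERS_PER_HOST * SERIES_PER_CONTAINER


def distinct_series_per_host(window_days: int) -> int:
    """Distinct series created on one host over a window, counting churn."""
    # Each container slot is refilled every CONTAINER_LIFETIME_DAYS, so it
    # produces ceil(window / lifetime) container IDs over the window.
    generations = -(-window_days // CONTAINER_LIFETIME_DAYS)
    return CONTAINERS_PER_HOST * generations * SERIES_PER_CONTAINER


if __name__ == "__main__":
    print(active_series_per_host())       # 400
    print(distinct_series_per_host(30))   # 6000 (8 slots x 15 IDs x 50 series)
```

The active set stays flat, but over a month the index has to remember fifteen times as many series as are live at any moment.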

Monitoring Docker – The Basics

[Diagram: VM Monitoring vs. Container Monitoring. VM Monitoring: a physical server runs a hypervisor hosting two VMs, each with its own OS and a monitoring agent alongside its processes (MySQL, Java, Tomcat in one; MySQL, PHP, Apache in the other). Container Monitoring: each VM runs Docker, with one monitoring agent per VM and each process (MySQL, Java, Tomcat; MySQL, PHP, Apache) in its own container.]

Monitoring Docker – The Basics

VM Monitoring

• All processes are accessible from ‘localhost’

• Agent runs in each VM

• Simple plugins can monitor each process

Container Monitoring

• All processes are siloed into containers

• Agent runs on each host, either inside its own container or on the host VM

• Can’t monitor from inside containers, so you have to monitor each container from the outside, like a remote mini-host

Monitoring Docker – Monitoring Processes

Shell Command Monitoring

Need to run “docker exec” with the container ID

Endpoint Monitoring

Need to inject the container IP address on the internal Docker network
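A minimal sketch of both approaches in Python, assuming the agent runs on the Docker host and can call the Docker CLI. `docker exec` and `docker inspect --format` are real commands; the helper names, the `{host}` placeholder convention and the example check arguments are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess

# Go template that prints the container's IP on each attached network;
# for a container on a single network this yields exactly one address.
IP_TEMPLATE = "{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}"


def exec_command(container_id: str, check: str) -> list[str]:
    """Shell command monitoring: wrap a check so it runs inside the container."""
    return ["docker", "exec", container_id, *shlex.split(check)]


def container_ip(container_id: str) -> str:
    """Look up the container's IP address on the internal Docker network."""
    out = subprocess.run(
        ["docker", "inspect", "--format", IP_TEMPLATE, container_id],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def endpoint_url(template: str, ip: str) -> str:
    """Endpoint monitoring: inject the container IP into a check target.

    The {host} placeholder is a hypothetical convention for this sketch.
    """
    return template.format(host=ip)


if __name__ == "__main__":
    print(exec_command("333db2d96a40", "check_mysql -u monitor"))
    print(endpoint_url("http://{host}:8080/health", "172.17.0.2"))
```

Because the container ID and IP change whenever a container is replaced, both have to be looked up at check time rather than written into static configuration.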

Monitoring Docker – Organized Chaos via Orchestration

What Container Monitoring Looks Like in the Real World

Configuration Management is replaced with Auto-discovery

Summary: Everything’s dynamic & needs smart automation.
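A minimal auto-discovery sketch: rather than a configuration management tool listing which checks run where, the agent matches each running container against check templates by image name. The `Container` type, the `TEMPLATES` table and the exact-name matching rule are hypothetical; a real agent would get the container list from the Docker API or the orchestrator.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    id: str
    image: str
    ip: str


# Hypothetical mapping from image name to a Nagios-style check template.
TEMPLATES = {
    "mysql": "check_mysql -H {ip}",
    "nginx": "check_http -H {ip} -p 80",
    "redis": "check_tcp -H {ip} -p 6379",
}


def image_name(image: str) -> str:
    """Strip registry/namespace and tag: 'library/mysql:5.7' -> 'mysql'."""
    return image.rsplit("/", 1)[-1].split(":", 1)[0]


def discover_checks(containers: list[Container]) -> dict[str, str]:
    """Return container ID -> rendered check for every recognised image."""
    checks = {}
    for c in containers:
        template = TEMPLATES.get(image_name(c.image))
        if template is not None:
            checks[c.id] = template.format(ip=c.ip)
    return checks


if __name__ == "__main__":
    running = [
        Container("a1", "mysql:5.7", "172.17.0.2"),
        Container("b2", "busybox", "172.17.0.3"),  # no template: ignored
    ]
    print(discover_checks(running))  # {'a1': 'check_mysql -H 172.17.0.2'}
```

Rerunning discovery on every container start and stop event keeps the check list in step with whatever the scheduler has placed on the host.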

Where we started: V1 with cAdvisor

[Diagram: cAdvisor (Go binary). Collection: container autodiscovery & metrics (pseudo-files), Prometheus scraper, generic application metric scraper. Outputs: read store, web UI, REST API, Prometheus endpoint. Storage backends: StatsD, InfluxDB, ElasticSearch, Redis, Kafka, BigQuery.]

Where we started: V1 with cAdvisor – however…

• Used a lot of memory

• Kept crashing & hard to debug remotely

• Hard to customize

Back to the drawing board: Replace cAdvisor

Back to the drawing board: Use our Prometheus Scraper?

[Diagram: the agent scraping Prometheus endpoints on ports 9104, 9113 and 3002. Ports: 9,000 – 10,000.]
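A Prometheus endpoint serves plain-text lines such as `mysql_up 1` or `http_requests_total{code="200"} 1027`. The simplified parser sketch below is not a complete implementation of the exposition format: it assumes label values contain no `}` or escaped quotes and it ignores the optional trailing timestamp.

```python
from __future__ import annotations

import re

# metric name, optional {labels}, value; a trailing timestamp is ignored.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse simple Prometheus text-format lines into (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        m = LINE_RE.match(line)
        if m is None:
            continue  # silently skip lines this sketch cannot handle
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples
```

Parsing is the easy part; the agent still has to discover which port each exporter listens on, which is where the port list on the slide becomes a problem.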

Back to the drawing board: Build our own custom integration

Docker – Getting Metrics Out

Collection Point    CPU Metrics   Memory Metrics   I/O Metrics   Network Metrics
Pseudo-Files        Yes           Yes              Some          From 1.6.1
Stats Command       Basic         Basic            From 1.9.0    Basic
Docker Remote API   Yes           Yes              Some          Yes

Docker Pseudo-Files

$ docker exec $CONTAINER_ID cat /sys/fs/cgroup/memory/memory.stat
cache 532480
rss 44650496
rss_huge 0
mapped_file 0
dirty 0
writeback 0
swap 0
pgpgin 244711
pgpgout 233680
pgfault 545794
pgmajfault 0
inactive_anon 8192
active_anon 44703744
inactive_file 102400
active_file 290816
unevictable 0
hierarchical_memory_limit 9223372036854771712
hierarchical_memsw_limit 9223372036854771712
total_cache 532480
total_rss 44650496
total_rss_huge 0
total_mapped_file 0
total_dirty 0
total_writeback 0
total_swap 0
total_pgpgin 244711
total_pgpgout 233680
total_pgfault 545794
total_pgmajfault 0
total_inactive_anon 8192
total_active_anon 44703744
total_inactive_file 102400
total_active_file 290816
total_unevictable 0
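Each line of `memory.stat` is a key followed by an integer (byte counts for fields such as `cache` and `rss`, event counts for fields such as `pgfault`), so parsing it is trivial. A minimal sketch, assuming the cgroup v1 layout shown above; with the default cgroupfs driver the same file is typically also visible on the host under /sys/fs/cgroup/memory/docker/<container id>/, which avoids a `docker exec` per read. The "unlimited" threshold is a heuristic assumption of this sketch.

```python
from __future__ import annotations

from typing import Optional

# cgroup v1 reports "no limit" as a huge page-aligned value close to 2**63
# (9223372036854771712 above); treating anything this large as unlimited
# is a heuristic assumption for this sketch.
UNLIMITED_THRESHOLD = 2**62


def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse cgroup v1 memory.stat content into {key: int}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            key, value = parts
            stats[key] = int(value)
    return stats


def memory_limit(stats: dict[str, int]) -> Optional[int]:
    """Return the hierarchical memory limit in bytes, or None if unlimited."""
    limit = stats.get("hierarchical_memory_limit")
    if limit is None or limit >= UNLIMITED_THRESHOLD:
        return None
    return limit
```

On a cgroup v2 host the file lives at a different path and several keys differ, so a real agent would need to detect which layout it is reading.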

Docker Stats Command

$ docker stats CONTAINER_ID [CONTAINER_ID...]
CONTAINER      CPU %   MEM USAGE / LIMIT       MEM %   NET I/O          BLOCK I/O        PIDS
333db2d96a40   0.19%   50.61 MiB / 1.952 GiB   2.53%   60 kB / 195 kB   0 B / 36.9 kB    35

As of Docker 1.9.0, the stats command also includes disk I/O metrics.
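The memory column uses binary units (MiB, GiB) while the network and block I/O columns use decimal units (kB), so an agent consuming this output has to normalise sizes. A hedged sketch of such a converter; the unit table covers the suffixes in the sample above plus a few neighbours and is an assumption, not an exhaustive list of what Docker prints.

```python
from __future__ import annotations

import re

# Decimal and binary size suffixes (assumed subset of Docker's output).
UNITS = {
    "B": 1,
    "kB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
}
SIZE_RE = re.compile(r"^\s*([0-9.]+)\s*([A-Za-z]+)\s*$")


def parse_size(text: str) -> float:
    """Convert a size such as '50.61 MiB' or '195 kB' to bytes."""
    m = SIZE_RE.match(text)
    if m is None or m.group(2) not in UNITS:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(m.group(1)) * UNITS[m.group(2)]


def parse_pair(text: str) -> tuple[float, float]:
    """Split a 'used / limit' or 'in / out' column into two byte counts."""
    left, right = text.split("/")
    return parse_size(left), parse_size(right)
```

Running `docker stats --no-stream` prints a single snapshot instead of a continuously refreshing table, which is easier to consume from a script.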

Docker Remote API

GET /containers/{id}/stats

{
  "read": "2015-01-08T22:57:31.547920715Z",
  "pids_stats": {
    "current": 3
  },
  "networks": {
    "eth0": {},
    "eth5": {}
  },
  "memory_stats": {
    "stats": {},
    "max_usage": 6651904,
    "usage": 6537216,
    "failcnt": 0,
    "limit": 67108864
  },
  "blkio_stats": {},
  "cpu_stats": {
    "cpu_usage": {
      "percpu_usage": [],
      "usage_in_usermode": 50000000,
      "total_usage": 100215355,
      "usage_in_kernelmode": 30000000
    },
    "system_cpu_usage": 739306590000000,
    "online_cpus": 4,
    "throttling_data": {}
  },
  "precpu_stats": {
    "cpu_usage": {},
    "system_cpu_usage": 9492140000000,
    "online_cpus": 4,
    "throttling_data": {}
  }
}
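The counters above are cumulative, so percentages need two readings; the response carries the previous one in `precpu_stats`. The sketch below follows the formula described in the Docker Engine API documentation (CPU delta over system CPU delta, scaled by the CPU count; memory usage minus `cache`, over `limit`). Treating missing fields as 0 and returning 0.0 when there is no usable delta are assumptions of this sketch. Adding `stream=false` to the request returns a single sample instead of a stream.

```python
from __future__ import annotations


def cpu_percent(stats: dict) -> float:
    """CPU % from one /containers/{id}/stats response (cpu vs precpu deltas)."""
    cpu, pre = stats["cpu_stats"], stats["precpu_stats"]
    cpu_delta = (cpu["cpu_usage"].get("total_usage", 0)
                 - pre["cpu_usage"].get("total_usage", 0))
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
    # Prefer online_cpus; fall back to the length of percpu_usage, then 1.
    ncpus = (cpu.get("online_cpus")
             or len(cpu["cpu_usage"].get("percpu_usage") or [])
             or 1)
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0  # first sample or counter reset: no meaningful delta
    return cpu_delta / system_delta * ncpus * 100.0


def memory_percent(stats: dict) -> float:
    """Memory % of the container limit, excluding page cache when reported."""
    mem = stats["memory_stats"]
    used = mem["usage"] - mem.get("stats", {}).get("cache", 0)
    return used / mem["limit"] * 100.0
```

For the sample response above, `memory_percent` gives roughly 9.74%, since `stats` is empty and no cache is subtracted.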

The Winner: Pseudo-Files

Collection Point    Ranking   Reasoning
Pseudo-Files        1         Reliable between Docker versions
Stats Command       3         Basic reporting; only works with Docker
Docker Remote API   2         Good reporting, but varies by Docker version and may have networking issues

Making Nagios Plugins Work Against Containers

Shell Command Monitoring

Need to run “docker exec” with the container ID

Endpoint Monitoring

Need to inject the container IP address on the internal Docker network
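Nagios plugins report their result through the process exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a line of text on stdout, and `docker exec` passes the command's exit code back to the caller, so an unmodified plugin can be run inside a container. A minimal sketch, assuming the plugin binary is present inside the container image; `run_check`, `interpret` and the 10-second timeout are hypothetical.

```python
from __future__ import annotations

import subprocess

# Standard Nagios plugin exit codes.
STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def interpret(returncode: int, stdout: str) -> tuple[str, str]:
    """Map a plugin's exit code and output to (status, first output line)."""
    # Any other code (for example one produced by docker itself) is UNKNOWN.
    status = STATUS.get(returncode, "UNKNOWN")
    lines = stdout.strip().splitlines()
    summary = lines[0] if lines else ""
    return status, summary


def run_check(container_id: str, plugin: list[str],
              timeout: float = 10.0) -> tuple[str, str]:
    """Run a Nagios plugin inside a container via docker exec."""
    try:
        proc = subprocess.run(
            ["docker", "exec", container_id, *plugin],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "UNKNOWN", f"check timed out after {timeout}s"
    return interpret(proc.returncode, proc.stdout)
```

Keeping the Nagios exit-code contract means existing plugins can be reused as-is; only the way they are launched changes.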

Making Nagios Plugins Work Against Containers: Making it Magic

Making it work with Orchestrators

Services = Pets, Containers = Cattle.

Making it work with Orchestrators

[Diagram: Kubernetes Service]

Image from http://blog.arungupta.me/kubernetes-design-patterns/

Making it work with Orchestrators: Dimensional Labels

Every container and its metrics get the following dimensional labels from Kubernetes:

• Node

• Pod

• Service

• Custom Labels
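A sketch of why these labels matter: once each sample carries node, pod and service labels, queries can aggregate by service and ignore which short-lived container produced the data. The sample values, label keys and the `label_sample` and `sum_by` helpers are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def label_sample(name: str, value: float, container_id: str,
                 k8s: dict[str, str]) -> dict:
    """Attach Kubernetes dimensions (node, pod, service, custom) to a sample."""
    return {"name": name, "value": value,
            "labels": {"container_id": container_id, **k8s}}


def sum_by(samples: list[dict], metric: str, label: str) -> dict[str, float]:
    """Sum one metric grouped by a label, e.g. total rss per service."""
    totals: dict[str, float] = defaultdict(float)
    for s in samples:
        if s["name"] == metric and label in s["labels"]:
            totals[s["labels"][label]] += s["value"]
    return dict(totals)


if __name__ == "__main__":
    samples = [
        label_sample("rss", 100.0, "c1", {"node": "n1", "pod": "web-1", "service": "web"}),
        label_sample("rss", 150.0, "c2", {"node": "n2", "pod": "web-2", "service": "web"}),
        label_sample("rss", 80.0, "c3", {"node": "n1", "pod": "db-0", "service": "db"}),
    ]
    print(sum_by(samples, "rss", "service"))  # {'web': 250.0, 'db': 80.0}
```

The service total survives when `c1` and `c2` are replaced by new containers, because the query never mentions a container ID.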

Making it work with Orchestrators: Host View

Metric Series Churn = Constantly Growing Indexes

ContainerID = 1: cpu.user=22%, rss=44232322, active_file=232232, swap=0

ContainerID = 2: cpu.user=22%, rss=44232322, active_file=232232, swap=0

ContainerID = 3: cpu.user=22%, rss=44232322, active_file=232232, swap=0
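The churn problem in miniature: a series is identified by its full label set, so every new container ID creates new series even when the metric names and values are identical. The toy inverted index below is a hypothetical illustration, not how any particular time series database stores data; note that nothing ever removes entries from it.

```python
from __future__ import annotations

from collections import defaultdict


class LabelIndex:
    """Toy inverted index: label pair -> set of series IDs."""

    def __init__(self) -> None:
        self.series: dict[frozenset, int] = {}
        self.postings: dict[tuple[str, str], set[int]] = defaultdict(set)

    def add(self, labels: dict[str, str]) -> int:
        """Return the series ID for a label set, creating it if new."""
        key = frozenset(labels.items())
        if key not in self.series:  # a new label set means a new series
            sid = len(self.series)
            self.series[key] = sid
            for pair in labels.items():
                self.postings[pair].add(sid)
        return self.series[key]


if __name__ == "__main__":
    idx = LabelIndex()
    for container_id in ["1", "2", "3"]:
        for metric in ["cpu.user", "rss", "active_file", "swap"]:
            idx.add({"__name__": metric, "container_id": container_id})
    print(len(idx.series))  # 12: four metrics times three container IDs
```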

Metric Series Churn Solution: Partition indexes by time

https://fabxc.org/blog/2017-04-10-writing-a-tsdb/
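The linked post describes the storage design adopted for Prometheus 2.0: the database is split into blocks covering fixed time ranges, each with its own index, so series from containers that have gone away only live in old blocks, which can be dropped whole. A heavily simplified sketch of that idea; the 2-hour block width, the retention rule and the `PartitionedIndex` class are assumptions for illustration.

```python
from __future__ import annotations

from collections import defaultdict

BLOCK_SECONDS = 2 * 3600  # assumed block width


class PartitionedIndex:
    """Per-time-block series indexes; old blocks are deleted whole."""

    def __init__(self) -> None:
        # block start time -> set of series keys seen in that block
        self.blocks: dict[int, set[frozenset]] = defaultdict(set)

    def add(self, ts: int, labels: dict[str, str]) -> None:
        """Record that a series had a sample at timestamp ts (seconds)."""
        start = ts - ts % BLOCK_SECONDS
        self.blocks[start].add(frozenset(labels.items()))

    def series_between(self, t0: int, t1: int) -> set[frozenset]:
        """A query only consults blocks overlapping [t0, t1]."""
        found: set[frozenset] = set()
        for start, keys in self.blocks.items():
            if start <= t1 and start + BLOCK_SECONDS > t0:
                found |= keys
        return found

    def drop_before(self, cutoff: int) -> None:
        """Retention: remove blocks that end at or before the cutoff."""
        for start in [s for s in self.blocks if s + BLOCK_SECONDS <= cutoff]:
            del self.blocks[start]
```

Queries over recent data never touch the series of long-gone containers, and retention becomes deleting whole blocks instead of pruning one ever-growing index.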

What’s Next?

Services & Tracing

Monitoring, done differently.

Sign up for free at:

www.outlyer.com
