What the hell is your software doing at runtime?

Firenze, November 17th 2015

Roberto “FRANK” Franchini

@robfrankie

Increase business value, measure it!

More than 15 years of experience, proud to be a programmer

Member of the OrientDB team, tech lead for the full-text, spatial, JDBC and Docker images

Wrote software for NLP and opinion mining (@scale)

Played with servers, then bought a sysadmin

JUG-Torino co-lead

whoami(1)

Agenda

Quotes

System monitoring

Coding

Application monitoring

All together

Feedback

Sample scenario

Quotes

Business value

Our code generates business value

when it runs, not when we write it.

We need to know what our code does when it runs.

We can’t do this unless we measure it.

(Coda Hale)

SLA driven

Have an SLA for your service

Measure and report performance against the SLA

(Ben Treynor, Google Inc.)

System monitoring

Infrastructure monitoring

Sysadmins have monitored infrastructure

since the beginning of IT

With the right tools, a single BOFH

can handle hundreds of servers

On premises

collectd, Zabbix, Zenoss

Nagios, Cacti, Graphite/Grafana

Cloud based

Datadog, New Relic

Measures

CPU load

Network traffic

Disk I/O

Memory

And much more

Charts

Dashboard

Cool, black dashboard

Code and deploy

SOLID principles

Design Patterns

Code metrics

unit tests

integration tests

performance tests

test coverage

code quality reports

Deploy

Deployment pipeline

Microservices

Containers

All done, time to rest?

I don't think so, not anymore

Application monitoring

The day after deployment

How do we monitor our service status?

How do we measure it?

How does it behave?

How does it interact with other parts of the system?

Multiply that by every µ-service

Monitorability

Design software to be monitorable

Expose metrics (JMX)

Expose status (REST API)

Send metrics to monitoring tools
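Exposing metrics over JMX needs nothing beyond the JDK. A minimal sketch, assuming a hypothetical `DocumentStats` component whose counters should be readable from JConsole or VisualVM; the class names and the ObjectName `com.example.app:type=DocumentStats` are illustrative, not taken from the talk.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.LongAdder;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxMetricsSketch {

    // Standard MBean convention: the public management interface is named
    // after the implementing class plus the "MBean" suffix.
    public interface DocumentStatsMBean {
        long getDocumentsProcessed();
        long getErrors();
    }

    // Hypothetical application component that counts its own events.
    public static class DocumentStats implements DocumentStatsMBean {
        private final LongAdder processed = new LongAdder(); // thread-safe counter
        private final LongAdder errors = new LongAdder();

        public void documentProcessed() { processed.increment(); }
        public void errorOccurred() { errors.increment(); }

        @Override public long getDocumentsProcessed() { return processed.sum(); }
        @Override public long getErrors() { return errors.sum(); }
    }

    // Registers the stats object in the platform MBean server so that JMX
    // clients can read its attributes; JMX failures become unchecked errors.
    public static ObjectName register(DocumentStats stats, String objectName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            server.registerMBean(stats, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("Cannot register MBean " + objectName, e);
        }
    }
}
```

JMX derives attribute names from the getters, so `getDocumentsProcessed()` shows up as the `DocumentsProcessed` attribute.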

We need application monitoring

“Application monitoring? WHAT?”

“OK, let me explain:

What is the app doing right now?

How is the app performing right now?

And then graph it!”

“OK, I got it!”

“Let me see”

5 minutes later

public class PoorManJavaMetrics {

    // Plain fields: not thread-safe, concurrent calls can lose updates
    int called;
    long totalTime;

    public void doThings() {
        final long start = System.currentTimeMillis();
        //heavy business logic
        called++;
        final long end = System.currentTimeMillis();
        final long duration = end - start;
        totalTime += duration;
    }

    public void logStats() {
        System.out.println("---stats---");
        //Here be DRAGONS
    }
}

(Sketch: Luca Franchini)

Use the right tool

Use a library (e.g. Dropwizard Metrics)

Count events, measure duration

Log metric values

Send application metrics

to the same backend as system metrics
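Counting events and measuring durations correctly is exactly what a library such as Dropwizard Metrics handles for you. To show what the hand-rolled `PoorManJavaMetrics` class gets wrong, here is a JDK-only sketch; `TimedSection` and its methods are hypothetical names, not Dropwizard API. It uses thread-safe `LongAdder` counters, the monotonic `System.nanoTime()` clock, and a `finally` block so that failed calls are still counted and timed.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;
import java.util.logging.Logger;

public class TimedSection {
    private static final Logger LOG = Logger.getLogger(TimedSection.class.getName());

    private final String name;
    private final LongAdder count = new LongAdder();      // number of calls
    private final LongAdder totalNanos = new LongAdder(); // summed durations

    public TimedSection(String name) { this.name = name; }

    // Runs the work, then counts the call and adds its duration even if it throws.
    public <T> T time(Supplier<T> work) {
        final long start = System.nanoTime(); // monotonic, unlike currentTimeMillis
        try {
            return work.get();
        } finally {
            totalNanos.add(System.nanoTime() - start);
            count.increment();
        }
    }

    public long count() { return count.sum(); }

    // Count and total are read separately, so under heavy concurrency the
    // mean is approximate; good enough for a log line.
    public double meanMillis() {
        long n = count.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / (double) n / TimeUnit.MILLISECONDS.toNanos(1);
    }

    public void logStats() {
        LOG.info(String.format("%s count=%d mean=%.3fms", name, count(), meanMillis()));
    }
}
```

Unlike a real metrics library, this keeps only a count and a mean, with no percentiles and no rates, which is one more reason to reach for a library in production.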

Don’t forget naming!

A naming pattern

<namespace>.<instrumented section>.<target (noun)>.<action (past tense verb)>

Such as

accounts.authentication.password.failed

Use a prefix

prod, test, dev, local

prod.accounts.authentication.password.failed
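The naming pattern is easy to enforce in code. A minimal sketch with a hypothetical `MetricNames` helper that joins the environment prefix and the four pattern segments with dots, rejecting any segment outside lowercase letters, digits, `_` and `-` (the allowed character set is an assumption). Dropwizard Metrics offers a similar helper, `MetricRegistry.name(...)`, which also joins its arguments with dots.

```java
import java.util.StringJoiner;
import java.util.regex.Pattern;

public final class MetricNames {
    // Allowed segment characters; a dot inside a segment would silently add
    // an extra level to the dotted hierarchy, so it is rejected.
    private static final Pattern SEGMENT = Pattern.compile("[a-z0-9_-]+");

    private MetricNames() {}

    // Builds <env>.<namespace>.<instrumented section>.<target>.<action>
    public static String name(String env, String namespace, String section,
                              String target, String action) {
        StringJoiner joiner = new StringJoiner(".");
        for (String segment : new String[] {env, namespace, section, target, action}) {
            if (segment == null || !SEGMENT.matcher(segment).matches()) {
                throw new IllegalArgumentException("Invalid metric name segment: " + segment);
            }
            joiner.add(segment);
        }
        return joiner.toString();
    }
}
```

For example, `MetricNames.name("prod", "accounts", "authentication", "password", "failed")` returns `prod.accounts.authentication.password.failed`.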

Which metrics?

Rate of documents processed

Latency

Transactions per second (€€€€)

Total number of errors

Mean user interaction time
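Rates such as "documents processed per second" are usually derived from a monotonically increasing counter: take two readings and divide the difference by the elapsed time. A minimal sketch of that arithmetic; `RateSketch.perSecond` is a hypothetical helper, and in Graphite the same job is normally done at query time with functions such as `nonNegativeDerivative`.

```java
import java.time.Duration;

public final class RateSketch {
    private RateSketch() {}

    // Events per second between two readings of a monotonically increasing counter.
    public static double perSecond(long previousCount, long currentCount, Duration elapsed) {
        if (elapsed.isZero() || elapsed.isNegative()) {
            throw new IllegalArgumentException("elapsed must be positive");
        }
        long delta = currentCount - previousCount;
        if (delta < 0) {
            // A counter that goes backwards usually means the process restarted.
            throw new IllegalArgumentException("counter went backwards (restart?)");
        }
        return delta / (elapsed.toNanos() / 1e9);
    }
}
```

With readings of 1,000 and 4,000 documents taken one minute apart, the rate is 3,000 / 60 s = 50 documents per second.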

All together now

Code on systems

Don’t cross the streams

Enabling code metrics means

sysadmins and devs in the same room

talking to each other

to improve business value

Send application metrics to

the same backend

as system metrics

Correlate application

and system metrics

Repeat with me

Correlate application

and system metrics

(Cross the streams!)

Single metrics backend

collectd and applications → graphite → grafana
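Sending application metrics to the same Graphite backend that collectd feeds takes little code: Carbon accepts a plaintext protocol where each data point is one line, `<metric path> <value> <timestamp>`, with the timestamp in Unix epoch seconds, by default on TCP port 2003. A minimal sketch; `GraphitePlaintextSketch` is a hypothetical name and the host and port passed to `send` are assumptions about your setup. In practice Dropwizard Metrics' `GraphiteReporter` (metrics-graphite module) and collectd's write_graphite plugin speak this protocol for you.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

public class GraphitePlaintextSketch {

    // One line of Graphite's plaintext protocol:
    // "<metric path> <value> <unix timestamp in seconds>\n"
    // (long values suit counters; Graphite also accepts decimals)
    public static String line(String path, long value, Instant when) {
        return path + " " + value + " " + when.getEpochSecond() + "\n";
    }

    // Opens a TCP connection to a Carbon plaintext listener and writes one data point.
    public static void send(String host, int port, String path, long value) throws IOException {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write(line(path, value, Instant.now()).getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }
}
```

Because system and application metrics then live side by side in Graphite, a single Grafana panel can plot, say, disk I/O next to documents processed.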

To do what?

Discover bottlenecks

post-mortem analysis

SLA monitoring

IO impact

Network traffic

Memory utilization
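SLA monitoring boils down to comparing a measured statistic with a target. A minimal sketch, assuming a hypothetical SLA of the form "the p-th percentile latency must stay at or below a threshold" and using the nearest-rank percentile definition (one of several in use); `SlaCheck` and its methods are illustrative names. Monitoring tools normally compute percentiles for you, for example from a Dropwizard Timer snapshot.

```java
import java.util.Arrays;

public final class SlaCheck {
    private SlaCheck() {}

    // Nearest-rank percentile: the smallest sample such that at least p% of
    // all samples are less than or equal to it.
    public static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    // True when the p-th percentile latency is within the hypothetical SLA threshold.
    public static boolean meetsSla(long[] samplesMillis, double p, long thresholdMillis) {
        return percentile(samplesMillis, p) <= thresholdMillis;
    }
}
```

With the ten latency samples 80, 85, 90, 95, 99, 100, 105, 110, 120 and 300 ms, the 90th percentile is 120 ms, so a 150 ms target at p90 is met, while the 99th percentile is 300 ms and the same target at p99 is missed.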

To do what?

Why does it perform better on the dev laptop?

Why does it take 24h on the customer's infrastructure, when our old test server takes 1h?

Mechanical sympathy at large: the new service is fucking up the I/O

Implement THE User Story

Given the application is running

when the manager comes by

then I want to show a big green number

The answer

Application metrics dashboard

Get feedback

It’s all about feedback

Our code is talking to us

Listen to it

And make decisions

Decisions

Set new SLAs

Refactor bottlenecks

Buy new hardware

Expand the cloud

Drop a product

write code

deploy it

measure it

get feedback

Iterative

10 define some metrics

20 deploy

30 add other metrics

40 goto 10

Are you able to deploy every day?

Sample scenario

45 bare metal servers

Nginx, Jetty, PostgreSQL

GlusterFS, Queues,

Redis, Jenkins (cron on steroids)

Infrastructure

Software

Java shop

deploy with Docker

More than 120 webapps

More than 100 batch jobs

Near-real-time (NRT) stream processing jobs running 24/7

Monitoring

collectd, graphite, grafana for system monitoring

Dropwizard Metrics inside code for application monitoring

Application metrics reported to graphite too

Feedback and decisions

WTF happened last night?

How is it going this morning?

Do you think we can survive the message flood?

Hey boss, it’s time to buy a new server, we are running out of resources.

Wrap up

Shopping list

Define your SLAs/target

Code and deploy with good practices

Code with monitorability in mind

Monitor your app/service

Correlate system and application metrics

Get feedback

Make decisions

References

https://dropwizard.github.io/metrics/3.1.0/

https://dl.dropboxusercontent.com/u/2744222/2011-04-09-Metrics-Metrics-Everywhere.pdf

http://graphite.wikidot.com/

http://grafana.org/

http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/

https://www.usenix.org/sites/default/files/conference/protected-files/srecon15_slides_limoncelli.pdf

Credits

Sketches by my sons

Andrea (Andrew) and Luca (Luke) Franchini

Cool dashboards are made with Grafana

Thank you

Roberto Franchini

ro.franchini@gmail.com

r.franchini@orientdb.com

@robfrankie
