What the hell is your software doing at runtime?

Firenze, November 17th 2015

Roberto “FRANK” Franchini

@robfrankie

Increase business value, measure it!

whoami(1)

More than 15 years of experience, proud to be a programmer

Member of the OrientDB team, tech lead for the full-text, spatial, JDBC and Docker images

Wrote software for NLP and opinion mining (@scale)

Played with servers, then bought a sysadmin

JUG-Torino co-lead

Agenda

Quotes

System monitoring

Coding

Application monitoring

All together

Feedback

Sample scenario

Quotes

Business value

Our code generates business value

when it runs, not when we write it.

We need to know what our code does when it runs.

We can’t do this unless we measure it.

(Codahale)

SLA driven

Have an SLA for your service

Measure and report performance against the SLA

(Ben Treynor, Google Inc.)
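As a minimal sketch of what "measure and report performance against the SLA" could look like in code, the example below checks recorded latencies against a percentile target. The class name, the sample values, the 200 ms threshold and the nearest-rank percentile method are all assumptions made for illustration, not something taken from the talk.

```java
import java.util.Arrays;

/** Hypothetical helper: checks recorded latencies against a percentile-based SLA. */
final class SlaReport {

    /** Nearest-rank percentile of the samples; percentile must be in (0, 100]. */
    static long percentile(long[] latenciesMillis, double percentile) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples recorded");
        }
        if (percentile <= 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in (0, 100]");
        }
        long[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    /** True when the given percentile of the samples is within the SLA threshold. */
    static boolean meetsSla(long[] latenciesMillis, double percentile, long thresholdMillis) {
        return percentile(latenciesMillis, percentile) <= thresholdMillis;
    }

    public static void main(String[] args) {
        // Made-up request latencies in milliseconds.
        long[] samples = {120, 80, 95, 310, 150, 101, 99, 87, 140, 175};
        System.out.println("p90 = " + percentile(samples, 90.0) + " ms");
        System.out.println("p90 within 200 ms: " + meetsSla(samples, 90.0, 200));
    }
}
```

Reporting a percentile rather than an average keeps a few very slow requests from hiding behind many fast ones.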

System monitoring

Infrastructure monitoring

Sysadmins monitor infrastructure

from the beginning of IT

With the right tools a single BOFH

can handle hundreds of servers

Tools

On premises

collectd zabbix zenoss

nagios cacti graphite/grafana

Cloud based

datadog newrelic

Measures

Cpu load

Network traffic

Disk I/O

Memory

More and more

Charts

Dashboard

Cool, black dashboard

Code and deploy

Write

TDD

SOLID principles

Design Patterns

Code metrics

Build

unit tests

integration tests

performance tests

test coverage

code quality reports

Deploy

Deployment pipeline

Microservices

Container

Cloud

Rest

All done, take your rest

Umh

I don’t think so anymore

Application monitoring

The day after deployment

How do we monitor our service status?

How do we measure it?

How does it behave?

How does it interact with other parts of the system?

Multiply that by the number of µ-services

Monitorability

Design sw to be monitorable

Expose metrics (JMX)

Expose status (REST api)

Send metrics to monitoring tools
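A minimal sketch of the "expose metrics (JMX)" point, using only the JDK: a request counter published as a standard MBean on the platform MBean server, where a tool such as JConsole can read it. The class names, the `RequestCount` attribute and the `com.example.accounts` domain are hypothetical and chosen only for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical example: publishes a request counter as a JMX attribute. */
final class JmxMetricsExample {

    /** Standard MBean contract: must be public and named <ImplementationClass>MBean. */
    public interface RequestStatsMBean {
        long getRequestCount();
    }

    /** Implementation that the application updates and JMX clients read. */
    public static final class RequestStats implements RequestStatsMBean {
        private final AtomicLong requests = new AtomicLong();

        public void recordRequest() {
            requests.incrementAndGet();
        }

        @Override
        public long getRequestCount() {
            return requests.get();
        }
    }

    /** Registers the bean under an illustrative ObjectName and returns that name. */
    static ObjectName register(RequestStats stats) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example.accounts:type=RequestStats");
        if (server.isRegistered(name)) {
            // Replace a previous registration so the call can be repeated safely.
            server.unregisterMBean(name);
        }
        server.registerMBean(stats, name);
        return name;
    }

    public static void main(String[] args) throws Exception {
        RequestStats stats = new RequestStats();
        ObjectName name = register(stats);
        stats.recordRequest();
        Object value = ManagementFactory.getPlatformMBeanServer().getAttribute(name, "RequestCount");
        System.out.println("RequestCount via JMX = " + value);
    }
}
```

The same bean could also back a REST status endpoint, so both views report identical numbers.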

We need application monitoring

“Application monitoring? WHAT?”

“Ok, let me explain

What is the app doing right now?

How is the app performing right now?

And then graph it!”

“Ok, I got it!”

“Let me see”

5 minutes later

public class PoorManJavaMetrics {

    int called;
    long totalTime;

    public void doThings() {
        final long start = System.currentTimeMillis();
        // heavy business logic
        called++;
        final long end = System.currentTimeMillis();
        final long duration = end - start;
        totalTime += duration;
    }

    public void logStats() {
        System.out.println("---stats---");
        // Here be DRAGONS
    }
}

(Sketch by Luca Franchini)

Use the right tool

Use a library (e.g.: dropwizard metrics)

Count events, measure duration

Log metric values

Send application metrics to the same backend as system metrics
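The poor man's version earlier is not thread-safe and only keeps a running total. As a rough illustration of the kind of bookkeeping a metrics library takes off your hands, here is a JDK-only sketch of a timer that counts events and accumulates durations safely across threads. `SimpleTimer` and its methods are made up for this example and are not the Dropwizard Metrics API; a real library adds histograms, rates and reporters on top.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a library timer: counts calls and accumulates their duration. */
final class SimpleTimer {
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    /** Runs the task, recording one event and its elapsed time even if the task throws. */
    <T> T time(Callable<T> task) throws Exception {
        long start = System.nanoTime();
        try {
            return task.call();
        } finally {
            totalNanos.add(System.nanoTime() - start);
            count.increment();
        }
    }

    /** Number of timed calls so far. */
    long count() {
        return count.sum();
    }

    /** Mean duration in milliseconds, or 0 when nothing has been recorded yet. */
    double meanMillis() {
        long n = count.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1_000_000.0 / n;
    }

    public static void main(String[] args) throws Exception {
        SimpleTimer timer = new SimpleTimer();
        for (int i = 0; i < 3; i++) {
            // Stand-in for the heavy business logic.
            timer.time(() -> { Thread.sleep(10); return null; });
        }
        System.out.println("calls=" + timer.count() + " mean=" + timer.meanMillis() + " ms");
    }
}
```

Wrapping the measured code in a single `time(...)` call keeps the instrumentation out of the business logic, which is the main readability gain over hand-written start/end timestamps.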

Don’t forget naming!

A naming pattern

<namespace>.<instrumented section>.<target (noun)>.<action (past tense verb)>

Such as

accounts.authentication.password.failed

Use a prefix

prod, test, dev, local

prod.accounts.authentication.password.failed
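The naming pattern above can be enforced with a small helper so that names are never typed by hand. This is a sketch under assumptions: the `MetricNames` class, its `of` method and the validation rule are hypothetical, and only the pattern and the four prefixes come from the slide.

```java
import java.util.Locale;

/** Hypothetical helper that builds metric names following the pattern above. */
final class MetricNames {

    /** Environment prefixes listed on the slide. */
    enum Environment { PROD, TEST, DEV, LOCAL }

    /** Joins prefix, namespace, instrumented section, target and action with dots. */
    static String of(Environment env, String namespace, String section, String target, String action) {
        String[] parts = {namespace, section, target, action};
        for (String part : parts) {
            // A dot or a space inside a part would silently change the name hierarchy.
            if (part == null || part.isEmpty() || part.contains(".") || part.contains(" ")) {
                throw new IllegalArgumentException("invalid name part: " + part);
            }
        }
        return env.name().toLowerCase(Locale.ROOT) + "." + String.join(".", parts);
    }

    public static void main(String[] args) {
        // Builds the example name used on the slide.
        System.out.println(of(Environment.PROD, "accounts", "authentication", "password", "failed"));
    }
}
```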

Which metrics?

Rate of documents processed

Latency

Transactions per second (€€€€)

Total number of errors

Mean time of user interaction
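Rates such as documents processed or transactions per second are typically derived from a counter that only ever goes up, sampled at two points in time. The sketch below shows that arithmetic; the `RateCalculator` name and its parameters are hypothetical, and a metrics library would normally do this for you.

```java
/** Hypothetical helper: derives a per-second rate from two samples of a growing counter. */
final class RateCalculator {

    /** Events per second between two (count, timestamp in milliseconds) samples. */
    static double perSecond(long previousCount, long previousMillis,
                            long currentCount, long currentMillis) {
        long elapsedMillis = currentMillis - previousMillis;
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("samples must be in chronological order");
        }
        if (currentCount < previousCount) {
            // A lower value usually means the counter was reset, e.g. by a restart.
            throw new IllegalArgumentException("counter went backwards");
        }
        return (currentCount - previousCount) * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // 300 documents processed over one minute.
        System.out.println(perSecond(100, 0, 400, 60_000) + " documents/s");
    }
}
```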

All together now

Code on systems

Don’t cross the streams

Enabling code metrics means

sysadmins and devs in the same room

talking to each other

to improve business value

Send application metrics to the same backend as system metrics

Correlate application

and

system metrics

Repeat with me

Correlate application

and

system metrics

(Cross the streams!)
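Once both kinds of metrics sit in one place, overlaying two graphs is often enough to spot a relationship. To put a number on it, one simple option is Pearson's correlation coefficient over two series sampled at the same timestamps. This is an illustrative choice, not something the talk prescribes, and the class name and sample series are made up.

```java
/** Hypothetical helper: Pearson correlation between two equally sampled metric series. */
final class MetricCorrelation {

    /** Returns a value in [-1, 1], or NaN when one of the series is constant. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double covariance = 0.0;
        double varianceX = 0.0;
        double varianceY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        if (varianceX == 0.0 || varianceY == 0.0) {
            return Double.NaN; // a flat series carries no correlation information
        }
        return covariance / Math.sqrt(varianceX * varianceY);
    }

    public static void main(String[] args) {
        // Made-up samples: request latency (ms) and disk I/O wait (%) at the same timestamps.
        double[] latencyMillis = {110, 130, 240, 410, 180};
        double[] ioWaitPercent = {3, 4, 11, 22, 7};
        System.out.println("correlation = " + pearson(latencyMillis, ioWaitPercent));
    }
}
```

A value close to 1 suggests the two metrics rise together, which is a hint worth investigating rather than proof of a cause.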

Single metrics backend

graphite

collectd

applications

grafana
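To make "applications report to Graphite" concrete: Graphite's plaintext protocol accepts one line per data point in the form `<metric path> <value> <timestamp in seconds>`, by default on TCP port 2003. The sketch below formats and sends such a line with plain JDK sockets. The class name and the `graphite.example.com` host in the usage note are hypothetical; in practice a reporter from a metrics library would usually handle this.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: sends one data point using Graphite's plaintext protocol. */
final class GraphiteLineSender {

    /** Formats "<path> <value> <epochSeconds>" terminated by a newline. */
    static String formatLine(String path, double value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    /** Opens a TCP connection, writes a single line and closes the connection. */
    static void send(String host, int port, String path, double value, long epochSeconds)
            throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(formatLine(path, value, epochSeconds));
            out.flush();
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L;
        // Only prints the line; sending requires a reachable Graphite instance.
        System.out.print(formatLine("prod.accounts.authentication.password.failed", 3.0, now));
    }
}
```

With a Graphite instance available, a call such as `GraphiteLineSender.send("graphite.example.com", 2003, name, value, timestamp)` would deliver the point next to the system metrics collected by collectd, ready to be graphed in Grafana.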

To do what?

Discover bottlenecks

post-mortem analysis

SLA monitoring

IO impact

Network traffic

Memory utilization

To do what?

Why does it perform better on a dev laptop?

Why does it take 24h on the customer's infrastructure when our old test server takes 1h?

Mechanical sympathy at large: the new service is fucking up the I/O

Implement THE User Story

Given the application running

when the manager comes

then I want to show a big green number

The answer

42

Application metrics dashboard

Get feedback

It’s all about feedback

Our code is talking to us

Listen to it

And take decisions

Decisions

Set new SLAs

Refactor bottleneck

Buy new hw

Expand the cloud

Drop a product

write code

deploy it

measure it

get feedback

Iterative

10 define some metrics

20 deploy

30 add other metrics

40 goto 10

Are you able to deploy every day?

Sample scenario

Infrastructure

45 bare metal servers

Nginx, Jetty, PostgreSQL

GlusterFS, queues, Redis, Jenkins (cron on steroids)

Software

Java shop

deploy with Docker

More than 120 webapps

More than 100 batch jobs

NRT stream processing jobs running 24x7

Monitoring

collectd, graphite, grafana for system monitoring

Dropwizard Metrics inside code for application monitoring

Application metrics reported to graphite too

Feedback and decisions

WTF happened last night?

How is it going this morning?

Do you think we can survive the message flood?

Hey boss, it’s time to buy a new server, we are running out of resources.

Wrap up

Shopping list

Define your SLAs/target

Code and deploy with good practices

Code with monitorability in mind

Monitor your app/service

Correlate system and application metrics

Get feedback

Take decisions

References

https://dropwizard.github.io/metrics/3.1.0/

https://dl.dropboxusercontent.com/u/2744222/2011-04-09-Metrics-Metrics-Everywhere.pdf

http://graphite.wikidot.com/

http://grafana.org/

http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/

https://www.usenix.org/sites/default/files/conference/protected-files/srecon15_slides_limoncelli.pdf

Credits

Sketches by my sons

Andrea (Andrew) and Luca (Luke) Franchini

Cool dashboards are made with Grafana

Thank you

Roberto Franchini

ro.franchini@gmail.com

r.franchini@orientdb.com

@robfrankie