Firenze, November 17th 2015
Roberto “FRANK” Franchini
@robfrankie
Increase business value, measure it!
What the hell is your software doing at runtime?
whoami(1)
More than 15 years of experience, proud to be a programmer
Member of the OrientDB team, tech lead for the full-text, spatial, JDBC and Docker images
Wrote software for NLP and opinion mining (@scale)
Played with servers, then bought a sysadmin
JUG-Torino co-lead
Agenda
Quotes
System monitoring
Coding
Application monitoring
All together
Feedback
Sample Scenario
Quotes
Business value
Our code generates business value
when it runs, not when we write it.
We need to know what our code does when it runs.
We can’t do this unless we measure it.
(Codahale)
SLA driven
Have an SLA for your service
Measure and report performance against the SLA
(Ben Treynor, Google inc.)
System monitoring
Infrastructure monitoring
Sysadmins monitor infrastructure
from the beginning of IT
With the right tools a single BOFH
can handle hundreds of servers
Tools
On premises
collectd zabbix zenoss
nagios cacti graphite/grafana
Cloud based
datadog newrelic
Measures
CPU load
Network traffic
Disk I/O
Memory
More and more
Charts
Dashboard
Cool, black dashboard
Code and deploy
Write
TDD
SOLID principles
Design Patterns
Code metrics
Build
unit tests
integration tests
performance tests
test coverage
code quality reports
Deploy
Deployment pipeline
Microservices
Container
Cloud
Rest
All done, take your rest
Umh
I don’t think so anymore
Application monitoring
The day after deployment
How to monitor our service status?
How to measure it?
How does it behave?
How does it interact with other parts of the system?
Multiply by the number of µ-services
Monitorability
Design sw to be monitorable
Expose metrics (JMX)
Expose status (REST api)
Send metrics to monitoring tools
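As a rough illustration of "Expose metrics (JMX)", here is a minimal sketch that uses only the JDK's `javax.management` API. The class names, the `Processed` attribute and the `example.metrics:type=DocumentStats` object name are hypothetical, picked for this example and not taken from the talk.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxExposureSketch {

    /** Management interface: JMX exposes each getter as a read-only attribute. */
    public interface DocumentStatsMBean {
        long getProcessed();
    }

    /** Hypothetical metric holder; the business code calls markProcessed(). */
    public static final class DocumentStats implements DocumentStatsMBean {
        private final AtomicLong processed = new AtomicLong();

        public void markProcessed() {
            processed.incrementAndGet();
        }

        @Override
        public long getProcessed() {
            return processed.get();
        }
    }

    /** Registers a new DocumentStats bean on the platform MBean server. */
    static DocumentStats register(String objectName) {
        try {
            DocumentStats stats = new DocumentStats();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(stats, new ObjectName(objectName));
            return stats;
        } catch (JMException e) {
            throw new IllegalStateException("could not register " + objectName, e);
        }
    }

    /** Reads the attribute back through JMX, as a monitoring tool would. */
    static long readProcessed(String objectName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return (Long) server.getAttribute(new ObjectName(objectName), "Processed");
        } catch (JMException e) {
            throw new IllegalStateException("could not read " + objectName, e);
        }
    }

    public static void main(String[] args) {
        String name = "example.metrics:type=DocumentStats"; // assumed object name
        DocumentStats stats = register(name);
        stats.markProcessed();
        stats.markProcessed();
        System.out.println("processed=" + readProcessed(name));
    }
}
```

Any JMX client (JConsole, for instance) attached to the same JVM can read the attribute while the application runs.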
We need application monitoring
“Application monitoring? WHAT?”
“Ok, let me explain
What is the app doing right now?
How is the app performing right now?
And then graph it!”
“Ok, I got it!”
“Let me see”
5 minutes later
public class PoorManJavaMetrics {
    int called;
    long totalTime;

    public void doThings() {
        final long start = System.currentTimeMillis();
        //heavy business logic
        called++;
        final long end = System.currentTimeMillis();
        final long duration = end - start;
        totalTime += duration;
    }

    public void logStats() {
        System.out.println("---stats---");
        //Here be DRAGONS
    }
}
Sketch by Luca Franchini
Use the right tool
Use a library (e.g.: dropwizard metrics)
Count events, measure duration
Log metric values
Send application metrics
to the same backend of system metrics
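To make "count events, measure duration" concrete, here is a minimal JDK-only sketch. It is not the Dropwizard Metrics API: the `CallStats` class and its methods are hypothetical, and a real library adds histograms, rates and reporters on top of this idea.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

final class CallStats {
    // LongAdder keeps the counters safe under concurrent updates.
    private final LongAdder calls = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    /** Runs the work, then records one call and its elapsed time, even on failure. */
    <T> T time(Supplier<T> work) {
        final long start = System.nanoTime(); // monotonic clock, suited to durations
        try {
            return work.get();
        } finally {
            calls.increment();
            totalNanos.add(System.nanoTime() - start);
        }
    }

    long calls() {
        return calls.sum();
    }

    /** Mean duration per call in milliseconds, or 0.0 when nothing was recorded. */
    double meanMillis() {
        final long n = calls.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1_000_000.0 / n;
    }

    public static void main(String[] args) {
        CallStats stats = new CallStats();
        for (int i = 0; i < 3; i++) {
            stats.time(() -> "heavy business logic");
        }
        System.out.println("calls=" + stats.calls() + " meanMillis=" + stats.meanMillis());
    }
}
```

Wrapping the work in `time(...)` keeps the measurement out of the business logic, which is the part the hand-rolled version tends to get wrong.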
Don’t forget naming!
A naming pattern
<namespace>.<instrumented section>.<target (noun)>.<action (past tense verb)>
Such as
accounts.authentication.password.failed
Use prefix
prod, test, dev, local
prod.accounts.authentication.password.failed
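The naming pattern above can be enforced with a small helper, so that names are built in one place instead of being typed by hand. This is a sketch: the `MetricNames` class and the `Env` enum are hypothetical names introduced for illustration.

```java
import java.util.Locale;

final class MetricNames {

    /** Environment prefixes listed on the slide. */
    enum Env { PROD, TEST, DEV, LOCAL }

    /** Builds <env>.<namespace>.<instrumented section>.<target>.<action>. */
    static String name(Env env, String namespace, String section,
                       String target, String action) {
        return String.join(".",
                env.name().toLowerCase(Locale.ROOT),
                namespace, section, target, action);
    }

    public static void main(String[] args) {
        // prints "prod.accounts.authentication.password.failed"
        System.out.println(
                name(Env.PROD, "accounts", "authentication", "password", "failed"));
    }
}
```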
Which metrics?
Rate of documents processed
Latency
Transactions per second (€€€€)
Total number of errors
Mean time of user interaction
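Two of the metrics listed above, rate and latency, reduce to simple arithmetic once samples are collected. The sketch below assumes latencies are kept as an array of milliseconds and uses the nearest-rank method for percentiles; the `LatencySummary` class is hypothetical, and metrics libraries usually rely on sampling reservoirs instead of keeping every value.

```java
import java.util.Arrays;

final class LatencySummary {

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        final long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        final int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Events per second over an observation window given in milliseconds. */
    static double ratePerSecond(long events, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        return events * 1000.0 / windowMillis;
    }

    public static void main(String[] args) {
        long[] latencies = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println("p50=" + percentile(latencies, 50));
        System.out.println("p95=" + percentile(latencies, 95));
        System.out.println("docs/s=" + ratePerSecond(120, 60_000));
    }
}
```

A percentile is usually a better fit than the mean for comparing against an SLA, because a few slow requests can hide behind a good average.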
All together now
Code on systems
Don’t cross the streams
Enabling code metrics means
sysadmins and devs in the same room
talking to each other
to improve business value
Send
application metrics to
the same backend
of system metrics
Correlate application
and
system metrics
Repeat with me
Correlate application
and
system metrics
(Cross the streams!)
Single metrics backend
graphite
collectd
applications
grafana
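For applications to feed the same Graphite backend as collectd, they need to speak its plaintext protocol: one `<path> <value> <timestamp>` line per data point, with the timestamp in epoch seconds, sent over TCP (port 2003 by default). The sketch below is a hand-rolled illustration with a hypothetical `GraphiteLine` class; host and port are left to the caller, and in practice a metrics library's Graphite reporter would do this work.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class GraphiteLine {

    /** Formats one data point as "<path> <value> <epochSeconds>\n". */
    static String format(String path, double value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    /** Opens a TCP connection, writes the line and closes the connection. */
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write(line.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        // Only prints the line; call send(host, 2003, line) against a real Graphite host.
        System.out.print(format("prod.accounts.authentication.password.failed", 3.0, now));
    }
}
```

Opening a socket per data point is fine for a demo only; a real reporter batches points and reuses the connection.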
To do what?
Discover bottlenecks
post-mortem analysis
SLA monitoring
IO impact
Network traffic
Memory utilization
To do what?
Why does it perform better on a dev laptop?
Why does it take 24h on the customer's infrastructure (our old test server takes 1h)?
Mechanical sympathy at large: the new service is fucking up the I/O
Implement THE User Story
Given the application running
when the manager comes
then I want to show a big green number
The answer
42
Application metrics dashboard
Get feedback
It’s all about feedback
Our code is talking to us
Listen to it
And take decisions
Decisions
Set new SLAs
Refactor bottleneck
Buy new hw
Expand the cloud
Drop a product
write code
deploy it
measure it
get feedback
Iterative
10 define some metrics
20 deploy
30 add other metrics
40 goto 10
Are you able to deploy every day?
Sample scenario
Infrastructure
45 bare metal servers
Nginx, Jetty, PostgreSQL
GlusterFS, Queues,
Redis, Jenkins (cron on steroids)
Software
Java shop
deploy with Docker
More than 120 webapps
More than 100 batch jobs
NRT stream processing jobs running 24x7
Monitoring
collectd, graphite, grafana for system monitoring
Dropwizard Metrics inside code for application monitoring
Application metrics reported to graphite too
Feedback and decisions
WTF happened last night?
How is it going this morning?
Do you think we can survive the message flood?
Hey boss, it’s time to buy a new server, we are running out of resources.
Wrap up
Shopping list
Define your SLAs/target
Code and deploy with good practices
Code with monitorability in mind
Monitor your app/service
Correlate system and application metrics
Get feedback
Take decisions
References
https://dropwizard.github.io/metrics/3.1.0/
https://dl.dropboxusercontent.com/u/2744222/2011-04-09-Metrics-Metrics-Everywhere.pdf
http://graphite.wikidot.com/
http://grafana.org/
http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/
https://www.usenix.org/sites/default/files/conference/protected-files/srecon15_slides_limoncelli.pdf
Credits
Sketches by my sons
Andrea (Andrew) and Luca (Luke) Franchini
Cool dashboards are made with Grafana
Thank you
Roberto Franchini
ro.franchini@gmail.com
r.franchini@orientdb.com
@robfrankie