View
171
Download
1
Category
Tags:
Preview:
DESCRIPTION
Slidedeck of the talk Devoxx 2014
Citation preview
@cfalguiere#DV14 #Monitoring
Monitoring
!Claude Falguière
Valtech Paris
@cfalguiere#DV14 #Monitoring
Content
• DevOps is more than tooling!!
!
• Make you love Data!!
• Motivations for providing and collecting data!!
• Monitoring user stories and practices!!
• Getting started and open source tooling
Individuals and interactions over processes and tools
@cfalguiere#DV14 #Monitoring
Claude Falguiere
• Devoxx4Kids!• Paris JUG, Devoxx France,
Duchess!!
http://cfalguiere.wordpress.com !
• DevOps Coach !• Java, Performance!
@cfalguiere#DV14 #Monitoring
Monitoring
that database is brokenthat number of hits doubles every 2 month
that users struggle to find the order formwhy the app is slow
what users want to buy
What would you do if you knew
@cfalguiere#DV14 #Monitoring
Model
DataFacts 42 users
138 ms742 orders
Sales increased by 14%Estimated orders next month 934Average number of requests is 5 times
the number of users
ModelQuestionsHypothesis
@cfalguiere#DV14 #Monitoring
Galaxy Rotation ProblemSpiral galaxies spin too fast !!Expected mass should be ten times the observed mass - calculated from the visible objets - to prevent galaxies from flying apart
@cfalguiere#DV14 #Monitoring
Assumes readings are wrong
Discovery of Dark Matter
Hypothesis of a missing mass
??
If readings are true, is model wrong ?
Jan Oort Fritz Zwicky
Mass calculated from gravitational effects and evidence of Dark Matter
Dark matter estimated to 84.5% of the total matter in the universe
Vera Rubin
Plank Satellite
1932 - 1933 1960 - 1970 2010 - 2013
@cfalguiere#DV14 #Monitoring
• Hypothesis of a missing mass1932 - 1933
Jan Oort Fritz Zwicky
• Data are validated
• Is Universal Gravitation wrong ?
@cfalguiere#DV14 #Monitoring
• Still speculative, but generally accepted by the mainstream scientific community
Vera Rubin
• Mass calculated from gravitational effects and evidence of Dark Matter 1960 - 1970
• Dark matter estimated to 84.5% of the total matter in the universe 2010 - 2013
Plank Satellite
@cfalguiere#DV14 #Monitoring
Lean Startup DevOps
Measure everything
Big Data
Make decisions based on facts!
@cfalguiere#DV14 #Monitoring
Lean Startup DevOps
Measure everything
Big Data
Make decisions based on facts!
@cfalguiere#DV14 #Monitoring
Lean Startup DevOps
Measure everything
Big Data
Make decisions based on facts!
@cfalguiere#DV14 #Monitoring
Lean Startup DevOps
Measure everything
Big Data
Make decisions based on facts!
@cfalguiere#DV14 #Monitoring
What would you do if you knew
that database is brokenthat number of hits doubles every 2 month
that users struggle to find the order formwhy the app is slow
what users want to buy
@cfalguiere#DV14 #Monitoring
Motivations and user stories
Alerting
SLA observance!Alerting
Storage, Visualization
Diagnosis / Post-Mortem!Capacity Planning!Improvement
@cfalguiere#DV14 #Monitoring
Motivations and user stories
Alerting Storage, Visualization
SLA observance!Alerting
Diagnosis / Post-Mortem!Capacity Planning!Improvement
@cfalguiere#DV14 #Monitoring
App
Alerting
Log Parser
Support
Probe
Log
Storage, Aggregation Dev
System
Network DBA
Architecture
Collector
Visualization
@cfalguiere#DV14 #Monitoring
App
Alerting
Log Parser
Support
Probe
Log
Storage, Aggregation Dev
System
Network DBA
Architecture
Collector
Visualization
@cfalguiere#DV14 #Monitoring
App
Alerting
Log Parser
Support
Probe
Log
Storage, Aggregation Dev
System
Network DBA
Architecture
Collector
Visualization
@cfalguiere#DV14 #Monitoring
System
Collector
CollectorApp
MQ
Log
Storage
Storage
filters rules MQ
Alerting
@cfalguiere#DV14 #Monitoring
System
Collector
CollectorApp
MQ
Log
Storage
Storage
filters rules MQ
Alerting
@cfalguiere#DV14 #Monitoring
App
Alerting
Log Parser
Log
Storage, Aggregation
Topology
Collector! Visualization
App Platform
Monitoring Platform
@cfalguiere#DV14 #Monitoring
App
Alerting
Log Parser
Log
Storage, Aggregation
Resilience
Collector
Visualization
App PlatformMonitoring Platform
MQ
MQ Collector
@cfalguiere#DV14 #Monitoring
What would you do if you knew that
database is broken
@cfalguiere#DV14 #Monitoring
Error detection and alerting
• Log filtering !• Event firing!
!
• Context!• is it critical ?!• which feature does it impact ?!• how deep is the impact ?
@cfalguiere#DV14 #Monitoring
Is this a log ?Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: ! Access denied for user 'shopapp'@'shprdb1' to database 'shop'!at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)!at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)!at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)!at java.lang.reflect.Constructor.newInstance(Unknown Source)!at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)!at com.mysql.jdbc.Util.getInstance(Util.java:386)!at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1054)!at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4237)!at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4169)!at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:928)!at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1750)!at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1290)!at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2493)!at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2526)!at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2311)!at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:834)!at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)!at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)!at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)!at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)!at java.lang.reflect.Constructor.newInstance(Unknown Source)!at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)!at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:416)!at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:347)!at java.sql.DriverManager.getConnection(Unknown Source)!
@cfalguiere#DV14 #Monitoring
Log example2013-12-17 05:53:16,208 ERROR [Order Creation Service](456713) [shpras2](web1234) Could not create order id=456713 - Cause: Can’t connect to database ‘shop” - MySqlMessage: Access denied for user 'shopapp'@'shprdb1' to database 'shop'!
2013-12-17 05:53:16,208 !ERROR ![Order Creation Service]!(456713) ![shpras2]!(web1234) !Could not create order id=456713 !Cause: Can’t connect to database ‘shop” !MySqlMessage: Access denied for user 'shopapp'@'shprdb1' to database 'shop'!
@cfalguiere#DV14 #Monitoring
2013-12-17 05:53:16,208 !ERROR ![Order Creation Service]!(456713) ![shpras2]!(web1234) !Could not create order id=456713 !Cause: Can’t connect to database ‘shop” !MySqlMessage: Access denied for user 'shopapp'@'shprdb1' to database 'shop'!
Timestamp
Severity
Context (technical and business)}{Meaningful information
@cfalguiere#DV14 #Monitoring
Log Collectors
Logstash
Collectd
storage
Log
Alerting!System
Flume
Collector
Splunk (Commercial)
@cfalguiere#DV14 #Monitoring
Logstashinput {! file {! path => “/app/logs/apache/*.log”! type => "apachelog"! }!}!!filter {! if [type] == "apachelog" {! grok {! pattern => “%{COMBINEDAPACHELOG}" ! }! }!}!!output {! elasticsearch { host => localhost } ! stdout { }!}
@cfalguiere#DV14 #Monitoring
Logstashinput {! file {! path => “/app/logs/appserver/monitor*.log"! type => "applog"! }!}!!filter {! if [type] == "applog" {! grok {! pattern => “%{TIMESTAMP_ISO8601:ts}” %{WORD}:severity …! }! }!}!!output {! elasticsearch { host => localhost } ! stdout { }!}
@cfalguiere#DV14 #Monitoring
Rate check
• Frequency of an error increases!• Activity falls (e.g. Frequency of orders)!
!
• Alerting based on threshold
@cfalguiere#DV14 #Monitoring
Baselining
0
30
60
90
120
10:00 10:10 10:20 10:30 10:40 10:50 11:00 11:10
0
50
100
150
200
10:00 10:10 10:20 10:30 10:40 10:50 11:00 11:10
A
BB
D
C
@cfalguiere#DV14 #Monitoring
What would you do if you knew thatnumber of hits doubles every 2 month
0
30
60
90
120
Jan Feb Mar Apr May Jun Jul Aug
@cfalguiere#DV14 #Monitoring
Graphers
0
7,5
15
22,5
30
Sun Tue Thu Sat Mon Wed Wed
• Cycles
0
10
20
30
40
Sun Tue Thu Sat Mon Wed Wed
• Correlation
0
17,5
35
52,5
70
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
• Foresight
• Distribution
0
25
50
75
100
April May June July
Collectors (Collectd / Statd / Logstash / Flume)
@cfalguiere#DV14 #Monitoring
Storage / Visualization
Graphite
docker: lopter/collectd-graphite
RESTPlain
@cfalguiere#DV14 #Monitoring
Collect and ShareCollect Once and Share!• Support, !• Ops, Dev!• Business!!
UpToDate!Flexible!!
@cfalguiere#DV14 #Monitoring
Storage / Visualization
Graphite
docker: gsogol/docker-elk
InfluxDB
Kibana
Logstash
ElasticSearch
Grafana
Collectors (Collectd / Statd / Logstash / Flume)
RESTPlain REST REST
@cfalguiere#DV14 #Monitoring
JMX
source: wikipedia
• MBeans!• Registration!• Servo!• RMI and firewalls!• -Dcom.sun.management.jmxremote.rmi.port=p!
• -Djava.rmi.server.hostname=n.n.n.n!
• Jolokia!• jmxtrans!
!
@cfalguiere#DV14 #Monitoring
JMX Collectors
JMX beans
VisualVM!JConsole
logstash collectd
JMX Enabled!!
App Performance Monitoring
tools
storage
Collector
@cfalguiere#DV14 #Monitoring
JSON Event over REST
curl -X POST “…” ! -d '{"ts": "2013-12-17 05:53:16,208", !! "type": “metric”, !! “module”: “Order Creation Service”, !! “module-id”: “456713”, !! “instance”: “shpras2”, !! “thread”: “web1234”, ! “name”: “order-creation”,!! “duration”: “12”, !! “unit”: “ms”}
Timestamp
Context (technical and business)}} Metric)
@cfalguiere#DV14 #Monitoring
What would you do if you knew
why app is slow
@cfalguiere#DV14 #Monitoring
Tuning• Collectd/Statd plugins!• Metrics !• Commercial : Plumbr,
AppDynamics, New Relics!
!
!
Where does it spend time ?!Why ?
System
Back-EndDB
System
System
cross-check metrics from various sub-systems
Front-End
@cfalguiere#DV14 #Monitoring
What would you do if you knew thatusers struggle to find the order form
@cfalguiere#DV14 #Monitoring
Web Analytics / User tracking
• Web analytics!• Page counters!• Tagging!• Log parser!
!
• Google Analytics!• Piwik (docker: cfalguiere/docker-piwik)
• Reporting APIs
@cfalguiere#DV14 #Monitoring
What would you do if you knew what
users want to buy
@cfalguiere#DV14 #Monitoring
Model vs Big Data
• Expected information!• Explicit Model!• List of metrics
• Classification!• Machine Learning!• Patterns detection!
Highlights valuable metrics and relationships
Getting started
@cfalguiere#DV14 #Monitoring
List user stories and
metrics
setup monitoring
get facts
get hypothesis
add metrics
validate hypothesis
get facts
@cfalguiere#DV14 #Monitoring
What should I monitor ?Alerting & Post-Mortem :!Presence check
Activity (how many users, requests, orders …)
Ressources that are limited in size Physical : CPU, memory, free disk space, network bandwidth ...
Logical : pools, queues, caches, …
Errors
Others
@cfalguiere#DV14 #Monitoring
What should I monitor ?Plan & Improve :!Any information which is useful to understand the process
time spent for each major step
things that are done often or requires large datasets
user navigation
context
Listen to users and ops
@cfalguiere#DV14 #Monitoring
Continuous Improvement
Design for Failure
Learn from data
@cfalguiere#DV14 #Monitoring
Thank You
Recommended