DESCRIPTION
As presented at GeeCon 2013. We have lots of information available about our systems: CPU, disk IO, orders placed, error rates, users logged in. But typically all these pieces of information are collected, aggregated and stored in very different ways, making correlation difficult and increasing the operational overhead of our systems. What if we could treat all of this information as events? What if we could aggregate, store, and report on all of this information as a uniform event stream? This talk will look at emerging trends in log aggregation, monitoring and event streaming to paint a picture of how you too can start to make real use of the information already available to you, using nothing more complex than some free, off-the-shelf open source software.
@samnewman#geecon
Surfing The Event Stream
Sam Newman
ThoughtWorks
Sunday, 21 July 13
Operational Data
CPU
Disk IO
Memory Use
Threads
Collection & Display
• sar
• syslog
• collectd
• syslog-ng
• nagios
• ganglia
[Diagram: four separate Servers]
Business Data
Orders Placed
Bounce Rate
Revenue
Fraud Cases
How did we handle them?
• Google Analytics
• Data Warehouse Systems
• Log files!
Something Happened!
What Should We Do?
http://blog.jgc.org/2006/05/what-slashdot-effect-looks-like.html
Fast
And Easy...
At Scale
Aggregation Is Key
Mark McGranaghan: "Logs as Data"
http://blip.tv/clojure/mark-mcgranaghan-logs-as-data-5953857
Paul Ingles: "Users as Data"
http://vimeo.com/45136211
Logstash + Graylog2
Graphite
www01.cpuUsage 42 1286269200
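The line above is Graphite's plaintext format: a dot-separated metric path, a value, and a Unix timestamp. A minimal Python sketch of producing and sending such a line, assuming Carbon's plaintext listener on its conventional TCP port 2003; the host name graphite.example.com is a placeholder, not something from the talk:

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one data point as '<metric path> <value> <unix timestamp>' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host="graphite.example.com", port=2003):
    """Send pre-formatted lines to Carbon over TCP (host is a placeholder assumption)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Rebuild the example data point from the slide.
example = graphite_line("www01.cpuUsage", 42, 1286269200)
```

Because the format is one line per point, anything that can open a socket (a cron job, collectd, an application library) can feed Graphite.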
???
Graphite
[Diagram: collectd on each Server and Yammer Metrics inside the App Server sending metrics to Graphite]
Volume!
Aggregation!
orderplaced 1 1286269200
orderplaced 1 1286269200
orderplaced = 1 (two orders were placed, but the second point overwrote the first)
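The problem on this slide can be modelled in a few lines. This is a simplified sketch, assuming a store that keeps exactly one value per metric and time slot, as Graphite's Whisper files do: the second point replaces the first, whereas summing before storage gives the count we actually wanted.

```python
from collections import defaultdict

# Two orders placed within the same second, as on the slide.
POINTS = [
    ("orderplaced", 1, 1286269200),
    ("orderplaced", 1, 1286269200),
]


def last_write_wins(points):
    """One value per (metric, timestamp) slot: a later point overwrites an earlier one."""
    store = {}
    for metric, value, timestamp in points:
        store[(metric, timestamp)] = value
    return store


def aggregate_then_store(points):
    """Sum all points that fall into the same slot before storing them."""
    store = defaultdict(int)
    for metric, value, timestamp in points:
        store[(metric, timestamp)] += value
    return dict(store)


overwritten = last_write_wins(POINTS)[("orderplaced", 1286269200)]  # 1
summed = aggregate_then_store(POINTS)[("orderplaced", 1286269200)]  # 2
```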
StatsD
Counters
ordersplaced:1|c
Timings
orderduration:140|ms
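Both message types are plain text, usually sent as UDP datagrams so that the application never waits on the metrics system. A minimal sketch, assuming a StatsD daemon listening on localhost at its conventional UDP port 8125; the helper names are illustrative, not a particular client library's API:

```python
import socket


def counter(name, value=1):
    """StatsD counter line: '<name>:<value>|c'."""
    return f"{name}:{value}|c"


def timing(name, millis):
    """StatsD timing line: '<name>:<milliseconds>|ms'."""
    return f"{name}:{millis}|ms"


def send(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send: no reply is awaited from the daemon."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# The two example messages from the slides.
messages = [counter("ordersplaced"), timing("orderduration", 140)]
```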
[Diagram: Clients send metrics to StatsD, which aggregates them and forwards the results to Graphite]
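StatsD's job in this picture is to turn many small messages into a few aggregates per flush interval before anything reaches Graphite, which is what avoids the overwrite problem. A toy Python sketch of that idea, not StatsD's actual code: counters are summed, timings are reduced to a count, mean and maximum, and the output metric names are simplified assumptions (real StatsD adds its own prefixes and more statistics).

```python
from collections import defaultdict


class ToyStatsD:
    """Illustrative aggregator; names and output layout are assumptions, not StatsD's."""

    def __init__(self):
        self.counters = defaultdict(float)
        self.timers = defaultdict(list)

    def receive(self, line):
        # Parse '<name>:<value>|<type>'; any sample rate ('|@0.1') is ignored here.
        name, rest = line.split(":", 1)
        value, kind = rest.split("|")[:2]
        if kind == "c":
            self.counters[name] += float(value)
        elif kind == "ms":
            self.timers[name].append(float(value))
        else:
            raise ValueError(f"unsupported metric type: {kind}")

    def flush(self, timestamp):
        """Return Graphite plaintext lines for this interval, then reset all state."""
        lines = [f"{name} {total:g} {timestamp}\n"
                 for name, total in sorted(self.counters.items())]
        for name, samples in sorted(self.timers.items()):
            lines.append(f"{name}.count {len(samples)} {timestamp}\n")
            lines.append(f"{name}.mean {sum(samples) / len(samples):g} {timestamp}\n")
            lines.append(f"{name}.upper {max(samples):g} {timestamp}\n")
        self.counters.clear()
        self.timers.clear()
        return lines
```

Feeding it two `orderplaced:1|c` messages and flushing yields a single line, `orderplaced 2 1286269200` for that timestamp, so Graphite only ever sees one value per slot.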
Riemann
[Diagram: Clients send events to Riemann, which forwards them to Graphite]
(where (service "api req") (percentiles 5 [0.5 0.95 0.99] index))
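This Riemann stream selects events for the "api req" service and, every 5 seconds, computes the 50th, 95th and 99th percentiles of their metrics. A Python sketch of the same windowed-percentile idea, assuming events arrive as (time, metric) pairs and using the nearest-rank convention; Riemann's own quantile calculation may differ in detail:

```python
import math


def nearest_rank(sorted_values, q):
    """Nearest-rank quantile: the smallest value with at least q of the samples at or below it."""
    if not sorted_values:
        raise ValueError("no samples in window")
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[min(rank, len(sorted_values)) - 1]


def windowed_percentiles(events, interval, quantiles):
    """Bucket (time, metric) events into fixed windows; report quantiles per window start."""
    windows = {}
    for t, metric in events:
        start = int(t // interval) * interval
        windows.setdefault(start, []).append(metric)
    return {
        start: {q: nearest_rank(sorted(values), q) for q in quantiles}
        for start, values in sorted(windows.items())
    }


# 100 hypothetical API latencies (1..100 ms), all inside the first 5-second window.
latencies = [(i * 0.01, i + 1) for i in range(100)]
report = windowed_percentiles(latencies, 5, [0.5, 0.95, 0.99])
```

Reporting percentiles rather than a mean keeps the slow tail visible: a handful of very slow requests moves the 99th percentile even when the average barely changes.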
(def tell-ops (rollup 5 3600 (email "[email protected]")))
(streams (where (state "critical") tell-ops))
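Here tell-ops is a rollup: the email stream is invoked at most 5 times per 3600-second interval, receiving batches of events rather than one email per event, so a flapping service cannot flood the ops inbox. The Python sketch below models that idea under simplifying assumptions: the first n - 1 events in a window go straight through, the rest are held and delivered together, and the held batch is flushed when the next event arrives after the window closes (Riemann itself flushes on a timer and may differ at the boundaries).

```python
class Rollup:
    """Deliver events to `child` at most n times per dt-second window (simplified model)."""

    def __init__(self, n, dt, child):
        self.n, self.dt, self.child = n, dt, child
        self.window_start = None
        self.delivered = 0
        self.held = []

    def __call__(self, event, now):
        # A new window begins once dt seconds have passed: flush what was held back.
        if self.window_start is None or now >= self.window_start + self.dt:
            self._flush()
            self.window_start, self.delivered = now, 0
        if self.delivered < self.n - 1:
            self.delivered += 1
            self.child([event])      # pass through immediately, as a one-event batch
        else:
            self.held.append(event)  # over budget: save for the window's final batch

    def _flush(self):
        if self.held:
            self.child(self.held)
            self.held = []
```

With n = 5 and dt = 3600, as on the slide, this model would send the first four critical events immediately and everything after that as one batch per hour.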
(let [client (tcp-client :host "aggregator")] (by [:host :service] (changed :state (forward client))))
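This configuration splits the stream by host and service and forwards an event to the upstream "aggregator" server only when that pair's state changes, which keeps traffic between Riemann servers small. A Python sketch of the same filtering logic, assuming events are dicts with host, service and state keys; the forward callback stands in for Riemann's TCP client, and in this sketch the first event seen for each pair is forwarded too:

```python
class ChangedByHostService:
    """Forward an event only when its state differs from the last one seen for (host, service)."""

    def __init__(self, forward):
        self.forward = forward  # stand-in for sending to the upstream server
        self.last_state = {}    # (host, service) -> last state seen

    def __call__(self, event):
        key = (event["host"], event["service"])
        if self.last_state.get(key) != event["state"]:
            self.last_state[key] = event["state"]
            self.forward(event)  # first sighting or a genuine state change
```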
[Diagram: two Riemann Servers, each receiving events from its own Clients, forwarding to a third Riemann Server]
So What Do We Have?
Server Server
StatsD/Riemann
Graylog 2
Graphite
Dashboard A
Dashboard B
Dashboard C
http://shopify.github.io/dashing/
Realtime Aggregator
Data is lost!
Real-time metrics require upfront knowledge
Realtime Aggregator
Lossless Event Store
Hadoop
HBase
Cassandra
Riemann Server
Client Client
Lossless Event Store
Event Sourcing
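Keeping every raw event means a question nobody anticipated can still be answered later by replaying the store, which a real-time aggregator configured up front cannot do. A minimal Python sketch, with hypothetical event fields (type, country) chosen only for illustration:

```python
from functools import reduce

# Hypothetical raw events, kept in arrival order in a lossless store.
EVENTS = [
    {"type": "orderplaced", "country": "PL"},
    {"type": "login", "country": "PL"},
    {"type": "orderplaced", "country": "UK"},
    {"type": "orderplaced", "country": "PL"},
]


def replay(events, apply_event, initial):
    """Rebuild a view by folding every stored event into the state, oldest first."""
    return reduce(apply_event, events, initial)


def count_orders(total, event):
    # The view a real-time aggregator might have been configured for up front.
    return total + 1 if event["type"] == "orderplaced" else total


def orders_by_country(counts, event):
    # A view nobody asked for at collection time, derived later from the same events.
    if event["type"] != "orderplaced":
        return counts
    return {**counts, event["country"]: counts.get(event["country"], 0) + 1}


total_orders = replay(EVENTS, count_orders, 0)       # 3
per_country = replay(EVENTS, orders_by_country, {})  # {'PL': 2, 'UK': 1}
```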
But...
Can I have one view?
Lossless Event Store
Realtime Aggregator
http://nathanmarz.com/
Lambda Architecture
Lossless Event Store: consistent, but out of date
Realtime Aggregator: up to date, but only for a small window
Unified Query
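The unified query combines the two sides: a batch view recomputed from the lossless store up to the last batch run, plus a real-time view covering only the events since then. A minimal Python sketch of that merge, assuming events are (timestamp, metric) pairs and that the answer wanted is a simple count; the cutoff value is hypothetical:

```python
def batch_view(events, cutoff):
    """Counts recomputed from the full event store, but only for events before `cutoff`."""
    counts = {}
    for ts, metric in events:
        if ts < cutoff:
            counts[metric] = counts.get(metric, 0) + 1
    return counts


class RealtimeView:
    """Incremental counts for events at or after `cutoff`, the span the batch has not covered."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.counts = {}

    def add(self, ts, metric):
        if ts >= self.cutoff:
            self.counts[metric] = self.counts.get(metric, 0) + 1


def unified_query(metric, batch, realtime):
    """Consistent-but-stale batch answer plus fresh-but-partial real-time answer."""
    return batch.get(metric, 0) + realtime.counts.get(metric, 0)


EVENTS = [(100, "orderplaced"), (200, "orderplaced"), (300, "orderplaced")]
CUTOFF = 250  # hypothetical time of the last completed batch run

batch = batch_view(EVENTS, CUTOFF)
live = RealtimeView(CUTOFF)
for ts, metric in EVENTS:
    live.add(ts, metric)
```

Splitting at a single cutoff means no event is counted twice or missed; when the next batch run completes, the cutoff moves forward and the real-time view only has to cover a shorter span.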
The Future?
Server Server
Aggregating Relay
Graphite
Graylog 2
Hadoop
Unified Query
All Your Data
In Realtime
Find and free your data
Start simple
Create different views for different stakeholders
Don’t be scared of real-time!