Upload
others
View
25
Download
0
Embed Size (px)
Citation preview
Instrumenting a Data Center
2 / 69
3 / 69
4 / 69
The problem:Efficiently monitor hundreds or
thousands of servers
5 / 69
The solution:Automate it!
6 / 69
Automate what?data collection, storage, analysis, &
alerting
7 / 69
Easy!Write a few scripts & store the data in
SQL.
8 / 69
1 server
100 measurements
8,640 per day (once every 10s)
365 days
= 315 million records (points) per year9 / 69
10 servers
100 measurements per server
8,640 per day (once every 10s)
365 days
= 3.2 billion records (points) per year10 / 69
2,000 servers
200 measurements per server
17,280 per day (once every 5s)
365 days
= 2.5 trillion records (points) per year
11 / 69
Time series database is ideal for thistype of data & workload
12 / 69
"time series"
13 / 69
Values and time stampsValue
Time?14 / 69
Values and time stamps
Value
Time?
time value20150422 5:00 PM20150422 6:00 PM
10,020,0
15 / 69
CPU usage every 10 seconds...
Value
Time
16 / 69
Cumulative steps taken...
Step
Value
Step
Count Value
Time
17 / 69
Happy InfluxDB users...
Value
TimeInfluxDB
Users
18 / 69
Time series database
Value
Time
Step
Count Value
Time
Value
TimeInfluxDB
Users
Value
Time?19 / 69
20 / 69
InfluxDB
21 / 69
InfluxDB time series database
22 / 69
InfluxDB time series database
no external dependencies
23 / 69
InfluxDB time series database
no external dependencies
distributed & scalable
24 / 69
InfluxDB time series database
no external dependencies
distributed & scalable
easy to install, use, & maintain
25 / 69
InfluxDB time series database
no external dependencies
distributed & scalable
easy to install, use, & maintain
open source (MIT license)
26 / 69
InfluxDB time series database
no external dependencies
distributed & scalable
easy to install, use, & maintain
open source (MIT license)
written in Go27 / 69
Features SQLlike query language
HTTP(S) API for writes & queries
Supports other protocols (collectd, graphite,opentsdb)
Automated data retention policies
Aggregate data onthefly
28 / 69
Data model...
29 / 69
influx d
30 / 69
influx d
31 / 69
influx d
32 / 69
influx d
measurements
DISK
33 / 69
influx d
measurements
DISK
tags
country ='DEU'
region ='neast'
34 / 69
influx d
country ='DEU'
region ='west'
+
series = measurement + unique tag set
35 / 69
influx d
country ='DEU'
region ='west'
+
series = measurement + unique tag set
OR
country ='DEU'
region ='east'
+
36 / 69
influx d
Points: values in a series
country ='DEU'
region ='east'
+time value20150422 5:00 PM20150422 5:01 PM
10,020,0
country ='USA'
region ='east'
+time value20150422 5:00 PM20150422 5:01 PM
14,038,0
37 / 69
What's indexed?
38 / 69
What's indexed?Metadata:
Measurement names
39 / 69
What's indexed?Metadata:
Measurement names
Tag keys & values
40 / 69
What's indexed?Metadata:
Measurement names
Tag keys & values
Field keys
41 / 69
What's indexed?Metadata:
Measurement names
Tag keys & values
Field keys
Field values by time*
WHERE time > now() 1m
42 / 69
Metadata index is held inmemory atrun time
43 / 69
What's not indexed?
44 / 69
What's not indexed? Field values by value
WHERE value > 2.718
45 / 69
What's not indexed? Field values by value
WHERE value > 2.718
Metadata by time
SHOW MEASUREMENTS WHERE time > now() - 1h
46 / 69
Data retention
47 / 69
48 / 69
InfluxDB hasRetention Policies
49 / 69
How do they work?
50 / 69
All data is written to aretention policy
51 / 69
Each retention policy has aduration specified(between 1 hour and infinite)
(between 1 hour and infinite)
52 / 69
raw(1 day)
cpu_1h(1 week)
shards shards
SELECT mean(value) INTO cpu_1h.cpu FROM raw.cpu WHERE time > now() 1w GROUP BY time(1h)
53 / 69
How to automatedownsampling?
54 / 69
Continuous QueriesStored in the database and run periodically in
the background
55 / 69
raw(1 day)
cpu_1h(1 week)
shards shards
CREATE CONTINUOUS QUERY mycq ON mydbCREATE CONTINUOUS QUERY mycq ON mydb BEGIN BEGIN SELECT mean(value) SELECT mean(value) INTO cpu_1h.cpu INTO cpu_1h.cpu FROM raw.cpu FROM raw.cpu GROUP BY time(1h) GROUP BY time(1h) END END
56 / 69
Replication is handled throughRetention Policies
57 / 69
FunctionsAggregations:
count, distinct, mean, median, sum
count, distinct, mean, median, sum
58 / 69
FunctionsAggregations:
count, distinct, mean, median, sum
Selectors:
bottom, first, last, max, min, percentile, top
bottom, first, last, max, min, percentile, top
59 / 69
FunctionsAggregations:
count, distinct, mean, median, sum
Selectors:
bottom, first, last, max, min, percentile, top
Transformations:
ceiling, derivative, difference, floor, histogram,stddev
stddev60 / 69
Data ingestion
61 / 69
InfluxDB supports: collectd
openTSDB
graphite
62 / 69
Telegraf http://github.com/influxdb/telegraf
Open Source (MIT License)
Also written in Go
Plugin based (~30 currently and growing)
Cross platform
63 / 69
Client libraries
Client libraries Go
Ruby
Java
Python
etc. (http://github.com/influxdb)
64 / 69
HTTP(S) APIWrite data via HTTP using a simpletextbased protocol
cpu,host=serverA value=10.0 1234567890
65 / 69
Visualization
66 / 69
Grafana
67 / 69
Chronograf
68 / 69