Description
This presentation is primarily focused on how to use collectd (http://collectd.org/) to gather data from the Postgres statistics tables. Examples of how to use collectd with Postgres will be shown. There is some hackery involved to make collectd do a little more and collect more meaningful data from Postgres. These small patches will be explored. A small portion of the discussion will be about how to visualize the data.
My Story
• How did I get to collectd?
• What is collectd?
• Hacking collectd
• Using collectd with Postgres
• Visualizing the data
markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 2 / 43
Brief background
• Working at a little company called Emma http://myemma.com
• Collect performance data from production systems
What did we have?
• A database with over 1 million database objects
  • >500,000 tables
  • >1,000,000 indexes
• Tables alone generate 11,000,000 data points per sample
What did we try?
Only free things:
• Cacti http://www.cacti.net/
• Ganglia http://ganglia.info/
• Munin http://munin-monitoring.org/
• Reconnoiter https://labs.omniti.com/labs/reconnoiter
• Zenoss http://community.zenoss.org/
What doesn’t work
Dependency on RRDtool; can’t handle more than hundreds of thousands of metrics (Application Buffer-Cache Management for Performance: Running the World’s Largest MRTG by David Plonka, Archit Gupta and Dale Carder, LISA 2007):
• Cacti
• Ganglia
• Munin
• Reconnoiter
• Zenoss
Reconnoiter almost worked for us
Pros:
• Write your own SQL queries to collect data from Postgres
• Used Postgres instead of RRDtool for storing data
• JavaScript based on-the-fly charting
• Support for integrating many other health and stats collection solutions
Cons:
• Data collection still couldn’t keep up; maybe needed more tuning
• Faster hardware? (we were using VMs)
• More hardware? (scale out MQ processes)
Couldn’t bring myself to try anything else
• Hands were tied, no resources available to help move forward.
• Can we build something lightweight?
• Played with collectd (http://collectd.org/) while evaluating Reconnoiter
What is collectd?
collectd is a daemon which collects system performance statistics periodically and provides mechanisms to store the values in a variety of ways, for example in RRD files.
http://collectd.org/
Does this look familiar?
Note: RRDtool is an option, not a requirement.
What is special about collectd?
From their web site:
• it’s written in C for performance and portability
• includes optimizations and features to handle hundreds of thousands of data sets
• PostgreSQL plugin enables querying the database
• Can collect most operating system statistics (I say “most” because I don’t know if anything is missing)
• Over 90 total plugins http://collectd.org/wiki/index.php/Table_of_Plugins
collectd data description
• time - when the data was collected
• interval - frequency of data collection
• host - server hostname
• plugin - collectd plugin used
• plugin instance - additional plugin information
• type - type of data collected for a set of values
• type instance - unique identifier of the metric
• dsnames - names for the values collected
• dstypes - type of data for values collected (e.g. counter, gauge, etc.)
• values - array of values collected
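Putting the fields together, one value list can be pictured as a plain record. This is only a sketch; the sample values are borrowed from the example data shown later in these slides:

```python
# A single collectd value list as a Python dict; field names follow the
# list above, and the values are sample data.
value_list = {
    "time": 1319151857,           # when the data was collected (Unix epoch)
    "interval": 300,              # data collected every 300 seconds
    "host": "pong.int",           # server hostname
    "plugin": "postgresql",       # collectd plugin used
    "plugin_instance": "sandbox", # additional plugin information
    "type": "counter",            # type of data collected
    "type_instance": "seq_scan",  # identifier of the metric
    "dsnames": ["value"],         # names for the values collected
    "dstypes": ["counter"],       # counter, gauge, etc.
    "values": [249873],           # one value per dsname/dstype
}
```

The dsnames, dstypes, and values arrays are parallel: the n-th entry of each describes one collected number.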
PostgreSQL plugin configuration
Define custom queries in collectd.conf:
LoadPlugin postgresql
<Plugin postgresql>
<Query magic>
Statement "SELECT magic FROM wizard;"
<Result>
Type gauge
InstancePrefix "magic"
ValuesFrom magic
</Result>
</Query>
...
. . . per database.
...
<Database bar>
Interval 60
Service "service_name"
Query backend # predefined
Query magic_tickets
</Database>
</Plugin>
Full details at http://collectd.org/wiki/index.php/Plugin:PostgreSQL
Hurdles
More meta data:
• Need a way to save schema, table, and index names; can’t differentiate stats between tables and indexes
• Basic support for meta data exists in collectd but is mostly unused
• How to store data in something other than RRDtool
Wanted: additional meta data
Hack the PostgreSQL plugin to create meta data for:
• database - database name (maybe not needed, same as plugin instance)
• schemaname - schema name
• tablename - table name
• indexname - index name
• metric - e.g. blks_hit, blks_read, seq_scan, etc.
Another database query for collecting a table statistic
<Query table_stats>
Statement "SELECT schemaname, relname, seq_scan FROM pg_stat_all_tables;"
</Query>
Identify the data
<Result>
Type counter
InstancePrefix "seq_scan"
InstancesFrom "schemaname" "relname"
ValuesFrom "seq_scan"
</Result>
Meta data specific parameters
<Database postgres>
Host "localhost"
Query table_stats
SchemanameColumn 0
TablenameColumn 1
</Database>
Note: The database name is set by what is specified in the <Database> tag, if it is not retrieved by the query.
Example data
• time: 2011-10-20 18:04:17-05
• interval: 300
• host: pong.int
• plugin: postgresql
• plugin instance: sandbox
• type: counter
• type instance: seq_scan-pg_catalog-pg_class
• dsnames: {value}
• dstypes: {counter}
• values: {249873}
Example meta data
• database: sandbox
• schemaname: pg_catalog
• tablename: pg_class
• indexname:
• metric: seq_scan
Now what?
Hands were tied (I think I mentioned that earlier); open sourced work to date:
• collectd forked with patches https://github.com/mwongatemma/collectd
• YAMS https://github.com/myemma/yams
Yet Another Monitoring System
Switching hats and boosting code
Using extracurricular time working on equipment donated to Postgres from Sun, IBM, and HP to continue proofing collectd changes.
How am I going to move the data?
Options from available write plugins; guess which I used:
• Carbon - Graphite’s storage API to Whisper
http://collectd.org/wiki/index.php/Plugin:Carbon
• CSV http://collectd.org/wiki/index.php/Plugin:CSV
• Network - Send/Receive to other collectd daemons http://collectd.org/wiki/index.php/Plugin:Network
• RRDCacheD http://collectd.org/wiki/index.php/Plugin:RRDCacheD
• RRDtool http://collectd.org/wiki/index.php/Plugin:RRDtool
• SysLog http://collectd.org/wiki/index.php/Plugin:SysLog
• UnixSock http://collectd.org/wiki/index.php/Plugin:UnixSock
• Write HTTP - PUTVAL (plain text), JSON http://collectd.org/wiki/index.php/Plugin:Write_HTTP
Process of elimination
If RRDtool (written in C) can’t handle massive volumes of data, a Python RRD-like database probably can’t either:
• Carbon
• CSV
• Network
• RRDCacheD
• RRDtool
• SysLog
• UnixSock
• Write HTTP - PUTVAL (plain text), JSON
Process of elimination
Writing to other collectd daemons or just locally doesn’t seem useful at the moment:
• CSV
• Network
• SysLog
• UnixSock
• Write HTTP - PUTVAL (plain text), JSON
Process of elimination
Let’s try CouchDB’s RESTful JSON API!
• CSV
• SysLog
• Write HTTP - PUTVAL (plain text), JSON
Random: What Write HTTP PUTVAL data looks like
Note: Each PUTVAL is a single line but is broken up into two lines to fit onto the slide.
PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets
interval=10 1251533299:197141504:175136768
PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_ops
interval=10 1251533299:10765:12858
PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_time
interval=10 1251533299:5:140
PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_merged
interval=10 1251533299:4658:29899
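A PUTVAL line can be pulled apart with a few string splits. Below is a minimal, hypothetical parser: it assumes the simple host/plugin-instance/type identifier and integer values seen above, and does not handle every option the real plain-text protocol allows.

```python
def parse_putval(line):
    """Parse one PUTVAL line into a dict (sketch; assumes the
    'PUTVAL identifier [interval=N] time:value[:value...]' shape
    with integer values, as in the slide's examples)."""
    parts = line.split()
    assert parts[0] == "PUTVAL"
    # Identifier is host/plugin[-instance]/type
    host, plugin, type_ = parts[1].split("/")
    interval = None
    for opt in parts[2:-1]:  # options between identifier and values
        if opt.startswith("interval="):
            interval = int(opt.split("=", 1)[1])
    time_str, *values = parts[-1].split(":")
    return {
        "host": host,
        "plugin": plugin,
        "type": type_,
        "interval": interval,
        "time": int(time_str),
        "values": [int(v) for v in values],
    }
```

Feeding it the first example above (rejoined onto one line) yields host leeloo.lan.home.verplant.org, plugin disk-sda, type disk_octets, and two counter values.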
Random: What the Write HTTP JSON data looks like
Note: Write HTTP packs as much data as it can into a 4KB buffer.
[ {
"values": [197141504, 175136768],
"dstypes": ["counter", "counter"],
"dsnames": ["read", "write"],
"time": 1251533299,
"interval": 10,
"host": "leeloo.lan.home.verplant.org",
"plugin": "disk",
"plugin_instance": "sda",
"type": "disk_octets",
"type_instance": ""
}, ... ]
I didn’t know anything about CouchDB at the time
• Query interface not really suited for retrieving data to visualize
• Insert performance not suited for millions of metrics over short intervals (the same data can be inserted into Postgres several orders of magnitude faster)
Now where am I going to put the data?
Hoping that using the Write HTTP plugin is still a good choice:
• Write an ETL
  • Table partitioning logic; creation of partition tables
  • Transform JSON data into INSERT statements
• Use Postgres
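The transform step can be sketched in a few lines: turn one Write HTTP JSON record into a parameterized INSERT against the collectd.value_list table shown on the next slide. This is a sketch, not the real ETL; a production version would also route the row to the right partition table and convert the epoch time to a timestamp. The placeholder style is psycopg2's %(name)s convention.

```python
# Columns of collectd.value_list, in the order shown on the next slide.
COLUMNS = ("time", "interval", "host", "plugin", "plugin_instance",
           "type", "type_instance", "dsnames", "dstypes", "values")

def to_insert(record):
    """Build (sql, params) for one Write HTTP JSON record (sketch)."""
    cols = ", ".join('"%s"' % c for c in COLUMNS)      # quoted identifiers
    params = ", ".join("%%(%s)s" % c for c in COLUMNS) # %(name)s placeholders
    sql = "INSERT INTO collectd.value_list (%s) VALUES (%s)" % (cols, params)
    return sql, {c: record.get(c) for c in COLUMNS}
```

With psycopg2 this pair would go straight into cursor.execute(sql, params).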
Database design
Table "collectd.value_list"
Column | Type | Modifiers
-----------------+--------------------------+-----------
time | timestamp with time zone | not null
interval | integer | not null
host | character varying(64) | not null
plugin | character varying(64) | not null
plugin_instance | character varying(64) |
type | character varying(64) | not null
type_instance | character varying(64) |
dsnames | character varying(512)[] | not null
dstypes | character varying(8)[] | not null
values | numeric[] | not null
Take advantage of partitioning
At least table inheritance in Postgres’ case; partition data by plugin
Child table
Table "collectd.vl_postgresql"
Column | Type | Modifiers
-----------------+--------------------------+-----------
...
database | character varying(64) | not null
schemaname | character varying(64) |
tablename | character varying(64) |
indexname | character varying(64) |
metric | character varying(64) | not null
Check constraints:
    "vl_postgresql_plugin_check" CHECK (plugin::text =
    'postgresql'::text)
Inherits: value_list
How much partitioning?
Lots of straightforward options:
• Date
• Database
• Schema
• Table
• Index
• Metric
Back to the ETL
Parameters set for fastest path to working prototype:
• Keep using HTTP POST (Write HTTP plugin) for the HTTP protocol and JSON
• Use Python for its built-in HTTP server and JSON parsing (Emma is primarily a Python shop)
• Use SQLAlchemy/psycopg2
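The receiving end of that prototype fits in a few dozen lines. Here is a minimal sketch using only Python's built-in http.server and json modules; the handler and queue names are hypothetical, and the real prototype's insert side (SQLAlchemy/psycopg2) is omitted.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # value lists queued for the INSERT stage (sketch only)

class CollectdHandler(BaseHTTPRequestHandler):
    """Accepts Write HTTP JSON POSTs and queues the decoded records."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))  # a JSON array
        received.extend(payload)
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet

def make_server(port=0):
    # port=0 lets the OS pick a free port
    return HTTPServer(("localhost", port), CollectdHandler)
```

collectd's Write HTTP plugin would then be pointed at this endpoint's URL in collectd.conf.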
Back again to the ETL
Python didn’t perform; the combination of JSON parsing, data transformation, and INSERT performance was still several orders of magnitude below acceptable levels:
• redis to queue data to transform
• lighttpd for the HTTP interface
• fastcgi C program to push things to redis
• multi-threaded C program using libpq for the Postgres API
  • pop data out of redis
  • table partitioning creation logic
  • transform JSON data into INSERT statements
Success?
• Table statistics for 1 million tables are collected in approximately 12 minutes.
• Is that acceptable?
• Can we go faster?
If you don’t have millions of data points
Easier ways to visualize the data:
• RRDtool
• RRDtool-compatible front-ends http://collectd.org/wiki/index.php/List_of_front-ends
• Graphite, with the Carbon and Whisper combo http://graphite.wikidot.com/
• Reconnoiter
__ __
/ \~~~/ \ . o O ( Thank you! )
,----( oo )
/ \__ __/
/| (\ |(
^ \ /___\ /\ |
|__| |__|-"
Acknowledgements
Hayley Jane Wakenshaw
__ __
/ \~~~/ \
,----( oo )
/ \__ __/
/| (\ |(
^ \ /___\ /\ |
|__| |__|-"
License
This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.