DESCRIPTION
Initially presented at the OpenWest 2014 conference. Graphite and StatsD gather time series data and offer a robust set of APIs to access that data. While the tools are robust, the dashboards are straight from 1992 and alerting off the data is nonexistent. Nark, an open source project, solves both of these problems. It provides easy-to-use dashboards and readily available alerts and notifications to users. It has been used in production at Lucid Software for almost a year. Related to Nark are the tools required to make Graphite highly available.
GRAPHITE:
HIGHLY AVAILABLE
Alyssa Stringham & Matthew Barlocker
About Alyssa
Software Developer at Lucid Software Inc
BYU graduate with Bachelors in Computer Science
I love
Playing the carillon and piano
Fast-paced board games
Hats
Traveling
Playing foosball
About “The Barlocker”
• Chief Architect at Lucid Software Inc
• Bachelors degree from BYU in Computer Science
• I love to
• play board games
• go 4-wheeling
• wrestle my sons
• fly airplanes
• Follow me on nineofclouds.blogspot.com
Tools
Graphite
Graphite is a highly scalable real-time graphing system
Initially developed by Chris Davis at Orbitz.com
Composed of 3 related projects
Carbon – collects and records metrics
Whisper – Backend storage mechanism
Graphite-Web – HTTP frontend that displays graphs
Written in Python
http://graphite.wikidot.com/
https://github.com/graphite-project/
StatsD
A network daemon that aggregates statistics for backend services.
Developed by Etsy
Written in Node.js
https://github.com/etsy/statsd/
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
HA Receiver
Used to make StatsD highly available and scalable.
Initially developed by Matthew Barlocker at Lucid Software Inc
Written in Node.js
https://github.com/lucidsoftware/statsd-ha-receiver
Nark
Nark is an alerting and dashboard frontend for Graphite.
Under active development by Lucid Software.
Written in Scala using the Play! Framework
MySQL backed
https://github.com/lucidchart/nark
Demo
Data Flow Overview
Data Flows IN
Applications report different types of metrics
StatsD aggregates metrics
Carbon-cache gathers and groups metrics
Whisper stores metrics to disk
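The first hop of the inbound flow, an application reporting a metric to StatsD, can be sketched as below. This is a minimal illustration, assuming the standard StatsD line format (name:value|type) and the default StatsD UDP port 8125. The function names and the example metric name are hypothetical and not taken from the talk.

```python
import socket


def format_metric(name: str, value: float, metric_type: str) -> bytes:
    """Encode one metric in the StatsD line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one fire-and-forget UDP datagram to a StatsD listener.

    The host and port defaults are assumptions for a local StatsD daemon.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # "c" marks a counter; "ms" would mark a timer and "g" a gauge.
    login = format_metric("production.applications.chart.users.login", 1, "c")
    send_metric(login)
```

Because the transport is UDP, the application does not block or fail when the listener is unavailable, which is why reliability is handled further down the pipeline.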
Data Flows OUT
User initiates request over HTTP
Graphite-web requests information from carbon-cache
Carbon-cache reads data from disk using whisper
Graphite-web builds graph using data
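The outbound request in the first step can be illustrated with Graphite-web's render endpoint, which accepts target, from, and format query parameters. The sketch below only builds the URL; the hostname graphite.example.com and the helper name render_url are placeholders, not part of the presented setup.

```python
from urllib.parse import urlencode


def render_url(base_url: str, target: str, window: str = "-1h") -> str:
    """Build a Graphite-web render URL that returns JSON instead of an image."""
    query = urlencode({"target": target, "from": window, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query}"


if __name__ == "__main__":
    # Hypothetical host; the metric name is the one used as an example in the slides.
    print(render_url(
        "http://graphite.example.com",
        "stats.production.applications.chart.users.login",
    ))
```

Dropping the format parameter would ask Graphite-web for a rendered graph rather than raw data points.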
High Availability & Scaling
StatsD - Options
We can put StatsD in 3 places:
On the reporting server
Scales as well as your reporting servers do
As available as the reporting servers are
Can’t get vital metrics like stats.production.applications.chart.users.login
On a central server
Doesn’t scale
Single point of failure
On a load-balanced set of servers
AWS ELB doesn’t listen on UDP
One stat will be aggregated in multiple places
StatsD - Solution
StatsD with smart-repeater on reporting servers
Accepts UDP and sends TCP for reliability
Reduces chattiness over the wire
Allows aggregation to occur at a centralized location
As scalable and available as the application servers
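One way a repeater could reduce chattiness is to buffer the small UDP datagrams it receives and forward them as fewer, larger newline-delimited writes over TCP. The sketch below shows only that batching step. It is an assumption about the general technique, not the actual smart-repeater code, and batch_datagrams and max_bytes are hypothetical names.

```python
def batch_datagrams(datagrams: list[bytes], max_bytes: int = 1400) -> list[bytes]:
    """Group StatsD lines into newline-delimited payloads of at most max_bytes.

    A single line longer than max_bytes is still emitted on its own.
    """
    batches: list[bytes] = []
    current = b""
    for datagram in datagrams:
        line = datagram.strip()
        if not line:
            continue  # ignore empty datagrams
        candidate = line if not current else current + b"\n" + line
        if current and len(candidate) > max_bytes:
            # The batch is full: close it and start a new one with this line.
            batches.append(current)
            current = line
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches
```

Each returned payload would be written to the TCP connection in one call, so the metrics stay unaggregated and can still be combined at the central location.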
StatsD - Solution
AWS Elastic Load Balancer distributes traffic to ha-receivers
HA-receivers: Duplicate and transform metrics
Deliver metrics to correct server for aggregation
Are stateless – they scale horizontally
Are highly available behind the ELB
StatsD - Solution
HA-receivers pass the data to StatsD
StatsD does the final aggregation
Every metric has exactly one StatsD destination
Aggregated metrics are sent to carbon
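The guarantee that every metric has exactly one StatsD destination can be met by any stateless receiver that maps a metric name to a server deterministically. A minimal sketch, assuming a hash-modulo scheme (the real ha-receiver may choose destinations differently; pick_destination and the host list are hypothetical):

```python
import hashlib


def pick_destination(metric_name: str, destinations: list[str]) -> str:
    """Map a metric name to one StatsD server, identically on every receiver."""
    digest = hashlib.sha256(metric_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(destinations)
    return destinations[index]


if __name__ == "__main__":
    statsd_servers = ["statsd-1:8125", "statsd-2:8125", "statsd-3:8125"]  # placeholders
    print(pick_destination("production.applications.chart.users.login", statsd_servers))
```

Since the choice depends only on the metric name and the server list, any receiver behind the load balancer sends a given metric to the same StatsD instance, so the counts are aggregated in one place. Changing the server list remaps most metrics under this simple scheme.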
Carbon & Whisper
Carbon and whisper direct data to disk
The daemons are stateless except for buffers
Carbon consists of multiple daemons
Carbon-relay: Direct traffic to other carbon daemons
Carbon-aggregator: A mix between carbon-relay and StatsD
Carbon-cache: Gather metrics in a buffer, and write them to disk using whisper
Whisper is called from carbon-cache, and is short-lived
Carbon & Whisper
We chose to use sharding
Every server holds 1/n metrics, where n = # shards
All servers in a shard hold the same data
Syncing data requires a single rsync
A binary tree of carbon-relays is used to pick a shard
Adding new shards is as easy as adding a new node in the binary tree of carbon-relays
Retrieving data can be done by checking one server from every shard
Carbon & Whisper
StatsD sends metrics to the root carbon-relay on localhost
Carbon-relays are set up in a binary tree to pick a shard
Every metric goes to exactly one shard
Every carbon-relay goes to either 1 shard or 2 relays
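The routing rule above (each relay forwards either to one shard or to two further relays) can be modeled as a walk down a binary tree. The sketch below is only an illustration of that shape: it uses bits of a hash of the metric name to choose a branch, which is an assumption, and it does not reproduce how carbon-relay is actually configured. The Relay class, pick_shard, and the shard names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Relay:
    """A relay that forwards to one shard (leaf) or to two child relays."""
    shard: Optional[str] = None
    left: Optional["Relay"] = None
    right: Optional["Relay"] = None


def pick_shard(root: Relay, metric_name: str) -> str:
    """Walk from the root relay to a shard, using one hash bit per level."""
    bits = int.from_bytes(hashlib.sha256(metric_name.encode("utf-8")).digest(), "big")
    node = root
    while node.shard is None:
        # Assumes every non-leaf relay has both children set.
        node = node.right if bits & 1 else node.left
        bits >>= 1
    return node.shard


# Three shards: the right-hand relay was split into two when a shard was added.
tree = Relay(
    left=Relay(shard="shard-a"),
    right=Relay(left=Relay(shard="shard-b"), right=Relay(shard="shard-c")),
)
```

In this model, adding a shard means replacing one leaf relay with a relay that has two children. Note that an unbalanced tree like the one above sends roughly half of the metrics to shard-a.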
Carbon & Whisper
Carbon-cache receives the metrics from the final relay
Metrics are written to disk using whisper on localhost
Carbon-cache has a last-in-wins policy
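A last-in-wins policy can be read as: when two values arrive for the same metric and timestamp, the later arrival replaces the earlier one. The sketch below models that reading with a dictionary. It is an interpretation of the slide, not carbon-cache's implementation, and apply_last_in_wins is a hypothetical name.

```python
def apply_last_in_wins(
    points: list[tuple[str, int, float]],
) -> dict[tuple[str, int], float]:
    """Keep only the most recently received value per (metric, timestamp)."""
    stored: dict[tuple[str, int], float] = {}
    for metric, timestamp, value in points:
        # A later arrival for the same key overwrites the earlier value.
        stored[(metric, timestamp)] = value
    return stored
```

Under this reading, a metric must reach carbon-cache already aggregated, because two partial counts for the same interval would not be summed.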
graphite-web
Graphite-web is stateless
All state is contained within carbon-cache
Reading data out from a highly available, scalable graphite installation is the same as reading from a single server
Use the same ELB as the ha-receiver
Nark
Nark is stateless
All state is contained in MySQL and Graphite
Nark will be no more highly available than your MySQL and Graphite installations
Use an ELB, an autoscale group, and a multi-AZ RDS instance
Recap
Questions?
Feature Requests?
Thanks For Your Time
Join The Team
• Building the next generation of collaborative web applications
• VC funded
• High growth rate
• Profitable
• Graduates from Harvard, MIT, Stanford
• Former Google, Amazon, Microsoft employees
https://www.golucid.co/jobs