Highly Available Graphite

GRAPHITE: HIGHLY AVAILABLE

Alyssa Stringham & Matthew Barlocker

About Alyssa

Software Developer at Lucid Software Inc

BYU graduate with a bachelor's degree in Computer Science

I love

Playing the carillon and piano

Fast-paced board games

Traveling

Playing foosball

About “The Barlocker”

• Chief Architect at Lucid Software Inc

• Bachelor's degree in Computer Science from BYU

• I love to

• play board games

• go 4-wheeling

• wrestle my sons

• fly airplanes

• Follow me on nineofclouds.blogspot.com

Graphite

Graphite is a highly scalable real-time graphing system

Initially developed by Chris Davis at Orbitz.com

Composed of 3 related projects:

Carbon – collects and records metrics

Whisper – backend storage mechanism

Graphite-Web – HTTP frontend that displays graphs

Written in Python

http://graphite.wikidot.com/

https://github.com/graphite-project/
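Carbon's job of collecting metrics is easiest to see from the client side. Below is a minimal Python sketch of a client for carbon's plaintext protocol, where each datapoint is one "<path> <value> <timestamp>" line sent over TCP. The function names, the host, and the example metric path are hypothetical; port 2003 is carbon's default plaintext listener port, but check your carbon.conf.

```python
import socket
import time
from typing import Optional


def format_carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one plaintext-protocol line; the timestamp defaults to now (Unix seconds)."""
    if timestamp is None:
        timestamp = int(time.time())
    if " " in path:
        # The protocol is space-delimited, so a space in the path would corrupt the line.
        raise ValueError("metric paths must not contain spaces")
    return f"{path} {value} {timestamp}\n"


def send_to_carbon(lines: list[str], host: str = "localhost", port: int = 2003) -> None:
    """Send a batch of newline-terminated lines to a carbon listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))
```

For example, format_carbon_line("servers.web1.load", 1.5, 1700000000) yields "servers.web1.load 1.5 1700000000\n".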

StatsD

A network daemon that aggregates statistics for backend services.

Developed by Etsy

Written in Node.js

https://github.com/etsy/statsd/

http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
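Applications talk to StatsD with a small text format over UDP. A minimal sketch of a client, assuming StatsD's default port 8125 and covering only counters, timers, and gauges; the helper names and metric names are hypothetical.

```python
import socket
from typing import Optional

# StatsD line format: "<name>:<value>|<type>", optionally followed by "|@<sample rate>".
# Types handled here: "c" (counter), "ms" (timer), "g" (gauge).
SUPPORTED_TYPES = ("c", "ms", "g")


def format_statsd(name: str, value: float, metric_type: str,
                  sample_rate: Optional[float] = None) -> str:
    if metric_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None and sample_rate < 1:
        # Tells StatsD this value was sampled, so it can scale counts back up.
        line += f"|@{sample_rate}"
    return line


def send_statsd(line: str, host: str = "localhost", port: int = 8125) -> None:
    """Fire-and-forget UDP send; a lost packet is silently dropped."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

Because the transport is UDP, send_statsd never blocks on, or learns about, a lost packet. That is why TCP hops appear wherever reliability matters in the HA design.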

HA Receiver

Used to make StatsD highly available and scalable.

Initially developed by Matthew Barlocker at Lucid Software Inc

Written in Node.js

https://github.com/lucidsoftware/statsd-ha-receiver

Nark

Nark is an alerting and dashboard frontend for Graphite.

Under active development by Lucid Software.

Written in Scala using the Play! Framework

MySQL-backed

https://github.com/lucidchart/nark

Data Flow Overview

Data Flows IN

Applications report different types of metrics

StatsD aggregates metrics

Carbon-cache gathers and groups metrics

Whisper stores metrics to disk

Data Flows OUT

User initiates request over HTTP

Graphite-web reads stored data from disk using whisper

Graphite-web asks carbon-cache for recent data not yet written to disk

Graphite-web builds graph using data
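From the user's side, the outbound flow starts with an HTTP request to graphite-web's /render endpoint. A minimal sketch that builds such a URL and fetches the JSON datapoints; the host name is hypothetical, and only the target, from, and format parameters are used.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def render_url(base: str, targets: list[str], since: str = "-1h", fmt: str = "json") -> str:
    """Build a /render URL; "target" may repeat, and "from" takes relative times like "-1h"."""
    params = [("target", t) for t in targets] + [("from", since), ("format", fmt)]
    return f"{base.rstrip('/')}/render?{urlencode(params)}"


def fetch_series(url: str) -> list:
    """Fetch and decode the JSON response from graphite-web."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

With format=json, graphite-web returns a list of objects, each with a "target" name and a "datapoints" list of [value, timestamp] pairs.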

High Availability & Scaling

StatsD - Options

We can put StatsD in 3 places:

On the reporting server

Scales as well as your reporting servers do

As available as the reporting servers are

Can’t aggregate across servers, so vital cluster-wide metrics like stats.production.applications.chart.users.login are lost

On a central server

Doesn’t scale

Single point of failure

On a load-balanced set of servers

AWS ELB doesn’t listen on UDP

One stat will be aggregated in multiple places
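Why does aggregating one stat in multiple places hurt? A toy simulation, under the assumption that two StatsD instances each see part of the login events for one flush interval and both flush under the same metric name and timestamp. Combined with carbon-cache's last-in-wins policy, only one partial count survives. The numbers and the timestamp are made up for illustration.

```python
# Toy model, not real StatsD or carbon code: each StatsD instance flushes its
# partial count for the same metric and timestamp.
METRIC = "stats.production.applications.chart.users.login"
FLUSH_TS = 1700000000


def simulate(partial_counts: list[int]) -> tuple[int, int]:
    """Return (value kept by a last-in-wins store, true total across instances)."""
    store: dict[tuple[str, int], int] = {}
    for count in partial_counts:
        # A later write to the same (metric, timestamp) slot overwrites the earlier one.
        store[(METRIC, FLUSH_TS)] = count
    return store[(METRIC, FLUSH_TS)], sum(partial_counts)


stored, actual = simulate([7, 5])
print(f"stored={stored}, actual={actual}")  # prints "stored=5, actual=12"
```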

StatsD - Solution

StatsD with smart-repeater on reporting servers

Accepts UDP and sends TCP for reliability

Reduces chattiness over the wire

Allows aggregation to occur at a centralized location

As scalable and available as the application servers
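The deck doesn't describe smart-repeater's internals, so the following is a hypothetical sketch of the idea: collect StatsD lines that arrive as UDP datagrams and forward them as one newline-delimited TCP write. The class name, batch size, and newline framing are all assumptions.

```python
import socket


class LineBatcher:
    """Hypothetical: buffer StatsD lines from UDP datagrams and emit one
    newline-delimited payload, so many small datagrams become one TCP write."""

    def __init__(self, max_lines: int = 1000):
        self.max_lines = max_lines
        self._lines: list[str] = []

    def add(self, datagram: bytes) -> bool:
        """Store every line in the datagram; return True when a flush is due."""
        # StatsD clients may pack several newline-separated metrics into one datagram.
        parts = datagram.decode("utf-8").split("\n")
        self._lines.extend(part for part in parts if part)
        return len(self._lines) >= self.max_lines

    def drain(self) -> bytes:
        """Return the buffered lines as one payload and empty the buffer."""
        payload = "".join(line + "\n" for line in self._lines).encode("utf-8")
        self._lines.clear()
        return payload


def forward(payload: bytes, host: str, port: int) -> None:
    """Send one batch over TCP; unlike UDP, a failed send raises an error."""
    if payload:
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(payload)
```

A UDP loop would call add() for each datagram and forward(batcher.drain(), host, port) whenever add() returns True or a timer fires. Fewer, larger TCP writes are what reduce chattiness.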

StatsD - Solution

AWS Elastic Load Balancer distributes traffic to ha-receivers

HA-receivers: Duplicate and transform metrics

Deliver metrics to the correct server for aggregation

Are stateless – they scale horizontally

Are highly available behind the ELB
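Stateless ha-receivers can still agree on where each metric goes if routing is a pure function of the metric name. A minimal sketch, assuming hash-mod-N routing over a fixed server list; the actual statsd-ha-receiver routing may differ, and the server names are hypothetical.

```python
import hashlib


def pick_statsd(metric_name: str, servers: list[str]) -> str:
    """Hypothetical rule: hash the metric name so every receiver, sharing no
    state, sends a given metric to the same StatsD server."""
    digest = hashlib.md5(metric_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def route_line(line: str, servers: list[str]) -> str:
    # In a StatsD line the metric name is everything before the first ":".
    name = line.split(":", 1)[0]
    return pick_statsd(name, servers)
```

hashlib.md5 is used instead of Python's built-in hash(), which is randomized per process for strings, so two receivers could disagree. Plain modulo reshuffles most metrics when a server is added; consistent hashing limits that movement.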

StatsD - Solution

HA-receivers pass the data to StatsD

StatsD does the final aggregation

Every metric has exactly one StatsD destination

Aggregated metrics are sent to carbon
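The final aggregation step can be sketched as a counter aggregator that sums each metric over one flush interval and emits carbon plaintext lines. This is a simplification covering counters only, with a hypothetical "stats.counters.<name>.count" / ".rate" naming scheme; StatsD's real graphite backend naming depends on its configuration.

```python
from collections import defaultdict


class CounterAggregator:
    """Sketch: sum counters per flush interval and emit carbon plaintext lines."""

    def __init__(self, flush_interval: int = 10, prefix: str = "stats.counters"):
        self.flush_interval = flush_interval
        self.prefix = prefix
        self.counts: dict[str, float] = defaultdict(float)

    def ingest(self, line: str) -> None:
        # Parse "<name>:<value>|c[|@rate]"; scale sampled counters back up by 1/rate.
        name, rest = line.split(":", 1)
        fields = rest.split("|")
        if fields[1] != "c":
            return  # this sketch ignores timers and gauges
        value = float(fields[0])
        if len(fields) > 2 and fields[2].startswith("@"):
            value /= float(fields[2][1:])
        self.counts[name] += value

    def flush(self, now: int) -> list[str]:
        """Emit a total and a per-second rate for each counter, then reset."""
        lines = []
        for name, total in sorted(self.counts.items()):
            lines.append(f"{self.prefix}.{name}.count {total} {now}\n")
            lines.append(f"{self.prefix}.{name}.rate {total / self.flush_interval} {now}\n")
        self.counts.clear()
        return lines
```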

Carbon & Whisper

Carbon and whisper write data to disk

The daemons are stateless except for buffers

Carbon consists of multiple daemons

Carbon-relay: Directs traffic to other carbon daemons

Carbon-aggregator: A mix between carbon-relay and StatsD

Carbon-cache: Gathers metrics in a buffer and writes them to disk using whisper

Whisper is a library called from carbon-cache, not a long-running daemon
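Because whisper writes one fixed-size file per metric, disk usage per shard is predictable. A sketch that estimates a file's size from a storage-schemas.conf style retention string, assuming whisper's on-disk layout of a 16-byte header, 12 bytes per archive description, and 12 bytes per point. It only accepts "precision:duration" with units (s, m, h, d, w, y), not the bare seconds:points form.

```python
import re

# Seconds per unit in retention strings like "10s:6h,1m:7d" ("m" is minutes, a year is 365 days).
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration(token: str) -> int:
    match = re.fullmatch(r"(\d+)([smhdwy])", token)
    if match is None:
        raise ValueError(f"bad duration: {token}")
    return int(match.group(1)) * UNITS[match.group(2)]


def whisper_file_size(retentions: str) -> int:
    """16-byte header, plus 12 bytes of metadata per archive, plus 12 bytes per
    point (a 4-byte timestamp and an 8-byte double)."""
    size = 16
    for archive in retentions.split(","):
        precision, duration = archive.split(":")
        points = parse_duration(duration) // parse_duration(precision)
        size += 12 + 12 * points
    return size
```

For "10s:6h" that is 6 × 3600 / 10 = 2,160 points, so 16 + 12 + 2,160 × 12 = 25,948 bytes per metric.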

Carbon & Whisper

We chose to use sharding

Every server holds 1/n of the metrics, where n = number of shards

All servers in a shard hold the same data

Syncing data requires a single rsync

A binary tree of carbon-relays is used to pick a shard

Adding new shards is as easy as adding a new node in the binary tree of carbon-relays

Retrieving data can be done by checking one server from every shard
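Reading from a sharded install means one request per shard, and any replica will do because all servers in a shard hold the same data. A sketch of picking one healthy replica from each shard; the shard and server names and the "down" health input are hypothetical.

```python
import random
from collections.abc import Iterable
from typing import Optional


def pick_read_servers(
    shards: dict[str, list[str]],
    down: Iterable[str] = (),
    rng: Optional[random.Random] = None,
) -> dict[str, str]:
    """Choose one healthy replica per shard; together they cover every metric."""
    rng = rng or random.Random()
    down_set = set(down)
    chosen: dict[str, str] = {}
    for shard, replicas in shards.items():
        healthy = [server for server in replicas if server not in down_set]
        if not healthy:
            # Every replica of this shard is down, so part of the metric space is unreadable.
            raise RuntimeError(f"no healthy replica for {shard}")
        chosen[shard] = rng.choice(healthy)
    return chosen
```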

Carbon & Whisper

StatsD sends metrics to the root carbon-relay on localhost

Carbon-relays are set up in a binary tree to pick a shard

Every metric goes to exactly one shard

Every carbon-relay forwards to either 1 shard or 2 relays
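One way a binary tree of relays could pick a shard is to consume one bit of a stable hash of the metric name per level. This is a hypothetical scheme for illustration: stock carbon-relay routes by regex rules or consistent hashing, and the deck doesn't say which the authors configured. Each Shard leaf stands for the final relay that forwards to that shard's carbon-caches.

```python
import hashlib
from dataclasses import dataclass
from typing import Union


@dataclass
class Shard:
    name: str


@dataclass
class Relay:
    left: "Node"
    right: "Node"


Node = Union[Shard, Relay]


def route(metric: str, node: Node) -> str:
    """Walk the tree; at depth d, bit d of the metric's md5 hash picks a branch."""
    bits = int.from_bytes(hashlib.md5(metric.encode("utf-8")).digest(), "big")
    depth = 0
    while isinstance(node, Relay):
        node = node.right if (bits >> depth) & 1 else node.left
        depth += 1
    return node.name
```

Adding a shard replaces one leaf with a Relay over the old and new shards, so only about half of that leaf's metrics move and no other shard is touched. The tree need not be balanced, but a leaf at depth k receives roughly 1/2^k of all metrics.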

Carbon & Whisper

Carbon-cache receives the metrics from the final relay

Metrics are written to disk using whisper on localhost

Carbon-cache has a last-in-wins policy
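Last-in-wins follows from how whisper slots points: each timestamp is rounded down to the start of its archive interval, and a later value for the same slot replaces the earlier one. A simplified in-memory model assuming a 10-second archive; real whisper writes into a fixed-size file on disk.

```python
def align(timestamp: int, seconds_per_point: int) -> int:
    # A point belongs to the slot at the start of its interval (timestamp rounded down).
    return timestamp - (timestamp % seconds_per_point)


def write(archive: dict[int, float], timestamp: int, value: float,
          seconds_per_point: int = 10) -> None:
    """Simplified archive model: a later write to the same slot replaces the earlier value."""
    archive[align(timestamp, seconds_per_point)] = value
```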

graphite-web

Graphite-web is stateless

All state is contained within carbon-cache

Reading data out from a highly available, scalable graphite installation is the same as reading from a single server

Use the same ELB as the ha-receiver

Nark

Nark is stateless

All state is contained in MySQL and Graphite

Nark will be no more highly available than your MySQL and Graphite installations

Use an ELB, an autoscale group, and a multi-AZ RDS instance

Questions?

Feature Requests?

Thanks For Your Time

Join The Team

• Building the next generation of collaborative web applications

• VC funded

• High growth rate

• Profitable

• Graduates from Harvard, MIT, Stanford

• Former Google, Amazon, Microsoft employees

https://www.golucid.co/jobs
