Scalable Monitoring & Alerting

DESCRIPTION

We'd like to share our experience building a scalable monitoring and alerting solution using Graphite, Grafana, Collectd, Nagios, Logstash, Elasticsearch and Kibana, among others. We believe that it is easy to collect metrics from any one system and to define alerts on single metrics, and we already have this capability in place. In complex systems, however, the real operational challenges arise from the way components interact. Some of these components live inside our data center, some live outside it, and all are updated on different timelines. The functionality and performance of any component can change from one day to the next.

Our challenge is to identify patterns and correlations across multiple systems in our stack. We need to integrate top-down and bottom-up analysis so we can see, for example, that trial subscription signups (a user metric) fell off at the same time that an internal API call began to fail (an application metric), and that both were caused by a database host going offline (a system metric).

When collecting so much data, there is a risk of being overwhelmed and unable to make sense of it all. In essence, it is the risk of collecting data without producing intelligence. We combat this risk by converting our accumulated data into the most visually information-dense format available: graphs. Then we make the graphs easy to compare and easy to share, informative at a glance, and easy for the team to keep watching.

Finally, once we are regularly identifying patterns across our graphs, we should have an automated way to "watch the graphs" in our absence. This is not an AI or a pattern-recognition "black box"; it should just automate patterns that humans have first validated as meaningful.

SCALABLE MONITORING & ALERTING

NEW YORK CITY

EXPERIENCE

THE BEGINNING

DEPLOYMENTS

DDOS ATTACKS

THE VISION

● Easy publishing of metrics

● Single source of truth

● Democratized access
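
As an illustration of what "easy publishing of metrics" can look like with Graphite, here is a minimal sketch that formats a datapoint in Carbon's plaintext protocol (one line of path, value and Unix timestamp, by default accepted on TCP port 2003). The metric path and the host name `graphite.example.com` are hypothetical, not taken from our setup.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one Carbon plaintext line: path, value, Unix timestamp, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def publish(lines, host="graphite.example.com", port=2003):
    """Send already-formatted lines to a Carbon listener over TCP.

    The host is a placeholder; 2003 is Carbon's default plaintext port.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Print instead of sending so the sketch runs without a Carbon server.
    print(format_metric("web.signups.trial.count", 42, 1400000000), end="")
```

Because publishing is a single line of text over a socket, any team can emit metrics from almost any language without a client library.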

SEPTEMBER 2013

● CARBON-AGGREGATOR

● CARBON-CACHE

● WHISPER

● FILESYSTEM (rotating disks)

● GRAPHITE WEBAPP

JANUARY 2014

● LOAD BALANCER

● Stack shown three times, each with: 16 CARBON-RELAY, 10 CARBON-CACHE, WHISPER, FILESYSTEM (rotating), WEBAPP

CULTURE

SUPER BOWL 2014

THE PROBLEMS

● Reads didn't scale

● No usage insight

● Dumb dashboards

APRIL 2014

● LOAD BALANCER

● 8 CARBON-RELAY (shown three times)

● 10 CARBON-CACHE

● WHISPER

● FILESYSTEM (SSDs)

● WEBAPP

MAY 2014

● LOAD BALANCER

● 8 CARBON-RELAY (shown three times)

● 10 CARBON-CACHE

● WHISPER

● FILESYSTEM (SSDs)

● GRAPHITE API

JUNE 2014

● LOAD BALANCER

● 8 CARBON-RELAY (shown three times)

● 10 CARBON-CACHE

● WHISPER

● FILESYSTEM (SSDs)

● GRAPHITE API

● 10 CARBON-AGGREGATOR (shown three times)

USAGE INFORMATION

[Figure: GRAPHITE API combined with other components; the remaining images are not recoverable]

GRAFANA

ALERTING

Core Infrastructure

THRESHOLDS

CRITICAL THRESHOLD

WARNING THRESHOLD
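
The two thresholds can be read as a simple three-way classification. The sketch below assumes a metric where higher values are worse and where a value equal to a threshold already counts as a breach; both choices, and the function name `classify`, are illustrative assumptions rather than our actual rules.

```python
def classify(value, warning, critical):
    """Map a metric value to 'OK', 'WARNING' or 'CRITICAL'.

    Assumes higher is worse, so the critical threshold must not sit
    below the warning threshold.
    """
    if critical < warning:
        raise ValueError("critical threshold must be >= warning threshold")
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for sample in (50, 75, 95):
        print(sample, classify(sample, warning=70, critical=90))
```

For metrics where lower is worse (free disk space, for example), the comparisons would be inverted.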

ALERTING MATH

Frequency: 10 sec

Read: 30 (300 secs)

Skip: 5 (50 secs)

Alarm: 9 (90 secs)
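
The arithmetic on this slide is datapoints multiplied by the 10-second frequency: 30 datapoints cover 300 secs, 5 cover 50 secs and 9 cover 90 secs. The sketch below works that out and then shows one possible reading of the numbers. It assumes Read = 30 and Skip = 5 (the flattened slide text does not confirm the pairing, but an alarm count of 9 only fits inside a 30-point read window), that "skip" drops the newest datapoints, and that the alarm fires when at least 9 datapoints in the window exceed the threshold. All function and parameter names are hypothetical.

```python
def window_seconds(datapoints, frequency_secs=10):
    """Time covered by a number of datapoints at a fixed frequency."""
    return datapoints * frequency_secs


def should_alarm(values, threshold, read=30, skip=5, alarm=9):
    """Decide whether to alarm on a series ordered oldest first.

    Assumed semantics: ignore the newest `skip` datapoints, inspect the
    `read` datapoints before them, and alarm when at least `alarm` of
    those exceed the threshold. None marks a missing datapoint.
    """
    usable = values[:-skip] if skip else values
    window = usable[-read:]
    breaches = sum(1 for v in window if v is not None and v > threshold)
    return breaches >= alarm


if __name__ == "__main__":
    print(window_seconds(30), window_seconds(5), window_seconds(9))
    series = [0] * 26 + [100] * 9 + [0] * 5
    print(should_alarm(series, threshold=90))
```

Under these assumptions a spike confined to the newest 50 secs is never evaluated, which is one way to avoid alerting on datapoints that may still be incomplete.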

NAGIOS INTEGRATION
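
One way such an integration can work is a Nagios plugin that reads a series from Graphite and reports through the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is not our actual check: it parses a hard-coded sample in the JSON shape returned by Graphite's render API with `format=json`, uses the most recent non-null datapoint, and assumes higher values are worse. The target name and thresholds are made up for illustration.

```python
import json

NAGIOS_EXIT = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def latest_value(render_json):
    """Return the newest non-null value of the first series, or None."""
    series = json.loads(render_json)
    datapoints = series[0]["datapoints"] if series else []
    for value, _timestamp in reversed(datapoints):
        if value is not None:
            return value
    return None


def check(render_json, warning, critical):
    """Return (exit_code, message) in the style of a Nagios plugin."""
    value = latest_value(render_json)
    if value is None:
        status = "UNKNOWN"
    elif value >= critical:
        status = "CRITICAL"
    elif value >= warning:
        status = "WARNING"
    else:
        status = "OK"
    return NAGIOS_EXIT[status], f"{status} - value={value}"


# Hypothetical payload; a real plugin would fetch it over HTTP instead.
SAMPLE = json.dumps([{
    "target": "servers.db01.load",
    "datapoints": [[60.0, 1400000000], [75.0, 1400000010], [None, 1400000020]],
}])

if __name__ == "__main__":
    code, message = check(SAMPLE, warning=70, critical=90)
    print(message)
    raise SystemExit(code)
```

Returning UNKNOWN when no datapoint is available keeps a metrics outage from being mistaken for a healthy service.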

NEXT STEPS

● Replication

● Scale even more

● More robust alerting logic

REPLICATION

● LOAD BALANCER

● 8 CARBON-RELAY (shown three times)

● 10 CARBON-CACHE

● WHISPER

● FILESYSTEM (SSDs)

● GRAPHITE API

● 10 CARBON-AGGREGATOR (shown three times)

ALERTING LOGIC

[Figure: graph images not recoverable] = Google Forms Throttling

● Trial subscription signups (User)

● Internal subscription API calls (Application)

● Database host offline (System)
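
To make the idea of correlating the user, application and system layers concrete, here is a minimal sketch. It takes the time at which each layer's metric first breached its threshold, treats the breaches as one incident when they fall within a short span, and orders the layers by onset. The 300-second span, the layer names and the timestamps are assumptions for illustration, and onset order only suggests where to look first; it does not prove causation.

```python
def correlated_incident(first_breach, max_spread_secs=300):
    """Group per-layer breaches into one incident if they are close in time.

    `first_breach` maps a layer name to the Unix timestamp of its first
    breach, or None if that layer never breached. Returns the layer names
    ordered by onset, or None when the layers do not line up.
    """
    if not first_breach or any(t is None for t in first_breach.values()):
        return None
    times = list(first_breach.values())
    if max(times) - min(times) > max_spread_secs:
        return None
    return sorted(first_breach, key=first_breach.get)


if __name__ == "__main__":
    onsets = {
        "user": 1400000120,         # trial subscription signups fall off
        "application": 1400000060,  # internal subscription API calls fail
        "system": 1400000000,       # database host goes offline
    }
    print(correlated_incident(onsets))
```

A rule like this only automates a pattern that someone has already seen on the graphs and judged meaningful.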

LEARN MORE

QUESTIONS
