Lessons from Automatic Incident Resolution for a Million Databases

SRECon EU, July 2016

Greg Burek

The Twelve-Factor App

Department of Data

PostgreSQL, Redis, Kafka

~ Million Databases

Tens of thousands of AWS Instances

Some Databases

Hundreds of AWS Instances

“The goal is to build systems that can scale linearly with machines & sub-linearly with people” - Caitie McCaffrey

Tackling Alert Fatigue

Monitor and alert on your business

Usually, don’t alert on machine-specific metrics

Write runbooks and playbooks

Turn playbooks into code
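
One way to read "turn playbooks into code" is to give each written step a health check and a fix, then run them in order. The sketch below is only an illustration under that assumption; `Step`, `run_playbook`, and the fake connection-pool state are hypothetical names, not anything from the talk.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One line of a written playbook: a check plus the fix to try if it fails."""
    name: str
    is_healthy: Callable[[], bool]
    remediate: Callable[[], None]


def run_playbook(steps: List[Step]) -> List[str]:
    """Run each step in order and return a log of what was done.

    Raises RuntimeError if a remediation does not restore health, so that
    a human can be paged instead of the automation looping.
    """
    log = []
    for step in steps:
        if step.is_healthy():
            log.append(f"{step.name}: ok")
            continue
        step.remediate()
        if not step.is_healthy():
            raise RuntimeError(f"{step.name}: remediation failed, escalate to a human")
        log.append(f"{step.name}: remediated")
    return log


# Hypothetical example: a fake database whose connection count is over its limit.
state = {"connections": 500, "limit": 400}

steps = [
    Step(
        name="connection count under limit",
        is_healthy=lambda: state["connections"] < state["limit"],
        # Stand-in for a real fix such as closing idle connections.
        remediate=lambda: state.update(connections=50),
    ),
]
```

Calling `run_playbook(steps)` applies the fix once; a second call finds the check already passing and only logs it.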

“The goal is not to never get paged, the goal is to never get paged for the same thing twice” - Astrid Atkinson

Engineering for the long game

Verify monitoring before restarting the world

Circuit breakers
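
The slides do not say how the circuit breakers are built. A minimal sketch, assuming the breaker caps how many automated actions may run inside a time window and then stays open until a person resets it (`ActionBreaker` and its parameters are hypothetical):

```python
import time
from collections import deque
from typing import Callable, Deque


class ActionBreaker:
    """Trips when automation attempts too many actions inside a time window."""

    def __init__(self, max_actions: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.timestamps: Deque[float] = deque()
        self.tripped = False

    def allow(self) -> bool:
        """Return True if one more automated action may run right now."""
        if self.tripped:
            return False  # stays open until a human calls reset()
        now = self.clock()
        # Forget actions that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_actions:
            self.tripped = True
            return False
        self.timestamps.append(now)
        return True

    def reset(self) -> None:
        """Called by a human after investigating why the breaker tripped."""
        self.tripped = False
        self.timestamps.clear()
```

Requiring a manual `reset()` is a design choice in this sketch: if automation wants to act on many machines at once, the cause may be the monitoring itself, which is worth a human look before anything restarts.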

Automation can’t handle the unknown

Wake someone up on exceptions and timeouts
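
A rough sketch of this rule, assuming each automated action runs behind a wrapper that pages when the action raises or exceeds a deadline. `page_human` and `run_guarded` are hypothetical; a real system would call its paging service where this one appends to a list.

```python
import concurrent.futures
import time
from typing import Callable, List

pages: List[str] = []  # stand-in for a real paging integration


def page_human(message: str) -> None:
    """Hypothetical pager hook; here it only records the message."""
    pages.append(message)


def run_guarded(name: str, action: Callable[[], None], timeout_seconds: float) -> bool:
    """Run an automated action; page a human if it raises or runs too long.

    Returns True only when the action finished cleanly within the timeout.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(action)
    try:
        future.result(timeout=timeout_seconds)
        return True
    except concurrent.futures.TimeoutError:
        page_human(f"{name}: timed out after {timeout_seconds}s")
        return False
    except Exception as exc:  # anything the automation did not anticipate
        page_human(f"{name}: failed with {exc!r}")
        return False
    finally:
        # Do not block on a hung action; the worker thread is left to finish.
        executor.shutdown(wait=False)
```

Note that the timeout only stops the wrapper from waiting; the action itself keeps running in its thread, so a real remediation would also need its own cancellation.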

Have a REPL/console

Aggregate and review trends
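
Reviewing trends can start as simply as counting which automated fixes fire most often. The sketch below assumes each remediation is logged as a `(database_id, action)` pair; the event shape and the action names are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_remediations(events: Iterable[Tuple[str, str]], n: int = 3) -> List[Tuple[str, int]]:
    """Count automated remediations by action and return the n most frequent.

    Each event is a (database_id, action) pair; both fields are hypothetical.
    """
    counts = Counter(action for _database_id, action in events)
    return counts.most_common(n)


# Hypothetical remediation log.
events = [
    ("db-1", "restart_replica"),
    ("db-2", "replace_instance"),
    ("db-3", "restart_replica"),
    ("db-1", "restart_replica"),
]
```

An action that dominates the counts is a candidate for a root-cause fix, since the automation is hiding a recurring problem.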

Humans can break

Automation can be simplistic

Humans + Automation for a resilient and operable system

1. Monitor and alert on your business
2. Write playbooks
3. Make playbooks into automation
4. Checks and balances of automation
5. Circuit breakers
6. Alert on exceptions and timeouts
7. Admin console
8. Aggregate and review trends

gregburek@heroku.com

@gregburek

State Machines
