
Page 1

Scaling Automated Database Monitoring at Uber… with M3 and Prometheus

Richard Artoul

Page 2

01 Automated database monitoring
02 Why scaling automated monitoring is hard
03 M3 architecture and why it scales so well
04 How you can use M3

Agenda

Page 3

Uber’s “Architecture”

2015 → 2019

● 4000+ microservices - most of which directly or indirectly depend on storage
● 22+ storage technologies - ranging from C* to MySQL
● 1000's of dedicated servers running databases - monitoring all of these is hard

Page 4

Monitoring Databases

[Diagram: databases are monitored at three levels - Hardware, Technology, and Application]

Page 5

Hardware Level Metrics

Page 6

Technology Level Metrics


Page 7

Application Level Metrics

● Number of successes, errors, and latency, broken down by (see the sketch below):
○ All queries against a given database
○ All queries issued by a specific service
○ A specific query
■ SELECT * FROM TABLE WHERE USER_ID = ?
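As a rough, hedged sketch of what that breakdown can look like in instrumentation code (hypothetical metric and label names, using the Prometheus Go client purely for illustration rather than Uber's internal tooling):

```go
package dbmetrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical metrics: successes/errors and latency, labeled so they can be
// aggregated per database, per calling service, or per individual query.
var (
	queryResults = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "db_query_results_total", Help: "Query outcomes by status."},
		[]string{"database", "service", "query", "status"},
	)
	queryLatency = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{Name: "db_query_latency_seconds", Help: "Query latency."},
		[]string{"database", "service", "query"},
	)
)

func init() {
	prometheus.MustRegister(queryResults, queryLatency)
}

// RecordQuery is called around every database call; "query" would be a
// normalized form such as "SELECT * FROM TABLE WHERE USER_ID = ?".
func RecordQuery(database, service, query string, start time.Time, err error) {
	status := "success"
	if err != nil {
		status = "error"
	}
	queryResults.WithLabelValues(database, service, query, status).Inc()
	queryLatency.WithLabelValues(database, service, query).Observe(time.Since(start).Seconds())
}
```

Keeping or dropping each label at query time is what gives the per-database, per-service, and per-query views.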

Page 8

Page 9

What’s so hard about that?

Page 10

Monitoring Applications at Scale

1200 Microservices w/ dedicated storage
× 100 Instances per service
× 20 Instances per DB cluster
× 20 Queries per service
× 10 Metrics per query
= 480 million unique time series
= 100+ million dollars!
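For reference, the 480 million figure is simply the product of the factors above: 1200 × 100 × 20 × 20 × 10 = 480,000,000 unique time series.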

Page 11

Workload

● Writes per second (post replication): 110M
● Datapoints emitted pre-aggregation: 800M
● Unique Metric IDs: 9B
● Datapoints read per second: 200B

Page 12

How do we do it?

Page 13

A Brief History of M3

● 2014-2015: Graphite + WhisperDB

○ No replication, operations were ‘cumbersome’

● 2015-2016: Cassandra

○ Solved operational issues

○ 16x YoY Growth

○ Expensive (> 1500 Cassandra Hosts)

○ Compactions => R.F=2

● 2016-Today: M3DB

Page 14

M3DB Overview

Page 15

M3DB

An open source distributed time series database

● Store arbitrary timestamp precision data points at any resolution for any retention
● Tag (key/value pair) based inverted index
● Optimized file-system storage with minimal need for compactions of time series data for real-time workloads

Page 16

High-Level Architecture

Like a Log Structured Merge Tree (LSM)

Except a typical LSM has levelled or size based compaction

M3DB has time window compaction
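A minimal sketch of the time-window idea (illustrative only, not M3DB's actual code): writes are bucketed by block start, and once a block's window has passed it is flushed as an immutable fileset file, so there is nothing to re-compact by size or level.

```go
package tsdbsketch

import "time"

type datapoint struct {
	ts    time.Time
	value float64
}

type shard struct {
	blockSize time.Duration
	active    map[time.Time][]datapoint // mutable in-memory buffers, keyed by block start
}

// write assigns each datapoint to the block covering its timestamp; with a
// 2h block size, everything written between 12:00 and 14:00 UTC lands in the
// 12:00 block, which is flushed once and never rewritten.
func (s *shard) write(ts time.Time, v float64) {
	blockStart := ts.Truncate(s.blockSize) // e.g. 13:37 -> 12:00 for a 2h block
	s.active[blockStart] = append(s.active[blockStart], datapoint{ts, v})
}
```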

Page 17

Topology and Consistency

● Strongly consistent topology (using etcd)
○ No gossip
○ Replicated with zone/rack-aware layout and configurable replication factor
● Consistency managed via synchronous quorum writes and reads (sketched below)
○ Configurable consistency level for both reads and writes
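A minimal sketch of the quorum rule (illustrative only, not M3DB's client code): with a replication factor of 3, a write or read at quorum consistency needs acknowledgements from 2 of the 3 replicas.

```go
package quorumsketch

// quorum returns the number of replica acknowledgements needed for a
// majority at a given replication factor, e.g. RF=3 -> 2.
func quorum(replicationFactor int) int {
	return replicationFactor/2 + 1
}

// succeeded reports whether a synchronous quorum write or read met its
// required acknowledgement count before being returned to the client.
func succeeded(acks, replicationFactor int) bool {
	return acks >= quorum(replicationFactor)
}
```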

Page 18

Cost Savings and Performance

● Disk Usage in 2017

○ ~1.4PB for Cassandra at R.F=2

○ ~200TB for M3DB at R.F=3

● Much higher throughput per node with M3DB

○ Hundreds of thousands of writes/s on commodity hardware

Page 19

But what about the index?

Page 20

Centralized Elasticsearch Index

● Actual time series data was stored in Cassandra and then M3DB

● But indexing of data (for querying) was handled by Elasticsearch

● Worked for us for a long time

● … but scaling writes and reads required a lot of engineering

Page 21

Elasticsearch Index - Write Path

[Diagram of the write path: m3agg → Indexer → E.S, with a Redis cache acting as a "don't write" cache in front of the Elasticsearch writes; the data itself goes to M3DB. Influx of new metrics: 1. Large service deployment 2. Datacenter failover]
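A hedged sketch of how a "don't write" cache like this can work (hypothetical names; per the slide, Redis backs the cache and Elasticsearch receives the index writes): the indexer skips metrics whose IDs were indexed recently, which absorbs the flood of new series during a large deployment or datacenter failover.

```go
package indexersketch

import "time"

// setIfAbsentCache is the one primitive the sketch needs; in the real system
// it would be backed by Redis (a set-if-not-exists with a TTL).
type setIfAbsentCache interface {
	// SetIfAbsent returns true if the key was newly set, false if it already existed.
	SetIfAbsent(key string, ttl time.Duration) (bool, error)
}

type indexer struct {
	dontWrite setIfAbsentCache
	indexTTL  time.Duration
	indexFn   func(metricID string) error // writes the document to Elasticsearch
}

// maybeIndex forwards a metric to Elasticsearch only if its ID has not been
// indexed within indexTTL.
func (ix *indexer) maybeIndex(metricID string) error {
	fresh, err := ix.dontWrite.SetIfAbsent("indexed:"+metricID, ix.indexTTL)
	if err != nil {
		// On cache failure, fall through and index anyway: a duplicate index
		// write is wasteful but not incorrect.
		fresh = true
	}
	if !fresh {
		return nil // indexed recently, skip the Elasticsearch write
	}
	return ix.indexFn(metricID)
}
```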

Page 22

Elasticsearch Index - Read Path

[Diagram of the read path: a query (1) hits E.S to resolve the matching time series, then (2) reads the data from M3DB]

Page 23

Elasticsearch Index - Read Path

[Diagram: queries go through a Redis query cache in front of E.S]

Need high T.T.L to prevent overwhelming E.S

… but high T.T.L means long delay for new time series to become queryable

Page 24

Elasticsearch Index - Read Path

[Diagram: queries fan out to a short-TTL Elasticsearch cluster and a long-TTL Elasticsearch cluster, each fronted by its own Redis query cache, and the two result sets are merged on read]
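A rough sketch of the merge-on-read step (hypothetical function, not the actual query service code): the series IDs resolved by the short-TTL and long-TTL paths are unioned and deduplicated before any data is fetched from storage.

```go
package readpathsketch

// mergeOnRead unions the metric IDs returned by the short-term and long-term
// Elasticsearch indexes (or their query caches), deduplicating so each series
// is only fetched from M3DB once.
func mergeOnRead(shortTerm, longTerm []string) []string {
	seen := make(map[string]struct{}, len(shortTerm)+len(longTerm))
	merged := make([]string, 0, len(shortTerm)+len(longTerm))
	for _, ids := range [][]string{shortTerm, longTerm} {
		for _, id := range ids {
			if _, ok := seen[id]; ok {
				continue
			}
			seen[id] = struct{}{}
			merged = append(merged, id)
		}
	}
	return merged
}
```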

Page 25

Elasticsearch Index - Final Breakdown

● Two Elasticsearch clusters with separate configuration
● Two query caches
● Two don't-write caches
● A stateful indexing tier that requires consistent hashing, an in-memory cache, and breaks everything if you deploy it too quickly
● Another service just for automatically rotating the short-term Elasticsearch cluster indexes
● A background process that's always running and trying to delete stale documents from the long-term index

Page 26

M3 Inverted Index

● Not nearly as expressive or feature-rich as Elasticsearch

● … but like M3DB, it’s designed for high throughput

● Temporal by nature (T.T.Ls are cheap)

● Fewer moving parts

Page 27

M3 Inverted Index

● Primary use case: support queries in the form:

○ service = “foo” AND

○ endpoint = “bar” AND

○ client_version regexp matches “3.*”

[Diagram: service="foo" AND endpoint="bar" AND client_version="3.*", where the regexp term expands to client_version="3.1" OR client_version="3.2"]

AND → Intersection
OR → Union

Page 28

M3 Inverted Index - F.S.Ts

● The inverted index is similar to Lucene in that it uses F.S.T segments to build an efficient and compressed structure for fast regexps.
● Each time series label has its own F.S.T that can be searched to find the offset at which to unpack another data structure containing the set of metric IDs associated with a particular label value (the postings list) - see the lookup sketch below.

Encoded Relationships

are --> 4

ate --> 2

see --> 3

Compressed + fast regexp!
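A heavily simplified sketch of that lookup (a plain Go map stands in for the F.S.T, which is far more compact and can be walked with a regular expression): resolve a label value to an offset, then unpack the postings list stored at that offset. The values mirror the slide's example.

```go
package fstsketch

// termOffsets plays the role of the F.S.T in this sketch: it maps a label
// value to the file offset of its postings list. A real F.S.T stores the same
// mapping as a compressed automaton that also supports regexp traversal.
var termOffsets = map[string]uint64{
	"are": 4,
	"ate": 2,
	"see": 3,
}

// postingsOffset resolves a label value to the offset where its postings list
// (the set of matching metric IDs) can be unpacked.
func postingsOffset(labelValue string) (uint64, bool) {
	off, ok := termOffsets[labelValue]
	return off, ok
}
```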

Page 29

M3 Inverted Index - Postings List

● For every term combination in the form of service="foo", we need to store the set of metric IDs (integers) that match; this is called a postings list.
● The index is broken into blocks and segments, so we need to be able to calculate intersections (AND - across terms) and unions (OR - within a term).

12P.M -> 2P.M Index Block

[Diagram: within the block, service="foo" INTERSECT endpoint="bar" INTERSECT client_version="3.*", where the regexp term is the union of client_version="3.1" and client_version="3.2"]

Intersect → AND
Union → OR

The primary data structure for the postings list in M3DB is the roaring bitmap (illustrated below).
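A hedged illustration of the set algebra using the open source RoaringBitmap/roaring Go library (M3DB's own postings list implementation may differ in detail): the regexp term is the union of its matching values' bitmaps, and the final result is the intersection across terms. The metric IDs here are made up for illustration.

```go
package postingssketch

import "github.com/RoaringBitmap/roaring"

// matchingSeries evaluates the query from the earlier slide:
//   service="foo" AND endpoint="bar" AND client_version=~"3.*"
// where the regexp term expands to client_version="3.1" OR client_version="3.2".
func matchingSeries() *roaring.Bitmap {
	serviceFoo := roaring.BitmapOf(1, 2, 3, 4, 5)
	endpointBar := roaring.BitmapOf(2, 3, 4, 7)
	clientVersion31 := roaring.BitmapOf(2, 9)
	clientVersion32 := roaring.BitmapOf(3, 4)

	// OR within a term: union of the postings lists of all matching values.
	clientVersion3x := roaring.Or(clientVersion31, clientVersion32) // {2, 3, 4, 9}

	// AND across terms: intersection of the three postings lists.
	result := roaring.And(serviceFoo, endpointBar) // {2, 3, 4}
	result.And(clientVersion3x)                    // {2, 3, 4}
	return result
}
```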

Page 30

M3 Inverted Index - File Structure

[Diagram: fileset file layout on disk along the time axis.
/var/lib/m3db/data/namespace-a/shard-0 contains one data fileset file per block.
/var/lib/m3db/index/namespace-a contains one index fileset file per index block, with each index block spanning a wider time window than the data blocks.]

Page 31

M3 Summary

● Time series compression + temporal data and file structures

● Distributed and horizontally scalable by default

● Complexity pushed to the storage layer

Page 32

How you can use M3

Page 33

M3 and Prometheus

Scalable monitoring

Page 34

M3 and Prometheus

[Diagram: My App is scraped by Prometheus, which is backed by M3 (Coordinator in front of the M3DB nodes); Grafana and Alerting directly query M3 using the coordinator as a single Grafana datasource]
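A hedged example of the wiring on the Prometheus side (a sketch based on the M3 documentation's defaults; the hostname is a placeholder and the coordinator port/paths should be checked against the current docs): Prometheus remote-writes into M3 through the coordinator and can read back through the same service, while Grafana points at the coordinator as a single Prometheus-compatible datasource.

```yaml
# prometheus.yml (excerpt): ship samples to M3 and read them back through
# m3coordinator. "m3coordinator" is a placeholder hostname; 7201 is the
# coordinator's documented default port.
remote_write:
  - url: "http://m3coordinator:7201/api/v1/prom/remote/write"
remote_read:
  - url: "http://m3coordinator:7201/api/v1/prom/remote/read"
```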

Page 35

Roadmap

Page 36

Roadmap

● Bring the Kubernetes operator out of Alpha with further lifecycle management

● Arbitrary out of order writes for writing data into the past and backfilling

● Asynchronous cross region replication (for disaster recovery)

● Evolving M3DB into a generic time series database (think event store)

○ Efficient compression of events in the form of Protobuf messages

Page 37

We’re Hiring!

● Want to work on M3DB? We’re hiring!

○ Reach out to me at [email protected]