
@chronosphereio

Querying millions to billions of metrics with M3DB’s index

FOSDEM 2020


@roskilli

Previously M3 tech lead at Uber, creator of M3DB.

CTO at Chronosphere.

Member of OpenMetrics.


Monitoring: what is a metric?

Schema for data you would like to collect and aggregate

Name

● http_requests

Dimensions/Labels

● endpoint (e.g. /api/search)
● status_code (e.g. 500)
● deploy_version_git_sha (e.g. 25149a04c)
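For concreteness, this is roughly what emitting such a metric looks like from a Go service using the Prometheus Go client (client_golang); the label values are just the example values above, and this sketch is an illustration rather than anything from the talk:

package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// http_requests with the three dimensions from the slide.
var httpRequests = promauto.NewCounterVec(
	prometheus.CounterOpts{Name: "http_requests"},
	[]string{"endpoint", "status_code", "deploy_version_git_sha"},
)

func main() {
	// Each distinct combination of label values becomes its own timeseries.
	httpRequests.WithLabelValues("/api/search", "500", "25149a04c").Inc()
}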


Problem

1. Increasing number of regions, containers, k8s pods, and tracking of deployed versions (cardinality!)

2. Metrics can have an arbitrary number of dimensions

3. Building a compound index is expensive


Adding more metrics at organizations

1. We have monitoring, it's awesome, and developers are mostly happy with standardized metrics.

2. Developers put custom metrics on everything and I'm deploying tons of applications on something like Kubernetes; things are OK!

3. Things are way too on fire, we can't manage this many things anymore, can everyone just stop please?


Timeseries

Timeseries from lots of hosts and container pods

ID   Timeseries
1    __name__=cpu_seconds_total, pod=foo-123abc
8    __name__=memory_memfree, pod=foo-123abc
33   __name__=cpu_seconds_total, pod=foo-456def
44   __name__=memory_memfree, pod=foo-456def
45   __name__=cpu_seconds_total, pod=bar-768ghe
58   __name__=memory_memfree, pod=bar-768ghe
…    millions, and if you are unfortunate, billions


Aggregate metric cpu_seconds_total

Same table of timeseries from lots of hosts and container pods as above; this query matches every cpu_seconds_total row regardless of pod (IDs 1, 33, 45).


cpu_seconds_total and pod=foo-(.+)

Same table again; now only the cpu_seconds_total rows whose pod matches foo-(.+) are selected (IDs 1 and 33).


Need high flexibility and speed

1. Any arbitrary set of dimensions/labels can be specified for filtering

2. Ideally lookup speed is sub-linear in the number of timeseries


Timeseries column lookup

1. Secondary lookup using prefix ordered table

Labels                          Timeseries ID (fingerprint)
__name__=cpu, pod=foo-123abc    1

2. Secondary inverted index

Label      Label value   Timeseries IDs
__name__   cpu           1, 2, 3
pod        foo-123abc    1
           foo-456def    2
           bar-123abc    3

Either way, the ID then points at the timeseries data columns:

ID   Timeseries                      Column key/value pairs
1    __name__=cpu, pod=foo-123abc    {t=..., v=...} ➡
2    __name__=cpu, pod=foo-456def    {t=..., v=...} ➡
3    __name__=cpu, pod=bar-123abc    {t=..., v=...} ➡
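A minimal in-memory sketch of the secondary inverted index above, using plain Go maps; this is only an illustration of the shape of the lookup, not the actual Prometheus or M3 structures:

package main

import "fmt"

// invertedIndex maps label name -> label value -> timeseries IDs.
type invertedIndex map[string]map[string][]int

func (idx invertedIndex) add(id int, labels map[string]string) {
	for name, value := range labels {
		if idx[name] == nil {
			idx[name] = map[string][]int{}
		}
		idx[name][value] = append(idx[name][value], id)
	}
}

// lookup returns the IDs whose label has exactly the given value.
func (idx invertedIndex) lookup(name, value string) []int {
	return idx[name][value]
}

func main() {
	idx := invertedIndex{}
	idx.add(1, map[string]string{"__name__": "cpu", "pod": "foo-123abc"})
	idx.add(2, map[string]string{"__name__": "cpu", "pod": "foo-456def"})
	idx.add(3, map[string]string{"__name__": "cpu", "pod": "bar-123abc"})

	fmt.Println(idx.lookup("__name__", "cpu"))   // [1 2 3]
	fmt.Println(idx.lookup("pod", "foo-123abc")) // [1]
}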


Ways to keep timeseries index/data

1. Index and data live separately
Lookup and returning timeseries data happen in separate processes, typically with a network request between the two operations.

2. Index and data live together
Lookup sits next to the timeseries data; data is sent back directly once it matches the index query.


v1

M3 storage evolution (pre-open release, 2015)

[Architecture diagram: queries 1. fetch the index from Elasticsearch (>100 servers), then 2. fetch data from Cassandra (>1,000 servers); an "already indexed" cache, a heavy read cache and a recently-read cache sit in front of the stores.]


v1

M3 storage evolution (pre-open release, 2016)

[Same architecture a year later: Elasticsearch index (>100 servers) plus Cassandra data (>1,000 servers), with the same caching tiers in front.]


v2

M3 storage evolution (pre-open release, 2016)

[Architecture diagram: M3DB (data on disk with LRU caches) replaces Cassandra for data; Elasticsearch still serves the index, with the "already indexed" and heavy-read caches in front.]

With M3DB: 7x fewer servers than with Cassandra, while increasing replication from RF=2 to RF=3.


v2

M3 storage evolution (pre-open release, 2018)

[Same architecture: M3DB (data on disk with LRU caches) for data, Elasticsearch for the index, with caching tiers still in front.]


v4

M3 storage evolution (open release, 2018)

All read/write caches for data/index now live in the M3DB nodes.

[Architecture diagram: queries go directly to M3DB, which keeps both data and index on disk with LRU caches.]


Inverted index w/ Prometheus
https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/index.md

Label      Label value   Timeseries IDs (postings)
__name__   cpu_seconds   1, 33, 45
__name__   mem_free      8, 44, 58
pod        foo-123abc    1, 8
pod        foo-456def    33, 44
pod        bar-123abc    45, 58


Inverted index w/ Prometheus
https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/index.md

Label      Label value   TS IDs (postings)
__name__   cpu_seconds   1, 33, 45
__name__   mem_free      8, 44, 58
pod        foo-123abc    1, 8
pod        foo-456def    33, 44
pod        bar-123abc    45, 58

ID   Timeseries
1    __name__=cpu_seconds, pod=foo-123abc
8    __name__=mem_free, pod=foo-123abc
33   __name__=cpu_seconds, pod=foo-456def
44   __name__=mem_free, pod=foo-456def
45   __name__=cpu_seconds, pod=bar-123abc
58   __name__=mem_free, pod=bar-123abc
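As a worked example (mine, not from the slides), evaluating a selector like cpu_seconds{pod=~"foo-.*"} against the tables above means taking the union of the postings for the matching pod values ({1, 8} and {33, 44}) and intersecting it with the postings for __name__=cpu_seconds ({1, 33, 45}). A minimal sketch over plain sorted ID slices, in the spirit of merging postings lists:

package main

import "fmt"

// intersect merges two ascending postings lists, keeping common IDs.
func intersect(a, b []int) []int {
	var out []int
	for i, j := 0, 0; i < len(a) && j < len(b); {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	// Postings from the table above.
	cpuSeconds := []int{1, 33, 45}              // __name__=cpu_seconds
	fooPods := []int{1, 8, 33, 44}              // union of pod=foo-123abc and pod=foo-456def
	fmt.Println(intersect(cpuSeconds, fooPods)) // [1 33]
}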


Inverted index w/ Prometheus: Labels (name and distinct values entries)


Inverted index w/ Prometheus: Postings/Timeseries IDs


Inverted index w/ Prometheus: Matching label values
https://github.com/prometheus/prometheus/blob/38d32e06862f6b72700f67043ce574508b5697f0/tsdb/querier.go#L417-L451

vals, err := ix.LabelValues(m.Name)
...
var res []string
for _, val := range vals {
	if m.Matches(val) {
		res = append(res, val)
	}
}
...
return ix.Postings(m.Name, res...) // Merges postings/timeseries IDs together


Inverted index w/ M3

1. Inverted index more similar to Elasticsearch & Apache Lucene.

2. Instead of storing distinct label values directly alongside their associated postings, stores the distinct label values in an FST (Finite State Transducer).

3. Instead of storing postings/timeseries IDs as plain integer sets (one after another), stores them as Roaring Bitmaps (compressed bitmaps) for fast intersection (across thousands of sets).
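A minimal sketch of the compressed-bitmap side, written against the RoaringBitmap/roaring Go library (M3DB uses its own roaring implementation, so treat this purely as an illustration of fast postings intersection, not M3's code):

package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring"
)

func main() {
	// Postings stored as compressed bitmaps instead of plain integer sets.
	cpuSeconds := roaring.BitmapOf(1, 33, 45) // __name__=cpu_seconds
	fooPods := roaring.BitmapOf(1, 8, 33, 44) // pod=~"foo-.*" (union of matching values)

	// Intersection is a fast bitmap AND rather than a merge of sorted ID lists.
	matched := roaring.And(cpuSeconds, fooPods)
	fmt.Println(matched.ToArray()) // [1 33]
}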


What is an FST?

Like a compressed trie.

Good overview and some examples at https://blog.burntsushi.net/transducers/

Searching a data set of Wikipedia titles is more than 10x faster than grep.

This matters when you have billions of metrics, e.g. Uber with 11 billion metrics.
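For a concrete feel, here is a tiny sketch of building and querying an FST of label values with the blevesearch/vellum library (M3DB uses a vellum fork for its FST segments; the keys and values below are just the example pods mapped to made-up postings offsets, not M3's actual on-disk layout):

package main

import (
	"bytes"
	"fmt"

	"github.com/blevesearch/vellum"
)

func main() {
	var buf bytes.Buffer
	b, _ := vellum.New(&buf, nil)

	// Keys (label values) must be inserted in sorted order; each maps to a
	// uint64, e.g. an offset to that value's postings list.
	b.Insert([]byte("bar-123abc"), 300)
	b.Insert([]byte("foo-123abc"), 100)
	b.Insert([]byte("foo-456def"), 200)
	b.Close()

	fst, _ := vellum.Load(buf.Bytes())
	offset, ok, _ := fst.Get([]byte("foo-123abc"))
	fmt.Println(offset, ok) // 100 true
}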


https://github.com/chronosphereiox/high_cardinality_microbenchmark

Disclaimer: this only tests one part of much bigger systems, mainly to support architectural choices, not to measure real-world performance.

Demo


Thank you to M3 contributors:…@chronosphere.io, …@uber.com, …@aiven.io, …@cloudera.com, …@linkedin.com and many other great individuals!

Learn more (release 0.15.0 coming soon):

● Slack https://bit.ly/m3slack
● Mailing list https://groups.google.com/forum/#!forum/m3db
● GitHub https://github.com/m3db/m3
● Documentation https://m3db.io
● Chronosphere contact@chronosphere.io

Thank you, questions? Come say hi

@chronosphereio