Need For Time Series Database Pramit Choudhary, ML Engineer @eHarmony

Page 1: Need for Time series Database

Need For Time Series Database

Pramit Choudhary, ML Engineer @eHarmony

Page 2: Need for Time series Database

Motivation

Speed Matters

• We want to know what is happening NOW
• Users access data through different mobile platforms and have no patience

Data is scattered around
• MongoDB, Voldemort, Netezza, Hive, Whisper, maybe more
• For cross-platform analytical work, data is still moved around (a cause for worry)
• Need to simplify the database tech stack
• Complexity increases as we start tracking more metrics for mobile devices

Data-Analytics Use-cases:
• Most of the time we study data patterns over a period of time, e.g.
  1. What are the probable times for a user to get matches? => need to start tracking the amount of time the user spends during the day
  2. Feature exploration and extraction: what other features could we possibly use? => probably more t/f/z/p statistical tests?

Page 3: Need for Time series Database

Re-CAP

Consistency: Data remains consistent after an operation executes, e.g. after an update all clients see the same state of the data.

Availability: Always on (no downtime)

Partition Tolerance: The system continues to function even when nodes cannot communicate with one another

Page 4: Need for Time series Database

Different Combinations

CA : Single-site cluster, all nodes are always in contact, e.g. SQL-type RDBMS

CP : Some data may not be accessible, but the rest is consistent and accurate, e.g. MongoDB, HBase, Redis

AP : Available under partitioning, but no guarantee on consistency, e.g. Cassandra, Riak, DynamoDB

Page 5: Need for Time series Database

NoSQL World

• Key-Value Store (Redis, Riak)

• Document Store (MongoDB, Couchbase)

• Column Store (Cassandra, HBase, OpenTSDB)

• Graph Store (Neo4j)

Page 6: Need for Time series Database

Introducing a new DB

OpenTSDB

Author: Benoit Sigoure @ StumbleUpon

Page 7: Need for Time series Database

What is OpenTSDB?

Open Source Time Series Database

Store trillions of data points

Sucks up all data and keeps going

Never loses precision

Scales using HBase

Note: OpenTSDB is used here as an example; KairosDB or InfluxDB may give better results. They work on similar principles.

Author: Benoit Sigoure and Chris Larsen

Page 8: Need for Time series Database

Use-Cases

MongoDB and Couchbase : user profiles, product catalogs, geospatial, financial products, social media, digital content, gaming, metadata, events, bills and invoices

HBase and Cassandra : structured, semi-structured and unstructured data, full table scans, read-intensive operations, time series interval data, geospatial data

Page 9: Need for Time series Database

Other Options

Author: Oliver Hankeln

Page 10: Need for Time series Database

What are Time Series?

Time Series: Data points for an identity over time

Typical Identity:
• Dotted string: web01.sys.cpu.user.0 (no concept of filters)

OpenTSDB Identity:
• Metric: sys.cpu.user
• Tags (name/value pairs), which act as filters: host=web01 cpu=0
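The difference between a dotted string and a metric with tags can be sketched in a few lines of Python. This is only an illustration of the idea, not OpenTSDB's query API; `SeriesId` and `select` are hypothetical names, and the extra series (cpu=1, web02) are made up for the example.

```python
from typing import Dict, List, NamedTuple


class SeriesId(NamedTuple):
    """Identity of one time series: a metric name plus name/value tags."""
    metric: str
    tags: Dict[str, str]


def select(series: List[SeriesId], metric: str, **tag_filters: str) -> List[SeriesId]:
    """Return the series for `metric` whose tags match every given filter."""
    return [
        s for s in series
        if s.metric == metric
        and all(s.tags.get(k) == v for k, v in tag_filters.items())
    ]


# Hypothetical series; only the first one appears on the slide.
all_series = [
    SeriesId("sys.cpu.user", {"host": "web01", "cpu": "0"}),
    SeriesId("sys.cpu.user", {"host": "web01", "cpu": "1"}),
    SeriesId("sys.cpu.user", {"host": "web02", "cpu": "0"}),
]

web01 = select(all_series, "sys.cpu.user", host="web01")  # all CPUs on web01
cpu0 = select(all_series, "sys.cpu.user", cpu="0")        # CPU 0 on every host
```

With a dotted string such as web01.sys.cpu.user.0, the same two questions would need string matching on positions inside the name; with tags, each one is a plain filter.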

Author: Benoit Sigoure and Chris Larsen

Page 11: Need for Time series Database

What are Time Series?

Data Point:

Metric + Tags

+ Value: 42

+ Timestamp: 1234567890

sys.cpu.user 1234567890 42 host=web01 cpu=0

Fields, in order: metric name (sys.cpu.user), timestamp (1234567890), metric value (42), filter 1 (host=web01), filter 2 (cpu=0)

Author: Benoit Sigoure and Chris Larsen
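The data point line above is regular enough to split mechanically. The following is a minimal sketch of such a parser, assuming whitespace-separated fields in the order metric, timestamp, value, then one or more tag=value pairs; `DataPoint` and `parse_data_point` are hypothetical names, not part of OpenTSDB.

```python
from typing import Dict, NamedTuple


class DataPoint(NamedTuple):
    metric: str
    timestamp: int          # seconds since the Unix epoch
    value: float
    tags: Dict[str, str]    # name/value pairs used as filters


def parse_data_point(line: str) -> DataPoint:
    """Parse 'metric timestamp value tag=value [tag=value ...]'."""
    parts = line.split()
    if len(parts) < 4:
        # This sketch insists on at least one tag.
        raise ValueError("expected: metric timestamp value tag=value [tag=value ...]")
    metric, timestamp, value = parts[0], int(parts[1]), float(parts[2])
    tags: Dict[str, str] = {}
    for pair in parts[3:]:
        name, sep, tag_value = pair.partition("=")
        if not sep or not name or not tag_value:
            raise ValueError(f"malformed tag: {pair!r}")
        tags[name] = tag_value
    return DataPoint(metric, timestamp, value, tags)


point = parse_data_point("sys.cpu.user 1234567890 42 host=web01 cpu=0")
```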

Page 12: Need for Time series Database

Architecture

Author: Benoit Sigoure and Chris Larsen

Page 13: Need for Time series Database

Another View

Author: slideshare

Page 14: Need for Time series Database

About TSDs

Write throughput
• CPU bound
• Worst case: can handle 2000 points/sec on an old 2006 dual-core CPU

Read throughput
• Depends on the cardinality of a metric
• Depends on the timespan and number of data points retrieved

Reliability
• No single point of failure; no concept of a master daemon
• Dependency: needs HBase with ZooKeeper
• Has a single point of failure if running over HDFS, but none with respect to the database
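To make the cardinality point concrete, here is a rough sketch, assuming "cardinality" means the number of distinct tag combinations (and therefore distinct series) under one metric. The helper name and the tag values are hypothetical.

```python
from math import prod
from typing import Dict, Set


def max_series_count(tag_values: Dict[str, Set[str]]) -> int:
    """Upper bound on distinct series for one metric:
    the product of the number of distinct values seen for each tag."""
    return prod(len(values) for values in tag_values.values())


# Hypothetical tag values observed for sys.cpu.user.
tags_seen = {
    "host": {"web01", "web02", "web03"},
    "cpu": {"0", "1", "2", "3"},
}
upper_bound = max_series_count(tags_seen)  # 3 hosts * 4 CPUs
```

Every extra tag multiplies this bound, so a query that does not filter on a tag may have to read many more series for the same timespan.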

More info on the Wiki : http://opentsdb.net/faq.html

Page 15: Need for Time series Database

Simplistic View of the Table

HBase table representation without OpenTSDB

Author: Oliver Hankeln

Page 16: Need for Time series Database

OpenTSDB Magic

“Compact columns by concatenation”

Author: Oliver Hankeln

• Tags are put at the end of the row key
• Timestamp is normalized on 1hr boundaries
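The two bullets above can be sketched as follows. This is an illustration under stated assumptions, not OpenTSDB's actual code: it assumes 3-byte numeric IDs for metrics, tag names and tag values, a 4-byte base timestamp, and tag pairs ordered by tag-name ID. The ID tables are hypothetical; in a real deployment the database assigns them.

```python
import struct
from typing import Dict

SECONDS_PER_HOUR = 3600

# Hypothetical ID tables, for illustration only.
METRIC_IDS = {"sys.cpu.user": 1}
TAG_NAME_IDS = {"host": 1, "cpu": 2}
TAG_VALUE_IDS = {"web01": 1, "0": 2}


def uid(n: int, width: int = 3) -> bytes:
    """Encode an ID as a fixed-width big-endian byte string (assumed width: 3)."""
    return n.to_bytes(width, "big")


def row_key(metric: str, timestamp: int, tags: Dict[str, str]) -> bytes:
    """Metric ID, then the timestamp rounded down to the hour, then the tag pairs."""
    base_time = timestamp - (timestamp % SECONDS_PER_HOUR)
    key = uid(METRIC_IDS[metric]) + struct.pack(">I", base_time)
    pairs = sorted((TAG_NAME_IDS[k], TAG_VALUE_IDS[v]) for k, v in tags.items())
    for name_id, value_id in pairs:
        key += uid(name_id) + uid(value_id)  # tags go at the end of the key
    return key


def offset_in_hour(timestamp: int) -> int:
    """Seconds past the hour; this distinguishes points that share a row."""
    return timestamp % SECONDS_PER_HOUR


key = row_key("sys.cpu.user", 1234567890, {"host": "web01", "cpu": "0"})
```

Under these assumptions every point of the same series within the same hour maps to the same row key, and only the offset within the hour differs. The key is 3 + 4 + 6 bytes per tag long, which is 19 bytes for the two tags used here.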

Page 17: Need for Time series Database

Row Key Size

Author: Oliver Hankeln

Page 18: Need for Time series Database

Benchmarks

Load Phase

Page 19: Need for Time series Database

Heavy Read

Page 20: Need for Time series Database

Heavy Read

Page 21: Need for Time series Database

Heavy Range Scan

Page 22: Need for Time series Database

Heavy Inserts

Page 23: Need for Time series Database

Is it being extensively used?

OVH (#3 largest cloud/hosting provider): monitors everything, including network performance, resource utilization, application performance, and customer-facing metrics

• 35 servers, 100k writes/s, 25 TB raw data
• 5-day moving window of HBase snapshots
• Redis cache on top for customer-facing data

Page 24: Need for Time series Database

Yahoo: Monitoring application performance and statistics (15 servers, 280k writes/s)

Arista Networks: High-performance network monitoring

• 5k writes/s; uses Varnish for caching

MapR

“OpenTSDB is a widely used database intended to store and analyze time-series data. Originally designed for only data center monitoring, poor ingest performance had limited the expansion of its use. This benchmark demonstrates a viable option for new applications, such as IoT and other real-time data-analysis applications, using OpenTSDB running on MapR.” Ted Dunning, Chief Application Architect

Page 25: Need for Time series Database

Others

Page 26: Need for Time series Database

Some References

Book: Time Series Databases, by Ted Dunning and Ellen Friedman ( https://www.dropbox.com/s/c1zj0l0q0qmfvo8/Time_Series_Databases.pdf?dl=0 )

Benchmarks: https://www.dropbox.com/s/g67yoxwabwb5s0g/PerformanceBenchMark.pdf?dl=0

Lessons learned: http://www.slideshare.net/cloudera/4-opentsdb-hbasecon

Some Comparisons: http://prometheus.io/docs/introduction/comparison/

Page 27: Need for Time series Database

Demo

Page 28: Need for Time series Database

Questions?