Time Series with Apache Cassandra. Patrick McFadin, Chief Evangelist (@PatrickMcFadin). ©2013 DataStax Confidential. Do not distribute without consent.

Time series with apache cassandra strata


DESCRIPTION

This talk covers the basics of how Apache Cassandra stores and accesses time series data.


Page 1: Time series with apache cassandra   strata

©2013 DataStax Confidential. Do not distribute without consent.

@PatrickMcFadin

Patrick McFadin, Chief Evangelist

Time Series with Apache Cassandra


Page 2: Time series with apache cassandra   strata

Quick intro to Cassandra
• Shared nothing
• Masterless peer-to-peer
• Based on Dynamo

Page 3: Time series with apache cassandra   strata

Scaling
• Add nodes to scale
• Millions Ops/s

[Chart: throughput (ops/sec) for Cassandra, HBase, Redis, MySQL]

Page 4: Time series with apache cassandra   strata

Uptime
• Built to replicate
• Resilient to failure
• Always on

NONE

Page 5: Time series with apache cassandra   strata

Easy to use
• CQL is a familiar syntax
• Friendly to programmers
• Paxos for locking

CREATE TABLE users (
  username varchar,
  firstname varchar,
  lastname varchar,
  email list<varchar>,
  password varchar,
  created_date timestamp,
  PRIMARY KEY (username)
);

INSERT INTO users (username, firstname, lastname,
  email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
  ['[email protected]'],'ba27e03fd95e507daf2937c937d499ab',
  '2011-06-20 13:50:00');

INSERT INTO users (username, firstname,
  lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
  ['[email protected]'],
  'ba27e03fd95e507daf2937c937d499ab',
  '2011-06-20 13:50:00')
IF NOT EXISTS;

Page 6: Time series with apache cassandra   strata

Time series in production
• It’s all about “What’s happening”
• Data is the new currency

“Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of financial data, ingesting into its database 2 million pieces of information a second from every major trading exchange.”*

* http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html

Page 7: Time series with apache cassandra   strata

Why Cassandra for Time Series

• Scales
• Resilient
• Good data model
• Efficient Storage Model

What about that?

Page 8: Time series with apache cassandra   strata

Data Model
• Weather Station Id and Time are unique
• Store as many as needed

CREATE TABLE temperature (
  weatherstation_id text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY (weatherstation_id,event_time)
);

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');

Page 9: Time series with apache cassandra   strata

Storage Model - Logical View

SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';

weatherstation_id   event_time            temperature
1234ABCD            2013-04-03 07:01:00   72F
1234ABCD            2013-04-03 07:02:00   73F
1234ABCD            2013-04-03 07:03:00   73F
1234ABCD            2013-04-03 07:04:00   74F

Page 10: Time series with apache cassandra   strata

Storage Model - Disk Layout

SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';

Merged, Sorted and Stored Sequentially

1234ABCD
  2013-04-03 07:01:00   72F
  2013-04-03 07:02:00   73F
  2013-04-03 07:03:00   73F
  2013-04-03 07:04:00   74F
  2013-04-03 07:05:00   74F
  2013-04-03 07:06:00   75F
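
The "merged, sorted and stored sequentially" idea can be sketched in a few lines of Python. This is only a toy model, not Cassandra's storage engine: the two lists stand in for separately written, already sorted runs of rows for one partition, and the names `run_a`, `run_b` and `merge_runs` are made up for illustration.

```python
import heapq

# Two hypothetical, already sorted runs of (event_time, temperature) rows
# for the partition '1234ABCD', e.g. written at different times.
run_a = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:05:00", "74F"),
]
run_b = [
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]


def merge_runs(*runs):
    """Merge sorted runs into a single list ordered by event_time."""
    # heapq.merge walks the runs in step and never re-sorts them.
    return list(heapq.merge(*runs))


partition = merge_runs(run_a, run_b)
for event_time, temperature in partition:
    print(event_time, temperature)
```

Because the timestamps are written in year-month-day order, plain string comparison already matches chronological order, so the merged list ends up in event_time order.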

Page 11: Time series with apache cassandra   strata

Query patterns
• Range queries
• “Slice” operation on disk

SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';

1234ABCD
  2013-04-03 07:01:00   72F
  2013-04-03 07:02:00   73F
  2013-04-03 07:03:00   73F
  2013-04-03 07:04:00   74F
  2013-04-03 07:05:00   74F
  2013-04-03 07:06:00   75F

Single seek on disk
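
Why a range query maps to a single seek can be illustrated with a sorted list and a binary search. The sketch below is a simplified in-memory stand-in for one partition, not how Cassandra reads from disk; `partition`, `event_times` and `slice_rows` are hypothetical names. It uses the same strict bounds as the query above.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for the '1234ABCD' partition, kept sorted by event_time.
partition = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]
event_times = [event_time for event_time, _ in partition]


def slice_rows(start, end):
    """Return the rows with start < event_time < end (strict bounds)."""
    lo = bisect_right(event_times, start)  # index of first row strictly after start
    hi = bisect_left(event_times, end)     # index of first row at or after end
    # One lookup finds the start; the matching rows are then read contiguously.
    return partition[lo:hi]


rows = slice_rows("2013-04-03 07:01:00", "2013-04-03 07:04:00")
for event_time, temperature in rows:
    print(event_time, temperature)
```

The point of the design is that the rows are already in clustering order, so no filtering or sorting happens at read time: the query is just a contiguous cut of the partition.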

Page 12: Time series with apache cassandra   strata

Query patterns
• Range queries
• “Slice” operation on disk

SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';

weatherstation_id   event_time            temperature
1234ABCD            2013-04-03 07:01:00   72F
1234ABCD            2013-04-03 07:02:00   73F
1234ABCD            2013-04-03 07:03:00   73F
1234ABCD            2013-04-03 07:04:00   74F

Sorted by event_time

Programmers like this

Page 13: Time series with apache cassandra   strata

Ingestion models
• Apache Kafka
• Apache Flume
• Storm
• Custom Applications

Apache Kafka

Your totally killer application

Page 14: Time series with apache cassandra   strata

Dealing with data at speed
• 1 million writes per second?
• 1 insert every microsecond
• Collisions?
• Primary Key determines node placement
• Random partitioning
• Special data type - TimeUUID

Your totally killer application

weatherstation_id='1234ABCD'

weatherstation_id='5678EFGH'
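
The collision concern can be made concrete with a small Python sketch. It assumes a toy "table" modeled as a dict keyed by the primary key, so that a second write with the same key replaces the first; the variable names and the millisecond value are invented for the example, and Python's standard `uuid.uuid1()` stands in for a TimeUUID generator.

```python
import uuid

station = "1234ABCD"

# Toy "tables": dicts keyed by primary key, so a repeated key overwrites.
by_millisecond = {}
by_timeuuid = {}

# Hypothetical burst: 1000 readings that all land in the same millisecond.
same_millisecond = 1392229086267
for reading in range(1000):
    # Key of (station, millisecond timestamp): every write hits the same key.
    by_millisecond[(station, same_millisecond)] = reading
    # Key of (station, version 1 UUID): each write gets its own key.
    by_timeuuid[(station, uuid.uuid1())] = reading

print(len(by_millisecond))  # 1: only the last reading survived
print(len(by_timeuuid))     # one entry per reading, as long as uuid1() never repeats
```

With a plain timestamp in the key, writes that share a timestamp silently replace each other; a time-based UUID keeps the time ordering while giving every write a distinct key.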

Page 15: Time series with apache cassandra   strata

TimeUUID

• Also known as a Version 1 UUID
• Sortable
• Reversible

Timestamp to Microsecond + UUID = TimeUUID

04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT

http://www.famkruithof.net/uuid/uuidgen
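
"Reversible" means the timestamp can be read back out of the UUID. A minimal sketch with Python's standard `uuid` module follows; the helper name `timeuuid_to_datetime` is made up, and the constant is the well-known offset between the UUID epoch (1582-10-15) and the Unix epoch. Applied to the UUID on the slide, it should give the same date and time as shown there.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-nanosecond intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value):
    """Recover the UTC timestamp embedded in a version 1 (time-based) UUID."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # u.time is the 60-bit timestamp in 100 ns units; convert to microseconds.
    microseconds = (u.time - UUID_EPOCH_OFFSET) // 10
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=microseconds)


decoded = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
print(decoded)
```

Note that the raw text of a version 1 UUID does not sort chronologically, because the low bits of the timestamp come first; to order values by time, sort on the embedded timestamp (`uuid.UUID(value).time` in this sketch).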

Page 16: Time series with apache cassandra   strata

Way more information

• 5 minute interviews
• Use cases
• Free training!

www.planetcassandra.org

Page 17: Time series with apache cassandra   strata

Thank You!

Follow me for more updates all the time: @PatrickMcFadin