48
@PatrickMcFadin Patrick McFadin Chief Evangelist for Apache Cassandra A little Cassandra for the Relational Brain 1

Cassandra for the relational brain - Percona · A little Cassandra for the Relational Brain 1. Relational Modeling ... 'First in a three part series for Cassandra Data Modeling','v

Embed Size (px)

Citation preview

@PatrickMcFadin

Patrick McFadinChief Evangelist for Apache Cassandra

A little Cassandra for the Relational Brain

1

Relational Modeling

Data

Models

Application

Normalized Data

• Think a YouTube competitor – Users add videos, rate them, comment on them, etc. – Can search for videos by tag

CREATE TABLE users ( id number(12) NOT NULL , firstname nvarchar2(25) NOT NULL , lastname nvarchar2(25) NOT NULL, email nvarchar2(50) NOT NULL, password nvarchar2(255) NOT NULL, created_date timestamp(6), PRIMARY KEY (id), CONSTRAINT email_uq UNIQUE (email) );

-- Users by email address indexCREATE INDEX idx_users_email ON users (email);

•Create entity table• Add constraints• Index fields• Foreign Key relationships

CREATE TABLE videos ( id number(12), userid number(12) NOT NULL, name nvarchar2(255), description nvarchar2(500), location nvarchar2(255), location_type int, added_date timestamp, CONSTRAINT users_userid_fk FOREIGN KEY (userid) REFERENCES users (Id) ON DELETE CASCADE, PRIMARY KEY (id) );

Cassandra Modeling

Data

Models

Application

Think Before You ModelOr how to keep doing what you’re already doing

8

Some of the Entities and Relationships in KillrVideo

9

Userid

firstname

lastname

email

password Video

id

name

description

location

preview_image

tagsfeatures

Commentcomment

id

adds

timestamp

posts

timestamp

1

nn

1

1

nn

mrates

rating

• What are your application’s workflows?

• How will I access the data?

• Knowing your queries in advance is NOT optional

• Different from RDBMS because I can’t just JOIN or create a new indexes to support new queries

10

Modeling Queries

Some Application Workflows in KillrVideo

11

User Logs into site

Show basic information about user

Show videos added by a

user

Show comments posted by a

user

Search for a video by tag

Show latest videos

added to the site

Show comments for a video

Show ratings for a

video

Show video and its details

Some Queries in KillrVideo to Support Workflows

12

Users

User Logs into site

Find user by email address

Show basic information about user

Find user by id

Comments

Show comments for a video

Find comments by video (latest first)

Show comments posted by a

user

Find comments by user (latest first)

Ratings

Show ratings for a

video

Find ratings by video

Some Queries in KillrVideo to Support Workflows

13

Videos

Search for a video by tag Find video by tag

Show latest videos

added to the site

Find videos by date (latest first)

Show video and its details

Find video by idShow videos added by a

user

Find videos by user (latest first)

“Static” Table

CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, added_date timestamp, PRIMARY KEY (videoid) );

Table Name

Column NameColumn CQL Type

Primary Key Designation Partition Key

Insert

INSERT INTO videos (videoid, name, userid, description, location, location_type, preview_thumbnails, tags, added_date, metadata) VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.',9761d3d7-7fbd-4269-9988-6cfd4e188678, 'First in a three part series for Cassandra Data Modeling','http://www.youtube.com/watch?v=px6U2n74q3g',1, {'YouTube':'http://www.youtube.com/watch?v=px6U2n74q3g'},{'cassandra','data model','relational','instruction'}, '2013-05-02 12:30:29');

Table Name Fields

Values

Partition Key: Required

Partition keys

06049cbb-dfed-421f-b889-5f649a0de1ed Murmur3 Hash Token = 7224631062609997448

873ff430-9c23-4e60-be5f-278ea2bb21bd Murmur3 Hash Token = -6804302034103043898

Consistent hash. 64 bit numberbetween 2-63 and 264-1

INSERT INTO videos (videoid, name, userid, description) VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.’, 9761d3d7-7fbd-4269-9988-6cfd4e188678, 'First in a three part series for Cassandra Data Modeling');

INSERT INTO videos (videoid, name, userid, description) VALUES (873ff430-9c23-4e60-be5f-278ea2bb21bd,'Become a Super Modeler’, 9761d3d7-7fbd-4269-9988-6cfd4e188678, 'Second in a three part series for Cassandra Data Modeling');

“Dynamic” Table

CREATE TABLE videos_by_tag ( tag text, videoid uuid, added_date timestamp, name text, preview_image_location text, tagged_date timestamp, PRIMARY KEY (tag, videoid) );

Partition Key Clustering Column

Users – The Cassandra Way

User Logs into site

Find user by email address

Show basic information about user

Find user by id

CREATE TABLE user_credentials ( email text, password text, userid uuid, PRIMARY KEY (email) );

CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );

Primary key relationship

PRIMARY KEY (tag,videoid)

Primary key relationship

Partition Key

PRIMARY KEY (tag,videoid)

Primary key relationship

Partition Key Clustering Column

PRIMARY KEY (tag,videoid)

Primary key relationship

Partition Key

data model

PRIMARY KEY (tag,videoid)

Clustering Column

-5.6

06049cbb-dfed-421f-b889-5f649a0de1ed

Primary key relationship

Partition Key

2013-05-16 16:50:002013-05-02 12:30:29

873ff430-9c23-4e60-be5f-278ea2bb21bd

PRIMARY KEY (tag,videoid)

Clustering Column

data model49f64d40-7d89-4890-b910-dbf923563a33

2013-06-11 11:00:00

Select

name | description | added_date---------------------------------------------------+----------------------------------------------------------+--------------------------The data model is dead. Long live the data model. | First in a three part series for Cassandra Data Modeling | 2013-05-02 12:30:29-0700

SELECT name, description, added_dateFROM videosWHERE videoid = 06049cbb-dfed-421f-b889-5f649a0de1ed;

FieldsTable Name

Primary Key: Partition Key Required

Controlling OrderCREATE TABLE raw_weather_data ( wsid text, year int, month int, day int, hour int, temperature double, PRIMARY KEY ((wsid), year, month, day, hour) ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);

INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,10,-5.6);

INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,9,-5.1);

INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,8,-4.9);

INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,7,-5.3);

Clustering

200510010:99999 12 1 10

200510010:99999 12 1 9

raw_weather_data

-5.6

-5.1

200510010:99999 12 1 8

200510010:99999 12 1 7

-4.9

-5.3

Order By

DESC

Write PathClient INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)

VALUES (‘10010:99999’,2005,12,1,7,-5.3);

year 1wsid 1 month 1 day 1 hour 1

year 2wsid 2 month 2 day 2 hour 2

Memtable

SSTable

SSTable

SSTable

SSTable

Node

Commit Log Data * Compaction *

Temp

Temp

Storage Model - Logical View

2005:12:1:10

-5.6

2005:12:1:9

-5.1

2005:12:1:8

-4.9

10010:99999

10010:99999

10010:99999

wsid hour temperature

2005:12:1:7

-5.310010:99999

SELECT wsid, hour, temperatureFROM raw_weather_dataWHERE wsid=‘10010:99999’ AND year = 2005 AND month = 12 AND day = 1;

2005:12:1:10

-5.6 -5.3-4.9-5.1

Storage Model - Disk Layout

2005:12:1:9 2005:12:1:810010:99999

2005:12:1:7

Merged, Sorted and Stored Sequentially

SELECT wsid, hour, temperatureFROM raw_weather_dataWHERE wsid=‘10010:99999’ AND year = 2005 AND month = 12 AND day = 1;

2005:12:1:10

-5.6

2005:12:1:11

-4.9 -5.3-4.9-5.1

Storage Model - Disk Layout

2005:12:1:9 2005:12:1:810010:99999

2005:12:1:7

Merged, Sorted and Stored Sequentially

SELECT wsid, hour, temperatureFROM raw_weather_dataWHERE wsid=‘10010:99999’ AND year = 2005 AND month = 12 AND day = 1;

2005:12:1:10

-5.6

2005:12:1:11

-4.9 -5.3-4.9-5.1

Storage Model - Disk Layout

2005:12:1:9 2005:12:1:810010:99999

2005:12:1:7

Merged, Sorted and Stored Sequentially

SELECT wsid, hour, temperatureFROM raw_weather_dataWHERE wsid=‘10010:99999’ AND year = 2005 AND month = 12 AND day = 1;

2005:12:1:12

-5.4

Read PathClient

SSTableSSTable

SSTable

Node

Data

SELECT wsid,hour,temperatureFROM raw_weather_dataWHERE wsid='10010:99999'AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;

year 1wsid 1 month 1 day 1 hour 1

year 2wsid 2 month 2 day 2 hour 2

Memtable

Temp

Temp

Query patterns•Range queries• “Slice” operation on disk

Single seek on disk

10010:99999

Partition key for locality

SELECT wsid,hour,temperatureFROM raw_weather_dataWHERE wsid='10010:99999'AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;

2005:12:1:10

-5.6 -5.3-4.9-5.1

2005:12:1:9 2005:12:1:8 2005:12:1:7

Query patterns•Range queries• “Slice” operation on disk

Programmers like this

Sorted by event_time2005:12:1:10

-5.6

2005:12:1:9

-5.1

2005:12:1:8

-4.9

10010:99999

10010:99999

10010:99999

weather_station hour temperature

2005:12:1:7

-5.310010:99999

SELECT weatherstation,hour,temperature FROM temperature WHERE weatherstation_id=‘10010:99999' AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;

Other New and Not-so-New-but-different things

CollectionsSet

tags set<varchar>

CQL Type: For Ordering

Column Name CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, added_date timestamp, PRIMARY KEY (videoid) );

CollectionsSet

List

Column Name

Column Name

CQL Type

CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, added_date timestamp, PRIMARY KEY (videoid) );

tags set<varchar>

CQL Type: For Ordering

tags set<varchar>

CollectionsSet

List

Map

preview_thumbnails map<text,text>

Column Name

Column Name

CQL Key Type CQL Value Type

Column Name CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, added_date timestamp, PRIMARY KEY (videoid) );

CQL Type

tags set<varchar>

tags set<varchar>

CQL Type: For Ordering

Aggregates (Sort of)

*As of Cassandra 2.2

•Built-in: avg, min, max, count(<column name>)•Runs on server•Always use with partition key

User Defined Functions

CREATE FUNCTION maxI(current int, candidate int) CALLED ON NULL INPUTRETURNS int LANGUAGE java AS'if (current == null) return candidate; else return Math.max(current, candidate);' ; CREATE AGGREGATE maxAgg(int) SFUNC maxISTYPE intINITCOND null;

CQL Type

Pure Function

SELECT maxAgg(temperature) FROM raw_weather_dataWHERE wsid='10010:99999' AND year = 2005 AND month = 12 AND day = 1

Aggregate usingfunction overpartition

Lightweight Transactions

Don’t overwrite!

INSERT INTO videos (videoid, name, userid, description, location, location_type, preview_thumbnails, tags, added_date, metadata) VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.',9761d3d7-7fbd-4269-9988-6cfd4e188678, 'First in a three part series for Cassandra Data Modeling','http://www.youtube.com/watch?v=px6U2n74q3g',1, {'YouTube':'http://www.youtube.com/watch?v=px6U2n74q3g'},{'cassandra','data model','relational','instruction'}, '2013-05-02 12:30:29’) IF NOT EXISTS;

Lightweight Transactions

No-op. Don’t throw error

CREATE TABLE IF NOT EXISTS videos_by_tag ( tag text, videoid uuid, added_date timestamp, name text, preview_image_location text, tagged_date timestamp, PRIMARY KEY (tag, videoid) );

Regular Update

UPDATE videosSET name = 'The data model is dead. Long live the data model.'WHERE id = 06049cbb-dfed-421f-b889-5f649a0de1ed;

Table Name Fields to Update: Not in Primary Key

Primary Key

Lightweight Transactions

Don’t overwrite!

UPDATE videosSET name = 'The data model is dead. Long live the data model.'WHERE id = 06049cbb-dfed-421f-b889-5f649a0de1ed IF userid = 9761d3d7-7fbd-4269-9988-6cfd4e188678;

Deleting Data

Delete

DELETE FROM videosWHERE id = 06049cbb-dfed-421f-b889-5f649a0de1ed;

Table Name

Primary Key: Required

Expiring Data

Time To Live = TTL

INSERT INTO videos (videoid, name, userid, description, location, location_type, preview_thumbnails, tags, added_date, metadata) VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.',9761d3d7-7fbd-4269-9988-6cfd4e188678, 'First in a three part series for Cassandra Data Modeling','http://www.youtube.com/watch?v=px6U2n74q3g',1, {'YouTube':'http://www.youtube.com/watch?v=px6U2n74q3g'},{'cassandra','data model','relational','instruction'}, '2013-05-02 12:30:29’) USING TTL = 2592000

Expire Data: 30 Days

Thank you!Bring the questions

Follow me on twitter @PatrickMcFadin