Upload
dinhquynh
View
221
Download
0
Embed Size (px)
Citation preview
@PatrickMcFadin
Patrick McFadinChief Evangelist for Apache Cassandra
A little Cassandra for the Relational Brain
1
• Think a YouTube competitor – Users add videos, rate them, comment on them, etc. – Can search for videos by tag
CREATE TABLE users ( id number(12) NOT NULL , firstname nvarchar2(25) NOT NULL , lastname nvarchar2(25) NOT NULL, email nvarchar2(50) NOT NULL, password nvarchar2(255) NOT NULL, created_date timestamp(6), PRIMARY KEY (id), CONSTRAINT email_uq UNIQUE (email) );
-- Users by email address indexCREATE INDEX idx_users_email ON users (email);
•Create entity table• Add constraints• Index fields• Foreign Key relationships
CREATE TABLE videos ( id number(12), userid number(12) NOT NULL, name nvarchar2(255), description nvarchar2(500), location nvarchar2(255), location_type int, added_date timestamp, CONSTRAINT users_userid_fk FOREIGN KEY (userid) REFERENCES users (Id) ON DELETE CASCADE, PRIMARY KEY (id) );
Some of the Entities and Relationships in KillrVideo
9
Userid
firstname
lastname
password Video
id
name
description
location
preview_image
tagsfeatures
Commentcomment
id
adds
timestamp
posts
timestamp
1
nn
1
1
nn
mrates
rating
• What are your application’s workflows?
• How will I access the data?
• Knowing your queries in advance is NOT optional
• Different from RDBMS because I can’t just JOIN or create a new indexes to support new queries
10
Modeling Queries
Some Application Workflows in KillrVideo
11
User Logs into site
Show basic information about user
Show videos added by a
user
Show comments posted by a
user
Search for a video by tag
Show latest videos
added to the site
Show comments for a video
Show ratings for a
video
Show video and its details
Some Queries in KillrVideo to Support Workflows
12
Users
User Logs into site
Find user by email address
Show basic information about user
Find user by id
Comments
Show comments for a video
Find comments by video (latest first)
Show comments posted by a
user
Find comments by user (latest first)
Ratings
Show ratings for a
video
Find ratings by video
Some Queries in KillrVideo to Support Workflows
13
Videos
Search for a video by tag Find video by tag
Show latest videos
added to the site
Find videos by date (latest first)
Show video and its details
Find video by idShow videos added by a
user
Find videos by user (latest first)
“Static” Table
CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, added_date timestamp, PRIMARY KEY (videoid) );
Table Name
Column NameColumn CQL Type
Primary Key Designation Partition Key
Insert
INSERT INTO videos (videoid, name, userid, description, location, location_type, preview_thumbnails, tags, added_date, metadata) VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.',9761d3d7-7fbd-4269-9988-6cfd4e188678, 'First in a three part series for Cassandra Data Modeling','http://www.youtube.com/watch?v=px6U2n74q3g',1, {'YouTube':'http://www.youtube.com/watch?v=px6U2n74q3g'},{'cassandra','data model','relational','instruction'}, '2013-05-02 12:30:29');
Table Name Fields
Values
Partition Key: Required
Partition keys
06049cbb-dfed-421f-b889-5f649a0de1ed Murmur3 Hash Token = 7224631062609997448
873ff430-9c23-4e60-be5f-278ea2bb21bd Murmur3 Hash Token = -6804302034103043898
Consistent hash. 64 bit numberbetween 2-63 and 264-1
INSERT INTO videos (videoid, name, userid, description) VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.’, 9761d3d7-7fbd-4269-9988-6cfd4e188678, 'First in a three part series for Cassandra Data Modeling');
INSERT INTO videos (videoid, name, userid, description) VALUES (873ff430-9c23-4e60-be5f-278ea2bb21bd,'Become a Super Modeler’, 9761d3d7-7fbd-4269-9988-6cfd4e188678, 'Second in a three part series for Cassandra Data Modeling');
“Dynamic” Table
CREATE TABLE videos_by_tag ( tag text, videoid uuid, added_date timestamp, name text, preview_image_location text, tagged_date timestamp, PRIMARY KEY (tag, videoid) );
Partition Key Clustering Column
Users – The Cassandra Way
User Logs into site
Find user by email address
Show basic information about user
Find user by id
CREATE TABLE user_credentials ( email text, password text, userid uuid, PRIMARY KEY (email) );
CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );
-5.6
06049cbb-dfed-421f-b889-5f649a0de1ed
Primary key relationship
Partition Key
2013-05-16 16:50:002013-05-02 12:30:29
873ff430-9c23-4e60-be5f-278ea2bb21bd
PRIMARY KEY (tag,videoid)
Clustering Column
data model49f64d40-7d89-4890-b910-dbf923563a33
2013-06-11 11:00:00
Select
name | description | added_date---------------------------------------------------+----------------------------------------------------------+--------------------------The data model is dead. Long live the data model. | First in a three part series for Cassandra Data Modeling | 2013-05-02 12:30:29-0700
SELECT name, description, added_dateFROM videosWHERE videoid = 06049cbb-dfed-421f-b889-5f649a0de1ed;
FieldsTable Name
Primary Key: Partition Key Required
Controlling OrderCREATE TABLE raw_weather_data ( wsid text, year int, month int, day int, hour int, temperature double, PRIMARY KEY ((wsid), year, month, day, hour) ) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,10,-5.6);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,9,-5.1);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,8,-4.9);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,7,-5.3);
Clustering
200510010:99999 12 1 10
200510010:99999 12 1 9
raw_weather_data
-5.6
-5.1
200510010:99999 12 1 8
200510010:99999 12 1 7
-4.9
-5.3
Order By
DESC
Write PathClient INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
VALUES (‘10010:99999’,2005,12,1,7,-5.3);
year 1wsid 1 month 1 day 1 hour 1
year 2wsid 2 month 2 day 2 hour 2
Memtable
SSTable
SSTable
SSTable
SSTable
Node
Commit Log Data * Compaction *
Temp
Temp
Storage Model - Logical View
2005:12:1:10
-5.6
2005:12:1:9
-5.1
2005:12:1:8
-4.9
10010:99999
10010:99999
10010:99999
wsid hour temperature
2005:12:1:7
-5.310010:99999
SELECT wsid, hour, temperatureFROM raw_weather_dataWHERE wsid=‘10010:99999’ AND year = 2005 AND month = 12 AND day = 1;
2005:12:1:10
-5.6 -5.3-4.9-5.1
Storage Model - Disk Layout
2005:12:1:9 2005:12:1:810010:99999
2005:12:1:7
Merged, Sorted and Stored Sequentially
SELECT wsid, hour, temperatureFROM raw_weather_dataWHERE wsid=‘10010:99999’ AND year = 2005 AND month = 12 AND day = 1;
2005:12:1:10
-5.6
2005:12:1:11
-4.9 -5.3-4.9-5.1
Storage Model - Disk Layout
2005:12:1:9 2005:12:1:810010:99999
2005:12:1:7
Merged, Sorted and Stored Sequentially
SELECT wsid, hour, temperatureFROM raw_weather_dataWHERE wsid=‘10010:99999’ AND year = 2005 AND month = 12 AND day = 1;
2005:12:1:10
-5.6
2005:12:1:11
-4.9 -5.3-4.9-5.1
Storage Model - Disk Layout
2005:12:1:9 2005:12:1:810010:99999
2005:12:1:7
Merged, Sorted and Stored Sequentially
SELECT wsid, hour, temperatureFROM raw_weather_dataWHERE wsid=‘10010:99999’ AND year = 2005 AND month = 12 AND day = 1;
2005:12:1:12
-5.4
Read PathClient
SSTableSSTable
SSTable
Node
Data
SELECT wsid,hour,temperatureFROM raw_weather_dataWHERE wsid='10010:99999'AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;
year 1wsid 1 month 1 day 1 hour 1
year 2wsid 2 month 2 day 2 hour 2
Memtable
Temp
Temp
Query patterns•Range queries• “Slice” operation on disk
Single seek on disk
10010:99999
Partition key for locality
SELECT wsid,hour,temperatureFROM raw_weather_dataWHERE wsid='10010:99999'AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;
2005:12:1:10
-5.6 -5.3-4.9-5.1
2005:12:1:9 2005:12:1:8 2005:12:1:7
Query patterns•Range queries• “Slice” operation on disk
Programmers like this
Sorted by event_time2005:12:1:10
-5.6
2005:12:1:9
-5.1
2005:12:1:8
-4.9
10010:99999
10010:99999
10010:99999
weather_station hour temperature
2005:12:1:7
-5.310010:99999
SELECT weatherstation,hour,temperature FROM temperature WHERE weatherstation_id=‘10010:99999' AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;
CollectionsSet
tags set<varchar>
CQL Type: For Ordering
Column Name CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, added_date timestamp, PRIMARY KEY (videoid) );
CollectionsSet
List
Column Name
Column Name
CQL Type
CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, added_date timestamp, PRIMARY KEY (videoid) );
tags set<varchar>
CQL Type: For Ordering
tags set<varchar>
CollectionsSet
List
Map
preview_thumbnails map<text,text>
Column Name
Column Name
CQL Key Type CQL Value Type
Column Name CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, added_date timestamp, PRIMARY KEY (videoid) );
CQL Type
tags set<varchar>
tags set<varchar>
CQL Type: For Ordering
Aggregates (Sort of)
*As of Cassandra 2.2
•Built-in: avg, min, max, count(<column name>)•Runs on server•Always use with partition key
User Defined Functions
CREATE FUNCTION maxI(current int, candidate int) CALLED ON NULL INPUTRETURNS int LANGUAGE java AS'if (current == null) return candidate; else return Math.max(current, candidate);' ; CREATE AGGREGATE maxAgg(int) SFUNC maxISTYPE intINITCOND null;
CQL Type
Pure Function
SELECT maxAgg(temperature) FROM raw_weather_dataWHERE wsid='10010:99999' AND year = 2005 AND month = 12 AND day = 1
Aggregate usingfunction overpartition
Lightweight Transactions
Don’t overwrite!
INSERT INTO videos (videoid, name, userid, description, location, location_type, preview_thumbnails, tags, added_date, metadata) VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.',9761d3d7-7fbd-4269-9988-6cfd4e188678, 'First in a three part series for Cassandra Data Modeling','http://www.youtube.com/watch?v=px6U2n74q3g',1, {'YouTube':'http://www.youtube.com/watch?v=px6U2n74q3g'},{'cassandra','data model','relational','instruction'}, '2013-05-02 12:30:29’) IF NOT EXISTS;
Lightweight Transactions
No-op. Don’t throw error
CREATE TABLE IF NOT EXISTS videos_by_tag ( tag text, videoid uuid, added_date timestamp, name text, preview_image_location text, tagged_date timestamp, PRIMARY KEY (tag, videoid) );
Regular Update
UPDATE videosSET name = 'The data model is dead. Long live the data model.'WHERE id = 06049cbb-dfed-421f-b889-5f649a0de1ed;
Table Name Fields to Update: Not in Primary Key
Primary Key
Lightweight Transactions
Don’t overwrite!
UPDATE videosSET name = 'The data model is dead. Long live the data model.'WHERE id = 06049cbb-dfed-421f-b889-5f649a0de1ed IF userid = 9761d3d7-7fbd-4269-9988-6cfd4e188678;
Delete
DELETE FROM videosWHERE id = 06049cbb-dfed-421f-b889-5f649a0de1ed;
Table Name
Primary Key: Required
Expiring Data
Time To Live = TTL
INSERT INTO videos (videoid, name, userid, description, location, location_type, preview_thumbnails, tags, added_date, metadata) VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.',9761d3d7-7fbd-4269-9988-6cfd4e188678, 'First in a three part series for Cassandra Data Modeling','http://www.youtube.com/watch?v=px6U2n74q3g',1, {'YouTube':'http://www.youtube.com/watch?v=px6U2n74q3g'},{'cassandra','data model','relational','instruction'}, '2013-05-02 12:30:29’) USING TTL = 2592000
Expire Data: 30 Days