Rigorous Cassandra Data Modeling for the Relational Data Architect Artem Chebotko

DataStax: Rigorous Cassandra Data Modeling for the Relational Data Architect


Rigorous Cassandra Data Modeling for the Relational Data Architect

Artem Chebotko


1 Cassandra Data and Query Models

2 Rigorous Data Modeling

3 Data Modeling Example

4 From Relational to Cassandra

5 Conclusions

2 © 2015. All Rights Reserved.

Tables with Single-Row Partitions

users

username | age | address
Alice    | 28  | Santa Clara, CA
Alex     | 37  | Austin, TX

sensors

id | type       | settings                   | owner
1  | phone      | {gps ⇒ on, pedometer ⇒ on} | Alice
2  | wristband  | {heart rate ⇒ on, …}       | Alice
3  | thermostat | {temp ⇒ 75, …}             | Alice
4  | security   | {…}                        | Alex
5  | phone      | {…}                        | Alex

Tables with Single-Row Partitions

CREATE TABLE users (
  username TEXT,
  age INT,
  address TEXT,
  PRIMARY KEY (username)
);

SELECT * FROM users WHERE username = ?;

CREATE TABLE sensors (
  id INT,
  type TEXT,
  settings MAP<TEXT,TEXT>,
  owner TEXT,
  PRIMARY KEY (id)
);

SELECT * FROM sensors WHERE id = ?;
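The two tables above have single-row partitions, so every query is a lookup by the full primary key. As an illustration only, the Python sketch below models each table as a dictionary from partition key to its one row. The dictionaries and the `select_by_partition_key` helper are hypothetical, this is not how Cassandra stores data, and settings entries the slide elides are simply left out.

```python
from typing import Any, Optional

# Hypothetical in-memory stand-ins for the users and sensors tables.
# Each partition key maps to exactly one row: a single-row partition.
users: dict[str, dict[str, Any]] = {
    "Alice": {"age": 28, "address": "Santa Clara, CA"},
    "Alex": {"age": 37, "address": "Austin, TX"},
}

sensors: dict[int, dict[str, Any]] = {
    1: {"type": "phone", "settings": {"gps": "on", "pedometer": "on"}, "owner": "Alice"},
    2: {"type": "wristband", "settings": {"heart rate": "on"}, "owner": "Alice"},
    3: {"type": "thermostat", "settings": {"temp": "75"}, "owner": "Alice"},
    4: {"type": "security", "settings": {}, "owner": "Alex"},
    5: {"type": "phone", "settings": {}, "owner": "Alex"},
}


def select_by_partition_key(table: dict, key_column: str, key: Any) -> Optional[dict]:
    """Mimic SELECT * FROM t WHERE <key_column> = ?: one lookup, zero or one row."""
    row = table.get(key)
    if row is None:
        return None
    return {key_column: key, **row}


print(select_by_partition_key(users, "username", "Alice"))
# {'username': 'Alice', 'age': 28, 'address': 'Santa Clara, CA'}
```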

Tables with Multi-Row Partitions

sensors_by_user (partition key: username; clustering key: id ASC; static columns: age, address)

username | id | type       | settings             | age | address
Alice    | 1  | phone      | {gps ⇒ on, …}        | 28  | Santa Clara, CA
Alice    | 2  | wristband  | {heart rate ⇒ on, …} | 28  | Santa Clara, CA
Alice    | 3  | thermostat | {temp ⇒ 75, …}       | 28  | Santa Clara, CA
Alex     | 4  | security   | …                    | 37  | Austin, TX
Alex     | 5  | phone      | …                    | 37  | Austin, TX

Tables with Multi-Row Partitions

CREATE TABLE sensors_by_user (
  username TEXT,
  age INT STATIC,
  address TEXT STATIC,
  id INT,
  type TEXT,
  settings MAP<TEXT,TEXT>,
  PRIMARY KEY (username, id)
) WITH CLUSTERING ORDER BY (id ASC);

SELECT * FROM sensors_by_user WHERE username = ?;

SELECT * FROM sensors_by_user WHERE username = ? AND id = ?;

SELECT * FROM sensors_by_user WHERE username = ? AND id > ?
ORDER BY id DESC;
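For multi-row partitions, the three queries above read a whole partition, one row, or a slice of the clustering order. A hypothetical in-memory sketch (class and method names are illustrative, not a driver API) keeps each partition's rows sorted by id and stores the static columns age and address once per partition:

```python
import bisect
from typing import Any


class SensorsByUser:
    """Hypothetical in-memory sketch of sensors_by_user.

    Each partition (username) holds its static columns once, plus rows kept
    sorted by the clustering key id (CLUSTERING ORDER BY id ASC).
    """

    def __init__(self) -> None:
        self.partitions: dict[str, dict[str, Any]] = {}

    def upsert(self, username: str, sensor_id: int, type_: str,
               settings: dict[str, str], age: int, address: str) -> None:
        part = self.partitions.setdefault(
            username, {"static": {}, "ids": [], "rows": {}})
        # Static columns: one copy per partition, shared by all its rows.
        part["static"] = {"age": age, "address": address}
        if sensor_id not in part["rows"]:
            bisect.insort(part["ids"], sensor_id)  # preserve id ASC order
        part["rows"][sensor_id] = {"type": type_, "settings": settings}

    def _rows(self, username: str, ids: list[int]) -> list[dict[str, Any]]:
        part = self.partitions[username]
        return [{"username": username, "id": i, **part["rows"][i], **part["static"]}
                for i in ids]

    def by_partition(self, username: str) -> list[dict[str, Any]]:
        """WHERE username = ?"""
        if username not in self.partitions:
            return []
        return self._rows(username, self.partitions[username]["ids"])

    def by_primary_key(self, username: str, sensor_id: int) -> list[dict[str, Any]]:
        """WHERE username = ? AND id = ?"""
        part = self.partitions.get(username)
        if part is None or sensor_id not in part["rows"]:
            return []
        return self._rows(username, [sensor_id])

    def range_desc(self, username: str, lower: int) -> list[dict[str, Any]]:
        """WHERE username = ? AND id > ? ORDER BY id DESC"""
        part = self.partitions.get(username)
        if part is None:
            return []
        start = bisect.bisect_right(part["ids"], lower)  # first id > lower
        return self._rows(username, part["ids"][start:][::-1])
```

Every query resolves to one partition and then to either a single clustering key or a contiguous run of the sorted ids, which is why range predicates and ORDER BY are tied to clustering columns in CQL.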

Key Observations

• C* (Cassandra) Data Model
  – Single-row partitions
  – Multi-row partitions
• C* Query Model
  – Equality search on the partition key
  – Equality search on partition and clustering keys
  – Range search and ordering on a clustering key
• Relational Data Model
  – Normalized tables
• Relational Query Model
  – SQL and relational algebra
  – Expressive (joins, arbitrary predicates)
  – Expensive (joins are computed at query time)

2 Rigorous Data Modeling

Rigorous: Definition and Implications

Formal, well-defined, sound
→ Repeatable, automatable
→ Tools, ease of use
→ Wider adoption

We Need the Methodology!

Conceptual Data Model + Application Workflow
→ (Mapping) → Logical Data Model
→ (Optimization) → Physical Data Model

Methodology Models

Model                      | Representation
Conceptual Data Model      | ERD
Application Workflow Model | Graph
Logical Data Model         | Chebotko Diagram
Physical Data Model        | Chebotko Diagram, CQL

[Figure: one example of each model for the sensor network: an ER diagram, an application workflow graph with queries Q1–Q5, logical and physical Chebotko diagrams, and a CQL CREATE TABLE statement]

Methodology Protocols

• Conceptual-to-logical mapping
  – Mapping rules
  – Mapping patterns
• Physical optimizations
  – Partition size analysis
  – Duplication factor analysis
  – Keys, aggregation, transactions, …
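Partition size analysis can be made concrete with a widely used estimate of the number of values (cells) in one partition, Nv = Nr × (Nc − Npk − Ns) + Ns, where Nr is the number of rows, Nc the number of columns, Npk the number of primary-key columns, and Ns the number of static columns. The sketch below assumes that formula; the one-reading-per-minute sampling rate is a hypothetical workload chosen to show why the physical model adds week to the partition key of measurements_by_sensor.

```python
def partition_values(n_rows: int, n_columns: int, n_pk: int, n_static: int) -> int:
    """Estimated number of values (cells) in one partition:

        Nv = Nr * (Nc - Npk - Ns) + Ns

    Primary-key columns are not counted as stored values, and static columns
    are stored once per partition rather than once per row.
    """
    if min(n_rows, n_columns, n_pk, n_static) < 0 or n_pk + n_static > n_columns:
        raise ValueError("invalid column counts")
    return n_rows * (n_columns - n_pk - n_static) + n_static


# sensors_by_user, Alice's partition: 3 rows; 6 columns (username, age, address,
# id, type, settings); 2 primary-key columns; 2 static columns (age, address).
alice = partition_values(3, 6, 2, 2)  # 3 * 2 + 2 = 8

# measurements_by_sensor with a hypothetical rate of one reading per minute.
READINGS_PER_WEEK = 7 * 24 * 60  # 10,080

# Logical model: PRIMARY KEY ((id, parameter), datetime); the partition never stops growing.
one_year_unbucketed = partition_values(52 * READINGS_PER_WEEK, 4, 3, 0)  # 524,160

# With week added to the partition key, each partition holds at most one week.
one_week_bucketed = partition_values(READINGS_PER_WEEK, 5, 4, 0)  # 10,080
```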

Sample Mapping Pattern

Conceptual fragment:
ET1 (keys: key1.1, key1.2; attributes: attr1.1, attr1.2, attr1.3 (collection))
  1 : n via relationship RT (attribute: attr)
ET2 (keys: key2.1, key2.2; attributes: attr2.1, attr2.2, attr2.3 (collection))

ACCESS PATTERN search attributes: key1.1, key1.2

=>

ET2_by_ET1_key (key1.1 and key1.2 both in the partition key)
key1.1 K
key1.2 K
key2.1 C↑
key2.2 C↑
attr1.1 S
attr1.2 S
attr1.3 (collection) S
attr2.1
attr2.2
attr2.3 (collection)
attr

ET2_by_ET1_key (key1.2 as a clustering column)
key1.1 K
key1.2 C↑
key2.1 C↑
key2.2 C↑
attr2.1
attr2.2
attr2.3 (collection)
attr

PRIMARY KEY: All search attributes, followed by all key attributes of RT
STATIC COLUMNS: Non-key attributes of ET1, iff all key attributes of ET1 are part of the partition key

What if we add the green attributes to the table above?
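The two rules on this slide are mechanical enough to code directly. A minimal sketch, assuming RT's key attributes are the keys of ET2 (the n side) and that the partition key is a prefix of the search attributes; `map_one_to_many` and `LogicalTable` are hypothetical names, not part of KDM or any Cassandra library:

```python
from dataclasses import dataclass, field


@dataclass
class LogicalTable:
    partition_key: list[str]
    clustering_key: list[str]
    static_columns: list[str] = field(default_factory=list)
    regular_columns: list[str] = field(default_factory=list)


def map_one_to_many(search_attrs: list[str], n_partition: int,
                    et1_keys: list[str], et1_attrs: list[str],
                    rt_keys: list[str], et2_attrs: list[str],
                    rt_attrs: list[str]) -> LogicalTable:
    """Hypothetical helper applying the slide's two rules to ET1 (1) : ET2 (n).

    PRIMARY KEY: all search attributes, followed by all key attributes of RT.
    STATIC COLUMNS: non-key attributes of ET1, iff all key attributes of ET1
    are part of the partition key.
    n_partition: how many leading primary-key attributes form the partition key.
    """
    if not 1 <= n_partition <= len(search_attrs):
        raise ValueError("partition key must be a non-empty prefix of the search attributes")
    primary_key = search_attrs + [k for k in rt_keys if k not in search_attrs]
    partition_key = primary_key[:n_partition]
    clustering_key = primary_key[n_partition:]
    static = list(et1_attrs) if set(et1_keys) <= set(partition_key) else []
    # When the static rule does not apply, this sketch leaves ET1's attributes
    # out, as in the second ET2_by_ET1_key variant on the slide.
    return LogicalTable(partition_key, clustering_key, static, et2_attrs + rt_attrs)
```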

The Easy Way

kdm.dataview.org

• Implements the methodology
  – CDM and query design
  – Automated LDM generation
  – Automated PDM and CQL generation

Yesterday’s talk: World’s Best Data Modeling Tool for Apache Cassandra

3 Data Modeling Example

Conceptual Data Model: Fact-Based Model

• Alice is a user
• Alice is 28 years old
• Alice wears a wristband
• A wristband is a sensor
• A wristband records a heart rate
• A heart rate is a measurement
• …

Conceptual Data Model: Entity-Relationship Model

[Figure: ER diagram of the sensor network]
• User (username, age, address)
• Sensor (id, type, settings)
• Measurement (parameter, datetime, value)
• User owns Sensor (1 : n)
• Sensor records Measurement (1 : n)
• User has follower (n : m, roles: user, follower)

Application Workflow

ACCESS PATTERNS
Q1: Find a user with a known username
Q2: Find followers of a user
Q3: Find sensors owned by a user
Q4: Find measurements for a sensor in a date range
Q5: Find daily summary of hourly aggregates

Workflow tasks:
• Display user information (Q1)
• Find followers (Q2)
• Display sensors (Q3)
• Show measurements in a date range (Q4)
• Show today's hourly aggregates (Q5)

Application Workflow and Queries

Q1: SELECT * FROM users
    WHERE username = ?;

Q2: SELECT * FROM followers_by_user
    WHERE username = ?;

Q3: SELECT * FROM sensors_by_user
    WHERE username = ?;

Q4: SELECT * FROM measurements_by_sensor
    WHERE id = ? AND parameter = ?
    AND datetime > ?;

Q5: SELECT * FROM summary_by_sensor
    WHERE id = ? AND date = ?;

Logical Data Model

users (Q1)
  username K
  age
  address

followers_by_user (Q2)
  username K
  follower_username C↑
  follower_age
  follower_address

sensors_by_user (Q3)
  username K
  id C↑
  type
  <settings>

measurements_by_sensor (Q4)
  id K
  parameter K
  datetime C↓
  value

summary_by_sensor (Q5)
  id K
  date K
  parameter C↑
  hour C↓
  avg
  ...

(K = partition key column; C↑ / C↓ = clustering column, ascending / descending; <…> = map column)
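Turning a Chebotko diagram into CQL is largely mechanical, which is what makes automated PDM and CQL generation possible. The Python sketch below renders one table's column list as a CREATE TABLE statement. The `to_cql` function is hypothetical, and the CQL types chosen for measurements_by_sensor are assumptions, since the logical model carries no types.

```python
ORDER = {"C↑": "ASC", "C↓": "DESC"}


def to_cql(table: str, columns: list[tuple[str, str, str]]) -> str:
    """Render a Chebotko-style column list as CQL DDL.

    Each column is (name, cql_type, marker), where marker is "K" (partition
    key), "C↑"/"C↓" (clustering column, ascending/descending), "S" (static)
    or "" (regular column). Collections are just types, e.g. "MAP<TEXT,TEXT>".
    """
    partition = [n for n, _, m in columns if m == "K"]
    clustering = [(n, m) for n, _, m in columns if m in ORDER]
    if not partition:
        raise ValueError(f"{table}: at least one K column is required")
    body = [f"  {n} {t}{' STATIC' if m == 'S' else ''}," for n, t, m in columns]
    pk = partition[0] if len(partition) == 1 else f"({', '.join(partition)})"
    body.append(f"  PRIMARY KEY ({', '.join([pk] + [n for n, _ in clustering])})")
    ddl = f"CREATE TABLE {table} (\n" + "\n".join(body) + "\n)"
    if clustering:
        order = ", ".join(f"{n} {ORDER[m]}" for n, m in clustering)
        ddl += f" WITH CLUSTERING ORDER BY ({order})"
    return ddl + ";"


# Assumed types for illustration only.
print(to_cql("measurements_by_sensor", [
    ("id", "UUID", "K"),
    ("parameter", "TEXT", "K"),
    ("datetime", "TIMESTAMP", "C↓"),
    ("value", "FLOAT", ""),
]))
```

A composite partition key is wrapped in its own parentheses, so the example prints PRIMARY KEY ((id, parameter), datetime) followed by WITH CLUSTERING ORDER BY (datetime DESC).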

Physical Data Model

measurements_by_sensor (Q4)
  id K
  week K
  parameter K
  datetime C↓
  value

[Figure: physical Chebotko diagram. The other four tables keep their logical-model keys; every column is annotated with a CQL type (TEXT, INT, UUID, TIMESTAMP, FLOAT, MAP<TEXT,TEXT>).]
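Adding week to the partition key bounds partition size, but Q4 must now supply a week value, so a date range that crosses week boundaries becomes one query per week (for example, SELECT * FROM measurements_by_sensor WHERE id = ? AND week = ? AND parameter = ? AND datetime > ?). The slide does not say how week is encoded; the sketch below assumes it is the Monday that starts the week, and both helper functions are hypothetical.

```python
from datetime import date, datetime, timedelta


def week_bucket(ts: datetime) -> date:
    """Hypothetical week value: the Monday of the week containing ts."""
    d = ts.date()
    return d - timedelta(days=d.weekday())  # weekday(): Monday == 0


def buckets_for_range(start: datetime, end: datetime) -> list[date]:
    """Week partitions a date-range query must read, oldest first."""
    if end < start:
        raise ValueError("end must not precede start")
    buckets, b = [], week_bucket(start)
    last = week_bucket(end)
    while b <= last:
        buckets.append(b)
        b += timedelta(weeks=1)
    return buckets


# A Q4-style range spanning three weeks touches three partitions.
for week in buckets_for_range(datetime(2015, 9, 24), datetime(2015, 10, 6)):
    print(week)  # 2015-09-21, 2015-09-28, 2015-10-05 (one per line)
```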

4 From Relational to Cassandra

Relational Methodology

CDM → (Mapping) → Relational LDM → (Normalization) → Normalized relational LDM
→ (Optimization) → Relational PDM

Queries inform the optimization step.

Relational Design Example

users: username (PK), age, address
followers: username (PK, FK), follower_username (PK, FK)
ownership: username (PK, FK), sensor_id (PK, FK)
sensors: sensor_id (PK), type
measurements: sensor_id (PK, FK), parameter (PK), datetime (PK), value
settings: sensor_id (PK, FK), setting_name (PK), setting_value

Relational-to-Cassandra: Indirect Translation

Relational Data Model → (Reverse Engineer) → Conceptual Data Model
Relational Application → (Reverse Engineer) → Application Workflow
Conceptual Data Model + Application Workflow → Apply the C* Methodology

Reverse Engineering is Almost Straightforward

The relational schema (users, followers, ownership, sensors, measurements, settings) maps onto an ER fragment:
• users → User
• sensors → Sensor
• ownership → User owns Sensor
• measurements → Sensor records Measurement
• settings → Sensor has Setting

Relational-to-Cassandra: Direct Translation

Relational Schema + SQL Queries → (Relational-to-Cassandra Mapping) → Cassandra Schema

Extracting Functional Dependencies

users: username → age, address
ownership: username, sensor_id → username, sensor_id
sensors: sensor_id → type
followers: username, follower_username → username, follower_username
measurements: sensor_id, parameter, datetime → value
settings: sensor_id, setting_name → setting_value
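Each FD above is read directly off a table definition: the primary key determines the remaining columns, and an all-key table such as ownership or followers contributes only a trivial FD. A minimal Python sketch of that extraction step, assuming FDs come only from primary keys (`fd_from_table` and `SCHEMA` are illustrative names):

```python
from typing import NamedTuple


class FD(NamedTuple):
    lhs: frozenset[str]
    rhs: frozenset[str]


def fd_from_table(pk: list[str], columns: list[str]) -> FD:
    """Read the FD a relational table asserts: its primary key determines the rest.

    An all-key table (e.g. ownership) yields only the trivial FD pk -> pk.
    """
    non_key = [c for c in columns if c not in pk]
    return FD(frozenset(pk), frozenset(non_key or pk))


# Primary keys and columns of the relational design example.
SCHEMA = {
    "users": (["username"], ["username", "age", "address"]),
    "followers": (["username", "follower_username"], ["username", "follower_username"]),
    "ownership": (["username", "sensor_id"], ["username", "sensor_id"]),
    "sensors": (["sensor_id"], ["sensor_id", "type"]),
    "measurements": (["sensor_id", "parameter", "datetime"],
                     ["sensor_id", "parameter", "datetime", "value"]),
    "settings": (["sensor_id", "setting_name"],
                 ["sensor_id", "setting_name", "setting_value"]),
}

FDS = {name: fd_from_table(pk, cols) for name, (pk, cols) in SCHEMA.items()}
```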

Entailing New Functional Dependencies

• Armstrong’s Axioms
  – Reflexivity: If Y ⊆ X, then X → Y (trivial functional dependency)
      username, sensor_id → username, sensor_id
  – Augmentation: If X → Y, then XZ → YZ
      username → age, address
      ⇒ username, sensor_id → age, address, sensor_id
  – Transitivity: If X → Y and Y → Z, then X → Z
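The axioms can be applied mechanically to entail FDs that no single table states. The sketch below represents an FD X → Y as a pair of frozensets and chains augmentation, reflexivity, and transitivity to derive username, sensor_id → type from two extracted FDs; the helper names are hypothetical.

```python
FD = tuple[frozenset[str], frozenset[str]]  # (X, Y) stands for X -> Y


def fs(*attrs: str) -> frozenset[str]:
    return frozenset(attrs)


def reflexivity(x: frozenset[str], y: frozenset[str]) -> FD:
    """If Y is a subset of X, then X -> Y (trivial FD)."""
    if not y <= x:
        raise ValueError("reflexivity needs Y to be a subset of X")
    return (x, y)


def augmentation(fd: FD, z: frozenset[str]) -> FD:
    """If X -> Y, then XZ -> YZ."""
    x, y = fd
    return (x | z, y | z)


def transitivity(fd1: FD, fd2: FD) -> FD:
    """If X -> Y and Y -> Z, then X -> Z."""
    (x, y), (y2, z) = fd1, fd2
    if y != y2:
        raise ValueError("right side of fd1 must equal left side of fd2")
    return (x, z)


# Derive username, sensor_id -> type from the extracted FDs.
f1 = (fs("username"), fs("age", "address"))           # given
f2 = augmentation(f1, fs("sensor_id"))                # username, sensor_id -> age, address, sensor_id
f3 = reflexivity(f2[1], fs("sensor_id"))              # age, address, sensor_id -> sensor_id
f4 = transitivity(f2, f3)                             # username, sensor_id -> sensor_id
f5 = transitivity(f4, (fs("sensor_id"), fs("type")))  # username, sensor_id -> type
```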

The Idea

A Cassandra table schema must satisfy the original or entailed relational FDs.

The best way to verify this is by computing an attribute closure.

No kidding! You'd better believe this guy …

Computing an Attribute Closure

FDs: (1) A → BC, (2) B → F, (3) AD → E

Closure of AD:
{AD}        (trivial)
{ADBC}      by (1)
{ADBCF}     by (2)
{ADBCFE}    by (3)
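The step-by-step computation above is the standard fixpoint algorithm for attribute closure: start from the given attributes and keep applying any FD whose left side is already covered until nothing new is added. A short Python version, run on the same three FDs (`attribute_closure` is an illustrative name):

```python
from collections.abc import Iterable


def attribute_closure(attrs: Iterable[str],
                      fds: list[tuple[set[str], set[str]]]) -> set[str]:
    """Return X+, all attributes functionally determined by attrs under fds.

    Repeatedly fire any FD whose left side is already inside the closure
    until nothing new is added (fixpoint).
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


# (1) A -> BC, (2) B -> F, (3) AD -> E, with one-letter attribute names.
FDS = [(set("A"), set("BC")), (set("B"), set("F")), (set("AD"), set("E"))]

print(sorted(attribute_closure("AD", FDS)))  # ['A', 'B', 'C', 'D', 'E', 'F']
```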

Simple Example

FDs:
username → age, address
sensor_id → type
sensor_id, parameter, datetime → value
sensor_id, setting_name → setting_value

SELECT age, address FROM users WHERE username = 'Alice'

Partition key | Clustering key | Other columns | Primary key attribute closure
username      | (none)         | age, address  | username, age, address

The closure of the primary key contains every column of the table, so the table satisfies the FDs.

Advanced Example

SELECT age, type, datetime, value
FROM users NATURAL JOIN ownership NATURAL JOIN sensors NATURAL JOIN measurements
WHERE username = 'Alice' AND parameter = 'heart rate'
ORDER BY datetime DESC

FDs:
username → age, address
sensor_id → type
sensor_id, parameter, datetime → value
sensor_id, setting_name → setting_value

Partition key       | Clustering key           | Other columns        | Primary key attribute closure
username, parameter | datetime ↓               | age (S), type, value | username, age, address, parameter, datetime
username, parameter | datetime ↓, sensor_id ↑  | age (S), type, value | username, age, address, sensor_id, type, parameter, datetime, value

The first candidate's closure misses sensor_id, type, and value, so its primary key does not determine the type and value columns: two sensors reporting the same parameter at the same datetime would map to the same row. Adding sensor_id as a clustering column makes the closure cover every column.
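The check behind both examples can be automated: compute the closure of the table's primary key under the extracted FDs and confirm it contains every column of the table. A sketch assuming that criterion (function names are hypothetical); it treats the primary key as a set, so the partition/clustering split and static columns still need to be checked separately.

```python
from collections.abc import Iterable

# FDs extracted from the relational schema (names as on the slides).
FDS = [
    ({"username"}, {"age", "address"}),
    ({"sensor_id"}, {"type"}),
    ({"sensor_id", "parameter", "datetime"}, {"value"}),
    ({"sensor_id", "setting_name"}, {"setting_value"}),
]


def closure(attrs: Iterable[str], fds: list[tuple[set[str], set[str]]]) -> set[str]:
    """Attribute closure by fixpoint iteration."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def satisfies_fds(primary_key: set[str], columns: set[str]) -> bool:
    """A table is sound if its primary key determines every one of its columns."""
    return columns <= closure(primary_key, FDS)


# Simple example: users, PRIMARY KEY (username).
simple_ok = satisfies_fds({"username"}, {"username", "age", "address"})

# Advanced example, first candidate: PRIMARY KEY ((username, parameter), datetime).
cand1 = {"username", "parameter", "datetime"}
cand1_ok = satisfies_fds(cand1, cand1 | {"age", "type", "value"})

# Second candidate adds sensor_id as a clustering column.
cand2 = cand1 | {"sensor_id"}
cand2_ok = satisfies_fds(cand2, cand2 | {"age", "type", "value"})

print(simple_ok, cand1_ok, cand2_ok)  # True False True
```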


5 Conclusions

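• Cassandra data models from scratch
  – The methodology: academy.datastax.com
  – Automation: kdm.dataview.org
• Cassandra data models from a relational database
  – Two approaches to consider: indirect translation (reverse engineering, then the C* methodology) and direct translation (relational-to-Cassandra mapping via functional dependencies)
  – Ripe for automation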


Thank you