C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cassandra Summit 2016


C* Keys: Partitioning, Clustering, & CrossFit

Adam Hutson - Data Architect, DataScale Inc.

© DataStax, All Rights Reserved.

Who am I & What do we do?

• Adam Hutson

• Data Architect @ DataScale -> www.datascale.io

• DataStax MVP for Apache Cassandra

• DataScale provides hosted data platforms as a service

• Offering Cassandra & Spark, with more to come

• Currently hosted in Amazon & Azure

Overview

1 Why

2 Partition

3 Partition Key

4 Composite Partition Key

5 Clustering Columns

Why give this presentation?

Partitioning & Clustering should be the foundation.

Too often glossed over.

Has the biggest impact on the performance of the cluster.

Partition

Partition Explained

• Token values can range from -2^63 to 2^63-1.

• Nodes in the cluster/ring are assigned a single token.

• A node is responsible for its own token value and the range extending back to, but not including, the previous node's token.

• A Partitioner decides where a partition key maps onto the cluster/ring.

Node #3 is responsible for tokens from -1844674407370955162 down to, but not including, -5534023222112865485.
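
One way to see these tokens in practice is the CQL token() function. The sketch below assumes a keyspace is already selected and that the cluster uses the default Murmur3Partitioner; the gym_member table and its values are hypothetical and exist only for illustration. The two boundary values are the ones quoted for Node #3 above. They would match evenly spaced tokens on a 5-node ring, -2^63 + i * floor(2^64 / 5), but the node count is an inference and is not stated in the slide text.

```cql
-- Hypothetical table, used only to illustrate token placement.
CREATE TABLE IF NOT EXISTS gym_member (
    member   text PRIMARY KEY,
    home_gym text
);

INSERT INTO gym_member (member, home_gym) VALUES ('adam', 'downtown');

-- token() returns the partitioner's hash of the partition key.
-- With Murmur3Partitioner the value falls between -2^63 and 2^63-1.
SELECT member, token(member) FROM gym_member;

-- Rows whose token falls in Node #3's range from the slide:
-- greater than the previous node's token, up to and including Node #3's own token.
SELECT member, token(member)
  FROM gym_member
 WHERE token(member) >  -5534023222112865485
   AND token(member) <= -1844674407370955162;
```

Whether 'adam' shows up in the second query depends on where its hash happens to land, so no particular output is implied.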

Partition Key

Partition Key Explained

The Partition Key is:

• responsible for distribution of data amongst the nodes

• the first column defined in the PRIMARY KEY
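
As a minimal sketch of a single-column partition key, assuming a keyspace is already selected: the member_profile table, its columns, and the sample values are hypothetical and are not taken from the slides.

```cql
-- Hypothetical table: member is the partition key because it is
-- the first (and here the only) column in the PRIMARY KEY.
CREATE TABLE IF NOT EXISTS member_profile (
    member    text,
    box_name  text,
    joined_on date,
    PRIMARY KEY (member)
);

INSERT INTO member_profile (member, box_name, joined_on)
VALUES ('adam', 'downtown', '2015-01-05');

-- The partition key is fully specified, so the coordinator can hash it
-- and route the read to the replicas that own that token.
SELECT member, box_name, joined_on
  FROM member_profile
 WHERE member = 'adam';

-- By contrast, restricting only on box_name (a non-key column) is rejected
-- by default unless ALLOW FILTERING or a secondary index is used.
```

Choosing a partition key with many distinct values, such as a member name, tends to spread partitions across the ring; a key with few distinct values concentrates data on a few nodes.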

Composite Partition Key

Composite Partition Key Explained

A composite partition key uses multiple columns to compute the token hash value.
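
A composite partition key is declared by wrapping the partition columns in an extra pair of parentheses inside the PRIMARY KEY. The sketch below assumes a keyspace is already selected; the workout_by_gym_day table, its columns, and the sample values are hypothetical.

```cql
-- Hypothetical table: (gym, workout_day) together form the partition key;
-- member is a clustering column inside each partition.
CREATE TABLE IF NOT EXISTS workout_by_gym_day (
    gym         text,
    workout_day date,
    member      text,
    score       int,
    PRIMARY KEY ((gym, workout_day), member)
);

INSERT INTO workout_by_gym_day (gym, workout_day, member, score)
VALUES ('downtown', '2016-09-07', 'adam', 185);

-- Both partition key columns must be supplied so the token can be computed.
SELECT member, score
  FROM workout_by_gym_day
 WHERE gym = 'downtown' AND workout_day = '2016-09-07';

-- token() takes all partition key columns, in declaration order.
SELECT token(gym, workout_day), member
  FROM workout_by_gym_day
 WHERE gym = 'downtown' AND workout_day = '2016-09-07';

-- Restricting only on gym is rejected by default, because a partial
-- partition key cannot be hashed to a single token.
```

Splitting by gym and day keeps any one partition from growing without bound, at the cost of needing both values on every read.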

Clustering Columns

Clustering Columns Explained

Clustering Columns are:

• responsible for sorting within the partition

• any column added to the PRIMARY KEY after the partition key (past the first column, or past the parenthesized group when the partition key is composite)
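
The sorting behaviour can be sketched as follows, assuming a keyspace is already selected. The wod_scores table, its columns, and the sample rows are hypothetical and serve only to show how rows are ordered inside one partition.

```cql
-- Hypothetical table: one partition per workout (wod_name);
-- rows inside it are sorted by score, then by member.
CREATE TABLE IF NOT EXISTS wod_scores (
    wod_name text,
    score    int,
    member   text,
    PRIMARY KEY (wod_name, score, member)
);

INSERT INTO wod_scores (wod_name, score, member) VALUES ('Fran', 240, 'adam');
INSERT INTO wod_scores (wod_name, score, member) VALUES ('Fran', 195, 'beth');
INSERT INTO wod_scores (wod_name, score, member) VALUES ('Fran', 310, 'carl');

-- Default clustering order is ascending: rows come back ordered by score.
SELECT score, member FROM wod_scores WHERE wod_name = 'Fran';

-- The order can be reversed at read time for a single partition.
SELECT score, member FROM wod_scores
 WHERE wod_name = 'Fran'
 ORDER BY score DESC;

-- Range slice on the first clustering column within the partition.
SELECT score, member FROM wod_scores
 WHERE wod_name = 'Fran' AND score >= 200 AND score < 300;
```

Because the rows are stored in clustering order, the range slice reads a contiguous run of rows rather than scanning the whole partition.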

Clustering Columns Explained

Clustering columns can be used for hierarchically structured data.
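
A hierarchy can be sketched by listing its levels as successive clustering columns. The example assumes a keyspace is already selected; the gym_locations table, its columns, and the sample values are hypothetical.

```cql
-- Hypothetical table: country is the partition key; state, city, and gym
-- are clustering columns that nest one level inside the next.
CREATE TABLE IF NOT EXISTS gym_locations (
    country text,
    state   text,
    city    text,
    gym     text,
    PRIMARY KEY (country, state, city, gym)
);

INSERT INTO gym_locations (country, state, city, gym)
VALUES ('US', 'TX', 'Austin', 'downtown');

-- Walk down the hierarchy by restricting clustering columns left to right.
SELECT state, city, gym FROM gym_locations
 WHERE country = 'US';

SELECT city, gym FROM gym_locations
 WHERE country = 'US' AND state = 'TX';

SELECT gym FROM gym_locations
 WHERE country = 'US' AND state = 'TX' AND city = 'Austin';

-- Skipping a level (for example city without state) is rejected by default,
-- because rows are sorted by state first.
```

Each query reads a contiguous slice of one partition, which is why the levels have to be restricted in declaration order.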

Clustering Columns Explained

Clustering columns can be used for time-series data.

    CREATE TABLE member_log (
        member text,
        workout_date timestamp,
        workout_duration text,
        -- member is the partition key; workout_date is the clustering column
        PRIMARY KEY (member, workout_date)
    ) WITH CLUSTERING ORDER BY (workout_date DESC);
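
Typical reads against this table could look like the sketch below. It assumes the member_log table above already exists in the current keyspace; the member name, timestamps, and durations are made-up sample values.

```cql
-- Assumes the member_log table defined above exists in the current keyspace.
INSERT INTO member_log (member, workout_date, workout_duration)
VALUES ('adam', '2016-09-06 06:00:00+0000', '45 min');

INSERT INTO member_log (member, workout_date, workout_duration)
VALUES ('adam', '2016-09-07 06:00:00+0000', '50 min');

-- Rows are stored newest first (CLUSTERING ORDER BY workout_date DESC),
-- so LIMIT 1 returns the member's most recent workout.
SELECT workout_date, workout_duration
  FROM member_log
 WHERE member = 'adam'
 LIMIT 1;

-- Time-window slice within the member's partition.
SELECT workout_date, workout_duration
  FROM member_log
 WHERE member = 'adam'
   AND workout_date >= '2016-09-01'
   AND workout_date <  '2016-10-01';
```

Storing the rows in descending order means that "latest N workouts" queries read from the front of the partition without any re-sorting at query time.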

Thank You! Questions?

Adam Hutson

@AdamHutson

@DataScaleInc