@doanduyhai

Cassandra nice use-cases and worst anti-patterns DuyHai DOAN, Technical Advocate

Agenda

Anti-patterns
•  Queue-like designs
•  CQL null values
•  Intensive update on same column
•  Design around dynamic schema

Nice use-cases
•  Rate limiting
•  Anti-fraud
•  Account validation
•  Sensor data timeseries

Worst anti-patterns

Queue-like designs, CQL null, intensive update on same column, design around dynamic schema

Failure level

☠☠

☠☠☠

☠☠☠☠

Queue-like designs

•  Adding a new message ☞ 1 physical insert
•  Consuming a message = deleting it ☞ 1 physical insert (tombstone)
•  Transactional queue = re-inserting messages ☞ physical insert * <many>

FIFO queue: physical writes vs. logical content

Physical writes on disk    Logical queue
A                          { A }
A B                        { A, B }
A B C                      { A, B, C }
A B C A                    { B, C }
A B C A D                  { B, C, D }
A B C A D B                { C, D }
A B C A D B C              { D }

(a repeated letter is the tombstone written when that message is consumed)

FIFO queue, worst case

A A A A A A A A A A        { }

The queue is logically empty, yet every physical entry is still on disk and must be scanned on read.
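The FIFO walkthrough above can be replayed with a small in-memory model. This is a minimal sketch, not Cassandra's storage engine: the class `QueuePartition` and its method names are hypothetical, and message ids are assumed to be unique. It only illustrates that consuming a message adds a physical entry instead of removing one.

```python
from typing import List, Optional, Tuple


class QueuePartition:
    """Toy model of a queue kept in one partition (hypothetical, illustration only)."""

    def __init__(self) -> None:
        # Physical entries in write order: (message id, "insert" or "tombstone").
        self.writes: List[Tuple[str, str]] = []

    def enqueue(self, message: str) -> None:
        # Adding a message is one physical insert.
        self.writes.append((message, "insert"))

    def live_messages(self) -> List[str]:
        # A read has to go through every physical entry to find the live ones.
        deleted = {msg for msg, kind in self.writes if kind == "tombstone"}
        return [msg for msg, kind in self.writes
                if kind == "insert" and msg not in deleted]

    def consume(self) -> Optional[str]:
        # Consuming the head is a delete, which is itself one more physical write.
        live = self.live_messages()
        if not live:
            return None
        head = live[0]
        self.writes.append((head, "tombstone"))
        return head

    def physical_size(self) -> int:
        return len(self.writes)
```

Replaying enqueue A, B, C, consume, enqueue D, consume, consume leaves a single live message while the number of physical entries keeps growing until compaction can purge the tombstones.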

Failure level: ☠☠☠

CQL null semantics

Reading a null value means either
•  the value does not exist (has never been created), or
•  the value was deleted (tombstone)

SELECT age FROM users WHERE login = 'ddoan'; → NULL

Writing null means
•  deleting the value (creating a tombstone)
•  even if the value never existed

UPDATE users SET age = NULL WHERE login = 'ddoan';

Seen in production: prepared statement

UPDATE users SET age = ?, … geo_location = ?, mood = ?, … WHERE login = ?;

Seen in production: bound statement

preparedStatement.bind(33, …, null, null, null, …);

null ☞ tombstone creation on each update …

login   age   name       geo_loc   mood   status
jdoe    33    John DOE   ☒         ☒      ☒
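One way to avoid binding null is to build the statement from the values that are actually present. The sketch below is an assumption about how an application could do this, not something shown in the talk: `build_update` is a hypothetical helper, and the table and column names it interpolates must come from trusted application code, never from user input.

```python
from typing import Any, Dict, List, Tuple


def build_update(table: str, key_column: str, key_value: Any,
                 values: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build an UPDATE that only sets the columns whose value is not None."""
    # Skipping None values means no tombstone is written for absent fields.
    present = {col: val for col, val in values.items() if val is not None}
    if not present:
        raise ValueError("nothing to update: every value is None")
    assignments = ", ".join(f"{col} = ?" for col in present)
    cql = f"UPDATE {table} SET {assignments} WHERE {key_column} = ?;"
    # Bound values follow the order of the SET clause, then the key.
    params = list(present.values()) + [key_value]
    return cql, params
```

The trade-off is that each distinct combination of columns yields a different statement to prepare, so this fits best when only a handful of column combinations occur in practice.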

Failure level

Intensive update on same column

Context
•  small start-up
•  cloud-based video recording & alarm
•  internet of things (sensors)
•  10 updates/sec for some sensors

Data model

CREATE TABLE sensor_data (
    sensor_id bigint,
    value double,
    PRIMARY KEY(sensor_id));

sensor_id   value   45.0034

Updates

UPDATE sensor_data SET value = 45.0034 WHERE sensor_id = …;
UPDATE sensor_data SET value = 47.4182 WHERE sensor_id = …;
UPDATE sensor_data SET value = 48.0300 WHERE sensor_id = …;

sensor_id   value (t1)    45.0034
sensor_id   value (t13)   47.4182
sensor_id   value (t36)   48.0300

Read

SELECT value FROM sensor_data WHERE sensor_id = …;

read N physical columns, only 1 useful …

sensor_id   value (t1)    45.0034
sensor_id   value (t13)   47.4182
sensor_id   value (t36)   48.0300

Solution 1: leveled compaction (if your I/O can keep up)

Before compaction:
sensor_id   value (t1)    45.0034
sensor_id   value (t13)   47.4182
sensor_id   value (t36)   48.0300

After compaction:
sensor_id   value (t36)   48.0300
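The read and compaction behaviour above comes down to "last write wins". Here is a minimal sketch under simplifying assumptions: each SSTable is modelled as a plain list of (timestamp, value) cells for one sensor, and `reconcile` and `compact` are hypothetical names, not Cassandra functions.

```python
from typing import Iterable, List, Tuple

Cell = Tuple[int, float]  # (write timestamp, value)


def reconcile(sstables: Iterable[List[Cell]]) -> Cell:
    """Return the newest cell across all SSTables (last write wins)."""
    # Every physical version is visited, although only one is useful.
    candidates = [cell for sstable in sstables for cell in sstable]
    if not candidates:
        raise ValueError("no cell found")
    return max(candidates, key=lambda cell: cell[0])


def compact(sstables: Iterable[List[Cell]]) -> List[List[Cell]]:
    """Merge all SSTables into a single one that keeps only the newest cell."""
    return [[reconcile(list(sstables))]]
```

With the three versions from the slides, a read before compaction visits three cells and a read after compaction visits one; the returned value is the same in both cases.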

Solution 2: reversed timeseries & DateTiered compaction strategy

CREATE TABLE sensor_data (
    sensor_id bigint,
    date timestamp,
    sensor_value double,
    PRIMARY KEY((sensor_id), date))
WITH CLUSTERING ORDER BY (date DESC);

Data cleaning by configuration (max_sstable_age_days)

SELECT sensor_value FROM sensor_data WHERE sensor_id = … LIMIT 1;

sensor_id   date3 (t3)   date2 (t2)   date1 (t1)   …
            48.0300      47.4182      45.0034      …

Failure level: ☠☠

Design around dynamic schema

Customer emergency call
•  3-node cluster almost full
•  impossible to scale out
•  4th node in JOINING state for 1 week
•  disk space is filling up, production at risk!

After investigation
•  4th node in JOINING state because streaming is stalled
•  NPE in logs
•  Cassandra source code to the rescue

public class CompressedStreamReader extends StreamReader {
    …
    @Override
    public SSTableWriter read(ReadableByteChannel channel) throws IOException {
        …
        Pair<String, String> kscf = Schema.instance.getCF(cfId);
        ColumnFamilyStore cfs = Keyspace.open(kscf.left).getColumnFamilyStore(kscf.right);

NPE here

The truth is
•  the devs dynamically drop & recreate tables every day
•  dynamic schema is at the core of their design

Example:

DROP TABLE catalog_127_20140613;
CREATE TABLE catalog_127_20140614( … );

Failure sequence (diagram: nodes n1, n2, n3 and joining node n4)

1.  catalog_x_y is present on the nodes
2.  catalog_x_z is created alongside catalog_x_y
3.  only catalog_x_z remains; catalog_x_y ????

Consequences
•  the joining node always got stuck
•  → cannot extend the cluster
•  → changing code takes time
•  → production in danger (no space left)
•  → sacrifice analytics data to survive

In a nutshell
•  dynamic schema changes as a normal operation are not recommended
•  concurrent schema AND topology changes are an anti-pattern

Failure level: ☠☠☠☠

Q & A

Nice examples

Rate limiting, anti-fraud, account validation, sensor data timeseries

Rate limiting

Start-up company, reset password feature

1) /password/reset (phone number)
2) SMS with token A0F83E63DB935465CE73DFE… (random token)
3) /password/new/<token>/<password>

Problem 1
•  account created with premium phone number
•  /password/reset x 100

« money, money, money, give money, in the richman’s world » $$$

Problem 2
•  massive hack
•  10^6 /password/reset calls from a few accounts
•  SMS messages are cheap
•  ☞ but not at the scale of 10^6 per user per day

Solution
•  premium phone number ☞ Google libphonenumber
•  massive hack ☞ rate limiting with Cassandra

Cassandra Time To Live

Time to live
•  built-in feature
•  insert data with a TTL in seconds
•  expires server-side automatically
•  ☞ use as a sliding window

Rate limiting in action

Implementation
•  threshold = max 3 password resets per sliding 24h
•  when /password/reset is called, check the threshold
•  reached ☞ error message/ignore
•  not reached ☞ log the attempt with TTL = 86400
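The check-then-log logic above can be sketched without a cluster by emulating TTL expiry in memory. Everything here is an assumption for illustration: `TtlRateLimiter` is a hypothetical class, the dictionary stands in for a table of reset attempts written with `USING TTL 86400`, and the injected `clock` only exists to make the sliding window easy to exercise.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, List


class TtlRateLimiter:
    """In-memory stand-in for 'log each attempt with a TTL, count live rows'."""

    def __init__(self, max_attempts: int = 3, ttl_seconds: int = 86400,
                 clock: Callable[[], float] = time.time) -> None:
        self.max_attempts = max_attempts
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # login -> expiry times of the attempts that were logged
        self._expiries: DefaultDict[str, List[float]] = defaultdict(list)

    def _live_attempts(self, login: str) -> List[float]:
        now = self.clock()
        # Emulates server-side expiry: rows past their TTL are no longer visible.
        live = [expiry for expiry in self._expiries[login] if expiry > now]
        self._expiries[login] = live
        return live

    def allow(self, login: str) -> bool:
        """Return True and log the attempt if the threshold is not reached."""
        live = self._live_attempts(login)
        if len(live) >= self.max_attempts:
            return False
        live.append(self.clock() + self.ttl_seconds)
        return True
```

Because each attempt expires on its own schedule, the window slides per attempt rather than resetting at a fixed time of day. With a real cluster the check and the insert are two separate requests, so concurrent calls could slightly exceed the threshold; the sketch ignores that.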

Rate limiting demo

Anti Fraud

Real story
•  many special offers available
•  30 mins international calls (50 countries)
•  unlimited land-line calls to 5 countries
•  …
•  each offer has a duration (week/month/year)
•  only one offer active at a time

Cassandra TTL
•  check for an existing offer first

SELECT count(*) FROM user_special_offer WHERE login = 'jdoe';

•  then grant the new offer

INSERT INTO user_special_offer(login, offer_code, …)
VALUES('jdoe', '30_mins_international', …)
USING TTL <offer_duration>;

Account Validation

Requirement
•  user creates a new account
•  an SMS/email link with a token is sent to validate the account
•  10 days to validate

How to?
•  create the account with a 10-day TTL

INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33) USING TTL 864000;

•  create a random token for validation with a 10-day TTL

INSERT INTO account_validation(token, login, name, age)
VALUES('A0F83E63DB935465CE73DFE…', 'jdoe', 'John DOE', 33)
USING TTL 864000;

On token validation
•  check that the token exists & retrieve the user details

SELECT login, name, age FROM account_validation WHERE token = 'A0F83E63DB935465CE73DFE…';

•  re-insert the user details durably, without TTL

INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);

Sensor Data Timeseries

Requirements
•  lots of sensors (10^3 – 10^6)
•  medium to high insertion rate (0.1 – 10/sec)
•  keep good load balancing
•  fast read & write

Bucketing

CREATE TABLE sensor_data (
    sensor_id text,
    date timestamp,
    raw_data blob,
    PRIMARY KEY(sensor_id, date));

sensor_id   date1   date2   date3   date4   …
            blob1   blob2   blob3   blob4   …

Problems:
•  limit of 2 × 10^9 physical columns
•  bad load balancing (1 sensor = 1 node)
•  wide row spans over many files

Idea:
•  composite partition key: sensor_id:date_bucket
•  tunable date granularity: per hour/per day/per month …

CREATE TABLE sensor_data (
    sensor_id text,
    date_bucket int, // format YYYYMMdd (per day) or YYYYMMddHH (per hour)
    date timestamp,
    raw_data blob,
    PRIMARY KEY((sensor_id, date_bucket), date));

Buckets

sensor_id:2014091014   date1    date2    date3    date4    …
                       blob1    blob2    blob3    blob4    …

sensor_id:2014091015   date11   date12   date13   date14   …
                       blob11   blob12   blob13   blob14   …

Advantage:
•  distribute load: 1 bucket = 1 node
•  limit partition width (max x columns per bucket)

But how can I select raw data between 14:45 and 15:10?

14:45 → ?        (bucket sensor_id:2014091014)
15:00 → 15:10    (bucket sensor_id:2014091015)

SELECT * FROM sensor_data
WHERE sensor_id = xxx
AND date_bucket IN (2014091014, 2014091015)
AND date >= '2014-09-10 14:45:00.000'
AND date <= '2014-09-10 15:10:00.000';

Solution
•  use an IN clause on the partition key component
•  with a range condition on the date column
☞ the date column should be a monotonic function (increasing/decreasing)
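The list of buckets for the IN clause has to be computed by the client from the requested time range. A minimal sketch, assuming hourly buckets in the YYYYMMddHH format used in the example and naive datetimes that share the time zone used when writing; `hourly_buckets` is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import List


def hourly_buckets(start: datetime, end: datetime) -> List[int]:
    """List every YYYYMMddHH bucket touched by the inclusive range [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    # Align on the beginning of the first hour, then step one hour at a time.
    current = start.replace(minute=0, second=0, microsecond=0)
    buckets = []
    while current <= end:
        buckets.append(int(current.strftime("%Y%m%d%H")))
        current += timedelta(hours=1)
    return buckets
```

For the range 14:45 to 15:10 on 2014-09-10 this yields the two buckets used in the query. A wide range produces many buckets, so the caller should cap the range or query the buckets one by one.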

Bucketing caveats

The IN clause on partition keys is not a silver bullet!
•  use sparingly
•  keep cardinality low (≤ 5)
•  prefer parallel async queries
•  ease of query vs. performance

(Diagram: in an 8-node ring n1 to n8, a single coordinator serves the IN query over sensor_id:2014091014 and sensor_id:2014091015; with an async client, each bucket is queried separately.)
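The "parallel async queries" alternative to IN can be sketched as one single-partition query per bucket, issued concurrently and merged on the client. This is an illustration only: `query_buckets` is a hypothetical helper, and `fetch_bucket` is a caller-supplied function standing in for whatever asynchronous execute call the actual driver offers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, TypeVar

Row = TypeVar("Row")


def query_buckets(fetch_bucket: Callable[[int], Sequence[Row]],
                  buckets: Iterable[int], max_workers: int = 4) -> List[Row]:
    """Run one single-partition query per bucket in parallel and merge the rows."""
    bucket_list = list(buckets)
    if not bucket_list:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so rows stay grouped bucket by bucket.
        per_bucket = list(pool.map(fetch_bucket, bucket_list))
    return [row for rows in per_bucket for row in rows]
```

Each query then targets exactly one partition, so no single coordinator has to gather every bucket, at the price of merging results in the client.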

Q & A

Cassandra developers

Rule n°1: If you don’t know, ask for help (me, Cassandra ML, PlanetCassandra, stackoverflow, …)

Rule n°2: Do not blind-guess troubleshooting alone in production (ask for help, see rule n°1)

Rule n°3: Share with the community (your best use-cases … and worst failures)

http://planetcassandra.org/

Thank You @doanduyhai