DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - NoSQl matters...

Preview:

Citation preview

@doanduyhai

Cassandra nice use-cases and worst anti-patterns DuyHai DOAN, Technical Advocate

@doanduyhai

Shameless self-promotion!

2

Duy Hai DOAN Cassandra technical advocate •  talks, meetups, confs •  open-source devs (Achilles, …) •  technical point of contact

☞ duy_hai.doan@datastax.com •  production troubleshooting

@doanduyhai

Agenda!

3

Anti-patterns •  Queue-like designs •  CQL null values •  Intensive updates on same column •  Design around dynamic schema

@doanduyhai

Agenda!

4

Nice use-cases •  Rate-limiting •  Anti Fraud •  Account validation •  Sensor data timeseries

Data Model Crash Course!

@doanduyhai

Last Write Win (LWW)!

6

jdoe age name

33 John DOE

INSERT INTO users(login, name, age) VALUES(‘jdoe’, ‘John DOE’, 33);

#partition

@doanduyhai

Last Write Win (LWW)!

jdoe age (t1) name (t1)

33 John DOE

7

INSERT INTO users(login, name, age) VALUES(‘jdoe’, ‘John DOE’, 33);

auto-generated timestamp (μs)

.

@doanduyhai

Last Write Win (LWW)!

8

UPDATE users SET age = 34 WHERE login = jdoe;

jdoe age (t1) name (t1)

33 John DOE jdoe

age (t2)

34

SSTable1 SSTable2

@doanduyhai

Last Write Win (LWW)!

9

DELETE age FROM users WHERE login = jdoe;

jdoe age (t3)

ý

tombstone

jdoe age (t1) name (t1)

33 John DOE jdoe

age (t2)

34

SSTable1 SSTable2 SSTable3

@doanduyhai

Last Write Win (LWW)!

10

SELECT age FROM users WHERE login = jdoe;

? ? ?

SSTable1 SSTable2 SSTable3

jdoe age (t3)

ý jdoe

age (t1) name (t1)

33 John DOE jdoe

age (t2)

34

@doanduyhai

Last Write Win (LWW)!

11

SELECT age FROM users WHERE login = jdoe;

✓ ✕ ✕

SSTable1 SSTable2 SSTable3

jdoe age (t3)

ý jdoe

age (t1) name (t1)

33 John DOE jdoe

age (t2)

34

@doanduyhai

Compaction!

12

SSTable1 SSTable2 SSTable3

jdoe age (t3)

ý jdoe

age (t1) name (t1)

33 John DOE jdoe

age (t2)

34

New SSTable

jdoe age (t3) name (t1)

ý John DOE

@doanduyhai

Simple Table!

13

CREATE TABLE users ( login text, name text, age int, … PRIMARY KEY(login));

partition key (#partition)

@doanduyhai

Clustered table (1 – N)!

14

CREATE TABLE sensor_data ( sensor_id text, date timestamp, raw_data blob, PRIMARY KEY((sensor_id), date));

partition key clustering column (sorted)

unicity

@doanduyhai

Sorted on disk!

sensor_id date1(t1) date2(t2) date3(t3) date4(t4) date5(t5) …

… … … … …

SSTable1

Worst anti-patterns!

Queue-like designs!CQL null!

Intensive updates on same column!Design around dynamic schema!

!

@doanduyhai

Failure level!

17

☠☠

☠☠☠

☠☠☠☠

@doanduyhai

Queue-like designs!

18

Adding new message ☞ 1 physical insert

@doanduyhai

Queue-like designs!

19

Adding new message ☞ 1 physical insert Consuming message = deleting it ☞ 1 physical insert (tombstone)

@doanduyhai

Queue-like designs!

20

Adding new message ☞ 1 physical insert Consuming message = deleting it ☞ 1 physical insert (tombstone) Transactional queue = re-inserting messages ☞ physical insert * <many>

@doanduyhai

Queue-like designs!

21

A

FIFO queue

{ A }

@doanduyhai

Queue-like designs!

22

A B

FIFO queue

{ A, B }

@doanduyhai

Queue-like designs!

23

A B C

FIFO queue

{ A, B, C }

@doanduyhai

Queue-like designs!

24

A B C A

FIFO queue

{ B, C }

@doanduyhai

Queue-like designs!

25

A B C A D

FIFO queue

{ B, C, D }

@doanduyhai

Queue-like designs!

26

A B C A D B

FIFO queue

{ C, D }

@doanduyhai

Queue-like designs!

27

A B C A D B C

FIFO queue

{ D }

@doanduyhai

Queue-like designs!

28

A A A A A A A A A A

FIFO queue, worst case

{ }

@doanduyhai

Failure level!

29

☠☠☠

@doanduyhai

Queue-like designs!

30

Solution: event-sourcing •  write ahead, never delete •  read = move a cursor forward (or backward in time for history)

A B C D A E

Write cursor

Read cursor. Next read will give {A, E}

@doanduyhai

CQL null semantics!

31

Reading null value means •  value does not exist (has never bean created) •  value deleted (tombstone)

SELECT age FROM users WHERE login = jdoe; à NULL

@doanduyhai

CQL null semantics!

32

Writing null means •  delete value (creating tombstone) •  even though it does not exist

UPDATE users SET age = NULL WHERE login = jdoe;

@doanduyhai

CQL null semantics!

33

Seen in production: prepared statement

UPDATE users SET age = ?, … geo_location = ?, mood = ?, … WHERE login = ?;

@doanduyhai

CQL null semantics!

34

Seen in production: bound statement

preparedStatement.bind(33, …, null, null, null, …);

null ☞ tombstone creation on each update …

jdoe age name geo_loc mood status

33 John DOE ý ý ý

@doanduyhai

Failure level!

35

@doanduyhai

Intensive update!

36

Context •  small start-up •  cloud-based video recording & alarm •  internet of things (sensor) •  10 updates/sec for some sensors

@doanduyhai

Intensive update on same column!

37

Data model

sensor_id value

45.0034

CREATE TABLE sensor_data ( sensor_id long, value double, PRIMARY KEY(sensor_id));

@doanduyhai

Intensive update on same column!

38

Updates

sensor_id value (t1)

45.0034

UPDATE sensor_data SET value = 45.0034 WHERE sensor_id = …; UPDATE sensor_data SET value = 47.4182 WHERE sensor_id = …; UPDATE sensor_data SET value = 48.0300 WHERE sensor_id = …;

sensor_id value (t13)

47.4182 sensor_id

value (t36)

48.0300

@doanduyhai

Intensive update on same column!

39

Read

SELECT sensor_value from sensor_data WHERE sensor_id = …;

read N physical columns, only 1 useful … (until compaction)

sensor_id value (t1)

45.0034 sensor_id

value (t13)

47.4182 sensor_id

value (t36)

48.0300

@doanduyhai

Failure level!

40

☠☠

@doanduyhai

Intensive update on same column!

41

Solution 1: leveled compaction! (if your I/O can keep up)

sensor_id value (t1)

45.0034 sensor_id

value (t13)

47.4182 sensor_id

value (t36)

48.0300

sensor_id value (t36)

48.0300

@doanduyhai

Intensive update on same column!

42

Solution 2: reversed timeseries & DateTiered compaction strategy

CREATE TABLE sensor_data ( sensor_id long, date timestamp, value double, PRIMARY KEY((sensor_id), date)) WITH CLUSTERING ORDER (date DESC);

@doanduyhai

Intensive update on same column!

43

Data cleaning by configuration the strategy (base_time_seconds)

SELECT sensor_value FROM sensor_data WHERE sensor_id = … LIMIT 1;

sensor_id date3(t3) date2(t2) date1(t1) ...

48.0300 47.4182 45.0034 …

@doanduyhai

Design around dynamic schema!

44

Customer emergency call •  3 nodes cluster almost full •  impossible to scale out •  4th node in JOINING state for 1 week •  disk space is filling up, production at risk!

@doanduyhai

Design around dynamic schema!

45

After investigation •  4th node in JOINING state because streaming is stalled •  NPE in logs

@doanduyhai

Design around dynamic schema!

46

After investigation •  4th node in JOINING state because streaming is stalled •  NPE in logs Cassandra source-code to the rescue

@doanduyhai

Design around dynamic schema!

47

public class CompressedStreamReader extends StreamReader { … @Override public SSTableWriter read(ReadableByteChannel channel) throws IOException { … Pair<String, String> kscf = Schema.instance.getCF(cfId); ColumnFamilyStore cfs = Keyspace.open(kscf.left).getColumnFamilyStore(kscf.right);

NPE here

@doanduyhai

Design around dynamic schema!

48

The truth is •  the devs dynamically drop & recreate table every day •  dynamic schema is in the core of their design Example:

DROP TABLE catalog_127_20140613; CREATE TABLE catalog_127_20140614( … );

@doanduyhai

Design around dynamic schema!

49

Failure sequence

n1

n2

n4

n3

catalog_x_y

catalog_x_y

catalog_x_y

catalog_x_y

1 4

2

3

5

6

@doanduyhai

Design around dynamic schema!

50

Failure sequence

n1

n2

n4

n3

catalog_x_y

catalog_x_y

catalog_x_y

catalog_x_y

1 4

2

3

5

6

catalog_x_z

catalog_x_z

catalog_x_z

catalog_x_z

@doanduyhai

Design around dynamic schema!

51

Failure sequence

n1

n2

n4

n3

1 4

2

3

5

6

catalog_x_z

catalog_x_z

catalog_x_z

catalog_x_z

catalog_x_y ????

@doanduyhai

Design around dynamic schema!

52

Nutshell •  dynamic schema change as normal prod operation is not

recommended •  schema AND topology change at the same time is an anti-pattern

@doanduyhai

Failure level!

53

☠☠☠☠

Q & R

! " !

Nice Examples!

Rate limiting!Anti Fraud!

Account Validation!

@doanduyhai

Rate limiting!

56

Start-up company, reset password feature

1) /password/reset

2) SMS with token A0F83E63DB935465CE73DFE….

Phone number Random token

3) /password/new/<token>/<password>

@doanduyhai

Rate limiting!

57

Problem 1 •  account created with premium phone number

@doanduyhai

Rate limiting!

58

Problem 1 •  account created with premium phone number •  /password/reset x 100

@doanduyhai

Rate limiting!

59

« money, money, money, give money, in the richman’s world » $$$

@doanduyhai

Rate limiting!

60

Problem 2 •  massive hack

@doanduyhai

Rate limiting!

61

Problem 2 •  massive hack •  106 /password/reset calls from few accounts

@doanduyhai

Rate limiting!

62

Problem 2 •  massive hack •  106 /password/reset calls from few accounts •  SMS messages are cheap

@doanduyhai

Rate limiting!

63

Problem 2 •  ☞ but not at the 106/per user/per day scale

@doanduyhai

Rate limiting!

64

Solution •  premium phone number ☞ Google libphonenumber

@doanduyhai

Rate limiting!

65

Solution •  premium phone number ☞ Google libphonenumber •  massive hack ☞ rate limiting with Cassandra

@doanduyhai

Cassandra Time To Live!

66

Time to live •  built-in feature •  insert data with a TTL in sec •  expires server-side automatically •  ☞ use as sliding-window

@doanduyhai

Rate limiting in action!

67

Implementation •  threshold = max 3 reset password per sliding 24h per

user

@doanduyhai

Rate limiting in action!

68

Implementation •  when /password/reset called •  check threshold •  reached ☞ error message/ignore

•  not reached ☞ log the attempt with TTL = 86400

Rate Limiting Demo

@doanduyhai

Anti Fraud!

70

Real story •  many special offers available •  30 mins international calls (50 countries)

•  unlimited land-line calls to 5 countries •  …

@doanduyhai

Anti Fraud!

71

Real story •  each offer has a duration (week/month/year) •  only one offer active at a time

@doanduyhai

Anti Fraud!

72

Cassandra TTL •  when granting new offer

INSERT INTO user_special_offer(login, offer_code, …) VALUES(‘jdoe’, ’30_mins_international’,…) IF NOT EXISTS USING TTL <offer_duration>;

Anti Fraud Demo

@doanduyhai

Account Validation!

74

Requirement •  user creates new account •  sends sms/email link with token to validate account •  10 days to validate

@doanduyhai

Account Validation!

75

How to ? •  create account with 10 days TTL

INSERT INTO users(login, name, age) VALUES(‘jdoe’, ‘John DOE’, 33) USING TTL 864000;

@doanduyhai

Account Validation!

76

How to ? •  create random token for validation with 10 days TTL

INSERT INTO account_validation(token, login, name, age) VALUES(‘A0F83E63DB935465CE73DFE…’, ‘jdoe’, ‘John DOE’, 33) USING TTL 864000;

@doanduyhai

Account Validation!

77

On token validation •  check token exist & retrieve user details

SELECT login, name, age FROM account_validation WHERE token = ‘A0F83E63DB935465CE73DFE…’;

•  re-insert durably user details without TTL INSERT INTO users(login, name, age) VALUES(‘jdoe’, ‘John DOE’, 33);

Q & R

! " !

Thank You @doanduyhai

duy_hai.doan@datastax.com

https://academy.datastax.com/

Recommended