Billion Records from SQL to Cassandra, lessons learned

DuyHai Doan, Brice Dutheil

Libon cassandra summiteu2014


Lessons learned: moving billions of contact records from SQL to Cassandra


#CassandraSummit @doanduyhai @BriceDutheil

Who are we?

Brice Dutheil

•  Mockito

•  Java Track Lead @ Devoxx France

•  Independent contractor @ Libon (Orange-Vallée)

DuyHai Doan

•  Achilles

•  Cassandra Technical Advocate

•  Former Java Developer @ Libon


Agenda

•  Libon context

•  Migration strategy

•  Business code migration

•  Data Modeling

•  Take Away


Libon Context


What is Libon?

•  Messaging app

•  VOIP (out)

•  Custom voicemail & greetings

•  SMS/chat/file transfer

•  Contacts matching


Contact Matching

[Diagram, built up over four slides: a Libon User and a Friend, connected first by "Contact matching" and then by "Accept link".]


Project Context

•  Application grew over the years

•  Already using Cassandra to handle events

•  messaging / file sharing / SMS / notifications

•  Cassandra R/W latencies ≈ 0.4 ms

•  server response time under 10 ms


Project Context

•  About contacts …

•  stored as relational model in RDBMS (Oracle)

•  1 user ≈ 300 contacts

•  with millions of users ☞ billions of contacts to handle

•  query latency unpredictable


Fixing the problem

•  Tune the RDBMS

•  indices

•  partitioning

•  fewer joins, simplified relational model

•  hardware capacity increased

That worked but …


Back-end application

RDBMS Cassandra


Next Challenges

•  High Availability (DB failure, site failure …)

•  Predictable performance at scale

•  Going to multi data-centers


Going for Cassandra

•  Denormalize (if possible …)

•  Know your business ☞ know your queries

•  Linear scaling out

•  Consistent performance


Data Migration Strategy


Objectives

•  No downtime

•  No concurrency corner-cases

•  Safe rollback possible

•  Replay-ability & resume-ability


Strategy

•  3 phases

•  Write contacts to both data stores

•  Old contacts migration

•  Switch to Cassandra …

•  … and deprecate SQL


Migration Phase 1: write

[Diagram: the back-end servers write to both the SQL nodes and the Cassandra (C*) nodes. Arrow labels: "contactUUID" and "contactId (long) + contactUUID".]

SQL contacts table (excerpt):

contactId    …    contactUUID
129363       …    123e4567-e89b-12d3…
834849       …


Migration Phase 1: read

[Diagram: the back-end servers, the SQL nodes and the Cassandra (C*) nodes, with an arrow labelled "Read".]
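The double-write idea of phase 1 can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `ContactStore`, `InMemoryContactStore` and `DualWriteContactStore` are hypothetical names, the in-memory maps stand in for the real SQL and Cassandra clients, and the choice to keep serving reads from SQL during this phase is an assumption that the slides do not spell out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Minimal store abstraction (hypothetical, for illustration only). */
interface ContactStore {
    void save(UUID contactUuid, String name);
    String find(UUID contactUuid);
}

/** In-memory stand-in for either the SQL or the Cassandra store. */
class InMemoryContactStore implements ContactStore {
    private final Map<UUID, String> rows = new HashMap<>();

    public void save(UUID contactUuid, String name) { rows.put(contactUuid, name); }

    public String find(UUID contactUuid) { return rows.get(contactUuid); }
}

/** Phase 1: every write goes to both stores, reads stay on the legacy store. */
class DualWriteContactStore implements ContactStore {
    private final ContactStore sql;
    private final ContactStore cassandra;

    DualWriteContactStore(ContactStore sql, ContactStore cassandra) {
        this.sql = sql;
        this.cassandra = cassandra;
    }

    public void save(UUID contactUuid, String name) {
        sql.save(contactUuid, name);        // legacy store is still written
        cassandra.save(contactUuid, name);  // new store is filled in parallel
    }

    public String find(UUID contactUuid) {
        return sql.find(contactUuid);       // assumption: reads are not switched yet
    }
}
```

Keeping both stores behind one interface means phase 3 would only need a different `ContactStore` implementation, not a change in the calling services.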


Migration Phase 2

•  On live production, migrate old contacts (old contacts = created before phase 1)

[Diagram: the SQL nodes and the Cassandra (C*) nodes.]

For each batch of users:

SELECT * FROM contacts WHERE user_id = … AND contact_uuid IS NULL

Logged batches of:

INSERT INTO contacts(..) VALUES(…) USING TIMESTAMP now() - 1 week


Migration Phase 2

USING TIMESTAMP now() - 1 week 😳


Migration Phase 2

•  During data migration …

•  … concurrent writes from the migration batch …

•  … and updates from production for the same contact

Migration Phase 2

One contact_uuid, two timestamped versions of the name column:

name (now - 1 week): Johny   (insert from batch, to the past)
name (now):          Johnny  (update from production)

Future reads pick the most up-to-date value
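Why the back-dated insert is harmless can be seen in a toy model of timestamp-based conflict resolution. The sketch below is a simplification, not Cassandra code: `TimestampedCell` and `LastWriteWinsTable` are hypothetical names, and real Cassandra resolves conflicts per column at read and compaction time, with extra tie-breaking rules that are left out here.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one column value together with its write timestamp. */
class TimestampedCell {
    final String value;
    final long writeTimeMicros;

    TimestampedCell(String value, long writeTimeMicros) {
        this.value = value;
        this.writeTimeMicros = writeTimeMicros;
    }
}

/** Keeps, per key, only the cell with the highest timestamp (last write wins). */
class LastWriteWinsTable {
    private final Map<String, TimestampedCell> cells = new HashMap<>();

    void write(String key, String value, long writeTimeMicros) {
        TimestampedCell current = cells.get(key);
        // Arrival order is irrelevant: only the timestamp decides the winner.
        if (current == null || writeTimeMicros > current.writeTimeMicros) {
            cells.put(key, new TimestampedCell(value, writeTimeMicros));
        }
    }

    String read(String key) {
        TimestampedCell cell = cells.get(key);
        return cell == null ? null : cell.value;
    }
}
```

Whether the migration batch writes "Johny" before or after production writes "Johnny", the production value carries the later timestamp and is the one returned.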


Migration Phase 2

"Write to the Past… to save the Future"

Libon – 2014/10/08


Migration Phase 3

[Diagram: the back-end servers, the SQL nodes and the Cassandra (C*) nodes, with a "Write" arrow and one path crossed out (❌).]


Business Code Refactoring


Code Inventory

•  Written for RDBMS

•  Lots of joins (no surprise)

•  Designed around transactions

•  Spring @Transactional everywhere


Code Inventory (cont.)

•  Entities go through Services & Repositories

[Diagram: ContactEntity flowing through the Services and Repositories layers.]


Code Inventory (cont.)

•  Hibernate is auto-magic

•  lazy loading

•  1st level cache

•  N+1 select


Which options?

•  Throw away the existing code …

•  … and re-design from scratch for Cassandra

No way!


Code Quality

•  Existing business code has …

•  … ≈ 3500 unit tests

•  … and 600+ integration tests


Code Quality

•  We are TDD aficionados …

•  … and we love our code coverage


Code Quality

"Code coverage is one of your most valuable technical assets"

Libon – since the beginning


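Refactoring Strategy

[Diagram: a Services layer (ContactMatchingService, ContactService, ContactSync) and a Repositories layer around ContactEntity. A Proxy sits between ContactEntity and ContactNoSQLEntity, which fans out to Denorm1, Denorm2, … DenormN. Cardinality labels: n, 1, n, n.]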


Refactoring Strategy

•  Use CQRS

•  ContactReadRepository

•  ContactWriteRepository

•  ContactUpdateRepository

•  ContactDeleteRepository


Refactoring Strategy

•  ContactReadRepository

•  direct sequential read

•  no joins

•  1 read ≈ 1 SELECT


Refactoring Strategy

•  ContactWriteRepository

•  write to all denormalized tables

•  using CQL logged batches

•  use TTLs


Refactoring Strategy

•  ContactUpdateRepository

•  read-before-write most of the time 😟

•  rare updates ☞ acceptable perf penalty


Refactoring Strategy

•  ContactDeleteRepository

•  delete

•  update contact modification date
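The read/write split described on the preceding slides could look roughly like the sketch below. It is an assumption-laden illustration, not the Libon code: the method signatures, `DenormalizedTable` and `InMemoryTable` are hypothetical, and a plain loop stands in for the CQL logged batch mentioned in the talk (so it has none of the batch's atomicity guarantees).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One denormalized table; name and shape are hypothetical. */
interface DenormalizedTable {
    void insert(String userId, String contactId, String payload);
}

/** Read side: one query against exactly one table, no joins. */
interface ContactReadRepository {
    String findById(String userId, String contactId);
}

/** In-memory stand-in playing both roles for the example. */
class InMemoryTable implements DenormalizedTable, ContactReadRepository {
    private final Map<String, String> rows = new HashMap<>();

    public void insert(String userId, String contactId, String payload) {
        rows.put(userId + ":" + contactId, payload);
    }

    public String findById(String userId, String contactId) {
        return rows.get(userId + ":" + contactId);
    }
}

/** Write side: a single logical write fans out to every denormalized table. */
class ContactWriteRepository {
    private final List<DenormalizedTable> tables;

    ContactWriteRepository(List<DenormalizedTable> tables) {
        this.tables = tables;
    }

    void write(String userId, String contactId, String payload) {
        // Stand-in for a logged batch: every table gets the same contact.
        for (DenormalizedTable table : tables) {
            table.insert(userId, contactId, payload);
        }
    }
}
```

Update and delete repositories would follow the same pattern, with the update path typically reading the current row first, as the slides note.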


Gatling Output


Outcome

•  5 months of work for 2 developers

•  Many iterations to fix bugs (thanks to IT)

•  Lots of performance benchmarks using Gatling

☞ data model & code validation

•  … we are almost there for production


Data Model


Denormalization, the good

•  Support fast reads

•  1 read ≈ 1 SELECT

•  Worthwhile because the workload is mostly reads, with few updates


Denormalization, the bad

•  Updating mutable data can be a nightmare

•  Data model bound by existing client-facing API

•  Update paths very error-prone without tests


Data model in detail

Contacts_by_id

Contacts_by_identifiers

Contacts_in_profiles

Contacts_by_modification_date

Contacts_by_firstname_lastname

Contacts_linked_user

user_id is always a component of the partition key


Scalable design

[Diagram: a ring of eight nodes (n1 to n8) owning token ranges A to H, with partitions user_id1 to user_id5 distributed around the ring.]


Bloom filters in action


•  For some tables, partition key = (user_id, contact_id)

☞ fast look-up, leverages Bloom filters

☞ touches 1 SSTable most of the time


Data model in detail

Contacts_by_id

Contacts_by_identifiers

Contacts_in_profiles

Contacts_by_modification_date

Contacts_by_firstname_lastname

Contacts_linked_user

[Slide annotations on the table list: "Wide partition" and "Bucketed".]


A "queue" story

•  contacts_by_modification_date

•  queue-like pattern 😭

☞ buckets to the rescue

Partition key      Columns (one per modification date)
user_id:2014-12    date35  date12  …  …  date47
                   …       …       …  …
user_id:2014-11    date11  date12  …  …  date34
                   …       …       …  …
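A minimal sketch of the bucketing logic, assuming monthly buckets labelled like the "2014-12" suffix shown on the slide and UTC as the reference time zone (both the class name `MonthBuckets` and the UTC choice are assumptions made for this example):

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

class MonthBuckets {
    /** Bucket label such as "2014-12" for the month containing the instant (UTC assumed). */
    static String bucketOf(Instant modificationDate) {
        return YearMonth.from(modificationDate.atZone(ZoneOffset.UTC)).toString();
    }

    /** Buckets to query, oldest first, to cover everything modified from 'since' up to 'now'. */
    static List<String> bucketsBetween(Instant since, Instant now) {
        YearMonth current = YearMonth.from(since.atZone(ZoneOffset.UTC));
        YearMonth last = YearMonth.from(now.atZone(ZoneOffset.UTC));
        List<String> buckets = new ArrayList<>();
        while (!current.isAfter(last)) {
            buckets.add(current.toString());
            current = current.plusMonths(1);
        }
        return buckets;
    }
}
```

A reader asking for "contacts modified since mid-October" would then issue one query per returned bucket, so no single partition keeps growing without bound.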


Data model summary

•  7 tables for denormalization

•  Some tables stay normalized because they are rarely accessed

•  Read-before-write in most update scenarios 😟


Notes on contact_id

•  In SQL, auto-generated long using sequence

•  In Cassandra, auto-generated timeuuid


Notes on contact_id

•  How to store both types?

•  As text ? ☞ easy solution …

•  … but waste of space !

•  because encoded as UTF-8 or ASCII in Cassandra


Notes on contact_id

•  Long ☞ 8 bytes

•  Long as text (UTF-8: 1 byte per digit) ☞ as many bytes as digits


Notes on contact_id

•  UUID ☞ 16 bytes

•  32 hex chars + 4 hyphens = 36 chars

•  UUID as text (UTF-8: 1 byte per char) ☞ 36 bytes

•  Byte overhead = 36 – 16 = 20 bytes


Notes on contact_id

•  20 bytes wasted per contact UUID

•  × 7 denormalizations = 140 bytes per contact UUID

•  × 10^9 contacts = 140 GB wasted 😠

not even counting replication factor …


Notes on contact_id

•  ☞ just save contact id as byte[ ]

•  Achilles @TypeTransformer for automatic conversion (see later)

•  Use blobAsBigInt( ) or blobAsUUID( ) to view data
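The byte[ ] representation can be illustrated with plain JDK classes. This is a sketch of the idea only, not the codec used at Libon: the class name `ContactIdBytes` and the rule "16 bytes means UUID, 8 bytes means long" are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class ContactIdBytes {
    /** Legacy SQL id: a long is always 8 bytes. */
    static byte[] fromLong(long contactId) {
        return ByteBuffer.allocate(Long.BYTES).putLong(contactId).array();
    }

    /** New id: a UUID is two longs, hence 16 bytes. */
    static byte[] fromUuid(UUID contactUuid) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(contactUuid.getMostSignificantBits())
                .putLong(contactUuid.getLeastSignificantBits())
                .array();
    }

    /** Assumption: the array length is enough to tell the two id kinds apart. */
    static boolean isUuid(byte[] raw) {
        return raw.length == 2 * Long.BYTES;
    }

    static long toLong(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }

    static UUID toUuid(byte[] raw) {
        ByteBuffer buffer = ByteBuffer.wrap(raw);
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }
}
```

Compared with the 36-character text form, the 16-byte form saves the 20 bytes per UUID computed on the earlier slides.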


Achilles

•  Advanced "object mapper"

•  Fluent API

•  Tons of features

•  TDD friendly


Achilles

•  Dirty checking, why is it important?

•  1 contact ≈ 8 mutable fields

•  × 7 denormalizations = 56 update combinations …

•  … and that is not even counting multi-field updates


Achilles

•  Are you going to manually write 56+ prepared statements for all possible updates?

•  Or just use dynamic plain-string statements and accept a performance penalty?


Achilles

•  Dirty check in action

// No read-before-write
ContactEntity proxy = manager.forUpdate(ContactEntity.class, contactId);
proxy.setFirstName(…);
proxy.setLastName(…); // type-safe updates
proxy.setAddress(…);

manager.update(proxy);


Achilles

[Diagram: a Proxy with setter interception wraps an Empty Entity; it holds the PrimaryKey and a DirtyMap.]


Achilles

•  Dynamic statements generation

UPDATE contacts SET firstname=?, lastname=?, address=? WHERE contact_id=?

Prepared statements are cached, of course.
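The dirty-checking idea can be approximated in plain Java to show how an UPDATE covering only the touched columns might be derived. This is not the Achilles implementation: `ContactUpdateProxy` is a hypothetical hand-written class, whereas Achilles intercepts setters on a generated proxy; column names follow the statement shown on the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Records which columns were set, so only those end up in the UPDATE. */
class ContactUpdateProxy {
    private final Object contactId;
    private final Map<String, Object> dirty = new LinkedHashMap<>();

    ContactUpdateProxy(Object contactId) {
        this.contactId = contactId;
    }

    void setFirstName(String value) { dirty.put("firstname", value); }

    void setLastName(String value) { dirty.put("lastname", value); }

    void setAddress(String value) { dirty.put("address", value); }

    /** Statement text with one bind marker per dirty column. */
    String toUpdateStatement() {
        if (dirty.isEmpty()) {
            throw new IllegalStateException("nothing to update");
        }
        List<String> assignments = new ArrayList<>();
        for (String column : dirty.keySet()) {
            assignments.add(column + "=?");
        }
        return "UPDATE contacts SET " + String.join(", ", assignments)
                + " WHERE contact_id=?";
    }

    /** Values in bind order: dirty columns first, primary key last. */
    List<Object> boundValues() {
        List<Object> values = new ArrayList<>(dirty.values());
        values.add(contactId);
        return values;
    }
}
```

Because the statement text depends only on the set of dirty columns, it can serve as a cache key for the prepared statement, which is presumably what the caching remark on the slide refers to.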


Achilles

•  Insert strategy, what is it?


Achilles

•  Simple INSERT prepared statement

INSERT INTO contacts(contact_id,name,age,address,gender,avatar,…) VALUES(?, ?, ?, ? … ?);


Achilles

•  Runtime values binding

•  some columns are optional

preparedStatement.bind(49374, "John DOE", 33, null, null, …, null);


Achilles

Wait … are you saying inserting null in CQL???

😳


Achilles

Inserting null ≡ creating tombstones

× 7 denormalizations

× billions of contacts created 😱

not even counting replication factor …


Achilles

•  Simple annotation

@Entity(table = "contacts_by_id")
@Strategy(insert = InsertStrategy.NOT_NULL_FIELDS)
public class ContactById {

}


Achilles

•  Runtime dynamic INSERT statement

INSERT INTO contacts(contact_id, name, age, address) VALUES(:contact_id, :name, :age, :address);

Prepared statements are cached, of course.
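How an INSERT restricted to non-null columns might be assembled can be shown with a small helper. It is a hedged illustration of the strategy, not Achilles internals: `NotNullInsertBuilder` and its `build` method are hypothetical, and the column names are borrowed from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class NotNullInsertBuilder {
    /** Builds an INSERT that names only the columns whose value is not null. */
    static String build(String table, Map<String, Object> columns) {
        List<String> names = new ArrayList<>();
        List<String> markers = new ArrayList<>();
        for (Map.Entry<String, Object> column : columns.entrySet()) {
            if (column.getValue() != null) {   // a column left out writes no tombstone
                names.add(column.getKey());
                markers.add(":" + column.getKey());
            }
        }
        return "INSERT INTO " + table + "(" + String.join(", ", names) + ") VALUES("
                + String.join(", ", markers) + ");";
    }
}
```

With `contact_id`, `name` and `age` set and `address` and `gender` null, the generated statement mentions only the first three columns, so nothing is written for the missing ones.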


Achilles

•  Remember the contactId ⇄ byte[ ] conversion?

@PartitionKey
@Column(name = "contact_id")
@TypeTransformer(valueCodecClass = ContactIdToBytes.class)
private ContactId contactId;

BYOC ☞ Bring Your Own Codec


Achilles

public interface Codec<FROM, TO> {
    Class<FROM> sourceType();
    Class<TO> targetType();
    TO encode(FROM fromJava);
    FROM decode(TO fromCassandra);
}


Achilles

•  Dynamic logging in action

2014-12-01 14:25:20,554 Bound statement : [INSERT INTO contacts.contacts_by_modification_date(user_id,month_bucket,modification_date,...) VALUES (:user_id,:month_bucket,:modification_date,...) USING TTL :ttl;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
2014-12-01 14:25:20,554 bound values : [222130151, 2014-12, e13d0d50-7965-11e4-af38-90b11c2549e0, ...]

2014-12-01 14:25:20,701 Bound statement : [SELECT birthday,middlename,avatar_size,... FROM contacts.contacts_by_modification_date WHERE user_id=:user_id AND month_bucket=:month_bucket AND (modification_date)>=(:modification_date) ORDER BY modification_date ASC;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
2014-12-01 14:25:20,701 bound values : [222130151, 2014-10, be6bc010-6109-11e4-b385-000038377ead]


Achilles


•  Dynamic logging

•  runtime activation

•  no need to recompile/re-deploy

•  saved us hours of debugging

•  TRACE log level ☞ query tracing


Take Away


Conditions for success

•  Data modeling is crucial

•  Double-run strategy & timestamp trick FTW

•  Data type conversion can be tricky

•  Benchmark !

•  Mindset shifts for the team


Thank You