Page 1: C* Summit EU 2013: Denormalizing Your Data: A Java Library to Support Structured Data in Cassandra

#CASSANDRAEU CASSANDRASUMMITEU

C* Path: Denormalize your data

Eric Zoerner | Software Developer, eBuddy BV

Cassandra Summit Europe 2013, London

Page 2

About eBuddy

Page 3

XMS

Page 8

Cassandra in eBuddy Messaging Platform

• User Data Service

• User Discovery Service

• Persistent Session Store

• Message History

• Location-based Discovery

Page 9

Some Statistics

• Current size of data: 1.4 TB total (replication factor 3); 467 GB actual data

• 12 million sessions (11 million users plus groups)

• Almost a billion rows in one column family (inverse social graph)

Page 10

C* Path

Page 11

The Problem (a “classic”)

Complex Object:

Person
  name: String
  birthdate: Date
  nickname: String

Address (one Person has many Addresses)
  street: String
  city: String
  province: String
  postalCode: String
  countryCode: String

Phone (one Person has many Phones)
  name: String
  number: String

??? → Key-Value Store (RDB table, NoSQL, etc.)

Page 14

Some Strategies

Serialization!

Normalization!

Person
id    name   birthdate    nickname
110   John   1985-04-06   Jack
111   Mary   1979-11-30   Mary

Address
address_id   person_id   street         city
001          110         123 Main St    New York
002          110         456 Singel     Amsterdam
003          111         78 Hoofd Str   London

Phone
person_id   name     phone
110         mobile   +15551234
111         mobile   +44030393
111         home     +44884800

Decomposition!

path                  value
name/                 John
addresses/@0/street   123 Main St.
phones/@0/number      +31123456789
...                   ...

Page 15

Strategies Comparison

                     Serialization   Normalization   Decomposition
Single Write         ✔               ✘               ✔
Single Read          ✔               ✘               ✔
Consistent Updates   ✔               ✔               not enforced
Structural Access    ✘               ✔               ✔
Cycles               ✔               ✔               ✘

Page 16

C* Path

Open Source Java Library for decomposing complex objects into Path-Value pairs — and storing them in Cassandra

https://github.com/ebuddy/c-star-path

* Artifacts available at Maven Central.

Page 19

C* Path: Decomposition

• Easy to use

• Simple API

• Good for Cassandra because:

– Structural Access: Write parts of objects without reading first

– Good for denormalizing data: large complex objects can be read or written with a single read or write operation
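The structural-access point can be illustrated with a minimal in-memory sketch, assuming a row is modelled as a `SortedMap` from path to value. `StructuralAccessDemo` and its methods are hypothetical and not part of C* Path; the serialized variant uses a naive `field=value;` string in place of a real serializer. Updating one field of the decomposed row is a single blind write, while the serialized row forces a read-modify-write of the whole value.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration, not the C* Path API: a row is modelled as a
// sorted map from column name (path) to column value.
final class StructuralAccessDemo {

    // Decomposed layout: one column per leaf path.
    static SortedMap<String, String> decomposedRow() {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("name/", "John");
        row.put("nickname/", "Jack");
        row.put("addresses/@0/street/", "123 Main St.");
        return row;
    }

    // Decomposed update: a single blind write. The row is never read and the
    // other columns are left untouched.
    static void updateNickname(SortedMap<String, String> row, String nickname) {
        row.put("nickname/", nickname);
    }

    // Serialized update: the object is one opaque value, so changing a field
    // means read, deserialize, modify, serialize and write the whole value.
    // A naive "field=value;" encoding stands in for a real serializer.
    static void updateNicknameSerialized(SortedMap<String, String> row, String nickname) {
        String blob = row.get("person");                       // 1. read
        StringBuilder rewritten = new StringBuilder();
        for (String field : blob.split(";")) {                 // 2. deserialize
            String updated = field.startsWith("nickname=")     // 3. modify
                    ? "nickname=" + nickname
                    : field;
            rewritten.append(updated).append(';');             // 4. serialize
        }
        row.put("person", rewritten.toString());               // 5. write everything back
    }

    public static void main(String[] args) {
        SortedMap<String, String> decomposed = decomposedRow();
        updateNickname(decomposed, "Johnny");
        System.out.println(decomposed);

        SortedMap<String, String> serialized = new TreeMap<>();
        serialized.put("person", "name=John;nickname=Jack;");
        updateNicknameSerialized(serialized, "Johnny");
        System.out.println(serialized);
    }
}
```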

Page 20

How does it work?

Page 23

API Example - Write to a Path

StructuredDataSupport<UUID> dao = … ;
UUID rowKey = … ;
Pojo pojo = … ;

Path path = dao.createPath("some", "path", "to", "my", "pojo");

dao.writeToPath(rowKey, path, pojo);

Page 25

API Example - Read from a Path

Path path = dao.createPath("some", "path", "to", "my", "pojo");

Pojo pojo = dao.readFromPath(rowKey, path, new TypeReference<Pojo>() { });

Page 26

API Example - Delete

dao.deletePath(rowKey, path);

Page 27

API Example - Batch Operations

BatchContext batch = dao.beginBatch();

dao.writeToPath(rowKey1, path, pojo1, batch);
dao.writeToPath(rowKey2, path, pojo2, batch);
dao.deletePath(rowKey3, path, batch);

dao.applyBatch(batch);
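A minimal in-memory sketch of the queue-then-apply pattern behind these batch calls, assuming rows are sorted maps and that values arrive already decomposed. `InMemoryBatchDemo`, `Batch` and the plain `String` paths are hypothetical stand-ins for `StructuredDataSupport`, `BatchContext` and `Path`; the sketch says nothing about the atomicity guarantees of a real Cassandra batch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in, not the C* Path implementation: operations
// are queued on a batch and only reach the "table" when the batch is applied.
final class InMemoryBatchDemo {

    // Table: row key -> (path -> value), each row sorted by path.
    final Map<String, SortedMap<String, String>> table = new HashMap<>();

    // A batch is just a list of deferred mutations.
    static final class Batch {
        private final List<Consumer<Map<String, SortedMap<String, String>>>> ops = new ArrayList<>();
    }

    Batch beginBatch() {
        return new Batch();
    }

    // Queue a write of already-decomposed (relative path -> value) pairs under `path`.
    void writeToPath(String rowKey, String path, Map<String, String> decomposed, Batch batch) {
        batch.ops.add(t -> {
            SortedMap<String, String> row = t.computeIfAbsent(rowKey, k -> new TreeMap<>());
            decomposed.forEach((relative, value) -> row.put(path + relative, value));
        });
    }

    // Queue removal of every column at or below `path` (paths end in "/").
    void deletePath(String rowKey, String path, Batch batch) {
        batch.ops.add(t -> {
            SortedMap<String, String> row = t.get(rowKey);
            if (row != null) {
                row.keySet().removeIf(p -> p.startsWith(path));
            }
        });
    }

    // Apply the queued operations in order, then empty the batch.
    void applyBatch(Batch batch) {
        batch.ops.forEach(op -> op.accept(table));
        batch.ops.clear();
    }

    public static void main(String[] args) {
        InMemoryBatchDemo dao = new InMemoryBatchDemo();
        Batch batch = dao.beginBatch();
        dao.writeToPath("row1", "x/", Map.of("name/", "John"), batch);
        dao.writeToPath("row2", "x/", Map.of("name/", "Mary"), batch);
        dao.deletePath("row3", "x/", batch);
        System.out.println(dao.table.isEmpty());   // true: nothing applied yet
        dao.applyBatch(batch);
        System.out.println(dao.table.get("row1")); // {x/name/=John}
    }
}
```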

Page 29

Read or write at any level of a path

Person person = …;

Path path = dao.createPath("x");
dao.writeToPath(rowKey, path, person);

Path pathToName = path.withElements("name");
String name = dao.readFromPath(rowKey, pathToName, stringTypeReference);
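Reading or deleting at any level works naturally when a row's columns are sorted by path string: everything below a path is one contiguous range. A minimal sketch of that idea, assuming an in-memory `SortedMap` row and paths that always end in `/`; `PathSliceDemo` is hypothetical, and the real `readFromPath` composes the slice into an object rather than returning relative paths.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not C* Path code: reads and deletes at any level of a
// path are range operations over a row sorted by path string.
final class PathSliceDemo {

    // Smallest string greater than every string that starts with `prefix`,
    // obtained by incrementing the last character ('/' becomes '0').
    // Assumes the last character is not '\uffff'.
    static String upperBound(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // Read: the contiguous slice [path, upperBound(path)), with the prefix
    // stripped so the result is relative to the requested path.
    static SortedMap<String, String> readFromPath(SortedMap<String, String> row, String path) {
        SortedMap<String, String> result = new TreeMap<>();
        row.subMap(path, upperBound(path))
           .forEach((p, v) -> result.put(p.substring(path.length()), v));
        return result;
    }

    // Delete: clear the same slice (subMap is a live view of the row).
    static void deletePath(SortedMap<String, String> row, String path) {
        row.subMap(path, upperBound(path)).clear();
    }

    public static void main(String[] args) {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("x/name/", "John");
        row.put("x/addresses/@0/street/", "123 Main St.");
        row.put("x/addresses/@0/place/", "New York");
        row.put("y/name/", "Mary");

        System.out.println(readFromPath(row, "x/name/"));          // {=John}
        System.out.println(readFromPath(row, "x/addresses/@0/"));  // {place/=New York, street/=123 Main St.}
        deletePath(row, "x/addresses/");
        System.out.println(row);                                    // {x/name/=John, y/name/=Mary}
    }
}
```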

Page 32

Write Implementation: Decomposition

• Step 1:

– Convert the domain object into a basic structure of Maps, Lists, and simple values. This uses the Jackson (FasterXML) library and honors the Jackson annotations.

• Step 2:

– Decompose this basic structure into a map of paths to simple values (i.e. String, Number, Boolean); this is done by the Decomposer.

• Step 3:

– Write this map as key-value pairs in the database
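Step 2 can be sketched as a short recursive walk, assuming the input is already the Map/List structure that Step 1 would produce (for example via Jackson's `ObjectMapper.convertValue`). `DecomposerSketch` is hypothetical, not the library's `Decomposer`: it terminates every path with `/`, as the later slides do, and skips the URL encoding of path elements.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of write Step 2, not the actual C* Path Decomposer:
// flatten Maps, Lists and simple values into path -> value pairs.
final class DecomposerSketch {

    static Map<String, String> decompose(Object structure) {
        Map<String, String> out = new TreeMap<>();   // sorted, like columns in a row
        decompose("", structure, out);
        return out;
    }

    private static void decompose(String prefix, Object node, Map<String, String> out) {
        if (node instanceof Map) {
            // Map entries become named path elements.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                decompose(prefix + e.getKey() + "/", e.getValue(), out);
            }
        } else if (node instanceof List) {
            // List elements become positional "@<index>" elements.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                decompose(prefix + "@" + i + "/", list.get(i), out);
            }
        } else {
            // Simple value (String, Number, Boolean) stored in string form.
            out.put(prefix, String.valueOf(node));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "John");
        person.put("birthdate", -39080932298L);
        person.put("nickname", "Jack");
        person.put("addresses", List.of(
                Map.of("street", "123 Main St.", "place", "New York"),
                Map.of("street", "Singel 45", "place", "Amsterdam")));
        person.put("phones", List.of(Map.of("name", "mobile", "number", "+31651234567")));

        decompose(person).forEach((path, value) -> System.out.println(path + "  " + value));
    }
}
```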

Page 33

Example Decomposition - step 1

Person: name: String, birthdate: Date, nickname: String
  has many Address: street, city, province, postalCode, countryCode (all String)
  has many Phone: name, number (both String)

Simplify structure into regular Maps, Lists, and simple values

Page 34

Example Decomposition - step 1

Simplify structure into regular Maps, Lists, and simple values

Map
  name = "John"
  birthdate = "-39080932298"
  nickname = "Jack"
  addresses = <List>
    [0] = <Map>
      street = "123 Main St."
      place = "New York"
    [1] = <Map>
      street = "Singel 45"
      place = "Amsterdam"
  phones = <List>
    [0] = <Map>
      name = "mobile"
      number = "+31651234567"

Page 35

Example Decomposition - step 2

path                  value
name/                 "John"
birthdate/            "-39080932298"
nickname/             "Jack"
addresses/@0/street   "123 Main St."
addresses/@0/place    "New York"
addresses/@1/street   "Singel 45"
addresses/@1/place    "Amsterdam"
phones/@0/name        "mobile"
phones/@0/number      "+31651234567"

Page 38

Read implementation: Composition

• Step 1:

– Read path-value pairs from database

• Step 2:

– “Merge” the path-value maps back into the basic structure (Maps, Lists, simple values); this is done by the Composer.

• Step 3:

– Use Jackson to convert the basic structure back into the domain object, using a TypeReference.
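The merge in Step 2 can be sketched as the inverse walk, assuming stored paths use `/` separators and `@<n>` list indices, and that encoded user keys never begin with `@`. `ComposerSketch` is a hypothetical illustration, not the library's `Composer`; it also omits URL decoding and the Jackson conversion of Step 3.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of read Step 2, not the actual C* Path Composer.
final class ComposerSketch {

    // Rebuild nested Maps from "a/@0/b/"-style paths, then turn maps whose keys
    // are all list indices ("@0", "@1", ...) into Lists.
    static Object compose(Map<String, String> pathValues) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : pathValues.entrySet()) {
            String[] elements = e.getKey().split("/");   // "a/@0/b/" -> [a, @0, b]
            Map<String, Object> node = root;
            for (int i = 0; i < elements.length - 1; i++) {
                // A ClassCastException here means a path is both a leaf and a
                // parent, i.e. the stored structure is inconsistent.
                node = asMap(node.computeIfAbsent(elements[i], k -> new LinkedHashMap<String, Object>()));
            }
            node.put(elements[elements.length - 1], e.getValue());
        }
        return listify(root);
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> asMap(Object o) {
        return (Map<String, Object>) o;
    }

    private static Object listify(Object node) {
        if (!(node instanceof Map)) {
            return node;                                  // simple value
        }
        Map<String, Object> map = asMap(node);
        boolean isList = !map.isEmpty()
                && map.keySet().stream().allMatch(k -> k.startsWith("@"));
        if (isList) {
            // Position by numeric index, not string order ("@10" sorts before "@2").
            int size = map.keySet().stream()
                    .mapToInt(k -> Integer.parseInt(k.substring(1))).max().getAsInt() + 1;
            List<Object> list = new ArrayList<>(Collections.<Object>nCopies(size, null));
            map.forEach((k, v) -> list.set(Integer.parseInt(k.substring(1)), listify(v)));
            return list;
        }
        Map<String, Object> result = new LinkedHashMap<>();
        map.forEach((k, v) -> result.put(k, listify(v)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new TreeMap<>();
        stored.put("name/", "John");
        stored.put("addresses/@0/street/", "123 Main St.");
        stored.put("addresses/@1/street/", "Singel 45");
        stored.put("phones/@0/number/", "+31651234567");
        System.out.println(compose(stored));
    }
}
```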

Page 39

Design & Challenges

Page 40

Path Encoding

• Paths stored as strings

• Path elements are separated by forward slashes (hidden by the Path API)

• Path elements are URL encoded internally, so the implementation can use special characters without clashing with user data

• Special characters: @ for list indices (@0, @1, @2, ...)
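A minimal sketch of these encoding rules, using `java.net.URLEncoder` (Java 10 or later for the `Charset` overloads) as one plausible encoder; whether C* Path uses this exact scheme is an assumption, and `PathEncodingSketch` is hypothetical. Because an `@` in user data is encoded as `%40`, a raw leading `@` can only come from a list index.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch of the path-encoding rules, not C* Path's own code.
final class PathEncodingSketch {

    // User-supplied element: URL encoded, so '/', '@' and other special
    // characters can never be mistaken for path syntax.
    static String encodeElement(String element) {
        return URLEncoder.encode(element, StandardCharsets.UTF_8);
    }

    static String decodeElement(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    // List index: written with a raw '@', which an encoded element never starts with.
    static String listIndex(int index) {
        return "@" + index;
    }

    static boolean isListIndex(String storedElement) {
        return storedElement.startsWith("@");
    }

    // Join already-encoded elements with '/', terminating the path with '/'.
    static String toPathString(List<String> storedElements) {
        StringBuilder sb = new StringBuilder();
        for (String element : storedElements) {
            sb.append(element).append('/');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String path = toPathString(List.of(
                encodeElement("addresses"), listIndex(0), encodeElement("c/o @home")));
        System.out.println(path);   // addresses/@0/c%2Fo+%40home/
    }
}
```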

Page 43

Challenge: "Shrinking Lists"

➀ Write a list.

dao.writeToPath(key, "x", {"1", "2"});
x/@0/  "1"
x/@1/  "2"

➁ Write a shorter list.

dao.writeToPath(key, "x", {"3"});
x/@0/  "3"
x/@1/  "2"

➂ Read the list.

dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
→ {"3", "2"}

Page 44

Challenge: "Shrinking Lists"

Solution: Implementation writes a list terminator value.

dao.writeToPath(key, "x", {"1", "2"});
x/@0/  "1"
x/@1/  "2"
x/@2/  0xFFFFFFFF

dao.writeToPath(key, "x", {"3"});
x/@0/  "3"
x/@1/  0xFFFFFFFF
x/@2/  0xFFFFFFFF

dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
→ {"3"}

Page 45

Solution: Implementation writes a list terminator value.

Challenge: “Shrinking Lists”

Unfortunately, this is only a partial solution, because it is still possible to read “stale” list elements using a positional index in the path.

This can be avoided by doing a delete before a write, but for performance reasons the library does not do that automatically.

Conclusion: The user must know what they are doing and understand the implementation.
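The terminator behaviour and its limitation can be reproduced with a small in-memory sketch. `ListTerminatorSketch` is hypothetical, and it uses the one-character string `"\uFFFF"` as a stand-in for the `0xFFFFFFFF` marker shown on the slides; `deleteThenWrite` shows the delete-before-write remedy at the cost of an extra operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the list-terminator idea, not C* Path code.
final class ListTerminatorSketch {

    // Stand-in for the terminator marker; it must never collide with real values.
    static final String TERMINATOR = "\uFFFF";

    // Write elements at @0..@n-1 plus a terminator at @n. Nothing is deleted,
    // so elements of a previous, longer list survive after the terminator.
    static void writeList(SortedMap<String, String> row, String path, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            row.put(path + "@" + i + "/", values.get(i));
        }
        row.put(path + "@" + values.size() + "/", TERMINATOR);
    }

    // Read elements in index order, stopping at the first terminator or gap.
    static List<String> readList(SortedMap<String, String> row, String path) {
        List<String> out = new ArrayList<>();
        for (int i = 0; ; i++) {
            String v = row.get(path + "@" + i + "/");
            if (v == null || v.equals(TERMINATOR)) {
                return out;
            }
            out.add(v);
        }
    }

    // Remedy: remove the old list first, then write.
    static void deleteThenWrite(SortedMap<String, String> row, String path, List<String> values) {
        row.keySet().removeIf(k -> k.startsWith(path));
        writeList(row, path, values);
    }

    public static void main(String[] args) {
        SortedMap<String, String> row = new TreeMap<>();
        writeList(row, "x/", List.of("1", "2", "3"));
        writeList(row, "x/", List.of("4"));
        System.out.println(readList(row, "x/"));   // [4]
        // A positional read still sees the stale third element.
        System.out.println(row.get("x/@2/"));      // 3
    }
}
```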

Page 47

Challenge: Inconsistent Updates

Because objects can be updated at any path, there is no protection against a write “corrupting” an object structure.

Path path = dao.createPath("x");
dao.writeToPath(key, path, person1);

x/address/street/  “Singel 45”
x/name/            “John”

path = dao.createPath("x", "name");
dao.writeToPath(key, path, person1);

x/address/street/       “Singel 45”
x/name/                 “John”
x/name/address/street/  “Singel 45”
x/name/name/            “John”  ✘

Page 48

Challenge: Inconsistent Updates

Solution: Don’t do that!

* If it does happen, the implementation still provides a way to get the “corrupted” data as simple structures, but an attempt to convert it to the now incompatible POJO will fail.

Conclusion: The user must know what they are doing and understand the implementation.
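If corruption does happen, it leaves a recognisable shape: a path that holds a simple value and is also the parent of other paths. The hypothetical diagnostic below (not a C* Path feature) finds such paths in a sorted row; it relies on every path ending in `/`, so that `x/name/` is never treated as a prefix of `x/name2/`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical diagnostic, not part of C* Path: report leaf paths that are
// also parents of other paths, the shape an inconsistent update leaves behind.
final class ConflictCheckSketch {

    static List<String> conflictingLeaves(SortedMap<String, String> row) {
        List<String> conflicts = new ArrayList<>();
        String previous = null;
        for (String path : row.keySet()) {
            // In sorted order, all paths that extend `previous` directly follow
            // it, so comparing each path with its predecessor is enough.
            if (previous != null && path.startsWith(previous)) {
                conflicts.add(previous);
            }
            previous = path;
        }
        return conflicts;
    }

    public static void main(String[] args) {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("x/address/street/", "Singel 45");
        row.put("x/name/", "John");
        // Second write of the whole person at x/name/ instead of x/:
        row.put("x/name/address/street/", "Singel 45");
        row.put("x/name/name/", "John");
        System.out.println(conflictingLeaves(row));   // [x/name/]
    }
}
```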

Page 51

Issue: Sorting

Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?

Instead of storing paths as strings, the implementation could have used DynamicComposite.

We tried it.

Page 52

Issue: Sorting

It can work. CQL supports it as a user-defined type.

Unfortunately, it causes cqlsh to crash, making it difficult to “browse” the data.

Page 53

Issue: Sorting

Using DynamicComposite for paths is still under consideration for a future version.
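The underlying problem is that string order is not numeric order. A small demonstration with list-index elements, using plain Java collections (`StringSortingDemo` is hypothetical); the same applies to time-based UUIDs, whose standard string form does not sort chronologically.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of why string-typed path elements sort poorly.
final class StringSortingDemo {

    // Order produced when elements are compared as strings (like a text column name).
    static List<String> stringOrder(List<String> elements) {
        return new ArrayList<>(new TreeSet<>(elements));
    }

    // Order a numeric comparison of "@<n>" elements would produce, which a
    // typed composite column name could provide.
    static List<String> numericOrder(List<String> elements) {
        List<String> sorted = new ArrayList<>(elements);
        sorted.sort(Comparator.comparingInt(e -> Integer.parseInt(e.substring(1))));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> indices = List.of("@0", "@1", "@2", "@10", "@11");
        System.out.println(stringOrder(indices));   // [@0, @1, @10, @11, @2]
        System.out.println(numericOrder(indices));  // [@0, @1, @2, @10, @11]
    }
}
```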

Page 54

Cassandra Data Model

Page 55

Thrift

Column family:

row key   column name         column value
<UUID>    x/address/street/   “Singel 45”
          x/name              “John”
          …                   …

- OR -

Super column family (coming soon):

row key   super column name   column name       column value
<UUID>    x                   address/street/   “Singel 45”
                              name              “John”
                              …                 …

Page 56

Thrift

ColumnFamilyOperations<K,String,Object> operations = new ColumnFamilyTemplate<K,String,Object>(keyspace, KeySerializer, StringSerializer, StructureSerializer);

StructuredDataSupport<K> dao = new ThriftStructuredDataSupport<K>(operations);

The Thrift implementation relies on the Hector client.

Page 57

CQL

CREATE TABLE person (key text, path text, value text, PRIMARY KEY (key, path))

• Cannot use the path itself as a column name because it is “dynamic”

• Dynamic column family

Page 58

CQL: Data Model Constraints

• Need to do a range (“slice”) query on the path ⇒ path must be a clustering key

• Also, the path must be the first clustering key, since otherwise every query would have to provide an equality condition on the preceding clustering keys.

• One might try putting a secondary index on the path instead of making it a clustering key, but this doesn’t work since Cassandra secondary indexes only work with equality conditions:

Bad Request: No indexed columns present in by-columns clause with Equal operator

CREATE TABLE person ( key text, path text, value text, PRIMARY KEY (key, path) )
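These constraints mean a read at a path can become a single range query on the clustering column. A hypothetical helper (`CqlSliceSketch`) that only builds the CQL text and bounds for the `person` table above, assuming paths end in `/` so the exclusive upper bound is the path with its last character incremented; binding and executing the statement through the DataStax driver is left out.

```java
// Hypothetical helper, not part of C* Path: builds the CQL range query a
// path read could map to, given PRIMARY KEY (key, path). It does not talk
// to Cassandra.
final class CqlSliceSketch {

    static final class Slice {
        final String cql;
        final String from;   // inclusive lower bound on the path column
        final String to;     // exclusive upper bound on the path column

        Slice(String cql, String from, String to) {
            this.cql = cql;
            this.from = from;
            this.to = to;
        }
    }

    static Slice sliceFor(String table, String keyColumn, String pathColumn,
                          String valueColumn, String path) {
        String cql = "SELECT " + pathColumn + ", " + valueColumn + " FROM " + table
                + " WHERE " + keyColumn + " = ? AND " + pathColumn + " >= ? AND "
                + pathColumn + " < ?";
        // Every path below `path` sorts in [path, path-with-last-char-incremented).
        char last = path.charAt(path.length() - 1);
        String upper = path.substring(0, path.length() - 1) + (char) (last + 1);
        return new Slice(cql, path, upper);
    }

    public static void main(String[] args) {
        Slice s = sliceFor("person", "key", "path", "value", "x/address/");
        System.out.println(s.cql);
        System.out.println(s.from + " .. " + s.to);   // x/address/ .. x/address0
    }
}
```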

Page 59

CQL

// tableName, partitionKeyColumnName, pathColumnName, valueColumnName: String; session: DataStax driver Session
StructuredDataSupport<K> dao = new CqlStructuredDataSupport<K>(tableName, partitionKeyColumnName, pathColumnName, valueColumnName, session);

The CQL implementation relies on the DataStax Java driver.

Page 60

And the rest…

Page 61

Planned Features

• Sets with simple values: element values stored in path

• DynamicComposites?

• Multiple row reads and writes

• Slice queries on path ranges

Page 62

Credits and Acknowledgements

• Thanks to Joost van de Wijgerd at eBuddy for his ideas and feedback

• Jackson JSON Processor, which is core to the C* Path implementation: http://wiki.fasterxml.com/JacksonHome

• Image credits:

Slide: Some Strategies; image: "binary"; author: noegranado; link: http://www.flickr.com/photos/43360884@N04/6949896929/
