#CASSANDRAEU CASSANDRASUMMITEU

C* Path: Denormalize your data

Eric Zoerner | Software Developer, eBuddy BV
Cassandra Summit Europe 2013, London

Tuesday 22 October 2013

C* path

DESCRIPTION

Library for decomposing your structured data and storing it in Cassandra. Same simple API implemented for both Thrift and CQL.

Page 2: C* path

Topics

• About eBuddy

• Introducing C* Path

• How does it work?

• Design and Challenges

• Cassandra Data Model

• Futures

Page 3: C* path

About eBuddy

Page 4: C* path

XMS

Page 9: C* path

Cassandra in eBuddy Messaging Platform

• User Data Service

• User Discovery Service

• Persistent Session Store

• Message History

• Location-based Discovery

Page 10: C* path

Some Statistics

• Current size of data: 1.4 TB total (3x replication); 467 GB actual data

• 16 million sessions (11 million users plus groups)

• Almost a billion rows in one column family (inverse social graph)

Page 11: C* path

C* Path

Page 12: C* path

The Problem (a “classic”)

Complex Object (class diagram):

Person
    name: String
    birthdate: Date
    nickname: String

Address (1 Person to * Addresses)
    street: String
    city: String
    province: String
    postalCode: String
    countryCode: String

Phone (1 Person to * Phones)
    name: String
    number: String

[Diagram: the complex object points, via "??" arrows, at a Key-Value Store (RDB table, NoSQL, etc.)]
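As a point of reference, the class diagram can be written down as plain Java classes. This is only a sketch: the field names follow the diagram, the list field names `addresses` and `phones` are taken from the paths shown on later slides, and the public fields and the `PersonDemo` helper are assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Plain Java rendering of the class diagram. Public fields keep the sketch
// short; a real model would more likely use accessors.
class Address {
    public String street;
    public String city;
    public String province;
    public String postalCode;
    public String countryCode;
}

class Phone {
    public String name;    // label such as "mobile" or "home"
    public String number;
}

class Person {
    public String name;
    public Date birthdate;
    public String nickname;
    public List<Address> addresses = new ArrayList<>();  // 1 Person to * Addresses
    public List<Phone> phones = new ArrayList<>();       // 1 Person to * Phones
}

// Hypothetical helper that builds one nested object graph.
class PersonDemo {
    static Person sample() {
        Person person = new Person();
        person.name = "John";
        person.nickname = "Jack";
        Address address = new Address();
        address.street = "123 Main St.";
        address.city = "New York";
        person.addresses.add(address);
        Phone phone = new Phone();
        phone.name = "mobile";
        phone.number = "+31651234567";
        person.phones.add(phone);
        return person;
    }

    public static void main(String[] args) {
        Person person = sample();
        System.out.println(person.name + " has " + person.addresses.size() + " address(es)");
    }
}
```

The nesting (lists of objects inside an object) is what makes a flat key-value mapping non-trivial.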

Page 15: C* path

Some Strategies

Serialization!

Normalization!

[Figure: three normalized tables with example rows for person ids 110 and 111.
Person: id, name, nickname, birthdate
Address: person_id, address_id, street, city
Phone: person_id, name, phone]

Decomposition!

name/                  John
addresses/@0/street    123 Main St.
phones/@0/number       +31123456789
...                    ...

Page 16: C* path

Strategies Comparison

                     Serialization   Normalization   Decomposition
Single Write         ✔               ✘               ✔
Single Read          ✔               ✘               ✔
Consistent Updates   ✔               ✔               not enforced
Structural Access    ✘               ✔               ✔
Cycles               ✔               ✔               ✘

Page 17: C* path

C* Path

Open Source Java Library for decomposing complex objects into Path-Value pairs and storing them in Cassandra

https://github.com/ebuddy/c-star-path

* Artifacts available at Maven Central.

Page 20: C* path

C* Path: Decomposition

• Easy to Use

• Simple API

• Good for Cassandra because:

– Structural Access: Write parts of objects without reading first

– Good for denormalizing data: can read or write large complex objects with one read or write operation

Page 21: C* path

How does it work?

Page 24: C* path

API Example - Write to a Path

StructuredDataSupport<UUID> dao = … ;
UUID rowKey = … ;
Pojo pojo = … ;

Path path = dao.createPath("some", "path", "to", "my", "pojo");

dao.writeToPath(rowKey, path, pojo);

Page 26: C* path

API Example - Read from a Path

Path path = dao.createPath("some", "path", "to", "my", "pojo");

Pojo pojo = dao.readFromPath(rowKey, path, new TypeReference<Pojo>() { });

Page 27: C* path

API Example - Delete

dao.deletePath(rowKey, path);

Page 28: C* path

API Example - Batch Operations

BatchContext batch = dao.beginBatch();

dao.writeToPath(rowKey1, path, pojo1, batch);
dao.writeToPath(rowKey2, path, pojo2, batch);
dao.deletePath(rowKey3, path, batch);

dao.applyBatch(batch);

Page 30: C* path

Read or write at any level of a path

Person person = …;

Path path = dao.createPath("x");
dao.writeToPath(rowKey, path, person);

Path pathToName = path.withElements("name");
String name = dao.readFromPath(rowKey, pathToName, stringTypeReference);

Page 33: C* path

Write Implementation: Decomposition

• Step 1:

– Convert the domain object into a basic structure of Maps, Lists, and simple values. This uses the Jackson (FasterXML) library and honors the Jackson annotations

• Step 2:

– Decompose this basic structure into a map of paths to simple values (i.e. String, Number, Boolean), done by Decomposer

• Step 3:

– Write this map as key-value pairs in the database
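Step 2 can be sketched in a few lines of Java. This is an illustration of the idea, not the library's Decomposer: the class and method names are hypothetical, every path here ends in a trailing slash, and path elements are not URL-encoded. Step 1 is assumed to have already produced the nested Maps and Lists (Jackson's `ObjectMapper.convertValue` is one way to get there).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DecomposeSketch {

    /** Flattens nested Maps and Lists into path -> simple value pairs. */
    static Map<String, Object> decompose(Object structure) {
        Map<String, Object> out = new TreeMap<>();
        walk("", structure, out);
        return out;
    }

    private static void walk(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            // Each map key becomes one path element.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(prefix + e.getKey() + "/", e.getValue(), out);
            }
        } else if (node instanceof List) {
            // Each list position becomes an "@<index>" path element.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + "@" + i + "/", list.get(i), out);
            }
        } else {
            out.put(prefix, node);  // leaf: String, Number, or Boolean
        }
    }

    /** Hypothetical input, shaped like the output of step 1. */
    static Map<String, Object> samplePerson() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "123 Main St.");
        address.put("place", "New York");
        List<Object> addresses = new ArrayList<>();
        addresses.add(address);
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "John");
        person.put("addresses", addresses);
        return person;
    }

    public static void main(String[] args) {
        decompose(samplePerson())
            .forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```

The resulting map is what step 3 would write, one key-value pair per path.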

Page 34: C* path

Example Decomposition - step 1

[Class diagram: Person (name: String, birthdate: Date, nickname: String) with 1-to-many Address (street, city, province, postalCode, countryCode: String) and 1-to-many Phone (name, number: String)]

Simplify structure into regular Maps, Lists, and simple values

Page 35: C* path

Example Decomposition - step 1

Simplify structure into regular Maps, Lists, and simple values

<Map>
    name = "John"
    birthdate = "-39080932298"
    nickname = "Jack"
    addresses = <List>
        [0] = <Map>
            street = "123 Main"
            place = "New York"
        [1] = <Map>
            street = "Singel 45"
            place = "Amsterdam"
    phones = <List>
        [0] = <Map>
            name = "mobile"
            number = "+31651234567"

Page 36: C* path

Example Decomposition - step 2

path                  value
name/                 "John"
birthdate/            "-39080932298"
nickname/             "Jack"
addresses/@0/street   "123 Main St."
addresses/@0/place    "New York"
addresses/@1/street   "Singel 45"
addresses/@1/place    "Amsterdam"
phones/@0/name        "mobile"
phones/@0/number      "+31651234567"

Page 39: C* path

Read implementation: Composition

• Step 1:

– Read path-value pairs from database

• Step 2:

– “Merge” path-value maps back into a basic structure (Maps, Lists, simple values), done by Composer

• Step 3:

– Use Jackson to convert basic structure back into domain object using a TypeReference
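The merge in step 2 can be sketched as follows. This is an illustration rather than the library's Composer: the names are hypothetical, path elements are assumed not to need URL decoding, and a map is treated as a list when all of its keys start with `@`. If a path is both a leaf and a prefix of another path, the cast in `child` fails, which matches the "corrupted structure" case discussed later in the deck.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class ComposeSketch {

    /** Rebuilds nested Maps and Lists from path -> value pairs. */
    static Object compose(Map<String, Object> pathValues) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : pathValues.entrySet()) {
            String[] elements = e.getKey().split("/");  // a trailing "/" is ignored
            Map<String, Object> node = root;
            for (int i = 0; i < elements.length - 1; i++) {
                node = child(node, elements[i]);
            }
            node.put(elements[elements.length - 1], e.getValue());
        }
        return listify(root);
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> child(Map<String, Object> parent, String key) {
        return (Map<String, Object>)
            parent.computeIfAbsent(key, k -> new LinkedHashMap<String, Object>());
    }

    /** Turns every map whose keys are all "@<index>" into a List ordered by index. */
    @SuppressWarnings("unchecked")
    private static Object listify(Object node) {
        if (!(node instanceof Map)) {
            return node;
        }
        Map<String, Object> map = (Map<String, Object>) node;
        map.replaceAll((k, v) -> listify(v));
        boolean isList = !map.isEmpty()
            && map.keySet().stream().allMatch(k -> k.startsWith("@"));
        if (!isList) {
            return map;
        }
        TreeMap<Integer, Object> byIndex = new TreeMap<>();
        map.forEach((k, v) -> byIndex.put(Integer.parseInt(k.substring(1)), v));
        return new ArrayList<>(byIndex.values());
    }

    public static void main(String[] args) {
        Map<String, Object> pathValues = new TreeMap<>();
        pathValues.put("name/", "John");
        pathValues.put("addresses/@0/street", "123 Main St.");
        pathValues.put("addresses/@1/street", "Singel 45");
        System.out.println(compose(pathValues));
    }
}
```

Sorting by the numeric index rather than by the string key keeps `@10` after `@2`.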

Page 40: C* path

Design & Challenges

Page 41: C* path

Path Encoding

• Paths stored as strings

• Forward slashes in paths (but hidden by Path API)

• Path elements are internally URL encoded allowing use of special characters in the implementation

• Special characters: @ for list indices (@0, @1, @2, ...)
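The encoding rules above can be illustrated with a small builder. This is a sketch under assumptions: `PathSketch` and its methods are hypothetical and are not the library's Path API, and it relies on `URLEncoder.encode(String, Charset)`, which requires Java 10 or later. Because `/` and `@` inside an element are percent-encoded, only the implementation itself can produce a raw `/` separator or an `@` list index.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class PathSketch {
    private final StringBuilder encoded = new StringBuilder();

    /** Appends a named element, URL-encoded so it cannot contain a raw '/' or '@'. */
    PathSketch name(String element) {
        encoded.append(URLEncoder.encode(element, StandardCharsets.UTF_8)).append('/');
        return this;
    }

    /** Appends a list index using the reserved '@' marker. */
    PathSketch index(int i) {
        encoded.append('@').append(i).append('/');
        return this;
    }

    /** Returns the string form that would be stored as the column name. */
    String encoded() {
        return encoded.toString();
    }

    public static void main(String[] args) {
        String path = new PathSketch().name("addresses").index(0).name("a/b@c").encoded();
        System.out.println(path);  // prints addresses/@0/a%2Fb%40c/
    }
}
```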

Page 44: C* path

Challenge: “Shrinking Lists”

➀ Write a list.

dao.writeToPath(key, "x", {"1","2"});

x/@0/   "1"
x/@1/   "2"

➁ Write a shorter list.

dao.writeToPath(key, "x", {"3"});

x/@0/   "3"
x/@1/   "2"

➂ Read the list.

dao.readFromPath(key, "x", new TypeReference<List<String>>() {});

{"3","2"}

Page 45: C* path

Challenge: “Shrinking Lists”

Solution: Implementation writes a list terminator value.

dao.writeToPath(key, "x", {"1","2"});

x/@0/   "1"
x/@1/   "2"
x/@2/   0xFFFFFFFF

dao.writeToPath(key, "x", {"3"});

x/@0/   "3"
x/@1/   0xFFFFFFFF
x/@2/   0xFFFFFFFF

dao.readFromPath(key, "x", new TypeReference<List<String>>() {});

{"3"}
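The terminator idea can be simulated in memory. This is a sketch, not the library's code: a `TreeMap` stands in for one Cassandra row in which writes only upsert columns, a sentinel object stands in for the 0xFFFFFFFF value on the slide, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ListTerminatorSketch {
    // Stand-in for the terminator value shown as 0xFFFFFFFF on the slide.
    static final Object TERMINATOR = new Object();

    // Stand-in for one row: column name (path) -> value. Writes never delete.
    final Map<String, Object> row = new TreeMap<>();

    /** Writes each element under "@<index>" and a terminator after the last one. */
    void writeList(String path, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            row.put(path + "/@" + i + "/", values.get(i));
        }
        row.put(path + "/@" + values.size() + "/", TERMINATOR);
    }

    /** Reads elements in index order and stops at the first terminator. */
    List<String> readList(String path) {
        List<String> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            Object value = row.get(path + "/@" + i + "/");
            if (value == null || value == TERMINATOR) {
                return result;
            }
            result.add((String) value);
        }
    }

    public static void main(String[] args) {
        ListTerminatorSketch store = new ListTerminatorSketch();
        store.writeList("x", Arrays.asList("1", "2"));
        store.writeList("x", Arrays.asList("3"));
        System.out.println(store.readList("x"));  // prints [3]
    }
}
```

The old terminator at `x/@2/` is still present after the second write; the reader simply never gets that far.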

Page 46: C* path

Challenge: “Shrinking Lists”

Solution: Implementation writes a list terminator value.

Unfortunately, this is only a partial solution, because it is still possible to read “stale” list elements using a positional index in the path.

This can be avoided by doing a delete before a write, but for performance reasons the library will not do that automatically.

Conclusion: The user must know what they are doing and understand the implementation.
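The stale-element case and the delete-before-write workaround can be shown with the same kind of in-memory stand-in. Again a sketch under assumptions: the `TreeMap` plays the role of one row, the sentinel plays the role of the terminator, and the method names only mirror the slides and are not the library's implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

final class StaleElementSketch {
    static final Object TERMINATOR = new Object();        // stand-in for the list terminator
    final TreeMap<String, Object> row = new TreeMap<>();  // stand-in for one row

    void writeList(String path, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            row.put(path + "/@" + i + "/", values.get(i));
        }
        row.put(path + "/@" + values.size() + "/", TERMINATOR);
    }

    /** Removes every column under the path; the step the caller has to request explicitly. */
    void deletePath(String path) {
        row.keySet().removeIf(column -> column.startsWith(path + "/"));
    }

    public static void main(String[] args) {
        StaleElementSketch store = new StaleElementSketch();
        store.writeList("x", Arrays.asList("1", "2", "3"));
        store.writeList("x", Arrays.asList("9"));
        // A positional read past the terminator still finds the old element.
        System.out.println(store.row.get("x/@2/"));          // prints 3

        store.deletePath("x");
        store.writeList("x", Arrays.asList("9"));
        System.out.println(store.row.containsKey("x/@2/"));  // prints false
    }
}
```

The delete costs an extra operation per write, which is the performance trade-off the slide refers to.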

Page 48: C* path

Challenge: Inconsistent Updates

Because objects can be updated at any path, there is no protection against a write “corrupting” an object structure.

Path path = dao.createPath("x");
dao.writeToPath(key, path, person1);

x/address/street/        "Singel 45"
x/name/                  "John"

path = dao.createPath("x", "name");
dao.writeToPath(key, path, person1);

x/address/street/        "Singel 45"
x/name/                  "John"
x/name/address/street/   "Singel 45"
x/name/name/             "John"   ✘

Page 49: C* path

Challenge: Inconsistent Updates

Solution: Don’t do that!

* If it does happen...

The implementation provides a way to still get the “corrupted” data as simple structures, but an attempt to convert to a now incompatible POJO will fail.

Conclusion: The user must know what they are doing and understand the implementation.

Page 52: C* path

Issue: Sorting

Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?

Instead of storing paths as strings, the implementation could have used DynamicComposite.

We tried it.

It can work. CQL supports it as a user-defined type.

Unfortunately it causes cqlsh to crash, making it difficult to “browse” the data.

Using DynamicComposite for paths is still under consideration for a future version.

Page 55: C* path

Cassandra Data Model

Page 56: C* path

Thrift

column family

row key   column name         column value
<UUID>    x/address/street/   "Singel 45"
          x/name              "John"
          …                   …

- OR -

super column family (coming soon)

row key   super column name   column name       column value
<UUID>    x                   address/street/   "Singel 45"
                              name              "John"
                              …                 …

Page 57: C* path

Thrift

ColumnFamilyOperations<K,String,Object> operations =
    new ColumnFamilyTemplate<K,String,Object>(
        keyspace, KeySerializer, StringSerializer, StructureSerializer);

StructuredDataSupport<K> dao = new ThriftStructuredDataSupport<K>(operations);

Thrift implementation relies on the Hector client.

Page 58: C* path

CQL

CREATE TABLE person (
    key text,
    path text,
    value text,
    PRIMARY KEY (key, path)
)

• Cannot use the path itself as a column name because it is “dynamic”

• Dynamic column family

Page 59: C* path

CQL: Data Model Constraints

• Need to do a range (“slice”) query on the path ⇒ path must be a clustering key

• Also, the path must be the first clustering key, since otherwise every query would have to provide an equality condition on the preceding clustering keys.

• One might try putting a secondary index on the path instead of making it a clustering key, but this doesn’t work, since Cassandra secondary indexes only support equality conditions:
Bad Request: No indexed columns present in by-columns clause with Equal operator

CREATE TABLE person ( key text, path text, value text, PRIMARY KEY (key, path) )
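To make the range ("slice") requirement concrete, here is a sketch of how a path prefix can be turned into range bounds. The query string and the upper-bound trick are assumptions for illustration and are not taken from the library source; the bound assumes paths are URL-encoded ASCII, so appending the highest `char` value sorts after every path that starts with the prefix. A `TreeMap` stands in for one partition ordered by the `path` clustering key.

```java
import java.util.SortedMap;
import java.util.TreeMap;

final class PathSliceSketch {
    // Assumed query shape against the person table shown on the slide.
    static final String SLICE_QUERY =
        "SELECT path, value FROM person WHERE key = ? AND path >= ? AND path < ?";

    /** Exclusive upper bound for all paths that start with the given prefix. */
    static String upperBound(String prefix) {
        return prefix + Character.MAX_VALUE;
    }

    /** The same slice, run against an in-memory map ordered by path. */
    static SortedMap<String, String> slice(TreeMap<String, String> partition, String prefix) {
        return partition.subMap(prefix, upperBound(prefix));
    }

    public static void main(String[] args) {
        TreeMap<String, String> partition = new TreeMap<>();
        partition.put("x/address/street/", "Singel 45");
        partition.put("x/name/", "John");
        partition.put("y/name/", "Mary");
        System.out.println(slice(partition, "x/").keySet());
    }
}
```

With an equality-only index there is no way to express the `>=` and `<` pair, which is why the path has to be a clustering key.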

Page 60: C* path

CQL

StructuredDataSupport<K> dao = new CqlStructuredDataSupport<K>(
    String tableName,
    String partitionKeyColumnName,
    String pathColumnName,
    String valueColumnName,
    Session session);

CQL implementation relies on the DataStax Java driver.

Page 61: C* path

And the rest…

Page 62: C* path

Planned Features

• Sets with simple values: element values stored in path

• DynamicComposites?

• Multiple row reads and writes

• Slice queries on path ranges

Page 63: C* path

Credits and Acknowledgements

• Thanks to Joost van de Wijgerd at eBuddy for his ideas and feedback

• Jackson JSON Processor, which is core to the C* Path implementation
http://wiki.fasterxml.com/JacksonHome

• Image credits:

Slide: Some Strategies; image name: binary; author: noegranado; link: http://www.flickr.com/photos/43360884@N04/6949896929/
