Library for decomposing your structured data and storing it in Cassandra. Same simple API implemented for both Thrift and CQL.
#CASSANDRAEU CASSANDRASUMMITEU
C* Path: Denormalize your data
Eric Zoerner | Software Developer, eBuddy BV
Cassandra Summit Europe 2013, London
Tuesday 22 October 2013
Topics
• About eBuddy
• Introducing C* Path
• How does it work?
• Design and Challenges
• Cassandra Data Model
• Futures
About eBuddy
XMS
Cassandra in eBuddy Messaging Platform
• User Data Service
• User Discovery Service
• Persistent Session Store
• Message History
• Location-based Discovery
Some Statistics
• Current size of data: 1.4 TB total (3x replication); 467 GB actual data
• 16 million sessions (11 million users plus groups)
• Almost a billion rows in one column family (inverse social graph)
C* Path
The Problem (a “classic”)
Complex Object
Person: name: String, birthdate: Date, nickname: String
Address (1 Person to * Addresses): street: String, city: String, province: String, postalCode: String, countryCode: String
Phone (1 Person to * Phones): name: String, number: String
?? How does the complex object map onto a Key-Value Store (RDB table, NoSQL, etc.)?
Some Strategies
Serialization!
Normalization!
[Figure: normalized tables Person (id, name, birthdate, nickname), Address (address_id, person_id, street, city), and Phone (person_id, name, phone), with sample rows for persons 110 and 111]
Decomposition!
path                   value
name/                  John
addresses/@0/street    123 Main St.
phones/@0/number       +31123456789
...                    ...
Strategies Comparison
                     Serialization   Normalization   Decomposition
Single Write         ✔               ✘               ✔
Single Read          ✔               ✘               ✔
Consistent Updates   ✔               ✔               not enforced
Structural Access    ✘               ✔               ✔
Cycles               ✔               ✔               ✘
C* Path
Open Source Java Library for decomposing complex objects into Path-Value pairs and storing them in Cassandra
https://github.com/ebuddy/c-star-path
* Artifacts available at Maven Central.
C* Path: Decomposition
• Easy to Use
• Simple API
• Good for Cassandra because:
– Structural Access: Write parts of objects without reading first
– Good for denormalizing data, can read or write large complex objects with one read or write operation
How does it work?
API Example - Write to a Path
StructuredDataSupport<UUID> dao = … ;
UUID rowKey = … ;
Pojo pojo = … ;
Path path = dao.createPath("some", "path", "to", "my", "pojo");
dao.writeToPath(rowKey, path, pojo);
API Example - Read from a Path
Path path = dao.createPath("some", "path", "to", "my", "pojo");
Pojo pojo = dao.readFromPath(rowKey, path, new TypeReference<Pojo>() { });
API Example - Delete
dao.deletePath(rowKey, path);
API Example - Batch Operations
BatchContext batch = dao.beginBatch();
dao.writeToPath(rowKey1, path, pojo1, batch);
dao.writeToPath(rowKey2, path, pojo2, batch);
dao.deletePath(rowKey3, path, batch);
dao.applyBatch(batch);
Read or write at any level of a path
Person person = …;
Path path = dao.createPath("x");
dao.writeToPath(rowKey, path, person);
Path pathToName = path.withElements("name");
String name = dao.readFromPath(rowKey, pathToName, stringTypeReference);
Write Implementation: Decomposition
• Step 1:
– Convert the domain object into a basic structure of Maps, Lists, and simple values. The library uses Jackson (FasterXML) for this and honors Jackson annotations.
• Step 2:
– Decompose this basic structure into a map of paths to simple values (i.e. String, Number, Boolean); this is done by the Decomposer.
• Step 3:
– Write this map as key-value pairs in the database
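Step 2 above can be sketched roughly as follows. This is a minimal illustration, not the library's Decomposer: the class name `DecomposeSketch` is hypothetical, every path is assumed to end with a trailing slash, and path elements are not URL encoded here. It takes the Maps, Lists, and simple values produced by step 1 and flattens them into path-value pairs, using `@` plus the index for list elements.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for step 2; not the library's own Decomposer. */
final class DecomposeSketch {

    /** Flattens nested Maps and Lists into a map of path to simple value. */
    static Map<String, Object> decompose(Object structure) {
        Map<String, Object> out = new LinkedHashMap<>();
        walk("", structure, out);
        return out;
    }

    private static void walk(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map<?, ?>) {
            // Each map key becomes one path element.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(prefix + e.getKey() + "/", e.getValue(), out);
            }
        } else if (node instanceof List<?>) {
            // Each list position becomes an "@index" path element.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + "@" + i + "/", list.get(i), out);
            }
        } else {
            // Simple value (String, Number, Boolean): emit one path-value pair.
            out.put(prefix, node);
        }
    }
}
```

Calling `DecomposeSketch.decompose(...)` on a map with a `name` entry and an `addresses` list yields keys such as `name/` and `addresses/@0/street/`, which step 3 would then write as key-value pairs.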
Example Decomposition - step 1
Simplify structure into regular Maps, Lists, and simple values
Map
  name = "John"
  birthdate = "-39080932298"
  nickname = "Jack"
  addresses = <List>
    [0] = <Map>
      street = "123 Main"
      place = "New York"
    [1] = <Map>
      street = "Singel 45"
      place = "Amsterdam"
  phones = <List>
    [0] = <Map>
      name = "mobile"
      number = "+31651234567"
Example Decomposition - step 2
path                   value
name/                  "John"
birthdate/             "-39080932298"
nickname/              "Jack"
addresses/@0/street    "123 Main St."
addresses/@0/place     "New York"
addresses/@1/street    "Singel 45"
addresses/@1/place     "Amsterdam"
phones/@0/name         "mobile"
phones/@0/number       "+31651234567"
Read implementation: Composition
• Step 1:
– Read path-value pairs from database
• Step 2:
– “Merge” path-value maps back into a basic structure (Maps, Lists, simple values), done by Composer
• Step 3:
– Use Jackson to convert basic structure back into domain object using a TypeReference
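The merge in step 2 can be sketched as below. This is an illustrative assumption about how such a merge might look, not the library's Composer: the class name `ComposeSketch` is hypothetical, paths are assumed to be slash-separated with a trailing slash, and a map whose keys all start with `@` is treated as a list ordered by index.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the merge step; not the library's own Composer. */
final class ComposeSketch {

    /** Rebuilds nested Maps and Lists from a map of path to simple value. */
    static Object compose(Map<String, Object> pathValues) {
        Map<String, Object> root = new TreeMap<>();
        for (Map.Entry<String, Object> e : pathValues.entrySet()) {
            String[] elements = e.getKey().split("/");
            Map<String, Object> node = root;
            // Walk (and create) intermediate maps, then set the leaf value.
            for (int i = 0; i < elements.length - 1; i++) {
                node = child(node, elements[i]);
            }
            node.put(elements[elements.length - 1], e.getValue());
        }
        return listify(root);
    }

    // Throws ClassCastException if a simple value already sits where a map is needed,
    // which is what an inconsistent ("corrupted") structure would look like here.
    @SuppressWarnings("unchecked")
    private static Map<String, Object> child(Map<String, Object> parent, String name) {
        return (Map<String, Object>) parent.computeIfAbsent(name, k -> new TreeMap<String, Object>());
    }

    /** Converts every map keyed only by "@index" elements into a List, bottom up. */
    @SuppressWarnings("unchecked")
    private static Object listify(Object node) {
        if (!(node instanceof Map)) {
            return node;
        }
        Map<String, Object> map = (Map<String, Object>) node;
        map.replaceAll((k, v) -> listify(v));
        boolean isList = !map.isEmpty() && map.keySet().stream().allMatch(k -> k.startsWith("@"));
        if (!isList) {
            return map;
        }
        // Order by numeric index so that "@10" comes after "@2".
        TreeMap<Integer, Object> byIndex = new TreeMap<>();
        map.forEach((k, v) -> byIndex.put(Integer.parseInt(k.substring(1)), v));
        return new ArrayList<>(byIndex.values());
    }
}
```

The resulting structure of Maps and Lists is what step 3 would hand to Jackson for conversion into the domain object.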
Design & Challenges
Path Encoding
• Paths stored as strings
• Forward slashes in paths (but hidden by Path API)
• Path elements are internally URL encoded allowing use of special characters in the implementation
• Special characters: @ for list indices (@0, @1, @2, ...)
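The encoding rules above might look something like the following sketch. The class and method names are hypothetical and the library's actual encoding may differ in detail; the point is only that URL encoding user-supplied elements frees `/` and `@` for use as the separator and the list-index marker.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical illustration of path encoding; not the library's Path class. */
final class PathEncodingSketch {

    /** URL encodes one user-supplied element, so "/" and "@" inside it are escaped. */
    static String encodeElement(String element) {
        try {
            return URLEncoder.encode(element, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /** Reverses encodeElement. */
    static String decodeElement(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /** List indices are written unencoded, so "@" is unambiguous in a stored path. */
    static String indexElement(int index) {
        return "@" + index;
    }

    /** Joins already-encoded elements with "/" (trailing slash assumed). */
    static String join(String... encodedElements) {
        StringBuilder sb = new StringBuilder();
        for (String element : encodedElements) {
            sb.append(element).append('/');
        }
        return sb.toString();
    }
}
```

With this scheme an element such as `a/b@c` is stored as `a%2Fb%40c`, so it cannot be mistaken for two elements or for a list index.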
Challenge: “Shrinking Lists”
➀ Write a list.
dao.writeToPath(key, "x", {"1","2"});
x/@0/ "1"
x/@1/ "2"
➁ Write a shorter list.
dao.writeToPath(key, "x", {"3"});
x/@0/ "3"
x/@1/ "2"
➂ Read the list.
dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
{"3","2"} ✘
Challenge: “Shrinking Lists”
Solution: Implementation writes a list terminator value. ✔
dao.writeToPath(key, "x", {"1","2"});
x/@0/ "1"
x/@1/ "2"
x/@2/ 0xFFFFFFFF
dao.writeToPath(key, "x", {"3"});
x/@0/ "3"
x/@1/ 0xFFFFFFFF
x/@2/ 0xFFFFFFFF
dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
{"3"} ✔
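The terminator idea can be sketched with an in-memory map standing in for one Cassandra row. Everything here is an assumption made for illustration: the class name, the `TERMINATOR` sentinel object (the slides show the stored value as 0xFFFFFFFF), and the trailing-slash path form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a list terminator; not the library's implementation. */
final class ListTerminatorSketch {

    /** Sentinel standing in for the terminator value shown on the slides. */
    static final Object TERMINATOR = new Object();

    /** Writes the elements under prefix/@i/ plus one terminator; older columns are not deleted. */
    static void writeList(Map<String, Object> row, String prefix, List<?> values) {
        for (int i = 0; i < values.size(); i++) {
            row.put(prefix + "@" + i + "/", values.get(i));
        }
        row.put(prefix + "@" + values.size() + "/", TERMINATOR);
    }

    /** Reads elements in index order and stops at the first terminator or missing index. */
    static List<Object> readList(Map<String, Object> row, String prefix) {
        List<Object> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            Object value = row.get(prefix + "@" + i + "/");
            if (value == null || value == TERMINATOR) {
                return result;
            }
            result.add(value);
        }
    }
}
```

Writing `{"1","2"}` and then `{"3"}` under `x/` makes `readList` return only `"3"`, while the column at `x/@2/` written by the first call is still physically present in the row.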
Challenge: “Shrinking Lists”
Solution: Implementation writes a list terminator value. ✔
Unfortunately, this is only a partial solution, because it is still possible to read “stale” list elements using a positional index in the path.
This can be avoided by doing a delete before a write, but for performance reasons the library will not do that automatically.
Conclusion: The user must know what they are doing and understand the implementation.
Challenge: Inconsistent Updates
Because objects can be updated at any path, there is no protection against a write “corrupting” an object structure.
Path path = dao.createPath("x");
dao.writeToPath(key, path, person1);
x/address/street/ "Singel 45"
x/name/ "John"
path = dao.createPath("x", "name");
dao.writeToPath(key, path, person1);
x/address/street/ "Singel 45"
x/name/ "John"
x/name/address/street/ "Singel 45"
x/name/name/ "John" ✘
Challenge: Inconsistent Updates
Solution: Don’t do that! ✔
* If it does happen...
The implementation provides a way to still get the “corrupted” data as simple structures, but an attempt to convert to a now incompatible POJO will fail.
Conclusion: The user must know what they are doing and understand the implementation.
Issue: Sorting
Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?
Instead of storing paths as strings, the implementation could have used DynamicComposite.
We tried it.
It can work. CQL supports it as a user-defined type.
Unfortunately it causes cqlsh to crash, making it difficult to “browse” the data.
Using DynamicComposite for paths is still under consideration for a future version.
Cassandra Data Model
Thrift
column family:
row key    column name          column value
<UUID>     x/address/street/    "Singel 45"
           x/name               "John"
           …                    …
- OR -
super column family (coming soon):
row key    super column name    column name        column value
<UUID>     x                    address/street/    "Singel 45"
           x                    name               "John"
                                …                  …
Thrift
ColumnFamilyOperations<K,String,Object> operations =
    new ColumnFamilyTemplate<K,String,Object>(keyspace, KeySerializer, StringSerializer, StructureSerializer);
StructuredDataSupport<K> dao = new ThriftStructuredDataSupport<K>(operations);
Thrift implementation relies on the Hector client.
CQL
CREATE TABLE person (
  key text,
  path text,
  value text,
  PRIMARY KEY (key, path)
)
• Cannot use the path itself as a column name because it is “dynamic”
• Dynamic column family
CQL: Data Model Constraints
• Need to do a range (“slice”) query on the path ⇒ path must be a clustering key
• Also, the path must be the first clustering key, since otherwise a query would have to provide an equals condition on all preceding clustering keys.
• One might try putting a secondary index on the path instead of making it a clustering key, but this doesn’t work, since Cassandra secondary indexes only work with equals conditions:
Bad Request: No indexed columns present in by-columns clause with Equal operator
CREATE TABLE person ( key text, path text, value text, PRIMARY KEY (key, path) )
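Why a clustering key makes the slice possible can be illustrated without Cassandra at all. In the sketch below a `TreeMap` stands in for one partition, whose rows are kept sorted by the clustering key (the path). The class and method names are hypothetical, and the upper bound assumes that no stored path contains the character U+FFFF, which holds for URL-encoded paths.

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical illustration of a path-prefix slice over sorted keys. */
final class PathSliceSketch {

    /** Returns all entries whose path starts with the given prefix, as one contiguous range. */
    static SortedMap<String, String> slice(NavigableMap<String, String> partition, String prefix) {
        // Paths with this prefix sort at or after the prefix itself
        // and before the prefix followed by the largest char value.
        return partition.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    /** Builds a small sample partition for experimenting with slice. */
    static NavigableMap<String, String> samplePartition() {
        NavigableMap<String, String> partition = new TreeMap<>();
        partition.put("x/address/street/", "Singel 45");
        partition.put("x/name/", "John");
        partition.put("y/name/", "Mary");
        return partition;
    }
}
```

Slicing the sample partition on `x/` returns the two `x/...` entries, and slicing on `x/address/` returns only the street. Because the matching rows are adjacent in sort order, the lookup is a range scan rather than an equality match, which is the kind of condition a secondary index cannot serve.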
CQL
StructuredDataSupport<K> dao = new CqlStructuredDataSupport<K>(
    String tableName,
    String partitionKeyColumnName,
    String pathColumnName,
    String valueColumnName,
    Session session);
CQL implementation relies on the DataStax Java driver.
And the rest…
Planned Features
• Sets with simple values: element values stored in path
• DynamicComposites?
• Multiple row reads and writes
• Slice queries on path ranges
Credits and Acknowledgements
• Thanks to Joost van de Wijgerd at eBuddy for his ideas and feedback
• Jackson JSON Processor, which is core to the C* Path implementation: http://wiki.fasterxml.com/JacksonHome
• Image credits:
Slide: Some Strategies; image name: binary; author: noegranado; link: http://www.flickr.com/photos/43360884@N04/6949896929/