Speaker: Eric Zoerner, Senior Software Developer at eBuddy
Video: http://www.youtube.com/watch?v=fwgCJ2MzakA&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=12
In this session you'll learn about the design and implementation of a new open source general-purpose Java library that supports storing structured data in Cassandra. Instead of mapping the data to multiple tables like an ORM would or embedding data using serialization, this approach decomposes structured data of arbitrary complexity into separate columns of simple values, allowing the data to be retrieved or updated in parts using hierarchical paths. Implementations are included for Cassandra using both the Thrift and CQL3 APIs. Eric also shares his experiences with the challenges of using CQL3 vs. Thrift for schema-less data.
#CASSANDRAEU CASSANDRASUMMITEU
C* Path: Denormalize your data
Eric Zoerner | Software Developer, eBuddy BV
Cassandra Summit Europe 2013, London
About eBuddy
XMS
Cassandra in eBuddy Messaging Platform
• User Data Service
• User Discovery Service
• Persistent Session Store
• Message History
• Location-based Discovery
Some Statistics
• Current size of data: 1.4 TB total (3x replication); 467 GB actual data
• 12 million sessions (11 million users plus groups)
• Almost a billion rows in one column family (inverse social graph)
C* Path
The Problem (a “classic”)
Complex Object:
• Person (name: String, birthdate: Date, nickname: String)
• Address (street: String, city: String, province: String, postalCode: String, countryCode: String)
• Phone (name: String, number: String)
One Person has many Addresses (1 to *) and many Phones (1 to *).
How should this be stored in a Key-Value Store (RDB table, NoSQL, etc.)?
Some Strategies
• Serialization
• Normalization (three tables joined on person_id, shown with sample rows for person ids 110 and 111)
– Person (id, name, nickname, birthdate)
– Address (person_id, address_id, street, city)
– Phone (person_id, name, phone)
• Decomposition
path                   value
name/                  John
addresses/@0/street    123 Main St.
phones/@0/number       +31123456789
...                    ...
Strategies Comparison
                     Serialization   Normalization   Decomposition
Single Write         ✔               ✘               ✔
Single Read          ✔               ✘               ✔
Consistent Updates   ✔               ✔               not enforced
Structural Access    ✘               ✔               ✔
Cycles               ✔               ✔               ✘
C* Path
Open Source Java Library for decomposing complex objects into Path-Value pairs — and storing them in Cassandra
https://github.com/ebuddy/c-star-path
* Artifacts available at Maven Central.
C* Path: Decomposition
• Easy to Use
• Simple API
• Good for Cassandra because:
– Structural Access: Write parts of objects without reading first
– Good for denormalizing data: large complex objects can be read or written with a single read or write operation
How does it work?
API Example - Write to a Path
StructuredDataSupport<UUID> dao = … ;
UUID rowKey = … ;
Pojo pojo = … ;
Path path = dao.createPath("some", "path", "to", "my", "pojo");
dao.writeToPath(rowKey, path, pojo);
API Example - Read from a Path
Path path = dao.createPath("some", "path", "to", "my", "pojo");
Pojo pojo = dao.readFromPath(rowKey, path, new TypeReference<Pojo>() { });
API Example - Delete
dao.deletePath(rowKey, path);
API Example - Batch Operations
BatchContext batch = dao.beginBatch();
dao.writeToPath(rowKey1, path, pojo1, batch);
dao.writeToPath(rowKey2, path, pojo2, batch);
dao.deletePath(rowKey3, path, batch);
dao.applyBatch(batch);
Read or write at any level of a path
Person person = …;
Path path = dao.createPath("x");
dao.writeToPath(rowKey, path, person);
Path pathToName = path.withElements("name");
String name = dao.readFromPath(rowKey, pathToName, stringTypeReference);
Write Implementation: Decomposition
• Step 1:
– Convert the domain object into a basic structure of Maps, Lists, and simple values. This step uses the Jackson (FasterXML) library and honors Jackson annotations
• Step 2:
– Decompose this basic structure into a map of paths to simple values (i.e. String, Number, Boolean); this is done by the Decomposer
• Step 3:
– Write this map as key-value pairs in the database
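Step 2 can be illustrated with a minimal sketch. This is not the library's actual Decomposer: the class name DecomposeSketch, the trailing slash on every path, and the omission of URL encoding and list terminators are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of step 2; not the library's actual Decomposer.
class DecomposeSketch {

    // Flattens nested Maps and Lists into a sorted map of path -> simple value.
    static Map<String, Object> decompose(Object structure) {
        Map<String, Object> pathValues = new TreeMap<>();
        walk("", structure, pathValues);
        return pathValues;
    }

    private static void walk(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            // Each map key becomes one path element.
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                walk(prefix + entry.getKey() + "/", entry.getValue(), out);
            }
        } else if (node instanceof List) {
            // Each list position becomes an "@index" path element.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + "@" + i + "/", list.get(i), out);
            }
        } else {
            // Simple value (String, Number, Boolean): store it under the full path.
            out.put(prefix, node);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> phone = new LinkedHashMap<>();
        phone.put("name", "mobile");
        phone.put("number", "+31651234567");
        List<Object> phones = new ArrayList<>();
        phones.add(phone);

        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "John");
        person.put("phones", phones);

        decompose(person).forEach((path, value) -> System.out.println(path + " -> " + value));
    }
}
```

In this sketch an empty Map or List produces no paths at all, which is one reason a real implementation needs extra markers such as the list terminator discussed later.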
Example Decomposition - step 1
Simplify structure into regular Maps, Lists, and simple values
Map
  name = "John"
  birthdate = "-39080932298"
  nickname = "Jack"
  addresses = <List>
    [0] = <Map>
      street = "123 Main"
      place = "New York"
    [1] = <Map>
      street = "Singel 45"
      place = "Amsterdam"
  phones = <List>
    [0] = <Map>
      name = "mobile"
      number = "+31651234567"
Example Decomposition - step 2
path                   value
name/                  "John"
birthdate/             "-39080932298"
nickname/              "Jack"
addresses/@0/street    "123 Main St."
addresses/@0/place     "New York"
addresses/@1/street    "Singel 45"
addresses/@1/place     "Amsterdam"
phones/@0/name         "mobile"
phones/@0/number       "+31651234567"
Read implementation: Composition
• Step 1:
– Read path-value pairs from database
• Step 2:
– “Merge” path-value maps back into the basic structure (Maps, Lists, simple values); this is done by the Composer
• Step 3:
– Use Jackson to convert basic structure back into domain object using a TypeReference
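The merge in step 2 can be sketched roughly as follows. This is an illustration only, not the library's Composer: the class name ComposeSketch is hypothetical, paths are assumed to be slash-separated with "@index" elements for list positions, and URL decoding and list terminators are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the "merge" step; not the library's actual Composer.
class ComposeSketch {

    // Rebuilds nested Maps and Lists from path -> value pairs.
    @SuppressWarnings("unchecked")
    static Object compose(Map<String, Object> pathValues) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : pathValues.entrySet()) {
            String[] elements = entry.getKey().split("/");
            Map<String, Object> current = root;
            // Walk or create the intermediate maps. A ClassCastException here means
            // a path is both a leaf and an interior node (a "corrupted" structure).
            for (int i = 0; i < elements.length - 1; i++) {
                Object child = current.computeIfAbsent(
                        elements[i], k -> new LinkedHashMap<String, Object>());
                current = (Map<String, Object>) child;
            }
            current.put(elements[elements.length - 1], entry.getValue());
        }
        return listify(root);
    }

    // Turns every map whose keys are all "@index" elements into a List.
    @SuppressWarnings("unchecked")
    private static Object listify(Object node) {
        if (!(node instanceof Map)) {
            return node;
        }
        Map<String, Object> map = (Map<String, Object>) node;
        boolean isList = !map.isEmpty();
        for (String key : map.keySet()) {
            isList &= key.startsWith("@");
        }
        if (!isList) {
            Map<String, Object> result = new LinkedHashMap<>();
            map.forEach((k, v) -> result.put(k, listify(v)));
            return result;
        }
        // Order by numeric index, since "@10" sorts before "@2" as a string.
        TreeMap<Integer, Object> byIndex = new TreeMap<>();
        map.forEach((k, v) -> byIndex.put(Integer.parseInt(k.substring(1)), listify(v)));
        return new ArrayList<>(byIndex.values());
    }

    public static void main(String[] args) {
        Map<String, Object> pathValues = new TreeMap<>();
        pathValues.put("name/", "John");
        pathValues.put("phones/@0/name/", "mobile");
        pathValues.put("phones/@0/number/", "+31651234567");
        System.out.println(compose(pathValues));
    }
}
```

The result is the same kind of Map/List structure that step 3 would hand to Jackson for conversion into a domain object.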
Design & Challenges
Path Encoding
• Paths stored as strings
• Forward slashes in paths (but hidden by Path API)
• Path elements are internally URL encoded allowing use of special characters in the implementation
• Special characters: @ for list indices (@0, @1, @2, ...)
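A small sketch of why URL encoding frees up special characters for the implementation. The class PathSketch and its methods are hypothetical and only mimic the idea; the library's own Path API may differ. User-supplied elements are encoded with java.net.URLEncoder, while the list-index marker is appended unencoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical path builder illustrating the encoding idea; not the library's Path class.
class PathSketch {
    private final StringBuilder encoded = new StringBuilder();

    // User-supplied element: URL encoded, so "/" and "@" inside it cannot be
    // confused with the separator or with a list-index marker.
    PathSketch element(String name) {
        try {
            encoded.append(URLEncoder.encode(name, "UTF-8")).append('/');
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return this;
    }

    // Implementation-generated list index: written with a raw "@".
    PathSketch index(int i) {
        encoded.append('@').append(i).append('/');
        return this;
    }

    String encoded() {
        return encoded.toString();
    }

    public static void main(String[] args) {
        // A user element containing a slash, next to a real list index.
        System.out.println(new PathSketch().element("addresses").index(0).element("a/b").encoded());
        // A user element that merely looks like a list index.
        System.out.println(new PathSketch().element("@0").encoded());
    }
}
```

Because a literal "@" in a user element is stored as "%40", a raw "@" in a stored path can only have come from the implementation.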
Challenge: “Shrinking Lists”
➀ Write a list.
dao.writeToPath(key, "x", {"1","2"});
x/@0/ "1"
x/@1/ "2"
➁ Write a shorter list.
dao.writeToPath(key, "x", {"3"});
x/@0/ "3"
x/@1/ "2"
➂ Read the list.
dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
{"3","2"} ✘
Challenge: “Shrinking Lists”
Solution: Implementation writes a list terminator value. ✔
dao.writeToPath(key, "x", {"1","2"});
x/@0/ "1"
x/@1/ "2"
x/@2/ 0xFFFFFFFF
dao.writeToPath(key, "x", {"3"});
x/@0/ "3"
x/@1/ 0xFFFFFFFF
x/@2/ 0xFFFFFFFF
dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
{"3"} ✔
Unfortunately, this is only a partial solution, because it is still possible to read “stale” list elements using a positional index in the path.
This can be avoided by doing a delete before a write, but for performance reasons the library will not do that automatically.
Conclusion: The user must know what they are doing and understand the implementation.
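The terminator idea and its limitation can be simulated in memory. In the sketch below a TreeMap stands in for one Cassandra row, and a sentinel object stands in for the 0xFFFFFFFF terminator value; the class and method names are hypothetical and do not come from the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of list terminators; not library code.
class ListTerminatorSketch {
    // Stand-in for the terminator value written after the last list element.
    static final Object TERMINATOR = new Object();

    // Stand-in for one row: path -> value, overwritten column by column.
    final Map<String, Object> row = new TreeMap<>();

    void writeList(String path, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            row.put(path + "/@" + i + "/", values.get(i));
        }
        // Mark the end of the list; older, higher-indexed columns are NOT deleted.
        row.put(path + "/@" + values.size() + "/", TERMINATOR);
    }

    List<String> readList(String path) {
        List<String> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            Object value = row.get(path + "/@" + i + "/");
            if (value == null || value == TERMINATOR) {
                return result; // stop at the terminator, ignoring anything after it
            }
            result.add((String) value);
        }
    }

    public static void main(String[] args) {
        ListTerminatorSketch store = new ListTerminatorSketch();
        store.writeList("x", Arrays.asList("1", "2", "3"));
        store.writeList("x", Arrays.asList("4"));

        // Reading the whole list honors the terminator.
        System.out.println(store.readList("x"));
        // Reading by positional index still finds the stale element from the first write.
        System.out.println(store.row.get("x/@2/"));
    }
}
```

The second read shows the partial nature of the solution: the column at x/@2/ was never overwritten or deleted, so addressing it directly returns the old value.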
Challenge: Inconsistent Updates
Because objects can be updated at any path, there is no protection against a write “corrupting” an object structure.
Path path = dao.createPath("x");
dao.writeToPath(key, path, person1);
x/address/street/ "Singel 45"
x/name/ "John"
path = dao.createPath("x", "name");
dao.writeToPath(key, path, person1);
x/address/street/ "Singel 45"
x/name/ "John"
x/name/address/street/ "Singel 45"
x/name/name/ "John" ✘
Challenge: Inconsistent Updates
Solution: Don’t do that! ✔
* If it does happen...
The implementation provides a way to still get the “corrupted” data as simple structures, but an attempt to convert it to a now incompatible POJO will fail.
Conclusion: The user must know what they are doing and understand the implementation.
Issue: Sorting
Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?
Instead of storing paths as strings, the implementation could have used DynamicComposite.
We tried it.
It can work. CQL supports it as a user-defined type.
Unfortunately it causes cqlsh to crash, making it difficult to “browse” the data.
Using DynamicComposite for paths in a future version is still under consideration.
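A short sketch of why this matters: when path elements are compared as strings, numeric elements do not come back in numeric order. The class and method names below are hypothetical and the example uses plain Java collections rather than Cassandra comparators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of string ordering versus typed ordering of path elements.
class PathSortSketch {

    // Order used when path elements are stored and compared as strings.
    static List<String> sortAsStrings(List<String> elements) {
        List<String> sorted = new ArrayList<>(elements);
        Collections.sort(sorted);
        return sorted;
    }

    // Order a typed comparator would give for numeric elements.
    static List<String> sortAsNumbers(List<String> elements) {
        List<String> sorted = new ArrayList<>(elements);
        sorted.sort(Comparator.comparingLong(Long::parseLong));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("2", "10", "1");
        System.out.println(sortAsStrings(elements)); // [1, 10, 2]
        System.out.println(sortAsNumbers(elements)); // [1, 2, 10]
    }
}
```

A composite column name with typed components would let the database apply the second ordering itself, which is the appeal of the DynamicComposite approach.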
Cassandra Data Model
Thrift
column family
row key    column name          column value
<UUID>     x/address/street/    "Singel 45"
           x/name               "John"
           …                    …
- OR -
super column family (coming soon)
row key    super column name    column name        column value
<UUID>     x                    address/street/    "Singel 45"
                                name               "John"
                                …                  …
Thrift
ColumnFamilyOperations<K,String,Object> operations =
    new ColumnFamilyTemplate<K,String,Object>(keyspace, KeySerializer, StringSerializer, StructureSerializer);
StructuredDataSupport<K> dao = new ThriftStructuredDataSupport<K>(operations);
Thrift implementation relies on the Hector client.
CQL
CREATE TABLE person ( key text, path text, value text, PRIMARY KEY (key, path) )
• Cannot use the path itself as a column name because it is “dynamic”
• Dynamic column family
CQL: Data Model Constraints
• Need to do a range (“slice”) query on the path ⇒ path must be a clustering key
• Also, the path must be the first clustering key, since otherwise a query would have to provide an equals condition on the preceding clustering keys.
• One might try putting a secondary index on the path instead of making it a clustering key, but this doesn’t work, since Cassandra indexes only work with equals conditions:
Bad Request: No indexed columns present in by-columns clause with Equal operator
CREATE TABLE person ( key text, path text, value text, PRIMARY KEY (key, path) )
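The range query that motivates these constraints can be sketched as follows. The column names key, path, and value come from the CREATE TABLE statement above; the class PathSliceSketch, its methods, and the way the upper bound is computed are assumptions for illustration, and a TreeMap stands in for one partition sorted by its clustering key.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of a prefix "slice" over the path clustering key; not library code.
class PathSliceSketch {

    // Exclusive upper bound for all strings starting with the prefix:
    // the prefix with its last character incremented ("x/" -> "x0").
    // Assumes a non-empty prefix that ends with '/'.
    static String upperBound(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // CQL for the slice: equality on the partition key, range on the clustering key.
    static String sliceQuery(String table) {
        return "SELECT path, value FROM " + table
                + " WHERE key = ? AND path >= ? AND path < ?";
    }

    // The same slice against an in-memory stand-in for one sorted partition.
    static NavigableMap<String, String> slice(NavigableMap<String, String> partition, String prefix) {
        return partition.subMap(prefix, true, upperBound(prefix), false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> partition = new TreeMap<>();
        partition.put("x/address/street/", "Singel 45");
        partition.put("x/name/", "John");
        partition.put("y/name/", "Mary");

        System.out.println(sliceQuery("person"));
        System.out.println("bind: <row key>, x/, " + upperBound("x/"));
        System.out.println(slice(partition, "x/"));
    }
}
```

The two bound values would be supplied as the second and third bind parameters of the query, so that everything stored under the path x comes back in one read.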
CQL
StructuredDataSupport<K> dao = new CqlStructuredDataSupport<K>(String tableName, String partitionKeyColumnName, String pathColumnName, String valueColumnName, Session session);
CQL implementation relies on the DataStax Java driver.
And the rest…
Planned Features
• Sets with simple values: element values stored in path
• DynamicComposites?
• Multiple row reads and writes
• Slice queries on path ranges
Credits and Acknowledgements
• Thanks to Joost van de Wijgerd at eBuddy for his ideas and feedback
• Jackson JSON Processor, which is core to the C* Path implementation: http://wiki.fasterxml.com/JacksonHome
• Image credits:
– “Some Strategies” slide: “binary” by noegranado, http://www.flickr.com/photos/43360884@N04/6949896929/