Introduction to
Apache Cassandra
(for Java Developers!)
Nate [email protected]@zznate
Overview
Apache Cassandra is NOT a "key/value store"
Columns are dynamic inside a column family
(but they don't have to be)
Gain an understanding of concepts in Apache Cassandra that have particular effect on application development
Brief Intro - Storage
SSTables are immutable
SSTables merged on reads
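A minimal sketch of what "merged on reads" can look like, assuming a toy in-memory model rather than Cassandra's real storage engine. The names `SSTableMergeSketch`, `Cell` and `read` are hypothetical. Each immutable table may hold its own version of a column, and the read keeps the version with the highest timestamp.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: illustrative names, not Cassandra classes.
class SSTableMergeSketch {

    /** One column value plus the write timestamp that orders it. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Freeze a map so it behaves like an immutable, already-flushed table. */
    static Map<String, Cell> sstable(Map<String, Cell> cells) {
        return Collections.unmodifiableMap(new HashMap<>(cells));
    }

    /** Read one column by consulting every table and keeping the newest cell. */
    static String read(List<Map<String, Cell>> sstables, String column) {
        Cell newest = null;
        for (Map<String, Cell> table : sstables) {
            Cell candidate = table.get(column);
            if (candidate != null && (newest == null || candidate.timestamp > newest.timestamp)) {
                newest = candidate;
            }
        }
        return newest == null ? null : newest.value;
    }

    /** Two tables hold different versions of "city"; only the older one has "state". */
    static String demo() {
        Map<String, Cell> older = new HashMap<>();
        older.put("city", new Cell("Austn", 1L));
        older.put("state", new Cell("TX", 1L));
        Map<String, Cell> newer = new HashMap<>();
        newer.put("city", new Cell("Austin", 2L));

        List<Map<String, Cell>> tables = new ArrayList<>();
        tables.add(sstable(older));
        tables.add(sstable(newer));
        return read(tables, "city") + ", " + read(tables, "state");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The corrected "city" value never overwrites the old table. It lands in a newer table and wins at read time because of its timestamp.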
Brief Intro - Compaction
Combine columns
Keep SSTable count down
Discard tombstones (more on this later)
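The three points above can be sketched as follows, assuming a toy model in which a `null` value stands for a tombstone. `CompactionSketch`, `Cell` and `compact` are hypothetical names. As a simplification, tombstones are dropped immediately, whereas Cassandra only discards them after a configurable grace period.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical names, not Cassandra's compaction code.
class CompactionSketch {

    /** A column version; a null value marks a tombstone (a recorded delete). */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Merge several tables into one, keeping the newest cell per column. */
    static Map<String, Cell> compact(List<Map<String, Cell>> sstables) {
        Map<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> table : sstables) {
            for (Map.Entry<String, Cell> entry : table.entrySet()) {
                Cell current = merged.get(entry.getKey());
                if (current == null || entry.getValue().timestamp > current.timestamp) {
                    merged.put(entry.getKey(), entry.getValue());
                }
            }
        }
        // Simplification: drop tombstones right away (no grace period in this sketch).
        merged.values().removeIf(Cell::isTombstone);
        return merged;
    }

    /** The newer table deletes "lat" and adds "state". */
    static String demo() {
        Map<String, Cell> older = new TreeMap<>();
        older.put("city", new Cell("Austin", 1L));
        older.put("lat", new Cell("30.27", 1L));
        Map<String, Cell> newer = new TreeMap<>();
        newer.put("lat", new Cell(null, 2L));
        newer.put("state", new Cell("TX", 2L));

        List<Map<String, Cell>> tables = new ArrayList<>();
        tables.add(older);
        tables.add(newer);
        return compact(tables).keySet().toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```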
Brief Intro - The Ring
All nodes share the same role: No single point of failure
Easy to scale
Simplified operations
Brief Intro - Consistency Level - ONE
Cassandra provides consistency when R + W > N
(read replica count + write replica count > replication factor).
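The rule above is plain arithmetic, so a small sketch can make it concrete. The helper names `quorum` and `readsSeeLatestWrite` are hypothetical; only the R + W > N inequality comes from the slide.

```java
// Hypothetical helper names; only the R + W > N rule comes from the slide.
class QuorumMath {

    /** Replicas that make up a majority of the replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must share at least one replica with every write set. */
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int n = 3;
        System.out.println("ONE + ONE: " + readsSeeLatestWrite(1, 1, n));
        System.out.println("QUORUM + QUORUM: " + readsSeeLatestWrite(quorum(n), quorum(n), n));
        System.out.println("ONE + ALL: " + readsSeeLatestWrite(1, n, n));
    }
}
```

With a replication factor of 3, QUORUM is 2 replicas, so QUORUM reads plus QUORUM writes satisfy the rule (2 + 2 > 3), while ONE plus ONE does not (1 + 1 is not greater than 3).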
Brief Intro - Consistency Level - QUORUM
Brief Intro - Read Repair
vs. RDBMS - Consistency Level
*** CONSISTENCY LEVEL FAILURE IS NOT A ROLLBACK ***
Idempotent: an operation can be applied multiple times without changing the result (except counters!)
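A rough illustration of why retrying after a consistency level failure is safe for ordinary columns but not for counters. This is a toy single-row model, not Hector or Cassandra code, and `LastWriteWinsRow` with its methods is a hypothetical name. A write only takes effect when its timestamp is newer than the stored one (ties are simply ignored here).

```java
import java.util.HashMap;
import java.util.Map;

// Toy single-row model; the class and method names are illustrative only.
class LastWriteWinsRow {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> timestamps = new HashMap<>();
    private long counter = 0L;

    /** Apply a column write; it only takes effect if it is newer than what is stored. */
    void write(String column, String value, long timestamp) {
        Long stored = timestamps.get(column);
        if (stored == null || timestamp > stored) {
            values.put(column, value);
            timestamps.put(column, timestamp);
        }
    }

    /** Counter increments are NOT idempotent: every replay changes the total. */
    void increment(long delta) {
        counter += delta;
    }

    String read(String column) {
        return values.get(column);
    }

    long counter() {
        return counter;
    }

    public static void main(String[] args) {
        LastWriteWinsRow row = new LastWriteWinsRow();
        row.write("city", "Austin", 10L);
        row.write("city", "Austin", 10L); // retried write: state is unchanged
        row.write("city", "Dallas", 5L);  // stale write arriving late: ignored
        row.increment(1L);
        row.increment(1L);                // retried increment: counted twice
        System.out.println(row.read("city") + " " + row.counter());
    }
}
```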
vs. RDBMS - Append Only
Proper data modeling will minimize seeks
No read before write
(Go to Matt's presentation for more!)
How does this impact development?
Substantially.
For operations affecting the same data, that data will become consistent eventually as determined by the timestamps.
Trade availability for consistency
Store whatever you want. It's all just bytes.
Think about how you will query the data before you write it.
Neat. So Now What?
Like any database, you need a client!
Python:
Telephus: http://github.com/driftx/Telephus (Twisted)
Pycassa: http://github.com/pycassa/pycassa
Java:
Hector: http://github.com/rantav/hector (Examples: https://github.com/zznate/hector-examples)
Pelops: http://github.com/s7/scale7-pelops
Kundera: http://code.google.com/p/kundera/
Datanucleus JDO: http://github.com/tnine/Datanucleus-Cassandra-Plugin
Grails:
grails-cassandra: https://github.com/wolpert/grails-cassandra
.NET:
FluentCassandra: http://github.com/managedfusion/fluentcassandra
Aquiles: http://aquiles.codeplex.com/
Ruby:
Cassandra: http://github.com/fauna/cassandra
PHP:
phpcassa: http://github.com/thobbs/phpcassa
SimpleCassie: http://code.google.com/p/simpletools-php/wiki/SimpleCassie
... but do not roll your own
Thrift
Fast, efficient serialization and network IO.
Lots of clients available (you can probably use it in other places as well)
Why you don't want to work with the Thrift API directly:
SuperColumn
ColumnOrSuperColumn (don't forget Counters!)
ColumnParent.super_column
ColumnPath.super_column
Map mutationMap
Higher Level Clients
Hector
JMX Counters
Add/remove hosts:
automatically
programmatically
via JMX
Pluggable load balancing
Complete encapsulation of Thrift API
Type-safe approach to dealing with Apache Cassandra
Lightweight ORM (supports JPA 1.0 annotations)
JPA support: https://github.com/riptano/hector-jpa
Mavenized!
http://repo2.maven.org/maven2/me/prettyprint/
CQL
Viable alternative as of 0.8.0
JDBC Driver implementation means lots of possibilities
Encapsulate API changes
In-tree support on the way for:
DataSource
Pooling
Avro, etc??
Gone. Added too much complexity after Thrift caught up.
None of the libraries distinguished themselves as being a particularly crappy choice for serialization.
(See CASSANDRA-1765)
Thrift API Methods
Five general categories:
Retrieving
Writing/Updating/Removing (all the same op!)
    Increment counters
Meta Information
Schema Manipulation
CQL Execution
On to the Code...
https://github.com/zznate/cassandra-tutorial
Uses Maven.
Really basic.
Modify/abuse/alter as needed.
Descriptions of what is going on and how to run each example are in the Javadoc comments.
Sample data is based on the North American Numbering Plan (easy to find thanks to InfoChimps)
http://infochimps.com/datasets/area-code-and-exchange-to-location-north-america-npanxx
Data Shape
512 202 30.27 097.74 W TX Austin
512 203 30.27 097.74 L TX Austin
512 204 30.32 097.73 W TX Austin
512 205 30.32 097.73 W TX Austin
512 206 30.32 097.73 L TX Austin
Get a Single Column for a Key
GetCityForNpanxx.java
columnQuery.setColumnFamily("Npanxx");
columnQuery.setKey("512204");
columnQuery.setName("city");
Get the Contents of a Row
GetSliceForNpanxx.java
sliceQuery.setColumnFamily("Npanxx");
sliceQuery.setKey("512202");
sliceQuery.setColumnNames("city","state","lat","lng");
Get the (sorted!) Columns of a Row
GetSliceForStateCity.java
sliceQuery.setColumnFamily("StateCity");
sliceQuery.setKey("TX Austin");
sliceQuery.setRange(202L, 204L, false, 5);
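To see why the sort order matters, here is a minimal sketch that models the "TX Austin" row as a sorted map and takes the same slice (start, finish, reversed, count). It is a toy model of the slice semantics, not Hector code. `ColumnSliceSketch`, `slice` and `austinRow` are hypothetical names, and both bounds are treated as inclusive. For a reversed slice, the start is the larger column name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a sorted row; not Hector or Cassandra code.
class ColumnSliceSketch {

    /** Column names between start and finish (both inclusive), at most count of them. */
    static List<Long> slice(NavigableMap<Long, String> row, long start, long finish,
                            boolean reversed, int count) {
        // A reversed slice walks from the larger name down to the smaller one.
        NavigableMap<Long, String> range = reversed
                ? row.subMap(finish, true, start, true).descendingMap()
                : row.subMap(start, true, finish, true);
        List<Long> names = new ArrayList<>();
        for (Long name : range.keySet()) {
            if (names.size() == count) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    /** The "TX Austin" row: exchange number -> "lat x lng", from the sample data. */
    static NavigableMap<Long, String> austinRow() {
        NavigableMap<Long, String> row = new TreeMap<>();
        row.put(202L, "30.27x097.74");
        row.put(203L, "30.27x097.74");
        row.put(204L, "30.32x097.73");
        row.put(205L, "30.32x097.73");
        row.put(206L, "30.32x097.73");
        return row;
    }

    public static void main(String[] args) {
        System.out.println(slice(austinRow(), 202L, 204L, false, 5));
        System.out.println(slice(austinRow(), 206L, 202L, true, 2));
    }
}
```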
Get the Same Slice from Several Rows
MultigetSliceForNpanxx.java
multigetSlicesQuery.setColumnFamily("Npanxx");
multigetSlicesQuery.setColumnNames("city","state","lat","lng");
multigetSlicesQuery.setKeys("512202","512203","512205","512206");
Get Slices From a Range of Rows
GetRangeSlicesForStateCity.java
The results of this query will be significantly more meaningful with OrderPreservingPartitioner (try this at home!)
rangeSlicesQuery.setColumnFamily("Npanxx");
rangeSlicesQuery.setColumnNames("city","state","lat","lng");
rangeSlicesQuery.setKeys("512202", "512205");
rangeSlicesQuery.setRowCount(5);
Get Slices From a Range of Rows - 2
GetSliceForAreaCodeCity.java
Bonus: DynamicComparator and DynamicComposite (Ed's talk)
sliceQuery.setKey("512");
sliceQuery.setRange("Austin", "Austin__204", false, 5);
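The range above works because the column names are plain strings that sort by city first and exchange second. A minimal sketch of that idea, assuming column names of the form city + "__" + exchange as in the slide. `CompositeNameSketch` and its methods are hypothetical names, and the row is modeled as a sorted map rather than a real column family.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model; names here are illustrative and not part of Hector.
class CompositeNameSketch {

    /** Joins city and exchange into one sortable column name, e.g. "Austin__204". */
    static String name(String city, String exchange) {
        return city + "__" + exchange;
    }

    /** Column names from start to finish (inclusive) in sorted order, capped at count. */
    static List<String> slice(NavigableMap<String, String> row, String start, String finish, int count) {
        List<String> names = new ArrayList<>();
        for (String column : row.subMap(start, true, finish, true).keySet()) {
            if (names.size() == count) {
                break;
            }
            names.add(column);
        }
        return names;
    }

    /** Area code 512 as one row, built from the Austin sample records. */
    static NavigableMap<String, String> areaCode512() {
        NavigableMap<String, String> row = new TreeMap<>();
        String[] exchanges = {"202", "203", "204", "205", "206"};
        String[] coords = {"30.27x097.74", "30.27x097.74", "30.32x097.73", "30.32x097.73", "30.32x097.73"};
        for (int i = 0; i < exchanges.length; i++) {
            row.put(name("Austin", exchanges[i]), coords[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(slice(areaCode512(), "Austin", "Austin__204", 5));
    }
}
```

Starting the range at the bare prefix "Austin" works because a prefix sorts before every name that extends it.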
Get Slices from Indexed Columns
GetIndexedSlicesForCityState.java
You only need to index a single column to apply clauses on other columns
isq.setColumnFamily("Npanxx");
isq.setColumnNames("city","lat","lng");
isq.addEqualsExpression("state", "TX");
isq.addEqualsExpression("city", "Austin");
isq.addGteExpression("lat", "30.30");
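One way to picture "index a single column, filter on the others": the index narrows the candidate rows, and the remaining clauses are checked row by row. The sketch below is a toy in-memory model under that assumption and says nothing about how Cassandra stores its indexes. `IndexedSliceSketch`, `indexOn` and `query` are hypothetical names, and the rows come from the Austin sample data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of an indexed query; not how Cassandra stores its indexes.
class IndexedSliceSketch {

    static Map<String, String> row(String city, String state, String lat) {
        Map<String, String> columns = new HashMap<>();
        columns.put("city", city);
        columns.put("state", state);
        columns.put("lat", lat);
        return columns;
    }

    /** Row key -> columns, taken from the Austin sample records. */
    static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> rows = new TreeMap<>();
        rows.put("512202", row("Austin", "TX", "30.27"));
        rows.put("512203", row("Austin", "TX", "30.27"));
        rows.put("512204", row("Austin", "TX", "30.32"));
        rows.put("512205", row("Austin", "TX", "30.32"));
        rows.put("512206", row("Austin", "TX", "30.32"));
        return rows;
    }

    /** Index for a single column: value -> keys of the rows that hold it. */
    static Map<String, List<String>> indexOn(Map<String, Map<String, String>> rows, String column) {
        Map<String, List<String>> index = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> entry : rows.entrySet()) {
            String value = entry.getValue().get(column);
            index.computeIfAbsent(value, v -> new ArrayList<>()).add(entry.getKey());
        }
        return index;
    }

    /** Use the state index to find candidates, then filter them on unindexed columns. */
    static List<String> query(String state, String city, String minLat) {
        Map<String, Map<String, String>> rows = sampleRows();
        List<String> matches = new ArrayList<>();
        for (String key : indexOn(rows, "state").getOrDefault(state, new ArrayList<String>())) {
            Map<String, String> columns = rows.get(key);
            // Values are compared as strings, matching the string arguments on the slide.
            if (city.equals(columns.get("city")) && columns.get("lat").compareTo(minLat) >= 0) {
                matches.add(key);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(query("TX", "Austin", "30.30"));
    }
}
```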
Insert, Update and Delete
... are effectively the same operation:
Application of columns to a row
Insertion
InsertRowsForColumnFamilies.java
mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lat", "37.57"));
mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lng", "122.34"));
mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("city", "Burlingame"));
mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("state", "CA"));
mutator.addInsertion("CA Burlingame", "StateCity", HFactory.createColumn(650L, "37.57x122.34", longSerializer, stringSerializer));
mutator.addInsertion("650", "AreaCode", HFactory.createStringColumn("Burlingame__650", "37.57x122.34"));
Add insertions for the other two column families to the same mutation
Deletion
DeleteRowsForColumnFamily.java
mutator.addDeletion("650222", "Npanxx", city, stringSerializer);
mutator.addDeletion("CA Burlingame", "StateCity", null, stringSerializer);
mutator.addDeletion("650", "AreaCode", null, stringSerializer);
mutator.addDeletion("650222", "Npanxx", null, stringSerializer);
Or row level
Record Level
Deletion
[default@Tutorial] list StateCity;
Using default limit of 100
-------------------
RowKey: CA Burlingame
=> (column=650, value=33372e3537783132322e3334, timestamp=1310340410528000)
-------------------
RowKey: TX Austin
=> (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
=> (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
=> (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
=> (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
=> (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
Deletion
[default@Tutorial] list StateCity;
Using default limit of 100
-------------------
RowKey: CA Burlingame
-------------------
RowKey: TX Austin
=> (column=202, value=33302e3237783039372e3734, timestamp=1310143852392000)
=> (column=203, value=33302e3237783039372e3734, timestamp=1310143852444000)
=> (column=204, value=33302e3332783039372e3733, timestamp=1310143852448000)
=> (column=205, value=33302e3332783039372e3733, timestamp=1310143852453000)
=> (column=206, value=33302e3332783039372e3733, timestamp=1310143852457000)
Deletion - FYI
Sending a deletion for a non-existing row:
mutator.addDeletion("202230", "Npanxx", city, stringSerializer);
You just inserted a tombstone!
[default@Tutorial] list Npanxx;
Using default limit of 100
. . .
-------------------
RowKey: 202230
-------------------
. . .
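The empty row in the listing above follows from deletes being writes. A minimal sketch of that behavior, assuming a toy in-memory store in which a `null` value stands for a tombstone. `RangeGhostSketch` and its methods are hypothetical names, and `list()` only loosely imitates the cli output.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a delete is stored as a marker, never as "nothing".
class RangeGhostSketch {
    // Row key -> (column name -> value); a null value stands for a tombstone.
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    void insert(String key, String column, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(column, value);
    }

    /** A delete is a write too: it records a tombstone even if nothing was there. */
    void delete(String key, String column) {
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(column, null);
    }

    /** Every known row key, followed by its live (non-tombstone) columns. */
    String list() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            out.append("RowKey: ").append(row.getKey()).append('\n');
            for (Map.Entry<String, String> column : row.getValue().entrySet()) {
                if (column.getValue() != null) {
                    out.append("=> (column=").append(column.getKey())
                       .append(", value=").append(column.getValue()).append(")\n");
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        RangeGhostSketch store = new RangeGhostSketch();
        store.delete("202230", "city"); // the row never existed
        System.out.print(store.list()); // yet its key now shows up, with no columns
    }
}
```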
ColumnFamilyTemplate
ColumnFamilyUpdater updater = template.createUpdater("cskey1");
updater.setString("stringval","value1");
updater.setDate("curdate", date);
updater.setLong("longval", 5L);
template.update(updater);
template.addColumn("stringval", se);
template.addColumn("curdate", DateSerializer.get());
template.addColumn("longval", LongSerializer.get());
ColumnFamilyResult wrapper = template.queryColumns("cskey1");
Template method design pattern
https://github.com/rantav/hector/wiki/Getting-started-%285-minutes%29
Development Resources
Cassandra Maven Plugin
http://mojo.codehaus.org/cassandra-maven-plugin/
CCM localhost cassandra cluster
https://github.com/pcmanus/ccm
OpsCenter
http://www.datastax.com/products/opscenter
Cassandra AMIs
https://github.com/riptano/CassandraClusterAMI
Stuff I Punted on for the Sake of Brevity
meta_* methods
CassandraClusterTest.java: L43-81 @hector
system_* methods
SchemaManipulation.java @ hector-examples
CassandraClusterTest.java: L84-157 @hector
ORM (it works and is in production)
https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29
multiple nodes and failure scenarios
Data modeling (go see Matt's presentation)
Things to Remember
deletes and timestamp granularity
range ghosts and tombstones
using the wrong column comparator, key/default validators and InvalidRequestException
Schema-less Schema Optional
use column-level TTL to automate deletion
"how do I iterate over all the rows in a column family"?
get_range_slices, but don't do that
a good sign your data model is wrong
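For the column-level TTL item above, a rough sketch of the idea: the column carries its own expiry, so no explicit delete has to be issued later. This is a toy model with hypothetical names (`TtlColumnSketch`, `readAt`), not the Hector or Cassandra API, and the exact expiry boundary is an assumption of the sketch.

```java
// Toy model of a column with a time-to-live; names are illustrative only.
class TtlColumnSketch {
    private final String value;
    private final long expiresAtSeconds;

    TtlColumnSketch(String value, long writtenAtSeconds, int ttlSeconds) {
        this.value = value;
        this.expiresAtSeconds = writtenAtSeconds + ttlSeconds;
    }

    /** Returns the value while the TTL has not elapsed, and null afterwards. */
    String readAt(long nowSeconds) {
        return nowSeconds < expiresAtSeconds ? value : null;
    }

    public static void main(String[] args) {
        TtlColumnSketch column = new TtlColumnSketch("Austin", 100L, 60);
        System.out.println(column.readAt(130L)); // still within the 60 second TTL
        System.out.println(column.readAt(200L)); // TTL elapsed, reads as missing
    }
}
```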
Questions?