1 CONFIDENTIAL |
NATE MCCALL, Sr. Software Developer
Building Applications With Apache Cassandra
What We’ll Cover
Cassandra Basics
Common API Usage
Storage Model
Ring Overview
Web Application Integration
Getting Started
Requirements:
JDK 1.6 or greater
Apache Maven 3.0.2 or greater
Apache Cassandra 1.0.7, DataStax Community Edition: http://www.datastax.com/download/community/versions
An IDE such as Eclipse or IntelliJ will be helpful but is not necessary
Several thumb drives are available (please share)
All source is on GitHub: https://github.com/zznate/strata-west-2012
How We’ll Cover It
Learning by doing: looking at and writing code
Examples are constructed explicitly to show off certain concepts
Move ahead if it gets slow; just start hacking
You must be comfortable writing and debugging software
Getting Down To It
It does not have to be hard.
It does not have to be mysterious.
You can use a mature language with stable clients against a proven, best-of-breed solution that is running in high-traffic production environments right now.
Scale Out. But Really, Though.
Best of breed:
Linear scaling
Real multi-datacenter support
“Fix it on Monday” fault tolerance
Static Column Family
GOOG   Price=589.55   Name=Google
AAPL   Price=401.76   Name=Apple
NFLX   Price=78.73    Name=Netflix
NOK    Price=6.90     Name=Nokia     Exchange=NYSE
Schema optional: not all columns are required
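To make "schema optional" concrete, here is a minimal sketch that models a static column family with plain Java collections: each row key maps to its own sorted map of named columns, so one row (NOK) can carry an Exchange column that the others lack. It only illustrates the data model; `StaticColumnFamily` is a hypothetical class, not Hector or Thrift API.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a static column family:
// row key -> (column name -> column value), columns sorted by name.
final class StaticColumnFamily {
    private final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    // Insert one column into a row, creating the row on its first write.
    void insert(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Returns the column value, or null if the row (or that column) does not exist.
    String get(String rowKey, String column) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // Number of columns actually stored for a row; rows are free to differ.
    int columnCount(String rowKey) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? 0 : row.size();
    }

    // Sample data from the table above.
    static StaticColumnFamily stocks() {
        StaticColumnFamily cf = new StaticColumnFamily();
        cf.insert("GOOG", "Price", "589.55");
        cf.insert("GOOG", "Name", "Google");
        cf.insert("AAPL", "Price", "401.76");
        cf.insert("AAPL", "Name", "Apple");
        cf.insert("NFLX", "Price", "78.73");
        cf.insert("NFLX", "Name", "Netflix");
        cf.insert("NOK", "Price", "6.90");
        cf.insert("NOK", "Name", "Nokia");
        cf.insert("NOK", "Exchange", "NYSE"); // only this row has an Exchange column
        return cf;
    }
}
```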
Dynamic Column Family
GOOG   10/25/11=583.16   10/24/11=596.42   10/23/11=590.49
AAPL   10/25/11=397.77   10/24/11=405.77   10/23/11=392.87
NFLX   10/25/11=77.37    10/24/11=118.14   10/23/11=117.23
NOK    10/25/11=6.71     10/24/11=6.76     10/23/11=6.61
Prematerialized queries: store it how you read it
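In a dynamic column family each observation becomes its own column, and because a row's columns stay sorted by name, a contiguous range can be read with a single slice. The sketch below mimics one such row with a `TreeMap` keyed by `LocalDate` (an assumption; the slide uses MM/dd/yy labels), with `subMap` playing the role of a column slice. `PriceRow` is hypothetical and is not the Hector API.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one wide row in a dynamic column family:
// the column name is the trading day and the column value is the closing price.
final class PriceRow {
    // Columns are kept sorted by name, as Cassandra sorts a row's columns by the comparator.
    private final NavigableMap<LocalDate, Double> columns = new TreeMap<>();

    void insert(LocalDate day, double price) {
        columns.put(day, price);
    }

    // Inclusive range of columns: the in-memory analogue of a column slice.
    NavigableMap<LocalDate, Double> slice(LocalDate start, LocalDate end) {
        return columns.subMap(start, true, end, true);
    }

    // Most recent column (highest column name).
    Map.Entry<LocalDate, Double> latest() {
        return columns.lastEntry();
    }

    // Sample data from the GOOG row above.
    static PriceRow goog() {
        PriceRow row = new PriceRow();
        row.insert(LocalDate.of(2011, 10, 23), 590.49);
        row.insert(LocalDate.of(2011, 10, 24), 596.42);
        row.insert(LocalDate.of(2011, 10, 25), 583.16);
        return row;
    }
}
```

Writing the data already sorted the way it will be read is the "store it how you read it" idea: the range query needs no secondary index or sort step.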
The API
Starting Up
If you didn’t look beforehand: http://www.datastax.com/docs/1.0/getting_started/index
We want to run the Cassandra process in the foreground (-f) to see what’s going on:
cd $CASSANDRA_HOME
bin/cassandra -f
DataStax OpsCenter
If you are not sure why you should have monitoring, have this running at all times.
http://www.datastax.com/docs/opscenter/index
Static Column Families
See org.apache.tutorial.BasicUsageExample
Dynamic Column Families
See org.apache.tutorial.TimeseriesInserter
– A Cassandra row can hold up to 2 billion columns
Dynamic Column Families
See org.apache.tutorial.TimeseriesIterationQuery
– Encapsulate paging in iteration for easier traversal of wide rows
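Encapsulating paging in an iterator can look roughly like the client-agnostic sketch below. `PageFetcher` is a hypothetical stand-in for whatever column slice call your client makes (a slice with a start column and a count); the iterator asks for one page at a time, restarts each page at the last column it returned, and skips that duplicate because the start of a slice is inclusive. `ColumnIterator` is illustrative, not the tutorial's class.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a slice call: return up to `count` column names in sorted
// order, starting at `start` inclusive (null means start from the first column).
interface PageFetcher {
    List<String> fetch(String start, int count);
}

// Walks every column of a wide row while holding only `pageSize` columns in memory.
final class ColumnIterator implements Iterator<String> {
    private final PageFetcher fetcher;
    private final int pageSize;
    private List<String> page = new ArrayList<>();
    private int pos = 0;
    private String lastSeen = null;
    private boolean exhausted = false;

    ColumnIterator(PageFetcher fetcher, int pageSize) {
        // With an inclusive start, a page size of 1 would only return the column already seen.
        if (pageSize < 2) throw new IllegalArgumentException("pageSize must be at least 2");
        this.fetcher = fetcher;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (pos < page.size()) return true;
        if (exhausted) return false;
        List<String> next = fetcher.fetch(lastSeen, pageSize);
        // A short page means the end of the row has been reached.
        exhausted = next.size() < pageSize;
        // Skip the column that ended the previous page (slice starts are inclusive).
        int skip = (lastSeen != null && !next.isEmpty() && next.get(0).equals(lastSeen)) ? 1 : 0;
        page = next.subList(skip, next.size());
        pos = 0;
        return !page.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        lastSeen = page.get(pos++);
        return lastSeen;
    }

    // In-memory fetcher over sorted column names, for trying the iterator without a cluster.
    static PageFetcher inMemory(NavigableSet<String> columns) {
        return (start, count) -> {
            List<String> out = new ArrayList<>();
            for (String c : start == null ? columns : columns.tailSet(start, true)) {
                if (out.size() == count) break;
                out.add(c);
            }
            return out;
        };
    }
}
```

Callers then just write `while (it.hasNext())` and never see page boundaries.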
Using CQL
See comments in class files as we go
– Use cqlsh for queries and some administration tasks
– Caveat: no support for composites or super columns
JdbcTemplate
Some compiling required
– Type support is not quite there yet
– The pooling library needs work
– Give this a try if you want: https://github.com/riptano/jdbc-conn-pool
– Specifically: https://github.com/riptano/jdbc-conn-pool/tree/master/portfolio-example
JdbcTemplate Configuration via ResourceRef
JdbcTemplate Configuration via Context
JdbcTemplate Insertion
JdbcTemplate Selection
Storage and On-Disk Structure
Merge-On-Read
On-disk structure is immutable
Benefits:
No read-before-write
Highest timestamp wins
Delete markers (“tombstones”) thrown out on merge
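Merge-on-read can be modeled in a few lines: each SSTable is an immutable map from column name to a timestamped cell (or a tombstone), and a read reconciles every version it finds, keeping the one with the highest timestamp and hiding columns whose winning version is a delete marker. This is a simplified sketch with hypothetical `Cell` and `MergeOnRead` types, not Cassandra's internal classes; timestamp ties are not handled specially here.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical cell: a value (null marks a tombstone) plus its write timestamp.
final class Cell {
    final String value;
    final long timestamp;

    Cell(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }

    boolean isTombstone() {
        return value == null;
    }
}

final class MergeOnRead {
    // Each SSTable is modeled as one row's column -> cell map; none is ever modified.
    // Returns the live columns: highest timestamp wins, tombstones hide older values.
    static SortedMap<String, String> read(List<Map<String, Cell>> sstables) {
        Map<String, Cell> winners = new TreeMap<>();
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> e : sstable.entrySet()) {
                Cell current = winners.get(e.getKey());
                // Writes never read old values first, so reconciliation happens here.
                if (current == null || e.getValue().timestamp > current.timestamp) {
                    winners.put(e.getKey(), e.getValue());
                }
            }
        }
        SortedMap<String, String> live = new TreeMap<>();
        for (Map.Entry<String, Cell> e : winners.entrySet()) {
            if (!e.getValue().isTombstone()) {
                live.put(e.getKey(), e.getValue().value);
            }
        }
        return live;
    }
}
```

The order in which SSTables are visited does not change the result, which is what lets writes simply append new files.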
Compaction
Merges SSTables
Benefits:
Keeps the SSTable count down
Makes the merge-on-read process more efficient
Groups rows into a single SSTable
Strategy can vary by workload:
Size-tiered compaction
Leveled compaction
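Size-tiered compaction groups SSTables of similar size and compacts a group once it has enough members. The sketch below shows only that bucketing idea: an SSTable joins a bucket when its size lies between 0.5x and 1.5x of the bucket's average, and a bucket becomes a candidate once it holds four SSTables (Cassandra's default minimum compaction threshold). Treat the constants as assumptions; real Cassandra also special-cases very small SSTables, which is omitted, and `SizeTieredSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified size-tiered bucketing: illustrates the idea, not Cassandra's implementation.
final class SizeTieredSketch {
    static final double BUCKET_LOW = 0.5;   // assumption: lower bound relative to bucket average
    static final double BUCKET_HIGH = 1.5;  // assumption: upper bound relative to bucket average
    static final int MIN_THRESHOLD = 4;     // SSTables needed before a bucket is compacted

    // Group SSTable sizes (e.g. in MB) into buckets of similar size.
    static List<List<Long>> buckets(List<Long> sizes) {
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        List<Double> averages = new ArrayList<>();
        for (long size : sorted) {
            boolean placed = false;
            for (int i = 0; i < buckets.size() && !placed; i++) {
                double avg = averages.get(i);
                if (size > avg * BUCKET_LOW && size < avg * BUCKET_HIGH) {
                    List<Long> bucket = buckets.get(i);
                    bucket.add(size);
                    // Keep a running average of the bucket's sizes.
                    averages.set(i, (avg * (bucket.size() - 1) + size) / bucket.size());
                    placed = true;
                }
            }
            if (!placed) {
                List<Long> bucket = new ArrayList<>();
                bucket.add(size);
                buckets.add(bucket);
                averages.add((double) size);
            }
        }
        return buckets;
    }

    // Buckets with enough similar-sized SSTables to merge into one larger SSTable.
    static List<List<Long>> compactionCandidates(List<Long> sizes) {
        List<List<Long>> candidates = new ArrayList<>();
        for (List<Long> bucket : buckets(sizes)) {
            if (bucket.size() >= MIN_THRESHOLD) candidates.add(bucket);
        }
        return candidates;
    }
}
```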
Indexing Techniques
See org.apache.tutorial.CompositeDataLoader
– Store a static index in a single row
Indexing Techniques
See org.apache.tutorial.CompositeQuery
– Use slice of composites to narrow in on query
Indexing Techniques
See org.apache.tutorial.CompositeQuery
– Let’s add another level to the composite
Indexing Techniques
See org.apache.tutorial.CompositeQuery
– Add a third level to the composite to narrow the search to “cities in California starting with ‘Ag’”
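Storing an index in a single row works because composite column names sort component by component, so fixing the leading components and bounding the next one selects a contiguous slice. A minimal sketch, assuming a hypothetical three-part composite of (country, state, city) held as a `List<String>`, plain string ordering standing in for each component's comparator, and `Character.MAX_VALUE` as an "end of range" sentinel. `CompositeIndex` and the sample cities are illustrative, not the tutorial's schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical single-row index: every column name is a composite (a list of components).
final class CompositeIndex {
    // Compare component by component; a strict prefix sorts before its extensions.
    static final Comparator<List<String>> COMPOSITE_ORDER = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    };

    // Sorts after any ordinary character, so "Ag" + MAX bounds every string starting with "Ag".
    static final String MAX = String.valueOf(Character.MAX_VALUE);

    private final NavigableSet<List<String>> columns = new TreeSet<>(COMPOSITE_ORDER);

    void add(String... components) {
        columns.add(Arrays.asList(components));
    }

    // Columns whose leading components equal `prefix` and whose next component
    // starts with `startsWith` (use "" to take every value at that level).
    List<List<String>> slice(List<String> prefix, String startsWith) {
        List<String> from = new ArrayList<>(prefix);
        List<String> to = new ArrayList<>(prefix);
        from.add(startsWith);
        to.add(startsWith + MAX);
        return new ArrayList<>(columns.subSet(from, true, to, true));
    }
}
```

Each extra level only adds a component to the slice bounds; the read is still one contiguous range within one row.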
Revisiting the Time Series Example
See org.apache.tutorial.BucketingTimeSeriesInserter
– Uses buckets for granularity
Every minute gets a distinct row, e.g. 2012_02_28_13_30
Revisiting the Time Series Example
See org.apache.tutorial.BucketingTimeSeriesQuery
– More advanced slicing examples
– Keys can be rebuilt for any time window
– Keep rows grouped tightly on disk
I need the 30 minutes between 3 and 4pm for every day last week
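Because every minute has its own row, reading an arbitrary window means rebuilding the row keys for that window and then fetching those rows. A sketch under stated assumptions: times are `LocalDateTime` values in one agreed-upon zone, keys follow the `yyyy_MM_dd_HH_mm` layout of the 2012_02_28_13_30 example, the window does not cross midnight, and `MinuteBuckets` is a hypothetical helper rather than code from the tutorial project.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for minute-bucketed time series rows keyed like 2012_02_28_13_30.
final class MinuteBuckets {
    static final DateTimeFormatter KEY_FORMAT = DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm");

    // Every event within the same minute maps to the same row key.
    static String rowKey(LocalDateTime eventTime) {
        return KEY_FORMAT.format(eventTime);
    }

    // Row keys for the window [from, to) on each of `days` consecutive days starting at firstDay.
    static List<String> keysForWindow(LocalDate firstDay, int days, LocalTime from, LocalTime to) {
        List<String> keys = new ArrayList<>();
        for (int d = 0; d < days; d++) {
            LocalDate day = firstDay.plusDays(d);
            for (LocalTime t = from; t.isBefore(to); ) {
                keys.add(rowKey(LocalDateTime.of(day, t)));
                LocalTime next = t.plusMinutes(1);
                if (!next.isAfter(t)) break; // stop rather than wrap past midnight
                t = next;
            }
        }
        return keys;
    }
}
```

Reading "the 30 minutes between 3 and 4pm" as 15:00 to 15:30 (an assumption), `keysForWindow(LocalDate.of(2012, 2, 20), 7, LocalTime.of(15, 0), LocalTime.of(15, 30))` produces 30 keys per day for seven days; those rows can then be fetched with a multiget.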
Tombstones
See org.apache.tutorial.TombstoneDemoInserter and TombstoneDemoQuery
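A delete is itself a write: the tombstone shadows older data and can only be purged by compaction after gc_grace_seconds (864000 seconds, ten days, by default) has passed, which gives replicas that missed the delete time to learn about it through repair. A simplified sketch of that purge decision; `Tombstone` and `TombstonePurge` are hypothetical names, and real compaction applies further checks before dropping a tombstone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tombstone record: the deleted column and when it was deleted (epoch seconds).
final class Tombstone {
    final String column;
    final long deletedAtSeconds;

    Tombstone(String column, long deletedAtSeconds) {
        this.column = column;
        this.deletedAtSeconds = deletedAtSeconds;
    }
}

final class TombstonePurge {
    static final long DEFAULT_GC_GRACE_SECONDS = 864_000L; // 10 days

    // Tombstones a compaction running at `nowSeconds` may drop; younger ones must be kept
    // so the delete keeps shadowing stale copies on other replicas.
    static List<Tombstone> purgeable(List<Tombstone> tombstones, long nowSeconds, long gcGraceSeconds) {
        List<Tombstone> out = new ArrayList<>();
        for (Tombstone t : tombstones) {
            if (t.deletedAtSeconds + gcGraceSeconds < nowSeconds) {
                out.add(t);
            }
        }
        return out;
    }
}
```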
Tombstone: output before deletion
Tombstone: output after deletion
Understanding the Ring and Consistency
The Ring
Token distribution: distributed hashing
Lexicographically similar keys hash to (very) different tokens
Provides shared knowledge of key location
The actual token range is 0 to 2^127
The token is created by converting an MD5 hash of the key to a java.lang.BigInteger
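The token computation can be reproduced with the JDK alone. This sketch follows the RandomPartitioner approach, assuming the key's bytes are its UTF-8 encoding: the 16-byte MD5 digest is read as a signed two's-complement `BigInteger` and its absolute value is taken, which limits tokens to the range 0 to 2^127. `Md5Token` is a hypothetical name.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of RandomPartitioner-style tokens: abs(MD5(key) read as a signed BigInteger).
final class Md5Token {
    static final BigInteger MAX_TOKEN = BigInteger.valueOf(2).pow(127);

    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // The digest is interpreted as two's complement, then made non-negative.
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }
}
```

Keys that differ by one character, such as "foo" and "fop", land on unrelated tokens, which is what spreads data evenly around the ring.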
The Ring
Token distribution as a ring: wrapping ranges
The next token after the highest possible value is the lowest possible value.
The Ring
Simplified ring example: 4-node token distribution (example key “foo”)
Nodes distribute ownership via token ranges
A node owns its token and the range immediately before it
Nodes continuously “gossip” ring ownership
Any node can act as a coordinator to service requests for any other node
The Ring
Inclusive token ranges for a four-node cluster:
Node     Initial Token   First Token   Last Token
Node 1   0               76            0
Node 2   25              1             25
Node 3   50              26            50
Node 4   75              51            75
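The rule "a node owns its token and the range immediately before it" amounts to a ceiling lookup on a sorted map, wrapping around to the lowest token when a key's token is past the highest one. A minimal sketch using the four initial tokens from the table; the `Ring` class and the small `long` tokens are illustrative (real RandomPartitioner tokens are `BigInteger`s).

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical ring: initial token -> node name.
final class Ring {
    private final NavigableMap<Long, String> tokens = new TreeMap<>();

    void addNode(long initialToken, String node) {
        tokens.put(initialToken, node);
    }

    // The owner is the first node whose token is >= the key's token;
    // past the highest token the ring wraps around to the lowest one.
    String ownerOf(long keyToken) {
        if (tokens.isEmpty()) throw new IllegalStateException("empty ring");
        Map.Entry<Long, String> e = tokens.ceilingEntry(keyToken);
        return (e != null ? e : tokens.firstEntry()).getValue();
    }

    static Ring fourNodeExample() {
        Ring ring = new Ring();
        ring.addNode(0, "Node 1");
        ring.addNode(25, "Node 2");
        ring.addNode(50, "Node 3");
        ring.addNode(75, "Node 4");
        return ring;
    }
}
```

With these tokens, `ownerOf(76)` and `ownerOf(0)` both return Node 1, matching its wrapping 76 to 0 range in the table.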
Integrating with Web Applications
Using Spring
AccountController and AccountDao
– Similar to the JDBC example for wiring
Probably as far as we’ll get…
DataStax Documentation: http://www.datastax.com/docs/1.0/index
Apache Cassandra project wiki: http://wiki.apache.org/cassandra/
“The Dynamo Paper”: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
P. Helland. Building on Quicksand: http://arxiv.org/pdf/0909.1788
P. Helland. Life Beyond Distributed Transactions: http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf
“The Megastore Paper”: http://research.google.com/pubs/archive/36971.pdf
The Hector Client: http://hector-client.org
Developer Resources
CQL Documentation: http://www.datastax.com/docs/1.0/dml/using_cql
Hector Documentation: http://hector-client.org
Cassandra Maven Plugin: http://mojo.codehaus.org/cassandra-maven-plugin/
CCM localhost cassandra cluster: https://github.com/pcmanus/ccm
OpsCenter: http://www.datastax.com/products/opscenter
Cassandra AMIs: https://github.com/riptano/CassandraClusterAMI
Cassandra Launcher: https://github.com/joaquincasares/cassandralauncher