Presented to the Silicon Valley Cloud Computing Group. 17 June 2010.
Gary Dusbabek, Rackspace
Apache
Silicon Valley Cloud Computing Group • 17 June 2010
Outline
• History
• Scaling
• Replication Model
• Data Model
• Tuning
• Write Path
• Read Path
• Client Access
• Practical Considerations
Why Cassandra?
• 2006: 161 EB (322 million 500 GB drives)
• 2010: 988 EB (1.98 billion 500 GB drives)
• 6-fold growth in 4 years
Source: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
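The drive counts can be checked with a little arithmetic. This sketch assumes decimal units (1 EB = 10^18 bytes, 1 GB = 10^9 bytes); the function name is made up for illustration.

```python
EB = 10**18  # bytes per exabyte (decimal units, an assumption)
GB = 10**9   # bytes per gigabyte (decimal units, an assumption)

def drives_needed(exabytes, drive_gb=500):
    """Number of drives of drive_gb gigabytes needed to hold the given exabytes."""
    return exabytes * EB // (drive_gb * GB)

drives_2006 = drives_needed(161)   # 322 million
drives_2010 = drives_needed(988)   # about 1.98 billion
growth = 988 / 161                 # a little over 6x
```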
Why Cassandra?
SQL
• Specialized data structures (think B-trees)
  – Shines with complicated queries
• Focus on fast query and analysis
  – Not necessarily on large datasets
Ever tried scaling an RDBMS?
• For reads?
  – Memcache etc.
• For writes?
  – Oh noes!
Vertical scaling is hard
credit: janetmck via flickr
Vertical scaling is hard
No, really:
Enter Cassandra
• Amazon Dynamo
  – Consistent hashing
  – Partitioning
  – Replication
  – One-hop routing
• Google BigTable
  – Column Families
  – Memtables
  – SSTables
Origins (pre-2008)
Moving Along (2008)
Landed (2009)
Distributed and Scalable
• Horizontal!
• All nodes are identical
  – No master or SPOF
  – Adding is simple
• Automatic cluster maintenance
Replication
• Replication factor
  – How many nodes data is replicated on
• Consistency level
  – Zero, One, Quorum, All
  – Sync or async for writes
  – Reliability of reads
  – Read repair
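A minimal sketch of how the four consistency levels might translate into the number of replicas that must respond, given a replication factor. The function and level strings are illustrative, not Cassandra's API; quorum is taken to be a simple majority.

```python
def required_acks(level, replication_factor):
    """Replicas that must respond before an operation at this level succeeds."""
    levels = {
        "ZERO": 0,                               # fire and forget (async write)
        "ONE": 1,                                # a single replica
        "QUORUM": replication_factor // 2 + 1,   # a majority of replicas
        "ALL": replication_factor,               # every replica
    }
    return levels[level]
```

With a replication factor of 3, Quorum means 2 replicas and All means 3.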
Ring Topology
[Figure: conceptual ring with nodes a, d, g, j; RF=3]
• One token per node
• Multiple ranges per node
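The ring can be sketched in a few lines. This is an illustration under stated assumptions, not Cassandra's code: keys are compared directly against the tokens a, d, g, j, a key belongs to the first node whose token is greater than or equal to it, and the remaining replicas are the next nodes clockwise.

```python
from bisect import bisect_left

TOKENS = ["a", "d", "g", "j"]  # one token per node, as in the ring figure

def replicas(key, rf, tokens=TOKENS):
    """Return the rf nodes that store key, walking clockwise from its owner."""
    ring = sorted(tokens)
    # The primary owner is the first token >= key; wrap around past the end.
    start = bisect_left(ring, key) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(rf, len(ring)))]

def ranges_held(node, rf, tokens=TOKENS):
    """Return the (start exclusive, end inclusive) token ranges a node stores."""
    ring = sorted(tokens)
    i = ring.index(node)
    # A node holds its own range plus the ranges of the rf - 1 nodes before it.
    return [(ring[(i - k - 1) % len(ring)], ring[(i - k) % len(ring)])
            for k in range(min(rf, len(ring)))]
```

With RF=3, key "b" lands on d, g and j, and node g holds three ranges: (d, g], (a, d] and (j, a]. That is the sense in which one token per node still means multiple ranges per node.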
Ring Topology
[Figure: conceptual ring with nodes a, d, g, j; RF=2]
• One token per node
• Multiple ranges per node
New Node
[Figure: ring with nodes a, d, g, j and new node m; RF=3]
• Token assignment
• Range adjustment
• Bootstrap
• Arrival only affects immediate neighbors
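A small sketch of why a new node only disturbs its neighbors, assuming the same simplified ring as in the figure (keys compared directly to tokens, first token greater than or equal to the key owns it). Only primary ownership is modeled here; replica placement is left out.

```python
from bisect import bisect_left

def owner(key, tokens):
    """Primary owner: the first token >= key, wrapping around the ring."""
    ring = sorted(tokens)
    return ring[bisect_left(ring, key) % len(ring)]

before = ["a", "d", "g", "j"]
after = before + ["m"]  # the new node is assigned token "m"

keys = list("abcdefghijklmnopqrstuvwxyz")
# Map each key whose owner changed to its (old owner, new owner) pair.
moved = {k: (owner(k, before), owner(k, after))
         for k in keys if owner(k, before) != owner(k, after)}
```

Only the keys between j and m change hands, all of them from a to m; every other key stays where it was.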
Ring Partition
[Figure: ring with nodes a, d, g, j; RF=3]
• Node dies
• Available?
• Hinted handoff
• Achtung! Plan for this
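The idea behind hinted handoff can be sketched as follows. This is a toy model, not Cassandra's implementation: the class, the function names and the choice of which live node keeps the hint are all assumptions made for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value stored on this node
        self.hints = []   # (target node, key, value) kept for dead replicas

def write(replicas, key, value):
    """Write to live replicas; leave a hint on a live one for each dead replica."""
    live = [n for n in replicas if n.up]
    for node in live:
        node.data[key] = value
    if live:
        for dead in (n for n in replicas if not n.up):
            live[0].hints.append((dead, key, value))
    return len(live)  # number of replicas that acknowledged the write

def deliver_hints(holder):
    """Replay hints whose target is back up; keep the rest for later."""
    remaining = []
    for target, key, value in holder.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    holder.hints = remaining
```

The write still succeeds on the surviving replicas, and the dead node catches up once it returns and the hint is replayed.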
Schema-free Sparse-table
• Flexible column naming
• You define the sort order
• Not required to have a specific column just because another row does
Data Model
• Keyspace
  – ColumnFamily
    • Row (indexed)
      – Key
      – Columns
        • Name (sorted)
        • Value
Easier to show from the bottom up
Data Model
[Figure: a single column]
[Figure: a single row]
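One way to picture the hierarchy is as nested maps: keyspace, then ColumnFamily, then row key, then column name to value. The sketch below is only a mental model; the ColumnFamily name "Animals" and the helper functions are hypothetical, and sorting by plain string order stands in for whatever sort order you define.

```python
keyspace = {"Animals": {}}  # a keyspace holding one hypothetical ColumnFamily

def put(cf, key, column, value):
    """Insert one column; a row exists only as the columns written to it."""
    cf.setdefault(key, {})[column] = value

def row(cf, key):
    """Return a row's columns as (name, value) pairs sorted by column name."""
    return sorted(cf.get(key, {}).items())

cf = keyspace["Animals"]
put(cf, "a", "kite", "1")
put(cf, "a", "bar", "2")
put(cf, "b", "llama", "3")  # row "b" has no "bar" or "kite": rows are sparse
```

Reading row "a" returns bar before kite regardless of insertion order, and row "b" carries only the one column it was given.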
Eventually Consistent
• CAP Theorem
  – Consistency
  – Availability
  – Partition Tolerance
• Choose two
• Cassandra chooses A and P
But…
Eventually Consistent
I got a fever! And the only prescription is MORE CONSISTENCY!
Tunable Consistency
• Give up a little A and P to get more C
• Ratchet up the consistency level
• R + W > N gives strong consistency
• More to come
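The R + W > N rule is easy to check by hand. In this sketch R is the number of replicas a read waits for, W the number a write waits for, and N the replication factor; the function name is made up for illustration.

```python
def is_strongly_consistent(r, w, n):
    """True when any r replicas read must overlap any w replicas written."""
    return r + w > n

N = 3
# Every (R, W) pair for N = 3 that guarantees a read sees the latest write.
strong = [(r, w) for r in range(1, N + 1) for w in range(1, N + 1)
          if is_strongly_consistent(r, w, N)]
```

For N = 3, quorum reads with quorum writes (2 + 2) qualify, as do One with All (1 + 3), while One with One (1 + 1) does not.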
Inserting: Overview
• Simple: put(key, col, value)
• Complex: put(key, [col:value, …, col:value])
• Batch: multi key.
Inserting: Writes
• Commit log for durability
  – Configurable fsync
  – Sequential writes only
• Memtable – no disk access (no reads or seeks)
• SSTables are final (become read only)
  – Indexes
  – Bloom filter
  – Raw data
• Bottom line: FAST!!!
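The write path above can be sketched as a toy class. Everything here is a simplification for illustration: Python lists and dicts stand in for files on disk, the flush threshold is an arbitrary assumption, and real flushed tables also carry indexes and a Bloom filter.

```python
class ToyStore:
    """Toy write path: append to a log, update memory, flush to a read-only table."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stands in for the sequential on-disk log
        self.memtable = {}     # in memory: (key, column) -> value
        self.sstables = []     # flushed tables, never modified after creation
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def put(self, key, column, value):
        self.commit_log.append((key, column, value))  # durability first
        self.memtable[(key, column)] = value          # no disk read or seek
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable as a sorted, immutable tuple of entries.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
```

A write never reads existing data: it appends to the log and updates memory, which is where the speed comes from.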
Querying: Overview
• You need a key or keys:
  – Single: key=‘a’
  – Range: key=‘a’ through ‘f’
• And columns to retrieve:
  – Slice: cols={bar through kite}
  – By name: key=‘b’ cols={bar, cat, llama}
• Nothing like SQL “WHERE col=‘faz’”
  – But secondary indices are being worked on (see CASSANDRA-749)
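Because column names are kept sorted, a slice is just a contiguous run of columns. The sketch below shows the two column selections on a single row held as a dict; the helper names and values are hypothetical, and treating both ends of the slice as inclusive is an assumption.

```python
from bisect import bisect_left, bisect_right

def slice_columns(columns, start, finish):
    """Return (name, value) pairs with start <= name <= finish, in name order."""
    names = sorted(columns)
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return [(n, columns[n]) for n in names[lo:hi]]

def columns_by_name(columns, wanted):
    """Return the requested columns that actually exist in the row."""
    return {n: columns[n] for n in wanted if n in columns}

row_b = {"bar": 1, "cat": 2, "kite": 3, "llama": 4}  # hypothetical values
```

Slicing bar through kite returns bar, cat and kite; asking by name for a column the row lacks simply returns nothing for it.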
Querying: Reads
• Practically lock free
• SSTable proliferation
• New in 0.6:
  – Row cache (avoid sstable lookup, not write-through)
  – Key cache (avoid index scan)
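SSTable proliferation matters because a read may have to consult every table on disk. The toy read below checks the memtable first, then tables from newest to oldest, using a per-table Bloom filter to skip tables that cannot contain the key. It is a sketch under heavy assumptions: one value per key rather than merged columns, SHA-256 as the hash, and arbitrary filter sizes.

```python
import hashlib

class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = True

    def might_contain(self, key):
        return all(self.bits[p] for p in self._positions(key))

def read(key, memtable, sstables):
    """sstables is a list of (bloom, data) pairs, oldest first.

    Returns (value, tables_checked); value is None if the key is absent.
    """
    if key in memtable:
        return memtable[key], 0
    checked = 0
    for bloom, data in reversed(sstables):  # newest table first
        if not bloom.might_contain(key):
            continue  # filter says definitely absent: skip the lookup
        checked += 1
        if key in data:
            return data[key], checked
    return None, checked
```

A row cache would sit in front of all of this and return the row without touching any table.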
Client API (Low Level)
• Fat Client
  – Live non-storage node
  – Reduced RPC overhead
• Thrift (12 language bindings!)
  – http://incubator.apache.org/thrift/
  – No streaming
• Avro
  – Work in progress
Client API (High Level)
• http://wiki.apache.org/cassandra/ClientOptions
• Feature rich
• Connection pooling
• Load balancing/failover
• Simplified APIs
• Version opaque
Practical Considerations
• Partitioner: Random or Order Preserving
  – Range queries
• Provisioning
  – Virtual or bare metal
  – Cluster size
• Data model
  – Think in terms of access
  – Giving up transactions, ad-hoc queries, arbitrary indexes and joins
    • (you may already do this with an RDBMS!)
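The partitioner choice comes down to how a key becomes a token. A rough sketch, assuming the random partitioner derives the token from an MD5 hash of the key and the order-preserving one uses the key itself; the function names and sample keys are illustrative only.

```python
import hashlib

def random_token(key):
    """Token from an MD5 hash of the key: spreads keys out, loses key order."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def order_preserving_token(key):
    """Token is the key itself, so token order equals key order."""
    return key

keys = ["cherry", "apple", "banana", "apricot"]  # hypothetical row keys
by_opp = sorted(keys, key=order_preserving_token)
by_random = sorted(keys, key=random_token)
```

Under the order-preserving scheme neighboring keys sit next to each other on the ring, which is what makes key range queries practical. Hashing scatters them, which helps balance load but gives up meaningful ranges.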
Practical Considerations
• Wide rows
• Data life-span
• Cluster planning
  – Bootstrapping
Future Direction
• Vector clocks (server side conflict resolution)
• Alter keyspace/column families on a live cluster
• Compression
• Multi-tenant features
• Fewer memory restrictions
Wrapping Up
• Use Cassandra if you want/need
  – High write throughput
  – Near-linear scalability
  – Automated replication/fault tolerance
• And you can tolerate missing RDBMS features
Questions?
Linkage
• wiki.apache.org/cassandra
• cassandra.apache.org
• [email protected]
• gdusbabek on Twitter and just about everything else.