Introduction to Cassandra (June 2010)

Gary Dusbabek, Rackspace

Apache

Silicon Valley Cloud Computing Group • 17 June 2010

Outline

• History
• Scaling
• Replication Model
• Data Model
• Tuning
• Write Path
• Read Path
• Client Access
• Practical Considerations

Why Cassandra?

[Figure: size of the digital universe, 2006 vs. 2010]

• 2006: 161 EB (322 million 500 GB drives)
• 2010: 988 EB (1.98 billion 500 GB drives)
• 6-fold growth in 4 years

Source: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf

Why Cassandra?

SQL

• Specialized data structures (think B-trees)
  – Shines with complicated queries

• Focus on fast query and analysis
  – Not necessarily on large datasets

Ever tried scaling an RDBMS?

• For reads?
  – Memcache etc.

• For writes?
  – Oh noes!

Vertical scaling is hard

credit: janetmck via flickr

Vertical scaling is hard

No, really:

Enter Cassandra

• Amazon Dynamo
  – Consistent hashing
  – Partitioning
  – Replication
  – One-hop routing

• Google BigTable
  – Column Families
  – Memtables
  – SSTables

Origins

Pre-2008

Moving Along

2008

Landed

2009

Distributed and Scalable

• Horizontal!
• All nodes are identical
  – No master or SPOF
  – Adding is simple

• Automatic cluster maintenance

Replication

• Replication factor
  – How many nodes data is replicated on

• Consistency level
  – Zero, One, Quorum, All
  – Sync or async for writes
  – Reliability of reads
  – Read repair

Ring Topology

[Figure: conceptual ring with nodes a, d, g, j; RF=3]

• One token per node
• Multiple ranges per node
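The ring on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's implementation: it assumes each node is named after its token (a, d, g, j), that a piece of data already has a token (how a key maps to a token depends on the partitioner), and that replicas are simply the next nodes clockwise. The function names `replicas_for` and `ranges_stored` are hypothetical.

```python
import bisect

# One token per node, as on the slide; each node is named after its token.
TOKENS = ["a", "d", "g", "j"]


def replicas_for(token, rf):
    """Return the rf nodes holding data for `token`.

    The primary is the first node whose token is >= the data's token
    (wrapping past the end of the ring); the remaining replicas are the
    next nodes clockwise. This placement rule is an assumption.
    """
    start = bisect.bisect_left(TOKENS, token) % len(TOKENS)
    return [TOKENS[(start + i) % len(TOKENS)] for i in range(rf)]


def ranges_stored(node, rf):
    """Return the (start, end] ranges a node stores: its own range plus
    the ranges of its rf - 1 predecessors on the ring."""
    idx = TOKENS.index(node)
    size = len(TOKENS)
    return [(TOKENS[(idx - i - 1) % size], TOKENS[(idx - i) % size])
            for i in range(rf)]
```

Under these assumptions, data with token "e" lands on g, j, and a when RF=3, and on g and j when RF=2. Node g stores three ranges at RF=3, which is the "multiple ranges per node" point.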

Ring Topology

[Figure: conceptual ring with nodes a, d, g, j; RF=2]

• One token per node
• Multiple ranges per node

New Node

[Figure: ring with nodes a, d, g, j and new node m; RF=3]

• Token assignment
• Range adjustment
• Bootstrap
• Arrival only affects immediate neighbors
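A small sketch of why a new node only disturbs its neighbors. It assumes, for illustration only, that each node owns the range from its predecessor's token (exclusive) up to its own token (inclusive), and it looks only at primary ranges. The helper `primary_ranges` is hypothetical.

```python
def primary_ranges(tokens):
    """Map each token to the (start, end] range it owns.

    The lowest token wraps around and takes over from the highest one.
    """
    ordered = sorted(tokens)
    return {tok: (ordered[i - 1], tok) for i, tok in enumerate(ordered)}


before = primary_ranges(["a", "d", "g", "j"])
after = primary_ranges(["a", "d", "g", "j", "m"])  # node m joins the ring

# Existing nodes whose primary range changed when m arrived.
changed = {tok for tok in before if before[tok] != after[tok]}
```

In this sketch only node a changes: its range shrinks from (j, a] to (m, a], and the new node m takes over (j, m]. Nodes d, g, and j keep their primary ranges.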

Ring Partition

[Figure: ring with nodes a, d, g, j; RF=3; one node dies]

• Node dies
• Available? Hinted handoff
• Achtung! Plan for this
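The idea behind handing off writes for a dead node can be sketched as follows. This is a deliberately simplified model under stated assumptions, not Cassandra's code: the `Node` class, `write`, and `replay_hints` are hypothetical names, and the hint is simply parked on the first live replica.

```python
class Node:
    """A toy node: a name, an up/down flag, stored data, and held hints."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (target_node, key, value) writes held for down nodes


def write(replicas, key, value):
    """Write to every live replica and leave a hint for each dead one.

    Returns the number of replicas that accepted the write.
    """
    live = [node for node in replicas if node.up]
    dead = [node for node in replicas if not node.up]
    for node in live:
        node.data[key] = value
    if live:
        for target in dead:
            live[0].hints.append((target, key, value))
    return len(live)


def replay_hints(holder):
    """Deliver held hints whose target is back up; keep the rest."""
    remaining = []
    for target, key, value in holder.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    holder.hints = remaining
```

The write stays available while a replica is down, and the dead node catches up once it returns and the hints are replayed.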

Schema-free Sparse-table

• Flexible column naming
• You define the sort order
• Not required to have a specific column just because another row does

Data Model

• Keyspace
  • ColumnFamily
    • Row (indexed)
      • Key
      • Columns
        • Name (sorted)
        • Value

Easier to show from the bottom up
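One way to picture the hierarchy is as nested dictionaries. The sketch below is only an analogy: the column family name "Users", the row keys, and the column names are made up, and `get_columns` is a hypothetical helper.

```python
# Keyspace -> ColumnFamily -> row key -> {column name: value}
keyspace = {
    "Users": {  # hypothetical column family
        "row1": {
            "email": "row1@example.com",
            "name": "Alice",
        },
        "row2": {
            "name": "Bob",
            "twitter": "@bob",  # sparse: row2 has no "email" column
        },
    }
}


def get_columns(ks, column_family, row_key):
    """Return a row's (name, value) pairs with column names in sorted order."""
    row = ks[column_family][row_key]
    return [(name, row[name]) for name in sorted(row)]
```

Each row carries only the columns it needs, and columns come back ordered by name, which mirrors the "Name (sorted)" level of the model.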

Data Model

A single column

Data Model

A single row

Data Model

Eventually Consistent

• CAP Theorem
  – Consistency
  – Availability
  – Partition Tolerance

• Choose two
• Cassandra chooses A and P

But…

Eventually Consistent

I got a fever! And the only prescription is MORE CONSISTENCY!

Tunable Consistency

• Give up a little A and P to get more C
• Ratchet up the consistency level
• R + W > N gives strong consistency (R = replicas read, W = replicas written, N = replication factor)

• More to come
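The R + W > N rule is easy to check in code. The sketch below uses hypothetical helper names; the reasoning is that when the replicas read and the replicas written must overlap in at least one node, a read always sees the latest acknowledged write.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def is_strongly_consistent(r, w, n):
    """True when every read set of r replicas must overlap every write
    set of w replicas out of n, i.e. r + w > n."""
    return r + w > n
```

With N = 3, quorum reads and quorum writes (R = W = 2) satisfy the inequality, as does writing to all replicas and reading from one. R = W = 1 does not, so a read may miss a recent write.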

Inserting: Overview

• Simple: put(key, col, value)
• Complex: put(key, [col:value, …, col:value])
• Batch: multi-key

Inserting: Writes

• Commit log for durability
  – Configurable fsync
  – Sequential writes only

• Memtable – no disk access (no reads or seeks)

• SSTables are final (become read only)
  – Indexes
  – Bloom filter
  – Raw data

• Bottom line: FAST!!!
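The write path above can be sketched as a toy store. This is an in-memory illustration under stated assumptions, not Cassandra's code: the class name `ColumnFamilyStore`, the `memtable_limit` threshold, and the use of a Python list in place of an on-disk commit log are all hypothetical. Indexes and Bloom filters are left out.

```python
class ColumnFamilyStore:
    """Toy write path: append to a log, update a memtable, flush to an SSTable."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # stands in for an append-only file on disk
        self.memtable = {}     # in-memory and mutable
        self.sstables = []     # immutable, sorted snapshots
        self.memtable_limit = memtable_limit

    def put(self, key, col, value):
        # 1. Sequential append for durability; no reads or seeks.
        self.commit_log.append((key, col, value))
        # 2. In-memory update only.
        self.memtable.setdefault(key, {})[col] = value
        # 3. Flush once the memtable holds enough rows.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable into a sorted, read-only SSTable."""
        sstable = tuple(
            (key, tuple(sorted(cols.items())))
            for key, cols in sorted(self.memtable.items())
        )
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log
```

A write never touches existing SSTables, which is why the slide's bottom line holds: the only disk work on the write path is a sequential append.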

Querying: Overview

• You need a key or keys:
  – Single: key=‘a’
  – Range: key=‘a’ through ’f’

• And columns to retrieve:
  – Slice: cols={bar through kite}
  – By name: key=‘b’ cols={bar, cat, llama}

• Nothing like SQL “WHERE col=‘faz’”
  – But secondary indices are being worked on (see CASSANDRA-749)
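The query shapes above can be mimicked over plain dictionaries. This is a sketch of the access patterns only, not of any client API: the sample rows and the helper names `key_range`, `slice_columns`, and `columns_by_name` are made up for illustration.

```python
import bisect

# Hypothetical rows: row key -> {column name: value}
rows = {
    "a": {"ant": 1, "bar": 2, "kite": 3},
    "b": {"bar": 4, "cat": 5, "llama": 6, "zebra": 7},
    "g": {"cat": 8},
}


def key_range(data, start, end):
    """Row keys in [start, end], in key order."""
    return [key for key in sorted(data) if start <= key <= end]


def slice_columns(row, start, end):
    """Columns whose names fall in [start, end], in name order."""
    names = sorted(row)
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, end)
    return [(name, row[name]) for name in names[lo:hi]]


def columns_by_name(row, names):
    """Only the requested columns that the row actually has."""
    return [(name, row[name]) for name in names if name in row]
```

Every helper starts from a key or a column name. There is no way here to ask for "all rows where some column equals a value" without scanning everything, which is the gap secondary indices are meant to fill.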

Querying: Reads

• Practically lock free
• SSTable proliferation
• New in 0.6:
  – Row cache (avoid SSTable lookup, not write-through)
  – Key cache (avoid index scan)
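A rough sketch of a read that consults a row cache before touching SSTables. It is a simplification under stated assumptions: SSTables and the memtable are plain dictionaries, newer sources simply override older ones, and the function name `read` is hypothetical. The key cache and index lookups are not modeled.

```python
def read(key, row_cache, memtable, sstables):
    """Return the merged columns for `key`.

    `sstables` is ordered oldest first; later sources win on conflicts
    (an assumption made to keep the sketch short).
    """
    if key in row_cache:
        return row_cache[key]       # cache hit: no SSTable lookup at all
    merged = {}
    for sstable in sstables:        # more SSTables means more places to look
        merged.update(sstable.get(key, {}))
    merged.update(memtable.get(key, {}))
    if merged:
        row_cache[key] = merged     # populated on read, not on write
    return merged
```

The loop over `sstables` is where proliferation hurts: each additional SSTable is one more place a row's columns may live.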

Client API (Low Level)

• Fat Client
  – Live non-storage node
  – Reduced RPC overhead

• Thrift (12 language bindings!)
  – http://incubator.apache.org/thrift/
  – No streaming

• Avro
  – Work in progress

Client API (High Level)

• http://wiki.apache.org/cassandra/ClientOptions
• Feature rich
• Connection pooling
• Load balancing/failover
• Simplified APIs
• Version opaque

Practical Considerations

• Partitioner: Random or Order Preserving
  – Range queries

• Provisioning
  – Virtual or bare metal
  – Cluster size

• Data model
  – Think in terms of access
  – Giving up transactions, ad-hoc queries, arbitrary indexes and joins
    • (you may already do this with an RDBMS!)
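The partitioner trade-off can be illustrated by comparing how the two styles order keys. This is a sketch, not either partitioner's actual code: the function names are hypothetical, and the MD5-based token only approximates how a random partitioner spreads keys.

```python
import hashlib


def random_token(key):
    """Hash-based token: spreads keys evenly but scrambles their order."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def order_preserving_token(key):
    """The key is its own token, so ring order matches key order."""
    return key


keys = ["apple", "banana", "cherry"]
by_order_preserving = sorted(keys, key=order_preserving_token)
by_random = sorted(keys, key=random_token)
```

With the order-preserving style, adjacent keys sit next to each other on the ring, so a key range maps to a contiguous stretch of nodes. With hashed tokens the ring order has no relation to key order, which is why range queries depend on the partitioner choice.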

Practical Considerations

• Wide rows
• Data life-span
• Cluster planning
  – Bootstrapping

Future Direction

• Vector clocks (server-side conflict resolution)
• Alter keyspace/column families on a live cluster
• Compression
• Multi-tenant features
• Fewer memory restrictions

Wrapping Up

• Use Cassandra if you want/need
  – High write throughput
  – Near-linear scalability
  – Automated replication/fault tolerance

• And if you can tolerate missing RDBMS features

Questions?

Linkage

• wiki.apache.org/cassandra
• cassandra.apache.org
• [email protected]
• gdusbabek on Twitter and just about everything else
