Cassandra Intro -- TheEdge2012

This is an introductory presentation to Cassandra, the database of choice for high availability and insane scalability. I gave this talk at TheEdge conference.

Practical Introduction to Cassandra

March 2012

#theedge2012

Sonia Margulis

@robosonia

Your Application

Gone Viral

Best Hardware Money Can Buy

Improve Reads

Sharding RDBMS – A Nightmare

Cassandra’s Sweet Spot

Inherently Clustered

Many concurrent users

Linear Scalability

High Volumes of Operations

Distributed

The Road to Mastership

Introduction to Cassandra

Running a Server

Data Model

Communicating with the Server

Modeling Data

Growing a Cluster

Introduction to Cassandra

A non-relational database

Values availability

Scales out, not up

Open source

Active community

Always Available

Who Uses It?

Use Case: Social & Timelines

Logs by Rick Payette

Use Case: Statistics & Logs

Running a Server

The Cassandra Project

» Project

» Runs on:

» Apache License

» Current release: 1.0.8

sonia@hiro:~/apache-cassandra-1.0.8$

You are here

Running a Server

sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra -f

....

Now serving reads.

localhost/127.0.0.1:9160

Connecting to Our Server

Cassandra command line interface (CLI) tool

sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra-cli --host 127.0.0.1 --port 9160

Connected to: "Test Cluster" on localhost/9160

Welcome to Cassandra CLI version 1.0.8

Creating a Keyspace

A keyspace is Cassandra's equivalent of an RDBMS database.

Let's start using it

[default@unknown] create keyspace demo;

[default@unknown] use demo;

[default@demo]

Creating a Column Family

A column family holds data, much like a table in RDBMS.

Start adding data

[default@demo] create column family user;

[default@demo] set user[1][a]=utf8('foo');

[default@demo] set user[2][b]=utf8('bar');

[default@demo] set user[2][c]=utf8('test');

Retrieving Data

Retrieving columns by user key

[default@demo] get user[2];

(column=b, value=bar)

(column=c, value=test)

Returned 2 results.

Data Model

Column

A column is a column name and a value, for example: name = Peter Parker

Row

A row is a row id and its columns, for example:

spiderman: icon, name = Peter Parker, residence = New York

Column Family

spider-man: icon, name = Peter P, residence = New York

batman: icon, name = Bruce W, residence = Gotham

hulk: icon, name = Bruce B, residence = New York

set user['spiderman']['name'] = 'Peter Parker'

Column family: user

Row id: spiderman

Column name: name

Value: Peter Parker
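One way to picture a column family is as a map of row ids to sorted maps of column names and values. The sketch below is only an illustration of that shape; the class name ColumnFamilySketch and its methods are hypothetical and are not part of Cassandra or any client library.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory picture of a column family:
// row id -> (column name -> value). Not how Cassandra stores data on disk.
final class ColumnFamilySketch {
    private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Mirrors the CLI statement: set user['spiderman']['name'] = 'Peter Parker'
    void set(String rowId, String columnName, String value) {
        rows.computeIfAbsent(rowId, id -> new TreeMap<>()).put(columnName, value);
    }

    // Mirrors: get user['spiderman'] (returns a copy, sorted by column name)
    Map<String, String> get(String rowId) {
        TreeMap<String, String> columns = rows.get(rowId);
        return columns == null ? new TreeMap<>() : new TreeMap<>(columns);
    }

    public static void main(String[] args) {
        ColumnFamilySketch user = new ColumnFamilySketch();
        user.set("spiderman", "name", "Peter Parker");
        user.set("spiderman", "residence", "New York");
        user.set("batman", "name", "Bruce W");
        // Rows need not share the same set of columns.
        System.out.println(user.get("spiderman"));
        System.out.println(user.get("batman"));
    }
}
```

Each row carries its own set of columns, which is what makes the model flexible.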

The Allies Column Family

batman: Robin, Alfred

spider-man: Iceman, Firestar, Iron Man, Storm

Published Issues Column Family

~2600 columns

batman: 1/5/1939 ... 2/3/2012, 9/3/2012

spider-man: 1/8/1962 ... 1/3/2012, 8/3/2012

~3800 columns

Model Flexibility

Flexible Data Model

Image: photostock / FreeDigitalPhotos.net

Keyspace

» Like RDBMS database

» A container for column families

» One keyspace per application, in most cases

[default@unknown] create keyspace demo;

Expiring Columns – TTL

set users['spiderman']['passwd_reminder'] = 'abcd' with ttl = 7200;

spider-man: icon, name = Peter P, residence = New York, passwd_reminder = abcd

7200s = 2 hours
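The idea of an expiring column can be sketched as a value stored together with an expiry time, where reads ignore anything past that time. This is a simplified illustration only; the class ExpiringColumnsSketch is hypothetical, the clock is passed in explicitly to keep the example deterministic, and nothing here reflects how Cassandra implements TTL internally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of columns with a time to live (TTL).
final class ExpiringColumnsSketch {
    private static final class Cell {
        final String value;
        final long expiresAtMillis;

        Cell(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Cell> columns = new HashMap<>();

    // Store a value that expires ttlSeconds after nowMillis.
    void set(String name, String value, int ttlSeconds, long nowMillis) {
        columns.put(name, new Cell(value, nowMillis + ttlSeconds * 1000L));
    }

    // Return the value, or null if the column is missing or has expired.
    String get(String name, long nowMillis) {
        Cell cell = columns.get(name);
        if (cell == null || nowMillis >= cell.expiresAtMillis) {
            return null;
        }
        return cell.value;
    }

    public static void main(String[] args) {
        ExpiringColumnsSketch row = new ExpiringColumnsSketch();
        long start = 0L;
        row.set("passwd_reminder", "abcd", 7200, start);
        System.out.println(row.get("passwd_reminder", start + 3_600_000L)); // after 1 hour
        System.out.println(row.get("passwd_reminder", start + 7_200_000L)); // after 2 hours
    }
}
```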

Distributed Counters

javaedge.com: sessions = 1035, speakers = 3402

incr page_views['javaedge.com']['speakers'] by 1;

get page_views['javaedge.com']['speakers'];

Communication with the Server: Clients

Cassandra Query Language

» Looks a lot like SQL

» Mostly valid SQL

SELECT name, universe

FROM users

WHERE KEY = 'hulk'

INSERT INTO users (KEY, name, universe) VALUES ('hulk', 'Bruce', 'marvel')

Advantages of using CQL

» Run ad-hoc queries

» Very familiar, easier to use

» Stable interface

▪ For library developers

▪ For users

CQL Example

SELECT name, residence FROM users

SELECT 01/1/2011 .. 1/1/2012
FROM published_issues
WHERE KEY = 'spiderman'

SELECT FIRST 5
FROM allies
WHERE KEY = 'spiderman'

Cassandra JDBC Driver

import java.sql.*;

Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");

Connection con = DriverManager.getConnection("jdbc:cassandra://localhost:9160/keyspace");

Cassandra JDBC Driver

Statement stmt = con.createStatement();

ResultSet rs = stmt.executeQuery(
    "SELECT name, residence FROM users WHERE KEY = '" + key + "'");

Hector

SliceQuery<...> query =
    HFactory.createSliceQuery(keyspace, ...);

query.setRange(startDate, endDate, false, 100)
    .setColumnFamily("published_issues")
    .setKey("spiderman");

QueryResult<ColumnSlice<Date, String>> result = query.execute();

Hector: Advanced Features

» Failover support

» Connection pooling

» Load balancing

» JMX counters

» Object mapper

Maven plugin

mvn cassandra:start

mvn cassandra:cql-exec

mvn cassandra:stop

Run your tests

Modeling Data

Queries First

» Use the same Column Family for data that should be fetched together

▪ Reduces IO

» Consider filtering and ordering

Denormalize

» Fewer seeks, faster reads

» Storing redundant data

▪ Manually handling data integrity

» Disk space is cheaper than seek time

Secondary Index

» Requirement: find all superheroes that live in New York

spiderman: icon, name = Peter Parker, residence = New York

» Good for indexes with low cardinality

create column family users
... and column_metadata=
[{column_name: residence, index_type: KEYS}];

SELECT name FROM users WHERE residence = 'New York'

Manually Managed Index

» Requirement: find a superhero by name

» Manually maintain an inverted index

Search term: keys in users CF

Bruce: batman, hulk

Peter: spiderman
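Maintaining the inverted index by hand means every write to the users data must also update the index, including removing the stale entry when a name changes. A minimal sketch of that bookkeeping, using plain Java maps in place of the two column families (the class NameIndexSketch and its methods are hypothetical):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: "users" stands in for the users CF and
// "byName" for the manually managed index (search term -> user keys).
final class NameIndexSketch {
    private final Map<String, String> users = new HashMap<>();
    private final Map<String, SortedSet<String>> byName = new TreeMap<>();

    // Write the user and keep the index in sync in the same operation.
    void putUser(String key, String firstName) {
        String previous = users.put(key, firstName);
        if (previous != null) {
            SortedSet<String> stale = byName.get(previous);
            if (stale != null) {
                stale.remove(key);
                if (stale.isEmpty()) {
                    byName.remove(previous);
                }
            }
        }
        byName.computeIfAbsent(firstName, n -> new TreeSet<>()).add(key);
    }

    // Look up user keys by search term.
    SortedSet<String> findByName(String firstName) {
        SortedSet<String> keys = byName.get(firstName);
        return keys == null ? Collections.emptySortedSet() : new TreeSet<>(keys);
    }

    public static void main(String[] args) {
        NameIndexSketch index = new NameIndexSketch();
        index.putUser("batman", "Bruce");
        index.putUser("hulk", "Bruce");
        index.putUser("spiderman", "Peter");
        System.out.println(index.findByName("Bruce"));
        System.out.println(index.findByName("Peter"));
    }
}
```

In a real cluster the two writes are separate operations, so the application has to decide what happens when one of them fails.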

Bucketing

hulk_jan_2012: 1/1/2012 = Issue-1, 2/1/2012 = Issue-2, 4/1/2012 = Issue-3

hulk_feb_2012: 2/2/2012 = Issue-4, 28/2/2012 = Issue-5, 29/2/2012 = Issue-6

All issues, by month
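With bucketing, the row key is derived from the hero and the month of the issue date, so each row holds one month of issues. A small sketch of building such a key, assuming the hero_month_year format seen in the example keys (the class BucketKeySketch and the method bucketKey are hypothetical names):

```java
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical helper that derives a monthly bucket row key.
final class BucketKeySketch {
    // Builds keys such as "hulk_jan_2012" from a hero name and an issue date.
    static String bucketKey(String hero, LocalDate issueDate) {
        // Month.name() is always the English constant name, e.g. "JANUARY".
        String month = issueDate.getMonth().name().substring(0, 3).toLowerCase(Locale.ROOT);
        return hero + "_" + month + "_" + issueDate.getYear();
    }

    public static void main(String[] args) {
        System.out.println(bucketKey("hulk", LocalDate.of(2012, 1, 4)));
        System.out.println(bucketKey("hulk", LocalDate.of(2012, 2, 29)));
    }
}
```

Reading a date range then means computing the bucket keys that cover the range and querying each of those rows.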

Cassandra Cluster

Virtual Ring

Node tokens: 10, 40, 60, 75, 90

Keys     Node
91-10    10
11-40    40
41-60    60
61-75    75
76-90    90

MD5'(hulk) = 20, so hulk is stored on node 40

MD5'(thor) = 42, so thor is stored on node 60
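The lookup on the ring can be sketched with a sorted map: a key belongs to the first node whose token is greater than or equal to the key's token, wrapping around to the lowest token. The sketch below uses the tokens from the slides and takes the key's token as given, since MD5' is the slides' simplified hash; the class TokenRingSketch is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a token ring: node token -> node name.
final class TokenRingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String nodeName) {
        ring.put(token, nodeName);
    }

    // Owner is the first node with token >= keyToken, wrapping to the lowest token.
    String ownerOf(int keyToken) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(keyToken);
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        for (int token : new int[] {10, 40, 60, 75, 90}) {
            ring.addNode(token, "node-" + token);
        }
        System.out.println("hulk (token 20) -> " + ring.ownerOf(20));
        System.out.println("thor (token 42) -> " + ring.ownerOf(42));
        System.out.println("token 95 wraps  -> " + ring.ownerOf(95));
    }
}
```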

Inter-Node Communication

» Gossip

» Failure Detection

Fault Tolerance

» Replication factor

» Hinted Handoff

Replication Factor

Replication factor = 3: each row (hulk, thor) is stored on three nodes of the ring
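Replica placement can be sketched by starting at the node that owns the key's token and walking clockwise until enough distinct nodes are collected. This assumes the simplest placement, where replicas go to the next nodes on the ring with no awareness of racks or data centers; the class ReplicaPlacementSketch and the method replicasFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: choose replica nodes by walking the ring clockwise.
final class ReplicaPlacementSketch {
    static List<Integer> replicasFor(TreeSet<Integer> nodeTokens, int keyToken, int replicationFactor) {
        if (nodeTokens.isEmpty()) {
            throw new IllegalArgumentException("ring has no nodes");
        }
        // Cannot place more replicas than there are nodes.
        int count = Math.min(replicationFactor, nodeTokens.size());
        List<Integer> replicas = new ArrayList<>();
        // First replica: the node owning the key's token (wrap to the lowest token).
        Integer current = nodeTokens.ceiling(keyToken);
        if (current == null) {
            current = nodeTokens.first();
        }
        while (replicas.size() < count) {
            replicas.add(current);
            // Next node clockwise, wrapping around the ring.
            current = nodeTokens.higher(current);
            if (current == null) {
                current = nodeTokens.first();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TreeSet<Integer> tokens = new TreeSet<>(Arrays.asList(10, 40, 60, 75, 90));
        System.out.println("hulk (token 20): " + replicasFor(tokens, 20, 3));
        System.out.println("thor (token 42): " + replicasFor(tokens, 42, 3));
    }
}
```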

Hinted Handoff

Client Requests

Write Request

Coordinator

Consistency Level

Write Request, consistency level = ONE

Write Request, consistency level = ALL
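The consistency level determines how many replicas must acknowledge a write before the coordinator reports success. The arithmetic can be sketched as below. QUORUM does not appear on the slides and is included only for comparison; the class ConsistencySketch and the method requiredAcks are hypothetical names, not a client API.

```java
// Hypothetical sketch: acknowledgements needed per consistency level.
final class ConsistencySketch {
    enum Level { ONE, QUORUM, ALL }

    static int requiredAcks(Level level, int replicationFactor) {
        switch (level) {
            case ONE:
                return 1;                         // any single replica
            case QUORUM:
                return replicationFactor / 2 + 1; // a majority of replicas
            case ALL:
                return replicationFactor;         // every replica
            default:
                throw new IllegalArgumentException("unknown level: " + level);
        }
    }

    public static void main(String[] args) {
        int replicationFactor = 3;
        for (Level level : Level.values()) {
            System.out.println(level + " needs " + requiredAcks(level, replicationFactor) + " ack(s)");
        }
    }
}
```

Lower levels favor availability and latency; ALL fails the write if any replica is down.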

Summary

Where Do You Sign?

» Cassandra

▪ http://cassandra.apache.org

▪ http://www.datastax.com/

• Docs, tutorials & videos

▪ IRC: #cassandra on freenode

» Hector

▪ https://github.com/rantav/hector

▪ https://github.com/zznate/hector-examples
