This is an introductory presentation to Cassandra, the database of choice for high availability and insane scalability. I gave this talk at TheEdge conference.
Practical Introduction to Cassandra
March 2012
#theedge2012
Sonia Margulis
@robosonia
Your Application
Gone Viral
Best Hardware Money Can Buy
Improve Reads
Sharding RDBMS – A Nightmare
Cassandra’s Sweet Spot
Inherently Clustered
Many concurrent users
Linear Scalability
High Volumes of Operations
Distributed
The Road to Mastership
Introduction to Cassandra
Running a Server
Data Model
Communicating with the Server
Modeling Data
Growing a Cluster
Introduction to Cassandra
A non-relational database
Values availability
Scales out, not up
Open source
Active community
Always Available
Who Uses It?
Use Case: Social & Timelines
Logs by Rick Payette
Use Case: Statistics & Logs
Running a Server
The Cassandra Project
» Project
» Runs on:
» Apache License
» Current release: 1.0.8
sonia@hiro:~/apache-cassandra-1.0.8$
You are here
Running a Server
sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra -f
....
Now serving reads.
localhost/127.0.0.1:9160
Connecting to Our Server
Cassandra command line interface (CLI) tool
sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra-cli --host 127.0.0.1 --port 9160
Connected to: "Test Cluster" on localhost/9160
Welcome to Cassandra CLI version 1.0.8
Creating a Keyspace
A keyspace is Cassandra's equivalent of an RDBMS database.
Let's start using it:
[default@unknown] create keyspace demo;
[default@unknown] use demo;
[default@demo]
Creating a Column Family
A column family holds data, much like a table in RDBMS.
Start adding data
[default@demo] create column family user;
[default@demo] set user[1][a]=utf8('foo');
[default@demo] set user[2][b]=utf8('bar');
[default@demo] set user[2][c]=utf8('test');
Retrieving Data
Retrieving columns by user key
[default@demo] get user[2];
(column=b, value=bar)
(column=c, value=test)
Returned 2 results.
Data Model
Column
A column is a column name plus a value, for example: name = Peter Parker
Row
A row is a row id plus its columns.
Row id: spiderman
  icon: (image)
  name: Peter Parker
  residence: New York
Column Family
Row id       icon      name      residence
spider-man   (image)   Peter P   New York
batman       (image)   Bruce W   Gotham
hulk         (image)   Bruce B   New York
set user['spiderman']['name'] = 'Peter Parker'
(user = column family, spiderman = row id, name = column name, Peter Parker = value)
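The column family / row id / column name / value layering can be pictured as a map of sorted maps. The following is only a toy in-memory sketch of that shape, not how Cassandra stores data; the class and method names (ColumnFamilySketch, set, get) are made up for illustration.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model of one column family: row id -> (column name -> value). */
final class ColumnFamilySketch {
    // Rows are keyed by row id; columns inside a row are kept sorted by name.
    private final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    /** Mirrors the CLI form: set user['spiderman']['name'] = 'Peter Parker' */
    void set(String rowId, String columnName, String value) {
        rows.computeIfAbsent(rowId, id -> new TreeMap<>()).put(columnName, value);
    }

    /** Mirrors the CLI form: get user['spiderman'] (all columns of one row). */
    SortedMap<String, String> get(String rowId) {
        return rows.getOrDefault(rowId, new TreeMap<>());
    }

    public static void main(String[] args) {
        ColumnFamilySketch user = new ColumnFamilySketch();
        user.set("spiderman", "name", "Peter Parker");
        user.set("spiderman", "residence", "New York");
        user.set("batman", "residence", "Gotham"); // rows need not share columns
        System.out.println(user.get("spiderman"));
    }
}
```

Each row carries its own set of columns, which is what lets one row hold three columns and another hold thousands.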
The Allies Column Family
batman: Robin, Alfred
spider-man: Iceman, Firestar, Iron Man, Storm
Published Issues Column Family
batman: 1/5/1939 ... 2/3/2012, 9/3/2012 (~2600 columns)
spider-man: 1/8/1962 ... 1/3/2012, 8/3/2012 (~3800 columns)
Model Flexibility
Flexible Data Model
Image: photostock / FreeDigitalPhotos.net
Keyspace
» Like RDBMS database
» A container for column families
» One keyspace per application, in most cases
[default@unknown] create keyspace demo;
Expiring Columns – TTL
set users['spiderman']['passwd_reminder'] = 'abcd' with ttl = 7200;
Row id: spider-man
  icon: (image)
  name: Peter P
  residence: New York
  passwd_reminder: abcd (expires after 7200 s = 2 hours)
Distributed Counters
Row id: javaedge.com
  sessions: 1035
  speakers: 3402
incr page_views['javaedge.com']['speakers'] by 1
get page_views['javaedge.com']['speakers']
Communication with the Server: Clients
Cassandra Query Language
» Looks a lot like SQL
» Mostly valid SQL
SELECT name, universe
FROM users
WHERE KEY = 'hulk'
INSERT INTO users (KEY, name, universe) VALUES ('hulk', 'Bruce', 'marvel')
Advantages of using CQL
» Run ad-hoc queries
» Very familiar, easier to use
» Stable interface
▪ For library developers
▪ For users
CQL Example
SELECT name, residence FROM users
SELECT 01/1/2011 .. 1/1/2012
FROM published_issues
WHERE KEY = 'spiderman'
SELECT FIRST 5
FROM allies
WHERE KEY = 'spiderman'
Cassandra JDBC Driver
import java.sql.*;
// Load the driver and open a connection to a keyspace
Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
Connection con = DriverManager.getConnection("jdbc:cassandra://localhost:9160/keyspace");
// Run a CQL query; key is a String holding the row key
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery(
    "SELECT name, residence FROM users WHERE KEY = '" + key + "'");
Hector
SliceQuery<...> query =
    HFactory.createSliceQuery(keyspace, ...);
query.setRange(startDate, endDate, false, 100)
    .setColumnFamily("published_issues")
    .setKey("spiderman");
QueryResult<ColumnSlice<Date, String>> result = query.execute();
Hector: Advanced Features
» Failover support
» Connection pooling
» Load balancing
» JMX counters
» Object mapper
Maven plugin
mvn cassandra:start
mvn cassandra:cql-exec
mvn cassandra:stop
Run your tests
Modeling Data
Queries First
» Use the same Column Family for data that should be fetched together
▪ Reduces IO
» Consider filtering and ordering
Denormalize
» Fewer seeks, faster reads
» Storing redundant data
▪ Manually handling data integrity
» Disk space is cheaper than seek time
Secondary Index
» Requirement: find all superheroes that live in New York
» Good for indexes with low cardinality
Row id: spiderman
  icon: (image)
  name: Peter Parker
  residence: New York
create column family users
... and column_metadata=
[{column_name: residence, index_type: KEYS}];
SELECT name FROM users WHERE residence = 'New York'
Manually Managed Index
» Requirement: find a superhero by name
» Manually maintain an inverted index
Search term   Keys in users CF
Bruce         batman, hulk
Peter         spiderman
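A manually maintained inverted index means the application itself writes the index entry whenever it writes a user. Below is a minimal in-memory sketch of that bookkeeping, assuming the search term is the first word of the name; NameIndexSketch and its methods are hypothetical names, and a real application would keep the index in its own column family rather than in a Java map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy inverted index: search term (first name) -> row keys in the users CF. */
final class NameIndexSketch {
    private final Map<String, SortedSet<String>> index = new HashMap<>();

    /** Call on every user write; nothing keeps the index in sync automatically. */
    void onUserWritten(String rowKey, String fullName) {
        String firstName = fullName.split(" ")[0];
        index.computeIfAbsent(firstName, term -> new TreeSet<>()).add(rowKey);
    }

    /** Returns the row keys indexed under the given first name (possibly empty). */
    SortedSet<String> findByFirstName(String firstName) {
        return index.getOrDefault(firstName, Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        NameIndexSketch names = new NameIndexSketch();
        names.onUserWritten("batman", "Bruce W");
        names.onUserWritten("hulk", "Bruce B");
        names.onUserWritten("spiderman", "Peter Parker");
        System.out.println(names.findByFirstName("Bruce"));
    }
}
```

The cost of this approach is the one the Denormalize slide names: the application handles data integrity itself, including removing index entries when a user is renamed or deleted.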
Bucketing
All issues, split into one row per month:
hulk_jan_2012: 1/1/2012 = Issue-1, 2/1/2012 = Issue-2, 4/1/2012 = Issue-3
hulk_feb_2012: 2/2/2012 = Issue-4, 28/2/2012 = Issue-5, 29/2/2012 = Issue-6
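Bucketing comes down to deriving the row key from both the entity and a time bucket, so that no single row grows without bound. A small sketch of building keys in the hulk_jan_2012 form used above; the helper name bucketKey is hypothetical, and the dates follow the slide's day/month/year order.

```java
import java.time.LocalDate;
import java.util.Locale;

/** Builds month-bucketed row keys such as "hulk_jan_2012". */
final class BucketKeySketch {
    static String bucketKey(String hero, LocalDate issueDate) {
        // Month.name() is e.g. "JANUARY"; keep the first three letters, lowercased.
        String month = issueDate.getMonth().name().substring(0, 3).toLowerCase(Locale.ROOT);
        return hero + "_" + month + "_" + issueDate.getYear();
    }

    public static void main(String[] args) {
        System.out.println(bucketKey("hulk", LocalDate.of(2012, 1, 2)));  // 2/1/2012
        System.out.println(bucketKey("hulk", LocalDate.of(2012, 2, 29))); // 29/2/2012
    }
}
```

Reading a date range then means computing the bucket keys the range touches and slicing each of those rows.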
Cassandra Cluster
Virtual Ring
Node tokens on the ring: 10, 40, 60, 75, 90
Node token   Keys
10           91-10
40           11-40
60           41-60
75           61-75
90           76-90
MD5'(hulk) = 20, so hulk is stored on node 40
MD5'(thor) = 42, so thor is stored on node 60
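The key-to-node lookup on the ring can be sketched with a sorted set of tokens: a key goes to the first node whose token is greater than or equal to the key's token, wrapping around to the lowest token. This is an illustration using the slide's toy token values, not Cassandra's partitioner code; TokenRingSketch and nodeFor are made-up names, and the key's token is passed in directly instead of being hashed.

```java
import java.util.TreeSet;

/** Toy token ring using the node tokens from the slides. */
final class TokenRingSketch {
    private final TreeSet<Integer> nodeTokens = new TreeSet<>();

    TokenRingSketch(int... tokens) {
        for (int token : tokens) {
            nodeTokens.add(token);
        }
    }

    /** Returns the token of the node that owns the given key token. */
    int nodeFor(int keyToken) {
        Integer owner = nodeTokens.ceiling(keyToken);      // first token >= keyToken
        return owner != null ? owner : nodeTokens.first(); // past the last node: wrap
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch(10, 40, 60, 75, 90);
        System.out.println("hulk (20) -> node " + ring.nodeFor(20));
        System.out.println("thor (42) -> node " + ring.nodeFor(42));
        System.out.println("token 95 -> node " + ring.nodeFor(95));
    }
}
```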
Inter-Node Communication
» Gossip
» Failure detection
Fault Tolerance
» Replication factor
» Hinted Handoff
Replication Factor
Node tokens on the ring: 10, 40, 60, 75, 90
Replication factor = 3: hulk and thor are each stored on three nodes
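One simple way to place the copies is to start at the node that owns the key and keep walking clockwise until the replication factor is met. The slides do not say which placement strategy is in use, so treat the sketch below as an assumption for illustration; ReplicaPlacementSketch and replicasFor are hypothetical names, and the ring must contain at least one node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy replica placement: the owner plus the next nodes clockwise on the ring. */
final class ReplicaPlacementSketch {
    static List<Integer> replicasFor(int keyToken, int replicationFactor,
                                     TreeSet<Integer> nodeTokens) {
        List<Integer> replicas = new ArrayList<>();
        Integer current = nodeTokens.ceiling(keyToken);
        if (current == null) {
            current = nodeTokens.first(); // wrap around the ring
        }
        // Never ask for more copies than there are nodes.
        int copies = Math.min(replicationFactor, nodeTokens.size());
        while (replicas.size() < copies) {
            replicas.add(current);
            Integer next = nodeTokens.higher(current);
            current = next != null ? next : nodeTokens.first();
        }
        return replicas;
    }

    public static void main(String[] args) {
        TreeSet<Integer> ring = new TreeSet<>(Arrays.asList(10, 40, 60, 75, 90));
        System.out.println("hulk (20) -> " + replicasFor(20, 3, ring));
        System.out.println("thor (42) -> " + replicasFor(42, 3, ring));
    }
}
```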
Hinted Handoff
Client Requests
Write Request
The client sends the write to a coordinator node on the ring.
Consistency Level
Write request with consistency level = ONE
Write request with consistency level = ALL
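The consistency level tells the coordinator how many replicas must acknowledge a request before it answers the client. A minimal sketch of that count for a given replication factor; the enum name and requiredAcks are made up for illustration, and QUORUM is included as an assumption since the slides only show ONE and ALL.

```java
/** How many replica acknowledgements a request waits for, per consistency level. */
enum ConsistencyLevelSketch {
    ONE, QUORUM, ALL;

    int requiredAcks(int replicationFactor) {
        switch (this) {
            case ONE:
                return 1;                         // any single replica
            case QUORUM:
                return replicationFactor / 2 + 1; // a majority (not shown on the slides)
            default:
                return replicationFactor;         // ALL: every replica
        }
    }

    public static void main(String[] args) {
        for (ConsistencyLevelSketch level : values()) {
            System.out.println(level + " with RF=3 waits for " + level.requiredAcks(3));
        }
    }
}
```

With a replication factor of 3, ONE returns after the first acknowledgement, while ALL fails if any single replica is unreachable.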
Summary
Where Do You Sign?
» Cassandra
▪ http://cassandra.apache.org
▪ http://www.datastax.com/
• Docs, tutorials & videos
▪ IRC: #cassandra on freenode
» Hector
▪ https://github.com/rantav/hector
▪ https://github.com/zznate/hector-examples