C*ollege Credit: Creating Your First App in Java with Cassandra


CREATING YOUR FIRST JAVA APP W/ C*

Brian O’Neill, Lead Architect, Health Market Science

bone@alumni.brown.edu, @boneill42

MISSION: HELP SANTA!

• Background
• Setup
• Data Model / Schema
• Naughty List (Astyanax)
• Toy List (CQL)

Our Problem

Good, bad doctors? Dead doctors? Prescriber eligibility and remediation.

The World-Wide Globally Scalable Naughty List!

How about a Naughty and Nice list for Santa?

1.9 billion children
That will fit in a single row!

Queries to support:
• Children can log in and check their standing.
• Santa can find nice children by country, state or zip.

Getting Setup.

Installation

As easy as…
• Download: http://cassandra.apache.org/download/
• Uncompress: tar -xvzf apache-cassandra-1.2.0-beta3-bin.tar.gz
• Run: bin/cassandra -f
  (-f puts it in the foreground)

Configuration

conf/cassandra.yaml
  start_native_transport: true   # CHANGE THIS TO TRUE
  commitlog_directory: /var/lib/cassandra/commitlog

conf/log4j-server.properties
  log4j.appender.R.File=/var/log/cassandra/system.log

Data Model
• Schema (a.k.a. Keyspace)
• Table (a.k.a. Column Family)
• Row
  ○ Has an arbitrary number of columns
  ○ Validator for keys (e.g. UTF8Type)
• Column
  ○ Validator for values and keys
  ○ Comparator for keys (e.g. DateType or BYOC)

(http://www.youtube.com/watch?v=bKfND4woylw)

Distributed Architecture
• Nodes form a token ring.
• Nodes partition the ring by initial token.
  ○ initial_token: (in cassandra.yaml)
• Partitioners map row keys to tokens.
  ○ Usually randomly, to evenly distribute the data
• All columns for a row are stored together on disk in sorted order.

Visually

Token/Hash Range: 0-99

Node   Token range
A      67-0
B      1-33
C      34-66

Row     Hash
Alice   50
Bob     3
Eve     15
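
The ring above can be mimicked in a few lines of plain Java. This is only a sketch of the lookup idea, not Cassandra's implementation: the class name `TokenRingSketch` and its methods are hypothetical, the hashes are copied from the slide rather than computed, and each node is assumed to own the tokens up to and including its own token.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring: each node owns the tokens up to and including its own token. */
class TokenRingSketch {
    // Token -> node name, kept sorted by token. The slide uses a 0-99 range.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String name, int token) {
        ring.put(token, name);
    }

    /** Returns the node owning the given token, wrapping past the highest token. */
    String ownerOf(int token) {
        Map.Entry<Integer, String> entry = ring.ceilingEntry(token);
        // No node token >= this one: wrap around to the lowest token on the ring.
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** The three-node ring from the slide. */
    static TokenRingSketch slideRing() {
        TokenRingSketch r = new TokenRingSketch();
        r.addNode("A", 0);   // owns 67-99 and 0
        r.addNode("B", 33);  // owns 1-33
        r.addNode("C", 66);  // owns 34-66
        return r;
    }

    public static void main(String[] args) {
        TokenRingSketch r = slideRing();
        // Hashes are taken from the slide; a real partitioner derives them from the row key.
        System.out.println("Alice -> " + r.ownerOf(50));
        System.out.println("Bob   -> " + r.ownerOf(3));
        System.out.println("Eve   -> " + r.ownerOf(15));
    }
}
```

With the slide's hashes, Alice (50) lands on C while Bob (3) and Eve (15) both land on B.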

Java Interpretation

• Each table is a distributed HashMap.
• Each row is a SortedMap.

Cassandra provides a massively scalable version of:

HashMap<rowKey, SortedMap<columnKey, columnValue>>

Implications:
• Direct row fetch is fast.
• Searching a range of rows can be costly.
• Searching a range of columns is cheap.
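
To make the analogy concrete, here is a single-JVM sketch of that nested map. `TableSketch` and its methods are hypothetical names chosen for illustration, and the sample values in `main` mirror the owen.oneill rows used elsewhere in this deck. It shows why a row fetch is one hash lookup and why a column range is a cheap slice of an already sorted map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory stand-in for one table: row key -> sorted columns. */
class TableSketch {
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    void put(String rowKey, String columnKey, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnKey, value);
    }

    /** Direct row fetch: a single hash lookup. */
    SortedMap<String, String> row(String rowKey) {
        return rows.getOrDefault(rowKey, Collections.emptySortedMap());
    }

    /** Column range within one row, from (inclusive) to (exclusive). */
    SortedMap<String, String> columnSlice(String rowKey, String from, String to) {
        // Cheap because the columns of a row are already kept in sorted order.
        return row(rowKey).subMap(from, to);
    }

    public static void main(String[] args) {
        TableSketch children = new TableSketch();
        children.put("owen.oneill", "zip", "EI33");
        children.put("owen.oneill", "country", "IRL");
        children.put("owen.oneill", "state", "D");
        System.out.println(children.row("owen.oneill"));
        System.out.println(children.columnSlice("owen.oneill", "s", "zz"));
    }
}
```

There is deliberately no method for scanning a range of row keys: with a hash-based outer map that would mean visiting every row, which is the costly case listed above.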

Defining our schema

Two Tables

Children Table
• Stores all the children in the world.
• One row per child.
• One column per attribute.

NaughtyOrNice Table
• Supports the queries we anticipate
• Wide-Row Strategy

Details of the NaughtyOrNice List
• One row per standing:country
  ○ Ensures all children in a country are grouped together on disk.
• One column per child using a compound key
  ○ Ensures the columns are sorted to support our search at varying levels of granularity
    e.g. All nice children in the US.
    e.g. All naughty children in PA.

Visually (rows are spread across Node 1, Node 2 and Node 3)

Nice:USA
  CA:94333:johny.b.good
  CA:94333:richie.rich

Nice:IRL
  D:EI33:collin.oneill
  D:EI33:owen.oneill

Naughty:USA
  CA:94111:bart.simpson
  CA:94222:dennis.menace
  PA:18964:michael.myers

(1) Go to the row.
(2) Get the column slice.

Watch out for:
• Hot spotting
• Unbalanced Clusters

Our Schema

bin/cqlsh -3

CREATE KEYSPACE northpole WITH replication =
  {'class':'SimpleStrategy', 'replication_factor':1};

create table children (
  childId varchar,
  firstName varchar,
  lastName varchar,
  timezone varchar,
  country varchar,
  state varchar,
  zip varchar,
  primary key (childId)
) WITH COMPACT STORAGE;

create table naughtyOrNiceList (
  standingByCountry varchar,
  state varchar,
  zip varchar,
  childId varchar,
  primary key (standingByCountry, state, zip, childId)
);

bin/cassandra-cli (the “old school” interface)

The CQL->Data Model Rules
• The first component of the primary key becomes the row key.
• Subsequent components of the primary key form a composite column name.
• One column is then written for each non-primary key column.
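
These rules can be written down as a small mapping function. What follows is a simplified sketch, not the storage engine's real encoding: `CqlToStorageSketch` and `cells` are hypothetical names, the colon-joined strings imitate the CLI display rather than the binary composite format, and the trailing-colon marker cell is included only because the CLI listing in this deck shows one for a table with no non-key columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified view of how one CQL row maps onto a storage row and its column names. */
class CqlToStorageSketch {
    /**
     * @param primaryKey   primary key values in declaration order (at least two components assumed)
     * @param otherColumns names of the non-primary-key columns that were written
     * @return "rowKey => columnName" strings in the style of the CLI listing
     */
    static List<String> cells(List<String> primaryKey, List<String> otherColumns) {
        // Rule 1: the first primary key component is the row key.
        String rowKey = primaryKey.get(0);
        // Rule 2: the remaining components form the composite column name prefix.
        String prefix = String.join(":", primaryKey.subList(1, primaryKey.size()));
        List<String> out = new ArrayList<>();
        // Marker cell with an empty last component, as seen in the CLI output.
        out.add(rowKey + " => " + prefix + ":");
        // Rule 3: one column per non-primary-key column.
        for (String column : otherColumns) {
            out.add(rowKey + " => " + prefix + ":" + column);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> key = Arrays.asList("nice:IRL", "D", "EI33", "owen.oneill");
        for (String cell : cells(key, Collections.<String>emptyList())) {
            System.out.println(cell);
        }
    }
}
```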

CQL View

cqlsh:northpole> select * from naughtyornicelist ;

 standingbycountry | state | zip   | childid
-------------------+-------+-------+---------------
 naughty:USA       | CA    | 94111 | bart.simpson
 naughty:USA       | CA    | 94222 | dennis.menace
 nice:IRL          | D     | EI33  | collin.oneill
 nice:IRL          | D     | EI33  | owen.oneill
 nice:USA          | CA    | 94333 | johny.b.good
 nice:USA          | CA    | 94333 | richie.rich

CLI View

[default@northpole] list naughtyornicelist;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: naughty:USA
=> (column=CA:94111:bart.simpson:, value=, timestamp=1355168971612000)
=> (column=CA:94222:dennis.menace:, value=, timestamp=1355168971614000)
-------------------
RowKey: nice:IRL
=> (column=D:EI33:collin.oneill:, value=, timestamp=1355168971604000)
=> (column=D:EI33:owen.oneill:, value=, timestamp=1355168971601000)
-------------------
RowKey: nice:USA
=> (column=CA:94333:johny.b.good:, value=, timestamp=1355168971610000)
=> (column=CA:94333:richie.rich:, value=, timestamp=1355168971606000)

Data Model Implications

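Works (lookup by row key):
select * from children where childid='owen.oneill';

Fails with "Bad Request:" (childid alone does not identify a row):
select * from naughtyornicelist where childid='owen.oneill';

Works (row key plus every component of the column name):
select * from naughtyornicelist where standingbycountry='nice:IRL' and state='D' and zip='EI33' and childid='owen.oneill';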
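
The pattern behind these three queries can be approximated with a small check. This is a rough sketch of the rule of thumb, not Cassandra's actual query validation: `WhereClauseSketch` and `isServable` are hypothetical names, and the check only considers equality restrictions, ignoring secondary indexes, range operators and any other query features.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Rough check of which equality-only WHERE clauses a primary key can serve. */
class WhereClauseSketch {
    static boolean isServable(List<String> primaryKey, Set<String> restricted) {
        // The row key must be given, otherwise there is no row to go to.
        if (!restricted.contains(primaryKey.get(0))) {
            return false;
        }
        // The remaining key columns must be restricted left to right with no gaps.
        int matched = 1;
        while (matched < primaryKey.size() && restricted.contains(primaryKey.get(matched))) {
            matched++;
        }
        // Anything restricted beyond the matched prefix is a gap or a non-key column.
        return matched == restricted.size();
    }

    public static void main(String[] args) {
        List<String> listKey = Arrays.asList("standingbycountry", "state", "zip", "childid");
        Set<String> childOnly = new HashSet<>(Arrays.asList("childid"));
        Set<String> fullKey = new HashSet<>(listKey);
        System.out.println("children by childid: "
                + isServable(Arrays.asList("childid"), childOnly));
        System.out.println("list by childid only: " + isServable(listKey, childOnly));
        System.out.println("list by full key: " + isServable(listKey, fullKey));
    }
}
```

Under these assumptions the check accepts the first and third queries and rejects the second, matching the behaviour shown above. It also accepts a restriction on just the row key and state, which corresponds to the "all naughty children in PA" style of search.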

Let’s get cranking.

No, seriously. Let’s code! What API should we use?

                 Production-Readiness   Potential   Momentum
Thrift           10                     -1          -1
Hector           10                     8           8
Astyanax         8                      9           10
Kundera (JPA)    6                      9           9
Pelops           7                      6           7
Firebrand        8                      10          8
PlayORM          5                      8           7
GORA             6                      9           7
CQL Driver       ?                      ?           ?

IMHO!

Astyanax FTW!

Connect

this.astyanaxContext = new AstyanaxContext.Builder()
    .forCluster("ClusterName")
    .forKeyspace(keyspace)
    .withAstyanaxConfiguration(…)
    .withConnectionPoolConfiguration(…)
    .buildKeyspace(ThriftFamilyFactory.getInstance());

Specify:
• Cluster Name (arbitrary identifier)
• Keyspace
• Node Discovery Method
• Connection Pool Information

Write/Update

MutationBatch mutation = keyspace.prepareMutationBatch();
columnFamily = new ColumnFamily<String, String>(columnFamilyName,
    StringSerializer.get(), StringSerializer.get());
mutation.withRow(columnFamily, rowKey)
    .putColumn(entry.getKey(), entry.getValue(), null);
mutation.execute();

Process:
• Create a mutation
• Specify the Column Family with Serializers
• Put your columns.
• Execute

Composite Types

Composite (a.k.a. Compound)

public class ListEntry {
    @Component(ordinal = 0) public String state;
    @Component(ordinal = 1) public String zip;
    @Component(ordinal = 2) public String childId;
}

Range Builders

range = entitySerializer.buildRange()
    .withPrefix(state)
    .greaterThanEquals("")
    .lessThanEquals("99999");

Then...

.withColumnRange(range).execute();
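
To show what such a range selects, here is a library-free sketch of the same slice over a sorted set of composite columns. It does not use Astyanax: `CompositeRangeSketch`, its nested `ListEntry` stand-in and the `slice` method are hypothetical, and components are assumed to compare as plain strings in ordinal order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class CompositeRangeSketch {
    /** Plain-Java stand-in for the annotated ListEntry composite. */
    static final class ListEntry {
        final String state;
        final String zip;
        final String childId;

        ListEntry(String state, String zip, String childId) {
            this.state = state;
            this.zip = zip;
            this.childId = childId;
        }
    }

    // Components compare in ordinal order: state, then zip, then childId.
    static final Comparator<ListEntry> ORDER = Comparator
            .comparing((ListEntry e) -> e.state)
            .thenComparing((ListEntry e) -> e.zip)
            .thenComparing((ListEntry e) -> e.childId);

    /** Child ids in one row whose state matches and whose zip lies in [zipFrom, zipTo]. */
    static List<String> slice(TreeSet<ListEntry> row, String state, String zipFrom, String zipTo) {
        List<String> out = new ArrayList<>();
        // Seek to the first possible column, then scan until the range ends.
        for (ListEntry e : row.tailSet(new ListEntry(state, zipFrom, ""), true)) {
            if (!e.state.equals(state) || e.zip.compareTo(zipTo) > 0) {
                break;
            }
            out.add(e.childId);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<ListEntry> naughtyUsa = new TreeSet<>(ORDER);
        naughtyUsa.add(new ListEntry("PA", "18964", "michael.myers"));
        naughtyUsa.add(new ListEntry("CA", "94222", "dennis.menace"));
        naughtyUsa.add(new ListEntry("CA", "94111", "bart.simpson"));
        System.out.println(slice(naughtyUsa, "CA", "", "99999"));
    }
}
```

Because the set is sorted by state first, the scan stops at the first column belonging to another state, so the work is proportional to the size of the slice rather than the size of the row.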

What about the toys!?

CQL Collections!

http://www.datastax.com/dev/blog/cql3_collections

Set
UPDATE users SET emails = emails + {'fb@friendsofmordor.org'} WHERE user_id = 'frodo';

List
UPDATE users SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';

Maps
UPDATE users SET todo['2012-10-2 12:10'] = 'die' WHERE user_id = 'frodo';

CQL vs. Thrift

http://www.datastax.com/dev/blog/thrift-to-cql3

Thrift is the legacy API on which all of the Java APIs are built.

CQL is the new native protocol and driver.

Let’s get back to cranking…
• Recreate the schema (to be CQL friendly)
  UPDATE children SET toys = toys + [ 'legos' ] WHERE childId = 'owen.oneill';
• Crank out a Dao layer to use CQL collections operations.
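
As a starting point for such a Dao layer, the sketch below only builds the CQL text for list operations on the toys column and leaves execution to whichever driver is used. `ToyDaoSketch` and its methods are hypothetical names, and the table is assumed to have a toys list column as implied by the UPDATE statement above. With a real driver, bound parameters would be preferable to string building.

```java
/** Builds CQL collection updates for the children table; executing them is left to a driver. */
class ToyDaoSketch {
    /** Doubles single quotes so a value can sit inside a CQL string literal. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** CQL that appends one toy to the end of the child's toys list. */
    static String appendToyCql(String childId, String toy) {
        return "UPDATE children SET toys = toys + [ " + quote(toy) + " ] WHERE childId = "
                + quote(childId) + ";";
    }

    /** CQL that puts one toy at the front of the child's toys list. */
    static String prependToyCql(String childId, String toy) {
        return "UPDATE children SET toys = [ " + quote(toy) + " ] + toys WHERE childId = "
                + quote(childId) + ";";
    }

    public static void main(String[] args) {
        System.out.println(appendToyCql("owen.oneill", "legos"));
        System.out.println(prependToyCql("owen.oneill", "legos"));
    }
}
```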

Shameless Shoutout(s)

Virgil: https://github.com/boneill42/virgil
REST interface for Cassandra

storm-cassandra: https://github.com/boneill42/storm-cassandra
Distributed Processing on Cassandra (Webinar in January)

Thanks!

https://github.com/boneill42/naughty-or-nice

Brian O’Neill, @boneill42, bone@alumni.brown.edu