CREATING YOUR FIRST JAVA APP W/ C*
Brian O’Neill, Lead Architect, Health Market Science
[email protected] @boneill42
MISSION: HELP SANTA!
• Background
• Setup
• Data Model / Schema
• Naughty List (Astyanax)
• Toy List (CQL)
Our Problem
Good, bad doctors? Dead doctors? Prescriber eligibility and remediation.
The World-Wide Globally Scalable Naughty List!
How about a Naughty and Nice list for Santa?
1.9 billion children
That will fit in a single row!
Queries to support:
• Children can log in and check their standing.
• Santa can find nice children by country, state or zip.
Getting Setup.
Installation
As easy as…
• Download: http://cassandra.apache.org/download/
• Uncompress: tar -xvzf apache-cassandra-1.2.0-beta3-bin.tar.gz
• Run: bin/cassandra -f
  (-f keeps it in the foreground)
Configuration
conf/cassandra.yaml
    start_native_transport: true    # CHANGE THIS TO TRUE
    commitlog_directory: /var/lib/cassandra/commitlog
conf/log4j-server.properties
    log4j.appender.R.File=/var/log/cassandra/system.log
Data Model
• Schema (a.k.a. Keyspace)
• Table (a.k.a. Column Family)
• Row
  ○ Has an arbitrary number of columns
  ○ Validator for keys (e.g. UTF8Type)
• Column
  ○ Validator for values and keys
  ○ Comparator for keys (e.g. DateType or BYOC)
(http://www.youtube.com/watch?v=bKfND4woylw)
Distributed Architecture
• Nodes form a token ring.
• Nodes partition the ring by initial token: initial_token (in cassandra.yaml).
• Partitioners map row keys to tokens, usually randomly, to distribute the data evenly.
• All columns for a row are stored together on disk in sorted order.
Visually

Node   Token range
A      67-0
B      1-33
C      34-66

Row    Hash
Alice  50
Bob    3
Eve    15

Token/hash range: 0-99
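The ring lookup in the diagram can be sketched in plain Java. This is a toy model, not Cassandra's partitioner: the class name TokenRing and its methods are made up for illustration, the hashes are the ones printed on the slide rather than computed, and it assumes each node owns the range that ends at its own token.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring for the slide's 0-99 token space (hypothetical class, JDK only). */
final class TokenRing {
    // Assumption: a node owns the range (previous node's token, its own token].
    private final TreeMap<Integer, String> nodesByToken = new TreeMap<>();

    void addNode(String name, int token) {
        nodesByToken.put(token, name);
    }

    /** Finds the first node whose token is >= the given token, wrapping around the ring. */
    String ownerOf(int token) {
        Map.Entry<Integer, String> owner = nodesByToken.ceilingEntry(token);
        return (owner != null ? owner : nodesByToken.firstEntry()).getValue();
    }

    /** Builds the three-node ring from the diagram. */
    static TokenRing fromSlide() {
        TokenRing ring = new TokenRing();
        ring.addNode("A", 0);   // owns 67-99 and 0
        ring.addNode("B", 33);  // owns 1-33
        ring.addNode("C", 66);  // owns 34-66
        return ring;
    }

    public static void main(String[] args) {
        TokenRing ring = fromSlide();
        System.out.println("Alice (50) -> " + ring.ownerOf(50)); // C
        System.out.println("Bob (3)    -> " + ring.ownerOf(3));  // B
        System.out.println("Eve (15)   -> " + ring.ownerOf(15)); // B
    }
}
```

Note that Bob and Eve both land on node B. With only three rows that is just chance, but it is the same effect that shows up at scale as an unbalanced cluster when keys hash unevenly.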
Java Interpretation
Each table is a distributed HashMap. Each row is a SortedMap.
Cassandra provides a massively scalable version of:
HashMap<rowKey, SortedMap<columnKey, columnValue>>
Implications:
• Direct row fetch is fast.
• Searching a range of rows can be costly.
• Searching a range of columns is cheap.
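To make the analogy concrete, here is a single-JVM sketch of that nested map. The class TableAsMap is hypothetical and the sample values are illustrative; it only shows why a row fetch is one hash lookup and a column range is a cheap view over sorted keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory stand-in for one table: HashMap of row key -> sorted columns. */
final class TableAsMap {
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    void put(String rowKey, String columnKey, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnKey, value);
    }

    /** Direct row fetch: a single hash lookup. */
    SortedMap<String, String> row(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    /** Column range inside one row: cheap because the columns are already sorted. */
    SortedMap<String, String> columns(String rowKey, String from, String toExclusive) {
        return row(rowKey).subMap(from, toExclusive);
    }

    public static void main(String[] args) {
        TableAsMap children = new TableAsMap();
        // Illustrative values only.
        children.put("owen.oneill", "country", "IRL");
        children.put("owen.oneill", "firstName", "Owen");
        children.put("owen.oneill", "lastName", "ONeill");
        children.put("owen.oneill", "state", "D");
        children.put("owen.oneill", "zip", "EI33");

        System.out.println(children.row("owen.oneill"));
        // Columns whose names fall in ["f", "m"): firstName and lastName.
        System.out.println(children.columns("owen.oneill", "f", "m"));
    }
}
```

There is deliberately no method for scanning a range of row keys: a HashMap has no useful key order, which is the in-memory version of "searching a range of rows can be costly".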
Defining our schema
Two Tables
Children Table
• Store all the children in the world.
• One row per child.
• One column per attribute.
NaughtyOrNice Table
• Supports the queries we anticipate
• Wide-row strategy
Details of the NaughtyOrNice List
• One row per standing:country
  ○ Ensures all children in a country are grouped together on disk.
• One column per child using a compound key
  ○ Ensures the columns are sorted to support our search at varying levels of granularity
    e.g. All nice children in the US.
    e.g. All naughty children in PA.
Node 3
Node 2
Node 1
Visually
Nice:USA
CA:94333:johny.b.good
CA:94333:richie.rich
Nice:IRL
D:EI33:collin.oneill
D:EI33:owen.oneill
Naughty:USA
CA:94111:bart.simpson
CA:94222:dennis.menace
PA:18964:michael.myers
Watch out for:
• Hot spotting
• Unbalanced clusters
(1) Go to the row.
(2) Get the column slice.
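The two-step read can be sketched with JDK collections. This is an in-memory illustration, not client code: the class NaughtyOrNiceSketch is hypothetical, composite column names are modelled as plain "state:zip:childId" strings, and michael.myers is filed under naughty as an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Wide-row sketch: one row per standing:country, one sorted column per child. */
final class NaughtyOrNiceSketch {
    private final Map<String, NavigableMap<String, String>> rows = new HashMap<>();

    void add(String standing, String country, String state, String zip, String childId) {
        String rowKey = standing + ":" + country;
        String column = state + ":" + zip + ":" + childId;
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, "");
    }

    /** (1) Go to the row. (2) Return the columns whose name starts with the prefix. */
    List<String> slice(String standing, String country, String prefix) {
        NavigableMap<String, String> row =
                rows.getOrDefault(standing + ":" + country, new TreeMap<>());
        // Every string starting with the prefix sorts between prefix and prefix + '\uffff'.
        return new ArrayList<>(
                row.subMap(prefix, true, prefix + Character.MAX_VALUE, true).keySet());
    }

    static NaughtyOrNiceSketch sample() {
        NaughtyOrNiceSketch list = new NaughtyOrNiceSketch();
        list.add("nice", "USA", "CA", "94333", "johny.b.good");
        list.add("nice", "USA", "CA", "94333", "richie.rich");
        list.add("nice", "IRL", "D", "EI33", "collin.oneill");
        list.add("nice", "IRL", "D", "EI33", "owen.oneill");
        list.add("naughty", "USA", "CA", "94111", "bart.simpson");
        list.add("naughty", "USA", "CA", "94222", "dennis.menace");
        list.add("naughty", "USA", "PA", "18964", "michael.myers"); // assumed naughty
        return list;
    }

    public static void main(String[] args) {
        NaughtyOrNiceSketch list = sample();
        System.out.println(list.slice("nice", "USA", ""));       // all nice children in the US
        System.out.println(list.slice("naughty", "USA", "PA:")); // all naughty children in PA
    }
}
```

Widening or narrowing the prefix (country, state, zip) is what gives the search at varying levels of granularity without touching any other row.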
Our Schema
bin/cqlsh -3

CREATE KEYSPACE northpole WITH replication =
    {'class':'SimpleStrategy', 'replication_factor':1};

create table children (
    childId varchar,
    firstName varchar,
    lastName varchar,
    timezone varchar,
    country varchar,
    state varchar,
    zip varchar,
    primary key (childId)
) WITH COMPACT STORAGE;

create table naughtyOrNiceList (
    standingByCountry varchar,
    state varchar,
    zip varchar,
    childId varchar,
    primary key (standingByCountry, state, zip, childId)
);
bin/cassandra-cli (the “old school” interface)
The CQL->Data Model Rules
• The first component of the primary key becomes the row key.
• Subsequent components of the primary key form a composite column name.
• One column is then written for each non-primary-key column.
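As a rough illustration of these rules, the sketch below turns the primary key values of one CQL row into a row key and a composite column name. The class CqlToStorage is hypothetical, the ":" separator is only the way the CLI prints composites, and the trailing empty component for a table with no non-key columns is an assumption that mirrors the CLI output shown in this deck.

```java
import java.util.Arrays;
import java.util.List;

/** Maps CQL primary key values onto the underlying row key / column name (illustration only). */
final class CqlToStorage {

    /** Rule 1: the first primary key component is the row key. */
    static String rowKey(List<String> primaryKeyValues) {
        return primaryKeyValues.get(0);
    }

    /**
     * Rules 2 and 3: the remaining key components, followed by the CQL column name,
     * form the composite column name. Pass "" when the table has no non-key columns.
     */
    static String columnName(List<String> primaryKeyValues, String cqlColumn) {
        List<String> rest = primaryKeyValues.subList(1, primaryKeyValues.size());
        if (rest.isEmpty()) {
            return cqlColumn; // single-component key: the column name is used as-is
        }
        return String.join(":", rest) + ":" + cqlColumn;
    }

    public static void main(String[] args) {
        List<String> key = Arrays.asList("naughty:USA", "CA", "94111", "bart.simpson");
        System.out.println("RowKey: " + rowKey(key));
        System.out.println("column=" + columnName(key, ""));
    }
}
```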
CQL View
cqlsh:northpole> select * from naughtyornicelist ;

 standingbycountry | state | zip   | childid
-------------------+-------+-------+---------------
 naughty:USA       | CA    | 94111 | bart.simpson
 naughty:USA       | CA    | 94222 | dennis.menace
 nice:IRL          | D     | EI33  | collin.oneill
 nice:IRL          | D     | EI33  | owen.oneill
 nice:USA          | CA    | 94333 | johny.b.good
 nice:USA          | CA    | 94333 | richie.rich
CLI View
[default@northpole] list naughtyornicelist;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: naughty:USA
=> (column=CA:94111:bart.simpson:, value=, timestamp=1355168971612000)
=> (column=CA:94222:dennis.menace:, value=, timestamp=1355168971614000)
-------------------
RowKey: nice:IRL
=> (column=D:EI33:collin.oneill:, value=, timestamp=1355168971604000)
=> (column=D:EI33:owen.oneill:, value=, timestamp=1355168971601000)
-------------------
RowKey: nice:USA
=> (column=CA:94333:johny.b.good:, value=, timestamp=1355168971610000)
=> (column=CA:94333:richie.rich:, value=, timestamp=1355168971606000)
Data Model Implications
select * from children where childid='owen.oneill';
select * from naughtyornicelist where childid='owen.oneill';
Bad Request:
select * from naughtyornicelist where standingbycountry='nice:IRL' and state='D' and zip='EI33' and childid='owen.oneill';
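One way to reason about which of these queries can work is to check whether the WHERE clause pins down the row key first and then the key components in order. The sketch below is a deliberately simplified model, not Cassandra's actual validation: the class WhereClauseCheck is hypothetical, it only considers equality restrictions, and it ignores secondary indexes and ALLOW FILTERING.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified check: equality restrictions must cover a leading prefix of the primary key. */
final class WhereClauseCheck {

    static boolean isSupported(List<String> primaryKey, Set<String> restrictedColumns) {
        int prefix = 0;
        // Count how many leading primary key components are restricted.
        while (prefix < primaryKey.size()
                && restrictedColumns.contains(primaryKey.get(prefix))) {
            prefix++;
        }
        // Anything restricted outside that leading prefix cannot be located by key.
        return prefix == restrictedColumns.size();
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("childid");
        List<String> list = Arrays.asList("standingbycountry", "state", "zip", "childid");

        System.out.println(isSupported(children, new HashSet<>(Arrays.asList("childid")))); // true
        System.out.println(isSupported(list, new HashSet<>(Arrays.asList("childid"))));     // false
        System.out.println(isSupported(list, new HashSet<>(list)));                         // true
    }
}
```

The failing case is the one where childid is restricted without the row key: there is no way to "go to the row" first.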
Let’s get cranking.
No, seriously. Let’s code! What API should we use?
API              Production-Readiness   Potential   Momentum
Thrift           10                     -1          -1
Hector           10                     8           8
Astyanax         8                      9           10
Kundera (JPA)    6                      9           9
Pelops           7                      6           7
Firebrand        8                      10          8
PlayORM          5                      8           7
GORA             6                      9           7
CQL Driver       ?                      ?           ?
IMHO!
Astyanax FTW!
Connect
this.astyanaxContext = new AstyanaxContext.Builder()
    .forCluster("ClusterName")
    .forKeyspace(keyspace)
    .withAstyanaxConfiguration(…)
    .withConnectionPoolConfiguration(…)
    .buildKeyspace(ThriftFamilyFactory.getInstance());
Specify:
• Cluster name (arbitrary identifier)
• Keyspace
• Node discovery method
• Connection pool information
Write/Update
MutationBatch mutation = keyspace.prepareMutationBatch();
columnFamily = new ColumnFamily<String, String>(columnFamilyName,
    StringSerializer.get(), StringSerializer.get());
mutation.withRow(columnFamily, rowKey)
    .putColumn(entry.getKey(), entry.getValue(), null);
mutation.execute();
Process:
• Create a mutation.
• Specify the column family with serializers.
• Put your columns.
• Execute.
Composite Types
Composite (a.k.a. Compound)
public class ListEntry {
    @Component(ordinal = 0)
    public String state;
    @Component(ordinal = 1)
    public String zip;
    @Component(ordinal = 2)
    public String childId;
}
Range Builders
range = entitySerializer.buildRange()
    .withPrefix(state)
    .greaterThanEquals("")
    .lessThanEquals("99999");
Then...
.withColumnRange(range).execute();
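What such a range selects can be approximated with a sorted set and a component-wise comparator. The sketch below uses only the JDK and does not call Astyanax: the nested ListEntry is a plain stand-in for the annotated class, and the upper bound is padded with '\uffff' on the assumption that the range should include every childId in the last zip.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Range over (state, zip, childId) columns, compared component by component. */
final class ListEntryRange {

    /** Plain stand-in for the annotated ListEntry on the slide. */
    static final class ListEntry {
        final String state;
        final String zip;
        final String childId;

        ListEntry(String state, String zip, String childId) {
            this.state = state;
            this.zip = zip;
            this.childId = childId;
        }

        @Override
        public String toString() {
            return state + ":" + zip + ":" + childId;
        }
    }

    /** Same ordering idea as a composite comparator: state, then zip, then childId. */
    static final Comparator<ListEntry> ORDER = Comparator
            .comparing((ListEntry e) -> e.state)
            .thenComparing(e -> e.zip)
            .thenComparing(e -> e.childId);

    /** Columns with the given state prefix and a zip between fromZip and toZip, inclusive. */
    static NavigableSet<ListEntry> zipRange(NavigableSet<ListEntry> columns,
                                            String state, String fromZip, String toZip) {
        ListEntry low = new ListEntry(state, fromZip, "");
        ListEntry high = new ListEntry(state, toZip, String.valueOf(Character.MAX_VALUE));
        return columns.subSet(low, true, high, true);
    }

    static NavigableSet<ListEntry> sample() {
        NavigableSet<ListEntry> columns = new TreeSet<>(ORDER);
        columns.add(new ListEntry("PA", "18964", "michael.myers"));
        columns.add(new ListEntry("CA", "94222", "dennis.menace"));
        columns.add(new ListEntry("CA", "94111", "bart.simpson"));
        return columns;
    }

    public static void main(String[] args) {
        // Same bounds as the slide: prefix = state, zip from "" to "99999".
        System.out.println(zipRange(sample(), "CA", "", "99999"));
    }
}
```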
What about the toys!?
CQL Collections!
http://www.datastax.com/dev/blog/cql3_collections
Set: UPDATE users SET emails = emails + {'[email protected]'} WHERE user_id = 'frodo';
List: UPDATE users SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';
Map: UPDATE users SET todo['2012-10-2 12:10'] = 'die' WHERE user_id = 'frodo';
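For Java developers, the three updates behave roughly like the java.util operations below. This is an analogy sketch, not driver code: the class CollectionAnalogy and its helpers are hypothetical, the map key is kept as a plain string, and the email address and 'rivendell' are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Java-collection analogies for the three CQL collection updates. */
final class CollectionAnalogy {

    /** Set: emails = emails + {'x'}. Adding an element that is already present changes nothing. */
    static Set<String> addToSet(Set<String> current, String element) {
        Set<String> updated = new TreeSet<>(current);
        updated.add(element);
        return updated;
    }

    /** List: top_places = ['x'] + top_places. The new element goes to the front. */
    static List<String> prepend(List<String> current, String element) {
        List<String> updated = new ArrayList<>();
        updated.add(element);
        updated.addAll(current);
        return updated;
    }

    /** Map: todo['k'] = 'v'. Inserts or overwrites a single entry. */
    static Map<String, String> putEntry(Map<String, String> current, String key, String value) {
        Map<String, String> updated = new TreeMap<>(current);
        updated.put(key, value);
        return updated;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration.
        System.out.println(addToSet(new TreeSet<>(), "frodo@example.invalid"));
        System.out.println(prepend(Arrays.asList("rivendell"), "the shire"));
        System.out.println(putEntry(new TreeMap<>(), "2012-10-2 12:10", "die"));
    }
}
```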
CQL vs. Thrift
http://www.datastax.com/dev/blog/thrift-to-cql3
Thrift is the legacy API on which all of the Java APIs are built.
CQL is the new native protocol and driver.
Let’s get back to cranking…
• Recreate the schema (to be CQL friendly)
UPDATE children SET toys = toys + [ 'legos' ] WHERE childId = 'owen.oneill';
• Crank out a DAO layer that uses CQL collection operations.
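A first cut of such a DAO might simply build the CQL text for the collection operations. The sketch below is a minimal illustration under stated assumptions: the class ToyDaoSketch and its methods are hypothetical, it assumes the recreated children table has a toys column of a CQL list type, and it only produces statement strings. Real code would hand these to whichever CQL driver is in use, preferably as prepared statements rather than concatenated literals.

```java
/** Builds CQL statements for list operations on a child's toys (string-building sketch). */
final class ToyDaoSketch {

    /** Quotes a CQL string literal, doubling any embedded single quotes. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Appends one toy to the end of the child's list. */
    static String appendToy(String childId, String toy) {
        return "UPDATE children SET toys = toys + [ " + quote(toy) + " ] WHERE childId = "
                + quote(childId) + ";";
    }

    /** Removes every occurrence of the toy from the child's list. */
    static String removeToy(String childId, String toy) {
        return "UPDATE children SET toys = toys - [ " + quote(toy) + " ] WHERE childId = "
                + quote(childId) + ";";
    }

    public static void main(String[] args) {
        System.out.println(appendToy("owen.oneill", "legos"));
        System.out.println(removeToy("owen.oneill", "legos"));
    }
}
```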
Shameless Shoutout(s)
Virgil https://github.com/boneill42/virgil
REST interface for Cassandra
https://github.com/boneill42/storm-cassandra
Distributed Processing on Cassandra (Webinar in January)