DESCRIPTION

This presentation goes over some introductory concepts of the Cassandra distributed database.

Page 1: The myth of Cassandra

The Myth of Cassandra

I’ve had it with these crazed oracles

Cameron Kilgore | @thrillgore

NoSQL Series

Cas·san·dra [kəˈsændrə], noun

1) [Classical Greek Mythology.] A daughter of Priam and Hecuba, a prophet cursed by Apollo so that her prophecies, though true, were fated never to be believed.

2) [fml. “Apache Cassandra”] An open-source distributed, non-relational (NoSQL) database developed at Facebook, written in Java, and maintained as an Apache Software Foundation project.

What Cassandra does

Nonrelational associative array (key-value) data storage

Distributed
▪ One-hop DHT (akin to Amazon Dynamo)

Eventually consistent

Column-based storage

Queries can be faster than MySQL, depending on the workload

Based on white papers (Amazon’s Dynamo, Google’s Bigtable) and real-world use cases

Fault tolerant
▪ Provides no single point of failure

Load balancing

What Cassandra Does Not

Provide revision history

Handle relational data
▪ There’s this thing called “MySQL” that might be just up your alley

Provide an admin app
▪ Chiton is an in-development desktop app: http://github.com/driftx/chiton

Store individual data fields greater than 2^31 - 1 (2,147,483,647) bytes

Provide any interfaces outside of Thrift or the high-level libraries built on it

She who entangles companies

Already in use at Facebook

Also being used at Digg, Reddit, Twitter, Rackspace, Cisco, IBM, Cloudkick, OpenX and more…

Introducing Cassandra

Understanding the concepts of data in Cassandra and scalability

Columns and Data

Data is stored in columns, organized by keyspace

Each column stores data and is retrieved by its name value, akin to an associative array

[Diagram: row keys Key1, Key2 and Key3 holding one, two and four columns respectively]

Column: +name: byte[] +value: byte[] +timestamp: long
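The column model can be sketched in a few lines of Python. This is a hypothetical toy, not Cassandra’s storage engine or client API: it maps a row key to a dictionary of columns keyed by name, leaves out column families, and assumes microsecond timestamps. The helper names `insert` and `get` are made up for illustration.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Column:
    """A column as drawn on the slide: +name, +value and +timestamp."""
    name: bytes
    value: bytes
    timestamp: int  # a Java long in Cassandra; microseconds is an assumption here


# Hypothetical toy model: row key -> {column name -> Column}.
# Real Cassandra also groups rows into column families inside a keyspace.
Rows = Dict[bytes, Dict[bytes, Column]]


def insert(rows: Rows, key: bytes, name: bytes, value: bytes) -> None:
    """Store a column under a row key; a column with the same name is replaced."""
    timestamp = time.time_ns() // 1000
    rows.setdefault(key, {})[name] = Column(name, value, timestamp)


def get(rows: Rows, key: bytes, name: bytes) -> Optional[bytes]:
    """Find a column by its name, as with an associative array."""
    column = rows.get(key, {}).get(name)
    return column.value if column is not None else None


rows: Rows = {}
insert(rows, b"Key1", b"color", b"red")
insert(rows, b"Key3", b"width", b"24")
insert(rows, b"Key3", b"height", b"36")
insert(rows, b"Key3", b"height", b"48")  # same name: overwrites the old column
print(get(rows, b"Key3", b"height"))  # b'48'
print(len(rows[b"Key3"]))             # 2
```

Different rows can hold different column names, which is what makes the model schema-free.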

Supercolumns

What happens when Xzibit uses Cassandra

Supercolumns allow you to nest n number of columns in another column

And in turn, in a key you can nest n number of supercolumns (not shown here due to Office fail)

[Diagram: Key1 holding a supercolumn of three columns; Key2 holding a supercolumn of four columns]
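A supercolumn adds one more level of nesting. The sketch below is again a hypothetical in-memory model, with made-up helper names (`insert_sub`, `get_supercolumn`), made-up data, and plain strings instead of byte[] for readability: a row key holds supercolumns, and each supercolumn holds columns.

```python
from typing import Dict

# Hypothetical toy model: row key -> supercolumn name -> column name -> value.
SuperRows = Dict[str, Dict[str, Dict[str, str]]]


def insert_sub(rows: SuperRows, key: str, supercolumn: str,
               column: str, value: str) -> None:
    """Write one column nested inside a supercolumn of a given row."""
    rows.setdefault(key, {}).setdefault(supercolumn, {})[column] = value


def get_supercolumn(rows: SuperRows, key: str, supercolumn: str) -> Dict[str, str]:
    """Return a copy of every column under one supercolumn (empty if absent)."""
    return dict(rows.get(key, {}).get(supercolumn, {}))


rows: SuperRows = {}
insert_sub(rows, "Key1", "address", "street", "123 Main St")
insert_sub(rows, "Key1", "address", "city", "Springfield")
insert_sub(rows, "Key2", "address", "city", "Shelbyville")
print(get_supercolumn(rows, "Key1", "address"))
# {'street': '123 Main St', 'city': 'Springfield'}
```

The nesting stops there: columns sit inside supercolumns, and supercolumns sit inside a key.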

Anatomy of a Column

Cassandra is written in Java, so we abide by the rules of its variable types

Most of them will be byte strings (byte[]), which can hold Unicode (UTF-8) text

+timestamp is the only value not stored as a byte string; it is stored as a long
▪ Cassandra compares +timestamp values across nodes to reconcile data between them
▪ It is NOT used for revision history

Each column represented by an unseen UUID
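Timestamp reconciliation works as last-write-wins: when two nodes hold different versions of the same column, the version with the higher timestamp survives. A minimal sketch, with hypothetical helper names (`reconcile`, `merge_rows`) and made-up data; Cassandra’s own tie-breaking rule for equal timestamps is not modeled.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # the long supplied with each write


def reconcile(a: Column, b: Column) -> Column:
    """Last-write-wins: keep the version with the higher timestamp.

    On a tie this sketch keeps `a`; Cassandra's real tie-break is not modeled.
    """
    return b if b.timestamp > a.timestamp else a


def merge_rows(local: Dict[str, Column], remote: Dict[str, Column]) -> Dict[str, Column]:
    """Merge two replicas' copies of one row, column by column."""
    merged = dict(local)
    for name, column in remote.items():
        merged[name] = reconcile(merged[name], column) if name in merged else column
    return merged


node_a = {"email": Column("email", "old@example.com", 100)}
node_b = {"email": Column("email", "new@example.com", 200),
          "phone": Column("phone", "555-0100", 150)}
merged = merge_rows(node_a, node_b)
print(merged["email"].value)  # new@example.com
print(sorted(merged))         # ['email', 'phone']
```

Only the newest value is kept, which is why the timestamp cannot serve as revision history.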

Anatomy of a Column (cont.)

Columns are found by their +name value, not their UUID

You cannot have multiple columns of the same name in a row (assigning one with the same name overwrites the existing one)

Accessing the Data

Data is accessed through the Apache Incubator™ Thrift API

Thrift can generate client bindings for many programming languages

High-level client libraries built on Thrift exist for several languages

For our demos we’re going to use the cassandra-cli client, which lets us insert, remove and edit data

<INSERT CALL TO DEMO HERE>

OH GOD HOW DID I GET HERE I AM NOT GOOD WITH COMPUTER

Security in Cassandra

Cassandra does have user authentication through a SimpleAuthenticator module that is configured in conf files
▪ Very rudimentary

Ran out of time and suitable documentation to demonstrate it

Cassandra is not ACID-compliant

Load Balancing

Cassandra 0.6 has load balancing capabilities
▪ Not automatic, must be configured per node
▪ Load is shared in a token-ring fashion across the nodes in a multi-node configuration

Covered in the documentation for Cassandra
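Token-ring placement can be modeled as consistent hashing. In this simplified sketch each node gets one token, evenly spaced as i × 2^127 / N (the token space of Cassandra’s RandomPartitioner), and a row key belongs to the first node whose token is at or past the key’s hash, wrapping around the ring. Because every node knows all the tokens, any node can find the owner in one hop. Hashing with MD5 mirrors RandomPartitioner, but taking it modulo 2^127 is a simplification, the helper names are hypothetical, and replication is not modeled.

```python
import bisect
import hashlib
from typing import List

RING_SIZE = 2 ** 127  # token space used by Cassandra's RandomPartitioner


def balanced_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial tokens, one per node: i * 2**127 / N (sorted)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def key_token(key: str) -> int:
    """Hash a row key onto the ring (MD5 modulo the ring size; a simplification)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(tokens: List[int], key: str) -> int:
    """Index of the node owning `key`: the first node whose token is >= the
    key's token, wrapping around to node 0 past the end of the ring."""
    i = bisect.bisect_left(tokens, key_token(key))
    return i % len(tokens)


tokens = balanced_tokens(4)
print(tokens[1] == RING_SIZE // 4)  # True
counts = [0] * len(tokens)
for n in range(10_000):
    counts[owner(tokens, f"user{n}")] += 1
print(counts)  # roughly 2,500 keys per node
```

Evenly spaced tokens are what let the hashed keys, and therefore the load, spread evenly; a badly chosen token per node skews the split.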

Monitoring Cassandra

Cassandra exposes metrics as JMX data, so any JMX monitoring app should be sufficient
▪ Nagios
▪ Munin
▪ OpenNMS
▪ Any official Oracle™ Java monitoring and administration software
▪ What? I can’t be bothered to search for the name of the software?

Cassandra also has software for monitoring node activity; check the docs

Use Case Example

And a very simple one at that

Product Ordering Application

An ordering application implemented using a SQL database could span hundreds of tables and require constant iterations over its lifespan

What if the attributes of these products (in this case, HVAC components) were stored in Cassandra, and we kept pricing, users, and sessions data in an RDBMS?

Benefits to Cassandra

Product data that needs to be added later won’t require new RDBMS fields; we can just add it in new columns and write our code to ignore those columns when they aren’t there

We aren’t limited by RDBMS bottlenecks if we choose to go multinode in our Cassandra setup

No single point of failure if we choose to go multinode

If we get a lot of users (unlikely), the nodes will distribute the load equally

Less time spent on queries
▪ Depends on how effectively our data is stored and on the performance of our application
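Ignoring columns that aren’t there mostly comes down to reading optional attributes defensively. A hypothetical sketch using louver attribute names (Width, Height, Paint, Actuator) as column names, with made-up values and a made-up helper `describe_louver`: a product written before the Paint column existed still renders, with no schema migration.

```python
from typing import Dict

# Hypothetical louver row: column name -> value. Which columns exist can vary
# from product to product, because columns within a row have no fixed schema.
Row = Dict[str, str]


def describe_louver(row: Row) -> str:
    """Build a description from whatever columns this row actually has."""
    parts = [f"{row['Width']}x{row['Height']}"]  # assumed always present
    for column, label in (("Paint", "paint"), ("Actuator", "actuator")):
        value = row.get(column)                  # optional column: may be missing
        if value is not None:
            parts.append(f"{label}: {value}")
    return ", ".join(parts)


older = {"Width": "24", "Height": "36"}
newer = {"Width": "24", "Height": "36", "Paint": "white"}  # column added later
print(describe_louver(older))  # 24x36
print(describe_louver(newer))  # 24x36, paint: white
```

The flip side is that the application, not the database, now decides which columns are required.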

Downsides to Cassandra

We may not have the funding needed to procure a multinode configuration

No guarantee that existing data won’t need to be reconfigured over time to meet the demands of sales, engineering, executives, etc.

Data must be collected and given some form of relation inside the application itself, with no schema to enforce it

Cassandra lacks a vetted security framework, which could put us at risk

Cassandra also lacks a complete administration application
▪ Chiton is barely functional as-is

Might not make sense when some RDBMSes can also scale across machines

A (crude) data map showing our data in practice

Louvers keyspace
▪ Louver supercolumn
▪ Columns: Price, Product Multiplier, Height, Width, Actuator, Options, Misc. Options, Paint

Cassandra and PHP

This is a PHP user group, after all.

Talking to Cassandra

The low-level framework, Thrift, is the actual client API for Cassandra

In PHP we have two higher-level frameworks that work through Thrift
▪ phpcassa
▪ Pandra

Ran out of time to prepare a demo
▪ There’s always another time for a demo. Stay tuned.

Any Questions?

You will be baked, and there will be cake