PHP and Cassandra

A very quick introduction to the Cassandra database and to accessing it from PHP. These are the slides from my five-minute "lightning" talk at PHP London on 1/7/2010.

A (very) quick introduction to using PHP and Cassandra

@davegardnerisme

What is Cassandra?

Highly scalable distributed database

Brings together Amazon's Dynamo (distribution model) and Google's Bigtable (data model)

Schema-less

#nosql!

Why Cassandra?

Horizontally scalable – read and write throughput increase linearly as nodes are added

Fault tolerant – no single point of failure

Hadoop integration for scalable Map Reduce

Good community support (professional support is also available)

Data model

Store your data in the way you want to query it – denormalise

Some people describe Cassandra as a four- or five-level hash (1):

[KeySpace][ColumnFamily][Key][Column]

[KeySpace][ColumnFamily][Key][SuperColumn][SubColumn]
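To make the nested-hash view concrete, here is a toy sketch using plain PHP arrays. It does not talk to Cassandra and is not how Cassandra stores data; every keyspace, column family, row and column name in it (Keyspace1, UrlsVisited, VisitorsByUrl, user42 and so on) is hypothetical. It also illustrates denormalisation: the same visit is written twice, once for each query we want to answer.

```php
<?php
// Toy model only: nested PHP arrays standing in for Cassandra's
// [KeySpace][ColumnFamily][Key][Column] and
// [KeySpace][ColumnFamily][Key][SuperColumn][SubColumn] paths.
// All names and values below are hypothetical.
$store = array();

// Standard column family: four levels down to a column value.
$store['Keyspace1']['UrlsVisited']['user42']['http://example.com/'] = '2010-07-01';

// Denormalised copy of the same fact, keyed for the "who visited this URL?"
// query (hypothetical column family).
$store['Keyspace1']['VisitorsByUrl']['http://example.com/']['user42'] = '2010-07-01';

// Super column family: five levels, a super column grouping sub-columns.
$store['Keyspace1']['Super1']['user42']['2010-07-01']['http://example.com/'] = '1';

// Schema-less: another row in the same column family has different columns.
$store['Keyspace1']['UrlsVisited']['user7']['http://php.net/'] = '2010-06-30';

// Follow a path of keys; null means the path does not exist
// (the real Thrift get() throws cassandra_NotFoundException instead).
function lookup(array $store, array $path)
{
    $node = $store;
    foreach ($path as $part) {
        if (!is_array($node) || !array_key_exists($part, $node)) {
            return null;
        }
        $node = $node[$part];
    }
    return $node;
}

echo lookup($store, array('Keyspace1', 'UrlsVisited', 'user42', 'http://example.com/')), "\n";
// prints 2010-07-01
```

Each query is a direct walk down one path rather than a join, which is why the data is stored once per access pattern.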

Configuring – storage-conf.xml

<ColumnFamily Name="Standard1" CompareWith="BytesType"/>

<ColumnFamily Name="Standard2" CompareWith="UTF8Type"/>

<ColumnFamily Name="Super1" ColumnType="Super" CompareWith="BytesType" CompareSubcolumnsWith="BytesType" />
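CompareWith sets the comparator that keeps column names sorted within each row (for Super1, CompareWith orders the super columns and CompareSubcolumnsWith the sub-columns), and that order is what a range of columns is taken from. The sketch below simulates this in plain PHP for BytesType, using strcmp() (a byte-wise comparison) as a stand-in; sortRow() and sliceRow() are hypothetical helpers, not part of any Cassandra client.

```php
<?php
// Pure-PHP simulation, not Cassandra code: keep a row's columns ordered by
// a BytesType-style comparator and take a slice of them by name.
// strcmp() compares byte by byte, so it stands in for BytesType here.

// Return the row with its column names in comparator order.
function sortRow(array $row)
{
    uksort($row, 'strcmp');
    return $row;
}

// Columns whose names lie in [$start, $finish]; an empty string leaves that
// end unbounded, mirroring how a slice range treats an empty start/finish.
function sliceRow(array $row, $start, $finish)
{
    $out = array();
    foreach (sortRow($row) as $name => $value) {
        if ($start !== '' && strcmp($name, $start) < 0) {
            continue; // before the range
        }
        if ($finish !== '' && strcmp($name, $finish) > 0) {
            break; // sorted, so everything after this is past the range too
        }
        $out[$name] = $value;
    }
    return $out;
}

$row = array('zebra' => 'z', 'Apple' => 'A', 'apple' => 'a', 'mango' => 'm');
print_r(array_keys(sortRow($row)));                // Apple, apple, mango, zebra
print_r(array_keys(sliceRow($row, 'apple', 'n'))); // apple, mango
```

Note that byte order puts 'Apple' before 'apple', so the choice of comparator decides which columns a range actually returns.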

A good example in an easily understood problem domain: Twissandra! (2)

PHP

Access the core API via Thrift (3)

Higher-level libraries also exist: PHPCassa (4) and Pandra (5)

Build Thrift, then use it to generate the PHP client libraries from Cassandra's interface definition (6)

Native PHP Thrift extension recommended
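As far as we know, the Thrift-generated PHP client code of this era only takes the native fast path when the protocol is a TBinaryProtocolAccelerated and the extension's functions exist; otherwise it serialises in pure PHP. A quick check, assuming thrift_protocol_write_binary() and thrift_protocol_read_binary() are the functions the thrift_protocol extension registers (thriftExtensionAvailable() is a hypothetical helper):

```php
<?php
// Hypothetical helper: reports whether the native Thrift extension's
// (assumed) serialisation functions are available in this PHP process.
function thriftExtensionAvailable()
{
    return function_exists('thrift_protocol_write_binary')
        && function_exists('thrift_protocol_read_binary');
}

echo thriftExtensionAvailable()
    ? "native Thrift extension loaded: accelerated protocol available\n"
    : "native Thrift extension missing: pure-PHP serialisation will be used\n";
```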

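Some code!

// connect
// Load the Thrift runtime and the generated Cassandra bindings.
// THRIFT_ROOT is an assumed install path; adjust it to your setup.
$GLOBALS['THRIFT_ROOT'] = '/usr/share/php/Thrift';
require_once $GLOBALS['THRIFT_ROOT'] . '/packages/cassandra/Cassandra.php';
require_once $GLOBALS['THRIFT_ROOT'] . '/packages/cassandra/cassandra_types.php';
require_once $GLOBALS['THRIFT_ROOT'] . '/transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'] . '/transport/TBufferedTransport.php';
require_once $GLOBALS['THRIFT_ROOT'] . '/protocol/TBinaryProtocol.php';

$socket = new TSocket('192.168.1.206', 9160);
$transport = new TBufferedTransport($socket, 1024, 1024);
// the Accelerated protocol lets the generated code use the native extension
$protocol = new TBinaryProtocolAccelerated($transport);
$client = new CassandraClient($protocol);
$transport->open();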

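// fetch single column from single row
// ($keyspace, $userId and $url are assumed to be set by your application)
$columnPath = new cassandra_ColumnPath();
$columnPath->column_family = 'UrlsVisited';
$columnPath->column = $url;

$userUrl = $client->get(
    $keyspace,    // a bit like the database
    $userId,      // the row key
    $columnPath,  // identifies the column we want
    cassandra_ConsistencyLevel::ONE
);
// get() returns a cassandra_ColumnOrSuperColumn: the value is in
// $userUrl->column->value; a missing column throws cassandra_NotFoundException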

// Inserting multiple columns for a single row
// (a bit like populating one row of MySQL)
$key = 'UniqueRowKey';
$columnFamily = 'ResponsePersonality';
$mutationMap = array(
    $key => array($columnFamily => array())
);
// Cassandra timestamps are conventionally microseconds since the epoch
// (time() gives seconds); assumes 64-bit PHP so the value fits in an int
$timestamp = (int) round(microtime(true) * 1000000);

// add our first column:
$column = new cassandra_Column(array(
    'name'      => 'howMuchWork',
    'value'     => 'quiteABit',
    'timestamp' => $timestamp
));
$columnOrSupercolumn = new cassandra_ColumnOrSuperColumn(
    array('column' => $column));
$mutationMap[$key][$columnFamily][] = new cassandra_Mutation(
    array('column_or_supercolumn' => $columnOrSupercolumn));

// add our second column!:
$column = new cassandra_Column(array(
    'name'      => 'nextColumnName',
    'value'     => 'wow',
    'timestamp' => $timestamp
));
$columnOrSupercolumn = new cassandra_ColumnOrSuperColumn(
    array('column' => $column));
$mutationMap[$key][$columnFamily][] = new cassandra_Mutation(
    array('column_or_supercolumn' => $columnOrSupercolumn));

// repeat with other columns ...

// finally we call batch_mutate to add!
// (ZERO: fire-and-forget, the write happens asynchronously in the background)
$client->batch_mutate(
    $keyspace,
    $mutationMap,
    cassandra_ConsistencyLevel::ZERO
);
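The nested shape of $mutationMap is easiest to see without the Thrift wrapper objects. The sketch below builds the same row key => column family => list-of-mutations structure from plain arrays. addColumn() and microsecondTimestamp() are hypothetical helpers, the arrays only mimic the fields of cassandra_Mutation, cassandra_ColumnOrSuperColumn and cassandra_Column, and the microsecond timestamp follows the usual Cassandra convention (an assumption, and it needs 64-bit PHP).

```php
<?php
// Hypothetical helpers using plain arrays, no Thrift: they build the same
// nested shape batch_mutate() expects (row key => column family => list).
// In real code each list entry is a cassandra_Mutation wrapping a
// cassandra_ColumnOrSuperColumn, as in the code above.

function microsecondTimestamp()
{
    // Assumes 64-bit PHP so the value fits in an integer.
    return (int) round(microtime(true) * 1000000);
}

function addColumn(array &$mutationMap, $key, $columnFamily, $name, $value, $timestamp)
{
    $mutationMap[$key][$columnFamily][] = array(
        'column_or_supercolumn' => array(
            'column' => array(
                'name'      => $name,
                'value'     => $value,
                'timestamp' => $timestamp,
            ),
        ),
    );
}

$timestamp = microsecondTimestamp(); // one timestamp for the whole batch
$mutationMap = array();
addColumn($mutationMap, 'UniqueRowKey', 'ResponsePersonality', 'howMuchWork', 'quiteABit', $timestamp);
addColumn($mutationMap, 'UniqueRowKey', 'ResponsePersonality', 'nextColumnName', 'wow', $timestamp);
// Several rows (or column families) can share one batch_mutate() call.
addColumn($mutationMap, 'AnotherRowKey', 'ResponsePersonality', 'howMuchWork', 'notMuch', $timestamp);

echo count($mutationMap), " rows, ",
    count($mutationMap['UniqueRowKey']['ResponsePersonality']), " columns in the first\n";
// prints "2 rows, 2 columns in the first"
```

Collecting everything into one map and sending it with a single batch_mutate() call saves round trips compared with one insert() call per column.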

Finally… Hadoop integration

Support for creating Hadoop jobs in Java (7)

Support for Pig (a higher-level language) (7)

Cassandra 0.7 will include output support, letting jobs write their results back to Cassandra (8)

No support for Hive (SQL-like syntax for creating Map Reduce jobs) yet! (9)
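Real Hadoop jobs are written in Java (or Pig), but the shape of a Map Reduce job over a column family can be shown in a few lines of plain PHP. This is only an illustration of the pattern, not Hadoop or Cassandra code: the input rows are hypothetical UrlsVisited data (row key = user id, column name = URL), and mapRow(), groupByKey() and reduceGroup() are made-up names for the map, shuffle and reduce phases. It counts visits per URL across all users.

```php
<?php
// Illustration of the Map Reduce pattern in plain PHP; NOT Hadoop or
// Cassandra code. Hypothetical UrlsVisited rows: user id => (URL => value).
$rows = array(
    'user42' => array('http://example.com/' => 'x', 'http://php.net/' => 'x'),
    'user7'  => array('http://php.net/' => 'x'),
);

// Map: each row is handled on its own and emits (url, 1) pairs,
// which is what lets the work be spread across nodes.
function mapRow($rowKey, array $columns)
{
    $pairs = array();
    foreach (array_keys($columns) as $url) {
        $pairs[] = array($url, 1);
    }
    return $pairs;
}

// Shuffle: group emitted values by key (Hadoop does this between phases).
function groupByKey(array $pairs)
{
    $groups = array();
    foreach ($pairs as $pair) {
        $groups[$pair[0]][] = $pair[1];
    }
    return $groups;
}

// Reduce: combine all values for one key; here, a sum.
function reduceGroup($url, array $counts)
{
    return array_sum($counts);
}

$emitted = array();
foreach ($rows as $rowKey => $columns) {
    $emitted = array_merge($emitted, mapRow($rowKey, $columns));
}

$visitsPerUrl = array();
foreach (groupByKey($emitted) as $url => $counts) {
    $visitsPerUrl[$url] = reduceGroup($url, $counts);
}

print_r($visitsPerUrl); // http://example.com/ => 1, http://php.net/ => 2
```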

References/links

1. Cassandra is a four or five level hash: https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/

2. Twissandra: http://www.rackspacecloud.com/blog/2010/05/12/cassandra-by-example/

3. Thrift API: http://wiki.apache.org/cassandra/API

4. PHPCassa: http://github.com/hoan/phpcassa

5. Pandra: http://github.com/mjpearson/Pandra

6. Using Cassandra with PHP (installing Thrift): https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP

7. Hadoop support in Cassandra: http://wiki.apache.org/cassandra/HadoopSupport

8. Output support in Cassandra: https://issues.apache.org/jira/browse/CASSANDRA-1101

9. Hive support (feature request!): https://issues.apache.org/jira/browse/CASSANDRA-913
