Embracing Constraints With CouchDB

Presentation given at Dutch PHP Conference 2010. http://joind.in/1651

David Zülke

David Zuelke

http://en.wikipedia.org/wiki/File:München_Panorama.JPG

Founder

Lead Developer

http://joind.in/1651

A DISCLAIMER FIRST

Before You All Figure This Out Yourselves...

NO NO NO NO

THIS IS FRAUD

This talk is not really about embracing constraints

I’ll tell you what it’s really about when we’re finished

I’ll also apologize to you for lying at that point

(it’s always easier to apologize than to ask for permission)

COUCHDB IN THREE SLIDES

Full Of DIS IS SRS BSNS Bullet Points

COUCHDB STORES DOCUMENTS

• CouchDB stores documents with arbitrary keys and values

• Each document is identified by an ID and has a revision

• Documents can have file attachments

• Stored as JSON, so it’s easy to interface with
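As a rough illustration, a document for a talk might look like the object below. Only `_id`, `_rev` and `_attachments` are CouchDB's reserved fields; the other keys, the ID, the revision string and the attachment values are placeholders invented for this sketch.

```javascript
// A hypothetical CouchDB document, written as a JavaScript object literal.
// _id, _rev and _attachments are reserved fields; everything else is an
// arbitrary key chosen for this example.
const talk = {
  _id: 'embracing-constraints',   // document ID (made up)
  _rev: '1-abc123',               // placeholder revision: "<number>-<hash>"
  type: 'talk',
  title: 'Embracing Constraints With CouchDB',
  tags: ['couchdb', 'php'],
  _attachments: {
    'slides.pdf': {               // attachment stub (placeholder values)
      content_type: 'application/pdf',
      length: 123456,
      stub: true
    }
  }
};

// Documents are plain JSON, so they survive a serialize/parse round trip.
const wire = JSON.stringify(talk);
const parsed = JSON.parse(wire);
console.log(parsed.tags.join(', '));
```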

COUCHDB SPEAKS HTTP

• CouchDB uses HTTP to communicate with clients & servers

• That means scalability

• That means a lot of kick ass stuff totally for free

• Caching

• Load Balancing

• Content Negotiation
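To make the caching point concrete, here is a minimal sketch of a conditional GET. It assumes the server exposes a document's current revision as its ETag (CouchDB does this for single documents), so a client that already holds that revision can be answered with 304 Not Modified. The helper name `buildConditionalGet`, the database name `talks` and the revision are made up, and the function only builds the request; it does not send it.

```javascript
// Builds the pieces of a conditional GET for a document.
// dbUrl, docId and knownRev are supplied by the caller; nothing is sent here.
function buildConditionalGet(dbUrl, docId, knownRev) {
  const headers = { Accept: 'application/json' };
  if (knownRev) {
    // ETags are quoted strings in HTTP.
    headers['If-None-Match'] = '"' + knownRev + '"';
  }
  return {
    method: 'GET',
    // Strip trailing slashes from the database URL before appending the ID.
    url: dbUrl.replace(/\/+$/, '') + '/' + encodeURIComponent(docId),
    headers: headers
  };
}

const request = buildConditionalGet(
  'http://localhost:5984/talks/',   // hypothetical database URL
  'embracing-constraints',
  '1-abc123'
);
console.log(request.method, request.url, request.headers['If-None-Match']);
```

Because this is ordinary HTTP, the same header lets any standard caching proxy in front of the database do the work.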

COUCHDB USES MVCC

• Multiversion Concurrency Control

• When updating, you must supply a revision number

• Your change will be rejected if the revision is not the latest

• All writes are serialized

• No need for locks, but puts some responsibility on developers
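The revision check can be sketched with a toy in-memory store. This is not how CouchDB is implemented: revisions here are plain counters rather than CouchDB's "<number>-<hash>" strings, and `TinyStore` is a name invented for the example. It only shows the rule that an update carrying a stale revision is rejected, which CouchDB reports as HTTP 409 Conflict.

```javascript
// Toy in-memory store that mimics the revision check. This is NOT how
// CouchDB is implemented; revisions here are plain counters ("1", "2", ...).
class TinyStore {
  constructor() {
    this.docs = new Map();
  }

  // Create or update a document. Updates must carry the current revision.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current.rev !== rev) {
      // A stale (or missing) revision is rejected, like HTTP 409 Conflict.
      return { ok: false, status: 409, error: 'conflict' };
    }
    const nextRev = String(current ? Number(current.rev) + 1 : 1);
    this.docs.set(id, { rev: nextRev, body: body });
    return { ok: true, status: 201, rev: nextRev };
  }
}

const store = new TinyStore();
const created = store.put('talk', { title: 'CouchDB' });        // becomes rev "1"
const updated = store.put('talk', { title: 'CouchDB!' }, '1');  // becomes rev "2"
const stale = store.put('talk', { title: 'Too late' }, '1');    // rejected
console.log(created.rev, updated.rev, stale.status);
```

The responsibility mentioned above is what happens after the rejection: the client has to fetch the latest revision, reapply its change, and try again.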

SOME DETAILS

An In-Depth Look At What Makes CouchDB Different

CAP

consistency

availability

partition tolerance

“So, CouchDB does not have consistency of CAP?”

“Booh, that means my data will be inconsistent. Fail!”

psssshhh

YOUR MOM IS INCONSISTENT

CouchDB is eventually consistent

When replicating, conflicting revisions will be marked as such

These conflicts can then be resolved (users, daemons,...)

and everything will be fine \o/

which brings us to...

REPLICATION

• You can do Master-Master replication

• Conflicts are detected and marked automatically

• Conflicts are supposed to be resolved by applications

• Or by users, who usually know best what to do!
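One way an application might resolve a conflict is sketched below. It assumes the application has already fetched the winning revision and the conflicting revisions (CouchDB lists them in `_conflicts` when a document is requested with `?conflicts=true`). The merge policy, a union of tags, and the function names are invented for this example; a real application would pick its own rule or ask the user.

```javascript
// mergeTags is a made-up, application-specific policy: keep the winner's
// fields and take the union of all tags.
function mergeTags(winner, losers) {
  const tags = new Set(winner.tags || []);
  losers.forEach(function (doc) {
    (doc.tags || []).forEach(function (tag) { tags.add(tag); });
  });
  return Object.assign({}, winner, { tags: Array.from(tags).sort() });
}

// Produce the writes that would resolve the conflict: one merged update of
// the winning revision plus one deletion per losing revision.
function resolveConflict(winner, losers) {
  const merged = mergeTags(winner, losers);
  const deletions = losers.map(function (doc) {
    return { _id: doc._id, _rev: doc._rev, _deleted: true };
  });
  return [merged].concat(deletions);
}

// Sample revisions (IDs and revision strings are placeholders).
const writes = resolveConflict(
  { _id: 'talk', _rev: '2-aaa', title: 'CouchDB', tags: ['couchdb'] },
  [{ _id: 'talk', _rev: '2-bbb', title: 'CouchDB', tags: ['couchdb', 'php'] }]
);
console.log(JSON.stringify(writes));
```

The resulting list has the shape of the `docs` array accepted by the `_bulk_docs` endpoint, so the merged update and the deletions could be submitted in one request.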

CouchDB is Ground Computing

Imagine a world where every computer runs CouchDB

Ubuntu One already does, to sync bookmarks etc!

MAP/REDUCE

BASIC PRINCIPLE: MAPPER

• The Mapper reads records and emits <key, value> pairs

• Example: Apache access.log

• Each line is a record

• Extract client IP address and number of bytes transferred

• Emit IP address as key, number of bytes as value

• For hourly rotating logs, the job can be split across 24 nodes*

* In practice, it’s a lot smarter than that
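The access.log mapper described above could look roughly like this. The sketch assumes lines in Apache's Common Log Format; the function name `mapLogLine` and the sample line are made up, and a "-" in the bytes field is treated as zero.

```javascript
// Map step for one access.log line in Common Log Format.
// Returns a [key, value] pair (IP, bytes), or null if the line does not parse.
function mapLogLine(line) {
  const match = /^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-)/.exec(line);
  if (!match) {
    return null;
  }
  // Apache logs "-" when no body was sent; count that as 0 bytes.
  const bytes = match[2] === '-' ? 0 : parseInt(match[2], 10);
  return [match[1], bytes];
}

const sampleLine =
  '212.122.174.13 - - [10/Jun/2010:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 18271';
const pair = mapLogLine(sampleLine);
console.log(pair);
```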

BASIC PRINCIPLE: REDUCER

• A Reducer is given a key and all values for this specific key

• Even if there are many Mappers on many computers, the results are aggregated before they are handed to Reducers

• Example: Apache access.log

• The Reducer is called once for each client IP (that’s our key), with a list of values (transferred bytes)

• We simply sum up the bytes to get the total traffic per IP!

EXAMPLE OF MAPPED INPUT

IP Bytes

212.122.174.13 18271

212.122.174.13 191726

212.122.174.13 198

74.119.8.111 91272

74.119.8.111 8371

212.122.174.13 43

REDUCER WILL RECEIVE THIS

IP Bytes

212.122.174.13 18271

212.122.174.13 191726

212.122.174.13 198

212.122.174.13 43

74.119.8.111 91272

74.119.8.111 8371

AFTER REDUCTION

IP Bytes

212.122.174.13 210238

74.119.8.111 99643
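The three tables can be reproduced in a few lines. This is a single-process sketch of the principle only, not a distributed implementation; `groupByKey` and `reduceTraffic` are names chosen for the example.

```javascript
// The mapped <key, value> pairs from the example table.
const mapped = [
  ['212.122.174.13', 18271],
  ['212.122.174.13', 191726],
  ['212.122.174.13', 198],
  ['74.119.8.111', 91272],
  ['74.119.8.111', 8371],
  ['212.122.174.13', 43]
];

// Group values by key, as the framework does before calling the Reducer.
function groupByKey(pairs) {
  const groups = new Map();
  pairs.forEach(function (pair) {
    const key = pair[0];
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(pair[1]);
  });
  return groups;
}

// Reducer: called once per key with all values for that key.
function reduceTraffic(key, values) {
  return values.reduce(function (total, bytes) { return total + bytes; }, 0);
}

const totals = {};
groupByKey(mapped).forEach(function (values, key) {
  totals[key] = reduceTraffic(key, values);
});
console.log(totals);
```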

COUCHDB INCREMENTAL MAPREDUCE

THE KEY DIFFERENCE

• Maps and Reduces are incremental:

• If one document changes, only that one document needs:

• mapping

• reduction

• Then a few new reduce runs are performed to compute the final result
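The idea of re-mapping only what changed can be shown with a toy index that caches map output per document. It is a heavily simplified stand-in: CouchDB keeps intermediate reductions inside its B-tree index, which this sketch does not attempt, and here `emit` is passed to the map function as an argument instead of being a global. `IncrementalIndex` and the sample documents are invented for the example.

```javascript
// Toy incremental index: map output is cached per document, so a change
// re-maps only the changed document. Not CouchDB's actual implementation.
class IncrementalIndex {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.rowsByDoc = new Map();  // doc ID -> rows emitted for that doc
    this.mapCalls = 0;           // counts how often the mapper ran
  }

  // Index a new or changed document; only this one document is (re-)mapped.
  update(doc) {
    const rows = [];
    this.mapCalls += 1;
    this.mapFn(doc, function emit(key, value) { rows.push([key, value]); });
    this.rowsByDoc.set(doc._id, rows);
  }

  // Sum the cached values per key (stands in for the reduce step).
  totals() {
    const result = {};
    this.rowsByDoc.forEach(function (rows) {
      rows.forEach(function (row) {
        result[row[0]] = (result[row[0]] || 0) + row[1];
      });
    });
    return result;
  }
}

const index = new IncrementalIndex(function (doc, emit) {
  (doc.tags || []).forEach(function (tag) { emit(tag, 1); });
});
index.update({ _id: 'a', tags: ['couchdb', 'php'] });
index.update({ _id: 'b', tags: ['couchdb'] });
index.update({ _id: 'b', tags: ['couchdb', 'nosql'] });  // only "b" is re-mapped
console.log(index.totals(), index.mapCalls);
```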

MAPPER: DOCS BY TAGS

function(doc) {
    if (doc.type == 'talk') {
        (doc.tags || []).forEach(function(tag) {
            emit(tag, doc);
        });
    }
}
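A mapper like this lives in a design document, where view functions are stored as strings. The sketch below shows one plausible shape; the names `_design/talks` and `by_tag` are invented for the example, and the code only builds the document rather than uploading it.

```javascript
// The "docs by tags" mapper. emit() is provided by CouchDB when the view
// runs; the function is never called in this sketch, only serialized.
const byTagMap = function (doc) {
  if (doc.type == 'talk') {
    (doc.tags || []).forEach(function (tag) {
      emit(tag, doc);
    });
  }
};

// A hypothetical design document holding the mapper as a view.
const designDoc = {
  _id: '_design/talks',
  language: 'javascript',
  views: {
    by_tag: {
      // View functions are stored as strings inside the design document.
      map: byTagMap.toString()
    }
  }
};

console.log(JSON.stringify(designDoc, null, 2));
```

With those names, the view would be queried at a path such as `/<database>/_design/talks/_view/by_tag?key="couchdb"`.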

MAPREDUCE: COUNT TAGS

function(doc) {
    if (doc.type == 'talk') {
        (doc.tags || []).forEach(function(tag) {
            emit(tag, 1);
        });
    }
}

function(key, values) {
    return sum(values);
}
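To try the two functions outside the database, a small local harness is enough. The `emit` and `sum` defined here are minimal stand-ins for the helpers CouchDB provides to view functions, and the sample documents are made up. The grouping step approximates what a query with `group=true` returns.

```javascript
// Minimal local stand-ins for CouchDB's emit() and sum() helpers.
const emitted = [];
function emit(key, value) { emitted.push([key, value]); }
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// The map and reduce functions from the slide.
const mapFn = function (doc) {
  if (doc.type == 'talk') {
    (doc.tags || []).forEach(function (tag) {
      emit(tag, 1);
    });
  }
};
const reduceFn = function (key, values) {
  return sum(values);
};

// Made-up sample documents.
const docs = [
  { type: 'talk', tags: ['couchdb', 'php'] },
  { type: 'talk', tags: ['couchdb'] },
  { type: 'speaker', tags: ['php'] },  // ignored: not a talk
  { type: 'talk' }                     // no tags: emits nothing
];
docs.forEach(function (doc) { mapFn(doc); });

// Group emitted values by key, then reduce each group.
const counts = {};
emitted.forEach(function (row) {
  (counts[row[0]] = counts[row[0]] || []).push(row[1]);
});
Object.keys(counts).forEach(function (tag) {
  counts[tag] = reduceFn(tag, counts[tag]);
});
console.log(counts);
```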

LUCENE INTEGRATION

Full Control Over What Is Indexed, And How

COUCHAPP

Python Tool For Development And Deployment

DEMO TIME

Let’s Relax On The Couch

The End

FURTHER READING

• http://books.couchdb.org/

• http://couchdb.apache.org/

• http://github.com/couchapp/couchapp

• http://github.com/rnewson/couchdb-lucene/

• http://janl.github.com/couchdbx/

• http://j.mp/oqbQs (E4X in CouchDB for XML parsing)

DID YOU SEE THE HEAD FAKE?

This Talk Was Not About Embracing Constraints

It Was About Embracing Awesomeness

Questions?

THANK YOU!

This was http://joind.in/1651 by @dzuelke
