Presentation given at International PHP Conference Spring Edition 2011.
David Zülke
David Zuelke
http://en.wikipedia.org/wiki/File:München_Panorama.JPG
Founder
Lead Developer
COUCHDB IN THREE SLIDES
Full Of DIS IS SRS BSNS Bullet Points
COUCHDB STORES DOCUMENTS
• CouchDB stores documents with arbitrary keys and values
• Each document is identified by an ID and has a revision
• Documents can have file attachments
• Stored as JSON, so it’s easy to interface with
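A minimal sketch of what such a document might look like. The field names other than _id and _rev, and all the values, are made up for illustration:

```javascript
// A hypothetical product document. Only _id and _rev are CouchDB's own
// fields; everything else is arbitrary application data.
const productDoc = {
  _id: '123',
  _rev: '5-a72',
  type: 'product',
  name: 'Laser Beam',
  tags: ['evil', 'laser'],
};

// Documents travel as JSON, so any language with a JSON library can read them.
const wireFormat = JSON.stringify(productDoc);
const parsedBack = JSON.parse(wireFormat);
console.log(parsedBack.name);
```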
COUCHDB SPEAKS HTTP
• CouchDB uses HTTP to communicate with clients & servers
• That means scalability
• That means a lot of kick ass stuff totally for free
• Caching
• Load Balancing
• Content Negotiation
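Because the interface is plain HTTP, saving a document is just a PUT with a JSON body. A rough sketch that only builds the request; the helper name buildSaveRequest, the database name "shop" and the host are assumptions (5984 is CouchDB's default port):

```javascript
// Hypothetical helper: describes the HTTP request that would store a document.
// It does not send anything, so it runs without a CouchDB server.
function buildSaveRequest(baseUrl, dbName, doc) {
  return {
    url: `${baseUrl}/${encodeURIComponent(dbName)}/${encodeURIComponent(doc._id)}`,
    options: {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(doc),
    },
  };
}

const request = buildSaveRequest('http://localhost:5984', 'shop', {
  _id: '123',
  type: 'product',
  name: 'Laser Beam',
});
console.log(request.options.method, request.url);
```

Any HTTP client can send it, for example fetch(request.url, request.options), and any HTTP cache or load balancer can sit in between.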
COUCHDB USES MVCC
• Multiversion Concurrency Control
• When updating, you must supply a revision number
• Your change will be rejected if the revision is not the latest
• All writes are serialized
• No need for locks, but puts some responsibility on developers
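The revision check can be illustrated with a tiny in-memory store. This is a simulation, not CouchDB's implementation: TinyStore is a made-up class, and revisions are plain counters here, whereas real CouchDB revisions look like "5-a72" and a stale write is answered with HTTP 409 Conflict.

```javascript
// Hypothetical in-memory stand-in for a document store with MVCC-style checks.
class TinyStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  // A write is accepted only if `rev` matches the latest stored revision
  // (or is omitted when the document does not exist yet).
  put(id, body, rev) {
    const current = this.docs.get(id);
    const currentRev = current ? current.rev : undefined;
    if (rev !== currentRev) {
      return { ok: false, error: 'conflict' };
    }
    const nextRev = (currentRev || 0) + 1;
    this.docs.set(id, { rev: nextRev, body });
    return { ok: true, rev: nextRev };
  }
}

const store = new TinyStore();
const created = store.put('123', { name: 'Laser Beam' });
const updated = store.put('123', { name: 'Bigger Laser Beam' }, created.rev);
// A second client still holding the old revision gets rejected.
const stale = store.put('123', { name: 'Tiny Laser Beam' }, created.rev);
console.log(created, updated, stale);
```

The rejected client has to fetch the latest revision, re-apply its change and try again; that is the "responsibility on developers" part.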
THE DETAILS
An In-Depth Look At What Makes CouchDB Different
Do you know the CAP theorem?
• consistency (X)
• availability
• partition tolerance
“So, CouchDB does not have consistency of CAP?”
“Booh, that means my data will be inconsistent. Fail!”
psssshhh
YOUR MOM IS INCONSISTENT
CouchDB is eventually consistent
When replicating, conflicting revisions will be marked as such
These conflicts can then be resolved (by users, daemons, ...)
and everything will be fine \o/
which brings us to...
REPLICATION
• You can do Master-Master replication
• Conflicts are detected and marked automatically
• Conflicts are supposed to be resolved by applications
• Or by users, who usually know best what to do!
CouchDB is Ground Computing
Imagine a world where every computer runs CouchDB
Ubuntu One already does, to sync bookmarks etc!
MAP/REDUCE
BASIC PRINCIPLE: MAPPER
• The Mapper reads records and emits <key, value> pairs
• Example: Apache access.log
• Each line is a record
• Extract client IP address and number of bytes transferred
• Emit IP address as key, number of bytes as value
• For hourly rotating logs, the job can be split across 24 nodes*
* In practice, it’s a lot smarter than that
BASIC PRINCIPLE: REDUCER
• A Reducer is given a key and all values for this specific key
• Even if there are many Mappers on many computers, the results are aggregated before they are handed to Reducers
• Example: Apache access.log
• The Reducer is called once for each client IP (that’s our key), with a list of values (transferred bytes)
• We simply sum up the bytes to get the total traffic per IP!
EXAMPLE OF MAPPED INPUT
IP Bytes
212.122.174.13 18271
212.122.174.13 191726
212.122.174.13 198
74.119.8.111 91272
74.119.8.111 8371
212.122.174.13 43
REDUCER WILL RECEIVE THIS
IP Bytes
212.122.174.13 18271
212.122.174.13 191726
212.122.174.13 198
212.122.174.13 43
74.119.8.111 91272
74.119.8.111 8371
AFTER REDUCTION
IP Bytes
212.122.174.13 210238
74.119.8.111 99643
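The three tables above can be reproduced with a few lines of plain JavaScript. A sketch, not a distributed implementation: the log lines are invented (only the IPs and byte counts come from the slides), and mapLine, groupByKey and reduceBytes are illustrative names.

```javascript
// Hypothetical access.log lines in Common Log Format.
const logLines = [
  '212.122.174.13 - - [01/Jun/2011:10:00:01 +0200] "GET /a HTTP/1.1" 200 18271',
  '212.122.174.13 - - [01/Jun/2011:10:00:02 +0200] "GET /b HTTP/1.1" 200 191726',
  '212.122.174.13 - - [01/Jun/2011:10:00:03 +0200] "GET /c HTTP/1.1" 200 198',
  '74.119.8.111 - - [01/Jun/2011:10:00:04 +0200] "GET /a HTTP/1.1" 200 91272',
  '74.119.8.111 - - [01/Jun/2011:10:00:05 +0200] "GET /d HTTP/1.1" 200 8371',
  '212.122.174.13 - - [01/Jun/2011:10:00:06 +0200] "GET /e HTTP/1.1" 200 43',
];

// Mapper: one record (line) in, one <key, value> pair out.
function mapLine(line) {
  const fields = line.split(' ');
  const ip = fields[0];
  const bytes = parseInt(fields[fields.length - 1], 10);
  return [ip, bytes];
}

// Aggregation step between map and reduce: collect all values per key.
function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reducer: called once per key with all of that key's values.
function reduceBytes(key, values) {
  return values.reduce((total, n) => total + n, 0);
}

const totals = {};
for (const [ip, values] of groupByKey(logLines.map(mapLine))) {
  totals[ip] = reduceBytes(ip, values);
}
console.log(totals);
```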
COUCHDB INCREMENTAL MAPREDUCE
THE KEY DIFFERENCE
• Maps and Reduces are incremental:
• If one document changes, only that one document needs:
• mapping
• reduction
• Then a few new reduce runs are performed to compute the final result
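A deliberately simplified sketch of the idea, counting tags per product. It caches one partial result per document and re-maps only the document that changed. createTagCountView and its methods are made-up names, and the flat cache is an assumption for illustration; CouchDB keeps its intermediate results in a B-tree and touches far fewer of them per query.

```javascript
// Hypothetical incremental view: one cached partial result per document.
function createTagCountView() {
  const partials = new Map(); // docId -> { tag: count } for that document only
  let mapRuns = 0;

  // Map and reduce a single document.
  function mapAndReduceOne(doc) {
    mapRuns += 1;
    const counts = {};
    if (doc.type === 'product') {
      (doc.tags || []).forEach((tag) => {
        counts[tag] = (counts[tag] || 0) + 1;
      });
    }
    return counts;
  }

  return {
    // Saving a document recomputes only that document's partial result.
    save(doc) {
      partials.set(doc._id, mapAndReduceOne(doc));
    },
    // Final result: combine the cached partials (a "re-reduce").
    query() {
      const totals = {};
      for (const counts of partials.values()) {
        for (const [tag, n] of Object.entries(counts)) {
          totals[tag] = (totals[tag] || 0) + n;
        }
      }
      return totals;
    },
    mapRuns: () => mapRuns,
  };
}

const view = createTagCountView();
view.save({ _id: '123', type: 'product', tags: ['evil', 'laser'] });
view.save({ _id: '817', type: 'product', tags: ['cool'] });
view.save({ _id: '441', type: 'product', tags: ['evil'] });
const runsBeforeChange = view.mapRuns();

// Only document 441 changes, so only one more map run happens.
view.save({ _id: '441', type: 'product', tags: ['evil', 'fish'] });
const countsAfterChange = view.query();
console.log(view.mapRuns() - runsBeforeChange, countsAfterChange);
```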
MAPPER: DOCS BY TAGS
function(doc) {
  if(doc.type == 'product') {
    (doc.tags || []).forEach(function(tag) {
      emit(tag, doc);
    });
  }
}
MAPREDUCE: COUNT TAGS
function(doc) {
  if(doc.type == 'product') {
    (doc.tags || []).forEach(function(tag) {
      emit(tag, 1);
    });
  }
}

function(key, values) {
  return sum(values);
}
_sum
built-in CouchDB function, very efficient
BUT WAIT!
There are no tables :(
so... how do you join data from related documents?
JOIN PRODUCTS WITH THEIR CATEGORIES
function(doc) {
  if(doc.type == 'product') {
    emit([doc._id, 0], doc);
    emit([doc._id, 1], { _id: doc.category_id });
  }
}
["123", 0] {_id: "123", _rev: "5-a72", type: "product", "name": "Laser Beam"}
["123", 1] {_id: "est", _rev: "2-9af", type: "category", "name": "Evil Stuff"}
["817", 0] {_id: "817", _rev: "2-aa8", type: "product", "name": "Rocketship"}
["817", 1] {_id: "cst", _rev: "3-d8a", type: "category", "name": "Cool Stuff"}
["441", 0] {_id: "441", _rev: "19-fdf", type: "product", "name": "Sharks"}
["441", 1] {_id: "est", _rev: "2-9af", type: "category", "name": "Evil Stuff"}
JOIN CATEGORIES WITH ALL THEIR PRODUCTS
function(doc) {
  if(doc.type == 'category') {
    emit([doc._id, 0], doc);
  } else if(doc.type == 'product') {
    emit([doc.category_id, doc._id], doc);
  }
}
["est", 0] {_id: "est", _rev: "2-9af", type: "category", "name": "Evil Stuff"}
["est", "123"] {_id: "123", _rev: "5-a72", type: "product", "name": "Laser Beam"}
["est", "441"] {_id: "441", _rev: "19-fdf", type: "product", "name": "Sharks"}
["cst", 0] {_id: "cst", _rev: "3-d8a", type: "category", "name": "Cool Stuff"}
["cst", "817"] {_id: "817", _rev: "2-aa8", type: "product", "name": "Rocketship"}
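The join works because view rows are sorted by key, so a category row lands directly in front of its products. A sketch that runs the map function outside CouchDB: emit is stubbed locally, and compareKeys is a simplified stand-in for CouchDB's view collation (it only knows that numbers sort before strings, which is the rule this join relies on).

```javascript
// Local stub for the emit() that CouchDB provides to map functions.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

function map(doc) {
  if (doc.type == 'category') {
    emit([doc._id, 0], doc);
  } else if (doc.type == 'product') {
    emit([doc.category_id, doc._id], doc);
  }
}

// Simplified collation (assumption): numbers before strings, then natural order.
function compareScalars(a, b) {
  const rank = (v) => (typeof v === 'number' ? 0 : 1);
  if (rank(a) !== rank(b)) return rank(a) - rank(b);
  return a < b ? -1 : a > b ? 1 : 0;
}
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i += 1) {
    const c = compareScalars(a[i], b[i]);
    if (c !== 0) return c;
  }
  return a.length - b.length;
}

// Documents from the slides, deliberately in mixed order.
const docs = [
  { _id: '123', type: 'product', name: 'Laser Beam', category_id: 'est' },
  { _id: 'cst', type: 'category', name: 'Cool Stuff' },
  { _id: '441', type: 'product', name: 'Sharks', category_id: 'est' },
  { _id: 'est', type: 'category', name: 'Evil Stuff' },
  { _id: '817', type: 'product', name: 'Rocketship', category_id: 'cst' },
];
docs.forEach((doc) => map(doc));
rows.sort((x, y) => compareKeys(x.key, y.key));

// One contiguous slice of the sorted view: the category, then its products.
const evilStuff = rows
  .filter((row) => row.key[0] === 'est')
  .map((row) => row.value.name);
console.log(evilStuff);
```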
BUT... BUT... WAIT!
How to guarantee a document's structure if it’s all schema-less?
VALIDATE DOCUMENTS
function (newDoc, savedDoc, userCtx) {
  if(savedDoc && savedDoc.created_at != newDoc.created_at) {
    throw({forbidden: 'created_at is immutable'});
  }
  if(newDoc.type == 'product') {
    if(!newDoc.price) {
      throw({forbidden: 'product must have a price'});
    }
  }
}
VALIDATE DOCUMENTS
function (newDoc, savedDoc, userCtx) {
  function require(beTrue, message) {
    if(!beTrue) throw({forbidden: message});
  }
  // require() throws unless its first argument is truthy
  require(!savedDoc || savedDoc.created_at == newDoc.created_at, 'created_at is immutable');
  if(newDoc.type == 'product') {
    require(newDoc.price, 'product must have a price');
  }
}
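A sketch of how such a validation function could be exercised outside CouchDB. The helper is called check here (instead of require) so it does not shadow Node's own require; tryWrite, the user context and the sample documents are all made up for the test.

```javascript
function validate(newDoc, savedDoc, userCtx) {
  // Throws a {forbidden: ...} object unless the condition is truthy.
  function check(beTrue, message) {
    if (!beTrue) throw { forbidden: message };
  }
  check(!savedDoc || savedDoc.created_at == newDoc.created_at, 'created_at is immutable');
  if (newDoc.type == 'product') {
    check(newDoc.price, 'product must have a price');
  }
}

// Hypothetical harness: returns 'ok' or the rejection message.
function tryWrite(newDoc, savedDoc) {
  try {
    validate(newDoc, savedDoc, { name: 'tester', roles: [] });
    return 'ok';
  } catch (err) {
    return err.forbidden;
  }
}

const saved = { type: 'product', price: 10, created_at: '2011-06-01' };
const accepted = tryWrite({ type: 'product', price: 12, created_at: '2011-06-01' }, saved);
const noPrice = tryWrite({ type: 'product', created_at: '2011-06-01' }, null);
const changedDate = tryWrite({ type: 'product', price: 12, created_at: '2011-06-02' }, saved);
console.log(accepted, '|', noPrice, '|', changedDate);
```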
LUCENE INTEGRATION
Full Control Over What Is Indexed, And How
COUCHAPP
Python Tool For Development And Deployment
DEMO TIME
Let’s Relax On The Couch
The End
FURTHER READING
• http://guide.couchdb.org/
• http://couchdb.apache.org/
• http://github.com/couchapp/couchapp
• http://github.com/rnewson/couchdb-lucene/
• http://www.couchbase.com/downloads/
• http://j.mp/oqbQs (E4X in CouchDB for XML parsing)
Questions?
THANK YOU!
This was http://joind.in/3521 by @dzuelke