CouchDB: the last RESTful JSON store you'll ever need

Carlo Cabanilla

CouchDB is a RESTful JSON store

Create

> PUT /mydb/stuff
> Content-type: application/json
>
> {"thing_count":1}

< HTTP/1.1 201 Created
<
< {"_id":"stuff",
<  "_rev":"1-e86a94c1...",
<  "thing_count":1}

Read

> GET /mydb/stuff

< HTTP/1.1 200 OK
<
< {"_id":"stuff",
<  "_rev":"1-e86a94c1...",
<  "thing_count":1}

Update

> PUT /mydb/stuff
> Content-type: application/json
>
> {"_id":"stuff",
>  "_rev":"1-e86a94c1...",
>  "thing_count":13}

< HTTP/1.1 201 Created
<
< {"_id":"stuff",
<  "_rev":"2-39a9bf12...",
<  "thing_count":13}

Delete

> DELETE /mydb/stuff?rev=2-39a9bf12...

< HTTP/1.1 200 OK
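
The requests above all carry the revision they are based on. A minimal Python sketch of that rule, assuming a toy in-memory store: `TinyStore`, `Conflict` and the revision format are made up for illustration and are not CouchDB's real implementation.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when a write does not name the current revision."""


class TinyStore:
    """Toy in-memory stand-in for one database (hypothetical, not CouchDB)."""

    def __init__(self):
        self.docs = {}

    def _next_rev(self, old_rev, fields):
        # Illustrative revision format: generation counter + hash of the body.
        generation = int(old_rev.split("-")[0]) + 1 if old_rev else 1
        digest = hashlib.sha256(json.dumps(fields, sort_keys=True).encode()).hexdigest()
        return f"{generation}-{digest[:8]}"

    def put(self, doc_id, body):
        current = self.docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        # A create carries no _rev; an update must carry the latest one.
        if body.get("_rev") != current_rev:
            raise Conflict(doc_id)
        fields = {k: v for k, v in body.items() if not k.startswith("_")}
        doc = {"_id": doc_id, "_rev": self._next_rev(current_rev, fields), **fields}
        self.docs[doc_id] = doc
        return doc

    def get(self, doc_id):
        return self.docs[doc_id]

    def delete(self, doc_id, rev):
        # Deletes are revision-checked too, like DELETE ...?rev=...
        if self.docs[doc_id]["_rev"] != rev:
            raise Conflict(doc_id)
        del self.docs[doc_id]
```

A writer holding a stale revision gets a `Conflict` instead of silently overwriting someone else's change.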

a RESTful JSON store

PUT /look/ma/im/doing/rest

Do you support Conditional GET?

Conditional GET is the holy grail of REST

> GET /that/money

< HTTP/1.1 200 OK
<
< {"that_money": "$5"}

(wait for a few ms)

x 1,000,000 = slow!

> GET /that/money

< HTTP/1.1 200 OK
< Etag: "1-24c87859"
<
< {"that_money": "$5"}

(Uncached: slow)

> GET /that/money
> If-None-Match: "1-24c87859"

< HTTP/1.1 304 Not Modified
< Etag: "1-24c87859"
< Content-Length: 0

(Cache hit: fast)

> GET /that/money
> If-None-Match: "1-24c87859"

< HTTP/1.1 200 OK
< Etag: "2-535d9fb2"
<
< {"that_money": "$1,000,000"}

(Data changed, so cache miss)
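
The exchange above can be sketched from the server's side in a few lines of Python. This is a simplified model, assuming the ETag is just the quoted `_rev` as in the examples; `conditional_get` is a hypothetical helper, not part of any CouchDB client.

```python
def conditional_get(doc, if_none_match=None):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = '"%s"' % doc["_rev"]
    if if_none_match == etag:
        # The client's cached copy is still current: send no body.
        return 304, {"Etag": etag, "Content-Length": "0"}, None
    # First request, or the document changed: send the full body.
    body = {k: v for k, v in doc.items() if not k.startswith("_")}
    return 200, {"Etag": etag}, body
```

The client keeps the last `Etag` it saw and sends it back as `If-None-Match`; only a changed revision costs a full response.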

a RESTful JSON store

GET /my_db/some_id

{ "_id": "some_id",
  "_rev": "1-24c8785964763c21d...",
  "my_field1": "json does strings",
  "numbers too": 123.12,
  "and arrays": ["ain't", "it", "cool?"],
  "dicts too!": {
    "don't": "try",
    "this": "in",
    "Oracle!": null
  }
}

Schema-less

How do I know what a document means??

Duck-type your database

Duck-typing

def document_factory(doc):
    if (doc['look'] == Duck.look()
            and doc['swim'] == Duck.swim()
            and doc['quack'] == Duck.quack()):
        return Duck(doc)

Python

def document_factory(doc):
    if doc['type'] == 'Duck':
        return Duck(doc)
    elif doc['type'] == 'Dog':
        return Dog(doc)
    # ... one elif per document type ...
    else:
        return GenericDocument(doc)

Python

def document_factory(doc):
    import document_types
    # Look the class up by name in the module, falling back to GenericDocument.
    doc_class = getattr(document_types, doc.get('type', 'GenericDocument'),
                        document_types.GenericDocument)
    return doc_class(doc)

Java

public static CouchObj documentFactory(CouchDoc doc) throws ReflectiveOperationException {
    ClassLoader loader = CouchObj.class.getClassLoader();
    // Assumes doc.get("type") returns the fully qualified class name as a String.
    Class<?> dispatcherClass = loader.loadClass(doc.get("type"));
    // Class.newInstance() takes no arguments, so call the CouchDoc constructor instead.
    return (CouchObj) dispatcherClass.getConstructor(CouchDoc.class).newInstance(doc);
}

That’s silly. How do I query something that has no schema?

You write a map reduce function in JavaScript!

What the heck is map reduce (in CouchDB)?

map reduce ≈ programmatically building an index

In CouchDB, a map/reduce pair is called a view.
(different from an RDBMS view!)

On creation => full database scan, then cache
On db write => incrementally update the cache

(different from Hadoop map reduce!)

You can tell CouchDB to store any data in the view
(RDBMSs only store indexed values and row ids)

CouchDB view ≈ RDBMS index + materialized view

Yo, data structures nerds,

a CouchDB view is a B+ tree

map function determines how keys map to values

keys are stored in sorted order

O(log n) for reads, writes, deletes and range queries

keys are stored close to values
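
Why sorted keys make range queries cheap can be shown with a small Python sketch. To be clear about the swap: this uses a plain sorted list with binary search, not a B+ tree, so lookups are O(log n) but inserts are O(n). `SortedView` is a made-up name for illustration.

```python
import bisect


class SortedView:
    """Keys kept in sorted order, each stored next to its value."""

    def __init__(self):
        self._keys = []
        self._values = []

    def insert(self, key, value):
        # Binary search finds the slot; list insertion itself is O(n),
        # which is the part a real B+ tree does in O(log n).
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, value)

    def range(self, startkey, endkey):
        # Two binary searches bound the slice; everything between is a hit.
        lo = bisect.bisect_left(self._keys, startkey)
        hi = bisect.bisect_right(self._keys, endkey)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))
```

Because matching keys sit next to each other, a range query is two lookups plus a contiguous read rather than a scan of every document.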

For example:

Create a view called people on the names of all person documents.

function map(doc) {
  if (doc['type'] == 'person') {
    emit(doc['name'], doc);
  }
}

function reduce(keys, values, rereduce) {
  // On rereduce, values holds partial counts, so add them up.
  return rereduce ? sum(values) : values.length;
}
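
What the people view computes can be mimicked in plain Python. This is a rough model only, assuming documents are dicts and the "view" is a sorted list of rows; `map_person`, `count_reduce` and `build_view` are hypothetical names, and the real thing runs JavaScript inside CouchDB.

```python
def map_person(doc):
    # Same filter as the map function: only person documents emit a row.
    if doc.get("type") == "person":
        yield doc["name"], doc


def count_reduce(keys, values, rereduce=False):
    # First pass counts rows; a rereduce pass adds up partial counts.
    return sum(values) if rereduce else len(values)


def build_view(docs, map_fn):
    # Run the map function over every document and keep rows sorted by key.
    rows = [(key, value) for doc in docs for key, value in map_fn(doc)]
    rows.sort(key=lambda row: row[0])
    return rows
```

The `rereduce` flag matters because a reduce may be applied to chunks of rows and then again to the chunk results; counting the chunk results with `len` would give the number of chunks, not the number of people.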

SELECT count(*) FROM people
GET /people

SELECT name, count(*) FROM people GROUP BY name ORDER BY name ASC
GET /people?group=true

SELECT * FROM people ORDER BY name ASC
GET /people?reduce=false

SELECT * FROM people WHERE name = 'bjorn'
GET /people?reduce=false&key="bjorn"

SELECT * FROM ( SELECT p.*, rownum rn FROM ( SELECT * FROM people ORDER BY name ASC) p) WHERE rn BETWEEN 51 AND 60
GET /people?reduce=false&limit=10&skip=50
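
The query parameters above can be modelled over a sorted list of (key, value) rows. A simplified Python sketch, assuming the reduce is a count; `query` is a hypothetical function, and it only applies skip and limit to unreduced rows.

```python
from itertools import groupby


def query(rows, reduce=True, group=False, key=None, skip=0, limit=None):
    """rows must already be sorted by key, as they are in a view."""
    if key is not None:
        rows = [row for row in rows if row[0] == key]        # WHERE name = ...
    if not reduce:
        end = None if limit is None else skip + limit
        return rows[skip:end]                                # paging over raw rows
    if group:
        # GROUP BY key: one count per distinct key, in key order.
        return [(k, len(list(g))) for k, g in groupby(rows, key=lambda r: r[0])]
    return len(rows)                                         # count(*)
```

For instance, `query(rows, reduce=False, skip=50, limit=10)` returns rows 51 through 60 in key order.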

What about joins??????

j/k.

Joins will be in version 0.11 (trunk)

function map(doc) {
  for (var i in doc['comment_ids']) {
    var comment_id = doc['comment_ids'][i];
    emit(doc['_id'], {'_id': comment_id});
  }
}

Documents in my blog db

[ {'_id': 1, 'comment_ids': [3, 4]},
  {'_id': 2, 'comment_ids': [5]},
  {'_id': 3, 'text': 'whoa'},
  {'_id': 4, 'text': 'omg'},
  {'_id': 5, 'text': 'ponies'} ]

GET /myblog/comments?reduce=false&include_docs=true

[ {'key': 1, 'value': {'_id': 3, 'text': 'whoa'}},
  {'key': 1, 'value': {'_id': 4, 'text': 'omg'}},
  {'key': 2, 'value': {'_id': 5, 'text': 'ponies'}} ]
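
The join boils down to "emit a pointer, let the server follow it". A Python sketch of that idea over the blog documents above, assuming documents are dicts; `map_comments` and `query_with_docs` are hypothetical names, and the output mirrors the simplified row shape shown on the slide.

```python
def map_comments(doc):
    # One row per comment: key is the post id, value points at the comment.
    for comment_id in doc.get("comment_ids", []):
        yield doc["_id"], {"_id": comment_id}


def query_with_docs(docs, map_fn):
    by_id = {doc["_id"]: doc for doc in docs}
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            # include_docs=true: swap the pointer for the document it names.
            rows.append({"key": key, "value": by_id[value["_id"]]})
    rows.sort(key=lambda row: row["key"])
    return rows
```

Every row is keyed by the post that emitted it, so all comments for a post sit next to each other in the result.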

Ok great, so it can pretty much do what a RDBMS can do.

Why not just use a RDBMS?

The real question you should be asking yourself is:

Why keep rewriting HTTP protocol semantics and data object accessors to front your RDBMS?

Middle tier-less web apps!
(well, kind of)

No, forreals,

Why not just use a RDBMS?

ORACLE

OMG Really Asininely Costly Licencing Expenses

Cluster Of Unreliable Commodity Hardware DataBase

?

Cluster?

Replication!

Incremental,
Fault Tolerant,
Over HTTP

replication is expensive

Mmm, yeah how much did we pay for the LSB?

Seems to work well tho
(fingers crossed)

Postgres replication is . . . ?

No native replication support yet (coming in v9.0)

3rd party solutions are trigger-based (yuck)

Managing replication conflicts (abends)

Deterministically pick a version to “win”

Add a “_conflicts” property to the document

App must resolve the conflict

Much better explanation here:
http://books.couchdb.org/relax/reference/conflict-management
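
"Deterministically pick a version" means every replica can choose the same winner without talking to the others. A Python sketch of one such rule, assuming revisions look like "N-hash" and the winner is the highest N, with ties broken by the larger hash; `pick_winner` is a hypothetical helper and the exact rule CouchDB applies may differ in details.

```python
def pick_winner(revisions):
    """Return (winner, losers) for a list of conflicting 'N-hash' revisions."""

    def sort_key(rev):
        generation, _, digest = rev.partition("-")
        # Compare the numeric generation first, then the hash as a tie-break.
        return int(generation), digest

    ordered = sorted(revisions, key=sort_key, reverse=True)
    # The losers are kept around so the application can resolve the conflict.
    return ordered[0], ordered[1:]
```

Since the result depends only on the set of revisions, two replicas that have seen the same revisions agree on the winner no matter what order the revisions arrived in.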

Let’s see some appz!

• Design philosophy

• Transforming views with list functions

• Lucene integration

• Performance benchmarks

• Erlang under the hood

• Many to many relationships

• Changes feed

• Potential applications at WGen

• Elaborate on middle tier-less web apps

Not enough time :(

http://couchdb.apache.org/
