Using CouchDB with Python

Stefan Kögl (@skoegl)

Python-CouchDB Training at PyCon PL 2012


What we will cover

● What is CouchDB?

– Access from Python through couchdbkit

– Key-value Store Functionality

– MapReduce Queries

– HTTP API

● When is CouchDB useful and when not?

– Multi-Master Replication

– Scaling up and down

● Pointers to other resources, CouchDB ecosystem


What we won't cover

● CouchApps – browser-based apps that are served by CouchDB

● Detailed security, scaling and other operational issues

● Other functionality that didn't fit

Training Modes

● Code-Along

– Follow Examples, write your own code

– Small Scripts or REPL

● Learning-by-Watching

– Example Application at https://github.com/stefankoegl/python-couchdb-examples

– Slides at https://slideshare.net/skoegl/couch-db-pythonpyconpl2012

– Use example scripts and see what happens

– Submit Pull-Requests!

Contents

● Intro

– Contents

– CouchDB

– Example Application

● DB Initialization

● Key-Value Store

● Simple MapReduce Queries

● The _changes Feed

● Complex MapReduce Queries

● Replication

● Additional Features and the Couch Ecosystem

CouchDB

● Apache Project

● https://couchdb.apache.org/

● Current Version: 1.2

● Apache CouchDB™ is a database that uses JSON for documents, JavaScript for MapReduce queries, and regular HTTP for an API

Example Application

● Lending Database

– Stores Items that you might want to lend

– Stores when you have lent what to whom

● Stand-alone or distributed

● Small Scripts that do one task each

● Look at HTTP traffic

Contents

● Intro

● DB Initialization

– Setting Up CouchDB

– Installing couchdbkit

– Creating a Database

● Key-Value Store

● Simple MapReduce Queries

● The _changes Feed

● Complex MapReduce Queries

● Replication

● Additional Features and the Couch Ecosystem

Getting Set Up: CouchDB

● Provided by me (no longer available after the training)

● http://couch.skoegl.net:5984/<yourname>

● Authentication: username training, password training

● Set up your DB_URL in settings.py

● If you want to install your own

– Tutorials: https://wiki.apache.org/couchdb/Installation

– Ubuntu: https://launchpad.net/~longsleep/+archive/couchdb

– Mac, Windows: https://couchdb.apache.org/#download

Getting Set Up: couchdbkit

● http://couchdbkit.org/

● Python client library

# install with pip
pip install couchdbkit

# or from source
git clone https://github.com/benoitc/couchdbkit.git
cd couchdbkit
sudo python setup.py install

# and then you should be able to import it
>>> import couchdbkit

Contents

● Intro

● DB Initialization

– Setting Up CouchDB

– Installing couchdbkit

– Creating a Database

● Key-Value Store

● Simple MapReduce Queries

● The _changes Feed

● Complex MapReduce Queries

● Replication

● Additional Features and the Couch Ecosystem

Creating a Database

● What we have: a CouchDB server and its URL

e.g. http://127.0.0.1:5984

● What we want: a database there

e.g. http://127.0.0.1:5984/myname

● http://wiki.apache.org/couchdb/HTTP_database_API

A note on Debugging

● Apache-style log files

● Locally

– $ tail -f /var/log/couchdb/couch.log

● HTTP

– http://127.0.0.1:5984/_log?bytes=5000

– http://wiki.apache.org/couchdb/HttpGetLog

Creating a Database

# ldb-init.py
from restkit import BasicAuth
from couchdbkit import Database
from couchdbkit.exceptions import ResourceNotFound
from settings import DB_URL

auth_filter = BasicAuth('username', 'pwd')
db = Database(DB_URL, filters=[auth_filter])
server = db.server

# start from a clean slate: drop the database if it already exists
try:
    server.delete_db(db.dbname)
except ResourceNotFound:
    pass

db = server.get_or_create_db(db.dbname)


Creating a Database

[Thu, 06 Sep 2012 16:44:30 GMT] [info] [<0.1435.0>] 127.0.0.1 - - DELETE /myname/ 200

[Thu, 06 Sep 2012 16:44:30 GMT] [info] [<0.1435.0>] 127.0.0.1 - - HEAD /myname/ 404

[Thu, 06 Sep 2012 16:44:30 GMT] [info] [<0.1440.0>] 127.0.0.1 - - PUT /myname/ 201
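The three log lines are plain HTTP calls. The following is a minimal Python 3 sketch that uses only the standard library, not couchdbkit, to build the same DELETE / HEAD / PUT sequence. The server URL and database name are placeholders and authentication is left out. Nothing is sent until a request is passed to `urllib.request.urlopen`, which raises `HTTPError` on the 404 responses.

```python
import urllib.request

COUCH = "http://127.0.0.1:5984"  # placeholder server URL
DBNAME = "myname"                # placeholder database name


def db_init_requests(server, dbname):
    """Build the DELETE / HEAD / PUT sequence seen in the log (not sent)."""
    url = "%s/%s/" % (server, dbname)
    return [
        urllib.request.Request(url, method="DELETE"),  # drop old db (404 if absent)
        urllib.request.Request(url, method="HEAD"),    # does it exist? (404 = no)
        urllib.request.Request(url, method="PUT"),     # create it (201 Created)
    ]


for req in db_init_requests(COUCH, DBNAME):
    print(req.get_method(), req.full_url)
    # to actually send it: urllib.request.urlopen(req)
```

couchdbkit wraps exactly these calls in `delete_db` and `get_or_create_db`. The sketch only makes the wire format visible.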

Contents

● Intro

● DB Initialization

● Key-Value Store

– Modelling Documents

– Storing and Retrieving Documents

– Updating Documents

● Simple MapReduce Queries

● The _changes Feed

● Complex MapReduce Queries

● Replication

● Additional Features and the Couch Ecosystem

Key-Value Store

● Core of CouchDB

● Keys (_id): any valid JSON string

● Values (documents): any valid JSON object

● Stored in B+-Trees

● http://guide.couchdb.org/draft/btree.html

Modelling a Thing

● A thing that we want to lend

– Name

– Owner

– Dynamic properties such as description, movie rating, etc.

Modelling a Thing

● In CouchDB, documents are JSON objects

● You can store any dict

– Wrapped in couchdbkit's Document classes for convenience

● Documents can be serialized to JSON …

mydict = mydoc.to_json()

● … and deserialized from JSON

mydoc = DocClass.wrap(mydict)
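To show what the to_json / wrap round trip amounts to, here is a toy Python 3 stand-in. `MiniDocument` is hypothetical and is not couchdbkit's `Document`; it only mimics the idea of tagging the dict with a `doc_type` and rebuilding an object from plain JSON.

```python
import json


class MiniDocument(object):
    """Toy stand-in for couchdbkit's Document (hypothetical, for illustration)."""

    def __init__(self, **props):
        self._props = dict(props)

    def to_json(self):
        # a JSON-compatible dict, tagged with the class name as doc_type
        data = dict(self._props)
        data["doc_type"] = type(self).__name__
        return data

    @classmethod
    def wrap(cls, data):
        # rebuild an object from a plain dict, dropping the type tag
        props = {k: v for k, v in data.items() if k != "doc_type"}
        return cls(**props)


class Thing(MiniDocument):
    pass


doc = Thing(owner="stefan", name="CouchDB The Definitive Guide")
wire = json.dumps(doc.to_json(), sort_keys=True)  # what would go over HTTP
back = Thing.wrap(json.loads(wire))
print(wire)
print(back.to_json() == doc.to_json())  # True
```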

Modelling a Thing

# models.py
from couchdbkit import Database, Document, StringProperty
from settings import DB_URL

class Thing(Document):
    owner = StringProperty(required=True)
    name = StringProperty(required=True)

db = Database(DB_URL)
Thing.set_db(db)

Storing a Document

● Document identified by _id

– Auto-assigned by the database (bad)

– Provided by the client when storing the document (good)

– Think about lost responses: a retried POST can create a duplicate document, a retried PUT to the same _id cannot

– couchdbkit does that for us

● couchdbkit adds the property doc_type with the value "Thing"
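The reason to pick the _id on the client can be sketched in Python 3 with the standard library. One substitution to note: couchdbkit fetches ids from the server's /_uuids (see the log below), whereas this sketch generates them locally with `uuid.uuid4()`. `new_doc_request` is a hypothetical helper and the request is only built, never sent.

```python
import json
import urllib.request
import uuid


def new_doc_request(db_url, doc):
    """Build a PUT for a new document with a client-chosen _id (not sent).

    Retrying this exact request after a lost response cannot create a
    duplicate: the second attempt hits the same _id and gets 409 Conflict.
    """
    doc = dict(doc)
    doc.setdefault("_id", uuid.uuid4().hex)  # 32 hex chars
    body = json.dumps(doc).encode("utf-8")
    req = urllib.request.Request(
        "%s/%s" % (db_url, doc["_id"]), data=body, method="PUT",
        headers={"Content-Type": "application/json"})
    return doc["_id"], req


doc_id, req = new_doc_request("http://127.0.0.1:5984/lendb",
                              {"doc_type": "Thing", "owner": "stefan"})
print(req.get_method(), req.full_url)
```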

Internal Storage

● Database file /var/lib/couchdb/dbname.couch

● B+-tree indexed by _id

● Access: O(log n)

● Append-only storage

● Accessible in historic order (we'll come to that later)

Storing a Document

# ldb-new-thing.py
from models import Thing

couchguide = Thing(owner='stefan', name='CouchDB The Definitive Guide')
couchguide.publisher = "O'Reilly"   # dynamic property, not declared in Thing

couchguide.to_json()
# {'owner': u'stefan', 'doc_type': 'Thing',
#  'name': u'CouchDB The Definitive Guide',
#  'publisher': u"O'Reilly"}

couchguide.save()
print couchguide._id
# 448aaecfe9bc1cde5d6564a4c93f79c2


Storing a Document

[Thu, 06 Sep 2012 19:40:26 GMT] [info] [<0.962.0>] 127.0.0.1 - - GET /_uuids?count=1000 200

[Thu, 06 Sep 2012 19:40:26 GMT] [info] [<0.962.0>] 127.0.0.1 - - PUT /lendb/8f14ef7617b8492fdbd800f1101ebb35 201

Retrieving a Document

● Retrieve a document by its _id

– Limited use

– Does not allow queries by other properties

# ldb-get-thing.py
from models import Thing

thing = Thing.get(thing_id)


Retrieving a Document

[Thu, 06 Sep 2012 19:45:30 GMT] [info] [<0.962.0>] 127.0.0.1 - - GET /lendb/8f14ef7617b8492fdbd800f1101ebb35 200

Updating a Document

● Optimistic concurrency control

● Each document has a revision (_rev)

● Each update must include the revision it is based on

● The update fails if that revision is no longer the current one
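The revision check can be modelled in a few lines. This is a toy in-memory store in Python 3 and an illustration only. `ToyStore` and `Conflict` are hypothetical names. The revision id is a made-up md5 of the previous _rev plus the JSON content: it follows the "previous _rev + content" idea but is not CouchDB's real algorithm.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when an update is not based on the current revision (HTTP 409)."""


class ToyStore(object):
    """In-memory model of CouchDB's revision check; an illustration only."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return dict(self.docs[doc_id])

    def save(self, doc):
        current = self.docs.get(doc["_id"])
        current_rev = current["_rev"] if current else None
        # optimistic concurrency: the caller must send the revision it read
        if doc.get("_rev") != current_rev:
            raise Conflict("Document update conflict.")
        body = {k: v for k, v in doc.items() if k != "_rev"}
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        # toy revision id, NOT CouchDB's real algorithm
        digest = hashlib.md5(
            ((current_rev or "") + json.dumps(body, sort_keys=True)).encode("utf-8")
        ).hexdigest()
        new_rev = "%d-%s" % (generation, digest)
        self.docs[doc["_id"]] = dict(body, _rev=new_rev)
        return new_rev


store = ToyStore()
store.save({"_id": "thing", "name": "robot"})

a = store.get("thing")      # two clients read revision 1 ...
b = store.get("thing")
a["something"] = True
print(store.save(a))        # ... the first write creates revision 2
b["conflicting"] = "test"
try:
    store.save(b)           # ... the second is still based on revision 1
    outcome = "saved"
except Conflict as exc:
    outcome = "409: %s" % exc
print(outcome)
```

The database never blocks or locks. The losing writer simply has to re-read the document and try again.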

Updating a Document

>>> thing1 = Thing.get(some_id)
>>> thing2 = Thing.get(some_id)   # same document, fetched before thing1 is saved
>>> thing1._rev
'1-110e1e46bcde6ed3c2d9b1073f0b26'
>>> thing2._rev
'1-110e1e46bcde6ed3c2d9b1073f0b26'
>>> thing1.something = True
>>> thing1.save()                 # Success
>>> thing1._rev
'2-3f800dffa62f4414b2f8c84f7cb1a1'
>>> thing2.conflicting = 'test'
>>> thing2.save()                 # Failed: thing2 still carries revision 1
couchdbkit.exceptions.ResourceConflict: Document update conflict.


Updating a Document

[Thu, 13 Sep 2012 06:16:52 GMT] [info] [<0.7977.0>] 127.0.0.1 - - GET /lendb/d46d311d9a0f64b1f7322d20721f9f1d 200

[Thu, 13 Sep 2012 06:16:55 GMT] [info] [<0.7977.0>] 127.0.0.1 - - GET /lendb/d46d311d9a0f64b1f7322d20721f9f1d 200

[Thu, 13 Sep 2012 06:17:34 GMT] [info] [<0.7977.0>] 127.0.0.1 - - PUT /lendb/d46d311d9a0f64b1f7322d20721f9f1d 201

[Thu, 13 Sep 2012 06:17:48 GMT] [info] [<0.7977.0>] 127.0.0.1 - - PUT /lendb/d46d311d9a0f64b1f7322d20721f9f1d 409

Contents

● Intro

● DB Initialization

● Key-Value Store

● Simple MapReduce Queries

– Create a View

– Query the View

● The _changes Feed

● Complex MapReduce Queries

● Replication

● Additional Features and the Couch Ecosystem

Views

● A specific "view" on (parts of) the data in a database

● Indexed incrementally

● A query just reads a range of the view sequentially

● Generated using MapReduce

MapReduce Views

● Map Function

– Called for each document

– Has to be side-effect free

– Emits zero or more intermediate key-value pairs

● Reduce Function (optional)

– Aggregates intermediate pairs

● View results stored in a B+-tree

– Updated incrementally when the view is queried

– A query is just an O(log n) lookup followed by a sequential read

List all Things

● Implemented as a MapReduce view

● Contained in a Design Document

– Create

– Store

– Query

Create a Design Document

● Regular document, interpreted by the database

● Views mapped to the filesystem (for couchdbkit's loader) by directory structure: _design/<ddoc name>/views/<view name>/{map,reduce}.js

● Written in JavaScript or Erlang

● Pluggable View Servers

– http://wiki.apache.org/couchdb/View_server

– http://packages.python.org/CouchDB/views.html

– Lisp, PHP, Ruby, Python, Clojure, Perl, etc.
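The directory convention maps directly onto the JSON of a design document. Below is a simplified Python 3 imitation of the idea behind couchdbkit's filesystem loader. `load_design_docs` is hypothetical and is not couchdbkit's implementation; it only handles views and ignores filters, shows, lists and attachments.

```python
import os
import tempfile


def load_design_docs(root):
    """Assemble design docs from root/<ddoc>/views/<view>/{map,reduce}.js."""
    ddocs = []
    for ddoc_name in sorted(os.listdir(root)):
        if not os.path.isdir(os.path.join(root, ddoc_name)):
            continue  # skip stray files next to the ddoc folders
        views_dir = os.path.join(root, ddoc_name, "views")
        views = {}
        if os.path.isdir(views_dir):
            for view_name in sorted(os.listdir(views_dir)):
                view = {}
                for part in ("map", "reduce"):
                    path = os.path.join(views_dir, view_name, part + ".js")
                    if os.path.exists(path):
                        with open(path) as f:
                            view[part] = f.read().strip()
                views[view_name] = view
        ddocs.append({"_id": "_design/" + ddoc_name,
                      "language": "javascript", "views": views})
    return ddocs


with tempfile.TemporaryDirectory() as design_root:  # stands in for ./_design
    view_dir = os.path.join(design_root, "things", "views", "by_owner_name")
    os.makedirs(view_dir)
    with open(os.path.join(view_dir, "map.js"), "w") as f:
        f.write('function(doc) { if (doc.doc_type == "Thing") '
                '{ emit([doc.owner, doc.name], null); } }')
    with open(os.path.join(view_dir, "reduce.js"), "w") as f:
        f.write("_count\n")
    ddocs = load_design_docs(design_root)

print(ddocs[0]["_id"], sorted(ddocs[0]["views"]["by_owner_name"]))
```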

Design Document

# _design/things/views/by_owner_name/map.js
function(doc) {
    if (doc.doc_type == "Thing") {
        emit([doc.owner, doc.name], null);
    }
}

Intermediate Results

Key                               Value
["stefan", "couchguide"]          null
["stefan", "Polish Dictionary"]   null
["marek", "robot"]                null


Design Document

# _design/things/views/by_owner_name/reduce.js

_count

Reduced Results

● Result depends on group level

group_level=2 (exact keys):
Key                               Value
["stefan", "couchguide"]          1
["stefan", "Polish Dictionary"]   1
["marek", "robot"]                1

group_level=1:
Key            Value
["stefan"]     2
["marek"]      1

no grouping:
Key     Value
null    3
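The tables above can be reproduced by a small Python 3 simulation of map, sort and a `_count`-style reduce. `map_by_owner_name` and `query` are hypothetical helpers, and the sample documents are made up to match the tables. One caveat: Python sorts by plain code points, not by CouchDB's collation. Python therefore puts "Polish Dictionary" before "couchguide", while CouchDB would not.

```python
from itertools import groupby

# sample documents matching the tables above
docs = [
    {"_id": "a", "doc_type": "Thing", "owner": "stefan", "name": "couchguide"},
    {"_id": "b", "doc_type": "Thing", "owner": "stefan", "name": "Polish Dictionary"},
    {"_id": "c", "doc_type": "Thing", "owner": "marek", "name": "robot"},
    {"_id": "d", "doc_type": "Lending", "owner": "stefan"},  # skipped by the map
]


def map_by_owner_name(doc):
    """Python mirror of by_owner_name/map.js: yields (key, value) rows."""
    if doc.get("doc_type") == "Thing":
        yield [doc["owner"], doc["name"]], None


def query(docs, reduce=True, group_level=0):
    """Map every doc, sort rows by key, then apply a _count-style reduce."""
    rows = sorted((row for doc in docs for row in map_by_owner_name(doc)),
                  key=lambda row: row[0])
    if not reduce:                 # like ?reduce=false: the raw sorted rows
        return rows
    if group_level == 0:           # no grouping: a single row with key null
        return [(None, len(rows))]
    # group rows whose keys share the same first `group_level` elements
    return [(prefix, sum(1 for _ in group))
            for prefix, group in groupby(rows, key=lambda row: row[0][:group_level])]


for level in (2, 1, 0):
    print(level, query(docs, group_level=level))
```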

Synchronize Design Docs

● Upload the design document

● _id: _design/<ddoc name>

● couchdbkit syncs ddocs from the filesystem

● We'll need this a few more times

– Put the following in its own script

– or run $ ./ldb-sync-ddocs.py

Synchronize Design Docs

# ldb-sync-ddocs.py
from restkit import BasicAuth
from couchdbkit import Database
from couchdbkit.loaders import FileSystemDocsLoader
from settings import DB_URL

auth_filter = BasicAuth('username', 'pwd')
db = Database(DB_URL, filters=[auth_filter])

# read _design/<ddoc name>/... and push each design doc to the database
loader = FileSystemDocsLoader('_design')
loader.sync(db, verbose=True)

View things/by_owner_name

● Emitted key-value pairs

● Sorted by key: http://wiki.apache.org/couchdb/View_collation

● Keys can be complex (lists, dicts)

● Query

http://127.0.0.1:5984/myname/_design/things/_view/by_owner_name?reduce=false

Key                               Value   _id (implicit)   Document (implicit, with include_docs=true)
["stefan", "couchguide"]          null    …                { … }
["stefan", "Polish Dictionary"]   null    …                { … }

Query a View

# ldb-list-things.py
from models import Thing

things = Thing.view('things/by_owner_name',
                    include_docs=True, reduce=False)

for thing in things:
    print thing._id, thing.name, thing.owner

Query a View – Reduced

# ldb-overview.py
from models import Thing

owners = Thing.view('things/by_owner_name', group_level=1)

for owner_status in owners:
    owner = owner_status['key'][0]
    count = owner_status['value']
    print owner, count


Break

From the Break

● Filtering by price

– startkey = 5

– endkey = 10

● Structure: ddoc name / view name

– Logical grouping

– Performance: all views of one design document are indexed together
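Over HTTP, startkey and endkey are JSON values, so they must be JSON-encoded in the query string. The Python 3 sketch below builds such URLs. `view_url` is a hypothetical helper, and `things/by_price` stands for the assumed price view from the break exercise. couchdbkit's `view()` should do this encoding for you; the sketch only shows what goes over the wire. The second URL uses the common collation trick that `{}` sorts after every string, so `["stefan", {}]` closes the range of all of stefan's things.

```python
import json
from urllib.parse import urlencode


def view_url(db_url, ddoc, view, **params):
    """Build a view query URL; key-type parameters are JSON-encoded."""
    query = urlencode({k: json.dumps(v, separators=(",", ":"))
                       for k, v in params.items()})
    return "%s/_design/%s/_view/%s?%s" % (db_url, ddoc, view, query)


# hypothetical price view: things with 5 <= price <= 10
print(view_url("http://127.0.0.1:5984/myname", "things", "by_price",
               startkey=5, endkey=10))

# all of stefan's things in by_owner_name
print(view_url("http://127.0.0.1:5984/myname", "things", "by_owner_name",
               startkey=["stefan"], endkey=["stefan", {}], reduce=False))
```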

Contents

● Intro

● DB Initialization

● Key-Value Store

● Simple MapReduce Queries

● The _changes Feed

– Accessing the _changes Feed

– Lending Objects

● Advanced MapReduce Queries

● Replication

● Additional Features and the Couch Ecosystem

Changes Sequence

● With every document update, a change is recorded

● Local history, ordered by update sequence number (seq)

● Only the latest change of each document is kept

Changes Feed

● List of all documents, in the order they were last modified

● Possibility to

– React to changes

– Process all documents without skipping any

– Continue from some point with the since parameter

● CouchDB as a distributed, persistent message queue

● http://guide.couchdb.org/draft/notifications.html

● http://wiki.apache.org/couchdb/HTTP_database_API#Changes

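Changes Feed

# ldb-changes-log.py
from couchdbkit import Consumer
from models import db, Thing, Lending

CLASSES = {'Thing': Thing, 'Lending': Lending}
since = 0  # sequence number to resume from; 0 = from the beginning

def callback(line):
    seq = line['seq']
    doc = line['doc']
    # get obj according to doc['doc_type']
    cls = CLASSES.get(doc.get('doc_type'))
    obj = cls.wrap(doc) if cls else doc
    print seq, obj

consumer = Consumer(db)
consumer.wait(callback, since=since, include_docs=True)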
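To continue later with `since`, a consumer has to remember the last seq it handled. Below is a Python 3 sketch of that bookkeeping over raw lines of a continuous feed. `process_changes` is a hypothetical helper and the sample lines are made up. Treat seq as opaque: it is an integer in CouchDB 1.x but a string in later versions.

```python
import json


def process_changes(lines, since, handle):
    """Feed each change to handle() and return the last seq seen.

    lines: raw lines from ?feed=continuous&include_docs=true
    since: seq to resume from; persist the return value to restart later.
    """
    last = since
    for raw in lines:
        raw = raw.strip()
        if not raw:              # heartbeat newline that keeps the connection open
            continue
        change = json.loads(raw)
        if "seq" not in change:  # e.g. a closing {"last_seq": ...} line
            continue
        handle(change)
        last = change["seq"]
    return last


sample_feed = [
    '{"seq":1,"id":"a","changes":[{"rev":"1-x"}],"doc":{"_id":"a","doc_type":"Thing"}}',
    '',
    '{"seq":2,"id":"b","changes":[{"rev":"1-y"}],"doc":{"_id":"b","doc_type":"Lending"}}',
]
seen = []
checkpoint = process_changes(sample_feed, since=0,
                             handle=lambda c: seen.append(c["doc"]["doc_type"]))
print(checkpoint, seen)
```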

"Lending" Objects

● Thing that is lent

● Who lent it (i.e. the owner of the thing)

● To whom it is lent

● When it was lent

● When it was returned

Modelling a "Lend" Object

# models.py (continued)
from datetime import datetime
from couchdbkit import DateTimeProperty

class Lending(Document):
    thing = StringProperty(required=True)    # _id of the lent Thing
    owner = StringProperty(required=True)
    to_user = StringProperty(required=True)
    lent = DateTimeProperty(default=datetime.now)
    returned = DateTimeProperty()

Lending.set_db(db)

Lending a Thing

# ldb-lend-thing.py
from models import Lending

lending = Lending(thing=thing_id,
                  owner=username,
                  to_user=to_user)
lending.save()

Returning a Thing

# ldb-return-thing.py
from datetime import datetime
from models import Lending

lending = Lending.get(lend_id)
lending.returned = datetime.now()
lending.save()

Contents

● Intro

● DB Initialization

● Key-Value Store

● Simple MapReduce Queries

● The _changes Feed

● Advanced MapReduce Queries

– Imitating Joins with "Mixed" Views

● Replication

● Additional Features and the Couch Ecosystem

Current Thing Status

● View to get the current status of a thing

● No joins

● Instead, we emit keys that sort related documents (a thing and its lendings) next to each other, so they group together

Complex View

# _design/things/views/status/map.js
function(doc) {
    if (doc.doc_type == "Thing") {
        emit([doc.owner, doc._id, 1], doc.name);
    }
    if (doc.doc_type == "Lending") {
        if (doc.lent && !doc.returned) {
            emit([doc.owner, doc.thing, 2], [doc.to_user, doc.lent]);
        }
    }
}

Intermediate View Results

Key                       Value
["stefan", 12345, 1]      "couchguide"
["stefan", 12345, 2]      ["someone", "2012-09-12"]
["marek", 34544, 1]       "robot"

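Reduce Intermediate Results

# _design/things/views/status/reduce.js
/* use with group_level = 2 */
function(keys, values, rereduce) {
    if (rereduce) {
        /* keys is null here; values are results of earlier reduce calls */
        return values.indexOf("lent") >= 0 ? "lent" : "available";
    }
    /* keys holds [key, doc _id] pairs; a key ending in 2 is a "Lending" row */
    for (var i = 0; i < keys.length; i++) {
        if (keys[i][0][2] == 2) {
            return "lent";
        }
    }
    return "available";
}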

Reduce Intermediate Results

● Don't forget to synchronize your design docs!

● Group level: 2

● The reduce function receives all rows that share the same grouped key (the first two key elements)

Intermediate (not reduced):
Key                       Value
["stefan", 12345, 1]      "couchguide"
["stefan", 12345, 2]      ["someone", "2012-09-12"]
["marek", 34544, 1]       "robot"

Reduced (group_level=2):
Key                  Value
["stefan", 12345]    "lent"
["marek", 34544]     "available"
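The "join" works only because sorting puts a thing's row and its open lending rows side by side. The Python 3 simulation below mirrors the idea of the status view. It emits Thing rows ending in 1 and open Lending rows ending in 2, groups rows by the first two key elements, and reports "lent" when a group contains a Lending row. The documents and helper names are hypothetical sample data.

```python
from itertools import groupby

docs = [  # sample data mirroring the tables above
    {"_id": "12345", "doc_type": "Thing", "owner": "stefan", "name": "couchguide"},
    {"_id": "34544", "doc_type": "Thing", "owner": "marek", "name": "robot"},
    {"_id": "l1", "doc_type": "Lending", "owner": "stefan", "thing": "12345",
     "to_user": "someone", "lent": "2012-09-12", "returned": None},
]


def status_map(doc):
    """Thing rows end in 1, open (not yet returned) Lending rows end in 2."""
    if doc["doc_type"] == "Thing":
        yield [doc["owner"], doc["_id"], 1], doc["name"]
    elif doc["doc_type"] == "Lending" and doc.get("lent") and not doc.get("returned"):
        yield [doc["owner"], doc["thing"], 2], [doc["to_user"], doc["lent"]]


def status_reduce(keys):
    """'lent' if any row in the group is a Lending row."""
    return "lent" if any(key[2] == 2 for key in keys) else "available"


rows = sorted((row for doc in docs for row in status_map(doc)), key=lambda r: r[0])
status = {tuple(prefix): status_reduce([key for key, _ in group])
          for prefix, group in groupby(rows, key=lambda r: r[0][:2])}
print(status)
```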

Get Status

# ldb-status.py
from models import Thing

things = Thing.view('things/status', group_level=2)

for result in things:
    owner = result['key'][0]
    thing_id = result['key'][1]
    status = result['value']
    print owner, thing_id, status

Contents

● Intro

● DB Initialization

● Key-Value Store

● Simple MapReduce Queries

● The _changes Feed

● Advanced MapReduce Queries

● Replication

– Setting up filters

– Find Friends and Replicate from them

– Eventual Consistency and Conflicts

● Additional Features and the Couch Ecosystem

Replication

● Replicate Things and their status from friends

● Don't replicate things from friends of friends

– we don't want to borrow anything from them

Replication

● Pull replication

– Pull documents from our friends, and store them locally

● There's also push replication, but we won't use it

● Goes through the source's _changes feed

● Compares with local documents, then updates them or creates conflicts

Set up a Filter

● A filter is a JavaScript function that takes

– a document

– a request object

● and returns

– true, if the document passes the filter

– false otherwise

● A filter is evaluated at the source

Replication Filter

# _design/things/filters/from_friend.js
/* doc is the document,
   req is the request that uses the filter */
function(doc, req) {
    /* Allow only if entry is owned by the friend */
    return (doc.owner == req.query.friend);
}
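Why this keeps out friends of friends: a friend's database also contains things they replicated from their own friends, and those documents carry a different owner. The Python 3 sketch below shows what the source does. It runs a Python mirror of from_friend over its changes and passes on only matching documents. The helper names and sample documents are hypothetical.

```python
def from_friend(doc, req):
    """Python mirror of filters/from_friend.js."""
    return doc.get("owner") == req["query"].get("friend")


def filtered_changes(changes, filter_fn, query):
    """Keep only the changes whose document passes the filter (done at the source)."""
    req = {"query": query}
    return [c for c in changes if filter_fn(c["doc"], req)]


changes = [  # what marek's database might contain
    {"id": "a", "doc": {"_id": "a", "owner": "marek"}},
    {"id": "b", "doc": {"_id": "b", "owner": "ola"}},  # one of marek's friends
]
print([c["id"] for c in filtered_changes(changes, from_friend, {"friend": "marek"})])
```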

Replication

● Sync design docs to your own database! (friends pulling from you need your things/from_friend filter)

● Find friends to borrow from

– Post your nickname and Database URL to http://piratepad.net/pycouchpl

– Pick at least two friends

Replication

● _replicator database

● Documents describe replication tasks

– Source

– Target

– Continuous

– Filter

– etc

● http://wiki.apache.org/couchdb/Replication

Replication

# ldb-replicate-friend.py
from restkit import BasicAuth
from couchdbkit import Database

auth_filter = BasicAuth(username, password)
db = Database(db_url, filters=[auth_filter])
replicator_db = db.server['_replicator']

replication_doc = {
    "source": friend_db_url,
    "target": db_url,
    "continuous": True,
    "filter": "things/from_friend",
    "query_params": {"friend": friend_name},
}

# the document _id names the replication task, e.g. "stefan-marek"
replicator_db[username + "-" + friend_name] = replication_doc

Replication

● Documents should be propagated into your own database

● Views should contain both own and friends' things

Dealing with Conflicts

● Conflicts introduced by

– Replication

– "Forcing" a document update

● _rev calculated based on

– Previous _rev

– Document content

● Conflict when two revisions of a document have

– The same _id

– Distinct _rev, neither one a successor of the other

Dealing with Conflicts

● Select a winner

● Only the application knows which version is right; the database can't decide that for you

● CouchDB's automatic strategy selects a (temporary) winner

– Deterministic: always the same winner on each node

– Leaves the document in conflict state

● A view can list all conflicts (doc._conflicts is visible to map functions)

● Resolve conflicts programmatically

● http://guide.couchdb.org/draft/conflicts.html

● http://wiki.apache.org/couchdb/Replication_and_conflicts
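The rule behind the deterministic winner can be stated as a sort order. A non-deleted leaf beats a deleted one, then the longer revision history (the number before the dash) wins, and remaining ties go to the higher revision string. The Python 3 sketch below models that rule. `winning_rev` is a hypothetical helper and the revision strings are made up.

```python
def winning_rev(leaves):
    """Pick the winner among conflicting leaf revisions, modelled on CouchDB's rule.

    leaves: list of (rev, deleted) tuples, rev like "3-abc...".
    """
    def rank(leaf):
        rev, deleted = leaf
        pos, _, rev_hash = rev.partition("-")
        return (not deleted, int(pos), rev_hash)
    return max(leaves, key=rank)[0]


leaves = [("2-7051cbe5c8faecd085a3fa619e6e6337", False),
          ("2-b91bb807b4685080c6a651115ff558f5", False),
          ("3-0c1e8a1c5b8b0ae2a4c1a7d1f3e5b9aa", True)]  # deleted branch
print(winning_rev(leaves))
```

Every node computes the same winner without talking to the others, but the winner is not necessarily the "right" version. To resolve a conflict, the application saves a merged document on top of the winning revision and deletes each losing revision.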

Contents

● Intro

● DB Initialization

● Key-Value Store

● Simple MapReduce Queries

● The _changes Feed

● Advanced MapReduce Queries

● Replication

● Additional Features and the Couch Ecosystem

– Scaling and related Projects

– Fulltext Search

– Further Reading

Scaling Up / Out

● BigCouch

– Cluster of CouchDB nodes that appears as a single server

– http://bigcouch.cloudant.com/

– will be merged into CouchDB soon

● refuge

– Fully decentralized data platform based on CouchDB

– Includes fork of GeoCouch for spatial indexing

– http://refuge.io/

Scaling Down

● CouchDB-compatible databases on a smaller scale

● PouchDB

– JavaScript library: http://pouchdb.com/

● TouchDB

– iOS: https://github.com/couchbaselabs/TouchDB-iOS

– Android: https://github.com/couchbaselabs/TouchDB-Android

Fulltext and Relational Search

● http://wiki.apache.org/couchdb/Full_text_search

● CouchDB Lucene

– http://www.slideshare.net/martin.rehfeld/couchdblucene

– https://github.com/rnewson/couchdb-lucene

● Elasticsearch

– http://www.elasticsearch.org/

Operational Considerations

● Append-only storage

● Your backup tools: cp, rsync (an append-only file can be copied while in use)

● Regular compaction needed (old revisions are never overwritten in place, so the file keeps growing)
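Compaction is triggered per database with a POST to `/<db>/_compact`. It needs admin rights, and CouchDB rejects the call without a JSON content type. The Python 3 sketch below builds that request without sending it. `compact_request` is a hypothetical helper and the URL is a placeholder.

```python
import urllib.request


def compact_request(db_url):
    """Build the POST that starts compaction of one database (not sent)."""
    return urllib.request.Request(db_url.rstrip("/") + "/_compact", data=b"",
                                  method="POST",
                                  headers={"Content-Type": "application/json"})


req = compact_request("http://127.0.0.1:5984/lendb")
print(req.get_method(), req.full_url)
```

View indexes are compacted separately, per design document, via `/<db>/_compact/<ddoc name>`.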

Further Features

● Update Handlers: JavaScript code that carries out updates in the database server

● External Processes: use CouchDB as a proxy to other processes (e.g. search engines)

● Attachments: attach binary files to documents

● Update Validation: JavaScript code to validate document updates

● CouchApps: web apps served directly by CouchDB

● Bulk APIs: several updates in one request

● List and Show Functions: transform responses before serving them
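For the bulk API, many documents go into a single POST to `/<db>/_bulk_docs` with a `{"docs": [...]}` body. Note that this is not a transaction: the response reports success or a conflict for each document individually. The Python 3 sketch below builds such a request without sending it. `bulk_docs_request` is a hypothetical helper and the documents are sample data.

```python
import json
import urllib.request


def bulk_docs_request(db_url, docs):
    """Build one POST /<db>/_bulk_docs request that saves many docs (not sent)."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(db_url.rstrip("/") + "/_bulk_docs", data=body,
                                  method="POST",
                                  headers={"Content-Type": "application/json"})


req = bulk_docs_request("http://127.0.0.1:5984/lendb", [
    {"_id": "thing-1", "doc_type": "Thing", "owner": "stefan", "name": "robot"},
    {"_id": "thing-2", "doc_type": "Thing", "owner": "stefan", "name": "lamp"},
])
print(req.get_method(), req.full_url, len(json.loads(req.data)["docs"]))
```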

Summing Up

● Apache CouchDB™ is a database that uses JSON for documents, JavaScript for MapReduce queries, and regular HTTP for an API

● couchdbkit is a Python library providing access to Apache CouchDB


Thanks!

Time for Questions and Discussion

Downloads

https://slideshare.net/skoegl/couch-db-pythonpyconpl2012

https://github.com/stefankoegl/python-couchdb-examples

Stefan Kögl


@skoegl