Birkbeck, University of London School of Computer Science and Information Systems MSc Computer Science Project Report Document-Oriented Persistence with CouchDB Supervisor: Nigel Martin Author: Michael Lenahan This project report is substantially the result of my own work, expressed in my own words, except where explicitly indicated in the text. I give my permission for it to be submitted to the JISC Plagarism Detection Service. September 2010


Abstract

This project investigates an innovative database technology, CouchDB, by way of assessing its use in applications derived from the British Council Activity Mapping project. The project's aim is to examine the advantages and disadvantages of CouchDB from the perspective of a web applications developer.

CouchDB combines a web server with a data storage mechanism. Data is stored in the form of denormalised documents and queried through map-reduce functions, which result in the creation of indexed views.

The project considers the suitability of CouchDB as a data store and web development platform in support of an existing relational database application, with an assessment of the strengths of both approaches.

Everything in the right measure.

Acknowledgments

With thanks to my supervisor, Nigel Martin, for timely, patient and good-humoured advice - you really helped me to stay on track.

Thanks also to my colleagues and ex-colleagues at the British Council, in particular Terry Pyle, Kshipra Singhvi, Phil Street, Spero Blassoples, Michael Sadler, Roger Moran and Masoud Ahanchian.

Thank you, Su Peneycad, for your encouragement and wonderful skill at spotting typos.

I had the great fortune to be doing this project during the summer of 2010, when the (first) 'NoSql Summer' was in full swing. Thanks to the organisers of the London meetups, Neil Robbins and Makoto Inoue, for providing the framework for such interesting discussions.

Finally, thank you to the active and friendly CouchDB community - J. Chris Anderson, Volker Mische, Jason Smith and many others - for your help on the couchdb-users mailing list, and to Damien Katz for starting the whole thing.

Contents

1 Introduction
1.1 Motivation
1.2 Does 'one size' still 'fit all'?
1.3 Organisation

2 Background
2.1 The Relational Model
2.2 Relational Database Management Systems
2.3 Normalisation
2.4 SQL
2.5 Transactional Guarantees
2.6 The 'NoSql' Movement
2.7 'NoSql' databases at large scale
2.8 Brewer's 'CAP' Theorem
2.9 Reducing the Impedance Mismatch
2.10 Benefits of 'NoSql'
2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
2.12 The Activity Mapping Project
2.13 Raising Awareness of British Council Impact in the UK
2.14 Using CouchDB to Serve Mapping Data

3 Introduction to CouchDB
3.1 'Of the Web'
3.2 Some History
3.3 Document-Oriented
3.4 Erlang
3.5 How Ubuntu uses CouchDB
3.6 Desktop Couch, Python and Quickly

4 Using CouchDB
4.1 Motivation
4.2 Hosted CouchDB Service Providers
4.3 Installing CouchDB
4.4 Inserting a Document into a CouchDB Database
4.5 Deleting a Document
4.6 Updating a Document
4.7 Adding Attachments
4.8 Replication
4.9 Querying a CouchDB Database using Map-Reduce

5 Serving HTML from CouchDB
5.1 Bulk upload of JSON documents
5.2 A CouchDB Design Document
5.3 A more complex 'show' function
5.4 Views and Lists
5.5 Lessons Learned

6 Serving GeoRSS using CouchApp
6.1 Introduction to CouchApp
6.2 Sofa - a Blogging Application
6.3 Developing with CouchApp
6.4 Deployment
6.5 Using templates

7 Critical Assessment and Conclusion
7.1 Why might a developer choose CouchDB?
7.2 CouchDB's challenges
7.3 Conclusion

A Design Document d1
B GeoRSS on Sofa
C georss.js
D georss.html
E Bash Script to Upload Country Flag Files

Section 1

Introduction

The MSc Computer Science project described in this report aims to investigate a relatively new database product called CouchDB from a web application development perspective [10] [13]. I have built a simple business application and use this as the test case from which to evaluate CouchDB's suitability as a web development platform.

1.1 Motivation

The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of 'key-value store' products, which have emerged recently to deal with the challenges of very large data volumes in online environments.

The background to the project is one of rapid change, inspired by the emergence of a set of innovative, open-source, non-relational (also known as 'NoSql'¹) database products.

These 'NoSql' database products are challenging the accepted standard use of relational database management systems as the only storage mechanism used by developers when storing data. Relational databases have been very successful in the past and are likely to remain so, but in a certain sub-set of usage scenarios non-relational systems are starting to take their place.

1.2 Does 'one size' still 'fit all'?

One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:

¹ Pronounced 'No-See-Quel'.

"The last 25 years of commercial DBMS development can be summed up in a single phrase: 'One size fits all'. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.

[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser."

Why does this matter? Does 'one size fit all'? Up until recent years the answer has always been 'yes': practically every data-centric IT project over the past 25 years has used a relational database to persist data, and systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.

1.3 Organisation

The sections are organised in the following way:

• in Section 2 I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project;

• in Section 3 I introduce the open-source, document-oriented database CouchDB;

• in Section 4 I provide an overview of CouchDB's application programming interface;

• in Section 5 I describe some initial experiments in serving HTML from a CouchDB database - that is, using CouchDB to store both data and application code;

• in Section 6 I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps;

• in Section 7 I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.

Section 2

Background

In recent years a new generation of data stores has emerged which approach data persistence in a different way from the established relational model.

In this section I briefly discuss the relational model and describe some of the recent challenges which have led to the development of the newer, alternative database systems.

2.1 The Relational Model

The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].

One description [54] of Codd's relational model reads as follows:

"The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains."

A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to represent relations as tables, tuples as rows and attributes as columns.
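To make the definition concrete, the following minimal Python sketch models two domains as sets, builds their Cartesian product, and checks that a relation (a set of tuples) is a subset of it. The domain values are hypothetical, chosen to match the 'Staff' example in Section 2.3; this is an illustration of the definition, not of how any DBMS stores relations.

```python
from itertools import product

# Hypothetical domains for a binary (n = 2) relation.
names = {"Bob", "Jane", "Alice"}
countries = {"Australia", "Sweden", "United Kingdom", "South Africa"}

# Cartesian product of the two domains: every possible (name, country) pair.
cartesian = set(product(names, countries))

# A relation is any subset of that product; each tuple plays the role of a row.
staff_nationality = {
    ("Bob", "Australia"),
    ("Jane", "Sweden"),
    ("Alice", "United Kingdom"),
    ("Alice", "South Africa"),
}


def is_relation_over(relation, *domains):
    """True if every tuple in `relation` lies in the Cartesian product of `domains`."""
    return relation <= set(product(*domains))


print(len(cartesian))                                          # 12 (3 names x 4 countries)
print(is_relation_over(staff_nationality, names, countries))   # True
```

Note that Alice simply appears in two tuples; in relational terms there is nothing unusual about that, which is the idea normalisation builds on.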

The main advantage of the relational database is its precise, mathematically well-founded organisation of data. In this context it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or 'NoSql' databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, whether a database is 'relational' or not is more a matter of how applications use the database: a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].


2.2 Relational Database Management Systems

The term 'Relational Database Management System' (RDBMS) refers to the class of databases which follow Codd's relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open-source (MySQL, PostgreSQL) databases in use over the past three decades have been based on this model.

These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term 'database' has become associated with relational databases to such an extent that non-relational databases are often referred to as 'non-relational data stores' for clarity. (Formally, and in its original meaning, the term 'database' simply means a collection of data.)

2.3 Normalisation

A key factor in relational data storage is data normalisation. Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by means of removing duplication [48].

The first level of normalisation, 'first normal form', requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.

Consider, for example, a table (let's call it 'Staff') containing staff records:

Name     Nationality
Bob      Australia
Jane     Sweden

All is well until the system encounters a staff member with multiple nationalities:

Name     Nationality
Bob      Australia
Jane     Sweden
Alice    United Kingdom, South Africa

This second table breaks the rules of normalisation: Alice's Nationality cell no longer holds an atomic value, so the table is not in first normal form. A relational database management system would accept the text, but it could not treat the two nationalities as separate values. The standard way to solve this problem would be to create two more tables, a 'Nationality' table and a 'StaffNationality' table, linked with primary and foreign keys.

While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.

In a non-relational database (such as CouchDB), Alice's record could be stored as follows:
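The normalised design described above can be sketched with Python's built-in sqlite3 module. The column names (StaffId, NationalityId) and the in-memory database are assumptions for illustration only; the point is that Alice's second nationality becomes one extra row in the link table, and a three-way join reassembles her record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
    CREATE TABLE Staff (
        StaffId INTEGER PRIMARY KEY,
        Name    TEXT NOT NULL
    );
    CREATE TABLE Nationality (
        NationalityId INTEGER PRIMARY KEY,
        Name          TEXT NOT NULL UNIQUE
    );
    -- Link table: one row per (staff member, nationality) pair.
    CREATE TABLE StaffNationality (
        StaffId       INTEGER NOT NULL REFERENCES Staff(StaffId),
        NationalityId INTEGER NOT NULL REFERENCES Nationality(NationalityId),
        PRIMARY KEY (StaffId, NationalityId)
    );
""")
conn.executemany("INSERT INTO Staff VALUES (?, ?)",
                 [(1, "Bob"), (2, "Jane"), (3, "Alice")])
conn.executemany("INSERT INTO Nationality VALUES (?, ?)",
                 [(1, "Australia"), (2, "Sweden"), (3, "United Kingdom"), (4, "South Africa")])
# Alice's second nationality is simply one more row in the link table.
conn.executemany("INSERT INTO StaffNationality VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 3), (3, 4)])
conn.commit()


def nationalities_of(name):
    """Join the three tables back together to list one person's nationalities."""
    rows = conn.execute("""
        SELECT n.Name
        FROM Staff AS s
        JOIN StaffNationality AS sn ON sn.StaffId = s.StaffId
        JOIN Nationality AS n ON n.NationalityId = sn.NationalityId
        WHERE s.Name = ?
        ORDER BY n.Name
    """, (name,)).fetchall()
    return [row[0] for row in rows]


print(nationalities_of("Alice"))  # ['South Africa', 'United Kingdom']
```

Three tables and a join are needed to represent what is, to the application, a single staff record; this is the cost the schema change carries.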

{
  "name": "Alice",
  "nationality": [
    "United Kingdom",
    "South Africa"
  ]
}

This record (or 'document', in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other 'NoSql' products are sometimes referred to as 'schema-less' database management systems.

The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.

2.4 SQL

A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].

To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table needs to store atomic values. In our example above, this query would result in 'United Kingdom, South Africa' being considered a distinct nationality.

In Section 4 I will discuss map-reduce, which is the method used by CouchDB and some other 'NoSql' systems to query the database.

2.5 Transactional Guarantees
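The effect of non-atomic values on this query can be checked with a short Python/sqlite3 sketch. Both tables are hypothetical: 'Staff' packs Alice's two nationalities into one cell, while 'StaffAtomic' stores one nationality per row, as first normal form requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table names below are fixed constants, so str.format is safe here.
QUERY = "SELECT Nationality, COUNT(*) FROM {table} GROUP BY Nationality ORDER BY Nationality"

# Denormalised: Alice's two nationalities packed into one text cell.
conn.execute("CREATE TABLE Staff (Name TEXT, Nationality TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)", [
    ("Bob", "Australia"),
    ("Jane", "Sweden"),
    ("Alice", "United Kingdom, South Africa"),
])

# First normal form: one atomic nationality per row (Alice appears twice).
conn.execute("CREATE TABLE StaffAtomic (Name TEXT, Nationality TEXT)")
conn.executemany("INSERT INTO StaffAtomic VALUES (?, ?)", [
    ("Bob", "Australia"),
    ("Jane", "Sweden"),
    ("Alice", "United Kingdom"),
    ("Alice", "South Africa"),
])

packed_counts = conn.execute(QUERY.format(table="Staff")).fetchall()
atomic_counts = conn.execute(QUERY.format(table="StaffAtomic")).fetchall()
print(packed_counts)
# [('Australia', 1), ('Sweden', 1), ('United Kingdom, South Africa', 1)]
print(atomic_counts)
# [('Australia', 1), ('South Africa', 1), ('Sweden', 1), ('United Kingdom', 1)]
```

The same statement gives a misleading answer on the packed table and the expected one on the atomic table.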

SECTION 2 BACKGROUND 8

systems rely on data access (read and write) locking to provide ACID capa-bilities [50]

The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: at any given moment in time there is only 'one version of the truth' for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].

For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at exactly the same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.

The paradigm followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.

However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).

Caching data in many places, so that the data is near to the user when it is needed, is one of the most effective ways to reduce latency. There may no longer be a 'single version of the truth', but for many applications the trade-off is worthwhile.
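The account-transfer example can be sketched with Python's sqlite3 module, whose connection object, used as a context manager, commits on success and rolls back if an exception escapes. The Account table, its CHECK rule (no overdrafts) and the account names are assumptions for illustration. The credit is deliberately applied before the debit, so that a failing debit leaves a half-finished transfer for the rollback to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint stands in for a business rule: no account may go overdrawn.
conn.execute("CREATE TABLE Account (Id TEXT PRIMARY KEY, "
             "Balance INTEGER NOT NULL CHECK (Balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True if committed."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            # Credit first, so a failing debit leaves a half-finished transfer to undo.
            conn.execute("UPDATE Account SET Balance = Balance + ? WHERE Id = ?",
                         (amount, dst))
            conn.execute("UPDATE Account SET Balance = Balance - ? WHERE Id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn):
    return dict(conn.execute("SELECT Id, Balance FROM Account ORDER BY Id"))


print(transfer(conn, "A", "B", 30), balances(conn))   # True {'A': 70, 'B': 80}
print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 70, 'B': 80}
```

In the failed transfer, B's credit of 500 had already been written inside the transaction; the rollback removes it, so the user never sees the intermediate state.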
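The trade-off between staleness and latency can be illustrated with a generic time-to-live (TTL) cache. This is a minimal Python sketch, not a model of any particular proxy server; the class name, the `fetch` callback and the injectable clock are all hypothetical, the clock being there only so the behaviour can be shown deterministically.

```python
import time


class TTLCache:
    """Answer from a local copy for up to `ttl` seconds, accepting that it may be stale."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self.fetch = fetch      # slow call to the origin server or database
        self.ttl = ttl
        self.clock = clock      # injectable so the behaviour can be demonstrated
        self._entries = {}      # key -> (value, time it was fetched)

    def get(self, key):
        now = self.clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]            # fast path: no round trip, possibly stale
        value = self.fetch(key)         # slow path: go back to the origin
        self._entries[key] = (value, now)
        return value


# Usage with a fake clock, so the timing is deterministic.
origin = {"page": "version 1"}
fake_time = [0.0]
cache = TTLCache(origin.get, ttl=10, clock=lambda: fake_time[0])

print(cache.get("page"))   # version 1 (fetched from the origin)
origin["page"] = "version 2"
print(cache.get("page"))   # version 1 (stale copy, but served instantly)
fake_time[0] = 11.0
print(cache.get("page"))   # version 2 (entry expired, so re-fetched)
```

The `ttl` parameter is the dial between the two concerns: a larger value means fewer slow round trips but a longer window in which users may see out-of-date data.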

2.6 The 'NoSql' Movement

'NoSql' refers to a range of non-relational database products which have emerged in recent years. The term 'NoSql' is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.).¹

'NoSql' is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:

• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.

• 'NoSql' databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of 'a NoSql Summer' [1].

2.7 'NoSql' databases at large scale

Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].

The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.

To give an example, if Alice in London updates her Facebook status, it doesn't really matter if Bob in Singapore sees a 'stale' version for a few seconds. Of course, in a foreign exchange trading application those few seconds would make a huge difference. 'NoSql' systems are generally intended for the Facebook scenario; foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.

In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but they are guaranteed to converge to a consistent state eventually, once updates stop arriving.

Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacentre were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacentres around the world.

In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:
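Eventual convergence can be illustrated with a toy pair of replicas. This Python sketch uses last-writer-wins on a (logical clock, replica name) version, which is a common textbook simplification and an assumption here; CouchDB's own approach differs, since it keeps conflicting revisions, picks a winner deterministically and leaves the losing revisions available to the application. Class and function names are hypothetical.

```python
class Replica:
    """Toy replica: last-writer-wins on a (logical clock, replica name) version."""

    def __init__(self, name):
        self.name = name
        self.clock = 0      # Lamport-style logical clock
        self.data = {}      # key -> (version, value)

    def write(self, key, value):
        self.clock += 1
        self.data[key] = ((self.clock, self.name), value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        """Adopt any entry from `other` that carries a higher version."""
        for key, (version, value) in other.data.items():
            if key not in self.data or version > self.data[key][0]:
                self.data[key] = (version, value)
            self.clock = max(self.clock, version[0])


def sync(a, b):
    """Anti-entropy round: afterwards both replicas hold identical data."""
    a.merge_from(b)
    b.merge_from(a)


london, singapore = Replica("london"), Replica("singapore")
london.write("alice/status", "At the office")
sync(london, singapore)
london.write("alice/status", "Off to lunch")
print(singapore.read("alice/status"))   # At the office  (stale)
sync(london, singapore)
print(singapore.read("alice/status"))   # Off to lunch   (converged)

# Concurrent writes to the same key: after a sync both replicas agree on one winner.
london.write("alice/mood", "happy")
singapore.write("alice/mood", "sleepy")
sync(london, singapore)
print(london.read("alice/mood") == singapore.read("alice/mood"))  # True
```

Between syncs Bob may read a stale value, but each replica stays available for reads and writes, and a sync brings them back into agreement.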

"Occasionally an entire datacenter will go down (e.g. if the power is cut) or become unreachable (e.g. if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability."

2.8 Brewer's 'CAP' Theorem

The idea of trading off consistency for availability is a key feature of 'NoSql' databases. Reference is often made to Eric Brewer's conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential but strict consistency is not - so a looser model of consistency is considered a price worth paying.

This realisation - that 'eventual' consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.

In addition, these 'NoSql' data stores do not have to be as 'intelligent' as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a 'NoSql' system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, 'NoSql' systems are simply 'dumb' repositories of data. This has meant, of course, that the code-base required for 'NoSql' systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere from web servers to desktops, even hand-held devices and mobile phones.

2.9 Reducing the Impedance Mismatch

The term 'object-relational impedance mismatch' refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to 'bridge the gap' between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain.² Another tactic used by application developers is to use SQL views - effectively de-normalising data so that the data model is closer to the application's object model.

A key advantage of 'NoSql' systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
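The difference can be sketched in Python with a hypothetical StaffMember class. Saving it to a document store is a single serialisation step, whereas a relational store needs the object split into a Staff row plus link-table rows (as in the normalised schema of Section 2.3). The function names and the nationality-id lookup table are illustrative assumptions, not part of any ORM.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StaffMember:
    name: str
    nationality: List[str] = field(default_factory=list)


def to_document(member: StaffMember) -> str:
    """Document store: the whole object is saved as one JSON document."""
    return json.dumps(asdict(member))


def to_rows(member: StaffMember, staff_id: int,
            nationality_ids: Dict[str, int]) -> Tuple[tuple, List[tuple]]:
    """Relational store: the same object is split into a Staff row plus link-table rows."""
    staff_row = (staff_id, member.name)
    link_rows = [(staff_id, nationality_ids[n]) for n in member.nationality]
    return staff_row, link_rows


alice = StaffMember("Alice", ["United Kingdom", "South Africa"])
doc = to_document(alice)
print(doc)
# {"name": "Alice", "nationality": ["United Kingdom", "South Africa"]}
print(to_rows(alice, 3, {"United Kingdom": 3, "South Africa": 4}))
# ((3, 'Alice'), [(3, 3), (3, 4)])
print(StaffMember(**json.loads(doc)) == alice)   # True: one-step round trip
```

The relational path also needs key management and a join to rebuild the object, which is the work ORM tools take on; the document path round-trips in one step.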

2.10 Benefits of 'NoSql'

As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of Moore's Law: disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.

• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It is better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, 'NoSql' databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.

2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'

It doesn't come as much of a surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support - or, to state it more generally, the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions, and there is little capability to query the system in an arbitrary way.

(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way the 'NoSql' solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].

The British Council is the United Kingdom's international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].
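The 'pre-computed' nature of map-reduce queries can be illustrated outside CouchDB. In CouchDB itself, view functions are normally written in JavaScript, calling `emit(key, value)`, and built-in reduce functions such as `_count` exist; the Python below only mirrors the idea. A map function emits key/value rows for each document, the rows are kept sorted by key (the 'indexed view'), and a reduce function combines the values for each key, much as a grouped view query does. All function names are hypothetical.

```python
from itertools import groupby

docs = [
    {"_id": "bob", "name": "Bob", "nationality": ["Australia"]},
    {"_id": "jane", "name": "Jane", "nationality": ["Sweden"]},
    {"_id": "alice", "name": "Alice", "nationality": ["United Kingdom", "South Africa"]},
]


def map_fn(doc):
    """Emit one (key, value) row per nationality, like emit(key, value) in a view."""
    for nationality in doc.get("nationality", []):
        yield nationality, 1


def reduce_fn(values):
    return sum(values)


def build_view(docs, map_fn):
    """Run the map function over every document; keep the rows sorted by key."""
    rows = [(key, value, doc["_id"]) for doc in docs for key, value in map_fn(doc)]
    rows.sort(key=lambda row: (row[0], row[2]))
    return rows


def query_grouped(rows, reduce_fn):
    """Reduce the values for each distinct key (similar in spirit to group=true)."""
    return {key: reduce_fn([value for _, value, _ in group])
            for key, group in groupby(rows, key=lambda row: row[0])}


view = build_view(docs, map_fn)
print(query_grouped(view, reduce_fn))
# {'Australia': 1, 'South Africa': 1, 'Sweden': 1, 'United Kingdom': 1}
```

Unlike the SQL query of Section 2.4, Alice's list needs no normalisation: the map function simply emits two rows. The catch is the one described above: a question the view was not built for needs a new map function, not just a new query.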

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.

Figure 2.1: British Council project data displayed on Bing Maps

The data from these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that the British Council, as an organisation, now has a unified view of the work undertaken with each partner institution throughout the UK.
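The kind of string matching involved in de-duplicating institution names can be sketched with Python's standard difflib module. This is a hypothetical illustration, not the British Council system (which was written in C# and ASP.Net): names are normalised (lower-cased, punctuation and filler words removed, words sorted) and then compared with `SequenceMatcher.ratio()`, with an arbitrary similarity threshold of 0.9.

```python
import re
from difflib import SequenceMatcher

FILLER_WORDS = {"the", "of", "and"}


def normalise(name):
    """Lower-case, strip punctuation and filler words, and sort the remaining words."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(sorted(w for w in words if w not in FILLER_WORDS))


def deduplicate(names, threshold=0.9):
    """Map each raw name to the first previously seen name that is similar enough."""
    seen = []        # (normalised form, original spelling) of each distinct institution
    mapping = {}
    for raw in names:
        key = normalise(raw)
        match = next((original for norm, original in seen
                      if SequenceMatcher(None, key, norm).ratio() >= threshold), None)
        if match is None:
            seen.append((key, raw))
            match = raw
        mapping[raw] = match
    return mapping


raw_names = ["University of Leeds", "The University of Leeds",
             "Leeds University", "University of Leicester"]
for raw, canonical in deduplicate(raw_names).items():
    print(f"{raw!r:28} -> {canonical}")
```

The threshold is the main design choice: set too low, distinct institutions (Leeds and Leicester) are merged; set too high, trivial spelling variants survive as duplicates. In practice a human review step for borderline matches is usually needed.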

Outputs from the Activity Mapping project have included letters to MPs, online maps for each MP and local authority, reports for overseas education ministers and a Google Earth presentation for display in the reception area of our offices.

Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that something like CouchDB would be a good fit for the back-end data store - in effect using CouchDB as a web caching layer.

The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
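To show the shape of the target format, the following Python sketch turns JSON-style project records into an RSS 2.0 feed carrying GeoRSS-Simple points (`georss:point`, latitude then longitude, in the `http://www.georss.org/georss` namespace). It runs outside CouchDB purely for illustration; the record fields `institution`, `project`, `lat` and `lon` and the sample values are hypothetical, not the real Activity Mapping data.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)


def to_georss(docs, title):
    """Render project records as an RSS 2.0 feed with GeoRSS-Simple points."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for doc in docs:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = doc["institution"]
        ET.SubElement(item, "description").text = doc["project"]
        # GeoRSS-Simple: latitude then longitude, separated by a space.
        ET.SubElement(item, f"{{{GEORSS_NS}}}point").text = f"{doc['lat']} {doc['lon']}"
    return ET.tostring(rss, encoding="unicode")


projects = [
    {"institution": "Example College", "project": "Example exchange project",
     "lat": 51.5074, "lon": -0.1278},
]
feed = to_georss(projects, "Activity map (sample data)")
print(feed)
```

Producing this output on demand from stored JSON documents, rather than from pre-generated static files, is the transformation step that CouchDB would need to take over.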

This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for 'prime time' - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].

3.1 'Of the Web'

The feature of combining a web server with B-Tree-based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-used quote about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:
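The benefit of keeping keys in sorted order, which is what a B-Tree provides, can be illustrated with a much simpler structure. This Python sketch is not a B-Tree and not CouchDB's storage engine; it keeps keys in a sorted list (using the standard bisect module) with values stored as JSON strings, and supports a range scan comparable in spirit to querying by start and end key. The class and method names are hypothetical.

```python
import bisect
import json


class SortedIndex:
    """Keys kept in sorted order, values stored as JSON strings."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = json.dumps(value)      # overwrite an existing key
        else:
            self._keys.insert(i, key)                # insert, keeping keys sorted
            self._values.insert(i, json.dumps(value))

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return json.loads(self._values[i])
        return None

    def range(self, startkey, endkey):
        """All (key, value) pairs with startkey <= key <= endkey, in key order."""
        lo = bisect.bisect_left(self._keys, startkey)
        hi = bisect.bisect_right(self._keys, endkey)
        return [(k, json.loads(v)) for k, v in zip(self._keys[lo:hi], self._values[lo:hi])]


index = SortedIndex()
index.put("zambia", {"city": "Lusaka"})
index.put("australia", {"city": "Canberra"})
index.put("sweden", {"city": "Stockholm"})
print(index.get("sweden"))                  # {'city': 'Stockholm'}
print(index.range("australia", "sweden"))
# [('australia', {'city': 'Canberra'}), ('sweden', {'city': 'Stockholm'})]
```

A real B-Tree gives the same ordered lookups and range scans while keeping inserts cheap and data on disk, which a flat Python list does not.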
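As an illustration of 'any language that speaks HTTP', this Python sketch uses only the standard library (urllib) to build requests against CouchDB's document API, `PUT /{db}/{docid}` with a JSON body and `GET /{db}/{docid}`. The server address assumes a local CouchDB on its default port 5984; the database name and document are hypothetical. The requests are built but not sent, because `send()` needs a running server.

```python
import json
import urllib.request
from urllib.parse import quote

COUCH = "http://localhost:5984"   # assumed local CouchDB on its default port


def doc_url(db, doc_id, base=COUCH):
    return f"{base}/{quote(db, safe='')}/{quote(doc_id, safe='')}"


def put_document_request(db, doc_id, doc, base=COUCH):
    """Build (but do not send) a PUT /{db}/{docid} request with a JSON body."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(doc_url(db, doc_id, base), data=body, method="PUT",
                                  headers={"Content-Type": "application/json"})


def get_document_request(db, doc_id, base=COUCH):
    """Build (but do not send) a GET /{db}/{docid} request."""
    return urllib.request.Request(doc_url(db, doc_id, base), method="GET",
                                  headers={"Accept": "application/json"})


def send(request):
    """Send the request and decode CouchDB's JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


req = put_document_request("addressbook", "alice", {"name": "Alice"})
print(req.get_method(), req.full_url)   # PUT http://localhost:5984/addressbook/alice
```

No driver, connection pool or ORM is involved: the request is an ordinary HTTP call. On success, CouchDB answers a document PUT with a small JSON object containing `ok`, `id` and `rev` fields.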

"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform- and OS¹-agnostic." [31]

Normally a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an 'indexable, schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available online [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

¹ Operating System.

Consider the use of NULL values in relational databases in a relationaldatabase table we would usually represent addresses in countries whichdonrsquot use postal codes by putting a NULL value in the postalCode fieldIn CouchDB and other similar systems the absence of a postal code wouldsimply be denoted by its absence in the document itself - this is much morein line with what we intuitively expect from real-world experience

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
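The effect of the \n escape can be checked with a short JavaScript sketch (runnable under Node.js): JSON.parse converts each \n inside a string into a real newline character. The JSON text below is assumed to be the contents of the British Council Zambia file; String.raw keeps the backslash sequences literal, exactly as they would appear on disk.

```javascript
// The JSON text as it would appear in the file; String.raw keeps "\n" as two characters.
const jsonText = String.raw`{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}`;

const doc = JSON.parse(jsonText);

// After parsing, each \n escape is a real newline, so the address splits into three lines.
const lines = doc.address.streetAddress.split("\n");
lines.forEach((line) => console.log(line));
// Heroes Place
// Cairo Road
// PO Box 34571
```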

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available here [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


curl -H "Content-Type: application/json" \
     -X POST http://username:password@mick.couchone.com/addressbook \
     -d @john-smith.json

cURL syntax is explained here [44]. Briefly:

• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name

All data in CouchDB is stored in JSON format, together with a unique key (_id) and a revision identifier (_rev), so in CouchDB the resulting document would have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
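The _rev value has a structure worth knowing about: the number before the hyphen counts the revisions of the document (it goes up by one with every change, including a deletion), and the remainder is a hash generated by CouchDB. The JavaScript helper below, parseRevision, is a hypothetical name used only to illustrate this; CouchDB does not provide it.

```javascript
// Split a CouchDB revision identifier such as "1-91bce055..." into its two parts.
// Hypothetical helper, for illustration only.
function parseRevision(rev) {
  const dash = rev.indexOf("-");
  if (dash < 1) throw new Error(`Not a revision identifier: ${rev}`);
  return {
    generation: Number(rev.slice(0, dash)), // how many revisions the document has had
    hash: rev.slice(dash + 1),              // server-generated part
  };
}

const rev = parseRevision("1-91bce055fc8db86480400321079f0834");
console.log(rev.generation); // 1
console.log(rev.hash);       // 91bce055fc8db86480400321079f0834
```

The same helper applied to a later value such as 3-e6fbb83f5aa5c2e2111b5b14559115af reports generation 3, which is a quick way to see how many times a document has been changed.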

This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com


Figure 4.1: JSONLint
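As an alternative to pasting into JSONLint, the compact output can be re-indented locally. This JavaScript sketch (Node.js assumed) uses only the standard JSON object; the input string is a shortened copy of the John Smith document.

```javascript
// Compact JSON, as CouchDB returns it (shortened for the example).
const compact = '{"_id":"760cd53c55a93497067f90d6242fc25e","firstName":"John","age":25}';

// Parsing and re-serialising with an indent of 2 spaces pretty-prints it.
const pretty = JSON.stringify(JSON.parse(compact), null, 2);
console.log(pretty);
// {
//   "_id": "760cd53c55a93497067f90d6242fc25e",
//   "firstName": "John",
//   "age": 25
// }
```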

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

curl -H "Content-Type: application/json" \
     -X PUT \
     http://username:password@mick.couchone.com/addressbook/british-council-zambia \
     -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location: http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon


4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.
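The revision check can be modelled in a few lines of JavaScript (Node.js assumed). This is a toy, in-memory model written only to show why a stale revision is rejected; it is not how CouchDB is implemented, the class name RevisionedStore is hypothetical, and the hash part of each revision is a placeholder. The error object mirrors the shape of the body CouchDB returns with its HTTP 409 response.

```javascript
// Toy model of optimistic concurrency control with revision identifiers.
class RevisionedStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  put(id, body, rev) {
    const current = this.docs.get(id);
    const expected = current ? current.rev : undefined; // new documents need no rev
    if (rev !== expected) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? Number(current.rev.split("-")[0]) + 1 : 1;
    const newRev = `${generation}-placeholderhash`; // a real server computes a hash here
    this.docs.set(id, { rev: newRev, body });
    return { ok: true, id, rev: newRev };
  }
}

const store = new RevisionedStore();
const first = store.put("john-smith", { age: 25 });             // create
const second = store.put("john-smith", { age: 26 }, first.rev); // update with the current rev
const stale = store.put("john-smith", { age: 27 }, first.rev);  // reuses an old rev

console.log(second.rev);                // 2-placeholderhash
console.log(stale.error, stale.reason); // conflict Document update conflict.
```

The third call fails because another update has already moved the document on to revision 2, which is exactly the situation the revision requirement is designed to catch.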

As we are updating a document at a known URL (given by its _id), we use the PUT verb instead of POST.

The following command will update John Smith’s record with the contents of the file john-smith-v2.json:

curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
     -d @john-smith-v2.json

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/.³

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
     "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
     --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.


The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
     -X POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"http://username:password@mick.couchone.com/addressbook",
          "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:

curl -H "Content-Type: application/json" ^
     -X POST http://127.0.0.1:5984/_replicate ^
     -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
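Hand-quoting the replication body is error-prone, as the two shell variants show. A sketch in JavaScript (Node.js assumed) can let JSON.stringify produce the body and then derive the argument for each shell; the variable names are illustrative, and the Windows form simply applies the double-quote-and-backslash rule described above.

```javascript
// Build the body of the POST to /_replicate; JSON.stringify does all the quoting.
const body = JSON.stringify({
  source: "http://username:password@mick.couchone.com/addressbook",
  target: "http://127.0.0.1:5984/addressbook",
});

// bash: wrap in single quotes (safe here because the JSON contains no single quotes).
const bashArg = "'" + body + "'";

// Windows: wrap in double quotes and backslash-escape the inner double quotes.
const windowsArg = '"' + body.replace(/"/g, '\\"') + '"';

console.log(bashArg);
console.log(windowsArg);
```

Generating the argument this way keeps the JSON and its shell escaping in step if the source or target URLs change.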

4.9 Querying a CouchDB Database using Map-Reduce

So far, we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.

Data is queried in CouchDB using map-reduce functions, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the ‘map’ stage ranges over the document collection and emits zero or more key/value pairs for each document it encounters; the ‘reduce’ stage is used when an aggregate calculation over those intermediate values is required.

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.

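The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function, emitting each constituency’s Conservative vote as the key and the constituency name (its _id) as the value.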

The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to SELECT SUM(con) in SQL; emitting a meaningful key and grouping by it would correspond to a GROUP BY).

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]


{"rows":[{"key":null,"value":8782198}]}
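To see the two stages at work outside CouchDB, the following JavaScript sketch (Node.js assumed) runs the two ‘con’ views over a handful of documents. runView is a hypothetical stand-in for CouchDB’s view server; emit is passed to the map function as an argument instead of being a global, and sum imitates the helper CouchDB provides. Unlike CouchDB, the sketch does not sort rows by key. Only Aberavon’s figure is real; the other two constituencies are made up.

```javascript
// Hypothetical, minimal stand-in for CouchDB's view server.
function runView(docs, mapFn, reduceFn) {
  const rows = [];
  for (const doc of docs) {
    // Map stage: the map function may call emit() zero or more times per document.
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  if (!reduceFn) return rows;
  // Reduce stage (no grouping): fold every emitted value into a single row.
  const keys = rows.map((row) => [row.key, row.id]);
  const values = rows.map((row) => row.value);
  return [{ key: null, value: reduceFn(keys, values) }];
}

const sum = (values) => values.reduce((total, v) => total + v, 0);

// The two views of the 'con' design document, with emit passed in explicitly.
const conservativeVotes = (doc, emit) => emit(doc.con, doc._id);
const conservativeTotalMap = (doc, emit) => emit(null, doc.con);
const conservativeTotalReduce = (keys, values) => sum(values);

const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example North", con: 1500 },  // made-up figure
  { _id: "Example South", con: 21000 }, // made-up figure
];

console.log(runView(docs, conservativeVotes).length); // 3 (one row per constituency)
console.log(runView(docs, conservativeTotalMap, conservativeTotalReduce)[0].value); // 25562
```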

It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
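The range request can be sketched in JavaScript (Node.js assumed) to show what it does: the rows of a view are ordered by key, so a startkey/endkey pair selects one contiguous slice, with both ends inclusive by default. rangeQuery is a hypothetical helper; CouchDB itself seeks to the start key in its B-tree index rather than filtering every row, and the filter is used here only for clarity. Apart from Aberavon, the rows are made up.

```javascript
// Sketch of a startkey/endkey request against the 'Conservative Votes' view.
function rangeQuery(rows, startkey, endkey) {
  const sorted = [...rows].sort((a, b) => a.key - b.key); // numeric keys only
  return sorted.filter((row) => row.key >= startkey && row.key <= endkey);
}

// Rows as emit(doc.con, doc._id) would produce them.
const rows = [
  { key: 3062, value: "Aberavon" },
  { key: 1500, value: "Example North" },
  { key: 2000, value: "Example East" },
  { key: 999, value: "Example West" },
];

console.log(rangeQuery(rows, 1000, 2000).map((row) => row.value).join(", "));
// Example North, Example East
```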

This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, the index is updated incrementally: when documents are added or changed, CouchDB processes only those documents (the next time the view is queried) rather than rebuilding the whole index.

As a result, the entire system is optimised for fast indexed retrieval: this makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time, since every read is an indexed lookup.

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘design documents’.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)

#!/bin/bash
# Upload every JSON file in a folder to CouchDB, one document per file.

# host=http://127.0.0.1:5984/   # local instance
host=http://username:password@mick.couchone.com/
database=universities
folder=universities
fileextension=.json

# create the database
curl -X PUT "$host$database"

for filepath in "$folder"/*"$fileextension"
do
  echo "$filepath"
  # get the file name from the file path, then strip the extension
  filename=$(basename "$filepath")
  docname=${filename%"$fileextension"}
  echo "$docname"
  # if a file name contains spaces, encode them as %20 so the URL stays valid
  url="$host$database/${docname// /%20}"
  echo "$url"
  # put the document into CouchDB
  curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)

To put a single document to the web, the command used would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
     -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
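An alternative to one PUT per file is CouchDB’s _bulk_docs endpoint, which accepts a single POST whose body has the form {"docs": [...]}. The JavaScript sketch below (Node.js assumed) builds such a body, taking each document’s _id from its file name as the bash script does. The function name bulkDocsBody is hypothetical, the files are supplied in memory rather than read from disk, and the coordinates are made up.

```javascript
// Build one _bulk_docs request body from a list of { name, text } file records.
function bulkDocsBody(files, extension = ".json") {
  const docs = files.map(({ name, text }) => ({
    ...JSON.parse(text),
    // As in the bash script, the id is the file name minus its extension.
    _id: name.endsWith(extension) ? name.slice(0, -extension.length) : name,
  }));
  return JSON.stringify({ docs });
}

const files = [
  { name: "Aston University.json", text: '{"latitude": 52.49, "longitude": -1.89}' },
  { name: "Aberystwyth University.json", text: '{"latitude": 52.41, "longitude": -4.08}' },
];

const payload = bulkDocsBody(files);
console.log(payload);
```

Saved as bulk.json, the payload could then be uploaded in one request: curl -H "Content-Type: application/json" -X POST http://username:password@mick.couchone.com/universities/_bulk_docs -d @bulk.json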

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an _id that begins with _design/ [12].

For the first design document, I have chosen an _id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved field name in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
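Because a show function is ordinary JavaScript, it can be tried out before it is uploaded. The sketch below (Node.js assumed) runs s1 against hypothetical stand-ins for the doc and req objects that CouchDB passes in; the coordinates are made up. The escapeHtml helper is not part of the original design document: it is added here so that characters such as & or < in a document cannot break the page.

```javascript
// The s1 show function, written as a plain function so that it runs outside CouchDB.
function s1(doc, req) {
  // Not in the original design document: escape characters that are special in HTML.
  function escapeHtml(value) {
    return String(value).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
  return "<h1>" + escapeHtml(doc._id) + "</h1>" +
         "<p>latitude: " + escapeHtml(doc.latitude) + "</p>" +
         "<p>longitude: " + escapeHtml(doc.longitude) + "</p>";
}

// Hypothetical inputs: a university document and a request with no query string.
const doc = { _id: "Aston University", latitude: 52.49, longitude: -1.89 };
const req = { query: {} }; // ?zoom=5 would appear here as { zoom: "5" }

console.log(s1(doc, req));
// <h1>Aston University</h1><p>latitude: 52.49</p><p>longitude: -1.89</p>
```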

Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case, the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML pages.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered (%20 encodes the space)
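The same anatomy can be expressed as a small JavaScript function (Node.js assumed). showUrl is a hypothetical helper, not part of CouchDB; it joins the parts listed above and uses the standard encodeURIComponent function to turn the space in a document id into %20.

```javascript
// Assemble a show-function URL from its parts (hypothetical helper).
function showUrl(host, db, ddoc, showName, docId) {
  return [
    host,
    encodeURIComponent(db),
    "_design", encodeURIComponent(ddoc),
    "_show", encodeURIComponent(showName),
    encodeURIComponent(docId), // "Aston University" -> "Aston%20University"
  ].join("/");
}

console.log(showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University"));
// http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```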


5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the design document remains valid JSON.
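One way to avoid escaping by hand is to write the function as normal JavaScript and let JSON.stringify produce the design document. The sketch below (Node.js assumed) does this with Function.prototype.toString, which returns a function’s source text; CouchDB stores functions as strings, so that text is what goes into the shows section. The body of map1 here is a hypothetical fragment, not the Appendix A listing.

```javascript
// Write the show function as ordinary JavaScript (hypothetical fragment).
const map1 = function (doc, req) {
  return '<div id="map_canvas" style="width:100%; height:400px"></div>' +
         "<p>" + doc._id + "</p>";
};

const designDoc = {
  _id: "_design/d1",
  // toString() yields the function's source text; CouchDB expects a string here.
  shows: { map1: map1.toString() },
};

// JSON.stringify escapes every double quote and newline in the function source,
// so the result is valid JSON that could be saved as d1.json and uploaded with cURL.
const d1json = JSON.stringify(designDoc, null, 2);
console.log(d1json);
```

This is, in essence, the bookkeeping that CouchApp automates.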

While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.


Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.

Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document id values

A ‘view’ is, in effect, a query function which returns a set of results. In order to render the results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple view, simply emitting the document ids
• l1 is a very simple list that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


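Note the built-in functions start, getRow and send:

• start is called once, at the beginning of the function, to set the response headers
• getRow returns the next row of the view, or a false value when there are no more rows
• send is called once for each row in the dataset, appending HTML to the response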
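The interplay of these three functions can be tried outside CouchDB with a small harness. In the JavaScript sketch below (Node.js assumed), runList is a hypothetical stand-in that imitates start, getRow and send; in CouchDB they are globals, whereas here they are passed to l1 as a third argument. The call to encodeURIComponent is an addition to the original l1, so that ids containing spaces produce well-formed links.

```javascript
// Hypothetical harness imitating the helpers CouchDB gives a list function.
function runList(listFn, rows, req = {}) {
  let next = 0;
  const response = { headers: null, body: "" };
  const helpers = {
    start: (resp) => { response.headers = resp.headers; },
    getRow: () => (next < rows.length ? rows[next++] : null), // null ends the loop
    send: (chunk) => { response.body += chunk; },
  };
  const head = { total_rows: rows.length, offset: 0 };
  listFn(head, req, helpers);
  return response;
}

// The l1 list function, with CouchDB's globals received as a third argument.
function l1(head, req, { start, getRow, send }) {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
         encodeURIComponent(row.value) + '">' + row.value + "</a></p>");
  }
}

// One row shaped like the output of view v1 (key and value are both the _id).
const rows = [{ id: "Aston University", key: "Aston University", value: "Aston University" }];
const result = runList(l1, rows);
console.log(result.headers["Content-Type"]); // text/html
console.log(result.body);
```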

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
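For orientation, here is a JavaScript sketch (Node.js assumed) of what a GeoRSS-Simple location looks like inside an Atom entry: a georss:point element holding the latitude and longitude separated by a space, with the georss prefix bound to http://www.georss.org/georss on the enclosing feed. The function names, the record, its coordinates and the choice of id URL are illustrative assumptions.

```javascript
const GEORSS_NS = "http://www.georss.org/georss";

// Escape the characters that are special in XML text.
function escapeXml(value) {
  return String(value).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build one Atom <entry> carrying a GeoRSS-Simple point (hypothetical helper).
function georssEntry(doc, baseUrl) {
  const id = baseUrl + encodeURIComponent(doc._id);
  return [
    "<entry>",
    `  <id>${escapeXml(id)}</id>`,
    `  <title>${escapeXml(doc._id)}</title>`,
    `  <georss:point>${doc.latitude} ${doc.longitude}</georss:point>`,
    "</entry>",
  ].join("\n");
}

const entry = georssEntry(
  { _id: "Aston University", latitude: 52.49, longitude: -1.89 }, // made-up coordinates
  "http://mick.couchone.com/universities/"
);

// The georss prefix must be declared on the enclosing feed element.
console.log(`<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="${GEORSS_NS}">`);
console.log(entry);
console.log("</feed>");
```

A complete Atom feed also needs feed-level title, id and updated elements, and an updated element in each entry; they are omitted here to keep the focus on the GeoRSS part.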

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer’s perspective is that it creates a separate file for each element of the design document. In effect, a CouchDB design document is a web application project; CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa: a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files in my copy of it in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.


Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it into HTML.

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function, l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js. We recall that in CouchDB a View is, in effect, a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This demonstrates how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.
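CouchApp's merging is done by its own Python scripts. The JavaScript sketch below only illustrates the idea, under a simplified, hypothetical mapping. A file path such as views/all/map.js becomes the nested property views.all.map, and the file's text becomes a string value. The real CouchApp handles many more cases, such as attachments and metadata, and buildDesignDoc is not part of it.

```javascript
// Hypothetical, simplified model of what CouchApp does when it pushes an app:
// every file under the app folder becomes a (nested) string property of
// a single design document.
function buildDesignDoc(id, files) {
  const ddoc = { _id: id };
  for (const [filePath, source] of Object.entries(files)) {
    // Drop the extension and split the path into nested keys,
    // e.g. "views/all/map.js" -> ["views", "all", "map"].
    const keys = filePath.replace(/\.[^\/.]+$/, "").split("/");
    let node = ddoc;
    for (const key of keys.slice(0, -1)) {
      node[key] = node[key] || {};
      node = node[key];
    }
    node[keys[keys.length - 1]] = source;
  }
  return ddoc;
}

const ddoc = buildDesignDoc("_design/bc", {
  "views/all/map.js": "function(doc) { emit(doc._id, doc); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
});

// JSON.stringify performs all the escaping needed to upload the document.
console.log(JSON.stringify(ddoc, null, 2));
```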

6.5 Using templates

In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.
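To show the idea behind Mustache sections, here is a deliberately tiny renderer. It is not mustache.js, and renderMini is a hypothetical name. It supports only two constructs:

• {{name}} variables, which are HTML-escaped;

• {{#name}}...{{/name}} sections, where an array repeats the section once per item, true renders it once, and a false value or an empty array skips it.

```javascript
// A tiny, hypothetical subset of Mustache, for illustration only.
function escapeHtml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Look a name up in the innermost context first, then in outer ones.
function lookup(stack, name) {
  for (let i = stack.length - 1; i >= 0; i--) {
    const ctx = stack[i];
    if (ctx !== null && typeof ctx === "object" && name in ctx) return ctx[name];
  }
  return undefined;
}

function renderMini(template, stack) {
  // Sections first: {{#name}}...{{/name}}, paired by the \1 backreference.
  const withSections = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    (match, name, inner) => {
      const value = lookup(stack, name);
      if (Array.isArray(value)) {
        // Repeat the section once per item, with the item as innermost context.
        return value.map((item) => renderMini(inner, stack.concat([item]))).join("");
      }
      return value ? renderMini(inner, stack) : "";
    });
  // Then plain variables, HTML-escaped.
  return withSections.replace(/\{\{(\w+)\}\}/g, (match, name) => {
    const value = lookup(stack, name);
    return value === undefined ? "" : escapeHtml(value);
  });
}

const template =
  "{{#programmes}}<p>{{name}}{{#countries}} [{{country}}]{{/countries}}</p>{{/programmes}}";
const out = renderMini(template, [{
  programmes: [
    { name: "Chevening Programme", countries: [{ country: "id" }, { country: "mz" }] },
    { name: "CSFP", countries: [] },
  ],
}]);
console.log(out);
```

The nested section over countries runs once per country of each programme. This is the same pattern the GeoRSS template relies on.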

Firstly, let us generate a simple View, all. This consists of a single map.js file (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  // one row per document: key = document ID, value = whole document
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
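Outside CouchDB, the behaviour of this map function can be imitated with a small harness. CouchDB supplies emit, calls the map function once per document, and then sorts the emitted rows by key. The runView function below is a hypothetical stand-in written for illustration. It sorts keys with a plain string comparison, whereas CouchDB uses its own collation rules.

```javascript
// The map function from the 'all' view, unchanged.
const map = function (doc) {
  emit(doc._id, doc);
};

// Hypothetical harness imitating how CouchDB builds a view:
// call map once per document, collect emitted rows, sort them by key.
function runView(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    // Provide emit for the duration of this call.
    globalThis.emit = (key, value) => rows.push({ id: doc._id, key: key, value: value });
    mapFn(doc);
  }
  delete globalThis.emit;
  // Simplification: plain string ordering, not CouchDB's collation.
  rows.sort((a, b) => {
    const ka = String(a.key);
    const kb = String(b.key);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
  return { total_rows: rows.length, offset: 0, rows: rows };
}

const result = runView(map, [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
]);
console.log(result.rows.map((row) => row.key));
```

The returned object mirrors the shape of a CouchDB view response (total_rows, offset, rows), which makes it easier to compare with the JSON served at the URL above.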

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss


This creates a file called georss.js. I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
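The nested loops expressed by the template can also be written out in plain JavaScript, which may make the control flow easier to follow. The sketch below assumes the document shape shown in Appendix D: a programmes array whose entries may carry a countries array of two-letter codes. It builds the escaped HTML that ends up inside the content element. contentFor is a hypothetical name; the project itself generates the feed with Mustache, not with this function.

```javascript
// Hypothetical plain-JavaScript equivalent of the template's nested sections.
const FLAG_BASE = "http://mick.couchone.com/countries/";

function contentFor(institution) {
  let html = "";
  for (const programme of institution.programmes || []) {
    // Inside an Atom <content type="html"> element, markup is written escaped.
    html += "&lt;p&gt;" + programme.name;
    // Emit one flag image per country, if the programme lists any.
    for (const country of programme.countries || []) {
      html += '&lt;img src="' + FLAG_BASE + country + '/flag" alt="' + country +
        '" title="' + country + '"&gt;';
    }
    html += "&lt;/p&gt;";
  }
  return html;
}

const stAndrews = {
  _id: "University of St Andrews",
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", countries: ["bd", "za"] },
  ],
};
console.log(contentFor(stAndrews));
```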

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
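The feed URL follows CouchDB's fixed URL patterns for design-document functions:

• /db/_design/ddoc/_list/list/view for a List applied to a View;

• /db/_design/ddoc/_show/show/docid for a Show function.

The small helpers below, with hypothetical names, make that structure explicit. Document IDs containing spaces, such as University of Aberdeen, must be percent-encoded.

```javascript
// Hypothetical helpers that assemble CouchDB design-document URLs.
function listUrl(host, db, ddoc, list, view) {
  return host + "/" + db + "/_design/" + ddoc + "/_list/" + list + "/" + view;
}

function showUrl(host, db, ddoc, show, docId) {
  // encodeURIComponent turns spaces into %20 (and would escape "/" as %2F).
  return host + "/" + db + "/_design/" + ddoc + "/_show/" + show + "/" +
    encodeURIComponent(docId);
}

const host = "http://mick.couchone.com";
console.log(listUrl(host, "universities", "bc", "georss", "all"));
console.log(showUrl(host, "universities", "d1", "s1", "University of Aberdeen"));
```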


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?

The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this trend is likely to continue: so-called 'NoSql' databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
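Replication itself is triggered by an ordinary HTTP request: a POST to the server's /_replicate endpoint, with a JSON body naming a source and a target database. The sketch below only builds that request as a plain object; it does not send it. The local URL http://127.0.0.1:5984 is an assumed example (5984 is CouchDB's default port), and replicationRequest is a hypothetical name. Design documents travel with the data in the same way, although writing them to the target generally requires admin rights there.

```javascript
// Build (but do not send) a CouchDB replication request.
function replicationRequest(server, source, target, continuous) {
  const body = { source: source, target: target };
  if (continuous) {
    body.continuous = true; // keep replicating as new changes arrive
  }
  return {
    method: "POST",
    url: server + "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Pull the hosted database down to a local CouchDB (hypothetical local setup).
const req = replicationRequest(
  "http://127.0.0.1:5984",
  "http://mick.couchone.com/universities",
  "universities",
  false
);
console.log(req.method, req.url, req.body);
```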

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
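To make the contrast concrete: a relational design would typically keep institutions, programmes and countries in separate tables linked by keys. The application then has to join them back together before rendering. The sketch below assumes hypothetical table and column names, not the British Council schema. It folds such rows into one nested document of the shape used in Appendix D. With CouchDB, this assembly happens once, when the document is stored, rather than on every page request.

```javascript
// Hypothetical normalised rows, as they might come back from SQL queries.
const institutions = [{ institution_id: 1, name: "University of St Andrews", region: "Scotland" }];
const programmes = [
  { programme_id: 10, name: "Chevening Programme" },
  { programme_id: 11, name: "Commonwealth Scholarship and Fellowship Plan (CSFP)" },
];
// Link table: which institution takes part in which programme, with which country.
const participation = [
  { institution_id: 1, programme_id: 10, country: "id" },
  { institution_id: 1, programme_id: 10, country: "mz" },
  { institution_id: 1, programme_id: 11, country: "bd" },
];

// Fold the rows into one denormalised document per institution.
function toDocuments(institutions, programmes, participation) {
  return institutions.map((inst) => {
    const byProgramme = new Map(); // keeps programmes in first-seen order
    for (const link of participation) {
      if (link.institution_id !== inst.institution_id) continue;
      if (!byProgramme.has(link.programme_id)) {
        const prog = programmes.find((p) => p.programme_id === link.programme_id);
        byProgramme.set(link.programme_id, { name: prog.name, countries: [] });
      }
      byProgramme.get(link.programme_id).countries.push(link.country);
    }
    return { _id: inst.name, region: inst.region, programmes: [...byProgramme.values()] };
  });
}

const docs = toDocuments(institutions, programmes, participation);
console.log(JSON.stringify(docs, null, 2));
```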

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, CouchDB uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
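The mapping between document operations and HTTP is direct. The sketch below lists the requests for the basic operations on a document, following the CouchDB HTTP document API. The database name and document ID are the examples used in this report, the revision string is taken from Appendix D for illustration, and documentRequests is a hypothetical name. Each entry could equally be issued with cURL or any other HTTP client.

```javascript
// Basic CouchDB document operations expressed as HTTP requests.
// rev is the document's current _rev, which CouchDB requires for updates
// and deletions (optimistic concurrency control).
function documentRequests(server, db, docId, rev) {
  const docUrl = server + "/" + db + "/" + encodeURIComponent(docId);
  return {
    createDatabase: { method: "PUT", url: server + "/" + db },
    createDocument: { method: "PUT", url: docUrl, body: { region: "Scotland" } },
    readDocument: { method: "GET", url: docUrl },
    updateDocument: { method: "PUT", url: docUrl, body: { _rev: rev, region: "Scotland" } },
    deleteDocument: { method: "DELETE", url: docUrl + "?rev=" + encodeURIComponent(rev) },
  };
}

const ops = documentRequests("http://127.0.0.1:5984", "universities",
  "University of St Andrews", "1-7a29159d52cd71b9f6759ad6d3884945");
for (const [name, op] of Object.entries(ops)) {
  console.log(name.padEnd(15), op.method.padEnd(6), op.url);
}
```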

7.2 CouchDB's challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL;

• understanding the structure of design documents;

• the use of server-side JavaScript functions to transform documents to HTML and XML.

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
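As an example of this shift in thinking, consider a question that SQL would express as SELECT region, COUNT(*) FROM institutions GROUP BY region. In CouchDB it becomes a map function that emits the region as the key, plus a reduce function that counts rows. The map and reduce functions below follow CouchDB's function signatures; CouchDB also offers a built-in _count reduce for exactly this case. The surrounding harness is a hypothetical simplification that approximates a grouped query so the functions can run outside CouchDB. It assumes documents carry a region field, as in Appendix D, and the sample documents are illustrative.

```javascript
// Map: one row per institution, keyed by region.
const map = function (doc) {
  if (doc.region) {
    emit(doc.region, 1);
  }
};

// Reduce in CouchDB's (keys, values, rereduce) form: count rows, and when
// re-reducing, add up the partial counts.
const reduce = function (keys, values, rereduce) {
  if (rereduce) {
    return values.reduce((a, b) => a + b, 0);
  }
  return values.length;
};

// Hypothetical harness approximating a view query with ?group=true.
function queryGrouped(docs) {
  const groups = new Map();
  for (const doc of docs) {
    globalThis.emit = (key, value) => {
      if (!groups.has(key)) groups.set(key, { keys: [], values: [] });
      groups.get(key).keys.push([key, doc._id]);
      groups.get(key).values.push(value);
    };
    map(doc);
  }
  delete globalThis.emit;
  return [...groups.keys()].sort().map((key) => ({
    key: key,
    value: reduce(groups.get(key).keys, groups.get(key).values, false),
  }));
}

const rows = queryGrouped([
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Cardiff University", region: "Wales" },
]);
console.log(rows);
```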

7.3 Conclusion

The 'NoSql' movement in database technology has come about largely as a result of advances pioneered by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project's focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45]: in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1

It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\">html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }</style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\">function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}
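The escaping problem noted above disappears if the design document is generated by a program instead of being typed by hand. In the hypothetical sketch below, a show function is written as ordinary JavaScript and converted to source text with Function.prototype.toString. It is then serialised with JSON.stringify, which adds the backslash escapes automatically. This only illustrates the principle; it is not how CouchApp is implemented.

```javascript
// Written as normal code: the double quotes need no manual escaping here.
const s1 = function (doc, req) {
  return '<p class="coords">' + doc.latitude + ", " + doc.longitude + "</p>";
};

const designDoc = {
  _id: "_design/d1",
  shows: {
    // CouchDB stores functions as source-code strings.
    s1: s1.toString(),
  },
};

// JSON.stringify escapes every double quote inside the function source.
const json = JSON.stringify(designDoc);
console.log(json);
```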

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code (http://github.com/jchris/sofa). I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

// Note: the <feed/> and <entry/> literals are E4X (ECMAScript for XML),
// supported by the SpiderMonkey engine that CouchDB used at the time.
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>; // GeoRSS namespace added
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  // strip the closing tag so that entries can be streamed after the header
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point; // added for GeoRSS
  return entry;
};
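E4X, the XML literal syntax used in atom.js, was specific to Mozilla's SpiderMonkey engine and is not available in current JavaScript engines. As a hedged alternative, the sketch below builds a comparable Atom entry as an escaped string. Unlike the code above, it writes the point in the georss namespace, as GeoRSS Simple expects, assuming the feed header declares xmlns:georss. The function names and the example.org link are hypothetical.

```javascript
// Escape the five XML special characters.
function xmlEscape(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build an Atom <entry> with a GeoRSS Simple point ("latitude longitude").
function geoEntry(data) {
  return "<entry>" +
    "<id>" + xmlEscape(data.entry_id) + "</id>" +
    "<title>" + xmlEscape(data.title) + "</title>" +
    '<content type="html">' + xmlEscape(data.content) + "</content>" +
    "<updated>" + data.updated.toISOString() + "</updated>" +
    '<link rel="alternate" href="' + xmlEscape(data.alternate) + '"/>' +
    "<georss:point>" + data.latitude + " " + data.longitude + "</georss:point>" +
    "</entry>";
}

const entryXml = geoEntry({
  entry_id: "University of St Andrews",
  title: "University of St Andrews",
  content: "<p>Chevening Programme</p>",
  updated: new Date(Date.UTC(2010, 8, 12)), // months are zero-based: 8 = September
  alternate: "http://example.org/post/st-andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
});
console.log(entryXml);
```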

At the end of lists/index.js:

        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        // added for GeoRSS: "latitude longitude"
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:

  // apply docForm at login
  $("#account").evently({
    loggedIn : function(e, r) {
      var userCtx = r.userCtx;
      postForm = app.docForm("form#new-post", {
        id : docid,
        // fields : ["title", "body", "tags"],
        fields : ["title", "latitude", "longitude", "body", "tags"],
        template : {
          type : "post",
          format : "markdown",
          author : userCtx.name

Appendix C

georssjs

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, this is the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    // Build the model ('stash') that the Mustache template will render.
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    // Render templates/georss.html with the stash.
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
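Outside CouchDB, the basic mechanics of a List function can be imitated by supplying stand-ins for getRow and send, which CouchDB normally provides to list functions. The sketch below is hypothetical. It drives a simplified list function over two sample rows and collects what the function sends. It does not use the List.withRows and Mustache helpers that georss.js relies on.

```javascript
// Hypothetical stand-ins for the getRow()/send() functions CouchDB gives a list function.
function runList(listFn, viewRows) {
  const queue = viewRows.slice();
  const chunks = [];
  globalThis.getRow = () => (queue.length ? queue.shift() : null);
  globalThis.send = (chunk) => { chunks.push(chunk); };
  const last = listFn({ total_rows: viewRows.length, offset: 0 }, {});
  if (typeof last === "string") chunks.push(last); // a returned string is the final chunk
  delete globalThis.getRow;
  delete globalThis.send;
  return chunks.join("");
}

// A simplified list function: one <entry> per row, then the closing tag.
const georssList = function (head, req) {
  send("<feed>");
  let row;
  while ((row = getRow())) {
    send("<entry><id>" + row.id + "</id><point>" +
      row.value.latitude + " " + row.value.longitude + "</point></entry>");
  }
  return "</feed>";
};

const output = runList(georssList, [
  { id: "A", key: "A", value: { latitude: 1, longitude: 2 } },
  { id: "B", key: "B", value: { latitude: 3, longitude: 4 } },
]);
console.log(output);
```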

Appendix D

georsshtml

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload PNG files from a folder, so that the image files become attachments to a CouchDB document.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# PNG country flag files from http://www.famfamfam.com/lab/icons/flags
# PNG country flag files are copied to the countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
done
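For readers more comfortable with JavaScript than with sed, the file-name handling in the loop can be expressed with Node's path module. This is an illustrative translation, not part of the project, and attachmentTarget is a hypothetical name. It only computes the document name and attachment URL for each flag file, keeping the username:password placeholder from the script. It does not perform the upload.

```javascript
const path = require("path");

// For a file such as countries/png/de.png, the document is "de" and the
// PNG is stored as its attachment named "flag".
function attachmentTarget(filepath, baseUrl) {
  const docname = path.basename(filepath, ".png");
  return {
    docname: docname,
    url: baseUrl + "/countries/" + docname + "/flag",
    contentType: "image/png",
  };
}

const base = "http://username:password@mick.couchone.com";
for (const file of ["countries/png/de.png", "countries/png/ie.png"]) {
  const target = attachmentTarget(file, base);
  console.log(target.docname, target.url);
}
```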

Bibliography

[1] A NoSql Summer. A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django. The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.

[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang: An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.

[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file

[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars

[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel

[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz

Page 2: Project Report for my MSc Computer Science project at Birkbeck

Abstract

This project investigates an innovative database technology CouchDB byway of assessing its use in applications derived from the British CouncilActivity Mapping project The projectrsquos aim is to examine the advantagesand disadvantages of CouchDB from the perspective of a web applicationsdeveloper

CouchDB combines a web server with a data storage mechanism Datais stored in the form of denormalised documents and queried through map-reduce functions which result in the creation of indexed views

The project considers the suitability of CouchDB as a data store andweb development platform in support of an existing relational database ap-plication with an assessment of the strengths of both approaches

Alles im richtigen Maszlig

Acknowledgments

With thanks to my supervisor Nigel Martin for timely patient and good-humoured advice - you really helped me to stay on track

Thanks also to my colleagues and ex-colleagues at the British Councilin particular Terry Pyle Kshipra Singhvi Phil Street Spero BlassoplesMichael Sadler Roger Moran and Masoud Ahanchian

Thank you Su Peneycad for your encouragement and wonderful skill atspotting typos

I had the great fortune to be doing this project during the summer of2010 when the (first) lsquoNoSql Summerrsquo was in full swing Thanks to theorganisers of the London meetups Neil Robbins and Makoto Inoue forproviding the framework for such interesting discussions

Finally thank you to the active and friendly CouchDB community - JChris Anderson Volker Mische Jason Smith and many others for your helpon the couchdb-users mailing list and to Damien Katz for starting the wholething

Contents

1 Introduction 311 Motivation 312 Does lsquoone sizersquo still lsquofit allrsquo 313 Organisation 4

2 Background 521 The Relational Model 522 Relational Database Management Systems 623 Normalisation 624 SQL 725 Transactional Guarantees 726 The lsquoNoSqlrsquo Movement 827 lsquoNoSqlrsquo databases at large scale 928 Brewerrsquos lsquoCAPrsquo Theorem 1029 Reducing the Impedance Mismatch 11210 Benefits of lsquoNoSqlrsquo 11211 Ad-hoc Querying - the lsquoAchilles Heelrsquo of lsquoNoSqlrsquo 12212 The Activity Mapping Project 12213 Raising Awareness of British Council Impact in the UK 12214 Using CouchDB to Serve Mapping Data 14

3 Introduction to CouchDB 1631 lsquoOf the Webrsquo 1632 Some History 1733 Document-Oriented 1734 Erlang 1835 How Ubuntu uses CouchDB 1936 Desktop Couch Python and Quickly 19

4 Using CouchDB 2141 Motivation 2142 Hosted CouchDB Service Providers 2143 Installing CouchDB 22

1

CONTENTS 2

44 Inserting a Document into a CouchDB Database 2245 Deleting a Document 2646 Updating a Document 2647 Adding Attachments 2648 Replication 2749 Querying a CouchDB Database using Map-Reduce 27

5 Serving HTML from CouchDB 3151 Bulk upload of JSON documents 3152 A CouchDB Design Document 3353 A more complex lsquoshowrsquo function 3554 Views and Lists 3655 Lessons Learned 38

6 Serving GeoRSS using CouchApp 3961 Introduction to CouchApp 3962 Sofa - a Blogging Application 3963 Developing with CouchApp 4164 Deployment 4465 Using templates 44

7 Critical Assessment and Conclusion 4771 Why might a developer choose CouchDB 4772 CouchDBrsquos challenges 4873 Conclusion 49

A Design Document d1 50

B GeoRSS on Sofa 52

C georssjs 55

D georsshtml 57

E Bash Script to Upload Country Flag Files 59

Section 1

Introduction

The MSc Computer Science project described in this report aims to in-vestigate a relatively new database product called CouchDB from a webapplication development perspective [10] [13] I have built a simple businessapplication and use this as the test case from which to evaluate CouchDBrsquossuitability as a web development platform

11 Motivation

The motivation for the project is to understand what can be gained bymoving away from traditional relational database systems towards a rangeof lsquokey-value storersquo products which have emerged recently to deal with thechallenges of very large data volumes in online environments

The background to the project is one of rapid change inspired by theemergence of a set of innovative open-source non-relational (also known aslsquoNoSqlrsquo1) database products

These lsquoNoSqlrsquo database products are challenging the accepted standarduse of relational database management systems as the only storage mecha-nism used by developers when storing data Relational databases have beenvery successful in the past and are likely to remain so but in a certainsub-set of usage scenarios non-relational systems are starting to take theirplace

12 Does lsquoone sizersquo still lsquofit allrsquo

One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:

¹ Pronounced 'No-See-Quel'.


"The last 25 years of commercial DBMS development can be summed up in a single phrase: 'One size fits all'. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.

[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser."

Why does this matter? Does 'one size fit all'? Until recent years the answer has always been 'yes': practically every data-centric IT project over the past 25 years has used a relational database to persist data, and systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.

1.3 Organisation

The sections are organised in the following way:

• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project.

• in Section 3, I introduce the open-source document-oriented database CouchDB.

• in Section 4, I provide an overview of CouchDB's application programming interface.

• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database - that is, using CouchDB to store both data and application code.

• in Section 6, I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps.

• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.

Section 2

Background

In recent years, a new generation of data stores has emerged which approaches data persistence in a different way to the established relational model.

In this section, I briefly discuss this relational model and describe some of the recent challenges which have led to the development of the newer alternative database systems.

2.1 The Relational Model

The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].

One description [54] of Codd's relational model reads as follows:

"The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains."

A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to represent relations as tables, tuples as rows and attributes as columns.
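To make the definition concrete, here is a minimal Python sketch that treats a relation as a set of tuples and checks that every tuple is drawn from the Cartesian product of its domains. The domain values and the `is_relation` helper are illustrative assumptions, not part of the report.

```python
from itertools import product


def is_relation(tuples: set, *domains: set) -> bool:
    """Return True if every tuple lies in the Cartesian product of the domains."""
    return tuples <= set(product(*domains))


# Hypothetical domains for a 2-ary relation.
names = {"Bob", "Jane", "Alice"}
countries = {"Australia", "Sweden", "United Kingdom", "South Africa"}

# The relation itself: a set of (name, country) tuples.
staff_nationality = {
    ("Bob", "Australia"),
    ("Jane", "Sweden"),
    ("Alice", "United Kingdom"),
    ("Alice", "South Africa"),
}

# Shown as a table: one row per tuple, one column per attribute.
for name, country in sorted(staff_nationality):
    print(f"{name:<6}{country}")
```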

The main advantage of the relational database is its precise, mathematically well-founded organisation of data. In this context, it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or 'NoSql' databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, the question of whether a database is 'relational' or not is more a matter of how applications use the database: a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].


2.2 Relational Database Management Systems

The term 'Relational Database Management System' refers to the class of databases which follow Codd's relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open-source (MySQL, PostgreSQL) databases in use over the past three decades have been based on this model.

These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term 'database' has become associated with relational databases to such an extent that non-relational databases are often referred to as 'non-relational data stores' for clarity. (Formally, and in its original meaning, the term 'database' simply means a collection of data.)

2.3 Normalisation

A key factor in relational data storage is data normalisation. Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by means of removing duplication [48].

The first level of normalisation, 'first normal form', requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.

Consider, for example, a table (let's call it 'Staff') containing staff records:

Name    Nationality
Bob     Australia
Jane    Sweden

All is well until the system encounters a staff member with multiple nationalities:

Name    Nationality
Bob     Australia
Jane    Sweden
Alice   United Kingdom, South Africa

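This second table breaks the rules of normalisation: the value 'United Kingdom, South Africa' is not atomic, so the table is not in first normal form. (Any of the relational database management systems listed above would accept such a string in a text column; it is the design, not the software, that violates the rule.) The standard way to solve this problem would be to create two more tables, a 'Nationality' table and a 'StaffNationality' table, linked with primary and foreign keys.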
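The normalised design can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions: SQLite stands in for the relational products named earlier, and the surrogate-key columns `StaffId` and `NationalityId` are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (StaffId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Nationality (NationalityId INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
-- Link table: one row per (staff member, nationality) pair.
CREATE TABLE StaffNationality (
    StaffId INTEGER NOT NULL REFERENCES Staff(StaffId),
    NationalityId INTEGER NOT NULL REFERENCES Nationality(NationalityId),
    PRIMARY KEY (StaffId, NationalityId)
);
INSERT INTO Staff VALUES (1, 'Bob'), (2, 'Jane'), (3, 'Alice');
INSERT INTO Nationality VALUES
    (1, 'Australia'), (2, 'Sweden'), (3, 'United Kingdom'), (4, 'South Africa');
-- Alice (StaffId 3) simply gets two link rows; no table needs a new column.
INSERT INTO StaffNationality VALUES (1, 1), (2, 2), (3, 3), (3, 4);
""")

# Reassemble Alice's nationalities with joins across the three tables.
alice_nationalities = conn.execute("""
    SELECT n.Name
    FROM Staff s
    JOIN StaffNationality sn ON sn.StaffId = s.StaffId
    JOIN Nationality n ON n.NationalityId = sn.NationalityId
    WHERE s.Name = 'Alice'
    ORDER BY n.Name
""").fetchall()
print(alice_nationalities)
```

The price of the clean design is visible in the query: three tables and two joins to read back one person's nationalities.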

While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata, which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.

In a non-relational database (such as CouchDB), Alice's record could be stored as follows:

{
  "name": "Alice",
  "nationality": [
    "United Kingdom",
    "South Africa"
  ]
}

This record (or 'document', in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other 'NoSql' products are sometimes referred to as 'schema-less' database management systems.

The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
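As a small illustration of storing differently shaped records side by side, the Python sketch below parses two hypothetical JSON documents, one with a single nationality string and one with a list, using the standard `json` module. The `nationalities_of` helper is an assumption: with no schema, it is the application that absorbs the variation.

```python
import json

# Two hypothetical documents with different shapes in the same collection.
raw_documents = [
    '{"name": "Bob", "nationality": "Australia"}',
    '{"name": "Alice", "nationality": ["United Kingdom", "South Africa"]}',
]
documents = [json.loads(text) for text in raw_documents]


def nationalities_of(doc: dict) -> list:
    """Accept either shape and always return a list of nationalities."""
    value = doc.get("nationality", [])
    return value if isinstance(value, list) else [value]


for doc in documents:
    print(doc["name"], nationalities_of(doc))

# Serialising back to JSON keeps Alice's nested list intact.
print(json.dumps(documents[1]))
```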

2.4 SQL

A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].

To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table needs to store atomic values. In our example above, this query would result in 'United Kingdom, South Africa' being considered a distinct nationality.
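The following minimal sketch runs the query above against an in-memory SQLite copy of the un-normalised Staff table (SQLite is an assumption, standing in for any RDBMS), so the combined value can be seen being counted as a group of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The un-normalised Staff table: Alice's two nationalities share one cell.
conn.execute("CREATE TABLE Staff (Name TEXT, Nationality TEXT)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [("Bob", "Australia"), ("Jane", "Sweden"), ("Alice", "United Kingdom, South Africa")],
)

# ORDER BY is added only to make the output order predictable.
counts = conn.execute(
    "SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality ORDER BY Nationality"
).fetchall()
for nationality, n in counts:
    print(f"{nationality}: {n}")
```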

In Section 4, I will discuss map-reduce, which is the method used by CouchDB and some other 'NoSql' systems to query the database.
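As a rough preview, the Python sketch below imitates the map-reduce pattern on documents shaped like Alice's record: a map step emits one (nationality, 1) pair per array entry, and a reduce step sums the values for each key. It is only an analogue under stated assumptions; CouchDB view functions are normally written in JavaScript and run inside the database, and `map_fn`, `reduce_fn` and `run_view` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical documents in the style of Alice's record.
documents = [
    {"name": "Bob", "nationality": ["Australia"]},
    {"name": "Jane", "nationality": ["Sweden"]},
    {"name": "Alice", "nationality": ["United Kingdom", "South Africa"]},
]


def map_fn(doc: dict) -> Iterator[Tuple[str, int]]:
    # Emit one (key, value) pair per nationality listed in the document.
    for nationality in doc.get("nationality", []):
        yield nationality, 1


def reduce_fn(values: List[int]) -> int:
    # Combine all values emitted under the same key.
    return sum(values)


def run_view(docs: List[dict]) -> Dict[str, int]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Results come back in key order, loosely like a sorted view index.
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


print(run_view(documents))
```

Unlike the SQL query on the un-normalised table, the array lets Alice count once towards each of her nationalities without any extra tables.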

2.5 Transactional Guarantees

A further very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].

The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: at any given moment in time there is only 'one version of the truth' for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].

For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at exactly the same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
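The all-or-nothing behaviour of the canonical transfer can be shown with a minimal sketch using Python's `sqlite3` module, whose connection context manager commits on success and rolls back if an exception escapes. The `Account` table, its `CHECK` constraint and the amounts are hypothetical; the point is that a failed debit also undoes the credit that preceded it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Id TEXT PRIMARY KEY, Balance INTEGER NOT NULL CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE Account SET Balance = Balance + ? WHERE Id = ?", (amount, dst))
        conn.execute("UPDATE Account SET Balance = Balance - ? WHERE Id = ?", (amount, src))


transfer(conn, "A", "B", 30)  # succeeds: A = 70, B = 80

try:
    transfer(conn, "A", "B", 500)  # the debit would make A negative, so CHECK fails
except sqlite3.IntegrityError:
    pass  # the credit to B made earlier in this transaction is rolled back too

balances = dict(conn.execute("SELECT Id, Balance FROM Account"))
print(balances)
```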

The paradigm followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.

However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places; for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).

Caching data in many places, so that the data is near to the user when it is needed, is one of the most effective ways to reduce latency. There may no longer be a 'single version of the truth', but for many applications the trade-off is worthwhile.

2.6 The 'NoSql' Movement

'NoSql' refers to a range of non-relational database products which have emerged in recent years. The term 'NoSql' is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.)¹.

'NoSql' is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:

• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.

• 'NoSql' databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of 'a NoSql Summer' [1].

2.7 'NoSql' databases at large scale

Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].

The experience of these large-scale data providers was that in many application domains it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.

To give an example, if Alice in London updates her Facebook status, it doesn't really matter if Bob in Singapore sees a 'stale' version for a few seconds. Of course, in a foreign exchange trading application those few seconds would make a huge difference. 'NoSql' systems are generally intended for the Facebook scenario; foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.

In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but they are guaranteed to converge to a consistent state eventually, once updates stop arriving and the replicas have exchanged their changes.
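A toy simulation of the Alice-and-Bob scenario may help. In this Python sketch, each replica keeps a timestamped value per key and, when synchronised, keeps the newer copy (a simple last-writer-wins rule). The `Replica` class and its methods are hypothetical, and this is not how CouchDB replicates: CouchDB tracks document revisions and records conflicts rather than silently discarding one write.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Replica:
    """Toy replica mapping key -> (timestamp, value); not CouchDB's replication."""

    name: str
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        self.data[key] = (ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other: "Replica") -> None:
        # Last-writer-wins: keep whichever copy carries the newer timestamp.
        for key, entry in other.data.items():
            if key not in self.data or entry[0] > self.data[key][0]:
                self.data[key] = entry


london, singapore = Replica("London"), Replica("Singapore")
london.write("alice/status", "Off to the park", ts=1)
singapore.sync_from(london)
london.write("alice/status", "Back home", ts=2)

print(singapore.read("alice/status"))  # Bob in Singapore still sees the stale status
singapore.sync_from(london)            # after the next sync the replicas agree
print(singapore.read("alice/status"))
```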

Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacentre were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacentres around the world.

In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:

"Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability."

2.8 Brewer's 'CAP' Theorem

The idea of trading off consistency for availability is a key feature of 'NoSql' databases. Reference is often made to Eric Brewer's conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential but strict consistency is not, so a looser model of consistency is considered a price worth paying.

This realisation - that 'eventual' consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.

In addition, these 'NoSql' data stores do not have to be as 'intelligent' as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a 'NoSql' system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, 'NoSql' systems are simply 'dumb' repositories of data. This has meant, of course, that the code-base required for 'NoSql' systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere from web servers to desktops, even hand-held devices and mobile phones.


2.9 Reducing the Impedance Mismatch

The term 'object-relational impedance mismatch' refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged; examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to 'bridge the gap' between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain². Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application's object model.

A key advantage of 'NoSql' systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
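As a minimal sketch of this point, Python dataclasses can stand in for application objects: a nested object graph becomes a single JSON document and is rebuilt without any mapping onto separate tables. The classes and the sample address are hypothetical; a relational design would need separate Staff, Nationality and Address tables, plus the joins to reassemble them.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Address:
    street: str
    city: str


@dataclass
class StaffMember:
    name: str
    nationalities: List[str] = field(default_factory=list)
    addresses: List[Address] = field(default_factory=list)


# Hypothetical sample object.
alice = StaffMember("Alice", ["United Kingdom", "South Africa"],
                    [Address("1 Example Street", "London")])

# The whole object graph becomes one JSON document: no tables, no joins.
document = json.dumps(asdict(alice))
print(document)

# Reading it back only requires rebuilding the nested objects from plain dicts.
data = json.loads(document)
restored = StaffMember(data["name"], data["nationalities"],
                       [Address(**a) for a in data["addresses"]])
print(restored == alice)
```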

2.10 Benefits of 'NoSql'

As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of Moore's Law: disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.

• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It's better to reserve this expense for transactional processing, and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, 'NoSql' databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.


2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'

It doesn't come as much of a surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support, or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.

(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often, a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the 'NoSql' solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
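The export step described here can be sketched as follows: rows from a relational source are denormalised into one JSON document per entity, ready to be loaded into a document store. In this Python sketch, SQLite stands in for the relational system, and the `Institution`/`Project` tables, the sample rows and the `export_documents` helper are all hypothetical; `_id` is the field CouchDB uses for a document's identifier.

```python
import json
import sqlite3

# In-memory SQLite standing in for the relational source (hypothetical schema).
src = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE Institution (Id INTEGER PRIMARY KEY, Name TEXT, City TEXT);
CREATE TABLE Project (Id INTEGER PRIMARY KEY,
                      InstitutionId INTEGER REFERENCES Institution(Id), Title TEXT);
INSERT INTO Institution VALUES (1, 'Example College', 'Leeds');
INSERT INTO Project VALUES (1, 1, 'Science exchange'), (2, 1, 'Arts residency');
""")


def export_documents(conn: sqlite3.Connection) -> list:
    """Denormalise: one JSON document per institution, with its projects embedded."""
    docs = {}
    rows = conn.execute("""
        SELECT i.Id, i.Name, i.City, p.Title
        FROM Institution i LEFT JOIN Project p ON p.InstitutionId = i.Id
        ORDER BY i.Id, p.Id
    """)
    for inst_id, name, city, title in rows:
        doc = docs.setdefault(inst_id, {
            "_id": f"institution-{inst_id}",  # CouchDB's document-identifier field
            "name": name, "city": city, "projects": [],
        })
        if title is not None:  # LEFT JOIN yields NULL for institutions without projects
            doc["projects"].append(title)
    return [json.dumps(doc) for doc in docs.values()]


for line in export_documents(src):
    print(line)
```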

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].

The British Council is the United Kingdom's international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010, the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.


Figure 2.1: British Council project data displayed on Bing Maps

The data from these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.

Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.


Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect, using CouchDB as a web caching layer.

The data is already structured (having been exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
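The JSON-to-GeoRSS transformation could look roughly like the following Python sketch, which uses the standard library's ElementTree to build a minimal RSS 2.0 feed with GeoRSS-Simple `georss:point` elements (latitude, then longitude, separated by a space). The field names `name`, `lat`, `lon` and `description`, and the sample coordinates, are assumptions; a production feed would carry more channel elements, and inside CouchDB the same job would presumably be done by a JavaScript function stored in the database.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)


def to_georss(docs: list, title: str) -> str:
    """Build a minimal RSS 2.0 feed with one GeoRSS-Simple point per document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for doc in docs:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = doc["name"]
        ET.SubElement(item, "description").text = doc.get("description", "")
        # GeoRSS-Simple: latitude and longitude separated by a single space.
        point = ET.SubElement(item, f"{{{GEORSS_NS}}}point")
        point.text = f"{doc['lat']} {doc['lon']}"
    return ET.tostring(rss, encoding="unicode")


# Hypothetical reporting document, as it might be stored in JSON form.
sample_docs = [{"name": "Example College", "description": "Science exchange",
                "lat": 53.8, "lon": -1.55}]
xml_text = to_georss(sample_docs, "British Council activity (sample)")
print(xml_text)
```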

This is a report on my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for 'prime time', that is, to what extent it is ready for adoption by web applications developers such as myself who are used to developing solutions on current standard platforms³.

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].

3.1 'Of the Web'

The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-used quote about CouchDB by Jacob Kaplan-Moss, a co-creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform- and OS¹-agnostic." [31]

Normally, a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an 'indexable schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system, which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available online [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

¹ Operating System.

Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself; this is much more in line with what we intuitively expect from real-world experience.
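The difference can be shown with a few lines of Python: a document without a postal code simply lacks the key, and the reading code supplies whatever default it wants. The sample documents and the `format_postal_code` helper are hypothetical.

```python
import json

# Hypothetical address documents: one has a postal code, one simply omits it.
new_york = json.loads('{"city": "New York", "state": "NY", "postalCode": "10021"}')
lusaka = json.loads('{"city": "Lusaka", "country": "Zambia"}')


def format_postal_code(address: dict) -> str:
    # No NULL placeholder is stored: a missing field is just a missing key.
    return address.get("postalCode", "(no postal code)")


for address in (new_york, lusaka):
    print(address["city"], "-", format_postal_code(address))
```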

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
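The append-only idea in the quote can be illustrated with a toy key-value store in which every write, including an update, appends a new record with an incremented revision number, while a small index points at the newest record for each key. This is a hypothetical Python sketch of the principle only, not CouchDB's on-disk B-Tree format; the class and method names are assumptions.

```python
from typing import Dict, List, Tuple


class AppendOnlyStore:
    """Toy key-value store: writes append to a log and nothing is overwritten.

    Not CouchDB's on-disk format; it only illustrates the append-only idea.
    """

    def __init__(self) -> None:
        self.log: List[Tuple[str, int, dict]] = []  # (key, revision, document)
        self.latest: Dict[str, int] = {}            # key -> log position of newest record

    def put(self, key: str, doc: dict) -> int:
        previous = self.log[self.latest[key]][1] if key in self.latest else 0
        revision = previous + 1
        self.log.append((key, revision, doc))
        self.latest[key] = len(self.log) - 1
        return revision

    def get(self, key: str) -> dict:
        return self.log[self.latest[key]][2]


store = AppendOnlyStore()
store.put("john-smith", {"age": 25})
store.put("john-smith", {"age": 26})  # an "update" is just another appended record
print(store.get("john-smith"))        # the newest version
print(store.log[0])                   # the original record is still intact
```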

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available online [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started; Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.

21

SECTION 4 USING COUCHDB 22

43 Installing CouchDB

In addition to the hosted service I installed CouchDB locally on Ubuntu1004 using the installer from httpwwwcouchonecomget Installersalso exist at the same URL for Windows and Mac I installed the currentversion at time of writing which was 101

(As already noted Ubuntu has an instance of CouchDB - Desktop Couch- pre-installed It is also possible to install a separate lsquostandardrsquo instance ofCouchDB on an Ubuntu machine using the installer from CouchOne Thisis what I have done for this project)

44 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line we use cURL, which is a command-line tool for getting or sending files using URL syntax [51].¹

To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same request with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.

curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

cURL syntax is explained here [44]. Briefly:

• -H introduces an HTTP header

• -X POST instructs cURL to use an HTTP POST request (instead of a GET)

• -d introduces the data to be sent

• @ denotes that what follows is a file name
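CouchDB speaks plain HTTP, so the same request can be built from any programming language. The sketch below is a hedged illustration in JavaScript, the language CouchDB uses for its views. It spells out what each cURL flag contributes to the request. buildPostRequest is a hypothetical helper and the local server address is an assumption. Nothing is sent over the network: the object would still have to be handed to an HTTP client such as fetch in Node.js 18 or later.

```javascript
// Hypothetical helper: describe the HTTP request that
//   curl -H "Content-Type: application/json" -X POST <server>/<db> -d @file.json
// sends. Nothing is sent over the network here.
function buildPostRequest(serverUrl, dbName, doc) {
  return {
    method: "POST",                                  // -X POST
    url: serverUrl + "/" + encodeURIComponent(dbName),
    headers: { "Content-Type": "application/json" }, // -H "Content-Type: ..."
    body: JSON.stringify(doc)                        // -d @file.json (the file's text)
  };
}

const request = buildPostRequest("http://127.0.0.1:5984", "addressbook",
  { firstName: "John", lastName: "Smith", age: 25 });
console.log(request.method + " " + request.url); // POST http://127.0.0.1:5984/addressbook
```

Because the request is a POST, CouchDB chooses the _id itself. The JSON response contains the new document's id and rev.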

All data in CouchDB is stored in JSON format, together with a unique key and a revision number, so in CouchDB the resulting document would have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
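The _rev value has a visible structure: a number, a dash, then a long identifier. The number counts the document's revisions. That is why the Aberavon document shown later has a _rev beginning 2- and the design document con has one beginning 6-. The sketch below separates the two parts. parseRev is a hypothetical helper, and the part after the dash is treated as opaque rather than interpreted.

```javascript
// Split a CouchDB revision string into its two parts. The number before the
// dash counts revisions (1 for a new document, 2 after the first update, ...);
// the part after it is treated here as an opaque identifier.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  if (dash < 1) {
    throw new Error("not a revision string: " + rev);
  }
  return { generation: Number(rev.slice(0, dash)), id: rev.slice(dash + 1) };
}

const rev = parseRev("1-91bce055fc8db86480400321079f0834");
console.log(rev.generation); // 1
console.log(rev.id);         // 91bce055fc8db86480400321079f0834
```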

This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator (http://jsonlint.com).

Figure 4.1: JSONLint
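JSONLint is convenient, but the same formatting can be done locally. This minimal JavaScript sketch parses the raw output and prints it with indentation; formatJson is a hypothetical helper name. JSON.parse throws a SyntaxError on malformed input, so the sketch also performs a basic validity check, which is the other half of what JSONLint offers. The sample string is an abbreviated copy of the document above.

```javascript
// Pretty-print a JSON string locally, much as JSONLint does. JSON.parse throws
// a SyntaxError on invalid input, so this also acts as a basic validator.
function formatJson(raw) {
  return JSON.stringify(JSON.parse(raw), null, 2);
}

// Abbreviated copy of the unformatted output returned by CouchDB.
const raw = '{"_id":"760cd53c55a93497067f90d6242fc25e","firstName":"John","age":25}';
console.log(formatJson(raw));
```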

The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia

To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon
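As an example of such a client, the sketch below builds the document's URL and turns the stored address into a single display line. Both function names are hypothetical. docUrl uses encodeURIComponent, so an _id containing a space, such as the Aston University record in Section 5, becomes Aston%20University in the URL. It is meant for ordinary document ids only, because design documents are addressed with a literal slash after _design.

```javascript
// Hypothetical helper: URL of a document with a meaningful _id.
// encodeURIComponent turns a space into %20 (as in "Aston%20University").
function docUrl(serverUrl, dbName, docId) {
  return serverUrl + "/" + encodeURIComponent(dbName) + "/" + encodeURIComponent(docId);
}

// Hypothetical client-side transformation of the stored document.
function formatAddress(doc) {
  const street = doc.address.streetAddress.split("\n"); // "\n" separates lines
  return [doc.company].concat(street, [doc.address.city, doc.address.country]).join(", ");
}

const zambia = {
  _id: "british-council-zambia",
  company: "British Council Zambia",
  address: {
    streetAddress: "Heroes Place\nCairo Road\nPO Box 34571",
    city: "Lusaka",
    country: "Zambia"
  }
};
console.log(docUrl("http://mick.couchone.com", "addressbook", zambia._id));
// http://mick.couchone.com/addressbook/british-council-zambia
console.log(formatAddress(zambia));
// British Council Zambia, Heroes Place, Cairo Road, PO Box 34571, Lusaka, Zambia
```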

4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
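A delete URL is easy to mistype by hand, so here is a small sketch of how a program might assemble one. deleteUrl is a hypothetical helper and the local server address is an assumption. The point is that the current revision travels in the query string; CouchDB rejects a DELETE that omits it.

```javascript
// Hypothetical helper: the DELETE URL must carry the document's current
// revision in the query string, e.g. .../british-council-zambia?rev=1-...
function deleteUrl(serverUrl, dbName, docId, rev) {
  const url = new URL(serverUrl);
  url.pathname = "/" + encodeURIComponent(dbName) + "/" + encodeURIComponent(docId);
  url.searchParams.set("rev", rev); // adds ?rev=... with any needed escaping
  return url.toString();
}

console.log(deleteUrl("http://127.0.0.1:5984", "addressbook",
  "british-council-zambia", "1-da7bcd810c608d6fbcb9ce92e9ade343"));
// http://127.0.0.1:5984/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
```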

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error (HTTP status 409) if an update is attempted on an 'old' version of a document.

As we are updating a 'named' document, we use the PUT verb instead of POST.

The following command will update John Smith's record with the contents of the file john-smith-v2.json, supplying the document's current revision (the _rev shown above):

curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
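The revision check is a form of optimistic concurrency control. The toy in-memory store below sketches the idea and is not CouchDB's implementation. ToyStore and its "N-toy" revision strings are made up. Only one rule is taken from CouchDB: an update must quote the current revision, or a 409 conflict is returned.

```javascript
// Toy, in-memory imitation of CouchDB's revision check (not CouchDB itself).
// Revisions here are "<generation>-toy"; real ones end in a unique string.
class ToyStore {
  constructor() {
    this.docs = new Map();
  }

  // Create or update a document. Updating requires the current revision.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) {
      // CouchDB answers HTTP 409 Conflict in this situation.
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    const newRev = generation + "-toy";
    this.docs.set(id, Object.assign({}, body, { _id: id, _rev: newRev }));
    return { status: 201, ok: true, id: id, rev: newRev };
  }
}

const store = new ToyStore();
const created = store.put("john-smith", { firstName: "John" });
const updated = store.put("john-smith", { firstName: "John", age: 26 }, created.rev);
const stale = store.put("john-smith", { firstName: "Jon" }, created.rev); // old rev
console.log(updated.rev, stale.status); // 2-toy 409
```

A client that receives the conflict would normally fetch the latest version, reapply its change, and retry with the new _rev.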

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/.³

The following command will upload the flag of Zambia (zm.png) as an attachment named flag to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.

The uploaded image may be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag
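For completeness, the sketch below describes the attachment upload as a request object. attachmentRequest is a hypothetical helper and the local server address is an assumption. The eight PNG signature bytes stand in for the contents of zm.png; a real program would read them with fs.readFileSync("zm.png").

```javascript
// Hypothetical helper describing the PUT that stores an attachment: the
// attachment name becomes the last path segment, the parent document's
// current revision goes in the query string, and the body is raw bytes.
function attachmentRequest(serverUrl, dbName, docId, attName, rev, contentType, bytes) {
  return {
    method: "PUT",
    url: serverUrl + "/" + encodeURIComponent(dbName) + "/" + encodeURIComponent(docId) +
      "/" + encodeURIComponent(attName) + "?rev=" + encodeURIComponent(rev),
    headers: { "Content-Type": contentType },
    body: bytes // sent unchanged, like curl --data-binary
  };
}

// Stand-in bytes (the 8-byte PNG signature) instead of reading zm.png from disk.
const png = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const put = attachmentRequest("http://127.0.0.1:5984", "addressbook",
  "british-council-zambia", "flag", "3-e6fbb83f5aa5c2e2111b5b14559115af", "image/png", png);
console.log(put.url);
// http://127.0.0.1:5984/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af
```

As with other updates, a successful upload returns a new _rev for the parent document. That new value must be used for the next change.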

4.8 Replication

One particularly interesting and useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want a replica of the addressbook database on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
  "target":"http://127.0.0.1:5984/addressbook"}'

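Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping them where necessary (the ^ characters continue the command on the next line in the Windows command prompt):

curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"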

As a result, the replica of the database is now available on the local machine.
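The quoting problem on Windows disappears if a program produces the request body instead of someone typing it by hand. This sketch builds the JSON that is POSTed to /_replicate with Content-Type: application/json; replicationBody is a hypothetical helper.

```javascript
// Build the body for POST /_replicate. JSON.stringify produces the quoting
// and escaping that has to be written by hand in the Windows command above.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook");
console.log(body);
// {"source":"http://username:password@mick.couchone.com/addressbook","target":"http://127.0.0.1:5984/addressbook"}
```

Replication is one-way. Swapping source and target pushes local changes back to the hosted database.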

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Apart from fetching individual documents by _id, as above, data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google's 2004 paper [23], which describes how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages. The 'map' stage ranges over the document collection and emits zero or more key/value pairs for each document it encounters. The 'reduce' stage is used when an aggregate of those intermediate values (such as a sum or a count) is required.
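The two stages can be imitated in a few lines of JavaScript. The sketch below is not CouchDB's engine, and several parts of it are simplifications. runView is a hypothetical function. emit is passed to the map function as an argument, whereas CouchDB provides it as a global. The key ordering handles only plain numbers and strings. The three documents are made-up data in the shape of the election documents described next.

```javascript
// Minimal in-memory imitation of a CouchDB view (not CouchDB's engine).
// Map stage: call `map` once per document, collecting every emitted row.
// Rows are then sorted by key, as they would be in the view index.
// Reduce stage (optional): fold all emitted values into a single result.
function runView(docs, map, reduce) {
  const rows = [];
  for (const doc of docs) {
    // CouchDB supplies emit() as a global; here it is passed in instead.
    map(doc, (key, value) => rows.push({ id: doc._id, key: key, value: value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  if (!reduce) {
    return rows;
  }
  const keys = rows.map((r) => [r.key, r.id]);
  const values = rows.map((r) => r.value);
  return reduce(keys, values, false);
}

// Made-up documents in the shape of the election data (con = Conservative votes).
const docs = [
  { _id: "A", con: 1500 },
  { _id: "B", con: 3100 },
  { _id: "C", con: 1200 }
];

// Like the "Conservative Votes" view: key = votes, value = _id.
const byVotes = runView(docs, (doc, emit) => emit(doc.con, doc._id));
// Like "Conservative Votes Total": a single null key, values summed.
const total = runView(docs, (doc, emit) => emit(null, doc.con),
  (keys, values) => values.reduce((a, b) => a + b, 0));

console.log(byVotes.map((r) => r.id).join(",")); // C,A,B
console.log(total); // 5800
```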

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb/), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK general election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code as opposed to data.

The CouchDB design document 'con' has been defined with two 'views' of the data.

The first view, 'Conservative Votes', simply ranges over the document collection in a map function. It emits each constituency's Conservative vote as the key and its _id as the value.

The second view, 'Conservative Votes Total', combines a map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate such as SUM in SQL).

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}
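CouchDB does not necessarily call a reduce function once over all the values. It may reduce slices of the index separately and then call the same function again on the partial results, with a third argument, rereduce, set to true; in that second call keys is null. The sketch below uses made-up vote counts to check that the 'Conservative Votes Total' reduce gives the same answer either way. sum is defined locally because CouchDB's view server normally provides it.

```javascript
// CouchDB's view server provides sum(); it is defined here so the sketch runs.
function sum(values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

// The reduce function of "Conservative Votes Total". CouchDB may run it on
// slices of the index and then again on the partial results, with
// rereduce === true; sum() gives the same answer either way.
function conservativeTotal(keys, values, rereduce) {
  return sum(values);
}

// Made-up vote counts, split into two slices to mimic two index nodes.
// (keys are unused by this reduce function, so null is passed for brevity.)
const votes = [3000, 1500, 1200, 4100];
const direct = conservativeTotal(null, votes, false);
const partials = [
  conservativeTotal(null, votes.slice(0, 2), false),
  conservativeTotal(null, votes.slice(2), false)
];
const combined = conservativeTotal(null, partials, true);
console.log(direct, combined); // 9800 9800
```

By contrast, a reduce function that returned values.length would count partial results rather than documents when rereduce is true. Reduce functions therefore need to be written with the second call in mind.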

It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startKey=1000&endKey=2000]

Figure 4.5: WHERE clause in CouchDB
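Conceptually, the index behind a view is a sequence of rows sorted by key; CouchDB stores it in a B-tree. A range query can therefore jump to the first qualifying key and stop after the last one. The sketch below imitates this with a binary search over an array. The rows are made up apart from the Aberavon entry. The parameters are named after CouchDB's documented startkey and endkey query parameters, whose bounds are both inclusive by default.

```javascript
// Index of the first row whose key is >= `key` (binary search over sorted rows).
function lowerBound(rows, key) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < key) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Rows with startkey <= key <= endkey: jump to the start, read until past the end.
function rangeQuery(rows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(rows, startkey); i < rows.length && rows[i].key <= endkey; i++) {
    result.push(rows[i]);
  }
  return result;
}

// Made-up rows for the "Conservative Votes" view (key = votes, value = _id),
// already sorted by key as a view index would be; only Aberavon is real data.
const index = [
  { key: 850, value: "P" }, { key: 1000, value: "Q" }, { key: 1730, value: "R" },
  { key: 2000, value: "S" }, { key: 3062, value: "Aberavon" }
];
console.log(rangeQuery(index, 1000, 2000).map((r) => r.value).join(",")); // Q,R,S
```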

This method of retrieving data is certainly cumbersome compared to using SQL. One criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every map-reduce function written against a document collection in CouchDB is backed by its own B-tree index. The index is built the first time the view is queried. From then on, when documents are added or changed, the index is updated incrementally the next time the view is read, so only the changed documents are processed again.

As a result, the entire system is optimised for fast indexed retrieval. Apart from temporary views, which are intended for development only, it is not possible in CouchDB to write a query which performs a 'table scan' at run-time: every read of a saved view is an indexed lookup.
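The incremental behaviour can be sketched as follows. IncrementalView is a hypothetical class, not CouchDB code. It remembers which rows each document produced, so a batch of changed documents causes the map function to run only on those documents. For brevity it re-sorts the rows on every read, whereas CouchDB maintains its B-tree.

```javascript
// Sketch of incremental view maintenance (not CouchDB's code).
class IncrementalView {
  constructor(map) {
    this.map = map;
    this.rowsByDoc = new Map(); // docId -> rows emitted by that document
    this.mapCalls = 0;          // counts map invocations, to show the saving
  }

  // Apply a batch of changed documents (a deleted one has _deleted: true).
  update(changedDocs) {
    for (const doc of changedDocs) {
      this.rowsByDoc.delete(doc._id); // drop the document's old rows
      if (doc._deleted) continue;
      const rows = [];
      this.mapCalls++;
      this.map(doc, (key, value) => rows.push({ id: doc._id, key: key, value: value }));
      this.rowsByDoc.set(doc._id, rows);
    }
  }

  // Read the index: all rows sorted by key (re-sorted here for brevity).
  query() {
    const all = [];
    for (const rows of this.rowsByDoc.values()) all.push(...rows);
    return all.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  }
}

// Made-up documents; emit is passed in, whereas CouchDB provides it as a global.
const view = new IncrementalView((doc, emit) => emit(doc.con, doc._id));
view.update([{ _id: "A", con: 1500 }, { _id: "B", con: 3100 }, { _id: "C", con: 1200 }]);
view.update([{ _id: "B", con: 999 }]); // only B is mapped again
console.log(view.mapCalls); // 4
```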

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'design documents'.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The 'universities' data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.

Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script, as follows.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect could be achieved using a batch file script.)

#!/bin/bash

#host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*

# create the database
curl -X PUT "$host/$database"

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(basename "$filepath")
  echo "$filename"
  # remove the extension to obtain the document id
  docname=${filename%"$fileextension"}
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL (POST is used when we wish CouchDB to assign a random ID to the document).

To put a single document to the web, the command used would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an _id that begins with _design/ [12].

For the first design document I have chosen an _id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a string of HTML, which CouchDB sends back as the response.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved field name in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered. req refers to the request object and can be used, for example, to retrieve data passed in the query string of the URL.
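A show function is ordinary JavaScript, so it can be tried outside CouchDB by calling it with mock arguments. The sketch below does that for s1. The coordinates are made-up values and the req object is reduced to an empty query. s1Text is a hypothetical variant, added only to show how a query-string parameter (here an invented format=text) could be read from req.query.

```javascript
// The s1 show function, written out as plain JavaScript.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical variant: ?format=text returns plain text instead of HTML.
function s1Text(doc, req) {
  if (req.query.format === "text") {
    return doc._id + " (" + doc.latitude + ", " + doc.longitude + ")";
  }
  return s1(doc, req);
}

// Mock arguments: made-up coordinates, and a request with an empty query string.
const doc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
const req = { query: {} };
console.log(s1(doc, req));
// <h1>Aston University</h1><p>latitude: 52.48</p><p>longitude: -1.89</p>
```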

Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 \
  -d @d1.json

To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML pages.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB

• universities is the database

• _design/d1 is the design document

• _show/s1 is the 'show' function that returns an HTML string

• Aston%20University is the _id of the document to be rendered

5.3 A more complex 'show' function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html.

The full listing of the design document, including the map1 'show' function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote inside the function has to be escaped with a backslash so that the design document remains valid JSON.

While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.

Figure 5.5: Documents may be edited directly using Futon
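One way to avoid escaping quotes by hand is to write the function as real JavaScript and let a program produce the JSON. The sketch below shows one such approach under stated assumptions. The class attribute is a made-up addition that makes the double quotes visible. The output would still have to be saved as d1.json and uploaded with cURL, including the current _rev when replacing an existing design document.

```javascript
// Write the show function as ordinary JavaScript. The class attribute is a
// made-up addition, included only to show how double quotes are handled.
const s1 = function (doc, req) {
  return '<h1 class="title">' + doc._id + '</h1>';
};

// Function.prototype.toString() returns the function's source text, and
// JSON.stringify adds the backslashes a hand-written d1.json would need,
// for example before every double quote.
const designDoc = { _id: "_design/d1", shows: { s1: s1.toString() } };
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

Generating design documents from separate source files in this way is broadly the job that CouchApp takes on in Section 6.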

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.

Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple 'view' returning document _id values

A 'view' is in effect a query function which returns a set of results. In order to render the results as HTML, we may define 'list' functions in a CouchDB design document:

• v1 is a very simple view, simply emitting the document _id values

• l1 is a very simple list that shows a hyperlink for each document _id

We augment the design document with a 'list' function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

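Note the built-in functions start, getRow and send:

• start is called once, at the beginning of the function, to set the response headers

• getRow returns the next row of the view's result set each time it is called, and a false value once no rows remain, which ends the while loop

• send is called once for each row in the dataset, and outputs a chunk of HTML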
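The interplay of start, getRow and send can be observed by running l1 outside CouchDB. In the sketch below, runList is a hypothetical harness that stands in for CouchDB's view server by defining the three functions as globals. The single row is a made-up view result of the shape v1 produces.

```javascript
// The l1 list function, as plain JavaScript. CouchDB supplies start(),
// getRow() and send() as globals; runList below imitates them.
function l1(head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + '</a></p>');
  }
}

// Minimal harness (not CouchDB's view server): getRow() returns the view
// rows one at a time and null when they run out; send() collects output.
function runList(listFn, viewRows) {
  let i = 0;
  const out = { headers: null, body: "" };
  globalThis.start = (resp) => { out.headers = resp.headers; };
  globalThis.getRow = () => (i < viewRows.length ? viewRows[i++] : null);
  globalThis.send = (chunk) => { out.body += chunk; };
  listFn({ total_rows: viewRows.length, offset: 0 }, { query: {} });
  return out;
}

const result = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" }
]);
console.log(result.body);
```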

At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple 'view' rendered using a 'list' function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload them to a CouchDB server in the normal way. However, this approach becomes very cumbersome, and the result difficult to maintain, as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer's perspective is that it creates a separate file for each element of the design document. In effect, a CouchDB design document is a web application project. CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available at http://github.com/jchris/sofa.

I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available at http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is at http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available at http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3

Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by placing each area of the design document in a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains a folder for each CouchDB view, holding a map.js file and, when needed, a reduce.js file. A view can be thought of as a query against the database.

• The shows folder contains .js files corresponding to each 'show' function in the design document. A show is a function used to transform a single document into HTML.

• Similarly, the lists folder contains .js files, one for each 'list' function in the eventually generated design document. A list is used to transform a set of documents returned by a view into HTML.
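The correspondence between folders and design-document fields can be sketched in a few lines. buildDesignDoc below is a rough, hypothetical imitation, not CouchApp's code. It handles only .js files, turning a path such as views/v1/map.js into the field views.v1.map. It ignores the other folders (templates, lib, _attachments and so on) that CouchApp also processes.

```javascript
// Rough imitation (not CouchApp's code) of how .js files map onto
// design-document fields: "views/v1/map.js" -> views.v1.map,
// "shows/s1.js" -> shows.s1, "lists/l1.js" -> lists.l1.
function buildDesignDoc(id, files) {
  const doc = { _id: "_design/" + id };
  for (const [path, source] of Object.entries(files)) {
    const parts = path.replace(/\.js$/, "").split("/");
    let node = doc;
    for (const part of parts.slice(0, -1)) {
      node[part] = node[part] || {}; // create intermediate objects as needed
      node = node[part];
    }
    node[parts[parts.length - 1]] = source; // the file's text becomes the value
  }
  return doc;
}

const designDoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }"
});
console.log(JSON.stringify(designDoc, null, 2));
```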

Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a 'show' function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1

couchapp generate show creates a stub for a show function, which we can fill in using a text editor. A show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js, a file generated by CouchApp

Similarly, couchapp generate list l1 will create a 'list' function, l1.

In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js. We recall that in CouchDB a view is in effect a query against the database.

As v1 is a very simple view, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, then the view will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed

6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

With .couchapprc in place, running couchapp push from the bc folder uploads the design document to the default database. So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This demonstrates how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple view, all. This consists of a map.js file (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a list function to transform the contents of the view:

couchapp generate list georss

This creates a file called georss.js. I added some code, derived from that in Sofa, which 'packages' the data from the view and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, emits the flags of those countries.
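The 'packaging' step performed by georss.js can be sketched as a function that turns view rows into the object the template expects. The real code is in Appendix C, and the structure of the university documents is not shown here. The input shape below is therefore an assumption: a programmes array whose entries have a name and a list of country ids. The output field names are the ones used in the template. Mustache renders a section such as {{#programmes}} once per list element, and skips a section whose value is false or an empty list.

```javascript
// Hypothetical input shape: { _id, latitude, longitude,
//   programmes: [{ name, countries: ["zm", ...] }] }. The real documents may differ.
function packageRows(rows) {
  return {
    institutions: rows.map((row) => {
      const doc = row.value; // the "all" view emits the whole document as the value
      const programmes = (doc.programmes || []).map((p) => {
        const countries = (p.countries || []).map((c) => ({ country: c }));
        return { name: p.name, has_countries: countries.length > 0, countries: countries };
      });
      return {
        id: doc._id,
        latitude: doc.latitude,
        longitude: doc.longitude,
        has_programmes: programmes.length > 0,
        programmes: programmes
      };
    })
  };
}

// Made-up rows: coordinates, country id and the second university are illustrative.
const rows = [
  { id: "Aston University", key: "Aston University", value: {
    _id: "Aston University", latitude: 52.48, longitude: -1.89,
    programmes: [{ name: "Erasmus", countries: ["zm"] }] } },
  { id: "Example University", key: "Example University", value: {
    _id: "Example University", latitude: 51.5, longitude: -0.1 } }
];
const model = packageRows(rows);
console.log(JSON.stringify(model, null, 2));
```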

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available at http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed at http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all

Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSQL' databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySql or SQL Server appears easy only becauseof the existence of Object-Relational Mapping (ORM) tools The processof getting normalised data out of a relational database and displaying it on aweb page is in fact quite complex In comparison for CouchDB relativelysimple tools such as CouchApp and Mustache do the same work of takingdata and transforming it into useful formats for the web

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
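As an illustration of how directly document operations map onto HTTP, the sketch below describes the requests for creating, reading, updating and deleting a document. The shapes follow CouchDB's document API (PUT or GET on `/db/docid`, updates quoting the current `_rev`, deletes passing `?rev=`); the helper `couchRequest`, the `staff` database and the revision strings are hypothetical, and nothing is sent over the network. Each descriptor is printed as the equivalent cURL command.

```javascript
// Describe the HTTP request for a basic operation on an ordinary document.
// Nothing is sent: the point is that each operation is just a verb and a URL.
function couchRequest(op, db, id, doc, rev) {
  const path = "/" + encodeURIComponent(db) + "/" + encodeURIComponent(id);
  switch (op) {
    case "create": // PUT a new document under a chosen id
      return { method: "PUT", path: path, body: JSON.stringify(doc) };
    case "read":
      return { method: "GET", path: path };
    case "update": // an update must quote the revision it replaces
      return { method: "PUT", path: path, body: JSON.stringify({ ...doc, _rev: rev }) };
    case "delete": // a delete also names the revision being removed
      return { method: "DELETE", path: path + "?rev=" + encodeURIComponent(rev) };
    default:
      throw new Error("unknown operation: " + op);
  }
}

// Print the equivalent cURL commands for a hypothetical staff document.
const ops = [
  couchRequest("create", "staff", "alice", { name: "Alice" }),
  couchRequest("read", "staff", "alice"),
  couchRequest("update", "staff", "alice", { name: "Alice" }, "1-abc"),
  couchRequest("delete", "staff", "alice", null, "2-def"),
];
for (const r of ops) {
  const data = r.body ? " -d '" + r.body + "'" : "";
  console.log("curl -X " + r.method + " http://127.0.0.1:5984" + r.path + data);
}
```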

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)

7.2 CouchDB's challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL;

• understanding the structure of design documents;

• the use of server-side JavaScript functions to transform documents to HTML and XML.

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database must be defined in advance as a map-reduce function, whose results CouchDB indexes. Many developers are used to 'thinking' in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
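To illustrate why there is no ad-hoc querying, the sketch below imitates in plain JavaScript what CouchDB does with a view: the map function is run once over every document to build an index sorted by key, and a later query can only select a key range from that index, as CouchDB's `startkey` and `endkey` parameters do. The helpers `buildView` and `queryView`, the sample documents and the convention of passing `emit` as an argument (in CouchDB it is a global supplied by the view server) are assumptions; simple `<`/`>` comparison stands in for CouchDB's collation rules.

```javascript
// Compare two keys; plain < and > stand in for CouchDB's collation rules.
function compare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Run a map function over every document once and keep the emitted rows
// sorted by key (ties broken by document id), as a view index is built.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, function emit(key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  }
  rows.sort((a, b) => compare(a.key, b.key) || compare(a.id, b.id));
  return rows;
}

// A query can only select a contiguous key range from the prebuilt index.
function queryView(rows, startkey, endkey) {
  return rows.filter((r) => compare(r.key, startkey) >= 0 && compare(r.key, endkey) <= 0);
}

// Map function indexing institutions by region.
function byRegion(doc, emit) {
  if (doc.region) {
    emit(doc.region, null);
  }
}

// Hypothetical sample documents.
const docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "Cardiff University", region: "Wales" },
  { _id: "University of Aberdeen", region: "Scotland" },
];
const index = buildView(docs, byRegion);
console.log(queryView(index, "Scotland", "Scotland").map((r) => r.id));
```

Asking a new question, such as institutions by constituency, means writing and indexing a new map function rather than typing a new query.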

7.3 Conclusion

The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project's focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45]: in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1

It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}
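The escaping problem arises because each function is stored as a string inside the JSON design document. As a sketch of the alternative, the version of s1 below is written as an ordinary JavaScript function, as CouchApp allows, and returns an object with `headers` and `body`, which CouchDB show functions may do instead of returning a bare string. The file location (for example shows/s1.js), the `institution` CSS class and the `escapeHtml` helper are assumptions for illustration. The last lines show that encoding the function source with `JSON.stringify`, as a push tool must, produces the backslash-escaped double quotes automatically.

```javascript
// A show function in the style of s1, kept in its own file rather than typed
// into JSON. CouchDB calls it with (doc, req).
const s1 = function (doc, req) {
  // Helper defined inside the function: CouchDB stores and evaluates each
  // show function on its own, so helpers defined outside it are not visible.
  function escapeHtml(value) {
    return String(value)
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;");
  }
  const html = [
    '<h1 class="institution">' + escapeHtml(doc._id) + "</h1>",
    "<p>latitude: " + escapeHtml(doc.latitude) + "</p>",
    "<p>longitude: " + escapeHtml(doc.longitude) + "</p>",
  ].join("\n");
  return {
    headers: { "Content-Type": "text/html; charset=utf-8" },
    body: html,
  };
};

// Local check with a document shaped like the one shown in Appendix D.
const page = s1(
  { _id: "University of St Andrews", latitude: 56.3412139443169, longitude: -2.79301175308608 },
  {}
);
console.log(page.body);

// What the push step has to do: JSON-encode the function source into the
// design document, which escapes every double quote automatically.
const designDoc = { _id: "_design/d1", shows: { s1: s1.toString() } };
console.log(JSON.stringify(designDoc).includes('\\"institution\\"')); // true
```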

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  // added for GeoRSS
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    }
    // close the loop after all rows are rendered
    return "</feed>";
  });

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },
      // ...

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB list function. It takes as input a set of documents returned by a view; in the simplest case, the view all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
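Readers more used to imperative code may find it helpful to see the nested sections inside <content> written out as loops. The function below is a hypothetical plain-JavaScript reading of the template, not project code: it takes one element of the `stash.institutions` array built by georss.js and returns the escaped HTML that ends up inside <content>. Unlike Mustache, which HTML-escapes {{name}} and {{country}}, this sketch inserts values unescaped, and its whitespace differs from the template's output.

```javascript
// Plain-JavaScript reading of the nested sections inside <content>.
// Each {{#section}} becomes a conditional or a loop over the stash.
function entryContent(institution) {
  if (!institution.has_programmes) {
    return ""; // {{#has_programmes}} renders nothing when false
  }
  return institution.programmes
    .map(function (programme) { // {{#programmes}}
      let html = "&lt;p&gt;" + programme.name;
      if (programme.has_countries) { // {{#has_countries}}
        html += programme.countries
          .map(function (c) { // {{#countries}}
            return '&lt;img src="http://mick.couchone.com/countries/' + c.country +
              '/flag" alt="' + c.country + '" title="' + c.country + '"&gt;';
          })
          .join("");
      }
      return html + "&lt;/p&gt;";
    })
    .join("");
}

// One element of stash.institutions, as georss.js would build it from the
// St Andrews document above (first programme only).
const institution = {
  has_programmes: true,
  programmes: [
    {
      name: "Chevening Programme",
      has_countries: true,
      countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }],
    },
  ],
};
console.log(entryContent(institution));
```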

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload PNG files from a folder, so that each image file becomes an attachment called flag on a CouchDB document named after the country code.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash

# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under /countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  # the @ prefix makes curl send the file's contents as the request body
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done
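The naming scheme that the script relies on (one document per country code, with the PNG stored as an attachment called flag) can be restated in JavaScript. The helper `flagUpload` is hypothetical and only builds the request; the image bytes would form the raw request body, as with curl's `--data-binary @file`. No `rev` parameter is needed because this PUT creates the document; adding an attachment to a document that already exists would require its current revision.

```javascript
// Mirror the script's naming scheme for one file: the document id is the
// file name without ".png", and the image is stored as attachment "flag".
function flagUpload(serverUrl, filepath) {
  const filename = filepath.split("/").pop(); // e.g. "de.png"
  const docname = filename.replace(/\.png$/, ""); // e.g. "de"
  return {
    method: "PUT",
    url: serverUrl + "/countries/" + encodeURIComponent(docname) + "/flag",
    headers: { "Content-Type": "image/png" },
    docname: docname,
  };
}

const upload = flagUpload("http://127.0.0.1:5984", "countries/png/de.png");
console.log(upload.method + " " + upload.url);
```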

Bibliography

[1] A NoSql Summer. A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django. The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.

[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file

[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. CouchDB implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars

[44] Daniel Stenberg and contributors. curl(1): the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1(3), September 1976.

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


Alles im richtigen Maß (Everything in the right measure)

Acknowledgments

With thanks to my supervisor, Nigel Martin, for timely, patient and good-humoured advice - you really helped me to stay on track.

Thanks also to my colleagues and ex-colleagues at the British Council, in particular Terry Pyle, Kshipra Singhvi, Phil Street, Spero Blassoples, Michael Sadler, Roger Moran and Masoud Ahanchian.

Thank you, Su Peneycad, for your encouragement and wonderful skill at spotting typos.

I had the great fortune to be doing this project during the summer of 2010, when the (first) 'NoSql Summer' was in full swing. Thanks to the organisers of the London meetups, Neil Robbins and Makoto Inoue, for providing the framework for such interesting discussions.

Finally, thank you to the active and friendly CouchDB community (J. Chris Anderson, Volker Mische, Jason Smith and many others) for your help on the couchdb-user mailing list, and to Damien Katz for starting the whole thing.

Contents

1 Introduction
  1.1 Motivation
  1.2 Does 'one size' still 'fit all'?
  1.3 Organisation

2 Background
  2.1 The Relational Model
  2.2 Relational Database Management Systems
  2.3 Normalisation
  2.4 SQL
  2.5 Transactional Guarantees
  2.6 The 'NoSql' Movement
  2.7 'NoSql' databases at large scale
  2.8 Brewer's 'CAP' Theorem
  2.9 Reducing the Impedance Mismatch
  2.10 Benefits of 'NoSql'
  2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  2.12 The Activity Mapping Project
  2.13 Raising Awareness of British Council Impact in the UK
  2.14 Using CouchDB to Serve Mapping Data

3 Introduction to CouchDB
  3.1 'Of the Web'
  3.2 Some History
  3.3 Document-Oriented
  3.4 Erlang
  3.5 How Ubuntu uses CouchDB
  3.6 Desktop Couch, Python and Quickly

4 Using CouchDB
  4.1 Motivation
  4.2 Hosted CouchDB Service Providers
  4.3 Installing CouchDB
  4.4 Inserting a Document into a CouchDB Database
  4.5 Deleting a Document
  4.6 Updating a Document
  4.7 Adding Attachments
  4.8 Replication
  4.9 Querying a CouchDB Database using Map-Reduce

5 Serving HTML from CouchDB
  5.1 Bulk upload of JSON documents
  5.2 A CouchDB Design Document
  5.3 A more complex 'show' function
  5.4 Views and Lists
  5.5 Lessons Learned

6 Serving GeoRSS using CouchApp
  6.1 Introduction to CouchApp
  6.2 Sofa - a Blogging Application
  6.3 Developing with CouchApp
  6.4 Deployment
  6.5 Using templates

7 Critical Assessment and Conclusion
  7.1 Why might a developer choose CouchDB?
  7.2 CouchDB's challenges
  7.3 Conclusion

A Design Document d1

B GeoRSS on Sofa

C georss.js

D georss.html

E Bash Script to Upload Country Flag Files

Section 1

Introduction

The MSc Computer Science project described in this report aims to investigate a relatively new database product called CouchDB from a web application development perspective [10] [13]. I have built a simple business application and use this as the test case from which to evaluate CouchDB's suitability as a web development platform.

1.1 Motivation

The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of 'key-value store' products which have emerged recently to deal with the challenges of very large data volumes in online environments.

The background to the project is one of rapid change, inspired by the emergence of a set of innovative, open-source, non-relational (also known as 'NoSql'¹) database products.

¹ Pronounced 'No-See-Quel'.

These 'NoSql' database products are challenging the accepted standard use of relational database management systems as the only storage mechanism used by developers when storing data. Relational databases have been very successful in the past and are likely to remain so, but in a certain subset of usage scenarios, non-relational systems are starting to take their place.

1.2 Does 'one size' still 'fit all'?

One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:

"The last 25 years of commercial DBMS development can be summed up in a single phrase: 'One size fits all'. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.

[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser."

Why does this matter? Does 'one size fit all'? Until recent years, the answer has always been 'yes': practically every data-centric IT project over the past 25 years has used a relational database to persist data. Systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.

1.3 Organisation

The sections are organised in the following way:

• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project;

• in Section 3, I introduce the open-source, document-oriented database CouchDB;

• in Section 4, I provide an overview of CouchDB's application programming interface;

• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database, that is, using CouchDB to store both data and application code;

• in Section 6, I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps;

• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.

Section 2

Background

In recent years, a new generation of data stores has emerged that approaches data persistence differently from the established relational model.

In this section, I briefly discuss the relational model and describe some of the recent challenges which have led to the development of the newer, alternative database systems.

2.1 The Relational Model

The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].

One description [54] of Codd's relational model reads as follows:

"The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains."

A relation is defined as a set of tuples, where a tuple is a set of attribute values. By convention, relations are drawn as tables, tuples as rows and attributes as columns.
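The definitions can be illustrated with the Staff example from Section 2.3: each domain is a set of values, the Cartesian product contains every possible tuple, and the Staff relation is the subset of tuples that are actually recorded. This is a hypothetical illustration of the definition only, not of how any DBMS stores relations.

```javascript
// Cartesian product of n domains (arrays of values) -> array of n-tuples.
function cartesianProduct(...domains) {
  return domains.reduce(
    (tuples, domain) => tuples.flatMap((t) => domain.map((v) => [...t, v])),
    [[]]
  );
}

const names = ["Bob", "Jane"];
const nationalities = ["Australia", "Sweden"];

// Every possible (Name, Nationality) pair: 2 x 2 = 4 tuples.
const product = cartesianProduct(names, nationalities);

// The Staff relation is the subset of pairs that are recorded facts.
const staff = [["Bob", "Australia"], ["Jane", "Sweden"]];
const tupleKey = (t) => JSON.stringify(t);
const productKeys = new Set(product.map(tupleKey));
console.log(product.length, staff.every((t) => productKeys.has(tupleKey(t)))); // 4 true
```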

The main advantage of the relational database is its precise and mathematically rigorous organisation of data. In this context, it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or 'NoSql' databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, the question of whether a database is 'relational' or not is more a matter of how applications use the database: a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].

2.2 Relational Database Management Systems

The term 'Relational Database Management System' refers to the class of databases which follow Codd's relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open-source (MySql, PostgreSQL) databases in use over the past three decades have been based on this model.

These products have been very successful, and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term 'database' has become associated with relational databases to such an extent that non-relational databases are often referred to as 'non-relational data stores' for clarity. (Formally, and in its original meaning, the term 'database' simply means a collection of data.)

2.3 Normalisation

A key factor in relational data storage is data normalisation.

Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by means of removing duplication [48].

The first level of normalisation, 'first normal form', requires that each data item (table cell) contains an atomic value, typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.

Consider, for example, a table (let's call it 'Staff') containing staff records:

Name     Nationality
Bob      Australia
Jane     Sweden

All is well until the system encounters a staff member with multiple nationalities:

Name     Nationality
Bob      Australia
Jane     Sweden
Alice    United Kingdom, South Africa

This second table breaks the rules of normalisation: any of the relational database management systems listed above would accept 'United Kingdom, South Africa' as a string, but that value is not atomic, so the table is not in first normal form. The standard way to solve this problem would be to create two more tables, a 'Nationality' table and a 'StaffNationality' table, linked with primary and foreign keys.
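The normalised design can be sketched with in-memory arrays standing in for the three tables. The column names (StaffID, NationalityID) are hypothetical, and the join is written out by hand; in a relational database it would be a SQL JOIN over the primary and foreign keys.

```javascript
// Three normalised "tables": every value is atomic and stored once.
const Staff = [
  { StaffID: 1, Name: "Bob" },
  { StaffID: 2, Name: "Jane" },
  { StaffID: 3, Name: "Alice" },
];
const Nationality = [
  { NationalityID: 1, Name: "Australia" },
  { NationalityID: 2, Name: "Sweden" },
  { NationalityID: 3, Name: "United Kingdom" },
  { NationalityID: 4, Name: "South Africa" },
];
// Link table: one row per (staff member, nationality) pair, via foreign keys.
const StaffNationality = [
  { StaffID: 1, NationalityID: 1 },
  { StaffID: 2, NationalityID: 2 },
  { StaffID: 3, NationalityID: 3 },
  { StaffID: 3, NationalityID: 4 },
];

// Hand-written equivalent of joining the three tables for one staff member.
function nationalitiesOf(name) {
  const person = Staff.find((s) => s.Name === name);
  return StaffNationality
    .filter((link) => link.StaffID === person.StaffID)
    .map((link) => Nationality.find((n) => n.NationalityID === link.NationalityID).Name);
}

console.log(nationalitiesOf("Alice")); // [ 'United Kingdom', 'South Africa' ]
```

Recording a second nationality now means adding a row to StaffNationality rather than changing the shape of Staff.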

While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata, which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.

In a non-relational database (such as CouchDB), Alice's record could be stored as follows:

{
  "name": "Alice",
  "nationality": [
    "United Kingdom",
    "South Africa"
  ]
}

This record (or 'document' in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other 'NoSql' products are sometimes referred to as 'schema-less' database management systems.

The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.
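To illustrate the point in code, here is a small Python sketch (the in-memory list merely stands in for a document database) showing that records of different shapes serialise to JSON and back without any schema:

```python
import json

# Records of different shape in one collection; no schema change is needed.
# The list is only a stand-in for a document database.
documents = [
    {"name": "Bob", "nationality": ["Australia"]},
    {"name": "Alice", "nationality": ["United Kingdom", "South Africa"]},
]

# Serialise each record to a JSON string, as it would be sent to a document store...
serialised = [json.dumps(doc) for doc in documents]
# ...and parse it back: the multi-valued attribute survives as a list
restored = [json.loads(s) for s in serialised]

for doc in restored:
    print(doc["name"], "->", ", ".join(doc["nationality"]))
```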

2.4 SQL

A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].

To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality

For this to work, the table needs to store atomic values. In our example above, this query would treat ‘United Kingdom, South Africa’ as a distinct nationality, so Alice would be counted under neither ‘United Kingdom’ nor ‘South Africa’.
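The effect can be checked with a short Python sketch using the standard sqlite3 module, comparing the un-normalised Staff table with an atomic version. For brevity, the normalised table here stores nationality names directly rather than the foreign keys of the three-table design described in Section 2.3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Un-normalised: Alice's two nationalities squeezed into one string value
conn.execute("CREATE TABLE Staff (Name TEXT, Nationality TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)", [
    ("Bob", "Australia"), ("Jane", "Sweden"), ("Alice", "United Kingdom, South Africa"),
])
flat = conn.execute(
    "SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality ORDER BY Nationality"
).fetchall()

# Normalised (simplified): one atomic nationality per row
conn.execute("CREATE TABLE StaffNationality (Name TEXT, Nationality TEXT)")
conn.executemany("INSERT INTO StaffNationality VALUES (?, ?)", [
    ("Bob", "Australia"), ("Jane", "Sweden"),
    ("Alice", "United Kingdom"), ("Alice", "South Africa"),
])
atomic = conn.execute(
    "SELECT Nationality, COUNT(*) FROM StaffNationality GROUP BY Nationality ORDER BY Nationality"
).fetchall()

print(flat)    # the combined string shows up as its own 'nationality'
print(atomic)  # each nationality counted separately
```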

In Section 4, I will discuss map-reduce, which is the method used by CouchDB and some other ‘NoSql’ systems to query the database.

2.5 Transactional Guarantees

A further very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Many systems rely on locking data for reads and writes to provide ACID capabilities [50].

The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: at any given moment in time there is only ‘one version of the truth’ for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].

For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at exactly the same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
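The canonical transfer can be sketched with Python’s sqlite3 module, whose connection context manager commits on success and rolls back if an exception escapes. The Account table and the CHECK constraint used to simulate a failure are assumptions made for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint stands in for any failure that must abort the transfer
conn.execute("CREATE TABLE Account (Id TEXT PRIMARY KEY, "
             "Balance INTEGER NOT NULL CHECK (Balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one atomic unit; any failure undoes both."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE Account SET Balance = Balance + ? WHERE Id = ?", (amount, dst))
            conn.execute("UPDATE Account SET Balance = Balance - ? WHERE Id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft

def balances(conn):
    return dict(conn.execute("SELECT Id, Balance FROM Account"))

transfer(conn, "A", "B", 30)   # succeeds: A=70, B=80
transfer(conn, "A", "B", 500)  # debit fails, so the credit already applied is rolled back
print(balances(conn))          # {'A': 70, 'B': 80}
```

The credit deliberately runs first, so the failed transfer shows the already-applied half being undone rather than never happening.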

The paradigm which is followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.

However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).

Caching data in many places, so that the data is near to the user when it is needed, is one of the most effective ways to reduce latency. There may no longer be a ‘single version of the truth’, but for many applications the trade-off is worthwhile.

2.6 The ‘NoSql’ Movement

‘NoSql’ refers to a range of non-relational database products which have emerged in recent years. The term ‘NoSql’ is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL etc.).¹

‘NoSql’ is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:

• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.

• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of ‘a NoSql Summer’ [1].

2.7 ‘NoSql’ databases at large scale

Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].

The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.

To give an example, if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.

In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but they are guaranteed to converge to a consistent state eventually, once updates stop arriving.

Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacentre were to go offline, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacentres around the world.
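A toy Python model makes the idea concrete. It is an illustration under simplifying assumptions, not how any particular product replicates: each replica accepts writes locally, and a separate synchronisation step later copies newer versions across. Conflicting concurrent writes, which real systems must resolve, are deliberately left out, and the status text is hypothetical.

```python
class Replica:
    """One copy of the data; maps key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value):
        # A local write bumps the version without contacting other replicas
        version = self.data.get(key, (0, None))[0] + 1
        self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))[1]

    def sync_from(self, other):
        # Copy across any entry for which the other replica has a newer version
        for key, (version, value) in other.data.items():
            if version > self.data.get(key, (0, None))[0]:
                self.data[key] = (version, value)

london, singapore = Replica("London"), Replica("Singapore")
london.write("alice/status", "Off to the theatre")

print(singapore.read("alice/status"))  # None: Bob briefly sees a stale view
singapore.sync_from(london)            # replication eventually runs
print(singapore.read("alice/status"))  # Off to the theatre: the replicas have converged
```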

In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:

“Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability, and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”

2.8 Brewer’s ‘CAP’ Theorem

The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not - so a looser model of consistency is considered a price worth paying.

This realisation - that ‘eventual’ consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.

In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.


2.9 Reducing the Impedance Mismatch

The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain.² Another tactic used by application developers is to use SQL views - effectively de-normalising data so that the data model is closer to the application’s object model.

A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].

2.10 Benefits of ‘NoSql’

As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of the steady fall in hardware costs associated with Moore’s Law: disk space has become so economical as to be almost free, and memory is now measured in gigabytes, so that many databases can fit entirely in memory.

• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale: managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.

2.11 Ad-hoc Querying - the ‘Achilles’ Heel’ of ‘NoSql’

It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support - or, to state it more generally, the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are answered from views pre-computed by map-reduce functions, so there is little capability to query the system in an arbitrary way.

(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often, a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].

The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010, the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.

Figure 2.1: British Council project data displayed on Bing Maps

The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
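The matching rules are not described here, so the following is only a hypothetical Python sketch of one simple string-matching approach (a difflib similarity ratio after light normalisation), not the British Council system’s actual C# code. The stop-word list, the 0.9 threshold and the sample names are arbitrary assumptions:

```python
import difflib
import re

def normalise(name):
    """Lower-case, replace punctuation with spaces and drop a few filler words."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in {"the", "of", "and"})

def dedupe(names, threshold=0.9):
    """Map each name to a canonical spelling; the first spelling seen becomes canonical."""
    canonical = []  # (canonical name, normalised form)
    mapping = {}
    for name in names:
        key = normalise(name)
        for canon, canon_key in canonical:
            if difflib.SequenceMatcher(None, key, canon_key).ratio() >= threshold:
                mapping[name] = canon
                break
        else:  # no sufficiently similar name seen yet
            canonical.append((name, key))
            mapping[name] = name
    return mapping

uploads = ["University of Leeds", "The University of Leeds",
           "UNIVERSITY OF LEEDS.", "Leeds College of Art"]
mapping = dedupe(uploads)
for raw, canon in mapping.items():
    print(f"{raw!r} -> {canon!r}")
```

A fixed threshold is the crudest possible policy; a production system would likely also need manual review of borderline matches.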

Outputs from the Activity Mapping project have included letters to MPs, online maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.

Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect using CouchDB as a web caching layer.

The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
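As a rough illustration of the transformation involved, the Python sketch below turns one JSON document into an RSS item carrying a GeoRSS-Simple point. Inside CouchDB this logic would presumably be written in JavaScript and stored in a design document; the field names title, description, lat and lon are assumptions, since the Activity Mapping document structure is not given here:

```python
import json
import xml.etree.ElementTree as ET

GEORSS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS)  # serialise the namespace with the 'georss' prefix

def to_georss_item(doc):
    """Turn one project document into an RSS <item> with a GeoRSS-Simple point."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = doc["title"]
    ET.SubElement(item, "description").text = doc.get("description", "")
    # GeoRSS-Simple expects "latitude longitude", separated by whitespace
    ET.SubElement(item, f"{{{GEORSS}}}point").text = f'{doc["lat"]} {doc["lon"]}'
    return item

# A hypothetical project document
doc = json.loads('{"title": "School partnership", "lat": 51.5074, "lon": -0.1278}')
item_xml = ET.tostring(to_georss_item(doc), encoding="unicode")
print(item_xml)
```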

This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’ - that is, to what extent it is ready for adoption by web applications developers such as myself who are used to developing solutions on current standard platforms.³

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].

3.1 ‘Of the Web’

The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-used quote about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform- and OS¹-agnostic.” [31]

Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

¹ Operating System

Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself - this is much more in line with what we intuitively expect from real-world experience.
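A short Python sketch of the difference: a missing field is simply not there, and the reading code supplies its own default. The second address is a hypothetical example (Hong Kong does not use postal codes):

```python
import json

# The second document omits postalCode entirely rather than storing null
docs = [json.loads(s) for s in (
    '{"city": "New York", "state": "NY", "postalCode": "10021"}',
    '{"city": "Hong Kong"}',
)]

for address in docs:
    # dict.get returns a default for an absent key; no NULL column is needed
    print(address["city"], address.get("postalCode", "(no postal code)"))

print("postalCode" in docs[1])  # False: the field is absent, not empty
```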

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available here [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(‘username’ and ‘password’ are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same statement with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

curl -H "Content-Type: application/json" \
     -X POST http://username:password@mick.couchone.com/addressbook \
     -d @john-smith.json

cURL syntax is explained here [44]. Briefly:

• -H introduces an HTTP header

• -X POST instructs cURL to use an HTTP POST (instead of a GET) request

• -d introduces the data to be sent

• @ denotes that what follows is a file name
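Since any HTTP library will do, the same POST can be built in Python with the standard urllib.request module. This sketch only constructs the request, assuming a local CouchDB at http://127.0.0.1:5984 (no credentials); passing it to urllib.request.urlopen would actually send it:

```python
import json
import urllib.request

COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB instance

def post_document_request(db, doc):
    """Build (but do not send) the POST that stores `doc` in database `db`."""
    return urllib.request.Request(
        url=f"{COUCH}/{db}",
        data=json.dumps(doc).encode("utf-8"),          # the -d payload
        headers={"Content-Type": "application/json"},  # the -H header
        method="POST",                                  # the -X verb
    )

req = post_document_request("addressbook", {"firstName": "John", "lastName": "Smith", "age": 25})
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:5984/addressbook
# urllib.request.urlopen(req) would send it; CouchDB replies with the new document's id and rev.
```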

All data in CouchDB is stored in JSON format together with a unique key and a revision number, so that in CouchDB the resulting document would have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy the output into the JSONLint validator (http://jsonlint.com).

Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia

To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

curl -H "Content-Type: application/json" \
     -X PUT \
     http://username:password@mick.couchone.com/addressbook/british-council-zambia \
     -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon

4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ‘?’).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE \
     "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
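For comparison, a Python sketch that builds (without sending) the equivalent DELETE, again assuming a local server at http://127.0.0.1:5984; urlencode places the revision in the query string:

```python
import urllib.parse
import urllib.request

COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB instance

def delete_document_request(db, doc_id, rev):
    """Build (but do not send) a DELETE; CouchDB requires the current rev."""
    doc_url = f"{COUCH}/{urllib.parse.quote(db)}/{urllib.parse.quote(doc_id, safe='')}"
    query = urllib.parse.urlencode({"rev": rev})  # e.g. "rev=1-da7b..."
    return urllib.request.Request(f"{doc_url}?{query}", method="DELETE")

req = delete_document_request("addressbook", "british-council-zambia",
                              "1-da7bcd810c608d6fbcb9ce92e9ade343")
print(req.get_method(), req.full_url)
```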

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.

As we are updating a ‘named’ document, we use the PUT verb instead of POST.

The following command will update John Smith’s record with the contents of the file john-smith-v2.json:

curl -X PUT \
     "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
     -d @john-smith-v2.json
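The revision check amounts to optimistic concurrency control. The toy Python store below imitates it to show why a stale revision is rejected; the class and its revision format (generation number plus a content hash) are illustrative assumptions, not CouchDB’s implementation:

```python
import hashlib
import json

class Conflict(Exception):
    """Raised when an update names a revision that is no longer current."""

class ToyDocumentStore:
    """In-memory imitation of revision checking (not CouchDB's actual code)."""

    def __init__(self):
        self.docs = {}  # doc id -> (rev, body)

    def put(self, doc_id, body, rev=None):
        current = self.docs.get(doc_id)
        # An existing document may only be replaced by a writer quoting its current rev
        if current is not None and rev != current[0]:
            raise Conflict("Document update conflict")
        generation = 1 if current is None else int(current[0].split("-", 1)[0]) + 1
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()[:32]
        new_rev = f"{generation}-{digest}"
        self.docs[doc_id] = (new_rev, body)
        return new_rev

store = ToyDocumentStore()
rev1 = store.put("john-smith", {"firstName": "John", "age": 25})
rev2 = store.put("john-smith", {"firstName": "John", "age": 26}, rev=rev1)  # accepted
try:
    store.put("john-smith", {"firstName": "John", "age": 27}, rev=rev1)     # stale rev
except Conflict as exc:
    print(exc)  # Document update conflict
```

No locks are held between reading and writing; a writer with an out-of-date revision simply has to re-read the document and try again.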

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags.³

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
     "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
     --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.

The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
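The same upload, built in Python with urllib.request under the same local-server assumption; the placeholder bytes stand in for the contents of zm.png:

```python
import urllib.parse
import urllib.request

COUCH = "http://127.0.0.1:5984"  # assumed local CouchDB instance

def attachment_request(db, doc_id, name, rev, content, content_type):
    """Build (but do not send) a PUT storing `content` as attachment `name`."""
    def q(segment):
        return urllib.parse.quote(segment, safe="")
    query = urllib.parse.urlencode({"rev": rev})
    url = f"{COUCH}/{q(db)}/{q(doc_id)}/{q(name)}?{query}"
    # Raw bytes as the body, like curl's --data-binary
    return urllib.request.Request(url, data=content, method="PUT",
                                  headers={"Content-Type": content_type})

png_bytes = b"\x89PNG\r\n\x1a\n"  # placeholder; in practice open("zm.png", "rb").read()
req = attachment_request("addressbook", "british-council-zambia", "flag",
                         "3-e6fbb83f5aa5c2e2111b5b14559115af", png_bytes, "image/png")
print(req.get_method(), req.full_url)
```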

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in offline mode.

If the addressbook database does not already exist on the local machinewe can create it as follows

curl -X PUT http1270015984addressbook

To replicate the addressbook database to the local machine we use thefollowing command

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook","target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping them where necessary:

curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
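The quoting differences between shells disappear if the replication request is built in a programming language instead. The sketch below assumes Node.js 18 or later (for the built-in fetch); the function names replicationBody and replicate are hypothetical. It sends the same POST to /_replicate as the cURL commands above, with JSON.stringify producing correctly escaped JSON.

```javascript
// Builds the body for CouchDB's POST /_replicate request. JSON.stringify
// handles all the quoting, so nothing needs escaping by hand.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

// Sends the request using the fetch API built into Node.js 18 and later
// (an assumption about the runtime). Credentials for the remote source
// travel inside the JSON body, not in the local request URL.
async function replicate(couchUrl, source, target) {
  const response = await fetch(couchUrl + "/_replicate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: replicationBody(source, target),
  });
  return response.json();
}

// Usage (requires a running CouchDB on the local machine):
// replicate("http://127.0.0.1:5984",
//   "http://username:password@mick.couchone.com/addressbook",
//   "http://127.0.0.1:5984/addressbook").then(console.log);
```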

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Beyond fetching individual documents by id, data is retrieved from CouchDB using map-reduce functions, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google's 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the 'map' stage ranges over the document collection and emits key/value pairs (zero or more) for each document it encounters; the 'reduce' stage is used when a calculation over these intermediate values is required, for example a sum or a count.
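The two stages can be sketched in a few lines of single-process JavaScript. This is only an illustration of the idea under simple assumptions (in-memory data, the function name mapReduce and the sample documents are hypothetical); following the Google paper, the reduce function is applied once per distinct key.

```javascript
// Minimal single-process illustration of the two map-reduce stages.
function mapReduce(docs, mapFn, reduceFn) {
  // Map stage: each document may emit zero or more (key, value) pairs,
  // which are grouped by key.
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn(doc, emit);

  // Reduce stage: combine the intermediate values collected for each key.
  const result = {};
  for (const [key, values] of groups) result[key] = reduceFn(key, values);
  return result;
}

// Hypothetical documents: count seats won per party.
const results = [
  { _id: "Seat A", winner: "lab" },
  { _id: "Seat B", winner: "con" },
  { _id: "Seat C", winner: "lab" },
  { _id: "Seat D" }, // no winner recorded: emits nothing
];
const seats = mapReduce(
  results,
  (doc, emit) => { if (doc.winner) emit(doc.winner, 1); },
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(seats); // { lab: 2, con: 1 }
```

In a distributed system the map calls run on many machines and the grouped values are shipped to reducers; the logic per document and per key is the same.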

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb/), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document 'con' has been defined with two 'views' of the data.

The first view, 'Conservative Votes', uses only a map function: it ranges over the document collection and emits each constituency's Conservative vote as the key, with the document id as the value.

The second view, 'Conservative Votes Total', combines a map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate function such as SUM in SQL).

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}
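To see what the two views compute, they can be run outside CouchDB. The sketch below is a hedged emulation, not the CouchDB view server: sum() is a local stand-in for the helper CouchDB provides to reduce functions, emit is passed in as a parameter instead of being a global, and only the Aberavon figures come from the database; the other two constituencies are made up.

```javascript
// Stand-in for the sum() helper CouchDB provides to reduce functions.
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

// The two views from _design/con, with emit passed in as a parameter.
const views = {
  "Conservative Votes": {
    map: (doc, emit) => emit(doc.con, doc._id),
  },
  "Conservative Votes Total": {
    map: (doc, emit) => emit(null, doc.con),
    reduce: (keys, values) => sum(values),
  },
};

// Runs a view over an array of documents: rows are sorted by key, and a
// reduce (when present) collapses them into a single row with key null.
function queryView(docs, view) {
  const rows = [];
  for (const doc of docs) {
    view.map(doc, (key, value) => rows.push({ id: doc._id, key: key, value: value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  if (!view.reduce) return { rows: rows };
  // CouchDB passes keys to reduce as [key, docid] pairs.
  const total = view.reduce(rows.map((r) => [r.key, r.id]), rows.map((r) => r.value));
  return { rows: [{ key: null, value: total }] };
}

// Aberavon is taken from the document above; the other two are made up.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Constituency X", con: 1500 },
  { _id: "Constituency Y", con: 4200 },
];
console.log(JSON.stringify(queryView(docs, views["Conservative Votes Total"])));
// {"rows":[{"key":null,"value":8762}]}
```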

It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
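The range request works because view rows are stored sorted by key, so startkey and endkey select one contiguous slice. The sketch below models this with binary search over a sorted array; this is an assumption made for clarity, since CouchDB keeps its rows in an on-disk B-tree. The row data and function names are hypothetical.

```javascript
// Index of the first row whose key is >= key (rows must be sorted by key).
function lowerBound(rows, key) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Returns rows with startkey <= key <= endkey (CouchDB treats endkey as
// inclusive by default).
function queryRange(sortedRows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(sortedRows, startkey); i < sortedRows.length; i++) {
    if (sortedRows[i].key > endkey) break;
    result.push(sortedRows[i]);
  }
  return result;
}

// Hypothetical rows from the "Conservative Votes" view (key = votes).
const rows = [
  { key: 850, id: "Seat A" },
  { key: 1000, id: "Seat B" },
  { key: 1720, id: "Seat C" },
  { key: 2000, id: "Seat D" },
  { key: 2600, id: "Seat E" },
];
console.log(queryRange(rows, 1000, 2000).map((r) => r.id)); // [ 'Seat B', 'Seat C', 'Seat D' ]
```

Finding the start of the slice costs a logarithmic search, and the rest is a sequential read; no row outside the range is examined.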

This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every map-reduce function written against a document collection in CouchDB is backed by its own B-tree index, which is built the first time the view is queried. From that point on, the index is updated incrementally: when the view is next read, only the documents added or changed since the last update are processed.

As a result, the entire system is optimised for fast indexed retrieval. Apart from temporary views, which are intended only for development, a CouchDB query cannot perform a 'table scan' at run-time: every read is an indexed lookup.
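Incremental index maintenance can be sketched with a toy model. It is a simplification under loudly stated assumptions: the class ToyViewIndex is hypothetical, changes arrive as an in-memory array ordered by sequence number, deletions are ignored, and rows are kept in memory instead of a B-tree. The point it shows is that each refresh only runs the map function over changes newer than the last sequence number already indexed.

```javascript
// Toy model of an incrementally maintained view index (not CouchDB code).
class ToyViewIndex {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.lastSeq = 0;           // highest change sequence already indexed
    this.rowsByDoc = new Map(); // document id -> rows it emitted
    this.mapCalls = 0;          // counts how often the map function ran
  }

  // Called when the view is read: process only changes newer than lastSeq.
  refresh(changes) {
    for (const change of changes) {
      if (change.seq <= this.lastSeq) continue; // already indexed
      const rows = [];
      this.mapFn(change.doc, (key, value) =>
        rows.push({ key: key, id: change.doc._id, value: value }));
      this.mapCalls++;
      this.rowsByDoc.set(change.doc._id, rows); // replaces rows from an older revision
      this.lastSeq = change.seq;
    }
  }

  // All rows, sorted by key as a view would return them.
  rows() {
    const all = [];
    for (const rows of this.rowsByDoc.values()) all.push(...rows);
    return all.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  }
}

const index = new ToyViewIndex((doc, emit) => emit(doc.con, null));
const changes = [
  { seq: 1, doc: { _id: "Seat A", con: 5000 } },
  { seq: 2, doc: { _id: "Seat B", con: 3000 } },
];
index.refresh(changes);                                      // map runs twice
changes.push({ seq: 3, doc: { _id: "Seat C", con: 4000 } });
index.refresh(changes);                                      // map runs once more, for Seat C only
console.log(index.mapCalls, index.rows().map((r) => r.key)); // 3 [ 3000, 4000, 5000 ]
```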

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'design documents'.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The 'universities' data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the bash script below.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (On Windows, a similar effect could be achieved using a batch file script.)

#!/bin/bash
# host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json

# create the database
curl -X PUT "$host/$database"

# loop over every JSON file in the folder
for filepath in "$folder"/*"$fileextension"
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(basename "$filepath")
  echo "$filename"
  # remove the extension to get the document name
  docname=${filename%"$fileextension"}
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB (@ tells cURL to read the body from the file)
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL (POST is used when we wish CouchDB to assign a random ID to the document).

To put a single document to the web, the command used would be as follows:

curl -X PUT "http://username:password@mick.couchone.com/universities/Aberystwyth%20University" -d @universities/Aberystwyth%20University.json
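The naming convention the script relies on (file name minus extension becomes the document id) can be expressed as a small Node.js function. This is a hedged sketch; the function name docUrlForFile is hypothetical. It assumes, as the example above suggests, that the file names already contain %20 in place of spaces, so no further URL encoding is applied.

```javascript
const path = require("path");

// Derives the CouchDB document URL for a JSON file: the file name minus its
// extension becomes the document id. File names in this project already
// contain "%20" for spaces, so they are used without further encoding.
function docUrlForFile(host, database, filepath, extension = ".json") {
  const docname = path.basename(filepath, extension);
  return host + "/" + database + "/" + docname;
}

console.log(docUrlForFile("http://127.0.0.1:5984", "universities",
  "universities/Aberystwyth%20University.json"));
// http://127.0.0.1:5984/universities/Aberystwyth%20University
```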

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].

For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a string of HTML, which CouchDB sends to the browser as the response.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}
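Because a show function is ordinary JavaScript, it can be tried out locally before it is uploaded. The sketch below is a variant of s1, not the version stored in the database: it escapes HTML special characters and returns a response object with body and headers fields, which CouchDB also accepts from show functions instead of a bare string. The coordinates in the usage line are made up.

```javascript
// A variant of s1 that escapes HTML and returns a response object.
// The helper lives inside the function because each show function is
// stored, and evaluated, on its own.
function s1(doc, req) {
  function escapeHtml(value) {
    return String(value)
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;");
  }
  return {
    headers: { "Content-Type": "text/html" },
    body: "<h1>" + escapeHtml(doc._id) + "</h1>" +
      "<p>latitude: " + escapeHtml(doc.latitude) + "</p>" +
      "<p>longitude: " + escapeHtml(doc.longitude) + "</p>",
  };
}

// Local test with made-up coordinates (not the values stored in the database).
const response = s1({ _id: "Aston University", latitude: 52.48, longitude: -1.89 }, {});
console.log(response.body);
// <h1>Aston University</h1><p>latitude: 52.48</p><p>longitude: -1.89</p>
```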

Note that shows is a reserved field name in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.

Having saved the design document as d1.json, we can upload it to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the id of the document to be rendered
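The parts listed above can be assembled programmatically. In this sketch the function name showUrl is hypothetical; encodeURIComponent is standard JavaScript and produces the %20 seen in the document id.

```javascript
// Assembles a show-function URL from its parts. encodeURIComponent turns
// the space in "Aston University" into %20.
function showUrl(host, database, designDoc, showName, docId) {
  return [
    host,
    encodeURIComponent(database),
    "_design",
    encodeURIComponent(designDoc),
    "_show",
    encodeURIComponent(showName),
    encodeURIComponent(docId),
  ].join("/");
}

console.log(showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University"));
// http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```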


5.3 A more complex 'show' function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the 'show' function which returns the HTML for the map is map1. The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html.

The full listing of the design document, including the map1 'show' function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be backslash-escaped so that the design document remains valid JSON.
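One way to avoid hand-escaping is to write the function as ordinary JavaScript and let a program produce the JSON. The sketch below is a minimal illustration of that idea, not how CouchApp works internally: the helper name designDocJson is hypothetical, and it relies on Function.prototype.toString (which returns a function's source text) and JSON.stringify (which escapes the double quotes).

```javascript
// Write the show function as ordinary JavaScript, double quotes included...
function s1(doc, req) {
  return '<h1 class="name">' + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>";
}

// ...and let JSON.stringify do the escaping. Function.prototype.toString
// returns the source text, which is what a design document stores.
function designDocJson(id, shows) {
  const ddoc = { _id: id, shows: {} };
  for (const [name, fn] of Object.entries(shows)) {
    ddoc.shows[name] = fn.toString();
  }
  return JSON.stringify(ddoc, null, 2);
}

const json = designDocJson("_design/d1", { s1: s1 });
console.log(json); // the quotes around name appear as \" in the output
```

The resulting text could then be saved as d1.json and uploaded with the same cURL command as before.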

When manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.


Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions. Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple 'view' returning document id values

A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document:

• v1 is a very simple view, simply emitting the document ids
• l1 is a very simple list that shows a hyperlink for each document id

We augment the design document with a 'list' function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


Note the built-in functions start and send:

• start is called once, at the beginning of the function, to send the response headers
• send outputs a chunk of the response; here it is called once for each row returned by getRow
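The interplay of start, getRow and send can be tried outside CouchDB with local stand-ins. In this sketch runList and the env object are hypothetical substitutes for what CouchDB supplies (real CouchDB streams the view rows rather than reading them from an array), and l1 is rewritten to receive the three functions as a parameter. For illustration the links point at the s1 show function and the document id is URL-encoded.

```javascript
// Runs a list function against an array of view rows, collecting output.
function runList(listFn, viewRows) {
  let next = 0;
  let headers = null;
  const chunks = [];
  const env = {
    start: (response) => { headers = response.headers; },
    getRow: () => (next < viewRows.length ? viewRows[next++] : null),
    send: (chunk) => { chunks.push(chunk); },
  };
  const tail = listFn(null, {}, env); // a list function may return a final chunk
  if (typeof tail === "string") chunks.push(tail);
  return { headers: headers, body: chunks.join("") };
}

// l1, rewritten to take start/getRow/send as a parameter instead of globals.
function l1(head, req, { start, getRow, send }) {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="/universities/_design/d1/_show/s1/' +
      encodeURIComponent(row.value) + '">' + row.value + "</a></p>");
  }
}

const page = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
]);
console.log(page.body);
// <p><a href="/universities/_design/d1/_show/s1/Aston%20University">Aston University</a></p>
```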

At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple 'view' rendered using a 'list' function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB's management console 'Futon' to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer's perspective is that it creates a set of separate files for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project; CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files in my copy of it in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains map.js and, when needed, reduce.js files for each CouchDB view. A view can be thought of as a query against the database.
• The shows folder contains .js files corresponding to each 'show' function in the design document. A show is a function used to transform a single document into HTML.
• Similarly, the lists folder contains .js files, one for each 'list' function in the eventually generated design document. A list is used to transform a set of documents returned by a view into HTML.
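The correspondence between folders and design document fields can be made concrete with a simplified model. This is an assumption for illustration, not CouchApp's actual code (CouchApp is written in Python and handles many more file types): the function designDocFromFiles is hypothetical and takes an object mapping relative file paths to file contents. Note that when reduce.js is absent, the view simply has no reduce entry.

```javascript
// Simplified model of how the generated folders map onto the design document:
//   views/<name>/map.js    -> views.<name>.map
//   views/<name>/reduce.js -> views.<name>.reduce
//   shows/<name>.js        -> shows.<name>
//   lists/<name>.js        -> lists.<name>
function designDocFromFiles(id, files) {
  const ddoc = { _id: id, views: {}, shows: {}, lists: {} };
  const base = (name) => name.replace(/\.js$/, "");
  for (const [filePath, source] of Object.entries(files)) {
    const parts = filePath.split("/");
    if (parts[0] === "views" && parts.length === 3) {
      const viewName = parts[1];
      ddoc.views[viewName] = ddoc.views[viewName] || {};
      ddoc.views[viewName][base(parts[2])] = source;
    } else if ((parts[0] === "shows" || parts[0] === "lists") && parts.length === 2) {
      ddoc[parts[0]][base(parts[1])] = source;
    }
  }
  return ddoc;
}

// File contents as they might appear in the bc folder (reduce.js deleted).
const ddoc = designDocFromFiles("_design/bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
});
console.log(Object.keys(ddoc.views.v1)); // [ 'map' ]
```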


Figure 6.3: Files generated by CouchApp

The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a 'show' function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


couchapp generate show creates a stub for a show function, which we can fill in using a text editor. A show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js, a file generated by CouchApp

Similarly, couchapp generate list l1 will create a 'list' function l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js. We recall that in CouchDB a view is in effect a query against the database.

As v1 is a very simple view with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, then the view will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

With this in place, running couchapp push from the bc folder sends the design document to the default database. So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This demonstrates how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple view, all. This consists of a map.js file (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a list function to transform the contents of the view:

couchapp generate list georss


This creates a file called georss.js. I added some code, derived from that in Sofa, which 'packages' the data from the view and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The template loops through each of the programmes and, if a programme has a list of countries, emits the flags of those countries.
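The template expects its data in a particular shape: an institutions list, and has_programmes and has_countries flags guarding the nested loops. The sketch below shows one way the 'packaging' step could produce that shape from view rows. It is hypothetical: the actual georss.js is in Appendix C, and the assumed row layout (a programmes array whose entries carry a name and a countries array of country codes), the function name packageInstitutions and the sample values are all illustrative.

```javascript
// Assumed row shape (illustrative, not taken from the real database):
//   { id, value: { latitude, longitude, programmes: [{ name, countries: [code] }] } }
// The result uses the names referenced by the georss.html template.
function packageInstitutions(rows) {
  return {
    institutions: rows.map((row) => {
      const programmes = (row.value.programmes || []).map((p) => {
        const countries = (p.countries || []).map((code) => ({ country: code }));
        return { name: p.name, has_countries: countries.length > 0, countries: countries };
      });
      return {
        id: row.id,
        latitude: row.value.latitude,
        longitude: row.value.longitude,
        has_programmes: programmes.length > 0,
        programmes: programmes,
      };
    }),
  };
}

const view = packageInstitutions([
  {
    id: "Aston University",
    value: {
      latitude: 52.48,
      longitude: -1.89,
      programmes: [{ name: "Erasmus", countries: ["fr", "de"] }, { name: "Programme X" }],
    },
  },
]);
console.log(JSON.stringify(view.institutions[0].programmes[1]));
// {"name":"Programme X","has_countries":false,"countries":[]}
```

Precomputing boolean flags keeps the template free of logic, which fits Mustache's deliberately minimal syntax.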

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSQL' databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. Because CouchDB communicates over HTTP, it can easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has fast access to it, even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, with CouchDB relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is that of cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB's challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a map-reduce function defined in advance in a design document. Many developers are used to 'thinking' in SQL. Map-reduce is a different way of expressing an information need; it is not necessarily more difficult than SQL, but the learning curve is significant.

7.3 Conclusion

The 'NoSQL' movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project's focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSQL' offerings), the set of tools available to developers will grow. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45]: in the coming years, relational databases will continue alongside 'NoSQL' offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\">html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }</style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\">function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to 3 files from J ChrisAndersonrsquos Sofa CouchApp project in order to enable it to serve GeoRSSfeeds

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  // feed element amended to declare the GeoRSS namespace
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  // strip the closing tag so that entries can be streamed after the header
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point; // added: GeoRSS point ("latitude longitude")
  return entry;
};

At the end of lists/index.js:

      alternate : path.absolute(path.show('post', row.id)),
      // point : row.value.loc[1] + " " + row.value.loc[0]
      point : row.value.latitude + " " + row.value.longitude
    });
    // send the entry to client
    send(feedEntry);
  } while (row = getRow());
  // close the loop after all rows are rendered
  return "</feed>";
});
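The `point` value is a GeoRSS Simple point, written as latitude then longitude separated by a space. The hypothetical helper below (not part of Sofa) makes that ordering explicit and handles both the latitude/longitude fields used here and the commented-out `loc` array, whose use of `loc[1]` before `loc[0]` suggests a [longitude, latitude] ordering; that ordering is an assumption.

```javascript
// Hypothetical helper: build a GeoRSS Simple point value ("lat lon")
// from a view row value. Accepts either separate latitude/longitude
// fields or a loc array assumed to be ordered [longitude, latitude].
function georssPoint(value) {
  let lat;
  let lon;
  if (Array.isArray(value.loc)) {
    lon = value.loc[0];
    lat = value.loc[1];
  } else {
    lat = value.latitude;
    lon = value.longitude;
  }
  lat = Number(lat);
  lon = Number(lon);
  // Reject NaN and values outside the valid coordinate ranges.
  if (!(lat >= -90 && lat <= 90) || !(lon >= -180 && lon <= 180)) {
    throw new RangeError("invalid coordinates: " + lat + ", " + lon);
  }
  return lat + " " + lon;
}
```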

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

Further down in templates/edit.html:

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View - in the simplest case the View all, which simply returns all documents.

What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
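The row-to-stash mapping does not depend on CouchDB itself, so it can be exercised in plain Node.js by passing an in-memory array of rows in place of List.withRows. The sketch below is a simplified, hypothetical restatement of that mapping (several fields omitted), useful for checking the shape of the data the template receives.

```javascript
// Simplified restatement of the mapping in georss.js (some fields omitted):
// each view row becomes one institution object for the Mustache template.
function toInstitution(row) {
  var institution = row.value;
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: institution.programmes ? true : false,
    programmes: (institution.programmes || []).map(function (programme) {
      return {
        name: programme.name,
        url: programme.url,
        has_countries: programme.countries ? true : false,
        // Wrap each country code so the template can refer to {{country}}.
        countries: (programme.countries || []).map(function (country) {
          return { country: country };
        })
      };
    })
  };
}

// Stand-in for List.withRows: map over an in-memory array of view rows.
function buildStash(rows) {
  return { institutions: rows.map(toInstitution) };
}
```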

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
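To make the section semantics concrete, the plain JavaScript below is a hand-written equivalent (not Mustache itself) of what the nested programmes and countries sections produce inside <content>, ignoring whitespace and assuming names and codes contain no characters that Mustache would HTML-escape. The function name is hypothetical; the flag URL pattern follows the template.

```javascript
// Hand-written equivalent of the nested Mustache sections in georss.html:
// one <p> per programme, and one flag <img> per country in that programme.
// The markup is entity-escaped because it sits inside an Atom <content> element.
function renderContent(institution) {
  let out = "";
  for (const programme of institution.programmes || []) {
    out += "&lt;p&gt;" + programme.name;
    for (const country of programme.countries || []) {
      out += "&lt;img src=\"http://mick.couchone.com/countries/" + country +
        "/flag\" alt=\"" + country + "\" title=\"" + country + "\"&gt;";
    }
    out += "&lt;/p&gt;";
  }
  return out;
}
```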

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload PNG files from a folder so that the image files become attachments to a CouchDB document.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*.png

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  # strip the .png extension to get the document name
  docname=$(echo "$filename" | sed -e 's/\.png$//')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done
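The string handling inside the loop can be cross-checked with a small Node.js sketch that derives the same document name and attachment URL from a file path. It only computes strings and sends nothing; the host comes from the script, the credentials are the same placeholders, and the function name is hypothetical.

```javascript
const path = require("path");

// Mirror the script's string handling: countries/png/de.png -> document "de",
// attachment URL <base>/countries/de/flag.
function flagUpload(filepath, baseUrl) {
  const filename = path.basename(filepath);        // e.g. "de.png"
  const docname = path.basename(filename, ".png"); // e.g. "de"
  return {
    docname: docname,
    url: baseUrl + "/countries/" + encodeURIComponent(docname) + "/flag",
    contentType: "image/png"
  };
}
```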

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


Acknowledgments

With thanks to my supervisor, Nigel Martin, for timely, patient and good-humoured advice - you really helped me to stay on track.

Thanks also to my colleagues and ex-colleagues at the British Council, in particular Terry Pyle, Kshipra Singhvi, Phil Street, Spero Blassoples, Michael Sadler, Roger Moran and Masoud Ahanchian.

Thank you, Su Peneycad, for your encouragement and wonderful skill at spotting typos.

I had the great fortune to be doing this project during the summer of 2010, when the (first) 'NoSql Summer' was in full swing. Thanks to the organisers of the London meetups, Neil Robbins and Makoto Inoue, for providing the framework for such interesting discussions.

Finally, thank you to the active and friendly CouchDB community - J. Chris Anderson, Volker Mische, Jason Smith and many others - for your help on the couchdb-user mailing list, and to Damien Katz for starting the whole thing.

Contents

1 Introduction
   1.1 Motivation
   1.2 Does 'one size' still 'fit all'?
   1.3 Organisation

2 Background
   2.1 The Relational Model
   2.2 Relational Database Management Systems
   2.3 Normalisation
   2.4 SQL
   2.5 Transactional Guarantees
   2.6 The 'NoSql' Movement
   2.7 'NoSql' databases at large scale
   2.8 Brewer's 'CAP' Theorem
   2.9 Reducing the Impedance Mismatch
   2.10 Benefits of 'NoSql'
   2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
   2.12 The Activity Mapping Project
   2.13 Raising Awareness of British Council Impact in the UK
   2.14 Using CouchDB to Serve Mapping Data

3 Introduction to CouchDB
   3.1 'Of the Web'
   3.2 Some History
   3.3 Document-Oriented
   3.4 Erlang
   3.5 How Ubuntu uses CouchDB
   3.6 Desktop Couch, Python and Quickly

4 Using CouchDB
   4.1 Motivation
   4.2 Hosted CouchDB Service Providers
   4.3 Installing CouchDB
   4.4 Inserting a Document into a CouchDB Database
   4.5 Deleting a Document
   4.6 Updating a Document
   4.7 Adding Attachments
   4.8 Replication
   4.9 Querying a CouchDB Database using Map-Reduce

5 Serving HTML from CouchDB
   5.1 Bulk upload of JSON documents
   5.2 A CouchDB Design Document
   5.3 A more complex 'show' function
   5.4 Views and Lists
   5.5 Lessons Learned

6 Serving GeoRSS using CouchApp
   6.1 Introduction to CouchApp
   6.2 Sofa - a Blogging Application
   6.3 Developing with CouchApp
   6.4 Deployment
   6.5 Using templates

7 Critical Assessment and Conclusion
   7.1 Why might a developer choose CouchDB?
   7.2 CouchDB's challenges
   7.3 Conclusion

A Design Document d1

B GeoRSS on Sofa

C georss.js

D georss.html

E Bash Script to Upload Country Flag Files

Section 1

Introduction

The MSc Computer Science project described in this report aims to investigate a relatively new database product called CouchDB from a web application development perspective [10] [13]. I have built a simple business application and use this as the test case from which to evaluate CouchDB's suitability as a web development platform.

1.1 Motivation

The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of 'key-value store' products which have emerged recently to deal with the challenges of very large data volumes in online environments.

The background to the project is one of rapid change, inspired by the emergence of a set of innovative, open-source, non-relational (also known as 'NoSql'¹) database products.

These 'NoSql' database products are challenging the accepted practice of using relational database management systems as the only mechanism developers use for storing data. Relational databases have been very successful in the past and are likely to remain so, but in a certain subset of usage scenarios non-relational systems are starting to take their place.

1.2 Does 'one size' still 'fit all'?

One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:

¹ Pronounced 'No-See-Quel'.


"The last 25 years of commercial DBMS development can be summed up in a single phrase: 'One size fits all'. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.

[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser."

Why does this matter? Does 'one size fit all'? Until recent years the answer has almost always been 'yes': practically every data-centric IT project over the past 25 years has used a relational database to persist data, and systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.

1.3 Organisation

The sections are organised in the following way:

• in Section 2 I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project;

• in Section 3 I introduce the open-source, document-oriented database CouchDB;

• in Section 4 I provide an overview of CouchDB's application programming interface;

• in Section 5 I describe some initial experiments in serving HTML from a CouchDB database - that is, using CouchDB to store both data and application code;

• in Section 6 I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps;

• in Section 7 I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.

Section 2

Background

In recent years a new generation of data stores has emerged which approach data persistence in a different way from the established relational model.

In this section I briefly discuss the relational model and describe some of the recent challenges which have led to the development of newer, alternative database systems.

2.1 The Relational Model

The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].

One description [54] of Codd's relational model reads as follows:

"The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains."

A relation is defined as a set of tuples, where a tuple is a set of attribute values. Conventionally, relations are shown visually as tables, with tuples as rows and attributes as columns.
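In symbols, the definition can be written as follows, with the Staff table introduced under Normalisation below as a worked instance (the domain names Name and Nationality are chosen here for illustration):

```latex
\begin{aligned}
R &\subseteq D_1 \times D_2 \times \cdots \times D_n
   = \{ (d_1, \ldots, d_n) \mid d_i \in D_i \} \\
\mathit{Staff} &\subseteq \mathit{Name} \times \mathit{Nationality}, \quad
\mathit{Staff} = \{ (\text{Bob}, \text{Australia}),\ (\text{Jane}, \text{Sweden}) \}
\end{aligned}
```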

The main advantage of the relational database is its precise, mathematically grounded organisation of data. In this context it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or 'NoSql' databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, whether a database is 'relational' or not is more a matter of how applications use it: a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].


2.2 Relational Database Management Systems

The term 'Relational Database Management System' refers to the class of databases which follow Codd's relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open-source (MySQL, PostgreSQL) databases in use over the past three decades have been based on this model.

These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term 'database' has become associated with relational databases to such an extent that non-relational databases are often referred to as 'non-relational data stores' for clarity. (Formally, and in its original meaning, the term 'database' simply means a collection of data.)

2.3 Normalisation

A key factor in relational data storage is data normalisation. Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by removing duplication [48].

The first level of normalisation, 'first normal form', requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.

Consider, for example, a table (let's call it 'Staff') containing staff records:

Name  | Nationality
------+-------------
Bob   | Australia
Jane  | Sweden

All is well until the system encounters a staff member with multiple nationalities:

Name  | Nationality
------+------------------------------
Bob   | Australia
Jane  | Sweden
Alice | United Kingdom, South Africa

This second table breaks the rules of normalisation: Alice's Nationality value is no longer atomic, so the table is not in first normal form. The standard way to solve this problem would be to create two more tables, a 'Nationality' table and a 'StaffNationality' table, linked with primary and foreign keys.
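That decomposition can be sketched in JavaScript, with arrays of objects standing in for tables. The numeric ids are hypothetical surrogate keys, and StaffNationality holds one row per (staff member, nationality) pair, so every cell stays atomic.

```javascript
// Un-normalised input: Alice's nationality is a list, not an atomic value.
const staffRecords = [
  { name: "Bob", nationalities: ["Australia"] },
  { name: "Jane", nationalities: ["Sweden"] },
  { name: "Alice", nationalities: ["United Kingdom", "South Africa"] }
];

// Split into three relations whose cells hold only atomic values.
function normalise(records) {
  const staff = [];
  const nationality = [];
  const staffNationality = [];
  const nationalityIds = new Map(); // nationality name -> primary key

  records.forEach(function (record, index) {
    const staffId = index + 1; // hypothetical surrogate primary key
    staff.push({ staffId: staffId, name: record.name });
    for (const name of record.nationalities) {
      if (!nationalityIds.has(name)) {
        nationalityIds.set(name, nationalityIds.size + 1);
        nationality.push({ nationalityId: nationalityIds.get(name), name: name });
      }
      // Link row: two foreign keys referencing Staff and Nationality.
      staffNationality.push({ staffId: staffId, nationalityId: nationalityIds.get(name) });
    }
  });
  return { staff, nationality, staffNationality };
}

const tables = normalise(staffRecords);
```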

While this is a trivial example, it shows that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.

In a non-relational database (such as CouchDB), Alice's record could be stored as follows:

{
  "name": "Alice",
  "nationality": [
    "United Kingdom",
    "South Africa"
  ]
}

This record (or 'document' in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason CouchDB and other 'NoSql' products are sometimes referred to as 'schema-less' database management systems.

The notation used is JavaScript Object Notation (JSON), which is discussed in more detail in Section 3.

2.4 SQL

A major benefit of following normalisation rules is easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].

To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table needs to store atomic values. In our example above, this query would treat 'United Kingdom, South Africa' as a distinct nationality.
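The effect can be reproduced with a small JavaScript sketch that groups rows by the exact Nationality value, as GROUP BY does. It assumes, for illustration, that Alice's two nationalities were stored as a single text value.

```javascript
// Un-normalised Staff table: Alice's two nationalities stored as one string.
const staffTable = [
  { name: "Bob", nationality: "Australia" },
  { name: "Jane", nationality: "Sweden" },
  { name: "Alice", nationality: "United Kingdom, South Africa" }
];

// Emulates SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality:
// rows are grouped by the exact column value.
function countByNationality(rows) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row.nationality, (counts.get(row.nationality) || 0) + 1);
  }
  return counts;
}

const groups = countByNationality(staffTable);
```

Neither 'United Kingdom' nor 'South Africa' appears as a group of its own, which is exactly the problem normalisation avoids.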

In Section 4 I discuss map-reduce, the method used by CouchDB and some other 'NoSql' systems to query the database.
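As a preview, the sketch below imitates a CouchDB-style view in plain JavaScript: a map function emits one (nationality, 1) pair per entry in each document's nationality array, and a reduce function sums the values per key. The `runView` driver and the `emit` callback passed to `map` are hypothetical stand-ins; in CouchDB itself `emit` is a global available inside map functions, and reduce also receives a `rereduce` flag, details simplified here.

```javascript
// Documents shaped like Alice's record above.
const docs = [
  { name: "Bob", nationality: ["Australia"] },
  { name: "Jane", nationality: ["Sweden"] },
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] }
];

// Map: emit one row per nationality, so arrays need no schema change.
function map(doc, emit) {
  for (const n of doc.nationality) {
    emit(n, 1);
  }
}

// Reduce: add up the values emitted for one key.
function reduce(keys, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical driver: collect emitted rows, group by key, reduce each group.
function runView(documents) {
  const groups = new Map();
  for (const doc of documents) {
    map(doc, function (key, value) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(null, values);
  }
  return result;
}

const counts = runView(docs);
```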

2.5 Transactional Guarantees

A further very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Many systems rely on locking of data access (reads and writes) to provide ACID capabilities [50].

The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: at any given moment there is only 'one version of the truth' for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system it must appear as if the changes happen at the same time in both places [37].

For a banking application it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at exactly the same time. Even in the event of hardware or network failure, traditional relational database management systems must be able to roll back to a consistent state, so that data integrity is assured at all times.
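The all-or-nothing behaviour of the transfer can be illustrated with a short in-memory JavaScript sketch, assuming accounts are held in a `Map`. It shows only the visible effect; a real RDBMS achieves it with logging and locking rather than by copying state.

```javascript
// Transfer money so that either both balances change or neither does.
function transfer(accounts, from, to, amount) {
  if (!(amount > 0)) {
    throw new Error("amount must be positive");
  }
  // Work on a copy so a failure part-way through leaves the original untouched.
  const next = new Map(accounts);
  if (!next.has(from) || !next.has(to)) {
    throw new Error("unknown account");
  }
  next.set(from, next.get(from) - amount); // debit
  if (next.get(from) < 0) {
    throw new Error("insufficient funds"); // roll back: the copy is discarded
  }
  next.set(to, next.get(to) + amount); // credit
  return next; // commit: the caller replaces its state in one step
}
```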

The paradigm followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.

However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet we have become more comfortable with multiple versions of the same data existing in different places - for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).

Caching data in many places, so that the data is near to the user when it is needed, is a highly effective way to reduce latency. There may no longer be a 'single version of the truth', but for many applications the trade-off is worthwhile.
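The trade-off can be made concrete with a small time-to-live (TTL) cache: reads within the TTL are served locally (fast, but possibly stale), and only expired entries go back to the origin. The names `makeCache` and `fetchFromOrigin` and the TTL value are hypothetical; the clock is injected so that the behaviour is easy to test.

```javascript
// A read-through cache that tolerates staleness for up to ttlMs milliseconds.
function makeCache(fetchFromOrigin, ttlMs, now = Date.now) {
  const entries = new Map(); // key -> { value, fetchedAt }
  return function get(key) {
    const hit = entries.get(key);
    if (hit && now() - hit.fetchedAt < ttlMs) {
      return hit.value; // served locally: low latency, possibly stale
    }
    const value = fetchFromOrigin(key); // slow path: go back to the source
    entries.set(key, { value: value, fetchedAt: now() });
    return value;
  };
}
```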

2.6 The 'NoSql' Movement

'NoSql' refers to a range of non-relational database products which have emerged in recent years. The term 'NoSql' is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.)².

'NoSql' is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:

• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema;

• 'NoSql' databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

² During the summer of 2010, as this project report was being prepared, a set of meetings took place worldwide under the theme of 'a NoSql Summer' [1].

2.7 'NoSql' databases at large scale

Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Such services originally began with single-node databases. As they grew into global brands they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in datacentres around the world [16].

The experience of these large-scale data providers was that in many application domains it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.

To give an example: if Alice in London updates her Facebook status, it doesn't really matter if Bob in Singapore sees a 'stale' version for a few seconds. In a foreign exchange trading application, of course, those few seconds would make a huge difference. 'NoSql' systems are generally intended for the Facebook scenario; foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.

In the Facebook example we may be content with weaker forms of consistency. The data is held in replicas around the world so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state at all times, but they are guaranteed to converge eventually to a consistent state once updates stop arriving.
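One common way to make replicas converge is last-writer-wins, shown here as a generic technique rather than the mechanism of any particular product (CouchDB, for instance, resolves conflicts differently). Each replica keeps the write with the highest timestamp, and replicas exchange state until they agree. The sketch assumes timestamps are comparable across replicas, with the replica id breaking ties.

```javascript
// Generic last-writer-wins replication sketch (not CouchDB's own mechanism).
// Each replica maps key -> { value, ts, origin }.
function makeReplica(id) {
  return { id: id, data: new Map() };
}

// Pick the later write; ties go to the larger replica id so that every
// replica chooses the same winner.
function newer(a, b) {
  if (!a) return b;
  if (!b) return a;
  if (a.ts !== b.ts) return a.ts > b.ts ? a : b;
  return a.origin > b.origin ? a : b;
}

// A local write only updates the replica it is made on.
function write(replica, key, value, ts) {
  const candidate = { value: value, ts: ts, origin: replica.id };
  replica.data.set(key, newer(replica.data.get(key), candidate));
}

// Anti-entropy exchange: afterwards both replicas hold the same winners.
function sync(r1, r2) {
  const keys = new Set([...r1.data.keys(), ...r2.data.keys()]);
  for (const key of keys) {
    const winner = newer(r1.data.get(key), r2.data.get(key));
    r1.data.set(key, winner);
    r2.data.set(key, winner);
  }
}
```

Between a write in London and the next `sync`, Singapore still returns the old status, which is the 'stale but available' behaviour described above.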

Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacentre were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacentres around the world.

In a paper describing the system used by Yahoo [16] the authors de-scribe this approach to consistency as follows

ldquoOccasionally an entire datacenter will go down (eg if thepower is cut) or become unreachable (eg if the network cableis cut) and then any records mastered in that datacenter will

SECTION 2 BACKGROUND 10

become unwriteable This scenario exposes the known trade-offbetween consistency availability and partition tolerance onlytwo of these three properties can be guaranteed at all timesSince our database is global partitions will happen and cannotcause an outage and thus in reality we only have a choice be-tween consistency and availabilityrdquo

28 Brewerrsquos lsquoCAPrsquo Theorem

The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential but strict consistency is not - so a looser model of consistency is considered a price worth paying.

This realisation - that ‘eventual’ consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.

In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere from web servers to desktops, even hand-held devices and mobile phones.

2.9 Reducing the Impedance Mismatch

The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain².

Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application’s object model.

A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
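To make this concrete, the following minimal sketch (JavaScript, runnable with Node.js; the object is a hypothetical example modelled on the address-book document used later in this report) keeps a person, their address and a list of phone numbers together as one JSON document. A normalised relational design would typically split the same data across a people table and a phone numbers table joined by a foreign key, with mapping code to reassemble the object.

```javascript
// Hypothetical in-memory object, nested the way application code holds it.
const person = {
  firstName: "John",
  lastName: "Smith",
  address: { city: "New York", postalCode: "10021" },
  phoneNumber: [
    { type: "home", number: "212 555-1234" },
    { type: "fax", number: "646 555-4567" },
  ],
};

// Serialise the whole object graph into a single JSON document;
// no decomposition into separate tables is needed.
const documentBody = JSON.stringify(person);

// Parsing the document restores the same nested structure.
const restored = JSON.parse(documentBody);

console.log(restored.phoneNumber.length); // 2
console.log(restored.address.city); // New York
```

Nothing here is specific to CouchDB: the point is only that the stored form and the in-memory form share one structure.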

2.10 Benefits of ‘NoSql’

As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of steadily falling hardware costs (the trend popularly associated with Moore’s Law): disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.

• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale: managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing, and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.

2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’

It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support - or (to state it more generally) the lack of ad-hoc querying support. As we will see, in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.

(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way the ‘NoSql’ solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].

The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.

Figure 2.1: British Council project data displayed on Bing Maps

The data from these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format, and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.

Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.

Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect using CouchDB as a web caching layer.

The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.

This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’ - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms³.

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].

3.1 ‘Of the Web’

The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-used quote about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]

Normally, a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

¹ Operating System

Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence from the document itself - this is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
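A minimal sketch of what this flexibility means for application code, assuming documents shaped like the two examples above: a hypothetical helper, formatLabel, builds a printable address label by checking which fields are actually present, instead of testing for NULL columns. It is plain JavaScript for Node.js and not part of any CouchDB API.

```javascript
// Two documents of different shapes that may live in the same
// 'addressbook' database (values from the examples above).
const docs = [
  {
    firstName: "John",
    lastName: "Smith",
    address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" },
  },
  {
    company: "British Council Zambia",
    address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" },
  },
];

// Hypothetical helper: builds a printable label from whichever fields are
// present. A missing field is simply skipped; there is no NULL to test for.
function formatLabel(doc) {
  const name = "company" in doc ? doc.company : `${doc.firstName} ${doc.lastName}`;
  const address = doc.address || {};
  const lines = [name];
  if ("streetAddress" in address) lines.push(...address.streetAddress.split("\n"));
  const cityLine = [address.city, address.state, address.postalCode]
    .filter((part) => part !== undefined)
    .join(" ");
  if (cityLine) lines.push(cityLine);
  if ("country" in address) lines.push(address.country);
  return lines.join("\n");
}

for (const doc of docs) {
  console.log(formatLabel(doc));
}
```

The same code handles both shapes because the checks are on the presence of a field, which is exactly how its absence is recorded in the document.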

In Section 4 I will demonstrate how these documents are uploaded toand retrieved from a CouchDB server

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.

4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same request with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)
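Since any HTTP client can do what cURL does here, the same two requests can be sketched in JavaScript. This assumes Node.js 18 or later, which provides a global fetch(); the host and credentials are placeholders, and the helper names (basicAuthHeader, databaseRequest, send) are hypothetical. One difference from the cURL commands: fetch() refuses URLs with embedded credentials, so the sketch sends them in an HTTP Basic Authorization header instead.

```javascript
// Placeholder host for a hosted CouchDB instance.
const HOST = "http://mick.couchone.com";

// fetch() rejects URLs that embed credentials (user:pass@host), so the
// credentials travel in an HTTP Basic Authorization header instead.
function basicAuthHeader(username, password) {
  const token = Buffer.from(`${username}:${password}`).toString("base64");
  return { Authorization: `Basic ${token}` };
}

// Describes PUT /<db> (create) or DELETE /<db> (delete) without sending it.
function databaseRequest(method, dbName, username, password) {
  return {
    url: `${HOST}/${encodeURIComponent(dbName)}`,
    init: { method, headers: basicAuthHeader(username, password) },
  };
}

// Sends a request described by databaseRequest (requires Node.js 18+).
async function send({ url, init }) {
  const response = await fetch(url, init);
  // CouchDB answers with a small JSON body, e.g. {"ok":true} on success.
  return { status: response.status, body: await response.json() };
}

// Usage (not executed here):
// send(databaseRequest("PUT", "addressbook", "username", "password")).then(console.log);
// send(databaseRequest("DELETE", "addressbook", "username", "password")).then(console.log);
```

Separating the description of a request from sending it keeps the URL and header logic easy to check without a running server.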

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.

curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

cURL syntax is explained here [44]. Briefly:

• -H introduces an HTTP header

• -X POST instructs cURL to use an HTTP POST request (instead of a GET)

• -d introduces the data to be sent

• @ denotes that what follows is a file name
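For comparison, here is a sketch of the same POST in JavaScript, assuming Node.js 18+ and a local CouchDB at http://127.0.0.1:5984 (a placeholder, with authentication omitted). The helper names are hypothetical. Instead of @john-smith.json, the document is passed as an object and serialised with JSON.stringify; CouchDB replies with the generated id and the first revision.

```javascript
// Placeholder URL of a local database; authentication omitted for brevity.
const DB_URL = "http://127.0.0.1:5984/addressbook";

// Describes the POST request: the server, not the client, picks the _id.
function postDocumentRequest(doc) {
  return {
    url: DB_URL,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(doc),
    },
  };
}

// Sends the request (Node.js 18+) and returns the generated id and revision.
async function createDocument(doc) {
  const { url, init } = postDocumentRequest(doc);
  const response = await fetch(url, init);
  // A successful reply looks like {"ok":true,"id":"...","rev":"1-..."}.
  const result = await response.json();
  return { id: result.id, rev: result.rev };
}

// Usage (not executed here):
// createDocument({ firstName: "John", lastName: "Smith", age: 25 }).then(console.log);
```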

All data in CouchDB is stored in JSON format, together with a unique key and a revision number, so in CouchDB the resulting document would have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy the output into the JSONLint validator: http://jsonlint.com

Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia

To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon
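Choosing our own _id means the id becomes part of the URL, so characters such as spaces must be escaped. A sketch in JavaScript (the host is a placeholder and the helper names are hypothetical): encodeURIComponent turns a space into %20, which is how an id such as ‘Aberystwyth University’ would be addressed. The resulting request can be passed to fetch(url, init) in Node.js 18+.

```javascript
// Placeholder database URL.
const DB_URL = "http://mick.couchone.com/addressbook";

// Escapes the chosen _id so that it is safe to use as a URL path segment.
function documentUrl(docId) {
  return `${DB_URL}/${encodeURIComponent(docId)}`;
}

// Describes a PUT that stores `doc` under the given _id.
function putDocumentRequest(docId, doc) {
  return {
    url: documentUrl(docId),
    init: {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(doc),
    },
  };
}

console.log(documentUrl("british-council-zambia"));
// http://mick.couchone.com/addressbook/british-council-zambia
console.log(documentUrl("Aberystwyth University"));
// http://mick.couchone.com/addressbook/Aberystwyth%20University
```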

4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
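Building such URLs by hand is error-prone. A small sketch using the standard URL class, which is available globally in Node.js and in browsers, puts the revision into the query string for us. The helper name is hypothetical and the host is a placeholder; the id and revision are the ones from the command above.

```javascript
// Describes a DELETE for one specific revision of a document.
function deleteDocumentRequest(dbUrl, docId, rev) {
  const url = new URL(`${dbUrl}/${encodeURIComponent(docId)}`);
  url.searchParams.set("rev", rev); // adds ?rev=... with proper escaping
  return { url: url.toString(), init: { method: "DELETE" } };
}

const req = deleteDocumentRequest(
  "http://mick.couchone.com/addressbook",
  "british-council-zambia",
  "1-da7bcd810c608d6fbcb9ce92e9ade343"
);
console.log(req.url);
// http://mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
```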

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied, to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error (HTTP status 409) if an update is attempted on an ‘old’ version of a document.

As we are updating a ‘named’ document, we use the PUT verb instead of POST.

The following command will update John Smith’s record with the contents of the file john-smith-v2.json:

curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
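The rule that an update must quote the current revision is a form of optimistic concurrency control. The toy model below (a hypothetical ToyStore class, in memory, for illustration only) mimics the behaviour described above: a write that quotes the latest _rev succeeds and creates a new revision, while a write based on an older revision is refused with a conflict. It does not reproduce how CouchDB actually computes revision ids, which combine a generation number with a hash.

```javascript
// Toy in-memory store illustrating revision checks; NOT CouchDB's code.
// Real revision ids are "<generation>-<hash>"; here a counter stands in
// for the hash to keep the example short.
class ToyStore {
  constructor() {
    this.docs = new Map();
    this.counter = 0;
  }

  put(id, doc) {
    const current = this.docs.get(id);
    // An update must quote the current _rev; a stale or missing one is
    // refused, as CouchDB refuses it with 409 "Document update conflict".
    if (current && doc._rev !== current._rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    this.counter += 1;
    const rev = `${generation}-${this.counter}`;
    this.docs.set(id, { ...doc, _id: id, _rev: rev });
    return { ok: true, id, rev };
  }
}

const store = new ToyStore();
const first = store.put("john-smith", { firstName: "John", age: 25 });
const second = store.put("john-smith", { firstName: "John", age: 26, _rev: first.rev });
// A client still holding the first revision is turned away.
const stale = store.put("john-smith", { firstName: "John", age: 27, _rev: first.rev });
console.log(first.rev, second.rev, stale.error); // 1-1 2-2 conflict
```

A client that receives the conflict would typically re-read the document to obtain the latest _rev, re-apply its change and try again.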

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons³ from http://www.famfamfam.com/lab/icons/flags/.

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.

The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
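The same upload can be sketched from Node.js 18+ (global fetch). This assumes zm.png is in the working directory; the host is a placeholder, authentication is omitted, and the helper names are hypothetical. As with the cURL command, the attachment name (flag) is appended to the document URL, the current revision goes in the query string, and the raw image bytes form the request body.

```javascript
const fs = require("node:fs");

// Describes a PUT of binary data as a named attachment of a document.
function attachmentRequest(dbUrl, docId, attachmentName, rev, bytes) {
  const url = new URL(
    `${dbUrl}/${encodeURIComponent(docId)}/${encodeURIComponent(attachmentName)}`
  );
  url.searchParams.set("rev", rev);
  return {
    url: url.toString(),
    // The body is sent as raw bytes, like cURL's --data-binary.
    init: { method: "PUT", headers: { "Content-Type": "image/png" }, body: bytes },
  };
}

// Reads the image and uploads it (requires a reachable CouchDB).
async function uploadFlag() {
  const bytes = fs.readFileSync("zm.png");
  const { url, init } = attachmentRequest(
    "http://mick.couchone.com/addressbook",
    "british-council-zambia",
    "flag",
    "3-e6fbb83f5aa5c2e2111b5b14559115af",
    bytes
  );
  const response = await fetch(url, init);
  return response.json(); // a successful reply includes the document's new rev
}
```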

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping them where necessary:

curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
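Part of the awkwardness above is quoting JSON inside a shell command. When the request is built in a programming language instead, JSON.stringify takes care of the quoting on every platform. The following sketch (hypothetical helper name; the URLs are the placeholders from this section) produces the same body as the commands above; it could then be sent with fetch(req.url, req.init) from Node.js 18+.

```javascript
// Describes a one-off replication request to the local CouchDB.
function replicationRequest(localCouch, source, target) {
  return {
    url: `${localCouch}/_replicate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // JSON.stringify produces correctly quoted JSON on any platform.
      body: JSON.stringify({ source, target }),
    },
  };
}

const req = replicationRequest(
  "http://127.0.0.1:5984",
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(req.init.body);
// {"source":"http://username:password@mick.couchone.com/addressbook","target":"http://127.0.0.1:5984/addressbook"}
```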

4.9 Querying a CouchDB Database using Map-Reduce

So far, we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the ‘map’ stage ranges over the document collection and emits zero or more key-value pairs for each document it encounters; the ‘reduce’ stage is used when an aggregate calculation over these intermediate values is required.
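The two stages can be imitated in a few lines of JavaScript, which helps when reasoning about what a view will return. This is a toy runner (hypothetical runView), not CouchDB's view server: in CouchDB, emit is a global function supplied by the view server and the reduce function may also be called in a ‘rereduce’ pass, whereas here emit is passed in explicitly and there is a single reduce pass. The Aberavon figure comes from the example document shown below; the other two rows are made-up values for illustration.

```javascript
// Toy view runner: map(doc, emit) may emit any number of key/value rows per
// document; if a reduce function is given, it folds all emitted values.
function runView(docs, map, reduce) {
  const rows = [];
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  if (!reduce) return rows;
  const keys = rows.map((row) => row.key);
  const values = rows.map((row) => row.value);
  return [{ key: null, value: reduce(keys, values) }];
}

// Stand-in for the sum() helper that CouchDB's JavaScript view server provides.
const sum = (values) => values.reduce((total, v) => total + v, 0);

const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example A", con: 1500 }, // made-up value
  { _id: "Example B", con: 2500 }, // made-up value
];

// Same logic as a view that emits (null, doc.con) and reduces with sum().
const total = runView(docs, (doc, emit) => emit(null, doc.con), (keys, values) => sum(values));
console.log(total); // [ { key: null, value: 7062 } ]
```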

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.

The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.

The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate such as SUM in SQL; when results are grouped by key, reduce functions play a role similar to GROUP BY).

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}

It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
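Two details of this request are worth spelling out, and a short JavaScript sketch shows both. First, startkey and endkey are JSON values, so a number is passed as-is but a string key would keep its double quotes. Second, the range is a walk along the view's sorted index, with both ends included by default. The helpers are hypothetical, the in-memory sort stands in for CouchDB's B-Tree, and apart from Aberavon the rows are made-up values.

```javascript
// Hypothetical helper: builds a view URL, JSON-encoding each query value.
function viewUrl(dbUrl, designDoc, viewName, params) {
  const url = new URL(`${dbUrl}/_design/${designDoc}/_view/${encodeURIComponent(viewName)}`);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, JSON.stringify(value));
  }
  return url.toString();
}

// Toy stand-in for the view index: rows as emitted by the
// 'Conservative Votes' map function (key = con, value = _id).
function rangeQuery(rows, startkey, endkey) {
  return rows
    .slice()
    .sort((a, b) => a.key - b.key) // numeric keys sort numerically
    .filter((row) => row.key >= startkey && row.key <= endkey); // both ends inclusive
}

const rows = [
  { key: 3062, value: "Aberavon" },
  { key: 1999, value: "Example B" }, // made-up row
  { key: 1500, value: "Example A" }, // made-up row
];

console.log(viewUrl("http://localhost:5984/election-2005", "con", "Conservative Votes", { startkey: 1000, endkey: 2000 }));
// http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000
console.log(rangeQuery(rows, 1000, 2000).map((row) => row.value)); // [ 'Example A', 'Example B' ]
```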

This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities - all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, the index is maintained incrementally: when the view is next queried, only the documents added or changed since the last update need to be processed.

As a result, the entire system is optimised for fast indexed retrieval: a query against a stored view never performs a ‘table scan’ at run-time, since every read is an indexed lookup. (CouchDB’s temporary views, intended for development use, are the exception, as they are computed over the whole database on each request.)

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB - using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘design documents’.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.

Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the following bash script.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)

#!/bin/bash

# host=http://127.0.0.1:5984/
host=http://username:password@mick.couchone.com/
database=universities
folder=universities
fileextension=.json
FILES=$folder/*$fileextension

# create the database
curl -X PUT "$host$database"

for filepath in $FILES
do
    echo "$filepath"
    # get the file name from the file path
    filename=$(echo "$filepath" | sed -e "s|$folder/||g")
    echo "$filename"
    # strip the extension to obtain the document id
    docname=$(echo "$filename" | sed -e "s|$fileextension||g")
    echo "$docname"
    url=$host$database/$docname
    echo "$url"
    # put the document into CouchDB (-d @file sends the contents of the file)
    echo curl -X PUT "$url" -H "Content-Type: application/json" -d @"$filepath"
    curl -X PUT "$url" -H "Content-Type: application/json" -d @"$filepath"
done
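The core of the loop is the mapping from a file path to the URL that the document is PUT to. The same steps can be sketched in JavaScript (Node.js) with the standard path module. The host, database and extension are taken from the script above, while the function name documentUrl is hypothetical. As in the script, the file names are assumed to be URL-encoded already (for example Aberystwyth%20University.json), so the document id is used verbatim.

```javascript
const path = require("path");

// Values mirroring the bash script (the credentials are placeholders).
const host = "http://username:password@mick.couchone.com/";
const database = "universities";
const fileExtension = ".json";

// Hypothetical helper: drop the folder and the extension, as the two sed
// commands do, then append the resulting document id to the database URL.
function documentUrl(filePath) {
  const fileName = path.basename(filePath);                // drop the folder
  const docName = path.basename(fileName, fileExtension);  // drop ".json"
  return host + database + "/" + docName;
}
```

For example, documentUrl("universities/Aberystwyth%20University.json") gives the same URL as the single-document cURL command shown below.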

Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)

To put a single document to the web, the command would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].

For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string which is rendered as output.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved field name in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered, while req refers to the request object and can be used, for example, to retrieve data passed in the query string of the URL.
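Because a show function is an ordinary JavaScript function, it can be tried out locally by calling it with a document and a request object. The Node.js sketch below does this with s1. The sample document uses the St Andrews values from Appendix D, and the minimal req object is a hypothetical stand-in: a real CouchDB request object carries many more fields, of which query (the parsed query string) is one.

```javascript
// The s1 show function from _design/d1, written as a normal function.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Sample document, using values from the St Andrews record in Appendix D.
const doc = {
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
};

// Hypothetical minimal request object; s1 ignores it, but req.query would
// hold any query-string parameters passed in the URL.
const req = { query: {} };

const html = s1(doc, req);
```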

Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB

• universities is the database

• _design/d1 is the design document

• _show/s1 is the 'show' function that returns an HTML string

• Aston%20University is the id of the document to be rendered (URL-encoded, so the space appears as %20)
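The same structure can be assembled programmatically. The sketch below assumes a helper named showUrl (hypothetical) that joins the parts with slashes; the document id must be percent-encoded, which is why the space in Aston University appears as %20, and encodeURIComponent performs that encoding.

```javascript
// Hypothetical helper composing a CouchDB show URL from its parts.
function showUrl(host, db, ddoc, showName, docId) {
  return [
    host,
    encodeURIComponent(db),
    "_design",
    encodeURIComponent(ddoc),
    "_show",
    encodeURIComponent(showName),
    encodeURIComponent(docId), // "Aston University" -> "Aston%20University"
  ].join("/");
}

const url = showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University");
```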


5.3 A more complex 'show' function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html.

The full listing of the design document, including the map1 'show' function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the function, stored as a JSON string, remains valid.
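The escaping arises because the design document is JSON: each function is stored as a JSON string, so any double quote inside the returned HTML must appear as \" in the file. The short Node.js sketch below lets JSON.stringify produce the escaped form automatically instead of escaping by hand. The one-line show function, and its name demo, are illustrative and are not taken from map1.

```javascript
// Source of a show function whose HTML contains double quotes.
const showSource =
  "function(doc, req) { return '<div id=\"map_canvas\">' + doc._id + '</div>'; }";

// Serialising the design document escapes every double quote as \".
const designDoc = { _id: "_design/d1", shows: { demo: showSource } };
const json = JSON.stringify(designDoc);

// Parsing it back yields the original function source unchanged.
const roundTrip = JSON.parse(json).shows.demo;
```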

While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is difficult and unintuitive either way.


Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions. Here we define a very simple 'view' function, which simply outputs the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple 'view' returning document id values
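Conceptually, CouchDB builds a view by calling the map function once per document, recording every emit(key, value) call as a row, and keeping the rows sorted by key. The Node.js sketch below imitates that so that v1 can be run without a server. The buildView harness is hypothetical: it injects emit as a global (as CouchDB's JavaScript environment provides it) and ignores CouchDB's collation rules, incremental index updates and the reduce step.

```javascript
// v1's map function, as stored in _design/d1 (emit is supplied by the harness).
const v1Map = function (doc) { emit(doc._id, doc._id); };

// Hypothetical harness standing in for CouchDB's view engine.
function buildView(docs, mapFn) {
  const rows = [];
  let currentId = null;
  globalThis.emit = (key, value) => { rows.push({ id: currentId, key: key, value: value }); };
  try {
    for (const doc of docs) {
      currentId = doc._id;
      mapFn(doc);
    }
  } finally {
    delete globalThis.emit; // do not leak the stub
  }
  // Plain string ordering; CouchDB uses Unicode collation instead.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, offset: 0, rows: rows };
}

const docs = [
  { _id: "University of St Andrews" },
  { _id: "Aston University" },
  { _id: "Aberystwyth University" },
];
const result = buildView(docs, v1Map);
```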

A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document.

• v1 is a very simple view, emitting only the document ids

• l1 is a very simple list that shows a hyperlink for each document id

We augment the design document with a 'list' function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


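Note the built-in functions start, getRow and send:

• start is called once, at the beginning of the function, to set the response headers

• getRow returns the next row of the view result, and no row once the result is exhausted, which ends the while loop

• send is called once for each row in the dataset, appending a fragment of HTML to the response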
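Since l1 depends only on start, getRow and send, it can be exercised outside CouchDB by supplying stand-ins for those three functions. The Node.js sketch below does this; the runList harness and the sample row are hypothetical illustrations, not part of CouchDB, and they capture the headers and the HTML that l1 produces.

```javascript
// l1 from _design/d1, written as a normal function.
function l1(head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + '</a></p>');
  }
}

// Hypothetical harness: feeds rows to a list function and captures its output.
function runList(listFn, rows) {
  const chunks = [];
  let headers = null;
  let i = 0;
  globalThis.start = (response) => { headers = response.headers; };
  globalThis.getRow = () => (i < rows.length ? rows[i++] : null);
  globalThis.send = (chunk) => { chunks.push(chunk); };
  try {
    listFn({ total_rows: rows.length, offset: 0 }, {});
  } finally {
    delete globalThis.start;
    delete globalThis.getRow;
    delete globalThis.send;
  }
  return { headers: headers, body: chunks.join("") };
}

const output = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
]);
```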

At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple 'view' rendered using a 'list' function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload them to a CouchDB server in the normal way. However, this approach becomes very cumbersome, and the result difficult to maintain, as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB's management console, Futon, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer's perspective is that it creates a separate file for each element of the design document. For the developer, a CouchDB design document is in fact a web application project; CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files in my copy of it in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create.

• The views folder contains map.js and, when needed, reduce.js files for each CouchDB view. A view can be thought of as a query against the database.

• The shows folder contains .js files corresponding to each 'show' function in the design document. A show is a function used to transform a single document into HTML.

• Similarly, the lists folder contains .js files, one for each 'list' function in the eventually generated design document. A list is used to transform a set of documents returned by a view into HTML.
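To make the mapping between folders and design document fields concrete, here is a much-simplified sketch of the assembly step. It works on an in-memory object mapping relative file paths to file contents rather than on the real file system, and it is not CouchApp's actual implementation, which also handles attachments, templates, vendor code and more. The function name assembleDesignDoc is hypothetical. Note that a view only receives a reduce entry when a reduce.js file is present.

```javascript
// Hypothetical, simplified assembly of a design document from CouchApp-style files.
// files: { "views/v1/map.js": "...", "shows/s1.js": "...", "lists/l1.js": "..." }
function assembleDesignDoc(id, files) {
  const ddoc = { _id: "_design/" + id, views: {}, shows: {}, lists: {} };
  for (const [filePath, source] of Object.entries(files)) {
    const parts = filePath.split("/");
    if (parts[0] === "views" && parts.length === 3) {
      // views/<name>/map.js or views/<name>/reduce.js
      const name = parts[1];
      const kind = parts[2].replace(/\.js$/, ""); // "map" or "reduce"
      ddoc.views[name] = ddoc.views[name] || {};
      ddoc.views[name][kind] = source.trim();
    } else if ((parts[0] === "shows" || parts[0] === "lists") && parts.length === 2) {
      // shows/<name>.js or lists/<name>.js
      ddoc[parts[0]][parts[1].replace(/\.js$/, "")] = source.trim();
    }
  }
  return ddoc;
}

const ddoc = assembleDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { var row; while ((row = getRow())) { send(row.value); } }",
});
```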


Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a 'show' function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


couchapp generate show creates a stub for a show function, which we can fill in using a text editor. A show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a 'list' function l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.

We recall that in CouchDB a view is in effect a query against the database.

As v1 is a very simple view, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, the view will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
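The structure of this file can be illustrated with a small Node.js sketch: the env object holds named targets, and default is the target used when no other name is given. The helper resolveTarget is hypothetical and is not CouchApp's own code; it parses the file contents and splits the db URL with the standard URL class, so that the server address, the database name and the user name are separated from one another.

```javascript
// Hypothetical helper: pick a named target from the contents of .couchapprc.
function resolveTarget(couchapprcText, envName = "default") {
  const config = JSON.parse(couchapprcText);
  const env = config.env && config.env[envName];
  if (!env || !env.db) {
    throw new Error("no db configured for env '" + envName + "'");
  }
  const url = new URL(env.db);
  return {
    server: url.origin,                         // scheme and host, without credentials
    database: url.pathname.replace(/^\//, ""),  // e.g. "universities"
    user: decodeURIComponent(url.username),
  };
}

const target = resolveTarget(JSON.stringify({
  env: { default: { db: "http://username:password@mick.couchone.com/universities" } },
}));
```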

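With .couchapprc in place, running couchapp push from inside the bc folder merges the files and uploads the resulting design document to the database named under default. So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This demonstrates how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.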

6.5 Using templates

In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple view, all. This consists of a map.js file as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a list function to transform the contents of the view:

couchapp generate list georss


This creates a file called georss.js.

I added some code, derived from that in Sofa, which 'packages' the data from the view and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if a programme has a list of countries, emits the flags of those countries.
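The nested template sections amount to two nested loops with a conditional. As a cross-check of that logic, here is a plain-JavaScript rendering of a single entry without mustache.js, using the St Andrews document from Appendix D. renderEntry and escapeXml are hypothetical helpers written for illustration only. Unlike the template, the sketch writes the point as georss:point, the element name used by GeoRSS Simple, and it escapes the HTML once more because that HTML is embedded in XML.

```javascript
// Escape text for inclusion in HTML or XML.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical plain-JS equivalent of one {{#institutions}} iteration.
function renderEntry(inst) {
  // One paragraph per programme, with a flag image per country if any.
  const html = (inst.programmes || []).map((programme) => {
    const flags = (programme.countries || []).map((country) =>
      '<img src="http://mick.couchone.com/countries/' + country + '/flag"' +
      ' alt="' + country + '" title="' + country + '">'
    ).join("");
    return "<p>" + escapeXml(programme.name) + flags + "</p>";
  }).join("");

  return "<entry>" +
    "<id>" + escapeXml(inst._id) + "</id>" +
    "<title>" + escapeXml(inst._id) + "</title>" +
    '<content type="html">' + escapeXml(html) + "</content>" +
    "<georss:point>" + inst.latitude + " " + inst.longitude + "</georss:point>" +
    "</entry>";
}

const stAndrews = {
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", countries: ["bd", "za"] },
  ],
};
const entry = renderEntry(stAndrews);
```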

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly available web layer CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot be achieved easily with conventional web applications. Even given the maturity of prevailing technology, replicating code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, with CouchDB relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB's challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL

• understanding the structure of design documents

• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a map-reduce function that must be defined in advance, in a design document, before it can be indexed and queried. Many developers are used to 'thinking' in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
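As an illustration of that difference: a SQL query such as SELECT _id FROM institutions WHERE region = 'Scotland' has no direct run-time equivalent. Instead, the column in the WHERE clause becomes the key of a view, and the query becomes a key lookup, for example GET /universities/_design/bc/_view/by_region?key="Scotland". The view name by_region is hypothetical. So that the sketch runs without a server, buildIndex and queryByKey are stand-ins for CouchDB, and queryByKey filters the rows for brevity where a real index would seek directly to the key.

```javascript
// Map function for a hypothetical "by_region" view: the WHERE column becomes the key.
function byRegionMap(doc) {
  if (doc.region) {
    emit(doc.region, doc._id);
  }
}

// Stand-in for CouchDB: run the map once over all documents, building a sorted index.
function buildIndex(docs, mapFn) {
  const rows = [];
  globalThis.emit = (key, value) => { rows.push({ key: key, value: value }); };
  try {
    docs.forEach((doc) => mapFn(doc));
  } finally {
    delete globalThis.emit;
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Equivalent of ?key="Scotland" (a real B+tree seeks to the key instead of filtering).
function queryByKey(index, key) {
  return index.filter((row) => row.key === key).map((row) => row.value);
}

const index = buildIndex([
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Aston University" },
], byRegionMap);
```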

7.3 Conclusion

The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project's focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will grow. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45]: in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online (source view): http://mick.couchone.com/_utils/document.html?universities/_design/d1

It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\">html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }</style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\">function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code (http://github.com/jchris/sofa). I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/<\/feed>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  // added for GeoRSS
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    }
    // close the loop after all rows are rendered
    return "</feed>";
  });
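The point value built here follows the GeoRSS Simple convention: latitude first, then longitude, separated by a space. Because the values come from text fields in the edit form, they arrive as strings. The Node.js sketch below formats such a point with a basic range check; georssPoint and parseCoordinate are hypothetical helpers and are not part of Sofa.

```javascript
// Hypothetical helper: parse a coordinate and check it against +/- limit degrees.
function parseCoordinate(value, limit, name) {
  const blank = value == null || (typeof value === "string" && value.trim() === "");
  const n = blank ? NaN : Number(value);
  if (!Number.isFinite(n) || Math.abs(n) > limit) {
    throw new RangeError(name + " out of range: " + value);
  }
  return n;
}

// Hypothetical helper: format a GeoRSS Simple point ("lat lon") from a document.
function georssPoint(doc) {
  const lat = parseCoordinate(doc.latitude, 90, "latitude");
  const lon = parseCoordinate(doc.longitude, 180, "longitude");
  return lat + " " + lon;
}
```

For example, georssPoint({ latitude: "56.3412139443169", longitude: "-2.79301175308608" }) gives "56.3412139443169 -2.79301175308608", while an empty or out-of-range field raises a RangeError instead of producing a misleading "0 0" point.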

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

// this is further down in templates/edit.html

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB list function. It takes as input a set of documents returned by a view; in the simplest case, the view all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
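The row-to-stash mapping inside List.withRows does not depend on CouchDB, so it can be pulled out into a plain function and checked in Node.js against the St Andrews document from Appendix D. This is a trimmed, hypothetical restatement (toInstitution is not a name used in the project, and only some of the fields are kept). It replaces the ternaries around map with || [] defaults, which give the same result.

```javascript
// Hypothetical stand-alone version of the mapping done for each row in georss.js.
function toInstitution(institution) {
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    region: institution.region,
    // Flags consumed by the {{#has_programmes}} and {{#has_countries}} sections.
    has_programmes: institution.programmes ? true : false,
    programmes: (institution.programmes || []).map((programme) => ({
      name: programme.name,
      url: programme.url,
      has_countries: programme.countries ? true : false,
      // Wrap each country code so that the template can refer to {{country}}.
      countries: (programme.countries || []).map((country) => ({ country: country })),
    })),
  };
}

const stAndrews = {
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  region: "Scotland",
  programmes: [
    { name: "Chevening Programme", url: "http://www.chevening.com", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", url: "http://www.csfp-online.org", countries: ["bd", "za"] },
  ],
};
const inst = toInstitution(stAndrews);
```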

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from CouchDB documents into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload .png files from a folder so that the image files become attachments to CouchDB documents.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash

# .png country flag files from http://www.famfamfam.com/lab/icons/flags/
# .png country flag files are copied to the countries/png folder
# (beneath the current folder)
FILES=countries/png/*.png

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
    echo "$filepath"
    # get the file name from the file path
    filename=$(echo "$filepath" | sed -e 's|countries/png/||g')
    echo "$filename"
    # strip the extension: the country code becomes the document id
    docname=$(echo "$filename" | sed -e 's|\.png||g')
    echo "$docname"
    url=http://username:password@mick.couchone.com/countries/$docname/flag
    echo "$url"
    # put the attachment into CouchDB
    # e.g. this command creates the 'ie' record and puts the png as an
    # attachment under countries/ie/flag:
    # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
    echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
    curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
done

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Database_normalization

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


Contents

1 Introduction
  1.1 Motivation
  1.2 Does 'one size' still 'fit all'?
  1.3 Organisation

2 Background
  2.1 The Relational Model
  2.2 Relational Database Management Systems
  2.3 Normalisation
  2.4 SQL
  2.5 Transactional Guarantees
  2.6 The 'NoSql' Movement
  2.7 'NoSql' databases at large scale
  2.8 Brewer's 'CAP' Theorem
  2.9 Reducing the Impedance Mismatch
  2.10 Benefits of 'NoSql'
  2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'
  2.12 The Activity Mapping Project
  2.13 Raising Awareness of British Council Impact in the UK
  2.14 Using CouchDB to Serve Mapping Data

3 Introduction to CouchDB
  3.1 'Of the Web'
  3.2 Some History
  3.3 Document-Oriented
  3.4 Erlang
  3.5 How Ubuntu uses CouchDB
  3.6 Desktop Couch, Python and Quickly

4 Using CouchDB
  4.1 Motivation
  4.2 Hosted CouchDB Service Providers
  4.3 Installing CouchDB
  4.4 Inserting a Document into a CouchDB Database
  4.5 Deleting a Document
  4.6 Updating a Document
  4.7 Adding Attachments
  4.8 Replication
  4.9 Querying a CouchDB Database using Map-Reduce

5 Serving HTML from CouchDB
  5.1 Bulk upload of JSON documents
  5.2 A CouchDB Design Document
  5.3 A more complex 'show' function
  5.4 Views and Lists
  5.5 Lessons Learned

6 Serving GeoRSS using CouchApp
  6.1 Introduction to CouchApp
  6.2 Sofa - a Blogging Application
  6.3 Developing with CouchApp
  6.4 Deployment
  6.5 Using templates

7 Critical Assessment and Conclusion
  7.1 Why might a developer choose CouchDB?
  7.2 CouchDB's challenges
  7.3 Conclusion

A Design Document d1

B GeoRSS on Sofa

C georss.js

D georss.html

E Bash Script to Upload Country Flag Files

Section 1

Introduction

The MSc Computer Science project described in this report aims to investigate a relatively new database product, CouchDB, from a web application development perspective [10] [13]. I have built a simple business application and use it as the test case from which to evaluate CouchDB's suitability as a web development platform.

1.1 Motivation

The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of 'key-value store' products, which have emerged recently to deal with the challenges of very large data volumes in online environments.

The background to the project is one of rapid change, inspired by the emergence of a set of innovative, open-source, non-relational (also known as 'NoSql'¹) database products.

These 'NoSql' database products are challenging the accepted use of relational database management systems as the only storage mechanism available to developers. Relational databases have been very successful in the past and are likely to remain so, but in a certain subset of usage scenarios non-relational systems are starting to take their place.

1.2 Does 'one size' still 'fit all'?

Michael Stonebraker, one of the original pioneers of relational database research [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:

¹ Pronounced 'No-See-Quel'.


"The last 25 years of commercial DBMS development can be summed up in a single phrase: 'One size fits all'. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.

[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser."

Why does this matter? Does 'one size fit all'? Until recently, the answer has been 'yes': practically every data-centric IT project over the past 25 years has used a relational database to persist data, and systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.

1.3 Organisation

The sections are organised in the following way:

• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project;

• in Section 3, I introduce the open-source, document-oriented database CouchDB;

• in Section 4, I provide an overview of CouchDB's application programming interface;

• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database, that is, using CouchDB to store both data and application code;

• in Section 6, I develop the project further, using CouchApp as an application framework to serve GeoRSS data for consumption on Google and Bing maps;

• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.

Section 2

Background

In recent years, a new generation of data stores has emerged which approaches data persistence in a different way from the established relational model.

In this section, I briefly discuss the relational model and describe some of the recent challenges which have led to the development of the newer, alternative database systems.

2.1 The Relational Model

The relational model was first formulated and proposed in 1969 by Ted Codd and described in an extremely influential paper published in June 1970 [15].

One description [54] of Codd's relational model reads as follows:

"The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains."

A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to show relations as tables, tuples as rows and attributes as columns.
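Stated symbolically, following the definition quoted above, a relation over domains D_1, ..., D_n is a subset of their Cartesian product, and each tuple takes one value from each domain. The two-column 'Staff' table used as an example in Section 2.3 is then a relation over a Name domain and a Nationality domain:

```latex
R \subseteq D_1 \times D_2 \times \cdots \times D_n,
\qquad
(v_1, \ldots, v_n) \in R \;\Rightarrow\; v_i \in D_i \quad (1 \le i \le n)

\mathit{Staff} \subseteq \mathit{Name} \times \mathit{Nationality},
\qquad
(\text{Bob}, \text{Australia}) \in \mathit{Staff}
```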

The main advantage of the relational database is its precise, mathematically grounded organisation of data. In this context, it is important to recognise that the relational model is a logical model: both relational databases and non-relational databases (such as file systems or 'NoSql' databases) commonly use B-Trees as their underlying physical data structure. Seen from this perspective, whether a database is 'relational' or not is more a matter of how applications use it: a relational database presents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].


2.2 Relational Database Management Systems

The term 'Relational Database Management System' (RDBMS) refers to the class of databases which follow Codd's relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open-source (MySQL, PostgreSQL) databases in use over the past three decades have been based on this model.

These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term 'database' has become associated with relational databases to such an extent that non-relational databases are often referred to as 'non-relational data stores' for clarity. (Formally, and in its original meaning, the term 'database' simply means a collection of data.)

2.3 Normalisation

A key factor in relational data storage is data normalisation. Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by removing duplication [48].

The first level of normalisation, 'first normal form', requires that each data item (table cell) contains an atomic value, typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.

Consider, for example, a table (let's call it 'Staff') containing staff records:

Name     Nationality
Bob      Australia
Jane     Sweden

All is well until the system encounters a staff member with multiple nationalities:

Name     Nationality
Bob      Australia
Jane     Sweden
Alice    United Kingdom, South Africa

This second table breaks the rules of normalisation, because the Nationality value for Alice is no longer atomic. The standard way to solve this problem would be to create two more tables, a 'Nationality' table and a 'StaffNationality' table, linked to 'Staff' with primary and foreign keys.

While this is a trivial example, it shows how changing real-world business rules can lead to changes in data schemata. Such changes can be expensive, since database developers need to be brought in to make the necessary amendments.
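The normalised design described above, with separate Staff, Nationality and StaffNationality tables, can be sketched in a few lines of JavaScript. JavaScript is used here because it is also the language of CouchDB's view functions. The column names (staffId, nationalityId), the use of row positions as surrogate keys and the assumption that multiple nationalities are comma-separated are all hypothetical choices made for illustration.

```javascript
// Un-normalised input: Alice's nationality cell holds two values.
const staffRows = [
  { name: "Bob", nationality: "Australia" },
  { name: "Jane", nationality: "Sweden" },
  { name: "Alice", nationality: "United Kingdom, South Africa" },
];

// Split the rows into three relations so that every cell is atomic:
//   Staff(staffId, name)
//   Nationality(nationalityId, name)
//   StaffNationality(staffId, nationalityId), the link table
function normalise(rows) {
  const staff = [];
  const nationality = [];
  const staffNationality = [];
  const idByName = new Map(); // nationality name -> surrogate key

  rows.forEach((row, index) => {
    const staffId = index + 1;
    staff.push({ staffId, name: row.name });

    for (const part of row.nationality.split(",")) {
      const name = part.trim();
      if (!idByName.has(name)) {
        idByName.set(name, idByName.size + 1);
        nationality.push({ nationalityId: idByName.get(name), name });
      }
      // Each (staff member, nationality) pair becomes one link row.
      staffNationality.push({ staffId, nationalityId: idByName.get(name) });
    }
  });

  return { staff, nationality, staffNationality };
}

const tables = normalise(staffRows);
console.log(tables.nationality.map((n) => n.name).join(", "));
// Australia, Sweden, United Kingdom, South Africa
console.log(tables.staffNationality.length); // 4
```

Alice now appears once in Staff and twice in StaffNationality. Adding a third nationality for her would mean adding link rows, not changing the schema.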

In a non-relational database (such as CouchDB), Alice's record could be stored as follows:

{
  "name": "Alice",
  "nationality": [
    "United Kingdom",
    "South Africa"
  ]
}

This record (or 'document' in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other 'NoSql' products are sometimes referred to as 'schema-less' database management systems.

The notation used is JavaScript Object Notation (JSON), which will be discussed in more detail in Section 3.

2.4 SQL

A major benefit of following normalisation rules is that it permits easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].

To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table needs to store atomic values. In our example above, this query would treat 'United Kingdom, South Africa' as a distinct nationality, so Alice would be counted under neither 'United Kingdom' nor 'South Africa'.

In Section 4, I will discuss map-reduce, which is the method used by CouchDB and some other 'NoSql' systems to query the database.
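As a brief preview of that approach, the nationality count from the SQL example could be written as a map function and a reduce function over documents like Alice's. CouchDB runs such functions inside the database and supplies emit itself. In this sketch, a small hypothetical in-memory harness (runView) plays that role so the example can be run with Node.js. Keys are sorted by plain string comparison rather than by CouchDB's own collation rules.

```javascript
// Map function in the style CouchDB uses. It is called once per document
// and calls emit(key, value) for every row it wants in the index. Each
// entry in the nationality array becomes its own row, so Alice is counted
// for both countries rather than under one combined string.
var mapNationality = function (doc) {
  if (Array.isArray(doc.nationality)) {
    doc.nationality.forEach(function (n) {
      emit(n, 1);
    });
  }
};

// Reduce function: add up the values emitted for one key. CouchDB passes
// the keys (or null when re-reducing). This reducer ignores them. CouchDB
// also offers built-in reducers such as _count and _sum for this job.
var reduceSum = function (keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
};

// Hypothetical stand-in for CouchDB: run the map over every document,
// group the emitted rows by key and reduce each group (like ?group=true).
function runView(docs, map, reduce) {
  const rows = [];
  globalThis.emit = (key, value) => rows.push({ key, value });
  try {
    docs.forEach((doc) => map(doc));
  } finally {
    delete globalThis.emit;
  }

  const groups = new Map();
  for (const { key, value } of rows) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups.keys()]
    .sort()
    .map((key) => ({ key, value: reduce(null, groups.get(key), false) }));
}

const docs = [
  { name: "Bob", nationality: ["Australia"] },
  { name: "Jane", nationality: ["Sweden"] },
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] },
];
for (const row of runView(docs, mapNationality, reduceSum)) {
  console.log(row.key + ": " + row.value);
}
// Australia: 1
// South Africa: 1
// Sweden: 1
// United Kingdom: 1
```

Unlike the GROUP BY over the un-normalised table, each country is counted separately, even though Alice's nationalities stay together in a single document.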

2.5 Transactional Guarantees

A further very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].

The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: at any given moment in time there is only 'one version of the truth' for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].

For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at exactly the same time. Even in the event of hardware or network failure, traditional relational database management systems must be able to roll back to a consistent state, so that data integrity is assured at all times.

The paradigm followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
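The all-or-nothing behaviour described above can be illustrated with a deliberately simplified sketch. A transfer either applies both the debit and the credit, or it restores a saved before-image of the balances. The account names and the snapshot approach are hypothetical. Real RDBMSs achieve atomicity and durability with logging and locking [37] [50]. This single-threaded sketch does not model isolation or durability.

```javascript
// Move `amount` from one account to another as a single atomic unit.
function transfer(db, from, to, amount) {
  const snapshot = new Map(db); // before-image used for roll-back
  try {
    if ((db.get(from) ?? 0) < amount) throw new Error("insufficient funds");
    db.set(from, db.get(from) - amount); // step 1: debit
    if (!db.has(to)) throw new Error("unknown account: " + to);
    db.set(to, db.get(to) + amount); // step 2: credit
    return true; // "commit": both steps have taken effect
  } catch (err) {
    // "roll back": restore the before-image so a half-done debit is undone
    db.clear();
    for (const [key, value] of snapshot) db.set(key, value);
    return false;
  }
}

const accounts = new Map([["A", 100], ["B", 50]]);
console.log(transfer(accounts, "A", "B", 30), accounts.get("A"), accounts.get("B")); // true 70 80
// The second transfer fails after the debit, so the debit is rolled back.
console.log(transfer(accounts, "A", "Z", 10), accounts.get("A")); // false 70
```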

However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we have become more comfortable with multiple versions of the same data existing in different places. For example, web pages are cached on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages. However, they are extremely intolerant of latency (a delay in the time it takes to load the page).

Caching data in many places, so that the data is near to the user when it is needed, is an effective way to reduce latency. There may no longer be a 'single version of the truth', but for many applications the trade-off is worthwhile.

2.6 The 'NoSql' Movement

'NoSql' refers to a range of non-relational database products which have emerged in recent years. The term 'NoSql' is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.)¹.

'NoSql' is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:

• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.

• 'NoSql' databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place worldwide under the theme of 'a NoSql Summer' [1].

2.7 'NoSql' databases at large scale

Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and so started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].

The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.

To give an example, if Alice in London updates her Facebook status, it doesn't really matter if Bob in Singapore sees a 'stale' version for a few seconds. Of course, in a foreign exchange trading application those few seconds would make a huge difference. 'NoSql' systems are generally intended for the Facebook scenario. Foreign exchange trading requires very strict data integrity, as provided by traditional relational database management systems such as Oracle.

In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time. They are, however, guaranteed to converge eventually to a consistent state once updates stop arriving.
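The convergence described above can be shown with a toy model. Two replicas accept writes independently and then exchange their contents, using a hypothetical last-writer-wins rule (timestamp first, then replica name as a tie-break). This only illustrates eventual consistency in general. CouchDB's own replication works differently: it tracks document revisions and flags conflicting edits rather than comparing timestamps. The replica names, keys and status values are invented for the example.

```javascript
// Each replica maps a key to { value, timestamp, origin }.
function write(replica, key, value, timestamp, origin) {
  replica.set(key, { value, timestamp, origin });
}

// Deterministic "last writer wins" rule: the later timestamp wins and the
// origin name breaks ties, so every replica picks the same winner.
function newer(a, b) {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.origin > b.origin ? a : b;
}

// One-way synchronisation: merge every entry from source into target.
// Running it in both directions leaves the two replicas agreeing.
function sync(source, target) {
  for (const [key, entry] of source) {
    const current = target.get(key);
    target.set(key, current ? newer(current, entry) : entry);
  }
}

const london = new Map();
const singapore = new Map();
// An earlier status has already been replicated to both datacentres.
write(london, "alice/status", "Having lunch", 1, "london");
write(singapore, "alice/status", "Having lunch", 1, "london");
// Alice updates her status; only the London replica sees it at first.
write(london, "alice/status", "Off to the museum", 2, "london");

console.log(singapore.get("alice/status").value); // Having lunch (stale)
sync(london, singapore);
sync(singapore, london);
console.log(singapore.get("alice/status").value); // Off to the museum
console.log(london.get("alice/status").value); // Off to the museum
```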

Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacentre were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacentres around the world.

In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:

"Occasionally an entire datacenter will go down (e.g. if the power is cut) or become unreachable (e.g. if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability, and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability."

2.8 Brewer's 'CAP' Theorem

The idea of trading off consistency for availability is a key feature of 'NoSql' databases. Reference is often made to Eric Brewer's conjecture, later formalised and proved by Gilbert and Lynch [27], that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance, meaning the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential but strict consistency is not, so a looser model of consistency is considered a price worth paying.

This realisation, that 'eventual' consistency is often sufficient, has unleashed a large amount of innovative effort in the area of database systems development. The code base required to ensure ACID guarantees in traditional DBMSs is complex. The newer systems have had the luxury of forgoing such development and can concentrate solely on scalable, available data storage solutions.

In addition, these 'NoSql' data stores do not have to be as 'intelligent' as their predecessors. The burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a 'NoSql' system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, 'NoSql' systems are simply 'dumb' repositories of data. This has meant, of course, that the code base required for 'NoSql' systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere from web servers to desktops, and even on hand-held devices and mobile phones.
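In CouchDB, the 'rudimentary validation' mentioned above takes the form of a validate_doc_update function stored in a design document. CouchDB calls it before each write with three arguments: the new document, the previous version (null for a new document) and the user context. If the function throws an object with a forbidden property, the write is rejected. The sketch below makes a hypothetical assumption: institution documents, like the sample in Appendix D, must carry a region string and numeric coordinates. A small hypothetical harness (tryWrite) calls the function directly so it can run outside CouchDB. The function body sticks to older JavaScript syntax, since the engine embedded in CouchDB may not support newer features.

```javascript
// Shape of a CouchDB validate_doc_update function. In a real design
// document it is stored as a string under the "validate_doc_update" key.
var validateDocUpdate = function (newDoc, oldDoc, userCtx) {
  if (newDoc._deleted) {
    return; // allow deletions without further checks
  }
  if (typeof newDoc.region !== "string" || newDoc.region === "") {
    throw { forbidden: "region must be a non-empty string" };
  }
  var fields = ["latitude", "longitude"];
  for (var i = 0; i < fields.length; i++) {
    if (typeof newDoc[fields[i]] !== "number") {
      throw { forbidden: fields[i] + " must be a number" };
    }
  }
};

// Hypothetical harness: call the function as CouchDB would for a new
// document (oldDoc is null) and report whether the write is accepted.
function tryWrite(doc) {
  try {
    validateDocUpdate(doc, null, { name: null, roles: [] });
    return "accepted";
  } catch (err) {
    if (err && err.forbidden) return "rejected: " + err.forbidden;
    throw err;
  }
}

console.log(tryWrite({ region: "Scotland", latitude: 56.34, longitude: -2.79 }));
// accepted
console.log(tryWrite({ region: "Scotland", latitude: "56.34", longitude: -2.79 }));
// rejected: latitude must be a number
```

Everything beyond such checks, such as relationships between documents, remains the application's responsibility.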


2.9 Reducing the Impedance Mismatch

The term 'object-relational impedance mismatch' refers to a set of difficulties encountered by object-oriented application developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools has emerged, for example Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by application developers to 'bridge the gap' between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain². Another tactic used by application developers is to use SQL views, effectively de-normalising data so that the data model is closer to the application's object model.

A key advantage of 'NoSql' systems is that they accept data of any structure, so the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].

2.10 Benefits of 'NoSql'

As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of the hardware trends associated with Moore's Law. Disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.

• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It is better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, 'NoSql' databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer in effect uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam War.


2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'

It comes as little surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support or, more generally, the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions, so there is little capability to query the system in an arbitrary way.

(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often, a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well known in advance. The data itself will come from a relational database, and business teams will use SQL against that relational database for ad-hoc queries. In this way, the 'NoSql' solution is a cached representation of the database used purely for reporting. The established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
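A simplified model of a view index shows why pre-defined queries are cheap and arbitrary ones are not. The rows emitted by a map function are kept sorted by key. A query with a start key and an end key (in CouchDB, the startkey and endkey view parameters) is then a binary search followed by a sequential scan. A question that the view's key was not designed for would instead need a new view to be built. The documents, their ids and the choice of latitude as the key are hypothetical. The emit callback is passed as a parameter here, whereas CouchDB provides emit globally.

```javascript
// Build a view: run the map over each document and keep the emitted rows
// sorted by key, as a B-tree index would.
function buildView(docs, map) {
  const rows = [];
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Range query: binary-search for the first row with key >= startkey, then
// read rows in order until the key passes endkey (both ends inclusive).
function queryView(rows, startkey, endkey) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < startkey) lo = mid + 1;
    else hi = mid;
  }
  const result = [];
  for (let i = lo; i < rows.length && rows[i].key <= endkey; i++) {
    result.push(rows[i]);
  }
  return result;
}

// Hypothetical institution documents, indexed by latitude.
const institutions = [
  { _id: "inst-1", region: "Scotland", latitude: 56.34 },
  { _id: "inst-2", region: "Wales", latitude: 51.48 },
  { _id: "inst-3", region: "North West", latitude: 53.41 },
];
const byLatitude = buildView(institutions, (doc, emit) => {
  if (typeof doc.latitude === "number") emit(doc.latitude, doc.region);
});

// "Which institutions lie between 53 and 57 degrees north?"
console.log(queryView(byLatitude, 53, 57).map((row) => row.id).join(", "));
// inst-3, inst-1
```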

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].

The British Council is the United Kingdom's international organisation for cultural relations. It has offices in over 100 countries and territories worldwide. In 2009-2010, the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.


Figure 2.1: British Council project data displayed on Bing Maps

The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. As a result of this work, the British Council now has, as an organisation, a unified view of the work undertaken with each partner institution throughout the UK.
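The report does not describe the string-matching rules used by the upload system, so the following is only a hypothetical sketch of one simple technique. Each institution name is reduced to a normalised match key: lower case, '&' written as 'and', apostrophes and other punctuation removed, whitespace collapsed and a leading 'the' dropped. Names that share a key are then grouped together. A production system would typically need fuzzier matching than this.

```javascript
// Reduce a raw institution name to a key used for matching.
function matchKey(name) {
  return name
    .toLowerCase()
    .replace(/&/g, " and ")
    .replace(/['\u2019]/g, "") // "king's" -> "kings"
    .replace(/[^a-z0-9]+/g, " ") // other punctuation and runs of spaces -> one space
    .trim()
    .replace(/^the /, "");
}

// Group raw spellings that share a match key; each group is one institution.
function deduplicate(names) {
  const groups = new Map();
  for (const name of names) {
    const key = matchKey(name);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(name);
  }
  return groups;
}

const uploaded = [
  "University of St. Andrews",
  "The University of St Andrews",
  "university of st andrews",
  "King's College London",
  "Kings College London",
];
const groups = deduplicate(uploaded);
console.log(groups.size); // 2
console.log([...groups.keys()].join(" | "));
// university of st andrews | kings college london
```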

Outputs from the Activity Mapping have included letters to MPs on-line maps for each MP and local authority reports for overseas educationministers and a Google Earth presentation for display in the reception areaof our offices


Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate how far CouchDB can store the Activity Mapping data as documents rather than as relational data. A further aim is to use CouchDB’s built-in web server to serve that data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently the generated HTML and XML files are stored statically on the British Council’s web server. A logical next step would be to build a back-end database that serves the data dynamically.

The map data is read-only, since it reports the activity of the previous year. I therefore feel that a store such as CouchDB would suit the back end well, in effect acting as a web caching layer.

The data is already structured, having been exported from the British Council’s relational database. CouchDB would have two main tasks. The first is to store the reporting data in JSON format. The second is to transform that data on demand into GeoRSS or KML [7] for consumption by Microsoft Bing Maps, Google Maps, Google Earth and similar tools.

This report describes my investigation into writing a CouchDB application to fulfil these requirements. I am interested in how ready this new technology is for ‘prime time’. That is, I want to know how far it is ready for adoption by web application developers such as myself, who are used to building solutions on current standard platforms.³

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database I focused on for this MSc project is Apache CouchDB [10]. CouchDB is written in Erlang and combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, and each value is a JSON string [30].

3.1 ‘Of the Web’

Combining a web server with B-Tree based storage has been key to CouchDB’s success as a project, because every modern programming language has libraries for communicating over HTTP. As a result, code that uses CouchDB can be written in practically any language. A CouchDB developer can therefore dispense with the object-relational mapping layers and database connectivity libraries that standard DBMSs require [52].

A second advantage of combining a web server with a database engine is that the web presentation layer (the ‘web pages’) can live in CouchDB alongside the data. The HTML and JavaScript needed to render the data as a web page can be stored in CouchDB design documents. This project takes advantage of that innovative storage of data and code in one place.

An often-quoted remark about CouchDB by Jacob Kaplan-Moss, co-creator of the web application framework Django [3], stresses what becomes possible when web and database technologies are combined in a single product:

“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform- and OS¹-agnostic.” [31]

A web development project normally needs two separate servers: a database server and a web application server. CouchDB combines both in a single product, which makes it worth exploring as an all-in-one solution for serving both data and HTML.

3.2 Some History

Damien Katz started CouchDB in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product Katz worked on at IBM. Lotus Notes is also a document-oriented system that can serve as a platform for web applications.

Katz gives a very interesting 30-minute talk about his motivation for the CouchDB project [32]. In it he describes deciding to sell his house, move his family and live off savings in order to start a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a ‘document-oriented’ database in which each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data-exchange format based on a subset of the JavaScript programming language. It is often used to serialise and transmit structured data over a network connection, and in recent years it has become established as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

¹ Operating System

Consider how NULL values are used in relational databases. In a relational table, an address in a country without postal codes would usually have a NULL in the postalCode field. In CouchDB and similar systems, a missing postal code is simply absent from the document. This matches what we intuitively expect from real-world experience much more closely.

The schema is flexible, so we may legitimately store the following document in the very same ‘addressbook’ database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a newline character, so in this case the streetAddress spans three printed lines.)
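To make the contrast with NULL concrete, here is a minimal JavaScript sketch (run with Node.js). It parses trimmed copies of the two address documents above and reads them without any schema. The helper names postalCodeOf and displayName are purely illustrative and are not part of CouchDB.

```javascript
// Trimmed copies of the two address documents above. Inside the template
// literal, "\\n" becomes the JSON escape \n, i.e. a real newline after parsing.
const johnSmith = JSON.parse(`{
  "firstName": "John",
  "lastName": "Smith",
  "address": { "city": "New York", "state": "NY", "postalCode": "10021" }
}`);

const bcZambia = JSON.parse(`{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\\nCairo Road\\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}`);

// An absent field is simply missing from the object; there is no NULL to test for.
function postalCodeOf(doc) {
  return "postalCode" in doc.address ? doc.address.postalCode : "(none)";
}

// Hypothetical helper: a display name for either kind of record.
function displayName(doc) {
  return doc.company !== undefined ? doc.company : doc.firstName + " " + doc.lastName;
}

console.log(displayName(johnSmith), postalCodeOf(johnSmith)); // John Smith 10021
console.log(displayName(bcZambia), postalCodeOf(bcZambia)); // British Council Zambia (none)
console.log(bcZambia.address.streetAddress.split("\n").length); // 3
```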

Section 4 demonstrates how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB in Erlang. Ericsson developed Erlang to support distributed, fault-tolerant, non-stop telecoms applications. A key element of its reliability is that it is a functional language that avoids mutable data. Processes share no memory state, so an Erlang process that hits a problem can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. J Chris Anderson, one of the project’s core developers, describes how Erlang and append-only B-Tree data structures work together to promote it:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can use CouchDB’s replication capabilities. Every ten minutes, Desktop Couch can communicate with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can also provide secure off-site backup of important files.

Client applications on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks there. A background process then synchronises the bookmarks to the UbuntuOne server. When the user moves to another machine, the Desktop Couch on that machine can fetch the latest bookmarks from UbuntuOne. In this way the data follows the user from machine to machine.

A user can therefore store contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. CouchDB is available for multiple platforms (Windows, Mac, Android and others), so the data can reach different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustrates this [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The Ubuntu developers have always been interested in lowering the barriers to entry for new programmers. To that end they developed ‘Quickly’ [8], which helps beginners get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) got started with Microsoft Visual Basic, and Quickly is similar. The innovative aspect is that it uses Desktop Couch for persistence, so an ‘opportunistic’ developer can be productive without worrying about database setup. Another advantage is that data stored in Desktop Couch is replicated to the UbuntuOne service. It can then be retrieved on other machines or devices running CouchDB.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

I include this section because CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the 1.0 release, see [47].)

Because CouchDB is still quite new, its documentation can at times be hard to find or follow. I therefore wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for adoption by the community of ‘corporate’ web application developers. I have worked for about 10 years as a Microsoft-platform application developer, so I consider myself fairly typical of this group.

The two main sources for this section were:

• the CouchDB book [13], also available online at http://guide.couchdb.org
• the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API

4.2 Hosted CouchDB Service Providers

If you are reading this section, I strongly recommend signing up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and free of charge.

After signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers for Windows and Mac are available at the same URL. I installed the version that was current at the time of writing, 1.0.1.

(As already noted, Ubuntu comes with an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the CouchOne installer, which is what I did for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can post documents to a CouchDB server.

For this MSc project I went back to first principles and used command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use ‘cURL’, a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we send the same request with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Re-create the database if you wish to follow the steps in this tutorial section.)

Section 3 showed a JSON document containing the address details of a certain John Smith of New York.

If this document were stored in a file named john-smith.json, the cURL command to upload it to the addressbook database would be:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

The cURL syntax is explained in [44]. Briefly:

• -H introduces an HTTP header
• -X POST tells cURL to send an HTTP POST request instead of a GET
• -d introduces the data to be sent
• @ marks what follows as a file name

CouchDB stores all data in JSON format, together with a unique key and a revision number. The resulting document in CouchDB would therefore have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
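Every CouchDB revision string has the form generation-hash. The generation number before the hyphen increases by one with each update of the document. The following minimal sketch splits a revision into those two parts. The names parseRev and hasHigherGeneration are illustrative and are not part of any CouchDB client API.

```javascript
// Split a CouchDB revision string of the form "<generation>-<hash>" into its parts.
function parseRev(rev) {
  const match = /^(\d+)-([0-9a-f]+)$/.exec(rev);
  if (!match) {
    throw new Error("Not a revision string: " + rev);
  }
  return { generation: Number(match[1]), hash: match[2] };
}

// Simplification: compares generation numbers only. Conflicting revisions can
// share a generation, so this is not a substitute for CouchDB's revision tree.
function hasHigherGeneration(revA, revB) {
  return parseRev(revA).generation > parseRev(revB).generation;
}

const rev = parseRev("1-91bce055fc8db86480400321079f0834");
console.log(rev.generation, rev.hash.length); // 1 32
```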

This document can be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view it as formatted JSON, we can paste the output into the JSONLint validator (http://jsonlint.com).


Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios we may want CouchDB to use meaningful document id values instead of a meaningless string of random characters.

For example, it would be useful to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this, we use HTTP PUT instead of POST and specify the URL we wish to ‘put’ to:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

The document can now be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. A client application written in JavaScript or another language can consume and transform the resulting JSON.

Figure 4.3: Document viewed using Futon


4.5 Deleting a Document

To delete a document, we must provide the revision value in the query string (the part of a URL after the question mark, ?).

Requiring the revision value lets CouchDB check that the client application holds a reference to the latest version of a document before allowing it to be deleted.

The following command deletes the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"

4.6 Updating a Document

Similarly, an update must supply the revision value, so that CouchDB knows the application making the update is aware of the document’s latest version. If an update is attempted on an ‘old’ version of a document, CouchDB returns a ‘Document update conflict’ error (HTTP status 409).

We already know this document’s URL (its _id), so we use the PUT verb instead of POST.

The following command updates John Smith’s record with the contents of the file john-smith-v2.json. It supplies the revision of that record shown above:

curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
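The revision rule in Sections 4.5 and 4.6 is a form of optimistic concurrency control. The sketch below is a toy, in-memory imitation of that rule under a simplified model:

• ToyCouch, put and remove are hypothetical names.
• The md5-of-body hash only stands in for CouchDB’s own revision hashing.
• Real CouchDB deletions leave a ‘tombstone’ revision rather than forgetting the document.

```javascript
const crypto = require("crypto");

// A toy, in-memory imitation of CouchDB's revision check for updates and deletes.
// Only the conflict rule is modelled; real revision hashes are computed differently.
class ToyCouch {
  constructor() {
    this.docs = new Map(); // id -> latest stored document (with _id and _rev)
  }

  // Like PUT /db/id: updating an existing id requires its current revision.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) {
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    const hash = crypto.createHash("md5").update(JSON.stringify(body)).digest("hex");
    const doc = Object.assign({}, body, { _id: id, _rev: generation + "-" + hash });
    this.docs.set(id, doc);
    return { status: 201, ok: true, id: id, rev: doc._rev };
  }

  // Like DELETE /db/id?rev=...: the caller must present the latest revision.
  // (Real CouchDB keeps a deletion "tombstone"; this toy simply forgets the doc.)
  remove(id, rev) {
    const current = this.docs.get(id);
    if (!current) {
      return { status: 404, error: "not_found" };
    }
    if (current._rev !== rev) {
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    this.docs.delete(id);
    return { status: 200, ok: true };
  }
}

const db = new ToyCouch();
const first = db.put("john-smith", { firstName: "John", age: 25 });
const second = db.put("john-smith", { firstName: "John", age: 26 }, first.rev);
const stale = db.put("john-smith", { firstName: "Johnny" }, first.rev); // old revision
console.log(second.rev.startsWith("2-"), stale.status); // true 409
```

The second write succeeds because it presents the current revision. The third is rejected because it still presents the first one, which is exactly the situation the ‘Document update conflict’ error reports.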

4.7 Adding Attachments

CouchDB lets us save attachments, such as image files, to a database. The Activity Mapping project uses national flag icons³ from http://www.famfamfam.com/lab/icons/flags/.

The following command uploads the flag of Zambia (zm.png) as an attachment named flag to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.


The uploaded image can be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag

4.8 Replication

One especially interesting and useful feature of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing the example above, suppose we want a replica of the addressbook on our local machine so that we can work on it offline.

If the addressbook database does not yet exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook", "target":"http://127.0.0.1:5984/addressbook"}'

When using cURL on Microsoft Windows, we must use double quotes exclusively and backslash-escape the quotes inside the JSON. The cmd.exe line-continuation character is also ^ rather than \:

curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"

The replica of the database is now available on the local machine.
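The replication request is just a small JSON object, so it can also be built programmatically, which avoids the shell-quoting differences between bash and Windows altogether. This is a minimal sketch, assuming Node.js 18 or later for the built-in fetch. replicationBody and replicate are hypothetical helper names. replicate appears only in the commented-out line because it needs a running CouchDB.

```javascript
// Build the JSON body for POST /_replicate. JSON.stringify handles all the
// quoting, so no shell-specific escaping is needed.
function replicationBody(source, target, options) {
  // Object.assign ignores an undefined options argument.
  return JSON.stringify(Object.assign({ source: source, target: target }, options));
}

// Sending the request needs a reachable CouchDB; fetch is built into Node 18+.
async function replicate(couchUrl, body) {
  const response = await fetch(couchUrl + "/_replicate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: body,
  });
  return response.json();
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
// replicate("http://127.0.0.1:5984", body).then(console.log);
```

The options argument can carry further replication settings, for example continuous: true for ongoing rather than one-off replication.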

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB and a little of how replication works. We now turn to querying the database.

Data is retrieved from CouchDB with a map-reduce function, usually written in JavaScript (CouchDB map-reduce functions can also be written in Erlang).

Google’s 2004 paper [23] popularised the map-reduce paradigm, describing how it distributes calculations across large clusters of commodity machines. It works in two stages:

• the ‘map’ stage ranges over the document collection and emits zero or more key-value pairs for each document it encounters
• the ‘reduce’ stage is used when an aggregate calculation over those intermediate values, such as a sum or a count, is required
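The two stages can be illustrated on a single machine. This is a minimal sketch, not CouchDB’s implementation:

• mapReduce is a hypothetical function.
• emit is passed to the map function as a parameter, whereas CouchDB provides it as a global.
• The vote counts are invented for illustration.

```javascript
// A single-process illustration of the two stages; a real system spreads the
// work across machines.
function mapReduce(docs, map, reduce) {
  // Map stage: each document may emit zero or more key/value pairs,
  // which are grouped by key as they arrive.
  const groups = new Map();
  for (const doc of docs) {
    map(doc, function (key, value) {
      if (!groups.has(key)) {
        groups.set(key, []);
      }
      groups.get(key).push(value);
    });
  }
  // Reduce stage: collapse each key's list of values into one result.
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(key, values);
  }
  return result;
}

// Hypothetical vote counts for two constituencies, totalled per party.
const docs = [
  { _id: "Constituency A", con: 10, lab: 30 },
  { _id: "Constituency B", con: 25, lab: 5 },
];
const totals = mapReduce(
  docs,
  function (doc, emit) {
    emit("con", doc.con);
    emit("lab", doc.lab);
  },
  function (key, values) {
    return values.reduce((a, b) => a + b, 0);
  }
);
console.log(totals); // { con: 35, lab: 35 }
```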

In a blog post for this project (http://klena02.wordpress.com/2010/02/01/couchdb/), I work through a small example of retrieving United Kingdom parliamentary election data from CouchDB with a map-reduce function.

The data relates to the 2005 UK general election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As the URL shows, this database is on a CouchDB instance on my local machine rather than on the web.)

Here is an example document from the database, holding the results for a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB holds two types of document. The most common type contains data, such as the one above with the electoral results for Aberavon.

The other type is the design document, which stores code rather than data.

The design document ‘con’ defines two ‘views’ of the data.

The first view, ‘Conservative Votes’, ranges over the document collection in a map function. It emits each constituency’s Conservative vote as the key and the constituency’s id as the value.

The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes. This is analogous to SELECT SUM(con) in SQL; grouping the reduce by key would correspond to a GROUP BY.

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The following request returns the sum of Conservative votes (8782198):

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}
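To see what the ‘Conservative Votes Total’ view computes, its map and reduce functions can be run outside CouchDB against stand-in emit and sum helpers. This sketch makes simplifying assumptions:

• queryReducedView is a hypothetical name.
• The reduce step runs once over all values.
• The second document is hypothetical; only Aberavon’s figure comes from the data above.

```javascript
// Minimal stand-ins for the emit and sum helpers that CouchDB's JavaScript
// view server provides; this runs outside CouchDB and is only a sketch.
let emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

// The map and reduce functions of the 'Conservative Votes Total' view.
const map = function (doc) { emit(null, doc.con); };
// CouchDB passes [key, docId] pairs as keys; this reduce ignores them anyway.
const reduce = function (keys, values) { return sum(values); };

// Run map over every document, then reduce all emitted values in one step.
function queryReducedView(docs) {
  emitted = [];
  docs.forEach(function (doc) { map(doc); });
  const keys = emitted.map((row) => row.key);
  const values = emitted.map((row) => row.value);
  return { rows: [{ key: null, value: reduce(keys, values) }] };
}

const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example Constituency", con: 1500 }, // hypothetical
];
console.log(JSON.stringify(queryReducedView(docs))); // {"rows":[{"key":null,"value":4562}]}
```

CouchDB may also call a reduce function again on its own partial results (a ‘rereduce’). A plain sum is safe in that case because addition is associative.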

We can also emulate a WHERE clause. The ‘Conservative Votes’ view uses each constituency’s Conservative vote as its key, so the following request returns the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
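The startkey/endkey request works because view rows are kept sorted by key, so a range can be found by searching rather than scanning. The sketch below models this with a sorted array and a binary search. It is an illustration, not CouchDB’s B-Tree code, and apart from Aberavon the rows are hypothetical. As in CouchDB’s default behaviour, both ends of the range are inclusive.

```javascript
// Rows of the 'Conservative Votes' view: key = Conservative vote, value = id.
// Aberavon's figure is from the document above; the other rows are hypothetical.
const viewRows = [
  { key: 3062, value: "Aberavon" },
  { key: 1500, value: "Constituency X" },
  { key: 1999, value: "Constituency Y" },
  { key: 950, value: "Constituency Z" },
];

// CouchDB keeps view rows sorted by key; a sorted array stands in for the B-Tree.
const index = [...viewRows].sort((a, b) => a.key - b.key);

// Binary search for the position of the first row whose key is >= target.
function lowerBound(rows, target) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Equivalent of ?startkey=...&endkey=... with both bounds inclusive.
function range(rows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(rows, startkey); i < rows.length && rows[i].key <= endkey; i++) {
    result.push(rows[i]);
  }
  return result;
}

console.log(range(index, 1000, 2000).map((r) => r.value)); // [ 'Constituency X', 'Constituency Y' ]
```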

Compared with SQL, this way of retrieving data is certainly cumbersome. One criticism of CouchDB is that it lacks sophisticated ad-hoc querying: every query on the system must be defined as a map-reduce function.

The advantage is that each map-reduce function written against a CouchDB document collection creates a new B-Tree index for that query. From then on, when a new document is added to the collection, the index is updated incrementally (the next time the view is queried) rather than rebuilt from scratch.

The system is therefore optimised for fast indexed retrieval. A query against a stored view never performs a ‘table scan’ at run-time, because every read is an indexed lookup.
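Incremental index maintenance can be sketched as follows. The index remembers the last database update sequence it has processed and, when read, runs the map function only over documents changed since then. IncrementalView and the change-log format are hypothetical simplifications, and deletions and non-numeric keys are not handled.

```javascript
// Toy model of an incrementally maintained view index (not CouchDB's code).
class IncrementalView {
  constructor(map) {
    this.map = map; // map(doc, emit)
    this.lastSeq = 0; // last update sequence already folded into the index
    this.rowsByDoc = new Map(); // doc id -> rows emitted by its latest revision
    this.mapCalls = 0; // how many times map has run, to show the saving
  }

  // Process only changes newer than lastSeq.
  update(changes) {
    for (const change of changes) {
      if (change.seq <= this.lastSeq) {
        continue; // already indexed
      }
      const rows = [];
      this.mapCalls++;
      this.map(change.doc, (key, value) => rows.push({ key: key, value: value, id: change.doc._id }));
      this.rowsByDoc.set(change.doc._id, rows); // replaces rows from an older revision
      this.lastSeq = change.seq;
    }
  }

  // All rows sorted by key (numeric keys only, for brevity).
  rows() {
    return [].concat(...this.rowsByDoc.values()).sort((a, b) => a.key - b.key);
  }
}

// Hypothetical change log: each entry is a document written at sequence seq.
const changes = [
  { seq: 1, doc: { _id: "A", con: 2000 } },
  { seq: 2, doc: { _id: "B", con: 1200 } },
];
const view = new IncrementalView((doc, emit) => emit(doc.con, doc._id));
view.update(changes);
changes.push({ seq: 3, doc: { _id: "A", con: 900 } }); // document A is edited
view.update(changes); // only the new change is mapped
console.log(view.mapCalls, view.rows().map((r) => r.value)); // 3 [ 'A', 'B' ]
```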

Section 5

Serving HTML from CouchDB

This section describes some initial experiments in storing and serving HTML with CouchDB, using its capabilities as a web server. My aim is an initial evaluation of CouchDB as an environment for web application development.

A more advanced way to write CouchDB applications is CouchApp, which I explore in Section 6. For now I want to see how far we can get with CouchDB on its own.

As we have seen, CouchDB stores data as documents. It stores HTML and JavaScript application code in so-called ‘design documents’, that is, documents that hold application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I created a database called ‘universities’. I use it to investigate how some of the British Council Activity Mapping data might be served to the web via CouchDB.

The ‘universities’ data set is a subset of the institution records in the Activity Mapping database. It consists of the universities whose students take part in the Erasmus programme [21].

I wrote a .Net application using C# code and SQL stored procedures to export the data from the project database at work into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the bash script listed below.

The script loops through the folder and uses cURL to send the data in each file to the CouchDB instance on the web [28]. (On Windows, a batch file could achieve a similar effect.)

#!/bin/bash
# host=http://127.0.0.1:5984    # uncomment to upload to a local instance instead
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json

# create the database
curl -X PUT "$host/$database"

for filepath in "$folder"/*"$fileextension"
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(basename "$filepath")
  echo "$filename"
  # strip the extension: the remainder becomes the document id
  docname=$(basename "$filepath" "$fileextension")
  echo "$docname"
  url="$host/$database/$docname"
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -H "Content-Type: application/json" -d @"$filepath"
done

Note that we use HTTP PUT with cURL to ‘put’ each document at a meaningful URL. (POST is used when we want CouchDB to assign a random id to the document.)

To put a single document on the web, the command would be:

curl -X PUT "http://username:password@mick.couchone.com/universities/Aberystwyth%20University" \
  -d @"universities/Aberystwyth%20University.json"

Figure 5.2: Documents in CouchDB

The resulting universities database can be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, CouchDB stores application code in design documents. A design document is a document whose id begins with _design/ [12].

I gave the first design document the id _design/d1. This very simple design document displays (or ‘shows’) a single document as HTML.

The listing below contains the initial code for the design document _design/d1. It has a shows section containing a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved field name in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req:

• doc is the document being rendered
• req is the request object, which can be used, for example, to read data passed in the URL’s query string
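A show function is just a JavaScript function of doc and req, so its output can be checked outside CouchDB by calling it directly. This is a minimal sketch, assuming Node.js. The sample document and its coordinates are hypothetical, and the req object is reduced to an empty query.

```javascript
// Show function s1 from the listing above, called directly as an ordinary function.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical sample document; req mimics only the query-string part of the request.
const sampleDoc = { _id: "Example University", latitude: 52.5, longitude: -1.9 };
const html = s1(sampleDoc, { query: {} });
console.log(html);
// <h1>Example University</h1><p>latitude: 52.5</p><p>longitude: -1.9</p>
```

The field values are concatenated straight into the markup, so any HTML inside a document field would pass through unescaped. A show function serving untrusted data would need to escape it first.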

After saving the design document as d1.json, we upload it to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (here, Aston University’s) with the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a very simple document, but it is a proof of concept that CouchDB can serve HTML pages as well as store data.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered
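The URL anatomy above can be captured in a couple of small helpers. This is a sketch with hypothetical function names. The one detail it relies on is that encodeURIComponent turns the space in a document id such as ‘Aston University’ into %20.

```javascript
// Base URL of a design document: <host>/<db>/_design/<ddoc>
function designUrl(host, db, ddoc) {
  return host + "/" + encodeURIComponent(db) + "/_design/" + encodeURIComponent(ddoc);
}

// <design doc>/_show/<show function>/<document id>
function showUrl(host, db, ddoc, showName, docId) {
  return designUrl(host, db, ddoc) + "/_show/" + encodeURIComponent(showName) +
    "/" + encodeURIComponent(docId);
}

// <design doc>/_list/<list function>/<view>
function listUrl(host, db, ddoc, listName, viewName) {
  return designUrl(host, db, ddoc) + "/_list/" + encodeURIComponent(listName) +
    "/" + encodeURIComponent(viewName);
}

console.log(showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University"));
// http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```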


5.3 A more complex ‘show’ function

The design document can be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As the URL shows, the ‘show’ function that returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html.

Appendix A gives the full listing of the design document, including the map1 ‘show’ function.

It soon becomes clear that authoring complex HTML and JavaScript inside a design document, where it must be returned as a string from a JavaScript function, is very cumbersome. For example, every double quote has to be escaped for the returned string to be valid.

When authoring design documents by hand, it is sometimes easier to edit the design document directly in CouchDB’s management console (Futon) than to upload it from the desktop PC with cURL. Either way, however, the process is difficult and unintuitive.
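One way to sidestep the escaping problem is sketched here, assuming Node.js is used to generate the design document. Write the function as ordinary JavaScript and let JSON.stringify serialise its source text, which escapes every double quote automatically. The function showCoords and the show name coords are hypothetical.

```javascript
// Write the show function as ordinary, syntax-checked JavaScript...
function showCoords(doc, req) {
  return '<p class="coords">' + doc.latitude + ", " + doc.longitude + "</p>";
}

// ...then serialise its source text into the design document. JSON.stringify
// escapes the double quotes (class="coords" becomes class=\"coords\").
const designDoc = {
  _id: "_design/d1",
  shows: { coords: showCoords.toString() },
};
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
// Saved as d1.json, this could be uploaded with: curl -X PUT .../_design/d1 -d @d1.json
```

CouchApp, discussed in Section 6, takes a more complete version of this approach by assembling design documents from separate files.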


Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a design document, these queries are listed in the views section and are known as ‘view’ functions. Here we define a very simple ‘view’ function that outputs the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document id values

A ‘view’ is in effect a query function that returns a set of results. To display those results as HTML, we can define ‘list’ functions in a design document:

• v1 is a very simple view that emits the document ids
• l1 is a very simple list that shows a hyperlink for each document id

We add a ‘list’ function to the design document as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { \"Content-Type\": \"text/html\" } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


Note the built-in functions start and send:

• start sends the response headers (here, the Content-Type) and is called once, before any output
• send writes a chunk of output to the response; here it is called once for each row returned by getRow
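A list function can likewise be exercised outside CouchDB by supplying simplified versions of the start, send and getRow globals. In this sketch runList is a hypothetical harness. The base constant only shortens the link prefix from the listing above, and the single view row is illustrative.

```javascript
// Simplified stand-ins for the start, send and getRow globals that CouchDB
// supplies to list functions, so that a list function can run outside CouchDB.
function runList(listFn, viewRows) {
  let next = 0;
  const chunks = [];
  let headers = null;
  globalThis.start = function (response) { headers = response.headers; };
  globalThis.send = function (chunk) { chunks.push(chunk); };
  globalThis.getRow = function () { return next < viewRows.length ? viewRows[next++] : null; };
  listFn({ total_rows: viewRows.length, offset: 0 }, { query: {} });
  return { headers: headers, body: chunks.join("") };
}

const base = "http://mick.couchone.com/universities/_design/default/_show/googlemap/";

// List function l1 from the design document above (link prefix held in base).
const l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="' + base + row.value + '">' + row.value + "</a></p>");
  }
};

const result = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
]);
console.log(result.headers["Content-Type"]); // text/html
console.log(result.body);
```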

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function

5.5 Lessons Learned

The aim of this section was to explore CouchDB design documents using very simple tools. I found that design documents can be written in a text editor and uploaded to a CouchDB server in the normal way. As a design document grows more complex, however, this approach becomes very cumbersome and the document becomes hard to maintain.

Another approach that helped with more complex design documents was to make in-place amendments in CouchDB’s management console, ‘Futon’. This too has its limitations. The fundamental problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

Working through this exercise in simple application development has given us a deeper understanding of how a CouchDB design document is structured. The next section looks at CouchApp, a development framework for building full-featured applications from CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable, syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
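As an illustration of what GeoRSS adds to an ordinary Atom entry, the sketch below builds one entry carrying a GeoRSS-Simple point, which holds the latitude, then the longitude, separated by a space. The helper names geoEntry and escapeXml are hypothetical, and the sketch assumes the enclosing feed element declares the georss namespace (http://www.georss.org/georss); it is not code taken from Sofa.

```javascript
// Hypothetical helpers: build one Atom <entry> with a GeoRSS-Simple point.
// Assumes the surrounding <feed> declares
//   xmlns:georss="http://www.georss.org/georss"

// Escape the characters that are special in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function geoEntry(id, title, latitude, longitude) {
  // GeoRSS-Simple: latitude first, then longitude, separated by one space
  return (
    "<entry>" +
    "<id>" + escapeXml(id) + "</id>" +
    "<title>" + escapeXml(title) + "</title>" +
    "<georss:point>" + latitude + " " + longitude + "</georss:point>" +
    "</entry>"
  );
}
```

Getting the coordinate order wrong is an easy mistake: a point written longitude-first still parses, but lands in the wrong place on the map.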

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer’s perspective is that it creates a separate file for each element of the design document. Seen this way, a CouchDB design document is in fact a web application project; CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an Atom feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the feed output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3

Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains, for each CouchDB View, a folder holding a map.js file and, when needed, a reduce.js file. A View can be thought of as a query against the database.

• The shows folder contains .js files, one for each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.

• Similarly, the lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
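The mapping from folders to a design document can be sketched in a few lines. This is a simplified, hypothetical illustration of the idea, not CouchApp’s implementation (which is written in Python and also handles attachments, .couchapprc and other details): each file path becomes a nested property, the file extension is dropped, and each file’s text is stored as a string.

```javascript
// Hypothetical sketch of the idea behind CouchApp: one file per element,
// merged into a single design-document object.
function buildDesignDoc(id, files) {
  const ddoc = { _id: "_design/" + id };
  for (const [path, source] of Object.entries(files)) {
    // "views/v1/map.js" -> ["views", "v1", "map"]; dropping the extension
    // means "templates/georss.html" becomes ddoc.templates.georss
    const parts = path.replace(/\.[^.\/]+$/, "").split("/");
    let node = ddoc;
    for (const part of parts.slice(0, -1)) {
      if (!node[part]) node[part] = {};
      node = node[part];
    }
    // functions and templates are stored as strings in the design document
    node[parts[parts.length - 1]] = source.trim();
  }
  return ddoc;
}
```

The resulting object has the same shape as a hand-written design document, which is why the two approaches are interchangeable from CouchDB’s point of view.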

Figure 6.3: Files generated by CouchApp

The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1

couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.
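For illustration, here is a show function in the style of s1 (see Appendix A), written so that it can be called directly. Inside CouchDB, doc is the requested document and req describes the HTTP request; the sample document used in the check is made up.

```javascript
// A show function in the style of s1: one document in, one HTML string out.
// doc is the requested document; req (unused here) describes the request.
function s1(doc, req) {
  return "<h1>" + doc._id + "</h1>" +
         "<p>latitude: " + doc.latitude + "</p>" +
         "<p>longitude: " + doc.longitude + "</p>";
}
```

Like the original, this sketch inserts doc._id into the HTML unescaped; a production show function would escape it.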

Figure 6.4: s1.js, a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.

We recall that in CouchDB a View is in effect a query against the database. As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, then the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed

6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.
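The HTTP side of deployment amounts to a single PUT of the merged design document. The sketch below only describes that request: designDocRequest is a hypothetical helper, not CouchApp’s API, and it does not contact a server. One detail worth showing is that replacing an existing design document requires its current _rev; without it, CouchDB rejects the update with 409 Conflict.

```javascript
// Hypothetical helper: describe (but do not send) the request that stores a
// design document in a CouchDB database.
function designDocRequest(dbUrl, ddoc, currentRev) {
  const body = Object.assign({}, ddoc);
  // updating an existing document requires its current revision,
  // otherwise CouchDB answers 409 Conflict
  if (currentRev) body._rev = currentRev;
  return {
    method: "PUT",
    // "_design/bc" keeps its slash as a literal path separator
    url: dbUrl.replace(/\/$/, "") + "/" + ddoc._id,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}
```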

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

Running couchapp push from the bc folder then sends the design document to this default database. So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This demonstrates how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used in Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This emits one row per document, keyed by the document ID, with the whole document as its value; the result can be seen at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
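Conceptually, CouchDB calls the map function once per document and collects the emitted key/value pairs into an index sorted by key. The sketch below mimics that in memory. runView is hypothetical; real CouchDB applies its own collation rules and stores the index persistently, updating it incrementally as documents change.

```javascript
// Hypothetical in-memory stand-in for building a CouchDB view.
function runView(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    // in CouchDB, emit() is a global; here it is passed in as a callback
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key: key, value: value }));
  }
  // simple string ordering stands in for CouchDB's view collation rules
  rows.sort((a, b) => String(a.key).localeCompare(String(b.key)));
  return rows;
}

// The "all" view: one row per document, keyed by _id, whole document as value.
function allMap(doc, emit) {
  emit(doc._id, doc);
}
```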

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss

This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The template loops through each of the programmes and, if a programme has a list of countries, emits the flags of those countries.
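To make the template’s nested sections concrete, the sketch below writes the same loops in plain JavaScript for a single institution. entryContent and FLAG_BASE are hypothetical names; the flag URL follows the countries database of Appendix E. Unlike the template, the sketch emits raw markup: in the feed, the template writes &lt; and &gt; instead, because the markup sits inside an Atom <content type="html"> element.

```javascript
// Hypothetical plain-JavaScript equivalent of the template's nested sections.
const FLAG_BASE = "http://mick.couchone.com/countries/";

function entryContent(institution) {
  let html = "";
  // {{#programmes}} ... {{/programmes}}: one paragraph per programme
  for (const programme of institution.programmes || []) {
    html += "<p>" + programme.name;
    // {{#countries}} ... {{/countries}}: one flag image per country code
    for (const country of programme.countries || []) {
      html += '<img src="' + FLAG_BASE + country + '/flag" alt="' + country +
              '" title="' + country + '">';
    }
    html += "</p>";
  }
  return html;
}
```

The has_programmes and has_countries flags in the template play the role of the `|| []` guards here: they stop Mustache from rendering a section for a missing list.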

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all

Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB communicates over HTTP, which makes it easy to replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has fast access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
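Replication is itself driven over HTTP: a POST to the server’s /_replicate resource, with a JSON body naming the source and target databases. The sketch below only builds that request. replicationRequest is a hypothetical helper and the URLs are placeholders. Design documents replicate like any other document (given sufficient rights on the target), which is how application code travels with the data.

```javascript
// Hypothetical helper: describe (but do not send) a CouchDB replication request.
function replicationRequest(serverUrl, source, target, continuous) {
  return {
    method: "POST",
    url: serverUrl.replace(/\/$/, "") + "/_replicate",
    headers: { "Content-Type": "application/json" },
    // source and target may be local database names or full database URLs;
    // continuous: true keeps the replication running as changes arrive
    body: JSON.stringify({ source, target, continuous: Boolean(continuous) }),
  };
}
```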

Such replication cannot be achieved easily using conventional web applications. Even given the maturity of prevailing technology, replicating code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. By comparison, with CouchDB relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs such as GET, PUT, POST and DELETE; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL

• understanding the structure of design documents

• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.

CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is expressed as a map-reduce function, normally defined in advance in a design document. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
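As an example of the shift in thinking, consider counting institutions per region. In SQL this is a GROUP BY; in CouchDB it is a map function that emits (region, 1) and a reduce function that sums the values, queried with group=true (CouchDB also offers built-in _sum and _count reducers). The evaluator groupReduce below is a hypothetical in-memory stand-in for CouchDB, written only to show the semantics, and the sample documents are made up.

```javascript
// SQL:  SELECT region, COUNT(*) FROM institutions GROUP BY region
// Map-reduce: the map emits (region, 1) and the reduce sums the 1s.
function mapByRegion(doc, emit) {
  if (doc.region) emit(doc.region, 1);
}

function sumReduce(keys, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Hypothetical evaluator standing in for a group=true view query.
// It ignores rereduce and CouchDB's incremental B-tree index.
function groupReduce(mapFn, reduceFn, docs) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push({ id: doc._id, value: value });
    });
  }
  return [...groups.keys()].sort().map((key) => {
    const entries = groups.get(key);
    // on a first-level reduce, CouchDB passes keys as [key, docid] pairs
    const keys = entries.map((e) => [key, e.id]);
    return { key: key, value: reduceFn(keys, entries.map((e) => e.value)) };
  });
}
```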

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider at CERN [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will grow. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online, in Futon’s source view: http://mick.couchone.com/_utils/document.html?universities/_design/d1

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.
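The escaping problem is easy to reproduce. In the sketch below, a small show function (s2 is a hypothetical name) is stored in a design document and serialised with JSON.stringify, which inserts the backslashes mechanically. Tooling such as CouchApp removes the need to do this by hand, although its own mechanism may differ.

```javascript
// The function source as a developer would write it, with ordinary quotes.
const showSource =
  'function(doc, req) { return \'<div id="map_canvas">\' + doc._id + \'</div>\'; }';

// Stored in a design document, the function is just a string value...
const designDoc = { _id: "_design/d1", shows: { s2: showSource } };

// ...so serialising the document escapes every double quote as \"
const json = JSON.stringify(designDoc);
```

Parsing the JSON back yields the original source unchanged, so generating the escapes with a serialiser is both safer and quicker than typing them.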

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:

// apply docForm at login.
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View: in the simplest case, the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to render the ‘stash’ through a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ? institution.programmes.map(function(programme) {
            return {
              name : programme.name,
              url : programme.url,
              has_countries : programme.countries ? true : false,
              countries : programme.countries ? programme.countries.map(function(country) {
                return {
                  country : country
                };
              }) : [] // return nothing if no countries
            };
          }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from CouchDB documents into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an .html file, the output it produces is in fact GeoRSS XML. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload .png files from a folder so that each image file becomes an attachment to a CouchDB document.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  # strip the extension to get the document name (escaped dot: a literal ".")
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  # @ tells curl to read the request body from the file
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
done
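The two sed commands in the loop turn a path such as countries/png/ie.png into the document name ie, and the PUT then stores the image as the flag attachment of that document, creating the document if it does not yet exist. The same mapping can be restated as a small helper; flagUrl is hypothetical, and credentials are omitted.

```javascript
// Hypothetical helper: the path -> document name -> attachment URL mapping
// performed by the upload script.
function flagUrl(serverUrl, filepath) {
  const filename = filepath.split("/").pop();      // "ie.png"
  const docname = filename.replace(/\.png$/, "");  // "ie"
  // PUT to /countries/<docname>/flag creates the document if it does not
  // exist yet and stores the PNG as its "flag" attachment
  return serverUrl.replace(/\/$/, "") + "/countries/" +
         encodeURIComponent(docname) + "/flag";
}
```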

Bibliography

[1] A NoSql Summer. A seasonal worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. CouchDB implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2–11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Database_normalization

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


Contents (continued)

4.4 Inserting a Document into a CouchDB Database
4.5 Deleting a Document
4.6 Updating a Document
4.7 Adding Attachments
4.8 Replication
4.9 Querying a CouchDB Database using Map-Reduce

5 Serving HTML from CouchDB
5.1 Bulk upload of JSON documents
5.2 A CouchDB Design Document
5.3 A more complex 'show' function
5.4 Views and Lists
5.5 Lessons Learned

6 Serving GeoRSS using CouchApp
6.1 Introduction to CouchApp
6.2 Sofa - a Blogging Application
6.3 Developing with CouchApp
6.4 Deployment
6.5 Using templates

7 Critical Assessment and Conclusion
7.1 Why might a developer choose CouchDB?
7.2 CouchDB's challenges
7.3 Conclusion

A Design Document d1
B GeoRSS on Sofa
C georss.js
D georss.html
E Bash Script to Upload Country Flag Files

Section 1

Introduction

The MSc Computer Science project described in this report aims to investigate a relatively new database product called CouchDB from a web application development perspective [10] [13]. I have built a simple business application and use this as the test case from which to evaluate CouchDB's suitability as a web development platform.

1.1 Motivation

The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of 'key-value store' products, which have emerged recently to deal with the challenges of very large data volumes in online environments.

The background to the project is one of rapid change, inspired by the emergence of a set of innovative open-source non-relational (also known as 'NoSql'¹) database products.

These 'NoSql' database products are challenging the accepted standard use of relational database management systems as the only storage mechanism developers use for their data. Relational databases have been very successful in the past and are likely to remain so, but in a certain sub-set of usage scenarios non-relational systems are starting to take their place.

1.2 Does 'one size' still 'fit all'?

One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:

"The last 25 years of commercial DBMS development can be summed up in a single phrase: 'One size fits all'. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.

[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser."

¹ Pronounced 'No-See-Quel'.

Why does this matter? Does 'one size fit all'? Up until recent years, the answer has always been 'yes': practically every data-centric IT project over the past 25 years has used a relational database to persist data. Systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.

1.3 Organisation

The sections are organised in the following way:

• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project;

• in Section 3, I introduce the open-source document-oriented database CouchDB;

• in Section 4, I provide an overview of CouchDB's application programming interface;

• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database, that is, using CouchDB to store both data and application code;

• in Section 6, I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps;

• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.

Section 2

Background

In recent years, a new generation of data stores has emerged which approaches data persistence in a different way to the established relational model.

In this section, I briefly discuss this relational model and describe some of the recent challenges which have led to the development of the newer, alternative database systems.

2.1 The Relational Model

The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].

One description [54] of Codd's relational model reads as follows:

"The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains."

A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation shows relations as tables, tuples as rows and attributes as columns.

The main advantage of the relational database is its precise and mathematically rigorous organisation of data. In this context, it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or 'NoSql' databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, the question of whether a database is 'relational' or not is more a matter of how applications use the database: a relational database represents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].

2.2 Relational Database Management Systems

The term 'Relational Database Management System' refers to the class of databases which follow Codd's relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open source (MySQL, PostgreSQL) databases in use over the past three decades have been based on this model.

These products have been very successful, and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term 'database' has become associated with relational databases to such an extent that non-relational databases are often referred to as 'non-relational data stores' for clarity. (Formally, and in its original meaning, the term 'database' simply means a collection of data.)

2.3 Normalisation

A key factor in relational data storage is data normalisation. Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by means of removing duplication [48].

The first level of normalisation, 'first normal form', requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.

Consider, for example, a table (let's call it 'Staff') containing staff records:

Name     Nationality
Bob      Australia
Jane     Sweden

All is well until the system encounters a staff member with multiple nationalities:

Name     Nationality
Bob      Australia
Jane     Sweden
Alice    United Kingdom, South Africa

This second table breaks the rules of normalisation: any of the relational database management systems listed above would accept the string 'United Kingdom, South Africa', but it is not an atomic value, so the design is not valid in relational terms. The standard way to solve this problem would be to create two more tables, a 'Nationality' table and a 'StaffNationality' table, linked with Primary and Foreign Keys.
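As a minimal sketch of the normalised design described above, the three tables can be modelled as JavaScript arrays of rows. The column names (StaffId, NationalityId and so on) are illustrative assumptions, not taken from any real schema. A join across the link table recovers Alice's two nationalities without any cell holding a non-atomic value.

```javascript
// Hypothetical normalised schema: every cell holds a single atomic value.
const Staff = [
  { StaffId: 1, Name: "Bob" },
  { StaffId: 2, Name: "Jane" },
  { StaffId: 3, Name: "Alice" },
];

const Nationality = [
  { NationalityId: 1, Name: "Australia" },
  { NationalityId: 2, Name: "Sweden" },
  { NationalityId: 3, Name: "United Kingdom" },
  { NationalityId: 4, Name: "South Africa" },
];

// Link table: each row pairs one staff member with one nationality (both foreign keys).
const StaffNationality = [
  { StaffId: 1, NationalityId: 1 },
  { StaffId: 2, NationalityId: 2 },
  { StaffId: 3, NationalityId: 3 },
  { StaffId: 3, NationalityId: 4 },
];

// Equivalent of joining Staff -> StaffNationality -> Nationality for one person.
function nationalitiesOf(name) {
  const person = Staff.find((s) => s.Name === name);
  if (!person) return [];
  return StaffNationality
    .filter((link) => link.StaffId === person.StaffId)
    .map((link) => Nationality.find((n) => n.NationalityId === link.NationalityId).Name);
}

console.log(nationalitiesOf("Alice")); // [ 'United Kingdom', 'South Africa' ]
```

Adding a third nationality for Alice means adding one row to the link table; no table structure changes.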

While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata, which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.

In a non-relational database (such as CouchDB), Alice's record could be stored as follows:

{
    "name": "Alice",
    "nationality": [
        "United Kingdom",
        "South Africa"
    ]
}

This record (or 'document', in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other 'NoSql' products are sometimes referred to as 'schema-less' database management systems.

The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.

2.4 SQL

A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].

To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table needs to store atomic values. In our example above, this query would result in 'United Kingdom, South Africa' being considered a distinct nationality.
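The effect of a non-atomic value on this query can be sketched by emulating `SELECT Nationality, COUNT(*) ... GROUP BY Nationality` over plain JavaScript rows. The helper `groupByCount` is hypothetical and only stands in for what the database engine does.

```javascript
// Emulates: SELECT <column>, COUNT(*) FROM rows GROUP BY <column>
function groupByCount(rows, column) {
  const counts = {};
  for (const row of rows) {
    const key = row[column];
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Unnormalised table: Alice's cell holds two nationalities as one string.
const staffRows = [
  { Name: "Bob", Nationality: "Australia" },
  { Name: "Jane", Nationality: "Sweden" },
  { Name: "Alice", Nationality: "United Kingdom, South Africa" },
];

// Normalised alternative: one (person, nationality) row per atomic value.
const staffNationalityRows = [
  { Name: "Bob", Nationality: "Australia" },
  { Name: "Jane", Nationality: "Sweden" },
  { Name: "Alice", Nationality: "United Kingdom" },
  { Name: "Alice", Nationality: "South Africa" },
];

// 'United Kingdom, South Africa' is counted as one distinct nationality.
console.log(groupByCount(staffRows, "Nationality"));
// Each nationality is counted separately.
console.log(groupByCount(staffNationalityRows, "Nationality"));
```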

In Section 4, I will discuss map-reduce, which is the method used by CouchDB and some other 'NoSql' systems to query the database.
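To give a flavour of the approach before Section 4, the sketch below imitates how a CouchDB view might answer the same question over schema-less documents. In CouchDB, map functions are JavaScript functions that call `emit(key, value)`, and reduce functions take `(keys, values, rereduce)`; the `runView` harness that collects emitted rows, groups them by key and applies the reduce is a hypothetical in-memory stand-in, not CouchDB's view engine (which stores view results in an index and updates it incrementally).

```javascript
// CouchDB supplies emit() to map functions; here the hypothetical harness sets it.
let emit = null;

// Map function in the style of a CouchDB view: one row per nationality.
// Documents need not share a schema, so accept a single string or an array.
function map(doc) {
  const values = Array.isArray(doc.nationality) ? doc.nationality : [doc.nationality];
  for (const n of values) {
    if (n) emit(n, 1);
  }
}

// Reduce function with the (keys, values, rereduce) signature: sum the values.
function reduce(keys, values, rereduce) {
  return values.reduce((a, b) => a + b, 0);
}

// Hypothetical harness: run map over every document, group rows by key,
// then reduce each group (similar in spirit to querying a view with group=true).
function runView(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    emit = (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push({ id: doc._id, value });
    };
    mapFn(doc);
  }
  const result = {};
  for (const [key, rows] of groups) {
    result[key] = reduceFn(rows.map((r) => [key, r.id]), rows.map((r) => r.value), false);
  }
  return result;
}

const docs = [
  { _id: "bob", name: "Bob", nationality: "Australia" },
  { _id: "jane", name: "Jane", nationality: "Sweden" },
  { _id: "alice", name: "Alice", nationality: ["United Kingdom", "South Africa"] },
];

console.log(runView(docs, map, reduce));
// { Australia: 1, Sweden: 1, 'United Kingdom': 1, 'South Africa': 1 }
```

Because the map function, not a fixed schema, decides how each document is read, Alice's array and Bob's plain string can sit side by side. CouchDB also offers built-in reduce functions such as `_sum` and `_count` for cases like this.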

2.5 Transactional Guarantees

A further very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].

The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time there is only 'one version of the truth' for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].

For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at exactly the same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
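The all-or-nothing behaviour described above can be sketched in a few lines. The `transfer` function is a hypothetical illustration of atomicity only: it works on a private copy of the balances and hands back the new version only if every step succeeds, so a failure part-way through leaves the original state untouched. A real RDBMS achieves this with logging and locking rather than by copying data.

```javascript
// Hypothetical transfer between two balances, showing atomicity (all or nothing).
function transfer(balances, from, to, amount) {
  if (!(from in balances) || !(to in balances)) throw new Error("unknown account");
  // Work on a private copy, so a failure part-way through leaves no half-applied change.
  const draft = { ...balances };
  draft[from] -= amount; // debit
  if (draft[from] < 0) {
    throw new Error("insufficient funds"); // abort: the draft is simply discarded
  }
  draft[to] += amount; // credit
  return draft; // "commit": the caller adopts the new version as a whole
}

let accounts = { alice: 100, bob: 20 };
accounts = transfer(accounts, "alice", "bob", 30);
console.log(accounts); // { alice: 70, bob: 50 }

try {
  accounts = transfer(accounts, "bob", "alice", 500);
} catch (err) {
  console.log(err.message); // insufficient funds
}
console.log(accounts); // { alice: 70, bob: 50 } - unchanged by the failed transfer
```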

The paradigm followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.

However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places: for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).

Caching data in many places, so that the data is near to the user when it is needed, is one of the most effective ways to reduce latency. There may no longer be a 'single version of the truth', but for many applications the trade-off is worthwhile.

2.6 The 'NoSql' Movement

'NoSql' refers to a range of non-relational database products which have emerged in recent years. The term 'NoSql' is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.)¹.

'NoSql' is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:

• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.

• 'NoSql' databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of 'a NoSql Summer' [1].

2.7 'NoSql' databases at large scale

Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].

The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.

To give an example, if Alice in London updates her Facebook status, it doesn't really matter if Bob in Singapore sees a 'stale' version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. 'NoSql' systems are generally intended for the Facebook scenario; foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.

In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
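A minimal sketch of eventual convergence, assuming a simple last-writer-wins rule: each replica accepts local writes immediately and later exchanges state with a peer, both keeping whichever version carries the later timestamp. This rule is an assumption chosen for brevity; CouchDB itself tracks document revisions rather than clock timestamps when replicas disagree. The replica names and status text are hypothetical.

```javascript
// Each replica stores one status value plus the logical time it was written.
function makeReplica(name) {
  return { name, value: null, time: 0 };
}

// Local write: accepted immediately, so the replica stays available.
function write(replica, value, time) {
  replica.value = value;
  replica.time = time;
}

// Anti-entropy step (last-writer-wins): both replicas keep the newer version.
function sync(a, b) {
  const newer = a.time >= b.time ? a : b;
  a.value = b.value = newer.value;
  a.time = b.time = newer.time;
}

const london = makeReplica("London");
const singapore = makeReplica("Singapore");

write(london, "Alice: at the theatre", 1);
// Before synchronisation, Bob in Singapore still sees the stale (empty) status.
console.log(singapore.value); // null
sync(london, singapore);
console.log(singapore.value); // Alice: at the theatre
```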

Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.

In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:

"Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability."

2.8 Brewer's 'CAP' Theorem

The idea of trading off consistency for availability is a key feature of 'NoSql' databases. Reference is often made to Eric Brewer's conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not, so a looser model of consistency is considered a price worth paying.

This realisation, that 'eventual' consistency is often sufficient, has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.

In addition, these 'NoSql' data stores do not have to be as 'intelligent' as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a 'NoSql' system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, 'NoSql' systems are simply 'dumb' repositories of data. This has meant, of course, that the code-base required for 'NoSql' systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, even hand-held devices and mobile phones.

2.9 Reducing the Impedance Mismatch

The term 'object-relational impedance mismatch' refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a number of Object-Relational Mapping (ORM) tools have emerged; examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to 'bridge the gap' between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain². Another tactic used by application developers is to use SQL Views, effectively de-normalising data so that the data model is closer to the application's object model.

A key advantage of 'NoSql' systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
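The contrast can be sketched with a hypothetical application object. To store it relationally, the nested list has to be split into a parent row and child rows (the job an ORM automates); as a document, it can be serialised as-is with `JSON.stringify` and read back with `JSON.parse`. The object shape and the `toRows` helper are illustrative assumptions.

```javascript
// An application object with a nested list, as it might exist in memory.
const staffMember = {
  id: 3,
  name: "Alice",
  nationalities: ["United Kingdom", "South Africa"],
};

// Relational route: fragment the object into rows for two tables (what an ORM automates).
function toRows(obj) {
  return {
    staff: [{ StaffId: obj.id, Name: obj.name }],
    staffNationality: obj.nationalities.map((n) => ({ StaffId: obj.id, Nationality: n })),
  };
}

// Document route: the object is stored and retrieved in the same shape.
const stored = JSON.stringify(staffMember);
const restored = JSON.parse(stored);

console.log(toRows(staffMember).staffNationality.length); // 2
console.log(restored.nationalities[1]); // South Africa
```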

2.10 Benefits of 'NoSql'

As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of Moore's Law: disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in Gigabytes) that many databases can fit entirely in memory.

• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It is better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, 'NoSql' databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.

2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'

It doesn't come as much of a surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support, or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.

(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often, a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the 'NoSql' solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.
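A sketch of that caching step, under assumed table and field names: rows exported from a relational system are grouped into one denormalised document per institution, ready to be loaded into a 'NoSql' store for reporting. Nothing here is taken from a real system; the row data and the `toReportDocs` helper are hypothetical.

```javascript
// Hypothetical rows exported from a relational database (already normalised).
const institutionRows = [
  { InstitutionId: 1, Name: "Example College", Town: "Leeds" },
  { InstitutionId: 2, Name: "Example School", Town: "Cardiff" },
];
const projectRows = [
  { ProjectId: 10, InstitutionId: 1, Title: "Science exchange" },
  { ProjectId: 11, InstitutionId: 1, Title: "Youth arts" },
  { ProjectId: 12, InstitutionId: 2, Title: "Language assistants" },
];

// Denormalise: one self-contained document per institution, with its projects embedded.
function toReportDocs(institutions, projects) {
  return institutions.map((inst) => ({
    _id: `institution-${inst.InstitutionId}`,
    name: inst.Name,
    town: inst.Town,
    projects: projects
      .filter((p) => p.InstitutionId === inst.InstitutionId)
      .map((p) => p.Title),
  }));
}

const reportDocs = toReportDocs(institutionRows, projectRows);
console.log(JSON.stringify(reportDocs[0]));
// {"_id":"institution-1","name":"Example College","town":"Leeds","projects":["Science exchange","Youth arts"]}
```

The join is paid for once, at export time, so each report can later be served by reading a single document.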

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].

The British Council is the United Kingdom's international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010, the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.

Figure 2.1: British Council project data displayed on Bing Maps

The data from these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format, and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.

Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.

Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB, in effect using CouchDB as a web caching layer.

The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
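The transformation step can be sketched as a plain function that turns JSON documents into a GeoRSS feed. The document fields (`name`, `lat`, `lon`) are hypothetical; the output uses the GeoRSS-Simple `georss:point` element, which holds latitude followed by longitude separated by a space, inside a minimal RSS 2.0 channel. In CouchDB, logic of this kind would live in a 'show' or 'list' function (Sections 5 and 6); here it is standalone so it can run on its own.

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Convert one (hypothetical) project document into an RSS <item> with a GeoRSS point.
function toGeoRssItem(doc) {
  return [
    "<item>",
    `  <title>${escapeXml(doc.name)}</title>`,
    `  <georss:point>${doc.lat} ${doc.lon}</georss:point>`,
    "</item>",
  ].join("\n");
}

// Wrap the items in a minimal RSS 2.0 channel that declares the GeoRSS namespace.
function toGeoRssFeed(title, docs) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0" xmlns:georss="http://www.georss.org/georss">',
    "<channel>",
    `<title>${escapeXml(title)}</title>`,
    ...docs.map(toGeoRssItem),
    "</channel>",
    "</rss>",
  ].join("\n");
}

const feed = toGeoRssFeed("Activity Map", [
  { name: "Example College", lat: 53.8, lon: -1.55 },
]);
console.log(feed);
```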

This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for 'prime time', that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms³.

³ In my case, the bulk of my experience has been in C# and ASP.Net; I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].

3.1 'Of the Web'

The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].
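Because CouchDB speaks HTTP, storing and fetching a document needs nothing more than an HTTP client. The sketch below uses the `fetch` function built into Node.js 18 and later, and assumes a CouchDB server at `http://127.0.0.1:5984` (CouchDB's default port) with an existing `addressbook` database that is reachable without authentication; many installations require credentials, and the document ID `john-smith` is made up. In CouchDB's API, `PUT /{db}/{docid}` creates a document and `GET /{db}/{docid}` retrieves it; Section 4 covers the API in more detail.

```javascript
// Assumed server location; 5984 is CouchDB's default port.
const COUCH = "http://127.0.0.1:5984";

// Build the URL of a document, encoding the database name and ID.
function docUrl(db, id) {
  return `${COUCH}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`;
}

// Create a document (or update it, if the body carries the current _rev): PUT /{db}/{docid}
async function putDocument(db, id, doc) {
  const res = await fetch(docUrl(db, id), {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  });
  if (!res.ok) throw new Error(`PUT failed: ${res.status}`);
  return res.json(); // CouchDB replies with ok, id and rev fields
}

// Retrieve a document: GET /{db}/{docid}
async function getDocument(db, id) {
  const res = await fetch(docUrl(db, id));
  if (!res.ok) throw new Error(`GET failed: ${res.status}`);
  return res.json();
}

// Usage against a running server (not executed here):
// await putDocument("addressbook", "john-smith", { firstName: "John", lastName: "Smith" });
// const john = await getDocument("addressbook", "john-smith");
```

No driver or mapping layer is involved: the request body is the JSON document itself.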

The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-quoted remark about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS¹-agnostic." [31]

Normally, a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an 'indexable schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system, which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available online [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}

¹ Operating System.

Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, a missing postal code would simply be denoted by its absence from the document itself; this is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so, if we wish to, we may legitimately store the following document in the very same 'addressbook' database:

{
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
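A short sketch of handling such a schema-less document in JavaScript: the missing `postalCode` is detected with the `in` operator rather than by testing for NULL, and each `\n` escape in `streetAddress` becomes a real line break once the JSON text is parsed. The document is the Zambia example above, written as the JSON string a client would receive.

```javascript
// The Zambia document as a JSON string (in this JS literal, \\n is a backslash-n in the JSON text).
const json =
  '{"company": "British Council Zambia",' +
  ' "address": {"streetAddress": "Heroes Place\\nCairo Road\\nPO Box 34571",' +
  ' "city": "Lusaka", "country": "Zambia"}}';

const doc = JSON.parse(json);

// A missing field is simply absent, rather than present with a NULL value.
const hasPostalCode = "postalCode" in doc.address; // false

// Each \n escape in the JSON text is now a newline character in the string.
const addressLines = doc.address.streetAddress.split("\n");
console.log(addressLines); // [ 'Heroes Place', 'Cairo Road', 'PO Box 34571' ]
```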

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can also be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.
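In plain CouchDB, a replication of this kind can be requested by POSTing a JSON body naming a `source` and a `target` database to the server's `/_replicate` endpoint, optionally with `"continuous": true` so that replication keeps running as changes arrive. The sketch below only builds such a request; the server address, the local database name `bookmarks` and the remote URL are hypothetical placeholders, and it is not a description of how Desktop Couch itself configures synchronisation with UbuntuOne.

```javascript
// Build a request for CouchDB's /_replicate endpoint (POST with a JSON body).
// The server address, database name and remote URL below are hypothetical placeholders.
function replicationRequest(server, source, target, continuous = false) {
  const body = { source, target };
  if (continuous) body.continuous = true; // keep replicating as changes arrive
  return {
    url: `${server}/_replicate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

const req = replicationRequest(
  "http://127.0.0.1:5984",
  "bookmarks",
  "https://couch.example.com/bookmarks"
);

// With a running server (Node.js 18+), the request could be sent with:
// const res = await fetch(req.url, req.options);
console.log(req.options.body);
// {"source":"bookmarks","target":"https://couch.example.com/bookmarks"}
```

Replicating in the other direction is the same request with `source` and `target` swapped, which is how data can follow a user between machines.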

In this way a user can store their contacts internet bookmarks musicfiles and so on in multiple places and have them constantly available [34]Since CouchDB is available for multiple platforms (Windows Mac Androidetc) the data can be made available across different devices not just Ubuntudesktop PCs A short (2 minute) video illustration of this is available here[19]

36 Desktop Couch Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 910onwards The developers of Ubuntu have always been interested in loweringthe barriers of entry for new programmers To promote this they havedeveloped lsquoQuicklyrsquo [8] to provide a way for beginner programmers to getstarted using Python to create desktop applications

Many programmers of my generation - I started coding in 1998 - usedMicrosoft Visual Basic to get started Quickly is similar to this The in-novative aspect is that it uses Desktop Couch for persistence so that anlsquoopportunisticrsquo developer can be productive without needing to worry aboutdatabase setup The other advantage is that once data is stored in Desktop

SECTION 3 INTRODUCTION TO COUCHDB 20

Couch it is replicated to the UbuntuOne service so that the data can beretrieved on other machines or devices running CouchDB as well

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A broader aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

If you are reading this section, I strongly recommend that you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same command with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


curl -H "Content-Type: application/json" \
     -X POST http://username:password@mick.couchone.com/addressbook \
     -d @john-smith.json

cURL syntax is explained here [44]. Briefly:

• -H introduces an HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
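Any program with an HTTP library can do what cURL does here. The following sketch, assuming Node.js 18 or later (which provides a global fetch), builds the same POST request in JavaScript. The function names are hypothetical, and only the request-building step runs without a server. One practical difference from cURL: fetch rejects URLs that embed a username and password, so against the hosted instance the credentials would have to go in an Authorization header instead, which is why the example uses a local server.

```javascript
// Build the request that mirrors:
//   curl -H "Content-Type: application/json" -X POST <server>/addressbook -d @john-smith.json
function buildPostRequest(serverUrl, database, doc) {
  return {
    url: `${serverUrl.replace(/\/+$/, "")}/${encodeURIComponent(database)}`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(doc),
    },
  };
}

// Send it (requires Node.js 18+ for the global fetch and a running CouchDB).
async function postDocument(serverUrl, database, doc) {
  const { url, options } = buildPostRequest(serverUrl, database, doc);
  const response = await fetch(url, options);
  return response.json(); // CouchDB replies with the new document's id and rev
}

const request = buildPostRequest("http://127.0.0.1:5984/", "addressbook", {
  firstName: "John",
  lastName: "Smith",
});
console.log(request.url); // http://127.0.0.1:5984/addressbook
```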

All data in CouchDB is stored in JSON format, together with a unique key and a revision number, so in CouchDB the resulting document would have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy the output into the JSONLint validator (http://jsonlint.com).


Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

curl -H "Content-Type: application/json" \
     -X PUT \
     http://username:password@mick.couchone.com/addressbook/british-council-zambia \
     -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
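As a small illustration of that last point, the sketch below takes the JSON returned for british-council-zambia (pasted in as a string, so no network access is assumed) and turns the address into display lines. The function name addressLines is hypothetical; the field names are those of the document above.

```javascript
// JSON as returned by GET /addressbook/british-council-zambia (copied from above).
const json = `{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\\nCairo Road\\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}`;

// Turn the document into an array of address lines for display,
// splitting the multi-line streetAddress and skipping missing fields.
function addressLines(doc) {
  const { streetAddress = "", city, country } = doc.address || {};
  return [doc.company, ...streetAddress.split("\n"), city, country].filter(Boolean);
}

const doc = JSON.parse(json);
console.log(addressLines(doc).join("\n"));
```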

Figure 4.3: Document viewed using Futon


4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.

As we are updating a ‘named’ document, we use the PUT verb instead of POST.

The following command will update John Smith’s record with the contents of the file john-smith-v2.json, quoting the _rev of John Smith’s current revision:

curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
     -d @john-smith-v2.json
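The revision check used for deleting and updating is a form of optimistic concurrency control. The sketch below models it in memory: a write must quote the document’s current _rev, otherwise it is rejected with a conflict, as CouchDB does with its ‘Document update conflict’ error (HTTP 409). It is an illustration only. The class RevisionedStore is hypothetical, and real CouchDB revision ids end in a hash rather than the fixed word used here.

```javascript
// In-memory model of CouchDB-style revision checking (illustrative only).
class RevisionedStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  // Create (no rev) or update (rev must equal the stored one).
  put(id, body, rev) {
    const current = this.docs.get(id);
    if ((current ? current.rev : undefined) !== rev) {
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    const n = current ? Number(current.rev.split("-")[0]) + 1 : 1;
    const newRev = `${n}-placeholder`; // real CouchDB puts a hash after the counter
    this.docs.set(id, { rev: newRev, body });
    return { status: 201, ok: true, id, rev: newRev };
  }

  remove(id, rev) {
    const current = this.docs.get(id);
    if (!current) return { status: 404, error: "not_found" };
    if (current.rev !== rev) {
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    // Simplification: real CouchDB keeps a 'deleted' tombstone revision instead.
    this.docs.delete(id);
    return { status: 200, ok: true };
  }
}

const store = new RevisionedStore();
const created = store.put("john-smith", { age: 25 });
const stale = created.rev;                                     // "1-placeholder"
const updated = store.put("john-smith", { age: 26 }, stale);   // accepted
const conflict = store.put("john-smith", { age: 27 }, stale);  // rejected: stale rev
console.log(conflict.status); // 409
```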

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons³ from http://www.famfamfam.com/lab/icons/flags.

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
     "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
     --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.


The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
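CouchDB also accepts attachments inline, as base64-encoded data inside a document’s _attachments field. This can be convenient when a document and its files are created together. The following is a minimal sketch, assuming Node.js (for Buffer). The helper name withInlineAttachment is hypothetical, and the four bytes used are a stand-in for a real PNG file.

```javascript
// Add an inline attachment to a document object before it is PUT to CouchDB.
// CouchDB expects the attachment body base64-encoded in the "data" field.
function withInlineAttachment(doc, name, contentType, bytes) {
  return {
    ...doc,
    _attachments: {
      ...(doc._attachments || {}),
      [name]: {
        content_type: contentType,
        data: Buffer.from(bytes).toString("base64"),
      },
    },
  };
}

// Stand-in bytes (the PNG signature's first four bytes); in practice these
// would come from fs.readFileSync("zm.png").
const fakePng = Uint8Array.from([0x89, 0x50, 0x4e, 0x47]);
const doc = withInlineAttachment(
  { _id: "british-council-zambia", company: "British Council Zambia" },
  "flag",
  "image/png",
  fakePng
);
console.log(doc._attachments.flag.data); // iVBORw==
```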

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
     -X POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"http://username:password@mick.couchone.com/addressbook",
          "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping the quotes inside the JSON:

curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
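Much of the awkwardness above comes from shell quoting rather than from CouchDB itself. If the request body is built with JSON.stringify, no quotes need to be escaped by hand on any platform. The following is a minimal sketch, assuming Node.js. The function name replicationBody is hypothetical. The optional continuous flag, which CouchDB’s _replicate endpoint accepts to keep a replication running as changes arrive, is included only as an illustration.

```javascript
// Build the JSON body for POST /_replicate without any shell escaping.
function replicationBody(source, target, { continuous = false } = {}) {
  const body = { source, target };
  if (continuous) body.continuous = true; // keep replicating as changes arrive
  return JSON.stringify(body);
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
// The same string could then be sent with, for example:
//   fetch("http://127.0.0.1:5984/_replicate",
//         { method: "POST", headers: { "Content-Type": "application/json" }, body });
```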

4.9 Querying a CouchDB Database using Map-Reduce

So far, we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the ‘map’ stage ranges over the document collection and emits zero or more key/value pairs for each document it encounters; the ‘reduce’ stage is used when a calculation over those intermediate values is required, such as a sum or a count.
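The two stages can be mimicked in a few lines of JavaScript. The sketch below is a simplified, single-process model and not CouchDB’s implementation. It runs a map function over every document, sorts the emitted rows by key, and optionally applies a reduce function to the values. In CouchDB, emit is a global function and reduce receives [key, id] pairs plus a rereduce flag. Here emit is passed in as a parameter and only plain keys are passed to reduce, to keep the sketch self-contained. The function runView and two of the three documents are hypothetical (only the Aberavon figure comes from the election data). The map and reduce functions mirror the ‘Conservative Votes’ and ‘Conservative Votes Total’ views described later in this section.

```javascript
// Simplified model of a CouchDB view: map every document, sort by key, then reduce.
function runView(docs, map, reduce) {
  const rows = [];
  for (const doc of docs) {
    // emit() collects a key/value pair for the document currently being mapped
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    map(doc, emit);
  }
  rows.sort((a, b) => compareKeys(a.key, b.key)); // view rows are ordered by key
  if (!reduce) return { rows };
  const value = reduce(rows.map((r) => r.key), rows.map((r) => r.value));
  return { rows: [{ key: null, value }] };
}

// Enough of CouchDB's collation for this sketch: null first, then numbers.
function compareKeys(a, b) {
  if (a === b) return 0;
  if (a === null) return -1;
  if (b === null) return 1;
  return a < b ? -1 : 1;
}

const sum = (values) => values.reduce((total, v) => total + v, 0);

const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Constituency B", con: 1500 }, // hypothetical
  { _id: "Constituency C", con: 1200 }, // hypothetical
];

// "Conservative Votes": key = Conservative vote, value = constituency id
const byVotes = runView(docs, (doc, emit) => emit(doc.con, doc._id));
// "Conservative Votes Total": map emits each vote count, reduce sums them
const total = runView(docs, (doc, emit) => emit(null, doc.con), (keys, values) => sum(values));

console.log(byVotes.rows.map((r) => r.value)); // [ 'Constituency C', 'Constituency B', 'Aberavon' ]
console.log(total.rows);                       // [ { key: null, value: 5762 } ]
```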

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.

The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.

The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate SUM() in SQL; emitting a meaningful key and grouping by it would be the analogue of GROUP BY).

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]


{"rows":[{"key":null,"value":8782198}]}

It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
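Because view rows are kept sorted by key, a range query does not need to look at every row. It can locate the first key that is greater than or equal to startkey and read forward until it passes endkey. The following minimal sketch shows that idea, with a binary search standing in for CouchDB’s B-Tree. The function names and all rows except Aberavon are hypothetical. CouchDB’s view API documents these parameters as startkey and endkey, and endkey is inclusive by default.

```javascript
// Find the first index whose key is >= target in rows sorted by key (binary search).
function lowerBound(rows, target) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (rows[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Emulate ?startkey=...&endkey=... (endkey inclusive, CouchDB's default).
function keyRange(rows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(rows, startkey); i < rows.length && rows[i].key <= endkey; i++) {
    result.push(rows[i]);
  }
  return result;
}

// Rows of a "Conservative Votes"-style view, already sorted by key.
const rows = [
  { key: 850, value: "Constituency A" },  // hypothetical
  { key: 1200, value: "Constituency C" }, // hypothetical
  { key: 1500, value: "Constituency B" }, // hypothetical
  { key: 2000, value: "Constituency D" }, // hypothetical
  { key: 3062, value: "Aberavon" },
];

console.log(keyRange(rows, 1000, 2000).map((r) => r.value));
// [ 'Constituency C', 'Constituency B', 'Constituency D' ]
```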

This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every map-reduce function written against a document collection in CouchDB gets its own B-Tree index, built the first time the view is queried. From that point on, whenever new documents are added to the collection, the index is updated incrementally (the next time the view is read) rather than being rebuilt from scratch.

As a result, the system is optimised for fast indexed retrieval: a view defined in a design document never performs a ‘table scan’ at run-time, and every read of it is an indexed lookup. (Temporary views, which CouchDB provides for development, are the exception, since they are computed over the whole database on each request.)
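The incremental update can be pictured as follows. CouchDB keeps a sequence number for every change to a database, and a view index remembers the last sequence it has processed. When the view is next read, only the changes after that point are passed through the map function. The sketch below is a simplified in-memory model of that bookkeeping. The class names and the two non-Aberavon documents are hypothetical, and deletions and reduce values are left out for brevity.

```javascript
// Simplified model of a database change log and an incrementally maintained view.
class Database {
  constructor() {
    this.seq = 0;
    this.changes = []; // { seq, doc } in the order the writes happened
  }
  save(doc) {
    this.seq += 1;
    this.changes.push({ seq: this.seq, doc });
  }
}

class ViewIndex {
  constructor(db, map) {
    this.db = db;
    this.map = map;
    this.lastSeq = 0;           // last change already reflected in the index
    this.rowsByDoc = new Map(); // doc id -> rows emitted for that doc
    this.mapCalls = 0;          // how many times map() has run
  }

  // Bring the index up to date lazily, then return its rows sorted by key.
  query() {
    // seq n is stored at position n - 1, so slice() skips everything already indexed
    for (const { seq, doc } of this.db.changes.slice(this.lastSeq)) {
      const rows = [];
      this.map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
      this.mapCalls += 1;
      this.rowsByDoc.set(doc._id, rows); // a newer version replaces the old rows
      this.lastSeq = seq;
    }
    return [...this.rowsByDoc.values()].flat().sort((a, b) => a.key - b.key); // numeric keys only
  }
}

const db = new Database();
const view = new ViewIndex(db, (doc, emit) => emit(doc.con, doc._id));
db.save({ _id: "Aberavon", con: 3062 });
db.save({ _id: "Constituency B", con: 1500 }); // hypothetical
view.query();                                  // maps both documents
db.save({ _id: "Constituency C", con: 1200 }); // hypothetical
const rows = view.query();                     // maps only the new document
console.log(view.mapCalls);                    // 3 (a full rebuild each time would need 5)
```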

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB, namely CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘design documents’.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect could be achieved using a batch file script.)

#!/bin/bash

#host=http://127.0.0.1:5984/
host=http://username:password@mick.couchone.com/
database=universities
folder=universities
fileextension=.json
FILES=$folder/*

# create the database
curl -X PUT "$host$database"

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(basename "$filepath")
  echo "$filename"
  # strip the extension to get the document id
  docname="${filename%$fileextension}"
  echo "$docname"
  url="$host$database/$docname"
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d "@$filepath"
  curl -X PUT "$url" -d "@$filepath"
done

Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL (POST is used when we wish CouchDB to assign a random ID to the document).

To put a single document to the web, the command would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
     -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
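The script above issues one HTTP request per file. CouchDB also offers a _bulk_docs endpoint, which accepts a single POST whose body has the form {"docs": [...]} and saves every document in it. The sketch below builds that body, assuming Node.js and the same folder layout as the bash script: one .json file per university, with the URL-encoded document id as the file name (as in Aberystwyth%20University.json). The function name bulkDocsBody is hypothetical, and sending the body is left to cURL or fetch.

```javascript
const fs = require("fs");
const path = require("path");

// Build a _bulk_docs request body from a folder of .json files.
// Assumes each file name is the URL-encoded document id.
function bulkDocsBody(folder) {
  const docs = fs
    .readdirSync(folder)
    .filter((name) => name.endsWith(".json"))
    .sort()
    .map((name) => {
      const doc = JSON.parse(fs.readFileSync(path.join(folder, name), "utf8"));
      // "Aberystwyth%20University.json" -> "Aberystwyth University"
      doc._id = decodeURIComponent(path.basename(name, ".json"));
      return doc;
    });
  return JSON.stringify({ docs });
}

// Usage: write the body to a file, then POST it once, for example:
//   curl -H "Content-Type: application/json" -X POST <server>/universities/_bulk_docs -d @bulk.json
```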

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an _id that begins with _design/ [12].

For the first design document, I have chosen an _id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved key in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered, and req refers to the request object, which can be used, for example, to retrieve data passed by a query string in the URL. Because a design document is itself JSON, the function is stored as a string value.
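Because a show function is ordinary JavaScript, it can be tried out locally before it is uploaded. The sketch below, assuming Node.js, calls the same s1 logic with a hand-made document. The latitude and longitude values are hypothetical, and the empty object stands in for the request object that CouchDB would pass. Note that doc._id is inserted into the HTML unescaped, so a value containing < or & would need escaping first.

```javascript
// The s1 show function from _design/d1, written as a normal JavaScript function.
function s1(doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// Hypothetical document standing in for universities/Aston University.
const doc = { _id: "Aston University", latitude: 52.486, longitude: -1.889 };
console.log(s1(doc, {}));
// <h1>Aston University</h1><p>latitude: 52.486</p><p>longitude: -1.889</p>
```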

Having saved the design document as d1.json, we can upload it to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the _id of the document to be rendered (Aston University, with the space URL-encoded)
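The same structure can be captured in a small helper. The sketch below assumes Node.js, and the function name showUrl is hypothetical. It shows why Aston University appears as Aston%20University: every path segment is percent-encoded with encodeURIComponent, which leaves names such as _design and d1 unchanged but encodes spaces and other reserved characters in document ids.

```javascript
// Build a CouchDB _show URL: <host>/<db>/_design/<ddoc>/_show/<func>/<docid>
function showUrl(host, db, ddoc, showName, docId) {
  const segments = [db, "_design", ddoc, "_show", showName, docId].map(encodeURIComponent);
  return host.replace(/\/+$/, "") + "/" + segments.join("/");
}

console.log(showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University"));
// http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```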


5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University

As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1. The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.

When authoring design documents manually, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is difficult and unintuitive either way.


Figure 5.5: Documents may be edited directly using Futon
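One way to avoid escaping quotes by hand is to write each function as ordinary JavaScript and let JSON.stringify do the escaping when the design document file is produced. This is broadly the chore that tools such as CouchApp automate. The following is a minimal sketch, assuming Node.js. The map1 function here is a short, hypothetical stand-in for the Appendix A version, and Function.prototype.toString is used to obtain its source text.

```javascript
// Written as normal JavaScript: the double quotes need no manual escaping here.
const map1 = function (doc, req) {
  return '<div id="map" data-lat="' + doc.latitude + '" data-lng="' + doc.longitude + '"></div>';
};

// Turn the function into the string form CouchDB expects inside a design document.
const designDoc = {
  _id: "_design/d1",
  shows: { map1: map1.toString() },
};

// JSON.stringify escapes every " and newline, producing a valid d1.json body.
const d1json = JSON.stringify(designDoc, null, 2);
console.log(d1json);
```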

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.

Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": {...},
  "lists": {...},
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document _id values

A ‘view’ is in effect a query function which returns a set of results. In order to render the results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple view, simply emitting the document _id values
• l1 is a very simple list that shows a hyperlink for each document _id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {...},
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


Note the built-in functions start and send:

• start sets the response headers; it is called once, at the beginning of the function, before any output is sent
• send writes a chunk of output to the response; here it is called once for each row returned by getRow

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
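To see how start, send and getRow cooperate, the list function can be run against a hand-made set of view rows. In CouchDB these three functions are provided by the query server. The versions below are hypothetical stand-ins, written only so that the sketch runs under Node.js, and the two rows are example data in CouchDB’s key order.

```javascript
// Stand-ins for CouchDB's list-function environment (illustrative only).
const viewRows = [
  { id: "Aberystwyth University", key: "Aberystwyth University", value: "Aberystwyth University" },
  { id: "Aston University", key: "Aston University", value: "Aston University" },
];
let output = "";
let responseHeaders = null;
let position = 0;

function start(response) { responseHeaders = response.headers; } // called once, before output
function send(chunk) { output += chunk; }                         // appends a chunk to the body
function getRow() { return position < viewRows.length ? viewRows[position++] : null; }

// The l1 list function from _design/d1 (same logic, readable layout).
function l1(head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + "</a></p>");
  }
}

l1({ total_rows: viewRows.length, offset: 0 }, {});
console.log(responseHeaders["Content-Type"]); // text/html
console.log(output);
```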

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
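In the GeoRSS-Simple encoding, a location is written as a single georss:point element containing the latitude and then the longitude, separated by a space. The following minimal sketch produces such an entry fragment, assuming Node.js. The function names and coordinates are hypothetical. In a complete feed, the georss prefix would be bound on the feed element to the namespace http://www.georss.org/georss.

```javascript
// Escape the five XML special characters so text can go inside an element.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// GeoRSS-Simple point: "latitude longitude", in that order, separated by a space.
function georssPoint(latitude, longitude) {
  if (!(latitude >= -90 && latitude <= 90) || !(longitude >= -180 && longitude <= 180)) {
    throw new RangeError("latitude or longitude out of range");
  }
  return `<georss:point>${latitude} ${longitude}</georss:point>`;
}

// One Atom entry carrying a GeoRSS point.
function georssEntry(title, latitude, longitude) {
  return `<entry><title>${escapeXml(title)}</title>${georssPoint(latitude, longitude)}</entry>`;
}

console.log(georssEntry("Aston University", 52.486, -1.889));
```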

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer’s perspective is that it creates a separate file for each element of the design document. In effect, a CouchDB design document is a web application project, and CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
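The mapping from folders to design document fields can be sketched in a few lines. The code below is a simplified approximation of what a tool like CouchApp does when it assembles a design document. It is not CouchApp’s own implementation, which is written in Python and handles many more cases, such as templates and attachments. It assumes Node.js and the views/<name>/map.js, views/<name>/reduce.js, shows/<name>.js and lists/<name>.js layout described above. The function names are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Read every <name>.js file in a folder into { name: source }.
function readFunctions(folder) {
  const result = {};
  if (!fs.existsSync(folder)) return result;
  for (const file of fs.readdirSync(folder)) {
    if (file.endsWith(".js")) {
      result[path.basename(file, ".js")] = fs.readFileSync(path.join(folder, file), "utf8");
    }
  }
  return result;
}

// Assemble a design document from a CouchApp-style folder tree.
function buildDesignDoc(appDir, name) {
  const views = {};
  const viewsDir = path.join(appDir, "views");
  if (fs.existsSync(viewsDir)) {
    for (const view of fs.readdirSync(viewsDir)) {
      const viewDir = path.join(viewsDir, view);
      if (!fs.statSync(viewDir).isDirectory()) continue;
      const fns = readFunctions(viewDir); // map.js and, optionally, reduce.js
      views[view] = { map: fns.map };
      // an empty reduce.js is left out rather than uploaded as an empty function
      if (fns.reduce && fns.reduce.trim()) views[view].reduce = fns.reduce;
    }
  }
  return {
    _id: `_design/${name}`,
    views,
    shows: readFunctions(path.join(appDir, "shows")),
    lists: readFunctions(path.join(appDir, "lists")),
  };
}
```

The resulting object could then be serialised with JSON.stringify and PUT to the database, which handles the quote escaping automatically.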


Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js, a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function l1.

In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.

We recall that in CouchDB a View is, in effect, a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

With this file in place, running couchapp push from the bc folder uploads the design document to the default database. So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section, we complete the project’s aim of serving university project data from the British Council Activity Mapping project using CouchDB.

For this, we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss


This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
      {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
          {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
          {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
      {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, emits the flags of those countries.

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
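The template tests boolean flags such as has_programmes and has_countries before iterating, so the list function has to supply them. The georss.js file itself is given in Appendix C. The sketch below is only a hypothetical illustration of that kind of ‘packaging’ step, assuming Node.js and assuming that each university document holds latitude, longitude and a programmes array whose items have a name and an optional countries array of country codes. These field names are inferred from the template and are assumptions, not the project’s confirmed schema. The resulting object is the kind of data that would be handed to mustache.js together with georss.html.

```javascript
// Turn rows from the "all" view (emit(doc._id, doc)) into the object the template expects.
// Field names are inferred from the template; the document schema is an assumption.
function packageInstitutions(rows) {
  return {
    institutions: rows.map((row) => {
      const doc = row.value;
      const programmes = (doc.programmes || []).map((p) => ({
        name: p.name,
        has_countries: Array.isArray(p.countries) && p.countries.length > 0,
        countries: (p.countries || []).map((country) => ({ country })),
      }));
      return {
        id: row.id,
        latitude: doc.latitude,
        longitude: doc.longitude,
        has_programmes: programmes.length > 0,
        programmes,
      };
    }),
  };
}

// Hypothetical row, shaped like the output of the "all" view.
const rows = [
  {
    id: "Aston University",
    key: "Aston University",
    value: {
      _id: "Aston University",
      latitude: 52.486,
      longitude: -1.889,
      programmes: [{ name: "Erasmus", countries: ["zm"] }, { name: "Other programme" }],
    },
  },
];

console.log(JSON.stringify(packageInstitutions(rows), null, 2));
```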

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB communicates over HTTP, which makes it easy to replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributing data and application code in this way also supports reliable computing, since the application no longer depends on a single server being available.

Such replication cannot be achieved easily using conventional web applications. Even with today's mature technology, replicating code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, with CouchDB relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)
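To make the point about HTTP concrete, here is a minimal sketch (not taken from the project) that stores a JSON document with a plain PUT request to /database/docid. The server URL, database and document names are placeholders; fetchImpl is injected so the request can be checked without a live server.

```javascript
// Sketch: store a JSON document in CouchDB with an HTTP PUT.
// Any HTTP client (curl, a browser, Node's fetch) can do the same.
async function putDocument(fetchImpl, serverUrl, db, id, doc) {
  // Document ids may contain spaces, so both path segments are URL-encoded.
  const url = serverUrl + "/" + encodeURIComponent(db) + "/" + encodeURIComponent(id);
  const response = await fetchImpl(url, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  });
  // On success CouchDB replies with a JSON body containing ok, id and rev.
  return response.json();
}
```

The command-line equivalent is a single cURL call, for example curl -X PUT http://127.0.0.1:5984/universities/doc1 -d '{"region":"Scotland"}'.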

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL;
• understanding the structure of design documents;
• the use of server-side JavaScript functions to transform documents to HTML and XML.

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a predefined map-reduce function stored in a design document. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of advances made by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale, low-cost data storage using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will grow. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.
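The escaping arises because each function is stored as a string value inside the JSON design document. The short sketch below reproduces the effect; the s1 body here is a hypothetical simplification of the real one, chosen only because it contains a double-quoted attribute.

```javascript
// Hypothetical show function in the style of s1: HTML is built by string
// concatenation, with double-quoted attributes inside single-quoted literals.
var s1 = function (doc, req) {
  return '<h1 id="title">' + doc._id + '</h1>';
};

// A design document holds the function's source text as a JSON string.
// Serialising it shows the \" escapes (and \n line breaks) that have to be
// typed by hand when the design document is edited directly.
var designDoc = { _id: "_design/d1", shows: { s1: s1.toString() } };
var json = JSON.stringify(designDoc);
console.log(json);
```

CouchApp removes this step by reading ordinary .js and .html files from disk and performing the serialisation itself.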

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\">html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }</style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\">function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code (http://github.com/jchris/sofa). I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + ' ' + row.value.loc[0]
        point : row.value.latitude + ' ' + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
  }
  // close the loop after all rows are rendered
  return "</feed>";
});

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : docid,
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility
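One way to see what the template receives is to pull the row-to-stash transformation out of the list function and run it on its own. The sketch below is a trimmed, standalone version of the mapping in the listing that follows: buildStash is a hypothetical name, rows is assumed to be a plain array of {key, value} objects (in CouchDB they come from List.withRows), and fields such as postcode and constituency are omitted for brevity.

```javascript
// Standalone, trimmed version of the stash-building logic in georss.js.
function buildStash(rows) {
  return {
    institutions: rows.map(function (row) {
      var institution = row.value;
      return {
        id: institution._id,
        latitude: institution.latitude,
        longitude: institution.longitude,
        // Boolean flags drive the {{#has_programmes}} / {{#has_countries}} sections.
        has_programmes: institution.programmes ? true : false,
        programmes: institution.programmes
          ? institution.programmes.map(function (programme) {
              return {
                name: programme.name,
                url: programme.url,
                has_countries: programme.countries ? true : false,
                // Each country code is wrapped so the template can use {{country}}.
                countries: programme.countries
                  ? programme.countries.map(function (country) {
                      return { country: country };
                    })
                  : [], // return nothing if no countries
              };
            })
          : [], // return nothing if no programmes
      };
    }),
  };
}
```

Keeping this logic as a pure function makes it easy to check the data model outside CouchDB before handing it to Mustache.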

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload PNG files from a folder so that the image files become attachments to CouchDB documents.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES="countries/png/*"

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  # the @ prefix tells curl to send the file's contents, not its name
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
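The naming convention the script relies on (file countries/png/de.png becomes document de, with the image stored as an attachment called flag) can be expressed as a single function. This is a hypothetical JavaScript rendering of the script's two sed steps, with a placeholder server URL; it only builds the URL and does not perform the upload.

```javascript
// Hypothetical helper mirroring the script's sed steps:
// strip the folder, strip ".png", then build .../countries/<doc>/flag.
function flagAttachmentUrl(serverUrl, filepath) {
  const filename = filepath.split("/").pop(); // e.g. "de.png"
  const docname = filename.replace(/\.png$/, ""); // e.g. "de"
  return serverUrl + "/countries/" + encodeURIComponent(docname) + "/flag";
}

console.log(flagAttachmentUrl("http://127.0.0.1:5984", "countries/png/de.png"));
// http://127.0.0.1:5984/countries/de/flag
```

As the script's comment notes, a PUT with Content-Type image/png to such a URL creates the document and its attachment in one request when the document does not yet exist.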

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang: An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file/


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web/

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat/

[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign/

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl(1): the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Database_normalization

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


Section 1

Introduction

The MSc Computer Science project described in this report aims to investigate a relatively new database product called CouchDB from a web application development perspective [10] [13]. I have built a simple business application and use this as the test case from which to evaluate CouchDB’s suitability as a web development platform.

1.1 Motivation

The motivation for the project is to understand what can be gained by moving away from traditional relational database systems towards a range of ‘key-value store’ products which have emerged recently to deal with the challenges of very large data volumes in online environments.

The background to the project is one of rapid change, inspired by the emergence of a set of innovative, open-source, non-relational (also known as ‘NoSql’¹) database products.

These ‘NoSql’ database products are challenging the accepted practice of using relational database management systems as the only mechanism developers use to store data. Relational databases have been very successful in the past and are likely to remain so, but in a certain subset of usage scenarios non-relational systems are starting to take their place.

1.2 Does ‘one size’ still ‘fit all’?

One of the original pioneers of relational database research, Michael Stonebraker [46] [25], stated in a 2005 paper [45] that established relational database management systems (DBMSs) would face increased competition from newer architectures:

¹ Pronounced ‘No-See-Quel’.


“The last 25 years of commercial DBMS development can be summed up in a single phrase: ‘One size fits all’. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.

[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser.”

Why does this matter? Does ‘one size fit all’? Until recent years the answer has almost always been ‘yes’: practically every data-centric IT project over the past 25 years has used a relational database to persist data, and systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.

1.3 Organisation

The sections are organised in the following way:

• in Section 2 I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project;
• in Section 3 I introduce the open-source, document-oriented database CouchDB;
• in Section 4 I provide an overview of CouchDB’s application programming interface;
• in Section 5 I describe some initial experiments in serving HTML from a CouchDB database, that is, using CouchDB to store both data and application code;
• in Section 6 I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing Maps;
• in Section 7 I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.

Section 2

Background

In recent years, a new generation of data stores has emerged which approaches data persistence in a different way from the established relational model.

In this section I briefly discuss the relational model and describe some of the recent challenges which have led to the development of newer, alternative database systems.

2.1 The Relational Model

The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].

One description [54] of Codd’s relational model reads as follows:

“The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains.”

A relation is defined as a set of tuples, where a tuple is a set of attribute values. The conventional visual representation is to represent relations as tables, tuples as rows and attributes as columns.

The main advantage of the relational database is its precise, mathematically well-founded organisation of data. In this context it is important to recognise that the relational model is a logical model. Both relational databases and non-relational databases (such as file systems or ‘NoSql’ databases) commonly use B-trees as their underlying data structure. Seen from this perspective, whether a database is ‘relational’ or not is more a matter of how applications use it: a relational database presents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].
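Written out in symbols (the names D1 to Dn for the domains and R for the relation are introduced here only for illustration), the definition quoted above says that a relation is any subset of the Cartesian product of its domains, and that each tuple takes exactly one value from each domain:

```latex
R \subseteq D_1 \times D_2 \times \cdots \times D_n,
\qquad
D_1 \times D_2 \times \cdots \times D_n
  = \{ (d_1, d_2, \ldots, d_n) \mid d_i \in D_i \ \text{for } 1 \le i \le n \}
```

For the Staff table used in Section 2.3, n = 2, with D1 the set of names and D2 the set of countries; the row (Bob, Australia) is one tuple of the relation. Because each position holds a single element of its domain, this definition is also the root of the atomic-value requirement discussed under normalisation.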


2.2 Relational Database Management Systems

The term ‘Relational Database Management System’ refers to the class of databases which follow Codd’s relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open-source (MySql, PostgreSQL) databases in use over the past three decades have been based on this model.

These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term ‘database’ has become associated with relational databases to such an extent that non-relational databases are often referred to as ‘non-relational data stores’ for clarity. (Formally, and in its original meaning, the term ‘database’ simply means a collection of data.)

2.3 Normalisation

A key factor in relational data storage is data normalisation.

Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by removing duplication [48].

The first level of normalisation, ‘first normal form’, requires that each data item (table cell) contains an atomic value, typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.

Consider, for example, a table (let’s call it ‘Staff’) containing staff records:

Name    Nationality
Bob     Australia
Jane    Sweden

All is well until the system encounters a staff member with multiple nationalities:

Name    Nationality
Bob     Australia
Jane    Sweden
Alice   United Kingdom, South Africa

This second table breaks the rules of normalisation: it violates first normal form, so it is not a valid design for any of the relational database management systems listed above. (The DBMS would accept ‘United Kingdom, South Africa’ as a single string, but could not treat the two nationalities as separate values.) The standard way to solve this problem would be to create two more tables, a ‘Nationality’ table and a ‘StaffNationality’ table, linked with primary and foreign keys.

While this is a trivial example, it shows that changing real-world business rules can lead to changes in data schemata, which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.
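The normalised design described above can be sketched with in-memory arrays standing in for the three tables. The numeric keys are illustrative assumptions, and nationalitiesOf is a hypothetical helper that performs by hand the join a SQL query would express.

```javascript
// Staff, Nationality and the StaffNationality link table, as arrays of rows.
const staff = [
  { staffId: 1, name: "Bob" },
  { staffId: 2, name: "Jane" },
  { staffId: 3, name: "Alice" },
];
const nationality = [
  { nationalityId: 1, name: "Australia" },
  { nationalityId: 2, name: "Sweden" },
  { nationalityId: 3, name: "United Kingdom" },
  { nationalityId: 4, name: "South Africa" },
];
// One row per (staff member, nationality) pair: every cell stays atomic.
const staffNationality = [
  { staffId: 1, nationalityId: 1 },
  { staffId: 2, nationalityId: 2 },
  { staffId: 3, nationalityId: 3 },
  { staffId: 3, nationalityId: 4 },
];

// Join the three tables on their keys to list one person's nationalities.
function nationalitiesOf(personName) {
  const person = staff.find((s) => s.name === personName);
  return staffNationality
    .filter((link) => link.staffId === person.staffId)
    .map((link) => nationality.find((n) => n.nationalityId === link.nationalityId).name);
}

console.log(nationalitiesOf("Alice")); // [ 'United Kingdom', 'South Africa' ]
```

Alice's second nationality costs one extra row in the link table rather than a multi-valued cell, which is exactly the schema change the paragraph describes.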

In a non-relational database (such as CouchDB), Alice’s record could be stored as follows:

{
  "name": "Alice",
  "nationality": [
    "United Kingdom",
    "South Africa"
  ]
}

This record (or ‘document’ in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other ‘NoSql’ products are sometimes referred to as ‘schema-less’ database management systems.

The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.

2.4 SQL

A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].

To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table needs to store atomic values. In our un-normalised example above, this query would treat ‘United Kingdom, South Africa’ as a distinct nationality, so Alice would be counted under neither the United Kingdom nor South Africa.
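To make the problem concrete, the following sketch imitates what GROUP BY does, using JavaScript arrays in place of tables. The helper countBy is hypothetical: it simply counts rows per distinct cell value, which is the observable result of the query, not how a database engine implements grouping.

```javascript
// Count rows per distinct value of one column, like GROUP BY + COUNT(*).
function countBy(rows, column) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row[column], (counts.get(row[column]) || 0) + 1);
  }
  return counts;
}

// The un-normalised 'Staff' table: Alice's nationalities share one cell.
const staff = [
  { name: "Bob", nationality: "Australia" },
  { name: "Jane", nationality: "Sweden" },
  { name: "Alice", nationality: "United Kingdom, South Africa" },
];
console.log(countBy(staff, "nationality").get("United Kingdom")); // undefined

// With one atomic value per row (as produced by joining Staff to a
// StaffNationality link table), each nationality is counted correctly.
const staffNationality = [
  { name: "Bob", nationality: "Australia" },
  { name: "Jane", nationality: "Sweden" },
  { name: "Alice", nationality: "United Kingdom" },
  { name: "Alice", nationality: "South Africa" },
];
console.log(countBy(staffNationality, "nationality").get("United Kingdom")); // 1
```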

In Section 4, I will discuss map-reduce, which is the method used by CouchDB and some other ‘NoSql’ systems to query the database.

2.5 Transactional Guarantees

A further, very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].

The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: at any given moment in time there is only ‘one version of the truth’ for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].

For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at exactly the same time. Even in the event of hardware or network failure, traditional relational database management systems must be able to roll back to a consistent state, so that data integrity is assured at all times.
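The all-or-nothing behaviour described above can be illustrated with a toy JavaScript sketch. It assumes balances held in a plain object, and the snapshot-and-restore step merely stands in for the logging and locking that a real RDBMS uses; the function name transfer is hypothetical.

```javascript
// Toy illustration of atomicity: either both the debit and the credit take
// effect, or neither does. The balances are snapshotted first and restored
// if anything goes wrong part-way through.
function transfer(accounts, from, to, amount) {
  const snapshot = { ...accounts }; // the consistent state to roll back to
  try {
    if (!(from in accounts) || !(to in accounts)) {
      throw new Error("unknown account");
    }
    accounts[from] -= amount; // debit
    if (accounts[from] < 0) {
      throw new Error("insufficient funds"); // failure after the debit
    }
    accounts[to] += amount; // credit
    return true; // 'commit'
  } catch {
    Object.assign(accounts, snapshot); // 'roll back'
    return false;
  }
}

const accounts = { alice: 100, bob: 20 };
transfer(accounts, "alice", "bob", 30); // succeeds
transfer(accounts, "bob", "alice", 500); // fails and is rolled back
console.log(accounts); // { alice: 70, bob: 50 }
```

The second transfer debits Bob before discovering the problem, yet the user never sees that intermediate state.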

The paradigm followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.

However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we have become more comfortable with multiple versions of the same data existing in different places; for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).

Caching data in many places, so that the data is near to the user when it is needed, is one of the most effective ways to reduce latency. There may no longer be a ‘single version of the truth’, but for many applications the trade-off is worthwhile.

2.6 The ‘NoSql’ Movement

‘NoSql’ refers to a range of non-relational database products which have emerged in recent years. The term ‘NoSql’ is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.).1

1 During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of ‘a NoSql Summer’ [1].

‘NoSql’ is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:

• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.

• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

2.7 ‘NoSql’ databases at large scale

Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].

The experience of these large-scale data providers was that in many application domains it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.

To give an example, if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. Of course, in a foreign exchange trading application, those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario; foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.

In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.
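Eventual convergence can be illustrated with a toy model of two replicas that accept writes independently and later exchange state. This is a sketch under stated assumptions: conflicts are resolved with a simple ‘highest version wins, ties broken by replica name’ rule chosen only for illustration, and it is not meant to describe the mechanism of Facebook, Yahoo, CouchDB or any other particular product.

```javascript
// Toy model of eventual consistency (illustrative rule, not a real product's
// algorithm): each replica holds one record and accepts writes locally; a
// later sync keeps the record with the higher version number, breaking ties
// by replica name so that both sides pick the same winner.
class Replica {
  constructor(name) {
    this.name = name;
    this.record = { value: null, version: 0, writer: name };
  }

  // A local write: no coordination with other replicas.
  write(value) {
    this.record = { value, version: this.record.version + 1, writer: this.name };
  }

  // True if record a should win over record b.
  static newer(a, b) {
    if (a.version !== b.version) return a.version > b.version;
    return a.writer > b.writer;
  }

  // Anti-entropy step: afterwards both replicas hold the same record.
  syncWith(other) {
    if (Replica.newer(other.record, this.record)) {
      this.record = { ...other.record };
    } else if (Replica.newer(this.record, other.record)) {
      other.record = { ...this.record };
    }
  }
}

const london = new Replica("london");
const singapore = new Replica("singapore");

london.write("At the office");
console.log(singapore.record.value); // null: Bob still sees the stale value
london.syncWith(singapore);
console.log(singapore.record.value); // At the office
```

Between the write and the sync, the two replicas disagree; once they have exchanged state, they agree again, which is the ‘eventual’ part of eventual consistency.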

Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacentre were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacentres around the world.

In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:

“Occasionally an entire datacenter will go down (e.g. if the power is cut) or become unreachable (e.g. if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”

2.8 Brewer’s ‘CAP’ Theorem

The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. For such services, availability and partition tolerance are essential but strict consistency is not, so a looser model of consistency is considered a price worth paying.

This realisation - that ‘eventual’ consistency is often sufficient - has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.

In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere from web servers to desktops, even hand-held devices and mobile phones.


2.9 Reducing the Impedance Mismatch

The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented application developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged; examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by application developers to ‘bridge the gap’ between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain.2 Another tactic used by application developers is to use SQL views, effectively de-normalising data so that the data model is closer to the application’s object model.

A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
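As a rough illustration of why the mismatch disappears, the sketch below persists an application object with a nested list as one JSON document and reads it back with the same shape. The Staff class and its method are hypothetical. Note that only the data travels in the document, so behaviour (methods) has to be re-attached on the way back.

```javascript
// Hypothetical application class with a nested, multi-valued field.
class Staff {
  constructor(name, nationalities) {
    this.name = name;
    this.nationalities = nationalities;
  }
  isDualNational() {
    return this.nationalities.length > 1;
  }
}

const alice = new Staff("Alice", ["United Kingdom", "South Africa"]);

// Saving: one call produces the whole document, with no tables to split into.
const documentText = JSON.stringify(alice);
console.log(documentText);
// {"name":"Alice","nationalities":["United Kingdom","South Africa"]}

// Loading: parse the document and re-attach the class to restore behaviour.
const restored = Object.assign(new Staff(), JSON.parse(documentText));
console.log(restored.isDualNational()); // true
```

Compare this with the three-table decomposition needed to store the same information relationally.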

2.10 Benefits of ‘NoSql’

As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of the steady fall in hardware costs associated with Moore’s Law: disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.

• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale; managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.

2 An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam War.


2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’

It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support, or (to state it more generally) the lack of ad-hoc querying support. As we will see, in the case of CouchDB queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.

(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often, a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database, used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].

The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010, the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.


Figure 2.1: British Council project data displayed on Bing Maps

The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net in Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.
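The string-matching techniques are not described in detail here. Purely as an illustration of the general idea, the sketch below groups institution names that differ only in case, punctuation, spacing or a leading ‘The’. It is a hypothetical JavaScript example, not the algorithm used in the C# system, and real de-duplication would also need fuzzy matching for misspellings.

```javascript
// Hypothetical normalisation key: names with the same key are treated as
// the same institution.
function matchKey(name) {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ") // punctuation -> space
    .replace(/^\s*the\s+/, "") // ignore a leading 'the'
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}

// Group uploaded names by their key.
function deduplicate(names) {
  const groups = new Map(); // match key -> original spellings
  for (const name of names) {
    const key = matchKey(name);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(name);
  }
  return groups;
}

// Made-up example input, not real upload data.
const uploads = [
  "University of Leeds",
  "university of leeds ",
  "The University of Leeds",
  "Leeds College of Art",
];
console.log(deduplicate(uploads).size); // 2
```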

Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.


Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that something like CouchDB would be a good fit for the back-end data store, in effect using CouchDB as a web caching layer.

The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
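As a sketch of the kind of transformation involved, the JavaScript below turns a list of project documents into a minimal GeoRSS-Simple feed, in which each georss:point element holds latitude and longitude separated by a space. The document fields (name, lat, lon) are assumptions rather than the project’s actual schema, and a complete RSS 2.0 channel would also need link and description elements. Inside CouchDB, code like this could live in a list function in a design document.

```javascript
// Escape the characters that would otherwise break the XML.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical sketch: render documents with name/lat/lon fields as GeoRSS.
function toGeoRss(title, docs) {
  const items = docs.map(
    (doc) =>
      "    <item>\n" +
      `      <title>${escapeXml(doc.name)}</title>\n` +
      // GeoRSS-Simple: latitude first, then longitude, separated by a space.
      `      <georss:point>${doc.lat} ${doc.lon}</georss:point>\n` +
      "    </item>\n"
  );
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<rss version="2.0" xmlns:georss="http://www.georss.org/georss">\n' +
    "  <channel>\n" +
    `    <title>${escapeXml(title)}</title>\n` +
    items.join("") +
    "  </channel>\n" +
    "</rss>\n"
  );
}

// Made-up example data, not real Activity Mapping content.
const feed = toGeoRss("Example projects", [
  { name: "School partnership, Leeds", lat: 53.8, lon: -1.55 },
]);
console.log(feed);
```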

This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested in exploring how ready this new technology is for ‘prime time’, that is, to what extent it is ready for adoption by web application developers such as myself, who are used to developing solutions on current standard platforms.3

3 In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed as a JSON string [30].

3.1 ‘Of the Web’

The feature of combining a web server with B-Tree-based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-used quote about CouchDB by Jacob Kaplan-Moss, the creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform- and OS1-agnostic.” [31]

SECTION 3 INTRODUCTION TO COUCHDB 17

Normally, a web development project requires two separate servers: a database server and a web application server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

1 Operating System

Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence from the document itself; this is much more in line with what we intuitively expect from real-world experience.
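In code, a missing field is simply a missing property. The JavaScript sketch below reads both kinds of address document; the helper addressLabel is hypothetical and uses the in operator to test whether postalCode is present at all.

```javascript
// Two address documents: one with a postal code, one without.
const docs = [
  JSON.parse('{"address": {"city": "New York", "postalCode": "10021"}}'),
  JSON.parse('{"address": {"city": "Lusaka", "country": "Zambia"}}'),
];

// Hypothetical helper: add the postal code only when the document has one.
function addressLabel(doc) {
  const { city, postalCode } = doc.address;
  return "postalCode" in doc.address ? `${city} ${postalCode}` : city;
}

docs.forEach((doc) => console.log(addressLabel(doc)));
// New York 10021
// Lusaka
```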

The schema is flexible, so if we wish we may legitimately store the following document in the very same ‘addressbook’ database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
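The escape can be checked with JavaScript’s built-in JSON.parse, which turns each \n inside a JSON string into a real line break. This small sketch assumes only that the document text matches the one shown above.

```javascript
// In a JavaScript string literal the backslash must itself be escaped, so
// '\\n' yields the two characters \ and n that JSON uses for a new line.
const json =
  '{"company": "British Council Zambia", "address": {' +
  '"streetAddress": "Heroes Place\\nCairo Road\\nPO Box 34571", ' +
  '"city": "Lusaka", "country": "Zambia"}}';

const doc = JSON.parse(json);
const lines = doc.address.streetAddress.split("\n");
console.log(lines.length); // 3
console.log(lines[1]); // Cairo Road
```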

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can also store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available here [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8], to provide a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started; Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web application developers. I have worked for about 10 years as a Microsoft-platform application developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an on-line instance of CouchDB, it is important to protect it with a strong username and password.


SECTION 4 USING COUCHDB 22

4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax1 [51].

To create a database called ‘addressbook’2 on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the statement with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:

1 If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html

2 Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

cURL syntax is explained here [44]. Briefly:

• -H introduces an HTTP header

• -X POST instructs cURL to use an HTTP POST (instead of a GET) request

• -d introduces the data to be sent

• @ denotes that what follows is a file name

All data in CouchDB is stored in JSON format, together with a unique key and a revision number, so in CouchDB the resulting document would have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
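The _rev value has two parts separated by a hyphen: a number that goes up by one each time the document is updated, and a hash. The hypothetical helper below splits a revision string into those parts. It is a sketch for illustration only; client code should treat the hash as opaque and pass the whole _rev back to CouchDB unchanged.

```javascript
// Split a CouchDB revision string such as "1-91bc..." into its two parts.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  if (dash < 1) throw new Error(`unexpected revision format: ${rev}`);
  return { generation: Number(rev.slice(0, dash)), hash: rev.slice(dash + 1) };
}

const { generation, hash } = parseRev("1-91bce055fc8db86480400321079f0834");
console.log(generation); // 1
console.log(hash.length); // 32
```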

This document may be retrieved on-line here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy the output into the JSONLint validator (http://jsonlint.com).


Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon


4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
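When a client builds these URLs in code rather than typing them, the standard URL class can assemble the path and the ?rev= query string. This sketch assumes a JavaScript runtime with the WHATWG URL API (modern browsers and Node.js); revisionUrl is a hypothetical helper name, and credentials are left out of the example URL.

```javascript
// Build the URL that targets one specific revision of a document, as needed
// for DELETE (and for updates that pass the revision in the query string).
function revisionUrl(base, db, docId, rev) {
  const url = new URL(`${encodeURIComponent(db)}/${encodeURIComponent(docId)}`, base);
  url.searchParams.set("rev", rev); // becomes the ?rev=... query string
  return url.toString();
}

console.log(
  revisionUrl(
    "http://mick.couchone.com/",
    "addressbook",
    "british-council-zambia",
    "1-da7bcd810c608d6fbcb9ce92e9ade343"
  )
);
// http://mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
```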

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.

As we are updating a ‘named’ document, we use the PUT verb instead of POST.

The following command will update John Smith’s record with the contents of the file john-smith-v2.json:

curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
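The rule behind the ‘Document update conflict’ error is a form of optimistic concurrency control: a write succeeds only if it names the revision that is currently stored. The toy JavaScript model below imitates that rule in memory. It is a sketch, not CouchDB’s implementation; in particular it invents revision strings of the form N-toyhashN instead of computing real hashes.

```javascript
// Simplified in-memory model of CouchDB's revision check (not the real thing).
class ToyDatabase {
  constructor() {
    this.docs = new Map(); // _id -> current document, including its _rev
  }

  put(doc) {
    const current = this.docs.get(doc._id);
    if (current && doc._rev !== current._rev) {
      // CouchDB answers this case with HTTP 409 Conflict.
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    const stored = { ...doc, _rev: `${generation}-toyhash${generation}` };
    this.docs.set(doc._id, stored);
    return { status: 201, ok: true, id: doc._id, rev: stored._rev };
  }
}

const db = new ToyDatabase();
const first = db.put({ _id: "john-smith", firstName: "John" });
const second = db.put({ _id: "john-smith", _rev: first.rev, firstName: "Johnny" });
const stale = db.put({ _id: "john-smith", _rev: first.rev, firstName: "Jack" });
console.log(second.rev); // 2-toyhash2
console.log(stale.status); // 409
```

The third write quotes a revision that is no longer current, so it is rejected rather than silently overwriting the second.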

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/.3

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

3 Many thanks to Mark James for creating these files and making them available for free use.


The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
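An attachment is addressed as a path beneath its document, with the document's current revision in the query string. The sketch below, using a hypothetical helper named attachmentRequest, only assembles the method, URL and header that the curl command above sends; reading zm.png and transmitting the bytes are left to whichever HTTP client is in use.

```javascript
// Builds (but does not send) the request for storing an attachment.
// Attachment URLs have the shape {host}/{db}/{docid}/{name}?rev={rev}.
// attachmentRequest is a hypothetical helper for illustration.
function attachmentRequest(host, db, docId, name, rev, contentType) {
  // Encode each path segment so ids such as "Aston University" stay valid.
  const path = [db, docId, name].map(encodeURIComponent).join("/");
  return {
    method: "PUT",
    url: `${host}/${path}?rev=${encodeURIComponent(rev)}`,
    headers: { "Content-Type": contentType },
  };
}
```

Encoding each path segment matters for document ids such as ‘Aston University’, which would otherwise put a raw space into the URL. A client built on fetch could send the result as fetch(req.url, { method: req.method, headers: req.headers, body: bytes }), although fetch rejects URLs with embedded credentials, so the username and password would move to an Authorization header.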

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook database on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
     -X POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"http://username:password@mick.couchone.com/addressbook",
          "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping them where necessary:

curl -H "Content-Type: application/json" ^
     -X POST http://127.0.0.1:5984/_replicate ^
     -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, a replica of the database is now available on the local machine.
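The two quoting styles above differ only because the JSON body is typed by hand on the command line. A client program can sidestep this by producing the body with a JSON serialiser; the hypothetical helper below builds the same POST to /_replicate, using the addresses from the commands above (without credentials).

```javascript
// Builds the POST /_replicate request. Serialising the body with
// JSON.stringify avoids the hand-escaping needed on the Windows command line.
// replicationRequest is a hypothetical helper for illustration.
function replicationRequest(localHost, source, target) {
  return {
    method: "POST",
    url: `${localHost}/_replicate`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source, target }),
  };
}
```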

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the ‘map’ stage ranges over the document collection and emits key/value pairs for each document it encounters; the ‘reduce’ stage is used when a calculation over the intermediate values is required.

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.

The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function, emitting each constituency’s Conservative vote as the key and its id as the value.

The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to SELECT SUM(con) in SQL; querying a reduce function with grouping enabled is CouchDB’s counterpart to GROUP BY).

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}
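To make the two views concrete, the following sketch runs their map and reduce bodies locally. emit and sum are supplied by CouchDB's JavaScript view server, so the code defines simple stand-ins for them; queryView is a hypothetical harness that approximates a _view request (simplified key ordering, and reduction without grouping, which is why the key comes back as null). Apart from the Aberavon document shown earlier, the second constituency is invented for illustration.

```javascript
// Stand-ins for functions that CouchDB's JavaScript view server provides.
let emitted = [];
let currentId = null;
function emit(key, value) {
  emitted.push({ id: currentId, key, value });
}
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// Simplified key collation: null first, then numbers in ascending order.
function compareKeys(a, b) {
  if (a === b) return 0;
  if (a === null) return -1;
  if (b === null) return 1;
  return a - b;
}

// Runs one view over a list of documents, roughly as a _view request would.
function queryView(docs, view) {
  emitted = [];
  for (const doc of docs) {
    currentId = doc._id;
    view.map(doc);
  }
  emitted.sort((r1, r2) => compareKeys(r1.key, r2.key));
  if (!view.reduce) {
    return { rows: emitted };
  }
  // Without group=true, all rows are reduced to one value with a null key.
  const keys = emitted.map((r) => [r.key, r.id]);
  const values = emitted.map((r) => r.value);
  return { rows: [{ key: null, value: view.reduce(keys, values) }] };
}

// The two views from _design/con, with the same map/reduce bodies.
const views = {
  "Conservative Votes": {
    map: function (doc) { emit(doc.con, doc._id); },
  },
  "Conservative Votes Total": {
    map: function (doc) { emit(null, doc.con); },
    reduce: function (keys, values) { return sum(values); },
  },
};

const docs = [
  { _id: "Aberavon", mp: "Hywel Francis", electorate: 51079,
    con: 3062, lab: 18077, lib: 4138, pc: 3546, oth: 1278 },
  // Hypothetical second constituency, for illustration only.
  { _id: "Example Constituency", con: 1500 },
];
```

Against the full election-2005 collection, the same reduce step is what produces the single row with value 8782198.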

It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB

This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, the index is updated incrementally: the next time the view is queried, only documents added or changed since the previous query are processed.

As a result, the entire system is optimised for fast indexed retrieval. This makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time: every read is an indexed lookup.
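The claim that every read is an indexed lookup can be illustrated with a sorted array standing in for CouchDB's B-tree. In the sketch below (hypothetical helper names, invented rows), a startkey/endkey request becomes a binary search for the first key followed by a short sequential read, and adding a document inserts one row in key order rather than rebuilding the index. As in CouchDB's default behaviour, endkey is inclusive.

```javascript
// Sorted-array stand-in for a view index. CouchDB stores view rows in a
// B-tree ordered by key; a sorted array shows the same access pattern.

// Position of the first row whose key is >= target (binary search).
function lowerBound(rows, target) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (rows[mid].key < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Rows with startkey <= key <= endkey. Only the matching slice is read,
// never the whole index.
function rangeQuery(rows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(rows, startkey); i < rows.length && rows[i].key <= endkey; i++) {
    result.push(rows[i]);
  }
  return result;
}

// Adding a document inserts one row in key order: an incremental update,
// not a rebuild of the whole index.
function insertRow(rows, row) {
  rows.splice(lowerBound(rows, row.key), 0, row);
}
```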

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script, as follows.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)

#!/bin/bash
#host=http://127.0.0.1:5984/
host=http://username:password@mick.couchone.com/
database=universities
folder=universities
fileextension=.json
FILES="$folder/*"

# create the database
curl -X PUT "$host$database"

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=${filepath##*/}
  echo "$filename"
  # strip the extension to get the document name
  docname=${filename%"$fileextension"}
  echo "$docname"
  url=$host$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL (POST is used when we wish CouchDB to assign a random ID to the document).

To put a single document to the web, the command used would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json
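Sending one PUT per file works well for a data set of this size. CouchDB also offers a bulk endpoint, POST /{db}/_bulk_docs, which accepts a body of the form {"docs": [...]} and stores every document in one request; it was not used in this project, and the sketch below only shows how such a body could be built. The input object, mapping file names to file contents, is a stand-in for reading the folder, and the file names are assumed to follow the same <percent-encoded id>.json convention as the script.

```javascript
// Builds a body for POST /{db}/_bulk_docs. The input maps file names to file
// contents (a real script would read these from the folder); file names are
// assumed to follow the "<percent-encoded id>.json" convention.
function bulkDocsPayload(files) {
  const docs = Object.entries(files).map(([fileName, text]) => {
    const doc = JSON.parse(text);
    // "Aberystwyth%20University.json" -> id "Aberystwyth University", the same
    // id that a PUT to .../Aberystwyth%20University would have produced.
    doc._id = decodeURIComponent(fileName.replace(/\.json$/, ""));
    return doc;
  });
  return JSON.stringify({ docs });
}
```

CouchDB replies to _bulk_docs with one result per document, so a failure for a single file does not abort the whole upload.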

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities.

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].

For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved field name in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered, and req refers to the request object, which can be used, for example, to retrieve data passed by a query string in the URL.
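Because a show function is ordinary JavaScript, it can be exercised outside CouchDB by passing in a document-like object. The sketch below reproduces s1 (the usage relies on placeholder coordinates) and adds a hypothetical variant, s1WithHeading, to illustrate the req parameter: CouchDB exposes the parsed query string as req.query.

```javascript
// s1 from _design/d1 as an ordinary function, so it can be called outside
// CouchDB. CouchDB passes the stored document as doc and the HTTP request as
// req; s1 itself ignores req.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical variant showing req in use: with the parsed query string in
// req.query, a URL ending in ?heading=h2 selects a smaller heading.
function s1WithHeading(doc, req) {
  const tag = req.query.heading === "h2" ? "h2" : "h1";
  return '<' + tag + '>' + doc._id + '</' + tag + '>';
}
```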

Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML files.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered


5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html.

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
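One way to avoid escaping by hand, offered here as an assumption rather than the approach taken at this stage of the project, is to generate d1.json from a script: a JSON serialiser escapes every double quote and newline in the function source automatically. This is, in spirit, the chore that CouchApp takes over in Section 6. The helper designDocJson and the shortened map1-style function below are hypothetical.

```javascript
// A shortened map1-style show function, written naturally: double quotes
// appear freely inside the single-quoted HTML strings.
const showSource = `function(doc, req) {
  return '<div id="map_canvas" style="width: 100%; height: 100%">' + doc._id + '</div>';
}`;

// Letting JSON.stringify build d1.json escapes every double quote and newline
// in the function source, so nothing is backslash-escaped by hand.
function designDocJson(shows) {
  return JSON.stringify({ _id: "_design/d1", shows }, null, 2);
}

const d1Json = designDocJson({ map1: showSource });
```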

While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.


Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.

Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document id values

A ‘view’ is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document.

• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


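Note the built-in functions start and send (together with getRow, which returns the next row of the view, or null when none are left):

• start is executed once, at the beginning of the function, to set the response headers
• send is executed once for each row in the dataset, sending a chunk of HTML to the client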

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
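The interplay of start, send and getRow can be seen by running l1 outside CouchDB. In the sketch below, runList is a hypothetical harness: its start, send and getRow are stand-ins for the functions the view server provides, recording the headers and output chunks and feeding rows from an array shaped like the output of v1.

```javascript
// Hypothetical harness: installs stand-ins for start(), send() and getRow(),
// runs a list function, and returns what it would have sent to the client.
function runList(listFn, rows) {
  const output = { headers: null, chunks: [] };
  let next = 0;
  globalThis.start = (response) => { output.headers = response.headers; };
  globalThis.send = (chunk) => { output.chunks.push(chunk); };
  globalThis.getRow = () => (next < rows.length ? rows[next++] : null);
  listFn({ total_rows: rows.length, offset: 0 }, {});
  return output;
}

// l1 as in the design document (link target kept from the original listing).
function l1(head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + '</a></p>');
  }
}
```

The link is built from row.value without encoding, so an id such as ‘Aston University’ yields an href containing a raw space; wrapping row.value in encodeURIComponent when building the href would be a safer choice.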

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome, and difficult to maintain, as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
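To make GeoRSS concrete: in its GeoRSS-Simple encoding, a location is added to an Atom entry as a georss:point element holding latitude and longitude separated by a space, with the georss prefix bound to http://www.georss.org/georss. The following sketch builds such a feed as a string; the function names and the urn:example identifiers used with it are placeholders, and a strictly valid Atom feed would also need author information.

```javascript
// Escapes text for use inside XML element content or attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One Atom entry with a GeoRSS-Simple point ("latitude longitude").
function geoEntry({ id, title, updated, latitude, longitude }) {
  return [
    "<entry>",
    `  <id>${escapeXml(id)}</id>`,
    `  <title>${escapeXml(title)}</title>`,
    `  <updated>${updated}</updated>`,
    `  <georss:point>${latitude} ${longitude}</georss:point>`,
    "</entry>",
  ].join("\n");
}

// The enclosing feed declares both the Atom and the GeoRSS namespaces.
function geoFeed({ id, title, updated }, entries) {
  return [
    '<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">',
    `<id>${escapeXml(id)}</id>`,
    `<title>${escapeXml(title)}</title>`,
    `<updated>${updated}</updated>`,
    ...entries.map(geoEntry),
    "</feed>",
  ].join("\n");
}
```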

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer’s perspective is that it creates a set of separate files for each element of the design document. In effect, a CouchDB design document is a web application project; CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides a feed of the posts (in Atom format). I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the feed output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.


Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js, a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function, l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js. We recall that in CouchDB a View is, in effect, a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, then the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.
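The merge step can be pictured as a mapping from folders and files to nested fields. The sketch below is a simplified model, not CouchApp's actual code: a nested object stands in for the bc folder, each folder becomes an object and each .js file becomes a string field named after the file, while the real tool also handles other material such as attachments. The example also shows why deleting reduce.js matters: without the file there is no reduce field at all.

```javascript
// Simplified model of turning a CouchApp folder into a design document.
// The tree is an in-memory stand-in for the file system:
// folder name -> nested object, file name -> file contents.
function toFields(tree) {
  const fields = {};
  for (const [name, value] of Object.entries(tree)) {
    // "map.js" -> "map", "s1.js" -> "s1"; folders keep their names.
    const key = name.replace(/\.js$/, "");
    fields[key] = typeof value === "string" ? value.trim() : toFields(value);
  }
  return fields;
}

function buildDesignDoc(id, tree) {
  return { _id: id, ...toFields(tree) };
}
```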

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

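Running couchapp push from within the bc folder then merges the files and uploads the design document to this default database. So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This demonstrates how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.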

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping project using CouchDB.

For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple View, all. This consists of a map.js file (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss


This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
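The ‘packaging’ step performed by georss.js can be sketched as a function that reshapes the rows of the all view into the object the template iterates over. This is an illustration, not the code in Appendix C: the document fields (a programmes array whose entries carry a name and a countries list of country codes) are assumptions, as is the helper name packageInstitutions. The has_programmes and has_countries flags let the template emit the surrounding markup only when the corresponding list is non-empty.

```javascript
// Hypothetical packaging step: turns rows of the "all" view (value = the whole
// document) into the object that the georss.html template iterates over.
// Assumed document shape: { _id, latitude, longitude,
//   programmes: [{ name, countries: ["zm", ...] }] }.
function packageInstitutions(rows) {
  const institutions = rows.map((row) => {
    const doc = row.value;
    const programmes = (doc.programmes || []).map((p) => {
      // Each country becomes an object so the template can use {{country}}.
      const countries = (p.countries || []).map((code) => ({ country: code }));
      return { name: p.name, has_countries: countries.length > 0, countries };
    });
    return {
      id: doc._id,
      latitude: doc.latitude,
      longitude: doc.longitude,
      has_programmes: programmes.length > 0,
      programmes,
    };
  });
  return { institutions };
}
```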

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and of a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSQL’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as a document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is that of cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a predefined map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need; it is not necessarily more difficult than SQL, but the learning curve is significant.

7.3 Conclusion

The ‘NoSQL’ movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSQL’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSQL’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\">html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }</style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\">function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}
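The backslash-escaping problem noted above is essentially what a packaging tool automates. As a rough illustration only (not CouchApp's actual implementation), the sketch below writes a show function as ordinary JavaScript, stores its source text in a design-document object, and lets JSON.stringify escape the double quotes. s1Variant is a hypothetical variant of s1 with an added class attribute, included only so there are double quotes to escape; the sample coordinates are illustrative.

```javascript
// Hypothetical variant of show function s1, written as ordinary JavaScript.
// The class attribute exists only to give JSON.stringify some quotes to escape.
function s1Variant(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p class="coord">latitude: ' + doc.latitude + '</p>' +
    '<p class="coord">longitude: ' + doc.longitude + '</p>';
}

// CouchDB keeps each show function as a string inside the design document.
const designDoc = {
  _id: "_design/d1",
  shows: { s1: s1Variant.toString() }
};

// JSON.stringify escapes every double quote in the function source (" -> \"),
// which is the step that is so tedious to do by hand.
const body = JSON.stringify(designDoc, null, 2);

// The function can also be exercised locally; the coordinates are illustrative.
const sampleDoc = { _id: "University of Aberdeen", latitude: 57.1, longitude: -2.1 };
console.log(s1Variant(sampleDoc, {}));
console.log(body);
```

The resulting body could then be sent with an HTTP PUT to the design document's URL (for this database, /universities/_design/d1); updating an existing design document would also require its current _rev.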

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code (http://github.com/jchris/sofa). I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

// These functions build XML with E4X literals, supported by the
// SpiderMonkey JavaScript engine that CouchDB used at the time.
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point; // added: GeoRSS point
  return entry;
};

At the end of lists/index.js:

        // ... (earlier fields of the Atom.entry call unchanged)
        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());

    // close the loop after all rows are rendered
    return "</feed>";
  });

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

// this is further down in templates/edit.html

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      // ... (remainder of the file unchanged)

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View: in the simplest case, the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
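To make the data flow in georss.js easier to follow, the sketch below reproduces its row-to-stash mapping as a plain Node.js function, outside CouchDB. It assumes each view row has the shape { key, value } with an institution document as the value, keeps only a subset of the fields, and replaces List.withRows with an ordinary array map; it is an illustration, not a drop-in replacement for the list function.

```javascript
// Plain-JavaScript sketch of the stash built in georss.js (subset of fields).
// Assumes rows shaped like CouchDB view rows: { key, value: institutionDoc }.
function toStash(rows) {
  return {
    institutions: rows.map(function (row) {
      const institution = row.value;
      return {
        id: institution._id,
        latitude: institution.latitude,
        longitude: institution.longitude,
        // Flags let the Mustache template skip empty sections.
        has_programmes: institution.programmes ? true : false,
        programmes: institution.programmes ?
          institution.programmes.map(function (programme) {
            return {
              name: programme.name,
              url: programme.url,
              has_countries: programme.countries ? true : false,
              // Wrap each country code so that {{country}} can be looked up.
              countries: programme.countries ?
                programme.countries.map(function (country) {
                  return { country: country };
                }) : []
            };
          }) : []
      };
    })
  };
}

const stash = toStash([{
  key: "University of St Andrews",
  value: {
    _id: "University of St Andrews",
    latitude: 56.3412139443169,
    longitude: -2.79301175308608,
    programmes: [{ name: "Chevening Programme", url: "http://www.chevening.com",
                   countries: ["id", "mz", "tj"] }]
  }
}]);
console.log(JSON.stringify(stash, null, 2));
```

As in the original, an institution without a programmes field gets has_programmes set to false and an empty programmes list, so the template renders nothing for it.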

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
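The iteration behaviour can be illustrated with a deliberately tiny renderer. The function below is a toy written for this illustration, not mustache.js: it handles a single, non-nested section, fills {{name}} tags from the current list element, and does no HTML escaping.

```javascript
// Toy renderer (not mustache.js): expands one non-nested {{#name}}...{{/name}}
// section once per element of view[name], filling {{field}} tags from that element.
function renderSection(template, name, view) {
  const open = "{{#" + name + "}}";
  const close = "{{/" + name + "}}";
  const start = template.indexOf(open);
  const end = template.indexOf(close);
  if (start === -1 || end === -1) return template; // no such section
  const inner = template.slice(start + open.length, end);
  const items = Array.isArray(view[name]) ? view[name] : [];
  const expanded = items.map(function (item) {
    return inner.replace(/\{\{(\w+)\}\}/g, function (tag, key) {
      return key in item ? String(item[key]) : "";
    });
  }).join("");
  return template.slice(0, start) + expanded + template.slice(end + close.length);
}

// Country list from the CSFP programme in the example document below.
const out = renderSection("[{{#countries}}{{country}};{{/countries}}]", "countries",
  { countries: [{ country: "bd" }, { country: "za" }] });
console.log(out); // prints "[bd;za;]"
```

An empty list renders the section zero times, which is why the template can place the flag images inside {{#countries}} without checking the length first.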

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload PNG files from a folder so that each image file becomes an attachment to its own CouchDB document.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to the countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done
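The two sed substitutions in the script turn a file path into a document name, and the document name into an attachment URL of the form /countries/docname/flag. The sketch below restates that derivation in JavaScript so the URL layout is explicit; the base URL is an assumption (a local CouchDB on its default port 5984, as in the script's commented example), and encodeURIComponent is an addition the script does not use.

```javascript
// Derive the attachment URL for a flag file, mirroring the script's sed steps.
function attachmentUrl(filepath, base) {
  const filename = filepath.replace(/^countries\/png\//, ""); // e.g. "ie.png"
  const docname = filename.replace(/\.png$/, "");            // e.g. "ie"
  // Document "ie" in database "countries", with an attachment named "flag".
  return base + "/countries/" + encodeURIComponent(docname) + "/flag";
}

console.log(attachmentUrl("countries/png/ie.png", "http://127.0.0.1:5984"));
// prints "http://127.0.0.1:5984/countries/ie/flag"
```

A PUT of the PNG bytes to such a URL with a Content-Type of image/png creates the document and attaches the image in one request, as the comment in the script notes.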

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation/

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file/


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web/

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat/

[37] Nigel Martin. Transaction Management Lecture Notes, MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign/

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2–11. IEEE Computer Society Press, 2005.

[46] Stonebraker, M., Wong, E., Kreps, P. and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. https://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Database_normalization

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


“The last 25 years of commercial DBMS development can be summed up in a single phrase: ‘One size fits all’. This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimised for business data processing) had been used to support many data-centric applications with widely varying characteristics and requirements.

[...] we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end parser.”

Why does this matter? Does ‘one size fit all’? Until recent years the answer has always been ‘yes’: practically every data-centric IT project over the past 25 years has used a relational database to persist data, and systems developers accept normalisation as a standard part of the software development process. However, as I explore in this project, there are times when it is more economical and effective to take a different approach to the problem of data storage.

1.3 Organisation

The sections are organised in the following way:

• in Section 2, I discuss the background and technological developments in the database arena since the original work on relational systems in the 1970s. I also introduce the Activity Mapping system, which formed the test case for an application written using CouchDB for this project;

• in Section 3, I introduce the open-source, document-oriented database CouchDB;

• in Section 4, I provide an overview of CouchDB’s application programming interface;

• in Section 5, I describe some initial experiments in serving HTML from a CouchDB database - that is, using CouchDB to store both data and application code;

• in Section 6, I develop the project further, using CouchApp as an applications framework to serve GeoRSS data for consumption on Google and Bing maps;

• in Section 7, I conclude with a critical assessment of CouchDB as a platform for web application development, based on the investigation carried out for this project.

Section 2

Background

In recent years a new generation of data stores has emerged that approach data persistence in a different way from the established relational model.

In this section I briefly discuss the relational model and describe some of the recent challenges which have led to the development of newer, alternative database systems.

2.1 The Relational Model

The relational model was first formulated and proposed in 1969 by Ted Codd, and described in an extremely influential paper published in June 1970 [15].

One description [54] of Codd’s relational model reads as follows:

“The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains.”

A relation is defined as a set of tuples, where a tuple is a set of attribute values. Conventionally, relations are depicted as tables, tuples as rows and attributes as columns.
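The quoted definition can be made concrete with two small domains. The sketch below builds the Cartesian product of a set of names and a set of countries, then keeps only the pairs that actually hold; the result is a binary (n = 2) relation, i.e. the rows of a two-column table. The data anticipates the Staff example that follows, and the helper names are illustrative.

```javascript
// Two domains (sets of possible attribute values).
const names = ["Bob", "Jane"];
const countries = ["Australia", "Sweden"];

// Cartesian product: every ordered pair (name, country).
function cartesianProduct(a, b) {
  const pairs = [];
  for (const x of a) {
    for (const y of b) pairs.push([x, y]);
  }
  return pairs;
}

const product = cartesianProduct(names, countries); // 2 x 2 = 4 tuples

// A relation is a subset of the product: here, the pairs that are true
// of the staff, which become the rows of a two-column table.
const staffFacts = new Set(["Bob|Australia", "Jane|Sweden"]);
const staff = product.filter(([name, country]) => staffFacts.has(name + "|" + country));

console.log(product.length, staff);
```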

The main advantage of the relational database is its precise and mathematically rigorous organisation of data. In this context, it is important to recognise that the relational model is a logical model. Both relational and non-relational databases (such as file systems or ‘NoSql’ databases) commonly use B-Trees as their underlying data structure. Seen from this perspective, whether a database is ‘relational’ or not is more a matter of how applications use it: a relational database presents its data to the user in the form of relations, tuples and attributes (tables, rows and columns) [49].

2.2 Relational Database Management Systems

The term ‘Relational Database Management System’ refers to the class of databases which follow Codd’s relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open source (MySQL, PostgreSQL) databases in use over the past three decades have been based on this model.

These products have been very successful, and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term ‘database’ has become associated with relational databases to such an extent that non-relational databases are often referred to as ‘non-relational data stores’ for clarity. (Formally, and in its original meaning, the term ‘database’ simply means a collection of data.)

2.3 Normalisation

A key factor in relational data storage is data normalisation. Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This protects data integrity by removing duplication [48].

The first level of normalisation, ‘first normal form’, requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.

Consider, for example, a table (let’s call it ‘Staff’) containing staff records:

Name    Nationality
Bob     Australia
Jane    Sweden

All is well until the system encounters a staff member with multiple nationalities:

Name    Nationality
Bob     Australia
Jane    Sweden
Alice   United Kingdom, South Africa

This second table breaks the rules of normalisation: Alice’s Nationality cell holds two values rather than a single atomic one, so the table is no longer in first normal form. The standard way to solve this problem would be to create two more tables, a ‘Nationality’ table and a ‘StaffNationality’ table linking staff to nationalities, related through primary and foreign keys.
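The decomposition just described can be sketched in a few lines. The function below (its name and the column names StaffId and NationalityId are hypothetical) turns records with a multi-valued nationality into three tables of atomic rows: Staff, Nationality, and the StaffNationality link table.

```javascript
// Split records with a multi-valued "nationality" into atomic rows.
function normalise(records) {
  const staff = [];
  const nationality = [];      // one row per distinct nationality
  const staffNationality = []; // link table: one row per (staff, nationality)
  records.forEach(function (rec, i) {
    const staffId = i + 1;
    staff.push({ StaffId: staffId, Name: rec.name });
    rec.nationality.forEach(function (nat) {
      let row = nationality.find(function (r) { return r.Name === nat; });
      if (!row) {
        row = { NationalityId: nationality.length + 1, Name: nat };
        nationality.push(row);
      }
      staffNationality.push({ StaffId: staffId, NationalityId: row.NationalityId });
    });
  });
  return { staff, nationality, staffNationality };
}

const tables = normalise([
  { name: "Bob", nationality: ["Australia"] },
  { name: "Jane", nationality: ["Sweden"] },
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] }
]);
console.log(tables.staffNationality);
```

Every cell in the three resulting tables holds a single value; Alice's two nationalities become two rows in the link table rather than one composite cell.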

While this is a trivial example, it shows that changing real-world business rules can lead to changes in data schemata, which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.

In a non-relational database (such as CouchDB), Alice’s record could be stored as follows:

{
  "name": "Alice",
  "nationality": [
    "United Kingdom",
    "South Africa"
  ]
}

This record (or ‘document’, in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other ‘NoSql’ products are sometimes referred to as ‘schema-less’ database management systems.

The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.

2.4 SQL

A major benefit of following normalisation rules is that it permits easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].

To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table needs to store atomic values. In our example above, this query would treat ‘United Kingdom, South Africa’ as a distinct nationality.

In Section 4 I will discuss map-reduce, which is the method used by CouchDB and some other ‘NoSql’ systems to query the database.
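As a preview, the nationality count above could be expressed as a map function that emits one row per element of the nationality array, plus a reduce function that sums them. The sketch runs in plain Node.js: runView is a hypothetical stand-in for CouchDB's view engine, and emit is passed to the map function as a parameter, whereas in CouchDB it is a global available inside map functions. CouchDB also provides a built-in _sum reduce that would replace the hand-written one.

```javascript
// Hypothetical stand-in for CouchDB's view engine, so the sketch runs in Node.
// In CouchDB, emit is a global inside map functions; here it is passed in.
function runView(docs, mapFn, reduceFn) {
  const groups = new Map(); // key -> list of emitted values
  const emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach(function (doc) { mapFn(doc, emit); });
  // Reduce each key's values separately (the equivalent of group=true).
  const result = {};
  groups.forEach(function (values, key) {
    result[key] = reduceFn(null, values, false);
  });
  return result;
}

// Map: one row per element of the array, so Alice contributes a row
// to each of her two nationalities.
function mapByNationality(doc, emit) {
  if (Array.isArray(doc.nationality)) {
    doc.nationality.forEach(function (n) { emit(n, 1); });
  }
}

// Reduce: add up the 1s (CouchDB's built-in _sum would do the same job).
function sumReduce(keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

const counts = runView([
  { name: "Bob", nationality: ["Australia"] },
  { name: "Jane", nationality: ["Sweden"] },
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] }
], mapByNationality, sumReduce);
console.log(counts);
```

Unlike the SQL query over the unnormalised table, no composite ‘United Kingdom, South Africa’ key appears, because the map function iterates over the array.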

2.5 Transactional Guarantees

A further, very significant aspect of traditional database management systems is that they provide a set of transactional guarantees referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].

The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: at any given moment in time there is only ‘one version of the truth’ for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].

For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at exactly the same time. Even in the event of hardware or network failure, traditional relational database management systems must be able to roll back to a consistent state, so that data integrity is assured at all times.

The paradigm followed by such an RDBMS is that data must at least appear to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.
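The all-or-nothing outcome in the bank-transfer example can be sketched in memory. The function below only illustrates what the user observes (both balances change or neither does) by taking a snapshot and restoring it on failure; a real RDBMS relies on mechanisms such as logging and locking instead. The account names are hypothetical.

```javascript
// Apply both halves of a transfer or neither (illustration only).
function transfer(accounts, from, to, amount) {
  if (!(from in accounts) || !(to in accounts)) return false;
  const snapshot = { ...accounts };      // state to roll back to
  try {
    accounts[from] -= amount;            // debit
    if (accounts[from] < 0) throw new Error("insufficient funds");
    accounts[to] += amount;              // credit
    return true;                         // "commit"
  } catch (err) {
    Object.assign(accounts, snapshot);   // "roll back": undo the partial debit
    return false;
  }
}

const accounts = { alice: 100, bob: 0 };
const ok = transfer(accounts, "alice", "bob", 30);       // succeeds
const refused = transfer(accounts, "alice", "bob", 500); // fails and is rolled back
console.log(ok, refused, accounts);
```

After the failed transfer, the partial debit has been undone, so no observer ever sees money that has left one account without arriving in the other.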

However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we have become more comfortable with multiple versions of the same data existing in different places: for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).

Caching data in many places, so that it is near to the user when it is needed, is one of the most effective ways to reduce latency. There may no longer be a ‘single version of the truth’, but for many applications the trade-off is worthwhile.
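The trade-off can be shown with a small read-through cache. In the sketch below, makeCache is a hypothetical helper; the origin fetch and the clock are passed in so its behaviour is deterministic. A cached value is served for up to ttlMs milliseconds even if the origin has changed in the meantime: reads are fast, but may be stale.

```javascript
// Read-through cache that tolerates stale data for up to ttlMs milliseconds.
function makeCache(fetchFromOrigin, ttlMs, now) {
  const entries = new Map();
  return function get(key) {
    const hit = entries.get(key);
    if (hit && now() - hit.storedAt < ttlMs) {
      return hit.value;                  // fast path: possibly stale copy
    }
    const value = fetchFromOrigin(key);  // slow path: go back to the origin
    entries.set(key, { value: value, storedAt: now() });
    return value;
  };
}

// Usage with a simulated clock and an origin whose content changes.
let clock = 0;
let version = 1;
let originCalls = 0;
const get = makeCache(function (key) { originCalls++; return key + "@v" + version; },
                      1000, function () { return clock; });

const first = get("page");   // "page@v1" (fetched from the origin)
version = 2;                 // the origin is updated
clock = 500;
const second = get("page");  // "page@v1" (served from the cache: stale)
clock = 1500;
const third = get("page");   // "page@v2" (entry expired, fetched again)
console.log(first, second, third, originCalls);
```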

2.6 The ‘NoSql’ Movement

‘NoSql’ refers to a range of non-relational database products which have emerged in recent years. The term ‘NoSql’ is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.).1

‘NoSql’ is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:

• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.

• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

1 During the summer of 2010, as this project report was being prepared, a set of meetings took place worldwide under the theme of ‘a NoSql Summer’ [1].

2.7 ‘NoSql’ databases at large scale

Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].

The experience of these large-scale data providers was that in many application domains it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.

To give an example, if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. Of course, in a foreign exchange trading application those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario; foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.

In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but they are guaranteed to converge eventually to a consistent state once updates stop arriving.
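One simple way to make replicas converge is a ‘last-writer-wins’ rule. The sketch below is a textbook illustration, not the mechanism used by Facebook or by CouchDB: each replica holds a value tagged with a timestamp and a replica name, and merging keeps the newer tag, using the replica name as a deterministic tie-break. Two replicas that exchange state therefore end up with the same value, whichever order they merge in. The status values are invented for the example.

```javascript
// Last-writer-wins merge of two replica states { ts, replica, value }.
function merge(local, remote) {
  if (local.ts !== remote.ts) return local.ts > remote.ts ? local : remote;
  // Same timestamp: break the tie deterministically by replica name.
  return local.replica > remote.replica ? local : remote;
}

// Alice's newer status write reached London; Singapore still holds an older one.
const london = { ts: 2, replica: "london", value: "Off to the park" };
const singapore = { ts: 1, replica: "singapore", value: "At work" };

// Each replica merges the other's state; the order of merging does not matter.
const londonAfter = merge(london, singapore);
const singaporeAfter = merge(singapore, london);
console.log(londonAfter.value, "|", singaporeAfter.value);
```

Until the merge happens, Bob in Singapore reads the older status; afterwards both replicas agree, which is the ‘eventual’ part of eventual consistency.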

Holding replica data sets in multiple locations also promotes partition tolerance: that is, the ability to keep the service running despite hardware or network failure. If the London datacentre were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacentres around the world.

In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:

“Occasionally an entire datacenter will go down (e.g. if the power is cut) or become unreachable (e.g. if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”

2.8 Brewer’s ‘CAP’ Theorem

The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture, formalised and proved by Gilbert and Lynch [27], that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential but strict consistency is not, so a looser model of consistency is considered a price worth paying.

This realisation - that ‘eventual’ consistency is often sufficient - has unleashed a large amount of innovative effort in database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.

In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the way traditional relational systems do. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, and even on hand-held devices and mobile phones.

2.9 Reducing the Impedance Mismatch

The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented application developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged - examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by application developers to ‘bridge the gap’ between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain.2

Another tactic used by application developers is to use SQL views, effectively de-normalising data so that the data model is closer to the application’s object model.

A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].

210 Benefits of lsquoNoSqlrsquo

As discussed lsquoNoSqlrsquo databases take a fresh look at the problem of datapersistence They take advantage of Moorersquos Law - disk space has becomeso economical as to be almost free memory has grown so large it is nowmeasured in Gigabytes so that many databases can fit entirely in memory

• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It's better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, 'NoSql' databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.


2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'

It doesn't come as much of a surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support, or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.

(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often, a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the 'NoSql' solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].

The British Council is the United Kingdom's international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010, the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.


Figure 2.1: British Council project data displayed on Bing Maps

The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.

Outputs from the Activity Mapping project have included letters to MPs, online maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.


Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB, in effect using CouchDB as a web caching layer.

The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
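To make the transformation step concrete, here is a minimal JavaScript sketch that turns institution documents into a GeoRSS-Simple feed. It assumes each document carries name, latitude and longitude fields; those field names and the helpers escapeXml, toGeoRssItem and toGeoRssFeed are hypothetical and are not taken from the Activity Mapping code.

```javascript
// Hypothetical sketch: JSON documents -> GeoRSS-Simple (RSS 2.0 + georss:point).
// Assumed document shape: { name: "...", latitude: 52.48, longitude: -1.89 }.

// Escape the characters that are special in XML text ("&" must go first).
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// GeoRSS-Simple writes a point as "latitude longitude", separated by a space.
function toGeoRssItem(doc) {
  return "<item>" +
    "<title>" + escapeXml(doc.name) + "</title>" +
    "<georss:point>" + doc.latitude + " " + doc.longitude + "</georss:point>" +
    "</item>";
}

// Wrap the items in an RSS 2.0 channel that declares the GeoRSS namespace.
function toGeoRssFeed(title, docs) {
  return '<?xml version="1.0" encoding="UTF-8"?>' +
    '<rss version="2.0" xmlns:georss="http://www.georss.org/georss">' +
    "<channel><title>" + escapeXml(title) + "</title>" +
    docs.map(toGeoRssItem).join("") +
    "</channel></rss>";
}
```

Inside CouchDB, logic along these lines would sit in a design document function so that the feed is generated on request; a KML version would follow the same pattern with different element names.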

This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for 'prime time', that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].

3.1 'Of the Web'

The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-used quote about CouchDB by Jacob Kaplan-Moss, a co-creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS¹-agnostic." [31]

Normally, a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an 'indexable schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

    {
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }

¹ Operating System

Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself; this is much more in line with what we intuitively expect from real-world experience.
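The difference can be seen in a few lines of JavaScript. This is an illustrative sketch only: the two address objects are trimmed-down versions of the documents in this section, and hasPostalCode and toRelationalRow are hypothetical helpers.

```javascript
// Illustrative documents: the first has a postal code, the second does not.
const newYork = { city: "New York", state: "NY", postalCode: "10021" };
const lusaka = { city: "Lusaka", country: "Zambia" };

// In a document store, "no postal code" is simply a missing property,
// so we test whether the key exists rather than whether it holds NULL.
function hasPostalCode(address) {
  return Object.prototype.hasOwnProperty.call(address, "postalCode");
}

// A relational table, by contrast, gives every row the column and
// has to fill the gap with NULL (null in JavaScript).
function toRelationalRow(address) {
  return {
    city: address.city,
    postalCode: hasPostalCode(address) ? address.postalCode : null,
  };
}
```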

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:

    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
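A short JavaScript sketch shows what happens to that escape sequence when a client parses the document. The zambiaJson literal and the helpers addressLines and addressHtml are illustrative assumptions.

```javascript
// The streetAddress as it appears in the JSON text. In this JavaScript
// literal, "\\n" produces the two characters \ and n, which is exactly how
// the escape is written inside a JSON file.
const zambiaJson = '{"streetAddress": "Heroes Place\\nCairo Road\\nPO Box 34571"}';

// JSON.parse turns each \n escape into a real newline character,
// so splitting on "\n" recovers the printed lines.
function addressLines(jsonText) {
  const doc = JSON.parse(jsonText);
  return doc.streetAddress.split("\n");
}

// For an HTML page, the same lines could be joined with <br> tags.
function addressHtml(jsonText) {
  return addressLines(jsonText).join("<br>");
}
```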

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only (write-once) B-Tree structures for data promotes this:

"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available here [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).

A broader aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use 'cURL', which is a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:

    curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same statement with the HTTP DELETE verb:

    curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database once again if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were to be stored in a file named john-smith.json, the cURL syntax for uploading this document to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json

cURL syntax is explained here [44]. Briefly:

• -H introduces the HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name

All data in CouchDB is stored in JSON format, together with a unique key and revision version number, so that in CouchDB the resulting document would have _id and _rev values as follows:

    {
        "_id": "760cd53c55a93497067f90d6242fc25e",
        "_rev": "1-91bce055fc8db86480400321079f0834",
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }
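The _rev value has a simple structure: a revision number, a hyphen, then a hash. The following JavaScript sketch separates the two parts; parseRev is a hypothetical client-side helper, not part of CouchDB. The revision number increases by one each time the document is updated.

```javascript
// Split a CouchDB revision string of the form "<number>-<hash>".
function parseRev(rev) {
  const dash = rev.indexOf("-");
  if (dash < 1) {
    throw new Error("not a revision string: " + rev);
  }
  return {
    generation: Number(rev.slice(0, dash)), // revision number, 1 for a new document
    hash: rev.slice(dash + 1),              // opaque to the client
  };
}
```

For the document above, parseRev("1-91bce055fc8db86480400321079f0834").generation is 1.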

This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy the output into the JSONLint validator, http://jsonlint.com.


Figure 4.1: JSONLint
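JSONLint is not the only option: JSON.stringify accepts an indentation argument, so a client can pretty-print CouchDB's compact output itself. A minimal sketch; prettyPrint is a hypothetical helper name. Because it parses the input first, it also rejects malformed JSON, much as a validator would.

```javascript
// Parse compact JSON, then re-serialise it with two-space indentation.
// JSON.parse throws a SyntaxError if the input is not valid JSON.
function prettyPrint(compactJson) {
  return JSON.stringify(JSON.parse(compactJson), null, 2);
}
```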

The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:

    curl -H "Content-Type: application/json" \
         -X PUT http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json

This is the resulting document in CouchDB:

    {
        "_id": "british-council-zambia",
        "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon


4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

    curl -X DELETE \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
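Because document ids become part of the URL, ids containing spaces or other reserved characters have to be percent-encoded, and the revision goes into the query string. The JavaScript sketch below builds such URLs. docUrl is a hypothetical helper intended for ordinary documents; note that encodeURIComponent would also encode the slash in a design document id such as _design/d1.

```javascript
// Hypothetical helper: URL of one document, with the _id percent-encoded
// (so "Aston University" becomes "Aston%20University") and an optional
// ?rev=... query string, as DELETE and updates of existing documents require.
function docUrl(host, database, id, rev) {
  const base = host.replace(/\/+$/, ""); // tolerate a trailing slash on the host
  let url = base + "/" + encodeURIComponent(database) + "/" + encodeURIComponent(id);
  if (rev !== undefined) {
    url += "?rev=" + encodeURIComponent(rev);
  }
  return url;
}
```

For example, docUrl("http://127.0.0.1:5984", "addressbook", "british-council-zambia", "1-da7bcd810c608d6fbcb9ce92e9ade343") gives the local-server equivalent of the DELETE URL above.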

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.

As we are updating a 'named' document, we use the PUT verb instead of POST.

The following command will update John Smith's record with the contents of the file john-smith-v2.json:

    curl -X PUT \
         "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
         -d @john-smith-v2.json
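The revision rule for updates and deletions can be modelled in a few lines. The sketch below is a toy, in-memory JavaScript model written for illustration, not CouchDB's implementation: ToyDatabase, its "<number>-toy" revision strings and its simplified responses are assumptions, though the 409 status mirrors the HTTP conflict response CouchDB returns.

```javascript
// Toy, in-memory model of CouchDB's revision check (illustration only).
// Real CouchDB revisions are "<number>-<hash>"; here they are "<number>-toy".
class ToyDatabase {
  constructor() {
    this.docs = new Map(); // _id -> { rev, body }
  }

  static nextRev(currentRev) {
    const generation = currentRev ? Number(currentRev.split("-")[0]) + 1 : 1;
    return generation + "-toy";
  }

  // Create or update: an existing document can only be replaced by a
  // writer that quotes its current revision.
  put(id, body, rev) {
    const existing = this.docs.get(id);
    if (existing && existing.rev !== rev) {
      return { status: 409, error: "conflict" };
    }
    const newRev = ToyDatabase.nextRev(existing ? existing.rev : undefined);
    this.docs.set(id, { rev: newRev, body: body });
    return { status: 201, id: id, rev: newRev };
  }

  // Delete follows the same rule. Real CouchDB keeps a "deleted" tombstone
  // revision; the toy simply forgets the document.
  delete(id, rev) {
    const existing = this.docs.get(id);
    if (!existing) {
      return { status: 404, error: "not_found" };
    }
    if (existing.rev !== rev) {
      return { status: 409, error: "conflict" };
    }
    this.docs.delete(id);
    return { status: 200, id: id };
  }
}
```

The design choice being modelled is optimistic concurrency: nothing is locked, and a writer holding a stale revision simply has its request refused and must re-read the document.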

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags.³

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.


The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
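CouchDB also accepts attachments inline, inside the document's _attachments field, with the file contents base64-encoded; this can be convenient when a document and its image are created together. The JavaScript sketch below assumes Node.js (for Buffer) and uses a hypothetical helper, withInlineAttachment. The resulting document would be sent with an ordinary PUT, including the current _rev if the document already exists.

```javascript
// Sketch: add an inline attachment to a copy of a document (Node.js Buffer assumed).
// CouchDB expects _attachments.<name> = { content_type, data } with base64 data.
function withInlineAttachment(doc, name, contentType, bytes) {
  const copy = Object.assign({}, doc);
  copy._attachments = Object.assign({}, doc._attachments); // keep any existing ones
  copy._attachments[name] = {
    content_type: contentType,
    data: Buffer.from(bytes).toString("base64"),
  };
  return copy;
}
```

In Node.js, the bytes of zm.png could be read with require("fs").readFileSync("zm.png").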

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

    curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'

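Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping the inner ones, and keep the command on a single line (the Windows command prompt does not treat \ as a line continuation):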

    curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
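Hand-escaping JSON on the command line is error-prone. A small JavaScript sketch can build the same request body with JSON.stringify and, for Windows, apply the backslash-escaping used above. replicationBody and windowsCmdArgument are hypothetical helpers, and the second assumes the target program parses its arguments with the usual backslash-quote convention, as the Windows example does.

```javascript
// Build the /_replicate body with JSON.stringify, so the quoting is always valid JSON.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

// For the Windows command line: wrap in double quotes and backslash-escape
// the inner double quotes, as in the Windows example above.
function windowsCmdArgument(json) {
  return '"' + json.replace(/"/g, '\\"') + '"';
}
```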

4.9 Querying a CouchDB Database using Map-Reduce

So far, we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google's 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits zero or more key/value pairs for each document it encounters; the 'reduce' stage is used when a calculation over those intermediate values (such as a sum or a count) is required.
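The two stages can be imitated in plain JavaScript. The sketch below is a toy model written for this report, not CouchDB's query server: runView is a hypothetical function, and unlike real CouchDB, where emit is a global, it passes emit to the map function as a second argument. Real CouchDB also has its own collation rules for ordering keys of mixed types, which this toy ignores.

```javascript
// Toy imitation of a CouchDB view (not CouchDB's own code).
function runView(docs, map, reduce) {
  const rows = [];
  // Map stage: each document may emit zero or more key/value pairs.
  for (const doc of docs) {
    map(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  }
  // View rows are kept sorted by key.
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  if (!reduce) {
    return { rows: rows };
  }
  // Reduce stage: CouchDB passes [key, id] pairs and the emitted values.
  const keys = rows.map(function (r) { return [r.key, r.id]; });
  const values = rows.map(function (r) { return r.value; });
  return { rows: [{ key: null, value: reduce(keys, values) }] };
}
```

For instance, a map of function (doc, emit) { emit(null, doc.con); } with a reduce that adds up its values mirrors the 'Conservative Votes Total' view described below.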

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

    [http://localhost:5984/election-2005/Aberavon]

    {
        "_id": "Aberavon",
        "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
        "mp": "Hywel Francis",
        "electorate": 51079,
        "con": 3062,
        "lab": 18077,
        "lib": 4138,
        "pc": 3546,
        "oth": 1278
    }

CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document 'con' has been defined with two 'views' of the data:

The first view, 'Conservative Votes', simply ranges over the document collection in a map function, emitting each constituency's Conservative vote as the key and the document _id as the value.

The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to SUM in SQL; adding the group=true query parameter would instead reduce per distinct key, much like a GROUP BY).

    [http://localhost:5984/election-2005/_design/con]

    {
        "_id": "_design/con",
        "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
        "language": "javascript",
        "views": {
            "Conservative Votes": {
                "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
            },
            "Conservative Votes Total": {
                "map": "function(doc) {\n  emit(null, doc.con);\n}",
                "reduce": "function(keys, values) {\n  return sum(values);\n}"
            }
        }
    }

The sum of Conservative votes (8,782,198) is returned as the result of the following request:

    [http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

    {"rows":[{"key":null,"value":8782198}]}

It is also possible to emulate a WHERE clause; the following request will return the constituencies where the Conservative vote was between 1000 and 2000:

    [http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
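Because view rows are stored sorted by key, a startkey/endkey query amounts to finding the first matching key and reading forward. The JavaScript sketch below imitates this on an in-memory array of rows; lowerBound and keyRange are hypothetical names, and CouchDB itself walks its B-Tree rather than an array. As in CouchDB's default behaviour, endkey is treated as inclusive.

```javascript
// Index of the first row whose key is >= key (binary search on sorted rows).
function lowerBound(rows, key) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < key) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Rows with startkey <= key <= endkey: jump to the start, then read forward.
function keyRange(rows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(rows, startkey); i < rows.length && rows[i].key <= endkey; i++) {
    result.push(rows[i]);
  }
  return result;
}
```

The cost is one logarithmic search plus the rows actually returned, which is why such range queries stay fast as the collection grows.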

This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

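The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, the index is updated incrementally: when documents are added or changed, only those documents are passed through the map function the next time the view is read, rather than the whole collection being re-processed.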

As a result, the entire system is optimised for fast indexed retrieval; this makes it impossible in CouchDB to write a query which performs a 'table scan' at run-time: every read is an indexed lookup.
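The incremental idea can be sketched with a sorted array standing in for the B-Tree. In this toy JavaScript model (ToyIndex is hypothetical), each new document is mapped once and its rows are inserted in key order, so reading the index never requires re-scanning the collection. CouchDB itself defers this work until the view is next queried, rather than doing it on every write.

```javascript
// Toy incrementally maintained view index; a sorted array stands in for the B-Tree.
class ToyIndex {
  constructor(map) {
    this.map = map; // map(doc, emit), with emit passed in as a parameter
    this.rows = []; // always kept sorted by key
  }

  // Map just the new document and splice its rows into place.
  add(doc) {
    const rows = this.rows;
    this.map(doc, function (key, value) {
      // Binary search for the insertion point (after any equal keys).
      let lo = 0;
      let hi = rows.length;
      while (lo < hi) {
        const mid = (lo + hi) >> 1;
        if (rows[mid].key <= key) {
          lo = mid + 1;
        } else {
          hi = mid;
        }
      }
      rows.splice(lo, 0, { id: doc._id, key: key, value: value });
    });
  }
}
```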

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'Design Documents'.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called 'universities', to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The 'universities' data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)

    #!/bin/bash

    # host=http://127.0.0.1:5984
    host=http://username:password@mick.couchone.com
    database=universities
    folder=universities
    fileextension=.json

    # create the database
    curl -X PUT "$host/$database"

    for filepath in "$folder"/*"$fileextension"
    do
        echo "$filepath"

        # get the file name from the file path (strip the folder prefix)
        filename=${filepath#"$folder/"}
        echo "$filename"

        # the document name is the file name without its extension
        docname=${filename%"$fileextension"}
        echo "$docname"

        url=$host/$database/$docname
        echo "$url"

        # put the document into CouchDB
        echo curl -X PUT "$url" -d @"$filepath"
        curl -X PUT "$url" -d @"$filepath"
    done

Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL (POST is used when we wish CouchDB to assign a random ID to the document).

To put a single document to the web, the command used would be as follows:

    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
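An alternative to one PUT per file (not the approach used above) is CouchDB's _bulk_docs endpoint, which accepts a single POST whose body has the form {"docs": [...]}. The JavaScript sketch below builds such a body. It assumes, as the single-document command in this section suggests, that file names are already percent-encoded (Aberystwyth%20University.json), so each _id is decoded back to plain text; bulkDocsBody and the shape of its input are hypothetical.

```javascript
// Sketch of a _bulk_docs request body: {"docs": [ ... ]}.
// Assumed input: [{ path: "universities/Aston%20University.json", doc: { ...parsed file... } }]
function bulkDocsBody(files) {
  const docs = files.map(function (file) {
    const fileName = file.path.split("/").pop();       // drop the folder
    const encodedId = fileName.replace(/\.json$/, ""); // drop the extension
    const id = decodeURIComponent(encodedId);          // "%20" becomes a space
    return Object.assign({ _id: id }, file.doc);
  });
  return JSON.stringify({ docs: docs });
}
```

The output could be saved to a file (say bulk.json) and sent with curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/universities/_bulk_docs -d @bulk.json.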

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an _id that begins with _design/ [12].

For the first design document, I have chosen an _id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.

    {
        "_id": "_design/d1",
        "shows": {
            "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
        }
    }

Note that shows is a reserved field name in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object, and can be used, for example, to retrieve data passed by a query string in the URL.
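Since a show function is ordinary JavaScript, it can be exercised outside CouchDB by calling it with a stand-in document. The sketch below is a close restatement of s1 as a plain function; s1WithUnits is a hypothetical variant that reads a query-string parameter from req.query, which is where CouchDB places query-string values. It uses var rather than newer syntax, in case it is pasted into CouchDB's older JavaScript engine.

```javascript
// s1 as an ordinary function: it builds an HTML string from the document.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical variant: query-string values arrive in req.query, so a URL
// ending in ?units=deg would make req.query.units equal "deg".
function s1WithUnits(doc, req) {
  var units = req && req.query && req.query.units ? " " + req.query.units : "";
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + units + '</p>' +
         '<p>longitude: ' + doc.longitude + units + '</p>';
}
```

For example, s1({ _id: "Aston University", latitude: 52.48, longitude: -1.89 }, {}) returns <h1>Aston University</h1><p>latitude: 52.48</p><p>longitude: -1.89</p>.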

Having saved the design document as d1.json, we can upload it to CouchDB in the usual way:

    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the _id of the document to be rendered


5.3 A more complex 'show' function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the 'show' function which returns the HTML for the map is map1. The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html.

The full listing of the design document, including the map1 'show' function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote inside the function has to be escaped (as \"), because the function itself is stored as a string value in the JSON design document.
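One way to avoid hand-escaping is a sketch that assumes Node.js is available on the development machine: write show functions as ordinary JavaScript and let JSON.stringify produce the design document, since CouchDB stores each function as source text. The names locationDiv and designDocJson are hypothetical; tools such as CouchApp automate something similar. The show function deliberately avoids newer syntax, on the assumption that the JavaScript engine inside CouchDB may be older than Node's.

```javascript
// Hypothetical show function written as ordinary code: no hand-escaping needed.
function locationDiv(doc, req) {
  return '<div id="map" data-lat="' + doc.latitude +
         '" data-lng="' + doc.longitude + '"></div>';
}

// Build the design document JSON. Each function is stored as its source text,
// and JSON.stringify escapes every double quote and newline in that text.
function designDocJson(id, shows) {
  const ddoc = { _id: id, shows: {} };
  Object.keys(shows).forEach(function (name) {
    ddoc.shows[name] = shows[name].toString();
  });
  return JSON.stringify(ddoc, null, 2);
}
```

The generated text could then be saved to a file and uploaded with a cURL PUT, exactly as d1.json was.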

While manually authoring design documents it can sometimes be easierto enter text directly into the design document using CouchDBrsquos manage-ment console (Futon) rather than uploading documents from the desktopPC using cURL However the process is rather difficult and un-intuitiveeither way

SECTION 5 SERVING HTML FROM COUCHDB 36

Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions. Here we define a very simple ‘view’ function, which outputs the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc){emit(doc._id, doc._id)}"
    }
  }
}
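The effect of v1 can be imitated in memory. The sketch runs the map function over every document, collects the emitted key/value pairs, and sorts them by key. It is not CouchDB's implementation, which keeps a persistent B-tree index and uses Unicode collation rather than plain string comparison. The sketch also assumes emit is passed in as a parameter, whereas CouchDB provides it as a global.

```javascript
// Simplified imitation of building a view result from a map function.
function runView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // CouchDB supplies emit() at runtime; here we inject our own
    const emit = (key, value) => rows.push({ id: doc._id, key: key, value: value });
    mapFn(doc, emit);
  }
  // rows are returned in key order (plain JS string comparison here)
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, offset: 0, rows: rows };
}

// The v1 map function, rewritten to take emit as a parameter
const v1 = (doc, emit) => emit(doc._id, doc._id);

const sampleDocs = [{ _id: "University of Aberdeen" }, { _id: "Aston University" }];
console.log(JSON.stringify(runView(sampleDocs, v1), null, 2));
```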

SECTION 5 SERVING HTML FROM COUCHDB 37

The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document id values

A ‘view’ is, in effect, a query function that returns a set of results. To display the results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple View, emitting only the document ids

• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({\"headers\": {\"Content-Type\": \"text/html\"}}); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc){emit(doc._id, doc._id)}"
    }
  }
}

SECTION 5 SERVING HTML FROM COUCHDB 38

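Note the built-in functions start, getRow and send:

• start is called once, at the beginning of the function, and sets the response headers

• getRow returns the next row of the view result, or a false value when there are no rows left

• send is called once for each row in the dataset, and appends a chunk of HTML to the response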
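To see how these three functions cooperate, l1 can be run outside CouchDB against stand-in versions of them. In the sketch below, runList is a hypothetical harness and only approximates CouchDB's view server. It assumes, as CouchDB does, that a string returned by the list function is sent as the final chunk.

```javascript
// Stand-in runtime for a CouchDB 'list' function.
function runList(listSource, viewRows) {
  let next = 0;
  let headers = null;
  const chunks = [];
  const start = (response) => { headers = response.headers; };
  const getRow = () => (next < viewRows.length ? viewRows[next++] : null);
  const send = (chunk) => { chunks.push(chunk); };
  // Compile the stored source with start/getRow/send in scope
  const listFn = new Function("start", "getRow", "send",
    "return (" + listSource + ");")(start, getRow, send);
  const returned = listFn({}, {}); // head and req are unused by l1
  if (typeof returned === "string") chunks.push(returned); // sent last
  return { headers: headers, body: chunks.join("") };
}

// l1 from the design document above, as source text
const l1 = `function(head, req) {
  var row;
  start({ "headers": { "Content-Type": "text/html" } });
  while (row = getRow()) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + '</a></p>');
  }
}`;

const viewRows = [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
  { id: "University of Aberdeen", key: "University of Aberdeen", value: "University of Aberdeen" },
];
console.log(runList(l1, viewRows).body);
```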

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload them to a CouchDB server in the normal way. However, this approach becomes very cumbersome, and the result becomes difficult to maintain, as the design document grows more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This also has its limitations. Fundamentally, it is very difficult to write HTML as if it were the output of a JavaScript function.

Working through this exercise in simple application development has given us a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications from CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds that can be displayed on online maps.
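To make the idea concrete, here is a sketch of one Atom entry carrying a GeoRSS-Simple point. The point's content is the latitude and longitude separated by a space. The function names are hypothetical, and the enclosing feed element would need to declare the georss namespace (http://www.georss.org/georss).

```javascript
// Escape the characters that are significant in XML text and attributes.
function escapeXml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;")
                  .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// One Atom <entry> with a GeoRSS-Simple point ("latitude longitude").
function geoRssEntry(id, title, lat, lon) {
  return "<entry>" +
         "<id>" + escapeXml(id) + "</id>" +
         "<title>" + escapeXml(title) + "</title>" +
         "<georss:point>" + lat + " " + lon + "</georss:point>" +
         "</entry>";
}

console.log(geoRssEntry("University of St Andrews", "University of St Andrews", 56.34, -2.79));
```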

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts, developed by J. Chris Anderson and Benoit Chesneau, that makes complex design documents easier to author and deploy [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer’s perspective is that it creates a separate file for each element of the design document. Seen this way, a CouchDB design document is in fact a web application project. CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa

I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files in my copy of it in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an Atom feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the feed output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3

Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by placing each area of the design document in a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where CouchApp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure that matches the eventual structure of the design document we want to create:

• The views folder contains a folder for each CouchDB View, holding a map.js and, when needed, a reduce.js file. A View can be thought of as a query against the database.

• The shows folder contains a .js file for each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.

• Similarly, the lists folder contains one .js file for each ‘List’ function in the generated design document. A List is used to transform a set of documents returned by a View into HTML.
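The mapping from folders to design document fields can be sketched as follows. The sketch assumes files are given as a map from relative path to contents and that each file name, minus its extension, becomes the innermost key. filesToDesignDoc is a hypothetical name. CouchApp's real push does considerably more, for example handling attachments and vendor code.

```javascript
// Hypothetical simplification of how CouchApp turns a folder tree into a
// design document: each path segment becomes a nested key.
function filesToDesignDoc(id, files) {
  const ddoc = { _id: id };
  for (const [path, content] of Object.entries(files)) {
    const parts = path.split("/");
    // views/v1/map.js -> views.v1.map, templates/georss.html -> templates.georss
    const leaf = parts.pop().replace(/\.[^.]+$/, "");
    let node = ddoc;
    for (const dir of parts) {
      node[dir] = node[dir] || {};
      node = node[dir];
    }
    node[leaf] = content.trim();
  }
  return ddoc;
}

const bc = filesToDesignDoc("_design/bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "templates/georss.html": "<feed>...</feed>",
});
console.log(JSON.stringify(bc, null, 2));
```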

Figure 6.3: Files generated by CouchApp

The screenshot shows the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1

couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it into HTML.

Figure 6.4: s1.js, a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.

Recall that in CouchDB a View is, in effect, a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder because it is not needed. (It is important to do this: if we supply an empty reduce function, the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed

6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

Running couchapp push from within the bc folder then merges the files and uploads the design document to the database named in the default environment. So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. Even this simple case shows how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It can iterate through sets of values and output HTML or XML in exactly the form we want.
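How Mustache sections iterate can be illustrated with a deliberately tiny subset of the language. The sketch below is not mustache.js, whose real entry point in this project is Mustache.to_html(template, view), as used in Appendix C. It assumes only two tag forms. {{name}} is replaced by a value from the current context, without the HTML escaping real Mustache performs. {{#name}}...{{/name}} is repeated once per element of an array, rendered once if the value is truthy, and skipped if it is falsy.

```javascript
// Illustrative subset of Mustache, for explanation only.
function render(template, ctx) {
  const section = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  const expanded = template.replace(section, (_, name, inner) => {
    const value = ctx[name];
    if (Array.isArray(value)) {
      // each array element (an object) extends the context for one pass
      return value.map((item) => render(inner, Object.assign({}, ctx, item))).join("");
    }
    return value ? render(inner, ctx) : "";
  });
  // substitute remaining {{name}} tags from the current context
  return expanded.replace(/\{\{(\w+)\}\}/g, (_, name) =>
    ctx[name] === undefined ? "" : String(ctx[name]));
}

const view = { programmes: [{ name: "Chevening Programme" }, { name: "CSFP" }] };
console.log(render("{{#programmes}}<p>{{name}}</p>{{/programmes}}", view));
// prints <p>Chevening Programme</p><p>CSFP</p>
```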

First, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss

SECTION 6 SERVING GEORSS USING COUCHAPP 45

This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
      {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
          {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
          {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
      {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if a programme has a list of countries, emits the flags of those countries.

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all

Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can replace the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and of a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB rather than use tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first point to make is that it is not necessarily an ‘either-or’ choice. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe this trend is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database, so server-side code is simply stored as another kind of document. CouchDB communicates over HTTP, which makes it easy to replicate both data and code between remote servers and local machines. This gives a developer a key advantage: data and code can be replicated in many places, so that users have fast access to them even without a good internet connection. Distributed data and application code are important for reliable computing.
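Because both ends of a replication are HTTP resources, starting one takes a single JSON request to the server's /_replicate endpoint, for example http://127.0.0.1:5984/_replicate on a local machine. The sketch only builds that request. replicationRequest is a hypothetical helper, while the endpoint and its source, target and continuous fields belong to CouchDB's replication API.

```javascript
// Build (but do not send) a CouchDB replication request.
// source/target may be full database URLs or local database names.
function replicationRequest(source, target, continuous) {
  return {
    method: "POST",
    path: "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target, continuous: Boolean(continuous) }),
  };
}

// Pull the hosted universities database (its documents and design
// documents, i.e. data and application code) into a local database.
const pull = replicationRequest("http://mick.couchone.com/universities", "universities", false);
console.log(pull.method, pull.path, pull.body);
```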

Such replication is not easily achieved with conventional web applications. Even with mature existing technology, replicating code and data together as a unit is difficult and expensive.

The lesson I have learned from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development, which is itself a document in HTML or XML format. CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because Object-Relational Mapping (ORM) tools exist. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. With CouchDB, by comparison, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs, and it uses JSON, a widely adopted standard, as its document storage format. Using HTTP for communication means in particular that existing programming languages and command-line tools such as cURL work with CouchDB. Learning about CouchDB actually helps one learn the HTTP protocol.

Cost is a potential driver for development with CouchDB: CouchDB is designed for robustness and reliability on commodity-grade hardware. (While I was working on this project, CouchOne began offering hosted database capacity at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

Developing on the CouchDB platform revealed the following challenges:

• writing queries as map-reduce functions instead of as SQL

• understanding the structure of design documents

• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, transforming data is possible but cumbersome.

CouchDB is still a new product and, as such, lacks ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a map-reduce function defined in advance, whose results are indexed. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
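As an illustration of the difference, the SQL query SELECT region, COUNT(*) FROM institutions GROUP BY region can be re-expressed as a map/reduce pair. In a design document these functions would sit in a View folder, hypothetically named by_region, and would be queried with ?group=true. Here a small driver stands in for CouchDB's grouping, and it assumes the reduce function ignores its keys argument.

```javascript
// map: one (region, 1) pair per institution that has a region
function map(doc, emit) {
  if (doc.region) emit(doc.region, 1);
}

// reduce(keys, values, rereduce): summing works for the first pass (1s)
// and for rereduce (partial sums), as CouchDB may call either.
function reduce(keys, values, rereduce) {
  return values.reduce((a, b) => a + b, 0);
}

// Driver imitating ?group=true: one reduced row per distinct key, in key order.
function queryGrouped(docs) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups.keys()].sort().map((key) => ({
    key: key,
    value: reduce(null, groups.get(key), false),
  }));
}

console.log(queryGrouped([
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Aston University", region: "West Midlands" },
]));
```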

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of advances made by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale, low-cost data storage using CouchDB were outside the scope of this project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project focused on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will grow. I also suspect that developers will happily adopt non-relational, document-oriented approaches to data persistence for certain applications if doing so increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB’s future lies mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development. All HTML is returned as a string from a server-side JavaScript function, which means that every double quote, for example, must be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\">html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }</style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\">function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code, http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/<\/feed>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point; // added: GeoRSS point
  return entry;
};

At the end of lists/index.js:

            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
      }
      // close the loop after all rows are rendered
      return "</feed>";
    });

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
  <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
  <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
  <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:

      // apply docForm at login
      $("#account").evently({
        loggedIn : function(e, r) {
          var userCtx = r.userCtx;
          postForm = app.docForm("form#new-post", {
            id : {{ docid }},
            // fields : ["title", "body", "tags"],
            fields : ["title", "latitude", "longitude", "body", "tags"],
            template : {
              type : "post",
              format : "markdown",
              author : userCtx.name
            }

Appendix C

georss.js

As the signature function(head, req) shows, this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which returns every document.

What is interesting about this code? It gathers a model of the data in the variable stash, then uses the Mustache.to_html function to pass the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} is executed once for each country listed in a particular programme.

Here is an example of a typical document that is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Although it is an .html file, the output it produces is in fact GeoRSS XML. The .html file extension is simply a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
        {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
            {{#countries}}
              &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
            {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
        {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload .png files from a folder so that the image files become attachments to CouchDB documents.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to the countries/png folder
# (beneath the current folder)
FILES="countries/png/*"

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path, e.g. de.png
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  # strip the extension to get the document name, e.g. de
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url="http://username:password@mick.couchone.com/countries/$docname/flag"
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary @"$filepath"
done
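The file-name handling in the loop is easy to misread in sed syntax. As a sketch, here are the same three steps in JavaScript for a single file. flagUpload is a hypothetical name, and the credentials are placeholders, as in the script. Each PUT to .../countries/&lt;docname&gt;/flag then creates document &lt;docname&gt; with the image stored as its flag attachment, as the script's comments describe.

```javascript
// What the sed commands compute for one file path.
function flagUpload(filepath) {
  const filename = filepath.replace(/^countries\/png\//, ""); // e.g. "de.png"
  const docname = filename.replace(/\.png$/, "");            // e.g. "de"
  const url = "http://username:password@mick.couchone.com/countries/" +
              docname + "/flag";
  return { filename: filename, docname: docname, url: url };
}

console.log(flagUpload("countries/png/de.png"));
```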

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.

61

BIBLIOGRAPHY 62

[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file

BIBLIOGRAPHY 63

[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. CouchDB implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes, MSc Computer Science. Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat. http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Open Database Connectivity. http://en.wikipedia.org/wiki/Open_Database_Connectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


Section 2

Background

In recent years, a new generation of data stores has emerged that approaches data persistence differently from the established relational model.

In this section, I briefly discuss the relational model and describe some of the recent challenges that have led to the development of these newer, alternative database systems.

2.1 The Relational Model

The relational model was first formulated and proposed in 1969 by Ted Codd and described in an extremely influential paper published in June 1970 [15].

One description [54] of Codd’s relational model reads as follows:

“The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, where an n-ary relation is a subset of the Cartesian product of n domains.”

A relation is defined as a set of tuples, where a tuple is a set of attribute values. By convention, relations are drawn as tables, tuples as rows and attributes as columns.
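A few lines of Python make the definition concrete. This is a minimal sketch: the two domains and the staff data are illustrative values borrowed from the Staff example in Section 2.3, and a relation is modelled as a Python set of tuples drawn from the Cartesian product of those domains.

```python
from itertools import product

# Two illustrative domains (the sets of permissible attribute values).
names = {"Bob", "Jane", "Alice"}
nationalities = {"Australia", "Sweden", "United Kingdom", "South Africa"}

# The Cartesian product of the two domains: every possible (name, nationality) pair.
all_pairs = set(product(names, nationalities))

# A binary (2-ary) relation is any subset of that product.
staff_nationality = {
    ("Bob", "Australia"),
    ("Jane", "Sweden"),
    ("Alice", "United Kingdom"),
    ("Alice", "South Africa"),
}


def is_relation(tuples, *domains):
    """Return True if every tuple takes its i-th value from the i-th domain."""
    return tuples <= set(product(*domains))


print(len(all_pairs))  # 12 possible pairs (3 names x 4 nationalities)
print(is_relation(staff_nationality, names, nationalities))  # True
```

Each tuple corresponds to one row of a table. Alice appears in two tuples, one for each nationality, which is how the normalised design keeps every value atomic.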

The main advantage of the relational database is its precise, mathematically well-founded organisation of data. The relational model is a logical model. Relational databases and non-relational databases (such as file systems or ‘NoSql’ databases) both commonly use B-Trees as their underlying data structure. Seen from this perspective, whether a database is ‘relational’ depends mainly on how it presents its data to applications: a relational database presents its data to the user as relations, tuples and attributes (tables, rows and columns) [49].


2.2 Relational Database Management Systems

The term ‘Relational Database Management System’ refers to the class of databases that follow Codd’s relational model. The most popular commercial databases (DB2, Oracle, Sybase/SQL Server) and open source databases (MySQL, PostgreSQL) of the past three decades have been based on this model.

These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. The term ‘database’ has become so closely associated with relational databases that non-relational databases are often called ‘non-relational data stores’ for clarity. (Formally, and in its original meaning, the term ‘database’ simply means a collection of data.)

2.3 Normalisation

A key factor in relational data storage is data normalisation. Normalisation aims to ensure that each element of data is stored in one place only. Removing duplication in this way protects data integrity [48].

The first level of normalisation, ‘first normal form’, requires each data item (table cell) to contain an atomic value, typically a single number, string or date. First normal form does not permit sets or other structures as data item values.

Consider, for example, a table (let’s call it ‘Staff’) containing staff records:

    Name    Nationality
    Bob     Australia
    Jane    Sweden

All is well until the system encounters a staff member with multiple nationalities:

    Name    Nationality
    Bob     Australia
    Jane    Sweden
    Alice   United Kingdom, South Africa

This second table breaks the rules of normalisation, because Alice’s Nationality cell no longer holds a single atomic value. It is therefore not a valid design for any of the relational database management systems listed above. The standard solution is to create two more tables, a ‘Nationality’ table and a ‘StaffNationality’ table, linked to ‘Staff’ with primary and foreign keys.
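Below is a minimal sketch of that normalised design. It uses Python’s built-in sqlite3 module as a stand-in for the RDBMSs listed above. The key columns (StaffId, NationalityId) are illustrative assumptions, not part of any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Staff (
    StaffId INTEGER PRIMARY KEY,
    Name    TEXT NOT NULL
);
CREATE TABLE Nationality (
    NationalityId INTEGER PRIMARY KEY,
    Name          TEXT NOT NULL UNIQUE
);
-- Link table: one row per (staff member, nationality) pair.
CREATE TABLE StaffNationality (
    StaffId       INTEGER NOT NULL REFERENCES Staff(StaffId),
    NationalityId INTEGER NOT NULL REFERENCES Nationality(NationalityId),
    PRIMARY KEY (StaffId, NationalityId)
);
""")

conn.executemany("INSERT INTO Staff VALUES (?, ?)",
                 [(1, "Bob"), (2, "Jane"), (3, "Alice")])
conn.executemany("INSERT INTO Nationality VALUES (?, ?)",
                 [(1, "Australia"), (2, "Sweden"),
                  (3, "United Kingdom"), (4, "South Africa")])
# Alice gets two link rows, so every cell still holds one atomic value.
conn.executemany("INSERT INTO StaffNationality VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 3), (3, 4)])

rows = conn.execute("""
    SELECT n.Name
    FROM Staff s
    JOIN StaffNationality sn ON sn.StaffId = s.StaffId
    JOIN Nationality n ON n.NationalityId = sn.NationalityId
    WHERE s.Name = 'Alice'
    ORDER BY n.Name
""").fetchall()
print(rows)  # [('South Africa',), ('United Kingdom',)]
```

Reading Alice’s data now requires two joins. This is the fragmentation that the document-oriented approach avoids.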

Trivial as the example is, it shows how a change in real-world business rules can force a change in data schemata. Such changes can be expensive, because database developers must be brought in to make the amendments.

In a non-relational database (such as CouchDB), Alice’s record could be stored as follows:

    {
      "name": "Alice",
      "nationality": [
        "United Kingdom",
        "South Africa"
      ]
    }

This record (a ‘document’ in CouchDB terminology) can be stored in the database alongside the other records without any change in schema. For this reason, CouchDB and other ‘NoSql’ products are sometimes described as ‘schema-less’ database management systems.

The notation used is JavaScript Object Notation (JSON), which is discussed in more detail in Section 3.

2.4 SQL

A major benefit of following normalisation rules is that data is easier to retrieve with SQL, a set-based, declarative query language. SQL (Structured Query Language) was originally developed by IBM in the early 1970s and became an international standard during the 1980s [22] [25].

Continuing the trivial example, the following SQL statement could be used to count the staff members of each nationality:

    SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table must store atomic values. Run against the second table above, this query would treat ‘United Kingdom, South Africa’ as a distinct nationality.
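The problem can be reproduced with Python’s built-in sqlite3 module. SQLite stands in here for the RDBMSs named above, and the table is the un-normalised Staff example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The un-normalised Staff table: Alice's two nationalities share one cell.
conn.execute("CREATE TABLE Staff (Name TEXT, Nationality TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)", [
    ("Bob", "Australia"),
    ("Jane", "Sweden"),
    ("Alice", "United Kingdom, South Africa"),
])

counts = conn.execute(
    "SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality ORDER BY Nationality"
).fetchall()
for nationality, n in counts:
    print(nationality, n)
# The combined string is reported as a nationality in its own right,
# and neither 'United Kingdom' nor 'South Africa' is counted.
```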

In Section 4, I discuss map-reduce, the method that CouchDB and some other ‘NoSql’ systems use to query the database.
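As a preview, the sketch below emulates the map-reduce idea in plain Python over documents shaped like Alice’s record. It illustrates the idea only and is not CouchDB’s API. In CouchDB, view functions are written in JavaScript and call emit(key, value), and a built-in reduce such as _sum would play the role of reduce_fn here. The documents and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents; nationality is a list, so Alice needs no special case.
docs = [
    {"_id": "bob", "name": "Bob", "nationality": ["Australia"]},
    {"_id": "jane", "name": "Jane", "nationality": ["Sweden"]},
    {"_id": "alice", "name": "Alice",
     "nationality": ["United Kingdom", "South Africa"]},
]


def map_fn(doc):
    """Emit one (key, value) pair per nationality, like emit(key, 1) in a view."""
    for nationality in doc.get("nationality", []):  # tolerate documents without the field
        yield nationality, 1


def reduce_fn(values):
    """Combine the values emitted for one key (here: a simple sum)."""
    return sum(values)


def run_view(documents):
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Return results in key order, mirroring a key-sorted index.
    return {key: reduce_fn(vals) for key, vals in sorted(grouped.items())}


print(run_view(docs))
# {'Australia': 1, 'South Africa': 1, 'Sweden': 1, 'United Kingdom': 1}
```

Unlike the GROUP BY over the un-normalised table, each nationality is counted separately without splitting the data across extra tables.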

2.5 Transactional Guarantees

Traditional database management systems also provide a set of transactional guarantees known by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].

These transactional guarantees ensure that, as far as the user is concerned, the database is always in a consistent state. At any given moment there is only ‘one version of the truth’ for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must amend both accounts correctly. To the user, the two changes must appear to happen at the same time [37].

For a banking application, the database must appear completely consistent, even though in reality the debit and credit do not happen at exactly the same time. Even after a hardware or network failure, a traditional relational database management system must be able to roll back to a consistent state, so that data integrity is assured at all times.
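The sketch below shows this all-or-nothing behaviour with SQLite from Python’s standard library. Account names, amounts and the no-overdraft CHECK constraint are hypothetical, and the constraint stands in for any business rule or failure that aborts a transaction partway through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Account (
        Id      TEXT PRIMARY KEY,
        Balance INTEGER NOT NULL CHECK (Balance >= 0)  -- no overdrafts allowed
    )
""")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one atomic transaction.

    Using the connection as a context manager commits on success and rolls
    back if any statement raises, so both updates persist or neither does.
    """
    with conn:
        conn.execute("UPDATE Account SET Balance = Balance + ? WHERE Id = ?",
                     (amount, dst))  # credit first...
        conn.execute("UPDATE Account SET Balance = Balance - ? WHERE Id = ?",
                     (amount, src))  # ...then debit, which may violate the CHECK


def balances(conn):
    return dict(conn.execute("SELECT Id, Balance FROM Account ORDER BY Id"))


transfer(conn, "A", "B", 30)
print(balances(conn))  # {'A': 70, 'B': 80}

try:
    transfer(conn, "A", "B", 500)  # would overdraw A, so the CHECK fails
except sqlite3.IntegrityError:
    pass
print(balances(conn))  # still {'A': 70, 'B': 80}: the credit to B was rolled back
```

The credit is applied before the debit on purpose. When the debit fails, the rollback visibly undoes a change that had already been made.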

An RDBMS of this kind follows the paradigm that data must at least appear to be held in one place, and that each atomic data value is the only version of the truth at a given moment. This is vital for banking and airline booking systems.

However, many applications do not need such strict transactional guarantees. Since the arrival of the Internet, we have become more comfortable with multiple versions of the same data existing in different places; for example, web pages are cached on proxy servers to reduce network latency. Users are generally tolerant of slightly stale pages, but extremely intolerant of latency (a delay in the time it takes to load the page).

Caching data in many places, so that it is near the user when needed, is a highly effective way to reduce latency. There may no longer be a ‘single version of the truth’, but for many applications the trade-off is worthwhile.
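This trade-off can be sketched as a read-through cache with a time-to-live (TTL). The TTL policy, the page names and the injectable clock are illustrative choices. The cache may return a stale value for up to `ttl` seconds, and in return avoids a round trip to the origin.

```python
import time


class TTLCache:
    """Minimal read-through cache that may serve stale values for `ttl` seconds."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self.fetch = fetch      # slow call to the origin (e.g. a remote web server)
        self.ttl = ttl
        self.clock = clock
        self._store = {}        # key -> (value, time fetched)

    def get(self, key):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]     # no round trip: fast, but possibly stale
        value = self.fetch(key)
        self._store[key] = (value, now)
        return value


origin = {"home-page": "version 1"}
fake_now = [0.0]                # a controllable clock keeps the example deterministic
cache = TTLCache(origin.__getitem__, ttl=10, clock=lambda: fake_now[0])

print(cache.get("home-page"))   # version 1 (fetched from the origin)
origin["home-page"] = "version 2"
print(cache.get("home-page"))   # version 1 (stale copy served from the cache)
fake_now[0] = 11.0              # once the TTL has expired...
print(cache.get("home-page"))   # version 2 (refreshed from the origin)
```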

2.6 The ‘NoSql’ Movement

‘NoSql’ refers to a range of non-relational database products that have emerged in recent years. The term is itself slightly tongue-in-cheek, a play on the naming scheme of classic relational database management systems (MySQL, PostgreSQL, etc.).¹

‘NoSql’ describes data stores that do not provide all the services of the established relational database management systems described above. These newer products differ from them in two main areas:

• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. Data is not stored in normalised tables, and records need not conform to any pre-determined schema.

• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

¹ During the summer of 2010, while this report was being prepared, a set of meetings took place world-wide under the theme of ‘a NoSql Summer’ [1].

2.7 ‘NoSql’ databases at large scale

Much of the impetus for this new breed of database came from the experience of large-scale web services such as Google, Yahoo, Facebook and Amazon. These services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add enough system capacity. They therefore started to consider new architectures that could scale to many thousands of nodes in datacentres around the world [16].

These large-scale data providers found that, in many application domains, it was acceptable to relax the strict transactional guarantees of the relational DBMS. In a social networking application, for example, availability is far more important than consistency in its strictest form.

For example, if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. In a foreign exchange trading application, however, those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario. Foreign exchange trading requires very strict data integrity, such as that provided by traditional relational database management systems like Oracle.

In the Facebook example, weaker forms of consistency are acceptable. The data is held in replicas around the world, so that neither Alice nor Bob waits too long to see their Facebook pages. The replicas are not guaranteed to be in the same state at all times. They are, however, guaranteed to converge to a consistent state once updates stop arriving (‘eventual consistency’).
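A toy model of eventual consistency follows. It assumes a simple last-writer-wins rule keyed on a (timestamp, replica name) version, which is an illustrative choice and not CouchDB’s mechanism. CouchDB records conflicting revisions and deterministically picks a winner, leaving the application free to resolve the conflict. The replica names and status text are hypothetical.

```python
class Replica:
    """Toy replica: last-writer-wins on (timestamp, replica name) versions."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> ((timestamp, replica name), value)

    def write(self, key, value, timestamp):
        self.data[key] = ((timestamp, self.name), value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge_from(self, other):
        """Anti-entropy step: for each key, keep whichever version is newer."""
        for key, (version, value) in other.data.items():
            if key not in self.data or version > self.data[key][0]:
                self.data[key] = (version, value)


london, singapore = Replica("london"), Replica("singapore")
london.write("alice/status", "At the British Museum", timestamp=1)
print(singapore.read("alice/status"))  # None: Bob's replica has not caught up yet

# Replicas synchronise in the background, in both directions.
singapore.merge_from(london)
london.merge_from(singapore)
print(singapore.read("alice/status"))  # At the British Museum
print(london.data == singapore.data)   # True: the replicas have converged
```

The tie-break on the replica name is what guarantees convergence when two writes carry the same timestamp: both replicas apply the same rule and reach the same answer.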

Holding replica data sets in multiple locations also promotes partition tolerance: the ability to keep the service running despite hardware or network failure. If the London datacentre went off-line, Alice could still get hold of her data, albeit more slowly, from one of the other datacentres around the world.

In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:

“Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability, and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”

2.8 Brewer’s ‘CAP’ Theorem

Trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that a web service cannot provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Any data persistence mechanism used by global-scale web services must be highly partition tolerant, that is, able to continue service despite the occasional unavailability of nodes or network links [26]. Availability and partition tolerance are therefore essential, but strict consistency is not, so a looser model of consistency is considered a price worth paying.
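The forced choice can be illustrated with a deliberately simplified two-node model. The Node class and its "CP"/"AP" modes are hypothetical. While the link is up, writes replicate synchronously. During a partition, a node that favours consistency refuses writes, and one that favours availability accepts them and lets the replicas diverge.

```python
class Node:
    """Toy replica that favours either consistency (CP) or availability (AP)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.peer = None           # the other replica
        self.partitioned = False   # True while the network link is down

    def write(self, value):
        if not self.partitioned:
            # Link is up: replicate synchronously so both copies agree.
            self.value = self.peer.value = value
        elif self.mode == "CP":
            # Partitioned and favouring consistency: give up availability.
            raise RuntimeError("unavailable: cannot reach peer")
        else:
            # Partitioned and favouring availability: accept the write locally;
            # the peer keeps a stale copy until the partition heals.
            self.value = value


def pair(mode):
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b


for mode in ("CP", "AP"):
    a, b = pair(mode)
    a.write("v1")
    a.partitioned = b.partitioned = True
    try:
        a.write("v2")
        print(mode, "accepted write; replicas agree:", a.value == b.value)
    except RuntimeError as err:
        print(mode, "rejected write:", err)
# CP rejected write: unavailable: cannot reach peer
# AP accepted write; replicas agree: False
```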

The realisation that ‘eventual’ consistency is often sufficient has unleashed a great deal of innovation in database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex. The newer systems have had the luxury of forgoing that work and concentrating solely on scalable, available data storage.

In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors, because the burden of ensuring data integrity now lies squarely with the application rather than the database. At most, a ‘NoSql’ system may provide some rudimentary validation, and it does not enforce a schema as traditional relational systems do. It may be a little unfair, but compared with traditional databases, ‘NoSql’ systems can be described as ‘dumb’ repositories of data. As a result, their code-base is much smaller. In the case of CouchDB, this means the database can be installed almost anywhere: on web servers, desktops, hand-held devices and mobile phones.
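The sketch below shows what ‘rudimentary validation’ can look like. CouchDB’s own hook is a JavaScript validate_doc_update function in a design document, which rejects a document by throwing an error. The Python below only mimics that shape, and the particular rules and the Store class are hypothetical.

```python
class Forbidden(Exception):
    """Raised when a document fails validation."""


def validate_doc_update(new_doc, old_doc=None):
    """Hypothetical rudimentary checks, loosely modelled on a CouchDB
    validation function; any field not mentioned here is accepted as-is."""
    if new_doc.get("_deleted"):
        return                              # always allow deletions
    if "name" not in new_doc:
        raise Forbidden("documents must have a 'name' field")
    if not isinstance(new_doc.get("nationality", []), list):
        raise Forbidden("'nationality' must be a list")


class Store:
    """A deliberately 'dumb' store: it keeps whatever the validator lets through."""

    def __init__(self, validator):
        self.validator = validator
        self.docs = {}

    def put(self, doc_id, doc):
        self.validator(doc, self.docs.get(doc_id))
        self.docs[doc_id] = doc


store = Store(validate_doc_update)
# Extra, unplanned fields are fine: there is no schema to change.
store.put("alice", {"name": "Alice",
                    "nationality": ["United Kingdom", "South Africa"],
                    "favouriteColour": "green"})
try:
    store.put("jane", {"nationality": "Sweden"})
except Forbidden as err:
    print("rejected:", err)  # rejected: documents must have a 'name' field
```

Every other integrity rule remains the application’s responsibility, which is the shift of burden described above.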


2.9 Reducing the Impedance Mismatch

The term ‘object-relational impedance mismatch’ refers to the difficulties object-oriented application developers encounter when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools has emerged, for example Active Record [41], Hibernate [9] and Entity Framework [39]. Application developers use these tools very widely to ‘bridge the gap’ between the object and relational data models.

However, ORM tools are complex and therefore difficult to maintain.² Another tactic is to use SQL Views, which effectively de-normalise the data so that the data model is closer to the application’s object model.

A key advantage of ‘NoSql’ systems is that they accept data of any structure, so the developer does not need to fragment data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
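The sketch below shows a nested application object becoming a single JSON document with no mapping layer. It uses only Python’s standard library. The StaffMember and Address classes and the address values are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Address:
    street: str
    city: str
    country: str


@dataclass
class StaffMember:
    name: str
    nationality: List[str] = field(default_factory=list)
    addresses: List[Address] = field(default_factory=list)


alice = StaffMember(
    name="Alice",
    nationality=["United Kingdom", "South Africa"],
    addresses=[Address("1 Example Street", "London", "United Kingdom")],
)

# The whole object graph becomes one JSON document; nothing has to be
# split across Staff, Nationality, StaffNationality and Address tables.
doc = json.dumps(asdict(alice), indent=2)
print(doc)

# Loading it back is equally direct (nested objects come back as dicts).
restored = json.loads(doc)
print(restored["nationality"][1])  # South Africa
```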

2.10 Benefits of ‘NoSql’

As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law and related hardware trends: disk space has become so cheap as to be almost free, and memory is now measured in gigabytes, so many databases can fit entirely in memory.

• ‘NoSql’ databases benefit consumers by providing cached, replicated, always-available data.

• They benefit enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale. Managing clusters of relational database management systems is very expensive, and that expense is better reserved for transactional processing, with the reporting burden off-loaded to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, ‘NoSql’ databases benefit developers by offering an easier way to persist and access data and greater flexibility of schema. In effect, the persistence layer uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.


2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’

Unsurprisingly, the main disadvantage of ‘NoSql’ solutions is the lack of SQL support or, more generally, the lack of ad-hoc querying. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions, and there is little capability to query the system in an arbitrary way.

(A similar ‘NoSql’ product, MongoDB, does allow SQL-like dynamic querying [38].)
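The sketch below illustrates the difference between a pre-computed view and an ad-hoc query. It is an analogy only: CouchDB keeps view indexes in B-Trees, and here a sorted Python list stands in for one. The project records are hypothetical. A question the index was built for is answered by binary search; any other question requires scanning every record or building a new index.

```python
from bisect import bisect_left, bisect_right

# Hypothetical project records, as they might arrive from the relational system.
projects = [
    {"id": 1, "region": "Scotland", "sector": "Arts"},
    {"id": 2, "region": "Wales", "sector": "Education"},
    {"id": 3, "region": "Scotland", "sector": "Science"},
    {"id": 4, "region": "London", "sector": "Education"},
]

# Pre-computed 'view': (key, doc id) pairs sorted by key, built once up front.
by_region = sorted((p["region"], p["id"]) for p in projects)
keys = [k for k, _ in by_region]


def query_view(key):
    """Answer the question the index was built for with a binary search."""
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [doc_id for _, doc_id in by_region[lo:hi]]


def ad_hoc(predicate):
    """Any other question means scanning every record (or building a new index)."""
    return [p["id"] for p in projects if predicate(p)]


print(query_view("Scotland"))  # [1, 3]
print(ad_hoc(lambda p: p["sector"] == "Education" and p["region"] != "Wales"))  # [4]
```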

A ‘NoSql’ solution is often used for standard reporting, where the reporting requirements are known in advance. The data itself comes from a relational database, against which business teams run ad-hoc SQL queries. The ‘NoSql’ solution is then a cached representation of the database used purely for reporting, while the established, more expensive relational system remains in use for transactional processing and ad-hoc SQL querying.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which I have been developing for the British Council [2].

The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010, the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The project website is http://activitymap.britishcouncil.org. It consists of many hundreds of maps showing the educational, arts, youth and science projects the British Council supports in the UK.


Figure 2.1: British Council project data displayed on Bing Maps

The data for these maps is exported from an internal (Microsoft SQL Server) database of project information into a set of static files in HTML and XML/GeoRSS [4] formats.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written in C# and ASP.Net using Microsoft Visual Studio. It accepts uploads from internal staff in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. As a result, the British Council now has, as an organisation, a unified view of its work with each partner institution throughout the UK.

Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.


Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. It investigates how far CouchDB can be used to store the Activity Mapping data as documents rather than relational data, and whether CouchDB’s built-in web server can then serve that data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

The generated HTML and XML files are currently stored as static files on the British Council’s web server. A logical next step would be a back-end database that serves the data dynamically.

Since the map data is read-only and reports on the previous year’s activity, I believe something like CouchDB would be a good fit for the back-end data store, in effect acting as a web caching layer.

The data is already structured, having been exported from the British Council’s relational database. CouchDB’s main tasks would be to store the reporting data in JSON format and to transform it on demand into GeoRSS or KML [7] for consumption by Microsoft Bing Maps, Google Maps, Google Earth and similar tools.
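The transformation step can be sketched independently of CouchDB. In CouchDB itself this would be done by a JavaScript function stored in a design document. The Python below only shows the JSON-to-GeoRSS mapping, using the GeoRSS-Simple `georss:point` element inside an RSS 2.0 feed. The document fields (institution, town, programme, lat, lon) and their values are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

GEORSS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS)

# A hypothetical project document, shaped as it might be stored in CouchDB.
doc = json.loads("""
{
  "institution": "Example College",
  "town": "Cardiff",
  "programme": "Erasmus",
  "lat": 51.4816,
  "lon": -3.1791
}
""")


def to_georss(docs, title, link):
    """Render documents as an RSS 2.0 feed with GeoRSS-Simple points."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for d in docs:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = d["institution"]
        ET.SubElement(item, "description").text = f'{d["programme"]} ({d["town"]})'
        # GeoRSS-Simple: "latitude longitude", separated by a single space.
        ET.SubElement(item, f"{{{GEORSS}}}point").text = f'{d["lat"]} {d["lon"]}'
    return ET.tostring(rss, encoding="unicode")


feed = to_georss([doc], "British Council activity",
                 "http://activitymap.britishcouncil.org")
print(feed)
```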

This report describes my investigation into writing an application with CouchDB to fulfil these requirements. I want to explore how ready this new technology is for ‘prime time’: how far it is ready for adoption by web application developers like myself, who are used to building solutions on current standard platforms.³

³ Most of my experience has been in C# and ASP.Net. I also have some experience with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is a JSON string [30].

3.1 ‘Of the Web’

Combining a web server with B-Tree-based data storage has been key to CouchDB’s success, because all modern programming languages have libraries for communicating over HTTP. CouchDB can therefore be used from practically any language. A developer using CouchDB can dispense with the object-relational mapping layers and database connectivity libraries needed to communicate with standard DBMSs [52].
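Since CouchDB speaks plain HTTP, Python’s standard urllib is all a client needs. The sketch below builds the requests without sending them, so it runs without a server. It assumes a local CouchDB on the default port 5984 with no authentication, as CouchDB 1.0 shipped by default. The database name, document id and helper functions are illustrative. Storing a document is an HTTP PUT of its JSON to /database/document-id, and fetching it back is a GET on the same URL.

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumed local server on CouchDB's default port


def put_document(db, doc_id, doc):
    """Build (but do not send) the request that stores `doc` under db/doc_id."""
    return urllib.request.Request(
        f"{COUCH}/{db}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_document(db, doc_id):
    """Build the request that fetches the document back as JSON."""
    return urllib.request.Request(f"{COUCH}/{db}/{doc_id}", method="GET")


req = put_document("addressbook", "alice",
                   {"name": "Alice", "nationality": ["United Kingdom", "South Africa"]})
print(req.get_method(), req.full_url)  # PUT http://localhost:5984/addressbook/alice

# With a server running and the database created, sending it is one call:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # e.g. {'ok': True, 'id': 'alice', 'rev': '1-...'}
```

No driver or mapping layer is involved. The same two requests could be issued from C#, PHP or the command line with curl.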

The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data as a web page can be kept in CouchDB design documents. This project takes advantage of that ability to store data and code in one place.

An often-quoted remark about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities that open up when web and database technologies are combined in a single product:

“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform- and OS¹-agnostic.” [31]

A web development project normally requires two separate servers, a database server and a web application server. CouchDB combines both in a single product, which makes it interesting to explore as an all-in-one solution for serving both data and HTML.

3.2 Some History

Damien Katz started CouchDB in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product Katz worked on at IBM. Lotus Notes is also a document-oriented system that can serve as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available online [32]. In it, Katz describes his decision to sell his house, move his family and live on savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a ‘document-oriented’ database, in which each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data-exchange format based on a subset of the JavaScript programming language. It is often used to serialise and transmit structured data over a network connection, and in recent years it has established itself as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object describing a person:

    {
      "firstName": "John",
      "lastName": "Smith",
      "age": 25,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
      },
      "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
      ]
    }

¹ Operating System

Consider how NULL values are used. In a relational database table, an address in a country without postal codes would usually be represented by a NULL in the postalCode field. In CouchDB and similar systems, the postal code would simply be absent from the document. This is much closer to what we intuitively expect from real-world experience.

Because the schema is flexible, we may legitimately store the following document in the same ‘addressbook’ database:

    {
      "company": "British Council Zambia",
      "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
      }
    }

(Note that \n denotes a newline character, so here the streetAddress spans three printed lines.)

Section 4 demonstrates how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB in Erlang, a programming language developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element of Erlang’s reliability is that it is a functional language that avoids mutable data. Processes share no memory state, so if an Erlang process encounters a problem, it can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. J. Chris Anderson, one of the project’s core developers, describes how combining Erlang with append-only B-Tree structures for data promotes this:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings, much as the Microsoft Windows registry stores settings for the programs on a particular computer [43].

What is new about Ubuntu’s ‘Desktop Couch’ is that it can use CouchDB’s replication capabilities. Every ten minutes, Desktop Couch can communicate with a CouchDB instance on an external web server, the UbuntuOne service [5]. Every Ubuntu desktop user can currently sign up for 2GB of free space on the UbuntuOne CouchDB server. Because CouchDB can store files as well as application settings, the system can also be used for secure off-site backup of important files.

Client applications on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks there, and a background process then synchronises them to the UbuntuOne server. When the user moves to another machine, the Desktop Couch on that machine can fetch the latest bookmarks from UbuntuOne, so the data follows the user from machine to machine.

In this way, users can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Because CouchDB runs on multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustrating this is available online [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed 'Quickly' [8], which gives beginner programmers a way to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started, and Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).

A broader aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers for Windows and Mac are also available at the same URL. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use 'cURL', which is a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same statement with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.

curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

cURL syntax is explained here [44]. Briefly:

• -H introduces an HTTP header (here, Content-Type)
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
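The same request can be made from any language with an HTTP library. Below is a minimal JavaScript sketch. It assumes Node.js 18 or later, which provides a global fetch, and its function names are hypothetical. The credentials travel in a Basic Authorization header rather than in the URL, because fetch() refuses URLs that embed a username and password.

```javascript
// Sketch only. Assumes Node.js 18+ (global fetch); "username"/"password"
// are placeholders, sent as a Basic Authorization header because fetch()
// rejects URLs that embed credentials.
function basicAuthHeader(user, password) {
  return "Basic " + Buffer.from(`${user}:${password}`).toString("base64");
}

// Build, but do not send, the equivalent of the cURL POST above:
// POST /{db} with a JSON body, so that CouchDB assigns the _id.
function buildCreateDocRequest(baseUrl, db, doc, user, password) {
  return {
    url: `${baseUrl}/${encodeURIComponent(db)}`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: basicAuthHeader(user, password),
      },
      body: JSON.stringify(doc),
    },
  };
}

// Send it; on success CouchDB replies 201 Created with a body such as
// {"ok":true,"id":"...","rev":"1-..."}.
async function createDocument(baseUrl, db, doc, user, password) {
  const { url, options } = buildCreateDocRequest(baseUrl, db, doc, user, password);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`POST failed: HTTP ${res.status}`);
  return res.json();
}
```

For example, createDocument("http://mick.couchone.com", "addressbook", johnSmith, "username", "password") would post a johnSmith object. As before, the two credentials are placeholders.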

All data in CouchDB is stored in JSON format, together with a unique key and a revision number. In CouchDB, the resulting document would therefore have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator, http://jsonlint.com.

Figure 4.1: JSONLint

The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia

To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon
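When such URLs are built in code, the document id should be percent-encoded. The JavaScript sketch below is illustrative only, and docUrl and buildPutDocRequest are hypothetical helper names. Design-document ids are deliberately left out, because they contain a slash that must stay unencoded.

```javascript
// Sketch: URL for a document with a meaningful _id. Ids are percent-encoded
// with encodeURIComponent, so a space becomes %20. Not suitable for
// "_design/..." ids, whose slash must not be encoded.
function docUrl(baseUrl, db, id) {
  return `${baseUrl}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`;
}

// PUT /{db}/{id}: here the client, not CouchDB, chooses the _id.
function buildPutDocRequest(baseUrl, db, id, doc) {
  return {
    url: docUrl(baseUrl, db, id),
    options: {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(doc),
    },
  };
}
```

An id such as "Aston University" therefore becomes Aston%20University in the URL, which matches the form of the universities URLs in Section 5.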


4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
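When the revision comes from a variable rather than being typed by hand, it is safer to let a library build the query string. Here is a minimal JavaScript sketch; buildDeleteUrl is a hypothetical name.

```javascript
// Sketch: DELETE /{db}/{id}?rev={rev}. URLSearchParams builds the query
// string (the part after "?") and escapes the value if necessary.
function buildDeleteUrl(baseUrl, db, id, rev) {
  const query = new URLSearchParams({ rev }).toString();
  return `${baseUrl}/${encodeURIComponent(db)}/${encodeURIComponent(id)}?${query}`;
}
```

The resulting URL would be sent with the DELETE method, exactly as in the cURL command above.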

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error (HTTP status 409) if an update is attempted on an 'old' version of a document.

As we are updating a document whose _id we already know, we use the PUT verb instead of POST.

The following command will update John Smith's record with the contents of the file john-smith-v2.json, supplying the revision value shown earlier for that document:

curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
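The revision check amounts to optimistic concurrency control. The toy in-memory model below is not CouchDB code. It only mimics the rule described above, and its revision strings ("1-toy", "2-toy") are simplified stand-ins for CouchDB's real revision values.

```javascript
// Toy in-memory model (not CouchDB code) of the revision check: a write to
// an existing document must quote its current _rev, otherwise it is
// refused, as CouchDB refuses it with HTTP 409.
class ToyStore {
  constructor() {
    this.docs = new Map();
  }

  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) {
      // a newer revision exists (or no rev was given)
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const doc = { ...body, _id: id, _rev: `${generation}-toy` };
    this.docs.set(id, doc);
    return { status: 201, ok: true, id, rev: doc._rev };
  }
}
```

Two clients that both read revision 1 can therefore not both succeed. The second write is rejected until that client re-reads the document and quotes revision 2.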

4.7 Adding Attachments

CouchDB provides a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/.³

The following command will upload the flag of Zambia (zm.png) as an attachment named flag to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.

The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
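An attachment upload is simply a PUT to a URL one level below the document. The JavaScript sketch below builds that request, and its helper name is hypothetical. The caller is assumed to supply the file's bytes (for instance from fs.readFileSync("zm.png")) and its media type.

```javascript
// Sketch: PUT /{db}/{docid}/{attachment}?rev={rev}, with the raw bytes as
// the body and the file's media type as Content-Type. The bytes are passed
// in by the caller; this function does no file I/O.
function buildAttachmentRequest(baseUrl, db, docId, name, rev, bytes, contentType) {
  const query = new URLSearchParams({ rev }).toString();
  const path = [db, docId, name].map(encodeURIComponent).join("/");
  return {
    url: `${baseUrl}/${path}?${query}`,
    options: { method: "PUT", headers: { "Content-Type": contentType }, body: bytes },
  };
}
```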

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook", "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping the inner ones where necessary (shown here on a single line):

curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
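Building the replication body with a JSON serialiser avoids the shell-specific quoting shown above. Here is a minimal JavaScript sketch; buildReplicationRequest is a hypothetical name. The optional continuous flag is a standard _replicate option that keeps replication running as new changes arrive, and it is included here only for illustration.

```javascript
// Sketch: POST /_replicate on the local server. JSON.stringify produces the
// quoting, so no shell escaping is involved.
function buildReplicationRequest(localBase, source, target, continuous = false) {
  const body = { source, target };
  if (continuous) body.continuous = true; // keep replicating new changes
  return {
    url: `${localBase}/_replicate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```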

4.9 Querying a CouchDB Database using Map-Reduce

So far, we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google's 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages. The 'map' stage ranges over the document collection and emits zero or more key/value pairs for each document it encounters. The 'reduce' stage is used when an aggregate calculation over those intermediate values is required.
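The two stages can be illustrated with a toy, in-memory JavaScript model. This is not how CouchDB evaluates views: here emit is passed to the map function explicitly (in CouchDB it is a global), and the sample data, apart from the Aberavon figure, is made up.

```javascript
// Toy in-memory illustration (not CouchDB's implementation) of map-reduce.
function runMapReduce(docs, map, reduce) {
  const rows = [];
  for (const doc of docs) {
    // map may call emit zero or more times per document
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  if (!reduce) return rows;
  // CouchDB passes reduce an array of [key, docid] pairs and the values
  return reduce(rows.map((r) => [r.key, r.id]), rows.map((r) => r.value));
}

// Sample data: Aberavon's figure is from the report; the others are made up.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Constituency A", con: 1500 },
  { _id: "Constituency B", con: 12000 },
];

const total = runMapReduce(
  docs,
  (doc, emit) => emit(null, doc.con),
  (keys, values) => values.reduce((a, b) => a + b, 0)
);
```

Here total evaluates to 16562. Without a reduce function, runMapReduce returns the emitted rows themselves, which corresponds to querying a map-only view.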

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb/), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing the electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document 'con' (_design/con) has been defined with two 'views' of the data.

The first view, 'Conservative Votes', simply ranges over the document collection in a map function, emitting each constituency's Conservative vote as the key and the constituency's _id as the value.

The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate query using SUM in SQL).

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}

It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000 (the startkey and endkey parameters are both inclusive):

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
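A toy model of a view index shows why a startkey/endkey query is an indexed lookup. The emitted rows are kept sorted by key, so binary search finds the first matching row and the scan stops at endkey. The sketch handles numeric keys only and is not CouchDB's B-tree. Apart from Aberavon, the sample data is hypothetical.

```javascript
// Toy model (not CouchDB's B-tree) of a view index and a range query.
function buildIndex(docs, map) {
  const rows = [];
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // Numeric keys only; equal keys are ordered by document id, as in CouchDB.
  return rows.sort((a, b) => a.key - b.key || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}

// Rows with startkey <= key <= endkey (both ends inclusive).
function queryRange(index, startkey, endkey) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    // binary search for the first row whose key is >= startkey
    const mid = (lo + hi) >> 1;
    if (index[mid].key < startkey) lo = mid + 1;
    else hi = mid;
  }
  const result = [];
  for (let i = lo; i < index.length && index[i].key <= endkey; i++) {
    result.push(index[i]);
  }
  return result;
}

// Hypothetical sample data, apart from Aberavon's 3062.
const index = buildIndex(
  [
    { _id: "Aberavon", con: 3062 },
    { _id: "Constituency A", con: 1500 },
    { _id: "Constituency B", con: 999 },
    { _id: "Constituency C", con: 2000 },
    { _id: "Constituency D", con: 1000 },
  ],
  (doc, emit) => emit(doc.con, doc._id)
);
const hits = queryRange(index, 1000, 2000);
```

In this sketch, hits contains the rows for Constituency D, A and C, in key order.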

This method of retrieving data is certainly cumbersome compared to using SQL. One criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in advance in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-tree index is created for that query. From that point on, the index is updated incrementally: when the view is next queried, only the documents that have been added or changed since the last update are passed through the map function.

As a result, the system is optimised for fast indexed retrieval. A view query never performs a 'table scan' of the documents at run-time, because every read of a view is an indexed lookup.
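Incremental index maintenance can be pictured with another toy model, which again is not CouchDB code. The model assumes that each document change carries a sequence number and that the view remembers the last sequence it has processed, so a query maps only the changes made since then. Deletions are omitted for brevity.

```javascript
// Toy model (not CouchDB code) of an incrementally maintained view.
class ToyView {
  constructor(map) {
    this.map = map;
    this.lastSeq = 0; // last change already indexed
    this.rowsById = new Map(); // document id -> rows emitted for it
    this.mapCalls = 0; // how many times map has run
  }

  // changes: [{ seq, doc }] in increasing seq order
  query(changes) {
    for (const { seq, doc } of changes) {
      if (seq <= this.lastSeq) continue; // already in the index
      const rows = [];
      this.map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
      this.rowsById.set(doc._id, rows); // replaces rows from an older revision
      this.mapCalls += 1;
      this.lastSeq = seq;
    }
    return [...this.rowsById.values()].flat().sort((a, b) => a.key - b.key);
  }
}
```

Querying twice with no new changes runs map zero times on the second call. A single updated document costs exactly one more map call, however large the collection is.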

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents, and it stores HTML and JavaScript application code in so-called 'design documents'.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The 'universities' data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the following bash script.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)

#!/bin/bash

# local instance (uncomment to upload to the local machine instead)
# host=http://127.0.0.1:5984/
host=http://username:password@mick.couchone.com/
database=universities
folder=universities
fileextension=.json
FILES=$folder/*

# create the database
curl -X PUT "$host$database"

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(basename "$filepath")
  echo "$filename"
  # strip the extension to obtain the document id
  docname=${filename%"$fileextension"}
  echo "$docname"
  url=$host$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL (POST is used when we wish CouchDB to assign a random ID to the document).

To put a single document to the web, the command used would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities.
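Besides one PUT per file, CouchDB also accepts many documents in a single request via POST /{db}/_bulk_docs, with a body of the form {"docs": [...]}. The Node.js sketch below prepares such a body from the exported files. It shows an alternative, not the method used in this project, and its helper names are hypothetical. Each _id is derived from the file name in the same way the PUT URL derives it.

```javascript
const fs = require("fs");
const path = require("path");

// Build the body for POST /{db}/_bulk_docs from [fileName, fileText] pairs.
// The _id is the file name without ".json", percent-decoded, which is what
// CouchDB stores when the same name is used in a PUT URL
// ("Aberystwyth%20University.json" -> "Aberystwyth University").
function bulkDocsBody(entries) {
  const docs = entries.map(([fileName, text]) => ({
    ...JSON.parse(text),
    _id: decodeURIComponent(path.basename(fileName, ".json")),
  }));
  return JSON.stringify({ docs });
}

// Read all .json files in a folder as [fileName, fileText] pairs.
function readJsonFiles(folder) {
  return fs
    .readdirSync(folder)
    .filter((name) => name.endsWith(".json"))
    .map((name) => [name, fs.readFileSync(path.join(folder, name), "utf8")]);
}

// Usage (not run here): POST bulkDocsBody(readJsonFiles("universities"))
// to http://127.0.0.1:5984/universities/_bulk_docs
// with the header Content-Type: application/json.
```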

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an _id that begins with _design/ [12].

For the first design document, I have chosen an _id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output. (Because a design document must be valid JSON, the function itself is stored as a string.)

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved field name in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered. req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
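Because a show function is ordinary JavaScript, it can be tried outside CouchDB before it is pasted into the design document. The version below assumes the same HTML as the s1 listing. The sample document in the usage note uses made-up coordinates.

```javascript
// The s1 show function as a plain JavaScript function, runnable in Node.
// req is unused here, as in the listing.
function s1(doc, req) {
  return (
    "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>"
  );
}
```

For example, s1({ _id: "Aston University", latitude: 1.5, longitude: -2 }, {}) returns "<h1>Aston University</h1><p>latitude: 1.5</p><p>longitude: -2</p>". A production version would also escape HTML special characters in the document fields.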

Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case, the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the _id of the document to be rendered (the space is URL-encoded as %20)
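The same breakdown can be done programmatically. The JavaScript sketch below uses the standard URL class, and parseShowUrl is a hypothetical name. It assumes the /{database}/_design/{name}/_show/{function}/{document id} layout used above.

```javascript
// Sketch: split a show URL into its parts. Assumes the layout
// /{db}/_design/{ddoc}/_show/{func}/{docid}.
function parseShowUrl(href) {
  const u = new URL(href);
  const [db, design, ddoc, show, func, ...rest] = u.pathname.split("/").slice(1);
  if (design !== "_design" || show !== "_show") {
    throw new Error("not a show URL: " + href);
  }
  return {
    host: u.origin,
    db: decodeURIComponent(db),
    ddoc: `_design/${ddoc}`,
    func,
    docId: decodeURIComponent(rest.join("/")), // "%20" -> " "
  };
}
```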


5.3 A more complex 'show' function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the 'show' function which returns the HTML for the map is map1. The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html.

The full listing of the design document, including the map1 'show' function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote in the HTML has to be escaped so that the function, stored as a string, remains valid JSON.

While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.


Figure 5.5: Documents may be edited directly using Futon
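One way to avoid hand-escaping is to write the show function as real JavaScript and let the language produce the string. In the sketch below, Function.prototype.toString() yields the function's source text and JSON.stringify escapes every double quote. buildDesignDoc is a hypothetical helper, and the sketch is an illustration, not the approach used in this project or in Appendix A.

```javascript
// Illustrative show function whose HTML contains double quotes.
function s1(doc, req) {
  return '<h1 class="title">' + doc._id + "</h1>";
}

// Hypothetical helper: turn real functions into the string-valued
// "shows" section that a design document needs.
function buildDesignDoc(id, shows) {
  const ddoc = { _id: id, shows: {} };
  for (const [name, fn] of Object.entries(shows)) {
    ddoc.shows[name] = fn.toString(); // the function's source text
  }
  return JSON.stringify(ddoc, null, 2); // escapes quotes and newlines
}

const json = buildDesignDoc("_design/d1", { s1 });
console.log(json); // could be saved as d1.json and uploaded with cURL
```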

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions. Here we define a very simple 'view' function, which outputs the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple 'view' returning document _id values

A 'view' is, in effect, a query function which returns a set of results. In order to render the results as HTML, we may define 'list' functions in a CouchDB design document:

• v1 is a very simple view, emitting only the document ids
• l1 is a very simple list that shows a hyperlink for each document id

We augment the design document with a 'list' function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


Note the built-in functions start and send:

• start is called once, at the beginning of the function, to set the response headers
• send is called once for each row returned by getRow, writing a chunk of HTML to the response

At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple 'view' rendered using a 'list' function
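A list function can likewise be exercised outside CouchDB by supplying stand-ins for start, send and getRow. The harness below is a toy, not CouchDB's JavaScript view server. It installs the stand-ins as globals because l1 refers to them that way. It also appends any string the list function returns, which is how CouchDB treats a returned string.

```javascript
// l1 as in the design document, written as ordinary JavaScript.
function l1(head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send(
      '<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
        row.value + '">' + row.value + "</a></p>"
    );
  }
}

// Toy harness: feed view rows to a list function and collect its output.
function runList(listFn, rows) {
  let i = 0;
  const out = { response: null, chunks: [] };
  globalThis.start = (response) => { out.response = response; };
  globalThis.send = (chunk) => { out.chunks.push(chunk); };
  globalThis.getRow = () => (i < rows.length ? rows[i++] : null);
  const tail = listFn({ total_rows: rows.length, offset: 0 }, {});
  if (typeof tail === "string") out.chunks.push(tail); // returned text is sent last
  return out;
}
```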

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload them to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. However, this also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer's perspective is that it creates a separate file for each element of the design document. In effect, a CouchDB design document is a web application project, and CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3

Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains map.js and, when needed, reduce.js files for each CouchDB view. A view can be thought of as a query against the database.
• The shows folder contains .js files corresponding to each 'show' function in the design document. A show is a function used to transform a single document into HTML.
• Similarly, the lists folder contains .js files, one for each 'list' function in the eventually generated design document. A list is used to transform a set of documents returned by a view into HTML.

Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a 'show' function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1

couchapp generate show creates a stub for a show function, which we can fill in using a text editor. A show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js, as generated by CouchApp

Similarly, couchapp generate list l1 will create a 'list' function, l1.

In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js. We recall that in CouchDB a view is, in effect, a query against the database.

As v1 is a very simple view, with a map function but no reduce function, we remove the empty reduce.js from the folder because it is not needed. (It is important to do this: if we supply an empty reduce function, the view will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed

6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document to the CouchDB server using the correct HTTP commands.
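The merging step can be pictured as a mapping from file paths to nested keys. The toy model below is not CouchApp's code and handles only .js files. It assumes that each folder becomes a nested object and each file name, minus .js, becomes a key. It also shows why an empty reduce.js matters: the file still becomes a "reduce" entry.

```javascript
// Toy model (not CouchApp's Python code): merge a folder of .js files into
// one design-document object, e.g. "views/v1/map.js" -> ddoc.views.v1.map.
function folderToDesignDoc(id, files) {
  const ddoc = { _id: id };
  for (const [filePath, source] of Object.entries(files)) {
    const parts = filePath.replace(/\.js$/, "").split("/");
    let node = ddoc;
    for (const part of parts.slice(0, -1)) {
      node[part] = node[part] || {}; // create intermediate folders as objects
      node = node[part];
    }
    node[parts[parts.length - 1]] = source; // file contents become the value
  }
  return ddoc;
}
```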

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

Running couchapp push from the bc folder then uploads the design document to the database named in the default environment. So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section, we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this, we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple view, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a list function to transform the contents of the view:

couchapp generate list georss

This creates a file called georss.js. I added some code, derived from that in Sofa, which 'packages' the data from the view and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if a programme has a list of countries, emits the flags of those countries.

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
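The 'packaging' step in georss.js (listed in Appendix C) is not reproduced here. The hypothetical JavaScript sketch below shows one way view rows could be shaped into the object that the template iterates over. The field names follow the template. The structure of the university documents, a programmes array whose entries may carry a countries array, is an assumption.

```javascript
// Hypothetical sketch: shape rows from the "all" view (whose values are
// whole documents) into the object the Mustache template expects.
// The document structure (programmes[].countries[]) is assumed.
function packageInstitutions(rows) {
  return {
    institutions: rows.map((row) => {
      const doc = row.value;
      const programmes = (doc.programmes || []).map((p) => {
        const countries = p.countries || [];
        // has_countries lets the template skip programmes with no flags
        return { name: p.name, countries, has_countries: countries.length > 0 };
      });
      return {
        id: doc._id,
        latitude: doc.latitude,
        longitude: doc.longitude,
        programmes,
        has_programmes: programmes.length > 0,
      };
    }),
  };
}
```

The boolean has_ flags exist because a Mustache section over an empty list renders nothing. Here they guard the surrounding markup, so that no empty paragraph is emitted.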

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all

Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
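A minimal sketch of such a replication request: CouchDB exposes replication through a POST to its _replicate endpoint, whose JSON body names a source and a target database (either local database names or full URLs). The function name replicationRequest is hypothetical, and the function only builds the request; it could be sent with cURL or any HTTP client.

```javascript
// Build a request descriptor for CouchDB's POST /_replicate endpoint.
// Design documents are ordinary documents, so replicating a database
// copies the application code (shows, lists, views) along with the data.
function replicationRequest(source, target, continuous = false) {
  const body = { source: source, target: target };
  if (continuous) {
    body.continuous = true; // keep replicating new changes as they arrive
  }
  return {
    method: "POST",
    path: "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Pull the hosted universities database down to a local CouchDB.
const pullRequest = replicationRequest(
  "http://mick.couchone.com/universities",
  "universities"
);
console.log(pullRequest.method, pullRequest.path, pullRequest.body);
```

Sent to a local CouchDB, this pulls both the documents and the design documents, so the application keeps working offline from the local copy.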

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.
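To make the point about complexity concrete, the sketch below assembles nested documents like the one in Appendix D from three flat, normalised row sets, which is the kind of join-and-nest work that an ORM or a hand-written data access layer performs. The row sets and their column names (id, institutionId, programmeId) are hypothetical and are not the schema of the British Council application.

```javascript
// Join three normalised "tables" (arrays of flat rows) into nested documents.
function buildDocuments(institutions, programmes, programmeCountries) {
  return institutions.map(function (inst) {
    return {
      _id: inst.name,
      latitude: inst.latitude,
      longitude: inst.longitude,
      programmes: programmes
        .filter((p) => p.institutionId === inst.id) // foreign-key join
        .map((p) => ({
          name: p.name,
          countries: programmeCountries
            .filter((pc) => pc.programmeId === p.id) // link-table join
            .map((pc) => pc.country),
        })),
    };
  });
}

const docs = buildDocuments(
  [{ id: 1, name: "University of St Andrews", latitude: 56.34, longitude: -2.79 }],
  [{ id: 10, institutionId: 1, name: "Chevening Programme" }],
  [{ programmeId: 10, country: "id" }, { programmeId: 10, country: "mz" }]
);
console.log(JSON.stringify(docs, null, 2));
```

With CouchDB the nested document is stored as-is, so a view or show function can read it directly without this assembly step.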

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it makes use of JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
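As a sketch of what ‘standard HTTP verbs’ means in practice, the helpers below describe the requests for creating, reading, updating and deleting a document. The helper names are hypothetical, and they only build request descriptions rather than sending them; the paths and the use of _rev follow CouchDB's HTTP document API, in which an update or delete must quote the document's current revision.

```javascript
// Database name and document id become URL-encoded path segments.
function docPath(db, id) {
  return "/" + encodeURIComponent(db) + "/" + encodeURIComponent(id);
}

const couch = {
  // PUT /db/id creates a document with a chosen id
  create: (db, id, doc) => ({ method: "PUT", path: docPath(db, id), body: JSON.stringify(doc) }),
  // GET /db/id reads it back (the response includes _id and _rev)
  read: (db, id) => ({ method: "GET", path: docPath(db, id) }),
  // PUT again, with the current _rev in the body, to update
  update: (db, id, rev, doc) => ({
    method: "PUT",
    path: docPath(db, id),
    body: JSON.stringify({ ...doc, _rev: rev }),
  }),
  // DELETE quotes the current revision as a query parameter
  remove: (db, id, rev) => ({
    method: "DELETE",
    path: docPath(db, id) + "?rev=" + encodeURIComponent(rev),
  }),
};

console.log(couch.read("universities", "University of Aberdeen"));
```

Because each operation is an ordinary HTTP request, the same calls can be made from cURL, a browser, or any language with an HTTP library.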

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL;

• understanding the structure of design documents;

• the use of server-side JavaScript functions to transform documents to HTML and XML.

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a predefined map-reduce function whose results are indexed. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
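For comparison with SQL, here is a sketch of a map function and a reduce function that count how many times each country appears across institution documents shaped like the example in Appendix D, roughly what a GROUP BY Country with COUNT(*) would do over a normalised table. Inside CouchDB the view server supplies emit() and performs the grouping; the runView harness and the second example document are assumptions that merely imitate that behaviour so the code can run on its own.

```javascript
// Map function: one row per (programme, country) pair in an institution
// document. Inside CouchDB, emit() is supplied by the view server.
function mapCountries(doc) {
  (doc.programmes || []).forEach(function (programme) {
    (programme.countries || []).forEach(function (country) {
      emit(country, 1);
    });
  });
}

// Reduce function: sum the emitted counts for a key. Summing is
// associative, so the same function also works when re-reducing.
function reduceCount(keys, values, rereduce) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// --- Hypothetical harness imitating a grouped view query ---
let emitted = [];
let currentId = null;
function emit(key, value) {
  emitted.push({ key: key, id: currentId, value: value });
}

function runView(docs, map, reduce) {
  emitted = [];
  for (const doc of docs) {
    currentId = doc._id;
    map(doc);
  }
  const byKey = new Map();
  for (const row of emitted) {
    if (!byKey.has(row.key)) byKey.set(row.key, []);
    byKey.get(row.key).push(row);
  }
  // Rows come back sorted by key (plain string order is enough here).
  return [...byKey.keys()].sort().map(function (key) {
    const rows = byKey.get(key);
    return {
      key: key,
      value: reduce(rows.map((r) => [r.key, r.id]), rows.map((r) => r.value), false),
    };
  });
}

const result = runView([
  { _id: "University of St Andrews", programmes: [
    { name: "Chevening Programme", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", countries: ["bd", "za"] },
  ] },
  { _id: "Example University", programmes: [ // hypothetical second document
    { name: "Chevening Programme", countries: ["mz"] },
  ] },
], mapCountries, reduceCount);
console.log(result);
```

In a real design document the reduce step could instead be CouchDB's built-in _sum reduce.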

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web applications providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1 (source).

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that all double-quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.
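The escaping burden disappears once the function is written as ordinary code and serialised by a program. The sketch below illustrates the idea, not CouchApp's actual implementation: a show function is converted to source text with Function.prototype.toString and placed in a design document, and JSON.stringify adds every backslash escape. The class attribute on the heading is a hypothetical addition, there only to put double quotes in the markup.

```javascript
// A show function written as ordinary code: the HTML attribute uses
// double quotes, which would need backslashes if typed into JSON by hand.
function s1(doc, req) {
  return '<h1 class="place">' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// CouchDB stores design-document functions as source-code strings.
const designDoc = {
  _id: "_design/d1",
  shows: { s1: s1.toString() },
};

// JSON.stringify inserts the \" escapes automatically.
const json = JSON.stringify(designDoc);
console.log(json);
```

The resulting JSON could then be sent to the server with a PUT to the design document's URL.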

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code (http://github.com/jchris/sofa). I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point; // added for GeoRSS
  return entry;
};

At the end of lists/index.js:

        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
  }
  // close the loop after all rows are rendered
  return "</feed>";
});

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : docid,
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.
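The View all itself is not listed in this report. Given that georss.js reads each institution from row.value, a minimal sketch of a map function that would serve it, assuming rows are keyed by document id and carry the whole document as their value, is:

```javascript
// Hypothetical map function for the "all" view: one row per document,
// keyed by _id, with the whole document as the row value, so that a
// List function can read each institution from row.value.
function allMap(doc) {
  emit(doc._id, doc);
}

// Stub of CouchDB's emit(), so the function can be exercised outside CouchDB.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

allMap({ _id: "University of St Andrews", region: "Scotland" });
console.log(rows[0].value.region); // Scotland
```

In a design document the function would be stored anonymously, and emit() is supplied by the view server. An alternative design would emit(doc._id, null) and query the view with include_docs=true, so that documents are not duplicated in the index; the List function would then read row.doc instead of row.value.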

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": [
        "id", "mz", "tj"
      ]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": [
        "bd", "za"
      ]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload PNG files from a folder, so that each image file becomes an attachment to a CouchDB document.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to the countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  # strip the extension to get the document name, e.g. de
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  # the @ tells curl to send the file's contents rather than its name
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"

done
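The two sed substitutions in the script above reduce a path such as countries/png/de.png to the document name de, which is both the document id and the parent of the flag attachment. The same derivation is sketched below in JavaScript, purely as an illustration; the function names are hypothetical and the credentials are placeholders, as in the script.

```javascript
const path = require("path");

// Derive the document name from a flag file path, as the sed commands do:
// strip the directory, then strip the .png extension.
function docNameFromPath(filepath) {
  return path.basename(filepath, ".png");
}

// Attachment URL: database "countries", document <docname>, attachment "flag".
function flagUploadUrl(host, filepath) {
  return host + "/countries/" + docNameFromPath(filepath) + "/flag";
}

console.log(flagUploadUrl("http://username:password@mick.couchone.com", "countries/png/de.png"));
// prints http://username:password@mick.couchone.com/countries/de/flag
```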

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.

[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file

[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars

[44] Daniel Stenberg and contributors. curl(1): the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2–11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel

[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


2.2 Relational Database Management Systems

The term ‘Relational Database Management System’ refers to the class of databases which follow Codd’s relational model. The most popular commercial (DB2, Oracle, Sybase/SQL Server) and open source (MySql, PostgreSQL) databases in use over the past three decades have been based on this model.

These products have been very successful and provide data persistence for the vast majority of web and desktop applications developed today. Indeed, the very term ‘database’ has become associated with relational databases to such an extent that non-relational databases are often referred to as ‘non-relational data stores’ for clarity. (Formally, and in its original meaning, the term ‘database’ simply means a collection of data.)

2.3 Normalisation

A key factor in relational data storage is data normalisation. Data normalisation is the process which aims to ensure that each element of data is stored in one and only one place. This ensures data integrity by means of removing duplication [48].

The first level of normalisation, ‘first normal form’, requires that each data item (table cell) contains an atomic value - typically a single number, string or date value. First normal form does not permit sets or other structures as valid data item values.

Consider, for example, a table (let’s call it ‘Staff’) containing staff records:

Name     Nationality
Bob      Australia
Jane     Sweden

All is well until the system encounters a staff member with multiple nationalities:

Name     Nationality
Bob      Australia
Jane     Sweden
Alice    United Kingdom, South Africa

This second table breaks the rules of normalisation and would not be valid in any of the relational database management systems listed above. The standard way to solve this problem would be to create two more tables, a ‘Nationality’ table and a ‘StaffNationality’ table, linked with primary and foreign keys.
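A minimal sketch of that decomposition in JavaScript, applied to the three example staff members. The column names StaffId and NationalityId and the use of sequential integer keys are assumptions for illustration; in a real RDBMS the tables would be declared with primary and foreign key constraints.

```javascript
// Decompose staff records with multi-valued nationalities into three
// first-normal-form tables linked by primary and foreign keys.
function normalise(records) {
  const staff = [];
  const nationality = [];
  const staffNationality = [];
  const nationalityIds = new Map(); // name -> NationalityId, each stored once
  records.forEach(function (record, i) {
    const staffId = i + 1;
    staff.push({ StaffId: staffId, Name: record.name });
    for (const name of record.nationalities) {
      if (!nationalityIds.has(name)) {
        nationalityIds.set(name, nationalityIds.size + 1);
        nationality.push({ NationalityId: nationalityIds.get(name), Name: name });
      }
      // one link row per (staff member, nationality) pair
      staffNationality.push({ StaffId: staffId, NationalityId: nationalityIds.get(name) });
    }
  });
  return { staff, nationality, staffNationality };
}

const tables = normalise([
  { name: "Bob", nationalities: ["Australia"] },
  { name: "Jane", nationalities: ["Sweden"] },
  { name: "Alice", nationalities: ["United Kingdom", "South Africa"] },
]);
console.log(tables.staffNationality.length); // 4
```

Alice now appears once in Staff and twice in StaffNationality, so every cell holds an atomic value.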

While this is a trivial example, it can be seen that changing real-world business rules can lead to changes in data schemata - which can naturally be expensive, as database developers need to be brought in to make the necessary amendments.

In a non-relational database (such as CouchDB), Alice’s record could be stored as follows:

{
  "name": "Alice",
  "nationality": [
    "United Kingdom",
    "South Africa"
  ]
}

This record (or ‘document’ in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other ‘NoSql’ products are sometimes referred to as ‘schema-less’ database management systems.

The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.

2.4 SQL

A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].

To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality

For this to work, the table needs to store atomic values. In our example above, this query would result in ‘United Kingdom, South Africa’ being considered a distinct nationality.

In Section 4 I will discuss map-reduce, which is the method used by CouchDB and some other ‘NoSql’ systems to query the database.
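The difference can be reproduced in a few lines of JavaScript. The sketch below, with hypothetical variable names, first counts the un-normalised Staff rows the way GROUP BY Nationality would, then counts JSON documents map-reduce style, where the map step emits one key per element of the nationality array and the reduce step sums the counts for each key. It only imitates the two query styles in memory; it is not SQL Server or CouchDB code.

```javascript
// Count occurrences of each key.
function countBy(keys) {
  const counts = {};
  for (const key of keys) {
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// GROUP BY Nationality: the multi-valued cell is one opaque string.
const staffTable = [
  { Name: "Bob", Nationality: "Australia" },
  { Name: "Jane", Nationality: "Sweden" },
  { Name: "Alice", Nationality: "United Kingdom, South Africa" },
];
const grouped = countBy(staffTable.map((row) => row.Nationality));

// Map step: one (key, 1) pair per array element; reduce step: sum per key.
const staffDocs = [
  { name: "Bob", nationality: ["Australia"] },
  { name: "Jane", nationality: ["Sweden"] },
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] },
];
const pairs = staffDocs.flatMap((doc) => doc.nationality.map((n) => [n, 1]));
const reduced = {};
for (const [key, value] of pairs) {
  reduced[key] = (reduced[key] || 0) + value;
}

console.log(grouped);
console.log(reduced);
```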

2.5 Transactional Guarantees

A further very significant aspect of traditional database management systems is that they provide a set of transactional guarantees, referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].

The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: at any given moment in time there is only ‘one version of the truth’ for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].

For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at exactly the same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
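A toy sketch of this all-or-nothing behaviour, not of how any RDBMS implements it: the transfer works on a copy of the balances and only hands back the new state if both the debit and the credit succeed, so a failure leaves the committed state untouched, as a roll-back would. The account names and the insufficient-funds rule are assumptions; real systems achieve atomicity with logging and locking or versioning, and also provide isolation and durability, which this sketch ignores.

```javascript
// Toy all-or-nothing transfer over an in-memory map of balances.
function transfer(balances, from, to, amount) {
  const working = { ...balances }; // changes are made to a copy first
  if (!(from in working) || !(to in working)) {
    throw new Error("unknown account"); // nothing has been committed
  }
  working[from] -= amount; // debit
  if (working[from] < 0) {
    throw new Error("insufficient funds"); // roll back: discard the copy
  }
  working[to] += amount; // credit
  return working; // commit: the caller sees both changes together
}

let accounts = { alice: 100, bob: 20 };
accounts = transfer(accounts, "alice", "bob", 30);
console.log(accounts); // { alice: 70, bob: 50 }
try {
  accounts = transfer(accounts, "bob", "alice", 500);
} catch (e) {
  console.log(e.message, accounts); // balances are unchanged
}
```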

The paradigm followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.

However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places; for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).

Caching data in many places, so that the data is near to the user when it is needed, is a very effective way to reduce latency. There may no longer be a ‘single version of the truth’, but for many applications the trade-off is worthwhile.

2.6 The ‘NoSql’ Movement

‘NoSql’ refers to a range of non-relational database products which have emerged in recent years. The term ‘NoSql’ is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.).¹

‘NoSql’ is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:

• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.

• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of ‘a NoSql Summer’ [1].

2.7 ‘NoSql’ databases at large scale

Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].

The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.

To give an example, if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. Of course, in a foreign exchange trading application those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario; foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.

In the Facebook example we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but they are guaranteed to eventually converge to a consistent state.
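The following JavaScript sketch makes the idea of convergence concrete. It uses a deliberately simplified, hypothetical model. Each replica holds a single value tagged with a version counter and the name of the replica that wrote it. When replicas reconcile, each keeps the entry with the higher version and breaks ties by replica name. This is not how Facebook, Yahoo or CouchDB actually resolve conflicts. It only illustrates why replicas that keep exchanging state and apply the same deterministic rule end up agreeing, whatever order they sync in.

```javascript
// Hypothetical model: each replica stores one value plus a (version, origin) tag.
function makeReplica(name) {
  return { name, entry: { value: null, version: 0, origin: name } };
}

// A local write bumps the version and records which replica made it.
function write(replica, value) {
  replica.entry = { value, version: replica.entry.version + 1, origin: replica.name };
}

// Deterministic rule: the higher version wins, and ties are broken by origin name,
// so every replica picks the same winner.
function newer(a, b) {
  if (a.version !== b.version) return a.version > b.version ? a : b;
  return a.origin > b.origin ? a : b;
}

// One-way sync: the target adopts the source's entry if it wins.
function sync(source, target) {
  target.entry = newer(source.entry, target.entry);
}

const london = makeReplica("london");
const singapore = makeReplica("singapore");

write(london, "Alice: at the theatre");
// Until a sync happens, Bob's replica in Singapore still holds the stale value (null).
sync(london, singapore);
sync(singapore, london);
console.log(london.entry.value === singapore.entry.value); // true
```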

Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.

In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:

“Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will


become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”

2.8 Brewer’s ‘CAP’ Theorem

The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential but strict consistency is not, so a looser model of consistency is considered a price worth paying.

This realisation (that ‘eventual’ consistency is often sufficient) has unleashed a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.

In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere from web servers to desktops, even hand-held devices and mobile phones.


2.9 Reducing the Impedance Mismatch

The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged; examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain.2 Another tactic used by application developers is to use SQL views, effectively de-normalising data so that the data model is closer to the application’s object model.

A key advantage of ‘NoSql’ systems is that they accept data of any structure, so the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
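The following minimal sketch shows what ‘accepting data of any structure’ means for the developer. An application object with nested parts is serialised to JSON and read back intact, with no mapping to separate tables. The `order` object and its fields are hypothetical. The HTTP call that would actually store the string in a document database is omitted.

```javascript
// A hypothetical application object with nested structure.
const order = {
  customer: { name: "Alice", city: "London" },
  items: [
    { sku: "book-1", qty: 2 },
    { sku: "pen-7", qty: 10 },
  ],
};

// In a relational schema this would typically span orders, customers and
// order_items tables. As a document it is a single JSON string.
const stored = JSON.stringify(order);

// Reading it back yields the same shape the application already works with.
const loaded = JSON.parse(stored);
const totalQty = loaded.items.reduce((n, item) => n + item.qty, 0);
console.log(totalQty); // 12
```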

2.10 Benefits of ‘NoSql’

As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law: disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.

• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale; managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost commodity-grade hardware.

• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.

2 An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.


2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’

It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support or, to state it more generally, the lack of ad-hoc querying support. As we will see, in the case of CouchDB queries are pre-computed using map-reduce functions, and there is little capability to query the system in an arbitrary way.

(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often, a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database used purely for reporting. The established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, a project I have been developing for the British Council [2].

The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of a set of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.


Figure 2.1: British Council project data displayed on Bing Maps

The data from these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. As a result of this work, the British Council now has an organisation-wide unified view of the work undertaken with each partner institution throughout the UK.

Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.


Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council’s web server. A logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that something like CouchDB would be a good fit for the back-end data store, in effect using CouchDB as a web caching layer.

The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
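The following minimal sketch illustrates the transformation step. It assumes a hypothetical document shape (`title`, `description`, `lat`, `lng`) rather than the real Activity Mapping export. It renders a list of documents as an RSS 2.0 feed using the GeoRSS-Simple `georss:point` element, which holds latitude then longitude separated by a space. Text values are XML-escaped because institution names may contain characters such as `&`. The coordinates in the usage lines are illustrative only.

```javascript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical document fields: title, description, lat, lng.
function toGeoRssItem(doc) {
  return [
    "<item>",
    `<title>${escapeXml(doc.title)}</title>`,
    `<description>${escapeXml(doc.description || "")}</description>`,
    `<georss:point>${doc.lat} ${doc.lng}</georss:point>`,
    "</item>",
  ].join("");
}

function toGeoRssFeed(channelTitle, docs) {
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    '<rss version="2.0" xmlns:georss="http://www.georss.org/georss">',
    `<channel><title>${escapeXml(channelTitle)}</title>`,
    ...docs.map((doc) => toGeoRssItem(doc)),
    "</channel></rss>",
  ].join("\n");
}

// Usage with one illustrative document.
const feed = toGeoRssFeed("Erasmus partners", [
  { title: "Aberystwyth University", lat: 52.41, lng: -4.06 },
]);
console.log(feed);
```

The feed is kept minimal. A production feed would also carry the channel `link` and `description` elements that RSS 2.0 requires.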

This is a report on my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’, that is, to what extent it is ready


for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.3

3 In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed as a JSON string [30].

3.1 ‘Of the Web’

Combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project, because all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-used quote about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it


SECTION 3 INTRODUCTION TO COUCHDB 17

opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS1-agnostic.” [31]

Normally, a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}

1 Operating System

Consider the use of NULL values in relational databases. In a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence from the document itself. This is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:

{
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
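In practice, this means client code tests whether an optional field is present rather than comparing it with NULL. The following minimal sketch shows this using the two documents above as plain JavaScript objects. The `formatAddress` helper is hypothetical.

```javascript
const person = {
  firstName: "John",
  lastName: "Smith",
  address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" },
};
const office = {
  company: "British Council Zambia",
  address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" },
};

// Hypothetical helper: builds a multi-line address from whichever fields exist.
function formatAddress(doc) {
  const name = doc.company || `${doc.firstName} ${doc.lastName}`;
  const a = doc.address;
  const lines = [name, a.streetAddress, a.city];
  // Optional fields are simply absent, so test for presence, not for null.
  if ("state" in a) lines.push(a.state);
  if ("postalCode" in a) lines.push(a.postalCode);
  if ("country" in a) lines.push(a.country);
  return lines.join("\n");
}

console.log("postalCode" in office.address); // false
console.log(formatAddress(office));
```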

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never


touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
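The reliability argument rests on never overwriting data in place. The following sketch illustrates that idea with an in-memory append-only log: an update appends a new record, and a read returns the latest record for a key. It is a simplification for illustration only. CouchDB’s actual storage is an append-only B-Tree file, not a flat array, and the `AppendOnlyStore` class is hypothetical.

```javascript
class AppendOnlyStore {
  constructor() {
    this.log = [];           // records are only ever pushed, never modified
    this.latest = new Map(); // key -> position of the newest record in the log
  }

  put(key, value) {
    // Freeze the record so its fields cannot be reassigned after it is written.
    const record = Object.freeze({ key, value, seq: this.log.length });
    this.log.push(record);
    this.latest.set(key, record.seq);
    return record.seq;
  }

  get(key) {
    const i = this.latest.get(key);
    return i === undefined ? undefined : this.log[i].value;
  }

  // Older versions remain readable because they were never overwritten.
  history(key) {
    return this.log.filter((r) => r.key === key).map((r) => r.value);
  }
}

const store = new AppendOnlyStore();
store.put("john-smith", { age: 25 });
store.put("john-smith", { age: 26 });
console.log(store.get("john-smith").age); // 26
console.log(store.history("john-smith").length); // 2
```

In CouchDB, space taken by superseded data of this kind is reclaimed later by database compaction.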

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available here [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8], which provides a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop


Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010) the service is in beta and is free of charge.

Please note that, after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


SECTION 4 USING COUCHDB 22

4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers for Windows and Mac are also available at the same URL. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command-line, we use ‘cURL’, a command-line tool for getting or sending files using URL syntax1 [51].

To create a database called ‘addressbook’2 on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same request with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)
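Because the interface is plain HTTP, the same two requests can be issued from any language. The following minimal sketch shows them in JavaScript. It assumes Node.js 18 or later, which provides a global `fetch`. `baseUrl` and the helper names are placeholders. Credentials are passed in an `Authorization` header rather than embedded in the URL, because `fetch` refuses to build a request from a URL that contains a username and password.

```javascript
// Build the request without sending it, so its shape can be inspected.
function dbRequest(method, baseUrl, dbName, user, password) {
  const auth = Buffer.from(`${user}:${password}`).toString("base64");
  return {
    url: `${baseUrl}/${encodeURIComponent(dbName)}`,
    init: { method, headers: { Authorization: `Basic ${auth}` } },
  };
}

// Equivalent of: curl -X PUT http://username:password@host/addressbook
async function createDatabase(baseUrl, dbName, user, password) {
  const { url, init } = dbRequest("PUT", baseUrl, dbName, user, password);
  const res = await fetch(url, init);
  return res.json(); // {"ok":true} on success
}

// Equivalent of: curl -X DELETE http://username:password@host/addressbook
async function deleteDatabase(baseUrl, dbName, user, password) {
  const { url, init } = dbRequest("DELETE", baseUrl, dbName, user, password);
  const res = await fetch(url, init);
  return res.json();
}
```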

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

1 If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

2 Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


curl -H "Content-Type: application/json" \
     -X POST http://username:password@mick.couchone.com/addressbook \
     -d @john-smith.json

cURL syntax is explained here [44]. Briefly:

• -H introduces an HTTP header

• -X POST instructs cURL to use an HTTP POST (instead of a GET) request

• -d introduces the data to be sent

• @ denotes that what follows is a file name
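For comparison, the following sketch issues the same upload from JavaScript. It assumes Node.js 18+ for the global `fetch` and uses the built-in `fs` module to read the file. As with `-d @john-smith.json`, the file contents are sent unchanged as the request body. The `postRequest`/`postDocument` names and the `authHeader` parameter are illustrative.

```javascript
const fs = require("fs");

// Build a POST that lets CouchDB assign the document _id.
function postRequest(dbUrl, jsonText, authHeader) {
  JSON.parse(jsonText); // fail early on malformed JSON rather than on the server
  return {
    url: dbUrl,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: authHeader },
      body: jsonText,
    },
  };
}

// Equivalent of: curl -H "Content-Type: application/json" -X POST <dbUrl> -d @file
async function postDocument(dbUrl, filePath, authHeader) {
  const { url, init } = postRequest(dbUrl, fs.readFileSync(filePath, "utf8"), authHeader);
  const res = await fetch(url, init);
  return res.json(); // CouchDB replies with {"ok":true,"id":...,"rev":...}
}
```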

All data in CouchDB is stored in JSON format together with a unique key and revision number, so in CouchDB the resulting document would have _id and _rev values as follows:

{
    "_id": "760cd53c55a93497067f90d6242fc25e",
    "_rev": "1-91bce055fc8db86480400321079f0834",
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}

This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy the output into the JSONLint validator (http://jsonlint.com).


Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may want CouchDB to use meaningful document id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

curl -H "Content-Type: application/json" \
     -X PUT \
     http://username:password@mick.couchone.com/addressbook/british-council-zambia \
     -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
    "_id": "british-council-zambia",
    "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon
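The following minimal client-side sketch illustrates that last point. It fetches the named document and turns it into a small HTML fragment. It assumes Node.js 18+ or a browser (both provide `fetch`) and a database that is readable by the client. The `addressToHtml` and `showAddress` helpers are hypothetical. HTML-sensitive characters are escaped, and the `\n` separators in `streetAddress` become `<br>` tags.

```javascript
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn a CouchDB address document into an HTML fragment.
function addressToHtml(doc) {
  const a = doc.address || {};
  const street = escapeHtml(a.streetAddress || "").split("\n").join("<br>");
  return `<h2>${escapeHtml(doc.company || doc._id)}</h2>` +
    `<p>${street}<br>${escapeHtml(a.city || "")}</p>`;
}

// Fetch the document by URL and render it.
async function showAddress(docUrl) {
  const res = await fetch(docUrl);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return addressToHtml(await res.json());
}
```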


4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error (HTTP status 409) if an update is attempted on an ‘old’ version of a document.

As we are updating an existing document at a known URL, we use the PUT verb instead of POST.

The following command will update John Smith’s record with the contents of the file john-smith-v2.json:

curl -X PUT \
     "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
     -d @john-smith-v2.json
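The revision check is a form of optimistic concurrency control: a write succeeds only if the client proves it has seen the current revision. The following minimal in-memory sketch shows that rule. It mimics CouchDB’s 409 ‘conflict’ response but is not CouchDB’s implementation. The `RevStore` class is hypothetical, and its revision strings are simple counters rather than the hash-based values CouchDB generates.

```javascript
class RevStore {
  constructor() {
    this.docs = new Map(); // id -> stored document including _rev
  }

  static conflict() {
    return { status: 409, error: "conflict", reason: "Document update conflict." };
  }

  // Create when the id is new; otherwise the caller must supply the current rev.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) return RevStore.conflict();
    const n = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    const newRev = `${n}-r${n}`; // hypothetical format; CouchDB appends a hash
    this.docs.set(id, { ...body, _id: id, _rev: newRev });
    return { status: 201, ok: true, id, rev: newRev };
  }

  // Deletion follows the same rule: a stale rev is refused.
  remove(id, rev) {
    const current = this.docs.get(id);
    if (!current) return { status: 404, error: "not_found" };
    if (current._rev !== rev) return RevStore.conflict();
    this.docs.delete(id);
    return { status: 200, ok: true };
  }
}

const db = new RevStore();
const first = db.put("john-smith", { age: 25 });               // rev "1-r1"
const stale = first.rev;
const second = db.put("john-smith", { age: 26 }, first.rev);   // rev "2-r2"
console.log(db.put("john-smith", { age: 27 }, stale).status);  // 409
```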

4.7 Adding Attachments

CouchDB provides a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons3 from http://www.famfamfam.com/lab/icons/flags/.

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
     "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
     --data-binary @zm.png

3 Many thanks to Mark James for creating these files and making them available for free use.


The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
     -X POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"http://username:password@mick.couchone.com/addressbook",
          "target":"http://127.0.0.1:5984/addressbook"}'

Note that when using cURL on Microsoft Windows, we must use double quotes only, backslash-escaping them where necessary. On the Windows command line, ^ continues a command onto the next line:

curl -H "Content-Type: application/json" ^
     -X POST http://127.0.0.1:5984/_replicate ^
     -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
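Much of the quoting difficulty disappears when the request body is built by a program rather than typed into a shell. The following minimal sketch assumes Node.js 18+ for `fetch`. The source and target URLs are the placeholders used above, and `replicationRequest` is a hypothetical helper. `JSON.stringify` produces correctly quoted JSON on any operating system.

```javascript
// Build the request for POST /_replicate.
function replicationRequest(couchUrl, source, target) {
  return {
    url: `${couchUrl}/_replicate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ source, target }),
    },
  };
}

async function replicate(couchUrl, source, target) {
  const { url, init } = replicationRequest(couchUrl, source, target);
  const res = await fetch(url, init);
  return res.json(); // on success the reply includes "ok": true
}

const req = replicationRequest(
  "http://127.0.0.1:5984",
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(req.init.body);
// {"source":"http://username:password@mick.couchone.com/addressbook","target":"http://127.0.0.1:5984/addressbook"}
```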

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to querying the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of


commodity machines. In effect, it works in two stages. The ‘map’ stage ranges over the document collection and emits key-value pairs for each document it encounters. The ‘reduce’ stage is used when an aggregate calculation over those intermediate values is required.
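The two stages can be emulated in a few lines. The following sketch is a simplified, in-memory model written for illustration, and the `runView` function is hypothetical. The map function calls `emit(key, value)` for each document. The emitted rows are sorted by key, as in a CouchDB view. An optional reduce function then folds the values into a single result. For simplicity, `emit` is passed in as a parameter, whereas in CouchDB it is a global provided by the view server. CouchDB also sorts keys by its own collation rules and can reduce in stages (‘rereduce’), which this sketch does not attempt. The second document is hypothetical; only Aberavon’s figure comes from the data shown below.

```javascript
function runView(docs, map, reduce) {
  const rows = [];
  for (const doc of docs) {
    // Give the map function an emit() that records rows for this document.
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    map(doc, emit);
  }
  // Simple ordering for numbers or strings only, unlike CouchDB's full collation.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  if (!reduce) return rows;
  const keys = rows.map((r) => [r.key, r.id]);
  const values = rows.map((r) => r.value);
  return [{ key: null, value: reduce(keys, values) }];
}

const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example Seat", con: 1500 }, // hypothetical second constituency
];

const total = runView(
  docs,
  (doc, emit) => emit(null, doc.con),
  (keys, values) => values.reduce((a, b) => a + b, 0)
);
console.log(total); // [ { key: null, value: 4562 } ]
```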

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb/), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
    "_id": "Aberavon",
    "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
    "mp": "Hywel Francis",
    "electorate": 51079,
    "con": 3062,
    "lab": 18077,
    "lib": 4138,
    "pc": 3546,
    "oth": 1278
}

CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code as opposed to data.

The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.

The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function, emitting each constituency’s Conservative vote as the key.

The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to SELECT SUM(con) in SQL).

[http://localhost:5984/election-2005/_design/con]

{
    "_id": "_design/con",
    "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
    "language": "javascript",
    "views": {
        "Conservative Votes": {
            "map": "function(doc) {\n emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
            "map": "function(doc) {\n emit(null, doc.con);\n}",
            "reduce": "function(keys, values) {\n return sum(values);\n}"
        }
    }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]


{"rows":[{"key":null,"value":8782198}]}

It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
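View names may contain spaces, and key parameters must be JSON values, so building these URLs by hand is error-prone. The following minimal sketch shows a helper that percent-encodes each path segment and JSON-encodes every query parameter. `viewUrl` is hypothetical; the database, design document and view names are the ones used above. A string key such as "Aberavon" would be sent as %22Aberavon%22, because its JSON form includes the quotes.

```javascript
// Build a CouchDB view URL; query values are JSON-encoded, as view keys require.
function viewUrl(base, db, ddoc, view, params = {}) {
  const path = [db, "_design", ddoc, "_view", view].map(encodeURIComponent).join("/");
  const query = Object.entries(params)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(JSON.stringify(v))}`)
    .join("&");
  return `${base}/${path}${query ? "?" + query : ""}`;
}

console.log(viewUrl("http://localhost:5984", "election-2005", "con", "Conservative Votes",
  { startkey: 1000, endkey: 2000 }));
// http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000
```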

This method of retrieving data is certainly cumbersome compared to using SQL. One criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever a new document is added to the collection, the index is updated incrementally. CouchDB processes only the new and changed documents, the next time the view is queried, rather than rebuilding the whole index.

As a result, the entire system is optimised for fast indexed retrieval. This makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time, since every read is an indexed lookup.
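The performance argument can be illustrated with a sorted index that is updated incrementally and queried by range. In the following simplified sketch, a sorted array stands in for CouchDB’s B-Tree; the `SortedIndex` class is hypothetical. Adding a row costs a binary search plus an array insert. A `startkey`/`endkey` query uses binary search to find the first matching row, then reads rows in order until it passes the end key, without scanning the whole collection. A real B-Tree also keeps the insertion cost logarithmic, which `splice` on an array does not. Apart from Aberavon’s figure, the constituencies below are hypothetical.

```javascript
class SortedIndex {
  constructor() {
    this.rows = []; // { key, id }, kept sorted by key
  }

  // First position whose key is >= the given key (binary search).
  lowerBound(key) {
    let lo = 0;
    let hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.rows[mid].key < key) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Incremental update: place one new row without rebuilding the index.
  insert(key, id) {
    this.rows.splice(this.lowerBound(key), 0, { key, id });
  }

  // Inclusive range query, like ?startkey=...&endkey=...
  range(startkey, endkey) {
    const out = [];
    for (let i = this.lowerBound(startkey); i < this.rows.length; i++) {
      if (this.rows[i].key > endkey) break;
      out.push(this.rows[i]);
    }
    return out;
  }
}

const idx = new SortedIndex();
idx.insert(3062, "Aberavon");
idx.insert(1500, "Seat A");
idx.insert(1999, "Seat B");
idx.insert(2500, "Seat C");
console.log(idx.range(1000, 2000).map((r) => r.id)); // [ 'Seat A', 'Seat B' ]
```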

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications for CouchDB, called CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents. It stores HTML and JavaScript application code in so-called ‘Design Documents’.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script, as follows.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)

#!/bin/bash

# host=http://127.0.0.1:5984/
host=http://username:password@mick.couchone.com/
database=universities
folder=universities
fileextension=.json
FILES=$folder/*

# create the database
curl -X PUT "$host$database"

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path (it keeps its leading /)
  filename=$(echo "$filepath" | sed -e "s/^$folder//")
  echo "$filename"
  docname=$(echo "$filename" | sed -e "s/$fileextension$//")
  echo "$docname"
  url=$host$database$docname
  echo "$url"
  # put the document into CouchDB; @ makes cURL send the file's contents
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
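The difference between the two verbs can be sketched as a small JavaScript helper that only describes the request and sends nothing. The function name storeRequest and the { method, url } shape are hypothetical. The point is that PUT names the document in the URL, while POST targets the database itself and leaves the choice of id to CouchDB.

```javascript
// Hypothetical helper: describe the HTTP request that would store a document.
// It builds { method, url } but does not contact any server.
function storeRequest(host, database, docId) {
  const dbUrl = host + "/" + encodeURIComponent(database);
  if (docId === undefined) {
    // No id supplied: POST to the database and CouchDB assigns an id.
    return { method: "POST", url: dbUrl };
  }
  // Id supplied: PUT the document at a meaningful URL.
  return { method: "PUT", url: dbUrl + "/" + encodeURIComponent(docId) };
}

console.log(storeRequest("http://127.0.0.1:5984", "universities", "Aberystwyth University"));
console.log(storeRequest("http://127.0.0.1:5984", "universities"));
```

Percent-encoding the id is what turns "Aberystwyth University" into the Aberystwyth%20University seen in the URLs of this section.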

To put a single document to the web, the command used would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].

For the first design document I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string which is rendered as output.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved word in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered, and req refers to the request object, which can be used, for example, to retrieve data passed by a query string in the URL.
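A show function is ordinary JavaScript, so it can be exercised outside CouchDB by passing it stand-ins for doc and req. In the sketch below, the optional heading parameter read from req.query is a hypothetical addition to s1. It illustrates how a query-string value such as ?heading=h2 could reach the function. The coordinates in the sample document are illustrative.

```javascript
// A variant of s1 that reads an optional, hypothetical "heading" parameter
// from the query string via req.query; without it, the output matches s1.
function s1(doc, req) {
  const tag = (req.query && req.query.heading) || "h1";
  return "<" + tag + ">" + doc._id + "</" + tag + ">" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// Stand-ins for the arguments CouchDB would pass; the values are illustrative.
const doc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
console.log(s1(doc, { query: {} }));
console.log(s1(doc, { query: { heading: "h2" } }));
```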

Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB

• universities is the database

• _design/d1 is the design document

• _show/s1 is the 'show' function that returns an HTML string

• Aston%20University is the id of the document to be rendered
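The same anatomy can be expressed as a small JavaScript function that assembles a show URL from its parts. The function name showUrl is hypothetical. Each path segment is percent-encoded, which is where the %20 in Aston%20University comes from.

```javascript
// Assemble a CouchDB _show URL:
// host / database / _design / <ddoc> / _show / <function> / <document id>
function showUrl(host, database, ddoc, showFn, docId) {
  return [
    host,
    encodeURIComponent(database),
    "_design",
    encodeURIComponent(ddoc),
    "_show",
    encodeURIComponent(showFn),
    encodeURIComponent(docId), // "Aston University" -> "Aston%20University"
  ].join("/");
}

console.log(showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University"));
// http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```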


5.3 A more complex 'show' function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 'show' function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped, because the function's source code is itself stored as a string value inside the JSON design document.
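The escaping problem can be seen directly by serialising a design document in JavaScript. In the sketch below, the show function name map_title is hypothetical. JSON.stringify has to turn every double quote inside the function source into \", because that source travels as a JSON string. Writing the design document by hand means doing the same escaping manually.

```javascript
// Source of a show function whose HTML output contains a double-quoted attribute.
const showSource =
  `function(doc, req) { return '<div id="map_canvas">' + doc._id + '</div>'; }`;

// The function source becomes a plain string value inside the design document.
const ddoc = { _id: "_design/d1", shows: { map_title: showSource } };
const json = JSON.stringify(ddoc);
console.log(json);
// {"_id":"_design/d1","shows":{"map_title":"function(doc, req) { return '<div id=\"map_canvas\">' + doc._id + '</div>'; }"}}
```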

While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.

Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions. Here we define a very simple 'view' function, simply outputting the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1
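What the view returns can be approximated in JavaScript. The map function v1 is run over every document, the emitted rows are sorted by key, and the result is wrapped in the total_rows / offset / rows structure of a CouchDB view response. This is a simplified sketch with illustrative documents. Here emit is passed in explicitly, whereas CouchDB provides it as a global. Plain JavaScript string comparison stands in for CouchDB's own key collation rules, which differ in some cases.

```javascript
// v1 from the design document: emit each document's _id as key and value.
function v1(doc, emit) {
  emit(doc._id, doc._id);
}

// Run a map function over all documents and shape the output like a view response.
function queryView(documents, mapFn) {
  const rows = [];
  for (const doc of documents) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // Simplification: plain string ordering instead of CouchDB collation.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, offset: 0, rows };
}

const universities = [{ _id: "Aston University" }, { _id: "Aberystwyth University" }];
console.log(JSON.stringify(queryView(universities, v1), null, 2));
```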

Figure 5.6: Output from a simple 'view' returning document id values

A 'view' is in effect a query function which returns a set of results. In order to render the results as HTML, we may define 'list' functions in a CouchDB design document:

• v1 is a very simple View, simply emitting the document ids

• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a 'list' function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

Note the built-in functions start and send:

• start is called once, at the beginning of the function, to set the response headers

• send is called here once for each row returned by getRow(), each time sending a fragment of HTML to the client
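Because start, getRow and send are supplied by CouchDB's query server, a list function can be tried outside CouchDB by providing stand-ins for them. The JavaScript sketch below is a hypothetical test harness, not CouchDB's implementation. It feeds two illustrative view rows to a copy of l1 and collects whatever is sent. The only change to l1 is that the id is percent-encoded in the link.

```javascript
// Stand-ins for the functions CouchDB's query server provides to list functions.
let pendingRows = [];
let output = [];
let responseHead = null;
function start(head) { responseHead = head; }
function getRow() { return pendingRows.shift() || null; }
function send(chunk) { output.push(chunk); }

// A copy of l1 (the document id is percent-encoded in the link).
function l1(head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      encodeURIComponent(row.value) + '">' + row.value + "</a></p>");
  }
}

// Feed the given view rows to a list function and return the HTML it sent.
function runList(listFn, rows) {
  pendingRows = rows.slice();
  output = [];
  listFn(null, {});
  return output.join("");
}

const html = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
  { id: "University of Aberdeen", key: "University of Aberdeen", value: "University of Aberdeen" },
]);
console.log(responseHead.headers["Content-Type"]);
console.log(html);
```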

At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple 'view' rendered using a 'list' function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer's perspective is that it creates a set of separate files, one for each element of the design document. From the developer's perspective, a CouchDB design document is in fact a web application project; CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa

I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3

Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.

• The shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.

• Similarly, the lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.

Figure 6.3: Files generated by CouchApp

The screen-shot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a 'Show' function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1

couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it into HTML.

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a 'List' function, l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.

We recall that in CouchDB a View is in effect a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, then by default the View returns a single null reduced value instead of the mapped rows.)
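The effect can be seen in a JavaScript sketch of the reduce step. When a view has a reduce function, CouchDB by default returns the reduced value rather than the mapped rows. An empty function body returns undefined, which leaves nothing useful. The stub body and the counting function below are illustrative assumptions, not the code CouchApp generates. reduceAll is a simplified, hypothetical stand-in for an ungrouped query.

```javascript
// Rows as produced by the map function of v1 (illustrative).
const mapped = [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
  { id: "University of Aberdeen", key: "University of Aberdeen", value: "University of Aberdeen" },
];

// An empty reduce function: it returns undefined.
function emptyReduce(keys, values, rereduce) {}

// A reduce function that counts rows, handling CouchDB's rereduce step.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce((sum, n) => sum + n, 0); // values are partial counts
  }
  return values.length; // values are the mapped values
}

// Simplified ungrouped reduce over all rows; keys are [key, docid] pairs.
function reduceAll(rows, reduceFn) {
  const keys = rows.map((row) => [row.key, row.id]);
  const value = reduceFn(keys, rows.map((row) => row.value), false);
  return { rows: [{ key: null, value: value === undefined ? null : value }] };
}

console.log(JSON.stringify(reduceAll(mapped, emptyReduce))); // {"rows":[{"key":null,"value":null}]}
console.log(JSON.stringify(reduceAll(mapped, countReduce))); // {"rows":[{"key":null,"value":2}]}
```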

Figure 6.5: Remove the reduce.js file if it is not needed

6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.
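The merging step can be pictured with a short JavaScript sketch. Each file path in the application folder becomes a nested property of the design document, and the file's contents become that property's string value. This is a simplified model under stated assumptions, not CouchApp's actual Python code. It ignores _attachments, vendor libraries and .couchapprc, and the file contents shown are illustrative.

```javascript
// Relative file paths and contents, as they might sit in the bc folder.
const files = {
  "_id": "_design/bc",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "templates/georss.html": "<feed>...</feed>",
};

// Build a design document: "views/v1/map.js" becomes ddoc.views.v1.map.
function buildDesignDoc(fileMap) {
  const ddoc = {};
  for (const [path, contents] of Object.entries(fileMap)) {
    const parts = path.split("/");
    // Strip the extension from the last segment (map.js -> map).
    const last = parts.pop().replace(/\.[^.]+$/, "");
    let node = ddoc;
    for (const part of parts) {
      if (!node[part]) node[part] = {};
      node = node[part];
    }
    node[last] = contents.trim();
  }
  return ddoc;
}

console.log(JSON.stringify(buildDesignDoc(files), null, 2));
```

Under this model, templates/georss.html ends up as ddoc.templates.georss, which is the name the list function in Appendix C uses to reach the template.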

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

With this in place, running couchapp push from within the bc folder uploads the design document. So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used in Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss

This creates a file called georss.js. I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
      {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
          {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
          {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
      {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
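The nested sections of the template behave like nested loops. The JavaScript sketch below renders the content part of one entry without Mustache, to show the logic only. It is not how Mustache works internally, and it omits the HTML-escaping that Mustache applies to {{...}} values. The sample institution is an abbreviated, illustrative version of the St Andrews record in Appendix D.

```javascript
// Plain-JavaScript equivalent of the {{#programmes}} / {{#countries}} sections.
function renderContent(institution) {
  let html = "";
  if (institution.has_programmes) {
    for (const programme of institution.programmes) {
      html += "&lt;p&gt;" + programme.name;
      if (programme.has_countries) {
        for (const c of programme.countries) {
          html += '&lt;img src="http://mick.couchone.com/countries/' + c.country +
            '/flag" alt="' + c.country + '" title="' + c.country + '" /&gt;';
        }
      }
      html += "&lt;/p&gt;";
    }
  }
  return html;
}

const stAndrews = {
  has_programmes: true,
  programmes: [
    { name: "Chevening Programme", has_countries: true,
      countries: [{ country: "id" }, { country: "mz" }] },
  ],
};
console.log(renderContent(stAndrews));
```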

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all

Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSQL' databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB's challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL

• understanding the structure of design documents

• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.

CouchDB is still a new product, and as such there is a lack of 'entry-level' documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-compiled map-reduce function. Many developers are used to 'thinking' in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
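As an illustration of the difference, consider a SQL query such as SELECT region, COUNT(*) FROM institutions GROUP BY region. In CouchDB this would be expressed as a map function that emits the region with a value of 1, plus a reduce function that adds the values, queried with group=true. The JavaScript sketch below simulates that grouping over a few documents. The institutions table, the sample documents and the groupedQuery helper are hypothetical, and the simulation is not CouchDB's implementation.

```javascript
// Map: one row per institution, keyed by region.
function mapByRegion(doc, emit) {
  if (doc.region) emit(doc.region, 1);
}

// Reduce: add up the values. The same code works for rereduce,
// since partial sums can themselves be added together.
function sumReduce(keys, values, rereduce) {
  return values.reduce((a, b) => a + b, 0);
}

// Simulate ?group=true: reduce the mapped rows separately for each distinct key.
function groupedQuery(documents, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of documents) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push({ key, id: doc._id, value });
    });
  }
  return [...groups.keys()].sort().map((key) => {
    const rows = groups.get(key);
    const keys = rows.map((r) => [r.key, r.id]);
    return { key, value: reduceFn(keys, rows.map((r) => r.value), false) };
  });
}

const institutions = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Aston University", region: "West Midlands" },
];
console.log(JSON.stringify(groupedQuery(institutions, mapByRegion, sumReduce)));
// [{"key":"Scotland","value":2},{"key":"West Midlands","value":1}]
```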

7.3 Conclusion

The 'NoSQL' movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project's focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSQL' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45]: in the coming years, relational databases will continue alongside 'NoSQL' offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1 (source).

It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson's Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code, http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());

    // close the loop after all rows are rendered
    return "</feed>";
  });
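The change to the point line matters because a GeoRSS point is written as the latitude, a space, and then the longitude. The original line puts loc[1] before loc[0], which suggests that Sofa's loc array holds the longitude first. That reading is an inference from the code above, not something confirmed by Sofa's documentation. A small JavaScript sketch of both forms, using a hypothetical helper name:

```javascript
// Format a GeoRSS point: "latitude longitude", separated by a space.
function georssPoint(latitude, longitude) {
  return latitude + " " + longitude;
}

// Original Sofa form: a loc array apparently ordered [longitude, latitude].
const loc = [-2.79301175308608, 56.3412139443169];
console.log(georssPoint(loc[1], loc[0])); // 56.3412139443169 -2.79301175308608

// Amended form: separate latitude and longitude fields on the document.
const value = { latitude: 56.3412139443169, longitude: -2.79301175308608 };
console.log(georssPoint(value.latitude, value.longitude)); // 56.3412139443169 -2.79301175308608
```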

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.

What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from CouchDB documents into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}
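To see what the list function in Appendix C hands to this template, the relevant part of its stash construction can be run on the St Andrews document in plain JavaScript. The sketch below copies the programmes/countries mapping from georss.js. Because the CouchApp library is not available outside CouchDB, List.withRows is replaced by an ordinary array map over a hypothetical view row, and only the fields the template uses are kept.

```javascript
// The example document, abbreviated to the fields the template uses.
const stAndrews = {
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: "Chevening Programme", url: "http://www.chevening.com", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", url: "http://www.csfp-online.org", countries: ["bd", "za"] },
  ],
};

// Same shape as the object georss.js returns for each row (subset of fields).
function toStashEntry(institution) {
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: institution.programmes ? true : false,
    programmes: institution.programmes
      ? institution.programmes.map((programme) => ({
          name: programme.name,
          url: programme.url,
          has_countries: programme.countries ? true : false,
          // Wrap each country code so the template can refer to {{country}}.
          countries: programme.countries
            ? programme.countries.map((country) => ({ country: country }))
            : [],
        }))
      : [],
  };
}

// Stand-in for List.withRows: view rows whose value is the whole document.
const rows = [{ key: stAndrews._id, value: stAndrews }];
const stash = { institutions: rows.map((row) => toStashEntry(row.value)) };
console.log(JSON.stringify(stash.institutions[0].programmes[1]));
```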

Listed below is the code for georss.html. Even though it is an .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
        {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
            {{#countries}}
              &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
            {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
        {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload png files from a folder, so that each image file becomes an attachment to a CouchDB document.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash

# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to the countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's|countries/png/||g')
  echo "$filename"
  # strip the .png extension to get the document name (e.g. "ie")
  docname=$(echo "$filename" | sed -e 's/\.png$//')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.

[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.

[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file

[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. CouchDB implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars

[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Open Database Connectivity. http://en.wikipedia.org/wiki/Open_Database_Connectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


SECTION 2 BACKGROUND 7

be expensive, as database developers need to be brought in to make the necessary amendments.

In a non-relational database (such as CouchDB), Alice's record could be stored as follows:

{
  "name": "Alice",
  "nationality": [
    "United Kingdom",
    "South Africa"
  ]
}

This record (or 'document' in CouchDB terminology) can be stored in the database alongside the other records without a change in schema. For this reason, CouchDB and other 'NoSql' products are sometimes referred to as 'schema-less' database management systems.

The notation used is JavaScript Object Notation (JSON). This will be discussed in more detail in Section 3.

2.4 SQL

A major benefit of following normalisation rules is to permit easier retrieval of data using the set-based, declarative query language SQL. Originally developed by IBM in the early 1970s, SQL (Structured Query Language) became an international standard during the 1980s [22] [25].

To continue with our trivial example, the following SQL statement could be used to determine how many staff members of each nationality are employed:

SELECT Nationality, COUNT(*) FROM Staff GROUP BY Nationality;

For this to work, the table needs to store atomic values. If Alice's two nationalities were held in a single field, as in our example above, this query would treat 'United Kingdom, South Africa' as a distinct nationality, rather than counting her under each of her two nationalities.

In Section 4, I will discuss map-reduce, which is the method used by CouchDB and some other 'NoSql' systems to query the database.
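As a preview of that approach, the sketch below reproduces the GROUP BY query in plain JavaScript. It is a minimal, in-memory illustration under stated assumptions, not CouchDB's view engine: the map function only mimics the style of a CouchDB map function calling emit(key, value), and the reduce step simply sums the emitted values. The second staff member, Bob, is hypothetical. Because the map step emits one row per element of the nationality array, Alice is counted under each of her nationalities instead of producing a combined value.

```javascript
// Staff documents. Alice is from the example above; Bob is hypothetical.
const staff = [
  { name: "Alice", nationality: ["United Kingdom", "South Africa"] },
  { name: "Bob", nationality: ["United Kingdom"] }
];

// Map step: emit one (key, value) row per nationality, in the style of a
// CouchDB map function.
function map(doc, emit) {
  for (const n of doc.nationality) {
    emit(n, 1);
  }
}

// Reduce step: sum the values emitted under each key. This plays the role
// of COUNT(*) ... GROUP BY Nationality.
function countByKey(docs) {
  const counts = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => counts.set(key, (counts.get(key) || 0) + value));
  }
  return counts;
}

console.log(Object.fromEntries(countByKey(staff)));
// { 'United Kingdom': 2, 'South Africa': 1 }
```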

2.5 Transactional Guarantees

A further, very significant aspect of traditional database management systems is that they provide a set of transactional guarantees referred to by the acronym ACID (Atomicity, Consistency, Isolation, Durability). These properties guarantee that database transactions are processed reliably. Most systems rely on data access (read and write) locking to provide ACID capabilities [50].

The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: that at any given moment in time there is only 'one version of the truth' for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].

For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at exactly the same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
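The all-or-nothing behaviour described above can be illustrated with a toy JavaScript sketch. It is not how a relational DBMS implements atomicity (real systems rely on logging and locking, and must survive crashes); it only mimics the observable contract within a single process: the debit and the credit are prepared on a private copy and become visible together, or not at all. The account names and balances are hypothetical.

```javascript
// Transfer `amount` between two accounts so that either both balances
// change or neither does.
function transfer(accounts, from, to, amount) {
  if (!(from in accounts) || !(to in accounts)) {
    throw new Error("unknown account");
  }
  // Prepare the changes on a copy; the live balances are not touched yet.
  const pending = { ...accounts };
  pending[from] -= amount; // debit
  pending[to] += amount;   // credit
  if (pending[from] < 0) {
    // Abort: discarding `pending` is the roll-back; nothing was committed.
    throw new Error("insufficient funds");
  }
  // Commit: both changes are applied in one step.
  Object.assign(accounts, pending);
  return accounts;
}

const accounts = { alice: 100, bob: 20 };
transfer(accounts, "alice", "bob", 30);
console.log(accounts); // { alice: 70, bob: 50 }

try {
  transfer(accounts, "bob", "alice", 500);
} catch (e) {
  console.log(e.message, accounts); // insufficient funds { alice: 70, bob: 50 }
}
```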

The paradigm followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.

However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we have become more comfortable with multiple versions of the same data existing in different places; for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages, but they are extremely intolerant of latency (a delay in the time it takes to load the page).

Caching data in many places, so that it is near to the user when it is needed, is one of the most effective ways to reduce latency. There may no longer be a 'single version of the truth', but for many applications the trade-off is worthwhile.

2.6 The 'NoSql' Movement

'NoSql' refers to a range of non-relational database products which have emerged in recent years. The term 'NoSql' is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.)¹.

'NoSql' is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:

• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.

• 'NoSql' databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of 'a NoSql Summer' [1].

2.7 'NoSql' databases at large scale

Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].

The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.

To give an example, if Alice in London updates her Facebook status, it doesn't really matter if Bob in Singapore sees a 'stale' version for a few seconds. Of course, in a foreign exchange trading application those few seconds would make a huge difference. 'NoSql' systems are generally intended for the Facebook scenario; foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.

In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time but, once updates stop arriving, they are guaranteed to converge eventually to a consistent state.

Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacentre were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacentres around the world.
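The idea of replicas that may disagree for a while but eventually converge can be sketched in a few lines. The example below uses a last-writer-wins rule (the higher timestamp wins, with ties broken by replica name) purely for illustration. This is a swapped-in technique, not CouchDB's own conflict handling, which detects conflicting revisions and keeps them available to the application. The replica names, key, status text and timestamps are hypothetical.

```javascript
// One replica of a key-value store. Each entry remembers a logical
// timestamp and the replica that wrote it.
class Replica {
  constructor(name) {
    this.name = name;
    this.data = new Map();
  }
  write(key, value, ts) {
    this.data.set(key, { value, ts, origin: this.name });
  }
  read(key) {
    const entry = this.data.get(key);
    return entry ? entry.value : undefined;
  }
  // Pull in another replica's state. Every replica applies the same
  // deterministic rule, so once they have exchanged updates they agree.
  mergeFrom(other) {
    for (const [key, theirs] of other.data) {
      const mine = this.data.get(key);
      if (!mine || wins(theirs, mine)) {
        this.data.set(key, theirs);
      }
    }
  }
}

// Last-writer-wins: higher timestamp wins; equal timestamps are broken by
// replica name, so the outcome does not depend on merge order.
function wins(a, b) {
  return a.ts > b.ts || (a.ts === b.ts && a.origin > b.origin);
}

const london = new Replica("london");
const singapore = new Replica("singapore");

london.write("alice/status", "Back from holiday", 1);
console.log(singapore.read("alice/status")); // undefined (Singapore is stale)

// Exchange updates in both directions.
singapore.mergeFrom(london);
london.mergeFrom(singapore);
console.log(singapore.read("alice/status")); // Back from holiday
```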

In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:

"Occasionally, an entire datacenter will go down (e.g. if the power is cut) or become unreachable (e.g. if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability, and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability."

2.8 Brewer's 'CAP' Theorem

The idea of trading off consistency for availability is a key feature of 'NoSql' databases. Reference is often made to Eric Brewer's conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential but strict consistency is not, so a looser model of consistency is considered a price worth paying.

This realisation, that 'eventual' consistency is often sufficient, has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.

In addition, these 'NoSql' data stores do not have to be as 'intelligent' as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a 'NoSql' system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, 'NoSql' systems are simply 'dumb' repositories of data. This has meant, of course, that the code-base required for 'NoSql' systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere from web servers to desktops, and even on hand-held devices and mobile phones.
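As an example of the 'rudimentary validation' mentioned above, CouchDB lets a design document carry a validate_doc_update function, which the server runs on every write and which rejects a document by throwing an object with a forbidden field. The sketch below assumes a staff database holding documents like Alice's; the particular field rules, and the document for 'Carol', are hypothetical. It avoids newer JavaScript syntax (arrow functions, const) on the assumption that the server's embedded JavaScript engine may be an older one.

```javascript
// validate_doc_update for a hypothetical staff database. CouchDB calls it
// with the new document, the stored version (or null) and the user context.
function validate_doc_update(newDoc, oldDoc, userCtx) {
  // Deletion stubs carry little more than _id, _rev and _deleted.
  if (newDoc._deleted) {
    return;
  }
  if (typeof newDoc.name !== "string" || newDoc.name === "") {
    throw({ forbidden: "Staff documents need a non-empty name." });
  }
  // nationality is optional, but if present it must be a list of strings.
  if (newDoc.hasOwnProperty("nationality")) {
    var list = newDoc.nationality;
    var isArray = Object.prototype.toString.call(list) === "[object Array]";
    var allStrings = isArray && list.every(function (n) {
      return typeof n === "string";
    });
    if (!allStrings) {
      throw({ forbidden: "nationality must be an array of strings." });
    }
  }
}
```

Stored as source text under the validate_doc_update key of a design document, a function like this would make CouchDB refuse a write such as {"name": "Carol", "nationality": "United Kingdom"} with a 403 Forbidden response, while Alice's document would be accepted. Everything else about the document's shape remains unchecked, which is the point being made above.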


2.9 Reducing the Impedance Mismatch

The term 'object-relational impedance mismatch' refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a number of Object-Relational Mapping (ORM) tools have emerged, for example Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to 'bridge the gap' between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain².

Another tactic used by application developers is to use SQL Views, effectively de-normalising data so that the data model is closer to the application's object model.

A key advantage of 'NoSql' systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
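The difference can be seen in a small JavaScript sketch. An application object with a multi-valued attribute is stored in a document store by serialising it as it stands, whereas a normalised relational design has to split it across two tables and reassemble it, in effect with a join, on the way back. The table names Staff and StaffNationality are hypothetical, and plain arrays stand in for the tables.

```javascript
// An application object with a nested, multi-valued attribute.
const alice = { id: 1, name: "Alice", nationality: ["United Kingdom", "South Africa"] };

// Document store: the object is serialised as it stands and comes back whole.
const storedDoc = JSON.stringify(alice);
const loaded = JSON.parse(storedDoc);

// Relational store: the same object is split into rows for two normalised
// tables (hypothetical names Staff and StaffNationality)...
function toRows(person) {
  return {
    Staff: [{ id: person.id, name: person.name }],
    StaffNationality: person.nationality.map(n => ({ staffId: person.id, nationality: n }))
  };
}

// ...and has to be reassembled, the equivalent of a join, when it is read.
function fromRows(rows, id) {
  const s = rows.Staff.find(r => r.id === id);
  return {
    id: s.id,
    name: s.name,
    nationality: rows.StaffNationality
      .filter(r => r.staffId === id)
      .map(r => r.nationality)
  };
}

const rows = toRows(alice);
console.log(rows.StaffNationality.length); // 2
console.log(JSON.stringify(fromRows(rows, 1)) === JSON.stringify(loaded)); // true
```

The mapping code in toRows and fromRows is exactly the kind of work that ORM tools automate; with a document store it is simply not needed for this object.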

2.10 Benefits of 'NoSql'

As discussed, 'NoSql' databases take a fresh look at the problem of data persistence. They take advantage of Moore's Law: disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.

• 'NoSql' databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a 'NoSql' database at scale; managing clusters of relational database management systems is very expensive. It is better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, 'NoSql' databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as 'the Vietnam of Computer Science', referring to the quagmire of the Vietnam war.


2.11 Ad-hoc Querying - the 'Achilles Heel' of 'NoSql'

It doesn't come as much of a surprise that the main disadvantage of 'NoSql' solutions is the lack of SQL support or, to state it more generally, the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.

(A similar 'NoSql' product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often, a 'NoSql' solution is used for standard reporting purposes, where the reporting requirements are well known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the 'NoSql' solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].

The British Council is the United Kingdom's international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010, the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.


Figure 2.1: British Council project data displayed on Bing Maps

The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.

Outputs from the Activity Mapping project have included letters to MPs, online maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.


Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format, rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that something like CouchDB would be a good fit as the back-end data store, in effect using CouchDB as a web caching layer.

The data is already structured (by being exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
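The transformation step can be sketched without any templating library. The function below turns institution documents (using the _id, latitude and longitude fields) into an Atom feed whose entries carry a GeoRSS-Simple point, written as latitude then longitude separated by a space. It is a minimal sketch, not the code used in this project, and not a fully validated Atom feed (it omits the updated and author elements, for instance). The base URL used to build each entry's id is a hypothetical placeholder.

```javascript
// Escape the XML special characters so that names containing & or <
// cannot break the feed.
function xmlEscape(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// One institution document becomes one Atom <entry> with a GeoRSS-Simple
// point: latitude, a space, then longitude.
function toGeoRssEntry(doc, baseUrl) {
  return [
    "  <entry>",
    `    <id>${xmlEscape(baseUrl + "/" + encodeURIComponent(doc._id))}</id>`,
    `    <title>${xmlEscape(doc._id)}</title>`,
    `    <georss:point>${doc.latitude} ${doc.longitude}</georss:point>`,
    "  </entry>"
  ].join("\n");
}

function toGeoRssFeed(docs, baseUrl) {
  return [
    '<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">',
    "  <title>British Council Activity Mapping</title>",
    ...docs.map(doc => toGeoRssEntry(doc, baseUrl)),
    "</feed>"
  ].join("\n");
}

const doc = {
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608
};
console.log(toGeoRssFeed([doc], "http://example.org/institutions"));
```

A KML version would follow the same pattern with different element names, which is one reason to keep the transformation close to the data rather than in static files.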

This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested in exploring how ready this new technology is for 'prime time', that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms³.

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed as a JSON string [30].

3.1 'Of the Web'

The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all mainstream programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].
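Because the interface is plain HTTP, reading or writing a document needs nothing more than an HTTP client. The sketch below assumes a JavaScript runtime with a built-in fetch (for example Node 18 or later, or a browser) and a CouchDB server at the default address http://127.0.0.1:5984; the database name universities and the document id are illustrative. It is a minimal sketch rather than a client library: there is no retry logic or authentication.

```javascript
// Build the URL for a document: /{database}/{document id}. Both parts are
// percent-encoded because ids such as "University of St Andrews" contain spaces.
function docUrl(baseUrl, db, id) {
  return `${baseUrl}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`;
}

// Fetch a document as JSON with a plain HTTP GET.
async function getDoc(baseUrl, db, id) {
  const res = await fetch(docUrl(baseUrl, db, id), {
    headers: { Accept: "application/json" }
  });
  if (!res.ok) throw new Error(`GET failed with HTTP ${res.status}`);
  return res.json();
}

// Create or update a document with PUT. For an update, doc must carry the
// current _rev, otherwise CouchDB answers 409 Conflict.
async function putDoc(baseUrl, db, id, doc) {
  const res = await fetch(docUrl(baseUrl, db, id), {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  });
  if (!res.ok) throw new Error(`PUT failed with HTTP ${res.status}`);
  return res.json(); // an object with ok, id and the new rev
}

console.log(docUrl("http://127.0.0.1:5984", "universities", "University of St Andrews"));
// http://127.0.0.1:5984/universities/University%20of%20St%20Andrews
```

The same two requests could be issued from curl, Python or C# with equal ease, which is the practical meaning of CouchDB being language-agnostic.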

The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data as a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.
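To make this concrete, the sketch below shows a minimal show function and the design document that would hold it. A show function receives a document and the request, and returns the response body and headers. The design document name (example), the show name (institution) and the fields used are assumptions for illustration, and the function avoids newer JavaScript syntax on the assumption that the server-side engine may be an older one.

```javascript
// A show function: CouchDB calls it with the requested document and the
// request object, and sends the returned body and headers to the browser.
function showInstitution(doc, req) {
  function esc(s) {
    return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
  return {
    headers: { "Content-Type": "text/html" },
    body: "<h1>" + esc(doc._id) + "</h1><p>Region: " + esc(doc.region) + "</p>"
  };
}

// Design documents hold functions as source text. This object is what one
// would PUT to /universities/_design/example (a hypothetical design document).
var designDoc = {
  _id: "_design/example",
  shows: { institution: showInstitution.toString() }
};

console.log(showInstitution({ _id: "University of St Andrews", region: "Scotland" }, {}).body);
// <h1>University of St Andrews</h1><p>Region: Scotland</p>
```

Once saved, the page for a document would be served at a URL of the form /universities/_design/example/_show/institution/{document id}, so the HTML and the data it presents travel together in one database.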

An often-used quote about CouchDB by Jacob Kaplan-Moss, co-creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS¹-agnostic." [31]

Normally, a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an 'indexable, schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available online [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

¹ Operating System.

Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself; this is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
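Code that reads such documents has to allow for fields that may simply be missing, rather than testing for NULL. The sketch below assumes the two example documents above, trimmed to the fields it uses, and prints a name and address for either shape; the function names displayName and addressLines are hypothetical.

```javascript
const john = {
  firstName: "John",
  lastName: "Smith",
  address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" }
};
const zambia = {
  company: "British Council Zambia",
  address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" }
};

// Name to print: a company name if present, otherwise first and last name.
function displayName(doc) {
  return doc.company || [doc.firstName, doc.lastName].filter(Boolean).join(" ");
}

// Address as printable lines. Fields a document does not have (state,
// postalCode, country) are skipped; no NULL placeholders are needed.
function addressLines(doc) {
  const a = doc.address || {};
  const lines = (a.streetAddress || "").split("\n").filter(Boolean);
  const cityLine = [a.city, a.state, a.postalCode].filter(Boolean).join(" ");
  if (cityLine) lines.push(cityLine);
  if (a.country) lines.push(a.country);
  return lines;
}

for (const doc of [john, zambia]) {
  console.log([displayName(doc), ...addressLines(doc)].join("\n"));
  console.log();
}
```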

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
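The principle in the quotation, never overwriting data that has already been written, can be illustrated with a toy append-only store. This is a deliberately simplified sketch: CouchDB appends updated B-Tree nodes and a new file header to its database file, rather than keeping a flat log with an in-memory index as here. What the sketch does show is why a reader holding an older snapshot is unaffected by later writes. The document contents are taken from the Appendix D example for illustration.

```javascript
class AppendOnlyStore {
  constructor() {
    this.log = [];           // stands in for the database file
    this.latest = new Map(); // id -> position of the newest record for that id
  }
  // Every write appends a new, frozen record; nothing earlier is modified.
  put(id, doc) {
    const prev = this.latest.get(id);
    const rev = prev === undefined ? 1 : this.log[prev].rev + 1;
    this.log.push(Object.freeze({ id, rev, doc: Object.freeze({ ...doc }) }));
    this.latest.set(id, this.log.length - 1);
    return rev;
  }
  get(id) {
    const pos = this.latest.get(id);
    return pos === undefined ? undefined : this.log[pos].doc;
  }
  // A snapshot only looks at records written before it was taken, so later
  // appends cannot change what it sees.
  snapshot() {
    const end = this.log.length;
    return id => {
      for (let i = end - 1; i >= 0; i--) {
        if (this.log[i].id === id) return this.log[i].doc;
      }
      return undefined;
    };
  }
}

const store = new AppendOnlyStore();
store.put("University of St Andrews", { region: "Scotland" });
const before = store.snapshot();
store.put("University of St Andrews", { region: "Scotland", postcode: "KY16 9AJ" });
console.log(before("University of St Andrews"));               // { region: 'Scotland' }
console.log(store.get("University of St Andrews").postcode);  // KY16 9AJ
console.log(store.log.length);                                 // 2
```

Because old records are never touched, readers need no locks against writers, which is one way to see why the design scales to many simultaneous readers.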

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available online [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started, and Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A broader aim of this MSc project is to investigate how ready CouchDB is for adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010) the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


SECTION 4 USING COUCHDB 22

4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same request with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


curl -H "Content-Type: application/json" \
     -X POST http://username:password@mick.couchone.com/addressbook \
     -d @john-smith.json

The cURL syntax is explained in [44]. Briefly:

• -H adds an HTTP header to the request (here, Content-Type)

• -X POST instructs cURL to use an HTTP POST request (instead of a GET)

• -d introduces the data to be sent

• @ denotes that what follows is a file name
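For readers who would rather issue the same POST from a program than from cURL, the following is a minimal sketch in JavaScript, assuming Node.js 18 or later (which provides a global fetch). The helper name buildPostRequest is hypothetical. One assumption worth flagging: fetch refuses URLs that embed a username and password, so the sketch moves the credentials into an HTTP Basic Authorization header instead.

```javascript
// Hypothetical helper: builds the URL and fetch() options equivalent to
//   curl -H "Content-Type: application/json" -X POST <db> -d @john-smith.json
function buildPostRequest(dbUrl, username, password, doc) {
  // HTTP Basic authentication: base64 of "username:password"
  const token = Buffer.from(`${username}:${password}`).toString("base64");
  return {
    url: dbUrl,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Basic ${token}`,
      },
      body: JSON.stringify(doc), // the document travels as a JSON string
    },
  };
}

const johnSmith = { firstName: "John", lastName: "Smith", age: 25 };
const req = buildPostRequest(
  "http://mick.couchone.com/addressbook",
  "username",
  "password",
  johnSmith
);
console.log(req.options.headers.Authorization); // Basic dXNlcm5hbWU6cGFzc3dvcmQ=

// To send it (requires network access and valid credentials):
// fetch(req.url, req.options).then((res) => res.json()).then(console.log);
```

If the POST succeeds, CouchDB replies with a small JSON object containing the new document's id and rev.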

All data in CouchDB is stored in JSON format, together with a unique key and a revision number, so in CouchDB the resulting document would have _id and _rev values as follows:

{
    "_id": "760cd53c55a93497067f90d6242fc25e",
    "_rev": "1-91bce055fc8db86480400321079f0834",
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}
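The _rev value has two parts separated by a hyphen: a number counting how many times the document has been revised, and a hash derived from that revision. A minimal sketch of splitting it apart in JavaScript; parseRev is a hypothetical helper, not part of any CouchDB client library.

```javascript
// Hypothetical helper: split a CouchDB revision such as
// "1-91bce055fc8db86480400321079f0834" into its generation number and hash.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  if (dash < 1) {
    throw new Error(`not a revision string: ${rev}`);
  }
  return {
    generation: Number(rev.slice(0, dash)), // 1 for a newly created document
    hash: rev.slice(dash + 1),
  };
}

const rev = parseRev("1-91bce055fc8db86480400321079f0834");
console.log(rev.generation); // 1
console.log(rev.hash); // 91bce055fc8db86480400321079f0834
```

The generation number goes up by one with every successful update, which is why the revisions used later in this section start with 1-, 2- and so on.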

This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator at http://jsonlint.com.


Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

curl -H "Content-Type: application/json" \
     -X PUT http://username:password@mick.couchone.com/addressbook/british-council-zambia \
     -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
    "_id": "british-council-zambia",
    "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
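As an illustration of such a client, the sketch below parses the JSON returned for british-council-zambia and turns it into a printable address. It assumes the response body has already been fetched as a string; formatAddress is a hypothetical helper name. Note how the \n separators stored in streetAddress become separate lines.

```javascript
// The JSON text as returned by GET /addressbook/british-council-zambia
const body =
  '{"_id":"british-council-zambia","_rev":"1-da7bcd810c608d6fbcb9ce92e9ade343",' +
  '"company":"British Council Zambia","address":{"streetAddress":' +
  '"Heroes Place\\nCairo Road\\nPO Box 34571","city":"Lusaka","country":"Zambia"}}';

// Hypothetical helper: build a multi-line postal address from the document.
function formatAddress(doc) {
  const { streetAddress, city, country } = doc.address;
  // streetAddress stores its lines separated by "\n"
  return [doc.company, ...streetAddress.split("\n"), city, country].join("\n");
}

const doc = JSON.parse(body);
console.log(formatAddress(doc));
```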

Figure 4.3: Document viewed using Futon


4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database (the URL is quoted so that the shell does not interpret the ?):

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
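Because the revision travels in the query string, and document ids may contain characters such as spaces, it is safest to percent-encode each part when building the URL in a program. A minimal sketch; deleteUrl is a hypothetical helper, and credentials would be supplied separately (for example in an Authorization header).

```javascript
// Hypothetical helper: URL for DELETE /{db}/{docid}?rev={rev}
function deleteUrl(host, db, id, rev) {
  return (
    `${host}/${encodeURIComponent(db)}/${encodeURIComponent(id)}` +
    `?rev=${encodeURIComponent(rev)}`
  );
}

const url = deleteUrl(
  "http://mick.couchone.com",
  "addressbook",
  "british-council-zambia",
  "1-da7bcd810c608d6fbcb9ce92e9ade343"
);
console.log(url);
// http://mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
```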

4.6 Updating a Document

Similarly, when updating a document the revision value must be supplied, to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error (HTTP status 409) if an update is attempted on an ‘old’ version of a document.

As we are updating a ‘named’ document, we use the PUT verb instead of POST.

The following command will update John Smith’s record, whose current revision is 1-91bce055fc8db86480400321079f0834, with the contents of the file john-smith-v2.json:

curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
     -d @john-smith-v2.json
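The revision check is a form of optimistic concurrency control. The in-memory sketch below imitates the rule just described; it illustrates the idea only and is not how CouchDB is implemented. The class name RevisionedStore and its simplified revision strings (a generation number followed by a placeholder instead of a real hash) are hypothetical.

```javascript
// Illustrative in-memory store: an update is accepted only if the caller
// supplies the document's current revision; otherwise it reports a conflict.
class RevisionedStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  put(id, body, rev) {
    const current = this.docs.get(id);
    // A new document must not carry a rev; an existing one must match it.
    const expected = current ? current.rev : undefined;
    if (rev !== expected) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? Number(current.rev.split("-")[0]) + 1 : 1;
    const newRev = `${generation}-simulated`; // real CouchDB appends a hash here
    this.docs.set(id, { rev: newRev, body });
    return { ok: true, id, rev: newRev };
  }
}

const store = new RevisionedStore();
const first = store.put("john-smith", { firstName: "John" });
console.log(first.rev); // 1-simulated
const second = store.put("john-smith", { firstName: "John", age: 26 }, first.rev);
console.log(second.rev); // 2-simulated
// Re-using the old revision now fails, as it would in CouchDB.
console.log(store.put("john-smith", { firstName: "Jon" }, first.rev).error); // conflict
```

The client that receives a conflict is expected to fetch the latest version, reapply its change and try again.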

4.7 Adding Attachments

CouchDB provides a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/³

The following command will upload the flag of Zambia (zm.png) as an attachment named flag on the British Council Zambia address record. As with updates, the current revision of the document must be supplied:

curl -H "Content-Type: image/png" -X PUT \
     "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
     --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.


The uploaded image may be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag

4.8 Replication

One particularly interesting and useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
     -X POST http://127.0.0.1:5984/_replicate \
     -d '{"source": "http://username:password@mick.couchone.com/addressbook", "target": "http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping them where necessary (in the Windows command prompt, ^ rather than \ continues a command onto the next line):

curl -H "Content-Type: application/json" ^
     -X POST http://127.0.0.1:5984/_replicate ^
     -d "{\"source\": \"http://username:password@mick.couchone.com/addressbook\", \"target\": \"http://127.0.0.1:5984/addressbook\"}"

As a result, a replica of the database is now available on the local machine.
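The shell-specific quoting problems disappear when the request body is generated by a program. A minimal sketch of building the same _replicate request in JavaScript; replicationRequest is a hypothetical helper name and, as in the commands above, source and target are full database URLs.

```javascript
// Hypothetical helper: URL and fetch() options for POST /_replicate
function replicationRequest(localHost, source, target) {
  return {
    url: `${localHost}/_replicate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // JSON.stringify produces correctly quoted JSON on any platform
      body: JSON.stringify({ source, target }),
    },
  };
}

const rep = replicationRequest(
  "http://127.0.0.1:5984",
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(rep.options.body);
```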

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to querying the database.

Data is retrieved from CouchDB using map-reduce functions, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages: the ‘map’ stage ranges over the document collection and emits zero or more key/value pairs for each document it encounters; the ‘reduce’ stage is used when an aggregate calculation over those intermediate values is required.

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb) I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK general election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
    "_id": "Aberavon",
    "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
    "mp": "Hywel Francis",
    "electorate": 51079,
    "con": 3062,
    "lab": 18077,
    "lib": 4138,
    "pc": 3546,
    "oth": 1278
}

CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code as opposed to data.

The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.

The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function, emitting each constituency’s Conservative vote as the key and its _id as the value.

The second view, ‘Conservative Votes Total’, combines a map function with a reduce function to output the sum of all Conservative votes (analogous to the SUM() aggregate function in SQL).

[http://localhost:5984/election-2005/_design/con]

{
    "_id": "_design/con",
    "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
    "language": "javascript",
    "views": {
        "Conservative Votes": {
            "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
            "map": "function(doc) {\n  emit(null, doc.con);\n}",
            "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
    }
}

The sum of Conservative votes (8,782,198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}
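To make the two stages concrete, the sketch below runs the ‘Conservative Votes Total’ map and reduce functions over an in-memory array of documents. runView, emit and sum are stand-ins written for this illustration (inside CouchDB, emit and sum are provided by the JavaScript query server), and the map function is adapted to receive emit as a parameter. Only the Aberavon figure comes from the database above; the second document is made up. Like a CouchDB reduce queried without grouping, it collapses everything into a single row with a null key.

```javascript
// Minimal in-memory imitation of a CouchDB view with an optional reduce.
function runView(docs, mapFn, reduceFn) {
  const rows = [];
  // emit() collects the key/value pairs produced by the map function
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of docs) {
    mapFn(doc, emit);
  }
  if (!reduceFn) {
    return { rows };
  }
  const keys = rows.map((r) => [r.key, null]); // simplified: doc ids omitted
  const values = rows.map((r) => r.value);
  return { rows: [{ key: null, value: reduceFn(keys, values) }] };
}

// Stand-in for the sum() helper available to CouchDB reduce functions
const sum = (values) => values.reduce((a, b) => a + b, 0);

// The 'Conservative Votes Total' functions, adapted to take emit as a parameter
const map = (doc, emit) => emit(null, doc.con);
const reduce = (keys, values) => sum(values);

const docs = [
  { _id: "Aberavon", con: 3062, lab: 18077 },
  { _id: "Example Constituency", con: 1500, lab: 9000 }, // made-up data
];
console.log(JSON.stringify(runView(docs, map, reduce)));
// {"rows":[{"key":null,"value":4562}]}
```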

It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
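Range queries work because a view's rows are kept sorted by key, so answering startkey/endkey is a matter of locating the first matching row and reading forward. The following is a simplified in-memory sketch for numeric keys. buildIndex and viewRange are hypothetical names, and the constituency rows other than Aberavon are made up. CouchDB itself reads the range from its B-tree index and applies its own collation rules to mixed key types.

```javascript
// Rows as the 'Conservative Votes' view would produce them: emit(doc.con, doc._id)
function buildIndex(docs) {
  return docs
    .map((doc) => ({ key: doc.con, value: doc._id }))
    .sort((a, b) => a.key - b.key); // view rows are ordered by key
}

// Return rows with startkey <= key <= endkey (endkey is inclusive by default)
function viewRange(index, startkey, endkey) {
  // Binary search for the first row whose key is >= startkey
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < startkey) lo = mid + 1;
    else hi = mid;
  }
  const rows = [];
  for (let i = lo; i < index.length && index[i].key <= endkey; i++) {
    rows.push(index[i]);
  }
  return rows;
}

const index = buildIndex([
  { _id: "Aberavon", con: 3062 },
  { _id: "Constituency A", con: 1200 }, // made-up data
  { _id: "Constituency B", con: 1999 }, // made-up data
  { _id: "Constituency C", con: 800 }, // made-up data
]);
console.log(viewRange(index, 1000, 2000).map((r) => r.value));
// [ 'Constituency A', 'Constituency B' ]
```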

Figure 4.5: WHERE clause in CouchDB

This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that, for every map-reduce view defined against a document collection in CouchDB, a B-Tree index is built for that query. From that point on, whenever documents are added or changed, the index is updated incrementally: the next time the view is queried, CouchDB processes only the documents that have changed since the last update.

As a result, the system is optimised for fast indexed retrieval. Once a view’s index has been built, reading from it is an indexed lookup rather than a run-time ‘table scan’. (CouchDB 1.0 also offers temporary views, which are computed over the whole database on demand, but these are intended for development use only.)

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: CouchApp. I explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents. HTML and JavaScript application code is stored in so-called ‘design documents’: documents which hold application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files, which I then transferred to my PC at home, running Ubuntu Linux.


SECTION 5 SERVING HTML FROM COUCHDB 32

Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the following bash script.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (On Windows, a similar effect could be achieved using a batch file.)

#!/bin/bash

# host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json

# create the database
curl -X PUT "$host/$database"

for filepath in "$folder"/*"$fileextension"
do
    echo "$filepath"

    # get the file name from the file path
    filename=$(basename "$filepath")
    echo "$filename"

    # strip the extension to obtain the document id
    docname="${filename%"$fileextension"}"
    echo "$docname"

    url="$host/$database/$docname"
    echo "$url"

    # put the document into CouchDB
    echo curl -X PUT "$url" -d @"$filepath"
    curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL (POST is used when we wish CouchDB to assign a random ID to the document).

To put a single document to the web, the command would be as follows:

curl -X PUT "http://username:password@mick.couchone.com/universities/Aberystwyth%20University" \
     -d @"universities/Aberystwyth%20University.json"

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an _id that begins with _design/ [12].

For the first design document I have chosen an _id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output:

{
    "_id": "_design/d1",
    "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
    }
}

Note that shows is a reserved field name in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered, while req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.

Having saved the design document as d1.json, we can upload it to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
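Because a show function is ordinary JavaScript, its logic can be checked outside CouchDB by calling it directly with a mock document. A sketch, assuming Node.js; the latitude and longitude values below are illustrative placeholders, not figures taken from the universities database.

```javascript
// The body of the s1 show function from _design/d1
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Mock document and (unused) request object
const mockDoc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
console.log(s1(mockDoc, {}));
// <h1>Aston University</h1><p>latitude: 52.48</p><p>longitude: -1.89</p>
```

A version intended for untrusted data would also need to escape HTML special characters in the values before concatenating them.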

To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB

• universities is the database

• _design/d1 is the design document

• _show/s1 is the ‘show’ function that returns an HTML string

• Aston%20University is the _id of the document to be rendered (%20 encodes the space in ‘Aston University’)
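The same anatomy can be expressed as a small function. showUrl is a hypothetical helper, not part of CouchDB; it percent-encodes the document id so that, for example, the space in ‘Aston University’ becomes %20.

```javascript
// Hypothetical helper: {host}/{db}/_design/{ddoc}/_show/{func}/{docid}
function showUrl(host, db, ddoc, func, docId) {
  return [host, db, "_design", ddoc, "_show", func, encodeURIComponent(docId)]
    .join("/");
}

console.log(showUrl("http://mick.couchone.com", "universities", "d1", "s1",
  "Aston University"));
// http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```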


5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.

While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is difficult and unintuitive either way.
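One way to sidestep the manual escaping is to write the functions as ordinary JavaScript and let a program serialise them. A minimal sketch, assuming Node.js: Function.prototype.toString() recovers each function's source text and JSON.stringify escapes every embedded double quote and newline. This is offered as an alternative, not the approach used in this project; the class="title" attribute is only there to show a double quote surviving the round trip.

```javascript
// Show function written as ordinary code. It is an anonymous function
// expression, the form CouchDB design documents normally contain.
const s1 = function (doc, req) {
  return '<h1 class="title">' + doc._id + "</h1>";
};

// Design documents store functions as strings, so convert them to source text
const designDoc = {
  _id: "_design/d1",
  shows: { s1: s1.toString() },
};

// JSON.stringify escapes the embedded double quotes and newlines automatically
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

The printed JSON could then be saved as d1.json and uploaded with the same cURL command as before.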


Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.

Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:

{
    "_id": "_design/d1",
    "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
    "shows": { ... },
    "lists": { ... },
    "views": {
        "v1": {
            "map": "function(doc) { emit(doc._id, doc._id); }"
        }
    }
}


The output from the ‘view’ function is at http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document _id values

A ‘view’ is in effect a query function which returns a set of results. In order to render the results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple view, simply emitting the document _id values

• l1 is a very simple list that shows a hyperlink for each document _id

We augment the design document with a ‘list’ function as follows. Because ids such as ‘Aston University’ contain spaces, the href attribute is quoted and the id is percent-encoded with encodeURIComponent:

{
    "_id": "_design/d1",
    "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
    "shows": { ... },
    "lists": {
        "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + encodeURIComponent(row.value) + '\">' + row.value + '</a></p>'); } }"
    },
    "views": {
        "v1": {
            "map": "function(doc) { emit(doc._id, doc._id); }"
        }
    }
}


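Note the built-in functions start, getRow and send:

• start is called once, at the beginning of the function, to set the response headers

• getRow returns the next row of the view result, and a false value once the rows are exhausted

• send writes a chunk of output; here it is called once for each row in the dataset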
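The interplay of start, getRow and send can be imitated outside CouchDB by supplying mock versions of the three functions. The harness below is purely illustrative: runList and the mocks are hypothetical (in CouchDB these functions come from the JavaScript query server), and the row data is invented. The version of l1 used here quotes the href attribute and percent-encodes the id, since ids such as ‘Aston University’ contain spaces.

```javascript
// The l1 list function (start, getRow and send are globals in CouchDB)
const l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      encodeURIComponent(row.value) + '">' + row.value + '</a></p>');
  }
};

// Hypothetical harness: feed view rows to a list function, collect its output
function runList(listFn, rows) {
  const output = [];
  let response = null;
  let i = 0;
  globalThis.start = (resp) => { response = resp; };
  globalThis.getRow = () => (i < rows.length ? rows[i++] : null);
  globalThis.send = (chunk) => { output.push(chunk); };
  try {
    listFn({ total_rows: rows.length }, {});
  } finally {
    // remove the mocks so they do not leak into other code
    delete globalThis.start;
    delete globalThis.getRow;
    delete globalThis.send;
  }
  return { headers: response && response.headers, body: output.join("") };
}

const result = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
]);
console.log(result.body);
```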

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer’s perspective is that it creates a separate file for each element of the design document. In effect, a CouchDB design document is a web application project; CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available at http://github.com/jchris/sofa


SECTION 6 SERVING GEORSS USING COUCHAPP 40

I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides a syndication feed (in Atom format) of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the feed output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available at http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is at http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available at http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by placing each area of the design document in a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains map.js and, when needed, reduce.js files for each CouchDB view. A view can be thought of as a query against the database.

• The shows folder contains .js files corresponding to each ‘show’ function in the design document. A show is a function used to transform a single document into HTML.

• Similarly, the lists folder contains .js files, one for each ‘list’ function in the eventually generated design document. A list is used to transform a set of documents returned by a view into HTML.


Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


couchapp generate show creates a stub for a show function, which we can fill in using a text editor. A show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘list’ function l1.

In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js. We recall that in CouchDB a view is in effect a query against the database.

As v1 is a very simple view with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, the view will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.
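The idea of merging a folder tree into one design document can be sketched in a few lines of JavaScript. This is a simplified, hypothetical illustration of the principle, not CouchApp's actual Python implementation, which handles further details such as attachments. Each folder becomes a nested object and each file becomes a string property named after the file, without its extension; hidden files such as .couchapprc are skipped. The folderToDoc name and the sample files are assumptions made for the example.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Recursively turn a folder tree into a design-document-shaped object:
// folders become nested objects, files become strings named after the file.
function folderToDoc(dir) {
  const doc = {};
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (entry.name.startsWith(".")) continue; // skip .couchapprc and other hidden files
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      doc[entry.name] = folderToDoc(fullPath);
    } else {
      // map.js -> "map"; the file's text becomes the property value
      const name = path.basename(entry.name, path.extname(entry.name));
      doc[name] = fs.readFileSync(fullPath, "utf8").trim();
    }
  }
  return doc;
}

// Try it on a tiny app folder created in a temporary directory
const app = fs.mkdtempSync(path.join(os.tmpdir(), "bc-"));
fs.mkdirSync(path.join(app, "views", "v1"), { recursive: true });
fs.writeFileSync(path.join(app, "_id"), "_design/bc\n");
fs.writeFileSync(path.join(app, ".couchapprc"), "{}\n");
fs.writeFileSync(
  path.join(app, "views", "v1", "map.js"),
  "function(doc) { emit(doc._id, doc._id); }\n"
);

const designDoc = folderToDoc(app);
console.log(JSON.stringify(designDoc, null, 2));
fs.rmSync(app, { recursive: true, force: true }); // tidy up
```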

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
    "env": {
        "default": {
            "db": "http://username:password@mick.couchone.com/universities"
        }
    }
}

Running couchapp push from within the bc folder then merges the files and uploads the resulting design document to the database named in .couchapprc. In this way we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This demonstrates how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple view, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
    emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a list function to transform the contents of the view:

couchapp generate list georss


This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the view and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
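The template expects the list function to hand it an institutions array, with has_programmes and has_countries flags that let Mustache skip empty sections. The following is a hypothetical sketch of that packaging step in plain JavaScript. The document fields it reads (latitude, longitude, programmes, countries) are assumptions about the shape of the university documents, the sample row is made up, and this is not the code in Appendix C.

```javascript
// Hypothetical packaging step: reshape view rows into the object the
// georss.html template iterates over (field names follow the template).
function toTemplateView(rows) {
  return {
    institutions: rows.map((row) => {
      const doc = row.value; // the 'all' view emits the whole document
      const programmes = (doc.programmes || []).map((p) => {
        const countries = (p.countries || []).map((c) => ({ country: c }));
        return {
          name: p.name,
          countries,
          has_countries: countries.length > 0, // lets the template skip empty lists
        };
      });
      return {
        id: doc._id,
        latitude: doc.latitude,
        longitude: doc.longitude,
        programmes,
        has_programmes: programmes.length > 0,
      };
    }),
  };
}

// Made-up example row; the real documents may be shaped differently
const rows = [
  {
    id: "Aston University",
    key: "Aston University",
    value: {
      _id: "Aston University",
      latitude: 52.48,
      longitude: -1.89,
      programmes: [
        { name: "Erasmus", countries: ["fr", "de"] },
        { name: "Example programme" },
      ],
    },
  },
];

console.log(JSON.stringify(toTemplateView(rows), null, 2));
```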

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and of a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. Because CouchDB communicates over HTTP, it can easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
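Because application code lives in design documents inside the database, one replication request moves data and code together. As a minimal sketch, the function below only builds the HTTP request that asks a CouchDB server to replicate a database through its `_replicate` endpoint; it does not send it. The function name, the server address and the remote source URL are placeholders, not real endpoints.

```javascript
// Sketch: the HTTP request asking a CouchDB server to replicate a database.
function replicationRequest(serverUrl, source, target, continuous = false) {
  const body = { source, target };
  if (continuous) body.continuous = true; // keep replicating as changes arrive
  return {
    url: new URL("/_replicate", serverUrl).toString(),
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

const req = replicationRequest(
  "http://127.0.0.1:5984",
  "http://example.com/universities", // hypothetical remote source database
  "universities"                     // local target database
);
console.log(req.method, req.url); // POST http://127.0.0.1:5984/_replicate
// With Node 18+, it could be sent using:
// await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```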

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replicating code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. By comparison, with CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
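To illustrate how directly document operations map onto HTTP, the sketch below builds the request for each basic operation and prints the equivalent cURL command. The helper names `docRequest` and `toCurl`, the local server address and the sample document are assumptions; the URL layout (/database/docid) and the rule that updates and deletions must name the current `_rev` follow CouchDB’s document API.

```javascript
// Sketch: each document operation is a plain HTTP request.
function docRequest(op, base, db, doc) {
  const url = `${base}/${encodeURIComponent(db)}/${encodeURIComponent(doc._id)}`;
  switch (op) {
    case "create":
    case "update": // an update is a PUT whose body includes the current _rev
      return { method: "PUT", url, body: JSON.stringify(doc) };
    case "read":
      return { method: "GET", url };
    case "delete": // a deletion must name the revision being deleted
      return { method: "DELETE", url: `${url}?rev=${encodeURIComponent(doc._rev)}` };
    default:
      throw new Error(`unknown operation: ${op}`);
  }
}

// Render a request as the equivalent cURL command
// (assumes the JSON body contains no single quotes).
function toCurl({ method, url, body }) {
  const data = body ? ` -d '${body}'` : "";
  return `curl -X ${method} '${url}'${data}`;
}

const doc = { _id: "University of St Andrews", region: "Scotland" };
console.log(toCurl(docRequest("create", "http://127.0.0.1:5984", "universities", doc)));
```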

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL;

• understanding the structure of design documents;

• using server-side JavaScript functions to transform documents to HTML and XML.

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-defined map-reduce function whose results are indexed in advance. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
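As an illustration of this different way of thinking, here is a simple SQL grouping query next to an equivalent map/reduce pair, together with a tiny driver that imitates a grouped view query so that the pair can be run outside CouchDB. The driver `groupedQuery` and the third sample document are hypothetical; in CouchDB itself, the built-in `_count` reduce could replace the hand-written one.

```javascript
// SQL:   SELECT region, COUNT(*) FROM institutions GROUP BY region;
// Map/reduce version: map emits one row per institution, keyed by region,
// and reduce adds up the values for each key.
function map(doc, emit) {
  if (doc.region) emit(doc.region, 1);
}

// CouchDB calls reduce(keys, values, rereduce). Summing is valid for both
// the first pass and the rereduce pass, because a sum of sums is a sum.
function reduce(keys, values, rereduce) {
  return values.reduce((total, v) => total + v, 0);
}

// Hypothetical driver imitating a grouped view query (group=true).
// In CouchDB, emit is a global and reductions are computed from the index.
function groupedQuery(docs, mapFn, reduceFn) {
  const groups = new Map(); // key -> array of { id, value }
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push({ id: doc._id, value });
    });
  }
  return [...groups.keys()].sort().map((key) => {
    const rows = groups.get(key);
    const keys = rows.map((r) => [key, r.id]); // CouchDB-style [key, docid] pairs
    return { key, value: reduceFn(keys, rows.map((r) => r.value), false) };
  });
}

const institutions = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Example University", region: "London" }, // hypothetical document
];
// Returns one row for London (1 institution) and one for Scotland (2).
console.log(groupedQuery(institutions, map, reduce));
```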

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, and that function is itself stored as a string inside the JSON design document. This means that all double-quotes, for example, need to be backslash-escaped, which is very tedious to do by hand.
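The escaping problem can be reproduced in a few lines: a show function is stored in the design document as source text inside a JSON string, so any double quotes in the function body come back backslash-escaped. The function below is a hypothetical variant of s1 (a class attribute is added so that it contains double quotes); converting functions to JSON strings in this way is roughly the packaging step that CouchApp automates.

```javascript
// A hypothetical show function whose HTML contains double quotes.
function s1(doc, req) {
  return '<h1 class="title">' + doc._id + "</h1>";
}

// Design documents store functions as source text inside JSON strings.
const designDoc = {
  _id: "_design/d1",
  shows: { s1: s1.toString() },
};

const json = JSON.stringify(designDoc);
// Every double quote in the function source is now backslash-escaped.
console.log(json.includes('\\"title\\"')); // prints true
```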

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude ' + doc.latitude + '</p>' + '<p>longitude ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};
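The atom.js code above relies on E4X, the XML literal syntax supported by SpiderMonkey, the engine behind CouchDB’s JavaScript view server at the time; engines such as V8 (used by Node.js) do not support it. A minimal sketch of the same entry built with template strings is shown below, assuming that values must be XML-escaped by hand. Unlike the listing above, it emits the namespaced georss:point element defined by GeoRSS Simple; the helper names `escapeXml` and `atomEntry` are hypothetical.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build an Atom <entry> carrying a GeoRSS point, without E4X.
function atomEntry(data) {
  return [
    "<entry>",
    `<id>${escapeXml(data.entry_id)}</id>`,
    `<title>${escapeXml(data.title)}</title>`,
    `<content type="${escapeXml(data.content_type || "html")}">${escapeXml(data.content)}</content>`,
    `<link href="${escapeXml(data.alternate)}" rel="alternate"/>`,
    // GeoRSS Simple point: "latitude longitude", separated by a space
    `<georss:point>${escapeXml(data.point)}</georss:point>`,
    "</entry>",
  ].join("");
}

console.log(atomEntry({
  entry_id: "http://example.com/universities/University%20of%20St%20Andrews",
  title: "University of St Andrews",
  content: "<p>Chevening Programme</p>",
  alternate: "http://example.com/universities/University%20of%20St%20Andrews",
  point: "56.3412139443169 -2.79301175308608",
}));
```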

At the end of lists/index.js:

        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
  }
  // close the loop after all rows are rendered
  return "</feed>";
});

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:


// apply docForm at login.
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : docid,
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View: in the simplest case, the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
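The heart of georss.js is the mapping from a stored institution document to the ‘view’ object that the template consumes. Pulling that mapping out as a stand-alone function makes it easy to try outside CouchDB with the St Andrews document from Appendix D. In this sketch the name `institutionToView` is hypothetical, only the fields used by the template are kept, and one deliberate difference from the listing is made: an empty programme or country list also yields a false flag, so the template prints nothing for it.

```javascript
// Sketch: the row-to-stash mapping from georss.js as a plain function.
function institutionToView(institution) {
  const programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes.map((programme) => {
      const countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each country code so the template can refer to {{country}}.
        countries: countries.map((country) => ({ country })),
      };
    }),
  };
}

const stAndrews = {
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: "Chevening Programme", url: "http://www.chevening.com", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", url: "http://www.csfp-online.org", countries: ["bd", "za"] },
  ],
};
console.log(JSON.stringify(institutionToView(stAndrews), null, 2));
```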

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": "15161",
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload PNG files from a folder, so that each image file becomes an attachment to a CouchDB document.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash

# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)

FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  # the @ prefix makes curl send the file contents rather than the file name
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done
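For readers more at home in JavaScript than in sed, the sketch below performs the same path manipulation as the script: it derives the document name from the PNG file name and builds the URL of the attachment named flag. The function name `flagUpload` and the local server address are assumptions, credentials are left out, and reading the file and sending the PUT request are not shown.

```javascript
// Sketch: the sed-based path manipulation from the script, in JavaScript.
const path = require("path");

function flagUpload(filepath, base) {
  const filename = path.basename(filepath);        // e.g. "de.png"
  const docname = path.basename(filename, ".png"); // e.g. "de"
  return {
    docname,
    url: `${base}/countries/${docname}/flag`,      // attachment named "flag"
    headers: { "Content-Type": "image/png" },
  };
}

const upload = flagUpload("countries/png/de.png", "http://127.0.0.1:5984");
console.log(upload.url); // http://127.0.0.1:5984/countries/de/flag
```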

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl(1): the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open_Database_Connectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


systems rely on data access (read and write) locking to provide ACID capabilities [50].

The goal of these transactional guarantees is to ensure that, as far as the user is concerned, the database is always in a consistent state: at any given moment in time there is only ‘one version of the truth’ for each data item in the system. In the canonical example, if a sum of money is transferred from one account to another, the system must ensure that both accounts are amended correctly. To the user of the system, it must appear as if the changes happen at the same time in both places [37].

For a banking application, it is vital that the database provides the appearance of complete consistency, even though in reality the debit and credit do not happen at exactly the same time. Even in the event of hardware or network failure, traditional relational database management systems must have the ability to roll back to a consistent state, so that data integrity is assured at all times.
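The all-or-nothing behaviour of the transfer example can be illustrated with a toy in-memory sketch. This is not how an RDBMS implements atomicity (real systems typically rely on logging and locking); the function name `transfer` and the snapshot-and-restore approach are assumptions made purely for illustration.

```javascript
// Toy illustration of atomicity: both balances change, or neither does.
function transfer(accounts, from, to, amount) {
  const snapshot = { ...accounts }; // consistent state to roll back to
  try {
    if (!(from in accounts) || !(to in accounts)) throw new Error("unknown account");
    accounts[from] -= amount; // debit
    if (accounts[from] < 0) throw new Error("insufficient funds");
    accounts[to] += amount; // credit
    return true; // "commit"
  } catch (err) {
    Object.assign(accounts, snapshot); // "roll back"
    return false;
  }
}

const accounts = { alice: 100, bob: 0 };
transfer(accounts, "alice", "bob", 30);  // succeeds
transfer(accounts, "alice", "bob", 500); // fails after the debit, and is rolled back
console.log(accounts); // { alice: 70, bob: 30 }
```

Whatever happens, the total held across both accounts is unchanged, which is the invariant a user of the system relies on.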

The paradigm followed by such an RDBMS is that data must appear, at least, to be held in one place, and that each atomic data value represents the only version of the truth at a given moment in time. This is vital for banking and airline booking systems.

However, many applications do not require such strict transactional guarantees. Since the arrival of the Internet, we are more comfortable with multiple versions of the same data existing in different places: for example, we cache web pages on proxy servers to reduce network latency. Users of web pages are generally tolerant of slightly stale pages; however, they are extremely intolerant of latency (a delay in the time it takes to load the page).

Caching data in many places, so that the data is near to the user when it is needed, is one of the most effective ways to reduce latency. There may no longer be a ‘single version of the truth’, but for many applications the trade-off is worthwhile.

2.6 The ‘NoSql’ Movement

‘NoSql’ refers to a range of non-relational database products which have emerged in recent years. The term ‘NoSql’ is itself slightly tongue-in-cheek, made in reference to the popular naming scheme for classic relational database management systems (MySQL, PostgreSQL, etc.).¹

‘NoSql’ is used to refer to data stores which do not provide all the services of the established relational database management systems described above. In particular, these newer products are distinctive in two main areas:

• Non-relational database systems allow row entries (tuples) of arbitrary length, containing attributes of arbitrary data types. In other words, data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.

• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

¹ During the summer of 2010, as this project report was being prepared, a set of meetings took place world-wide under the theme of ‘a NoSql Summer’ [1].

2.7 ‘NoSql’ databases at large scale

Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and thus started to consider new architectures which could scale to many thousands of nodes in various datacentres around the world [16].

The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.

To give an example, if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. Of course, in a foreign exchange trading application those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario. Foreign exchange trading requires the very strict data integrity provided, for example, by traditional relational database management systems such as Oracle.

In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but are guaranteed to eventually converge to a consistent state.

Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.
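The idea of replicas that accept writes independently and converge later can be sketched in a few lines. The `Replica` class and `wins` function below are hypothetical and use a simple merge rule: the higher version wins, with ties broken by comparing values, so that every replica picks the same winner. This is only one possible policy; CouchDB, for instance, keeps conflicting revisions and chooses a winner deterministically rather than discarding data.

```javascript
// Sketch: replicas accept writes independently and converge after syncing.
class Replica {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // key -> { version, value }
  }
  write(key, value) {
    const current = this.data.get(key);
    const version = current ? current.version + 1 : 1;
    this.data.set(key, { version, value });
  }
  read(key) {
    const entry = this.data.get(key);
    return entry ? entry.value : undefined;
  }
  // Pull every entry from another replica, keeping the winning version.
  syncFrom(other) {
    for (const [key, theirs] of other.data) {
      const ours = this.data.get(key);
      if (!ours || wins(theirs, ours)) this.data.set(key, { ...theirs });
    }
  }
}

// Deterministic merge rule, so both sides of a sync pick the same winner.
function wins(a, b) {
  if (a.version !== b.version) return a.version > b.version;
  return String(a.value) > String(b.value);
}

const london = new Replica("london");
const singapore = new Replica("singapore");
london.write("alice/status", "At work");
console.log(singapore.read("alice/status")); // undefined: not yet replicated (stale)
singapore.syncFrom(london);
console.log(singapore.read("alice/status")); // At work
```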

In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:

“Occasionally an entire datacenter will go down (e.g. if the power is cut) or become unreachable (e.g. if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”

2.8 Brewer’s ‘CAP’ Theorem

The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance: the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential but strict consistency is not, so a looser model of consistency is considered a price worth paying.

This realisation, that ‘eventual’ consistency is often sufficient, has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.

In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops, and even hand-held devices and mobile phones.
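In CouchDB, this ‘rudimentary validation’ takes the form of an optional validate_doc_update function stored in a design document. CouchDB calls it on every write with the new document, the old document (or null) and the user context, and rejects the write if the function throws; throwing an object with a forbidden field produces an HTTP 403 response. The rule below (institution documents must have numeric latitude and longitude) is a hypothetical example, written in ES5 style because older CouchDB JavaScript engines did not support newer syntax.

```javascript
// Sketch of a validate_doc_update function for the universities database.
// Hypothetical rule: institution documents need numeric coordinates.
function validate_doc_update(newDoc, oldDoc, userCtx) {
  // Deletions carry no fields to check.
  if (newDoc._deleted) return;
  // Only check documents that look like institutions.
  if (!newDoc.hasOwnProperty("institutionID")) return;
  var fields = ["latitude", "longitude"];
  for (var i = 0; i < fields.length; i++) {
    if (typeof newDoc[fields[i]] !== "number") {
      // CouchDB turns this into an HTTP 403 Forbidden response.
      throw { forbidden: fields[i] + " must be a number" };
    }
  }
}

try {
  validate_doc_update({ institutionID: "15161", latitude: "56.34", longitude: -2.79 }, null, {});
} catch (err) {
  console.log(err.forbidden); // latitude must be a number
}
```

Anything beyond checks of this kind (for example, keeping related documents consistent with each other) remains the application’s responsibility.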


2.9 Reducing the Impedance Mismatch

The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a number of Object-Relational Mapping (ORM) tools have emerged; examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain.² Another tactic used by application developers is to use SQL Views, effectively de-normalising data so that the data model is closer to the application’s object model.

A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
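The contrast can be sketched in JavaScript. The fragment below takes a person object (a cut-down version of the John Smith example in Section 3.3) and shows both routes: splitting it into rows for two hypothetical tables, people and phones, which must be joined again to rebuild the object, versus serialising it unchanged as one JSON document. The table layout and the function names are illustrative assumptions, not the behaviour of any particular ORM.

```javascript
// The application's object: a nested address and a list of phone numbers.
const person = {
  firstName: "John",
  lastName: "Smith",
  address: { city: "New York", state: "NY" },
  phoneNumber: [
    { type: "home", number: "212 555-1234" },
    { type: "fax", number: "646 555-4567" },
  ],
};

// Relational route: flatten into rows for hypothetical "people" and "phones"
// tables; each phone row carries a foreign key (personId) back to its person.
function toRows(p, personId) {
  return {
    people: [
      { id: personId, firstName: p.firstName, lastName: p.lastName,
        city: p.address.city, state: p.address.state },
    ],
    phones: p.phoneNumber.map((ph, i) => ({
      id: i + 1, personId: personId, type: ph.type, number: ph.number,
    })),
  };
}

// Rebuilding the object needs the equivalent of a join on personId.
function fromRows(tables, personId) {
  const row = tables.people.find((r) => r.id === personId);
  return {
    firstName: row.firstName,
    lastName: row.lastName,
    address: { city: row.city, state: row.state },
    phoneNumber: tables.phones
      .filter((ph) => ph.personId === personId)
      .map((ph) => ({ type: ph.type, number: ph.number })),
  };
}

// Document route: the object is stored and retrieved as a single JSON value.
const stored = JSON.stringify(person);
const retrieved = JSON.parse(stored);
```

The relational route needs two extra functions and a join key just to get the same object back; the document route needs none, which is the by-pass described above.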

2.10 Benefits of ‘NoSql’

As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law: disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.

• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate ‘NoSql’ databases at scale; managing clusters of relational database management systems is very expensive. It is better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam War.


2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’

It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support or, to state it more generally, the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions, and there is little capability to query the system in an arbitrary way.

(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often, a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database used purely for reporting, while the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which I have been developing for the British Council [2].

The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories world-wide. In 2009-2010 the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.


Figure 2.1: British Council project data displayed on Bing Maps

The data from these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written in C# and ASP.NET using Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.

Outputs from the Activity Mapping project have included letters to MPs, on-line maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.


Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB, in effect using CouchDB as a web caching layer.

The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
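To give a feel for the second task, the sketch below renders documents as a GeoRSS feed: an RSS 2.0 channel in which each item carries a GeoRSS-Simple point written as "latitude longitude". It assumes each document has _id, latitude and longitude fields (the fields used by the universities documents in Section 5); the function names are hypothetical, and a strictly valid RSS channel would also need link and description elements. Inside CouchDB, logic like this would sit in a design-document function rather than in a stand-alone script.

```javascript
// Escape the characters that would break XML text content or attributes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// One document becomes one <item>; GeoRSS-Simple puts latitude first.
function toGeoRssItem(doc) {
  return (
    "<item>" +
    "<title>" + escapeXml(doc._id) + "</title>" +
    "<georss:point>" + doc.latitude + " " + doc.longitude + "</georss:point>" +
    "</item>"
  );
}

// Wrap the items in a minimal RSS 2.0 channel that declares the georss prefix.
function toGeoRssFeed(channelTitle, docs) {
  return (
    '<?xml version="1.0" encoding="utf-8"?>' +
    '<rss version="2.0" xmlns:georss="http://www.georss.org/georss">' +
    "<channel><title>" + escapeXml(channelTitle) + "</title>" +
    docs.map(toGeoRssItem).join("") +
    "</channel></rss>"
  );
}
```

A KML version would follow the same pattern, with each document becoming a Placemark; note that KML writes coordinates the other way round, longitude first.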

This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’, that is, to what extent it is ready for adoption by web applications developers such as myself who are used to developing solutions on current standard platforms.³

³ In my case, the bulk of my experience has been in C# and ASP.NET. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed as a JSON string [30].

3.1 ‘Of the Web’

The feature of combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-used quote about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform- and OS¹-agnostic.” [31]

Normally, a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

```json
{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}
```

¹ Operating System.

Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence in the document itself. This is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:

```json
{
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}
```

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
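The practical consequence for application code is that an optional field is tested for by its presence rather than compared with NULL. The sketch below (formatAddress is a hypothetical helper) prints whichever address fields a document actually has, and splits on the embedded new line characters, so the same code handles both the John Smith and the British Council Zambia documents.

```javascript
// Turn an address object into printable lines, using only the fields present.
function formatAddress(address) {
  const lines = [];
  if ("streetAddress" in address) {
    // a "\n" inside the stored string marks a line break
    lines.push(...address.streetAddress.split("\n"));
  }
  // Absent fields are undefined, so they are simply left out of the city line.
  const cityLine = [address.city, address.state, address.postalCode]
    .filter((part) => part !== undefined)
    .join(" ");
  if (cityLine !== "") {
    lines.push(cityLine);
  }
  if ("country" in address) {
    lines.push(address.country);
  }
  return lines;
}

const zambia = {
  company: "British Council Zambia",
  address: {
    streetAddress: "Heroes Place\nCairo Road\nPO Box 34571",
    city: "Lusaka",
    country: "Zambia",
  },
};

console.log(formatAddress(zambia.address).join("\n"));
```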

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so that if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available here [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8], which provides a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started, and Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command-line, we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:

```bash
curl -X PUT http://username:password@mick.couchone.com/addressbook
```

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same statement with the HTTP DELETE verb:

```bash
curl -X DELETE http://username:password@mick.couchone.com/addressbook
```

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


```bash
curl -H "Content-Type: application/json" \
     -X POST http://username:password@mick.couchone.com/addressbook \
     -d @john-smith.json
```

cURL syntax is explained here [44]. Briefly:

• -H introduces an HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name

All data in CouchDB is stored in JSON format, together with a unique key and a revision number, so that in CouchDB the resulting document would have _id and _rev values as follows:

```json
{
    "_id": "760cd53c55a93497067f90d6242fc25e",
    "_rev": "1-91bce055fc8db86480400321079f0834",
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}
```
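The _rev value has two parts separated by a hyphen: a counter of how many times the document has been revised (1 for a newly created document) and a hash identifying that particular revision. A small sketch for pulling the parts apart; parseRev is a hypothetical helper, not part of CouchDB’s API.

```javascript
// Split a CouchDB revision string such as "1-91bce055..." into its two parts.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  const generation = Number(rev.slice(0, dash));
  if (dash < 1 || !Number.isInteger(generation) || generation < 1) {
    throw new Error("Not a CouchDB revision string: " + rev);
  }
  return { generation: generation, hash: rev.slice(dash + 1) };
}

const rev = parseRev("1-91bce055fc8db86480400321079f0834");
console.log(rev.generation, rev.hash);
```

The counter is what later updates and deletes advance, but it is the full string, counter and hash together, that CouchDB expects to be sent back in the rev parameter.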

This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator: http://jsonlint.com.


Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

```bash
curl -H "Content-Type: application/json" \
     -X PUT http://username:password@mick.couchone.com/addressbook/british-council-zambia \
     -d @british-council-zambia.json
```

This is the resulting document in CouchDB:

```json
{
    "_id": "british-council-zambia",
    "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}
```

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon


4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database (the URL is quoted so that the shell does not treat the ? as a wildcard):

```bash
curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
```

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.

As we are updating a ‘named’ document, we use the PUT verb instead of POST.

The following command will update John Smith’s record with the contents of the file john-smith-v2.json, supplying the current revision of John Smith’s document shown above:

```bash
curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
     -d @john-smith-v2.json
```
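The revision rule for deletes and updates amounts to optimistic concurrency control. The in-memory class below is a sketch of that behaviour, not CouchDB’s implementation: a write that quotes a stale revision, or no revision at all for an existing document, is refused with a conflict, which CouchDB reports with HTTP status 409. The class name and the "N-toy" revision strings (which lack CouchDB’s hash part) are hypothetical simplifications.

```javascript
// A toy in-memory store that mimics CouchDB's revision checks (a sketch only:
// no hashes in revisions, no deletion tombstones, no HTTP).
class ToyCouch {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  static conflict() {
    return { status: 409, error: "conflict", reason: "Document update conflict." };
  }

  // Create (no rev needed) or update (rev must match the stored revision).
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current !== undefined && rev !== current.rev) {
      return ToyCouch.conflict();
    }
    const generation = current === undefined ? 1 : parseInt(current.rev, 10) + 1;
    const newRev = generation + "-toy"; // CouchDB would append a hash here
    this.docs.set(id, { rev: newRev, body: body });
    return { status: 201, ok: true, id: id, rev: newRev };
  }

  // Delete only when the caller quotes the latest revision.
  remove(id, rev) {
    const current = this.docs.get(id);
    if (current === undefined) {
      return { status: 404, error: "not_found" };
    }
    if (rev !== current.rev) {
      return ToyCouch.conflict();
    }
    this.docs.delete(id);
    return { status: 200, ok: true };
  }
}
```

Because the check happens at write time, two clients that read the same revision cannot both succeed: whichever writes second receives the conflict and has to re-read the document before trying again.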

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/³.

The following command will upload the flag of Zambia (zm.png) as an attachment named flag to the British Council Zambia address record:

```bash
curl -H "Content-Type: image/png" -X PUT \
     "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
     --data-binary @zm.png
```

³ Many thanks to Mark James for creating these files and making them available for free use.


The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

```bash
curl -X PUT http://127.0.0.1:5984/addressbook
```

To replicate the addressbook database to the local machine, we use the following command:

```bash
curl -H "Content-Type: application/json" \
     -X POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"http://username:password@mick.couchone.com/addressbook",
          "target":"http://127.0.0.1:5984/addressbook"}'
```

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping them where necessary:

```
curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"
```
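Another way around the quoting differences between shells is to let a program produce the JSON body. The sketch below (buildReplicationRequest is a hypothetical helper) only assembles the method, URL, headers and body for a POST to /_replicate; it does not send anything. JSON.stringify supplies the double quotes, so no manual escaping is needed on any platform.

```javascript
// Assemble a replication request for CouchDB's /_replicate endpoint.
function buildReplicationRequest(server, source, target) {
  return {
    method: "POST",
    url: server.replace(/\/+$/, "") + "/_replicate", // avoid a double slash
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target }),
  };
}

const req = buildReplicationRequest(
  "http://127.0.0.1:5984/",
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(req.body);

// With a running local CouchDB, the request could be sent with fetch
// (built into browsers and recent versions of Node.js):
// fetch(req.url, { method: req.method, headers: req.headers, body: req.body })
```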

As a result, the replica of the database is now available on the local machine.

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the ‘map’ stage ranges over the document collection and emits zero or more key-value pairs for each document it encounters; the ‘reduce’ stage is used when an aggregate calculation over those intermediate values is required.
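The two stages can be imitated in a few lines of plain JavaScript. The runView function below is a hypothetical, single-process stand-in for what CouchDB does when it builds a view: it calls the map function once per document, collects every pair passed to emit(), sorts the rows by key and, if a reduce function is supplied, folds all the values into one result. It defines its own emit() and sum(), which CouchDB provides to view functions, and it ignores details such as rereduce, grouping by key and CouchDB’s full key-collation rules. The sample documents are a cut-down address book.

```javascript
let rows = [];        // rows emitted during the current map run
let currentId = null; // _id of the document being mapped

// CouchDB provides emit() to map functions; this stand-in records each row.
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// CouchDB also provides sum() to reduce functions.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

function runView(docs, map, reduce) {
  rows = [];
  for (const doc of docs) {
    currentId = doc._id;
    map(doc); // may call emit() zero, one or several times
  }
  // Sort by key (plain < comparison, a simplification of CouchDB collation).
  const sorted = rows.slice().sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  if (!reduce) {
    return { total_rows: sorted.length, rows: sorted };
  }
  const keys = sorted.map((r) => [r.key, r.id]);
  const values = sorted.map((r) => r.value);
  return { rows: [{ key: null, value: reduce(keys, values, false) }] };
}

const addressBook = [
  { _id: "john-smith", lastName: "Smith", address: { city: "New York" } },
  { _id: "british-council-zambia", company: "British Council Zambia", address: { city: "Lusaka" } },
  { _id: "no-address", lastName: "Doe" }, // emits nothing
];

// Map: one row per document that has a city. Reduce: count those rows.
const byCity = function (doc) {
  if (doc.address && doc.address.city) {
    emit(doc.address.city, 1);
  }
};
const count = function (keys, values) {
  return sum(values);
};

console.log(runView(addressBook, byCity).rows);
console.log(runView(addressBook, byCity, count).rows);
```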

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

```json
{
    "_id": "Aberavon",
    "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
    "mp": "Hywel Francis",
    "electorate": 51079,
    "con": 3062,
    "lab": 18077,
    "lib": 4138,
    "pc": 3546,
    "oth": 1278
}
```

CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.

The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function, emitting the Conservative vote as the key and the constituency name as the value.

The second view, ‘Conservative Votes Total’, combines a map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate such as SUM in SQL).

[http://localhost:5984/election-2005/_design/con]

```json
{
    "_id": "_design/con",
    "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
    "language": "javascript",
    "views": {
        "Conservative Votes": {
            "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
            "map": "function(doc) {\n  emit(null, doc.con);\n}",
            "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
    }
}
```

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

```json
{"rows":[{"key":null,"value":8782198}]}
```

It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000. (CouchDB’s query parameters are written in lower case, startkey and endkey, and the end key is inclusive by default.)

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
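When such URLs are built by a program rather than typed, two kinds of encoding are involved. The view name must be percent-encoded (the space becomes %20), and CouchDB expects key, startkey and endkey to be JSON values, so a numeric key is sent as it is while a string key must keep its double quotes (startkey="A" is sent as startkey=%22A%22). The viewUrl helper below is a hypothetical sketch of this; it treats only those three parameters as JSON and passes any others, such as limit, through as plain text.

```javascript
// Query parameters whose values CouchDB parses as JSON.
const JSON_PARAMS = new Set(["key", "startkey", "endkey"]);

// Build the URL for a view query on a CouchDB server.
function viewUrl(server, db, designName, viewName, params) {
  const path = [db, "_design", designName, "_view", viewName]
    .map(encodeURIComponent)
    .join("/");
  const query = Object.entries(params || {})
    .map(([name, value]) => {
      const text = JSON_PARAMS.has(name) ? JSON.stringify(value) : String(value);
      return encodeURIComponent(name) + "=" + encodeURIComponent(text);
    })
    .join("&");
  return server.replace(/\/+$/, "") + "/" + path + (query ? "?" + query : "");
}

console.log(
  viewUrl("http://localhost:5984", "election-2005", "con", "Conservative Votes",
          { startkey: 1000, endkey: 2000 })
);
```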

Figure 4.5: WHERE clause in CouchDB

This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever documents are added to the collection, the index is updated incrementally rather than rebuilt from scratch (by default, CouchDB brings a view’s index up to date the next time the view is queried).

As a result, the entire system is optimised for fast indexed retrieval: it is impossible in CouchDB to write a query which performs a ‘table scan’ at run-time, because every read is an indexed lookup.
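The difference between an indexed lookup and a scan can be illustrated with a sorted array standing in for a view’s B-tree (a deliberate simplification: CouchDB’s real index is an append-only B-tree on disk). Rows are kept ordered by key, a startkey/endkey query is answered with two binary searches, and adding a document inserts its row at the right position instead of re-sorting everything. The function names and all rows other than Aberavon’s are hypothetical; as in CouchDB’s default behaviour, the end key is inclusive.

```javascript
// Index of the first row whose key is >= target (binary search).
function lowerBound(rows, target) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (rows[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Index of the first row whose key is > target (binary search).
function upperBound(rows, target) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (rows[mid].key <= target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// startkey/endkey query; the end key is inclusive.
function rangeQuery(rows, startkey, endkey) {
  return rows.slice(lowerBound(rows, startkey), upperBound(rows, endkey));
}

// Incremental update: place one new row at its sorted position.
function addRow(rows, key, id) {
  rows.splice(upperBound(rows, key), 0, { key: key, id: id });
}

// Index of { key: Conservative vote, id: constituency } rows (mostly hypothetical).
const index = [];
addRow(index, 3062, "Aberavon");
addRow(index, 1500, "Constituency B");
addRow(index, 1800, "Constituency C");
addRow(index, 900, "Constituency D");

console.log(rangeQuery(index, 1000, 2000).map((row) => row.id));
```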

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications with CouchDB, using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a sub-set of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my home PC, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)

```bash
#!/bin/bash
# Upload every JSON file in a folder to CouchDB, one document per file.

# host=http://127.0.0.1:5984/    # local instance
host=http://username:password@mick.couchone.com/
database=universities
folder=universities
fileextension=.json

# create the database
curl -X PUT "$host$database"

for filepath in "$folder"/*"$fileextension"
do
    echo "$filepath"
    # get the file name from the file path, then drop the extension
    # (basename avoids the sed pattern breaking on the "/" in the folder name)
    filename=$(basename "$filepath")
    docname="${filename%"$fileextension"}"
    url="$host$database/$docname"
    echo "$url"
    # put the document into CouchDB at a meaningful URL
    curl -X PUT "$url" -H "Content-Type: application/json" -d @"$filepath"
done
```

Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL (POST is used when we wish CouchDB to assign a random ID to the document).

To put a single document to the web, the command would be as follows:

```bash
curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
     -d @universities/Aberystwyth%20University.json
```
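For larger data sets, an alternative to one PUT per file is CouchDB’s _bulk_docs endpoint, which accepts many documents in a single POST to /universities/_bulk_docs with a body of the form {"docs": [...]}. The sketch below only builds that body from (file name, file contents) pairs. Taking each _id from the file name mirrors the bash script above; decoding the %20 in names such as Aberystwyth%20University.json is my assumption, made so that the ids match those created by the PUT requests, where CouchDB decodes the URL path.

```javascript
// Derive a document _id from a file name such as "Aston%20University.json".
function idFromFileName(fileName) {
  // Strip the extension, then decode %20 and other escapes (hypothetical rule).
  return decodeURIComponent(fileName.replace(/\.json$/, ""));
}

// Build the request body for POST /universities/_bulk_docs.
function buildBulkDocsBody(files) {
  const docs = files.map((file) => {
    const doc = JSON.parse(file.text); // each file holds one JSON document
    doc._id = idFromFileName(file.name);
    return doc;
  });
  return JSON.stringify({ docs: docs });
}

// Illustrative file contents; the coordinates are placeholders.
const body = buildBulkDocsBody([
  { name: "Aston%20University.json", text: '{"latitude": 52.48, "longitude": -1.89}' },
  { name: "Aberystwyth%20University.json", text: '{"latitude": 52.41, "longitude": -4.07}' },
]);
console.log(body);
```

The resulting body could then be saved to a file and sent with a single cURL POST, using -d @ and a Content-Type: application/json header as in Section 4.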

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an _id that begins with _design/ [12].

For the first design document, I have chosen an _id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string which is rendered as output.

```json
{
    "_id": "_design/d1",
    "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
    }
}
```

Note that shows is a reserved field name in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered, and req refers to the request object, which can be used, for example, to retrieve data passed by a query string in the URL. Because the function is stored inside a JSON document, its source has to be written as a JSON string.
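Because a show function is ordinary JavaScript, it can be tried out before uploading by calling it directly with a sample document and request object. The sketch below restates s1 as a normal function; the sample document, its coordinates and the empty req.query object are illustrative assumptions (CouchDB fills req.query from the URL’s query string).

```javascript
// The s1 'show' function from _design/d1, written as ordinary JavaScript.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>';
}

// Illustrative document and request; s1 does not actually read req.
const sampleDoc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
const sampleReq = { query: {} };

console.log(s1(sampleDoc, sampleReq));
```

Testing this way also exposes an edge case: a document without a latitude field would be rendered with the literal text ‘undefined’.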

Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:

```bash
curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json
```

To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML pages.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered

5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University

As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1. The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document is very cumbersome, because the HTML has to be returned as a string from a JavaScript function. For example, every double quote has to be escaped so that the design document remains valid JSON.
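Much of this escaping could be delegated to the machine. The following is a minimal sketch that assumes Node.js is available on the desktop PC. It is not how the design document in this section was produced, and it is not part of CouchDB. The show function is written as ordinary JavaScript and converted to source text with Function.prototype.toString. JSON.stringify then inserts every backslash that the design document needs. The class attribute on the heading is an illustrative addition, included only so that there are double quotes to escape.

```javascript
// The show function written as plain JavaScript. The class attribute is an
// illustrative addition, so that there are double quotes to escape.
const s1 = function (doc, req) {
  return '<h1 class="title">' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// CouchDB expects each function in a design document as a string of source.
const designDoc = {
  _id: '_design/d1',
  shows: { s1: s1.toString() }
};

// JSON.stringify adds the backslashes before the double quotes (and escapes
// the line breaks), giving text that could be saved as d1.json.
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```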

When authoring design documents manually, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon) than to upload documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.

Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.

Here we define a very simple ‘view’ function, which simply outputs the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document id values
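It can help to imitate what CouchDB does with v1 outside the database. The sketch below is a deliberately simplified model, not CouchDB's implementation. It calls the map function once per document with a stand-in emit, collects the emitted key/value pairs, and sorts them by key, which is broadly how the rows of a view come to be ordered. Real CouchDB applies its own collation rules and stores the result in an index on disk. The plain string comparison used here is an assumption made for brevity.

```javascript
// The map function of v1, written as ordinary JavaScript.
const v1Map = function (doc) {
  emit(doc._id, doc._id);
};

// Simplified model of building a view: call the map function on every
// document, collect the emitted rows, then sort them by key.
function buildView(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    // Inside CouchDB, emit is provided by the server. This stand-in records
    // the row together with the id of the document being mapped.
    globalThis.emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc);
  }
  delete globalThis.emit;
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const docs = [
  { _id: 'University of St Andrews' },
  { _id: 'Aston University' },
  { _id: 'University of Aberdeen' }
];

console.log(buildView(v1Map, docs).map((row) => row.key).join(', '));
// Aston University, University of Aberdeen, University of St Andrews
```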

A ‘view’ is, in effect, a query function which returns a set of results. To display the results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

Note the built-in functions start and send:

• start is called once, at the beginning of the function, to set the response headers
• send is called once for each row in the dataset, inside the loop over getRow()
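The interplay of start, getRow and send can also be imitated outside CouchDB. In this sketch the three functions are stand-ins written for illustration only, since CouchDB supplies the real ones to list functions. The head object passed in is likewise a simplified assumption. The list function itself is l1 from the design document above, written as ordinary JavaScript rather than as an escaped string.

```javascript
// l1 from the design document, written as ordinary JavaScript.
const l1 = function (head, req) {
  var row;
  start({ headers: { 'Content-Type': 'text/html' } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + '</a></p>');
  }
};

// Stand-ins for the functions CouchDB provides to list functions: start
// records the response headers, getRow hands out the view rows one at a time
// (returning null when they run out), and send collects each chunk of output.
function runList(listFn, rows) {
  const chunks = [];
  let headers = null;
  let next = 0;
  globalThis.start = (response) => { headers = response.headers; };
  globalThis.getRow = () => (next < rows.length ? rows[next++] : null);
  globalThis.send = (chunk) => { chunks.push(chunk); };
  try {
    listFn({ total_rows: rows.length, offset: 0 }, {});
  } finally {
    delete globalThis.start;
    delete globalThis.getRow;
    delete globalThis.send;
  }
  return { headers, body: chunks.join('') };
}

const result = runList(l1, [{ id: 'Aston University', key: 'Aston University', value: 'Aston University' }]);
console.log(result.headers['Content-Type']); // text/html
console.log(result.body);
```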

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload them to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer’s perspective is that it creates a separate file for each element of the design document. In effect, a CouchDB design document is a web application project, and CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J Chris Anderson. I made a small number of changes to 3 files in my copy of it in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J Chris Anderson and others on the couchdb-user mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3

Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by putting each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The Views folder contains a subfolder for each CouchDB View, holding a map.js file and, when needed, a reduce.js file. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.

Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function, s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1

couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it into HTML.
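Once filled in, shows/s1.js holds only the function itself, with none of the JSON string quoting seen in Section 5. The following is a sketch of what the file might contain. In the CouchApp file the function expression stands alone. Here it is wrapped in parentheses and assigned to a variable, an assumption made only so that it can be evaluated and tried against a trimmed copy of the University of St Andrews document from Appendix D.

```javascript
// Contents of shows/s1.js, parenthesised so that it is a valid expression.
const s1 = (function (doc, req) {
  // doc is the document named in the URL; req describes the HTTP request.
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
});

// A trimmed copy of the document shown in Appendix D.
const standrews = {
  _id: 'University of St Andrews',
  latitude: 56.3412139443169,
  longitude: -2.79301175308608
};

console.log(s1(standrews, {}));
// <h1>University of St Andrews</h1><p>latitude: 56.3412139443169</p><p>longitude: -2.79301175308608</p>
```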

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function, l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.

We recall that in CouchDB a View is, in effect, a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed

6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.
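The core of this merge step can be pictured as turning a tree of files into a nested JSON object. The sketch below is a simplified model, not CouchApp's actual code. It assumes the files have already been read into a map from relative path to contents. It drops each file extension and nests the contents under each path segment, so that views/v1/map.js ends up at views.v1.map and templates/georss.html at templates.georss. The latter is the name the georss.js list function in Appendix C reads. CouchApp itself also handles attachments and other details omitted here.

```javascript
// Simplified model of CouchApp's merge step (not its real implementation):
// each file path becomes a chain of nested keys, with the file extension
// dropped, and the file's text becomes the value at the end of the chain.
function buildDesignDoc(id, files) {
  const ddoc = { _id: id };
  for (const [path, contents] of Object.entries(files)) {
    const parts = path.replace(/\.[^./]+$/, '').split('/');
    let node = ddoc;
    // Create intermediate objects for folders such as 'views' and 'v1'.
    for (const part of parts.slice(0, -1)) {
      if (!node[part]) node[part] = {};
      node = node[part];
    }
    node[parts[parts.length - 1]] = contents;
  }
  return ddoc;
}

// Hypothetical in-memory copy of part of the bc folder (contents abbreviated).
const files = {
  'shows/s1.js': "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  'views/v1/map.js': 'function(doc) { emit(doc._id, doc._id); }',
  'templates/georss.html': '<feed>...</feed>'
};

console.log(JSON.stringify(buildDesignDoc('_design/bc', files), null, 2));
```

Because v1 has no reduce.js, its entry contains only a map function, which matches the advice above to delete the empty file.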

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

Running couchapp push from within the bc folder then uploads the design document to this server. So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this, we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It can iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple View, all, which consists of a single map.js file (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss

This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if a programme has a list of countries, emits the flags of those countries.
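A toy renderer can illustrate how sections such as {{#programmes}} iterate. This is not mustache.js. It supports only a small subset of Mustache: HTML-escaped {{name}} variables and {{#name}}...{{/name}} sections, with no partials, inverted sections or other features of the real library. It is included only to show the rule the template above relies on. A section whose value is a list is rendered once per item. A section whose value is true (such as has_countries) is rendered once. A section whose value is false or an empty list is skipped. The sample data is taken from the St Andrews document in Appendix D.

```javascript
// Toy subset of Mustache, for illustration only (not mustache.js).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Look a name up in the context stack, innermost context first.
function lookup(stack, name) {
  for (let i = stack.length - 1; i >= 0; i--) {
    const ctx = stack[i];
    if (ctx !== null && typeof ctx === 'object' && name in ctx) return ctx[name];
  }
  return '';
}

// Replace {{name}} tags in a stretch of text that contains no sections.
function renderVars(text, stack) {
  return text.replace(/\{\{(\w+)\}\}/g, (tag, name) => escapeHtml(lookup(stack, name)));
}

function render(template, stack) {
  // Leftmost {{#name}} and its closing {{/name}}; sections are assumed not
  // to nest inside another section of the same name.
  const section = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/;
  let output = '';
  let rest = template;
  let match;
  while ((match = section.exec(rest)) !== null) {
    output += renderVars(rest.slice(0, match.index), stack);
    const value = lookup(stack, match[1]);
    if (Array.isArray(value)) {
      // Lists: render the section body once per item.
      for (const item of value) output += render(match[2], stack.concat([item]));
    } else if (value) {
      // Other truthy values (such as has_countries: true): render once.
      output += render(match[2], stack.concat([value]));
    }
    rest = rest.slice(match.index + match[0].length);
  }
  return output + renderVars(rest, stack);
}

// A cut-down version of the programmes/countries loop in georss.html.
const template = '{{#programmes}}<p>{{name}}{{#has_countries}}' +
  '{{#countries}}[{{country}}]{{/countries}}{{/has_countries}}</p>{{/programmes}}';
const stash = {
  programmes: [
    { name: 'Chevening Programme', has_countries: true,
      countries: [{ country: 'id' }, { country: 'mz' }, { country: 'tj' }] },
    { name: 'Commonwealth Scholarship and Fellowship Plan (CSFP)', has_countries: true,
      countries: [{ country: 'bd' }, { country: 'za' }] }
  ]
};

console.log(render(template, [stash]));
// <p>Chevening Programme[id][mz][tj]</p><p>Commonwealth Scholarship and Fellowship Plan (CSFP)[bd][za]</p>
```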

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all

Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB communicates over HTTP, which makes it easy to replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to them even without a good internet connection. Distributed data and application code are important for reliable computing.
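Like everything else in CouchDB, replication is requested over HTTP. A client POSTs a JSON object naming a source and a target database to the server's /_replicate resource. The sketch below only builds that request. The local server address, the local database name and the use of fetch (available in recent versions of Node.js) are assumptions for illustration, and the request is not sent when the code runs. It also assumes that the target database already exists.

```javascript
// Build a request asking a CouchDB server to replicate one database to
// another. Source and target may be local database names or full URLs.
function replicationRequest(server, source, target) {
  return {
    method: 'POST',
    url: server + '/_replicate',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ source: source, target: target })
  };
}

// Hypothetical example: pull the hosted universities database (data and
// design documents alike) into an existing local database of the same name.
const request = replicationRequest(
  'http://127.0.0.1:5984',
  'http://mick.couchone.com/universities',
  'universities'
);

// To send it (not executed here):
//   fetch(request.url, { method: request.method, headers: request.headers,
//                        body: request.body });
console.log(request.body);
// {"source":"http://mick.couchone.com/universities","target":"universities"}
```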

Such replication cannot be achieved easily using conventional web applications. Even with today's mature technology, replicating code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. With CouchDB, by comparison, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs. In addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (While I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents into HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.

CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a map-reduce function, defined in advance in a design document. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need. It is not necessarily more difficult than SQL, but the learning curve is significant.
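Counting institutions per region illustrates the difference. In SQL this might be SELECT region, COUNT(*) FROM institutions GROUP BY region, assuming a hypothetical institutions table. In CouchDB the same need would be met by a view whose map function emits the region with the value 1 and whose reduce function sums the values. Querying that view with group=true returns one row per region. The sketch below pairs such functions with a simplified, in-memory imitation of grouped reduction. It is not CouchDB's implementation, which reduces incrementally over its index and may call the reduce function again on partial results (the rereduce case). The sample documents are illustrative, with region fields like the one in Appendix D.

```javascript
// Map: one row per institution, keyed by region.
const map = function (doc) {
  if (doc.region) emit(doc.region, 1);
};

// Reduce: add up the values. A sum of partial sums is still a sum, so the
// same code is also correct when CouchDB passes rereduce = true.
const reduce = function (keys, values, rereduce) {
  return values.reduce((total, v) => total + v, 0);
};

// Simplified imitation of querying the view with group=true.
function groupedQuery(docs) {
  const groups = new Map();
  for (const doc of docs) {
    globalThis.emit = (key, value) => {
      if (!groups.has(key)) groups.set(key, { keys: [], values: [] });
      // CouchDB passes keys to reduce as [key, document id] pairs.
      groups.get(key).keys.push([key, doc._id]);
      groups.get(key).values.push(value);
    };
    map(doc);
  }
  delete globalThis.emit;
  return [...groups.keys()].sort().map((key) => ({
    key,
    value: reduce(groups.get(key).keys, groups.get(key).values, false)
  }));
}

console.log(JSON.stringify(groupedQuery([
  { _id: 'University of St Andrews', region: 'Scotland' },
  { _id: 'University of Aberdeen', region: 'Scotland' },
  { _id: 'Aston University', region: 'West Midlands' }
])));
// [{"key":"Scotland","value":2},{"key":"West Midlands","value":1}]
```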

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale, low-cost data storage using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1 (source view)

It contains two ‘show’ functions. The first, s1, returns some simple HTML. The second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development. All HTML is returned as a string from a server-side JavaScript function, which means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\">html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }</style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\">function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to 3 files from J Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

// this is further down in templates/edit.html

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : docid,
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View. In the simplest case, this is the View all, which simply returns all documents.

What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code. The section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload .png files from a folder so that each image file becomes an attachment to a CouchDB document.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES="countries/png/*"
# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries
for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under /countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


[15] E F Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C J Date. An Introduction to Database Systems. Pearson Education Inc, 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang: An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A Fox and E A Brewer. Harvest, Yield, and Scalable Tolerant Systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. CouchDB implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat. http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz



data is not stored in the form of normalised tables, and records do not need to conform to any pre-determined schema.

• ‘NoSql’ databases do not generally attempt to provide strong transactional guarantees, since such guarantees are expensive to ensure with very high volumes of data distributed over a large number of nodes.

2.7 ‘NoSql’ databases at large scale

Much of the impetus for this new breed of database product came from the experiences of large-scale web-based services such as Google, Yahoo, Facebook and Amazon. Originally, such services began with single-node databases. As they grew into global brands, they found it increasingly difficult to add sufficient system capacity, and so they started to consider new architectures that could scale to many thousands of nodes in various datacentres around the world [16].

The experience of these large-scale data providers was that, in many application domains, it was acceptable to relax the strict transactional guarantees provided by the relational DBMS. For example, in a social networking application, availability is far more important than consistency in its strictest form.

To give an example, if Alice in London updates her Facebook status, it doesn’t really matter if Bob in Singapore sees a ‘stale’ version for a few seconds. In a foreign exchange trading application, of course, those few seconds would make a huge difference. ‘NoSql’ systems are generally intended for the Facebook scenario. Foreign exchange trading requires very strict data integrity, such as that provided by traditional relational database management systems like Oracle.

In the Facebook example, we may be content with weaker forms of consistency. The data is held in replicas around the world, so that neither Alice nor Bob is kept waiting too long to see their Facebook pages. These replicas are not guaranteed to be in the same state all the time, but they are guaranteed to converge eventually to a consistent state.
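The idea of replicas that converge ‘eventually’ can be illustrated with a toy JavaScript model. It is a sketch only, not CouchDB’s replication algorithm. Each replica maps a key to a value plus a version number. Replicas exchange entries under one deterministic rule: the higher version wins, and ties are broken by origin name. Because every replica applies the same rule, all replicas end up agreeing once changes have flowed in both directions. The replica names and the status key are illustrative assumptions.

```javascript
// Toy model of eventual consistency between two replicas (not CouchDB's
// replication protocol). Each replica maps key -> { value, version, origin }.
function write(replica, origin, key, value) {
  const current = replica.get(key);
  const version = current ? current.version + 1 : 1;
  replica.set(key, { value, version, origin });
}

// Deterministic winner: the higher version wins; on a tie (concurrent writes),
// the entry whose origin name sorts last wins, identically on every replica.
function beats(a, b) {
  if (a.version !== b.version) return a.version > b.version;
  return a.origin > b.origin;
}

// One-way synchronisation: copy each entry of `source` that beats `target`'s.
function pull(target, source) {
  for (const [key, entry] of source) {
    const mine = target.get(key);
    if (!mine || beats(entry, mine)) target.set(key, { ...entry });
  }
}

const london = new Map();
const singapore = new Map();

// Both replicas start in agreement.
write(london, "london", "alice/status", "Working from home");
pull(singapore, london);

// Alice updates her status in London; Bob still reads the old value in Singapore.
write(london, "london", "alice/status", "Off to the park");
console.log(singapore.get("alice/status").value); // Working from home

// After the replicas synchronise, both hold the same value.
pull(singapore, london);
pull(london, singapore);
console.log(singapore.get("alice/status").value); // Off to the park
```

In this model the stale read between the two `pull` calls is exactly the window of inconsistency that the Facebook scenario tolerates.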

Holding replica data sets in multiple locations also promotes partition tolerance, that is, the ability to keep the service running despite hardware or network failure. If the London datacenter were to go off-line, Alice would still be able to get hold of her data, albeit more slowly, from one of the other datacenters around the world.

In a paper describing the system used by Yahoo [16], the authors describe this approach to consistency as follows:

“Occasionally, an entire datacenter will go down (e.g., if the power is cut) or become unreachable (e.g., if the network cable is cut), and then any records mastered in that datacenter will become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”

2.8 Brewer’s ‘CAP’ Theorem

The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance, that is, the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential, but strict consistency is not, so a looser model of consistency is considered a price worth paying.

This realisation (that ‘eventual’ consistency is often sufficient) has unleashed a large amount of innovative effort in database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex. The newer systems have had the luxury of forgoing such development and concentrating solely on scalable, available data storage solutions.

In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors. The burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the way that traditional relational systems do. It may be a little unfair to describe them like this, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. As a consequence, the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere, from web servers to desktops and even hand-held devices and mobile phones.
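In CouchDB, the ‘rudimentary validation’ mentioned above takes the form of an optional JavaScript function. It is stored as a string under the `validate_doc_update` key of a design document. CouchDB calls this function with the new document, the old document and the user context on every write, and a write is rejected if the function throws `{forbidden: ...}`. The sketch below holds the function as an ordinary JavaScript value so that it can be exercised locally. The rule it enforces, that every address book entry needs a name, is a hypothetical example.

```javascript
// Sketch of a CouchDB validation function. In CouchDB it would be stored as a
// string under "validate_doc_update" in a design document; here it is a plain
// function so it can be called directly. The naming rule is hypothetical.
const validateDocUpdate = function (newDoc, oldDoc, userCtx) {
  if (newDoc._deleted) return; // always allow deletions
  if (!newDoc.company && !(newDoc.firstName && newDoc.lastName)) {
    // Throwing {forbidden: ...} is how a validation function rejects a write.
    throw { forbidden: "An entry needs a company or a first and last name." };
  }
};

// Local harness: report whether a write would be accepted.
function tryWrite(doc) {
  try {
    validateDocUpdate(doc, null, { name: null, roles: [] });
    return "accepted";
  } catch (err) {
    return "rejected: " + err.forbidden;
  }
}

console.log(tryWrite({ company: "British Council Zambia" })); // accepted
console.log(tryWrite({ age: 25 })); // rejected: An entry needs a company or a first and last name.
```

Everything beyond this kind of per-document check, such as referential integrity between documents, remains the application’s responsibility.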


2.9 Reducing the Impedance Mismatch

The term ‘object-relational impedance mismatch’ refers to a set of difficulties that object-oriented application developers encounter when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools has emerged. Examples are Active Record [41], Hibernate [9] and Entity Framework [39]. Application developers use these tools very widely to ‘bridge the gap’ between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain.²

Another tactic used by application developers is to use SQL views. This effectively de-normalises the data, so that the data model is closer to the application’s object model.

A key advantage of ‘NoSql’ systems is that they accept data of any structure, so the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
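The contrast can be made concrete with a small JavaScript sketch. It persists the same person object in two ways: once as a single serialised document, and once as two hypothetical ‘tables’ that have to be joined back together on every read. The in-memory `Map` and arrays are stand-ins for a real document store and real relational tables.

```javascript
// Hypothetical illustration: one object persisted two ways.
const person = {
  firstName: "John",
  lastName: "Smith",
  phoneNumber: [
    { type: "home", number: "212 555-1234" },
    { type: "fax", number: "646 555-4567" },
  ],
};

// Document style: the object is serialised as-is under a single key.
const documentStore = new Map();
documentStore.set("john-smith", JSON.stringify(person));
const fromDocument = JSON.parse(documentStore.get("john-smith"));

// Relational style: the nested array is split into a second "table" with a
// foreign key, and the object must be reassembled (joined) on every read.
const persons = [{ id: 1, firstName: person.firstName, lastName: person.lastName }];
const phones = person.phoneNumber.map((p) => ({ personId: 1, ...p }));

function loadPerson(id) {
  const row = persons.find((r) => r.id === id);
  return {
    firstName: row.firstName,
    lastName: row.lastName,
    phoneNumber: phones
      .filter((p) => p.personId === id)
      .map(({ type, number }) => ({ type, number })),
  };
}
const fromTables = loadPerson(1);
```

Both paths return the same object. The mapping code in `loadPerson` is the part that ORM tools generate and that a document store makes unnecessary.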

2.10 Benefits of ‘NoSql’

As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law: disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.

• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate ‘NoSql’ databases at scale, whereas managing clusters of relational database management systems is very expensive. It is better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. In effect, the persistence layer uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam War.


2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’

It comes as little surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support or, more generally, the lack of support for ad-hoc querying. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions, and there is little capability to query the system in an arbitrary way.
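To show what ‘pre-computed’ means in practice, here is a CouchDB-style map function in JavaScript, CouchDB’s default view language. It calls `emit(key, value)` for each document it wants to index. CouchDB runs such functions ahead of time and stores the emitted rows in a sorted index, so a query is a lookup into that index rather than an arbitrary search. The `buildView` harness is a hypothetical local stand-in for CouchDB’s view engine, and the field names follow the address book examples used later in this report.

```javascript
// A CouchDB-style map function: index address book entries by city.
const mapByCity = function (doc) {
  if (doc.address && doc.address.city) {
    emit(doc.address.city, doc.company || doc.firstName + " " + doc.lastName);
  }
};

// Hypothetical stand-in for CouchDB's view engine: run the map function over
// every document, collect the emitted rows, and sort them by key.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    globalThis.emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc);
  }
  delete globalThis.emit;
  // Plain string comparison here; CouchDB sorts keys with its own collation rules.
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const docs = [
  { _id: "john-smith", firstName: "John", lastName: "Smith", address: { city: "New York" } },
  { _id: "british-council-zambia", company: "British Council Zambia", address: { city: "Lusaka" } },
  { _id: "no-address", firstName: "Ann", lastName: "Lee" }, // emits nothing
];

const view = buildView(docs, mapByCity);
// A query such as ?key="Lusaka" then amounts to a lookup in the sorted rows.
console.log(view.filter((row) => row.key === "Lusaka"));
```

A question that no existing view anticipates, such as ‘all entries aged over 30’, needs a new map function and a fresh index build. This is the limitation described above.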

(A similar ‘NoSql’ product, MongoDB, does allow SQL-like dynamic querying [38].)

Very often, a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well known in advance. The data itself comes from a relational database, and business teams use SQL against that relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database used purely for reporting. The established and more expensive relational system can remain in use for both transactional processing and ad-hoc SQL querying.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which I have been developing for the British Council [2].

The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories worldwide. In 2009-2010, the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. The site consists of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.


Figure 2.1: British Council project data displayed on Bing Maps

The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats. The source is an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written in C# and ASP.Net using Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. As a result of this work, the British Council as an organisation now has a unified view of the work undertaken with each partner institution throughout the UK.

Outputs from the Activity Mapping project have included:

• letters to MPs;
• on-line maps for each MP and local authority;
• reports for overseas education ministers;
• a Google Earth presentation for display in the reception area of our offices.


Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate how far it is possible to use CouchDB to store the Activity Mapping data as documents rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council’s web server. A logical next step would be to build a back-end database that serves the data dynamically.

The map data is read-only, since it reports the activity for the previous year. I therefore feel that something like CouchDB would be a good fit for the back-end data store, in effect using CouchDB as a web caching layer.

The data is already structured, having been exported from the British Council’s relational database. The main tasks for CouchDB would be to store the reporting data in JSON format, and to transform this data on demand into GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth and similar services.
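As a rough sketch of the JSON-to-GeoRSS transformation, the following JavaScript turns a hypothetical project document into an RSS `<item>` that uses the GeoRSS-Simple `<georss:point>` element. That element holds latitude and longitude separated by a space. The field names (`title`, `lat`, `lng`) and the approximate coordinates for Lusaka are illustrative assumptions, not the project’s actual schema. The enclosing `<rss>` element would also need to declare the `georss` namespace (`http://www.georss.org/georss`).

```javascript
// Sketch: convert a (hypothetical) project document into a GeoRSS-Simple RSS item.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function toGeoRssItem(doc) {
  // GeoRSS-Simple encodes a point as "latitude longitude", separated by a space.
  return [
    "<item>",
    `  <title>${escapeXml(doc.title)}</title>`,
    `  <georss:point>${doc.lat} ${doc.lng}</georss:point>`,
    "</item>",
  ].join("\n");
}

// Illustrative document; the coordinates are approximate.
const item = toGeoRssItem({ title: "British Council Zambia", lat: -15.42, lng: 28.28 });
console.log(item);
```

Inside CouchDB, a transformation like this would typically live in a list or show function, so that the server produces the XML on request.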

This report describes my investigation into writing an application with CouchDB to fulfil these requirements. I want to explore how ready this new technology is for ‘prime time’, that is, how far it is ready for adoption by web application developers such as myself who are used to developing solutions on current standard platforms.³

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience of working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed as a JSON string [30].

3.1 ‘Of the Web’

Combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project, because all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB can therefore dispense with the object-relational mapping layers and database connectivity libraries that are required to communicate with standard DBMSs [52].
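To illustrate that nothing beyond an HTTP client is needed, here is a minimal JavaScript sketch. It builds the arguments for the standard `fetch` function, which is built into Node 18+ and modern browsers, for requests to a CouchDB server. The server address is a placeholder assumption (a local CouchDB listens on port 5984 by default), and the helper name `couchRequest` is hypothetical.

```javascript
// Sketch: talking to CouchDB needs only an HTTP client. COUCH_URL is a
// placeholder assumption; a local install listens on port 5984 by default.
const COUCH_URL = "http://127.0.0.1:5984";

// Hypothetical helper: build the [url, init] arguments for fetch().
function couchRequest(method, path, body) {
  const init = { method, headers: { Accept: "application/json" } };
  if (body !== undefined) {
    init.headers["Content-Type"] = "application/json";
    init.body = JSON.stringify(body);
  }
  return [new URL(path, COUCH_URL).toString(), init];
}

// Usage against a running server (Node 18+ provides a global fetch):
//   const res = await fetch(...couchRequest("GET", "/"));
//   console.log(await res.json()); // the server's welcome message and version
const [url, init] = couchRequest("PUT", "/addressbook"); // create a database
console.log(init.method, url); // PUT http://127.0.0.1:5984/addressbook
```

The same requests could be sent from Python, C#, PHP or a shell script. The only shared dependency is HTTP.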

The other advantage of combining a web server and a database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data as a web page can be stored in CouchDB design documents. This project takes advantage of this innovative storage of data and code in one place.
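As an illustration of presentation code living beside the data, the sketch below shows a design document holding a ‘show’ function. A show function receives a document and the request, and returns the response body. CouchDB stores each function as a string and would serve this one at a URL of the form `/addressbook/_design/addressbook/_show/address/<doc id>`. The design document name, the show name `address` and the HTML markup are illustrative assumptions. The sketch also omits HTML escaping.

```javascript
// Sketch of a design document holding presentation code next to the data.
// Names and markup are illustrative; CouchDB stores the function as a string.
const designDoc = {
  _id: "_design/addressbook",
  shows: {
    address: `function (doc, req) {
      return "<h1>" + (doc.company || doc.firstName + " " + doc.lastName) + "</h1>" +
             "<p>" + doc.address.city + "</p>";
    }`,
  },
};

// Local check only: compile the stored string and call it with a sample document.
const showAddress = eval("(" + designDoc.shows.address + ")");
const html = showAddress({ company: "British Council Zambia", address: { city: "Lusaka" } }, {});
console.log(html); // <h1>British Council Zambia</h1><p>Lusaka</p>
```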

An often-quoted remark about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities that emerge from combining web and database technologies in a single product:

“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]

A web development project normally requires two separate servers: a database server and a web application server. CouchDB combines both in a single product, which makes it interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

Damien Katz started CouchDB in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system that can be used as a platform for web applications.

Katz gives a very interesting 30-minute talk about his motivation for the CouchDB project [32]. In it, he describes his decision to sell his house, move his family and live on savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a ‘document-oriented’ database, in which each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and in recent years it has established itself as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}

¹ Operating System

Consider the use of NULL values in relational databases. In a relational database table, we would usually represent an address in a country that doesn’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and similar systems, a document for such an address simply omits the postalCode field. This is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so if we wish, we may legitimately store the following document in the very same ‘addressbook’ database:

{
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}

(Note that \n denotes a new line character, so in this case the streetAddress spans three separate printed lines.)
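The two documents above differ in shape, but a client can still handle both. The JavaScript sketch below formats either document as a printable address. A field that is absent simply produces no output line, so there is no NULL case to handle. `formatAddress` is a hypothetical helper written for this illustration.

```javascript
// Hypothetical helper: format an address book document for display.
// Missing fields are simply skipped; no NULL handling is needed.
function formatAddress(doc) {
  const name = doc.company ?? `${doc.firstName} ${doc.lastName}`;
  const a = doc.address ?? {};
  const lines = [name];
  if (a.streetAddress) lines.push(...a.streetAddress.split("\n"));
  const cityLine = [a.city, a.state, a.postalCode]
    .filter((part) => part !== undefined) // a missing postalCode just disappears
    .join(" ");
  if (cityLine) lines.push(cityLine);
  if ("country" in a) lines.push(a.country);
  return lines.join("\n");
}

const john = {
  firstName: "John",
  lastName: "Smith",
  address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" },
};
const zambia = {
  company: "British Council Zambia",
  address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" },
};

console.log(formatAddress(john));
console.log(formatAddress(zambia));
```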

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB in Erlang, a programming language that Ericsson developed to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language that avoids mutable data. Processes share no memory state, so if an Erlang process encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. J. Chris Anderson, one of the project’s core developers, describes how using Erlang together with append-only B-Tree structures for data promotes reliability:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Every ten minutes, Desktop Couch can communicate with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch, and a background process then synchronises the bookmarks to the UbuntuOne server. When the user later goes to another machine, the Desktop Couch on that machine can fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), so the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustrating this is available [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To this end, they have developed ‘Quickly’ [8], which gives beginner programmers a way to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started, and Quickly is similar. The innovative aspect is that Quickly uses Desktop Couch for persistence, so an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that data stored in Desktop Couch is replicated to the UbuntuOne service, so it can also be retrieved on other machines or devices running CouchDB.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

I have included this section because CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the 1.0 release, see [47].)

Because CouchDB is still quite new, its documentation can at times be difficult to find or follow. I therefore wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption by the community of ‘corporate’ web application developers. I have worked for about 10 years as a Microsoft-platform application developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], which is also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

If you are following this section, I strongly recommend signing up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

After signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to using the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers for Windows and Mac are available at the same URL. I installed the version that was current at the time of writing, 1.0.1.

(As already noted, Ubuntu comes with an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne, and this is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I went back to first principles and used command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use ‘cURL’, a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(Here, username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same statement with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(If you wish to follow the steps in this tutorial section, please re-create the database now.)

In Section 3, we saw a JSON document containing the address details of a certain John Smith, who lives in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


curl -H "Content-Type: application/json" \
     -X POST http://username:password@mick.couchone.com/addressbook \
     -d @john-smith.json

The cURL syntax is explained in the cURL manual page [44]. Briefly:

• -H adds an HTTP header (here, the Content-Type of the request)

• -X POST instructs cURL to use an HTTP POST request instead of a GET

• -d introduces the data to be sent

• @ denotes that what follows is a file name

CouchDB stores all data in JSON format, together with a unique key and a revision number. The resulting document therefore has _id and _rev values, as follows:

{
    "_id": "760cd53c55a93497067f90d6242fc25e",
    "_rev": "1-91bce055fc8db86480400321079f0834",
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}

This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we can paste it into the JSONLint validator at http://jsonlint.com.


Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may want CouchDB to use meaningful document id values rather than a meaningless string of random characters.

For example, it would be useful to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia

To achieve this, we use HTTP PUT instead of POST and specify the URL we wish to ‘put’ to, as follows:

curl -H "Content-Type: application/json" \
     -X PUT \
     http://username:password@mick.couchone.com/addressbook/british-council-zambia \
     -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
    "_id": "british-council-zambia",
    "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}

As a result, this document can be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. A client application can consume and transform the resulting JSON output using JavaScript or another programming language.

Figure 4.3: Document viewed using Futon

4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
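The URL in the command above can also be built programmatically. The sketch below is a hypothetical helper: `deleteUrl` is an illustrative name, and credentials are left out. It percent-encodes the id and the revision, so ids containing spaces (such as the university ids used later) still produce a valid URL. The request itself would then be sent with the DELETE method.

```javascript
// Hypothetical helper: build the URL for a CouchDB DELETE request.
// The database name, document id and revision are percent-encoded so that
// ids containing spaces or other reserved characters still form a valid URL.
function deleteUrl(host, database, docId, rev) {
  const path = `${host}/${encodeURIComponent(database)}/${encodeURIComponent(docId)}`;
  return `${path}?rev=${encodeURIComponent(rev)}`;
}

const url = deleteUrl(
  "http://mick.couchone.com",
  "addressbook",
  "british-council-zambia",
  "1-da7bcd810c608d6fbcb9ce92e9ade343"
);
console.log(url);
```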

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied. This ensures that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error (HTTP status 409) if an update is attempted on an ‘old’ version of a document.

As we are updating a ‘named’ document, we use the PUT verb instead of POST.

The following command will update John Smith’s record with the contents of the file john-smith-v2.json:

curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
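The revision check can be modelled in a few lines. The following is a toy, in-memory sketch, and the `ToyStore` class is hypothetical: it is not how CouchDB is implemented. A write or delete succeeds only when it quotes the document's current revision. Otherwise it is refused with status 409, mirroring CouchDB's ‘Document update conflict’.

```javascript
// Toy in-memory model of CouchDB-style revision checking (illustration only).
// Real revisions have the form "N-<hash>" and are generated by the server;
// a real delete also leaves a "deleted" tombstone revision behind.
class ToyStore {
  constructor() {
    this.docs = new Map();
  }

  // Create a document, or update it when the caller quotes its current revision.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) {
      // Stale or missing revision: refuse, as CouchDB does with HTTP 409.
      return { status: 409, error: "conflict" };
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const newRev = `${generation}-toy`;
    this.docs.set(id, { ...body, _id: id, _rev: newRev });
    return { status: 201, ok: true, id, rev: newRev };
  }

  // Delete a document; again the current revision must be supplied.
  remove(id, rev) {
    const current = this.docs.get(id);
    if (!current) return { status: 404, error: "not_found" };
    if (current._rev !== rev) return { status: 409, error: "conflict" };
    this.docs.delete(id);
    return { status: 200, ok: true };
  }
}

const store = new ToyStore();
const created = store.put("john-smith", { firstName: "John", lastName: "Smith" });
// Hypothetical new contents for the update.
const updated = store.put("john-smith", { firstName: "John", age: 26 }, created.rev);
const stale = store.put("john-smith", { firstName: "Johnny" }, created.rev);
console.log(created.rev, updated.rev, stale.status); // 1-toy 2-toy 409
```

The third write fails because it quotes revision 1-toy after the document has already moved on to 2-toy. This is the situation the rev parameter is designed to catch.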

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/.³

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.


The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'

Note that if we are using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping them where necessary:

curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
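One way to sidestep these shell-quoting differences is to let a program build the request body. The sketch below assumes a Node.js client. `replicationBody` is a hypothetical helper, and sending the body is omitted; it would go as a POST to /_replicate with Content-Type: application/json.

```javascript
// Hypothetical helper: JSON.stringify inserts all quotes and escapes itself,
// so the body is identical whichever shell or operating system the client uses.
function replicationBody(source, target) {
  return JSON.stringify({ source, target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
```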

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23], which describes how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages. The ‘map’ stage ranges over the document collection and emits zero or more key/value pairs for each document it encounters. The ‘reduce’ stage is used when a calculation over those intermediate values is required.

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb/), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing the electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code as opposed to data.

The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.

The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function. It emits each constituency’s Conservative vote as the key and the constituency name as the value.

The second view, ‘Conservative Votes Total’, combines a map function with a reduce function to output the sum of all Conservative votes. This is analogous to an aggregate such as SUM in SQL; grouping the reduced results by key would correspond to GROUP BY.

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}
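The two stages can be seen concretely by running the con view functions outside CouchDB. In this sketch, emit and sum are simplified stand-ins for helpers that CouchDB’s JavaScript view server normally supplies, and runMap is a hypothetical helper. The second document is invented purely to show aggregation; it is not real election data.

```javascript
// Simplified stand-ins for helpers that CouchDB's JavaScript view server
// normally provides; rows are collected in an array instead of a B-tree.
let emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// The functions from _design/con, as written in the design document.
const conservativeVotesMap = function (doc) { emit(doc.con, doc._id); };
const totalMap = function (doc) { emit(null, doc.con); };
const totalReduce = function (keys, values) { return sum(values); };

// Hypothetical helper: map every document, then sort rows by key
// (numeric keys only; CouchDB's real collation covers all JSON types).
function runMap(docs, map) {
  emitted = [];
  docs.forEach((doc) => map(doc));
  return emitted.slice().sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const docs = [
  { _id: "Aberavon", mp: "Hywel Francis", con: 3062, lab: 18077 },
  { _id: "Hypothetical Seat", con: 1500 }, // invented to show aggregation
];

// 'Conservative Votes': rows come back ordered by the number of votes.
const byVotes = runMap(docs, conservativeVotesMap);

// 'Conservative Votes Total': the reduce step folds all mapped values into one.
// (CouchDB passes [key, docId] pairs as keys; this reduce ignores them.)
const mapped = runMap(docs, totalMap);
const total = totalReduce(mapped.map((row) => [row.key, null]), mapped.map((row) => row.value));

console.log(byVotes.map((row) => row.value), total);
```

For these two documents, the first view lists ‘Hypothetical Seat’ before ‘Aberavon’ (1500 before 3062), and the reduce step yields 4562.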

It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB

This method of retrieving data is certainly cumbersome compared to using SQL. One criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever documents are added or changed, the index is updated incrementally rather than rebuilt from scratch. CouchDB brings it up to date the next time the view is queried.

As a result, the entire system is optimised for fast indexed retrieval. A query against a stored view cannot perform a ‘table scan’ at run-time, because every read is an indexed lookup.
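The following sketch illustrates, under simplifying assumptions, why a key-range request such as the one above is an indexed lookup rather than a scan. View rows are held sorted by key; CouchDB uses a B-tree, and a sorted array stands in for it here. The first matching row can therefore be found by binary search. The sketch assumes numeric keys and uses hypothetical rows and helper names (`lowerBound`, `keyRange`, `rangeQuery`). It also shows that startkey and endkey are passed as JSON values, which matters for string keys such as constituency names.

```javascript
// View rows sorted by key, standing in for CouchDB's B-tree index.
// Hypothetical rows for illustration; only the Aberavon figure comes from the data above.
const rows = [
  { key: 950, value: "Seat A" },
  { key: 1200, value: "Seat B" },
  { key: 1999, value: "Seat C" },
  { key: 3062, value: "Aberavon" },
];

// Binary search: index of the first row whose key is >= target.
function lowerBound(sortedRows, target) {
  let lo = 0;
  let hi = sortedRows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedRows[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Rows with startkey <= key <= endkey: jump to the start, then read forward
// only until the keys pass endkey, so the whole index is never scanned.
function keyRange(sortedRows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(sortedRows, startkey); i < sortedRows.length && sortedRows[i].key <= endkey; i++) {
    result.push(sortedRows[i]);
  }
  return result;
}

// CouchDB expects startkey and endkey to be JSON values in the query string.
function rangeQuery(viewUrl, startkey, endkey) {
  const params = new URLSearchParams({
    startkey: JSON.stringify(startkey),
    endkey: JSON.stringify(endkey),
  });
  return `${viewUrl}?${params}`;
}

console.log(keyRange(rows, 1000, 2000).map((row) => row.value));
console.log(
  rangeQuery("http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes", 1000, 2000)
);
```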

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents. It stores HTML and JavaScript application code in so-called ‘Design Documents’.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


SECTION 5 SERVING HTML FROM COUCHDB 32

Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)

#!/bin/bash
# Upload each JSON file in $folder to CouchDB as a separate document,
# using the file name (minus its extension) as the document id.

#host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json

# create the database
curl -X PUT "$host/$database"

for filepath in "$folder"/*"$fileextension"
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(basename "$filepath")
  echo "$filename"
  # strip the extension to obtain the document id
  docname=$(basename "$filepath" "$fileextension")
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB (@ makes cURL read the body from the file)
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)

To put a single document to the web, the command used would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities
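As an alternative to one PUT request per file, CouchDB also offers a bulk endpoint, _bulk_docs. It accepts a single POST whose body has the form {"docs": [...]}. This is a different technique from the script above, so the following is only a sketch. `bulkDocsBody` is a hypothetical helper, reading the files from disk is omitted, and the document contents are placeholders. Each _id is derived from the file name as in the bash script, with %20 decoded back to a space, because an id inside a JSON body is not URL-encoded.

```javascript
const path = require("path");

// Hypothetical helper: build the body of POST /<database>/_bulk_docs.
// Each _id is derived from the file name, as in the bash script; the
// percent-encoding in the file names is decoded, because an id inside a
// JSON body (unlike one in a URL path) is not URL-encoded.
function bulkDocsBody(files) {
  const docs = files.map(({ name, contents }) => ({
    ...JSON.parse(contents),
    _id: decodeURIComponent(path.basename(name, ".json")),
  }));
  return JSON.stringify({ docs });
}

// Placeholder contents; the real files hold the exported university data.
const body = bulkDocsBody([
  { name: "universities/Aberystwyth%20University.json", contents: '{"example": true}' },
  { name: "universities/Aston%20University.json", contents: '{"example": true}' },
]);
console.log(body);
```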

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].

For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) {
      return '<h1>' + doc._id + '</h1>' +
        '<p>latitude: ' + doc.latitude + '</p>' +
        '<p>longitude: ' + doc.longitude + '</p>';
    }"
  }
}

Note that shows is a reserved field name in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req. doc refers to the document being rendered. req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
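Because a show function is ordinary JavaScript, it can be tried outside CouchDB by calling it directly. In this sketch the document and its coordinates are placeholders. `s1WithTitle` is a hypothetical variant showing how `req.query` (the parsed query string) might be used. Inside CouchDB, the request object is supplied by the server.

```javascript
// The s1 show function from _design/d1, as ordinary JavaScript.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical variant: read a value from the query string via req.query.
function s1WithTitle(doc, req) {
  const title = (req.query && req.query.title) || doc._id;
  return '<h1>' + title + '</h1>';
}

// Placeholder document; the coordinates are not real data.
const doc = { _id: "Example University", latitude: 1.5, longitude: -2.25 };
console.log(s1(doc, {}));
console.log(s1WithTitle(doc, { query: { title: "Hello" } }));
```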

Having saved the design document as d1.json, we can upload it to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 \
  -d @d1.json

To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML files.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB

• universities is the database

• _design/d1 is the design document

• _show/s1 is the ‘show’ function that returns an HTML string

• Aston%20University is the id of the document to be rendered
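The same breakdown can be expressed as a small function. `showUrl` is a hypothetical helper. The document id is percent-encoded, which is where the %20 in the URL comes from.

```javascript
// Hypothetical helper assembling a _show URL from the parts listed above.
// The document id is percent-encoded, which turns spaces into "%20".
function showUrl(host, database, designDoc, showName, docId) {
  const parts = [host, database, "_design", designDoc, "_show", showName];
  return parts.join("/") + "/" + encodeURIComponent(docId);
}

console.log(showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University"));
```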


5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University

As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string remains valid.

While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.


Figure 5.5: Documents may be edited directly using Futon
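The escaping burden disappears if the function is written as ordinary code and the design document is generated from it. The sketch below, assuming Node.js, does this with JSON.stringify. This is broadly the kind of packaging that tools such as CouchApp (Section 6) take over. The shortened `map1` body is a placeholder, not the Appendix A code.

```javascript
// Placeholder show function (not the Appendix A code): its HTML contains
// double quotes, which would need escaping if typed into JSON by hand.
function map1(doc, req) {
  return '<div id="map_canvas" title="' + doc._id + '"></div>';
}

// CouchDB stores functions as source-code strings, so the design document
// is generated from the function's own source text.
const designDoc = {
  _id: "_design/d1",
  shows: { map1: map1.toString() },
};

// JSON.stringify inserts every backslash escape automatically.
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```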

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.

Here we define a very simple ‘view’ function, which outputs the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": {...},
  "lists": {...},
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document id values

A ‘view’ is in effect a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple View, simply emitting the document ids

• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {...},
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ 'headers': { 'Content-Type': 'text/html' } });
      while (row = getRow()) {
        send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


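Note the built-in functions start, getRow and send:

• start is called once, at the beginning of the function, to set the response headers

• getRow returns the next row of the view result, until the rows run out

• send is called once for each row in the dataset, writing a chunk of HTML to the response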

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
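To see what the list machinery does, the sketch below runs l1 locally. `start`, `getRow` and `send` are simplified stand-ins for the helpers CouchDB supplies, and the two rows mimic the output of v1. Unlike the listing above, this version percent-encodes the id inside the link, so ids containing spaces produce valid URLs.

```javascript
// Simplified stand-ins for the helpers CouchDB gives list functions.
// The rows mimic the output of view v1 (key and value are both the doc id).
const viewRows = [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
  { id: "University of Aberdeen", key: "University of Aberdeen", value: "University of Aberdeen" },
];
let nextRow = 0;
let responseHeaders = null;
const chunks = [];

function start(response) { responseHeaders = response.headers; }
function getRow() { return nextRow < viewRows.length ? viewRows[nextRow++] : null; }
function send(chunk) { chunks.push(chunk); }

// l1, rewritten with the id percent-encoded inside the link.
const showPrefix = "http://mick.couchone.com/universities/_design/default/_show/googlemap/";
function l1(head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="' + showPrefix + encodeURIComponent(row.value) + '">' +
      row.value + '</a></p>');
  }
}

// CouchDB passes a head object (with total_rows and offset) and the request.
l1({ total_rows: viewRows.length, offset: 0 }, {});
console.log(responseHeaders["Content-Type"]);
console.log(chunks.join("\n"));
```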

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp, from an application developer’s perspective, is that it creates a set of separate files for each element of the design document. In effect, a CouchDB design document is a web application project. CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


SECTION 6 SERVING GEORSS USING COUCHAPP 40

I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files on my copy of it in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.

• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.

• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
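The correspondence between the folder layout and the design document can be sketched as follows. This illustrates the idea and is not CouchApp’s own code. `filesToDesignDoc` is hypothetical, and it uses the lowercase folder names (views, shows, lists) that CouchApp generates.

```javascript
// Illustration (not CouchApp's actual code) of the idea behind its folder
// layout: each file path becomes a nested field of the design document,
// and the file's contents become that field's string value.
function filesToDesignDoc(id, files) {
  const ddoc = { _id: id };
  for (const [filePath, contents] of Object.entries(files)) {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    const parts = filePath.replace(/\.js$/, "").split("/");
    let node = ddoc;
    for (const part of parts.slice(0, -1)) {
      node[part] = node[part] || {};
      node = node[part];
    }
    node[parts[parts.length - 1]] = contents;
  }
  return ddoc;
}

const ddoc = filesToDesignDoc("_design/bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
});
console.log(JSON.stringify(ddoc, null, 2));
```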


Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function, l1.

In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js.

We recall that in CouchDB a View is in effect a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5; with CouchApp, the upload is performed by the couchapp push command. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section, we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this, we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss


This creates a file called georss.js.

I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
      {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
          {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
          {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
      {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, emits the flags of those countries.

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
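The actual georss.js is given in Appendix C. The sketch below is not that code, only a hedged illustration of the data shape the template above expects. It assumes that each university document carries latitude, longitude and an optional programmes array of { name, countries } objects. That document shape, the helper `packageForTemplate` and the sample values are all hypothetical.

```javascript
// Hypothetical sketch: package rows from the 'all' view into the object
// the georss.html template iterates over. The document shape is assumed.
function packageForTemplate(rows) {
  return {
    institutions: rows.map(({ value: doc }) => {
      const programmes = (doc.programmes || []).map((programme) => {
        const countries = (programme.countries || []).map((country) => ({ country }));
        return { name: programme.name, has_countries: countries.length > 0, countries };
      });
      return {
        id: doc._id,
        latitude: doc.latitude,
        longitude: doc.longitude,
        has_programmes: programmes.length > 0,
        programmes,
      };
    }),
  };
}

// One placeholder row shaped like the output of emit(doc._id, doc).
const view = packageForTemplate([
  {
    id: "Aston University",
    key: "Aston University",
    value: {
      _id: "Aston University",
      latitude: 0, // placeholder coordinates
      longitude: 0,
      programmes: [{ name: "Example Programme", countries: ["zm"] }],
    },
  },
]);
console.log(JSON.stringify(view, null, 2));
```

Mustache would then render this object against georss.html, producing one entry per institution.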

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this trend is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB makes use of communication over HTTP to easily replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

SECTION 7 CRITICAL ASSESSMENT AND CONCLUSION 48

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs, and it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL

• understanding the structure of design documents

• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-defined map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need. It is not necessarily more difficult than SQL, but the learning curve is significant.

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as that of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5.

The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1 (source view)

It contains two ‘show’ functions. The first, s1, returns some simple HTML. The second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development allHTML is returned as a string from a server-side JavaScript function - thismeans that all double-quotes for example need to be backslash-escapedThis is very tedious to do by hand

As seen in Section 6 the CouchApp framework allows us to developeach HTML file in the normal way before packaging the files into a designdocument and sending it to the CouchDB server

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\">html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }</style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\">function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}
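As noted above, escaping the HTML by hand is tedious. The escaping can instead be delegated to a JSON serialiser. The Node.js sketch below is an illustration only and is not the workflow used in this project, which moved to CouchApp. The function is written as ordinary source text, and JSON.stringify produces the escaped design document body. The helper buildDesignDoc and the class attribute in s1 are hypothetical. The attribute is there only to introduce double quotes.

```javascript
// Write the show function as ordinary JavaScript, with unescaped double
// quotes inside single-quoted strings (the class attribute is hypothetical).
function s1(doc, req) {
  return '<h1 class="place">' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical helper: CouchDB stores each show function as a string of
// source code, so convert the functions to text and let JSON.stringify
// add every backslash escape that would otherwise be typed by hand.
function buildDesignDoc(id, shows) {
  const ddoc = { _id: id, shows: {} };
  for (const [name, fn] of Object.entries(shows)) {
    ddoc.shows[name] = fn.toString();
  }
  return JSON.stringify(ddoc, null, 2);
}

const body = buildDesignDoc("_design/d1", { s1 });
console.log(body);
```

The resulting text is what would be sent with a PUT to the design document's URL. This packaging step is essentially what CouchApp automates.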

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/<\/feed>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};
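The listing above uses E4X XML literals such as <entry/>, which only work in the SpiderMonkey-based CouchDB view server. Node.js and other modern JavaScript engines do not support them. Below is a minimal sketch of the same entry-building step using plain strings. It assumes the input carries latitude and longitude rather than a pre-joined point. The names geoEntry and escapeXml and the sample alternate URL are hypothetical. The sketch also shows the GeoRSS Simple convention: a georss:point holds latitude first, then longitude, separated by a space.

```javascript
// Escape the five XML special characters so text is safe in element
// content and in double-quoted attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical equivalent of exports.entry without E4X. Assumes the
// enclosing <feed> declares xmlns:georss="http://www.georss.org/georss".
function geoEntry(data) {
  return "<entry>" +
    "<id>" + escapeXml(data.entry_id) + "</id>" +
    "<title>" + escapeXml(data.title) + "</title>" +
    '<content type="html">' + escapeXml(data.content) + "</content>" +
    '<link href="' + escapeXml(data.alternate) + '" rel="alternate"/>' +
    // GeoRSS Simple point: "latitude longitude", space-separated.
    "<georss:point>" + data.latitude + " " + data.longitude + "</georss:point>" +
    "</entry>";
}

const xml = geoEntry({
  entry_id: "University of St Andrews",
  title: "University of St Andrews",
  content: "<p>Chevening Programme</p>",
  alternate: "http://example.org/post/st-andrews", // hypothetical URL
  latitude: 56.3412139443169,
  longitude: -2.79301175308608
});
console.log(xml);
```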

At the end of lists/index.js:

          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });
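CouchDB's JavaScript view server supplies getRow() and send() to list functions, and neither exists outside it. To make the streaming loop in the fragment above easier to follow, here is a minimal Node.js harness. makeListEnv, renderFeed and the sample rows are hypothetical stand-ins and not CouchDB's own implementation. The list body follows the same pattern as the fragment: send a header, then send one entry per row inside a do ... while (row = getRow()) loop, then return the closing tag.

```javascript
// Hypothetical stand-ins for the getRow()/send() pair that CouchDB
// provides to list functions.
function makeListEnv(rows) {
  let index = 0;
  const chunks = [];
  return {
    getRow: () => (index < rows.length ? rows[index++] : null),
    send: (chunk) => { chunks.push(chunk); },
    output: () => chunks.join("")
  };
}

// Cut-down list body with the same loop shape as the Sofa fragment.
function renderFeed(env) {
  env.send("<feed>");
  let row = env.getRow();
  if (row) {
    do {
      const point = row.value.latitude + " " + row.value.longitude;
      env.send("<entry><title>" + row.id + "</title><point>" + point + "</point></entry>");
    } while ((row = env.getRow()));
  }
  // The value a list function returns is sent as the final chunk.
  return "</feed>";
}

const env = makeListEnv([
  { id: "a", value: { latitude: 1, longitude: 2 } },
  { id: "b", value: { latitude: 3, longitude: 4 } }
]);
env.send(renderFeed(env));
console.log(env.output());
```

The initial getRow() before the loop means that an empty view still produces a well-formed, empty feed.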

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

// this is further down in templates/edit.html

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View. In the simplest case this is the View ‘all’, which simply returns all documents.

What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
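The View ‘all’ that feeds this list function is not reproduced in this report. Below is a minimal sketch of what it might look like. It assumes the view emits every document keyed by its _id, with the whole document as the value. That matches what georss.js reads from row.key and row.value. The runView harness and its emit stub are hypothetical. Inside CouchDB, the view server supplies emit and the index keeps rows ordered by key.

```javascript
// A possible map function for the 'all' view: one row per document,
// keyed by _id, with the whole document as the value.
function allMap(doc) {
  emit(doc._id, doc);
}

// Hypothetical harness: expose an emit stub, collect the rows, then sort
// them by key to imitate the ordered index CouchDB keeps for each view.
function runView(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    globalThis.emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc);
  }
  delete globalThis.emit;
  // Plain code-point comparison; CouchDB's own collation is Unicode-aware.
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const rows = runView(allMap, [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" }
]);
console.log(rows.map((row) => row.key));
```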

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>
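To make the section semantics concrete, here is a deliberately tiny Mustache-style renderer in Node.js. It is a sketch and not mustache.js [35]. It supports only {{name}} variables (HTML-escaped) and {{#section}}...{{/section}} blocks that do not nest inside a section of the same name. A section renders once per array element, once for a truthy value, and not at all for false or an empty list. The cut-down template and data are hypothetical but follow the St Andrews document above. In this sketch, names are looked up only in objects. That is one reason to wrap each country code as { country: ... }, as georss.js does.

```javascript
// Find a name in a stack of contexts, innermost context first.
function lookup(stack, name) {
  for (let i = stack.length - 1; i >= 0; i--) {
    const ctx = stack[i];
    if (ctx !== null && typeof ctx === "object" && name in ctx) return ctx[name];
  }
  return undefined;
}

function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function render(template, stack) {
  // Sections: {{#name}}...{{/name}}. The backreference \1 pairs each
  // opening tag with the first closing tag of the same name.
  const expanded = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    (match, name, inner) => {
      const value = lookup(stack, name);
      if (Array.isArray(value)) {
        // Repeat the inner block once per element.
        return value.map((item) => render(inner, stack.concat([item]))).join("");
      }
      return value ? render(inner, stack.concat([value])) : "";
    }
  );
  // Variables: {{name}}, HTML-escaped; unknown names render as "".
  return expanded.replace(/\{\{(\w+)\}\}/g, (match, name) => {
    const value = lookup(stack, name);
    return value === undefined ? "" : escapeHtml(value);
  });
}

// Hypothetical cut-down template and data modelled on Appendix D.
const template =
  "{{#programmes}}[{{name}}:{{#countries}} {{country}}{{/countries}}]{{/programmes}}";
const stash = {
  programmes: [
    { name: "Chevening Programme",
      countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      countries: [{ country: "bd" }, { country: "za" }] }
  ]
};

const output = render(template, [stash]);
console.log(output);
```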

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload png files from a folder, so that each image file becomes an attachment to its own document in a CouchDB database.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash

# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type:image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type:image/png" -X PUT "$url" --data-binary "@$filepath"
  # the @ prefix makes curl send the file's contents, not its name
  curl -H "Content-Type:image/png" -X PUT "$url" --data-binary "@$filepath"
done
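The sed pipeline above derives a document ID from each file name and then PUTs the image to /countries/<id>/flag. Below is a minimal Node.js sketch of the same naming step, using path.basename instead of sed. flagUpload is a hypothetical helper. The base URL is an assumption: a local CouchDB on port 5984, as in the commented example in the script.

```javascript
const path = require("path");

// Hypothetical helper mirroring the script's naming scheme:
// countries/png/de.png -> document "de" in the "countries" database,
// with the image stored as the attachment named "flag".
function flagUpload(filepath, baseUrl) {
  const docname = path.basename(filepath, ".png");
  return {
    docname,
    url: baseUrl + "/countries/" + encodeURIComponent(docname) + "/flag",
    headers: { "Content-Type": "image/png" }
  };
}

const upload = flagUpload("countries/png/de.png", "http://127.0.0.1:5984");
console.log(upload.url);
```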

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell: Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes, MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl(1): the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] Stonebraker, M., Wong, E., Kreps, P., and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Open Database Connectivity. http://en.wikipedia.org/wiki/Open_Database_Connectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


become unwriteable. This scenario exposes the known trade-off between consistency, availability and partition tolerance: only two of these three properties can be guaranteed at all times. Since our database is global, partitions will happen and cannot cause an outage, and thus in reality we only have a choice between consistency and availability.”

2.8 Brewer’s ‘CAP’ Theorem

The idea of trading off consistency for availability is a key feature of ‘NoSql’ databases. Reference is often made to Eric Brewer’s conjecture [27] that it is impossible for a web service to provide all three of the following guarantees at the same time:

• Consistency

• Availability

• Partition tolerance

Of course, any data persistence mechanism used by global-scale web services requires a high degree of partition tolerance, meaning the ability to continue service despite occasional unavailability of nodes or network links [26]. In other words, availability and partition tolerance are essential but strict consistency is not, so a looser model of consistency is considered a price worth paying.

This realisation, that ‘eventual’ consistency is often sufficient, has had the effect of unleashing a large amount of innovative effort in the area of database systems development. The code-base required to ensure ACID guarantees in traditional DBMSs is complex, and the newer systems have had the luxury of forgoing such development to concentrate solely on scalable, available data storage solutions.

In addition, these ‘NoSql’ data stores do not have to be as ‘intelligent’ as their predecessors: the burden of ensuring data integrity now lies very squarely with the application, not the database. At most, a ‘NoSql’ system may provide some rudimentary validation, but it does not enforce a schema in the same way as traditional relational systems. It may be a little unfair to describe them in this way, but compared with traditional databases, ‘NoSql’ systems are simply ‘dumb’ repositories of data. This has meant, of course, that the code-base required for ‘NoSql’ systems is much smaller. In the case of CouchDB, this means the database can be installed anywhere from web servers to desktops, and even on hand-held devices and mobile phones.
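CouchDB's version of such rudimentary validation is an optional validate_doc_update function in a design document. CouchDB calls it with the new document, the old document (or null) and the user context. It rejects a write when the function throws an object such as {forbidden: "..."}. The sketch below is hypothetical, since this project does not define one. It checks only that an institution document carries numeric, in-range coordinates. The tryWrite wrapper just exercises it under Node.js.

```javascript
// Hypothetical validation function for institution documents.
function validate_doc_update(newDoc, oldDoc, userCtx) {
  // Deletions carry no coordinates, so let them through.
  if (newDoc._deleted) return;

  function requireRange(field, min, max) {
    const value = newDoc[field];
    if (typeof value !== "number" || value < min || value > max) {
      throw { forbidden: field + " must be a number between " + min + " and " + max };
    }
  }
  requireRange("latitude", -90, 90);
  requireRange("longitude", -180, 180);
}

// Exercise it as CouchDB would: a thrown object means the write is refused.
function tryWrite(doc) {
  try {
    validate_doc_update(doc, null, { name: "someone", roles: [] });
    return "accepted";
  } catch (err) {
    return "rejected: " + err.forbidden;
  }
}

console.log(tryWrite({ _id: "University of St Andrews", latitude: 56.34, longitude: -2.79 }));
console.log(tryWrite({ _id: "Nowhere", latitude: "56.34", longitude: -2.79 }));
```

Everything beyond such checks, such as referential integrity between documents, remains the application's responsibility.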


2.9 Reducing the Impedance Mismatch

The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented application developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged, for example Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by application developers to ‘bridge the gap’ between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain.² Another tactic used by application developers is to use SQL Views, effectively de-normalising data so that the data model is closer to the application’s object model.

A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].
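To illustrate the fragmentation that a relational store requires, the following Node.js sketch takes a nested document in the style of the St Andrews example. It splits the document into three tables (institution, programme, programme_country), the kind an ORM would otherwise have to map back into objects. The table names, column names and the generated programme_id key are hypothetical and for illustration only. A document store would keep the nested object as it is.

```javascript
// Split one nested institution document into rows for three hypothetical
// relational tables. A document store would keep `doc` whole instead.
function toRelationalRows(doc) {
  const tables = { institution: [], programme: [], programme_country: [] };
  tables.institution.push({
    institution_id: doc.institutionID, name: doc._id, postcode: doc.postcode
  });
  (doc.programmes || []).forEach((programme, i) => {
    // Surrogate key invented here so child rows can refer to their parent.
    const programmeId = doc.institutionID + "-" + i;
    tables.programme.push({
      programme_id: programmeId, institution_id: doc.institutionID, name: programme.name
    });
    for (const country of programme.countries || []) {
      tables.programme_country.push({ programme_id: programmeId, country_code: country });
    }
  });
  return tables;
}

const doc = {
  _id: "University of St Andrews",
  institutionID: 15161,
  postcode: "KY16 9AJ",
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", countries: ["bd", "za"] }
  ]
};

const tables = toRelationalRows(doc);
console.log(tables.programme_country.length);
```

One document becomes eight rows across three tables. Reading it back means joining them again, which is the work an ORM hides.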

2.10 Benefits of ‘NoSql’

As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law: disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.

• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale, whereas managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. In effect, the persistence layer uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.


2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’

It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support or, to state it more generally, the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions, and there is little capability to query the system in an arbitrary way.
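Below is a minimal sketch of what ‘pre-computed’ means here. An ad-hoc query such as SELECT region, COUNT(*) FROM institution GROUP BY region is not issued at query time. Instead, the question is written in advance as a map function, which emits one row per document keyed by region, plus a reduce function that counts rows. The runGroupedView harness is hypothetical. CouchDB itself would store the map output in a B-Tree index and apply the reduce when the view is queried with group=true. The third document is made-up sample data.

```javascript
// Map: one row per institution document, keyed by region.
function mapByRegion(doc) {
  if (doc.region) emit(doc.region, 1);
}

// Reduce: count rows; when re-reducing, add up the partial counts.
function countReduce(keys, values, rereduce) {
  return rereduce ? values.reduce((a, b) => a + b, 0) : values.length;
}

// Hypothetical harness imitating a view queried with group=true:
// run the map over every document, then reduce each key's values.
function runGroupedView(mapFn, reduceFn, docs) {
  const groups = new Map();
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn(doc);
  delete globalThis.emit;
  // keys simplified to null here; CouchDB passes [key, docid] pairs.
  return [...groups.keys()].sort().map((key) => ({
    key,
    value: reduceFn(null, groups.get(key), false)
  }));
}

const rows = runGroupedView(mapByRegion, countReduce, [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Example Institution", region: "Wales" } // hypothetical
]);
console.log(rows);
```

A new question, such as counting by constituency, needs a new map function and a rebuilt index. This is the limitation described above.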

(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often, a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well-known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database used purely for reporting. The established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].

The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories worldwide. In 2009-2010, the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.


Figure 2.1: British Council project data displayed on Bing Maps

The data from these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net in Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. As a result of this work, the British Council as an organisation now has a unified view of the work undertaken with each partner institution throughout the UK.

Outputs from the Activity Mapping project have included letters to MPs, online maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.


Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB, in effect using CouchDB as a web caching layer.

The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
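One detail matters when the same JSON is transformed into both formats: coordinate order. GeoRSS Simple writes "latitude longitude", but a KML <coordinates> element expects "longitude,latitude" (optionally followed by altitude). Below is a minimal Node.js sketch of the KML side. It assumes documents shaped like the St Andrews example, with numeric latitude and longitude fields. The helper names are hypothetical, and the escaping covers only what this example needs.

```javascript
// Minimal escaping for text placed inside a KML element.
function escapeText(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

// Build a KML Placemark. Note the order inside <coordinates>:
// longitude first, then latitude, separated by a comma.
function toKmlPlacemark(doc) {
  return "<Placemark>" +
    "<name>" + escapeText(doc._id) + "</name>" +
    "<Point><coordinates>" + doc.longitude + "," + doc.latitude + "</coordinates></Point>" +
    "</Placemark>";
}

const standrews = {
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608
};
console.log(toKmlPlacemark(standrews));
```

Swapping the two values by mistake would place St Andrews somewhere in the Southern Ocean rather than in Fife, so the order is worth checking in any converter.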

This is a report on my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’, that is, to what extent it is ready for adoption by web application developers such as myself who are used to developing solutions on current standard platforms.³

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed as a JSON string [30].

3.1 ‘Of the Web’

Combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project, because all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].
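Because CouchDB speaks plain HTTP, a document ID maps directly onto a URL path. Below is a minimal Node.js sketch. It assumes a server at http://127.0.0.1:5984 and the universities database used elsewhere in this report. putDocumentRequest is a hypothetical helper that only builds the request (method, URL, headers and JSON body), so it runs without a server, and any HTTP library in any language could send the same request. A space in an ID such as University of Aberdeen has to be percent-encoded as %20.

```javascript
// Hypothetical helper: build (but do not send) the HTTP request that
// stores a JSON document in a CouchDB database.
function putDocumentRequest(baseUrl, db, doc) {
  return {
    method: "PUT",
    url: baseUrl + "/" + encodeURIComponent(db) + "/" + encodeURIComponent(doc._id),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  };
}

const request = putDocumentRequest("http://127.0.0.1:5984", "universities", {
  _id: "University of Aberdeen",
  region: "Scotland"
});
console.log(request.method + " " + request.url);
```

With Node.js 18 or later, the built-in fetch(request.url, request) could send this request to a running server. No database driver is involved.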

The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-used quote about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]

Normally, a web development project requires two separate servers: a database server and a web application server. CouchDB combines both of these in a single product. It is therefore interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

¹ Operating System

Consider the use of NULL values in relational databases. In a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence from the document itself. This is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a new line character, so that in this case the streetAddress spans three separate printed lines.)
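The two address-book documents can sit side by side because application code tests whether a field is present rather than whether it is NULL. Below is a small Node.js sketch with the documents trimmed to the relevant fields. The postalCodeOf helper and its "(none)" placeholder are hypothetical.

```javascript
// The two address-book documents from the text, trimmed to relevant fields.
const person = {
  firstName: "John",
  address: { city: "New York", state: "NY", postalCode: "10021" }
};
const office = {
  company: "British Council Zambia",
  address: { city: "Lusaka", country: "Zambia" }
};

// An absent field is simply missing from the object: there is no NULL
// placeholder to check for, only the presence or absence of the key.
function postalCodeOf(doc) {
  return "postalCode" in doc.address ? doc.address.postalCode : "(none)";
}

console.log(postalCodeOf(person)); // 10021
console.log(postalCodeOf(office)); // (none)

// In JSON text, the two characters \n inside a string decode to a newline.
const raw = '{"streetAddress": "Heroes Place\\nCairo Road\\nPO Box 34571"}';
const lines = JSON.parse(raw).streetAddress.split("\n");
console.log(lines.length); // 3
```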

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]
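Below is a toy illustration of why never touching existing data helps both reliability and concurrent readers. It is not CouchDB's B-Tree, whose file format, compaction and revision hashing are far more involved. In the sketch, an update appends a new record and never overwrites an earlier one. A reader that remembers an older log position therefore keeps a consistent view while writes continue. The AppendOnlyStore class is hypothetical. Its "generation-position" revision labels are a simplification of CouchDB's "N-hash" revision IDs.

```javascript
// Hypothetical toy store: records are only ever appended, never modified.
class AppendOnlyStore {
  constructor() {
    this.log = [];
  }

  // Append a new revision of a document and return its revision label.
  put(id, body) {
    const previous = this.get(id);
    const generation = previous ? Number(previous._rev.split("-")[0]) + 1 : 1;
    // Freeze (shallowly) so an earlier revision cannot be edited in place.
    const record = Object.freeze({ ...body, _id: id, _rev: generation + "-" + this.log.length });
    this.log.push(record);
    return record._rev;
  }

  // Latest revision of `id` among the first `upTo` log entries.
  get(id, upTo = this.log.length) {
    for (let i = upTo - 1; i >= 0; i--) {
      if (this.log[i]._id === id) return this.log[i];
    }
    return null;
  }
}

const store = new AppendOnlyStore();
store.put("University of Aberdeen", { region: "Scotland" });
const snapshot = store.log.length; // a reader remembers this position
store.put("University of Aberdeen", { region: "Scotland", note: "updated" });

console.log(store.get("University of Aberdeen", snapshot)._rev); // 1-0
console.log(store.get("University of Aberdeen")._rev);           // 2-1
```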

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available here [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8], which provides a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation - I started coding in 1998 - used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product. Version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A broader aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same request with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.

curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

cURL syntax is explained here [44]. Briefly:

• -H introduces an HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name
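The same request can be issued from any language with an HTTP library. The following sketch assumes Node.js and uses a hypothetical postDocumentRequest helper to spell out what each cURL option contributes. The resulting object could be passed to an HTTP client. That call is omitted here because it needs a live server.

```javascript
// Hypothetical helper (Node.js): the cURL POST above, described as plain request data.
function postDocumentRequest(baseUrl, database, doc, username, password) {
  // cURL turns the username:password in the URL into an HTTP Basic Authorization header.
  const credentials = Buffer.from(`${username}:${password}`).toString("base64");
  return {
    method: "POST", // -X POST: CouchDB assigns the document id
    url: `${baseUrl}/${encodeURIComponent(database)}`,
    headers: {
      "Content-Type": "application/json", // -H "Content-Type: application/json"
      Authorization: `Basic ${credentials}`,
    },
    body: JSON.stringify(doc), // -d @john-smith.json
  };
}

const request = postDocumentRequest(
  "http://mick.couchone.com",
  "addressbook",
  { firstName: "John", lastName: "Smith", age: 25 },
  "username",
  "password"
);
console.log(request.method, request.url);
```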

All data in CouchDB is stored in JSON format, together with a unique key and a revision number. The resulting document in CouchDB would therefore have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
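The _rev value has the form N-hash: a generation number, a dash, and a hash identifying that revision. Below is a minimal JavaScript sketch of splitting it apart. parseRev is a hypothetical helper, not part of any CouchDB client library.

```javascript
// Split a CouchDB revision string into its two parts: the generation number before
// the dash (1 for a new document, incremented on every update) and the hash that
// identifies that particular revision.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  const generation = Number(rev.slice(0, dash));
  if (dash < 1 || !Number.isInteger(generation)) {
    throw new Error(`not a revision string: ${rev}`);
  }
  return { generation, hash: rev.slice(dash + 1) };
}

console.log(parseRev("1-91bce055fc8db86480400321079f0834")); // generation 1: never updated
console.log(parseRev("6-95b012f051cee87dc7a36d73cef8f2c8")); // generation 6: updated five times
```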

This document may be retrieved online, as unformatted JSON, at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e

To view the output as formatted JSON, we may copy the output into the JSONLint validator, http://jsonlint.com

Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia

To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon

4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
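A client that builds such URLs needs to percent-encode the document id and attach the revision as a query-string parameter. The following sketch uses the standard URL class available in Node.js and browsers. documentUrl is a hypothetical helper. The sketch also shows why document ids containing spaces later appear with %20 in their URLs.

```javascript
// Hypothetical helper: the URL of one document, with an optional ?rev= query string.
// encodeURIComponent turns a document id such as "Aston University" into Aston%20University.
// (Design document ids such as "_design/d1" keep their slash in URLs, so they are not handled here.)
function documentUrl(baseUrl, database, docId, rev) {
  const url = new URL(`${baseUrl}/${encodeURIComponent(database)}/${encodeURIComponent(docId)}`);
  if (rev !== undefined) {
    url.searchParams.set("rev", rev); // the revision CouchDB checks before deleting
  }
  return url.toString();
}

console.log(documentUrl("http://mick.couchone.com", "addressbook",
  "british-council-zambia", "1-da7bcd810c608d6fbcb9ce92e9ade343"));
```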

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.

As we are updating a ‘named’ document, we use the PUT verb instead of POST.

The following command will update John Smith’s record with the contents of the file john-smith-v2.json:

curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
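The revision check amounts to optimistic concurrency control: a write is accepted only if it names the revision the server currently holds. The following is a toy in-memory model of that rule in JavaScript, not CouchDB’s implementation. The RevisionedStore class is hypothetical, and its revision strings use a counter in place of a real hash. Its conflict response mirrors the body CouchDB returns with HTTP status 409.

```javascript
// Toy in-memory model of CouchDB's revision check (optimistic concurrency control).
// It is not CouchDB's code: revision "hashes" here are just a counter.
class RevisionedStore {
  constructor() {
    this.docs = new Map(); // id -> latest stored document
    this.counter = 0;
  }

  // Accept the write only if the caller names the current revision (none for a new document).
  put(id, doc) {
    const current = this.docs.get(id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? Number(currentRev.split("-")[0]) + 1 : 1;
    const rev = `${generation}-r${++this.counter}`;
    this.docs.set(id, { ...doc, _id: id, _rev: rev });
    return { status: 201, ok: true, id, rev };
  }
}

const db = new RevisionedStore();
const created = db.put("john-smith", { firstName: "John" });
const updated = db.put("john-smith", { firstName: "John", lastName: "Smith", _rev: created.rev });
// A second client still holding the first revision is refused:
const stale = db.put("john-smith", { firstName: "Jon", _rev: created.rev });
console.log(created.rev, updated.rev, stale.status);
```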

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons³ from http://www.famfamfam.com/lab/icons/flags

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.

The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping where necessary:

curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
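The quoting differences arise only because the JSON body is typed on the command line. A client program can let its JSON serialiser build the body instead, as in the minimal sketch below. replicationRequest is a hypothetical helper that describes the request rather than sending it.

```javascript
// Hypothetical helper: the POST to /_replicate, with the JSON body produced by
// JSON.stringify instead of being typed (and shell-quoted) by hand.
function replicationRequest(localBaseUrl, source, target) {
  return {
    method: "POST",
    url: `${localBaseUrl}/_replicate`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source, target }),
  };
}

const replication = replicationRequest(
  "http://127.0.0.1:5984",
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(replication.body);
```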

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23], which describes how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages. The ‘map’ stage ranges over the document collection and emits key/value pairs for each document it encounters. The ‘reduce’ stage is used when a calculation over those intermediate values is required, such as a sum or a count.
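The two stages can be simulated in a few lines of JavaScript. The sketch below is a toy, single-process model, not CouchDB’s view server. runView is hypothetical, and it passes emit to the map function as an argument, whereas CouchDB provides emit as a global. Aberavon’s figure comes from the election data shown below; the other seats and their vote counts are made up.

```javascript
// Toy, single-process model of the two map-reduce stages behind a CouchDB view.
// CouchDB's map functions call a global emit(); here emit is passed in instead.
function runView(docs, map, reduce) {
  const rows = [];
  // Map stage: visit every document; each may emit zero or more key/value rows.
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // CouchDB returns view rows sorted by key (this toy assumes numeric or null keys).
  rows.sort((a, b) => (a.key ?? 0) - (b.key ?? 0));
  if (!reduce) return rows;
  // Reduce stage: combine all intermediate values into one result.
  const keys = rows.map((row) => [row.key, row.id]);
  const values = rows.map((row) => row.value);
  return [{ key: null, value: reduce(keys, values) }];
}

// Aberavon's figure is from the election-2005 database; the other two seats are made up.
const seats = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Seat B", con: 12000 },
  { _id: "Seat C", con: 1500 },
];
const sum = (values) => values.reduce((total, v) => total + v, 0);

const byVotes = runView(seats, (doc, emit) => emit(doc.con, doc._id));
const total = runView(seats, (doc, emit) => emit(null, doc.con), (keys, values) => sum(values));
console.log(byVotes.map((row) => row.key)); // [ 1500, 3062, 12000 ]
console.log(total); // [ { key: null, value: 16562 } ]
```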

In a blog posting for this project, http://klena02.wordpress.com/2010/02/01/couchdb/, I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.

The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.

The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate function such as SUM in SQL).

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}

It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
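The startkey and endkey parameters work because a view’s rows are stored sorted by key. A range query therefore only has to find the first matching row and read forward. Below is a toy JavaScript model of that lookup, in which a sorted array stands in for CouchDB’s B-Tree. The helper names are hypothetical.

```javascript
// Toy model of a startkey/endkey query. A sorted array stands in for the B-tree
// index CouchDB keeps for each view; only the matching slice is read.

// First position whose key is >= key.
function lowerBound(rows, key) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (rows[mid].key < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// First position whose key is > key.
function upperBound(rows, key) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (rows[mid].key <= key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Both ends inclusive, matching CouchDB's default behaviour for startkey and endkey.
function rangeQuery(sortedRows, startkey, endkey) {
  return sortedRows.slice(lowerBound(sortedRows, startkey), upperBound(sortedRows, endkey));
}

// Rows as the 'Conservative Votes' view would order them; all but Aberavon are made up.
const conservativeVotes = [
  { key: 850, value: "Seat A" },
  { key: 1278, value: "Seat B" },
  { key: 1999, value: "Seat C" },
  { key: 2000, value: "Seat D" },
  { key: 3062, value: "Aberavon" },
];
console.log(rangeQuery(conservativeVotes, 1000, 2000).map((row) => row.value));
// [ 'Seat B', 'Seat C', 'Seat D' ]
```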

This method of retrieving data is certainly cumbersome compared to using SQL. One criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every map-reduce function written against a document collection in CouchDB has its own B-Tree index. The index is built when the view is first queried. From that point on, documents added or changed since the previous query are incorporated incrementally the next time the view is read, rather than the index being rebuilt from scratch.

As a result, the entire system is optimised for fast indexed retrieval. This makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time; every read is an indexed lookup.
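The incremental behaviour can be modelled in JavaScript as follows. This is a toy, not CouchDB’s indexer. The LazyView class is hypothetical, an array of numbered changes stands in for the database’s record of updates, and deleted documents are ignored. The point is that a second query maps only the document added since the first.

```javascript
// Toy model of incremental view maintenance (not CouchDB's indexer): the view
// remembers the last update sequence it has indexed, and each query maps only the
// changes made since then. Deletions are ignored for brevity.
class LazyView {
  constructor(map) {
    this.map = map;
    this.rowsById = new Map(); // document id -> rows emitted for that document
    this.lastSeq = 0; // highest change sequence already indexed
    this.mapCalls = 0; // how many times the map function has run
  }

  // `changes` stands in for the database's change history: { seq, doc } in seq order.
  query(changes) {
    for (const { seq, doc } of changes) {
      if (seq <= this.lastSeq) continue; // indexed by an earlier query
      const rows = [];
      this.map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
      this.mapCalls += 1;
      this.rowsById.set(doc._id, rows); // replaces rows from an older revision
      this.lastSeq = seq;
    }
    // Numeric keys assumed, sorted ascending as in a CouchDB view.
    return [...this.rowsById.values()].flat().sort((a, b) => a.key - b.key);
  }
}

const changes = [
  { seq: 1, doc: { _id: "Aberavon", con: 3062 } },
  { seq: 2, doc: { _id: "Seat B", con: 1500 } }, // hypothetical
];
const conView = new LazyView((doc, emit) => emit(doc.con, doc._id));
conView.query(changes); // first query: both documents are mapped
changes.push({ seq: 3, doc: { _id: "Seat C", con: 2000 } }); // hypothetical new document
const rows = conView.query(changes); // second query: only seq 3 is mapped
console.log(conView.mapCalls); // 3
```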

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the bash script shown below.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)

#!/bin/bash
#host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*

# create the database
curl -X PUT "$host/$database"

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s|$folder/||g")
  echo "$filename"
  docname=$(echo "$filename" | sed -e "s|$fileextension||g")
  echo "$docname"
  url="$host/$database/$docname"
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)

To put a single document to the web, the command used would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].

For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved field name in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req. doc refers to the document being rendered. req refers to the request object, which can be used, for example, to retrieve data passed by a query string in the URL.
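A show function is ordinary JavaScript, so it can be tried out locally before the design document is uploaded. Below is a minimal sketch, assuming Node.js. The sample document and its coordinates are hypothetical, and an empty object stands in for the request.

```javascript
// The s1 show function above, written as an ordinary JavaScript function so it can be
// exercised locally. CouchDB would pass the stored document and the request object.
function s1(doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// A hypothetical document and an empty stand-in for the request object.
const sampleDoc = { _id: "Example University", latitude: 51.5, longitude: -0.1 };
console.log(s1(sampleDoc, {}));
```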

Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 \
  -d @d1.json

To render a document (in this case, the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered

5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1. The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped, because the function itself is stored as a string value inside the JSON design document.

While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
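One way to avoid hand-escaping is to write show functions as ordinary JavaScript and let Function.prototype.toString and JSON.stringify produce the escaped strings. This is sketched here as a hypothetical build step, not a CouchDB feature. The map1 below is an illustrative stand-in, not the Appendix A listing, and buildDesignDoc is a hypothetical helper.

```javascript
// Hypothetical build step: keep each show function as normal JavaScript source and let
// Function.prototype.toString plus JSON.stringify do the quote escaping.
// This map1 is an illustrative stand-in, not the Appendix A version.
function map1(doc, req) {
  return '<div id="map" data-lat="' + doc.latitude + '" data-lng="' + doc.longitude + '"></div>';
}

function buildDesignDoc(id, shows) {
  const designDoc = { _id: id, shows: {} };
  for (const [name, fn] of Object.entries(shows)) {
    designDoc.shows[name] = fn.toString(); // design documents hold functions as source strings
  }
  return JSON.stringify(designDoc, null, 2); // escapes every " inside the source as \"
}

const designJson = buildDesignDoc("_design/d1", { map1 });
console.log(designJson); // ready to save as d1.json and upload with cURL
```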

Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.

Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document id values

A ‘view’ is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ 'headers': { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

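Note the built-in functions start, getRow and send:

• start is executed once, at the beginning of the function, and sets the response headers
• getRow returns the next row of the view result each time it is called, and stops returning rows once all have been consumed, which ends the while loop
• send is executed once for each row in the dataset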

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
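Inside CouchDB, start, getRow and send are globals supplied to list functions. To exercise l1 outside CouchDB, the following sketch passes in hypothetical stand-ins for them and collects the HTML that l1 sends. makeListRuntime is not a CouchDB API. The l1 here is rewritten to receive those helpers as a third parameter.

```javascript
// Hypothetical stand-ins for the start/getRow/send globals that CouchDB provides
// to list functions, for local testing only.
function makeListRuntime(rows) {
  let next = 0;
  const chunks = [];
  const runtime = {
    headers: null,
    start(response) {
      runtime.headers = response.headers; // record the headers l1 asks for
    },
    getRow() {
      return next < rows.length ? rows[next++] : null; // null once the rows run out
    },
    send(chunk) {
      chunks.push(chunk);
    },
    body() {
      return chunks.join("");
    },
  };
  return runtime;
}

// The l1 list function, taking the helpers as a parameter instead of using globals.
function l1(head, req, { start, getRow, send }) {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + "</a></p>");
  }
}

// One row shaped like the output of view v1 (key and value are both the document id).
const runtime = makeListRuntime([
  { id: "Aston University", key: "Aston University", value: "Aston University" },
]);
l1(null, {}, runtime);
console.log(runtime.headers, runtime.body());
```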

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org

The key benefit of CouchApp for an application developer is that it creates a separate file for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project. CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J Chris Anderson. I made a small number of changes to 3 files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3

Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.

Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1

couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it into HTML.

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.

We recall that in CouchDB a View is, in effect, a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed

6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.
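Conceptually, the merge maps folders onto nested objects and files onto string values. The toy JavaScript below illustrates that mapping only. foldFiles is hypothetical; CouchApp itself is written in Python, and its push command (couchapp push) does considerably more. The sketch also shows why deleting reduce.js matters: a view with no reduce.js file ends up with no reduce function at all.

```javascript
// Toy illustration of how a CouchApp-style folder tree maps onto one design document.
// CouchApp's own push command (written in Python) does considerably more than this.
function foldFiles(files) {
  const designDoc = {};
  for (const [path, contents] of Object.entries(files)) {
    const parts = path.split("/");
    const leaf = parts.pop().replace(/\.js$/, ""); // "map.js" becomes the key "map"
    let node = designDoc;
    for (const part of parts) {
      node[part] = node[part] || {}; // each folder becomes a nested object
      node = node[part];
    }
    node[leaf] = contents.trim(); // each file's text becomes a string value
  }
  return designDoc;
}

// Hypothetical file contents for the bc application. views/v1 has no reduce.js, so the
// resulting view has no reduce function at all (rather than an empty one).
const bc = foldFiles({
  "_id": "_design/bc",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }\n",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }\n",
});
console.log(JSON.stringify(bc, null, 2));
```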

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section, we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this, we will make use of mustache.js, a templating system used in Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.
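To show what iterating through sets of values means in a Mustache template, here is a deliberately tiny Mustache-like renderer in JavaScript. It is not mustache.js and supports only a small subset of the syntax. {{name}} variables are HTML-escaped. {{#section}}...{{/section}} blocks repeat for each item of an array, render once for a true value, and are skipped for false or empty values. The render function is hypothetical; in practice, mustache.js itself would be used.

```javascript
// A deliberately tiny Mustache-like renderer, for illustration only.
// Supports {{name}} (HTML-escaped) and {{#section}}...{{/section}} (same-name nesting unsupported).
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function render(template, context) {
  // Sections: arrays repeat the block once per item (items assumed to be objects);
  // other truthy values render it once; falsy values drop it.
  const withSections = template.replace(/\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g, (_, name, inner) => {
    const value = context[name];
    if (Array.isArray(value)) {
      return value.map((item) => render(inner, { ...context, ...item })).join("");
    }
    return value ? render(inner, context) : "";
  });
  // Variables: look up each name in the current context.
  return withSections.replace(/\{\{(\w+)\}\}/g, (_, name) =>
    context[name] === undefined ? "" : escapeHtml(context[name]));
}

const feed = render(
  "{{#institutions}}<entry><id>{{id}}</id>{{#has_programmes}}{{#programmes}}[{{name}}]{{/programmes}}{{/has_programmes}}</entry>{{/institutions}}",
  {
    institutions: [
      { id: "A", has_programmes: true, programmes: [{ name: "P1" }, { name: "P2" }] },
      { id: "B", has_programmes: false, programmes: [] },
    ],
  }
);
console.log(feed); // <entry><id>A</id>[P1][P2]</entry><entry><id>B</id></entry>
```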

Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss

SECTION 6 SERVING GEORSS USING COUCHAPP 45

This creates a file called georssjsI added some code derived from that in Sofa which lsquopackagesrsquo the data

from the View and sends it using mustachejs to a template fileThe full text of the file is provided in Appendix CThe template itself georsshtml is stored in the templates directory

The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
      {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
          {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
          {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
      {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if a programme has a list of countries, the flags of those countries are emitted.
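The effect of the nested sections can be illustrated by writing the same loops out by hand. The hypothetical renderEntry function below is not Mustache and does not reproduce its exact whitespace or escaping; it takes country codes as plain strings and simply shows how one entry expands into a paragraph per programme and a flag image per country.

```javascript
// Illustrative only: the nested sections of the template written out as plain
// JavaScript loops. This mirrors the template's structure, not Mustache itself.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderEntry(inst) {
  // {{#programmes}} ... {{/programmes}}: one paragraph per programme
  const html = (inst.programmes || [])
    .map((p) => {
      // {{#countries}} ... {{/countries}}: one flag image per country code
      const flags = (p.countries || [])
        .map((code) => {
          const c = escapeXml(code);
          return `<img src="http://mick.couchone.com/countries/${c}/flag" alt="${c}" title="${c}">`;
        })
        .join("");
      return `<p>${escapeXml(p.name)}${flags}</p>`;
    })
    .join("");

  // The HTML sits inside <content type="html">, so it is escaped once more
  // for the XML feed (the template writes &lt;p&gt; for the same reason).
  return [
    "<entry>",
    `  <id>${escapeXml(inst.id)}</id>`,
    `  <title>${escapeXml(inst.id)}</title>`,
    `  <content type="html">${escapeXml(html)}</content>`,
    `  <point>${inst.latitude} ${inst.longitude}</point>`,
    "</entry>",
  ].join("\n");
}

console.log(
  renderEntry({
    id: "University of St Andrews",
    latitude: 56.3412139443169,
    longitude: -2.79301175308608,
    programmes: [{ name: "Chevening Programme", countries: ["id", "mz", "tj"] }],
  })
);
```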

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
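The feed and map URLs above follow CouchDB’s standard path layout for views and lists. The small helpers below, viewUrl and listUrl (hypothetical names), rebuild them from their parts; encoding the feed URL before placing it in the q parameter is a precaution added in this sketch, not something shown in the original links.

```javascript
// Hypothetical helpers that rebuild the URLs above from their parts, following
// CouchDB's /db/_design/<ddoc>/_view/<view> and
// /db/_design/<ddoc>/_list/<list>/<view> path layout.
function viewUrl(base, db, ddoc, view) {
  return `${base}/${db}/_design/${ddoc}/_view/${view}`;
}

function listUrl(base, db, ddoc, list, view) {
  return `${base}/${db}/_design/${ddoc}/_list/${list}/${view}`;
}

const base = "http://mick.couchone.com";
const feedUrl = listUrl(base, "universities", "bc", "georss", "all");

// Encoding the feed URL means any "?" or "&" it might carry (for example,
// view query parameters) is not mistaken for part of the Google Maps URL.
const mapLink = "http://maps.google.com/?q=" + encodeURIComponent(feedUrl);

console.log(viewUrl(base, "universities", "bc", "all"));
console.log(feedUrl);
console.log(mapLink);
```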


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature that CouchDB provides is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
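Replication in CouchDB is itself triggered over HTTP, by POSTing a JSON body naming a source and a target database to the server’s _replicate endpoint. The sketch below only builds that request description (it sends nothing); replicationRequest is a hypothetical helper, the local server address 127.0.0.1:5984 is an assumption, and the local target database is assumed to exist already.

```javascript
// Hypothetical helper describing a CouchDB replication request. Nothing is
// sent: the returned object could be handed to any HTTP client.
function replicationRequest(serverUrl, source, target, continuous = false) {
  const body = { source, target };
  if (continuous) body.continuous = true; // keep replicating as changes arrive
  return {
    method: "POST",
    url: `${serverUrl}/_replicate`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Pull the hosted "universities" database into a local database of the same name.
const pull = replicationRequest(
  "http://127.0.0.1:5984",
  "http://mick.couchone.com/universities",
  "universities",
  true
);
console.log(pull.method, pull.url, pull.body);
```

Swapping source and target pushes local changes back to the hosted copy, so a pair of such requests keeps the two databases in step.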

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
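The correspondence between document operations and HTTP verbs can be summarised in a few lines. The sketch below only describes the requests (nothing is sent); the helper names and the local server address are assumptions, while the /database/document URL layout and the need to quote the current _rev when updating or deleting follow CouchDB’s HTTP API.

```javascript
// Sketch of document operations expressed as plain HTTP requests. The server
// address and database name are assumptions for illustration.
const db = "http://127.0.0.1:5984/universities";
const docUrl = (id) => `${db}/${encodeURIComponent(id)}`;

const requests = {
  createDatabase: () => ({ method: "PUT", url: db }),
  createDoc: (id, doc) => ({ method: "PUT", url: docUrl(id), body: JSON.stringify(doc) }),
  readDoc: (id) => ({ method: "GET", url: docUrl(id) }),
  // Updates must carry the current revision, or CouchDB reports a conflict.
  updateDoc: (id, rev, doc) => ({
    method: "PUT",
    url: docUrl(id),
    body: JSON.stringify({ ...doc, _rev: rev }),
  }),
  // Deletes name the revision being deleted in the query string.
  deleteDoc: (id, rev) => ({
    method: "DELETE",
    url: `${docUrl(id)}?rev=${encodeURIComponent(rev)}`,
  }),
};

console.log(requests.readDoc("University of St Andrews"));
```

Each of these descriptions corresponds directly to a cURL command line, with the verb passed to -X and the body to -d.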

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL;

• understanding the structure of design documents;

• the use of server-side JavaScript functions to transform documents to HTML and XML.

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is answered from a view built in advance by a map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
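As an illustration of the different way of thinking, the SQL query SELECT region, COUNT(*) FROM institutions GROUP BY region can be re-expressed as a map function that emits each institution’s region with the value 1, and a reduce function that adds the values up. The runner queryGrouped is a hypothetical in-memory stand-in for querying such a view with group=true, and the sample documents are invented.

```javascript
// Sketch: SELECT region, COUNT(*) FROM institutions GROUP BY region, recast
// as a map function plus a reduce function. queryGrouped is a hypothetical
// in-memory stand-in for querying the view with group=true.
const map = (doc, emit) => {
  if (doc.region) emit(doc.region, 1);
};

// CouchDB calls reduce(keys, values, rereduce). Summing suits both passes,
// since partial counts can themselves be added; keys is unused here.
const reduce = (keys, values, rereduce) => values.reduce((a, b) => a + b, 0);

function queryGrouped(docs) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups.keys()].sort().map((key) => ({
    key,
    value: reduce(null, groups.get(key), false),
  }));
}

// Invented sample documents.
const rows = queryGrouped([
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Cardiff University", region: "Wales" },
]);
console.log(rows);
```

In practice, CouchDB’s built-in _count or _sum reduce functions would do the same job as the hand-written reduce.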

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of advances pioneered by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale, low-cost data storage using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that every double quote, for example, needs to be backslash-escaped. This is very tedious to do by hand.
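One way to avoid the hand-escaping is to write the function as ordinary JavaScript and let a JSON serialiser produce the string, which is broadly what happens when CouchApp packages source files. The show function s2 below is hypothetical; Function.prototype.toString returns its source text and JSON.stringify inserts the backslashes.

```javascript
// Hypothetical show function written as ordinary JavaScript: the double quotes
// in its HTML need no hand-escaping in this form.
const s2 = function (doc, req) {
  return '<div id="map_canvas" title="' + doc._id + '"></div>';
};

// toString() yields the function's source text, and JSON.stringify adds the
// backslash escapes that a design document field requires.
const ddoc = { _id: "_design/d1", shows: { s2: s2.toString() } };
const json = JSON.stringify(ddoc, null, 2);
console.log(json);
```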

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width: 100%; height: 100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code (http://github.com/jchris/sofa). I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0],
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
      }

      // close the loop after all rows are rendered
      return "</feed>";
    });
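The list function above relies on two functions supplied by CouchDB: getRow(), which returns the next row of the View (or a falsy value when none are left), and send(), which streams a chunk of output to the client. The sketch below fakes both so that a cut-down feed list can be run on an ordinary array; runList and feedList are hypothetical, and in CouchDB getRow and send are globals rather than parameters.

```javascript
// A stand-in for the CouchDB list runtime, for illustration only. Inside
// CouchDB, getRow() and send() are globals; here they are passed in so the
// list function can be exercised on ordinary arrays.
function runList(listFn, viewRows) {
  let next = 0;
  const chunks = [];
  const api = {
    getRow: () => (next < viewRows.length ? viewRows[next++] : null),
    send: (chunk) => chunks.push(chunk),
  };
  const tail = listFn(api); // the returned string is sent as the last chunk
  if (tail) chunks.push(tail);
  return chunks.join("");
}

const esc = (s) => String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;");

// A cut-down feed in the style of Sofa's index.js: header, one entry per row,
// then the closing tag returned at the end.
function feedList({ getRow, send }) {
  send('<feed xmlns="http://www.w3.org/2005/Atom">');
  let row;
  while ((row = getRow())) {
    send(`<entry><id>${esc(row.id)}</id><point>${row.value.latitude} ${row.value.longitude}</point></entry>`);
  }
  return "</feed>";
}

console.log(
  runList(feedList, [
    { id: "University of St Andrews", value: { latitude: 56.3412139443169, longitude: -2.79301175308608 } },
  ])
);
```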

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:


// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },
      // ...

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View: in the simplest case, the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This separation of data model and template provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
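Because provides() and List.withRows only exist inside CouchDB, the per-row mapping in georss.js is hard to try out on its own. The sketch below extracts it as a pure function, toStashEntry (a hypothetical name), keeping only the fields the template uses. One deliberate difference: an empty programmes or countries array is treated as ‘none’ here, whereas the original ternary treats any array, even an empty one, as true.

```javascript
// Pure-function version of the per-row mapping in georss.js (hypothetical
// name), so it can be tried outside CouchDB. Only template fields are kept.
function toStashEntry(institution) {
  const programmes = institution.programmes || [];
  return {
    id: institution._id,
    latitude: institution.latitude,
    longitude: institution.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes.map((programme) => {
      const countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each code in an object so the template can refer to {{country}}.
        countries: countries.map((country) => ({ country })),
      };
    }),
  };
}

const entry = toStashEntry({
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: "Chevening Programme", url: "http://www.chevening.com", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", url: "http://www.csfp-online.org", countries: ["bd", "za"] },
  ],
});
console.log(JSON.stringify(entry, null, 2));
```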

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an .html file, the output it produces is in fact in GeoRSS XML format; the .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
        {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
            {{#countries}}
              &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
            {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
        {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload PNG files from a folder, so that each image file becomes an attachment to its own CouchDB document (one document per country).

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done

Bibliography

[1] A NoSql Summer. A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.

[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file

[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars

[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2–11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open_Database_Connectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


2.9 Reducing the Impedance Mismatch

The term ‘object-relational impedance mismatch’ refers to a set of difficulties encountered by object-oriented applications developers when storing objects in relational databases. Over the years, a set of Object-Relational Mapping (ORM) tools have emerged; examples are Active Record [41], Hibernate [9] and Entity Framework [39]. These tools are very widely used by applications developers to ‘bridge the gap’ between the object and relational data models.

However, ORM tools are complex and thus difficult to maintain.²

Another tactic used by application developers is to use SQL Views - effectively de-normalising data so that the data model is closer to the application’s object model.

A key advantage of ‘NoSql’ systems is that they accept data of any structure, so that the developer does not need to fragment the data into relational tables in order to store it. This by-passes the object-relational mismatch problem entirely and radically simplifies application development [36].

2.10 Benefits of ‘NoSql’

As discussed, ‘NoSql’ databases take a fresh look at the problem of data persistence. They take advantage of Moore’s Law: disk space has become so economical as to be almost free, and memory has grown so large (it is now measured in gigabytes) that many databases can fit entirely in memory.

• ‘NoSql’ databases provide benefits to consumers in the form of cached, replicated, always-available data.

• They provide benefits to enterprises because, for reporting data, it is cheaper to operate a ‘NoSql’ database at scale; managing clusters of relational database management systems is very expensive. It’s better to reserve this expense for transactional processing and to off-load the reporting burden to a cheaper, more scalable product. CouchDB, for example, is designed to run reliably on clusters of low-cost, commodity-grade hardware.

• Finally, ‘NoSql’ databases provide benefits to developers through an easier way to persist and access data, as well as greater flexibility of schema. The persistence layer, in effect, uses the same data model as the application layer.

² An entertaining and informative essay on the topic [40] describes object-relational mapping as ‘the Vietnam of Computer Science’, referring to the quagmire of the Vietnam war.


2.11 Ad-hoc Querying - the ‘Achilles Heel’ of ‘NoSql’

It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support, or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions; there is little capability to query the system in an arbitrary way.

(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)
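To make the limitation concrete: a view’s rows are stored sorted by the emitted key, so CouchDB can answer lookups by key or key range (startkey and endkey) efficiently, but not arbitrary conditions on other fields. The sketch below emulates a key-range query over an invented by_region view; the helper names and sample rows are hypothetical, and plain string comparison stands in for CouchDB’s collation.

```javascript
// Illustrative sketch of what a view query can and cannot do. Rows are kept
// sorted by their emitted key, so a key-range lookup is a binary search
// followed by a scan.
function lowerBound(rows, key) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Similar in spirit to .../_view/by_region?startkey="Scotland"&endkey="Scotland"
// (endkey is inclusive by default).
function rangeQuery(sortedRows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(sortedRows, startkey); i < sortedRows.length; i++) {
    if (sortedRows[i].key > endkey) break;
    result.push(sortedRows[i]);
  }
  return result;
}

// Invented rows of a hypothetical view that emits doc.region as its key.
const byRegion = [
  { key: "England", id: "Institution A" },
  { key: "Scotland", id: "University of Aberdeen" },
  { key: "Scotland", id: "University of St Andrews" },
  { key: "Wales", id: "Institution B" },
];
console.log(rangeQuery(byRegion, "Scotland", "Scotland").map((r) => r.id));
// A question such as "institutions whose postcode starts with KY" cannot be
// answered from this view; it needs another view keyed on postcode.
```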

Very often a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way the ‘NoSql’ solution is a cached representation of the database used purely for reporting; the established and more expensive relational system can be kept in continued use for both transactional processing and ad-hoc querying using SQL.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].

The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories worldwide. In 2009-2010 the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.


Figure 2.1: British Council project data displayed on Bing Maps

The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.NET on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.

Outputs from the Activity Mapping project have included letters to MPs, online maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.


Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format, rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect using CouchDB as a web caching layer.

The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
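The export step can be pictured as a join that nests child rows inside each parent record. The sketch below turns rows from three tables into documents shaped like the example in Appendix D; the table and column names (institutions, programmes, participation) and the sample rows are invented, since the report does not describe the SQL Server schema.

```javascript
// Hypothetical export step: join three relational tables into one nested JSON
// document per institution. Table and column names are invented for this
// sketch; the real SQL Server schema is not described in this report.
function toDocuments(institutions, programmes, participation) {
  const programmeById = new Map(programmes.map((p) => [p.programmeId, p]));
  return institutions.map((inst) => {
    // Group this institution's participation rows by programme.
    const countriesByProgramme = new Map();
    for (const row of participation) {
      if (row.institutionId !== inst.institutionId) continue;
      if (!countriesByProgramme.has(row.programmeId)) {
        countriesByProgramme.set(row.programmeId, []);
      }
      if (row.countryCode) countriesByProgramme.get(row.programmeId).push(row.countryCode);
    }
    return {
      _id: inst.name, // documents in this project are keyed by institution name
      institutionID: inst.institutionId,
      latitude: inst.latitude,
      longitude: inst.longitude,
      region: inst.region,
      programmes: [...countriesByProgramme].map(([programmeId, countries]) => ({
        name: programmeById.get(programmeId).name,
        url: programmeById.get(programmeId).url,
        countries,
      })),
    };
  });
}

const docs = toDocuments(
  [{ institutionId: 15161, name: "University of St Andrews", latitude: 56.34, longitude: -2.79, region: "Scotland" }],
  [{ programmeId: 1, name: "Chevening Programme", url: "http://www.chevening.com" }],
  [
    { institutionId: 15161, programmeId: 1, countryCode: "id" },
    { institutionId: 15161, programmeId: 1, countryCode: "mz" },
  ]
);
console.log(JSON.stringify(docs[0], null, 2));
```

Each resulting document can then be stored as-is, which is the sense in which the relational structure is ‘already’ available to CouchDB.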

This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested to explore how ready this new technology is for ‘prime time’, that is, to what extent it is ready for adoption by web applications developers such as myself who are used to developing solutions on current standard platforms.³

³ In my case the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed as a JSON string [30].

3.1 ‘Of the Web’

The feature of combining a web server with B-Tree-based data storage has been key to CouchDB’s success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place that I take advantage of in this project.

An often-used quote about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]

Normally a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in a single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available at [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a ‘document-oriented’ database in which each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data-exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

¹ Operating System

Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the lack of a postal code is simply denoted by its absence from the document itself, which is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so if we wish we may legitimately store the following document in the very same ‘addressbook’ database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a new line character, so in this case the streetAddress spans three separate printed lines.)
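To make the contrast with NULL concrete, here is a minimal sketch in JavaScript (runnable with Node.js and independent of CouchDB itself) that parses both addressbook documents from JSON text, checks for a postal code with the `in` operator, and splits the Zambia street address on its newline characters. The helper names `hasPostalCode` and `addressLines` are hypothetical, chosen only for this illustration.

```javascript
// Sketch only: hypothetical helpers, not part of CouchDB.
// Both documents are parsed from JSON text, as a client would receive them.
// In the JSON text, "\\n" is the two-character escape that JSON.parse
// turns into a real newline character.
const johnSmith = JSON.parse(
  '{"firstName": "John", "lastName": "Smith", "age": 25,' +
  ' "address": {"streetAddress": "21 2nd Street", "city": "New York",' +
  ' "state": "NY", "postalCode": "10021"}}'
);
const bcZambia = JSON.parse(
  '{"company": "British Council Zambia",' +
  ' "address": {"streetAddress": "Heroes Place\\nCairo Road\\nPO Box 34571",' +
  ' "city": "Lusaka", "country": "Zambia"}}'
);

// A field that does not apply is simply absent: no NULL placeholder is stored.
function hasPostalCode(doc) {
  return doc.address !== undefined && "postalCode" in doc.address;
}

// Each newline in streetAddress marks a new printed line.
function addressLines(doc) {
  return doc.address.streetAddress.split("\n");
}

console.log(hasPostalCode(johnSmith)); // true
console.log(hasPostalCode(bcZambia)); // false
console.log(addressLines(bcZambia).length); // 3
```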

In Section 4 I will demonstrate how these documents are uploaded toand retrieved from a CouchDB server

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available at [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8] to provide a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010) the service is in beta and is free of charge.

Please note that, after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line we use ‘cURL’, which is a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(Here username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same request with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

The cURL syntax is explained in [44]. Briefly:

• -H introduces an HTTP header
• -X POST instructs cURL to use an HTTP POST request (instead of a GET)
• -d introduces the data to be sent
• @ denotes that what follows is a file name

All data in CouchDB is stored in JSON format together with a unique key and a revision number, so in CouchDB the resulting document would have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
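The _rev value has two parts separated by a hyphen: a counter that starts at 1 and increases with each update, and a hash. The short JavaScript sketch below splits a revision string into these parts, which can help when checking which revision a client is holding. The helper `parseRev` is hypothetical and not part of any CouchDB client library.

```javascript
// Hypothetical helper: split a CouchDB revision string such as
// "1-91bce055fc8db86480400321079f0834" into its counter and hash parts.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  if (dash < 1) {
    throw new Error("not a revision string: " + rev);
  }
  return {
    generation: Number(rev.slice(0, dash)), // 1 for the first saved version
    hash: rev.slice(dash + 1),
  };
}

const rev = parseRev("1-91bce055fc8db86480400321079f0834");
console.log(rev.generation); // 1
console.log(rev.hash); // 91bce055fc8db86480400321079f0834
```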

This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator at http://jsonlint.com.


Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon


4.5 Deleting a Document

When deleting a document we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error (HTTP status 409 Conflict) if an update is attempted on an ‘old’ version of a document.

As we are updating a document whose _id, and therefore URL, we already know, we use the PUT verb instead of POST.

The following command will update John Smith’s record with the contents of the file john-smith-v2.json:

curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
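The revision check in the delete and update examples is a form of optimistic concurrency control. The toy in-memory store below (JavaScript for Node.js) imitates that behaviour: it accepts an update or delete only when the caller supplies the current _rev, and otherwise reports a conflict with status 409. `TinyStore` and its methods are hypothetical, and the revision hash is a stand-in; this is not how CouchDB itself is implemented.

```javascript
// Toy imitation of CouchDB's revision check; not CouchDB code.
const crypto = require("crypto");

class TinyStore {
  constructor() {
    this.docs = new Map(); // _id -> stored document (including _rev)
  }

  // Create (no existing doc) or update (caller must hold the current rev).
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) {
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    // Stand-in hash; CouchDB computes its revision hash differently.
    const hash = crypto.createHash("md5").update(JSON.stringify(body) + generation).digest("hex");
    const doc = { ...body, _id: id, _rev: generation + "-" + hash };
    this.docs.set(id, doc);
    return { status: 201, ok: true, id: id, rev: doc._rev };
  }

  // Delete only if the caller holds the latest revision.
  remove(id, rev) {
    const current = this.docs.get(id);
    if (!current) return { status: 404, error: "not_found" };
    if (current._rev !== rev) return { status: 409, error: "conflict" };
    this.docs.delete(id);
    return { status: 200, ok: true };
  }
}

const store = new TinyStore();
const first = store.put("john-smith", { firstName: "John" });
const second = store.put("john-smith", { firstName: "John", age: 26 }, first.rev);
const stale = store.put("john-smith", { firstName: "Johnny" }, first.rev);
console.log(stale.status); // 409: the client was holding an old revision
console.log(store.remove("john-smith", second.rev).status); // 200
```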

4.7 Adding Attachments

CouchDB provides a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/³

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.


The uploaded image may be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag

4.8 Replication

One particularly interesting and useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine so that we can work on it in offline mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping the inner ones where necessary:

curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
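Much of the quoting trouble above disappears when a program, rather than the shell, builds the request body. The sketch below (JavaScript for Node.js; the function name `replicationRequest` is hypothetical) constructs the same POST /_replicate request, with JSON.stringify doing all the quoting and escaping. The final fetch call is commented out because it assumes Node.js 18 or later and a CouchDB instance listening on 127.0.0.1:5984.

```javascript
// Build the POST /_replicate request used above; JSON.stringify does the
// quoting that had to be done by hand on the command line.
function replicationRequest(couchUrl, source, target) {
  return {
    url: couchUrl + "/_replicate",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ source: source, target: target }),
    },
  };
}

const req = replicationRequest(
  "http://127.0.0.1:5984",
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(req.options.body);
// {"source":"http://username:password@mick.couchone.com/addressbook","target":"http://127.0.0.1:5984/addressbook"}

// With a local CouchDB running (Node.js 18+ provides a global fetch):
// fetch(req.url, req.options).then((r) => r.json()).then(console.log);
```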

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23] describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the ‘map’ stage ranges over the document collection and emits zero or more key-value pairs for each document it encounters; the ‘reduce’ stage is used when an aggregate (such as a sum or a count) must be calculated over those intermediate values.

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb) I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]
{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code as opposed to data.

The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.

The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function.

The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate such as SUM in SQL; reducing separately for each key, with grouping, is the closer analogue of GROUP BY).

[http://localhost:5984/election-2005/_design/con]
{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]
{"rows":[{"key":null,"value":8782198}]}
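To see what the two views compute, the sketch below imitates them in plain JavaScript. It is a simplification, not CouchDB's view server: a hypothetical `runView` helper calls a map function once per document, collects what the locally defined `emit` records, sorts the rows by key and, for an ungrouped reduce, passes all keys and values to the reduce function. `sum` is defined locally to stand in for CouchDB's built-in. Only the Aberavon figure comes from the database above; the two other constituencies are made-up test data.

```javascript
// Minimal imitation of CouchDB view evaluation; not CouchDB's own code.
let collected = []; // rows emitted while the current view is being built
let currentDoc = null; // document currently being mapped

// Stand-in for CouchDB's emit(): record one key/value row.
function emit(key, value) {
  collected.push({ id: currentDoc._id, key: key, value: value });
}

// Stand-in for CouchDB's built-in sum() used in reduce functions.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

function runView(docs, map, reduce) {
  collected = [];
  for (const doc of docs) {
    currentDoc = doc;
    map(doc);
  }
  // View rows come back in key order.
  const rows = collected.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  if (!reduce) return { rows: rows };
  const keys = rows.map((r) => [r.key, r.id]);
  const values = rows.map((r) => r.value);
  return { rows: [{ key: null, value: reduce(keys, values) }] };
}

// The two views from the 'con' design document.
const conservativeVotes = function (doc) { emit(doc.con, doc._id); };
const conservativeVotesTotal = function (doc) { emit(null, doc.con); };
const totalReduce = function (keys, values) { return sum(values); };

// Aberavon is real; "Constituency A" and "Constituency B" are invented.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Constituency A", con: 1500 },
  { _id: "Constituency B", con: 2500 },
];

console.log(runView(docs, conservativeVotes).rows.map((r) => r.id));
// [ 'Constituency A', 'Constituency B', 'Aberavon' ]
console.log(JSON.stringify(runView(docs, conservativeVotesTotal, totalReduce)));
// {"rows":[{"key":null,"value":7062}]}
```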

It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB

This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that each map-reduce function saved in a design document is backed by its own B-Tree index, which CouchDB builds the first time the view is queried. From that point on, whenever documents are added to or changed in the collection, the index is updated incrementally (when the view is next read) rather than rebuilt from scratch.

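As a result, the system is optimised for fast indexed retrieval: a query against a saved view never performs a ‘table scan’ at run time, because every read is an indexed lookup. (Temporary views, which CouchDB provides for development, are the exception, as they are computed over the whole database on each request.)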
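The idea of an incrementally maintained index that answers range queries, such as the startkey/endkey request above (CouchDB documents these parameter names in lowercase), can be sketched with a sorted array and binary search. This JavaScript sketch is a deliberate simplification: CouchDB uses an on-disk B-Tree, not an array, and the class name `SortedIndex` is hypothetical. Adding a row costs one binary search instead of a re-scan of the collection, and a range query touches only the matching slice.

```javascript
// Simplified stand-in for a view index: rows kept sorted by key.
class SortedIndex {
  constructor() {
    this.rows = []; // [{ key, id }] in ascending key order
  }

  // First position whose key is >= key (or > key when `after` is true).
  position(key, after = false) {
    let lo = 0;
    let hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      const k = this.rows[mid].key;
      if (k < key || (after && k === key)) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Incremental update: slot a single new row into place.
  insert(key, id) {
    this.rows.splice(this.position(key, true), 0, { key: key, id: id });
  }

  // Inclusive range lookup, like ?startkey=...&endkey=...
  range(startkey, endkey) {
    return this.rows.slice(this.position(startkey), this.position(endkey, true));
  }
}

const index = new SortedIndex();
// Keys follow the 'Conservative Votes' view; only Aberavon's figure is real.
index.insert(3062, "Aberavon");
index.insert(1500, "Constituency A"); // hypothetical
index.insert(2500, "Constituency B"); // hypothetical
index.insert(1000, "Constituency C"); // hypothetical
console.log(index.range(1000, 2000).map((r) => r.id)); // [ 'Constituency C', 'Constituency A' ]
```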

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB, namely CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘design documents’.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (On Windows, a similar effect could be achieved using a batch file.)

#!/bin/bash
#host=http://127.0.0.1:5984/
host=http://username:password@mick.couchone.com/
database=universities
folder=universities
fileextension=.json
FILES=$folder/*
# create the database
curl -X PUT "$host$database"
for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path (strip the folder prefix)
  filename=${filepath#"$folder"/}
  echo "$filename"
  # the document id is the file name without its extension
  docname=${filename%"$fileextension"}
  echo "$docname"
  url=$host$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)

To put a single document to the web, the command used would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json
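Document _id values that contain spaces, such as ‘Aberystwyth University’, must be percent-encoded when they appear in a URL, which is why the command above uses %20. A short JavaScript sketch that builds such URLs with the standard encodeURIComponent function; the helper `documentUrl` is hypothetical and intended for ordinary (non-design) documents.

```javascript
// Hypothetical helper: build the URL for a document whose _id may contain
// characters (spaces, '&', '/', ...) that are not safe in a URL path.
function documentUrl(host, database, id) {
  return host + "/" + encodeURIComponent(database) + "/" + encodeURIComponent(id);
}

console.log(documentUrl("http://mick.couchone.com", "universities", "Aberystwyth University"));
// http://mick.couchone.com/universities/Aberystwyth%20University
```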

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an _id that begins with _design/ [12].

For the first design document I have chosen an _id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string which is rendered as output.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a field name with a special meaning in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
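Because a show function is ordinary JavaScript, it can be exercised outside CouchDB before it is uploaded. The sketch below calls a copy of s1 directly with a mock document; the latitude and longitude values are made up for illustration, and the empty object passed as req stands in for CouchDB's request object.

```javascript
// A copy of the s1 show function, as a plain JavaScript function.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// Mock document: the coordinates are illustrative, not real data.
const mockDoc = { _id: "Aston University", latitude: 52.49, longitude: -1.89 };
const html = s1(mockDoc, {});
console.log(html);
// <h1>Aston University</h1><p>latitude: 52.49</p><p>longitude: -1.89</p>
```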

Having saved the design document as d1.json, we can upload it to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 \
  -d @d1.json

To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML pages.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the _id of the document to be rendered


5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped, because the function’s source code is itself stored as a string value inside the JSON design document.
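One way to avoid escaping quotes by hand, and broadly the idea behind tools such as CouchApp (Section 6), is to write the functions as ordinary JavaScript and let a program serialise them. In the sketch below, `buildDesignDoc` is a hypothetical helper (not part of CouchDB or CouchApp), and map1 is a cut-down stand-in for the function in Appendix A. Function.prototype.toString() recovers each function's source and JSON.stringify inserts every required escape, so the output could be saved as a .json file and uploaded with cURL as before.

```javascript
// Hypothetical helper: turn a { name: function } map into the string form
// that a CouchDB design document stores under "shows".
function buildDesignDoc(id, shows) {
  const doc = { _id: id, shows: {} };
  for (const [name, fn] of Object.entries(shows)) {
    doc.shows[name] = fn.toString(); // the function's source text
  }
  return JSON.stringify(doc, null, 2); // escapes quotes and newlines for us
}

// Cut-down stand-in for map1: its HTML contains double quotes.
const map1 = function (doc, req) {
  return '<div id="map" style="width: 100%; height: 400px"></div>' +
    '<p>' + doc._id + '</p>';
};

const json = buildDesignDoc("_design/d1", { map1: map1 });
console.log(json.includes('id=\\"map\\"')); // true: the quotes were escaped automatically
```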

When authoring design documents manually, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.


Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.

Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document _id values

A ‘view’ is in effect a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple view, simply emitting the document ids
• l1 is a very simple list that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while ((row = getRow())) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


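Note the built-in functions start, getRow and send:
• start is called once, before any output is sent, to set the response headers (here the Content-Type)
• getRow returns the next row of the view result each time it is called, and a falsy value once all rows have been consumed
• send is called once for each row in the dataset, inside the getRow() loop, and writes a chunk of HTML to the response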
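The interplay of start, getRow and send can be tried out without a server. In the JavaScript sketch below the three functions are local stand-ins written for this illustration (a hypothetical harness, not CouchDB's view server): they feed view rows to a copy of the l1 list function and collect whatever it sends. The two rows are example input shaped like the output of v1, where key and value are both the document _id.

```javascript
// Stand-ins for CouchDB's list-function environment (hypothetical harness).
let pendingRows = [];
let output = [];
let responseHeaders = null;

function start(response) { responseHeaders = response.headers; }
function getRow() { return pendingRows.length > 0 ? pendingRows.shift() : null; }
function send(chunk) { output.push(chunk); }

// Run a list function over view rows and return the assembled body.
function renderList(listFn, rows) {
  pendingRows = rows.slice();
  output = [];
  listFn({ total_rows: rows.length, offset: 0 }, {});
  return output.join("");
}

// Copy of l1 from the design document.
const l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + '</a></p>');
  }
};

// Rows in key order, shaped like the output of view v1.
const body = renderList(l1, [
  { id: "Aberystwyth University", key: "Aberystwyth University", value: "Aberystwyth University" },
  { id: "Aston University", key: "Aston University", value: "Aston University" },
]);
console.log(responseHeaders["Content-Type"]); // text/html
console.log(body.split("<p>").length - 1); // 2
```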

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload them to a CouchDB server in the normal way. However, this approach becomes very cumbersome, and the documents difficult to maintain, as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
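In the GeoRSS Simple encoding, a location is a single georss:point element containing the latitude and the longitude separated by a space, in the namespace http://www.georss.org/georss. The minimal JavaScript sketch below renders one document as an entry fragment carrying such a point; it is meant to sit inside an Atom feed element, and other required Atom elements (such as id and updated) are omitted for brevity. The function names `georssEntry` and `escapeXml`, and the sample coordinates, are hypothetical.

```javascript
// Hypothetical helper: render one location as an Atom entry fragment with a
// GeoRSS Simple point ("latitude longitude", separated by a space).
const GEORSS_NS = "http://www.georss.org/georss";

// Escape characters that are special in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function georssEntry(doc) {
  return (
    '<entry xmlns:georss="' + GEORSS_NS + '">' +
    "<title>" + escapeXml(doc._id) + "</title>" +
    "<georss:point>" + doc.latitude + " " + doc.longitude + "</georss:point>" +
    "</entry>"
  );
}

// Illustrative coordinates, not taken from the Activity Mapping data.
console.log(georssEntry({ _id: "Aston University", latitude: 52.49, longitude: -1.89 }));
```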

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer’s perspective is that it creates a separate file for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project, and CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available at http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J Chris Anderson. I made a small number of changes to three files in my copy of it in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by placing each area of the design document in a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document called bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains a folder for each CouchDB View, holding a map.js file and, when needed, a reduce.js file. A View can be thought of as a query against the database.

• The shows folder contains .js files, one for each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.

• Similarly, the lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.


Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function, s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it into HTML.

Figure 6.4: s1.js, a file generated by CouchApp
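A Show function is an ordinary JavaScript function of (doc, req), so it can be exercised outside CouchDB. The sketch below assumes a version of s1 along the lines of the one in Appendix A and calls it directly with a hand-made document. Inside CouchDB, the server supplies doc and req when a URL of the form /universities/_design/d1/_show/s1/<document id> is requested. The coordinates are illustrative.

```javascript
// A Show function in the style of s1: it turns one document into HTML.
// CouchDB would call it with the requested document and a request object.
function s1(doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// Call it directly with a sample document; req is not used by this function.
const sampleDoc = { _id: "University of Aberdeen", latitude: 57.16, longitude: -2.1 };
console.log(s1(sampleDoc, {}));
// prints: <h1>University of Aberdeen</h1><p>latitude: 57.16</p><p>longitude: -2.1</p>
```

Note that this style of function inserts field values into HTML without escaping them. That is acceptable for controlled data such as institution names, but user-supplied content would need escaping.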

Similarly, couchapp generate list l1 will create a ‘List’ function, l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.

We recall that in CouchDB a View is in effect a query against the database.

As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, then the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
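Since .couchapprc is plain JSON, it is easy to see what it tells CouchApp. The sketch below is written for illustration and is not taken from CouchApp. It resolves the default environment's db URL into the address of the design document, assuming the design document's id is _design/bc, and masks the password before printing. Masking is worth the trouble because the file above keeps the credentials in clear text.

```javascript
// Sketch: work out where a push would send the design document, using the
// settings from .couchapprc. The credentials here are placeholders.
const couchapprc = {
  env: {
    default: {
      db: "http://username:password@mick.couchone.com/universities",
    },
  },
};

// Build the URL of a design document inside the database named by an environment.
function designDocUrl(rc, envName, designId) {
  const env = rc.env && rc.env[envName];
  if (!env) throw new Error(`no environment named ${envName}`);
  const url = new URL(env.db);
  url.pathname = url.pathname.replace(/\/$/, "") + "/" + designId;
  return url;
}

// Hide the password before a URL is logged anywhere.
function redacted(url) {
  const copy = new URL(url.href);
  if (copy.password) copy.password = "****";
  return copy.href;
}

const target = designDocUrl(couchapprc, "default", "_design/bc");
console.log(redacted(target));
// prints: http://username:****@mick.couchone.com/universities/_design/bc
```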

With this file in place, running couchapp push from the bc folder uploads the design document to CouchDB. So far we have uploaded a very simple design document, just as we did in Section 5, and this has demonstrated how much easier it is to develop solutions with the CouchApp framework than by working with design documents directly.

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used in Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.
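To show what iterating through sets of values means in practice, here is a deliberately tiny Mustache-style renderer. It is a simplified stand-in written for this explanation, not mustache.js, whose 2010 entry point as used in Appendix C is Mustache.to_html. It supports only {{name}} variables, which it HTML-escapes, and {{#section}}...{{/section}} blocks. A section repeats for an array, renders once for any other truthy value, and is skipped for false, missing or empty values. Names are looked up from the innermost context outwards, as in Mustache. Inverted sections, partials, and nested sections that reuse the same name are not handled.

```javascript
// A tiny Mustache-style renderer, written only to illustrate the idea.
// It is NOT mustache.js and omits inverted sections, partials and
// nested sections that reuse the same name.

function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Resolve a name against the context stack, innermost context first.
function lookup(stack, name) {
  for (let i = stack.length - 1; i >= 0; i--) {
    const ctx = stack[i];
    if (ctx !== null && typeof ctx === "object" && name in ctx) return ctx[name];
  }
  return undefined;
}

function render(template, stack) {
  // Either a whole section (opening tag, body, matching closing tag) or a variable.
  const tag = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}|\{\{(\w+)\}\}/g;
  return template.replace(tag, (match, section, body, variable) => {
    if (variable !== undefined) {
      const value = lookup(stack, variable);
      return value === undefined || value === null ? "" : escapeHtml(value);
    }
    const value = lookup(stack, section);
    if (Array.isArray(value)) {
      // Arrays repeat the body once per item; an empty array renders nothing.
      return value.map((item) => render(body, [...stack, item])).join("");
    }
    // Other truthy values render the body once; falsy values skip it.
    return value ? render(body, [...stack, value]) : "";
  });
}

const template =
  "{{#programmes}}<p>{{name}}{{#countries}} [{{country}}]{{/countries}}</p>{{/programmes}}";
const view = {
  programmes: [
    { name: "Chevening Programme", countries: [{ country: "id" }, { country: "mz" }] },
    { name: "CSFP", countries: [] },
  ],
};
console.log(render(template, [view]));
// prints: <p>Chevening Programme [id] [mz]</p><p>CSFP</p>
```

This also explains why the list function wraps each country code in an object such as { country: "id" }: a section body can then refer to {{country}} by name.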

Firstly, let us generate a simple View, all, which consists of a single map.js file (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the database, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
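What CouchDB does with such a map function can be imitated in a few lines, which helps when reasoning about view output. The sketch below is a simplified model, not CouchDB's implementation. It runs the stored function source over every document with a local emit, then sorts the rows by key. CouchDB itself sorts string keys with Unicode collation and keeps the rows in an index that it updates incrementally; plain string comparison is used here only because the keys are document ids. runView and the sample documents are illustrative.

```javascript
// Simplified model of how a map function produces view rows.
// runView is a hypothetical helper, not CouchDB's implementation.
function runView(docs, mapSource) {
  const rows = [];
  let currentId = null;
  // emit() records a row for the document currently being mapped.
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  // Turn the stored source text into a function that can see emit().
  const mapFn = new Function("emit", `return (${mapSource});`)(emit);
  for (const doc of docs) {
    currentId = doc._id;
    mapFn(doc);
  }
  // Order rows by key; plain string comparison is enough for document ids here.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, rows };
}

const allMap = "function(doc) { emit(doc._id, doc); }";
const docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
];
const result = runView(docs, allMap);
console.log(result.rows.map((row) => row.key));
// keys come back sorted: University of Aberdeen, then University of St Andrews
```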

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss


This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <georss:point>{{latitude}} {{longitude}}</georss:point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if a programme has a list of countries, emits the flags of those countries.

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
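The data preparation behind this loop can be tried out in plain JavaScript. The sketch below mirrors the shape of the stash built in georss.js (Appendix C) for a single institution. The has_programmes and has_countries flags let the template skip empty sections, using a slightly stricter "non-empty array" test than the original. The helper names are hypothetical. The flag URL pattern /countries/<code>/flag follows the URLs used in this report, and the sample document is the one shown in Appendix D.

```javascript
// Sketch: prepare one institution document for the GeoRSS template and
// work out the flag image URL for every country of every programme.
// toViewModel and flagUrl are hypothetical helpers for this illustration.
const FLAG_HOST = "http://mick.couchone.com"; // host used elsewhere in this report

function flagUrl(code) {
  return `${FLAG_HOST}/countries/${encodeURIComponent(code)}/flag`;
}

function toViewModel(doc) {
  const programmes = Array.isArray(doc.programmes) ? doc.programmes : [];
  return {
    id: doc._id,
    latitude: doc.latitude,
    longitude: doc.longitude,
    // Boolean flags let the template skip empty sections entirely.
    has_programmes: programmes.length > 0,
    programmes: programmes.map((programme) => {
      const countries = Array.isArray(programme.countries) ? programme.countries : [];
      return {
        name: programme.name,
        has_countries: countries.length > 0,
        // Wrap each country code in an object so the template can use {{country}}.
        countries: countries.map((code) => ({ country: code, flag: flagUrl(code) })),
      };
    }),
  };
}

const stAndrews = {
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", countries: ["bd", "za"] },
  ],
};

const model = toViewModel(stAndrews);
for (const programme of model.programmes) {
  console.log(programme.name, programme.countries.map((c) => c.flag));
}
```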

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this trend is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do there is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: data and code can be replicated in many places, so that the user has fast access to them even without a good internet connection. Distributed data and application code are important for reliable computing.
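Replication is itself requested over HTTP. A client POSTs a small JSON document naming a source and a target database to the server's /_replicate resource. Design documents are stored in the database like any other document, so the application code travels with the data. The sketch below only builds that request and does not send it, so it runs without a server. The server address, the remote database URL and the replicationRequest helper are placeholders. The optional continuous flag asks CouchDB to keep replicating as changes arrive.

```javascript
// Build (but do not send) a CouchDB replication request.
// replicationRequest is a hypothetical helper; all URLs are placeholders.
function replicationRequest(serverUrl, source, target, { continuous = false } = {}) {
  const body = { source, target };
  if (continuous) body.continuous = true; // keep replicating as changes arrive
  return {
    method: "POST",
    url: new URL("/_replicate", serverUrl).href,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Pull a remote database, design documents included, into a local CouchDB.
const req = replicationRequest(
  "http://127.0.0.1:5984",
  "http://example.com/universities",
  "universities"
);
console.log(req.method, req.url, req.body);
// With Node 18 or later this could be sent using:
//   fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```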

Such replication cannot be achieved easily using conventional web applications. Even with the maturity of prevailing technology, replicating code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. Because HTTP is the communications standard, existing programming languages, and command-line tools such as cURL, are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
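The mapping from document operations to HTTP verbs is small enough to write out in full. The sketch below describes, without sending, the requests for the basic operations on one document under CouchDB's document API:

- PUT to /db/docid to create a document.
- GET to read it.
- PUT again, including the current _rev, to update it.
- DELETE with a rev query parameter to remove it.

The couchRequests object and the revision string are illustrative. Each request could equally be issued with cURL.

```javascript
// Describe (without sending) the HTTP requests for basic CouchDB document
// operations. couchRequests is a hypothetical helper for illustration.
function docPath(db, id) {
  return `/${encodeURIComponent(db)}/${encodeURIComponent(id)}`;
}

const couchRequests = {
  // Create a document with a chosen id.
  create: (db, id, doc) => ({ method: "PUT", path: docPath(db, id), body: JSON.stringify(doc) }),
  // Read a document.
  read: (db, id) => ({ method: "GET", path: docPath(db, id) }),
  // Update: send the whole new document plus the revision being replaced.
  update: (db, id, rev, doc) => ({
    method: "PUT",
    path: docPath(db, id),
    body: JSON.stringify({ ...doc, _rev: rev }),
  }),
  // Delete: the current revision goes in the query string.
  remove: (db, id, rev) => ({
    method: "DELETE",
    path: `${docPath(db, id)}?rev=${encodeURIComponent(rev)}`,
  }),
};

const id = "University of St Andrews";
console.log(couchRequests.read("universities", id).path);
// prints: /universities/University%20of%20St%20Andrews
console.log(couchRequests.remove("universities", id, "1-7a29159d52cd71b9f6759ad6d3884945").path);
```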

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL

• understanding the structure of design documents

• the use of server-side JavaScript functions to transform documents into HTML and XML.

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database must be written in advance as a map-reduce view in a design document. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
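Counting institutions per region makes the contrast concrete. In SQL, against a hypothetical institutions table, this is SELECT region, COUNT(*) FROM institutions GROUP BY region. In CouchDB it becomes a view whose map function emits the region with the value 1 and whose reduce function sums the values, queried with group=true. The sketch below simulates both phases in plain JavaScript so they can be seen side by side. It is a model for explanation only: CouchDB keeps partial reductions in its index and may call reduce again on earlier results (the rereduce case). The region field follows the sample document in Appendix D; the other documents are made up.

```javascript
// Simulation of a grouped map-reduce query, equivalent in spirit to the SQL
//   SELECT region, COUNT(*) FROM institutions GROUP BY region;
// map, reduce and queryGrouped are illustrative, not CouchDB internals.

// Map phase: emit one row per document, keyed by region, with value 1.
function map(doc, emit) {
  if (doc.region) emit(doc.region, 1);
}

// Reduce phase: CouchDB passes (keys, values, rereduce). Summing is correct in
// both passes here, because the values being combined are always counts.
function reduce(keys, values, rereduce) {
  return values.reduce((total, value) => total + value, 0);
}

// Run map over every document, group rows by key (like ?group=true),
// then reduce each group to a single value.
function queryGrouped(docs) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push({ key, id: doc._id, value });
    });
  }
  return [...groups.keys()].sort().map((key) => {
    const rows = groups.get(key);
    return {
      key,
      value: reduce(rows.map((r) => [r.key, r.id]), rows.map((r) => r.value), false),
    };
  });
}

const institutions = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Cardiff University", region: "Wales" },
];
console.log(queryGrouped(institutions));
```

In a real design document, the map function would be written as function(doc) { if (doc.region) emit(doc.region, 1); }. The reduce could be CouchDB's built-in _sum, with the view queried as _view/<name>?group=true.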

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances made by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code (http://github.com/jchris/sofa). I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

      alternate : path.absolute(path.show('post', row.id)),
      // point : row.value.loc[1] + ' ' + row.value.loc[0]
      point : row.value.latitude + ' ' + row.value.longitude
    });
    // send the entry to client
    send(feedEntry);
  } while (row = getRow());
  // close the loop after all rows are rendered
  return "</feed>";
});

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from CouchDB documents into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <georss:point>{{latitude}} {{longitude}}</georss:point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload .png files from a folder so that the image files become attachments to CouchDB documents.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  # strip the .png extension to get the document name
  docname=$(echo "$filename" | sed -e 's/\.png$//')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done
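For comparison with the bash version, the same upload loop can be sketched in JavaScript for Node.js. This is an alternative sketch, not the script used in the project, and it rests on several assumptions:

- Node 18 or later is available, for the built-in fetch.
- The countries database already exists.
- The documents are new. CouchDB creates a document when an attachment is PUT to an id that does not yet exist, but replacing an existing attachment would also need the document's current _rev.

The helper names are hypothetical. The dryRun option is on by default and only prints the planned requests.

```javascript
// Sketch: upload each .png in a folder as the "flag" attachment of a CouchDB
// document named after the file (ie.png -> <db>/ie/flag). Assumes Node 18+
// (global fetch) and that the database already exists. URLs are placeholders.
const fs = require("fs");
const path = require("path");

// Work out which requests would be made, without touching the network.
function planUploads(folder, dbUrl) {
  const base = dbUrl.replace(/\/$/, "");
  return fs
    .readdirSync(folder)
    .filter((name) => name.endsWith(".png"))
    .map((name) => ({
      file: path.join(folder, name),
      url: `${base}/${encodeURIComponent(path.basename(name, ".png"))}/flag`,
    }));
}

// fetch() rejects URLs containing user:password, so credentials go in a header.
function basicAuth(user, password) {
  return "Basic " + Buffer.from(`${user}:${password}`).toString("base64");
}

async function uploadFlags(folder, dbUrl, { user, password, dryRun = true } = {}) {
  const plan = planUploads(folder, dbUrl);
  for (const { file, url } of plan) {
    if (dryRun) {
      console.log(`PUT ${url} <- ${file}`);
      continue;
    }
    const headers = { "Content-Type": "image/png" };
    if (user) headers.Authorization = basicAuth(user, password);
    // A PUT to <db>/<docid>/<attachment> creates the document if it is new.
    const res = await fetch(url, { method: "PUT", headers, body: fs.readFileSync(file) });
    console.log(res.status, url);
  }
  return plan;
}

// Example (dry run, prints the planned requests only):
// uploadFlags("countries/png", "http://mick.couchone.com/countries");
```

Unlike the bash script, this version derives the document name with path.basename, so it does not depend on the folder name being hard-coded in a sed expression.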

Bibliography

[1] A NoSql Summer. A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.

[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file

[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars

[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Database_normalization

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz

Page 16: Project Report for my MSc Computer Science project at Birkbeck

SECTION 2 BACKGROUND 12

211 Ad-hoc Querying - the lsquoAchilles Heelrsquo of lsquoNoSqlrsquo

It doesn’t come as much of a surprise that the main disadvantage of ‘NoSql’ solutions is the lack of SQL support, or (to state it more generally) the lack of ad-hoc querying support. As we will see in the case of CouchDB, queries are pre-computed using map-reduce functions, and there is little capability to query the system in an arbitrary way.

(A similar ‘NoSql’ product, MongoDB, does allow for SQL-like dynamic querying [38].)

Very often, a ‘NoSql’ solution is used for standard reporting purposes, where the reporting requirements are well known in advance. The data itself will come from a relational database, with business teams using SQL against the relational database for ad-hoc queries. In this way, the ‘NoSql’ solution is a cached representation of the database used purely for reporting, while the established and more expensive relational system is kept in use for both transactional processing and ad-hoc querying using SQL.

2.12 The Activity Mapping Project

The rest of this section introduces the Activity Mapping Project, which is a project I have been developing for the British Council [2].

The British Council is the United Kingdom’s international organisation for cultural relations. It has offices in over 100 countries and territories worldwide. In 2009-2010 the organisation had a turnover of £705 million [20].

2.13 Raising Awareness of British Council Impact in the UK

The main aim of the Activity Mapping project is to raise awareness of the work the British Council does for communities throughout the UK. The website for the project is http://activitymap.britishcouncil.org. This site consists of many hundreds of maps displaying the range of educational, arts, youth and science projects supported by the British Council in the UK.

Figure 2.1: British Council project data displayed on Bing Maps

The data from these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. As a result of this work, the British Council as an organisation now has a unified view of the work undertaken with each partner institution throughout the UK.

Outputs from the Activity Mapping project have included letters to MPs, online maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.

Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format, rather than as relational data, and then to use CouchDB’s built-in web server to serve the data in HTML and GeoRSS formats.

2.14 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council’s web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only and reports the activity for the previous year, I feel that something like CouchDB would be a good fit for the back-end data store, in effect using CouchDB as a web caching layer.

The data is already structured (by being exported from the British Council’s relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.

This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested in exploring how ready this new technology is for ‘prime time’, that is, to what extent it is ready for adoption by web applications developers such as myself who are used to developing solutions on current standard platforms.³

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed as a JSON string [30].

3.1 ‘Of the Web’

Combining a web server with B-Tree based data storage has been key to CouchDB’s success as a project, because all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries that are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and a database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. In this project I take advantage of this innovative storage of data and code in one place.

An often-cited quote about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities that emerge from combining web and database technologies in a single product:

“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]

Normally, a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in a single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an ‘indexable schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system that can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available here [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data-exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and in recent years it has established itself as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

¹ Operating System

Consider the use of NULL values in relational databases. In a relational database table, we would usually represent addresses in countries that don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, a missing postal code is denoted simply by leaving the field out of the document. This is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so if we wish, we may legitimately store the following document in the very same ‘addressbook’ database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a new line character, so in this case the streetAddress spans three separate printed lines.)
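Both points can be checked with a short JavaScript sketch that runs under Node.js. JavaScript is also the language CouchDB uses for its query functions. The sketch parses abridged versions of the two address documents and tests whether postalCode is present, rather than testing for a NULL value. It also shows that each \n escape becomes a real line break once the JSON is parsed. The helper describePostalCode is purely illustrative.

```javascript
// Abridged versions of the two address documents above. String.raw keeps the
// two-character sequence \n intact, exactly as it appears in the JSON text.
const johnSmith = JSON.parse(String.raw`{
  "firstName": "John",
  "lastName": "Smith",
  "address": { "city": "New York", "state": "NY", "postalCode": "10021" }
}`);

const zambiaOffice = JSON.parse(String.raw`{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}`);

// There is no NULL placeholder: a missing postal code is simply a missing field.
function describePostalCode(doc) {
  return "postalCode" in doc.address ? doc.address.postalCode : "no postal code";
}

console.log(describePostalCode(johnSmith));    // 10021
console.log(describePostalCode(zambiaOffice)); // no postal code

// After parsing, each \n escape has become a real newline character.
console.log(zambiaOffice.address.streetAddress.split("\n").length); // 3
```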

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language that avoids mutable data. There is no shared memory state between processes, so if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities. Every ten minutes, Desktop Couch can communicate with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available here [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8], which gives beginner programmers a way to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started, and Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of ‘corporate’ web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API

4.2 Hosted CouchDB Service Providers

If you are reading this section, I strongly recommend that you sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers for Windows and Mac are also available at the same URL. I installed the version that was current at the time of writing, 1.0.1.

(As already noted, Ubuntu comes with an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne, which is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use ‘cURL’, a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same statement with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.

curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

cURL syntax is explained here [44]. Briefly:

• -H introduces an HTTP header
• -X POST instructs cURL to use an HTTP POST request (instead of a GET)
• -d introduces the data to be sent
• @ denotes that what follows is a file name
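Because the interface is plain HTTP, the same request can be issued from a general-purpose language. The JavaScript sketch below, for Node.js, spells out the method, URL, headers and body that correspond to each cURL flag just described. The helper names createDocumentRequest and send are hypothetical. The credentials are sent in an Authorization header rather than in the URL, because the fetch API rejects URLs that embed a username and password.

```javascript
// Hypothetical helper: describes the request sent by the cURL command above.
function createDocumentRequest(host, db, doc, user, password) {
  const credentials = Buffer.from(`${user}:${password}`).toString("base64");
  return {
    method: "POST",                             // cURL: -X POST
    url: `${host}/${encodeURIComponent(db)}`,
    headers: {
      "Content-Type": "application/json",       // cURL: -H
      Authorization: `Basic ${credentials}`,    // instead of user:password@ in the URL
    },
    body: JSON.stringify(doc),                  // cURL: -d @john-smith.json
  };
}

// Sending it with Node.js 18+ (global fetch); defined here but not called.
async function send(req) {
  const res = await fetch(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  return res.json(); // CouchDB replies with {"ok":true,"id":...,"rev":...}
}

const req = createDocumentRequest(
  "http://mick.couchone.com", "addressbook",
  { firstName: "John", lastName: "Smith", age: 25 },
  "username", "password");
console.log(req.method, req.url); // POST http://mick.couchone.com/addressbook
```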

All data in CouchDB is stored in JSON format together with a unique key and a revision number, so in CouchDB the resulting document would have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

This document may be retrieved online here: http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator, http://jsonlint.com.

Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia

To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. A client application can then consume and transform the resulting JSON output using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon

4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
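When documents are deleted from code rather than from the command line, the revision travels in the query string exactly as in the cURL command above. The minimal JavaScript sketch below builds that URL. The helper name deleteUrl is hypothetical. encodeURIComponent takes care of document ids that contain spaces, such as the university ids used in Section 5.

```javascript
// Hypothetical helper: the URL for deleting one specific revision of a document.
function deleteUrl(host, db, docId, rev) {
  const path = `${host}/${encodeURIComponent(db)}/${encodeURIComponent(docId)}`;
  return `${path}?${new URLSearchParams({ rev })}`;
}

console.log(deleteUrl("http://mick.couchone.com", "addressbook",
  "british-council-zambia", "1-da7bcd810c608d6fbcb9ce92e9ade343"));
// http://mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
```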

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error if an update is attempted on an ‘old’ version of a document.

As we are updating a document at a known URL, we use the PUT verb instead of POST.

The following command will update John Smith’s record with the contents of the file john-smith-v2.json. The revision is the one CouchDB assigned to John Smith’s document when it was created:

curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
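The revision rule can be pictured with a toy in-memory store written in JavaScript. ToyStore is hypothetical and only imitates the check described above, a form of optimistic concurrency. A write that quotes the document's current _rev succeeds and produces a new revision. A write that quotes an older _rev is refused as a conflict. Real revision ids end in a value computed by CouchDB; the random suffix here is only a placeholder.

```javascript
// Toy store imitating CouchDB's revision check; not CouchDB's implementation.
class ToyStore {
  constructor() {
    this.docs = new Map();
  }

  // Create a document, or update it if the caller quotes the current _rev.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) {
      const err = new Error("Document update conflict.");
      err.status = 409; // CouchDB answers such requests with HTTP 409 Conflict
      throw err;
    }
    // Revision ids look like "<generation>-<suffix>"; the suffix here is a placeholder.
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const newRev = `${generation}-${Math.random().toString(16).slice(2, 10)}`;
    this.docs.set(id, { ...body, _id: id, _rev: newRev });
    return { ok: true, id, rev: newRev };
  }
}

const store = new ToyStore();
const v1 = store.put("john-smith", { firstName: "John", age: 25 });
const v2 = store.put("john-smith", { firstName: "John", age: 26 }, v1.rev);

try {
  store.put("john-smith", { firstName: "John", age: 27 }, v1.rev); // stale revision
} catch (e) {
  console.log(e.status, e.message); // 409 Document update conflict.
}
```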

4.7 Adding Attachments

CouchDB provides a way to save attachments, such as image files, to a database. The Activity Mapping project uses national flag icons³ from http://www.famfamfam.com/lab/icons/flags

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.

The uploaded image may be viewed here: http://mick.couchone.com/addressbook/british-council-zambia/flag
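Once the attachment is stored, fetching the document itself returns only a summary, or ‘stub’, of each attachment under the _attachments field, while the file itself is served from its own URL. The JavaScript sketch below assumes a stub layout with fields such as content_type, length and stub, as CouchDB uses. The revpos and length values are illustrative, and attachmentLinks is a hypothetical helper.

```javascript
// A document as it might come back from GET after the attachment upload above.
// The stub fields follow CouchDB's layout; revpos and length are illustrative.
const doc = {
  _id: "british-council-zambia",
  company: "British Council Zambia",
  _attachments: {
    flag: { content_type: "image/png", revpos: 4, length: 545, stub: true },
  },
};

// Hypothetical helper: the URL and media type of each attachment on a document.
function attachmentLinks(host, db, doc) {
  return Object.entries(doc._attachments || {}).map(([name, info]) => ({
    url: [host, encodeURIComponent(db), encodeURIComponent(doc._id),
      encodeURIComponent(name)].join("/"),
    type: info.content_type,
  }));
}

console.log(attachmentLinks("http://mick.couchone.com", "addressbook", doc)[0].url);
// http://mick.couchone.com/addressbook/british-council-zambia/flag
```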

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in offline mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'

Note that if we use cURL on Microsoft Windows, we must use double quotes exclusively, backslash-escaping them where necessary:

curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
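The quoting differences between shells disappear if a JSON serialiser produces the request body instead of someone typing it by hand. The JavaScript sketch below builds the same _replicate request; the helper replicationRequest is hypothetical. It includes the optional continuous flag, which asks CouchDB to keep replicating new changes, only to show where such options belong in the body.

```javascript
// Hypothetical helper: build the _replicate request shown above.
// JSON.stringify produces the quoting, so no shell-specific escaping is needed.
function replicationRequest(localHost, source, target, continuous = false) {
  const body = { source, target };
  if (continuous) body.continuous = true; // keep replicating as changes arrive
  return {
    method: "POST",
    url: `${localHost}/_replicate`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

const rep = replicationRequest(
  "http://127.0.0.1:5984",
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook");
console.log(rep.body);
// {"source":"http://username:password@mick.couchone.com/addressbook","target":"http://127.0.0.1:5984/addressbook"}
```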

4.9 Querying a CouchDB Database using Map-Reduce

So far, we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using map-reduce functions, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23], which describes how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages. The ‘map’ stage ranges over the document collection and emits zero or more key-value pairs for each document it encounters. The ‘reduce’ stage is used when an aggregate of these intermediate values needs to be calculated.
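The two stages can be imitated in a few lines of JavaScript, which may help make the terminology concrete. This is a sketch, not CouchDB's implementation. Here emit is passed to the map function as an argument, whereas CouchDB provides it as a global. Results are grouped by key, roughly what CouchDB returns when a view with a reduce function is queried with group=true. The documents are shaped like the election records shown below; apart from Aberavon, the constituencies and figures are invented.

```javascript
// A small in-memory imitation of map-reduce (not CouchDB's own engine).
function mapReduce(docs, map, reduce) {
  // Map stage: collect every [key, value] pair emitted for every document.
  const emitted = [];
  for (const doc of docs) {
    map(doc, (key, value) => emitted.push([key, value]));
  }

  // Group the values by key, then reduce each group to a single value.
  const groups = new Map();
  for (const [key, value] of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups.entries()]
    .map(([key, values]) => ({ key, value: reduce(values) }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0)); // rows in key order
}

// Aberavon is the record shown below; the other two constituencies are invented.
const docs = [
  { _id: "Aberavon", con: 3062, lab: 18077, lib: 4138, pc: 3546, oth: 1278 },
  { _id: "Example North", con: 20000, lab: 15000, lib: 9000 },
  { _id: "Example South", con: 12000, lab: 19000, lib: 8000 },
];

// Map: emit the winning party with a count of 1. Reduce: add up the counts.
const parties = ["con", "lab", "lib", "pc", "oth"];
const winnerMap = (doc, emit) =>
  emit(parties.reduce((best, p) => ((doc[p] || 0) > (doc[best] || 0) ? p : best)), 1);
const count = (values) => values.reduce((a, b) => a + b, 0);

console.log(mapReduce(docs, winnerMap, count));
```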

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type contains data, such as the document above showing the electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document ‘con’ (with _id _design/con) has been defined with two ‘views’ of the data.

The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function, emitting the Conservative vote as the key and the constituency’s _id as the value.

The second view, ‘Conservative Votes Total’, combines a map function with a reduce function to output the sum of all Conservative votes (analogous to SELECT SUM(con) in SQL).

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}
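To see what CouchDB does with the two functions of ‘Conservative Votes Total’, they can be run by hand in JavaScript. The sketch supplies its own stand-ins for emit and sum, which CouchDB normally provides to view code. It feeds the map function Aberavon plus one invented constituency; the real total of 8782198 comes from the full data set. Because every row is emitted with the key null, all the values fall into a single group, and the reduce step returns one grand total.

```javascript
// Stand-ins for functions that CouchDB provides to view code.
let rows = [];
let currentId = null;
function emit(key, value) { rows.push({ id: currentId, key, value }); }
function sum(values) { return values.reduce((a, b) => a + b, 0); }

// The map and reduce functions of "Conservative Votes Total", as stored above.
const map = function (doc) { emit(null, doc.con); };
const reduce = function (keys, values) { return sum(values); };

// Aberavon is the document shown earlier; the second record is invented.
const docs = [
  { _id: "Aberavon", con: 3062, lab: 18077, lib: 4138, pc: 3546, oth: 1278 },
  { _id: "Example North", con: 20000, lab: 15000, lib: 9000 },
];

for (const doc of docs) {
  currentId = doc._id;
  map(doc);
}
// CouchDB passes reduce the [key, docId] pairs and the emitted values.
const total = reduce(rows.map((r) => [r.key, r.id]), rows.map((r) => r.value));
console.log(JSON.stringify({ rows: [{ key: null, value: total }] }));
// {"rows":[{"key":null,"value":23062}]}
```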

It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000 (inclusive):

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
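CouchDB expects the values of key parameters such as startkey and endkey to be JSON, so a string key must be sent with its quotation marks, URL-encoded. The hypothetical JavaScript helper below builds such URLs. It JSON-encodes every parameter it is given, so it is meant only for key-type parameters.

```javascript
// Hypothetical helper: build a view URL with JSON-encoded key parameters.
function viewUrl(host, db, ddoc, view, keyParams = {}) {
  const path = [host, encodeURIComponent(db), "_design", encodeURIComponent(ddoc),
    "_view", encodeURIComponent(view)].join("/");
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(keyParams)) {
    query.set(name, JSON.stringify(value)); // e.g. 1000 stays 1000; the string A becomes "A"
  }
  const qs = query.toString();
  return qs ? `${path}?${qs}` : path;
}

// The range query above: Conservative vote between 1000 and 2000.
console.log(viewUrl("http://localhost:5984", "election-2005", "con",
  "Conservative Votes", { startkey: 1000, endkey: 2000 }));
// http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000

// String keys keep their quotation marks, which URL-encode as %22.
console.log(viewUrl("http://mick.couchone.com", "universities", "d1", "v1",
  { startkey: "A", endkey: "B" }));
// http://mick.couchone.com/universities/_design/d1/_view/v1?startkey=%22A%22&endkey=%22B%22
```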

This method of retrieving data is certainly cumbersome compared to using SQL. One criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that each map-reduce function written against a document collection in CouchDB is backed by its own B-Tree index, which is built the first time the view is queried. From then on, whenever documents are added or changed, the index is updated incrementally rather than rebuilt (by default, the next time the view is read).

As a result, the entire system is optimised for fast indexed retrieval: reading a saved view never performs a ‘table scan’ at run time, because every read is an indexed lookup. (CouchDB 1.0 also offers ‘temporary views’ for development, but these are computed over every document each time they are called.)
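The idea of an index that is maintained incrementally and read by lookup rather than by scan can be sketched with a sorted array and binary search. This is a deliberate simplification. CouchDB stores its view indexes as on-disk B-Trees and, by default, brings them up to date when the view is next read. SortedIndex, a hypothetical class, updates on every insert instead. The constituency names other than Aberavon are invented.

```javascript
// A sorted array standing in for a B-Tree view index (keys of a single type).
class SortedIndex {
  constructor() {
    this.rows = []; // [key, docId] pairs kept in key order
  }

  // Binary search: index of the first row with key >= `key`, or > `key` if `after`.
  bound(key, after = false) {
    let lo = 0;
    let hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      const k = this.rows[mid][0];
      if (k < key || (after && k === key)) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Incremental update: slot the new row into place, with no full rebuild.
  insert(key, docId) {
    this.rows.splice(this.bound(key, true), 0, [key, docId]);
  }

  // Rows with startkey <= key <= endkey, found by lookup rather than by scan.
  range(startkey, endkey) {
    return this.rows.slice(this.bound(startkey), this.bound(endkey, true));
  }
}

const index = new SortedIndex();
index.insert(3062, "Aberavon");
index.insert(1500, "Example East");
index.insert(1999, "Example West");
index.insert(2500, "Example North");
console.log(index.range(1000, 2000).map(([, id]) => id)); // [ 'Example East', 'Example West' ]
```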

Section 5

Serving HTML from CouchDB

This section describes some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents, and it stores HTML and JavaScript application code in so-called ‘Design Documents’.

A design document in CouchDB is a document that stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’, in order to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.

Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect could be achieved with a batch file script.)

#!/bin/bash

# host=http://127.0.0.1:5984   (uncomment to use the local instance instead)
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*

# create the database
curl -X PUT "$host/$database"

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(basename "$filepath")
  echo "$filename"
  # strip the extension to obtain the document name
  docname=${filename%"$fileextension"}
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to ‘put’ each document at a meaningful URL (POST is used when we want CouchDB to assign a random ID to the document).

To put a single document to the web, the command would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json
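As an alternative to one PUT per file, CouchDB also accepts many documents in a single POST to the database's _bulk_docs endpoint, with a body of the form {"docs": [...]}. The JavaScript sketch below prepares such a request from file names and contents held in memory. It assumes, as the file name in the cURL command above suggests, that each file name is the URL-encoded document id. bulkDocsRequest is a hypothetical helper, and the coordinates are illustrative.

```javascript
// Hypothetical helper: one _bulk_docs request instead of one PUT per file.
// Assumes each file name is the URL-encoded document id plus ".json",
// e.g. "Aberystwyth%20University.json" -> _id "Aberystwyth University".
function bulkDocsRequest(host, db, files) {
  const docs = files.map(({ name, content }) => ({
    ...JSON.parse(content),
    _id: decodeURIComponent(name.replace(/\.json$/, "")),
  }));
  return {
    method: "POST",
    url: `${host}/${encodeURIComponent(db)}/_bulk_docs`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ docs }),
  };
}

// Illustrative input; in practice the names and contents would be read from the
// universities folder (e.g. with Node's fs.readdirSync and fs.readFileSync).
const req = bulkDocsRequest("http://127.0.0.1:5984", "universities", [
  { name: "Aberystwyth%20University.json", content: '{"latitude": 52.41, "longitude": -4.08}' },
  { name: "Aston%20University.json", content: '{"latitude": 52.48, "longitude": -1.89}' },
]);
console.log(JSON.parse(req.body).docs.map((d) => d._id)); // [ 'Aberystwyth University', 'Aston University' ]
```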

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an _id that begins with _design/ [12].

For the first design document, I have chosen an _id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output:

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved field name in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req. doc refers to the document being rendered. req refers to the request object and can be used, for example, to retrieve data passed in the query string of the URL.
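Because a show function is ordinary JavaScript, it can be exercised outside CouchDB by calling it with a document and a request object. The sketch below does this for s1. The coordinates for Aston University are approximate and for demonstration only. s2 is a hypothetical variant that shows how req.query, the parsed query string, might be used.

```javascript
// Show function s1 from the design document, written as plain JavaScript.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// Illustrative document; the coordinates are approximate.
const doc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };

// CouchDB passes the stored document and a request object; {} stands in for req.
console.log(s1(doc, {}));
// <h1>Aston University</h1><p>latitude: 52.48</p><p>longitude: -1.89</p>

// Hypothetical variant reading an option from the query string,
// e.g. .../_show/s2/Aston%20University?units=deg
const s2 = function (doc, req) {
  const units = req.query.units || "degrees";
  return '<p>latitude: ' + doc.latitude + ' ' + units + '</p>';
};
console.log(s2(doc, { query: { units: "deg" } })); // <p>latitude: 52.48 deg</p>
```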

Having saved the design document as d1.json, we can upload it to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB can serve HTML files as well as store data.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the _id of the document to be rendered (%20 encodes the space)
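The parts of the URL listed above can be assembled programmatically. This is a minimal JavaScript sketch; showUrl is a hypothetical helper. Each segment is passed through encodeURIComponent, which is where the %20 in the document id comes from.

```javascript
// Hypothetical helper mirroring the parts of a CouchDB 'show' URL.
function showUrl(host, db, ddoc, showName, docId) {
  return [host, encodeURIComponent(db), "_design", encodeURIComponent(ddoc),
    "_show", encodeURIComponent(showName), encodeURIComponent(docId)].join("/");
}

console.log(showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University"));
// http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```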

5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the ‘show’ function that returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It soon becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned as a string from a JavaScript function, is very cumbersome. For example, because the function is itself stored as a JSON string, every double quote inside it has to be escaped for the design document to remain valid JSON.
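One way around hand-escaping is to write the function as normal source code and let a JSON serialiser do the escaping when the design document is assembled. The JavaScript sketch below shows this. Tools such as CouchApp, discussed in Section 6, also assemble design documents from source files. showMap and _design/example are hypothetical names, and showMap is not the map1 function from Appendix A.

```javascript
// A hypothetical show function, written as ordinary JavaScript source.
// The double quotes in the HTML need no hand-escaping here.
function showMap(doc, req) {
  return '<div id="map" data-lat="' + doc.latitude + '" data-lng="' + doc.longitude + '"></div>';
}

// Store the function's source text in a design document and let
// JSON.stringify escape the quotes when the document is serialised.
const designDoc = { _id: "_design/example", shows: { map: showMap.toString() } };
const json = JSON.stringify(designDoc, null, 2);

console.log(json.includes('\\"map\\"')); // true: the quotes were escaped automatically
```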

When authoring design documents manually, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.

Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions. Here we define a very simple ‘view’ function that outputs the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": {},
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document _id values

A ‘view’ is, in effect, a query function that returns a set of results. To display the results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple view, simply emitting the document ids
• l1 is a very simple list that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({\"headers\": {\"Content-Type\": \"text/html\"}}); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc){emit(doc._id, doc._id)}"
    }
  }
}


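Note the built-in functions getRow, start and send:

• start is called once, at the beginning of the function, to set the response headers (here, a Content-Type of text/html).

• send is called inside the while loop, once for each row that getRow returns from the view; the loop ends when getRow has no more rows to return.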
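The division of labour between getRow, start and send can likewise be simulated locally. In this sketch (Node.js assumed), the three functions and runList are simple stand-ins for what CouchDB's query server provides, and l1 is the list function from the design document above, reformatted as ordinary JavaScript.

```javascript
// Local stand-ins for the list API that CouchDB provides to list functions.
let pendingRows = [];
let responseHeaders = null;
let output = [];

function start(response) {
  responseHeaders = response.headers; // called once, before any output
}
function getRow() {
  // returns the next view row, or null once every row has been consumed
  return pendingRows.length > 0 ? pendingRows.shift() : null;
}
function send(chunk) {
  output.push(chunk); // called once per row inside the loop
}

// Hypothetical helper: run a list function over the rows of a view result
function runList(listFn, viewRows) {
  pendingRows = viewRows.slice();
  responseHeaders = null;
  output = [];
  listFn(null, {});
  return { headers: responseHeaders, body: output.join("") };
}

// The l1 list function from design document d1
const l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + "</a></p>");
  }
};

const page = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
]);
console.log(page.body);
```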

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload them to a CouchDB server in the normal way. However, this approach becomes very cumbersome, and difficult to maintain, as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable, syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
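To show what such a feed contains, the sketch below builds a stripped-down Atom feed carrying one GeoRSS Simple point, in which latitude and longitude are written in that order, separated by a space, inside a georss:point element. It is a hand-rolled string builder for illustration only (Node.js assumed); the function names are hypothetical, a fully valid Atom feed would also need elements such as id and updated, and the coordinates are those of the University of St Andrews document shown in Appendix D.

```javascript
// Minimal XML escaping for text content
function xmlEscape(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One Atom entry with a GeoRSS Simple point: "latitude longitude"
function geoEntry(title, lat, lon) {
  return (
    "  <entry>\n" +
    "    <title>" + xmlEscape(title) + "</title>\n" +
    "    <georss:point>" + lat + " " + lon + "</georss:point>\n" +
    "  </entry>\n"
  );
}

// The feed element declares both the Atom and the GeoRSS namespaces
function geoFeed(title, entries) {
  return (
    '<feed xmlns="http://www.w3.org/2005/Atom" ' +
    'xmlns:georss="http://www.georss.org/georss">\n' +
    "  <title>" + xmlEscape(title) + "</title>\n" +
    entries.join("") +
    "</feed>\n"
  );
}

const feed = geoFeed("British Council Activity Mapping", [
  geoEntry("University of St Andrews", 56.3412139443169, -2.79301175308608),
]);
console.log(feed);
```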

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts, developed by J. Chris Anderson and Benoit Chesneau, to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp, from an application developer’s perspective, is that it creates a set of separate files for each element of the design document. A CouchDB design document is in fact a web application project, and CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


SECTION 6 SERVING GEORSS USING COUCHAPP 40

I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides a syndication feed (in Atom format) of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the feed output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains a folder for each CouchDB View, holding a map.js file and, when needed, a reduce.js file. A View can be thought of as a query against the database.

• The shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.

• Similarly, the lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.


Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file, copied from Sofa. I discuss the reason for adding this file in more detail below.
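The mapping from folders to design-document fields can be pictured with a small sketch. This is a hypothetical illustration in Node.js, not CouchApp's own code (CouchApp is written in Python): each relative path becomes a nested property and the file extension is dropped, which is consistent with georss.js (Appendix C) reading its template as ddoc.templates.georss.

```javascript
// Hypothetical illustration of the idea behind CouchApp's folder layout:
// "views/v1/map.js" becomes ddoc.views.v1.map, and so on.
function assembleDesignDoc(id, files) {
  const ddoc = { _id: id };
  for (const [path, contents] of Object.entries(files)) {
    const parts = path.split("/");
    const last = parts.pop().replace(/\.[^.]+$/, ""); // "map.js" -> "map"
    let node = ddoc;
    for (const part of parts) {
      node[part] = node[part] || {}; // create intermediate objects as needed
      node = node[part];
    }
    node[last] = contents;
  }
  return ddoc;
}

const ddoc = assembleDesignDoc("_design/bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
  "templates/georss.html": "<feed>...</feed>",
});
console.log(JSON.stringify(ddoc, null, 2));
```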

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function, l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.

We recall that in CouchDB a View is in effect a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
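The db value packs the credentials, host and database name into a single URL. As a quick check of how such a value breaks down, the standard URL class (available globally in Node.js and in browsers) can parse it; username and password are placeholders, as in the original.

```javascript
// The .couchapprc settings from above, as a JavaScript object.
// "username" and "password" are placeholders, not real credentials.
const couchapprc = {
  env: {
    default: { db: "http://username:password@mick.couchone.com/universities" },
  },
};

// Parse the "db" value with the standard URL class
const dbUrl = new URL(couchapprc.env.default.db);
console.log(dbUrl.username); // username
console.log(dbUrl.password); // password
console.log(dbUrl.host); // mick.couchone.com
console.log(dbUrl.pathname); // /universities (the target database)
```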

So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This demonstrates how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and to output HTML or XML in exactly the way we wish.
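To see how Mustache sections iterate, here is a deliberately tiny renderer in Node.js that handles only {{name}} variables and {{#name}}...{{/name}} sections. It is a teaching sketch under those assumptions, not mustache.js (whose real API, for example Mustache.to_html as used in Appendix C, supports much more), and the sample data loosely follows the stash built by georss.js.

```javascript
// A deliberately tiny Mustache-like renderer, written only to show how
// sections iterate. It is NOT mustache.js: it supports just {{name}}
// (HTML-escaped) and {{#name}}...{{/name}} sections with distinct names.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Look a name up in the context stack, innermost context first
function lookup(stack, name) {
  for (let i = stack.length - 1; i >= 0; i--) {
    const ctx = stack[i];
    if (ctx !== null && typeof ctx === "object" && name in ctx) return ctx[name];
  }
  return undefined;
}

function render(template, stack) {
  // first complete section; the backreference \1 pairs it with its own close tag
  const m = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/.exec(template);
  if (!m) {
    // no sections left: substitute plain variables
    return template.replace(/\{\{(\w+)\}\}/g, function (_, name) {
      const v = lookup(stack, name);
      return v === undefined || v === null ? "" : escapeHtml(v);
    });
  }
  const before = template.slice(0, m.index);
  const after = template.slice(m.index + m[0].length);
  const value = lookup(stack, m[1]);
  let inner = "";
  if (Array.isArray(value)) {
    // a list: render the section body once per element
    inner = value.map(function (item) { return render(m[2], stack.concat([item])); }).join("");
  } else if (value) {
    // a truthy flag such as has_countries: render the body once
    inner = render(m[2], stack.concat([value]));
  }
  return render(before, stack) + inner + render(after, stack);
}

const template =
  "{{#institutions}}<entry><title>{{id}}</title>" +
  "{{#programmes}}[{{name}}{{#countries}} {{country}}{{/countries}}]{{/programmes}}" +
  "</entry>{{/institutions}}";

const stash = {
  institutions: [
    {
      id: "University of St Andrews",
      programmes: [
        { name: "Chevening Programme", countries: [{ country: "id" }, { country: "mz" }] },
      ],
    },
  ],
};

console.log(render(template, [stash]));
// <entry><title>University of St Andrews</title>[Chevening Programme id mz]</entry>
```

Each section pushes the current element onto a context stack, which is why {{country}} resolves inside the countries loop while {{id}} still resolves from the enclosing institution.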

Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss


This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if a programme has a list of countries, emits the flag of each of those countries.

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
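The nested loop that the template performs can also be written directly in JavaScript. The sketch below (Node.js assumed; flagUrls is a hypothetical name) lists, for each programme, the flag attachment URLs of its countries, following the countries/<code>/flag pattern used by the upload script in Appendix E; the sample document is an abbreviated form of the St Andrews document shown in Appendix D.

```javascript
// Mirrors the template's nested loops without Mustache: for each programme,
// list the flag attachment URLs of its countries (none if it has no countries).
function flagUrls(institution) {
  const base = "http://mick.couchone.com/countries/";
  return (institution.programmes || []).map(function (programme) {
    return {
      name: programme.name,
      flags: (programme.countries || []).map(function (code) {
        return base + code + "/flag";
      }),
    };
  });
}

const standrews = {
  _id: "University of St Andrews",
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", countries: ["bd", "za"] },
  ],
};
console.log(JSON.stringify(flagUrls(standrews), null, 2));
```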

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and of a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

SECTION 7 CRITICAL ASSESSMENT AND CONCLUSION 48

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
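The correspondence between database operations and HTTP verbs can be written down as plain request descriptions. The sketch below is illustrative: the helper names are hypothetical, the base URL assumes a local CouchDB on its default port 5984, and the requests are only built, not sent.

```javascript
// Hypothetical helpers that describe CouchDB HTTP requests; nothing is sent.
const BASE = "http://127.0.0.1:5984"; // assumed local CouchDB on its default port

// Document ids such as "University of St Andrews" must be URL-encoded
function docUrl(db, id) {
  return BASE + "/" + encodeURIComponent(db) + "/" + encodeURIComponent(id);
}

const requests = {
  // PUT /db creates a database
  createDatabase: function (db) {
    return { method: "PUT", url: BASE + "/" + encodeURIComponent(db) };
  },
  // PUT /db/docid creates a document (an update must carry the current _rev)
  saveDocument: function (db, doc) {
    return {
      method: "PUT",
      url: docUrl(db, doc._id),
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(doc),
    };
  },
  // GET /db/docid reads a document
  readDocument: function (db, id) {
    return { method: "GET", url: docUrl(db, id) };
  },
  // DELETE /db/docid?rev=... deletes a document, given its current revision
  deleteDocument: function (db, id, rev) {
    return { method: "DELETE", url: docUrl(db, id) + "?rev=" + encodeURIComponent(rev) };
  },
};

console.log(requests.readDocument("universities", "University of St Andrews").url);
// http://127.0.0.1:5984/universities/University%20of%20St%20Andrews
```

With Node.js 18 or later, such a description could be passed to fetch(r.url, { method: r.method, headers: r.headers, body: r.body }); the same requests can equally be issued with cURL, as in Appendix E.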

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL;

• understanding the structure of design documents;

• the use of server-side JavaScript functions to transform documents to HTML and XML.

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a map-reduce function defined in advance in a design document, whose results are built into an index. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
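As an illustration of this shift, counting institutions per region might be written in SQL as SELECT region, COUNT(*) FROM institutions GROUP BY region (the table name is hypothetical). In CouchDB the same question becomes a map function that emits the region and a reduce function that counts. The sketch below simulates CouchDB's grouped reduce locally in Node.js: emit and queryGrouped are stand-ins, the reduce follows CouchDB's (keys, values, rereduce) signature, and the three documents are illustrative samples.

```javascript
// Map: one row per institution, keyed by its region
const map = function (doc) {
  if (doc.region) emit(doc.region, 1);
};

// Reduce in CouchDB's (keys, values, rereduce) form: count the rows;
// on a rereduce the values are partial counts, so they are summed instead
const reduce = function (keys, values, rereduce) {
  if (rereduce) return values.reduce(function (a, b) { return a + b; }, 0);
  return values.length;
};

// Local stand-in for CouchDB's view engine, grouping rows by exact key
// (the effect of querying a view with group=true)
let emitted = [];
let currentId = null;
function emit(key, value) {
  emitted.push({ id: currentId, key: key, value: value });
}

function queryGrouped(docs) {
  emitted = [];
  for (const doc of docs) {
    currentId = doc._id;
    map(doc);
  }
  const groups = new Map();
  for (const row of emitted) {
    if (!groups.has(row.key)) groups.set(row.key, []);
    groups.get(row.key).push(row);
  }
  return Array.from(groups.keys()).sort().map(function (key) {
    const rows = groups.get(key);
    const keys = rows.map(function (r) { return [r.key, r.id]; });
    const values = rows.map(function (r) { return r.value; });
    return { key: key, value: reduce(keys, values, false) };
  });
}

// Illustrative sample documents
const result = queryGrouped([
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Aston University", region: "West Midlands" },
]);
console.log(result);
```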

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances made by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function. This means that all double quotes, for example, need to be backslash-escaped, which is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\">html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }</style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\">function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code, http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from CouchDB documents into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is a .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to CouchDB documents.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash

# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done

Bibliography

[1] A NoSql Summer. A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django. The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt, and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.

BIBLIOGRAPHY 62

[15] E F Codd A Relational Model of Data for Large Shared Data BanksIn Communications of the ACM 1970

[16] Brian F Cooper Raghu Ramakrishnan and Utkarsh Srivastava CloudStorage Design in a PNUTShell volume Beautiful Data The StoriesBehind Elegant Data Solutions chapter 4 OrsquoReilly Media 2009

[17] CouchOne BBC A Case Study httpwwwcouchonecom

case-study-bbc

[18] CouchOne CERN A Case Study httpwwwcouchonecom

case-study-cern

[19] CouchOne Why A Mobile Database httpwwwcouchonecom

pagewhy-mobile

[20] British Council Annual Report 2009-10 httpwww

britishcouncilorgnewGlobalBC20Annual20Report

202009-10_reuploadpdf

[21] British Council Erasmus httpwwwbritishcouncilorg

erasmushtm

[22] C J Date An Introduction to Database Systems Pearson EducationInc 8th edition 2004

[23] Jeffrey Dean and Sanjay Ghemawat MapReduce Simplified Data Pro-cessing on Large Clusters In OSDIrsquo04 Sixth Symposium on OperatingSystem Design and Implementation December 2004

[24] Dr Lawrie Brown School of Computer Science Australian DefenceForce Academy Canberra Australia ErlangmdashAn Open Source Lan-guage for Robust Distributed Applications httpwwwunswadfa

eduau~lpbpapersauug99-erlhtml

[25] Paul McJones (ed) The 1995 SQL Reunion People Projects andPolitics 1997

[26] A Fox and EA Brewer Harvest yield and scalable tolerant systemsIn Hot Topics in Operating Systems 1999 Proceedings of the SeventhWorkshop on

[27] Seth Gilbert and Nancy Lynch Brewerrsquos conjecture and the feasibil-ity of consistent available partition-tolerant web services In In ACMSIGACT News 2002

[28] Vivek Gite Bash Shell Loop Over Set of Files httpwww

cybercitibizfaqbash-loop-over-file

BIBLIOGRAPHY 63

[29] github CouchApp Commit History httpgithubcomcouchapp

couchappcommitsmaster

[30] Ricky Ho Couchdb implementation httphorickyblogspotcom200810couchdb-implementationhtml

[31] Jacob Kaplan-Moss Of the Web httpjacobianorgwriting

of-the-web

[32] Damien Katz CouchDB and Me httpwwwinfoqcom

presentationskatz-couchdb-and-me

[33] Damien Katz CouchDB Architecture httpdamienkatznet

200504couchdb5Farchitehtml

[34] Stuart Langridge Firefox bookmarks in CouchDBhttpwwwkryogenixorgdays20090706

firefox-bookmarks-in-couchdb

[35] Jan Lehnardt mustachejs httpgithubcomjanlmustachejs

[36] Vance Lucas NoSql First Impressions Object DatabasesMissed the Boat httpwwwvancelucascomblog

nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin Transaction Management Lecture Notes MSc ComputerScience Birkbeck College

[38] Dwight Merriman Comparing Mongo DB and Couch DB httpwwwmongodborgdisplayDOCSComparing+Mongo+DB+and+Couch+DB

[39] Microsoft Entity Framework Design httpblogsmsdncomb

efdesign

[40] Ted Nedward The Vietnam of Computer Science http

blogstednewardcom20060626The+Vietnam+Of+Computer+

Scienceaspx

[41] Ruby on Rails Class ActiveRecordBase httpapirubyonrailsorgclassesActiveRecordBasehtml

[42] Ed Parcell Tutorial Using JQuery and CouchDb to build asimple AJAX web application httpedparcellposterouscom

using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul Code tutorial make your application sync with UbuntuOne httparstechnicacomopen-sourceguides200912

code-tutorial-make-your-application-sync-with-ubuntu-one

ars

BIBLIOGRAPHY 64

[44] Daniel Stenberg and contributors. curl(1): the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat. http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Open Database Connectivity. http://en.wikipedia.org/wiki/Open_Database_Connectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


Figure 2.1: British Council project data displayed on Bing Maps

The data for these maps is exported to a set of static files in HTML and XML/GeoRSS [4] formats from an internal (Microsoft SQL Server) database containing project information.

The internal system used to upload data from each of the approximately 200 British Council initiatives was written using C# and ASP.Net on Microsoft Visual Studio. It accepts uploads from internal staff members in spreadsheet format and uses string-matching techniques to de-duplicate institution names and locations. The result of this work is that, as an organisation, the British Council now has a unified view of the work undertaken with each partner institution throughout the UK.

Outputs from the Activity Mapping have included letters to MPs, online maps for each MP and local authority, reports for overseas education ministers, and a Google Earth presentation for display in the reception area of our offices.


Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.

2.1.4 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that something like CouchDB would be a good fit for the back-end data store - in effect using CouchDB as a web caching layer.

The data is already structured (having been exported from the British Council's relational database). The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
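As a rough illustration of the on-demand transformation described above, the JavaScript sketch below turns JSON documents into a GeoRSS-Simple feed. It assumes each document carries `_id`, `latitude` and `longitude` fields (as the `universities` documents in Section 5 do); the function names are hypothetical, and a complete RSS 2.0 channel would also need `title`, `link` and `description` elements, omitted here for brevity.

```javascript
// Minimal sketch: render JSON documents as a GeoRSS-Simple feed.
// Assumes each document has _id, latitude and longitude fields.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function toGeoRssItem(doc) {
  // GeoRSS-Simple writes a point as "latitude longitude", separated by a space.
  return [
    "<item>",
    "  <title>" + escapeXml(doc._id) + "</title>",
    "  <georss:point>" + doc.latitude + " " + doc.longitude + "</georss:point>",
    "</item>"
  ].join("\n");
}

function toGeoRssFeed(docs) {
  // The root element must declare the GeoRSS namespace used by <georss:point>.
  return [
    '<rss version="2.0" xmlns:georss="http://www.georss.org/georss">',
    "<channel>",
    ...docs.map(toGeoRssItem),
    "</channel>",
    "</rss>"
  ].join("\n");
}
```

In a CouchDB setting, the same logic would naturally live in a 'show' or 'list' function, so that the feed is generated by the database server itself.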

This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested in exploring how ready this new technology is for 'prime time' - that is, to what extent it is ready for adoption by web applications developers such as myself who are used to developing solutions on current standard platforms³.

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed as a JSON string [30].

3.1 'Of the Web'

Combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP, so it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].
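To make the point concrete: reading a document needs nothing more than an HTTP GET. The sketch below assumes a Node.js 18+ runtime, whose built-in `fetch` is used; `getDocument` is a hypothetical helper, and its `fetchImpl` parameter exists only so the function can be exercised without a live server.

```javascript
// Minimal sketch: read one CouchDB document over plain HTTP, with no database driver.
// baseUrl, db and id are supplied by the caller; fetchImpl defaults to the global fetch.
async function getDocument(baseUrl, db, id, fetchImpl = globalThis.fetch) {
  // Document ids may contain spaces or other reserved characters, so encode them.
  const url = `${baseUrl}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`;
  const response = await fetchImpl(url, { headers: { Accept: "application/json" } });
  if (!response.ok) {
    throw new Error(`GET ${url} failed with HTTP ${response.status}`);
  }
  return response.json(); // CouchDB returns the document itself as a JSON object
}

// Usage against a local server:
//   getDocument("http://127.0.0.1:5984", "addressbook", "british-council-zambia")
//     .then((doc) => console.log(doc.address.city));
```

The same request could be issued from C#, PHP, Python or any other language with an HTTP client, which is the sense in which CouchDB is language-agnostic.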

The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-quoted remark about CouchDB by Jacob Kaplan-Moss, a co-creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS¹-agnostic." [31]

Normally a web development project requires two separate servers - a database server and a web applications server. CouchDB combines both of these in one single product, so it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an 'indexable, schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available online [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data-exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and in recent years it has established itself as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

¹ Operating System

Consider the use of NULL values in relational databases. In a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, a missing postal code would simply be denoted by the field's absence from the document itself - this is much more in line with what we intuitively expect from real-world experience.
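To make the contrast concrete, here is a small JavaScript sketch; the two records are trimmed, hypothetical examples. The `in` operator distinguishes a field that is absent from one that is present but set to null, which is how a client can tell the two cases apart after fetching a document.

```javascript
// Two address records: one omits postalCode, the other stores an explicit null
// in the way a relational table would.
const lusaka = JSON.parse('{"city": "Lusaka", "country": "Zambia"}');
const withNull = JSON.parse('{"city": "Lusaka", "country": "Zambia", "postalCode": null}');

// Report how a record handles its postal code.
function postalCodeStatus(address) {
  if (!("postalCode" in address)) return "absent";
  if (address.postalCode === null) return "null";
  return "present";
}

console.log(postalCodeStatus(lusaka));   // prints: absent
console.log(postalCodeStatus(withNull)); // prints: null
```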

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a newline character, so in this case the streetAddress spans three separate printed lines.)
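The `\n` escape is part of JSON's own string syntax, so it survives a round trip through `JSON.parse` and `JSON.stringify`. A small sketch using a trimmed copy of the record above:

```javascript
// streetAddress is stored as a single JSON string containing two \n escapes.
const json = '{"streetAddress": "Heroes Place\\nCairo Road\\nPO Box 34571"}';
const address = JSON.parse(json);

// After parsing, the escapes are real newline characters, so the value splits into 3 lines.
const lines = address.streetAddress.split("\n");
console.log(lines.length); // prints: 3

// Serialising again turns the newlines back into \n escapes.
console.log(JSON.stringify(address) === '{"streetAddress":"Heroes Place\\nCairo Road\\nPO Box 34571"}'); // prints: true
```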

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can also be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available online [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed 'Quickly' [8], which gives beginner programmers a way to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started, and Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

If you are reading this section, I strongly recommend that you sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers for Windows and Mac also exist at the same URL. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB - Desktop Couch - pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use 'cURL', a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same request with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.

curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

cURL syntax is explained in the cURL manual [44]. Briefly:

• -H introduces an HTTP header

• -X POST instructs cURL to use an HTTP POST (instead of a GET) request

• -d introduces the data to be sent

• @ denotes that what follows is a file name
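For readers working in a language other than the shell, the same POST can be expressed as a plain HTTP request. The sketch below is a JavaScript equivalent under stated assumptions: `buildCreateRequest` is a hypothetical helper that only assembles the request, so it can be inspected without a server, and the credentials go in an `Authorization: Basic` header, because the `fetch` API in Node.js 18+ rejects URLs of the form `user:password@host`.

```javascript
// Minimal sketch: the HTTP request behind
//   curl -H "Content-Type: application/json" -X POST <host>/addressbook -d @john-smith.json
// Each curl flag maps onto one field: -H -> headers, -X -> method, -d -> body.
function buildCreateRequest(host, db, doc, username, password) {
  const headers = { "Content-Type": "application/json" };
  if (username !== undefined) {
    // HTTP Basic authentication: base64 of "username:password".
    const token = Buffer.from(`${username}:${password}`).toString("base64");
    headers.Authorization = `Basic ${token}`;
  }
  return {
    url: `${host}/${encodeURIComponent(db)}`,
    options: { method: "POST", headers, body: JSON.stringify(doc) }
  };
}

// Usage against a real server (Node.js 18+):
//   const { url, options } = buildCreateRequest("http://mick.couchone.com", "addressbook",
//     { firstName: "John", lastName: "Smith" }, "username", "password");
//   const reply = await (await fetch(url, options)).json(); // CouchDB replies with ok, id and rev
```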

All data in CouchDB is stored in JSON format together with a unique key and a revision number, so in CouchDB the resulting document would have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
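The `_rev` string has two parts separated by a hyphen: a generation number, which starts at 1 and increases by one on every update, and a hash. The hypothetical helpers below split it apart. Comparing generations is only a rough guide: after replication, two conflicting revisions of a document can share the same generation.

```javascript
// Split a CouchDB revision string such as "1-91bce055fc8db86480400321079f0834"
// into its generation number and hash (parseRev is a hypothetical helper).
function parseRev(rev) {
  const dash = rev.indexOf("-");
  if (dash < 1) throw new Error(`not a revision string: ${rev}`);
  const generation = Number(rev.slice(0, dash));
  if (!Number.isInteger(generation) || generation < 1) {
    throw new Error(`bad generation in revision: ${rev}`);
  }
  return { generation, hash: rev.slice(dash + 1) };
}

// True if revision a has a higher generation than revision b
// (ignores replication conflicts, where generations can be equal).
function isNewerRev(a, b) {
  return parseRev(a).generation > parseRev(b).generation;
}
```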

This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator (http://jsonlint.com).


Figure 4.1: JSONLint

The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon
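The rule of thumb here (PUT when the client chooses the id, POST when CouchDB should generate one) can be captured in a small, hypothetical JavaScript helper. It only assembles the request; percent-encoding the id matters for ids containing spaces, such as the university names used in Section 5.

```javascript
// Sketch: choose PUT or POST depending on whether the document already carries an _id.
function buildSaveRequest(host, db, doc) {
  const dbUrl = `${host}/${encodeURIComponent(db)}`;
  const common = {
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  };
  if (doc._id !== undefined) {
    // Client-chosen id: PUT to the document's own URL.
    return { url: `${dbUrl}/${encodeURIComponent(doc._id)}`, options: { ...common, method: "PUT" } };
  }
  // No id: POST to the database and let CouchDB assign a random one.
  return { url: dbUrl, options: { ...common, method: "POST" } };
}
```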


4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
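Building the delete URL programmatically avoids mistakes with the query string. A minimal, hypothetical helper using the standard `URL` class, which takes care of encoding the `rev` parameter:

```javascript
// Minimal sketch: DELETE URL for a specific revision of a document.
// CouchDB refuses the delete with a conflict error unless rev is the current revision.
function buildDeleteUrl(host, db, id, rev) {
  const url = new URL(`${host}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`);
  url.searchParams.set("rev", rev);
  return url.toString();
}
```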

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a "Document update conflict" error if an update is attempted on an 'old' version of a document.

As we are updating a 'named' document, we use the PUT verb instead of POST.

The following command will update John Smith's record with the contents of the file john-smith-v2.json:

curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
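The conflict rule can be illustrated with a toy in-memory store. This is a simplified model written for illustration, not CouchDB's implementation: real revisions end in a hash rather than the fixed `toy` suffix used here, and `ToyStore` is a hypothetical name.

```javascript
// Toy model of CouchDB-style optimistic concurrency: an update must quote the
// document's current revision, otherwise it is rejected as a conflict.
class ToyStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current.rev !== rev) {
      // Mirrors CouchDB's "Document update conflict" (HTTP 409).
      throw new Error("conflict");
    }
    // New documents start at generation 1; each accepted update adds one.
    const generation = current ? Number(current.rev.split("-")[0]) + 1 : 1;
    const newRev = `${generation}-toy`;
    this.docs.set(id, { rev: newRev, body });
    return newRev;
  }
}
```

Usage: after `const r1 = store.put("john-smith", { age: 25 })`, the call `store.put("john-smith", { age: 26 }, r1)` succeeds, but repeating it with the now-stale `r1` throws a conflict.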

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons³ from http://www.famfamfam.com/lab/icons/flags

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.

The uploaded image may be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag
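CouchDB also accepts attachments embedded in the document itself, under an `_attachments` field whose `data` is base64-encoded. This approach was not used in the project; the Node.js sketch below, built around a hypothetical `withInlineAttachment` helper, shows how such a document could be assembled. The attachment name `flag` follows the example above, and the bytes used in the test are a placeholder rather than a real PNG.

```javascript
// Minimal sketch: embed a binary file in a document as an inline attachment.
// CouchDB expects inline attachment data to be base64-encoded.
function withInlineAttachment(doc, name, contentType, bytes) {
  return {
    ...doc,
    _attachments: {
      ...(doc._attachments || {}),
      [name]: {
        content_type: contentType,
        data: Buffer.from(bytes).toString("base64")
      }
    }
  };
}

// Usage with a real file (Node.js):
//   const fs = require("fs");
//   const doc = withInlineAttachment({ company: "British Council Zambia" },
//                                    "flag", "image/png", fs.readFileSync("zm.png"));
```

When updating an existing document this way, the body must also carry the current `_rev`, exactly as in Section 4.6.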

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want a replica of the addressbook on our local machine, so that we can work on it in offline mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook", "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping them where necessary:

curl -H "Content-Type: application/json" ^
  -X POST http://127.0.0.1:5984/_replicate ^
  -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
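The quoting problems on Windows disappear if the JSON body is generated by a program rather than typed by hand. A hypothetical JavaScript helper that produces the request for `/_replicate`, with `JSON.stringify` inserting and escaping every quote:

```javascript
// Minimal sketch: the POST request that asks a CouchDB server to replicate source -> target.
function buildReplicationRequest(server, source, target) {
  return {
    url: `${server}/_replicate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // JSON.stringify produces correctly quoted JSON on every platform.
      body: JSON.stringify({ source, target })
    }
  };
}
```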

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using map-reduce functions, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google's 2004 paper [23], which describes how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits key-value pairs (zero or more) for each document it encounters; the 'reduce' stage is used when a calculation over these intermediate values is required.
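The two stages can be imitated in a few lines of plain JavaScript. This is only a conceptual sketch over an in-memory array: CouchDB itself calls the map function once per document, stores the emitted rows in a B-Tree, and caches reduce results. The `runView` helper is hypothetical, and unlike CouchDB (which provides `emit` as a global) it passes `emit` to the map function as a second argument.

```javascript
// Conceptual sketch of map-reduce over an in-memory document collection.
function runView(docs, map, reduce) {
  const rows = [];
  // Map stage: call map once per document; each emit(key, value) adds one row.
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // Rows are kept sorted by key, as in a view index (comparable keys assumed).
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  if (!reduce) return rows;
  // Reduce stage: combine all emitted values into one result.
  // (CouchDB passes [key, docid] pairs as keys; plain keys are used here.)
  return reduce(rows.map((r) => r.key), rows.map((r) => r.value));
}
```

For example, `runView(docs, (doc, emit) => emit(null, doc.con), (keys, values) => values.reduce((s, v) => s + v, 0))` totals a numeric `con` field across all documents.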

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code as opposed to data.

The CouchDB design document 'con' has been defined with two 'views' of the data.

The first view, 'Conservative Votes', simply ranges over the document collection in a map function, emitting the Conservative vote as the key and the document id as the value.

The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate function such as SUM in SQL).

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}
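The behaviour of the 'Conservative Votes Total' view can be checked outside CouchDB by compiling the function strings stored in the design document and supplying stand-ins for `emit` and `sum`, which CouchDB's JavaScript view server provides as built-ins. In the sketch below, `evaluateView` is a hypothetical helper and only approximates the view server. The data is the Aberavon document plus one invented constituency, so the printed total is illustrative, not the 8782198 returned by the real database.

```javascript
// Sketch: evaluate the "Conservative Votes Total" view from _design/con outside CouchDB.
// The design document stores its functions as strings, copied here verbatim.
const views = {
  "Conservative Votes Total": {
    map: "function(doc) {\n  emit(null, doc.con);\n}",
    reduce: "function(keys, values) {\n  return sum(values);\n}"
  }
};

function evaluateView(view, docs) {
  const emitted = [];
  const emit = (key, value) => emitted.push([key, value]);
  const sum = (values) => values.reduce((total, v) => total + v, 0);
  // Compile the stored source text, giving it access to emit and sum.
  const map = new Function("emit", `return (${view.map});`)(emit);
  const reduce = new Function("sum", `return (${view.reduce});`)(sum);
  docs.forEach((doc) => map(doc));
  return reduce(emitted.map(([k]) => k), emitted.map(([, v]) => v));
}

// Aberavon figures come from the document above; "Example North" is invented.
const docs = [
  { _id: "Aberavon", con: 3062, lab: 18077 },
  { _id: "Example North", con: 1500, lab: 9000 }
];
console.log(evaluateView(views["Conservative Votes Total"], docs)); // prints: 4562
```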

It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startKey=1000&endKey=2000]

Figure 4.5: WHERE clause in CouchDB

This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever new documents are added to the collection, the index is updated incrementally rather than rebuilt (CouchDB applies these updates the next time the view is queried).

As a result, the entire system is optimised for fast indexed retrieval: a query against a stored view cannot perform a 'table scan' at run-time, because every such read is an indexed lookup.
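The toy sketch below shows why incremental indexing makes range queries cheap. A sorted array stands in for CouchDB's B-Tree, a simplification for brevity, since a real B-Tree avoids the array's linear insertion cost. Each new document adds one row in key order, found by binary search, and a range query bounded by `startkey` and `endkey` (the parameter names used in CouchDB's documentation) is answered by two more binary searches rather than a scan. `SortedIndex` and the seat names in the test are hypothetical.

```javascript
// Toy view index: rows kept sorted by numeric key so that range queries need no full scan.
class SortedIndex {
  constructor() {
    this.rows = []; // [{ key, id }], ascending by key
  }

  // First position whose key is >= key (or > key when strict is true), by binary search.
  lowerBound(key, strict = false) {
    let lo = 0;
    let hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      const k = this.rows[mid].key;
      if (k < key || (strict && k === key)) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Incremental update: one new document adds one row at its sorted position.
  insert(key, id) {
    this.rows.splice(this.lowerBound(key, true), 0, { key, id });
  }

  // Rows with startkey <= key <= endkey.
  range(startkey, endkey) {
    return this.rows.slice(this.lowerBound(startkey), this.lowerBound(endkey, true));
  }
}
```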

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'design documents'.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The 'universities' data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the following bash script.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (On Windows, a similar effect could be achieved using a batch file.)

#!/bin/bash
#host="http://127.0.0.1:5984/"
host="http://username:password@mick.couchone.com/"
database="universities"
folder="universities"
fileextension="json"
FILES="$folder/*"

# create the database
curl -X PUT "$host$database"

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(basename "$filepath")
  echo "$filename"
  # strip the extension to get the document id
  docname="${filename%.$fileextension}"
  echo "$docname"
  url="$host$database/$docname"
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d "@$filepath"
  curl -X PUT "$url" -d "@$filepath"
done

Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)

To put a single document to the web, the command would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json
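An alternative to one PUT per file, not used in this project, is CouchDB's `_bulk_docs` endpoint, which accepts many documents in a single POST with a body of the form `{"docs": [...]}`. The Node.js sketch below, built around a hypothetical `buildBulkDocsBody` helper, assembles that body from the JSON files in a folder. It derives each `_id` from the file name, as the bash script does, and assumes the file names are URL-encoded (as in `Aberystwyth%20University.json`), so it decodes them to get the same id that CouchDB derives from the PUT URL.

```javascript
// Sketch: build a _bulk_docs request body from a folder of .json files.
// Each _id is the decoded file name without its .json extension.
const fs = require("fs");
const path = require("path");

function buildBulkDocsBody(folder) {
  const docs = fs
    .readdirSync(folder)
    .filter((name) => name.endsWith(".json"))
    .map((name) => {
      const doc = JSON.parse(fs.readFileSync(path.join(folder, name), "utf8"));
      // "Aberystwyth%20University.json" -> "Aberystwyth University"
      return { ...doc, _id: decodeURIComponent(path.basename(name, ".json")) };
    });
  return JSON.stringify({ docs });
}

// Usage: POST the result to <host>/universities/_bulk_docs
// with the header Content-Type: application/json.
```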

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].

For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved name in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered, while req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
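As an illustration of the `req` parameter, here is a hypothetical variant of `s1`, named `s2`, that reads an optional `format` value from the query string through `req.query`, the field in which CouchDB passes parsed query-string parameters. It also shows that a show function may return an object with `headers` and `body` instead of a plain string. It is written as ordinary JavaScript (with `var` rather than newer syntax, to suit CouchDB's older JavaScript engine) so that it can be tried with mock `doc` and `req` objects. Inside a design document it would be stored as a string, like `s1` above.

```javascript
// Hypothetical show function: HTML by default, plain text when ?format=text is given.
function s2(doc, req) {
  var query = (req && req.query) || {};
  if (query.format === "text") {
    // Returning an object lets the function set its own response headers.
    return {
      headers: { "Content-Type": "text/plain" },
      body: doc._id + ": " + doc.latitude + ", " + doc.longitude
    };
  }
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}
```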

Having saved the design document as d1.json, we can upload it to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML files.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB

• universities is the database

• _design/d1 is the design document

• _show/s1 is the 'show' function that returns an HTML string

• Aston%20University is the id of the document to be rendered
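The same breakdown can be expressed as a hypothetical JavaScript helper that assembles a show URL from its parts. Only the document id needs percent-encoding in this example, where the space in `Aston University` becomes `%20`.

```javascript
// Assemble a CouchDB show-function URL: <host>/<db>/_design/<ddoc>/_show/<function>/<docid>
function showUrl(host, db, ddoc, showName, docId) {
  return [
    host,
    encodeURIComponent(db),
    "_design",
    encodeURIComponent(ddoc),
    "_show",
    encodeURIComponent(showName),
    encodeURIComponent(docId)
  ].join("/");
}

console.log(showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University"));
// prints: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```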


53 A more complex lsquoshowrsquo function

The design document may be extended to show more complex HTML doc-uments for example a Google Map for each location

Figure 54 Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
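One way to avoid escaping by hand, sketched below under the assumption that Node.js is available on the desktop to prepare d1.json, is to write each function as ordinary JavaScript and let JSON.stringify do the quoting. CouchDB stores functions as source text, so Function.prototype.toString() can supply that text. This is an illustrative alternative, not the approach taken in this project (Section 6 uses CouchApp instead), and it assumes CouchDB accepts a named function expression such as function s1(doc, req) { ... }, which is worth checking against the CouchDB version in use.

```javascript
// Write the show function as normal JavaScript, with double quotes where
// the HTML needs them, instead of as a hand-escaped string.
function s1(doc, req) {
  return '<h1 class="name">' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Build the design document; toString() yields the function's source text.
var designDoc = {
  _id: "_design/d1",
  shows: { s1: s1.toString() }
};

// JSON.stringify escapes every double quote (and newline) in the source,
// producing text that can be saved as d1.json and uploaded with curl.
var json = JSON.stringify(designDoc, null, 2);
console.log(json);
```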

When authoring design documents manually, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon) than to upload documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.


Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions.

Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
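To see what v1 produces without a server, the following sketch imitates the view engine in Node.js: it runs the map function over a couple of documents and collects the emit(key, value) calls into rows shaped like CouchDB’s JSON response (total_rows, offset, rows). The runView harness is hypothetical and ignores many real details, such as incremental indexing and CouchDB’s Unicode-aware key collation.

```javascript
// A tiny stand-in for CouchDB's view engine, for experimenting locally.
// It compiles the map function's source with an emit() in scope, runs it
// over each document and sorts the emitted rows by key.
function runView(mapSource, docs) {
  var rows = [];
  var currentDoc = null;
  function emit(key, value) {
    rows.push({ id: currentDoc._id, key: key, value: value });
  }
  // new Function gives the map source access to our emit().
  var mapFn = new Function("emit", "return (" + mapSource + ");")(emit);
  docs.forEach(function (doc) {
    currentDoc = doc;
    mapFn(doc);
  });
  // Simple comparison; CouchDB's real key collation is Unicode-aware.
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return { total_rows: rows.length, offset: 0, rows: rows };
}

// The v1 map function from the listing above, as source text.
var v1 = "function(doc) { emit(doc._id, doc._id); }";
var result = runView(v1, [
  { _id: "University of St Andrews" },
  { _id: "Aston University" }
]);
console.log(JSON.stringify(result, null, 2));
```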


The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document id values

A ‘view’ is, in effect, a query function which returns a set of results. In order to display the results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({'headers': {'Content-Type': 'text/html'}}); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


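Note the built-in functions start, send and getRow:

• start is called once, before any output is sent, to set the response headers (here, the Content-Type)
• getRow returns the next row of the view result each time it is called, and a false value once all rows have been read
• send is called once for each row in the dataset, inside the getRow() loop, and sends a chunk of HTML to the client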
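The call pattern described above can be tried outside CouchDB with a small harness that provides stand-ins for start, send and getRow. Everything here (runList, the sample row, and the relative link from l1 to s1 in the same design document) is an illustrative assumption: in CouchDB these functions are supplied by the query server, and the response is streamed to the client rather than collected into a string. The relative link ../../_show/s1/ resolves from .../_design/d1/_list/l1/v1 to .../_design/d1/_show/s1/.

```javascript
// Hypothetical harness: supplies start(), send() and getRow() so that a
// list function can be run outside CouchDB against an array of view rows.
function runList(listFn, viewRows) {
  var index = 0;
  var response = { headers: null, body: "" };
  function start(resp) { response.headers = resp.headers; }
  function send(chunk) { response.body += chunk; }
  function getRow() { return index < viewRows.length ? viewRows[index++] : null; }
  // Recompile the list function's source so that it sees the stubs above.
  var compiled = new Function("start", "send", "getRow",
    "return (" + listFn.toString() + ");")(start, send, getRow);
  var tail = compiled({ total_rows: viewRows.length, offset: 0 }, { query: {} });
  if (typeof tail === "string") response.body += tail; // optional final chunk
  return response;
}

// A variant of l1 that links to s1 through a relative URL (an assumption;
// the listing above uses an absolute URL to a different design document).
function l1(head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="../../_show/s1/' + encodeURIComponent(row.value) + '">' +
      row.value + '</a></p>');
  }
}

var out = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" }
]);
console.log(out.headers, out.body);
```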

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload them to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer’s perspective is that it creates a separate file for each element of the design document. In effect, a CouchDB design document is a complete web application project; CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


SECTION 6 SERVING GEORSS USING COUCHAPP 40

I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains a sub-folder for each CouchDB View, holding a map.js file and, when needed, a reduce.js file. A View can be thought of as a query against the database.
• The shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
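The correspondence between folders and design document fields can be illustrated with a short sketch. The buildDesignDoc function below is hypothetical and only mimics the idea: a file at views/v1/map.js becomes the field views.v1.map, and shows/s1.js becomes shows.s1. It is not CouchApp’s actual implementation, which is written in Python and also handles attachments, templates and other cases.

```javascript
// Simplified illustration of CouchApp's folder-to-design-document mapping.
// Each relative file path becomes a nested field; a trailing ".js" is dropped.
function buildDesignDoc(files) {
  var ddoc = {};
  Object.keys(files).forEach(function (path) {
    var parts = path.split("/");
    var last = parts.pop().replace(/\.js$/, "");
    var node = ddoc;
    // Create (or reuse) one nested object per folder in the path.
    parts.forEach(function (part) {
      node[part] = node[part] || {};
      node = node[part];
    });
    node[last] = files[path];
  });
  return ddoc;
}

var ddoc = buildDesignDoc({
  "_id": "_design/bc",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }"
});
console.log(JSON.stringify(ddoc, null, 2));
```

Because no reduce.js file is present in views/v1 here, the resulting view has no reduce field at all.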


Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it into HTML.

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function, l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.

We recall that in CouchDB a View is, in effect, a query against the database. As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

With this setting in place, running couchapp push from the bc folder uploads the design document. So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5, but this demonstrates how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple View, all:

couchapp generate view all

This View consists of a map.js file only (the corresponding reduce.js file has been deleted). The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss


This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
      {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
          {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
          {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
      {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
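The shape of the data that the template expects can be sketched in plain JavaScript. The hypothetical function below turns one institution document into an object for the institutions section to iterate over, with has_programmes and has_countries flags guarding the nested sections so that missing lists produce no output. It is a simplified counterpart to the georss.js list function in Appendix C; unlike that code, it also treats an empty programmes or countries array as absent. The sample document is abridged from the St Andrews example in Appendix D.

```javascript
// Turn one institution document into the structure the Mustache template
// iterates over. The has_* flags let the template skip empty sections.
function toTemplateEntry(doc) {
  var programmes = (doc.programmes || []).map(function (programme) {
    var countries = (programme.countries || []).map(function (code) {
      return { country: code };
    });
    return {
      name: programme.name,
      url: programme.url,
      has_countries: countries.length > 0,
      countries: countries
    };
  });
  return {
    id: doc._id,
    latitude: doc.latitude,
    longitude: doc.longitude,
    has_programmes: programmes.length > 0,
    programmes: programmes
  };
}

// Abridged version of the St Andrews document shown in Appendix D.
var entry = toTemplateEntry({
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: "Chevening Programme", url: "http://www.chevening.com", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", url: "http://www.csfp-online.org", countries: ["bd", "za"] }
  ]
});
console.log(JSON.stringify({ institutions: [entry] }, null, 2));
```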

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project: http://activitymap.britishcouncil.org

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB communicates over HTTP, which makes it easy to replicate both data and code between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

SECTION 7 CRITICAL ASSESSMENT AND CONCLUSION 48

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad hoc querying. Every query against the database is a predefined map-reduce function whose results are stored in an index. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
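As an illustration of the difference, consider counting institutions per region. In SQL this is a single GROUP BY query; in CouchDB it becomes a map function that emits one row per document keyed by region, plus a reduce function that adds the rows up. The sketch below runs both functions in plain JavaScript with a hypothetical emit and grouping step, roughly imitating what querying such a view with ?group=true returns. The region field comes from the institution documents in Appendix D, but this view was not part of the project, the sample documents are illustrative, and CouchDB’s built-in _count reducer would normally be used instead of a hand-written one.

```javascript
// SQL equivalent: SELECT region, COUNT(*) FROM institutions GROUP BY region;

// Map: one row per document, keyed by region (as it would be in a view).
function map(doc) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

// Reduce: add up the values. On a rereduce pass the values are partial
// counts rather than 1s, so summing works in both cases.
function reduce(keys, values, rereduce) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// Minimal local harness: collect emitted rows, group them by key and
// apply reduce to each group (an approximation of ?group=true).
var emitted = [];
var currentId = null;
function emit(key, value) { emitted.push({ key: key, id: currentId, value: value }); }

function groupedQuery(docs) {
  emitted = [];
  docs.forEach(function (doc) { currentId = doc._id; map(doc); });
  var groups = {};
  emitted.forEach(function (row) {
    (groups[row.key] = groups[row.key] || []).push(row);
  });
  return Object.keys(groups).sort().map(function (key) {
    var rows = groups[key];
    var keys = rows.map(function (r) { return [r.key, r.id]; });
    var values = rows.map(function (r) { return r.value; });
    return { key: key, value: reduce(keys, values, false) };
  });
}

console.log(groupedQuery([
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Aston University", region: "West Midlands" }
]));
```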

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1 (source view)

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\">html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }</style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\">function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

APPENDIX A DESIGN DOCUMENT D1 51

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};

APPENDIX B GEORSS ON SOFA 53

At the end of lists/index.js:

      alternate : path.absolute(path.show('post', row.id)),
      // point : row.value.loc[1] + " " + row.value.loc[0]
      point : row.value.latitude + " " + row.value.longitude
    });
    // send the entry to client
    send(feedEntry);
  } while (row = getRow());
  // close the loop after all rows are rendered
  return "</feed>";
});
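The point value built above follows the GeoRSS-Simple convention of latitude and longitude separated by a space; the alternative line using row.value.loc reads the two values from an array in the opposite order. Since latitude and longitude arrive from the edit form as text, a small helper along the following lines could validate them before they reach the feed. The helper is hypothetical and not part of Sofa.

```javascript
// Convert one coordinate (number or numeric string) and range-check it.
// Returns NaN for empty, non-numeric or out-of-range input.
function toCoordinate(value, limit) {
  if (value === null || value === undefined || String(value).trim() === "") {
    return NaN;
  }
  var n = Number(value);
  return Math.abs(n) <= limit ? n : NaN;
}

// GeoRSS-Simple point: "latitude longitude", separated by a single space.
function georssPoint(latitude, longitude) {
  var lat = toCoordinate(latitude, 90);
  var lon = toCoordinate(longitude, 180);
  if (isNaN(lat) || isNaN(lon)) {
    throw new RangeError("invalid coordinates: " + latitude + ", " + longitude);
  }
  return lat + " " + lon;
}

console.log(georssPoint("56.3412139443169", "-2.79301175308608"));
// 56.3412139443169 -2.79301175308608
```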

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}

APPENDIX C GEORSSJS 56

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand in hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

APPENDIX D GEORSSHTML 58

Listed below is the code for georss.html. Even though it is an html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
        {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
            {{#countries}}
              &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
            {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
        {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload .png files from a folder so that the image files become attachments to a CouchDB document.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/\.png//g')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done

APPENDIX E BASH SCRIPT TO UPLOAD COUNTRY FLAG FILES60
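For readers less familiar with sed, the two substitutions in the script can be expressed as a small JavaScript function: strip the countries/png/ folder to get the file name, strip .png to get the document id, then append /flag to name the attachment. The function name and the use of encodeURIComponent are illustrative assumptions, not part of the original script.

```javascript
// Mirror of the script's sed steps: file path -> document id -> attachment URL.
function flagAttachmentUrl(filepath, dbUrl) {
  var filename = filepath.replace(/^countries\/png\//, ""); // "countries/png/ie.png" -> "ie.png"
  var docname = filename.replace(/\.png$/, "");             // "ie.png" -> "ie"
  // PUT to /db/<docid>/<attachment>; CouchDB creates the document if needed.
  return dbUrl.replace(/\/+$/, "") + "/" + encodeURIComponent(docname) + "/flag";
}

console.log(flagAttachmentUrl("countries/png/ie.png", "http://127.0.0.1:5984/countries"));
// http://127.0.0.1:5984/countries/ie/flag
```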

Bibliography

[1] A NoSql Summer. A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


BIBLIOGRAPHY 62

[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang: An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file/

BIBLIOGRAPHY 63

[29] github CouchApp Commit History httpgithubcomcouchapp

couchappcommitsmaster

[30] Ricky Ho Couchdb implementation httphorickyblogspotcom200810couchdb-implementationhtml

[31] Jacob Kaplan-Moss Of the Web httpjacobianorgwriting

of-the-web

[32] Damien Katz CouchDB and Me httpwwwinfoqcom

presentationskatz-couchdb-and-me

[33] Damien Katz CouchDB Architecture httpdamienkatznet

200504couchdb5Farchitehtml

[34] Stuart Langridge Firefox bookmarks in CouchDBhttpwwwkryogenixorgdays20090706

firefox-bookmarks-in-couchdb

[35] Jan Lehnardt mustachejs httpgithubcomjanlmustachejs

[36] Vance Lucas NoSql First Impressions Object DatabasesMissed the Boat httpwwwvancelucascomblog

nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin Transaction Management Lecture Notes MSc ComputerScience Birkbeck College

[38] Dwight Merriman Comparing Mongo DB and Couch DB httpwwwmongodborgdisplayDOCSComparing+Mongo+DB+and+Couch+DB

[39] Microsoft Entity Framework Design httpblogsmsdncomb

efdesign

[40] Ted Nedward The Vietnam of Computer Science http

blogstednewardcom20060626The+Vietnam+Of+Computer+

Scienceaspx

[41] Ruby on Rails Class ActiveRecordBase httpapirubyonrailsorgclassesActiveRecordBasehtml

[42] Ed Parcell Tutorial Using JQuery and CouchDb to build asimple AJAX web application httpedparcellposterouscom

using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul Code tutorial make your application sync with UbuntuOne httparstechnicacomopen-sourceguides200912

code-tutorial-make-your-application-sync-with-ubuntu-one

ars

BIBLIOGRAPHY 64

[44] Daniel Stenberg and contributors curl1 the man page httpcurlhaxxsedocsmanpagehtml

[45] M Stonebraker and U Cetintemel ldquoOne Size Fits Allrdquo An Idea WhoseTime Has Come and Gone In PROCEEDINGS OF THE INTER-NATIONAL CONFERENCE ON DATA ENGINEERING pages 2ndash11IEEE Computer Society Press 2005

[46] Stonebraker M Wong E Kreps P and Held G The Design andImplementation of INGRES ACM TODS 1 3 (September 1976)

[47] Klaus Trainer CouchDB 10 Retrospectives httpmambofulani

couchonecomblog_designsofa_listpostpost-page

startkey=[22CouchDB-1-0-Retrospectives22]

[48] Rik van der Sanden Accessing Azure Masterrsquos thesis Delft Univer-sity the Netherlands 2009 httpswerltudelftnltwikipub

MainPastAndCurrentMScProjectsThesis5FRik5Fvan5Fder

5FSandenpdf

[49] Peter Vogel [Weblog Comment] Is Microsoft Feeling the lsquoNoSQLrsquo HeathttpreddevnewscomBlogsData-Driver200912NoSQL-Heat

[50] Wikipedia ACID httpenwikipediaorgwikiACID

[51] Wikipedia cURL httpenwikipediaorgwikiCURL

[52] Wikipedia Database Normalization httpenwikipediaorg

wikiOpen5FDatabase5FConnectivity

[53] Wikipedia JSON httpenwikipediaorgwikiJSON

[54] Wikipedia Relational model httpenwikipediaorgwiki

Relational5Fmodel

[55] Interview with Damien Katz by Werner Schuster DamienKatz Relaxing on CouchDB httpwwwinfoqcominterviews

CouchDB-Damien-Katz


SECTION 2 BACKGROUND 14

Figure 2.2: British Council data on Google Earth

The MSc project described in this report uses data from the Activity Mapping Project. The aim is to investigate the extent to which it is possible to use CouchDB to store the Activity Mapping data in document format rather than as relational data, and then to use CouchDB's built-in web server to serve the data in HTML and GeoRSS formats.

2.1.4 Using CouchDB to Serve Mapping Data

Currently, the generated HTML and XML files are stored statically on the British Council's web server, and a logical next step would be to build a back-end database that would serve the data dynamically.

Since the data for the maps is read-only, reporting the activity for the previous year, I feel that it would be a good fit for the back-end data store to be something like CouchDB - in effect, using CouchDB as a web caching layer.

The data is already structured, having been exported from the British Council's relational database. The main tasks for CouchDB would be to store the reporting data in JSON format and to transform this data on demand to GeoRSS or KML [7] formats for consumption by Microsoft Bing Maps, Google Maps, Google Earth, etc.
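As a rough illustration of the second task, the following JavaScript sketch turns one JSON document into a GeoRSS-Simple item. It assumes, hypothetically, that each document carries latitude and longitude fields and uses the document _id as the title. The function names toGeoRssItem and escapeXml are my own and are not part of CouchDB.

```javascript
// Hypothetical helper: escape text so it is safe inside an XML element.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: render one JSON document as a GeoRSS-Simple <item>.
// Assumes the document has _id, latitude and longitude fields.
function toGeoRssItem(doc) {
  // GeoRSS-Simple writes a point as "latitude longitude", separated by a space.
  return (
    "<item>" +
    "<title>" + escapeXml(doc._id) + "</title>" +
    "<georss:point>" + doc.latitude + " " + doc.longitude + "</georss:point>" +
    "</item>"
  );
}

// Example document with made-up coordinates.
const item = toGeoRssItem({ _id: "Example University", latitude: 52.48, longitude: -1.89 });
console.log(item);
// <item><title>Example University</title><georss:point>52.48 -1.89</georss:point></item>
```

In a complete feed, such items would sit inside an RSS channel whose root element declares the GeoRSS namespace (xmlns:georss="http://www.georss.org/georss").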

This report describes my investigation into writing an application using CouchDB to fulfil these requirements. I am interested in exploring how ready this new technology is for 'prime time' - that is, to what extent it is ready for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms³.

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed as a JSON string [30].

3.1 'Of the Web'

Combining a web server with B-Tree based data storage has been key to CouchDB's success as a project, because all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. I take advantage of this innovative storage of data and code in one place throughout this project.

An often-used quote about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS¹-agnostic." [31]

Normally, a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in a single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an 'indexable schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system, which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available online [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and in recent years it has established itself as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

¹ Operating System

Consider the use of NULL values in relational databases. In a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, a missing postal code is simply left out of the document. This is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a new line character, so in this case the streetAddress spans three separate printed lines.)
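Both points can be checked with a small JavaScript sketch. The two documents are abridged versions of the ones above, held in an ordinary array rather than in CouchDB. This is only an illustration of how a client sees the data.

```javascript
// Abridged versions of the two address book documents. They have different
// fields, yet nothing forces them to share a schema.
const addressbook = [
  {
    firstName: "John",
    lastName: "Smith",
    address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" }
  },
  {
    company: "British Council Zambia",
    address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" }
  }
];

// A missing field is simply absent. No NULL placeholder is stored.
const postalCodes = addressbook.map(doc =>
  "postalCode" in doc.address ? doc.address.postalCode : "(none)"
);
console.log(postalCodes); // [ '10021', '(none)' ]

// Serialised as JSON, the line breaks appear as \n escapes. Parsed back,
// they become real line breaks again.
const zambia = JSON.parse(JSON.stringify(addressbook[1]));
console.log(zambia.address.streetAddress.split("\n").length); // 3
```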

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. Processes do not share memory state, so if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. J. Chris Anderson, one of the project's core developers, describes how using Erlang together with append-only B-Tree structures for data promotes this:

"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
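The following toy in-memory model, written in JavaScript, makes the 'never touch what we've already written' idea concrete. It is not CouchDB's actual B-Tree file format, and the class name AppendOnlyStore is hypothetical. It only shows why a reader holding an older view is unaffected by later writes.

```javascript
// Toy append-only document store (not CouchDB's file format).
// Every write appends a new record. Nothing already written is modified.
class AppendOnlyStore {
  constructor() {
    this.log = [];           // the "file": records are only ever pushed
    this.latest = new Map(); // id -> position of the newest record for that id
  }

  write(id, body) {
    this.log.push({ id, body });
    this.latest.set(id, this.log.length - 1);
  }

  // A reader copies the index as it was when the reader started. Because the
  // log only grows, the positions it remembers stay valid.
  snapshot() {
    const latest = new Map(this.latest);
    return id => (latest.has(id) ? this.log[latest.get(id)].body : undefined);
  }
}

const store = new AppendOnlyStore();
store.write("doc1", { v: 1 });
const oldReader = store.snapshot();
store.write("doc1", { v: 2 });   // appended; the v: 1 record is left untouched
const newReader = store.snapshot();

console.log(oldReader("doc1").v, newReader("doc1").v); // 1 2
console.log(store.log.length);                         // 2
```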

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Every ten minutes, Desktop Couch can communicate with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. The bookmarks are then synchronised to the UbuntuOne server as a background process. Later, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available online [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed 'Quickly' [8], which gives beginner programmers a way to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started, and Quickly is similar. The innovative aspect is that it uses Desktop Couch for persistence, so an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so the data can also be retrieved on other machines or devices running CouchDB.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, its documentation can at times be difficult to find or follow. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

If you are reading this section, I strongly recommend that you sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.

4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers for Windows and Mac are also available at the same URL. I installed the version that was current at the time of writing, 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.
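In practice, such a program only has to produce an ordinary HTTP request. The JavaScript sketch below builds a description of the POST that creates a document. The helper name createDocumentRequest is hypothetical, and the local address http://127.0.0.1:5984 is CouchDB's default port as used later in this section. Actually sending the request is left to whatever HTTP library is at hand.

```javascript
// Hypothetical helper: describe the HTTP request that creates a document.
// Any HTTP library can send it. No CouchDB-specific driver is needed.
function createDocumentRequest(host, database, doc) {
  return {
    method: "POST",
    url: host + "/" + encodeURIComponent(database),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  };
}

const req = createDocumentRequest("http://127.0.0.1:5984", "addressbook", {
  firstName: "John",
  lastName: "Smith"
});
console.log(req.method, req.url); // POST http://127.0.0.1:5984/addressbook

// With a runtime that provides fetch (e.g. Node.js 18+), it could be sent as:
// fetch(req.url, { method: req.method, headers: req.headers, body: req.body })
```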

For this MSc project, I went back to first principles and used command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use 'cURL', a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same request with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.

curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

cURL syntax is explained in the cURL manual page [44]. Briefly:

• -H introduces an HTTP header

• -X POST instructs cURL to use an HTTP POST (instead of a GET) request

• -d introduces the data to be sent

• @ denotes that what follows is a file name

All data in CouchDB is stored in JSON format, together with a unique key and a revision number. In CouchDB, the resulting document would therefore have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
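The _rev value has two parts separated by a dash. As I understand it, the number before the dash counts the revisions of the document, and the part after it is a hash identifying that particular revision. Client code should still treat the whole string as an opaque token to send back. The hypothetical JavaScript helper parseRev below only shows how the string splits.

```javascript
// Split a CouchDB revision string such as "1-91bce055fc8db86480400321079f0834"
// into its revision counter and its hash. Hypothetical helper, for illustration.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  return { generation: Number(rev.slice(0, dash)), hash: rev.slice(dash + 1) };
}

const parsed = parseRev("1-91bce055fc8db86480400321079f0834");
console.log(parsed.generation); // 1
console.log(parsed.hash);       // 91bce055fc8db86480400321079f0834
```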

This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator (http://jsonlint.com).

Figure 4.1: JSONLint

The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.

For example, it would be useful to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia

To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon

4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string. (A 'query string' is the section of a URL following a question mark, ?.)

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.

As we are updating a 'named' document, we use the PUT verb instead of POST.

The following command will update John Smith's record with the contents of the file john-smith-v2.json:

curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
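The revision check behaves like optimistic concurrency control. The toy JavaScript model below is not CouchDB code. The class ToyDatabase and the "-toy" revision suffix are hypothetical. It only reproduces the rule described above: a write that names anything other than the current revision is rejected. The error body mirrors the shape of CouchDB's conflict response.

```javascript
// Toy model of CouchDB-style optimistic concurrency control (not CouchDB code).
class ToyDatabase {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  put(id, body, rev) {
    const current = this.docs.get(id);
    // Reject the write unless it names the revision currently stored.
    if (current && current.rev !== rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? Number(current.rev.split("-")[0]) + 1 : 1;
    const newRev = generation + "-toy"; // real CouchDB appends a hash, not "toy"
    this.docs.set(id, { rev: newRev, body });
    return { ok: true, id, rev: newRev };
  }
}

const db = new ToyDatabase();
const first = db.put("john", { age: 25 });              // creates revision 1
const second = db.put("john", { age: 26 }, first.rev);  // based on the latest revision: accepted
const stale = db.put("john", { age: 27 }, first.rev);   // based on an old revision: rejected
console.log(second.rev, stale.error); // 2-toy conflict
```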

4.7 Adding Attachments

CouchDB provides a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons³ from http://www.famfamfam.com/lab/icons/flags/.

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.

The uploaded image may be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag.
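Besides the standalone attachment PUT shown above, CouchDB also accepts 'inline' attachments. These sit in a document's _attachments field as base64-encoded data together with a content_type. The Node.js sketch below builds such a document. The helper withInlineAttachment is hypothetical, and four illustrative bytes stand in for the real zm.png. When updating an existing document this way, its current _rev would also have to be included.

```javascript
// Hypothetical helper: return a copy of doc carrying one inline attachment,
// encoded as base64 with Node.js Buffer.
function withInlineAttachment(doc, name, contentType, bytes) {
  return {
    ...doc,
    _attachments: {
      [name]: { content_type: contentType, data: Buffer.from(bytes).toString("base64") }
    }
  };
}

// The first four bytes of any PNG file stand in for the real zm.png here.
const pngStart = Uint8Array.from([0x89, 0x50, 0x4e, 0x47]);
const doc = withInlineAttachment(
  { _id: "british-council-zambia", company: "British Council Zambia" },
  "flag",
  "image/png",
  pngStart
);
console.log(doc._attachments.flag.data); // iVBORw==
```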

4.8 Replication

One particularly interesting and useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want a replica of the addressbook on our local machine, so that we can work on it offline.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'

Note that when using cURL on Microsoft Windows, we need to use double quotes only, backslash-escaping them where necessary:

curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
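For intuition only, the toy JavaScript model below shows the basic idea of one-way (pull) replication. Any document that the target lacks, or holds at an older revision, is copied across, and running it again copies nothing. Real CouchDB replication reads the source's changes feed and keeps full revision histories, including conflicts, which this sketch deliberately ignores. The function names and the sample data are hypothetical.

```javascript
// Toy model of one-way replication between two in-memory databases
// (each a Map of id -> { rev, body }). Not CouchDB's algorithm.
function generation(rev) {
  return Number(rev.split("-")[0]);
}

function replicate(source, target) {
  let copied = 0;
  for (const [id, doc] of source) {
    const existing = target.get(id);
    // Copy when the target lacks the document or has an older revision.
    if (!existing || generation(existing.rev) < generation(doc.rev)) {
      target.set(id, { rev: doc.rev, body: { ...doc.body } });
      copied += 1;
    }
  }
  return copied;
}

const remote = new Map([
  ["british-council-zambia", { rev: "3-a", body: { city: "Lusaka" } }],
  ["john", { rev: "2-b", body: { city: "New York" } }]
]);
const local = new Map([["john", { rev: "2-b", body: { city: "New York" } }]]);

const firstRun = replicate(remote, local);  // only the Zambia record was missing
const secondRun = replicate(remote, local); // already up to date
console.log(firstRun, secondRun); // 1 0
```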

4.9 Querying a CouchDB Database using Map-Reduce

So far, we have seen how data is entered into CouchDB and a little of how replication works. We now turn to querying the database.

Apart from fetching a single document by its id, as above, data is queried in CouchDB using map-reduce functions. These are usually written in JavaScript, although it is also possible to write CouchDB map-reduce functions in Erlang.

The map-reduce paradigm was popularised by Google's 2004 paper [23], which describes how it is used to distribute calculations across large clusters of commodity machines. It works in two stages. The 'map' stage ranges over the document collection and emits zero or more key-value pairs for each document it encounters. The 'reduce' stage is used when an aggregate calculation over these intermediate values is required.

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this database is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type contains data, such as the document above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document 'con' defines two 'views' of the data.

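The first view, 'Conservative Votes', simply ranges over the document collection in a map function. For each constituency, it emits the Conservative vote count as the key and the constituency name as the value.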

The second view, 'Conservative Votes Total', combines a map function with a reduce function to output the sum of all Conservative votes. This is analogous to an aggregate query such as SELECT SUM(con) in SQL.

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The following request returns the sum of Conservative votes (8782198):

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}
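To make the two stages concrete, here is a minimal JavaScript emulation of the two views, run outside CouchDB. The Aberavon figures come from the document above, and the second document is invented. One simplification is assumed: emit is passed to the map function as an argument, whereas inside CouchDB both emit and sum are globals provided by the view server. The helper runView is hypothetical.

```javascript
// Minimal emulation of a CouchDB view in plain JavaScript (not CouchDB itself).
function runView(docs, map, reduce) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  docs.forEach(doc => map(doc, emit));            // map stage: collect key/value rows
  if (!reduce) return rows;                        // CouchDB would also sort rows by key
  return reduce(null, rows.map(r => r.value));     // reduce stage (keys argument omitted)
}

const sum = values => values.reduce((a, b) => a + b, 0);

// Aberavon's figures come from the database above; "Example" is invented.
const docs = [
  { _id: "Aberavon", con: 3062, lab: 18077 },
  { _id: "Example", con: 1500, lab: 9000 }
];

// The two views from the 'con' design document, rewritten to take emit explicitly.
const conservativeVotes = (doc, emit) => emit(doc.con, doc._id);
const totalMap = (doc, emit) => emit(null, doc.con);
const totalReduce = (keys, values) => sum(values);

const rows = runView(docs, conservativeVotes);
const total = runView(docs, totalMap, totalReduce);
console.log(rows.length); // 2
console.log(total);       // 4562
```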

It is also possible to emulate a WHERE clause. The following request returns the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
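The range query works because CouchDB keeps view rows sorted by key, so the rows between startkey and endkey (inclusive by default) form one contiguous run. The JavaScript sketch below emulates this on map output shaped like the 'Conservative Votes' view. Only the Aberavon row is real data. A real index would seek straight to startkey rather than filter every row as this sketch does. The helper keyRange is hypothetical.

```javascript
// Emulating ?startkey=1000&endkey=2000 on a view's map output.
function keyRange(rows, startkey, endkey) {
  const sorted = [...rows].sort((a, b) => a.key - b.key); // views are kept in key order
  return sorted.filter(r => r.key >= startkey && r.key <= endkey);
}

// Rows shaped like the 'Conservative Votes' view; only Aberavon is real data.
const rows = [
  { key: 3062, value: "Aberavon" },
  { key: 1500, value: "Constituency A" },
  { key: 1999, value: "Constituency B" },
  { key: 800, value: "Constituency C" }
];

const inRange = keyRange(rows, 1000, 2000).map(r => r.value);
console.log(inRange.join(", ")); // Constituency A, Constituency B
```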

This method of retrieving data is certainly cumbersome compared to using SQL. One criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a B-Tree index is built for that view. From then on, the index is updated incrementally: when documents are added or changed, only those documents are processed by the map function the next time the view is queried, rather than the whole collection.

As a result, the entire system is optimised for fast indexed retrieval. In normal use, a query never performs a 'table scan' at run time, because every read from a view is an indexed lookup.
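The incremental update can be sketched with a toy JavaScript model. It is not CouchDB's implementation, and the class IncrementalView is hypothetical. The index remembers the last update sequence it has processed. On each query, it maps only the documents changed since then and replaces any rows emitted for an older version of those documents.

```javascript
// Toy incremental view index (not CouchDB's implementation).
class IncrementalView {
  constructor(map) {
    this.map = map;             // a map(doc, emit) function
    this.lastSeq = 0;           // highest update sequence already indexed
    this.rowsById = new Map();  // doc id -> rows emitted for its latest version
    this.mapCalls = 0;          // counts how many documents were (re)mapped
  }

  // changes: [{ seq, doc }] in increasing seq order, like a changes feed.
  // A real changes feed would be asked only for entries after lastSeq.
  query(changes) {
    for (const { seq, doc } of changes) {
      if (seq <= this.lastSeq) continue; // already indexed: skip
      const rows = [];
      this.map(doc, (key, value) => rows.push({ key, value }));
      this.rowsById.set(doc._id, rows);  // replaces rows from an older version
      this.mapCalls += 1;
      this.lastSeq = seq;
    }
    return [...this.rowsById.values()].flat().sort((a, b) => a.key - b.key);
  }
}

const view = new IncrementalView((doc, emit) => emit(doc.con, doc._id));
const changes = [{ seq: 1, doc: { _id: "Aberavon", con: 3062 } }];
view.query(changes);                                          // maps 1 document
changes.push({ seq: 2, doc: { _id: "Example", con: 1500 } }); // invented document
const result = view.query(changes);                           // maps only the new one
console.log(result.length, view.mapCalls); // 2 2
```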

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents. It stores HTML and JavaScript application code in so-called 'design documents'.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I created a database called 'universities' and used it to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The 'universities' data set is a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .Net application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.

Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the following bash script.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (On Windows, a similar effect could be achieved with a batch file script.)

#!/bin/bash

#host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES=$folder/*

# create the database
curl -X PUT "$host/$database"

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s/$folder\///g")
  echo "$filename"
  # strip the extension to get the document id
  docname=$(echo "$filename" | sed -e "s/$fileextension//g")
  echo "$docname"
  url=$host/$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d "@$filepath"
  curl -X PUT "$url" -d "@$filepath"
done

Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)

To put a single document to the web, the command would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities.

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].

For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved field name in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered. req refers to the request object and can be used, for example, to retrieve data passed in the query string of the URL.
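Because a show function is ordinary JavaScript, it can be tried outside CouchDB before being pasted into the design document. The sketch below is the s1 function from the listing, written as a plain function and called with a made-up document. The req argument is unused by s1, so an empty object stands in for it.

```javascript
// The s1 show function from _design/d1, as an ordinary JavaScript function
// so that it can be called outside CouchDB.
function s1(doc, req) {
  return "<h1>" + doc._id + "</h1>" +
         "<p>latitude: " + doc.latitude + "</p>" +
         "<p>longitude: " + doc.longitude + "</p>";
}

// A stand-in document with made-up coordinates.
const html = s1({ _id: "Example University", latitude: 52.5, longitude: -1.9 }, {});
console.log(html);
// <h1>Example University</h1><p>latitude: 52.5</p><p>longitude: -1.9</p>
```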

Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 \
  -d @d1.json

To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB can serve HTML as well as store data.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the id of the document to be rendered

5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the ‘show’ function that returns the HTML for the map is map1. The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document is very cumbersome, because the markup has to be returned from a JavaScript function as a string. For example, every double quote has to be escaped so that the design document remains valid JSON.
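The escaping problem can be seen in a short sketch, assuming Node.js. JSON has no function type, so a ‘show’ function is stored inside the design document as a string. Here a hypothetical show function, whose HTML uses a double-quoted class attribute, is converted to its source text and serialised with JSON.stringify. The serialiser inserts the backslash escapes that would otherwise have to be typed by hand.

```javascript
// A 'show' function written as ordinary JavaScript (hypothetical example).
// The HTML it returns deliberately uses double quotes around an attribute.
function s1(doc, req) {
  return '<h1 class="title">' + doc._id + "</h1>";
}

// JSON has no function type, so the design document stores the source text.
const designDoc = {
  _id: "_design/d1",
  shows: { s1: s1.toString() },
};

// JSON.stringify adds the backslash escapes (\") that would otherwise
// have to be typed by hand in d1.json.
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

Generating the JSON this way removes the need for hand-escaping. Broadly, this is the chore that the CouchApp tooling in Section 6 takes over.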

While authoring design documents manually, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon) than to upload documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.

Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions. Here we define a very simple ‘view’ function, which simply outputs the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document id values
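What the view server does with v1 can be approximated in memory. In the sketch below (Node.js assumed), a hypothetical runMap helper installs a global emit, as CouchDB does for map functions. It then calls the map function once per document and sorts the emitted rows by key. This is a simplification: CouchDB collates keys with its own rules and keeps the rows in a persistent, incrementally updated index rather than recomputing them on each request.

```javascript
// Imitate, in memory, how a view server runs a map function over documents.
// Keys are compared here as plain strings, not with CouchDB's collation.
function runMap(mapFn, docs) {
  const rows = [];
  let current = null;
  // In CouchDB, emit is a global available to map functions; provide one here.
  globalThis.emit = (key, value) => {
    rows.push({ id: current._id, key: key, value: value });
  };
  try {
    for (const doc of docs) {
      current = doc;
      mapFn(doc);
    }
  } finally {
    delete globalThis.emit;
  }
  rows.sort((a, b) => String(a.key).localeCompare(String(b.key)));
  return { total_rows: rows.length, offset: 0, rows: rows };
}

// The v1 map function from the design document above.
const v1 = function (doc) {
  emit(doc._id, doc._id);
};

const result = runMap(v1, [{ _id: "Aston University" }, { _id: "Aberystwyth University" }]);
console.log(result.rows.map((row) => row.key));
```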

A ‘view’ is, in effect, a query function that returns a set of results. To display the results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

Note the built-in functions start, getRow and send:

• start is called once, at the beginning of the function, to set the response headers
• send is called once for each row in the dataset (each row being fetched with getRow) and writes a chunk of HTML to the response

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
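The division of labour between start, getRow and send can be imitated outside CouchDB. In this sketch (Node.js assumed), a hypothetical runList helper supplies its own versions of the three functions as globals and feeds the list function the rows of a view result one at a time. It joins everything passed to send, and any string the list function returns is appended as the final chunk. The l1 function is the one from the design document above, written as plain JavaScript rather than as a JSON string.

```javascript
// Imitate how CouchDB drives a list function: it provides start, getRow and
// send, and a string returned by the function is sent as the final chunk.
function runList(listFn, viewRows) {
  let next = 0;
  let headers = null;
  const chunks = [];
  globalThis.start = (response) => {
    headers = response.headers;
  };
  globalThis.getRow = () => (next < viewRows.length ? viewRows[next++] : null);
  globalThis.send = (chunk) => {
    chunks.push(chunk);
  };
  try {
    const head = { total_rows: viewRows.length, offset: 0 };
    const tail = listFn(head, {}); // an empty object stands in for req
    if (typeof tail === "string") chunks.push(tail);
  } finally {
    delete globalThis.start;
    delete globalThis.getRow;
    delete globalThis.send;
  }
  return { headers: headers, body: chunks.join("") };
}

// The l1 list function from the design document, as plain JavaScript.
const l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send(
      '<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
        row.value + '">' + row.value + "</a></p>"
    );
  }
};

const page = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
]);
console.log(page.body);
```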

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload them to a CouchDB server in the normal way. However, this approach becomes very cumbersome, and difficult to maintain, as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

Working through this exercise in simple application development has given us a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable, syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds that can be displayed on online maps.
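In the GeoRSS-Simple encoding, a location is a latitude and a longitude, separated by a space, inside a georss:point element. The hypothetical geoEntry helper below (Node.js assumed) builds one Atom entry carrying such a point. It escapes the XML-special characters in the text fields but is otherwise a minimal sketch, not a complete feed generator.

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build an Atom <entry> carrying a GeoRSS-Simple point ("lat lon").
// The georss prefix must be bound on the enclosing <feed>, i.e.
// xmlns:georss="http://www.georss.org/georss". Atom expects <id> to be an
// IRI; a plain document id is used here only for brevity.
function geoEntry(id, title, latitude, longitude) {
  return (
    "<entry>" +
    "<id>" + escapeXml(id) + "</id>" +
    "<title>" + escapeXml(title) + "</title>" +
    "<georss:point>" + latitude + " " + longitude + "</georss:point>" +
    "</entry>"
  );
}

const sample = geoEntry(
  "University of St Andrews",
  "University of St Andrews",
  56.3412139443169,
  -2.79301175308608
);
console.log(sample);
```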

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to make complex design documents easier to author and deploy [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer’s perspective is that it creates a separate file for each element of the design document. For the developer, a CouchDB design document is in fact a web application project. CouchApp keeps each element in its own file, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an Atom feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the feed output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3

Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by splitting each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document called bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure that matches the eventual structure of the design document we want to create:

• The views folder contains a map.js file and, when needed, a reduce.js file for each CouchDB View. A View can be thought of as a query against the database.
• The shows folder contains a .js file for each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
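The correspondence between folders and design-document fields can be sketched as a small function. The sketch assumes Node.js and takes an in-memory map from relative file paths to file contents instead of reading the disk. The hypothetical buildDesignDoc helper below turns each folder into a nested object key, and each file name, minus its extension, into a field. The real CouchApp tool handles more than this (attachments and other special files, for instance), so this only illustrates the mapping.

```javascript
// Assemble a design document from a flat map of { "relative/path": source }.
// Folders become nested object keys; the file name, minus its extension,
// becomes the field name, and the file contents become the (string) value.
function buildDesignDoc(id, files) {
  const ddoc = { _id: id };
  for (const [filePath, source] of Object.entries(files)) {
    const parts = filePath.split("/");
    const field = parts.pop().replace(/\.[^.]+$/, "");
    let node = ddoc;
    for (const folder of parts) {
      if (!(folder in node)) {
        node[folder] = {};
      }
      node = node[folder];
    }
    node[field] = source.trim();
  }
  return ddoc;
}

// A few files laid out the way couchapp generate arranges them.
const ddoc = buildDesignDoc("_design/bc", {
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }\n",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }\n",
  "templates/georss.html": "<feed>...</feed>\n",
});
console.log(JSON.stringify(ddoc, null, 2));
```

In this sketch, templates/georss.html ends up as templates.georss. That matches the ddoc.templates.georss reference in Appendix C.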

Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1

couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it into HTML.

Figure 6.4: s1.js, a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js.

We recall that in CouchDB a View is, in effect, a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed

6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document to the CouchDB server using the correct HTTP commands.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
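Conceptually, deployment amounts to reading the target database from .couchapprc and sending the assembled design document there with an HTTP PUT. The sketch below (Node.js assumed) only works out that request and does not send it. The name pushRequest is hypothetical. The real CouchApp push also handles details this sketch ignores, such as the current revision of an existing design document and attachments.

```javascript
// Work out (without sending it) the HTTP request that would upload a design
// document, given the parsed contents of .couchapprc and an environment name.
function pushRequest(couchapprc, envName, ddoc) {
  const env = couchapprc.env && couchapprc.env[envName];
  if (!env || !env.db) {
    throw new Error("no db configured for environment: " + envName);
  }
  // Design documents live at <db>/_design/<name>; the slash after _design
  // stays literal, so only the name part is URL-encoded.
  const name = ddoc._id.replace(/^_design\//, "");
  return {
    method: "PUT",
    url: env.db.replace(/\/$/, "") + "/_design/" + encodeURIComponent(name),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(ddoc),
  };
}

// The same structure as the .couchapprc shown above.
const couchapprc = {
  env: { default: { db: "http://username:password@mick.couchone.com/universities" } },
};
const plan = pushRequest(couchapprc, "default", { _id: "_design/bc", views: {} });
console.log(plan.method, plan.url);
```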

With this in place, running couchapp push from within the bc folder sends the design document to the db URL of the default environment. So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This demonstrates how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It can iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple View, all, which consists of a single map.js file (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss

This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
      {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
          {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" &gt;
          {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
      {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The template loops through each of the programmes and, if a programme has a list of countries, emits the flags of those countries.
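The looping behaviour described above can be illustrated with a deliberately tiny renderer. The sketch below (Node.js assumed) is not mustache.js. The render function is a hypothetical helper that supports only {{name}} variables and {{#name}}...{{/name}} sections. A section repeats for each item of a list, renders once for a true value and is skipped for false or an empty list. Unlike mustache.js, it does not HTML-escape values. It also does not support partials, inverted sections or nested sections with the same name.

```javascript
// Find a name in the stack of contexts, innermost first.
function lookup(stack, name) {
  for (let i = stack.length - 1; i >= 0; i--) {
    const ctx = stack[i];
    if (ctx !== null && typeof ctx === "object" && name in ctx) {
      return ctx[name];
    }
  }
  return undefined;
}

function renderWith(template, stack) {
  // {{#name}}...{{/name}}: the lazy match and the \1 backreference pair each
  // section with its own closing tag, so differently named sections can nest.
  const sectionPattern = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  const expanded = template.replace(sectionPattern, (match, name, inner) => {
    const value = lookup(stack, name);
    if (Array.isArray(value)) {
      // A list: render the inner template once per item (nothing if empty).
      return value.map((item) => renderWith(inner, stack.concat([item]))).join("");
    }
    if (value !== null && typeof value === "object") {
      return renderWith(inner, stack.concat([value]));
    }
    // A boolean or other value: render once if truthy, otherwise skip.
    return value ? renderWith(inner, stack) : "";
  });
  // {{name}}: plain substitution (mustache.js would also HTML-escape it).
  return expanded.replace(/\{\{(\w+)\}\}/g, (match, name) => {
    const value = lookup(stack, name);
    return value === undefined || value === null ? "" : String(value);
  });
}

function render(template, view) {
  return renderWith(template, [view]);
}

// A cut-down version of the georss.html template and a matching stash.
const template =
  "{{#institutions}}<entry><title>{{id}}</title><content>" +
  "{{#has_programmes}}{{#programmes}}&lt;p&gt;{{name}}" +
  "{{#countries}} [{{country}}]{{/countries}}&lt;/p&gt;{{/programmes}}{{/has_programmes}}" +
  "</content><point>{{latitude}} {{longitude}}</point></entry>{{/institutions}}";

const stash = {
  institutions: [
    {
      id: "University of St Andrews",
      latitude: 56.3412139443169,
      longitude: -2.79301175308608,
      has_programmes: true,
      programmes: [
        {
          name: "Chevening Programme",
          countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }],
        },
      ],
    },
  ],
};

const xml = render(template, stash);
console.log(xml);
```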

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all

Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can replace the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, rather than simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe this trend is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB communicates over HTTP, which makes it easy to replicate both data and code between remote servers and local machines. This gives a developer a key advantage: data and code can be replicated in many places, so that users have high-speed access to them even without a good internet connection. Distributed data and application code are important for reliable computing.
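In CouchDB, replication is itself triggered over HTTP, by a POST to the server's /_replicate endpoint. The JSON body names a source and a target database. The sketch below (Node.js assumed) only builds that request. The helper name and the local server address are hypothetical, and credentials and error handling are left out.

```javascript
// Build (but do not send) the HTTP request that asks a CouchDB server to
// replicate one database into another via its /_replicate endpoint.
function replicationRequest(serverUrl, source, target, continuous) {
  const body = { source: source, target: target };
  if (continuous) {
    // Keep replicating as the source changes, instead of a one-off copy.
    body.continuous = true;
  }
  return {
    method: "POST",
    url: serverUrl.replace(/\/$/, "") + "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Pull the hosted universities database (which, given suitable permissions,
// includes its design documents) into a local database of the same name.
const request = replicationRequest(
  "http://127.0.0.1:5984",
  "http://mick.couchone.com/universities",
  "universities",
  false
);
console.log(request.method, request.url, request.body);
```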

Such replication cannot be achieved easily with conventional web applications. Even with today's mature technology, replicating code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. With CouchDB, by comparison, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs. In addition, CouchDB uses JSON, a widely adopted standard, as its document storage format. The use of HTTP in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB also helps with learning about the HTTP protocol.

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of in SQL
• understanding the structure of design documents
• using server-side JavaScript functions to transform documents into HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, transforming data is possible but cumbersome.

CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a predefined map-reduce function, whose results CouchDB builds into an index. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
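Consider counting institutions per region as an example of the difference in thinking. In SQL this would be a GROUP BY query, something like SELECT region, COUNT(*) FROM institutions GROUP BY region (the table name is hypothetical). In CouchDB, the same need is expressed as a map function that emits (region, 1) and a reduce function that sums the values. Queried with group=true, such a view returns one row per region. The sketch below (Node.js assumed) imitates that grouping in memory with a hypothetical queryGrouped helper. CouchDB itself also offers a built-in _sum reduce for this case.

```javascript
// Map: emit one row per institution, keyed by region. In CouchDB, emit is a
// global; here it is passed in so that the sketch is self-contained.
const mapByRegion = function (doc, emit) {
  if (doc.region) {
    emit(doc.region, 1);
  }
};

// Reduce: add up the values for one key. CouchDB passes keys as [key, docid]
// pairs and a rereduce flag; neither matters for a plain sum.
const sumReduce = function (keys, values, rereduce) {
  return values.reduce((total, value) => total + value, 0);
};

// Hypothetical in-memory equivalent of querying the view with group=true:
// run the map over every document, then reduce the values for each key.
function queryGrouped(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) {
        groups.set(key, { keys: [], values: [] });
      }
      groups.get(key).keys.push([key, doc._id]);
      groups.get(key).values.push(value);
    });
  }
  return [...groups.keys()].sort().map((key) => ({
    key: key,
    value: reduceFn(groups.get(key).keys, groups.get(key).values, false),
  }));
}

const rows = queryGrouped(
  [
    { _id: "University of St Andrews", region: "Scotland" },
    { _id: "Aberystwyth University", region: "Wales" },
    { _id: "University of Aberdeen", region: "Scotland" },
  ],
  mapByRegion,
  sumReduce
);
console.log(rows);
```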

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale, low-cost data storage using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will grow. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1

It contains two ‘show’ functions. The first, s1, returns some simple HTML. The second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that every double quote, for example, needs to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\">html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }</style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\">function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

// E4X (ECMAScript for XML) syntax, as supported by CouchDB's SpiderMonkey JavaScript engine
exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point; // added for GeoRSS
  return entry;
};

At the end of lists/index.js:

          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    }
    // close the loop after all rows are rendered
    return "</feed>";
  });

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

The following code is further down in templates/edit.html:

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name
      },

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, this is the View all, which simply returns all documents.

What is interesting about this code? It gathers a model of the data in the variable stash, then uses the Mustache.to_html function to render the ‘stash’ with the Mustache template file georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform data from CouchDB documents into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document that the template code transforms:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
<title>British Council Activity Mapping</title>
<id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
<link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
<generator>CouchApp on CouchDB</generator>
<updated>2010-09-12</updated>
{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
      {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
          {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" &gt;
          {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
      {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to CouchDB documents.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash

# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to the countries/png folder
# (beneath the current folder)

FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo $filepath
  # get the file name from the file path
  filename=$(echo $filepath | sed -e 's/countries\/png\///g')
  echo $filename
  docname=$(echo $filename | sed -e 's/\.png$//')
  echo $docname
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo $url
  # put the attachment into CouchDB
  # for example, this command creates the 'ie' record and puts
  # the png as an attachment under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
  curl -H "Content-Type: image/png" -X PUT $url --data-binary @$filepath
done
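The path manipulation that the script above does with sed can be expressed more directly in JavaScript. This is a sketch assuming Node.js. The attachmentUrl helper is hypothetical: it turns a file path such as countries/png/de.png into the document id de and the attachment URL database/de/flag, matching the layout used by the script.

```javascript
const path = require("path");

// Turn a flag file path such as countries/png/de.png into the URL of the
// "flag" attachment on document "de", mirroring the two sed commands above.
function attachmentUrl(dbUrl, filePath) {
  const docName = path.basename(filePath, ".png"); // "de.png" -> "de"
  return dbUrl.replace(/\/$/, "") + "/" + encodeURIComponent(docName) + "/flag";
}

console.log(attachmentUrl("http://127.0.0.1:5984/countries", "countries/png/de.png"));
```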

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang: An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. CouchDB implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat. http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open_Database_Connectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


SECTION 2 BACKGROUND 15

for adoption by web applications developers such as myself, who are used to developing solutions on current standard platforms.³

³ In my case, the bulk of my experience has been in C# and ASP.Net. I also have some experience working with PHP and Drupal.

Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed in terms of a JSON string [30].

3.1 'Of the Web'

The feature of combining a web server with B-Tree based data storage has been key to CouchDB's success as a project: all modern programming languages have libraries that implement communication over HTTP. As a result, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and database engine is that the web presentation layer (the 'web pages') can be stored in CouchDB itself, alongside the data. The HTML and JavaScript needed to render the data to a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-quoted remark about CouchDB by Jacob Kaplan-Moss, a co-creator of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

"Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform-, and OS¹-agnostic." [31]

Normally, a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in one single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an 'indexable schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available online [32]. In the talk, Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) string. JSON is a lightweight data-exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and in recent years it has established itself as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}

¹ Operating System

Consider the use of NULL values in relational databases. In a relational database table, we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, a missing postal code would simply be denoted by its absence from the document itself. This is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so, if we wish to, we may legitimately store the following document in the very same 'addressbook' database:

{
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}

(Note that \n denotes a new line character, so in this case the streetAddress spans three separate printed lines.)
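In client code, the practical difference is that an optional field is tested for presence rather than compared with NULL. The JavaScript sketch below formats both documents above, printing a state, postal code or country only when the document contains one. The helper name `formatAddress` is my own and purely hypothetical.

```javascript
// The two documents above (abridged); both can live in the 'addressbook' database.
const johnSmith = {
  firstName: "John",
  lastName: "Smith",
  address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" },
};
const bcZambia = {
  company: "British Council Zambia",
  address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" },
};

// Hypothetical helper: optional fields are checked with the `in` operator,
// because an absent field is simply not there (there is no NULL to test for).
function formatAddress(doc) {
  const name = doc.company ?? `${doc.firstName} ${doc.lastName}`;
  const a = doc.address;
  const lines = [name, ...a.streetAddress.split("\n"), a.city];
  for (const optional of ["state", "postalCode", "country"]) {
    if (optional in a) lines.push(a[optional]);
  }
  return lines.join("\n");
}

console.log(formatAddress(bcZambia));
```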

In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
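The principle Anderson describes, never modifying what is already written, can be illustrated with a much simplified append-only store. The JavaScript sketch below is not CouchDB's B-Tree, and the class `AppendOnlyStore` is hypothetical. It only shows why a reader that starts before a write keeps a consistent view: the write adds a new record and leaves the old one intact.

```javascript
// Hypothetical toy store illustrating the append-only principle:
// a write always adds a new record, and no existing record is modified.
class AppendOnlyStore {
  constructor() {
    this.log = []; // stands in for the database file on disk
  }

  write(id, value) {
    // Freeze each record so that nothing can change it after it is written.
    this.log.push(Object.freeze({ id, value }));
  }

  // A reader fixes the end of the log when it starts. Records appended
  // later are invisible to it, so its view of the data stays consistent.
  snapshot() {
    const end = this.log.length;
    return (id) => {
      for (let i = end - 1; i >= 0; i--) {
        if (this.log[i].id === id) return this.log[i].value;
      }
      return undefined;
    };
  }

  read(id) {
    return this.snapshot()(id); // newest value written so far
  }
}

const store = new AppendOnlyStore();
store.write("john-smith", { age: 25 });
const oldReader = store.snapshot();
store.write("john-smith", { age: 26 }); // hypothetical update
```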

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can also be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available at [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started, and Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

If you are reading this section, I strongly recommend that you sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.

4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers for Windows and Mac are also available at the same URL. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use 'cURL', a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same request with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)
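The same two requests can be issued from any language with an HTTP library. The JavaScript sketch below assumes Node.js 18 or later, where `fetch` is built in. The helper `couchRequest` is hypothetical.

One difference from cURL matters here: `fetch` refuses URLs that contain an embedded `username:password@`. The sketch therefore sends the credentials in an HTTP Basic `Authorization` header instead.

```javascript
// Hypothetical helper: returns the [url, init] arguments for fetch().
// fetch() rejects URLs with embedded credentials, so instead of
// http://username:password@host/... we send an HTTP Basic auth header.
function couchRequest(baseUrl, path, method, user, password) {
  const headers = { Accept: "application/json" };
  if (user !== undefined) {
    const token = Buffer.from(`${user}:${password}`).toString("base64");
    headers.Authorization = `Basic ${token}`;
  }
  return [new URL(path, baseUrl).toString(), { method, headers }];
}

// Usage (commented out so that the sketch runs without a server):
// const [url, init] = couchRequest("http://mick.couchone.com", "/addressbook",
//                                  "PUT", "username", "password");
// const res = await fetch(url, init); // "DELETE" removes the database again
```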

In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

curl -H "Content-Type: application/json" \
     -X POST http://username:password@mick.couchone.com/addressbook \
     -d @john-smith.json

cURL syntax is explained in [44]. Briefly:

• -H introduces an HTTP header
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request
• -d introduces the data to be sent
• @ denotes that what follows is a file name

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.
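For comparison, here is a JavaScript sketch of the same POST. The function name `createDocument` is hypothetical. `fetchImpl` defaults to the global `fetch` of Node.js 18+ and can be replaced by a stub for testing.

CouchDB answers a successful POST with a small JSON object that carries the new document's `id` and `rev`, and the sketch returns those two fields.

```javascript
// Hypothetical helper: POST a JavaScript object to a database URL and
// let CouchDB assign the document id.
async function createDocument(dbUrl, doc, fetchImpl = fetch) {
  const res = await fetchImpl(dbUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc), // serialised here instead of read from a file
  });
  const body = await res.json();
  if (!res.ok) {
    throw new Error(`HTTP ${res.status}: ${body.error} (${body.reason})`);
  }
  return { id: body.id, rev: body.rev };
}
```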

All data in CouchDB is stored in JSON format, together with a unique key and a revision number. In CouchDB, the resulting document would therefore have _id and _rev values as follows:

{
    "_id": "760cd53c55a93497067f90d6242fc25e",
    "_rev": "1-91bce055fc8db86480400321079f0834",
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}
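The _rev value has two parts: a number that increases by one with every update of the document, and a hash after the hyphen that CouchDB computes. The small JavaScript sketch below splits a revision into these parts. The helper `parseRev` is hypothetical.

```javascript
// Hypothetical helper: split a CouchDB revision string such as
// "1-91bce055fc8db86480400321079f0834" into its number and its hash.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  if (dash < 1) throw new Error(`not a revision string: ${rev}`);
  const generation = Number(rev.slice(0, dash));
  if (!Number.isInteger(generation)) throw new Error(`bad revision number in ${rev}`);
  return { generation, hash: rev.slice(dash + 1) };
}
```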

This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator at http://jsonlint.com.
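Alternatively, the output can be formatted locally. In JavaScript, `JSON.parse` followed by `JSON.stringify` with an indentation argument does much the same job as the validator, since parsing fails on malformed input. The sketch below uses an abridged copy of the document above.

```javascript
// Unformatted JSON, as returned by CouchDB (abridged from the document above).
const raw = '{"_id":"760cd53c55a93497067f90d6242fc25e","firstName":"John","lastName":"Smith","age":25}';

// JSON.parse validates (malformed input throws a SyntaxError);
// JSON.stringify with an indent of 2 pretty-prints the result.
const pretty = JSON.stringify(JSON.parse(raw), null, 2);
console.log(pretty);
```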

Figure 4.1: JSONLint

The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may want CouchDB to use meaningful document id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia

To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:

curl -H "Content-Type: application/json" \
     -X PUT \
     http://username:password@mick.couchone.com/addressbook/british-council-zambia \
     -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
    "_id": "british-council-zambia",
    "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.
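A meaningful id becomes part of a URL, so characters such as spaces must be percent-encoded. The universities example in Section 5 relies on this, with a space written as %20. The minimal JavaScript sketch below handles ordinary documents only. Design documents, whose ids contain a slash, are left out. The helper `docUrl` is hypothetical.

```javascript
// Hypothetical helper: URL of an ordinary (non-design) document.
// encodeURIComponent turns a space into %20 and would also escape '/' or '?'.
function docUrl(dbUrl, id) {
  return `${dbUrl.replace(/\/+$/, "")}/${encodeURIComponent(id)}`;
}
```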

Figure 4.3: Document viewed using Futon

4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.

As we are updating a 'named' document, we use the PUT verb instead of POST.

The following command will update John Smith's record with the contents of the file john-smith-v2.json, supplying that document's current revision:

curl -X PUT \
     "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
     -d @john-smith-v2.json
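Both rules, a current revision for a delete and for an update, can be modelled in a few lines. The JavaScript sketch below is an in-memory illustration only: the class `RevisionedDb` and its methods are hypothetical, and the revision hash is replaced by a placeholder. Like CouchDB, it refuses a write or delete whose rev is not the current one, with a 409 "Document update conflict." error. Unlike CouchDB, which keeps a small 'deleted' revision for a removed document, the toy simply forgets it.

```javascript
// Hypothetical toy model of CouchDB's revision check (optimistic concurrency).
function conflict() {
  return Object.assign(new Error("Document update conflict."), { status: 409 });
}

class RevisionedDb {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current.rev !== rev) throw conflict();
    const n = current ? Number(current.rev.split("-")[0]) + 1 : 1;
    const newRev = `${n}-x`; // real revisions end in a hash, not 'x'
    this.docs.set(id, { rev: newRev, body });
    return newRev;
  }

  delete(id, rev) {
    const current = this.docs.get(id);
    if (!current) throw Object.assign(new Error("missing"), { status: 404 });
    if (current.rev !== rev) throw conflict();
    this.docs.delete(id);
  }
}
```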

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons³ from http://www.famfamfam.com/lab/icons/flags.

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
     "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
     --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.

The uploaded image may be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag
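The attachment upload can also be expressed in JavaScript. The sketch below builds the URL with the attachment name after the document id and the current revision in the query string. It sends the raw bytes unchanged, as `--data-binary` does. The helper `attachmentRequest` is hypothetical, and the usage assumes a local zm.png and a running server.

```javascript
// Hypothetical helper: [url, init] for fetch() that PUTs raw bytes to
// <db>/<doc id>/<attachment name>?rev=<current revision>.
function attachmentRequest(dbUrl, docId, name, rev, bytes, contentType) {
  const url = new URL(`${dbUrl}/${encodeURIComponent(docId)}/${encodeURIComponent(name)}`);
  url.searchParams.set("rev", rev);
  return [url.toString(), { method: "PUT", headers: { "Content-Type": contentType }, body: bytes }];
}

// Usage (needs zm.png and a server):
// const [url, init] = attachmentRequest("http://127.0.0.1:5984/addressbook",
//   "british-council-zambia", "flag", "3-e6fbb83f5aa5c2e2111b5b14559115af",
//   require("fs").readFileSync("zm.png"), "image/png");
// await fetch(url, init);
```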

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in offline mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
     -X POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"http://username:password@mick.couchone.com/addressbook",
          "target":"http://127.0.0.1:5984/addressbook"}'

Note that if we use cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping them where necessary:

curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
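Building the request body with a JSON serialiser sidesteps the quoting differences between shells altogether. The JavaScript sketch below uses the same two fields, source and target, as the commands above. The helper name `replicationBody` is hypothetical.

```javascript
// Hypothetical helper: JSON body for POST /_replicate.
// JSON.stringify adds and escapes the quotes, so no shell quoting is involved.
function replicationBody(source, target) {
  return JSON.stringify({ source, target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
// await fetch("http://127.0.0.1:5984/_replicate",
//   { method: "POST", headers: { "Content-Type": "application/json" }, body });
```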

4.9 Querying a CouchDB Database using Map-Reduce

So far, we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google's 2004 paper [23], which describes how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages. The 'map' stage ranges over the document collection and emits zero or more key-value pairs for each document it encounters. The 'reduce' stage is used when an aggregate calculation over those intermediate values is required.
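The two stages can be imitated in a few lines of JavaScript. The runner `runView` below is hypothetical and leaves out CouchDB's B-Tree storage, re-reduce and grouping. In CouchDB, `emit` is a global; in this sketch it is passed to the map function as a second argument.

The map function emits rows, which are sorted by key. An optional reduce function then receives `[key, id]` pairs and the emitted values, as a CouchDB reduce function does. The example runs over the two address-book documents from Section 3.

```javascript
// Hypothetical in-memory imitation of a CouchDB view (no B-Tree, no
// re-reduce, no grouping). emit is passed in rather than being a global.
function runView(docs, map, reduce) {
  const rows = [];
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // Plain JavaScript ordering; CouchDB's collation rules are richer.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  if (!reduce) return rows;
  return reduce(rows.map((r) => [r.key, r.id]), rows.map((r) => r.value));
}

// The two address-book documents from Section 3 (fields abridged).
const docs = [
  { _id: "760cd53c55a93497067f90d6242fc25e", address: { city: "New York" } },
  { _id: "british-council-zambia", address: { city: "Lusaka" } },
];

// Map only: one row per document, keyed by city.
const byCity = runView(docs, (doc, emit) => emit(doc.address.city, 1));

// Map and reduce: count the documents.
const count = runView(
  docs,
  (doc, emit) => emit(doc.address.city, 1),
  (keys, values) => values.reduce((a, b) => a + b, 0)
);
```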

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
    "_id": "Aberavon",
    "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
    "mp": "Hywel Francis",
    "electorate": 51079,
    "con": 3062,
    "lab": 18077,
    "lib": 4138,
    "pc": 3546,
    "oth": 1278
}

CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document 'con' has been defined with two 'views' of the data.

The first view, 'Conservative Votes', simply ranges over the document collection in a map function.

The second view, 'Conservative Votes Total', combines the map function with a reduce function to output the sum of all Conservative votes (analogous to an aggregate query using SUM in SQL).

[http://localhost:5984/election-2005/_design/con]

{
    "_id": "_design/con",
    "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
    "language": "javascript",
    "views": {
        "Conservative Votes": {
            "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
            "map": "function(doc) {\n  emit(null, doc.con);\n}",
            "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
    }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}

It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]
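The two con views and the range query can be traced by hand. The JavaScript sketch below applies the same map and reduce logic to the Aberavon document plus two hypothetical constituencies, whose names and figures are invented for illustration only.

`emit` and `sum` are defined locally here, because CouchDB normally supplies them. The range filter keeps keys between `startkey` and `endkey` inclusive, which is CouchDB's default.

```javascript
// Aberavon is from the election-2005 database; 'Example A' and
// 'Example B' are hypothetical documents with invented figures.
const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Example A", con: 1500 },
  { _id: "Example B", con: 1999 },
];

// CouchDB's JavaScript view server supplies sum(); defined here for the sketch.
const sum = (values) => values.reduce((a, b) => a + b, 0);

// Run a map function over every document; emit is passed in as a parameter
// here (CouchDB provides it as a global). Rows come back sorted by key.
function mapRows(map) {
  const rows = [];
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// 'Conservative Votes': key = Conservative vote, value = document id.
const conservativeVotes = mapRows((doc, emit) => emit(doc.con, doc._id));

// 'Conservative Votes Total': emit under the key null, then reduce with sum().
const total = sum(mapRows((doc, emit) => emit(null, doc.con)).map((r) => r.value));

// ?startkey=1000&endkey=2000 keeps rows with keys in the range, both ends inclusive.
const inRange = conservativeVotes.filter((r) => r.key >= 1000 && r.key <= 2000);
```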

Figure 4.5: WHERE clause in CouchDB

This method of retrieving data is certainly cumbersome compared to using SQL. One criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever new documents are added to the collection, the index is updated incrementally rather than rebuilt; by default, CouchDB brings it up to date the next time the view is queried.

As a result, the entire system is optimised for fast indexed retrieval. A view query does not perform a 'table scan' at run time; every read is an indexed lookup.
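The incremental behaviour can be sketched as follows. In this JavaScript illustration, the class `IncrementalView` is hypothetical and a sorted array stands in for CouchDB's B-Tree. The view remembers how far through the log of saved documents it has indexed. A query re-runs the map function only for documents saved since the previous query, then answers the key range by binary search rather than by scanning every document.

```javascript
// Hypothetical toy view: an index that is updated incrementally at query
// time and answers key ranges by binary search. Numeric keys only.
class IncrementalView {
  constructor(map) {
    this.map = map;      // map(doc, emit)
    this.changes = [];   // append-only log of saved documents
    this.indexedSeq = 0; // how much of the log the index has processed
    this.rows = [];      // index rows { key, id }, kept sorted by key
  }

  save(doc) {
    this.changes.push(doc); // saving a document does not touch the index
  }

  refresh() {
    for (const doc of this.changes.slice(this.indexedSeq)) {
      // Drop the rows previously emitted for this document, then re-map it.
      this.rows = this.rows.filter((r) => r.id !== doc._id);
      this.map(doc, (key) => this.rows.push({ key, id: doc._id }));
    }
    this.indexedSeq = this.changes.length;
    this.rows.sort((a, b) => a.key - b.key);
  }

  // Position of the first row whose key is >= k.
  lowerBound(k) {
    let lo = 0;
    let hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.rows[mid].key < k) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  query(startkey, endkey) {
    this.refresh(); // bring the index up to date first
    const ids = [];
    for (let i = this.lowerBound(startkey); i < this.rows.length && this.rows[i].key <= endkey; i++) {
      ids.push(this.rows[i].id);
    }
    return ids;
  }
}
```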

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents. It stores HTML and JavaScript application code in so-called 'design documents'.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The 'universities' data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .Net application using C# code and SQL stored procedures to export the data from the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.

Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script as follows.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)

#!/bin/bash

# local instance (alternative)
#host=http://127.0.0.1:5984/
host=http://username:password@mick.couchone.com/
database=universities
folder=universities
fileextension=json
FILES=$folder/*

# create the database
curl -X PUT $host$database

for filepath in $FILES
do
    echo $filepath
    # get the file name from the file path
    filename=$(echo "$filepath" | sed -e "s|^$folder/||")
    echo $filename
    # strip the extension to get the document name
    docname=$(echo "$filename" | sed -e "s|\.$fileextension\$||")
    echo $docname
    url=$host$database/$docname
    echo $url
    # put the document into CouchDB
    echo curl -X PUT $url -d @$filepath
    curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we want CouchDB to assign a random ID to the document.)
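The two sed substitutions in the script only strip the folder prefix and the .json extension from each path. The same step can be sketched in JavaScript with Node's `fs` and `path` modules. The function `uploadPlan` is hypothetical, and the host is assumed to end in a slash, as in the script.

The file names used here already contain %20 rather than spaces (compare the single-document command below), so the resulting names go into the URL unchanged.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper mirroring the bash loop: for every .json file in
// `folder`, the document name is the file name without folder or extension.
function uploadPlan(host, database, folder) {
  return fs
    .readdirSync(folder)
    .filter((name) => path.extname(name) === ".json")
    .sort()
    .map((name) => ({
      file: path.join(folder, name),
      url: `${host}${database}/${path.basename(name, ".json")}`,
    }));
}
```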

To put a single document to the web, the command would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
     -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].

For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string which is rendered as output.

{
    "_id": "_design/d1",
    "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude ' + doc.latitude + '</p>' + '<p>longitude ' + doc.longitude + '</p>'; }"
    }
}

Note that shows is a reserved name in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered. req refers to the request object, which can be used, for example, to retrieve data passed by a query string in the URL.
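A show function may also return an object with body and headers fields instead of a plain string, which lets it set the content type. The variant below is a sketch. The name `s2` and the `escapeHtml` helper are hypothetical.

Each show function is stored as a separate string in the design document, so the simplest option is to define the helper inside the function. The sketch can be exercised outside CouchDB by calling it with a document; the coordinates used here are only illustrative.

```javascript
// Hypothetical variant of s1 that escapes values and sets Content-Type.
const s2 = function (doc, req) {
  // Defined inside the show function so that the stored string is self-contained.
  function escapeHtml(s) {
    return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
  return {
    headers: { "Content-Type": "text/html" },
    body:
      "<h1>" + escapeHtml(doc._id) + "</h1>" +
      "<p>latitude " + escapeHtml(doc.latitude) + "</p>" +
      "<p>longitude " + escapeHtml(doc.longitude) + "</p>",
  };
};

// Calling it directly, with illustrative coordinates:
const res = s2({ _id: "Aston University", latitude: 52.48, longitude: -1.89 }, {});
```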

Having saved the design document as d1json it can be uploaded toCouchDB in the usual way

curl -X PUT httpusernamepasswordmickcouchonecomuniversities

_designd1 -d d1json

To render a document (in this case the one for Aston University) usingthe s1 lsquoshowrsquo function we use the following URL httpmickcouchonecomuniversities_designd1_shows1Aston20University

Figure 53 Document rendered in HTML

This is a rather simple document but it serves as a proof-of-concept thatCouchDB as well as storing data can serve HTML files

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the (URL-encoded) id of the document to be rendered
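A small helper can capture the structure of this URL. The sketch below follows the path layout listed above: database, design document, _show, function name and document id. showUrl is a hypothetical name. Each variable path segment is URL-encoded, which is how the document id Aston University becomes Aston%20University.

```javascript
// Hypothetical helper: assemble the URL of a CouchDB 'show' function call
// from the parts described in the list above.
function showUrl(server, db, ddocName, showName, docId) {
  return [
    server,                       // hosted CouchDB instance
    encodeURIComponent(db),       // database
    "_design",                    // design documents live under _design/
    encodeURIComponent(ddocName), // design document name, e.g. d1
    "_show",                      // the 'show' handler
    encodeURIComponent(showName), // show function, e.g. s1
    encodeURIComponent(docId),    // document id, URL-encoded
  ].join("/");
}

const aston = showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University");
console.log(aston);
// http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```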


5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As the URL shows, the ‘show’ function that returns the HTML for the map is map1. The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned as a string from a JavaScript function, is very cumbersome. For example, every double quote has to be escaped so that the function, which is itself stored as a string in the design document, remains valid JSON.
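One way to avoid escaping quotes by hand is to write the function as ordinary JavaScript and let JSON.stringify produce the design document. The sketch below assumes Node.js is available on the desktop PC. Function.prototype.toString returns a function's source text, and JSON.stringify escapes every double quote in that text. The sketch only illustrates the idea. It is not how CouchApp itself works, since CouchApp is a set of Python scripts.

```javascript
// Sketch: generate a design document (d1.json) from ordinary JavaScript
// source, letting JSON.stringify do the escaping that is tedious by hand.
function s1(doc, req) {
  // Double quotes in the HTML need no escaping here.
  return '<h1 class="title">' + doc._id + "</h1>";
}

const designDoc = {
  _id: "_design/d1",
  shows: {
    // CouchDB stores functions as source strings inside the JSON document.
    s1: s1.toString(),
  },
};

// The result is valid JSON; every " inside the function appears as \".
const json = JSON.stringify(designDoc, null, 2);
console.log(json);
// To save it for upload with cURL (Node.js):
// require("fs").writeFileSync("d1.json", json);
```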

When authoring design documents manually, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon) than to upload documents from the desktop PC using cURL. Either way, however, the process is rather difficult and unintuitive.

Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions. Here we define a very simple ‘view’ function, which outputs the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
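To see what CouchDB does with v1, the map function can be run by hand. The sketch below is a hypothetical, much simplified stand-in for CouchDB's view engine. It evaluates the map source with an emit function in scope, collects one row per emit call, and sorts the rows by key. It uses plain JavaScript comparison, whereas CouchDB orders keys with its own Unicode-aware collation. It also builds no persistent index.

```javascript
// Simplified local stand-in for a CouchDB view (an illustration only).
// `mapSource` is the map function as it appears in the design document.
function runView(mapSource, docs) {
  const rows = [];
  let currentDoc = null;
  // emit(key, value) records a row for the document currently being mapped.
  const emit = (key, value) => rows.push({ id: currentDoc._id, key: key, value: value });
  // Evaluate the source with `emit` in scope.
  const map = new Function("emit", "return (" + mapSource + ");")(emit);
  for (const doc of docs) {
    currentDoc = doc;
    map(doc);
  }
  // View rows are ordered by key (plain comparison here, not CouchDB collation).
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, offset: 0, rows: rows };
}

const result = runView("function(doc) { emit(doc._id, doc._id); }", [
  { _id: "Aston University" },
  { _id: "Aberystwyth University" },
]);
console.log(JSON.stringify(result));
```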


The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document id values

A ‘view’ is, in effect, a query function that returns a set of results. To display the results as HTML, we can define ‘list’ functions in a CouchDB design document:

• v1 is a very simple View, which simply emits the document ids
• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) {
      var row;
      start({ 'headers': { 'Content-Type': 'text/html' } });
      while (row = getRow()) {
        send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value
          + '</a></p>');
      }
    }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

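Note the built-in functions start, getRow and send:

• start is called once, at the beginning of the function, to set the response headers (here, the Content-Type)
• getRow returns the next row of the view result, or null once all rows have been read, which ends the while loop
• send is called once for each row in the dataset, and sends a chunk of HTML to the client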
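The interplay of these three functions can be imitated outside CouchDB. In the hypothetical runList sketch below, start records the response headers, getRow returns the given view rows one at a time and then null, and send collects output chunks. The list source is a variant of l1 that uses a relative, URL-encoded link for brevity. This illustrates the protocol only. It is not CouchDB's actual list runtime.

```javascript
// Simplified local stand-in for CouchDB's list runtime (illustration only).
function runList(listSource, viewRows) {
  const chunks = [];
  let headers = {};
  let next = 0;
  const start = (response) => { headers = (response && response.headers) || {}; };
  const send = (chunk) => { chunks.push(String(chunk)); };
  const getRow = () => (next < viewRows.length ? viewRows[next++] : null);
  // Evaluate the list source with start, send and getRow in scope.
  const list = new Function("start", "send", "getRow",
    "return (" + listSource + ");")(start, send, getRow);
  // A list function may also return a final chunk of output.
  const tail = list({ total_rows: viewRows.length, offset: 0 }, {});
  if (typeof tail === "string") chunks.push(tail);
  return { headers: headers, body: chunks.join("") };
}

// A variant of l1 that links to each document id (relative link for brevity).
const l1 = `function(head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while (row = getRow()) {
    send('<p><a href="' + encodeURIComponent(row.value) + '">' + row.value + '</a></p>');
  }
}`;

const out = runList(l1, [{ id: "Aston University", key: "Aston University", value: "Aston University" }]);
console.log(out.body);
// <p><a href="Aston%20University">Aston University</a></p>
```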

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload them to a CouchDB server in the normal way. However, this approach becomes very cumbersome, and the result is difficult to maintain once the design document grows more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This approach also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

Working through this exercise in simple application development has given us a deeper understanding of the structure of a CouchDB design document. In the following section we look at CouchApp, a development framework for creating full-featured applications from CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp. Instead, I refer the reader to the instructions and tutorials at http://couchapp.org.

From an application developer’s perspective, the key benefit of CouchApp is that it creates a separate file for each element of the design document. A CouchDB design document is in fact a web application project, and CouchApp helps us to keep each element separate so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to 3 files in my copy of it in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides a feed of the posts (in Atom format). I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the feed output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by placing each area of the design document in a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure that matches the eventual structure of the design document we want to create:

• The views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The shows folder contains .js files, one for each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.


Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. I made the following additions to the default files:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 creates a ‘List’ function l1. For views, couchapp generate view v1 generates a folder named v1 containing two files, map.js and reduce.js.

Recall that in CouchDB a View is, in effect, a query against the database.

As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder, since it is not needed. This step is important. When a view has a reduce function, CouchDB applies it by default whenever the view is queried. An empty reduce function therefore means that the View returns no useful values in place of the rows emitted by the map function.

Figure 6.5: Remove the reduce.js file if it is not needed


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and then to send the resulting design document to the CouchDB server using the correct HTTP commands.
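A short sketch can picture the merging step. It assumes the folder layout described above: views/&lt;name&gt;/map.js, shows/&lt;name&gt;.js and lists/&lt;name&gt;.js. The hypothetical buildDesignDoc function below turns a map of file paths to file contents into a nested design document object. It skips files that contain only whitespace, which is one way to avoid the empty reduce function problem mentioned earlier. This is not CouchApp's actual code, which is written in Python and handles further cases such as attachments and templates.

```javascript
// Hypothetical sketch of how a folder of files maps onto a design document.
// `files` maps relative paths (as CouchApp lays them out) to file contents.
function buildDesignDoc(id, files) {
  const ddoc = { _id: "_design/" + id };
  for (const [path, content] of Object.entries(files)) {
    if (content.trim() === "") continue; // skip empty files such as an unused reduce.js
    // e.g. "views/v1/map.js" -> ddoc.views.v1.map, "shows/s1.js" -> ddoc.shows.s1
    const parts = path.replace(/\.js$/, "").split("/");
    let node = ddoc;
    for (const part of parts.slice(0, -1)) {
      node[part] = node[part] || {};
      node = node[part];
    }
    node[parts[parts.length - 1]] = content.trim();
  }
  return ddoc;
}

const ddoc = buildDesignDoc("bc", {
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "views/v1/reduce.js": "   ",
});
console.log(JSON.stringify(ddoc, null, 2));
```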

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

With this file in place, running couchapp push from within the bc folder merges the files and uploads the resulting design document. So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. Even this simple case demonstrates how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It can iterate through sets of values and output HTML or XML in exactly the form we want.

First, let us generate a simple View, all. It consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
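A view such as all can also be queried for a single key or a range of keys through URL parameters. In CouchDB the key, startkey and endkey parameters take JSON values. A string key must therefore be wrapped in double quotes before it is URL-encoded. The helper below, viewQueryString, is a hypothetical sketch of that rule. It passes other parameters such as limit and descending through unchanged.

```javascript
// Hypothetical helper: build a CouchDB view query string. Key parameters are
// JSON-encoded first (so strings gain double quotes), then URL-encoded.
function viewQueryString(params) {
  const jsonParams = new Set(["key", "startkey", "endkey"]);
  const parts = Object.entries(params).map(([name, value]) => {
    const text = jsonParams.has(name) ? JSON.stringify(value) : String(value);
    return encodeURIComponent(name) + "=" + encodeURIComponent(text);
  });
  return parts.length ? "?" + parts.join("&") : "";
}

const url = "http://mick.couchone.com/universities/_design/bc/_view/all" +
  viewQueryString({ key: "University of St Andrews" });
console.log(url);
// http://mick.couchone.com/universities/_design/bc/_view/all?key=%22University%20of%20St%20Andrews%22
```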

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss

This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
      {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
          {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" &gt;
          {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
      {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if a programme has a list of countries, emits the flags of those countries.

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
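To make the nested loops concrete, the sketch below produces the same kind of &lt;entry&gt; element with plain JavaScript in place of Mustache. It is a hypothetical illustration, not the project's code. The field names follow the stash built in georss.js, the flag URL pattern follows Appendix E, and the sample values come from the St Andrews document in Appendix D. The sketch writes the location as a GeoRSS Simple &lt;georss:point&gt; element (latitude, a space, then longitude). The HTML for the pop-up is escaped a second time so that it can sit inside &lt;content type="html"&gt;.

```javascript
// Hypothetical plain-JavaScript rendering of one GeoRSS entry, mirroring the
// nested programmes/countries loops of the Mustache template (illustration only).
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderEntry(institution, flagBase) {
  // One paragraph per programme, followed by a flag image per country (if any).
  const html = (institution.programmes || []).map((programme) => {
    const flags = (programme.countries || []).map((country) =>
      '<img src="' + flagBase + "/" + encodeURIComponent(country) + '/flag" alt="' +
      escapeXml(country) + '" title="' + escapeXml(country) + '" />').join("");
    return "<p>" + escapeXml(programme.name) + flags + "</p>";
  }).join("");
  // The HTML is escaped once more so it can be embedded in <content type="html">.
  return "<entry>" +
    "<id>" + escapeXml(institution._id) + "</id>" +
    "<title>" + escapeXml(institution._id) + "</title>" +
    '<content type="html">' + escapeXml(html) + "</content>" +
    "<georss:point>" + institution.latitude + " " + institution.longitude + "</georss:point>" +
    "</entry>";
}

const entry = renderEntry({
  _id: "University of St Andrews",
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [{ name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", countries: ["bd", "za"] }],
}, "http://mick.couchone.com/countries");
console.log(entry);
```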

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can replace the flat files currently used for the British Council Activity Map project: http://activitymap.britishcouncil.org

The key factors in achieving this outcome were CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first point to make is that the choice is not necessarily ‘either-or’. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe this trend is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when the data is being put together (using the internal ASP.NET/SQL Server application). CouchDB is more suitable for the publicly available web layer, since all we really need to do there is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: data and code can be replicated in many places, so that the user has high-speed access to them even without a good internet connection. Distributed data and application code are important for reliable computing.
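Replication is itself requested over HTTP. The sketch below uses an injected fetch-like function so that the request can be inspected. It posts a JSON body naming a source and a target to CouchDB's _replicate endpoint. The local server address and database names are assumptions for illustration, and the target database is assumed to exist already. Design documents are ordinary documents, so they are replicated along with the data.

```javascript
// Sketch: ask a local CouchDB server to pull a remote database (data and
// design documents alike). Addresses and names are illustrative assumptions.
function replicate(fetchFn, serverUrl, source, target) {
  return fetchFn(serverUrl + "/_replicate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target }),
  });
}

// Usage with a stand-in for fetch that records the request.
const requests = [];
replicate(
  (url, options) => { requests.push({ url, options }); return Promise.resolve({ ok: true }); },
  "http://127.0.0.1:5984",
  "http://mick.couchone.com/universities", // remote source database
  "universities"                           // local target database
);
console.log(requests[0].url, requests[0].options.body);
```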

Such replication cannot be achieved easily with conventional web applications. Even with the maturity of prevailing technology, replicating code and data together as a unit is difficult and expensive.

The lesson I have learned from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because Object-Relational Mapping (ORM) tools exist. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. With CouchDB, by comparison, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs, and CouchDB stores documents in JSON, a widely adopted standard. The use of HTTP as the communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB also helps with learning about the HTTP protocol.

Cost is a potential driver for development with CouchDB: it is designed for robustness and reliability on commodity-grade hardware. (While I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

Coming from a background in traditional applications development, I wanted to investigate the feasibility of creating a web application with this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful web formats such as GeoRSS and HTML.

Developing on the CouchDB platform revealed the following challenges:

• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• using server-side JavaScript functions to transform documents to HTML and XML

Tools such as CouchApp and the Mustache templating engine made the task easier. Without them, as seen in Section 5, transforming data is possible but cumbersome.


CouchDB is still a new product and, as such, there is little ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a map-reduce function that is defined in advance and whose results are stored in an index. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need. It is not necessarily more difficult than SQL, but the learning curve is significant.
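As an illustration of the difference, consider a question that SQL would answer with SELECT region, COUNT(*) FROM institutions GROUP BY region. In CouchDB it becomes a view whose map function emits one row per institution, keyed by region, and whose reduce function adds up the values. The sketch below runs both functions locally with a hypothetical mini-runner that imitates querying the view with group=true. The table name, the view and the sample documents are assumptions. A real reduce function must also handle the rereduce case, in which it receives earlier reduce results instead of mapped values. Summing handles both cases, and CouchDB's built-in _sum or _count reduce could be used instead.

```javascript
// Map and reduce functions as they might appear in a design document
// (a hypothetical view counting institutions per region).
function map(doc) {
  if (doc.region) {
    emit(doc.region, 1);
  }
}

function reduce(keys, values, rereduce) {
  // Summing works for mapped values (each 1) and for rereduced subtotals alike.
  return values.reduce((total, value) => total + value, 0);
}

// Hypothetical local runner imitating a view query with group=true.
let emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

function queryGrouped(docs) {
  emitted = [];
  docs.forEach((doc) => map(doc));
  const groups = new Map();
  for (const [key, value] of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  // One output row per distinct key, ordered by key.
  return [...groups.keys()].sort().map((key) => ({
    key: key,
    value: reduce(null, groups.get(key), false),
  }));
}

const rows = queryGrouped([
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "Aberystwyth University", region: "Wales" },
  { _id: "University of Aberdeen", region: "Scotland" },
]);
console.log(JSON.stringify(rows));
// [{"key":"Scotland","value":2},{"key":"Wales","value":1}]
```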

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of advances made by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale, low-cost data storage using CouchDB were outside the scope of this project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project focused on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will grow. I also suspect that developers will readily adopt non-relational, document-oriented approaches to data persistence for certain applications if doing so increases their productivity.

Based on my experiments, I believe this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think CouchDB’s future lies mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two ‘show’ functions. The first, s1, returns some simple HTML. The second, map1, returns the HTML needed to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development. All HTML is returned as a string from a server-side JavaScript function, which means that every double quote, for example, must be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return
      '<h1>' + doc._id + '</h1>' +
      '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html>
      <html><head><meta name=\"viewport\"
      content=\"initial-scale=1.0, user-scalable=no\" />
      <style type=\"text/css\"> html { height: 100% }
      body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }
      </style>
      <script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
      <script type=\"text/javascript\">
      function initialize() {
        var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
        var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP };
        var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions);
        var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" });
        var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
        google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); });
        marker.setMap(map);
      }
      </script></head>
      <body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div>
      </body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to 3 files from J. Chris Anderson’s Sofa CouchApp project to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code: http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    }
    // close the loop after all rows are rendered
    return "</feed>";
  });

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

// this is further down in templates/edit.html

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As the signature function(head, req) shows, this is a CouchDB List function. It takes as input a set of documents returned by a View. In the simplest case this is the View all, which simply returns all documents.

What is interesting about this code? It gathers a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from CouchDB documents into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} is executed once for each country listed in a particular programme.

Here is an example of a typical document transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": [
        "id", "mz", "tj"
      ]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": [
        "bd", "za"
      ]
    }
  ]
}

Listed below is the code for georss.html. Although it is an .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
        {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
            {{#countries}}
              &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" &gt;
            {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
        {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script that uploads .png files from a folder so that each image file becomes an attachment to a CouchDB document.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  # strip the .png extension to get the document name, e.g. "de"
  docname=$(echo "$filename" | sed -e 's/\.png$//')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # e.g. this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  # the @ prefix makes curl send the contents of the file, not its name
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang: An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.

[26] A. Fox and E. A. Brewer. Harvest, Yield, and Scalable Tolerant Systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes, MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] Stonebraker, M., Wong, E., Kreps, P. and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat. http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open_Database_Connectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


Section 3

Introduction to CouchDB

The non-relational database that I focused on for this MSc project is Apache CouchDB [10]. Written in Erlang, CouchDB combines a web (HTTP) server with a data storage mechanism. Data is stored in a series of B-Tree indexes as key-value pairs, where each value is expressed as a JSON string [30].

3.1 ‘Of the Web’

Combining a web server with B-Tree-based data storage has been key to CouchDB’s success as a project: because CouchDB speaks HTTP, and all modern programming languages have libraries that implement communication over HTTP, it is possible to write code that uses CouchDB in practically any language. A developer using CouchDB is thus able to dispense with the object-relational mapping layers and database connectivity libraries which are required to communicate with standard DBMSs [52].

The other advantage of combining a web server and a database engine is that the web presentation layer (the ‘web pages’) can be stored in CouchDB itself, alongside the data: the HTML and JavaScript needed to render the data as a web page can be stored in CouchDB design documents. It is this innovative storage of data and code in one place which I take advantage of in this project.

An often-quoted remark about CouchDB by Jacob Kaplan-Moss, one of the creators of the web application framework Django [3], emphasises the possibilities which emerge from combining web and database technologies in a single product:

“Let me tell you something: Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. [...] HTTP is the lingua franca of our age; if you speak HTTP, it opens up all sorts of doors. There’s something almost subversive about CouchDB; it’s completely language-, platform-, and OS¹-agnostic.” [31]

Normally, a web development project requires two separate servers: a database server and a web application server. CouchDB combines both of these in a single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an ‘indexable, schema-less database’ [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available at [32]. In the talk, Katz describes his decision to sell his house, move his family and live off his savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a ‘document-oriented’ database, where each ‘document’ is a JavaScript Object Notation (JSON) string. JSON is a lightweight data-exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and has established itself in recent years as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}

¹ Operating System

Consider the use of NULL values in relational databases: in a relational database table, we would usually represent addresses in countries which don’t use postal codes by putting a NULL value in the postalCode field. In CouchDB and similar systems, a postal code that does not exist is simply omitted from the document itself, which is much more in line with what we intuitively expect from real-world experience.
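The distinction can be checked in client code. The following JavaScript sketch (run under Node.js, not inside CouchDB) uses the in operator, which tells a field that is absent apart from one that is present but set to null. The helper name describePostalCode is hypothetical, and the two address objects simply mirror the documents in this section.

```javascript
// Address parts of two documents: one country uses postal codes, one does not.
const newYork = { city: "New York", state: "NY", postalCode: "10021" };
const lusaka = { city: "Lusaka", country: "Zambia" };

// Hypothetical helper. In a relational table both rows would have a postalCode
// column, with NULL meaning "not applicable"; in a document the key is just absent.
function describePostalCode(address) {
  if (!("postalCode" in address)) {
    return "no postal code field";
  }
  if (address.postalCode === null) {
    return "postal code explicitly null";
  }
  return "postal code " + address.postalCode;
}

console.log(describePostalCode(newYork)); // postal code 10021
console.log(describePostalCode(lusaka)); // no postal code field
```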

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same ‘addressbook’ database:

{
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}

(Note that \n denotes a newline character, so in this case the streetAddress spans three separate printed lines.)
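As a small illustration of both points, the sketch below parses the Zambia document in Node.js: JSON.parse turns each \n escape into a real newline, and a hypothetical displayName helper copes with documents of different shapes living in the same database. The helper is an assumption for illustration, not part of CouchDB.

```javascript
// The British Council Zambia document as JSON text. In this JavaScript string
// literal, "\\n" yields the two characters \ and n, i.e. the JSON newline escape.
const zambiaJson =
  '{"company": "British Council Zambia", "address": ' +
  '{"streetAddress": "Heroes Place\\nCairo Road\\nPO Box 34571", ' +
  '"city": "Lusaka", "country": "Zambia"}}';

const zambia = JSON.parse(zambiaJson);

// After parsing, each \n escape has become a real newline character.
const addressLines = zambia.address.streetAddress.split("\n");
console.log(addressLines); // [ 'Heroes Place', 'Cairo Road', 'PO Box 34571' ]

// Hypothetical helper: documents in one database need not share a schema,
// so client code checks which fields are present before using them.
function displayName(doc) {
  if ("company" in doc) return doc.company;
  return doc.firstName + " " + doc.lastName;
}

console.log(displayName(zambia)); // British Council Zambia
console.log(displayName({ firstName: "John", lastName: "Smith" })); // John Smith
```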

In Section 4, I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so if an Erlang process encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been reported to run for years in telecoms switches with practically zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

“The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu’s ‘Desktop Couch’ is that it can take advantage of CouchDB’s replication capabilities: Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. At the time of writing, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. Because CouchDB can store files as well as application settings, the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way, a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustrating this is available at [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed ‘Quickly’ [8], which gives beginner programmers a way to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started, and Quickly fills a similar role. Its innovative aspect is that it uses Desktop Couch for persistence, so that an ‘opportunistic’ developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB’s application programming interface (API).

A broader aim of this MSc project is to investigate how ready CouchDB is for adoption in the community of ‘corporate’ web application developers. I have worked for about 10 years as a Microsoft-platform application developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API documentation, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

If you are reading this section, I strongly recommend that you sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

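Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password: a CouchDB instance with no administrator account configured runs in so-called ‘admin party’ mode, in which every client has administrator rights.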


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers for Windows and Mac are also available at the same URL. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate ‘standard’ instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use ‘cURL’, a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called ‘addressbook’² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same request with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)
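Since CouchDB is driven purely over HTTP, the same two requests can be issued from any language. The following is a minimal sketch in Node.js (version 18 or later, for the built-in fetch), assuming the same placeholder host and credentials as the cURL examples; the function names are hypothetical. One difference from cURL: the Fetch standard does not accept credentials embedded in the URL, so the username and password are sent in an HTTP Basic Authorization header instead.

```javascript
// Placeholder host from the cURL examples above.
const HOST = "http://mick.couchone.com";

// Encode "username:password" for an HTTP Basic Authorization header.
function basicAuthHeader(username, password) {
  return "Basic " + Buffer.from(username + ":" + password).toString("base64");
}

// Equivalent of: curl -X PUT http://username:password@host/<name>
async function createDatabase(name, username, password) {
  const response = await fetch(HOST + "/" + encodeURIComponent(name), {
    method: "PUT",
    headers: { Authorization: basicAuthHeader(username, password) },
  });
  return response.json(); // {"ok":true} when the database is created
}

// Equivalent of: curl -X DELETE http://username:password@host/<name>
async function deleteDatabase(name, username, password) {
  const response = await fetch(HOST + "/" + encodeURIComponent(name), {
    method: "DELETE",
    headers: { Authorization: basicAuthHeader(username, password) },
  });
  return response.json();
}
```

Usage would look like createDatabase("addressbook", "username", "password").then(console.log), with the real credentials substituted.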

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.

curl -H "Content-Type: application/json" \
     -X POST http://username:password@mick.couchone.com/addressbook \
     -d @john-smith.json

The cURL syntax is explained in [44]. Briefly:

• -H adds an HTTP request header (here, Content-Type: application/json)

• -X POST instructs cURL to use an HTTP POST (instead of a GET) request

• -d introduces the data to be sent

• @ denotes that what follows is a file name

All data in CouchDB is stored in JSON format together with a unique key and a revision number, so in CouchDB the resulting document has _id and _rev values as follows:

{
    "_id": "760cd53c55a93497067f90d6242fc25e",
    "_rev": "1-91bce055fc8db86480400321079f0834",
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        { "type": "home", "number": "212 555-1234" },
        { "type": "fax", "number": "646 555-4567" }
    ]
}

This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator at http://jsonlint.com.
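As an alternative to pasting into a web page, the same validation and pretty-printing can be done locally. This Node.js sketch uses only the built-in JSON object: JSON.parse rejects malformed text with a SyntaxError, and JSON.stringify with an indent of two spaces formats it. The prettyPrint name is hypothetical, and the raw string is an abbreviated copy of the document above.

```javascript
// Unformatted JSON as CouchDB returns it (abbreviated from the document above).
const raw =
  '{"_id":"760cd53c55a93497067f90d6242fc25e",' +
  '"_rev":"1-91bce055fc8db86480400321079f0834",' +
  '"firstName":"John","lastName":"Smith","age":25}';

// Parse (which validates, throwing SyntaxError on bad input), then re-serialise
// with two-space indentation.
function prettyPrint(jsonText) {
  return JSON.stringify(JSON.parse(jsonText), null, 2);
}

console.log(prettyPrint(raw));
```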

Figure 4.1: JSONLint

The entire addressbook database can also be viewed using ‘Futon’, CouchDB’s management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document id values rather than a meaningless string of random characters.

For example, it would be useful to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia

To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to ‘put’ to, as follows:

curl -H "Content-Type: application/json" \
     -X PUT \
     http://username:password@mick.couchone.com/addressbook/british-council-zambia \
     -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
    "_id": "british-council-zambia",
    "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
    "company": "British Council Zambia",
    "address": {
        "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
        "city": "Lusaka",
        "country": "Zambia"
    }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon

4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a ‘query string’ is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE \
     "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a ‘Document update conflict’ error (HTTP status 409) if an update is attempted on an ‘old’ version of a document.

As we are updating a ‘named’ document, we use the PUT verb instead of POST.

The following command will update John Smith’s record with the contents of the file john-smith-v2.json:

curl -X PUT \
     "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
     -d @john-smith-v2.json
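The revision check used by both deletes and updates can be modelled in a few lines. The following is a toy, in-memory sketch in Node.js, not CouchDB’s actual implementation: revisions follow the visible ‘generation-hash’ shape, but the hash here is simply an MD5 of the document body as a stand-in, and a real CouchDB delete leaves a ‘tombstone’ revision rather than removing the document outright. The ToyDatabase class is hypothetical.

```javascript
const crypto = require("crypto");

class ToyDatabase {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  // Revision string "<generation>-<hash>"; generation increases on each write.
  nextRev(previousRev, body) {
    const generation = previousRev ? Number(previousRev.split("-")[0]) + 1 : 1;
    const hash = crypto.createHash("md5").update(JSON.stringify(body)).digest("hex");
    return generation + "-" + hash;
  }

  // Create, or update when the caller quotes the current revision.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current.rev !== rev) {
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    const newRev = this.nextRev(current ? current.rev : null, body);
    this.docs.set(id, { rev: newRev, body });
    return { status: 201, ok: true, id, rev: newRev };
  }

  // Delete also requires the current revision.
  remove(id, rev) {
    const current = this.docs.get(id);
    if (!current) return { status: 404, error: "not_found" };
    if (current.rev !== rev) {
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    this.docs.delete(id);
    return { status: 200, ok: true, id };
  }
}

const db = new ToyDatabase();
const first = db.put("john-smith", { firstName: "John" });
const second = db.put("john-smith", { firstName: "John", age: 26 }, first.rev);
const stale = db.put("john-smith", { firstName: "Johnny" }, first.rev); // old revision
console.log(first.rev, second.rev, stale.status);
```

The stale write fails because it quotes the first revision after the second has already been written, which is the situation the ‘Document update conflict’ error guards against.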

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons³ from http://www.famfamfam.com/lab/icons/flags.

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
     "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
     --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.

The uploaded image may be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag.

4.8 Replication

A particularly useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want a replica of the addressbook on our local machine, so that we can work on it in offline mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
     -X POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"http://username:password@mick.couchone.com/addressbook",
          "target":"http://127.0.0.1:5984/addressbook"}'

Note that when using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping the inner quotes where necessary:

curl -H "Content-Type: application/json" ^
     -X POST http://127.0.0.1:5984/_replicate ^
     -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
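The quoting differences between bash and the Windows command prompt disappear when the request body is built by a program rather than typed by hand. This Node.js sketch (version 18 or later for fetch) lets JSON.stringify produce the _replicate body; the source and target are the placeholder URLs used above, and the replicationBody and replicate names are hypothetical.

```javascript
// JSON.stringify handles all quoting and escaping of the request body.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

// Equivalent of the cURL POST to the local _replicate endpoint.
async function replicate(localHost, source, target) {
  const response = await fetch(localHost + "/_replicate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: replicationBody(source, target),
  });
  return response.json();
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
```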

4.9 Querying a CouchDB Database using Map-Reduce

So far, we have seen how data is entered into CouchDB and a little of how replication works. Now we turn to how to query the database.

Apart from fetching a single document by its id, data is retrieved from CouchDB using map-reduce functions, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the ‘map’ stage ranges over the document collection and emits zero or more key-value pairs for each document it encounters; the optional ‘reduce’ stage is used when an aggregate calculation over the intermediate values (such as a sum or a count) is required.
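The two stages can be imitated in a few lines of JavaScript. The sketch below is a simplification under stated assumptions: runView is a hypothetical helper, the map function receives emit as a second argument (in CouchDB, emit is provided by the query server instead), and keys are compared with plain less-than and greater-than rather than CouchDB’s full collation rules. The documents are made up for illustration.

```javascript
// Run a map function over every document, collect emitted rows, sort them by key,
// then optionally reduce the values, as a CouchDB view does.
function runView(docs, map, reduce) {
  const rows = [];
  for (const doc of docs) {
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    map(doc, emit);
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  if (!reduce) return rows;
  // CouchDB passes reduce an array of [key, docid] pairs and an array of values.
  const keys = rows.map((row) => [row.key, row.id]);
  const values = rows.map((row) => row.value);
  return reduce(keys, values, false);
}

// Hypothetical documents, for illustration only.
const docs = [
  { _id: "a", party: "lab", votes: 10 },
  { _id: "b", party: "con", votes: 5 },
  { _id: "c", party: "con", votes: 7 },
];

// Map only: one row per Conservative document, keyed (and so sorted) by votes.
const conRows = runView(docs, (doc, emit) => {
  if (doc.party === "con") emit(doc.votes, doc._id);
});

// Map plus reduce: a single total over all emitted values.
const total = runView(
  docs,
  (doc, emit) => emit(null, doc.votes),
  (keys, values) => values.reduce((a, b) => a + b, 0)
);

console.log(conRows.map((row) => row.id), total);
```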

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
    "_id": "Aberavon",
    "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
    "mp": "Hywel Francis",
    "electorate": 51079,
    "con": 3062,
    "lab": 18077,
    "lib": 4138,
    "pc": 3546,
    "oth": 1278
}

CouchDB can hold two types of document. The most common type of document is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.

The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function, emitting each constituency’s Conservative vote as the key and the constituency id as the value.

The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes (analogous to SELECT SUM(con) in SQL).

[http://localhost:5984/election-2005/_design/con]

{
    "_id": "_design/con",
    "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
    "language": "javascript",
    "views": {
        "Conservative Votes": {
            "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
        },
        "Conservative Votes Total": {
            "map": "function(doc) {\n  emit(null, doc.con);\n}",
            "reduce": "function(keys, values) {\n  return sum(values);\n}"
        }
    }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}

It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000 (both startkey and endkey are inclusive by default):

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB

This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every map-reduce function written against a document collection in CouchDB is backed by its own B-Tree index. From that point on, whenever documents are added or changed, the index is updated incrementally, processing only the documents that have changed since the last update; by default, CouchDB performs this update when the view is next queried.

As a result, the entire system is optimised for fast indexed retrieval. Apart from temporary views, which are intended for development, it is not possible in CouchDB to write a query which performs a ‘table scan’ at run time: every read is an indexed lookup.
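Why indexed reads stay cheap can be sketched with a sorted structure. In this Node.js sketch a sorted array stands in for CouchDB’s B-Tree (the real index structure differs, and array insertion is linear rather than logarithmic): new rows are inserted in key order without re-sorting everything, and a startkey/endkey query is two binary searches plus a slice. The SortedIndex class is hypothetical; apart from Aberavon’s Conservative vote of 3062, the rows are made up for illustration.

```javascript
class SortedIndex {
  constructor() {
    this.rows = []; // [{ key, id }] kept in ascending key order
  }

  // First position whose key is >= key.
  lowerBound(key) {
    let lo = 0, hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.rows[mid].key < key) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  // First position whose key is > key.
  upperBound(key) {
    let lo = 0, hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.rows[mid].key <= key) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  // Incremental update: place one new row in order, no full re-sort.
  insert(key, id) {
    this.rows.splice(this.upperBound(key), 0, { key, id });
  }

  // Inclusive range query, like ?startkey=...&endkey=...
  range(startkey, endkey) {
    return this.rows.slice(this.lowerBound(startkey), this.upperBound(endkey));
  }
}

const index = new SortedIndex();
index.insert(3062, "Aberavon"); // from the Aberavon document above
index.insert(1500, "Constituency A"); // hypothetical
index.insert(1999, "Constituency B"); // hypothetical
index.insert(2400, "Constituency C"); // hypothetical
console.log(index.range(1000, 2000).map((row) => row.id));
```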

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB, namely CouchApp, which I explore in Section 6; for now, I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents, and it stores HTML and JavaScript application code in so-called ‘design documents’: documents which hold application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’ to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files, which I then transferred to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the bash script below.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (On Windows, a similar effect could be achieved using a batch file script.)

#!/bin/bash

# local instance (uncomment to use instead of the hosted one)
#host=http://127.0.0.1:5984/
host=http://username:password@mick.couchone.com/
database=universities
folder=universities
fileextension=.json

# create the database
curl -X PUT "$host$database"

for filepath in "$folder"/*"$fileextension"
do
    echo "$filepath"
    # get the file name from the file path
    filename=$(basename "$filepath")
    echo "$filename"
    # strip the extension to get the document id
    docname="${filename%$fileextension}"
    echo "$docname"
    # URL-encode spaces, e.g. "Aston University" -> "Aston%20University"
    url="$host$database/${docname// /%20}"
    echo "$url"
    # put the document into CouchDB
    echo curl -X PUT "$url" -d @"$filepath"
    curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to ‘put’ each document at a meaningful URL (POST is used when we wish CouchDB to assign a random ID to the document).

To put a single document to the web, the command would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
     -d @"universities/Aberystwyth University.json"

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities.

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].

For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string, which is rendered as output.

{
    "_id": "_design/d1",
    "shows": {
        "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
    }
}

Note that shows is a reserved field name in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req: doc refers to the document being rendered, and req refers to the request object, which can be used, for example, to retrieve data passed in the query string of the URL (available as req.query).
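Because a show function is plain JavaScript, it can be exercised outside CouchDB by passing stand-in doc and req objects. The sketch below does this in Node.js for s1, together with a hypothetical variant s2 that reads an equally hypothetical units parameter from req.query. The coordinates are illustrative values, not taken from the project’s data.

```javascript
// The s1 'show' function from the design document, as an ordinary function.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical variant using the query string, e.g. .../_show/s2/<id>?units=deg
function s2(doc, req) {
  const units = (req.query && req.query.units) || "";
  return '<p>' + doc._id + ': ' + doc.latitude + units + ', ' + doc.longitude + units + '</p>';
}

// Stand-in document and request objects (illustrative values).
const doc = { _id: "Aston University", latitude: 52.48, longitude: -1.89 };
console.log(s1(doc, { query: {} }));
console.log(s2(doc, { query: { units: "deg" } }));
```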

Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 \
     -d @d1.json

To render a document (in this case, the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB

• universities is the database

• _design/d1 is the design document

• _show/s1 is the ‘show’ function that returns an HTML string

• Aston%20University is the (URL-encoded) id of the document to be rendered

5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1. The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html.

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be backslash-escaped, because the function itself is stored as a string inside the JSON design document.
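One way to avoid escaping quotes by hand is to write the function as ordinary JavaScript and let a JSON serialiser produce the design document. The sketch below assumes a JavaScript engine such as Node.js is available on the desktop; the map1 shown is a cut-down, hypothetical stand-in for the real function in Appendix A. Function.prototype.toString returns the function's source text, and JSON.stringify adds the backslash escapes. CouchApp, introduced in Section 6, tackles the same problem by assembling the design document from separate files.

```javascript
// A cut-down stand-in for map1: its HTML contains double quotes.
function map1(doc, req) {
  return '<div id="map_canvas" title="' + doc._id + '"></div>';
}

var designDoc = {
  _id: '_design/d1',
  shows: {
    // toString() yields the function's source, which CouchDB stores as a string
    map1: map1.toString()
  }
};

// The text that could be saved as d1.json and uploaded with cURL;
// JSON.stringify escapes each " inside the function source as \".
var d1Json = JSON.stringify(designDoc, null, 2);
```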

When authoring design documents by hand, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.

Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions. Here we define a very simple ‘view’ function, which simply outputs the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}
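To see what v1 produces without a server, the stored map function can be run by a small stand-in for CouchDB's view engine. This is a simplified, hypothetical sketch rather than how CouchDB is implemented: it compiles the stored source with an emit function in scope, records one row per emit call, and sorts rows by key using plain string comparison (CouchDB itself uses Unicode collation). The two documents are minimal examples.

```javascript
// The map function exactly as it is stored (as a string) in _design/d1.
var v1Source = 'function(doc) { emit(doc._id, doc._id); }';

// Hypothetical stand-in for the view server: compile the stored source with
// emit() in scope, run it over each document and order the rows by key.
function runView(mapSource, docs) {
  var rows = [];
  var currentId = null;
  function emit(key, value) {
    rows.push({ id: currentId, key: key, value: value });
  }
  var mapFn = new Function('emit', 'return ' + mapSource)(emit);
  docs.forEach(function (doc) {
    currentId = doc._id;
    mapFn(doc);
  });
  // Simplification: plain string comparison instead of Unicode collation.
  return rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
}

var rows = runView(v1Source, [
  { _id: 'University of Aberdeen' },
  { _id: 'Aston University' }
]);
```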

The output from the ‘view’ function is available at http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document id values

A ‘view’ is in effect a query function which returns a set of results. In order to render the results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

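Note the built-in functions start, getRow and send:

• start is called once, at the beginning of the function, to set the response headers
• getRow returns the next row of the view's result set, so the while loop runs once for each row
• send is called inside that loop, once for each row in the dataset, to output a chunk of HTML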
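The interplay between start, getRow and send can be made concrete with a small simulation. In the sketch below the three functions are supplied by a hypothetical runList helper rather than by CouchDB, and the l1 shown links to the s1 show function from earlier in this section (a simplification of the original, which links to a googlemap show function in another design document). As in CouchDB, a string returned by the list function is sent as the final chunk of output.

```javascript
// l1 as ordinary JavaScript. In the design document it is stored as a string,
// and start, getRow and send are provided by CouchDB at run time.
var l1 = function (head, req) {
  var row;
  start({ headers: { 'Content-Type': 'text/html' } });
  while ((row = getRow())) {
    send('<p><a href="/universities/_design/d1/_show/s1/' +
      encodeURIComponent(row.value) + '">' + row.value + '</a></p>');
  }
};

// Hypothetical stand-in for the list runtime.
function runList(listFn, viewRows) {
  var headers = null;
  var chunks = [];
  var next = 0;
  function start(response) { headers = response.headers; }
  function getRow() { return next < viewRows.length ? viewRows[next++] : null; }
  function send(chunk) { chunks.push(chunk); }
  // Re-compile the function's source with start/getRow/send in scope,
  // much as CouchDB evaluates the string stored in the design document.
  var compiled = new Function('start', 'getRow', 'send',
    'return ' + listFn.toString())(start, getRow, send);
  var tail = compiled({}, { query: {} });
  if (typeof tail === 'string') chunks.push(tail); // returned text is the last chunk
  return { headers: headers, body: chunks.join('') };
}

var result = runList(l1, [
  { id: 'Aston University', key: 'Aston University', value: 'Aston University' }
]);
```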

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
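In the GeoRSS Simple encoding, a point is written as a latitude and a longitude separated by a space, inside a georss:point element in the GeoRSS namespace. The sketch below builds a minimal Atom feed containing one such entry. The function names are hypothetical, and the output deliberately omits elements such as updated and author that a fully valid Atom feed also requires.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// One Atom <entry> with a GeoRSS Simple point: "latitude longitude".
function geoEntry(id, title, lat, lon) {
  return '<entry>' +
    '<id>' + escapeXml(id) + '</id>' +
    '<title>' + escapeXml(title) + '</title>' +
    '<georss:point>' + Number(lat) + ' ' + Number(lon) + '</georss:point>' +
    '</entry>';
}

// Wrap entries in a feed that declares both the Atom and GeoRSS namespaces.
function geoFeed(title, entries) {
  return '<feed xmlns="http://www.w3.org/2005/Atom" ' +
    'xmlns:georss="http://www.georss.org/georss">' +
    '<title>' + escapeXml(title) + '</title>' +
    entries.join('') +
    '</feed>';
}

var feed = geoFeed('Example feed', [
  geoEntry('urn:example:st-andrews', 'University of St Andrews', 56.34, -2.79)
]);
```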

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer’s perspective is that it creates a separate file for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project; CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available at http://github.com/jchris/sofa.

I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available at http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is at http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available at http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3

Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create.

• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.

Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1

couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js, a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function, l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js. We recall that in CouchDB a View is in effect a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, then the View will return no values.)
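The practical difference between a map-only view and one with a reduce function can be sketched with a small simulation. When a view has a reduce function, CouchDB by default returns the reduced result rather than the individual rows (the query parameter reduce=false asks for the rows instead). The sketch below is a simplified, hypothetical stand-in for that behaviour; countReduce mimics what CouchDB's built-in _count reduce computes. It does not attempt to reproduce the empty reduce.js case described above.

```javascript
// A counting reduce function with CouchDB's (keys, values, rereduce) signature.
// On a rereduce pass the values are earlier partial counts, so they are summed.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
  return values.length;
}

// Simplified query: map-only views return their rows; views with a reduce
// function return a single reduced row (key null) unless reduce=false is given.
function queryView(mapRows, reduceFn, options) {
  var wantReduce = reduceFn && !(options && options.reduce === false);
  if (!wantReduce) return { rows: mapRows };
  var keys = mapRows.map(function (r) { return [r.key, r.id]; });
  var values = mapRows.map(function (r) { return r.value; });
  return { rows: [{ key: null, value: reduceFn(keys, values, false) }] };
}

var mapRows = [
  { id: 'Aston University', key: 'Aston University', value: 'Aston University' },
  { id: 'University of Aberdeen', key: 'University of Aberdeen', value: 'University of Aberdeen' }
];
```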

Figure 6.5: Remove the reduce.js file if it is not needed

6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.
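The merge step can be pictured as turning the folder layout into nested JSON: each folder becomes an object and each file becomes a string field named after the file. The sketch below is a simplified, hypothetical illustration of that idea, not CouchApp's actual Python implementation, which also handles attachments, templates, macros and other details. Note that a view folder without reduce.js simply yields a view object with no reduce field.

```javascript
// Hypothetical sketch of the merge step: nest file contents by their paths.
function buildDesignDoc(id, files) {
  var ddoc = { _id: id };
  Object.keys(files).forEach(function (path) {
    // 'views/v1/map.js' -> ['views', 'v1', 'map'] (last extension removed)
    var parts = path.replace(/\.[^.\/]+$/, '').split('/');
    var node = ddoc;
    for (var i = 0; i < parts.length - 1; i++) {
      if (!node[parts[i]]) node[parts[i]] = {};
      node = node[parts[i]];
    }
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

// A bc folder holding s1, l1 and v1 (reduce.js already removed).
var ddoc = buildDesignDoc('_design/bc', {
  'shows/s1.js': 'function(doc, req) { return doc._id; }',
  'lists/l1.js': 'function(head, req) { }',
  'views/v1/map.js': 'function(doc) { emit(doc._id, doc._id); }'
});

// Serialised, this is the JSON that would be PUT to /universities/_design/bc.
var ddocJson = JSON.stringify(ddoc);
```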

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

With this file in place, running couchapp push from the bc folder assembles the design document and uploads it to the default database.

So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.
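To show what iterating through sets of values means in practice, the sketch below implements a deliberately tiny subset of Mustache syntax: {{name}} substitution and {{#section}}...{{/section}} blocks, which repeat once per item for arrays and render once, or not at all, for true or false values. It is a hypothetical illustration, not mustache.js: it performs no HTML escaping (mustache.js escapes {{name}} values by default) and supports no partials, inverted sections or nested sections sharing a name.

```javascript
// Tiny Mustache-style renderer (illustration only, not mustache.js).
function render(template, context) {
  // Expand sections first; the backreference \1 pairs {{#x}} with its {{/x}}.
  var sectionRe = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  var out = template.replace(sectionRe, function (match, name, inner) {
    var value = context[name];
    if (Array.isArray(value)) {
      // Repeat the inner template once per item, with the item's fields in scope.
      return value.map(function (item) {
        return render(inner, Object.assign({}, context, item));
      }).join('');
    }
    return value ? render(inner, context) : '';
  });
  // Then substitute simple variables; missing values render as ''.
  return out.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    var value = context[name];
    return value === undefined || value === null ? '' : String(value);
  });
}

var template = '{{#institutions}}<entry><id>{{id}}</id>' +
  '{{#has_programmes}}{{#programmes}}<p>{{name}}</p>{{/programmes}}{{/has_programmes}}' +
  '</entry>{{/institutions}}';

var output = render(template, {
  institutions: [
    { id: 'A', has_programmes: true, programmes: [{ name: 'P1' }, { name: 'P2' }] },
    { id: 'B', has_programmes: false, programmes: [] }
  ]
});
```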

Firstly, let us generate a simple View, all. This consists of a map.js file, as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss

This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available at http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed at http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all

Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and of a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to them even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

The lesson I have learned from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, with CouchDB relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
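The requests behind the operations discussed above can be written down as plain HTTP method and path pairs. The sketch below expresses them as {method, path, body} objects so they can be inspected without a server; the couch object and its method names are hypothetical, while the paths and verbs follow CouchDB's HTTP API (PUT /db creates a database, PUT /db/docid stores a document, GET reads it, DELETE needs the current ?rev=, and POST /_replicate starts a replication). To update an existing document, the body must also carry its current _rev.

```javascript
// Hypothetical helper: describe CouchDB requests without sending them.
var couch = {
  createDb: function (db) {
    return { method: 'PUT', path: '/' + encodeURIComponent(db) };
  },
  saveDoc: function (db, doc) {
    return {
      method: 'PUT',
      path: '/' + encodeURIComponent(db) + '/' + encodeURIComponent(doc._id),
      body: JSON.stringify(doc)
    };
  },
  getDoc: function (db, id) {
    return { method: 'GET', path: '/' + encodeURIComponent(db) + '/' + encodeURIComponent(id) };
  },
  deleteDoc: function (db, id, rev) {
    return {
      method: 'DELETE',
      path: '/' + encodeURIComponent(db) + '/' + encodeURIComponent(id) +
        '?rev=' + encodeURIComponent(rev)
    };
  },
  // Replication is itself just another HTTP request.
  replicate: function (source, target) {
    return {
      method: 'POST',
      path: '/_replicate',
      body: JSON.stringify({ source: source, target: target })
    };
  }
};
```

For example, couch.createDb('countries') describes the same request as curl -X PUT http://127.0.0.1:5984/countries, the kind of command used in Appendix E.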

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.

CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a predefined map-reduce function, stored in a design document, whose results are kept in an index. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
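As an illustration of that difference, consider counting institutions per region. In SQL this is a single GROUP BY query; in CouchDB it becomes a map function that emits one row per document keyed by region, plus a reduce function that sums the values, queried with group=true. The runner below is a simplified, hypothetical stand-in for CouchDB, the SQL table name institutions is hypothetical, and in practice CouchDB's built-in _sum reduce would be used instead of writing sumReduce by hand. The region field follows the documents shown in Appendix D.

```javascript
// SQL equivalent (table name hypothetical):
//   SELECT region, COUNT(*) FROM institutions GROUP BY region;

// Map: one row per institution, keyed by region, with value 1.
var mapSource = 'function(doc) { if (doc.region) { emit(doc.region, 1); } }';

// Reduce: add up the values (the built-in _sum does the same).
// Summing is also correct on a rereduce pass, where values are partial sums.
function sumReduce(keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical stand-in for querying the view with group=true:
// one reduced value per distinct key, returned in key order.
// Simplifications: keys are assumed to be strings, and reduce receives
// null instead of the [key, docid] pairs CouchDB would pass.
function queryGrouped(mapSrc, reduceFn, docs) {
  var rows = [];
  var mapFn = new Function('emit', 'return ' + mapSrc)(function (key, value) {
    rows.push({ key: key, value: value });
  });
  docs.forEach(function (doc) { mapFn(doc); });
  var groups = {};
  rows.forEach(function (row) {
    if (!groups[row.key]) groups[row.key] = [];
    groups[row.key].push(row.value);
  });
  return Object.keys(groups).sort().map(function (key) {
    return { key: key, value: reduceFn(null, groups[key], false) };
  });
}

// Example data only.
var byRegion = queryGrouped(mapSource, sumReduce, [
  { _id: 'University of St Andrews', region: 'Scotland' },
  { _id: 'University of Aberdeen', region: 'Scotland' },
  { _id: 'Aston University', region: 'West Midlands' }
]);
```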

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and of CERN’s Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available at http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code, http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js (this code uses E4X, the XML extension to JavaScript supported by the SpiderMonkey engine that CouchDB used at the time):

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
                xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/<\/feed>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : docid,
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, this is the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to render the ‘stash’ with a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
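To see what the template actually receives, the per-row mapping inside List.withRows can be pulled out into a plain function and applied to a single document, here adapted from the University of St Andrews example in Appendix D. The sketch below keeps only the fields that georss.html uses; the name toInstitution is hypothetical. One deliberate deviation: it treats an empty programmes array as "no programmes", whereas in georss.js an empty array still counts as true.

```javascript
// The per-row mapping from georss.js as a plain function (hypothetical name).
// Only the fields used by georss.html are kept.
function toInstitution(doc) {
  var programmes = doc.programmes || [];
  return {
    id: doc._id,
    latitude: doc.latitude,
    longitude: doc.longitude,
    // Deviation from georss.js: an empty array counts as "no programmes".
    has_programmes: programmes.length > 0,
    programmes: programmes.map(function (programme) {
      var countries = programme.countries || [];
      return {
        name: programme.name,
        url: programme.url,
        has_countries: countries.length > 0,
        // Wrap each code so {{country}} resolves inside {{#countries}}.
        countries: countries.map(function (code) { return { country: code }; })
      };
    })
  };
}

// Applied to a document adapted from the Appendix D example.
var stAndrews = toInstitution({
  _id: 'University of St Andrews',
  latitude: 56.3412139443169,
  longitude: -2.79301175308608,
  programmes: [
    { name: 'Chevening Programme', url: 'http://www.chevening.com', countries: ['id', 'mz', 'tj'] },
    { name: 'Commonwealth Scholarship and Fellowship Plan (CSFP)', url: 'http://www.csfp-online.org', countries: ['bd', 'za'] }
  ]
});
```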

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from CouchDB documents into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an .html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" /&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload .png files from a folder, so that the image files become attachments to CouchDB documents, one document per country.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to the countries/png folder
# (beneath the current folder)
FILES=countries/png/*.png

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # for example, this command creates the 'ie' record and puts
  # the png as an attachment under countries/ie/flag:
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done
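The path handling in the loop can be checked in isolation. The sketch below restates in JavaScript, the language of the other examples in this report, what the two sed commands and the url variable compute: the folder prefix and the .png extension are stripped to give the document id, and the image is stored as an attachment named flag on that document. The name flagUpload is hypothetical, and anchoring the patterns with ^ and $ is a small deviation from the script, whose unanchored substitutions would also match elsewhere in a name.

```javascript
// Hypothetical restatement of the script's per-file logic.
function flagUpload(filepath, baseUrl) {
  var filename = filepath.replace(/^countries\/png\//, ''); // drop the folder prefix
  var docname = filename.replace(/\.png$/, '');            // drop the extension
  return {
    method: 'PUT',
    url: baseUrl + '/countries/' + encodeURIComponent(docname) + '/flag',
    headers: { 'Content-Type': 'image/png' },
    docname: docname
  };
}

var upload = flagUpload('countries/png/ie.png', 'http://127.0.0.1:5984');
```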

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.

61

BIBLIOGRAPHY 62

[15] E F Codd A Relational Model of Data for Large Shared Data BanksIn Communications of the ACM 1970

[16] Brian F Cooper Raghu Ramakrishnan and Utkarsh Srivastava CloudStorage Design in a PNUTShell volume Beautiful Data The StoriesBehind Elegant Data Solutions chapter 4 OrsquoReilly Media 2009

[17] CouchOne BBC A Case Study httpwwwcouchonecom

case-study-bbc

[18] CouchOne CERN A Case Study httpwwwcouchonecom

case-study-cern

[19] CouchOne Why A Mobile Database httpwwwcouchonecom

pagewhy-mobile

[20] British Council Annual Report 2009-10 httpwww

britishcouncilorgnewGlobalBC20Annual20Report

202009-10_reuploadpdf

[21] British Council Erasmus httpwwwbritishcouncilorg

erasmushtm

[22] C J Date An Introduction to Database Systems Pearson EducationInc 8th edition 2004

[23] Jeffrey Dean and Sanjay Ghemawat MapReduce Simplified Data Pro-cessing on Large Clusters In OSDIrsquo04 Sixth Symposium on OperatingSystem Design and Implementation December 2004

[24] Dr Lawrie Brown School of Computer Science Australian DefenceForce Academy Canberra Australia ErlangmdashAn Open Source Lan-guage for Robust Distributed Applications httpwwwunswadfa

eduau~lpbpapersauug99-erlhtml

[25] Paul McJones (ed) The 1995 SQL Reunion People Projects andPolitics 1997

[26] A Fox and EA Brewer Harvest yield and scalable tolerant systemsIn Hot Topics in Operating Systems 1999 Proceedings of the SeventhWorkshop on

[27] Seth Gilbert and Nancy Lynch Brewerrsquos conjecture and the feasibil-ity of consistent available partition-tolerant web services In In ACMSIGACT News 2002

[28] Vivek Gite Bash Shell Loop Over Set of Files httpwww

cybercitibizfaqbash-loop-over-file


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl(1): the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat. http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open_Database_Connectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


SECTION 3 INTRODUCTION TO COUCHDB 17

opens up all sorts of doors. There's something almost subversive about CouchDB; it's completely language-, platform- and OS¹-agnostic." [31]

Normally a web development project requires two separate servers: a database server and a web applications server. CouchDB combines both of these in a single product. As a result, it is interesting to explore CouchDB as an all-in-one solution for serving both data and HTML code.

3.2 Some History

CouchDB was started by Damien Katz in 2004 as an 'indexable schema-less database' [33]. It draws many ideas from Lotus Notes, a product that Katz worked on at IBM. Lotus Notes is also a document-oriented system which can be used as a platform for web applications.

A very interesting 30-minute talk by Katz about his motivation for the CouchDB project is available online [32]. In the talk Katz describes his decision to sell his house, move his family and live from savings in order to establish a new free software project. (For a more technical discussion with Katz about CouchDB, see [55].)

3.3 Document-Oriented

CouchDB is a 'document-oriented' database, where each 'document' is a JavaScript Object Notation (JSON) object. JSON is a lightweight data exchange format based on a subset of the JavaScript programming language. It is often used for serialising and transmitting structured data over a network connection, and in recent years it has established itself as an alternative to XML [6].

The following example (taken from Wikipedia [53]) shows the JSON representation of an object that describes a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

¹ Operating System

Consider the use of NULL values in relational databases. In a relational database table we would usually represent addresses in countries which don't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, the absence of a postal code would simply be denoted by its absence from the document itself. This is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so if we wish to, we may legitimately store the following document in the very same 'addressbook' database:

{
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

(Note that \n denotes a new line character, so in this case the streetAddress spans three separate printed lines.)
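The difference from a relational NULL shows up in client code: instead of comparing a column with NULL, a client asks whether the field exists at all. The following is a minimal sketch in plain JavaScript. It assumes the two documents above have already been parsed into objects. `postalCodeOf` is a hypothetical helper and is not part of any CouchDB API.

```javascript
// The two documents from the 'addressbook' example, as parsed JavaScript objects.
const johnSmith = {
  firstName: "John",
  lastName: "Smith",
  address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" }
};
const bcZambia = {
  company: "British Council Zambia",
  address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" }
};

// Hypothetical helper: an absent field is simply not there, so we test for
// its presence instead of comparing against a NULL marker.
function postalCodeOf(doc) {
  const address = doc.address || {};
  return "postalCode" in address ? address.postalCode : undefined;
}

console.log(postalCodeOf(johnSmith)); // 10021
console.log(postalCodeOf(bcZambia));  // undefined

// The \n escapes make streetAddress one string holding three printed lines.
console.log(bcZambia.address.streetAddress.split("\n").length); // 3
```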

In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. There is no shared memory state between processes, so if a process in Erlang encounters a problem, that process can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. Currently every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. Because CouchDB can store files as well as application settings, the system can also be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. Then, as a background process, the bookmarks are synchronised to the UbuntuOne server. Subsequently, when the user goes to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user around from machine to machine.

In this way a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. Since CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2-minute) video illustration of this is available online [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed 'Quickly' [8] to provide a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started. Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for wider adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

If you are reading this section, I strongly recommend that you sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010) the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


SECTION 4 USING COUCHDB 22

4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers for Windows and Mac are also available at the same URL. I installed the version that was current at the time of writing, 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use 'cURL', a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same request with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

cURL syntax is explained in its manual page [44]. Briefly:

• -H introduces an HTTP header (here, the Content-Type of the data being sent)

• -X POST instructs cURL to use an HTTP POST request (instead of a GET)

• -d introduces the data to be sent

• @ denotes that what follows is a file name
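The cURL command above is just an HTTP request, so any language with an HTTP library can send the same thing. The hypothetical function `buildCreateRequest` below is a sketch, not a CouchDB client library. It assembles the method, URL, headers and body that cURL sends. The commented-out `fetch` call assumes a runtime with a global `fetch` (for example Node.js 18 or later) and a local server at 127.0.0.1:5984.

```javascript
// Hypothetical helper mirroring:
//   curl -H "Content-Type: application/json" -X POST <host>/addressbook -d @john-smith.json
function buildCreateRequest(host, database, doc) {
  return {
    method: "POST",                                  // -X POST: CouchDB will assign the _id
    url: `${host}/${encodeURIComponent(database)}`,  // database URL, no document id
    headers: { "Content-Type": "application/json" }, // -H
    body: JSON.stringify(doc)                        // -d @file: the JSON text itself
  };
}

const req = buildCreateRequest("http://127.0.0.1:5984", "addressbook",
  { firstName: "John", lastName: "Smith", age: 25 });
console.log(req.method, req.url); // POST http://127.0.0.1:5984/addressbook

// With a global fetch and a running server, the request could be sent as:
// fetch(req.url, { method: req.method, headers: req.headers, body: req.body })
//   .then((res) => res.json())
//   .then((result) => console.log(result.id, result.rev));
```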

All data in CouchDB is stored in JSON format together with a unique key and a revision number. In CouchDB the resulting document would therefore have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
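A revision string has two parts separated by a hyphen. The first is a number that starts at 1 and increases with each new revision of the document. The second is a hash that CouchDB computes. The sketch below splits a revision string into those two parts. `parseRev` is a hypothetical helper, and it deliberately treats the hash as opaque.

```javascript
// Hypothetical helper: split a CouchDB revision string such as
// "1-91bce055fc8db86480400321079f0834" into its two parts.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  if (dash < 1) throw new Error(`Not a revision string: ${rev}`);
  return {
    generation: Number(rev.slice(0, dash)), // 1 for a new document, +1 per revision
    hash: rev.slice(dash + 1)               // opaque identifier; do not interpret
  };
}

const r = parseRev("1-91bce055fc8db86480400321079f0834");
console.log(r.generation); // 1
console.log(r.hash);       // 91bce055fc8db86480400321079f0834
```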

This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator, http://jsonlint.com.


Figure 4.1: JSONLint

The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this, we use HTTP PUT instead of POST and specify the URL we wish to 'put' to, as follows:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon


4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
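Because the revision travels in the query string, a program that builds these URLs should percent-encode the document id and let a query-string serialiser handle `rev`. The following minimal sketch uses only the standard `encodeURIComponent` and `URLSearchParams`, both available in browsers and Node.js. `buildDeleteUrl` is a hypothetical name.

```javascript
// Hypothetical helper: the URL for deleting a specific revision of a document.
function buildDeleteUrl(host, database, docId, rev) {
  const query = new URLSearchParams({ rev }); // rev goes in the query string
  return `${host}/${encodeURIComponent(database)}/${encodeURIComponent(docId)}?${query}`;
}

console.log(buildDeleteUrl("http://127.0.0.1:5984", "addressbook",
  "british-council-zambia", "1-da7bcd810c608d6fbcb9ce92e9ade343"));
// http://127.0.0.1:5984/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
```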

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.

As we are updating a 'named' document, we use the PUT verb instead of POST.

The following command will update John Smith's record with the contents of the file john-smith-v2.json:

curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
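The revision check amounts to optimistic concurrency control. A write succeeds only if it names the document's current revision; otherwise it is rejected with an HTTP 409 conflict. The in-memory sketch below imitates that rule so that the conflict case can be seen without a server. It is a simplified model, not CouchDB's implementation. The `RevStore` class is hypothetical and uses a counter in place of CouchDB's revision hash.

```javascript
// Simplified model of CouchDB's revision check (not the real implementation).
class RevStore {
  constructor() { this.docs = new Map(); }

  // Create or update a document; updates must supply the current _rev.
  put(id, body, rev) {
    const current = this.docs.get(id);
    if (current && current._rev !== rev) {
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    const newRev = `${generation}-r${generation}`; // stand-in for the real hash
    this.docs.set(id, { ...body, _id: id, _rev: newRev });
    return { status: 201, ok: true, id, rev: newRev };
  }
}

const store = new RevStore();
const first = store.put("john-smith", { firstName: "John" });            // create
const second = store.put("john-smith", { firstName: "Jon" }, first.rev); // update
const stale = store.put("john-smith", { firstName: "J" }, first.rev);    // old _rev
console.log(second.rev, stale.status); // 2-r2 409
```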

4.7 Adding Attachments

CouchDB provides a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags³.

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.


The uploaded image may be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag
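Instead of a separate PUT per file, CouchDB also accepts inline attachments. In that case the file contents are base64-encoded into the document's `_attachments` field and saved in the same request as the document. The sketch below assumes Node.js (for `Buffer`). `withInlineAttachment` is a hypothetical helper, and the four bytes are a placeholder for the contents of zm.png, not a real image.

```javascript
// Hypothetical helper: return a copy of doc with one inline attachment added.
// Inline attachment data is sent base64-encoded under _attachments.<name>.data.
function withInlineAttachment(doc, name, contentType, bytes) {
  return {
    ...doc,
    _attachments: {
      ...(doc._attachments || {}),
      [name]: { content_type: contentType, data: Buffer.from(bytes).toString("base64") }
    }
  };
}

// Placeholder bytes standing in for the contents of zm.png.
const placeholder = Uint8Array.from([0x89, 0x50, 0x4e, 0x47]);
const doc = withInlineAttachment({ _id: "british-council-zambia" }, "flag", "image/png", placeholder);
console.log(JSON.stringify(doc._attachments.flag));
// {"content_type":"image/png","data":"iVBORw=="}
```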

4.8 Replication

One particularly interesting and useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want a replica of the addressbook on our local machine, so that we can work on it in offline mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
  -X POST http://127.0.0.1:5984/_replicate \
  -d '{"source":"http://username:password@mick.couchone.com/addressbook",
       "target":"http://127.0.0.1:5984/addressbook"}'

Note that if we are using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping them where necessary:

curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
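Much of the quoting trouble above comes from writing JSON by hand inside a shell string. Generating the request body with `JSON.stringify` sidesteps the problem, because the serialiser inserts and escapes the double quotes itself. In the sketch below, `replicationBody` is a hypothetical name, and the credentials are the same placeholders used above.

```javascript
// Hypothetical helper: the JSON body for POST /_replicate.
function replicationBody(source, target) {
  return JSON.stringify({ source, target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
// {"source":"http://username:password@mick.couchone.com/addressbook","target":"http://127.0.0.1:5984/addressbook"}

// The body could then be sent with any HTTP client, for example:
// fetch("http://127.0.0.1:5984/_replicate", {
//   method: "POST", headers: { "Content-Type": "application/json" }, body });
```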

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using map-reduce functions, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google's 2004 paper [23], which describes how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages. The 'map' stage ranges over the document collection and emits zero or more key/value pairs for each document it encounters. The 'reduce' stage is used when an aggregate calculation over these intermediate values is required.
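The two stages can be imitated in a few lines of plain JavaScript. This is an in-memory illustration only. CouchDB runs map functions in its own view server, provides `emit` as a global (here it is passed as a parameter), and stores the results in a B-tree. `runView` and the three documents are made up for the example.

```javascript
// In-memory illustration of map-reduce (not how CouchDB executes views).
function runView(docs, map, reduce) {
  const rows = [];
  // Map stage: visit every document; map may call emit zero or more times.
  for (const doc of docs) {
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    map(doc, emit);
  }
  // Sort by key, as a view index would be (numeric keys only, for simplicity).
  rows.sort((a, b) => a.key - b.key);
  if (!reduce) return rows;
  // Reduce stage: CouchDB passes [key, docid] pairs plus the emitted values.
  return reduce(rows.map((r) => [r.key, r.id]), rows.map((r) => r.value));
}

const docs = [
  { _id: "a", votes: 10 },
  { _id: "b", votes: 5 },
  { _id: "c" } // no votes field: map emits nothing for it
];
const map = (doc, emit) => { if (typeof doc.votes === "number") emit(doc.votes, doc._id); };

console.log(runView(docs, map).map((r) => r.id));                 // [ 'b', 'a' ]
console.log(runView(docs, map, (keys, values) => values.length)); // 2
```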

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb) I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document 'con' has been defined with two 'views' of the data.

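The first view, 'Conservative Votes', simply ranges over the document collection in a map function, emitting each constituency's Conservative vote as the key and the document _id as the value.

The second view, 'Conservative Votes Total', combines a map function with a reduce function to output the sum of all Conservative votes. This is analogous to SELECT SUM(con) in SQL; with non-null keys, the same mechanism plays the role of a GROUP BY.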

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8,782,198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]


{"rows":[{"key":null,"value":8782198}]}
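Because view functions are ordinary JavaScript, the 'Conservative Votes Total' view can be exercised outside CouchDB by supplying stand-ins for `emit` and `sum`, which CouchDB provides to view code. In the sketch below the Aberavon figure comes from the document shown earlier. The second document and its figure are invented, so the total is not the 8782198 above.

```javascript
// The map and reduce functions from _design/con, written as JavaScript.
const mapTotal = function (doc) { emit(null, doc.con); };
const reduceTotal = function (keys, values) { return sum(values); };

// Stand-ins for the helpers CouchDB makes available to view code.
let emitted = [];
function emit(key, value) { emitted.push([key, value]); }
function sum(values) { return values.reduce((a, b) => a + b, 0); }

const docs = [
  { _id: "Aberavon", con: 3062, lab: 18077 }, // figures from the document above
  { _id: "Example Constituency", con: 1500 }  // invented, for illustration only
];

docs.forEach((doc) => mapTotal(doc));
const keys = emitted.map(([key]) => key);
const values = emitted.map(([, value]) => value);
console.log(reduceTotal(keys, values)); // 4562
```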

It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000 inclusive:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB
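The startkey/endkey query is cheap because view rows are kept sorted by key. A range can therefore be found by locating its two ends, rather than by examining every row. The sketch below shows that idea with a binary search over a sorted array. It is a simplified stand-in for CouchDB's B-tree, and the rows (apart from Aberavon's figure) are hypothetical. As in CouchDB's default behaviour, endkey is treated as inclusive.

```javascript
// Rows of a view, sorted by key (hypothetical data apart from Aberavon's 3062).
const rows = [
  { key: 850, value: "Seat A" },
  { key: 1000, value: "Seat B" },
  { key: 1720, value: "Seat C" },
  { key: 2000, value: "Seat D" },
  { key: 3062, value: "Aberavon" }
];

// First index whose key is >= target (binary search).
function lowerBound(rows, target) {
  let lo = 0, hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < target) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// First index whose key is > target.
function upperBound(rows, target) {
  let lo = 0, hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key <= target) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Like ?startkey=...&endkey=... with the default inclusive end.
function range(rows, startkey, endkey) {
  return rows.slice(lowerBound(rows, startkey), upperBound(rows, endkey));
}

console.log(range(rows, 1000, 2000).map((r) => r.value)); // [ 'Seat B', 'Seat C', 'Seat D' ]
```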

This method of retrieving data is certainly cumbersome compared to using SQL. One criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever documents are added to the collection, the index is updated incrementally rather than rebuilt from scratch. CouchDB does this the next time the view is queried.

As a result, the system is optimised for fast indexed retrieval: a query against a view does not perform a 'table scan' at run-time, because every read of a view is an indexed lookup.
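Incremental updating means that when a view is read, only the documents written since the previous update are passed through the map function again. The following simplified in-memory model shows that bookkeeping with a per-database sequence number. It is not CouchDB's code, and the `Database` and `View` classes are hypothetical.

```javascript
// Simplified model of incremental view updating (not CouchDB's implementation).
class Database {
  constructor() { this.seq = 0; this.docs = new Map(); } // id -> { seq, doc }
  save(doc) { this.docs.set(doc._id, { seq: ++this.seq, doc }); }
  changesSince(seq) {
    return [...this.docs.values()].filter((entry) => entry.seq > seq);
  }
}

class View {
  constructor(db, map) {
    this.db = db; this.map = map;
    this.lastSeq = 0; this.rows = new Map(); this.mapCalls = 0;
  }
  query() {
    // Only documents written since the last query are re-mapped.
    for (const { seq, doc } of this.db.changesSince(this.lastSeq)) {
      const emitted = [];
      this.map(doc, (key, value) => emitted.push({ key, value }));
      this.rows.set(doc._id, emitted); // replaces rows from any older version
      this.mapCalls += 1;
      this.lastSeq = Math.max(this.lastSeq, seq);
    }
    return [...this.rows.values()].flat().sort((a, b) => a.key - b.key);
  }
}

const db = new Database();
const view = new View(db, (doc, emit) => emit(doc.con, doc._id));
db.save({ _id: "Aberavon", con: 3062 });
view.query();                                // maps 1 document
db.save({ _id: "Example Seat", con: 1500 }); // hypothetical second document
console.log(view.query().map((r) => r.value), view.mapCalls); // [ 'Example Seat', 'Aberavon' ] 2
```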

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'design documents'.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The 'universities' data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.


SECTION 5 SERVING HTML FROM COUCHDB 32

Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the following bash script.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows a similar effect would be achieved using a batch file script.)

#!/bin/bash

# uncomment the next line to target the local instance instead
#host=http://127.0.0.1:5984/
host=http://username:password@mick.couchone.com/
database=universities
folder=universities
fileextension=.json
FILES=$folder/*

# create the database
curl -X PUT "$host$database"

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e "s/$folder\///g")
  echo "$filename"
  # strip the extension to get the document id
  docname=$(echo "$filename" | sed -e "s/$fileextension//g")
  echo "$docname"
  url=$host$database/$docname
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
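The string handling in the script can be summarised as a small JavaScript function: strip the folder prefix, strip the extension, and append the result to the database URL. This sketch assumes, as with the files above, that file names are already URL-encoded (for example Aberystwyth%20University.json). `docUrlForFile` is a hypothetical name. The `sed -e "s/.json//g"` step removes every match anywhere in the name and treats the dot as a wildcard; this version, by contrast, strips only a trailing extension.

```javascript
// Hypothetical equivalent of the sed steps in the upload script.
function docUrlForFile(host, database, filepath, extension = ".json") {
  const filename = filepath.slice(filepath.lastIndexOf("/") + 1); // drop "universities/"
  const docname = filename.endsWith(extension)
    ? filename.slice(0, -extension.length)                         // drop ".json"
    : filename;
  return `${host}${database}/${docname}`;
}

console.log(docUrlForFile("http://127.0.0.1:5984/", "universities",
  "universities/Aberystwyth%20University.json"));
// http://127.0.0.1:5984/universities/Aberystwyth%20University
```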

To put a single document to the web, the command would be as follows:

curl -X PUT \
  http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
  -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an _id that begins with _design/ [12].

For the first design document I have chosen an _id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a string of HTML, which CouchDB sends to the browser as the response.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

Note that shows is a reserved field name in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed in the query string of the URL.
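Since a show function is an ordinary JavaScript function of (doc, req), it can be tried out locally before upload. The sketch below calls s1 directly. The coordinates are placeholders rather than the Activity Mapping data, and the empty object stands in for the request object CouchDB would pass.

```javascript
// The s1 show function from _design/d1.
const s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// Placeholder document; the coordinates are illustrative only.
const sample = { _id: "Aston University", latitude: 52.49, longitude: -1.89 };
console.log(s1(sample, {}));
// <h1>Aston University</h1><p>latitude: 52.49</p><p>longitude: -1.89</p>
```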

Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 \
  -d @d1.json

To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML pages.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB

• universities is the database

• _design/d1 is the design document

• _show/s1 is the 'show' function that returns an HTML string

• Aston%20University is the _id of the document to be rendered
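The same anatomy can be expressed as a function that builds a show URL from its parts. The document id is percent-encoded, which is how the space in 'Aston University' becomes %20. In the sketch below, `showUrl` is a hypothetical helper.

```javascript
// Hypothetical helper: build <host>/<db>/_design/<ddoc>/_show/<func>/<docid>.
function showUrl(host, db, ddoc, func, docId) {
  return [host, db, "_design", ddoc, "_show", func, encodeURIComponent(docId)].join("/");
}

console.log(showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University"));
// http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```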


5.3 A more complex 'show' function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the 'show' function which returns the HTML for the map is map1. The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html.

The full listing of the design document, including the map1 'show' function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned as a string from a JavaScript function, is very cumbersome. For example, because the function is itself stored as a JSON string, every double quote in the HTML has to be escaped for the design document to remain valid JSON.
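The escaping burden follows from the fact that each function in a design document is stored as a JSON string value. The sketch below serialises a small, made-up show function whose HTML contains double quotes, then counts the escaped quotes that appear in the resulting JSON. Here `JSON.stringify` performs the escaping that would otherwise be done by hand.

```javascript
// A made-up show function whose HTML contains double quotes.
const source = `function(doc, req) {
  return '<a href="/universities/' + doc._id + '">' + doc._id + '</a>';
}`;

// The design document holds the function as a string value...
const designDoc = { _id: "_design/example", shows: { link: source } };

// ...so every double quote (and newline) must be escaped in the JSON text.
const json = JSON.stringify(designDoc);
console.log(json.includes('\\"/universities/')); // true
console.log((json.match(/\\"/g) || []).length);   // 2
```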

When authoring design documents manually, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon) rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.


Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.

Here we define a very simple 'view' function, which outputs the _id field for each document in the collection:

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


The output from the 'view' function is available at http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple 'view' returning document _id values

A 'view' is in effect a query function which returns a set of results. In order to render the results as HTML, we may define 'list' functions in a CouchDB design document:

• v1 is a very simple view, simply emitting the document _id values

• l1 is a very simple list that shows a hyperlink for each document _id

We augment the design document with a 'list' function as follows:

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ 'headers': { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}


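Note the built-in functions start, getRow and send:

• start is called once, at the beginning of the function, to set the response headers

• getRow returns the next row of the view result, and returns nothing once all rows have been read, which ends the loop

• send is called once for each row in the dataset, sending a fragment of HTML to the client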
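The interaction between l1 and these built-in functions can also be imitated in Node.js. In this hypothetical harness, runList supplies start, send and getRow to the list function and collects what it sends. In CouchDB these functions are provided by the query server, and l1 is declared as function(head, req) rather than receiving them as an argument.

```javascript
// Hypothetical harness standing in for the functions CouchDB gives a list function.
function runList(listFn, viewRows) {
  let index = 0;
  let headers = null;
  const chunks = [];
  listFn({
    // start() records the response headers.
    start: (response) => { headers = response.headers; },
    // send() appends a chunk of output.
    send: (chunk) => { chunks.push(chunk); },
    // getRow() returns the view rows one at a time, then null.
    getRow: () => (index < viewRows.length ? viewRows[index++] : null),
  });
  return { headers, body: chunks.join("") };
}

// The l1 list function, adapted to receive start/send/getRow as an argument.
function l1({ start, send, getRow }) {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + '</a></p>');
  }
}

const viewRows = [
  { id: "University of Aberdeen", key: "University of Aberdeen", value: "University of Aberdeen" },
  { id: "University of St Andrews", key: "University of St Andrews", value: "University of St Andrews" },
];
const result = runList(l1, viewRows);
console.log(result.headers);
console.log(result.body);
```

As in the original l1, row.value is placed in the href unencoded; wrapping it in encodeURIComponent would make the links robust to document ids containing characters such as # or ?.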

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts, developed by J. Chris Anderson and Benoit Chesneau, to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org

The key benefit of CouchApp, from an application developer’s perspective, is that it creates a separate file for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project; CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an Atom feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-user mailing list, amended the feed output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains a subfolder for each CouchDB View, holding a map.js file and, when needed, a reduce.js file. A View can be thought of as a query against the database.

• The shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.

• Similarly, the lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
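The correspondence between these folders and the design document can be sketched as a function from file paths to a design document object. This is a hypothetical simplification of what CouchApp does when it pushes an application, covering only the views, shows and lists folders; the real tool also handles templates, lib and other folders, attachments and document revisions.

```javascript
// Hypothetical, simplified model of how a CouchApp folder maps onto a design
// document: file paths become nested keys, file contents become strings.
function assembleDesignDoc(appName, files) {
  const ddoc = { _id: "_design/" + appName, views: {}, shows: {}, lists: {} };
  for (const [filePath, source] of Object.entries(files)) {
    const parts = filePath.split("/");
    if (parts[0] === "views" && parts.length === 3) {
      // views/<view name>/map.js or views/<view name>/reduce.js
      const viewName = parts[1];
      const kind = parts[2].replace(/\.js$/, ""); // "map" or "reduce"
      ddoc.views[viewName] = ddoc.views[viewName] || {};
      ddoc.views[viewName][kind] = source;
    } else if ((parts[0] === "shows" || parts[0] === "lists") && parts.length === 2) {
      // shows/<name>.js and lists/<name>.js
      ddoc[parts[0]][parts[1].replace(/\.js$/, "")] = source;
    }
    // Other folders (templates, lib, vendor, ...) are ignored in this sketch.
  }
  return ddoc;
}

const files = {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
};
console.log(JSON.stringify(assembleDesignDoc("bc", files), null, 2));
```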


Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function, s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


The command couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function, l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files: map.js and reduce.js. We recall that in CouchDB a View is in effect a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, then the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed
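The effect of leaving an empty reduce.js in place can be modelled as follows. In this hypothetical sketch, queryView returns the reduced result whenever the view has a reduce function and the query does not ask for reduce=false, which is how CouchDB treats views with a reduce function by default. An empty reduce function returns undefined, which appears as null in the JSON response, so the mapped rows are never seen.

```javascript
// Hypothetical model of a view query: if the view has a reduce function and
// the query does not pass reduce=false, the reduced result replaces the rows.
function queryView(mappedRows, reduceFn, { reduce = true } = {}) {
  if (reduceFn && reduce) {
    const keys = mappedRows.map((r) => [r.key, r.id]);
    const values = mappedRows.map((r) => r.value);
    const result = reduceFn(keys, values, false);
    // undefined is serialised as null in the JSON response.
    return [{ key: null, value: result === undefined ? null : result }];
  }
  return mappedRows;
}

const rows = [
  { id: "a", key: "a", value: "a" },
  { id: "b", key: "b", value: "b" },
];
const emptyReduce = function (keys, values, rereduce) {}; // like an empty reduce.js

console.log(queryView(rows, emptyReduce)); // [ { key: null, value: null } ]
console.log(queryView(rows, null).length); // 2
console.log(queryView(rows, emptyReduce, { reduce: false }).length); // 2
```

Adding ?reduce=false to the view URL would also bring back the mapped rows, but deleting the unused reduce.js keeps the view definition honest.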


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}
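The information in .couchapprc is enough to work out where the design document will be sent. The sketch below is a hypothetical helper, not part of CouchApp: given the file contents and an application name, it returns the HTTP method and URL for the design document, which lives at /<database>/_design/<name>. A real upload must also supply the current _rev when the design document already exists, which CouchApp takes care of.

```javascript
// Hypothetical helper: derive where a design document would be PUT, given the
// text of .couchapprc and an application (design document) name.
function designDocTarget(couchapprcText, appName, envName = "default") {
  const rc = JSON.parse(couchapprcText);
  const env = rc.env && rc.env[envName];
  if (!env || !env.db) {
    throw new Error("no db URL configured for env " + envName);
  }
  const dbUrl = new URL(env.db); // validates the URL; credentials are kept
  return {
    method: "PUT",
    // Design documents live at /<db>/_design/<name>.
    url: dbUrl.href.replace(/\/$/, "") + "/_design/" + encodeURIComponent(appName),
  };
}

const rcText = JSON.stringify({
  env: { default: { db: "http://username:password@mick.couchone.com/universities" } },
});
console.log(designDocTarget(rcText, "bc"));
```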

Running couchapp push from the bc folder then merges the files and uploads the design document. So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework, rather than working with design documents directly.

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by rendering a template against a set of documents. It provides the ability to iterate through sets of values and to output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple View, all, using the command below. The View consists of a single map.js file (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, keyed by their _id, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss


This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" &gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.
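To show how the {{#section}}...{{/section}} blocks iterate, here is a toy renderer for a small subset of Mustache syntax: sections and {{variable}} tags only. It is a hypothetical sketch, not mustache.js. It does no HTML escaping, does not support a section nested inside another section of the same name, and looks names up from the innermost context outwards, which mirrors how {{name}} inside {{#programmes}} finds the programme’s name.

```javascript
// Look a name up in a stack of contexts, innermost (last) first.
function lookup(name, stack) {
  for (let i = stack.length - 1; i >= 0; i--) {
    const ctx = stack[i];
    if (ctx !== null && typeof ctx === "object" && name in ctx) return ctx[name];
  }
  return undefined;
}

// Toy renderer for {{#section}}...{{/section}} and {{variable}} tags only.
function render(template, stack) {
  const withSections = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    (match, name, inner) => {
      const value = lookup(name, stack);
      if (Array.isArray(value)) {
        // Arrays: render the section once per item, with the item as context.
        return value.map((item) => render(inner, [...stack, item])).join("");
      }
      // Other truthy values render the section once; falsy values hide it.
      return value ? render(inner, stack) : "";
    }
  );
  return withSections.replace(/\{\{(\w+)\}\}/g, (match, name) => {
    const value = lookup(name, stack);
    return value === undefined || value === null ? "" : String(value);
  });
}

const template =
  "{{#institutions}}<entry><title>{{id}}</title>" +
  "{{#programmes}}[{{name}}:{{#countries}} {{country}}{{/countries}}]{{/programmes}}" +
  "<point>{{latitude}} {{longitude}}</point></entry>{{/institutions}}";

const view = {
  institutions: [{
    id: "University of St Andrews",
    latitude: 56.34,
    longitude: -2.79,
    programmes: [{ name: "Chevening Programme", countries: [{ country: "id" }, { country: "mz" }] }],
  }],
};

console.log(render(template, [view]));
// <entry><title>University of St Andrews</title>[Chevening Programme: id mz]<point>56.34 -2.79</point></entry>
```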

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project: http://activitymap.britishcouncil.org

The key factors in achieving this outcome have been the use of CouchApp and of a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to them even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot easily be achieved using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard, in particular, means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
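To make the point about HTTP concrete, the sketch below describes some basic CouchDB operations as plain HTTP requests. The couch object and the local server address are assumptions for illustration, and nothing is sent over the network. The request shapes (PUT to create a database or a document, GET to read, DELETE with the current revision, and POST to /_replicate naming a source and a target) are the ones any HTTP client, including cURL, would use.

```javascript
// Hypothetical helper: CouchDB operations described as plain HTTP requests.
// The server address is an assumption (CouchDB's default local port).
const base = "http://127.0.0.1:5984";
const docUrl = (db, id) => `${base}/${db}/${encodeURIComponent(id)}`;
const json = { "Content-Type": "application/json" };

const couch = {
  createDb: (db) => ({ method: "PUT", url: `${base}/${db}` }),
  putDoc: (db, id, doc) => ({ method: "PUT", url: docUrl(db, id), headers: json, body: JSON.stringify(doc) }),
  getDoc: (db, id) => ({ method: "GET", url: docUrl(db, id) }),
  // Deleting requires the document's current revision.
  deleteDoc: (db, id, rev) => ({
    method: "DELETE",
    url: `${docUrl(db, id)}?rev=${encodeURIComponent(rev)}`,
  }),
  // Replication is itself an HTTP request naming a source and a target.
  replicate: (source, target) => ({
    method: "POST",
    url: `${base}/_replicate`,
    headers: json,
    body: JSON.stringify({ source, target }),
  }),
};

console.log(couch.putDoc("universities", "University of St Andrews", { region: "Scotland" }));
console.log(couch.replicate("universities", "http://mick.couchone.com/universities"));
```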

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL

• understanding the structure of design documents

• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product and, as such, there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a map-reduce function, defined in advance in a design document, whose results are indexed. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
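As an example of this different way of expressing an information need, consider counting institutions per region, which in SQL would be SELECT region, COUNT(*) FROM institutions GROUP BY region. In CouchDB, the equivalent is a view whose map function emits the region and whose reduce function adds up the emitted values, queried with group=true. The sketch below runs such a map and reduce in plain JavaScript. groupedQuery is a hypothetical stand-in for CouchDB’s grouping, the sample documents are made up for illustration, and CouchDB’s built-in _count reduce could replace the hand-written one.

```javascript
// Map: emit one row per institution, keyed by region, with value 1.
function map(doc, emit) {
  if (doc.region) emit(doc.region, 1);
}

// Reduce: add up the values. Summing also works on rereduce, where the
// values are partial sums produced by earlier reduce calls.
function reduce(keys, values, rereduce) {
  return values.reduce((total, v) => total + v, 0);
}

// Hypothetical stand-in for a group=true query: rows with equal keys
// are reduced together, and groups are returned in key order.
function groupedQuery(docs) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, { keys: [], values: [] });
      groups.get(key).keys.push([key, doc._id]);
      groups.get(key).values.push(value);
    });
  }
  return [...groups.keys()].sort().map((key) => ({
    key,
    value: reduce(groups.get(key).keys, groups.get(key).values, false),
  }));
}

const docs = [
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Cardiff University", region: "Wales" },
];
console.log(groupedQuery(docs));
// [ { key: 'Scotland', value: 2 }, { key: 'Wales', value: 1 } ]
```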

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1#source

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way, before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req){ return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req){ return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\">html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% }</style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\">function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); }</script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project, in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code (http://github.com/jchris/sofa). I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/<\/feed>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point; // added for GeoRSS
  return entry;
};
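The atom.js helpers rely on E4X XML literals such as <entry/>, which current JavaScript engines such as V8 (used by Node.js) do not support. A string-building equivalent of the header and entry functions, including the GeoRSS namespace and point, might look like the sketch below. It is a hypothetical reimplementation, not Sofa’s code: it writes the point as a georss:point element, following the GeoRSS Simple encoding, and omits the updated, author and link elements for brevity.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical string-based counterpart of exports.header, with the GeoRSS namespace.
function feedHeader(data) {
  return '<feed xmlns="http://www.w3.org/2005/Atom" ' +
    'xmlns:georss="http://www.georss.org/georss">' +
    "<title>" + escapeXml(data.title) + "</title>" +
    "<id>" + escapeXml(data.feed_id) + "</id>";
}

// Hypothetical string-based counterpart of exports.entry, plus a GeoRSS point.
function feedEntry(data) {
  return "<entry>" +
    "<id>" + escapeXml(data.entry_id) + "</id>" +
    "<title>" + escapeXml(data.title) + "</title>" +
    '<content type="html">' + escapeXml(data.content) + "</content>" +
    // GeoRSS Simple writes a point as "latitude longitude".
    "<georss:point>" + escapeXml(data.point) + "</georss:point>" +
    "</entry>";
}

// As in lists/index.js, the caller closes the feed after the last entry.
const xml =
  feedHeader({ title: "GeoSofa", feed_id: "urn:example:feed" }) +
  feedEntry({
    entry_id: "urn:example:entry:1",
    title: "St Andrews",
    content: "<p>Hello</p>",
    point: 56.3412 + " " + -2.7930,
  }) +
  "</feed>";
console.log(xml);
```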

At the end of lists/index.js:

            alternate : path.absolute(path.show('post', row.id)),
            // point : row.value.loc[1] + " " + row.value.loc[0]
            point : row.value.latitude + " " + row.value.longitude
          });
          // send the entry to client
          send(feedEntry);
        } while (row = getRow());
      // close the loop after all rows are rendered
      return "</feed>";
    });

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

This is further down in templates/edit.html:

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{{docid}}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to render the ‘stash’ with a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
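The stash-building step can be exercised without CouchDB. The sketch below is a hypothetical restatement of the logic in georss.js: buildStash takes the view rows as a plain array instead of reading them through List.withRows, and keeps only the fields that georss.html uses.

```javascript
// Hypothetical restatement of the stash-building logic in georss.js,
// taking view rows as a plain array rather than via List.withRows.
function buildStash(rows) {
  return {
    institutions: rows.map((row) => {
      const inst = row.value;
      const programmes = Array.isArray(inst.programmes) ? inst.programmes : [];
      return {
        id: inst._id,
        latitude: inst.latitude,
        longitude: inst.longitude,
        has_programmes: programmes.length > 0,
        programmes: programmes.map((programme) => {
          const countries = Array.isArray(programme.countries) ? programme.countries : [];
          return {
            name: programme.name,
            url: programme.url,
            has_countries: countries.length > 0,
            // The template uses {{country}}, so each code is wrapped in an object.
            countries: countries.map((country) => ({ country })),
          };
        }),
      };
    }),
  };
}

// One view row, using an abridged copy of the St Andrews document from Appendix D.
const rows = [{
  key: "University of St Andrews",
  value: {
    _id: "University of St Andrews",
    latitude: 56.3412139443169,
    longitude: -2.79301175308608,
    programmes: [
      { name: "Chevening Programme", url: "http://www.chevening.com", countries: ["id", "mz", "tj"] },
      { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", url: "http://www.csfp-online.org", countries: ["bd", "za"] },
    ],
  },
}];
console.log(JSON.stringify(buildStash(rows), null, 2));
```

One deliberate difference from georss.js: has_programmes and has_countries are true only for non-empty arrays, whereas the truthiness test in georss.js is also true for an empty array, which would render the surrounding section with nothing inside it.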

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from the CouchDB documents returned by the View into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an html file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" &gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload png files from a folder, so that each image file becomes an attachment to a CouchDB document, one document per country.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to the countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done
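The path handling in the loop can also be expressed in Node.js. This hypothetical helper derives the same file name, document name and attachment URL as the sed commands in the script, using path.basename instead of sed. It only builds a description of the request and does not upload anything.

```javascript
const path = require("path");

// Hypothetical helper mirroring the loop body of the bash script:
// derive the file name, document name and attachment URL for one flag file.
function flagUpload(filepath, dbUrl) {
  const filename = path.basename(filepath);        // e.g. "de.png"
  const docname = path.basename(filepath, ".png"); // e.g. "de"
  return {
    filename,
    docname,
    method: "PUT",
    // PUT /<db>/<doc id>/<attachment name>; CouchDB creates the document if needed.
    url: `${dbUrl}/${encodeURIComponent(docname)}/flag`,
    headers: { "Content-Type": "image/png" },
  };
}

console.log(flagUpload("countries/png/de.png", "http://127.0.0.1:5984/countries"));
```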

Bibliography

[1] A NoSql Summer. A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl(1): the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Database_normalization

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz: Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


SECTION 3 INTRODUCTION TO COUCHDB 18

        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }

Consider the use of NULL values in relational databases. In a relational database table, we would usually represent an address in a country which doesn't use postal codes by putting a NULL value in the postalCode field. In CouchDB and other similar systems, a missing postal code is denoted simply by the field's absence from the document. This is much more in line with what we intuitively expect from real-world experience.

The schema is flexible, so if we wish we may legitimately store the following document in the very same 'addressbook' database:

    {
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }

(Note that \n denotes a new line character, so in this case the streetAddress spans three separate printed lines.)
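A small client-side sketch in JavaScript makes the point about optional fields concrete. The two objects mirror the documents in this section, with the John Smith values taken from the earlier example. The helper name formatAddress is hypothetical: it checks whether a field is present, where a relational version would test for NULL.

```javascript
// Two documents of different shapes, as they might sit side by side in the
// schema-less 'addressbook' database (values follow the examples in the text).
const johnSmith = {
  firstName: "John",
  lastName: "Smith",
  address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" },
};
const bcZambia = {
  company: "British Council Zambia",
  address: { streetAddress: "Heroes Place\nCairo Road\nPO Box 34571", city: "Lusaka", country: "Zambia" },
};

// Hypothetical helper: returns the printed lines of an address.
// A field that is absent is simply skipped; there is no NULL to test for.
function formatAddress(address) {
  const lines = address.streetAddress.split("\n");
  lines.push(address.state !== undefined ? address.city + " " + address.state : address.city);
  if (address.postalCode !== undefined) lines.push(address.postalCode);
  if (address.country !== undefined) lines.push(address.country);
  return lines;
}

console.log(formatAddress(johnSmith.address).join(" / ")); // 21 2nd Street / New York NY / 10021
console.log(formatAddress(bcZambia.address).join(" / "));  // Heroes Place / Cairo Road / PO Box 34571 / Lusaka / Zambia
```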

In Section 4 I will demonstrate how these documents are uploaded to and retrieved from a CouchDB server.

3.4 Erlang

Damien Katz chose to develop CouchDB using Erlang. The Erlang programming language was developed by Ericsson to support distributed, fault-tolerant, non-stop telecoms applications. A key element in its reliability is that it is a functional language which avoids mutable data. Processes share no memory state, so if a process in Erlang encounters a problem it can be shut down without affecting the rest of the system. Compiled Erlang code has been known to run for years in telecoms switches with zero downtime [14] [24].

Reliability is one of the main objectives of the CouchDB project. One of the core developers on the project, J. Chris Anderson, describes how using Erlang together with append-only B-Tree structures for data promotes this:

"The overall effect of this design is to maximise for reliability and concurrency. There is no way to corrupt the data, as we never touch what we've already written to disk. Erlang can support as many simultaneous readers as resources allow, which we've found in practice to be in the tens of thousands." [11]
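A toy model can illustrate the effect Anderson describes. It is an assumed, much-simplified illustration and does not reproduce CouchDB's actual file format, although CouchDB likewise appends updated B-tree nodes to the end of its database file rather than rewriting them in place. An update appends a new record instead of overwriting the old one, so a reader that started earlier keeps a consistent snapshot. The class and method names are hypothetical.

```javascript
// Toy append-only store (illustrative model only, not CouchDB's file format).
class AppendOnlyStore {
  constructor() {
    this.log = []; // records are only ever pushed, never modified in place
  }

  // Append a new version of a document; earlier versions stay untouched.
  write(id, doc) {
    this.log.push(Object.freeze({ id: id, doc: Object.freeze({ ...doc }) }));
  }

  // A reader fixes the end of the log when it starts, so later writes are
  // invisible to it and it never sees a half-written state.
  snapshot() {
    const end = this.log.length;
    return (id) => {
      for (let i = end - 1; i >= 0; i--) {
        if (this.log[i].id === id) return this.log[i].doc;
      }
      return undefined;
    };
  }
}

const store = new AppendOnlyStore();
store.write("Aberavon", { con: 3062 });
const oldReader = store.snapshot();
store.write("Aberavon", { con: 3100 });        // hypothetical later value
console.log(oldReader("Aberavon").con);        // 3062: the old snapshot is unaffected
console.log(store.snapshot()("Aberavon").con); // 3100
```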

3.5 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board database for application settings. This is similar to the way that the Microsoft Windows registry stores settings for programs running on a particular computer [43].

The innovative element of Ubuntu's 'Desktop Couch' is that it can take advantage of CouchDB's replication capabilities. Desktop Couch can communicate every ten minutes with a CouchDB instance on an external web server, known as the UbuntuOne service [5]. At the time of writing, every Ubuntu desktop user can sign up for 2GB of free space on the UbuntuOne CouchDB server. CouchDB can store files as well as application settings, so the system can be used for secure off-site backup of important files.

Client applications running on the Ubuntu desktop may save their data to Desktop Couch. For example, the Firefox web browser may save website bookmarks to Desktop Couch. The bookmarks are then synchronised to the UbuntuOne server as a background process. Later, when the user moves to another machine, the Desktop Couch on that machine may fetch the latest bookmarks data from UbuntuOne. As a result, the data follows the user from machine to machine.

In this way a user can store their contacts, internet bookmarks, music files and so on in multiple places and have them constantly available [34]. CouchDB is available for multiple platforms (Windows, Mac, Android, etc.), so the data can be made available across different devices, not just Ubuntu desktop PCs. A short (2 minute) video illustration of this is available [19].

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed 'Quickly' [8], which gives beginner programmers a way to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started, and Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for adoption in the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available on-line at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

If you are reading this section, I strongly recommend that you sign up for an on-line instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010) the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers for Windows and Mac are also available at the same URL. I installed the version that was current at the time of writing, 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne, which is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use 'cURL', a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:

    curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same statement with the HTTP DELETE verb:

    curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


    curl -H "Content-Type: application/json" \
         -X POST http://username:password@mick.couchone.com/addressbook \
         -d @john-smith.json

cURL syntax is explained in the cURL man page [44]. Briefly:

• -H introduces an HTTP header

• -X POST instructs cURL to use an HTTP POST request (instead of a GET)

• -d introduces the data to be sent

• @ denotes that what follows is a file name
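The cURL command above can also be expressed with the Fetch API, which is available in browsers and in Node 18 and later. The sketch below assumes the same placeholder host and credentials as the text, and the helper names buildCreateRequest and createDocument are hypothetical. fetch() rejects URLs that embed user:password@, so the credentials go into a Basic authentication header instead.

```javascript
// Builds the same request cURL sends: POST, a JSON content type and a JSON body.
function buildCreateRequest(host, database, doc, user, password) {
  const headers = { "Content-Type": "application/json" };
  if (user !== undefined) {
    // fetch() refuses URLs containing credentials, so use a Basic auth header.
    headers["Authorization"] = "Basic " + btoa(user + ":" + password);
  }
  return {
    url: host + "/" + encodeURIComponent(database),
    init: { method: "POST", headers: headers, body: JSON.stringify(doc) },
  };
}

// Sends the request; CouchDB replies with a body such as {"ok":true,"id":...,"rev":...}.
async function createDocument(host, database, doc, user, password) {
  const { url, init } = buildCreateRequest(host, database, doc, user, password);
  const response = await fetch(url, init);
  return response.json();
}

const req = buildCreateRequest("http://mick.couchone.com", "addressbook",
  { firstName: "John", lastName: "Smith" }, "username", "password");
console.log(req.init.method, req.url); // POST http://mick.couchone.com/addressbook
```

Because the document is sent with POST, CouchDB generates the _id itself, as the next example shows.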

All data in CouchDB is stored in JSON format together with a unique key and a revision number, so the resulting document in CouchDB would have _id and _rev values as follows:

    {
        "_id": "760cd53c55a93497067f90d6242fc25e",
        "_rev": "1-91bce055fc8db86480400321079f0834",
        "firstName": "John",
        "lastName": "Smith",
        "age": 25,
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            { "type": "home", "number": "212 555-1234" },
            { "type": "fax", "number": "646 555-4567" }
        ]
    }

This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator, http://jsonlint.com.


Figure 4.1: JSONLint

The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:

    curl -H "Content-Type: application/json" \
         -X PUT \
         http://username:password@mick.couchone.com/addressbook/british-council-zambia \
         -d @british-council-zambia.json

This is the resulting document in CouchDB:

    {
        "_id": "british-council-zambia",
        "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
        "company": "British Council Zambia",
        "address": {
            "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
            "city": "Lusaka",
            "country": "Zambia"
        }
    }

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application written in JavaScript or some other programming language.
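A PUT to a named document can be sketched in JavaScript in the same way. The helper buildPutRequest is hypothetical. It URL-encodes the document id, which matters for ids containing spaces, such as "Aberystwyth University" used later in this report. Credentials are omitted here.

```javascript
// Sketch: with PUT the document id becomes the last segment of the URL.
function buildPutRequest(host, database, id, doc) {
  return {
    url: host + "/" + encodeURIComponent(database) + "/" + encodeURIComponent(id),
    init: {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(doc),
    },
  };
}

const zambia = buildPutRequest("http://mick.couchone.com", "addressbook",
  "british-council-zambia", { company: "British Council Zambia" });
console.log(zambia.url); // http://mick.couchone.com/addressbook/british-council-zambia

const aber = buildPutRequest("http://mick.couchone.com", "universities", "Aberystwyth University", {});
console.log(aber.url);   // http://mick.couchone.com/universities/Aberystwyth%20University
```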

Figure 4.3: Document viewed using Futon


4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (the 'query string' is the part of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

    curl -X DELETE \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
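The same DELETE request can be assembled programmatically. The helper below is hypothetical and uses the host, id and revision from the command above. URLSearchParams takes care of encoding the rev query parameter.

```javascript
// Sketch: the current revision travels in the query string of the DELETE.
function buildDeleteRequest(host, database, id, rev) {
  const query = new URLSearchParams({ rev: rev }).toString();
  return {
    url: host + "/" + encodeURIComponent(database) + "/" + encodeURIComponent(id) + "?" + query,
    init: { method: "DELETE" },
  };
}

const del = buildDeleteRequest("http://mick.couchone.com", "addressbook",
  "british-council-zambia", "1-da7bcd810c608d6fbcb9ce92e9ade343");
console.log(del.url);
// http://mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
```

Passing del.url and del.init to fetch() would send the request. Against the hosted instance it would also need the credentials, supplied as an Authorization header.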

4.6 Updating a Document

Similarly, when updating a document the revision value must be supplied, to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error (HTTP status 409) if an update is attempted on an 'old' version of a document.

As we are updating the document at a known _id, we use the PUT verb instead of POST.

The following command will update John Smith's record with the contents of the file john-smith-v2.json:

    curl -X PUT \
         "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
         -d @john-smith-v2.json
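A small in-memory model illustrates the revision check described above. It is a toy written for illustration, not CouchDB's server code. Revisions follow CouchDB's "N-suffix" shape, but the suffix here is a hypothetical counter rather than CouchDB's hash. The error object matches the body CouchDB returns with a 409 status.

```javascript
// Toy model of CouchDB's optimistic concurrency check (illustration only).
class RevisionedDb {
  constructor() {
    this.docs = new Map();
    this.counter = 0;
  }

  // An existing document can only be replaced by a caller that supplies its
  // current _rev; otherwise a conflict is reported instead of saving.
  put(id, doc) {
    const current = this.docs.get(id);
    if (current && doc._rev !== current._rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    const rev = generation + "-" + (++this.counter).toString(16);
    this.docs.set(id, { ...doc, _id: id, _rev: rev });
    return { ok: true, id: id, rev: rev };
  }
}

const db = new RevisionedDb();
const id = "760cd53c55a93497067f90d6242fc25e";
const v1 = db.put(id, { firstName: "John", lastName: "Smith", age: 25 });
const v2 = db.put(id, { firstName: "John", lastName: "Smith", age: 26, _rev: v1.rev }); // hypothetical edit
const stale = db.put(id, { firstName: "Johnny", _rev: v1.rev }); // still holds revision 1
console.log(v2.rev, stale.error); // 2-2 conflict
```

A client that receives the conflict would typically re-read the document, reapply its change and retry with the new _rev.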

4.7 Adding Attachments

CouchDB provides a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags.³

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

    curl -H "Content-Type: image/png" -X PUT \
         "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
         --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.


The uploaded image may be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag
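The attachment upload can be sketched in JavaScript too. The sketch assumes the URL layout of the cURL command above (/database/document-id/attachment-name?rev=...). The helper name is hypothetical, and a short byte array stands in for the contents of zm.png.

```javascript
// Sketch: an attachment is PUT beneath its document, with the document's
// current revision and the file's own content type.
// In Node the bytes would normally come from fs.readFileSync("zm.png").
function buildAttachmentRequest(host, database, docId, name, rev, bytes, contentType) {
  const path = [database, docId, name].map((part) => encodeURIComponent(part)).join("/");
  return {
    url: host + "/" + path + "?rev=" + encodeURIComponent(rev),
    init: { method: "PUT", headers: { "Content-Type": contentType }, body: bytes },
  };
}

// Stand-in for the real file: just the 8-byte PNG signature.
const pngBytes = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const att = buildAttachmentRequest("http://mick.couchone.com", "addressbook", "british-council-zambia",
  "flag", "3-e6fbb83f5aa5c2e2111b5b14559115af", pngBytes, "image/png");
console.log(att.url);
// http://mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af
```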

4.8 Replication

One particularly interesting and useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want a replica of the addressbook on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

    curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

    curl -H "Content-Type: application/json" \
         -X POST http://127.0.0.1:5984/_replicate \
         -d '{"source":"http://username:password@mick.couchone.com/addressbook",
              "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping them where necessary:

    curl -H "Content-Type: application/json" ^
         -X POST http://127.0.0.1:5984/_replicate ^
         -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
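When replication is triggered from a program rather than a shell, JSON.stringify produces the request body, so the shell-specific quoting shown above for Windows does not arise. The sketch below uses the source and target URLs from the text. The helper name is hypothetical.

```javascript
// Sketch: the _replicate request, with the body built by JSON.stringify.
function buildReplicationRequest(localHost, source, target) {
  return {
    url: localHost + "/_replicate",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ source: source, target: target }),
    },
  };
}

const repl = buildReplicationRequest(
  "http://127.0.0.1:5984",
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(repl.init.body);
// {"source":"http://username:password@mick.couchone.com/addressbook","target":"http://127.0.0.1:5984/addressbook"}
```

Passing repl.url and repl.init to fetch() would issue the same one-off replication request as the cURL commands.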

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to querying the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google's 2004 paper [23], which describes how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages. The 'map' stage ranges over the document collection and emits key/value pairs for each document it encounters. The 'reduce' stage is used when an aggregate calculation over those intermediate values is required.

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb) I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

    [http://localhost:5984/election-2005/Aberavon]

    {
        "_id": "Aberavon",
        "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
        "mp": "Hywel Francis",
        "electorate": 51079,
        "con": 3062,
        "lab": 18077,
        "lib": 4138,
        "pc": 3546,
        "oth": 1278
    }

CouchDB can hold two types of document. The most common type contains data, such as the document above showing the electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document 'con' has been defined with two 'views' of the data.

The first view, 'Conservative Votes', simply ranges over the document collection in a map function, emitting each constituency's Conservative vote as the key and the document's _id as the value.

The second view, 'Conservative Votes Total', combines a map function with a reduce function to output the sum of all Conservative votes. This is analogous to SELECT SUM(con) in SQL; per-key totals, as with GROUP BY, can be requested through the view's group option.

    [http://localhost:5984/election-2005/_design/con]

    {
        "_id": "_design/con",
        "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
        "language": "javascript",
        "views": {
            "Conservative Votes": {
                "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
            },
            "Conservative Votes Total": {
                "map": "function(doc) {\n  emit(null, doc.con);\n}",
                "reduce": "function(keys, values) {\n  return sum(values);\n}"
            }
        }
    }

The sum of Conservative votes (8,782,198) is returned as the result of the following request:

    [http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

    {"rows":[{"key":null,"value":8782198}]}
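The two stages can be reproduced outside CouchDB with a small harness. The harness is an assumption for illustration, not CouchDB's view server, but it evaluates the function sources exactly as stored in the 'con' design document with emit() and sum() in scope. Only the Aberavon figure comes from the data; the other two constituencies are hypothetical.

```javascript
// Sources exactly as stored in the 'con' design document.
const views = {
  "Conservative Votes": { map: "function(doc) {\n  emit(doc.con, doc._id);\n}" },
  "Conservative Votes Total": {
    map: "function(doc) {\n  emit(null, doc.con);\n}",
    reduce: "function(keys, values) {\n  return sum(values);\n}",
  },
};

// Stand-in for the sum() built-in that CouchDB provides to JavaScript views.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// Evaluate a stored source string with emit() and sum() in scope.
function compile(source, emit) {
  return new Function("emit", "sum", "return (" + source + ");")(emit, sum);
}

// Map every document, sort rows by key (simple comparison; CouchDB has its own
// collation rules), then reduce all rows in one pass. CouchDB may also re-reduce
// partial results, which sum() handles equally well.
function queryView(docs, view) {
  const rows = [];
  let currentId = null;
  const map = compile(view.map, (key, value) => rows.push({ id: currentId, key: key, value: value }));
  for (const doc of docs) {
    currentId = doc._id;
    map(doc);
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  if (!view.reduce) return { rows: rows };
  const reduce = compile(view.reduce, null);
  const value = reduce(rows.map((r) => [r.key, r.id]), rows.map((r) => r.value));
  return { rows: [{ key: null, value: value }] };
}

const docs = [
  { _id: "Aberavon", con: 3062 },
  { _id: "Constituency A", con: 1500 },  // hypothetical
  { _id: "Constituency B", con: 25000 }, // hypothetical
];
console.log(JSON.stringify(queryView(docs, views["Conservative Votes Total"])));
// {"rows":[{"key":null,"value":29562}]}
```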

It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000:

    [http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startKey=1000&endKey=2000]

Figure 4.5: WHERE clause in CouchDB
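A range query over view rows can be sketched as follows. CouchDB documents these query parameters as lowercase startkey and endkey. Because the rows of a view are kept sorted by key, a key range selects a contiguous slice. The sketch finds that slice with binary search over an in-memory array; CouchDB walks its B-tree instead. Only the Aberavon figure comes from the data, and the other rows are made up.

```javascript
// View rows sorted by key, as the 'Conservative Votes' view would hold them.
const rows = [
  { key: 3062, id: "Aberavon" },
  { key: 1200, id: "Constituency A" }, // hypothetical
  { key: 1999, id: "Constituency B" }, // hypothetical
  { key: 950, id: "Constituency C" },  // hypothetical
].sort((a, b) => a.key - b.key);

// Index of the first row whose key is >= target (numeric keys only).
function lowerBound(sorted, target) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Index of the first row whose key is > target.
function upperBound(sorted, target) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid].key <= target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Both ends inclusive, matching CouchDB's default (inclusive_end=true).
function rangeQuery(sorted, startkey, endkey) {
  return sorted.slice(lowerBound(sorted, startkey), upperBound(sorted, endkey));
}

console.log(rangeQuery(rows, 1000, 2000).map((r) => r.id).join(", ")); // Constituency A, Constituency B
```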

This method of retrieving data is certainly cumbersome compared to using SQL. One criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that each map-reduce function written against a document collection in CouchDB is backed by its own B-Tree index, built when the view is first queried. From that point on, the index is updated incrementally. On each subsequent query, only documents added or changed since the previous query are passed through the map function.

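As a result, retrieval through a stored view is optimised for speed. Such a query cannot degenerate into a 'table scan' at run-time, because every read is an indexed lookup. (Temporary views, intended for development, are the exception, since they run the map function over the whole database on each request.)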
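A toy model can illustrate incremental view maintenance. It is an illustration only: CouchDB keeps the rows in a B-tree on disk rather than in a Map, and its map functions call a built-in emit rather than receiving one as an argument. The database bumps an update sequence on each save. The view remembers the last sequence it indexed and, when queried, maps only the documents changed since then. Class names and the two extra constituencies are hypothetical, and deletions are not modelled.

```javascript
// Toy database with an update sequence.
class ToyDatabase {
  constructor() {
    this.seq = 0;           // bumped on every save
    this.byId = new Map();  // id -> { seq, doc }
  }
  save(id, doc) {
    this.seq += 1;
    this.byId.set(id, { seq: this.seq, doc: doc });
  }
  changesSince(since) {
    return [...this.byId.entries()]
      .filter(([, entry]) => entry.seq > since)
      .map(([id, entry]) => ({ id: id, doc: entry.doc }));
  }
}

// Toy view: brought up to date lazily, when it is queried.
class ToyView {
  constructor(db, map) {
    this.db = db;
    this.map = map;            // (doc, emit) => ..., a simplification
    this.indexedSeq = 0;       // last sequence already reflected in the index
    this.rowsById = new Map(); // id -> rows emitted for that document
    this.mapCalls = 0;
  }
  query() {
    for (const { id, doc } of this.db.changesSince(this.indexedSeq)) {
      const rows = [];
      this.map(doc, (key, value) => rows.push({ key: key, id: id, value: value }));
      this.mapCalls += 1;
      this.rowsById.set(id, rows); // replaces rows from any older version
    }
    this.indexedSeq = this.db.seq;
    return [...this.rowsById.values()].flat().sort((a, b) => a.key - b.key);
  }
}

const db = new ToyDatabase();
const view = new ToyView(db, (doc, emit) => emit(doc.con, doc._id));
db.save("Aberavon", { _id: "Aberavon", con: 3062 });
db.save("Constituency A", { _id: "Constituency A", con: 1500 });   // hypothetical
view.query();                                                      // maps 2 documents
db.save("Constituency B", { _id: "Constituency B", con: 25000 });  // hypothetical
console.log(view.query().length, view.mapCalls); // 3 3: only the new document was mapped
```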

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.

There is a more advanced way to write applications for CouchDB, using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents. It stores HTML and JavaScript application code in so-called 'design documents', which are documents holding application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The 'universities' data set is a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data from the project database at work. The data was exported into a set of JSON files, which I then transferred to my PC at home, which runs Ubuntu Linux.


Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the following bash script.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect could be achieved with a batch file.)

    #!/bin/bash

    #host=http://127.0.0.1:5984/
    host=http://username:password@mick.couchone.com/
    database=universities
    folder=universities
    fileextension=.json
    FILES=$folder/*

    # create the database
    curl -X PUT "$host$database"

    for filepath in $FILES
    do
        echo "$filepath"
        # get the file name from the file path
        filename=$(basename "$filepath")
        echo "$filename"
        # strip the extension; the remaining name is the (already URL-encoded) document id
        docname=$(basename "$filepath" "$fileextension")
        echo "$docname"
        url=$host$database/$docname
        echo "$url"
        # put the document into CouchDB
        echo curl -X PUT "$url" -d @"$filepath"
        curl -X PUT "$url" -d @"$filepath"
    done

Note that we are using HTTP PUT with cURL to 'put' each document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)
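One alternative to issuing a PUT per file is to send all the documents in one POST to CouchDB's _bulk_docs endpoint, whose body has the form {"docs": [...]}. The sketch below uses that technique rather than the per-file method described above. It assumes the file-naming convention implied by the single-document command that follows, where the file name is the URL-encoded document id plus ".json". The helper name and the sample record are hypothetical.

```javascript
// Sketch: turn a set of JSON files into one _bulk_docs request.
function buildBulkRequest(host, database, files) {
  const docs = Object.entries(files).map(([fileName, text]) => {
    const doc = JSON.parse(text);
    // "Aberystwyth%20University.json" -> _id "Aberystwyth University"
    doc._id = decodeURIComponent(fileName.replace(/\.json$/, ""));
    return doc;
  });
  return {
    url: host + "/" + encodeURIComponent(database) + "/_bulk_docs",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ docs: docs }),
    },
  };
}

// In Node the files object could be filled using fs.readdirSync and fs.readFileSync;
// the record below is a made-up sample.
const bulk = buildBulkRequest("http://127.0.0.1:5984", "universities", {
  "Aberystwyth%20University.json": '{"latitude": 52.41, "longitude": -4.08}',
});
console.log(JSON.parse(bulk.init.body).docs[0]._id); // Aberystwyth University
```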

To put a single document to the web, the command would be as follows:

    curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
         -d @universities/Aberystwyth%20University.json

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document whose _id begins with _design/ [12].

For the first design document I have chosen an _id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.

    {
        "_id": "_design/d1",
        "shows": {
            "s1": "function(doc, req) {
                return '<h1>' + doc._id + '</h1>' +
                       '<p>latitude: ' + doc.latitude + '</p>' +
                       '<p>longitude: ' + doc.longitude + '</p>';
            }"
        }
    }

Note that shows is a reserved field name in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req. doc refers to the document being rendered. req refers to the request object and can be used, for example, to retrieve data passed in the URL's query string.
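Because a show function is ordinary JavaScript, it can be tried out locally before it is uploaded. The sketch keeps s1 to ES5 syntax so that the same source could be stored in the design document. It calls the function with a sample document and an empty object standing in for req. The coordinates are illustrative values, not the project's data.

```javascript
// The s1 show function in ES5 syntax, as it would be stored in the design document.
var s1 = function (doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
};

// Local check with a sample document and an empty stand-in for the request object.
var sample = { _id: "Aston University", latitude: 52.4862, longitude: -1.8904 };
console.log(s1(sample, {}));
// <h1>Aston University</h1><p>latitude: 52.4862</p><p>longitude: -1.8904</p>
```

The field values are concatenated into the HTML without escaping. A document field containing < or & would therefore be interpreted as markup, which matters if the data ever comes from untrusted sources.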

Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:

    curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB can serve HTML pages as well as storing data.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB

• universities is the database

• _design/d1 is the design document

• _show/s1 is the 'show' function that returns an HTML string

• Aston%20University is the _id of the document to be rendered
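The URL parts listed above can be assembled by a hypothetical helper. Each part is URL-encoded, which is how the document id "Aston University" becomes "Aston%20University".

```javascript
// Hypothetical helper assembling a show URL from its parts.
function showUrl(host, database, designDoc, showName, docId) {
  return [
    host,
    encodeURIComponent(database),
    "_design",
    encodeURIComponent(designDoc),
    "_show",
    encodeURIComponent(showName),
    encodeURIComponent(docId),
  ].join("/");
}

console.log(showUrl("http://mick.couchone.com", "universities", "d1", "s1", "Aston University"));
// http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University
```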


5.3 A more complex 'show' function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the 'show' function which returns the HTML for the map is map1.

The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html

The full listing of the design document, including the map1 'show' function, is provided in Appendix A.

It quickly becomes apparent that authoring complex HTML and JavaScript inside a design document, so that it is returned as a string from a JavaScript function, is very cumbersome. For example, every double quote has to be escaped (as \") so that the function source remains a valid JSON string.

When authoring design documents by hand, it is sometimes easier to enter text directly into the design document using CouchDB's management console (Futon) than to upload documents from the desktop PC using cURL. However, the process is difficult and unintuitive either way.
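One way around the hand-escaping is to keep the show function as ordinary JavaScript and let JSON.stringify do the escaping when the design document is assembled. This is, in essence, the job CouchApp automates, as the next section shows. The map1 below is a shortened, hypothetical stand-in for the Appendix A version.

```javascript
// A shortened, hypothetical map1; written as normal code, with plain double quotes.
const map1 = function (doc, req) {
  return '<div id="map_canvas" data-lat="' + doc.latitude +
    '" data-lng="' + doc.longitude + '"></div>';
};

const designDoc = {
  _id: "_design/d1",
  shows: {
    map1: map1.toString(), // design documents hold functions as source strings
  },
};

const json = JSON.stringify(designDoc, null, 2);
// The double quotes inside the HTML appear in the JSON as \" with no manual editing.
console.log(json.includes('\\"map_canvas\\"')); // true
```

The resulting JSON text could be saved as d1.json and uploaded with the same cURL PUT used earlier in this section.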


Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document such queries are listed in the views section and are known as CouchDB 'view' functions.

Here we define a very simple 'view' function, which outputs the _id field for each document in the collection:

    {
        "_id": "_design/d1",
        "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
        "shows": { ... },
        "lists": { ... },
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id); }"
            }
        }
    }


The output from the 'view' function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple 'view' returning document _id values

A 'view' is in effect a query function which returns a set of results. To display the results as HTML, we may define 'list' functions in a CouchDB design document:

• v1 is a very simple view, simply emitting the document ids

• l1 is a very simple list that shows a hyperlink for each document id

We augment the design document with a 'list' function as follows:

    {
        "_id": "_design/d1",
        "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
        "shows": { ... },
        "lists": {
            "l1": "function(head, req) {
                var row;
                start({ headers: { 'Content-Type': 'text/html' } });
                while (row = getRow()) {
                    send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value + '</a></p>');
                }
            }"
        },
        "views": {
            "v1": {
                "map": "function(doc) { emit(doc._id, doc._id); }"
            }
        }
    }


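Note the built-in functions start and send (getRow, also built in, fetches the next row of the view result):

• start is called once, at the beginning of the function, to set the response headers

• send is called once for each row in the dataset, to output that row's HTML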

At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1
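The list function can also be exercised locally. The harness below is hypothetical. It re-evaluates the function's source with stand-ins for the start, getRow and send built-ins that CouchDB supplies, then collects what the function sends. The rows are shaped like the output of view v1, where key and value are both the document _id.

```javascript
// l1 as in the design document (ES5); start, getRow and send are deliberately
// undefined here, because CouchDB provides them when it runs the function.
var l1 = function (head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + '</a></p>');
  }
};

// Hypothetical harness: supplies the three built-ins and captures the response.
function runList(listFn, viewRows) {
  const chunks = [];
  let response = null;
  let next = 0;
  const start = (r) => { response = r; };
  const getRow = () => (next < viewRows.length ? viewRows[next++] : null);
  const send = (chunk) => { chunks.push(chunk); };
  const compiled = new Function("start", "getRow", "send",
    "return (" + listFn.toString() + ");")(start, getRow, send);
  compiled({ total_rows: viewRows.length, offset: 0 }, {});
  return { headers: response.headers, body: chunks.join("") };
}

const out = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
  { id: "Aberystwyth University", key: "Aberystwyth University", value: "Aberystwyth University" },
]);
console.log(out.headers["Content-Type"]); // text/html
console.log(out.body);
```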

Figure 5.7: The simple 'view' rendered using a 'list' function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload them to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

Working through this exercise in simple application development has given us a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.
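GeoRSS output can be sketched using the GeoRSS-Simple encoding. In that encoding, a georss:point element holds the latitude and then the longitude, separated by a space. The sketch assumes documents with latitude and longitude fields, as in the universities data, and includes only the elements needed to show the idea. A valid Atom feed also needs id, updated and author information. The function names and coordinates are illustrative.

```javascript
// Escape text for inclusion in XML element content or attributes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One Atom entry with a GeoRSS-Simple point: "latitude longitude".
function geoRssEntry(doc) {
  return "<entry><title>" + escapeXml(doc._id) + "</title>" +
    "<georss:point>" + doc.latitude + " " + doc.longitude + "</georss:point></entry>";
}

// Minimal feed wrapper declaring the Atom and GeoRSS namespaces.
function geoRssFeed(title, docs) {
  return '<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">' +
    "<title>" + escapeXml(title) + "</title>" + docs.map(geoRssEntry).join("") + "</feed>";
}

// Illustrative coordinates, not taken from the project data.
console.log(geoRssFeed("Universities", [{ _id: "Aston University", latitude: 52.4862, longitude: -1.8904 }]));
```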

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to make complex design documents easier to author and deploy [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer's perspective is that it keeps each element of the design document in a separate file. Seen this way, a CouchDB design document is in fact a web application project, and CouchApp keeps its elements separate so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available at http://github.com/jchris/sofa.


I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available at http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is at http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available at http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3

Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by putting each part of the design document in a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document called bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains a map.js file and, when needed, a reduce.js file for each CouchDB View. A View can be thought of as a query against the database.

• The shows folder contains a .js file for each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.

• Similarly, the lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
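The mapping from folders to design document fields can be pictured with a short sketch. The JavaScript below is an illustration written for this report, not CouchApp’s own code (which is Python): it folds a hypothetical in-memory stand-in for the bc folder, the `files` object, into one design document, using each file’s path, minus its extension, as the nested field name.

```javascript
// Sketch only: fold a CouchApp-style folder tree into a single design document.
// `files` is a hypothetical in-memory stand-in for the files on disk.
function buildDesignDoc(name, files) {
  const ddoc = { _id: "_design/" + name };
  for (const [path, source] of Object.entries(files)) {
    // Strip the extension, then split into folder levels:
    // "views/v1/map.js" -> ["views", "v1", "map"]
    const parts = path.replace(/\.[^./]+$/, "").split("/");
    let node = ddoc;
    // Create one nested object per folder level.
    for (const part of parts.slice(0, -1)) {
      node[part] = node[part] || {};
      node = node[part];
    }
    // Design documents hold function source code as plain strings.
    node[parts[parts.length - 1]] = source.trim();
  }
  return ddoc;
}

const ddoc = buildDesignDoc("bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc); }",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { var row; while (row = getRow()) { send(row.id); } }",
  "templates/georss.html": "<feed>...</feed>",
});
console.log(JSON.stringify(ddoc, null, 2));
```

Under this scheme templates/georss.html becomes the field templates.georss, which is how the List function in Appendix C refers to its template.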

Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.

• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.

• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1

The command couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it into HTML.
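As an illustration, a filled-in s1.js could look like the sketch below, which mirrors the s1 function listed in Appendix A. The function is given a name here only so that it can be called outside CouchDB; the file itself would contain an anonymous function(doc, req). The sample document and its coordinates are hypothetical values.

```javascript
// Sketch of a show function equivalent to s1 in Appendix A. CouchDB calls a
// show function with one document and the request, and sends back the
// returned string. The name s1 is given only so it can be called directly.
function s1(doc, req) {
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// Hypothetical sample document; outside CouchDB the request can be an empty object.
const html = s1({ _id: "University of Aberdeen", latitude: 57.16, longitude: -2.1 }, {});
console.log(html);
```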

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function, l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js. We recall that in CouchDB a View is in effect a query against the database.

As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It’s important to do this: if we supply an empty reduce function, the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed

6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document to the CouchDB server using the correct HTTP commands.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This demonstrates how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.
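CouchApp’s push command performs this upload for us. As a rough sketch of what it has to send in the simplest case (this is not CouchApp’s code, which is Python), the request below describes a single HTTP PUT of the merged design document to <database>/_design/<name>. The helper name buildPushRequest is hypothetical, 127.0.0.1:5984 stands in for the server named in .couchapprc, and no network call is made.

```javascript
// Sketch: describe the HTTP request that uploads a design document.
// CouchDB stores design documents at <db>/_design/<name>; updating an
// existing one requires its current _rev to be included in the body.
function buildPushRequest(dbUrl, ddoc) {
  return {
    method: "PUT",
    url: dbUrl.replace(/\/$/, "") + "/" + ddoc._id, // _id is "_design/<name>"
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(ddoc),
  };
}

const req = buildPushRequest("http://127.0.0.1:5984/universities/", {
  _id: "_design/bc",
  views: { all: { map: "function(doc) { emit(doc._id, doc); }" } },
});
console.log(req.method, req.url);
// With Node 18+ the same object could be passed to fetch(req.url, req).
```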

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple View, all. This consists of a map.js file, generated as follows (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all
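What ‘outputs the content of all the documents’ means in practice can be shown with a small emulation. The sketch below is not how CouchDB evaluates views (CouchDB runs map functions in its JavaScript view server, where emit is a global, and stores the results in a B-tree index); it only reproduces the shape of a view response: a total_rows count, an offset, and rows of id, key and value sorted by key. The two documents are abbreviated hypothetical examples.

```javascript
// Minimal in-memory emulation of a map-only CouchDB view.
function runView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // emit() records one row for the document currently being mapped.
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  // Plain string comparison stands in for CouchDB's collation rules.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, offset: 0, rows };
}

// The "all" map function, with emit passed in explicitly for the emulation.
const all = (doc, emit) => emit(doc._id, doc);

const result = runView(
  [
    { _id: "University of St Andrews", region: "Scotland" },
    { _id: "University of Aberdeen", region: "Scotland" },
  ],
  all
);
console.log(result.rows.map(r => r.key));
```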

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss

This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
        &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if a programme has a list of countries, the flags of those countries are emitted.

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
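The nested sections in the template amount to two loops. The following plain JavaScript sketch does not use Mustache; it is only meant to make the iteration explicit by building the content of one entry. FLAG_BASE and entryContent are hypothetical names, and the real template additionally writes the markup entity-escaped inside the Atom content element. The sample programmes follow the St Andrews document shown in Appendix D.

```javascript
// Plain-JavaScript sketch of the template's nested sections for one institution:
// one paragraph per programme, one flag image per country in that programme.
const FLAG_BASE = "http://mick.couchone.com/countries/"; // countries database from the text

function entryContent(institution) {
  const programmes = institution.programmes || []; // has_programmes is false when absent
  return programmes.map(programme => {
    const flags = (programme.countries || []).map(country =>
      '<img src="' + FLAG_BASE + country + '/flag" alt="' + country +
      '" title="' + country + '">'
    ).join("");
    return "<p>" + programme.name + flags + "</p>";
  }).join("");
}

const html = entryContent({
  _id: "University of St Andrews",
  programmes: [
    { name: "Chevening Programme", countries: ["id", "mz", "tj"] },
    { name: "Commonwealth Scholarship and Fellowship Plan (CSFP)", countries: ["bd", "za"] },
  ],
});
console.log(html);
```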

The GeoRSS feed is available at http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed at http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all

Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping Project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.
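CouchDB exposes replication through its ordinary HTTP API: a POST to /_replicate whose JSON body names a source and a target database. Because design documents are ordinary documents, the same request carries application code along with the data. The sketch below only builds such a request; buildReplicationRequest is a hypothetical helper and the host names are placeholders.

```javascript
// Sketch of a replication request. Either end may be a local database name
// or the full URL of a remote database.
function buildReplicationRequest(serverUrl, source, target, continuous = false) {
  const body = { source, target };
  if (continuous) body.continuous = true; // keep replicating as changes arrive
  return {
    method: "POST",
    url: serverUrl.replace(/\/$/, "") + "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Pull the remote database (data and design documents alike) to a local copy.
const pull = buildReplicationRequest(
  "http://127.0.0.1:5984",
  "http://example.com:5984/universities",
  "universities"
);
console.log(pull.url, pull.body);
```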

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication of code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, for CouchDB, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.
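Because every operation is an HTTP request, the basic document operations reduce to little more than a method and a URL. The sketch below builds those requests; the helper names are hypothetical and the server address is a placeholder. Each request could equally be issued with cURL, for example curl -X GET followed by the URL.

```javascript
// Sketch: core CouchDB document operations expressed as plain HTTP requests.
// Document IDs are URL-encoded because IDs such as "University of Aberdeen"
// contain spaces.
function docUrl(server, db, id) {
  return server + "/" + db + "/" + encodeURIComponent(id);
}

const crud = {
  createDb: (server, db) => ({ method: "PUT", url: server + "/" + db }),
  read: (server, db, id) => ({ method: "GET", url: docUrl(server, db, id) }),
  // Creating and updating both use PUT; an update must carry the current _rev.
  save: (server, db, doc) => ({
    method: "PUT",
    url: docUrl(server, db, doc._id),
    body: JSON.stringify(doc),
  }),
  // Deletion names the revision being deleted in the query string.
  remove: (server, db, id, rev) => ({
    method: "DELETE",
    url: docUrl(server, db, id) + "?rev=" + encodeURIComponent(rev),
  }),
};

const server = "http://127.0.0.1:5984";
console.log(crud.read(server, "universities", "University of Aberdeen").url);
// -> http://127.0.0.1:5984/universities/University%20of%20Aberdeen
```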

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost as a beta service at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL;

• understanding the structure of design documents;

• the use of server-side JavaScript functions to transform documents to HTML and XML.

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.

CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database must be defined in advance as a map-reduce function in a design document. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.
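To illustrate the difference in thinking, here is a familiar SQL aggregate rewritten as a map and a reduce function. The grouping that CouchDB performs when a view is queried with group=true is emulated by a hypothetical helper, queryGrouped; the reduce signature (keys, values, rereduce) follows CouchDB’s convention, although the rereduce case never arises in this small emulation. The documents are sample data using the region field from this project.

```javascript
// SQL:  SELECT region, COUNT(*) FROM institutions GROUP BY region;
// The same question as a CouchDB-style map/reduce pair. In a real design
// document the reduce could simply be the built-in "_count"; it is written
// out here so that the sketch runs on its own.
const map = (doc, emit) => {
  if (doc.region) emit(doc.region, 1); // one row per institution, keyed by region
};
// CouchDB passes keys as [key, docId] pairs; only the values matter here.
const reduce = (keys, values, rereduce) => values.reduce((a, b) => a + b, 0);

// Hypothetical stand-in for querying the view with group=true:
// map every document, then reduce the values collected for each distinct key.
function queryGrouped(docs) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push({ keyPair: [key, doc._id], value });
    });
  }
  return [...groups.keys()].sort().map(key => {
    const emitted = groups.get(key);
    return {
      key,
      value: reduce(emitted.map(e => e.keyPair), emitted.map(e => e.value), false),
    };
  });
}

const rows = queryGrouped([
  { _id: "University of St Andrews", region: "Scotland" },
  { _id: "University of Aberdeen", region: "Scotland" },
  { _id: "Cardiff University", region: "Wales" },
]);
console.log(rows);
```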

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years, relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online: http://mick.couchone.com/_utils/document.html?universities/_design/d1

It contains two ‘show’ functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, and that function is itself stored as a string inside the JSON design document. This means that all double quotes, for example, need to be backslash-escaped, which is very tedious to do by hand.
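The escaping problem is easy to reproduce. In the sketch below, a cut-down, hypothetical version of map1 is stored in a design document as a string, which is how CouchDB holds functions, and JSON.stringify shows the text that would have to be typed by hand: every double quote inside the HTML gains a backslash.

```javascript
// A cut-down, hypothetical version of map1: note the double quotes in the HTML.
function map1(doc, req) {
  return '<div id="map_canvas" style="width:100%; height:100%"></div>';
}

// CouchDB keeps functions inside the design document as strings.
const designDoc = {
  _id: "_design/d1",
  shows: { map1: map1.toString() },
};

// This is the JSON that would have to be written by hand:
// every " inside the function source becomes \" (and line breaks become \n).
const json = JSON.stringify(designDoc);
console.log(json);
```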

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\"><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson’s Sofa CouchApp project in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available at http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code (http://github.com/jchris/sofa). I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    }
    // close the loop after all rows are rendered
    return "</feed>";
  });
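The value placed in point is simply ‘latitude longitude’ separated by a space, which is the coordinate order GeoRSS-Simple uses. The commented-out line, which takes loc[1] before loc[0], implies that the original data held coordinates as [longitude, latitude]. The hypothetical helper below makes the rule explicit and also skips posts whose latitude or longitude field was left empty, something the one-line version in lists/index.js does not do.

```javascript
// Hypothetical helper: the text of a GeoRSS point, "latitude longitude".
function isMissing(v) {
  return v === undefined || v === null || v === "";
}

function georssPoint(value) {
  // Posts saved without coordinates get no point at all.
  if (isMissing(value.latitude) || isMissing(value.longitude)) return null;
  // Latitude first, then longitude, separated by a single space.
  return String(value.latitude).trim() + " " + String(value.longitude).trim();
}

// Values typed into the amended edit.html form are presumably stored as strings.
console.log(georssPoint({ latitude: "56.3412139443169", longitude: "-2.79301175308608" }));
// -> 56.3412139443169 -2.79301175308608
```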

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

// this is further down in templates/edit.html
// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : docid,
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the ‘stash’ to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
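The function above relies on the List.withRows helper and on Mustache. For comparison, the sketch below writes the same row-by-row idea directly against CouchDB’s list-function API (start, getRow and send) and runs it outside CouchDB under a small mock. Inside CouchDB those three functions are globals; here they are passed in. The names makeGeoRssList and runList are hypothetical, and XML-escaping of document IDs is omitted for brevity.

```javascript
// Sketch: the list-function pattern written directly against CouchDB's list API.
function makeGeoRssList({ start, getRow, send }) {
  return function (head, req) {
    start({ headers: { "Content-Type": "application/atom+xml" } });
    send('<feed xmlns="http://www.w3.org/2005/Atom">');
    let row;
    // getRow() returns the next view row, or nothing when the rows run out.
    while ((row = getRow())) {
      const id = row.value._id;
      send("<entry><id>" + id + "</id><title>" + id + "</title></entry>");
    }
    return "</feed>"; // the return value is sent as the final chunk
  };
}

// Hypothetical harness playing CouchDB's part: serve rows, collect output.
function runList(rows) {
  const out = [];
  let i = 0;
  const list = makeGeoRssList({
    start: () => {},
    getRow: () => rows[i++],
    send: chunk => out.push(chunk),
  });
  out.push(list({ total_rows: rows.length, offset: 0 }, {}));
  return out.join("");
}

console.log(runList([
  { id: "University of Aberdeen", key: "University of Aberdeen",
    value: { _id: "University of Aberdeen" } },
]));
```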

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
        &lt;p&gt;{{name}}
        {{#has_countries}}
        {{#countries}}
          &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
        {{/countries}}
        {{/has_countries}}
        &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload PNG files from a folder so that each image file becomes an attachment to its own CouchDB document, one document per country.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to the countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  # strip the .png extension to get the document name (the country code)
  docname=$(echo "$filename" | sed -e 's/\.png$//')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag:
  # curl -H "Content-Type:image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type:image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type:image/png" -X PUT "$url" --data-binary "@$filepath"
done
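The naming convention the script relies on (one document per country, whose ID is the file name without .png and whose attachment is called flag) can be restated in JavaScript. This is a sketch for illustration only; flagAttachmentUrl is a hypothetical name and the server address is a placeholder. As the comments in the script note, a PUT to an attachment URL on a document that does not yet exist makes CouchDB create the document as well.

```javascript
// Sketch of the naming rule used by the script: each countries/png/<code>.png
// file is stored as an attachment called "flag" on a document whose ID is
// the country code.
const path = require("path");

function flagAttachmentUrl(dbUrl, filepath) {
  const docname = path.basename(filepath, ".png"); // "countries/png/de.png" -> "de"
  return dbUrl.replace(/\/$/, "") + "/" + encodeURIComponent(docname) + "/flag";
}

console.log(flagAttachmentUrl("http://127.0.0.1:5984/countries", "countries/png/de.png"));
// -> http://127.0.0.1:5984/countries/de/flag
```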

Bibliography

[1] A NoSql Summer. A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O’Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.

[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O’Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang: An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file

[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes, MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars

[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2–11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master’s thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the ‘NoSQL’ Heat. http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Open%5FDatabase%5FConnectivity

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


touch what we’ve already written to disk. Erlang can support as many simultaneous readers as resources allow, which we’ve found in practice to be in the tens of thousands.” [11]

35 How Ubuntu uses CouchDB

The Ubuntu Linux operating system uses CouchDB as an on-board databasefor application settings This is similar to the way that the Microsoft Win-dows registry stores settings for programs running on a particular computer[43]

The innovative element about Ubuntursquos lsquoDesktop Couchrsquo is that it cantake advantage of CouchDBrsquos replication capabilities Desktop Couch cancommunicate every ten minutes with a CouchDB instance on an externalweb server known as the UbuntuOne service [5] Currently every Ubuntudesktop user can sign up for 2GB of free space on the UbuntuOne CouchDBserver CouchDB can also store files as well as application settings so thesystem can be used for secure off-site backup of important files

Client applications running on Ubuntu desktop may save their data toDesktop Couch For example the Firefox web browser may save websitebookmarks to Desktop Couch Then as a background process the book-marks are synchronised to the UbuntuOne server Subsequently when theuser goes to another machine the Desktop Couch on that machine may fetchthe latest bookmarks data from UbuntuOne As a result the data followsthe user around from machine to machine

In this way a user can store their contacts internet bookmarks musicfiles and so on in multiple places and have them constantly available [34]Since CouchDB is available for multiple platforms (Windows Mac Androidetc) the data can be made available across different devices not just Ubuntudesktop PCs A short (2 minute) video illustration of this is available here[19]

3.6 Desktop Couch, Python and Quickly

Desktop Couch is pre-installed on every Ubuntu desktop from version 9.10 onwards. The developers of Ubuntu have always been interested in lowering the barriers to entry for new programmers. To promote this, they have developed 'Quickly' [8], which provides a way for beginner programmers to get started using Python to create desktop applications.

Many programmers of my generation (I started coding in 1998) used Microsoft Visual Basic to get started, and Quickly is similar to this. The innovative aspect is that it uses Desktop Couch for persistence, so that an 'opportunistic' developer can be productive without needing to worry about database setup. The other advantage is that once data is stored in Desktop Couch, it is replicated to the UbuntuOne service, so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).

A wider aim of this MSc project is to investigate how ready CouchDB is for adoption in the community of 'corporate' web application developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that readers of this section sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.

4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers for Windows and Mac also exist at the same URL. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use 'cURL', which is a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same statement with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)
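For readers who would rather script these requests than type them, the following minimal JavaScript sketch builds the same two requests. It only produces the method and URL (using the standard WHATWG URL class available in Node and modern browsers), so it runs without a server; the helper name databaseRequest is hypothetical, and username and password remain placeholders.

```javascript
// Build the method and URL for creating (PUT) or deleting (DELETE) a CouchDB database.
// databaseRequest is a hypothetical helper; the credentials are placeholders.
function databaseRequest(method, host, database, username, password) {
  const url = new URL(host);
  url.username = username;                          // serialised as username:password@host
  url.password = password;
  url.pathname = "/" + encodeURIComponent(database);
  return { method, url: url.href };
}

const create = databaseRequest("PUT", "http://mick.couchone.com", "addressbook", "username", "password");
const remove = databaseRequest("DELETE", "http://mick.couchone.com", "addressbook", "username", "password");
console.log(create.method, create.url); // PUT http://username:password@mick.couchone.com/addressbook
```

Handing these values to any HTTP client would have the same effect as the cURL commands above.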

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

curl -H "Content-Type: application/json" \
     -X POST http://username:password@mick.couchone.com/addressbook \
     -d @john-smith.json

cURL syntax is explained here [44]. Briefly:

• -H introduces an HTTP header
• -X POST instructs cURL to use an HTTP POST request (instead of a GET)
• -d introduces the data to be sent
• @ denotes that what follows is a file name
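Each cURL flag above maps onto one part of an HTTP request. As a hedged illustration (not part of the original project code), this JavaScript sketch assembles the equivalent POST as a plain object: the header from -H, the method from -X, and the body that -d @john-smith.json would read from disk. The document literal is the John Smith example from Section 3; the function name postDocumentRequest is hypothetical. Passing these fields to an HTTP client (for example, the fetch function in Node 18 and later) would send the request; that step is left out so the sketch runs offline.

```javascript
// Assemble the HTTP request that `curl -H ... -X POST ... -d @john-smith.json` sends.
// postDocumentRequest is a hypothetical helper used only for illustration.
function postDocumentRequest(databaseUrl, doc) {
  return {
    method: "POST",                                   // -X POST
    url: databaseUrl,                                 // the database itself, no document id
    headers: { "Content-Type": "application/json" }, // -H
    body: JSON.stringify(doc),                        // -d @file (the file's contents)
  };
}

const johnSmith = {
  firstName: "John",
  lastName: "Smith",
  age: 25,
  address: { streetAddress: "21 2nd Street", city: "New York", state: "NY", postalCode: "10021" },
  phoneNumber: [
    { type: "home", number: "212 555-1234" },
    { type: "fax", number: "646 555-4567" },
  ],
};

const request = postDocumentRequest("http://username:password@mick.couchone.com/addressbook", johnSmith);
console.log(request.method, request.url);
```

Because the body carries no _id, CouchDB generates one, which is why the stored document below has a random-looking _id.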

All data in CouchDB is stored in JSON format, together with a unique key and a revision number, so in CouchDB the resulting document would have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator, http://jsonlint.com.

Figure 4.1: JSONLint

The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia

To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:

curl -H "Content-Type: application/json" \
     -X PUT http://username:password@mick.couchone.com/addressbook/british-council-zambia \
     -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon
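The choice between POST and PUT can be summarised in a few lines. In the hedged sketch below, a hypothetical helper, saveRequest, uses PUT to a named URL when the document carries its own _id and POST to the database otherwise. encodeURIComponent keeps ids that contain spaces or other reserved characters valid in the URL; design documents (_design/...) are a special case that this sketch does not handle.

```javascript
// Choose the HTTP verb as described above:
// PUT to /database/<id> when we choose the id, POST to /database when CouchDB should.
// Ordinary document ids only; _design/ ids are not handled here.
function saveRequest(databaseUrl, doc) {
  if (doc._id) {
    return { method: "PUT", url: databaseUrl + "/" + encodeURIComponent(doc._id) };
  }
  return { method: "POST", url: databaseUrl };
}

const db = "http://mick.couchone.com/addressbook";
console.log(saveRequest(db, { _id: "british-council-zambia", company: "British Council Zambia" }).url);
// http://mick.couchone.com/addressbook/british-council-zambia
console.log(saveRequest(db, { company: "No id yet" }).method); // POST
```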

4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database (the URL is quoted so that the shell does not interpret the ?):

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
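Assembling the query string by hand is error-prone once credentials or unusual ids are involved. As a minimal sketch, assuming the WHATWG URL class available in Node and modern browsers, the same DELETE URL can be built with searchParams; deleteRequest is a hypothetical name.

```javascript
// Build the DELETE request for a specific document revision.
// CouchDB deletes the document only if `rev` matches its current revision.
function deleteRequest(databaseUrl, docId, rev) {
  const url = new URL(databaseUrl + "/" + encodeURIComponent(docId));
  url.searchParams.set("rev", rev); // appended as ?rev=...
  return { method: "DELETE", url: url.href };
}

const del = deleteRequest(
  "http://mick.couchone.com/addressbook",
  "british-council-zambia",
  "1-da7bcd810c608d6fbcb9ce92e9ade343"
);
console.log(del.url);
// http://mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343
```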

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a "Document update conflict" error if an update is attempted on an 'old' version of a document.

As we are updating a 'named' document, we use the PUT verb instead of POST.

The following command will update John Smith's record with the contents of the file john-smith-v2.json, supplying the current revision of that record:

curl -X PUT "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
     -d @john-smith-v2.json
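The revision check is a form of optimistic concurrency control. To make the rule concrete, here is a small in-memory model written purely for illustration. It is not CouchDB's implementation, and its revision strings are simplified to a counter plus a random suffix rather than CouchDB's real revision hashes. An update succeeds only when the caller supplies the current _rev; otherwise it fails with status 409, mirroring CouchDB's "Document update conflict" response.

```javascript
// Toy model of CouchDB's revision check (optimistic concurrency), for illustration only.
// Revisions are "<n>-<random hex>", not CouchDB's real revision values.
class ToyDatabase {
  constructor() {
    this.docs = new Map();
  }

  put(id, body, rev) {
    const current = this.docs.get(id);
    // Updating an existing document requires its latest rev (a missing rev also conflicts).
    if (current && current._rev !== rev) {
      const err = new Error("Document update conflict.");
      err.status = 409;
      throw err;
    }
    const n = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    const newRev = n + "-" + Math.random().toString(16).slice(2, 10);
    this.docs.set(id, { ...body, _id: id, _rev: newRev });
    return newRev;
  }
}

const store = new ToyDatabase();
const rev1 = store.put("john-smith", { firstName: "John", age: 25 });
const rev2 = store.put("john-smith", { firstName: "John", age: 26 }, rev1); // accepted
try {
  store.put("john-smith", { firstName: "John", age: 27 }, rev1); // stale rev is rejected
} catch (e) {
  console.log(e.status, e.message); // 409 Document update conflict.
}
```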

4.7 Adding Attachments

CouchDB provides a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons³ from http://www.famfamfam.com/lab/icons/flags/.

The following command will upload the flag of Zambia (zm.png) as an attachment named flag to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
     "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
     --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.

The uploaded image may be viewed at http://mick.couchone.com/addressbook/british-council-zambia/flag
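An attachment lives at a URL one level below its document (/database/document/attachment-name) and, like any other write, it needs the document's current revision. The sketch below assembles that request in JavaScript. The helper name attachmentRequest and its small extension-to-MIME-type table are assumptions for illustration, not part of CouchDB or of the project code; the file's bytes (what --data-binary @zm.png sends) would form the request body.

```javascript
// Build the PUT request that stores a file as a named attachment on a document.
// The MIME table is a small illustrative subset, not an exhaustive mapping.
const MIME_TYPES = { ".png": "image/png", ".jpg": "image/jpeg", ".gif": "image/gif" };

function attachmentRequest(databaseUrl, docId, name, fileName, rev) {
  const ext = fileName.slice(fileName.lastIndexOf(".")).toLowerCase();
  const url = new URL(
    databaseUrl + "/" + encodeURIComponent(docId) + "/" + encodeURIComponent(name)
  );
  url.searchParams.set("rev", rev); // the document's current revision
  return {
    method: "PUT",
    url: url.href,
    headers: { "Content-Type": MIME_TYPES[ext] || "application/octet-stream" },
  };
}

const flag = attachmentRequest(
  "http://mick.couchone.com/addressbook",
  "british-council-zambia",
  "flag",
  "zm.png",
  "3-e6fbb83f5aa5c2e2111b5b14559115af"
);
console.log(flag.url, flag.headers["Content-Type"]);
```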

4.8 Replication

One particularly interesting and useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want a replica of the addressbook on our local machine, so that we can work on it in offline mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
     -X POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"http://username:password@mick.couchone.com/addressbook", "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping them where necessary (the command is shown on one line, since the Windows command prompt does not use \ for line continuation):

curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\", \"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
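Much of the quoting trouble above comes from typing JSON by hand inside a shell command. A hedged alternative, sketched in JavaScript, is to let JSON.stringify produce the _replicate body, which escapes quotes correctly regardless of platform. The source and target are the two databases used above; the function name replicationBody is hypothetical.

```javascript
// Produce the body for POST /_replicate without hand-escaping any quotes.
function replicationBody(source, target) {
  return JSON.stringify({ source, target });
}

const body = replicationBody(
  "http://username:password@mick.couchone.com/addressbook",
  "http://127.0.0.1:5984/addressbook"
);
console.log(body);
// {"source":"http://username:password@mick.couchone.com/addressbook","target":"http://127.0.0.1:5984/addressbook"}
```

Sending this string as the body of a POST to http://127.0.0.1:5984/_replicate, with a Content-Type: application/json header, is equivalent to the cURL commands above.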

4.9 Querying a CouchDB Database using Map-Reduce

So far, we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google's 2004 paper [23], describing how it is used to distribute calculations across large clusters of commodity machines. In effect, it works in two stages: the 'map' stage ranges over the document collection and emits zero or more key/value pairs for each document it encounters; the 'reduce' stage is used when a calculation over those intermediate values is required.
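The two stages can be imitated in a few lines of plain JavaScript. The sketch below is a simplified, single-process model, not CouchDB's view engine: it runs a map function over every document, collects whatever it emits, sorts the emitted rows by key (as CouchDB orders view rows), and optionally folds the values with a reduce function. In CouchDB, emit is a global provided to the map function; here it is passed in as a parameter for simplicity. The constituency documents are hypothetical sample values.

```javascript
// Simplified model of a CouchDB view: map every document, sort rows by key, then reduce.
function runView(docs, map, reduce) {
  const rows = [];
  for (const doc of docs) {
    // emit() collects key/value pairs for the document currently being mapped.
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    map(doc, emit);
  }
  // View rows are returned in key order; plain comparison is enough for numeric keys.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  if (!reduce) return rows;
  const keys = rows.map((r) => [r.key, r.id]);
  const values = rows.map((r) => r.value);
  return [{ key: null, value: reduce(keys, values) }];
}

// Hypothetical sample documents.
const docs = [
  { _id: "Constituency A", con: 3062 },
  { _id: "Constituency B", con: 1500 },
  { _id: "Constituency C", con: 12000 },
];

const byVotes = runView(docs, (doc, emit) => emit(doc.con, doc._id));
const total = runView(
  docs,
  (doc, emit) => emit(null, doc.con),
  (keys, values) => values.reduce((a, b) => a + b, 0)
);
console.log(byVotes.map((r) => r.id).join(", ")); // Constituency B, Constituency A, Constituency C
console.log(total[0].value); // 16562
```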

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
  "_id": "Aberavon",
  "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
  "mp": "Hywel Francis",
  "electorate": 51079,
  "con": 3062,
  "lab": 18077,
  "lib": 4138,
  "pc": 3546,
  "oth": 1278
}

CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document 'con' has been defined with two 'views' of the data.

The first view, 'Conservative Votes', simply ranges over the document collection in a map function, emitting each constituency's Conservative vote as the key and the document _id as the value.

The second view, 'Conservative Votes Total', combines a map function with a reduce function to output the sum of all Conservative votes (analogous to a SUM aggregate in SQL).

[http://localhost:5984/election-2005/_design/con]

{
  "_id": "_design/con",
  "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
  "language": "javascript",
  "views": {
    "Conservative Votes": {
      "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
    },
    "Conservative Votes Total": {
      "map": "function(doc) {\n  emit(null, doc.con);\n}",
      "reduce": "function(keys, values) {\n  return sum(values);\n}"
    }
  }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}

It is also possible to emulate a WHERE clause: the following request will return the constituencies where the Conservative vote was between 1000 and 2000 (CouchDB's range parameters are written in lower case, startkey and endkey):

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB

This method of retrieving data is certainly cumbersome compared to using SQL, and one criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every map-reduce view defined against a document collection in CouchDB is backed by a B-tree index, built the first time the view is queried. From that point on, the index is updated incrementally: when the view is next read, only the documents added or changed since the previous read are processed.

As a result, the entire system is optimised for fast indexed retrieval. A query against a saved view cannot perform a 'table scan' at run-time; every such read is an indexed lookup.
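The idea of a persistent, incrementally maintained index can be illustrated with a sorted array standing in for CouchDB's B-tree. In this hedged sketch (an illustration of the principle, not how CouchDB stores its indexes), new rows are inserted in key order using a binary search, and a startkey/endkey query locates its first row with another binary search and then reads forward, so it never scans the whole collection. Both endpoints are inclusive, as in CouchDB's default behaviour. The constituency letters and vote counts are hypothetical.

```javascript
// A sorted array standing in for a view's B-tree index (illustration only).
class SortedIndex {
  constructor() {
    this.rows = []; // { key, id }, kept ordered by key
  }

  // Binary search: first position whose key is >= the given key.
  lowerBound(key) {
    let lo = 0;
    let hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.rows[mid].key < key) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Incremental update: insert one row in place instead of rebuilding the index.
  // (splice is O(n) here; a real B-tree insert is O(log n).)
  insert(key, id) {
    this.rows.splice(this.lowerBound(key), 0, { key, id });
  }

  // Rows with startkey <= key <= endkey, like ?startkey=...&endkey=...
  range(startkey, endkey) {
    const result = [];
    for (let i = this.lowerBound(startkey); i < this.rows.length && this.rows[i].key <= endkey; i++) {
      result.push(this.rows[i]);
    }
    return result;
  }
}

const index = new SortedIndex();
// Hypothetical constituencies and Conservative vote counts.
[["A", 3062], ["B", 1500], ["C", 1999], ["D", 800], ["E", 2000]].forEach(([id, con]) => index.insert(con, id));
console.log(index.range(1000, 2000).map((r) => r.id).join(", ")); // B, C, E
```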

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB's suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called 'design documents'.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called 'universities' to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The 'universities' data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.

Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote a bash script, shown below.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect could be achieved using a batch file script.)

#!/bin/bash
# Upload every JSON file in a folder to CouchDB as a named document.

# host=http://127.0.0.1:5984/   (local instance; uncomment to use instead)
host=http://username:password@mick.couchone.com/
database=universities
folder=universities
fileextension=.json

# create the database
curl -X PUT "$host$database"

for filepath in "$folder"/*"$fileextension"
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(basename "$filepath")
  echo "$filename"
  # strip the extension to obtain the document id
  docname=${filename%"$fileextension"}
  echo "$docname"
  url="$host$database/$docname"
  echo "$url"
  # put the document into CouchDB
  echo curl -X PUT "$url" -d @"$filepath"
  curl -X PUT "$url" -H "Content-Type: application/json" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to 'put' the document at a meaningful URL. (POST is used when we wish CouchDB to assign a random ID to the document.)

To put a single document to the web, the command would be as follows:

curl -X PUT http://username:password@mick.couchone.com/universities/Aberystwyth%20University \
     -d @universities/Aberystwyth%20University.json
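The file-name handling in the script (strip the folder, strip the extension, append what remains to the database URL) can also be expressed in JavaScript with Node's built-in path module. This is a hedged sketch of the same logic rather than a replacement for the script, and documentUrlFor is a hypothetical name. Because the JSON files are already named with %20 in place of spaces, the id is appended unchanged, and CouchDB should decode it to 'Aberystwyth University'.

```javascript
const path = require("path");

// Mirror the bash script: folder/name.json -> <host><database>/name
function documentUrlFor(host, database, filepath, extension = ".json") {
  const filename = path.basename(filepath);           // strip the folder
  const docname = path.basename(filename, extension); // strip the extension
  return host + database + "/" + docname;
}

const url = documentUrlFor(
  "http://username:password@mick.couchone.com/",
  "universities",
  "universities/Aberystwyth%20University.json"
);
console.log(url); // http://username:password@mick.couchone.com/universities/Aberystwyth%20University
```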

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an _id that begins with _design/ [12].

For the first design document, I have chosen an _id of _design/d1. The aim of this very simple design document is to display (or 'show') a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single 'show' function, s1. The function returns a JavaScript string, which is rendered as output.

{
  "_id": "_design/d1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }"
  }
}

(Because JSON strings cannot contain line breaks, the whole function is stored on a single line.)

Note that shows is a reserved key in a CouchDB design document. The 'show' function s1 takes two parameters, doc and req: doc refers to the document being rendered; req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
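Since a show function is ordinary JavaScript, it can be exercised outside CouchDB by calling it with a document and a request object. The sketch below does exactly that in Node, with the s1 function from the listing written out as a normal function definition. The latitude and longitude values and the contents of req are hypothetical; in CouchDB, both arguments are supplied by the server.

```javascript
// The s1 show function from the design document, as a plain JavaScript function.
function s1(doc, req) {
  return '<h1>' + doc._id + '</h1>' +
    '<p>latitude: ' + doc.latitude + '</p>' +
    '<p>longitude: ' + doc.longitude + '</p>';
}

// Hypothetical document and request, standing in for what CouchDB would pass in.
const doc = { _id: "Aston University", latitude: 52.4862, longitude: -1.8904 };
const req = { query: {} };

console.log(s1(doc, req));
// <h1>Aston University</h1><p>latitude: 52.4862</p><p>longitude: -1.8904</p>
```

Note that the values are inserted without HTML escaping, which is acceptable for trusted data like this but would need attention for user-supplied content.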

Having saved the design document as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 'show' function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof of concept that CouchDB, as well as storing data, can serve HTML.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the 'show' function that returns an HTML string
• Aston%20University is the _id of the document to be rendered

5.3 A more complex 'show' function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the 'show' function which returns the HTML for the map is map1. The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html.

The full listing of the design document, including the map1 'show' function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, every double quote has to be escaped so that the returned string is valid.
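The escaping burden can be shifted to the machine. One hedged approach, similar in spirit to what tools such as CouchApp do (though this sketch is not CouchApp's code), is to write each function as real JavaScript, take its source with Function.prototype.toString, and let JSON.stringify produce the design document, escaping every double quote and newline automatically. The map1 body below is a hypothetical, shortened stand-in for the real listing in Appendix A.

```javascript
// Write the show function as normal code; JSON.stringify handles all the escaping.
function map1(doc, req) {
  // Hypothetical, shortened stand-in for the Google Maps page in Appendix A.
  return '<div id="map_canvas" data-lat="' + doc.latitude + '" data-lng="' + doc.longitude + '"></div>';
}

const designDoc = {
  _id: "_design/d1",
  shows: { map1: map1.toString() }, // the function's source text, stored as a string
};

const json = JSON.stringify(designDoc, null, 2); // could be saved as d1.json and uploaded
console.log(json);
```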

When authoring design documents manually, it can sometimes be easier to enter text directly into the design document using CouchDB's management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.

Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB 'view' functions.

Here we define a very simple 'view' function, which outputs the _id field for each document in the collection (the contents of the shows and lists sections are omitted):

{
  "_id": "_design/d1",
  "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
  "shows": { ... },
  "lists": { ... },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

The output from the 'view' function is at http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple 'view' returning document _id values

A 'view' is in effect a query function which returns a set of results. In order to display the results as HTML, we may define 'list' functions in a CouchDB design document:

• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a 'list' function as follows (the document id is URL-encoded in the link, since ids such as "Aston University" contain spaces):

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": { ... },
  "lists": {
    "l1": "function(head, req) { var row; start({ headers: { 'Content-Type': 'text/html' } }); while (row = getRow()) { send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + encodeURIComponent(row.value) + '\">' + row.value + '</a></p>'); } }"
  },
  "views": {
    "v1": {
      "map": "function(doc) { emit(doc._id, doc._id); }"
    }
  }
}

Note the built-in functions start and send:

• start is called once, at the beginning of the function, to set the response headers
• send is called once for each row in the result set (each row being fetched with getRow), to output a chunk of HTML
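To see how start, send and getRow cooperate, the list function can be run against a stub runtime. The sketch below is a simplified stand-in for CouchDB's query server, written for illustration only: start records the headers, send appends to an output buffer, and getRow hands out the view rows one at a time, returning null when none remain (which ends the while loop). The rows are hypothetical view output, and the link target is changed, as an assumption for the example, to a relative URL pointing at the s1 show function.

```javascript
// Minimal stand-in for CouchDB's list-function runtime (illustration only).
let pendingRows = [];
let output = "";
let responseHeaders = null;

function start(response) { responseHeaders = response.headers; }
function send(chunk) { output += chunk; }
function getRow() { return pendingRows.length ? pendingRows.shift() : null; }

function runList(listFn, rows) {
  pendingRows = rows.slice();
  output = "";
  responseHeaders = null;
  listFn({ total_rows: rows.length }, { query: {} }); // hypothetical head and req objects
  return { headers: responseHeaders, body: output };
}

// l1, with the link target changed to a relative show URL (assumption for this sketch).
function l1(head, req) {
  let row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="../../_show/s1/' + encodeURIComponent(row.value) + '">' + row.value + "</a></p>");
  }
}

// Hypothetical rows, shaped like the output of view v1.
const result = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
  { id: "Aberystwyth University", key: "Aberystwyth University", value: "Aberystwyth University" },
]);
console.log(result.headers["Content-Type"]); // text/html
console.log(result.body);
```

Served from /universities/_design/d1/_list/l1/v1, the relative link ../../_show/s1/... resolves to /universities/_design/d1/_show/s1/....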

At the following URL, the 'list' function l1 consumes the result set from the 'view' function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple 'view' rendered using a 'list' function

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload them to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB's management console, 'Futon', to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section, we will look at CouchApp, which is a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp from an application developer's perspective is that it creates a set of separate files for each element of the design document. To a developer, a CouchDB design document is in effect a web application project; CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available at http://github.com/jchris/sofa.

I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an RSS feed of the posts. I took this as a starting point and, with the help of J. Chris Anderson and others on the couchdb-users mailing list, amended the RSS output to serve GeoRSS.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available at http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is at http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available at http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3

Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section, I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The shows folder contains .js files corresponding to each 'Show' function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the lists folder contains .js files, one for each 'List' function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
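The mapping from folders to design-document fields is mechanical, which is what makes CouchApp convenient. The sketch below models it in JavaScript, representing the folder tree as a plain object of file paths and file contents rather than reading from disk. It is a simplified illustration of the idea, not CouchApp's implementation (which is written in Python and handles many more cases). Empty files are skipped, which matters for the empty reduce.js issue discussed later in this section; the file contents are hypothetical.

```javascript
// Fold "folder/sub/file.js" paths into nested design-document fields (illustration only).
function buildDesignDoc(id, files) {
  const ddoc = { _id: id };
  for (const [filePath, source] of Object.entries(files)) {
    if (source.trim() === "") continue; // skip empty files, e.g. an unused reduce.js
    const parts = filePath.replace(/\.js$/, "").split("/");
    let node = ddoc;
    for (const part of parts.slice(0, -1)) {
      node = node[part] = node[part] || {}; // create intermediate objects as needed
    }
    node[parts[parts.length - 1]] = source.trim();
  }
  return ddoc;
}

// Hypothetical contents for the bc folder.
const ddoc = buildDesignDoc("_design/bc", {
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
  "views/v1/reduce.js": "",
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* ... */ }",
});
console.log(JSON.stringify(ddoc, null, 2));
```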

Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to 'show hidden files' in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a 'Show' function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1

couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it into HTML.

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a 'List' function, l1.

In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js. We recall that in CouchDB a View is in effect a query against the database.

As v1 is a very simple View with a map function but no reduce function, we remove the empty reduce.js from the folder, as it is not needed. (It is important to do this: if we supply an empty reduce function, the View will return no values.)

Figure 6.5: Remove the reduce.js file if it is not needed

6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document and to send the resulting design document, using the correct HTTP commands, to the CouchDB server.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
  "env": {
    "default": {
      "db": "http://username:password@mick.couchone.com/universities"
    }
  }
}

Running couchapp push from the bc folder then uploads the design document to the database named in the default environment.

So far, we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This demonstrates how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section, we complete the project's aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this, we will make use of mustache.js, a templating system used by Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple View, all. This consists of a map.js file (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
  emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss

This creates a file called georss.js. I added some code, derived from that in Sofa, which 'packages' the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
  <id>{{id}}</id>
  <title>{{id}}</title>
  <content type="html">
    {{#has_programmes}}
    {{#programmes}}
    &lt;p&gt;{{name}}
    {{#has_countries}}
    {{#countries}}
    &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
    {{/countries}}
    {{/has_countries}}
    &lt;/p&gt;
    {{/programmes}}
    {{/has_programmes}}
  </content>
  <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The template loops through each of the programmes and, if a programme has a list of countries, emits the flag of each of those countries.
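The has_programmes and has_countries flags exist because the template needs to know whether to open a section at all. The following hedged sketch shows one way a list function could 'package' view rows into that shape before templating. The document fields (latitude, longitude, programmes, countries) are assumptions inferred from the template, not the actual Activity Mapping schema, and plain string building stands in for mustache.js so that the example runs with no dependencies (unlike Mustache, it does no HTML escaping).

```javascript
// Shape view rows (value = university document) into the data georss.html expects.
// The document fields used here are assumptions inferred from the template.
function packageInstitutions(rows) {
  return {
    institutions: rows.map((row) => {
      const programmes = (row.value.programmes || []).map((p) => {
        const countries = (p.countries || []).map((country) => ({ country }));
        return { name: p.name, has_countries: countries.length > 0, countries };
      });
      return {
        id: row.id,
        latitude: row.value.latitude,
        longitude: row.value.longitude,
        has_programmes: programmes.length > 0,
        programmes,
      };
    }),
  };
}

// Dependency-free stand-in for the Mustache template: one <entry> per institution.
function renderEntries(view) {
  return view.institutions
    .map((inst) => {
      const content = inst.has_programmes
        ? inst.programmes
            .map((p) => {
              const flags = p.has_countries
                ? p.countries
                    .map((c) => '&lt;img src="http://mick.couchone.com/countries/' + c.country + '/flag"&gt;')
                    .join("")
                : "";
              return "&lt;p&gt;" + p.name + flags + "&lt;/p&gt;";
            })
            .join("")
        : "";
      return (
        "<entry><id>" + inst.id + "</id><title>" + inst.id + "</title>" +
        '<content type="html">' + content + "</content>" +
        "<point>" + inst.latitude + " " + inst.longitude + "</point></entry>"
      );
    })
    .join("\n");
}

// Hypothetical rows, shaped like the output of the `all` view.
const view = packageInstitutions([
  {
    id: "Aston University",
    value: { latitude: 52.4862, longitude: -1.8904, programmes: [{ name: "Erasmus", countries: ["fr", "de"] }] },
  },
  { id: "Example College", value: { latitude: 51.5, longitude: -0.12 } },
]);
console.log(renderEntries(view));
```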

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)

The GeoRSS feed is available at http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed at http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all
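The map link simply passes the feed URL in the q parameter. For the universities feed, which has no query string of its own, the unencoded link works; for a feed URL that carries its own query string (such as the Sofa feed earlier in this section), the value should be percent-encoded so that characters such as ? and & are not mistaken for parameters of the Google Maps URL itself. A small JavaScript sketch, using URLSearchParams to do the encoding (the helper name mapsLinkFor is hypothetical):

```javascript
// Wrap a GeoRSS/Atom feed URL in a Google Maps link, percent-encoding it as the q parameter.
function mapsLinkFor(feedUrl) {
  const url = new URL("http://maps.google.com/");
  url.searchParams.set("q", feedUrl); // encodes : / ? = & inside the feed URL
  return url.href;
}

const feed = "http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom";
const link = mapsLinkFor(feed);
console.log(link);
```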


Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the flat files currently used for the British Council Activity Map project, http://activitymap.britishcouncil.org.

The key factors in achieving this outcome have been the use of CouchApp and of a server-side JavaScript templating framework (in this case, Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySql, or C# and SQL Server?

The first comment to make is that it is not necessarily an 'either-or' scenario. In this project, CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this is a trend which is likely to continue: so-called 'NoSql' databases may readily be used alongside relational databases, with each deployed according to its strengths. In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.Net/SQL Server application). For the publicly available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: the ability to have data and code replicated in many places, so that the user has high-speed access to it even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replication


of code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development (which is itself a document, in HTML or XML format). CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySql or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In comparison, with CouchDB relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs; in addition, it uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB's challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL;
• understanding the structure of design documents;
• the use of server-side JavaScript functions to transform documents to HTML and XML.

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.


CouchDB is still a new product and, as such, there is a lack of 'entry-level' documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a pre-defined map-reduce function stored in a design document. Many developers are used to 'thinking' in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.

7.3 Conclusion

The 'NoSql' movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project, but examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project's focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other 'NoSql' offerings), the set of tools available to developers will increase. I suspect also that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by 'the end of one size fits all' [45]: in the coming years, relational databases will continue alongside 'NoSql' offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online (source view): http://mick.couchone.com/_utils/document.html?universities/_design/d1

It contains two 'show' functions: the first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development: all HTML is returned as a string from a server-side JavaScript function, which means that, for example, every double quote needs to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
  "_id": "_design/d1",
  "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
  "shows": {
    "s1": "function(doc, req) { return '<h1>' + doc._id + '</h1>' + '<p>latitude: ' + doc.latitude + '</p>' + '<p>longitude: ' + doc.longitude + '</p>'; }",
    "map1": "function(doc, req) { return '<!DOCTYPE html><html><head><meta name=\"viewport\" content=\"initial-scale=1.0, user-scalable=no\" /><style type=\"text/css\"> html { height: 100% } body { height: 100%; margin: 0px; padding: 0px } #map_canvas { height: 100% } </style><script type=\"text/javascript\" src=\"http://maps.google.com/maps/api/js?sensor=false\"></script><script type=\"text/javascript\"> function initialize() { var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + '); var myOptions = { zoom: 14, center: latlng, mapTypeId: google.maps.MapTypeId.ROADMAP }; var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions); var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" }); var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" }); google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); }); marker.setMap(map); } </script></head><body onload=\"initialize()\"><div id=\"map_canvas\" style=\"width:100%; height:100%\"></div></body></html>'; }"
  }
}
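The tedium noted at the start of this appendix comes from hand-writing JavaScript inside a JSON string. As a sketch of the alternative (roughly the step that a tool such as CouchApp automates when it packages files into a design document), the show function can be written as ordinary JavaScript and serialised with JSON.stringify, which does the backslash-escaping. The class attribute on the heading is hypothetical, added only so that the function contains double quotes to escape.

```javascript
// Illustrative sketch: write the show function as ordinary JavaScript...
function s1(doc, req) {
  return '<h1 class="title">' + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// ...and store its source text in the design document.
const designDoc = {
  _id: "_design/d1",
  shows: {
    // Function.prototype.toString returns the function's source text.
    s1: s1.toString()
  }
};

// JSON.stringify escapes the double quotes (and newlines) automatically.
const designDocJson = JSON.stringify(designDoc, null, 2);
console.log(designDocJson);
```

The resulting JSON could then be sent to the server with a PUT to the design document's URL, as with any other document.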

Appendix B

GeoRSS on Sofa

In this appendix, I detail a few small changes I made to three files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code, http://github.com/jchris/sofa. I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  // added: GeoRSS point
  entry.point = data.point;
  return entry;
};

At the end of lists/index.js:

        alternate : path.absolute(path.show('post', row.id)),
        // point : row.value.loc[1] + " " + row.value.loc[0]
        point : row.value.latitude + " " + row.value.longitude
      });
      // send the entry to client
      send(feedEntry);
    } while (row = getRow());
    // close the loop after all rows are rendered
    return "</feed>";
  });
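The point line above builds the GeoRSS Simple encoding of a location: latitude first, then longitude, separated by a single space. Because values typed into an HTML form field are strings, a small validating helper may be worth having. The following sketch (georssPoint and toCoordinate are hypothetical names, not part of Sofa) converts and range-checks both values before joining them.

```javascript
// Hypothetical helper: convert a form value to a number, treating blank or
// missing values as invalid (Number("") and Number(null) would give 0).
function toCoordinate(value) {
  if (value === null || value === undefined) return NaN;
  if (typeof value === "string" && value.trim() === "") return NaN;
  return Number(value);
}

// Hypothetical helper: format a GeoRSS Simple point ("lat lon").
function georssPoint(latitude, longitude) {
  const lat = toCoordinate(latitude);
  const lon = toCoordinate(longitude);
  // Range checks also catch the two values being swapped whenever the
  // longitude lies outside -90..90.
  if (!Number.isFinite(lat) || lat < -90 || lat > 90) {
    throw new RangeError("latitude out of range: " + latitude);
  }
  if (!Number.isFinite(lon) || lon < -180 || lon > 180) {
    throw new RangeError("longitude out of range: " + longitude);
  }
  return lat + " " + lon;
}

console.log(georssPoint("56.3412139443169", "-2.79301175308608"));
// 56.3412139443169 -2.79301175308608
```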

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

(The following is further down in templates/edit.html.)

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{ docid }},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB List function. It takes as input a set of documents returned by a View; in the simplest case, the View all, which simply returns all documents.

What is interesting about this code? It gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to send the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");
  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return {
                      country : country
                    };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
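georss.js relies on CouchApp's List.withRows helper and on Mustache. Underneath, a CouchDB list function works with three functions that CouchDB supplies: start() to set response headers, getRow() to fetch the next view row (it returns a falsy value once the rows run out) and send() to stream a chunk of output; whatever the function returns is sent last. The sketch below uses a hypothetical stand-in for CouchDB (runList is not a real API) to feed view rows to a raw list function written without those helpers, so the streaming behaviour can be tried outside the database. This sketch writes the point with the georss: prefix that the feed element declares.

```javascript
// Hypothetical stand-in for CouchDB's list-function runtime (not a real API).
function runList(listSource, rows, req) {
  let next = 0;
  let headers = null;
  const chunks = [];
  const start = (response) => { headers = response.headers; };
  const getRow = () => (next < rows.length ? rows[next++] : null);
  const send = (chunk) => { chunks.push(chunk); };
  // Bind start/getRow/send as free names inside the stored function source.
  const listFn = new Function("start", "getRow", "send",
    "return (" + listSource + ");")(start, getRow, send);
  const head = { total_rows: rows.length, offset: 0 };
  const tail = listFn(head, req || {});
  if (typeof tail === "string") chunks.push(tail); // the return value is sent last
  return { headers: headers, body: chunks.join("") };
}

// A raw list function: one Atom entry with a GeoRSS point per view row.
const pointsList = `function(head, req) {
  start({ headers: { "Content-Type": "application/atom+xml" } });
  send('<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">');
  var row;
  while ((row = getRow())) {
    send("<entry><id>" + row.id + "</id><georss:point>" +
      row.value.latitude + " " + row.value.longitude + "</georss:point></entry>");
  }
  return "</feed>";
}`;

const listOutput = runList(pointsList, [
  { id: "University of St Andrews", key: "University of St Andrews",
    value: { latitude: 56.34, longitude: -2.79 } }
]);
console.log(listOutput.body);
```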

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
        {{#programmes}}
          &lt;p&gt;{{name}}
          {{#has_countries}}
            {{#countries}}
              &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}" &gt;
            {{/countries}}
          {{/has_countries}}
          &lt;/p&gt;
        {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload PNG files from a folder, so that each image file becomes an attachment to a CouchDB document.

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags/
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES="countries/png/*"

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  docname=$(echo "$filename" | sed -e 's/\.png//g')
  echo "$docname"
  url="http://username:password@mick.couchone.com/countries/$docname/flag"
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done
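The two sed commands in the loop turn a path such as countries/png/ie.png into a file name (ie.png) and then a document name (ie). The same derivation can be sketched with Node's path module; flagTarget is a hypothetical name, and the base URL is a placeholder for a local CouchDB server.

```javascript
const path = require("path");

// Hypothetical JavaScript equivalent of the two sed calls in the script above.
function flagTarget(filepath, baseUrl) {
  const filename = path.basename(filepath);        // "countries/png/ie.png" -> "ie.png"
  const docname = path.basename(filepath, ".png"); // -> "ie"
  // One document per country, with the image stored as its "flag" attachment.
  const url = baseUrl + "/countries/" + encodeURIComponent(docname) + "/flag";
  return { filename: filename, docname: docname, url: url };
}

console.log(flagTarget("countries/png/ie.png", "http://127.0.0.1:5984"));
```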

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/%5Fdesign/sofa/%5Fshow/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.


[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database? http://www.couchone.com/page/why-mobile

[20] British Council Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang - An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics, 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file


[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. CouchDB implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb%5Farchite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management. Lecture Notes, MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars


[44] Daniel Stenberg and contributors. curl.1 the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2-11. IEEE Computer Society Press, 2005.

[46] Stonebraker, M., Wong, E., Kreps, P. and Held, G. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis%5FRik%5Fvan%5Fder%5FSanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Database_normalization

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational%5Fmodel

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz


Couch, it is replicated to the UbuntuOne service so that the data can be retrieved on other machines or devices running CouchDB as well.

Section 4

Using CouchDB

This section contains a brief tutorial introduction to CouchDB.

4.1 Motivation

My motivation for including this section is that CouchDB is still a relatively new product: version 1.0 was released in July 2010. (For a list of retrospective pieces marking the occasion of the 1.0 release, see [47].)

As CouchDB is still quite new, it can at times be difficult to find or follow documentation. Consequently, I wrote this section to help myself understand and work through a few of the more important aspects of CouchDB's application programming interface (API).

A broader aim of this MSc project is to investigate how ready CouchDB is for adoption by the community of 'corporate' web applications developers. I have worked for about 10 years as a Microsoft-platform applications developer, so I consider myself fairly typical of this group.

The two main sources of information for this section were the CouchDB book [13], also available online at http://guide.couchdb.org, and the HTTP Document API, available at http://wiki.apache.org/couchdb/HTTP_Document_API.

4.2 Hosted CouchDB Service Providers

I strongly recommend that, if you are reading this section, you sign up for an online instance of CouchDB from CouchOne (http://www.couchone.com/get). At the time of writing (September 2010), the service is in beta and is free of charge.

Please note that after signing up for an online instance of CouchDB, it is important to protect it with a strong username and password.


4.3 Installing CouchDB

In addition to the hosted service, I installed CouchDB locally on Ubuntu 10.04 using the installer from http://www.couchone.com/get. Installers also exist at the same URL for Windows and Mac. I installed the current version at the time of writing, which was 1.0.1.

(As already noted, Ubuntu has an instance of CouchDB, Desktop Couch, pre-installed. It is also possible to install a separate 'standard' instance of CouchDB on an Ubuntu machine using the installer from CouchOne. This is what I have done for this project.)

4.4 Inserting a Document into a CouchDB Database

As mentioned previously, any program with an HTTP library can be used to post documents to a CouchDB server.

For this MSc project, I took the approach of going back to first principles and using command-line tools to interact with CouchDB.

To upload a document from the terminal or command line, we use cURL, a command-line tool for getting or sending files using URL syntax¹ [51].

To create a database called 'addressbook'² on a remotely hosted CouchDB instance, we use the following syntax:

curl -X PUT http://username:password@mick.couchone.com/addressbook

(username and password are placeholders for my real username and password.)

To delete a CouchDB database, we issue the same statement with the HTTP DELETE verb:

curl -X DELETE http://username:password@mick.couchone.com/addressbook

(Please re-create the database if you wish to follow the steps laid out in this tutorial section.)

In Section 3 above, we saw a document in JSON format containing the address details for a certain John Smith, residing in New York.

If this document were stored in a file named john-smith.json, the cURL syntax for uploading it to the addressbook database would be as follows:

¹ If cURL is not pre-installed with your operating system, you can download it from http://curl.haxx.se/download.html.

² Ed Parcell [42] has written a useful tutorial on creating a CouchDB address book application.


curl -H "Content-Type: application/json" \
  -X POST http://username:password@mick.couchone.com/addressbook \
  -d @john-smith.json

cURL syntax is explained in [44]. Briefly:

• -H introduces an HTTP header;
• -X POST instructs cURL to use an HTTP POST (instead of a GET) request;
• -d introduces the data to be sent;
• @ denotes that what follows is a file name.

All data in CouchDB is stored in JSON format, together with a unique key and a revision number, so in CouchDB the resulting document would have _id and _rev values as follows:

{
  "_id": "760cd53c55a93497067f90d6242fc25e",
  "_rev": "1-91bce055fc8db86480400321079f0834",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}

This document may be retrieved online at http://mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e. The output is unformatted JSON.

To view the output as formatted JSON, we may copy it into the JSONLint validator, http://jsonlint.com.


Figure 4.1: JSONLint

The entire addressbook database can also be viewed using 'Futon', CouchDB's management console: http://mick.couchone.com/_utils/database.html?addressbook

Figure 4.2: Futon

In some scenarios, we may wish CouchDB to use meaningful document _id values rather than a meaningless string of random characters.

For example, it would be useful for us to store the British Council Zambia address at a memorable URL: http://mick.couchone.com/addressbook/british-council-zambia


To achieve this, we use HTTP PUT instead of POST, specifying the URL we wish to 'put' to, as follows:

curl -H "Content-Type: application/json" \
  -X PUT \
  http://username:password@mick.couchone.com/addressbook/british-council-zambia \
  -d @british-council-zambia.json

This is the resulting document in CouchDB:

{
  "_id": "british-council-zambia",
  "_rev": "1-da7bcd810c608d6fbcb9ce92e9ade343",
  "company": "British Council Zambia",
  "address": {
    "streetAddress": "Heroes Place\nCairo Road\nPO Box 34571",
    "city": "Lusaka",
    "country": "Zambia"
  }
}

As a result, this document may be retrieved at the intended location, http://mick.couchone.com/addressbook/british-council-zambia. The resulting JSON output may be consumed and transformed by a client application using JavaScript or some other programming language.

Figure 4.3: Document viewed using Futon
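The two upload commands differ only in the HTTP verb and the URL: POST to the database lets CouchDB choose the _id, while PUT to a document URL fixes it. As a sketch, the same choice can be expressed in JavaScript as a request description that could be passed to fetch() in Node 18 or later; the helper name createRequest and the local server address are hypothetical, and nothing is sent here.

```javascript
// Hypothetical helper: describe the HTTP request that stores `doc` in `db`.
// With no _id, POST to the database and let CouchDB generate one;
// with an _id, PUT to the document's own URL.
function createRequest(baseUrl, db, doc) {
  const dbUrl = baseUrl + "/" + encodeURIComponent(db);
  const common = {
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  };
  if (doc._id === undefined) {
    return { url: dbUrl, init: Object.assign({ method: "POST" }, common) };
  }
  return {
    url: dbUrl + "/" + encodeURIComponent(doc._id),
    init: Object.assign({ method: "PUT" }, common)
  };
}

const zambiaCreate = createRequest("http://127.0.0.1:5984", "addressbook", {
  _id: "british-council-zambia",
  company: "British Council Zambia"
});
console.log(zambiaCreate.init.method, zambiaCreate.url);
// To send it: await fetch(zambiaCreate.url, zambiaCreate.init);
```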


4.5 Deleting a Document

When deleting a document, we need to provide the revision value in the query string (a 'query string' is the section of a URL following a question mark, ?).

By requiring the revision value, CouchDB ensures that the client application has a reference to the latest version of a document before it allows that document to be deleted.

The following command will delete the existing British Council Zambia document from the database:

curl -X DELETE "http://username:password@mick.couchone.com/addressbook/british-council-zambia?rev=1-da7bcd810c608d6fbcb9ce92e9ade343"
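When such a request is built in code rather than typed, the database name, document _id and revision should be URL-encoded before they are placed in the path and query string; an _id containing spaces, like those used for the universities database in Appendix A, then becomes %20-separated. A minimal sketch, using the hypothetical helper name deleteUrl:

```javascript
// Hypothetical helper: the URL for DELETE /db/docid?rev=...
function deleteUrl(baseUrl, db, docId, rev) {
  return baseUrl + "/" + encodeURIComponent(db) + "/" +
    encodeURIComponent(docId) + "?rev=" + encodeURIComponent(rev);
}

const zambiaDelete = deleteUrl("http://127.0.0.1:5984", "addressbook",
  "british-council-zambia", "1-da7bcd810c608d6fbcb9ce92e9ade343");
console.log(zambiaDelete);
// With fetch (Node 18+): await fetch(zambiaDelete, { method: "DELETE" });
```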

4.6 Updating a Document

Similarly, when updating a document, the revision value must be supplied to ensure that the application making the update is aware of the latest version of the document. CouchDB will return a 'Document update conflict' error if an update is attempted on an 'old' version of a document.

As we are updating a 'named' document, we use the PUT verb instead of POST.

The following command will update John Smith's record with the contents of the file john-smith-v2.json, quoting the revision of his document shown earlier:

curl -X PUT \
  "http://username:password@mick.couchone.com/addressbook/760cd53c55a93497067f90d6242fc25e?rev=1-91bce055fc8db86480400321079f0834" \
  -d @john-smith-v2.json
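This rule is a form of optimistic concurrency control: rather than locking a document, CouchDB accepts a write only if the writer quotes the revision it last read. The toy model below (ToyDocStore is hypothetical and keeps everything in memory) illustrates the effect, including the conflict response for a stale revision. It simplifies CouchDB in two labelled ways: the part of each revision after the dash is random rather than computed, and deletion simply forgets the document instead of recording a deletion revision.

```javascript
const crypto = require("crypto");

// Toy, in-memory model of CouchDB's revision check (hypothetical code, not
// CouchDB's): every write must quote the document's current _rev.
class ToyDocStore {
  constructor() {
    this.docs = new Map();
  }

  // Create (rev omitted) or update (rev must match the stored _rev).
  put(id, body, rev) {
    const current = this.docs.get(id);
    const expected = current ? current._rev : undefined;
    if (rev !== expected) {
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? parseInt(current._rev.split("-")[0], 10) + 1 : 1;
    // CouchDB computes the part after the dash as a hash; random hex stands in.
    const newRev = generation + "-" + crypto.randomBytes(16).toString("hex");
    this.docs.set(id, Object.assign({}, body, { _id: id, _rev: newRev }));
    return { status: 201, ok: true, id: id, rev: newRev };
  }

  // Delete also requires the current rev; this toy simply forgets the document.
  remove(id, rev) {
    const current = this.docs.get(id);
    if (!current) {
      return { status: 404, error: "not_found", reason: "missing" };
    }
    if (rev !== current._rev) {
      return { status: 409, error: "conflict", reason: "Document update conflict." };
    }
    this.docs.delete(id);
    return { status: 200, ok: true, id: id };
  }
}

const book = new ToyDocStore();
const v1 = book.put("john-smith", { firstName: "John" });
const v2 = book.put("john-smith", { firstName: "John", age: 26 }, v1.rev);
const stale = book.put("john-smith", { firstName: "Johnny" }, v1.rev);
console.log(v2.rev.split("-")[0], stale.status);
// 2 409
```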

4.7 Adding Attachments

CouchDB provides us with a way to save attachments, for example image files, to a database. The Activity Mapping project uses national flag icons from http://www.famfamfam.com/lab/icons/flags/³.

The following command will upload the flag of Zambia (zm.png) as an attachment to the British Council Zambia address record:

curl -H "Content-Type: image/png" -X PUT \
  "http://username:password@mick.couchone.com/addressbook/british-council-zambia/flag?rev=3-e6fbb83f5aa5c2e2111b5b14559115af" \
  --data-binary @zm.png

³ Many thanks to Mark James for creating these files and making them available for free use.

SECTION 4 USING COUCHDB 27

The uploaded image may be viewed here httpmickcouchonecomaddressbookbritish-council-zambiaflag

4.8 Replication

One particularly interesting and extremely useful aspect of CouchDB is its ability to replicate data between separate CouchDB instances.

Continuing with the example above, let us imagine that we want to have a replica of the addressbook on our local machine, so that we can work on it in off-line mode.

If the addressbook database does not already exist on the local machine, we can create it as follows:

curl -X PUT http://127.0.0.1:5984/addressbook

To replicate the addressbook database to the local machine, we use the following command:

curl -H "Content-Type: application/json" \
     -X POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"http://username:password@mick.couchone.com/addressbook",
          "target":"http://127.0.0.1:5984/addressbook"}'

Note that if using cURL on Microsoft Windows, we need to use double quotes exclusively, backslash-escaping the inner quotes where necessary:

curl -H "Content-Type: application/json" -X POST http://127.0.0.1:5984/_replicate -d "{\"source\":\"http://username:password@mick.couchone.com/addressbook\",\"target\":\"http://127.0.0.1:5984/addressbook\"}"

As a result, the replica of the database is now available on the local machine.
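The quoting differences above disappear if the request body is generated rather than typed. The sketch below builds the same replication body with JSON.stringify, which produces correctly quoted JSON regardless of shell. The fetch call is left as a comment because it needs a running CouchDB server. It is shown on the assumption of an environment that provides fetch, such as Node.js 18 or later.

```javascript
// Sketch: generating the _replicate request body instead of quoting it by hand.
const replication = {
  source: "http://username:password@mick.couchone.com/addressbook",
  target: "http://127.0.0.1:5984/addressbook",
};
const requestBody = JSON.stringify(replication);
console.log(requestBody);

// With fetch available (e.g. Node.js 18+), the body could then be sent with:
// fetch("http://127.0.0.1:5984/_replicate", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: requestBody,
// });
```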

4.9 Querying a CouchDB Database using Map-Reduce

So far we have seen how data is entered into CouchDB, and a little of how replication works. Now we turn to how to query the database.

Data is retrieved from CouchDB using a map-reduce function, usually written in JavaScript (although it is also possible to write CouchDB map-reduce functions in Erlang).

The map-reduce paradigm was popularised by Google’s 2004 paper [23], which describes how it is used to distribute calculations across large clusters of commodity machines. In effect it works in two stages:

• the ‘map’ stage ranges over the document collection and emits zero or more key/value pairs for each document it encounters;
• the ‘reduce’ stage is used when a calculation over those intermediate values is required.
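The two stages can be sketched as a toy map-reduce over an in-memory collection. This illustrates the paradigm only. CouchDB runs view functions in its own JavaScript view server, where emit is a global; here emit is passed to the map function instead. The Aberavon figures come from the election data below, and the second constituency is hypothetical.

```javascript
// A toy map-reduce over an in-memory collection (illustration only; CouchDB
// runs view functions in its own JavaScript view server, where emit is global).
function mapReduce(docs, map, reduce) {
  // 'map' stage: visit every document; each may emit zero or more key/value pairs
  const emitted = [];
  const emit = (key, value) => emitted.push({ key, value });
  for (const doc of docs) map(doc, emit);

  // group the intermediate values by key
  const groups = new Map();
  for (const { key, value } of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // 'reduce' stage: combine each group's values into one result
  return [...groups].map(([key, values]) => ({ key, value: reduce(values) }));
}

const docs = [
  { _id: "Aberavon", con: 3062, lab: 18077, lib: 4138 }, // figures from the election data
  { _id: "Example", con: 1000, lab: 2000, lib: 500 },    // hypothetical constituency
];

const totals = mapReduce(
  docs,
  (doc, emit) => {
    for (const party of ["con", "lab", "lib"]) emit(party, doc[party]);
  },
  (values) => values.reduce((a, b) => a + b, 0)
);
console.log(totals); // con: 4062, lab: 20077, lib: 4638
```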

In a blog posting for this project (http://klena02.wordpress.com/2010/02/01/couchdb/), I work through a small example of how to retrieve some United Kingdom parliamentary election data from CouchDB using a map-reduce function.

The data relates to the 2005 UK election and was taken from http://www.electoralcalculus.co.uk.

Figure 4.4: election-2005 database

(As you can see from the URL, this is on a CouchDB instance running on my local machine rather than on the web.)

Here is an example of a document stored in the database, with the results from a single constituency:

[http://localhost:5984/election-2005/Aberavon]

{
   "_id": "Aberavon",
   "_rev": "2-4defd6a39cb379b4480a72ddb0ab2ee5",
   "mp": "Hywel Francis",
   "electorate": 51079,
   "con": 3062,
   "lab": 18077,
   "lib": 4138,
   "pc": 3546,
   "oth": 1278
}

CouchDB can hold two types of document. The most common type is one which contains data, such as the one above showing electoral results for Aberavon.

The other type of CouchDB document is known as a design document. Design documents are used to store code, as opposed to data.

The CouchDB design document ‘con’ has been defined with two ‘views’ of the data.

The first view, ‘Conservative Votes’, simply ranges over the document collection in a map function. For each constituency it emits the Conservative vote as the key and the constituency name as the value.

The second view, ‘Conservative Votes Total’, combines the map function with a reduce function to output the sum of all Conservative votes. This is analogous to SELECT SUM(con) in SQL; emitting one key per group and reducing by key would be the equivalent of a GROUP BY.

[http://localhost:5984/election-2005/_design/con]

{
   "_id": "_design/con",
   "_rev": "6-95b012f051cee87dc7a36d73cef8f2c8",
   "language": "javascript",
   "views": {
      "Conservative Votes": {
         "map": "function(doc) {\n  emit(doc.con, doc._id);\n}"
      },
      "Conservative Votes Total": {
         "map": "function(doc) {\n  emit(null, doc.con);\n}",
         "reduce": "function(keys, values) {\n  return sum(values);\n}"
      }
   }
}

The sum of Conservative votes (8782198) is returned as the result of the following request:

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes%20Total]

{"rows":[{"key":null,"value":8782198}]}

It is also possible to emulate a WHERE clause. The following request will return the constituencies where the Conservative vote was between 1000 and 2000. CouchDB's documented range parameters are the lower-case startkey and endkey.

[http://localhost:5984/election-2005/_design/con/_view/Conservative%20Votes?startkey=1000&endkey=2000]

Figure 4.5: WHERE clause in CouchDB

This method of retrieving data is certainly cumbersome compared to using SQL. One criticism of CouchDB is that it does not provide sophisticated ad-hoc querying capabilities: all queries on the system need to be defined in terms of map-reduce functions.

The advantage is that every time a map-reduce function is written against a document collection in CouchDB, a new B-Tree index is created for that query. From that point on, whenever new documents are added to the collection, the index is updated incrementally. In practice CouchDB brings a view's index up to date lazily, the next time the view is queried.

As a result, the entire system is optimised for fast indexed retrieval. This makes it impossible in CouchDB to write a query which performs a ‘table scan’ at run-time: every read is an indexed lookup. The one exception is CouchDB's temporary views, which are intended for development only.
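The benefit of a sorted index can be sketched with a toy stand-in for a B-Tree: a sorted array with binary search. Rows stay ordered by key, and a new row is slotted into place instead of rebuilding the index. A startkey/endkey request then locates the start of the range directly rather than scanning every document. CouchDB's real index is an on-disk B+tree. The constituencies other than Aberavon are hypothetical.

```javascript
// A toy index: rows kept sorted by key, so a key range is found by binary search
// instead of scanning every document. CouchDB itself uses an on-disk B+tree;
// a sorted array stands in for it here.
class SortedIndex {
  constructor() {
    this.rows = []; // { key, id } objects in ascending key order
  }

  // position of the first row whose key is >= key
  lowerBound(key) {
    let lo = 0;
    let hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.rows[mid].key < key) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // incremental update: slot one new row into place, no full rebuild
  insert(key, id) {
    this.rows.splice(this.lowerBound(key), 0, { key, id });
  }

  // like ?startkey=...&endkey=... (both ends inclusive)
  range(startkey, endkey) {
    const out = [];
    for (let i = this.lowerBound(startkey); i < this.rows.length && this.rows[i].key <= endkey; i++) {
      out.push(this.rows[i]);
    }
    return out;
  }
}

const idx = new SortedIndex();
idx.insert(3062, "Aberavon");       // Conservative vote from the Aberavon document
idx.insert(1500, "Constituency A"); // hypothetical
idx.insert(2500, "Constituency B"); // hypothetical
idx.insert(1200, "Constituency C"); // hypothetical, added later
console.log(idx.range(1000, 2000).map((r) => r.id)); // [ 'Constituency C', 'Constituency A' ]
```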

Section 5

Serving HTML from CouchDB

This section consists of some initial experiments in storing and serving HTML code with CouchDB, taking advantage of its capabilities as a web server. My aim is to make an initial evaluation of CouchDB’s suitability as an environment for web application development.

There is a more advanced way to write applications using CouchDB: using CouchApp. I intend to explore this in Section 6, but for now I would like to see how far we can get just using CouchDB on its own.

As we have seen, CouchDB stores data in the form of documents; it stores HTML and JavaScript application code in so-called ‘Design Documents’.

A design document in CouchDB is a document which stores application code rather than data.

5.1 Bulk upload of JSON documents

To demonstrate basic design document usage, I have created a database called ‘universities’. It is used to investigate how some of the British Council Activity Mapping data may be served to the web via CouchDB.

The ‘universities’ data set represents a subset of the institution records in the Activity Mapping database. It consists of those universities with students participating in the Erasmus programme [21].

I wrote a .NET application using C# code and SQL stored procedures to export the data out of the project database at work. The data was exported into a set of JSON files. I then transferred these files to my PC at home, which runs Ubuntu Linux.

Figure 5.1: Folder containing JSON files

To upload these documents from my desktop PC to my CouchDB instance on http://mick.couchone.com, I wrote the bash script below.

The script loops through the folder and uses cURL to send the data contained in each file to the CouchDB instance on the web [28]. (In Windows, a similar effect would be achieved using a batch file script.)

#!/bin/bash

# host=http://127.0.0.1:5984
host=http://username:password@mick.couchone.com
database=universities
folder=universities
fileextension=.json
FILES="$folder"/*

# create the database
curl -X PUT "$host/$database"

for filepath in $FILES
do
    echo "$filepath"
    # get the file name from the file path
    filename=$(echo "$filepath" | sed -e "s|^$folder/||")
    echo "$filename"
    docname=$(echo "$filename" | sed -e "s|$fileextension\$||")
    echo "$docname"
    url="$host/$database/$docname"
    echo "$url"
    # put the document into CouchDB (the @ makes cURL send the file's contents)
    echo curl -X PUT "$url" -d @"$filepath"
    curl -X PUT "$url" -d @"$filepath"
done

Note that we are using HTTP PUT with cURL to ‘put’ the document at a meaningful URL. POST is used when we wish CouchDB to assign a random ID to the document.

To put a single document to the web, the command used would be as follows:

curl -X PUT "http://username:password@mick.couchone.com/universities/Aberystwyth%20University" -d @universities/Aberystwyth%20University.json
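The file-name handling in the script can be mirrored in JavaScript. The sketch assumes that the exported file names are already percent-encoded, as the Aberystwyth%20University example suggests, so they can be placed straight into the URL. The helper docNameFromPath is hypothetical.

```javascript
// Sketch of the script's name handling (docNameFromPath is hypothetical).
// Assumption: exported file names are already percent-encoded.
const path = require("path");

function docNameFromPath(filepath, extension = ".json") {
  // same effect as the two sed commands: drop the folder, then the extension
  return path.basename(filepath, extension);
}

const docname = docNameFromPath("universities/Aberystwyth%20University.json");
const url = "http://username:password@mick.couchone.com/universities/" + docname;
console.log(url);
// CouchDB decodes the URL path, so the stored _id contains a plain space.
console.log(decodeURIComponent(docname)); // Aberystwyth University
```

The last line shows why the same document is later addressed as Aston%20University in URLs: the stored _id contains a real space.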

Figure 5.2: Documents in CouchDB

The resulting universities database may be viewed online at http://mick.couchone.com/_utils/database.html?universities

5.2 A CouchDB Design Document

As mentioned, application code in CouchDB is stored in design documents. A CouchDB design document is a document with an id that begins with _design/ [12].

For the first design document, I have chosen an id of _design/d1. The aim of this very simple design document is to display (or ‘show’) a single document using HTML.

The listing below contains the initial code for the design document _design/d1. It contains a shows section, which in turn contains a single ‘show’ function, s1. The function returns a JavaScript string which is rendered as output.

{
   "_id": "_design/d1",
   "shows": {
      "s1": "function(doc, req) {
         return '<h1>' + doc._id + '</h1>' +
            '<p>latitude: ' + doc.latitude + '</p>' +
            '<p>longitude: ' + doc.longitude + '</p>';
      }"
   }
}

Note that shows is a reserved field name in a CouchDB design document. The ‘show’ function s1 takes two parameters, doc and req:

• doc refers to the document being rendered;
• req refers to the request object and can be used, for example, to retrieve data passed by a query string in the URL.
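To illustrate the req parameter, here is a hypothetical show function, s2, in the style of s1. It reads an invented ?format= query parameter. In CouchDB, req.query holds the parsed query string. Here both doc and req are mock objects so that the function can be run directly, and the coordinates are illustrative values, not taken from the data set.

```javascript
// A hypothetical show function in the style of s1 that also reads req.query.
// In CouchDB, req.query holds the parsed query string; here doc and req are mocks.
function s2(doc, req) {
  // ?format=text is an invented parameter: return plain text instead of HTML
  if (req.query.format === "text") {
    return doc._id + ": " + doc.latitude + ", " + doc.longitude;
  }
  return "<h1>" + doc._id + "</h1>" +
    "<p>latitude: " + doc.latitude + "</p>" +
    "<p>longitude: " + doc.longitude + "</p>";
}

// Illustrative values only, not taken from the universities data set.
const doc = { _id: "Aston University", latitude: 52.49, longitude: -1.89 };
console.log(s2(doc, { query: {} }));                 // HTML version
console.log(s2(doc, { query: { format: "text" } })); // Aston University: 52.49, -1.89
```

Stored in a design document, such a function would be requested with a URL of the form .../_show/s2/Aston%20University?format=text.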

Once the design document has been saved as d1.json, it can be uploaded to CouchDB in the usual way:

curl -X PUT http://username:password@mick.couchone.com/universities/_design/d1 -d @d1.json

To render a document (in this case the one for Aston University) using the s1 ‘show’ function, we use the following URL: http://mick.couchone.com/universities/_design/d1/_show/s1/Aston%20University

Figure 5.3: Document rendered in HTML

This is a rather simple document, but it serves as a proof-of-concept that CouchDB, as well as storing data, can serve HTML files.

The URL can be read as follows:

• http://mick.couchone.com is the hosted instance of CouchDB
• universities is the database
• _design/d1 is the design document
• _show/s1 is the ‘show’ function that returns an HTML string
• Aston%20University is the (URL-encoded) id of the document to be rendered


5.3 A more complex ‘show’ function

The design document may be extended to show more complex HTML documents, for example a Google Map for each location.

Figure 5.4: Document data rendered using Google Maps

This page is available at http://mick.couchone.com/universities/_design/d1/_show/map1/Aston%20University. As can be seen from the URL, the ‘show’ function which returns the HTML for the map is map1. The HTML for Google Maps is based on the code from the Google Maps JavaScript API V3 Tutorial, available at http://code.google.com/apis/maps/documentation/javascript/tutorial.html.

The full listing of the design document, including the map1 ‘show’ function, is provided in Appendix A.

It becomes apparent that authoring complex HTML and JavaScript within a design document, so that it is returned from a JavaScript function as a string, is very cumbersome. For example, the function itself is stored as a JSON string, so every double quote in the HTML has to be backslash-escaped for the design document to remain valid.

While manually authoring design documents, it can sometimes be easier to enter text directly into the design document using CouchDB’s management console (Futon), rather than uploading documents from the desktop PC using cURL. However, the process is rather difficult and unintuitive either way.
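The escaping problem is easy to reproduce. The sketch below stores a short show function inside a cut-down, hypothetical version of the d1 design document and serialises it with JSON.stringify. Every double quote in the HTML comes out as \", which is exactly what has to be typed by hand when editing the JSON directly.

```javascript
// Why hand-editing is tedious: a show function lives inside a JSON string,
// so every double quote in its HTML must be written as \" in the document.
const source = `function(doc, req) {
  return '<div id="map_canvas" style="width:100%">' + doc._id + '</div>';
}`;

// A cut-down, hypothetical version of the d1 design document.
const designDoc = { _id: "_design/d1", shows: { map1: source } };
const stored = JSON.stringify(designDoc);
console.log(stored); // the HTML appears as <div id=\"map_canvas\" style=\"width:100%\">

// Going the other way recovers the original source exactly.
console.log(JSON.parse(stored).shows.map1 === source); // true
```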

Figure 5.5: Documents may be edited directly using Futon

5.4 Views and Lists

As we saw in Section 4, queries against a CouchDB database take the form of map-reduce functions.

In a CouchDB design document, such queries are listed in the views section and are known as CouchDB ‘view’ functions. Here we define a very simple ‘view’ function, simply outputting the _id field for each document in the collection:

{
   "_id": "_design/d1",
   "_rev": "3-98e327097d3d7ed5a9454800c25d9ff9",
   "shows": { ... },
   "lists": { ... },
   "views": {
      "v1": {
         "map": "function(doc) { emit(doc._id, doc._id); }"
      }
   }
}

The output from the ‘view’ function is here: http://mick.couchone.com/universities/_design/d1/_view/v1

Figure 5.6: Output from a simple ‘view’ returning document id values

A ‘view’ is in effect a query function which returns a set of results. In order to render those results as HTML, we may define ‘list’ functions in a CouchDB design document:

• v1 is a very simple View, simply emitting the document ids
• l1 is a very simple List that shows a hyperlink for each document id

We augment the design document with a ‘list’ function as follows:

{
   "_id": "_design/d1",
   "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
   "shows": { ... },
   "lists": {
      "l1": "function(head, req) {
         var row;
         start({\"headers\": {\"Content-Type\": \"text/html\"}});
         while (row = getRow()) {
            send('<p><a href=\"http://mick.couchone.com/universities/_design/default/_show/googlemap/' + row.value + '\">' + row.value
               + '</a></p>');
         }
      }"
   },
   "views": {
      "v1": {
         "map": "function(doc) { emit(doc._id, doc._id); }"
      }
   }
}

Note the built-in functions start, getRow and send:

• start is called once, at the beginning of the function, to set the response headers
• getRow fetches the next row of the view result; the loop ends when no rows remain
• send is called once for each row in the dataset, writing a chunk of the response

At the following URL, the ‘list’ function l1 consumes the result set from the ‘view’ function v1 and displays the resulting HTML: http://mick.couchone.com/universities/_design/d1/_list/l1/v1

Figure 5.7: The simple ‘view’ rendered using a ‘list’ function
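To see how start, getRow and send cooperate, the list function can be run outside CouchDB against a small mock runtime. In the sketch, runList and its stand-in versions of start, getRow and send are hypothetical; they only imitate the functions CouchDB provides. l1 is the function from the listing above, written as plain JavaScript. The rows are shaped like v1's output, with the document id as both key and value.

```javascript
// A mock of the list-function runtime so that l1 can be run outside CouchDB.
// CouchDB provides start, getRow and send itself; these stand-ins only imitate them.
function runList(listFn, rows) {
  const chunks = [];
  let headers = null;
  const pending = rows.slice();
  globalThis.start = (response) => { headers = response.headers; };
  globalThis.getRow = () => pending.shift(); // undefined once the rows run out
  globalThis.send = (chunk) => { chunks.push(chunk); };
  listFn({}, {}); // head and req are unused by l1
  return { headers, body: chunks.join("") };
}

// l1 from the design document, written as a plain function.
function l1(head, req) {
  var row;
  start({ headers: { "Content-Type": "text/html" } });
  while ((row = getRow())) {
    send('<p><a href="http://mick.couchone.com/universities/_design/default/_show/googlemap/' +
      row.value + '">' + row.value + "</a></p>");
  }
}

// Rows shaped like the output of view v1: key and value are both the document id.
const result = runList(l1, [
  { id: "Aston University", key: "Aston University", value: "Aston University" },
  { id: "University of Aberdeen", key: "University of Aberdeen", value: "University of Aberdeen" },
]);
console.log(result.headers["Content-Type"]); // text/html
console.log(result.body);
```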

5.5 Lessons Learned

The motivation behind this section was to explore CouchDB design documents using very simple tools. I found that it is possible to author design documents in a text editor and upload these to a CouchDB server in the normal way. However, this approach becomes very cumbersome and difficult to maintain as the design document becomes more complex.

Another approach, which helped with more complex design documents, was to use CouchDB’s management console, ‘Futon’, to make in-place amendments to documents. This, however, also has its limitations. Fundamentally, the problem is that it is very difficult to write HTML as if it were the output of a JavaScript function.

As a result of working through this exercise in simple application development, we have gained a deeper understanding of the structure of a CouchDB design document. In the following section we will look at CouchApp, a development framework used to create full-featured applications using CouchDB design documents.

Section 6

Serving GeoRSS using CouchApp

This section details the development process used to build more full-featured CouchDB applications. I will add GeoRSS functionality to an existing CouchDB blogging application, Sofa, and conclude with some comments on how to build an application from scratch to serve GeoRSS.

GeoRSS (http://www.georss.org) is a standard for interoperable syndicated feeds of geographical information. One very common use of GeoRSS is to create feeds which can be displayed on online maps.

6.1 Introduction to CouchApp

As we saw in the previous section, it is possible to create HTML and JavaScript applications in CouchDB design documents.

CouchApp (http://couchapp.org) is a set of Python scripts developed by J. Chris Anderson and Benoit Chesneau to enable easier authoring and deployment of complex design documents [29].

I will not go into detail on the installation of CouchApp, but refer the reader to the instructions and tutorials at http://couchapp.org.

The key benefit of CouchApp for an application developer is that it creates a separate file for each element of the design document. From the developer’s perspective, a CouchDB design document is in fact a web application project. CouchApp helps us to keep each element separate, so that development proceeds in an organised fashion.

6.2 Sofa - a Blogging Application

The reference application for CouchApp is Sofa, a blogging application. The source code for Sofa is available here: http://github.com/jchris/sofa


I would like to make it absolutely clear that Sofa was created by J. Chris Anderson. I made a small number of changes to three files in my copy of it, in order to use the platform for serving GeoRSS data.

Sofa allows a user to enter blog posts and provides an Atom feed of the posts. I took this as a starting point and amended the feed output to serve GeoRSS, with the help of J. Chris Anderson and others on the couchdb-user mailing list.

I describe the changes I made in more detail in Appendix B.

Figure 6.1: Sofa amended to accept latitude and longitude

The instance of Sofa that I amended is available here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts

The GeoRSS feed is here: http://mick.couchone.com/blog/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The GeoRSS feed on Google Maps is available here: http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=http:%2F%2Fmick.couchone.com%2Fblog%2F_design%2Fsofa%2F_list%2Findex%2Frecent-posts%3Fdescending%3Dtrue%26limit%3D10%26format%3Datom&sll=37.0625,-95.677068&sspn=35.547176,56.513672&ie=UTF8&z=3


Figure 6.2: Sofa GeoRSS feed on Google Maps

6.3 Developing with CouchApp

In this section I describe the process of developing a simple CouchApp from scratch, using lessons learned from the amendments I made to Sofa.

As discussed, CouchApp is fundamentally a system for managing a complex CouchDB design document. It does this by separating each area of the design document into a separate file.

As an example, for the British Council Activity Mapping Project we want to create a design document, bc.

To do this, we navigate to the folder where couchapp is installed and run the following command:

couchapp generate bc

On my Ubuntu Linux PC, the full command line looks like this:

michael@dell:~/couchapp$ couchapp generate bc

This command creates a folder structure which matches the eventual structure of the design document that we want to create:

• The Views folder contains map.js and, when needed, reduce.js files for each CouchDB View. A View can be thought of as a query against the database.
• The Shows folder contains .js files corresponding to each ‘Show’ function in the design document. A Show is a function used to transform a single document into HTML.
• Similarly, the Lists folder contains .js files, one for each ‘List’ function in the eventually generated design document. A List is used to transform a set of documents returned by a View into HTML.
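The correspondence between folders and the design document can be sketched as follows. Each file's path becomes a nested property, so views/v1/map.js ends up at views.v1.map. This is a much-simplified, hypothetical illustration of the idea and not CouchApp's actual code, which also handles attachments and other details. The function bodies in the example are abbreviated.

```javascript
// A much-simplified sketch of the idea behind CouchApp: each file's path becomes
// a nested property of the design document (not CouchApp's actual code).
function buildDesignDoc(id, files) {
  const ddoc = { _id: id };
  for (const [filePath, contents] of Object.entries(files)) {
    // "views/v1/map.js" -> ["views", "v1", "map"]
    const parts = filePath.replace(/\.js$/, "").split("/");
    let node = ddoc;
    for (const part of parts.slice(0, -1)) {
      node[part] = node[part] || {};
      node = node[part];
    }
    node[parts[parts.length - 1]] = contents; // the file's text becomes the value
  }
  return ddoc;
}

const ddoc = buildDesignDoc("_design/bc", {
  "shows/s1.js": "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }",
  "lists/l1.js": "function(head, req) { /* abbreviated */ }",
  "views/v1/map.js": "function(doc) { emit(doc._id, doc._id); }",
});
console.log(JSON.stringify(ddoc, null, 2));
```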


Figure 6.3: Files generated by CouchApp

The screenshot illustrates the files generated by CouchApp. Some additions have been made to the default files, as follows:

• To make sure that .couchapprc is visible, it may be necessary to ‘show hidden files’ in the file browser. This file is required later, when we upload the design document to CouchDB.
• I added a templates folder. This folder contains the georss.html template that is used to create the GeoRSS feed.
• I added a lib folder. This folder contains the mustache.js file copied from Sofa. I discuss the reason for adding this file in more detail below.

To illustrate the use of CouchApp files, I will initially use the simple functions s1, l1 and v1, which were introduced in Section 5.

To generate a ‘Show’ function s1, we navigate to the bc folder:

michael@dell:~/couchapp$ cd bc

and run the following command:

michael@dell:~/couchapp/bc$ couchapp generate show s1


couchapp generate show creates a stub for a Show function, which we can fill in using a text editor. A Show function takes a single CouchDB document and transforms it to HTML.

Figure 6.4: s1.js in a file generated by CouchApp

Similarly, couchapp generate list l1 will create a ‘List’ function l1. In the case of views, couchapp generate view v1 will generate a folder named v1 containing two files, map.js and reduce.js. We recall that in CouchDB a View is in effect a query against the database.

As v1 is a very simple View, with a map function but no reduce function, we remove the empty reduce.js from the folder. This step matters: if we supply an empty reduce function, the View will return no values.

Figure 6.5: Remove the reduce.js file if it is not needed


6.4 Deployment

CouchApp uses Python scripts to merge all the separate files into a design document, and to send the resulting design document to the CouchDB server using the correct HTTP commands.

So that CouchApp knows which server to send the document to, we need to amend the .couchapprc file:

{
   "env": {
      "default": {
         "db": "http://username:password@mick.couchone.com/universities"
      }
   }
}

Running couchapp push from the bc folder then uploads the design document to the database named in .couchapprc. So far we have uploaded a very simple design document to CouchDB, just as we did in Section 5. This has demonstrated how much easier it is to develop solutions with the CouchApp framework than to work with design documents directly.

6.5 Using templates

In this section we complete the project’s aim of serving university project data from the British Council Activity Mapping Project using CouchDB.

For this we will make use of mustache.js, a templating system used for Sofa [35]. Mustache allows us to create HTML or XML output by transforming a set of documents. It provides the ability to iterate through sets of values and output HTML or XML in exactly the way we wish.

Firstly, let us generate a simple View, all. This consists of a map.js file (the corresponding reduce.js file has been deleted):

couchapp generate view all

The contents of map.js are as follows:

function(doc) {
   emit(doc._id, doc);
}

This simply outputs the content of all the documents in the collection, at the following URL: http://mick.couchone.com/universities/_design/bc/_view/all

The next step is to create a List function to transform the contents of the View:

couchapp generate list georss

This creates a file called georss.js. I added some code, derived from that in Sofa, which ‘packages’ the data from the View and sends it, using mustache.js, to a template file. The full text of the file is provided in Appendix C.

The template itself, georss.html, is stored in the templates directory. The full text is provided in Appendix D, but the main part of the file is also reproduced below:

{{#institutions}}
<entry>
   <id>{{id}}</id>
   <title>{{id}}</title>
   <content type="html">
      {{#has_programmes}}
      {{#programmes}}
         &lt;p&gt;{{name}}
         {{#has_countries}}
         {{#countries}}
            &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
         {{/countries}}
         {{/has_countries}}
         &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
   </content>
   <point>{{latitude}} {{longitude}}</point>
</entry>
{{/institutions}}

Each institution has a list of British Council programmes associated with it. The code loops through each of the programmes and, if the programme has a list of countries, the flags of those countries are emitted.

(The flags are uploaded to a countries database. The flags themselves come from http://www.famfamfam.com/lab/icons/flags/, as noted in Section 4. The bash script used to upload the files to CouchDB is provided in Appendix E.)
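The template expects an object with an institutions list. Each institution carries id, latitude, longitude, has_programmes and programmes, and each programme carries name, has_countries and countries. The sketch below shows one hypothetical way of packaging rows from the all view into that shape. The actual code is in Appendix C; the document fields assumed here (programmes, countries, latitude, longitude) and the sample values are illustrative assumptions.

```javascript
// Hypothetical sketch of the 'packaging' step: view rows -> the object that the
// georss.html template iterates over. The document fields used here
// (programmes, countries, latitude, longitude) are assumptions about the data.
function toTemplateData(rows) {
  return {
    institutions: rows.map(({ value: doc }) => {
      const programmes = (doc.programmes || []).map((p) => {
        const countries = (p.countries || []).map((code) => ({ country: code }));
        return { name: p.name, has_countries: countries.length > 0, countries };
      });
      return {
        id: doc._id,
        latitude: doc.latitude,
        longitude: doc.longitude,
        has_programmes: programmes.length > 0,
        programmes,
      };
    }),
  };
}

// One row as the 'all' view would return it (key: _id, value: the whole document).
// The field values are invented for illustration.
const rows = [
  {
    id: "Aston University",
    key: "Aston University",
    value: {
      _id: "Aston University",
      latitude: 52.49,
      longitude: -1.89,
      programmes: [{ name: "Erasmus", countries: ["zm"] }],
    },
  },
];
console.log(JSON.stringify(toTemplateData(rows), null, 2));
```

The has_ flags let the template skip the wrapper markup entirely when a list is empty.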

The GeoRSS feed is available here: http://mick.couchone.com/universities/_design/bc/_list/georss/all

The resulting Google Map may be viewed here: http://maps.google.com/?q=http://mick.couchone.com/universities/_design/bc/_list/georss/all

Figure 6.6: Activity Mapping data from CouchDB on Google Maps

This illustration shows that CouchDB can be used as a replacement for the current flat files used for the British Council Activity Map project (http://activitymap.britishcouncil.org).

The key factors in achieving this outcome have been the use of CouchApp and a server-side JavaScript templating framework (in this case Mustache).

Section 7

Critical Assessment and Conclusion

This MSc project investigated the suitability of CouchDB as a platform for business application development. Why should an applications developer in a corporate environment choose to develop an application on CouchDB, as opposed to simply using tried-and-trusted products such as PHP and MySQL, or C# and SQL Server?

The first comment to make is that it is not necessarily an ‘either-or’ scenario. In this project CouchDB is used as a web publishing layer in support of an existing application written using C# and SQL Server. I believe that this trend is likely to continue: so-called ‘NoSql’ databases may readily be used alongside relational databases, with each deployed according to its strengths.

In the British Council Activity Mapping project, relational integrity is important when putting the data together (using the internal ASP.NET/SQL Server application). For the publicly-available web layer, CouchDB is more suitable, since all we really need to do is render structured documents using XML, HTML and JavaScript.

7.1 Why might a developer choose CouchDB?

One distinctive feature of CouchDB is that it is both a web server and a database. As such, server-side code is simply stored as another kind of document. CouchDB uses communication over HTTP to replicate both data and code easily between remote servers and local machines. This gives a developer a key advantage: data and code can be replicated in many places, so that the user has fast access to them even without a good internet connection. Distributed data and application code are important for reliable computing.

Such replication cannot be achieved easily using conventional web applications. Even considering the maturity of prevailing technology, replicating code and data together as a unit is difficult and expensive.

The lesson I have gained from working with CouchDB is that document-oriented storage is closer to the desired end-product of web development, which is itself a document in HTML or XML format. CouchDB is natively simpler for web use than conventional databases, which were conceived and designed long before the web existed.

Web development with MySQL or SQL Server appears easy only because of the existence of Object-Relational Mapping (ORM) tools. The process of getting normalised data out of a relational database and displaying it on a web page is in fact quite complex. In CouchDB, by comparison, relatively simple tools such as CouchApp and Mustache do the same work of taking data and transforming it into useful formats for the web.

A further benefit of CouchDB is that it is standards-based. All communication with CouchDB takes place via standard HTTP verbs. In addition, CouchDB uses JSON, a widely adopted standard, as its document storage format. The use of HTTP as a communications standard in particular means that existing programming languages and command-line tools such as cURL are compatible with CouchDB. Learning about CouchDB actually assists with learning about the HTTP protocol.

A potential driver for development with CouchDB is cost: CouchDB is designed for robustness and reliability on commodity-grade hardware. (During the time I was working on this project, CouchOne started making hosted database capacity available at zero cost, as a beta service, at http://www.couchone.com/get.)

7.2 CouchDB’s challenges

As someone with experience of traditional applications development, I wanted to investigate the feasibility of creating a web application using this innovative technology. I found that it was indeed possible to use CouchDB design documents to transform data into useful formats for the web, such as GeoRSS and HTML.

The challenges revealed while developing on the CouchDB platform were as follows:

• writing queries as map-reduce functions instead of as SQL
• understanding the structure of design documents
• the use of server-side JavaScript functions to transform documents to HTML and XML

The task was made easier by tools such as CouchApp and the Mustache templating engine. Without them, as seen in Section 5, the process of transforming data is possible but cumbersome.

CouchDB is still a new product, and as such there is a lack of ‘entry-level’ documentation aimed at mainstream developers.

A further challenge for developers using CouchDB is the lack of SQL-style ad-hoc querying. Every query against the database is a predefined map-reduce function. Many developers are used to ‘thinking’ in SQL. Map-reduce is a different way of expressing an information need: it is not necessarily more difficult than SQL, but the learning curve is significant.

7.3 Conclusion

The ‘NoSql’ movement in database technology has come about largely as a result of the advances proposed by large-scale web application providers such as Google, Amazon, Facebook and Yahoo.

Questions of large-scale data storage at low cost using CouchDB were outside the scope of this particular project. Examples such as those of the BBC [17] and the Large Hadron Collider [18] are documented elsewhere.

This project’s focus was on the web development process at a more conventional scale.

The project demonstrated the importance of the CouchApp framework as an aid to developer productivity. As CouchDB matures as a product over the coming years (together with other ‘NoSql’ offerings), the set of tools available to developers will increase. I also suspect that developers will happily begin to adopt non-relational, document-oriented approaches to data persistence for certain applications, if this increases their productivity.

Based on my experiments, I believe that this technology has immense potential for web and database developers who want to make applications and data available (with replication) at low latency.

For CouchDB, the lack of ad-hoc querying remains an issue. For this reason, I think that CouchDB has a future mainly as a caching, publishing or reporting layer in support of existing relational business applications. This is what is meant by ‘the end of one size fits all’ [45]: in the coming years relational databases will continue alongside ‘NoSql’ offerings, each doing what it is best suited for.

Appendix A

Design Document d1

This is the CouchDB design document described in Section 5. The design document is also available online (source view): http://mick.couchone.com/_utils/document.html?universities/_design/d1

It contains two ‘show’ functions. The first, s1, returns some simple HTML; the second, map1, returns the HTML to render part of the document data on a Google Map.

A simple example: http://mick.couchone.com/universities/_design/d1/_show/s1/University%20of%20Aberdeen

A Google Map example: http://mick.couchone.com/universities/_design/d1/_show/map1/University%20of%20Aberdeen

Note the difficulty of this mode of design document development. All HTML is returned as a string from a server-side JavaScript function, which means that all double quotes, for example, need to be backslash-escaped. This is very tedious to do by hand.

As seen in Section 6, the CouchApp framework allows us to develop each HTML file in the normal way before packaging the files into a design document and sending it to the CouchDB server.

{
   "_id": "_design/d1",
   "_rev": "5-d6da34f482e0ee1a711ed302b9b08bb1",
   "shows": {
      "s1": "function(doc, req) { return
         '<h1>' + doc._id + '</h1>' +
         '<p>latitude: ' + doc.latitude + '</p>' +
         '<p>longitude: ' + doc.longitude + '</p>'; }",
      "map1": "function(doc, req) { return '<!DOCTYPE html>
         <html><head><meta name=\"viewport\"
         content=\"initial-scale=1.0, user-scalable=no\" />
         <style type=\"text/css\"> html { height: 100% }
         body { height: 100%; margin: 0px; padding: 0px }
         #map_canvas { height: 100% } </style>
         <script type=\"text/javascript\"
         src=\"http://maps.google.com/maps/api/js?sensor=false\"></script>
         <script type=\"text/javascript\"> function initialize() {
         var latlng = new google.maps.LatLng(' + doc.latitude + ', ' + doc.longitude + ');
         var myOptions = { zoom: 14, center: latlng,
         mapTypeId: google.maps.MapTypeId.ROADMAP };
         var map = new google.maps.Map(document.getElementById(\"map_canvas\"), myOptions);
         var marker = new google.maps.Marker({ position: latlng, title: \"' + doc._id + '\" });
         var infowindow = new google.maps.InfoWindow({ content: \"' + doc._id + '\" });
         google.maps.event.addListener(marker, \"click\", function() { infowindow.open(map, marker); });
         marker.setMap(map); } </script></head>
         <body onload=\"initialize()\"><div id=\"map_canvas\"
         style=\"width: 100%; height: 100%\"></div></body></html>'; }"
   }
}

Appendix B

GeoRSS on Sofa

In this appendix I detail a few small changes I made to three files from J. Chris Anderson's Sofa CouchApp project in order to enable it to serve GeoRSS feeds.

The archive of the relevant discussion on the couchdb-user public mailing list is available here: http://mail-archives.apache.org/mod_mbox/couchdb-user/201007.mbox/%3CAANLkTi=N+SgEwwW92FTUriF4XWgKsMKhdujq_ikjzR4S@mail.gmail.com%3E

The changes below refer to files originally found in the Sofa source code (http://github.com/jchris/sofa). I have indicated my changes with double-forward-slash comments (//).

At the end of vendor/couchapp/lib/atom.js:

exports.header = function(data) {
  // var f = <feed xmlns="http://www.w3.org/2005/Atom"/>;
  var f = <feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"/>;
  f.title = data.title;
  f.id = data.feed_id;
  f.link.@href = data.feed_link;
  f.link.@rel = "self";
  f.generator = "CouchApp on CouchDB";
  f.updated = rfc3339(data.updated);
  return f.toXMLString().replace(/\<\/feed\>/, '');
};

exports.entry = function(data) {
  var entry = <entry/>;
  entry.id = data.entry_id;
  entry.title = data.title;
  entry.content = data.content;
  entry.content.@type = (data.content_type || 'html');
  entry.updated = rfc3339(data.updated);
  entry.author = <author><name>{data.author}</name></author>;
  entry.link.@href = data.alternate;
  entry.link.@rel = "alternate";
  entry.point = data.point; // added for GeoRSS
  return entry;
};

At the end of lists/index.js:

          alternate : path.absolute(path.show('post', row.id)),
          // point : row.value.loc[1] + " " + row.value.loc[0]
          point : row.value.latitude + " " + row.value.longitude
        });
        // send the entry to client
        send(feedEntry);
      } while (row = getRow());
    }
    // close the loop after all rows are rendered
    return "</feed>";
  });

I also made the following rudimentary changes to templates/edit.html:

<!-- form to create a post -->
<form id="new-post" action="new.html" method="post">
  <h1>{{pageTitle}}</h1>
  <!-- amended for geosofa -->
  <p><label>Place Name</label>
    <input type="text" size="50" name="title" value=""></p>
  <p><label>Latitude</label>
    <input type="text" size="50" name="latitude" value=""></p>
  <p><label>Longitude</label>
    <input type="text" size="50" name="longitude" value=""></p>
  <!-- -->

<!-- this is further down in templates/edit.html -->

// apply docForm at login
$("#account").evently({
  loggedIn : function(e, r) {
    var userCtx = r.userCtx;
    postForm = app.docForm("form#new-post", {
      id : {{docid}},
      // fields : ["title", "body", "tags"],
      fields : ["title", "latitude", "longitude", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : userCtx.name

Appendix C

georss.js

As can be seen from the signature function(head, req), this is a CouchDB list function. It takes as input a set of documents returned by a view; in the simplest case, this is the view all, which simply returns all documents.

What is interesting about this code? Well, it gathers together a model of the data in the variable stash, then uses the Mustache.to_html function to pass the 'stash' to a Mustache template file, georss.html. The contents of this file are provided in Appendix D.

This provides a great deal of templating flexibility.

function(head, req) {
  var ddoc = this;
  var Mustache = require("lib/mustache");
  var List = require("vendor/couchapp/lib/list");

  provides("html", function() {
    var key = "";
    var stash = {
      institutions : List.withRows(function(row) {
        var institution = row.value;
        key = row.key;
        return {
          id : institution._id,
          rev : institution._rev,
          institutionID : institution.institutionID,
          postcode : institution.postcode,
          latitude : institution.latitude,
          longitude : institution.longitude,
          constituency : institution.constituency,
          localAuthority : institution.localAuthority,
          region : institution.region,
          has_programmes : institution.programmes ? true : false,
          programmes : institution.programmes ?
            institution.programmes.map(function(programme) {
              return {
                name : programme.name,
                url : programme.url,
                has_countries : programme.countries ? true : false,
                countries : programme.countries ?
                  programme.countries.map(function(country) {
                    return { country : country };
                  }) : [] // return nothing if no countries
              };
            }) : [] // return nothing if no programmes
        };
      })
    };
    return Mustache.to_html(ddoc.templates.georss, stash);
  });
}
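The list function above relies on List.withRows from the CouchApp library to pull rows from the view. The Sofa excerpt in Appendix B shows the lower-level interface that list functions use directly: getRow() returns the next view row (or null when the rows are exhausted), send() streams a chunk of output, and the function's return value is sent as the final chunk. The following sketch assumes that behaviour and stubs getRow() and send() so the loop can be run under Node.js outside CouchDB. The function name georssList and the second row are hypothetical, and the entries use the georss:point element of the GeoRSS Simple encoding rather than the template from Appendix D.

```javascript
// Hypothetical view rows in the shape a view such as "all" might emit.
// The first uses the coordinates of the St Andrews document in Appendix D;
// the second is a placeholder.
const rows = [
  { id: "University of St Andrews",
    value: { latitude: 56.3412139443169, longitude: -2.79301175308608 } },
  { id: "Example & Co College", value: { latitude: 0, longitude: 0 } },
];

// Stand-ins for the getRow() and send() functions that CouchDB provides to
// list functions: here they read from the array and collect output chunks.
let cursor = 0;
function getRow() { return cursor < rows.length ? rows[cursor++] : null; }
const chunks = [];
function send(chunk) { chunks.push(chunk); }

// Escape the characters that would break the XML (ids may contain "&").
function xmlEscape(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// A list function in the same (head, req) form as georss.js, streaming one
// Atom entry per row with send() instead of rendering a template.
function georssList(head, req) {
  send('<feed xmlns="http://www.w3.org/2005/Atom" ' +
       'xmlns:georss="http://www.georss.org/georss">');
  let row;
  while ((row = getRow())) {
    send("<entry><id>" + xmlEscape(row.id) + "</id>" +
         "<georss:point>" + row.value.latitude + " " + row.value.longitude +
         "</georss:point></entry>");
  }
  // The returned string becomes the final chunk of the response.
  return "</feed>";
}

const tail = georssList({}, {});
const feed = chunks.join("") + tail;
console.log(feed);
```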

Appendix D

georss.html

The georss.html code uses the Mustache templating system to transform the data from a CouchDB document into a valid GeoRSS-format file.

The code below goes hand-in-hand with the georss.js file described in Appendix C.

Note in particular the iteration code: the section between {{#countries}} and {{/countries}} will execute once for each country listed in a particular programme.
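The section semantics can be illustrated with a toy renderer. This is a hypothetical sketch written only for this explanation, not the mustache.js implementation [35]: it handles {{name}} variables and {{#name}}...{{/name}} sections, where a list renders the section once per element and any other truthy value renders it once, but it omits HTML escaping, inverted sections and partials.

```javascript
// Toy renderer for two Mustache features: {{name}} variables and
// {{#name}}...{{/name}} sections. Not mustache.js: no HTML escaping,
// no inverted sections, no partials.
function render(template, ctx) {
  // Match a section; the backreference \1 pairs {{#x}} with its own {{/x}}.
  const sectionRe = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g;
  const withSections = template.replace(sectionRe, (match, name, body) => {
    const value = ctx[name];
    if (Array.isArray(value)) {
      // A list: render the body once per element. Each element's fields are
      // layered over the enclosing context, so outer names stay visible.
      return value.map((item) => render(body, { ...ctx, ...item })).join("");
    }
    // Any other truthy value renders the body once; falsy renders nothing.
    return value ? render(body, ctx) : "";
  });
  // Replace the remaining {{name}} tags; unknown names become "".
  return withSections.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    ctx[name] === undefined ? "" : String(ctx[name]));
}

// One programme from the St Andrews document, shaped as georss.js builds it.
const programme = {
  name: "Chevening Programme",
  has_countries: true,
  countries: [{ country: "id" }, { country: "mz" }, { country: "tj" }],
};
const tpl = "<p>{{name}}{{#has_countries}}{{#countries}}[{{country}}]" +
  "{{/countries}}{{/has_countries}}</p>";
const out = render(tpl, programme);
console.log(out); // <p>Chevening Programme[id][mz][tj]</p>
```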

Here is an example of a typical document which is transformed by the template code:

{
  "_id": "University of St Andrews",
  "_rev": "1-7a29159d52cd71b9f6759ad6d3884945",
  "institutionID": 15161,
  "postcode": "KY16 9AJ",
  "latitude": 56.3412139443169,
  "longitude": -2.79301175308608,
  "constituency": "North East Fife",
  "localAuthority": "Fife Council",
  "region": "Scotland",
  "programmes": [
    {
      "name": "Chevening Programme",
      "url": "http://www.chevening.com",
      "countries": ["id", "mz", "tj"]
    },
    {
      "name": "Commonwealth Scholarship and Fellowship Plan (CSFP)",
      "url": "http://www.csfp-online.org",
      "countries": ["bd", "za"]
    }
  ]
}

Listed below is the code for georss.html. Even though it is an HTML file, the output it produces is in fact in GeoRSS XML format. The .html file extension is simply there as a convention for Mustache.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss">
  <title>British Council Activity Mapping</title>
  <id>http://mick.couchone.com/universities/_design/bc/_list/georss/all</id>
  <link href="http://mick.couchone.com/universities/_design/bc/_list/georss/all" rel="self"/>
  <generator>CouchApp on CouchDB</generator>
  <updated>2010-09-12</updated>
  {{#institutions}}
  <entry>
    <id>{{id}}</id>
    <title>{{id}}</title>
    <content type="html">
      {{#has_programmes}}
      {{#programmes}}
      &lt;p&gt;{{name}}
      {{#has_countries}}
      {{#countries}}
      &lt;img src="http://mick.couchone.com/countries/{{country}}/flag" alt="{{country}}" title="{{country}}"&gt;
      {{/countries}}
      {{/has_countries}}
      &lt;/p&gt;
      {{/programmes}}
      {{/has_programmes}}
    </content>
    <point>{{latitude}} {{longitude}}</point>
  </entry>
  {{/institutions}}
</feed>

Appendix E

Bash Script to Upload Country Flag Files

This is a Linux bash script used to upload PNG files from a folder so that each image file becomes an attachment to a CouchDB document (one document per country).

An example of an uploaded flag file is http://mick.couchone.com/countries/de/flag, the flag of Germany.

#!/bin/bash
# png country flag files from http://www.famfamfam.com/lab/icons/flags
# png country flag files are copied to countries/png folder
# (beneath the current folder)
FILES=countries/png/*

# create countries database in CouchDB
curl -X PUT http://username:password@mick.couchone.com/countries

for filepath in $FILES
do
  echo "$filepath"
  # get the file name from the file path
  filename=$(echo "$filepath" | sed -e 's/countries\/png\///g')
  echo "$filename"
  # strip the .png extension to get the document name (e.g. "de")
  docname=$(echo "$filename" | sed -e 's/\.png$//')
  echo "$docname"
  url=http://username:password@mick.couchone.com/countries/$docname/flag
  echo "$url"
  # put the attachment into CouchDB
  # this command creates the 'ie' record and puts
  # the png as an attachment
  # under countries/ie/flag
  # curl -H "Content-Type: image/png" -X PUT http://127.0.0.1:5984/countries/ie/flag --data-binary @ie.png
  echo curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
  curl -H "Content-Type: image/png" -X PUT "$url" --data-binary "@$filepath"
done
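The naming convention used by the script (file path to document id to attachment URL) can be summarised in a few lines. The sketch below is a hypothetical Node.js helper, flagAttachmentUrl, that mirrors the two sed commands by stripping the folder and the .png suffix; the base URL is the local address from the commented-out example in the script.

```javascript
const path = require("path");

// Mirrors the naming steps of the bash script: drop the folder part of the
// path, drop the ".png" suffix, and use what remains (for example "de") as
// the id of the document that receives the "flag" attachment.
function flagAttachmentUrl(baseUrl, filepath) {
  const docname = path.basename(filepath, ".png");
  return `${baseUrl}/countries/${encodeURIComponent(docname)}/flag`;
}

// The local address from the commented-out example in the script.
const url = flagAttachmentUrl("http://127.0.0.1:5984", "countries/png/de.png");
console.log(url); // http://127.0.0.1:5984/countries/de/flag
```

Note that a PUT of an attachment to a document that already exists must also supply the document's current revision (for example as a rev query parameter), so re-running the script against a populated countries database would need that extra step.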

Bibliography

[1] A NoSql Summer: A seasonal, worldwide reading club for databases, distributed systems and NoSql-related scientific papers. http://nosqlsummer.org

[2] British Council. http://www.britishcouncil.org

[3] django: The Web framework for perfectionists with deadlines. http://www.djangoproject.com

[4] GeoRSS. http://www.georss.org

[5] How desktopcouch works. http://www.freedesktop.org/wiki/Specifications/desktopcouch/Documentation/How_Desktopcouch_Works

[6] Introducing JSON. http://json.org

[7] KML. http://code.google.com/apis/kml/documentation

[8] Quickly. https://wiki.ubuntu.com/Quickly

[9] Relational Persistence for Java and .NET. http://www.hibernate.org

[10] The CouchDB Project. http://couchdb.apache.org

[11] J. Chris Anderson. CouchDB Implements a Fundamental Algorithm. http://jchrisa.net/drl/_design/sofa/_show/post/CouchDB-Implements-a-Fundamental-Algorithm

[12] J. Chris Anderson, Jan Lehnardt and Noah Slater. Design Documents. http://guide.couchdb.org/editions/1/en/design.html

[13] J. Chris Anderson, Jan Lehnardt and Noah Slater. CouchDB: The Definitive Guide. O'Reilly Media, 2010.

[14] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.

[15] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. In Communications of the ACM, 1970.

[16] Brian F. Cooper, Raghu Ramakrishnan and Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data: The Stories Behind Elegant Data Solutions, chapter 4. O'Reilly Media, 2009.

[17] CouchOne. BBC: A Case Study. http://www.couchone.com/case-study-bbc

[18] CouchOne. CERN: A Case Study. http://www.couchone.com/case-study-cern

[19] CouchOne. Why A Mobile Database. http://www.couchone.com/page/why-mobile

[20] British Council. Annual Report 2009-10. http://www.britishcouncil.org/new/Global/BC%20Annual%20Report%202009-10_reupload.pdf

[21] British Council. Erasmus. http://www.britishcouncil.org/erasmus.htm

[22] C. J. Date. An Introduction to Database Systems. Pearson Education Inc., 8th edition, 2004.

[23] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI'04: Sixth Symposium on Operating System Design and Implementation, December 2004.

[24] Dr Lawrie Brown, School of Computer Science, Australian Defence Force Academy, Canberra, Australia. Erlang: An Open Source Language for Robust Distributed Applications. http://www.unsw.adfa.edu.au/~lpb/papers/auug99-erl.html

[25] Paul McJones (ed.). The 1995 SQL Reunion: People, Projects, and Politics. 1997.

[26] A. Fox and E. A. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, 1999.

[27] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. In ACM SIGACT News, 2002.

[28] Vivek Gite. Bash Shell Loop Over Set of Files. http://www.cyberciti.biz/faq/bash-loop-over-file

[29] github. CouchApp Commit History. http://github.com/couchapp/couchapp/commits/master

[30] Ricky Ho. Couchdb implementation. http://horicky.blogspot.com/2008/10/couchdb-implementation.html

[31] Jacob Kaplan-Moss. Of the Web. http://jacobian.org/writing/of-the-web

[32] Damien Katz. CouchDB and Me. http://www.infoq.com/presentations/katz-couchdb-and-me

[33] Damien Katz. CouchDB Architecture. http://damienkatz.net/2005/04/couchdb_archite.html

[34] Stuart Langridge. Firefox bookmarks in CouchDB. http://www.kryogenix.org/days/2009/07/06/firefox-bookmarks-in-couchdb

[35] Jan Lehnardt. mustache.js. http://github.com/janl/mustache.js

[36] Vance Lucas. NoSql First Impressions: Object Databases Missed the Boat. http://www.vancelucas.com/blog/nosql-first-impressions-object-databases-missed-the-boat

[37] Nigel Martin. Transaction Management Lecture Notes. MSc Computer Science, Birkbeck College.

[38] Dwight Merriman. Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

[39] Microsoft. Entity Framework Design. http://blogs.msdn.com/b/efdesign

[40] Ted Neward. The Vietnam of Computer Science. http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

[41] Ruby on Rails. Class ActiveRecord::Base. http://api.rubyonrails.org/classes/ActiveRecord/Base.html

[42] Ed Parcell. Tutorial: Using JQuery and CouchDb to build a simple AJAX web application. http://edparcell.posterous.com/using-jquery-and-couchdb-to-build-a-simple-we

[43] Ryan Paul. Code tutorial: make your application sync with Ubuntu One. http://arstechnica.com/open-source/guides/2009/12/code-tutorial-make-your-application-sync-with-ubuntu-one.ars

[44] Daniel Stenberg and contributors. curl(1): the man page. http://curl.haxx.se/docs/manpage.html

[45] M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the International Conference on Data Engineering, pages 2–11. IEEE Computer Society Press, 2005.

[46] M. Stonebraker, E. Wong, P. Kreps and G. Held. The Design and Implementation of INGRES. ACM TODS 1, 3 (September 1976).

[47] Klaus Trainer. CouchDB 1.0 Retrospectives. http://mambofulani.couchone.com/blog/_design/sofa/_list/post/post-page?startkey=[%22CouchDB-1-0-Retrospectives%22]

[48] Rik van der Sanden. Accessing Azure. Master's thesis, Delft University, the Netherlands, 2009. http://swerl.tudelft.nl/twiki/pub/Main/PastAndCurrentMScProjects/Thesis_Rik_van_der_Sanden.pdf

[49] Peter Vogel. [Weblog Comment] Is Microsoft Feeling the 'NoSQL' Heat? http://reddevnews.com/Blogs/Data-Driver/2009/12/NoSQL-Heat

[50] Wikipedia. ACID. http://en.wikipedia.org/wiki/ACID

[51] Wikipedia. cURL. http://en.wikipedia.org/wiki/CURL

[52] Wikipedia. Database Normalization. http://en.wikipedia.org/wiki/Database_normalization

[53] Wikipedia. JSON. http://en.wikipedia.org/wiki/JSON

[54] Wikipedia. Relational model. http://en.wikipedia.org/wiki/Relational_model

[55] Interview with Damien Katz by Werner Schuster. Damien Katz Relaxing on CouchDB. http://www.infoq.com/interviews/CouchDB-Damien-Katz
