
Explore MongoDB
Learn why this database management system is so popular

Skill Level: Intermediate

Joe Lennon, Lead Mobile Developer, Core International

21 Jun 2011

In this article, you will learn about MongoDB, the open source, document-oriented database management system written in C++ that provides features for scaling your databases in a production environment. Discover what benefits document-oriented databases have over traditional relational database management systems (RDBMS). Install MongoDB and start creating databases, collections, and documents. Examine Mongo's dynamic querying features, which provide key/value store efficiency in a way familiar to RDBMS database administrators and developers.

What is MongoDB?

In recent years, we have seen a growing interest in database management systems that differ from the traditional relational model. At the heart of this is the concept of NoSQL, a term used collectively to denote database software that does not use the Structured Query Language (SQL) to interact with the database. One of the more notable NoSQL projects out there is MongoDB, an open source document-oriented database that stores data in collections of JSON-like documents. What sets MongoDB apart from other NoSQL databases is its powerful document-based query language, which makes the transition from a relational database to MongoDB easy because the queries translate quite easily.

MongoDB is written in C++. It stores data inside JSON-like documents (using BSON, a binary version of JSON), which hold data using key/value pairs. One feature that differentiates MongoDB from other document databases is that it is very straightforward to translate SQL statements into MongoDB query function calls. This makes it easy for organizations currently using relational databases to migrate. It is also very straightforward to install and use, with binaries and drivers available for major operating systems and programming languages.

Explore MongoDB Trademarks © Copyright IBM Corporation 2011

MongoDB is an open-source project, with the database itself licensed under the GNU AGPL (Affero General Public License) version 3.0. This license is a modified version of the GNU GPL that closes a loophole where the copyleft restrictions do not apply to the software's usage but only its distribution. This of course is important in software that is stored on the cloud and not usually installed on client devices. Using the regular GPL, one could perceive that no distribution is actually taking place, and thus potentially circumvent the license terms.

The AGPL only applies to the database application itself, and not to other elements of MongoDB. The official drivers that allow developers to connect to MongoDB from various programming languages are distributed under the Apache License Version 2.0. The MongoDB documentation is available under a Creative Commons license.

Document-oriented databases

Document-oriented databases are quite different from traditional relational databases. Rather than store data in rigid structures like tables, they store data in loosely defined documents. With relational database management system (RDBMS) tables, if you need to add a new column, you need to change the definition of the table itself, which will add that column to every existing record (albeit with potentially a null value). This is due to RDBMS' strict schema-based design. However, with documents you can add new attributes to individual documents without changing any other documents. This is because document-oriented databases are generally schema-less by design.
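To make the schema-less idea concrete, here is a minimal sketch in plain JavaScript that runs in Node.js without a MongoDB server. The people array merely stands in for a collection, and its field names and values are hypothetical.

```javascript
// An in-memory array standing in for a collection (assumption: no server involved).
const people = [
  { name: "Ann", email: "ann@example.com" },
  { name: "Bob", email: "bob@example.com" },
];

// Add a new attribute to ONE document; no shared definition has to change.
people[0].twitter = "@ann";

// Other documents are untouched: they simply lack the attribute.
const withTwitter = people.filter((doc) => "twitter" in doc);
console.log(withTwitter.length);
console.log("twitter" in people[1]);
```

In a relational table the equivalent change would be an ALTER TABLE that touches every row.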

Another fundamental difference is that document-oriented databases don't provide strict relationships between documents, which helps maintain their schema-less design. This differs greatly from relational databases, which rely heavily on relationships to normalize data storage. Instead of storing "related" data in a separate storage area, document databases embed it in the document itself. This is much faster than storing a reference to another document where the related data is stored, as each reference would require an additional query.

This works extremely well for many applications where it makes sense for the data to be self-contained inside a parent document. A good example (which is also given in MongoDB documentation) is blog posts and comments. The comments only apply to a single post, so it does not make sense to separate them from that post. In MongoDB, your blog post document would have a comments attribute that stores the comments for that post. In a relational database you would probably have a comments table with an ID primary key, a posts table with an ID primary key and an intermediate mapping table post_comments that defines which comments belong to which post. This is a lot of unnecessary complexity for something that should be very straightforward.

developerWorks® ibm.com/developerWorks

However, if you must store related data separately you can do so easily in MongoDB using a separate collection. Another good example is storing customer order information. This typically comprises information about a customer, the order itself, line items in the order, and product information. Using MongoDB, you would probably store customers, products, and orders in individual collections, but you would embed line item data inside the relevant order document. You would then reference the products and customers collections using foreign key-style IDs, much like you would in a relational database. The simplicity of this hybrid approach makes MongoDB an excellent choice for those accustomed to working with SQL. With that said, take time and care to decide on the approach you need to take for each individual use case, as embedding data inside the document rather than referencing it in other collections can yield significant performance gains.
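The hybrid approach described above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the three arrays stand in for collections, findById is a hypothetical helper, and the field names customer_id and product_id are invented for the example.

```javascript
// In-memory stand-ins for three collections (hypothetical data, no server involved).
const customers = [{ _id: 1, name: "Joe Bloggs" }];
const products = [
  { _id: "ABC1200", description: "A sample product" },
  { _id: "XYZ3400", description: "An expensive product" },
];
const orders = [
  {
    _id: 109384,
    customer_id: 1, // reference, like a foreign key
    items: [
      // embedded line items
      { product_id: "ABC1200", quantity: 1 },
      { product_id: "XYZ3400", quantity: 2 },
    ],
  },
];

// Hypothetical helper: resolving a reference costs one extra lookup.
function findById(collection, id) {
  return collection.find((doc) => doc._id === id);
}

const order = orders[0];
const customer = findById(customers, order.customer_id);

// The embedded line items arrive with the order itself; only the
// product details behind each item need a further lookup.
const lines = order.items.map((item) => ({
  description: findById(products, item.product_id).description,
  quantity: item.quantity,
}));

console.log(customer.name, lines);
```

Each call to findById models the additional query that a reference requires, which is the cost the text weighs against embedding.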

Features at a glance

MongoDB is a lot more than just a basic key/value store. Let's take a brief look at some of its other features:

• Official binaries available for Windows®, Mac OS X, Linux® and Solaris, source distribution available for self-build

• Official drivers available for C, C#, C++, Haskell, Java™, JavaScript, Perl, PHP, Python, Ruby and Scala, with a large range of community-supported drivers available for other languages

• Ad-hoc JavaScript queries that allow you to find data using any criteria on any document attribute. These queries mirror the functionality of SQL queries, making it very straightforward for SQL developers to write MongoDB queries.

• Support for regular expressions in queries

• MongoDB query results are stored in cursors that provide a range of functions for filtering, aggregation, and sorting including limit(), skip(), sort(), count(), distinct() and group().

• map/reduce implementation for advanced aggregation

• Large file storage using GridFS

• RDBMS-like attribute indexing support, where you can create indexes directly on selected attributes of a document

• Query optimization features using hints, explain plans, and profiling

• Master/slave replication similar to MySQL


• Collection-based object storage, allowing for referential querying where normalized data is required

• Horizontal scaling with auto-sharding

• In-place updates for high-performance contention-free concurrency

• Online shell allows you to try out MongoDB without installing

• In-depth documentation, several books published and currently in writing

Installing MongoDB

Fortunately, MongoDB is very straightforward to install on a wide variety of platforms. Binary distributions are available for Windows, Mac OS X, Linux, and Solaris, while various package managers provide easy installation and setup options for other systems. If you're brave enough, you can compile the source code for yourself. In this section, you will learn how to install MongoDB on Windows and Mac OS X, setting the process up as a service on Windows or as a daemon on OS X.

Installing on Windows

Installation of MongoDB on Windows is very straightforward. In your favorite web browser, navigate to http://www.mongodb.org/downloads and download the latest stable production release for Windows. The 64-bit version is recommended, but can only be used if you are using the 64-bit version of the Windows operating system. If you're unsure, just use the 32-bit version.

Extract the zip file to the C:\ drive, which will create a new folder with a name like mongodb-win32-i386-1.6.4. To make your life easier, rename this folder to mongo. Next, you need to create a data directory. In Windows Explorer, go to the root of the C:\ drive and create a new folder named data. Inside this folder, create a new folder named db.

You can now start the MongoDB server. Use Windows Explorer to navigate to C:\mongo\bin and double-click mongod.exe. Closing the command prompt window that opens will stop the MongoDB server. As a result, it is more convenient to set up the MongoDB server as a service that Windows controls. Let's do that now.

Open a command prompt window (Start > Run, enter cmd and press OK) and issue the commands in Listing 1.

Listing 1. Setting up the MongoDB server as a service

> cd \mongo\bin
> mongod --install --logpath c:\mongo\logs --logappend --bind_ip 127.0.0.1 --directoryperdb

You should see the output in Listing 2.

Listing 2. Service created successfully

all output going to c:\mongo\logs
Creating service MongoDB.
Service creation successful.
Service can be started from the command line via 'net start "MongoDB"'.

With Mongo installed as a service, you can now start it with the following command: > net start "MongoDB"

You should see the output in Listing 3.

Listing 3. Mongo started successfully

The Mongo DB service is starting.
The Mongo DB service was started successfully.

You can now run the MongoDB shell client. If you have a command prompt window open, make sure you are in the c:\mongo\bin folder and enter the following command: > mongo.

Alternatively, in Windows Explorer navigate to C:\mongo\bin and double-click on mongo.exe. Whichever way you choose to start the shell, you should see a prompt as in Listing 4.

Listing 4. Starting the shell

MongoDB shell version: 1.8.1
connecting to: test
>

Unless you also want to set up MongoDB on a Mac OS X machine, you can now skip the next part of this section and move on to "Getting started", where you will learn how to interact with the MongoDB server using the shell client.

Installing on Mac OS X

Assuming you are using a 64-bit version of Mac OS X, the following steps detail how to download the 64-bit OS X binary of MongoDB, extract it and configure it to get started. They also show you how to run MongoDB as a daemon.


First, launch Terminal (Applications > Utilities > Terminal). In the Terminal window, run the commands in Listing 5.

Listing 5. Setting up MongoDB on Mac OS X

$ cd ~
$ curl http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-1.8.1.tgz > mongo.tgz
$ tar xzf mongo.tgz
$ mv mongodb-osx-x86_64-1.8.1/ mongo
$ mkdir -p /data/db

MongoDB is now set up and ready to use. Before going any further, it might be good to add MongoDB to your path. Execute the following command: $ nano ~/.bash_profile.

This file may not exist yet. In any case, add the following line: export PATH=${PATH}:~/mongo/bin.

Save the file by pressing ctrl + O and then hit Enter at the prompt. Then press ctrl + X to exit nano. Now, reload your bash profile with the following command: $ source ~/.bash_profile.

You are now ready to start up MongoDB. To start it, simply issue the following command: $ mongod.

This will start the MongoDB database server as a foreground process. If you'd prefer to start MongoDB as a daemon process in the background, issue the following command instead: $ sudo mongod --fork --logpath /var/log/mongodb.log --logappend.

You will be asked to enter a password; enter your Mac OS X administrator password at this prompt.

Regardless of which method you chose to start MongoDB, the server should now be running. If you started it in the foreground, you will need a separate Terminal tab or window to start the client. To start the client, you simply use the command: $ mongo

You should see the prompt in Listing 6.

Listing 6. Starting the client

MongoDB shell version: 1.8.1
connecting to: test
>

In the next section, you will learn how to use the MongoDB shell to create databases, collections, documents, and so on.


Getting started using MongoDB

Included with the MongoDB distribution is a shell application that allows you complete control over your databases. Using the shell, you can create and manage databases, collections, documents, and indexes using server-side JavaScript functions. This makes it easy to get up and running with MongoDB quickly. In this section, you will learn how to start the shell and see examples of some basic commands for data storage and retrieval.

The MongoDB shell

The MongoDB shell application is included with the MongoDB distribution in the bin folder. On Windows, this is in the form of the application mongo.exe. Double-clicking this program in Windows Explorer will start the shell. In UNIX®-based operating systems (including Mac OS X) you can start the MongoDB shell by executing the mongo command in a terminal window (assuming you followed the instructions above to add the MongoDB directory to your path).

When you first launch the shell, you should see the message in Listing 7.

Listing 7. Message after launching the shell

MongoDB shell version: 1.8.1
connecting to: test
>

You are now connected to your local MongoDB server, and in particular, the "test" database. In the next section, you will learn how to create databases, documents, and collections. If at any stage you are looking for some help, you can simply issue the command "help" at the Mongo shell prompt. Figure 1 shows the typical output of a help command.

Figure 1. Output from Mongo shell help command


If you ever want to see the source code behind a MongoDB function, simply type the name of that function in the shell, and it will print the JavaScript source. For example, type connect and hit the return key, and you will see the source code used to connect to a MongoDB database.

Creating databases, collections, and documents

By default, the Mongo shell connects to the "test" database. To switch to a different database, you use the "use dbname" command. If the database does not exist, MongoDB will create it as soon as you add any data to it. Let's switch to the "mymongo" database with the following command: > use mymongo.

The shell should return the message: switched to db mymongo.

At this point, the database still doesn't really exist, as it doesn't contain any data. In MongoDB, data is stored in collections, allowing you to separate documents if required. Let's create a document and store it in a new collection named "colors": > db.colors.save({name:"red",value:"FF0000"});.

Let's verify that the document has been stored by querying the database: > db.colors.find();.

You should see a response similar to the following (the _id attribute is a unique identifier and will more than likely be different in your result): { "_id" : ObjectId("4cfa43ff528bad4e29beec57"), "name" : "red", "value" : "FF0000" }.
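As background that goes beyond the original text: the first four bytes of an ObjectId encode its creation time as seconds since the Unix epoch, which is one reason the value differs on every insert. The sketch below decodes that timestamp from the hex string in plain JavaScript. The function objectIdTimestamp is a hypothetical helper written for this illustration, not a shell function.

```javascript
// Hypothetical helper: decode the creation time embedded in an ObjectId hex string.
// Assumes the standard 24-character hex form, whose first 8 hex digits
// (4 bytes) are a big-endian Unix timestamp in seconds.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("4cfa43ff528bad4e29beec57");
console.log(created.toISOString()); // 2010-12-04T13:37:03.000Z
```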


Documents in MongoDB are stored as BSON (binary JSON). Using the Mongo shell, we can insert data using a JSON-like syntax where each document is an object of key-value pairs. In this example, we created a document with two attributes: name and value, which have values of red and FF0000 (the hexadecimal representation of the standard red color), respectively.

As you may have noticed, you did not need to predefine the colors collection. MongoDB creates it automatically when you insert an item using the save function.

In this example, you created a very simple document. However, the same JSON-like syntax can be used to create documents that are more complex. Consider the following JSON document, which represents a purchase order or invoice (see Listing 8).

Listing 8. Creating a simple document

{
    order_id: 109384,
    order_date: new Date("12/04/2010"),
    customer: {
        name: "Joe Bloggs",
        company: "XYZ Inc.",
        phone: "(555) 123-4567"
    },
    payment: {
        type: "Cash",
        amount: 4075.99,
        paid_in_full: true
    },
    items: [
        {
            sku: "ABC1200",
            description: "A sample product",
            quantity: 1,
            price_per_unit: 75.99
        }, {
            sku: "XYZ3400",
            description: "An expensive product",
            quantity: 2,
            price_per_unit: 2000
        }
    ],
    cashier_id: 340582242
}

As you can see, these documents can store various data types including strings, integers, floats, dates, objects, arrays and more. In Listing 8, the order items have been embedded directly into the order document, making it faster to retrieve this information when querying on the document later.
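One practical consequence of embedding is that derived values can be computed from a single document. The following plain JavaScript sketch uses a trimmed copy of the order from Listing 8 and checks that the line items add up to the payment amount. It runs in Node.js without a server, and comparing in whole cents is an assumption made here to sidestep floating-point rounding.

```javascript
// Trimmed copy of the order from Listing 8 (only the fields used here).
const order = {
  order_id: 109384,
  payment: { type: "Cash", amount: 4075.99, paid_in_full: true },
  items: [
    { sku: "ABC1200", quantity: 1, price_per_unit: 75.99 },
    { sku: "XYZ3400", quantity: 2, price_per_unit: 2000 },
  ],
};

// Sum quantity * price over the embedded items; no second query is needed.
const total = order.items.reduce(
  (sum, item) => sum + item.quantity * item.price_per_unit,
  0
);

// Compare in whole cents rather than comparing floats directly.
const matchesPayment =
  Math.round(total * 100) === Math.round(order.payment.amount * 100);

console.log(total.toFixed(2), matchesPayment);
```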

Because the MongoDB shell uses JavaScript, you can write regular JavaScript constructs when interacting with your database. Take Listing 9, which creates a collection of character documents, each containing the string representation of the character and its associated ASCII code.


Listing 9. Creating a collection of character documents

> var chars = "abcdefghijklmnopqrstuvwxyz"
> for(var i = 0; i<chars.length; i++) {
... var char = chars.substr(i, 1);
... var doc = {char:char, code: char.charCodeAt(0)};
... db.alphabet.save(doc);
... }

This loop will create 26 documents, one for each lowercase letter of the alphabet, each document containing the character itself and its ASCII character code. In the next section, you will see how to retrieve this data in various ways.
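If you want to check what the loop produces before running it against a server, the same logic can be tried in plain JavaScript. This sketch pushes into an array instead of calling db.alphabet.save(), so the alphabet array here is only a stand-in for the collection and the documents carry no _id.

```javascript
// Same loop as Listing 9, but collecting into an array instead of
// calling db.alphabet.save(), so it runs without a server.
const chars = "abcdefghijklmnopqrstuvwxyz";
const alphabet = [];
for (let i = 0; i < chars.length; i++) {
  const ch = chars.charAt(i);
  alphabet.push({ char: ch, code: ch.charCodeAt(0) });
}

console.log(alphabet.length); // 26
console.log(alphabet[0]); // { char: 'a', code: 97 }
```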

Retrieving data

In the last section, you not only learned how to insert data into a MongoDB database, but you in fact also learned how to use the most basic data retrieval function, find. Let's start by using the find command on the alphabet collection we created at the end of the previous section: > db.alphabet.find();.

This should generate a response like Listing 10.

Listing 10. Generated response

> db.alphabet.find()
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8c"), "char" : "a", "code" : 97 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8d"), "char" : "b", "code" : 98 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8e"), "char" : "c", "code" : 99 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8f"), "char" : "d", "code" : 100 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec90"), "char" : "e", "code" : 101 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec91"), "char" : "f", "code" : 102 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec92"), "char" : "g", "code" : 103 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec93"), "char" : "h", "code" : 104 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec94"), "char" : "i", "code" : 105 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec95"), "char" : "j", "code" : 106 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec96"), "char" : "k", "code" : 107 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec97"), "char" : "l", "code" : 108 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec98"), "char" : "m", "code" : 109 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec99"), "char" : "n", "code" : 110 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec9a"), "char" : "o", "code" : 111 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec9b"), "char" : "p", "code" : 112 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec9c"), "char" : "q", "code" : 113 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec9d"), "char" : "r", "code" : 114 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec9e"), "char" : "s", "code" : 115 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec9f"), "char" : "t", "code" : 116 }
has more
>

By default, the find() function retrieved all of the documents in the collection, but displayed only the first 20 documents. Entering the command it retrieves the remaining 6 documents (see Listing 11).

Listing 11. Retrieving the remaining 6 documents


> it
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca0"), "char" : "u", "code" : 117 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca1"), "char" : "v", "code" : 118 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca2"), "char" : "w", "code" : 119 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca3"), "char" : "x", "code" : 120 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca4"), "char" : "y", "code" : 121 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca5"), "char" : "z", "code" : 122 }
>

The find() function actually returns a cursor to the result set of the query (in this case, a query that retrieves all documents). When the cursor is not assigned to a variable and no further functions are called on it, the shell by default prints a sample of the result set to the screen. To display the entire result set, we could have used the following command: > db.alphabet.find().forEach(printjson);.

This would print every record in the result set, rather than displaying a subset. We will see more about using cursors and queries to filter data next.
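The batching behavior can be pictured with a small generator in plain JavaScript. This is a conceptual sketch only: the batches function is a hypothetical stand-in for a cursor, and the batch size of 20 is taken from what the shell displayed above rather than from any setting in the code.

```javascript
// Hypothetical stand-in for a cursor: hands out results in batches,
// the way the shell prints 20 documents and then waits for "it".
function* batches(docs, batchSize = 20) {
  for (let start = 0; start < docs.length; start += batchSize) {
    yield docs.slice(start, start + batchSize);
  }
}

// 26 documents, shaped like those in the alphabet collection.
const docs = Array.from("abcdefghijklmnopqrstuvwxyz", (ch) => ({
  char: ch,
  code: ch.charCodeAt(0),
}));

const cursor = batches(docs);
const first = cursor.next().value; // what find() prints initially
const rest = cursor.next().value; // what "it" prints next
console.log(first.length, rest.length); // 20 6
```

Iterating the generator to exhaustion corresponds to forEach(printjson), which visits every document rather than stopping after the first batch.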

Querying data

One of MongoDB's greatest strengths is its powerful support for ad-hoc querying that works in much the same manner as a traditional relational database, albeit filtering and returning BSON documents rather than table rows. This approach sets it apart from other document stores, which can often be difficult to get to grips with for SQL developers. With MongoDB, relatively complex SQL queries can be easily translated to JavaScript function calls. In this section, you will learn about the various functions available that allow you to query the data in MongoDB, and how to set up indexes to help optimize your queries, just as you would in the likes of DB2, MySQL or Oracle.

Basic queries

In the previous section, you learned how to use the find function to retrieve all documents. The find function accepts a series of arguments that allow you to filter the results that are returned. For example, in the alphabet collection we created previously, you could find any records where the "char" attribute has a value of "o" with the following command: > db.alphabet.find({char: "o"});.

This returns the following response: { "_id" : ObjectId("4cfa4adf528bad4e29beec9a"), "char" : "o", "code" : 111 }.

If you want to return all characters with a code less than or equal to 100, you could use the following command: > db.alphabet.find({code:{$lte:100}});.

This returns the result in Listing 12, as you might expect.

Listing 12. Result


{ "_id" : ObjectId("4cfa4adf528bad4e29beec8c"), "char" : "a", "code" : 97 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8d"), "char" : "b", "code" : 98 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8e"), "char" : "c", "code" : 99 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8f"), "char" : "d", "code" : 100 }

MongoDB supports a variety of conditional operators, including:

• $lt (less than)

• $lte (less than or equal to)

• $gt (greater than)

• $gte (greater than or equal to)

• $all (match all values in an array)

• $exists (check if a field exists or does not exist)

• $mod (modulus)

• $ne (not equals)

• $in (match one or more values in an array)

• $nin (match zero values in an array)

• $or (match one query or another)

• $nor (match neither one query nor another)

• $size (match any array with a defined number of elements)

• $type (match values with a specified BSON data type)

• $not (negate another operator expression)

For more details on all of these operators, see the MongoDB documentation (see Resources for a link).
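To show how a query document such as {code: {$lte: 100}} is read, here is a minimal in-memory matcher in plain JavaScript. It is an illustration under stated assumptions, not how the server is implemented: it covers only equality and seven of the operators listed above, ignores arrays and nested fields, and the names operators and matches are invented for this sketch.

```javascript
// Minimal in-memory matcher for a handful of operators (illustrative only;
// the real server supports far more and also handles arrays and indexes).
const operators = {
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
};

function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    const isOperatorObject =
      condition !== null &&
      typeof condition === "object" &&
      !Array.isArray(condition);
    if (!isOperatorObject) {
      return value === condition; // plain equality, e.g. {char: "o"}
    }
    // Every operator in the condition must hold, e.g. {$gte: 102, $lte: 105}.
    return Object.entries(condition).every(([op, arg]) => {
      if (!(op in operators)) throw new Error("unsupported operator: " + op);
      return operators[op](value, arg);
    });
  });
}

const alphabet = Array.from("abcdefghijklmnopqrstuvwxyz", (ch) => ({
  char: ch,
  code: ch.charCodeAt(0),
}));

const upTo100 = alphabet.filter((doc) => matches(doc, { code: { $lte: 100 } }));
console.log(upTo100.map((doc) => doc.char).join("")); // abcd
```

Because each field's conditions are combined with every, several operators on one field behave as a logical AND, which is how a range such as {$gte: 102, $lte: 105} would be expressed.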

You can restrict the fields that are returned by your queries using a second argument in the find function. For example, the following query will only return the char attribute for any documents with a code value in the range 102 to 105: > db.alphabet.find({code:{$in:[102,103,104,105]}}, {char: 1});.

This should produce the result in Listing 13.

Listing 13. Result

{ "_id" : ObjectId("4cfa4adf528bad4e29beec91"), "char" : "f" }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec92"), "char" : "g" }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec93"), "char" : "h" }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec94"), "char" : "i" }
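Note that _id appears in the result even though only char was requested. The plain JavaScript sketch below mimics that behavior. The project function is a hypothetical helper, the _id is kept as a plain string for simplicity, and the {_id: 0} form for suppressing the identifier is an assumption about current MongoDB versions rather than something the original text covers.

```javascript
// Hypothetical helper mimicking an inclusion projection such as {char: 1}:
// keep the listed fields, plus _id unless it is explicitly set to 0.
function project(doc, fields) {
  const result = {};
  if (fields._id !== 0 && "_id" in doc) result._id = doc._id;
  for (const [name, include] of Object.entries(fields)) {
    if (name !== "_id" && include && name in doc) result[name] = doc[name];
  }
  return result;
}

const doc = { _id: "4cfa4adf528bad4e29beec91", char: "f", code: 102 };
console.log(project(doc, { char: 1 })); // _id and char, no code
console.log(project(doc, { char: 1, _id: 0 })); // char only
```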

In the next section, you will learn how to create indexes to speed up your queries.

Indexing

MongoDB indexes are quite similar to relational database indexes. You can place an index on any attribute. In addition, indexed fields may be of any data type, including an object or an array. Like RDBMS indexes, you can create compound indexes using multiple attributes, and unique indexes, which ensure that duplicate values are not allowed.

To create a basic index, you use the ensureIndex function. Let's create an index on the code and char attributes in the alphabet collection now (see Listing 14).

Listing 14. Creating an index

> db.alphabet.ensureIndex({code: 1});
> db.alphabet.ensureIndex({char: 1});

You can drop indexes using the dropIndex and dropIndexes functions. See the MongoDB documentation for further information.
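To see why an index helps, consider this conceptual sketch in plain JavaScript. It models an index as a list of [key, position] pairs kept sorted by key and uses binary search to answer a range query without scanning every document. This is a simplification chosen for illustration: MongoDB's real indexes are B-tree structures, and buildIndex, lowerBound, and findGte are hypothetical names.

```javascript
// Conceptual sketch only: an index modeled as [key, position] pairs kept
// sorted by key. Real MongoDB indexes are B-tree structures.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => [doc[field], position])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Binary search for the first entry whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range lookup equivalent to {field: {$gte: target}} using the index.
function findGte(docs, index, target) {
  return index.slice(lowerBound(index, target)).map(([, pos]) => docs[pos]);
}

// Documents deliberately stored in reverse order to show the index sorting them.
const docs = Array.from("zyxwvutsrqponmlkjihgfedcba", (ch) => ({
  char: ch,
  code: ch.charCodeAt(0),
}));
const codeIndex = buildIndex(docs, "code");
console.log(findGte(docs, codeIndex, 118).map((d) => d.char).join("")); // vwxyz
```

The lookup touches only a logarithmic number of index entries before reading the matching documents, whereas an unindexed query would compare every document in the collection.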

Sorting

To sort your result set, you can apply the sort function to your cursor. Our alphabet collection is already sorted in ascending order on both code and char attributes, so let's get a subset back in descending order, sorted by the code attribute: > db.alphabet.find({code: {$gte: 118}}).sort({code: -1});.

This returns the result in Listing 15.

Listing 15. Result

{ "_id" : ObjectId("4cfa4adf528bad4e29beeca5"), "char" : "z", "code" : 122 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca4"), "char" : "y", "code" : 121 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca3"), "char" : "x", "code" : 120 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca2"), "char" : "w", "code" : 119 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beeca1"), "char" : "v", "code" : 118 }

If you supplied the argument {code: 1} to the sort function in the previous command, it would sort the results in ascending order. To ensure high-performance queries, be sure to add an index to any attribute that you sort on.
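The meaning of the direction values can be sketched in plain JavaScript by turning a sort specification into a comparator, with 1 for ascending and -1 for descending. The comparatorFor function is a hypothetical helper written for this illustration, and the alphabet array stands in for the collection.

```javascript
// Hypothetical helper: turn a sort spec such as {code: -1} into a comparator.
// 1 means ascending, -1 descending; earlier keys take precedence.
function comparatorFor(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

const alphabet = Array.from("abcdefghijklmnopqrstuvwxyz", (ch) => ({
  char: ch,
  code: ch.charCodeAt(0),
}));

const descending = alphabet
  .filter((doc) => doc.code >= 118)
  .sort(comparatorFor({ code: -1 }));
console.log(descending.map((doc) => doc.char).join("")); // zyxwv
```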

Paging results using skip and limit


Often when dealing with data result sets, you only want to retrieve a subset at a time, perhaps to provide paged results on a web page. In MySQL, you would typically do this using the LIMIT keyword. You can easily replicate this functionality in MongoDB using the skip and limit functions. To return the first 5 documents in the alphabet collection, you could perform the following operation: > db.alphabet.find().limit(5);.

This returns the result in Listing 16.

Listing 16. Result

{ "_id" : ObjectId("4cfa4adf528bad4e29beec8c"), "char" : "a", "code" : 97 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8d"), "char" : "b", "code" : 98 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8e"), "char" : "c", "code" : 99 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec8f"), "char" : "d", "code" : 100 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec90"), "char" : "e", "code" : 101 }

To get the next page, you would use the following command: > db.alphabet.find().skip(5).limit(5);.

As you can see in Listing 17, this fetches the next 5 records.

Listing 17. Fetching the next five records

{ "_id" : ObjectId("4cfa4adf528bad4e29beec91"), "char" : "f", "code" : 102 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec92"), "char" : "g", "code" : 103 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec93"), "char" : "h", "code" : 104 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec94"), "char" : "i", "code" : 105 }
{ "_id" : ObjectId("4cfa4adf528bad4e29beec95"), "char" : "j", "code" : 106 }
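In an application, the skip value is usually derived from a page number. The following plain JavaScript sketch shows that arithmetic. The pageArgs and page helpers are hypothetical names, and the array stands in for the alphabet collection.

```javascript
// Hypothetical helper: turn a 1-based page number into skip/limit arguments.
function pageArgs(pageNumber, pageSize) {
  return { skip: (pageNumber - 1) * pageSize, limit: pageSize };
}

// In-memory stand-in for find().skip(n).limit(m) on an array of documents.
function page(docs, pageNumber, pageSize) {
  var args = pageArgs(pageNumber, pageSize);
  return docs.slice(args.skip, args.skip + args.limit);
}

// Build the 26 lowercase letters, mirroring the alphabet collection.
var alphabet = [];
for (var code = 97; code <= 122; code++) {
  alphabet.push({ char: String.fromCharCode(code), code: code });
}

var secondPage = page(alphabet, 2, 5).map(function (d) { return d.char; });
console.log(secondPage.join("")); // fghij, the same letters as Listing 17
```

In the shell, the same values would be passed along as db.alphabet.find().skip(args.skip).limit(args.limit).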

Group functions and aggregation

MongoDB's query engine also makes it very simple to apply aggregation and group functions on your data. These are analogous to their SQL counterparts. Arguably, the most widely used function is the count() function: > db.alphabet.find().count();.

This should return 26. You can count filtered queries just as easily: > db.alphabet.find({code: {$gte: 105}}).count();.

The above statement should return 18.

Another useful aggregate function is distinct. This is used to return a set of distinct values for an attribute. Our alphabet collection is a bad example as all the data is unique, so let's add a couple of records to the colors collection we created earlier in this article (see Listing 18).

Listing 18. Adding records to the colors collection

> db.colors.save({name:"white",value:"FFFFFF"});
> db.colors.save({name:"red",value:"FF0000"});
> db.colors.find();

Assuming you did not delete the colors collection, you should see the response in Listing 19.

Listing 19. Response

{ "_id" : ObjectId("4cfa43ff528bad4e29beec57"), "name" : "red", "value" : "FF0000" }
{ "_id" : ObjectId("4cfa5830528bad4e29beeca8"), "name" : "white", "value" : "FFFFFF" }
{ "_id" : ObjectId("4cfa5839528bad4e29beeca9"), "name" : "red", "value" : "FF0000" }

As you can see, there are clearly two red documents in this collection. Now, let's use the distinct function to get a set of unique name attribute values from this collection: > db.colors.distinct("name");.

This returns the following: [ "red", "white" ].

It's worth noting that you do not perform the distinct function on a cursor or result set as you do other query functions, but you perform it directly on the collection. You'll also note that it does not return a set of documents, but rather an array of values.

MongoDB also provides a group function for performing actions like you would do in a GROUP BY expression in SQL. The group function is a complex beast, so I only give a brief example here. For our example, let's say we want to count the number of documents grouped by the name value. In SQL, we could define this expression as SELECT name, COUNT(*) FROM colors GROUP BY name;.

To perform this query in MongoDB, you would use the command in Listing 20.

Listing 20. Using the group function

> db.colors.group(
... {key: {name: true},
... cond: {},
... initial: {count: 0},
... reduce: function(doc, out) { out.count++; }
... });

This produces the result in Listing 21.

Listing 21. Result

[
    {
        "name" : "red",
        "count" : 2
    },
    {
        "name" : "white",
        "count" : 1
    }
]
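To make the roles of key, initial, and reduce more concrete, here is a minimal in-memory sketch in plain JavaScript. The groupBy helper is hypothetical and is not how MongoDB implements group; it only shows that each distinct key value gets its own copy of the initial object, which the reduce function then updates once per document.

```javascript
// In-memory sketch of how key, initial and reduce cooperate in a group call.
// groupBy is a hypothetical helper, not a MongoDB function.
function groupBy(docs, keyField, initial, reduce) {
  var buckets = {}; // one accumulator per distinct key value
  var order = [];   // remembers the first-seen order of the keys
  docs.forEach(function (doc) {
    var k = doc[keyField];
    if (!Object.prototype.hasOwnProperty.call(buckets, k)) {
      // Start each group from a fresh copy of the initial object.
      var out = {};
      out[keyField] = k;
      Object.keys(initial).forEach(function (p) { out[p] = initial[p]; });
      buckets[k] = out;
      order.push(k);
    }
    // reduce mutates the accumulator for the document's group.
    reduce(doc, buckets[k]);
  });
  return order.map(function (k) { return buckets[k]; });
}

var colors = [
  { name: "red", value: "FF0000" },
  { name: "white", value: "FFFFFF" },
  { name: "red", value: "FF0000" }
];

var result = groupBy(colors, "name", { count: 0 }, function (doc, out) {
  out.count++;
});
console.log(JSON.stringify(result));
// [{"name":"red","count":2},{"name":"white","count":1}]
```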

If you need to perform advanced aggregation or use large data sets, MongoDB also includes an implementation of map/reduce, which will allow you to do so. The group function outlined above does not work in sharded MongoDB setups, so if you are using sharding, be sure to use map/reduce instead.

Updating existing data

In the MongoDB shell, it is very easy to update documents. In the colors collection we created earlier, we had two records for red. Let's say we want to take one of those records and change it to black, with the value attribute 000000 (the hexadecimal value of black). First, we can use the findOne function to retrieve a single document with the name red, change its properties as required, and save the document back to the database.

Get a single document with the name red and store it in the blackDoc variable: > var blackDoc = db.colors.findOne({name: "red"});.

Next, we use dot notation to alter the properties of the document (see Listing 22).

Listing 22. Altering the properties of the document

> blackDoc.name = "black";
> blackDoc.value = "000000";

Before saving, let's check that the document looks right (it should have an _id attribute, otherwise it will just insert a new record rather than saving over the red one): > printjson(blackDoc);.

If this returns something similar to Listing 23 you're ready to go.

Listing 23. Result

{
    "_id" : ObjectId("4cfa43ff528bad4e29beec57"),
    "name" : "black",
    "value" : "000000"
}

Finally, use the save function to save the document back to the colors collection in the database: > db.colors.save(blackDoc);.

We can now use the find function to make sure that our collection looks right: > db.colors.find();.

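This should return something like Listing 24. If you see four documents instead of three, the save inserted a new document rather than overwriting the red one, most likely because the _id attribute was missing.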

Listing 24. Result

{ "_id" : ObjectId("4cfa43ff528bad4e29beec57"), "name" : "black", "value" : "000000" }
{ "_id" : ObjectId("4cfa5830528bad4e29beeca8"), "name" : "white", "value" : "FFFFFF" }
{ "_id" : ObjectId("4cfa5839528bad4e29beeca9"), "name" : "red", "value" : "FF0000" }
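The role of the _id attribute in this workflow can be illustrated with a small in-memory model in plain JavaScript. The saveDoc function and the counter-based ids are hypothetical stand-ins, not MongoDB code; the sketch only assumes the behavior described above, where a document carrying an existing _id overwrites the stored one and a document without an _id is inserted as new.

```javascript
// In-memory sketch of the save behaviour described above; saveDoc and the
// counter-based ids are hypothetical stand-ins, not MongoDB code.
var nextId = 1;

function saveDoc(collection, doc) {
  if (doc._id === undefined) {
    // No _id: treat the document as new and append it.
    doc._id = nextId++;
    collection.push(doc);
    return "inserted";
  }
  for (var i = 0; i < collection.length; i++) {
    if (collection[i]._id === doc._id) {
      // Matching _id: overwrite the stored document in place.
      collection[i] = doc;
      return "replaced";
    }
  }
  // An _id that is not stored yet is also inserted.
  collection.push(doc);
  return "inserted";
}

var colors = [];
saveDoc(colors, { name: "red", value: "FF0000" });
saveDoc(colors, { name: "white", value: "FFFFFF" });
saveDoc(colors, { name: "red", value: "FF0000" });

// Keeping the _id replaces the first red document; the count stays at 3.
var blackDoc = { _id: colors[0]._id, name: "black", value: "000000" };
console.log(saveDoc(colors, blackDoc), colors.length); // replaced 3

// Dropping the _id inserts a fourth document instead.
console.log(saveDoc(colors, { name: "black", value: "000000" }), colors.length); // inserted 4
```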

Outside of the Mongo shell, you would use the update function in your applications to apply changes to existing data. For more information on the update function, see the MongoDB documentation.

Deleting data

To delete data in MongoDB, you use the remove function. Note that this applies to the MongoDB shell program; some drivers may implement a delete function or something similar instead. Check the documentation for a specific implementation if required.

The remove function works in a similar way to the find function. To remove any documents in the colors collection that match the name white, you would use the following command: > db.colors.remove({name:"white"});.

You can then check that this document has been removed: > db.colors.find();.

If all is well, you should only see two documents (see Listing 25).

Listing 25. Deleting data

{ "_id" : ObjectId("4cfa43ff528bad4e29beec57"), "name" : "black", "value" : "000000" }
{ "_id" : ObjectId("4cfa5839528bad4e29beeca9"), "name" : "red", "value" : "FF0000" }

To remove all documents in a collection, simply omit the filter from your command, like the following: > db.colors.remove();.

Now when you try to use the find function, you won't get any response, signifying an empty result set: > db.colors.find();.

If you have a document stored in a variable, you can also pass this document to the remove function to delete it, but this is an inefficient way of doing so. You'd be better off finding the _id attribute of this document and passing that to the remove function instead.

To drop a collection, you can use the following command: > db.colors.drop();.

This returns the following: true.

You can now check that the collection has indeed been dropped using the show collections command. This should produce the output in Listing 26.

Listing 26. Using the show collections command

alphabet
system.indexes

Finally, if you wish to remove an entire database, you perform the following command: > db.dropDatabase();.

This deletes the currently selected database. You should see the following output: {"dropped" : "mymongo", "ok" : 1 }.

You can use the command show dbs to get a list of available databases. mymongo should not appear in this list.

Tools and other features

MongoDB includes a series of useful utilities for administering your database. It provides various means of importing and exporting data, either for reporting or backup purposes. In this section, you will discover how to import and export files in JSON format, as well as how to create hot backup files that are more efficient for recovery purposes. You will also learn about how you can use map/reduce functions as an alternative to Mongo's regular query functions for complex aggregation of data.

Importing and exporting data

MongoDB's bin directory contains a series of utilities for importing and exporting data in a variety of formats. The mongoimport utility allows you to supply a file with each line containing a document in JSON, CSV or TSV format and insert each of these documents into a MongoDB database. Because MongoDB uses BSON, if you are importing JSON documents you need to supply some modifier information if you wish to avail of any of BSON's additional data types that are not available in regular JSON.

The mongoexport utility allows you to produce a file output with every document in a MongoDB database represented in either JSON or CSV format. This is useful for producing reports where the application accepts either JSON or CSV data as an input. To produce a CSV file, you need to provide the fields in the order they should appear in the output file.
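One way to see why a CSV export needs an explicit field list: documents in a collection do not have to share the same attributes, so the column layout cannot be read off any single document. The plain JavaScript sketch below flattens documents into CSV rows from an ordered field list. The toCsv helper and the note attribute are hypothetical, and the output is not meant to match mongoexport's exact format.

```javascript
// Hypothetical sketch: flatten documents into CSV rows using an explicit,
// ordered field list. This is not mongoexport's implementation.
function toCsv(docs, fields) {
  var lines = [fields.join(",")]; // header row in the requested order
  docs.forEach(function (doc) {
    var row = fields.map(function (f) {
      // Missing attributes become empty cells.
      var cell = doc[f] === undefined ? "" : String(doc[f]);
      // Cells containing a quote, comma or newline are quoted, with quotes doubled.
      return /[",\n]/.test(cell) ? '"' + cell.replace(/"/g, '""') + '"' : cell;
    });
    lines.push(row.join(","));
  });
  return lines.join("\n");
}

var docs = [
  { name: "black", value: "000000" },
  { name: "red", value: "FF0000", note: "primary, warm" }
];
console.log(toCsv(docs, ["name", "value", "note"]));
// name,value,note
// black,000000,
// red,FF0000,"primary, warm"
```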

Backing up and restoring databases

The mongoimport and mongoexport utilities are useful for taking data out of MongoDB for use in other applications or importing from other applications that can make JSON or CSV data available. However, these utilities should not be used for taking periodic backups of a MongoDB database or restoring a MongoDB database. Because MongoDB uses BSON and not JSON or CSV, it is difficult to preserve data types when importing data from these formats.

To provide proper backup and restore functionality, MongoDB provides two utilities: mongodump and mongorestore. mongodump produces a binary file backup of a database, and mongorestore reads this file and restores a database using it, automatically creating indexes as required (unless you have removed the system.indexes.bson file from your backup directory).

Administration utilities

MongoDB also provides a web-based diagnostic interface, available at http://localhost:28017/ on default MongoDB configurations. This screen looks like the screenshot in Figure 2.

Figure 2. MongoDB diagnostics

To get other administration information, you can also run the following commands in the MongoDB shell:

• db.serverStatus();

• db.stats();

If your MongoDB server crashes, you should repair the database to check for any corruption and perform some data compaction. You can run a repair by running mongod --repair at your OS command line, or alternatively using the command db.repairDatabase(); from the MongoDB shell. The latter command runs at a per-database level, so you would need to run this command for each database on the server.

You can also validate collection data using the validate function. If you have a collection named contacts, you could validate that collection with the command db.contacts.validate();

MongoDB offers many other features to make the lives of DBAs easier. In addition, a variety of third-party administration tools and interfaces are available. See the MongoDB documentation for more information.

map/reduce

If you have used the CouchDB database before, you are likely familiar with map/reduce, as its view engine uses map/reduce functions to filter and aggregate data by default. In MongoDB, this is not the case; simple queries and filtering (and even aggregation) do not rely on map/reduce. However, MongoDB does provide an implementation of map/reduce for use in aggregating large data sets.

map/reduce would likely warrant an article by itself. For detailed information on MongoDB's implementation of it, see the MongoDB documentation (see Resources for a link).
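As a rough orientation before turning to that documentation, the general map/reduce flow can be sketched in plain JavaScript. This is an in-memory illustration under simplifying assumptions, not MongoDB's implementation: the mapReduce helper is hypothetical, and emit is passed to the map function as an argument here purely to keep the sketch self-contained.

```javascript
// In-memory sketch of the map/reduce flow; mapReduce is a hypothetical helper
// written for illustration, not the MongoDB implementation.
function mapReduce(docs, map, reduce) {
  var emitted = {};
  // map calls emit(key, value) any number of times for each document.
  function emit(key, value) {
    if (!Object.prototype.hasOwnProperty.call(emitted, key)) emitted[key] = [];
    emitted[key].push(value);
  }
  // Each document is bound to `this` inside the map function.
  docs.forEach(function (doc) { map.call(doc, emit); });
  // reduce collapses all values emitted for one key into a single value.
  var results = {};
  Object.keys(emitted).forEach(function (key) {
    results[key] = reduce(key, emitted[key]);
  });
  return results;
}

var colors = [{ name: "red" }, { name: "white" }, { name: "red" }];

var counts = mapReduce(
  colors,
  function (emit) { emit(this.name, 1); },
  function (key, values) {
    return values.reduce(function (sum, v) { return sum + v; }, 0);
  }
);
console.log(JSON.stringify(counts)); // {"red":2,"white":1}
```

The result matches the count-by-name grouping from earlier in the article; the difference is that the map and reduce steps are independent per document and per key, which is what lets the work be spread across machines.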

Scaling MongoDB

A primary reason for the recent popularity of key/value stores and document-oriented databases is their light footprint and tendency to be highly scalable. In order to facilitate this, MongoDB relies on the concepts of sharding and replication, which you will learn about in this section. In addition, you'll also learn how you can store large files in MongoDB using GridFS. Finally, you'll see how you can profile your queries to optimize the performance of your database.

Sharding

An important part of any database infrastructure is ensuring that it scales well. MongoDB implementations are scaled horizontally using an auto-sharding mechanism, allowing the scaling of a MongoDB configuration to thousands of nodes, with automatic load balancing, no single point of failure and automatic failover. It is also very straightforward to add new machines to a MongoDB cluster.

The beauty of MongoDB's auto-sharding feature is that it makes it very straightforward to go from a single server to a sharded cluster, often with few or no changes to application code required. For detailed documentation on how auto-sharding works and how to implement it, see the MongoDB documentation.

Replication

MongoDB provides replication features in a master-slave configuration (similar to MySQL) for the purposes of failover and redundancy, ensuring a high level of consistency between nodes. Alternatively, MongoDB can use replica sets to define a node as a primary at any one time, with another node taking over as the primary in the event of a failure.

Unlike CouchDB, which uses replication as the basis for scaling, MongoDB uses replication primarily for ensuring high availability by using slave nodes as redundant replicas.

For further information on MongoDB replication, see the documentation (see Resources for a link).

Large file storage with GridFS

MongoDB databases store data in BSON documents. However, the maximum size of a BSON document is 4MB, which makes BSON documents unsuitable for storing large files and objects. MongoDB uses the GridFS specification to store large files by dividing the file into smaller chunks spread among multiple documents.

The standard MongoDB distribution includes command line utilities for adding and retrieving GridFS files to and from the local file system. In addition, all official MongoDB API drivers include support for GridFS. For more details, refer to the MongoDB documentation (see Resources).
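The chunking idea itself is simple enough to sketch in plain JavaScript. The helper names, the string payload, and the tiny chunk size below are assumptions made for illustration; real drivers work on binary data with much larger chunks. The files_id and n fields are modeled on the GridFS chunk layout, but treat the details as a sketch rather than the specification.

```javascript
// Sketch of the chunking idea behind GridFS. The helper names and the tiny
// chunk size are assumptions for illustration only.
function splitIntoChunks(fileId, data, chunkSize) {
  var chunks = [];
  for (var offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    // Each chunk document records which file it belongs to and its position.
    chunks.push({ files_id: fileId, n: n, data: data.slice(offset, offset + chunkSize) });
  }
  return chunks;
}

function joinChunks(chunks) {
  // Order by sequence number before concatenating, in case chunks arrive unordered.
  return chunks
    .slice()
    .sort(function (a, b) { return a.n - b.n; })
    .map(function (c) { return c.data; })
    .join("");
}

var chunks = splitIntoChunks("file-1", "abcdefghij", 4);
console.log(chunks.length);                // 3
console.log(joinChunks(chunks.reverse())); // abcdefghij
```

Because every chunk is an ordinary document well under the size limit, a file of any length can be stored and read back piece by piece.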

Conclusion

In this article, you learned about the MongoDB database management system and why it is one of the fastest-growing options in the popular NoSQL section of the DBMS market. You learned about why you would choose a document-oriented database over a traditional RDBMS, and about the various great features that MongoDB has to offer. You learned how to install and use MongoDB for storage and retrieval of data, and about the various tools and scalability options it provides.

Resources

Learn

• Visit the official MongoDB site.

• Read Wikipedia's MongoDB entry.

• Access the official MongoDB documentation.

• Learn more about the MongoDB map/reduce functions.

• The MongoDB Cookbook provides guidance on all the common ways of using MongoDB.

• Follow MongoDB on Twitter.

• View the developerWorks demo "An introduction to MongoDB".

• Learn about Using NoSQL and analyzing big data through the developerWorks knowledge path.

• Read Notes from a production MongoDB deployment to see how one company switched from MySQL to MongoDB.

• Read Reflections on MongoDB to learn how Collective Idea has made the switch.

• Read the blog 12 Months with MongoDB to learn how Wordnik has made the switch.

• Tune into Eliot Horowitz's (CTO of 10gen, the company that sponsors MongoDB) podcast on MongoDB.

• 10gen develops and supports MongoDB, the open source, high performance, scalable, document-oriented database.

• Java development 2.0: MongoDB: A NoSQL datastore with (all the right) RDBMS moves (Andrew Glover, developerWorks, Sept 2010): Learn all about MongoDB's custom API, interactive shell, and support for RDBMS-style dynamic queries, as well as quick and easy MapReduce calculations.

• Exploring CouchDB (Joe Lennon, developerWorks, March 2009): Apache's open source CouchDB offers a new method of storing data, in what is referred to as a schema-free document-oriented database model. Instead of the highly structured data storage of a relational model, CouchDB stores data in a semi-structured fashion, using a JavaScript-based view model for generating structured aggregation and report results from these semi-structured documents.

• Events of interest: Check out upcoming conferences, trade shows, and webcasts that are of interest to IBM open source developers.

• developerWorks Open source zone: Find extensive how-to information, tools, and project updates to help you develop with open source technologies and use them with IBM's products.

Get products and technologies

• Download MongoDB.

• IBM trial software: Innovate your next open source development project using trial software, available for download or on DVD.

Discuss

• developerWorks community: Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis. Help build the Real world open source group in the developerWorks community.

About the author

Joe Lennon is a software developer from Cork, Ireland. He works as a Web application and Oracle PL/SQL developer for Core International, having graduated from University College Cork in 2007 with a degree in business information systems.
