
MongoDB_Day2.pdf

Page 1: MongoDB_Day2.pdf

Usage Guidelines

Do not forward this document to any non-Infosys mail ID. Forwarding this document to a non-Infosys mail ID may lead to disciplinary action against you, including termination of employment.

Contents of this material cannot be used in any other internal or external document without explicit permission from E&[email protected].

Page 2: MongoDB_Day2.pdf

Introduction to MongoDB

Education & Research

•ER/CORP/CRS/ERCLD0008/003

• “© 2012 Infosys Limited, Bangalore, India. All rights reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this document nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.”

Page 3: MongoDB_Day2.pdf

Confidential Information

• This document is confidential to Infosys Limited. It contains information and data that Infosys considers confidential and proprietary (“Confidential Information”).

• Confidential Information includes, but is not limited to, the following:

– Corporate and Infrastructure information about Infosys;

– Infosys’ project management and quality processes;

– Project experiences included as illustrative case studies.

• Any disclosure of Confidential Information to, or use of it by a third party, will be damaging to Infosys.

• Ownership of all Infosys Confidential Information, no matter in what media it resides, remains with Infosys.

• Confidential information in this document shall not be disclosed, duplicated or used – in whole or in part – for any purpose without specific written permission of an authorized representative of Infosys.

Page 4: MongoDB_Day2.pdf

Course Objectives

• Perform basic operations from the shell prompt

• Use aggregation functions in the shell

• Create and manage indexes

• Import and export data

Page 5: MongoDB_Day2.pdf

Session Plan

• Querying Mongo

• Aggregation

• Indexing

• Backup and Restore

• Import and Export Data

• Mongo tools

Page 6: MongoDB_Day2.pdf

Querying Mongo

Page 7: MongoDB_Day2.pdf

Querying Mongo : Selection

• db.collection_name.find({JSON_for_where_clause})

– Example: db.trainees.find({stream: 'Java', track: 'fast track'})

– This returns all documents where 'stream' equals 'Java' and 'track' equals 'fast track'.

Page 8: MongoDB_Day2.pdf

Querying Mongo : Selection & Projection

• db.collection_name.find({JSON_for_where_clause}, {JSON_for_select_clause})

Examples

– db.trainees.find({stream: 'Java', track: 'fast track'}, {name: 1, emp_id: 1})

• This returns the name, emp_id and document _id of all documents where 'stream' equals 'Java' and 'track' equals 'fast track'. The _id field is returned unless it is explicitly excluded.

– db.trainees.find({}, {name: 1, emp_id: 1, _id: 0})

• This returns the name and emp_id of every trainee, because the where clause is empty. 'null' can also be specified instead of '{}'.
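The projection rules above can be modelled in plain JavaScript that runs without a MongoDB server. This is only a sketch: `project` is a hypothetical helper (not a MongoDB API), it handles inclusion projections only, and the sample trainee is made up.

```javascript
// Hypothetical helper that mimics an inclusion projection such as
// {name: 1, emp_id: 1}. It is not part of MongoDB; it only mirrors the
// rule that _id is returned unless the projection says _id: 0.
function project(doc, spec) {
  const out = {};
  if (spec._id !== 0 && "_id" in doc) {
    out._id = doc._id;
  }
  for (const field of Object.keys(spec)) {
    if (field !== "_id" && spec[field] === 1 && field in doc) {
      out[field] = doc[field];
    }
  }
  return out;
}

// Made-up sample document.
const trainee = { _id: 1, name: "Asha", emp_id: "620001", stream: "Java", track: "fast track" };

const withId = project(trainee, { name: 1, emp_id: 1 });
const withoutId = project(trainee, { name: 1, emp_id: 1, _id: 0 });

console.log(withId);    // _id, name and emp_id
console.log(withoutId); // name and emp_id only
```

Against a real collection, the shell applies the same rule when the second argument of find() is supplied.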

Page 9: MongoDB_Day2.pdf

Querying Mongo : Operators

• db.collection_name.find({$or: [{key1: 'v1'}, {key2: 'v2'}]})

– selects a document if at least one of the conditions is satisfied

– Example: db.trainees.find({batch: 'Jan12CS', $or: [{stream: 'Java'}, {stream: 'OS'}, {track: 'intermediate'}]})

– This selects the trainees from the Jan12CS batch who belong to the Java stream, the OS stream or the intermediate track

• db.collection_name.find({key1: {$gt: v1}})

– fetches all documents whose key1 value is greater than v1

– Example: db.trainees.find({GPA: {$gt: 4, $lt: 4.9}})

– This returns the details of all trainees whose GPA is greater than 4 and less than 4.9 (both bounds are excluded)

– Similarly, $gte, $lt, $lte and $ne check greater than or equal, less than, less than or equal and not equal respectively
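As a rough illustration of how these conditions combine, the two example queries can be mimicked with ordinary JavaScript predicates. The sample trainees below are invented; the point is that the top-level 'batch' condition is ANDed with the $or list, and that $gt and $lt exclude their bounds.

```javascript
// Invented sample data; only the field names come from the examples above.
const trainees = [
  { name: "A", batch: "Jan12CS", stream: "Java", track: "fast track", GPA: 4.5 },
  { name: "B", batch: "Jan12CS", stream: "Testing", track: "intermediate", GPA: 4.0 },
  { name: "C", batch: "Jan12CS", stream: "Testing", track: "fast track", GPA: 4.9 },
  { name: "D", batch: "Feb12CS", stream: "OS", track: "fast track", GPA: 4.2 },
];

// Model of {batch: 'Jan12CS', $or: [{stream: 'Java'}, {stream: 'OS'}, {track: 'intermediate'}]}:
// the batch condition AND at least one of the three alternatives.
const orMatches = trainees
  .filter((t) => t.batch === "Jan12CS" &&
    (t.stream === "Java" || t.stream === "OS" || t.track === "intermediate"))
  .map((t) => t.name);

// Model of {GPA: {$gt: 4, $lt: 4.9}}: strictly between the two bounds.
const gpaMatches = trainees
  .filter((t) => t.GPA > 4 && t.GPA < 4.9)
  .map((t) => t.name);

console.log(orMatches);  // [ 'A', 'B' ]
console.log(gpaMatches); // [ 'A', 'D' ]
```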

Page 10: MongoDB_Day2.pdf

Querying Mongo : Operators

• db.collection_name.find({key: {$in: ['v1', 'v2']}})

– retrieves those documents whose 'key' value is equal to either 'v1' or 'v2'.

– Example: db.trainees.find({stream: {$in: ['Java', 'OS']}}) retrieves the details of the Java and OS stream trainees

– {$nin: ['Java', 'OS']} retrieves the details of trainees whose stream is neither 'Java' nor 'OS' (including documents that have no stream field).

• db.collection_name.find({key: {$all: ['v1', 'v2']}})

– retrieves those documents whose 'key' array contains all the values passed in the argument

– Example: db.trainees.find({module: {$all: ['JPA', 'POJO']}}) retrieves the details of those trainees whose module array contains both 'JPA' and 'POJO'
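The difference between $in, $nin and $all is easy to check with a small plain-JavaScript model. The data is invented and the model is simplified: 'stream' is treated as a scalar field and 'module' as an array field that is always present.

```javascript
// Invented sample data for illustration.
const trainees = [
  { name: "A", stream: "Java", module: ["JPA", "POJO", "JDBC"] },
  { name: "B", stream: "OS", module: ["JPA"] },
  { name: "C", stream: "Testing", module: ["POJO", "JPA"] },
];

const names = (list) => list.map((t) => t.name);

// {stream: {$in: ['Java', 'OS']}}: the value is one of the listed values.
const inMatches = names(trainees.filter((t) => ["Java", "OS"].includes(t.stream)));

// {stream: {$nin: ['Java', 'OS']}}: the value is none of the listed values.
const ninMatches = names(trainees.filter((t) => !["Java", "OS"].includes(t.stream)));

// {module: {$all: ['JPA', 'POJO']}}: the array holds every listed value,
// in any order and possibly alongside other values.
const allMatches = names(
  trainees.filter((t) => ["JPA", "POJO"].every((v) => t.module.includes(v)))
);

console.log(inMatches);  // [ 'A', 'B' ]
console.log(ninMatches); // [ 'C' ]
console.log(allMatches); // [ 'A', 'C' ]
```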

Page 11: MongoDB_Day2.pdf

Querying Mongo : Operators

• db.collection_name.find({key: {$size: value}})

– retrieves the documents whose 'key' array has exactly the number of elements specified in 'value'

– Example: db.trainees.find({module: {$size: 4}}) selects the trainees whose module array has 4 entries, i.e. who have completed 4 modules

• The '$size' operator cannot take a range as its value, i.e., $gt, $lt, $gte, $lte and $ne cannot be used inside '$size'; the value can only be an integer

• db.collection_name.find({}, {array_field: {$slice: n}})

– 'n' can be positive or negative

– this returns only the first 'n' values of the given array (if 'n' is positive) or the last 'n' values of the given array (if 'n' is negative)
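The sign convention of $slice can be sketched with JavaScript's own Array.prototype.slice. `sliceProjection` is a hypothetical helper for illustration only, and the module names are sample values.

```javascript
// Hypothetical model of {array_field: {$slice: n}} for a single number n:
// a positive n keeps the first n values, a negative n keeps the last |n|.
function sliceProjection(values, n) {
  return n >= 0 ? values.slice(0, n) : values.slice(n);
}

const modules = ["JPA", "POJO", "JDBC", "Servlets"];

const firstTwo = sliceProjection(modules, 2);
const lastTwo = sliceProjection(modules, -2);

console.log(firstTwo); // [ 'JPA', 'POJO' ]
console.log(lastTwo);  // [ 'JDBC', 'Servlets' ]
```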

Page 12: MongoDB_Day2.pdf

Querying Mongo : Operators

• db.collection_name.find({key: {$exists: true}})

– useful to retrieve only those documents which have an entry for a particular 'key'.

– Example: db.trainees.find({certification: {$exists: true}}) selects the details of those trainees whose document has a certification field, i.e. who have done some certification.

Page 13: MongoDB_Day2.pdf

Querying Mongo : Operators

• db.collection_name.find({key: perl_compatible_regex})

– selects those documents whose 'key' value matches the given 'perl_compatible_regex'.

Example:

– db.trainees.find({name: /an/i}) retrieves all trainees whose name contains the character sequence 'an'

– The 'i' at the end makes the regular expression case insensitive.

– db.trainees.find({emp_id: /^620/}) retrieves all trainees whose emp_id starts with '620'. A regular expression matches only string values, so this assumes emp_id is stored as a string.
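Because the shell accepts JavaScript regular-expression literals, the two patterns can be tried out on plain strings first. The sketch below uses made-up trainees; `matches` is a hypothetical helper that also reflects the assumption that only string values are compared against the pattern.

```javascript
// Made-up sample data; emp_id is deliberately stored as a string.
const trainees = [
  { name: "Anand", emp_id: "620145" },
  { name: "Suman", emp_id: "710022" },
  { name: "Ravi", emp_id: "620998" },
];

// Hypothetical helper: a regex condition is tested only against string values.
const matches = (value, pattern) => typeof value === "string" && pattern.test(value);

// /an/i : contains 'an' anywhere, ignoring case.
const nameHits = trainees.filter((t) => matches(t.name, /an/i)).map((t) => t.name);

// /^620/ : starts with '620'.
const idHits = trainees.filter((t) => matches(t.emp_id, /^620/)).map((t) => t.name);

console.log(nameHits); // [ 'Anand', 'Suman' ]
console.log(idHits);   // [ 'Anand', 'Ravi' ]
```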

Page 14: MongoDB_Day2.pdf

Querying Mongo : Operators

• db.collection_name.find({key: {$type: value}})

– selects only those documents in which the data type of the 'key' value matches the data type passed.

– Assume that for the field certification, a few documents have the name of the course (data type string), a few have the number of certifications (type Double) and a few have null. To select only those documents with the course name, use the query

db.trainees.find({certification: {$type: 2}})

– Data types and the values to be passed:

• Double – 1; String – 2; Array – 4; Object id – 7;

• Boolean – 8; Date – 9; Null – 10; Regular Expression – 11

Page 15: MongoDB_Day2.pdf

Querying Mongo : Operators

• Accessing embedded documents

– db.collection_name.find({'parent_key.embedded_key': 'value'}) finds those documents in which the 'embedded_key' of the embedded document 'parent_key' is equal to the value passed.

• Eg: db.trainees.find({'project.IDE': 'Eclipse'}) retrieves the trainees who use the 'Eclipse' IDE for their project.

Page 16: MongoDB_Day2.pdf

Querying Mongo : Operators

• $elemMatch

Consider a collection 'CDP' with the following documents

{ emp_id: 101, certification: [
    { name: 'Big Data', grade: 'A' },
    { name: 'AWS', grade: 'B' }
] }

{ emp_id: 102, certification: [
    { name: 'Hadoop', grade: 'B' },
    { name: 'AWS', grade: 'A' }
] }

Problem Statement: Find all the employees who are certified in AWS with grade 'A'

Expected output: Only the second document must be returned

Page 17: MongoDB_Day2.pdf

Querying Mongo : Operators

• db.CDP.find({'certification.name': 'AWS', 'certification.grade': 'A'})

This returns both documents because

– In the first document, one array element has the name 'AWS' and a different array element has the grade 'A', so both selection conditions are satisfied

– The second document is returned because it has a single array element with both the name 'AWS' and the grade 'A'

• So to get the desired output (only the second document), a document should be selected only when both conditions are satisfied by the same element of the array

• For this we use the '$elemMatch' operator

• The query becomes

db.CDP.find({ certification: { $elemMatch: { name: 'AWS', grade: 'A' } } })
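The two matching rules can be compared side by side in plain JavaScript, using the two CDP documents from the slides. This is a model of the semantics described above, not MongoDB's implementation: dot notation lets each condition be satisfied by a different array element, while $elemMatch requires one element to satisfy both.

```javascript
// The two documents of the 'CDP' collection from the example.
const cdp = [
  { emp_id: 101, certification: [
    { name: "Big Data", grade: "A" },
    { name: "AWS", grade: "B" },
  ] },
  { emp_id: 102, certification: [
    { name: "Hadoop", grade: "B" },
    { name: "AWS", grade: "A" },
  ] },
];

// Model of {'certification.name': 'AWS', 'certification.grade': 'A'}:
// each condition may be met by a different element of the array.
const dotNotation = cdp
  .filter((d) => d.certification.some((c) => c.name === "AWS") &&
                 d.certification.some((c) => c.grade === "A"))
  .map((d) => d.emp_id);

// Model of {certification: {$elemMatch: {name: 'AWS', grade: 'A'}}}:
// a single element must meet both conditions.
const elemMatch = cdp
  .filter((d) => d.certification.some((c) => c.name === "AWS" && c.grade === "A"))
  .map((d) => d.emp_id);

console.log(dotNotation); // [ 101, 102 ]
console.log(elemMatch);   // [ 102 ]
```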

Page 18: MongoDB_Day2.pdf

Querying Mongo : Operators

• $where

– Allows a JavaScript expression (as a string) or a JavaScript function to be used in a query

– The JavaScript expression or function is evaluated against each document

– Each document is referred to as 'this' or 'obj' in the JavaScript

Example:

db.trainees.find({$where: 'this.currentCDP > this.previousCDP'})

db.trainees.find({$where: function() { return this.currentCDP > this.previousCDP; }})

– $where is always executed as the last filter during selection
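How 'this' refers to each document can be seen in a plain-JavaScript sketch that calls the same function once per document. The trainee records are invented, and binding the document with `call` is only an analogy for what the server does.

```javascript
// Invented sample data; the field names follow the example above.
const trainees = [
  { name: "A", currentCDP: 4.5, previousCDP: 4.1 },
  { name: "B", currentCDP: 3.9, previousCDP: 4.2 },
];

// The same function body as in the $where example.
function improved() {
  return this.currentCDP > this.previousCDP;
}

// Evaluate the function once per document, with the document bound to 'this'.
const selected = trainees.filter((doc) => improved.call(doc)).map((doc) => doc.name);

console.log(selected); // [ 'A' ]
```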

Page 19: MongoDB_Day2.pdf

Querying Mongo : Functions

• db.collection_name.find().count()

– Number of documents in the given collection

• db.collection_name.find().explain()

– Number of objects scanned, time taken to scan and other useful

information

• db.collection_name.distinct('key')

– Returns an array of distinct values for the key

• db.collection_name.help()

– gives all the commands that can be performed on the collection

• db.help()

– gives all the commands that can be performed on the database

Page 20: MongoDB_Day2.pdf

Querying Mongo : Functions

• db.stats()

– gives information about the database such as name, number of collections and indexes, and the amount of memory used by it

• db.collection_name.stats()

– gives the number of indexes on that collection, total size of all indexes and individual size of each index along with other information

• db.getLastError()

– gives the details of the last error, if any, that occurred during a write operation

Page 21: MongoDB_Day2.pdf

Querying Mongo : Functions

• db.serverStatus() – gives details about the host server, the MongoDB version, the process (mongod / mongos), the memory used by the server, the number of client connections, the different operations executed by the server, and the cursor type used.

• db.currentOp() – returns an array that contains information (such as operation ID, seconds running, operation name, namespace, the client that issued the operation, lock status) about all the currently executing operations.

Page 22: MongoDB_Day2.pdf

Querying Mongo : Functions

• To copy a database between two server instances, the copyDatabase() function can be used from the destination server instance

• Example:

– db.copyDatabase("mysourcedb", "mydestdb", "MYSGEC240748D:27017")

This copies the database 'mysourcedb' from the server running at 'MYSGEC240748D:27017' to the destination server (the current server) under the name 'mydestdb'

Page 23: MongoDB_Day2.pdf

Querying Mongo : Limiting & Ordering

• db.collection_name.find().limit(n)

– limits the results to n documents.

• db.collection_name.find().sort({key: n})

– will sort the result based on the field ‘key’

– n can take either ‘1’ or ‘-1’

– ‘1’ for ascending, and ‘-1’ for descending

Page 24: MongoDB_Day2.pdf

Querying Mongo : Skipping and Chaining

• db.collection_name.find().skip(n)

– skips the first n documents of the result set of find function

• limit(), sort() and skip() can be chained.

– Example: db.trainees.find().sort({emp_id: 1}).skip(5).limit(10) displays ten trainee details sorted by emp_id, after skipping the first five documents of the sorted result.

– The order in which these cursor methods are chained does not matter: the server always applies sort() first, then skip(), then limit().
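A minimal plain-JavaScript model makes the fixed evaluation order visible. `runCursor` is a hypothetical helper, not a shell function; it assumes a single numeric sort key and always sorts, then skips, then limits, whatever order the options are written in.

```javascript
// Twenty made-up trainees, inserted in descending emp_id order (20 down to 1).
const trainees = Array.from({ length: 20 }, (_, i) => ({ emp_id: 20 - i }));

// Hypothetical model of a cursor: sort first, then skip, then limit.
function runCursor(docs, { sort, skip = 0, limit = Infinity }) {
  const sorted = [...docs].sort((a, b) => (a[sort.key] - b[sort.key]) * sort.dir);
  return sorted.slice(skip, skip + limit);
}

// Model of find().sort({emp_id: 1}).skip(5).limit(10).
const page = runCursor(trainees, { limit: 10, skip: 5, sort: { key: "emp_id", dir: 1 } })
  .map((t) => t.emp_id);

console.log(page); // emp_id 6 through 15
```

This behaviour is specific to cursor methods; in an aggregation pipeline the stages run in the order they are written.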

Page 25: MongoDB_Day2.pdf

Quiz : Provide the Mongo equivalent

1. INSERT INTO users(user_id, age, status) VALUES ("bcd001", 45, "A")

Answer: db.users.insert( { user_id: "bcd001", age: 45, status: "A" } )

2. SELECT user_id, status FROM users

Answer: db.users.find( { }, { user_id: 1, status: 1, _id: 0 } )

3. SELECT user_id, status FROM users WHERE status = "A"

Answer: db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )

4. SELECT * FROM users WHERE status = "A" OR age = 50

Answer : db.users.find( { $or: [ { status: "A" } , { age: 50 } ] } )

5. SELECT * FROM users WHERE age > 25 AND age <= 50

Answer: db.users.find( { age: { $gt: 25, $lte: 50 } } )

Page 26: MongoDB_Day2.pdf

Quiz : Provide the Mongo equivalent

6. SELECT * FROM users WHERE user_id like "%bc%"

Answer: db.users.find( { user_id: /bc/ } )

7. SELECT * FROM users WHERE status = "A" ORDER BY user_id DESC

Answer: db.users.find( { status: "A" } ).sort( { user_id: -1 } )

8. SELECT COUNT(*) FROM users

Answer: db.users.count() OR db.users.find().count()

9. SELECT COUNT(user_id) FROM users

Answer : db.users.count( { user_id: { $exists: true } } )

10. SELECT DISTINCT(status) FROM users

Answer: db.users.distinct( "status" )

Page 27: MongoDB_Day2.pdf

AGGREGATION

Page 28: MongoDB_Day2.pdf

Simple Aggregation Functions

• Count

– db.collection_name.count() gives the number of documents present in the collection.

– db.collection_name.count({JSON for where clause}) gives the number of documents that meet the specified selection criteria.

• Distinct

– db.collection_name.distinct('key') returns an array of the distinct values of the passed 'key' (the values, not the documents)

– db.collection_name.distinct('key', {JSON for where clause}) returns the distinct values of the passed 'key' among the documents that meet the search criteria

Page 29: MongoDB_Day2.pdf

Simple Aggregation Functions (Contd.)

• Group

– Assume there is an employee_details relational database table with fields emp_no, emp_name, role, experience and resources_allocated.

– The MongoDB document equivalent will be like

{ emp_no: 6475, emp_name: 'amit', role: 'project lead', experience: 7, resources_allocated: 5 }

Page 30: MongoDB_Day2.pdf

Simple Aggregation Functions (Contd.)

• Now if we have to group the employee_details based on the role and calculate the sum of resources allocated to each role, the SQL query will be as follows

SELECT role, SUM(resources_allocated) AS total_resources
FROM employee_details
GROUP BY role

The MongoDB equivalent will be

db.employee_details.group( {
    key: { role: 1 },
    reduce: function ( cur, result ) {
        result.total_resources += cur.resources_allocated;
    },
    initial: { total_resources: 0 }
} )

Page 31: MongoDB_Day2.pdf

Simple Aggregation Functions (Contd.)

• Now if we have to group those employee_details whose experience is less than 3, based on the role, and calculate the sum of resources allocated to each role, the SQL query will be as follows

SELECT role, SUM(resources_allocated) AS total_resources
FROM employee_details
WHERE experience < 3
GROUP BY role

The MongoDB equivalent will be

db.employee_details.group( {
    key: { role: 1 },
    cond: { experience: { $lt: 3 } },
    reduce: function ( cur, result ) {
        result.total_resources += cur.resources_allocated;
    },
    initial: { total_resources: 0 }
} )

Page 32: MongoDB_Day2.pdf

Simple Aggregation Functions (Contd.)

• Now if we have to group those employee_details whose experience is less than 3, based on the role and experience, and then calculate the sum of resources allocated to each combination of role and experience, the SQL query will be as follows

SELECT role, experience, SUM(resources_allocated) AS total_resources
FROM employee_details
WHERE experience < 3
GROUP BY role, experience

The MongoDB equivalent will be

db.employee_details.group( {
    key: { role: 1, experience: 1 },
    cond: { experience: { $lt: 3 } },
    reduce: function ( cur, result ) {
        result.total_resources += cur.resources_allocated;
    },
    initial: { total_resources: 0 }
} )

Page 33: MongoDB_Day2.pdf

Simple Aggregation Functions (Contd.)

• Group – Syntax:

– db.collection_name.group({key, reduce, initial, [keyf,] [cond,] [finalize]})

• key – specifies the field(s) on which grouping is done.

• reduce – a function that specifies the operation (such as count or sum) to perform on the documents of a group. It takes two parameters: the current document and the result accumulated from the previous documents of the group.

• initial – the value with which the result of each group is initialized before its first document is reduced.

• keyf – an alternative to 'key'. This function is supplied when grouping has to be based on derived values rather than on existing fields.

• cond – specifies the selection criteria. Only documents that satisfy this condition are considered for grouping.

• finalize – a function that specifies the changes to apply to each result before it is returned.

• The 'group' function cannot be used on a sharded cluster. For a sharded cluster, the aggregation framework has to be used.
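How key, cond, reduce, initial and finalize fit together can be sketched as a small plain-JavaScript function that runs without a server. This `group` function is a hypothetical model written for this note, not MongoDB's implementation; as a simplification, cond is a predicate function here rather than a query document, keyf is omitted, and the employee records are invented.

```javascript
// Hypothetical model of the group() steps described above.
function group(docs, { key, cond = () => true, reduce, initial, finalize }) {
  const groups = new Map();
  for (const doc of docs) {
    if (!cond(doc)) continue;                      // cond: filter first
    const keyValues = {};
    for (const field of Object.keys(key)) keyValues[field] = doc[field];
    const id = JSON.stringify(keyValues);          // one bucket per key value
    if (!groups.has(id)) groups.set(id, { ...keyValues, ...initial });
    reduce(doc, groups.get(id));                   // reduce: update the running result
  }
  const results = [...groups.values()];
  if (finalize) results.forEach(finalize);         // finalize: post-process each result
  return results;
}

// Invented sample data with the fields used in the slides.
const employees = [
  { emp_no: 1, role: "developer", experience: 2, resources_allocated: 1 },
  { emp_no: 2, role: "developer", experience: 1, resources_allocated: 2 },
  { emp_no: 3, role: "project lead", experience: 7, resources_allocated: 5 },
  { emp_no: 4, role: "tester", experience: 2, resources_allocated: 1 },
];

const totals = group(employees, {
  key: { role: 1 },
  cond: (doc) => doc.experience < 3,
  reduce: (cur, result) => { result.total_resources += cur.resources_allocated; },
  initial: { total_resources: 0 },
});

console.log(totals);
// [ { role: 'developer', total_resources: 3 }, { role: 'tester', total_resources: 1 } ]
```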

Page 34: MongoDB_Day2.pdf

Aggregation framework

• Used to calculate aggregated values without map-reduce, including on a sharded cluster

• Provides functionality similar to GROUP BY and related SQL operators

• Provides simple forms of self joins

• Has projection capabilities that reshape the result

Page 35: MongoDB_Day2.pdf

Framework Components

• Pipeline operators – are the stages that the aggregation framework provides. The stages can be chained, with each stage passing its output to the next. The pipeline operators are

– $project, $match, $limit, $skip, $unwind, $group, $sort

• Expressions – are the operators that calculate values when the pipeline stages run. The available expressions are classified into Boolean, comparison, arithmetic, string, date and conditional operators.

Page 36: MongoDB_Day2.pdf

Aggregation framework (Contd.)

• $project – selects particular fields.

db.employee_details.aggregate(
    { $project : {
        role : 1,
        experience : 1
    }}
);

– This retrieves the role, experience and _id from all the documents.

Page 37: MongoDB_Day2.pdf

Aggregation framework (Contd.)

• $match – filters documents using a selection criterion

db.employee_details.aggregate(
    { $match : { role : 'project lead' } }
);

This returns those documents whose role is 'project lead'

Page 38: MongoDB_Day2.pdf

Aggregation framework (Contd.)

• $limit – used to limit the number of documents displayed

db.employee_details.aggregate(

{ $limit : 5 }

);

This will display only 5 documents from the collection.

Page 39: MongoDB_Day2.pdf

Aggregation framework (Contd.)

• $skip – skips the specified number of documents from the result set

db.employee_details.aggregate(

{ $skip : 5 }

);

This skips the first 5 documents in the result set.

Page 40: MongoDB_Day2.pdf

Aggregation framework (Contd.)

• $unwind – if an array field holds 'n' values and $unwind is applied to that field, it creates 'n' copies of the document, each copy holding one value from the array

db.employee_details.aggregate(
    { $project : {
        emp_no : 1,
        emp_name : 1,
        specialization : 1
    }},
    { $unwind : "$specialization" }
);

In the above example, it is assumed that specialization is an array field. If a particular document has 2 specializations, the output will have two entries with the same emp_no and emp_name that differ only in the specialization.
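The effect of $unwind can be imitated with Array.prototype.flatMap. `unwind` below is a hypothetical plain-JavaScript model with invented employees; as a simplification it drops any document whose field is not an array, which may differ from the server's behaviour in some versions.

```javascript
// Hypothetical model of {$unwind: "$field"}: one output document per array value.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (Array.isArray(doc[field]) ? doc[field] : []).map((value) => ({ ...doc, [field]: value }))
  );
}

// Invented sample data; specialization is an array field.
const employees = [
  { emp_no: 1, emp_name: "amit", specialization: ["Java", "MongoDB"] },
  { emp_no: 2, emp_name: "rina", specialization: ["Testing"] },
];

const unwound = unwind(employees, "specialization");

console.log(unwound);
// Three documents: two for emp_no 1 (Java, MongoDB) and one for emp_no 2 (Testing)
```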

Page 41: MongoDB_Day2.pdf

Aggregation framework (Contd.)

• $group – performs a grouping operation

db.employee_details.aggregate(
    { $group : {
        _id : '$role',
        tot_no_of_emp_in_this_role : { $sum : 1 },
        tot_resources : { $sum : '$resources_allocated' }
    }}
);

This groups the employees by role (given as the value of '_id'). It also gives the total number of employees in each role, because it adds 1 to the group's count each time it encounters a matching document, and the total resources allocated to that role, because it adds up the individual resources of the group's employees.
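The two $sum accumulators can be traced by hand in plain JavaScript. `groupByRole` is a hypothetical helper that mirrors the pipeline above for invented data; it is a sketch of the bookkeeping, not of how the server executes $group.

```javascript
// Hypothetical model of the $group stage above: one result per role,
// with a document count ($sum: 1) and a field total ($sum: '$resources_allocated').
function groupByRole(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const group = groups.get(doc.role) ||
      { _id: doc.role, tot_no_of_emp_in_this_role: 0, tot_resources: 0 };
    group.tot_no_of_emp_in_this_role += 1;             // $sum: 1
    group.tot_resources += doc.resources_allocated;    // $sum: '$resources_allocated'
    groups.set(doc.role, group);
  }
  return [...groups.values()];
}

// Invented sample data.
const employees = [
  { role: "project lead", resources_allocated: 5 },
  { role: "developer", resources_allocated: 1 },
  { role: "project lead", resources_allocated: 3 },
];

const byRole = groupByRole(employees);

console.log(byRole);
// project lead: 2 employees, 8 resources; developer: 1 employee, 1 resource
```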

Page 42: MongoDB_Day2.pdf

Aggregation framework (Contd.)

• Every computed field in $group (every field other than _id) must use one of the following aggregate functions to build its composite value.

$addToSet, $first, $last, $max, $min, $avg, $push, $sum

Page 43: MongoDB_Day2.pdf

Aggregation framework (Contd.)

• $sort – sorts the result set

db.employee_details.aggregate(

{ $sort : { experience : 1 } }

);

This sorts the result set based on the experience.

Page 44: MongoDB_Day2.pdf

Aggregation framework: More Examples

Description: Sum the resources_allocated field from employee_details

SQL:

SELECT SUM(resources_allocated) AS total_resources
FROM employee_details

MongoDB:

db.employee_details.aggregate( [
    { $group: { _id: null, total_resources: { $sum: "$resources_allocated" } } }
] )

Page 45: MongoDB_Day2.pdf

Aggregation framework: More Examples (Contd.)

Page 46: MongoDB_Day2.pdf

Indexing

Page 47: MongoDB_Day2.pdf

Indexing

• Performance can be improved by creating appropriate indexes

• Indexes increase the speed of read operations

• An index can be created on any field using the following syntax

– db.collection_name.ensureIndex({key: 1})

– '1' creates an ascending index and '-1' creates a descending index

– Later MongoDB releases deprecate ensureIndex() in favour of createIndex(), which takes the same arguments

• An index can be dropped with

– db.collection_name.dropIndex({key: 1})

• Indexes are updated automatically on every insert, update and delete, so each additional index adds some cost to write operations

Page 48: MongoDB_Day2.pdf

Indexing

• The ensureIndex() method can take an optional second parameter

• A few of the values it can take

– {unique: true} : creates a unique index

– {background: true} : the system does not wait for the index to be created; the index is built in the background

– {sparse: true} : indexes only those documents that contain the indexed field

– {dropDups: true} : used together with {unique: true}; keeps the first document for each value of the indexed field and deletes the other documents that duplicate that value

Page 49: MongoDB_Day2.pdf

Indexing

• db.collection_name.getIndexes() returns all the indexes created on the particular collection

• db.collection_name.reIndex() rebuilds all the indexes on the particular collection

• db.collection_name.totalIndexSize() gives the total size in bytes of all the indexes

Page 50: MongoDB_Day2.pdf

Index Types

• _id Index

– _id index is a unique index on the _id field

– MongoDB creates this index by default on all collections

– Cannot delete the index on _id.

• Secondary Indexes

– All indexes in MongoDB are secondary indexes

– Can create indexes on any field within any document or sub-document

– These can be indexes on sub-documents, indexes on embedded fields, or compound indexes

Page 51: MongoDB_Day2.pdf

Backup and Restore

Page 52: MongoDB_Day2.pdf

Backup and Restore

• Backups

– Backups of the databases can be created by running the mongodump application (present in the bin folder)

– The syntax is

mongodump --out path_to_store_backup

– It can also be restricted to back up a particular database or collection

mongodump --out path_to_store_backup --db database_name --collection collection_name

– To restore a backup, run the mongorestore application

mongorestore --collection collection_name --db database_name path_to_the_backup\collection_name.bson

Page 53: MongoDB_Day2.pdf

Backup and Restore – Contd.

• MongoDB export

– To export a collection from the server to a JSON or CSV file on the local machine, the 'mongoexport' application can be used

• mongoexport --db database_name --collection collection_name --out path_to_json\file_name.json

• mongoexport --db database_name --collection collection_name --csv --out path_to_csv\file_name.csv --fields field_name1,field_name2

Page 54: MongoDB_Day2.pdf

Backup and Restore – Contd.

• MongoDB import

– To import data from a 'json' or 'csv' file into a collection, the 'mongoimport' application can be used

• mongoimport --db dest_database_name --collection dest_collection_name path_to_input_json

• mongoimport --type csv --db dest_database_name --collection dest_collection_name path_to_input_csv --fields new_field_name1,new_field_name2

Page 55: MongoDB_Day2.pdf

Summary

• Querying Mongo

• Aggregation

• Indexing

• Backup and Restore

• Import and Export Data

• Mongo tools

Page 56: MongoDB_Day2.pdf

References

• www.mongodb.org/

• Karl Seguin, “The Little MongoDB Book”

• Kristina Chodorow & Michael Dirolf, “MongoDB: The Definitive Guide”, O'Reilly Media, 2010

• www.mkyong.com/tutorials/java-mongodb-tutorials/

Page 57: MongoDB_Day2.pdf

Thank You
