Usage Guidelines
Do not forward this document to any non-Infosys mail ID. Forwarding this document to a
non-Infosys mail ID may lead to disciplinary action against you, including termination of
employment.
Contents of this material cannot be used in any other internal or external document
without explicit permission from E&[email protected].
Introduction to MongoDB
Education & Research
•ER/CORP/CRS/ERCLD0008/003
•“© 2012 Infosys Limited, Bangalore, India. All rights reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this document nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.”
Confidential Information
• This document is confidential to Infosys Limited. This document contains information and data that Infosys considers confidential
and proprietary (“Confidential Information”).
• Confidential Information includes, but is not limited to, the following:
– Corporate and Infrastructure information about Infosys;
– Infosys’ project management and quality processes;
– Project experiences included as illustrative case studies.
• Any disclosure of Confidential Information to, or use of it by a third party, will be damaging to Infosys.
• Ownership of all Infosys Confidential Information, no matter in what media it resides, remains with Infosys.
• Confidential information in this document shall not be disclosed, duplicated or used – in whole or in part – for any purpose without
specific written permission of an authorized representative of Infosys.
Course Objectives
• Performing basic operations at the shell prompt
• Performing aggregation in the shell
• Creating and managing indexes
• Importing and exporting data
Session Plan
• Querying Mongo
• Aggregation
• Indexing
• Backup and Restore
• Import and Export Data
• Mongo tools
Querying Mongo
Querying Mongo : Selection
• db.collection_name.find({JSON_for_where_clause})
– Example: db.trainees.find({stream: 'Java', track: 'fast track'})
– This returns all the documents where ‘stream’ equals ‘Java’
and ‘track’ equals ‘fast track’.
Querying Mongo : Selection & Projection
• db.collection_name.find({JSON_for_where_clause},
{JSON_for_select_clause})
Examples
– db.trainees.find({stream: 'Java', track: 'fast track'}, {name: 1,
emp_id: 1})
• This returns the name, emp_id and _id of every document where
‘stream’ equals ‘Java’ and ‘track’ equals ‘fast track’; _id is returned unless it is explicitly excluded
– db.trainees.find({}, {name: 1, emp_id: 1, _id: 0})
• This returns the name and emp_id of every trainee, because the empty filter ‘{}’ has no
where clause, and ‘_id: 0’ suppresses the _id field. ‘null’ can also be specified instead of ‘{}’.
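The projection rules above can be modelled in a few lines of plain JavaScript (runnable in Node.js, no MongoDB server needed). This is a simplified sketch under stated assumptions, not MongoDB's implementation: it handles only inclusion projections using 1, and the sample trainee document is hypothetical.

```javascript
// Simplified model (not MongoDB's implementation) of an inclusion projection
// such as {name: 1, emp_id: 1} or {name: 1, emp_id: 1, _id: 0}.
function project(doc, projection) {
  const out = {};
  // _id is returned unless the projection explicitly sets _id: 0
  if (projection._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

// Hypothetical sample trainee document
const trainee = { _id: 1, name: "Asha", emp_id: "62001", stream: "Java", track: "fast track" };

console.log(project(trainee, { name: 1, emp_id: 1 }));         // { _id: 1, name: 'Asha', emp_id: '62001' }
console.log(project(trainee, { name: 1, emp_id: 1, _id: 0 })); // { name: 'Asha', emp_id: '62001' }
```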
Querying Mongo : Operators
• db.collection_name.find({$or: [{key1: 'v1'}, {key2: 'v2'}]})
– selects a document if at least one of the conditions is satisfied
– Example: db.trainees.find({batch: 'Jan12CS', $or: [{stream:
'Java'}, {stream: 'OS'}, {track: 'intermediate'}]})
– This selects the trainees from the Jan12CS batch who belong to either the
Java stream or the OS stream, or who are on the intermediate track
• db.collection_name.find({key1: {$gt: v1}})
– fetches all the documents whose key1 value is greater than v1
– Example: db.trainees.find({GPA: {$gt: 4, $lt: 4.9}})
– This returns the details of all trainees whose GPA is greater than 4 and less than 4.9 (both bounds exclusive)
– Similarly, $gte, $lt, $lte and $ne check for greater than or equal to, less
than, less than or equal to, and not equal to, respectively
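To make the combination of implicit AND, $or and range operators concrete, here is a minimal plain-JavaScript matcher. It is a simplified model, not MongoDB's query engine: it covers only equality, $gt, $gte, $lt, $lte, $ne and $or on top-level fields, and the trainee documents are hypothetical.

```javascript
// Simplified model of how find() evaluates a filter such as
// {batch: 'Jan12CS', $or: [...]} or {GPA: {$gt: 4, $lt: 4.9}}.
const comparators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $ne: (a, b) => a !== b, // true for missing fields too, as in MongoDB
};

function matches(doc, filter) {
  // Every top-level clause must hold (implicit AND)
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$or") return cond.some((sub) => matches(doc, sub));
    if (cond !== null && typeof cond === "object") {
      // Operator document: every operator must hold for the field value
      return Object.entries(cond).every(([op, v]) => comparators[op](doc[key], v));
    }
    return doc[key] === cond; // plain equality
  });
}

// Hypothetical trainee documents
const trainees = [
  { name: "A", batch: "Jan12CS", stream: "Java", track: "fast track", GPA: 4.5 },
  { name: "B", batch: "Jan12CS", stream: ".NET", track: "intermediate", GPA: 3.8 },
  { name: "C", batch: "Jan12CS", stream: ".NET", track: "fast track", GPA: 4.9 },
  { name: "D", batch: "Feb12CS", stream: "OS", track: "fast track", GPA: 4.2 },
];

const orFilter = { batch: "Jan12CS", $or: [{ stream: "Java" }, { stream: "OS" }, { track: "intermediate" }] };
console.log(trainees.filter((t) => matches(t, orFilter)).map((t) => t.name)); // [ 'A', 'B' ]

const gpaFilter = { GPA: { $gt: 4, $lt: 4.9 } };
console.log(trainees.filter((t) => matches(t, gpaFilter)).map((t) => t.name)); // [ 'A', 'D' ]
```

Trainee D is excluded by the first filter because the batch condition is ANDed with the whole $or clause, and trainee C is excluded by the second because $lt is strict.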
Querying Mongo : Operators
• db.collection_name.find({key: {$in: ['v1', 'v2']}})
– retrieves the documents whose ‘key’ value equals either ‘v1’ or
‘v2’.
– Example: db.trainees.find({stream: {$in: ['Java', 'OS']}}) retrieves
the details of the Java and OS stream trainees
– {stream: {$nin: ['Java', 'OS']}} retrieves the details of trainees whose stream is neither
‘Java’ nor ‘OS’ (documents without a ‘stream’ field also match).
• db.collection_name.find({key: {$all: ['v1', 'v2']}})
– retrieves the documents whose ‘key’ array contains all the values passed
in the argument
– Example: db.trainees.find({module: {$all: ['JPA', 'POJO']}}) retrieves
the details of trainees whose module array contains both ‘JPA’ and ‘POJO’
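The difference between $in, $nin and $all is easiest to see side by side. The sketch below is a simplified plain-JavaScript model (not MongoDB's implementation) that handles scalar values and arrays of scalars only; the helper names inOp, ninOp and allOp and the sample documents are hypothetical.

```javascript
// Treat a scalar as a one-element array so scalars and arrays share one code path
const asArray = (v) => (Array.isArray(v) ? v : [v]);

// $in: at least one stored value equals one of the listed values
const inOp = (value, list) => value !== undefined && asArray(value).some((x) => list.includes(x));
// $nin: the negation of $in, so documents that lack the field also match
const ninOp = (value, list) => !inOp(value, list);
// $all: every listed value appears among the stored values
const allOp = (value, list) => value !== undefined && list.every((x) => asArray(value).includes(x));

// Hypothetical trainee documents
const trainees = [
  { name: "A", stream: "Java", module: ["JPA", "POJO", "Servlets"] },
  { name: "B", stream: "OS", module: ["JPA"] },
  { name: "C", stream: ".NET" },    // no module field
  { name: "D", module: ["POJO"] },  // no stream field
];
const names = (pred) => trainees.filter(pred).map((t) => t.name);

console.log(names((t) => inOp(t.stream, ["Java", "OS"])));   // [ 'A', 'B' ]
console.log(names((t) => ninOp(t.stream, ["Java", "OS"])));  // [ 'C', 'D' ]
console.log(names((t) => allOp(t.module, ["JPA", "POJO"]))); // [ 'A' ]
```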
Querying Mongo : Operators
• db.collection_name.find({key: {$size: value}})
– retrieves the documents whose ‘key’ array has exactly the number of elements specified in
‘value’
– Example: db.trainees.find({module: {$size: 4}}) selects the
trainees who have completed 4 modules
• ‘$size’ does not accept a range: its value must be a single integer, so operators such as $gt, $lt,
$gte, $lte and $ne cannot be used inside ‘$size’
• db.collection_name.find({}, {array_field: {$slice: n}})
– ‘n’ can be positive or negative
– this projection returns only the first ‘n’ elements of the array (when ‘n’ is
positive) or the last ‘n’ elements counted from the end (when ‘n’ is negative, e.g. -2 returns the last two)
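The positive/negative behaviour of $slice maps directly onto JavaScript's own Array.prototype.slice. The helper below is a hypothetical, simplified model of the projection applied to a single array, not MongoDB's code; the module list is sample data.

```javascript
// Simplified model of the {array_field: {$slice: n}} projection on one array.
function sliceProjection(arr, n) {
  // Positive n keeps the first n elements, negative n keeps the last |n|.
  // If |n| exceeds the array length, the whole array is returned.
  return n >= 0 ? arr.slice(0, n) : arr.slice(n);
}

const modules = ["Java", "JPA", "POJO", "Servlets"]; // hypothetical module array
console.log(sliceProjection(modules, 2));  // [ 'Java', 'JPA' ]
console.log(sliceProjection(modules, -2)); // [ 'POJO', 'Servlets' ]
console.log(sliceProjection(modules, 10)); // the whole array
```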
Querying Mongo : Operators
• db.collection_name.find({key: {$exists: true}})
– useful for retrieving only those documents that contain a
particular ‘key’.
– Example: db.trainees.find({certification: {$exists: true}}) selects
the details of those trainees who have done some certification, i.e. whose documents contain a certification field.
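One point worth checking is that $exists tests whether the field is present, not whether it holds a useful value, so a field set to null still matches. The sketch below is a simplified plain-JavaScript model with hypothetical documents; to exclude null values in the shell, the condition can be combined with $ne, as in {certification: {$exists: true, $ne: null}}.

```javascript
// Simplified model of {key: {$exists: true}}: true when the field is present
// in the document, whatever its value (including null).
const exists = (doc, key) => Object.prototype.hasOwnProperty.call(doc, key);

// Hypothetical trainee documents
const trainees = [
  { name: "A", certification: "AWS" },
  { name: "B", certification: null }, // field present, value null
  { name: "C" },                      // field absent
];
console.log(trainees.filter((t) => exists(t, "certification")).map((t) => t.name)); // [ 'A', 'B' ]
```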
Querying Mongo : Operators
• db.collection_name.find({key: perl_compatible_regex})
– selects the documents whose ‘key’ value matches the
given ‘perl_compatible_regex’.
Example:
– db.trainees.find({name: /an/i}) retrieves all trainees whose name
contains the character sequence ‘an’
– The ‘i’ at the end makes the regular expression case
insensitive.
– db.trainees.find({emp_id: /^620/}) retrieves all trainees whose
emp_id starts with ‘620’ (this works only when emp_id is stored as a string).
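For simple patterns like the two above, a shell regular expression behaves like a JavaScript RegExp, so RegExp.prototype.test() shows which values would match. This is an illustration on hypothetical sample values; MongoDB itself uses PCRE, which differs from JavaScript for some advanced features.

```javascript
// Which hypothetical names would /an/i match? ('i' = case insensitive)
const names = ["Anand", "Kiran", "Meera", "SANJAY"];
console.log(names.filter((n) => /an/i.test(n))); // [ 'Anand', 'Kiran', 'SANJAY' ]

// Which hypothetical emp_id strings start with '620'? ('^' anchors at the start)
const empIds = ["62001", "16200", "62045"];
console.log(empIds.filter((id) => /^620/.test(id))); // [ '62001', '62045' ]
```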
Querying Mongo : Operators
• db.collection_name.find({key: {$type: value}})
– selects only those documents whose ‘key’ value has the data type
passed.
– Assume that, for the field certification, some documents hold the name of the
course (data type String), some hold the number of
certifications (data type Double) and some hold null. To select only
the documents with the course name, use the query
db.trainees.find({certification: {$type: 2}})
– Data types and the values to be passed:
• Double - 1; String – 2; Array – 4; Object id – 7;
• Boolean – 8; Date – 9; Null – 10; Regular Expression – 11
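The type-number table above can be expressed as a small classifier. The function bsonTypeNumber below is a hypothetical, simplified model: it maps plain JavaScript values onto the listed BSON type numbers (plus 3 for an embedded object), treats every JavaScript number as a Double, does not model ObjectId (7), and ignores the special handling MongoDB applies to array fields.

```javascript
// Simplified classification of a JavaScript value into BSON type numbers.
function bsonTypeNumber(value) {
  if (value === null) return 10;              // Null (checked first: typeof null is "object")
  if (typeof value === "number") return 1;    // Double
  if (typeof value === "string") return 2;    // String
  if (typeof value === "boolean") return 8;   // Boolean
  if (Array.isArray(value)) return 4;         // Array
  if (value instanceof Date) return 9;        // Date
  if (value instanceof RegExp) return 11;     // Regular Expression
  if (typeof value === "object") return 3;    // Embedded document (Object)
  return undefined;                           // not covered by this sketch
}

// Hypothetical documents mirroring the certification example
const trainees = [
  { name: "A", certification: "Big Data" },
  { name: "B", certification: 2 },
  { name: "C", certification: null },
];
// Equivalent of {certification: {$type: 2}} in this model
console.log(trainees.filter((t) => bsonTypeNumber(t.certification) === 2).map((t) => t.name)); // [ 'A' ]
```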
Querying Mongo : Operators
• Accessing embedded document
– db.collection_name.find({'parent_key.embedded_key': 'value'}) is
used to find the documents whose embedded document has an
‘embedded_key’ equal to the value passed. The dotted key must be enclosed in quotes.
• Eg: db.trainees.find({'project.IDE': 'Eclipse'}) retrieves the trainees
who use the ‘Eclipse’ IDE for their project.
Querying Mongo : Operators
• $elemMatch
Consider a collection ‘CDP’ with the following documents
{ emp_id: 101, certification:
[ { name: 'Big Data', grade: 'A' },
{ name: 'AWS', grade: 'B' }
]
}
{ emp_id: 102, certification:
[ { name: 'Hadoop', grade: 'B' },
{ name: 'AWS', grade: 'A' }
]
}
Problem Statement: To find all the employees who are certified in AWS with grade ‘A’
Expected output: Only the second document must be returned
Querying Mongo : Operators
• db.CDP.find({'certification.name': 'AWS', 'certification.grade': 'A'})
This returns both documents because
– In the first document, one array element has the name ‘AWS’ and
another array element has the grade ‘A’, so each condition is satisfied by some element
– The second document is returned because it has an array element
whose name is ‘AWS’ and whose grade is ‘A’
• To get the desired output (only the second document), a document
should be selected only when a single element of the array satisfies both
conditions
• This is what the ‘$elemMatch’ operator does
• So the query is
db.CDP.find({certification: {$elemMatch: {name: 'AWS', grade: 'A'}}})
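The distinction between the two queries comes down to whether the conditions may be satisfied by different array elements or must be satisfied by the same one. The plain-JavaScript sketch below models both behaviours on the CDP documents from this section; it is a simplified illustration, not MongoDB's matcher.

```javascript
// The CDP documents from the example above
const CDP = [
  { emp_id: 101, certification: [{ name: "Big Data", grade: "A" }, { name: "AWS", grade: "B" }] },
  { emp_id: 102, certification: [{ name: "Hadoop", grade: "B" }, { name: "AWS", grade: "A" }] },
];

// {'certification.name': 'AWS', 'certification.grade': 'A'}:
// each condition may be satisfied by a different array element.
const dotNotation = (doc) =>
  doc.certification.some((c) => c.name === "AWS") &&
  doc.certification.some((c) => c.grade === "A");

// {certification: {$elemMatch: {name: 'AWS', grade: 'A'}}}:
// a single array element must satisfy both conditions.
const elemMatch = (doc) =>
  doc.certification.some((c) => c.name === "AWS" && c.grade === "A");

console.log(CDP.filter(dotNotation).map((d) => d.emp_id)); // [ 101, 102 ]
console.log(CDP.filter(elemMatch).map((d) => d.emp_id));   // [ 102 ]
```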
Querying Mongo : Operators
• $where
– Allows a JavaScript expression (as a string) or a JavaScript function
to be used in a query
– The JavaScript expression or function is evaluated against each
document
– Inside the JavaScript, the current document is referred to as ‘this’ or ‘obj’
Example:
db.trainees.find({$where: 'this.currentCDP > this.previousCDP'})
db.trainees.find({$where: function() { return (this.currentCDP > this.previousCDP); }})
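– $where is applied after the other conditions in the query; since it cannot use indexes and runs JavaScript for every document it examines, combine it with standard query operators that narrow down the documents first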
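The role of ‘this’ in a $where function can be reproduced with Function.prototype.call, which invokes a function with ‘this’ bound to a given object. The sketch below is a simplified model of that per-document evaluation, not MongoDB's JavaScript engine; the trainee documents are hypothetical.

```javascript
// Hypothetical trainee documents
const trainees = [
  { name: "A", currentCDP: 7, previousCDP: 5 },
  { name: "B", currentCDP: 4, previousCDP: 6 },
];

// The same predicate as the $where example; it must be a regular function,
// because an arrow function would not receive the document as `this`.
const whereFn = function () {
  return this.currentCDP > this.previousCDP;
};

// Simplified model: call the predicate once per document with `this` = document
console.log(trainees.filter((doc) => whereFn.call(doc)).map((d) => d.name)); // [ 'A' ]
```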
Querying Mongo : Functions
• db.collection_name.find().count()
– returns the number of documents in the collection (or the number matching the query passed to find())
• db.collection_name.find().explain()
– returns the query plan, including the number of documents scanned, the time taken and other useful
information
• db.collection_name.distinct('key')
– returns an array of the distinct values of the key
• db.collection_name.help()
– lists the methods that can be called on the collection
• db.help()
– lists the methods that can be called on the database
Querying Mongo : Functions
• db.stats()
– gives information about the database such as its name, the number of
collections and indexes, and the storage space used by its data and indexes
• db.collection_name.stats()
– gives the number of indexes on that collection, the total size of all indexes
and the size of each individual index, along with other information
• db.getLastError()
– gives the details of the last error, if any, that occurred during a write
operation
Querying Mongo : Functions
• db.serverStatus() – gives details about the host server, the MongoDB version,
the process (mongod / mongos), the memory used by the server, the number of client
connections, counts of the different operations executed by the server, and cursor
statistics.
• db.currentOp() – returns a document whose ‘inprog’ array contains information (such as
opid, secs_running, op, ns, the client that issued the
operation and lock status) about each currently executing operation.
Querying Mongo : Functions
• To copy a database between two server instances, the copyDatabase() function can be
run from the destination server instance (this helper was removed in MongoDB 4.2)
• Example:
– db.copyDatabase("mysourcedb", "mydestdb",
"MYSGEC240748D:27017")
This copies the database ‘mysourcedb’ from the server running at
‘MYSGEC240748D:27017’ to the destination (current) server
under the name ‘mydestdb’
Querying Mongo : Limiting & Ordering
• db.collection_name.find().limit(n)
– limits the results to n documents.
• db.collection_name.find().sort({key: n})
– sorts the result on the field ‘key’
– n is either 1 (ascending order) or -1 (descending order)
Querying Mongo : Skipping and Chaining
• db.collection_name.find().skip(n)
– skips the first n documents of the result set returned by find()
• The limit(), sort() and skip() functions can be chained.
– Example: db.trainees.find().sort({emp_id: 1}).limit(10).skip(5) sorts the
trainees by emp_id, skips the first five and displays the next ten (the 6th to the 15th).
– The server always applies the sort first, then the skip, then the limit, regardless of the order in which the functions are chained.
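Why the chaining order does not matter can be shown with a toy cursor. The Cursor class below is hypothetical and greatly simplified (single-key sort, in-memory array, no server): its sort(), skip() and limit() methods only record settings, and toArray() applies them in the fixed order sort, skip, limit.

```javascript
// Simplified model of a find() cursor (not MongoDB's implementation).
class Cursor {
  constructor(docs) {
    this.docs = docs;
    this.opts = {};
  }
  sort(spec) { this.opts.sort = spec; return this; }
  skip(n) { this.opts.skip = n; return this; }
  limit(n) { this.opts.limit = n; return this; }
  toArray() {
    const out = [...this.docs];
    if (this.opts.sort) {
      // Single-key sort only: dir is 1 (ascending) or -1 (descending)
      const [[key, dir]] = Object.entries(this.opts.sort);
      out.sort((a, b) => (a[key] < b[key] ? -dir : a[key] > b[key] ? dir : 0));
    }
    const start = this.opts.skip || 0;
    // As in MongoDB, a limit of 0 (or no limit) means "no limit"
    const end = this.opts.limit ? start + this.opts.limit : out.length;
    return out.slice(start, end);
  }
}

// 20 hypothetical trainees with emp_id 1..20, stored in reverse order
const trainees = Array.from({ length: 20 }, (_, i) => ({ emp_id: 20 - i }));
const ids = (cursor) => cursor.toArray().map((t) => t.emp_id);

const a = ids(new Cursor(trainees).sort({ emp_id: 1 }).limit(10).skip(5));
const b = ids(new Cursor(trainees).skip(5).limit(10).sort({ emp_id: 1 }));
console.log(a); // emp_id 6 through 15
console.log(b); // same result as a
```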
Quiz : Provide the Mongo equivalent
1. INSERT INTO users(user_id, age, status) VALUES ("bcd001", 45, "A")
Answer: db.users.insert( { user_id: "bcd001", age: 45, status: "A" } )
2. SELECT user_id, status FROM users
Answer: db.users.find( { }, { user_id: 1, status: 1, _id: 0 } )
3. SELECT user_id, status FROM users WHERE status = "A"
Answer: db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )
4. SELECT * FROM users WHERE status = "A" OR age = 50
Answer : db.users.find( { $or: [ { status: "A" } , { age: 50 } ] } )
5. SELECT * FROM users WHERE age > 25 AND age <= 50
Answer: db.users.find( { age: { $gt: 25, $lte: 50 } } )
Quiz : Provide the Mongo equivalent
6. SELECT * FROM users WHERE user_id like "%bc%"
Answer: db.users.find( { user_id: /bc/ } )
7. SELECT * FROM users WHERE status = "A" ORDER BY user_id DESC
Answer: db.users.find( { status: "A" } ).sort( { user_id: -1 } )
8. SELECT COUNT(*) FROM users
Answer: db.users.count() OR db.users.find().count()
9. SELECT COUNT(user_id) FROM users
Answer : db.users.count( { user_id: { $exists: true } } )
10. SELECT DISTINCT(status) FROM users
Answer: db.users.distinct( "status" )
AGGREGATION
Simple Aggregation Functions
• Count
– db.collection_name.count() gives the number of documents in the collection.
– db.collection_name.count({JSON for where clause}) gives the number of documents
that match the specified selection criteria.
• Distinct
– db.collection_name.distinct('key') returns an array of the distinct values of the
passed ‘key’
– db.collection_name.distinct('key', {JSON for where clause}) returns the distinct values of the passed ‘key’
among the documents that meet the selection criteria
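The key difference between the two helpers is the shape of what they return: count() gives a number and distinct() gives an array of values, not documents. The hypothetical helpers below are a simplified plain-JavaScript model (the optional filter is a predicate here rather than a query document), using sample users documents like those in the quiz; MongoDB does not guarantee the order of the values returned by distinct().

```javascript
// Hypothetical users documents
const users = [
  { user_id: "bcd001", status: "A" },
  { user_id: "bcd002", status: "B" },
  { user_id: "bcd003", status: "A" },
];

// count: number of documents, optionally restricted by a predicate
const count = (docs, pred = () => true) => docs.filter(pred).length;

// distinct: array of unique values of `key` among the selected documents
const distinct = (docs, key, pred = () => true) =>
  [...new Set(docs.filter(pred).map((d) => d[key]).filter((v) => v !== undefined))];

console.log(count(users));                                        // 3
console.log(count(users, (u) => u.status === "A"));               // 2
console.log(distinct(users, "status"));                           // [ 'A', 'B' ]
console.log(distinct(users, "user_id", (u) => u.status === "A")); // [ 'bcd001', 'bcd003' ]
```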
Simple Aggregation Functions (Contd.)
• Group
– Assume there is an employee_details relational database
table with the fields emp_no, emp_name, role, experience and
resources_allocated.
– The equivalent MongoDB document would look like
{emp_no: 6475, emp_name: 'amit', role: 'project lead', experience:
7, resources_allocated: 5}
Simple Aggregation Functions (Contd.)
• To group employee_details by role and calculate the sum of resources allocated to each role, the SQL query is
SELECT role, SUM(resources_allocated) AS total_resources
FROM employee_details
GROUP BY role
The MongoDB equivalent is
db.employee_details.group( {
key: {role: 1 },
reduce: function ( cur, result ) {
result.total_resources += cur.resources_allocated;
},
initial: { total_resources : 0 }
} )
Simple Aggregation Functions (Contd.)
• To group only those employees whose experience is less than 3 by role, and calculate the sum of resources allocated to each role, the SQL query is
SELECT role, SUM(resources_allocated) AS total_resources
FROM employee_details
WHERE experience < 3
GROUP BY role
The MongoDB equivalent is
db.employee_details.group( {
key: {role: 1 },
cond: {experience : { $lt: 3 } },
reduce: function ( cur, result ) {
result.total_resources += cur.resources_allocated;
},
initial: { total_resources : 0 }
} )
Simple Aggregation Functions (Contd.)
• To group only those employees whose experience is less than 3 by role and experience, and calculate the sum of resources allocated to each (role, experience) combination, the SQL query is
SELECT role, experience, SUM(resources_allocated) AS total_resources
FROM employee_details
WHERE experience < 3
GROUP BY role, experience
The MongoDB equivalent is
db.employee_details.group( {
key: {role: 1, experience: 1 },
cond: {experience : { $lt: 3 } },
reduce: function ( cur, result ) {
result.total_resources += cur.resources_allocated;
},
initial: { total_resources : 0 }
} )
Simple Aggregation Functions (Contd.)
• Group – Syntax:
– db.collection_name.group({key, reduce, initial, [keyf,] [cond,] [finalize]})
• key – specifies the fields on which grouping is done.
• reduce – a function that specifies the operation (such as count or sum) to be performed on the
documents of each group. The function takes two parameters: the current document and the result
accumulated so far for that group.
• initial – the value with which each group's result is initialized at the beginning
of the operation.
• keyf – an alternative to the ‘key’ field: a function used when grouping has to be done
on derived values rather than on existing fields.
• cond – specifies the selection criteria. Only the documents that satisfy this condition are
considered for grouping.
• finalize – a function that is applied to each group's result before the final result set is returned.
• The ‘group’ function cannot be used with a sharded cluster. For a sharded cluster,
the aggregation framework has to be used. (group() was deprecated in MongoDB 3.4 and removed in 4.2.)
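The interplay of key, cond, initial, reduce and finalize can be traced with a small in-memory model. The group function below is a hypothetical, simplified sketch (not MongoDB's implementation): keyf is not modelled, cond is a JavaScript predicate standing in for the query document that MongoDB expects, and the employee documents are sample data. The reduce function is the same one used in the examples above.

```javascript
// Simplified model of db.collection.group({key, cond, reduce, initial, finalize}).
function group(docs, { key, cond, reduce, initial, finalize }) {
  const groups = new Map();
  for (const doc of docs.filter(cond || (() => true))) {
    // Build the group's key document from the fields named in `key`
    const keyDoc = {};
    for (const field of Object.keys(key)) keyDoc[field] = doc[field];
    const id = JSON.stringify(keyDoc);
    // Each group starts from its own copy of `initial`
    if (!groups.has(id)) groups.set(id, { ...keyDoc, ...JSON.parse(JSON.stringify(initial)) });
    reduce(doc, groups.get(id)); // reduce updates the running result in place
  }
  const results = [...groups.values()];
  if (finalize) results.forEach(finalize); // finalize modifies each result in place
  return results;
}

// Hypothetical employee_details documents
const employee_details = [
  { emp_no: 6475, emp_name: "amit", role: "project lead", experience: 7, resources_allocated: 5 },
  { emp_no: 6476, emp_name: "ravi", role: "developer", experience: 2, resources_allocated: 1 },
  { emp_no: 6477, emp_name: "neha", role: "developer", experience: 1, resources_allocated: 2 },
];

const byRole = group(employee_details, {
  key: { role: 1 },
  cond: (doc) => doc.experience < 3, // stands in for {experience: {$lt: 3}}
  reduce: function (cur, result) {
    result.total_resources += cur.resources_allocated;
  },
  initial: { total_resources: 0 },
});
console.log(byRole); // [ { role: 'developer', total_resources: 3 } ]
```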
Aggregation framework
• Used to calculate aggregated values without map-reduce, including on a sharded cluster
• Provides functionality similar to GROUP BY and related SQL operators
• Provides simple forms of self joins
• Has projection capabilities that reshape the result
Framework Components
• Pipeline operators – the stages that the aggregation framework provides.
These stages can be chained. The pipeline operators are
– $project, $match, $limit, $skip, $unwind, $group, $sort
• Expressions – the operators that calculate values when the pipeline stages
are executed. The available expressions are classified into
Boolean, comparison, arithmetic, string, date and conditional operators.
Aggregation framework (Contd.)
• $project – helps to select particular fields.
db.employee_details.aggregate([
{ $project : {
role : 1 ,
experience : 1
}}
]);
– This retrieves the role, experience and _id of every document.
Aggregation framework (Contd.)
• $match – filters documents using selection criteria
db.employee_details.aggregate([
{ $match : { role : 'project lead' } }
]);
This returns the documents whose role is ‘project lead’
Aggregation framework (Contd.)
• $limit – used to limit the number of documents displayed
db.employee_details.aggregate([
{ $limit : 5 }
]);
This returns only the first 5 documents of the collection.
Aggregation framework (Contd.)
• $skip – skips the specified number of documents from the result set
db.employee_details.aggregate([
{ $skip : 5 }
]);
This skips the first 5 documents and returns the rest of the result set.
Aggregation framework (Contd.)
• $unwind – if an array field holds ‘n’ values, unwinding that field produces
‘n’ copies of the document, each copy holding one value from the array
db.employee_details.aggregate([
{ $project : {
emp_no : 1 ,
emp_name : 1 ,
specialization : 1
}},
{ $unwind : "$specialization" }
]);
In the above example, specialization is assumed to be an array field. If a particular document has
2 specializations, the output will have two entries with the same
emp_no and emp_name that differ only in specialization.
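$unwind is essentially a flat-map over the array field. The unwind helper below is a hypothetical, simplified plain-JavaScript model with sample documents: a missing, null or empty array field produces no output (MongoDB's default), and a non-array value is treated as a one-element array, which matches MongoDB 3.2 and later.

```javascript
// Simplified model of {$unwind: "$field"}: one output document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const v = doc[field];
    if (v === undefined || v === null || (Array.isArray(v) && v.length === 0)) return [];
    // Copy the document once per element, replacing the array with that element
    return (Array.isArray(v) ? v : [v]).map((item) => ({ ...doc, [field]: item }));
  });
}

// Hypothetical employee documents
const employees = [
  { emp_no: 6475, emp_name: "amit", specialization: ["Java", "Cloud"] },
  { emp_no: 6476, emp_name: "ravi", specialization: [] },
];
console.log(unwind(employees, "specialization"));
// two documents for amit ('Java' and 'Cloud'), none for ravi
```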
Aggregation framework (Contd.)
• $group – performs grouping operation
db.employee_details.aggregate([
{ $group : {
_id : "$role",
tot_no_of_emp_in_this_role : { $sum : 1 },
tot_resources : { $sum : "$resources_allocated" }
}}
]);
This groups the employees by role (given as the value of ‘_id’). It also gives
the total number of employees in each role, since { $sum : 1 } adds 1 to the group’s count
for every document in the group, and the total resources allocated to
that role, since { $sum : "$resources_allocated" } adds up the resources of that group’s employees.
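The two $sum forms in this stage behave differently: a constant 1 counts documents, while a "$field" reference totals that field. The groupStage helper below is a hypothetical, simplified model of the stage (not MongoDB's implementation): _id must be a single "$field" reference, only $sum is handled, numeric values are assumed, and the documents are sample data. MongoDB does not guarantee the order of $group output.

```javascript
// Simplified model of a $group stage with {$sum: 1} and {$sum: "$field"}.
function groupStage(docs, spec) {
  const groupField = spec._id.slice(1); // "$role" -> "role"
  const out = new Map();
  for (const doc of docs) {
    const id = doc[groupField];
    if (!out.has(id)) {
      // New group: start every accumulator at 0
      const init = { _id: id };
      for (const f of Object.keys(spec)) if (f !== "_id") init[f] = 0;
      out.set(id, init);
    }
    const acc = out.get(id);
    for (const [f, expr] of Object.entries(spec)) {
      if (f === "_id") continue;
      const operand = expr.$sum;
      // "$field" adds the field's value (missing counts as 0); a number adds itself
      acc[f] += typeof operand === "string" ? (doc[operand.slice(1)] ?? 0) : operand;
    }
  }
  return [...out.values()];
}

// Hypothetical employee_details documents
const employee_details = [
  { emp_name: "amit", role: "project lead", resources_allocated: 5 },
  { emp_name: "ravi", role: "developer", resources_allocated: 1 },
  { emp_name: "neha", role: "developer", resources_allocated: 2 },
];
const byRole = groupStage(employee_details, {
  _id: "$role",
  tot_no_of_emp_in_this_role: { $sum: 1 },
  tot_resources: { $sum: "$resources_allocated" },
});
console.log(byRole);
// project lead: 1 employee, 5 resources; developer: 2 employees, 3 resources
```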
Aggregation framework (Contd.)
• Apart from _id, every field in $group must be computed with one of the following accumulator
operators:
$addToSet, $first, $last, $max, $min, $avg, $push, $sum
Aggregation framework (Contd.)
• $sort – sorts the result set
db.employee_details.aggregate([
{ $sort : { experience : 1 } }
]);
This sorts the result set in ascending order of experience.
Aggregation framework: More Examples
• Sum the resources_allocated field from employee_details
SQL:
SELECT SUM(resources_allocated) AS total_resources
FROM employee_details
MongoDB:
db.employee_details.aggregate( [
{ $group: { _id: null,
total_resources: { $sum: "$resources_allocated" } } }
] )
Indexing
Indexing
• Performance can be improved by creating appropriate indexes
• Indexes increase the speed of read operations
• An index can be created on any field using the following syntax
– db.collection_name.ensureIndex({key: 1})
– ‘1’ creates an ascending index and ‘-1’ a descending index
– In MongoDB 3.0 and later, createIndex() is the preferred name for this method
• An index can be dropped with
– db.collection_name.dropIndex({key: 1})
• Indexes are updated automatically on every insert, update and delete
Indexing
• The ensureIndex() method can take an optional second parameter
• Some of the values it can take:
– {unique: true}: creates a unique index
– {background: true}: builds the index in the background, so that other database operations are
not blocked while the index is being created
– {sparse: true}: indexes only those documents that contain
the indexed field
– {dropDups: true}: while building a unique index, deletes the documents that have duplicate
values for the indexed fields (this option was removed in MongoDB 3.0)
Indexing
• db.collection_name.getIndexes() returns all the indexes created on the particular
collection
• db.collection_name.reIndex() rebuilds all the indexes on the particular collection
• db.collection_name.totalIndexSize() gives the total size, in bytes, of all the
indexes
Index Types
• _id Index
– _id index is a unique index on the _id field
– MongoDB creates this index by default on all collections
– The index on _id cannot be dropped.
• Secondary Indexes
– All indexes in MongoDB are secondary indexes
– Indexes can be created on any field within any document or sub-document
– These include indexes on sub-documents, indexes on embedded fields and compound indexes
Backup and Restore
Backup and Restore
• Backups
– Backups of databases can be created by running the
mongodump application (present in the bin folder)
– The syntax is
mongodump --out path_to_store_backup
– It can also be restricted to back up a particular database or collection
mongodump --out path_to_store_backup --db
database_name --collection collection_name
– To restore a backup, run the mongorestore application
mongorestore --collection collection_name --db
database_name
path_to_the_backup\collection_name.bson
Backup and Restore – Contd.
• MongoDB export
– To export a collection from the server to a JSON or CSV file on the
local machine, the ‘mongoexport’ application can be used
• mongoexport --db database_name --collection collection_name
--out path_to_json\file_name.json
• mongoexport --db database_name --collection collection_name
--csv --out path_to_csv\file_name.csv --fields
field_name1,field_name2 (CSV export requires --fields)
Backup and Restore – Contd.
• MongoDB import
– To import data from a JSON or CSV file into a collection, the
‘mongoimport’ application can be used
• mongoimport --db dest_database_name --collection
dest_collection_name path_to_input_json
• mongoimport --type csv --db dest_database_name --collection
dest_collection_name path_to_input_csv --fields
new_field_name1,new_field_name2
Summary
• Querying Mongo
• Aggregation
• Indexing
• Backup and Restore
• Import and Export Data
• Mongo tools
Thank You