Aggregation and Indexing
Aggregation
Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.
Like queries, aggregation operations in MongoDB use collections of documents as input and return results in the form of one or more documents.
MongoDB provides three ways to perform aggregation:
– Aggregation pipeline
– Map-reduce function
– Single purpose aggregation methods and commands
1. Aggregation Pipeline
The aggregation pipeline is a framework for performing aggregation tasks, modeled on the concept of data processing pipelines. Using this framework, MongoDB passes the documents of a single collection through a pipeline. The pipeline transforms the documents into aggregated results, and is accessed through the aggregate database command.
1. Aggregation Pipeline Contd…
Pipeline: The MongoDB aggregation pipeline starts with the documents of a collection and streams the documents from one pipeline operator to the next to process them. Each operator in the pipeline transforms the documents as they pass through the pipeline. Pipeline operators do not need to produce one output document for every input document: operators may generate new documents or filter out documents, and pipeline operators can be repeated in the pipeline. The db.collection.aggregate() method returns a cursor and can return result sets of any size.
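As a rough illustration of this stream-of-stages idea, the plain-JavaScript sketch below simulates a two-stage pipeline ($match followed by $group) over an in-memory array. The orders documents and field names are made up for this example.

```javascript
// Hypothetical sample documents standing in for a MongoDB collection.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// Stage 1 ($match): keep only documents with status "A".
const matched = orders.filter((doc) => doc.status === "A");

// Stage 2 ($group): group by cust_id and sum amount, like
// { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }.
const groups = {};
for (const doc of matched) {
  groups[doc.cust_id] = (groups[doc.cust_id] || 0) + doc.amount;
}
const results = Object.entries(groups).map(([id, total]) => ({ _id: id, total }));

console.log(results); // → [ { _id: 'A123', total: 750 }, { _id: 'B212', total: 200 } ]
```

Against a real server the same aggregation would be expressed as db.orders.aggregate([ { $match: { status: "A" } }, { $group: { _id: "$cust_id", total: { $sum: "$amount" } } } ]).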
1. Aggregation Pipeline Contd…
Pipeline Expressions: Each pipeline operator takes a pipeline expression as its operand. Pipeline expressions specify the transformation to apply to the input documents. Expressions have a document structure and can contain fields, values, and operators. Pipeline expressions can only operate on the current document in the pipeline and cannot refer to data from other documents: expression operations provide in-memory transformation of documents.
1. Aggregation Pipeline Contd…
Pipeline Operators: Pipeline operators appear in an array, and documents pass through the operators in sequence.
1. Aggregation Pipeline Contd…
Expression Operators
Limitations of Aggregation Pipeline
Type Restrictions: The aggregation pipeline cannot operate on values of the Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope types.
Result Size Restrictions: The aggregate command will produce an error if the result set exceeds the BSON Document Size limit, which is currently 16 megabytes. To avoid this, the aggregate command can return results as a cursor or store the results in a collection, neither of which is subject to the size limit. The db.collection.aggregate() method returns a cursor and can return result sets of any size.
Memory Restrictions: Pipeline stages have a limit of 100 megabytes of RAM. If a stage exceeds this limit, MongoDB will produce an error. To allow for the handling of large datasets, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.
2. Map Reduce
Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. For map-reduce operations, MongoDB provides the mapReduce database command. All map-reduce functions in MongoDB are JavaScript and run within the mongod process. Map-reduce operations take the documents of a single collection as input and can perform any arbitrary sorting and limiting before beginning the map stage.
2. Map Reduce Contd…
In this map-reduce operation, MongoDB applies the map phase to each input document (i.e. the documents in the collection that match the query condition). The map function emits key-value pairs.
For those keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the aggregated data.
MongoDB then stores the results in a collection. Optionally, the output of the reduce function may pass through a finalize function to further condense or process the results of the aggregation.
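As a rough sketch of the phases just described (not MongoDB's actual server-side mapReduce command), the map, reduce, and finalize steps can be simulated in plain JavaScript; the input documents below are hypothetical.

```javascript
// Hypothetical input documents (e.g. an "orders" collection).
const docs = [
  { cust_id: "A123", price: 25 },
  { cust_id: "A123", price: 50 },
  { cust_id: "B212", price: 30 },
];

// Map phase: emit a key-value pair per document.
const emitted = docs.map((doc) => [doc.cust_id, doc.price]);

// Group emitted values by key.
const byKey = new Map();
for (const [key, value] of emitted) {
  if (!byKey.has(key)) byKey.set(key, []);
  byKey.get(key).push(value);
}

// Reduce phase: condense each key's values to a single result.
const reduceFn = (key, values) => values.reduce((a, b) => a + b, 0);

// Optional finalize phase: post-process each reduced value.
const finalizeFn = (key, reduced) => ({ _id: key, total: reduced });

const results = [];
for (const [key, values] of byKey) {
  results.push(finalizeFn(key, reduceFn(key, values)));
}
console.log(results); // → [ { _id: 'A123', total: 75 }, { _id: 'B212', total: 30 } ]
```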
2. Map Reduce Contd…
Map-Reduce JavaScript Functions: In MongoDB, map-reduce operations use custom JavaScript functions to map, or associate, values to a key. If a key has multiple values mapped to it, the operation reduces the values for the key to a single object. Custom JavaScript functions provide flexibility to map-reduce operations: for example, they can make final modifications to the results at the end of the map and reduce operation, such as performing additional calculations.
2. Map Reduce Contd…
Map Reduce Concurrency: The map-reduce operation is composed of many tasks, including reads from the input collection, executions of the map function, executions of the reduce function, writes to a temporary collection during processing, and writes to the output collection.
During the operation, map-reduce takes the following locks:
– The read phase takes a read lock. It yields every 100 documents.
– The insert into the temporary collection takes a write lock for a single write.
– If the output collection does not exist, the creation of the output collection takes a write lock.
– If the output collection exists, then the output actions (i.e. merge, replace, reduce) take a write lock. This write lock is global, and blocks all operations on the mongod instance.
3. Single Purpose Aggregation Operations
For a number of common single purpose aggregation operations, MongoDB provides special purpose database commands. These common aggregation operations are: returning a count of matching documents, returning the distinct values for a field, and grouping data based on the values of a field. All of these operations aggregate documents from a single collection. While these operations provide simple access to common aggregation processes, they lack the flexibility and capabilities of the aggregation pipeline and map-reduce.
3. Single Purpose Aggregation Operations: Count
MongoDB can return a count of the number of documents that match a query. The count command, as well as the count() and cursor.count() methods, provide access to counts in the mongo shell.
3. Single Purpose Aggregation Operations: Example of Count
Given a collection named records with only the following documents:
{ a: 1, b: 0 }
{ a: 1, b: 1 }
{ a: 1, b: 4 }
{ a: 2, b: 2 }
The operation db.records.count() counts all documents in the collection and returns the number 4.
The operation db.records.count( { a: 1 } ) counts only the documents where the value of the field a is 1 and returns 3.
3. Single Purpose Aggregation Operations: Distinct
The distinct operation takes a number of documents that match a query and returns all of the unique values for a field in the matching documents. The distinct command and db.collection.distinct() method provide this operation in the mongo shell.
3. Single Purpose Aggregation Operations: Example of Distinct
Given a collection named records with only the following documents:
{ a: 1, b: 0 }
{ a: 1, b: 1 }
{ a: 1, b: 1 }
{ a: 1, b: 4 }
{ a: 2, b: 2 }
{ a: 2, b: 2 }
The following operation returns the distinct values of the field b:
db.records.distinct( "b" )
The results of this operation would resemble: [ 0, 1, 4, 2 ]
3. Single Purpose Aggregation Operations: Group
The group operation takes a number of documents that match a query, and then collects groups of documents based on the value of a field or fields. It returns an array of documents with computed results for each group of documents. Access the grouping functionality via the group command or the db.collection.group() method in the mongo shell. Group does not support data in sharded collections. In addition, the results of the group operation must be no larger than 16 megabytes.
3. Single Purpose Aggregation Operations: Example of Group
Given a collection named records with the following documents:
{ a: 1, count: 4 }
{ a: 1, count: 2 }
{ a: 1, count: 4 }
{ a: 2, count: 3 }
{ a: 2, count: 1 }
{ a: 1, count: 5 }
{ a: 4, count: 4 }
The following group operation groups documents by the field a, where a is less than 3, and sums the field count for each group:
db.records.group( {
  key: { a: 1 },
  cond: { a: { $lt: 3 } },
  reduce: function(cur, result) { result.count += cur.count },
  initial: { count: 0 }
} )
The results of this group operation would resemble:
[ { a: 1, count: 15 }, { a: 2, count: 4 } ]
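The key/cond/reduce/initial logic of this group call can be checked with a small standalone simulation in plain JavaScript (not the shell command itself):

```javascript
// The sample "records" documents from the example above.
const records = [
  { a: 1, count: 4 }, { a: 1, count: 2 }, { a: 1, count: 4 },
  { a: 2, count: 3 }, { a: 2, count: 1 }, { a: 1, count: 5 },
  { a: 4, count: 4 },
];

// cond: { a: { $lt: 3 } } — only documents with a < 3 participate.
const matched = records.filter((doc) => doc.a < 3);

// key: { a: 1 }, initial: { count: 0 }, reduce: sum the count field.
const groups = new Map();
for (const doc of matched) {
  if (!groups.has(doc.a)) groups.set(doc.a, { a: doc.a, count: 0 }); // initial
  groups.get(doc.a).count += doc.count; // reduce
}

console.log([...groups.values()]); // → [ { a: 1, count: 15 }, { a: 2, count: 4 } ]
```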
Indexing
Indexes provide high performance read operations for frequently used queries and support the efficient execution of queries in MongoDB. Without indexes, MongoDB must scan every document in a collection to select those documents that match the query statement. These collection scans are inefficient because they require mongod to process a larger volume of data than an index scan would for each operation. MongoDB defines indexes at the collection level and supports indexes on any field or sub-field of the documents in a MongoDB collection.
Indexing Contd…
All MongoDB collections have an index on the _id field that exists by default. If applications do not specify a value for _id, the driver or the mongod will create an _id field with an ObjectId value. The _id index is unique, and prevents clients from inserting two documents with the same value for the _id field.
Type of Indexing
1. Single Field Indexes
2. Compound Indexes
3. Multikey Indexes
4. Geospatial Indexes
5. Text Indexes
6. Hashed Indexes
1. Single Field Indexes
A single field index only includes data from a single field of the documents in a collection.
Example: Collection: user
{
  "_id": ObjectId(...),
  "name": "John Doe",
  "address": { "street": "Main", "zipcode": "53511", "state": "WI" }
}
A. Index on a Single Field: db.collection_name.ensureIndex( { "field" : 1 } )
B. Index on an Embedded Field: db.user.ensureIndex( { "address.zipcode": 1 } )
2. Compound Indexes
A compound index includes more than one field of the documents in a collection. A single index structure holds references to multiple fields within a collection’s documents. The order of the fields in a compound index is very important.
Example:
{
  "_id": ObjectId(...),
  "item": "Banana",
  "category": ["food", "produce", "grocery"],
  "location": "4th Street Store",
  "stock": 4,
  "type": "cases"
}
A compound index on item and stock is:
db.products.ensureIndex( { "item": 1, "stock": 1 } )
2. Compound Indexes Contd…
Sort Order: Indexes store references to fields in either ascending (1) or descending (-1) sort order. For single-field indexes, the sort order of keys doesn’t matter because MongoDB can traverse the index in either direction. However, for compound indexes, sort order can matter in determining whether the index can support a sort operation. Consider the following two sort operations:
db.events.find().sort( { username: 1, date: -1 } )
db.events.find().sort( { username: -1, date: 1 } )
The following index can support both of these sort operations:
db.events.ensureIndex( { "username" : 1, "date" : -1 } )
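The reason one index covers both sorts is that an index can be walked forward or backward, and reversing every key direction yields exactly the reversed order. A small sketch with hypothetical event documents:

```javascript
// Hypothetical events; a { username: 1, date: -1 } index stores keys in this order.
const events = [
  { username: "alice", date: 20240105 },
  { username: "alice", date: 20240101 },
  { username: "bob", date: 20240103 },
];

// Comparator matching sort( { username: 1, date: -1 } ).
const cmp = (x, y) =>
  x.username < y.username ? -1 : x.username > y.username ? 1 : y.date - x.date;

const forward = [...events].sort(cmp);
// sort( { username: -1, date: 1 } ) is the same key order walked backward.
const backward = [...events].sort((x, y) => -cmp(x, y));

// Reversing the backward traversal reproduces the forward traversal exactly.
console.log(backward.slice().reverse().every((e, i) => e === forward[i])); // → true
```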
3. Multikey Indexes
MongoDB uses multikey indexes to index the content stored in arrays. If you index a field that holds an array value, MongoDB creates separate index entries for every element of the array. These multikey indexes allow queries to select documents that contain arrays by matching on an element or elements of the arrays. MongoDB automatically determines whether to create a multikey index if the indexed field contains an array value; you do not need to explicitly specify the multikey type.
db.user.ensureIndex( { "subject.dmsa": 1 } )
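The entry-per-element behavior can be sketched as follows; buildMultikeyEntries is a hypothetical helper for illustration, not a MongoDB API, and the documents are made up:

```javascript
// Hypothetical documents where "tags" holds an array value.
const docs = [
  { _id: 1, tags: ["food", "produce"] },
  { _id: 2, tags: ["food"] },
];

// A multikey index creates one index entry per array element:
// each entry maps an element value back to the owning document's _id.
function buildMultikeyEntries(documents, field) {
  const entries = [];
  for (const doc of documents) {
    const value = doc[field];
    for (const element of Array.isArray(value) ? value : [value]) {
      entries.push({ key: element, docId: doc._id });
    }
  }
  return entries;
}

const index = buildMultikeyEntries(docs, "tags");
console.log(index.length); // → 3 (two entries for doc 1, one for doc 2)
// A query on a single element, e.g. { tags: "produce" }, matches via the index:
console.log(index.filter((e) => e.key === "produce").map((e) => e.docId)); // → [ 1 ]
```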
4. Text Indexes
Text indexes support search of string content in documents. Text indexes can include any field whose value is a string or an array of string elements. To perform queries that access the text index, use the $text query operator. To create a text index, use the db.collection.ensureIndex() method: to index a field that contains a string or an array of string elements, include the field and specify the string literal "text" in the index document.
db.reviews.ensureIndex( { comments: "text" } )
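Conceptually, a text index tokenizes each string field into terms and maps each term to the documents containing it, roughly like the toy inverted index below (a simplified sketch with made-up documents, not MongoDB's actual implementation):

```javascript
// Hypothetical review documents with a string "comments" field.
const reviews = [
  { _id: 1, comments: "great coffee and fast delivery" },
  { _id: 2, comments: "coffee was cold" },
];

// Build a toy inverted index: lowercase term -> set of document _ids.
const textIndex = new Map();
for (const doc of reviews) {
  for (const term of doc.comments.toLowerCase().split(/\s+/)) {
    if (!textIndex.has(term)) textIndex.set(term, new Set());
    textIndex.get(term).add(doc._id);
  }
}

// A $text-style search looks up terms instead of scanning every document.
console.log([...textIndex.get("coffee")]); // → [ 1, 2 ]
console.log([...textIndex.get("cold")]); // → [ 2 ]
```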
Index Properties
TTL (Time To Live) Indexes: TTL indexes are special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time; they support TTL collections, which expire data after a period of time. This is ideal for some types of information, such as machine generated event data, logs, and session information, that only needs to persist in a database for a limited amount of time.
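In the shell a TTL index is created with the expireAfterSeconds option, e.g. db.log_events.ensureIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } ). The expiry rule itself can be sketched in plain JavaScript; the documents and timestamps below are hypothetical.

```javascript
// Hypothetical log documents with a createdAt timestamp (milliseconds).
const now = Date.parse("2024-01-01T12:00:00Z");
const logs = [
  { _id: 1, createdAt: now - 2 * 3600 * 1000 }, // 2 hours old
  { _id: 2, createdAt: now - 30 * 60 * 1000 },  // 30 minutes old
];

// The TTL monitor removes documents whose indexed field is older than
// expireAfterSeconds; here expireAfterSeconds = 3600 (one hour).
const expireAfterSeconds = 3600;
const surviving = logs.filter(
  (doc) => now - doc.createdAt < expireAfterSeconds * 1000
);

console.log(surviving.map((d) => d._id)); // → [ 2 ]
```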
Index Properties Contd…
Unique Indexes: A unique index causes MongoDB to reject all documents that contain a duplicate value for the indexed field. To create a unique index, use the db.collection.ensureIndex() method with the unique option set to true. By default, unique is false on MongoDB indexes.
db.members.ensureIndex( { "user_id": 1 }, { unique: true } )
Drop Duplicates: To force MongoDB to create a unique index by deleting documents with duplicate values when building the index, set the dropDups option:
db.collection.ensureIndex( { a: 1 }, { unique: true, dropDups: true } )
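The unique and dropDups behaviors can be sketched with an in-memory index build; buildUniqueIndex is a hypothetical helper and the member documents are made up for this example.

```javascript
// Hypothetical documents with duplicate user_id values.
const members = [
  { _id: 1, user_id: "abc" },
  { _id: 2, user_id: "xyz" },
  { _id: 3, user_id: "abc" }, // duplicate
];

// Build a unique index on a field; dropDups controls duplicate handling.
function buildUniqueIndex(docs, field, dropDups) {
  const index = new Map();
  for (const doc of docs) {
    if (index.has(doc[field])) {
      if (dropDups) continue; // silently drop the duplicate document
      throw new Error(`duplicate key: ${doc[field]}`); // reject the build
    }
    index.set(doc[field], doc._id);
  }
  return index;
}

// Without dropDups the build fails; with dropDups the duplicate is discarded.
let failed = false;
try { buildUniqueIndex(members, "user_id", false); } catch { failed = true; }
console.log(failed); // → true
console.log(buildUniqueIndex(members, "user_id", true).size); // → 2
```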
Index Properties Contd…
Sparse Indexes: A sparse index does not index documents that lack the indexed field; i.e. sparse indexes only contain entries for documents that have the indexed field, even if the indexed field contains a null value. The index is “sparse” because it does not include all documents of a collection. To create a sparse index, use the db.collection.ensureIndex() method with the sparse option set to true.
db.addresses.ensureIndex( { "user_id": 1 }, { sparse: true } )
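The sparse rule, index only documents that have the field even when its value is null, can be checked with a small sketch over made-up address documents:

```javascript
// Hypothetical address documents; doc 3 lacks user_id entirely.
const addresses = [
  { _id: 1, user_id: "abc" },
  { _id: 2, user_id: null }, // field present with null value: still indexed
  { _id: 3, city: "Springfield" }, // field missing: skipped by a sparse index
];

// A sparse index only holds entries for documents where the field exists.
const sparseEntries = addresses
  .filter((doc) => "user_id" in doc)
  .map((doc) => ({ key: doc.user_id, docId: doc._id }));

console.log(sparseEntries.map((e) => e.docId)); // → [ 1, 2 ]
```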
Remove Index
To remove an index from a collection, use the dropIndex() method.
Remove a Specific Index:
db.accounts.dropIndex( { "user_id": 1 } )
Remove All Indexes: You can also use the db.collection.dropIndexes() method to remove all indexes, except for the _id index, from a collection.
Rebuild Indexes
If you need to rebuild the indexes for a collection, you can use the db.collection.reIndex() method to rebuild all indexes on the collection in a single operation. This operation drops all indexes, including the _id index, and then rebuilds all indexes.
Thank You