Upload
anuj-jain
View
349
Download
1
Embed Size (px)
Citation preview
Aggregation Framework in MongoDB
Overview - Part-1
What is Aggregation
English Definition:- The act of gathering something together.
Database Definition:- Aggregation are operations that process on the data sets, group some data records, do some computation on that records and return computed results.
Aggregration Approach in MongoDB
● Aggregation Pipeline
● Map-Reduce
● Single Purpose Aggregation Operations
● Hadoop Connector
Aggregation-Pipeline
Similar to pipeline in UNIX.
In Unix -
Cat file.txt | grep abc | wc -l
In MongoDB -
Group Limit SortCollection Output
MapReduce
● Two Phase process – Mapper & Reducer.● Use JavaScript function for map-reduce
job.● Less efficient as it run over only one thread.● Finalise stage to make final modification.
Aggregarion Vs. MapReduce
Easy to use, just need to use the buildin operators.
Complex and steep learning curve.
Supports non-sharded and sharded input collections.
Supports non-sharded and sharded input collections.
Returns results inline. Return result in inline, new collection, merge, replace, reduce.
Limited to the operators and expressions supported by the aggregation pipeline.
Custom map, reduce and finalize JavaScript functions offer flexibility to aggregation logic.
Single Purpose Aggregation Operation
● Applicable for single purpose of aggregation, like count, distinct, group.
● Limited scope as compared to aggregation pipeline and map-reduce.
● group does not support data in sharded collections, and result of operation should not be more than 16 MB.
Aggregation Pipeline Operators
● $group● $match● $project● $limit● $sort● $unwind● $skip
SQL to Aggregation Mapping Chart
SQL Operators Aggregation Operators
WHERE $match
GROUP BY $group
SELECT $project
LIMIT $limit
ORDER BY $sort
SUM() $sum
COUNT() $sum
JOIN $unwind (Not exact operator as JOIN works, but unwinds the array embedded in the document)
Examples
SQL Query Aggregation Query
SELECT COUNT(*) AS COUNT FROM EMPLOYEE
db.employee.aggregate([{$group: {_id:null, count: { $sum:1} } }])
Explaination : Group by : on nothing Sum : Just add 1 to count field for each record
SELECT SUM(SALARY) AS TOTALFROM EMPLOYEE
db.employee.aggregate([{$group: {_id:null, total: { $sum:”$salary”} } }])
Explaination : Group by : on nothing Sum: do the sum of value of salary field of each doc and put result in total.
Example Cont.
SELECT DEPARTMENT_ID, SUM(SALARY) AS TOTAL FROM EMPLOYEE GROUP BY DEPARTMENT_ID ORDER BY TOTAL
db.employee.aggregate( [{ $group: { _id: “$department_id”, total: { $sum:”$salary” } }, { $sort: { total : 1 } } }] )
Explanation:Group by : department,Sum : salary,Order by : total of salary for each department
Example Cont.
SELECT DEPT_ID, SUM(SALARY) AS TOTAL FROM EMPLOYEE WHERE AGE>25 GROUB BY DEPT_ID, HAVING TOTAL > 5000
db.employee.aggregate( [{ $match : { age : {$gt: 25 } } }{ $group: { _id: “$dept_id”, total: { $sum: ”$salary” } } },{ $match: { total: {$gt:5000 } } }] )
Explanation:Group by : department id,Sum : salary,Having: on total of each department salary
Example Cont.
SELECT DEPT_ID, SUM(SALARY) AS TOTAL FROM EMPLOYEE WHERE AGE>25 GROUB BY DEPT_ID, HAVING TOTAL > 5000
db.employee.aggregate( [{ $match : { age : {$gt: 25 } } }{ $group: { _id: “$dept_id”, total: { $sum: ”$salary” } } },{ $match: { total: {$gt:5000 } } }] )
Explanation:Group by : department id,Sum : salary,Having: on total of each department salary
$unwind Operator
Decompose the embedded array into flat document and relate each entry in the array with outer fields.
Example :-
{ _id: “blog”, tags: [ “social”, “economic” ] }
{ _id: “blog”, { _id: “blog”
tags: “social” } tags: “economic” }
Mongo Aggregation Optimization
MongoDB re-arranged the pipeline operations to optimize the aggregation performance.
● Pipeline Sequence Optimization.● Projection Optimization.
Mongo Aggregation Optimization
MongoDB re-arranged the pipeline operations to optimize the aggregation performance.
● Pipeline Sequence Optimization.● Projection Optimization.
Pipeline Sequence Optimization
● $sort + $skip + $limit
{ $sort: { salary : -1 } }, { $sort: { salary: -1 } },
{ $skip: 10 }, { $limit: 20 },
{ $limit: 10 } { $skip: 10}
Projection Optimization
● $project● Reduce the amount of data passing
through channels of operation and will help in performace improvement.
● Below example will only emit salary.
db.employee.aggregate(
[ {$match: {“name”: “xyz” } }
{ $project: { salary:1, _id:0} } ]
)
Restriction
● Output BSON document cannot exceed the 16 MB of data. If exceed will throw error.
● If single aggregation operation consumes more than 10 percent of system RAM, the operation will produce error.
Aggregation Examples
● Download data from the below link:http://media.mongdb.org/zips.json
● Practice sample problem from the below
links:http://docs.mongodb.org/manual/tutorial/aggregatio-zip-code-data-set/
References
● http://docs.mongodb.org/
Good Examples:-
● http://rubayeet.wordpress.com/2013/12/29/web-analytics -using-mongodb-aggregation-framework/
● http://derickrethans.nl/aggregation-framework.html
● http://architects.dzone.com/articles/using-mongodb -aggregation