Exploring the Aggregation Framework
Jason Mimick - Senior Consulting Engineer, @jmimick
Original Slide Credits: Jay Runkel et al

Webinar: Exploring the Aggregation Framework


Page 1: Webinar: Exploring the Aggregation Framework

Exploring the Aggregation Framework

Jason Mimick - Senior Consulting Engineer
@jmimick

Original Slide Credits: Jay Runkel et al

Page 2: Webinar: Exploring the Aggregation Framework

Warning or Whew
This is a “101” beginner talk!
Assuming you know some basics about MongoDB
But basically nothing about the Aggregation Framework

Page 3: Webinar: Exploring the Aggregation Framework

Agenda

1. Analytics in MongoDB?
2. Aggregation Framework
3. Aggregation Framework in Action
    – US Census Data
    – Aggregation Framework Options
4. New 3.2 stuff
    – Friends of friends: $lookup for self-joins

Page 4: Webinar: Exploring the Aggregation Framework

Analytics in MongoDB?

Create, Read, Update, Delete

Analytics?

Group, Count, Derive Values, Filter, Average, Sort

Page 5: Webinar: Exploring the Aggregation Framework

For Example: US Census Data
• Census data from 1990, 2000, 2010
• Question: Which US Division has the fastest growing population density?
    – We only want to include data from states with more than 1M people
    – We only want to include divisions larger than 100K square miles

Division = a group of US States
Population density = # of people / area of division
Data is provided at the state level

Page 6: Webinar: Exploring the Aggregation Framework


US Regions and Divisions

Page 7: Webinar: Exploring the Aggregation Framework

How would we solve this in SQL?

SELECT, GROUP BY, HAVING

Of course, we don’t have SQL; we’re a NoSQL database

Page 8: Webinar: Exploring the Aggregation Framework


The Aggregation Framework

Page 9: Webinar: Exploring the Aggregation Framework


Core Concept: Pipeline

ps -ef | grep mongod

Page 10: Webinar: Exploring the Aggregation Framework

What is the Aggregation Pipeline?

A Series of Document Transformations
– Executed in stages
– Original input is a collection
– Output as a cursor or a collection

Rich Library of Functions
– Filter, compute, group, and summarize data
– Output of one stage sent to input of next
– Operations executed in sequential order
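As a rough mental model only, a pipeline behaves like a list of functions, each receiving the array of documents produced by the previous one. The sketch below is plain JavaScript, not how MongoDB executes a pipeline; the helper names `match`, `sortBy`, and `runPipeline` are hypothetical, and the sample documents reuse toy values that appear elsewhere in this deck.

```javascript
// Plain-JavaScript mental model of a pipeline; not MongoDB's implementation.
// Each stage is a function from an array of documents to a new array.
const match = (predicate) => (docs) => docs.filter(predicate);
const sortBy = (field, direction) => (docs) =>
  [...docs].sort((a, b) => direction * (a[field] - b[field]));

// Feed the output of each stage into the next, in order.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const states = [
  { state: "New York", areaM: 218, region: "North East" },
  { state: "Oregon", areaM: 245, region: "West" },
  { state: "California", areaM: 300, region: "West" },
];

// Filter to one region, then order by area, largest first.
const pipelineResult = runPipeline(states, [
  match((doc) => doc.region === "West"),
  sortBy("areaM", -1),
]);
console.log(pipelineResult.map((doc) => doc.state));
```

Because every stage has the same shape (documents in, documents out), stages can be reordered or added without touching the others, which is the property the pipeline bullets above describe.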

Page 11: Webinar: Exploring the Aggregation Framework


An Example Aggregation Pipeline

Page 12: Webinar: Exploring the Aggregation Framework

Syntax

> db.foo.aggregate( [ { stage1 }, { stage2 }, { stage3 }, … ] )    (mongo shell)

1  db - variable pointing to current database
2  foo - collection name
3  aggregate - method on collection
4  [ … ] - array of objects, each a pipeline operator
5  { stage1 }, { stage2 }, … - pipeline operators

Page 13: Webinar: Exploring the Aggregation Framework

Syntax - Driver - Java

db.hospital.aggregate( [
    { "$group" : { "_id" : "$PatientID", "count" : { "$sum" : 1 } } },
    { "$match" : { "count" : { "$gte" : 5 } } },
    { "$sort" : { "count" : -1 } }
] )

Page 14: Webinar: Exploring the Aggregation Framework

Some Popular Pipeline Operators

$match          Filter documents
$project        Reshape documents
$group          Summarize documents
$unwind         Expand arrays in documents
$sort           Order documents
$limit/$skip    Paginate documents
$redact         Restrict documents
$geoNear        Proximity sort documents
$let, $map      Define variables

Page 15: Webinar: Exploring the Aggregation Framework


80+ operators available as of MongoDB 3.2

Page 16: Webinar: Exploring the Aggregation Framework

Aggregation Framework in Action
(let’s play with the census data)

Page 17: Webinar: Exploring the Aggregation Framework

cData Collection
• Document For Each State
    – Name
    – Region
    – Division
• Census Data For 1990, 2000, 2010
    – Population
    – Housing Units
    – Occupied Housing Units
• Census Data is an array with three subdocuments

Page 18: Webinar: Exploring the Aggregation Framework

Count, Distinct

• Check out cData docs
• count()
• distinct()

When you start building your aggregations you need to ‘get to know’ your data!

Page 19: Webinar: Exploring the Aggregation Framework

Simple $group

Census data has a collection called regions

> db.regions.findOne()
{
    "_id" : ObjectId("54d0e1ac28099359f5660f9f"),
    "state" : "Connecticut",
    "region" : "Northeast",
    "regNum" : 1,
    "division" : "New England",
    "divNum" : 1
}

How can we find out how many states are in each region?

Page 20: Webinar: Exploring the Aggregation Framework

> db.regions.aggregate( [
    { "$group" : { "_id" : "$region", "count" : { "$sum" : 1 } } }
] )

{ "_id" : "West", "count" : 13 }
{ "_id" : "South", "count" : 17 }
{ "_id" : "Midwest", "count" : 12 }
{ "_id" : "Northeast", "count" : 9 }

// make more readable - store your pipeline ops in variables
> var group = { "$group" : { "_id" : "$region", "count" : { "$sum" : 1 } } };
> db.regions.aggregate( [ group ] )

Page 21: Webinar: Exploring the Aggregation Framework

$group
• Group documents by value
    – _id - field reference, object, constant
    – Other output fields are computed
        • $max, $min, $avg, $sum
        • $addToSet, $push
        • $first, $last
    – Processes all data in memory by default

Page 22: Webinar: Exploring the Aggregation Framework


Total US Area

Back to cData…

Can we use $group to find the total area of the US (according to these data)?

Page 23: Webinar: Exploring the Aggregation Framework

db.cData.aggregate([
    {"$group" : {"_id" : null,
                 "totalArea" : {$sum : "$areaM"},
                 "avgArea" : {$avg : "$areaM"} }}
])

{ "_id" : null, "totalArea" : 3802067.0700000003, "avgArea" : 73116.67442307693 }

Page 24: Webinar: Exploring the Aggregation Framework

Area By Region

db.cData.aggregate([
    {"$group" : {"_id" : "$region",
                 "totalArea" : {$sum : "$areaM"},
                 "avgArea" : {$avg : "$areaM"},
                 "numStates" : {$sum : 1},
                 "states" : {$push : "$name"}}}
])

{ "_id" : null, "totalArea" : 5393.18, "avgArea" : 2696.59, "numStates" : 2, "states" : [ "District of Columbia", "Puerto Rico" ] }
{ "_id" : "Northeast", "totalArea" : 181319.86, "avgArea" : 20146.65111111111, "numStates" : 9, "states" : [ "New Jersey", "Vermont", "Maine", "New Hampshire", "Rhode Island", "Pennsylvania", "Connecticut", "Massachusetts", "New York" ] }
{ "_id" : "Midwest", "totalArea" : 821724.3700000001, "avgArea" : 68477.03083333334, "numStates" : 12, "states" : [ "Iowa", "Missouri", "Ohio", "Indiana", "North Dakota", "Wisconsin", "Illinois", "Minnesota", "Kansas", "South Dakota", "Michigan", "Nebraska" ] }
{ "_id" : "West", "totalArea" : 1873251.6300000001, "avgArea" : 144096.27923076923, "numStates" : 13, "states" : [ "Colorado", "Wyoming", "California", "Utah", "Nevada", "Alaska", "Hawaii", "Montana", "New Mexico", "Arizona", "Idaho", "Oregon", "Washington" ] }
{ "_id" : "South", "totalArea" : 920378.03, "avgArea" : 57523.626875, "numStates" : 16, "states" : [ "Alabama", "Georgia", "Maryland", "South Carolina", "Florida", "Mississippi", "Arkansas", "Louisiana", "North Carolina", "Texas", "West Virginia", "Oklahoma", "Virginia", "Delaware", "Kentucky", "Tennessee" ] }

Page 25: Webinar: Exploring the Aggregation Framework

Calculating Average State Area By Region

Input:
{ state: "New York", areaM: 218, region: "North East" }
{ state: "New Jersey", areaM: 90, region: "North East" }
{ state: "California", areaM: 300, region: "West" }

Stage:
{ $group: { _id: "$region", avgAreaM: { $avg: "$areaM" } } }

Output:
{ _id: "North East", avgAreaM: 154 }
{ _id: "West", avgAreaM: 300 }

Page 26: Webinar: Exploring the Aggregation Framework

Calculating Total Area and State Count

Input:
{ state: "New York", areaM: 218, region: "North East" }
{ state: "New Jersey", areaM: 90, region: "North East" }
{ state: "California", areaM: 300, region: "West" }

Stage:
{ $group: { _id: "$region", totArea: { $sum: "$areaM" }, sCount: { $sum: 1 } } }

Output:
{ _id: "North East", totArea: 308, sCount: 2 }
{ _id: "West", totArea: 300, sCount: 1 }
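To make the accumulators concrete, here is a minimal plain-JavaScript sketch of what the $group stages on these two slides compute. It is an illustration, not MongoDB's implementation: `groupByRegion` is a hypothetical helper, and it assumes every document stores its area in `areaM`.

```javascript
// Plain-JavaScript sketch of $group with $sum and $avg accumulators.
// groupByRegion is a hypothetical helper, not a MongoDB API.
function groupByRegion(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const acc = groups.get(doc.region) || { _id: doc.region, totArea: 0, sCount: 0 };
    acc.totArea += doc.areaM; // { $sum: "$areaM" }
    acc.sCount += 1;          // { $sum: 1 }
    groups.set(doc.region, acc);
  }
  // avgAreaM mirrors { $avg: "$areaM" }: the total divided by the count.
  return [...groups.values()].map((g) => ({ ...g, avgAreaM: g.totArea / g.sCount }));
}

const grouped = groupByRegion([
  { state: "New York", areaM: 218, region: "North East" },
  { state: "New Jersey", areaM: 90, region: "North East" },
  { state: "California", areaM: 300, region: "West" },
]);
console.log(grouped);
```

The sketch emits one output document per distinct `region` value, which is the role `_id` plays in the real stage.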

Page 27: Webinar: Exploring the Aggregation Framework

Total US Population By Year

db.cData.aggregate( [
    {$unwind : "$data"},
    {$group : {"_id" : "$data.year", "totalPop" : {$sum : "$data.totalPop"}}},
    {$sort : {"totalPop" : 1}}
])

{ "_id" : 1990, "totalPop" : 248709873 }
{ "_id" : 2000, "totalPop" : 281421906 }
{ "_id" : 2010, "totalPop" : 312471327 }

Page 28: Webinar: Exploring the Aggregation Framework

$unwind
• Flattens arrays
• Create documents from array elements
    • Array replaced by element value
    • Missing/empty fields → no output
    • Non-array fields → error (before 3.2; from 3.2 treated as a single-element array)
• Pipe to $group to aggregate

{ "a" : "foo", "b" : [1, 2, 3] }

{ "a" : "foo", "b" : 1 }
{ "a" : "foo", "b" : 2 }
{ "a" : "foo", "b" : 3 }
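The rules above can be sketched in a few lines of plain JavaScript. This is only an illustration of the behaviour listed on the slide: `unwind` is a hypothetical helper, and for non-array values it follows the slide's "error" rule.

```javascript
// Plain-JavaScript sketch of $unwind; unwind is a hypothetical helper.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    // Missing or null fields produce no output documents.
    if (value === undefined || value === null) return [];
    if (!Array.isArray(value)) {
      // Assumption: follow the slide's "error" rule for non-array values.
      throw new TypeError(`field "${field}" is not an array`);
    }
    // One copy of the document per element, with the array replaced by it.
    // An empty array maps to nothing, so it also produces no output.
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

const unwound = unwind([{ a: "foo", b: [1, 2, 3] }], "b");
console.log(unwound);
```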

Page 29: Webinar: Exploring the Aggregation Framework

$unwind

Input:
{ state: "New York", census: [1990, 2000, 2010] }
{ state: "New Jersey", census: [1990, 2000] }
{ state: "California", census: [1980, 1990, 2000, 2010] }
{ state: "Delaware", census: [1990, 2000] }

Stage:
{ $unwind: "$census" }

Output (first documents shown):
{ state: "New York", census: 1990 }
{ state: "New York", census: 2000 }
{ state: "New York", census: 2010 }
{ state: "New Jersey", census: 1990 }
{ state: "New Jersey", census: 2000 }

Page 30: Webinar: Exploring the Aggregation Framework

Southern State Population By Year

db.cData.aggregate( [
    {$match : {"region" : "South"}},
    {$unwind : "$data"},
    {$group : {"_id" : "$data.year", "totalPop" : {"$sum" : "$data.totalPop"}}}
])

{ "_id" : 2010, "totalPop" : 113954021 }
{ "_id" : 2000, "totalPop" : 99664761 }
{ "_id" : 1990, "totalPop" : 84839030 }

Page 31: Webinar: Exploring the Aggregation Framework

$match
• Filter documents
    – Uses existing query syntax

Page 32: Webinar: Exploring the Aggregation Framework

$match

Input:
{ state: "New York", areaM: 218, region: "North East" }
{ state: "Oregon", areaM: 245, region: "West" }
{ state: "California", areaM: 300, region: "West" }

Stage:
{ $match: { "region" : "West" } }

Output:
{ state: "Oregon", areaM: 245, region: "West" }
{ state: "California", areaM: 300, region: "West" }

Page 33: Webinar: Exploring the Aggregation Framework

Population Delta By State from 1990 to 2010

db.cData.aggregate([
    {$unwind : "$data"},
    {$sort : {"data.year" : 1}},
    {$group : {"_id" : "$name",
               "pop1990" : {"$first" : "$data.totalPop"},
               "pop2010" : {"$last" : "$data.totalPop"}}},
    {$project : {"_id" : 0,
                 "name" : "$_id",
                 "delta" : {"$subtract" : ["$pop2010", "$pop1990"]},
                 "pop1990" : 1,
                 "pop2010" : 1}}
])

Page 34: Webinar: Exploring the Aggregation Framework

{ "pop1990" : 3725789, "pop2010" : 3725789, "name" : "Puerto Rico", "delta" : 0 }
{ "pop1990" : 4866692, "pop2010" : 6724540, "name" : "Washington", "delta" : 1857848 }
{ "pop1990" : 4877185, "pop2010" : 6346105, "name" : "Tennessee", "delta" : 1468920 }
{ "pop1990" : 1227928, "pop2010" : 1328361, "name" : "Maine", "delta" : 100433 }
{ "pop1990" : 1006749, "pop2010" : 1567582, "name" : "Idaho", "delta" : 560833 }
{ "pop1990" : 1108229, "pop2010" : 1360301, "name" : "Hawaii", "delta" : 252072 }
{ "pop1990" : 3665228, "pop2010" : 6392017, "name" : "Arizona", "delta" : 2726789 }
{ "pop1990" : 638800, "pop2010" : 672591, "name" : "North Dakota", "delta" : 33791 }
{ "pop1990" : 6187358, "pop2010" : 8001024, "name" : "Virginia", "delta" : 1813666 }
{ "pop1990" : 550043, "pop2010" : 710231, "name" : "Alaska", "delta" : 160188 }
{ "pop1990" : 1109252, "pop2010" : 1316470, "name" : "New Hampshire", "delta" : 207218 }
{ "pop1990" : 10847115, "pop2010" : 11536504, "name" : "Ohio", "delta" : 689389 }
{ "pop1990" : 6016425, "pop2010" : 6547629, "name" : "Massachusetts", "delta" : 531204 }
{ "pop1990" : 6628637, "pop2010" : 9535483, "name" : "North Carolina", "delta" : 2906846 }
{ "pop1990" : 3287116, "pop2010" : 3574097, "name" : "Connecticut", "delta" : 286981 }
{ "pop1990" : 17990455, "pop2010" : 19378102, "name" : "New York", "delta" : 1387647 }
{ "pop1990" : 29760021, "pop2010" : 37253956, "name" : "California", "delta" : 7493935 }
{ "pop1990" : 16986510, "pop2010" : 25145561, "name" : "Texas", "delta" : 8159051 }
{ "pop1990" : 11881643, "pop2010" : 12702379, "name" : "Pennsylvania", "delta" : 820736 }
{ "pop1990" : 2842321, "pop2010" : 3831074, "name" : "Oregon", "delta" : 988753 }

Page 35: Webinar: Exploring the Aggregation Framework

$sort, $limit, $skip
• Sort documents by one or more fields
    – Same order syntax as cursors
    – Waits for earlier pipeline operator to return
    – In-memory unless early and indexed
• Limit and skip follow cursor behavior

Page 36: Webinar: Exploring the Aggregation Framework

$first, $last
• Collection operations like $push and $addToSet
• Must be used in $group
• $first and $last determined by document order
• Typically used with $sort to ensure ordering is known
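The dependence on document order is easy to see in a small plain-JavaScript sketch. It is an illustration only: `firstAndLastPop` is a hypothetical helper, the documents are already flattened (top-level `year` and `totalPop` instead of `data.year` and `data.totalPop`), and only the 1990 and 2010 figures shown in this deck are used.

```javascript
// Plain-JavaScript sketch of $sort followed by $group with $first/$last.
// firstAndLastPop is a hypothetical helper, not a MongoDB API.
function firstAndLastPop(docs) {
  // Like {$sort: {"year": 1}}; copy first so the input order is untouched.
  const sorted = [...docs].sort((a, b) => a.year - b.year);
  const groups = new Map();
  for (const doc of sorted) {
    if (!groups.has(doc.name)) {
      // $first: value from the first document seen for this group.
      groups.set(doc.name, { _id: doc.name, pop1990: doc.totalPop, pop2010: doc.totalPop });
    }
    // $last: keep overwriting, so the final document in the group wins.
    groups.get(doc.name).pop2010 = doc.totalPop;
  }
  return [...groups.values()];
}

// Deliberately out of order to show why the sort matters.
const unwoundCensus = [
  { name: "Virginia", year: 2010, totalPop: 8001024 },
  { name: "Maine", year: 1990, totalPop: 1227928 },
  { name: "Virginia", year: 1990, totalPop: 6187358 },
  { name: "Maine", year: 2010, totalPop: 1328361 },
];
const firstLast = firstAndLastPop(unwoundCensus);
console.log(firstLast);
```

Without the sort, Virginia's "first" value would be its 2010 population, which is why the deck pairs these accumulators with $sort.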

Page 37: Webinar: Exploring the Aggregation Framework

$project
• Reshape/Transform Documents
    – Include, exclude or rename fields
    – Inject computed fields
    – Create sub-document fields

Page 38: Webinar: Exploring the Aggregation Framework

Including and Excluding Fields

Input:
{ "_id" : "Virginia", "pop1990" : 453588, "pop2010" : 3725789 }
{ "_id" : "South Dakota", "pop1990" : 453588, "pop2010" : 3725789 }

Stage:
{ $project: { "_id" : 0, "pop1990" : 1, "pop2010" : 1 } }

Output:
{ "pop1990" : 453588, "pop2010" : 3725789 }
{ "pop1990" : 453588, "pop2010" : 3725789 }

Page 39: Webinar: Exploring the Aggregation Framework

Renaming and Computing Fields

Input:
{ "_id" : "Virginia", "pop1990" : 6187358, "pop2010" : 8001024 }
{ "_id" : "South Dakota", "pop1990" : 696004, "pop2010" : 814180 }

Stage (fields not listed are dropped, so pop1990 and pop2010 need no explicit exclusion; only _id can be excluded alongside included or computed fields):
{ $project: { "_id" : 0, "name" : "$_id", "delta" : {"$subtract" : ["$pop2010", "$pop1990"]} } }

Output:
{ "name" : "Virginia", "delta" : 1813666 }
{ "name" : "South Dakota", "delta" : 118176 }
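The reshaping on this slide amounts to a per-document map. The plain-JavaScript sketch below illustrates that idea only; `projectDelta` is a hypothetical helper, not a MongoDB API.

```javascript
// Plain-JavaScript sketch of a rename-and-compute $project.
// projectDelta is a hypothetical helper, not a MongoDB API.
const projectDelta = (docs) =>
  docs.map((doc) => ({
    name: doc._id,                    // "name" : "$_id"
    delta: doc.pop2010 - doc.pop1990, // {"$subtract" : ["$pop2010", "$pop1990"]}
  }));

const projected = projectDelta([
  { _id: "Virginia", pop1990: 6187358, pop2010: 8001024 },
  { _id: "South Dakota", pop1990: 696004, pop2010: 814180 },
]);
console.log(projected);
```

Each output document contains only the fields the mapping names, so the original `_id`, `pop1990`, and `pop2010` do not appear.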


Page 41: Webinar: Exploring the Aggregation Framework

Compare number of people living within 500KM of Memphis, TN in 1990, 2000, 2010

db.cData.aggregate([
    {$geoNear : {
        "near" : {"type" : "Point", "coordinates" : [90, 35]},
        "distanceField" : "dist.calculated",
        "maxDistance" : 500000,
        "includeLocs" : "dist.location",
        "spherical" : true }},
    {$unwind : "$data"},
    {$group : {"_id" : "$data.year",
               "totalPop" : {"$sum" : "$data.totalPop"},
               "states" : {"$addToSet" : "$name"}}},
    {$sort : {"_id" : 1}}
])

Page 42: Webinar: Exploring the Aggregation Framework


{ "_id" : 1990, "totalPop" : 22644082, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }

{ "_id" : 2000, "totalPop" : 25291421, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }

{ "_id" : 2010, "totalPop" : 27337350, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }

Page 43: Webinar: Exploring the Aggregation Framework

$geoNear

• Order/Filter Documents by Location
    – Requires a geospatial index
    – Output includes physical distance
    – Must be first aggregation stage

Page 44: Webinar: Exploring the Aggregation Framework

$geoNear

Input:
{ "_id" : "Tennessee", "pop1990" : 4877185, "pop2010" : 6346105, "center" : {"type" : "Point", "coordinates" : [86.6, 37.8]} }
{ "_id" : "Virginia", "pop1990" : 6187358, "pop2010" : 8001024, "center" : {"type" : "Point", "coordinates" : [78.6, 37.5]} }

Stage:
{$geoNear : { "near" : {"type" : "Point", "coordinates" : [90, 35]}, maxDistance : 500000, spherical : true }}

Output:
{ "_id" : "Tennessee", "pop1990" : 4877185, "pop2010" : 6346105, "center" : {"type" : "Point", "coordinates" : [86.6, 37.8]} }
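To see why Tennessee passes the 500 km filter and Virginia does not, the distances can be approximated in plain JavaScript. This is a sketch under stated assumptions: it uses the haversine formula with a 6371 km mean Earth radius, which is not necessarily how MongoDB computes spherical distance, and `haversineMeters` is a hypothetical helper. The coordinates are the ones shown on the slide.

```javascript
// Plain-JavaScript sketch of a spherical distance filter like maxDistance.
// The haversine formula and Earth radius are assumptions of this sketch.
const EARTH_RADIUS_M = 6371000;
const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Points are [longitude, latitude] pairs, as in GeoJSON.
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

const near = [90, 35];
const centers = [
  { _id: "Tennessee", coordinates: [86.6, 37.8] },
  { _id: "Virginia", coordinates: [78.6, 37.5] },
];

// Attach the distance, keep documents within 500000 m, nearest first.
const nearby = centers
  .map((doc) => ({ ...doc, dist: haversineMeters(near, doc.coordinates) }))
  .filter((doc) => doc.dist <= 500000)
  .sort((a, b) => a.dist - b.dist);
console.log(nearby.map((doc) => doc._id));
```

Attaching `dist` to each document plays the same role as `distanceField` in the real stage.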

Page 45: Webinar: Exploring the Aggregation Framework

What if I want to save the results to a collection?

db.cData.aggregate([
    {$geoNear : {
        "near" : {"type" : "Point", "coordinates" : [90, 35]},
        "distanceField" : "dist.calculated",
        "maxDistance" : 500000,
        "includeLocs" : "dist.location",
        "spherical" : true }},
    {$unwind : "$data"},
    {$group : {"_id" : "$data.year",
               "totalPop" : {"$sum" : "$data.totalPop"},
               "states" : {"$addToSet" : "$name"}}},
    {$sort : {"_id" : 1}},
    {$out : "peopleNearMemphis"}
])

Page 46: Webinar: Exploring the Aggregation Framework

$out

db.cData.aggregate([ <pipeline stages>, {"$out" : "resultsCollection"} ])

• Save aggregation results to a new collection
• NOTE: Overwrites any data existing in collection
• Transform documents - ETL

Page 47: Webinar: Exploring the Aggregation Framework

Back To The Original Question

• Which US Division has the fastest growing population density?
    – We only want to include data from states with more than 1M people
    – We only want to include divisions larger than 100K square miles

Page 48: Webinar: Exploring the Aggregation Framework

Division with Fastest Growing Pop Density

db.cData.aggregate( [
    {$match : {"data.totalPop" : {"$gt" : 1000000}}},
    {$unwind : "$data"},
    {$sort : {"data.year" : 1}},
    {$group : {"_id" : "$name",
               "pop1990" : {"$first" : "$data.totalPop"},
               "pop2010" : {"$last" : "$data.totalPop"},
               "areaM" : {"$first" : "$areaM"},
               "division" : {"$first" : "$division"}}},
    {$group : {"_id" : "$division",
               "totalPop1990" : {"$sum" : "$pop1990"},
               "totalPop2010" : {"$sum" : "$pop2010"},
               "totalAreaM" : {"$sum" : "$areaM"}}},
    {$match : {"totalAreaM" : {"$gt" : 100000}}},
    {$project : {"_id" : 0,
                 "division" : "$_id",
                 "density1990" : {"$divide" : ["$totalPop1990", "$totalAreaM"]},
                 "density2010" : {"$divide" : ["$totalPop2010", "$totalAreaM"]},
                 "denDelta" : {"$subtract" : [{"$divide" : ["$totalPop2010", "$totalAreaM"]},
                                              {"$divide" : ["$totalPop1990", "$totalAreaM"]}]},
                 "totalAreaM" : 1,
                 "totalPop1990" : 1,
                 "totalPop2010" : 1}},
    {$sort : {"denDelta" : -1}}
])

Page 49: Webinar: Exploring the Aggregation Framework


{ "totalPop1990" : 42293785, "totalPop2010" : 58277380, "totalAreaM" : 290433.39999999997, "division" : "South Atlantic", "density1990" : 145.62300685802668, "density2010" : 200.6566049221612, "denDelta" : 55.03359806413451 }

{ "totalPop1990" : 38577263, "totalPop2010" : 49169871, "totalAreaM" : 344302.94999999995, "division" : "Pacific", "density1990" : 112.0445322934352, "density2010" : 142.80990331334658, "denDelta" : 30.765371019911385 }

{ "totalPop1990" : 37602286, "totalPop2010" : 40872375, "totalAreaM" : 109331.91, "division" : "Mid-Atlantic", "density1990" : 343.9278249140621, "density2010" : 373.8375648975674, "denDelta" : 29.90973998350529 }

{ "totalPop1990" : 26702793, "totalPop2010" : 36346202, "totalAreaM" : 444052.01, "division" : "West South Central", "density1990" : 60.134381555890265, "density2010" : 81.85122729204626, "denDelta" : 21.716845736155996 }

{ "totalPop1990" : 15176284, "totalPop2010" : 18432505, "totalAreaM" : 183403.9, "division" : "East South Central", "density1990" : 82.74788049763391, "density2010" : 100.50225213313348, "denDelta" : 17.754371635499567 }

{ "totalPop1990" : 42008942, "totalPop2010" : 46421564, "totalAreaM" : 301368.57, "division" : "East North Central", "density1990" : 139.39390560867048, "density2010" : 154.03585052017866, "denDelta" : 14.641944911508176 }

{ "totalPop1990" : 12406123, "totalPop2010" : 20512410, "totalAreaM" : 618711.92, "division" : "Mountain", "density1990" : 20.051533838236054, "density2010" : 33.153410071685705, "denDelta" : 13.101876233449651 }

{ "totalPop1990" : 16324886, "totalPop2010" : 19018666, "totalAreaM" : 372541.8, "division" : "West North Central", "density1990" : 43.820280033005695, "density2010" : 51.05109279012449, "denDelta" : 7.230812757118798 }
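As a sanity check on the winning row, the density fields can be recomputed from the reported totals. The plain-JavaScript sketch below mirrors the `$divide` and `$subtract` expressions from the final `$project` stage; `densityDelta` is a hypothetical helper, and it assumes `totalAreaM` is in square miles, as the question's wording suggests.

```javascript
// Recompute the density fields for one division from its reported totals.
// densityDelta is a hypothetical helper; inputs are copied from the output above.
function densityDelta({ totalPop1990, totalPop2010, totalAreaM }) {
  const density1990 = totalPop1990 / totalAreaM; // {"$divide" : [pop1990, area]}
  const density2010 = totalPop2010 / totalAreaM; // {"$divide" : [pop2010, area]}
  return { density1990, density2010, denDelta: density2010 - density1990 };
}

const southAtlantic = densityDelta({
  totalPop1990: 42293785,
  totalPop2010: 58277380,
  totalAreaM: 290433.39999999997,
});
console.log(southAtlantic.denDelta.toFixed(2));
```

South Atlantic tops the list because the pipeline sorts on the absolute change in density (`denDelta`), not on the percentage change.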

Page 50: Webinar: Exploring the Aggregation Framework

Aggregate Options

Page 51: Webinar: Exploring the Aggregation Framework

Aggregate options

db.cData.aggregate([<pipeline stages>],
    {'explain' : false, 'allowDiskUse' : true, 'cursor' : {'batchSize' : 5}})

explain – similar to find().explain()
allowDiskUse – enable use of disk to store intermediate results
cursor – specify the size of the initial result batch

Page 52: Webinar: Exploring the Aggregation Framework

New things in 3.2

Page 53: Webinar: Exploring the Aggregation Framework

$sample

{ $sample: { size: <positive integer> } }

● If WiredTiger (WT) - pseudo-random cursor to return docs
● If MMAPv1 - uses _id index to randomly select docs

Used by Compass, useful for unit tests, etc.

Page 54: Webinar: Exploring the Aggregation Framework

$lookup
• Performs a left outer join to another collection in the same database to filter in documents from the “joined” collection for processing.
• To each input document, the $lookup stage adds a new array field whose elements are the matching documents from the “joined” collection.

{
    $lookup: {
        from: <collection to join>,
        localField: <field from the input documents>,
        foreignField: <field from the documents of the "from" collection>,
        as: <output array field>
    }
}

The “from” collection CANNOT BE SHARDED

https://docs.mongodb.org/master/reference/operator/aggregation/lookup/

Page 55: Webinar: Exploring the Aggregation Framework

• Sample data:

> db.data.find()
{ "_id" : ObjectId("565e759ae6f9919371a53896"), "v" : 14, "k" : 0 }
{ "_id" : ObjectId("565e759ae6f9919371a53897"), "v" : 664, "k" : 1 }
{ "_id" : ObjectId("565e759ae6f9919371a53898"), "v" : 701, "k" : 1 }
{ "_id" : ObjectId("565e759ae6f9919371a53899"), "v" : 312, "k" : 1 }
{ "_id" : ObjectId("565e759ae6f9919371a5389a"), "v" : 10, "k" : 2 }
{ "_id" : ObjectId("565e759ae6f9919371a5389b"), "v" : 686, "k" : 0 }
{ "_id" : ObjectId("565e759ae6f9919371a5389c"), "v" : 669, "k" : 2 }
{ "_id" : ObjectId("565e759ae6f9919371a5389d"), "v" : 273, "k" : 2 }
{ "_id" : ObjectId("565e759ae6f9919371a5389e"), "v" : 473, "k" : 0 }
{ "_id" : ObjectId("565e759ae6f9919371a5389f"), "v" : 158, "k" : 2 }

> db.keys.find()
{ "_id" : 0, "name" : "East Meter" }
{ "_id" : 1, "name" : "Central Meter 12" }
{ "_id" : 2, "name" : "New HIFI Monitor" }

Page 56: Webinar: Exploring the Aggregation Framework

• Find the average “v” value, but look up the name for each “k”

db.data.aggregate( [
    { "$lookup" : { "from" : "keys", "localField" : "k", "foreignField" : "_id", "as" : "name" } },
    { "$unwind" : "$name" },
    { "$project" : { "k" : "$k", "name" : "$name.name", "v" : "$v" } },
    { "$group" : { "_id" : "$name", "aveValue" : { "$avg" : "$v" } } },
    { "$project" : { "_id" : 0, "name" : "$_id", "aveValue" : "$aveValue" } }
]);

{ "aveValue" : 277.5, "name" : "New HIFI Monitor"}
{ "aveValue" : 559, "name" : "Central Meter 12"}
{ "aveValue" : 391, "name" : "East Meter"}
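The join-then-average logic can be reproduced in plain JavaScript on the same sample data (ObjectIds omitted). This is only an illustration of the semantics: `lookup` and `averageByName` are hypothetical helpers, not MongoDB APIs.

```javascript
// Plain-JavaScript sketch of $lookup + $unwind + $group on the sample data.
const keys = [
  { _id: 0, name: "East Meter" },
  { _id: 1, name: "Central Meter 12" },
  { _id: 2, name: "New HIFI Monitor" },
];
const data = [
  { v: 14, k: 0 }, { v: 664, k: 1 }, { v: 701, k: 1 }, { v: 312, k: 1 },
  { v: 10, k: 2 }, { v: 686, k: 0 }, { v: 669, k: 2 }, { v: 273, k: 2 },
  { v: 473, k: 0 }, { v: 158, k: 2 },
];

// Left outer join: every input document gains an array of matching documents
// (an empty array when nothing in `from` matches).
const lookup = (docs, from, localField, foreignField, as) =>
  docs.map((doc) => ({
    ...doc,
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));

function averageByName(joined) {
  const totals = new Map();
  for (const doc of joined) {
    for (const key of doc.name) { // same effect as {$unwind: "$name"}
      const t = totals.get(key.name) || { sum: 0, count: 0 };
      totals.set(key.name, { sum: t.sum + doc.v, count: t.count + 1 });
    }
  }
  return [...totals].map(([name, t]) => ({ name, aveValue: t.sum / t.count }));
}

const averages = averageByName(lookup(data, keys, "k", "_id", "name"));
console.log(averages);
```

The averages match the pipeline output on this slide; only the order of the groups may differ, since $group does not guarantee output order.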

Page 57: Webinar: Exploring the Aggregation Framework

friends of friends

Use $lookup to perform "self-joins" for graph problems.
Simple case: find the friends of someone's friends
Can extend this to find cliques, paths, etc.

Dataset:

{ "_id" : 1, "name" : "FLOYD", "friends" : [ "BILLIE", "MARGENE", "HERMINIA", "LACRESHA", "SHAUN", "INOCENCIA", "DEANA", "MARAGRET", "MICHELE", "KARLENE", "KASSANDRA", "JOAN", "HIRAM" ] }

{ "_id" : 2, "name" : "ELIDA", "friends" : [ "ALI", "KESHIA" ] }

...

Page 58: Webinar: Exploring the Aggregation Framework


Page 59: Webinar: Exploring the Aggregation Framework

don't forget your indexes…

Running FOF.friendsOfFriends(1)

2016-01-26T10:19:41.201-0500 I COMMAND [conn6] command friendship.friends command: aggregate { aggregate: "friends", pipeline: [ { $match: { _id: 1.0 } }, { $unwind: "$friends" }, { $lookup: { from: "friends", localField: "friends", foreignField: "name", as: "friendsOfFriends" } }, { $unwind: "$friendsOfFriends" }, { $unwind: "$friendsOfFriends.friends" }, { $group: { _id: "$friendsOfFriends.friends" } }, { $project: { friendOfFriend: "$_id", _id: 0.0 } } ], cursor: {} } cursorid:42505581740 keyUpdates:0 writeConflicts:0 numYields:0 reslen:3722 locks:{ Global: { acquireCount: { r: 1124 } }, Database: { acquireCount: { r: 562 } }, Collection: { acquireCount: { r: 562 } } } protocol:op_command 48ms

with indexes { "friends" : 1 } & { "name" : 1 }:

2016-01-26T10:17:45.167-0500 I COMMAND [conn6] command friendship.friends command: aggregate { aggregate: "friends", pipeline: [ { $match: { _id: 1.0 } }, { $unwind: "$friends" }, { $lookup: { from: "friends", localField: "friends", foreignField: "name", as: "friendsOfFriends" } }, { $unwind: "$friendsOfFriends" }, { $unwind: "$friendsOfFriends.friends" }, { $group: { _id: "$friendsOfFriends.friends" } }, { $project: { friendOfFriend: "$_id", _id: 0.0 } } ], cursor: {} } cursorid:39053867824 keyUpdates:0 writeConflicts:0 numYields:0 reslen:3722 locks:{ Global: { acquireCount: { r: 32 } }, Database: { acquireCount: { r: 16 } }, Collection: { acquireCount: { r: 16 } } } protocol:op_command 2ms

Page 60: Webinar: Exploring the Aggregation Framework

lots of new mathematical operators

$stdDevSamp  Calculates the sample standard deviation. { $stdDevSamp: <array> }
$stdDevPop   Calculates the population standard deviation. { $stdDevPop: <array> }
$sqrt        Calculates the square root. { $sqrt: <number> }
$abs         Returns the absolute value of a number. { $abs: <number> }
$log         Calculates the log of a number in the specified base. { $log: [ <number>, <base> ] }
$log10       Calculates the log base 10 of a number. { $log10: <number> }
$ln          Calculates the natural log of a number. { $ln: <number> }
$pow         Raises a number to the specified exponent. { $pow: [ <number>, <exponent> ] }
$exp         Raises e to the specified exponent. { $exp: <number> }
$trunc       Truncates a number to its integer. { $trunc: <number> }
$ceil        Returns the smallest integer greater than or equal to the specified number. { $ceil: <number> }
$floor       Returns the largest integer less than or equal to the specified number. { $floor: <number> }
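The difference between the two standard deviation operators is only the divisor. The plain-JavaScript sketch below illustrates this; the function names simply echo the operators and are hypothetical helpers, and the sample readings are illustrative values, not data from the deck.

```javascript
// Plain-JavaScript sketch contrasting the two standard deviations.
// These helpers are named after the operators but are not MongoDB APIs.
const mean = (values) => values.reduce((sum, x) => sum + x, 0) / values.length;
const sumOfSquares = (values) => {
  const m = mean(values);
  return values.reduce((sum, x) => sum + (x - m) ** 2, 0);
};

// Population form divides by n; sample form divides by n - 1.
const stdDevPop = (values) => Math.sqrt(sumOfSquares(values) / values.length);
const stdDevSamp = (values) => Math.sqrt(sumOfSquares(values) / (values.length - 1));

const readings = [2, 4, 4, 4, 5, 5, 7, 9]; // illustrative values only
console.log(stdDevPop(readings), stdDevSamp(readings));
```

The sample form is the usual choice when the values are a sample drawn from a larger population, for example documents selected with $sample.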

Page 61: Webinar: Exploring the Aggregation Framework

new array operators

$slice         Returns a subset of an array. { $slice: [ <array>, <n> ] } or { $slice: [ <array>, <position>, <n> ] }
$arrayElemAt   Returns the element at the specified array index. { $arrayElemAt: [ <array>, <idx> ] }
$concatArrays  Concatenates arrays. { $concatArrays: [ <array1>, <array2>, ... ] }
$isArray       Determines if the operand is an array. { $isArray: [ <expression> ] }
$filter        Selects a subset of the array based on the condition. { $filter: { input: <array>, as: <string>, cond: <expression> } }

Page 62: Webinar: Exploring the Aggregation Framework

Summary

Page 63: Webinar: Exploring the Aggregation Framework

Analytics in MongoDB?

Create, Read, Update, Delete

Analytics?

Group, Count, Derive Values, Filter, Average, Sort

YES!

Page 64: Webinar: Exploring the Aggregation Framework


Framework Use Cases

• Basic aggregation queries

• Ad-hoc reporting

• Real-time analytics

• Visualizing and reshaping data

Page 65: Webinar: Exploring the Aggregation Framework

Questions?

Thanks for attending & happy aggregating
Please complete survey

@jmimick