
Webinar: Application Development with MongoDB: Part 5: Reporting & Aggregation


Page 1: Application Development with MongoDB: Reporting & Aggregation

Marc Schwering
Solutions Architect, MongoDB

#MongoDBBasics @MongoDB @m4rcsch

Page 2: Agenda

• Recap from last session
• Reporting / Analytics options
• Map Reduce
• Aggregation Framework introduction
  – Aggregation explain
• mycms application reports
• Geospatial with Aggregation Framework
• Text Search with Aggregation Framework

Page 3: Q & A

• Virtual Genius Bar
  – Use the chat to post questions
  – The EMEA Solution Architecture / Support team is on hand
  – Make use of them during the sessions!

Page 4: Recap from last time…

Page 5: Indexing

• Indexes
  – Multikey, compound, 'dot.notation'
• Covered, sorting
• Text, GeoSpatial
• Btrees

> db.articles.ensureIndex( { author : 1, tags : 1 } )
> db.user.find( { user : "danr" }, { _id : 0, password : 1 } )
> db.articles.ensureIndex( { location : "2dsphere" } )
> db.articles.ensureIndex( { "$**" : "text" }, { name : "TextIndex" } )

General form: db.col.ensureIndex({ key : type }, { options })

Page 6: Index performance / efficiency

• Examine index plans
• Identify slow queries
• n / nscanned ratio
• Which index is used

Operators: .explain(), db profiler

> db.articles.find( { author : 'Dan Roberts' } ).sort( { date : -1 } ).explain()

> db.setProfilingLevel(1, 100)
{ "was" : 0, "slowms" : 100, "ok" : 1 }

> db.system.profile.find().pretty()
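
The n / nscanned ratio above can be computed from an explain document. The following is a minimal sketch in Python, assuming the legacy explain format in which `n` is the number of documents returned and `nscanned` the number of index entries or documents examined. The helper name `index_efficiency` and the two sample documents are hypothetical, made up for illustration only.

```python
def index_efficiency(explain_doc):
    """Return n / nscanned for a legacy explain() document.

    A value close to 1.0 suggests the query examined little more than it
    returned; a value close to 0 suggests a lot of wasted scanning.
    Returns None when nothing was scanned.
    """
    n = explain_doc.get("n", 0)
    nscanned = explain_doc.get("nscanned", 0)
    if nscanned == 0:
        return None
    return n / nscanned


# Hypothetical explain results, not taken from the webinar's data set.
indexed = {"cursor": "BtreeCursor author_1_tags_1", "n": 6, "nscanned": 6}
scan = {"cursor": "BasicCursor", "n": 6, "nscanned": 1000}

print(index_efficiency(indexed))
print(index_efficiency(scan))
```

A ratio well below 1 is a hint to look at which index, if any, the query used.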

Page 7: Reporting / Analytics options

Page 8: Access data for reporting, options

• Query Language
  – Leverage pre-aggregated documents
• Aggregation Framework
  – Calculate new values from the data that we have
  – For instance: average views, comments count
• MapReduce
  – Internal: JavaScript-based implementation
  – External: Hadoop, using the MongoDB connector
• A combination of the above

Page 9: Pre Aggregated Reports

• Immediate results
  – Simple from a query perspective
  – Interactions collection

{
  '_id' : ObjectId(..),
  'article_id' : ObjectId(..),
  'section' : 'schema',
  'date' : ISODate(..),
  'daily' : { 'view' : 45, 'comments' : 150 },
  'hourly' : {
    0 : { 'view' : 10 },
    1 : { 'view' : 2 },
    …
    23 : { 'view' : 14, 'comments' : 10 }
  }
}

> db.interactions.find(
    { "article_id" : ObjectId("…..") },
    { _id : 0, hourly : 1 }
  )
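
A pre-aggregated document like this is usually maintained with one small increment per interaction, so reads stay a plain query. The sketch below builds such a filter and update in Python. It is an illustration, not the mycms code: the function name `interaction_update` is hypothetical, and the field names `hourly` and `view` follow the query on this slide and the later `$daily.view` pipelines.

```python
from datetime import datetime


def interaction_update(article_id, when, kind="view"):
    """Build the filter and $inc update for one interaction.

    One document per article per day: the daily counter and the counter
    for the hour of `when` are both incremented by 1.
    """
    day = datetime(when.year, when.month, when.day, tzinfo=when.tzinfo)
    query = {"article_id": article_id, "date": day}
    update = {
        "$inc": {
            f"daily.{kind}": 1,
            f"hourly.{when.hour}.{kind}": 1,
        }
    }
    return query, update


# "article-1" stands in for a real ObjectId.
query, update = interaction_update("article-1", datetime(2014, 2, 20, 13, 5))
print(query)
print(update)
# With PyMongo 3+ these could be passed to
# db.interactions.update_one(query, update, upsert=True)
```

Using an upsert means the day's document is created on the first interaction and only incremented afterwards.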

Page 10: Pre Aggregated Reports

• Use the query result to display directly in the application
  – Create a new REST API
  – D3.js library or similar in the UI

{
  "hourly" : {
    "0" : { "view" : 1 },
    "1" : { "view" : 1 },
    …
    "22" : { "view" : 5 },
    "23" : { "view" : 3 }
  }
}

Page 11: Map Reduce

Page 12: Map Reduce

• Map Reduce
  – MongoDB – JavaScript
• Incremental Map Reduce

// Map Reduce example
> db.articles.mapReduce(
    function() { emit(this.author, this.comment_count); },
    function(key, values) { return Array.sum(values); },
    {
      query : {},
      out : { merge : "comment_count" }
    }
  )

Output

{ "_id" : "Dan Roberts", "value" : 6 }
{ "_id" : "Jim Duffy", "value" : 1 }
{ "_id" : "Kunal Taneja", "value" : 2 }
{ "_id" : "Paul Done", "value" : 2 }
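
To make the two phases concrete, here is a rough in-memory emulation in plain Python of what the map and reduce functions above compute. It does not call MongoDB. The helper names and the sample articles are hypothetical, and the real server may call reduce several times on partial results, which is safe here because summing is associative.

```python
from collections import defaultdict


def map_article(article):
    """Map phase: emit (author, comment_count) for one article."""
    yield article["author"], article.get("comment_count", 0)


def reduce_counts(key, values):
    """Reduce phase: sum all values emitted for one key."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Group emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return [
        {"_id": key, "value": reduce_fn(key, values)}
        for key, values in sorted(grouped.items())
    ]


# Hypothetical sample data, not the webinar's collection.
articles = [
    {"author": "Dan Roberts", "comment_count": 4},
    {"author": "Jim Duffy", "comment_count": 1},
    {"author": "Dan Roberts", "comment_count": 2},
]
result = map_reduce(articles, map_article, reduce_counts)
print(result)
```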

Page 13: MongoDB – Hadoop Connector

Hadoop Integration

[Diagram labels: four replica sets (one Primary and two Secondaries each), HDFS, MapReduce, MongoS, Application, Application Dash Boards / Reporting]

1) Data Flow, Input / Output via Application Tier

Page 14: Aggregation Framework

Page 15: Aggregation Framework

• Multi-stage pipeline
  – Like a Unix pipe: "ps -ef | grep mongod"
  – Aggregate data, transform documents
  – Implemented in the core server

// Find out which are the most popular tags…
db.articles.aggregate([
  { $unwind : "$tags" },
  { $group : { _id : "$tags", number : { $sum : 1 } } },
  { $sort : { number : -1 } }
])

Output

{ "_id" : "mongodb", "number" : 6 }
{ "_id" : "nosql", "number" : 3 }
{ "_id" : "database", "number" : 1 }
{ "_id" : "aggregation", "number" : 1 }
{ "_id" : "node", "number" : 1 }
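
The pipe analogy can be illustrated outside the database. The sketch below emulates the three stages above as plain Python functions chained together, each consuming the output of the previous one. It is only an approximation of the server's behaviour, and the function names and sample articles are hypothetical.

```python
from collections import Counter


def unwind(docs, field):
    """Like $unwind: one output document per array element."""
    for doc in docs:
        for item in doc.get(field, []):
            yield {**doc, field: item}


def group_count(docs, field):
    """Like $group with { $sum : 1 }: count documents per field value."""
    counts = Counter(doc[field] for doc in docs)
    return [{"_id": value, "number": n} for value, n in counts.items()]


def sort_desc(rows, field):
    """Like $sort with -1: highest value first."""
    return sorted(rows, key=lambda row: row[field], reverse=True)


# Hypothetical sample data.
articles = [
    {"title": "a", "tags": ["mongodb", "nosql"]},
    {"title": "b", "tags": ["mongodb"]},
    {"title": "c", "tags": ["mongodb", "nosql", "node"]},
]
popular = sort_desc(group_count(unwind(articles, "tags"), "tags"), "number")
print(popular)
```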

Page 16: In our mycms application..

# Our new Python route
# app (Flask) and db (PyMongo database) are defined elsewhere in mycms
import json
from bson import json_util
from flask import abort, jsonify

@app.route('/cms/api/v1.0/tag_counts', methods=['GET'])
def tag_counts():
    pipeline = [
        { "$unwind" : "$tags" },
        { "$group" : { "_id" : "$tags", "number" : { "$sum" : 1 } } },
        { "$sort" : { "number" : -1 } }
    ]
    # cursor={} asks for a cursor rather than a single result document
    cur = db['articles'].aggregate(pipeline, cursor={})
    # Check everything ok
    if not cur:
        abort(400)
    # iterate the cursor and add docs to a list
    tags = [tag for tag in cur]
    return jsonify({'tags' : json.dumps(tags, default=json_util.default)})

Page 17: Aggregation operators

• Pipeline and Expression operators

Pipeline: $match, $sort, $limit, $skip, $project, $unwind, $group, $geoNear, $text / $search (used inside $match)
Expression: $addToSet, $first, $last, $max, $min, $avg, $push, $sum
Arithmetic: $add, $divide, $mod, $multiply, $subtract
Conditional: $cond, $ifNull
Variables: $let, $map

Tip: Other operators for date, time, boolean and string manipulation

Page 18: Application Reports

• What reports and analytics do we need in our application?
  – Popular Tags
  – Popular Articles
  – Popular Locations – integration with Geo Spatial
  – Average views per hour or day

Page 19: Popular Tags

• Unwind each 'tags' array
• Group and count each one, then sort
• Output to a new collection
  – Query the new collection so the counts need not be computed for every request

db.articles.aggregate([
  { $unwind : "$tags" },
  { $group : { _id : "$tags", number : { $sum : 1 } } },
  { $sort : { number : -1 } },
  { $out : "tags" }
])

Page 20: Popular Articles

• Top 5 articles by average daily views
  – Use the $avg operator
  – Use $match to constrain the date range
    • Utilise with $gt and $lt operators

db.interactions.aggregate([
  { $match : { date : { $gt : ISODate("2014-02-20T00:00:00.000Z") } } },
  { $group : { _id : "$article_id", a : { $avg : "$daily.view" } } },
  { $sort : { a : -1 } },
  { $limit : 5 }
]);
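
The pipeline above only uses $gt. A closed date range with both $gt and $lt could be built as in the following Python sketch. The function name `top_articles_pipeline` and the end date are hypothetical; the stages mirror the shell example, and the resulting list is what would be passed to a driver's `aggregate()` call.

```python
from datetime import datetime


def top_articles_pipeline(start, end, limit=5):
    """Build a pipeline for the top articles by average daily views.

    Only interaction documents with start < date < end are considered.
    """
    return [
        {"$match": {"date": {"$gt": start, "$lt": end}}},
        {"$group": {"_id": "$article_id", "a": {"$avg": "$daily.view"}}},
        {"$sort": {"a": -1}},
        {"$limit": limit},
    ]


# The start date is the one from the slide; the end date is an assumption.
pipeline = top_articles_pipeline(datetime(2014, 2, 20), datetime(2014, 3, 1))
for stage in pipeline:
    print(stage)
```

Keeping $match as the first stage lets the server use an index on `date`, if one exists, before grouping.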

Page 21: Aggregation Framework Explain

• Use the explain plan to check that the pipeline makes efficient use of an index when querying.

db.interactions.aggregate([
  { $group : { _id : "$article_id", a : { $avg : "$daily.view" } } },
  { $sort : { a : -1 } },
  { $limit : 5 }
],
{ explain : true }
);
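
The document returned with explain : true can also be checked programmatically. The sketch below assumes the legacy explain layout of that server generation, where each `$cursor` stage carries a `plan` with a `cursor` name ("BasicCursor" for a collection scan, "BtreeCursor …" for an index). The helper names and the sample document are hypothetical.

```python
def plan_cursors(explain_doc):
    """Collect the cursor name of every $cursor stage in the explain output."""
    cursors = []
    for stage in explain_doc.get("stages", []):
        cursor_stage = stage.get("$cursor")
        if cursor_stage:
            cursors.append(cursor_stage.get("plan", {}).get("cursor"))
    return cursors


def uses_index(explain_doc):
    """True only if every $cursor stage reports a BtreeCursor."""
    cursors = plan_cursors(explain_doc)
    return bool(cursors) and all(
        name is not None and name.startswith("BtreeCursor") for name in cursors
    )


# Hypothetical, trimmed-down explain document for a pipeline without $match.
sample = {
    "stages": [
        {"$cursor": {"query": {}, "plan": {"cursor": "BasicCursor"}}},
        {"$group": {"_id": "$article_id"}},
    ],
    "ok": 1,
}
print(plan_cursors(sample))
print(uses_index(sample))
```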

Page 22: Explain output…

{
  "stages" : [
    {
      "$cursor" : {
        "query" : { … },
        "fields" : { … },
        "plan" : {
          "cursor" : "BasicCursor",
          "isMultiKey" : false,
          "scanAndOrder" : false,
          "allPlans" : [
            {
              "cursor" : "BasicCursor",
              "isMultiKey" : false,
              "scanAndOrder" : false
            }
          ]
        }
      }
    },
    …
  ],
  "ok" : 1
}

Page 23: Geo Spatial & Text Search Aggregation

Page 24: Text Search

• $text operator with aggregation framework
  – All articles with MongoDB
  – Group by author, sort by comments count

db.articles.aggregate([
  { $match : { $text : { $search : "mongodb" } } },
  { $group : { _id : "$author", comments : { $sum : "$comment_count" } } },
  { $sort : { comments : -1 } }
])

Page 25: Utilise with Geo spatial

• $geoNear operator with aggregation framework
  – $geoNear is a pipeline stage of its own and must come first; the "near" query operators cannot be used inside $match
  – Group by author, and article count

db.articles.aggregate([
  { $geoNear : {
      near : { type : "Point", coordinates : [ -0.128, 51.507 ] },
      distanceField : "distance",  // required; the field name is arbitrary
      maxDistance : 5000,          // metres
      spherical : true
  } },
  { $group : { _id : "$author", articleCount : { $sum : 1 } } }
])

Page 26: Summary

Page 27: Summary

• Aggregating data…
  – Map Reduce
  – Hadoop
  – Pre-Aggregated Reports
  – Aggregation Framework
• Tune with the explain plan
• Compute on the fly or compute and store
• Geospatial
• Text Search

Page 28: Next Session – 9th July

– MongoDB World recap!
– Preview into the operation series
