Taming Social Media with MongoDB
Danny Holloway
June 26, 2012

MongoDC 2012: Taming Social Media with MongoDB


Overview

• Introduction
• Social Media Challenges
• MongoDB Setup
• Collecting Tweets
• Querying Tweets
• Accessing the Data
• Finding Most Active Tweeter
• Lessons Learned
• Building an Interface
• Demo

Introduction

• Built a tool to collect tweets over Australia and interact with them on a map

• Working at HumanGeo
  – Building tools and services for geospatial analysis of Big Data
  – Using MongoDB for horizontally scalable storage and geospatial analysis

Social Media Challenges

• No control over data
  – “Consumers of Tweets should tolerate the addition of new fields and variance in ordering of fields with ease.” - Twitter

• High Volume
  – ~17k tweets in a day or 6.2M per year with exact coordinates in Australia
  – Record high of >25k tweets per second or >788B per year around the world - Twitter
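
One way to tolerate new fields and changed field ordering is to read only the fields you depend on and ignore the rest. The following is a minimal sketch, not the tool's actual code; the function name `extract_fields` and the choice of fields are assumptions made for illustration.

```python
def extract_fields(tweet):
    """Pull out only the fields we rely on, ignoring everything else."""
    # Treat a missing or null sub-document as empty instead of failing.
    user = tweet.get("user") or {}
    point = tweet.get("coordinates") or {}
    return {
        "id": tweet.get("id"),
        "text": tweet.get("text", ""),
        "screen_name": user.get("screen_name"),
        # GeoJSON-style point: [longitude, latitude], or None if absent
        "coordinates": point.get("coordinates"),
    }
```

Because the lookup is by key, fields that Twitter adds later and any reordering of existing fields have no effect on the result.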

MongoDB Setup

• Create database
• Create capped collections
• Create indexes
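
The setup steps might look roughly like this with a current PyMongo release. This is a sketch under assumptions: the collection name `tweets`, the 512 MB cap, and the indexed field paths are illustrative and do not come from the slides. `db` is assumed to be a PyMongo `Database`; MongoDB creates the database itself on first write, so there is no explicit creation call.

```python
def setup_collections(db, size_bytes=512 * 1024 * 1024):
    """Create a capped `tweets` collection and its indexes if missing."""
    if "tweets" not in db.list_collection_names():
        # A capped collection has a fixed size; the oldest documents
        # are overwritten once it is full.
        db.create_collection("tweets", capped=True, size=size_bytes)
    tweets = db["tweets"]
    # "2d" is the index type for legacy [longitude, latitude] pairs.
    tweets.create_index([("coordinates.coordinates", "2d")])
    # Ascending index to speed up per-user lookups.
    tweets.create_index([("user.screen_name", 1)])
    return tweets

# Usage (assumes a local server and a hypothetical database name):
#   from pymongo import MongoClient
#   tweets = setup_collections(MongoClient()["social"])
```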

Collecting Tweets

• Using tweetstream to collect tweets over Australia from statuses/filter endpoint

• Insert results into collections
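
The insert step can be sketched independently of the streaming client. Here `stream` is assumed to be any iterable of decoded tweet dictionaries (such as the one a streaming library yields), and `collection` a PyMongo collection. The bounding box constant is an approximate, illustrative value for Australia, and keeping only tweets with exact coordinates is an assumption, not something the slides confirm.

```python
# Approximate bounding box for Australia in the "west,south,east,north"
# form that statuses/filter expects for its `locations` parameter.
# Illustrative value only.
AUSTRALIA_BBOX = "112.0,-44.0,154.0,-10.0"

def store_tweets(stream, collection):
    """Insert each tweet with exact coordinates; return how many were stored."""
    stored = 0
    for message in stream:
        # The stream also carries non-tweet messages (for example delete
        # notices), which have no "text" field.
        if "text" not in message:
            continue
        # A tweet can match the box through its place alone, with no
        # exact point attached; skip those.
        if not message.get("coordinates"):
            continue
        collection.insert_one(message)
        stored += 1
    return stored
```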

Collecting Tweets (cont)

• Augment results for better queries
  – Twitter provides date strings like "Wed Jun 13 23:17:58 +0000 2012"
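
A date string in this form cannot be range-queried or sorted chronologically, so one plausible augmentation is to store a parsed datetime alongside it. The sketch below assumes Python 3 and a hypothetical extra field named `created_at_dt`; the slides do not say which field name the tool used.

```python
from datetime import datetime, timezone

# Matches strings such as "Wed Jun 13 23:17:58 +0000 2012".
# %a and %b expect English day and month names (the default C locale).
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def add_created_at_dt(tweet):
    """Add a timezone-aware UTC datetime parsed from `created_at`."""
    parsed = datetime.strptime(tweet["created_at"], TWITTER_DATE_FORMAT)
    tweet["created_at_dt"] = parsed.astimezone(timezone.utc)
    return tweet
```

With a real datetime stored, a query such as "tweets from the last hour" becomes a simple range condition on `created_at_dt`.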

Querying Tweets

• Get all of the latest tweets

• Get all the tweets from a user
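
Both queries can be sketched with PyMongo as follows. The function names are hypothetical, and `collection` is assumed to be the capped tweet collection. Sorting on `$natural` relies on the fact that a capped collection keeps documents in insertion order; sorting on a stored timestamp would be an alternative.

```python
def latest_tweets(collection, limit=100):
    """Return the most recently inserted tweets, newest first."""
    # In a capped collection, natural order is insertion order, so
    # reversing it yields the newest documents first.
    return list(collection.find().sort("$natural", -1).limit(limit))

def tweets_from_user(collection, screen_name):
    """Return every stored tweet written by the given user."""
    # Dot notation reaches into the embedded `user` document.
    return list(collection.find({"user.screen_name": screen_name}))
```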

Querying Tweets (cont)

• Get tweets near a point

• Get tweets within a bounding box
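
The two geospatial queries can be expressed as plain filter documents. This sketch assumes the point is stored at `coordinates.coordinates` as a `[longitude, latitude]` pair with a 2d index on it. It uses `$geoWithin`, the current operator name; MongoDB releases from 2012 spelled it `$within`.

```python
GEO_FIELD = "coordinates.coordinates"  # assumed location of the [lon, lat] pair

def near_point_query(lon, lat):
    """Filter that returns documents ordered from nearest to farthest."""
    return {GEO_FIELD: {"$near": [lon, lat]}}

def bounding_box_query(west, south, east, north):
    """Filter for documents inside the box given by two corner points."""
    # $box takes the bottom-left corner first, then the top-right corner.
    return {GEO_FIELD: {"$geoWithin": {"$box": [[west, south], [east, north]]}}}

# Usage with a PyMongo collection, e.g. the 20 tweets nearest Sydney:
#   collection.find(near_point_query(151.21, -33.87)).limit(20)
```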

Accessing the Data

• Using Bottle to create a RESTful API
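
A single endpoint of such an API might look like the sketch below. The URL layout, the function names, and the JSON envelope are all assumptions for illustration; only the use of Bottle comes from the slides. `collection` is assumed to be a PyMongo collection.

```python
import json

def tweets_json(docs):
    """Serialize an iterable of stored tweets to a JSON string."""
    # default=str turns values JSON cannot represent (ObjectId,
    # datetime) into their string form.
    return json.dumps({"tweets": list(docs)}, default=str)

def make_app(collection):
    """Build a Bottle application exposing tweets by user."""
    import bottle  # imported here so tweets_json has no Bottle dependency

    app = bottle.Bottle()

    @app.get("/tweets/user/<screen_name>")
    def tweets_for_user(screen_name):
        bottle.response.content_type = "application/json"
        cursor = collection.find({"user.screen_name": screen_name})
        return tweets_json(cursor)

    return app

# Usage: make_app(collection).run(host="localhost", port=8080)
```

Returning a pre-serialized string rather than a dictionary keeps control over how MongoDB-specific types are encoded.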

Finding Most Active Tweeter

• Calculate the tweet count for each user and return the tweets for the user with the highest count
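
One way to do this today is with the aggregation pipeline, shown here as a swapped-in technique: the slides do not say how the count was computed. Function names are hypothetical, and `collection` is assumed to be a PyMongo collection of tweets.

```python
def most_active_pipeline():
    """Pipeline that yields the single user with the most tweets."""
    return [
        # Count one per tweet, grouped by the author's screen name.
        {"$group": {"_id": "$user.screen_name", "count": {"$sum": 1}}},
        # Highest count first, then keep only the top entry.
        {"$sort": {"count": -1}},
        {"$limit": 1},
    ]

def tweets_of_most_active_user(collection):
    """Return (screen_name, tweets) for the most active user."""
    top = next(iter(collection.aggregate(most_active_pipeline())), None)
    if top is None:
        return None, []  # empty collection
    tweets = list(collection.find({"user.screen_name": top["_id"]}))
    return top["_id"], tweets
```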

Lessons Learned

• Use Longitude, Latitude ordering for coordinates

• Default 2d index value range is [-180, 180), i.e. exclusive of the upper bound

• Twitter has bugs too

• Making your own maps isn’t hard (it can take some time)
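
The first two lessons can be captured in a small guard applied before inserting a point. This is an illustrative sketch with a hypothetical function name; it assumes the default 2d index bounds of -180 (inclusive) to 180 (exclusive).

```python
def to_lon_lat(lat, lon):
    """Return [longitude, latitude], the order MongoDB geo indexes expect."""
    # With default bounds, a longitude of exactly 180 falls outside the
    # indexable range because the upper bound is exclusive.
    if not (-180 <= lon < 180):
        raise ValueError("longitude %r outside [-180, 180)" % lon)
    if not (-90 <= lat <= 90):
        raise ValueError("latitude %r outside [-90, 90]" % lat)
    return [lon, lat]
```

For example, a point given as latitude -33.87, longitude 151.21 is stored as `[151.21, -33.87]`.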

Building an Interface

• Dust JavaScript templating library
• Leaflet JavaScript interactive map library
• jQuery JavaScript library
• TileStream map tile server