CS 765 – Fall 2014 Paulo Alexandre Regis Reddit analysis

Outline

• ABOUT REDDIT

• WHY REDDIT

• PREVIOUS WORKS

• INITIAL PROPOSAL

• Q&A

What is reddit?

Reddit is an open-source platform that supports the interaction of communities.

It has been used as a news hub, a Q&A platform, and a channel for internet hoax/meme propagation.

Features

• Subreddits

• Voting

• Karma

• Public API

Why reddit?

• Growing communities

• Diverse usage

• Open-source platform

• Unexplored opportunities

The API

• Easy to parse, returns JSON objects

• 30 requests per minute limit

• 60 requests per minute if using OAuth

Useful links:

• Dev community: http://www.reddit.com/r/redditdev

• API documentation: http://www.reddit.com/dev/api
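A crawler has to stay under the request limits listed above. The following is a minimal sketch of one way to do that, assuming evenly spaced requests are acceptable; the `Throttle` class and its parameters are hypothetical names for illustration, not part of the reddit API or of any library.

```python
import time


class Throttle:
    """Space out calls so the long-run rate stays at `per_minute` or below."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute  # seconds between two requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Limits taken from the slide: 30 requests/minute, 60 with OAuth.
anonymous = Throttle(per_minute=30)   # one request every 2 seconds
with_oauth = Throttle(per_minute=60)  # one request every second
```

Calling `anonymous.wait()` before each HTTP request would pause just long enough to keep two seconds between consecutive requests. The clock and sleep functions are injectable only so the behaviour can be checked without real delays.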

Previous works

• PRAW

• Information and social analysis

• Identifying social roles

• Backbone networks

PRAW

• Python Reddit API Wrapper

• Open-source

• Respects Reddit’s guidelines

• Easy integration

• Well documented

Project website: https://praw.readthedocs.org

Information and social analysis of reddit

• Insights on comments section

• Generated 3 social graphs:

– Loose: user A commenting on user B establishes an edge

– Tight: user A comments on user B and user B comments on user A

– Strict: user A comments 4 times on user B and vice versa
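The three graph definitions can be illustrated with a short sketch. It assumes the input is a list of (commenter, target) pairs, treats all edges as undirected, and reads "4 times" as "at least 4 times"; these are assumptions made for the example, and `build_graphs` is a hypothetical helper, not code from the cited work.

```python
from collections import Counter


def build_graphs(replies, strict_min=4):
    """Return (loose, tight, strict) edge sets from (commenter, target) pairs."""
    # Count how often each user commented on each other user (ignore self-replies).
    counts = Counter((a, b) for a, b in replies if a != b)

    loose, tight, strict = set(), set(), set()
    for (a, b), n_ab in counts.items():
        edge = frozenset((a, b))       # undirected edge (assumption)
        n_ba = counts.get((b, a), 0)   # comments in the opposite direction
        loose.add(edge)                # any comment is enough
        if n_ba > 0:
            tight.add(edge)            # both users commented on each other
        if n_ab >= strict_min and n_ba >= strict_min:
            strict.add(edge)           # at least `strict_min` in each direction
    return loose, tight, strict


# Made-up sample data, for illustration only.
replies = [("ann", "bob"), ("bob", "ann"), ("ann", "cat")]
replies += [("dan", "eve")] * 4 + [("eve", "dan")] * 4

loose, tight, strict = build_graphs(replies)
print(len(loose), len(tight), len(strict))  # prints: 3 2 1
```

By construction every strict edge is also a tight edge, and every tight edge is also a loose edge.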

Information and social analysis of reddit

• Limited data collection:

– Time constraints

– 1% (250) of the top subcommunities crawled

• Results:

Identifying social roles in reddit

• Identify a specific role in reddit: the answer-person, who responds to questions but only in a few different discussions (i.e. Q&A)

• Sampled top users from top submissions and targeted communities

• Used PRAW

• Crawler script is open-source: https://github.com/cbuntain/redditResponseExtractor

(a) Mark Shuttleworth (Ubuntu) IAmA Q&A

(b) Regular user from other subreddit

Using backbone networks to map user interests in social media

• Focus on communities (subreddits)

• Communities linked by users (bipartite graph)

• Small-world (average shortest path ≈ 3.71)

• Roughly 1/3 of users crawled

• Anonymized data available: http://figshare.com/articles/reddit_user_posting_behavior/874101
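To make the bipartite idea concrete, here is a small sketch that links subreddits through shared users and measures the average shortest path of the result. It assumes the input is a list of (user, subreddit) pairs and it does not perform the backbone extraction step of the cited work; the function names and sample data are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def project_subreddits(posts):
    """Link two subreddits whenever at least one user posted in both."""
    by_user = defaultdict(set)
    for user, subreddit in posts:
        by_user[user].add(subreddit)

    graph = defaultdict(set)
    for subs in by_user.values():
        for s in subs:
            graph[s]  # make sure every subreddit appears as a node
        for s, t in combinations(sorted(subs), 2):
            graph[s].add(t)
            graph[t].add(s)
    return dict(graph)


def average_shortest_path(graph):
    """Mean BFS distance over all ordered pairs of connected, distinct nodes."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Made-up sample data, for illustration only.
posts = [
    ("u1", "python"), ("u1", "programming"),
    ("u2", "programming"), ("u2", "science"),
    ("u3", "science"), ("u3", "askscience"),
]
graph = project_subreddits(posts)
avg = average_shortest_path(graph)
```

In this toy data the four subreddits form a chain, so `avg` is small; on real data a low value relative to the number of communities is what the small-world observation refers to.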

Initial proposal

Analyze the influence of social hubs in reddit’s network. See whether high-degree nodes attract more attention from lower-degree nodes.

An edge would be formed when both nodes comment on the same post.

The degree of each node would be its predefined “karma”, which could be compared with other ranking algorithms (e.g. PageRank).
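The proposal could be prototyped roughly as follows. This is a sketch under stated assumptions: comments arrive as (post, user) pairs, PageRank is computed with plain power iteration on the undirected co-comment graph, and the karma values are invented. All function names and data here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_comment_graph(comments):
    """Undirected graph: users are linked if they commented on the same post."""
    by_post = defaultdict(set)
    for post, user in comments:
        by_post[post].add(user)

    graph = defaultdict(set)
    for users in by_post.values():
        for u in users:
            graph[u]  # keep users who never share a post with anyone
        for u, v in combinations(sorted(users), 2):
            graph[u].add(v)
            graph[v].add(u)
    return dict(graph)


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; each undirected edge counts in both directions."""
    n = len(graph)
    if n == 0:
        return {}
    rank = {u: 1.0 / n for u in graph}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread evenly over all nodes.
        dangling = sum(rank[u] for u in graph if not graph[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in graph}
        for u, neighbours in graph.items():
            for v in neighbours:
                new[v] += damping * rank[u] / len(neighbours)
        rank = new
    return rank


def ranking(scores):
    """User names ordered from highest to lowest score (ties broken by name)."""
    return sorted(scores, key=lambda u: (-scores[u], u))


# Made-up sample data, for illustration only.
comments = [("p1", "ann"), ("p1", "bob"), ("p1", "cat"),
            ("p2", "ann"), ("p2", "dan"), ("p3", "eve")]
karma = {"ann": 120, "bob": 900, "cat": 15, "dan": 40, "eve": 300}

graph = co_comment_graph(comments)
pr = pagerank(graph)
by_karma = ranking(karma)
by_pagerank = ranking(pr)
```

Comparing `by_karma` with `by_pagerank` shows where the two orderings disagree: in this toy data the user with the most karma is not the best-connected one.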

Questions?