Unleashing Twitter data for fun and insight

DESCRIPTION

Matthew Russell, VP Engineering at Digital Reasoning, discusses techniques and results of mining Twitter data for fun and insight.

Agile Data Solutions

Mining the Social Web

Matthew A. Russell

http://linkedin.com/in/ptwobrussell

@ptwobrussell

Unleashing Twitter Data for fun and insight

Happy Groundhog Day!

Mining the Social Web Chapters 1-5

Introduction: Trends, Tweets, and Twitterers

Microformats: Semantic Markup and Common Sense Collide

Mailboxes: Oldies but Goodies

Friends, Followers, and Setwise Operations

Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet

Mining the Social Web Chapters 6-10

LinkedIn: Clustering Your Professional Network For Fun (and Profit?)

Google Buzz: TF-IDF, Cosine Similarity, and Collocations

Blogs et al: Natural Language Processing (and Beyond)

Facebook: The All-In-One Wonder

The Semantic Web: A Cocktail Discussion

Overview

•Trends, Tweets, and Retweet Visualizations

•Friends, Followers, and Setwise Operations

•The Tweet, the Whole Tweet, and Nothing but the Tweet

Insight Matters

•What is @user's potential influence?

•What are @user's passions right now?

•Who are @user's most trusted friends?

Part 1: Tweets, Trends, and Retweet Visualizations

A point to ponder: Twitter : Data :: JavaScript : Programming Languages (???)

Getting Ready To Code

Python Installation

•Mac users already have it

•Linux users probably have it

•Windows users should grab ActivePython

easy_install

•Installs packages from PyPI

•Get it:

•http://pypi.python.org/pypi/setuptools

•Ships with ActivePython

•It really is easy:

easy_install twitter

easy_install nltk

easy_install networkx

Git It?

•http://github.com/ptwobrussell/Mining-the-Social-Web

•git clone git://github.com/ptwobrussell/Mining-the-Social-Web.git

•introduction__*.py

•friends_followers__*.py

•the_tweet__*.py

Getting Data

Twitter Data Sources

•Twitter API Resources

•GNIP

•Infochimps

•Library of Congress

Trending Topics

>>> import twitter # Remember to "easy_install twitter"
>>> twitter_search = twitter.Twitter(domain="search.twitter.com")
>>> trends = twitter_search.trends()
>>> [ trend['name'] for trend in trends['trends'] ]
[u'#ZodiacFacts', u'#nowplaying', u'#ItsOverWhen', u'#Christoferdrew', u'Justin Bieber', u'#WhatwouldItBeLike', u'#Sagittarius', u'SNL', u'#SurveySays', u'#iDoit2']

Search Results

>>> search_results = []
>>> for page in range(1,6):
...     search_results.append(twitter_search.search(q="SNL", rpp=100, page=page))

Search Results (continued)

>>> import json
>>> print json.dumps(search_results, sort_keys=True, indent=1)
[
 {
  "completed_in": 0.088122000000000006,
  "max_id": 11966285265,
  "next_page": "?page=2&max_id=11966285265&rpp=100&q=SNL",
  "page": 1,
  "query": "SNL",
  "refresh_url": "?since_id=11966285265&q=SNL",

...more...

Search Results (continued)

  "results": [
   {
    "created_at": "Sun, 11 Apr 2010 01:34:52 +0000",
    "from_user": "bieber_luv2",
    "from_user_id": 106998169,
    "geo": null,
    "id": 11966285265,
    "iso_language_code": "en",
    "metadata": {
     "result_type": "recent"
    },

...more...

Search Results (continued)

    "profile_image_url": "http://a1.twimg.com/profile_images/80...",
    "source": "&lt;a href=&quot;http://twitter.com/&quo...",
    "text": "im nt gonna go to sleep happy unless i see ...",
    "to_user_id": null
   }
   ... output truncated - 99 more tweets ...
  ],
  "results_per_page": 100,
  "since_id": 0
 },
 ... output truncated - 4 more pages ...
]

Lexical Diversity

•Ratio of unique terms to total terms

•A measure of "stickiness"?

•A measure of "group think"?

•A crude indicator of retweets to originally authored tweets?

Distilling Tweet Text

>>> # search_results is already defined
>>> tweets = [ r['text'] \
...     for result in search_results \
...         for r in result['results'] ]
>>> words = []
>>> for t in tweets:
...     words += [ w for w in t.split() ]
...

Analyzing Data

Lexical Diversity

>>> len(words)
7238
>>> # unique words
>>> len(set(words))
1636
>>> # lexical diversity
>>> 1.0*len(set(words))/len(words)
0.22602928985907708
>>> # average number of words per tweet
>>> 1.0*sum([ len(t.split()) for t in tweets ])/len(tweets)
14.476000000000001

Size Frequency Matters

•Counting: always the first step

•Simple but effective

•NLTK saves us a little trouble

Frequency Analysis

>>> import nltk
>>> freq_dist = nltk.FreqDist(words)
>>> freq_dist.keys()[:50] # 50 most frequent tokens
[u'snl', u'on', u'rt', u'is', u'to', u'i', u'watch', u'justin', u'@justinbieber', u'be', u'the', u'tonight', u'gonna', u'at', u'in', u'bieber', u'and', u'you', u'watching', u'tina', u'for', u'a', u'wait', u'fey', u'of', u'@justinbieber:', u'if', u'with', u'so', u"can't", u'who', u'great', u'it', u'going', u'im', u':)', u'snl...', u'2nite...', u'are', u'cant', u'dress', u'rehearsal', u'see', u'that', u'what', u'but', u'tonight!', u':d', u'2', u'will']
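For readers without NLTK installed, the same counting step can be approximated with the standard library. The following is a minimal Python 3 sketch rather than the talk's code: `collections.Counter` stands in for `nltk.FreqDist`, and the short `words` list is made up for illustration.

```python
from collections import Counter

# Hypothetical sample tokens; in the talk, `words` is built from search results.
words = "snl on rt snl is on snl tonight rt snl".split()

freq_dist = Counter(words)

# most_common(n) returns (token, count) pairs, most frequent first.
top_tokens = freq_dist.most_common(3)
print(top_tokens)
```

Counting with a plain dictionary-like object keeps the "counting is always the first step" idea visible without any extra dependency.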

Frequency Visualization

Tweet and RT were sitting on a fence. Tweet fell off. Who was left?

RTs: past, present, & future

•Retweet: Tweeting a tweet that's already been tweeted

•RT or via followed by @mention

•Example: RT @SocialWebMining Justin Bieber is on SNL 2nite. w00t?!?

•Relatively new APIs were rolled out last year for retweeting sans conventions

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski

Parsing Retweets

>>> example_tweets = ["Visualize Twitter search results w/ this simple script http://bit.ly/cBu0l4 - Gist instructions http://bit.ly/9SZ2kb (via @SocialWebMining @ptwobrussell)"]
>>> import re
>>> rt_patterns = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", \
...     re.IGNORECASE)
>>> rt_origins = []
>>> for t in example_tweets:
...     try:
...         rt_origins += [mention.strip() \
...             for mention in rt_patterns.findall(t)[0][1].split()]
...     except IndexError, e:
...         pass
>>> [rto.strip("@") for rto in rt_origins]
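The slide does not show the result of that last expression. As a rough illustration, the same regular expression can be wrapped in a small function and tried on the two example tweets from the slides. This is a Python 3 sketch; the function name `retweet_origins` is hypothetical and not part of the talk's code.

```python
import re

# Same pattern as on the slide: "RT" or "via" followed by one or more @mentions.
RT_PATTERN = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)

def retweet_origins(text):
    """Return the screen names credited by the first RT/via marker in a tweet."""
    match = RT_PATTERN.search(text)
    if match is None:
        return []
    # Group 2 holds the run of @mentions, e.g. " @SocialWebMining @ptwobrussell".
    return [mention.strip("@") for mention in match.group(2).split()]

print(retweet_origins("RT @SocialWebMining Justin Bieber is on SNL 2nite. w00t?!?"))
print(retweet_origins("Gist instructions http://bit.ly/9SZ2kb (via @SocialWebMining @ptwobrussell)"))
print(retweet_origins("Just watching SNL"))
```

Returning an empty list for tweets without a marker plays the role of the `IndexError` handler in the slide's loop.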

Visualizing Data

Graph Construction

>>> import networkx as nx
>>> g = nx.DiGraph()
>>> g.add_edge("@SocialWebMining", "@ptwobrussell", \
...     {"tweet_id" : 4815162342},)

Writing out DOT

OUT_FILE = "out_file.dot"

try:
    nx.drawing.write_dot(g, OUT_FILE)
except ImportError, e:
    # Fall back to writing the DOT file by hand
    dot = ['"%s" -> "%s" [tweet_id=%s]' % \
        (n1, n2, g[n1][n2]['tweet_id']) for n1, n2 in g.edges()]

    f = open(OUT_FILE, 'w')
    f.write('strict digraph {\n%s\n}' % (';\n'.join(dot),))
    f.close()

Example DOT Language

strict digraph {
  "@ericastolte" -> "bonitasworld" [tweet_id=11965974697];
  "@mpcoelho" -> "Lil_Amaral" [tweet_id=11965954427];
  "@BieberBelle123" -> "BELIEBE4EVER" [tweet_id=11966261062];
  "@BieberBelle123" -> "sabrina9451" [tweet_id=11966197327];
}

DOT to Image

•Download Graphviz: http://www.graphviz.org/

•$ dot -Tpng out_file.dot > graph.png

•Windows users might prefer GVEdit

Graphviz: Extreme Closeup

But you want more sexy?

Protovis: Extreme Closeup

It Doesn't Have To Be a Graph

Graph Connectedness

Part 2: Friends, Followers, and Setwise Operations

Insight Matters

•What is my potential influence?

•Who are the most popular people in my network?

•Who are my mutual friends?

•What common friends/followers do I have with @user?

•Who is not following me back?

•What can I learn from analyzing my friendship cliques?

Getting Data

OAuth (1.0a)

import twitter
from twitter.oauth_dance import oauth_dance

# Get these from http://dev.twitter.com/apps/new
consumer_key, consumer_secret = 'key', 'secret'

(oauth_token, oauth_token_secret) = oauth_dance('MiningTheSocialWeb',
    consumer_key, consumer_secret)

auth = twitter.oauth.OAuth(oauth_token, oauth_token_secret,
    consumer_key, consumer_secret)

t = twitter.Twitter(domain='api.twitter.com', auth=auth)

Getting Friendship Data

friend_ids = t.friends.ids(screen_name='timoreilly', cursor=-1)
follower_ids = t.followers.ids(screen_name='timoreilly', cursor=-1)

# store the data somewhere...

Perspective: Fetching all of Lady Gaga's ~7M followers would take ~4 hours

But there's always a catch...

Rate Limits

•350 requests/hr for authenticated requests

•150 requests/hr for anonymous requests

•Coping mechanisms:

•Caching & Archiving Data

•Streaming API

•HTTP 400 codes

•See http://dev.twitter.com/pages/rate-limiting
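The "~4 hours" estimate for fetching ~7M follower IDs can be roughly reproduced from the authenticated rate limit. The sketch below (Python 3) assumes that each followers/ids request returns up to 5,000 IDs; that page size is an assumption and is not stated on the slides.

```python
import math

followers = 7_000_000      # approximate follower count from the slide
ids_per_request = 5_000    # assumption: IDs returned by one followers/ids call
requests_per_hour = 350    # authenticated rate limit from the slide

requests_needed = math.ceil(followers / ids_per_request)
hours_needed = requests_needed / requests_per_hour

print(requests_needed, hours_needed)
```

Under the anonymous limit of 150 requests/hr the same harvest would take proportionally longer, which is one reason caching and archiving are listed as coping mechanisms.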

The Beloved Fail Whale

•Twitter is sometimes "overcapacity"

•HTTP 503 Error

•Handle it just as any other HTTP error

•RESTfulness has its advantages
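Handling a 503 "like any other HTTP error" usually comes down to waiting and retrying. The following is a minimal Python 3 sketch of that idea; it is not the `makeTwitterRequest` helper that appears on later slides (its implementation is not shown in the deck). The names `HTTPFailure` and `call_with_retries` are hypothetical, and `HTTPFailure` stands in for whatever exception the client library actually raises.

```python
import time

class HTTPFailure(Exception):
    """Hypothetical stand-in for the HTTP error raised by a client library."""
    def __init__(self, code):
        super().__init__("HTTP %d" % code)
        self.code = code

def call_with_retries(request, max_attempts=4, wait_period=2, sleep=time.sleep):
    """Call request(); on HTTP 503 wait and retry, doubling the wait each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except HTTPFailure as e:
            # Only "over capacity" (503) is treated as transient in this sketch;
            # any other code, or the final failed attempt, is re-raised.
            if e.code != 503 or attempt == max_attempts:
                raise
            sleep(wait_period)
            wait_period *= 2
```

A caller would pass a zero-argument function wrapping the API call, for example `call_with_retries(lambda: t.friends.ids(screen_name='timoreilly', cursor=-1))`. Rate-limit responses would need their own handling, such as sleeping until the limit resets, which this sketch leaves out.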

Abstraction Helps

friend_ids = []
wait_period = 2 # secs
cursor = -1

while cursor != 0:
    response = makeTwitterRequest(t, # twitter.Twitter instance
                                  t.friends.ids,
                                  screen_name=screen_name,
                                  cursor=cursor)

    friend_ids += response['ids']
    cursor = response['next_cursor']

    # break out of loop early if you don't need all ids

Abstracting Abstractions

screen_name = 'timoreilly'

# This is what you ultimately want...

friend_ids = getFriends(screen_name)
follower_ids = getFollowers(screen_name)

Storing Data

Flat Files?

./
  screen_name1/
    friend_ids.json
    follower_ids.json
    user_info.json
  screen_name2/
    ...
  ...

Pickles?

import cPickle

o = {
    'friend_ids' : friend_ids,
    'follower_ids' : follower_ids,
    'user_info' : user_info
}

f = open('screen_name1.pickle', 'wb')
cPickle.dump(o, f)
f.close()

A relational database?

import sqlite3 as sqlite

conn = sqlite.connect('data.db')
c = conn.cursor()

c.execute('''create table friends...''')

c.execute('''insert into friends... ''')

# Lots of fun...sigh...

Redis (A Data Structures Server)

import redis

r = redis.Redis()

[ r.sadd("timoreilly$friend_ids", i) for i in friend_ids ]

r.smembers("timoreilly$friend_ids") # returns a set

Windows binary: http://code.google.com/p/servicestack/wiki/RedisWindowsDownload

Project page: http://redis.io

Redis Set Operations

•Key/value store...on typed values!

•Common set operations

•smembers, scard

•sinter, sdiff, sunion

•sadd, srem, etc.

•See http://code.google.com/p/redis/wiki/CommandReference

•Don't forget to $ easy_install redis

Analyzing Data

Setwise Operations

•Union

•Intersection

•Difference

•Complement
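Before moving these operations into Redis, they can be tried with Python's built-in `set` type. This is a small Python 3 sketch; the ids are made up for illustration.

```python
# Hypothetical user ids, for illustration only.
friends = {1, 2, 3, 4}
followers = {3, 4, 5, 6, 7}

everyone = friends | followers            # union
mutual_friends = friends & followers      # intersection
not_following_back = friends - followers  # difference: friends who don't follow back
not_followed_back = followers - friends   # difference: followers who aren't friended

# A complement is always relative to some universe; here, everyone in either set.
non_friends = everyone - friends

print(mutual_friends, not_following_back, not_followed_back)
```

The Redis commands `sunion`, `sinter`, and `sdiff` listed earlier correspond to the `|`, `&`, and `-` operators here.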

Venn Diagrams

[Figure: Venn diagram of Friends and Followers with three regions: Friends - Followers, Friends ∩ Followers, and Followers - Friends]

Count Your Blessings

# A utility function
def getRedisIdByScreenName(screen_name, key_name):
    return 'screen_name$' + screen_name + '$' + key_name

# Number of friends
n_friends = r.scard(getRedisIdByScreenName(screen_name, 'friend_ids'))

# Number of followers
n_followers = r.scard(getRedisIdByScreenName(screen_name, 'follower_ids'))

Asymmetric Relationships

# Friends who aren't following back
friends_diff_followers = r.sdiffstore('temp',
    [ getRedisIdByScreenName(screen_name, 'friend_ids'),
      getRedisIdByScreenName(screen_name, 'follower_ids') ])
# ... compute interesting things ...
r.delete('temp')

Asymmetric Relationships

# Followers who aren't friended
followers_diff_friends = r.sdiffstore('temp',
    [ getRedisIdByScreenName(screen_name, 'follower_ids'),
      getRedisIdByScreenName(screen_name, 'friend_ids') ])
# ... compute interesting things ...
r.delete('temp')

Symmetric Relationships

mutual_friends = r.sinterstore('temp',
    [ getRedisIdByScreenName(screen_name, 'follower_ids'),
      getRedisIdByScreenName(screen_name, 'friend_ids') ])
# ... compute interesting things ...
r.delete('temp')

Sample Output

timoreilly is following 663

timoreilly is being followed by 1,423,704

131 of 663 are not following timoreilly back

1,423,172 of 1,423,704 are not being followed back by timoreilly

timoreilly has 532 mutual friends

Who Isn't Following Back?

import json

user_ids = [ ... ] # Resolve these to user info objects

while len(user_ids) > 0:
    # Look up users in batches of 100 ids per request
    user_ids_str = ','.join([ str(i) for i in user_ids[:100] ])
    user_ids = user_ids[100:]

    response = t.users.lookup(user_id=user_ids_str)

    if type(response) is dict:
        response = [response]

    # Index each user info object by user id and by screen name
    # (getRedisIdByUserId is assumed to mirror getRedisIdByScreenName)
    r.mset(dict([(getRedisIdByUserId(resp['id'], 'info.json'), json.dumps(resp))
                 for resp in response]))

    r.mset(dict([(getRedisIdByScreenName(resp['screen_name'], 'info.json'), json.dumps(resp))
                 for resp in response]))

Friends in Common

# Assume we've harvested friends/followers and it's in Redis...
screen_names = ['timoreilly', 'mikeloukides']

r.sinterstore('temp$friends_in_common',
    [getRedisIdByScreenName(screen_name, 'friend_ids')
     for screen_name in screen_names])

r.sinterstore('temp$followers_in_common',
    [getRedisIdByScreenName(screen_name, 'follower_ids')
     for screen_name in screen_names])

# Manipulate the sets

Potential Influence

•My followers?

•My followers' followers?

•My followers' followers' followers?

•for n in range(1, 7): # 6 degrees?
      print "My " + "followers' "*n + "followers?"

Saving a Thousand Words...

1

2 3

4 5 6 7

8 9 10 11 12 13 14 15

Depth = 3, Branching Factor = 2

Same Data, Different Layout

[Figure: the same 15-node tree (depth 3, branching factor 2) drawn with a different layout]

Space Complexity

Number of nodes by branching factor (rows) and depth (columns):

                    Depth
Branching Factor    1     2     3     4     5
2                   3     7    15    31    63
3                   4    13    40   121   364
4                   5    21    85   341  1365
5                   6    31   156   781  3906
6                   7    43   259  1555  9331
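The figures in the table are consistent with the node count of a full tree, 1 + b + b^2 + ... + b^d for branching factor b and depth d. A short Python 3 sketch that regenerates them (the function name `nodes_in_tree` is hypothetical):

```python
def nodes_in_tree(branching_factor, depth):
    """Total nodes in a full tree: 1 + b + b^2 + ... + b^depth."""
    return sum(branching_factor ** level for level in range(depth + 1))

# One row per branching factor 2..6, one column per depth 1..5.
for b in range(2, 7):
    print(b, [nodes_in_tree(b, d) for d in range(1, 6)])
```

With real follower counts in the hundreds or thousands rather than 2 to 6, the same formula shows why harvesting more than a couple of levels deep quickly becomes impractical.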

Breadth-First Traversal

Create an empty graph
Create an empty queue to keep track of unprocessed nodes

Add the starting point to the graph as the "root node"
Add the root node to a queue for processing

Repeat until some maximum depth is reached or the queue is empty:
    Remove a node from queue
    For each of the node's neighbors:
        If the neighbor hasn't already been processed:
            Add it to the graph
            Add it to the queue
        Add an edge to the graph connecting the node & its neighbor
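The traversal described above can be sketched in Python 3 against an in-memory adjacency dictionary. This is an illustrative sketch, not the harvesting code from the talk: `breadth_first_edges` and the `neighbors` lookup are hypothetical, and a real harvest would replace the dictionary lookup with API calls.

```python
from collections import deque

def breadth_first_edges(neighbors, root, max_depth):
    """Return (nodes, edges) reachable from root within max_depth hops.

    neighbors is a hypothetical lookup: a dict mapping a node to a list of
    adjacent nodes (for Twitter, e.g. a screen name to its followers).
    """
    nodes = {root}                # the "graph": every node seen so far
    edges = []
    queue = deque([(root, 0)])    # unprocessed nodes paired with their depth

    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue              # do not expand nodes at the depth limit
        for neighbor in neighbors.get(node, []):
            if neighbor not in nodes:
                nodes.add(neighbor)
                queue.append((neighbor, depth + 1))
            # The edge is recorded even when the neighbor was already seen.
            edges.append((node, neighbor))
    return nodes, edges

sample = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
nodes, edges = breadth_first_edges(sample, "a", 2)
print(sorted(nodes), edges)
```

Storing each node's depth in the queue is one way to enforce the maximum depth; the harvesting loop on the next slide instead swaps whole queues once per level.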

Breadth-First Harvest

next_queue = [ 'timoreilly' ] # seed node
d = 1

while d < depth:
    d += 1
    queue, next_queue = next_queue, []
    for screen_name in queue:
        follower_ids = getFollowers(screen_name=screen_name)
        next_queue += follower_ids
        getUserInfo(user_ids=next_queue)

The Most Popular Followers

freqs = {}
for follower in followers:
    cnt = follower['followers_count']
    if cnt not in freqs:
        freqs[cnt] = []

    freqs[cnt].append({'screen_name': follower['screen_name'],
                       'user_id': follower['id']})

popular_followers = sorted(freqs, reverse=True)[:100]

Average # of Followers

# One entry per follower: repeat each followers_count key once per user
all_freqs = [k for k in freqs for user in freqs[k]]
avg = sum(all_freqs) / len(all_freqs)

@timoreilly's Popular Followers

The top 10 followers from the sample:

aplusk 4,993,072
BarackObama 4,114,901
mashable 2,014,615
MarthaStewart 1,932,321
Schwarzenegger 1,705,177
zappos 1,689,289
Veronica 1,612,827
jack 1,592,004
stephenfry 1,531,813
davos 1,522,621

Futzing the Numbers

•The average number of timoreilly's followers' followers: 445

•Discarding the top 10 lowers the average to around 300

•Discarding any follower with fewer than 10 followers of their own increases the average to over 1,000!

•Doing both brings the average to around 800

The Right Tool For the Job: NetworkX for Networks

Friendship Graphs

for i in ids: # ids is timoreilly's id along with friend ids
    info = json.loads(r.get(getRedisIdByUserId(i, 'info.json')))
    screen_name = info['screen_name']
    friend_ids = list(r.smembers(getRedisIdByScreenName(screen_name, 'friend_ids')))
    for friend_id in [fid for fid in friend_ids if fid in ids]:
        friend_info = json.loads(r.get(getRedisIdByUserId(friend_id, 'info.json')))
        g.add_edge(screen_name, friend_info['screen_name'])

nx.write_gpickle(g, 'timoreilly.gpickle') # see also nx.read_gpickle

Clique Analysis

•Cliques

•Maximum Cliques

•Maximal Cliques

http://en.wikipedia.org/wiki/Clique_problem

Calculating Cliques

cliques = [c for c in nx.find_cliques(g)]

num_cliques = len(cliques)
clique_sizes = [len(c) for c in cliques]

max_clique_size = max(clique_sizes)
avg_clique_size = sum(clique_sizes) / num_cliques
max_cliques = [c for c in cliques if len(c) == max_clique_size]
num_max_cliques = len(max_cliques)

people_in_every_max_clique = list(reduce(
    lambda x, y: x.intersection(y), [set(c) for c in max_cliques]))

Cliques for @timoreilly

Num cliques: 762573
Avg clique size: 14
Max clique size: 26
Num max cliques: 6
Num people in every max clique: 20

Visualizing Data

Graphs, etc

•Your first instinct is naturally G = (V, E) ?

Dorling Cartogram

•A location-aware bubble chart (ish)

•At least 3-dimensional

•Position, color, size

•Look at friends/followers by state

Sunburst of Friends

•A very compact visualization

•Slice and dice friends/followers by gender, country, locale, etc.

Part 3: The Tweet, the Whole Tweet, and Nothing but the Tweet

Insight Matters

•Which entities frequently appear in @user's tweets?

•How often does @user talk about specific friends?

•Who does @user retweet most frequently?

•How frequently is @user retweeted (by anyone)?

•How many #hashtags are usually in @user's tweets?

Pen : Sword :: Tweet : Machine Gun (?!?)

Getting Data

Let me count the APIs...

•Timelines

•Tweets

•Favorites

•Direct Messages

•Streams

Anatomy of a Tweet (1/2)

{
  "created_at" : "Thu Jun 24 14:21:11 +0000 2010",
  "id" : 16932571217,
  "text" : "Great idea from @crowdflower: Crowdsourcing ... #opengov",
  "user" : {
    "description" : "Founder and CEO, O'Reilly Media. Watching the alpha geeks...",
    "id" : 2384071,
    "location" : "Sebastopol, CA",
    "name" : "Tim O'Reilly",
    "screen_name" : "timoreilly",
    "url" : "http://radar.oreilly.com"
  },

...

Anatomy of a Tweet (2/2)

...

  "entities" : {
    "hashtags" : [
      {"indices" : [ 97, 103 ], "text" : "gov20"},
      {"indices" : [ 104, 112 ], "text" : "opengov"}
    ],
    "urls" : [
      {"expanded_url" : null, "indices" : [ 76, 96 ], "url" : "http://bit.ly/9o4uoG"}
    ],
    "user_mentions" : [
      {"id" : 28165790, "indices" : [ 16, 28 ], "name" : "crowdFlower", "screen_name" : "crowdFlower"}
    ]
  }
}

Entities & Annotations

•Entities

•Opt-in now but will "soon" be standard

• $ easy_install twitter_text

•Annotations

•User-defined metadata

•See http://dev.twitter.com/pages/annotations_overview
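When a tweet already carries an `entities` field, as in the anatomy shown earlier, the hashtags, URLs, and mentions can be read straight from it. A minimal Python 3 sketch follows; the `tweet` dict is a trimmed copy of the slide's example, and `entity_summary` is a hypothetical helper name.

```python
# A trimmed copy of the tweet from the slides, as a Python dict.
tweet = {
    "text": "Great idea from @crowdflower: Crowdsourcing ... #opengov",
    "entities": {
        "hashtags": [{"indices": [97, 103], "text": "gov20"},
                     {"indices": [104, 112], "text": "opengov"}],
        "urls": [{"expanded_url": None, "indices": [76, 96],
                  "url": "http://bit.ly/9o4uoG"}],
        "user_mentions": [{"id": 28165790, "indices": [16, 28],
                           "name": "crowdFlower",
                           "screen_name": "crowdFlower"}],
    },
}

def entity_summary(tweet):
    """Collect hashtag texts, URLs, and mentioned screen names from a tweet."""
    entities = tweet.get("entities", {})
    return {
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "urls": [u["url"] for u in entities.get("urls", [])],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
    }

summary = entity_summary(tweet)
print(summary)
```

Tweets without an `entities` field yield empty lists here, which is the case where manual extraction from the tweet text becomes necessary.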

Manual Entity Extraction

import twitter_text

extractor = twitter_text.Extractor(tweet['text'])

mentions = extractor.extract_mentioned_screen_names_with_indices()
hashtags = extractor.extract_hashtags_with_indices()
urls = extractor.extract_urls_with_indices()

# Splice info into a tweet object

Storing Data

Storing Tweets

•Flat files? (Really, who does that?)

•A relational database?

•Redis?

•CouchDB (Relax...?)

CouchDB: Relax

•Document-oriented key/value

•Map/Reduce

•RESTful API

•Erlang

As easy as sitting on the couch

•Get it - http://www.couchone.com/get

• Install it

•Relax - http://localhost:5984/_utils/

•Also - $ easy_install couchdb

Storing Timeline Data

import couchdb
import twitter

TIMELINE_NAME = "user" # or "home" or "public"
# DB (the database name) and MAX_PAGES are assumed to be defined elsewhere

t = twitter.Twitter(domain='api.twitter.com', api_version='1')

server = couchdb.Server('http://localhost:5984')
db = server.create(DB)

page_num = 1
while page_num <= MAX_PAGES:
    api_call = getattr(t.statuses, TIMELINE_NAME + '_timeline')
    tweets = makeTwitterRequest(t, api_call, page=page_num)
    db.update(tweets, all_or_nothing=True)
    print 'Fetched %i tweets' % len(tweets)
    page_num += 1

Analyzing & Visualizing Data

Approach: Map/Reduce on Tweets

Map/Reduce Paradigm

•Mapper: yields key/value pairs

•Reducer: operates on keyed mapper output

•Example: Computing the sum of squares

•Mapper Input: (k, [2,4,6])

•Mapper Output: (k, [4,16,36])

•Reducer Input: [(k, 4,16), (k, 36)]

•Reducer Output: 56
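The sum-of-squares example can be traced in a few lines of plain Python 3. This sketch only illustrates the mapper/reducer split; it does not use CouchDB's map/reduce API, and the function names are hypothetical.

```python
from functools import reduce

def mapper(key, values):
    """Emit one (key, value squared) pair per input value."""
    for v in values:
        yield key, v * v

def reducer(total, pair):
    """Fold one mapped pair into the running sum."""
    _key, squared = pair
    return total + squared

mapped = list(mapper("k", [2, 4, 6]))
sum_of_squares = reduce(reducer, mapped, 0)

print(mapped, sum_of_squares)
```

Because addition is associative, the reducer gives the same total whether it sees all mapped values at once or in separate batches.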

Which entities frequently appear in @mention's tweets?

@timoreilly's Tweet Entities

How often does @timoreilly mention specific friends?

Filtering Tweet Entities

•Let's find out how often someone talks about specific friends

•We have friend info on hand

•We've extracted @mentions from the tweets

•Let's count friend vs non-friend mentions

@timoreilly's friend mentions

Number of @user entities in tweets: 20

Number of @user entities in tweets who are friends: 18
ahier pkedrosky CodeforAmerica nytimes brady carlmalamud pahlkadot make jamesoreilly andrewsavikas gnat slashdot OReillyMedia dalepd mikeloukides monkchips fredwilson digiphile

Number of @user entities in tweets who are not friends: 2
n2vip timoreilly

Who does @timoreilly retweet most frequently?

Counting Retweets

•Map @mentions out of tweets using a regex

•Reduce to sum them up

•Sort the results

•Display results
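The four steps above can be sketched without a database. The Python 3 example below reuses the retweet regular expression from Part 1; it is an in-memory stand-in for the talk's map/reduce job, and both the sample tweets and the function names are made up for illustration.

```python
import re
from collections import Counter

RT_PATTERN = re.compile(r"(RT|via)((?:\b\W*@\w+)+)", re.IGNORECASE)

def map_retweeted_users(tweet_text):
    """Map step: yield (screen_name, 1) for each user credited via RT/via."""
    match = RT_PATTERN.search(tweet_text)
    if match:
        for mention in match.group(2).split():
            yield mention.strip("@").lower(), 1

def count_retweets(tweet_texts):
    """Reduce step: sum the 1s per screen name, then sort by count."""
    counts = Counter()
    for text in tweet_texts:
        for screen_name, n in map_retweeted_users(text):
            counts[screen_name] += n
    return counts.most_common()

# Made-up tweets for illustration.
sample = [
    "RT @gnat: Something worth reading",
    "Nice visualization (via @mikeloukides)",
    "RT @gnat another good link",
    "No retweet here",
]
ranking = count_retweets(sample)
print(ranking)
```

Lowercasing the screen names is a design choice in this sketch so that differently capitalized mentions of the same user are counted together.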

Retweets by @timoreilly

How frequently is @timoreilly retweeted?

Retweet Counts

•An API resource /statuses/retweet_count exists (and is now functional)

•Example: http://twitter.com/statuses/show/29016139807.json

•retweet_count

•retweeted

Survey Says...

@timoreilly is retweeted about 2/3 of the time

How often does @timoreilly include #hashtags in tweets?

Counting Hashtags

•Use a mapper to emit #hashtag entities for tweets

•Use a reducer to sum them all up

•Been there, done that...

Survey Says...

About 1 out of every 3 tweets by @timoreilly contains #hashtags

But if you order within the next 5 minutes...

Bonus Material:

What do #JustinBieber and #TeaParty have in common?

Tweet Entities

#bieberblast #Eclipse #somebodytolove http://bit.ly/aARD4t http://bit.ly/b2Kc1L #Escutando #justinBieber #Restart #TT #Telezwerge @rheinzeitung #WTF

http://tinyurl.com/343kax4 @JustBieberFact @TinselTownDirt #beliebers #BieberFact #Celebrity #Dschungel @_Yassi_ #musicmonday #video #tickets

#music @justinbieber #nowplaying #Justinbieber #JUSTINBIEBER #Proform http://migre.me/TJwj @ProSieben @lojadoaltivo #JustinBieber #justinbieber

#JustinBieber co-occurrences

@blogging_tories #cdnpoli #fail #nra #roft @BrnEyeSuss @crispix49 @koopersmith @Kriskxx #Kagan @Liliaep #nvsen @First_Patriots #patriot #pjtv @andilinks @RonPaulNews #ampats #cnn #jews #GOPDeficit #wethepeople #asamom @thenewdeal #AFIRE #Dems @JIDF

@STOPOBAMA2012 @TheFlaCracker #palin2012 #AZ #TopProg #conservative http://tinyurl.com/386k5hh @ResistTyranny #tsot @ALIPAC #majority #NoAmnesty #patriottweets @Drudge_Report #military #palin12 #rnc #TCOT http://tinyurl.com/24h36zq #spwbt @welshman007 #FF #liberty #glennbeck #news #oilspill #rs #Teaparty

#jcot #tweetcongress #Obama #topprog #palin #dems #acon #cspj #immigration #politics #hhrs #TeaParty #vote2010 #libertarian #obama #ucot #iamthemob #GOP #tpp #dnc #twisters #sgp #ocra #gop #tlot #p2 #tcot #teaparty

#TeaParty co-occurrences

Hashtag Distributions

Hashtag Analysis

•TeaParty: ~ 5 hashtags per tweet.

•Example: “Rarely is the questioned asked: Is our children learning?” - G.W. Bush #p2 #topprog #tcot #tlot #teaparty #GOP #FF

•JustinBieber: ~ 2 hashtags per tweet

•Example: #justinbieber is so coool

Common #hashtags

#lol #jesus #worldcup #teaparty #AZ #milk #ff #guns #WorldCup #bp #News

#dancing #music #glennbeck @addthis #nowplaying #news #WTF #fail #toomanypeople #oilspill #catholic

Retweet Patterns

Retweet Behaviors

Friendship Networks

Juxtaposing Friendships

•Harvest search results for #JustinBieber and #TeaParty

•Get friend ids for each @mention with /friends/ids

•Resolve screen names with /users/lookup

•Populate a NetworkX graph

•Analyze it

•Visualize with Graphviz

Nodes Degrees

Two Kinds of Hairballs...

#TeaParty #JustinBieber

The world twitterverse is your oyster

Mining the Social Web

• Twitter : @SocialWebMining

• GitHub: http://bit.ly/socialwebmining

• Facebook: http://facebook.com/MiningTheSocialWeb
