16
Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining Thomas Gottron [email protected]

Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

  • View
    217

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Institut für Web Science & Technologien - WeST

Challenges in Mining Social Media

Sparsity and Quality

Dagstuhl Seminar 11171

Challenges in Document Mining

Thomas [email protected]

Page 2: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 2Challenges in Document Mining

Social Media

Definition from Wikipedia:Social media are media for social interaction, using highly accessible and scalable communication techniques. Social media is the use of web-based and mobile technologies to turn communication into interactive dialogue.

Social authority is developed when an individual or organization establishes themselves as an "expert" in their given field or area, thereby becoming an influencer in that field or area.

Prominent examples: Media sharing, collaboration, reviews, communication MicroBlogging

• Twitter & Co. (also Facebook)

Page 3: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 3Challenges in Document Mining

Microblogging – Twitter

My dear @johndoe had

troubles to wake up this #morning

Followers

@janedoe

RT @janedoe: My dear @johndoe had troubles to wake up this

#morning

Page 4: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 4Challenges in Document Mining

Microblogging – Sparsity

Twitter: 140 characters, few terms

0 1 2 3 4 5 6 7 8 9 101112131415161718192021222324252627282930313233343536394142434446471

10

100

1000

10000

100000

1000000

10000000

100000000

Max TF in Tweet

# T

wee

ts

85% of all tweets do not contain any term more than once

Page 5: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 5Challenges in Document Mining

Microblogging – Quality

Facets/aspects of quality:

Question: Which is the best Online RSS Reader? I need some recommendations, cheers everyone :)

My kitten is pretending to be a laptop

im on the phone rite now

Interesting timeline of major events in the history of information retrieval http://tinyurl.com/ya7rcqt

Purpose (interaction, news propagation, etc.)

Presentation (humor, irony, etc.)

Language (writing style)

Interestingness

Page 6: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 6Challenges in Document Mining

Measuring Quality?

(Social) Network measures PageRank Clustering coefficients Centrality measures.

Quality of people, not messages!

Page 7: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 7Challenges in Document Mining

Retweets

Sign of quality interesting for wider

audience Depends on

Content Social network

• Number of followers• Activity of followers

Content based retweet prediction

Odds of retweet as sign of quality

Page 8: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 8Challenges in Document Mining

Retweets – Features

Feature Dimensions Type

Message feature

Direct message BooleanUsername BooleanHashtag BooleanURL Boolean

SentimentValence RealArousal RealDominance Real

Emoticons Positive BooleanNegative Boolean

Exclamation Positive BooleanNegative Boolean

Punctuation ! Boolean? Boolean

Terms Odds Real (pos)Topic 100 Topics Real (pos)

Page 9: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 9Challenges in Document Mining

Retweets – Prediction Model

Logistic regression

Model parameters learned on training data

Dataset Users Tweets Retweets

Choudhury 118,506 9,998,756 7.89%

Choudhury (extended) 277,666 29,000,000 8.64%

Petrovic 4,050,944 21,477,484 8.46%

Page 10: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 10Challenges in Document Mining

Feature Weights

Feature Dimensions WeightConstant (intercept) -5.45

Message feature

Direct message -147.89Username 146.82Hashtag 42.27URL 249.09

SentimentValence -26.88Arousal 33.97Dominance 19.56

Emoticons Positive -21.8Negative 9.94

Exclamation Positive 13.66Negative 8.72

Punctuation ! -16.85? 23.67

Terms Odds 19.79

Page 11: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 11Challenges in Document Mining

Feature Weights – Topics

Topic Weight

social media market post site web tool traffic network 27.54

follow thank twitter welcome hello check nice cool people 16.08

credit money market business rate economy home 15.25

christmas shop tree xmas present today wrap finish 2.87

home work hour long wait airport week flight head -14.43

twitter update facebook account page set squidoo check -14.43

cold snow warm today degree weather winter morning -26.56

night sleep work morning time bed feel tired home -75.19

Page 12: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 12Challenges in Document Mining

Application: Tweet retrieval

Query: „beer“

Rank Username Tweet1 LoriAG beer2 Crushdwinebar beer!!3 Skippertaylor BEER4 BigMacScola Beer5 VANiamore beer.......6 CindyMcManis To beer or not to beer on Beer Summit ?7 silverlakewine beer beer beer beer beer beer beer. Simple 3pm8 eldoradobar http://ping.fm/p/Bnra7 - In!!! BEER, BEER, BEER, BEER,

BEER, BEER, BEER, BEER, BEER, BEER,

9 tonx Lompoc. beer beer beer beer beer beer beer beer beer beer. http://twitpic.com/l68ld

10 punkeyfunky Beer beer beer beer beer beer beer beer beer beer beer beer beer. Er, guess what I'm looking forward to?

Page 13: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 13Challenges in Document Mining

Application: Tweet retrieval

Rerank top-100 according to retweet-odds

Rank Username Tweet1 BeeracrossTX UK beer mag declares "the end of beer writing." @StanHieronymus says not

so in the US. http://bit.ly/424HRQ #beer

2 narmmusic beer summit @bspward @jhinderaker no one had billy beer? heehee #narm - beer summit @bspward @jhinde http://tinyurl.com/n29oxj

3 beeriety Go green and turn those empty beer bottles into recycled beer glasses! | http://bit.ly/2src7F #beer #recycle (via: @td333)

4 hblackmon Great Divide beer dinner @ Porter Beer Bar on 8/19 - $45 for 3 courses + beer pairings. http://trunc.it/172wt

5 nycraftbeer Interesting Concept-Beer Petitions.com launches&hopes 2help craft beer drinkers enjoy beer they want @their fave pubs. http://bit.ly/11gJQN

6 carichardson Beer Cheddar Soup: Dish number two in my famed beer dinner series is Beer Cheddar Soup. I hadn’t had too.. http://bit.ly/1diDdF

7 BeerBrewing New York City Beer Events - Beer Tasting - New York Beer Festivals - New York Craft Beer http://is.gd/39kXj #beer

8 delphiforums Love beer? Our member is trying to build up a new beer drinker's forum. Grab a #beer and join us: http://tr.im/pD1n

9 Jamie_Mason #Baltimore Beer Week continues w/ a beer brkfst, beer pioneers luncheon, drink & donate event, beer tastings & more. http://ping.fm/VyTwg

10 carichardson Seattle and Beer: I went to Seattle last weekend. It was my friend’s stag - he likes beer - we drank beer.. http://tinyurl.com/cpb4n9

Page 14: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 14Challenges in Document Mining

Application: Tweet retrieval – Evaluation

1 2 3 40

10

20

30

40

50

60

70

80

90

Retrieval Performance

Relevance Retweet Odds

Query Length

MAP

1 2 3 40

1

2

3

4

5

6

User Preference for System

Relevance Retweet Odds

Query Length

Pref

erre

d re

sults

ets

Page 15: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 15Challenges in Document Mining

Summary & Outlook

Microblogging Data sparsity in short messages Quality is an issue

Interestingness: (one) sign of quality

Use Retweet odds for better ranking Notion of content quality

Influence / Potential influence of users.

Page 16: Institut für Web Science & Technologien - WeST Challenges in Mining Social Media Sparsity and Quality Dagstuhl Seminar 11171 Challenges in Document Mining

Thomas Gottron Challenges in Mining Social Media 16Challenges in Document Mining

Thank you!

Contact:WeST – Institute for Web Science and Technologies

Universität Koblenz-Landau

[email protected]