9th European Summer School in Information Retrieval September 4th, 2013 http://bit.ly/ESSIR13IRSocMedia IR and Social Media Arjen P. de Vries [email protected] Centrum Wiskunde & Informatica Delft University of Technology Spinque B.V.

ESSIR 2013 - IR and Social Media

DESCRIPTION

Social media sites (referred to by some as Web 2.0) allow their users to interact with each other, for example by collecting and sharing so-called user-generated content; these can be just bookmarks, but also blogs, images, and videos. Social media support co-creation: processes where customers (or users, if you prefer) do not just consume but play an active role in defining and shaping the end product. Famous examples include Six Degrees, LiveJournal, Digg, Epinions, Myspace, Flickr, YouTube, LinkedIn, and Pinterest. Of course, today's internet giants Facebook and Twitter are key new developments. Finally, Wikipedia should not be overlooked: it is a major resource in many language technologies, including information retrieval!

The second part of the lecture looks into the opportunities for information retrieval research. Social media platforms tend to provide access to user profiles, connections between users, the content these users publish or share, and how they react to each other's content through commenting and rating. Also, the large majority of social media platforms allow their users to categorize content by means of tags (or, in direct communication, through hashtags), resulting in collaborative ways of organizing information known as folksonomies. However, social media also form a challenge for information retrieval research: the many platforms vary in functionality, and we have very little understanding of clearly desirable features like combining tag usage and ratings in content recommendation! A unifying approach based on random walks will be discussed to illustrate how we can answer some of these questions [1], but clearly the area offers ample opportunity to leave your own mark.

In the final part of the lecture I will briefly touch upon an even wider range of opportunities, where data derived from social media form a key component to enable new research and insights. I will review a few important results from research centered on Wikipedia, Facebook and Twitter data, as well as a diverse range of new information sources, including the geographic and temporal information derived from images and tweets, product reviews and comments on YouTube videos, and how URL shorteners may give a view on what is popular on the web.

[1] Maarten Clements, Arjen P. de Vries, and Marcel J. T. Reinders. 2010. The task-dependent effect of tags and ratings on social media access. ACM Trans. Inf. Syst. 28, 4, Article 21 (November 2010), 42 pages. http://doi.acm.org/10.1145/1852102.1852107

Page 1: ESSIR 2013 - IR and Social Media

9th European Summer School in Information Retrieval September 4th, 2013

http://bit.ly/ESSIR13IRSocMedia

IR and Social Media

Arjen P. de Vries [email protected]

Centrum Wiskunde & Informatica

Delft University of Technology

Spinque B.V.

Page 2: ESSIR 2013 - IR and Social Media

On SlideShare, IR = Investor Relations

Page 3: ESSIR 2013 - IR and Social Media

Social Media

Noun

social media (plural only)

Interactive forms of media that allow users to interact with and publish to each other, generally by means of the Internet.

The early 21st century saw a huge increase in social media thanks to the widespread availability of the Internet.

Page 4: ESSIR 2013 - IR and Social Media

http://www.webanalyticsworld.net/2010/11/history-of-social-media-infographic.html

Page 5: ESSIR 2013 - IR and Social Media

Social Media

“Social bookmarking” sites
“User generated content”
  Images (Flickr) and videos (YouTube, Vimeo), but also blogs
Social network services
  Twitter, Facebook

Page 6: ESSIR 2013 - IR and Social Media

Not just one beast!

Page 7: ESSIR 2013 - IR and Social Media
Page 8: ESSIR 2013 - IR and Social Media
Page 9: ESSIR 2013 - IR and Social Media

IR and Social Media?

Page 10: ESSIR 2013 - IR and Social Media

Red Hot Chili Peppers

Page 11: ESSIR 2013 - IR and Social Media

“Rock group” in author’s metadata...

Organisation in groups may help disambiguate query!

More implicit metadata...

Page 12: ESSIR 2013 - IR and Social Media

Information Science

“Search for the fundamental knowledge which will allow us to postulate and utilize the most efficient combination of [human and machine] resources”

M.E. Senko. Information systems: records, relations, sets, entities, and things. Information systems, 1(1):3–13, 1975.

Page 13: ESSIR 2013 - IR and Social Media

Core Questions

How to represent information?
  The information need and search requests
  The objects to be shown in response to an information request

How to match information representations?

Page 14: ESSIR 2013 - IR and Social Media

IR and Social Media

Richer information representations!

Page 15: ESSIR 2013 - IR and Social Media

Richer representations

User profiles
  User name, full name, description, image, homepage URL, etc.
Connections between users
  Networks of friends, followers, etc.
Comments/reactions
  Endorsing and sharing

Page 16: ESSIR 2013 - IR and Social Media

Q: Is the Web ancient social media?

Page 17: ESSIR 2013 - IR and Social Media

(C) 2008, The New York Times Company

Anchor text: “continue reading”

Page 18: ESSIR 2013 - IR and Social Media

Not a lot of info to represent the page…

A fan’s Hyves page: Kyteman's HipHop Orchestra: www.kyteman.com

Ticket sales Luxor Theater: May 22 - Kyteman's Hiphop Orkest - www.kyteman.com

Kluun.nl: Kyteman's site

Blog Rockin’ Beats: The 21-year-old Kyteman (trumpeter, composer and producer Colin Benders) has worked for three years on his debut: the Hermit Sessions.

Jazzenzo: ...a performance by the popular Kyteman’s Hiphop Orkest

Page 19: ESSIR 2013 - IR and Social Media
Page 20: ESSIR 2013 - IR and Social Media

‘Co-creation’

Social Media: Consumer becomes a co-creator
‘Data consumption’ traces
In essence: many new sources to play the role of anchor text
  Tags and/or ratings
  Tweets
  Comments, reviews

Page 21: ESSIR 2013 - IR and Social Media

Potential Benefits for IR

Expand content representation
Reduce the vocabulary gap(s) between creators of content, indexers, and users
More diverse views on the same content

Page 23: ESSIR 2013 - IR and Social Media

Potential Benefits for IR

Relevance depends on user context
  User task
  User knowledge
Social media provide an opportunity to make much better assumptions about user context
  A specific user’s context
  The variety of user contexts that may exist

Page 24: ESSIR 2013 - IR and Social Media

Maarten Clements, Arjen P. de Vries and Marcel J.T. Reinders.

The task dependent effect of tags and ratings on social media access.

TOIS 28, 4, article 21 (November 2010), 42 pages.

Page 25: ESSIR 2013 - IR and Social Media

LibraryThing

Page 27: ESSIR 2013 - IR and Social Media

Synonyms

Page 28: ESSIR 2013 - IR and Social Media

Synonyms

Page 29: ESSIR 2013 - IR and Social Media
Page 30: ESSIR 2013 - IR and Social Media

Examples

Humour

Classic

Page 32: ESSIR 2013 - IR and Social Media
Page 33: ESSIR 2013 - IR and Social Media

Search with Random Walk

Rank nodes by the estimated probability that a random walk, started from the (task-dependent) starting nodes, ends at that node

E.g., tag suggestion starts in a tag node; personalized search in tag and user nodes
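
The ranking idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the model of the cited TOIS paper: the toy tagging triples, the node prefixes (`u:`, `i:`, `t:`), the fixed walk length `steps` and the self-transition probability `stay` are all made up for the example.

```python
from collections import defaultdict

def build_graph(triples):
    """Undirected graph linking the user, item and tag of every tagging action."""
    neighbours = defaultdict(set)
    for user, item, tag in triples:
        for a, b in ((user, item), (user, tag), (item, tag)):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours

def walk(neighbours, start_nodes, steps=3, stay=0.5):
    """Probability of being at each node after `steps` steps of a lazy walk
    that starts uniformly at random in one of `start_nodes`."""
    prob = {node: 1.0 / len(start_nodes) for node in start_nodes}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in prob.items():
            nbs = neighbours.get(node, ())
            if not nbs:  # isolated node: the walker cannot move
                nxt[node] += p
                continue
            nxt[node] += stay * p  # remain at the current node
            for nb in nbs:  # or move to a uniformly chosen neighbour
                nxt[nb] += (1.0 - stay) * p / len(nbs)
        prob = dict(nxt)
    return prob

def rank_items(prob):
    """Item nodes (prefix "i:") sorted by decreasing end probability."""
    items = [n for n in prob if n.startswith("i:")]
    return sorted(items, key=prob.get, reverse=True)

# Hypothetical tagging actions: (user, item, tag).
triples = [
    ("u:dev", "i:java_in_action", "t:java"),
    ("u:dev", "i:java_in_action", "t:programming"),
    ("u:dev", "i:python_cookbook", "t:programming"),
    ("u:traveller", "i:java_travel_guide", "t:java"),
    ("u:traveller", "i:java_travel_guide", "t:travel"),
    ("u:traveller", "i:bali_travel_guide", "t:travel"),
]
graph = build_graph(triples)

tag_only = walk(graph, ["t:java"])           # plain tag search
personal = walk(graph, ["t:java", "u:dev"])  # personalized search
print(rank_items(personal))
```

Starting in the tag node alone treats both "java" items alike in this symmetric toy graph; adding the user node to the starting set shifts probability mass towards the items near that user.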

Page 34: ESSIR 2013 - IR and Social Media

Tagging Relationships

Page 35: ESSIR 2013 - IR and Social Media
Page 36: ESSIR 2013 - IR and Social Media

An item recommendation walk

Page 37: ESSIR 2013 - IR and Social Media

Ratings

Ratings may enhance the graph, or just be used for evaluation

Page 38: ESSIR 2013 - IR and Social Media

Personalized Search

Assume a user who types a single tag as query

Page 39: ESSIR 2013 - IR and Social Media

Personalized Search

Page 40: ESSIR 2013 - IR and Social Media

A soft clustering effect smoothly relates similar concepts before converging to the background probability

Page 41: ESSIR 2013 - IR and Social Media

Homographs like “Java” are disambiguated because the walk starts in both the query tag and the target user

So, content that matches the user’s preference is more likely to be found first

Page 42: ESSIR 2013 - IR and Social Media

Common System Designs

Page 43: ESSIR 2013 - IR and Social Media

Analysis results

Allowing all users to tag all available content improves retrieval tasks

Combining tags and ratings may improve both search and recommendation tasks

Page 45: ESSIR 2013 - IR and Social Media

Ternary relation lost!

The UIT matrix represents a ternary relation that is lost when creating the three UI, IT and UT matrices
Potentially a problem if tags express opinion about an item; e.g.,
  “poetry” can, independent of the item, still describe the user
  “awful” requires knowing which item the term belongs to
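
A small sketch can make the information loss concrete. It assumes a binary (set-based) view of the user-item-tag relation rather than count matrices, and the users, items and tags are invented for the illustration.

```python
def project(triples):
    """Split the ternary user-item-tag relation into three binary relations."""
    ui = {(u, i) for u, i, t in triples}
    it = {(i, t) for u, i, t in triples}
    ut = {(u, t) for u, i, t in triples}
    return ui, it, ut

def rejoin(ui, it, ut):
    """All triples that are consistent with the three binary relations."""
    return {
        (u, i, t)
        for (u, i) in ui
        for (i2, t) in it
        if i == i2 and (u, t) in ut
    }

# Hypothetical tagging actions.
triples = {
    ("u:ann", "i:a", "t:awful"),
    ("u:ann", "i:b", "t:poetry"),
    ("u:bob", "i:b", "t:awful"),
}
rebuilt = rejoin(*project(triples))
spurious = rebuilt - triples
print(spurious)
```

The rebuilt relation contains a triple claiming that ann called item b "awful", which she never did: the pairwise views cannot tell whose opinion applies to which item.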

Page 46: ESSIR 2013 - IR and Social Media
Page 47: ESSIR 2013 - IR and Social Media

Tags vs. rating

Most tags do not deviate far from the mean rating

Only few tags strongly correlated with opinion
  Note: poetry higher quality than chicklit
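
One way to look at how far a tag deviates from the mean rating is sketched below. The (tag, rating) pairs are invented to show the computation and are not the data behind this slide; the function name `tag_rating_deviation` is likewise an assumption.

```python
from collections import defaultdict
from statistics import mean

def tag_rating_deviation(tagged_ratings):
    """Mean rating of the items carrying each tag, minus the overall mean rating."""
    by_tag = defaultdict(list)
    for tag, rating in tagged_ratings:
        by_tag[tag].append(rating)
    overall = mean(rating for _, rating in tagged_ratings)
    return {tag: mean(ratings) - overall for tag, ratings in by_tag.items()}

# Hypothetical (tag, rating) observations on a 1-5 scale.
observations = [
    ("poetry", 5), ("poetry", 4),
    ("chicklit", 3), ("chicklit", 2),
    ("fiction", 3), ("fiction", 4),
    ("awful", 1),
]
deviation = tag_rating_deviation(observations)
for tag in sorted(deviation, key=deviation.get):
    print(f"{tag:10s} {deviation[tag]:+.2f}")
```

Tags with a deviation close to zero carry little opinion; only tags far from zero are candidates for opinion-bearing terms.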

Page 48: ESSIR 2013 - IR and Social Media

Metadata

Scientific articles have many types of metadata associated:
  Abstract
  Author
  Booktitle
  Description
  Journal
  Tags

Are all these types of metadata useful for item recommendation?

Page 49: ESSIR 2013 - IR and Social Media

Metadata

According to Toine Bogers’ PhD thesis:
  Concatenate all fields associated with the items in a single user’s profile into one huge text field, and use an off-the-shelf IR model to match the profile against the metadata of the items: “Profile-centric Matching”
  Or, construct item profiles from the metadata of all users for that item, and apply an item-based collaborative filtering approach: “Item-based Hybrid Filtering”
Author, description, tags, title, url, journal and booktitle all contribute
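
The profile-centric idea can be sketched as follows. TF-IDF with cosine similarity stands in here for "an off-the-shelf IR model"; that choice, the toy items and the function names are assumptions for illustration and not the setup used in the thesis.

```python
import math
from collections import Counter

def bag(fields):
    """Concatenate all metadata fields of one item into a bag of words."""
    return Counter(" ".join(fields.values()).lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(items, profile_ids):
    """Rank the items outside the profile by similarity to the concatenated
    metadata of the items inside the profile."""
    bags = {item_id: bag(fields) for item_id, fields in items.items()}
    n_docs = len(bags)
    df = Counter(term for b in bags.values() for term in b)

    def weigh(b):
        # Smoothed TF-IDF weights.
        return {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
                for t, c in b.items()}

    profile = Counter()
    for item_id in profile_ids:
        profile.update(bags[item_id])  # one huge text field per user
    weighted_profile = weigh(profile)
    scores = {item_id: cosine(weighted_profile, weigh(b))
              for item_id, b in bags.items() if item_id not in profile_ids}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical article metadata.
items = {
    "p1": {"title": "tag recommendation with random walks",
           "tags": "tagging recommendation", "journal": "journal_a"},
    "p2": {"title": "personalized tag search",
           "tags": "tagging search", "journal": "journal_a"},
    "p3": {"title": "protein folding simulation",
           "tags": "biology", "journal": "journal_b"},
}
ranking = recommend(items, ["p1"])
print(ranking)
```

Because every field is simply concatenated, adding or dropping a metadata type only changes which strings go into `fields`, which makes it easy to test the contribution of each type.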

Page 50: ESSIR 2013 - IR and Social Media

Finally: a recent case study

Page 51: ESSIR 2013 - IR and Social Media

Artist Popularity?

Let’s ask widely used social media music platforms!
  I.e., query their APIs

Page 52: ESSIR 2013 - IR and Social Media
Page 53: ESSIR 2013 - IR and Social Media

Artist Popularity (1-3)

Top-5 popular artists in dataset
  Jan 21 – Mar 21
  3-hourly timestamped popularity indices

Page 54: ESSIR 2013 - IR and Social Media

http://bit.ly/ESSIR13IRSocMedia

Page 55: ESSIR 2013 - IR and Social Media

Artist Popularity

Page 56: ESSIR 2013 - IR and Social Media

Artist Popularity (?!)

Top-5 popular artists in dataset
  Jan 21 – Mar 21
  3-hourly timestamped popularity indices

Page 57: ESSIR 2013 - IR and Social Media

The Black Keys

Page 58: ESSIR 2013 - IR and Social Media

The Black Keys

Three Grammy awards received!

Page 59: ESSIR 2013 - IR and Social Media

The Black Keys

The Web responds, while the service-based popularity index is static

Page 61: ESSIR 2013 - IR and Social Media

Implications

An “artist popularity” index depends on the platform and its user population

Web-based popularity, estimated via a URL shortener’s API, “reacts” to real-world events
  Suitable as an academics’ search log replacement?

Q: What is the most useful popularity – one that changes dynamically or one that lasts?

Page 62: ESSIR 2013 - IR and Social Media
Page 63: ESSIR 2013 - IR and Social Media

Many topics I skipped…

Page 64: ESSIR 2013 - IR and Social Media
Page 65: ESSIR 2013 - IR and Social Media

Tweets about blip.tv

“Twanchor text”
E.g.: http://blip.tv/file/2168377
  Amazing
  Watching “World’s most realistic 3D city models?”
  Google Earth/Maps killer
  Ludvig Emgard shows how maps/satellite pics on web is done (learn Google and MS!)
  and ~120 more Tweets
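
Collecting such "twanchor text" could look roughly like this. The sketch assumes that each tweet contains the target URL literally (in practice links are usually shortened and would first need resolving), and the function name `twanchor_index` is made up.

```python
import re
from collections import defaultdict

URL = re.compile(r"https?://\S+")

def twanchor_index(tweets):
    """Map every URL to the text of the tweets that link to it."""
    index = defaultdict(list)
    for tweet in tweets:
        urls = URL.findall(tweet)
        text = URL.sub("", tweet).strip()  # tweet text without the links
        if not text:
            continue
        for url in urls:
            index[url].append(text)
    return dict(index)

# Tweet texts from the slide, with the URL appended as an assumption.
tweets = [
    "Amazing http://blip.tv/file/2168377",
    "Google Earth/Maps killer http://blip.tv/file/2168377",
    "a tweet without any link",
]
index = twanchor_index(tweets)
print(index["http://blip.tv/file/2168377"])
```

The collected texts can then be indexed as an extra field of the linked page, in the same way anchor text is used in web search.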

Page 66: ESSIR 2013 - IR and Social Media

Wikipedia

Wikipedia contains semantically very rich annotations:
  Wikipedia Categories
  Wikipedia Lists
  Times (1930, 1931, 1932, etc.)
  Names
  Disambiguation pages
  Etc.

Note: DBpedia is just Wikipedia

Page 67: ESSIR 2013 - IR and Social Media

Wikipedia

People have used Wikipedia edit history to look for events

Page 68: ESSIR 2013 - IR and Social Media

Geotags / POIs

Many social media items carry explicit geo information
  Geotags are low-level “coordinates”
  POIs are high-level “point-of-interest” labels
Applications
  Recommend geo-locations to people
  Predict POI tags from (tweet) text
  Predict where a user will go next

Page 69: ESSIR 2013 - IR and Social Media

Map text to locations

Build a language model from all tags assigned to Flickr images that belong to a predefined grid cell

Neighbouring cells are used for smoothing (like the hierarchical language models used previously for video / scene / shot)

User frequency of a term in a location (instead of term frequency)

Neil O’Hare and Vanessa Murdock

Modeling Locations with Social Media

Information Retrieval, February 2013, Volume 16, Issue 1, pp 30-62
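
The three ingredients on this slide (grid cells, neighbour smoothing, user frequency) can be sketched as below. This is not the implementation of the cited paper: the 1-degree cell size, the interpolation weight `lam`, the floor `eps` and the toy photos are assumptions chosen to keep the example small.

```python
import math
from collections import defaultdict

def cell_of(lat, lon, size=1.0):
    """Grid cell (row, column) that contains the coordinate."""
    return (math.floor(lat / size), math.floor(lon / size))

def build_models(photos, size=1.0):
    """photos: (user, lat, lon, tags). Returns cell -> tag -> number of
    distinct users who used the tag there (user frequency)."""
    users = defaultdict(set)
    for user, lat, lon, tags in photos:
        for tag in set(tags):
            users[(cell_of(lat, lon, size), tag)].add(user)
    models = defaultdict(dict)
    for (cell, tag), who in users.items():
        models[cell][tag] = len(who)
    return dict(models)

def neighbours(cell):
    """The eight grid cells surrounding `cell`."""
    row, col = cell
    return [(row + dr, col + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def prob(models, cell, tag, lam=0.7, eps=1e-6):
    """P(tag | cell), interpolated with the pooled neighbouring cells."""
    own = models.get(cell, {})
    own_total = sum(own.values())
    p_own = own.get(tag, 0) / own_total if own_total else 0.0
    around = [models.get(n, {}) for n in neighbours(cell)]
    around_total = sum(sum(m.values()) for m in around)
    p_around = (sum(m.get(tag, 0) for m in around) / around_total
                if around_total else 0.0)
    return lam * p_own + (1.0 - lam) * p_around + eps  # eps avoids log(0)

def place(models, tags):
    """Most likely grid cell for a bag of tags."""
    return max(models,
               key=lambda cell: sum(math.log(prob(models, cell, t)) for t in tags))

# Hypothetical geotagged photos near Athens, Greece and Athens, Ohio.
photos = [
    ("u1", 37.98, 23.73, ["athens", "acropolis"]),
    ("u2", 37.97, 23.72, ["athens", "greece"]),
    ("u3", 39.33, -82.10, ["athens", "ohio"]),
    ("u3", 39.33, -82.10, ["athens", "ohio"]),  # same user: counted once
]
models = build_models(photos)
print(place(models, ["athens", "acropolis"]))
print(place(models, ["athens", "ohio"]))
```

Counting distinct users instead of raw tag occurrences keeps a single prolific uploader from dominating the model of a cell.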

Page 70: ESSIR 2013 - IR and Social Media

Placing Images: Easy

http://www.flickr.com/photos/63666148@N00/3615989115/

Athens, Ohio or Athens, Greece?

Page 71: ESSIR 2013 - IR and Social Media

Placing Images: Hard

Ballooning company in Ottawa

Page 72: ESSIR 2013 - IR and Social Media

Searching the Social Graph

Search entities, and the relationships between them, in the (Facebook) social graph

Clearly IR problems, but who has the data to work with?

Michael Curtiss et al.

Unicorn: A System for Searching the Social Graph

PVLDB, Vol. 6, No. 11

Page 73: ESSIR 2013 - IR and Social Media

Crawling

How to get “the” data?
  Rate-limited APIs
  ToS

HEADACHES!

Page 74: ESSIR 2013 - IR and Social Media

Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen M. Carley

Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose

ICWSM 2013

Page 75: ESSIR 2013 - IR and Social Media

Not IR yet, but… Interesting stuff nevertheless!

de Volkskrant, March 13, 2013

Michal Kosinski, David Stillwell, and Thore Graepel

Private traits and attributes are predictable from digital records of human behavior

PNAS 2013; published ahead of print March 11, 2013, doi:10.1073/pnas.1218772110

Page 79: ESSIR 2013 - IR and Social Media

Take home message(s)

Social media give us IR researchers access to a rich resource of context
  Including time & location!

Gather the right data for your problem domain, and it may be a good alternative for not having the click data we all want so badly

Various recommendation and retrieval tasks exist in social media – can one theory address all of these?

Page 80: ESSIR 2013 - IR and Social Media

C U @ #ECIR2014 ? !