Dynamic Collective Entity Representations for Entity Ranking


David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke


Entity search?

• Index = Knowledge Base (= Wikipedia)
• Documents = Entities
• “Real world entities” have a single representation (in KB)


Representation is not static

• Associations between words and entities change over time
• “ferguson shooting” -> Ferguson, Missouri
• People talk about entities all the time


Dynamic Collective Entity Representations

• Use “collective intelligence” to mine entity descriptions to enrich representation.
• Is like document expansion (add terms found through explicit links)
• Is not query expansion (terms found through predicted links)
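
The document-expansion idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `EntityDocument` class, the `tokenize` helper, the field names, and the example strings are all assumptions made for the example. Each description source gets its own field, and a new description that explicitly links to the entity is appended to that field.

```python
from collections import defaultdict


def tokenize(text):
    # Naive lowercasing whitespace tokenizer (assumption); a real system
    # would use the retrieval engine's own analyzer.
    return text.lower().split()


class EntityDocument:
    """Fielded entity representation; all names here are hypothetical."""

    def __init__(self, entity_id, kb_text):
        self.entity_id = entity_id
        # One list of terms per description source ("field").
        self.fields = defaultdict(list)
        self.fields["kb"].extend(tokenize(kb_text))

    def expand(self, source, description):
        """Append the terms of a new description to the field of its source."""
        self.fields[source].extend(tokenize(description))


# Placeholder strings for illustration only.
doc = EntityDocument("Ferguson,_Missouri", "Ferguson is a city in Missouri")
doc.expand("tweets", "ferguson shooting protests")
```

After the call to `expand`, the term "shooting" is searchable in the entity's `tweets` field even though it never occurs in the knowledge-base text, which is the difference from query expansion: the document changes, the query does not.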


Advantages

• Cheap: change the document in the index, leverage tried & tested retrieval algorithms

• Free “smoothing”: descriptions (e.g., tweets) may capture ‘newly evolving’ word associations (Ferguson shooting) and incorporate out-of-document terms

• “Move relevant documents closer to queries” (= close the gap between searcher vocabulary & docs in index)


Haven’t we seen this before?

• Anchors & queries in particular have been shown to improve retrieval [1]
• Tweets have been shown to be similar to anchors [2]
• Social tags, same [3]
• But: in batch (i.e., add data, see if/how it improves retrieval)

[1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, urls and anchors. TREC 2001
[2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. SIGIR ’12
[3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. OAIR ’13


Description sources

Static sources
• KB: Wikipedia dump (Aug ’14). 57M descriptions for 4.8M entities.
• Web anchors: Anchors from Google WikiLinks corpus. 9.8M descriptions for 876,063 entities.

Dynamic sources
• Tweets: Tweets w/ links to Wikipedia pages (2011-2014). 52,631 descriptions for 38,269 entities.
• Queries: Queries from MSN query logs that yield Wikipedia clicks. 47,002 descriptions for 18,724 entities.
• Social tags: Delicious tags for Wiki pages, from the SocialBM0311 corpus. 4.4M descriptions for 289,015 entities.


Original entity representation

Original entity description: Tupac Shakur

Tupac Amaru Shakur (previously known as Lesane Parish Crooks) (too-pahk shə-koor;[1] June 16, 1971 – September 13, 1996), also known by his stage names 2Pac and (briefly) Makaveli, was an American rapper, author, actor, and poet.[2] As of 2007, Shakur has sold over 75 million records worldwide, making him one of the best-selling music artists of all time.[3] His double disc albums All Eyez on Me and his Greatest Hits are among the [...]


Static description sources

• KB anchors: 2Pac, Tupac, Makaveli
• KB linked entities: The Notorious B.I.G., Black Panther Party, Muammar Gaddafi
• KB redirects: 2pac Shakur, Thug Immortal
• KB categories: Murdered Rappers, Death Row Record Artists, American deists
• Web anchors:
    • What job did Tupac have before he was a rapper
    • Tupac
    • Tupac is arguably more influential
    • Tupac Amaru Shakur
    • Tupac Shakur-style drive-by shooting
    • Tupac Shakur
    • Tupac Shakur reciting Shakespeare at art school


Dynamic description sources

Queries / Tags:
• tupac and the law
• hiphop/icons
• dead rappers
• people influenced by tupac
• awesomeartist rapd

Tweets:
• Happy Birthday Tupac!!! 2Pac Gemini
• RT: Las cenizas de Tupac, el mejor rapero de la historia, fueron mezcladas con marihuana y fumadas por miembros de Outlawz (English: “The ashes of Tupac, the best rapper in history, were mixed with marijuana and smoked by members of Outlawz”)
• Even more crazy that this was announced just one day before what would have been Pac’s 40th birthday.


Challenge

• Heterogeneity
    1. Description sources
    2. Entities

• Dynamic nature
    • Content changes over time


Adaptive ranking

• Supervised single-field weighting model
• Features:
    • field similarity: retrieval score per field.
    • field “importance”: length, novel terms, etc.
    • entity “importance”: time since last update.
• Learn optimal field weights from clicks

Supervised single-field weighting model: each field’s contribution towards the final score is individually weighted, with the weights learned from clicks at set intervals.
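
One way to picture per-field weighting is a linear combination of per-field retrieval scores. The sketch below is an assumption-laden illustration, not the model from the talk: the slides do not specify the learner or the retrieval function, so `field_score` is a simple term-overlap stand-in, the weights are made-up constants rather than learned values, and the field and entity “importance” features are left out.

```python
def field_score(query_terms, field_terms):
    """Stand-in retrieval score: fraction of query terms present in the field."""
    if not query_terms:
        return 0.0
    field_vocab = set(field_terms)
    return sum(1 for t in query_terms if t in field_vocab) / len(query_terms)


def entity_score(query_terms, fields, weights):
    """Weighted sum of per-field scores; fields without a weight contribute 0."""
    return sum(
        weights.get(name, 0.0) * field_score(query_terms, terms)
        for name, terms in fields.items()
    )


# Toy entity with three fields (illustrative terms only).
fields = {
    "kb": ["tupac", "amaru", "shakur", "american", "rapper"],
    "queries": ["tupac", "and", "the", "law"],
    "tags": ["dead", "rappers"],
}
weights = {"kb": 0.5, "queries": 0.3, "tags": 0.2}  # hypothetical, not learned

score = entity_score(["dead", "rappers"], fields, weights)
```

In this toy setup the query "dead rappers" matches only the `tags` field, so the final score depends entirely on that field's weight. That is why re-learning the weights matters as fields grow at different rates.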


Experimental setup

1. Data:
    • MSN Query log (62,841 queries that yield entity clicks)

• For each query:
    • Produce ranking
    • Observe click
    • Evaluate ranking (MAP/P@1)
    • Expand entities (w/ descriptions from dynamic sources)
    • [re-train ranker]
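
The per-query loop above can be sketched as a replay over a click log. This is a rough sketch under stated assumptions: `run_log`, `rank_fn`, `expand_fn`, `retrain_fn`, and `retrain_every` are hypothetical names, each query is assumed to have exactly one clicked entity (so average precision reduces to the reciprocal rank of the click), and only the query itself is added as a new description.

```python
def average_precision(ranking, clicked):
    """AP with a single relevant item: 1 / rank of the click, 0 if absent."""
    for rank, entity in enumerate(ranking, start=1):
        if entity == clicked:
            return 1.0 / rank
    return 0.0


def precision_at_1(ranking, clicked):
    return 1.0 if ranking and ranking[0] == clicked else 0.0


def run_log(query_log, rank_fn, expand_fn, retrain_fn=None, retrain_every=1000):
    """Replay (query, clicked_entity) pairs in order; evaluate, then expand."""
    ap_sum = p1_sum = 0.0
    for i, (query, clicked) in enumerate(query_log, start=1):
        ranking = rank_fn(query)                        # produce ranking
        ap_sum += average_precision(ranking, clicked)   # evaluate against click
        p1_sum += precision_at_1(ranking, clicked)
        expand_fn(clicked, query)                       # expand clicked entity
        if retrain_fn and i % retrain_every == 0:
            retrain_fn()                                # optional re-training
    n = len(query_log)
    return (ap_sum / n, p1_sum / n) if n else (0.0, 0.0)


# Toy index: entity -> set of terms (illustrative only).
index = {"A": {"alpha"}, "B": {"beta"}}


def rank_fn(query):
    terms = set(query.split())
    # Rank by term overlap, ties broken by entity id.
    return sorted(index, key=lambda e: (-len(terms & index[e]), e))


def expand_fn(entity, query):
    index[entity].update(query.split())


log = [("gamma", "B"), ("gamma", "B")]
map_score, p_at_1 = run_log(log, rank_fn, expand_fn)
```

In the toy run the first "gamma" query ranks entity B second. After B is expanded with the query terms, the second "gamma" query ranks it first, which mirrors how expansion is meant to move relevant entities closer to later queries.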


Results

• Comparing effectiveness of different description sources

• Comparing adaptive vs. non-adaptive ranker performance


Description sources

[Figure: plot comparing description sources; y-axis from 0.50 to 0.60, x-axis from 0 to 30,000]


Feature weights over time


Adaptive vs. non-adaptive ranking

[Figure: plot comparing adaptive and non-adaptive ranking; y-axis from 0.50 to 0.60, x-axis from 0 to 30,000]


In summary

• Expanding entity representations with different sources enables better matching of queries to entities

• As new content comes in, it is beneficial to retrain the ranker

• Informing ranker of “expansion state” further improves performance


Thank you
