Penguins in Sweaters, or Serendipitous Entity Search on User-Generated Content

chenwq, 2014/04/16

Mounia Lalmas et al. (Yahoo! Labs, CIKM 2013 Best Paper)

Mounia Lalmas

@mounialalmas

Principal Research Scientist at Yahoo! Labs

Professor of Information Retrieval at the Department of Computer Science at Queen Mary, University of London

Her research focuses on three main areas: user engagement, social media, and search.

Contents

What/why serendipitous search

How to build a serendipitous search system

Experimental setting and analysis

Why/when do penguins wear sweaters?

Entity Search

Building an entity-driven serendipitous search system based on enriched entity networks extracted from Wikipedia and Yahoo! Answers

Serendipity

Finding something good or useful while not specifically looking for it

Serendipitous search systems provide relevant and interesting results

What is entity search

How people become entities

What is entity search

Entity extraction

Proximity measure between two entities

Entity ranking according to proximity to a query entity

What is Serendipity

“making fortunate discoveries by accident”

M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys 2010.

Serendipity = unexpectedness + relevance

– “Expected” result baselines from web search

Serendipity = interestingness + relevance

– Result interestingness given the query

– Personal interest in result

P. Andre, J. Teevan, and S. T. Dumais. From x-rays to silly putty via uranus: Serendipity and its role in web search. SIGCHI 2009.

What is Serendipity

Intuition from recsys:

unexpectedness

usefulness u(RS_i)
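A sketch of how the two ingredients above are combined in the recommender-systems formulation cited earlier (Ge et al.). The symbols PM (the results of a primitive baseline model) and N (the number of unexpected results) do not appear on the slide and are assumptions taken from that line of work; u(RS_i) is 1 when the i-th unexpected result is judged useful and 0 otherwise.

```latex
\mathrm{UNEXP} = RS \setminus PM,
\qquad
\mathrm{SRDP} = \frac{\sum_{i=1}^{N} u(RS_i)}{N},
\quad RS_i \in \mathrm{UNEXP},\; N = |\mathrm{UNEXP}|
```

Read this way, serendipity is the fraction of unexpected results that are also useful.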

What connections between entities do web community knowledge portals offer?

WHY: How do they contribute to an interesting, serendipitous browsing experience?

Why/when do penguins wear sweaters?

Yahoo! Answers: community-driven question & answer portal

• 67M questions & 262M answers

• 2 years [2010/2011]

• English-language

• minimally curated

• opinions, gossip, personal info

• variety of points of view

Wikipedia: community-driven encyclopedia

• 3,795,865 articles

• from end of December 2011

• English Wikipedia

• curated

• high-quality knowledge

• variety of niche topics

Entity & Relationship Extraction

An entity is defined as any concept that has a Wikipedia page

1. Identify surface forms [http]

2. Resolve them to Wikipedia entities [Zhou]

3. Rank entities by aboutness score [Paranjpe]

[http] https://www.otexts.org/node/832

[Zhou] Y. Zhou, L. Nie, O. Rouhani-Kalleh, et al. Resolving surface forms to Wikipedia topics. COLING 2010, pp. 1335-1343.

[Paranjpe] D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. CIKM 2009.

Relationship: cosine similarity of tf-idf vectors (each entity is represented by the concatenation of the documents in which it appears)
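The relationship measure can be illustrated with a small sketch. It assumes each entity already has a token list built by concatenating the documents that mention it; the tokenization, the log(N/df) idf variant, and the toy data are all assumptions for illustration, not details given on the slide.

```python
import math
from collections import Counter


def tfidf_vectors(entity_docs):
    """Build one sparse tf-idf vector per entity.

    entity_docs maps an entity name to the tokens of its pseudo-document
    (the concatenation of all documents in which the entity appears).
    """
    n = len(entity_docs)
    df = Counter()
    for tokens in entity_docs.values():
        df.update(set(tokens))  # document frequency: one count per entity
    vectors = {}
    for entity, tokens in entity_docs.items():
        tf = Counter(tokens)
        # Assumed weighting: raw term frequency times log(N / df).
        vectors[entity] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy pseudo-documents.
docs = {
    "Steve Jobs": "apple founder iphone mac apple".split(),
    "Steve Wozniak": "apple founder engineer mac".split(),
    "Emperor Penguin": "bird antarctica ice".split(),
}
vectors = tfidf_vectors(docs)
print(cosine(vectors["Steve Jobs"], vectors["Steve Wozniak"]))
print(cosine(vectors["Steve Jobs"], vectors["Emperor Penguin"]))
```

Entity pairs with a non-zero similarity would become weighted edges of the entity network.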

Entity & Relationship Extraction

Aboutness

Relationship

Entity Networks

Dataset          # Nodes     # Edges       # Isolated
Yahoo! Answers   896,799     112,595,138   69,856
Wikipedia        1,754,069   237,058,218   82,381

[Figure panels: Wikipedia, Yahoo! Answers]

Retrieval

Algorithm: lazy random walk with restart [Chung]

[Chung] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
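A minimal sketch of a lazy random walk with restart on a weighted entity graph, computed by power iteration. The parameter names `restart` and `lazy`, their default values, and the toy graph are assumptions for illustration; the slide does not give the settings used in the experiments.

```python
def lazy_rwr(graph, query, restart=0.15, lazy=0.5, iters=200, tol=1e-12):
    """Stationary scores of a lazy random walk with restart.

    graph maps every node to a dict {neighbor: edge weight}; each neighbor
    must itself be a key of graph. At each step the walker jumps back to
    the query node with probability `restart`; otherwise it stays put with
    probability `lazy` or follows an edge chosen proportionally to weight.
    """
    scores = {node: 0.0 for node in graph}
    scores[query] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            if mass == 0.0:
                continue
            nxt[query] += restart * mass
            stay = (1.0 - restart) * lazy * mass
            move = (1.0 - restart) * (1.0 - lazy) * mass
            total = sum(graph[node].values())
            if total == 0.0:
                nxt[node] += stay + move  # isolated node: nowhere to go
            else:
                nxt[node] += stay
                for neighbor, weight in graph[node].items():
                    nxt[neighbor] += move * weight / total
        delta = sum(abs(nxt[n] - scores[n]) for n in graph)
        scores = nxt
        if delta < tol:
            break
    return scores


def rank_entities(graph, query, **kwargs):
    """Entities other than the query, ordered by decreasing walk score."""
    scores = lazy_rwr(graph, query, **kwargs)
    others = [n for n in graph if n != query]
    return sorted(others, key=lambda n: (-scores[n], n))


# Hypothetical toy graph with symmetric edge weights.
toy = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9},
    "c": {"a": 0.1},
}
print(rank_entities(toy, "a"))
```

The restart keeps the scores anchored to the query entity, while the lazy step slows how quickly probability mass spreads away from it.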

Rank Aggregation

For a given query, combine the results from different search engines

Simple median-rank aggregation [Sculley]

List 1: A B C D E

List 2: C D E A B

Aggregated: C A D B E

[Sculley] D. Sculley. Rank aggregation for similar items. SDM 2007.
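A short sketch of median-rank aggregation that reproduces the slide's example. The handling of items missing from a list (ranked one past the end) and the alphabetical tie-break are assumptions, since the slide does not specify them.

```python
from statistics import median


def median_rank_aggregate(rankings):
    """Order items by the median of their 1-based ranks across all lists."""
    items = set().union(*rankings)

    def ranks(item):
        # Assumption: an item absent from a list is ranked just past its end.
        return [r.index(item) + 1 if item in r else len(r) + 1 for r in rankings]

    # Assumption: ties on the median are broken alphabetically.
    return sorted(items, key=lambda item: (median(ranks(item)), item))


combined = median_rank_aggregate([list("ABCDE"), list("CDEAB")])
print(" ".join(combined))
```

With two lists the median is the mean of the two ranks: C scores 2, A 2.5, D 3, B 3.5, and E 4.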

Retrieval

                 Wikipedia   Yahoo! Answers   Combined
Precision @ 5    0.668       0.724            0.744
MAP              0.716       0.762            0.782
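For readers unfamiliar with the two measures in the table, here is a small sketch of how they are typically computed from binary relevance labels. Normalizing average precision by the number of relevant results retrieved is an assumption; the slide does not say which variant was used.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (labels are 0/1)."""
    return sum(relevance[:k]) / k


def average_precision(relevance):
    """Mean of the precision values at each rank holding a relevant result."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


# Hypothetical labels for the top 5 results of two queries.
labels = [[1, 0, 1, 1, 0], [0, 1, 0, 0, 0]]
print(precision_at_k(labels[0], 5))
print(mean_average_precision(labels))
```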

3 labels per query-result pair

Query: Steve Jobs

Yahoo! Answers results: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York

Wikipedia results: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.

Annotator agreement (overlap): 85%

Average overlap in top 5 results: 12%

Entity Networks with Implicit Metadata

• Sentiment

– use SentiStrength to compute positive & negative scores

– compute attitude (polarity) and sentimentality (strength)

– entity-level scores

• Quality

– readability: Flesch Reading Ease score

• Topical category

– Yahoo Content Taxonomy
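A rough sketch of the sentiment and readability metadata. It assumes SentiStrength-style scores (positive in 1..5, negative in -5..-1) and one common way of deriving attitude and sentimentality from them; those two formulas, the syllable heuristic, and all function names are assumptions, not taken from the slides. The Flesch Reading Ease constants are the standard ones.

```python
import re


def attitude(pos, neg):
    """Polarity: sum of positive (1..5) and negative (-5..-1) scores (assumed)."""
    return pos + neg


def sentimentality(pos, neg):
    """Strength: distance from the neutral pair (1, -1) (assumed)."""
    return pos - neg - 2


def entity_level(scores):
    """Average per-document scores into a single entity-level score."""
    return sum(scores) / len(scores) if scores else 0.0


def count_syllables(word):
    """Crude heuristic: vowel groups, minus a trailing silent 'e'."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and groups > 1:
        groups -= 1
    return max(groups, 1)


def flesch_reading_ease(text):
    """206.835 - 1.015 * words/sentences - 84.6 * syllables/words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))


print(attitude(4, -1), sentimentality(4, -1))
print(flesch_reading_ease("The cat sat."))
```

Higher Flesch scores mean easier text, so short sentences with short words score well above dense academic prose.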

Entity Networks with Metadata

Table 5: Serendipity across different runs

|relevant & unexpected| / |unexpected|: number of serendipitous results out of all of the unexpected results retrieved

|relevant & unexpected| / |retrieved|: number of serendipitous results out of all results retrieved
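The two ratios can be sketched as follows, treating a result as unexpected when it is absent from a baseline ("expected") result set. The entity identifiers and the contents of the three sets are hypothetical.

```python
def serendipity_ratios(retrieved, relevant, expected):
    """Return (|rel & unexp| / |unexp|, |rel & unexp| / |retrieved|)."""
    retrieved = list(retrieved)
    unexpected = [r for r in retrieved if r not in expected]
    serendipitous = [r for r in unexpected if r in relevant]
    over_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    over_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return over_unexpected, over_retrieved


# Hypothetical run: five retrieved entities, two of which the baseline
# also returns, and three of which are judged relevant.
retrieved = ["e1", "e2", "e3", "e4", "e5"]
expected = {"e1", "e2"}
relevant = {"e1", "e3", "e4"}
print(serendipity_ratios(retrieved, relevant, expected))
```

The first ratio rewards runs whose surprises are mostly relevant; the second also penalizes runs that return few unexpected results at all.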

User-perceived Quality

1. Which result is more relevant to the query?

2. If someone is interested in the query, would they also be interested in these results?

3. Even if you are not interested in the query, are these results interesting to you personally?

4. Would you learn anything new about the query?

Entity Networks with Metadata

Question                                              Data   General   +Topic
Which result is more relevant to the query?           WP     0.162     0.194
                                                      YA     0.336     0.374
                                                      Comb   0.201     0.222
If someone is interested in the query, would they     WP     0.162     0.176
also be interested in the result?                     YA     0.312     0.343
                                                      Comb   0.184     0.222
Even if you are not interested in the query, is the   WP     0.139     0.144
result interesting to you personally?                 YA     0.324     0.359
                                                      Comb   0.168     0.198
Would you learn anything new about the query from     WP     0.167     0.164
this result?                                          YA     0.307     0.346
                                                      Comb   0.184     0.203

Topical category constraint promotes results of the same topic as the query entity

Sentiment and readability constraints hurt performance

Table 6: Similarity (Kendall’s tau-b [Fagin]) between result sets and reference ranking

[Fagin] R. Fagin, R. Kumar, M. Mahdian, et al. Comparing and aggregating rankings with ties. PODS 2004, pp. 47-58.
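A compact sketch of Kendall's tau-b, the tie-aware rank correlation named in the table caption. It takes two equal-length lists of ranks for the same items and uses a simple quadratic pair count; how the result sets were aligned with the reference ranking in the experiments is not shown on the slide and is not modelled here.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b between two rank lists over the same items."""
    if len(x) != len(y):
        raise ValueError("rank lists must have the same length")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1  # pair tied in the first ranking
            if dy == 0:
                ties_y += 1  # pair tied in the second ranking
            if dx != 0 and dy != 0:
                if dx * dy > 0:
                    concordant += 1
                else:
                    discordant += 1
    pairs = n * (n - 1) // 2
    denom = math.sqrt((pairs - ties_x) * (pairs - ties_y))
    return (concordant - discordant) / denom if denom else 0.0


print(kendall_tau_b([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3]))
```

Values near 1 mean the two rankings order item pairs the same way, values near -1 mean they are reversed, and tied pairs shrink the denominator instead of counting as agreement or disagreement.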
