Penguins in Sweaters, or Serendipitous Entity Search on User-generated Content (Bordino et al.)
chenwq, 2014/04/16
I. Bordino, Y. Mejova, and M. Lalmas (Yahoo! Labs), CIKM 2013 Best Paper
Mounia Lalmas
@mounialalmas
mounia-lalmas
mounialalmas
Principal Research Scientist at Yahoo! Labs
Professor of Information Retrieval at the Department of Computer Science at Queen Mary, University of London
Her research focuses on three main areas: user engagement, social media, and search.
Contents
1. What/why serendipitous search
2. How to build a serendipitous search system
3. Experimental setting and analysis
Why/when do penguins wear sweaters?
Entity search: building an entity-driven serendipitous search system based on enriched entity networks extracted from Wikipedia and Yahoo! Answers
Serendipity: finding something good or useful while not specifically looking for it. Serendipitous search systems provide relevant and interesting results.
What is entity search
How people become entities
What is entity search
Entity extraction
Proximity measure between two entities
Entity ranking by proximity to a query entity
What is Serendipity
“making fortunate discoveries by accident”
M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys 2010.
Serendipity = unexpectedness + relevance: “expected” result baselines from web search
Serendipity = interestingness + relevance: result interestingness given the query; personal interest in the result
P. Andre, J. Teevan, and S. T. Dumais. From x-rays to silly putty via uranus: Serendipity and its role in web search. SIGCHI 2009.
What is Serendipity
Intuition from recsys:
– unexpectedness
– usefulness u(RS_i)
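Written out, this intuition might look as follows. This is our reading of Ge et al., not a formula shown on the slide: RS is the ranked result list, PM the output of a primitive "expected" baseline (for this work, plausibly the web-search baselines mentioned above), and u(RS_i) the usefulness of result RS_i.

```latex
\mathrm{UNEXP} = RS \setminus PM, \qquad
\mathrm{SRDP} = \frac{1}{|\mathrm{UNEXP}|} \sum_{RS_i \in \mathrm{UNEXP}} u(RS_i)
```

With binary usefulness (u = 1 for a relevant result, 0 otherwise), SRDP reduces to |relevant ∩ unexpected| / |unexpected|, the same ratio that the experiments report in Table 5.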
WHAT: What connections between entities do web community knowledge portals offer?
WHY: How do they contribute to an interesting, serendipitous browsing experience?
Why/when do penguins wear sweaters?
Yahoo! Answers: community-driven question & answer portal
• 67M questions & 262M answers
• 2 years [2010/2011]
• English-language
• minimally curated: opinions, gossip, personal info
• variety of points of view
Wikipedia: community-driven encyclopedia
• 3,795,865 articles
• from end of December 2011
• English Wikipedia
• curated: high-quality knowledge
• variety of niche topics
2. How to build a serendipitous search system
Entity & Relationship Extraction
Entity defined as any concept having a Wikipedia page
1. Identify surface forms [http];
2. Resolve them to Wikipedia entities [Zhou];
3. Rank entities using an aboutness score [Paranjpe].
https://www.otexts.org/node/832
Y. Zhou, L. Nie, O. Rouhani-Kalleh, et al. Resolving surface forms to Wikipedia topics. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010): 1335-1343.
D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. CIKM 2009.
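The three extraction steps can be illustrated with a toy pipeline. This is a minimal sketch under loudly stated assumptions, not the method of [Zhou] or [Paranjpe]: the dictionary `SURFACE_FORMS` and its priors are invented, resolution simply picks the entity with the highest prior, and a plain mention count stands in for the learned aboutness score.

```python
import re
from collections import Counter

# Hypothetical surface-form dictionary: surface form -> {entity: prior probability}.
# Entries and probabilities are invented for illustration only.
SURFACE_FORMS = {
    "steve jobs": {"Steve_Jobs": 0.98, "Steve_Jobs_(film)": 0.02},
    "apple": {"Apple_Inc.": 0.7, "Apple": 0.3},
    "jobs": {"Steve_Jobs": 0.6, "Employment": 0.4},
}
MAX_LEN = max(len(sf.split()) for sf in SURFACE_FORMS)


def spot_surface_forms(text):
    """Step 1: greedy longest-match lookup of known surface forms."""
    tokens = re.findall(r"\w+", text.lower())
    spots, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in SURFACE_FORMS:
                spots.append(candidate)
                i += n
                break
        else:  # no known surface form starts at this token
            i += 1
    return spots


def resolve(surface_form):
    """Step 2 (stand-in): choose the candidate entity with the highest prior."""
    candidates = SURFACE_FORMS[surface_form]
    return max(candidates, key=candidates.get)


def rank_entities(text):
    """Step 3 (stand-in): rank entities by mention count, not a learned aboutness score."""
    counts = Counter(resolve(sf) for sf in spot_surface_forms(text))
    return [entity for entity, _ in counts.most_common()]
```

For the text "Steve Jobs founded Apple. Apple later rehired Jobs." the sketch spots four surface forms and returns `Steve_Jobs` and `Apple_Inc.`, tied at two mentions each. A real resolver would use the context of each mention rather than a fixed prior.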
Relationship: cosine similarity of the tf-idf vectors of two entities, where each entity is represented by the concatenation of the documents in which it appears
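A minimal sketch of the relationship score, assuming whitespace tokenisation, raw term frequency, and an idf computed over the entity pseudo-documents themselves (the slide does not say which collection the idf statistics come from). The function names `entity_tfidf` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def entity_tfidf(entity_docs):
    """entity_docs: entity -> list of documents (strings) in which the entity appears.

    Each entity becomes one pseudo-document (the concatenation of its documents).
    Assumption: idf = log(N / df), computed over these N pseudo-documents.
    """
    bags = {e: Counter(" ".join(docs).lower().split()) for e, docs in entity_docs.items()}
    n = len(bags)
    # Document frequency: in how many entity pseudo-documents each term occurs.
    df = Counter(term for bag in bags.values() for term in bag)
    return {
        e: {t: tf * math.log(n / df[t]) for t, tf in bag.items()}
        for e, bag in bags.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Entities whose pseudo-documents share no informative term get a similarity of 0, so under this sketch they would presumably not be linked in the entity network.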
Entity & Relationship Extraction
[Figure panels: Aboutness; Relationship]
Entity Networks
Dataset           # Nodes      # Edges        # Isolated nodes
Yahoo! Answers    896,799      112,595,138    69,856
Wikipedia         1,754,069    237,058,218    82,381
[Figure panels: Wikipedia; Yahoo! Answers]
Retrieval
Algorithm: lazy random walk with restart [Chung]
F. R. K. Chung. Spectral graph theory. American Mathematical Society, 1997.
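A possible implementation, assuming the usual lazy walk (stay put with probability 1/2, otherwise move along an edge with probability proportional to its weight) combined with a restart to the query entity. The restart probability of 0.15 and the fixed iteration count are assumptions, not parameters reported for this system; entities are then ranked by their scores.

```python
def lazy_rwr(graph, query, restart=0.15, iters=200):
    """Lazy random walk with restart on a weighted, undirected graph.

    graph: node -> {neighbour: weight}, with symmetric weights.
    Each step: with probability `restart` jump back to `query`; otherwise
    stay put with probability 1/2 or move to a neighbour chosen in
    proportion to edge weight. Returns approximate stationary scores.
    """
    degree = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    p = {u: 0.0 for u in graph}
    p[query] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in graph}
        for u, mass in p.items():
            if mass == 0.0:
                continue
            if degree[u] == 0:  # isolated node: the walk cannot leave it
                nxt[u] += mass
                continue
            nxt[u] += 0.5 * mass  # lazy half: stay in place
            for v, w in graph[u].items():
                nxt[v] += 0.5 * mass * w / degree[u]
        # Restart: keep (1 - restart) of the walked mass, send the rest to the query.
        p = {u: (1 - restart) * m for u, m in nxt.items()}
        p[query] += restart
    return p
```

On a small graph where A is linked to B with weight 3 and to C with weight 1, and C to D, the query A ranks the other entities as B, C, D: entities close to the query along heavy edges receive most of the probability mass.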
Rank Aggregation
For a given query, combine the results from different search engines
Simple median-rank aggregation [Sculley]
Input rankings: A B C D E and C D E A B; aggregated ranking: C A D B E
D. Sculley. Rank aggregation for similar items. SDM 2007.
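The toy example above can be reproduced with a short sketch of median-rank aggregation: each item is scored by the median of its positions across the input lists, and items are sorted by that score. How items missing from some lists are handled is an assumption here (only observed positions are used), as is breaking ties by first appearance.

```python
from statistics import median


def median_rank_aggregate(rankings):
    """Aggregate ranked lists (best first) by the median 1-based position of each item.

    Assumption: an item absent from a list simply contributes no position for it.
    sorted() is stable, so ties keep the order in which items were first seen.
    """
    positions = {}
    for ranking in rankings:
        for pos, item in enumerate(ranking, start=1):
            positions.setdefault(item, []).append(pos)
    return sorted(positions, key=lambda item: median(positions[item]))
```

`median_rank_aggregate([list("ABCDE"), list("CDEAB")])` returns `['C', 'A', 'D', 'B', 'E']`: the medians are C 2, A 2.5, D 3, B 3.5, E 4.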
3. Experimental setting and analysis
Retrieval
Metric          Wikipedia    Yahoo! Answers    Combined
Precision@5     0.668        0.724             0.744
MAP             0.716        0.762             0.782
3 labels per query-result pair
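The two metrics can be computed as below. Two assumptions: the three labels per query-result pair are collapsed into one binary label by strict majority vote, and average precision is normalised by the number of relevant results retrieved, because the full set of relevant entities for a query is unknown. The function names are hypothetical.

```python
def majority(votes):
    """Binary label from several annotator votes (1 = relevant), by strict majority."""
    return int(2 * sum(votes) > len(votes))


def precision_at_k(labels, k=5):
    """labels: binary relevance of a ranked result list, best first."""
    return sum(labels[:k]) / k


def average_precision(labels):
    """Mean of precision@rank taken at the rank of each relevant result.

    Assumption: normalised by the number of relevant results retrieved.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(label_lists):
    """MAP: average precision averaged over queries."""
    return sum(average_precision(labels) for labels in label_lists) / len(label_lists)
```

For a ranked list judged [1, 0, 1, 1, 0], precision@5 is 3/5 = 0.6 and average precision is (1/1 + 2/3 + 3/4) / 3.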
Example query: Steve Jobs
Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York
Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.
Annotator agreement (overlap): 85%
Average overlap in top 5 results: 12%
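A sketch of the overlap statistic, assuming it is the fraction of top-5 entities that the Wikipedia and Yahoo! Answers runs have in common (the slide does not give the exact definition). Because the measure is set-based, the order in which the example results are listed does not matter.

```python
def top_k_overlap(a, b, k=5):
    """Fraction of the top-k results of ranking `a` that also appear in the top k of `b`."""
    return len(set(a[:k]) & set(b[:k])) / k


# Top-5 results for the query "Steve Jobs", as listed above.
yahoo_answers = ["Jon Rubinstein", "Timothy Cook", "Kane Kramer", "Steve Wozniak", "Jerry York"]
wikipedia = ["System 7", "PowerPC G4", "SuperDrive", "Power Macintosh", "Power Computing Corp."]
```

For the Steve Jobs example the two top-5 lists share no entity, so `top_k_overlap(yahoo_answers, wikipedia)` is 0.0; the reported 12% is an average over all queries.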
WHY: How do they contribute to an interesting, serendipitous browsing experience?
• Sentiment
– compute positive & negative scores using SentiStrength
– compute attitude and sentimentality
– Entity-level scores
• Quality
– Flesch Reading Ease score
• Topical category
– Yahoo Content Taxonomy
[Figure panels: Attitude (polarity); Sentimentality (strength); Readability]
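The readability score and the two sentiment quantities can be sketched as follows. The Flesch Reading Ease formula is standard, 206.835 - 1.015 × (words/sentences) - 84.6 × (syllables/words), but the vowel-group syllable counter is a crude heuristic. The definitions of attitude and sentimentality used here (sum and difference of SentiStrength's positive and negative scores) are assumptions, and the slide does not say how document scores are aggregated into entity-level scores.

```python
import re


def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels (y included), minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))


def attitude_and_sentimentality(pos, neg):
    """pos in 1..5 and neg in -5..-1, as SentiStrength reports them.

    Assumed definitions: attitude = pos + neg (net polarity),
    sentimentality = pos - neg (overall emotional strength).
    """
    return pos + neg, pos - neg
```

For example, "The cat sat on the mat." has 6 words, 1 sentence, and 6 syllables, giving 206.835 - 6.09 - 84.6 = 116.145. Dictionary-based syllable counters would give somewhat different scores on longer words.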
Entity Networks with Implicit Metadata
Entity Networks with Metadata
Table 5: Serendipity across different runs
|relevant & unexpected| / |unexpected|: number of serendipitous results out of all unexpected results retrieved
|relevant & unexpected| / |retrieved|: serendipitous results out of all retrieved results
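The two ratios in Table 5 can be computed from a run's results, a set of "expected" baseline results, and the relevance judgments. A minimal sketch; the function name and argument names are hypothetical, and duplicates in a run are dropped before counting.

```python
def serendipity_ratios(retrieved, expected, relevant):
    """Return (|relevant & unexpected| / |unexpected|, |relevant & unexpected| / |retrieved|).

    retrieved: ranked results of one run; expected: baseline ("expected") results;
    relevant: results judged relevant.
    """
    retrieved = list(dict.fromkeys(retrieved))  # drop duplicates, keep rank order
    unexpected = [r for r in retrieved if r not in expected]
    serendipitous = [r for r in unexpected if r in relevant]
    over_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    over_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return over_unexpected, over_retrieved
```

For a run of five results with three unexpected ones, two of them relevant, the ratios are 2/3 and 2/5.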
User-perceived Quality
1. Which result is more relevant to the query?
2. If someone is interested in the query, would they also be interested in these results?
3. Even if you are not interested in the query, are these results interesting to you personally?
4. Would you learn anything new about the query?
Entity Networks with Metadata
  Data    General   +Topic
Which result is more relevant to the query?
  WP      0.162     0.194
  YA      0.336     0.374
  Comb    0.201     0.222
If someone is interested in the query, would they also be interested in the result?
  WP      0.162     0.176
  YA      0.312     0.343
  Comb    0.184     0.222
Even if you are not interested in the query, is the result interesting to you personally?
  WP      0.139     0.144
  YA      0.324     0.359
  Comb    0.168     0.198
Would you learn anything new about the query from this result?
  WP      0.167     0.164
  YA      0.307     0.346
  Comb    0.184     0.203
(WP = Wikipedia, YA = Yahoo! Answers, Comb = combined)
Topical category constraint: promotes results of the same topic as the query entity
Sentiment and readability constraints: hurt performance
Table 6: Similarity (Kendall’s tau-b [Fagin]) between result sets and the reference ranking
R. Fagin, R. Kumar, M. Mahdian, et al. Comparing and aggregating rankings with ties. PODS 2004: 47-58.
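A plain-Python sketch of Kendall's tau-b, which corrects Kendall's tau for ties: tau_b = (n_c - n_d) / sqrt((n_0 - n_1)(n_0 - n_2)), where n_c and n_d count concordant and discordant pairs, n_0 = n(n-1)/2, and n_1, n_2 count pairs tied in the first and second ranking. It assumes both rankings are given as scores over the same items in the same order; how result sets containing different entities were aligned with the reference ranking is not stated on the slide.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two score (or rank) lists over the same items."""
    if len(x) != len(y):
        raise ValueError("x and y must score the same items")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(n), 2):
        dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of x[i] - x[j]
        dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of y[i] - y[j]
        if dx == 0:
            ties_x += 1  # pair tied in the first ranking
        if dy == 0:
            ties_y += 1  # pair tied in the second ranking
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")
```

If SciPy is available, `scipy.stats.kendalltau` (whose default variant is tau-b) can serve as a cross-check.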