Measuring Website Similarity using an Entity-Aware Click Graph. Pablo N. Mendes (1), Peter Mika (2), Hugo Zaragoza (2), Roi Blanco (2). (1) Freie Universität Berlin, (2) Yahoo! Research Barcelona. Nov 1st 2012, Maui, CIKM 2012.

Entity Aware Click Graph

DESCRIPTION

Query logs record the actual usage of search systems and their analysis has proven critical to improving search engine functionality. Yet, despite the deluge of information, query log analysis often suffers from the sparsity of the query space. Based on the observation that most queries pivot around a single entity that represents the main focus of the user’s need, we propose a new model for query log data called the entity-aware click graph. In this representation, we decompose queries into entities and modifiers, and measure their association with clicked pages. We demonstrate the benefits of this approach on the crucial task of understanding which websites fulfill similar user needs, showing that using this representation we can achieve a higher precision than other query log-based approaches.

Citation preview

Page 1: Entity Aware Click Graph

Measuring Website Similarity using an Entity-Aware Click Graph

Pablo N. Mendes (1), Peter Mika (2), Hugo Zaragoza (2), Roi Blanco (2)

1. Freie Universität Berlin

2. Yahoo! Research Barcelona

Nov 1st 2012, Maui, CIKM 2012

Page 2: Entity Aware Click Graph

Introduction: query log analysis

● Query logs record user interaction with Web search engines

● Query log analysis has been proven critical to improving search

● For search engines – Ranking, autosuggest, “Also try”, etc.

● For site owners – insight into user needs, allows optimizing Web presence, etc.

Page 3: Entity Aware Click Graph

Introduction: website similarity

● Click graph: relating queries and websites, edges are clicks

● Allows modeling website relatedness based on shared queries leading to each website pair

[Figure: click graph and the site similarity graph (SG) derived from it]
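
The step from click graph to site similarity graph can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slide does not name the similarity measure, so Jaccard overlap of query sets is an assumption, and the site names and click pairs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_click_graph(clicks):
    """Map each site to the set of queries that led to a click on it."""
    site_queries = defaultdict(set)
    for query, site in clicks:
        site_queries[site].add(query)
    return site_queries

def site_similarity_graph(site_queries):
    """Connect site pairs that share queries.

    Edge weight is the Jaccard overlap of the two query sets
    (an assumed measure, chosen only for illustration).
    """
    edges = {}
    for a, b in combinations(sorted(site_queries), 2):
        shared = site_queries[a] & site_queries[b]
        if shared:
            union = site_queries[a] | site_queries[b]
            edges[(a, b)] = len(shared) / len(union)
    return edges

# Hypothetical (query, clicked site) pairs.
clicks = [
    ("brad pitt", "films.example"),
    ("brad pitt", "encyclopedia.example"),
    ("brad pitt movies", "films.example"),
    ("maui weather", "weather.example"),
]
sg = site_similarity_graph(build_click_graph(clicks))
```

Only sites that received clicks from an identical query string become connected here, which is the limitation the following slides address.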

Page 4: Entity Aware Click Graph

Problems: Sparsity

● 44% of queries occur only once even when considering a full year of data [1]

● Using “shared queries” as a relatedness measure becomes difficult in the long tail.

[1] Baeza-Yates. Relating content through web usage. In HT ’09, 2009.

Page 5: Entity Aware Click Graph

Problems: partial overlaps

● Breaking up into words distorts semantics

– “Forest” vs “Forest Gump”

– “Pitt” vs “Brad Pitt”

Page 6: Entity Aware Click Graph

Introduction

● >62% of queries contain an entity name or type [20]

[20] Pound, Mika, & Zaragoza. Ad-hoc object retrieval in the web of data. In WWW’10, 2010.

Page 7: Entity Aware Click Graph

Entity-aware Click Graph

● Websites can share entities and/or modifiers
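
Decomposing a query into an entity and a modifier could look like the sketch below. The slides do not describe how entities are recognized, so a longest-match lookup against a small set of names is an assumption made here for illustration; `split_query` and the entity set are hypothetical stand-ins for a real entity recognizer and knowledge base.

```python
def split_query(query, entities):
    """Return (entity, modifier) using the longest matching entity name.

    `entities` is a set of lowercase entity names standing in for a
    knowledge base. Returns (None, query) when no entity is found.
    """
    tokens = query.lower().split()
    n = len(tokens)
    # Try longer spans first so "brad pitt" wins over "pitt".
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            span = " ".join(tokens[start:start + length])
            if span in entities:
                rest = tokens[:start] + tokens[start + length:]
                return span, " ".join(rest)
    return None, " ".join(tokens)

# Hypothetical entity dictionary.
entities = {"brad pitt", "pitt"}
print(split_query("Brad Pitt movies", entities))
print(split_query("pitt county jobs", entities))
```

Matching the longest span first keeps “Brad Pitt” together as one entity instead of letting the word “Pitt” create a spurious overlap with unrelated queries.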

Page 8: Entity Aware Click Graph

Entity-aware Website Similarity Graph

● More connected

● Preserves semantics

● Allows analysis of how websites relate to entities and modifiers
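
Why the entity-aware graph is more connected can be seen in a small sketch. The scoring below is an assumption, not the measure used in the paper: it mixes the Jaccard overlap of entities with the Jaccard overlap of modifiers, and the weight `alpha` and the site profiles are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def entity_aware_similarity(site_a, site_b, alpha=0.5):
    """Mix entity overlap and modifier overlap.

    `alpha` weights the entity part; (1 - alpha) weights the modifier part.
    """
    ent = jaccard(site_a["entities"], site_b["entities"])
    mod = jaccard(site_a["modifiers"], site_b["modifiers"])
    return alpha * ent + (1 - alpha) * mod

# Two hypothetical sites reached by "brad pitt movies" and "tom hanks movies".
site_a = {"entities": {"brad pitt"}, "modifiers": {"movies"}}
site_b = {"entities": {"tom hanks"}, "modifiers": {"movies"}}
score = entity_aware_similarity(site_a, site_b)
```

The two sites share no complete query, so a plain click graph would leave them unconnected, while the shared modifier “movies” still links them here.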

Page 9: Entity Aware Click Graph

Experiments

● Website similarity

– Find top K similar sites

– Evaluation: two sites are “similar” if they are in the same category in ODP (Open Directory Project)

● Website characteristics from the searcher's point of view

– What entities lead to a website

– What context words lead to a website
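
The evaluation criterion for the top K similar sites can be sketched as precision at K. This is an illustrative reading of the slide, not the authors' evaluation code; the category labels and site names below are hypothetical, not actual ODP entries.

```python
def precision_at_k(site, ranked_neighbors, category, k):
    """Fraction of the top-k neighbors that share the site's category.

    `category` maps a site to its directory category; sites without a
    known category never count as correct.
    """
    target = category.get(site)
    top = ranked_neighbors[:k]
    if target is None or not top:
        return 0.0
    hits = sum(1 for other in top if category.get(other) == target)
    return hits / len(top)

# Hypothetical category assignments and ranking.
category = {
    "films.example": "Arts/Movies",
    "reviews.example": "Arts/Movies",
    "weather.example": "News/Weather",
}
ranked = ["reviews.example", "weather.example"]
p_at_1 = precision_at_k("films.example", ranked, category, 1)
p_at_2 = precision_at_k("films.example", ranked, category, 2)
```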

Page 10: Entity Aware Click Graph

Dataset Statistics: Query Log

● 1 month of queries from Yahoo!, 45M sessions

● 5M entities from Freebase

Page 11: Entity Aware Click Graph

Results 1

● Similarity edge prediction

Page 12: Entity Aware Click Graph

Results 1

● Similarity edge prediction with credit to partial category overlap
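
One way partial category overlap could be credited is sketched below. The slide does not say how the credit is computed, so the formula is purely an assumption: categories are treated as slash-separated paths and the credit is the length of the shared prefix divided by the length of the longer path. The category strings are hypothetical.

```python
def partial_category_credit(cat_a, cat_b):
    """Credit in [0, 1] based on the shared prefix of two category paths."""
    a, b = cat_a.split("/"), cat_b.split("/")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    # Normalize by the longer path so only identical paths score 1.0.
    return shared / max(len(a), len(b))

credit = partial_category_credit("Arts/Movies/Titles", "Arts/Movies/Reviews")
```

Under this assumption, two sites in sibling subcategories earn most of the credit, while sites in unrelated top-level categories earn none.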

Page 13: Entity Aware Click Graph

Results 2

[Figure: websites plotted by entropy of the distribution of entities and entropy of the distribution of modifiers. Labeled regions: many entities, many modifiers; few entities, many modifiers; many entities, few modifiers]
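
The entropy used to characterize a website can be sketched with the standard Shannon formula applied to click counts. The choice of base 2 and the example counts are assumptions for illustration; the slide does not give these details.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a distribution given as item -> click count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total)
        for c in counts.values()
        if c > 0
    )

# Hypothetical per-site click counts by entity.
many_entities = {"brad pitt": 1, "tom hanks": 1, "maui": 1, "berlin": 1}
few_entities = {"brad pitt": 10}
h_many = entropy(many_entities)
h_few = entropy(few_entities)
```

A site reached through many different entities in roughly equal proportion has high entity entropy, while a site dedicated to a single entity has entropy near zero; the same function applies to modifier counts.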

Page 14: Entity Aware Click Graph

Results 2

Page 15: Entity Aware Click Graph

Conclusion

● Recognizing entities in Web search logs allows for click graphs that account for the internal composition of queries

● New similarity graphs built from entity-aware click graphs enable more robust and flexible similarity analysis (evaluated for website similarity)

● Future:

– Exploit the knowledge base (e.g. type hierarchy)

– More complex queries

– etc

Page 16: Entity Aware Click Graph

Thank you!

● Web: http://pablomendes.com

● E-mail: [email protected]

● Twitter: @pablomendes

● Slideshare: slideshare.net/pablomendes

Questions?