From “Selena Gomez” to “Marlon Brando”: Understanding Explorative Entity Search

Iris Miliaraki, Roi Blanco, Mounia Lalmas | May 2015
24th International World Wide Web Conference (WWW 2015), Florence, Italy


Motivation

Example: the entity query “barcelona” is shown together with its suggested RELATED ENTITIES.

Spark system

▪  What is Spark? Given a query submitted to the Yahoo search engine, Spark provides related entity suggestions for the query, exploiting public knowledge bases from the Semantic Web as well as proprietary data.

▪  So, what does the young actress Selena Gomez have to do with Marlon Brando? This is a path that can be explored (and was explored) by a user following suggestions made by Spark.
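The kind of exploration Spark enables can be pictured as path-finding over a related-entity graph. The sketch below uses a toy, entirely hypothetical graph in place of Spark's knowledge-base-backed suggestions, and finds a chain of suggestion clicks from one entity to another:

```python
from collections import deque

# Hypothetical related-entity suggestions; Spark's real suggestions come
# from Semantic Web knowledge bases and proprietary data, not this dict.
RELATED = {
    "Selena Gomez": ["Justin Bieber", "Demi Lovato"],
    "Justin Bieber": ["Usher"],
    "Demi Lovato": ["Miley Cyrus"],
    "Usher": ["Michael Jackson"],
    "Michael Jackson": ["Marlon Brando"],
}

def find_path(start, goal):
    """Breadth-first search over related-entity suggestions."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in RELATED.get(path[-1], []):
            if nxt in seen:
                continue
            if nxt == goal:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None
```

A user following suggestions click by click traces exactly such a path, e.g. `find_path("Selena Gomez", "Marlon Brando")`.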

Example navigation patterns

Star behavior: the user clicks on many related entities for a single entity query.

Path behavior: the user follows a path of related entities, issuing different successive queries.
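The two patterns can be told apart mechanically from a session's click log. Below is an illustrative heuristic (not the paper's exact definition), where a session is a list of (query entity, clicked related entity) pairs:

```python
def classify_session(clicks):
    """Label a session of (query_entity, clicked_entity) pairs.

    Star: many related-entity clicks on a single entity query.
    Path: each new query is the entity clicked previously.
    Illustrative heuristic only, not the paper's definition.
    """
    queries = [q for q, _ in clicks]
    if len(set(queries)) == 1 and len(clicks) > 1:
        return "star"
    if all(clicks[i][0] == clicks[i - 1][1] for i in range(1, len(clicks))):
        return "path"
    return "mixed"
```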

Goals

●  Study how users interact with Spark recommendations
   ○  Which types of queries & entities do users interact with the most?
   ○  What are the characteristics of these sessions?
   ○  What is the interplay between typical search results & Spark entity recommendation results?
   ○  Does Spark promote an explorative behavior?

●  Predict user click behavior
   ○  Exploit the insights from the study to develop a set of query- and user-based features that reflect the click behavior of users, and explore their impact on click prediction for Spark

Talk outline

●  Motivation & Goals
●  Analysis
   ○  Dataset
   ○  Query-based analysis
   ○  User-based analysis
   ○  Other trends
●  Prediction task
   ○  Experimental setup
   ○  Features
   ○  Results
●  Sum up & contributions

Analysis: Dataset & metrics

▪  Dataset: collected a sample of 2M users focusing on activity related to Spark (queries triggering Spark)

▪  Metrics: Search & Spark CTR (click-through rate) for evaluating “user satisfaction”

▪  Due to confidentiality, all raw CTR values have been normalized via a linear transformation and all reported values are relative.
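The normalization described is a simple linear map over the raw CTRs. The sketch below uses arbitrary placeholder constants `a` and `b` (the actual transformation is confidential); the point is that such a map preserves relative ordering and differences:

```python
def ctr(clicks, impressions):
    """Raw click-through rate."""
    return clicks / impressions if impressions else 0.0

def normalize(values, a=2.0, b=0.1):
    """Linear transformation v -> a*v + b applied to raw CTRs.
    a and b are arbitrary stand-ins here; the paper's constants
    are confidential, so only relative values are reported."""
    return [a * v + b for v in values]
```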

Query-based analysis I: Search vs Spark CTR

[Scatter plot of Search CTR vs. Spark CTR, with three annotated regions: a mutual-growth area; a region of relatively low Spark & Search CTR; and a region of high Search CTR but low Spark CTR.]

Query-based analysis II: Search & Spark CTR across different entity types

Query-based analysis III: Search & Spark CTR for different query contexts

A user submitting a query with context looks for a more specialized set of results.

A user submitting a query without any surrounding context is more likely to click on Spark results.

User-based analysis I: Session duration effect

Shorter sessions have the highest search CTR: users come, find what they are looking for, and leave.

As the session length increases, search CTR decreases, likely because users try various queries to find what they are looking for.

Spark shows a different behavior: users are willing to explore the recommendations.

User-based analysis II: CTRs across age groups

Other trends: User age vs. Person entity age

Users are enticed to explore people closer to their own age (Pearson correlation of 0.859, with p < 0.0001).
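For reference, the reported Pearson correlation is computed over paired (user age, entity age) samples. A self-contained sketch of the coefficient (the real inputs would come from the query log, not these toy lists):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```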

Main insights

▪  Spark promotes explorative behavior.

▪  Users are more likely to navigate through the recommendations for specific types of queries, and when no specific context (e.g., “pictures”) is specified.

▪  Contrary to standard search behavior, where users find the information they need as soon as possible, users interacting with Spark entity recommendations explore the results, leading to longer sessions.

▪  Next: we build a model for predicting whether users will click on Spark results.

Talk outline

●  Motivation & Goals
●  Analysis
   ○  Dataset
   ○  Query-based analysis
   ○  User-based analysis
   ○  Other trends
●  Prediction task
   ○  Experimental setup
   ○  Features
   ○  Results
●  Sum up & contributions

Prediction task: setup

▪  Dataset:
   ›  sample of 100k users from which we collect their actions over a 6-month period
   ›  2 cases: users with any number of actions / users with at least 3 actions

▪  Task & Method: Given a user, her previous interactions, and a newly issued query, predict whether the user will interact with the Spark module, using logistic regression.

▪  Evaluation metrics: precision, negative predictive value, recall, specificity, accuracy, and AUC.
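All of the listed metrics except AUC (which needs ranking scores rather than hard labels) can be read directly off a binary confusion matrix. A minimal sketch, independent of any particular classifier:

```python
def confusion_metrics(y_true, y_pred):
    """Evaluation metrics from binary ground truth and predictions
    (1 = user clicked the Spark module, 0 = did not).
    AUC is omitted: it requires scores, not hard labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "precision": tp / (tp + fp),
        "npv": tn / (tn + fn),          # negative predictive value
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }
```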

Prediction task: features

The features fall into two groups: user-based features and query-based features.

Prediction task: performance

❏  User-based features significantly improve accuracy.

❏  Recall is low, showing that the circumstances under which a user will engage with the Spark module are diverse & cannot be easily captured.

Prediction task: user history

Using the previous actions of users (i = 1, 2, 3):

❏  The more recent the previous action used is, the more accurate the prediction (i = 3 corresponds to the most recent action).
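One way to encode the i = 1, 2, 3 history features is sketched below. The encoding (binary clicked/not-clicked flags per past action, zero-padded for users with short histories) is an assumption for illustration, not the paper's exact feature definition:

```python
def history_features(actions, k=3):
    """Turn a user's action history (1 = clicked Spark, 0 = did not)
    into features action_1..action_k, where action_k is the most
    recent action, mirroring the i = 1, 2, 3 setup above.
    Hypothetical encoding; older slots are zero-padded."""
    last = actions[-k:]
    last = [0] * (k - len(last)) + last
    return {f"action_{i + 1}": v for i, v in enumerate(last)}
```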

Talk outline

●  Motivation & Goals
●  Analysis
   ○  Dataset
   ○  Query-based analysis
   ○  User-based analysis
   ○  Other trends
●  Prediction task
   ○  Experimental setup
   ○  Features
   ○  Results
●  Sum up & contributions

Sum up & contributions

●  Large-scale analysis: the types of queries and entities that users interact with, who the users interacting with Spark are, the characteristics of their sessions, and the interplay between typical search results and Spark entity recommendation results.

●  Spark click prediction: developed a set of query- and user-based features that reflect the click behavior of users, and explored their impact in the context of click prediction on Spark.