Predicting Short-Term Interests Using Activity-Based Search Context

CIKM’10

Advisor: Jia Ling, Koh

Speaker: Yu Cheng, Hsieh

Outline

• Introduction

• Modeling Search Activity

• Study

• Conclusions

Introduction

• Satisfying searchers’ information needs involves a thorough understanding of their interests through:

- search query

- search engine result page (SERP) clicks

- post-SERP browsing behavior

• Construct interest models of the current query that include:

- previous queries

- previous clicks on SERP

• Evaluate the predictive effectiveness of these models using future actions

Modeling Search Activity

• Data

- The data set contained browser logs with both searching and browsing episodes.

- Log entries include a timestamp for each page view and the URL of the Web page visited.

- Only logs from the English-speaking United States locale were used.

- Search sessions on the Bing Web search engine were extracted.


• ODP Labeling

- Context is represented as a distribution across categories in the ODP topical hierarchy.

- This provides a consistent topical representation of queries and page visits from which to build the models.

- ODP category labels can also reflect topical differences in the search results for a query or in a user’s interests.

- An automatic classification technique assigns an ODP category label to each page.

- 219 categories at the top two levels of the ODP hierarchy were used (called L).


• ODP Labeling

- Strategy for labeling a page:

1. Begin with URLs present in the ODP

2. Incrementally prune non-present URLs until a match is found or a miss is declared

3. Check for exact match with logistic regression classifier
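The lookup-and-prune steps above can be sketched as a back-off over URL prefixes. This is only an illustration: the `odp_labels` dictionary, its sample entries, and the function name `lookup_odp_label` are hypothetical, and the classifier step is not shown.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical ODP directory: URL prefix (host + path) -> category label.
odp_labels = {
    "example.com/sports": "Sports/Basketball",
    "example.org": "Computers/Software",
}

def lookup_odp_label(url: str, directory: dict) -> Optional[str]:
    """Back off from the full URL to shorter prefixes until a match is found."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = [s for s in parts.path.split("/") if s]
    # Try the full path first, then drop one trailing segment at a time.
    while True:
        key = "/".join([host] + segments)
        if key in directory:
            return directory[key]
        if not segments:
            return None  # miss declared; a classifier could take over here
        segments.pop()
```

A URL such as `http://example.com/sports/nba/scores.html` would back off twice before matching the `example.com/sports` entry.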


• Sources and Source Combinations

- ODP labels are automatically assigned to the following sources:

1. Query: the top 10 search results for the query

2. SERPClick: the search results clicked by the user during the search session

3. NavTrail: Web pages that the user visits from a SERP click


• Model Definitions – Query Model (Q)

- For each query, the category labels for the top 10 search results were obtained.

- Probabilities are assigned to the categories in L by:

1. obtaining normalized click frequencies for each of the top 10 results from search-engine click log data

2. computing the distribution across all ODP category labels

- ODP categories in L that are not used to label any result are assigned prior probabilities.
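A minimal sketch of the query model computation follows. The slide does not say how the prior is combined with the click-based distribution, so the `prior_mass` parameter (the share of probability reserved for unlabeled categories) is an assumption made for illustration, as are all function and argument names.

```python
from collections import defaultdict

def query_model(result_labels, click_counts, prior, prior_mass=0.1):
    """Distribution over categories for one query.

    result_labels: category label for each top-ranked result
    click_counts:  click count for each of those results (same order)
    prior:         dict category -> prior probability over L
    prior_mass:    ASSUMED share of probability given to unseen categories
    """
    total_clicks = sum(click_counts)
    if total_clicks == 0:
        return dict(prior)  # no click evidence: fall back to the prior

    seen = defaultdict(float)
    for label, clicks in zip(result_labels, click_counts):
        seen[label] += clicks / total_clicks  # normalized click frequency

    unseen = [c for c in prior if c not in seen]
    unseen_prior = sum(prior[c] for c in unseen)
    if not unseen or unseen_prior == 0:
        return dict(seen)

    # Scale observed categories down and spread prior_mass over the rest
    # in proportion to their prior probabilities.
    model = {c: (1 - prior_mass) * p for c, p in seen.items()}
    for c in unseen:
        model[c] = prior_mass * prior[c] / unseen_prior
    return model
```

The returned dictionary sums to 1 whenever `prior` does, so it can be compared directly with other category distributions.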


• Model Definitions – Context Model (X)

- The context model is built from the actions that precede the current query:

1. Queries

2. Web pages visited through a SERP click

3. Web pages visited on the navigational trail following a SERP click



• Model Definition – Intent Model(I)


• Relevance Model or Ground Truth (R)

- The relevance model contains actions that occur following the current query in the session.


Study


• Learning Optimal Context Weights

- Steps:

1. Identify the optimal context weight (w) for each query on a held-out training set

2. Create features for the query and the context that could be useful in predicting w


- To create a training set, the query, context, and relevance models were used to compute the optimal context weight per query by minimizing the regularized cross-entropy for each query independently.

- The objective includes a regularizer that penalizes deviations from w = 0.5.
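The per-query optimization can be illustrated with a simple grid search. This sketch makes several assumptions that are not stated on the slides: the intent model is taken to be a linear mixture of the context and query models, the regularizer is quadratic with a hypothetical strength `alpha`, and the weight is searched over an evenly spaced grid.

```python
import math

def cross_entropy(r, p, eps=1e-12):
    """H(r, p) = -sum_c r[c] * log p[c], clipping p to avoid log(0)."""
    return -sum(rc * math.log(max(p.get(c, 0.0), eps))
                for c, rc in r.items() if rc > 0)

def mix(q, x, w):
    """ASSUMED intent model: weight w on context x, 1 - w on query q."""
    cats = set(q) | set(x)
    return {c: w * x.get(c, 0.0) + (1 - w) * q.get(c, 0.0) for c in cats}

def optimal_context_weight(q, x, r, alpha=0.1, steps=101):
    """Grid-search w in [0, 1] minimizing regularized cross-entropy."""
    best_w, best_loss = 0.5, float("inf")
    for i in range(steps):
        w = i / (steps - 1)
        # Quadratic penalty (an assumption) pulls w toward 0.5.
        loss = cross_entropy(r, mix(q, x, w)) + alpha * (w - 0.5) ** 2
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w
```

When the relevance model agrees only with the context model the search returns a weight near 1, when it agrees only with the query model it returns a weight near 0, and when the two models are identical the regularizer leaves the weight at 0.5.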


• Generating Features of Query and Context

- Features are divided into three classes:

1. Query class: capturing characteristics of the current query and the query model.

2. Context class: capturing aspects of the pre-query interaction behavior as well as features of the context model itself.

3. QueryContext class: capturing aspects of how the query model and context model compare.

- These features were generated for each session in the set and used to train a predictive model.

- Feature tables (not reproduced): Query class, Context class, QueryContext class

• Predicting the Optimal Context Weight

- 60% of the queries were used for training, 20% for validation, and 20% for testing.

- 10-fold cross-validation was performed to improve result reliability.

- The folds were constructed by splitting on sessions, so that all queries in a session are used for either training, validation, or testing.
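Session-level splitting can be sketched as below. The function name, the `(session_id, query)` input format, and the fixed random seed are assumptions for illustration. The split is made over sessions, so the per-partition query counts only approximate 60/20/20, and the repetition over 10 folds is not shown.

```python
import random
from collections import defaultdict

def split_by_session(queries, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole sessions to train/validation/test partitions.

    queries: iterable of (session_id, query) pairs
    """
    by_session = defaultdict(list)
    for session_id, query in queries:
        by_session[session_id].append(query)

    # Shuffle session ids, not individual queries, so that a session
    # never straddles two partitions.
    session_ids = sorted(by_session)
    random.Random(seed).shuffle(session_ids)

    n = len(session_ids)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    groups = {
        "train": session_ids[:n_train],
        "validation": session_ids[n_train:n_train + n_valid],
        "test": session_ids[n_train + n_valid:],
    }
    return {name: [(s, q) for s in ids for q in by_session[s]]
            for name, ids in groups.items()}
```

Keeping sessions intact prevents queries from the same session from leaking between the training and test partitions.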


• Predicting the Optimal Context Weight

- The most predictive features relate to the information divergence between the query model and the context model.
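The slide does not name the divergence measure, so the following is only an illustration of how such a feature might be computed. It uses Kullback-Leibler and Jensen-Shannon divergence as stand-ins, with category distributions given as dictionaries; the function names are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over category distributions given as dicts."""
    return sum(pc * math.log(pc / max(q.get(c, 0.0), eps))
               for c, pc in p.items() if pc > 0)

def js_divergence(p, q):
    """Symmetric, bounded Jensen-Shannon divergence (natural log)."""
    cats = set(p) | set(q)
    # Midpoint distribution between the two models.
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in cats}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

A value near zero indicates that the query and context models agree on topic, while a value near log 2 indicates that they place their mass on different categories.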


• Varying Context and Relevance Information

Conclusions

• Studied the effectiveness of activity-based context in predicting users’ search interests.

• Explored the value of modeling the current query, its context, and their combination, using different sources.

• Intent models developed from many sources perform best overall.

• Developed techniques to learn the optimal combinations.
