Predicting Short-Term Interests Using Activity-Based Search Context
CIKM’10
Advisor: Jia Ling, Koh
Speaker: Yu Cheng, Hsieh
Outline
• Introduction
• Modeling Search Activity
• Study
• Conclusions
Introduction
• Satisfying searchers’ information needs requires a thorough understanding of their interests, which can be inferred from:
- search queries
- search engine result page (SERP) clicks
- post-SERP browsing behavior
• Construct interest models of the current query that also draw on its context, including:
- previous queries
- previous clicks on SERPs
• Evaluate the predictive effectiveness of these models using future actions
Modeling Search Activity
• Data
- The data set contained browser logs with both searching and browsing episodes.
- Log entries include a timestamp for each page view and the URL of the Web page visited.
- Only logs from the English-speaking United States locale were used.
- Search sessions on the Bing Web search engine were extracted.
Modeling Search Activity
• ODP Labeling
- Context is represented as a distribution across categories in the Open Directory Project (ODP) topical hierarchy.
- This provides a consistent topical representation of queries and page visits from which to build the models.
- ODP category labels can also reflect topical differences in the search results for a query or in a user’s interests.
- An automatic classification technique assigns an ODP category label to each page.
- The 219 categories at the top two levels of the ODP hierarchy were used (called L).
Modeling Search Activity
• ODP Labeling
- Strategy for labeling a page:
1. Begin with URLs present in the ODP.
2. Incrementally prune non-present URLs until a match is found or a miss is declared.
3. Check for an exact match first, then fall back to a logistic regression classifier.
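The lookup-with-pruning strategy above can be sketched as follows. This is a minimal illustration, not the study's implementation: `ODP_INDEX` is a hypothetical toy directory, pruning is assumed to mean dropping one trailing path segment at a time, and `fallback_classifier` is a caller-supplied stand-in for the logistic regression classifier.

```python
from urllib.parse import urlsplit

# Hypothetical toy stand-in for the ODP directory: URL prefix -> category.
ODP_INDEX = {
    "example.com/sports": "Sports/Soccer",
    "example.org": "Computers/Software",
}

def lookup_with_backoff(url, index=ODP_INDEX):
    """Return the category of the longest indexed prefix of url, or None on a miss."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = [s for s in parts.path.split("/") if s]
    # Try the full path first, then drop one trailing segment at a time.
    while True:
        key = "/".join([host] + segments)
        if key in index:
            return index[key]
        if not segments:
            return None  # miss declared: not even the host is indexed
        segments.pop()

def label_page(url, fallback_classifier):
    """Use the directory lookup when it matches, otherwise the fallback (assumed)."""
    label = lookup_with_backoff(url)
    return label if label is not None else fallback_classifier(url)
```

For example, `lookup_with_backoff("http://example.com/sports/teams/a.html")` backs off twice before matching the `example.com/sports` entry.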
Modeling Search Activity
• Sources and Source Combinations
- ODP labels are automatically assigned to the following sources:
1. Query: the top 10 search results for the query
2. SERPClick: the search results clicked by the user during the search session
3. NavTrail: Web pages that the user visits from a SERP click
Modeling Search Activity
• Model Definitions – Query Model (Q)
- For each query, the category labels for the top 10 search results were obtained.
- Probabilities are assigned to the categories in L using:
1. normalized click frequencies for each of the top 10 results, from search-engine click log data
2. the distribution across all ODP category labels
- ODP categories in L that are not used to label any result are assigned their prior probabilities.
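One way to read the query model is as a click-weighted category distribution smoothed with a prior. The sketch below works under that assumption; the interpolation weight `alpha` and the function name are hypothetical, and the slide does not state how the two components are actually combined.

```python
from collections import defaultdict

def query_model(results, prior, alpha=0.1):
    """Build a distribution over the categories in L for one query.

    results: list of (category, click_count) pairs for the top results.
    prior:   dict mapping every category in L to its prior probability.
    alpha:   hypothetical smoothing weight given to the prior.
    """
    total_clicks = sum(clicks for _, clicks in results)
    if total_clicks == 0:
        return dict(prior)  # no click evidence: fall back to the prior alone
    clicked = defaultdict(float)
    for category, clicks in results:
        clicked[category] += clicks / total_clicks  # normalized click frequency
    # Interpolate so categories not used as labels still receive prior mass.
    return {cat: (1 - alpha) * clicked.get(cat, 0.0) + alpha * p
            for cat, p in prior.items()}
```

The result sums to 1 as long as `prior` sums to 1 and covers every category that appears in `results`.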
Modeling Search Activity
• Model Definitions – Context Model (X)
- The context model is built from the actions that precede the current query in the session:
1. Queries
2. Web pages visited through a SERP click
3. Web pages visited on the navigational trail following a SERP click
Modeling Search Activity
• Model Definitions – Intent Model (I)
Modeling Search Activity
• Relevance Model or Ground Truth (R)
- The relevance model contains the actions that occur after the current query in the session.
Study
• Learning Optimal Context Weights
- Steps:
1. Identify the optimal context weight (w) for each query on a held-out training set.
2. Create features for the query and the context that could be useful in predicting w.
Study
• Learning Optimal Context Weights
- To create a training set, the query, context, and relevance models were used to compute the optimal context weight per query by minimizing the regularized cross-entropy for each query independently.
Study
• The objective includes a regularizer that penalizes deviations from w = 0.5.
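A per-query search for the context weight could look like the sketch below. Several details are assumptions not given on the slides: the intent model is taken to be the linear mix (1 - w) * Q + w * X, the regularizer is taken to be the quadratic penalty lam * (w - 0.5)^2, and a simple grid search replaces whatever optimizer the study used.

```python
import math

def intent_model(q, x, w):
    """Assumed form: linear interpolation of query model q and context model x."""
    return {c: (1 - w) * q[c] + w * x[c] for c in q}

def cross_entropy(r, p, eps=1e-12):
    """Cross-entropy of prediction p measured against relevance model r."""
    return -sum(r[c] * math.log(max(p[c], eps)) for c in r)

def optimal_context_weight(q, x, r, lam=0.1, steps=100):
    """Grid-search w in [0, 1] minimizing cross-entropy plus lam * (w - 0.5)**2."""
    best_w, best_loss = 0.5, float("inf")
    for i in range(steps + 1):
        w = i / steps
        loss = cross_entropy(r, intent_model(q, x, w)) + lam * (w - 0.5) ** 2
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w
```

When the relevance model resembles the context model, the returned weight moves above 0.5; when it resembles the query model, it moves below. A very large `lam` pins the weight at 0.5.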
Study
• Generating Features of Query and Context
- Features are divided into three classes:
1. Query class: captures characteristics of the current query and the query model.
2. Context class: captures aspects of the pre-query interaction behavior as well as features of the context model itself.
3. QueryContext class: captures aspects of how the query model and context model compare.
- These features were generated for each session in the set and used to train a predictive model.
Study
• Generating Features of Query and Context
- Query class
Study
• Generating Features of Query and Context
- Context class
Study
• Generating Features of Query and Context
- QueryContext class
Study
• Predicting the Optimal Context Weight
- 60% of the queries were used for training, 20% for validation, and 20% for testing.
- 10-fold cross-validation was performed to improve result reliability.
- The folds were constructed by splitting on sessions, so that all queries in a session are used for only one of training, validation, or testing.
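Splitting on sessions rather than on individual queries can be sketched as below. The function name, the fixed seed, and the rounding of partition sizes are illustrative assumptions; only the 60/20/20 proportions and the session-level constraint come from the slide.

```python
import random

def split_by_session(session_ids, seed=0, train=0.6, valid=0.2):
    """Assign each query to 'train', 'valid', or 'test' based on its session.

    session_ids: one session identifier per query, in query order.
    Returns a list of partition names aligned with session_ids.
    """
    unique = sorted(set(session_ids))
    random.Random(seed).shuffle(unique)  # reproducible shuffle of sessions
    n_train = round(len(unique) * train)
    n_valid = round(len(unique) * valid)
    assignment = {}
    for i, sid in enumerate(unique):
        if i < n_train:
            assignment[sid] = "train"
        elif i < n_train + n_valid:
            assignment[sid] = "valid"
        else:
            assignment[sid] = "test"
    # Every query inherits the partition of its session.
    return [assignment[sid] for sid in session_ids]
```

Because the partition is chosen per session, no session contributes queries to more than one partition. Repeating the call with different seeds gives different splits.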
Study
• Predicting the Optimal Context Weight
- The most performant features related to the information divergence between the query model and the context model.
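A divergence feature between two category distributions could be computed as in the sketch below. The slide does not say which divergence measure was used, so Kullback-Leibler and Jensen-Shannon divergence are shown only as common choices; both functions assume the two models are dicts over the same categories.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; terms with p[c] == 0 contribute nothing."""
    return sum(p[c] * math.log(p[c] / max(q[c], eps)) for c in p if p[c] > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by log(2)."""
    m = {c: 0.5 * (p[c] + q[c]) for c in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

A value near zero means the query and context models agree on topic; a value near log(2) means they place their mass on disjoint categories.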
Study
• Predicting the Optimal Context Weight
Study
• Varying Context and Relevance Information
Conclusions
• A study investigating the effectiveness of activity-based context in predicting users’ search interests.
• Explored the value of modeling the current query, its context, and their combination, using different sources.
• Intent models developed from many sources perform best overall.
• Developed techniques to learn the optimal combinations.