1
Matching Task Profiles and User Needs in Personalized Web Search
Julia Luxenburger, Shady Elbassuoni, Gerhard Weikum
CIKM’08
Advisor: Chia-Hui Chang
Student: Teng-Kai Fan
Date: 2009-10-13
2
Outline
Introduction
Model and Algorithms
  Architecture
  Personalization Framework
Experiments
Conclusion and Future Work
3
Introduction
Personalization provides a better search experience to individual users by taking the user's goals, tasks, and contexts into account.
The paper introduces language models for user tasks to represent the user profile.
The personalization framework selectively matches the actual user information need against relevant past user tasks.
4
Architecture
A client-side search personalization approach using a locally running proxy that intercepts all HTTP traffic.
Result re-ranking: whenever a user action allows the query representation to be updated, unseen results are re-ranked.
Query expansion: for some queries, the query sent to the search engine may be rewritten.
Merging of personalized and original results: personalized result ranks and original web ranks are aggregated to form the final result ranking.
Combination method: Dwork et al., "Rank aggregation methods for the web," WWW'01.
5
Personalization Framework
Profile: query chains (subsequently posed queries), result sets, clicked result pages, and the whole clickstream of subsequently visited web pages.
Session: delimited by the user's timing as well as the relatedness of subsequent user actions. Actions: (1) queries, (2) result clicks, (3) other page visits.
Task: the user's past search and browse behavior; tasks are obtained by hierarchically clustering the user's profile.
Facet: query facets are obtained by hierarchically clustering the query's result set (each result represented by its title and snippet).
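As a rough illustration of how result snippets might be grouped into facets, here is a greedy single-link clustering over bag-of-words cosine similarity; the paper uses a proper hierarchical clustering, and the similarity threshold and sample snippets below are made up:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster_snippets(snippets, threshold=0.4):
    """Greedy single-link clustering of result snippets into facets.

    A much-simplified stand-in for hierarchical clustering: each snippet
    joins the first cluster containing a sufficiently similar member;
    otherwise it starts a new cluster (facet).
    """
    vectors = [Counter(s.lower().split()) for s in snippets]
    clusters = []  # list of lists of snippet indices
    for i, v in enumerate(vectors):
        for cluster in clusters:
            if any(cosine(v, vectors[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

results = [
    "jaguar cars for sale",
    "used jaguar cars dealership",
    "jaguar big cat habitat",
]
facets = cluster_snippets(results)  # car-related snippets vs. the animal
```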
6
7
(Figure: hierarchy of Task, Session, and Profile, as defined above.)
8
Selective Personalization Strategy
Case I: the current query is the first query in the current session. The top-k tasks T1,…,Tk most similar to the query are retrieved from the user's profile.
Case II: some query history already exists, and the current query is a refinement of a previously issued query in the same session. The tasks in the user profile are complemented by a current task made up of all actions of the currently active session, represented by the language model Tk+1.
9
Selective Personalization Strategy cont.
The Kullback-Leibler (KL) divergence between a query facet Fi and a task Tj characterizes the strength of their similarity.
If KL(F∗i, T∗j) is larger than a threshold σ, the current query is assumed to address a previously unexplored task, and the search results are left unbiased.
Otherwise, the query sent to Google may be reformulated, or the original search results re-ranked.
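A minimal sketch of the KL-divergence test, assuming toy unigram models for a facet and a task; the eps floor stands in for the smoothing the paper applies, and σ = 5.0 is purely illustrative:

```python
import math

def kl_divergence(p, q, vocab, eps=1e-9):
    """KL(p || q) over a shared vocabulary of terms.

    p and q are dicts mapping term -> probability; eps guards against
    zero probabilities in q (in the paper, smoothing plays this role).
    """
    return sum(
        p[w] * math.log(p[w] / q.get(w, eps))
        for w in vocab
        if p.get(w, 0.0) > 0.0
    )

facet = {"python": 0.5, "snake": 0.5}   # toy facet language model
task = {"python": 0.6, "code": 0.4}     # toy task language model
vocab = set(facet) | set(task)

SIGMA = 5.0  # illustrative threshold; the paper tunes sigma empirically
divergence = kl_divergence(facet, task, vocab)
personalize = divergence <= SIGMA  # personalize only if the facet matches a known task
```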
10
Means of Personalization
The query representation is updated with the terms that best discriminate the query facet F∗i from all other query facets, while being most similar to the task T∗j.
That is, the terms with the largest impact on the KL divergence between the union of the chosen facet-task pair and the remaining query facets.
11
Means of Personalization cont.
A threshold δ controls whether the query sent to Google is automatically reformulated:
Terms with v(w) < δ qualify for query expansion.
Terms with δ < v(w) < τ and P(w|∪iFi) > 0 qualify for re-ranking the original top-50 search results.
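The two thresholds can be read as a routing rule over term scores v(w); the scores and probabilities below are hypothetical, since in the paper v(w) comes from the KL-impact computation described above:

```python
def route_terms(scores, facet_union_prob, delta, tau):
    """Split candidate terms into expansion and re-ranking sets.

    scores: term -> v(w), where lower means stronger discriminative impact.
    facet_union_prob: term -> P(w | union of all facets).
    Terms with v(w) < delta drive query expansion; terms with
    delta < v(w) < tau and positive facet probability drive re-ranking.
    """
    expansion = [w for w, v in scores.items() if v < delta]
    rerank = [
        w for w, v in scores.items()
        if delta < v < tau and facet_union_prob.get(w, 0.0) > 0.0
    ]
    return expansion, rerank

# Hypothetical term scores and facet probabilities, for illustration only.
v = {"jaguar": 0.1, "car": 0.4, "cat": 0.8}
p = {"jaguar": 0.2, "car": 0.1, "cat": 0.0}
exp_terms, rerank_terms = route_terms(v, p, delta=0.3, tau=0.9)
```

"cat" is filtered out despite its score because it has zero probability under the facet union, matching the P(w|∪iFi) > 0 condition.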
12
Task Language Model
The language model of a user task is a weighted mixture of its components: queries, result clicks, clickstream documents, and query-independent browsed documents.
Thus, the task language model T is a mixture of:
Q, a uniform mixture of the task's query chain models, and
B, the average of the individual browsed documents' language models.
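A weighted mixture of unigram models can be sketched as follows; the 0.5/0.5 weights and the toy models Q and B are assumptions, since the slide does not give the actual mixture weights:

```python
def mix_models(models, weights):
    """Weighted mixture of unigram language models.

    models: list of dicts mapping term -> probability; weights must sum to 1.
    A task model T can be built this way from its query-chain model Q and
    browsed-documents model B.
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    vocab = set().union(*models)
    return {
        w: sum(wt * m.get(w, 0.0) for m, wt in zip(models, weights))
        for w in vocab
    }

Q = {"jaguar": 0.7, "car": 0.3}      # toy query-chain model
B = {"jaguar": 0.4, "speed": 0.6}    # toy browsed-documents model
T = mix_models([Q, B], [0.5, 0.5])   # 0.5/0.5 is an illustrative weighting
```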
13
Query Language Model
Let QC denote a query chain Q1, Q2, …, Qk.
The query language model is the average of all query chains' models.
14
Query Language Model cont.
The mixture model combines:
q: the query string.
CR: the set of clicked result items.
NR: non-clicked result items ranked above a clicked one.
UR: unseen results ranked below the lowest-ranked clicked item.
CS: the set of clickstream documents beyond the result documents.
15
Query Language Model cont.
All constituent language models employ Dirichlet prior smoothing:
P(w|d) = (c(w,d) + μ·P(w|C)) / (|d| + μ)
μ = 2000; c(w,d): the frequency of word w in d; P(w|C): the collection background model.
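Dirichlet prior smoothing can be sketched as below, with μ = 2000 as on the slide; the toy counts and background model are illustrative:

```python
def dirichlet_smooth(term_counts, doc_len, background, mu=2000):
    """Dirichlet-prior smoothed unigram model for one document.

    P(w|d) = (c(w,d) + mu * P(w|C)) / (|d| + mu), where `background`
    is the collection model P(w|C) and mu = 2000 as on the slide.
    """
    vocab = set(term_counts) | set(background)
    return {
        w: (term_counts.get(w, 0) + mu * background.get(w, 0.0)) / (doc_len + mu)
        for w in vocab
    }

counts = {"jaguar": 3, "car": 1}                   # word counts in one document
bg = {"jaguar": 0.001, "car": 0.01, "the": 0.05}   # toy collection model
model = dirichlet_smooth(counts, doc_len=4, background=bg)
```

Words unseen in the document (like "the" above) still receive non-zero probability from the background model, which is what makes KL divergences between such models well defined.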
16
Facet Language Model
The facet language model F is the uniform mixture of the language models of the result snippets s ∈ F.
17
Experimental Setup
7 volunteers installed the proxy to log their search and browsing activities over a period of 2 months.
18
Experimental Setup cont.
Each participant evaluated 8 self-chosen search tasks.
A search task is a sequence of query, click, and browsing actions that continues until the user's information need is satisfied.
For each task, the participant was presented with the top-50 Google results.
The participant then marked each result as highly relevant, relevant, or completely irrelevant.
Furthermore, users grouped the top-10 results of each query by assigning labels to the groups.
In total: 59 search tasks and 89 individual evaluation queries.
19
Experimental Setup cont.
Measure: Discounted Cumulative Gain (DCG).
i: the rank of the result within the result set.
G(i): the relevance level of the result: G(i) = 2 for highly relevant, G(i) = 1 for relevant, G(i) = 0 for non-relevant documents.
(Table: parameters for the task and query language models.)
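The DCG measure can be sketched as follows, using one common formulation in which the gain at rank 1 is undiscounted and later gains are divided by log2 of the rank; the slide does not show the exact discount function used:

```python
import math

def dcg(gains):
    """Discounted cumulative gain over a ranked list of relevance grades.

    The grade at rank 1 counts fully; the grade at rank i > 1 is
    discounted by log2(i).
    """
    return sum(
        g if i == 1 else g / math.log2(i)
        for i, g in enumerate(gains, start=1)
    )

# Relevance grades down a ranking: 2 = highly relevant, 1 = relevant, 0 = not.
grades = [2, 0, 1]
score = dcg(grades)
```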
20
Evaluation Results with Re-ranking
Fixed: the same fixed number of expansion terms for every query.
Flexible: the optimal threshold τ.
Enforced (no tasks): unaware of both query facets and tasks.
Enforced (no facets): distinguishes between history tasks but still treats each query in its entirety.
21
Query Expansion Evaluation
(Chart comparing the expansion criteria v(w) < τ and v(w) < δ.)
22
Correlating KL-divergence with Performance Gains
Goal: to predict whether a query benefits from personalization.
A negative correlation indicates that queries with more relevant information in the local index (i.e., lower KL divergence) tend to gain more from personalization.
23
Parameterizing the personalization framework
24
Efficiency
Tasks are computed offline.
An incremental clustering approach is used to fold new sessions into the existing task clusters.
Hammouda et al., "Incremental document clustering using cluster similarity histograms," WI'03.
25
Conclusion
The authors proposed a comprehensive language-model framework that represents user tasks and matches the current user need against past user tasks.
The model considers both past viewed documents and past queries.
The proposed method achieved significant gains over both the Google ranking and traditional personalization approaches.