Upload
brandon-johns
View
213
Download
0
Embed Size (px)
Citation preview
Enhancing Web Search by Promoting Mul-tiple Search Engine Use
Ryen W. W., Matthew R. Mikhail B. (Microsoft Research)
Allison P. H (Rice University)
SIGIR 2008
Presented by Jae-won Lee
Copyright 2008 by CEBT
Introduction
Users are generally loyal to one search engine Even though it may not satisfy their needs
A given search engine performs well for some queries and poorly for others Excessive loyalty can hinder search effectiveness
Our Goal Support engine switching by recommending the most effective
search engine for a given query
IDS Lab. Seminar - 2Center for E-Business Technology
Copyright 2008 by CEBT
Related Work
Meta search engines such as Clusty and Dogpile Merge search results
Switching search engine is more attractive Strong brand loyalty may discourage users from migrating to meta
search engine
Meta search engine eliminates the benefits of interface feature of each engine
Hurts source engine brand awareness
We lets users keep their default engine Suggest an alternative engine if it performs better for the current
query
IDS Lab. Seminar - 3Center for E-Business Technology
Copyright 2008 by CEBT
Does Switching Help Users?
Potential Benefits of Switching We quantify the benefits of multiple engine use
– Normalized Discounted Cumulative Gain (NDCG)
Measure a topical relevance of results to a given query
– Click-through rate for search results
Reasonable estimation of search result utility
NDCG
– A measure to evaluate the Web search engine performance
Where Ni : a constant for normalization
r(i) : relevance score of the ith result, 0 (bad) ~ 5 (perfect)
IDS Lab. Seminar - 4Center for E-Business Technology
Copyright 2008 by CEBT
Does Switching Help Users?
Potential Benefits of Switching (cont’d) X, Y, and Z are anonymous notations of Google, Yahoo, and Live
Search
Number of queries for which engine performs best
Engine choice for particular query is important
IDS Lab. Seminar - 5Center for E-Business Technology
Search engine Relevance (NDCG) Result click-through rate
X 952 (19.3%) 2,777 (56.4%)
Y 1,136 (23.1%) 1,226 (24.9%)
Z 789 (16.1%) 892 (18.1%)
No difference 2,044 (41.5%) 26 (0.6%)
Copyright 2008 by CEBT
Switching as Classification - Query Pro-cessing
IDS Lab. Seminar - 6Center for E-Business Technology
Query
FeatureExtractor
Classifier(offline Training)
Recommend a Search Engine
Features
Search Engines
Result sets
Copyright 2008 by CEBT
Classifier Features
Classifier must recommend an engine in real-time Derive features from result pages, a query and query-result match-
ing
Features Features from result pages
Features from the query
Features from the query-result page match
IDS Lab. Seminar - 7Center for E-Business Technology
Copyright 2008 by CEBT
Classifier Features
IDS Lab. Seminar - 8Center for E-Business Technology
Copyright 2008 by CEBT
Classifier
Notation q : query
R : result page of original engine
R’ : result page of target engine
R* = {(d1,s1), …, (dk,sk)} : Human-judged result set - dk(result
page), sk (score)
Utility of each engine : U(R) = NDCGR*(R) , U(R’) = NDCGR*(R’)
Training Each training instance D = {(x, y)}
– x = f(q, R, R’) ; comprised of features derived from the query and result page
– y = 1 iff NDCGR*(R’) >= NDCGR*(R) + margin
Switching engine if utility is higher by at least some margin
IDS Lab. Seminar - 9Center for E-Business Technology
Copyright 2008 by CEBT
Experiments
Evaluate accuracy of switching support to determine its viability
Data Set From Google, Yahoo, Live Search logs
IDS Lab. Seminar - 10Center for E-Business Technology
Total number of queries 17,111
Total number of judged pages 4,254,730
Total number of judged pages labeled Fair or higher 1,378,011
Copyright 2008 by CEBT
Precision – Recall Results
The proposed method can achieve high accuracy
Therefore, the method can be used for providing useful search engine suggestions to users
IDS Lab. Seminar - 11Center for E-Business Technology
Copyright 2008 by CEBT
Avoiding Querying the Alternative En-gine
Evaluating the utility of target engine is undesirable to some users due to the network traffic
So, only use the features from the current engine’s result pages
IDS Lab. Seminar - 12Center for E-Business Technology
Copyright 2008 by CEBT
Contribution of Features
All sets of features contribute to accuracy
Features obtained from result pages seems to provide the most benefit
IDS Lab. Seminar - 13Center for E-Business Technology
Copyright 2008 by CEBT
Conclusion
Demonstrated potential benefit of switching
Described a method for automatically determining when to switch engines for a given query
Evaluated the method and illustrated good performance, especially at usable recall
IDS Lab. Seminar - 14Center for E-Business Technology
Copyright 2008 by CEBT
Pros. & Cons.
Pros. Propose a new research area of IR by switching support
Good explanation for user behavior
Cons. Poor explanation for equations
No analysis for the experiment results
IDS Lab. Seminar - 15Center for E-Business Technology