DQR: A Probabilistic Approach to Diversified Query Recommendation
Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, Eric Lo
Speaker: Ruirui Li
The University of Hong Kong

Outline:
Motivation
Problem Statement
DQR Model
Experiments & Evaluation
Motivation
Massive amounts of information have arisen on the Internet.
[Figure: number of URLs indexed by Google by year: 8 billion in 2004, 1 trillion in 2008, and an unknown, still-growing number by 2012. Numbers from Google's 2004 annual report and its official blog in 2008.]
Motivation
User activities in the search process: the user has a search intent, issues a query, receives search results, and clicks some of them.
Example search intent: lodging near CIKM. Possible queries: "CIKM living place", "CIKM 2012 Hotel", "Maui Hotel".
Motivation
The effectiveness of IR depends on the input queries, and users suffer: translating human thoughts (a search intent) into a concise set of keywords (a query) is never straightforward.
Motivation
Input queries are short: most are composed of only one or two terms.
[Figure: distribution of the number of terms per query.]
Motivation
Short queries lead to two issues.
Issue 1. Ambiguity. Example: the query "jaguar" could refer to the cat, the NFL team, or the automobile brand.
Issue 2. Not specific enough. Example: the query "Disney" is not ambiguous, but because it is so short it is too general (the cartoon? the store? the park?), so search engines may not know what the user wants to find.

Motivation
Most traditional approaches focus on relevance.
1. The queries most relevant to the input query tend to be similar to each other.
2. This generates redundant and monotonic recommendations.
3. Such recommendations provide limited coverage of the recommendation space.
Motivation
A recommender should provide queries that are not only relevant but also diversified. With diversified recommendations:
1. We can cover multiple potential search intents of the user.
2. The risk that users won't be satisfied is minimized.
3. Users find their desired targets in fewer recommendation cycles.
Problem statement
Input: a query q and an integer m (the number of queries to recommend).
Output: a list Y of m recommended queries.
GOAL: at least one query in Y is relevant to the user's search intent.
Relevance: the search intents of the recommended queries should not drift dramatically from the input query.
Diversity: the recommended queries should cover as many different interpretations of the input query as possible.
Ranking: highly relevant and diverse queries should rank high in the recommendation list.
Query recommendation (QR) is an online application, so it must respond in real time.
Problem statement
Five required properties:
1. Relevance: the search intents of the recommended queries should not drift dramatically from the input query.
2. Redundancy-free: recommended queries should not repeat the same intent.
3. Diversity: the recommended queries should cover as many different interpretations of the input query as possible.
4. Ranking: highly relevant and diverse queries should rank high in the recommendation list.
5. Real-time response: QR is an online application.
DQR: framework
Offline (addresses the redundancy-free issue): mine query concepts from the search log.
Online (addresses the diversity issue): a probabilistic diversification model.
DQR: offline
Mining query concepts. The same search intent can be expressed by different queries.
Example: "Microsoft Research Asia", "MSRA", "MS Research Beijing".
A query concept is a set of queries that express the same or similar search intents.
1. For example, "Microsoft Research Asia", "MSRA", and "MS Research Beijing" may form one query concept.
2. We recommend only one query from each chosen concept, which reduces redundancy (the main reason for using concepts).
3. The number of concepts is smaller than the number of queries, which helps the system respond in real time.
4. Individual queries cannot capture search intents concisely; concepts capture them better.
Based on the search log, each query uses its clicked URLs as features: a user issues a query to express a search intent and clicks some of the retrieved URLs, so the clicked URLs reflect that intent. Each query is represented by a URL vector, and the similarity metric is Euclidean distance.
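The click-based query representation can be sketched as follows. This is a minimal illustration under our own assumptions: the function names, the toy search log, and the click counts are invented, not from the paper.

```python
from math import sqrt

def url_vector(query, search_log, urls):
    """Represent a query as a vector of clicked-URL counts."""
    clicks = search_log.get(query, {})
    return [clicks.get(u, 0) for u in urls]

def euclidean(v1, v2):
    """Euclidean distance between two URL vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

# Toy search log: query -> {clicked URL: click count}
log = {
    "MSRA": {"research.microsoft.com": 5, "msra.cn": 2},
    "Microsoft Research Asia": {"research.microsoft.com": 4, "msra.cn": 3},
    "jaguar": {"jaguar.com": 6},
}
urls = ["research.microsoft.com", "msra.cn", "jaguar.com"]

d_same = euclidean(url_vector("MSRA", log, urls),
                   url_vector("Microsoft Research Asia", log, urls))
d_diff = euclidean(url_vector("MSRA", log, urls),
                   url_vector("jaguar", log, urls))
assert d_same < d_diff  # queries sharing an intent are closer in URL space
```

Queries that attract clicks on the same URLs end up close in this space, which is what lets the offline step cluster them into one concept.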
DQR: online
1. DQR operates at the concept level.
2. A greedy approach selects concepts one at a time.
3. The query representative of each selected concept is used as a recommendation.
DQR: online
Greedy strategy: starting from a concept pool built for the input query, select concepts one at a time until m concepts have been chosen; the representative query of each selected concept becomes a recommendation.
[Figure: concept selection from the concept pool.]
DQR: diversification
Notation (the symbols are elided in these slides): the query concept the input query belongs to; the query concepts already selected; the query concept to be selected next.
Objective function (formula elided): favor query concepts that are relevant to the input query, and penalize query concepts that are relevant to the query concepts already selected.
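The relevance-minus-redundancy trade-off behind the greedy step can be sketched as below. This is an illustrative MMR-style objective under our own assumptions; the paper's actual probabilistic objective uses quantities whose formulas are elided in these slides, and the relevance and similarity scores here are invented.

```python
def greedy_select(input_rel, concept_sim, m):
    """Greedily pick m concepts: favor relevance to the input query,
    penalize similarity to concepts already selected."""
    candidates = set(input_rel)
    selected = []
    while candidates and len(selected) < m:
        def score(c):
            # Penalty: similarity to the closest already-selected concept.
            penalty = max((concept_sim[c][s] for s in selected), default=0.0)
            return input_rel[c] - penalty
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy example: concepts A and B are near-duplicates, C is distinct.
rel = {"A": 0.9, "B": 0.85, "C": 0.6}
sim = {
    "A": {"A": 1.0, "B": 0.95, "C": 0.1},
    "B": {"A": 0.95, "B": 1.0, "C": 0.1},
    "C": {"A": 0.1, "B": 0.1, "C": 1.0},
}
print(greedy_select(rel, sim, 2))  # ['A', 'C'] -- B is redundant with A
```

Even though B is more relevant than C, the penalty for its overlap with A pushes the distinct concept C into the second slot, which is exactly the diversification behavior the objective is after.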
DQR: diversification
Objective function and its estimation (formulas elided in these slides).
DQR: diversification
Click set s: a set of clicked URLs.
DQR: diversification
The objective function decomposes into a relevance term and a diversity term (formulas elided in these slides).
Experiments
Datasets: two search logs collected from commercial search engines.
AOL: 01 March 2006 to 31 May 2006.
SOGOU: 01 June 2008 to 30 June 2008.
Preprocessing:
1. We remove non-alphanumeric characters from query strings, except for dot (.) and space.
2. We remove queries that appear only once in the search log.

Baseline
There is no gold standard for query recommendation.
1. There is no gold standard.
2. DQR-OPC employs a different clustering algorithm in its concept extraction step.

Evaluation
User study: 12 users, 60 test queries.
For a test query q and the recommendations produced by a given approach, judges assign one of three relevance levels:
Irrelevant (0 points)
Partially relevant (1 point)
Relevant (2 points)
Evaluation
Three performance metrics: relevance, diversity, and ranking.
Relevance
Results on AOL (query level and concept level):
N: number of non-zero recommendations.
S_1,2: average score over partially relevant and relevant queries.
S_0,1,2: average score over all recommended queries.

Diversity
Metric: Intent-Coverage. It measures the number of unique search intents covered by the top m recommended queries. Since each intent represents a specific user search intent, higher Intent-Coverage indicates a higher probability of satisfying different users.
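Intent-Coverage can be computed as below, assuming each recommended query has been labeled (e.g. by the user-study judges) with the search intent it serves. The labels and queries here are invented for illustration.

```python
def intent_coverage(recommendations, intent_of, m):
    """Number of unique search intents covered by the top-m recommendations."""
    top_m = recommendations[:m]
    return len({intent_of[q] for q in top_m if q in intent_of})

# Hypothetical judged labels for a "jaguar" recommendation list
recs = ["jaguar car", "jaguar xf", "jacksonville jaguars", "jaguar animal"]
labels = {
    "jaguar car": "automobile",
    "jaguar xf": "automobile",
    "jacksonville jaguars": "nfl team",
    "jaguar animal": "cat",
}
print(intent_coverage(recs, labels, 3))  # 2: top 3 cover automobile + nfl team
print(intent_coverage(recs, labels, 4))  # 3: adding "jaguar animal" covers cat
```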
Diversity
Metric: Intent-Coverage.
[Figure: results on AOL.]
Ranking
Metric: Normalized Discounted Cumulative Gain (NDCG).
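NDCG over the judged grades (relevant = 2, partially relevant = 1, irrelevant = 0) can be computed with one common DCG formulation; the slides do not show which gain/discount variant was used, so this is a standard sketch, not necessarily the paper's exact formula.

```python
from math import log2

def dcg(grades):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum(g / log2(i + 2) for i, g in enumerate(grades))

def ndcg(grades):
    """DCG normalized by the ideal (descending-grade) ordering."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

# A list already in ideal order scores exactly 1.0
print(ndcg([2, 1, 0]))  # 1.0
# Swapping ranks of graded items lowers the score
print(ndcg([2, 0, 1]) < 1.0)  # True
```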
[Figure: results on AOL.]

Thanks! Questions and suggestions are welcome.
Diversity ranking
Metric: (formula elided in these slides).
[Figures: results on AOL and on SOGOU.]
Motivation
Diversification is especially needed on mobile devices, which have limited screen space. According to Global mobile statistics 2012 (mobiThinking), about one in seven queries now come from mobile devices such as smartphones.
[Figure: screen-size comparison: 3.5 inch (phone) vs. 13.3, 15.4, and 17.0 inch (laptops); the phone's screen is much smaller.]
1. Google also provides recommendations when people search on smartphones.
2. Because mobile screens are small, clicking recommendations usually requires an extra operation (zooming in). If users take that extra step only to find redundant recommendations, they may stop looking at recommendations altogether, which reduces the effectiveness of recommendation.
DQR: clustering
A Hawaii restaurant analogy:
Unlimited tables; each table can hold unlimited customers.
Customers arrive in a stream.
Problem: whenever a customer arrives, assign him to a table.
Desired properties: familiar people sit together; unfamiliar people sit apart.
[Figure: customer stream arriving at the restaurant.]
Compactness control: this illustrates the basic idea of how similar people are grouped together. In our work, we gradually increase L_max, which makes the clustering results less sensitive to query order.
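The restaurant analogy describes a single-pass online clustering: each arriving query joins the nearest existing concept if it is within a compactness threshold L_max of that concept, otherwise it opens a new concept. A minimal sketch under our own assumptions (centroid-based clusters, a fixed threshold, Euclidean distance); the paper's gradual increase of L_max is not reproduced here.

```python
def online_cluster(points, l_max, dist):
    """Single-pass clustering: assign each point to the nearest cluster
    whose centroid is within l_max, else open a new cluster."""
    clusters = []  # each cluster: {"centroid": [...], "members": [...]}
    for p in points:
        best, best_d = None, None
        for c in clusters:
            d = dist(c["centroid"], p)
            if best_d is None or d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= l_max:
            best["members"].append(p)
            n = len(best["members"])
            # Running-mean update of the centroid.
            best["centroid"] = [(x * (n - 1) + y) / n
                                for x, y in zip(best["centroid"], p)]
        else:
            clusters.append({"centroid": list(p), "members": [p]})
    return clusters

euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
cs = online_cluster(pts, l_max=1.0, dist=euclid)
print(len(cs))  # 2: one cluster near the origin, one near (5, 5)
```

As the slides note, the result of such single-pass schemes depends on arrival order; gradually relaxing L_max is the paper's way of reducing that sensitivity.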
DQR: extracting the representative query from a query concept.
Voting strategy: compute a score for each query q in the concept (the score formula is elided in these slides).
The representative should reflect the search intent more accurately than the other queries in the concept. A representative selected by the voting strategy likely better conveys the central meaning of the concept, since more users prefer this query to describe the search intent contained in the concept.
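The voting idea can be sketched as scoring each query in a concept by how many users issued it in the log and picking the top scorer. Scoring by raw issue frequency is our assumption, since the slides elide the exact score formula, and the counts below are invented.

```python
from collections import Counter

def representative(concept_queries, query_freq):
    """Pick the concept's representative: the query that most users
    'voted' for by issuing it in the search log."""
    return max(concept_queries, key=lambda q: query_freq.get(q, 0))

# Toy issue frequencies from a search log
freq = Counter({"Microsoft Research Asia": 120,
                "MSRA": 310,
                "MS Research Beijing": 15})
concept = ["Microsoft Research Asia", "MSRA", "MS Research Beijing"]
print(representative(concept, freq))  # MSRA: issued most often
```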
Relevance
[Figure: results on SOGOU.]
CIKM: Proc. of the 2012 Int. Conf. on Information and Knowledge Management (CIKM'12), Maui, Hawaii, Oct. 2012, to appear.