DQR: A Probabilistic Approach to Diversified Query Recommendation
Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, Eric Lo
Speaker: Ruirui Li
The University of Hong Kong





Outline
Motivation
Problem Statement
DQR Model
Experiments & Evaluation

Motivation
Massive amounts of information have arisen on the Internet.

[Chart: number of URLs indexed by Google per year — 8 billion in 2004, 1 trillion in 2008, an open question for 2012. Numbers from Google's 2004 annual report and its official blog in 2008.]


Motivation
User activities in the search process.

[Diagram: a user with a search intent issues a query, receives search results, and clicks some of them. Example — search intent: CIKM living place; queries: "CIKM 2012 Hotel", "Maui Hotel"; the clicks relate to "Lodging CIKM".]

Motivation

[Diagram: the same search process, now showing that the user's search intent (e.g., "Lodging CIKM") can be mined from the queries and clicks recorded in the search log.]

Motivation
The effectiveness of IR depends on the input queries.
Users suffer: translating human thoughts (search intent) into a concise set of keywords (a query) is never straightforward.



Motivation
Input queries are short, often composed of only one or two terms.
[Chart: distribution of the number of terms per query.]


Motivation
Short queries lead to two issues.
Issue 1. Ambiguity. Example: the query "jaguar".

Issue 2. Not specific enough. Example: the query "Disney".


[Illustrated interpretations: cat, cartoon, store, park, NFL team, automobile brand.]
The word "Disney" here is not ambiguous; rather, because the query is so short, it is too general, and search engines may not know what the user wants to find.

Motivation
Most traditional approaches focus on relevance.
1. The queries most relevant to the input query tend to be similar to each other.
2. This generates redundant and monotonic recommendations.
3. Such recommendations provide limited coverage of the recommendation space.

Motivation
A recommender should provide queries that are not only relevant but also diversified.
With diversified recommendations:
1. We can cover multiple potential search intents of the user.
2. The risk that users won't be satisfied is minimized.
3. Users find their desired targets in fewer recommendation cycles.

Problem statement
Input: a query q and an integer m (the number of queries to recommend).
Output: a list of recommended queries Y.
Goal: at least one query in Y is relevant to the user's search intent.
Relevance: the search intents of the recommended queries should not drift dramatically from the input query.
Diversity: the recommended queries should cover as many different interpretations of the input query as possible.
Ranking: highly relevant and diverse queries should rank high in the recommendation list.
Real-time response: QR is an online application.


Problem statement
Five properties:
1. Relevance.
2. Redundancy-free.
3. Diversity.
4. Ranking.
5. Real-time response.

DQR: framework


Offline (tackles the redundancy-free issue): mine query concepts from the search log.
Online (tackles the diversity issue): a probabilistic diversification model.

DQR: offline — mining query concepts
The same search intent can be expressed by different queries.
Example: "Microsoft Research Asia", "MSRA", "MS Research Beijing".
A query concept is a set of queries which express the same or similar search intents.

1. For example, "Microsoft Research Asia", "MSRA", and "MS Research Beijing" may form one query concept.
2. We recommend only one query from each chosen query concept, which reduces redundancy (this is the main reason for using concepts).
3. The number of concepts is smaller than the number of queries, which helps the system respond in real time.
4. Individual queries cannot capture search intents concisely; concepts serve this purpose better.

1. Based on the search log, we use each query's clicked URLs as its features: a user issues a query to express a search intent and then clicks some of the retrieved URLs, so the clicked URLs reflect that search intent.
2. Each query is represented by a URL vector; the similarity metric is Euclidean distance.
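A minimal sketch of this representation, assuming a toy click log of (query, clicked URL) pairs; the function names and data layout are illustrative, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def build_click_vectors(click_log):
    """Represent each query by the counts of URLs clicked for it.

    click_log: iterable of (query, clicked_url) pairs from the search log.
    Returns {query: Counter(url -> click count)}.
    """
    vectors = defaultdict(Counter)
    for query, url in click_log:
        vectors[query][url] += 1
    return vectors

def euclidean_distance(vec_a, vec_b):
    """Euclidean distance between two sparse URL-count vectors."""
    urls = set(vec_a) | set(vec_b)
    return math.sqrt(sum((vec_a.get(u, 0) - vec_b.get(u, 0)) ** 2 for u in urls))

# Hypothetical click log for illustration only.
log = [("msra", "research.microsoft.com"),
       ("microsoft research asia", "research.microsoft.com"),
       ("maui hotel", "expedia.com/maui")]
vecs = build_click_vectors(log)
print(euclidean_distance(vecs["msra"], vecs["microsoft research asia"]))
```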

DQR: online


1. We perform DQR at the concept level.
2. A greedy approach selects concepts one at a time.
3. The query representatives of the selected concepts are used as the recommendations.

DQR: online — greedy strategy


[Diagram: concept selection — given the input query and a concept pool, concepts are selected greedily one at a time, from the 1st up to the m-th.]

DQR: diversification
Notation: the query concept to which the input query belongs; the query concepts already selected; the next query concept to be selected.

Objective function:

Favor query concepts that are relevant to the input query.
Penalize query concepts that are similar to the query concepts already selected.


DQR: diversification
Objective function:

Estimation:


DQR: diversification
Click set s: a set of clicked URLs.


DQR: diversification
Objective function:

Relevance:

Diversity:

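The slides do not reproduce the paper's probability estimates, so the following is only a rough Python sketch of the greedy selection idea described above: at each step, pick the concept most relevant to the input query while penalizing similarity to concepts already selected. The `relevance` and `similarity` functions and the `lam` trade-off weight are illustrative assumptions, not DQR's actual formulas.

```python
def greedy_select(input_query, concept_pool, m, relevance, similarity, lam=0.5):
    """Greedy diversified concept selection (illustrative sketch).

    relevance(c, q): how relevant concept c is to the input query q.
    similarity(c1, c2): how similar two concepts are.
    lam trades off relevance against redundancy with selected concepts.
    """
    selected = []
    candidates = list(concept_pool)
    while candidates and len(selected) < m:
        def gain(c):
            rel = relevance(c, input_query)
            # Penalize concepts close to what has already been selected.
            red = max((similarity(c, s) for s in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected
```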

Experiments
Datasets: two search logs collected from search engines (AOL and SOGOU).

AOL time period: 1 March 2006 to 31 May 2006.
SOGOU time period: 1 June 2008 to 30 June 2008.


Data preprocessing:
1. We remove non-alphanumeric characters from query strings, except for the dot "." and the space.
2. We remove queries that appear only once in the search log.
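A small sketch of this preprocessing, assuming the log is given as a list of raw query strings; the regular expression and counting logic are one plausible reading of the two rules, not the authors' exact code.

```python
import re
from collections import Counter

def preprocess(queries):
    """Clean raw query strings following the two rules above."""
    # Rule 1: keep only alphanumeric characters, the dot '.', and the space.
    cleaned = [re.sub(r"[^0-9A-Za-z. ]", "", q).strip() for q in queries]
    # Rule 2: drop queries that appear only once in the search log.
    counts = Counter(cleaned)
    return [q for q in cleaned if counts[q] > 1]
```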

Baseline
There is no gold standard for query recommendation.
DQR-OPC employs a different clustering algorithm in its concept-extraction step.

Evaluation
User study: 12 users, 60 test queries.


For a test query q and the recommendations produced by a given approach:

Three relevance levels:
Irrelevant (0 points)
Partially relevant (1 point)
Relevant (2 points)

Evaluation


[Figure: example recommendations presented to the judges.]

Evaluation
Three performance metrics: relevance, diversity, ranking.


Relevance

[Table: relevance results on AOL, reported at the query level and the concept level.]
N: number of non-zero recommendations.
S_{1,2}: average score over partially relevant and relevant queries.
S_{0,1,2}: average score over all recommended queries.

Diversity
Metric: Intent-Coverage.
It measures the number of unique search intents covered by the top m recommended queries.
Since each intent represents a specific user search intent, a higher Intent-Coverage indicates a higher probability of satisfying different users.
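A minimal sketch of how these evaluation quantities could be computed from the judges' grades; the data layouts (a list of 0/1/2 grades per recommendation, and a judged intent label per query) are assumptions for illustration, not the paper's exact procedure.

```python
def average_scores(grades):
    """Compute S_{1,2} and S_{0,1,2} from graded recommendations.

    grades: list of 0/1/2 relevance points for the recommended queries.
    S_{1,2} averages over partially relevant and relevant queries only;
    S_{0,1,2} averages over all recommended queries.
    """
    relevant = [g for g in grades if g > 0]
    s_12 = sum(relevant) / len(relevant) if relevant else 0.0
    s_012 = sum(grades) / len(grades) if grades else 0.0
    return s_12, s_012

def intent_coverage(recommendations, intent_of, m):
    """Count the unique search intents covered by the top-m recommendations.

    intent_of: mapping from a recommended query to its judged search intent
    (None if the query was judged irrelevant to any intent).
    """
    covered = {intent_of.get(q) for q in recommendations[:m]}
    covered.discard(None)
    return len(covered)
```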


[Chart: Intent-Coverage results on AOL.]

Ranking
Metric: Normalized Discounted Cumulative Gain (NDCG).
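NDCG is a standard ranking metric; below is a minimal sketch using the 0/1/2 relevance grades from the user study. The particular gain and discount form (linear gain, log2 discount) is an assumption, since the slides do not spell out which NDCG variant the paper uses.

```python
import math

def dcg(grades):
    """Discounted cumulative gain of a list of graded relevance scores."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades))

def ndcg(grades, k):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(grades, reverse=True)
    denom = dcg(ideal[:k])
    return dcg(grades[:k]) / denom if denom > 0 else 0.0

# Example: judged grades (0 = irrelevant, 1 = partially relevant, 2 = relevant).
print(ndcg([2, 0, 1, 2, 0], k=5))
```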


[Chart: NDCG results on AOL.]

Thanks!
Questions & Suggestions


Diversity ranking
Metric:

[Charts: diversity-ranking results on AOL and on SOGOU.]

Motivation
Diversification is especially needed with mobile devices: about one in seven queries now comes from a mobile device, and screen space is limited.


[Figure: screen sizes compared — 3.5", 13.3", 15.4", 17.0"; a phone screen is much smaller. Numbers from Global Mobile Statistics 2012 (mobiThinking).]
1. According to Global Mobile Statistics 2012 (mobiThinking), about one in seven queries now comes from mobile devices such as smartphones.
2. Google also provides recommendations when people search on smartphones.
3. These devices have smaller screens, so clicking a recommendation usually requires an extra operation (zooming in). If users do that extra work only to find redundant recommendations, they may stop looking at recommendations altogether, which reduces the effectiveness of recommendation.

DQR: clustering
A Hawaii restaurant analogy:
Unlimited tables; each table can hold unlimited customers.
Customers arrive in a stream.
Problem: whenever a customer arrives, assign him to a table.
Properties: familiar people sit together; unfamiliar people sit apart.


DQR: clustering
[Diagram: the customer stream being assigned to tables.]


Compactness control: this illustrates the basic idea of how similar people are grouped together.
In our work, we gradually increase L_max, which makes the clustering results less sensitive to the query order.
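The restaurant analogy suggests a sequential, Chinese-restaurant-style clustering pass: each arriving query joins the closest existing concept if it is close enough, and otherwise starts a new concept. The sketch below follows that reading, with the distance threshold playing the role of the compactness control L_max; it illustrates the idea only and is not the paper's exact algorithm.

```python
def sequential_cluster(queries, vectors, distance, l_max):
    """Assign each query, in arrival order, to the nearest compatible concept.

    vectors: query -> feature vector (e.g., clicked-URL counts).
    distance: distance function between two vectors.
    l_max: compactness control; a query joins a concept only if its distance
    to the concept's first (anchor) query is at most l_max, otherwise it
    opens a new concept.
    """
    concepts = []  # each concept is a list of queries; the first acts as anchor
    for q in queries:
        best, best_d = None, None
        for c in concepts:
            d = distance(vectors[q], vectors[c[0]])
            if best_d is None or d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= l_max:
            best.append(q)
        else:
            concepts.append([q])
    return concepts
```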

DQR: extracting a representative query from each query concept.
Voting strategy: compute a score for each query.

A score for each query q is therefore computed as:


The representative should reflect the search intent more accurately than the other queries in the concept.
The representative selected by the voting strategy likely conveys the central meaning of the concept better, since more users prefer this query to describe the search intent contained in the concept.
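The scoring formula itself is not shown in these slides, but the notes suggest a frequency-based vote: the query preferred by the most users wins. A sketch under that assumption; the `query_user_counts` input is hypothetical, not the paper's actual score.

```python
def representative_query(concept_queries, query_user_counts):
    """Pick a concept's representative by a simple vote (illustrative only).

    query_user_counts: query -> number of distinct users who issued it in the
    search log; the query preferred by the most users is assumed to convey the
    concept's central meaning best.
    """
    return max(concept_queries, key=lambda q: query_user_counts.get(q, 0))
```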

Relevance

[Table: relevance results on SOGOU.]


CIKM: Proc. of the 2012 Int. Conf. on Information and Knowledge Management (CIKM '12), Maui, Hawaii, Oct. 2012, to appear.
