Query Recommendation

Xiaofei Zhu (zhu@l3s.de)
L3S Research Center, Leibniz Universität Hannover

Introduction

• Short (1-2 words)
• Ambiguous (e.g., Java)
• Lack of domain knowledge

Query Recommendation

Query recommendation aims to offer users alternative queries that express their information needs more clearly and therefore return better search results.

[Figure: an original query and its recommendations]

Query Recommendation

How to do query recommendation?
• Find alternative queries with similar search intent.
• How does this differ from recommending documents or images?

Query log

A query log records information about the search actions of the users of a search engine.

A typical query log is a set of records <qi, ui, ti, Vi, Ci>:
• qi: the submitted query
• ui: an anonymized identifier for the user who submitted the query
• ti: the timestamp at which the query was submitted for search
• Vi: the set of results returned for the query
• Ci: the set of documents clicked by the user
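The record structure above can be sketched as a small Python data class. This is only an illustration: the class name `QueryLogRecord`, the function `parse_click_line`, and the tab-separated input layout (user, query, time, rank, clicked URL) are assumptions made for the example, not part of any dataset specification.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class QueryLogRecord:
    """One record <q_i, u_i, t_i, V_i, C_i> of a query log."""
    query: str                                         # q_i: the submitted query
    user_id: str                                       # u_i: anonymized user identifier
    timestamp: datetime                                # t_i: submission time
    results: List[str] = field(default_factory=list)   # V_i: returned results
    clicks: List[str] = field(default_factory=list)    # C_i: clicked documents


def parse_click_line(line: str) -> QueryLogRecord:
    """Parse one tab-separated line: user, query, time, rank, clicked URL.

    The layout is assumed for illustration; rank and URL may be missing
    when the user clicked nothing.
    """
    parts = line.rstrip("\n").split("\t")
    user_id, query, time_str = parts[0], parts[1], parts[2]
    url = parts[4] if len(parts) > 4 and parts[4] else None
    return QueryLogRecord(
        query=query,
        user_id=user_id,
        timestamp=datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S"),
        clicks=[url] if url else [],
    )
```

The returned-results field V_i stays empty here because a click log of this shape does not list the full result page.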

Example of query log (AOL, 2006)

AnonID | Query | QueryTime | ItemRank | ClickURL
7051923 | motorola text messages | 2006-03-24 19:35:31 | 1 | http://www.telusmobility.com
7051923 | motorola text messages | 2006-03-24 19:35:31 | 4 | http://support.t-mobile.com
7051923 | motorola t730 text messages | 2006-03-24 19:38:40 | 2 | http://www.phonescoop.com
7051923 | motorola t730 text messages | 2006-03-24 19:38:40 | 3 | http://www.1800mobiles.com
7051923 | motorola t730 text messages | 2006-03-24 19:38:40 | 5 | http://cgi.ebay.com
7051923 | motorola t730 text messages | 2006-03-24 19:38:40 | 7 | http://phonearena.com
7051923 | spike muscle car | 2006-03-25 12:57:43 | 2 | http://www.classicauto-sales.com
7051923 | spike muscle car | 2006-03-25 12:57:43 | 5 | http://sev.prnewswire.com
7051923 | spike muscle car | 2006-03-25 13:00:22 | |
7051923 | usps | 2006-03-25 14:23:21 | 1 | http://www.usps.com
7051923 | vc2 auctions | 2006-03-25 14:31:41 | |
7051923 | auctions for 1 | 2006-03-25 14:33:47 | |

Microsoft 2006 RFP dataset

QueryID | Query | Time | URL | Position
0000003a718649f2 | schwab | 2006-05-11 08:07:35 | http://www.schwab.com/ | 1
0000006d43b549c1 | us geography | 2006-05-04 14:23:00 | http://www.enchantedlearning.com/usa/ | 3
0000006d43b549c1 | us geography | 2006-05-04 14:23:03 | http://www.sheppardsoftware.comState15s_500.html | 4
0000016aa52e4fbc | wwf | 2006-05-21 09:25:34 | http://www.panda.org/ | 2
000002aa6e27443f | biggercity | 2006-05-07 13:30:45 | http://www.biggercity.com/chat/ | 11
000005aac1f6423f | studios | 2006-05-09 14:21:29 | http://www.shawneestudios.com/contact_us.php | 11
000008d8afaa459a | www.nfl.com | 2006-05-28 18:22:39 | http://www.nfl.com/teams/NYJ.html | 77
000009c2848e4a68 | north hills school district | 2006-05-04 12:29:12 | http://www.nhsd.net/ | 1

Time | Query | QueryID | SessionID | ResultCount
2006-05-01 00:00:01 | defination Gravitational | 46c13f0705f6436b | 19ab975e898d46d1 | 11
2006-05-01 00:00:01 | kimclement | a3d2cae45e2b4c5b | 1b748d1afa9b4828 | 10
2006-05-01 00:00:01 | scientology crazy beliefs | 418324ef33d14ed2 | 10f477402db84c9a | 10
2006-05-01 00:00:01 | www.joj.sk | 489238bdf8834d68 | 16271eb6bf174c5c | 9
2006-05-01 00:00:04 | www.selectcareers.com | f92efd8044904ac4 | 193f9f8442d44c48 | 0
2006-05-01 00:00:08 | What is May Day? | 37afe7af832649d2 | 21f6a0dfea4348ac | 14
2006-05-01 00:00:10 | vikings draft choices suck | b0519e4528d84b44 | 196b0bb2f1d643f2 | 10
2006-05-01 00:00:10 | wwwcrownawards.com | 9eda4716dfb045e2 | 04e3a26067a84748 | 0
2006-05-01 00:00:15 | Australian miners | ba6d190cc4cd4fd3 | 136fd5e571d24886 | 10

How to use query log for query recommendation?

Click-through data

• Click-through data records the documents a user clicks after submitting a query to the search engine.

Basic Assumption

If a user clicks a document after issuing a query, then the clicked document is more or less relevant to the submitted query; thus the query can be represented by its clicked documents.

• Query Feature Representation: each query is represented by its clicked documents.
• Query-URL Graph: if two queries share many clicked documents, then they have similar search intent.

[Beeferman, KDD’00] [Mei, CIKM’08]
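The co-click assumption can be illustrated with a short sketch. The Jaccard overlap used below is an assumed similarity measure chosen for the example (the slides do not name one), and the function names and toy click pairs are hypothetical.

```python
from collections import defaultdict


def clicked_urls_by_query(clicks):
    """Group (query, clicked_url) pairs into a set of clicked URLs per query."""
    urls = defaultdict(set)
    for query, url in clicks:
        urls[query].add(url)
    return urls


def coclick_similarity(urls, q1, q2):
    """Jaccard overlap of the clicked-URL sets of two queries (assumed measure)."""
    a, b = urls.get(q1, set()), urls.get(q2, set())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical click pairs, for illustration only.
clicks = [
    ("aa", "www.aa.com"),
    ("american airline", "www.aa.com"),
    ("american airline", "example.org/american-airlines"),
    ("mexiana", "en.wikipedia.org/wiki/Mexicana"),
]
urls = clicked_urls_by_query(clicks)
similarity = coclick_similarity(urls, "aa", "american airline")
```

Queries that share no clicked document get a similarity of zero, which is why sparse click data is usually handled with graph-based propagation rather than direct overlap.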

How to use query log for query recommendation?

Query Session

• Query session: a sequence of related queries that a single user submits within a time interval for a specific search task.

Basic Assumption

If two queries frequently co-occur in the same sessions, then they are relevant to each other.

• Association Rules [Fonseca, LA-WEB’03]
• Query Graph: queries submitted consecutively by the same user within a short time interval share similar search intent. [Zhang, WWW’06] [Boldi, CIKM’08, WSCD’09]
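A minimal sketch of the session-based idea: split each user's queries into sessions by a time gap, then count how often two queries appear in the same session. The 30-minute gap and the function names are assumptions for illustration; the slides only say "a time interval".

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations


def split_sessions(events, gap=timedelta(minutes=30)):
    """Split (user_id, timestamp, query) events, sorted by user then time,
    into sessions. A new session starts on a new user or a long pause."""
    sessions, current = [], []
    prev_user, prev_time = None, None
    for user, ts, query in events:
        new_session = user != prev_user or ts - prev_time > gap
        if new_session and current:
            sessions.append(current)
            current = []
        if not current or current[-1] != query:  # skip immediate repeats
            current.append(query)
        prev_user, prev_time = user, ts
    if current:
        sessions.append(current)
    return sessions


def cooccurrence_counts(sessions):
    """Count, for each unordered query pair, the sessions containing both."""
    counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts
```

Pairs with high counts are candidate recommendations for each other; association-rule methods additionally normalize these counts by how often each query occurs on its own.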

High Relevance Query Recommendation

• Query Suggestion Using Hitting Time (CIKM’08)
  - Click-through Data
  - Query-URL Bipartite Graph
• Query Suggestions Using Query-Flow Graphs (WSCD’09)
  - Session Data
  - Query-Flow Graph

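Query Suggestion Using Hitting Time (CIKM’08)

Query-URL Bipartite Graph
- Edges between V1 and V2
- No edge inside V1 or V2
- Edges are weighted
- e.g., V1 = queries; V2 = URLs

Transition Probabilities

$p(i \to j) = \frac{w(i,j)}{d_i}$, e.g., $\frac{3}{3+7}$

$p(j \to i) = \frac{w(i,j)}{d_j}$, e.g., $\frac{3}{3+1}$

$p(i \to k) = \sum_{j \in V_2} \frac{w(i,j)}{d_i} \cdot \frac{w(j,k)}{d_j}$

Here $d_i$ is the weighted degree of node i, i.e., the sum of the weights of its edges.

[Figure: example bipartite graph between V1 and V2 with weighted edges; node i in V1 has edges of weight 3 and 7, and node j in V2 has edges of weight 3 and 1]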
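The transition probabilities on the bipartite graph can be sketched in a few lines of Python. The function name and the dictionary layout (a mapping from (query, url) pairs to click counts) are assumptions made for the example; the numbers reuse the edge weights 3, 7, and 1 from the slide's example.

```python
from collections import defaultdict


def query_to_query_transitions(weights):
    """Two-step transition probabilities p(i -> k) between queries.

    weights maps (query, url) -> click count w(i, j).
    p(i -> k) = sum over URLs j of w(i, j) / d_i * w(k, j) / d_j.
    """
    d_query, d_url = defaultdict(float), defaultdict(float)
    for (q, u), w in weights.items():
        d_query[q] += w   # weighted degree of the query node
        d_url[u] += w     # weighted degree of the URL node
    p = defaultdict(lambda: defaultdict(float))
    for (i, j), w_ij in weights.items():
        for (k, j2), w_kj in weights.items():
            if j2 == j:   # query k is reachable from i through URL j
                p[i][k] += (w_ij / d_query[i]) * (w_kj / d_url[j])
    return p


weights = {("q1", "u1"): 3, ("q1", "u2"): 7, ("q2", "u1"): 1}
p = query_to_query_transitions(weights)
```

With these weights, the first hop from q1 to u1 has probability 3/(3+7) and the hop back from u1 to q1 has probability 3/(3+1), matching the example values on the slide. Each row of the resulting matrix sums to 1.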

Query Suggestion Using Hitting Time (CIKM’08)

Random Walk and Hitting Time

Hitting time: how long does it take to hit node a in a random walk starting at node b?

• Start at node 1
• Pick a neighbor i based on the transition probability
• Move to i (t=1)
• Continue (t=2, ...)

If the random walk hits a node quickly, then it is close to the start node!

[Figure: example graph with nodes 1 to 5 and the position of the walk at t=1 and t=2]

Hitting time from i to A

Let $h_i^A$ denote the hitting time from node i to the node set A in graph G. If node i has the neighbors j and k, reached with probabilities P(i, j) and P(i, k), then

$h_i^A = 1 + P(i,j)\, h_j^A + P(i,k)\, h_k^A$
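The recursion for the hitting time can be evaluated by simple fixed-point iteration. The sketch below assumes the transition probabilities are given as nested dictionaries and that every node appears as a key; the function name and the iteration count are illustrative choices. After a finite number of iterations the result is a truncated hitting time, which approaches the exact value as the count grows.

```python
def hitting_times(P, targets, iterations=100):
    """Iterate h_i = 1 + sum_j P(i, j) * h_j, with h_i = 0 for i in targets.

    P maps node -> {neighbor: transition probability}; targets is the set A.
    """
    h = {i: 0.0 for i in P}
    for _ in range(iterations):
        new_h = {}
        for i in P:
            if i in targets:
                new_h[i] = 0.0   # already inside A
            else:
                new_h[i] = 1.0 + sum(p * h[j] for j, p in P[i].items())
        h = new_h
    return h


# Toy chain: from "a" the walk stays with probability 0.5 or moves to "b".
P = {"a": {"a": 0.5, "b": 0.5}, "b": {"b": 1.0}}
h = hitting_times(P, targets={"b"})
```

For this toy chain the recursion reads h_a = 1 + 0.5 h_a, so the exact hitting time from "a" to "b" is 2.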

Generate Query Suggestion

• Construct a (kNN) subgraph from the query log data (with a predefined number of queries/URLs)
• Compute transition probabilities p(i → j)
• Compute hitting time $h_i^A$
• Rank candidate queries using $h_i^A$

[Figure: query-URL graph linking the queries "aa", "american airline", and "mexiana" to the URLs www.aa.com, www.theaa.com/travelwatch/planner_main.jsp, and en.wikipedia.org/wiki/Mexicana, with click counts (e.g., 300 and 15) on the edges]

Result: Query Suggestion

Query = ‘aa’

Yahoo | Live | Hitting time
aa route planner | aa route finder | alcoholics anonymous
aa route finder | aa route planner | automobile association
aa airlines | aa airlines | theaa
aa meetings | american airlines | american airlines
aa autoroute | aa meeting | american air
aa road map | aa road map | american airline ticket reservation


Query Suggestions Using Query-Flow Graphs (WSCD’09)

Session Data
Definition: the sequence of queries of one particular user within a specific time limit.

Query Graph

• This model accumulates many query sessions and adds up the similarity values of each recurring query pair.
• Two cases are distinguished: two consecutive queries, and queries that are not neighbors in the same session.

Z. Zhang and O. Nasraoui. Mining search engine query logs for query recommendation. In WWW, pages 1039–1040, 2006.

Query-Flow Graph

P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S. Vigna: “The query-flow graph: model and applications”. CIKM 2008.

Build Query-flow Graph

The key aspect of the construction of the query-flow graph is to define the weighting function w.

$r : E \to \mathbb{N}$ represents the number of times the transition was observed in the same search session.

Query Recommendation

The query recommendation methods are based on the probability of being at a certain node after performing a random walk over a query graph.

Random Walk with Restart

A random surfer starts at the initial query q. At each step, the surfer
• with probability α, follows one of the outlinks from the current node;
• with probability 1 − α, jumps back to q.

$M = \alpha P + (1 - \alpha)\, \mathbf{1} e_q^T$

• M: the transition matrix of a Markov chain
• P: the row-normalized weight matrix of the query-flow graph
• $e_j$: the vector whose j-th entry is 1 and whose other entries are zero
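A random walk with restart can be sketched as repeated multiplication of the current distribution by M. The sketch assumes P is row-stochastic and stored as nested dictionaries; the function name, the default value of α, and the iteration count are illustrative assumptions, not values taken from the slides.

```python
def random_walk_with_restart(P, q, alpha=0.85, iterations=100):
    """Approximate the stationary distribution of M = alpha*P + (1-alpha)*1*e_q^T.

    P maps node -> {neighbor: probability} and is assumed row-stochastic.
    """
    nodes = list(P)
    x = {n: (1.0 if n == q else 0.0) for n in nodes}   # start at query q
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for i in nodes:
            for j, p in P[i].items():
                nxt[j] += alpha * x[i] * p             # follow an outlink
        nxt[q] += 1.0 - alpha                          # jump back to q
        x = nxt
    return x


# Toy graph: q and a point at each other.
P = {"q": {"a": 1.0}, "a": {"q": 1.0}}
scores = random_walk_with_restart(P, "q", alpha=0.5)
```

Queries other than q can then be ranked by their score. In this toy graph the scores solve x_q = 0.5 x_a + 0.5 and x_a = 0.5 x_q, giving 2/3 and 1/3.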

Random walks

Random walks on graphs correspond to Markov chains:
• The set of states S is the set of nodes of the graph G.
• The transition probability matrix gives the probability that we follow an edge from one node to another.

Definitions

Adjacency matrix A; transition matrix P.

[Figure: example graph annotated with its adjacency-matrix entries (1) and its transition probabilities (1, 1/2, 1/2, 1), followed by snapshots of a random walk on this graph at t=0, t=1, t=2, and t=3]

Probability Distributions

• $x_t(i)$ = probability that the surfer is on node i at time t
• $x_{t+1}(i) = \sum_j (\text{probability of being at node } j) \cdot \Pr(j \to i) = \sum_j x_t(j)\, P(j,i)$
• $x_{t+1} = x_t P = x_{t-1} P^2 = x_{t-2} P^3 = \dots = x_0 P^{t+1}$

What happens when the surfer keeps walking for a long time?

Stationary Distribution

Intuitively, the stationary distribution at a node is related to the amount of time a random walker spends visiting that node.

Mathematically, recall that the probability distribution evolves as $x_{t+1} = x_t P$. For the stationary distribution $v_0$ we have

$v_0 = v_0 P$

so $v_0$ is a left eigenvector of the transition matrix P (with eigenvalue 1).
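The update x_{t+1} = x_t P and its fixed point can be sketched with plain dictionaries. The function names, tolerance, and iteration cap are assumptions for illustration; the power iteration below only converges when the chain is well-behaved (for example, the toy chain used here, which has a self-loop).

```python
def step(x, P):
    """One step of the walk: x_{t+1}(i) = sum_j x_t(j) * P(j, i)."""
    nxt = {n: 0.0 for n in P}
    for j, mass in x.items():
        for i, p in P[j].items():
            nxt[i] += mass * p
    return nxt


def stationary_distribution(P, start, tol=1e-12, max_iter=10_000):
    """Power iteration: apply step() until the distribution stops changing."""
    x = {n: (1.0 if n == start else 0.0) for n in P}
    for _ in range(max_iter):
        nxt = step(x, P)
        if max(abs(nxt[n] - x[n]) for n in P) < tol:
            return nxt
        x = nxt
    return x


# Toy chain: "a" stays or moves to "b" with probability 1/2; "b" returns to "a".
P = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 1.0}}
v = stationary_distribution(P, start="a")
```

For this chain the stationary distribution solves v_a = 0.5 v_a + v_b and v_b = 0.5 v_a, so v_a = 2/3 and v_b = 1/3, independent of the start node.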

Interesting questions

Does a stationary distribution always exist? Is it unique?
Yes, if the graph is “well-behaved”, i.e., P is ergodic.

P is ergodic if it is:
• Irreducible: there is a path from every node to every other node.
• Aperiodic: state i is periodic with period k > 1 if every path from i back to i has a length that is a multiple of k; otherwise it is aperiodic.

[Figure: examples of an irreducible and a non-irreducible graph, and of an aperiodic graph and a graph with periodicity 3]

If a Markov chain P is irreducible and aperiodic, then the largest eigenvalue of the transition matrix is equal to 1 and all the other eigenvalues are strictly smaller than 1 in absolute value. Let the eigenvalues of P be {σi | i = 0, ..., n−1}, in non-increasing order of |σi|:

$\sigma_0 = 1 > |\sigma_1| \ge |\sigma_2| \ge \dots \ge |\sigma_{n-1}|$

Result: Query Suggestion (q = “apple” and q = “jeep”)

Why Diversity in Query Recommendation

In query recommendation, providing only “relevant” recommendations is far from enough to satisfy users’ information needs.

Original query: Apple
• apple ipad 3
• apple iphone 4s
• apple tree
• apple seed
• apple computer

The recommended queries should cover multiple potential search intents of users and minimize the risk that users will not be satisfied.

High Diversity Query Recommendation

• Diversifying Query Suggestion Results [Hao Ma, AAAI’10]
  - Query-URL graph
  - Hitting time
• A Unified Framework for Recommending Diverse and Relevant Queries [Xiaofei Zhu, WWW’11]
  - Manifold
  - Manifold Ranking with Stop Points

Graph Construction

Figure 1: Example of a bipartite graph (extracted from the click-through data)

Determining the First Suggested Query

Initial Transition Probability

[Formula: initial transition probability from node i to node j, built from the two terms below]

• the click frequency between node i and node j
• the normalization term, which is the total number of times that the query node i has been issued in the dataset

Determining the First Suggested Query

Random Jump

In addition to the transition probability, there are random relations among different queries. The method therefore adds a uniform random relation among different queries.

[Formula: transition probability extended with a random-jump term]

• The jump weight is the probability of taking a “random jump”, i.e., of transiting among different queries.
• Without any prior knowledge, the random-jump term is defined via d, a uniform stochastic distribution vector.

Determining the First Suggested Query

Random Walk on the Query-URL Graph

With the transition probability matrix P defined, a random walk can be performed on the query-URL graph. The probability of transition from node i to node j after a t-step random walk is

$P_t(i, j) = [P^t]_{ij}$

Explanation:
1) The random walk sums the probabilities of all paths of length t between the two nodes. If there are many paths, the transition probability will be high.
2) The larger the transition probability $P_t(i, j)$ is, the more similar node j is to node i.

Determining the First Suggested Query

After performing a t-step random walk, the query with the largest transition probability from node q is recommended as the first suggested query.

The parameter t determines the resolution of the Markov random walk:
• Large t: the random walk depends more on the graph structure
• Small t: preserves information about the starting node

Ranking the Remaining Queries

Employ the hitting time to rank and diversify the rest of the queries.

Hitting time: let S be a subset of the vertex set V. The expected hitting time h(i|S) is the expected number of steps before a random walk starting at node i first visits the set S:

$h(i|S) = 0$ if $i \in S$, and $h(i|S) = 1 + \sum_{j \in N(i)} p_{ij}\, h(j|S)$ otherwise,

where N(i) denotes the neighbors of node i.

Ranking the Remaining Queries

Property
• Nodes strongly connected to s1 receive far fewer visits from the random walk.
• Nodes far away from s1 still allow the random walk to move among them and thus receive more visits.

The second suggestion node
• Select the second suggestion node s2 ∈ Q with the largest expected hitting time to the subset S, which contains the two nodes q and s1.
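The selection rule above extends naturally to a greedy loop: after each pick, add it to S and choose the candidate with the largest hitting time to the enlarged set. The sketch below is an illustration under assumptions, not the authors' implementation: it assumes a query-to-query transition matrix P stored as nested dictionaries, a first suggestion that was already chosen, and hypothetical function names.

```python
def hitting_times_to_set(P, S, iterations=200):
    """Iterate h(i|S) = 1 + sum_j P(i, j) * h(j|S), with h(i|S) = 0 for i in S."""
    h = {i: 0.0 for i in P}
    for _ in range(iterations):
        h = {
            i: 0.0 if i in S else 1.0 + sum(p * h[j] for j, p in P[i].items())
            for i in P
        }
    return h


def diversify(P, query, first, k, iterations=200):
    """Greedily pick k suggestions, each with the largest hitting time to
    the set made of the query and the suggestions selected so far."""
    selected = [first]
    while len(selected) < k:
        S = {query, *selected}
        candidates = [n for n in P if n not in S]
        if not candidates:
            break
        h = hitting_times_to_set(P, S, iterations)
        selected.append(max(candidates, key=lambda n: h[n]))
    return selected


# Toy path graph q - s1 - a - b with uniform transitions.
P = {
    "q": {"s1": 1.0},
    "s1": {"q": 0.5, "a": 0.5},
    "a": {"s1": 0.5, "b": 0.5},
    "b": {"a": 1.0},
}
suggestions = diversify(P, "q", "s1", k=3)
```

In the toy graph, node b is farther from {q, s1} than node a (hitting times 4 and 3), so b is picked before a.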

Result: Query Suggestion

A novel unified framework: manifold ranking with stop points

Query recommendation needs both relevance and diversity:
• Relevance: manifold ranking
• Diversity: import stop points

[Figure: each of query1, query2, ..., queryn is represented as a vector (u1, u2, ..., um); the affinity matrix W is built from these vectors]

Traditional manifold ranking process

W: affinity matrix; D: diagonal matrix

[Formulas: Steps 1 to 3 of the ranking process, defined in terms of W and D]

Manifold ranking with stop points

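T: set of stop points
R: set of free points

$f^{(t+1)} = \alpha S f^{(t)} + (1 - \alpha)\, y$  (1)

$\begin{bmatrix} f_R \\ f_T \end{bmatrix}^{(t+1)} = \alpha \begin{bmatrix} S_{RR} & S_{RT} \\ S_{TR} & S_{TT} \end{bmatrix} \begin{bmatrix} f_R \\ f_T \end{bmatrix}^{(t)} + (1 - \alpha) \begin{bmatrix} y_R \\ y_T \end{bmatrix}$  (2)

Setting $S_{RT} = 0$ and $S_{TT} = 0$:

$\begin{bmatrix} f_R \\ f_T \end{bmatrix}^{(t+1)} = \alpha \begin{bmatrix} S_{RR} & 0 \\ S_{TR} & 0 \end{bmatrix} \begin{bmatrix} f_R \\ f_T \end{bmatrix}^{(t)} + (1 - \alpha) \begin{bmatrix} y_R \\ y_T \end{bmatrix}$  (3)

$f_R^{(t+1)} = \alpha S_{RR} f_R^{(t)} + (1 - \alpha)\, y_R$  (4)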
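The update for the free points, f_R(t+1) = α S_RR f_R(t) + (1 − α) y_R, can be sketched with plain Python lists. This is a minimal illustration under assumptions: S is taken as an already normalized matrix given as a list of rows, the symbol α and its default value are assumed, and the function name is hypothetical. Stop points never pass scores on, which is the effect of zeroing S_RT and S_TT.

```python
def manifold_rank_with_stop_points(S, y, stop_points, alpha=0.99, iterations=200):
    """Iterate f_R <- alpha * S_RR * f_R + (1 - alpha) * y_R over the free points.

    S is an n x n normalized matrix (list of rows), y the initial score
    vector, and stop_points a set of node indices that spread no score.
    """
    n = len(S)
    free = [i for i in range(n) if i not in stop_points]
    f = [0.0] * n   # scores of stop points stay at 0 and are never read
    for _ in range(iterations):
        nxt = list(f)
        for i in free:
            spread = sum(S[i][j] * f[j] for j in free)   # S_RR block only
            nxt[i] = alpha * spread + (1.0 - alpha) * y[i]
        f = nxt
    return {i: f[i] for i in free}


# Toy example: three nodes on a path, node 0 is the input query, node 2 a stop point.
S = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
scores = manifold_rank_with_stop_points(S, [1.0, 0.0, 0.0], {2}, alpha=0.5)
```

For the toy example the fixed point solves f0 = 0.25 f1 + 0.5 and f1 = 0.25 f0, giving 8/15 and 2/15.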

Results: Query recommendation (‘abc’, ‘yamaha’)

Evaluation Metrics

Automatic Evaluation: Open Directory Project (ODP) <-> Relevance

Given two queries q and q’:
• c(q): the category of q, e.g., ‘Arts/Television/News’
• c(q’): the category of q’, e.g., ‘Arts/Television/Stations/North America/United States’
• l(c, c’): the longest common prefix of the two categories, e.g., ‘Arts/Television’
• the length of the longer of the two categories c and c’, e.g., 5
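A common way to turn these quantities into a relevance score is the ratio of the common-prefix length to the length of the longer category path. The slide's formula is not preserved in the text, so the ratio below is an assumption that is merely consistent with the listed quantities; the function name is hypothetical.

```python
def odp_similarity(c1, c2):
    """Length of the longest common prefix of two ODP category paths,
    divided by the length of the longer path (assumed scoring formula)."""
    a, b = c1.split("/"), c2.split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))


score = odp_similarity(
    "Arts/Television/News",
    "Arts/Television/Stations/North America/United States",
)
```

For the example categories the common prefix ‘Arts/Television’ has length 2 and the longer path has length 5, so the score is 2/5.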

Evaluation Metrics

Automatic Evaluation: Commercial search engine (i.e., Google) <-> Diversity

Given two queries q and q’, o(q, q’) is the number of overlapping URLs among the top k search results of queries q and q’.
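The overlap count is straightforward to compute from two ranked result lists. The sketch below also derives a diversity score as 1 − o(q, q’)/k; that normalization is an assumption made for illustration, since the slide's own formula is not preserved in the text. Function names and the toy result lists are hypothetical.

```python
def overlap(results_q, results_q2, k=10):
    """o(q, q'): number of URLs shared by the top-k results of both queries."""
    return len(set(results_q[:k]) & set(results_q2[:k]))


def diversity(results_q, results_q2, k=10):
    """Assumed diversity score: 1 minus the overlapping fraction of the top k."""
    return 1.0 - overlap(results_q, results_q2, k) / k


# Hypothetical top-4 result lists for two queries.
top_q = ["a.example", "b.example", "c.example", "d.example"]
top_q2 = ["c.example", "d.example", "e.example", "f.example"]
d = diversity(top_q, top_q2, k=4)
```

Two queries that return identical top-k lists get a diversity of 0, and two queries with disjoint lists get 1.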


Evaluation Metrics

Automatic Evaluation
• Open Directory Project (ODP) <-> Relevance
• Commercial search engine (i.e., Google) <-> Diversity

Evaluation metric: Q-measure
β: parameter that controls the tradeoff between relevance and diversity

Experiments

[Figure: average Q-measure of query recommendation over different recommendation sizes under five approaches, with the proposed method highlighted]

Experiments

Manual Evaluation
• Recommendation pool
• 3 human judges
• Label tool

[Figure: label tool showing the recommendation pool and the search results]

Experiments

Evaluation Metrics
– Intent-Coverage
– α-nDCG (α-normalized Discounted Cumulative Gain)

Experiments

Table 2: Performance of recommendation results over a sample of queries under five different approaches.


Why High Utility Query Recommendation

Traditional query recommendation focuses on recommending queries that are relevant to users’ initial queries (query level: initial query → query 1, query 2, query 3), based on:

• Common Query Terms (Wen J. et al., WWW 2001)
• Same Clicked Documents (Mei Q. et al., CIKM 2008)
• Co-Occurring in Same Search Sessions (Zhang Z. et al., WWW 2006)

Is recommending only relevant queries enough for finding useful search results?

Why High Utility Query Recommendation

[Figure: example with the queries ‘iphone sell time’, ‘iphone start sell’, and ‘iphone initial release’]

Recommend High Utility Query

High Utility Query Recommendation

• More Than Relevance: High Utility Query Recommendation By Mining Users’ Search Behaviors [X.F. Zhu, CIKM’12]
  - Probabilistic Graphical Model (Query Utility Model)
• Recommending High Utility Query via Session-Flow Graph [X.F. Zhu, ECIR’13]
  - Session-Flow Graph
  - Two-phase model based on absorbing random walk

A Typical Search Session

[Figure: a search session that starts from the user's information needs and ends when the user is satisfied; search results are marked as relevant (red) or attractive (√), and reformulations are annotated with bad perceived utility or bad posterior utility]

Probabilistic Graphical Model

[Figure: graphical model over the variables R, C, A, and S at positions i−1, i, and i+1, with parameters α and β]

• Ri: whether there is a reformulation at position i
• Ci: whether the user clicks on some of the search results of the reformulation at position i
• Ai: whether the user is attracted by the search results of the reformulation at position i
• Si: whether the user’s information needs have been satisfied at position i

Parameter Estimation

Maximum Likelihood Estimation

$P(C_{1:M}, R_{1:M}, A_{1:M}, S_{1:M}) = \prod_{i=1}^{M} P(C_i \mid R_i, A_i)\, P(R_i \mid R_{i-1}, S_{i-1})\, P(A_i)\, P(S_i \mid C_{1:i})$

where

$P(C_i \mid R_i, A_i) = (R_i A_i)^{C_i} (1 - R_i A_i)^{1 - C_i}$

$P(R_i \mid R_{i-1}, S_{i-1}) = (R_{i-1}(1 - S_{i-1}))^{R_i} (1 - R_{i-1}(1 - S_{i-1}))^{1 - R_i}$

[Formula: P(A_i), a Bernoulli distribution over A_i]

[Formula: P(S_i | C_{1:i}), a Bernoulli distribution over S_i whose parameter depends on the indicators I(C_k = 1) for k = 1, ..., i]

Log Likelihood Function

[Formula: log-likelihood L, a sum over the N sessions j and the positions i = 1, ..., M_j of the log-probabilities of A_i^j and S_i^j]

[Formulas: the resulting estimates, each a ratio of sums over sessions and positions that use the indicators I(i = t) and I(C_i^j = 1)]

Maximize Log Likelihood Function

[Formula: L with an added regularization term, summed over t = 1, ..., M, and a Lagrange multiplier]

Optimization Condition

Newton-Raphson

Experimental Results

Dataset
• Our experiments are based on a publicly available query log, namely the UFindIt log data.
• There are 40 search tasks in total, represented by 40 test queries.

Experimental Results

Metrics
• QRR (Query Relevant Ratio): measures the probability that a user finds relevant results when she uses query q for her search task.
• MRD (Mean Relevant Document): measures the average number of relevant results a user finds when she uses query q for her search task.

Experimental Results

Compared methods:
• ADJ (Adjacency): given a test query q, the most frequent queries adjacent to q in the same session are recommended to users [WWW’06].
• CO (Co-occurrence): given a test query q, the most frequent queries co-occurring with q in the same session are selected as recommendations [WSDM’10].
• QF (Query-Flow Graph): builds a query-flow graph from collective search sessions and performs a random walk on this graph for query recommendation [CIKM’08].
• CT (Click-through Graph): builds a query-URL bipartite graph and employs the hitting time as a measure to select queries for recommendation [CIKM’08].
• QUM (Query Utility Model): the expected information gain users obtain from the search results of the query according to their original information needs, which is the product of the two component utilities.
• PCU and PTU: the two component utilities of the QUM method (i.e., perceived utility and posterior utility) used on their own, as the Perceived Utility method (PCU) and the Posterior Utility method (PTU).

Experiments

Impact of parameter μ on the performance of QUM

Limitation of QUM method

• It cannot make full use of the click-through information: it only considers whether the search results of a reformulated query have some clicked documents or not, but does not take each individual clicked document into consideration.
• It is therefore necessary to propose a novel method that captures these specific clicked documents when modeling query utility.

Framework of Our Approach

Query-Flow Graph

Reformulation Behaviors

Random Walk

Click Behaviors

Document Nodes

Absorbing States

Session-Flow Graph

Absorbing Random Walk

+

Two-phase model based on Absorbing Random Walk (TARW)

Page 85: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Session Flow Graph

Query sessions:

q → q1 → q3

q → q3 → q4

q → q4

Query-Flow Graph: Boldi et al. (CIKM 2008)
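Building a query-flow graph from the three sessions on this slide can be sketched as follows. This is a simplified illustration: edges are weighted by raw reformulation counts, which is an assumption of this sketch rather than the weighting used by Boldi et al., and the function names are hypothetical.

```python
from collections import defaultdict

def build_query_flow_graph(sessions):
    """Count how often each query is directly reformulated into another."""
    edges = defaultdict(int)
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            edges[(src, dst)] += 1
    return dict(edges)

def transition_probabilities(edges):
    """Normalize the outgoing edge counts of every query into probabilities."""
    totals = defaultdict(int)
    for (src, _), count in edges.items():
        totals[src] += count
    return {(src, dst): count / totals[src] for (src, dst), count in edges.items()}

# The three query sessions from the slide.
sessions = [["q", "q1", "q3"], ["q", "q3", "q4"], ["q", "q4"]]
edges = build_query_flow_graph(sessions)
probs = transition_probabilities(edges)
```

With these sessions, q has three outgoing edges (to q1, q3, and q4), each with probability 1/3 under count-based weighting.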

Page 86: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Session Flow Graph

Query sessions:

q → q1:u1:u2 → q3:u3

q → q3 → q4:u4:u5

q → q4:u6

Session Flow Graph: expands the query-flow graph with document nodes and failure nodes
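The expansion into a session-flow graph can be sketched by adding click edges and a failure edge on top of the reformulation edges. The rule for failure edges is an assumption of this sketch (a query step with no clicks links to a single failure node). The slide does not spell out the exact rule, and the `FAILURE` label is hypothetical.

```python
from collections import defaultdict

FAILURE = "<failure>"  # hypothetical label for the failure node

def build_session_flow_graph(sessions):
    """Return edge counts over query, document, and failure nodes.

    Each session is a list of (query, clicked_documents) steps.
    """
    edges = defaultdict(int)
    for session in sessions:
        for i, (query, clicks) in enumerate(session):
            if i + 1 < len(session):
                edges[(query, session[i + 1][0])] += 1  # reformulation edge
            for doc in clicks:
                edges[(query, doc)] += 1  # click edge to a document node
            if not clicks:
                # Assumption: a step without clicks links to the failure node.
                edges[(query, FAILURE)] += 1
    return dict(edges)

# The three sessions from the slide, with clicked documents per query.
sessions = [
    [("q", []), ("q1", ["u1", "u2"]), ("q3", ["u3"])],
    [("q", []), ("q3", []), ("q4", ["u4", "u5"])],
    [("q", []), ("q4", ["u6"])],
]
graph = build_session_flow_graph(sessions)
```

Document and failure nodes have no outgoing edges, which is what makes them absorbing states in the random walk.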

Page 87: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Session Flow Graph

Definition:

Nodes

Edges

Adjacency Matrix

Page 88: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Two-phase model based on absorbing random walk (TARW)

Forward Utility Propagation: the utility score is transferred from the original query node to the reformulation nodes, and is finally absorbed by the document nodes and the failure node.

Backward Utility Propagation: the utility score is transferred back from the document nodes to the reformulation nodes.

Recommendation: queries with the highest utilities.

Page 89: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Forward Utility Propagation

Assign transition probability to different types of nodes (reformulation, document, failure):

α1: reformulation node

α2: document node

α3: failure node

α1 + α2 + α3 = 1
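One way to turn the three weights into an actual row of transition probabilities is sketched below. It assumes that, within each node type, the mass is shared in proportion to observed edge counts, and that mass for a missing edge type is redistributed by renormalizing. Both are assumptions of this sketch, and `transition_row` and the `"<failure>"` label are hypothetical names.

```python
def transition_row(reform_counts, doc_counts, alphas):
    """Split one query's outgoing probability mass by node type.

    reform_counts / doc_counts map target nodes to observed edge counts;
    alphas = (a1, a2, a3) are the weights for reformulation, document,
    and failure nodes.
    """
    a1, a2, a3 = alphas
    assert abs(a1 + a2 + a3 - 1.0) < 1e-9, "alphas must sum to 1"
    row = {}
    for weight, counts in ((a1, reform_counts), (a2, doc_counts)):
        total = sum(counts.values())
        for node, count in counts.items():
            # Assumption: share the type's mass in proportion to edge counts.
            row[node] = weight * count / total
    row["<failure>"] = a3
    # Renormalize in case one edge type is missing for this query.
    mass = sum(row.values())
    if mass == 0:
        return {}
    return {node: p / mass for node, p in row.items()}

row = transition_row({"q1": 1, "q3": 1, "q4": 1}, {"u1": 2}, (0.5, 0.3, 0.2))
```

In this example the three reformulation targets share α1 = 0.5 equally, the single clicked document receives α2 = 0.3, and the failure node receives α3 = 0.2.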

Page 90: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Parameter Setting:

α1: reformulation node

α2: document node

α3: failure node

Previous work (Sadikov, WWW2010): uses one shared transition probability setting (α1, α2, α3) for the different types of nodes.

Our work: assigns transition probabilities based on the characteristics of each candidate query.

Terms: prior transition probability, observed transition probability, posterior transition probability

Page 91: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Transition Probability

Reformulation Nodes

Document Nodes:

Failure Node:

Page 92: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Computing the Distribution

In the forward utility propagation, the corresponding transition matrix is:

PQ: n × n transition matrix on query nodes

PD: n × m matrix of transitions from query nodes to document nodes

PS: n × 1 matrix of transitions from query nodes to the failure node

ID, IS: identity matrices, denoting that document nodes and the failure node are absorbing states

The chain is reducible (no stationary distribution).
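A transition matrix consistent with these block definitions can be written in the following block form, with query nodes as transient states and the document and failure nodes as absorbing states. The ordering of the node groups (queries, documents, failure) is an assumption here; $P_Q$, $P_D$, $P_S$, $I_D$, $I_S$ correspond to PQ, PD, PS, ID, IS.

```latex
P =
\begin{pmatrix}
P_Q & P_D & P_S \\
0   & I_D & 0   \\
0   & 0   & I_S
\end{pmatrix},
\qquad
P_Q \in \mathbb{R}^{n \times n},\;
P_D \in \mathbb{R}^{n \times m},\;
P_S \in \mathbb{R}^{n \times 1}
```

Each of the first n rows sums to 1 across the three blocks. The lower rows only contain identity blocks, so a walk that reaches a document or failure node never leaves it.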

Page 93: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Computing the Distribution

Computing the absorbing distribution iteratively:

P^t[i, j] is the probability of reaching node j from node i after a t-step walk.

We only have to compute the probabilities from query nodes to document nodes: O(tn^3 + n^2 m)

In the recommendation scenario, only the probabilities from the original query to the documents are needed, i.e., only the matrix row of the original query is computed: O(tn^2 + nm)
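The cheaper row-only computation can be sketched in plain Python. Instead of raising the full matrix to the t-th power, it propagates a single probability vector for the original query. The function name and the toy matrices are hypothetical, and the matrices are assumed to be given as lists of rows.

```python
def absorbing_distribution(P_Q, P_D, start, steps):
    """Approximate absorption probabilities into documents from one query.

    P_Q: n x n query-to-query transition matrix (list of rows).
    P_D: n x m query-to-document transition matrix (list of rows).
    Only the row of the start query is propagated, so the cost is
    O(steps * n^2 + n * m) rather than O(steps * n^3 + n^2 * m).
    """
    n, m = len(P_Q), len(P_D[0])
    current = [0.0] * n
    current[start] = 1.0   # probability mass currently on each query node
    visits = [0.0] * n     # accumulated mass that has passed through each query
    for _ in range(steps):
        for i in range(n):
            visits[i] += current[i]
        # One step restricted to query nodes: current <- current * P_Q
        current = [sum(current[i] * P_Q[i][j] for i in range(n)) for j in range(n)]
    # Mass absorbed by document j = sum_i visits[i] * P_D[i][j]
    return [sum(visits[i] * P_D[i][j] for i in range(n)) for j in range(m)]

# Toy example: 2 query nodes, 2 document nodes; the remaining 0.2 of each
# row is the (implicit) transition to the failure node.
P_Q = [[0.0, 0.5], [0.0, 0.0]]
P_D = [[0.3, 0.0], [0.0, 0.8]]
result = absorbing_distribution(P_Q, P_D, start=0, steps=10)
```

In the toy example the walk from query 0 is absorbed by the first document with probability 0.3 and by the second with 0.5 × 0.8 = 0.4. The remaining 0.3 ends in the failure node.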

Page 94: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Backward Utility Propagation

Page 95: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Experimental Results

Dataset: our experiments are based on a publicly available query log, the UFindIt log data. There are 40 search tasks in total, represented by 40 test queries.

Page 96: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Experimental Results

Metrics:

QRR (Query Relevant Ratio): measures the probability that a user finds relevant results when she uses query q for her search task.

MRD (Mean Relevant Document): measures the average number of relevant results a user finds when she uses query q for her search task.
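Read literally, the two metrics are a ratio and a mean over the times a query was used. A minimal sketch under that reading follows; the input format, one count of relevant results per use of the query, is an assumption of this sketch, and `qrr_and_mrd` is a hypothetical name.

```python
def qrr_and_mrd(usages):
    """Compute (QRR, MRD) for one query.

    usages: one entry per time the query was issued for the task, holding
    the number of relevant results the user found that time.
    """
    if not usages:
        return 0.0, 0.0
    n = len(usages)
    qrr = sum(1 for found in usages if found > 0) / n  # share of successful uses
    mrd = sum(usages) / n                              # mean relevant results per use
    return qrr, mrd

print(qrr_and_mrd([2, 0, 1, 0]))  # (0.5, 0.75)
```

Here the query led to relevant results in 2 of 4 uses (QRR = 0.5) and to 3 relevant results overall (MRD = 0.75).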

Page 97: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Experimental Results

Overall Evaluation Results

The TARW method performs significantly better than all the baseline recommendation methods (p-value <= 0.05).

Page 98: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Evaluation of Document Utility

Baseline methods:

Document Frequency Based Method (DF): the click frequency of a document reflects users' preference for that document when they search with the original query.

Session Document Frequency Based Method (SDF): clicked documents within the same search session convey similar search intent.

Markov-model Based Method (MM): based on the document distribution learned for the original query by a Markov-model based method.

Page 99: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Evaluation of Document Utility

Metrics:

Precision at position k (P@k)

Normalized Discounted Cumulative Gain (NDCG)

Mean Average Precision (MAP)
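The three ranking metrics can be sketched for a single ranked list as follows. The slides do not say which variants were used, so this sketch assumes linear gain with a log2 discount for NDCG, and average precision normalized by the number of relevant documents found in the ranking. MAP is then the mean of `average_precision` over all test queries.

```python
import math

def precision_at_k(relevance, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for rel in relevance[:k] if rel > 0) / k

def ndcg_at_k(relevance, k):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    def dcg(gains):
        # Assumption: linear gain with a log2 position discount.
        return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal > 0 else 0.0

def average_precision(relevance):
    """Mean of the precision values at each relevant position."""
    hits, total = 0, 0.0
    for pos, rel in enumerate(relevance, start=1):
        if rel > 0:
            hits += 1
            total += hits / pos
    return total / hits if hits else 0.0

ranking = [1, 0, 1, 0]  # hypothetical relevance labels in ranked order
print(precision_at_k(ranking, 2), ndcg_at_k(ranking, 4), average_precision(ranking))
```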

Page 100: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Evaluation of Document Utility

TARW improves over MM by:

1) using an adaptive transition probability setting for the different types of nodes;

2) modeling users' behavior of giving up their search tasks by introducing failure nodes.

Page 101: Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center,  Leibniz  Universität  Hannover

Summary

Query recommendation techniques:

High Relevant Query Recommendation

High Diversity Query Recommendation

High Utility Query Recommendation
