Click Chain Model in Web Search

Fan Guo, Carnegie Mellon University

WWW'09, Madrid, Spain


Joint Work With…

• Chao Liu, MSR, ISRC-Redmond

• Yi-Min Wang, MSR, ISRC-Redmond

• Mike Taylor, MSR, Cambridge

• Anitha Kannan, MSR, Search Lab

• Tom Minka, MSR, Cambridge

• Christos Faloutsos, Carnegie Mellon University


Click Logs

• Auto-generated data keeping important information about search activity.


Query: www 2009    Time: 21 Apr 2009, 9:01:02

Rank/Position | URL of Document | Click
1 | www.metalwayfestival.com | 0
2 | www.maquitec.com | 0
3 | www.construmat.com | 0
4 | www.hispack.com | 0
5 | www.themarket.com | 0
6 | www.cursabombers.com | 0
7 | www.setegibernau.com | 0
8 | www2009.org | 1
9 | www.solardecathlon.upe.es | 0
10 | www.nxtbook.com/nxtbooks/suny/2009spring | 0


Problem Definition

• Given a click log data set, for each query-document pair, compute user-perceived relevance.


Impression Data and Click Data (Query: www 2009, Session Index: 103)

Rank/Position | Document Idx | Click
1 | 1 | 0
2 | 8 | 0
3 | 3 | 0
4 | 7 | 0
5 | 5 | 0
6 | 12 | 0
7 | 2 | 0
8 | 5 | 1
9 | 42 | 0
10 | 20 | 0

Document Idx | Relevance
1 | ?
2 | ?
3 | ?
4 | ?
5 | ?
6 | ?
7 | ?
8 | ?
9 | ?


Relevance Representation


[Figure: relevance representation on a 0 to 1 scale: human-judge grades (Excellent, Good, Fair, Bad), previous click models, and the Click Chain Model (e.g., 0.75); Integration.]


Applications

• Automated Ranking Alterations

• Search Engine Performance Metric

• Calibrate Human Judgment

• Related Application in Sponsored Search


Roadmap

• Motivation and Problem Definition

• Click Model Basics

• CCM and Algorithms

• Experimental Evaluation

• Related Work and Conclusion


Eye-Tracking User Study


Fixation Heat Map


• Overall: fixations are biased toward higher ranks, and so are clicks.

• For each position, fixations and clicks are context dependent.


[Figure panels: Normal Impression; Reversed Impression]


Problem Definition (Recap)

• Given a click log data set, compute the user-perceived relevance of each query-document pair. The solution should be:
– aware of the position bias and context dependency;
– scalable to terabyte-scale data;
– incremental, to stay up to date.


Examination Hypothesis

• User behavior abstraction:
– Fixation → binary examination variable $E_i$
– Click → binary click variable $C_i$

• A document must be examined before being clicked.


Examination Hypothesis

• For each position $i$, $P(C_i = 1) = P(E_i = 1) \cdot R_i$, where the relevance is $R_i = P(C_i = 1 \mid E_i = 1)$.

• The position bias is reflected in the derivation of P(Examination).
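A minimal sketch of the examination hypothesis, assuming hypothetical examination probabilities for three positions (illustrative values, not estimates from the talk). The same document gets a different click-through rate at each position, and dividing by $P(E_i = 1)$ recovers one relevance value.

```python
def click_probability(exam_prob: float, relevance: float) -> float:
    """Examination hypothesis: P(C_i = 1) = P(E_i = 1) * R_i."""
    return exam_prob * relevance


def relevance_from_ctr(ctr: float, exam_prob: float) -> float:
    """Invert the hypothesis: R_i = P(C_i = 1) / P(E_i = 1)."""
    if exam_prob <= 0.0:
        raise ValueError("a document that is never examined cannot be clicked")
    return ctr / exam_prob


if __name__ == "__main__":
    # Hypothetical examination probabilities (illustration only).
    exam = {1: 0.95, 5: 0.5, 10: 0.2}
    relevance = 0.6  # the same document placed at different positions
    for pos, e in exam.items():
        ctr = click_probability(e, relevance)
        print(f"position {pos:2d}: CTR = {ctr:.3f}, "
              f"relevance = {relevance_from_ctr(ctr, e):.3f}")
```

In real logs $P(E_i = 1)$ is not observed, so deriving it is where click models differ.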


Cascade Hypothesis

• The user scans through the documents and makes decisions in strict linear order.

• The decision process: $E_1, C_1, E_2, C_2, \ldots$

• The essential part of a click model:
– What is the probability of “See Next Doc”?


Roadmap

• Motivation and Problem Definition

• Click Model Basics

• CCM and Algorithms

• Experimental Evaluation

• Related Work and Conclusion


The Context

• Top-10 organic search results only.

• Query sessions are independent.

• Semantic information is not used.


[Figure: search result page with Suggestions, Ads, and Other Elements marked; only the top-10 organic results are modeled.]


User Behavior Description


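• Examine the document; click with probability $R_i$.

• No click: see the next document with probability $\alpha_1$; otherwise done.

• Click: see the next document with probability $\alpha_2 (1 - R_i) + \alpha_3 R_i$; otherwise done.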
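The behavior description above can be turned into a small generative sketch. The function name `simulate_session` and the relevance and α values in the usage example are hypothetical, and the code assumes the user always examines the first result.

```python
import random
from typing import List, Sequence


def simulate_session(relevances: Sequence[float], alpha1: float,
                     alpha2: float, alpha3: float,
                     rng: random.Random) -> List[int]:
    """Draw one click vector (C_1, ..., C_M) from the described user behavior.

    Assumes position 1 is always examined. After examining position i, the
    user clicks with probability R_i, then sees the next document with
    probability alpha1 (no click) or alpha2 * (1 - R_i) + alpha3 * R_i (click).
    """
    clicks = [0] * len(relevances)
    for i, r in enumerate(relevances):
        clicked = rng.random() < r            # position i is examined here
        clicks[i] = int(clicked)
        p_next = alpha2 * (1.0 - r) + alpha3 * r if clicked else alpha1
        if rng.random() >= p_next:            # "Done": stop scanning
            break
    return clicks


if __name__ == "__main__":
    # Hypothetical relevances and alpha values, for illustration only.
    rng = random.Random(42)
    print(simulate_session([0.9, 0.4, 0.2, 0.1, 0.05], 0.6, 0.5, 0.3, rng))
```

Positions after the user stops stay at 0, which is how a click vector with no click below some rank arises under this model.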


Click Chain Model

[Figure: graphical model with examination variables $E_1, \ldots, E_5$, click variables $C_1, \ldots, C_5$, and relevance variables $R_1, \ldots, R_5$.]
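The behavior description and the two hypotheses can be summarized as conditional probabilities over this graphical model. This is a sketch in the notation used above; $P(E_1 = 1) = 1$ (the user always examines the top result) is an assumption that the slides do not spell out.

```latex
\begin{aligned}
P(E_1 = 1) &= 1 \\
P(C_i = 1 \mid E_i = 0) &= 0 \\
P(C_i = 1 \mid E_i = 1, R_i) &= R_i \\
P(E_{i+1} = 1 \mid E_i = 0) &= 0 \\
P(E_{i+1} = 1 \mid E_i = 1, C_i = 0) &= \alpha_1 \\
P(E_{i+1} = 1 \mid E_i = 1, C_i = 1, R_i) &= \alpha_2 (1 - R_i) + \alpha_3 R_i
\end{aligned}
```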


Why Bayesian?

• Modeling benefit:
– A principled way of smoothing the relevance estimates;
– More flexibility, such as computing $P(R_i > R_j)$.

• Computational benefit:
– Avoids the iterative optimization procedure of maximum-likelihood estimation.
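As an illustration of the flexibility point, here is a sketch that approximates $P(R_i > R_j)$ on a grid, assuming the two posteriors are independent. The densities in the usage example are hypothetical stand-ins for posteriors of the product form CCM produces.

```python
from typing import Callable, List

Density = Callable[[float], float]


def prob_greater(density_i: Density, density_j: Density,
                 n_grid: int = 2000) -> float:
    """Approximate P(R_i > R_j) for independent posteriors on [0, 1].

    Each (possibly unnormalized) density is discretized on the midpoints of
    n_grid equal cells; ties within a cell are split evenly.
    """
    grid: List[float] = [(k + 0.5) / n_grid for k in range(n_grid)]
    w_i = [density_i(r) for r in grid]
    w_j = [density_j(r) for r in grid]
    z_i, z_j = sum(w_i), sum(w_j)
    total = 0.0
    cdf_j = 0.0  # mass of R_j in cells strictly below the current one
    for a, b in zip(w_i, w_j):
        p_a, p_b = a / z_i, b / z_j
        total += p_a * (cdf_j + 0.5 * p_b)
        cdf_j += p_b
    return total


if __name__ == "__main__":
    # Hypothetical posteriors: R_i proportional to r^2 (1 - r),
    # R_j proportional to r (1 - r)^2.
    print(prob_greater(lambda r: r * r * (1 - r), lambda r: r * (1 - r) ** 2))
```

A point estimate alone cannot express this probability; with full posteriors it is a one-line comparison.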


Relevance Inference

• Given a query and all its click data $\mathbf{C} = \{C^1, \ldots, C^N\}$, compute the posterior $p(R_j \mid \mathbf{C})$ for each document $j$:

$$p(R_j \mid \mathbf{C}) \propto p(R_j) \prod_{n=1}^{N} P(C^n \mid R_j)$$

• Let us then focus on the click probability for a particular session and look at the different cases.
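The posterior product on this slide can be sketched on a discrete grid. This assumes a uniform prior (the slides do not state the prior), and the per-session likelihood functions in the usage example are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def posterior_on_grid(
    likelihoods: Iterable[Callable[[float], float]],
    prior: Callable[[float], float] = lambda r: 1.0,
    n_grid: int = 1000,
) -> Tuple[List[float], List[float]]:
    """Discretized p(R_j | C), proportional to p(R_j) * prod_n P(C^n | R_j)."""
    grid = [(k + 0.5) / n_grid for k in range(n_grid)]  # cell midpoints in (0, 1)
    weights = [prior(r) for r in grid]
    for lik in likelihoods:  # one factor per session containing document j
        weights = [w * lik(r) for w, r in zip(weights, grid)]
    z = sum(weights)
    return grid, [w / z for w in weights]


def posterior_mean(grid: List[float], masses: List[float]) -> float:
    return sum(r * p for r, p in zip(grid, masses))


if __name__ == "__main__":
    # Two hypothetical sessions: one contributes R_j, the other (1 - R_j).
    grid, masses = posterior_on_grid([lambda r: r, lambda r: 1.0 - r])
    print(posterior_mean(grid, masses))
```

Because each session contributes one multiplicative factor, only the factor's form (which depends on the case) needs to be recorded per session.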

Click Chain Model

[Figure: the CCM graphical model ($E_i$, $C_i$, $R_i$ for positions 1 to 5), annotated with the Examination Hypothesis and the Cascade Hypothesis.]

Case I: position 1 is not clicked, but a later position is.

$$P(C \mid R_1) \propto 1 - R_1$$

Case II: position 2 is clicked, and a later position is also clicked.

$$P(C \mid R_2) \propto R_2 \left(1 - \left(1 - \frac{\alpha_3}{\alpha_2}\right) R_2\right)$$

Case III: position 3 is the last clicked position.

$$P(C \mid R_3) \propto R_3 \left(1 + \frac{\alpha_2 - \alpha_3}{2 - \alpha_1 - \alpha_2} R_3\right)$$

Case IV: position 4 comes after the last clicked position.

$$P(C \mid R_4) \propto 1 + \beta_4 R_4$$

where $\beta_4$ is a constant that depends on the distance to the last click.

Case IV (continued): position 5 also comes after the last clicked position.

$$P(C \mid R_5) \propto 1 + \beta_5 R_5$$

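Putting them together

$$p(R_j \mid \mathbf{C}) \propto R_j^{K_0} \prod_{m \ge 1} (1 + \beta_m R_j)^{K_m}$$

Each session in which document $j$ appears increments counts according to its case:

• Case I: $K_1$, with $\beta_1 = -1$

• Case II: $K_0$ and $K_2$, with $\beta_2 = -(1 - \alpha_3 / \alpha_2)$

• Case III: $K_0$ and $K_3$, with $\beta_3 = (\alpha_2 - \alpha_3) / (2 - \alpha_1 - \alpha_2)$

• Case IV: $K_{j-l+3}$, where $l$ is the last clicked position.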
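The counting rule above can be sketched as a single pass over the log. This covers Cases I to IV only and does not reproduce the exact (2*10+2) count layout from the summary slide; the function names and the `(query, doc_ids, clicks)` record format are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Sequence, Tuple


def count_indices(clicks: Sequence[int]) -> List[List[int]]:
    """For each position j (1-based), the indices m of the counts K_m to increment."""
    clicked = [j for j, c in enumerate(clicks, start=1) if c]
    if not clicked:
        raise ValueError("no-click sessions are not covered by Cases I-IV")
    last = clicked[-1]                        # l: the last clicked position
    out: List[List[int]] = []
    for j, c in enumerate(clicks, start=1):
        if j < last:
            out.append([0, 2] if c else [1])  # Case II if clicked, else Case I
        elif j == last:
            out.append([0, 3])                # Case III
        else:
            out.append([j - last + 3])        # Case IV
    return out


Record = Tuple[str, Sequence[int], Sequence[int]]  # (query, doc ids, clicks)


def accumulate(log: Iterable[Record]) -> Dict[Tuple[str, int], Dict[int, int]]:
    """One pass over the log; counts[(query, doc)][m] holds K_m for that pair."""
    counts: DefaultDict[Tuple[str, int], DefaultDict[int, int]] = defaultdict(
        lambda: defaultdict(int))
    for query, docs, clicks in log:
        for doc, indices in zip(docs, count_indices(clicks)):
            for m in indices:
                counts[(query, doc)][m] += 1
    return counts
```

Because the state is a set of integer counts, new sessions can be folded in later without revisiting old ones, which is the incremental property the slides claim.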


Summary of the Algorithm

• Initialize (2*10+2) counts for each query-document pair;

• Go through the click log once and update the counts;

• Compute the parameter values and obtain the β values;

• Output the results (using numerical integration if necessary).
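The final step, numerical integration of the product-form posterior, can be sketched as follows. It assumes a uniform prior and the midpoint rule; the function name is hypothetical. Plugging in the counts and β values behind the "A Quick Example" slides ($K_0 = K_1 = K_3 = K_5 = 1$) gives a mean of about 0.52 and a standard deviation of about 0.22, matching those slides.

```python
import math
from typing import Dict, Tuple


def posterior_moments(counts: Dict[int, int], betas: Dict[int, float],
                      n_grid: int = 20000) -> Tuple[float, float]:
    """Mean and std of p(R) proportional to R^K_0 * prod_m (1 + beta_m R)^K_m.

    Uses the midpoint rule on [0, 1] and a uniform prior (an assumption).
    Requires beta_m >= -1 so that every factor stays positive on (0, 1).
    """
    grid = [(k + 0.5) / n_grid for k in range(n_grid)]
    log_w = []
    for r in grid:
        lw = counts.get(0, 0) * math.log(r)            # R^{K_0}
        for m, k_m in counts.items():
            if m != 0 and k_m:
                lw += k_m * math.log1p(betas[m] * r)   # (1 + beta_m R)^{K_m}
        log_w.append(lw)
    top = max(log_w)                                   # rescale against underflow
    w = [math.exp(x - top) for x in log_w]
    z = sum(w)
    mean = sum(wi * r for wi, r in zip(w, grid)) / z
    second = sum(wi * r * r for wi, r in zip(w, grid)) / z
    return mean, math.sqrt(max(second - mean * mean, 0.0))


if __name__ == "__main__":
    betas = {1: -1.0, 2: -0.63, 3: 0.83, 4: -0.33, 5: -0.11, 6: -0.04}
    print(posterior_moments({0: 1, 1: 1, 3: 1, 5: 1}, betas))
```

Working in log space keeps the sketch usable for popular pairs, where the counts (and therefore the exponents) can be large.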


Sanity Check

• The algorithm should be:
– Aware of the position bias and context dependency
– Scalable to terabyte-scale data: single pass, linear
– Incremental, to stay up to date: update the counts


Roadmap

• Motivation and Problem Definition

• Click Model Basics

• CCM and Algorithms

• Experimental Evaluation

• Related Work and Conclusion


Data Set

• Collected over 2 weeks in July 2008.

• Preprocessing:
– Discard no-click sessions for a fair comparison.
– Remove the 178 most frequent queries.

• Split into training/test sets according to time stamps.


Data Set

• After preprocessing:
– 110,630 distinct queries;
– 4.8M/4.0M query sessions in the training/test set.


Metric

• Efficiency:
– Computational time

• Effectiveness (resort to an indirect measure):
– With known document identities in the test set,
– using the relevance and parameters learned on the training set,
– perform click prediction.


Competitors

• UBM: User Browsing Model (Dupret et al., SIGIR’08)

– More parameters
– Iterative, more expensive algorithm

• DCM: Dependent Click Model (WSDM’09)

– Modeling 1+ clicks per session


Results – Time

• Environment: Unix Server, 2.8GHz cores, MATLAB R2008b.


Model | CCM | UBM | DCM
Time | 9.8 min | 333 min | 5.4 min
Relative to CCM | 1.0 | 34 | 0.55


Results – Perplexity

• Perplexity: quality of click prediction for each position individually.


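For a coin tossed $N = N_H + N_T$ times, with $N_H$ heads and $N_T$ tails, and a model that predicts heads with probability $p_H$ and tails with $p_T = 1 - p_H$:

$$\text{perplexity} = 2^{\text{entropy}} = \left(\frac{1}{p_H}\right)^{N_H/N} \left(\frac{1}{p_T}\right)^{N_T/N}$$

• Random guess ($p_H = 0.5$): 2.00

• Best guess ($p_H = 0.8$, when 80% of the tosses are heads): 1.65

• Ground truth (cheating): 1.00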
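The coin example can be checked with a per-observation version of the formula. This is a sketch (the function name is hypothetical); in the click setting, each observation would be the click or non-click at one position, and `p_one` the model's predicted click probability there.

```python
import math
from typing import Sequence


def perplexity(outcomes: Sequence[int], p_one: Sequence[float]) -> float:
    """2 ** (average negative log2-probability of the observed binary outcomes).

    outcomes[k] is 0 or 1 (e.g. tails/heads, or no click/click at a position);
    p_one[k] is the model's predicted probability that outcomes[k] == 1.
    """
    total = 0.0
    for y, p in zip(outcomes, p_one):
        prob = p if y == 1 else 1.0 - p   # probability given to what happened
        total += math.log2(prob)          # math.log2 raises ValueError if prob == 0
    return 2.0 ** (-total / len(outcomes))


if __name__ == "__main__":
    tosses = [1] * 8 + [0] * 2                              # 8 heads, 2 tails
    print(perplexity(tosses, [0.5] * 10))                   # random guess
    print(perplexity(tosses, [0.8] * 10))                   # best constant guess
    print(perplexity(tosses, [float(y) for y in tosses]))   # ground truth
```

A perfect predictor reaches 1, and a coin flip reaches 2, so a click model's perplexity at a position lies between these bounds whenever it beats random guessing.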


Results – Perplexity

[Figure: perplexity at each position for the compared models; lower is better.]


Results – Perplexity

• Average Perplexity over top 10 positions.


Model | CCM | UBM | DCM
Perplexity | 1.1479 | 1.1577 | 1.1590
Equiv. $p_H$ | 0.0309 | 0.0334 | 0.0337
Improvement of CCM |  | 7.5% | 8.3%
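One reading of "Equiv. $p_H$" that reproduces the table: it is the $p_H$ at which a coin's best-guess perplexity equals the model's perplexity, and the improvement is the relative reduction in that equivalent $p_H$. This interpretation is inferred from the numbers, not stated on the slides; the function names are hypothetical.

```python
import math


def best_guess_perplexity(p: float) -> float:
    """Perplexity of predicting P(H) = p when heads truly occur with frequency p."""
    return 2.0 ** (-(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)))


def equivalent_ph(perp: float, tol: float = 1e-12) -> float:
    """Find p in (0, 0.5] with best_guess_perplexity(p) == perp, by bisection."""
    if not 1.0 < perp <= 2.0:
        raise ValueError("binary perplexity must lie in (1, 2]")
    lo, hi = 1e-15, 0.5        # best_guess_perplexity increases on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if best_guess_perplexity(mid) < perp:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def relative_improvement(ph_model: float, ph_baseline: float) -> float:
    """Relative reduction of the equivalent p_H (one reading of "Improv.")."""
    return (ph_baseline - ph_model) / ph_baseline


if __name__ == "__main__":
    for name, perp in [("CCM", 1.1479), ("UBM", 1.1577), ("DCM", 1.1590)]:
        print(name, round(equivalent_ph(perp), 4))
```

Mapping perplexities near 1 onto an equivalent error rate makes small absolute gaps (1.1479 vs 1.1577) easier to compare.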


Results – Log Likelihood

• Log-likelihood: the log of the probability of recovering the entire click vector out of $2^{10}$ possibilities.


Model | CCM | UBM | DCM
LL | -1.171 | -1.264 | -1.302
Likelihood | 0.3100 | 0.2826 | 0.2719
Improvement of CCM |  | 9.7% | 14%
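A quick consistency check, assuming the LL values are per-session averages of the natural logarithm: exponentiating them gives the likelihood values, and the likelihood ratios give the 9.7% and 14% improvements. The function names are hypothetical.

```python
import math


def likelihood_from_ll(ll: float) -> float:
    """Per-session likelihood from an average natural-log log-likelihood."""
    return math.exp(ll)


def likelihood_gain(ll_model: float, ll_baseline: float) -> float:
    """Relative likelihood improvement: exp(LL_model - LL_baseline) - 1."""
    return math.exp(ll_model - ll_baseline) - 1.0


if __name__ == "__main__":
    table = {"CCM": -1.171, "UBM": -1.264, "DCM": -1.302}
    for name, ll in table.items():
        print(name, round(likelihood_from_ll(ll), 4))
    for base in ("UBM", "DCM"):
        gain = likelihood_gain(table["CCM"], table[base])
        print(f"CCM vs {base}: {100 * gain:.1f}%")
```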


Results – Log Likelihood

[Figure: log-likelihood comparison of the models; higher is better.]


Roadmap

• Motivation and Problem Definition

• Click Model Basics

• CCM and Algorithms

• Experimental Evaluation

• Related Work and Conclusion


Related Work

• User behavior studies and hypotheses:
– Eye-tracking Study (Joachims et al., KDD’05, ACM TOIS)

– Examination Hypothesis (Richardson et al., WWW’07)

– Cascade Hypothesis (Craswell et al., WSDM’08)

• Other click models:
– Logistic Regression (Dupret et al., SIGIR’08)

– Dynamic Bayesian Network (Chapelle et al., WWW’09)

– Bayesian Browsing Model (KDD’09, To appear)


Conclusion

• Click Chain Model:
– A probabilistic approach to interpreting clicks.
– A Bayesian approach to modeling relevance.
– Both scalable and incremental.

• Future Directions:
– Validation/Bucket Test.
– Pairwise comparison.
– More on context dependency.


Thank you :-)


Abstract/Document Relevance

• Relevance of the abstract:
– The conditional probability of a click, as defined by the examination hypothesis:

$$r_i^{\text{abstract}} \sim p(R_i), \qquad P(C_i = 1 \mid E_i = 1) = r_i^{\text{abstract}}$$

• Relevance of the document:
– Determines the probability of “See Next Doc”;
– A binary random variable (integrated out under CCM):

$$r_i^{\text{document}} \sim \text{Bernoulli}\left(r_i^{\text{abstract}}\right), \qquad P(E_{i+1} = 1 \mid E_i = 1, C_i = 1) = \alpha_2 \left(1 - r_i^{\text{document}}\right) + \alpha_3 r_i^{\text{document}}$$
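Integrating out the binary document relevance is a two-term sum, and it yields exactly the $\alpha_2 (1 - R_i) + \alpha_3 R_i$ continuation probability of the main model. A small sketch (the function name and the α values in the check are hypothetical):

```python
def continue_after_click(r_abstract: float, alpha2: float, alpha3: float) -> float:
    """P(E_{i+1} = 1 | E_i = 1, C_i = 1), summing out r_document ~ Bernoulli(r_abstract)."""
    total = 0.0
    for r_document, prob in ((0, 1.0 - r_abstract), (1, r_abstract)):
        # Weight each value of the binary document relevance by its probability.
        total += prob * (alpha2 * (1 - r_document) + alpha3 * r_document)
    return total
```

For example, with $r^{\text{abstract}} = 0.3$, $\alpha_2 = 0.6$, $\alpha_3 = 0.2$, the sum is $0.7 \cdot 0.6 + 0.3 \cdot 0.2 = 0.48$, the same as $\alpha_2 (1 - 0.3) + \alpha_3 \cdot 0.3$.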


Alt. User Behavior Description

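• Examine the document; click with probability $r_i$, where $r_i \sim p(R_i)$.

• No click: see the next document with probability $\alpha_1$.

• Click, and the document is relevant (probability $r_i$): see the next document with probability $\alpha_3$.

• Click, and the document is not relevant: see the next document with probability $\alpha_2$.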


Results – Perplexity (by Freq)

[Figure: perplexity by query frequency; lower is better.]


Examination/Click Distribution


Predicting First/Last Clicks

• Root-Mean-Square error in predicting the first/last clicked position for the test data.

• Two approaches (bias/variance tradeoff):
– EXPectation: use the expected value (bias)
– SIMulation: draw a sample from the model (variance)


First Clicked Position


Last Clicked Position


A Quick Example

• Here we are interested in R3

$m$ | 0 | 1 | 2 | 3 | 4 | 5 | 6
$\beta_m$ |  | -1 | -0.63 | 0.83 | -0.33 | -0.11 | -0.04
$K_m$ | 0 | 0 | 0 | 0 | 0 | 0 | 0

$$p(R_j \mid \mathbf{C}) \propto R_j^{K_0} \prod_{m \ge 1} (1 + \beta_m R_j)^{K_m}$$

• Session 1: document 3 is at the last clicked position (Case III), so $K_0$ and $K_3$ are incremented.

• Two more sessions increment $K_1$ (Case I: document 3 is not clicked, but a later position is) and $K_5$ (Case IV: document 3 is two positions after the last click, so $j - l + 3 = 5$).

$m$ | 0 | 1 | 2 | 3 | 4 | 5 | 6
$\beta_m$ |  | -1 | -0.63 | 0.83 | -0.33 | -0.11 | -0.04
$K_m$ | 1 | 1 | 0 | 1 | 0 | 1 | 0

$$p(R_3 \mid \mathbf{C}) \propto R_3 (1 - R_3)(1 + 0.83 R_3)(1 - 0.11 R_3)$$

Mean($R_3$) = 0.52, Std($R_3$) = 0.22
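The reported mean and standard deviation can be checked by hand, assuming a uniform prior on $R_3$ so that the posterior is the normalized polynomial. Expanding the product and integrating term by term:

```latex
\begin{aligned}
f(R_3) &= R_3 (1 - R_3)(1 + 0.83 R_3)(1 - 0.11 R_3)
        = R_3 - 0.28 R_3^2 - 0.8113 R_3^3 + 0.0913 R_3^4 \\
Z &= \int_0^1 f(R_3)\,dR_3
   = \tfrac{1}{2} - \tfrac{0.28}{3} - \tfrac{0.8113}{4} + \tfrac{0.0913}{5} \approx 0.22210 \\
\mathbb{E}[R_3] &= \frac{1}{Z}\left(\tfrac{1}{3} - \tfrac{0.28}{4} - \tfrac{0.8113}{5} + \tfrac{0.0913}{6}\right)
   \approx \frac{0.11629}{0.22210} \approx 0.5236 \\
\mathbb{E}[R_3^2] &= \frac{1}{Z}\left(\tfrac{1}{4} - \tfrac{0.28}{5} - \tfrac{0.8113}{6} + \tfrac{0.0913}{7}\right)
   \approx \frac{0.07183}{0.22210} \approx 0.3234 \\
\mathrm{Std}(R_3) &= \sqrt{0.3234 - 0.5236^2} \approx 0.222
\end{aligned}
```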