Vladimir Gulin
Learning to rank using clickthrough data
Search Engine Architecture
WEB CRAWLER
INDEX
BACKEND
FRONTEND
What is ranking?
The main algorithm in a search engine
Based on ML algorithms
Computes a relevance score for each query-document pair
The best-kept secret of search companies

Today, ranking quality depends on:
• the method of evaluating ranking quality
• the method of data set construction
• the features of the search engine
• the ML algorithm
How to evaluate ranking quality?
Classical approach
Select a set of queries Q = {q_1, q_2, …, q_|Q|} from the logs
For each q ∈ Q there is a set of documents q → D = {d_1, d_2, …, d_{N_q}}
For each pair (q, d), ask experts for a mark ∈ {0, 1, 2, 3, 4, 5}
Discounted Cumulative Gain

DCG = Σ_{q∈Q} Σ_{i=1}^{N_q} (2^{rel_i} − 1) / log₂(i + 1)
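A minimal Python sketch of this metric (assuming `rels` is one query's expert marks in ranked order; function names are mine):

```python
import math

def dcg(rels):
    # Discounted cumulative gain of one query's ranked relevance marks:
    # position i contributes (2^rel_i - 1) / log2(i + 1).
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels, start=1))

def total_dcg(per_query_rels):
    # The evaluation formula sums DCG over all queries in Q.
    return sum(dcg(rels) for rels in per_query_rels)
```

Placing more relevant documents earlier raises the score, e.g. `dcg([3, 2, 1]) > dcg([1, 2, 3])`.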
How to evaluate ranking quality with clickthrough data?
Evaluation with absolute metrics
Users are shown results from different rankings; measure statistics about user responses:
• Abandonment rate
• Reformulation rate
• Position of first click
• Time to first click
• Etc.

Evaluation using paired comparisons
Show a combination of results from 2 rankings and infer relative preferences:
• Balanced interleaving
• Team-draft interleaving
• Etc.
Team-draft interleaving
SERP A
1. UrlA1
2. UrlA2
3. UrlA3
4. UrlA4
5. UrlA5
6. UrlA6
7. UrlA7
SERP B
1. UrlB1
2. UrlB2
3. UrlB3
4. UrlB4
5. UrlB5
6. UrlB6
7. UrlB7
SERP
1. UrlB1
2. UrlA1
3. UrlA2
4. UrlB2
5. UrlA3
6. UrlB3
7. UrlB4
Δ = (wins(A) + ½ · ties(A, B)) / (wins(A) + wins(B) + ties(A, B)) − 0.5
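A sketch of team-draft interleaving and the Δ statistic above, in Python (the coin-flip tie-breaking follows the usual description of the method; function names are mine):

```python
import random

def team_draft_interleave(a, b, rng=random.Random(0)):
    # Merge two rankings; the team with fewer picks so far goes first,
    # ties broken by a coin flip; duplicates are skipped.
    merged, teams = [], []
    seen = set()
    ia = ib = 0
    picks = {"A": 0, "B": 0}
    while True:
        while ia < len(a) and a[ia] in seen:
            ia += 1
        while ib < len(b) and b[ib] in seen:
            ib += 1
        if ia >= len(a) and ib >= len(b):
            break
        if ia >= len(a):
            team = "B"
        elif ib >= len(b):
            team = "A"
        elif picks["A"] != picks["B"]:
            team = "A" if picks["A"] < picks["B"] else "B"
        else:
            team = "A" if rng.random() < 0.5 else "B"
        url = a[ia] if team == "A" else b[ib]
        merged.append(url)
        teams.append(team)
        seen.add(url)
        picks[team] += 1
    return merged, teams

def tdi_delta(wins_a, wins_b, ties):
    # The statistic above: > 0 means ranking A is preferred by users.
    return (wins_a + 0.5 * ties) / (wins_a + wins_b + ties) - 0.5
```

Clicks on the interleaved SERP are credited to the team that contributed the clicked result, which yields the wins/ties counts.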
Learning to rank with the classical approach
Learning to rank algorithms

Pointwise
L(f(x)) = Σ_{q∈Q} Σ_{i=1}^{N_q} (f(x_i) − rel_i)²

Pairwise
L(f(x)) = − Σ_{q∈Q} Σ_{(i,j)} log ( e^{f(x_i)} / (e^{f(x_i)} + e^{f(x_j)}) )

Listwise
L(f(x)) = − Σ_{q∈Q} Σ_{j=1}^{N_q} ( e^{rel_j} / Σ_{k=1}^{N_q} e^{rel_k} ) · log ( e^{f(x_j)} / Σ_{k=1}^{N_q} e^{f(x_k)} )

Discounted Cumulative Gain
DCG = Σ_{q∈Q} Σ_{i=1}^{N_q} (2^{rel_i} − 1) / log₂(i + 1) → max
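The three objectives can be sketched in Python (rels are labels, scores are f(x); the listwise variant is the ListNet-style cross-entropy that the formula above describes; function names are mine):

```python
import math

def pointwise_loss(scores, rels):
    # Squared error between predicted score and relevance mark.
    return sum((f - r) ** 2 for f, r in zip(scores, rels))

def pairwise_loss(scores, pairs):
    # pairs: (i, j) with document i preferred over document j.
    return -sum(math.log(math.exp(scores[i])
                         / (math.exp(scores[i]) + math.exp(scores[j])))
                for i, j in pairs)

def listwise_loss(scores, rels):
    # Cross-entropy between the softmax of labels and of scores.
    def softmax(v):
        m = max(v)
        e = [math.exp(x - m) for x in v]
        s = sum(e)
        return [x / s for x in e]
    p, q = softmax(rels), softmax(scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

All three are summed over queries and minimized by gradient-based learners; DCG itself is optimized only indirectly.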
Typical problems of the classical approach
Problems with documents
The search index is constantly changing, so we have to rebuild the ranking model often.

Problems with experts
Experts make mistakes
A group of experts is not equal to millions of users
Experts do not ask the queries themselves
We fit the ranking to instructions (100 pages), not to users

Problems with queries
Queries become irrelevant
Ratings are always outdated
Advantages and disadvantages of clickthrough data
Expert judgements                             | Clickthrough data
Thousands per day                             | Millions per day
Expensive                                     | Cheap
Low speed of obtaining                        | High speed of obtaining
Noisy data                                    | Extremely noisy data
Fresh only at the moment of assessment        | Always fresh data
Can evaluate any query (not always correctly) | Can't evaluate queries that nobody asks in the SE
Judgements are biased                         | Unbiased (in terms of our flow of queries)
How can we use clickthrough data to optimize for TDI?
Simple approach
SERP 1 SERP 2
vs
From the 2 rankings, select only the SERPs that win the TDI experiment
Optimal SERP construction
Given
Query q
A set of documents for q: q → D = {d_1, d_2, …, d_{N_q}}
User sessions with different permutations of the docs from set D

Idea
Let's construct a permutation of the docs (the optimal permutation, OP) that on average beats any other permutation of these documents in TDI experiments
Information from user session
Example (Case 1)
query q
1. url1
2. url2
3. url3
4. url4
5. url5
6. url6
7. url7
8. url8
9. url9
10. url10
CLICK
What information have we received from this session?
Information from user session
Example (Case 1)
query q
1. url1
2. url2
3. url3
4. url4
5. url5
6. url6
7. url7
8. url8
9. url9
10. url10
CLICK
url1 > {url2, url3, url4, url5, url6, url7, url8, url9, url10}
Remark: obviously, a more complex click model (CCM, DBN, etc.) could be used.
Information from user session
Example (Case 2)
query q
1. url1
2. url2
3. url3
4. url4
5. url5
6. url6
7. url7
8. url8
9. url9
10. url10
What information have we received from this session?
CLICK
CLICK
CLICK
Information from user session
Example (Case 2)
query q
1. url1
2. url2
3. url3
4. url4
5. url5
6. url6
7. url7
8. url8
9. url9
10. url10
CLICK
CLICK
CLICK
url2 > {url1, url3, url5, url6, url7, url9, url10}
url4 > {url1, url3, url5, url6, url7, url9, url10}
url8 > {url1, url3, url5, url6, url7, url9, url10}
Optimal SERP construction
Given
For query q, aggregate the partial relative relevance judgments from all user sessions
query q (session 1)
url1 > url2
url2 > url4
url1 > url5
….
query q (session 2)
url4 > url5
url2 > url1
url3 > url5
….
query q (session 3)
url4 > url5
url2 > url1
url5 > url2
….
query q (session k)
url4 > url5
url2 > url1
url3 > url5
….
query q
url4 > url5 (5 times)
url2 > url1 (3 times)
url5 > url2 (-7 times)
….
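The aggregation of per-session pairs into net counts can be sketched like this (a pair observed in the opposite direction decrements the count, which is how a negative total such as −7 arises; names are mine):

```python
from collections import Counter

def aggregate_preferences(sessions):
    # sessions: iterable of lists of (winner, loser) pairs, one list per
    # user session.  Pairs are canonicalized so that (u, v) with u < v is
    # the key; a preference in the opposite direction counts as -1.
    counts = Counter()
    for pairs in sessions:
        for winner, loser in pairs:
            if winner < loser:
                counts[(winner, loser)] += 1
            else:
                counts[(loser, winner)] -= 1
    return counts
```

A positive count for `("url4", "url5")` means url4 beat url5 on balance across sessions; a negative count means the reverse.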
Optimal SERP construction
Let's find weights for each document for query q from a system of linear equations
query q
url4 > url5 (5 times)
url2 > url1 (3 times)
url5 > url2 (-7 times)
….
𝑥4 − 𝑥5 = 5
𝑥2 − 𝑥1 = 3
𝑥5 − 𝑥2 = −7
….
Optimal SERP construction
In the general case
Add information about the positions of the docs
query q
url4 > url5 (5 times)
url2 > url1 (3 times)
url5 > url2 (-7 times)
….
γ(pos_4) x_4 − γ(pos_5) x_5 = φ(pos_4, pos_5, 5)
γ(pos_2) x_2 − γ(pos_1) x_1 = φ(pos_2, pos_1, 3)
γ(pos_5) x_5 − γ(pos_2) x_2 = φ(pos_5, pos_2, −7)
….
Optimal SERP construction
Finally

γ_{1,1} x_1 − γ_{1,2} x_2 = φ_1
γ_{2,1} x_1 − γ_{2,3} x_3 = φ_2
….
γ_{N,N_q−1} x_{N_q−1} − γ_{N,N_q} x_{N_q} = φ_N

Y x = Φ

Solution for x (least squares):
x = (YᵀY)⁻¹ Yᵀ Φ

dim(Y) = N × N_q
dim(x) = N_q
dim(Φ) = N
N — number of partial relative judgments
N_q — number of docs for query q
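A toy instance of this least-squares system in NumPy (three judgments over three docs, unit γ weights; the numbers are made up for illustration):

```python
import numpy as np

# Each row encodes one judgment x_winner - x_loser = phi (γ ≡ 1 here).
Y = np.array([[1.0, -1.0,  0.0],   # x1 - x2 = 2
              [0.0,  1.0, -1.0],   # x2 - x3 = 1
              [1.0,  0.0, -1.0]])  # x1 - x3 = 3
phi = np.array([2.0, 1.0, 3.0])

# The normal-equations form x = (YᵀY)⁻¹Yᵀφ is rank-deficient here
# (weights are defined only up to an additive constant), so use lstsq,
# which returns the minimum-norm least-squares solution.
x, *_ = np.linalg.lstsq(Y, phi, rcond=None)

# The optimal SERP orders the docs by decreasing weight: x1, x2, x3.
order = np.argsort(-x)
```

Only the differences between weights matter, so any solver that handles the rank deficiency yields the same document ordering.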
Results
Computed optimized SERPs for the 200,000 most frequent queries (7% of the query flow)
+14% quality on these frequent queries
+1% overall search quality
NOT BAD

Let's try using the optimized SERPs for machine learning to rank

[Plot: amount of statistics across queries]
We have a problem …
Learning from top results
Problems with learning from top results (Example)
Learning from top results
Problems with learning from top results
Outside the top there are many documents with quite a different feature distribution.
Example: every top document for this query contains the word “barcelona” in the title, so a feature describing whether query words occur in the title is useless for this query.

Solution
Let's sample from the set of unlabeled urls
We need sampling because we can't add all the unlabeled data to the training data

………
Urls that should be in the top
Unlabeled urls
Semi-supervised learning to rank
Sampling from unlabeled urls
………
Unlabeled docs
Build a self-organizing map
Get one doc from each cluster
Sampled urls (one per cluster)
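The sampling step can be sketched with plain k-means as a stand-in for the self-organizing map (the talk uses a SOM; either way the point is one representative document per cluster; all names here are mine):

```python
import random

def sample_representatives(docs, k, iters=20, seed=0):
    # docs: list of feature-vector tuples.  Cluster them and return the
    # document nearest to each non-empty cluster center.
    rng = random.Random(seed)
    centers = rng.sample(docs, k)

    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in docs:
            clusters[min(range(k), key=lambda j: dist2(d, centers[j]))].append(d)
        centers = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]

    return [min(cl, key=lambda d: dist2(d, centers[j]))
            for j, cl in enumerate(clusters) if cl]
```

Sampling one document per cluster keeps the added "irrelevant" examples diverse instead of flooding the training set with near-duplicates.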
Semi-supervised learning to rank
Add the sampled docs to the training set as “irrelevant”

[Diagram: sampled urls from the unlabeled urls are merged with the train data set into the final training data for query q]
Semi-supervised learning to rank
Training data for query q_1, q_2, …, q_|Q|, each consisting of:
Optimized SERP urls
Unlabeled urls (marked as irrelevant)
Results
+2.5% search quality
Final Results
We obtained an automatic search improvement method
This method can learn an improved ranking function without any explicit feedback from experts
[Plot: timeline of the TDI experiment comparing the new ranking with our old ranking based on expert judgments; Δ ranges from −0.01 to 0.05]
Using clickthrough data for online learning to rank
Typical problems with constructing a new ranking formula
We need a large dataset (5-10 million points)
Usually we use active learning to obtain this data
About 10-15 iterations of active learning are needed to obtain a new ranking formula with the same quality as the current model
We can't use all the available clickthrough data for training our ranking formula

Can we improve the current formula using new clickthrough data?
Can we improve the current formula using ALL the new clickthrough data?
Typical ranking formula
Typical ranking formula specification
An ensemble of tens of thousands of decision trees
Trained using a gradient boosting algorithm
Idea
«Recognition is clusterization, and the role of the supervisor is primarily to name the clusters correctly…» — Geoffrey Hinton
Typical ranking formula
Typical ranking formula specification
The ranking formula can return only a finite set of values
Each decision tree in the ensemble contains only a few predicates
Each query-document pair is described by the aggregate of the ensemble's predicates

Let's use the partition of the multidimensional feature space generated by the ranking formula as a clustering
Let's remap all the clickthrough data onto this clustering
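A toy sketch of that remapping (trees here are stand-in callables returning a leaf index; the tuple of leaf indices identifies the cell of the partition, and click-derived labels are averaged per cell; all names are mine):

```python
from collections import defaultdict

def leaf_pattern(trees, features):
    # The tuple of leaf indices a query-document pair reaches in the
    # ensemble identifies its cell in the induced partition.
    return tuple(tree(features) for tree in trees)

def remap_clicks(trees, data):
    # data: (features, click_label) pairs from clickthrough logs.
    # Returns cell -> mean click label, i.e. a new value per cluster.
    sums = defaultdict(lambda: [0.0, 0])
    for feats, label in data:
        cell = leaf_pattern(trees, feats)
        sums[cell][0] += label
        sums[cell][1] += 1
    return {cell: s / n for cell, (s, n) in sums.items()}
```

With stump-like trees such as `lambda f: int(f[0] > 0.5)`, pairs that land in the same cell share one updated value, so fresh clicks adjust the formula's output without retraining the trees.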
Online learning to rank
Online learning to rank
Online learning to rank results
We get an online learning to rank method
The method allows us to use ALL the clickthrough feedback from users
We don't need to retrain the model
The method keeps the current ranking formula up to date with current user behavior
Thank you!