Vladimir Gulin
Learning to rank using clickthrough data
Search Engine Architecture
WEB CRAWLER
INDEX
BACKEND
FRONTEND
What is ranking?
The main algorithm in a search engine
Based on ML algorithms
Computes a relevance score for each query-document pair
The best-kept secret of search companies

Today, ranking quality depends on:
• the method of evaluating ranking quality
• the method of data set construction
• the features of the search engine
• the ML algorithm
How to evaluate ranking quality?
Classical approach
Select a set of queries Q = {q_1, q_2, …, q_|Q|} from the logs
For each q ∈ Q there is a set of documents q → D = {d_1, d_2, …, d_{N_q}}
For each pair (q, d), ask experts for a mark ∈ {0, 1, 2, 3, 4, 5}
Discounted Cumulative Gain

DCG = Σ_{q∈Q} Σ_{i=1}^{N_q} (2^{rel_i} − 1) / log₂(i + 1)
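A minimal Python sketch of this metric (assuming `rels` is one query's expert marks in ranked order; function names are mine):

```python
import math

def dcg(rels):
    # Discounted cumulative gain of one query's ranked relevance marks:
    # position i contributes (2^rel_i - 1) / log2(i + 1).
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels, start=1))

def total_dcg(per_query_rels):
    # The evaluation formula sums DCG over all queries in Q.
    return sum(dcg(rels) for rels in per_query_rels)
```

Placing more relevant documents earlier raises the score, e.g. `dcg([3, 2, 1]) > dcg([1, 2, 3])`.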
How to evaluate ranking quality with clickthrough data?
Evaluation with absolute metrics
Users are shown results from different rankings; measure statistics about user responses:
• Abandonment rate
• Reformulation rate
• Position of first click
• Time to first click
• Etc.

Evaluation using paired comparisons
Show a combination of results from 2 rankings and infer relative preferences:
• Balanced interleaving
• Team-draft interleaving
• Etc.
Team-draft interleaving
SERP A
1. UrlA1
2. UrlA2
3. UrlA3
4. UrlA4
5. UrlA5
6. UrlA6
7. UrlA7
SERP B
1. UrlB1
2. UrlB2
3. UrlB3
4. UrlB4
5. UrlB5
6. UrlB6
7. UrlB7
SERP
1. UrlB1
2. UrlA1
3. UrlA2
4. UrlB2
5. UrlA3
6. UrlB3
7. UrlB4
Δ = (wins(A) + ½ · ties(A, B)) / (wins(A) + wins(B) + ties(A, B)) − 0.5
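A sketch of team-draft interleaving and the Δ statistic above, in Python (the coin-flip tie-breaking follows the usual description of the method; function names are mine):

```python
import random

def team_draft_interleave(a, b, rng=random.Random(0)):
    # Merge two rankings; the team with fewer picks so far goes first,
    # ties broken by a coin flip; duplicates are skipped.
    merged, teams = [], []
    seen = set()
    ia = ib = 0
    picks = {"A": 0, "B": 0}
    while True:
        while ia < len(a) and a[ia] in seen:
            ia += 1
        while ib < len(b) and b[ib] in seen:
            ib += 1
        if ia >= len(a) and ib >= len(b):
            break
        if ia >= len(a):
            team = "B"
        elif ib >= len(b):
            team = "A"
        elif picks["A"] != picks["B"]:
            team = "A" if picks["A"] < picks["B"] else "B"
        else:
            team = "A" if rng.random() < 0.5 else "B"
        url = a[ia] if team == "A" else b[ib]
        merged.append(url)
        teams.append(team)
        seen.add(url)
        picks[team] += 1
    return merged, teams

def tdi_delta(wins_a, wins_b, ties):
    # The statistic above: > 0 means ranking A is preferred by users.
    return (wins_a + 0.5 * ties) / (wins_a + wins_b + ties) - 0.5
```

Clicks on the interleaved SERP are credited to the team that contributed the clicked result, which yields the wins/ties counts.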
Learning to rank with the classical approach
Learning to rank algorithms

Pointwise
L(f(x)) = Σ_{q∈Q} Σ_{i=1}^{N_q} (f(x_i) − rel_i)²

Pairwise
L(f(x)) = − Σ_{q∈Q} Σ_{(i,j)} log ( e^{f(x_i)} / (e^{f(x_i)} + e^{f(x_j)}) )

Listwise
L(f(x)) = − Σ_{q∈Q} Σ_{j=1}^{N_q} ( e^{rel_j} / Σ_{k=1}^{N_q} e^{rel_k} ) · log ( e^{f(x_j)} / Σ_{k=1}^{N_q} e^{f(x_k)} )

Discounted Cumulative Gain
DCG = Σ_{q∈Q} Σ_{i=1}^{N_q} (2^{rel_i} − 1) / log₂(i + 1) → max
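The three objectives can be sketched in Python (rels are labels, scores are f(x); the listwise variant is the ListNet-style cross-entropy that the formula above describes; function names are mine):

```python
import math

def pointwise_loss(scores, rels):
    # Squared error between predicted score and relevance mark.
    return sum((f - r) ** 2 for f, r in zip(scores, rels))

def pairwise_loss(scores, pairs):
    # pairs: (i, j) with document i preferred over document j.
    return -sum(math.log(math.exp(scores[i])
                         / (math.exp(scores[i]) + math.exp(scores[j])))
                for i, j in pairs)

def listwise_loss(scores, rels):
    # Cross-entropy between the softmax of labels and of scores.
    def softmax(v):
        m = max(v)
        e = [math.exp(x - m) for x in v]
        s = sum(e)
        return [x / s for x in e]
    p, q = softmax(rels), softmax(scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

All three are summed over queries and minimized by gradient-based learners; DCG itself is optimized only indirectly.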
Typical problems of the classical approach
Problems with documents
The search index is constantly changing, so we have to rebuild the ranking model often.

Problems with experts
Experts make mistakes
A group of experts is not equal to millions of users
Experts do not ask the queries themselves
We fit the ranking to instructions (100 pages), not to users

Problems with queries
Queries become irrelevant
Ratings are always outdated
Advantages and disadvantages of clickthrough data
Expert judgements                             | Clickthrough data
Thousands per day                             | Millions per day
Expensive                                     | Cheap
Low speed of obtaining                        | High speed of obtaining
Noisy data                                    | Extremely noisy data
Fresh only at the moment of assessment        | Always fresh data
Can evaluate any query (not always correctly) | Can't evaluate queries that nobody asks in the SE
Judgements are biased                         | Unbiased (in terms of our flow of queries)
How can we use clickthrough data to optimize for TDI?
Simple approach
SERP 1 SERP 2
vs
From the 2 rankings, select only the SERPs that win the TDI experiment
Optimal SERP construction
Given
Query q
A set of documents for q: q → D = {d_1, d_2, …, d_{N_q}}
User sessions with different permutations of the docs from set D

Idea
Let's construct a permutation of the docs (the optimal permutation, OP) that on average beats any other permutation of these documents in TDI experiments
Information from user session
Example (Case 1)
query q
1. url1
2. url2
3. url3
4. url4
5. url5
6. url6
7. url7
8. url8
9. url9
10. url10
CLICK
What information have we received from this session?
Information from user session
Example (Case 1)
query q
1. url1
2. url2
3. url3
4. url4
5. url5
6. url6
7. url7
8. url8
9. url9
10. url10
CLICK
url1 > {url2, url3, url4, url5, url6, url7, url8, url9, url10}
Remark: obviously, a more complex click model (CCM, DBN, etc.) could be used.
Information from user session
Example (Case 2)
query q
1. url1
2. url2
3. url3
4. url4
5. url5
6. url6
7. url7
8. url8
9. url9
10. url10
What information have we received from this session?
CLICK
CLICK
CLICK
Information from user session
Example (Case 2)
query q
1. url1
2. url2
3. url3
4. url4
5. url5
6. url6
7. url7
8. url8
9. url9
10. url10
CLICK
CLICK
CLICK
url2 > {url1, url3, url5, url6, url7, url9, url10}
url4 > {url1, url3, url5, url6, url7, url9, url10}
url8 > {url1, url3, url5, url6, url7, url9, url10}
Optimal SERP construction
Given
For query q, aggregate the partial relative relevance judgments from all user sessions
query q (session 1)
url1 > url2
url2 > url4
url1 > url5
….
query q (session 2)
url4 > url5
url2 > url1
url3 > url5
….
query q (session 3)
url4 > url5
url2 > url1
url5 > url2
….
query q (session k)
url4 > url5
url2 > url1
url3 > url5
….
query q
url4 > url5 (5 times)
url2 > url1 (3 times)
url5 > url2 (-7 times)
….
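The aggregation of per-session pairs into net counts can be sketched like this (a pair observed in the opposite direction decrements the count, which is how a negative total such as −7 arises; names are mine):

```python
from collections import Counter

def aggregate_preferences(sessions):
    # sessions: iterable of lists of (winner, loser) pairs, one list per
    # user session.  Pairs are canonicalized so that (u, v) with u < v is
    # the key; a preference in the opposite direction counts as -1.
    counts = Counter()
    for pairs in sessions:
        for winner, loser in pairs:
            if winner < loser:
                counts[(winner, loser)] += 1
            else:
                counts[(loser, winner)] -= 1
    return counts
```

A positive count for `("url4", "url5")` means url4 beat url5 on balance across sessions; a negative count means the reverse.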
Optimal SERP construction
Let's find weights for each document for query q from a system of linear equations
query q
url4 > url5 (5 times)
url2 > url1 (3 times)
url5 > url2 (-7 times)
….
𝑥4 − 𝑥5 = 5
𝑥2 − 𝑥1 = 3
𝑥5 − 𝑥2 = −7
….
Optimal SERP construction
In the general case
Add information about the positions of the docs
query q
url4 > url5 (5 times)
url2 > url1 (3 times)
url5 > url2 (-7 times)
….
γ(pos_4) x_4 − γ(pos_5) x_5 = φ(pos_4, pos_5, 5)
γ(pos_2) x_2 − γ(pos_1) x_1 = φ(pos_2, pos_1, 3)
γ(pos_5) x_5 − γ(pos_2) x_2 = φ(pos_5, pos_2, −7)
….
Optimal SERP construction
Finally

γ_{1,1} x_1 − γ_{1,2} x_2 = φ_1
γ_{2,1} x_1 − γ_{2,3} x_3 = φ_2
….
γ_{N,N_q−1} x_{N_q−1} − γ_{N,N_q} x_{N_q} = φ_N

Y x = Φ

Solution for x (least squares):
x = (YᵀY)⁻¹ Yᵀ Φ

dim(Y) = N × N_q
dim(x) = N_q
dim(Φ) = N
N — number of partial relative judgments
N_q — number of docs for query q
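A toy instance of this least-squares system in NumPy (three judgments over three docs, unit γ weights; the numbers are made up for illustration):

```python
import numpy as np

# Each row encodes one judgment x_winner - x_loser = phi (γ ≡ 1 here).
Y = np.array([[1.0, -1.0,  0.0],   # x1 - x2 = 2
              [0.0,  1.0, -1.0],   # x2 - x3 = 1
              [1.0,  0.0, -1.0]])  # x1 - x3 = 3
phi = np.array([2.0, 1.0, 3.0])

# The normal-equations form x = (YᵀY)⁻¹Yᵀφ is rank-deficient here
# (weights are defined only up to an additive constant), so use lstsq,
# which returns the minimum-norm least-squares solution.
x, *_ = np.linalg.lstsq(Y, phi, rcond=None)

# The optimal SERP orders the docs by decreasing weight: x1, x2, x3.
order = np.argsort(-x)
```

Only the differences between weights matter, so any solver that handles the rank deficiency yields the same document ordering.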
Results
Computed optimized SERPs for the 200,000 most frequent queries (7% of the query flow)
+14% quality on these frequent queries
+1% overall search quality
NOT BAD

Let's try using the optimized SERPs for machine learning to rank

[Plot: amount of statistics across queries]
We have a problem …
Learning from top results
Problems with learning from top results (Example)
Learning from top results
Problems with learning from top results
Outside the top there are many documents with quite a different feature distribution.
Example: every top document for this query contains the word “barcelona” in the title, so a feature describing whether query words occur in the title is useless for this query.

Solution
Let's sample from the set of unlabeled urls
We need sampling because we can't add all the unlabeled data to the training data

………
Urls that should be in the top
Unlabeled urls
Semi-supervised learning to rank
Sampling from unlabeled urls
………
Unlabeled docs
Build a self-organizing map
Get one doc from each cluster
Sampled urls (one per cluster)
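The sampling step can be sketched with plain k-means as a stand-in for the self-organizing map (the talk uses a SOM; either way the point is one representative document per cluster; all names here are mine):

```python
import random

def sample_representatives(docs, k, iters=20, seed=0):
    # docs: list of feature-vector tuples.  Cluster them and return the
    # document nearest to each non-empty cluster center.
    rng = random.Random(seed)
    centers = rng.sample(docs, k)

    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in docs:
            clusters[min(range(k), key=lambda j: dist2(d, centers[j]))].append(d)
        centers = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]

    return [min(cl, key=lambda d: dist2(d, centers[j]))
            for j, cl in enumerate(clusters) if cl]
```

Sampling one document per cluster keeps the added "irrelevant" examples diverse instead of flooding the training set with near-duplicates.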
Semi-supervised learning to rank
Add the sampled docs to the training set as “irrelevant”

[Diagram: sampled urls from the unlabeled urls are merged with the train data set into the final training data for query q]
Semi-supervised learning to rank
Training data for query q_1, q_2, …, q_|Q|, each consisting of:
Optimized SERP urls
Unlabeled urls (marked as irrelevant)
Results
+2.5% search quality
Final Results
We obtained an automatic search improvement method
This method can learn an improved ranking function without any explicit feedback from experts
[Plot: timeline of the TDI experiment comparing the new ranking with our old ranking based on expert judgments; Δ ranges from −0.01 to 0.05]
Using clickthrough data for online learning to rank
Typical problems with constructing a new ranking formula
We need a large dataset (5-10 million points)
Usually we use active learning to obtain this data
About 10-15 iterations of active learning are needed to obtain a new ranking formula with the same quality as the current model
We can't use all the available clickthrough data for training our ranking formula

Can we improve the current formula using new clickthrough data?
Can we improve the current formula using ALL the new clickthrough data?
Typical ranking formula
Typical ranking formula specification
An ensemble of tens of thousands of decision trees
Trained using a gradient boosting algorithm
Idea
«Recognition is clusterization, and the role of the supervisor is primarily to name the clusters correctly…» — Geoffrey Hinton
Typical ranking formula
Typical ranking formula specification
The ranking formula can return only a finite set of values
Each decision tree in the ensemble contains only a few predicates
Each query-document pair is described by the aggregate of the ensemble's predicates

Let's use the partition of the multidimensional feature space generated by the ranking formula as a clustering
Let's remap all the clickthrough data onto this clustering
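A toy sketch of that remapping (trees here are stand-in callables returning a leaf index; the tuple of leaf indices identifies the cell of the partition, and click-derived labels are averaged per cell; all names are mine):

```python
from collections import defaultdict

def leaf_pattern(trees, features):
    # The tuple of leaf indices a query-document pair reaches in the
    # ensemble identifies its cell in the induced partition.
    return tuple(tree(features) for tree in trees)

def remap_clicks(trees, data):
    # data: (features, click_label) pairs from clickthrough logs.
    # Returns cell -> mean click label, i.e. a new value per cluster.
    sums = defaultdict(lambda: [0.0, 0])
    for feats, label in data:
        cell = leaf_pattern(trees, feats)
        sums[cell][0] += label
        sums[cell][1] += 1
    return {cell: s / n for cell, (s, n) in sums.items()}
```

With stump-like trees such as `lambda f: int(f[0] > 0.5)`, pairs that land in the same cell share one updated value, so fresh clicks adjust the formula's output without retraining the trees.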
Online learning to rank
Online learning to rank
Online learning to rank results
We get an online learning to rank method
The method allows us to use ALL the clickthrough feedback from users
We don't need to retrain the model
The method keeps the current ranking formula up to date with current user behavior
Thank you!