Extending Learning to Rank with User Dynamic
8th Italian Information Retrieval Workshop - IIR 2017 June 6, 2017
Nicola Ferro, Claudio Lucchese, Maria Maistro, Raffaele Perego
University of Padua, Padua, Italy
ISTI-CNR, Pisa, Italy
Extended Abstract of N. Ferro, C. Lucchese, M. Maistro and R. Perego. On Including the User Dynamic in Learning to Rank. In SIGIR, ACM, 2017.
Outline
• Motivations and Goal
• Modeling User Dynamic
• Integrating User Dynamic in LtR
• Experimental Results
• Conclusion and Future Work
Click Log Data
Query logs have proven to be a valuable and informative source of implicit user feedback:
• they can be easily collected by search engines;
• they are available in real time;
• they represent personalized user preferences.
Incorporating User Features in LtR
To show the importance of user interaction features we trained LambdaMART on the MSLR-WEB10K LtR dataset:
• NDCG with user features: 0.4636
• NDCG without user features: 0.4410
E. Agichtein, E. Brill, and S. Dumais. Improving Web Search Ranking by Incorporating User Behavior Information. In SIGIR, pages 19–26, ACM, 2006.
A Complementary Approach
Embedding of the user interaction dynamics into LambdaMART: we model the user dynamic with a Markov chain trained on query log data and we modify the LambdaMART loss function accordingly.
Measuring Effectiveness
Effectiveness is often measured as the inner product J · D of:
• the relevance vector J, accounting for the quality of the ranked documents;
• the discounting vector D, accounting for the position where a document is ranked.
E. Yilmaz, M. Shokouhi, N. Craswell, and S. Robertson. Expected Browsing Utility for Web Search Evaluation. In CIKM, pages 1561–1565, ACM, 2010.

Example: Discounted Cumulated Gain (DCG)
J_i = 2^{l_i} − 1
D_i = 1 / log_2(i + 1)
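As a quick sketch, DCG@k (and its normalized variant) can be computed directly from the relevance labels, assuming the standard base-2 logarithmic discount:

```python
from math import log2

def dcg_at_k(labels, k):
    """DCG@k as the inner product J . D, with gain J_i = 2^l_i - 1 and
    discount D_i = 1 / log2(i + 1) for the document at (1-based) rank i."""
    return sum((2 ** l - 1) / log2(i + 1)
               for i, l in enumerate(labels[:k], start=1))

def ndcg_at_k(labels, k):
    """Normalize by the DCG of the ideal, label-sorted ranking."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at_k([2, 1, 0], 3)` is 1.0, since the documents are already ranked by decreasing label.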
Navigational vs. Informational Queries
Observation: user behavior in visiting a SERP differs depending on the query type and the number of relevant results retrieved.
• SERP with a single highly relevant result in the first position → navigational behavior
• SERP with several relevant results or no relevant results → informational behavior
Markovian User Model
We assume that each user decides, independently from the random time spent in each visited document, to move forward or backward to another document in the list.
X_1, X_2, X_3, … ∈ R = {1, 2, …, R}: the random sequence of document ranks visited by the user.
M. Ferrante, N. Ferro, and M. Maistro. Injecting User Models and Time into Precision via Markov Chains. In SIGIR, pages 597–606, ACM, 2014.
Markov Chain
Discrete-time homogeneous Markov chain (X_n)_{n ≥ 0}
Transition matrix: P = (p_ij : i, j ∈ R)
Transition probability: p_ij = P[X_{n+1} = j | X_n = i]
Invariant distribution: π = πP
p_ij^(n) → π_j as n → ∞, for all i, j
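In practice the invariant distribution can be computed by power iteration, which directly exploits the convergence property above; a minimal sketch with a toy 3-state transition matrix (illustrative only — in the model the states are the rank positions 1..R):

```python
import numpy as np

# Toy row-stochastic transition matrix over 3 rank positions (illustrative).
P = np.array([[0.5, 0.4, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Solve pi = pi P by power iteration: for an ergodic chain,
    pi P^n converges to the invariant distribution from any start pi."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])  # start from uniform
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).max() < tol:
            break
        pi = nxt
    return pi

pi = stationary_distribution(P)  # satisfies pi = pi @ P and sums to 1
```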
Parameter Estimation
• We used the click log dataset provided by Yandex in the context of the relevance prediction challenge;
• We classify queries on the basis of the number of relevant documents returned and define different classes of queries;
• We aggregate the dynamics of different users and adopt the maximum likelihood estimator approach to calibrate the transition matrix and compute the invariant distribution.
P. Serdyukov, N. Craswell, G. Dupret. WSCD2012: Workshop on Web Search Click Data 2012. In WSDM, pages 771–772, ACM, 2012.
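The maximum likelihood estimate of the transition matrix is simply the normalized count of observed rank-to-rank moves; a minimal sketch on toy aggregated browsing paths (the sessions below are illustrative, not the Yandex log):

```python
from collections import Counter
import numpy as np

def estimate_transition_matrix(sessions, R):
    """MLE of the transition matrix from browsing paths:
    p_ij = n_ij / sum_j n_ij, where n_ij counts observed moves
    from rank i to rank j (ranks are 1-based)."""
    counts = Counter()
    for path in sessions:
        for a, b in zip(path, path[1:]):
            counts[(a, b)] += 1
    P = np.zeros((R, R))
    for (a, b), n in counts.items():
        P[a - 1, b - 1] = n
    rows = P.sum(axis=1, keepdims=True)
    # Rows with no observed transitions stay all-zero.
    return np.divide(P, rows, out=np.zeros_like(P), where=rows > 0)

# Two toy sessions over a 3-position SERP.
P_hat = estimate_transition_matrix([[1, 2, 1, 3], [1, 2, 3]], R=3)
```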
Different Query Types
[Plot: estimated invariant distribution (probability vs. rank positions 1–10) for query classes with 0, 1, 3, 5, 7, and 9 relevant documents.]
• one relevant document → navigational behavior
• many or no relevant documents → informational behavior
User Dynamic
The user dynamic can be described as a mixture of the navigational and the informational behavior:
φ(i) = α i⁻¹ + β i + γ
• Navigational component: inverse of the rank position (α i⁻¹)
• Informational component: linear in the rank position (β i)
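Since φ(i) is linear in (α, β, γ), the mixture can be fitted to an estimated distribution by ordinary least squares; a minimal sketch on synthetic data (the coefficients below are illustrative, chosen only to show that the fit recovers them):

```python
import numpy as np

def fit_user_dynamic(pi):
    """Least-squares fit of phi(i) = alpha * i^-1 + beta * i + gamma
    to a distribution pi over rank positions 1..R."""
    i = np.arange(1, len(pi) + 1, dtype=float)
    A = np.column_stack([1.0 / i, i, np.ones_like(i)])  # design matrix
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(pi, dtype=float), rcond=None)
    return coeffs  # (alpha, beta, gamma)

# Synthetic target generated from known coefficients.
ranks = np.arange(1, 11, dtype=float)
target = 0.26 / ranks + 0.011 * ranks - 0.037
alpha, beta, gamma = fit_user_dynamic(target)
```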
Fitted Curves
[Plots: invariant (stationary) distribution vs. fitted user dynamic φ(i) over rank positions 1–10.]
Navigational queries: φ(i) = α i⁻¹ + β i + γ with α = 0.2601, β = 0.0112, γ = −0.0378
Informational queries: φ(i) = α i⁻¹ + β i + γ with α = 0.0848, β = 0.0045, γ = 0.0502
Learning to Rank
A LtR algorithm exploits a ground-truth set of training examples in order to learn a document scoring function. The training set is composed of:
• a collection of queries q ∈ Q;
• for each query, a set of candidate documents D = {d₀, d₁, …};
• for each query-document pair, a feature vector x.
LambdaMART
Let d_i and d_j be two candidate documents for the same query q, with relevance labels l_i and l_j and scores s_i and s_j respectively.

Lambda gradient:
λ_ij = ∂Q_ij / ∂(s_i − s_j) = sgn(l_i − l_j) · |ΔQ_ij · 1 / (1 + e^{s_i − s_j})|

• the sign is determined by the document labels only;
• |ΔQ_ij| is the quality variation when swapping the scores s_i and s_j;
• 1 / (1 + e^{s_i − s_j}) is the derivative of the RankNet cost.

The lambda gradient for a document d_i is computed by marginalizing over all possible pairs in the result list: λ_i = Σ_j λ_ij
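The pairwise lambdas and their marginalization can be sketched as follows, using NDCG as the quality function Q (a simplified illustration of the idea, not the production LambdaMART implementation):

```python
import numpy as np

def lambda_gradients(scores, labels):
    """lambda_ij = sgn(l_i - l_j) * |dQ_ij / (1 + exp(s_i - s_j))|,
    where |dQ_ij| is the NDCG change from swapping the ranks that the
    current scores assign to d_i and d_j; lambda_i = sum_j lambda_ij."""
    labels = np.asarray(labels, dtype=float)
    n = len(scores)
    order = np.argsort(-np.asarray(scores))      # current ranking by score
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)
    disc = 1.0 / np.log2(rank + 2.0)             # discount at each doc's rank
    gain = 2.0 ** labels - 1.0
    ideal = np.sort(gain)[::-1] @ (1.0 / np.log2(np.arange(n) + 2.0))
    lam = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                continue                         # sgn = 0: no contribution
            dQ = (gain[i] - gain[j]) * (disc[i] - disc[j]) / ideal
            rank_net = 1.0 / (1.0 + np.exp(scores[i] - scores[j]))
            lam[i] += np.sign(labels[i] - labels[j]) * abs(dQ * rank_net)
    return lam

# A relevant document scored below a non-relevant one gets pushed up.
lam = lambda_gradients(np.array([1.0, 2.0]), np.array([1.0, 0.0]))
```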
nMCG-MART
We propose a new measure called Normalized Markov Cumulated Gain (nMCG), an extension of nDCG where the discount function is defined by the user dynamic:

nMCG@k = [ Σ_{i ≤ k} (2^{l_i} − 1) · φ_c(i) ] / [ Σ_{h ≤ k, sorted by l_h} (2^{l_h} − 1) · φ_c(h) ]

• l_i: relevance label of the document ranked at position i;
• φ_c(i): user dynamic function at rank i, relative to the query class c;
• the denominator is the nMCG score of the ideal run.
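A minimal sketch of nMCG@k with the user dynamic φ_c as the discount (the helper names are illustrative; as an example we plug in the navigational coefficients from the Fitted Curves slide):

```python
import numpy as np

def phi(i, alpha, beta, gamma):
    """User dynamic phi(i) = alpha / i + beta * i + gamma (1-based rank i)."""
    return alpha / i + beta * i + gamma

def nmcg_at_k(labels, k, coeffs):
    """nMCG@k: gains 2^l - 1 discounted by the user dynamic phi_c of the
    query class, normalized by the same sum over the ideal ranking."""
    gains = 2.0 ** np.asarray(labels, dtype=float) - 1.0
    i = np.arange(1, min(k, len(gains)) + 1, dtype=float)
    disc = phi(i, *coeffs)
    num = np.sum(gains[: len(i)] * disc)
    den = np.sum(np.sort(gains)[::-1][: len(i)] * disc)  # ideal run
    return num / den if den > 0 else 0.0

# Navigational coefficients from the Fitted Curves slide.
nav = (0.2601, 0.0112, -0.0378)
```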
Experimental Setup
We evaluate the proposed model on three public datasets:
• MSLR-WEB30K, provided by Microsoft¹
• MSLR-WEB10K, provided by Microsoft¹
• Istella, provided by the Tiscali Istella Web search engine²
1. T. Qin and T. Liu. Introducing LETOR 4.0 Datasets. In CoRR, 2013.
2. D. Dato, C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees. In TOIS, 35(2):15:1–15:31, 2016.
Experimental Results
[Result plots/tables: performance as a function of the number of trees; statistically significant improvements marked at p = 0.05 and p = 0.01.]
Take Away Message
• We modeled the user dynamic through Markov chains and showed that the user behavior differs across query types;
• By integrating the user dynamic into LambdaMART we improve over the state of the art both in terms of nDCG and nMCG.
Future Work
• Analyze nMCG properties and its correlation with state-of-the-art evaluation measures;
• Investigate whether nMCG correlates with the quality of a ranking as perceived by the user;
• Validate the study of the user dynamic beyond the first results page.
Thank you for your attention. Any questions?