Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
0
0.1
0.2
0.3
0.4
0.5
XMAD@5
0
10
20
30
40
CTR@5 (%)
XReg
XLR
DiSMEC
LEML
Parabel
PfastreXML
0
2
4
6
8
10
Training Time (hrs)0
10
20
30
40
50
60
Prediction Time (ms/point)
0
10
20
30
40
50
Rating@5 (%)
XReg
XLR
DiSMEC
LEML
Parabel
PfastreXML
1-vs-All LS
0
20
40
60
80
Prediction Time (ms/point)
0.8
0.85
0.9
0.95
XMAD@5
91 122
Movie Recommendation:
0
1
2
3
4
5
Training Time (hrs)
Extreme Regression for Ranking & RecommendationYashoteja Prabhu*# Aditya Kusupati
†Nilesh Gupta* Manik Varma*#
*Microsoft Research India #IIT Delhi †University of Washington
• Predict relevance scores of millions of labels towards a given data point
• Reduces to Extreme Classification if relevance scores are binary
𝑋: Data Points 𝑌: Labels
𝑓: 𝑋 → ℝ+|𝑌|
Training:
Prediction:
Probabilistic Model:• How XReg makes recommendations to a user who likes oranges, grapefruits and blueberries ?
Three stages of Xreg:
1) Learn item tree
2) Train probabilistic models
3) Predict for a new user
Citrus
Berries
Citrus Berries
• Learn item tree by hierarchical clustering of items• Similar items end up in the same leaf node
Citrus Berries
• Train a separate linear regressor for each item in a leaf node• Recommend items with high regression scores
.9.1
.8
.9 .1 .80
Citrus Berries
.8
1
.9
1
0
.9
.9
.9
0
.9
.8
.8
.9 .1 .80
.8
1
.9
1
0
.9
.9
.9
0
.9
.8
.8
f ( ) 4
5
10
0
𝑓: 𝑋 → ℝ+|𝑌|
00
0
0
0
0
00
0
• A new paradigm for reformulating ranking & recommendation
• Predict the most relevant label shortlist & their relevance scores for further reranking
Document Tagging:Tag IP Score
Turing awardees 1.8
AI researchers 1.5
Living people 1.0
Others 0
Computational Advertising:Search Query PClick
Sweaters for men 100%Clothing for men 50%
Soft toy 10%Others 0%
User Rating541
Others 0
Movie Recommendation:
Extreme Classification:
• Predicts less relevant labels due to binary relevance assumption
• Does not generate useful relevance scores for further filtering or re-ranking
Conventional Regression:
• High accuracy and low latency predictions required in real-world recommendation
• 1-vs-All regressors scale linearly in number of labels• Scalable tree-based regressors suffer from low accuracy
• Measure the regression errors of millions of labels• Provide a good proxy for ranking quality • Irrelevant labels dominate traditional regression metrics
Extreme Mean Absolute Deviation @𝒌:
XMAD@𝑘 ො𝐫, 𝐫 =1
𝑘
𝑙∈𝑆𝑘
| ෝ𝑟𝑙 − 𝑟𝑙 |
where 𝑆𝑘 contains 𝑘 labels with largest errors
• XMAD@ 1 ො𝐫, 𝐫 = ො𝐫 − 𝐫 ∞
• XMAD@ 𝐿 = MAD• Ranking-regret @ 𝑘 ≤ 2 XMAD@ 2𝑘
Properties:
XMAD is a better indicator of filtering & re-ranking qualities than purely ranking or regression metrics
Computational Advertising:
0
10
20
30
40
50
PSP@5 (%)
XReg
DiSMEC
ProXML
Parabel
PfastreXML
0
2
4
6
8
Training Time (hrs)0
2
4
6
8
10
Prediction Time (ms/point)
Document Tagging:
0
0.05
0.1
0.15
0.2
0.25
XMAD@5
Extreme Regression Limitations of Existing Approaches
Applications
Extreme Regression Metrics
XReg: Extreme Regressor Results
DSA-130K
MovieLens-138K
WikiLSHTC-325K