1
0 0.1 0.2 0.3 0.4 0.5 XMAD@5 0 10 20 30 40 CTR@5 (%) XReg XLR DiSMEC LEML Parabel PfastreXML 0 2 4 6 8 10 Training Time (hrs) 0 10 20 30 40 50 60 Prediction Time (ms/point) 0 10 20 30 40 50 Rating@5 (%) XReg XLR DiSMEC LEML Parabel PfastreXML 1-vs-All LS 0 20 40 60 80 Prediction Time (ms/point) 0.8 0.85 0.9 0.95 XMAD@5 91 122 Movie Recommendation: 0 1 2 3 4 5 Training Time (hrs) Extreme Regression for Ranking & Recommendation Yashoteja Prabhu* # Aditya Kusupati Nilesh Gupta* Manik Varma* # *Microsoft Research India # IIT Delhi University of Washington Predict relevance scores of millions of labels towards a given data point Reduces to Extreme Classification if relevance scores are binary : Data Points : Labels : → ℝ + || Training: Prediction: Probabilistic Model: How XReg makes recommendations to a user who likes oranges, grapefruits and blueberries ? Three stages of Xreg: 1) Learn item tree 2) Train probabilistic models 3) Predict for a new user Citrus Berries Citrus Berries Learn item tree by hierarchical clustering of items Similar items end up in the same leaf node Citrus Berries Train a separate linear regressor for each item in a leaf node Recommend items with high regression scores .9 .1 .8 .9 .1 .8 0 Citrus Berries .8 1 .9 1 0 .9 .9 .9 0 .9 .8 .8 .9 .1 .8 0 .8 1 .9 1 0 .9 .9 .9 0 .9 .8 .8 f ( ) 4 5 1 0 0 : → ℝ + || 0 0 0 0 0 0 0 0 0 A new paradigm for reformulating ranking & recommendation Predict the most relevant label shortlist & their relevance scores for further reranking Document Tagging: Tag IP Score Turing awardees 1.8 AI researchers 1.5 Living people 1.0 Others 0 Computational Advertising: Search Query PClick Sweaters for men 100% Clothing for men 50% Soft toy 10% Others 0% User Rating 5 4 1 Others 0 Movie Recommendation: Extreme Classification: Predicts less relevant labels due to binary relevance assumption Does not generate useful relevance scores for further filtering or re-ranking Conventional Regression: High accuracy and low latency predictions required in real-world recommendation 1-vs-All regressors scale linearly in number of labels Scalable tree-based regressors suffer from low accuracy Measure the regression errors of millions of labels Provide a good proxy for ranking quality Irrelevant labels dominate traditional regression metrics Extreme Mean Absolute Deviation @: XMAD@ , = 1 |ෝ | where contains labels with largest errors XMAD @ 1 , = XMAD @ = MAD Ranking-regret @ ≤ 2 XMAD @ 2 Properties: XMAD is a better indicator of filtering & re-ranking qualities than purely ranking or regression metrics Computational Advertising: 0 10 20 30 40 50 PSP@5 (%) XReg DiSMEC ProXML Parabel PfastreXML 0 2 4 6 8 Training Time (hrs) 0 2 4 6 8 10 Prediction Time (ms/point) Document Tagging: 0 0.05 0.1 0.15 0.2 0.25 XMAD@5 Extreme Regression Limitations of Existing Approaches Applications Extreme Regression Metrics XReg: Extreme Regressor Results DSA-130K MovieLens-138K WikiLSHTC-325K

Extreme Regression for Dynamic Search Adskusupati/docs/... · uk online casino 0% 100s ideapad Lenovo 0% 1. Dynamic Search Advertising 2. Web-page annotation 3. Product recommendation

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Extreme Regression for Dynamic Search Adskusupati/docs/... · uk online casino 0% 100s ideapad Lenovo 0% 1. Dynamic Search Advertising 2. Web-page annotation 3. Product recommendation

0

0.1

0.2

0.3

0.4

0.5

XMAD@5

0

10

20

30

40

CTR@5 (%)

XReg

XLR

DiSMEC

LEML

Parabel

PfastreXML

0

2

4

6

8

10

Training Time (hrs)0

10

20

30

40

50

60

Prediction Time (ms/point)

0

10

20

30

40

50

Rating@5 (%)

XReg

XLR

DiSMEC

LEML

Parabel

PfastreXML

1-vs-All LS

0

20

40

60

80

Prediction Time (ms/point)

0.8

0.85

0.9

0.95

XMAD@5

91 122

Movie Recommendation:

0

1

2

3

4

5

Training Time (hrs)

Extreme Regression for Ranking & RecommendationYashoteja Prabhu*# Aditya Kusupati

†Nilesh Gupta* Manik Varma*#

*Microsoft Research India #IIT Delhi †University of Washington

• Predict relevance scores of millions of labels towards a given data point

• Reduces to Extreme Classification if relevance scores are binary

𝑋: Data Points 𝑌: Labels

𝑓: 𝑋 → ℝ+|𝑌|

Training:

Prediction:

Probabilistic Model:• How XReg makes recommendations to a user who likes oranges, grapefruits and blueberries ?

Three stages of Xreg:

1) Learn item tree

2) Train probabilistic models

3) Predict for a new user

Citrus

Berries

Citrus Berries

• Learn item tree by hierarchical clustering of items• Similar items end up in the same leaf node

Citrus Berries

• Train a separate linear regressor for each item in a leaf node• Recommend items with high regression scores

.9.1

.8

.9 .1 .80

Citrus Berries

.8

1

.9

1

0

.9

.9

.9

0

.9

.8

.8

.9 .1 .80

.8

1

.9

1

0

.9

.9

.9

0

.9

.8

.8

f ( ) 4

5

10

0

𝑓: 𝑋 → ℝ+|𝑌|

00

0

0

0

0

00

0

• A new paradigm for reformulating ranking & recommendation

• Predict the most relevant label shortlist & their relevance scores for further reranking

Document Tagging:Tag IP Score

Turing awardees 1.8

AI researchers 1.5

Living people 1.0

Others 0

Computational Advertising:Search Query PClick

Sweaters for men 100%Clothing for men 50%

Soft toy 10%Others 0%

User Rating541

Others 0

Movie Recommendation:

Extreme Classification:

• Predicts less relevant labels due to binary relevance assumption

• Does not generate useful relevance scores for further filtering or re-ranking

Conventional Regression:

• High accuracy and low latency predictions required in real-world recommendation

• 1-vs-All regressors scale linearly in number of labels• Scalable tree-based regressors suffer from low accuracy

• Measure the regression errors of millions of labels• Provide a good proxy for ranking quality • Irrelevant labels dominate traditional regression metrics

Extreme Mean Absolute Deviation @𝒌:

XMAD@𝑘 ො𝐫, 𝐫 =1

𝑘

𝑙∈𝑆𝑘

| ෝ𝑟𝑙 − 𝑟𝑙 |

where 𝑆𝑘 contains 𝑘 labels with largest errors

• XMAD@ 1 ො𝐫, 𝐫 = ො𝐫 − 𝐫 ∞

• XMAD@ 𝐿 = MAD• Ranking-regret @ 𝑘 ≤ 2 XMAD@ 2𝑘

Properties:

XMAD is a better indicator of filtering & re-ranking qualities than purely ranking or regression metrics

Computational Advertising:

0

10

20

30

40

50

PSP@5 (%)

XReg

DiSMEC

ProXML

Parabel

PfastreXML

0

2

4

6

8

Training Time (hrs)0

2

4

6

8

10

Prediction Time (ms/point)

Document Tagging:

0

0.05

0.1

0.15

0.2

0.25

XMAD@5

Extreme Regression Limitations of Existing Approaches

Applications

Extreme Regression Metrics

XReg: Extreme Regressor Results

DSA-130K

MovieLens-138K

WikiLSHTC-325K