Extreme Regression for Dynamic Search Adskusupati/docs/... · uk online casino 0% 100s ideapad Lenovo 0% 1. Dynamic Search Advertising 2. Web-page annotation 3. Product recommendation

0

0.1

0.2

0.3

0.4

0.5

XMAD@5

0

10

20

30

40

CTR@5 (%)

XReg

XLR

DiSMEC

LEML

Parabel

PfastreXML

0

2

4

6

8

10

Training Time (hrs)0

10

20

30

40

50

60

Prediction Time (ms/point)

0

10

20

30

40

50

Rating@5 (%)

XReg

XLR

DiSMEC

LEML

Parabel

PfastreXML

1-vs-All LS

0

20

40

60

80


0.8

0.85

0.9

0.95

XMAD@5

91 122

Movie Recommendation:

0

1

2

3

4

5

Training Time (hrs)

Extreme Regression for Ranking & RecommendationYashoteja Prabhu*# Aditya Kusupati

†Nilesh Gupta* Manik Varma*#

*Microsoft Research India #IIT Delhi †University of Washington

• Predict relevance scores of millions of labels towards a given data point

• Reduces to Extreme Classification if relevance scores are binary

𝑋: Data Points 𝑌: Labels

𝑓: 𝑋 → ℝ+|𝑌|

Training:

Prediction:

Probabilistic Model:• How XReg makes recommendations to a user who likes oranges, grapefruits and blueberries ?

Three stages of Xreg:

1) Learn item tree

2) Train probabilistic models

3) Predict for a new user

Citrus

Berries

Citrus Berries

• Learn item tree by hierarchical clustering of items• Similar items end up in the same leaf node

Citrus Berries

• Train a separate linear regressor for each item in a leaf node• Recommend items with high regression scores

.9.1

.8

.9 .1 .80

Citrus Berries

.8

1

.9

1

0

.9

.9

.9

0

.9

.8

.8

.9 .1 .80

.8

1

.9

1

0

.9

.9

.9

0

.9

.8

.8

f ( ) 4

5

10

0

𝑓: 𝑋 → ℝ+|𝑌|

00

0

0

0

0

00

0

• A new paradigm for reformulating ranking & recommendation

• Predict the most relevant label shortlist & their relevance scores for further reranking

Document Tagging:Tag IP Score

Turing awardees 1.8

AI researchers 1.5

Living people 1.0

Others 0

Computational Advertising:Search Query PClick

Sweaters for men 100%Clothing for men 50%

Soft toy 10%Others 0%

User Rating541

Others 0

Movie Recommendation:

Extreme Classification:

• Predicts less relevant labels due to binary relevance assumption

• Does not generate useful relevance scores for further filtering or re-ranking

Conventional Regression:

• High accuracy and low latency predictions required in real-world recommendation

• 1-vs-All regressors scale linearly in number of labels• Scalable tree-based regressors suffer from low accuracy

• Measure the regression errors of millions of labels• Provide a good proxy for ranking quality • Irrelevant labels dominate traditional regression metrics

Extreme Mean Absolute Deviation @𝒌:

XMAD@𝑘 ො𝐫, 𝐫 =1

𝑘

𝑙∈𝑆𝑘

| ෝ𝑟𝑙 − 𝑟𝑙 |

where 𝑆𝑘 contains 𝑘 labels with largest errors

• XMAD@ 1 ො𝐫, 𝐫 = ො𝐫 − 𝐫 ∞

• XMAD@ 𝐿 = MAD• Ranking-regret @ 𝑘 ≤ 2 XMAD@ 2𝑘

Properties:

XMAD is a better indicator of filtering & re-ranking qualities than purely ranking or regression metrics

Computational Advertising:

0

10

20

30

40

50

PSP@5 (%)

XReg

DiSMEC

ProXML

Parabel

PfastreXML

0

2

4

6

8

Training Time (hrs)0

2

4

6

8

10


Document Tagging:

0

0.05

0.1

0.15

0.2

0.25

XMAD@5

Extreme Regression Limitations of Existing Approaches

Applications

Extreme Regression Metrics

XReg: Extreme Regressor Results

DSA-130K

MovieLens-138K

WikiLSHTC-325K

Documents

Extreme Regression for Dynamic Search Adskusupati/docs/... · uk online casino 0% 100s ideapad Lenovo 0% 1. Dynamic Search Advertising 2. Web-page annotation 3. Product recommendation