
Michael Levin - MatrixNet Applications at Yandex




MatrixNet

Michael Levin
Chief Data Scientist


Yandex Data Factory

› Created in 2014
› Machine Learning for other industries
› Computing resources
› Machine Learning infrastructure
› Data scientists


What is MatrixNet?

› Gradient Boosting over Decision Trees
› Classification, Regression, Ranking
› Strong results with default parameters
› Easy to use
› Highly optimized
› Training can be local or parallelized on a cluster
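MatrixNet itself is proprietary, so as a rough open-source stand-in (an assumption, not the actual MatrixNet API), scikit-learn's GradientBoostingClassifier illustrates the "strong results with default parameters" workflow the slide describes:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification task standing in for real data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Gradient boosting over decision trees, all hyperparameters left at defaults.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
```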


MatrixNet applications at Yandex

› Web search ranking
› Ads click prediction
› External projects of YDF
› Recommendations
› Bot detection
› Resolving homonymy
› User segmentation
› …


Some tricks & features

› Oblivious trees


Regular vs oblivious trees

[Diagram: a regular decision tree splits on a different feature/threshold at each node (F1 > 3, F2 > 3, F1 > 6), while an oblivious tree applies the same condition (e.g. F2 > 3) across an entire level.]
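This structural difference matters for evaluation speed. A minimal sketch (hypothetical helper names, not MatrixNet code): because an oblivious tree applies one split per level, the comparison results form a bit index straight into a table of leaf values.

```python
def oblivious_tree_predict(x, splits, leaf_values):
    """splits: list of (feature_index, threshold), one per level.
    leaf_values: list of 2**depth leaf outputs."""
    index = 0
    for feature, threshold in splits:
        # Each level contributes one bit; no per-node branching structure needed.
        index = (index << 1) | (x[feature] > threshold)
    return leaf_values[index]

splits = [(0, 3.0), (1, 3.0)]   # level 1: F1 > 3, level 2: F2 > 3
leaves = [0.1, 0.2, 0.3, 0.4]
result = oblivious_tree_predict([5.0, 1.0], splits, leaves)  # F1>3 true, F2>3 false -> leaves[0b10]
```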


Some tricks & features

› Oblivious trees
› Leaf regularization
› Gradually increase model complexity
› Different objectives: MSE, Log-loss, combinations and non-standard
› Feature binarization
› Estimates feature importance and correlation
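Feature binarization can be sketched as quantile bucketing (a generic illustration, not the exact MatrixNet procedure): each continuous feature is reduced to a small set of candidate thresholds before tree construction, so splits only need to consider a few borders per feature.

```python
import numpy as np

def binarize(values, n_bins=4):
    """Quantile-based binarization: map continuous values to bin indices."""
    # Interior quantiles become the candidate split thresholds.
    thresholds = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, thresholds), thresholds

feature = np.array([0.1, 0.5, 2.0, 3.5, 9.0, 10.0, 11.0, 50.0])
bins, thresholds = binarize(feature)
```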


Ranking

› Train based on judged (query, document) pairs
› Labels: Excellent, Good, Moderate, Bad, Stupid
› Features: query, document, query-document, url, host, link, …
› Multiclassification: objective = cross-entropy
› Regression: Excellent = 1, Good = 0.8, Stupid = 0; objective = MSE
› Ranking: objective = nDCG


Ranking

› nDCG is non-smooth, so there is no gradient for gradient boosting
› Option: approximate with a smooth ranking objective
› Alternative: pairwise approach
› P(r(dij) < r(dik)) = σ(M(dij) – M(dik))
› Maximize the likelihood of the data given the predictions
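The pairwise objective can be sketched directly (hypothetical function names; the model M is reduced to a vector of per-document scores): with P(r(dij) < r(dik)) = σ(M(dij) – M(dik)), training maximizes the log-likelihood of the observed preference pairs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_log_likelihood(scores, preference_pairs):
    """scores: model outputs M(d) per document.
    preference_pairs: (j, k) meaning document j should rank above document k."""
    return sum(math.log(sigmoid(scores[j] - scores[k]))
               for j, k in preference_pairs)

scores = [2.0, 0.5, -1.0]
pairs = [(0, 1), (0, 2), (1, 2)]   # doc 0 above 1 and 2, doc 1 above 2
ll = pairwise_log_likelihood(scores, pairs)
```

Wider score margins between correctly ordered documents push each σ(·) toward 1, so the log-likelihood increases as the model separates the pairs better.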


Ads click prediction

› Search ads
› The user enters a query or clicks a link; the advertiser enters keywords
› Match query and keywords, then show the best ads
› Which ads are the best?
› Relevant ads which maximize revenue
› Expected money = P(click) * Bid
› Goodness = P(click) * Bid * Relevance
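The Goodness formula can be applied directly to rank candidate ads (illustrative numbers only, not real data):

```python
# Rank candidate ads by Goodness = P(click) * Bid * Relevance.
ads = [
    {"id": "a", "p_click": 0.05, "bid": 2.0, "relevance": 0.9},
    {"id": "b", "p_click": 0.10, "bid": 1.0, "relevance": 0.5},
    {"id": "c", "p_click": 0.02, "bid": 5.0, "relevance": 0.8},
]
for ad in ads:
    ad["goodness"] = ad["p_click"] * ad["bid"] * ad["relevance"]

# Ad "b" has the highest P(click), but "a" wins once bid and relevance count.
best = max(ads, key=lambda ad: ad["goodness"])
```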


Ads click prediction

› Need to estimate the probability of a click
› Solution: use log-loss
› P(click) = σ(M(ad))
› Maximizes the likelihood of the data given the predictions
› But if the ranking doesn’t change, better probability estimates add no value
› Don’t waste model resources on approximating probabilities
› Use a combination of classification and ranking objectives
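A minimal sketch of the classification side (generic log-loss, not MatrixNet internals): the raw model output M(ad) is squashed through a sigmoid, and the loss is the average negative log-likelihood of the click/no-click labels.

```python
import math

def sigmoid(m):
    return 1.0 / (1.0 + math.exp(-m))

def log_loss(model_outputs, clicks):
    """model_outputs: raw scores M(ad); clicks: 1 = clicked, 0 = not."""
    total = 0.0
    for m, clicked in zip(model_outputs, clicks):
        p = sigmoid(m)  # P(click) = sigma(M(ad))
        total -= math.log(p) if clicked else math.log(1.0 - p)
    return total / len(clicks)

loss = log_loss([1.2, -0.5, -2.0], [1, 0, 0])
```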


Churn prediction in telecom

› Which telecom users are going to switch after a week?
› A week is needed to prepare a churn-prevention campaign
› Compared with the telecom’s in-house model
› Metric: Lift-10%
› Won by 18.7% on CV and 11.5% on test data (where the churn rate had grown 2x)
› Got most of this delta with the first application of MatrixNet
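Lift-10% can be sketched as follows (a common definition, assumed here since the slides don't spell it out): the churn rate among the top 10% of users ranked by predicted churn score, divided by the overall churn rate.

```python
def lift_at_10(scores, churned):
    """scores: predicted churn scores; churned: 1 if the user actually left."""
    ranked = sorted(zip(scores, churned), key=lambda pair: pair[0], reverse=True)
    top = ranked[: max(1, len(ranked) // 10)]   # top 10% by predicted score
    top_rate = sum(c for _, c in top) / len(top)
    base_rate = sum(churned) / len(churned)
    return top_rate / base_rate

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
churned = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
lift = lift_at_10(scores, churned)
```

A lift of 1.0 means the model is no better than picking users at random; higher is better.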


MatrixNet limitations

› Multiple category features
› Sparse features
› Can’t “divide” discretized features
› Continuous dependency
› “Golden feature”


Conclusions

› MatrixNet is GBDT with bells and whistles
› Handles numeric and categorical features
› Almost no tuning is needed
› Training is parallelized and optimized
› Often superior to other available models
› Some careful feature preparation is needed because of the limitations


Contacts

[email protected]

Michael Levin

Chief Data Scientist, Yandex Data Factory