New challenges for scalable machine learning in online advertising

Olivier Koch

Engineering Program Manager, Criteo

ICML Online Advertising Systems Workshop

June 24, 2016

Copyright © 2015 Criteo

What we do

[Figure: diagram labelled Advertiser and Publisher]

Machine learning applications at Criteo

• Bidding (second-price auctions)

• Product recommendation

• Banner look and feel selection
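To make the bidding bullet concrete, here is a minimal sketch of how a sealed-bid second-price auction settles. It is an illustration only, not Criteo's or any exchange's implementation; the function name `run_second_price_auction` and the `reserve_price` parameter are assumptions introduced for this example.

```python
from typing import Dict, Optional, Tuple


def run_second_price_auction(
    bids: Dict[str, float], reserve_price: float = 0.0
) -> Optional[Tuple[str, float]]:
    """Return (winner, price) for a sealed-bid second-price auction.

    The highest bidder wins and pays the larger of the second-highest
    eligible bid and the reserve price. Returns None if no bid reaches
    the reserve. (Hypothetical helper, for illustration only.)
    """
    # Keep only the bids that meet the reserve price.
    eligible = {name: bid for name, bid in bids.items() if bid >= reserve_price}
    if not eligible:
        return None
    # Rank bidders from highest to lowest bid.
    ranked = sorted(eligible.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    # With a single eligible bidder, the price falls back to the reserve.
    runner_up_bid = ranked[1][1] if len(ranked) > 1 else reserve_price
    return winner, max(runner_up_bid, reserve_price)


if __name__ == "__main__":
    print(run_second_price_auction({"dsp_a": 2.0, "dsp_b": 1.5, "dsp_c": 0.5}))
```

Because the winner's payment does not depend on its own bid, a bidder in this setting has no incentive to shade its bid below its estimated value of the display, which is why the value estimate itself is the learning problem.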

Machine learning at Criteo

• Supervised learning using standard regression methods / optimization algorithms (SGD, L-BFGS)

• Distribution on Hadoop (MapReduce, Spark)

• 3B displays / day

• 40 PB of data -- 15,000 servers

• 7 data centers worldwide
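As an illustration of the "standard regression methods" with SGD mentioned above, the sketch below trains a logistic regression click model with plain stochastic gradient descent. It is a single-machine toy under stated assumptions: dense feature vectors, a fixed learning rate, no regularization, and hypothetical names (`train_logistic_sgd`, `predict_click_probability`). It does not reflect how training is distributed on Hadoop or Spark.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_click_probability(weights: Sequence[float], features: Sequence[float]) -> float:
    """Predicted probability of a click for one display."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def train_logistic_sgd(
    examples: Sequence[Tuple[Sequence[float], int]],
    n_features: int,
    learning_rate: float = 0.1,
    epochs: int = 20,
    seed: int = 0,
) -> List[float]:
    """Fit logistic regression weights by SGD on (features, label) pairs."""
    rng = random.Random(seed)
    weights = [0.0] * n_features
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in a new random order each epoch
        for features, label in data:
            # Gradient of the log loss w.r.t. the score is (p - label).
            error = predict_click_probability(weights, features) - label
            for i, x in enumerate(features):
                weights[i] -= learning_rate * error * x
    return weights


if __name__ == "__main__":
    # Toy data: first feature is a bias term, label is 1 (click) when x > 0.
    toy = [([1.0, 1.0], 1), ([1.0, 2.0], 1), ([1.0, -1.0], 0), ([1.0, -2.0], 0)]
    w = train_logistic_sgd(toy, n_features=2)
    print(w, predict_click_probability(w, [1.0, 2.0]))
```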

The good news

• New generations of algorithms

• NLP (word embeddings), reinforcement learning, policy learning, deep networks

• Releases of ML infrastructures

• Caffe on Spark, TensorFlow, Torch, PhotonML, GPUs inside clusters

→ strong traction in the academic/industrial community

The good news (cont'd)

• A lot of data is available

• Interactions with banners : clicks

• Interactions with products/advertisers : sales, baskets, home views, listings, visit history

• New data is coming

• Mobile, cross-device, (offline)

Now what?

Challenges in online advertising 1/3

• The technical debt of large-scale machine learning systems

• AB tests are snapshots. Are we missing long-term effects?

• Some models become hard to improve. Are we overfitting or using the wrong metrics?

• We need to deal with a growing number of models – e.g. automate feature engineering

Challenges in online advertising 2/3

• We want to provide a better online advertising experience

• Personalized

• Cross-device

• Long tail (new users, new products)

Challenges in online advertising 3/3

• Credit assignment and incrementality

• Several clicks might be needed to generate a sale

• We should probably optimize a series of bids as opposed to single bids

• What is the optimal credit assignment scheme?

• We optimize what clients give us

• Attributed sales may not be the right target

• Global sales increases are noisy
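The credit assignment question can be made concrete by comparing a few simple schemes on one sequence of clicks that precedes a sale. The three rules below (last-click, linear, time-decay) are common textbook heuristics chosen for illustration; the slides do not say which scheme is used or which is optimal, and all function names and the `half_life_hours` parameter are assumptions.

```python
from typing import Dict, List, Sequence


def last_click_credit(clicks: Sequence[str]) -> Dict[str, float]:
    """Give all credit for the sale to the source of the final click."""
    return {clicks[-1]: 1.0} if clicks else {}


def linear_credit(clicks: Sequence[str]) -> Dict[str, float]:
    """Split credit equally across every click in the path."""
    credit: Dict[str, float] = {}
    for source in clicks:
        credit[source] = credit.get(source, 0.0) + 1.0 / len(clicks)
    return credit


def time_decay_credit(
    clicks: Sequence[str],
    hours_before_sale: Sequence[float],
    half_life_hours: float = 24.0,
) -> Dict[str, float]:
    """Weight each click by 0.5 ** (age / half-life), then normalize to 1."""
    weights: List[float] = [0.5 ** (h / half_life_hours) for h in hours_before_sale]
    total = sum(weights)
    credit: Dict[str, float] = {}
    for source, weight in zip(clicks, weights):
        credit[source] = credit.get(source, 0.0) + weight / total
    return credit


if __name__ == "__main__":
    path = ["partner_x", "partner_y", "partner_x"]
    ages = [48.0, 24.0, 0.0]  # hours between each click and the sale
    print(last_click_credit(path))
    print(linear_credit(path))
    print(time_decay_credit(path, ages))
```

The same click path yields different credit under each rule, which is why the choice of scheme changes the target a bidding model is trained on.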

Machine learning to the rescue

• Offline metrics – counterfactual analysis

• Optimal bidding strategies under uncertainty -- reinforcement learning

• Classification/prediction of time series

• Long tail (users, products) -- transfer learning, factorization

• Probabilistic match of devices

Offline metrics – counterfactual analysis

• Option 1: run a controlled experiment (AB test)
  • How would the system behave if I replaced model M by model M*?
  • Takes time to conclude
  • Costs money if M* is worse than M (often)
  • Does not measure long-term effects

• Option 2: use counterfactual analysis
  • How would the system have performed if, when the data was collected, we had replaced model M by model M*?
  • Requires real-time randomization -- cost/exploration trade-off
  • Works best when M* is close to M
  • Trades time for computation and storage
  • Ignores future users' and advertisers' reactions

Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising, Bottou et al.
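One standard way to answer the Option 2 question from logged data is inverse propensity scoring (importance sampling): each logged reward is reweighted by how much more (or less) likely M* would have been to take the logged action than M was. The sketch below is a minimal version under stated assumptions: the logging policy M was randomized, its action probabilities (propensities) were recorded, and the names `LoggedEvent`, `ips_estimate` and `max_weight` are hypothetical. It is not the estimator code from the cited paper.

```python
from typing import Callable, Hashable, Iterable, NamedTuple, Optional


class LoggedEvent(NamedTuple):
    context: Hashable   # what the system knew when it acted
    action: Hashable    # what the logging policy M actually did
    reward: float       # observed outcome, e.g. 1.0 for a click
    propensity: float   # probability that M chose `action` in `context`


def ips_estimate(
    events: Iterable[LoggedEvent],
    new_policy_prob: Callable[[Hashable, Hashable], float],
    max_weight: Optional[float] = None,
) -> float:
    """Estimate the average reward M* would have obtained on the logged traffic.

    new_policy_prob(context, action) is the probability that M* takes the
    logged action. max_weight optionally clips the importance weights,
    which lowers variance at the cost of some bias.
    """
    total = 0.0
    count = 0
    for event in events:
        if event.propensity <= 0.0:
            raise ValueError("logged propensity must be positive")
        weight = new_policy_prob(event.context, event.action) / event.propensity
        if max_weight is not None:
            weight = min(weight, max_weight)
        total += weight * event.reward
        count += 1
    if count == 0:
        raise ValueError("no logged events")
    return total / count


if __name__ == "__main__":
    # M showed banner "A" or "B" uniformly at random; only "A" gets clicks.
    log = [
        LoggedEvent("u1", "A", 1.0, 0.5),
        LoggedEvent("u2", "B", 0.0, 0.5),
        LoggedEvent("u3", "A", 1.0, 0.5),
        LoggedEvent("u4", "B", 0.0, 0.5),
    ]
    always_a = lambda context, action: 1.0 if action == "A" else 0.0
    print(ips_estimate(log, always_a))
```

The importance weights stay small when M* is close to M and blow up when it is not, which matches the slide's remark that the approach works best when M* is close to M.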

Optimal bidding strategies

• A user is seen more than 20 times a day on average

• Each action we take has an impact on the user, the advertiser and the competition

• Option 1: model the environment and bid accordingly
  • Cannot go beyond the proxy being optimized

• Option 2: no model, randomized experiments
  • Hard problem: very high-dimensional state space and very sparse rewards
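To give a feel for the model-free option, the sketch below runs an epsilon-greedy bandit over a handful of discrete bid levels: most of the time it uses the level with the best observed average reward, and occasionally it randomizes. This is a deliberately simplified stand-in. It ignores user state entirely, whereas the slide stresses that the real state space is very high-dimensional and rewards are very sparse. The class name `EpsilonGreedyBidder` and the toy reward function are assumptions made for this example.

```python
import random
from typing import List, Sequence


class EpsilonGreedyBidder:
    """Context-free epsilon-greedy choice among fixed bid levels (toy sketch)."""

    def __init__(self, bid_levels: Sequence[float], epsilon: float = 0.1, seed: int = 0):
        self.bid_levels: List[float] = list(bid_levels)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.bid_levels)
        self.mean_rewards = [0.0] * len(self.bid_levels)

    def choose(self) -> int:
        """Return the index of the bid level to use for the next auction."""
        if self.rng.random() < self.epsilon:
            # Explore: pick a bid level uniformly at random.
            return self.rng.randrange(len(self.bid_levels))
        # Exploit: pick the level with the best average reward so far.
        return max(range(len(self.bid_levels)), key=lambda i: self.mean_rewards[i])

    def update(self, arm: int, reward: float) -> None:
        """Fold one observed reward into the running mean for that level."""
        self.counts[arm] += 1
        self.mean_rewards[arm] += (reward - self.mean_rewards[arm]) / self.counts[arm]


if __name__ == "__main__":
    # Hypothetical environment: the middle bid level has the best margin.
    toy_reward = {0: 0.0, 1: 1.0, 2: 0.2}
    bidder = EpsilonGreedyBidder([0.5, 1.0, 2.0], epsilon=0.2, seed=0)
    for _ in range(500):
        arm = bidder.choose()
        bidder.update(arm, toy_reward[arm])
    best = max(range(3), key=lambda i: bidder.mean_rewards[i])
    print(bidder.bid_levels[best], bidder.counts)
```

The exploration rate `epsilon` is the cost/exploration trade-off in miniature: every randomized bid may lose money now in exchange for information about alternatives.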

Conclusions

• Machine learning applies well to online advertising at scale

• New algorithms, new infrastructures and more data are coming

• A number of challenges remain unresolved…

• … come help us solve them!

Thanks! Questions?

[email protected]

Dataset released: http://bit.ly/criteodata