Copyright © 2015 Criteo. Large-Scale Real-Time Product Recommendation at Criteo. Romain Lerallut, Diane Gasselin. RecSys Vienna, Sept 18, 2015.

RecSys 2015: Large-scale real-time product recommendation at Criteo

Large-Scale Real-Time Product Recommendation at Criteo

Romain Lerallut, Diane Gasselin

RecSys Vienna, Sept 18, 2015

« The largest internet company you’ve never heard of »

• Founded in 2005, in the adtech business since 2008

• Recommendation was our first product

• Disruptive business models

• 1,700 people worldwide (50+% have been here for less than a year)

• 300+ engineers

• 26 offices

• Live in 130 countries

• 1B unique users

A technology company first and foremost

We buy

• Inventory! (ad spaces)

• Billions of times a day

• All over the Internet

• For 95% of the population

=> Funding the Web

We sell

• Clicks!

• (that convert)

• (that convert a lot)

=> Delight to our clients!

We take the risk

You pay only for what you get

Learn on huge volumes of data

10,000 displays lead to 50 clicks, which lead to 1 sale
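The funnel above implies very low base rates, which is why so much data is needed to learn anything. A minimal sketch of the arithmetic, using only the three figures on the slide (the variable names are ours):

```python
# Funnel figures taken from the slide.
displays, clicks, sales = 10_000, 50, 1

ctr = clicks / displays                # click-through rate per display
sales_per_click = sales / clicks       # conversion rate per click
sales_per_display = sales / displays   # conversion rate per display

print(f"Clicks per display: {ctr:.2%}")
print(f"Sales per click:    {sales_per_click:.2%}")
print(f"Sales per display:  {sales_per_display:.4%}")
```

With one sale per 10,000 displays, a model that predicts sales sees roughly one positive example for every 10,000 rows.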

Physical infrastructure

7 in-house data centers on 3 continents

~15,000 servers, largest Hadoop cluster in Europe

More than 35 PB of storage (Big Data)

Traffic

800k HTTP requests / sec (peak activity)

29,000 impressions / sec (peak activity)

<10 ms to process an RTB request

<100 ms to process a reco request
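As a rough reading of these figures, Little's law (requests in flight = arrival rate × time in system) gives an upper bound on concurrency if every request used its full latency budget. Pairing the 800k requests/sec peak with the 10 ms RTB budget, and the 29,000 impressions/sec with the 100 ms reco budget, is our assumption; the slide does not say how traffic splits between request types.

```python
def in_flight(rate_per_sec: int, latency_ms: int) -> float:
    """Little's law: average number of requests being processed at once."""
    return rate_per_sec * latency_ms / 1000

# Assumed pairings of peak rate and latency budget (not stated on the slide).
rtb_in_flight = in_flight(800_000, 10)
reco_in_flight = in_flight(29_000, 100)

print(f"RTB requests in flight (upper bound):  {rtb_in_flight:,.0f}")
print(f"Reco requests in flight (upper bound): {reco_in_flight:,.0f}")
```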

(Big) Data Sources

Ad display data: 20B events / day

User behavior data: 2B events / day

Catalog data: 1M+ products / client

10k clients

How do we do it?

Recommend products for a user

• What we want: reco(user) = products

• But 1B users x 3B products!

• And we need to scale and keep it fresh

• What we can do:

• Pre-select products offline (sources)

• Refine the recommendation online
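To see why scoring every user against every product is out of reach, here is a back-of-the-envelope sketch. The user and product counts come from the slide; the scoring throughput is a purely hypothetical number chosen for illustration.

```python
USERS = 1_000_000_000        # 1B users (from the slide)
PRODUCTS = 3_000_000_000     # 3B products (from the slide)

pairs = USERS * PRODUCTS     # every (user, product) combination

# Hypothetical throughput: one million scored pairs per second per machine.
PAIRS_PER_SEC = 1_000_000
machine_years = pairs / PAIRS_PER_SEC / (3600 * 24 * 365)

print(f"{pairs:.1e} pairs, about {machine_years:,.0f} machine-years per full pass")
```

Pre-selecting a short candidate list per product offline, then scoring only those candidates online, replaces this full cross product with a few lookups and a small number of model evaluations per request.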

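Offline: prepare sources

[Diagram: advertiser events are processed into sources. Co-events (Item View – Item View, Item Sale – Item Sale) yield Similarities and Complementarities; Top N yields Best of and Best of by category. Sizes shown on the diagram: 350M keys / 12B values; 50B; 50M keys / 1B values.]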
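A minimal sketch of how an Item View – Item View source could be built from co-events. It uses raw co-occurrence counts within a user's viewing history, which is our simplification: the slides do not say how similarity is computed, and a production job would run as distributed batch work rather than in memory. The function name and the toy data are ours.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_view_top_n(histories, n=2):
    """Map each product to the n products most often viewed alongside it."""
    co_counts = defaultdict(Counter)
    for products in histories:
        # Count each unordered pair once per history, in both directions.
        for a, b in combinations(sorted(set(products)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return {
        item: [other for other, _ in
               sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for item, neighbors in co_counts.items()
    }

# Toy viewing histories, one list per user (made up for illustration).
histories = [
    ["shoes_orange", "shoes_red", "socks"],
    ["shoes_orange", "shoes_red"],
    ["shoes_orange", "laces"],
]
sources = co_view_top_n(histories)
print(sources["shoes_orange"])
```

The result is a key-value table (product to candidate list) that can be looked up at request time. The same counting applied to sale events would give an Item Sale – Item Sale table.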

Offline: prepare sources

User X saw orange shoes

Some candidate products for user X:

• Historical

• Similar

• Complementary

• Best-of (other users: most viewed products on the client website)

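Reco overview

[Diagram. Offline: advertiser events, the catalog, and display, click, and sale logs feed source computation (Map-Reduce jobs), which produces sources and prediction models. These are served by the Recommendation Service. Figures shown on the diagram: 12h, 4h, 6h; 4.5B, 500M, 50B; 100K qps.]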

ML model

• Logistic regression models because:

• They scale

• They are fast

• They can handle lots of features (with a bit of magic)

Features: product-specific, user-specific, user-product interactions, display-specific
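The slide does not say what the "bit of magic" is. One common way to let logistic regression handle very many categorical features is the hashing trick, so the sketch below assumes that technique; it is not a description of the authors' system. The bucket count, feature names, and values are all hypothetical.

```python
import math
import zlib

NUM_BUCKETS = 2 ** 20  # hypothetical size of the hashed feature space

def hashed_indices(features: dict) -> list:
    """Map each 'name=value' feature to a bucket index (hashing trick)."""
    return [zlib.crc32(f"{name}={value}".encode()) % NUM_BUCKETS
            for name, value in sorted(features.items())]

def predict(weights: dict, bias: float, features: dict) -> float:
    """Logistic regression over a sparse, hashed one-hot feature vector."""
    z = bias + sum(weights.get(i, 0.0) for i in hashed_indices(features))
    return 1.0 / (1.0 + math.exp(-z))

# One hypothetical feature per family named on the slide.
features = {
    "product_category": "shoes",      # product-specific
    "user_recent_views": "3",         # user-specific
    "user_saw_product": "yes",        # user-product interaction
    "banner_size": "300x250",         # display-specific
}
weights = {}  # sparse weights; in practice learned from display and click logs
p = predict(weights, bias=-5.0, features=features)
print(f"Predicted probability with untrained weights: {p:.4f}")
```

Scoring costs one hash and one lookup per feature, regardless of how many distinct feature values exist, which fits the "they scale, they are fast" argument above.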

Online: sources

Similarities Most viewed Most bought

Online: merge of products

Similarities Most viewed Most bought

Online: scoring

Similarities Most viewed Most bought

0.02 0.12 0.06 0.18 0.03 0.05 0.01 0.005 0.011 0.013 0.004 0.007

Online: scoring (sorted by score)

Similarities Most viewed Most bought

0.18 0.12 0.06 0.05 0.03 0.02 0.013 0.011 0.01 0.007 0.005 0.004

Online: candidates

0.18 0.12 0.06 0.05 0.03 0.02 0.013 0.011 0.01 0.007 0.005 0.004

[Banner mock-up: four product slots with SHOP buttons and a -50% badge]
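The online steps on the last few slides (fetch sources, merge, score, sort, keep the best) can be sketched as follows. The product ids, the toy scores, and the banner size of four are assumptions for illustration; in the real service the score would come from the prediction model.

```python
def recommend(sources, score, k=4):
    """Merge candidates from all sources, score each once, keep the top k."""
    candidates = set()
    for products in sources.values():
        candidates.update(products)      # merge and deduplicate
    # Sort by descending score; break ties by product id for determinism.
    ranked = sorted(candidates, key=lambda p: (-score(p), p))
    return ranked[:k]

# Hypothetical candidate lists returned by three sources for one user.
sources = {
    "similarities": ["p1", "p2", "p3"],
    "most_viewed": ["p3", "p4"],
    "most_bought": ["p5", "p1"],
}
# Stand-in for the model: a fixed score per product.
toy_scores = {"p1": 0.02, "p2": 0.12, "p3": 0.06, "p4": 0.18, "p5": 0.03}

banner = recommend(sources, toy_scores.get)
print(banner)
```

Deduplicating before scoring matters because the same product often appears in several sources, and each model evaluation counts against the latency budget.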

Evaluation

The basics: online A/B testing

• It is the only truth we have

• 50% of users on model A

• 50% of users on model B

• But it is onerous

• If the model is not good, we lose money, fast!

• Tests are long (~2 weeks needed to get good confidence intervals)

• Code has to be prod-ready (no bugs, good performance); we run 24/7

• Can be heavy on the infrastructure

• And it does not take long-term effects into account
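A minimal sketch of the two mechanical pieces of such a test: a stable 50/50 split of users, and a confidence interval on the difference between the two populations. Hash-based assignment and the normal-approximation interval are standard choices that we assume here; the slides do not describe the actual implementation, and all counts below are made up.

```python
import hashlib
import math

def assign(user_id: str, test_name: str) -> str:
    """Deterministically place each user in population A or B (50/50)."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

def rate_diff_ci(succ_a, n_a, succ_b, n_b, z=1.96):
    """Normal-approximation 95% interval for rate_b - rate_a."""
    p_a, p_b = succ_a / n_a, succ_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Same click rates (0.50% vs 0.53%) at two hypothetical sample sizes.
small = rate_diff_ci(50, 10_000, 53, 10_000)
large = rate_diff_ci(5_000, 1_000_000, 5_300, 1_000_000)
print("10k displays per side:", small)
print("1M displays per side: ", large)
```

At the smaller sample size the interval straddles zero, so the test cannot tell the models apart; only the larger one separates them. With rates this low, reaching tight intervals takes a lot of traffic, which is consistent with tests running for about two weeks.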

The test framework for prediction

• ALTERNATIVE: a framework that replays production logs (offline)

• 30,000 tests / year

• Replay ~100x

• BUT: we only have data on the products we display (exploration is costly)

• SO: we can only make sure we are not completely mistaken
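A minimal sketch of what replaying logs can look like: run a candidate model over logged displays and compare its predictions with what actually happened. The slide does not name the metric, so log loss is our assumption, and the log format and toy models are hypothetical.

```python
import math

def replay_log_loss(logged_displays, predict):
    """Average log loss of `predict` over logged (features, clicked) records."""
    total = 0.0
    for features, clicked in logged_displays:
        # Clamp predictions so log() never receives 0.
        p = min(max(predict(features), 1e-12), 1 - 1e-12)
        total -= math.log(p) if clicked else math.log(1 - p)
    return total / len(logged_displays)

# Toy log: only products that were actually displayed appear here.
logs = [({"position": 1}, True), ({"position": 2}, False), ({"position": 3}, False)]

def baseline(features):
    return 1 / 3  # predicts the average click rate of the log

def candidate(features):
    return 0.6 if features["position"] == 1 else 0.1

print("baseline: ", replay_log_loss(logs, baseline))
print("candidate:", replay_log_loss(logs, candidate))
```

This kind of replay says how well a model predicts outcomes for displays the production system chose to make. It says nothing about products that were never shown, which is the limitation the last two bullets point out.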

Ultimate solution: offline A/B testing

• Find the best offline predictor for online performance

• Counterfactual Reasoning and Learning Systems. Léon Bottou (Microsoft Research, Redmond, WA), Jonas Peters (Max Planck Institute, Tübingen), Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, Ed Snelson

• But we haven’t succeeded in making it precisely match reality... YET
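The basic tool in counterfactual evaluation is importance weighting of logged outcomes, often called inverse propensity scoring (IPS). The sketch below shows that estimator in its simplest form under stated assumptions: the production policy is randomized, and the probability of each logged action was recorded. It is an illustration of the general technique, not the authors' framework, and the log records are made up.

```python
def ips_estimate(logs, new_policy_prob):
    """Inverse propensity scoring estimate of a new policy's average reward.

    Each record is (context, action, reward, logging_prob), where
    logging_prob is the probability with which the production policy chose
    `action` in `context`. It must be > 0 for every logged record.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = new_policy_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)

# Toy logs from a production policy that shows each product half the time.
logs = [
    ("u1", "shoes", 1.0, 0.5),
    ("u1", "socks", 0.0, 0.5),
    ("u2", "shoes", 0.0, 0.5),
    ("u2", "socks", 1.0, 0.5),
]

def always_shoes(context, action):
    """Candidate policy: always display shoes."""
    return 1.0 if action == "shoes" else 0.0

estimate = ips_estimate(logs, always_shoes)
print(estimate)
```

The estimate is only as good as the exploration in the logs: actions the production policy rarely takes get large weights and high variance, which is one reason offline numbers can drift from online results.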

What’s next?

What’s next for us: upcoming challenges

• Long(er)-term user profiles

• More and better product information (images, semantics, NLP)

• Instant update of similarities

• (because batch computation is soooo last year)

• Joint product scoring

• (score the full banner rather than each product independently)

What’s next for you: fancy a try?

With us!

http://labs.criteo.com/jobs/

On your own:

• We published datasets for click prediction

• 4 GB display-click data: Kaggle challenge in 2014 http://bit.ly/1vgw2XC

• 1 TB display-click data (industry’s largest dataset): http://bit.ly/1PyH4Vq

• 4 billion observations

• 156 billion feature values

• available on Microsoft Azure

• used by edX (UC Berkeley)

• We would be happy to share reco-centric data!

Questions?

Thank you!

[email protected] @Rlerallut

[email protected]

@recsysfr