@schuilr 1

Boosting Ad Revenue Using Reinforcement Learning (Robin Schuil Technology Stream)


Case Study: Marktplaats.nl


Marktplaats.nl

•  Largest classifieds site in the Netherlands

•  One of the most visited websites in NL

•  Founded in 1999, acquired by eBay in 2004

•  Now headquarters to eBay Classifieds Group: 12 brands in 17 countries


Facts & Figures

•  1.3 million visitors / day
   –  desktop: 34%, mobile: 49%, tablet: 18%

•  9 million live listings
   –  350,000 new items / day

•  6 million unique search requests / day
   –  70 searches per second (average)

Data & Trends @ Marktplaats

Seasonal trends

Winter sports!

[Chart: demand ("Vraag", 0% to 8%) per week (1 to 51) for the search terms skibroek (ski pants), ski, skipak (ski suit), and snowboard.]

Seasonal trends

Camping!

[Chart: demand ("Vraag", 0% to 4.5%) per week (1 to 51) for the search terms caravans, campers, and vouwwagen (folding trailer).]

Seasonal trends

Saint Nicholas & Christmas!

[Chart: demand ("Vraag", 0% to 14%) per week (1 to 51) for the search terms sinterklaas (Saint Nicholas) and kerst (Christmas).]

Weather, temperature, etc.

Fly curtains!

[Chart: demand ("Vraag", 0% to 7%) for the search term vliegengordijn (fly curtain) plotted together with temperature ("Temperatuur", 0 to 25) per week (1 to 51).]

Weather, temperature, etc.

Heaters!

[Chart: demand ("Vraag", 0% to 4%) for the search term kachel (heater) plotted together with temperature ("Temperatuur", 0 to 25) per week (1 to 51). Annotation: "Reversed".]

Special events

Orange (“oranje”)!

[Chart: demand ("Vraag", 0% to 6%) per week (1 to 51) for the search term oranje (orange). Annotated peaks: King’s Day and World Cup Football.]

During a football game

[Chart: activity between 20:45 and 23:15 in 3-minute steps, comparing "Last Friday" with "This Friday". Annotations along the time axis: Kick-off, 1 - 0, 1 - 1, Break, 1 - 2, 1 - 3, 1 - 4, 1 - 5, End.]

“Juichpakken”

[Chart: demand ("Vraag", 0% to 25%) per week (1 to 51) for the search terms roy donders and juichpak.]

Exploiting trends


“Nieuw & populair”


•  “Nieuw & populair” = trending products

•  Pay-per-click advertising model

•  Advertisers bid for clicks, similar to Google AdWords

•  Metric to optimize: Revenue Per Mille (RPM) = CTR * bid * 1,000
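As a rough illustration of the RPM formula, the sketch below computes RPM for two keywords and picks the higher one. The keyword names, click counts, impression counts, and bids are made-up values for illustration only, not figures from the talk.

```python
def rpm(clicks: int, impressions: int, bid: float) -> float:
    """Revenue Per Mille: CTR * bid * 1,000."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    ctr = clicks / impressions  # click-through rate
    return ctr * bid * 1000


# Hypothetical data: keyword -> (clicks, impressions, bid per click)
keywords = {
    "caravan": (30, 1000, 0.10),
    "ski": (50, 1000, 0.05),
}

rpm_by_keyword = {name: rpm(*values) for name, values in keywords.items()}
best_keyword = max(rpm_by_keyword, key=rpm_by_keyword.get)
```

In this made-up example the keyword with the lower CTR still earns more per thousand impressions because its bid is higher, which is why the metric combines both factors.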

First (minimal) version

•  Find top 100 “trending” keywords using Spark
•  Randomly pick one of those keywords
•  Display top 4 results for the selected keyword


Can we do better?

•  CTR and bid vary per keyword. Random selection gives average performance.

•  Doesn’t consider the user’s personal preferences


PART I: GLOBAL OPTIMIZATION

One-armed bandit = slot machine

Problem: How to pick between slot machines so that you maximize profit?

Exploration – Exploitation

•  Explore (learn): Try out different candidates to learn how they perform over time

•  Exploit (earn): Take advantage of what you’ve learned to maximize payoff (your current best guess)

Many different approaches

•  Epsilon First
•  Epsilon Greedy
•  Upper Confidence Bound
•  Thompson Sampling
•  LinUCB

Epsilon First

[Diagram: timeline in two phases. "Random" phase (learn): collect data for each candidate (split testing, A/B testing). "Best" phase (earn): show the best performer.]

Epsilon First

•  Simple and intuitive
•  Lots of tools available (VWO, Optimizely, …)

•  Average reward until exploration is finished
•  What if the best candidate is no longer the best?
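A minimal sketch of the Epsilon First idea, assuming a simulated click source: explore at random for a fixed share of the rounds, then commit to the best observed candidate. The function name, the `explore_fraction` parameter, and the click-through rates are hypothetical and chosen only for illustration.

```python
import random


def epsilon_first(candidates, pull, total_rounds, explore_fraction=0.1, rng=None):
    """Explore randomly first, then always show the best observed candidate."""
    rng = rng or random.Random()
    shows = {c: 0 for c in candidates}
    reward_sum = {c: 0.0 for c in candidates}
    explore_rounds = int(total_rounds * explore_fraction)

    # Learn: show random candidates and record the observed rewards.
    for _ in range(explore_rounds):
        choice = rng.choice(candidates)
        shows[choice] += 1
        reward_sum[choice] += pull(choice)

    def observed_mean(c):
        return reward_sum[c] / shows[c] if shows[c] else 0.0

    best = max(candidates, key=observed_mean)

    # Earn: show only the best performer for the remaining rounds.
    for _ in range(total_rounds - explore_rounds):
        shows[best] += 1
        reward_sum[best] += pull(best)
    return best, shows


# Simulation with hypothetical click-through rates.
rng = random.Random(42)
true_ctr = {"A": 0.02, "B": 0.05, "C": 0.10}


def simulated_click(candidate):
    return 1.0 if rng.random() < true_ctr[candidate] else 0.0


best, shows = epsilon_first(
    list(true_ctr), simulated_click, total_rounds=20000, explore_fraction=0.2, rng=rng
)
```

Because the choice is frozen after the exploration phase, this sketch has the weakness noted above: if the true rates drift later, it never notices.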

Epsilon Greedy

[Diagram: timeline with continuous exploration. The best performer is shown 90% of the time and a random candidate 10% of the time.]

Epsilon Greedy

•  Very simple to implement and surprisingly effective
•  Can deal with nonstationary problems

•  How to determine the optimal value for ε?
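One possible way to write Epsilon Greedy is sketched below. The class name, the simulated click-through rates, and the choice of ε = 0.1 are assumptions for illustration. The sketch keeps a running mean per candidate, which suits a stationary simulation; for nonstationary data a constant step size could replace the `1 / n` update.

```python
import random


class EpsilonGreedy:
    """Show a random candidate with probability epsilon, else the current best."""

    def __init__(self, candidates, epsilon=0.1, rng=None):
        self.candidates = list(candidates)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.shows = {c: 0 for c in self.candidates}
        self.mean = {c: 0.0 for c in self.candidates}

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.candidates)  # explore
        return max(self.candidates, key=self.mean.get)  # exploit

    def update(self, candidate, reward):
        self.shows[candidate] += 1
        n = self.shows[candidate]
        # Incremental update of the running mean reward.
        self.mean[candidate] += (reward - self.mean[candidate]) / n


# Simulation with hypothetical click-through rates.
rng = random.Random(7)
true_ctr = {"A": 0.02, "B": 0.05, "C": 0.10}
bandit = EpsilonGreedy(true_ctr, epsilon=0.1, rng=rng)
for _ in range(20000):
    shown = bandit.select()
    clicked = 1.0 if rng.random() < true_ctr[shown] else 0.0
    bandit.update(shown, clicked)
```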

Upper Confidence Bound

Basic idea:

•  Calculate mean and a measure of uncertainty (variance) for each candidate
•  Pick current best performer based on mean + uncertainty bonus

Measuring uncertainty

Observed mean: 0.50

95% certain that true mean ≤ 0.76

Uncertainty bonus: 0.26


More data = less uncertainty

95% certain that true mean ≤ 0.63

Uncertainty bonus: 0.13
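The slides do not say how the 95% bound was computed. As one plausible reading, the sketch below uses a one-sided normal approximation for a proportion. The sample sizes of 10 and 40 are assumptions: they are not given in the slides, and are chosen here only because they give bonuses close to the values shown.

```python
from math import sqrt
from statistics import NormalDist


def uncertainty_bonus(mean, n, confidence=0.95):
    """One-sided normal-approximation bonus for an observed proportion."""
    z = NormalDist().inv_cdf(confidence)  # about 1.645 for 95%
    return z * sqrt(mean * (1 - mean) / n)


# Hypothetical sample sizes (not stated in the slides).
bonus_small_sample = uncertainty_bonus(0.50, n=10)
bonus_large_sample = uncertainty_bonus(0.50, n=40)

upper_small_sample = 0.50 + bonus_small_sample
upper_large_sample = 0.50 + bonus_large_sample
```

Under this approximation the bonus shrinks with the square root of the sample size, so four times as much data halves the bonus.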

Upper Confidence Bound

Mean + uncertainty bonus

[Chart: estimated reward for candidates A, B, and C, each shown as a mean plus an uncertainty bonus.]

Pick “A”!

Upper Confidence Bound

•  Selecting “A” reduces uncertainty
•  Candidate “C” now has the highest score

[Chart: estimated reward for candidates A, B, and C after “A” has been selected.]

Pick “C”!

Upper Confidence Bound

•  Uses variance measure to automatically balance exploration with exploitation

•  Deterministic; requires online learning (not suited for small-batch mode)
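To make "mean + uncertainty bonus" concrete, here is a sketch using the well-known UCB1 rule, where the bonus is `sqrt(2 * ln(total) / n)`. This is a swapped-in textbook variant, not necessarily the variance-based bonus used in the talk, and the counts below are hypothetical.

```python
import math


def ucb1_select(shows, reward_sum):
    """Return the candidate with the highest mean + UCB1 uncertainty bonus."""
    # Show every candidate once before comparing scores.
    for candidate, n in shows.items():
        if n == 0:
            return candidate

    total = sum(shows.values())

    def score(candidate):
        mean = reward_sum[candidate] / shows[candidate]
        bonus = math.sqrt(2 * math.log(total) / shows[candidate])
        return mean + bonus

    return max(shows, key=score)


# Hypothetical counts: "C" has a low mean but has barely been explored.
shows = {"A": 100, "B": 100, "C": 10}
reward_sum = {"A": 6.0, "B": 5.0, "C": 0.4}
choice = ucb1_select(shows, reward_sum)
```

The rule is deterministic given the counts, which matches the point above: the same candidate keeps being chosen until new feedback arrives, so it needs updates after each selection.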


Thompson Sampling

Basic idea:

•  The number of pulls for a given lever should match its actual probability of being the optimal lever

•  Sample from the posterior for the mean of each lever:

p(λ|X) = Gamma(conv + prior_conv, impr + prior_impr)
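A small sketch of the sampling step, assuming the two Gamma parameters are shape (`conv + prior_conv`) and rate (`impr + prior_impr`). The prior values of 1.0 are hypothetical. Note that Python's `random.gammavariate` takes shape and scale, so the rate is inverted.

```python
import random


def thompson_select(stats, prior_conv=1.0, prior_impr=1.0, rng=random):
    """Draw one sample per candidate from its Gamma posterior; pick the largest.

    stats maps candidate -> (conversions, impressions).
    """
    samples = {}
    for candidate, (conv, impr) in stats.items():
        shape = conv + prior_conv
        rate = impr + prior_impr
        # gammavariate(alpha, beta) uses beta as a scale, so scale = 1 / rate.
        samples[candidate] = rng.gammavariate(shape, 1.0 / rate)
    return max(samples, key=samples.get)


# Conversion and impression counts taken from the "Lots of conversions" table.
lots = {"A": (5621, 144132), "B": (256, 7761), "C": (101, 3593)}
rng = random.Random(0)
picks = [thompson_select(lots, rng=rng) for _ in range(2000)]
share_a = picks.count("A") / len(picks)
```

Each call is a weighted random choice: candidates with wide posteriors still get picked occasionally, which is what lets the method work when feedback arrives in batches.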

Few conversions

Candidate    Conversions    Impressions    Chance of being winner
A (3.9%)     11             282            42%
B (3.3%)     2              61             39%
C (2.8%)     4              143            19%

More conversions

Candidate    Conversions    Impressions    Chance of being winner
A (3.9%)     93             2,382          82%
B (3.3%)     66             2,011          13%
C (2.8%)     31             1,093          5%

Many conversions

Candidate    Conversions    Impressions    Chance of being winner
A (3.9%)     892            22,882         97%
B (3.3%)     174            5,261          2%
C (2.8%)     66             2,343          1%

Lots of conversions

Candidate    Conversions    Impressions    Chance of being winner
A (3.9%)     5,621          144,132        > 99%
B (3.3%)     256            7,761          < 1%
C (2.8%)     101            3,593          < 1%
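The "chance of being winner" column can be approximated by simulation. The sketch below draws repeatedly from a Gamma posterior per candidate and counts how often each one comes out on top. The prior values, the number of draws, and the seed are assumptions, so the estimates will not necessarily match the percentages in the tables.

```python
import random


def chance_of_winning(stats, draws=20000, prior_conv=1.0, prior_impr=1.0, seed=0):
    """Monte Carlo estimate of each candidate's probability of being best.

    stats maps candidate -> (conversions, impressions).
    """
    rng = random.Random(seed)
    wins = {candidate: 0 for candidate in stats}
    for _ in range(draws):
        sample = {
            # gammavariate takes shape and scale, so scale = 1 / rate.
            candidate: rng.gammavariate(conv + prior_conv, 1.0 / (impr + prior_impr))
            for candidate, (conv, impr) in stats.items()
        }
        wins[max(sample, key=sample.get)] += 1
    return {candidate: count / draws for candidate, count in wins.items()}


# Counts from the "Few conversions" and "Lots of conversions" tables.
few = {"A": (11, 282), "B": (2, 61), "C": (4, 143)}
lots = {"A": (5621, 144132), "B": (256, 7761), "C": (101, 3593)}

p_few = chance_of_winning(few)
p_lots = chance_of_winning(lots)
```

With little data the posteriors overlap and the probability is spread across candidates; with lots of data it concentrates on candidate A, mirroring the progression in the tables.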


Thompson Sampling

•  Weighted random sampling
•  Works well in small-batch mode

•  Doesn’t consider context (e.g. user’s personal preferences)

PART II: PERSONALIZATION

LinUCB

Basic idea:

•  Define a “context” of information about the user
•  Fit a per-candidate logistic regression model
•  Apply the concept of Upper Confidence Bound (UCB)
   –  mean + uncertainty bonus

Context

•  Gender
•  Recently viewed categories
•  Current date
•  Weather forecast
•  …

Principal Component Analysis (PCA) to reduce sparseness and computational complexity

LinUCB

Mean + uncertainty bonus:

μ_α(t) + σ_α(t)
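For illustration, here is a dependency-free sketch of the commonly published "disjoint" LinUCB variant, which fits a ridge (linear) regression per candidate rather than the logistic model mentioned in the talk. The class name, the `alpha` parameter, and the two-feature contexts are hypothetical. The inverse matrix is kept up to date with the Sherman-Morrison formula.

```python
import math


class LinUCBArm:
    """Per-candidate ridge regression with an upper-confidence score."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # weight of the uncertainty bonus
        # Inverse of A = I + sum(x x^T), starting from the identity matrix.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # sum of reward * x

    def _a_inv_times(self, vector):
        return [sum(r * v for r, v in zip(row, vector)) for row in self.a_inv]

    def score(self, x):
        theta = self._a_inv_times(self.b)  # estimated coefficients
        mean = sum(t * xi for t, xi in zip(theta, x))
        variance = sum(xi * v for xi, v in zip(x, self._a_inv_times(x)))
        return mean + self.alpha * math.sqrt(max(variance, 0.0))

    def update(self, x, reward):
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        # Sherman-Morrison rank-one update of the inverse.
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def linucb_select(arms, x):
    """Pick the candidate with the highest mean + uncertainty bonus for context x."""
    return max(arms, key=lambda name: arms[name].score(x))


# Hypothetical two-feature context: [is_winter, is_summer].
winter, summer = [1.0, 0.0], [0.0, 1.0]
arms = {"ski": LinUCBArm(2), "camping": LinUCBArm(2)}
for _ in range(50):
    arms["ski"].update(winter, 1.0)
    arms["ski"].update(summer, 0.0)
    arms["camping"].update(winter, 0.0)
    arms["camping"].update(summer, 1.0)
```

After these simulated updates, `linucb_select(arms, winter)` favours "ski" and `linucb_select(arms, summer)` favours "camping", so the same set of candidates yields different picks for different contexts.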


Pruning

•  Periodically remove weakest performers
•  Replace with new, unexplored “trending keywords”
•  Rinse and repeat
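The pruning step could look roughly like the sketch below. The function name, the `drop_count` and `min_shows` parameters, and the statistics are hypothetical. Keywords that have not been shown often enough are protected from pruning, on the assumption that their estimates are still too noisy to judge.

```python
def prune_and_refill(stats, fresh_keywords, drop_count=1, min_shows=100):
    """Drop the weakest explored keywords and refill the pool with new ones.

    stats maps keyword -> (reward_sum, shows).
    """

    def mean_reward(keyword):
        reward_sum, shows = stats[keyword]
        return reward_sum / shows if shows else 0.0

    # Only keywords with enough impressions are eligible for removal.
    explored = [k for k, (_, shows) in stats.items() if shows >= min_shows]
    explored.sort(key=mean_reward)  # weakest first
    dropped = set(explored[:drop_count])

    pool = {k: v for k, v in stats.items() if k not in dropped}
    for keyword in fresh_keywords:
        if len(pool) >= len(stats):
            break  # keep the pool at its original size
        if keyword not in stats:
            pool[keyword] = (0.0, 0)  # unexplored: no rewards, no shows
    return pool


# Hypothetical pool statistics and new trending keywords.
stats = {
    "ski": (30.0, 1000),
    "kachel": (5.0, 1000),
    "caravan": (20.0, 1000),
    "oranje": (0.0, 10),
}
new_pool = prune_and_refill(stats, ["juichpak", "ski", "vouwwagen"])
```

Running this periodically keeps the candidate pool at a fixed size while giving new trending keywords a chance to be explored.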

Results

[Chart: Random vs. Optimized. Optimized: × 2.8!]

Endless possibilities

•  News homepage
•  Online advertising
•  Deciding which thumbnail to show on the SERP
•  Etc, etc ...

Reading List

•  “Bandit Algorithms for Website Optimization”: http://bit.ly/bandits-book
•  “Reinforcement Learning”: http://bit.ly/rl-book

@SCHUILR
LINKEDIN.COM/IN/ROBINSCHUIL

Thank you

References

•  https://en.wikipedia.org/wiki/Multi-armed_bandit
•  http://shop.oreilly.com/product/0636920027393.do
•  https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
•  http://www.slideshare.net/chucheng/efficient-approximate-thompson-sampling-for-search-query-recommendation
•  http://www.slideshare.net/iliasfl/multiarmed-bandits-intro-examples-and-tricks
•  http://www.slideshare.net/mgershoff/conductrics-bandit-basicsemetrics1016
•  http://www.slideshare.net/MarkusOjala1/multi-armed-bandits-and-optimized-online-marketing-54679491