Bandit Basics – A Different Take on Online Optimization


Page 1: Bandit Basics – A Different Take on Online Optimization

Page 2: Who Is This Guy?

Matt Gershoff

CEO: Conductrics

Many Years in Database Marketing (New York and Paris) and a bit of Web Analytics

www.conductrics.com

twitter: @mgershoff  Email: [email protected]

Page 3: Speak Up

Page 4: What Are We Going to Hear?

• Optimization Basics

• Multi-Armed Bandit

– It's a Problem, Not a Method

• Some Methods

– A/B Testing

– Epsilon-Greedy

– Upper Confidence Bound (UCB)

• Some Results

Page 5: Choices · Targeting · Learning · Optimization

Page 6: OPTIMIZATION

If THIS Then THAT (ITTT) brings together:

1. Decision Rules

2. Predictive Analytics

3. Choice Optimization

Page 7: OPTIMIZATION

Find and Apply the Rule with the most Value.

(Slide visual: a field of many candidate "If THIS Then THAT" rules.)

Page 8: OPTIMIZATION

(Diagram: If THIS Then THAT as inputs and outputs.)

THIS – Inputs, variables whose values are given to you: Facebook, High Spend, Urban GEO, …, Home Page, App Use.

THAT – Outputs, variables whose values you control: Offer A, Offer B, Offer C, …, Offer Y, Offer Z.

A predictive model maps the input features F1, F2, …, Fm through a weighted sum (Σ) to an estimated Value_i for each choice. (A toy code sketch follows below.)
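To make the picture concrete, here is a toy sketch of that scoring step. All feature names and weights are hypothetical, invented for illustration; this is not Conductrics code:

```python
# THIS: variables whose values are given to you (one visitor's features).
visitor = {"facebook": 1, "high_spend": 0, "urban_geo": 1, "app_use": 1}

# THAT: variables you control, each with per-feature weights (F1..Fm).
weights = {
    "Offer A": {"facebook": 0.4, "high_spend": 0.1, "urban_geo": 0.2, "app_use": 0.0},
    "Offer B": {"facebook": 0.1, "high_spend": 0.5, "urban_geo": 0.0, "app_use": 0.3},
    "Offer C": {"facebook": 0.2, "high_spend": 0.2, "urban_geo": 0.3, "app_use": 0.2},
}

def value(offer):
    """Weighted sum of features: the model's estimated value of this rule."""
    return sum(w * visitor.get(f, 0) for f, w in weights[offer].items())

# Find and apply the rule with the most value.
best = max(weights, key=value)
print(best, round(value(best), 2))
```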

Page 9: But …

For the THAT side – Offer A?, Offer B?, Offer C?, …, Offer Y?, Offer Z? – the values are unknown:

1. We Don't Have Data on 'THAT'

2. Need to Collect – Sample

3. How to Sample Efficiently?

Page 10: Where

Marketing Applications:

• Websites

• Mobile

• Social Media Campaigns

• Banner Ads

Pharma: Clinical Trials


Page 11: What Is a Multi-Armed Bandit?

A or B?

One-Armed Bandit → Slot Machine

The problem: how do you pick between slot machines so that you walk out of the casino with the most $$$ at the end of the night?

Page 12: Objective

Pick so as to get the most return/profit you can over time.

Technical term: Minimize Regret.
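In symbols (a standard textbook definition, not spelled out on the slide): if $\mu^{*}$ is the mean payoff of the best arm and $r_t$ is the reward earned at play $t$, then after $T$ plays the regret is

$$ R_T = T\,\mu^{*} - \mathbb{E}\left[\sum_{t=1}^{T} r_t\right] $$

and a good bandit policy keeps $R_T$ growing as slowly as possible.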


Page 13: … but How to Pick?

A or B?

Sequential Selection: you need to sample, but do it efficiently.

Page 14: Explore – Collect Data

• Data Collection is costly – an investment.

• Be efficient – balance the potential value of collecting new data with exploiting what you currently know.

Page 15: Multi-Armed Bandits

"Bandit problems embody in essential form a conflict evident in all human action: choosing actions which yield immediate reward vs. choosing actions … whose benefit will come only later."* – Peter Whittle

*Source: Qing Zhao, UC Davis. Plenary talk at SPAWC, June 2010.

Page 16: Exploration vs. Exploitation

1) Explore/Learn – try out different actions to learn how they perform over time. This is a data collection task.

2) Exploit/Earn – take advantage of what you have learned to get the highest payoff: your current best guess.

Page 17: Not a New Problem

1933 – first work on competing options

1940 – a WWII problem the Allies attempted to tackle

1953 – Bellman formulates it as a Dynamic Programming problem

Source: http://www.lancs.ac.uk/~winterh/GRhist.html

Page 18: Testing

• Explore First

– All actions have an equal chance of selection (uniform random).

– Use hypothesis testing to select a 'Winner'.

• Then Exploit – keep only the 'Winner' for selection (a sketch of this explore-then-exploit pattern follows below).
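A minimal simulation of that pattern, with two options whose true conversion rates are made up for illustration (my sketch, not Conductrics code):

```python
import math
import random

# Assumed true conversion rates, for simulation only.
TRUE_RATE = {"A": 0.10, "B": 0.12}

def pull(arm):
    return 1 if random.random() < TRUE_RATE[arm] else 0

# Explore first: every action has an equal chance of selection.
wins = {"A": 0, "B": 0}
n = {"A": 0, "B": 0}
for _ in range(10_000):
    arm = random.choice(["A", "B"])
    n[arm] += 1
    wins[arm] += pull(arm)

# Hypothesis test: two-proportion z statistic on the explore data.
p_a, p_b = wins["A"] / n["A"], wins["B"] / n["B"]
pooled = (wins["A"] + wins["B"]) / (n["A"] + n["B"])
se = math.sqrt(pooled * (1 - pooled) * (1 / n["A"] + 1 / n["B"]))
z = (p_b - p_a) / se

# Then exploit: keep only the 'Winner' (95% two-sided threshold).
winner = ("B" if z > 0 else "A") if abs(z) > 1.96 else None
print(f"p_A={p_a:.3f} p_B={p_b:.3f} z={z:.2f} winner={winner}")
```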


Page 19: Learn First

(Timeline: an Explore/Learn phase – data collection/sample – followed by an Exploit/Earn phase – apply learning.)

Page 20: P-Values: A Digression

P-Values are:

• NOT the probability that the Null is true, P(Null = True | DATA).

• P(DATA (or more extreme) | Null = True).

• Not a great tool for deciding when to stop sampling – peeking at the p-value as data come in inflates the false-positive rate (see the simulation below). See:

http://andrewgelman.com/2010/09/noooooooooooooo_1/

http://www.stat.duke.edu/~berger/papers/02-01.pdf
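A small simulation of the stopping problem (my own illustration, not from the deck): both arms share the same true rate, yet checking a z-test after every batch and stopping at |z| > 1.96 flags a "winner" far more often than the nominal 5%.

```python
import math
import random

def peeking_trial(rate=0.10, batches=100, batch_size=100):
    s = [0, 0]  # conversions per arm
    n = [0, 0]  # visitors per arm
    for _ in range(batches):
        for i in (0, 1):  # both arms have the same true rate: null is true
            n[i] += batch_size
            s[i] += sum(random.random() < rate for _ in range(batch_size))
        pooled = (s[0] + s[1]) / (n[0] + n[1])
        se = math.sqrt(pooled * (1 - pooled) * (1 / n[0] + 1 / n[1]))
        if se > 0 and abs(s[0] / n[0] - s[1] / n[1]) / se > 1.96:
            return True  # we would have stopped on a false positive
    return False

false_pos = sum(peeking_trial() for _ in range(200)) / 200
print(f"Null runs stopped as 'significant': {false_pos:.0%}")
```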


Page 21: A Couple of Other Methods

1. Epsilon-Greedy – nice and simple

2. Upper Confidence Bound (UCB) – adapts to uncertainty

Page 22: 1) Epsilon-Greedy

Page 23: Greedy

What do you mean by 'Greedy'? Make whatever choice seems best at the moment.

Page 24: Epsilon-Greedy

What do you mean by 'Epsilon-Greedy'?

• Explore – randomly select an action ε percent of the time (say 20%).

• Exploit – play greedy (pick the current best) 1 − ε of the time (say 80%).

Page 25: Epsilon-Greedy

(Flow per user: Explore/Learn 20% of the time – select randomly, like A/B testing; Exploit/Earn 80% of the time – select the current best, i.e. be greedy.)

Page 26: Epsilon-Greedy

Action | Value
------ | -----
A      | $5.00
B      | $4.00
C      | $3.00
D      | $2.00
E      | $1.00

80% of the time: select the best (A). 20% of the time: select at random. (A code sketch follows below.)
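A minimal epsilon-greedy sketch in Python (illustrative only; the value estimates are seeded from the table above, whereas in practice they start empty and are learned from observed rewards):

```python
import random

EPSILON = 0.20  # fraction of traffic used to explore

# Estimated value per action, seeded from the slide's table.
est_value = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0}
n_plays = {a: 1 for a in est_value}  # pretend one observation each

def select_action():
    if random.random() < EPSILON:               # Explore (20%): pick at random
        return random.choice(list(est_value))
    return max(est_value, key=est_value.get)    # Exploit (80%): pick current best

def update(action, reward):
    # Incremental mean: new = old + (reward - old) / n
    n_plays[action] += 1
    est_value[action] += (reward - est_value[action]) / n_plays[action]
```

Each visit calls select_action(), serves the chosen offer, and then feeds the observed payoff back through update().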


Page 27: Continuous Sampling

(Timeline: Explore/Learn and Exploit/Earn run side by side over the whole time, rather than in separate phases.)

Page 28: Epsilon-Greedy

– Super simple / low cost to implement

– Tends to be surprisingly effective

– Less affected by 'Seasonality'

– Not optimal (hard to pick the best ε)

– Doesn't use a measure of variance

– Should you decrease exploration over time, and how?

Page 29: Upper Confidence Bound

Basic idea:

1) Calculate both a mean and a measure of uncertainty (variance) for each action.

2) Make greedy selections based on mean + uncertainty bonus.
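As a formula (one common form, consistent with the slides' confidence-interval framing; $\bar{x}_a$ is the action's mean reward, $s_a$ its standard deviation, $n_a$ its play count, and $z$ the chosen interval width):

$$ \mathrm{score}_a = \bar{x}_a + z \cdot \frac{s_a}{\sqrt{n_a}} $$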


Page 30: Confidence Interval Review

Confidence Interval = Mean ± z·Std

(Figure: a bell curve spanning Mean − 2·Std to Mean + 2·Std.)


Page 31: Upper Confidence = Mean + Bonus

Score each option using the upper portion of the interval as a bonus (sketched in code below).
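A minimal UCB sketch for 0/1 conversion rewards (my illustration; it uses mean plus z times the standard error as the bonus, matching the slides' framing, rather than the UCB1 log formula):

```python
import math

Z = 2.0  # interval width: roughly a 95% upper bound

class Arm:
    def __init__(self):
        self.n = 0        # times this action was served
        self.mean = 0.0   # running mean reward

    def update(self, reward):
        self.n += 1
        self.mean += (reward - self.mean) / self.n

    def score(self):
        if self.n == 0:
            return float("inf")  # force every action to be tried once
        # Standard error of a Bernoulli mean: sqrt(p(1-p)/n)
        se = math.sqrt(self.mean * (1 - self.mean) / self.n)
        return self.mean + Z * se  # mean + uncertainty bonus

arms = {"A": Arm(), "B": Arm(), "C": Arm()}

def select_action():
    # Greedy on the upper bound: exploration happens automatically,
    # because rarely served arms carry a bigger bonus.
    return max(arms, key=lambda name: arms[name].score())
```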


Page 32: Upper Confidence Bound

(Chart: estimated reward for actions A, B, and C on a $0–$10 scale, each with its confidence interval.)

1) Use the upper portion of each CI as a 'Bonus'.

2) Make greedy selections → Select A.

Page 33: Upper Confidence Bound

(Chart: the same three actions after more data on A.)

1) Selecting action 'A' reduces its uncertainty bonus (because there is more data).

2) Action 'C' now has the highest score → Select C.

Page 34: Upper Confidence Bound

• Like an A/B Test – uses a variance measure

• Unlike an A/B Test – no hypothesis test

• Automatically balances Exploration with Exploitation

Page 35: Case Study

Treatment | Conversion Rate | Served
V2V3      | 9.9%            | 14,893
V2V2      | 9.7%            |  9,720
V2V1      | 8.0%            |  2,441
V1V3      | 3.3%            |  2,090
V2V3      | 2.6%            |  1,849
V2V2      | 2.0%            |  1,817
V1V1      | 1.8%            |  1,926
V3V1      | 1.8%            |  1,821
V1V2      | 1.5%            |  1,873

Page 36: Case Study

Test Method  | Conversion Rate
Adaptive     | 7%
Non-Adaptive | 4.5%

Page 37: A/B Testing vs. Bandit

(Diagram: traffic being allocated across Option A, Option B, and Option C.)

Page 38: Why Should I Care?

• More Efficient Learning

• Automation

• Changing World

Page 39: Questions?

Page 40: Thank You!

Matt Gershoff

p) 646-384-5151

e) [email protected]

t) @mgershoff
