Tutorial 11 (computational advertising)


Part of the Search Engine course given at the Technion (2011)


Computational advertising

Kira Radinsky

Slides based on material from the paper "Bandits for Taxonomies: A Model-based Approach" by Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, and Vanja Josifovski, SDM 2007

The Content Match Problem

[Diagram: advertisers submit ads to an ads DB; the ad server places ads on content pages]

Ad impression: showing an ad to a user

The Content Match Problem

[Diagram: same setup, with the user clicking a served ad]

Ad click: a user click leads to revenue for the ad server and the content provider

The Content Match Problem


The Content Match Problem: match ads to pages to maximize clicks

The Content Match Problem


Maximizing the number of clicks means: for each webpage, find the ad with the best click-through rate (CTR), but without wasting too many impressions on learning this.

Outline

• Problem

• Background: Multi-armed bandits

• Proposed Multi-level Policy

• Experiments

• Conclusions

Background: Bandits

Bandit "arms" with unknown payoff probabilities p1, p2, p3

Pull arms sequentially so as to maximize the total expected reward:

• Estimate the payoff probabilities p_i

• Bias the estimation process towards better arms

Background: Bandit Solutions

• Try 1: Greedy solution:

  – Compute the sample mean of an arm by dividing the total reward received from the arm by the number of times the arm has been pulled. At each time step, choose the arm with the highest sample mean.

• Try 2: Naïve solution:

  – Pull each arm an equal number of times.

• Epsilon-greedy strategy:

  – The best arm is selected for a proportion 1 − ε of the trials, and another arm is selected uniformly at random for a proportion ε (see the sketch below).

• Many more strategies exist
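To make the epsilon-greedy strategy concrete, here is a minimal simulation sketch in Python; the arm CTRs, the pull budget, and ε are made-up illustration values, not numbers from the paper:

```python
import random

# Toy epsilon-greedy bandit simulation (illustrative values only).
def epsilon_greedy(true_ctrs, n_pulls, epsilon=0.1):
    n_arms = len(true_ctrs)
    pulls = [0] * n_arms    # times each arm has been pulled
    clicks = [0] * n_arms   # total reward (clicks) per arm
    for _ in range(n_pulls):
        if random.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = random.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest sample mean.
            means = [clicks[i] / pulls[i] if pulls[i] else 0.0
                     for i in range(n_arms)]
            arm = max(range(n_arms), key=lambda i: means[i])
        reward = 1 if random.random() < true_ctrs[arm] else 0  # simulated click
        pulls[arm] += 1
        clicks[arm] += reward
    return pulls, clicks

pulls, clicks = epsilon_greedy([0.02, 0.05, 0.01], n_pulls=10_000)
print(pulls, clicks)  # most pulls should go to the 0.05 arm
```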

Background: Bandits

Bandit policy:

1. Assign a priority to each arm

2. "Pull" the arm with maximum priority, and observe the reward

3. Update the priorities

Steps 1–2 are the allocation component of the policy; step 3 is the estimation component.
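One well-known instantiation of this allocate/estimate loop is UCB1 (Auer et al., 2002), where an arm's priority is its sample mean plus an uncertainty bonus. The sketch below is a generic illustration of the loop, not the specific policy proposed in the paper:

```python
import math

def pull_once(pulls, rewards, observe):
    """One round of a priority-based bandit policy (UCB1-style)."""
    n_total = sum(pulls)
    # 1. Assign a priority to each arm (untried arms first).
    priorities = []
    for i in range(len(pulls)):
        if pulls[i] == 0:
            priorities.append(float("inf"))
        else:
            mean = rewards[i] / pulls[i]
            bonus = math.sqrt(2.0 * math.log(n_total) / pulls[i])  # shrinks as the arm is pulled more
            priorities.append(mean + bonus)
    # 2. Pull the arm with maximum priority and observe the reward.
    arm = max(range(len(pulls)), key=lambda i: priorities[i])
    reward = observe(arm)  # e.g., 1 for a click, 0 otherwise
    # 3. Update the statistics that determine the priorities.
    pulls[arm] += 1
    rewards[arm] += reward
    return arm, reward
```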

Background: Bandits

Why not simply apply a bandit policy directly to the problem?

• Convergence is too slow:
  ~10^9 instances of the MAB problem (bandits), with ~10^6 arms per instance (bandit)

• Additional structure is available that can help: taxonomies

Outline

• Problem

• Background: Multi-armed bandits

• Proposed Multi-level Policy

• Experiments

• Conclusions

Multi-level Policy

[Diagram: a taxonomy over webpages and a taxonomy over ads, each grouped into classes]

Consider only two levels

Multi-level Policy

Consider only two levels

[Diagram: ad child classes grouped under the ad parent classes Apparel, Computers, and Travel; a block of cells is one MAB problem instance (bandit)]

Multi-level Policy

Key idea: CTRs in a block are homogeneous

[Diagram: same block structure; each block is one MAB problem instance (bandit)]

Multi-level Policy

• CTRs in a block are homogeneous

– Used in allocation (picking ad for each new page)

– Used in estimation (updating priorities after each observation)

Multi-level Policy

• CTRs in a block are homogeneous

– Used in allocation (picking ad for each new page)

– Used in estimation (updating priorities after each observation)

Multi-level Policy (Allocation)

[Diagram: a page classifier maps the incoming webpage into the class matrix]

• Classify the webpage → page class, parent page class

• Run a bandit over the ad parent classes → pick one ad parent class

Multi-level Policy (Allocation)

• Classify the webpage → page class, parent page class

• Run a bandit over the ad parent classes → pick one ad parent class

• Run a bandit among the cells of the chosen block → pick one ad class

• In general, continue from root to leaf → final ad

Multi-level Policy (Allocation)

Bandits at higher levels:

• use aggregated information

• have fewer bandit arms

⇒ They quickly figure out the best ad parent class (a sketch of the full top-down allocation follows).
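A compact sketch of this top-down allocation, assuming two levels and a toy epsilon-greedy chooser at each level; the class names, the `Bandit` helper, and the overall wiring are illustrative assumptions, not the paper's implementation:

```python
import random

class Bandit:
    """Toy epsilon-greedy arm chooser used at each level of the hierarchy."""
    def __init__(self, arms, epsilon=0.1):
        self.arms = list(arms)
        self.pulls = {a: 0 for a in self.arms}
        self.clicks = {a: 0 for a in self.arms}
        self.epsilon = epsilon

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.arms)  # explore
        return max(self.arms, key=lambda a: self.clicks[a] / self.pulls[a]
                   if self.pulls[a] else 0.0)  # exploit best sample mean

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.clicks[arm] += reward

def allocate_ad(page_class, parent_bandit, block_bandits):
    # Level 1: bandit over the (few) ad parent classes, using aggregated stats.
    ad_parent = parent_bandit.select()
    # Level 2: bandit over the cells of the chosen block.
    ad_class = block_bandits[(page_class, ad_parent)].select()
    return ad_parent, ad_class  # in general, continue from root to leaf

# Hypothetical two-level taxonomy and page class, for illustration only.
parents = ["Apparel", "Computers", "Travel"]
children = {p: [f"{p}/{i}" for i in range(3)] for p in parents}
parent_bandit = Bandit(parents)
block_bandits = {("Sports", p): Bandit(children[p]) for p in parents}
print(allocate_ad("Sports", parent_bandit, block_bandits))
```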

Multi-level Policy

• CTRs in a block are homogeneous

– Used in allocation (picking ad for each new page)

– Used in estimation (updating priorities after each observation)

Multi-level Policy (Estimation)

• CTRs in a block are homogeneous

– Observations from one cell also give information about others in the block

– How can we model this dependence?

Multi-level Policy (Estimation)

• Shrinkage Model

S_cell | CTR_cell ~ Bin(N_cell, CTR_cell)

CTR_cell ~ Beta(Params_block)

where S_cell is the number of clicks in the cell and N_cell is the number of impressions in the cell. All cells in a block come from the same distribution.

Multi-level Policy (Estimation)

• Intuitively, this leads to shrinkage of cell CTRs towards block CTRs:

E[CTR_cell] = α · Prior_block + (1 − α) · S_cell / N_cell

i.e., the estimated CTR is a weighted combination of the Beta prior (the "block CTR") and the observed CTR.
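A worked sketch of this shrinkage estimate, assuming a Beta(a, b) prior has already been fitted for the block; all numbers are made up for illustration:

```python
def shrunk_ctr(s_cell, n_cell, a_block, b_block):
    # Posterior mean of CTR_cell under a Beta(a_block, b_block) prior after
    # observing s_cell clicks in n_cell impressions. Equivalently:
    #   E[CTR] = alpha * prior_mean + (1 - alpha) * s_cell / n_cell,
    # with alpha = (a_block + b_block) / (a_block + b_block + n_cell).
    return (a_block + s_cell) / (a_block + b_block + n_cell)

# Block prior Beta(2, 98) has mean 0.02 (made-up numbers).
print(shrunk_ctr(s_cell=1, n_cell=10, a_block=2, b_block=98))
# ~0.027: with few impressions, the estimate stays near the block prior.
print(shrunk_ctr(s_cell=100, n_cell=1000, a_block=2, b_block=98))
# ~0.093: with enough data, the observed CTR (0.1) dominates.
```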

Outline

• Problem

• Background: Multi-armed bandits

• Proposed Multi-level Policy

• Experiments

• Conclusions

Experiments [Pandey et al., 2007]

Taxonomy structure:

• Depth 0: root

• Depth 1: 20 nodes

• Depth 2: 221 nodes

• Depth 7: ~7000 leaves

The multi-level policy uses two of these levels: depths 1 and 2.

Experiments

• Data collected over a 1-day period

• Collected from only one server, under some other ad-matching rules (not our bandit)

• ~229M impressions

• CTR values have been linearly transformed for purposes of confidentiality

Experiments (Multi-level Policy)

[Plot: number of clicks vs. number of pulls]

Multi-level gives a much higher number of clicks

Experiments (Multi-level Policy)

[Plot: mean-squared error vs. number of pulls]

Multi-level gives a much better mean-squared error: it has learned more from its explorations

Conclusions

• In a CTR-guided system, exploration is a key component

• The short-term penalty for exploration needs to be limited (an exploration budget)

• Most exploration mechanisms use a weighted combination of the predicted CTR (mean) and the CTR uncertainty (variance)

• Exploration can be done in a reduced-dimensional space: the class hierarchy

• A top-down traversal of the hierarchy determines the class of the ad to show
