Online Learning for Online Pricing Problems Maria Florina Balcan

Page 1: Online Learning for Online Pricing Problems Maria Florina Balcan

Online Learning for Online Pricing Problems

Maria Florina Balcan

Page 2

Three versions (easiest to hardest)

• Algorithmic – Customers’ shopping lists / valuations are known to the algorithm. (Seller knows the market well.)

• Incentive-compatible auction – Customers submit lists / valuations to a mechanism, which decides who gets what for how much. It must be in customers’ interest to report truthfully.

• On-line pricing – Customers arrive one at a time and buy what they want at current prices. The seller modifies prices over time.

Page 3

Adaptive algorithms for pricing a single good.

(Connections to experts and bandit problems)

Page 4

Pricing a single good

• Say you are selling lemonade (or a cool new software tool, or tickets to the world’s fair).

• Protocol #1: for $t = 1, 2, \ldots, T$
  – Seller posts price $p_t$.
  – Buyer arrives with valuation $v_t$.
  – If $v_t \ge p_t$, the buyer purchases and pays $p_t$; otherwise the buyer does not purchase.
  – $v_t$ is revealed to the algorithm.

• Protocol #2: same as Protocol #1 but without the last step.

• Assume all valuations are in $[1, h]$.

• Goal: do nearly as well as the best fixed price in hindsight.
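As a rough illustration of the two protocols and the hindsight benchmark, the sketch below simulates a seller against a fixed list of valuations. The names `run_protocol`, `choose_price`, and `best_fixed_price_revenue` are hypothetical helpers introduced here for illustration; they do not come from the talk.

```python
def best_fixed_price_revenue(valuations):
    """Revenue of the best single price in hindsight.

    Some buyer's valuation is always an optimal fixed price, so it is
    enough to try each distinct valuation.
    """
    return max(p * sum(v >= p for v in valuations) for p in set(valuations))


def run_protocol(choose_price, valuations, reveal_valuation):
    """Simulate the posted-price protocol and return the seller's revenue.

    choose_price(history) returns the next price. With reveal_valuation=True
    (Protocol #1) the history holds past valuations; with False (Protocol #2)
    it holds only (price, sold) pairs.
    """
    history, revenue = [], 0
    for v in valuations:
        p = choose_price(history)
        sold = v >= p
        if sold:
            revenue += p
        history.append(v if reveal_valuation else (p, sold))
    return revenue


valuations = [1, 3, 3, 5]
always_three = lambda history: 3  # a seller that ignores its history
print(best_fixed_price_revenue(valuations))           # benchmark (OPT)
print(run_protocol(always_three, valuations, True))   # Protocol #1
print(run_protocol(always_three, valuations, False))  # Protocol #2
```

The only difference between the protocols in this sketch is what the seller gets to see in `history`, which is what makes Protocol #2 the harder setting.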

Page 5

Pricing a single good

• Bad algorithm: “best price in past” (under Protocol #1).
  – What if the sequence of buyers is $1, h, 1, \ldots, 1, h, 1, \ldots, 1, h, \ldots$?
  – Alg makes $T/h$, OPT makes $T$: a factor of $h$ worse!
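The gap can be checked on a small instance. The sketch below is a hedged illustration, not the talk's construction: it uses a periodic variant of the sequence (one buyer with valuation h followed by h-1 buyers with valuation 1) and assumes that ties between candidate prices go to the lower price. The function names are hypothetical.

```python
def best_price_in_past(history, candidates):
    """Candidate price with the highest revenue on past valuations.

    Ties go to the lowest price (an assumption; the talk does not specify).
    """
    best_price, best_revenue = candidates[0], -1
    for p in candidates:
        revenue = p * sum(v >= p for v in history)
        if revenue > best_revenue:
            best_price, best_revenue = p, revenue
    return best_price


def follow_best_past_price(valuations, candidates):
    """Revenue of the seller that always posts the best price in the past."""
    revenue = 0
    for t, v in enumerate(valuations):
        p = best_price_in_past(valuations[:t], candidates)
        if v >= p:
            revenue += p
    return revenue


h, T = 10, 110
# One h-buyer at the start of every block of h buyers, 1-buyers otherwise.
valuations = [h if t % h == 0 else 1 for t in range(T)]
alg = follow_best_past_price(valuations, [1, h])
opt = max(p * sum(v >= p for v in valuations) for p in (1, h))
print(alg, opt)  # prints 11 110
```

On this sequence the seller posts price 1 exactly when the h-buyer shows up and price h when a 1-buyer shows up, so it earns T/h while either fixed price earns T.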

Page 6

Pricing a single good

• Good algorithm: “combining expert advice” (under Protocol #1).
  – Define one expert for each price $p = (1+\epsilon)^i \in [1, h]$.
  – The best price of this form gives profit $\ge \mathrm{OPT}/(1+\epsilon)$.
  – Run the Randomized Weighted Majority (RWM) algorithm. Get expected gain at least
    $\mathrm{OPT}/(1+\epsilon)^2 - O(\epsilon^{-1} h \log(\epsilon^{-1} \log h))$
    [extra factor of $h$ coming from the range of gains].
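A minimal sketch of this idea follows, assuming one common multiplicative form of the RWM update for gains (each weight is multiplied by (1+eps) raised to the gain scaled by h). The talk does not spell out the update rule, so that form, the function names, and the rescaling step are assumptions made for illustration.

```python
import random


def price_grid(h, eps):
    """Prices (1 + eps)^i that fall in [1, h]."""
    prices, p = [], 1.0
    while p <= h:
        prices.append(p)
        p *= 1 + eps
    return prices


def rwm_pricing(valuations, h, eps, rng):
    """Randomized weighted majority over the price grid (Protocol #1)."""
    prices = price_grid(h, eps)
    weights = [1.0] * len(prices)
    revenue = 0.0
    for v in valuations:
        # Sample an expert (price) with probability proportional to its weight.
        i = rng.choices(range(len(prices)), weights=weights)[0]
        if v >= prices[i]:
            revenue += prices[i]
        # v is revealed, so every expert's gain is known; gains are scaled by h.
        for j, p in enumerate(prices):
            gain = p if v >= p else 0.0
            weights[j] *= (1 + eps) ** (gain / h)
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid overflow
    return revenue


rng = random.Random(0)
valuations = [4.0] * 2000  # best fixed grid price is 4, with revenue 8000
print(rwm_pricing(valuations, h=8, eps=1.0, rng=rng))
```

With eps = 1 the grid is just 1, 2, 4, 8, which keeps the example small; a finer grid trades a better multiplicative factor for more experts.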

Page 7

Pricing a single good

• With “combining expert advice”, the seller is only an arbitrarily small constant factor worse than OPT, with $O(h \log\log h)$ additive loss (for fixed $\epsilon$)!

• Can’t hope to do much better: e.g., if one high bidder dominates the rest.

Page 8

Pricing a single good

• What about Protocol #2? [only see the accept/reject decision]
  – Now we can’t run RWM directly, since we don’t know how to penalize the experts!
  – This is called the “adversarial bandit problem”.
  – How can we solve that?

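Page 9

Pricing a single good

Exponential Weights for Exploration and Exploitation (Exp3)

[Figure: Exp3 wraps RWM. RWM outputs a distribution $p^t$; Exp3 samples expert $i \sim q^t$, observes the gain $g_i^t$, and passes the gain vector $\hat{g}^t$ back to RWM.]

• $q^t = (1-\gamma)\, p^t + \gamma \cdot \mathrm{unif}$
• $\hat{g}^t = (0, \ldots, 0,\; g_i^t / q_i^t,\; 0, \ldots, 0)$, with every entry $\le nh/\gamma$, where $n$ is the number of experts
• $\widehat{\mathrm{OPT}} = \max_j \sum_t \hat{g}_j^t$ is the best expert's total gain under the estimated gains

1. RWM believes the gain is: $p^t \cdot \hat{g}^t = p_i^t\, (g_i^t / q_i^t) \equiv g^t_{\mathrm{RWM}}$
2. $\sum_t g^t_{\mathrm{RWM}} \ge \widehat{\mathrm{OPT}}/(1+\epsilon) - O(\epsilon^{-1} (nh/\gamma) \log n)$
3. Actual gain at $t$ is: $g_i^t = g^t_{\mathrm{RWM}}\, (q_i^t / p_i^t) \ge g^t_{\mathrm{RWM}}\, (1-\gamma)$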

Page 10

Pricing a single good

Exponential Weights for Exploration and Exploitation (Exp3)

3.5. Actual overall gain $\ge (1-\gamma)\left[\widehat{\mathrm{OPT}}/(1+\epsilon) - O(\epsilon^{-1} (nh/\gamma) \log n)\right]$ (combining steps 2 and 3)

Page 11

Pricing a single good

Exponential Weights for Exploration and Exploitation (Exp3)

4. $E[\widehat{\mathrm{OPT}}] \ge \mathrm{OPT}$, because $E[\hat{g}_j^t] = (1 - q_j^t) \cdot 0 + q_j^t\, (g_j^t / q_j^t) = g_j^t$, so $E[\max_j \sum_t \hat{g}_j^t] \ge \max_j E[\sum_t \hat{g}_j^t] = \mathrm{OPT}$.

Page 12

Pricing a single good

Exponential Weights for Exploration and Exploitation (Exp3)

Conclusion ($\gamma = \epsilon$): $E[\mathrm{Exp3}] \ge \mathrm{OPT}/(1+\epsilon)^2 - O(\epsilon^{-2} h \log(h) \log\log(h))$
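The Exp3 construction can be sketched for the posted-price setting as follows. This is an illustrative sketch under assumptions: the inner update is a multiplicative-weights rule with the estimated gain rescaled by nh/γ into [0, 1], and the function name and parameters are hypothetical rather than taken from the talk.

```python
import random


def exp3_pricing(valuations, prices, h, eps, gamma, rng):
    """Exp3 on top of a multiplicative-weights update (Protocol #2)."""
    n = len(prices)
    weights = [1.0] * n
    revenue = 0.0
    for v in valuations:
        total = sum(weights)
        p = [w / total for w in weights]                # RWM's distribution p^t
        q = [(1 - gamma) * pi + gamma / n for pi in p]  # q^t, mixed with uniform
        i = rng.choices(range(n), weights=q)[0]         # expert i ~ q^t
        gain = prices[i] if v >= prices[i] else 0.0     # only accept/reject is seen
        revenue += gain
        g_hat = gain / q[i]                             # the one nonzero entry of g-hat
        # g_hat <= n*h/gamma because q[i] >= gamma/n; rescale it into [0, 1].
        weights[i] *= (1 + eps) ** (g_hat * gamma / (n * h))
        top = max(weights)
        weights = [w / top for w in weights]            # rescale to avoid overflow
    return revenue


rng = random.Random(0)
valuations = [4.0] * 5000  # best fixed price among the candidates is 4
print(exp3_pricing(valuations, [1.0, 2.0, 4.0, 8.0], h=8, eps=0.5, gamma=0.1, rng=rng))
```

Only the sampled price's weight changes in each round, since that is the only price whose gain the seller observes; dividing by q[i] keeps the estimate unbiased, as in step 4.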

Page 13

Algorithmic Problem, Single-minded Bidders

• n item types (coffee, cups, sugar, apples), with unlimited supply of each.

• m customers.

• Each customer i has a shopping list $L_i$ and will only shop if the total cost of the items in $L_i$ is at most some amount $w_i$ (otherwise he will go elsewhere).

• Say all marginal costs to you are 0 [revisit this in a bit], and you know all the $(L_i, w_i)$ pairs.

What prices on the items will make you the most money?

• Easy if all $L_i$ are of size 1.

• What happens if all $L_i$ are of size 2?
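To make the objective concrete, the sketch below evaluates the profit of a price vector and solves the easy case where every list has size 1, by trying each customer's budget as the price of the item. The representation of customers as `(items, budget)` pairs and the function names are assumptions made for this example.

```python
def profit(prices, customers):
    """Total revenue when customer (items, budget) buys iff the list cost <= budget."""
    total = 0
    for items, budget in customers:
        cost = sum(prices[j] for j in items)
        if cost <= budget:
            total += cost
    return total


def price_size_one_lists(customers):
    """Optimal prices when every shopping list holds exactly one item."""
    budgets_by_item = {}
    for (item,), budget in customers:
        budgets_by_item.setdefault(item, []).append(budget)
    prices = {}
    for item, budgets in budgets_by_item.items():
        # Some customer's budget is always an optimal price for the item.
        prices[item] = max(budgets, key=lambda p: p * sum(b >= p for b in budgets))
    return prices


customers = [(("coffee",), 3), (("coffee",), 5), (("sugar",), 2)]
prices = price_size_one_lists(customers)
print(prices, profit(prices, customers))  # coffee at 3, sugar at 2, profit 8
```

With lists of size 1 the items decouple, so each price can be chosen on its own; that independence is what breaks once lists have size 2.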

Page 14

Algorithmic Pricing, Single-minded Bidders

• Lists of size 2: a multigraph G (items are vertices, customers are edges) with values $w_e$ on edges e.

• Goal: assign prices $p_v \ge 0$ on vertices to maximize total profit, where
  $\mathrm{Profit}(p) = \sum_{e=(u,v):\; p_u + p_v \le w_e} (p_u + p_v)$.

• APX-hard [GHKKKM’05].

[Figure: example multigraph with edge values.]

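Page 15

A Simple 2-Approx. in the Bipartite Case

• Given a multigraph G with values $w_e$ on edges e.

• Goal: assign prices $p_v \ge 0$ on vertices so as to maximize total profit.

Algorithm

• Set prices in R to 0 and separately fix prices for each node in L.

• Set prices in L to 0 and separately fix prices for each node in R.

• Take the best of both options.

Proof: simple! $\mathrm{OPT} = \mathrm{OPT}_L + \mathrm{OPT}_R$, where $\mathrm{OPT}_L$ and $\mathrm{OPT}_R$ are the parts of OPT's profit collected at nodes in L and in R. The first option earns at least $\mathrm{OPT}_L$ and the second at least $\mathrm{OPT}_R$, so the better of the two earns at least $\mathrm{OPT}/2$.

[Figure: example bipartite graph with sides L and R and edge values.]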
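The bipartite algorithm is short enough to sketch directly. The edge representation `(l, r, w)` and the function names below are assumptions for illustration; the per-node step uses the fact that, with the other side priced at 0, some incident edge value is always an optimal price for a node.

```python
def price_one_side(edges, side):
    """Price the nodes on one side (0 = L, 1 = R); the other side is priced at 0."""
    values_by_node = {}
    for edge in edges:
        values_by_node.setdefault(edge[side], []).append(edge[2])
    prices, total = {}, 0
    for node, values in values_by_node.items():
        # Try each incident edge value as the node's price and keep the best.
        best = max(values, key=lambda p: p * sum(w >= p for w in values))
        prices[node] = best
        total += best * sum(w >= best for w in values)
    return prices, total


def bipartite_two_approx(edges):
    """Return (prices, profit) for the better of the two one-sided options."""
    return max(price_one_side(edges, 0), price_one_side(edges, 1), key=lambda r: r[1])


edges = [("a", "x", 10), ("a", "y", 4), ("b", "x", 7)]
print(bipartite_two_approx(edges))  # prints ({'x': 7, 'y': 4}, 18)
```

Here pricing only L earns 17 (a at 10, b at 7) and pricing only R earns 18 (x at 7, y at 4), so the second option is returned.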

Page 16

A 4-Approx. for Graph Vertex Pricing

• Given a multigraph G with values $w_e$ on edges e.

• Goal: assign prices $p_v \ge 0$ on vertices to maximize total profit.

Algorithm

• Randomly partition the vertices into two sets L and R.

• Ignore the edges whose endpoints are on the same side and run the algorithm for the bipartite case.

Proof: simple! In expectation, half of OPT’s profit is from edges with one endpoint in L and one endpoint in R.

[Figure: example multigraph with edge values.]
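One way to sketch the random-partition step is shown below, assuming edges are `(u, v, w)` triples and each vertex joins L independently with probability 1/2. The helper names are hypothetical, and the returned profit counts only crossing edges, so it is a lower bound on the profit in the full graph.

```python
import random


def graph_profit(prices, edges):
    """Revenue on edges (u, v, w): the customer pays p_u + p_v iff it is <= w."""
    total = 0
    for u, v, w in edges:
        cost = prices.get(u, 0) + prices.get(v, 0)
        if cost <= w:
            total += cost
    return total


def price_side(nodes, crossing):
    """Best price per node in `nodes`, with all other vertices priced at 0."""
    prices = {}
    for node in nodes:
        values = [w for u, v, w in crossing if node in (u, v)]
        if values:
            prices[node] = max(values, key=lambda p: p * sum(w >= p for w in values))
    return prices


def random_partition_pricing(vertices, edges, rng):
    """Randomly split the vertices, keep crossing edges, run the bipartite step."""
    left = {v for v in vertices if rng.random() < 0.5}
    right = set(vertices) - left
    crossing = [(u, v, w) for u, v, w in edges if (u in left) != (v in left)]
    options = [price_side(left, crossing), price_side(right, crossing)]
    best = max(options, key=lambda p: graph_profit(p, crossing))
    return best, graph_profit(best, crossing)


vertices = ["a", "b", "c"]
edges = [("a", "b", 10), ("b", "c", 6), ("a", "c", 4)]
prices, crossing_profit = random_partition_pricing(vertices, edges, random.Random(0))
print(prices, crossing_profit, graph_profit(prices, edges))
```

The factor 4 combines the two losses: about half of OPT's profit survives the partition in expectation, and the bipartite step recovers at least half of that.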

Page 17

Algorithmic Pricing, Single-minded Bidders, k-hypergraph Problem

What about lists of size $\le k$?

Algorithm

– Put each node in L with probability $1/k$, in R with probability $1 - 1/k$.

– Let GOOD = set of edges with exactly one endpoint in L. Set prices in R to 0 and optimize L with respect to GOOD.

• Let $\mathrm{OPT}_{j,e}$ be the revenue OPT makes selling item j to customer e. Let $X_{j,e}$ be the indicator random variable for $j \in L$ and $e \in \mathrm{GOOD}$.

• Our expected profit is at least $E\big[\sum_{j,e} X_{j,e}\, \mathrm{OPT}_{j,e}\big]$.
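A hedged reconstruction of how the O(k) factor can be derived from these definitions is given below. It assumes that nodes are placed in L independently, that $j \in e$ with $|e| \le k$, and that $k \ge 2$; the constant $1/(ek)$ is this sketch's own calculation and is not read off the slide.

```latex
\begin{align*}
\Pr[X_{j,e} = 1]
  &= \frac{1}{k}\Bigl(1 - \frac{1}{k}\Bigr)^{|e|-1}
  \;\ge\; \frac{1}{k}\Bigl(1 - \frac{1}{k}\Bigr)^{k-1}
  \;\ge\; \frac{1}{ek}, \\
E[\text{profit}]
  &\ge E\Bigl[\sum_{j,e} X_{j,e}\,\mathrm{OPT}_{j,e}\Bigr]
  = \sum_{j,e} \Pr[X_{j,e} = 1]\,\mathrm{OPT}_{j,e}
  \;\ge\; \frac{\mathrm{OPT}}{ek}.
\end{align*}
```

The first line says that item j lands in L while the other items of e land in R; the second uses linearity of expectation and the fact that charging OPT's prices on L is one feasible choice when optimizing L against GOOD.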

Page 18

On-line Pricing

Customers arrive one at a time, buy or don’t buy at current prices.

In the (full information) auction model, we know valuation info for customers 1,…,i-1 when customer i arrives.

In the posted-price model, we only know who bought what for how much.

Goal: do well compared to the best fixed set of item prices.

Page 19

On-line Pricing

Our O(k)-approx. alg. can be naturally adapted to the online setting, by using results of [BH’05] and [BKRW’03] for the online digital good auction.

Can run separate online auctions over items in L, customers in GOOD (customers who want exactly one item in L).

Guarantee: perform comparably to the best fixed set of item prices (for items in L, people in GOOD).

Let $\mathrm{OPT}_i$ be the best profit achievable (from item i) using a fixed price for item i from customers in GOOD whose bundle contains item i. Can use the [BH’05] auction for each item i.

Page 20

On-line Pricing

Using the [BH’05] auction, the expected profit of the online auction for item i is:

Overall, we achieve profit at least:

Profit of the offline approx. alg.