Symbolic Perseus: a Generic POMDP Algorithm with Application to Dynamic Pricing with Demand Learning
Pascal Poupart (University of Waterloo)
INFORMS 2009


Outline
• Dynamic Pricing as a POMDP
• Symbolic Perseus
  – Generic POMDP solver
  – Point-based value iteration
  – Algebraic decision diagrams
• Experimental evaluation
• Conclusion


Setting
• One or several firms (monopoly or oligopoly)
• Fixed capacity and fixed number of selling rounds (i.e., sale of seasonal items)
• Finite range of prices
• Unknown and varying demand
• Question: how to dynamically adjust prices to maximize sales?


POMDP Formulation (monopoly)
[Figure: graphical model of the POMDP over successive time steps. Rows: Firm, Consumer. Variables: Price, Inv, Sales, CC, Time.]


POMDP Formulation (oligopoly)
[Figure: graphical model of the POMDP over successive time steps. Rows: Firm, Competitors, Consumer. Variables: Price, Inv, Sales, CC, Price-i, Inv-i, Time.]


Unknown demand & competitors
[Figure: the oligopoly model again. Rows: Firm, Competitors, Consumer. Variables: Price, Inv, Sales, CC, Price-i, Inv-i, Time.]


Demand Model
• Probability that consumer chooses firm i:
  $\Pr(CC = i) = \dfrac{e^{a_i + b_i p_i}}{\sum_j e^{a_j + b_j p_j} + 1}$
• Parameters $a_i$ and $b_i$ are unknown
• Learn them
  – From historical data
  – As process evolves
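The choice model above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name and the parameter values are hypothetical, and reading the "+ 1" in the denominator as the weight of a no-purchase option is an assumption based on the usual multinomial logit convention.

```python
import math

def choice_probabilities(a, b, prices):
    """Return (probs, no_purchase), where probs[i] = Pr(CC = i).

    a, b   : per-firm demand parameters a_i and b_i
    prices : per-firm prices p_i
    The constant 1 in the denominator is treated here as the weight of
    an outside option (assumption: the consumer may buy from nobody).
    """
    weights = [math.exp(ai + bi * pi) for ai, bi, pi in zip(a, b, prices)]
    denom = sum(weights) + 1.0
    probs = [w / denom for w in weights]
    no_purchase = 1.0 / denom
    return probs, no_purchase

# Hypothetical parameter values, chosen only for illustration.
probs, no_purchase = choice_probabilities(
    a=[1.0, 0.5], b=[-0.2, -0.1], prices=[5.0, 4.0]
)
```

With a negative $b_i$, raising firm i's price lowers its share, which is the trade-off the pricing policy has to learn.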


Competitors
• Model each competitor:
  – Pricing strategy: inv/time → price
  – Two thresholds: $t_{up}$ and $t_{down}$
    • If inv/time < $t_{up}$: price↑
    • If inv/time > $t_{down}$: price↓
• Learn thresholds
  – From historical data
  – As process evolves
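A rough sketch of the threshold rule and of learning the thresholds from observed price moves is given below. Several things here are assumptions rather than statements from the slides: that "time" means remaining selling rounds, that the competitor follows the rule deterministically, that $t_{up} \le t_{down}$, and that learning is done over a small discrete set of candidate threshold pairs. All names are hypothetical.

```python
def competitor_price_move(inv, time_left, t_up, t_down):
    """Return +1 (raise price), -1 (lower price) or 0 (keep price)."""
    ratio = inv / time_left
    if ratio < t_up:       # little stock left per round: raise the price
        return +1
    if ratio > t_down:     # too much stock left per round: cut the price
        return -1
    return 0

def update_threshold_belief(belief, inv, time_left, observed_move):
    """Bayes update of a belief over candidate (t_up, t_down) pairs.

    belief: dict mapping (t_up, t_down) -> probability.
    Because the rule is assumed deterministic, hypotheses that predict a
    different move get zero posterior mass.
    """
    posterior = {
        th: p for th, p in belief.items()
        if competitor_price_move(inv, time_left, *th) == observed_move
    }
    total = sum(posterior.values())
    if total == 0.0:
        raise ValueError("no candidate thresholds explain the observed move")
    return {th: p / total for th, p in posterior.items()}

# Two hypothetical candidates, equally likely a priori.
prior = {(0.5, 1.5): 0.5, (1.0, 2.0): 0.5}
posterior = update_threshold_belief(prior, inv=8, time_left=10, observed_move=+1)
```

Here inv/time = 0.8, so only the candidate with $t_{up}$ = 1.0 predicts a price increase and it receives all the posterior mass.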


Expanded POMDP
[Figure: the oligopoly model extended with the unknown parameters as additional variables: A, B (demand parameters) and T↑, T↓ (competitor thresholds). Rows: Firm, Consumer, Competitors. Other variables: Price, Inv, Sales, CC, Price-i, Inv-i, Time.]


POMDPs
• Partially Observable Markov Decision Processes
  – S: set of states
    • Cross product of domains of all variables
    • $|S| = \prod_i |\mathrm{dom}(V_i)|$ (exponentially large!)
  – A: set of actions
    • {price↑, price↓, price unchanged}
  – O: set of observations
    • Cross product of domains of observable variables
  – T(s,a,s') = Pr(s'|s,a): transition function
    • Factored rep: $\Pr(s'|s,a) = \prod_i \Pr(V_i \mid \mathrm{parents}(V_i))$
  – R(s,a) = r: reward function
    • Sale = price × CC
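To make the size formula concrete, the short sketch below multiplies out a set of domain sizes. The variable names follow the slides, but the domain sizes are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical domain sizes; the slides do not give these numbers.
domains = {"Price": 5, "Inv": 11, "Time": 21, "CC": 2}

# |S| is the product of the domain sizes of all state variables.
num_states = math.prod(domains.values())

# Adding one more variable with k values multiplies |S| by k, which is
# why |S| grows exponentially in the number of variables.
num_states_with_extra = num_states * 4
```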


Belief monitoring
• Belief: b(s)
  – Distribution over states
• Belief update: Bayes' theorem
  – $b_{ao'}(s') = k \sum_{s \in S} b(s) \Pr(s'|s,a) \Pr(o'|a,s')$, where k is a normalizing constant
  – $b_{ao'} = \langle o', a, b \rangle$
• Demand learning and opponent modeling:
  – Implicit learning by belief monitoring
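The belief update can be written directly from the formula for a flat (non-factored) model. The sketch below uses plain nested lists; the array layout, the function name, and the two-state "low/high demand" example are assumptions made for illustration.

```python
def belief_update(b, a, o, T, Z):
    """Bayes update of belief b after taking action a and observing o.

    b[s]        : current belief
    T[a][s][s2] : Pr(s2 | s, a)
    Z[a][s2][o] : Pr(o | a, s2)
    """
    n = len(b)
    unnormalized = [
        Z[a][s2][o] * sum(b[s] * T[a][s][s2] for s in range(n))
        for s2 in range(n)
    ]
    norm = sum(unnormalized)  # equals Pr(o | b, a); the constant k is 1 / norm
    if norm == 0.0:
        raise ValueError("observation has zero probability under b and a")
    return [u / norm for u in unnormalized]

# Hypothetical example: hidden demand level (0 = low, 1 = high) that does
# not change over time; the observation is 1 if a sale occurred, else 0.
T = [[[1.0, 0.0], [0.0, 1.0]]]   # single action, identity transition
Z = [[[0.8, 0.2], [0.4, 0.6]]]   # Pr(sale | low) = 0.2, Pr(sale | high) = 0.6
posterior = belief_update([0.5, 0.5], a=0, o=1, T=T, Z=Z)
```

After observing a sale, the belief shifts toward the high-demand state. This is the sense in which demand learning happens implicitly: the unknown parameter is just another hidden state variable.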


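Policy trees
• Policy π
  – Mapping from past actions & obs to next action
  – Tree representation
  – Problem: tree grows exponentially with time
[Figure: policy tree with root action a1, second-level actions a2 and a3, and third-level actions a4 to a7; the two edges below each internal node are labeled o1 and o2.]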
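The exponential growth is easy to quantify. In a tree where each node has one child per observation, a horizon-t tree has $\sum_{k=0}^{t-1} |O|^k$ nodes, and each node can hold any action. The helper names below are hypothetical.

```python
def policy_tree_nodes(num_obs, horizon):
    """Number of action nodes in a policy tree of the given depth."""
    return sum(num_obs ** k for k in range(horizon))

def num_policy_trees(num_actions, num_obs, horizon):
    """Number of distinct policy trees: one action choice per node."""
    return num_actions ** policy_tree_nodes(num_obs, horizon)

# The tree in the figure: 2 observations, 3 levels.
nodes_in_figure = policy_tree_nodes(num_obs=2, horizon=3)
# With the 3 pricing actions, the number of such trees is already large.
trees = num_policy_trees(num_actions=3, num_obs=2, horizon=3)
```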


Policy Optimization
• Policy π : B → A
  – Mapping from beliefs to actions
• Value function $V^\pi(b) = \sum_t \gamma^t E_{b_t|\pi}[R]$
• Optimal policy π*:
  – $V^*(b) \ge V^\pi(b)$ for all π, b
• Bellman's equation:
  – $V^*(b) = \max_a \left[ E_b[R] + \gamma \sum_{o'} \Pr(o'|b,a) \, V^*(b_{ao'}) \right]$


Difficulties
• Exponentially large state space
  – $|S| = \prod_i |\mathrm{dom}(V_i)|$
  – Solution: algebraic decision diagrams
• Complex policy space
  – Policy π : B → A
  – Continuous belief space
  – Solution: point-based Bellman backups


Symbolic Perseus
• Publicly available:
  – http://www.cs.uwaterloo.ca/~ppoupart/software.html
• Has been used to solve POMDPs with millions of states
• Currently used by
  – Intel, Toronto Rehabilitation Institute, Univ of Dundee, Technical Univ of Lisbon, Univ of British Columbia, Univ of Manchester, Univ of Waterloo
• Point-based value iteration + algebraic decision diagrams


Piecewise linear & convex value function
• Value of a policy tree β is linear in the belief
  $V^\beta(b_0) = \sum_{s \in S} b_0(s) V^\beta(s)$
• Value of an optimal finite horizon policy is piecewise-linear and convex [SS73]
[Figure: value function over the belief space, with the horizontal axis running from b(s)=0 to b(s)=1.]
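Since each policy tree contributes one linear function of the belief, the optimal finite-horizon value is the upper envelope of a set of such linear functions (often called alpha-vectors). The sketch below evaluates that envelope; the three vectors are hypothetical.

```python
def value(belief, alpha_vectors):
    """Piecewise-linear convex value: max over linear functions of the belief."""
    return max(
        sum(b * a for b, a in zip(belief, alpha))
        for alpha in alpha_vectors
    )

# Hypothetical alpha-vectors for a two-state problem.
alphas = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]

v_left = value([1.0, 0.0], alphas)    # belief certain of state 0
v_right = value([0.0, 1.0], alphas)   # belief certain of state 1
v_mid = value([0.5, 0.5], alphas)     # uniform belief
```

Convexity shows up as the value at the midpoint being no larger than the average of the endpoint values.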


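Point-based value iteration
• Point-based backup (Pineau et al. 2003)
  $\alpha_{t-1}(b) = \max_a \left[ E_b[R] + \gamma \sum_{o'} \Pr(o'|b,a) \, \alpha_t(b_{ao'}) \right]$
[Figure: value functions $V_t$ and $V_{t-1}$ over the belief space, with a backup at belief b that uses the successor beliefs $b_{ao_1}$ and $b_{ao_2}$.]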
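A point-based backup computes one new alpha-vector that is optimal at a single belief b. The sketch below is a flat, pure-Python version written from the standard description of the operation; it is not the Symbolic Perseus implementation (which works on decision diagrams), and the array layout and the toy model are assumptions.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def point_based_backup(b, alphas, T, Z, R, gamma):
    """Return the backed-up alpha-vector that is best at belief b.

    T[a][s][s2] = Pr(s2 | s, a), Z[a][s2][o] = Pr(o | a, s2), R[s][a] = reward.
    """
    n_states, n_actions, n_obs = len(b), len(T), len(Z[0][0])
    best_alpha, best_value = None, float("-inf")
    for a in range(n_actions):
        alpha_a = [R[s][a] for s in range(n_states)]
        for o in range(n_obs):
            # Project each alpha-vector back through (a, o):
            # g(s) = sum_s2 Pr(s2 | s, a) Pr(o | a, s2) alpha(s2)
            projections = [
                [sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2]
                     for s2 in range(n_states))
                 for s in range(n_states)]
                for alpha in alphas
            ]
            # Keep only the projection that is best at this belief.
            g_best = max(projections, key=lambda g: dot(b, g))
            alpha_a = [x + gamma * g for x, g in zip(alpha_a, g_best)]
        v = dot(b, alpha_a)
        if v > best_value:
            best_alpha, best_value = alpha_a, v
    return best_alpha

# Hypothetical toy model: 2 states, 2 actions, 2 uninformative observations.
T = [[[1.0, 0.0], [0.0, 1.0]]] * 2
Z = [[[0.5, 0.5], [0.5, 0.5]]] * 2
R = [[1.0, 0.0], [0.0, 1.0]]          # action a pays off only in state a
b = [0.8, 0.2]
first = point_based_backup(b, [[0.0, 0.0]], T, Z, R, gamma=0.9)
second = point_based_backup(b, [first], T, Z, R, gamma=0.9)
```

Restricting the maximization to a finite set of beliefs is what keeps the number of alpha-vectors bounded, at the cost of an approximate value function elsewhere in the belief space.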


Algebraic Decision Diagrams
• First use in MDPs: Hoey et al. 1999
• Factored representation
  – Exploit conditional independence
  – $\Pr(s'|s,a) = \prod_i \Pr(V_i \mid \mathrm{parents}(V_i))$
• Automatic state aggregation
  – Exploit context specific independence
  – Exploit sparsity


Factored Representation
• Transition fn: Pr(s'|s,a)
  – Flat representation: matrix $O(|S|^2)$
  – Factored representation: often $O(\log |S|)$
[Figure: the monopoly model over successive time steps. Variables: Price, Inv, Sales, CC, Time.]
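The gap between the two representations can be checked by counting table entries. With n variables of bounded domain size, log |S| is proportional to n; if each variable also has a bounded number of parents, the conditional tables together have a number of entries linear in n, which is where the "often O(log |S|)" figure comes from. The counting functions and the 10-variable example below are hypothetical.

```python
import math

def flat_entries(domain_sizes):
    """Entries in a flat |S| x |S| transition matrix (per action)."""
    n_states = math.prod(domain_sizes)
    return n_states * n_states

def factored_entries(domain_sizes, parents):
    """Total entries of the conditional tables Pr(V_i | parents(V_i)).

    parents[i] lists the indices of the previous-step variables that
    variable i depends on.
    """
    return sum(
        domain_sizes[i] * math.prod(domain_sizes[p] for p in parents[i])
        for i in range(len(domain_sizes))
    )

# Hypothetical model: 10 binary variables, each depending on itself and
# on one neighbour at the previous time step.
sizes = [2] * 10
parents = [[i, (i + 1) % 10] for i in range(10)]
flat = flat_entries(sizes)
factored = factored_entries(sizes, parents)
```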


Computation with Factored Rep
• Belief monitoring:
  – $b_{ao'}(s') = k \Pr(o'|a,s') \sum_s b(s) \Pr(s'|s,a)$
• Point-based Bellman backup:
  – $\alpha(s) = \max_a R(s,a) + \sum_{s',o'} \Pr(s'|s,a) \Pr(o'|a,s') \, \alpha_{ao'}(s')$
• Flat representation: $O(|S|^2)$
• Factored representation: often $O(|S| \log |S|)$


Algebraic Decision Diagrams
• Tree-based representation
  – Acyclic directed graph
• Avoid duplicate entries
  – Exploit context specific independence
  – Exploit sparsity

Assignment   Value
xyz          0
xy~z         0
x~yz         0
x~y~z        2
~xyz         0
~xy~z        2
~x~yz        3
~x~y~z       3

[Figure: the corresponding ADD, with root X, two Y nodes, a single Z node, and leaves 0, 2 and 3.]
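The compression in this example can be reproduced with a toy reduction procedure: build the diagram bottom-up, share structurally identical nodes, and skip any test whose two branches are the same. This is only a sketch of the idea; it enumerates the full table, which real ADD packages avoid, and the data layout is an assumption.

```python
ORDER = ["x", "y", "z"]

# The eight-entry function from the slide, keyed by (x, y, z).
TABLE = {
    (True, True, True): 0, (True, True, False): 0,
    (True, False, True): 0, (True, False, False): 2,
    (False, True, True): 0, (False, True, False): 2,
    (False, False, True): 3, (False, False, False): 3,
}

def build_add(table, order):
    """Return (root, unique), where unique holds every node of the reduced ADD."""
    unique = {}  # hash-consing table: equal nodes are stored once

    def make(node):
        return unique.setdefault(node, node)

    def build(level, partial):
        if level == len(order):
            return make(("leaf", table[partial]))
        high = build(level + 1, partial + (True,))
        low = build(level + 1, partial + (False,))
        if high == low:          # the test on this variable is redundant
            return high
        return make((order[level], high, low))

    return build(0, ()), unique

def evaluate(node, assignment):
    """Follow the diagram from the root down to a leaf value."""
    while node[0] != "leaf":
        var, high, low = node
        node = high if assignment[var] else low
    return node[1]

root, unique = build_add(TABLE, ORDER)
internal_nodes = [n for n in unique if n[0] != "leaf"]
leaves = sorted(n[1] for n in unique if n[0] == "leaf")
```

The result has four internal nodes (one X, two Y, one Z) and three leaves, compared with eight table entries; the saving grows quickly when many assignments share a value.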


Empirical Results
• Monopolistic Dynamic Pricing

Inv / Time   |S|       SP Value   Upper bound   Runtime (min)
10 / 20      73,920    121        133           19
15 / 30      158,720   152        167           48
20 / 40      275,520   171        187           61
25 / 50      424,320   182        198           161
30 / 60      605,120   188        199           350
35 / 70      817,920   192        199           448
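As a sanity check on the |S| column, the reported sizes are consistent with a product of domain sizes: each one equals (Inv + 1) × (Time + 1) × 320. This is an arithmetic observation about the numbers, not something the slides state, and what the remaining factor of 320 is composed of is not given.

```python
# (Inv, Time, |S|) as reported in the results table.
ROWS = [
    (10, 20, 73920),
    (15, 30, 158720),
    (20, 40, 275520),
    (25, 50, 424320),
    (30, 60, 605120),
    (35, 70, 817920),
]

# Divide out the Inv and Time domains (values 0..Inv and 0..Time).
residual_factors = [
    size // ((inv + 1) * (time + 1)) for inv, time, size in ROWS
]
divides_exactly = all(
    size % ((inv + 1) * (time + 1)) == 0 for inv, time, size in ROWS
)
```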


COACH project
• Automated prompting system to help elderly persons wash their hands
• IATSL: Alex Mihailidis, Jesse Hoey, Jennifer Boger et al.


Policy Optimization
• Partially observable MDP:
  – Handle noisy HandLocation and noisy WaterFlow
  – Can adapt to user responsiveness
  – 50,181,120 states, 20 actions, 12 observations
• Approximation: fully observable MDP
  – Assume HandLocation, WaterFlow are fully observable
  – Remove the user responsiveness variable
  – 25,090,560 states, 20 actions


Empirical Comparison (Simulation)


Conclusion
• Natural encoding of Dynamic Pricing as POMDP
  – Demand and competitor learning by belief monitoring
  – Factored model
• Symbolic Perseus (generic POMDP solver)
  – Point-based value iteration + algebraic decision diagrams
  – Exploit problem specific structure
• Future work
  – Bayesian reinforcement learning
  – Planning as inference