Markov Decision Models for Order Acceptance/Rejection Problems

Florian Defregger and Heinrich Kuhn

Catholic University of Eichstätt-Ingolstadt

Fifth International Conference on "Analysis of Manufacturing Systems - Production Management"

Zakynthos, May 24, 2005


Structure

1. Introduction

2. Decision Problem

3. Markov Decision Model

4. Solution Procedure

5. Numerical Results

Introduction

Revenue Management (RM)

– Service industries (air transportation, hotels, car rental, etc.)

– Manufacturing industries (steel, paper, aluminum, etc.), see Kniker/Burman (2001)

– Implementations of RM systems have increased profits by 2–10%.

Introduction

Which kind of manufacturing company could potentially use revenue management to increase the bottom line?

a) high fixed costs

b) a short-term increase of capacity to meet demand peaks is very expensive or even impossible

c) demand fluctuates over time

d) customers are willing to pay different prices for essentially the same product

Decision problem

Assumptions

• One single bottleneck in the manufacturing process

• Orders:

  – specific price, volume, and lead time (due date)

  – at most one order arrival per time period

  – arrivals are independent of one another

• Products can be made to stock

• Limited inventory capacity

• Infinite planning horizon

Decision problem

1. Accept order? yes/no

2. If yes: how much inventory should be used?

[Figure: incoming orders are either rejected or accepted; accepted orders are delivered from the machine and from inventory. The machine timeline starts today, shows the previously accepted orders k and m, and places the new order n within its maximum lead time $l_n$.]

Notation

• N order classes, $n \in \{1, \dots, N\}$.

• Each order can be assigned to one order class.

• Parameters for orders of class n:

$m_n$ : profit margin

$u_n$ : capacity usage

$l_n$ : lead time

$p_n$ : probability of arriving

• Dummy order class 0 (no order arrives): $p_0 = 1 - \sum_{n=1}^{N} p_n$, $m_0 = u_0 = l_0 = 0$

[Figure: in each period, starting today, an order of class $n \in \{0, 1, \dots, N\}$ arrives with probability $p_n$ and carries the parameters $m_n$, $u_n$, $l_n$.]

Notation

Inventory:

$I_{\max}$ : maximum inventory level

i : inventory level, $i \in \{0, 1, \dots, I_{\max}\}$

h : inventory holding costs per unit of inventory per period

The inventory level i is expressed as the number of periods the machine needed to produce that inventory.

Notation

States $(n, c, i) \in S$ (state space):

n : order class of the order that arrived at the beginning of the current period

c : number of periods for which the machine is reserved for orders that are already accepted but not yet finished, $c \in \{0, 1, \dots, H\}$

i : current inventory level

H − c : available capacity in the considered horizon H

Problem size: counting the values of n, c and i listed above gives $|S| = (N+1)\,(H+1)\,(I_{\max}+1)$, where the maximum horizon H is determined by the maximum lead time $\max_n l_n$.

[Figure: the state (n, c, i) evolves from period to period, starting today, according to the transition probabilities.]

[Figure: timeline from today showing the orders k, m and n with their lead times $l_k$, $l_n$, $l_m$, the capacity usage $u_n$, the maximum lead time and the maximum horizon H.]

Sequence of Decisions

[Flowchart: for an incoming order, the questions "Accept?", "Is the machine busy?" and "Replenish inventory?" and the step "Decide how many units to use from inventory" lead to one of the decisions D1, D2, D3(r) or D4.]

The decision set $D[(n, c, i)]$ of state (n, c, i) contains:

D1 := reject and do not raise inventory level

D2 := reject and raise inventory level: $c = 0 \wedge i < I_{\max}$

D3(r) := accept, do not raise inventory level and satisfy order with r units from inventory: $n > 0 \wedge (c + u_n \le l_n + i \vee u_n \le i)$, $r \in \{r_{\min}, \dots, r_{\max}\}$

D4 := accept, satisfy order completely from inventory and raise inventory level: $n > 0 \wedge c = 0 \wedge u_n \le i$

n: order class

c: machine usage

i: inventory level
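The decision set of a state can be enumerated mechanically. The following is a minimal sketch, assuming the feasibility conditions read c = 0 and i < Imax for D2, n > 0 and (c + un ≤ ln + i or un ≤ i) for D3(r), and n > 0, c = 0 and un ≤ i for D4, with r ranging from min(max(0, c + un − ln), min(i, un)) to min(i, un). The names `OrderClass` and `feasible_decisions` are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderClass:
    margin: float   # m_n, profit margin
    usage: int      # u_n, capacity usage in periods
    lead_time: int  # l_n, lead time in periods


def feasible_decisions(n, c, i, classes, i_max):
    """Return the decisions available in state (n, c, i).

    classes[0] is the dummy class (no arrival). D3 decisions are
    returned as tuples ("D3", r), where r is the inventory used.
    """
    decisions = ["D1"]                       # rejecting is always possible
    if c == 0 and i < i_max:
        decisions.append("D2")               # idle machine may raise inventory
    if n > 0:
        u, l = classes[n].usage, classes[n].lead_time
        if c + u <= l + i or u <= i:
            r_max = min(i, u)
            r_min = min(max(0, c + u - l), r_max)
            decisions += [("D3", r) for r in range(r_min, r_max + 1)]
        if c == 0 and u <= i:
            decisions.append("D4")           # serve fully from stock, then restock
    return decisions


# Example: capacity usage 8 and lead time 5 (the margin 20.0 is made up).
example_classes = [OrderClass(0.0, 0, 0), OrderClass(20.0, 8, 5)]
print(feasible_decisions(1, 0, 3, example_classes, i_max=10))
```

With an idle machine, this order class needs at least un − ln = 3 units of inventory before acceptance becomes feasible, so the call above lists D3 only with r = 3.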

Rewards

$R_{D1} = R_{D2} = -h \cdot i$

$R_{D3(r)} = m_n - h \cdot (i - r)$

$R_{D4} = m_n - h \cdot (i - u_n)$

D1: reject and do not raise inventory level

D2: reject and raise inventory level

D3: accept and do not raise inventory level

D4: accept and raise inventory level

[Figure: four panels showing the inventory level over time, starting today. D1: the level stays at i. D2: the level rises from i to i+1. D3(r): order n reduces the level from i to i−r. D4: order n reduces the level from i to i−un.]
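The reward formulas RD1 = RD2 = −h·i, RD3(r) = mn − h·(i − r) and RD4 = mn − h·(i − un) translate directly into a small function. This is only an illustrative sketch; the function name `reward` and the list-based parameter layout are assumptions, and the sample numbers are made up.

```python
def reward(decision, n, i, margins, usages, h, r=0):
    """One-period reward of a decision taken in a state with order class n
    and inventory level i.

    margins[n] is m_n, usages[n] is u_n, h is the holding cost per unit
    of inventory per period, r is the inventory used under D3(r).
    """
    if decision in ("D1", "D2"):
        return -h * i                            # reject: only holding costs
    if decision == "D3":
        return margins[n] - h * (i - r)          # accept, use r units of stock
    if decision == "D4":
        return margins[n] - h * (i - usages[n])  # accept, serve fully from stock
    raise ValueError(f"unknown decision: {decision!r}")


# Hypothetical data: three order classes plus the dummy class 0.
margins = [0.0, 20.0, 60.0, 100.0]
usages = [0, 4, 4, 4]
print(reward("D3", 1, 3, margins, usages, h=1.0, r=2))
```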

Time-discrete Markov Decision Process

Objective: find the best action for every state in order to maximize the long-term average reward per period

Number of decision possibilities: $|D| = 4 + \min(\max_n u_n, I_{\max})$

[Figure: sequence of states over time, starting today; in each state a decision is taken and a reward is received, and the transition probabilities determine the next state.]
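The objective of maximizing the long-term average reward per period can be written out as a formula. The following is the standard average-reward criterion for a discrete-time Markov decision process; the symbols $\pi$ (policy), $s_t$ (state in period t), $d_t$ (decision in period t) and $g$ (average reward) are assumptions introduced here and do not appear on the slides.

```latex
g(\pi) \;=\; \liminf_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}_{\pi}\!\left[\, \sum_{t=1}^{T} R_{d_t}(s_t) \right],
\qquad
\pi^{*} \in \arg\max_{\pi} \, g(\pi)
```

Here $R_{d_t}(s_t)$ stands for the one-period reward of decision $d_t$ in state $s_t = (n, c, i)$.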

Transition Probabilities

Machine is busy:

$$P_{D1}[(n, c, i), (m, c-1, i)] = \begin{cases} p_m, & (n, c, i) \in \{S : c > 0\},\ m \in \{0, \dots, N\} \\ 0, & \text{else} \end{cases}$$

$$P_{D3(r)}[(n, c, i), (m, c+u_n-r-1, i-r)] = \begin{cases} p_m, & (n, c, i) \in S,\ m \in \{0, \dots, N\},\ r \in \{\min(\max(0, c+u_n-l_n), \min(i, u_n)), \dots, \min(i, u_n)\} \\ 0, & \text{else} \end{cases}$$

D1: reject and do not raise inventory level

D3: accept and do not raise inventory level

n, m: order class

c: machine usage

i: inventory level

[Figure: with a busy machine, state (n, c, i) moves with probability $p_m$ to $(m, c-1, i)$ under D1 and to $(m, c+u_n-r-1, i-r)$ under D3(r).]

Transition Probabilities

Machine is not busy:

$$P_{D1}[(n, 0, i), (m, 0, i)] = \begin{cases} p_m, & n, m \in \{0, \dots, N\},\ i \in \{0, \dots, I_{\max}\} \\ 0, & \text{else} \end{cases}$$

$$P_{D2}[(n, 0, i), (m, 0, i+1)] = \begin{cases} p_m, & n, m \in \{0, \dots, N\},\ i \in \{0, \dots, I_{\max}-1\} \\ 0, & \text{else} \end{cases}$$

$$P_{D3(r)}[(n, 0, i), (m, \max(0, u_n-r-1), i-r)] = \dots$$

$$P_{D4}[(n, 0, i), (m, 0, i-u_n+1)] = \begin{cases} p_m, & (n, c, i) \in S,\ m \in \{0, \dots, N\} \\ 0, & \text{else} \end{cases}$$

n, m: order class

c: machine usage

i: inventory level

[Figure: with an idle machine, state (n, 0, i) moves with probability $p_m$ to $(m, 0, i)$ under D1, to $(m, 0, i+1)$ under D2, to $(m, u_n-r-1, i-r)$ under D3(r) and to $(m, 0, i-u_n+1)$ under D4.]

D1: reject and do not raise inventory level

D2: reject and raise inventory level

D3: accept and do not raise inventory level

D4: accept and raise inventory level
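The transition rules for the busy and the idle machine can be combined in one successor function. The sketch below is an illustration under stated assumptions: the function name `next_state_distribution` is hypothetical, D1 is written as max(c − 1, 0) so that it covers both c > 0 and c = 0, and D3(r) is written as max(0, c + un − r − 1), which agrees with both the busy and the idle formula. The function does not check whether the decision is feasible in the given state.

```python
def next_state_distribution(state, decision, p, usages, r=0):
    """Return {next_state: probability} for a decision in state (n, c, i).

    p[m] is the arrival probability of order class m (m = 0 is the dummy
    class), usages[n] is u_n, and r is the inventory used under D3(r).
    """
    n, c, i = state
    u = usages[n]
    if decision == "D1":        # reject, inventory unchanged
        c_next, i_next = max(c - 1, 0), i
    elif decision == "D2":      # reject, idle machine produces one unit for stock
        c_next, i_next = 0, i + 1
    elif decision == "D3":      # accept, use r units from inventory
        c_next, i_next = max(0, c + u - r - 1), i - r
    elif decision == "D4":      # accept from stock, then restock one unit
        c_next, i_next = 0, i - u + 1
    else:
        raise ValueError(f"unknown decision: {decision!r}")
    # The next order class m arrives independently with probability p[m].
    return {(m, c_next, i_next): p_m for m, p_m in enumerate(p)}


# Hypothetical example: one real order class with u_1 = 4.
print(next_state_distribution((1, 2, 3), "D3", p=[0.4, 0.6], usages=[0, 4], r=1))
```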

Solution Procedure

This Markov decision process can be solved with standard methods, e.g. linear programming, policy iteration, or value iteration.

However, for large problem instances the computation times are too long (see Numerical Results).
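To illustrate one of the standard methods named here, the following is a generic sketch of relative value iteration for the average-reward criterion. It is not the authors' implementation: the function name, its callback-style interface and the two-state toy problem are assumptions chosen for brevity, and the method as written presumes a finite, unichain, aperiodic decision process.

```python
def relative_value_iteration(states, actions, transition, reward,
                             tol=1e-9, max_iter=100_000):
    """Approximate the optimal average reward per period and a greedy policy.

    actions(s) -> iterable of actions, transition(s, a) -> {s2: probability},
    reward(s, a) -> float. Returns (average reward estimate, policy dict).
    """
    v = {s: 0.0 for s in states}
    ref = states[0]                      # reference state keeps values bounded
    for _ in range(max_iter):
        q = {
            s: {
                a: reward(s, a)
                + sum(prob * v[s2] for s2, prob in transition(s, a).items())
                for a in actions(s)
            }
            for s in states
        }
        v_new = {s: max(q[s].values()) for s in states}
        diffs = [v_new[s] - v[s] for s in states]
        v = {s: v_new[s] - v_new[ref] for s in states}
        if max(diffs) - min(diffs) < tol:    # span of the update is small
            break
    policy = {s: max(q[s], key=q[s].get) for s in states}
    return 0.5 * (max(diffs) + min(diffs)), policy


# Toy problem (hypothetical): (state, action) -> (reward, next-state distribution)
TOY = {
    (0, "a"): (1.0, {0: 1.0}),
    (0, "b"): (0.0, {1: 1.0}),
    (1, "a"): (3.0, {0: 0.5, 1: 0.5}),
}
gain, policy = relative_value_iteration(
    states=[0, 1],
    actions=lambda s: [a for (s0, a) in TOY if s0 == s],
    transition=lambda s, a: TOY[(s, a)][1],
    reward=lambda s, a: TOY[(s, a)][0],
)
print(gain, policy)
```

Each sweep touches every state, every decision and every successor, which gives an idea of why run times grow quickly with the number of states.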

Solution Procedure

Heuristic:

Objective: find good policies in acceptable runtimes

Idea: reject "bad" order classes and accept "good" order classes

"Goodness" of an order class: relative profit margin $m_n / u_n$ [profit per unit of capacity usage]

[Figure: order classes 0 to 5, sorted in ascending order of relative profit margin, ranging from "reject under all circumstances" through "accept in favorable situations" to "accept if possible". Legend: reject, acceptance not possible; reject, although acceptance possible; accept.]

Solution Procedure

Consider an "accept in favorable situations" order class, e.g. n = 2 or n = 3:

Orders are more likely to be accepted at lower machine usage or higher inventory levels.

[Figure: acceptance decision for each combination of machine usage (0 to 7) and inventory level (0 to 10) for an order class with capacity usage $u_n = 8$ and lead time $l_n = 5$; the minimum inventory needed is $u_n - l_n = 3$. Legend: reject, acceptance not possible; reject, although acceptance possible; accept.]

Solution Procedure

The result is a combinatorial optimization problem in N dimensions.

Idea for heuristic: evaluate the average reward of certain policies $A^T = (a_1, a_2, \dots, a_N)$ via simulation and find good policies by comparing the simulation results.

Example: N = 5

[Figure: $a_n$ for each order class n = 1, ..., 5, on an axis running from 0 through $\max(0, u_n - l_n)$ and $I_{\max}$ to $I_{\max} + 1$.]

Solution Procedure

Policy i:

• order classes $n \in \{0, 1, \dots, i\}$ are completely rejected

• order classes $n \in \{i+1, \dots, N\}$ are completely accepted

• R(i): average reward of policy i

[Figure: $a_n$ over the order classes n = 1, ..., 5, with the levels 0, $\max(0, u_n - l_n)$, $I_{\max}$ and $I_{\max} + 1$; the first two policies to be compared, policy i = 0 and policy i = 1, are marked.]
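The average reward R(i) of a policy i can be estimated by simulating the process period by period. The sketch below is a simplified illustration, not the authors' simulation: it assumes that an accepted order uses as little inventory as possible, that an idle machine always replenishes inventory after a rejection, and that acceptance is feasible when c + un ≤ ln + i or un ≤ i. The function name `simulate_policy`, the tuple layout and all numbers in the example are hypothetical.

```python
import random


def simulate_policy(i_close, classes, p, i_max, h, periods=20_000, seed=0):
    """Estimate the average reward per period of 'policy i' by simulation.

    classes[n] = (m_n, u_n, l_n), with classes[0] = (0, 0, 0) the dummy
    class; p[n] is the arrival probability of class n. Order classes
    n <= i_close are rejected, classes n > i_close are accepted whenever
    acceptance is feasible.
    """
    rng = random.Random(seed)
    c, i, total = 0, 0, 0.0          # machine usage, inventory, reward sum
    for _ in range(periods):
        n = rng.choices(range(len(p)), weights=p)[0]
        m, u, l = classes[n]
        feasible = c + u <= l + i or u <= i
        if n > 0 and n > i_close and feasible:
            r = min(max(0, c + u - l), min(i, u))   # least inventory needed
            total += m - h * (i - r)                # reward of D3(r)
            c, i = max(0, c + u - r - 1), i - r
        else:
            total += -h * i                         # reward of D1 and D2
            if c == 0 and i < i_max:
                i += 1                              # D2: raise inventory
            else:
                c = max(c - 1, 0)                   # D1: machine works off orders
    return total / periods


# Hypothetical instance with three order classes.
example_classes = [(0.0, 0, 0), (20.0, 4, 10), (60.0, 4, 4), (100.0, 4, 2)]
example_p = [0.4, 0.36, 0.18, 0.06]
for i_close in range(4):
    print(i_close, simulate_policy(i_close, example_classes, example_p, i_max=2, h=1.0))
```

Using the same seed for every policy means that all policies see the same arrival stream, which makes the comparison of their simulated rewards less noisy.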

Solution Procedure

Procedure:

• Sort order classes in ascending order of their relative profit margins

• Close order classes successively, n = 1, 2, ..., until the maximum of the average reward is reached

• The last order class that was closed has the maximum reward $R^*$; it is called $n^*$

$R^* := R(0)$, $n^* := 0$

for i = 1, 2, ..., N

    if $R(i) < R^*$: end for

    $R^* := R(i)$, $n^* := i$

end for

[Figure: $a_n$ over the order classes n = 1, ..., 5, with the levels 0, $\max(0, u_n - l_n)$, $I_{\max}$ and $I_{\max} + 1$; in the example, $n^* = 2$.]
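The closing procedure can be sketched in a few lines. In this illustration, `avg_reward` is a hypothetical callable that returns R(i), for example from a simulation run; the search stops at the first policy whose reward is lower than the best one so far. How ties are handled is an assumption here (the search continues on equal rewards), and the reward values in the example are made up.

```python
def find_closing_class(avg_reward, num_classes):
    """Close order classes 1, 2, ... until the average reward drops.

    avg_reward(i) returns R(i), the average reward of the policy that
    rejects classes 0..i and accepts classes i+1..N.
    Returns (n_star, r_star).
    """
    r_star, n_star = avg_reward(0), 0
    for i in range(1, num_classes + 1):
        r_i = avg_reward(i)
        if r_i < r_star:        # reward got worse: stop closing classes
            break
        r_star, n_star = r_i, i
    return n_star, r_star


# Made-up rewards R(0), ..., R(5) for N = 5 order classes.
rewards = [1.0, 1.5, 1.8, 1.6, 2.0, 0.5]
print(find_closing_class(lambda i: rewards[i], num_classes=5))
```

Because the search stops at the first decrease, it does not look at R(4) = 2.0 in this example, which shows that the procedure finds a local rather than a guaranteed global maximum.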

Solution Procedure

Further improvement of the policy:

• Close half of the order class to the right of n*, i.e. n = n* + 1

• Open half of n*

• Determine which policy offers the maximum average reward

[Figure: two panels showing $a_n$ over the order classes n = 1, ..., 5, with the levels 0, $\max(0, u_n - l_n)$, $I_{\max}$ and $I_{\max} + 1$; the values $a_{n^*}$ and $a_{n^*+1}$ are marked.]

Numerical Results

Problem classes

| problem class | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| number of states | 10,000 | 50,000 | 100,000 | 500,000 | 1,000,000 |
| number of instances | 100 | 100 | 100 | 100 | 100 |
| order classes | [5,20] | [5,20] | [10,30] | [20,50] | [20,50] |
| maximum inventory | 10 | 15 | 20 | 50 | 100 |
| relative profit margin | [1,3] | [1,3] | [1,3] | [1,3] | [1,3] |
| maximum lead time | 151 | 520 | 423 | 466 | 471 |
| inventory cost | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| traffic intensity | [1.5,2.5] | [1.5,2.5] | [1.5,2.5] | [1.5,2.5] | [1.5,2.5] |

Numerical Results

Average reward per period: FCFS policy vs. value iteration algorithm

| problem class | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| proportion optimum [%] | 99 | 93 | 94 | 0 | 0 |
| runtime value iteration [sec.] | 82.3 | 880.9 | 1584.1 | 3681.3 | 3741.1 |
| average [%] | 4.4 | 3.8 | 4.0 | 2.4 | -8.5 |
| minimum [%] | 0.0 | 0.0 | 0.0 | -3.0 | -69.9 |
| maximum [%] | 18.3 | 33.9 | 34.2 | 22.2 | 8.6 |
| standard deviation [%] | 4.7 | 6.2 | 6.0 | 3.9 | 13.6 |

Numerical Results

Average reward per period: heuristic procedure vs. value iteration algorithm

| problem class | 1 | 2 | 3 |
| --- | --- | --- | --- |
| proportion optimum [%] | 99 | 93 | 94 |
| running time heuristic [sec.] | 42.8 | 92.8 | 115.3 |
| running time value iteration [sec.] | 82.3 | 880.9 | 1584.1 |
| average [%] | 1.7 | 1.8 | 1.5 |
| minimum [%] | 0.0 | 0.0 | 0.0 |
| maximum [%] | 17.9 | 33.9 | 23.1 |
| standard deviation [%] | 2.9 | 4.8 | 3.1 |

Numerical Results

Average reward per period: FCFS policy vs. heuristic procedure

| problem class | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| runtime FCFS [sec.] | 15.0 | 62.8 | 115.3 | 70.5 | 143.2 |
| runtime heuristic [sec.] | 42.8 | 92.8 | 58.3 | 254.8 | 206.9 |
| average [%] | 2.7 | 2.1 | 2.5 | 2.0 | 1.7 |
| minimum [%] | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| maximum [%] | 16.6 | 19.2 | 32.1 | 18.4 | 11.7 |
| standard deviation [%] | 3.8 | 4.1 | 5.1 | 2.8 | 2.5 |

Numerical Results

Example with three order classes

| order class | 1 | 2 | 3 |
| --- | --- | --- | --- |
| lead time | 10 | 4 | 2 |
| profit margin | 20.00 € | 60.00 € | 100.00 € |
| capacity usage | 4 | 4 | 4 |
| relative profit margin | 5.00 | 15.00 | 25.00 |
| relative traffic intensity | 60% | 30% | 10% |
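The relative profit margins of this example follow from dividing each profit margin by the capacity usage, mn / un. The short sketch below recomputes them and sorts the order classes in ascending order; the dictionary layout and variable names are assumptions made for illustration.

```python
# Order classes of the example: profit margin m_n (in euros) and capacity usage u_n.
example = {
    1: {"margin": 20.0, "usage": 4},
    2: {"margin": 60.0, "usage": 4},
    3: {"margin": 100.0, "usage": 4},
}

# Relative profit margin m_n / u_n per order class.
relative_margin = {n: d["margin"] / d["usage"] for n, d in example.items()}

# Order classes sorted ascending by relative profit margin.
ranking = sorted(relative_margin, key=relative_margin.get)

print(relative_margin)
print(ranking)
```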

Numerical Results

Average reward per period: heuristic procedure vs. value iteration algorithm

[Figure: influence of traffic intensity on average reward, low inventory holding costs = 1 €. x-axis: traffic intensity (50% to 250%); y-axis: average reward. Series: optimal policy and heuristic, each for low inventory capacity (2 units) and high inventory capacity (8 units).]

Numerical Results

Average reward per period: heuristic procedure vs. value iteration algorithm

[Figure: influence of inventory capacity on average reward, high traffic intensity = 200%. x-axis: inventory capacity (0 to 10); y-axis: average reward. Series: optimal policy and heuristic, each for low inventory holding cost (1 €) and high inventory holding cost (5 €). Annotation: steep ascent because one order class needs at least two units of inventory for acceptance.]


Thank you for your attention.