Markov Decision Models for Order Acceptance/Rejection Problems

Florian Defregger and Heinrich Kuhn
Catholic University of Eichstätt-Ingolstadt

Fifth International Conference on "Analysis of Manufacturing Systems - Production Management"

Zakynthos, May 24th, 2005

May 24, 2005 2

Structure

1. Introduction

2. Decision Problem

3. Markov Decision Model

4. Solution Procedure

5. Numerical Results


Introduction

Revenue Management (RM)

– Service industries (air transportation, hotels, car rental, etc.)

– Manufacturing industries (steel, paper, aluminum, etc.)

see Kniker/Burman (2001)

– Implementations of RM systems have increased profits by 2–10%.


Introduction

Which kind of manufacturing company could potentially use revenue management to increase the bottom line? One with

a) high fixed costs

b) a short-term increase of capacity to meet demand peaks that is very expensive or even impossible

c) demand that fluctuates over time

d) customers who are willing to pay different prices for essentially the same product


Decision problem

Assumptions

• A single bottleneck in the manufacturing process

• Orders:

  • specific price, volume, and lead time (due date)

  • at most one order arrives per period

  • arrivals are independent of one another

• Products can be made to stock

• Limited inventory capacity

• Infinite planning horizon


Decision problem

1. Accept order? yes/no

2. If yes, how much inventory should be used?

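[Figure: incoming orders pass an acceptance decision (yes/no); accepted orders are delivered from the machine and/or from inventory. The machine timeline shows orders k and m accepted before today and the new order n with its maximum lead time l_n.]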


Notation

• N order classes, $n \in \{1, \dots, N\}$.

• Each order belongs to exactly one order class n.

• Parameters for orders of class n:

  $m_n$: profit margin

  $u_n$: capacity usage

  $l_n$: lead time

  $p_n$: probability of arriving

Dummy order class 0 (no order arrives): $m_0 = u_0 = l_0 = 0$, $p_0 = 1 - \sum_{n=1}^{N} p_n$

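[Figure: at the beginning of each period ("today") an order of class n ∈ {0, 1, ..., N} arrives with probability p_n; an order of class n is characterized by m_n, u_n, and l_n.]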
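A minimal Python sketch of this arrival model, assuming order-class data are stored in lists indexed by n with the dummy class 0 at index 0. The helper names `arrival_probabilities` and `sample_order_class` are hypothetical, not taken from the slides.

```python
import random
from typing import List, Sequence


def arrival_probabilities(p_orders: Sequence[float]) -> List[float]:
    """Prepend the dummy class 0 ("no order") with p_0 = 1 - sum of p_n, n = 1..N."""
    p0 = 1.0 - sum(p_orders)
    if p0 < -1e-12:
        raise ValueError("arrival probabilities of the real order classes exceed 1")
    return [max(p0, 0.0)] + list(p_orders)


def sample_order_class(p: Sequence[float], rng: random.Random) -> int:
    """Draw the order class n in {0, ..., N} arriving at the start of a period."""
    return rng.choices(range(len(p)), weights=p)[0]


# Two real order classes arriving with probability 0.2 and 0.3 (illustrative values).
p = arrival_probabilities([0.2, 0.3])   # dummy class 0 receives the remaining mass
rng = random.Random(42)
draws = [sample_order_class(p, rng) for _ in range(10_000)]
```

Because arrivals are independent across periods, one draw per period is all the model needs; class 0 simply means that no order arrived.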


Notation

Inventory:

$I_{\max}$: maximum inventory level

$i$: inventory level, $i \in \{0, 1, \dots, I_{\max}\}$

$h$: inventory holding costs per unit of inventory per period

Inventory level i is expressed in machine periods, i.e., the number of periods the machine needed to produce that inventory.


Notation

[Figure: a sequence of states (n, c, i), starting today, linked by transition probabilities.]

States $(n, c, i) \in S$ (state space):

n: order class of the order that arrived at the beginning of the current period

c: number of periods for which the machine is reserved for orders that have been accepted but are not yet finished, $c \in \{0, 1, \dots, H\}$

i: current inventory level

$H - c$: available capacity within the considered horizon H

Problem size:

$|S| = (N+1) \cdot (H+1) \cdot (I_{\max}+1)$ (factors for n, c, and i), with $H = \max_n \max(l_n, 1)$

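[Figure: machine timeline starting today with accepted orders k and m and the new order n; lead times l_k, l_n, l_m, capacity usage u_n, and the maximum horizon H given by the maximum lead time.]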
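A short sketch of the state space, assuming H is passed in directly; `state_space_size` and `enumerate_states` are hypothetical helper names. It shows how quickly the product of the three factors grows.

```python
from itertools import product
from typing import List, Tuple

State = Tuple[int, int, int]  # (n, c, i)


def state_space_size(N: int, H: int, i_max: int) -> int:
    """|S| = (N + 1) * (H + 1) * (I_max + 1)."""
    return (N + 1) * (H + 1) * (i_max + 1)


def enumerate_states(N: int, H: int, i_max: int) -> List[State]:
    """All states (n, c, i) with n in {0..N}, c in {0..H}, i in {0..I_max}."""
    return list(product(range(N + 1), range(H + 1), range(i_max + 1)))


S = enumerate_states(N=2, H=3, i_max=4)
```

Even moderate values multiply quickly: N = 20, H = 100, and I_max = 10 already give 21 · 101 · 11 = 23,331 states.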


Sequence of Decisions

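[Figure: decision tree. For an incoming order, decide whether to accept it; then decide whether to replenish inventory, which is only possible if the machine is not busy. Reject without replenishing: D1; reject and replenish: D2; accept without replenishing, deciding how many units r to use from inventory: D3(r); accept and replenish: D4.]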

$D[(n, c, i)]$: set of feasible decisions in state (n, c, i):

$D1$ := reject and do not raise inventory level

$D2$ := reject and raise inventory level: $c = 0 \wedge i < I_{\max}$

$D3(r)$ := accept, do not raise inventory level, and satisfy the order with r units from inventory: $n > 0 \wedge (c + u_n \le l_n + i \vee u_n \le i)$, $r \in \{r_{\min}, \dots, r_{\max}\}$

$D4$ := accept, satisfy the order completely from inventory, and raise inventory level: $n > 0 \wedge c = 0 \wedge u_n \le i$

n: order class

c: machine usage

i: inventory level
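The feasibility conditions above can be turned into a small enumeration routine. This is a sketch under stated assumptions: decisions are encoded as the strings "D1", "D2", "D4" or the tuple ("D3", r), r_min and r_max follow the range given on the transition-probability slide, and `feasible_decisions` is a hypothetical name.

```python
from typing import List, Sequence, Tuple, Union

Decision = Union[str, Tuple[str, int]]  # "D1", "D2", "D4" or ("D3", r)


def feasible_decisions(n: int, c: int, i: int, u: Sequence[int],
                       l: Sequence[int], i_max: int) -> List[Decision]:
    """Decision set D[(n, c, i)] built from the slide's feasibility conditions."""
    decisions: List[Decision] = ["D1"]           # reject, keep inventory: always allowed
    if c == 0 and i < i_max:
        decisions.append("D2")                    # reject; idle machine raises inventory
    if n > 0:
        un, ln = u[n], l[n]
        if c + un <= ln + i or un <= i:           # the order can meet its lead time
            r_max = min(i, un)
            r_min = min(max(0, c + un - ln), r_max)
            decisions += [("D3", r) for r in range(r_min, r_max + 1)]
        if c == 0 and un <= i:
            decisions.append("D4")                # serve from stock and replenish
    return decisions


# Order class 1 with u_1 = 8 and l_1 = 5 (values from the acceptance-region example).
u, l = [0, 8], [0, 5]
```

For instance, in state (1, 0, 3) only r = 3 is feasible, which matches the minimum of u_n - l_n = 3 units of inventory needed in that example. The length of the returned list never exceeds the bound |D| given on the Markov decision process slide.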



Rewards

[Figure: inventory level after each decision: D1 keeps it at i, D2 raises it to i + 1, D3(r) lowers it to i - r, and D4 lowers it to i - u_n.]

$R_{D1} = R_{D2} = -h \cdot i$

$R_{D3(r)} = m_n - h \cdot (i - r)$

$R_{D4} = m_n - h \cdot (i - u_n)$

D1: reject and do not raise inventory level

D2: reject and raise inventory level

D3: accept and do not raise inventory level

D4: accept and raise inventory level

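The four reward expressions translate directly into a one-period reward function. A sketch, assuming the same hypothetical decision encoding ("D1", "D2", "D4", or ("D3", r)) and list-based parameters m and u indexed by order class.

```python
from typing import Sequence, Tuple, Union

Decision = Union[str, Tuple[str, int]]  # "D1", "D2", "D4" or ("D3", r)


def reward(decision: Decision, n: int, i: int,
           m: Sequence[float], u: Sequence[int], h: float) -> float:
    """One-period reward R_D in a state with order class n and inventory level i."""
    if decision in ("D1", "D2"):
        return -h * i                    # holding cost on the current inventory only
    if decision == "D4":
        return m[n] - h * (i - u[n])     # margin, holding cost after shipping u_n units
    kind, r = decision
    if kind == "D3":
        return m[n] - h * (i - r)        # margin, holding cost after using r units
    raise ValueError(f"unknown decision {decision!r}")
```

Note that the holding cost is charged on the inventory left after the order is served, so using more stock slightly raises the one-period reward but removes buffer for later orders.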


Time-discrete Markov Decision Process

Objective: find the best action for every state in order to maximize the long-term average reward per period

Number of decision possibilities: $|D| = 4 + \max_n \min(u_n, I_{\max})$

[Figure: a chain of states over time starting today; in each state a decision is taken and a reward earned, and the process moves to the next state according to the transition probabilities.]


Transition Probabilities (machine is busy)

$P_{D1}[(n, c, i), (m, c - 1, i)] = \begin{cases} p_m, & (n, c, i) \in \{S : c > 0\},\ m \in \{0, \dots, N\} \\ 0, & \text{else} \end{cases}$

$P_{D3(r)}[(n, c, i), (m, c + u_n - r - 1, i - r)] = \begin{cases} p_m, & (n, c, i) \in S,\ m \in \{0, \dots, N\},\ r \in \{\min(\max(0, c + u_n - l_n), \min(i, u_n)), \dots, \min(i, u_n)\} \\ 0, & \text{else} \end{cases}$

n, m: order class

c: machine usage

i: inventory level



[Figure: machine is busy. From state (n, c, i), D1 leads to (m, c - 1, i) and D3(r) leads to (m, c + u_n - r - 1, i - r), each with probability p_m.]


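Transition Probabilities (machine is not busy)

$P_{D1}[(n, 0, i), (m, 0, i)] = \begin{cases} p_m, & n, m \in \{0, \dots, N\},\ i \in \{0, \dots, I_{\max}\} \\ 0, & \text{else} \end{cases}$

$P_{D2}[(n, 0, i), (m, 0, i + 1)] = \begin{cases} p_m, & n, m \in \{0, \dots, N\},\ i \in \{0, \dots, I_{\max} - 1\} \\ 0, & \text{else} \end{cases}$

$P_{D3(r)}[(n, 0, i), (m, \max(0, u_n - r - 1), i - r)] = \dots$

$P_{D4}[(n, 0, i), (m, 0, i - u_n + 1)] = \begin{cases} p_m, & (n, 0, i) \in S,\ m \in \{0, \dots, N\} \\ 0, & \text{else} \end{cases}$

n, m: order class

c: machine usage

i: inventory level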

[Figure: machine is not busy. From state (n, 0, i), each with probability p_m: D1 leads to (m, 0, i); D2 (reject and raise inventory level) leads to (m, 0, i + 1); D3(r) leads to (m, u_n - r - 1, i - r); D4 (accept and raise inventory level) leads to (m, 0, i - u_n + 1).]
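The busy and idle cases on the last two slides can be sketched as one next-state function. Assumptions: the hypothetical decision encoding from before, a decision that is already feasible, and the busy/idle distinction folded into `max(0, ...)`. For c > 0 and r ≤ u_n this reproduces c - 1 for D1 and c + u_n - r - 1 for D3(r) exactly.

```python
from typing import Dict, Sequence, Tuple, Union

State = Tuple[int, int, int]                 # (n, c, i)
Decision = Union[str, Tuple[str, int]]       # "D1", "D2", "D4" or ("D3", r)


def transitions(state: State, decision: Decision,
                p: Sequence[float], u: Sequence[int]) -> Dict[State, float]:
    """Next-state distribution {(m, c', i'): p_m} for a feasible decision."""
    n, c, i = state
    if decision == "D1":                     # reject; a busy machine works one period
        c_next, i_next = max(0, c - 1), i
    elif decision == "D2":                   # reject; idle machine adds one unit to stock
        c_next, i_next = 0, i + 1
    elif decision == "D4":                   # serve from stock; idle machine replenishes
        c_next, i_next = 0, i - u[n] + 1
    else:                                    # ("D3", r): take r units from stock
        _, r = decision
        c_next, i_next = max(0, c + u[n] - r - 1), i - r
    # The next order class m is drawn independently with probability p_m.
    return {(m, c_next, i_next): pm for m, pm in enumerate(p) if pm > 0}


p, u = [0.5, 0.5], [0, 8]
```

Only the order-class component of the next state is random; machine usage and inventory evolve deterministically given the decision.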


Solution Procedure

This Markov Decision Process can be solved via standard methods, e.g., linear programming, policy iteration, or value iteration.

However, for large problem instances the computation times are too long (see Numerical Results).
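As an illustration of the value-iteration route, here is a sketch of relative value iteration, a standard variant of value iteration for the long-run average reward criterion. The slides do not specify which variant was used; this one assumes a unichain, aperiodic MDP. P and R would be built from the decision sets, rewards, and transition probabilities defined above, and the toy two-state MDP is invented purely to exercise the code.

```python
from typing import Any, Dict, List, Tuple

# P[s][a] = list of (next_state, probability); R[s][a] = one-period reward.
Transitions = Dict[Any, Dict[Any, List[Tuple[Any, float]]]]
Rewards = Dict[Any, Dict[Any, float]]


def relative_value_iteration(P: Transitions, R: Rewards,
                             tol: float = 1e-9, max_iter: int = 100_000):
    """Return (gain, policy) for the long-run average reward per period.

    Assumes a unichain, aperiodic MDP so that the iteration converges."""
    states = list(P)
    ref = states[0]                          # reference state pins the relative values
    h = {s: 0.0 for s in states}
    for _ in range(max_iter):
        q = {s: {a: R[s][a] + sum(prob * h[t] for t, prob in P[s][a]) for a in P[s]}
             for s in states}
        v = {s: max(q[s].values()) for s in states}
        gain = v[ref]                        # current estimate of the average reward
        h_new = {s: v[s] - gain for s in states}
        if max(abs(h_new[s] - h[s]) for s in states) < tol:
            policy = {s: max(q[s], key=q[s].get) for s in states}
            return gain, policy
        h = h_new
    raise RuntimeError("relative value iteration did not converge")


# Toy MDP: state 1 is absorbing with reward 2, so the optimal average reward is 2.
P = {0: {"stay": [(0, 0.5), (1, 0.5)], "go": [(1, 1.0)]}, 1: {"stay": [(1, 1.0)]}}
R = {0: {"stay": 1.5, "go": 0.0}, 1: {"stay": 2.0}}
gain, policy = relative_value_iteration(P, R)
```

Each sweep touches every state and every feasible decision, so the cost per iteration grows with |S| · |D|. This is consistent with the long value-iteration runtimes reported for the larger problem classes.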


Heuristic:

Objective: find good policies in acceptable runtimes

Idea: reject "bad" order classes and accept "good" order classes

"Goodness" of an order class: relative profit margin $m_n / u_n$ [profit per unit of capacity usage]


[Figure: order classes 0 to 5, sorted ascending by relative profit margin and grouped into "reject under all circumstances", "accept in favorable situations", and "accept if possible"; legend: reject, acceptance not possible; reject, although acceptance possible; accept.]
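The ranking step is a plain sort on $m_n / u_n$. A minimal sketch with a hypothetical helper name, excluding the dummy class 0 (u_0 = 0). The sample data are the three-class example from the numerical results (margins 20, 60, and 100 € with u_n = 4).

```python
from typing import List, Sequence


def sort_by_relative_margin(m: Sequence[float], u: Sequence[int]) -> List[int]:
    """Order classes 1..N sorted ascending by relative profit margin m_n / u_n."""
    return sorted(range(1, len(m)), key=lambda n: m[n] / u[n])


m, u = [0.0, 20.0, 60.0, 100.0], [0, 4, 4, 4]
ranking = sort_by_relative_margin(m, u)
margins = [m[n] / u[n] for n in ranking]
```

The classes at the front of this ranking are the first candidates for rejection.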


Consider an “accept in favorable situations” order class, e.g. n=2 or n=3:

Acceptance becomes more likely at lower machine usage or higher inventory levels.


[Figure: acceptance regions for an order class with capacity usage u_n = 8 and lead time l_n = 5, plotted over machine usage and inventory level (0 to 10). Even with an idle machine, at least u_n - l_n = 3 units of inventory are needed for acceptance. Legend: reject, acceptance not possible; reject, although acceptance possible; accept.]
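The "acceptance not possible" region follows from the lead-time condition alone. A sketch with hypothetical helper names, using u_n = 8 and l_n = 5 from the slide. Which feasible states are actually accepted is a policy decision and is not modeled here.

```python
from typing import Dict, Tuple


def min_inventory_needed(c: int, u_n: int, l_n: int) -> int:
    """Units that must come from stock so that the rest finishes within l_n periods."""
    return max(0, c + u_n - l_n)


def can_accept(c: int, i: int, u_n: int, l_n: int) -> bool:
    """Feasible if stock covers the shortfall or the whole order (D3 condition)."""
    return c + u_n <= l_n + i or u_n <= i


# Feasibility grid over machine usage c = 0..5 and inventory level i = 0..10.
grid: Dict[Tuple[int, int], bool] = {
    (c, i): can_accept(c, i, 8, 5) for c in range(6) for i in range(11)
}
```

Each extra period of machine usage raises the inventory needed by one unit, which produces the staircase shape of the region.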


The result is a combinatorial optimization problem in N dimensions.

Idea for the heuristic: evaluate the average reward of selected policies $A^T = (a_1, a_2, \dots, a_N)$ via simulation, and find good policies by comparing the simulation results.

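Example: N = 5

[Figure: values for order classes n = 1, ..., 5 on a scale from 0 to I_max + 1, with the levels max(0, u_n - l_n), I_max, and I_max + 1 marked.]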
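A minimal simulation sketch for estimating a policy's average reward. The slides do not define how a_n is applied, so the policy here is a hypothetical reading: accept a feasible class-n order if the inventory level is at least a[n] (a[n] = I_max + 1 never accepts), always use the smallest feasible amount of stock, and otherwise let an idle machine replenish inventory. `simulate_average_reward` and all sample parameters are illustrative.

```python
import random
from typing import Sequence


def simulate_average_reward(a: Sequence[int], m: Sequence[float], u: Sequence[int],
                            l: Sequence[int], p: Sequence[float], h: float,
                            i_max: int, periods: int = 100_000, seed: int = 0) -> float:
    """Estimate the average reward per period of a (hypothetical) threshold policy."""
    rng = random.Random(seed)
    classes = range(len(p))
    c, i, total = 0, 0, 0.0
    for _ in range(periods):
        n = rng.choices(classes, weights=p)[0]       # order arriving this period
        r = min(max(0, c + u[n] - l[n]), u[n])       # smallest feasible stock use
        if n > 0 and r <= i and i >= a[n]:           # accept via D3(r)
            total += m[n] - h * (i - r)
            c, i = max(0, c + u[n] - r - 1), i - r
        else:
            total += -h * i                          # D1 or D2 reward
            if c == 0 and i < i_max:                 # D2: idle machine adds one unit
                i += 1
            else:                                    # D1: busy machine keeps working
                c = max(0, c - 1)
    return total / periods


m, u, l, p = [0.0, 20.0], [0, 8], [0, 5], [0.5, 0.5]
reject_all = simulate_average_reward([0, 6], m, u, l, p, h=0.01, i_max=5, periods=10_000)
accept_all = simulate_average_reward([0, 0], m, u, l, p, h=0.01, i_max=5, periods=10_000)
```

Rejecting everything just fills the stock to I_max and pays holding costs, so its average reward approaches -h · I_max = -0.05. Comparing such estimates across threshold vectors is the simulation comparison the heuristic relies on.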



Policy i:

• order classes $n \in \{0, 1, \dots, i\}$ are completely rejected

• order classes $n \in \{i+1, \dots, N\}$ are completely accepted

• $R(i)$: average reward of policy i

[Figure: a_n for order classes n = 1, ..., 5 on a scale from 0 to I_max + 1, with max(0, u_n - l_n) and I_max marked; policies i = 0 and i = 1 are the first two policies to be compared.]



Procedure:

• Sort order classes ascending by their relative profit margins

• Close order classes successively, n = 1, 2, ..., until the maximum average reward is reached

• The last order class closed in the policy with the maximum average reward R* is called n*

R* := R(0); n* := 0

for n = 1, 2, ..., N

    if R(n) > R* then R* := R(n); n* := n endif

endfor

n

1 2 3 4 5

Imax+1

0

n* = 2

max (0, un - ln )

Imax
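The successive-closing loop can be sketched in Python with R passed in as a callable, for example a wrapper around a simulation estimate. `close_classes_successively` is a hypothetical name, and the reward values in the usage lines are made up only to exercise the loop.

```python
from typing import Callable, Tuple


def close_classes_successively(R: Callable[[int], float], N: int) -> Tuple[int, float]:
    """Return (n*, R*): the policy index i in {0..N} with the highest reward R(i).

    Policy i rejects the i order classes with the lowest relative profit margins
    and accepts all others."""
    n_star, r_star = 0, R(0)
    for n in range(1, N + 1):
        r = R(n)
        if r > r_star:
            n_star, r_star = n, r
    return n_star, r_star


rewards = [1.0, 3.0, 5.0, 4.0, 2.0, 0.0]   # illustrative R(0), ..., R(5)
n_star, r_star = close_classes_successively(lambda i: rewards[i], N=5)
```

Stopping at the first decrease of R, as the bullet wording suggests, would save simulation runs but assumes that R(i) is unimodal in i.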


Further improvement of the policy:

• Close half of the order class to the right of n*, i.e., n = n* + 1

• Open half of n*

• Determine which policy offers the maximum average reward


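[Figure: two candidate policies shown as a_n over order classes n = 1, ..., 5 between max(0, u_n - l_n) and I_max + 1: one changes a_{n*+1} (class n* + 1 half closed), the other changes a_{n*} (class n* half opened).]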
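A sketch of this improvement step under a hypothetical reading of "half": a_n is placed midway (integer division) between its fully open level max(0, u_n - l_n), passed in as `lower`, and its fully closed level I_max + 1. Index 0 holds the dummy class, the helper names are illustrative, and `evaluate` stands for any policy evaluation such as a simulation estimate.

```python
from typing import Callable, List, Sequence, Tuple


def half_open_close_candidates(a: Sequence[int], lower: Sequence[int],
                               i_max: int, n_star: int) -> List[List[int]]:
    """Candidates: class n* + 1 half closed, and class n* half opened."""
    closed = i_max + 1
    candidates = []
    if n_star + 1 < len(a):                        # half close class n* + 1
        b = list(a)
        b[n_star + 1] = (lower[n_star + 1] + closed) // 2
        candidates.append(b)
    if n_star >= 1:                                # half open class n*
        b = list(a)
        b[n_star] = (lower[n_star] + closed) // 2
        candidates.append(b)
    return candidates


def improve_policy(a: Sequence[int], lower: Sequence[int], i_max: int, n_star: int,
                   evaluate: Callable[[Sequence[int]], float]) -> Tuple[List[int], float]:
    """Keep whichever of the current policy and its candidates scores best."""
    best, best_r = list(a), evaluate(a)
    for b in half_open_close_candidates(a, lower, i_max, n_star):
        r = evaluate(b)
        if r > best_r:
            best, best_r = b, r
    return best, best_r


# Illustrative data: N = 5, I_max = 10, classes 1 and 2 closed (n* = 2).
a, lower = [0, 11, 11, 1, 0, 2], [0, 0, 3, 1, 0, 2]
cands = half_open_close_candidates(a, lower, i_max=10, n_star=2)
best, best_r = improve_policy(a, lower, 10, 2, lambda b: -float(sum(b)))
```

The toy `evaluate` only exercises the selection logic. In the heuristic it would be replaced by the simulated average reward of each threshold vector.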


Numerical Results

Problem classes

problem class            1           2           3           4           5
number of states         10,000      50,000      100,000     500,000     1,000,000
number of instances      100         100         100         100         100
order classes            [5,20]      [5,20]      [10,30]     [20,50]     [20,50]
maximum inventory        10          15          20          50          100
relative profit margin   [1,3]       [1,3]       [1,3]       [1,3]       [1,3]
maximum lead time        151         520         423         466         471
inventory cost           0.01        0.01        0.01        0.01        0.01
traffic intensity        [1.5,2.5]   [1.5,2.5]   [1.5,2.5]   [1.5,2.5]   [1.5,2.5]




Average reward per period: FCFS policy vs. value iteration algorithm

problem class                     1        2        3         4         5
proportion optimum [%]            99       93       94        0         0
runtime value iteration [sec.]    82.3     880.9    1584.1    3681.3    3741.1
average [%]                       4.4      3.8      4.0       2.4       -8.5
minimum [%]                       0.0      0.0      0.0       -3.0      -69.9
maximum [%]                       18.3     33.9     34.2      22.2      8.6
standard deviation [%]            4.7      6.2      6.0       3.9       13.6



Average reward per period: heuristic procedure vs. value iteration algorithm

problem class                         1        2        3
proportion optimum [%]                99       93       94
running time heuristic [sec.]         42.8     92.8     115.3
running time value iteration [sec.]   82.3     880.9    1584.1
average [%]                           1.7      1.8      1.5
minimum [%]                           0.0      0.0      0.0
maximum [%]                           17.9     33.9     23.1
standard deviation [%]                2.9      4.8      3.1




Average reward per period: FCFS policy vs. heuristic procedure

problem class               1        2        3        4        5
runtime FCFS [sec.]         15.0     62.8     115.3    70.5     143.2
runtime heuristic [sec.]    42.8     92.8     58.3     254.8    206.9
average [%]                 2.7      2.1      2.5      2.0      1.7
minimum [%]                 0.0      0.0      0.0      0.0      0.0
maximum [%]                 16.6     19.2     32.1     18.4     11.7
standard deviation [%]      3.8      4.1      5.1      2.8      2.5



Example with three order classes

order class                  1          2          3
lead time                    10         4          2
profit margin                20.00 €    60.00 €    100.00 €
capacity usage               4          4          4
relative profit margin       5.00       15.00      25.00
relative traffic intensity   60%        30%        10%



[Figure: Average reward per period, heuristic procedure vs. value iteration algorithm. Influence of traffic intensity (x-axis, 50% to 250%) on average reward (y-axis) with low inventory holding costs (1 €); curves: optimal policy and heuristic, each for low inventory capacity (2 units) and high inventory capacity (8 units).]



[Figure: Average reward per period, heuristic procedure vs. value iteration algorithm. Influence of inventory capacity (x-axis, 0 to 10) on average reward (y-axis) at high traffic intensity (200%); curves: optimal policy and heuristic, each for low (1 €) and high (5 €) inventory holding costs. The steep ascent occurs because one order class needs at least two units of inventory for acceptance.]


Thank you for your attention.
