MIT and James Orlin © 2003

Stochastic Dynamic Programming

– Review
– DP with probabilities


Overview

Objective: illustrate the use of DP with probabilities

It seems more complex because the decision at each stage is more complex.

But the optimal decision at each stage can still be determined from the results for the previous stages.


Review of DP using stages

Capital Budgeting, again

Investment               1     2     3     4     5     6
Cash required ($1000s)   $5    $7    $4    $3    $4    $6
NPV added ($1000s)       $16   $22   $12   $8    $11   $19

Investment budget = $14,000


The Dynamic programming stages and states

Let f(k,B) be the best NPV limited to stocks 1, 2, …, k only and using a budget of at most B.

Stages: at stage k consider only stocks 1, 2, …, k

State: B is the budget

Compute f(1, B) for B = 0 to 14.

Then compute f(2, B) for B = 0 to 14.

Then compute f(3, B) for B = 0 to 14.

etc.


Capital Budgeting: stage 1

Consider stock 1: cost $5, NPV: $16

f(1, B) = 0 for B = 0 to 4

f(1, B) = 16 for B >= 5.

f(k, B), where B is the budget used up:

B     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
S1    0   0   0   0   0  16  16  16  16  16  16  16  16  16  16


Capital Budgeting: stage 2

Consider stock 1: cost $5, NPV: $16

Consider stock 2: cost $7, NPV: $22

f(2, B) = 0 for B = 0 to 4
f(2, B) = 16 for B = 5, 6
f(2, B) = 22 for B = 7 to 11
f(2, B) = 38 for B = 12 to 14

f(k, B), where B is the budget used up:

B     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
S1    0   0   0   0   0  16  16  16  16  16  16  16  16  16  16
S2    0   0   0   0   0  16  16  22  22  22  22  22  38  38  38


Capital Budgeting: stage 3, using DP

f(2, B), where B is the budget used up:

B     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
S2    0   0   0   0   0  16  16  22  22  22  22  22  38  38  38

Consider stock 3: cost $4, NPV: $12

We can compute f(3, B) using f(2, · ) as input. We illustrate on f(3, 9).

[Decision diagram for <3,9>]
Buy stock 3: move to <2,5>, value $12 + $16 = $28
Don’t buy stock 3: move to <2,9>, value $22

Choose the best decision.


On the DP for the Capital Budgeting Problem

[Decision diagram for <3,9>]
Buy stock 3: move to <2,5>, value $12 + $16 = $28
Don’t buy stock 3: move to <2,9>, value $22

f(3,9) = max [ 12 + f(2, 5), f(2,9) ]

f(3, B) = f(2, B) for B = 0, 1, 2, 3

f(3, B) = max [12 + f(2, B-4), f(2, B) ] for B = 4 to 14.

In general, f(k, B) can be computed from f(k-1, · )
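The recursion above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own implementation: the function name `capital_budgeting` and the list-of-lists table layout are assumptions made for this example, while the costs, NPVs, and budget come from the capital budgeting slide.

```python
# Data from the capital budgeting slide (in $1000s).
COSTS = [5, 7, 4, 3, 4, 6]
NPVS = [16, 22, 12, 8, 11, 19]
BUDGET = 14


def capital_budgeting(costs, npvs, budget):
    """Return a table f with f[k][B] = best NPV using stocks 1..k and budget at most B."""
    n = len(costs)
    # Stage 0 (no stocks considered): the best NPV is 0 for every budget.
    f = [[0] * (budget + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        cost, npv = costs[k - 1], npvs[k - 1]
        for B in range(budget + 1):
            f[k][B] = f[k - 1][B]  # don't buy stock k
            if B >= cost:
                # buy stock k, then use the remaining budget on stocks 1..k-1
                f[k][B] = max(f[k][B], npv + f[k - 1][B - cost])
    return f


f = capital_budgeting(COSTS, NPVS, BUDGET)
print("f(2, B) for B = 0..14:", f[2])
print("f(3, 9) =", f[3][9])
print("f(6, 14) =", f[6][BUDGET])
```

Row `f[2]` reproduces the S2 row of the slides, and `f[3][9]` is the value worked out in the decision diagram.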


Decision Diagrams

[Decision diagram for <3,9>]
Buy stock 3: move to <2,5>, value $12 + $16 = $28
Don’t buy stock 3: move to <2,9>, value $22

The above diagram is a decision diagram.

The optimal decision at each stage can be determined from decisions at previous stages.

We may view the diagram as a “local decision diagram” since it involves only a small part of the overall decision.

We use an extension of this approach when we deal with dynamic programming under uncertainty.


Dynamic Programming under uncertainty

Next: we will permit uncertainties in our DPs.

This is usually where DP becomes much more powerful as a tool, but also more complex.

We illustrate with an example in warfare, or gaming if you prefer.


Destroying an enemy target: a bomber example

You are a pilot in enemy territory. Your mission is to destroy an important target. You must get through. You have four minutes to reach your target, and have just been spotted by radar.

The enemy can launch up to one bomber per minute to prevent you from reaching the target. The probability that a bomber is launched in minute i is q_i, for i = 1 to 4.


A bomber example, continued

To protect yourself, you have M missiles. Each has a probability p_j of destroying the bomber.

Whenever you see a bomber, you must decide how many missiles to launch. If you do not destroy the bomber, then you will be destroyed.

Determine a strategy for how many missiles to launch at each time, assuming you see a bomber attacking you.

– Let f(k, m) be the number of missiles to launch assuming that you have k minutes left and have m missiles on hand.

– A strategy is to determine f(k, m) for k = 1 to 4 and m = 1 to M.


Simulating the bomber example

Each person has a die and a page describing the probabilities.

Simulate 1 or more instances of the game.
– We will discuss the results
– Then we will show how to determine an optimal strategy using DP
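For readers without a die at hand, the game can also be simulated in software. The sketch below is only an illustration: the function `simulate`, its `fire_rule` callback, the trial count, and the seed are all assumptions made for this example. It uses a launch probability of 2/3 in every minute and a hit probability of 1/3 per missile, which are the values used in the worked example later in these slides.

```python
import random


def simulate(minutes, missiles, fire_rule, q=2 / 3, p=1 / 3, trials=100_000, seed=0):
    """Estimate the probability of reaching the target under a given firing rule.

    fire_rule(k, m) is a hypothetical callback returning how many missiles to
    fire when a bomber appears with k minutes left and m missiles on hand.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        m = missiles
        survived = True
        for k in range(minutes, 0, -1):
            if rng.random() < q:  # a bomber is launched this minute
                j = max(0, min(m, fire_rule(k, m)))
                m -= j
                # the bomber is destroyed if at least one of the j missiles hits
                hit = any(rng.random() < p for _ in range(j))
                if not hit:
                    survived = False
                    break
        wins += int(survived)
    return wins / trials


# One minute left, 4 missiles, fire everything at the first bomber seen.
estimate = simulate(1, 4, lambda k, m: m)
print(f"estimated survival probability: {estimate:.3f}")
```

Different firing rules can be compared by passing a different `fire_rule`, for example `lambda k, m: 1` to fire a single missile at each bomber.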


What is the probability of surviving with 1 minute remaining and 4 missiles left?

There is one minute left. You have 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3. If a bomber is launched, how many missiles do you fire? What is the probability of survival?

Step 1. Draw the diagram.

[Decision diagram for <1,4>: 1 minute left, 4 missiles]
bomber launched?
  no: You win!
  yes: Fire 1 missile, 2 missiles, 3 missiles, or 4 missiles, then hit?
    yes: You win!
    no: You lose.

Firing all missiles is clearly optimal with one minute to go.


Step 2. Fill in probabilities and end values

The probability of a launched bomber is 2/3.

The probability of a missile hitting the bomber is 1/3.

What is the probability of survival?

Fill in end values (probability of survival): You win! = 1, You lose. = 0

Fill in probabilities of events: bomber launched? yes = 2/3, no = 1/3

The probability of 4 missiles missing is (2/3)^4 = 16/81, so for hit? the branches are yes = 65/81, no = 16/81.


Step 3. Compute values at each node.

The probability of a launched bomber is 2/3.

The probability of a missile hitting the bomber is 1/3.

[Decision diagram for <1,4>: 1 minute left, 4 missiles. Nodes: B = bomber launched?, F = fire, H = hit?]

Compute values at each node, moving from right to left.

Value(H) = 65/81 × 1 + 16/81 × 0 = 65/81

Value(F) = Value(H) = 65/81

Value(B) = 1/3 × 1 + 2/3 × 65/81 = 211/243 = .868
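The same right-to-left calculation can be done exactly with Python's `fractions` module. This is a minimal sketch: the function name `stage1_value` is made up for this example, and it assumes, as the slides argue, that all remaining missiles are fired when one minute is left.

```python
from fractions import Fraction

Q = Fraction(2, 3)  # probability that a bomber is launched (from the slides)
P = Fraction(1, 3)  # probability that a single missile hits (from the slides)


def stage1_value(m, q=Q, p=P):
    """Survival probability with 1 minute left and m missiles, firing all m."""
    miss_all = (1 - p) ** m                      # every missile misses
    value_h = (1 - miss_all) * 1 + miss_all * 0  # hit? node
    value_f = value_h                            # fire node: firing all m is the only choice considered
    return (1 - q) * 1 + q * value_f             # bomber launched? node


for m in range(12):
    v = stage1_value(m)
    print(f"{m:2d} missiles: {v} = {float(v):.3f}")
```

For m = 4 this gives 211/243, matching Value(B) above.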


Carry out similar calculations for other values at stage 1, that is, with one minute remaining.

Calculations for stage 1:

Number of missiles remaining   0     1     2     3     4     5     6     7     8     9     10    11
Probability of surviving       .333  .556  .704  .802  .868  .912  .941  .961  .974  .983  .988  .992

We next do a stage 2 calculation, which will be typical of all other calculations.


Diagram for Determining Number of Missiles to Fire

There are two minutes left. You have 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3. If a bomber is launched, how many missiles do you fire?

Step 1, lay out the diagram.

[Decision diagram for <2,4>: 2 minutes left, 4 missiles]
bomber launched?
  no: move to <1,4>
  yes: Fire
    1 missile, hit? yes: move to <1,3>; no: Lose
    2 missiles, hit? yes: move to <1,2>; no: Lose
    3 missiles, hit? yes: move to <1,1>; no: Lose
    4 missiles, hit? yes: move to <1,0>; no: Lose


Step 2. Fill in end values

2 minutes left. 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3.

Fill in end values in the diagram for <2,4>, using the stage 1 results:

<1,4> = .868
<1,3> = .802
<1,2> = .704
<1,1> = .556
<1,0> = .333
Lose = 0


Step 3. Fill in probabilities for events

2 minutes left. 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3.

Fill in probabilities in the diagram for <2,4>:

bomber launched? yes = 2/3, no = 1/3

Missiles fired   hit? yes   hit? no
1 missile        1/3        2/3
2 missiles       5/9        4/9
3 missiles       19/27      8/27
4 missiles       65/81      16/81


Step 4. Determine values of nodes and make decisions.

2 minutes left. 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3.

[Decision diagram for <2,4>. Nodes: B = bomber launched?, F = fire, H1 to H4 = hit? after firing 1 to 4 missiles]

Determine node values.

Value(H1) = 1/3 × .802 + 2/3 × 0 ≈ .2673

Value(H2) = 5/9 × .704 + 4/9 × 0 ≈ .3909

Value(H3) = 19/27 × .556 + 8/27 × 0 ≈ .3909

Value(H4) = 65/81 × .333 + 16/81 × 0 ≈ .2673

Value(F) = max[Value(H1), Value(H2), Value(H3), Value(H4)] = .3909

Firing 2 or 3 missiles is optimal in state <2,4>; both choices give the same value.

Value(B) = 1/3 × .868 + 2/3 × .3909 = .550
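The stage 2 calculation for state <2,4> can be reproduced with exact fractions. The sketch below is illustrative only: the helper `fire_values` and the variable names are assumptions made for this example, and the stage 1 values are recomputed from the closed form 1 - q(1 - p)^m, which is equivalent to firing every remaining missile with one minute left.

```python
from fractions import Fraction

Q = Fraction(2, 3)  # probability that a bomber is launched
P = Fraction(1, 3)  # probability that a single missile hits

# Exact stage 1 survival probabilities for 0..4 missiles (fire everything).
stage1 = [1 - Q * (1 - P) ** m for m in range(5)]


def fire_values(prev, m, p=P):
    """Value(Hj) for j = 1..m: hit with j missiles, then continue from prev[m - j]."""
    return {j: (1 - (1 - p) ** j) * prev[m - j] for j in range(1, m + 1)}


h = fire_values(stage1, 4)
value_f = max(h.values())                     # fire node: pick the best j
best = [j for j, v in h.items() if v == value_f]
value_b = (1 - Q) * stage1[4] + Q * value_f   # bomber launched? node

for j, v in h.items():
    print(f"Value(H{j}) = {v} = {float(v):.4f}")
print("best number of missiles to fire:", best)
print(f"Value(B) = {value_b} = {float(value_b):.3f}")
```

Working with fractions avoids the small rounding differences that appear when the three-decimal end values are multiplied by hand.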


Node values: again

H1 (1 missile; hit? yes 1/3: <1,3> = .802; no 2/3: Lose = 0): Value = 1/3 × .802 + 2/3 × 0 ≈ .2673

H2 (2 missiles; hit? yes 5/9: <1,2> = .704; no 4/9: Lose = 0): Value = 5/9 × .704 + 4/9 × 0 ≈ .3909

H3 (3 missiles; hit? yes 19/27: <1,1> = .556; no 8/27: Lose = 0): Value = 19/27 × .556 + 8/27 × 0 ≈ .3909

H4 (4 missiles; hit? yes 65/81: <1,0> = .333; no 16/81: Lose = 0): Value = 65/81 × .333 + 16/81 × 0 ≈ .2673


Some comments on DP

Seems complex, but the computations are all very similar.
– easy to program (not so easy in Excel)
– very efficient

Useful in finance
– investments over time
– the outcome of an investment is uncertain

Useful in inventory control
– demands are uncertain
– supplies must be ordered in advance


Probabilities of surviving

Probability of reaching the target

Missiles     0     1     2     3     4     5     6     7     8     9     10    11
1 minute     .333  .556  .704  .802  .868  .912  .941  .961  .974  .983  .988  .992
2 minutes    .111  .259  .358  .473  .550  .634  .690  .750  .789  .830  .858  .886
3 minutes    .037  .111  .177  .254  .316  .387  .452  .508  .561  .616  .655  .696
4 minutes    .012  .045  .084  .126  .171  .223  .270  .318  .368  .417  .460  .504
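A table of this kind can be generated by repeating the stage calculation for every number of minutes and missiles. The following is a sketch under stated assumptions, not the spreadsheet used in the lecture: the function `survival_table` and its return layout are invented for this example, the launch probability is taken to be the same 2/3 in every minute, and the hit probability is 1/3 per missile. With k minutes left and m missiles, the value is (1 - q) times the value at (k - 1, m), plus q times the best over j of [1 - (1 - p)^j] times the value at (k - 1, m - j).

```python
from fractions import Fraction


def survival_table(minutes, max_missiles, q, p):
    """Return (v, best): v[k][m] is the probability of reaching the target with
    k minutes left and m missiles; best[k][m] is one optimal number to fire."""
    # With 0 minutes left the target has been reached.
    v = [[Fraction(1)] * (max_missiles + 1)]
    best = [[0] * (max_missiles + 1)]
    for k in range(1, minutes + 1):
        row, actions = [], []
        for m in range(max_missiles + 1):
            fire_value, fire_count = Fraction(0), 0
            for j in range(1, m + 1):
                value = (1 - (1 - p) ** j) * v[k - 1][m - j]
                if value > fire_value:  # ties keep the smaller j
                    fire_value, fire_count = value, j
            row.append((1 - q) * v[k - 1][m] + q * fire_value)
            actions.append(fire_count)
        v.append(row)
        best.append(actions)
    return v, best


v, best = survival_table(4, 11, Fraction(2, 3), Fraction(1, 3))
for k in range(1, 5):
    print(f"{k} minute(s):", " ".join(f"{float(x):.3f}" for x in v[k]))
```

The `best` table records one optimal firing decision per state; where several choices tie, as in state <2,4>, it reports the smallest.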

Bomber spreadsheet


Summary for dynamic programming

Useful in decision making over time

Uses stages, states, optimal value functions

Uses recursion

Can incorporate probabilities

Useful in inventory management, finance, shortest path, and much more