MIT and James Orlin © 2003
Stochastic Dynamic Programming
– Review
– DP with probabilities
Overview
Objective: illustrate the use of DP with probabilities
DP with probabilities seems more complex because the decision at each stage is more complex.
But the optimal decision at each stage can still be determined from the results of previous stages.
Review of DP using stages
Capital Budgeting, again
Investment               1    2    3    4    5    6
Cash Required (1000s)   $5   $7   $4   $3   $4   $6
NPV added (1000s)      $16  $22  $12   $8  $11  $19
Investment budget = $14,000
The Dynamic programming stages and states
Let f(k, B) be the best NPV obtainable using only stocks 1, 2, …, k and a budget of at most B (in $1000s).
Stages: at stage k consider only stocks 1, 2, …, k
State: B is the budget
Compute f(1, B) for B = 0 to 14.
Then compute f(2, B) for B = 0 to 14.
Then compute f(3, B) for B = 0 to 14.
etc.
Capital Budgeting: stage 1
Consider stock 1: cost $5, NPV: $16
f(1, B) = 0 for B = 0 to 4
f(1, B) = 16 for B = 5 to 14
B (budget used up)   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
S1: f(1, B)          0   0   0   0   0  16  16  16  16  16  16  16  16  16  16
Capital Budgeting: stage 2
Consider stock 1: cost $5, NPV: $16
Consider stock 2: cost $7, NPV: $22
f(2, B) = 0 for B = 0 to 4
f(2, B) = 16 for B = 5, 6
f(2, B) = 22 for B = 7 to 11
f(2, B) = 38 for B = 12 to 14
B (budget used up)   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
S1: f(1, B)          0   0   0   0   0  16  16  16  16  16  16  16  16  16  16
S2: f(2, B)          0   0   0   0   0  16  16  22  22  22  22  22  38  38  38
Capital Budgeting: stage 3, using DP
B (budget used up)   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
S2: f(2, B)          0   0   0   0   0  16  16  22  22  22  22  22  38  38  38
Consider stock 3: cost $4, NPV: $12
We can compute f(3, B) using f(2, ·) as input. We illustrate on f(3, 9).
<3,9>
  Buy stock 3: $12, then <2,5> worth $16, total $28
  Don't buy stock 3: <2,9> worth $22
Choose the best decision.
On the DP for the Capital Budgeting Problem
<3,9>
  Buy stock 3: $12, then <2,5> worth $16, total $28
  Don't buy stock 3: <2,9> worth $22
f(3,9) = max [ 12 + f(2, 5), f(2,9) ]
f(3, B) = f(2, B) for B = 0, 1, 2, 3
f(3, B) = max [12 + f(2, B-4), f(2, B) ] for B = 4 to 14.
In general, f(k, B) can be computed from f(k-1, · )
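The recursion above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own implementation: the function name `capital_budgeting` and the list layout are assumptions made here, while the costs, NPVs, and budget are taken from the capital budgeting table (all in $1000s).

```python
# Data from the capital budgeting table (cash and NPV in $1000s).
COST = [5, 7, 4, 3, 4, 6]
NPV = [16, 22, 12, 8, 11, 19]
BUDGET = 14


def capital_budgeting(cost, npv, budget):
    """Return a table f where f[k][b] is the best NPV using stocks 1..k and budget <= b."""
    n = len(cost)
    # Stage 0: no stocks considered yet, so the best NPV is 0 for every budget.
    f = [[0] * (budget + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        c, v = cost[k - 1], npv[k - 1]
        for b in range(budget + 1):
            f[k][b] = f[k - 1][b]  # option 1: don't buy stock k
            if b >= c:
                # option 2: buy stock k and use the remaining budget optimally
                f[k][b] = max(f[k][b], v + f[k - 1][b - c])
    return f


f = capital_budgeting(COST, NPV, BUDGET)
print(f[2])      # the stage 2 row, f(2, B) for B = 0 to 14
print(f[3][9])   # f(3, 9), the value worked out in the decision diagram
```

Each stage reads only the previous stage's row, which is why the table can be filled one stage at a time.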
Decision Diagrams
<3,9>
  Buy stock 3: $12, then <2,5> worth $16, total $28
  Don't buy stock 3: <2,9> worth $22
The above diagram is a decision diagram.
The optimal decision at each stage can be determined from decisions at previous stages.
We may view the diagram as a “local decision diagram” since it involves only a small part of the overall decision.
We use an extension of this approach when we deal with dynamic programming under uncertainty.
Dynamic Programming under uncertainty
Next: we will permit uncertainties in our DPs.
This is usually where DP becomes a much more powerful tool, but also a more complex one.
We illustrate with an example in warfare, or gaming if you prefer.
Destroying an enemy target: a bomber example
You are a pilot in enemy territory. Your mission is to destroy an important target. You must get through. You have four minutes to reach your target, and have just been spotted by radar.
The enemy can launch at most one bomber per minute to prevent you from reaching the target. The probability that a bomber is launched in minute i is q_i, for i = 1 to 4.
A bomber example, continued
To protect yourself, you have M missiles. Each has a probability p_j of destroying the bomber.
Whenever you see a bomber, you must decide how many missiles to launch. If you do not destroy the bomber, then you will be destroyed.
Determine a strategy for how many missiles to launch at each time, assuming you see a bomber attacking you.
– Let f(k, m) be the number of missiles to launch assuming that you have k minutes left and have m missiles on hand.
– A strategy is to determine f(k, m) for k = 1 to 4 and m = 1 to M.
Simulating the bomber example
Each person has a die and a page describing the probabilities.
Simulate 1 or more instances of the game.
– We will discuss the results
– Then we will show how to determine an optimal strategy using DP
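For readers without a die at hand, the game can also be simulated in software. The sketch below is an assumption-laden illustration: it uses a single launch probability q and a single hit probability p for every minute and missile (the slides allow q_i and p_j to vary), treats missiles as independent, and the names `simulate_mission`, `fire_all`, and `estimate_survival` are made up for this example.

```python
import random


def simulate_mission(minutes, missiles, q, p, policy, rng):
    """Play one game; return True if the pilot reaches the target."""
    for k in range(minutes, 0, -1):      # k = minutes left
        if rng.random() >= q:            # no bomber is launched this minute
            continue
        shots = min(policy(k, missiles), missiles)
        missiles -= shots
        # The bomber is destroyed if at least one fired missile hits.
        if not any(rng.random() < p for _ in range(shots)):
            return False                 # bomber survives, pilot is destroyed
    return True


def fire_all(k, m):
    """Hypothetical policy: fire every remaining missile at the first bomber."""
    return m


def estimate_survival(minutes, missiles, q, p, policy, trials=20000, seed=0):
    """Estimate the survival probability by repeated simulation."""
    rng = random.Random(seed)
    wins = sum(simulate_mission(minutes, missiles, q, p, policy, rng)
               for _ in range(trials))
    return wins / trials


# One minute left, 4 missiles, q = 2/3, p = 1/3.
print(estimate_survival(1, 4, 2 / 3, 1 / 3, fire_all))
```

Swapping in a different `policy(k, m)` function lets one compare strategies empirically before working out the optimal one.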
What is the probability of surviving with 1 minute remaining and 4 missiles left?
There is one minute left. You have 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3. If a bomber is launched, how many missiles do you fire? What is the probability of survival?
Step 1. Draw the diagram.
<1,4> (1 minute left, 4 missiles): bomber launched?
  no: You win!
  yes: Fire (1 missile, 2 missiles, 3 missiles, or 4 missiles), then hit?
    yes: You win!
    no: You lose.
Firing all missiles is clearly optimal with one minute to go.
Step 2. Fill in probabilities and end-values
The probability of a launched bomber is 2/3.
The probability of a missile hitting the bomber is 1/3.
What is the probability of survival?
Fill in end values, prob. of survival: You win! = 1, You lose. = 0
Fill in probabilities of events.
<1,4> (1 minute left, 4 missiles): bomber launched?
  no (1/3): You win! = 1
  yes (2/3): Fire 4 missiles, then hit?
    yes (65/81): You win! = 1
    no (16/81): You lose. = 0
Probability of 4 missiles missing is (2/3)^4 = 16/81
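The hit and miss probabilities used here follow a simple pattern. The worked equations below assume that missiles hit independently with the same probability p, which is what the (2/3)^4 computation implies; the symbol j for the number of missiles fired is introduced here for illustration.

```latex
\[
P(\text{all } j \text{ missiles miss}) = (1-p)^j,
\qquad
P(\text{at least one hit}) = 1 - (1-p)^j .
\]
\[
p = \tfrac{1}{3}: \quad
1 - \tfrac{2}{3} = \tfrac{1}{3}, \quad
1 - \left(\tfrac{2}{3}\right)^2 = \tfrac{5}{9}, \quad
1 - \left(\tfrac{2}{3}\right)^3 = \tfrac{19}{27}, \quad
1 - \left(\tfrac{2}{3}\right)^4 = \tfrac{65}{81}.
\]
```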
Step 3. Compute values at each node.
The probability of a launched bomber is 2/3.
The probability of a missile hitting the bomber is 1/3.
Compute values at each node, moving from right to left.
<1,4> (1 minute left, 4 missiles): node B, bomber launched?
  no (1/3): You win! = 1
  yes (2/3): node F, fire 4 missiles, then node H, hit?
    yes (65/81): You win! = 1
    no (16/81): You lose. = 0
Value(H) = 65/81 × 1 + 16/81 × 0 = 65/81
Value(F) = Value(H) = 65/81
Value(B) = 1/3 × 1 + 2/3 × 65/81 = 211/243 = .868
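The right-to-left evaluation can be checked with exact fractions. The snippet below is only a sketch of this one diagram; the variable names are chosen here for illustration, and it assumes independent missiles as in the (2/3)^4 computation.

```python
from fractions import Fraction

q = Fraction(2, 3)   # probability a bomber is launched
p = Fraction(1, 3)   # probability a single missile hits
missiles = 4

miss_all = (1 - p) ** missiles                 # all 4 missiles miss: 16/81
value_H = (1 - miss_all) * 1 + miss_all * 0    # hit -> win (1), miss -> lose (0)
value_F = value_H                              # with one minute left, fire everything
value_B = (1 - q) * 1 + q * value_F            # no bomber -> win (1)

print(value_H, value_F, value_B, round(float(value_B), 3))
```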
Carry out similar calculations for other values at stage 1, that is, one minute remaining.
Calculations for stage 1.
Number of missiles remaining     0     1     2     3     4     5     6     7     8     9    10    11
Probability of surviving      .333  .556  .704  .802  .868  .912  .941  .961  .974  .983  .988  .992
We next do a stage 2 calculation, which will be typical of all other calculations.
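The pattern that the stage 2 calculation follows can be written as a recursion. The formulation below is a sketch under the same assumptions as the worked example: a constant launch probability q, a constant hit probability p, and independent missiles. The symbol v(k, m), the probability of reaching the target with k minutes left and m missiles on hand, is notation introduced here and does not appear on the slides.

```latex
\[
\begin{aligned}
v(0, m) &= 1, \\
v(k, m) &= (1-q)\, v(k-1, m)
  + q \max_{1 \le j \le m} \bigl[\, 1 - (1-p)^j \,\bigr]\, v(k-1, m-j),
  \qquad k \ge 1,
\end{aligned}
\]
\[
\text{where the maximum is taken to be } 0 \text{ when } m = 0 .
\]
```

The first term covers the case where no bomber is launched; the second fires j missiles and continues from the state with k-1 minutes and m-j missiles only if at least one of them hits.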
Diagram for Determining Number of Missiles to Fire
There are two minutes left. You have 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3. If a bomber is launched, how many missiles do you fire?
Step 1, lay out the diagram.
<2,4> (2 minutes left, 4 missiles): bomber launched?
  no: go to <1,4>
  yes: Fire
    1 missile, hit?   yes: go to <1,3>   no: Lose
    2 missiles, hit?  yes: go to <1,2>   no: Lose
    3 missiles, hit?  yes: go to <1,1>   no: Lose
    4 missiles, hit?  yes: go to <1,0>   no: Lose
Step 2. Fill in end values
2 minutes left. 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3.
Fill in end values
<2,4> (2 minutes left, 4 missiles): bomber launched?
  no: <1,4> = .868
  yes: Fire
    1 missile, hit?   yes: <1,3> = .802   no: Lose = 0
    2 missiles, hit?  yes: <1,2> = .704   no: Lose = 0
    3 missiles, hit?  yes: <1,1> = .556   no: Lose = 0
    4 missiles, hit?  yes: <1,0> = .333   no: Lose = 0
Step 3. Fill in probabilities for events
2 minutes left. 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3.
Fill in Probabilities
<2,4> (2 minutes left, 4 missiles): bomber launched?
  no (1/3): <1,4> = .868
  yes (2/3): Fire
    1 missile, hit?   yes (1/3): <1,3> = .802     no (2/3): Lose = 0
    2 missiles, hit?  yes (5/9): <1,2> = .704     no (4/9): Lose = 0
    3 missiles, hit?  yes (19/27): <1,1> = .556   no (8/27): Lose = 0
    4 missiles, hit?  yes (65/81): <1,0> = .333   no (16/81): Lose = 0
Step 4. Determine values of nodes and make decisions.
2 minutes left. 4 missiles remaining. The probability of a launched bomber is 2/3. The probability of a missile hitting the bomber is 1/3.
Determine node values.
<2,4> (2 minutes left, 4 missiles): node B, bomber launched?
  no (1/3): <1,4> = .868
  yes (2/3): node F, fire
    1 missile, node H1:   yes (1/3): <1,3> = .802     no (2/3): Lose = 0
    2 missiles, node H2:  yes (5/9): <1,2> = .704     no (4/9): Lose = 0
    3 missiles, node H3:  yes (19/27): <1,1> = .556   no (8/27): Lose = 0
    4 missiles, node H4:  yes (65/81): <1,0> = .333   no (16/81): Lose = 0
Value(H1) = 1/3 × .802 + 2/3 × 0 = .2673
Value(H2) = 5/9 × .704 + 4/9 × 0 = .3909
Value(H3) = 19/27 × .556 + 8/27 × 0 = .3909
Value(H4) = 65/81 × .333 + 16/81 × 0 = .2673
Value(F) = max[Value(H1), Value(H2), Value(H3), Value(H4)] = .3909, attained by firing 2 or 3 missiles
Value(B) = 1/3 × .868 + 2/3 × .3909 = .550
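The node values for <2,4> can be reproduced with exact fractions, which also shows where the rounded decimals come from. This is a sketch for this single state only; it assumes constant q = 2/3 and p = 1/3 with independent missiles, and the dictionary names are invented for the example.

```python
from fractions import Fraction

q, p = Fraction(2, 3), Fraction(1, 3)

# Stage 1 values <1,m>: win unless a bomber comes and all m missiles miss.
stage1 = {m: 1 - q * (1 - p) ** m for m in range(5)}

m = 4
# Value(Hj): fire j missiles, continue from <1, m-j> only if at least one hits.
value_H = {j: (1 - (1 - p) ** j) * stage1[m - j] for j in range(1, m + 1)}
value_F = max(value_H.values())
best = [j for j, val in value_H.items() if val == value_F]
value_B = (1 - q) * stage1[m] + q * value_F

for j, val in value_H.items():
    print(j, val, round(float(val), 4))
print("best:", best, "Value(B) =", value_B, round(float(value_B), 3))
```

Working in fractions makes the tie between firing 2 and firing 3 missiles exact rather than an artifact of rounding.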
Node values: again
H1 (1 missile):   yes (1/3): <1,3> = .802     no (2/3): Lose = 0     Value = 1/3 × .802 + 2/3 × 0 = .2673
H2 (2 missiles):  yes (5/9): <1,2> = .704     no (4/9): Lose = 0     Value = 5/9 × .704 + 4/9 × 0 = .3909
H3 (3 missiles):  yes (19/27): <1,1> = .556   no (8/27): Lose = 0    Value = 19/27 × .556 + 8/27 × 0 = .3909
H4 (4 missiles):  yes (65/81): <1,0> = .333   no (16/81): Lose = 0   Value = 65/81 × .333 + 16/81 × 0 = .2673
Some comments on DP
Seems complex, but the computations are all very similar.
– easy to program (not so easy in Excel)
– very efficient
Useful in finance
– investments over time
– the outcome of an investment is uncertain
Useful in inventory control
– demands are uncertain
– supplies must be ordered in advance
Probabilities of surviving
Probability of reaching the target
missiles       0     1     2     3     4     5     6     7     8     9    10    11
1 minute    .333  .556  .704  .802  .868  .912  .941  .961  .974  .983  .988  .992
2 minutes   .111  .259  .358  .473  .550  .634  .690  .750  .789  .830  .858  .886
3 minutes   .037  .111  .177  .254  .316  .387  .452  .508  .561  .616  .655  .696
4 minutes   .012  .045  .084  .126  .171  .223  .270  .318  .368  .417  .460  .504
Bomber spreadsheet
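A table of this kind can also be generated programmatically. The sketch below is not the spreadsheet itself; it assumes constant q = 2/3 and p = 1/3 with independent missiles, and the function name `survival_table` and the `fire` array recording a best number of missiles to launch are introduced here for illustration.

```python
from fractions import Fraction


def survival_table(minutes, max_missiles, q, p):
    """Return (v, fire): v[k][m] is the survival probability with k minutes left
    and m missiles; fire[k][m] is one optimal number of missiles to launch."""
    v = [[Fraction(1)] * (max_missiles + 1)]   # v[0][m] = 1: target reached
    fire = [[0] * (max_missiles + 1)]
    for k in range(1, minutes + 1):
        row, choice = [], []
        for m in range(max_missiles + 1):
            best, best_j = Fraction(0), 0      # firing nothing means certain loss
            for j in range(1, m + 1):
                val = (1 - (1 - p) ** j) * v[k - 1][m - j]
                if val > best:                 # ties keep the smaller j
                    best, best_j = val, j
            row.append((1 - q) * v[k - 1][m] + q * best)
            choice.append(best_j)
        v.append(row)
        fire.append(choice)
    return v, fire


v, fire = survival_table(4, 11, Fraction(2, 3), Fraction(1, 3))
for k in range(1, 5):
    print(k, [f"{float(x):.3f}" for x in v[k]])
```

The `fire` table is the strategy the bomber problem asks for: how many missiles to launch in each state when a bomber appears.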
Summary for dynamic programming
Useful in decision making over time
Uses stages, states, optimal value functions
Uses recursion
Can incorporate probabilities
Useful in inventory management, finance, shortest path, and much more