11 Dynamic Programming

The Bellman Equation

In Chapter 10 we studied optimization over time. The constraints (10.1) expressing increments to stocks or state variables, and the additively separable form (10.3) of the objective function, were special features that enabled us to express the first-order conditions in a useful special form, namely as the Maximum Principle. Dynamic Programming is an alternative way of solving the same problem. It proves especially useful when time and uncertainty appear together, as they so often do in reality. Let us begin with time to keep the exposition simple, and introduce uncertainty later.

The vectors of initial stocks y_0 and terminal stocks y_{T+1} were given when we maximized

Σ_{t=0}^{T} F(y_t, z_t, t)    (10.3)

subject to the constraints

y_{t+1} = y_t + Q(y_t, z_t, t)    (10.1)

and

G(y_t, z_t, t) ≤ 0,    (10.2)

for t = 0, 1, ..., T. Keep the terminal stock requirement fixed for now. As in Chapter 5, we can define the resulting maximum value as a function of the initial stocks, say V(y_0). The vector of derivatives V_y(y_0) will be the vector of the shadow prices of these initial stocks.

The separability of the objective and the constraints allows us to generalize this greatly. Instead of the starting time 0, consider another particular time, say t = τ. For the decisions starting at τ, the only thing that matters about the past is the vector y_τ of stocks that emerges from the past decisions. We can take that as parametric, and start the whole problem afresh at τ. In other words, we maximize a sum just like (10.3), but extending only from τ to T, subject to constraints just like (10.1) and (10.2), but holding only for τ, τ + 1, ..., T. Let V(y_τ, τ) be the maximum value function of this problem; the explicit argument τ is necessary because the limit of the summation depends on it. The vector of derivatives V_y(y_τ, τ) is the marginal increase in the maximized sum when we start with a small increment to the initial stocks y_τ at τ, that is, the vector of shadow prices of the initial stocks for the optimization problem that starts at τ.
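
In symbols, the subproblem value function just described can be written out as below; this display is only a restatement of the definition above in the notation of (10.1)-(10.3).

```latex
V(y_\tau, \tau) \;=\; \max_{\{z_t\}_{t=\tau}^{T}} \; \sum_{t=\tau}^{T} F(y_t, z_t, t)
\quad\text{subject to}\quad
y_{t+1} = y_t + Q(y_t, z_t, t), \qquad G(y_t, z_t, t) \le 0,
\qquad t = \tau, \ldots, T,
```

with y_τ given and y_{T+1} held at its required terminal value.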

What happens when we embed the sub-problem starting at τ into the full problem starting at 0? In Chapter 10 we could interpret the Lagrange multiplier on the constraint (10.1) for τ as the shadow price vector π_{τ+1} of stocks at (τ+1). A slight relaxation of this constraint meant an exogenous increase in y_{τ+1}, and the multiplier told us the resultant increase in the objective function (10.3). At first sight this differs from the vector of derivatives V_y(y_{τ+1}, τ+1) for the sub-problem starting at (τ+1). In the full problem, we know at time 0 that the stocks at (τ+1) are going to increase a little. Then we can plan ahead and change the control variables at earlier dates. For example, if we know at time 0 that a windfall of wealth is going to occur at (τ+1), we will consume more in anticipation as well as after the realization. But the Envelope Theorem comes to the rescue. For the small changes that are involved when we look at first-order derivatives, the direct effect on the objective function is all that counts; the induced effect of optimal readjustments in the choice variables can be ignored. Therefore we can indeed identify the derivatives V_y with the shadow prices π at all times.

Now pick any t, and consider the decision about the control variables z_t at that time. Consider the consequences of any particular choice of z_t. It will lead to next period's stocks y_{t+1} according to (10.1). Thereafter it remains to solve the subproblem starting at (t+1), and achieve the maximum value V(y_{t+1}, t+1). Then the total value starting with y_t at t can be broken down into two terms: F(y_t, z_t, t) that accrues at once, and V(y_{t+1}, t+1) that accrues thereafter. The choice of z_t should maximize the sum of these two terms. In other words,

V(y_t, t) = max_{z_t} { F(y_t, z_t, t) + V(y_{t+1}, t+1) },    (11.1)


subject to the constraints (10.1) and (10.2) for just this one t. This equation gives us a brute-force way of solving the original optimization problem. The idea is to start at the end and proceed to earlier times recursively. At time T there is no future, only the fixed terminal stock requirement y_{T+1}. Therefore

V(y_T, T) = max_{z_T} F(y_T, z_T, T)

subject to

G(y_T, z_T, T) ≤ 0   and   y_T + Q(y_T, z_T, T) = y_{T+1}.
This is in principle a straightforward static optimization problem, and yields the maximum value function V(y_T, T). That can then be used on the right-hand side of (11.1) for t = T - 1. This is another static problem, and yields the maximum value function V(y_{T-1}, T-1). And so on all the way back to 0. In practice this works only for the simplest problems. Analytical solutions of this kind are possible only when the functions F, G, and Q have very simple forms. Numerical solutions can be computed for somewhat harder problems, but if the state variables form a vector of more than two dimensions, even that quickly becomes unmanageable. Luckily, the brute-force method is only a backstop. In many economic applications, there are better methods to find the solution, or at least obtain useful insights about it.
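
To make the backward recursion concrete, here is a minimal numerical sketch in Python. It assumes a single state and a single control, both discretized onto small grids, and uses illustrative functions F, Q, and G that are not from the text; for simplicity it also treats the terminal stock as free (V at T+1 is zero) rather than imposing a terminal requirement. It simply applies (11.1) from t = T back to 0.

```python
import numpy as np

# Illustrative one-dimensional example (F, Q, G below are hypothetical,
# chosen only to make the recursion in (11.1) runnable).
T = 10
y_grid = np.linspace(0.0, 10.0, 201)   # discretized state (stock)
z_grid = np.linspace(0.0, 2.0, 101)    # discretized control

def F(y, z, t):          # single-period payoff
    return np.sqrt(np.maximum(z, 0.0))

def Q(y, z, t):          # increment to the stock: y_{t+1} = y + Q(y, z, t)
    return 0.05 * y - z

def G(y, z, t):          # single-period constraint G <= 0 (here: z <= y)
    return z - y

V = np.zeros((T + 2, len(y_grid)))     # V[T+1, .] = 0: no value after the horizon

for t in range(T, -1, -1):             # backward recursion: t = T, T-1, ..., 0
    for i, y in enumerate(y_grid):
        feasible = G(y, z_grid, t) <= 0.0
        if not feasible.any():
            V[t, i] = -np.inf
            continue
        y_next = y + Q(y, z_grid, t)
        # interpolate next period's value function at y_{t+1}
        V_next = np.interp(y_next, y_grid, V[t + 1])
        total = np.where(feasible, F(y, z_grid, t) + V_next, -np.inf)
        V[t, i] = total.max()          # Bellman equation (11.1)

print(V[0, 100])   # value starting from the middle of the state grid at t = 0
```

Even in this toy version, the cost of the recursion is the product of the grid sizes and the horizon, which is why the method becomes unmanageable once the state is a vector of more than two dimensions.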

This method of optimization over time as a succession of static programming problems was pioneered by Richard Bellman, and named Dynamic Programming. The idea that, whatever the decision at t, the subsequent decisions should proceed optimally for the subproblem starting at (t+1) is known as Bellman's Principle of Optimality. The maximum value function V(y_t, t) is called the Bellman value function, and equation (11.1) the Bellman equation.

Let us look at the maximization problem on the right-hand side of the Bellman equation. Substituting for y_{t+1} from (10.1), we are to choose z_t to maximize

F(y_t, z_t, t) + V(y_t + Q(y_t, z_t, t), t+1)

subject to

G(y_t, z_t, t) ≤ 0.

Letting λ_t denote the row vector of the multipliers on the constraints, the first-order conditions are

F_z(y_t, z_t, t) + V_y(y_{t+1}, t+1) Q_z(y_t, z_t, t) = λ_t G_z(y_t, z_t, t).

Recognizing the derivatives V_y as the shadow prices π, this becomes

F_z(y_t, z_t, t) + π_{t+1} Q_z(y_t, z_t, t) = λ_t G_z(y_t, z_t, t).

These are exactly the first-order conditions for z_t to maximize the Hamiltonian H(y_t, z_t, π_{t+1}, t) defined in Chapter 10, subject to the single-period constraints (10.2) as there. Thus Dynamic Programming leads to the same rule for setting the choice variables as the Maximum Principle.
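
The equivalence can be seen compactly. Assuming the Chapter 10 Hamiltonian takes its usual form H(y_t, z_t, π_{t+1}, t) = F(y_t, z_t, t) + π_{t+1} Q(y_t, z_t, t) (that definition is not restated in this excerpt), the z-derivative of the maximand above collapses to the z-derivative of the Hamiltonian once V_y is identified with π:

```latex
% Assumption: H(y_t, z_t, \pi_{t+1}, t) = F(y_t, z_t, t) + \pi_{t+1} Q(y_t, z_t, t),
% the usual discrete-time Hamiltonian of Chapter 10.
\frac{\partial}{\partial z_t}
  \Big[ F(y_t, z_t, t) + V\big(y_t + Q(y_t, z_t, t),\, t+1\big) \Big]
  = F_z + V_y(y_{t+1}, t+1)\, Q_z
  = F_z + \pi_{t+1} Q_z
  = H_z(y_t, z_t, \pi_{t+1}, t).
```

Setting this equal to λ_t G_z, as above, reproduces the Maximum Principle's condition on z_t.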

In fact the Maximum Principle and Dynamic Programming are fully equivalent alternative methods for optimization over time. You should use whichever is simpler for tackling the particular problem at hand. The Maximum Principle is generally better when time is continuous and there is no uncertainty; Dynamic Programming in the opposite case of discrete time and uncertainty. But that is not a hard and fast rule.

Later in this chapter I shall illustrate the use of Dynamic Programming in some economic applications. To conclude this section I use it to establish the intertemporal arbitrage equation (10.13) in a different way. When z_t is chosen optimally, (11.1) holds with equality, that is,

V(y_t, t) = F(y_t, z_t, t) + V(y_{t+1}, t+1).

Differentiate this with respect to y_t, noting that y_{t+1} depends on y_t through (10.1), and use the Envelope Theorem on the right-hand side. Then

V_y(y_t, t) = F_y(y_t, z_t, t) + V_y(y_{t+1}, t+1) [ I + Q_y(y_t, z_t, t) ] - λ_t G_y(y_t, z_t, t).

Using the shadow prices π, this becomes (10.13).

Uncertainty

Dynamic Programming is particularly well suited to optimization problems that combine time and uncertainty. Suppose that the process governing the evolution of stocks y_t through time has a random component. Given the stocks y_t at the beginning of period t, and the controls z_t during the period, we know only the probability density function of next period's stocks y_{t+1}. Write this as φ(y_{t+1}; y_t, z_t). The arguments are separated out for notational clarity. The first argument y_{t+1} is the actual vector of random variables whose probability density function this is; the others are like parameters that can alter the functional form of the distribution. As a simple example, y_{t+1} could have a multivariate normal distribution with a mean vector μ and a variance-covariance matrix Σ, both of which depend on (y_t, z_t). As an even more special case, μ could equal y_t + Q(y_t, z_t, t), the value of y_{t+1} in the previous discussion without uncertainty.
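
As a concrete illustration of such a state-dependent transition density, here is a small sketch; the scalar increment function Q and the noise scale are hypothetical, used only to show the shape of φ(y_{t+1}; y_t, z_t) in the special case just mentioned.

```python
import numpy as np
from scipy.stats import norm

def Q(y, z, t):
    # hypothetical increment function, as in the certainty case
    return 0.05 * y - z

def phi(y_next, y, z, t, sigma=0.1):
    # transition density phi(y_{t+1}; y_t, z_t): normal with mean y + Q(y, z, t)
    return norm.pdf(y_next, loc=y + Q(y, z, t), scale=sigma)

# density of ending next period with stock 5.2, given y_t = 5 and z_t = 0.1
print(phi(5.2, 5.0, 0.1, t=0))
```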

Now the problem is to maximize the mathematical expectation of (10.3), subject to (10.2) for all t, and with (10.1) replaced by the stochastic law of motion for y_{t+1} described by the function φ. Write V(y_t, t) for the maximum value function of the subproblem starting at t. For the moment fix the choice of z_t. Consider what happens after the actual value of y_{t+1} becomes known at the beginning of period (t+1). The rest of the decisions will be made optimally, and yield V(y_{t+1}, t+1). From our perspective of period t, this is still a random variable, and we are concerned about its mathematical expectation,

E[ V(y_{t+1}, t+1) ] = ∫ V(y_{t+1}, t+1) φ(y_{t+1}; y_t, z_t) dy_{t+1},    (11.2)

where the integral is taken over the range over which y_{t+1} is distributed. Then the Principle of Optimality becomes

V(y_t, t) = max_{z_t} { F(y_t, z_t, t) + E[ V(y_{t+1}, t+1) ] }.    (11.3)

The maximization on the right-hand side of (11.3) is somewhat more difficult than in the corresponding certainty case (11.1). The first-order condition with respect to z_t requires differentiation of φ with respect to z_t inside the integral, and the results of that can be hard to characterize and understand. But in principle (11.3) allows us to start at T and solve the problem recursively backward to 0 just as before. In simple but useful models, the solution can be completed analytically. At the end of the chapter, I develop two examples of Dynamic Programming under uncertainty.
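
The brute-force recursion carries over once the continuation value is replaced by its expectation, as in (11.2) and (11.3). The sketch below is hypothetical and kept deliberately small: it discretizes the shock so that the integral in (11.2) becomes a finite probability-weighted sum, and it reuses illustrative F, Q, and G functions that are not from the text.

```python
import numpy as np

T = 10
y_grid = np.linspace(0.0, 10.0, 201)   # discretized state
z_grid = np.linspace(0.0, 2.0, 101)    # discretized control

# three-point approximation of the shock in y_{t+1} = y_t + Q(y_t, z_t, t) + eps
shocks = np.array([-0.2, 0.0, 0.2])
probs  = np.array([0.25, 0.5, 0.25])

def F(y, z, t): return np.sqrt(np.maximum(z, 0.0))   # single-period payoff
def Q(y, z, t): return 0.05 * y - z                  # mean increment to the stock
def G(y, z, t): return z - y                         # constraint G <= 0

V = np.zeros((T + 2, len(y_grid)))                   # V[T+1, .] = 0 after the horizon

for t in range(T, -1, -1):
    for i, y in enumerate(y_grid):
        feasible = G(y, z_grid, t) <= 0.0
        if not feasible.any():
            V[t, i] = -np.inf
            continue
        # E[V(y_{t+1}, t+1)]: probability-weighted sum over shock values,
        # the discrete analogue of (11.2)
        EV = np.zeros_like(z_grid)
        for eps, p in zip(shocks, probs):
            y_next = y + Q(y, z_grid, t) + eps
            EV += p * np.interp(y_next, y_grid, V[t + 1])
        total = np.where(feasible, F(y, z_grid, t) + EV, -np.inf)
        V[t, i] = total.max()                        # stochastic Bellman equation (11.3)
```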