
Math Review: Week 3¹

Yoshifumi Konishi

Ph.D. in Applied Economics

COB 337 | E-mail: [email protected]

12. Quadratic Forms and Definite Matrices

Finally, we are going to study optimization problems. At least up until the 1970s, economics was all about the study of rational optimizing agents. Though we often study cases of incomplete information and imperfect competition, we still assume that agents intend to optimize their behaviors, given the information they have. It is important to note that, when we say agents are rational, we mean that agents make optimal choices according to their objectives. In economics, rationality is never a statement about whether or not their objectives are reasonable or rational. We simply take their objectives as given and solve their optimization problems. So, it is very critical for us to be very familiar with optimization techniques. To be more precise, we need to learn at least three things: (i) When are the optimization problems of our interest well-defined? When do they have a solution? (ii) Are the solutions unique? Are they global or local? (iii) How can we solve them correctly and efficiently? In Week 1, we learned how to solve unconstrained one-variable objective functions. This section discusses the case of more than one variable, but with a very simple functional form: a quadratic form.

Before proceeding further, recall how to solve a one-variable optimization problem. Consider:

min f(x) = x²

FONC gives us a solution:

2x = 0  ⟹  x = 0

We need to check whether this is a minimum or a maximum. The SOC at x = 0 is:

2 > 0

So, this is at least a local minimum. But we can easily see that f(x) > 0 for all x ≠ 0. This means that x = 0 is a global minimum. Now, a question arises naturally: can we extend this sort of simple analysis to cases with more than one variable? A natural extension of quadratic functions to more than one variable is a quadratic form. Recall that a quadratic form on R^n is a real-valued function Q : R^n → R such that:

Q(x) = Σ_{i≤j} aij xi xj

1 This lecture note is adapted from the following sources: Simon & Blume (1994), W. Rudin (1976), A. Takayama(1985), M. Wadachi (2000), and Toda & Asano (2000).


Note that a quadratic form takes the value of zero if x = 0. Recall again that we can represent every quadratic form in matrix form, using a symmetric matrix A:

Q(x) = x^T A x

For example, on R²,

a11 x1² + a22 x2² + a12 x1 x2 = (x1  x2) ( a11        (1/2)a12 ) (x1)
                                         ( (1/2)a12   a22      ) (x2)

The definiteness of the symmetric matrix A is defined by the characteristics of the original function Q.

Definition 12-1 (Definiteness): Let A be an (n × n) symmetric matrix. Then, A is said to be:
(i) positive definite iff x^T A x > 0 for all x ≠ 0 in R^n;
(ii) positive semidefinite iff x^T A x ≥ 0 for all x ≠ 0 in R^n;
(iii) negative definite iff x^T A x < 0 for all x ≠ 0 in R^n;
(iv) negative semidefinite iff x^T A x ≤ 0 for all x ≠ 0 in R^n;
(v) indefinite iff x^T A x > 0 for some x and x^T A x < 0 for some other x in R^n.

Note that if A is positive (negative) definite, then A is positive (negative) semidefinite. Now, what's the point of this? Remember that x^T A x = 0 at x = 0. So, if A is positive definite, then x = 0 is a unique global minimum. If A is positive semidefinite, then x = 0 is a global minimum (which may not be unique). And so on. So, if we are solving a maximization problem of a quadratic form, what we want is negative semidefiniteness of A. In fact, as we will see later, we can generalize this test to more general functional forms, where we look for the negative semidefiniteness of a Hessian matrix. To get a more geometric idea, take a look at Figures 16.2-16.6 in S&B.

Obviously, these definitions by themselves are of no help to us, unless there is a convenient method for identifying the definiteness of matrices. Luckily for us, there is one relatively easy method.

Definition 12-2 (Principal Submatrix and Minor): Let A be an (n × n) matrix. The (k × k) submatrix of A obtained by deleting n − k columns, i1, i2, ..., i_{n−k}, and the n − k rows of the same indices i1, i2, ..., i_{n−k} is called a k-th order principal submatrix of A. The determinant of the (k × k) principal submatrix is called a k-th order principal minor of A.

Example 1. Consider a (3 × 3) matrix:

    A = ( a11  a12  a13 )
        ( a21  a22  a23 )
        ( a31  a32  a33 )

Question: How many principal submatrices of the third order are there? Answer: There is only one. Question: How many second-order submatrices (not necessarily principal submatrices) are there? Answer: We need to find how many combinations of (i, j) can be deleted. The possible combinations are (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3). Note that deleting the 1st row and the 2nd column is different from deleting the 2nd row and the 1st column. So, there are nine possible second-order submatrices of A. How about principal submatrices? There are only three, corresponding to (1,1), (2,2), (3,3), because we have to delete rows and columns of the same indices.

A33 = ( a11  a12 )
      ( a21  a22 )

A22 = ( a11  a13 )
      ( a31  a33 )

A11 = ( a22  a23 )
      ( a32  a33 )

To find the definiteness of a matrix, however, we only use one special kind of principal minor, called a leading principal minor.

Definition 12-3 (Leading Principal Submatrix and Minor): Let A be an (n × n) matrix. The k-th order principal submatrix of A obtained by deleting the last n − k rows and the last n − k columns is called the k-th order leading principal submatrix of A. We denote it by Ak. Its determinant is called the k-th order leading principal minor of A.

Example 2. For a (3 × 3) matrix:

    A = ( a11  a12  a13 )
        ( a21  a22  a23 )
        ( a31  a32  a33 )

the leading principal minors are:

det(A1) = det(a11)

det(A2) = det ( a11  a12 )
              ( a21  a22 )

det(A3) = det ( a11  a12  a13 )
              ( a21  a22  a23 )
              ( a31  a32  a33 )

Now, we are ready to state the main theorem of this section.

Theorem 12-1 (Definiteness): Let A be an (n × n) symmetric matrix. Then,
(i) A is positive definite if and only if all its leading principal minors are strictly positive (> 0);
(ii) A is positive semidefinite if and only if all its principal minors are nonnegative (≥ 0);
(iii) A is negative definite if and only if its leading principal minors alternate signs as follows:

det(A1) < 0, det(A2) > 0, det(A3) < 0, etc.

(iv) A is negative semidefinite if and only if all its principal minors of odd order are ≤ 0 and those of even order are ≥ 0;
(v) If some k-th order leading principal minor of A is nonzero but the sign pattern of the nonzero terms does not fit either case (i) or case (iii), then A is indefinite.

Note that for positive or negative definiteness of A, we only need to check the leading principal minors. But if A is neither positive (negative) definite nor indefinite, then we must check all of its principal minors. Furthermore, it is important to note that when some of the leading principal minors are zero, the matrix need not be indefinite: it can be positive or negative semidefinite if the sign pattern of its nonzero terms still obeys the patterns in (i) or (iii).
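To make this test concrete, here is a minimal Python/NumPy sketch (my own illustration, not part of the original note) that computes leading principal minors and applies parts (i), (iii), and (v) of Theorem 12-1. The example matrix is hypothetical; the semidefinite cases would require checking all principal minors, which is not done here.

    import numpy as np

    def leading_principal_minors(A):
        """Return [det(A_1), det(A_2), ..., det(A_n)] for a square matrix A."""
        n = A.shape[0]
        return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

    def classify(A, tol=1e-10):
        """Classify a symmetric matrix using Theorem 12-1 (i), (iii), (v) only."""
        m = leading_principal_minors(A)
        if all(d > tol for d in m):
            return "positive definite"
        if all((d < -tol) if k % 2 == 1 else (d > tol) for k, d in enumerate(m, 1)):
            return "negative definite"
        return "indefinite or semidefinite (inspect all principal minors)"

    A = np.array([[2.0, 1.0], [1.0, 3.0]])   # x^T A x = 2x1^2 + 2x1x2 + 3x2^2
    print(leading_principal_minors(A))        # [2.0, 5.0]
    print(classify(A))                        # positive definite
    print(classify(-A))                       # negative definite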

Example 3. Consider a (4 × 4) matrix:

    A = ( a11  a12  a13  a14 )
        ( a21  a22  a23  a24 )
        ( a31  a32  a33  a34 )
        ( a41  a42  a43  a44 )

Question: What are the leading principal submatrices? Question: Consider each of the following cases. Is A positive definite, negative definite, or indefinite?

(a) det(A1) > 0, det(A2) > 0, det(A3) > 0, det(A4) > 0 ⟹ A is positive definite.
(b) det(A1) < 0, det(A2) > 0, det(A3) < 0, det(A4) > 0 ⟹ A is negative definite.
(c) det(A1) > 0, det(A2) < 0, det(A3) > 0, det(A4) < 0 ⟹ A is indefinite.
(d) det(A1) > 0, det(A2) > 0, det(A3) = 0, det(A4) > 0 ⟹ A is not positive definite, but may be positive semidefinite. If it is not positive semidefinite, then it is indefinite.
(e) det(A1) = 0, det(A2) < 0, det(A3) > 0, det(A4) > 0 ⟹ A is indefinite, because of det(A2).

If a symmetric matrix is a diagonal matrix, then things become very easy. Recall that the determinant of a diagonal matrix is simply the product of its diagonal terms. So, its leading principal minors are:

det(A1) = a11,  det(A2) = a11 a22,  ...,  det(An) = a11 a22 ··· ann

Thus, we have the following convenient theorem.

Theorem 12-2 (Definiteness of Diagonal Matrices): Let A be an (n × n) diagonal matrix. Then,
(i) A is positive definite if and only if all the aii are strictly positive;
(ii) A is negative definite if and only if all the aii are strictly negative;
(iii) A is positive semidefinite if and only if all the aii are nonnegative (≥ 0);
(iv) A is negative semidefinite if and only if all the aii are nonpositive (≤ 0);
(v) A is indefinite if two of the aii have opposite signs.

In summary, we have that x = 0 is the unique solution to max Q(x) = x^T A x if A is N.D. and to min Q(x) = x^T A x if A is P.D.

Lastly, in economic applications, we often have a linear constraint of the form:


max Q(x)   s.t.   c·x = 0,  where c = (c1, ..., cn)

In this case, we need to have "A is negative definite on the constraint set {x : c·x = 0}". To check this, we form a bordered matrix:

    H_{n+1} = ( 0    c  )                                          (1)
              ( c^T  A  )

            = ( 0    c1   c2   ···  cn  )
              ( c1   a11  a12  ···  a1n )
              ( c2   a21  a22  ···  a2n )
              ( ...  ...  ...  ...  ... )
              ( cn   an1  an2  ···  ann )

Theorem 12-3: Consider a bordered matrix of the form in (1). Suppose that c1 ≠ 0.
(i) If the last n leading principal minors of H_{n+1} have the same sign, then the quadratic form Q is positive definite on the constraint set {x : c·x = 0} (so that x = 0 is the unique global minimum).
(ii) If the last n leading principal minors of H_{n+1} alternate in sign, then the quadratic form Q is negative definite on the constraint set {x : c·x = 0} (so that x = 0 is the unique global maximum).

We can generalize the result of this theorem to more than one constraint. However, we rarely see it used, so I believe it is enough that you know where to find it in your textbook when you need to refer to it. (But you will see relatively more frequent use of a similar technique, the 'bordered Hessian', later.)

13. Unconstrained Optimization

In this section, we deal with unconstrained optimization. The analysis of this case is important, because it corresponds to the interior solutions of any (constrained or unconstrained) optimization problem. We have learned that for functions of one variable, a necessary condition for an interior maximum is that the derivative of the objective must equal zero. We also had a sufficient condition for an interior maximum, which is that the second derivative must be negative. The main results for functions of several variables are analogous to these. First, let's review a key term.

Definition 3-11 (Interior): For any set E ⊆ X, a point x is an interior point of E if there exists a neighborhood N of x such that N ⊆ E. We denote the set of all interior points of E by int(E).

Example 1. Consider the set E = {(x, y) : x² + y² ≤ 1}. Question: What does this set look like in R²? Answer: It is the area inside the unit circle. Consider the point (0, 1). Is it an interior point? No, it is a boundary point. How about (0, 1/2)? 0² + (1/2)² = 1/4 < 1, so it is an interior point. Now, consider the set E = {(x, y) : x ≥ 0, y > 0}. Is the point (0, 1) a boundary point or an interior point? Answer: It is a boundary point. To convince yourself, try to construct an open ball around (0, 1) and see whether any open ball can be contained in E.


Theorem 13-1 (FONC for Local Max/Min): Let F : U ⊆ R^n → R be a C¹ function. Suppose that x* ∈ U is a local maximum or minimum of F in U. Suppose that
(i) U is open; or
(ii) x* ∈ int(U).
Then,

D¹F(x*) = 0^T,  i.e.,  ∂F/∂xi (x*) = 0 for all i.

It is important to keep in mind that (i) this condition is a necessary condition (D¹F(x*) = 0^T does not imply that x* is a local min or max; it simply means that x* is a critical point of F) and (ii) this necessary condition only works for interior points. (See the graph.)

What about second-order conditions? Recall that the second-order derivatives of F can be summarized in the Hessian of F:

    H = D²F = ( ∂²f/∂x1²      ∂²f/∂x2∂x1   ···  ∂²f/∂xn∂x1 )
              ( ∂²f/∂x1∂x2    ∂²f/∂x2²     ···  ∂²f/∂xn∂x2 )
              ( ...           ...          ...  ...        )
              ( ∂²f/∂x1∂xn    ∂²f/∂x2∂xn   ···  ∂²f/∂xn²   )

Recall also that the Hessian is symmetric. We have the following second-order conditions.

Theorem 13-2 (SOSC for Local Max/Min): Let F : U ⊆ R^n → R be a C² function. Suppose that U is open and that D¹F(x*) = 0^T.
(i) If the Hessian D²F(x*) is negative definite, then x* is a (strict) local maximum of F;
(ii) If the Hessian D²F(x*) is positive definite, then x* is a (strict) local minimum of F;
(iii) If the Hessian D²F(x*) is indefinite, then x* is neither a local maximum nor a local minimum of F.

Definition 13-1 (Saddle Point): A critical point x* of F for which the Hessian D²F(x*) is indefinite is called a saddle point of F.

A saddle point is a minimum of F in some directions and a maximum in other directions. It looks like a saddle, just like in Figure 16.4.

To prove Theorem 13-2, we use two of the important results learned in Weeks 1 and 2. Let's see a sketch of the proof, as it is a good review of the previous material. In Week 1, we learned Taylor approximation. Let's write a second-order approximation of F around the critical point x*. Let h be a change in x, so that x* + h represents an arbitrary point around x*. In a sufficiently small neighborhood of x*, therefore,

F(x* + h) ≈ F(x*) + D¹F(x*)h + (1/2) h^T D²F(x*) h

Because D¹F(x*) = 0, D¹F(x*)h = 0. So, we can rewrite this as:

F(x* + h) − F(x*) ≈ (1/2) h^T D²F(x*) h

We learned that, if D²F(x*) is negative definite, then for all h ≠ 0, h^T D²F(x*) h < 0. This means that, for all h ≠ 0,

F(x* + h) − F(x*) < 0   or   F(x* + h) < F(x*)

This means that, among all points around x*, F(x*) gives the highest value. In other words, F(x*) is a local maximum. We can easily adapt the argument to the positive definite case.

Example 2. Let’s work through a concrete problem. Consider a function:

F (x; y) = x3 y3 + 9xy

Let’s …nd a critical point …rst, using FOC.

F x   = 3x2 + 9y = 0   (1)

F y   =   3y2 + 9x = 0   (2)

Now, from (1),  y  = (1=3) x2. Substitute this into (2), and manipulate:

3 (1=3) x2

2+ 9x   =   (1=3) x4 + 9x = 0

or   x4 27x   = 0

or   x

x3 27

  = 0

So,  x = 0  or  3. Thus, the solution to this system is:   (x; y) = (0; 0) or  (3;3). Now, to determinewhich one is a local maximum or minimum, let’s check the Hessian.

H    =

  F xx   F xyF xy   F y

=

  6x   9

9   6y

At (x; y) = (0; 0),

H  =   0 9

9 0 To check the de…niteness of the Hessian, compute the leading principal minors,

det(H 1) = det(0) = 0

det(H 2) = det

  0 99 0

 = 81


In summary, we need either one of the following conditions:
(a) x* is a local maximum (minimum) of f and x* is the only critical point of f on a connected interval I; or
(b) f''(x) ≤ 0 for all x (for a maximum), or f''(x) ≥ 0 for all x (for a minimum).

As we will see in a moment, we have analogous conditions for global maxima of functions of several variables. Before discussing this further, we need to define several related concepts.

Definition 13-2 (Concave & Convex Functions): A function F : U ⊆ R^n → R is concave on U if and only if for all x, y ∈ U and for all t ∈ [0, 1], we have:

F[tx + (1 − t)y] ≥ tF(x) + (1 − t)F(y)

A function F : U ⊆ R^n → R is convex on U if and only if for all x, y ∈ U and for all t ∈ [0, 1], we have:

F[tx + (1 − t)y] ≤ tF(x) + (1 − t)F(y)

A function F : U ⊆ R^n → R is strictly concave on U if and only if for all x ≠ y ∈ U and for all t ∈ (0, 1), we have:

F[tx + (1 − t)y] > tF(x) + (1 − t)F(y)

A function F : U ⊆ R^n → R is strictly convex on U if and only if for all x ≠ y ∈ U and for all t ∈ (0, 1), we have:

F[tx + (1 − t)y] < tF(x) + (1 − t)F(y)

For functions of one variable, we can easily see what concavity and convexity mean graphically. (See the graphs.) Note that a linear function is both convex and concave. Now, we have a closely related concept for sets. You may find them a bit confusing, but it is very important that you are able to use them correctly. Believe me, you will see them everywhere in your prelim and coursework.

Definition 13-3 (Convex Sets): A set U ⊆ R^n is convex if and only if for all x, y ∈ U and for all t ∈ [0, 1],

tx + (1 − t)y ∈ U

There is no such thing as a concave set. We say that sets that are not convex are nonconvex. Moreover, note that, for functions of one variable, if f'' ≤ 0 the function is concave, and if f'' ≥ 0 the function is convex. Thus, condition (b) above translates into the statement: if a function is concave, then a critical point is a global maximum; if a function is convex, then a critical point is a global minimum. Now, we are ready to state the main theorem:

Theorem 13-4 (Sufficiency for Global Min/Max): Let F : U ⊆ R^n → R be a C² function, where U is a convex open subset of R^n.
(a) The following three conditions are equivalent:
(i) F is a concave function on U;
(ii) F(y) − F(x) ≤ D¹F(x)(y − x) for all x, y ∈ U;
(iii) D²F(x) is negative semidefinite for all x ∈ U.
(b) The following three conditions are equivalent:
(i) F is a convex function on U;
(ii) F(y) − F(x) ≥ D¹F(x)(y − x) for all x, y ∈ U;
(iii) D²F(x) is positive semidefinite for all x ∈ U.
(c) If F is a concave function on U and D¹F(x*) = 0 for some x* ∈ U, then x* is a global maximum of F on U.
(d) If F is a convex function on U and D¹F(x*) = 0 for some x* ∈ U, then x* is a global minimum of F on U.

Question: What do (ii) and (iii) mean for one-variable functions? Answer: For (ii), draw a picture: the graph of a concave F lies below its tangent lines. For (iii), D²F(x) = F''(x), so F is concave if and only if F'' ≤ 0 (and F'' < 0 gives strict concavity).
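A minimal SymPy sketch of Theorem 13-4(c), using my own example F(x, y) = −x² + xy − y² (the function is hypothetical, not from the note):

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    F = -x**2 + x*y - y**2

    crit = sp.solve([sp.diff(F, x), sp.diff(F, y)], (x, y), dict=True)   # [{x: 0, y: 0}]
    H = sp.hessian(F, (x, y))                                            # [[-2, 1], [1, -2]]
    minors = [H[0, 0], H.det()]                                          # [-2, 3]
    print(crit, minors)   # the leading minors alternate in sign for every (x, y), so the
                          # Hessian is ND (hence NSD) everywhere, F is concave, and by
                          # Theorem 13-4(c) the critical point (0, 0) is a global maximum.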

Note the difference between Theorems 13-2, 13-3, and 13-4.

For local maxima:

[H(x*) is ND and D¹F(x*) = 0]  ⟹  [x* is a local maximum]
[x* is a local maximum]  ⟹  [H(x*) is NSD and D¹F(x*) = 0]

For global maxima:

[H(x) is NSD for all x and D¹F(x*) = 0]  ⟹  [x* is a global maximum]

Example 4. Recall the problem in Example 2:

F(x, y) = x³ − y³ + 9xy

We had two critical points: (x, y) = (0, 0) or (3, −3). We concluded that at (0, 0) the Hessian is indefinite, so it is neither a local maximum nor a local minimum. On the other hand, we found that at (3, −3) the Hessian is positive definite, so it is a local minimum. Now, the question is whether or not this local minimum is also a global minimum. To find out, we need to check the Hessian at arbitrary points x:

    H = ( Fxx  Fxy )  =  ( 6x   9  )
        ( Fxy  Fyy )     ( 9   −6y )

We first need to check all leading principal minors:

det(H1) = 6x  (sign ambiguous)
det(H2) = −36xy − 81  (sign ambiguous)


So, the signs are not determinate for all (x, y), and we cannot conclude whether this is a global minimum or not.

Example 5. Consider an additively separable utility function:

F(x, y) = u(x) + v(y)

Suppose that both u(·) and v(·) are concave functions, so that u'' ≤ 0 and v'' ≤ 0. Question: Is the original utility function F then concave? Answer: To answer this, we need to check the Hessian at arbitrary points. Fxx = u'', Fxy = Fyx = 0, Fyy = v''. So, the Hessian is:

    H = ( Fxx  Fxy )  =  ( u''  0   )
        ( Fxy  Fyy )     ( 0    v'' )

For NSD, we need to check all principal minors. We have two 1st-order principal minors and one 2nd-order principal minor:

det(u'') = u'' ≤ 0,   det(v'') = v'' ≤ 0,   det ( u''  0   )  = u'' v'' ≥ 0
                                                ( 0    v'' )

So, they are consistent with NSD: F is a concave function. Question: What can you say if u(·) and v(·) are strictly concave? Answer: F is a strictly concave function.

Lastly, as a corollary to Theorem 13-4, we have the following result.

Theorem 13-5 (Uniqueness of Global Min/Max): Let F : U ⊆ R^n → R be a C² function, where U is a convex open subset of R^n.
(a) If F is a strictly concave function on U and D¹F(x*) = 0 for some x* ∈ U, then x* is the unique global maximum of F on U.
(b) If F is a strictly convex function on U and D¹F(x*) = 0 for some x* ∈ U, then x* is the unique global minimum of F on U.

To check strict concavity (convexity) of a function, we have the following characterization theorem (see Takayama, A., Mathematical Economics, 1985, pp. 125-126).

Theorem 13-6: Let F : U ⊆ R^n → R be a C² function, where U is a convex open subset of R^n.
(a) F is a strictly concave function on U if D²F(x) is negative definite for all x ∈ U.
(b) F is a strictly convex function on U if D²F(x) is positive definite for all x ∈ U.

Note that the converse may not be true (See an example).

14. Constrained Optimization I: First-Order Conditions

The objective of this and the next section is for us to be able to solve optimization problems of the following form:

max_{x ∈ R^n}  F(x1, x2, ..., xn)
subject to  g1(x1, x2, ..., xn) ≤ b1
            ...
            gk(x1, x2, ..., xn) ≤ bk
and         h1(x1, x2, ..., xn) = c1
            ...
            hm(x1, x2, ..., xn) = cm

That is, an optimization problem with n decision variables, k inequality constraints, and m equality constraints. Note that this form of optimization arises very naturally in economics. For example, a utility maximization problem is generally of the form:

max_{x ∈ R^n}  U(x1, x2, ..., xn)   s.t.   p·x ≤ M,  x1, ..., xn ≥ 0

The last n constraints are usually called nonnegativity constraints and are assumed to be there, whether explicitly stated or not, because a consumer can never consume a negative amount of a good (except in some special cases). Other common economic problems include profit maximization:

max_{x ∈ R^n}  p·y − w·x   s.t.   f1(x) ≥ y1, ..., fm(x) ≥ ym,  x1, ..., xn ≥ 0

and cost minimization:

min_{x ∈ R^n}  p·x   s.t.   U(x1, x2, ..., xn) ≥ Ū,  x1, ..., xn ≥ 0

14-1. Equality Constraints without Inequality Constraints

Let's first consider an optimization problem with equality constraints only. The method is called the Lagrangian method. You may have learned it somewhere in your calculus classes. Consider an objective function of two decision variables with one equality constraint:

max_{x ∈ R²}  f(x1, x2)   s.t.   h(x1, x2) = c   (1)

Then, we have the following result.

Result 14-1 (FONC for (1)): Suppose that in Problem (1) f and h are C¹ functions and that (x1*, x2*) is the solution. If D¹h(x1*, x2*) ≠ 0 (that is, (x1*, x2*) is not a critical point of h), then there exists a real number λ* such that (x1*, x2*, λ*) is a critical point of the following function:

L(x1, x2, λ) = f(x1, x2) + λ[c − h(x1, x2)]

Thus, by FONC, we must have:

D¹L(x1*, x2*, λ*) = 0


In other words,

∂L/∂x1 (x1*, x2*, λ*) = 0,   ∂L/∂x2 (x1*, x2*, λ*) = 0,   ∂L/∂λ (x1*, x2*, λ*) = 0

This is a very surprising result (at least to me). It basically says that we can turn constrained optimization into unconstrained optimization, and that just working through FONCs can give us (candidate) solutions. As we will see, this method works (with some modification) for inequality constraints as well. The function L is called a Lagrangian function and the miraculous variable λ is called a Lagrange multiplier. The condition that (x1*, x2*) is not a critical point of h is called a constraint qualification. As we get into more complicated problems, we will see various constraint qualifications. But in most economic problems this condition is automatically satisfied, because the constraint is likely to be a linear combination of variables:

a1 x1 + ... + an xn ≤ M

in which case we only need ai ≠ 0 for at least one i.

Now, let’s see why this result can be obtained geometrically. Consider Problem (1) again. Note…rst that the equality constraint forms a level set:

C  = f(x1; x2) : h(x1; x2) = cg

In two dimensional space, this set is represented by some line. Now, consider the objective,  f (x1; x2).We can also draw levels sets for each and every real number  a.

L(a) = f(x1; x2) : f (x1; x2) = ag

Suppose that an increase in  a  is represented by the moves of levels sets in the northeast direction.

Then, the maximum solution occurs exactly at the tangency of  L(a) and C . Recall Implicit FunctionTheorem, which says the slope of the constraint set is given by:

dx2

dx1=

@h=@x1

@h=@x2

This must be equal to the slope of the level set L(a) at this optimum, which is also given by ImplicitFunction Theorem:

dx2

dx1=

@f=@x1

@f=@x2

Thus, we have:

@h=@x1

@h=@x2(x) =

 @f=@x1

@f=@x2(x)

Or,

@f=@x1

@h=@x1(x) =

 @f=@x2

@h=@x2(x)


So, let λ* be the common value:

λ* = (∂f/∂x1)/(∂h/∂x1) (x*) = (∂f/∂x2)/(∂h/∂x2) (x*)

Rewrite these as:

λ* = (∂f/∂x1)/(∂h/∂x1) (x*)  ⟹  ∂f/∂x1 (x*) − λ* ∂h/∂x1 (x*) = 0
λ* = (∂f/∂x2)/(∂h/∂x2) (x*)  ⟹  ∂f/∂x2 (x*) − λ* ∂h/∂x2 (x*) = 0

Along with the original constraint,

c − h(x*) = 0

we have a system of three equations in three unknowns. Note that this system is exactly the set of FONCs for the Lagrangian:

∂L/∂x1 (x1*, x2*, λ*) = ∂f/∂x1 (x*) − λ* ∂h/∂x1 (x*) = 0
∂L/∂x2 (x1*, x2*, λ*) = ∂f/∂x2 (x*) − λ* ∂h/∂x2 (x*) = 0
∂L/∂λ (x1*, x2*, λ*) = c − h(x*) = 0

A couple of caveats apply in using this result. First, λ* cannot be defined at x* if ∂h/∂x1 (x*) = 0 or ∂h/∂x2 (x*) = 0, because the right-hand sides of the equation

λ* = (∂f/∂x1)/(∂h/∂x1) (x*) = (∂f/∂x2)/(∂h/∂x2) (x*)

are not well defined in such a case. Let's see why this can be a problem in a concrete example.

Example 1. Consider the problem:

max_{x ∈ R²}  f(x1, x2) = x1 x2   s.t.   h(x1, x2) = x1 = 1

Note that ∂h/∂x1 = 1 but ∂h/∂x2 = 0. Let's draw a picture. In this case, x1 is fixed at 1, but there is no constraint on x2 (in general, if ∂h/∂x2 = 0 for all x2, there is no constraint on x2). Thus, we can take any value of x2, the function f(x1, x2) = x1 x2 is unbounded under this constraint, and there can be no solution.

In general, when CQ is violated, there may or may not be a solution. Caution: this example is not meant as an example of CQ being violated. In fact, CQ is still satisfied here, because (∂h/∂x1, ∂h/∂x2) = (1, 0) ≠ (0, 0). Rather, it is an example illustrating what kind of problems you might encounter when CQ is violated. The following example is a case in which the constraint works just fine.

Example 2. Consider the problem:

min_{x ∈ R²}  f(x1, x2) = x1² + x2²   s.t.   h(x1, x2) = x1 = 1

The level sets of f(x1, x2) are obviously circles: the bigger the value of f, the bigger the circle. So, the minimum obviously occurs at x1 = 1 and x2 = 0, at which we have a tangency. (Note that if this were a maximization problem, then there would be no solution.) A small symbolic check of this example appears below.
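Here is the small symbolic check promised above (a sketch, not part of the original note), solving the FONCs of the Lagrangian for Example 2 with SymPy:

    import sympy as sp

    x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
    L = x1**2 + x2**2 + lam*(1 - x1)          # Lagrangian in the form f + lam*(c - h)

    foncs = [sp.diff(L, v) for v in (x1, x2, lam)]   # 2*x1 - lam, 2*x2, 1 - x1
    print(sp.solve(foncs, (x1, x2, lam), dict=True))
    # [{x1: 1, x2: 0, lam: 2}] -- the tangency point (1, 0) found geometrically above.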

Another caveat is about how to set up the Lagrangian function. As you may be aware, (i) we can put + or − before the Lagrange multiplier, and (ii) we can use h(x1, x2) − c or c − h(x1, x2). Mathematically, to obtain a solution, it does not matter which of these (four in total) combinations you use in your Lagrangian setup. However, to get the right interpretation of the value of the Lagrange multiplier, it is sometimes important to set it up in the right manner. We will discuss this point later. For this reason, I advise that you use one of the following setups and make it a habit:

L(x1, x2, λ) = f(x1, x2) + λ[c − h(x1, x2)]   or   L(x1, x2, λ) = f(x1, x2) − λ[h(x1, x2) − c]

Lastly, we generalize this Lagrangian method to n variables and m equality constraints.

Theorem 14-2 (FONC for Equality Constraints): Consider the maximization or minimization problem of the form:

max_{x ∈ R^n}  F(x1, x2, ..., xn)
subject to  h1(x1, x2, ..., xn) = c1
            ...
            hm(x1, x2, ..., xn) = cm

Suppose that F, h1, ..., hm are C¹ functions and that x* = (x1*, x2*, ..., xn*) is a local maximum (or minimum) of F on the constraint set. If x* is not a critical point of h = (h1, ..., hm), then there exists a vector of Lagrange multipliers λ* = (λ1*, λ2*, ..., λm*) such that (x*, λ*) is a critical point of the Lagrangian function:

L(x, λ) = F(x) + λ·[c − h(x)]
        = F(x) + λ1[c1 − h1(x)] + ... + λm[cm − hm(x)]

Note that we haven't defined what it means to be a critical point of a vector-valued multivariate function h.


Definition 14-1 (Critical Point): Let h : R^n → R^m be a C¹ function. A point x is said to be a critical point of h if and only if the rank of its Jacobian matrix, rank(D¹h(x)), is < m.

14-2. Inequality Constraints without Equality Constraints

What if the constraints are inequality constraints? Things get a bit more complicated. To develop our intuition, let's consider a simple case of two variables and one inequality constraint:

max_{x ∈ R²}  f(x1, x2)   s.t.   g(x1, x2) ≤ b   (2)

For this problem, two cases can happen. Case 1: the constraint is binding at the optimum, i.e., g(x1*, x2*) = b. Case 2: the constraint is not binding, i.e., g(x1*, x2*) < b. In other words, the optimum occurs in the interior of the constraint set. Let's see this graphically. (See the Figure.)

Case 1. The maximum would have occurred outside the constraint set if there were no constraint. In this case, we can set the problem up as one with an equality constraint. Thus, the constrained problem is exactly the same as the one we saw in (1) before, and the FONCs would be:

∂f/∂x1 (x*) − λ* ∂g/∂x1 (x*) = 0   (3)
∂f/∂x2 (x*) − λ* ∂g/∂x2 (x*) = 0

Case 2. The maximum is exactly the same as if there were no constraint. In this case, we can set the problem up as an unconstrained optimization. The FONCs would be just:

∂f/∂x1 (x*) = 0   (4)
∂f/∂x2 (x*) = 0

So, we just need a convenient device that lets us deal with both cases. The following 'trick', called a complementary slackness condition, does the job:

λ*[b − g(x1*, x2*)] = 0

Let's examine what this does for us. First, note that (3) reduces to (4) when λ* = 0. Suppose that the constraint is binding. Then b − g(x1*, x2*) = 0, so λ* can be any value. Suppose that the constraint is not binding. Then b − g(x1*, x2*) > 0, so λ* must be zero. So, we have an ideal relationship:

Binding constraint  ⟹  ∂f/∂x1 (x*) − λ* ∂g/∂x1 (x*) = 0,  ∂f/∂x2 (x*) − λ* ∂g/∂x2 (x*) = 0
Non-binding constraint  ⟹  ∂f/∂x1 (x*) = 0,  ∂f/∂x2 (x*) = 0

Question: Can it happen that λ* = 0 and b − g(x1*, x2*) = 0? Answer: Yes. It is called a degenerate case. It rarely happens, but when it does, it means that the unconstrained optimum occurs exactly at the boundary of the constraint set. (See the Figure.) These conditions can be summarized in the following result.

Result 14-3 (FONC for (2)): Suppose that in Problem (2) f and g are C¹ functions and that (x1*, x2*) is the solution. Suppose that, if g(x1*, x2*) = b, then D¹g(x1*, x2*) ≠ 0 (that is, (x1*, x2*) is not a critical point of g). Then, there exists a real number λ* such that, for the Lagrangian function defined by

L(x1, x2, λ) = f(x1, x2) + λ[b − g(x1, x2)]

we have:

(i) ∂L/∂x1 (x1*, x2*, λ*) = 0;
(ii) ∂L/∂x2 (x1*, x2*, λ*) = 0;
(iii) g(x1*, x2*) ≤ b,  λ* ≥ 0,  λ*[b − g(x1*, x2*)] = 0.

I advise that you write the last condition (iii) as a whole set. This way, you are less likely to forget it when there are many mixed constraints. Note that nonnegativity of the Lagrange multiplier comes only if you set up the Lagrangian function correctly. For this problem, there are only two ways to set it up right:

L(x1, x2, λ) = f(x1, x2) + λ[b − g(x1, x2)]
L(x1, x2, λ) = f(x1, x2) − λ[g(x1, x2) − b]

Use + in front of λ if you put the RHS of the 'less than or equal' constraint first (i.e., b − g(x1, x2)), and use − in front of λ if you put the RHS of the 'less than or equal' constraint second (i.e., g(x1, x2) − b). Question: What if we have a resource constraint of the form b ≤ g(x1, x2)? Answer: The same rule applies, so the sign changes: use + in front of λ if you put the RHS of the constraint first (i.e., g(x1, x2) − b). That is, f(x1, x2) + λ[g(x1, x2) − b] or f(x1, x2) − λ[b − g(x1, x2)]. In any case, make a habit of using a correct rule, so that you don't have to 'think' while you are writing a Lagrangian in exams.

Now, we generalize this to an optimization problem with n decision variables and m inequality constraints.

Theorem 14-4 (FONC for Inequality Constraints): Consider the maximization problem of the form:

max_{x ∈ R^n}  F(x1, x2, ..., xn)
subject to  g1(x1, x2, ..., xn) ≤ b1
            ...
            gm(x1, x2, ..., xn) ≤ bm


Suppose that F, g1, ..., gm are C¹ functions and that x* = (x1*, x2*, ..., xn*) is a local maximum of F on the constraint set. Suppose that, when the first k constraints are binding at x*, x* is not a critical point of g_k = (g1, ..., gk); that is, the rank of the Jacobian

    D¹g_k(x*) = ( ∂g1/∂x1 (x*)  ···  ∂g1/∂xn (x*) )
                ( ...                ...           )
                ( ∂gk/∂x1 (x*)  ···  ∂gk/∂xn (x*) )

is k. Then, there exists a vector of Lagrange multipliers λ* = (λ1*, λ2*, ..., λm*) such that, for the Lagrangian function defined by

L(x, λ) = F(x) + λ·[b − g(x)]
        = F(x) + λ1[b1 − g1(x)] + ... + λm[bm − gm(x)]

we have:

(i) ∂L(x*, λ*)/∂xi = 0   for all i = 1, 2, ..., n;
(ii) gj(x*) ≤ bj,  λj* ≥ 0,  λj*[bj − gj(x*)] = 0   for all j = 1, 2, ..., m.

Example 3. Let’s consider a standard utility maximization problem:

maxx2R2

U (x1; x2)   s.t.   p1x1 +  p2x2  I 

Assume that  p1; p2 >  0, so that CQ is satis…ed. The Lagrangian is:

L = U (x1; x2) + [I  p1x1  p2x2]

FONCs are:

@ L

@x1=

  @U 

@x1 p1 = 0

@ L

@x2=

  @U 

@x2 p2 = 0

@ L@   =   I  p1x1  p2x2  0; 0; [I  p1x1  p2x2] = 0

Now, let’s impose another assumption on utility, called monotonicity. That is,   @U @x1> 0 or   @U 

@x2> 0.

This says that increasing consumption of at least one good would strictly increase her utility.Under such assumption,     cannot be zero, for otherwise, FONCs would become   @U 

@x1=   p1   = 0

or   @U @x2

=  p2   = 0. Thus,   >  0. But, then, from the slackness condition,   I    p1x1   p2x2   = 0.


This means that the consumer spends all of her income if her utility is monotonically increasing. This phenomenon is called Walras' Law.

In the above formulation, we omitted the non-negativity constraints x1, x2 ≥ 0. We should have the following problem:

max_{x ∈ R²}  U(x1, x2)   s.t.   p1 x1 + p2 x2 ≤ I,  x1 ≥ 0,  x2 ≥ 0

How can we incorporate these constraints? Notice that we can rewrite the non-negativity constraints as:

−x1 ≤ 0,  −x2 ≤ 0

Then, in terms of Theorem 14-4, b1 = 0, g1 = −x1, b2 = 0, g2 = −x2. Thus, the Lagrangian is:

L = U(x1, x2) + λ1[b1 − g1] + λ2[b2 − g2] + λ3[I − p1 x1 − p2 x2]
  = U(x1, x2) + λ1 x1 + λ2 x2 + λ3[I − p1 x1 − p2 x2]

The FONCs are:

∂L/∂x1 = ∂U/∂x1 + λ1 − λ3 p1 = 0,  x1 ≥ 0,  λ1 ≥ 0,  λ1 x1 = 0   (5)
∂L/∂x2 = ∂U/∂x2 + λ2 − λ3 p2 = 0,  x2 ≥ 0,  λ2 ≥ 0,  λ2 x2 = 0   (6)
∂L/∂λ3 = I − p1 x1 − p2 x2 ≥ 0,  λ3 ≥ 0,  λ3[I − p1 x1 − p2 x2] = 0   (7)

Now, note that in (5), if we have x1 > 0, then from the slackness condition λ1 = 0, which implies ∂U/∂x1 − λ3 p1 = 0. If x1 = 0, then λ1 may not be zero, λ1 ≥ 0, which implies:

∂U/∂x1 − λ3 p1 = −λ1 ≤ 0

We can do the same analysis for (6). In summary, we can simplify our FONCs using only one Lagrange multiplier:

∂U/∂x1 − λ p1 ≤ 0,  x1 ≥ 0,  and  ∂U/∂x1 − λ p1 = 0 if x1 > 0
∂U/∂x2 − λ p2 ≤ 0,  x2 ≥ 0,  and  ∂U/∂x2 − λ p2 = 0 if x2 > 0
I − p1 x1 − p2 x2 ≥ 0,  λ ≥ 0,  λ[I − p1 x1 − p2 x2] = 0

These are called the Kuhn-Tucker conditions. In essence, the Kuhn-Tucker formulation is a special case of Theorem 14-4 in which there are non-negativity constraints but the Lagrangian is formulated without the non-negativity constraints. A small numerical illustration follows below.
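The numerical illustration promised above: a sketch (my own example, not from the note) that solves a concrete utility-maximization problem with SciPy and then checks the Kuhn-Tucker conditions at the reported solution. The utility function U(x1, x2) = sqrt(x1) + sqrt(x2), prices (1, 2), and income 12 are assumptions chosen for illustration.

    import numpy as np
    from scipy.optimize import minimize

    p, I = np.array([1.0, 2.0]), 12.0
    U = lambda x: np.sqrt(x[0]) + np.sqrt(x[1])

    res = minimize(lambda x: -U(x),                      # maximize U = minimize -U
                   x0=np.array([1.0, 1.0]),
                   bounds=[(0, None), (0, None)],        # x1 >= 0, x2 >= 0
                   constraints=[{'type': 'ineq', 'fun': lambda x: I - p @ x}],
                   method='SLSQP')

    x = res.x                                            # roughly (8, 2)
    lam = (0.5 / np.sqrt(x[0])) / p[0]                   # lambda from dU/dx1 = lam*p1
    print(x, lam)
    print(np.isclose(0.5 / np.sqrt(x[1]), lam * p[1], atol=1e-3))   # dU/dx2 = lam*p2
    print(np.isclose(p @ x, I, atol=1e-5))               # the budget binds (Walras' Law)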


Theorem 14-5 (Kuhn-Tucker Theorem): Consider the maximization problem of the form:

max_{x ∈ R^n}  F(x1, x2, ..., xn)
subject to  g1(x1, x2, ..., xn) ≤ b1
            ...
            gm(x1, x2, ..., xn) ≤ bm
            x1 ≥ 0
            ...
            xn ≥ 0

Suppose that F, g1, ..., gm are C¹ functions and that x* = (x1*, x2*, ..., xn*) is a local maximum of F on the constraint set. Suppose that the modified Jacobian matrix

(∂gj/∂xi)_{ji}

has maximal rank, where the j's run over the indices of the gj that are binding at x*, and the i's range over the indices i for which xi* > 0. Then, there exists a vector of Lagrange multipliers λ* such that, for the Kuhn-Tucker Lagrangian defined by

L(x, λ) = F(x) + λ·[b − g(x)]
        = F(x) + λ1[b1 − g1(x)] + ... + λm[bm − gm(x)]

we have:

(i) ∂L(x*, λ*)/∂xi ≤ 0,  xi* ≥ 0,  xi* ∂L(x*, λ*)/∂xi = 0   for all i = 1, 2, ..., n;
(ii) gj(x*) ≤ bj,  λj* ≥ 0,  λj*[bj − gj(x*)] = 0   for all j = 1, 2, ..., m.

So, we have n slackness conditions for the non-negativity constraints and m slackness conditions for the regular constraints.

Lastly, I will state the theorem for the case of mixed constraints. It is a straightforward adaptation of Theorems 14-2 and 14-4.

Theorem 14-6 (FONC for Mixed Constraints): Consider the maximization problem of the form:

max_{x ∈ R^n}  F(x1, x2, ..., xn)
subject to  g1(x1, x2, ..., xn) ≤ b1
            ...
            gk(x1, x2, ..., xn) ≤ bk
            h1(x1, x2, ..., xn) = c1
            ...
            hm(x1, x2, ..., xn) = cm

Suppose that F, g1, ..., gk, h1, ..., hm are C¹ functions and that x* = (x1*, x2*, ..., xn*) is a local maximum of F on the constraint set. Without loss of generality, assume that the first k0 inequality constraints are binding at x*. Suppose that the rank of the Jacobian matrix of the equality constraints plus the binding inequality constraints,

    ( ∂g1/∂x1 (x*)    ···  ∂g1/∂xn (x*)    )
    ( ...                  ...              )
    ( ∂g_{k0}/∂x1 (x*) ··· ∂g_{k0}/∂xn (x*) )
    ( ∂h1/∂x1 (x*)    ···  ∂h1/∂xn (x*)    )
    ( ...                  ...              )
    ( ∂hm/∂x1 (x*)    ···  ∂hm/∂xn (x*)    )

is k0 + m. Then, there exists a vector of Lagrange multipliers λ* = (λ1*, λ2*, ..., λk*), μ* = (μ1*, μ2*, ..., μm*) such that, for the Lagrangian function defined by

L(x, λ, μ) = F(x) + λ·[b − g(x)] + μ·[c − h(x)]
           = F(x) + λ1[b1 − g1(x)] + ... + λk[bk − gk(x)] + μ1[c1 − h1(x)] + ... + μm[cm − hm(x)]

we have:

(i) ∂L(x*, λ*, μ*)/∂xi = 0   for all i = 1, 2, ..., n;
(ii) gj(x*) ≤ bj,  λj* ≥ 0,  λj*[bj − gj(x*)] = 0   for all j = 1, 2, ..., k;
(iii) hl(x*) = cl   for all l = 1, 2, ..., m.

Example 4. Let’s conclude this section with a couple of concrete examples. Consider an opti-mization problem with mixed constraints:

max F (x; y) = 3xy x2

21

8/11/2019 MathReview_3

http://slidepdf.com/reader/full/mathreview3 22/38

subject to   2x y   = 5   (1)

5x + 2y     8   (2)

x     0   (3)y     0   (4)

Let us use a Kuhn-Tucker formulation. So, the Lagrangian can be written:

L = 3xy x2 + 1[5 2x + y] + 2[5x + 2y 8]

First, we need to check a constraint quali…cation. The Jacobian of the system (1)-(2) (for Kuhn-Tucker) is:

  @g1=@ x @ g1=@y@g2=@ x @ g2=@y

 =

  2   15 2

The two vectors are linearly independent, so it has maximal rank=2, regardless of what the solutionsare. Thus, it satis…es CQ. Now, from KTNC, we have:

@ L

@x  = 3y 2x 21 + 52  0; x 0;   and x[2y 3x2 21 + 52] = 0   (5)

@ L

@y  = 3x + 1 + 22  0; y  0;   and y [3x + 1 + 22] = 0   (6)

@ L

@1= 5 2x + y = 0   (This must be always binding) (7)

@ L

@2

= 5x + 2y 8 0; 2   0;   and 2[5x + 2y 8] = 0   (8)

There may be a couple of possible cases. But, let’s eliminate impossible cases. Suppose  x  = 0.Then, from (7), we have  y  = 5 <  0, which contradicts  y   0. So,  x > 0. If  y  = 0, then from (7),we have x  = 5=2. We are done. Now, suppose that both  x > 0  and  y > 0. By slackness conditions,we know that:

3y 2x 21 + 52   = 0   (9)

3x + 1 + 22   = 0   (10)

Combining these with (7), we have three equations with four unknowns. We need to eliminate at

least one more variable. Fortunately, we can eliminate  2. To see this, suppose by contradictionthat 2 >  0. Then, it must be the case that  5x + 2y = 17 by slackness condition. Combining with(7),

5x + 2y   = 8

4x 2y   = 10


This gives us x = 2, y = −1 < 0. So, this cannot be a solution. Thus, we can set λ2 = 0. Then, the system (7), (9), and (10) reduces to a system of 3 equations with 3 unknowns:

2x − y = 5
3y − 2x − 2λ1 = 0
3x + λ1 = 0

With appropriate algebra, we should be able to solve this system. But, in fact, solving it leads to y = −2, which contradicts y > 0. Thus, the unique (candidate) solution for this problem is x = 5/2 and y = 0.

Example 5. Lastly, let’s consider the following utility maximization problem.

max U (x; y) = x2 + y2

subject to   2x + y     4   (1)

x     0   (2)

y     0   (3)

For this problem, it might be helpful to use a diagram. Draw several level sets for the objectivefunction. Each level set is a quarter piece of a circle in the positive orthant of  R2. So, from thediagram, it is clear that a solution is on the y-axis. The question is: Can we con…rm this using theKuhn-Tucker theorem? (BTW, you will be often asked to do both diagrammatic and mathematicalanalyses in a micro sequence). The Lagrangian is:

L = x2 + y2 + [4 2x y]

By FONCs, we have:

Lx   = 2x 2

  = 0   if  x > 0 0   if  x = 0

  (4)

Ly   = 2y

  = 0   if  y > 0 0   if  y = 0

  (5)

L   = 4 2x y

  = 0   if   > 0 0   if    = 0

  (6)

Note that U x = 2x 0 and  U y  = 2y  0 and it cannot happen that both  x = y = 0 (for example,(1; 1) is within the budget and  U (1; 1) > U (0; 0)). So, We have U x = 2x > 0  or  U y  = 2y > 0. Thus,by Walras law, the consumer must spend all of her resource. We can consider three possible casesof solutions (See the diagram).

Case 1:   x > 0; y > 0:Case 2:   x > 0; y = 0:

23

8/11/2019 MathReview_3

http://slidepdf.com/reader/full/mathreview3 24/38

Case 3:   x = 0; y > 0:

Consider each case.
Case 1: Suppose x > 0, y > 0. From (4), 2x − 2λ = 0, so that x = λ. From (5), 2y − λ = 0, so that 2y = λ. Thus, x = 2y. By assumption, x > 0 and y > 0, so that λ > 0. This implies that 4 − 2x − y = 0. Combining with x = 2y, we have 4 − 4y − y = 0. So, y = 4/5 and x = 8/5.

Case 2: Suppose x > 0, y = 0. From (4), 2x − 2λ = 0, so that x = λ. This implies λ > 0. So, from (6), 4 − 2x − y = 0. Substituting y = 0, we have x = 2.

Case 3: Suppose x = 0, y > 0. From (5), 2y − λ = 0, so that 2y = λ. This implies λ > 0. So, again from (6), 4 − 2x − y = 0. Substituting x = 0, we have y = 4.

So, we have obtained three candidate solutions directly from the KT conditions. Now substitute those candidates into the objective:

Case 1: U(x*, y*) = (8/5)² + (4/5)² = 64/25 + 16/25 = 80/25 = 16/5.
Case 2: U(x*, y*) = 2² + 0² = 4 = 20/5.
Case 3: U(x*, y*) = 0² + 4² = 16 = 80/5.

Obviously, x* = 0, y* = 4 gives the highest value of the objective, so it is the solution. The lesson here is: although KT conditions are useful, they are still only necessary conditions for optimality. We will learn second-order sufficient conditions in the next section. A quick numerical comparison of the three candidates appears below.
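The quick numerical comparison promised above (an illustrative sketch, not part of the note):

    candidates = [(8/5, 4/5), (2.0, 0.0), (0.0, 4.0)]      # Cases 1, 2, 3
    U = lambda x, y: x**2 + y**2

    feasible = [(x, y) for (x, y) in candidates
                if 2*x + y <= 4 + 1e-12 and x >= 0 and y >= 0]
    best = max(feasible, key=lambda p: U(*p))
    print({p: U(*p) for p in candidates})   # {(1.6, 0.8): 3.2, (2.0, 0.0): 4.0, (0.0, 4.0): 16.0}
    print(best)                              # (0.0, 4.0): the KT conditions are only necessary,
                                             # so the candidates must still be compared.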

15. Constrained Optimization II: Comparative Statics and SOCs

In this section, we will learn (i) comparative statics and (ii) second-order conditions that distinguish maxima from minima.

15-1. Comparative Statics and the Envelope Theorem

By comparative statics, we mean the sensitivity of (i) the optimal value of the objective and (ii) the optimal values of the decision variables with respect to changes in the primitive parameters of the problem. For example, consider a standard utility maximization:

max U(x, y)
subject to  px + y ≤ I,  x ≥ 0,  y ≥ 0

We have two primitive parameters: the relative price of a good, p, and income, I. Thus, in general, the solution to this problem will be written as:


x*(p, I)
y*(p, I)
v(p, I) = U(x*(p, I), y*(p, I))

as functions of the primitive parameters. As many of you know, v(p, I) is called an indirect utility function and x*(p, I), y*(p, I) are (individual) demand functions. Our interest here is: how would a change in these parameters change v(p, I), x*(p, I), y*(p, I)? More explicitly, what are:

∂x*(p, I)/∂I,   ∂x*(p, I)/∂p,   ∂v(p, I)/∂I,   ∂v(p, I)/∂p ?

In many economic applications, we have to work very hard to get these comparative statics. However, there are several useful theorems that we can employ in such an endeavor: a theorem on the shadow price and the Envelope Theorem. To get an idea, let's turn to a simple problem: one primitive parameter on one equality constraint.

parameter on one equality constraint.

max f(x, y)   (1)
subject to  g(x, y) = a

We can write the optimal value of this problem as:

f(x*(a), y*(a))

We can prove that:

λ(a) = (d/da) f(x*(a), y*(a))

where λ(a) is the Lagrange multiplier on the equality constraint. This says that the Lagrange multiplier measures the rate of change of the optimal value of f with respect to a marginal change in the resource constraint. Let's prove this result.

Proof: The Lagrangian for problem (1) is:

L = f(x, y) + λ[a − g(x, y)]

By the FONCs, we have:

∂L/∂x = ∂f(x*, y*)/∂x − λ ∂g(x*, y*)/∂x = 0  ⟹  ∂f(x*, y*)/∂x = λ ∂g(x*, y*)/∂x   (2)
∂L/∂y = ∂f(x*, y*)/∂y − λ ∂g(x*, y*)/∂y = 0  ⟹  ∂f(x*, y*)/∂y = λ ∂g(x*, y*)/∂y   (3)

Now, because g(x*(a), y*(a)) = a for all a, we can consider this as an identity. So, let's take the derivative of both sides of the identity with respect to a:


dg(x*(a), y*(a))/da = (d/da) a  ⟹  (∂g(x*, y*)/∂x)(dx*(a)/da) + (∂g(x*, y*)/∂y)(dy*(a)/da) = 1   (4)

Using the chain rule, we can compute:

(d/da) f(x*(a), y*(a)) = (∂f(x*, y*)/∂x)(dx*(a)/da) + (∂f(x*, y*)/∂y)(dy*(a)/da)

Substituting (2) and (3) first and then (4),

(d/da) f(x*(a), y*(a)) = (∂f(x*, y*)/∂x)(dx*(a)/da) + (∂f(x*, y*)/∂y)(dy*(a)/da)
                       = λ(a)(∂g(x*, y*)/∂x)(dx*(a)/da) + λ(a)(∂g(x*, y*)/∂y)(dy*(a)/da)
                       = λ(a)[(∂g(x*, y*)/∂x)(dx*(a)/da) + (∂g(x*, y*)/∂y)(dy*(a)/da)]
                       = λ(a)

Thus, we have obtained the desired result. QED

It is relatively straightforward to generalize this result to the case of several equality constraints and inequality constraints. I will state it without proof.

Theorem 15-1 (Shadow Price): Consider the following maximization problem:

max_{x ∈ R^n}  F(x1, x2, ..., xn)
subject to  g1(x1, x2, ..., xn) ≤ b1
            ...
            gk(x1, x2, ..., xn) ≤ bk
            h1(x1, x2, ..., xn) = c1
            ...
            hm(x1, x2, ..., xn) = cm

Suppose that F, g1, ..., gk, h1, ..., hm are C¹ functions and that x* = (x1*, x2*, ..., xn*) is a local maximum of F on the constraint set. Let λ*(b, c) = (λ1*, λ2*, ..., λk*), μ*(b, c) = (μ1*, μ2*, ..., μm*) be the corresponding Lagrange multipliers for the Lagrangian:

L(x, λ, μ) = F(x) + λ·[b − g(x)] + μ·[c − h(x)]
           = F(x) + λ1[b1 − g1(x)] + ... + λk[bk − gk(x)] + μ1[c1 − h1(x)] + ... + μm[cm − hm(x)]


Suppose that x*(b, c), λ*(b, c), μ*(b, c) are differentiable with respect to (b, c) and that the relevant CQ holds at x*(b, c). Then,

λj*(b, c) = ∂F(x*(b, c))/∂bj   for all j = 1, 2, ..., k
μl*(b, c) = ∂F(x*(b, c))/∂cl   for all l = 1, 2, ..., m

Thus, a Lagrange multiplier λj* measures the effect of a marginal change in input j on the objective value. In this view, economists often call λj* the shadow price (or imputed value) of input j.

Example 1. Consider again:

max U(x, y)
subject to  px + y ≤ I,  x ≥ 0,  y ≥ 0

Let's treat the price p as fixed, so we write the objective value as U(x*(I), y*(I)). Then, by the above theorem,

λ = dU(x*(I), y*(I))/dI

Thus, the Lagrange multiplier on the budget constraint measures the shadow price of income. A small numerical check of this appears below.
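The numerical check promised above. It uses my own closed-form example U(x, y) = ln x + ln y (an assumption, not from the note), for which the solution is x*(I) = I/(2p), y*(I) = I/2 with multiplier λ(I) = 2/I, and compares λ with a finite-difference estimate of dv/dI:

    import numpy as np

    p = 2.0
    v = lambda I: np.log(I / (2 * p)) + np.log(I / 2)   # indirect utility for U = ln x + ln y
    I0, h = 10.0, 1e-6
    dv_dI = (v(I0 + h) - v(I0 - h)) / (2 * h)            # central finite difference
    lam = 2.0 / I0                                        # multiplier lambda(I) = 2/I
    print(dv_dI, lam)                                     # both approximately 0.2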

Now, I will state and prove an easy version of the Envelope Theorem. Although some people may find it obvious, this is a very useful result and yet is sometimes not understood properly. The confusion comes mainly from the notation. For this reason, I slightly deviate from Simon & Blume's notation.

Theorem 15-2 (Envelope Theorem I): Let F : R^n × R^m → R be a continuously differentiable function. For each fixed vector of parameters θ ∈ R^m, consider the maximization problem:

max_{x ∈ R^n}  F(x; θ)

Let x*(θ) be the solution and v(θ) = max_{x ∈ R^n} F(x; θ) = F(x*(θ); θ) be the value function of this problem. If x*(θ) is a C¹ function, then:

∂v(θ)/∂θj = ∂F(x; θ)/∂θj |_{x = x*(θ)}   (#)

That is, the derivative of the value function with respect to a parameter is equal to the derivative of the objective function with respect to that parameter, evaluated at the optimum x*(θ).


Proof: We can prove this using the Chain Rule and the total differential:

∂v(θ)/∂θj = ∂F(x*(θ); θ)/∂θj
          = [ Σ_{i=1}^{n} (∂F(x; θ)/∂xi)(∂xi*(θ)/∂θj) + ∂F(x; θ)/∂θj ]_{x = x*(θ)}
          = ∂F(x; θ)/∂θj |_{x = x*(θ)}

where the last equality follows because ∂F(x*; θ)/∂xi = 0 at the optimum by the FONC. QED

Question: What's the point of this theorem? Answer: It is sometimes cumbersome to compute the LHS of (#). But, by this theorem, we can simply compute the RHS of (#). To see how it works, consider the following example.

Example 2.

max_x  −x² + 2ax + 4a²

where a is the primitive parameter of this problem. We are interested in the effect of a change in a on the optimal value. The direct approach is to compute the LHS of (#). Let's first derive the solution. By the FONC,

Fx = −2x + 2a = 0  ⟹  x* = a

Then, the value function is:

v(a) = −a² + 2a·a + 4a² = 5a²

Thus, its derivative is:

dv(a)/da = 10a

Now, let's use the Envelope Theorem. We can simply compute:

∂F/∂a = 2x + 8a

Evaluating this at the optimum x* = a, we obtain 2x* + 8a = 10a, which is exactly the same as before.
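A minimal SymPy sketch (not part of the note) verifying this calculation symbolically:

    import sympy as sp

    x, a = sp.symbols('x a', real=True)
    F = -x**2 + 2*a*x + 4*a**2

    x_star = sp.solve(sp.diff(F, x), x)[0]       # x*(a) = a
    v = sp.simplify(F.subs(x, x_star))           # value function v(a) = 5a^2
    direct = sp.diff(v, a)                       # LHS of (#): 10a
    envelope = sp.diff(F, a).subs(x, x_star)     # RHS of (#): 2x + 8a at x = a, i.e. 10a
    print(direct, envelope, sp.simplify(direct - envelope) == 0)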

A more surprising result is that we can generalize this theorem to parameters that appear in the constraints. That is,

Theorem 15-3 (Envelope Theorem II): Let F, G : R^n × R^m → R be continuously differentiable functions. For each fixed vector of parameters θ ∈ R^m, consider the maximization problem:

max_{x ∈ R^n}  F(x; θ)   subject to  G(x; θ) ≤ 0

(Note that some of the parameters may be in the objective while others may be in the constraint.) Let x*(θ) be the solution and v(θ) = F(x*(θ); θ) be the value function of this problem. Write the Lagrangian:

L = F(x; θ) − λ G(x; θ)

If x*(θ) and λ*(θ) are C¹ functions and the CQ conditions are satisfied, then

∂v(θ)/∂θj = ∂L(x, λ; θ)/∂θj |_{x = x*(θ), λ = λ*(θ)}

In fact, this theorem generalizes to the case of several constraints. Let's see how it works with a couple of examples.

Example 3. Consider a standard utility maximization again:

max U(x, y)
subject to  px + y ≤ I,  x ≥ 0,  y ≥ 0

Let v(p, I) = U(x*(p, I), y*(p, I)) be its indirect utility function and L = U(x, y) + λ[I − px − y]. Then, using the Envelope Theorem,

∂v(p, I)/∂I = ∂L/∂I |_{x*(p,I), y*(p,I), λ*(p,I)} = λ*(p, I)

This shows that the shadow price theorem is in fact a special case of the Envelope Theorem. The advantage of the Envelope Theorem, however, goes beyond that of the shadow price theorem. Let's take the derivative with respect to price:

∂v(p, I)/∂p = ∂L/∂p |_{x*(p,I), y*(p,I), λ*(p,I)} = −λ*(p, I) x*(p, I)

Substituting the result above for λ*(p, I), we get


(∂v(p, I)/∂p) / (∂v(p, I)/∂I) = −x*(p, I)

Or, in a more conventional notation,

− (∂v(p, I)/∂p) / (∂v(p, I)/∂I) = x*

In general, for the n-goods case,

− (∂v(p, I)/∂pi) / (∂v(p, I)/∂I) = xi*   for i = 1, 2, ..., n

This is called Roy's Identity, which says that the (Marshallian) demand for good i can be obtained as a quotient of two partial derivatives of the value function.

Example 4. Consider a …rm’s cost minimization problem:

min  w x

subject to   F (x) y

where   w   is a vector of input prices and   F   is a production function. Let   C (w) =   w x andL = w x+[F (x) y]. (Note  minw x = maxw x). By Envelope Theorem,

@C (w)

@wi=  

  @ L

@wi

x;

=   xi   for i = 1; 2;:::n

This is called  Shephard’s Lemma, which says that input demand for good  i  can be obtained bya derivative of the cost function evaluated at optimum.

15-2. Second Order Conditions

Up to now, we have only looked at first-order necessary conditions. But, as we know, FONCs do not guarantee a solution. We need second-order conditions. Recall from Section 13 that we need to check the definiteness of the Hessian matrix for sufficient conditions:

Theorem 13-2 (SOSC for Local Max/Min): Let F : U ⊆ R^n → R be a C² function. Suppose that U is open and that D¹F(x*) = 0^T.
(i) If the Hessian D²F(x*) is negative definite, then x* is a (strict) local maximum of F;
(ii) If the Hessian D²F(x*) is positive definite, then x* is a (strict) local minimum of F;
(iii) If the Hessian D²F(x*) is indefinite, then x* is neither a local maximum nor a local minimum of F.

But this theorem is for unconstrained optimization. What if we have constraints? Intuitively, all we need is to study the definiteness of the Hessian on the constraint space. How can we do that? Recall that we studied a very simple case of constrained optimization with a quadratic function in Section 12. We constructed a bordered matrix of the form:

    H_{n+1} = ( 0    c )
              ( c^T  A )

where c is the vector of coefficients of the linear constraint c·x = 0. We basically combine these methods together.

Theorem 15-4 (SOSC for Constrained Local Max/Min): Let F, h1, ..., hm be C² functions on R^n. Consider the problem:

max_{x ∈ R^n}  F(x1, x2, ..., xn)
subject to  h1(x1, x2, ..., xn) = c1
            ...
            hm(x1, x2, ..., xn) = cm

Form the Lagrangian as usual, L = F(x) + λ·[c − h(x)], and let (x*, λ*) satisfy the FONCs. Suppose that "the Hessian of L with respect to x at (x*, λ*), D²_x L(x*, λ*), is negative definite on the linear constraint set {v : D¹h(x*)v = 0}"; then x* is a strict local constrained maximum of F.

What does the condition in quotation marks mean? It means that the following bordered Hessian matrix

$$\bar{H}_{m+n} = \begin{pmatrix} \underset{m\times m}{\mathbf{0}} & \underset{m\times n}{D^1h(x^*)} \\[4pt] \underset{n\times m}{D^1h(x^*)^T} & \underset{n\times n}{D^2_x L(x^*,\lambda^*)} \end{pmatrix}
= \begin{pmatrix}
0 & \cdots & 0 & \frac{\partial h_1}{\partial x_1} & \cdots & \frac{\partial h_1}{\partial x_n} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \frac{\partial h_m}{\partial x_1} & \cdots & \frac{\partial h_m}{\partial x_n} \\
\frac{\partial h_1}{\partial x_1} & \cdots & \frac{\partial h_m}{\partial x_1} & \frac{\partial^2 L}{\partial x_1^2} & \cdots & \frac{\partial^2 L}{\partial x_n\partial x_1} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial h_1}{\partial x_n} & \cdots & \frac{\partial h_m}{\partial x_n} & \frac{\partial^2 L}{\partial x_1\partial x_n} & \cdots & \frac{\partial^2 L}{\partial x_n^2}
\end{pmatrix}$$

satisfies two conditions: (i) the last $(n-m)$ leading principal minors alternate in sign, and (ii) $\det(\bar{H}) > 0$ if $n$ is even, or $\det(\bar{H}) < 0$ if $n$ is odd.
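Since checking these sign conditions by hand is tedious, a rough numerical helper can do it at a candidate point. The sketch below is my own construction: the toy problem ($\max xy$ subject to $x+y = 10$) and the helper code are assumptions for illustration, not taken from the note.

```python
# Build the bordered Hessian at the candidate (x*, y*, lambda*) = (5, 5, 5) for
# max F(x, y) = x*y  s.t.  x + y = 10, and inspect the relevant minors.
import numpy as np

m, n = 1, 2                      # one constraint, two variables
Dh = np.array([[1.0, 1.0]])      # D^1 h(x*) for h(x, y) = x + y
D2L = np.array([[0.0, 1.0],      # D^2_x L(x*, lambda*) for L = x*y + lam*(10 - x - y)
                [1.0, 0.0]])
H = np.block([[np.zeros((m, m)), Dh],
              [Dh.T, D2L]])

# Last (n - m) leading principal minors: orders 2m+1, ..., m+n
minors = [np.linalg.det(H[:k, :k]) for k in range(2 * m + 1, m + n + 1)]
print(minors)                          # here: [2.0]
print(np.sign(minors[-1]) == (-1)**n)  # True: consistent with a local max
```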

There are related theorems in Simon & Blume, pp. 460-467. As you can see, it is very cumbersome to check second order conditions (even with a computer, writing the code is tedious). Moreover, this second order condition cannot guarantee a global solution.


So, what we usually do is to study properties of the objective and constraint functions under which global solutions are guaranteed. This method is called concave programming.

Example 5. Consider a standard utility maximization again:

$$\max\; U(x,y) \quad\text{subject to}\quad p_1 x + p_2 y = h(x,y) = I$$

Let’s construct a bordered Hessian.

$$\bar{H} = \begin{pmatrix} 0 & h_1 & h_2 \\ h_1 & L_{xx} & L_{yx} \\ h_2 & L_{xy} & L_{yy} \end{pmatrix} \quad\Longrightarrow\quad \bar{H} = \begin{pmatrix} 0 & p_1 & p_2 \\ p_1 & U_{xx} & U_{yx} \\ p_2 & U_{xy} & U_{yy} \end{pmatrix}$$

(here $h_1 = p_1$, $h_2 = p_2$, and, since the constraint is linear, the second derivatives of $L$ with respect to $x$ and $y$ coincide with those of $U$).

We want to have:

$$\det(\bar{H}) = \det\begin{pmatrix} 0 & p_1 & p_2 \\ p_1 & U_{xx} & U_{yx} \\ p_2 & U_{xy} & U_{yy} \end{pmatrix} > 0$$

It turns out that $\det(\bar{H}) > 0$ if $U$ is quasiconcave, a concept we study in the next section.

16. Concave and Quasiconcave Functions

In economic applications, you will often encounter concave and quasiconcave functions. It is essential that you familiarize yourself with the properties of these functions. In general,

(i) For a global maximizer of an unconstrained maximization problem, we want the objective function to be concave.
(ii) For a global maximizer of a constrained maximization problem, we want the objective function to be quasiconcave and the constraint functions to be quasiconvex.

The objective of this section is three-fold:
(a) To learn properties of concave and convex functions;
(b) To learn properties of quasiconcave and quasiconvex functions;
(c) To learn concave programming.

16-1. Concave and Convex Functions 

First, let's recall the definitions of concave and convex functions.

Definition 13-2 (Concave & Convex Functions): A function $F : U \subset \mathbb{R}^n \to \mathbb{R}$ is concave on $U$ if and only if for all $x, y \in U$ and all $\lambda \in [0,1]$, we have:

$$F[\lambda x + (1-\lambda)y] \ge \lambda F(x) + (1-\lambda)F(y)$$

A function $F : U \subset \mathbb{R}^n \to \mathbb{R}$ is convex on $U$ if and only if for all $x, y \in U$ and all $\lambda \in [0,1]$, we have:


$$F[\lambda x + (1-\lambda)y] \le \lambda F(x) + (1-\lambda)F(y)$$

A function $F : U \subset \mathbb{R}^n \to \mathbb{R}$ is strictly concave on $U$ if and only if for all $x \ne y \in U$ and all $\lambda \in (0,1)$, we have:

$$F[\lambda x + (1-\lambda)y] > \lambda F(x) + (1-\lambda)F(y)$$

A function $F : U \subset \mathbb{R}^n \to \mathbb{R}$ is strictly convex on $U$ if and only if for all $x \ne y \in U$ and all $\lambda \in (0,1)$, we have:

$$F[\lambda x + (1-\lambda)y] < \lambda F(x) + (1-\lambda)F(y)$$

We have the following characterizations of concave (convex) functions.

Theorem 16-1 (Properties of Concave Functions): Let $F : U \subset \mathbb{R}^n \to \mathbb{R}$ be a $C^2$ function and $U$ a convex open subset of $\mathbb{R}^n$.

(a) $F$ is a concave function on $U$ if and only if:
(i) $F(y) - F(x) \le D^1F(x)(y - x)$ for all $x, y \in U$;
(ii) $D^2F(x)$ is negative semidefinite for all $x \in U$;
(iii) $-F$ is convex on $U$;
(iv) $\alpha F$ is concave for all $\alpha \ge 0$;
(v) for $i = 1, \ldots, n$, $F(x_1, \ldots, x_i, \ldots, x_n)$ is concave in $x_i$ for each fixed $(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$;
(vi) its restriction to every line segment in $U$ is a concave function.

(b) Let $F_1, F_2, \ldots, F_m$ be concave functions and let $\alpha_1, \alpha_2, \ldots, \alpha_m$ be positive numbers. Then the linear combination $\sum_{j=1}^{m} \alpha_j F_j$ is a concave function.

(c) If $F$ is a concave function on $U$, then for every $x^* \in U$ the upper contour set

$$C^+(x^*) = \{x \in U : F(x) \ge F(x^*)\}$$

is a convex set. (If $F$ is convex, then the lower contour set is a convex set.) The converse is not true in general.

We saw (a)-(i), (ii), and (iii) already, and (a)-(iv) is obvious. The implication of (v) and (vi) is very important: it says that we can determine the concavity of a function just by looking at the geometric features of its graph. In particular, let $F$ be a function from $\mathbb{R}^2$ to $\mathbb{R}$. Then (vi) means that if one slices the graph of $F$ along any line segment in $\mathbb{R}^2$, one should see the graph of a one-variable concave function. Question: Can you write down the analogous theorem for convex functions?

Example 1. In dynamic programming, we often see the following time-separable utility function for an infinite sequence of consumption $c = \{c_t\}_{t=0}^{\infty}$:

$$U(c) = \sum_{t=0}^{\infty} \beta^t u_t(c_t)$$

where $\beta$ is a discount factor. If each $u_t(\cdot)$ is concave, then $U(c)$ is also concave by (b), as long as the infinite sum is well-defined.
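As a quick numerical sanity check of this claim, here is a sketch of my own (with a truncated horizon standing in for the infinite sum, and log period utility as an assumed example): it verifies the concavity inequality at randomly drawn consumption sequences.

```python
# Check concavity of U(c) = sum_t beta**t * log(c_t) over a finite horizon T.
import numpy as np

rng = np.random.default_rng(0)
beta, T = 0.95, 50

def U(c):
    t = np.arange(T)
    return np.sum(beta**t * np.log(c))

for _ in range(1000):
    c1, c2 = rng.uniform(0.1, 10.0, T), rng.uniform(0.1, 10.0, T)
    lam = rng.uniform()
    mix = lam * c1 + (1 - lam) * c2
    # Concavity: U(lam*c1 + (1-lam)*c2) >= lam*U(c1) + (1-lam)*U(c2)
    assert U(mix) >= lam * U(c1) + (1 - lam) * U(c2) - 1e-9
print("no violations found")
```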


Example 2. Consider a Cobb-Douglas function:

$$F(x,y) = Ax^a y^b \quad\text{where } A, a, b > 0 \text{ and } a + b \le 1$$

Question: Is this concave or convex? Answer: It is a concave function on $\mathbb{R}^2_{++}$. To see this, we only need to show that $x^a y^b$ is concave. Let's construct the Hessian:

$$D^2F = H = \begin{pmatrix} a(a-1)x^{a-2}y^b & abx^{a-1}y^{b-1} \\ abx^{a-1}y^{b-1} & b(b-1)x^a y^{b-2} \end{pmatrix}$$

$$\det(H_1) = a(a-1)x^{a-2}y^b < 0, \quad \text{because } a - 1 < 0$$
$$\det(H) = ab(1-a-b)x^{2a-2}y^{2b-2} \ge 0, \quad \text{because } a + b \le 1$$

So, under these assumptions, the Hessian is negative semidefinite, and hence $F$ is concave. In fact, if $a + b = 1$, $F$ is concave; if $a + b < 1$, $F$ is strictly concave.
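The same Hessian computation can be delegated to a computer algebra system. The short sketch below is my own illustration (not from the note) and reproduces the two leading principal minors symbolically.

```python
# Symbolic Hessian of F(x, y) = x**a * y**b on the positive orthant.
import sympy as sp

x, y, a, b = sp.symbols('x y a b', positive=True)
F = x**a * y**b
H = sp.hessian(F, (x, y))

H1 = H[0, 0]                 # first leading principal minor
detH = sp.factor(H.det())    # determinant of the full Hessian
print(sp.factor(H1))         # should equal a*(a - 1)*x**(a - 2)*y**b
print(detH)                  # should equal a*b*(1 - a - b)*x**(2*a - 2)*y**(2*b - 2)
# H1 < 0 for a < 1 and det(H) >= 0 for a + b <= 1, matching the text above.
```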

Example 3. We can show that an expenditure function is concave in prices. Consider a standard expenditure minimization problem.

$$\min_{x\in\mathbb{R}^n_+}\; p\cdot x \quad\text{subject to}\quad U(x) \ge \bar{U}$$

Let $e(p, \bar{U}) = \min_{x\in\mathbb{R}^n_+}\{p\cdot x : U(x) \ge \bar{U}\}$ be the expenditure function. We want to show that $e(\cdot, \bar{U})$ is a concave function of $p$ for each fixed $\bar{U}$. Now, pick arbitrary price vectors $p_1, p_2$ and $\lambda \in [0,1]$. Let $p_\lambda = \lambda p_1 + (1-\lambda)p_2$ and let $x^*(p_1), x^*(p_2), x^*(p_\lambda)$ be the solutions to the corresponding minimization problems. By definition,

$$e(p_1, \bar{U}) = \min_{x\in\mathbb{R}^n_+}\{p_1\cdot x : U(x) \ge \bar{U}\} = p_1\cdot x^*(p_1) \le p_1\cdot x^*(p_\lambda), \quad (1)$$

because $x^*(p_\lambda)$ is feasible but not necessarily the cost-minimizing bundle at prices $p_1$.

Similarly,

$$e(p_2, \bar{U}) = p_2\cdot x^*(p_2) \le p_2\cdot x^*(p_\lambda), \quad (2)$$

because $x^*(p_\lambda)$ is feasible but not necessarily the cost-minimizing bundle at prices $p_2$.

Taking the convex combination,

$$\lambda e(p_1, \bar{U}) + (1-\lambda)e(p_2, \bar{U}) = \lambda p_1\cdot x^*(p_1) + (1-\lambda)p_2\cdot x^*(p_2) \le \lambda p_1\cdot x^*(p_\lambda) + (1-\lambda)p_2\cdot x^*(p_\lambda) \quad \text{by (1) and (2)}$$
$$= [\lambda p_1 + (1-\lambda)p_2]\cdot x^*(p_\lambda) = p_\lambda\cdot x^*(p_\lambda) = e(p_\lambda, \bar{U}) \quad \text{by definition.}$$

Using a similar argument, we can show that a profit function is convex in prices. Try it yourself.
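The concavity of the expenditure function can also be illustrated numerically. The brute-force sketch below is my own construction: the utility $U(x_1,x_2)=\sqrt{x_1 x_2}$, the grid approximation, and the sample price vectors are all assumptions for illustration.

```python
# Approximate e(p, Ubar) by grid search and check the concavity inequality along
# a segment of price vectors.
import numpy as np

Ubar = 2.0
grid = np.linspace(0.01, 20.0, 400)
X1, X2 = np.meshgrid(grid, grid)
feasible = np.sqrt(X1 * X2) >= Ubar      # constraint set {U(x) >= Ubar} on the grid

def e(p):
    # cheapest feasible bundle on the grid at prices p
    return np.min((p[0] * X1 + p[1] * X2)[feasible])

p1, p2 = np.array([1.0, 3.0]), np.array([4.0, 0.5])
e1, e2 = e(p1), e(p2)
for lam in np.linspace(0.0, 1.0, 11):
    # e(lam*p1 + (1-lam)*p2) >= lam*e(p1) + (1-lam)*e(p2)
    assert e(lam * p1 + (1 - lam) * p2) >= lam * e1 + (1 - lam) * e2 - 1e-9
print("concavity inequality holds at all tested points")
```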


16-2. Quasiconcave and Quasiconvex Functions 

Let's first define what is meant by a quasiconcave or quasiconvex function.

Definition 16-1 (Quasiconcave & Quasiconvex Functions):
(i) A function $F : U \subset \mathbb{R}^n \to \mathbb{R}$ is quasiconcave on $U$, where $U$ is a convex set in $\mathbb{R}^n$, if and only if for all $x, y \in U$ and all $\lambda \in [0,1]$, we have:

$$F(x) \ge F(y) \implies F(\lambda x + (1-\lambda)y) \ge F(y)$$

(ii) A function $F : U \subset \mathbb{R}^n \to \mathbb{R}$ is quasiconvex on $U$, where $U$ is a convex set in $\mathbb{R}^n$, if and only if for all $x, y \in U$ and all $\lambda \in [0,1]$, we have:

$$F(x) \le F(y) \implies F(\lambda x + (1-\lambda)y) \le F(y)$$

(iii) A function $F : U \subset \mathbb{R}^n \to \mathbb{R}$ is strictly quasiconcave on $U$, where $U$ is a convex set in $\mathbb{R}^n$, if and only if for all $x \ne y \in U$ and all $\lambda \in (0,1)$, we have:

$$F(x) \ge F(y) \implies F(\lambda x + (1-\lambda)y) > F(y)$$

(iv) A function $F : U \subset \mathbb{R}^n \to \mathbb{R}$ is strictly quasiconvex on $U$, where $U$ is a convex set in $\mathbb{R}^n$, if and only if for all $x \ne y \in U$ and all $\lambda \in (0,1)$, we have:

$$F(x) \le F(y) \implies F(\lambda x + (1-\lambda)y) < F(y)$$

What does a quasiconcave (quasiconvex) function look like? (See the graphs.) Are these functions quasiconcave?

Obviously, quasiconcavity is a geometric characterization. So, we have the following equivalence theorem.

Theorem 16-2 (Properties of Quasiconcave Functions): Let $F : U \subset \mathbb{R}^n \to \mathbb{R}$ be defined on a convex set $U$. Then the following are equivalent:
(a) $F$ is quasiconcave on $U$;
(b) for every real number $a$, the upper contour set

$$C^+(a) = \{x \in U : F(x) \ge a\}$$

is a convex set in $U$;
(c) for all $x, y \in U$ and all $\lambda \in [0,1]$,

$$F(\lambda x + (1-\lambda)y) \ge \min\{F(x), F(y)\};$$

(d) $-F$ is quasiconvex on $U$.

Moreover, in view of Theorem 16-1 (c), we have:


$$F \text{ is concave} \implies F \text{ is quasiconcave}$$
$$F \text{ is strictly concave} \implies F \text{ is strictly quasiconcave}$$

Theorem 16-3 (Cobb-Douglas Functions): Any Cobb-Douglas function $F(x,y) = Ax^a y^b$ is quasiconcave for $A, a, b > 0$.

Question: How can we check whether a function $F$ is quasiconcave on $\mathbb{R}$ or $\mathbb{R}^2$? Answer: If $F : \mathbb{R} \to \mathbb{R}$ is monotone increasing, it is quasiconcave. To see this, suppose $F(x) \ge F(y)$. Then it must be that $x \ge y$, for otherwise $x < y$ would imply $F(x) < F(y)$, a contradiction. But then the convex combination satisfies $\lambda x + (1-\lambda)y \ge y$, so by monotonicity $F(\lambda x + (1-\lambda)y) \ge F(y)$. Now, if $F : \mathbb{R}^2 \to \mathbb{R}$, we use the fact that the upper contour sets of a quasiconcave function must be convex sets. So, follow these steps:

(i) Set $F(x,y) = c$ for some arbitrary constant $c$.
(ii) Solve for $y$ to get the level curve $y = g(x; c)$.
(iii) Check whether $g$ is convex. Recall that $g$ is convex if and only if $g'' \ge 0$.

Example 4. Let's do this for the Cobb-Douglas function. Set $F(x,y) = Ax^a y^b = c$. Solving for $y$, $y = (c/(Ax^a))^{1/b} = (c/A)^{1/b}x^{-a/b}$. Now take derivatives: $y' = -(a/b)(c/A)^{1/b}x^{-a/b-1} \le 0$ and $y'' = (a/b)(a/b+1)(c/A)^{1/b}x^{-a/b-2} \ge 0$ if $A, a, b > 0$ (note that $F$ cannot take negative values for $(x,y) \in \mathbb{R}^2_+$, so $c \ge 0$). So the level curve is convex, and $F$ is quasiconcave.
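The same three-step check can be carried out symbolically. The short sketch below is my own illustration of the procedure, not part of the note.

```python
# Level-curve check of quasiconcavity for F(x, y) = A*x**a*y**b.
import sympy as sp

x, c, A, a, b = sp.symbols('x c A a b', positive=True)

# Steps (i)-(ii): set A*x**a*y**b = c and solve for y, giving the level curve y = g(x; c)
g = (c / (A * x**a))**(1 / b)

# Step (iii): check convexity of g via its second derivative
g2 = sp.simplify(sp.diff(g, x, 2))
print(g2)
# Mathematically, g'' = (a/b)*(a/b + 1)*(c/A)**(1/b)*x**(-a/b - 2) >= 0 for
# A, a, b, c > 0, so the level curve is convex and F is quasiconcave.
```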

If $F : \mathbb{R}^n \to \mathbb{R}$ where $n \ge 2$, then we need to invoke the following theorem to check the quasiconcavity of $F$.

Theorem 16-4 (Bordered Hessian Test): Let $F : U \subset \mathbb{R}^n \to \mathbb{R}$ be a $C^2$ function on a convex set $U$. Consider the following bordered Hessian:

$$\bar{H} = \begin{pmatrix} 0 & D^1F(x) \\ D^1F(x)^T & D^2F(x) \end{pmatrix}_{(n+1)\times(n+1)}
= \begin{pmatrix}
0 & F_{x_1} & \cdots & F_{x_n} \\
F_{x_1} & F_{x_1x_1} & \cdots & F_{x_1x_n} \\
\vdots & \vdots & \ddots & \vdots \\
F_{x_n} & F_{x_nx_1} & \cdots & F_{x_nx_n}
\end{pmatrix}$$

(a) Starting with the 3rd leading principal minor, if the last $(n-1)$ leading principal minors of $\bar{H}$ alternate in sign, and the sign starts with a positive sign, for all $x \in U$, then $F$ is quasiconcave.*
(b) Starting with the 3rd leading principal minor, if those $(n-1)$ leading principal minors of $\bar{H}$ are all negative for all $x \in U$, then $F$ is quasiconvex.

*Remark: Some textbooks start with the 2nd LPM, instead of the 3rd LPM, and its sign must be negative.


Example 5. Let's use this for $F(x,y) = xy$, a special case of Cobb-Douglas. We have $F_x = y$, $F_y = x$, $F_{xx} = 0$, $F_{xy} = 1$, $F_{yy} = 0$. So, the bordered Hessian is:

$$\bar{H} = \begin{pmatrix} 0 & y & x \\ y & 0 & 1 \\ x & 1 & 0 \end{pmatrix}$$

We start with the third leading principal minor, which here means $\det(\bar{H})$.

$$\det(\bar{H}) = 0\begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} - y\begin{vmatrix} y & 1 \\ x & 0 \end{vmatrix} + x\begin{vmatrix} y & 0 \\ x & 1 \end{vmatrix} = xy + xy = 2xy > 0 \quad \text{for all } x, y > 0$$

Thus, $F(x,y) = xy$ is quasiconcave on $\mathbb{R}^2_{++}$.
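For higher dimensions, or when the algebra gets messy, the same test can be run numerically at sample points. The helper below is my own sketch (the function `bordered_hessian` is an assumption for illustration, not an existing library routine).

```python
# Numerically inspect the bordered Hessian of F(x, y) = x*y at sample points.
import numpy as np

def bordered_hessian(grad, hess):
    n = len(grad)
    H = np.zeros((n + 1, n + 1))
    H[0, 1:] = grad
    H[1:, 0] = grad
    H[1:, 1:] = hess
    return H

# F(x, y) = x*y: gradient (y, x) and Hessian [[0, 1], [1, 0]]
for x, y in [(1.0, 2.0), (0.5, 3.0), (4.0, 0.25)]:
    H = bordered_hessian(np.array([y, x]), np.array([[0.0, 1.0], [1.0, 0.0]]))
    # 3rd leading principal minor = det of the full 3x3 bordered Hessian
    print(np.linalg.det(H))   # equals 2*x*y > 0, consistent with quasiconcavity
```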

16-3. Concave Programming 

Concave programming is the name given to a class of techniques and theorems for solving programming problems with concave or quasiconcave functions. Although it is a very rich field with many analytically important theorems, as applied economists we probably need to know just two of them, at least for this lecture. We have already seen the first one.

Theorem 13-4 (Sufficiency for Unconstrained Global Min/Max): Let $F : U \subset \mathbb{R}^n \to \mathbb{R}$ be a $C^1$ function and $U$ a convex and open subset of $\mathbb{R}^n$.
(c) If $F$ is a concave function on $U$ and $D^1F(x^*) = \mathbf{0}$ for some $x^* \in U$, then $x^*$ is a global maximum of $F$ on $U$.
(d) If $F$ is a convex function on $U$ and $D^1F(x^*) = \mathbf{0}$ for some $x^* \in U$, then $x^*$ is a global minimum of $F$ on $U$.
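A tiny concrete instance (my own, with an assumed objective $F(x,y) = -(x-1)^2 - (y-2)^2$) shows how the theorem is used in practice.

```python
# For a concave F, the critical point from the FONCs is a global maximum.
import sympy as sp

x, y = sp.symbols('x y', real=True)
F = -(x - 1)**2 - (y - 2)**2          # concave on all of R^2

crit = sp.solve([sp.diff(F, x), sp.diff(F, y)], [x, y])
print(crit)                            # {x: 1, y: 2}
print(sp.hessian(F, (x, y)))           # diag(-2, -2): negative definite everywhere
# Hence F is concave and, by Theorem 13-4(c), (1, 2) is the global maximum.
```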

Question: Why is it essential to have $U$ convex and open? Answer: If $U$ is not convex, then concavity or convexity of $F$ on $U$ is not well defined (because the convex combination $\lambda x + (1-\lambda)y$ need not lie in $U$). If $U$ is not open, then there may be a boundary solution at which the FONCs fail.

Theorem 16-5 (Sufficiency for Constrained Global Min/Max): Let $F, g_1, \ldots, g_m : U \subset \mathbb{R}^n \to \mathbb{R}$ be $C^1$ functions and $U$ a convex and open subset of $\mathbb{R}^n$. Consider the maximization problem:

$$\max_{x\in\mathbb{R}^n}\; F(x_1, x_2, \ldots, x_n)$$
$$\text{subject to}\quad g_1(x_1, \ldots, x_n) \le b_1,\;\; \ldots,\;\; g_m(x_1, \ldots, x_n) \le b_m$$


Suppose that $F$ is quasiconcave on $U$ and $g_1, \ldots, g_m$ are quasiconvex on $U$. If all the necessary FOCs and the constraint qualification (CQ, as stated in Theorem 14-4) are satisfied at $x^*$, then $x^*$ is a global maximum of $F$ on the constraint set.

What does this theorem mean intuitively? Let's consider a standard utility maximization problem.

$$\max\; U(x,y) \quad\text{subject to}\quad px + y \le I$$

Suppose the domain is $U = \mathbb{R}^2_{++}$ (with a slight abuse of notation, since $U(x,y)$ also denotes the utility function). Note that the budget function $g(x,y) = px + y$ is a quasiconvex (actually also quasiconcave) function, because its level sets are linear. Suppose $U(x,y)$ is (strictly) quasiconcave. Then there will be a tangency point that satisfies the FONCs, and it is a global maximum. Suppose, on the contrary, that $U(x,y)$ is strictly quasiconvex. Then the tangency does not mean a maximum. There is also the case where $U(x,y)$ is not even quasiconcave; then the FONCs do not guarantee a global maximum. Consider, furthermore, the case where $U(x,y)$ is quasiconcave but $g(x,y)$ is not quasiconvex. Then a tangency (FONC) does not imply a global solution. (See the graphs.)
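Finally, here is a numerical sketch that ties the pieces together (my own example; the solver settings and the utility $U(x,y) = \sqrt{xy}$ are assumptions): a local solver finds the tangency point, and Theorem 16-5 is what lets us call it a global maximum.

```python
# Maximize a quasiconcave Cobb-Douglas utility subject to a linear budget.
import numpy as np
from scipy.optimize import minimize

p, I = 2.0, 10.0

def neg_U(z):
    x, y = z
    return -(x**0.5 * y**0.5)        # minimize -U to maximize U

budget = {'type': 'ineq', 'fun': lambda z: I - p * z[0] - z[1]}  # I - p*x - y >= 0
res = minimize(neg_U, x0=[1.0, 1.0], method='SLSQP',
               bounds=[(1e-6, None), (1e-6, None)], constraints=[budget])
print(res.x)   # approx. [2.5, 5.0]: x* = I/(2p), y* = I/2 for this utility
```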