
14. Calculus of Several Variables


10/12/21

New Homework: Problems 13.17, 13.20, 14.7, 14.9, and 14.13 are due on Tuesday, October 19.

Although the chapter title refers to calculus, the focus of the chapter is rather narrower, concentrating on various types of derivatives. Much of the material here is referred to as advanced calculus, or even introductory analysis.

14.1 The Ordinary Derivative

We begin by recalling the ordinary derivative. Let U ⊂ R be an open set. A function f : U → R is differentiable at x0 ∈ U if

f′(x0) = lim_{h→0} [f(x0 + h) − f(x0)] / h

exists. It is differentiable on U if the derivative f′(x0) exists for every x0 ∈ U. The expression inside the limit, both here and in similar definitions, is the difference quotient. The derivative is also denoted df/dx.
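The definition is easy to illustrate numerically. A minimal sketch, using a made-up function f(x) = x² at x0 = 3 (so f′(3) = 6), not an example from the notes:

```python
# Difference quotients converging to f'(x0) as h -> 0.
# Hypothetical example: f(x) = x**2 at x0 = 3, where f'(3) = 6.

def difference_quotient(f, x0, h):
    """Slope of the chord between (x0, f(x0)) and (x0 + h, f(x0 + h))."""
    return (f(x0 + h) - f(x0)) / h

f = lambda x: x**2
for h in (1.0, 0.1, 0.001):
    print(h, difference_quotient(f, 3.0, h))
# The quotients approach 6 as h shrinks (here they equal 6 + h exactly).
```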


2 MATH METHODS

14.2 Difference Quotients

The difference quotients used to define the derivative are the slopes of chords running between (x0, f(x0)) and (x0 + h, f(x0 + h)). We take the limit of these slopes to find the slope of the tangent to f at x0. The derivative can be used to define a line L by

y = f(x0) + f′(x0)(x − x0).

The line L is tangent to the graph of f at the point (x0, f(x0)).

Figure 14.2.1: The tangent line L at x0 has the equation y = f(x0) + f′(x0)(x − x0).

Linear Approximation. The tangent line L is a linear approximation of f(x) for x near x0. Setting ∆x = x − x0 and ∆y = y − f(x0), the equation becomes

∆y = y − f(x0) = f′(x0)∆x.

Now y approximates f(x), so f(x) ≈ f(x0) + f′(x0)∆x.


14.3 Partial Derivatives

In economics, we often deal with functions defined on a subset of Rm. Production and utility functions, cost, expenditure, and profit functions are all examples. Let U ⊂ Rm be an open set and suppose f : U → R. We define the ith partial derivative of f at x0 ∈ U by

∂f/∂xi (x0) = lim_{h→0} [f(x0 + hei) − f(x0)] / h

provided the limit exists.


14.4 Computing Partial Derivatives

In the partial derivative, all of the variables except for xi are held constant. We then compute the partial derivative by taking the limit of difference quotients as h → 0. It gives the slope of a tangent line in direction ei. It is also the rate of change of the function f in the direction ei.

Since all of the variables except xi are held constant when computing ∂f/∂xi, we can compute ∂f/∂xi by treating the other variables as constant and taking the ordinary derivative. Thus

∂(x² + y²)/∂x = 2x,  ∂(x²y + xyz + y²z³)/∂x = 2xy + yz,  and  ∂(x²y + xyz + y²z³)/∂y = x² + xz + 2yz³.
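The last two partials are easy to check numerically with central differences. A small sketch; the helper `partial` is not from the notes:

```python
# Check the partial derivatives above numerically with central differences.
# g below is the example function from the text.

def partial(f, point, i, h=1e-6):
    """Central-difference approximation to the ith partial derivative at point."""
    up = list(point); up[i] += h
    dn = list(point); dn[i] -= h
    return (f(*up) - f(*dn)) / (2 * h)

def g(x, y, z):
    return x**2 * y + x * y * z + y**2 * z**3

x, y, z = 1.0, 2.0, 3.0
print(partial(g, [x, y, z], 0), 2*x*y + y*z)            # dg/dx = 2xy + yz
print(partial(g, [x, y, z], 1), x**2 + x*z + 2*y*z**3)  # dg/dy = x^2 + xz + 2yz^3
```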


14.5 Higher Partial Derivatives

If a partial derivative of f exists, it too may have partial derivatives. Expressions such as

∂/∂y (∂f/∂x) = ∂²f/∂y∂x  and  ∂/∂x (∂f/∂x) = ∂²f/∂x²

are used to write second (and higher) partial derivatives. Another common notation uses subscripts. Thus

∂f/∂x = fx,  ∂/∂y (∂f/∂x) = (fx)y = fxy,  and  ∂²f/∂x² = fxx.

Ck Functions. The notation Ck denotes the set of k-times continuously differentiable functions. A function f ∈ Ck if and only if all of the first k partial derivatives exist and are continuous. When partial derivatives of every order exist and are continuous, we write f ∈ C∞. A vector-valued function f is Ck if and only if each component function is Ck.


14.6 Partial Derivatives and Utility

In consumer theory, partial derivatives can be used to compute marginal utilities and marginal rates of substitution.

Example 14.6.1: Utility Functions. Suppose we have a Cobb-Douglas utility function

u(x, y) = x^α y^(1−α)

defined on R2++ where 0 < α < 1. The partial derivatives are the marginal utilities

MUx = ∂u/∂x = αx^(α−1) y^(1−α)  and  MUy = ∂u/∂y = (1 − α)x^α y^(−α).

We can use the marginal utilities to construct the marginal rate of substitution:

MRSxy = MUx/MUy = (α/(1 − α)) (y/x).
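As a numeric check, the sketch below evaluates the marginal utilities at a made-up bundle and confirms the closed form for the MRS (the values of α, x, and y are hypothetical, not from the notes):

```python
# Marginal utilities and MRS for Cobb-Douglas utility u(x, y) = x**a * y**(1 - a).
a = 0.3           # hypothetical exponent, 0 < a < 1
x, y = 4.0, 9.0   # hypothetical bundle

MUx = a * x**(a - 1) * y**(1 - a)
MUy = (1 - a) * x**a * y**(-a)
MRS = MUx / MUy

# Closed form from the text: MRS = (a / (1 - a)) * (y / x)
print(MRS, (a / (1 - a)) * (y / x))
```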


14.7 Partial Derivatives and Production

Partial derivatives are also used in producer theory. For one, they can be used to express marginal products and the marginal rate of technical substitution.

Example 14.7.1: Production Functions. Similarly, if

f(x1, x2) = √x1 + √x2

is a production function on R2++, the partial derivatives are the marginal products

MP1 = ∂f/∂x1 = 1/(2√x1)  and  MP2 = ∂f/∂x2 = 1/(2√x2).

Then the marginal rate of technical substitution is

MRTS12 = MP1/MP2 = √x2 / √x1.


14.8 Approximation via Partial Derivatives

We can also use partial derivatives to approximate functions. The key is that if we have a small change ∆x in x, the function will change by approximately

∆f ≈ (∂f/∂x) ∆x.

Here’s an example to show how it works.

Example 14.8.1: Approximation. Take the Cobb-Douglas production function F(K, L) = 6K^(1/3)L^(2/3). Suppose K = 8000 and L = 2744, yielding F(8000, 2744) = 23,520. Now suppose K increases by ∆K = 10. We estimate the effect on output using ∂F/∂K:

∂F/∂K (8000, 2744) = 2K^(−2/3)L^(2/3) = 2(8000)^(−2/3)(2744)^(2/3) = 0.98,

so we expect output to increase by about

∂F/∂K (8000, 2744) × ∆K = 0.98 × 10 = 9.8,

to 23,520 + 9.8 = 23,529.8. The actual value is F(8010, 2744) ≈ 23,529.796, so you can see it is a pretty good approximation, at least for relatively small changes in K or L.
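The arithmetic in the example is easy to reproduce; a short sketch:

```python
# Reproduce the numbers in Example 14.8.1 for F(K, L) = 6 K**(1/3) L**(2/3).
def F(K, L):
    return 6 * K**(1/3) * L**(2/3)

base = F(8000, 2744)                    # 23,520
dFdK = 2 * 8000**(-2/3) * 2744**(2/3)   # 0.98
approx = base + dFdK * 10               # linear approximation: 23,529.8
actual = F(8010, 2744)                  # about 23,529.796
print(base, dFdK, approx, actual)
```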


14.9 Frechet Differentiable Functions

The next step is to consider differentiability for vector-valued functions. The definition may look a little strange if you have never seen it before. However, it is a generalization of the ordinary derivative.

Frechet Derivative. Suppose f : U → Rm where U is an open subset of Rk. The function f is Frechet differentiable at x0 ∈ U if there is a linear function L : Rk → Rm such that

lim_{‖h‖→0} [f(x0 + h) − f(x0) − L(h)] / ‖h‖ = 0.

In that case L is the Frechet derivative at x0. If f is Frechet differentiable at every x0 ∈ U, we say that f is Frechet differentiable on U. The Frechet derivative at x0 is denoted Df(x0), or Dfx0. If we need to specify which variables we are using to differentiate, we will use Dxf(x0).


14.10 Derivative of the Dot Product

We apply the definition to the dot product. The derivative of p·x is the covector pT. This makes sense, as the derivative is a linear functional of the vector x, which must be a covector.

Example 14.10.1: Dot Product. An easy example is f(x) = p·x = ∑_{i=1}^k pi xi. Here f : Rk → R. Since it is a linear function, it is its own linear approximation.

If we set L(h) = f(h) = p·h, we find that

f(x0 + h) − f(x0) − L(h) = p·(x0 + h) − p·x0 − p·h = 0.

We divide by ‖h‖ and take the limit. The limit is zero, so the linear function L : Rk → R is the Frechet derivative. As a linear functional, it is represented by the row vector pT. In other words,

D(p·x)x0 = pT = (p1, . . . , pk)

for all x0 ∈ Rk.


14.11 Frechet and Ordinary Derivatives

Suppose k = m = 1, so f : R → R. In this case the Frechet derivative is the same as the ordinary derivative. One way to see that is to rewrite the definition of the ordinary derivative:

f′(x0) = lim_{h→0} [f(x0 + h) − f(x0)] / h

0 = lim_{h→0} { [f(x0 + h) − f(x0)] / h − f′(x0) }

  = lim_{h→0} [f(x0 + h) − f(x0) − hf′(x0)] / h

  = lim_{h→0} [f(x0 + h) − f(x0) − L(h)] / h

where L(h) = f′(x0)h is the required linear function of h.


14.12 Matrix Form of the Derivative — The Jacobian

We're about ready to write down the general form of the Frechet derivative for a C1 function.

When dealing with derivatives of vector functions from Rk to Rm where m > 1, we will have to be somewhat pedantic about how the vectors are written. Writing them properly allows us to express certain relations as matrix products. Write them wrongly, and you get nonsense. Although we sometimes write a vector x ∈ Rm in casual fashion as (x1, . . . , xm), here it needs to be written as a column, x = (x1, . . . , xm)T.

When dealing with derivatives, we must use the formal column form. If f : Rk → Rm is a vector function, the derivative must be a linear function from Rk to Rm. That means it can be represented by an m × k matrix with entries

[Df(x0)]ij = ∂fi/∂xj (x0).

Then we write the derivative of f(x) = (f1(x), . . . , fm(x))T as

Df(x0) =
[ ∂f1/∂x1  ∂f1/∂x2  ···  ∂f1/∂xk ]
[ ∂f2/∂x1  ∂f2/∂x2  ···  ∂f2/∂xk ]
[    ⋮        ⋮      ⋱      ⋮    ]
[ ∂fm/∂x1  ∂fm/∂x2  ···  ∂fm/∂xk ]

where all of the partial derivatives are evaluated at x0. The derivative matrix Df is sometimes called the Jacobian derivative, Jacobian matrix, or just plain Jacobian. The linear function L(h) that the Jacobian represents is formed by multiplying the Jacobian by a vector h, obtaining a vector in Rm.


14.13 The Derivative: An Example

Consider the function f : R3 → R2 given by

f(x) = (x1x2² + x1x3, x2² + x1²x3 + x2x3)T.

The derivative is a linear mapping from R3 to R2, and is represented by a 2 × 3 matrix:

Df(x) =
[ x2² + x3   2x1x2     x1       ]
[ 2x1x3      2x2 + x3  x1² + x2 ]

We evaluate the derivative at x0 = (1, 1, 2)T, obtaining

Df(1, 1, 2) =
[ 3  2  1 ]
[ 4  4  2 ]

Finally, the linear function L : R3 → R2 is

L(h) = [Dfx0] h = (3h1 + 2h2 + h3, 4h1 + 4h2 + 2h3)T.
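A finite-difference check of this Jacobian at x0 = (1, 1, 2), including the (1, 2) entry 2x1x2 = 2; the generic `jacobian` helper is a sketch, not from the notes:

```python
# Verify the Jacobian of the example by central differences at x0 = (1, 1, 2).
def f(x):
    x1, x2, x3 = x
    return [x1 * x2**2 + x1 * x3,
            x2**2 + x1**2 * x3 + x2 * x3]

def jacobian(f, x, h=1e-6):
    """m x k matrix of central-difference partials of f at x."""
    m = len(f(x))
    J = []
    for i in range(m):
        row = []
        for j in range(len(x)):
            up = list(x); up[j] += h
            dn = list(x); dn[j] -= h
            row.append((f(up)[i] - f(dn)[i]) / (2 * h))
        J.append(row)
    return J

J = jacobian(f, [1.0, 1.0, 2.0])
print([[round(v, 6) for v in row] for row in J])  # [[3, 2, 1], [4, 4, 2]]
```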


14.14 A Different Kind of Matrix Derivative

Should we have a function f : Rk → Rm that starts as a row vector, (f1(x), . . . , fm(x)), we will take the derivative to be the k × m matrix Df = (D(fT))T. In other words,

Df =
[ ∂f1/∂x1  ···  ∂fm/∂x1 ]
[ ∂f1/∂x2  ···  ∂fm/∂x2 ]
[    ⋮      ⋱      ⋮    ]
[ ∂f1/∂xk  ···  ∂fm/∂xk ]

In such a case, the linear mapping L(h) is not formed by post-multiplying Df by h, but by pre-multiplying by hT.

When we consider second derivatives of functions from Rm to R, the first derivative will be a row vector, and we can use the same method as above to take its derivative, which will then be represented by an m × m matrix. It will define a bilinear functional. One of the vectors will be used to post-multiply D2f; the other will be transposed so it can pre-multiply D2f. In combination, this gives us a bilinear form.


14.15 Frechet Derivatives and Approximation

Examining the definition, we see that the Frechet derivative Df(x0) defines a linear approximation to the function f by

f(x0 + h) ≈ f(x0) + L(h)        (14.15.1)

where L = Df(x0).

To understand the approximation a bit better, we define the remainder R by

R(x0, h) = f(x0 + h) − f(x0) − L(h).

The remainder is the difference between the function f and its linear approximation f(x0) + L(h). By the definition of the derivative,

lim_{h→0} R(x0, h)/‖h‖ = 0.

This can also be written R(x0, h) = o(‖h‖) at 0.

We conclude that the linear approximation is fairly accurate near x0, and gets more accurate the closer we are to x0. In fact, the error term, the remainder, converges to zero enough faster than ‖h‖ that the ratio of the two also converges to zero.


14.16 Marginal Rate of Substitution: Interpretation

We can use our knowledge of approximation to interpret the marginal rate of substitution. Suppose we start at x0. Now consider the indifference curve through x0, {x : u(x) = u(x0)}. Let's make a small change in xi, holding everything else constant. This moves us off the indifference curve. Now change xj to return us to the indifference curve. Since indifference curves slope downward (preferences are monotonic), there is a trade-off between goods i and j: the two changes must have opposite signs.

We can now use the linear approximation. Define h by hi = ∆xi, hj = ∆xj, and hk = 0 for k ≠ i, j, so h = ∆xi ei + ∆xj ej. Then

∆u ≈ Du(x0)h = (∂u/∂xi)∆xi + (∂u/∂xj)∆xj = MUi ∆xi + MUj ∆xj.

Because we return to the same indifference curve, ∆u ≈ 0, so it follows that MUi ∆xi + MUj ∆xj ≈ 0 and

MRSij = MUi/MUj ≈ −∆xj/∆xi.

In other words, if we plot a slice of the indifference curve in xi–xj space, with xi on the horizontal axis, the marginal rate of substitution approximates the negative of the slope of the secant through the points (x0 + ∆x, u(x0 + ∆x)) and (x0, u(x0)). Of course, if we let ∆xi, ∆xj → 0, we obtain the negative of the slope of the tangent at x0.


14.17 Total Derivatives and Linear Functionals

When f : Rk → R, the derivative takes the simpler form

Df = (∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xk).        (14.17.2)

Sometimes the derivative is written in what appears to be a quite different manner:

df = (∂f/∂x1)dx1 + (∂f/∂x2)dx2 + · · · + (∂f/∂xk)dxk.        (14.17.3)

They're both supposed to be the derivative. What is going on here?

Recall that the derivative of f is a linear function from Rk to R. As such, it can properly be written as a covector, a row vector. That is what we have done in equation (14.17.2). But what about equation (14.17.3)?

The obvious interpretation is that dxi must be the ith basis covector e*i = (0, . . . , 0, 1, 0, . . . , 0), with a 1 in the ith position.

But why is e*i written dxi? Consider the ith coordinate function xi(x) = xi, which maps Rk to R. Its derivative, naturally denoted dxi, must be the linear functional e*i. This functional maps Rk to R with dxi(ej) = e*i(ej) given by the Kronecker delta δij. Either way we write it, the differentials of such functions form a basis for row vectors.


14.18 One-Forms

The expression used for df in equation (14.17.3) is a simple type of differential form. A functional linear combination of the basis covectors dxi is called a 1-form. It is important that there are no products of the dxi terms and that each term has a single dxi. Real-valued functions are sometimes referred to as 0-forms.

One-forms are linear combinations of the basis vectors of (Rm)* where the coefficients are functions. Here the coefficient functions are the partial derivatives ∂f/∂xi. A one-form produces a linear functional on Rm for every x. In other words, a differential 1-form is a function whose values are linear functionals.


14.19 Derivatives and Approximation

Because the derivative of f is a linear approximation to f, if we feed Df(x0) a vector such as

∆x = (∆x1, . . . , ∆xm)T = ∑_i ∆xi ei,

we obtain

∆f ≈ Dfx0 ∆x
   = ∑_{i=1}^m [ (∂f/∂xi)(x0) dxi ( ∑_{j=1}^m ∆xj ej ) ]
   = ∑_{i=1}^m ∑_{j=1}^m (∂f/∂xi)(x0) δij ∆xj
   = ∑_{i=1}^m (∂f/∂xi)(x0) ∆xi,

which is a linear approximation to the change in f as x0 is replaced by x0 + ∆x (see also equation (14.15.1)).

Of course, if we use the one-form df(∆x) and compute, we obtain the same answer.


14.20 Derivatives and Differential Forms

If we used the df version, we would get the same result. So why have two versions? Among other things, differential forms are used in integration. Products of the differentials appear in higher-degree forms (k-forms) and correspond to area or volume elements. The appropriate product there is something called the exterior or wedge product, which is both alternating and multilinear, like determinants.

We also use products for higher Frechet derivatives, Dkf. These are k-multilinear functions, k-tensors. They are not alternating, and are not to be confused with the k-forms used in multi-dimensional integration. The k-forms are alternating; the kth derivatives are not. Quite the contrary, terms such as ∂²f/∂xi² appear and at times are important. They would be zero if the tensors were alternating.


14.21 Curves and their Tangents

A curve in Rm is an m-tuple of continuous functions

x(t) = (x1(t), . . . , xm(t))T

where each xi : I → R and I is an open subset of R. The functions are the coordinate functions of the curve x, and t is a parameter describing the curve. Curves are allowed to cross or repeat themselves.

One of the simplest curves is a straight line. Recall that straight lines can be written in parametric form as

x(t) = x0 + tx1.        (10.39.5)

The curve x(t) = (cos t, sin t) repeatedly traces out a circle of radius one centered at the origin. By adding a coordinate, we obtain y(t) = (cos t, sin t, t), which is a right-handed helix in R3.

If x is differentiable, we can define the tangent vector as

x′(t) = (x′1(t), . . . , x′m(t))T.

It describes the instantaneous rate of change of the curve, both direction and magnitude. When x(t) is the path of an actual object in motion and t is time, x′(t) is the velocity at any time t and ‖x′(t)‖ is the speed. The second derivative x″(t) = (x″1(t), . . . , x″m(t))T is the acceleration at time t. Here x″i(t) denotes the ordinary second derivative d²xi/dt².
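To make the tangent and speed concrete, here is a small sketch for the helix y(t) = (cos t, sin t, t); the particular parameter value is arbitrary:

```python
import math

# Tangent and speed of the helix y(t) = (cos t, sin t, t).
def y(t):
    return (math.cos(t), math.sin(t), t)

def y_tangent(t):
    # Componentwise ordinary derivatives of y(t).
    return (-math.sin(t), math.cos(t), 1.0)

t = 1.234
v = y_tangent(t)
speed = math.sqrt(sum(c * c for c in v))
print(speed)  # sin^2 t + cos^2 t + 1 = 2, so the speed is sqrt(2) at every t
```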


14.22 Examples of Curves

Our first example is a straight line.

Example 14.22.1: Straight Lines. The straight line x(t) = x0 + tx1 has tangent x′(t) = x1, the direction of the line. In this form, there is no acceleration along the line: x″(t) = 0. The same line can be traced out in other ways. For example, the curve x(t) = x0 + t³x1 visits all of the points on that straight line, but does it in a different fashion, with tangent 3t²x1. Except at t = 0, it points in the same direction, but has a different magnitude, reflecting the variously slower and faster motion along the line. Moreover, the acceleration is not zero, but x″(t) = 6tx1.

The second example circles around and around the unit circle about the origin.

Example 14.22.2: Perpetual Circle. The perpetual circle is defined by x(t) = (cos t, sin t). It continually retraces the circle of radius 1 about the origin. Its tangent vector is x′(t) = (− sin t, cos t). You'll notice that for circular motion, the tangent is orthogonal to the curve: x′(t)·x(t) = 0. The acceleration x″(t) = (− cos t, − sin t) = −x(t) always points toward the origin.

Our third example is a right-handed helix, meaning that it winds counter-clockwise.

Example 14.22.3: Right-handed Helix. The helix y(t) = (cos t, sin t, t) has tangent y′(t) = (− sin t, cos t, 1). It is not perpendicular to y(t) due to the third component. The acceleration is y″(t) = (− cos t, − sin t, 0) and points toward the x3-axis. The fact that the third component of the acceleration is zero reflects the constant motion along the x3-axis. The other components reflect the circular motion about it.


14.23 Regular Curves

We will often require that curves be sufficiently smooth. Such curves are called regular.

Regular Curve. A curve x(t) = (x1(t), . . . , xm(t))T in Rm is regular if each xi is continuously differentiable in t and x′(t) ≠ 0 for all t.

Examples (14.22.1)-(14.22.3) are all regular. Regularity rules out sudden changes of direction.

Example 14.23.1: Curve with a Cusp. Let x(t) = (t³, t²)T, so x′(t) = (3t², 2t)T. This curve is not regular at 0 because x′(0) = (0, 0)T. It makes a sudden change of direction there.

Figure 14.23.2: The point where the curve makes a sudden change of direction is called a cusp.

It's not possible to re-parameterize this curve to make it regular. The tangent changes from pointing straight down to pointing straight up. Reversing the tangent requires that it be zero at the reversal point, regardless of how the parameter is applied. One way to see this is that if x is C1, then x′2(t) is continuous. For t < 0, x′2(t) is negative, and for t > 0, x′2(t) is positive. Continuity requires that x′2(0) = 0. The only way x could be regular at zero is if x′1(0) ≠ 0. But in that case the tangent could not be straight up or down at 0.


14.24 Functions, Curves, and Derivatives

It is sometimes useful to evaluate the derivative of a function f defined along a curve. If x(t) is a regular curve in Rm and f : Rm → R, we can define g(t) = (f ∘ x)(t) = f(x(t)). We can take the derivative of g in the following fashion, based on the chain rule:

g′(t) = (∂f/∂x1)(x(t)) x′1(t) + (∂f/∂x2)(x(t)) x′2(t) + · · · + (∂f/∂xm)(x(t)) x′m(t)
      = Dfx(t) x′(t).

In the final formula, keep in mind that Df is a 1 × m row vector and x′ is an m × 1 column vector, so the matrix product is a number. Writing the vectors (x′) and covectors (Df) in the right way ensures everything lines up properly in the product.

As for the formula above, we state the relevant theorem, without proof.

Chain Rule I. Let x(t) = (x1(t), . . . , xm(t)) be a C1 curve on an interval about t0 and f a C1 function defined on a ball in Rm about x(t0). Then g(t) = f(x(t)) is a C1 function on an interval about t0 and

dg/dt (t0) = (∂f/∂x1)(x(t0)) x′1(t0) + · · · + (∂f/∂xm)(x(t0)) x′m(t0) = Dfx(t0) x′(t0).
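Chain Rule I can be checked numerically by comparing Dfx(t) x′(t) against a direct difference quotient for g(t) = f(x(t)). A made-up example, not from the notes:

```python
import math

# Check Chain Rule I with f(x1, x2) = x1 * x2**2 along the circle
# x(t) = (cos t, sin t). Both sides should agree.
def f(x1, x2):
    return x1 * x2**2

def x(t):
    return (math.cos(t), math.sin(t))

t0 = 0.7
x1, x2 = x(t0)
# Df(x(t0)) x'(t0) = f_1 * x1'(t0) + f_2 * x2'(t0)
chain = (x2**2) * (-math.sin(t0)) + (2 * x1 * x2) * math.cos(t0)

h = 1e-6
numeric = (f(*x(t0 + h)) - f(*x(t0 - h))) / (2 * h)
print(chain, numeric)  # the two agree up to finite-difference error
```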


14.25 Directional Derivatives and Gradients

One way to think about derivatives in a particular direction is to consider a curve that goes in that direction. We want a curve with x(t0) = x0 and x′(t0) = v. There are many such curves; one is the line through x0 in the direction v, given by x(t) = x0 + (t − t0)v.

If f : Rm → R, we can consider the composite function g(t) = f(x(t)) and take its derivative

dg/dt (t0) = Dfx0 x′(t0) = Dfx0 v.

In fact, Chain Rule I ensures that any curve with x(t0) = x0 and x′(t0) = v will have the same derivative. We refer to this as the directional derivative of f in the direction v. Two notations sometimes used for the directional derivative are

∂f/∂v (x0)  and  Dvf(x0).

We prefer the former notation, as Dvf conflicts with our notation for partial Frechet derivatives taken with respect to a subset of the variables.

It is sometimes useful to regard the derivative of a real-valued function f on U ⊂ Rm as a vector rather than a covector. That vector is called the gradient, and is written

∇f(x0) = (∂f/∂x1 (x0), ∂f/∂x2 (x0), . . . , ∂f/∂xm (x0))T

where x0 ∈ U. Of course, ∇f(x0) = Df(x0)T. We can then write the directional derivative as

∂f/∂v (x0) = ∇f(x0)·v.
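The identity ∂f/∂v (x0) = ∇f(x0)·v is easy to verify numerically. A sketch with a hypothetical function f(x1, x2) = x1² + 3x1x2, whose gradient is (2x1 + 3x2, 3x1):

```python
# Directional derivative via the gradient: compare ∇f(x0)·v with the
# numeric derivative of f along the line x0 + t v at t = 0.

def grad_f(x1, x2):
    return (2 * x1 + 3 * x2, 3 * x1)

def directional(x0, v, h=1e-6):
    """Central-difference derivative of f along x0 + t v at t = 0."""
    f = lambda x1, x2: x1**2 + 3 * x1 * x2
    up = f(x0[0] + h * v[0], x0[1] + h * v[1])
    dn = f(x0[0] - h * v[0], x0[1] - h * v[1])
    return (up - dn) / (2 * h)

x0, v = (1.0, 2.0), (0.6, 0.8)
dot = sum(g * vi for g, vi in zip(grad_f(*x0), v))
print(dot, directional(x0, v))  # both give ∇f(x0)·v
```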


14.26 The Chain Rule

The following theorem is called "Chain Rule IV" by Simon and Blume. We'll just call it the Chain Rule.

The Chain Rule. Let U and V be open subsets of Rk and Rℓ, respectively. Suppose f : U → Rℓ and g : V → Rm are C1 functions, and that x0 ∈ U and y0 = f(x0) ∈ V. We will write y = f(x) and z = g(y).

Then h = g ∘ f is also C1 on an open ball about x0 with

Dxh(x0) = Dyg(y0) × Dxf(x0).

In other words, the Jacobian derivative of h is the product of the Jacobian derivatives of g and f.

The Chain Rule both asserts that the composite function h = g ∘ f is differentiable and gives a formula for its derivative.

We know that Dg is an m × ℓ matrix and Df is an ℓ × k matrix, so Dh is an m × k matrix. Unpacking the matrix product shows that

∂hi/∂xj = ∑_{n=1}^ℓ (∂gi/∂yn)(∂fn/∂xj)        (14.26.4)

for all i = 1, . . . , m and j = 1, . . . , k. Equation (14.26.4) can be written more fully as

∂hi/∂xj (x0) = ∑_{n=1}^ℓ (∂gi/∂yn)(y0) (∂fn/∂xj)(x0),

again for all i = 1, . . . , m and j = 1, . . . , k. Keep in mind that y0 = f(x0).


14.27 Chain Rule for Direct and Indirect Variables

The Chain Rule has many applications beyond the obvious ones. Some functions will use the same variables both directly and indirectly via another function.

Consider the function

Φ(x) = f(x, g(x)).

Here x appears both directly in Φ, and indirectly via g. If x ∈ Rm, f takes m + 1 arguments. Partition those arguments as (x, y). Define a function h : Rm → Rm+1 by

h(x) = (x, g(x))T  with  Dh(x) =
[ Im  ]
[ Dxg ]

where Im is the m × m identity matrix. Then Φ(x) = f(h(x)), so the Chain Rule tells us that

DxΦ = Df(x,y) × Dxh = Dxf + Dyf × Dxg.


14.28 An Application of the Chain Rule to Integrals

One useful application of the chain rule is to integrals with variable limits.

Example 14.28.1: Chain Rule and Integrals. Let g : R → R and h : R2 → R be C1 and define

Φ(x) = ∫_0^{g(x)} h(x, t) dt.

Now define

f(x, y) = ∫_0^y h(x, t) dt

and apply the previous result to Φ(x) = f(x, g(x)). This calculation yields

Φ′(x) = d/dx ( ∫_0^{g(x)} h(x, t) dt ) = h(x, g(x)) g′(x) + ∫_0^{g(x)} (∂h/∂x)(x, t) dt.

When g(x) = x, this formula reduces to

d/dx ( ∫_0^x h(x, t) dt ) = h(x, x) + ∫_0^x (∂h/∂x)(x, t) dt.

A similar method can be used in the case where both limits of integration are defined by functions.
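The variable-limit formula can be verified numerically. A sketch with made-up C1 functions h(x, t) = x·t and g(x) = x², so that Φ(x) = x·g(x)²/2 = x⁵/2 and both sides of the formula equal (5/2)x⁴:

```python
# Numeric check of Φ'(x) = h(x, g(x)) g'(x) + ∫_0^{g(x)} ∂h/∂x dt
# for the hypothetical choices h(x, t) = x * t and g(x) = x**2.

def phi(x, n=1000):
    """Φ(x) = ∫_0^{g(x)} h(x, t) dt by the midpoint rule (exact here,
    since the integrand is linear in t)."""
    g = x**2
    dt = g / n
    return sum(x * (k + 0.5) * dt * dt for k in range(n))

x = 1.3
# Right-hand side: h(x, g(x)) g'(x) + ∫_0^{g(x)} ∂h/∂x dt, with ∫_0^{x²} t dt = x⁴/2
rhs = (x * x**2) * (2 * x) + (x**2)**2 / 2
h_ = 1e-4
lhs = (phi(x + h_) - phi(x - h_)) / (2 * h_)
print(lhs, rhs)  # both approximately 2.5 * x**4
```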


14.29 Second Derivatives: The Hessian 10/14/21

Second derivatives are a little more complex than first derivatives, and we will start with the case where f : Rm → R because it is easier to see what is going on. In that case, the first derivative is a row vector

(∂f/∂x1 (x), . . . , ∂f/∂xm (x)).

This was discussed in section 14.14, where we used a modified matrix representation of such derivatives. We apply that method here.

For functions that are twice continuously differentiable, we write the matrix of second partial derivatives, the Hessian matrix, or Hessian, as

D2f(x) =
[ ∂²f(x)/∂x1²     ∂²f(x)/∂x1∂x2  ···  ∂²f(x)/∂x1∂xm ]
[ ∂²f(x)/∂x2∂x1   ∂²f(x)/∂x2²    ···  ∂²f(x)/∂x2∂xm ]
[       ⋮               ⋮                  ⋮         ]
[ ∂²f(x)/∂xm∂x1   ∂²f(x)/∂xm∂x2  ···  ∂²f(x)/∂xm²   ]

where

∂²f/∂xk∂xℓ = ∂/∂xk (∂f/∂xℓ).
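The Hessian's layout (and its symmetry for C2 functions) can be illustrated with nested central differences. A sketch for a hypothetical f(x1, x2) = x1³x2 + x2², whose Hessian at (1, 2) is [[12, 3], [3, 2]]:

```python
# Numeric Hessian via nested central differences.
def f(x):
    x1, x2 = x
    return x1**3 * x2 + x2**2

def hessian(f, x, h=1e-4):
    m = len(x)
    def partial(g, x, i):
        up = list(x); up[i] += h
        dn = list(x); dn[i] -= h
        return (g(up) - g(dn)) / (2 * h)
    # Entry (k, l) is the partial in x_k of the partial in x_l.
    return [[partial(lambda p: partial(f, p, l), x, k) for l in range(m)]
            for k in range(m)]

H = hessian(f, [1.0, 2.0])
print([[round(v, 3) for v in row] for row in H])
# Analytically: f_11 = 6 x1 x2 = 12, f_12 = f_21 = 3 x1**2 = 3, f_22 = 2
```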


14.30 Another Notation

This is one of those times when it is useful to employ an alternate notation for partial derivatives:

fi = ∂f/∂xi,  fij = (fi)j = ∂²f/∂xj∂xi = ∂/∂xj (∂f/∂xi),  etc.

Notice the reversal of order between the two notations, ij versus ji. The alternate notation allows us to write the Hessian in the more compact form

D2f(x) =
[ f11(x)  f12(x)  ···  f1m(x) ]
[ f21(x)  f22(x)  ···  f2m(x) ]
[   ⋮       ⋮            ⋮    ]
[ fm1(x)  fm2(x)  ···  fmm(x) ]

When f is twice continuously differentiable, it does not matter which order is used for the second partial derivatives because

∂²f/∂xk∂xℓ = ∂²f/∂xℓ∂xk.

The derivative is the same either way.1 If f is not C2, the order in which we take partialderivatives may affect the result.

1 In the economics literature, this result is often called Young's Theorem, although a host of mathematicians have worked on the problem, including Cauchy and Lagrange. My experience is that mathematics books usually don't name the theorem after anyone, although I have seen it called the Clairaut–Schwarz Theorem. Clairaut published a proof in 1740 that does not meet modern standards of rigor. The first rigorous proof was due to H.A. Schwarz in 1873. The most commonly used proof is that of Jordan, published in 1883. E.W. Hobson and W.H. Young later proved the theorem under weaker conditions, 1907–1909. The name Young's Theorem may derive from the economist R.G.D. Allen.


14.31 Interpreting The Hessian

So what exactly is the Hessian?

The derivative of f : Rm → R is a mapping from Rm to the linear functions from Rm to R. It maps x to the linear functional Df(x), which takes a single argument in Rm. We can write the values of this linear functional for y ∈ Rm as the matrix product [Df(x)]y. The Hessian is a way of writing the derivative of the mapping x ↦ Df(x).

The second derivative at x is a linear function from Rm to (Rm)*. We feed it a (column) vector z ∈ Rm and get a covector (row vector in (Rm)*). We can then feed the result a second column vector y ∈ Rm, obtaining a number. The Hessian matrix makes it possible to do this. The resulting bilinear map has the form

L(y, z) = zT [D2f(x)] y.        (14.31.5)

It takes pairs of vectors (y, z), one to get a linear functional, and a second to produce a number.


14.32 The Hessian is a Bilinear Form — A 2-Tensor

Writing equation (14.31.5) out, we obtain

L(y, z) = zT D2f(x) y = (z1, . . . , zm) [fij(x)] (y1, . . . , ym)T = ∑_{i,j=1}^m fij(x) zi yj.

Since D2f(x) is symmetric, it doesn't really matter which vector goes where, but I have written it in the logical order: the vector y is fed to the resulting linear functional, while the vector z combines with the map that produces linear functionals.

The resulting mapping from Rm × Rm to R is bilinear, and is best thought of as a 2-tensor, a linear mapping from Rm ⊗ Rm to R. As such we can write

D2f(x) = ∑_{i,j=1}^m fij(x) dxi ⊗ dxj,

where we can use the outer product to write

L(y, z) = [D2f(x)]·(y ⊗ z).

The dot product of the two matrices indicates we are multiplying the corresponding terms and adding, just like the ordinary dot product, but for matrices. This makes sense because linear functionals on Rm involve dot products. It shouldn't be surprising to find one here.

You may run across constructions where you vectorize the matrices. This is sometimes done in econometrics. Then taking the dot product of the resulting vectors would give you the same result.


14.33 Taylor’s Formula: Preview

One important use of the Hessian is Taylor's formula, which we will prove later.

First, a couple of definitions. In a vector space, the line segment between x and y is ℓ(x, y) = {(1 − t)x + ty : 0 ≤ t ≤ 1}. A set S is convex if it contains ℓ(x, y) whenever x, y ∈ S.

First Order Taylor's Formula. Let f : U → R be C2 on a convex open set U ⊂ Rm. Then for every x, x′ ∈ U, there is a y on the line segment connecting x and x′ such that

f(x′) = f(x) + Df(x)(x′ − x) + (1/2)(x′ − x)T [D2f(y)] (x′ − x).        (14.33.6)

In Taylor's formula, not only is the Hessian symmetric, but we feed it the same vector on each side, so there is even more symmetry. If L(·, ·) is a bilinear form, then Q(z) = L(z, z) is called a quadratic form. To see why, write it out. When L is the Hessian, we get

∑_{i,j=1}^m fij(x) zi zj.

It is a purely second degree polynomial in the zi, hence the term quadratic. The Taylor approximation in equation (14.33.6) consists of a constant term, a linear term, and a quadratic term, and for small changes in x, will better approximate f than the derivative alone does.

There are higher Taylor expansions that involve third and higher degree terms, terms which are 3-tensors, 4-tensors, etc.
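The gain from the quadratic term is easy to see numerically. A sketch of the second-order Taylor approximation for a made-up f(x1, x2) = e^{x1} x2, evaluating the Hessian at the expansion point rather than at the intermediate y in (14.33.6):

```python
import math

# Compare first- and second-order Taylor approximations of f(x1, x2) = exp(x1) * x2
# at the expansion point x = (0, 1) with step dx = (0.1, 0.05).
def f(x1, x2):
    return math.exp(x1) * x2

x = (0.0, 1.0)
dx = (0.1, 0.05)

# Df(x) = (e^{x1} x2, e^{x1});  D2f(x) = [[e^{x1} x2, e^{x1}], [e^{x1}, 0]]
grad = (math.exp(x[0]) * x[1], math.exp(x[0]))
hess = [[math.exp(x[0]) * x[1], math.exp(x[0])],
        [math.exp(x[0]), 0.0]]

linear = grad[0] * dx[0] + grad[1] * dx[1]
quad = 0.5 * sum(hess[i][j] * dx[i] * dx[j] for i in range(2) for j in range(2))

exact = f(x[0] + dx[0], x[1] + dx[1])
first_order_err = abs(exact - (f(*x) + linear))
second_order_err = abs(exact - (f(*x) + linear + quad))
print(first_order_err, second_order_err)  # the quadratic term shrinks the error
```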


14.34 Higher Derivatives as Tensors

Let f : Rm → R be k times continuously differentiable with k > 2. What do its higher derivatives, those past the Hessian, look like? We know that Dkf(x) must be a k-linear form. That means it can be written as a linear map from (Rm)⊗k → R, a k-tensor.

Although perhaps a bit tedious, it's not hard to figure out what Dkf(x) looks like, even though these higher derivatives do not have convenient matrix representations.

[Dkf(x)](y1, . . . ,yk) = ∑_{i1,…,ik=1}^m f_{i1···ik}(x) y1_{i1} · · · yk_{ik},

where yj_{i} denotes the ith component of the vector yj.

This means that Dkf(x) can be written as the k-tensor

Dkf(x) = ∑_{i1,…,ik=1}^m f_{i1···ik}(x) dx_{i1} ⊗ · · · ⊗ dx_{ik}.


30. Calculus of Several Variables II

30.1 Rolle’s Theorem

We’ve already covered Weierstrass’s Theorem, which was the first thing in section 30.1of Simon and Blume. One important consequence of Weierstrass’s Theorem is Rolle’sTheorem.

Rolle’s Theorem. Suppose f : [a, b] → R is continuous on [a, b] and continuouslydifferentiable on (a, b). If f(a) = f(b) = 0, there is a point c ∈ (a, b) with f′(c) = 0.

Proof. If f is constant on [a, b], then f(x) = 0 for all x ∈ [a, b] and so f′(x) = 0 for all x ∈ (a, b). Then any c ∈ (a, b) will do.

If f is not constant on [a, b], either there is a d ∈ (a, b) with f(d) > 0 or a d ∈ (a, b) with f(d) < 0. In the former case, f has a maximum at some c ∈ (a, b) by Weierstrass's Theorem. The first order necessary condition for an interior optimum on R shows f′(c) = 0 (remember your calculus!). In the latter case, f has a minimum at some c ∈ (a, b) by Weierstrass's Theorem, and so f′(c) = 0 by the first order necessary condition for an interior optimum. Either way, we are done.

Figure 30.1.1: The function goes both above and below the axis, meaning there must be at least two points in the closed interval [0, 4] that are extrema, obeying f′(c) = 0. In this case there are exactly two, labeled c1 and c2.
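Rolle's Theorem is easy to check numerically. In this sketch (our own example, not from the notes), f(x) = x³ − 3x vanishes at both endpoints of [−√3, √3], and bisecting f′ locates a critical point at c = 1:

```python
import math

# Our own example: f(a) = f(b) = 0 on [a, b] = [-sqrt(3), sqrt(3)],
# so Rolle's Theorem promises a c in (a, b) with f'(c) = 0.
f = lambda x: x**3 - 3*x
df = lambda x: 3*x**2 - 3          # f' changes sign on (0, sqrt(3))

a, b = -math.sqrt(3), math.sqrt(3)
lo, hi = 0.0, b                    # df(0) = -3 < 0, df(b) = 6 > 0
for _ in range(60):                # bisect df to find its zero
    mid = (lo + hi) / 2
    if df(lo) * df(mid) <= 0:
        hi = mid
    else:
        lo = mid
c = (lo + hi) / 2
print(c)  # ≈ 1.0, where f'(c) = 0
```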


30.2 Mean Value Theorem

The Mean Value Theorem generalizes Rolle's Theorem, and Rolle's Theorem is the key to the proof of the Mean Value Theorem.

Mean Value Theorem. Let f : I → R be a C1 function on an interval I ⊂ R. Then for any points a, b ∈ I with a < b there is a point c, a < c < b, with

f′(c) = (f(b) − f(a))/(b − a), or equivalently, f(b) − f(a) = f′(c)(b − a).

Figure 30.2.1: The Mean Value Theorem gives us a point c ∈ (a, b) where the slope of the tangent T to f at (c, f(c)) is equal to the slope of the secant S through (a, f(a)) and (b, f(b)).


30.3 Proof of the Mean Value Theorem

Mean Value Theorem. Let f : I → R be a C1 function on an interval I ⊂ R. Then for any points a, b ∈ I with a < b there is a point c, a < c < b, with

f′(c) = (f(b) − f(a))/(b − a), or equivalently, f(b) − f(a) = f′(c)(b − a).

Proof. Define a function g by

g(x) = f(x) − f(a) − [(f(b) − f(a))/(b − a)](x − a).

Then g(x) is the vertical distance between the secant line S and f(x). The distance is zero at both a and b:

g(a) = f(a) − f(a) − [(f(b) − f(a))/(b − a)](a − a) = 0

and

g(b) = f(b) − f(a) − [(f(b) − f(a))/(b − a)](b − a) = 0.

By Rolle's Theorem, there is a c ∈ (a, b) with

0 = g′(c) = f′(c) − (f(b) − f(a))/(b − a).

Then f′(c)(b − a) = f(b) − f(a), proving the result.
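As a quick numerical check (our own example, not from the notes), take f(x) = x³ on [0, 2]. The secant slope is (8 − 0)/2 = 4, and the Mean Value Theorem point satisfies 3c² = 4, so c = 2/√3 ≈ 1.1547. Bisecting g′ from the proof finds it:

```python
# Our own example: locate the Mean Value Theorem point c for f(x) = x**3
# on [a, b] = [0, 2] by bisecting g'(x) = f'(x) - (f(b) - f(a))/(b - a).
f = lambda x: x**3
a, b = 0.0, 2.0
secant = (f(b) - f(a)) / (b - a)   # = 4
dg = lambda x: 3*x**2 - secant     # derivative of g from the proof

lo, hi = a, b                      # dg(a) = -4 < 0, dg(b) = 8 > 0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if dg(lo) * dg(mid) <= 0 else (mid, hi)
c = (lo + hi) / 2
print(c)  # ≈ 1.1547 = 2/sqrt(3)
```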


30.4 Mean Value Theorem on Rm

There is a version for Rm that follows directly from the basic Mean Value Theorem.

Theorem 30.4.1. Let f : U → R be a continuously differentiable function defined on an open set U ⊂ Rm. Suppose a,b ∈ U with the line segment ℓ(a,b) ⊂ U. Then there is a point c ∈ ℓ(a,b) such that

f(b) − f(a) = Df(c)(b − a).

Proof. Define g : [0, 1] → U by g(t) = a + t(b − a) and set h(t) = f(g(t)). Then h : [0, 1] → R. By the Mean Value Theorem, there is a t∗ ∈ (0, 1) with

h(1) − h(0) = h′(t∗)(1 − 0) = h′(t∗).

Let c = g(t∗) = a + t∗(b − a). Then, using the Chain Rule,

f(b) − f(a) = h(1) − h(0) = h′(t∗) = Df(g(t∗)) g′(t∗) = Df(c)(b − a).

This tells us that f(b) = f(a) + Df(c)(b − a) for some c on the line segment ℓ(a,b), which is useful for approximating f at a point b based on its value at a point a.
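A numerical sketch of Theorem 30.4.1 (our own example, not from the notes): for f(x, y) = x² + 3y between a = (0, 0) and b = (1, 1), we have f(b) − f(a) = 4 and Df(c)(b − a) = 2c₁ + 3, so the theorem holds with t∗ = 1/2:

```python
import numpy as np

# Our own example: verify the Mean Value Theorem on R^m for
# f(x, y) = x**2 + 3*y between a = (0, 0) and b = (1, 1).
f = lambda v: v[0]**2 + 3*v[1]
grad = lambda v: np.array([2*v[0], 3.0])   # Df as a gradient vector

a, b = np.array([0.0, 0.0]), np.array([1.0, 1.0])
t_star = 0.5                                # found by inspection for this f
c = a + t_star * (b - a)                    # c lies on the segment l(a, b)
lhs = f(b) - f(a)
rhs = grad(c) @ (b - a)                     # Df(c)(b - a)
print(lhs, rhs)  # → 4.0 4.0
```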


30.5 Taylor Polynomials

There is a generalization of the Mean Value Theorem using Taylor polynomials. Let f(k) denote the kth derivative of f,

f(k)(x) = dkf/dxk (x),

where f(0)(a) = f(a).

Recall that k! denotes k factorial, which is defined inductively for non-negative integers by 0! = 1 and k! = k(k − 1)!. Thus

k! = 1 · 2 · 3 · · · (k − 1) · k.

The gamma function extends the definition to the complex numbers, excepting the non-positive integers. They are related by k! = Γ(k + 1). When z ∈ C has a positive real part,

Γ(z) = ∫_0^∞ x^{z−1} e^{−x} dx.

Taylor Polynomial. The kth order Taylor polynomial is

Pk(x;a) = f(a) + f′(a)(x − a) + (1/2!)f′′(a)(x − a)^2 + · · · + (1/k!)f(k)(a)(x − a)^k
        = ∑_{n=0}^k (1/n!) f(n)(a)(x − a)^n.

The fact that Pk(a;a) = f(a) will be useful when we prove various forms of Taylor's Formula.

Here are the first several Taylor polynomials:

P0(x;a) = f(a)
P1(x;a) = f(a) + f′(a)(x − a)
P2(x;a) = f(a) + f′(a)(x − a) + (1/2)f′′(a)(x − a)^2
P3(x;a) = f(a) + f′(a)(x − a) + (1/2)f′′(a)(x − a)^2 + (1/6)f(3)(a)(x − a)^3
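The formula above is easy to evaluate mechanically. A small sketch (our own example, not from the notes), using exp at a = 0, where every derivative equals 1:

```python
from math import exp, factorial

# Sketch: evaluate P_k(x; a) from a list of derivatives at a.  The example
# function exp (all derivatives equal to 1 at a = 0) is our own choice.

def taylor_poly(derivs, a, x):
    """P_k(x; a) = sum_{n=0}^{k} f^(n)(a) (x - a)**n / n!,
    where derivs = [f(a), f'(a), ..., f^(k)(a)]."""
    return sum(d * (x - a)**n / factorial(n) for n, d in enumerate(derivs))

a, x = 0.0, 1.0
for k in (1, 3, 5, 8):
    pk = taylor_poly([1.0] * (k + 1), a, x)
    print(k, pk, abs(exp(x) - pk))   # the error shrinks rapidly with k
```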


30.6 First Order Taylor’s Formula in R

To see how the proof of Taylor's formula works, we start with the first order Taylor's formula. (The Mean Value Theorem is the zeroth order Taylor's formula.)

Theorem 30.6.1. Let f : I → R be a C2 function defined on an interval in R. If a, b ∈ I there exists a c ∈ (a, b) such that

f(b) = f(a) + f′(a)(b − a) + (1/2)f′′(c)(b − a)^2 (30.6.1)
     = P1(b;a) + (1/2)f′′(c)(b − a)^2.

Proof. Define

g(x) = f(x) − P1(x;a) − M(x − a)^2
     = f(x) − [f(a) + f′(a)(x − a)] − M(x − a)^2. (30.6.2)

By definition, g(a) = 0. Now choose M so that g(b) = 0. Then

M = [1/(b − a)^2][f(b) − f(a) − f′(a)(b − a)].

By Rolle's Theorem, there is a c1 ∈ (a, b) with g′(c1) = 0. Now

g′(x) = f′(x) − f′(a) − 2M(x − a),

so g′(a) = f′(a) − f′(a) = 0. Both g′(a) = g′(c1) = 0, so we apply Rolle's Theorem again, this time to g′.

From Rolle's Theorem we obtain a c ∈ (a, c1) ⊂ (a, b) with g′′(c) = 0. Now

g′′(x) = f′′(x) − 2M,

so f′′(c) = 2M. In equation (30.6.2), substitute M = f′′(c)/2 and set x = b to obtain equation (30.6.1).


30.7 kth Order Taylor’s Formula in R

We now consider the kth order Taylor’s formula.

Taylor’s Formula in R. Let f : I → R be a Ck+1 function defined on an interval in R. Ifa, b ∈ I there exists a c ∈ (a, b) such that

f(b) = f(a) + f′(a)(b − a) +1

2!f′′(a)(b− a)2 + · · · +

1

k!f(k)(a)(b − a)k

+1

(k + 1)!f(k+1)(c)(b− a)k+1 (30.7.3)

= Pk(b) +1

(k + 1)!f(k+1)(c)(b− a)k+1

PROOF SKIPPED

Proof. Define

g(x) = f(x) − Pk(x;a) − M(x − a)^{k+1}
     = f(x) − f(a) − f′(a)(x − a) − (1/2!)f′′(a)(x − a)^2
       − · · · − (1/k!)f(k)(a)(x − a)^k − M(x − a)^{k+1} (30.7.4)

where

M = [1/(b − a)^{k+1}][f(b) − Pk(b;a)].

As before, g(a) = f(a) − Pk(a;a) = f(a) − f(a) = 0, and M has been chosen so that g(b) = f(b) − Pk(b;a) − [f(b) − Pk(b;a)] = 0. By Rolle's Theorem, there is a c1 ∈ (a, b) with g′(c1) = 0.

Now

g′(x) = f′(x) − f′(a) − f′′(a)(x − a) − · · · − [1/(k − 1)!]f(k)(a)(x − a)^{k−1}
        − (k + 1)M(x − a)^k.

Then g′(a) = f′(a) − f′(a) = 0. A second application of Rolle's Theorem, now to g′, yields a c2 ∈ (a, c1) ⊂ (a, b) with g′′(c2) = 0.

(Proof continues on next page...)


30.8 Taylor’s Formula Proof Part II

PROOF SKIPPED

Remainder of Proof. Computing g′′ we obtain

g′′(x) = f′′(x) − f′′(a) − · · · − [1/(k − 2)!]f(k)(a)(x − a)^{k−2}
         − (k + 1)k M(x − a)^{k−1}.

It follows that g′′(a) = f′′(a) − f′′(a) = 0. A third application of Rolle's Theorem, now to g′′, yields a c3 ∈ (a, c2) ⊂ (a, b) with g′′′(c3) = 0.

We continue applying Rolle's Theorem to successive derivatives until we eventually get to g(k)(x) with g(k)(ck) = 0 and

g(k)(x) = f(k)(x) − f(k)(a) − (k + 1)! M(x − a).

It follows that g(k)(a) = f(k)(a) − f(k)(a) = 0, so we apply Rolle's Theorem one last time to find a c ∈ (a, ck) ⊂ (a, b) with g(k+1)(c) = 0.

We compute

g(k+1)(x) = f(k+1)(x) − (k + 1)! M.

Then f(k+1)(c) = (k + 1)! M, so M = f(k+1)(c)/(k + 1)!. In equation (30.7.4), substitute M = f(k+1)(c)/(k + 1)! and set x = b to obtain equation (30.7.3).


30.9 Big O, Little o

Big O and little o are notations used to describe the asymptotic behavior of functions. They can be applied at infinity, or at any finite point such as 0. The "O" stands for order, and indicates that one function is the same order as the other. They are useful for describing the precision of estimates.1

Thus f(x) = O(g(x)) as x → ∞ means there is an M > 0 such that

|f(x)|/g(x) ≤ M

for x large enough. Similarly f(x) = O(x^3) at 0 means there is an M > 0 with

|f(x)|/x^3 ≤ M

for x near 0. Thus

10 − 1/x = O(1) as x → ∞.

Big O is used when the ratio of two functions is bounded. We use little o when the ratio converges to zero. So f(x) = o(g(x)) at infinity means that for every ε > 0, there is a K such that

|f(x)|/g(x) < ε

for all x > K. The definition at any finite point is similar.

When we say R(x) = o(|x|^k) at x = 0, it means that for every ε > 0, there is a δ > 0 with

|R(x)| < ε|x|^k

for |x| < δ. In this case R converges to zero enough faster than |x|^k that the ratio also converges to zero.
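A numerical illustration (our own example, not from the notes): sin x − x = O(x³) at 0, since that ratio stays bounded (it tends to 1/6), while sin x − x = o(x²), since that ratio tends to zero:

```python
import math

# Our own example: compare the two ratios as x -> 0.
for x in (0.1, 0.01, 0.001):
    big_o_ratio = abs(math.sin(x) - x) / x**3     # bounded, tends to 1/6
    little_o_ratio = abs(math.sin(x) - x) / x**2  # tends to 0
    print(x, big_o_ratio, little_o_ratio)
```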

1 This type of asymptotic notation is known as the Bachmann–Landau notation. Big O was introduced by Paul Bachmann in 1894; Edmund Landau proposed the little o notation in 1909. There are other variants that are less commonly used, such as Hardy and Littlewood's Ω notation.


30.10 Big O and Little o as Limits SKIPPED

The definitions of big O and little o can also be stated in terms of limits.

Big O. We write f(x) = O(g(x)) as x → a to mean

lim sup_{x→a} |f(x)|/g(x) < ∞.

The case a = ±∞ is allowed.

That used the limit superior.

Limit Superior. The limit superior or lim sup is defined by

lim sup_{x→a} f(x) = lim_{ε→0} ( sup{f(x) : x ∈ Bε(a)} )

when a is finite, and

lim sup_{x→∞} f(x) = lim_{k→∞} ( sup{f(x) : x > k} )

when a = +∞.

The definition is a little simpler for sequences. When we have a sequence an,

lim sup_{n→∞} an = lim_{n→∞} ( sup{am : m ≥ n} ).

Limit Inferior. The limit inferior, lim inf, is defined analogously. E.g.,

lim inf_{n→∞} an = lim_{n→∞} ( inf{am : m ≥ n} ).

As for little o, we use the same quotient, but here the quotient converges to zero. In that case, it is enough to use the limit itself rather than the limit superior.

Little o. We write f(x) = o(g(x)) as x → a when

lim_{x→a} |f(x)|/g(x) = 0.

The case a = ±∞ is allowed.


30.11 The Remainder in Taylor’s Formula SKIPPED

Define the kth remainder term by

Rk(x;a) = f(x) − Pk(x;a).

We can use Taylor's formula to write

Rk(x;a)/(x − a)^k = [1/(k + 1)!] f(k+1)(c)(x − a)^{k+1}/(x − a)^k = [1/(k + 1)!] f(k+1)(c)(x − a).

As x → a, c → a, so the limit of the remainder ratio is

lim_{x→a} Rk(x;a)/(x − a)^k = [1/(k + 1)!] f(k+1)(a)(a − a) = 0,

or in the little o notation,

Rk(x;a) = o(|x − a|^k).

This shows that Taylor’s formula is a good approximation of f for x near a.


30.12 Example: A Power Series

Example 30.12.1: Power Series. In some cases, the approximation is perfect as k → ∞, and we get a convergent power series. Consider f(x) = sin x and set a = 0. Then f(2k)(0) = 0, f(4k+1)(0) = +1, and f(4k+3)(0) = −1, so the Taylor expansion for sin x is

sin x = x − x^3/3! + x^5/5! − x^7/7! + · · · = ∑_{n=0}^∞ [(−1)^n/(2n + 1)!] x^{2n+1}.

Applying the ratio test, we find that this series converges absolutely for any x ∈ R because

|x^{2n+3}/(2n + 3)!| · |(2n + 1)!/x^{2n+1}| = x^2/[(2n + 2)(2n + 3)] → 0

as n → ∞ for every real x.

as n → ∞ for every real x.The remainder obeys

|R2k+1| =|x|2k+1

k!→ 0 as k → ∞,

so the power series converges to sin x. The convergence is uniform on any compactinterval.
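The convergence is easy to see numerically. This sketch (the test point x = 2 is our own choice) sums the partial series and watches the error shrink:

```python
import math

# Sketch: partial sums of the sine series converge to sin x for any x.

def sin_partial(x, k):
    """Sum the terms (-1)^n x^(2n+1)/(2n+1)! for n = 0, ..., k."""
    return sum((-1)**n * x**(2*n + 1) / math.factorial(2*n + 1)
               for n in range(k + 1))

x = 2.0
for k in (1, 3, 6, 10):
    print(k, sin_partial(x, k), abs(math.sin(x) - sin_partial(x, k)))
```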


30.13 Taylor’s Formula in Rm

Just as we did with the Mean Value Theorem, we can derive Taylor's Formula in Rm from the R1 version. We will use the shorthand notation [Dkfa]h^{⊗k} to denote the k-tensor [Dkfa](h, . . . ,h), that is, the kth derivative applied to h ⊗ · · · ⊗ h ∈ (Rm)^{⊗k}.

Taylor’s Formula in Rm. Let f : U → R be a Ck+1 function defined on an open set inRm. Suppose that for every a,b ∈ U, ℓ(a,b) ⊂ U. Then for all a,b ∈ U there exists ac ∈ ℓ(a,b) such that

f(b) = f(a) +[

Dfa]

(b− a) +1

2!

[

D2fa]

(b− a)⊗2 + · · · +1

k!

[

Dkfa]

(b− a)⊗k

+1

(k + 1)!

[

Dk+1fc]

(b− a)⊗(k+1) (30.13.5)

PROOF SKIPPED

Proof. We will piggyback off Taylor's Formula in R. Define

φ(t) = f((1 − t)a + tb) = f(a + t(b − a)).

Since U is open, φ : I → R is a Ck+1 function defined on an open interval I ⊃ [0, 1]. We can apply Taylor's Formula on the interval I between 0 and 1 to find

φ(1) = ∑_{n=0}^k (1/n!)[φ(n)(0)]1^n + [1/(k + 1)!][φ(k+1)(t∗)]1^{k+1}

for some t∗ ∈ (0, 1). Then we apply the Chain Rule to find

φ′(t) = [Df(a + t(b − a))](b − a),
φ′′(t) = [D2f(a + t(b − a))](b − a)^{⊗2},
φ′′′(t) = [D3f(a + t(b − a))](b − a)^{⊗3},

etc. Setting t = 0 in these derivatives and substituting them into the formula for φ(1) = f(b), with c = a + t∗(b − a), we obtain equation (30.13.5).


30.14 The Remainder Term in Rm

Again, the kth remainder term is Rk(x;a) = f(x) − Pk(x;a), so

Rk(x;a) = [1/(k + 1)!][D(k+1)fc](x − a)^{⊗(k+1)}.

Dividing by ‖x − a‖^k, we obtain

Rk(x;a)/‖x − a‖^k = [1/(k + 1)!][D(k+1)fc](x − a)^{⊗(k+1)}/‖x − a‖^k.

Let u be the unit vector (x − a)/‖x − a‖. Then

Rk(x;a)/‖x − a‖^k = [1/(k + 1)!][D(k+1)fc]u^{⊗(k+1)} ‖x − a‖.

As x → a, c → a, and we find

Rk(x;a)/‖x − a‖^k → 0

as x → a. Alternatively,

Rk(x;a) = o(‖x − a‖^k)

as ‖x − a‖ → 0. This means that the remainder goes to zero as x → a enough faster than ‖x − a‖^k → 0 that their ratio converges to zero.


29.3. Connected Sets

Our last collection of important topological concepts relates to connected sets. Roughly speaking, a connected set is a set that is a single contiguous piece. Before defining connectedness, we need another definition.

29.22 Relative Topology

We start by defining the relative or subspace topology.

Relative (Subspace) Topology. Let (X,T) be a topological space and S a subset of X. The relative or subspace topology on S is defined by TS = {S ∩ U : U ∈ T}, and (S,TS) is called a subspace of X.

The relative topology is the weakest topology where the inclusion map iS : S → X, defined by iS(x) = x, is continuous. In any topology where it is continuous, the sets U ∩ S must be open. The relative topology TS demands exactly that—no more, no less. The relative topology on S makes S into a topological space (in this case a subspace). When (X, d) is a metric space, the subspace topology on S is the same as the metric topology of (S, d).

Sets of the form U ∩ S with U open in X are referred to as relatively open, and sets of the form F ∩ S with F closed in X are called relatively closed.


29.23 Embedding R in R2

Example 29.23.1: Embedding R in R2. When R2 has the usual topology, we can embed R into R2 as A = {x ∈ R2 : x2 = 0}. The relative topology on A is the same as the usual topology on R. This is illustrated in Figure 29.23.2.

Figure 29.23.2: The intersection of an open ball in R2 and the line A = {x : x2 = 0} is an open interval in A. Had we used a closed ball, we would have gotten a closed interval.


29.24 Relative Complements

The relative complement of A in S is defined by S \ A = {x ∈ S : x /∈ A} = S ∩ Ac. It is easy to show that the relative complements of relatively open sets are relatively closed and vice-versa.

Theorem 29.24.1. Let S ⊂ X have the relative topology induced by (X,T). Then A is the relative complement of an open set if and only if A = S ∩ F for some set F that is closed in X.

Proof. Let A = S \ U = S ∩ Uc for some open set U. Then A = S ∩ F where F = Uc is closed.

Conversely, if A = S ∩ F for some closed set F, then A = S \ Fc. Then U = Fc is open and A = S \ U.


29.25 Connected Sets 10/15/20

Connected and Disconnected Sets. Let (X,T) be a topological space. The space X is connected if there do not exist two disjoint non-empty open sets U and V that obey X = U ∪ V. If such sets do exist, we say they disconnect X and that X is disconnected.

A subset S of X is connected if it is connected in the relative topology.

These definitions can be recast in terms of closed sets. Theorem 29.25.1 applies regardless of whether we are using the base topology on X, or if we are considering a subset with the relative topology.

Theorem 29.25.1. A space X is disconnected if and only if there are non-empty disjointclosed subsets A and B that cover X.

Proof. The set X is disconnected if and only if there are non-empty open sets U and V with U ∩ V = ∅ and U ∪ V = X. Taking complements, and defining the closed sets A = Uc and B = Vc, we find this holds if and only if A ∪ B = Uc ∪ Vc = X and A ∩ B = ∅. Also, there are u ∈ U and v ∈ V if and only if u /∈ Uc = A and v /∈ Vc = B. Since A and B cover X, that is equivalent to A and B being non-empty.


29.26 Characterizing Relative Disconnections

Let S have the relative topology and U, V be relatively open sets that disconnect S. What can we say about the open sets in X that generate them?

Theorem 29.26.1. Let (X,T) be a topological space and S ⊂ X. Suppose U and V are relatively open sets that disconnect S. Then there are open sets U′ and V′ in X obeying

1. U′ ∩ S and V′ ∩ S are both non-empty.
2. S ⊂ U′ ∪ V′.
3. U′ ∩ V′ ∩ S = ∅.

Conversely, if U′ and V′ are open sets obeying (1)–(3), U = U′ ∩ S and V = V′ ∩ S disconnect S in the relative topology.

Proof. By the definition of the relative topology, there are open sets U′, V′ ⊂ X with U = U′ ∩ S and V = V′ ∩ S. (1) Now U′ ∩ S and V′ ∩ S are U and V, which are non-empty. (2) U and V cover S, so S = (U′ ∩ S) ∪ (V′ ∩ S) ⊂ U′ ∪ V′. (3) Because U and V are disjoint, U′ ∩ V′ ∩ S = ∅.

We now prove the converse. By (1), U = U′ ∩ S and V = V′ ∩ S are non-empty. By (2), their union is all of S. By (3), U and V are disjoint. Since U and V are also relatively open, they disconnect S.

Notice that it need not be the case that U′ ∩ V′ is empty, only that it contains no points in S.

There’s a similar result for closed sets. The proof is quite similar, and has beenomitted.

Theorem 29.26.2. Let (X,T) be a topological space and S ⊂ X. Suppose A and B are relatively closed sets that disconnect S. Then there are closed sets A′ and B′ in X obeying

1. A′ ∩ S and B′ ∩ S are both non-empty.
2. S ⊂ A′ ∪ B′.
3. A′ ∩ B′ ∩ S = ∅.

Conversely, if A′ and B′ are closed sets obeying (1)–(3), A = A′ ∩ S and B = B′ ∩ Sdisconnect S in the relative topology.

Intuitively, a set is connected if there are no breaks in the set, if it is all one piece.

Example 29.26.3: A Disconnected Set. A set such as S = [0, 1) ∪ (1, 2] ⊂ R is not connected. One disconnection is A = (−1, 1) and B = (1, 3). Both sets are open, and both have non-empty intersection with S: A ∩ S = [0, 1) ≠ ∅ and B ∩ S = (1, 2] ≠ ∅. The set S is covered because A ∪ B = (−1, 1) ∪ (1, 3) ⊃ S. Finally, the sets are disjoint: A ∩ B = ∅.


29.27 Intervals are Connected

In contrast to the previous example, any interval in the real line is a connected set.

Proposition 29.27.1. Any interval I in R is connected.

Proof. Let I be an interval, possibly infinite. We prove this by contradiction. Suppose I is not connected. Then there are non-empty relatively closed sets A and B that disconnect I. Take a ∈ A and b ∈ B. We label the sets so that a < b and consider the interval [a, b]. Note that [a, b] ⊂ I because I is an interval and a, b ∈ I.

Define z = sup([a, b] ∩ A). The set [a, b] ∩ A is non-empty as a ∈ A. Since the set is bounded, the supremum exists. By construction, a ≤ z ≤ b, so z ∈ [a, b] ⊂ I.

Because A and [a, b] are relatively closed,

z ∈ cl([a, b] ∩ A) = [a, b] ∩ A ⊂ A.

Of course z ≠ b ∈ B because A ∩ B is empty. Since z is the supremum of A ∩ [a, b], any w ∈ (z, b] must be in B. In particular, z + 1/n ∈ B for n large enough.

Letting n → ∞ and using the fact that B is relatively closed, we find z ∈ B. But z ∈ A, so it cannot also be in B. This contradiction means that there are no sets A and B that disconnect I. Therefore, I is connected.


29.28 Totally Disconnected Sets

Totally disconnected sets are the opposite extreme from connected sets. They contain no connected subsets bigger than a single point. A set is totally disconnected if its only connected subsets are the trivial ones—singletons and the empty set. Equivalently, a set is totally disconnected if you can disconnect any two distinct points in the set.

Example 29.28.1: The Rationals are Totally Disconnected. The rational numbers Q provide an example of total disconnection. Suppose a is an irrational number (e.g., √2, π). Then A = (−∞, a] ∩ Q and B = Q ∩ [a,+∞) are relatively closed sets in Q that disconnect Q. Moreover, you can disconnect any subset of Q that contains at least two distinct points in the same fashion. That means that the set Q ⊂ R is totally disconnected.

Another totally disconnected set is the Cantor set of Example 12.33.1.

Example 29.28.2: The Cantor Set is Totally Disconnected. Suppose x < y are distinct elements of the Cantor set C. As we saw, both can be written as ternary numbers consisting entirely of 0's and 2's. Take the first digit that differs, call it digit k. Digit k is 0 in x and 2 in y. Let z have ternary expansion identical to x, except that the kth digit is 1. Then x < z < y and z /∈ C. Let A = C ∩ (−∞, z] and B = C ∩ [z,+∞). Then A and B are relatively closed sets that disconnect C.


29.29 Connected Components of Disconnected Sets

Let (X,T) be a topological space that is not connected. If a subspace S is connected, we can ask if there is a larger connected subspace that includes it. If there is not, then we call S a connected component or just a component of X.

Given a connected subset S, a maximal connected subset containing S can be constructed by taking the union of all connected subsets of X that contain S. The key step in showing this is that the union of two connected subsets containing S is also connected. In fact, we can show more than we need. If two connected subsets share even a single point, their union is connected.

Theorem 29.29.1. Suppose A and B are connected subsets of (X,T) containing a pointx. Then A ∪ B is connected.

Proof. Suppose that A ∪ B can be disconnected by relatively open sets U and V. Label the sets U and V so x ∈ U. Keep in mind that x ∈ A ∩ B and take y ∈ V ∩ (A ∪ B). There are two possibilities: y ∈ A or y ∈ B.

Suppose y ∈ A. Then both A ∩ U and A ∩ V are non-empty, so U and V disconnect A, which is impossible.

Otherwise, y ∈ B. Now x ∈ U and y ∈ V, so both B ∩ U and B ∩ V are non-empty. Then U and V disconnect the connected set B, which is also impossible.

As both possibilities are impossible, we cannot disconnect A ∪ B.

Earlier we considered the space [0, 1) ∪ (1, 2]. It has two connected components, [0, 1) and (1, 2]. Both sets are connected because they are intervals, and if we add even a single point of one interval to the other, we get a disconnected set.


29.30 Continuity and Connectedness

Our next result is that the continuous image of a connected set is connected. This means that we can make connected sets from other connected sets by applying continuous functions to them.

Theorem 29.30.1. Suppose f : X → Y is continuous and S ⊂ X is connected. Then f(S)is connected.

Proof. Suppose not. Let A, B be relatively closed sets that disconnect f(S). Now f−1(A) and f−1(B) are relatively closed because f is continuous. Moreover,

f−1(A) ∩ f−1(B) = f−1(A ∩ B) = ∅

and

f−1(A) ∪ f−1(B) = f−1(A ∪ B) ⊃ f−1(f(S)) ⊃ S.

Thus f−1(A) and f−1(B) disconnect S. As this is impossible, f(S) cannot be disconnected.


29.31 Path-Connected Sets

It is often easier to show a set is connected by showing it is path-connected. A path from a to b in S is a continuous function f : [0, 1] → S with f(0) = a and f(1) = b.

Path-connected. A set S is path-connected if for every a, b ∈ S there is a path from ato b in S.

Path-connected sets are connected.

Proposition 29.31.1. Any path-connected set is connected.

Proof. Suppose S is path-connected and A, B are relatively closed sets that cover S, each with non-empty intersection with S. Choose a ∈ A ∩ S and b ∈ B ∩ S and let f be a path between them in S.

Now consider f−1(A) and f−1(B). These are closed sets in [0, 1] with 0 ∈ f−1(A) and 1 ∈ f−1(B). Moreover, f−1(A) ∪ f−1(B) = f−1(A ∪ B) ⊃ [0, 1]. Since [0, 1] is connected, there is some x ∈ f−1(A) ∩ f−1(B). It follows that f(x) ∈ A ∩ B, and so A and B cannot disconnect S. Thus S is connected.

In some spaces the connected and path-connected sets are the same. This happens in R, where the only connected sets are the intervals. This includes trivial intervals such as [a, a] and infinite intervals like (0,+∞). Intervals are always path-connected, meaning that the set of connected subsets and the set of path-connected subsets are identical in R.


29.32 Convex and Star-shaped Sets

A convex set is one that contains the line segment between any pair of its points.

Convex Set. Let V be a vector space. A set S ⊂ V is convex if for every x,y ∈ S, ℓ(x,y) = {(1 − t)x + ty : 0 ≤ t ≤ 1} ⊂ S.

A star-shaped set is a set where all points can be connected to a special point, the star point, using a line segment.

Star-shaped Set. A set S ⊂ Rm is star-shaped if there is a point x0 ∈ S so that for any x ∈ S, ℓ(x0, x) ⊂ S. We then say S is star-shaped with respect to x0. The point x0 is called a star point.

It’s easy to show that convex sets are star-shaped.

Proposition 29.32.1. Any convex set S is star-shaped and every point of it is a star point.

Proof. Let x0 be any point in S. Since S is convex, ℓ(x0,y) ⊂ S for any y ∈ S, showing that S is star-shaped and that any point x0 is a star point for S.

Figure 29.32.2: The sets A and B are convex. The sets C and D are not convex, as demonstrated by the red line segments that must leave the set to connect points. The sets C and D are star-shaped with respect to the heavy dots. The annulus E is neither convex nor star-shaped, although it is connected.


29.33 Convex and Star-shaped Sets are Connected

An immediate corollary of the definitions together with Proposition 29.31.1 is that convex sets are connected.

Corollary 29.33.1. Any convex set is connected.

Proof. If S is convex and x, y ∈ S, the function f(t) = (1 − t)x + ty is a path from x to y in S.

Star-shaped sets are also connected.

Corollary 29.33.2. Any star-shaped set is connected.

Proof. Given x,y ∈ S, the path f(t) = (1 − 2t)x + 2tx0 for t ∈ [0, 1/2] and f(t) = (2 − 2t)x0 + (2t − 1)y for t ∈ [1/2, 1] is a path from x to y that passes through the star point x0.


29.34 Example: Connected, But Not Path-Connected

Path-connectedness is quite useful because it is often easier to show a set is connected by showing it is path-connected rather than trying to directly use the definition of connectedness. There are limitations to this approach because the converse is not true. A set can be connected without being path-connected.

Example 29.34.1: Topologist's Sine Curve. Consider R2 and let S = {(0, y) : −1 ≤ y ≤ 1} ∪ {(x, sin(1/x)) : x > 0}. This is a variant of the topologist's sine curve of Example 13.7.1. The set S is connected but not path-connected.

Suppose, by way of contradiction, that S is path-connected. Then there is a continuous path in S, f(t) = (x(t), y(t)), with f(0) = (0, 0) and f(1) = (1/π, 0). The components of f, x and y, inherit continuity from f.

By the Intermediate Value Theorem, there is a t1, 0 < t1 < 1, with x(t1) = 2/(3π). Then there is a t2, 0 < t2 < t1, with x(t2) = 2/(5π). Continue this process to obtain a decreasing sequence tn with x(tn) = 2/((2n + 1)π). Since tn is a decreasing sequence that is bounded below, it converges to some t0.

By continuity, y(tn) → y(t0). But this is impossible because y(tn) = +1 when n is even and −1 when n is odd. This contradiction shows there is no such path f. The set S cannot be path-connected.

It is easy to see that the portion of S with x > 0 is path-connected, as is {(0, y) : −1 ≤ y ≤ 1}. This means that the only possible disconnection is into A = {(0, y) : −1 ≤ y ≤ 1} and B = {(x, sin(1/x)) : x > 0}. This fails because B is not relatively closed. It does not contain limn (1/(2πn), 0) = (0, 0). Therefore S is connected.

Figure 29.34.2: Topologist’s Sine Curve


29.35 The Intermediate Value Theorem

An important consequence of Theorem 29.30.1 is the Intermediate Value Theorem, which says that if a continuous function defined on an interval takes two values, it also takes all values in between.

Intermediate Value Theorem. If f : [a, b] → R is continuous, and y is a number between f(a) and f(b), then there is at least one c ∈ [a, b] with f(c) = y.

Proof. We may assume f(a) < f(b) without loss of generality. We proceed by contradiction. If no such c exists, the closed sets (−∞, y] and [y,+∞) disconnect f([a, b]). This is impossible by Theorem 29.30.1, and so such a c must exist.


29.36 A Simple Fixed Point Theorem 10/19/21

A function f : X → X has a fixed point if there is an x∗ ∈ X with f(x∗) = x∗. The Intermediate Value Theorem can be used to show that any continuous function mapping the unit interval [0, 1] to itself has a fixed point.

Theorem 29.36.1. Let f : [0, 1] → [0, 1] be a continuous function. Then there is x∗ ∈ [0, 1] with f(x∗) = x∗.

Proof. If f(0) = 0 or f(1) = 1, we are done.

Otherwise, define g(x) = f(x) − x. Then g is continuous with g(0) > 0 and g(1) < 0. By the Intermediate Value Theorem, there is an x∗, 0 < x∗ < 1, with g(x∗) = 0. But then f(x∗) = x∗ and we are done.
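The proof is constructive enough to run. In this sketch (our own example, not from the notes), we find the fixed point of f(x) = cos x on [0, 1] by bisecting g(x) = f(x) − x, exactly as in the proof:

```python
import math

# Our own example: find x* with cos(x*) = x* by bisecting g(x) = cos(x) - x.
f = math.cos
g = lambda x: f(x) - x

lo, hi = 0.0, 1.0                  # g(0) = 1 > 0, g(1) = cos(1) - 1 < 0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if g(lo) * g(mid) <= 0 else (mid, hi)
x_star = (lo + hi) / 2
print(x_star)  # ≈ 0.739085, where cos(x*) = x*
```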


29.37 Contraction Mappings

We temporarily put aside connected sets to prove a stronger fixed point theorem, Banach's Contraction Mapping Theorem. This powerful theorem can be used to prove the Inverse and Implicit Function Theorems of S&B Chapter 15. It can be used to show the existence of solutions to differential equations. It has economic applications, including finding solutions to Bellman's dynamic programming equation.

Contraction. Let (X, d) be a metric space. A function f : X → X is called a contraction if there is an r < 1 with d(f(x), f(y)) ≤ r d(x, y) for every x, y ∈ X.

In other words, a mapping is a contraction if the images of any two points are uniformly closer together than the original two points.


29.38 Contraction Mapping Theorem

Contraction Mapping Theorem. Let f be a contraction on a complete metric space (X, d). Then f has a unique fixed point x∗. Moreover, take any x0 ∈ X and define xn = f(xn−1) for n = 1, 2, . . . . Then xn → x∗.

Proof. First we show uniqueness. Suppose there are two fixed points x∗ and y∗. Then

d(x∗, y∗) = d(f(x∗), f(y∗)) ≤ r d(x∗, y∗).

This implies d(x∗, y∗) = 0, so x∗ = y∗.

Consider the sequence xn given in the statement of the theorem. I claim that d(xn+1, xn) ≤ r^n d(x1, x0). We prove this by induction. It is true for n = 1 because

d(x2, x1) = d(f(x1), f(x0)) ≤ r d(x1, x0).

Also, if it is true for n,

d(xn+2, xn+1) = d(f(xn+1), f(xn)) ≤ r d(xn+1, xn) ≤ r^{n+1} d(x1, x0)

shows it is true for n + 1. By induction it follows that it is true for all n = 1, 2, . . . .

Suppose m ≥ n. Then

d(xm, xn) ≤ ∑_{i=0}^{m−n−1} d(xn+i+1, xn+i) ≤ ∑_{i=0}^{m−n−1} r^{n+i} d(x1, x0) ≤ [r^n/(1 − r)] d(x1, x0).

Since the right hand side converges to zero as n → ∞, we conclude xn is a Cauchy sequence. By completeness of X the sequence has a limit x∗. The function f is continuous, so

f(x∗) = f(lim_n xn) = lim_n f(xn) = lim_n xn+1 = x∗,

showing that x∗ is the unique fixed point of f.
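The iteration xn = f(xn−1) from the theorem is easy to run. A sketch (our own example, not from the notes): f(x) = cos x is a contraction on [0, 1], since |f′(x)| = |sin x| ≤ sin 1 < 1 there and f maps [0, 1] into itself, so the iterates converge to the unique fixed point from any starting point:

```python
import math

# Our own example: iterate x_n = cos(x_{n-1}); the Contraction Mapping
# Theorem guarantees convergence to the unique fixed point on [0, 1].
x = 0.3                      # an arbitrary starting point x0
for n in range(100):
    x = math.cos(x)          # x_n = f(x_{n-1})
print(x)  # ≈ 0.739085, the same fixed point for any x0 in [0, 1]
```

Changing x0 changes none of the output beyond rounding, which is exactly the uniqueness claim of the theorem.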

October 23, 2021

Copyright ©2021 by John H. Boyd III: Department of Economics, Florida International University, Miami, FL 33199