
MULTIVARIABLE CALCULUS

T.K.SUBRAHMONIAN MOOTHATHU

Contents

1. A few remarks about Rn
2. Multivariable differentiation: definitions
3. Multivariable differentiation: properties
4. Higher order partial derivatives
5. Inverse function theorem and Implicit function theorem
6. Tangent spaces and Lagrange’s multiplier method
7. Multivariable Riemann integration over a box
8. Iterated integrals and Fubini’s theorem
9. Multivariable Riemann integration over a Jordan measurable set
10. Change of variable
11. Polar, cylindrical, and spherical coordinates
12. Line integrals
13. Circulation density and Green’s theorem
14. Surface integrals
15. Divergence and curl
16. Stokes’ theorem
17. Gauss’ divergence theorem

The basic idea in Calculus is to approximate smooth objects locally by linear objects.

Suggested textbooks for additional reading:

1. T.M. Apostol, Calculus, Vol. II, 1969.

2. T.M. Apostol, Mathematical Analysis, 1974.

3. J.J. Callahan, Advanced Calculus, 2010.

4. S.R. Ghorpade and B.V. Limaye, A Course in Multivariable Calculus and Analysis, 2010.

5. J.H. Hubbard and B.B. Hubbard, Vector Calculus, Linear Algebra, and Differential Forms, 1999.

6. S. Lang, Calculus of Several Variables, 1987.

7. P.D. Lax and M.S. Terrell, Multivariable Calculus with Applications, 2017.


8. J.R. Munkres, Analysis on Manifolds, 1991.

9. C.C. Pugh, Real Mathematical Analysis, 2015.

10. M. Spivak, Calculus on Manifolds, 1965.

1. A few remarks about Rn

A general remark about notations: We do not wish to complicate notations unnecessarily; hence

certain notations have to be understood based on the context. For example, the notation ‘x ∈ Rn’

means x = (x1, . . . , xn), where each xj ∈ R; on the other hand, the notation ‘v1, . . . , vk ∈ Rn’

means each vi is an n-tuple vi = (vi1, . . . , vin) with vij ∈ R.

Recall the following from Linear Algebra:

Definition: Let K = R or C, and X be a vector space over K. A map ∥ · ∥ : X → [0,∞) is called a

norm on X if the following conditions are satisfied for every x, y ∈ X:

(i) ∥x∥ = 0 iff x = 0,

(ii) ∥αx∥ = |α|∥x∥ for every α ∈ K.

(iii) [triangle inequality] ∥x+ y∥ ≤ ∥x∥+ ∥y∥.

If this holds, then (X, ∥ · ∥) is called a normed space. Note that any norm ∥ · ∥ on X induces

a metric on X by the condition d(x, y) := ∥x − y∥ for x, y ∈ X. Thus every normed space is in

particular a metric space.

Example: Let K = R or C. If 1 ≤ p < ∞, then the p-norm ∥ · ∥p on Kn, given by ∥x∥p = (∑_{j=1}^n |xj|^p)^{1/p}, is a norm on Kn, where the triangle inequality is nothing but Minkowski’s inequality. When p = 2, we get the Euclidean norm ∥ · ∥2 on Kn, defined as ∥x∥2 = (∑_{j=1}^n |xj|^2)^{1/2}. The metric induced by the Euclidean norm is the Euclidean metric dE on Kn, where dE(x, y) = (∑_{j=1}^n |xj − yj|^2)^{1/2}. Two other commonly used norms on Kn are ∥ · ∥1 (which is the p-norm for p = 1) and ∥ · ∥∞, defined respectively as ∥x∥1 = ∑_{j=1}^n |xj| and ∥x∥∞ = max{|xj| : 1 ≤ j ≤ n} for x = (x1, . . . , xn) ∈ Kn.
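
As a concrete illustration, here is a small Python sketch (an addition to these notes, not part of the original text; the vector x = (3, −4, 1) is an arbitrary choice) computing ∥x∥1, ∥x∥2 and ∥x∥∞ and checking the elementary inequalities ∥x∥∞ ≤ ∥x∥2 ≤ ∥x∥1.

import numpy as np

x = np.array([3.0, -4.0, 1.0])

norm1 = np.sum(np.abs(x))        # ∥x∥_1 = sum of |x_j|
norm2 = np.sqrt(np.sum(x**2))    # Euclidean norm ∥x∥_2
norm_inf = np.max(np.abs(x))     # ∥x∥_∞ = max of |x_j|

print(norm1, norm2, norm_inf)    # 8.0, about 5.099, 4.0
assert norm_inf <= norm2 + 1e-12 and norm2 <= norm1 + 1e-12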

Remark: (i) If ∥ · ∥ is a norm on a vector space X, then |∥x∥ − ∥y∥| ≤ ∥x− y∥ for every x, y ∈ X

(to see this, note by triangle inequality that ∥x∥ ≤ ∥y∥ + ∥x − y∥ and ∥y∥ ≤ ∥x∥ + ∥y − x∥); and

consequently, ∥ · ∥ : X → R is Lipschitz continuous. (ii) Our primary interest is in the normed

space (Rn, ∥ · ∥2). As a metric space, Cn can be identified with R2n in a natural manner.

Exercise-1: [Recall from Real Analysis] With respect to dE , the following are true:

(i) Rn is complete and (path) connected.

(ii) Qn is a countable dense subset of Rn.

(iii) Every bounded subset of Rn is totally bounded.


(iv) Let A ⊂ Rn. Then A is compact ⇔ A is closed and bounded in Rn ⇔ A is sequentially compact

⇔ every infinite subset of A has a limit point in A.

(v) A sequence in Rn converges iff it converges coordinatewise.

(vi) If X is a metric space, then a function f = (f1, . . . , fn) : X → Rn is continuous iff each fj is

continuous.

(vii) Every linear map f : Rn → Rm is continuous for n,m ∈ N. Consequently, every invertible

linear map f : Rn → Rn is a homeomorphism.

Notation: When K = R or C, let e1, . . . , en be the standard basis vectors in Kn, where ej has 1 at

the jth coordinate and zeroes elsewhere. Note that if x = (x1, . . . , xn) ∈ Kn, then x = ∑_{j=1}^n xjej.

Definition: Two norms ∥ · ∥ and ∥ · ∥0 on a vector space X are said to be equivalent if there are

0 < a < b such that a∥x∥ ≤ ∥x∥0 ≤ b∥x∥ for every x ∈ X. Note that this is equivalent to saying

that the identity map I : (X, ∥ · ∥) → (X, ∥ · ∥0) is a homeomorphism.

[101] Any two norms on Rn are equivalent (similarly, any two norms on Cn are equivalent).

Proof. The equivalence of norms is an equivalence relation on the collection of all norms on Rn. Therefore, it suffices to show that an arbitrary norm ∥ · ∥ on Rn is equivalent to the Euclidean norm ∥ · ∥2 on Rn. Let b = ∑_{j=1}^n ∥ej∥. For x = ∑_{j=1}^n xjej ∈ Rn, we have |xj| ≤ ∥x∥2 for every j and hence ∥x∥ ≤ ∑_{j=1}^n |xj|∥ej∥ ≤ ∥x∥2 ∑_{j=1}^n ∥ej∥ = b∥x∥2. From this, we also note that |∥x∥ − ∥y∥| ≤ ∥x − y∥ ≤ b∥x − y∥2, and thus ∥ · ∥ : Rn → R is Lipschitz continuous with respect to the Euclidean norm ∥ · ∥2. Next, to find a > 0 such that a∥x∥2 ≤ ∥x∥ for every x ∈ Rn, we argue as follows. Consider the unit sphere S = {y ∈ Rn : ∥y∥2 = 1}, which is compact, and define f : S → R as f(y) = ∥y∥. Since f is (Lipschitz) continuous and positive on the compact set S, there is a > 0 such that f(y) ≥ a for every y ∈ S. Now for any x ∈ Rn \ {0}, we have y := x/∥x∥2 ∈ S, and therefore a ≤ f(y) = ∥y∥ = ∥x∥/∥x∥2, which means a∥x∥2 ≤ ∥x∥ as required. □

Remark: Recall the norms ∥ · ∥1 and ∥ · ∥∞ on Rn mentioned earlier. They induce the metrics d1 and d∞ on Rn, where d1(x, y) = ∑_{j=1}^n |xj − yj| and d∞(x, y) = max{|xj − yj| : 1 ≤ j ≤ n} for x, y ∈ Rn. A consequence of [101] is the following:

Exercise-2: (i) For a function f : Rn → R, we have that f : (Rn, dE) → (R, dE) is continuous ⇔

f : (Rn, d1) → (R, dE) is continuous ⇔ f : (Rn, d∞) → (R, dE) is continuous.

(ii) If ∥ · ∥ is a norm on Rn, then ∥ · ∥ : (Rn, ∥ · ∥2) → R is Lipschitz continuous.

Definition: Let K = R or C, and X be a vector space over K. A function ⟨·, ·⟩ : X ×X → K is said

to be an inner product on X if the following conditions are satisfied:

(i) ⟨x, x⟩ ≥ 0 for every x ∈ X; and ⟨x, x⟩ = 0 iff x = 0.

(ii) ⟨y, x⟩ = \overline{⟨x, y⟩} for every x, y ∈ X (where the bar denotes the complex conjugate).


(iii) [Linearity in the first variable] For each y ∈ X, the map ⟨·, y⟩ : X → K is K-linear, i.e.,

⟨c1x1 + c2x2, y⟩ = c1⟨x1, y⟩+ c2⟨x2, y⟩ for every x1, x2 ∈ X and c1, c2 ∈ K.

Any inner product ⟨·, ·⟩ on X induces a norm ∥ · ∥ on X by the rule ∥x∥ := ⟨x, x⟩1/2. Then we

have the Cauchy-Schwarz inequality |⟨x, y⟩| ≤ ∥x∥∥y∥ for every x, y ∈ X.

Remark: If K = R, then condition (ii) becomes ⟨y, x⟩ = ⟨x, y⟩ for every x, y ∈ X, and then it

follows by (iii) that ⟨·, ·⟩ is linear in each variable separately (in other words, any inner product on

a real vector space is in particular a bilinear map).

Example: The standard inner product on Rn is defined as ⟨x, y⟩ = ∑_{j=1}^n xjyj for x, y ∈ Rn, and the standard inner product on Cn is defined as ⟨x, y⟩ = ∑_{j=1}^n xj\overline{yj} for x, y ∈ Cn. The norm induced by this inner product is nothing but the Euclidean norm ∥ · ∥2 on Rn (respectively, Cn).

Remark: Let ⟨·, ·⟩ be the standard inner product on Rn, let ⟨·, ·⟩0 be an arbitrary inner product on Rn, and let A be the n × n matrix whose ijth entry is ⟨ei, ej⟩0. Then we see by the bilinearity of the inner product that ⟨x, y⟩0 = ∑_{i=1}^n ∑_{j=1}^n xiyj⟨ei, ej⟩0 = ⟨Ax, y⟩ for every x, y ∈ Rn. Note that A is a symmetric real matrix which is positive-definite (i.e., ⟨Av, v⟩ > 0 for every v ∈ Rn \ {0}). Conversely, it can be verified that if A is an n × n positive-definite real (symmetric) matrix, then (x, y) ↦ ⟨Ax, y⟩ is an inner product on Rn.

Exercise-3: Let ⟨·, ·⟩ be the standard inner product on Rn.

(i) Let u, v ∈ Rn \ {0}. Then ⟨u, v⟩ = ∥u∥∥v∥ cos θ, where θ ∈ [0, π] is the angle between u and v.

In particular, ⟨u, v⟩ = 0 iff u and v are orthogonal.

(ii) ⟨·, ·⟩ : Rn×Rn → R is continuous, i.e., if (xk) → x and (yk) → y in Rn, then (⟨xk, yk⟩) → ⟨x, y⟩.

(iii) If f : Rn → R is linear and y = (f(e1), . . . , f(en)), then f(x) = ⟨x, y⟩ for every x ∈ Rn.

[Hint : (i) Assume ∥u∥ ≤ ∥v∥. Also suppose θ ∈ [0, π/2] (else, replace v with −v). Let w be the

point where the perpendicular from u meets the line passing through the origin and v. The right-

angled triangle with vertices at u,w, and v has side-lengths ∥v− u∥, ∥u∥ sin θ, and ∥v∥− ∥u∥ cos θ.

Hence ∥v − u∥2 = ∥u∥2 sin2 θ + (∥v∥ − ∥u∥ cos θ)2 = ∥u∥2 + ∥v∥2 − 2∥u∥∥v∥ cos θ. On the other

hand, ∥v−u∥2 = ⟨v−u, v−u⟩ = ∥u∥2+ ∥v∥2− 2⟨u, v⟩ since ⟨u, v⟩ = ⟨v, u⟩. (ii) |⟨x, y⟩− ⟨xk, yk⟩| ≤

|⟨x, y⟩ − ⟨x, yk⟩|+ |⟨x, yk⟩ − ⟨xk, yk⟩| = |⟨x, y − yk⟩|+ |⟨x− xk, yk⟩| ≤ ∥x∥∥y − yk∥+ ∥x− xk∥∥yk∥

by Cauchy-Schwarz inequality, and observe that (yk) is bounded since it is convergent.]

Exercise-4: [Verify the claims] Let a ∈ Rn. (i) For 1 ≤ k ≤ n, any k-dimensional plane in Rn

containing a has the form a + span{v1, . . . , vk}, where v1, . . . , vk ∈ Rn are linearly independent

(here, an (n− 1)-dimensional plane in Rn is called a hyperplane). In particular, any 2-dimensional

plane containing a has the form {a+ su+ tv : s, t ∈ R}, where u, v ∈ Rn are linearly independent.


(ii) If y ∈ Rn \ {0}, then the hyperplane H passing through a and orthogonal to y is given by

H = {x ∈ Rn : ⟨x − a, y⟩ = 0}. Letting r = ⟨a, y⟩, we may also write H = {x ∈ Rn : ⟨x, y⟩ = r}.

For any b ∈ Rn, the equation of the line passing through b and orthogonal to H is {b+ ty : t ∈ R}.

If c ∈ H is the point where this line intersects H, then c = b + t0y and ⟨b + t0y − a, y⟩ = 0, so that t0 = ⟨a − b, y⟩/∥y∥^2. If ∥y∥ = 1, then we get c = b + ⟨a − b, y⟩y, and dist(b,H) = ∥b − c∥ =

∥⟨b− a, y⟩y∥ = |⟨b− a, y⟩|.

A little bit of geometric visualization is helpful in grasping various concepts in Multivariable

Calculus. Kindle your geometric thinking with the following:

Exercise-5: The graph of a function f : Rn → Rk is the subset {(x, f(x)) : x ∈ Rn} of Rn+k.

Visualize the graphs of f1, f2, f3, f4 : R2 → R and of g1, g2, g3, g4 : R → R2 as subsets of R3, where

(i) f1(x, y) = x+ y, f2(x, y) = xy, f3(x, y) = x2 + y2, and f4(x, y) = x2 − y2.

(ii) g1(x) = (x+ 1, 0), g2(x) = (x, 2x), g3(x) = (x, x2), and g4(x) = (|x|, x2).

Definition: Any sequence in Rn converging to the origin (0, . . . , 0) ∈ Rn will be called a null

sequence. In particular, a null sequence in R means a sequence converging to 0.

Remark: f : Rn → Rm need not be continuous even if it is continuous in each variable separately .

(i) Let f : R2 → R be f(0, 0) = 0 and f(x, y) = 2xy/(x^2 + y^2) for (x, y) ̸= (0, 0). Then f is continuous in each variable separately when the other variable is fixed. But f(ak, ak) = 1 ̸→ 0 = f(0, 0) if (ak) is a null sequence in R \ {0}.

(ii) For a more striking example, consider f : R2 → R given by f(0, y) = 0 and f(x, y) = xy^2/(x^2 + y^4) for x ̸= 0. Then f is continuous in each variable separately. Moreover, if c ∈ R and (xk) is a null sequence in R \ {0}, then f(xk, cxk) = c^2 xk/(1 + c^4 xk^2) → 0 = f(0, 0), which means f(xk, yk) → f(0, 0) whenever (xk, yk) → (0, 0) along a straight line. In spite of this, f is not continuous at (0, 0); for any null sequence (ak) in R \ {0}, we have f(ak^2, ak) = 1/2 ̸→ 0 = f(0, 0). When we study differentiability, we will see that a function f : Rn → Rm may fail to be differentiable even if it is differentiable along all linear directions.
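
The behaviour described in (ii) is easy to see numerically. The following Python sketch (an added illustration, not from the original notes) evaluates f(x, y) = xy^2/(x^2 + y^4) along the line y = x and along the parabola x = y^2 as the points approach the origin: the first column of values tends to 0, while the second is constantly 1/2.

def f(x, y):
    # the function from Remark (ii): f(0, y) = 0 and f(x, y) = x*y^2/(x^2 + y^4) otherwise
    return 0.0 if x == 0 else x*y**2/(x**2 + y**4)

for k in range(1, 6):
    t = 10.0**(-k)
    # first value: approach along the line y = x; second: along the parabola x = y^2
    print(f(t, t), f(t**2, t))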

Remark: Some care must be taken while considering limits in Rn. Let U ⊂ R2 be open, f : U → R

be a function and (a, b) ∈ U . The three expressions ‘lim(x,y)→(a,b) f(x, y)’, ‘limx→a limy→b f(x, y)’,

‘limy→b limx→a f(x, y)’ mean three different things. If f is continuous in a neighborhood of the

point (a, b), then the three expressions give the same value, namely f(a, b) (check). If f is not

continuous, then some of the limits may not exist, and even if they exist, they may not be equal.

(i) Let f : R2 → R be f(0, 0) = 0 and f(x, y) = |x|/(|x| + |y|) for (x, y) ̸= (0, 0). Then we have

limx→0 (limy→0 f(x, y)) = 1 ̸= 0 = limy→0 (limx→0 f(x, y)), and (hence) lim(x,y)→(0,0) f(x, y) does

not exist (for the last assertion, we may also note that f(1/k, 0) = 1 and f(0, 1/k) = 0).


(ii) Let f : R2 → R be f(x, y) = g(y)x + g(x)y, where g(x) = 1 for x ≥ 0 and g(x) = −1

for x < 0. Then |f(x, y)| ≤ |x| + |y| and hence by Exercise-2, f is continuous at (0, 0) with

lim(x,y)→(0,0) f(x, y) = 0 = f(0, 0). If x ∈ (0, 1), then f(x, 1/k) = x + 1/k and f(x,−1/k) =

−x − 1/k. Hence limy→0 f(x, y) does not exist. A similar observation holds with x and y inter-

changed. Thus the two iterated limits limx→0 limy→0 f(x, y) and limy→0 limx→0 f(x, y) do not exist.

Moreover, it follows that f fails to be continuous in every neighborhood of (0, 0).

Remark: Often, we will denote the Euclidean norm ∥ · ∥2 on Rn simply as ∥ · ∥ when no other norm

is being considered. Similarly, the notation ⟨·, ·⟩ will mean the standard inner product.

2. Multivariable differentiation: definitions

General tip: Keep track of the dimension: throughout this course, while considering elements of

the Euclidean space and functions between Euclidean spaces, make a mental note of the dimension

of the relevant Euclidean space(s), i.e., observe clearly whether the space is R or Rn or Rm, etc.

This will help you to reduce notational as well as conceptual errors.

Definition: Let U ⊂ Rn be open, f : U → Rm be a function, and a ∈ U .

(i) If v ∈ Rn, then the directional derivative of f at a in the direction of v is defined as f′(a; v) = lim_{t→0} [f(a + tv) − f(a)]/t ∈ Rm, if the limit exists (this makes sense even if v = 0 ∈ Rn). Note that if f = (f1, . . . , fm), then f′(a; v) = (f′1(a; v), . . . , f′m(a; v)); moreover, if ε > 0 is chosen with a + tv ∈ U for every t ∈ (−ε, ε), and gj : (−ε, ε) → R is defined as gj(t) = fj(a + tv), then f′j(a; v) = g′j(0), if the derivative exists.

(ii) The directional derivatives f′(a; e1), . . . , f′(a; en) in the direction of the standard basis vectors e1, . . . , en ∈ Rn are called the partial derivatives of f at a. We will write f′(a; ej) as ∂f/∂xj(a), and call it the partial derivative of f at a with respect to xj, or the jth partial derivative of f at a. Note that ∂f/∂x1(a), if it exists, is the derivative of the one-variable function x ↦ f(x, a2, . . . , an) at x = a1, and similarly for the other partial derivatives. Thus the jth partial derivative of f measures the rate of change of f with respect to the jth variable, when the other variables are kept fixed.

Remark: To determine ∂f/∂xj(a), often the following method is used: formally differentiate f with respect to xj and substitute a in the resulting expression. This works provided the partial derivative exists in a neighborhood of a and is continuous at a. If we cannot see the continuity of the partial derivative at a in advance, then the existence or value of ∂f/∂xj(a) should be determined by directly studying the limit lim_{t→0} [f(a + tej) − f(a)]/t.


Example: (i) Let f : R2 → R be f(x, y) = x^2 y. Then for a = (a1, a2), we have ∂f/∂x(a) = 2a1a2 and ∂f/∂y(a) = a1^2. Moreover, if v = (3, 5), then f′(a; v) = lim_{t→0} [f(a1 + 3t, a2 + 5t) − f(a1, a2)]/t = lim_{t→0} [(a1 + 3t)^2(a2 + 5t) − a1^2 a2]/t = 6a1a2 + 5a1^2 = 3 ∂f/∂x(a) + 5 ∂f/∂y(a). (A small numerical check of this computation appears after part (iv) below.)

(ii) The existence of partial derivatives does not ensure the existence of directional derivatives. Let f : R2 → R be f(x, y) = x if 0 ≤ x ≤ y, f(x, y) = y if 0 ≤ y ≤ x, and f(x, y) = 0 otherwise. Check that f is continuous. For a = (0, 0), we have ∂f/∂x(a) = f′(a; e1) = 0 and ∂f/∂y(a) = f′(a; e2) = 0. But if v = (1, 1), then lim_{t→0+} [f(a + tv) − f(a)]/t = lim_{t→0+} [f(t, t) − 0]/t = lim_{t→0+} t/t = 1 ̸= 0 = lim_{t→0−} [f(a + tv) − f(a)]/t, and hence the directional derivative f′(a; v) does not exist (geometrically, the graph of f along the line x = y has a sharp corner at (0, 0)).

(iii) The existence of all directional derivatives does not imply continuity of the function. Let f : R2 → R be f(0, 0) = 0 and f(x, y) = xy^2/(x^2 + y^4) for (x, y) ̸= (0, 0). We saw earlier that f is not continuous at (0, 0) because f(1/n^2, 1/n) = 1/2 ̸→ 0. But if a = (0, 0) and v = (v1, v2), then f′(a; v) = 0 if v1 = 0 and f′(a; v) = v2^2/v1 if v1 ̸= 0; thus all directional derivatives exist at a = (0, 0). Note that the map v ↦ f′(a; v) from R2 to R is not linear in this example.

(iv) Let f : R2 → R be f(0, 0) = 0 and f(x, y) = xy^3/(x^2 + y^6) for (x, y) ̸= (0, 0) =: a. Consider v = (v1, v2) ̸= (0, 0). Note that [f(a + tv) − f(a)]/t = t v1 v2^3/(v1^2 + t^4 v2^6). If v1 = 0 and v2 ̸= 0, then [f(a + tv) − f(a)]/t = 0, and hence f′(a; v) = 0. If v1 ̸= 0, then f′(a; v) = lim_{t→0} t v1 v2^3/(v1^2 + t^4 v2^6) = 0. Thus all directional derivatives exist at a, and the map v ↦ f′(a; v) from R2 to R is the zero map, which is linear. However, f(1/n^3, 1/n) = 1/2 ̸→ 0 = f(a), and therefore f is not continuous at a.
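
As a quick sanity check on the computation in Example (i) above, here is a Python sketch (an added illustration; the base point a = (1, 2) and the step size are arbitrary choices): a symmetric difference quotient approximates f′(a; v) for f(x, y) = x^2 y and v = (3, 5), and is compared with 3 ∂f/∂x(a) + 5 ∂f/∂y(a) = 6a1a2 + 5a1^2.

import numpy as np

def f(p):
    x, y = p
    return x**2 * y

a = np.array([1.0, 2.0])   # arbitrary base point
v = np.array([3.0, 5.0])
t = 1e-6

numerical = (f(a + t*v) - f(a - t*v)) / (2*t)   # difference quotient for f'(a; v)
exact = 6*a[0]*a[1] + 5*a[0]**2                 # formula from Example (i)
print(numerical, exact)                          # both approximately 17.0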

Discussion: In the case of one dimension, we think of the derivative as the ‘rate of change’; moreover, if f is differentiable at a, then lim_{x→a} [f(x) − f(a)]/(x − a) = f′(a), which is equivalent to saying that lim_{x→a} [f(x) − f(a) − L(x − a)]/(x − a) = 0, where L : R → R is the linear map y ↦ f′(a)y. To define differentiability in higher dimensions, we need to consider the rate of change in different directions. The whole information about the rate of change along various directions is encoded in the map v ↦ f′(a; v). It is nice if this map is a linear map, say L. But this map being linear is not sufficient to guarantee the continuity of f at a (as noted in the previous example). For differentiability to imply continuity, we should also demand that the limit lim_{t→0} [f(a + tv) − f(a)]/t exists uniformly for all unit vectors v. This is ensured by demanding that lim_{x→a} ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ = 0.

[102] Let U ⊂ Rn be open, f : U → Rm, and a ∈ U . Then the following are equivalent:


(i) There is a linear map L : Rn → Rm such that lim_{x→a} ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ = 0.

(ii) All directional derivatives of f exist at a, the map v ↦ f′(a; v) from Rn to Rm is linear, and for every ε > 0, there is δ > 0 such that ∥[f(a + tv) − f(a)]/t − f′(a; v)∥ < ε for every t ∈ (−δ, δ) \ {0} and every v ∈ Rn with ∥v∥ = 1.

Moreover, if (i) and (ii) hold, then f′(a; v) = L(v) for every v ∈ Rn.

Proof. Let S = {v ∈ Rn : ∥v∥ = 1}.

(i) ⇒ (ii): Let ε > 0. Choose δ > 0 such that ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ < ε whenever 0 < ∥x − a∥ < δ. Then for every t ∈ (−δ, δ) \ {0} and every v ∈ S, putting x = a + tv and noting tL(v) = L(tv), we get that ∥[f(a + tv) − f(a)]/t − L(v)∥ = ∥f(a + tv) − f(a) − L(tv)∥/|t| = ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ < ε. This means (ii) holds with f′(a; v) = L(v).

(ii) ⇒ (i): Let L : Rn → Rm be L(v) = f′(a; v), which is a linear map by (ii). Given ε > 0, choose δ > 0 such that ∥[f(a + tv) − f(a)]/t − f′(a; v)∥ < ε for every t ∈ (−δ, δ) \ {0} and every v ∈ S. Then for any x ∈ U with 0 < ∥x − a∥ < δ, putting t = ∥x − a∥, v = (x − a)/∥x − a∥ and noting f′(a; tv) = tf′(a; v), we get that ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ = ∥f(a + tv) − f(a) − f′(a; tv)∥/|t| = ∥[f(a + tv) − f(a)]/t − f′(a; v)∥ < ε. This establishes (i). □

Definition: Let U ⊂ Rn be open, f : U → Rm be a function, and a ∈ U . We say f is differentiable

at a if there is a linear map L : Rn → Rm such that lim_{x→a} ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ = 0. If this

condition holds, then the linear map L must be unique (because we must have L(v) = f ′(a; v) for

every v ∈ Rn by [102]), and L is called the total derivative or differential of f at a, and we denote

L either as f ′(a; ·) or as Df(a; ·) (so that L(v) = f ′(a; v) = Df(a; v) for every v ∈ Rn). Other

notations for L found in textbooks are Dfa, Df(a), Daf , df(a), dfa, etc. We say f is differentiable

in U if f is differentiable at every a ∈ U .

Example: (i) If f : Rn → Rm is a constant map f ≡ c, then clearly f is differentiable with

f′(a; ·) ≡ 0 (the zero map from Rn to Rm) for every a ∈ Rn. (ii) Let L : Rn → Rm be linear,

y ∈ Rm, and f : Rn → Rm be the affine map f(x) = L(x) + y. Then for each a ∈ Rn, we have

f(x)− f(a)−L(x−a) = 0 ∈ Rm and consequently, f is differentiable at a, and the differential of f

at a is L, i.e., f ′(a; v) = L(v) for every v ∈ Rn. In particular (if y = 0), the differential of a linear

map L : Rn → Rm at any a ∈ Rn is the map L itself.

Remark: We emphasize that the total derivative at a ∈ Rn of a differentiable function f : Rn → Rm

is a linear map L : Rn → Rm and not a real number. In Mathematics, a transition from a


lower dimensional theory to a higher dimensional theory often demands such modifications in one’s

perspective. For instance, a real polynomial f : R → R of degree n ≥ 1 has at most n zeroes,

but a polynomial f : R2 → R in two variables of degree n ≥ 1 can have uncountably many zeroes

(example: f(x, y) = xy); so the correct perspective in higher dimension is the ‘dimension’ of the

zero-set and not the number of zeroes.

Definition: Let U ⊂ Rn be open and a ∈ U. (i) If f : U → R is a function such that the partial derivative ∂f/∂xj(a) exists for every j ∈ {1, . . . , n}, then the gradient vector ∇f(a) ∈ Rn of f at a is defined as ∇f(a) = (∂f/∂x1(a), . . . , ∂f/∂xn(a)). For example, if f : R3 → R is f(x, y, z) = 2x^3y − yz^4, then ∇f(x, y, z) = (6x^2y, 2x^3 − z^4, −4yz^3).

(ii) If f = (f1, . . . , fm) : U → Rm is a function such that the partial derivative ∂fi/∂xj(a) exists for every i ∈ {1, . . . , m} and every j ∈ {1, . . . , n}, then the Jacobian matrix Jf(a) of f at a is defined as the m × n matrix whose ijth entry is ∂fi/∂xj(a). Note that the ith row of Jf(a) is ∇fi(a).
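
To illustrate the definition, here is a Python sketch (an added illustration, not part of the notes; the point a and the step size h are arbitrary) approximating ∇f(a) for f(x, y, z) = 2x^3y − yz^4 by central differences and comparing it with the formula (6x^2y, 2x^3 − z^4, −4yz^3) computed above.

import numpy as np

def f(p):
    x, y, z = p
    return 2*x**3*y - y*z**4

def numerical_gradient(func, a, h=1e-6):
    a = np.asarray(a, dtype=float)
    grad = np.zeros_like(a)
    for j in range(a.size):
        e = np.zeros_like(a)
        e[j] = 1.0                                       # standard basis vector e_j
        grad[j] = (func(a + h*e) - func(a - h*e)) / (2*h)
    return grad

a = np.array([1.0, -2.0, 0.5])
x, y, z = a
exact = np.array([6*x**2*y, 2*x**3 - z**4, -4*y*z**3])
print(numerical_gradient(f, a))
print(exact)   # the two vectors agree up to the finite-difference error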

Exercise-6: [Directional derivative as a linear combination of partial derivatives] Let U ⊂ Rn be

open, a ∈ U , and suppose f : U → Rm is differentiable at a. Then,

(i) f′(a; v) = ∑_{j=1}^n vj f′(a; ej) = ∑_{j=1}^n vj ∂f/∂xj(a) for every v = (v1, . . . , vn) = ∑_{j=1}^n vjej ∈ Rn.

(ii) If m = 1, then f′(a; v) = ⟨∇f(a), v⟩ for every v ∈ Rn.

(iii) If n = 1 and f = (f1, . . . , fm), then f′(a) = (f′1(a), . . . , f′m(a)).

(iv) In the general case, f′(a; v) = (⟨∇f1(a), v⟩, . . . , ⟨∇fm(a), v⟩), where f = (f1, . . . , fm).

[Hint: (i) v ↦ f′(a; v) is linear by [102]. (ii) This follows from (i).]

Remark: Let U ⊂ Rn be open, and f : U → R be differentiable at a ∈ U with ∇f(a) ̸= 0. Let S = {v ∈ Rn : ∥v∥ = 1} and w = ∇f(a)/∥∇f(a)∥. The function v ↦ f′(a; v) = ⟨∇f(a), v⟩ = ∥∇f(a)∥⟨w, v⟩ from S to R attains its maximum at v = w by the Cauchy-Schwarz inequality (or more precisely, by the fact ⟨v, w⟩ = ∥v∥∥w∥ cos θ, where θ is the angle between v and w). Thus the gradient vector ∇f(a) indicates the direction in which the directional derivative f′(a; v) (with ∥v∥ = 1) is maximum.

In the one-dimensional case, differentiability of a function is characterized in terms of the so

called Caratheodory lemma1. Since division by x−a becomes meaningless in Rn for n ≥ 2, we need

to modify Caratheodory lemma in higher dimensions. We will give various modified forms - one as

the equivalence (i) ⇔ (iv) in [103] below, and two others as the results [104] and [105].

[103] Let U ⊂ Rn be open, f = (f1, . . . , fm) : U → Rm be a function, and a ∈ U . Then the

following are equivalent: (i) f is differentiable at a.

(ii) fi is differentiable at a for every i ∈ {1, . . . ,m}.

(Footnote 1: See my notes on Real Analysis.)


(iii) The partial derivative ∂fi/∂xj(a) exists for every i ∈ {1, . . . , m} and every j ∈ {1, . . . , n}, and lim_{x→a} |fi(x) − fi(a) − ⟨∇fi(a), x − a⟩|/∥x − a∥ = 0 for each i ∈ {1, . . . , m}.

(iv) There exist a linear map L : Rn → Rm and a function F : U → Rm with limx→a F (x) = 0 =

F (a) such that f has the Caratheodory representation f(x) − f(a) = L(x − a) + ∥x − a∥F (x) for

every x ∈ U (and if this holds, then f ′(a; ·) = L).

Proof. (i) ⇔ (ii): Let L : Rn → Rm be a linear map, and write L = (L1, . . . , Lm). Since convergence in Rm is determined coordinatewise, we note that lim_{x→a} ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ = 0 iff lim_{x→a} |fi(x) − fi(a) − Li(x − a)|/∥x − a∥ = 0 for every i ∈ {1, . . . , m}.

(ii) ⇒ (iii): If Li = f′i(a; ·), then Li(v) = f′i(a; v) = ⟨∇fi(a), v⟩ by [102] and Exercise-6.

(iii) ⇒ (ii): Let Li : Rn → R be the linear map defined as Li(v) = ⟨∇fi(a), v⟩. Then we obtain lim_{x→a} |fi(x) − fi(a) − Li(x − a)|/∥x − a∥ = 0 by (iii), and hence fi is differentiable at a.

(i) ⇒ (iv): Let L be as in the definition of differentiability, and define F : U → Rm as F(a) = 0 and F(x) = [f(x) − f(a) − L(x − a)]/∥x − a∥ for x ̸= a.

(iv) ⇒ (i): If f(x) − f(a) = L(x − a) + ∥x − a∥F(x), then lim_{x→a} ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ = lim_{x→a} ∥F(x)∥ = 0. □

[104] [Caratheodory lemma for multivariable real-valued functions] Let U ⊂ Rn be open, f : U → R

be a function, and a ∈ U . Then the following are equivalent: (i) f is differentiable at a.

(ii) There is a vector-valued function F : U → Rn (called the Caratheodory function of f at a) such

that f(x)− f(a) = ⟨F (x), x− a⟩ for every x ∈ U , and F is continuous at a.

Moreover, if (i) and (ii) hold, then F (a) = ∇f(a); and the identity f(x)− f(a) = ⟨F (x), x− a⟩

will also be called the Caratheodory representation of f .

Proof. (i) ⇒ (ii): Define F : U → Rn as F(x) = ∇f(a) + [(f(x) − f(a) − ⟨∇f(a), x − a⟩)/∥x − a∥^2](x − a) for x ̸= a and F(a) = ∇f(a). Then ∥F(x) − F(a)∥ = |f(x) − f(a) − ⟨∇f(a), x − a⟩|/∥x − a∥ → 0 as x → a by (i). Also, we have ⟨F(x), x − a⟩ = ⟨∇f(a), x − a⟩ + [(f(x) − f(a) − ⟨∇f(a), x − a⟩)/∥x − a∥^2]∥x − a∥^2 = f(x) − f(a) for x ̸= a.

(ii) ⇒ (i): Let L : Rn → R be L(v) = ⟨F(a), v⟩, which is a linear map. Then for x ̸= a, using (ii) and the Cauchy-Schwarz inequality, we see that |f(x) − f(a) − L(x − a)| = |⟨F(x), x − a⟩ − ⟨F(a), x − a⟩| = |⟨F(x) − F(a), x − a⟩| ≤ ∥F(x) − F(a)∥∥x − a∥. Hence |f(x) − f(a) − L(x − a)|/∥x − a∥ ≤ ∥F(x) − F(a)∥ → 0 as x → a by the continuity of F at a. □


To generalize [104] to vector-valued functions, we need a little preparation.

Exercise-7: Let L(Rn,Rm) = {L : Rn → Rm : L is a linear map}.

(i) L(Rn,Rm) is a vector space over R with respect to pointwise addition and scalar multiplication.

(ii) ∥L∥ := sup{∥L(v)∥ : v ∈ Rn and ∥v∥ ≤ 1} for L ∈ L(Rn,Rm) defines a norm - called the

operator norm - on the vector space L(Rn,Rm).

(iii) ∥L(v)∥ ≤ ∥L∥∥v∥ for every L ∈ L(Rn,Rm) and every v ∈ Rn.

(iv) Since a norm induces a metric, L(Rn,Rm) becomes a metric space, and hence we can talk

about continuity of functions from Rn to L(Rn,Rm). Note that L(Rn,Rm) can be identified with

{all m× n real matrices}, which in turn can be identified with Rm×n. With this identification, it

follows by [101] that convergence in L(Rn,Rm) with respect to the operator norm is equivalent to

the convergence in Rm×n with respect to the Euclidean norm.

[105] [Caratheodory lemma for vector-valued functions] Let U ⊂ Rn be open, f : U → Rm be a

function, and a ∈ U . Then the following are equivalent: (i) f is differentiable at a.

(ii) There is a function Φ : U → L(Rn,Rm), Φ(x) = Lx, such that f(x) − f(a) = Lx(x − a) for

every x ∈ U , and Φ is continuous at a (which means limx→a ∥Lx − La∥ = 0 with respect to the

operator norm).

Moreover, if (i) and (ii) hold, then La = f ′(a; ·); and the identity f(x) − f(a) = Lx(x − a) will

also be called the Caratheodory representation of f .

Proof. Write f = (f1, . . . , fm). (i) ⇒ (ii): If f is differentiable at a, then each fi is differentiable at a. Hence by [104], there are functions Fi : U → Rn such that for each i ∈ {1, . . . , m}, Fi is continuous at a and fi(x) − fi(a) = ⟨Fi(x), x − a⟩ for every x ∈ U. Define Φ : U → L(Rn,Rm) as Φ(x) = Lx, where Lx : Rn → Rm is defined as Lx(v) = (⟨F1(x), v⟩, . . . , ⟨Fm(x), v⟩). Note that each Lx is indeed linear because each of its coordinates is a linear map. Since the ith coordinate of Lx(x − a) is equal to ⟨Fi(x), x − a⟩ = fi(x) − fi(a), we have f(x) − f(a) = Lx(x − a) for every x ∈ U. If ∥v∥ ≤ 1, then ∥Lx(v) − La(v)∥ ≤ (∑_{i=1}^m ∥Fi(x) − Fi(a)∥^2)^{1/2} by the Cauchy-Schwarz inequality, and hence (in the operator norm) ∥Lx − La∥ ≤ (∑_{i=1}^m ∥Fi(x) − Fi(a)∥^2)^{1/2}. Since each Fi is continuous at a, it follows that Φ is continuous at a.

(ii) ⇒ (i): Given Φ as in (ii) with Φ(x) = Lx, put L = La. Then f(x) − f(a) − L(x − a) = Lx(x − a) − La(x − a). Hence ∥f(x) − f(a) − L(x − a)∥ ≤ ∥Lx − La∥∥x − a∥ by Exercise-7(iii). So ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ ≤ ∥Lx − La∥ → 0 as x → a by the continuity of Φ at a, and thus (i) holds. □

The following sufficient condition is practically useful to check whether a multivariable function

is differentiable. For u, v ∈ Rn, let [u, v] denote the line segment joining u and v.


[106] Let U ⊂ Rn be open, f = (f1, . . . , fm) : U → Rm be a function, and a ∈ U. If all the partial derivatives ∂fi/∂xj exist and are continuous in a neighborhood of a, then f is differentiable at a.

Proof. By considering each fi separately, we may suppose f is real-valued (i.e., m = 1). Choose r > 0 such that B(a, r) ⊂ U and all the partial derivatives of f are continuous in B(a, r). Fix x ∈ B(a, r), and note that x − a = ∑_{j=1}^n (xj − aj)ej. Define vectors u0, u1, . . . , un ∈ B(a, r) as follows: u0 = a, u1 = u0 + (x1 − a1)e1, u2 = u1 + (x2 − a2)e2, . . . , un = un−1 + (xn − an)en = x. Observe that uj−1 and uj differ only in the jth coordinate. Define gj : [0, 1] → R as gj(t) = f(uj−1 + t(xj − aj)ej). Applying the one-variable Mean value theorem to gj, we may find a vector vj = uj−1 + tj(xj − aj)ej on the line segment [uj−1, uj] such that f(uj) − f(uj−1) = gj(1) − gj(0) = g′j(tj) = ∂f/∂xj(vj)(xj − aj). Define F : B(a, r) → Rn as F(x) = (∂f/∂x1(v1), . . . , ∂f/∂xn(vn)). Then we have f(x) − f(a) = ∑_{j=1}^n (f(uj) − f(uj−1)) = ∑_{j=1}^n ∂f/∂xj(vj)(xj − aj) = ⟨F(x), x − a⟩ for every x ∈ B(a, r). Moreover, F(x) → ∇f(a) = F(a) as x → a by the continuity of the partial derivatives of f. Hence f is differentiable at a by [104]. □

3. Multivariable differentiation: properties

[107] [Basic properties] Let U ⊂ Rn be open, and a ∈ U .

(i) Suppose f : U → Rm is differentiable at a. Then f is locally Lipschitz at a in the sense that

there are λ, δ > 0 with B(a, δ) ⊂ U and ∥f(x) − f(a)∥ ≤ λ∥x − a∥ for every x ∈ B(a, δ) (this is

weaker than saying f is Lipschitz continuous in B(a, δ)). In particular, f is continuous at a.

(ii) If f = (f1, . . . , fm) : U → Rm is differentiable at a, then the matrix of the linear map f′(a; ·) : Rn → Rm is the m × n (not n × m) Jacobian matrix Jf(a) whose ijth entry is ∂fi/∂xj(a).

(iii) [Linearity] If f, g : U → Rm are differentiable at a and c1, c2 ∈ R, then c1f + c2g is differentiable at a and (c1f + c2g)′(a; ·) = c1f′(a; ·) + c2g′(a; ·).

(iv) [Product rule] Let f, g : U → R be differentiable at a. Then their product fg is differentiable at a, and (fg)′(a; ·) = f(a)g′(a; ·) + g(a)f′(a; ·). Consequently (see Footnote 2 below), ∇(fg)(a) = f(a)∇g(a) + g(a)∇f(a).

(v) [Quotient rule] Let f, g : U → R be differentiable at a, and suppose g is non-vanishing in U (or in a neighborhood of a). Then f/g is differentiable at a, and (f/g)′(a; ·) = [g(a)f′(a; ·) − f(a)g′(a; ·)]/g(a)^2. Consequently, ∇(f/g)(a) = [g(a)∇f(a) − f(a)∇g(a)]/g(a)^2.

Proof. (i) By [103](iv), f has a Caratheodory representation f(x)−f(a) = L(x−a)+∥x−a∥F (x) for

x ∈ U , where L : Rn → Rm is linear and limx→a F (x) = 0 = F (a). Since L is linear, there is M > 0

such that ∥L(x)− L(y)∥ ≤M∥x− y∥ for every x, y ∈ Rn. Since limx→a ∥F (x)∥ = 0, there is δ > 0

(Footnote 2: The existence of ∇(fg)(a) by itself does not imply the differentiability of fg at a.)


such that B(a, δ) ⊂ U and ∥F (x)∥ ≤ 1 for every x ∈ B(a, δ). Then ∥f(x)−f(a)∥ ≤ (M +1)∥x−a∥

for every x ∈ B(a, δ).

(ii) The jth column of the matrix of the linear map f′(a; ·) is specified by the vector f′(a; ej) = ∂f/∂xj(a) = (∂f1/∂xj(a), . . . , ∂fm/∂xj(a)).

(iii) If f(x) − f(a) = L1(x − a) + ∥x − a∥F(x) and g(x) − g(a) = L2(x − a) + ∥x − a∥G(x) are the Caratheodory representations, as in [103](iv), of f and g respectively, then (c1f + c2g)(x) − (c1f + c2g)(a) = (c1L1 + c2L2)(x − a) + ∥x − a∥(c1F + c2G)(x). Again use [103].

(iv) Let L1 = f ′(a; ·), L2 = g′(a; ·), and put L = f(a)L2 + g(a)L1. By adding and subtracting the

quantity f(x)g(a)− f(x)L2(x− a), we may note that ∥f(x)g(x)− f(a)g(a)− L(x− a)∥ ≤

∥f(x)∥∥g(x)− g(a)− L2(x− a)∥+ ∥f(x)− f(a)∥∥L2(x− a)∥+ ∥g(a)∥∥f(x)− f(a)− L1(x− a)∥.

Hence, lim_{x→a} ∥f(x)g(x) − f(a)g(a) − L(x − a)∥/∥x − a∥ = 0 by hypothesis, by part (i), and by the fact that lim_{x→a} L2(x − a) = 0.

We can give another (easier) proof using [104] as follows. Let f(x) − f(a) = ⟨F (x), x − a⟩ and

g(x) − g(a) = ⟨G(x), x − a⟩ be the Caratheodory representations of f and g given by [104]. Then

f(x)g(x)−f(a)g(a) = f(x)g(x)−f(x)g(a)+f(x)g(a)−f(a)g(a) = f(x)(g(x)−g(a))+g(a)(f(x)−

f(a)) = f(x)⟨G(x), x−a⟩+g(a)⟨F (x), x−a⟩ = ⟨H(x), x−a⟩, where H(x) := f(x)G(x)+g(a)F (x).

Then H is continuous at a since f,G, F are continuous at a. Hence by [104], fg is differentiable at

a and ∇(fg)(a) = H(a) = f(a)G(a) + g(a)F (a) = f(a)∇g(a) + g(a)∇f(a).

(v) Let f(x) − f(a) = ⟨F(x), x − a⟩ and g(x) − g(a) = ⟨G(x), x − a⟩ be the Caratheodory representations of f and g given by [104]. Then (f/g)(x) − (f/g)(a) = [f(x)g(a) − f(a)g(x)]/[g(x)g(a)]. But f(x)g(a) − f(a)g(x) = (f(x) − f(a))g(a) − f(a)(g(x) − g(a)) = g(a)⟨F(x), x − a⟩ − f(a)⟨G(x), x − a⟩. Hence (f/g)(x) − (f/g)(a) = ⟨H(x), x − a⟩, where H(x) := [g(a)F(x) − f(a)G(x)]/[g(x)g(a)]. Clearly, H is continuous at a. Therefore f/g is differentiable at a and ∇(f/g)(a) = H(a) = [g(a)∇f(a) − f(a)∇g(a)]/g(a)^2. □
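
The product rule [107](iv) can be verified numerically as well. The following Python sketch (an added illustration; the functions f, g and the point a are arbitrary choices) compares a finite-difference approximation of ∇(fg)(a) with f(a)∇g(a) + g(a)∇f(a).

import numpy as np

def num_grad(func, a, h=1e-6):
    a = np.asarray(a, dtype=float)
    g = np.zeros_like(a)
    for j in range(a.size):
        e = np.zeros_like(a)
        e[j] = 1.0
        g[j] = (func(a + h*e) - func(a - h*e)) / (2*h)   # central difference
    return g

f = lambda p: p[0]**2 + 3*p[1]        # two illustrative differentiable functions
g = lambda p: np.sin(p[0]) * p[1]

a = np.array([0.8, -1.2])
lhs = num_grad(lambda p: f(p)*g(p), a)            # approximates ∇(fg)(a)
rhs = f(a)*num_grad(g, a) + g(a)*num_grad(f, a)   # f(a)∇g(a) + g(a)∇f(a)
print(lhs, rhs)   # the two sides agree up to the finite-difference error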

Exercise-8: Let f : Rn → R be a polynomial in n variables, i.e., f has the form f(x1, . . . , xn) = ∑_{k=0}^p ck x1^{q(1,k)} · · · xn^{q(n,k)}, where p ∈ N ∪ {0}, ck ∈ R, and q(j, k) ∈ N ∪ {0}. Then,

(i) f is differentiable in Rn.

(ii) ∇f(a) = (∂f/∂x1(a), . . . , ∂f/∂xn(a)), and ∂f/∂xj(a) is obtained by formally differentiating f with respect to xj, keeping the other variables fixed.

[Hint : (i) For 1 ≤ j ≤ n, let πj : Rn → R be the projection x 7→ xj , which is linear and hence

differentiable. Let π0 : Rn → R be the constant map π0 ≡ 1, which is also differentiable. Now note

that f is a linear combination of products of πj ’s for 0 ≤ j ≤ n, and use [107](iii) and [107](iv).]


Remark: Identify M(n,R) := {all n× n real matrices} with Rn×n. Then A 7→ det(A) and A 7→

trace(A) from Rn×n to R are polynomials in n×n variables, and hence differentiable by Exercise-8.

[108] (i) [Chain rule] Let U ⊂ Rn and V ⊂ Rm be open, f : U → V and g : V → Rk be functions,

and let a ∈ U . If f is differentiable at a, and g is differentiable at f(a), then g ◦ f : U → Rk is

differentiable at a, and (g ◦ f)′(a; ·) = g′(f(a); ·) ◦ f ′(a; ·) = g′(f(a); f ′(a; ·)) (in terms of Jacobian

matrices, this means that Jg◦f (a) = Jg(f(a))Jf (a)).

(ii) Suppose k = 1 and f = (f1, . . . , fm) in (i). Then we have that (g ◦ f)′(a; v) = g′(f(a); f′(a; v)) = ⟨∇g(f(a)), f′(a; v)⟩ = ∑_{i=1}^m ∂g/∂yi(f(a)) ⟨∇fi(a), v⟩ for every v ∈ Rn. So ∇(g ◦ f)(a) = ∑_{i=1}^m ∂g/∂yi(f(a)) ∇fi(a). Moreover, ∂(g ◦ f)/∂xj(a) = ∑_{i=1}^m ∂g/∂yi(f(a)) ∂fi/∂xj(a) = ⟨∇g(f(a)), ∂f/∂xj(a)⟩ for 1 ≤ j ≤ n.

(iii) If n = k = 1 in (i), then (g ◦ f)′(a) = ⟨∇g(f(a)), f′(a)⟩.

(iv) If m = k = 1 in (i), then (g ◦ f)′(a; v) = g′(f(a))f′(a; v) = g′(f(a))⟨∇f(a), v⟩ for every v ∈ Rn.

(v) If n = m = 1 in (i), then (g ◦ f)′(a) = g′(f(a))f′(a) = (g′1(f(a))f′(a), . . . , g′k(f(a))f′(a)), where g = (g1, . . . , gk).

Proof. (i) By [103], f and g have Caratheodory representations f(x)−f(a) = L1(x−a)+∥x−a∥F (x)

and g(y) − g(f(a)) = L2(y − f(a)) + ∥y − f(a)∥G(y), where L1 = f ′(a; ·) : Rn → Rm and L2 =

g′(f(a); ·) : Rm → Rk are linear, limx→a F (x) = 0 ∈ Rm and limy→f(a)G(y) = 0 ∈ Rk. Let

L = L2 ◦ L1 : Rn → Rk, which is again a linear map. We have g(f(x))− g(f(a))

= L2(f(x)− f(a))+ ∥f(x)− f(a)∥G(f(x)) = L2(L1(x− a)+ ∥x− a∥F (x))+ ∥f(x)− f(a)∥G(f(x))

= L(x − a) + ∥x − a∥H(x), where H(x) := L2(F(x)) + [∥f(x) − f(a)∥/∥x − a∥] G(f(x)) for x ̸= a and H(a) := 0 ∈ Rk. Since lim_{x→a} ∥H(x)∥ = 0 by [107](i) and by the properties of F, G, and L2, it follows that g ◦ f has the Caratheodory representation g(f(x)) − g(f(a)) = L(x − a) + ∥x − a∥H(x)

for x ∈ U . Hence by [103], g ◦ f is differentiable at a with (g ◦ f)′(a; ·) = L = L2 ◦ L1.

Another proof : By Caratheodory lemma [105], there are functions Φ1 : U → L(Rn,Rm) and

Φ2 : V → L(Rm,Rk), Φ1(x) = L1,x and Φ2(y) = L2,y, such that Φ1 and Φ2 are continuous at

a, f(a) respectively, f(x)− f(a) = L1,x(x− a) for every x ∈ U , and g(y)− g(f(a)) = L2,y(y− f(a))

for every y ∈ V . Let Φ : U → L(Rn,Rk) be Φ(x) = Lx := L2,f(x) ◦ L1,x. Then g(f(x))− g(f(a)) =

L2,f(x)(f(x) − f(a)) = L2,f(x)(L1,x(x − a)) = Lx(x − a). Moreover, using Exercise-7(iii), we may

observe that the operator norm ∥Lx − La∥ = ∥L2,f(x) ◦ L1,x − L2,f(a) ◦ L1,a∥ = ∥L2,f(x) ◦ (L1,x −

L1,a) + (L2,f(x) −L2,f(a)) ◦L1,a∥ ≤ ∥L2,f(x)∥∥L1,x −L1,a∥+ ∥L2,f(x) −L2,f(a)∥∥L1,a∥ → 0 as x→ a

by the continuity of f and Φ1 at a, the continuity of Φ2 at f(a), and the continuity of the operator

norm (see Exercise-2). Thus Φ is continuous at a. Therefore by [105], g ◦ f is differentiable at a,

and (g ◦ f)′(a; ·) = La = L2,f(a) ◦ L1,a = g′(f(a); ·) ◦ f ′(a; ·).


(ii) The first assertion is evident. For the last assertion, argue as follows. We have Jg◦f(a) = Jg(f(a))Jf(a) by (i). Now note that Jg(f(a)) is the 1 × m matrix whose ith entry is ∂g/∂yi(f(a)), and Jf(a) is the m × n matrix whose ijth entry is ∂fi/∂xj(a).

For (iii) and (iv), use part (ii) and Exercise-6(ii). For (v), use part (i) and Exercise-6(iii). □

Remark: (i) Suppose n = m = 2 and k = 1 in [108](i). Write f(x, y) = (u(x, y), v(x, y)) and put h = g ◦ f. Then h(x, y) = g(u(x, y), v(x, y)). So the Chain rule as expressed in [108](ii) takes the form ∂g/∂x = (∂g/∂u)(∂u/∂x) + (∂g/∂v)(∂v/∂x) and ∂g/∂y = (∂g/∂u)(∂u/∂y) + (∂g/∂v)(∂v/∂y), an expression usually found in Calculus

textbooks. (ii) Two examples illustrating the Chain rule are given below. However, in practical

problems, it is often easier to compute (g ◦ f)′(a; ·) directly after determining g ◦ f . The Chain rule

is mostly useful in theoretical proofs.

Example: In the two examples below, the functions are differentiable by [106].

(i) [n = k = 1 and m = 2] Let f : R → R2 be f(t) = (t, t^2) and g : R2 → R be g(x, y) = x^3 + y^4. Then Jg◦f(t) = Jg(f(t))Jf(t) = Jg(t, t^2)Jf(t) = [3t^2  4t^6] [1; 2t] = [3t^2 + 8t^7]. Hence (g ◦ f)′(t) = 3t^2 + 8t^7. Direct computation is easier: (g ◦ f)(t) = t^3 + t^8 and hence (g ◦ f)′(t) = 3t^2 + 8t^7.

(ii) [n = m = 2 and k = 1] Let f : R2 → R2 be f(r, θ) = (r cos θ, r sin θ), and g : R2 → R be g(x, y) = x^2 + y^2. Then Jg◦f(r, θ) = Jg(f(r, θ))Jf(r, θ) = Jg(r cos θ, r sin θ)Jf(r, θ) = [2r cos θ  2r sin θ] [cos θ  −r sin θ; sin θ  r cos θ] = [2r  0]. Hence ∂(g ◦ f)/∂r(r, θ) = 2r and ∂(g ◦ f)/∂θ(r, θ) = 0. Consequently, (g ◦ f)′((r, θ); (v1, v2)) = 2rv1. Again, a direct computation is easier: (g ◦ f)(r, θ) = r^2 and hence ∇(g ◦ f)(r, θ) = (2r, 0).
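
The chain rule of [108](i) can also be checked numerically. The following Python sketch (an added illustration; the sample point (r, θ) = (2, 0.7) and the finite-difference Jacobians are my own choices) compares J_{g∘f}(a) with the product Jg(f(a))Jf(a) for the polar-coordinate example (ii) above.

import numpy as np

def f(p):                       # f(r, θ) = (r cos θ, r sin θ)
    r, th = p
    return np.array([r*np.cos(th), r*np.sin(th)])

def g(q):                       # g(x, y) = x^2 + y^2
    x, y = q
    return x**2 + y**2

def jacobian(func, a, h=1e-6):
    # numerical Jacobian by central differences; works for scalar- or vector-valued func
    a = np.asarray(a, dtype=float)
    cols = []
    for j in range(a.size):
        e = np.zeros_like(a)
        e[j] = 1.0
        cols.append((np.atleast_1d(func(a + h*e)) - np.atleast_1d(func(a - h*e))) / (2*h))
    return np.column_stack(cols)

a = np.array([2.0, 0.7])                     # sample point (r, θ)
lhs = jacobian(lambda p: g(f(p)), a)         # J_{g∘f}(a)
rhs = jacobian(g, f(a)) @ jacobian(f, a)     # Jg(f(a)) Jf(a)
print(lhs)
print(rhs)   # both are approximately [2r, 0] = [4, 0]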

Exercise-9: Let ϕ, ψ : Rn → R be ϕ(x) = ∥x∥2, ψ(x) = ∥x∥, and ρ : Rn×Rn → R be ρ(x, y) = ⟨x, y⟩.

(i) ϕ is differentiable in Rn, ∇ϕ(a) = 2a, and ϕ′(a; v) = 2⟨a, v⟩ for every a, v ∈ Rn.

(ii) ψ is differentiable in Rn \ {0}, ∇ψ(a) = a/∥a∥, and ψ′(a; v) = ⟨a, v⟩/∥a∥ for every a ∈ Rn \ {0} and v ∈ Rn. But ψ is not differentiable at 0 ∈ Rn.

(iii) ρ is differentiable in Rn × Rn, ∇ρ(a, b) = (b, a), and ρ′((a, b); (u, v)) = ⟨b, u⟩+ ⟨a, v⟩ for every

(a, b), (u, v) ∈ Rn × Rn.

[Hint: (i) ϕ(x) = ∑_{j=1}^n xj^2. So ϕ is differentiable by Exercise-8, and ∇ϕ(a) = 2a. (ii) Let ϕ : Rn \ {0} → (0, ∞) be ϕ(x) = ∥x∥^2, and g : (0, ∞) → R be g(x) = √x. Then ϕ, g are differentiable

and ψ = g ◦ϕ on Rn \ {0}. Hence ψ is differentiable on Rn \ {0}, and ψ′(a; v) = g′(ϕ(a))⟨∇ϕ(a), v⟩.

To see ψ is not differentiable at 0 ∈ Rn, note that t 7→ |t| = ψ(t, 0, . . . , 0) is not differentiable at

0 ∈ R. (iii) ρ is differentiable by Exercise-8 since it is a multivariable polynomial.]


Exercise-10: (i) If f : Rn → Rm \ {0} is differentiable at a ∈ Rn, then h : Rn → R defined as h(x) = ∥f(x)∥ is differentiable at a, h′(a; v) = ⟨f(a), f′(a; v)⟩/∥f(a)∥ for every v ∈ Rn, and ∇h(a) = (1/∥f(a)∥) ∑_{i=1}^m fi(a)∇fi(a). If n = 1, then h′(a) = ⟨f(a), f′(a)⟩/∥f(a)∥.

(ii) Let f, g : Rn → Rm be differentiable at a, b ∈ Rn respectively. Then h : Rn × Rn → R defined as h(x, y) = ⟨f(x), g(y)⟩ is differentiable at (a, b) and h′((a, b); (u, v)) = ⟨g(b), f′(a; u)⟩ + ⟨f(a), g′(b; v)⟩ for every (u, v) ∈ Rn × Rn. Also, ∇h(a, b) = (∑_{i=1}^m gi(b)∇fi(a), ∑_{i=1}^m fi(a)∇gi(b)). If n = 1, then h′(a, b) = ⟨g(b), f′(a)⟩ + ⟨f(a), g′(b)⟩.

(iii) [Generalization of the product rule] Let f, g : Rn → Rm be differentiable at a ∈ Rn. Then h : Rn → R defined as h(x) = ⟨f(x), g(x)⟩ is differentiable at a and h′(a; v) = ⟨g(a), f′(a; v)⟩ + ⟨f(a), g′(a; v)⟩ for every v ∈ Rn. Also, ∇h(a) = ∑_{i=1}^m gi(a)∇fi(a) + ∑_{i=1}^m fi(a)∇gi(a). If f = g, then h(x) = ∥f(x)∥^2, h′(a; v) = 2⟨f(a), f′(a; v)⟩ for every v ∈ Rn, and ∇h(a) = 2∑_{i=1}^m fi(a)∇fi(a).

[Hint: Let ϕ, ψ, ρ be as in the previous exercise. (i) We have h = ψ ◦ f, where ψ(y) = ∥y∥. Use Exercise-9(ii) after noting that h′(a; v) = ψ′(f(a); f′(a; v)) by the Chain rule. (iii) We have h = ρ ◦ (f, g), where ρ : Rm × Rm → R is ρ(x, y) = ⟨x, y⟩. Use Exercise-9(iii) after noting that h′(a; v) = ρ′((f(a), g(a)); (f′(a; v), g′(a; v))).]

[109] Let U ⊂ Rn be open, and a, b ∈ U be distinct vectors such that the line segment [a, b] ⊂ U .

(i) [Mean value theorem for real valued functions] If f : U → R is differentiable, then there is

c ∈ (a, b) such that f(b)− f(a) = f ′(c; b− a) = ⟨∇f(c), b− a⟩.

(ii) [Mean value theorem for vector-valued functions] If f : U → Rm is differentiable, then for each

z ∈ Rm, there is c ∈ (a, b) (where c depends on z) such that ⟨f(b)− f(a), z⟩ = ⟨f ′(c; b− a), z⟩.

(iii) [Mean value inequalities for multivariable functions] Let f = (f1, . . . , fm) : U → Rm be

differentiable. Let M1 = sup{∥f ′(c; ·)∥ : c ∈ [a, b]}, where ∥f ′(c; ·)∥ is the operator norm of the

linear map f′(c; ·), and M2 = sup{∑_{i=1}^m ∥∇fi(c)∥ : c ∈ [a, b]}. Then ∥f(b) − f(a)∥ ≤ M1∥b − a∥ and ∥f(b) − f(a)∥ ≤ M2∥b − a∥.

Proof. (i) Let g : [0, 1] → R be g(t) = f(a+ t(b− a)), which is differentiable, being the composition

of two differentiable functions. Applying the one-variable Mean value theorem to g, we may find

t0 ∈ (0, 1) with f(b)− f(a) = g(1)− g(0) = g′(t0)(1− 0) = f ′(c; b− a), where c := a+ t0(b− a).

(ii) Fix z ∈ Rm and define g : [0, 1] → R as g(t) = ⟨f(a+ t(b−a)), z⟩, which is differentiable since f

and the maps t 7→ a+ t(b− a), y 7→ ⟨y, z⟩ are differentiable. Applying the one-variable Mean value

theorem to g, we may find t0 ∈ (0, 1) with ⟨f(b) − f(a), z⟩ = g(1) − g(0) = g′(t0)(1 − 0) = g′(t0).

Let c = a+ t0(b− a). By Chain rule and the linearity of the inner product in the first variable, we

may observe that g′(t0) = ⟨f′(c; b − a), z⟩ (or directly calculate the limit lim_{t→t0} [g(t) − g(t0)]/(t − t0)).


(iii) Taking z = f(b)−f(a) in (ii) and applying Cauchy-Schwarz inequality, we get ∥f(b)−f(a)∥2 =

|⟨f(b)−f(a), f(b)−f(a)⟩| = |⟨f ′(c; b−a), f(b)−f(a)⟩| ≤ ∥f ′(c; b−a)∥∥f(b)−f(a)∥ for some c ∈ (a, b),

and hence ∥f(b) − f(a)∥ ≤ ∥f ′(c; b − a)∥. Since ∥f ′(c; b − a)∥ ≤ ∥f ′(c; ·)∥∥b − a∥, it follows that

∥f(b) − f(a)∥ ≤ M1∥b − a∥. Moreover, as f′(c; b − a) = ∑_{i=1}^m ⟨∇fi(c), b − a⟩ei, another application of the Cauchy-Schwarz inequality yields ∥f′(c; b − a)∥ ≤ ∑_{i=1}^m |⟨∇fi(c), b − a⟩| ≤ ∑_{i=1}^m ∥∇fi(c)∥∥b − a∥. This implies ∥f(b) − f(a)∥ ≤ M2∥b − a∥. □

Example: Let U ⊂ Rn be open, f : U → Rm be differentiable, and [a, b] ⊂ U. Then there may not exist c ∈ (a, b) with f(b) − f(a) = f′(c; b − a). Consider f : R2 → R2 defined as f(x, y) = (x^2, y^3). Let a = (0, 0) and b = (1, 1). Since b − a = (1, 1) and Jf(c1, c2) = [2c1  0; 0  3c2^2] for any c = (c1, c2) ∈ R2, we get f′(c; b − a) = (2c1, 3c2^2). If f′(c; b − a) = f(b) − f(a) = (1, 1), then c1 ̸= c2 and hence c /∈ [a, b].

Exercise-11: Let U ⊂ Rn be open and connected.

(i) If a, b ∈ U , then there is a polygonal path in U from a to b (polygonal path means a continuous

path consisting of finitely many line segments).

(ii) If f : U → Rm is differentiable and f ′(c; ·) ≡ 0 for every c ∈ U , then f is constant.

[Hint: (i) Let Y = {y ∈ U : there is a polygonal path in U from a to y}. Then a ∈ Y. Check that Y is both open and closed in U. (ii) Writing f = (f1, . . . , fm) and considering each fi separately,

assume m = 1. If [a, b] ⊂ U , then f(a) = f(b) by Mean value theorem [109](i) (or, apply [109](ii)

directly to f with z = f(b)− f(a)). It now follows by (i) that f(a) = f(b) for every a, b ∈ U .]

Exercise-12: Let U ⊂ Rn be open, and f : U → R be differentiable. Suppose f has either a local

maximum or a local minimum at a ∈ U , i.e., there is r > 0 with B(a, r) ⊂ U such that either

f(b) ≥ f(a) for every b ∈ B(a, r) or f(b) ≤ f(a) for every b ∈ B(a, r). Then f ′(a; ·) ≡ 0. [Hint :

Assume B(a, r) ⊂ U and f(b) ≥ f(a) for every b ∈ B(a, r). Fix v ∈ Rn with ∥v∥ = 1 and define

g : (−r, r) → R as g(t) = f(a + tv). Then g has a local maximum at 0. Hence by one-variable

theory, 0 = g′(0) = f ′(a; v).]

Exercise-13: Let f : Rn → R be differentiable.

(i) If f has continuous partial derivatives, then f(x) = f(0) + ∑_{j=1}^n xj ∫_0^1 ∂f/∂xj(tx) dt for every x ∈ Rn.

(ii) Let k ∈ N. If f(tx) = t^k f(x) for every t ∈ R and x ∈ Rn, then kf(x) = ⟨∇f(x), x⟩ for every x ∈ Rn.

(iii) If f(tx) = tf(x) for every t ∈ R and x ∈ Rn, then f(x) = ⟨∇f(0), x⟩ for every x ∈ Rn.

[Hint: (i) Fix x ∈ Rn and define F : R → R as F(t) = f(tx). Then F′(t) = ⟨∇f(tx), d(tx)/dt⟩ = ∑_{j=1}^n xj ∂f/∂xj(tx). Applying the Fundamental theorem of calculus to F, we see f(x) − f(0) = F(1) − F(0) = ∫_0^1 F′(t) dt. (ii) Differentiate t^k f(x) = f(tx) with respect to t, and put t = 1. (iii) Differentiate tf(x) = f(tx) with respect to t, and put t = 0.]


4. Higher order partial derivatives

Definition and Example: Let U ⊂ Rn be open and f : U → Rm be a function for which the partial derivatives ∂f/∂xj : U → Rm exist in U. If ∂^2f/∂xi∂xj := (∂/∂xi)(∂f/∂xj) exists, then it is called a second order partial derivative of f. Repeating this process, for any k ∈ N, we may define kth order partial derivatives of f, if they exist. For example, let f : R2 → R be f(x, y) = (x^3y − xy^3)/(x^2 + y^2) if (x, y) ̸= (0, 0) and f(0, 0) = 0. Then ∂f/∂x(x, y) = (x^4y + 4x^2y^3 − y^5)/(x^2 + y^2)^2 for (x, y) ̸= (0, 0) and ∂f/∂x(0, 0) = 0. Since ∂f/∂x(0, y) = −y, we get ∂^2f/∂y∂x(0, y) = −1 for every y ∈ R. Similarly, ∂f/∂y(x, y) = (x^5 − 4x^3y^2 − xy^4)/(x^2 + y^2)^2 for (x, y) ̸= (0, 0) and ∂f/∂y(0, 0) = 0. Since ∂f/∂y(x, 0) = x, we get ∂^2f/∂x∂y(x, 0) = 1 for every x ∈ R. Hence ∂^2f/∂x∂y(0, 0) = 1 ̸= −1 = ∂^2f/∂y∂x(0, 0).
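
The unequal mixed partial derivatives at the origin can also be seen numerically. The following Python sketch (an added illustration; the step sizes are arbitrary) uses nested central difference quotients to approximate ∂^2f/∂y∂x(0, 0) ≈ −1 and ∂^2f/∂x∂y(0, 0) ≈ +1 for the function f above.

def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else (x**3*y - x*y**3)/(x**2 + y**2)

def fx(x, y, d=1e-5):
    # partial derivative with respect to x by a central difference
    return (f(x + d, y) - f(x - d, y)) / (2*d)

def fy(x, y, d=1e-5):
    # partial derivative with respect to y by a central difference
    return (f(x, y + d) - f(x, y - d)) / (2*d)

h = 1e-3
d2_yx = (fx(0.0, h) - fx(0.0, -h)) / (2*h)   # approximates ∂²f/∂y∂x(0,0), about -1
d2_xy = (fy(h, 0.0) - fy(-h, 0.0)) / (2*h)   # approximates ∂²f/∂x∂y(0,0), about +1
print(d2_yx, d2_xy)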

[110] [Equality of mixed partial derivatives] Let U ⊂ Rn be open, f : U → Rm be a function, and 1 ≤ j, k ≤ n. If ∂^2f/∂xj∂xk and ∂^2f/∂xk∂xj exist and are continuous in U, then ∂^2f/∂xj∂xk = ∂^2f/∂xk∂xj in U.

Proof. Writing f = (f1, . . . , fm) and considering each fi separately, we may suppose m = 1. Also we may assume n = 2 since we deal with only two variables at a time. So now the function under consideration is f : U ⊂ R2 → R. Let (a, b) ∈ U. Then

∂^2f/∂y∂x(a, b) = lim_{y→b} (1/(y − b)) [∂f/∂x(a, y) − ∂f/∂x(a, b)]

= lim_{y→b} lim_{x→a} [(f(x, y) − f(a, y)) − (f(x, b) − f(a, b))]/[(y − b)(x − a)]

= lim_{y→b} lim_{x→a} [g(x, y) − g(x, b)]/[(y − b)(x − a)] (where g(x, y) := f(x, y) − f(a, y))

= lim_{y→b} lim_{x→a} [∂g/∂y(x, y0)]/(x − a) (for some y0 ∈ (b, y) by the Mean value theorem)

= lim_{y→b} lim_{x→a} [∂f/∂y(x, y0) − ∂f/∂y(a, y0)]/(x − a)

= lim_{y→b} lim_{x→a} ∂^2f/∂x∂y(x0, y0) (for some x0 ∈ (a, x) by the Mean value theorem)

= ∂^2f/∂x∂y(a, b), since ∂^2f/∂x∂y is continuous and (x0, y0) → (a, b) as (x, y) → (a, b).

Note that in this proof, we used only the continuity of ∂^2f/∂x∂y. □

Definition: Let U ⊂ Rn be open and f : U → Rm be a function. For k ∈ N, we say f is a

Ck-function (or a Ck-map) if all the kth order partial derivatives of f exist and are continuous in

U . We say f is a C∞-function (or a smooth function/map) if f is a Ck-function for every k ∈ N.


Remark: Let U ⊂ Rn be open and f : U → Rm be a function. Since we have the identification L(Rn,Rm) ∼= {all m × n real matrices} ∼= Rm×n and since the entries of the Jacobian matrix are the

(first order) partial derivatives, it follows from [106] that f is a C1-function ⇔ f is differentiable in

U and the map x 7→ f ′(x; ·) from U to L(Rn,Rm) is continuous with respect to the operator norm

on L(Rn,Rm).

The following types of results are of importance in Differential Topology:

[111] (i) Let a ∈ Rn and 0 < r < s. Then there exist smooth functions (i.e., C∞-functions)

f, g : Rn → R with the following properties: 0 ≤ f, g ≤ 1, f(x) > 0 iff ∥a − x∥ < s, f(x) = 0 iff

∥a− x∥ ≥ s, g(x) > 0 iff ∥a− x∥ > r, and g(x) = 0 iff ∥a− x∥ ≤ r.

(ii) Let a ∈ Rn and 0 < r < s. Then there is a smooth function h : Rn → R with 0 ≤ h ≤ 1 such

that h(x) = 1 for every x ∈ B(a, r), and h(x) = 0 for every x ∈ Rn \B(a, s).

(iii) Let A ⊂ U ⊂ Rn, where A is a nonempty compact set and U is open in Rn. Then there is a

smooth function h : Rn → R with 0 ≤ h ≤ 1 such that h(x) = 1 for every x ∈ A, and h(x) = 0 for

every x ∈ Rn \ U .

Proof. (i) We know from Real Analysis that there is a smooth function ψ : R → R such that

0 ≤ ψ ≤ 1, ψ(x) = 0 for x ≤ 0, and ψ(x) > 0 for x > 0. For example, we may take ψ to be the

function ψ(x) = 0 if x ≤ 0 and ψ(x) = e^{−1/x} if x > 0. Note that the function y ↦ ∥y∥^2 from Rn to R is a multivariable polynomial and hence smooth. Therefore, the functions f, g : Rn → R defined as f(x) = ψ(s^2 − ∥a − x∥^2) and g(x) = ψ(∥a − x∥^2 − r^2) are smooth. Verify the required properties.

(ii) Let f, g be as in the above proof. Note that f(x)+g(x) > 0 for every x ∈ Rn. Define h : Rn → R

as h(x) = f(x)/(f(x) + g(x)). Check that this works.

(iii) For each a ∈ A, there is ra > 0 with B(a, 2ra) ⊂ U. As {B(a, ra) : a ∈ A} is an open cover of the compact set A, there are finitely many vectors a1, . . . , ap ∈ A with A ⊂ ∪_{j=1}^p B(aj, raj). First proof: By (ii), there are smooth functions h1, . . . , hp : Rn → R such that 0 ≤ hj ≤ 1, hj(x) = 1 for every x ∈ B(aj, raj), and hj(x) = 0 for every x ∈ Rn \ B(aj, 2raj), 1 ≤ j ≤ p. Let h : Rn → R be h(x) = 1 − ∏_{j=1}^p (1 − hj(x)), and check that this works. Second proof: Let U1 = ∪_{j=1}^p B(aj, raj) and U2 = ∪_{j=1}^p B(aj, 2raj). Then A ⊂ U1 ⊂ \overline{U1} ⊂ U2 ⊂ U. For each j, we may choose smooth functions fj, gj : Rn → R by (i) such that fj(x) > 0 iff ∥aj − x∥ < 2raj, fj(x) = 0 iff ∥aj − x∥ ≥ 2raj, gj(x) > 0 iff ∥aj − x∥ > raj, and gj(x) = 0 iff ∥aj − x∥ ≤ raj. Let F, G : Rn → R be F = ∑_{j=1}^p fj and G = ∏_{j=1}^p gj. Then F, G are smooth, F(x) > 0 iff x ∈ U2, F(x) = 0 iff x ∈ Rn \ U2, G(x) > 0 iff x ∈ Rn \ U1, and G(x) = 0 iff x ∈ U1. In particular, F + G > 0 in Rn. Define h : Rn → R as h(x) = F(x)/(F(x) + G(x)). Check that this works. □
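
The construction in the proof of [111](i)-(ii) is completely explicit, and the following Python sketch (an added illustration, not part of the notes) implements it in R2 for a = (0, 0), r = 1, s = 2: the resulting function h equals 1 on the inner ball, takes values strictly between 0 and 1 on the annulus, and vanishes outside the larger ball.

import math

def psi(t):
    # smooth on R: psi(t) = 0 for t <= 0 and psi(t) = exp(-1/t) > 0 for t > 0
    return math.exp(-1.0/t) if t > 0 else 0.0

def bump(x, a, r, s):
    # the function h = f/(f + g) from [111](ii), with f, g as in the proof of [111](i)
    d2 = sum((xi - ai)**2 for xi, ai in zip(x, a))
    f_val = psi(s**2 - d2)   # positive exactly on the set {∥x − a∥ < s}
    g_val = psi(d2 - r**2)   # positive exactly on the set {∥x − a∥ > r}
    return f_val / (f_val + g_val)

a = (0.0, 0.0)
for x in [(0.0, 0.0), (0.5, 0.0), (1.5, 0.5), (3.0, 0.0)]:
    print(x, bump(x, a, r=1.0, s=2.0))
# prints 1.0 at the first two points, a value in (0, 1) at the third, and 0.0 at the last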

Remark: Using [111](iii), we may construct smooth partitions of unity ; read this from textbooks.


Definition: If v = (v1, . . . , vn) ∈ Rn, then we define the directional derivative operator ⟨v,∇⟩ as ⟨v,∇⟩ = ∑_{j=1}^n vj ∂/∂xj. Note that if U ⊂ Rn is open and f : U → R is a function for which the relevant partial derivatives exist, then ⟨v,∇⟩f(a) = ∑_{j=1}^n vj ∂f/∂xj(a), ⟨v,∇⟩^2 f(a) = ∑_{i=1}^n ∑_{j=1}^n vivj ∂^2f/∂xi∂xj(a), ⟨v,∇⟩^3 f(a) = ∑_{i=1}^n ∑_{j=1}^n ∑_{k=1}^n vivjvk ∂^3f/∂xi∂xj∂xk(a), and so on.

[112] [Multivariable Taylor’s theorem] Let U ⊂ Rn be open, [a, b] ⊂ U , q ∈ N, and f : U → R be a

Cq+1-function. Then,

(i) There is c ∈ (a, b) such that f(b) = f(a) + ∑_{k=1}^{q} ⟨b − a,∇⟩^k f(a)/k! + ⟨b − a,∇⟩^{q+1} f(c)/(q + 1)!.
(ii) [Integral form] f(b) = f(a) + ∑_{k=1}^{q} ⟨b − a,∇⟩^k f(a)/k! + ∫_0^1 ((1 − t)^q/q!) ⟨b − a,∇⟩^{q+1} f(a + t(b − a)) dt.

Proof. (i) Define g : [0, 1] → R as g(t) = f(a + t(b − a)), which is a Cq+1-function. Note that

g′(t) = f′(a + t(b − a); b − a) = ⟨∇f(a + t(b − a)), b − a⟩ = ⟨b − a,∇⟩f(a + t(b − a)), and hence we may verify inductively that g^(k)(t) = ⟨b − a,∇⟩^k f(a + t(b − a)) for 1 ≤ k ≤ q + 1, where g^(k) denotes the kth derivative of g. Applying the one-variable Taylor's theorem to g, we may find t0 ∈ (0, 1) with g(1) = g(0) + ∑_{k=1}^{q} g^(k)(0)(1 − 0)^k/k! + g^(q+1)(t0)(1 − 0)^{q+1}/(q + 1)!. This means f(b) = f(a) + ∑_{k=1}^{q} ⟨b − a,∇⟩^k f(a)/k! + ⟨b − a,∇⟩^{q+1} f(c)/(q + 1)!, where c := a + t0(b − a).
(ii) Let g be as above. Then the integral form of one-variable Taylor's theorem says that g(1) = g(0) + ∑_{k=1}^{q} g^(k)(0)(1 − 0)^k/k! + ∫_0^1 ((1 − t)^q/q!) g^(q+1)(t)dt. This yields the required result. □
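Not part of the original notes: a small sketch (assuming Python with sympy available) that checks the key identity g^(k)(0) = ⟨b − a,∇⟩^k f(a) used in the proof of [112], for a concrete function f(x, y) = e^x sin y expanded at a = (0, 0) in the direction b − a = (h1, h2).

import sympy as sp

x, y, t, h1, h2 = sp.symbols('x y t h1 h2')
f = sp.exp(x) * sp.sin(y)                      # a concrete smooth function on R^2
g = f.subs({x: t*h1, y: t*h2})                 # g(t) = f(a + t(b - a)) with a = (0, 0)

def directional_power(expr, k):
    # apply the operator <b - a, nabla> = h1*d/dx + h2*d/dy  k times
    for _ in range(k):
        expr = h1*sp.diff(expr, x) + h2*sp.diff(expr, y)
    return expr

for k in (1, 2, 3):
    lhs = sp.diff(g, t, k).subs(t, 0)          # k-th derivative of g at 0
    rhs = directional_power(f, k).subs({x: 0, y: 0})
    print(k, sp.simplify(lhs - rhs) == 0)      # True for each k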

Definition: Let U ⊂ Rn be open and f : U → R be a function for which all the second order partial derivatives exist at a ∈ U. Then the n × n matrix Hf(a) whose ijth entry is ∂²f/∂xi∂xj(a) is called

the Hessian matrix of f . Note that if the second order partial derivatives of f are continuous in a

neighborhood of a, then Hf (a) is a symmetric matrix by [110].

Remark: Let U ⊂ Rn be open and f : U → R be differentiable. Then f′(a; ·) ∈ L(Rn,R) ∼= Rn for every a ∈ U. Under this identification, g := f′ can be thought of as a map from U to Rn, where g(a) = ∇f(a). Now assume that the second order partial derivatives of f exist and are continuous in U. Writing g = (g1, . . . , gn), we have gi = ∂f/∂xi, and hence the ijth entry of Jg(a) is ∂gi/∂xj(a) = ∂²f/∂xj∂xi(a) = ∂²f/∂xi∂xj(a), where the last equality is by [110]. Thus Jg(a) = Hf(a). We

can think of g′ as the ‘second derivative’ of f , and hence the behavior of the second derivative of

f at a is controlled by the Hessian matrix Hf (a). In particular, the Hessian matrix is useful in

studying the local minima and local maxima of f . To state the results precisely, we need a few

facts from Linear Algebra.


Definition: Let A be an n×n real symmetric matrix. For x ∈ Rn, let Ax = A(x) ∈ Rn be the vector

obtained by applying the linear map specified by A (with respect to the standard basis) to x. Since

A is symmetric, Ax can be obtained in terms of matrix multiplication either by multiplying the

column vector x on the left by A, or the row vector x on the right by A. We say A is positive definite

if ⟨Ax, x⟩ > 0 for every x ∈ Rn \ {0}, and negative definite if ⟨Ax, x⟩ < 0 for every x ∈ Rn \ {0}.

Example: When n = 2, writing a 2 × 2 matrix row by row as [a b; c d], we have that [1 0; 0 1] is positive definite, [−1 0; 0 −1] is negative definite, and [1 0; 0 −1] is a symmetric matrix which is neither positive definite nor negative definite.

[113] [Facts from Linear Algebra] Let A be an n× n real symmetric matrix. Then,

(i) The eigenvalues, say λ1, . . . , λn, of A are all real, and there is an orthonormal basis {u1, . . . , un}

of Rn such that Auj = λjuj for 1 ≤ j ≤ n.

(ii) A is positive definite ⇔ all eigenvalues of A are positive.

(iii) A is negative definite ⇔ all eigenvalues of A are negative.

(iv) Suppose n = 2 and A = [aij ]2×2. Then, A is positive definite ⇔ det(A) > 0 and a11 > 0.

Similarly, A is negative definite ⇔ det(A) > 0 (warning: not negative) and a11 < 0. If det(A) < 0,

then the two eigenvalues of A have opposite signs because det(A) is equal to the product of the

two eigenvalues of A.

Proof. For (i), see a suitable textbook in Linear Algebra.

(ii) Let λj ∈ R and uj ∈ Rn be as in (i). If A is positive definite, then λj = λj⟨uj , uj⟩ =

⟨λjuj , uj⟩ = ⟨Auj , uj⟩ > 0 for 1 ≤ j ≤ n. Conversely, assume λj > 0 for 1 ≤ j ≤ n. For any

x = ∑_{j=1}^{n} cj uj ∈ Rn \ {0}, we have cj ̸= 0 for some j, and hence we see by the orthonormality of the uj's that ⟨Ax, x⟩ = ∑_{i=1}^{n} ∑_{j=1}^{n} ci cj ⟨Aui, uj⟩ = ∑_{i=1}^{n} ∑_{j=1}^{n} ci cj λi ⟨ui, uj⟩ = ∑_{j=1}^{n} cj² λj ∥uj∥² > 0.

(iii) Note that A is negative definite iff −A is positive definite, or imitate the proof of (ii).

(iv) Note that a12 = a21 since A is symmetric. Since ⟨Ae1, e1⟩ = a11, we may suppose a11 ̸= 0 for

proving both the implications. Now for (x, y) ∈ R2, observe that ⟨A(x, y), (x, y)⟩ = a11x² + 2a12xy + a22y² = a11(x + a12y/a11)² + (a22 − a12²/a11)y² = a11(x + a12y/a11)² + det(A)y²/a11.

From this expression, it is clear that A is positive definite if a11 > 0 and det(A) > 0. Conversely,

if A is positive definite, then from the above expression, a11 = ⟨Ae1, e1⟩ > 0 and det(A)a11 =

⟨A(a12,−a11), (a12,−a11)⟩ > 0 so that det(A) > 0. □

Remark: If A ∈ Rn×n, then the function Q : Rn → R given by Q(x) = ⟨Ax, x⟩ is called a quadratic

form. Some of the assertions in [113] can also be stated in the language of a quadratic form.
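Not from the original notes: a quick numerical illustration of [113], assuming Python with numpy available. For a real symmetric matrix, definiteness can be read off from the signs of the eigenvalues; the three matrices from the example above are used as test cases.

import numpy as np

def definiteness(A, tol=1e-12):
    # A is assumed to be a real symmetric matrix
    eig = np.linalg.eigvalsh(A)
    if np.all(eig > tol):
        return 'positive definite'
    if np.all(eig < -tol):
        return 'negative definite'
    return 'neither'

print(definiteness(np.array([[1.0, 0.0], [0.0, 1.0]])))    # positive definite
print(definiteness(np.array([[-1.0, 0.0], [0.0, -1.0]])))  # negative definite
print(definiteness(np.array([[1.0, 0.0], [0.0, -1.0]])))   # neither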


Definition: Let U ⊂ Rn be open, f : U → R be differentiable, and a ∈ U . We say a is a critical

point of f if ∇f(a) = 0 ∈ Rn (equivalently, if f ′(a; ·) ≡ 0). If a ∈ U is a critical point of f , and is

neither a local maximum nor a local minimum of f , then a is called a saddle point of f .

Example: Let f : R2 → R be f(x, y) = x2 − y2. Then (0, 0) is a critical point of f because

∇f(0, 0) = (0, 0). But f(x, y) > 0 for x > y > 0 and f(x, y) < 0 for y > x > 0, and hence (0, 0) is

neither a local minimum nor a local maximum of f . Thus (0, 0) is a saddle point of f .

Exercise-14: Let U ⊂ Rn be open, f : U → R be a C2-function, and a ∈ U be a critical point

of f . If r > 0 is with B(a, r) ⊂ U , then for every b ∈ B(a, r) \ {a}, there is c ∈ (a, b) such

that 2(f(b) − f(a)) = ⟨Hf (c)(b − a), b − a⟩, where Hf (c) is the Hessian matrix of f at c. [Hint :

By [112], there is c ∈ (a, b) such that f(b) − f(a) = ⟨b − a,∇⟩f(a) + (1/2)⟨b − a,∇⟩2f(c). But

⟨b− a,∇⟩f(a) = ⟨∇f(a), b− a⟩ = 0 since ∇f(a) = 0. Also, ⟨b− a,∇⟩2f(c) = ⟨Hf (c)(b− a), b− a⟩.]

[114] [Test for local extrema] Let U ⊂ Rn be open, f : U → R be a C2-function, and a ∈ U be a

critical point of f , i.e., ∇f(a) = 0 ∈ Rn. Let Hf (x) be the Hessian matrix of f at x ∈ U .

(i) If Hf (a) is positive definite (equivalently, if all eigenvalues of Hf (a) are positive), then a is a

strict local minimum of f .

(ii) If Hf (a) is negative definite (equivalently, if all eigenvalues of Hf (a) are negative), then a is a

strict local maximum of f .

(iii) If there are x, y ∈ Rn with ⟨Hf (a)(x), x⟩ < 0 < ⟨Hf (a)(y), y⟩, or if Hf (a) has two eigenvalues

λ1 and λ2 with λ1 < 0 < λ2, then a is a saddle point of f .

Proof. Let S = {v ∈ Rn : ∥v∥ = 1}, which is compact.

(i) If Hf (a) is positive definite, there is δ > 0 such that ⟨Hf (a)v, v⟩ ≥ δ for every v ∈ S because the

function v 7→ ⟨Hf (a)v, v⟩ from S to R is continuous. As f is a C2-function, we may choose r > 0

such that B(a, r) ⊂ U and the operator norm ∥Hf (c)−Hf (a)∥ < δ for every c ∈ B(a, r). Consider

b ∈ B(a, r) \ {a}. Then 2(f(b) − f(a)) = ⟨Hf (c)(b − a), b − a⟩ for some c ∈ (a, b) by Exercise-14.

Taking v = (b − a)/∥b − a∥ ∈ S, we see that 2(f(b) − f(a))/∥b − a∥² = ⟨Hf(c)v, v⟩ = ⟨Hf(a)v, v⟩ + ⟨(Hf(c) − Hf(a))v, v⟩ > 0 because ⟨Hf(a)v, v⟩ ≥ δ and |⟨(Hf(c) − Hf(a))v, v⟩| ≤ ∥Hf(c) − Hf(a)∥∥v∥² < δ.

This shows that f(b) > f(a) for every b ∈ B(a, r) \ {a}.

(ii) This is similar to the proof of (i), or apply (i) to −f .

(iii) If Hf (a) has two eigenvalues λ1 and λ2 with λ1 < 0 < λ2, then for the corresponding eigen-

vectors x, y ∈ Rn \ {0}, we have that ⟨Hf (a)x, x⟩ < 0 < ⟨Hf (a)y, y⟩. So assume this inequality

holds. Let v = y/∥y∥. We claim that f(a + εv) > f(a) for every sufficiently small ε > 0. Since

δ := ⟨Hf (a)v, v⟩ is positive, we may choose r > 0 such that B(a, r) ⊂ U and the operator norm

∥Hf (c) − Hf (a)∥ < δ for every c ∈ B(a, r). Now consider any ε ∈ (0, r) and put b = a + εv ∈


B(a, r)\{a}. By Exercise-14, there is c ∈ (a, b) ⊂ B(a, r) with 2(f(b)−f(a)) = ⟨Hf (c)(b−a), b−a⟩.

As in the proof of (i), we conclude that 2(f(b) − f(a))/∥b − a∥² = ⟨Hf(a)v, v⟩ + ⟨(Hf(c) − Hf(a))v, v⟩ > 0,

where the last inequality is deduced using the choice of δ and r. This shows that f(a+εv)−f(a) =

f(b) − f(a) > 0 for every ε ∈ (0, r). Similarly, by taking u = x/∥x∥, we may show that

f(a + εu) − f(a) < 0 for every sufficiently small ε > 0. Hence a is a saddle point of f. □

Example: (i) Let f : R2 → R be f(x, y) = xy − x² − y². Then we see that ∇f(x, y) = (y − 2x, x − 2y), and hence ∇f(0, 0) = (0, 0). Now, A = [aij] := Hf(0, 0) = [−2 1; 1 −2]. The eigenvalues of this matrix are the roots of its characteristic polynomial, which is λ² + 4λ + 3. Hence the eigenvalues are −1 and −3. Thus (0, 0) is a strict local maximum of f by [114](ii). Alternatively, one may use [114](ii) and [113](iv) to conclude the same since det(A) > 0 and a11 = −2 < 0.
(ii) Let f : R2 → R be f(x, y) = e^{x²+y²}. Directly, we may see that (0, 0) is the unique minimum of f. Now ∇f(x, y) = (2x e^{x²+y²}, 2y e^{x²+y²}) so that ∇f(0, 0) = (0, 0), and Hf(0, 0) = [2 0; 0 2], whose eigenvalues are positive. Thus (0, 0) is a local minimum of f by our test.
(iii) Let f : R2 → R be f(x, y) = e^x y². Since e^x > 0 and y² ≥ 0, we see directly that (x, 0) is a local minimum of f for every x ∈ R. However, this does not follow from our test. Note that ∇f(x, y) = (e^x y², 2e^x y) so that ∇f(0, 0) = (0, 0). Since Hf(0, 0) = [0 0; 0 2], we have det(Hf(0, 0)) = 0, and the tests [113] and [114] are not applicable.
(iv) Let f : R2 → R be f(x, y) = cos x + y sin x. Then ∇f(x, y) = (− sin x + y cos x, sin x) so that ∇f(0, 0) = (0, 0). We have Hf(0, 0) = [−1 1; 1 0]. Since det(Hf(0, 0)) < 0, we conclude by [113](iv) and [114](iii) that (0, 0) is a saddle point of f.
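Not part of the notes: the classification in the above four examples can be reproduced with a short sympy sketch (assuming sympy is available) which computes the gradient and Hessian at the critical point (0, 0) and applies the determinant test of [113](iv) together with [114].

import sympy as sp

x, y = sp.symbols('x y')

def classify(f, point=(0, 0)):
    grad = [sp.diff(f, v) for v in (x, y)]
    assert all(g.subs({x: point[0], y: point[1]}) == 0 for g in grad)  # (0,0) is a critical point
    H = sp.hessian(f, (x, y)).subs({x: point[0], y: point[1]})
    d, a11 = H.det(), H[0, 0]
    if d > 0 and a11 > 0:
        return 'strict local minimum'
    if d > 0 and a11 < 0:
        return 'strict local maximum'
    if d < 0:
        return 'saddle point'
    return 'test inconclusive'

print(classify(x*y - x**2 - y**2))          # strict local maximum
print(classify(sp.exp(x**2 + y**2)))        # strict local minimum
print(classify(sp.exp(x)*y**2))             # test inconclusive
print(classify(sp.cos(x) + y*sp.sin(x)))    # saddle point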

5. Inverse function theorem and Implicit function theorem

Under suitable hypotheses, we will show that if the Jacobian matrix Jf (a) of f at a is invertible,

then f is locally invertible. We need a little preparation.

Exercise-15: Let L(Rn,Rn) = {all linear maps L : Rn → Rn}, equipped with the operator norm.

Recall that L(Rn,Rn) ∼= Rn×n, and the operator norm on L(Rn,Rn) is equivalent to the Euclidean

norm on Rn×n. Let U := {L ∈ L(Rn,Rn) : L is invertible} = {A ∈ Rn×n : det(A) ̸= 0}. Then,

(i) U is open in L(Rn,Rn) (equivalently, in Rn×n) because det : Rn×n → R is continuous.

(ii) The map A 7→ A−1 from U to U is continuous.


[Hint: (ii) Consider A ∈ U. Choose 0 < r < 1/(2∥A−1∥) by (i) such that C ∈ U whenever ∥A − C∥ < r.

Now consider C ∈ U with ∥A−C∥ < r. Note that C−1 −A−1 = C−1(A−C)A−1. Hence ∥C−1∥ ≤

∥A−1∥ + ∥C−1∥∥A − C∥∥A−1∥ < ∥A−1∥ + ∥C−1∥/2, and hence ∥C−1∥ ≤ 2∥A−1∥. Therefore,

∥C−1 − A−1∥ ≤ ∥C−1∥∥A − C∥∥A−1∥ < 2∥A−1∥²∥A − C∥, which gives continuity at A.]

Exercise-16: [Matrix form of Mean value theorem] Let U ⊂ Rn be open, f = (f1, . . . , fn) : U → Rn

be differentiable, and a, b ∈ U be distinct with [a, b] ⊂ U . Then there are vectors c1, . . . , cn ∈ (a, b)

such that f(b) − f(a) = [∂fi(ci)/∂xj]_{n×n} (b − a), where f(b) − f(a) and b − a are to be considered as

column vectors. [Hint : Apply [109](i) to each fi separately.]

[115] [Inverse function theorem] Let U ⊂ Rn be open, f : U → Rn be a C1-function, and a ∈ U be

with det(Jf (a)) ̸= 0. Then there is r > 0 with B(a, r) ⊂ U such that the following are true:

(i) f is injective on B(a, r) and det(Jf (x)) ̸= 0 for every x ∈ B(a, r).

(ii) f restricted to B(a, r) is an open map; and in particular, V := f(B(a, r)) is open in Rn.

(iii) Let g : V → B(a, r) be g = (f |B(a,r))−1. Then g is a C1-function and Jg(v) = Jf (u)−1 for every v = f(u) ∈ V, where u ∈ B(a, r). Moreover, if f is a Ck-function, then g is also a Ck-function.

Proof. (i) Let Un = U × · · · × U ⊂ Rn×n. Define ϕ : Un → R as ϕ(u1, . . . , un) = det([∂fi(ui)/∂xj]_{n×n}). Then ϕ is continuous because f is a C1-function and det : Rn×n → R is continuous. Since ϕ(a, . . . , a) = det(Jf (a)) ̸= 0, we may find r > 0 with B(a, r) ⊂ U such that ϕ(c1, . . . , cn) ̸= 0 for every (c1, . . . , cn) ∈ B(a, r) × · · · × B(a, r) = B(a, r)^n. In particular, det(Jf (x)) = ϕ(x, . . . , x) ̸= 0 for every x ∈ B(a, r). Now suppose u, x ∈ B(a, r) are distinct. By Exercise-16, there are c1, . . . , cn ∈ (u, x) with f(x) − f(u) = [∂fi(ci)/∂xj](x − u). Since x − u ̸= 0 and det([∂fi(ci)/∂xj]) = ϕ(c1, . . . , cn) ̸= 0, it follows that f(x) − f(u) ̸= 0. Thus f is injective on B(a, r).

(ii) Let U0 ⊂ B(a, r) be open and u ∈ U0. We have to find δ > 0 with B(f(u), δ) ⊂ f(U0). Choose ε > 0 with cl(B(u, ε)) ⊂ U0, where cl denotes closure. Since the boundary ∂B(u, ε) is compact, its image f(∂B(u, ε)) is also compact by the continuity of f. Moreover, f(u) /∈ f(∂B(u, ε)) by injectivity. Hence there is δ > 0 such that ∥f(x) − f(u)∥ ≥ 2δ for every x ∈ ∂B(u, ε). Fix y ∈ B(f(u), δ) and define ψ : cl(B(u, ε)) → R as ψ(x) = ∥f(x) − y∥² = ∑_{i=1}^{n} (fi(x) − yi)². Then ψ is continuous and attains its minimum at some z ∈ cl(B(u, ε)) by compactness. Observe that if x ∈ ∂B(u, ε), then ψ(x) ≥ (∥f(x) − f(u)∥ − ∥f(u) − y∥)² ≥ (2δ − δ)² = δ² > ψ(u). Therefore, ψ cannot attain its minimum on the boundary of B(u, ε). In other words, we must have z ∈ B(u, ε). Since ψ is differentiable, we conclude by Exercise-12 that ∇ψ(z) = 0. This means 0 = ∂ψ/∂xj(z) = 2∑_{i=1}^{n} (fi(z) − yi) ∂fi/∂xj(z) for 1 ≤ j ≤ n. In matrix form, this is equivalent to (f(z) − y)Jf (z) = 0 ∈ Rn. As det(Jf (z)) ̸= 0 by (i), the only possibility is f(z) = y. This shows B(f(u), δ) ⊂ f(B(u, ε)) ⊂ f(U0).


(iii) Fix v = f(u) ∈ V, where u ∈ B(a, r). We claim that g is differentiable at v with Jg(v) = Jf (u)−1. Consider y ∈ V \ {v}. By (i), there is x ∈ B(a, r) \ {u} with f(x) = y. Then by Exercise-16, there are c1, . . . , cn ∈ (u, x) with f(x) − f(u) = [∂fi(ci)/∂xj](x − u). This means y − v = [∂fi(ci)/∂xj](g(y) − g(v)), and hence g(y) − g(v) = [∂fi(ci)/∂xj]−1(y − v). As y → v, we have x → u since g is continuous by (ii); and consequently [∂fi(ci)/∂xj]−1 → Jf (u)−1 in Rn×n by the C1-property of f and Exercise-15(ii). Therefore, it follows by [105] (or by direct calculation) that g is differentiable at v with Jg(v) = Jf (u)−1. This proves the claim, and thus g is differentiable in V with Jg(v) = Jf (u)−1 for every v = f(u), where u ∈ B(a, r). Since f is a C1-function, it now follows by Exercise-15(ii) that g is also a C1-function (or note that the entries of Jg(v) are rational functions of the entries of Jf (u)). Similarly, if f is a Ck-function, we may deduce that g is a Ck-function. □

Remark: (i) Inverse function theorem guarantees only a local inverse for f . Even if det(Jf (a)) ̸= 0

for every a ∈ U, f may not have a global inverse. For example, let f : R2 → R2 be f(x, y) = (e^x cos y, e^x sin y). Then f(x, y + 2π) = f(x, y) so that f fails to be injective, and hence f has no global inverse. But det(Jf (x, y)) = det[e^x cos y  −e^x sin y; e^x sin y  e^x cos y] = e^{2x} ̸= 0 for every (x, y) ∈ R2.

(ii) For another proof of [115] using Banach fixed point theorem for contraction maps, see Theorem

9.24 in Rudin, Principles of Mathematical Analysis.
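Not from the notes: a small sympy check (assuming sympy is available) of the example in (i), confirming that det(Jf) = e^{2x} is never zero even though f is not globally injective.

import sympy as sp

x, y = sp.symbols('x y')
f = sp.Matrix([sp.exp(x)*sp.cos(y), sp.exp(x)*sp.sin(y)])   # f(x, y) = (e^x cos y, e^x sin y)
J = f.jacobian([x, y])
print(sp.simplify(J.det()))                       # exp(2*x), nonzero everywhere
# yet f is 2*pi-periodic in y, so it has no global inverse:
print(sp.simplify(f.subs(y, y + 2*sp.pi) - f))    # the zero vector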

Notation: We write (x, y) ∈ Rn+m to mean x = (x1, . . . , xn) ∈ Rn and y = (y1, . . . , ym) ∈ Rm. If

f = (f1, . . . , fm) : Rn+m → Rm is differentiable, let [∂f/∂x(x, y)] denote the m × n matrix whose ijth entry is ∂fi/∂xj(x, y), and [∂f/∂y(x, y)] denote the m × m matrix whose ijth entry is ∂fi/∂yj(x, y). The Jacobian matrix of f is then expressed as a block matrix: Jf (x, y) = [∂f/∂x(x, y)  ∂f/∂y(x, y)], which is of size m × (n + m).

To motivate our next result, we ask the following question: if f : Rn+m → Rm is a function,

then from the expression f(x, y) = 0 (where x ∈ Rn and y ∈ Rm), can we solve y as a function of

x, at least locally? In other words, can we represent the zero set {(x, y) ∈ Rn+m : f(x, y) = 0}, at

least locally, as the graph G(g) of a function g from an open set in Rn to Rm? This question about

the ‘implicit’ existence of a function g is answered by the Implicit function theorem stated as [116]

below. First let us consider some examples.

Example: (i) Let f : R2 → R be f(x, y) = x+ 2y − 3. If f(x, y) = 0, then y = (3− x)/2.

(ii) Let f : R2 → R be f(x, y) = xy. Here, y cannot be solved as a function of x globally from

‘f(x, y) = 0’ since f(0, y) = 0 for every y ∈ R. However, if x ̸= 0, then we have y = 0 whenever

f(x, y) = 0, and hence y may be thought of as the zero function of x for x ∈ R \ {0}.


(iii) Let f : R2 → R be f(x, y) = x² + y² − 1. Then {(x, y) : f(x, y) = 0} is the unit circle in R2. The full circle is not the graph of any function. But any arc of the unit circle which projects injectively to the x-axis is clearly the graph of a function; for example, {(x, y) : f(x, y) = 0 and y > 0} is the graph of the function g : (−1, 1) → R defined as g(x) = √(1 − x²). If an open arc of the unit circle contains either (1, 0) or (−1, 0), then that arc is not the graph of any function. Here, note that (1, 0) and (−1, 0) are precisely the points where ∂f/∂y vanishes.

(iv) Let f : Rn+m → Rm be linear. Let L1 : Rn → Rm and L2 : Rm → Rm be L1(x) = f(x, 0)

and L2(y) = f(0, y) so that f(x, y) = L1(x) + L2(y). Now from ‘f(x, y) = 0’, we can solve y as

y = −L2−1(L1(x)) provided L2 is invertible. Let A be the m × (n + m) matrix of the linear map f

with respect to the standard bases of Rn+m and Rm. Then Jf (x, y) = A for every (x, y) ∈ Rn+m.

Note that the (n + j)th column of A is the same as the jth column of the matrix of the linear map L2 since L2(y) = f(0, y). Therefore L2 is invertible iff the m × m matrix [∂f/∂y(x, y)] is invertible.

[116] [Implicit function theorem] Let U ⊂ Rn+m be open, f : U → Rm be a C1-function, and

(a, b) ∈ U be such that f(a, b) = 0 ∈ Rm and the m × m matrix [∂f/∂y(a, b)] is invertible. Then,

there exist r > 0, an open neighborhood A ⊂ Rn of a, and a C1-function g : A→ Rm such that

(i) B((a, b), r) ⊂ U , g(a) = b, and {(x, y) ∈ B((a, b), r) : f(x, y) = 0} = {(x, g(x)) : x ∈ A}.

(ii) Jg(x) = −[∂f/∂y(x, g(x))]−1 [∂f/∂x(x, g(x))] for every x ∈ A.

Proof. (i) Let F : U → Rn+m be F (x, y) = (x, f(x, y)), which is a C1-function with F (a, b) =

(a, 0). As JF (x, y) = [In 0; ∂f/∂x(x, y) ∂f/∂y(x, y)], we see det(JF (a, b)) = det([∂f/∂y(a, b)]) ̸= 0. By Inverse function theorem [115], there exist r > 0, an open set V ⊂ Rn+m, and a C1-function G : V → B((a, b), r) such that B((a, b), r) ⊂ U, det(JF (x, y)) = det([∂f/∂y(x, y)]) ̸= 0 for every (x, y) ∈ B((a, b), r), F |B((a,b),r) : B((a, b), r) → V is a bijective open map, and G is the inverse of F |B((a,b),r). Moreover, G must be of the form G(x, y) = (x, h(x, y)) since F (x, y) = (x, f(x, y)). Let A = {x ∈ Rn : (x, 0) ∈ V }, and g : A → Rm be g(x) = Π(G(x, 0)), where Π : Rn+m → Rm is the projection Π(x, y) = y. Then A is an open neighborhood of a (since (a, 0) = F (a, b) ∈ V ), and

g is a C1-function with g(x) = h(x, 0). For (x, y) ∈ U , we observe that (x, y) ∈ B((a, b), r) and

f(x, y) = 0 ⇔ (x, y) ∈ B((a, b), r) and F (x, y) = (x, 0) ⇔ (x, 0) ∈ V and G(x, 0) = (x, y) ⇔ x ∈ A

and g(x) = y. Hence {(x, y) ∈ B((a, b), r) : f(x, y) = 0} = {(x, g(x)) : x ∈ A}, and g(a) = b.

(ii) Let ϕ : A → Rn+m be ϕ(x) = (x, g(x)) = G(x, 0), which is a C1-function with f ◦ ϕ ≡ 0.

We deduce by the Chain rule that 0m×n = Jf◦ϕ(x) = Jf (ϕ(x))Jϕ(x) for every x ∈ A. But


Jf (ϕ(x)) = [∂f/∂x(x, g(x))  ∂f/∂y(x, g(x))], an m × (n + m) matrix, and Jϕ(x) = [In; Jg(x)], an (n + m) × n matrix. Hence we get 0m×n = [∂f/∂x(x, g(x))] + [∂f/∂y(x, g(x))]Jg(x). So, Jg(x) = −[∂f/∂y(x, g(x))]−1 [∂f/∂x(x, g(x))]. □

Remark: The required function g in [116] can be produced in a simpler way when n = m = 1.

Let U ⊂ R2 be open, f : U → R be a C1-function, and (a, b) ∈ U be such that f(a, b) = 0 and ∂f/∂y(a, b) ̸= 0. Replacing f with −f if necessary, assume ∂f/∂y(a, b) > 0. By the C1-property of f, we may choose δ > 0 with ∂f/∂y > 0 in [a − δ, a + δ] × [b − δ, b + δ]. Then f(x, ·) is strictly increasing

in [b − δ, b + δ] for each x ∈ [a − δ, a + δ]. In particular, f(a, b − δ) < 0 < f(a, b + δ). Therefore,

we may choose ε ∈ (0, δ) such that f(x, b − δ) < 0 < f(x, b + δ) whenever |x − a| < ε. For each

x ∈ (a−ε, a+ε), applying the intermediate value property to the strictly increasing function f(x, ·),

we may find a unique y ∈ (b− δ, b+ δ) with f(x, y) = 0. Define g : (a− ε, a+ ε) → R as g(x) = y.
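Not part of the notes: for the circle f(x, y) = x² + y² − 1, the formula Jg(x) = −[∂f/∂y]−1[∂f/∂x] of [116](ii) can be checked with a short sympy sketch (assuming sympy is available), taking the explicit local solution g(x) = √(1 − x²) near (a, b) = (0, 1).

import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2 - 1
g = sp.sqrt(1 - x**2)                               # explicit local solution with g(0) = 1

lhs = sp.diff(g, x)                                 # g'(x)
rhs = -(sp.diff(f, x) / sp.diff(f, y)).subs(y, g)   # -f_x / f_y evaluated at (x, g(x))
print(sp.simplify(lhs - rhs))                       # 0 on (-1, 1)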

We mention two examples to point out some subtle aspects of [116]:

Example: (i) Let f : R2 → R be f(x, y) = x² − y² = (x − y)(x + y). Then f(0, 0) = 0, ∂f/∂y(0, 0) = 0, and the set {(x, y) : f(x, y) = 0} is the union of the lines x = y and x = −y. If we define g : R → R as g(x) = x (or, g(x) = −x), then g is a C1-function and f(x, g(x)) = 0 for every x ∈ R. But in every neighborhood of (0, 0), the set {(x, y) : f(x, y) = 0} contains points that are not on the graph of g.
(ii) Let f : R2 → R be f(x, y) = x² − y³, and g : R → R be g(x) = (x²)^{1/3}. Then f(0, 0) = 0, ∂f/∂y(0, 0) = 0, and {(x, y) ∈ R2 : f(x, y) = 0} = {(x, g(x)) : x ∈ R}. But g is not differentiable at 0.

6. Tangent spaces and Lagrange’s multiplier method

To find local extrema of a function f restricted to a subset of its original domain (i.e., under

some constraints), a technique called Lagrange’s Multiplier Method (LMM) is often useful. This

method does not pinpoint the local extrema, but helps us to narrow down our search to possible

solutions. First we will explain theoretically why this method works. In this context, we will briefly

mention what a tangent space to a level set is; to learn more about tangent vectors and tangent spaces, see a textbook in Differential Geometry.

[117] Let U ⊂ Rn+m be open, f : U → Rm be a C1-function, c ∈ Rm, and S = f−1(c) = {(x, y) ∈

U : f(x, y) = c} (which is called a level set of f). Let p = (a, b) ∈ S be such that the linear

map f ′(p ; ·) : Rn+m → Rm is surjective, or equivalently, Jf (p) has full rank m. Then for any

v = (v1, v2) ∈ Rn+m, the following are equivalent:

(i) f ′(p ; v) = 0 ∈ Rm.


(ii) There exist δ > 0 and a C1-path α : (−δ, δ) → Rn+m such that α(t) ∈ S for every t ∈ (−δ, δ),

α(0) = p, and α′(0) = v.

Proof. Replacing f with f − c, we may suppose c = 0 ∈ Rm.

(ii) ⇒ (i): Since f ◦ α : (−δ, δ) → Rm is constant (f ◦ α ≡ c = 0), we get by the Chain rule that

0 = (f ◦ α)′(0) = f ′(α(0);α′(0)) = f ′(p; v).

(i) ⇒ (ii): Since Jf (p) = Jf (a, b) has full rank m, we may suppose after a permutation of the

coordinates of Rn+m that the m × m matrix [∂f/∂y(a, b)] is invertible. Then by Implicit function

theorem, there exist r > 0 with B((a, b), r) ⊂ U , an open neighborhood A ⊂ Rn of a, and a

C1-function g : A→ Rm such that

(*) g(a) = b, and S ∩B((a, b), r) = {(x, y) ∈ B((a, b), r) : f(x, y) = 0} = {(x, g(x)) : x ∈ A}.

(**) Jg(a) = −[∂f/∂y(a, b)]−1 [∂f/∂x(a, b)].

By (i), we have f′((a, b); (v1, v2)) = 0, or in other words [∂f/∂x(a, b)]v1 + [∂f/∂y(a, b)]v2 = 0 ∈ Rm.

Hence v2 = −[∂f/∂y(a, b)]−1 [∂f/∂x(a, b)]v1 = Jg(a)v1 by (**), and thus g′(a; v1) = v2. Since A is an

open neighborhood of a, there is δ > 0 with a+ tv1 ∈ A for every t ∈ (−δ, δ). Define α : (−δ, δ) →

Rn+m as α(t) = (a+ tv1, g(a+ tv1)). Then α is a C1-path with α(0) = (a, g(a)) = (a, b) = p, and

α(t) ∈ S for every t ∈ (−δ, δ) by (*). Also, α′(0) = (v1, g′(a; v1)) = (v1, v2) = v. □

Definition: Let U ⊂ Rn+m be open, f : U → Rm be a C1-function, c ∈ Rm, and S = f−1(c). Let

p = (a, b) ∈ S be such that the linear map f ′(p ; ·) : Rn+m → Rm is surjective, or equivalently,

Jf (p) has full rank m (when m = 1, this condition means simply that ∇f(p) ̸= 0). We say v =

(v1, v2) ∈ Rn+m is a tangent vector to the level set S at p if there is a C1-path α : (−δ, δ) → Rn+m

(for some δ > 0) such that α(t) ∈ S for every t ∈ (−δ, δ), α(0) = p, and α′(0) = v. The set of all

tangent vectors of S at p is called the tangent space of S at p and it is denoted as TpS.

[118] Let U ⊂ Rn+m be open, f = (f1, . . . , fm) : U → Rm be a C1-function, c ∈ Rm, and

S = f−1(c). Let p ∈ S be such that the linear map f ′(p ; ·) : Rn+m → Rm is surjective, or

equivalently, Jf (p) has full rank m. Then,

(i) The tangent space TpS is an n-dimensional vector subspace of Rn+m.

(ii) The vectors ∇f1(p), . . . ,∇fm(p) in Rn+m are linearly independent.

(iii) For each i ∈ {1, . . . , m}, ∇fi(p) ⊥ Tp(S), i.e., ⟨∇fi(p), v⟩ = 0 for every v ∈ TpS (in particular,

if m = 1, then ∇f(p) is normal to Tp(S)).

(iv) The orthogonal complement of TpS is equal to the span of the vectors ∇f1(p), . . . ,∇fm(p),

and therefore we have the orthogonal decomposition Rn+m = span{∇fi(p) : 1 ≤ i ≤ m}⊕TpS.


Proof. (i) By [117], TpS is equal to the kernel of the surjective linear map f′(p ; ·) : Rn+m → Rm, and this kernel has dimension (n + m) − m = n.

(ii) These vectors are the rows of Jf (p), which has rank m since f ′(p ; ·) : Rn+m → Rm is surjective.

(iii) Let v ∈ TpS. Then f ′(p ; v) = (⟨∇f1(p), v⟩, . . . , ⟨∇fm(p), v⟩) = 0 ∈ Rm by [117]. Or argue as

follows. Choose α as in [117](ii). Since f ◦ α is constant, we have that fi ◦ α is constant for each i.

Hence by Chain rule, 0 = (fi ◦ α)′(0) = ⟨∇fi(α(0)), α′(0)⟩ = ⟨∇fi(p), v⟩.

(iv) This follows from (i), (ii), and (iii). □

Example: (i) Let f : Rn+1 → R be f(x) = ∑_{j=1}^{n+1} xj², and S = f−1(1), which is the unit sphere in

Rn+1. Let p ∈ S, and note that ∇f(p) = 2p ̸= 0 since p ∈ S. Hence by [118], the tangent space

TpS is an n-dimensional vector subspace of Rn+1, and every geometric tangent vector to S at p can

be realized as the velocity vector of a C1-path in S as described by [117](ii). Moreover, ∇f(p) is

normal to Tp(S) at p by [118](iii).

(ii) Let f : R2 → R be f(x, y) = xy and S = f−1(0). Then S is equal to the union of x-axis and

y-axis. Geometrically, the collection of all tangent vectors to S at (0, 0) is equal to S itself, and this

is not a vector subspace of R2. Here, the problem is that ∇f(0, 0) = (0, 0), and therefore the linear

map f′((0, 0) ; ·) : R2 → R is not surjective; in fact, this linear map is the zero map, and hence

its kernel is the whole of R2, which is strictly larger than the collection of all geometric tangent

vectors to S at (0, 0).

Now we will state the result justifying Lagrange's Multiplier Method (LMM), and then we

will illustrate its use through several examples.

[119] [LMM theorem] Let U ⊂ Rn+m be open, g : U → R be differentiable, and f = (f1, . . . , fm) :

U → Rm be a C1-function. Let c ∈ Rm, S = f−1(c), and p ∈ S be a local extremum of the restricted

function g|S . If the linear map f ′(p ; ·) : Rn+m → Rm is surjective, or equivalently, if Jf (p) has full

rank m, then there are λ1, . . . , λm ∈ R (called multipliers) such that ∇g(p) = ∑_{i=1}^{m} λi∇fi(p).

Proof. We claim that ∇g(p) ⊥ TpS. To prove the claim, consider v ∈ TpS. Then by [117],

there is a C1-path α : (−δ, δ) → Rn+m such that α(t) ∈ S for every t ∈ (−δ, δ), α(0) = p, and

α′(0) = v. Then g◦α : (−δ, δ) → R is a differentiable function having a local extremum at 0. Hence

0 = (g ◦ α)′(0) = g′(α(0);α′(0)) = g′(p; v) = ⟨∇g(p), v⟩. This proves the claim. By the claim and

[118](iv), it follows that ∇g(p) ∈ span{∇f1(p), . . . ,∇fm(p)}. □

The Lagrange’s Multiplier Method (LMM) may be explained roughly as follows. Suppose g is a

real-valued function defined on an open subset U of Rn+m, and we need to find the (local) extrema of

g|S , where S := f−1(c) is the level set of a function f = (f1, . . . , fm) : U → Rm. If the assumptions

of [119] are satisfied, then any local extremum p ∈ S of g|S must satisfy the following: (i) f(p) = c,


and (ii) ∇g(p) = ∑_{i=1}^{m} λi∇fi(p) for some λ1, . . . , λm ∈ R. Let S0 = {p ∈ U : p satisfies (i) and (ii)}.

Based on the given problem, we may be able to identify the subset S0 of S (often, S0 is a finite

subset). In this manner, LMM helps us to narrow down our search. Next, by examining each

p ∈ S0, we need to determine using other considerations whether p is a (local) extremum of g|S .

Example: (i) We wish to find the maximum/minimum of x + y subject to the constraint that

x² + y² = 1. Define g, f : R2 → R as g(x, y) = x + y and f(x, y) = x² + y², which are C1-functions. Let S = f−1(1), and note that ∇f(p) = 2p ̸= 0 for every p ∈ S. If p = (a, b) ∈ S is a local extremum of g|S, then ∇g(p) = λ∇f(p) for some λ ∈ R by [119]. This gives (1, 1) = 2λ(a, b). Also a² + b² = 1 since (a, b) ∈ S. Therefore, λ = ±1/√2 and (a, b) = ±(1/√2, 1/√2). LMM gives only this much information. Now by direct examination, we deduce that x + y attains its maximum and minimum subject to the constraint x² + y² = 1 respectively at (1/√2, 1/√2) and (−1/√2, −1/√2).
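Not from the notes: the Lagrange system of example (i) can also be solved mechanically with sympy (assuming sympy is available); solving ∇g = λ∇f together with the constraint returns exactly the two candidate points ±(1/√2, 1/√2) found above.

import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
g = x + y                      # function to optimize
f = x**2 + y**2                # constraint f = 1

eqs = [sp.diff(g, x) - lam*sp.diff(f, x),
       sp.diff(g, y) - lam*sp.diff(f, y),
       f - 1]
print(sp.solve(eqs, [x, y, lam]))
# the solutions are (x, y) = (1/sqrt(2), 1/sqrt(2)) and (-1/sqrt(2), -1/sqrt(2)),
# with the corresponding multipliers lam = 1/sqrt(2) and -1/sqrt(2)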

(ii) We wish to find the unique point on the line y = x+5 at a minimum distance to (2, 1). Define

g, f : R2 → R as g(x, y) = (x − 2)2 + (y − 1)2, f(x, y) = y − x (which are C1-functions), and

put S = f−1(5). We know geometrically that g|S has a unique minimum. Note that ∇f(x, y) =

(−1, 1) ̸= (0, 0) for every (x, y) ∈ S. Hence if (a, b) ∈ S is where g|S attains its minimum, then

∇g(a, b) = λ∇f(a, b) for some λ ∈ R by [119]. Solving the equations 2((a− 2), (b− 1)) = λ(−1, 1)

and b = a+ 5, we get (a, b) = (−1, 4), which must be the required point.

(iii) We wish to maximize x + y + z subject to x − y = 1 and x + z2 = 1. Let g : R3 → R be

g(x, y, z) = x+ y + z and f = (f1, f2) : R3 → R2 be f(x, y, z) = (x− y, x+ z2). Let S = f−1(1, 1).

Note that Jf (x, y, z) = [1 −1 0; 1 0 2z] has rank 2 for every (x, y, z) ∈ S. If p = (x, y, z) ∈ S is a local

extremum of g|S , then by [119], ∇g(p) = λ1∇f1(p)+λ2∇f2(p) and f(p) = (1, 1). The first equation

implies λ1 = −1, λ2 = 2, and z = 1/4. Then using f(p) = (1, 1), we get x = 1 − z2 = 15/16 and

y = x− 1 = −1/16. Thus p = (15/16,−1/16, 1/4), and g(p) = 9/8. Observe that any point q ∈ S

must be of the form q = (1− z2,−z2, z). If |z| > 1, then g(q) = 1 + z − 2z2 < 0 < g(p). Moreover,

{(x, y, z) ∈ S : |z| ≤ 1} is compact, and g must have a maximum on this compact set. Therefore,

g|S must attain its maximum at p, and max g(S) = g(p) = 9/8.

(iv) We wish to find the maximum volume of a 3-dimensional rectangular box A with surface

area c > 0. Suppose that A = {∑_{j=1}^{3} tj ej : 0 ≤ t1 ≤ x, 0 ≤ t2 ≤ y, 0 ≤ t3 ≤ z}, where

x, y, z are positive. Then the volume of A is xyz and surface area of A is 2(xy + yz + xz). Let

U = {(x, y, z) ∈ R3 : x > 0, y > 0, z > 0}. Define g, f : U → R as g(x, y, z) = xyz and

f(x, y, z) = xy + yz + xz, and put S = f−1(c/2). For (x, y, z) ∈ S, from xy + yz + xz = c/2,

we have xy < c/2 and xz < c/2, and therefore g(x, y, z) = xyz < c²/(4x) → 0 as x → ∞. Similarly,

g(x, y, z) → 0 when y → ∞, and also when z → ∞, for (x, y, z) ∈ S. Moreover, xyz = 0 if one


of x, y, z is 0. These observations imply that g|S must have a maximum, say at p = (x0, y0, z0).

Since ∇f(p) = (y0 + z0, x0 + z0, x0 + y0) ̸= 0 ∈ R3 by the definition of U , we deduce by [119] that

∇g(p) = λ∇f(p) for some λ ∈ R. Then y0 = λx0/(x0 − λ) = z0, and similarly, x0 = λy0/(y0 − λ) = z0. This means the volume is maximum when the box is a cube. Using f(p) = c/2, we get x0 = (c/6)^{1/2} and hence the maximum volume is (c/6)^{3/2}.

(v) Let n ≥ 2 and A ∈ Rn×n be a symmetric matrix. We will show A has a real eigenvalue using

LMM. Define g, f : Rn → R as g(x) = ⟨Ax, x⟩, f(x) = ∥x∥2 = ⟨x, x⟩ (which are C1-functions),

and put S = f−1(1), which is the unit sphere in Rn. By Exercise-10(iii) and the symmetry of

A, we have ⟨∇g(x), v⟩ = g′(x; v) = ⟨Ax, v⟩ + ⟨x,Av⟩ = 2⟨Ax, v⟩, and hence ∇g(x) = 2Ax. Also,

∇f(x) = 2x ̸= 0 for x ∈ S. Since S is compact, there is x ∈ S where g|S attains its maximum. By

[119], we get ∇g(x) = λ∇f(x) for some λ ∈ R, which implies Ax = λx. Then λ is an eigenvalue of

A because x is a unit vector (∵ x ∈ S). We remark that this type of argument can be continued to

show A is diagonalizable, and all its eigenvalues are real (for the next step, take the unit sphere in

Y := {y ∈ Rn : ⟨x, y⟩ = 0} in the place of S).
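Not part of the notes: a numerical illustration (assuming numpy is available) of example (v). Sampling many unit vectors, the values of the quadratic form g(x) = ⟨Ax, x⟩ stay below the largest eigenvalue of the symmetric matrix A and come close to it, in line with the fact that the constrained maximum is attained at an eigenvector.

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # a random symmetric matrix

# sample many unit vectors and evaluate the quadratic form g(x) = <Ax, x>
X = rng.standard_normal((100000, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)
values = np.einsum('ij,jk,ik->i', X, A, X)

print(values.max())                    # close to, and never above, the largest eigenvalue
print(np.linalg.eigvalsh(A).max())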

Exercise-17: (i) Maximize ∏_{j=1}^{n} xj² subject to ∑_{j=1}^{n} xj² = 1.
(ii) [Geometric mean ≤ Arithmetic mean] If a1, . . . , an ∈ (0,∞), then (∏_{j=1}^{n} aj)^{1/n} ≤ (∑_{j=1}^{n} aj)/n.
[Hint: (i) Let g(x1, . . . , xn) = ∏_{j=1}^{n} xj², f(x1, . . . , xn) = ∑_{j=1}^{n} xj², and S = f−1(1). Then g has a positive maximum on the compact set S, say at p = (x1, . . . , xn), where xj ̸= 0 for every j. Since ∇f(p) = 2p ̸= 0, LMM gives ∇g(p) = λ∇f(p) for some λ ∈ R. This implies g(p) = λxj² for every j, and hence ng(p) = ∑_{j=1}^{n} λxj² = λ. Therefore, g(p) = ng(p)xj², or xj² = 1/n for every j. Thus g(p) = 1/n^n. (ii) Let bj = √aj and b = (b1, . . . , bn). Then 1/n^n = g(p) ≥ g(b/∥b∥) = ∏_{j=1}^{n} aj / (∑_{j=1}^{n} aj)^n.]

7. Multivariable Riemann integration over a box

Definition: (i) We say A ⊂ Rn is an n-box if A is a product of n closed intervals, A = ∏_{j=1}^{n} [aj, bj]. Its n-dimensional volume (Lebesgue measure) is µ(A) = µn(A) := ∏_{j=1}^{n} (bj − aj). If there is r > 0 with bj − aj = r for 1 ≤ j ≤ n, then we say A is an n-cube with side length r; in this case, µ(A) = r^n.
(ii) Recall from Real Analysis that P = {a0 ≤ a1 ≤ · · · ≤ ak} is a partition of [a, b] if a0 = a and ak = b. We say P = ∏_{j=1}^{n} Pj is a partition of an n-box A = ∏_{j=1}^{n} [aj, bj] if Pj is a partition of [aj, bj] for each j ∈ {1, . . . , n}. Observe that if Pj = {aj,0 ≤ aj,1 ≤ · · · ≤ aj,kj}, then the partition P = ∏_{j=1}^{n} Pj divides the n-box A into k := ∏_{j=1}^{n} kj sub n-boxes D1, . . . , Dk with pairwise disjoint interiors, and µ(A) = ∑_{i=1}^{k} µ(Di). In this case, we define the norm (or mesh) ∥P∥ of P as

∥P∥ = max{diam(Di) : 1 ≤ i ≤ k}.


(iii) Let P = ∏_{j=1}^{n} Pj and Q = ∏_{j=1}^{n} Qj be two partitions of an n-box A ⊂ Rn. If Pj ⊂ Qj for

1 ≤ j ≤ n, then we say Q is a refinement of P .

Definition: Let A ⊂ Rn be an n-box and f : A→ R be a bounded function.

(i) Let P be a partition of A and D1, . . . , Dk be the sub n-boxes of A determined by P . The

lower Riemann sum L(f, P ) and upper Riemann sum U(f, P ) of f with respect to P are defined as

L(f, P ) = ∑_{i=1}^{k} (inf f(Di))µ(Di) and U(f, P ) = ∑_{i=1}^{k} (sup f(Di))µ(Di). As in one-variable theory,

we may show L(f, P ) ≤ L(f,Q) ≤ U(f,Q) ≤ U(f, P ) whenever Q is a partition of A refining P ,

and consequently L(f, P ) ≤ U(f, P ′) for any two partitions P, P ′ of A.

(ii) The lower Riemann integral L(f) and upper Riemann integral U(f) of f over A are defined as

L(f) = sup{L(f, P ) : P is a partition of A} and U(f) = inf{U(f, P ) : P is a partition of A}. By

the last sentence of (i) above, we have that L(f) ≤ U(f) always. If L(f) = U(f) =: y ∈ R, then

we say f is Riemann integrable over A and we write ∫A f(x)dx = y (or simply ∫A fdx = y).

(iii) Let P be a partition of A, and D1, . . . , Dk be the sub n-boxes of A determined by P . A

finite set T = {t1, . . . , tk} ⊂ A is called a tag for P if ti ∈ Di for 1 ≤ i ≤ k. If this holds, then

S(f, P, T ) := ∑_{i=1}^{k} f(ti)µ(Di) is called a Riemann sum of f with respect to the partition P . Clearly,

L(f, P ) ≤ S(f, P, T ) ≤ U(f, P ). Also note that L(f, P ) = inf{S(f, P, T ) : T is a tag for P}, and

U(f, P ) = sup{S(f, P, T ) : T is a tag for P}.
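Not from the notes: a small numerical sketch (assuming numpy is available) computing the lower and upper Riemann sums for f(x, y) = xy² on the 2-box [0, 2] × [0, 3] with a uniform partition. Since f is increasing in each variable on this box, the infimum and supremum on each sub-box are attained at the lower-left and upper-right corners, and both sums approach the value 18 computed later in Section 8 as the mesh shrinks.

import numpy as np

def riemann_bounds(k):
    # uniform partition of A = [0,2] x [0,3] into k*k sub-boxes;
    # f(x, y) = x*y**2 is increasing in each variable on A, so on each sub-box
    # the infimum sits at the lower-left corner and the supremum at the upper-right corner
    xs = np.linspace(0.0, 2.0, k + 1)
    ys = np.linspace(0.0, 3.0, k + 1)
    area = (xs[1] - xs[0]) * (ys[1] - ys[0])
    f = lambda x, y: x * y**2
    X0, Y0 = np.meshgrid(xs[:-1], ys[:-1], indexing='ij')
    X1, Y1 = np.meshgrid(xs[1:], ys[1:], indexing='ij')
    lower = (f(X0, Y0) * area).sum()   # L(f, P)
    upper = (f(X1, Y1) * area).sum()   # U(f, P)
    return lower, upper

for k in (10, 100, 1000):
    print(k, riemann_bounds(k))        # both components converge to 18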

Notation: Let R(A) denote the collection of all Riemann integrable functions f : A→ R.

The results about Riemann integration in the multivariable case are analogous to the results in

the one-variable case. Some of the results are stated below and the proofs are left to the student (see my notes Real Analysis for the proofs in the one-variable case).

Exercise-18: Let A ⊂ Rn be an n-box and f : A→ R be a bounded function.

(i) Let P be a partition of A and D1, . . . , Dk be the sub n-boxes of A determined by P . Then U(f, P ) − L(f, P ) = ∑_{i=1}^{k} diam(f(Di))µ(Di).

(ii) f ∈ R(A) ⇔ for every ε > 0, there is a partition P of A with 0 ≤ U(f, P )− L(f, P ) ≤ ε .

(iii) If f is continuous, then f ∈ R(A).

(iv) If there is c ∈ R such that f(x) = c for every x ∈ int(A), then f ∈ R(A) and∫A fdx = cµ(A).

In particular,∫A 0 dx = 0.

(v) If µ(A) = 0, then∫A fdx = 0.

[Hint: (iii) Consider ε > 0. Since f is uniformly continuous on the compact set A, there is δ > 0 such that |f(x) − f(y)| < ε/(µ(A) + 1) for every x, y ∈ A with ∥x − y∥ < δ. Let P be a partition of A with ∥P∥ < δ, and let D1, . . . , Dk be the sub n-boxes of A determined by P. Then diam(f(Di)) ≤ ε/(µ(A) + 1) for every i by the choice of δ. Now (i) and (ii) may be applied.]

Exercise-19: Let A ⊂ Rn be an n-box and f ∈ R(A). Then for every ε > 0, there is a δ > 0 such

that the following are true for every partition P of A with ∥P∥ < δ:

(i) 0 ≤ U(f, P )− L(f, P ) ≤ ε and L(f, P ) ≤∫A fdx ≤ U(f, P ).

(ii) |∫A fdx− S(f, P, T )| ≤ ε for any Riemann sum S(f, P, T ) of f with respect to P .

Consequently, for any f ∈ R(A) and any sequence (Pn) of partitions of A with (∥Pn∥) → 0, the

following are true: ∫A fdx = lim_{n→∞} L(f, Pn) = lim_{n→∞} U(f, Pn) = lim_{n→∞} S(f, Pn, Tn) for any

choice of tags Tn of Pn.

Exercise-20: Let A ⊂ Rn be an n-box and f, g ∈ R(A). Then,

(i) [Linearity] f + g ∈ R(A) and ∫A(f + g)dx = ∫A fdx + ∫A gdx. Moreover, cf ∈ R(A) and ∫A(cf)dx = c∫A fdx for every c ∈ R.
(ii) fg ∈ R(A), where fg(x) := f(x)g(x).
(iii) If g is non-vanishing in A and 1/g is bounded, then 1/g, f/g ∈ R(A).
(iv) If f ≥ 0, then ∫A fdx ≥ 0. If f ≤ g, then ∫A fdx ≤ ∫A gdx.
(v) If s, t ∈ R are with s ≤ f ≤ t, then sµ(A) ≤ ∫A fdx ≤ tµ(A).
(vi) |f | ∈ R(A) and |∫A fdx| ≤ ∫A |f |dx.

(vii) max{f, g},min{f, g} ∈ R(A). So, f+ := max{f, 0} and f− := −min{f, 0} belong to R(A).

Exercise-21: Let A ⊂ Rn be an n-box. (i) Let f : A → R be a bounded function, and D1, . . . , Dk

be the sub n-boxes of A determined by some partition P of A. Then f ∈ R(A) ⇔ f |Di ∈ R(Di)

for 1 ≤ i ≤ k. Moreover, when both sides hold, then ∫A fdx = ∑_{i=1}^{k} ∫_{Di} fdx.

(ii) Let f : A→ R be a bounded function with only finitely many discontinuities. Then f ∈ R(A).

(iii) If f ∈ R(A) and g : A → R is a bounded function such that {x ∈ A : f(x) ̸= g(x)} is finite,

then g ∈ R(A) and ∫A gdx = ∫A fdx.

Exercise-22: Let A ⊂ Rn be an n-box. (i) [Mean value theorem of Riemann integration] If f : A→ R

is continuous, there is a ∈ A with f(a)µ(A) =∫A fdx.

(ii) Let (fm) be a sequence in R(A) converging uniformly to a function f : A→ R. Then f ∈ R(A)

and ∫A fdx = lim_{m→∞} ∫A fm dx.

(iii) If f : A→ [u, v] is Riemann integrable and g : [u, v] → R is continuous, then g ◦ f ∈ R(A).

(iv) If f ∈ R(A) and f ≥ 0, then f^{1/k} ∈ R(A) for every k ∈ N.

[Hint: (i) Assume µ(A) > 0. Let u = min f(A) and v = max f(A). Then ∫A u dx ≤ ∫A fdx ≤ ∫A v dx, or u ≤ (∫A fdx)/µ(A) ≤ v. Choose x, y ∈ A with f(x) = u and f(y) = v. Let α : [0, 1] → A be α(t) = x + t(y − x). Applying the intermediate value property to f ◦ α, find t0 ∈ [0, 1] with f(α(t0)) = (∫A fdx)/µ(A). Take a = α(t0). (ii) Verify that f is bounded. Now let d∞ be the supremum metric on the collection of all bounded real-valued functions on A. Let C > max{3, 3µ(A)}. Given ε > 0, choose k ∈ N such that d∞(f, fm) < ε/C for every m ≥ k. Since fk ∈ R(A), there is a partition P of A with U(fk, P ) − L(fk, P ) ≤ ε/C. Since U(f, P ) ≤ U(fk, P ) + (ε/C)µ(A) and L(f, P ) ≥ L(fk, P ) − (ε/C)µ(A), it follows by the choice of C that U(f, P ) − L(f, P ) ≤ ε. Hence f ∈ R(A). Moreover, for every m ≥ k we have |∫A fdx − ∫A fmdx| ≤ ∫A |f − fm|dx ≤ ∫A(ε/C)dx ≤ ε. (iii) Approximate g with polynomials by the Weierstrass approximation theorem, and use (ii).]

For further discussion of Riemann integration, we need the notion of a null set (a set of Lebesgue

measure zero) in Rn - defined below - and some of the results pertaining to null sets.

Definition: We say X ⊂ Rn is a null set (or a set of Lebesgue measure zero) in Rn if for every

ε > 0, there is a sequence (Ak) of n-boxes with X ⊂ ∪_{k=1}^{∞} Ak and ∑_{k=1}^{∞} µ(Ak) ≤ ε. For example,

if A ⊂ Rn is an n-box, then ∂A is a null set (because each face of A is an n-box of zero volume).

Exercise-23: Let X ⊂ Rn. (i) If X is a null set, then every subset of X is a null set.

(ii) X is a null set ⇔ for every ε > 0, there is a sequence (Ak) of n-boxes in Rn such that

X ⊂ ∪_{k=1}^{∞} int(Ak) and ∑_{k=1}^{∞} µ(Ak) ≤ ε ⇔ for every ε > 0, there is a sequence (Ak) of n-cubes in Rn such that X ⊂ ∪_{k=1}^{∞} Ak and ∑_{k=1}^{∞} µ(Ak) ≤ ε.

(iii) If X is a compact null set, then for every ε > 0, there are finitely many n-boxes A1, . . . , Ap in

Rn with X ⊂ ∪_{k=1}^{p} int(Ak) and ∑_{k=1}^{p} µ(Ak) ≤ ε.

(iv) If X is equal to a countable union of null sets in Rn, then X is a null set.

(v) If X is countable, then X is a null set by (iv) because every singleton is a null set.

Exercise-24: (i) Let X ⊂ Rn be compact and f : X → Rm be continuous. Then its graph G(f) is

a null set in Rm+n.

(ii) Let f : Rn → Rm be continuous. Then its graph G(f) is a null set in Rm+n.

[Hint: (i) Let A be an n-box with X ⊂ A and µn(A) > 0. Consider ε > 0. Choose ε0 > 0 with (2ε0)^m < ε/µn(A), and let δ > 0 be such that ∥f(x) − f(y)∥ < ε0 for every x, y ∈ X with ∥x − y∥ < δ. Let P be a partition of A with ∥P∥ < δ. Then A gets divided into finitely many sub n-boxes of diameter < δ. Let D1, . . . , Dk be a listing of those sub n-boxes intersecting X. Since diam(f(Di ∩ X)) ≤ ε0, there is an m-cube Ei ⊂ Rm of side length 2ε0 with f(Di ∩ X) ⊂ Ei. Note that µm(Ei) = (2ε0)^m < ε/µn(A). Now G(f) ⊂ ∪_{i=1}^{k} (Di × Ei), and ∑_{i=1}^{k} µn+m(Di × Ei) = ∑_{i=1}^{k} µn(Di)µm(Ei) ≤ ∑_{i=1}^{k} µn(Di) ε/µn(A) = µn(A) × ε/µn(A) = ε. (ii) Write Rn as a countable union of compact sets, and use (i) and Exercise-23(iv).]


[120] Let U ⊂ Rn be a nonempty open set. Then, (i) There is a sequence (Kj) of compact sets in

Rn such that U = ∪_{j=1}^{∞} Kj and Kj ⊂ int(Kj+1) for every j ∈ N.

(ii) In addition, we may choose Kj ’s in such a way that each Kj is a finite union of n-cubes with

pairwise disjoint interiors.

Proof. (i) If U = Rn, let Kj = cl(B(0, j)), the closed ball. Else, let Kj = cl(B(0, j)) ∩ {x ∈ Rn : dist(x, Rn \ U) ≥ 1/j}.

(ii) Choose Kj ’s as in (i). Now fix j ∈ N. Choose δ > 0 such that the δ-neighborhood Nδ(Kj) :=

{x ∈ Rn : dist(x,Kj) < δ} of Kj is included in int(Kj+1). Let A be an n-cube containing Kj , and

P be a partition of A with ∥P∥ < δ. The partition P divides A into sub n-boxes. Let Y1, . . . , Yk

be a listing of those sub n-boxes of A intersecting Kj, and put Ej = ∪_{i=1}^{k} Yi. Then Ej is compact, and Kj ⊂ Ej ⊂ Nδ(Kj) ⊂ int(Kj+1). Carry out this construction for each j. Then U = ∪_{j=1}^{∞} Ej and Ej ⊂ int(Ej+1). Thus the new collection (Ej) of compact sets satisfies the requirement. □

[121] Let U ⊂ Rn be open, and f : U → Rn be a C1-function.

(i) Let A ⊂ U be a compact convex set, and λ = sup{∥f ′(c; ·)∥ : c ∈ A}, where ∥f ′(c; ·)∥ denotes

the operator norm of f ′(c; ·). Then λ < ∞, and f |A is λ-Lipschitz, i.e., ∥f(b) − f(a)∥ ≤ λ∥b − a∥

for every a, b ∈ A. Moreover, if D ⊂ A is an n-cube with side-length δ, then f(D) is contained in

an n-cube of side-length 2δλ√n.

(ii) If X ⊂ U is a null set in Rn, then f(X) is also a null set in Rn.

Proof. (i) We have λ < ∞ since f is a C1-function and A is compact. Consider a, b ∈ A. Then

the line segment [a, b] ⊂ A because A is convex. By Mean value inequality [109](iii), we see

that ∥f(b) − f(a)∥ ≤ λ∥b − a∥. Next, consider an n-cube D ⊂ A with side-length δ. Then

diam(D) = δ√n, and hence diam(f(D)) ≤ δλ√n. Consequently, f(D) can be put inside some n-cube of side-length 2δλ√n.

(ii) First suppose there is an n-cube A with X ⊂ A ⊂ U . Since U is open, by enlarging A a little

bit we may suppose that there is δ > 0 such that the δ-neighborhood Nδ(X) of X is included

in A. By part (i), f |A is λ-Lipschitz for some λ > 0. Consider ε > 0, and choose ε0 > 0

with (2λ√n)^n ε0 ≤ ε. By Exercise-23(ii), there is a sequence (Ck) of n-cubes with X ⊂ ∪_{k=1}^{∞} Ck and ∑_{k=1}^{∞} µ(Ck) ≤ ε0. By partitioning the Ck's into smaller cubes if necessary, we may suppose Ck ⊂ Nδ(X) ⊂ A for every k ∈ N. Let δk be the side-length of Ck. By (i), there is an n-cube Ek ⊂ Rn of side-length 2δkλ√n with f(Ck) ⊂ Ek. Hence f(X) ⊂ ∪_{k=1}^{∞} Ek and ∑_{k=1}^{∞} µ(Ek) = ∑_{k=1}^{∞} (2δkλ√n)^n = (2λ√n)^n ∑_{k=1}^{∞} µ(Ck) ≤ (2λ√n)^n ε0 ≤ ε. Thus f(X) is a null set.
In the general case, using [120], first write U as a countable union of n-cubes, U = ∪_{i=1}^{∞} Ai. By what is proved above, f(X ∩ Ai) is a null set for each i ∈ N. Since a countable union of null sets is again a null set, it follows that f(X) = ∪_{i=1}^{∞} f(X ∩ Ai) is also a null set. □


Definition: Let X ⊂ Rn and f : X → R be a function. The oscillation ω(f, x) of f at a point

x ∈ X is defined as ω(f, x) = infδ>0 diam(f(X ∩ B(x, δ))) = limδ→0 diam(f(X ∩ B(x, δ))). It is

easy to verify that f is continuous at a point x ∈ X iff ω(f, x) = 0.

[122] Let A ⊂ Rn be an n-box, f : A → R be a bounded function, and X = {x ∈ A :

f is not continuous at x}. Then,

(i) [Lebesgue’s criterion for Riemann integrability] f ∈ R(A) ⇔ X is a null set in Rn.

(ii) If X is countable, then f ∈ R(A).

Proof. (i) ⇒: Note that X = {x ∈ A : ω(f, x) > 0} = ∪_{q=1}^{∞} Xq, where Xq := {x ∈ A : ω(f, x) ≥ 1/q}. Since a countable union of null sets is a null set, it suffices to show each Xq is a null set in Rn. So fix q ∈ N and consider ε > 0. Since f ∈ R(A), there is a partition P of A with U(f, P ) − L(f, P ) ≤ ε/(2q). Let D1, . . . , Dk be the sub n-boxes of A determined by P, and Γ = {1 ≤ i ≤ k : Xq ∩ int(Di) ̸= ∅}. Write Xq = X′q ∪ X′′q, where X′q = ∪_{i∈Γ}(Xq ∩ int(Di)) and X′′q = Xq ∩ (∪_{i=1}^{k} ∂Di). Choose finitely many n-boxes E1, . . . , Em such that ∪_{i=1}^{k} ∂Di ⊂ ∪_{j=1}^{m} Ej and ∑_{j=1}^{m} µ(Ej) ≤ ε/2 (in fact, we may choose the Ej's to be the (n−1)-dimensional faces of the Di's, and then µ(Ej) = 0 for each j). If i ∈ Γ, then Xq ∩ int(Di) ̸= ∅, and therefore diam(f(Di)) ≥ 1/q. Hence ε/(2q) ≥ U(f, P ) − L(f, P ) ≥ ∑_{i∈Γ} diam(f(Di))µ(Di) ≥ (1/q)∑_{i∈Γ} µ(Di), which implies ∑_{i∈Γ} µ(Di) ≤ ε/2. Thus Xq = X′q ∪ X′′q ⊂ (∪_{i∈Γ} Di) ∪ (∪_{j=1}^{m} Ej) and ∑_{i∈Γ} µ(Di) + ∑_{j=1}^{m} µ(Ej) ≤ ε/2 + ε/2 = ε.

(i) ⇐: Let ε > 0 be given. We need to find a partition P of A with U(f, P ) − L(f, P ) ≤ ε. Let

M > |f | and ε0 = ε/(2M + µ(A)). Since X is a null set, there is a sequence (Aj) of n-boxes with X ⊂ ∪_{j=1}^{∞} int(Aj) and ∑_{j=1}^{∞} µ(Aj) ≤ ε0 by Exercise-23(ii). For each a ∈ A \ X, choose (by the continuity of f at a) an n-box Ea such that a ∈ int(Ea) and diam(f(A ∩ Ea)) < ε0. Then {int(Aj) : j ∈ N} ∪ {int(Ea) : a ∈ A \ X} is an open cover for the compact set A. Hence there exist m ∈ N and a1, . . . , am ∈ A \ X such that {int(Aj) : 1 ≤ j ≤ m} ∪ {int(Eaj) : 1 ≤ j ≤ m} is an open cover for A. Let A′j = A ∩ Aj and E′j = A ∩ Eaj. Then we have A = (∪_{j=1}^{m} A′j) ∪ (∪_{j=1}^{m} E′j), ∑_{j=1}^{m} µ(A′j) ≤ ε0, and diam(f(E′j)) < ε0 for 1 ≤ j ≤ m. Write each A′j and each E′j as products of closed intervals, and use the endpoints of those closed intervals to define a partition P of A. Then the sub n-boxes D1, . . . , Dk of A determined by P satisfy the following: for each i ∈ {1, . . . , k}, there is j ∈ {1, . . . , m} such that either Di ⊂ A′j or Di ⊂ E′j. Let Γ1 = {1 ≤ i ≤ k : Di ⊂ A′j for some j} and Γ2 = {1 ≤ i ≤ k : Di ⊂ E′j for some j}. Then U(f, P ) − L(f, P ) ≤ ∑_{i∈Γ1} diam(f(Di))µ(Di) + ∑_{i∈Γ2} diam(f(Di))µ(Di) ≤ 2M ∑_{i∈Γ1} µ(Di) + ε0 ∑_{i∈Γ2} µ(Di) ≤ 2M ∑_{j=1}^{m} µ(A′j) + ε0 ∑_{i=1}^{k} µ(Di) ≤ 2Mε0 + ε0µ(A) = ε.
(ii) This is a corollary of (i) because any countable set in Rn is a null set. □

Remark: (i) Let A ⊂ Rn be an n-box. For a bounded function f : A → R, let Xf = {x ∈ A :

f is not continuous at x}. Then Xf+g ⊂ Xf ∪Xg and Xfg ⊂ Xf ∪Xg. Combining this observation


with [122](i) gives another proof of the fact that f +g, fg ∈ R(A) whenever f, g ∈ R(A). Similarly,

we can give another reasoning for Exercise-22(iii) using [122](i) because Xg◦f ⊂ Xf when g is

continuous.

(ii) We wish to point out that the continuity of g is necessary in Exercise-22(iii). Let f : [0, 1] → [0, 1]

be f(0) = 1, f(x) = 0 if x is irrational, and f(p/q) = 1/q if p, q ∈ N are coprime with p ≤ q. Then

{x ∈ [0, 1] : f is not continuous at x} = [0, 1]∩Q, which is a countable set, and hence f is Riemann

integrable by [122]. Let g : [0, 1] → R be g(0) = 0 and g(x) = 1 if x > 0. Then g is also Riemann

integrable by [122]. But g ◦ f : [0, 1] → R is the indicator function 1[0,1]∩Q of [0, 1] ∩ Q, which is

discontinuous at every point of [0, 1]. Since [0, 1] is not a null set, we see by [122](i) (or by directly

calculating U(g ◦ f, P ) and L(g ◦ f, P )) that g ◦ f is not Riemann integrable.

Exercise-25: Let A ⊂ Rn be an n-box. (i) Let f ∈ R(A) and Y = {y ∈ A : f(y) ̸= 0}. If Y is a

null set, then ∫A fdx = 0. Conversely, if f ≥ 0 and ∫A fdx = 0, then Y is a null set.
(ii) Let f, g ∈ R(A). If {x ∈ A : f(x) ̸= g(x)} is a null set in Rn, then ∫A fdx = ∫A gdx. In particular, if g(x) = f(x) for every x ∈ int(A), then ∫A fdx = ∫A gdx.

(iii) Let f ∈ R(A). If g : A → R is a bounded function such that {x ∈ A : f(x) ̸= g(x)} is a null

set in Rn, should g be Riemann integrable?

(iv) Let f ∈ R(A). If g : A → R is a bounded function such that the closure of {x ∈ A : f(x) ̸= g(x)} is a null set in Rn, then g ∈ R(A) and ∫A g = ∫A f.

[Hint : (i) Assume Y is a null set. Let D1, . . . , Dk be the sub n-boxes of A determined by a partition

P of A. If µ(Di) > 0, then Di cannot be a subset of Y , and hence inf f(Di) ≤ 0 ≤ sup f(Di).

Therefore L(f, P ) ≤ 0 ≤ U(f, P ). This is true for every partition P . Since f ∈ R(A), we

must have∫A fdx = 0. For the converse part, we may suppose µ(A) > 0. Let X = {x ∈ A :

f is not continuous at x}. By [122], it suffices to show Y ⊂ X. Consider y ∈ Y . Then f(y) > 0.

If y /∈ X, there is δ > 0 such that f(z) > f(y)/2 for every z ∈ A with ∥y − z∥ < δ. Let P be

a partition of A with ∥P∥ < δ, and D1, . . . , Dk be the sub n-boxes of A determined by P . There

is i0 such that µ(Di0) > 0 and y ∈ Di0 . Then inf(f(Di0)) ≥ f(y)/2, and inf(f(Di)) ≥ 0 for

every i, so that ∫A fdx ≥ L(f, P ) ≥ inf(f(Di0))µ(Di0) ≥ f(y)µ(Di0)/2 > 0, a contradiction. (ii) Apply (i) to f − g. (iii) Need not be. Let f, g : [0, 1] → R be f ≡ 0 and g = 1[0,1]∩Q. (iv) {x ∈ A : g is not continuous at x} ⊂ {x ∈ A : f is not continuous at x} ∪ cl({x ∈ A : f(x) ̸= g(x)}), where cl denotes closure.]

Remark: Those who know Lebesgue measure theory can give an easier proof for the converse part

of Exercise-25(i) by writing Y = ∪_{k=1}^{∞} Yk, where Yk := {y ∈ A : f(y) ≥ 1/k}, and noting that ∫A fdx = ∫A fdµ ≥ ∫_{Yk} fdµ ≥ µ(Yk)/k.


Seminar topic: Let A ⊂ R2 be a 2-box (rectangle), and f : A → R be a bounded function which

is monotone on each variable separately. Then f ∈ R(A) (see Proposition 5.12 in Ghorpade and

Limaye, A Course in Multivariable Calculus and Analysis).

8. Iterated integrals and Fubini’s theorem

Exercise-26: Let A = A1×A2 ⊂ Rn+m be an (n+m)-box, where A1 ⊂ Rn is an n-box and A2 ⊂ Rm

is an m-box. Write (x, y) ∈ Rn+m to mean x ∈ Rn and y ∈ Rm. Let f : A → R be a bounded

function. Let fL,1, fU,1 : A1 → R be defined as fL,1(x) = L(f(x, ·)) and fU,1(x) = U(f(x, ·)), the

lower and upper integrals of f(x, ·) over A2. Similarly, let fL,2, fU,2 : A2 → R be fL,2(y) = L(f(·, y))

and fU,2(y) = U(f(·, y)), the lower and upper integrals of f(·, y) over A1. Let P = P1 × P2 be a

partition of A, where P1 is a partition of A1 and P2 is a partition of A2. Then,

(i) L(f, P ) ≤ L(fL,1, P1) and U(fU,1, P1) ≤ U(f, P ).

(ii) L(f, P ) ≤ L(fL,2, P2) and U(fU,2, P2) ≤ U(f, P ).

[Hint : (i) Let R1, . . . , Rr be the sub n-boxes of A1 determined by P1, and S1, . . . , Ss be the

sub m-boxes of A2 determined by P2. Then Ri × Sj for 1 ≤ i ≤ r and 1 ≤ j ≤ s are the

sub (n + m)-boxes of A determined by P . Fix i ∈ {1, . . . , r} and x ∈ Ri. Then for any j,

inf f(Ri × Sj) ≤ inf{f(x, y) : y ∈ Sj}. Multiplying both sides with µm(Sj) and summing over j we

get ∑_{j=1}^{s} inf f(Ri × Sj)µm(Sj) ≤ L(f(x, ·), P2) ≤ L(f(x, ·)) = fL,1(x). Since this is true for every x ∈ Ri, we see ∑_{j=1}^{s} inf f(Ri × Sj)µm(Sj) ≤ inf fL,1(Ri). Multiplying both sides with µn(Ri) and summing over i, we obtain that L(f, P ) ≤ L(fL,1, P1).]

[123] [Fubini’s theorem] Let A = A1 ×A2 ⊂ Rn+m be an (n+m)-box, where A1 ⊂ Rn is an n-box

and A2 ⊂ Rm is an m-box. Write (x, y) ∈ Rn+m to mean x ∈ Rn and y ∈ Rm.

(i) Let f ∈ R(A). Then the functions x 7→ L(f(x, ·)) and x 7→ U(f(x, ·)) are integrable over A1,

the functions y 7→ L(f(·, y)) and y 7→ U(f(·, y)) are integrable over A2, and ∫A f = ∫_{A1} L(f(x, ·))dx = ∫_{A1} U(f(x, ·))dx = ∫_{A2} L(f(·, y))dy = ∫_{A2} U(f(·, y))dy.

(ii) Let f ∈ R(A). Suppose that f(x, ·) ∈ R(A2) for each x ∈ A1 and f(·, y) ∈ R(A1) for each

y ∈ A2. Then the iterated integrals ∫_{A1}(∫_{A2} f(x, y)dy)dx and ∫_{A2}(∫_{A1} f(x, y)dx)dy exist and ∫A f = ∫_{A1}(∫_{A2} f(x, y)dy)dx = ∫_{A2}(∫_{A1} f(x, y)dx)dy.

(iii) If f : A → R is continuous, then ∫_{A1}(∫_{A2} f(x, y)dy)dx and ∫_{A2}(∫_{A1} f(x, y)dx)dy exist and ∫A f(x, y)d(x, y) = ∫_{A1}(∫_{A2} f(x, y)dy)dx = ∫_{A2}(∫_{A1} f(x, y)dx)dy.

Proof. (i) As in Exercise-26, let fL,1(x) = L(f(x, ·)), fU,1(x) = U(f(x, ·)), fL,2(y) = L(f(·, y)), and

fU,2(y) = U(f(·, y)). Let P = P1 × P2 be a partition of A, where P1, P2 are partitions of A1, A2

respectively. Using Exercise-26(i) and the inequality fL,1 ≤ fU,1, we get

(*) L(f, P ) ≤ L(fL,1, P1) ≤ U(fL,1, P1) ≤ U(fU,1, P1) ≤ U(f, P ), and


(**) L(f, P ) ≤ L(fL,1, P1) ≤ L(fU,1, P1) ≤ U(fU,1, P1) ≤ U(f, P ).

This implies U(fL,1, P1) − L(fL,1, P1) ≤ U(f, P ) − L(f, P ) and U(fU,1, P1) − L(fU,1, P1) ≤

U(f, P ) − L(f, P ). As f ∈ R(A) and P is an arbitrary partition of A, it follows that fL,1, fU,1 ∈

R(A1). Moreover, (*) and (**) imply that ∫A f(x, y)d(x, y) = ∫A1 fL,1(x)dx = ∫A1 fU,1(x)dx. Similarly, we may use Exercise-26(ii) to show that fL,2, fU,2 ∈ R(A2) and ∫A f(x, y)d(x, y) = ∫A2 fL,2(y)dy = ∫A2 fU,2(y)dy.

(ii) By hypothesis L(f(x, ·)) = U(f(x, ·)) for each x ∈ A1, and L(f(·, y)) = U(f(·, y)) for each

y ∈ A2. So this is a corollary of (i).

(iii) If f is continuous, then the hypothesis of (ii) is satisfied. □

Exercise-27: Let A = ∏_{j=1}^{n}[aj, bj] be an n-box, f : A → R be continuous, and σ be any permutation of {1, . . . , n}. Then ∫A f = ∫_{aσ(1)}^{bσ(1)} (· · · (∫_{aσ(n)}^{bσ(n)} f dxσ(n)) · · · ) dxσ(1) (which means we can integrate f in any order over the intervals [aj, bj]). [Hint: Repeated application of [123](iii).]

Remark: Let A = A1×A2, where A1 ⊂ Rn is an n-box and A2 ⊂ Rm is an m-box. Let g : A1 → R,

h : A2 → R be continuous and f : A → R be f(x, y) = g(x)h(y). If (xn) → x in A1 and

(yn) → y in A2, then (g(xn)h(yn)) → g(x)h(y), and hence f is continuous. By Fubini's theorem, ∫A f = ∫A1(∫A2 g(x)h(y)dy)dx = (∫A1 g(x)dx)(∫A2 h(y)dy).

Example: (i) Let A = [0, 2] × [0, 3] and f : A → R be f(x, y) = xy^2. Then ∫_0^2(∫_0^3 xy^2 dy)dx = ∫_0^2 9x dx = 18 and ∫_0^3(∫_0^2 xy^2 dx)dy = ∫_0^3 2y^2 dy = 18, so that ∫A f = 18.

(ii) Let A = [0, π]^2 and f : A → R be f(x, y) = (sin x cos y)/(e^x + x^4 + cos^2 x). Then the iterated integral ∫_0^π(∫_0^π f(x, y)dx)dy is difficult to calculate. By [123](iii), the above iterated integral is equal to ∫_0^π(∫_0^π f(x, y)dy)dx, whose value is easily seen to be 0 because ∫_0^π f(x, y)dy = 0 for each fixed x (as ∫_0^π cos y dy = 0).

(iii) Let A = [0, 1]2 and f : A → R be f(x, y) = 1 if x = 0 and y ∈ Q, and f(x, y) = 0 otherwise.

Then f is continuous in (0, 1]× [0, 1] (where f ≡ 0), whose complement {0} × [0, 1] is a null set in

R2. Hence f ∈ R(A) by [122] and ∫A f = 0. If y ∈ [0, 1] is fixed, then f(·, y) is continuous in (0, 1], whose complement {0} is a null set in R. Hence ∫_0^1 f(x, y)dx exists and is equal to 0. Therefore ∫_0^1(∫_0^1 f(x, y)dx)dy exists and is equal to 0. But if we fix x = 0, then f(0, ·) fails to be continuous at every point of [0, 1]. Hence the integrals ∫_0^1 f(0, y)dy and ∫_0^1(∫_0^1 f(x, y)dy)dx do not exist.

(iv) Let A = [0, 1]2, and S ⊂ A be a dense subset of A with the property that S contains at

most one point from each horizontal line and at most one point from each vertical line. Such a

set S can be constructed as follows. Let D1, D2, . . . ⊂ A be a listing of all sub-rectangles of A

having rational coordinates for the vertices and having nonempty interiors. Let (x1, y1) ∈ D1.

Having chosen (xj, yj) ∈ Dj for 1 ≤ j ≤ n, we choose (xn+1, yn+1) ∈ Dn+1 in such a way that xn+1 ≠ xj for 1 ≤ j ≤ n and yn+1 ≠ yj for 1 ≤ j ≤ n. Then the set S := {(xn, yn) : n ∈ N} has the required properties. Now define f : A → R as the indicator function 1_S of S. Then f fails to be continuous at every point of A, and hence ∫A f does not exist. If x ∈ [0, 1] is fixed, then f(x, y) = 0 for every y ∈ [0, 1] with one possible exception. Hence ∫_0^1 f(x, y)dy = 0, and therefore ∫_0^1(∫_0^1 f(x, y)dy)dx = 0. Similarly, the iterated integral ∫_0^1(∫_0^1 f(x, y)dx)dy exists and is equal to 0.
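The value 18 in Example (i) above can also be checked numerically. The following is a minimal sketch (plain Python, numpy assumed available; an illustration only, not part of the formal development): a midpoint Riemann sum over the 2-box and an iterated sum using the exact inner integral 9x both approach 18.

import numpy as np

# Example (i): f(x, y) = x*y**2 on A = [0, 2] x [0, 3]; the exact integral is 18.
n = 400
hx, hy = 2.0 / n, 3.0 / n
x = (np.arange(n) + 0.5) * hx          # midpoints in [0, 2]
y = (np.arange(n) + 0.5) * hy          # midpoints in [0, 3]
X, Y = np.meshgrid(x, y, indexing="ij")

double_sum = np.sum(X * Y**2) * hx * hy   # Riemann sum over the 2-box
iterated = np.sum(9.0 * x) * hx           # integrate 9x, the exact inner integral in y
print(double_sum, iterated)               # both are close to 18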

Remark: (i) Let U ⊂ R2 be open and f : U → R be a C2-function. We can give another proof of the fact ∂^2f/∂x∂y = ∂^2f/∂y∂x using Fubini's theorem. Let F = ∂^2f/∂x∂y and G = ∂^2f/∂y∂x, which are continuous since f is a C2-function. Consider A = [a, b] × [c, d] ⊂ U. Then ∫A F = ∫_a^b(∫_c^d F dy)dx = ∫_c^d(∫_a^b F dx)dy and ∫A G = ∫_a^b(∫_c^d G dy)dx = ∫_c^d(∫_a^b G dx)dy by Fubini's theorem. Now, using the Fundamental theorem of calculus, we note that ∫_c^d(∫_a^b F dx)dy = ∫_c^d((∂f/∂y)(b, y) − (∂f/∂y)(a, y))dy = f(b, d) − f(b, c) − f(a, d) + f(a, c), and ∫_a^b(∫_c^d G dy)dx = ∫_a^b((∂f/∂x)(x, d) − (∂f/∂x)(x, c))dx = f(b, d) − f(a, d) − f(b, c) + f(a, c). Hence ∫A F = ∫A G for every 2-box (rectangle) A ⊂ U. If F(x0, y0) ≠ G(x0, y0) for some (x0, y0) ∈ U, say F > G at (x0, y0), then we can find a rectangle A ⊂ U with (x0, y0) ∈ int(A) and ε > 0 such that F(x, y) > G(x, y) + ε for every (x, y) ∈ A; then ∫A F ≥ ∫A G + εµ(A) > ∫A G, a contradiction to what we have already proved.

(ii) Conversely, Fubini's theorem for a continuous real-valued function f defined on a rectangle A = [a, b] × [c, d] ⊂ U can be deduced using the equality of mixed partial derivatives of suitable functions defined in terms of certain integral expressions of f (see A. Aksoy and M. Martelli, Mixed partial derivatives and Fubini's theorem, College Math. J., 33 (2002)).

[124] [Interchanging differentiation and integration] (i) Let A ⊂ Rn be an n-box and f : A × [c, d] → R be a function such that f(·, t) ∈ R(A) for each t ∈ [c, d], and ∂f/∂t : A × [c, d] → R is continuous. Then t ↦ ∫A f(x, t)dx from [c, d] to R is differentiable and (d/dt)(∫A f(x, t)dx) = ∫A (∂f/∂t)(x, t)dx.

(ii) Let A ⊂ Rn be an n-box and f : [a, b] × A → R be a function such that f(s, ·) ∈ R(A) for each s ∈ [a, b], and ∂f/∂s : [a, b] × A → R is continuous. Then s ↦ ∫A f(s, x)dx from [a, b] to R is differentiable and (d/ds)(∫A f(s, x)dx) = ∫A (∂f/∂s)(s, x)dx.

(iii) Let U ⊂ Rn be open, and f : U × [c, d] → R be a C1-function such that f(x, ·) is Riemann integrable over [c, d] for each x ∈ U. Then F : U → R defined as F(x) = ∫_c^d f(x, t)dt has the property that all partial derivatives of F exist and (∂F/∂xj)(x) = ∫_c^d (∂f/∂xj)(x, t)dt for every j ∈ {1, . . . , n} and every x ∈ U.

(iv) Let U ⊂ Rn be open, and f : [a, b] × U → R be a C1-function such that f(·, x) is Riemann integrable over [a, b] for each x ∈ U. Then F : U → R defined as F(x) = ∫_a^b f(s, x)ds has the property that all partial derivatives of F exist and (∂F/∂xj)(x) = ∫_a^b (∂f/∂xj)(s, x)ds for every j ∈ {1, . . . , n} and every x ∈ U.

Proof. (i) Fix w ∈ [c, d], and let Q(x, t) = (f(x, t) − f(x, w))/(t − w) for t ≠ w. We need to show that lim_{t→w} ∫A Q(x, t)dx = ∫A (∂f/∂t)(x, w)dx. Consider ε > 0. Since ∂f/∂t is uniformly continuous on the compact set A × [c, d], there is δ > 0 such that |(∂f/∂t)(x, t) − (∂f/∂t)(x, w)| < ε/(µ(A) + 1) for every x ∈ A and every t ∈ [c, d] with |t − w| < δ. Now consider t ∈ [c, d] with 0 < |t − w| < δ. For each x ∈ A, applying the Mean value theorem to f(x, ·), we may find tx between t and w with Q(x, t) = (∂f/∂t)(x, tx). Hence |∫A Q(x, t)dx − ∫A (∂f/∂t)(x, w)dx| ≤ ∫A |(∂f/∂t)(x, tx) − (∂f/∂t)(x, w)| dx ≤ ∫A ε/(µ(A) + 1) dx < ε.

(ii) The proof is similar to that of (i).

(iii) Fix j ∈ {1, . . . , n} and x ∈ U. Choose δ > 0 such that x + sej ∈ U for every s ∈ [−δ, δ], and define g : [−δ, δ] × [c, d] → R as g(s, t) = f(x + sej, t). Note that (∂g/∂s)(s, t) = (∂f/∂xj)(x + sej, t). Hence ∂g/∂s : [−δ, δ] × [c, d] → R is continuous by the C1-property of f, and moreover (∂g/∂s)(0, t) = (∂f/∂xj)(x, t). Applying (ii) to g, we see that (d/ds)(∫_c^d g(s, t)dt) = ∫_c^d (∂g/∂s)(s, t)dt. At s = 0, the right hand side is equal to ∫_c^d (∂f/∂xj)(x, t)dt, and the left hand side is equal to lim_{s→0} ∫_c^d (g(s, t) − g(0, t))/s dt = lim_{s→0} ∫_c^d (f(x + sej, t) − f(x, t))/s dt = lim_{s→0} (F(x + sej) − F(x))/s = (∂F/∂xj)(x).

(iv) The proof is similar to that of (iii). □
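As a numerical illustration of [124](i) (a sketch, numpy assumed available; the integrand sin(xt) is an arbitrary choice, not from the notes), a difference quotient of t ↦ ∫_0^1 sin(xt)dx is compared with ∫_0^1 x cos(xt)dx at one sample point.

import numpy as np

def integral_over_A(h, n=2000):
    """Midpoint Riemann sum of x |-> h(x) over A = [0, 1]."""
    x = (np.arange(n) + 0.5) / n
    return np.mean(h(x))

t0, dt = 1.3, 1e-5
I = lambda t: integral_over_A(lambda x: np.sin(x * t))

# d/dt of the integral vs. the integral of the t-partial derivative at t0.
difference_quotient = (I(t0 + dt) - I(t0 - dt)) / (2 * dt)
integral_of_partial = integral_over_A(lambda x: x * np.cos(x * t0))
print(difference_quotient, integral_of_partial)   # nearly equal, as [124](i) predicts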

9. Multivariable Riemann integration over a Jordan measurable set

As we go through the finer details of Riemann integration theory (such as the theory of Jordan

measurable sets), we will also see some of its disadvantages. To a certain extent, these disadvantages

are rectified in Lebesgue integration theory by the use of Lebesgue measurable sets (which we will

not discuss here).

Exercise-28: Let A, E ⊂ Rn be n-boxes with A ∩ E ≠ ∅. Let f : A ∪ E → R be a bounded function such that f|A ∈ R(A) and f(x) = 0 for every x ∈ A∆E = (A ∪ E) \ (A ∩ E). Then f|E ∈ R(E) and ∫E f = ∫A f. [Hint: Observe that ∂(A ∩ E) is a closed null set in Rn. In view of Exercise-25(iv), we may modify f on ∂(A ∩ E) and also suppose that f(x) = 0 for every x ∈ ∂(A ∩ E). Choose a sequence (Pn) of partitions of A such that lim_{n→∞} L(f|A, Pn) = lim_{n→∞} U(f|A, Pn) = ∫A f. By refining the Pn's, assume that there is a sequence (Qn) of partitions of A ∩ E such that Pn is an extension of Qn. Now choose a sequence (P̃n) of partitions of E such that P̃n is an extension of Qn. Since f ≡ 0 outside int(A ∩ E), we get L(f|E, P̃n) = L(f|A∩E, Qn) = L(f|A, Pn) and U(f|E, P̃n) = U(f|A∩E, Qn) = U(f|A, Pn) for every n ∈ N. It follows that L(f|E) = ∫A f = U(f|E).]

Page 42: MULTIVARIABLE CALCULUS Contents...MULTIVARIABLE CALCULUS T.K.SUBRAHMONIAN MOOTHATHU Contents 1. A few remarks about Rn 2 2. Multivariable differentiation: definitions 6 3. Multivariable

42 T.K.SUBRAHMONIAN MOOTHATHU

Definition: Let X ⊂ Rn be a bounded set and f : X → R be a function.

(i) We define fX : Rn → R as fX(x) = f(x) for x ∈ X, and fX(x) = 0 otherwise.

(ii) We say f is Riemann integrable over X if there is an n-box A ⊂ Rn such that X ⊂ A and fX ∈ R(A). If this holds, we define ∫X f := ∫A fX and we write f ∈ R(X). Note that this definition is independent of the particular choice of an n-box A because of Exercise-28.

Exercise-29: Let X ⊂ Rn be a bounded set, and f, g ∈ R(X). Then,

(i) af + bg ∈ R(X) and ∫X(af + bg) = a∫X f + b∫X g for every a, b ∈ R.

(ii) fg ∈ R(X).

(iii) If f ≥ 0, then ∫X f ≥ 0; and if f ≥ g, then ∫X f ≥ ∫X g.

(iv) If f ≥ 0, then ∫X f ≥ ∫X0 f whenever X0 ⊂ X and f ∈ R(X0).

(v) |f| ∈ R(X) and |∫X f| ≤ ∫X |f|.

(vi) max{f, g}, min{f, g} ∈ R(X). In particular, f+, f− ∈ R(X).

(vii) If X is a null set in Rn, then ∫X f = 0.

(viii) If {x ∈ X : f(x) ≠ g(x)} is a null set in Rn, then ∫X f = ∫X g.

[Hint : Consider an n-box A ⊃ X. Apply the usual properties of integration to fX and gX over A.]

Definition: Let X ⊂ Rn be a bounded set and A ⊂ Rn be an n-box containing X. Let D1, . . . , Dk

be the sub n-boxes of A determined by a partition P of A. Put Γ1 = {1 ≤ i ≤ k : Di ⊂ X} and

Γ2 = {1 ≤ i ≤ k : Di ∩ X ≠ ∅}. Define LJ(X, P) = ∑_{i∈Γ1} µ(Di) and UJ(X, P) = ∑_{i∈Γ2} µ(Di)

(where J stands for Jordan). Next define LJ(X) and UJ(X) with respect to A as LJ(X) =

sup{LJ(X,P ) : P is a partition of A} and UJ(X) = inf{UJ(X,P ) : P is a partition of A}. These

quantities attempt to approximate the n-dimensional volume of X from inside and outside. Observe

that LJ(X,P ) = L(1X , P ), UJ(X,P ) = U(1X , P ), LJ(X) = L(1X), and UJ(X) = U(1X).

[125] Let X ⊂ Rn be a bounded set. Then the following are equivalent:

(i) 1X ∈ R(A) for some n-box A ⊂ Rn containing X.

(ii) 1X ∈ R(A) for every n-box A ⊂ Rn containing X.

(iii) LJ(X) = UJ(X) with respect to some n-box A ⊂ Rn containing X.

(iv) LJ(X) = UJ(X) with respect to every n-box A ⊂ Rn containing X.

(v) ∂X is a null set in Rn.

Proof. We get (i) ⇔ (iii), and (ii) ⇔ (iv) because LJ(X) = L(1X) and UJ(X) = U(1X). Moreover,

we have (i) ⇔ (ii) because of Exercise-28.

(ii) ⇒ (v) ⇒ (i): Let A be an n-box with X ⊂ int(A). Then {x ∈ A : 1X is not continuous at x} =

∂X. Now use Lebesgue's criterion [122]. □

Page 43: MULTIVARIABLE CALCULUS Contents...MULTIVARIABLE CALCULUS T.K.SUBRAHMONIAN MOOTHATHU Contents 1. A few remarks about Rn 2 2. Multivariable differentiation: definitions 6 3. Multivariable

MULTIVARIABLE CALCULUS 43

Definition: Let X ⊂ Rn be a bounded set. If the constant function 1 is Riemann integrable over X,

i.e., if the indicator function 1X ∈ R(A) for some n-box A containing X, then we say the set X is Jordan measurable (in some textbooks, [127](ii) is taken as the definition of Jordan measurability), and we define the Jordan measure µ(X) of X as µ(X) = ∫X 1 dx = ∫A 1X dx.

[126] (i) The definition of Jordan measurability (and Jordan measure) of a bounded set X ⊂ Rn

is independent of the particular choice of an n-box A containing X because of Exercise-28.

(ii) By [125], a bounded set X ⊂ Rn is Jordan measurable ⇔ ∂X is a null set in Rn ⇔ LJ(X) =

UJ(X) with respect to some/every n-box A ⊂ Rn containing X.

(iii) If X itself is an n-box, then (by taking A = X, we may see that) X is Jordan measurable and

its Jordan measure coincides with its n-dimensional volume.

(iv) If X ⊂ Rn is Jordan measurable, then ∫X c = c∫A 1X = cµ(X) for every c ∈ R; in particular,

by taking c = 1, we observe using Exercise-25(i) that µ(X) = 0 iff X is a null set (i.e., its Lebesgue

measure is zero) in Rn.

[127] Let X ⊂ Rn be a bounded set.

(i) If X is Jordan measurable, then so are int(X) and the closure X̄, and µ(int(X)) = µ(X) = µ(X̄).

(ii) X is Jordan measurable ⇔ LJ(int(X)) = UJ(X) with respect to some/every n-box A ⊃ X.

(iii) Suppose X = X1 ∪ X2. Let f : X → R be Riemann integrable over X1 and X2. Then f is Riemann integrable over X and X1 ∩ X2, and ∫X f = ∫X1 f + ∫X2 f − ∫X1∩X2 f. In particular (by taking f : X → R to be f ≡ 1), if there are Jordan measurable sets X1, X2 ⊂ Rn with X = X1 ∪ X2, then X and X1 ∩ X2 are Jordan measurable and µ(X) = µ(X1) + µ(X2) − µ(X1 ∩ X2).

(iv) Suppose X = ∪_{i=1}^{k} Xi, a finite union, and Xi ∩ Xj is a null set in Rn for every i ≠ j. If f : X → R is Riemann integrable over each Xi, then f ∈ R(X) and ∫X f = ∑_{i=1}^{k} ∫Xi f.

(v) Assume X is Jordan measurable, and let Y ⊂ Rn be Jordan measurable. Then X ∩ Y and X \ Y are Jordan measurable. Moreover, if f ∈ R(X), then fX ∈ R(Y).

Proof. Let A ⊂ Rn be an n-box with X ⊂ A.

(i) int(X) and X̄ are Jordan measurable by [125] because ∂(int(X)) ⊂ ∂X and ∂(X̄) ⊂ ∂X. Next note that ∫A 1_{int(X)} = ∫A 1_X = ∫A 1_{X̄} by Exercise-25(ii).

(ii) ⇒: By (i) and [125], LJ(int(X)) = ∫A 1_{int(X)} = ∫A 1_X = UJ(X). ⇐: Clear by [125].

(iii) By considering f+ and f− separately, assume f ≥ 0. Then fX = max{fX1 , fX2}, fX1∩X2 =

min{fX1 , fX2}, and fX = fX1 + fX2 − fX1∩X2 . Now use Exercise-29(vi) and Exercise-29(i).

(iv) We may suppose k = 2; the general case can be proved by a repeated application of this case.

When k = 2, the result follows from (iii) and Exercise-29(viii).


(v) X ∩ Y is Jordan measurable by (iii). Since ∂(X \ Y ) ⊂ ∂X ∪ ∂Y , it follows by [126](ii) that

X \ Y is also Jordan measurable. Now suppose f ∈ R(X). Let A be an n-box with X ∪ Y ⊂

int(A). Then fX ∈ R(A) and hence {x ∈ A : fX is not continuous at x} is a null set by [122].

Since (fX)Y = fX∩Y , and since {x ∈ A : fX∩Y is not continuous at x} ⊂ ∂X ∪ ∂Y ∪ {x ∈ X :

fX is not continuous at x}, it follows that (fX)Y ∈ R(A) by [122]. Hence fX ∈ R(Y). □

[128] Let X ⊂ Rn be a bounded set.

(i) Let (fn) be a sequence in R(X) converging uniformly to a function f : X → R. Then f ∈ R(X)

and ∫X f = lim_{n→∞} ∫X fn.

(ii) If X is Jordan measurable and f : X → R is a bounded continuous function, then f ∈ R(X). In particular, if X is a Jordan measurable compact set and f : X → R is continuous, then f ∈ R(X).

(iii) Let f ∈ R(X), g : X → R be a bounded function, and X0 = {x ∈ X : f(x) ≠ g(x)}. If X0 is Jordan measurable with µ(X0) = 0, then g ∈ R(X) and ∫X f = ∫X g.

Proof. Let A ⊂ Rn be an n-box with X ⊂ A.

(i) Apply Exercise-22(ii) to fX over A.

(ii) ∂X is a null set by hypothesis, and {x ∈ A : fX is not continuous at x} ⊂ ∂X. Hence fX ∈

R(A) by Lebesgue’s criterion [122](i).

(iii) By [127](i), the closure X̄0 is also Jordan measurable and µ(X̄0) = µ(X0) = 0. Note that {x ∈ A : fX(x) ≠ gX(x)} ⊂ X̄0 ∪ ∂X, and the set on the right hand side is a closed null set in Rn. Now apply Exercise-25(iv). □

Example: (i) X := [0, 1] ∩ Q is not Jordan measurable because ∂X = [0, 1] is not a null set in R.

This example shows also that a bounded set which is a countable union of Jordan measurable sets

need not be Jordan measurable.

(ii) We will construct a bounded open set X in R which is not Jordan measurable. Let ε ∈ (0, 1/2)

and {xn : n ∈ N} ⊂ (0, 1) be a dense subset of [0, 1]. For each n ∈ N, choose an open interval

Jn ⊂ (0, 1) containing xn with µ(Jn) < ε/2^n, and put X = ∪_{n=1}^{∞} Jn, which is an open set in R with X̄ = [0, 1]. If ∂X is a null set, there is a sequence (J̃n) of open intervals with ∂X ⊂ ∪_{n=1}^{∞} J̃n and ∑_{n=1}^{∞} µ(J̃n) < ε. Then {Jn : n ∈ N} ∪ {J̃n : n ∈ N} is an open cover for the compact set X ∪ ∂X = [0, 1]. Extract a finite subcover {Jn : 1 ≤ n ≤ p} ∪ {J̃n : 1 ≤ n ≤ p}. Then 1 = µ([0, 1]) ≤ ∑_{n=1}^{p} µ(Jn) + ∑_{n=1}^{p} µ(J̃n) ≤ ε + ε = 2ε < 1, a contradiction (here we used: if J1, . . . , Jp are intervals covering an interval J, then µ(J) ≤ ∑_{n=1}^{p} µ(Jn)).

(iii) If X is as in (ii) above, and K = [0, 1] \ X, then K is a compact set which is not Jordan

measurable because ∂K = ∂X.


(iv) There are path connected Jordan measurable sets which are not Borel sets. Let n ≥ 2, and

A ⊂ Rn be an n-box with int(A) ̸= ∅. Since ∂A is an uncountable compact set, there is a non-Borel

set Y ⊂ ∂A (∵ the cardinality of the collection of Borel subsets of Rn is equal to that of R whereas

the cardinality of the power set P(∂A) is equal to that of P(R)). Let X = int(A)∪ Y , which is not

a Borel set. But X is Jordan measurable because ∂X ⊂ ∂A, and clearly X is path connected.

Remark: (i) Every Jordan measurable set X ⊂ Rn is a (bounded) Lebesgue measurable set because

X = int(X)∪ (X ∩∂X), where int(X) is an open set and hence Lebesgue measurable, and X ∩∂X

is also Lebesgue measurable because it is a subset of the null set ∂X.

(ii) Let X ⊂ Rn be a bounded set. The existence of some f ∈ R(X) does not imply the Jordan measurability of X: trivially, 0 ∈ R(X) for every bounded set X.

Definition: Let U, V ⊂ Rn be open. A function g : U → V is called a C1-diffeomorphism if g is

bijective and both g and g−1 are C1-functions. Note that if g : U → V is a bijective C1-function

with det(Jg(x)) ̸= 0 for every x ∈ U , then g is a C1-diffeomorphism by Inverse function theorem.

Exercise-30: Let U, V ⊂ Rn be open, and g : U → V be a C1-diffeomorphism.

(i) Let X ⊂ Rn be Jordan measurable with X̄ ⊂ U. Then g(int(X)) = int(g(X)), g(∂X) = ∂(g(X)), g(X̄) equals the closure of g(X), and g(X) is a Jordan measurable set whose closure is contained in V.

(ii) If U, V are Jordan measurable and f ∈ R(V), then f ◦ g ∈ R(U).

[Hint: (i) Since g is in particular a homeomorphism, we get g(X̄) = the closure of g(X), which is contained in V, g(int(X)) = int(g(X)), and g(∂X) = ∂(g(X)). The first equality implies the closure of g(X) is compact, and hence g(X) is bounded. By [121](ii), we see ∂(g(X)) = g(∂X) is a null set in Rn. Now [126](ii) can be used. (ii) Let Y = {y ∈ V : f is not continuous at y}. Note that {x ∈ Rn : (f ◦ g)U is not continuous at x} ⊂ ∂U ∪ g−1(Y) and use [122].]

Exercise-31: Let X = {(x, y) ∈ R2 : a ≤ x ≤ b and ϕ(x) ≤ y ≤ ψ(x)}, where ϕ, ψ : [a, b] → R are

continuous with ϕ ≤ ψ. Then, (i) X is compact and Jordan measurable.

(ii) If f : X → R is continuous, then f ∈ R(X) and ∫X f = ∫_a^b (∫_{ϕ(x)}^{ψ(x)} f(x, y)dy)dx.

[Hint: (i) ∂X is a null set in R2 because the graphs of ϕ and ψ are null sets in R2. (ii) f ∈ R(X) by [128](ii). Now consider a 2-box (rectangle) A = [a, b] × [c, d] ⊃ X. For each x ∈ [a, b], the map y ↦ fX(x, y) is Riemann integrable over [c, d] (because it can be discontinuous at at most two points, ϕ(x) and ψ(x)), and ∫_c^d fX(x, y)dy = ∫_{ϕ(x)}^{ψ(x)} f(x, y)dy. By Fubini's theorem [123](ii), ∫X f = ∫A fX = ∫_a^b (∫_c^d fX(x, y)dy)dx = ∫_a^b (∫_{ϕ(x)}^{ψ(x)} f(x, y)dy)dx.]

Remark: Exercise-31 is useful in evaluating certain integrals. For example, let X = {(x, y) ∈ R2 : 0 ≤ x ≤ 1 and 0 ≤ y ≤ x^2} and f : X → R be f(x, y) = xy^2. Then ∫X f = ∫_0^1 (∫_0^{x^2} xy^2 dy)dx = ∫_0^1 (x^7/3) dx = 1/24. In certain cases, instead of bounding y with functions of x, we may bound x with functions of y and interchange the order of integration. For example, let X = {(x, y) ∈ R2 : 0 ≤ x ≤ 1 and x ≤ y ≤ 1} and f : X → R be f(x, y) = e^{y^2}. Then ∫X f = ∫_0^1 (∫_x^1 e^{y^2} dy)dx, but the inner integral is not easy to evaluate. However, note that X = {(x, y) ∈ R2 : 0 ≤ y ≤ 1 and 0 ≤ x ≤ y}, and hence ∫X f = ∫_0^1 (∫_0^y e^{y^2} dx)dy = ∫_0^1 y e^{y^2} dy = (1/2) ∫_0^1 e^t dt = (e − 1)/2 by the substitution t = y^2.
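The value (e − 1)/2 obtained by interchanging the order of integration can be confirmed numerically; the snippet below is a rough sketch (numpy assumed available, not part of the notes), summing f over a fine grid restricted to the triangle 0 ≤ x ≤ y ≤ 1.

import numpy as np

# f(x, y) = exp(y^2) on the triangle 0 <= x <= y <= 1; the exact value is (e - 1)/2.
n = 1500
h = 1.0 / n
pts = (np.arange(n) + 0.5) * h
X, Y = np.meshgrid(pts, pts, indexing="ij")
mask = X <= Y                          # keep only the triangle {x <= y}
approx = np.sum(np.exp(Y**2) * mask) * h * h
print(approx, (np.e - 1) / 2)          # approx is close to 0.85914...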

Exercise-32: Let X ⊂ Rn be a Jordan measurable compact set, and f : X × [c, d] → R be a function such that f(·, y) ∈ R(X) for each y ∈ [c, d], and ∂f/∂y : X × [c, d] → R is continuous. Then the function y ↦ ∫X f(x, y)dx from [c, d] to R is differentiable and (d/dy)(∫X f(x, y)dx) = ∫X (∂f/∂y)(x, y)dx. [Hint: ∂f/∂y ∈ R(X) by [128](ii). Now imitate the proof of [124].]

10. Change of variable

We may write the Change of variable formula in one-variable theory in the following form:

Exercise-33: Let f : [a, b] → R be Riemann integrable and g : [c, d] → [a, b] be a surjective C1-function with g′ non-vanishing. Then ∫_a^b f(t)dt = ∫_c^d f(g(x)) |g′(x)| dx. [Hint: We have ∫_{g(c)}^{g(d)} f(t)dt = ∫_c^d f(g(x)) g′(x) dx by the standard Change of variable formula. Since g′ is non-vanishing, either g′ > 0 or g′ < 0 on [c, d]. If g′ > 0, then g(c) = a, g(d) = b, and |g′(x)| = g′(x). If g′ < 0, then g(c) = b, g(d) = a, and |g′(x)| = −g′(x).]

Our aim is to generalize Exercise-33 to higher dimensions by replacing the ‘local magnification

factor’ |g′(x)| with |det(Jg(x))|. The reason for | det(Jg(x))| to be the ‘local magnification factor’

in higher dimensions stems from the result [129] stated below.

Definition: An invertible linear map E : Rn → Rn is said to be an elementary linear map if it is

one of the following three types:

Type-1 : ∃ j ∈ {1, . . . , n} and λ ∈ R \ {0} with E(ej) = λej and E(ek) = ek for every k ̸= j.

Type-2 : ∃ i ̸= j in {1, . . . , n} with E(ei) = ej , E(ej) = ei, and E(ek) = ek for every k ̸= i, j.

Type-3 : ∃ i ̸= j in {1, . . . , n} with E(ej) = ei + ej and E(ek) = ek for every k ̸= j.

Exercise-34: [Fact from Linear Algebra] Every invertible linear map L : Rn → Rn can be written

as a finite product of elementary linear maps. [Hint : For the corresponding result in terms of

matrices, see Theorem 12 in Section 1.6 of Hoffman and Kunze, Linear Algebra.]

Convention: Let L ∈ L(Rn,Rn). Then L′(x; ·) = L, and hence JL(x) is equal to the matrix of L

with respect to the standard basis of Rn for each x ∈ Rn. Identifying L with its matrix, we will

write det(L) to mean the determinant of the matrix of L; with this convention, det(L) = det(JL(x))

for every x ∈ Rn.

[129] Let X ⊂ Rn be Jordan measurable.


(i) If E ∈ L(Rn,Rn) is an elementary linear map, then µ(E(X)) = | det(E)|µ(X).

(ii) If L ∈ L(Rn,Rn) is invertible, then µ(L(X)) = | det(L)|µ(X).

Proof. Note that E(X) and L(X) are Jordan measurable because of Exercise-30(i).

(i) Since X is Jordan measurable, we have LJ(X) = UJ(X) with respect to any n-cube containing

X by [126](ii). Hence X can be approximated with a finite union of n-cubes with pairwise disjoint

interiors, and therefore, we may suppose X itself is an n-cube. Since E(cx + y) = cE(x) + y for

c ∈ R \ {0} and x, y ∈ Rn, we may also suppose after a scaling and translation that X is the unit

cube in Rn. In particular, µ(X) = 1, and thus we need to just show µ(E(X)) = |det(E)|. Keep in mind that E(X) = {∑_{k=1}^{n} ck E(ek) : ck ∈ [0, 1] for every k} since X is the unit n-cube.

If E is of type-1, then there are λ ∈ R \ {0} and j ∈ {1, . . . , n} such that E(ej) = λej and E(ek) = ek for every k ≠ j. Hence |det(E)| = |λ|. As E(X) is an n-box whose jth edge has length |λ| and whose other edges have unit length, we conclude µ(E(X)) = |λ| = |det(E)|. If E is of type-2, then |det(E)| = 1 and E(X) = X so that µ(E(X)) = 1 = µ(X). If E is of type-3, then there are i ≠ j in {1, . . . , n} such that E(ej) = ei + ej and E(ek) = ek for every k ≠ j. Then |det(E)| = 1. In the xixj-plane, E maps the unit square to the parallelogram with vertices 0, ei, ei + ej, and 2ei + ej, whose area is 1. Consequently, µ(E(X)) = 1 = |det(E)| in this case also.

(ii) Write L = E1 · · · Ep, a finite product of elementary linear maps; we will use induction on p. The case p = 1 is covered by (i). Put L0 = E1 · · · Ep−1 so that L = L0Ep. Since Y := Ep(X) is also Jordan measurable, we get by the induction assumption for p − 1 that µ(L(X)) = µ(L0(Y)) = |det(L0)|µ(Y). Now, µ(Y) = µ(Ep(X)) = |det(Ep)|µ(X) by (i), and det(L0) = det(E1) · · · det(Ep−1). It follows that µ(L(X)) = ∏_{i=1}^{p} |det(Ei)| µ(X) = |det(L)|µ(X). □
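A Monte Carlo sketch of [129](ii) (illustrative only; numpy assumed available, and the particular matrix L is an arbitrary choice, not from the notes): the Jordan measure of L([0, 1]^2), estimated by random sampling of a bounding box, should be close to |det(L)|.

import numpy as np

rng = np.random.default_rng(0)
L = np.array([[2.0, 1.0],
              [0.5, 1.5]])            # an invertible linear map of R^2
detL = abs(np.linalg.det(L))          # exact value of mu(L([0,1]^2)) by [129](ii)

# Estimate mu(L(X)) for X = [0, 1]^2: sample a bounding box of L(X) uniformly
# and count the points whose preimage under L lies in X.
corners = L @ np.array([[0, 1, 0, 1], [0, 0, 1, 1]], dtype=float)
lo, hi = corners.min(axis=1), corners.max(axis=1)
N = 200_000
pts = rng.uniform(lo, hi, size=(N, 2)).T
pre = np.linalg.solve(L, pts)                     # preimages L^{-1}(points)
inside = np.all((pre >= 0) & (pre <= 1), axis=0)
box_area = np.prod(hi - lo)
print(inside.mean() * box_area, detL)             # both are close to |det L| = 2.5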

[130] [Linear change of variable] Let L ∈ L(Rn,Rn) be invertible, and X ⊂ Rn be Jordan measur-

able. If f : L(X) → R is Riemann integrable, then F : X → R defined as F (x) = f(L(x))| det(L)|

is Riemann integrable over X, and ∫_{L(X)} f = ∫X F.

Proof. By considering f+ and f− separately, assume f ≥ 0. Let A ⊂ Rn be an n-box with

X ⊂ int(A). Observe that Y := L(X) is Jordan measurable by Exercise-30(i). Let fY , FX : Rn → R

be the extended functions which are zero respectively outside Y and X. The set C := {c ∈ Rn :

fY is not continuous at c} is a null set in Rn by [122] because f ∈ R(Y ) by hypothesis. Since

{x ∈ A : FX is not continuous at x} ⊂ L−1(C), and L−1(C) is a null set in Rn by [121](ii) (applied

to L−1), we deduce that F ∈ R(X) by [122].

Let D1, . . . , Dk be the sub n-boxes of A determined by a partition P of A. Then L(Di) is Jordan

measurable for every i. Moreover if i ̸= j, then Di ∩Dj is a null set in Rn, and therefore L(Di) ∩

L(Dj) = L(Di∩Dj) is also a null set in Rn by the invertibility of L and [121](ii). Now using [127](iv)


and [129], we see that ∫Y f = ∫_{L(A)} fY = ∑_{i=1}^{k} ∫_{L(Di)} fY ≤ ∑_{i=1}^{k} sup(fY(L(Di))) µ(L(Di)) = ∑_{i=1}^{k} sup(fY(L(Di))) |det(L)| µ(Di) = ∑_{i=1}^{k} sup(FX(Di)) µ(Di) = U(FX, P). Similarly, L(FX, P) ≤ ∫Y f. As P is an arbitrary partition of A, and F ∈ R(X), it follows that ∫X F = ∫Y f. □

One advantage of n-cubes over n-dimensional balls is that subsets of Rn can be covered with

finitely many or countably many n-cubes of the same size with pairwise disjoint interiors (whereas

any covering using balls will have overlapping of the balls in general, which makes it difficult to add

up estimates from different balls). We need to make an estimate about the volume of the image of

an n-box under a C1-map. For this purpose, it is convenient to use certain special norms:

Definition: The supremum norm of x ∈ Rn is defined as ∥x∥∞ = max{|xj| : 1 ≤ j ≤ n}. For a linear map T : Rn → Rn with matrix [tij], let ∥T∥0 = max{∑_{j=1}^{n} |tij| : 1 ≤ i ≤ n}.

Exercise-35: The quantity ∥ · ∥0 defined above is a norm on the vector space L(Rn,Rn) ∼= Rn×n

with ∥ − T∥0 = ∥T∥0 and ∥I∥0 = 1. Moreover,

(i) ∥Tx∥∞ ≤ ∥T∥0∥x∥∞ for every T ∈ L(Rn,Rn) and x ∈ Rn.

(ii) For every T ∈ L(Rn,Rn), there is x ∈ Rn with ∥x∥∞ = 1 and ∥Tx∥∞ = ∥T∥0.

(iii) ∥S ◦ T∥0 ≤ ∥S∥0∥T∥0 for every S, T ∈ L(Rn,Rn).

[Hint: (i) |∑_{j=1}^{n} tij xj| ≤ ∑_{j=1}^{n} |tij| ∥x∥∞ for 1 ≤ i ≤ n. (ii) Let i ∈ {1, . . . , n} be such that ∑_{j=1}^{n} |tij| = ∥T∥0. Define x ∈ Rn as xj = 1 if tij ≥ 0 and xj = −1 if tij < 0. (iii) Choose x ∈ Rn with ∥x∥∞ = 1 and ∥(S ◦ T)x∥∞ = ∥S ◦ T∥0. Then ∥S ◦ T∥0 = ∥(S ◦ T)x∥∞ ≤ ∥S∥0 ∥Tx∥∞ ≤ ∥S∥0 ∥T∥0 ∥x∥∞ = ∥S∥0 ∥T∥0.]

Exercise-36: Let U ⊂ Rn be open, f = (f1, . . . , fn) : U → Rn be a C1-function, D ⊂ U be a compact convex set, and λ = sup{∥Jf(c)∥0 : c ∈ D}. Then,

(i) λ < ∞, and ∑_{j=1}^{n} |(∂fi/∂xj)(c)| ≤ λ for every i ∈ {1, . . . , n} and every c ∈ D.

(ii) ∥f(b) − f(a)∥∞ ≤ λ∥a − b∥∞ for every a, b ∈ D.

[Hint: (i) λ < ∞ since f is a C1-function and D is compact. (ii) By Mean value theorem and the convexity of D, there are c1, . . . , cn ∈ [a, b] ⊂ D with fi(b) − fi(a) = ⟨∇fi(ci), b − a⟩ = ∑_{j=1}^{n} (∂fi/∂xj)(ci) · (bj − aj) for 1 ≤ i ≤ n. Hence |fi(b) − fi(a)| ≤ ∑_{j=1}^{n} |(∂fi/∂xj)(ci)| |bj − aj| ≤ λ∥b − a∥∞.]

[131] Let U, V ⊂ Rn be open, g : U → V be a C1-diffeomorphism (i.e., g is bijective and g, g−1

are C1-functions), and A ⊂ U be compact. Then for every ε > 0, there is δ > 0 such that

µ(g(D)) ≤ (1+ε)n| det(Jg(x))|µ(D) for every n-cube D ⊂ A with side-length < δ and every x ∈ D.

Proof. Define Φ : A × A → Rn×n ∼= L(Rn,Rn) as Φ(x, y) = Jg(x)−1Jg(y). Then Φ is continuous

with respect to the operator norm of L(Rn,Rn) because g is a C1-function and the operation of

inversion is continuous in L(Rn,Rn) by Exercise-15(ii). Since any two norms are equivalent on a

finite dimensional space by [101], the map Φ is also continuous with respect to the supremum norm

∥ · ∥∞ on Rn and the norm ∥ · ∥0 on Rn×n defined above as ∥[tij]∥0 = max{∑_{j=1}^{n} |tij| : 1 ≤ i ≤ n}.


Since Φ(x, x) = I and since A × A is compact also with respect to the supremum norm, we may

find δ > 0 such that ∥Jg(x)−1Jg(y)∥0 < 1 + ε for every x, y ∈ A with ∥x− y∥∞ < δ.

Consider an n-cube D ⊂ A whose side-length (say) r is < δ, and fix x ∈ D. Now, L := Jg(x) :

Rn → Rn is an invertible linear map, f := L−1 ◦ g : U → L−1(V ) is a C1-diffeomorphism, and

g = L ◦ f . Applying [129](ii) to the Jordan measurable set X := f(D), we obtain µ(g(D)) =

µ(L(f(D))) = | det(L)|µ(f(D)) = |det(Jg(x))|µ(f(D)).

Since the side-length r of D is < δ, we get ∥x − y∥∞ ≤ r < δ for every y ∈ D. Hence

∥Jg(x)−1Jg(y)∥0 < 1+ ε for every y ∈ D by the choice of δ. Observe that Jf (y) = Jg(x)−1Jg(y) by

the definition of f, and hence λ := sup{∥Jf(y)∥0 : y ∈ D} ≤ 1 + ε. Let a ∈ D be the center of D, and consider b ∈ D. Note that ∥b − a∥∞ ≤ r/2. By Exercise-36, ∥f(b) − f(a)∥∞ ≤ λ∥b − a∥∞ ≤ (1 + ε)r/2. Therefore, f(D) is contained in an n-cube with center f(a) and side-length (1 + ε)r. This implies µ(f(D)) ≤ (1 + ε)^n r^n = (1 + ε)^n µ(D). Combining this with the estimate of the previous paragraph, we conclude µ(g(D)) ≤ (1 + ε)^n |det(Jg(x))| µ(D). □

[132] Let U, V ⊂ Rn be open sets and g : U → V be a C1-diffeomorphism.

(i) [Preparatory step] Let A ⊂ Rn be an n-cube, f : g(A) → R be Riemann integrable, and f ≥ 0. Then F : A → R defined as F(x) = f(g(x)) |det(Jg(x))| is Riemann integrable over A and ∫_{g(A)} f ≤ ∫A F.

(ii) [Change of variable theorem - version 1] Let X ⊂ Rn be a Jordan measurable set with X̄ ⊂ U and f : g(X) → R be Riemann integrable. Then F : X → R defined as F(x) = f(g(x)) |det(Jg(x))| is Riemann integrable over X and ∫_{g(X)} f = ∫X F.

(iii) [Change of variable theorem - version 2] Assume the open sets U, V are Jordan measurable, x ↦ det(Jg(x)) is bounded on U, and f : V → R is Riemann integrable. Then F : U → R defined as F(x) = f(g(x)) |det(Jg(x))| is Riemann integrable over U and ∫V f = ∫U F.

Proof. (i) As in the proof of [130], we may see that g(A) is Jordan measurable and F ∈ R(A). Let

ε > 0. By [131], there is δ > 0 such that µ(g(D)) ≤ (1 + ε)^n |det(Jg(x))| µ(D) for every n-cube D ⊂ A with side-length < δ and every x ∈ D. Choose a partition P of A such that the sub n-boxes D1, . . . , Dk of A determined by P are n-cubes with side-length < δ, and such that U(F, P) ≤ ε + ∫A F. Since g is a C1-diffeomorphism, g(Di) is Jordan measurable for every i, and g(Di) ∩ g(Dj) = g(Di ∩ Dj) is a null set for every i ≠ j by [121](ii). Therefore, ∫_{g(A)} f = ∑_{i=1}^{k} ∫_{g(Di)} f ≤ ∑_{i=1}^{k} sup(f(g(Di))) µ(g(Di)) ≤ ε + ∑_{i=1}^{k} f(g(xi)) µ(g(Di)) for some choice of points xi ∈ Di.

By the choice of δ, we have µ(g(Di)) ≤ (1 + ε)^n |det(Jg(xi))| µ(Di) for 1 ≤ i ≤ k. This can be combined with the previous inequality because f ≥ 0. Thus we get
∫_{g(A)} f ≤ ε + (1 + ε)^n ∑_{i=1}^{k} f(g(xi)) |det(Jg(xi))| µ(Di) = ε + (1 + ε)^n ∑_{i=1}^{k} F(xi) µ(Di) ≤ ε + (1 + ε)^n U(F, P) ≤ ε + (1 + ε)^n (ε + ∫A F).
Since ε > 0 is arbitrary, we deduce that ∫_{g(A)} f ≤ ∫A F.

(ii) By considering f+ and f− separately, assume f ≥ 0. Let Y = g(X). The continuous map x ↦ |det(Jg(x))| is bounded on the compact set X̄, and therefore F is bounded. Let fY, FX : Rn → R be the extended functions which are zero respectively outside Y and X. Since X̄ ⊂ U, there are finitely many n-cubes A1, . . . , Ak with pairwise disjoint interiors such that the set K := ∪_{i=1}^{k} Ai satisfies X̄ ⊂ K ⊂ U. Since X and Ai are Jordan measurable, the sets Y = g(X) and g(Ai) are Jordan measurable. As f ∈ R(Y), it follows that fY ∈ R(g(Ai)) by [127](v). Applying part (i) to fY and FX, we get that FX ∈ R(Ai) and ∫_{g(Ai)} fY ≤ ∫_{Ai} FX for 1 ≤ i ≤ k. It follows that FX ∈ R(K) and ∫_{g(K)} fY ≤ ∫K FX since Ai ∩ Aj and g(Ai) ∩ g(Aj) are null sets for i ≠ j (see [127](iv)). This implies F ∈ R(X) and ∫Y f ≤ ∫X F. Now observe that if x ∈ X and y = g(x), then det(Jg−1(y)) = 1/det(Jg(x)), and hence f(y) = F(g−1(y)) |det(Jg−1(y))|. This allows us to interchange the roles of f and F (and interchange g and g−1) to establish the reverse inequality ∫X F ≤ ∫Y f. Thus ∫Y f = ∫X F.

(iii) By considering f+ and f− separately, assume f ≥ 0. By Exercise-30(ii), f ◦ g ∈ R(U). The bounded continuous function x ↦ |det(Jg(x))| is Riemann integrable over U by [128](ii). Hence F ∈ R(U), and also F ≥ 0. Using [120], we may write U = ∪_{j=1}^{∞} Kj, where the Kj's are Jordan measurable compact sets with Kj ⊂ int(Kj+1) for every j ∈ N. Let Yj = g(Kj). Then V = ∪_{j=1}^{∞} Yj, where the Yj's are Jordan measurable compact sets with Yj ⊂ int(Yj+1) for every j ∈ N. By (ii), we have f ∈ R(Yj), F ∈ R(Kj), and ∫_{Yj} f = ∫_{Kj} F for every j ∈ N. From this we may deduce ∫V f = lim_{j→∞} ∫_{Yj} f = lim_{j→∞} ∫_{Kj} F = ∫U F as follows.

Consider ε > 0. Choose a partition P of an n-box A ⊃ U with (∫U F) − ε < L(FU, P). If D1, . . . , Dk are the sub n-boxes of A determined by P and Γ = {1 ≤ i ≤ k : Di ⊂ U}, then inf(FU(Di)) = 0 for i ∉ Γ, and therefore L(FU, P) = ∑_{i∈Γ} inf(F(Di)) µ(Di) ≤ ∑_{i∈Γ} ∫_{Di} F = ∫K F, where K := ∪_{i∈Γ} Di. Since the compact set K ⊂ U = ∪_{j=1}^{∞} int(Kj+1), there is j with K ⊂ int(Kj+1). Hence (∫U F) − ε < L(FU, P) ≤ ∫K F ≤ ∫_{Kj+1} F. As ε > 0 is arbitrary and F ≥ 0, we deduce ∫U F ≤ lim_{j→∞} ∫_{Kj} F. Clearly, we also have ∫U F ≥ ∫_{Kj} F for every j, and hence ∫U F ≥ lim_{j→∞} ∫_{Kj} F. Thus ∫U F = lim_{j→∞} ∫_{Kj} F. Similarly, ∫V f = lim_{j→∞} ∫_{Yj} f. □

Remark: For two other proofs of the Change of variable theorem see (i) Chapter 4 of Munkres,

Analysis on Manifolds, and (ii) P.D. Lax, Change of variables in multiple integrals, American

Mathematical Monthly, 1999.

[133] Let U, V ⊂ Rn be open sets and g : U → V be a C1-diffeomorphism.


(i) Let X ⊂ Rn be a Jordan measurable set with X̄ ⊂ U. Then x ↦ |det(Jg(x))| is Riemann integrable over X and µ(g(X)) = ∫X |det(Jg(x))| dx.

(ii) Assume the open sets U, V are Jordan measurable, and x ↦ det(Jg(x)) is bounded on U. Then µ(V) = ∫U |det(Jg(x))| dx.

Proof. (i) Let f : g(X) → R be f ≡ 1, and note that f ∈ R(g(X)) by [128](ii). Now by [132](ii), we have µ(g(X)) = ∫_{g(X)} 1 = ∫_{g(X)} f = ∫X |det(Jg(x))| dx.

(ii) Apply [132](iii) to f : V → R defined as f ≡ 1. □

11. Polar, cylindrical, and spherical coordinates

One important use of the Change of variable theorem is in transforming Euclidean coordinates to polar, cylindrical, and spherical coordinates.

Definition: (i) [Polar coordinates in R2] Let U = {(r, θ) ∈ R2 : r > 0 and 0 < θ < 2π}, and

g : U → R2 be g(r, θ) = (r cos θ, r sin θ). Then V := g(U) = R2 \ {(x, 0) : x ≥ 0}, where

{(x, 0) : x ≥ 0} is a closed null set in R2. The function g : U → V is a bijective C1-function

with Jg(r, θ) = [cos θ  −r sin θ; sin θ  r cos θ] so that det(Jg(r, θ)) = r ≠ 0 for every (r, θ) ∈ U. Hence

g : U → V is a C1-diffeomorphism by Inverse function theorem. If (x, y) ∈ V and (x, y) = g(r, θ),

then (r cos θ, r sin θ) is said to be the polar coordinate representation of (x, y). Here note that

r2 = x2 + y2, and θ is the angle (measured in the anticlockwise direction) from the positive x-axis

to the line segment joining (0, 0) and (x, y).

(ii) [Cylindrical coordinates in R3] Let U = {(r, θ, z) ∈ R3 : r > 0, 0 < θ < 2π, and z ∈ R} and

g : U → R3 be g(r, θ, z) = (r cos θ, r sin θ, z) (this means using polar coordinates in the xy-plane

and keeping the z-coordinate unchanged). Let V = R3 \ {(x, 0, z) : x ≥ 0 and z ∈ R}. Then

{(x, 0, z) : x ≥ 0 and z ∈ R} is a closed null set in R3, and g : U → V is a bijective C1-function

with Jg(r, θ, z) = [cos θ  −r sin θ  0; sin θ  r cos θ  0; 0  0  1] so that det(Jg(r, θ, z)) = r ≠ 0 for every (r, θ, z) ∈ U. Hence

g : U → V is a C1-diffeomorphism. If (x, y, z) ∈ V and (x, y, z) = g(r, θ, z), then (r cos θ, r sin θ, z)

is said to be the cylindrical coordinate representation of (x, y, z). Here note that r2 = x2 + y2, and

θ is the angle (measured in the anticlockwise direction) from the positive x-axis to the line segment

joining (0, 0, 0) and (x, y, 0).

(iii) [Spherical coordinates in R3] Note that A := {(x, 0, z) : x ≥ 0 and z ∈ R} is a closed null set

in R3. Let V = R3 \ A, and consider (x, y, z) ∈ V . Define r > 0 and t > 0 by the conditions

that r2 = x2 + y2 + z2 and t2 = x2 + y2. In the xy-plane, we may use polar coordinates and write


(x, y, 0) = (t cos θ, t sin θ), where θ ∈ (0, 2π) is the angle (measured in the anticlockwise direction)

from the positive x-axis to the line segment joining (0, 0, 0) and (x, y, 0). Let η ∈ (0, π) be the angle

between the positive z-axis and the line segment joining (0, 0, 0) and (x, y, z). Then z = r cos η and

t = r sin η so that (x, y, z) = (r cos θ sin η, r sin θ sin η, r cos η).

Let U = {(r, θ, η) ∈ R3 : r > 0, 0 < θ < 2π, and 0 < η < π} and g : U → V be g(r, θ, η) = (r cos θ sin η, r sin θ sin η, r cos η). Then g is a bijective C1-function with Jg(r, θ, η) = [cos θ sin η  −r sin θ sin η  r cos θ cos η; sin θ sin η  r cos θ sin η  r sin θ cos η; cos η  0  −r sin η] so that det(Jg(r, θ, η)) = −r^2 sin η ≠ 0 for every (r, θ, η)

in U . Hence g : U → V is a C1-diffeomorphism. If (x, y, z) ∈ V and (x, y, z) = g(r, θ, η), then

(r cos θ sin η, r sin θ sin η, r cos η) is said to be the spherical coordinate representation of (x, y, z).

Exercise-37: (i) Use polar coordinates to see that the area of B(0, λ) ⊂ R2 is πλ^2.

(ii) Use spherical coordinates to see that the volume of B(0, λ) ⊂ R3 is 4πλ^3/3.

[Hint: (i) Let U = (0, λ) × (0, 2π) and V = B(0, λ) \ {(x, 0) : x ≥ 0}. Note that {(x, 0) : x ≥ 0} is a closed null set in R2 and g : U → V given by g(r, θ) = (r cos θ, r sin θ) is a C1-diffeomorphism with det(Jg(r, θ)) = r. Hence by [133](ii), µ(B(0, λ)) = µ(V) = ∫U |det(Jg(r, θ))| = ∫_0^λ ∫_0^{2π} r dθ dr = πλ^2.

(ii) Let U = (0, λ) × (0, 2π) × (0, π), g : U → R3 be g(r, θ, η) = (r cos θ sin η, r sin θ sin η, r cos η), and V = g(U). Then V is equal to B(0, λ) minus a closed null set, and g : U → V is a C1-diffeomorphism with |det(Jg(r, θ, η))| = r^2 sin η. Hence by [133](ii), µ(B(0, λ)) = µ(V) = ∫U r^2 sin η = ∫_0^λ ∫_0^{2π} ∫_0^π r^2 sin η dη dθ dr = 4πλ^3/3.]
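A numerical companion to Exercise-37(ii) (a sketch, numpy assumed available, not part of the notes): a midpoint Riemann sum of |det(Jg)| = r^2 sin η over U = (0, λ) × (0, 2π) × (0, π) approximates the volume 4πλ^3/3.

import numpy as np

lam = 2.0
nr, nth, neta = 120, 120, 120
r = (np.arange(nr) + 0.5) * lam / nr
theta = (np.arange(nth) + 0.5) * 2 * np.pi / nth
eta = (np.arange(neta) + 0.5) * np.pi / neta
R, T, E = np.meshgrid(r, theta, eta, indexing="ij")

# Integrand |det Jg| = r^2 sin(eta) for the spherical coordinate map g.
jac = R**2 * np.sin(E)
cell = (lam / nr) * (2 * np.pi / nth) * (np.pi / neta)
print(np.sum(jac) * cell, 4 * np.pi * lam**3 / 3)   # both close to 33.51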

Example: We wish to evaluate ∫V f, where V = {(x, y) ∈ R2 : x > 0, y > 0, and x^2 + y^2 < λ^2}, and f : V → R is f(x, y) = x^2 y. Let U = (0, λ) × (0, π/2) and note that g : U → V given by g(r, θ) = (r cos θ, r sin θ) is a C1-diffeomorphism with det(Jg(r, θ)) = r. By [132](iii), ∫V f = ∫U f(g(r, θ)) |det(Jg(r, θ))| = ∫_0^λ ∫_0^{π/2} r^4 cos^2 θ sin θ dθ dr = ∫_0^λ ∫_0^1 r^4 t^2 dt dr = λ^5/15 (where t = cos θ).

[134] For n ∈ N and λ > 0, let v(n, λ) denote the n-dimensional volume of B(0, λ) ⊂ Rn. Then,

(i) v(n, λ) = λnv(n, 1).

(ii) v(n+ 2, 1) = 2πv(n, 1)/(n+ 2).

(iii) v(3, λ) = 4πλ3/3 and v(4, λ) = π2λ4/2.

Proof. (i) Let g : Rn → Rn be g(x) = λx. Then g is an invertible linear map (in particular a C1-diffeomorphism) with g(B(0, 1)) = B(0, λ). The matrix of g is a diagonal matrix whose diagonal entries are all equal to λ, and hence |det(Jg(x))| = λ^n for every x ∈ Rn. By [133](i), v(n, λ) = ∫_{B(0,1)} |det(Jg(x))| dx = ∫_{B(0,1)} λ^n = λ^n v(n, 1).

(ii) In R^{n+2}, put y = x_{n+1} and z = x_{n+2}. With the help of part (i), we see
v(n + 2, 1) = ∫_{y^2+z^2<1} (∫_{x_1^2+···+x_n^2<1−(y^2+z^2)} 1) dy dz = ∫_{y^2+z^2<1} v(n, √(1 − (y^2 + z^2))) dy dz = v(n, 1) ∫_{y^2+z^2<1} (1 − (y^2 + z^2))^{n/2} dy dz.
Now, applying [132](iii) to polar coordinates in the yz-plane, note that ∫_{y^2+z^2<1} (1 − (y^2 + z^2))^{n/2} dy dz = ∫_0^1 ∫_0^{2π} (1 − r^2)^{n/2} r dθ dr = 2π/(n + 2) by putting t = 1 − r^2.

(iii) Since v(1, 1) = 2, we get by (i) and (ii) that v(3, λ) = λ^3 v(3, 1) = λ^3 × 2πv(1, 1)/3 = 4πλ^3/3. Since v(2, 1) = π, we get by (i) and (ii) that v(4, λ) = λ^4 v(4, 1) = λ^4 × 2πv(2, 1)/4 = π^2 λ^4/2. □
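The recursion of [134](ii) can be unwound in a few lines (a sketch, not part of the notes; the closed form π^{n/2}/Γ(n/2 + 1) used for comparison is a standard fact, not proved here).

import math

# Unit-ball volumes from the recursion v(n+2,1) = 2*pi*v(n,1)/(n+2),
# starting from v(1,1) = 2 and v(2,1) = pi, cf. [134].
v = {1: 2.0, 2: math.pi}
for n in range(1, 9):
    v[n + 2] = 2 * math.pi * v[n] / (n + 2)

for n in range(1, 11):
    closed_form = math.pi ** (n / 2) / math.gamma(n / 2 + 1)  # standard formula
    print(n, round(v[n], 6), round(closed_form, 6))           # the two columns agree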

12. Line integrals

Line integral refers to the integral of a function f over a path α, and there are two types:

(i) line integrals of scalar fields (i.e., real-valued) f , and this line integral will be independent of

the orientation of the path α, and

(ii) line integrals of vector fields (i.e., vector-valued) f , and this line integral will be sensitive to

the orientation of the path α.

Discussion: Let U ⊂ Rn be open, f : U → R be continuous, and α : [a, b] → U be a C1-path. We

would like to define ∫α f in such a way that ∫α f is approximately equal to ∑_{j=1}^{k} f(α(aj)) ∥α(aj) − α(aj−1)∥ whenever P = {a0 ≤ a1 ≤ · · · ≤ ak} is a sufficiently refined partition of [a, b]. If α = (α1, . . . , αn), then by Mean value theorem there are cij ∈ (aj−1, aj) with αi(aj) − αi(aj−1) = αi′(cij)(aj − aj−1). Since α is a C1-path, we deduce that α(aj) − α(aj−1) ∼ α′(aj)(aj − aj−1). Hence ∫α f ∼ ∑_{j=1}^{k} f(α(aj)) ∥α′(aj)∥ (aj − aj−1), where the right hand side is a Riemann sum of the continuous real-valued function t ↦ f(α(t)) ∥α′(t)∥ on [a, b]. Hence we define:

Definition: [Line integral of a scalar field] Let U ⊂ Rn be open and f : U → R be continuous.

(i) If α : [a, b] → U is a C1-path, then we define ∫α f = ∫_a^b f(α(t)) ∥α′(t)∥ dt.

(ii) If α : [a, b] → U is a continuous path with the property that there is a partition P = {a0 ≤ a1 ≤ · · · ≤ ak} of [a, b] such that α[j] := α|[aj−1, aj] is a C1-path for each j, then we say α := ∑_{j=1}^{k} α[j] is a piecewise C1 path (for example, the parametrization of the boundary of a rectangle), and in this case we define ∫α f = ∑_{j=1}^{k} ∫_{α[j]} f.

Definition: The length l(α) of a C1-path α : [a, b] → Rn is defined as l(α) = ∫α 1 = ∫_a^b ∥α′(t)∥ dt (i.e., take f ≡ 1 in the line integral defined above). If α := ∑_{j=1}^{k} α[j] : [a, b] → Rn is a piecewise C1 path, its length is defined as l(α) = ∑_{j=1}^{k} l(α[j]).

Remark: Let U ⊂ R2 be open, and f : U → R be continuous with f ≥ 0. If α : [a, b] → U is a C1-path, then the quantity ∫α f = ∫_a^b f(α(t)) ∥α′(t)∥ dt gives the area of the 'curtain-shaped' region {(α(t), z) ∈ R3 : a ≤ t ≤ b and 0 ≤ z ≤ f(α(t))} bounded below by the image of α in the xy-plane and above by the corresponding portion of the graph of f (provided α traverses its image just once; in general the value also depends on the 'speed' of α, since overlapping pieces are counted with multiplicity).


Example: (i) Let α : [0, π/2] → R2 be α(t) = (2 cos t, 2 sin t) and f : R2 → R be f(x, y) = x + 5y. Then f(α(t)) = 2 cos t + 10 sin t and ∥α′∥ ≡ 2. Hence ∫α f = ∫_0^{π/2} (4 cos t + 20 sin t) dt = 24.

(ii) Let α, β : [0, 2π] → R2 be α(t) = (cos t, sin t) and β(t) = (cos 3t, sin 3t). Then α and β have the same image (the unit circle), but l(α) = 2π ≠ 6π = l(β) because ∥α′∥ ≡ 1 and ∥β′∥ ≡ 3. Moreover, if f : R2 → R is f(x, y) = x^2 + y^2, then f ◦ α ≡ 1 ≡ f ◦ β, and therefore ∫α f = 2π ≠ 6π = ∫β f.
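Example (i) can be double-checked with a Riemann sum for the defining formula ∫_a^b f(α(t)) ∥α′(t)∥ dt (a sketch, numpy assumed available, not part of the notes).

import numpy as np

# Scalar line integral of Example (i): f(x, y) = x + 5y along
# alpha(t) = (2 cos t, 2 sin t), t in [0, pi/2]; the exact value is 24.
n = 20000
h = (np.pi / 2) / n
t = (np.arange(n) + 0.5) * h
x, y = 2 * np.cos(t), 2 * np.sin(t)
speed = 2.0                                  # ||alpha'(t)|| = 2 for every t
approx = np.sum((x + 5 * y) * speed) * h
print(approx)                                # close to 24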

Exercise-38: Let U ⊂ Rn be open, f, g : U → R be continuous, and α : [a, b] → U be a piecewise

C1 path. Then, (i) ∫α(c1 f + c2 g) = c1 ∫α f + c2 ∫α g for every c1, c2 ∈ R.

(ii) If f ≥ g, then ∫α f ≥ ∫α g. In particular, if f ≥ 0, then ∫α f ≥ 0.

[135] [Line integral of a scalar field remains invariant under an equivalent reparametrization of the

path] Let U ⊂ Rn be open, and f : U → R be continuous.

(i) Let α : [a, b] → U be a piecewise C1 path, g : [c, d] → [a, b] be a C1-diffeomorphism, and

β : [c, d] → U be β = α ◦ g. Then ∫α f = ∫β f.

(ii) Let α : [a, b] → U be a piecewise C1 path, and define −α : [a, b] → Rn as (−α)(t) = α(a + b − t) (the path in the reverse direction). Then ∫α f = ∫−α f.

Proof. (i) By the additivity of the integral, we may suppose that α is a C1-path; and then β is also a C1-path. Let h : [a, b] → R be h(t) = f(α(t)) ∥α′(t)∥. Then by Change of variable theorem, ∫α f = ∫_a^b h(t) dt = ∫_c^d h(g(s)) |g′(s)| ds. By Chain rule, β′(s) = g′(s)α′(g(s)), and therefore ∫β f = ∫_c^d f(β(s)) ∥β′(s)∥ ds = ∫_c^d f(β(s)) ∥α′(g(s))∥ |g′(s)| ds = ∫_c^d h(g(s)) |g′(s)| ds = ∫α f.

(ii) This follows from (i) by taking g : [a, b] → [a, b] to be g(s) = a + b − s. □

Remark: When we have to integrate a scalar field over a circle or the boundary of a rectangle, etc., we should consider the natural parametrization in the anticlockwise direction. For example, let A = [a, b] × [c, d], and suppose we wish to evaluate ∫∂A 1. Let α, β : [a, b] → R2 be α(t) = (t, c), β(t) = (t, d); and γ, σ : [c, d] → R2 be γ(t) = (a, t), σ(t) = (b, t). Then the anticlockwise parametrization of ∂A is given by the path α + σ − β − γ. By [135](ii), ∫∂A 1 = ∫α 1 + ∫σ 1 + ∫β 1 + ∫γ 1 = ∫_a^b (∥α′(t)∥ + ∥β′(t)∥) dt + ∫_c^d (∥σ′(t)∥ + ∥γ′(t)∥) dt = ∫_a^b 2 dt + ∫_c^d 2 dt = 2((b − a) + (d − c)), which is the perimeter of the rectangle A.

Discussion: Let U ⊂ Rn be open, f : U → Rn be continuous and α : [a, b] → U be a C1-path.

Let P = {a0 ≤ a1 ≤ · · · ≤ ak} be a sufficiently refined partition of [a, b]. Think of f as a force

field. The work done by f in moving a particle from α(aj−1) to α(aj) along the image of α is

approximately equal to ⟨f(α(aj)), α(aj) − α(aj−1)⟩. But α(aj) − α(aj−1) ∼ (aj − aj−1)α′(aj) by

Mean value theorem and the C1-property of α. Hence the work done by f in moving a particle along the image of α from α(a) to α(b) is approximately equal to ∑_{j=1}^{k} ⟨f(α(aj)), α′(aj)⟩(aj − aj−1),


which is a Riemann sum of the continuous function t 7→ ⟨f(α(t)), α′(t)⟩ from [a, b] to R. Motivated

by this observation, we define:

Definition: [Line integral of a vector field] Let U ⊂ Rn be open and f : U → Rn be continuous.

(i) If α : [a, b] → U is a C1-path, then we define ∫α f = ∫_a^b ⟨f(α(t)), α′(t)⟩ dt. This integral is also denoted as ∫ f · dα, where the dot in the middle indicates the dot product (inner product). If f = (f1, . . . , fn) and α = (α1, . . . , αn), then ∫ f · dα = ∫_a^b ∑_{i=1}^{n} fi(α(t)) αi′(t) dt. Moreover, if xi(t) = αi(t), then dxi = αi′(t)dt, and hence the following expression is also used for the line integral of a vector field: ∫ f · dα = ∫α (f1 dx1 + · · · + fn dxn).

(ii) If α = ∑_{j=1}^{k} α[j] : [a, b] → U is a piecewise C1 path (where each α[j] is a C1-path), we define ∫α f = ∑_{j=1}^{k} ∫_{α[j]} f.

Example: (i) Let α : [0, 1] → R3 be α(t) = (t, t^2, t^3). Then we have ∫α (x dx − 2y dy + z dz) = ∫_0^1 (t − 4t^3 + 3t^5) dt = 0.

(ii) Let α : [0, π/2] → R2 be α(t) = (2 cos t, 2 sin t) and f : R2 → R2 be f(x, y) = (3x, 5y). Then ∫α f = ∫_0^{π/2} ⟨f(α(t)), α′(t)⟩ dt = ∫_0^{π/2} ⟨(6 cos t, 10 sin t), (−2 sin t, 2 cos t)⟩ dt = 8 ∫_0^{π/2} cos t sin t dt = 8 ∫_0^1 s ds = 4 by putting s = sin t.
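The value 4 in Example (ii) can likewise be recovered from a Riemann sum of ⟨f(α(t)), α′(t)⟩ (a sketch, numpy assumed available, not part of the notes).

import numpy as np

# Vector-field line integral of Example (ii): f(x, y) = (3x, 5y) along
# alpha(t) = (2 cos t, 2 sin t), t in [0, pi/2]; the exact value is 4.
n = 20000
h = (np.pi / 2) / n
t = (np.arange(n) + 0.5) * h
fx, fy = 6 * np.cos(t), 10 * np.sin(t)            # f(alpha(t))
dxdt, dydt = -2 * np.sin(t), 2 * np.cos(t)        # alpha'(t)
print(np.sum(fx * dxdt + fy * dydt) * h)          # close to 4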

Exercise-39: Let U ⊂ Rn be open, f, g : U → Rn be continuous, and α : [a, b] → U be a piecewise

C1 path. Then ∫α(c1 f + c2 g) = c1 ∫α f + c2 ∫α g for every c1, c2 ∈ R.

[136] [Line integral of a vector field is sensitive to the orientation of the path; but is invariant

under an orientation-preserving equivalent reparametrization of the path] Let U ⊂ Rn be open,

f : U → Rn be continuous, and α : [a, b] → U be a piecewise C1 path. Then,

(i) ∫−α f = −∫α f, where −α : [a, b] → U is given by (−α)(t) = α(a + b − t).

(ii) Let g : [c, d] → [a, b] be a C1-diffeomorphism, and β : [c, d] → U be β = α ◦ g. Then ∫α f = ∫β f if g′ > 0; and ∫α f = −∫β f if g′ < 0.

Proof. (i) ∫−α f = ∫_a^b ⟨f(α(a + b − t)), −α′(a + b − t)⟩ dt = −∫_a^b ⟨f(α(s)), α′(s)⟩ ds = −∫α f by putting s = a + b − t.

(ii) Let h : [a, b] → R be h(t) = ⟨f(α(t)), α′(t)⟩. Then by Change of variable, ∫α f = ∫_a^b h(t) dt = ∫_c^d h(g(s)) |g′(s)| ds. Also, ∫β f = ∫_c^d ⟨f(β(s)), β′(s)⟩ ds = ∫_c^d h(g(s)) g′(s) ds since β′(s) = g′(s)α′(g(s)) by Chain rule. Now it is clear that ∫α f = ∫β f if g′ > 0, and ∫α f = −∫β f if g′ < 0. □

Example: Let A = [a, b] × [c, d]. We wish to evaluate ∫∂A ((x + y)dx + (x − y)dy). Let α, β : [a, b] → R2 be α(t) = (t, c), β(t) = (t, d); and γ, σ : [c, d] → R2 be γ(t) = (a, t), σ(t) = (b, t). Then the anticlockwise parametrization of ∂A is given by the path α + σ − β − γ. Moreover, observe that dy = 0 along α and β; and dx = 0 along γ and σ. Therefore by [136](i),
∫∂A (x + y)dx + (x − y)dy = ∫α (x + y)dx + ∫σ (x − y)dy − ∫β (x + y)dx − ∫γ (x − y)dy = ∫_a^b (t + c)dt + ∫_c^d (b − t)dt − ∫_a^b (t + d)dt − ∫_c^d (a − t)dt = ∫_a^b (c − d)dt + ∫_c^d (b − a)dt = 0.

Definition: Let U ⊂ Rn be a connected open set and f : U → Rn be continuous. We say f has path

independent line integral in U if for any two piecewise C1 paths α, β : [a, b] → U with α(a) = β(a)

and α(b) = β(b), we have that ∫α f = ∫β f.

Recall from Exercise-11 that if U ⊂ Rn is a connected open set, then for every x, y ∈ U , there is

a polygonal path (i.e., a continuous path consisting of finitely many line segments, and in particular

a piecewise C1 path) in U from x to y. Here note that a path α : [a, b] → U is said to be a path

from x to y if α(a) = x and α(b) = y.

[137] [Fundamental theorem of Calculus for line integrals of a vector field] Let U ⊂ Rn be a

connected open set, and f : U → Rn be continuous.

(i) Assume there is a function F : U → R with ∇F = f. Then ∫α f = F(α(b)) − F(α(a)) for any piecewise C1 path α : [a, b] → U.

(ii) Assume f has path independent line integral in U. Fix z ∈ U. Define F : U → R as F(x) = ∫α f, where α is any piecewise C1 path in U from z to x. Then F is a C1-function with ∇F = f.

Proof. (i) Since ∇F = f and f is continuous, it follows that F is a C1-function, and in particular differentiable. Now, ∫α f = ∫_a^b ⟨f(α(t)), α′(t)⟩ dt = ∫_a^b ⟨∇F(α(t)), α′(t)⟩ dt = ∫_a^b (F ◦ α)′(t) dt = F(α(b)) − F(α(a)) by the Chain rule [108](iii) and the Fundamental theorem of Calculus of one-variable theory.

(ii) It suffices to show ∇F = f, and then the continuity of f will imply that F is a C1-function. Let f = (f1, . . . , fn). Fix x ∈ U and j ∈ {1, . . . , n}. We need to show that lim_{t→0} (F(x + tej) − F(x))/t = fj(x). Let α be a piecewise C1 path in U from z to x. Then F(x) = ∫α f. Choose an open ball B ⊂ U centered at x and consider t ≠ 0 with x + tej ∈ B. Let β : [0, 1] → U be β(s) = x + stej, i.e., β is a parametrization of the line segment joining x and x + tej. Then α + β is a path in U from z to x + tej, and therefore F(x + tej) = ∫_{α+β} f = ∫α f + ∫β f = F(x) + ∫β f. Hence F(x + tej) − F(x) = ∫β f = ∫_0^1 ⟨f(x + stej), tej⟩ ds = ∫_0^t ⟨f(x + λej), ej⟩ dλ = ∫_0^t fj(x + λej) dλ by putting λ = st. Moreover, fj(x) = (1/t) ∫_0^t fj(x) dλ. Consequently, |(F(x + tej) − F(x))/t − fj(x)| ≤ (1/|t|) |∫_0^t |fj(x + λej) − fj(x)| dλ|, and the right hand side goes to 0 as t → 0 by the continuity of fj. □

A path α : [a, b] → Rn is said to be a closed path if α(a) = α(b).

[138] Let U ⊂ Rn be a connected open set, and f : U → Rn be continuous. Then the following are

equivalent: (i) ∫α f = 0 for every piecewise C1 closed path α in U.

(ii) f has path independent line integral in U .


(iii) There is a C1-function F : U → R with ∇F = f .

Proof. (i) ⇒ (ii): Let α, β : [a, b] → U be piecewise C1 paths with α(a) = β(a) and α(b) = β(b).

Then α − β is a piecewise C1 closed path, and hence 0 = ∫_{α−β} f = ∫α f − ∫β f by (i) and [136](i).

The implication '(ii) ⇒ (iii)' is established in [137](ii), and '(iii) ⇒ (i)' follows from [137](i). □

13. Circulation density and Green’s theorem

Discussion: Let U ⊂ R2 be open. A C1-function f : U → R2 can be thought of as representing a

flow in the planar set U , where f(x, y) is the velocity vector at (x, y) ∈ U . Some examples are:

(i) If c ∈ R2, then f : R2 → R2 given by f ≡ c represents the flow moving in the direction of the

vector c with constant velocity c. If c = (0, 0), then f represents a stationary flow.

(ii) f : R2 → R2 given by f(x, y) = (x, y) represents a flow moving outward from (0, 0) in all

directions with increasing speed (expansion). Draw a picture to see this.

(iii) f : R2 → R2 given by f(x, y) = (−y, x) represents an anticlockwise rotational flow around the origin: the velocity at (x, y) is the position vector (x, y) rotated by the angle π/2.

[139] [Interpretation of ∂f2/∂x − ∂f1/∂y as circulation density] Let U ⊂ R2 be open, and f : U → R2 be a C1-function. The circulation density of f at (a, b) ∈ U may be defined as the quantity lim_{A→{(a,b)}} (1/area(A)) ∫∂A f = lim_{ε→0} (1/ε^2) ∫∂A f, where A ⊂ U is a small square with side-length ε > 0 centered at (a, b). Then the circulation density of f at (a, b) is equal to (∂f2/∂x)(a, b) − (∂f1/∂y)(a, b). Consequently, we have the following:

(i) If Jf(a, b) is a symmetric matrix, then the circulation density of f at (a, b) is zero.

(ii) If there is a function F : U → R with ∇F = f, then the circulation density of f at (a, b) is zero for every (a, b) ∈ U.

Proof. Write f = (f1, f2), and define α, β, γ, σ : [0, ε] → U as α(t) = (a − ε/2 + t, b − ε/2), β(t) = (a − ε/2 + t, b + ε/2), γ(t) = (a − ε/2, b − ε/2 + t), and σ(t) = (a + ε/2, b − ε/2 + t). Then ∫∂A f = ∫_{α+σ−β−γ} (f1 dx + f2 dy) = ∫α f1 dx + ∫σ f2 dy − ∫β f1 dx − ∫γ f2 dy because dy = 0 along α and β, and dx = 0 along γ and σ.

Since the midpoint of the side of A represented by α is (a, b − ε/2), we have that ∫α f1 dx = ∫_0^ε f1(α(t)) × 1 dt ∼ ∫_0^ε f1(a, b − ε/2) dt = ε f1(a, b − ε/2). Similarly, ∫β f1 dx ∼ ε f1(a, b + ε/2), ∫γ f2 dy ∼ ε f2(a − ε/2, b), and ∫σ f2 dy ∼ ε f2(a + ε/2, b). Therefore,
(1/ε^2) ∫∂A f = [(∫α f1 dx − ∫β f1 dx)/ε^2] + [(∫σ f2 dy − ∫γ f2 dy)/ε^2] ∼ [(f1(a, b − ε/2) − f1(a, b + ε/2))/ε] + [(f2(a + ε/2, b) − f2(a − ε/2, b))/ε]
→ −(∂f1/∂y)(a, b) + (∂f2/∂x)(a, b) as ε → 0. This proves the main assertion.

Statement (i) is an immediate corollary. To deduce (ii) from (i), note that Jf(a, b) is the transpose of the Hessian matrix HF(a, b), and HF(a, b) is symmetric by [110] because F is C2 (as f is C1). □

Example: (i) Let f : R^2 → R^2 be f(x, y) = (−y, x). We know that this flow represents a rotation. Since ∂f2/∂x − ∂f1/∂y ≡ 2, the circulation density of f at (a, b) is 2 for every (a, b) ∈ R^2. From [139](ii), we deduce that there does not exist any C^2-function F : R^2 → R with ∇F = f.
(ii) Let f : R^2 → R^2 be f(x, y) = (x, y), which represents a flow expanding in all directions from the origin with increasing speed. Here, ∂f2/∂x − ∂f1/∂y ≡ 0, and thus the circulation density of f at (a, b) is 0 for every (a, b) ∈ R^2. If F : R^2 → R is F(x, y) = (x^2 + y^2)/2, then ∇F = f.
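A quick symbolic check of these two computations (a sketch using sympy; not part of the original notes):

```python
# Sketch (not in the original notes): symbolic check of the circulation
# density ∂f2/∂x − ∂f1/∂y for the two example flows, using sympy.
from sympy import symbols, diff

x, y = symbols('x y')

def circulation_density(f1, f2):
    # circulation density of the planar field f = (f1, f2)
    return diff(f2, x) - diff(f1, y)

print(circulation_density(-y, x))   # rotation flow: prints 2
print(circulation_density(x, y))    # expanding flow: prints 0
```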

If f : R^2 → R^2 is a C^1-function with ∂f2/∂x − ∂f1/∂y ≡ 0, we may ask whether there is a C^2-function F : R^2 → R with ∇F = f, or equivalently whether f has path independent line integral in R^2. The affirmative answer is given by [140] below. Another related result is Green’s theorem (stated as [141] below), which is true for regions bounded by piecewise C^1 paths, but we will prove only a special case of Green’s theorem.

[140] Let U ⊂ R^n be open and f : U → R^n be a C^1-function. Suppose that U is convex^6. Then there is a C^2-function F : U → R with ∇F = f ⇔ Jf(a) is symmetric for every a ∈ U (i.e., ∂fi/∂xj (a) = ∂fj/∂xi (a) for every i, j ∈ {1, . . . , n} and every a ∈ U).

Proof. ⇒: This is similar to the proof of [139](ii): Jf(a) is the transpose of HF(a), and HF(a) is symmetric by [110].
⇐: After a translation, assume 0 ∈ U. Define F : U → R as F(x) = ∫_0^1 ⟨f(sx), x⟩ ds. That is, F(x) = ∫α f, where α : [0, 1] → U given by α(s) = sx parametrizes the line segment [0, x] (here we use the fact that U is convex). We will show ∇F = f, and then the C^1-property of f will imply that F is a C^2-function. Fix j ∈ {1, . . . , n} and x ∈ U. By [124](iii), we have that ∂F/∂xj (x) = ∫_0^1 ∂⟨f(sx), x⟩/∂xj ds. The assumption ∂fi/∂xj = ∂fj/∂xi implies that ∂f/∂xj = ∇fj. Hence
∂⟨f(sx), x⟩/∂xj = ⟨∂(f(sx))/∂xj, x⟩ + ⟨f(sx), ∂x/∂xj⟩ = s⟨∂f/∂xj (sx), x⟩ + ⟨f(sx), ej⟩ = s⟨∇fj(sx), x⟩ + fj(sx) = s g′(s) + g(s), where g : [0, 1] → R is g(s) := fj(sx). Therefore,
∂F/∂xj (x) = ∫_0^1 ∂⟨f(sx), x⟩/∂xj ds = ∫_0^1 (s g′(s) + g(s)) ds = s g(s)|_0^1 = g(1) = fj(x). □
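The construction F(x) = ∫_0^1 ⟨f(sx), x⟩ ds in the proof can be carried out symbolically. The sketch below (the example field f(x, y) = (2x + y, x + 3y^2) is an assumption chosen for illustration, not taken from the notes) builds F and checks ∇F = f:

```python
# Sketch (assumed example field, not from the notes): build the potential
# F(x) = ∫_0^1 <f(sx), x> ds from the proof of [140] and check ∇F = f.
from sympy import symbols, integrate, diff, simplify

x, y, s = symbols('x y s')
f1, f2 = 2*x + y, x + 3*y**2          # Jacobian is symmetric: ∂f1/∂y = ∂f2/∂x = 1

integrand = f1.subs({x: s*x, y: s*y})*x + f2.subs({x: s*x, y: s*y})*y
F = integrate(integrand, (s, 0, 1))   # F = x**2 + x*y + y**3

print(F)
print(simplify(diff(F, x) - f1), simplify(diff(F, y) - f2))  # 0 0
```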

Definition: A compact set A ⊂ R^2 is said to be an elementary region if there are piecewise C^1 paths ϕ, ψ : [a, b] → R and ϕ̃, ψ̃ : [c, d] → R such that A has both of the following two representations:
A = {(x, y) : a ≤ x ≤ b and ϕ(x) ≤ y ≤ ψ(x)} = {(x, y) : c ≤ y ≤ d and ϕ̃(y) ≤ x ≤ ψ̃(y)}.
Clearly, every rectangle is an elementary region. On the other hand, the set {(x, y) ∈ R^2 : −1 ≤ x ≤ 1 and x^2 ≤ y ≤ x^2 + 1} is not an elementary region because the second representation fails.

^6 Or more generally, that U has no ‘holes’.

Exercise-40: (i) If A ⊂ R^2 is a solid triangle, then A is an elementary region.
(ii) If A ⊂ R^2 is a compact convex set bounded by a polygonal path (i.e., if A is a compact convex polyhedron), then A is an elementary region.
[Hint: (i) For example, suppose A has vertices (−1, 2), (0, 0), and (1, 1). Take ϕ to be a parametrization of [(−1, 2), (0, 0)] + [(0, 0), (1, 1)], ψ to be a parametrization of [(−1, 2), (1, 1)], ϕ̃ to be a parametrization of [(−1, 2), (0, 0)], and ψ̃ to be a parametrization of [(−1, 2), (1, 1)] + [(1, 1), (0, 0)].
(ii) Let [a, b] × [c, d] be the smallest rectangle enclosing A. Let y1 = min{y : (a, y) ∈ A}, y2 = max{y : (a, y) ∈ A}, y3 = min{y : (b, y) ∈ A}, and y4 = max{y : (b, y) ∈ A}. Let ϕ : [a, b] → R be a parametrization of the ‘lower’ portion of the boundary of A from (a, y1) to (b, y3), and ψ : [a, b] → R be a parametrization of the ‘upper’ portion of the boundary of A from (a, y2) to (b, y4). Similarly, define ϕ̃, ψ̃ : [c, d] → R.]

[141] [Green’s theorem] Let U ⊂ R^2 be open and f : U → R^2 be a C^1-function.
(i) If A ⊂ U is an elementary region, then ∫A (∂f2/∂x − ∂f1/∂y) = ∫∂A f (where the integral over ∂A is taken with the anticlockwise orientation).
(ii) Suppose A ⊂ U is such that A = ∪_{j=1}^p Aj, where p ∈ N and the Aj are elementary regions with pairwise disjoint interiors. Then ∫A (∂f2/∂x − ∂f1/∂y) = ∫∂A f.

Proof. (i) Let f = (f1, f2). Then ∫∂A f = ∫∂A (f1 dx + f2 dy). We will show that ∫A (−∂f1/∂y) = ∫∂A f1 dx and ∫A ∂f2/∂x = ∫∂A f2 dy. Choose piecewise C^1 paths ϕ, ψ, ϕ̃, and ψ̃ such that
A = {(x, y) : a ≤ x ≤ b and ϕ(x) ≤ y ≤ ψ(x)} = {(x, y) : c ≤ y ≤ d and ϕ̃(y) ≤ x ≤ ψ̃(y)}.
The graphs of ϕ, ψ, ϕ̃, ψ̃ are null sets in R^2. Since ∂A consists of these graphs and at most two horizontal and at most two vertical line segments, ∂A is also a null set in R^2. Hence the compact set A is Jordan measurable, and therefore the continuous function ∂f2/∂x − ∂f1/∂y is indeed Riemann integrable over A by [128](ii).
We have ∫A (−∂f1/∂y) = ∫_a^b (∫_{ϕ(x)}^{ψ(x)} (−∂f1/∂y) dy) dx = ∫_a^b (f1(x, ϕ(x)) − f1(x, ψ(x))) dx by the first representation of A. Let α, β : [a, b] → R^2 be α(t) = (t, ϕ(t)) and β(t) = (t, ψ(t)). Then dx = dt along both α and β. Note that the vertical line segments of ∂A (if any) do not contribute to the integral ∫∂A f1 dx since dx = 0 along vertical lines. Therefore, by the first representation of A, we get ∫∂A f1 dx = ∫α f1 dx − ∫β f1 dx = ∫_a^b (f1(t, ϕ(t)) − f1(t, ψ(t))) dt = ∫A (−∂f1/∂y).

Next, ∫A ∂f2/∂x = ∫_c^d (∫_{ϕ̃(y)}^{ψ̃(y)} ∂f2/∂x dx) dy = ∫_c^d (f2(ψ̃(y), y) − f2(ϕ̃(y), y)) dy by the second representation of A. Let γ, σ : [c, d] → R^2 be γ(t) = (ϕ̃(t), t) and σ(t) = (ψ̃(t), t). Then dy = dt along both γ and σ. Note that the horizontal line segments of ∂A (if any) do not contribute to the integral ∫∂A f2 dy since dy = 0 along horizontal lines. Therefore, by the second representation of A, we get ∫∂A f2 dy = ∫σ f2 dy − ∫γ f2 dy = ∫_c^d (f2(ψ̃(t), t) − f2(ϕ̃(t), t)) dt = ∫A ∂f2/∂x.

(ii) We have ∫A (∂f2/∂x − ∂f1/∂y) = Σ_{j=1}^p ∫_{Aj} (∂f2/∂x − ∂f1/∂y) by [127](iv) because Ai ∩ Aj is a null set in R^2 for every i ≠ j. We also have ∫∂A f = Σ_{j=1}^p ∫_{∂Aj} f because the integrals over the common portions of ∂Ai and ∂Aj (if any) for i ≠ j are in opposite directions and cancel each other. Hence the result follows by applying part (i) to each Aj. □

Remark: The equality ∫A (∂f2/∂x − ∂f1/∂y) = ∫∂A f in Green’s theorem may be interpreted as follows: the net amount of anticlockwise rotation of a 2-dimensional flow f in a planar region A is equal to the net amount of the flow f along the boundary of A in the anticlockwise direction.

Exercise-41: Let A ⊂ R^2 be the region bounded by the ellipse x^2/a^2 + y^2/b^2 = r^2, where a, b > 0. Then µ(A) = πabr^2 by an application of Green’s theorem. [Hint: Choose a simple enough C^1-function f : R^2 → R^2 with ∂f2/∂x − ∂f1/∂y ≡ 1, say f(x, y) = (0, x). Then µ(A) = ∫A 1 = ∫A (∂f2/∂x − ∂f1/∂y) = ∫∂A f by [141]. Parametrizing ∂A with α : [0, 2π] → R^2 given by α(t) = (ar cos t, br sin t), we see ∫∂A f = ∫_0^{2π} ⟨f(α(t)), α′(t)⟩ dt = ∫_0^{2π} abr^2 cos^2 t dt = abr^2 ∫_0^{2π} (1 + cos(2t))/2 dt = πabr^2.]
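A numerical sanity check of this hint (a sketch; the values a = 2, b = 3, r = 1 are assumed only for illustration):

```python
# Sketch (not in the notes): the boundary line integral of f(x, y) = (0, x)
# over the ellipse, compared with the area πabr², via a midpoint rule.
import numpy as np

a, b, r = 2.0, 3.0, 1.0
n = 200000
t = (np.arange(n) + 0.5) * (2*np.pi/n)      # midpoints of a uniform partition
x = a*r*np.cos(t)
dy = b*r*np.cos(t)                          # second component of α'(t)
area = np.sum(x*dy) * (2*np.pi/n)           # midpoint rule for ∮ x dy

print(area, np.pi*a*b*r**2)                 # both ≈ 18.849...
```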

14. Surface integrals

As in the case of line integrals, we will define two types of surface integrals: (i) for scalar fields, and (ii) for vector fields. We will think of a surface as a function rather than as a set (as in the case of a path). First, we need to recall the notion of a cross product (vector product).

Definition: The cross product (also called vector product) is a binary operation on R^3 defined by the following conditions:
(i) e1 × e2 = e3, e2 × e3 = e1, and e3 × e1 = e2.
(ii) ej × ei = −(ei × ej) for 1 ≤ i, j ≤ 3, and in particular ej × ej = 0 ∈ R^3 for 1 ≤ j ≤ 3.
(iii) u × v = Σ_{i=1}^3 Σ_{j=1}^3 ui vj (ei × ej) for every u = (u1, u2, u3) and v = (v1, v2, v3) in R^3.

Exercise-42: In R^3, we have:
(i) v × u = −(u × v); and hence u × u = 0 ∈ R^3.
(ii) The cross product is a bilinear map, i.e., it is linear in each variable.
(iii) Symbolically, u × v = det[e1, e2, e3; u1, u2, u3; v1, v2, v3] = det[e1, u1, v1; e2, u2, v2; e3, u3, v3] (since det(A^t) = det(A)); here the rows of each matrix are separated by semicolons.
(iv) u × v ≠ 0 ⇔ {u, v} is linearly independent (this follows from (iii)).
(v) ⟨u × v, w⟩ = det[w1, w2, w3; u1, u2, u3; v1, v2, v3] = det[u1, u2, u3; v1, v2, v3; w1, w2, w3] = det[u1, v1, w1; u2, v2, w2; u3, v3, w3].
(vi) ⟨u × v, w⟩ ≠ 0 ⇔ {u, v, w} is linearly independent (this follows from (v)). In particular, u × v is perpendicular to both u and v, i.e., ⟨u × v, u⟩ = 0 = ⟨u × v, v⟩.
(vii) ∥u × v∥^2 = ∥u∥^2 ∥v∥^2 − |⟨u, v⟩|^2 by (iii). It follows that ∥u × v∥ = ∥u∥∥v∥ sin θ if θ ∈ [0, π] is the angle between u and v, because ⟨u, v⟩ = ∥u∥∥v∥ cos θ.
(viii) ∥u × v∥ is the area of the parallelogram in R^3 with vertices 0, u, v, and u + v, by (vii).
(ix) |⟨u × v, w⟩| is the volume of the parallelepiped in R^3 specified by the three vectors u, v, and w (to see this, note that if η is the angle between u × v and w, then |⟨u × v, w⟩| = ∥u × v∥∥w∥ |cos η|).
(x) u × (v × w) = ⟨u, w⟩v − ⟨u, v⟩w.
(xi) [Jacobi identity] (u × v) × w + (v × w) × u + (w × u) × v = 0 ∈ R^3.
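A few of these identities can be spot-checked numerically for random vectors; the sketch below (not part of the notes) tests (vii), (v), and (xi):

```python
# Sketch (not in the notes): numerical spot-checks of some identities in
# Exercise-42 for random vectors, using numpy.
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, 3))

lhs = np.linalg.norm(np.cross(u, v))**2
rhs = np.linalg.norm(u)**2 * np.linalg.norm(v)**2 - np.dot(u, v)**2
print(np.isclose(lhs, rhs))                                  # (vii)

print(np.isclose(np.dot(np.cross(u, v), w),
                 np.linalg.det(np.array([u, v, w]))))        # (v), rows u, v, w

jacobi = (np.cross(np.cross(u, v), w) + np.cross(np.cross(v, w), u)
          + np.cross(np.cross(w, u), v))
print(np.allclose(jacobi, 0))                                # (xi)
```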

There are different approaches to the definition of a surface. We will consider only the restricted

notion of a parametric surface in R3. As in the case of a path, we will define a parametric surface as

a function; and the image of this function will be what we geometrically think of as a surface. There

will be a little bit of ambiguity in the definition of a parametric surface since minor modifications

will be needed depending on the context.

Definition: A closed path α : [a, b] → R^n is said to be simple if α is injective on [a, b).

Definition: Let X ⊂ R^2 be a compact connected set whose boundary can be parametrized by a piecewise C^1 simple closed path (this implies in particular that X is Jordan measurable). Then a C^1-function P : X → R^3 will be called a parametric surface (ambiguity: define partial derivatives of P by considering one-sided limits, or assume P is defined and is C^1 in a neighborhood of X). Geometrically, the image set P(X) is to be thought of as a surface in R^3. To ensure that P(X) is indeed a ‘two-dimensional’ figure, it is desirable to assume the following:
(i) P is injective on X, or at least on int(X).
(ii) JP(a) has full rank 2 for every a ∈ X, or for every a ∈ int(X). Note that JP(a) has rank 2 ⇔ the columns ∂P/∂x (a) and ∂P/∂y (a) of JP(a) are linearly independent ⇔ ∂P/∂x (a) × ∂P/∂y (a) ≠ 0 ∈ R^3.

Example: (i) [Sphere] Let X = [0, 2π] × [0, π], r > 0, and define a C^1-function P : X → R^3 as P(x, y) = (r cos x sin y, r sin x sin y, r cos y), which is injective on int(X). Note that the image P(X) is the sphere with radius r centered at the origin of R^3. We have that
∂P/∂x (x, y) × ∂P/∂y (x, y) = det[e1, e2, e3; −r sin x sin y, r cos x sin y, 0; r cos x cos y, r sin x cos y, −r sin y] = −r sin y P(x, y) ≠ (0, 0, 0) for every (x, y) ∈ int(X). Thus P is a parametric surface. In this example, ∂P/∂x (x, y) × ∂P/∂y (x, y) is the inward normal to the sphere at P(x, y) because of the negative sign in ‘−r sin y P(x, y)’, and ∥∂P/∂x (x, y) × ∂P/∂y (x, y)∥ = r^2 sin y because ∥P(x, y)∥ = r and sin y ≥ 0 for y ∈ [0, π].

(ii) [Cylinder] Let r > 0, h > 0, X = [0, 2π] × [0, h], and define a C^1-function P : X → R^3 as P(x, y) = (r cos x, r sin x, y), which is injective on int(X). Note that the image P(X) is a vertical cylinder of height h and radius r (without the top and bottom discs) with the center of the bottom disc placed at the origin of R^3. We have that
∂P/∂x (x, y) × ∂P/∂y (x, y) = det[e1, e2, e3; −r sin x, r cos x, 0; 0, 0, 1] = (r cos x, r sin x, 0) = P(x, 0) ≠ (0, 0, 0) for every (x, y) ∈ X. Thus P is a parametric surface. In this example, ∂P/∂x (x, y) × ∂P/∂y (x, y) is the outward normal to the cylinder at P(x, y), and ∥∂P/∂x (x, y) × ∂P/∂y (x, y)∥ = ∥P(x, 0)∥ = r.
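The cross product computations in these two examples can be verified symbolically; the following sketch (not part of the notes) checks the sphere case:

```python
# Sketch (not in the notes): symbolic check that for the sphere
# parametrization, ∂P/∂x × ∂P/∂y = −r sin(y) P(x, y) and ‖N‖ = r² sin y.
from sympy import symbols, cos, sin, Matrix, simplify

x, y, r = symbols('x y r', positive=True)
P = Matrix([r*cos(x)*sin(y), r*sin(x)*sin(y), r*cos(y)])
N = P.diff(x).cross(P.diff(y))                     # ∂P/∂x × ∂P/∂y

print(simplify(N + r*sin(y)*P))                    # zero vector
print(simplify(N.dot(N) - r**4*sin(y)**2))         # 0, so ‖N‖ = r² sin y
```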

Exercise-43: (i) [Observation] If P = (P1, P2, P3) : X → R^3 is a parametric surface and a ∈ X, then by Exercise-42(iii), we see that
∂P/∂x (a) × ∂P/∂y (a) = det[e1, ∂P1/∂x (a), ∂P1/∂y (a); e2, ∂P2/∂x (a), ∂P2/∂y (a); e3, ∂P3/∂x (a), ∂P3/∂y (a)] = det[E JP(a)], where E is the formal 3 × 1 column with entries e1, e2, e3.
This suggests that ∥∂P/∂x (a) × ∂P/∂y (a)∥ is the ‘local magnification factor’ of P at a ∈ X.
(ii) Let P : X → R^3 and P̃ : X̃ → R^3 be parametric surfaces, and suppose g : X → X̃ is a C^1-diffeomorphism with P = P̃ ◦ g. Then for every a ∈ X and b := g(a) ∈ X̃, we have that ∂P/∂x (a) × ∂P/∂y (a) = det(Jg(a)) (∂P̃/∂x (b) × ∂P̃/∂y (b)).
[Hint: (ii) JP(a) = JP̃(b) Jg(a) by the Chain rule. Now use (i).]

Discussion: (i) Let P : X → R^3 be a parametric surface, and D ⊂ X be a small rectangle. Let a, a + εe1, a + δe2, and a + εe1 + δe2 be the vertices of D, where ε, δ > 0 are small. Then P(D) is approximately equal to a parallelogram, three of whose four vertices are P(a), P(a + εe1), and P(a + δe2). Using the C^1-property of P and the Mean value theorem, we see that the area of this parallelogram is ∼ ∥(P(a + εe1) − P(a)) × (P(a + δe2) − P(a))∥ ∼ ∥∂P/∂x (a) × ∂P/∂y (a)∥ εδ. Thus area(P(D)) ∼ ∥∂P/∂x (a) × ∂P/∂y (a)∥ µ(D).

(ii) Let U ⊂ R^3 be open, f : U → R be continuous, and P : X → U be a parametric surface. We wish to define the integral ∫P f. Since X is Jordan measurable, we may approximate X from inside by a finite union ∪_{i=1}^q Di of rectangles with pairwise disjoint interiors. If the Di are sufficiently small and ai ∈ Di, then ∫P f should be approximately equal to Σ_{i=1}^q f(P(ai)) · area(P(Di)). By part (i), this requirement becomes ∫P f ∼ Σ_{i=1}^q f(P(ai)) ∥∂P/∂x (ai) × ∂P/∂y (ai)∥ µ(Di) if we take ai to be the lower-left vertex of Di. The sum on the right hand side is a Riemann sum approximating the Riemann integral over (the Jordan measurable set) X of the continuous function (f ◦ P) ∥∂P/∂x × ∂P/∂y∥ from X to R. Hence we define:

Definition: (i) [Surface integral of a scalar field] Let U ⊂ R^3 be open, f : U → R be continuous, and P : X → U be a parametric surface. We define ∫P f = ∫X (f ◦ P) ∥∂P/∂x × ∂P/∂y∥.
(ii) If P : X → R^3 is a parametric surface, then we define area(P(X)) = ∫P 1 = ∫X ∥∂P/∂x × ∂P/∂y∥.

Remark: The surface integral ∫P f is also denoted as ∫S f dS, where S := P(X).

Example: (i) Recall the examples of the sphere and the cylinder from the previous page. In the case of the sphere with radius r, we have ∥∂P/∂x (x, y) × ∂P/∂y (x, y)∥ = r^2 sin y, and hence the surface area of this sphere is ∫_0^{2π} (∫_0^π r^2 sin y dy) dx = ∫_0^{2π} 2r^2 dx = 4πr^2. In the case of the cylinder with height h and radius r, we have ∥∂P/∂x × ∂P/∂y∥ ≡ r, and hence the surface area of this cylinder (without the top and bottom discs) is ∫_0^{2π} (∫_0^h r dy) dx = 2πrh.
(ii) Let h > 0, r > 0, X = [0, π/2] × [0, h], and P : X → R^3 be P(x, y) = (r cos x, r sin x, y). Note that ∥∂P/∂x × ∂P/∂y∥ ≡ r. If f : R^3 → R is f(x, y, z) = x + y + z, then ∫P f = ∫X (f ◦ P) ∥∂P/∂x × ∂P/∂y∥ = ∫_0^{π/2} (∫_0^h (r^2 cos x + r^2 sin x + ry) dy) dx = ∫_0^{π/2} (r^2 h cos x + r^2 h sin x + rh^2/2) dx = 2r^2 h + πrh^2/4.
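A symbolic evaluation of the integral in example (ii) above (a sketch, not part of the notes):

```python
# Sketch (not in the notes): the scalar surface integral ∫_P (x+y+z) dS
# over the quarter cylinder of example (ii), evaluated with sympy.
from sympy import symbols, cos, sin, pi, integrate, Matrix, sqrt, simplify

x, y, r, h = symbols('x y r h', positive=True)
P = Matrix([r*cos(x), r*sin(x), y])
N = P.diff(x).cross(P.diff(y))
dS = simplify(sqrt(N.dot(N)))                      # magnification factor, simplifies to r
f = P[0] + P[1] + P[2]                             # x + y + z evaluated on the surface
I = integrate(f*dS, (y, 0, h), (x, 0, pi/2))

print(simplify(I - (2*r**2*h + pi*r*h**2/4)))      # 0
```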

Exercise-44: Let P : X → R^3 be a parametric surface of the form P(x, y) = (x, y, ϕ(x, y)), where ϕ : X → R is a C^1-function. Then P(X) is the graph of ϕ, P is injective on X, and ∂P/∂x × ∂P/∂y = det[e1, e2, e3; 1, 0, ∂ϕ/∂x; 0, 1, ∂ϕ/∂y] = (−∂ϕ/∂x, −∂ϕ/∂y, 1) ≠ 0 ∈ R^3 in X. Hence area(P(X)) = ∫X √(1 + (∂ϕ/∂x)^2 + (∂ϕ/∂y)^2).

Exercise-45: Let U ⊂ R^3 be open, f, g : U → R be continuous, and P : X → U be a parametric surface. Then, (i) ∫P (c1 f + c2 g) = c1 ∫P f + c2 ∫P g for every c1, c2 ∈ R.
(ii) If f ≥ g, then ∫P f ≥ ∫P g. In particular, if f ≥ 0, then ∫P f ≥ 0.

[142] Let U ⊂ R^3 be open, f : U → R be continuous, and P : X → U and P̃ : X̃ → U be parametric surfaces. Suppose there is a C^1-diffeomorphism g : X → X̃ with P = P̃ ◦ g (if necessary assume g is defined in a neighborhood of X). Then ∫P f = ∫_{P̃} f.

Proof. If a ∈ X and b = g(a) ∈ X̃, then ∥∂P/∂x (a) × ∂P/∂y (a)∥ = ∥∂P̃/∂x (b) × ∂P̃/∂y (b)∥ |det(Jg(a))| by Exercise-43(ii). Let h : X̃ → R be h(b) = (f ◦ P̃)(b) ∥∂P̃/∂x (b) × ∂P̃/∂y (b)∥. Then by the Change of variable theorem and the initial observation, we see that ∫_{P̃} f = ∫_{X̃} h = ∫X (h ◦ g) |det(Jg(·))| = ∫X (f ◦ P) ∥∂P/∂x × ∂P/∂y∥ = ∫P f. □

[143] (i) Let 0 ≤ a < b, and ϕ : [a, b] → R be a C^1-function. Assume that the graph of ϕ lies in the xz-plane in R^3. Then the area of the ‘surface of revolution’ obtained by rotating the graph of ϕ around the z-axis is 2π ∫_a^b x √(1 + (ϕ′(x))^2) dx.
(ii) The surface area of the cone with height h > 0 and radius r > 0 (without the disc) is πr√(r^2 + h^2).

Proof. (i) Let X = [a, b] × [0, 2π]. The ‘surface of revolution’ is parametrized by P : X → R^3 given by P(x, y) = (x cos y, x sin y, ϕ(x)). Now, ∂P/∂x (x, y) × ∂P/∂y (x, y) = det[e1, e2, e3; cos y, sin y, ϕ′(x); −x sin y, x cos y, 0] = (−xϕ′(x) cos y, −xϕ′(x) sin y, x), so that ∥∂P/∂x (x, y) × ∂P/∂y (x, y)∥ = x√(1 + (ϕ′(x))^2). Therefore, area(P(X)) = ∫_a^b (∫_0^{2π} x√(1 + (ϕ′(x))^2) dy) dx = 2π ∫_a^b x√(1 + (ϕ′(x))^2) dx.
(ii) The surface of the cone (without the disc) can be obtained as a ‘surface of revolution’ as described in (i) if we take ϕ : [0, r] → R as ϕ(x) = hx/r. Hence by (i), the area of the cone = 2π ∫_0^r x√(1 + (ϕ′(x))^2) dx = 2π ∫_0^r x√(1 + (h/r)^2) dx = πr^2 √(1 + (h/r)^2) = πr√(r^2 + h^2). □

Our next aim is to define surface integrals for vector fields in R^3.

Discussion: Let U ⊂ R^3 be open, f : U → R^3 be a C^1-function, and P : X → U be a parametric surface. We wish to define the integral ∫P f. Assume f describes a flow (think of f(u) as the velocity vector of the flow at u ∈ U); the value ∫P f should give the total flow across P(X) in the direction specified by ∂P/∂x × ∂P/∂y.
(i) Let D ⊂ X be a small rectangle, a ∈ D, and suppose ∂P/∂x (a) × ∂P/∂y (a) ≠ 0 ∈ R^3. Then, observe that (∂P/∂x (a) × ∂P/∂y (a))/∥∂P/∂x (a) × ∂P/∂y (a)∥ is a unit normal to the surface P(X). Hence the flow of f across P(D) in the direction of this unit normal is
∼ ⟨f(P(a)), unit normal⟩ × area(P(D)) ∼ ⟨f(P(a)), ∂P/∂x (a) × ∂P/∂y (a)⟩ µ(D), because area(P(D)) ∼ ∥∂P/∂x (a) × ∂P/∂y (a)∥ µ(D).

(ii) Since X is Jordan measurable, we may approximate X from inside by a finite union ∪_{i=1}^q Di of rectangles with pairwise disjoint interiors. If the Di are sufficiently small and ai ∈ Di, then by (i), the total flow of f across P(X) in the direction specified by ∂P/∂x × ∂P/∂y is approximately equal to Σ_{i=1}^q ⟨f(P(ai)), ∂P/∂x (ai) × ∂P/∂y (ai)⟩ µ(Di). The right hand side is a Riemann sum of the continuous real-valued function ⟨f ◦ P, ∂P/∂x × ∂P/∂y⟩ over the Jordan measurable compact set X. This function is Riemann integrable by [128](ii). Hence we define:

Definition: [Surface integral of a vector field] Let U ⊂ R^3 be open, f : U → R^3 be a C^1-function, and P : X → U be a parametric surface. The surface integral of f over P (or over S := P(X)) is defined as ∫P f = ∫X ⟨f ◦ P, ∂P/∂x × ∂P/∂y⟩.

Remark: Sometimes we are interested in calculating the flow of f across S := P(X) in the direction of −(∂P/∂x × ∂P/∂y) (this depends on the context). Then we consider −∫X ⟨f ◦ P, ∂P/∂x × ∂P/∂y⟩ as the surface integral. The surface integral is also denoted as ∫P f · n̂, or as ∫S f · n̂ dS, where S = P(X) and n̂ = ±(∂P/∂x × ∂P/∂y)/∥∂P/∂x × ∂P/∂y∥ (the unit normal).

Exercise-46: Let U ⊂ R^3 be open, f, g : U → R^3 be C^1-functions, and P : X → U be a parametric surface. Then ∫P (c1 f + c2 g) = c1 ∫P f + c2 ∫P g for every c1, c2 ∈ R.

Example: Let h, r > 0, X = [0, 2π] × [0, h], and P : X → R^3 be P(x, y) = (r cos x, r sin x, y). We know that P(X) is a cylinder with height h and radius r whose axis is the z-axis. (i) Let f : R^3 → R^3 be f(x, y, z) = (−y, x, 0). Then f represents a rotation around the z-axis, and hence there is no flow out of P(X), so we expect ∫P f = 0. Indeed, ∫P f = ∫X ⟨(−r sin x, r cos x, 0), (r cos x, r sin x, 0)⟩ = ∫X 0 = 0. (ii) Let f : R^3 → R^3 be f(x, y, z) = (x, y, 0); then there is flow out of P(X); in fact, ∫P f = ∫X ⟨(r cos x, r sin x, 0), (r cos x, r sin x, 0)⟩ = ∫X r^2 = r^2 µ(X) = 2πr^2 h.
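A symbolic check of the flux computed in (ii) above (a sketch, not part of the notes):

```python
# Sketch (not in the notes): the flux 2πr²h of f(x, y, z) = (x, y, 0)
# out of the cylinder of radius r and height h, computed with sympy.
from sympy import symbols, cos, sin, pi, integrate, Matrix, simplify

x, y, r, h = symbols('x y r h', positive=True)
P = Matrix([r*cos(x), r*sin(x), y])
N = P.diff(x).cross(P.diff(y))                 # outward normal (r cos x, r sin x, 0)
f = Matrix([P[0], P[1], 0])                    # f ∘ P
flux = integrate(f.dot(N), (y, 0, h), (x, 0, 2*pi))

print(simplify(flux - 2*pi*r**2*h))            # 0
```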

Another notation for the surface integral of a vector field: Let U ⊂ R^3 be open, f = (f1, f2, f3) : U → R^3 be continuous, and P = (P1, P2, P3) : X → U be a parametric surface. Write elements of X as (x, y) and elements of U as (u1, u2, u3). Then f(P(x, y)) = f(u1, u2, u3), where u1 = P1(x, y), u2 = P2(x, y), and u3 = P3(x, y). For distinct j, k ∈ {1, 2, 3}, letting
duj ∧ duk = det[∂Pj/∂x, ∂Pj/∂y; ∂Pk/∂x, ∂Pk/∂y],
we see ⟨f ◦ P, ∂P/∂x × ∂P/∂y⟩ = f1 du2 ∧ du3 + f2 du3 ∧ du1 + f3 du1 ∧ du2. Also, let S = P(X). Then the surface integral may be written as
∫P f = ∫S (f1 du2 ∧ du3 + f2 du3 ∧ du1 + f3 du1 ∧ du2).
Often, duj ∧ duk is written simply as duj duk (but it should be noted that this does not mean a simple double integral as in Fubini’s theorem). Since the calculation of the surface integral does not involve the Chain rule, it is not essential to use disjoint sets of variables for f and P: we may write the variables of f as x, y, z also. Then the notation for the surface integral becomes
∫P f = ∫S (f1 dydz + f2 dzdx + f3 dxdy).

Exercise-47: Let f : R^3 → R^3 be f(x, y, z) = (x, y, 0), and S ⊂ R^3 be the upper half of the sphere with radius r > 0 centered at the origin. Compute the surface integral ∫S f with respect to the outward unit normal to S by considering the following parametrizations:
(i) P : [0, 2π] × [0, π/2] → R^3, P(x, y) = (r cos x sin y, r sin x sin y, r cos y).
(ii) P : {(x, y) ∈ R^2 : x^2 + y^2 ≤ r^2} → R^3, P(x, y) = (x, y, √(r^2 − x^2 − y^2)).
[Hint: (i) We know ∂P/∂x (x, y) × ∂P/∂y (x, y) = −r sin y P(x, y), which is an inward normal. Let X = [0, 2π] × [0, π/2]. Then ∫S f = −∫X ⟨f ◦ P, ∂P/∂x × ∂P/∂y⟩ = −∫X ⟨(r cos x sin y, r sin x sin y, 0), −r sin y (r cos x sin y, r sin x sin y, r cos y)⟩ = ∫_0^{π/2} ∫_0^{2π} r^3 sin^3 y dx dy = 2πr^3 ∫_0^{π/2} sin^3 y dy = 2πr^3 ∫_0^{π/2} (1 − cos^2 y) sin y dy = 2πr^3 ∫_0^1 (1 − λ^2) dλ = 4πr^3/3, by putting λ = cos y.
(ii) Letting ϕ(x, y) = √(r^2 − x^2 − y^2) and using Exercise-44, we have ∂P/∂x × ∂P/∂y = (−∂ϕ/∂x, −∂ϕ/∂y, 1) = P/ϕ, which is an outward normal. Let X = {(x, y) ∈ R^2 : x^2 + y^2 ≤ r^2}. Then ∫S f = ∫X ⟨f ◦ P, ∂P/∂x × ∂P/∂y⟩ = ∫X ⟨(x, y, 0), (x/√(r^2 − x^2 − y^2), y/√(r^2 − x^2 − y^2), 1)⟩ = ∫X (x^2 + y^2)/√(r^2 − x^2 − y^2). Using the polar coordinates (x, y) = (t cos θ, t sin θ) and the Change of variable theorem, this integral is equal to ∫_0^r ∫_0^{2π} (t^2 · t/√(r^2 − t^2)) dθ dt = ∫_0^r 2πt^3/√(r^2 − t^2) dt = ∫_0^r 2π(r^2 − λ^2) dλ = 4πr^3/3, by putting λ = √(r^2 − t^2) (then dλ = −t dt/√(r^2 − t^2) and t^2 = r^2 − λ^2).]
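A numerical check of the value 4πr^3/3 using the parametrization in (i) (a sketch; the value r = 2 is assumed only for illustration):

```python
# Sketch (not in the notes): numerical flux of f(x, y, z) = (x, y, 0)
# through the upper hemisphere of radius r, via a midpoint rule.
import numpy as np

r, n = 2.0, 1000
x = (np.arange(n) + 0.5) * (2*np.pi/n)                 # azimuthal midpoints
y = (np.arange(n) + 0.5) * ((np.pi/2)/n)               # polar midpoints
X, Y = np.meshgrid(x, y)
Px, Py = r*np.cos(X)*np.sin(Y), r*np.sin(X)*np.sin(Y)
# ∂P/∂x × ∂P/∂y = −r sin(Y) P is inward, so use +r sin(Y) P for the outward flux
integrand = (Px**2 + Py**2) * r * np.sin(Y)            # <f∘P, r sin(Y) P>
flux = integrand.sum() * (2*np.pi/n) * ((np.pi/2)/n)

print(flux, 4*np.pi*r**3/3)                            # both ≈ 33.51
```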

[144] Let U ⊂ R^3 be open, f : U → R^3 be continuous, and P : X → U and P̃ : X̃ → U be parametric surfaces. Suppose there is a C^1-diffeomorphism g : X → X̃ with P = P̃ ◦ g (if necessary assume g is defined in a neighborhood of X).
(i) If det(Jg(a)) > 0 for every a ∈ X, then ∫P f = ∫_{P̃} f.
(ii) If det(Jg(a)) < 0 for every a ∈ X, then ∫P f = −∫_{P̃} f.

Proof. If a ∈ X and b = g(a) ∈ X̃, then ∂P/∂x (a) × ∂P/∂y (a) = det(Jg(a)) (∂P̃/∂x (b) × ∂P̃/∂y (b)) by Exercise-43(ii). Let h : X̃ → R be h(b) = ⟨(f ◦ P̃)(b), ∂P̃/∂x (b) × ∂P̃/∂y (b)⟩. Then by the Change of variable theorem and the initial observation, we see that ∫_{P̃} f = ∫_{X̃} h = ∫X (h ◦ g) |det(Jg(·))| = ±∫X ⟨f ◦ P, ∂P/∂x × ∂P/∂y⟩ = ±∫P f, where we have the plus sign if det(Jg(·)) > 0 and the minus sign if det(Jg(·)) < 0. □


15. Divergence and curl

We will introduce the notions of divergence and curl for a vector field: divergence measures

expansion/compression (positive divergence indicates expansion and negative divergence indicates

compression), and curl measures the circulation density (the direction of the curl vector indicates

the axis around which maximal rotation happens and the magnitude of the curl vector measures

the speed of rotation).

Definition: (i) Let U ⊂ R^n be open and f = (f1, . . . , fn) : U → R^n be a C^1-function (or just assume that all the first order partial derivatives exist). Then the divergence of f is the scalar-valued function div f : U → R defined as div f = ⟨∇, f⟩ = ∇ · f = Σ_{i=1}^n ∂fi/∂xi. Note that if f is a C^k-function, then div f is a C^{k−1}-function.
(ii) Let U ⊂ R^3 be open and f = (f1, f2, f3) : U → R^3 be a C^1-function (or just assume that all the first order partial derivatives exist). Then the curl of f is the vector-valued function curl f : U → R^3 defined as curl f = ∇ × f = det[e1, e2, e3; ∂/∂x, ∂/∂y, ∂/∂z; f1, f2, f3] = (∂f3/∂y − ∂f2/∂z, ∂f1/∂z − ∂f3/∂x, ∂f2/∂x − ∂f1/∂y). Note that if f is a C^k-function, then curl f is a C^{k−1}-function. Also observe that if a ∈ U and Jf(a) is symmetric, then (curl f)(a) = 0.
(iii) Let U ⊂ R^n be open. If f : U → R is a C^2-function, then the Laplacian ∇^2 f of f is the function ∇^2 f : U → R defined as ∇^2 f = ⟨∇, ∇f⟩ = div(∇f) = Σ_{i=1}^n ∂^2 f/∂xi^2. If ∇^2 f ≡ 0, then f is said to be a harmonic function. For example, the real part and the imaginary part of a holomorphic function from C to C are known to be harmonic. If f = (f1, . . . , fn) : U → R^n is a C^2-function, then we define ∇^2 f = (∇^2 f1, . . . , ∇^2 fn).

Remark: [Meaning of divergence in R^2] Let U ⊂ R^2 be open, f = (f1, f2) : U → R^2 be a C^1-function, and (a, b) ∈ U. Consider a small square A ⊂ U centered at (a, b) with side-length ε > 0. The flow out of A through an edge of A is approximately equal to ⟨f(midpoint of the edge), outward unit normal of the edge⟩ × length of the edge. Hence,
(net amount of flow out of A)/area(A) ∼ (1/ε^2)[⟨f(a, b − ε/2), −e2⟩ε + ⟨f(a + ε/2, b), e1⟩ε + ⟨f(a, b + ε/2), e2⟩ε + ⟨f(a − ε/2, b), −e1⟩ε]
= [f1(a + ε/2, b) − f1(a − ε/2, b)]/ε + [f2(a, b + ε/2) − f2(a, b − ε/2)]/ε → ∂f1/∂x (a, b) + ∂f2/∂y (a, b) = (div f)(a, b) as ε → 0. A similar explanation can be given in higher dimensions.

Remark: [Meaning of curl] Let U ⊂ R^3 be open, f = (f1, f2, f3) : U → R^3 be a C^1-function, and consider a ∈ U. By definition, (curl f)(a) = (∂f3/∂y (a) − ∂f2/∂z (a), ∂f1/∂z (a) − ∂f3/∂x (a), ∂f2/∂x (a) − ∂f1/∂y (a)). The third coordinate ∂f2/∂x (a) − ∂f1/∂y (a) gives the circulation density of f at a in the plane passing through a and parallel to the xy-plane (equivalently, around the line passing through a and parallel to the z-axis). Similar explanations can be given for the first and second coordinates of (curl f)(a).

Notation: Let U ⊂ R^n be open, and f, g : U → R^n be functions. Write f = (f1, . . . , fn) and g = (g1, . . . , gn). Then ⟨f, g⟩ : U → R is defined as ⟨f, g⟩(a) = Σ_{i=1}^n fi(a) gi(a). If n = 3, then f × g : U → R^3 is defined as (f × g)(a) = (f(a)) × (g(a)).

Exercise-48: Let U ⊂ R^3 be open and f, g, h : U → R^3 be functions. Then,
(i) f × (g × h) = ⟨f, h⟩g − ⟨f, g⟩h by Exercise-42(x).
(ii) (f × g) × h = −(h × (f × g)) = ⟨h, f⟩g − ⟨h, g⟩f = ⟨f, h⟩g − ⟨g, h⟩f by (i).

Warning: Let U ⊂ R^n be open and f : U → R^n be a C^1-function. Then ⟨∇, f⟩ ≠ ⟨f, ∇⟩: the right hand side is the partial differential operator Σ_{i=1}^n fi ∂/∂xi.

[145] Let U ⊂ R^n be open. (i) Let f, g : U → R^n be C^1-functions, and c1, c2 ∈ R. Then div(c1 f + c2 g) = c1 div f + c2 div g. If n = 3, then curl(c1 f + c2 g) = c1 curl f + c2 curl g.
(ii) Let f : U → R^n and ϕ : U → R be C^1-functions. Then div(ϕf) = ⟨∇, ϕf⟩ = ⟨∇ϕ, f⟩ + ϕ⟨∇, f⟩ = ⟨∇ϕ, f⟩ + ϕ div f. If n = 3, then curl(ϕf) = ∇ × (ϕf) = ∇ϕ × f + ϕ(∇ × f) = ∇ϕ × f + ϕ curl f.
(iii) Assume n = 3, and let f, g : U → R^3 be C^1-functions. Then div(f × g) = ⟨∇, f × g⟩ = ⟨g, ∇ × f⟩ − ⟨f, ∇ × g⟩ = ⟨g, curl f⟩ − ⟨f, curl g⟩, and curl(f × g) = ∇ × (f × g) = ⟨g, ∇⟩f − ⟨f, ∇⟩g + ⟨∇, g⟩f − ⟨∇, f⟩g = ⟨g, ∇⟩f − ⟨f, ∇⟩g + (div g)f − (div f)g.

Proof. The verifications are left to the student. □

[146] Let U ⊂ R^3 be open, and f : U → R^3 be a C^1-function.
(i) If f = ∇F for some F : U → R, then curl f ≡ 0. That is, curl(grad) ≡ 0.
(ii) If f = curl g for some C^2-function g : U → R^3, then div f ≡ 0. That is, div(curl) ≡ 0.
(iii) If f = curl g for some C^2-function g : U → R^3, then curl f = curl(curl g) = ∇(div g) − ∇^2 g.

Proof. (i) Suppose f = ∇F, and consider a ∈ U. Then Jf(a) is equal to the transpose of the Hessian matrix HF(a). But F is a C^2-function (since f is C^1), and hence HF(a) is symmetric by [110]. Thus Jf(a) is symmetric, which implies (curl f)(a) = 0 by the definition of curl f.
(ii) Use the equality of second order mixed partial derivatives of g given by [110].
(iii) The proof is similar to that of Exercise-42(x). □
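Both identities can be confirmed symbolically for generic smooth F and g; the sketch below (not part of the notes) relies on sympy treating mixed partial derivatives as equal:

```python
# Sketch (not in the notes): symbolic confirmation of curl(grad F) = 0 and
# div(curl g) = 0 for generic smooth F and g, using sympy.
from sympy import symbols, Function, diff, simplify

x, y, z = symbols('x y z')

def curl(v):
    return [diff(v[2], y) - diff(v[1], z),
            diff(v[0], z) - diff(v[2], x),
            diff(v[1], x) - diff(v[0], y)]

def div(v):
    return diff(v[0], x) + diff(v[1], y) + diff(v[2], z)

F = Function('F')(x, y, z)
g = [Function('g1')(x, y, z), Function('g2')(x, y, z), Function('g3')(x, y, z)]

print([simplify(c) for c in curl([diff(F, x), diff(F, y), diff(F, z)])])  # [0, 0, 0]
print(simplify(div(curl(g))))                                             # 0
```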

Remark: (i) Compare [146](i) with [139](i) and [139](ii). Another way to understand [146](i) is: if U is connected and f = ∇F, then ∫α f = 0 by [138] for every piecewise C^1 closed path α in U, and therefore the ‘circulation density’ of f is zero everywhere in U; so curl f ≡ 0.
(ii) At a formal level, [146](ii) says ⟨∇, ∇ × g⟩ = 0, which is similar to the fact ⟨u, u × v⟩ = 0. Further intuition about [146](ii) is given by Gauss’ divergence theorem [149] (see Exercise-51).

Example: (i) Let f : R^3 → R^3 be f(x, y, z) = (x, y, z). This represents a flow originating from (0, 0, 0) and spreading outwards with increasing speed. There is no rotation involved. We see that div f ≡ 3 and curl f ≡ 0 ∈ R^3.
(ii) Let f : R^3 → R^3 be f(x, y, z) = (−y, x, 0), which is a rotation around the z-axis. There is neither expansion nor compression. We see that div f ≡ 0 and (curl f)(x, y, z) = (0, 0, 2).
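The divergence and curl of these two example fields can be recomputed mechanically (a sketch, not part of the notes):

```python
# Sketch (not in the notes): div and curl of the two example fields.
from sympy import symbols, diff, Matrix

x, y, z = symbols('x y z')

def div_and_curl(f):
    d = diff(f[0], x) + diff(f[1], y) + diff(f[2], z)
    c = Matrix([diff(f[2], y) - diff(f[1], z),
                diff(f[0], z) - diff(f[2], x),
                diff(f[1], x) - diff(f[0], y)])
    return d, c.T

print(div_and_curl(Matrix([x, y, z])))        # (3, Matrix([[0, 0, 0]]))
print(div_and_curl(Matrix([-y, x, 0])))       # (0, Matrix([[0, 0, 2]]))
```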

Exercise-49: Let U ⊂ R^3 be open.
(i) If F : U → R is a harmonic C^2-function and f = ∇F, then div f ≡ 0 ∈ R and curl f ≡ 0 ∈ R^3.
(ii) If f, g : U → R^3 are C^2-functions with div f = div g and curl f = curl g, then ∇^2(f − g) ≡ 0 ∈ R^3.
(iii) Assume U is connected. Then there is a non-constant C^2-function f : U → R^3 with div f ≡ 0 ∈ R and curl f ≡ 0 ∈ R^3.
[Hint: (i) div f = ∇^2 F ≡ 0 since F is harmonic. And [146](i) gives curl f = 0. (ii) Let h = f − g, and note by [146](iii) that curl(curl h) = ∇(div h) − ∇^2 h. (iii) Let F : R^3 → R be F(x, y, z) = x^2 − y^2, which is (the real part of a holomorphic function and) harmonic. Let f = ∇F. Then div f ≡ 0 ∈ R and curl f ≡ 0 ∈ R^3 by (i), but f is not a constant: f(x, y, z) = (2x, −2y, 0).]

The converse parts of [146](i) and [146](ii) are true in certain special cases:

[147] Let U ⊂ R^3 be open and f : U → R^3 be a C^1-function.
(i) Assume U is convex^7. Then there is F : U → R with ∇F = f ⇔ curl f ≡ 0 ∈ R^3.
(ii) Assume U is the interior of a 3-box. Then there is g : U → R^3 with curl g = f ⇔ div f ≡ 0 ∈ R.

Proof. (i) The implication ⇒ is given by [146](i). And the reverse implication follows from [140] because Jf(a) is symmetric for every a ∈ U if curl f ≡ 0.
(ii) The implication ⇒ is given by [146](ii). For the reverse implication, see Theorem 12.5 of Apostol, Calculus-II (left as a reading assignment). □

16. Stokes’ theorem

The result analogous to Green’s theorem in dimension 3 is called Stokes’ theorem. Roughly speaking, it says that if P : X ⊂ R^2 → R^3 is a parametric surface, then the net amount of rotation of a flow tangential to P(X) is equal to the net amount of anticlockwise flow along the boundary of P(X). A technical point: if necessary, assume P is defined in a neighborhood of X.

^7 Or more generally, an open set without ‘holes’.

[148] [Stokes’ theorem] Let U ⊂ R^3 be open, f : U → R^3 be a C^1-function, P : X ⊂ R^2 → U be an injective C^2 parametric surface, and let α : [c, d] → R^2 be a piecewise C^1 simple closed path parametrizing ∂X in the anticlockwise direction. Then ∫P curl f = ∫_{P◦α} f.

Proof. Write f = (f1, f2, f3). Then f = (f1, 0, 0) + (0, f2, 0) + (0, 0, f3). Since the curl and the integrals (the surface integral ∫P and the line integral ∫_{P◦α}) are linear operators, it suffices to prove the result for each of the three functions (f1, 0, 0), (0, f2, 0), (0, 0, f3) separately. We will prove the result for the function (0, 0, f3); the proofs for the other two functions are similar. So assume f = (0, 0, f3) for the rest of the proof. We will use Green’s theorem in the proof.
Write P = (P1, P2, P3) and α = (α1, α2). We will denote the members of U as u = (u1, u2, u3). Since f = (0, 0, f3), we see that
∫_{P◦α} f = ∫_c^d ⟨(f ◦ P ◦ α)(t), (P ◦ α)′(t)⟩ dt = ∫_c^d [(f3 ◦ P ◦ α)(t) ∂P3/∂x (α(t)) α1′(t) + (f3 ◦ P ◦ α)(t) ∂P3/∂y (α(t)) α2′(t)] dt = ∫_c^d [g1(α(t)) α1′(t) + g2(α(t)) α2′(t)] dt = ∫_c^d ⟨g(α(t)), α′(t)⟩ dt = ∫α g = ∫∂X g,
where g : X → R^2 is defined as g(a) = (g1(a), g2(a)) = (f3(P(a)) ∂P3/∂x (a), f3(P(a)) ∂P3/∂y (a)). Note that g is a C^1-function because f is C^1 and P is C^2. Applying Green’s theorem to g, we conclude that ∫_{P◦α} f = ∫∂X g = ∫X (∂g2/∂x − ∂g1/∂y). (*)

For a ∈ X and u = P(a) ∈ U, observe by the definition of g and the Chain rule that
∂g2/∂x (a) = (Σ_{i=1}^3 ∂f3/∂ui (u) ∂Pi/∂x (a)) ∂P3/∂y (a) + f3(u) ∂^2P3/∂x∂y (a), and
∂g1/∂y (a) = (Σ_{i=1}^3 ∂f3/∂ui (u) ∂Pi/∂y (a)) ∂P3/∂x (a) + f3(u) ∂^2P3/∂y∂x (a).
The final terms are equal by the equality of mixed partial derivatives of P3 given by [110] since P3 is a C^2-function. Moreover, the terms corresponding to i = 3 are also equal. Therefore,
(∂g2/∂x − ∂g1/∂y)(a) = ∂f3/∂u1 (u) (∂P1/∂x ∂P3/∂y − ∂P3/∂x ∂P1/∂y)(a) + ∂f3/∂u2 (u) (∂P2/∂x ∂P3/∂y − ∂P3/∂x ∂P2/∂y)(a). (**)

Next observe that curl f = det[e1, e2, e3; ∂/∂u1, ∂/∂u2, ∂/∂u3; 0, 0, f3] = (∂f3/∂u2, −∂f3/∂u1, 0), and
∂P/∂x × ∂P/∂y = (∂P2/∂x ∂P3/∂y − ∂P3/∂x ∂P2/∂y, ∂P3/∂x ∂P1/∂y − ∂P1/∂x ∂P3/∂y, ∂P1/∂x ∂P2/∂y − ∂P2/∂x ∂P1/∂y).
Therefore, for a ∈ X and u = P(a), we may deduce using (**) that
⟨(curl f)(P(a)), ∂P/∂x (a) × ∂P/∂y (a)⟩ = (∂g2/∂x − ∂g1/∂y)(a).
Hence (*) implies that ∫P curl f = ∫X ⟨(curl f) ◦ P, ∂P/∂x × ∂P/∂y⟩ = ∫X (∂g2/∂x − ∂g1/∂y) = ∫_{P◦α} f. □
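A simple numerical illustration of [148] (a sketch, not part of the notes): take the field f(x, y, z) = (−y, x, 0), so curl f = (0, 0, 2), and the upper unit hemisphere parametrized as a graph over the unit disc; both sides of Stokes’ theorem equal 2π.

```python
# Sketch (not in the notes): both sides of Stokes' theorem for the assumed
# field f(x, y, z) = (−y, x, 0) over the upper unit hemisphere.
import numpy as np

n = 2000
# surface side: ∫_X <curl f ∘ P, (−ϕ_x, −ϕ_y, 1)> = ∫_X 2, computed in polar coordinates
t = (np.arange(n) + 0.5) / n                     # radial midpoints in (0, 1)
theta = (np.arange(n) + 0.5) * (2*np.pi/n)
T, _ = np.meshgrid(t, theta)
surface_side = np.sum(2.0 * T) * (1.0/n) * (2*np.pi/n)   # integrand 2, Jacobian T

# boundary side: ∮ <f(cos s, sin s, 0), (−sin s, cos s, 0)> ds = ∮ 1 ds
s = (np.arange(n) + 0.5) * (2*np.pi/n)
boundary_side = np.sum(np.sin(s)**2 + np.cos(s)**2) * (2*np.pi/n)

print(surface_side, boundary_side)               # both ≈ 6.2831
```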

Remark: (i) Stokes’ theorem can be extended to ‘surfaces’ obtained by ‘pasting together’ finitely many (images of) parametric surfaces, provided that on any common boundary of two distinct parametric surfaces the line integrals are in opposite directions and cancel each other. On the other hand, Stokes’ theorem cannot be extended to ‘non-orientable’ surfaces such as the Möbius band (which has only ‘one side’). For more details on this topic, see Section 12.8 of Apostol, Calculus-II.
(ii) Stokes’ theorem provides an insight into the identity curl(grad) ≡ 0 as follows. Suppose f = ∇F. Then by taking β = P ◦ α in [148], we see ∫P curl f = ∫β f = F(β(d)) − F(β(c)) = 0 by [137](i) since β is a closed path. As P is an arbitrary parametric surface, we may deduce that curl f ≡ 0.
(iii) Let U ⊂ R^3 be open, f : U → R^3 be a C^1-function, P : X ⊂ R^2 → U be an injective C^2 parametric surface, and let α : [a, b] → R^2 be a piecewise C^1 simple closed path parametrizing ∂X in the anticlockwise direction. If f = curl g, then the surface integral ∫P f simplifies to the line integral ∫_{P◦α} g by applying [148] to g.

Exercise-50: (i) Let r > 0 and S ⊂ R^3 be the sphere with radius r centered at the origin, with upper half S1 and lower half S2. Let X = {(x, y) ∈ R^2 : x^2 + y^2 ≤ r^2}, and P, Q : X → R^3 be P(x, y) = (x, y, ϕ(x, y)) and Q(x, y) = (x, −y, −ϕ(x, y)), where ϕ(x, y) = √(r^2 − x^2 − y^2). Then P parametrizes S1, Q parametrizes S2, and the common boundary ∂S1 ∩ ∂S2 (which is the equator of S) is parametrized by P and Q in opposite directions (because of the minus sign for y in the expression for Q). Moreover, ∂P/∂x × ∂P/∂y = (−∂ϕ/∂x, −∂ϕ/∂y, 1) and ∂Q/∂x × ∂Q/∂y = (−∂ϕ/∂x, ∂ϕ/∂y, −1) give outward normals to S1 and S2 respectively (in the first case, the z-coordinate is positive, which indicates an upward normal; in the second case, the z-coordinate is negative, which indicates a downward normal). Hence if U ⊂ R^3 is an open neighborhood of S and f : U → R^3 is a C^1-function, then ∫S f = ∫P f + ∫Q f.

(ii) Let U ⊂ R^3 be open, f : U → R^3 be a C^1-function, and suppose f = curl g for some g : U → R^3. Then ∫S f = 0 for any sphere S ⊂ U when the integral ∫S f is considered with respect to the outward unit normal to S.
[Hint: (ii) For simplicity assume S = {(x, y, z) ∈ R^3 : x^2 + y^2 + z^2 = r^2}. Write S = S1 ∪ S2, where S1 is the upper half and S2 is the lower half of S. Let P, Q be as in part (i). Since the common boundary ∂S1 ∩ ∂S2 is parametrized in opposite directions by P and Q, and since ∂P/∂x × ∂P/∂y and ∂Q/∂x × ∂Q/∂y give outward normals to S1 and S2 respectively, we may extend Stokes’ theorem to S for the function g and conclude ∫S f = ∫_{S1} f + ∫_{S2} f = ∫P curl g + ∫Q curl g = ∫_{∂S1} g + ∫_{∂S2} g = 0.]

Let U ⊂ R^3 be open, and f : U → R^3 be a C^1-function. By [146](ii), a necessary condition for the existence of a C^2-function g : U → R^3 with f = curl g is div f ≡ 0. The following example shows that this condition is not sufficient.

Example: Let U = R^3 \ {0}, and f : U → R^3 be f(x, y, z) = (x, y, z)/(x^2 + y^2 + z^2)^{3/2}. We see that ∂f1/∂x = (y^2 + z^2 − 2x^2)/(x^2 + y^2 + z^2)^{5/2}, ∂f2/∂y = (z^2 + x^2 − 2y^2)/(x^2 + y^2 + z^2)^{5/2}, and ∂f3/∂z = (x^2 + y^2 − 2z^2)/(x^2 + y^2 + z^2)^{5/2}. Hence f is a C^1-function, and div f = ∂f1/∂x + ∂f2/∂y + ∂f3/∂z ≡ 0.

To prove f ≠ curl g for any g, it suffices to show, in view of Exercise-50(ii), that ∫S f ≠ 0 for the unit sphere S ⊂ R^3, where the integral ∫S f is taken with respect to the outward unit normal of S. Let S1 and S2 be respectively the upper and lower halves of the unit sphere S. Let X = {(x, y) ∈ R^2 : x^2 + y^2 ≤ 1}, and P, Q : X → U be P(x, y) = (x, y, √(1 − x^2 − y^2)) and Q(x, y) = (x, −y, −√(1 − x^2 − y^2)). Then P parametrizes S1, and Q parametrizes S2. Using the polar coordinates (x, y) = (t cos θ, t sin θ) and the Change of variable theorem (where the determinant of the Jacobian is t), we see
∫P f = ∫X ⟨f ◦ P, ∂P/∂x × ∂P/∂y⟩ = ∫X ⟨(x, y, √(1 − x^2 − y^2)), (x/√(1 − x^2 − y^2), y/√(1 − x^2 − y^2), 1)⟩ = ∫X 1/√(1 − x^2 − y^2) = ∫_0^1 ∫_0^{2π} (t/√(1 − t^2)) dθ dt = 2π ∫_0^1 t/√(1 − t^2) dt = π ∫_0^1 λ^{−1/2} dλ = 2π (by putting λ = 1 − t^2).
Similarly, ∫Q f = ∫X ⟨f ◦ Q, ∂Q/∂x × ∂Q/∂y⟩ = ∫X ⟨(x, −y, −√(1 − x^2 − y^2)), (x/√(1 − x^2 − y^2), −y/√(1 − x^2 − y^2), −1)⟩ = ∫X 1/√(1 − x^2 − y^2) = 2π.
Both ∂P/∂x × ∂P/∂y and ∂Q/∂x × ∂Q/∂y give outward normals to the unit sphere; see Exercise-50(i). Hence ∫S f = ∫_{S1} f + ∫_{S2} f = ∫P f + ∫Q f = 4π ≠ 0. Thus we conclude by Exercise-50(ii) that f ≠ curl g for any C^2-function g : U → R^3. (Remark: In this example, the failure of the existence of any g with curl g = f is essentially due to the fact that the open set U has a ‘hole’.)
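A numerical confirmation that the total flux is 4π (a sketch, not part of the notes): on the unit sphere, f coincides with the outward unit normal, so the flux equals the surface area.

```python
# Sketch (not in the notes): flux of f(x, y, z) = (x, y, z)/‖(x, y, z)‖³
# out of the unit sphere, via spherical coordinates and a midpoint rule.
import numpy as np

n = 2000
phi = (np.arange(n) + 0.5) * (np.pi/n)           # polar angle midpoints
th = (np.arange(n) + 0.5) * (2*np.pi/n)          # azimuth midpoints
PH, TH = np.meshgrid(phi, th)
# On the unit sphere, <f, n̂> = 1, so the flux is ∫∫ sin(phi) dphi dtheta.
flux = np.sum(np.sin(PH)) * (np.pi/n) * (2*np.pi/n)

print(flux, 4*np.pi)                             # both ≈ 12.566
```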

17. Gauss’ divergence theorem

Recall the interpretation of divergence as measuring expansion/compression of a flow, and think

of compression as negative expansion. Then the divergence theorem of Gauss ([149] below) says

roughly that the net amount of expansion of a flow in a 3-dimensional solid region V is equal to

the net amount of flow out of V through the ‘surface’ ∂V . We will present Gauss’ theorem only

for a special type of solid region (similar to the ‘elementary region’ in Green’s theorem).

Remark: In the theorems of Green, Stokes, and Gauss, we have an equality between an integral over an n-dimensional region (n = 2 or n = 3) and an integral over the boundary of the region. In this sense, all these three theorems can be thought of as generalizations of the Fundamental theorem of calculus (which says ∫_a^b f(x) dx = F(b) − F(a) if F′ = f and f is continuous).

Definition: We say V ⊂ R^3 is an elementary solid if there are
(i) compact connected sets X, X̃, X̂ ⊂ R^2 bounded by piecewise C^1 simple closed paths, and
(ii) C^1-functions ϕ, ψ : X → R, ϕ̃, ψ̃ : X̃ → R, and ϕ̂, ψ̂ : X̂ → R with ϕ < ψ on int(X), ϕ̃ < ψ̃ on int(X̃), and ϕ̂ < ψ̂ on int(X̂) such that
V = {(x, y, z) ∈ R^3 : (x, y) ∈ X and ϕ(x, y) ≤ z ≤ ψ(x, y)}
= {(x, y, z) ∈ R^3 : (x, z) ∈ X̃ and ϕ̃(x, z) ≤ y ≤ ψ̃(x, z)}
= {(x, y, z) ∈ R^3 : (y, z) ∈ X̂ and ϕ̂(y, z) ≤ x ≤ ψ̂(y, z)}.
Note that when this holds, the graphs of ϕ, ψ, ϕ̃, ψ̃, ϕ̂, ψ̂ are (images of) parametric surfaces, and V is the region between pairs of such surfaces. Examples of elementary solids are 3-boxes, solid spheres, etc.

[149] [Gauss’ divergence theorem] Let U ⊂ R^3 be open, f : U → R^3 be a C^1-function, and V ⊂ U be an elementary solid. Then ∫V div f = ∫∂V f, where the ‘surface integral’ ∫∂V f is taken with respect to the outward unit normal to ∂V.

Proof. Write f = (f1, f2, f3). Then f = (f1, 0, 0) + (0, f2, 0) + (0, 0, f3). Since div and the integrals are linear operators, it is enough to prove the result for each of the three functions (f1, 0, 0), (0, f2, 0), (0, 0, f3) separately. We will prove the result for the function (0, 0, f3); the proofs for the other two functions are similar. So assume f = (0, 0, f3) for the rest of the proof.
As per the definition, the elementary solid V has three representations, out of which choose the following one: V = {(x, y, z) ∈ R^3 : (x, y) ∈ X and ϕ(x, y) ≤ z ≤ ψ(x, y)}, where X ⊂ R^2 is a compact connected set bounded by a piecewise C^1 simple closed path, and ϕ, ψ : X → R are C^1-functions with ϕ < ψ on int(X). Since f = (0, 0, f3), we get by Fubini’s theorem that
∫V div f = ∫V ∂f3/∂z = ∫_{(x,y)∈X} (∫_{ϕ(x,y)}^{ψ(x,y)} ∂f3/∂z dz) d(x, y) = ∫X (f3(x, y, ψ(x, y)) − f3(x, y, ϕ(x, y))). (*)
Let P, Q : X → U be P(x, y) = (x, y, ψ(x, y)) and Q(x, y) = (x, y, ϕ(x, y)). Then V is the region between the images of the parametric surfaces P (upper part) and Q (lower part). We may write ∂V = P(X) ∪ S ∪ Q(X), where S is the part of ∂V between P(X) and Q(X). Note the following:
(i) ∂P/∂x × ∂P/∂y = (−∂ψ/∂x, −∂ψ/∂y, 1), which is an outward normal to the upper part P(X) of ∂V because the z-coordinate is positive (upward).
(ii) ∂Q/∂x × ∂Q/∂y = (−∂ϕ/∂x, −∂ϕ/∂y, 1), which is an inward normal to the lower part Q(X) of ∂V because the z-coordinate is positive (upward).
(iii) Any outward normal to the ‘middle part’ S of ∂V is parallel to the xy-plane and hence has z-coordinate zero. Consequently, ∫S f = 0 because f = (0, 0, f3) by assumption.

By the above observations, the value of ∫∂V f with respect to the outward unit normal is:
∫∂V f = ∫P f − ∫Q f = ∫X ⟨f ◦ P, ∂P/∂x × ∂P/∂y⟩ − ∫X ⟨f ◦ Q, ∂Q/∂x × ∂Q/∂y⟩ = ∫X ((f3 ◦ P) · 1) − ∫X ((f3 ◦ Q) · 1) = ∫X (f3(x, y, ψ(x, y)) − f3(x, y, ϕ(x, y))) = ∫V div f by (*). □
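A symbolic check of [149] on a simple case (a sketch; the solid V = [0, 1]^3 and the field f(x, y, z) = (x^2, y^2, z^2) are assumptions chosen for illustration):

```python
# Sketch (not in the notes): both sides of the divergence theorem for
# f(x, y, z) = (x², y², z²) on the box V = [0, 1]³, evaluated with sympy.
from sympy import symbols, integrate, diff

x, y, z = symbols('x y z')
f = (x**2, y**2, z**2)

div_f = diff(f[0], x) + diff(f[1], y) + diff(f[2], z)
volume_integral = integrate(div_f, (x, 0, 1), (y, 0, 1), (z, 0, 1))

# Outward flux through the six faces of the box: on x = 1 the normal is e1, etc.
flux = (integrate(f[0].subs(x, 1) - f[0].subs(x, 0), (y, 0, 1), (z, 0, 1))
        + integrate(f[1].subs(y, 1) - f[1].subs(y, 0), (x, 0, 1), (z, 0, 1))
        + integrate(f[2].subs(z, 1) - f[2].subs(z, 0), (x, 0, 1), (y, 0, 1)))

print(volume_integral, flux)    # both equal 3
```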

Remark: Gauss’ divergence theorem can be extended to more general 3-dimensional solid regions

which are formed by ‘pasting together’ finitely many elementary solids V1, . . . , Vk provided on any

shared boundary of Vi and Vj (for i ̸= j), the outward unit normals are in opposite directions (so

that the respective parts of surface integrals over ∂Vi and ∂Vj cancel each other).

Exercise-51: Derive ‘div(curl) ≡ 0’ using Gauss’ divergence theorem. [Hint: Let U ⊂ R^3 be open and f : U → R^3 be a C^1-function, and assume f = curl g for some g : U → R^3. If (div f)(u) ≠ 0 for some u ∈ U, assume (div f)(u) > 0. Choose δ > 0 and a small solid sphere V ⊂ U centered at u such that div f ≥ δ in V. Then ∫V div f ≥ δµ(V) > 0. Hence by [149], ∫∂V f = ∫V div f > 0. On the other hand, ∫∂V f = 0 by Exercise-50(ii), a contradiction.]

[150] [A coordinate-free expression for ‘div’] Let U ⊂ R^3 be open, f : U → R^3 be a C^1-function, and w ∈ U. Then (div f)(w) = lim_{r→0+} (1/volume(B(w, r))) ∫_{∂B(w,r)} f = lim_{r→0+} (3/(4πr^3)) ∫_{∂B(w,r)} f.

Proof. Let δ0 > 0 be such that B(w, δ0) ⊂ U, and let Mr = µ(B(w, r)) for r > 0. Consider ε > 0. We have to find δ ∈ (0, δ0) such that |(div f)(w) − (1/Mr) ∫_{∂B(w,r)} f| < ε for every r ∈ (0, δ). As f is a C^1-function, div f is continuous, and so there is δ ∈ (0, δ0) such that |(div f)(u) − (div f)(w)| < ε/2 whenever u ∈ B(w, δ). Now consider any r ∈ (0, δ). We may write (div f)(w) as (1/Mr) ∫_{B(w,r)} (div f)(w). Moreover, ∫_{∂B(w,r)} f = ∫_{B(w,r)} div f by [149]. Hence |(div f)(w) − (1/Mr) ∫_{∂B(w,r)} f| ≤ (1/Mr) ∫_{B(w,r)} |(div f)(w) − (div f)(u)| du ≤ (1/Mr) ∫_{B(w,r)} (ε/2) = ε/2 < ε. □

Remark: Similarly, ‘curl’ is independent of coordinates: recall the explanation given earlier in terms

of circulation density.

*****