

NOTES, MAT 472, INTERMEDIATE ANALYSIS, FALL 2010

JACK SPIELBERG

Contents

1. Axioms for the real numbers
2. Cardinality (briefly)
3. Decimal representation of real numbers
4. Metric spaces
5. The topology of metric spaces
6. The Cantor set
7. Sequences
8. Continuous functions
9. Limits of functions
10. Sequences in R
11. Limsup and liminf
12. Infinite limits and limits at infinity
13. Cauchy sequences and complete metric spaces
14. Compactness
15. Continuity and compactness
16. Connectedness
17. Continuity and connectedness
18. Uniform continuity
19. Convergence of functions
20. Differentiation
21. Higher order derivatives and Taylor's theorem
22. The Riemann integral
23. The "Darboux" approach
24. Measure zero and integration
25. The fundamental theorem of calculus
26. The Weierstrass approximation theorem
27. Uniform convergence and the interchange of limits
28. Infinite series
29. Series of functions
30. Power series
31. Compactness in function space
32. Conditional convergence


1. Axioms for the real numbers

In this course we will more-or-less follow an axiomatic approach. Namely, we will give axioms for the real numbers, and prove everything in the course from these axioms. Well, this is not strictly true — some things will be stated without proof. These may be as simple as ordinary high school algebra, which we will assume is well-understood already (and that you have had some experience in deriving from the axioms). We will also make use of various functions familiar from calculus, such as the trigonometric, exponential and logarithmic functions, even if we haven't yet proved their existence and properties from the axioms. However, in this case we will eventually at least sketch how this can be done rigorously, and we promise that even if we never talk about these proofs, the use we make of these functions isn't needed for the proofs (thus we avoid any circularity in the logical structure of the material). There is one theorem at the foundation of the course that we will neither prove nor sketch. That one we will just take "on faith." (You can read a proof in the first chapter of Rudin's book, and if you do, you will understand why we won't use class time for it.)

So now we begin. The axioms that define the real numbers come in three parts: the field axioms, the order axioms, and the completeness axiom.

Definition 1.1. A field is a set F with two binary operations, addition (denoted +) and multiplication (denoted ·), that satisfy the following axioms (we assume that these are familiar, so we only give them briefly).

(1) Addition and multiplication are associative and commutative.
(2) There exist identity elements for addition and multiplication, denoted 0 and 1, respectively.
(3) 0 ≠ 1.
(4) Every element of F has an additive inverse.
(5) Every non-zero element of F has a multiplicative inverse.
(6) Multiplication distributes over addition.

All of the usual algebraic rules of arithmetic follow from these axioms. For example:

• Additive and multiplicative identities and inverses are unique.
• (−1)^2 = 1.
• xy = 0 if and only if x = 0 or y = 0.
• (a + b)^n = ∑_{j=0}^{n} (n choose j) a^{n−j} b^j, where the binomial coefficients are defined by (n choose j) = n! / (j!(n − j)!).

We will assume familiarity with this stuff. It is interesting, though, to consider what is actually included in the phrase "this stuff." What facts from high school algebra are covered by the field axioms? Here is an example of something that is not covered.
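The binomial identity above can be spot-checked with exact integer arithmetic (an illustrative Python sketch; the helper name `binom_expand` is ours, not from the notes):

```python
import math

def binom_expand(a, b, n):
    # Right-hand side of the binomial theorem:
    # sum over j of (n choose j) * a^(n-j) * b^j.
    return sum(math.comb(n, j) * a**(n - j) * b**j for j in range(n + 1))

# Check (a + b)^n against the expansion for integer inputs,
# where all arithmetic is exact.
for a, b, n in [(2, 3, 5), (-1, 4, 7), (10, -7, 6)]:
    assert (a + b)**n == binom_expand(a, b, n)
```

This only tests particular instances, of course; the identity itself follows from the field axioms by induction on n.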

Example 1.2. Let F be a field. Are the elements 1, 1 + 1, 1 + 1 + 1, 1 + 1 + 1 + 1, . . . all distinct? In fact, if we just have the field axioms, we can neither prove nor disprove that these are all distinct elements. Notice that these are what we normally refer to as the natural numbers (denoted N). So it isn't clear that the natural numbers even make sense in an arbitrary field.

Exercise 1.3. Explain why the “fact” stated in the previous example is true.


As another example, it is impossible to define the absolute value function in an arbitrary field in an intelligent way. To resolve these problems (i.e. to make sure that our axioms really do pick out the real numbers) we have to give more axioms.

Definition 1.4. Let F be a field. Then F is an ordered field if there is a distinguished subset F+ of F (called the positive elements of F) satisfying the following properties.

(1) For each x ∈ F, exactly one of the three statements x ∈ F+, −x ∈ F+, x = 0 is true.
(2) If x, y ∈ F+ then x + y ∈ F+.
(3) If x, y ∈ F+ then xy ∈ F+.

Now we define the usual symbols to express order. For any elements x, y ∈ F, we write x > y to mean x − y ∈ F+, x ≥ y to mean x > y or x = y, etc.

All of the usual rules of inequalities follow from the order axioms and the field axioms. For example:

• If x ≤ y and y ≤ x, then x = y.
• If x ≠ 0 then x^2 > 0.
• If x < y and z > 0 then xz < yz.
• If 0 < x < y and n ∈ N then x^n < y^n.
• xy > 0 if and only if x > 0 and y > 0, or x < 0 and y < 0.
• A non-empty finite subset of F has a minimum. (Of course this can fail for infinite subsets.)

As a further example, let's note that the order axioms resolve the ambiguity mentioned above. If F is an ordered field, then 1 + 1 + · · · + 1 > 0. It follows easily (exercise!) that 1, 1 + 1, 1 + 1 + 1, . . . are all distinct positive elements of F. Thus N is "contained" in every ordered field. But then the integers, Z, are also contained in every ordered field. But then the rational numbers, Q, are also contained in every ordered field. In fact, the rational numbers themselves are an ordered field (this is obvious, isn't it?). Thus the rational numbers can be described as the smallest ordered field.

We will now dip back into high school algebra to review the absolute value function. Even though we assume familiarity with this stuff, absolute value is so important that it's worth stating some of the details.

Definition 1.5. Let F be an ordered field. The absolute value function | · | : F → F is defined by

    |x| = { x,  if x ≥ 0,
          { −x, if x < 0.

When we think of the real numbers as a "number line," we can view |x| as the "distance" between x and 0. The number line is great for intuition, but must not be used for proofs.

Here are the basic properties of absolute value.

(1) | − x| = |x|.
(2) |x| ≥ 0.
(3) ±x ≤ |x|.
(4) |x| < a if and only if −a < x and x < a (written −a < x < a).
(5) |x| > a if and only if x < −a or x > a. (Note that this cannot be written without using the word "or".)
(6) |x + y| ≤ |x| + |y| (the triangle inequality).
(7) |x + y| ≥ | |x| − |y| |.


(8) |x − a| < r if and only if a − r < x < a + r (draw a picture on the number line).
(9) If a < x < b and a < y < b then |x − y| < b − a.
(10) Let x ∈ F. Suppose that |x| < ε for every positive element ε ∈ F. Then x = 0.

Property 10 above can be strengthened a bit, in a way that can be very useful. (Don't cite property 10 when proving this.)

Exercise 1.6. Let F be an ordered field, and let x ∈ F. Suppose that p, q ∈ F+ are such that for every ε ∈ F with 0 < ε < p, we have |x| < qε. Then x = 0.

Remark 1.7. Here is another consequence of the ordered field axioms. Let b > 0. Then

    (1 + b)^n = 1 + nb + · · · > nb.

Now let 0 < a < 1. Then 1/a > 1, so

    (1 − a)/a = 1/a − 1 > 0.

Let b = 1/a − 1. Then a = 1/(1 + b), and

    a^n = 1/(1 + b)^n < 1/(nb) = (a/(1 − a)) · (1/n).

Now we ask the following question (assuming some familiarity with the concept of limit, but only for the sake of the discussion): if 0 < a < 1, does a^n tend towards 0 as n → ∞? Another way to put this is to ask: if c is any fixed positive element, does there exist n_0 ∈ N such that a^n < c for all n ≥ n_0? Using the above computations, we see that we can answer this question affirmatively if we could show that for any fixed positive element c, there exists n_0 ∈ N such that a/((1 − a)n) < c for all n ≥ n_0. Now observe that we could do this if we could find n_0 ∈ N such that a/((1 − a)n_0) < c. In other words, we could prove that a^n → 0 if we could find n_0 ∈ N such that n_0 > a/((1 − a)c). But since c is an arbitrary positive element, then so is a/((1 − a)c). So this all comes down to trying to prove that for any positive element x, there is a natural number n_0 such that n_0 > x. An ordered field in which this is true is called Archimedean.
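The bound a^n < (a/(1 − a)) · (1/n) from Remark 1.7 can be checked with exact rational arithmetic (an illustrative Python sketch; the variable names are ours):

```python
from fractions import Fraction

# Exact check of a^n < (a / (1 - a)) * (1 / n) for a sample 0 < a < 1.
# Fractions keep every step exact, so no floating-point error intrudes.
a = Fraction(3, 4)
c = a / (1 - a)  # equals 1/b, where b = 1/a - 1
for n in range(1, 200):
    assert a**n < c * Fraction(1, n)
```

For a = 3/4 the constant c is 3, so the bound says (3/4)^n < 3/n; the loop confirms this for n up to 199, in line with the estimate (1 + b)^n > nb.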

Definition 1.8. Let F be an ordered field. F is called Archimedean if for every x ∈ F there exists a natural number n such that x < n.

It is evident that Q is an Archimedean ordered field, and we "know" that R is one too. But we can't prove it yet, because not all ordered fields are Archimedean!! In other words, we don't yet have enough axioms for the real numbers, since we can't prove the most basic fact from advanced calculus. Along with the field and order axioms, there is one more axiom that is necessary to characterize the real numbers. We need some definitions before we can present it.

Definition 1.9. Let F be an ordered field, let S ⊆ F , and let x ∈ F .

(1) x is an upper bound of S if y ≤ x for every y ∈ S.
(2) x is a lower bound of S if y ≥ x for every y ∈ S.
(3) S is bounded above if there exists an upper bound for S.
(4) S is bounded below if there exists a lower bound for S.
(5) S is bounded if it is bounded above and below.

Exercise 1.10. Is the empty set bounded?


Definition 1.11. Let S be a subset of an ordered field F, and let x ∈ F. x is a supremum (or sup, or least upper bound, or lub) of S if

(1) x is an upper bound of S.
(2) For every upper bound z of S, x ≤ z.

The condition (2) can be expressed in the equivalent forms:

(2′) For any z ∈ F, if z < x then z is not an upper bound of S.
(2′′) For any z ∈ F, if z < x then there exists y ∈ S with z < y.

In a completely analogous manner we define infimum (or inf, greatest lower bound, glb). The details of the precise formulation are left as an exercise.

Remark 1.12. It follows immediately from condition (2) of Definition 1.11 that if S has a supremum then it is unique (and similarly for infimum).

Exercise 1.13. Let S be a subset of an ordered field F, and let x ∈ F. Let −S = {−y : y ∈ S}.

(1) S is bounded above (respectively, below) if and only if −S is bounded below (respectively, above).
(2) x is an upper (respectively, lower) bound for S if and only if −x is a lower (respectively, upper) bound for −S.
(3) x is a supremum (respectively, infimum) of S if and only if −x is an infimum (respectively, supremum) of −S.

Now we are ready to state the last axiom of the real numbers, the completeness axiom.

Definition 1.14. Let F be an ordered field. F is complete if every non-empty subset of F that is bounded above has a supremum.

The following is an easy consequence of Exercise 1.13.

Corollary 1.15. Let F be a complete ordered field. Then every non-empty subset of F that is bounded below has an infimum.

The next theorem is the foundation of the course, but is the one result that we won't attempt to prove. As mentioned earlier, you can read a proof in Rudin's book.

Theorem 1.16. There exists a unique complete ordered field.

The one and only complete ordered field is called the field of real numbers, and we will write R as an abbreviation. This is the same number line that we (think we) know and love. But even though we have lots of intuition about it, we will insist on proving EVERYTHING about it. For example, since R is an ordered field, R contains the rational numbers Q as a subfield. Are there any other elements of R besides Q? Well, we think we know that there are — but how do we prove that there are? The usual way is to bring up the classical proof that √2 is irrational. But this is sophistry! That proof "merely" shows that no rational number has square equal to 2. It's possible that there is an element of R having square equal to 2. If there is such an element, then it can't belong to Q, so it would be an element of R \ Q. But we don't know yet that there is a real number having square equal to 2. In fact, there is, but this fact must be proved.

Let's return to an even more basic point, the Archimedean property. We mentioned earlier that R is Archimedean, and of course this is a fundamental property of the number line: the natural numbers march off arbitrarily far to the right. Our first theorem about the real numbers is this fact. As we pointed out before, the proof must rely on the completeness axiom, since not all ordered fields are Archimedean.

Theorem 1.17. R is Archimedean: for every x ∈ R there exists n ∈ N such that x < n.

Proof. We suppose that R is not Archimedean, and derive a contradiction. So let x ∈ R be such that x ≥ n for all n ∈ N. This just means that x is an upper bound for N. Thus the (non-empty) subset N of R is bounded above. By the completeness axiom, N has a supremum. Let z = sup(N). Now z − 1 < z. By Definition 1.11 (2′′), there is an element n ∈ N with n > z − 1. But then n + 1 > z. Since n + 1 ∈ N, this contradicts Definition 1.11 (1). Therefore R is Archimedean. �

We now present some corollaries of the Archimedean property.

Corollary 1.18. If x ∈ R with x > 0, then there exists n ∈ N with 1/n < x.

Proof. By the Archimedean property there is n ∈ N with n > 1/x. Then 1/n < x. �

Before stating the next corollary, we recall the well-ordering principle (WOP) and one of its variations. The WOP states that a non-empty subset of N contains a smallest element. This is a fundamental property of the natural numbers — it is logically equivalent to the principle of mathematical induction. The variation we need states that a non-empty subset of Z that is bounded below (in Z) contains a smallest element.

Corollary 1.19. For x ∈ R there exists a unique n ∈ Z with n ≤ x < n + 1.

Proof. Let x ∈ R. By the Archimedean property there is m ∈ N with m > |x|. Then x > −m, so the set {k ∈ Z : k > x} is non-empty and bounded below (by −m). Let n + 1 be its smallest element. Then n + 1 > x. But since n < n + 1, n is not in this set, so n ≤ x. This proves existence. For uniqueness, suppose that n and n′ both do the job. Then x − 1 < n, n′ ≤ x, so (by property 9 of absolute value) we have |n − n′| < 1. Since n, n′ ∈ Z then n = n′. �

The integer n of Corollary 1.19 is denoted [x]. The function [·] : R → Z is called the greatest integer function. (Some people denote it by ⌊x⌋; ⌊·⌋ is also called the floor function.)
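For intuition, the characterization n ≤ x < n + 1 of Corollary 1.19 is exactly what Python's `math.floor` computes (a small sketch; note in particular the behavior on negative inputs):

```python
import math

# [x] is the unique integer n with n <= x < n + 1 (Corollary 1.19);
# math.floor computes exactly this integer.
for x in [3.7, -3.7, 5.0, -0.2]:
    n = math.floor(x)
    assert n <= x < n + 1

# Note: [-3.7] = -4, not -3 (the greatest integer function rounds
# toward minus infinity, not toward zero).
assert math.floor(-3.7) == -4
```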

Corollary 1.20. For x ∈ R and for N ∈ N there exists a unique n ∈ Z such that n/N ≤ x < (n + 1)/N.

Proof. Apply Corollary 1.19 to Nx. �

Corollary 1.21. For x, ε ∈ R with ε > 0, there exists y ∈ Q such that |x − y| < ε.

Proof. By Corollary 1.18 there is N ∈ N with 1/N < ε. By Corollary 1.20 there is n ∈ Z such that n/N ≤ x < (n + 1)/N. Let y = n/N. Then y ∈ Q, and |x − y| = x − y < (n + 1)/N − n/N = 1/N < ε. �

The conclusion of Corollary 1.21 is often expressed as: Q is dense in R.

The completeness axiom is actually stronger than the Archimedean property. The next result does not follow from the Archimedean property (as can be seen from the fact that the conclusion does not hold in Q).

Theorem 1.22. Let n ∈ N. Every positive real number has a unique positive nth root.


Proof. We first prove uniqueness. If 0 < y < z then y^n < z^n, so two distinct positive real numbers cannot be nth roots of the same real number. We now prove existence. Let a > 1. (If 0 < a < 1, then 1/a > 1. In this case, if we show that 1/a has a positive nth root, then the inverse of that root will be a positive nth root for a.) Let E = {x ≥ 0 : x^n ≤ a}. We note that E ≠ ∅ since 1 ∈ E. We claim that E is bounded above. To see this, note that if x ∈ E then

    x^n ≤ a ≤ a^n.

Therefore x ≤ a, and we see that a is an upper bound for E. Thus the completeness axiom implies that y = sup(E) exists. We will show that y^n = a, finishing the proof.

First note that y ≥ 1, since 1 ∈ E. We will use Exercise 1.6. Let 0 < ε < 1. First note that since y − ε < y < y + ε, we have

(1)    (y − ε)^n < y^n < (y + ε)^n.

Since y − ε < y, property (2′′) of Definition 1.11 implies that there is x ∈ E with y − ε < x. Then (y − ε)^n < x^n ≤ a. Also, since y + ε > y then y + ε ∉ E, and hence a < (y + ε)^n. Therefore

(2)    (y − ε)^n < a < (y + ε)^n.

From (1) and (2), and property 9 of absolute value, we have |y^n − a| < (y + ε)^n − (y − ε)^n. We have

    (y + ε)^n − (y − ε)^n = ∑_{j=0}^{n} (n choose j) (y^{n−j} ε^j − y^{n−j} (−ε)^j)
                          = ∑_{j=0}^{n} (n choose j) y^{n−j} ε^j (1 − (−1)^j)
                          = 2 ∑_{j=1, j odd}^{n} (n choose j) y^{n−j} ε^j
                          < ( 2 ∑_{j=1, j odd}^{n} (n choose j) y^{n−j} ) ε,   since ε < 1.

By Exercise 1.6 it follows that y^n − a = 0, and hence that y^n = a. �

Now we know that R truly is bigger than Q; for example, √2 ∈ R \ Q. This one number can be parlayed into many more.

Theorem 1.23. R \ Q is dense in R.

Proof. Let x ∈ R, and let ε > 0. By Corollary 1.21 there is z ∈ Q with |x√2 − z| < ε. In fact, it follows also that we can assume z ≠ 0. Let y = z/√2. Then y ∈ R \ Q, and |x − y| < ε/√2 < ε. �

Definition 1.24. The elements of R \Q are called irrational numbers.

Thus the irrational numbers are also dense in R. While Corollary 1.21 and Theorem 1.23 treat the rational and irrational numbers symmetrically, in fact the set of irrational numbers is much bigger than the set of rationals (Corollary 2.12). Before proving this, we will first review some basic facts about the size of sets.

2. Cardinality (briefly)

Definition 2.1. Let A and B be sets.

(1) A and B are equivalent, written A ∼ B, if there exists a bijection from A to B. In this case, A and B are said to be of the same cardinality.
(2) A is subequivalent to B, written A ⪯ B, if there is a one-to-one function from A to B.

The proof of the following proposition is elementary.

Proposition 2.2. For any sets A, B and C,

• A ∼ A.
• If A ∼ B then B ∼ A.
• If A ∼ B and B ∼ C then A ∼ C.

The next theorem is very useful, and its proof is a nice exercise.

Theorem 2.3. (Cantor-Bernstein) Let A and B be sets. If A ⪯ B and B ⪯ A then A ∼ B.

Definition 2.4. Let A be a set.

(1) A is finite if there is n ∈ N ∪ {0} such that A ∼ {1, 2, . . . , n}.
(2) A is infinite if A is not finite.
(3) A is denumerable if A ∼ N.
(4) A is countable if A is finite or denumerable.
(5) A is uncountable if A is not countable.

Proposition 2.5.
(1) If m ≠ n then {1, 2, . . . , m} ≁ {1, 2, . . . , n}.
(2) N is infinite.
(3) A is countable if and only if A ⪯ N.
(4) Let A_1, A_2, . . . be countable sets. Then ∪_{n=1}^∞ A_n is countable, and for each n, A_1 × · · · × A_n is countable.
(5) Q is countable.

Proof. The first three statements can be proved as exercises. For the fourth, let A_n = {x_{n1}, x_{n2}, . . .}. Consider the list: x_{11}, x_{12}, x_{21}, x_{13}, x_{22}, x_{31}, . . .. For each entry, delete all subsequent occurrences. What is left is a list, without duplications, of the elements of the union. This defines a bijection from N to the union.

Suppose inductively that A_1 × · · · × A_n is countable. Then

    A_1 × · · · × A_{n+1} = ∪_{x ∈ A_{n+1}} A_1 × · · · × A_n × {x}

is countable.

For the last statement, first note that Z is countable, as can be seen from the list: 0, 1, −1, 2, −2, . . .. Since Z ∼ (1/n)Z, it follows from Proposition 2.2 that (1/n)Z is countable. Then Q = ∪_{n=1}^∞ (1/n)Z is countable. �
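The zig-zag listing x_{11}, x_{12}, x_{21}, x_{13}, x_{22}, x_{31}, . . . used in the proof above can be sketched as a generator (Python; the function name `union_listing` and the row encoding are our illustrative choices):

```python
from fractions import Fraction
from itertools import count, islice

def union_listing(rows):
    # rows(n, k) gives the k-th element of the n-th countable set
    # (both 1-indexed). Traverse by antidiagonals -- the order
    # x11, x12, x21, x13, x22, x31, ... from the proof of
    # Proposition 2.5 -- deleting subsequent duplicates.
    seen = set()
    for s in count(2):          # s = n + k is constant on each antidiagonal
        for n in range(1, s):
            x = rows(n, s - n)
            if x not in seen:
                seen.add(x)
                yield x

# Sketch: list the positive rationals, taking row n to be {n/1, n/2, ...}.
rationals = union_listing(lambda n, k: Fraction(n, k))
first = list(islice(rationals, 6))
# first is [1, 1/2, 2, 1/3, 3, 1/4]; the duplicate 2/2 = 1 was deleted.
```

This gives a duplicate-free listing, i.e. a bijection from N to the union, exactly as in the proof.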

Example 2.6. Let X = ∏_{i=1}^∞ {0, 1} = {(x_1, x_2, . . .) : x_i ∈ {0, 1} for all i}. (Thus X is the set of all sequences of 0's and 1's.)

Proposition 2.7. X is uncountable.


Proof. We will show that if f : N → X is any function, then f is not onto. Therefore there does not exist a bijection from N to X.

So let f : N → X be given. Let f(n) be the sequence (x_{n1}, x_{n2}, x_{n3}, . . .). Define an element y = (y_1, y_2, . . .) ∈ X by y_n = 1 − x_{nn}. Then for each n, y and f(n) differ in the nth slot, so that y ≠ f(n). Therefore y is not in the range of f. Therefore f is not onto. �
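The diagonal construction y_n = 1 − x_{nn} can be sketched directly (Python; representing each sequence f(n) by a function of its index is our illustrative choice):

```python
def diagonal(f, n_terms):
    # Given f : N -> X, where each f(n) is an infinite 0-1 sequence
    # (represented as a function of its index), compute the first
    # n_terms digits of the sequence y with y_n = 1 - f(n)_n.
    # By construction y differs from every f(n) in slot n.
    return [1 - f(n)(n) for n in range(n_terms)]

# Sketch: f(n) is the constant sequence with every digit equal to n mod 2.
f = lambda n: (lambda i: n % 2)
y = diagonal(f, 5)
for n in range(5):
    assert y[n] != f(n)(n)  # y disagrees with f(n) in the nth slot
```

Whatever list of sequences one starts from, the resulting y is a 0-1 sequence missed by the list, which is the content of Proposition 2.7.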

Remark 2.8. X ∼ P(N). To define a bijection from X to P(N), send a sequence x = (x_1, x_2, . . .) to the set {n ∈ N : x_n = 1}. It is easy to check that this works. In fact, this is a special case of a general theorem of Cantor.

Theorem 2.9. If S is any set, and if f : S → P(S) is any function, then f is not onto. Thus for any set S, S ≁ P(S). (Since it is evident that S ⪯ P(S), we observe that P(S) has a larger cardinality than S.)

Proof. Given f, let E = {x ∈ S : x ∉ f(x)}. It is easy to check that E is not in the range of f. �

The next result will be proved later (Corollary 6.4).

Theorem 2.10. R ∼ X.

Corollary 2.11. R is uncountable.

The previous corollary (and hence also the next corollary) can be proved from the results of the next section, rather than from Corollary 6.4.

Corollary 2.12. The set of irrational numbers is uncountable.

3. Decimal representation of real numbers

We like to think of elements of R as infinite decimals: x ∼ x_0.x_1x_2x_3 · · ·, where x_0 ∈ Z and x_n ∈ {0, 1, . . . , 9} for n ≥ 1. We want to make this precise without using infinite series.

Let x ∈ R. Let x_0 = [x] ∈ Z. Then x_0 ≤ x < x_0 + 1, so

    0 ≤ x − x_0 < 1.

Then 0 ≤ 10(x − x_0) < 10. We let x_1 = [10(x − x_0)] ∈ {0, 1, . . . , 9}. We have

    x_1 ≤ 10(x − x_0) < x_1 + 1
    x_1 · 10^{−1} ≤ x − x_0 < x_1 · 10^{−1} + 10^{−1}
    0 ≤ x − x_0 − x_1 · 10^{−1} < 10^{−1}.

Inductively, suppose that we have constructed x_1, . . . , x_{n−1} ∈ {0, 1, . . . , 9} such that

    0 ≤ x − ∑_{i=0}^{n−1} x_i 10^{−i} < 10^{−(n−1)}.

Then 0 ≤ 10^n (x − ∑_{i=0}^{n−1} x_i 10^{−i}) < 10. We set

    x_n = [10^n (x − ∑_{i=0}^{n−1} x_i 10^{−i})] ∈ {0, 1, . . . , 9}.

Then x_n ≤ 10^n (x − ∑_{i=0}^{n−1} x_i 10^{−i}) < x_n + 1, and hence

(1)    0 ≤ x − ∑_{i=0}^{n} x_i 10^{−i} < 10^{−n}.


Thus we have defined x_0 ∈ Z and x_n ∈ {0, 1, . . . , 9} for n ≥ 1 so that (1) holds for all n.

(2)    x = sup{ ∑_{i=0}^{n} x_i 10^{−i} : n ≥ 0 }.

Proof. The proof is left as an exercise. �

(3) (x_n) is not eventually equal to 9; precisely, for every n there is m ≥ n such that x_m ≠ 9. The point is that, for example, if we start with x = 1, we will obtain the expansion 1.0000 · · ·, and NOT 0.9999 · · ·.

Proof. The proof is left as an exercise. �
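The inductive digit construction above can be sketched with exact rational arithmetic (Python; the helper name `decimal_digits` is ours, and `Fraction` stands in for an exact real number):

```python
from fractions import Fraction
import math

def decimal_digits(x, n):
    # The construction above: x_0 = [x], and then
    # x_k = [10^k (x - sum_{i<k} x_i 10^{-i})] for k = 1, ..., n.
    # We carry rem = 10^k (x - sum_{i<=k} x_i 10^{-i}), which stays in [0, 1).
    digits = [math.floor(x)]
    rem = x - digits[0]
    for _ in range(n):
        rem *= 10
        d = math.floor(d if False else rem)  # greatest integer function
        digits.append(d)
        rem -= d
    return digits

# 1/8 = 0.125000..., and 1 yields 1.000..., not 0.999...
assert decimal_digits(Fraction(1, 8), 5) == [0, 1, 2, 5, 0, 0]
assert decimal_digits(Fraction(1), 3) == [1, 0, 0, 0]
```

Exact `Fraction` arithmetic matters here: with floats, rounding error would eventually produce wrong digits.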

(4) If x ≠ y then there exists k such that x_k ≠ y_k. In other words, the map that takes a real number to its decimal expansion is one-to-one.

Proof. Let x < y. Choose n such that 10^{−n} < y − x. Then

    ∑_{i=0}^{n} x_i 10^{−i} ≤ x < y − 10^{−n} < ∑_{i=0}^{n} y_i 10^{−i}.

Hence there exists k ≤ n such that x_k ≠ y_k. �

(5) Let y_0 ∈ Z and y_n ∈ {0, 1, . . . , 9} for n ∈ N be such that (y_n) is not eventually equal to 9. Then there is a real number x such that x_n = y_n for all n. This will prove that there is a one-to-one correspondence between real numbers and decimal expansions that do not terminate in a string of 9's.

Proof. First note that ∑_{i=1}^{n} 9/10^i = 1 − 10^{−n}, by summing a finite geometric series. Now we have

    ∑_{i=0}^{n} y_i 10^{−i} ≤ y_0 + ∑_{i=1}^{n} 9 · 10^{−i} = y_0 + 1 − 10^{−n} < y_0 + 1.

Thus the set { ∑_{i=0}^{n} y_i 10^{−i} : n ≥ 0 } is bounded above. Let x be the supremum of this set. Note that the elements of this set, indexed by n, form an increasing sequence. We will show that x_n = y_n for all n. For n = 0, choose k such that y_k ≠ 9. For any m ≥ k,

    ∑_{i=0}^{m} y_i 10^{−i} ≤ y_0 + ∑_{i=1}^{m} 9 · 10^{−i} − 10^{−k} = y_0 + 1 − 10^{−m} − 10^{−k} < y_0 + 1 − 10^{−k}.

It follows that x ≤ y_0 + 1 − 10^{−k} < y_0 + 1. Therefore x_0 ≤ y_0. On the other hand, we have that y_0 = ∑_{i=0}^{0} y_i 10^{−i}, so y_0 ≤ x, and hence y_0 ≤ x_0. Thus x_0 = y_0. Suppose inductively that x_i = y_i for i < n. Choose k > n with y_k ≠ 9, and let m ≥ k. We have

    ∑_{i=0}^{m} y_i 10^{−i} ≤ ∑_{i=0}^{n} y_i 10^{−i} + ∑_{i=n+1}^{m} 9 · 10^{−i} − 10^{−k}
                            = ∑_{i=0}^{n} y_i 10^{−i} + (1 − 10^{−m}) − (1 − 10^{−n}) − 10^{−k}
                            < ∑_{i=0}^{n} y_i 10^{−i} + 10^{−n} − 10^{−k}.


Since this is true for all m ≥ k, we have

    x ≤ ∑_{i=0}^{n} y_i 10^{−i} + 10^{−n} − 10^{−k}.

Since x_i = y_i for i < n, we have

    x − ∑_{i=0}^{n−1} x_i 10^{−i} ≤ y_n 10^{−n} + 10^{−n} − 10^{−k}
    10^n (x − ∑_{i=0}^{n−1} x_i 10^{−i}) ≤ y_n + 1 − 10^{−(k−n)} < y_n + 1
    x_n = [10^n (x − ∑_{i=0}^{n−1} x_i 10^{−i})] ≤ y_n.

For the reverse inequality,

    x − ∑_{i=0}^{n−1} x_i 10^{−i} = x − ∑_{i=0}^{n−1} y_i 10^{−i} ≥ ∑_{i=0}^{n} y_i 10^{−i} − ∑_{i=0}^{n−1} y_i 10^{−i} = y_n 10^{−n}.

Hence 10^n (x − ∑_{i=0}^{n−1} x_i 10^{−i}) ≥ y_n, and hence x_n ≥ y_n. �

As we mentioned at the end of the previous section, the decimal representation of real numbers can be used to prove that R is uncountable. The idea of the proof is a special case of the proof of Cantor's theorem. It is usually called Cantor's diagonal argument.

Proof. (of Corollary 2.11) We suppose that R is countable, and deduce a contradiction. Let x_1, x_2, . . . be a listing of the elements of (the supposedly countable set) R. Let x_n have the decimal representation x_{n0}.x_{n1}x_{n2} · · ·. For each n ≥ 1 define y_n as follows: if x_{nn} ≠ 1 let y_n = 1; if x_{nn} = 1, let y_n = 2. By construction, the sequence of digits y_n is not eventually 9, and therefore it is the decimal representation of a real number y. Now we see that for each n, y and x_n have decimal representations differing in the nth place; therefore y ≠ x_n. Thus y is not in the list we started with, contradicting the assumption that this list contained all real numbers. �

4. Metric spaces

Much of what we do in analysis ultimately comes down to measuring the distance between two real numbers. We use the absolute value for this: |x − y| is the distance between the numbers x and y. There are many other situations where we use the distance between points in an essential way. For example, the Pythagorean theorem is used to define the usual distance between points in R^2, and even in R^n. One of the wonderful abstractions of XXth century mathematics is a generalization of this notion of distance. In fact, it isn't too hard to notice that everything we use distance for in advanced calculus (e.g. limits, continuity, etc.) relies only on a few very coarse aspects of the distance function. The following definition sets these out precisely, and gives the basic setting for this course.

Definition 4.1. Let X be a set. A metric on X is a function d : X × X → R such that

(1) d(x, y) ≥ 0 for all x, y ∈ X (positivity).
(2) d(x, y) = 0 if and only if x = y (definiteness).


(3) d(x, y) = d(y, x) (symmetry).
(4) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).

Example 4.2. The usual metric on R is defined by d(x, y) = |x − y|.

Remark 4.3. Two common variations of the triangle inequality are easily proved as exercises:

(1) d(x, y) ≥ | d(x, z) − d(y, z) |.
(2) d(x, y) ≤ d(x, z_1) + d(z_1, z_2) + · · · + d(z_{n−1}, z_n) + d(z_n, y).

Many important examples of metric spaces arise from norms on vector spaces.

Definition 4.4. Let V be a real vector space. A norm on V is a function ‖ · ‖ : V → R such that

(1) ‖x‖ ≥ 0.
(2) ‖x‖ = 0 only if x = 0.
(3) ‖cx‖ = |c| ‖x‖ for all c ∈ R (and x ∈ V).
(4) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Remark 4.5. If ‖ · ‖ is a norm on V, there is an associated metric on V given by d(x, y) = ‖x − y‖.

Example 4.6. (Function space) Let S be a nonempty set. The bounded (real-valued) functions on S are defined by B(S, R) = {f : S → R : the range of f is bounded}. It is easy to see that B(S, R) is a vector space (with point-wise operations). The uniform norm is defined on B(S, R) by ‖f‖_u = sup_{x ∈ S} |f(x)|. It is an (easy) exercise to check that this is a norm.

Example 4.7. R^n = R × · · · × R (n factors) can be thought of as B({1, . . . , n}, R). The uniform norm here is usually denoted ‖ · ‖_∞: ‖(x_1, . . . , x_n)‖_∞ = max_{1≤i≤n} |x_i|.

Another important way of producing a norm on a vector space is by means of an inner product.

Definition 4.8. Let V be a real vector space. An inner product on V is a function 〈·, ·〉 : V × V → R such that

(1) 〈x, x〉 ≥ 0.
(2) 〈x, x〉 = 0 only if x = 0.
(3) 〈x, y〉 = 〈y, x〉.
(4) 〈ax + by, z〉 = a〈x, z〉 + b〈y, z〉 for all a, b ∈ R (and x, y, z ∈ V).

Property 4 of Definition 4.8 is called linearity in the first variable. By properties 3 and 4 it follows that inner products are also linear in the second variable. It follows from these that 〈0, y〉 = 〈y, 0〉 = 0 for all y ∈ V.

Theorem 4.9. (Cauchy-Schwarz inequality) Let (V, 〈·, ·〉) be an inner product space. Then

    |〈x, y〉| ≤ 〈x, x〉^{1/2} 〈y, y〉^{1/2}.

Proof. By the remarks before the theorem, the inequality holds if any of x, y, and 〈x, y〉 equals zero. So suppose that all three are non-zero. Let a = −sgn(〈x, y〉) 〈y, y〉^{1/2} and b = 〈x, x〉^{1/2}. (Recall that sgn(t) = 1 if t > 0, = −1 if t < 0, and = 0 if t = 0.) Then

    0 ≤ 〈ax + by, ax + by〉 = a^2 〈x, x〉 + 2ab 〈x, y〉 + b^2 〈y, y〉 = a^2 b^2 − 2|a| b |〈x, y〉| + b^2 a^2.

Dividing by 2|a| b gives the result. �
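The Cauchy-Schwarz inequality can be spot-checked numerically for the usual inner product on R^n (a Python sketch with our own helper names; the small tolerance absorbs floating-point error):

```python
import random

def inner(x, y):
    # The usual inner product on R^n.
    return sum(a * b for a, b in zip(x, y))

# Spot-check |<x, y>| <= <x,x>^(1/2) <y,y>^(1/2) on random vectors in R^4.
random.seed(0)
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(4)]
    y = [random.uniform(-10, 10) for _ in range(4)]
    assert abs(inner(x, y)) <= inner(x, x)**0.5 * inner(y, y)**0.5 + 1e-9
```

Random testing is no proof, but it is a useful sanity check on the algebra in the argument above.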


Corollary 4.10. Let V be an inner product space. For x ∈ V let ‖x‖ = 〈x, x〉^{1/2}. Then ‖ · ‖ is a norm on V.

Proof. We will prove the triangle inequality, leaving the verification of the other properties of a norm as an exercise. Let x, y ∈ V. Then by the Cauchy-Schwarz inequality,

    ‖x + y‖^2 = 〈x + y, x + y〉 = 〈x, x〉 + 〈x, y〉 + 〈y, x〉 + 〈y, y〉 = ‖x‖^2 + 2〈x, y〉 + ‖y‖^2
              ≤ ‖x‖^2 + 2‖x‖ ‖y‖ + ‖y‖^2 = (‖x‖ + ‖y‖)^2. �

Example 4.11. The usual norm on R^n arises from the usual inner product. The corresponding metric space is usually referred to as (n-dimensional) Euclidean space. We note the following important inequalities for the Euclidean norm (proof by squaring).

Remark 4.12. Let x ∈ R^n. Then for any i,

    |x_i| ≤ ‖x‖ ≤ |x_1| + · · · + |x_n|.

Definition 4.13. Let (X, d) be a metric space, and let Y ⊆ X. If we restrict the metric d to points of Y then Y becomes a metric space, called a subspace of X.

Example 4.14. The circle (or torus) is a subspace of Euclidean space: T = {(x, y) ∈ R^2 : x^2 + y^2 = 1}. (Thus, for example, d((1, 0), (0, 1)) = √2.)

It is very important to remember that, while pictures can give a lot of valuable intuition, they are not a substitute for a proof. In this course, you may never use a picture as part of a proof (though they can be included to help explain what you are doing). Well, it isn't really enough to just tell you not to touch the stove — you really have to burn yourself. The following example is much more frequently encountered than you might imagine the first time you see it. You should work through carefully on your own the details of the proof that it is a metric space, and try to visualize it in some way (it's unclear what that means!). It provides a counterexample to many "obvious" facts about metric spaces that are not actually true. The point is this: any theorem that we prove about metric spaces must be true for all metric spaces. In particular, it will be true for the metric space in the next example.

Example 4.15. Recall the set X from Example 2.6. We define a metric on X as follows. For x, y ∈ X with x ≠ y, the set {i : x_i ≠ y_i} is non-empty. By the well-ordering principle, it has a least element. We set k(x, y) = min{i : x_i ≠ y_i}. Then we define

    d(x, y) = { 1/k(x, y), if x ≠ y,
              { 0,         if x = y.

We claim that d is a metric on X. The proofs of positive definiteness and symmetry are immediate. We will verify the triangle inequality. In fact, we will prove something stronger, called the ultrametric inequality.

Lemma 4.16. For any x, y, z ∈ X, d(x, y) ≤ max{d(x, z), d(y, z)}.

Proof. We will write “k(x, x) = ∞” as a kind of shorthand. (But notice that then we have that d(x, y) < d(u, v) if and only if k(x, y) > k(u, v), for any points x, y, u, v ∈ X.)
Now let x, y, z ∈ X. If d(x, y) ≤ d(x, z) then the inequality holds. So suppose that d(x, y) > d(x, z). Then k(x, y) < k(x, z). Since xi = zi for i < k(x, z), we have that z_{k(x,y)} = x_{k(x,y)} ≠ y_{k(x,y)}. Therefore k(y, z) ≤ k(x, y), and hence d(x, y) ≤ d(y, z). Therefore max{d(x, z), d(y, z)} = d(y, z) ≥ d(x, y). □
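Since any theorem about metric spaces must hold for this example, it is worth getting hands-on with it. The following Python sketch (the helper names `k` and `d` are ours, not from the notes) implements the metric on finite truncations of X and brute-force checks the ultrametric inequality of Lemma 4.16:

```python
from itertools import product

def k(x, y):
    """First index (1-based) at which x and y differ; None if they agree."""
    for i, (xi, yi) in enumerate(zip(x, y), start=1):
        if xi != yi:
            return i
    return None

def d(x, y):
    """The metric of Example 4.15: d(x, y) = 1/k(x, y), and d(x, x) = 0."""
    idx = k(x, y)
    return 0.0 if idx is None else 1.0 / idx

# Check the ultrametric inequality d(x,y) <= max(d(x,z), d(y,z))
# over all triples of binary strings of length 5.
points = list(product([0, 1], repeat=5))
assert all(
    d(x, y) <= max(d(x, z), d(y, z))
    for x in points for y in points for z in points
)
```

Distinct binary strings of length 5 already differ within the first 5 coordinates, so nothing is lost by truncating the infinite sequences here.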


The following is another example of a metric space that varies from what our intuition suggests. This one often seems like a stupid metric space . . . well, it is stupid, but it is also a metric space. Every theorem about metric spaces must be true for it, and hence any statement that is not true for this example cannot be proven using only the axioms of a metric space.

Example 4.17. Let S be any set. The discrete metric on S is defined by

    d(x, y) = { 1,  if x ≠ y,
                0,  if x = y.

Remark 4.18. It is easy to see that the discrete metric on a set with n points can be realized as a subspace of Euclidean n-space. It is a little harder to find a natural setting for the discrete metric on N. The discrete metric on R is a useful counterexample to keep in mind.
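The first claim of the remark can be made concrete: scaling the n standard basis vectors of Euclidean n-space by 1/√2 puts every pair of them at distance exactly 1. A minimal numeric sketch (the helper name `euclid` is ours):

```python
import math
from itertools import combinations

def euclid(p, q):
    """Euclidean distance between two points of R^n."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

n = 5
# Scaled standard basis vectors e_i / sqrt(2): any two distinct ones are
# at distance sqrt(1/2 + 1/2) = 1, realizing the discrete metric on an
# n-point set as a subspace of Euclidean n-space.
points = [tuple((1 / math.sqrt(2) if j == i else 0.0) for j in range(n))
          for i in range(n)]
assert all(abs(euclid(p, q) - 1.0) < 1e-12
           for p, q in combinations(points, 2))
```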

5. The topology of metric spaces

Definition 5.1. Let (X, d) be a metric space, and let a ∈ X, r > 0. The open ball with center a and radius r is the set

    Br(a) = {x ∈ X : d(x, a) < r}.

The closed ball with center a and radius r is the set

    B̄r(a) = {x ∈ X : d(x, a) ≤ r}.

Example 5.2. In R, Br(a) = (a − r, a + r) and B̄r(a) = [a − r, a + r]. You should sketch the pictures of the open and closed balls in R². These pictures are extremely useful as intuition when proving things. But it is NEVER permissible to use a picture as a substitute for a proof.

Definition 5.3. Let (X, d) be a metric space, and let E ⊆ X.

(1) E is an open set if for each x ∈ E there is r > 0 such that Br(x) ⊆ E.
(2) E is a closed set if Eᶜ is an open set.

Proposition 5.4. In a metric space, open balls are open sets and closed balls are closed sets.

Proof. Let a ∈ X and r > 0, and let x ∈ Br(a). We need to find an open ball centered at x (with some positive radius) that is completely contained in Br(a). We know that d(x, a) < r, since x ∈ Br(a). Let s = r − d(x, a). Then s > 0. We claim that Bs(x) ⊆ Br(a). To prove this, let y ∈ Bs(x). Then d(y, x) < s. Then

    d(y, a) ≤ d(y, x) + d(x, a) < s + d(x, a) = r − d(x, a) + d(x, a) = r,

and hence y ∈ Br(a). Therefore Br(a) is an open set. The proof that closed balls are closed sets is left as an exercise. □

Proposition 5.5. The following hold in a metric space.

(1) The union of any collection of open sets is open.
(2) The intersection of a finite collection of open sets is open.
(3) The intersection of any collection of closed sets is closed.
(4) The union of a finite collection of closed sets is closed.


Proof. These are easy exercises using De Morgan’s laws (and the notation for families of sets). □

Example 5.6. (1) In any metric space X, the sets X and ∅ are both open and closed. (It is a fairly deep fact (to be proved later) that if X = Rⁿ then these are the only sets that are simultaneously open and closed.)

(2) A singleton set in a metric space is a closed set.

Proof. Let x ∈ X and y ∈ {x}ᶜ. Then y ≠ x, so r = d(x, y) > 0. Then Br(y) ⊆ {x}ᶜ. □

(3) A finite subset of a metric space is a closed set.
(4) In Rⁿ, sets of the form {x : xi > c}, {x : xi < c} are called open half-spaces, and are open sets. Sets of the form {x : xi ≥ c}, {x : xi ≤ c} are called closed half-spaces, and are closed sets.

Proof. Since a closed half-space is the complement of an open half-space, it is enough to prove openness of open half-spaces. If H = {x : xi > c} and y ∈ H, let r = yi − c > 0. If z ∈ Br(y) then yi − zi ≤ |zi − yi| ≤ d(z, y) < r. Then zi = yi − (yi − zi) > yi − r = c, and hence z ∈ H. Thus Br(y) ⊆ H, and we have shown that H is open. The proof for the other kind of open half-space is left as an exercise. □

Definition 5.7. An open box in Rⁿ is a set of the form (a1, b1) × · · · × (an, bn), where −∞ ≤ ai ≤ bi ≤ ∞ for each i. Closed boxes in Rⁿ are defined similarly, by including all finite endpoints of the interval factors of the Cartesian product. We note that an open box is a finite intersection of (at most 2n) open half-spaces, and hence is an open set. Similarly, closed boxes are closed sets.

It is important to remember that, while the complement of an open set is a closed set, the opposite of “open” is not “closed” — many (most, even) sets are neither open nor closed. We next introduce the operations of interior and closure. These provide important open and closed sets associated with arbitrary subsets of a metric space.

Definition 5.8. Let X be a metric space and let E ⊆ X. The interior of E is the set

    int(E) = ⋃{U : U ⊆ E and U is open}.

The closure of E is the set

    Ē = ⋂{K : K ⊇ E and K is closed}.

Remark 5.9. We observe that

(1) int(E) is an open set, and is the largest open set contained in E.
(2) Ē is a closed set, and is the smallest closed set containing E.
(3) Ē = (int(Eᶜ))ᶜ; equivalently, int(E) is the complement of the closure of Eᶜ.

Page 16: limsup stuff

16 JACK SPIELBERG

Proof. The first two items follow immediately from the definitions. For the third item, we have

    Ē = ⋂{K : E ⊆ K, K closed}, so
    (Ē)ᶜ = ⋃{Kᶜ : E ⊆ K, K closed}
         = ⋃{Kᶜ : Kᶜ ⊆ Eᶜ, Kᶜ open}
         = ⋃{U : U ⊆ Eᶜ, U open}
         = int(Eᶜ).

Taking complements of both sides yields the first formula. If we apply the first formula to Eᶜ, and take complements of both sides, we obtain the second formula. □

The above definitions are abstract, in that they don’t give an explicit criterion to use to decide if a point does or does not belong to the interior or closure of a set. We now give such criteria.

Proposition 5.10.
(1) x ∈ int(E) if and only if there is r > 0 such that Br(x) ⊆ E.
(2) x ∈ Ē if and only if for every r > 0, Br(x) ∩ E ≠ ∅.

Proof. (1) This is almost instantly obtained from the definition, and we leave the details as an exercise.
(2) We note that x ∈ (Ē)ᶜ if and only if x ∈ int(Eᶜ), by Remark 5.9(3). But this is true if and only if there is r > 0 such that Br(x) ⊆ Eᶜ, by part (1). But this is true if and only if there is r > 0 such that Br(x) ∩ E = ∅. By negating the first and last items in this chain of equivalent statements, we find that x ∈ Ē if and only if for all r > 0, Br(x) ∩ E ≠ ∅. □

Example 5.11. It is worth thinking about the above definitions and results in the context of some examples. We note that in any metric space, if E is open then int(E) = E, while if E is closed then Ē = E.

In R,

(1) int((a, b]) = (a, b).
(2) int(Z) = ∅.
(3) int(Q) = ∅.
(4) The closure of (0, 1] is [0, 1].
(5) Z̄ = Z.
(6) Q̄ = R.

It might seem tempting to try to describe the new points sucked in by the closure operation, i.e. the points of Ē that are not already in E. However it turns out to be much more useful to describe the property that brings them into the closure. This property may apply also to some points already in E.

Definition 5.12. Let X be a metric space, E ⊆ X, and a ∈ X. The point a is a cluster point of E (also called by some people a limit point or accumulation point) if for every r > 0, the intersection E ∩ Br(a) is infinite. We write E′ for the set of cluster points of E.

Example 5.13. Let X = R.

Page 17: limsup stuff

NOTES, MAT 472, INTERMEDIATE ANALYSIS, FALL 2010 17

(1) {1, 1/2, 1/3, . . .}′ = {0}.
(2) {0, 1, 1/2, 1/3, . . .}′ = {0}.
(3) Z′ = ∅.
(4) Q′ = R.

Note that E′ ⊆ Ē — this follows from Proposition 5.10(2). Therefore E ∪ E′ ⊆ Ē. In fact, the two sides are equal, which is the content of the next result.

Proposition 5.14. Ē = E ∪ E′.

Before proving the proposition, we give a lemma that may seem surprising at first.

Lemma 5.15. a ∈ E′ if and only if for each r > 0, (E \ {a}) ∩ Br(a) ≠ ∅.

Proof. (⇒): Suppose that for some r > 0, (E \ {a}) ∩ Br(a) = ∅. Then E ∩ Br(a) ⊆ {a}, a finite set. Hence a ∉ E′.
(⇐): Let a ∉ E′. Then there is r > 0 such that E ∩ Br(a) is finite. Let (E ∩ Br(a)) \ {a} = {x1, x2, . . . , xm}. Let s = min{d(a, x1), . . . , d(a, xm)} > 0. Then (E \ {a}) ∩ Bs(a) = ∅. □

Proof (of Proposition 5.14). We already proved ⊇ in the comments before the proposition. For ⊆, let a ∈ Ē. If a ∈ E then clearly a ∈ E ∪ E′. So suppose that a ∉ E. Let r > 0. By Proposition 5.10(2) we have E ∩ Br(a) ≠ ∅. But since a ∉ E this implies that (E \ {a}) ∩ Br(a) ≠ ∅. By Lemma 5.15, a ∈ E′. □

Corollary 5.16. E is closed if and only if E ′ ⊆ E.

Definition 5.17. Let X be a metric space. A subset E ⊆ X is called bounded if there is a ball in X that contains E. X is bounded if it is a bounded subset of itself. (In this case we say that the metric is bounded.)

Remark 5.18. In Definition 5.17 it doesn’t matter whether the ball is required to be open or closed.

Exercise 5.19. Let X be a metric space, and let E ⊆ X with E ≠ ∅. We define the diameter of E by

    diam(E) = { sup{d(x, y) : x, y ∈ E},  if E is bounded,
                ∞,                        if E is unbounded.

(1) Prove that diam(E) < ∞ if and only if diam(Ē) < ∞.
(2) Prove that diam(E) = diam(Ē).

6. The Cantor set

In this section we introduce the first “interesting” set that most people come across. Let F0 = [0, 1], F1 = [0, 1/3] ∪ [2/3, 1], F2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1], and so on. Recursively, Fn is obtained by removing the open middle third from each subinterval of Fn−1. Thus Fn is the disjoint union of 2^n closed intervals, each of length 3^{-n}. Fn is closed, nonempty, and Fn ⊇ Fn+1.

Definition 6.1. The Cantor set, C, is the set ⋂_{n=0}^∞ Fn.
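The recursive removal of middle thirds is easy to carry out with exact rational arithmetic. A sketch (the helper `next_stage` is ours, representing each closed interval as a pair of endpoints) that verifies the interval count and lengths stated above:

```python
from fractions import Fraction

def next_stage(intervals):
    """Replace each closed interval [a, b] by its left and right thirds."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))    # left third  [a, a + (b-a)/3]
        out.append((b - third, b))    # right third [b - (b-a)/3, b]
    return out

F = [(Fraction(0), Fraction(1))]      # F_0 = [0, 1]
for n in range(1, 6):
    F = next_stage(F)
    # F_n is a disjoint union of 2^n closed intervals of length 3^-n.
    assert len(F) == 2 ** n
    assert all(b - a == Fraction(1, 3 ** n) for a, b in F)
```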


It is a good idea to draw a picture. It isn’t hard to see that C is nonempty: all the endpoints of the closed subintervals making up the Fn’s belong to C. Still, this set of endpoints is a countable set. In fact, C is much bigger, as we will now see. Recall the space X of Definition 2.6. We will prove that C ∼ X.

Definition 6.2. We define f : X → C as follows. Let x = (x1, x2, . . .) ∈ X. For each n define a closed interval In(x) recursively by

    I0(x) = [0, 1],
    In+1(x) = { left piece of In(x) ∩ Fn+1,   if xn+1 = 0,
                right piece of In(x) ∩ Fn+1,  if xn+1 = 1.

Then I0(x) ⊇ I1(x) ⊇ · · · . Let us write In(x) = [an, bn]. The nesting of these intervals implies that

    a1 ≤ a2 ≤ · · · ≤ b2 ≤ b1.

Let α = sup{a1, a2, . . .} and β = inf{b1, b2, . . .}. We claim that ⋂_{n=0}^∞ In(x) = [α, β]. To see this, we first note that since an ≤ α ≤ β ≤ bn for all n, we have [α, β] ⊆ ⋂_{n=0}^∞ In(x). On the other hand, if t ∈ ⋂_{n=0}^∞ In(x), then an ≤ t ≤ bn for all n. Hence t is an upper bound for the set of an’s, and a lower bound for the set of bn’s. Thus α ≤ t ≤ β. This proves the claim. Finally, since bn − an = 3^{-n}, we have β − α ≤ 3^{-n} for all n. Therefore α = β. It follows that ⋂_{n=0}^∞ In(x) is the singleton set {α}. We define f by setting f(x) = α. More precisely, the above argument allows us to describe f as follows:

    {f(x)} = ⋂_{n=0}^∞ In(x).
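The nested-interval description of f can be traced with exact rationals. The sketch below (the helper `nested_interval` is ours, and follows only finitely many coordinates of x) shrinks the interval by a factor of 3 at each step, exactly as in the definition:

```python
from fractions import Fraction

def nested_interval(x):
    """Follow I_0 ⊇ I_1 ⊇ ... selected by the finite 0/1 list x;
    return the endpoints of the last interval reached."""
    a, b = Fraction(0), Fraction(1)   # I_0 = [0, 1]
    for bit in x:
        third = (b - a) / 3
        if bit == 0:
            b = a + third             # left piece of I_n ∩ F_{n+1}
        else:
            a = b - third             # right piece of I_n ∩ F_{n+1}
    return a, b

# The sequence (1, 0, 0, ...) pins f(x) down to 2/3: the first step gives
# [2/3, 1], and every later 0-step keeps the left endpoint 2/3 while the
# interval shrinks toward it.
a, b = nested_interval([1] + [0] * 20)
assert a == Fraction(2, 3)
assert b - a == Fraction(1, 3 ** 21)
```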

Proposition 6.3. f is bijective.

Proof. We first show that f is injective. Let x, y ∈ X with x ≠ y. Let k = k(x, y) (recall Example 4.15). For i < k, xi = yi, so that Ii(x) = Ii(y). Since xk ≠ yk, Ik(x) and Ik(y) are two disjoint subintervals of Ik−1(x) = Ik−1(y). Since f(x) ∈ Ik(x) and f(y) ∈ Ik(y), we must have f(x) ≠ f(y).

We now show that f is surjective. Let t ∈ C. Then t ∈ Fn for all n. For each n, let In be the subinterval of Fn containing t. Since In and In+1 are subintervals of Fn and Fn+1, respectively, either In ⊇ In+1 or In ∩ In+1 = ∅. Since both contain t, we must have In ⊇ In+1. Thus we must have ⋂_{n=0}^∞ In = {t}. Now let

    xn = { 0,  if In is the left piece of In−1 ∩ Fn,
           1,  if In is the right piece of In−1 ∩ Fn.

Letting x = (x1, x2, . . .) ∈ X, we see that In = In(x) for all n, so that t = f(x). □

Corollary 6.4. R, C, X, and P(N) are equivalent sets. In particular, R is uncountable.

Proof. In Remark 2.8 we sketched the proof that X ∼ P(N), while in Proposition 6.3 we saw that X ∼ C. We finish the proof by showing that R ∼ C. Since C ⊆ R we have C ≼ R. By the Cantor-Bernstein theorem, it suffices to show that R ≼ C. Since C ∼ P(N), it suffices to show that R ≼ P(N). But since N ∼ Q, we know that P(N) ∼ P(Q). Thus we will be finished if we can show that R ≼ P(Q). We do that as follows. We define a function g : R → P(Q) by

g(t) = {q ∈ Q : q < t}.

Page 19: limsup stuff

NOTES, MAT 472, INTERMEDIATE ANALYSIS, FALL 2010 19

If s ≠ t are distinct points of R, say s < t, then by the density of Q in R there exists q ∈ Q with s < q < t. Then q ∈ g(t) and q ∉ g(s), and we have g(s) ≠ g(t). Hence g is one-to-one. □

Exercise 6.5.
(1) int(C) = ∅.
(2) C′ = C.

7. Sequences

Definition 7.1. Let X be a set. A sequence in X is a function x : N→ X.

Remark 7.2. We usually write xn instead of x(n), but the latter notation is often useful too. We sometimes write (xn)_{n=1}^∞, or (xn), for x. It is important to remember that in this notation, n is a dummy variable — it is the argument of the function x. (So, in particular, there is nothing special about the letter n used as the argument — it will often be convenient to use a different letter.) Some texts use curly braces instead of parentheses, but we will avoid this notation, for the following reason. The range of the sequence x is the subset {xn : n ∈ N} of X. This is often referred to as the set of terms of (xn). It is important to distinguish between the sequence itself (which is a function from N to X), and its set of terms (which is a subset of X).

While we are on the subject of the subtlety of the notation for sequences, let me point out a common mistake to guard against. What should we make of the following statement (taken from more than one actual homework paper!): “Let (xn) be a sequence, and let (xi) be another sequence.”? Of course, this deserves a quantity of red ink, but you should think carefully about the precise error. (And PLEASE don’t make this mistake too.)

Definition 7.3. Let (xn) be a sequence in a metric space X, and let a ∈ X. (xn) converges to a if for every ε > 0, there exists n0 ∈ N such that for all n ≥ n0 we have d(xn, a) < ε. We write xn → a (as n → ∞) to indicate that (xn) converges to a.

Lemma 7.4. A sequence in a metric space converges to at most one point.

Proof. Suppose that xn → a and xn → b. Let ε > 0. There exist n1, n2 ∈ N such that d(xn, a) < ε/2 for all n ≥ n1, and d(xn, b) < ε/2 for all n ≥ n2. Let n = max{n1, n2}. Then d(a, b) ≤ d(a, xn) + d(xn, b) < ε/2 + ε/2 = ε. Since d(a, b) < ε for all ε > 0, it follows that a = b. □

Definition 7.5. If xn → a, a is called the limit of (xn), and we write limn→∞ xn = a. We say that (xn) converges if it has a limit; otherwise it diverges.

Proposition 7.6. Let X be a metric space, let E ⊆ X, and let a ∈ X.

(1) a ∈ Ē if and only if there is a sequence in E converging to a.
(2) a ∈ E′ if and only if there is a sequence in E \ {a} converging to a.
(3) E is closed if and only if every sequence in E that converges in X has its limit in E.

Proof. We prove part of the proposition, and leave the rest as an exercise.
(1) (⇒): Let a ∈ Ē. By Proposition 5.10(2), for each n ∈ N we have E ∩ B1/n(a) ≠ ∅. Choose xn ∈ E with d(xn, a) < 1/n. Then xn → a. □

Remark 7.7. Sequences are an important tool for studying metric spaces. One can think of a sequence as a kind of “probe” — a function from N to the space picks out a certain countable subset in a manner indexed by the natural numbers. It is also useful to use sequences as tools to study a sequence itself. This leads to the next definition.

Page 20: limsup stuff

20 JACK SPIELBERG

Definition 7.8. Let x be a sequence in a set X, and let n be a strictly increasing sequence in N. (Thus n : N → N satisfies n1 < n2 < n3 < · · · .) Then x ◦ n is another sequence in X. It is called a subsequence of x.

Remark 7.9. The terms of the subsequence x ◦ n may be denoted (x ◦ n)_i = x(n(i)) = x_{n(i)} = x_{n_i}. Thus we may write x ◦ n = (x_{n_i})_{i=1}^∞.

The idea of a subsequence is pretty simple, but the notation can lead to lots of silly mistakes, against which you should be on guard. For example, let (x_n) be a sequence. The expression x_50 makes sense — it is the 50th term of the sequence. Now let (x_{n_i}) be a subsequence. The expression x_{n_50} makes sense — it is the 50th term of the subsequence, and equivalently, it is the n_50th term of the original sequence. However, the expression x_{50_i} does not make sense. If we try to interpret it, we first realize that it is the value of the function x at the argument 50_i. So 50_i must be an element of the domain of x, namely a natural number. Now 50_i must be the value of the function 50 at the argument i. But this is nonsense — ‘50’ is not a function, so it can’t be ‘evaluated’ at the argument i.

Here is another example to keep in mind. Suppose that we have a bunch of sequences in X. Say that x1, x2, . . . are all sequences (i.e. we have a sequence of sequences). How should we write the terms of the nth sequence? We have that xn : N → X, so we can write xn = (xn(i))_{i=1}^∞, using function notation for xn. Note carefully that i is the argument of the function xn, and not the argument of n (which is not a function). We have to be careful about using subscript notation. If we weren’t being careful, we might write xn = (x_{ni})_{i=1}^∞. But this is the same as the notation for a subsequence of a sequence x. One resolution of this ambiguity is to use more parentheses: xn = ((x_n)_i)_{i=1}^∞. The more usual way is to use two subscripts: xn = (x_{ni})_{i=1}^∞, and this is what we will do when we are faced with this situation. Writing it out longhand for clarity gives xn = (x_{n1}, x_{n2}, x_{n3}, . . .). Note that it is necessary to write so clearly that the reader does not mistake the second subscript for a sub-subscript.

Here is a simple result about subsequences.

Proposition 7.10. Let (xn) be a convergent sequence in a metric space. Then every subsequence of (xn) is also convergent, and has the same limit.

Remark 7.11. Before proving the proposition, we observe that if n : N → N is strictly increasing, then ni ≥ i for all i. This is easily proved by induction on i, and we omit the proof. We do point out that equality is possible. In fact, letting ni = i for all i shows that any sequence is a subsequence of itself.

Proof (of Proposition 7.10). Let xn → a, and let (x_{n_i}) be a subsequence. We will show that x_{n_i} → a. Let ε > 0. Since xn → a, there is m such that d(xn, a) < ε whenever n ≥ m. Now if i ≥ m, then n_i ≥ m, by the remark, so that d(x_{n_i}, a) < ε. Thus x_{n_i} → a (as i → ∞). □

Remark 7.12. It is clear from the definition that convergence or divergence of a sequence is unaffected if finitely many terms are changed. Convergence, divergence, and the limit if convergent, are examples of properties of a sequence that depend only on the ultimate behavior of the sequence. In fact, such properties are the only ones that are important for sequences. One way to describe this is by means of tails of a sequence. If (xn) is a sequence, the nth tail is the subsequence (x_i)_{i=n}^∞. Thus, if the sequence converges to L, then every tail of the sequence also converges to L. We sometimes say that a property holds eventually for a sequence if it holds for some tail.

Page 21: limsup stuff

NOTES, MAT 472, INTERMEDIATE ANALYSIS, FALL 2010 21

Definition 7.13. A subset of a metric space is bounded if it is contained in some ball. A function having a metric space as codomain is bounded if its range is a bounded subset of the codomain (cf. Example 4.6).

Of course, since a sequence in a metric space is an example of a function with the metric space as codomain, it makes sense to talk of bounded (and unbounded) sequences. The proof of the next result is a good exercise, but it will also follow from some later results.

Lemma 7.14. Let (xn) be a convergent sequence (in some metric space). Then (xn) is bounded.

8. Continuous functions

Definition 8.1. Let (X, d) and (Y, ρ) be metric spaces, f : X → Y a function, and x0 ∈ X. f is continuous at x0 if for every ε > 0 there exists δ > 0 such that for every x ∈ X, if d(x, x0) < δ then ρ(f(x), f(x0)) < ε. f is continuous if it is continuous at each point of X.

Remark 8.2. Here are some equivalent formulations of continuity at a point x0.

(1) For every ε > 0 there exists δ > 0 such that f(Bδ(x0)) ⊆ Bε(f(x0)).
(2) For every open ball C with center f(x0), there exists an open ball B with center x0 such that f(B) ⊆ C.
(3) For every ε > 0 there exists δ > 0 such that Bδ(x0) ⊆ f⁻¹(Bε(f(x0))).

Example 8.3. (If the proof is not given, it is an exercise.)

(1) Let f : R → R be given by f(x) = x². Then f is continuous.

Proof. Let x0 ∈ R, and let ε > 0. Then for any x ∈ R,

    |f(x) − f(x0)| = |x² − x0²| = |x − x0| |x + x0| ≤ |x − x0| (|x − x0| + 2|x0|);

if |x − x0| < 1, then this is

    ≤ |x − x0| (1 + 2|x0|);

and if also |x − x0| < ε/(1 + 2|x0|), then this is

    < ε.

Now choose δ > 0 such that δ < min{1, ε/(1 + 2|x0|)}. Then |x − x0| < δ implies that |x² − x0²| < ε. □

(2) Define the identity function id : X → X by id(x) = x. Then id is continuous.
(3) Fix y0 ∈ Y. Define f : X → Y by f(x) = y0 for all x ∈ X. Then f is continuous. (f is called a constant function.)
(4) Define χQ : R → R by

    χQ(x) = { 1,  if x ∈ Q,
              0,  if x ∉ Q.

χQ is discontinuous at each point of R.


(5) (The Hermite function) Define h : R → R by

    h(x) = { 1/n,  if x = m/n in lowest terms, where m, n ∈ Z with n > 0,
             0,    if x ∈ R \ Q.

Then h is continuous at each irrational number, and discontinuous at each rational number. The proof is a nice exercise. (It is interesting to consider the opposite continuity behavior.)

(6) We define the coordinate projections on Rⁿ, πi : Rⁿ → R, by πi(x) = xi. The πi are continuous (by Remark 4.12).
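The function h of item (5) (often called Thomae's function) can be explored on exact rationals. The sketch below (the helper `h` is ours, and only rational inputs are representable) illustrates the mechanism behind continuity at irrationals: rationals very close to √2 are forced to have large denominators, so h is small near √2.

```python
from fractions import Fraction
import math

def h(q):
    """h on a rational q: Fraction reduces m/n to lowest terms, so
    h(m/n) = 1/n."""
    return Fraction(1, q.denominator)

assert h(Fraction(3, 6)) == Fraction(1, 2)   # 3/6 = 1/2 in lowest terms

# Among rational approximations round(√2·n)/n with n < 200, the closest
# one to √2 has a large denominator, hence a small h-value.
target = math.sqrt(2)
close = [Fraction(round(target * n), n) for n in range(1, 200)]
best = min(close, key=lambda q: abs(float(q) - target))
assert h(best) <= Fraction(1, 100)
```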

Earlier we said that sequences are an important tool for studying objects in analysis. As evidence, we now show how to use sequences to characterize continuity of a function between metric spaces.

Theorem 8.4. Let X and Y be metric spaces, and let f : X → Y be a function. f is continuous if and only if for every convergent sequence xn → a in X, we have that f(xn) → f(a) in Y. (Thus f is continuous if and only if it preserves convergent sequences, and maps the limit of a convergent sequence to the limit of the image sequence.)

Proof. The forward direction is straightforward, and we leave it as an exercise. For the reverse direction we prove the contrapositive. Suppose that f is not continuous at a. Then there is ε > 0 such that for every δ > 0 there is x ∈ Bδ(a) with f(x) ∉ Bε(f(a)). We apply this to δ = 1/n: thus there is a sequence (xn) in X such that d(xn, a) < 1/n and ρ(f(xn), f(a)) ≥ ε. But then clearly xn → a while f(xn) ↛ f(a). □

We didn’t mention this before, but the word topology has a technical meaning: the topology of a metric space is the collection of all the open subsets of the space. A property of the space is topological if it can be defined just by using the open sets. It is very important to know that continuity of functions is a topological property.

Theorem 8.5. f : X → Y is continuous if and only if for every open set V ⊆ Y, the inverse image f⁻¹(V) is open in X.

Proof. (⇒): Let V ⊆ Y be open. Let x0 ∈ f⁻¹(V). Then f(x0) ∈ V. Since V is open there is ε > 0 such that Bε(f(x0)) ⊆ V. By Remark 8.2(3) there is δ > 0 such that Bδ(x0) ⊆ f⁻¹(Bε(f(x0))) ⊆ f⁻¹(V). Hence f⁻¹(V) is open.
(⇐): Let x0 ∈ X and let ε > 0. Since Bε(f(x0)) is open, f⁻¹(Bε(f(x0))) is open. Since x0 ∈ f⁻¹(Bε(f(x0))), there is δ > 0 such that Bδ(x0) ⊆ f⁻¹(Bε(f(x0))). Therefore f is continuous at x0 (by Remark 8.2(3)). □

Exercise 8.6. f : X → Y is continuous if and only if for every closed set V ⊆ Y, the inverse image f⁻¹(V) is closed in X.

This is a good place to introduce the notion of “sameness” for metric spaces. First, the definition:

Definition 8.7. Let X and Y be metric spaces. A homeomorphism from X to Y is a function f : X → Y which is bijective, continuous, and such that its inverse function f⁻¹ is continuous. Two metric spaces are called homeomorphic if there exists a homeomorphism from one to the other.

Page 23: limsup stuff

NOTES, MAT 472, INTERMEDIATE ANALYSIS, FALL 2010 23

Homeomorphic metric spaces have the same topological structure and properties. It is colloquial to describe this by saying that one space can be deformed into the other by bending and stretching without tearing. Here are some simple examples.

Example 8.8.
(1) Any two open disks in R² are homeomorphic.
(2) Any two closed disks in R² having positive radii are homeomorphic.
(3) No open disk in R² is homeomorphic to any closed disk in R². (This is not an obvious one.)
(4) Every open ball in Rⁿ is homeomorphic to every open box in Rⁿ.
(5) The unit circle T = {x ∈ R² : ‖x‖ = 1} is not homeomorphic to the unit interval [0, 1] ⊆ R. (Again, it isn’t so obvious how to prove this.)

Example 8.9. Recall the function f : X → C from Definition 6.2, where X = ∏_{1}^∞ {0, 1} is as in Example 2.6, and C is the Cantor set (Definition 6.1). We will show that f and f⁻¹ are continuous functions. First some notation. If (a1, a2, . . . , an) ∈ ∏_{1}^{n} {0, 1}, let

    Z(a1, . . . , an) = {x ∈ X : xi = ai for 1 ≤ i ≤ n}.

Such sets are called cylinder sets. Note that cylinder sets are clopen: Z(a1, . . . , an) = B1/n(x) = B̄1/(n+1)(x) for any x ∈ Z(a1, . . . , an). Note also that f(Z(a1, . . . , an)) = C ∩ In(x) (again for any x ∈ Z(a1, . . . , an)), which is a clopen subset of C (recall the definition of In(x) from Definition 6.2). Thus these two families of clopen subsets are paired by the function f. Since every open subset of X is a union of open balls, i.e. of cylinder sets, and every open subset of C is a union of sets of the form C ∩ In(x) (an exercise!), it follows from Theorem 8.5 that f and f⁻¹ are continuous.

The proofs of the next two results are easy, and so are left as exercises.

Corollary 8.10 (of Theorem 8.5). Let X be a metric space. f : X → R is continuous if and only if f⁻¹((a, b)) is open for all a < b in R. Equivalently, f : X → R is continuous if and only if {f < a} and {f > a} are open for all a ∈ R.

Theorem 8.11. Let f : X → Y and g : Y → Z be functions between metric spaces, and let x0 ∈ X. If f is continuous at x0, and g is continuous at f(x0), then g ◦ f is continuous at x0.

9. Limits of functions

The definition we gave a while ago for the limit of a sequence is a special case of a general notion of limit of a function — after all, a sequence is just a special kind of function. But sequences are quite special, and the definition of the limit of a function is a little bit more involved. We will need it, in principle, when we talk about differentiation.

Definition 9.1. Let (X, d) and (Y, ρ) be metric spaces, let E ⊆ X, let f : E → Y, let x0 ∈ E′, and let y0 ∈ Y. The limit of f, as x approaches x0, equals y0 if for every ε > 0 there exists δ > 0 such that for all x ∈ E, if 0 < d(x, x0) < δ then ρ(f(x), y0) < ε. (The final implication can also be expressed as f((E ∩ Bδ(x0)) \ {x0}) ⊆ Bε(y0).) We write limx→x0 f(x) = y0.

Remark 9.2. Note that f might or might not be defined at x0 (accordingly as x0 ∈ E or x0 ∉ E). We require x0 ∈ E′ so that for every δ > 0 there will exist points x satisfying the hypothesis of the implication. Even if x0 ∈ E, the definition of the limit as x → x0 never requires that f be evaluated at x0 — the value of f at x0 is irrelevant.


Note further that if we tried to apply this definition at a point x0 that is not a cluster point of E, then we would find that the definition is satisfied for any point y0 ∈ Y. To avoid this situation, we only consider limits at cluster points of the domain of the function.

Exercise 9.3. Show that in the situation of Definition 9.1, if the limit exists it is unique. (Be sure to note explicitly where the hypothesis that x0 ∈ E′ is used.)

Lemma 9.4. Let f, etc., be as in Definition 9.1. Define f̃ : E ∪ {x0} → Y by

    f̃(x) = { f(x),  if x ∈ E \ {x0},
             y0,    if x = x0.

Then limx→x0 f(x) = y0 if and only if f̃ is continuous at x0.

Proof. The proof is left as an exercise. □

Example 9.5.
(1) It is easy to show that limt→0 t sin(1/t) = 0. Let f : R → R be given by

    f(t) = { t sin(1/t),  if t ≠ 0,
             0,           if t = 0.

Then f is continuous at 0.
(2) It is easy to show that limt→0 sin(1/t) does not exist. Let c ∈ R, and let g : R → R be given by

    g(t) = { sin(1/t),  if t ≠ 0,
             c,         if t = 0.

Then g is not continuous at 0.

Remark 9.6. Note that the definition of limit is local — it depends only on the restriction of f to Br(x0), for any r > 0.

10. Sequences in R

Theorem 10.1. Let (an) and (bn) be sequences in R. Suppose that an → a and bn → b. Then

(1) an + bn → a + b.
(2) anbn → ab.
(3) If b ≠ 0 then an/bn → a/b (where at most finitely many terms are not defined).
(4) If an ≤ bn for all n, then a ≤ b.

Proof. These are good exercises, so we will only prove part of the third statement; namely, the case where an = 1 for all n. First, let’s sort out the parenthetical comment. If b ≠ 0, then |b| > 0. By definition of convergence, there is n0 such that |bn − b| < |b| for all n ≥ n0. But then, for all n ≥ n0 we have |bn| = |b − (b − bn)| ≥ |b| − |b − bn| > |b| − |b| = 0. Therefore bn ≠ 0 if n ≥ n0. The quotient sequence will fail to be defined if the denominator equals zero, but this can only happen for finitely many n (all less than n0).

Now let’s prove that if bn → b ≠ 0, then 1/bn → 1/b. Let ε > 0. Let n1 be such that |bn − b| < |b|/2 whenever n ≥ n1. We can improve on the previous paragraph: if n ≥ n1 we have |bn| ≥ |b| − |b − bn| > |b| − |b|/2 = |b|/2. Now let n2 be such that |bn − b| < |b|²ε/2 whenever n ≥ n2. Let n0 = max{n1, n2}. For n ≥ n0 we have

    |1/bn − 1/b| = |b − bn| / (|b| |bn|) = |bn − b| · (1/|b|) · (1/|bn|) < (|b|²ε/2) · (1/|b|) · (2/|b|) = ε.

Therefore 1/bn → 1/b. □

Remark 10.2. The first three statements in the theorem mean that the functions + and · : R² → R, and ÷ : R × (R \ {0}) → R, are continuous.

Remark 10.3.
(1) It follows from Theorem 10.1(4) that if an < bn for all n, then a ≤ b. Note that even with strict inequalities in the hypotheses, the conclusion will in general only be a weak inequality. This reflects a general principle: limits change strict inequalities into weak inequalities.
(2) The following well-known lemma also follows from Theorem 10.1(4).

Lemma 10.4. Let (an) and (bn) be real sequences, suppose that |an| ≤ |bn| for all n, and suppose that bn → 0. Then an → 0.

Lemma 10.5. Let (x_i) be a sequence in Rⁿ. We write the ith term of the sequence as an n-tuple thus: (x_{i1}, . . . , x_{in}) (cf. Remark 7.9). If a = (a1, . . . , an) ∈ Rⁿ, then x_i → a if and only if x_{ij} → aj (as i → ∞) for all j = 1, . . . , n.

Proof. This follows easily from Remark 4.12. □

We now establish convergence of some special, familiar sequences in R.

Proposition 10.6.
(1) For any k ∈ N, 1/n^{1/k} → 0 as n → ∞.
(2) For any 0 < a < 1, a^n → 0 as n → ∞.
(3) n^{1/n} → 1 as n → ∞.
(4) For any a ∈ R with 0 < a < 1, and any k ∈ N, a^n n^k → 0 as n → ∞.

Proof. (1) Let ε > 0. Choose n0 > 1/ε^k. If n ≥ n0 then n^{1/k} ≥ n0^{1/k} > 1/ε, and hence 1/n^{1/k} < ε.
(2) We essentially proved this a long time ago, in Remark 1.7.
(3) For n ≥ 2 it is evident that n^{1/n} > 1. Let xn = n^{1/n} − 1. Then by property (1) after Definition 1.1, for n ≥ 2 we have n = (1 + xn)^n > (n(n − 1)/2) xn², and hence xn² < 2/(n − 1). It follows (using Lemma 10.4, and (1)) that xn → 0.
(4) This is very similar to the proof of (2). For that we referred to Remark 1.7. In that remark we saw that if 0 < a < 1 then there is c > 0 such that a^n < c/n. Let’s apply this to the number a^{1/(k+1)}, which also lies between 0 and 1. Thus there is a positive number d such that a^{n/(k+1)} < d/n. Raising both sides to the power k + 1 gives a^n < d^{k+1}/n^{k+1}, and hence a^n n^k < d^{k+1}/n. By Theorem 10.1(3) and Lemma 10.4, a^n n^k → 0 as n → ∞. □

Definition 10.7. A sequence (xn) in R is increasing if xn ≤ xn+1 for all n. It is called strictly increasing if xn < xn+1 for all n. Decreasing and strictly decreasing sequences are defined similarly. A sequence is called monotone if any of these terms apply.

Theorem 10.8. An increasing sequence that is bounded above is convergent.

Proof. Let (xn) be an increasing sequence that is bounded above. Then the set of terms, {xn : n ∈ N}, has a supremum, c. We claim that xn → c. Let ε > 0. Since the supremum is an upper bound, we have xn ≤ c < c + ε for all n. Since c − ε < c, c − ε is not an upper bound, so there exists n0 with c − ε < xn0. Then for all n ≥ n0 we have c − ε < xn. Thus we get that c − ε < xn < c + ε whenever n ≥ n0. Thus xn → c. □

Exercise 10.9. A bounded monotone sequence is convergent.

Example 10.10. Does √(2 + √(2 + √(2 + · · · ))) mean anything? OK, this is phrased as a philosophical question, i.e. it's a joke. But we can still try to give the expression some kind of sense. For example, we could argue that IF it does represent a real number, call it x, then x must satisfy the equation x = √(2 + x). Then it's easy to see that x = 2. But this is not valid since we haven't shown that the expression does indeed represent a real number. Some people might try to make sense of it by interpreting it as a sequence: (√2, √(2 + √2), √(2 + √(2 + √2)), . . .). They would define the expression to be the limit of this sequence, assuming that the sequence converges. We could argue about whether this is a reasonable definition for the expression, but we can't argue with the intelligibility of the new problem: does the given sequence converge, and if so, to what? (Other people might argue that the limit of this sequence (if it exists) is actually the definition of a different expression; namely, · · · + √(2 + √(2 + √2)).) However we come to study this sequence, it is a nice exercise in induction to prove that it is bounded above and increasing. Therefore it converges. Using the recursive definition of the sequence (an+1 = √(2 + an)), and Theorem 10.1, it is easy to prove that the limit is, in fact, 2.
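The monotone convergence is easy to watch numerically. A small sketch (plain Python; the iteration count and tolerance are arbitrary choices) iterates an+1 = √(2 + an) starting from a1 = √2:

```python
import math

# Iterate a_{n+1} = sqrt(2 + a_n) with a_1 = sqrt(2).
a = math.sqrt(2)
terms = [a]
for _ in range(20):
    a = math.sqrt(2 + a)
    terms.append(a)

# Increasing and bounded above by 2, as the induction in Example 10.10 shows;
# the terms approach the limit 2.
assert all(s < t for s, t in zip(terms, terms[1:]))
assert all(t < 2 for t in terms)
assert abs(terms[-1] - 2) < 1e-11
```

The error roughly quarters at each step, consistent with the fixed-point iteration x ↦ √(2 + x) having derivative 1/4 at x = 2.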

We next point out that continuity of real-valued functions is preserved by pointwise arithmetic of functions.

Corollary 10.11. Let f, g : X → R, and let a ∈ X. If f and g are continuous at a, then so are f + g, fg, and f/g (if g(a) ≠ 0).

Proof. This follows from Theorems 10.1 and 8.4. □

Remark 10.12. The result for limits analogous to the one in Corollary 10.11 holds, as can be seen by using Lemma 9.4.

Definition 10.13. If f : X → Rn we define the coordinate functions of f by fi = πi ◦ f : X → R (recall the coordinate projections πi from Example 8.3 (6)). We can then write f(x) = (f1(x), . . . , fn(x)).

Corollary 10.14. Let f : X → Rn. Then f is continuous if and only if all fi are continuous.

Proof. (⇒): Use Theorem 8.11 and Example 8.3 (6).
(⇐): Use Remark 4.12 and Theorem 8.4. □

11. Limsup and liminf

In spite of the wonderful theorem about monotone sequences from the last section, most sequences (even bounded ones) diverge. However, there is still information to be gotten from a divergent sequence.

Let (an) be a bounded sequence in R, say L ≤ an ≤ M for all n. Then for each n, sup{ak : k ≥ n} = sup{an, an+1, an+2, . . .} exists, since the tails of (an) are all bounded above (by M). (We will usually use the shorthand sup_{k≥n} ak for this sup of the nth tail of the sequence (an).) Notice that {ak : k ≥ n} ⊇ {ak : k ≥ n + 1}, and hence that sup_{k≥n} ak ≥ sup_{k≥n+1} ak. Of course, since L ≤ ak for all k, we also have L ≤ sup_{k≥n} ak for all n. Therefore the sequence of suprema of tails, (sup_{k≥n} ak)_{n=1}^{∞}, is decreasing and bounded below, and hence converges.

Definition 11.1. Let (an) be a bounded sequence in R. The limit superior, or limsup, of (an) is the real number

lim sup_{n→∞} an = lim_{n→∞} (sup_{k≥n} ak).

In a completely analogous way we define the limit inferior, or liminf:

lim inf_{n→∞} an = lim_{n→∞} (inf_{k≥n} ak).

The justification is the opposite of the above: the sequence (inf_{k≥n} ak)_{n=1}^{∞} is increasing and bounded, so it has a limit.

Theorem 11.2. Let (an) be a bounded sequence in R.

(1) lim inf_{n→∞} an ≤ lim sup_{n→∞} an.
(2) (an) converges if and only if lim inf_{n→∞} an = lim sup_{n→∞} an, and in this case,

lim_{n→∞} an = lim inf_{n→∞} an = lim sup_{n→∞} an.

Proof. The proof is left as an exercise. □
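The tails of a concrete bounded divergent sequence can be tabulated directly. A sketch (plain Python; the sequence an = (−1)^n (1 + 1/n) and the truncation length are arbitrary illustrations, and the tail sups/infs are necessarily computed over a finite truncation):

```python
# a_n = (-1)^n (1 + 1/n): divergent, lim sup = 1, lim inf = -1.
N = 2000
a = [(-1) ** n * (1 + 1 / n) for n in range(1, N + 1)]

# Truncated versions of sup_{k>=n} a_k and inf_{k>=n} a_k.
sup_tails = [max(a[n:]) for n in range(N // 2)]
inf_tails = [min(a[n:]) for n in range(N // 2)]

# The tail-sups decrease and the tail-infs increase, as in Section 11.
assert all(s >= t for s, t in zip(sup_tails, sup_tails[1:]))
assert all(s <= t for s, t in zip(inf_tails, inf_tails[1:]))

# They approach lim sup = 1 and lim inf = -1 respectively.
assert abs(sup_tails[-1] - 1) < 0.01
assert abs(inf_tails[-1] + 1) < 0.01
```

Since lim inf ≠ lim sup here, Theorem 11.2(2) confirms what is clear anyway: the sequence diverges.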

The following theorem is usually referred to as the Bolzano-Weierstrass theorem. It is true in Rn as well. In fact, we will use this property as a definition later (Definition 14.17). The proof of the Bolzano-Weierstrass theorem in Rn will be given then.

Theorem 11.3. Let (an) be a bounded sequence in R. Then (an) has a convergent subsequence.

Proof. Let c = lim sup an. Let bn = sup_{j≥n} aj, so that b1 ≥ b2 ≥ · · · and lim bn = c. We will use (bn) to recursively define a subsequence of (an) that converges to c. Choose m1 with bm1 < c + 1. Then choose n1 ≥ m1 with an1 > bm1 − 1. Then

c − 1 ≤ bm1 − 1 < an1 ≤ bm1 < c + 1,

so that |an1 − c| < 1. (Exercise: make sure you can explain each of the above inequalities.) Recursively, having chosen 1 ≤ n1 < n2 < · · · < nk−1 with |ani − c| < 1/i for i = 1, . . ., k − 1, choose mk > nk−1 so that bmk < c + 1/k. Then choose nk ≥ mk with bmk − 1/k < ank. Then we have

c − 1/k ≤ bmk − 1/k < ank ≤ bmk < c + 1/k,

and hence |ank − c| < 1/k. Therefore we have defined 1 ≤ n1 < n2 < · · · so that |ank − c| < 1/k for all k. Thus the subsequence (ank)_{k=1}^{∞} converges to c. □

Remark 11.4. As a corollary to the proof, we see that every bounded sequence in R has a subsequence converging to the limsup of the sequence. An analogous argument shows that there is a(nother) subsequence converging to the liminf of the sequence.

Exercise 11.5. Let (an) be a bounded sequence. Let E be the set of subsequential limits; that is, E = {x ∈ R : there is a subsequence of (an) converging to x}. Then lim inf an = min(E) and lim sup an = max(E).


Definition 11.6. We introduce here some standard terminology regarding sequences, reflecting the idea that it is only the “ultimate” behavior of a sequence that is of interest (cf. Remark 7.12). Our phrasing is very general, hence vague, but expresses a useful notion that is easy to understand once you see the idea. Let (an) be a sequence, and let P be some property that the terms of the sequence might have. We say that P holds eventually if P holds for all terms in some tail of the sequence; in other words, if there exists n0 such that P(an) is true for all n ≥ n0. We say that P holds frequently if every tail of the sequence contains a term for which P holds; in other words, if for all n0 there exists n ≥ n0 such that P(an) is true.

For example, you can check your understanding of these terms by working through the following statements.

(1) (an) converges to c if and only if for every ε > 0, an ∈ Bε(c) eventually.
(2) (an) has a subsequence converging to c if and only if for every ε > 0, an ∈ Bε(c) frequently.

Exercise 11.7. Let (an) be a bounded real sequence, and let x ∈ R. Prove the following:

1. x < lim sup an =⇒ x < an frequently =⇒ x ≤ lim sup an

2. x > lim sup an =⇒ x > an eventually =⇒ x ≥ lim sup an

3. x < lim inf an =⇒ x < an eventually =⇒ x ≤ lim inf an

4. x > lim inf an =⇒ x > an frequently =⇒ x ≥ lim inf an

(The exercise is not only to prove the eight implications, but also to show that none of these implications can be reversed.)

12. Infinite limits and limits at infinity

There are innumerable ways in which a limit can fail to exist. One of these is “regular” enough to warrant special notation: divergence to (±) infinity.

Definition 12.1. Let X be a metric space, let x0 ∈ X′, and let f : X → R. We say that f diverges to infinity as x approaches x0 if for every M ∈ R there exists δ > 0 such that for all x ∈ X, if 0 < d(x, x0) < δ then f(x) > M. We write limx→x0 f(x) = ∞ in this case. Similarly, we say that f diverges to minus infinity as x approaches x0 if for every M ∈ R there exists δ > 0 such that for all x ∈ X, if 0 < d(x, x0) < δ then f(x) < M. We write limx→x0 f(x) = −∞ in this case. An analogous definition is used for sequences.

Remark 12.2. It is important always to remember that ∞ and −∞ are not real numbers. However, a limited portion of the arithmetic of real numbers can be usefully extended to include these two symbols. The conventions are as follows.

• For x ∈ R, x ± ∞ = ±∞.
• For x ∈ R with x ≠ 0, x · (±∞) = ± sgn(x) · ∞.
• For x ∈ R, x/(±∞) = 0.
• ∞ + ∞ = ∞, ∞ · ∞ = ∞.

On the other hand, certain combinations are expressly forbidden, under pain of writing nonsense:

∞ − ∞, ∞/∞, 0 · ∞

are not defined.
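As a mnemonic only (the symbols ±∞ here are not numbers, and floating-point arithmetic is not part of these notes), IEEE floating-point arithmetic mirrors these conventions: Python's float('inf') obeys the permitted rules, and the forbidden combinations produce nan (“not a number”) rather than a value.

```python
import math

inf = float('inf')

# Permitted conventions from Remark 12.2.
assert 5 + inf == inf and 5 - inf == -inf      # x ± ∞ = ±∞
assert -3 * inf == -inf                        # x · (±∞) = ± sgn(x) · ∞, x ≠ 0
assert 7 / inf == 0 and 7 / -inf == 0          # x/(±∞) = 0
assert inf + inf == inf and inf * inf == inf

# Forbidden combinations have no sensible value: IEEE returns nan.
assert math.isnan(inf - inf)
assert math.isnan(inf / inf)
assert math.isnan(0 * inf)
```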


With the above definition and remarks in mind, we can extend the arithmetic of limits from Corollary 10.11 and Remark 10.12 to include infinite limits (and, of course, limits of sequences as well as of functions). By this we mean that the limit of the sum/difference/product/quotient of two functions equals the sum/difference/product/quotient of the two limits, IF that arithmetic combination of the limits is permissible. We leave it as an exercise to write a precise theorem and its proof.

A different use of the symbols ±∞ is in the description of limits at infinity.

Definition 12.3. Let X be a metric space, let f : R → X, and let x0 ∈ X. We write limt→∞ f(t) = x0 if for every ε > 0 there exists M ∈ R such that for all t ∈ R with t ≥ M we have d(f(t), x0) < ε. There is a similar definition for limits at minus infinity.

Remark 12.4. We mention that in this context, the symbols ∞ and −∞ merely indicate “directions”, and are not to be thought of as “numbers” in any way.

13. Cauchy sequences and complete metric spaces

It may not have seemed important at the time, but the definition of convergence for a sequence has an unfortunate limitation. Namely, in order to check the definition, it is necessary to have the limit in hand. In order to use sequences as a tool to study spaces, it would be very helpful to be able to give an internal characterization of convergence, one that doesn't refer to the limit itself. This program cannot be carried out in general, but the idea that came from it is very important.

Definition 13.1. Let (X, d) be a metric space. A sequence (xn) in X is Cauchy if for every positive real number ε, there exists n0 ∈ N such that for all m, n ≥ n0 we have d(xm, xn) < ε.

Informally, we say that the sequence is Cauchy if its terms can be made close to each other merely by requiring them to be far enough out in the sequence. It is an exercise in the logic of quantifiers to convince yourself that the definition captures precisely the idea behind this informal statement.

The following lemma provides many examples of Cauchy sequences.

Lemma 13.2. A convergent sequence is Cauchy.

Proof. Let (xn) be convergent, with limit x. Let ε > 0 be given. By the definition of convergence there is n0 such that for all n ≥ n0, d(xn, x) < ε/2. Then if m, n ≥ n0 we have d(xm, xn) ≤ d(xm, x) + d(x, xn) < ε/2 + ε/2 = ε. Therefore (xn) is Cauchy. □

Example 13.3. Here is an example of a non-Cauchy sequence in R: let xn = √n. (Exercise: prove that it's not Cauchy.) But successive terms do get close to each other: |xn+1 − xn| = 1/(√(n+1) + √n) < 1/(2√n).
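Numerically (a sketch in plain Python; the indices are arbitrary choices): successive terms of xn = √n get arbitrarily close, yet terms far apart stay far apart, which is why the sequence is not Cauchy.

```python
import math

def x(n):
    # x_n = sqrt(n)
    return math.sqrt(n)

# Successive terms get close: |x_{n+1} - x_n| < 1/(2 sqrt(n)).
for n in [10, 100, 10_000]:
    assert abs(x(n + 1) - x(n)) < 1 / (2 * math.sqrt(n))

# But the Cauchy condition fails: x_{4n} - x_n = sqrt(n), which grows.
for n in [100, 10_000]:
    assert abs(x(4 * n) - x(n) - math.sqrt(n)) < 1e-9
```

The identity x_{4n} − x_n = 2√n − √n = √n shows directly that no tail of the sequence has small diameter.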

Example 13.4. Here is an example of a Cauchy sequence that does not converge. Let X = (0, 1) with the usual metric gotten from R. The sequence (1/n) in X is Cauchy but not convergent. (Remember the definition of convergence (Definition 7.5): the limit has to belong to the metric space.)

Example 13.5. Here is a more interesting example of a non-convergent Cauchy sequence. Let V be the vector space of all finite real sequences:

V = {(x1, x2, . . .) : xi ∈ R, there exists i0 such that for all i > i0, xi = 0}.


We define a norm on V by ‖x‖ = (∑_{i=1}^{∞} xi^2)^(1/2) (note that the sum is actually finite). It's easy to see that this is a norm: the properties defining a norm only involve finitely many vectors at a time, and then the required property actually occurs in some Euclidean space, where we already know the properties hold. Now, let

vn = (1/2, 1/4, 1/8, . . . , 1/2^n, 0, 0, 0, . . .) ∈ V.

If m < n, we have

‖vm − vn‖^2 = ‖(0, 0, . . . , 0, 1/2^(m+1), . . . , 1/2^n, 0, 0, . . .)‖^2 = ∑_{i=m+1}^{n} (1/2^i)^2 = (1/4^(m+1)) ∑_{i=0}^{n−m−1} (1/4)^i < 1/4^m.

Thus (vn) is Cauchy in V. But we claim that (vn) does not converge. To prove this, let y = (yi) be an arbitrary vector in V. There is k such that yi = 0 for i > k. For n > k,

‖y − vn‖^2 = ∑_{i=1}^{∞} (yi − vni)^2 = ∑_{i=1}^{n} (yi − 1/2^i)^2 ≥ (yk+1 − 1/2^(k+1))^2 = 1/4^(k+1).

Thus d(vn, y) ≥ 2^(−(k+1)) for all n > k. Therefore vn does not converge to y.
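The norm computations above are easy to replicate. In the sketch below (plain Python; a vector in V is represented by the list of its entries up to its last nonzero slot, a choice made here for illustration), ‖vm − vn‖² stays below 1/4^m, while every fixed y ∈ V keeps its distance from the tail of the sequence.

```python
import math

def norm(x):
    # Euclidean norm of a finite real sequence given as a list.
    return math.sqrt(sum(t * t for t in x))

def v(n):
    # v_n = (1/2, 1/4, ..., 1/2^n, 0, 0, ...)
    return [2.0 ** -(i + 1) for i in range(n)]

def diff(x, y):
    # Coordinatewise difference, padding the shorter list with zeros.
    L = max(len(x), len(y))
    x = x + [0.0] * (L - len(x))
    y = y + [0.0] * (L - len(y))
    return [a - b for a, b in zip(x, y)]

# (v_n) is Cauchy: ||v_m - v_n||^2 < 1/4^m for m < n.
for m, n in [(5, 9), (10, 40)]:
    assert norm(diff(v(m), v(n))) ** 2 < 4.0 ** -m

# For a fixed y in V with y_i = 0 past slot k = 3: ||y - v_n|| >= 2^-(k+1).
y = [3.0, -1.0, 0.5]
for n in [4, 10, 30]:
    assert norm(diff(y, v(n))) >= 2.0 ** -(3 + 1)
```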

Definition 13.6. A metric space is called complete if every Cauchy sequence converges.

Theorem 13.7. Rn is complete.

We will give the proof after a couple of lemmas about Cauchy sequences in general metric spaces.

Lemma 13.8. A Cauchy sequence is bounded.

Proof. Let (an) be a Cauchy sequence. Then there is L such that d(am, an) < 1 for all m, n ≥ L. Let R = max{d(a1, aL), . . . , d(aL−1, aL)} + 2. Then d(an, aL) < R for all n, and hence (an) is bounded. □

Lemma 13.9. A Cauchy sequence having a convergent subsequence is convergent.

Proof. Let (an) be a Cauchy sequence, and let (ani) be a convergent subsequence, with limit c. We claim an → c. Let ε > 0. Since (an) is Cauchy there is L such that d(am, an) < ε/2 for all m, n ≥ L. By the definition of convergence, there is i0 such that d(ani, c) < ε/2 for all i ≥ i0. Let i1 ≥ i0 be such that ni1 ≥ L. Then for any n ≥ ni1 we have

d(an, c) ≤ d(an, ani1) + d(ani1, c) < ε/2 + ε/2 = ε.

Hence an → c. □

Proof. (of Theorem 13.7) We first show that R is complete. Let (an) be a Cauchy sequence in R. By Lemma 13.8 we know that (an) is bounded. By Theorem 11.3 we know that (an) has a convergent subsequence. Then by Lemma 13.9 we know that (an) converges. Thus R is complete. Now it follows easily from Remark 4.12 that Rn is complete (the details are left as an exercise). □

Exercise 13.10. A closed subset of a complete metric space is complete.


Exercise 13.11. Let (X, d) be a metric space. Recall the diameter of a subset of X from Exercise 5.19.

(1) Suppose that X is complete. Prove that for every decreasing sequence

F1 ⊇ F2 ⊇ · · ·

of nonempty closed subsets of X with limn→∞ diam(Fn) = 0, there exists an element a ∈ X such that

⋂_{n=1}^{∞} Fn = {a}.

(2) (converse of part (1)) Suppose that whenever F1, F2, . . . are nonempty closed subsets of X such that F1 ⊇ F2 ⊇ · · · and limn→∞ diam(Fn) = 0, then ⋂_{n=1}^{∞} Fn ≠ ∅. Prove that X is a complete metric space.

14. Compactness

Compactness is probably the most important concept in analysis. It can be described in various ways. The “right” way is not necessarily the easiest to understand. Before we give the definition, here is some motivation for why it is reasonable. The basic problem that compactness addresses is the transition from local information to global information. That may sound cryptic, and it is meant to be a catchy phrase that will become more intelligible as you get more used to these ideas. But it isn't hard to see what it is about. Local (near a point) means in an open ball centered at that point. Here is a simple example of using this terminology. If a function is continuous at a point, then it is bounded in some open ball centered at that point. Thus if a function is continuous on a set, it is bounded locally on that set: each point in the set has a neighborhood on which the function is bounded. On the other hand, global (on a set) means on the whole set. A function is “globally bounded” if it is bounded on its domain, i.e. if it is a bounded function. Is every continuous function bounded? Of course not! For example, a non-constant polynomial on R is continuous, but not bounded. Local boundedness does not generally imply global boundedness. However if the domain of the polynomial is taken to be a closed bounded interval, then the extreme value theorem from calculus implies that the polynomial is bounded on the interval. The great insight was that it is a property of the domain that lets us pass from local boundedness to global boundedness, and this property is called compactness.

Now, recall what the word local means: in a neighborhood of a point. A property holds locally on a set if for each point, there is an open ball centered at the point such that the property holds in that ball. If the set is infinite, this will give an infinite collection of open balls, one for each point. We could obtain the property globally if we had a finite collection of balls instead of an infinite collection. Compactness of the set means that we can always reduce to a finite collection.

You might notice that a lot of mathematics seems to proceed in this way: what would we like to have? Let's give a name to the situation where we have what we want. Now let's analyze the situation to see what exactly we were asking for. In fact, compactness can be described in a variety of ways that seem very different. That means that we can prove that a space is compact using an easy description. Then we can use compactness via a complicated description.

OK, with that as motivation, here is the precise definition.


Definition 14.1. Let X be a set. A cover of X is a collection of sets whose union containsX. If U is a cover of X, a subcover of U is a subcollection of U that is also a cover of X.

Example 14.2. (1) The set of all open intervals is a cover of R.
(2) {(a, b) : a < b, a, b ∈ Z} is a subcover of example (1).

Definition 14.3. Let X be a metric space, and let E ⊆ X. An open cover of E is a cover of E whose elements are open subsets of X.

Definition 14.4. Let X be a metric space, and let E ⊆ X. E is compact if every open cover of E has a finite subcover.

Example 14.5. (1) Example 14.2(1) is an open cover of R having a finite subcover.
(2) Example 14.2(2) is an open cover of R not having a finite subcover. In particular, it follows that R is not compact.

Example 14.6. (1) Finite sets are compact.
(2) {0, 1, 1/2, 1/3, . . .} is a compact subset of R.
(3) [0, 1] is a compact subset of R (this is a special case of Corollary 14.30).

Proof. Let U be an open cover of [0, 1]. Let E = {x ∈ [0, 1] : [0, x] is finitely covered by U}. Note that 0 ∈ E, so E ≠ ∅. Let c = sup E. Then c ∈ [0, 1]. We first claim that c ∈ E. To see this, choose U0 ∈ U with c ∈ U0. Then there exists r > 0 such that (c − r, c + r) ⊆ U0. By the definition of supremum, there is y ∈ E with y > c − r. By definition of E there is a finite subcollection V ⊆ U with [0, y] ⊆ ⋃V. But then V ∪ {U0} is a finite subcollection of U covering [0, c], proving that c ∈ E. Now we note that, in fact, V ∪ {U0} covers [0, a] for any number a between c and c + r. Thus if c < 1 we could find a larger element of E than c, contradicting its status as supremum. So we have shown that c = 1. Thus [0, 1] is finitely covered by U. □

(4) [0, 1) is not compact.

Proof. {(−1, 1 − 1/n) : n ∈ N} is an open cover not having a finite subcover. □

Definition 14.7. A metric space X is compact if X is a compact subset of itself.

By now our waffling use of the qualifier “subset” after the word “compact” may be causing some trauma. We will remedy this now, but first we need the important notion of relatively open set.

Definition 14.8. Let X be a metric space. Recall that a subset E ⊆ X is also a metric space (cf. Definition 4.13). A subset of E is called relatively open (in E) if it is an open subset of the metric space E.

Example 14.9. (1) Let X = R, and let E = [0, 1] ⊆ X. Then [0, 1/2) is relatively open in E, but not open in X.
(2) Let X = R2, and let E = R × {0} ⊆ X (we think of E as being the x-axis in R2). Then (0, 1) × {0} is just the usual open unit interval in the x-axis — it is relatively open in E, but is not open in X.

Lemma 14.10. Let X be a metric space, and let E ⊆ X. For U ⊆ E, U is relatively open in E if and only if there exists an open subset V of X such that U = E ∩ V.


Proof. We will use a superscript E to distinguish open balls in the metric space E from open balls in X. For a ∈ E and r > 0 we see that

B^E_r(a) = {x ∈ E : d(x, a) < r} = {x ∈ X : d(x, a) < r} ∩ E = Br(a) ∩ E.

Thus U is relatively open in E if and only if for every x ∈ U there exists r(x) > 0 such that Br(x)(x) ∩ E ⊆ U. In this case, we have that U = (⋃_{x∈U} Br(x)(x)) ∩ E, and we may use the set in parentheses for V. Conversely, suppose that U = V ∩ E for some open set V of X. Then for a point x ∈ U there is r > 0 such that Br(x) ⊆ V. Then B^E_r(x) = Br(x) ∩ E ⊆ V ∩ E = U, so we have that U is relatively open in E. □

Proposition 14.11. Let X be a metric space, and let E ⊆ X. E is a compact subset of X if and only if E is a compact metric space.

Proof. Suppose that E is a compact subset of X. Let U be an open cover of (the metric space) E. By Lemma 14.10, for each U ∈ U there is an open set VU ⊆ X such that U = VU ∩ E. Then

E = ⋃_{U∈U} U = ⋃_{U∈U} (VU ∩ E) = (⋃_{U∈U} VU) ∩ E,

and hence {VU : U ∈ U} is an open cover of E in X. By hypothesis this open cover has a finite subcover. Thus there are U1, . . ., Uk ∈ U such that E ⊆ VU1 ∪ · · · ∪ VUk. Hence E ⊆ U1 ∪ · · · ∪ Uk, so that U has a finite subcover. Therefore the metric space E is compact.

The converse is left as an exercise. □

Thus compactness is an intrinsic property of a metric space that cannot be lost when the space is realized as a subspace of another metric space (in contrast to openness, which does depend on the ambient metric space, as seen in Example 14.9). We now develop the chief properties of compactness.

Proposition 14.12. A closed subset of a compact space is compact.

Proof. Let X be a compact metric space, and let E ⊆ X be a closed subset. Let U be an open cover of E. Since E is closed, Ec is open. Then U ∪ {Ec} is an open cover of X. Since X is compact, this open cover has a finite subcover. The subcover consists of finitely many sets from U, possibly together with Ec. But then the sets from U must cover E, so that U has a finite subcover (of E). Therefore E is compact. □

Exercise 14.13. It is a nice exercise to prove a sort of converse to this. Namely, a compact subset of a metric space is closed. We won't do it here, as this fact will follow from a later result (Corollary 14.20).

Proposition 14.14. A compact subset of a metric space is bounded.

Proof. Let E be a compact subset of the metric space X. Choose any point x0 ∈ X. Then {Bn(x0) : n = 1, 2, 3, . . .} is an open cover of X, hence also of E. Since E is compact, there is a finite subcover. But since the open balls increase with n, this means that there is n such that E ⊆ Bn(x0). Thus E is bounded. □

Of course, the converse of Proposition 14.14 is false.

Theorem 14.15. (Finite Intersection Property, or FIP) Let X be a compact metric space. Let {Ei}i∈I be a collection of nonempty closed subsets of X. Suppose that every finite subcollection has nonempty intersection: for all k ∈ N, for all i1, . . ., ik ∈ I, we have Ei1 ∩ · · · ∩ Eik ≠ ∅. Then ⋂_{i∈I} Ei ≠ ∅.


Proof. Suppose not. Then taking complements we have ⋃_{i∈I} Eci = X. This means that {Eci : i ∈ I} is an open cover of X. Since X is compact there are i1, . . ., ik ∈ I with Eci1 ∪ · · · ∪ Ecik = X. But then by complements again, we get that Ei1 ∩ · · · ∩ Eik = ∅, a contradiction. □

Example 14.16. The theorem may fail if the sets are not closed: consider {(0, 1/n) : n ∈ N}. This does have the FIP, but the intersection is empty.

Definition 14.17. A metric space X is sequentially compact if every sequence in X has a convergent subsequence (convergent in X, of course).

Example 14.18. [a, b] is sequentially compact, by Theorem 11.3 and the fact that [a, b] is closed.

Theorem 14.19. A compact metric space is sequentially compact.

Corollary 14.20. A compact subset of a metric space is closed.

The proof of the theorem will be made easier by the following preliminary “computation.”

Lemma 14.21. Let (xn) be a sequence in a metric space, and let y be a point. Then (xn) has a subsequence converging to y if and only if for every ε > 0 and for every m ∈ N, there exists n ≥ m such that d(xn, y) < ε.

Proof. (⇒): Suppose limi→∞ xni = y. Let ε > 0 and m ∈ N. By the hypothesized convergence there is i0 such that d(xni, y) < ε whenever i ≥ i0. Since ni → ∞ as i → ∞ there exists j ≥ i0 such that nj ≥ m. Then d(xnj, y) < ε. So nj is the desired ‘n’.

(⇐): Suppose the condition in the statement holds. We apply it repeatedly. First choose n1 such that d(xn1, y) < 1. Then choose n2 > n1 such that d(xn2, y) < 1/2. Continuing this way we construct a subsequence (xni)_{i=1}^{∞} such that d(xni, y) < 1/i for all i. Evidently xni → y as i → ∞. □

Proof. (of Theorem 14.19) We will prove the contrapositive of the statement in the theorem. So suppose that X is not sequentially compact. Then there is a sequence (xn) having no convergent subsequence. Thus for all y ∈ X, (xn) does not have a subsequence converging to y. Negating the condition in Lemma 14.21, we find that for all y ∈ X there exist εy > 0 and ny ∈ N such that for all n ≥ ny, d(xn, y) ≥ εy. Let U = {Bεy(y) : y ∈ X}. U is obviously an open cover of X. But if y1, . . ., yk ∈ X is any finite collection of points, choose n > max{ny1, . . . , nyk}. Then d(xn, yi) ≥ εyi for i = 1, . . ., k. Hence xn ∉ ⋃_{i=1}^{k} Bεyi(yi). Thus U has no finite subcover. Therefore X is not compact. □

Proposition 14.22. A sequentially compact metric space is complete.

Proof. A Cauchy sequence has a convergent subsequence by sequential compactness, and hence converges by Lemma 13.9. □

Exercise 14.23. A metric space X is sequentially compact if and only if every infinite subset of X has a cluster point.

We now turn to the role of boundedness for compact metric spaces. By way of introduction, we mention that the most famous result about compact metric spaces is the Heine-Borel theorem: a subset of Rn is compact if and only if it is closed and bounded. We will prove this later, but now we want to point out that this result is special to Rn — it is NOT true in arbitrary metric spaces. The reason is that Rn is (duh!) finite dimensional. This may not seem so special now, but many of the most important metric spaces in analysis are infinite dimensional, and you will surely run into them (maybe not today, maybe not tomorrow, but... yeah, yeah.)

Here is a simple part of the Heine-Borel theorem that we have essentially proved already. For E ⊆ R, if E is bounded then every sequence in E has a convergent subsequence. If E is both closed and bounded, then the limit of the convergent subsequence must belong to E. Thus we see that for subsets of R, closed and bounded imply sequentially compact.

Here are two examples to show that for general metric spaces, boundedness is too weak a notion. The first is simple-minded, but the second is more interesting.

Example 14.24. (1) Let X be an infinite set with the discrete metric (Example 4.17). Then X is bounded, but not sequentially compact.
(2) Let V be the normed space of finite real sequences (Example 13.5). Then B1(0) is closed and bounded, but not sequentially compact.

In fact, the situation is worse than might be realized if you just think about the non-convergent Cauchy sequence from Example 13.5. Consider the sequence (en) in V, where en = (0, 0, . . . , 0, 1, 0, 0, . . .) (with 1 in the nth slot). This sequence is contained in the unit ball of V, but does not even have a Cauchy subsequence.

These examples show that the problem with boundedness is that a huge space can hide inside a bounded set. The correct definition is the following.

Definition 14.25. A subset E of a metric space is called totally bounded if for every ε > 0 there are finitely many balls of radius ε that cover E.

Remark 14.26. (1) The definition is unaffected by specifying the type of the balls (open vs. closed).
(2) A totally bounded subset of a metric space is bounded. A subset of a totally bounded set is totally bounded.

The proofs are left as exercises.

The next lemma shows what makes Rn so special.

Lemma 14.27. In Rn, every bounded subset is totally bounded.

Proof. Let E ⊆ Rn be bounded, and let ε > 0. Choose C > 0 such that E ⊆ [−C, C]^n. Choose k > 2C√n/ε. Write

[−C, C] = ⋃_{i=1}^{k} [−C + 2C(i−1)/k, −C + 2Ci/k] = ⋃_{i=1}^{k} Si,

where S1, . . ., Sk are closed intervals of length 2C/k < ε/√n. Then

[−C, C]^n = (S1 ∪ · · · ∪ Sk) × · · · × (S1 ∪ · · · ∪ Sk) = ⋃_{i1,...,in=1}^{k} Si1 × · · · × Sin = ⋃_{j=1}^{k^n} Fj,

where each Fj is a closed cube of side 2C/k. Then the diameter of each Fj, which equals the length of the diagonal of Fj, equals (2C/k)√n < ε. Let xj ∈ Fj be arbitrary. Then Fj ⊆ Bε(xj). It follows that E ⊆ [−C, C]^n ⊆ ⋃_{j=1}^{k^n} Bε(xj). □

We now return to our development of the properties of compactness.


Proposition 14.28. A sequentially compact metric space is totally bounded.

Proof. We again prove the contrapositive. Suppose that X is a metric space that is not totally bounded. Then there is a positive number ε such that X cannot be covered by finitely many balls of radius ε. Let x1 ∈ X. Since X ⊈ Bε(x1) there must be x2 ∈ X with d(x1, x2) ≥ ε. Since X ⊈ Bε(x1) ∪ Bε(x2) there must be x3 ∈ X with d(xi, x3) ≥ ε for i < 3. Continuing this way we construct a sequence (xn) in X such that d(xi, xn) ≥ ε for i < n. This sequence has no Cauchy subsequence, hence no convergent subsequence. Therefore X is not sequentially compact. □

We now have almost all of the pieces of the main theorem on compactness in metric spaces.

Theorem 14.29. Let X be a metric space. The following are equivalent:

(1) X is compact.
(2) X is sequentially compact.
(3) X is complete and totally bounded.

Proof. (1)⇒(2): This is Theorem 14.19.
(2)⇒(3): This follows from Propositions 14.22 and 14.28.
(3)⇒(1): We prove this by contradiction. Let X be complete and totally bounded, and suppose that X is not compact. Then there is an open cover U having no finite subcover. We first use total boundedness. There is a finite collection C1 of closed balls of radius 1 covering X. There must be a ball B1 ∈ C1 such that B1 is not finitely covered by U — otherwise X would be finitely covered by U. Now since B1 is totally bounded there is a finite collection C2 of closed balls of radius 1/2 covering B1. There must exist B2 ∈ C2 such that B1 ∩ B2 is not finitely covered by U. Continuing this process we construct a sequence B1, B2, . . . of closed balls such that Bi has radius 1/i and such that for each i, B1 ∩ · · · ∩ Bi is not finitely covered by U.

Now we use completeness of X: Exercise 13.11 implies that there is a point a ∈ ⋂_{i=1}^{∞} Bi. Choose U0 ∈ U with a ∈ U0. Since U0 is open there is r > 0 with Br(a) ⊆ U0. Let n > 2/r. We claim that Bn ⊆ Br(a). To see this, let y ∈ Bn. Then d(y, a) ≤ diam(Bn) ≤ 2/n < r. This proves the claim, and hence we have Bn ⊆ U0. Therefore B1 ∩ · · · ∩ Bn ⊆ U0, contradicting the fact that B1 ∩ · · · ∩ Bn is not finitely covered by U. □

Corollary 14.30. (Heine-Borel theorem) Let E ⊆ Rn. Then E is compact if and only if E is closed and bounded.

Proof. Since Rn is complete (Theorem 13.7), E is complete if and only if it is closed. By Lemma 14.27 (and the remark preceding that Lemma), E is totally bounded if and only if it is bounded. The result now follows from Theorem 14.29. □

15. Continuity and compactness

Theorem 15.1. Let X and Y be metric spaces, and let f : X → Y be a continuous function. If X is compact then so is f(X).

Proof. Let V be an open cover of f(X). Then f−1(V) = {f−1(V) : V ∈ V} is an open cover of X. Since X is compact, f−1(V) has a finite subcover. Thus there are V1, . . ., Vk ∈ V such that X = ⋃_{i=1}^{k} f−1(Vi). But then f(X) ⊆ ⋃_{i=1}^{k} Vi. Thus V admits the finite subcover {V1, . . . , Vk}. □


Exercise 15.2. One can also prove this theorem using sequences and sequential compactness.

Corollary 15.3. If X is compact and f : X → Y is continuous, then f(X) is a closed bounded subset of Y (in fact, totally bounded).

Corollary 15.4. (Extreme value theorem) Let X be a compact metric space, and let f :X → R be continuous. Then f achieves its maximum and minimum at points of X: thereexist x0, x1 ∈ X such that for all x ∈ X, f(x0) ≤ f(x) ≤ f(x1).

Proof. A (non-empty) closed bounded subset of R contains its infimum and supremum. □

Corollary 15.5. A continuous (R-valued) function on a closed bounded interval has a maximum and a minimum.

Definition 15.6. Let X and Y be metric spaces, and let f : X → Y . f is an open map if f(A) is an open subset of Y whenever A is an open subset of X. f is a closed map if f(A) is a closed subset of Y whenever A is a closed subset of X.

Remark 15.7. Note that the above definitions refer to the forward set map defined by f, which is less well behaved than the reverse set map. For the reverse map, the analogous properties are equivalent to continuity (Theorem 8.5 and Exercise 8.6).

Theorem 15.8. Let X be compact, and let f : X → Y be continuous. Then f is a closed map.

Proof. The proof is an exercise. □

Example 15.9. (1) Let T be the unit circle, and let f : [0, 1] → T be given by f(t) = (cos 2πt, sin 2πt). Then f is continuous but not an open map: [0, 1/2) is an open subset of [0, 1], but f([0, 1/2)) is not an open subset of T, since it contains its non-interior point (1, 0).
(2) Define g : [0,∞) → T by g(t) = (cos(2πt/(t + 1)), sin(2πt/(t + 1))). Then g is bijective and continuous, but is neither a closed map nor an open map: [1,∞) is a closed subset of [0,∞), but g([1,∞)) is not a closed subset of T since it does not contain its limit point (1, 0). As in the previous example, [0, 1) is an open subset of [0,∞), but g([0, 1)) is not an open subset of T.

Theorem 15.10. Let X and Y be metric spaces with X compact, and let f : X → Y be continuous and bijective. Then f is an open map.

Proof. Let U ⊆ X be open. Then U^c is closed, hence compact. Therefore f(U^c) is compact, hence closed. But f(U^c) = f(U)^c since f is bijective. Therefore f(U) is open. □

Corollary 15.11. In the above theorem, f−1 is continuous.

16. Connectedness

Let’s recall for a moment Example 8.8(5): T and [0, 1] are not homeomorphic metric spaces. How might we go about proving this? A clever observation is the following: if we remove a point from T, the result is still “one piece” (in fact, it is easy to see that for any z ∈ T, T \ {z} is homeomorphic to R). On the other hand, if we remove a point from [0, 1] (other than one of the two endpoints), the result “consists of two pieces”. It is an even


38 JACK SPIELBERG

cleverer observation that it is not very easy to say more precisely what we mean by “consists of two pieces”. For example, any set containing more than one point can be divided into two nonempty disjoint pieces. But surely, the division [0, 1] \ {1/2} = [0, 1/2) ⊔ (1/2, 1] is a special way of dividing a set into two pieces. What is special about it?

We need a topological property, and the following is the right one: no sequence in one of the pieces can converge to a point of the other. Well, this is clearly true of the division of [0, 1] \ {1/2} described above. But it pushes the problem back over to the other side: can we prove that it is not possible to divide R into two nonempty disjoint pieces such that no sequence in one piece can converge to a point of the other piece?

At some point, we just have to bite the bullet and try to prove a hard result. In this section we will do this, and prove the fact about R stated in the previous paragraph. This is a deep consequence of the completeness axiom. The relevant property of R is called connectedness. As the above discussion has indicated, connectedness is a sort of “negative” property. We will begin with the corresponding “positive” property. First, notice that to say that no sequence in A converges to a point of B is the same thing as saying that Ā ∩ B = ∅. We use this for our definition (notice that disjointness of A and B is implied).

Definition 16.1. Let X be a metric space. We call X separated if there exist nonempty subsets A and B such that A ∪ B = X and Ā ∩ B = ∅ = A ∩ B̄. X is called connected if it is not separated.

Remark 16.2. If E ⊆ X is a subset, we call E separated (or connected) if as a metric space in its own right E has that property. We note that in the above definition of separation, if A and B are subsets of E with union equal to E, the closures may be taken relative to E or in X; the intersections Ā ∩ B and A ∩ B̄ will be the same either way. Thus being separated or connected is an intrinsic property of E; it does not depend on whether E is given as a subspace of another metric space.

There is another way to describe connectedness. Suppose that the metric space X is separated, and let A and B be subsets as in the definition. Since A ∪ B = X and A ∩ B̄ = ∅, we have A = (B̄)^c. Thus A is an open set in X. Since A ∩ B = ∅ also, we know that B = A^c, hence B is closed. By the symmetry of the situation we know that A is also closed, and B is open.

Definition 16.3. Let X be a metric space. A subset of X is clopen if it is both closed and open.

Lemma 16.4. The metric space X is separated if and only if it contains a proper nonempty clopen subset. X is connected if and only if its only clopen subsets are X and ∅.

Proof. The proof is elementary, and we leave it as an exercise. □

Remark 16.5. Let X be a metric space, and let E ⊆ X. What does it mean for A ⊆ E to be relatively clopen in E? We know that A is relatively open in E if and only if A = E ∩ U for some open set U ⊆ X. Similarly, one can check that A is relatively closed in E if and only if A = E ∩ K for some closed set K ⊆ X. Thus A is relatively clopen in E if and only if there are two sets U and K in X, with U open and K closed, such that A = E ∩ U = E ∩ K. (Note that it is NOT NECESSARILY true that A equals the intersection of E with a clopen subset of X.)

Exercise 16.6. The Cantor set (Definition 6.1) is not connected.



We now identify the connected subsets of R.

Definition 16.7. An interval is a subset I ⊆ R such that for all a < c < b in R, if a, b ∈ I then c ∈ I. (I.e. an interval is a subset of R that is closed under ‘betweenness’.)

Example 16.8. The following are intervals (for any a ≤ b in R):

(a, b), [a, b], [a, b), (a, b], ∅, (a,∞), [a,∞), (−∞, b), (−∞, b], R.

Lemma 16.9. Every interval is of one of the forms in Example 16.8.

Proof. Let I be a nonempty interval. Choose c ∈ I. Let B = {x ∈ R : [c, x] ⊆ I}. B is nonempty since c ∈ B. Let

b = sup B if B is bounded above, and b = ∞ otherwise.

If b ∈ I and b ≥ c, then [c, b] ⊆ I and (b,∞) ⊆ I^c. If b ∉ I, then [c, b) ⊆ I and [b,∞) ⊆ I^c. Similarly, define a by working on the left of c. There are four cases altogether, and I is presented as one of the forms in Example 16.8 in each case. □

Theorem 16.10. Let I ⊆ R. Then I is connected if and only if I is an interval.

Proof. (=⇒): Suppose that I is not an interval. Then there are a < c < b in R with a, b ∈ I and c ∉ I. Put A = (−∞, c) ∩ I. Then A ≠ ∅, A ≠ I, and A = I ∩ (−∞, c) = I ∩ (−∞, c] is clopen in I.
(⇐=): Suppose that I is an interval, but that I is not connected. Let E ⊆ I be a proper nonempty clopen subset of I. Then there are an open set U ⊆ R and a closed set K ⊆ R such that E = I ∩ U = I ∩ K. Let a ∈ E and b ∈ I \ E. We may as well assume that a < b. Then since I is an interval, we know that [a, b] ⊆ I. We have

E ∩ [a, b] = I ∩ K ∩ [a, b] = K ∩ [a, b];
(I \ E) ∩ [a, b] = (I \ (I ∩ U)) ∩ [a, b] = (I \ U) ∩ [a, b] = [a, b] \ U.

Thus E ∩ [a, b] and (I \ E) ∩ [a, b] are closed subsets of R. Let c = sup(E ∩ [a, b]). Then c ∈ E ∩ [a, b] since this set is closed. Also, c < b since b ∉ E. Hence (c, b] ⊆ (I \ E) ∩ [a, b], and so c ∈ (I \ E) ∩ [a, b] since this set is closed. This leads to the contradiction c ∈ E ∩ (I \ E). □

The following theorem is very useful, and we place it here because it deals with intervals (although it is not a result about connectedness).

Theorem 16.11. Let U ⊆ R be open. Then U equals the union of countably many open intervals. Moreover, U can be written as the union of a countable collection of pairwise disjoint open intervals, and this collection is unique.

Proof. For x ∈ U choose a(x), b(x) ∈ Q with x ∈ (a(x), b(x)) ⊆ U. Let E = {(a(x), b(x)) : x ∈ U}. Then E is a collection of open intervals. Since E ⊆ {(α, β) : α, β ∈ Q, α < β}, a set that injects into Q², we see that E is a countable collection. It is clear that U = ⋃ E.

The proof of the second statement of the Theorem is left as an exercise. □



17. Continuity and connectedness

Theorem 17.1. Let X and Y be metric spaces, and let f : X → Y be continuous. Suppose that X is connected. Then f(X) is connected.

Proof. Since f is continuous, f−1 preserves openness and closedness, hence clopenness. If E is a clopen subset of f(X), then f−1(E) is clopen in X; since X is connected, f−1(E) equals X or ∅. Therefore any nonempty clopen subset of f(X) must equal f(X). □

Corollary 17.2. (Intermediate value theorem) Let X be a connected metric space, and f : X → R a continuous function. Let a, b ∈ X, and let t lie between f(a) and f(b). Then there exists x ∈ X such that f(x) = t.

Proof. By Theorem 17.1, f(X) is a connected subset of R, hence an interval. □
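The intermediate value theorem is the principle behind the bisection method for locating such a point numerically. The following sketch (Python; not part of the notes, and the names are ours) assumes f is continuous on [a, b] with f(a) and f(b) of opposite signs, and repeatedly halves the interval containing a sign change:

```python
def bisect(f, a, b, tol=1e-10):
    # assumes f continuous on [a, b] with f(a), f(b) of opposite signs
    fa = f(a)
    while b - a > tol:
        m = (a + b) / 2
        if fa * f(m) <= 0:   # sign change lies in [a, m]
            b = m
        else:                # sign change lies in [m, b]
            a, fa = m, f(m)
    return (a + b) / 2

# the cube root of 2 is where x^3 - 2 crosses zero on [1, 2]
root = bisect(lambda x: x**3 - 2, 1.0, 2.0)
```

Corollary 17.2 guarantees that a crossing point exists; bisection merely traps it in intervals whose length shrinks to 0.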

Example 17.3. The following is a typical “practical” illustration of the corollary. Suppose that the temperature in Phoenix is 110 degrees, and at the same instant the temperature in La Paz is 2 degrees. Then there must be a place on the earth’s surface where the temperature (at the same instant) is exactly π degrees.

Example 17.4. Let n ∈ N, and define f : [0,∞) → [0,∞) by f(t) = t^n. Since f is continuous and [0,∞) is connected, it follows from Theorem 17.1 that f([0,∞)) is connected. Let x > 0. There is k ∈ N with x < k. Then 0 < x < k^n. Since 0, k^n ∈ f([0,∞)), then x ∈ f([0,∞)). Therefore there exists y > 0 such that x = f(y). This is a new proof of the existence of nth roots (compare with the proof of Theorem 1.22).

Now let b > 0, and consider the restriction of f: f_b := f|_[0,b] : [0, b] → [0, b^n]. Since [0, b] is compact and f_b is continuous and bijective, it follows from Corollary 15.11 that (f_b)−1 is continuous. This is true for all b > 0, and hence we have proved that f−1 is continuous. (f−1(x) is the nth root of x.)

Definition 17.5. The metric space X is path connected if for any two points a1, a2 ∈ X, there is a continuous function f : [t1, t2] → X such that f(ti) = ai, for i = 1, 2.

Proposition 17.6. If X is path connected, then X is connected.

Proof. Let A ⊆ X be a nonempty clopen subset, and let a ∈ A. For any x ∈ X, there is a continuous function f : [0, 1] → X such that f(0) = a and f(1) = x. Since A is clopen, f−1(A) is a clopen subset of [0, 1], and it is nonempty since it contains 0. Since [0, 1] is connected, f−1(A) = [0, 1], and hence x = f(1) ∈ A. Therefore A = X. □

Definition 17.7. Let V be a real vector space. For x, y ∈ V let S_{x,y} = {(1 − t)x + ty : t ∈ [0, 1]}. (S_{x,y} is the line segment connecting x and y.) A subset E ⊆ V is called convex if S_{x,y} ⊆ E whenever x, y ∈ E.

Theorem 17.8. Let V be a real normed vector space. Every convex subset of V is connected.

Proof. Let E ⊆ V be convex. For any x, y ∈ E, the function f : [0, 1] → V defined by f(t) = (1 − t)x + ty is continuous, and by convexity it maps [0, 1] into E. Thus E is path connected, hence connected by Proposition 17.6. □

Corollary 17.9. Any convex subset of Rn is connected. (For example, any ball in Rn is connected.)



Definition 17.10. Let D ⊆ Rn, and let f : D → Rm be a function. The graph of f is the set G(f) ⊆ R^{n+m} given by G(f) = {(x, f(x)) : x ∈ D}.

Proposition 17.11. If D ⊆ Rn is connected, and f : D → Rm is continuous, then G(f) is connected.

Proof. Define g : D → R^{n+m} by g(x) = (x, f(x)). Then g is continuous, since all of its coordinate functions are continuous (being either a coordinate of x, or a coordinate function of the continuous function f). By Theorem 17.1, g(D) is connected. But g(D) = G(f). □

Example 17.12. (1) The unit circle T is connected by Theorem 17.1, being the image of [0, 1] under the continuous function (cos 2πt, sin 2πt).
(2) The graph of sin(1/x) for x > 0 is connected, by Proposition 17.11. Let E denote this graph: E = {(x, sin(1/x)) : x > 0}. Let F = {0} × [−1, 1]. F is also connected, being convex. It follows from Exercise 17.13 below that the union E ∪ F is a connected set. It is a nice exercise to prove that it is not path connected. You should draw a picture (and do the exercise) in order to appreciate this bizarre example.
(3) In the last example, delete the portion of E for x > 1/π, then include a curve below the wiggly graph, connecting (1/π, 0) to (0, −1). The new set is called the Warsaw circle. It is path connected, but there does not exist a path going “once around”.

Exercise 17.13. Let A be a connected subset of a metric space, and let A ⊆ B ⊆ Ā. Then B is connected.

Exercise 17.14. Let X be a metric space, let A ⊆ X be a connected subset, and let E ⊆ X be a clopen subset. Then either A ∩ E = ∅, or A ⊆ E. (Thus if a clopen set touches a connected set, it must contain all of it.)

Exercise 17.15. Let {Ai : i ∈ I} be subsets of a metric space. If all of the Ai are connected, and if ⋂_{i∈I} Ai ≠ ∅, then ⋃_{i∈I} Ai is connected.

Exercise 17.16. Let E be the following subset of R²:

E = ((0, 1] × {0}) ∪ (⋃_{n=1}^∞ ({1/n} × [0, 1])) ∪ {(0, 1)}.

Then E is connected, but not path connected.

Theorem 17.17. Let X be a metric space. For x ∈ X let

C(x) = ⋃ {A ⊆ X : x ∈ A and A is connected}.

(1) C(x) is connected.
(2) {C(x) : x ∈ X} is a partition of X.
(3) C(x) is a closed set.
(4) C(x) is a maximal connected subset of X.

Proof. (1) The sets A in the union defining C(x) all contain x. Thus C(x) is connected by Exercise 17.15.
(2) Suppose that C(x) ∩ C(y) ≠ ∅. By Exercise 17.15, C(x) ∪ C(y) is connected. Since it contains x it is one of the sets A in the union defining C(x). Thus C(x) ∪ C(y) ⊆ C(x), and we have that C(y) ⊆ C(x). By symmetry, C(x) ⊆ C(y), so that C(x) = C(y).



(3) By (1) and Exercise 17.13, the closure of C(x) is connected. Since x belongs to this closure, the closure is one of the sets in the union defining C(x); thus it is contained in C(x), and C(x) is closed.
(4) Any connected set containing C(x) is one of the sets in the union defining C(x), and hence must equal C(x). □

Definition 17.18. Let X be a metric space. A component of X is a maximal connected subset. Thus the components of X are the sets C(x) from Theorem 17.17.

Theorem 17.19. Let U ⊆ Rn be open. Then U has countably many components, and these are open sets.

Proof. Let x ∈ U, and y ∈ C(x). Since U is open there is r > 0 such that Br(y) ⊆ U. Then C(x) ∪ Br(y) is connected by Exercise 17.15 (and Corollary 17.9). Then C(x) ∪ Br(y) ⊆ C(x) by the definition of C(x), hence Br(y) ⊆ C(x). Thus C(x) is open.

Since the components of U are open, we may choose an element of Qn in each one. This defines a map from the set of components to Qn. Since the distinct components are disjoint, this map is one-to-one. Since Qn is countable, so is the set of components. □
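For an open subset of R presented concretely as a finite union of open intervals, the components in Theorem 17.19 can be computed by sorting the intervals and merging the ones that overlap. A small sketch (Python; an illustration only, not part of the notes). Since the intervals are open, two intervals that merely touch at a missing endpoint are not merged:

```python
def components(intervals):
    # intervals: list of pairs (a, b) representing open intervals (a, b)
    out = []
    for a, b in sorted(intervals):
        if out and a < out[-1][1]:     # overlaps the current component
            out[-1][1] = max(out[-1][1], b)
        else:                          # starts a new component
            out.append([a, b])
    return [tuple(c) for c in out]
```

For example, components([(0, 1), (0.5, 2), (3, 4)]) gives [(0, 2), (3, 4)], while (0, 1) and (1, 2) remain separate components because the point 1 is missing from their union.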

18. Uniform continuity

Continuity is a locally defined property. Suppose that f : X → Y is continuous. If ε > 0 is given, and if a point x0 ∈ X is given, then continuity of f at x0 provides a positive number δ with a certain property (Definition 8.1). The local-ness is expressed in the order of the quantifiers in that definition (and as we have rephrased it above): the number δ need only do its job for the one point x0 already chosen. In fact, this means that δ (perhaps slightly modified) works throughout some ball centered at x0. A(n open) ball centered at x0 is a neighborhood of x0. A property is local if each point has a neighborhood in which the property holds. A globally defined property, on the other hand, is one that holds everywhere. Continuity would be globally defined if the same δ worked for all points of X. Not all continuous functions have such a strong form of continuity; those that do have a special name.

Definition 18.1. Let X and Y be metric spaces, and let f : X → Y be a function. f is uniformly continuous if for every ε > 0 there exists δ > 0 such that for all x1, x2 ∈ X, if dX(x1, x2) < δ then dY(f(x1), f(x2)) < ε.

Note that the only difference between this definition and the definition of continuity on X is in the order in which the point and the δ are specified. Some examples will help to clarify this.

Example 18.2. (1) Let f : [−10, 10] → R be given by f(t) = t². Then f is uniformly continuous.

Proof. Let ε > 0 be given. Let δ = ε/20. If t1, t2 ∈ [−10, 10] are such that |t1 − t2| < δ, then |f(t1) − f(t2)| = |t1² − t2²| = |t1 + t2| · |t1 − t2| < (|t1| + |t2|)δ ≤ 20δ = ε. □

(2) Let g : R → R be given by g(t) = t². Then g is not uniformly continuous.

Proof. We choose ε = 1. Let δ > 0 be given. Choose t > 1/δ, and let s = t + δ/2. Then |s − t| = δ/2 < δ, while |s² − t²| = |s − t| · |s + t| = (δ/2)(2t + δ/2) > δt > 1 = ε. Therefore g is not uniformly continuous. □
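The witnesses s and t in the proof are easy to exhibit numerically (a Python sketch, not part of the notes): for any δ, points within δ of each other can have squares more than 1 apart.

```python
delta = 0.01
t = 1 / delta + 1        # any t > 1/delta works
s = t + delta / 2        # so |s - t| = delta/2 < delta
gap = abs(s**2 - t**2)   # equals (delta/2) * (2t + delta/2) > delta * t > 1
```

Shrinking delta only forces t to be larger; the gap in the squared values never drops below 1.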



(3) Let h : (0, 1) → R be given by h(t) = sin(1/t). Then h is not uniformly continuous.

Proof. We choose ε = 2. Let δ > 0 be given. Choose n > 1/√δ. Let s = 2/[(2n + 1)π] and let t = 2/[(2n + 3)π]. Then

|s − t| = (2/π)(1/(2n + 1) − 1/(2n + 3)) = (2/π) · 2/[(2n + 1)(2n + 3)] ≤ 1/n² < δ.

But |h(s) − h(t)| = |(±1) − (∓1)| = 2 ≥ ε. Therefore h is not uniformly continuous. □
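Again the witnesses can be checked numerically (a Python sketch, not part of the notes): for small δ the two points below are within δ of each other, yet h sends them to opposite extremes of sin.

```python
import math

delta = 1e-4
n = int(1 / math.sqrt(delta)) + 1     # ensures n > 1/sqrt(delta)
s = 2 / ((2 * n + 1) * math.pi)
t = 2 / ((2 * n + 3) * math.pi)
h = lambda x: math.sin(1 / x)         # h(s), h(t) are +1 and -1, in some order
```

No single δ can control the oscillation of sin(1/t) near 0, which is exactly what uniform continuity would require.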

The following theorem is a classic use of compactness to get a global result from local information.

Theorem 18.3. Suppose f : X → Y is continuous, and X is compact. Then f is uniformly continuous.

Proof. Let ε > 0 be given. Since f is continuous, for each x ∈ X there is r_x > 0 such that f(B_{r_x}(x)) ⊆ B_{ε/2}(f(x)). The collection {B_{r_x/2}(x) : x ∈ X} is an open cover of X. Since X is compact, there are x1, . . ., xn ∈ X such that X = ⋃_{i=1}^n B_{r_{x_i}/2}(x_i). Let δ = min{r_{x_i}/2 : 1 ≤ i ≤ n}. Let y, z ∈ X with d(y, z) < δ. There is i such that d(y, x_i) < r_{x_i}/2. Then d(z, x_i) ≤ d(z, y) + d(y, x_i) < δ + r_{x_i}/2 ≤ r_{x_i}. Then f(y), f(z) ∈ B_{ε/2}(f(x_i)), so that d(f(y), f(z)) < ε. □

19. Convergence of functions

Definition 19.1. Let X be a set. (Note that we really do mean set. Later we will let X be a metric space, but for now, that is not relevant.) Let fn : X → Rk for n = 1, 2, 3, . . .. (We remark that Rk may be replaced by another metric space. For ease of exposition, we restrict our attention to the case where the codomain is Euclidean space.) For a ∈ X we say that (fn) converges at a if (fn(a))_{n=1}^∞ is a convergent sequence in Rk. If (fn) converges at each point of X, define f : X → Rk by f(x) = lim_{n→∞} fn(x). We say that (fn) converges to f (pointwise).

We may specify this more precisely as: for every ε > 0, for every x ∈ X, there exists n0 ∈ N such that for all n ≥ n0, ‖fn(x) − f(x)‖ < ε. (Note that n0 ≡ n0(ε, x) depends on both ε and x.)

Example 19.2. (1) Let fn : [0, 1] → R be given by fn(x) = x/n. Then fn → 0.
(2) Let gn : [0, 1] → R be given by gn(x) = x^n. Then gn → g, where g(x) = 0 if x < 1, and g(1) = 1.

Definition 19.3. Let f, fn : X → Rk. We say that (fn) converges to f uniformly (on X) if for each ε > 0, there exists n0 ∈ N such that for every x ∈ X, and for every n ≥ n0, ‖fn(x) − f(x)‖ < ε. (Note that n0 ≡ n0(ε) depends only on ε.)

Formally, the difference between pointwise convergence and uniform convergence is only in the order of the two quantified variables n0 and x. The difference in practice, however, is profound, and it is important that you get a good feel for it.

Example 19.4. (1) x/n → 0 uniformly on [0, 1].
(2) x^n does not converge uniformly on [0, 1] to its pointwise limit g.



Proof. Let ε = 1/2. Let n0 be given. We choose n = n0. Since lim_{t→1} t^n = 1, there is x ∈ [0, 1) such that x^n > 1/2. Then |gn(x) − g(x)| = gn(x) − 0 > 1/2 = ε. □

(3) x/n does not converge to 0 uniformly on R.
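The difference between the first two items can be seen by estimating sup-distances numerically (a Python sketch, not part of the notes): for gn(x) = x^n the sup of |gn − g| over [0, 1) stays near 1 for every n, while for fn(x) = x/n the sup over [0, 1] is 1/n, which tends to 0.

```python
M = 10**5   # grid size

def sup_gap_power(n):
    # approximate sup over [0, 1) of |x^n - g(x)| = x^n (g = 0 there)
    return max((k / M) ** n for k in range(M))

def sup_gap_linear(n):
    # sup over [0, 1] of |x/n - 0|, attained at x = 1
    return 1 / n
```

However large n is taken, sup_gap_power(n) is close to 1 (the gap concentrates near x = 1), so no single n0 works for all x at once.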

It is useful to have an intrinsic characterization of uniform convergence, i.e. a Cauchy condition.

Definition 19.5. Let X be a set, and let fn : X → Rk be functions for n ∈ N. (fn) is uniformly Cauchy (on X) if for each ε > 0, there exists n0 ∈ N such that for all x ∈ X, and for all m, n ≥ n0, ‖fm(x) − fn(x)‖ < ε.

Proposition 19.6. If (fn) is uniformly Cauchy, then (fn) is uniformly convergent.

Proof. Let ε > 0. Choose n0 such that for all m, n ≥ n0, and for all x ∈ X, ‖fm(x) − fn(x)‖ < ε/2. This shows that for each x ∈ X, the sequence (fn(x))_{n=1}^∞ is Cauchy in Rk. Since Rk is complete, (fn(x))_{n=1}^∞ converges. Define f : X → Rk by f(x) = lim_{n→∞} fn(x). If n ≥ n0, then for all x ∈ X we have

‖fn(x) − f(x)‖ = lim_{m→∞} ‖fn(x) − fm(x)‖ ≤ ε/2 < ε,

where the first equality holds since y ∈ Rk ↦ ‖z − y‖ ∈ R is continuous. Therefore fn → f uniformly on X. □

Now we derive consequences when X is a metric space.

Theorem 19.7. Let X be a metric space, let f, fn : X → Rk, and suppose that fn → f uniformly on X. Let a ∈ X, and suppose that fn is continuous at a for all n ∈ N. Then f is continuous at a.

Proof. Let ε > 0. Choose n such that for all x ∈ X, ‖fn(x) − f(x)‖ < ε/3. Since fn is continuous at a, there is δ > 0 such that ‖fn(x) − fn(a)‖ < ε/3 whenever d(x, a) < δ. Now let x ∈ X with d(x, a) < δ. We have

‖f(x) − f(a)‖ ≤ ‖f(x) − fn(x)‖ + ‖fn(x) − fn(a)‖ + ‖fn(a) − f(a)‖ < ε/3 + ε/3 + ε/3 = ε

(where the first and third occurrences of ε/3 are due to the uniform approximation of f by fn, and the second is due to the continuity of fn at a). Therefore f is continuous at a. □

Corollary 19.8. The uniform limit of continuous functions is continuous.

Example 19.9. (1) Consider the sequence of functions x^n on [0, 1]. We have seen that this sequence has a pointwise limit, which is not continuous. Since x^n is continuous for each n, the theorem implies that the convergence is not uniform (this is an easier proof than the direct proof we gave earlier).



(2) The above argument cannot be used in reverse. For example, let fn : [0, 1] → R be given by

fn(x) = 2nx if 0 ≤ x ≤ 1/(2n), fn(x) = −2n(x − 1/n) if 1/(2n) ≤ x ≤ 1/n, and fn(x) = 0 if 1/n ≤ x ≤ 1.

(It will be helpful to draw a picture.) Then fn → 0 pointwise on [0, 1], but not uniformly, even though the limit is continuous.
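These tent functions can be sampled directly (a Python sketch, not part of the notes; f_n below is one reading of the piecewise formula above, with peak value fn(1/(2n)) = 1):

```python
def f_n(n, x):
    # tent of height 1 supported on [0, 1/n]
    if x <= 1 / (2 * n):
        return 2 * n * x
    if x <= 1 / n:
        return -2 * n * (x - 1 / n)
    return 0.0
```

For each fixed x the values f_n(n, x) are eventually 0 (once 1/n < x), yet the maximum of |fn| equals 1 for every n, so the convergence cannot be uniform.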

Example 19.10. Recall function space from Example 4.6: if X is a set, B(X, Rk) is the vector space of all bounded functions from X to Rk. B(X, Rk) is a normed vector space, with norm given by ‖f‖ = sup_{x∈X} ‖f(x)‖. Thus B(X, Rk) is a metric space.

Proposition 19.11. Let f, fn : X → Rk be bounded functions.
(1) fn → f in B(X, Rk) if and only if fn → f uniformly on X.
(2) (fn) is Cauchy in B(X, Rk) if and only if (fn) is uniformly Cauchy on X.

Proof. This follows immediately from the definitions. □

Corollary 19.12. B(X,Rk) is a complete metric space.

Proof. This follows from Proposition 19.6 and the above proposition. □

Definition 19.13. Let X be a metric space. Cb(X, Rk) is the space of all bounded continuous functions from X to Rk.

Note that Cb(X, Rk) is a vector subspace of B(X, Rk), since the sum and (scalar) product of continuous functions is continuous.

Proposition 19.14. Cb(X, Rk) is a complete metric space.

Proof. This follows from Corollary 19.8. □

Remark 19.15. If X is a compact metric space, then C(X,Rk) = Cb(X,Rk).

20. Differentiation

Definition 20.1. Let I ⊆ R be open, let f : I → R, and let a ∈ I. f is differentiable at a if

lim_{x→a} (f(x) − f(a))/(x − a)

exists (equivalently, if lim_{h→0} (f(a + h) − f(a))/h exists). The limit is called the derivative of f at a, and is denoted f′(a) (or df/dx(a), or df/dx|_{x=a}). We say that f is differentiable on I if it is differentiable at each point of I. We refer to the quantity (f(x) − f(a))/(x − a) as the difference quotient.

Suppose that f is differentiable at a. Let L(x) = f(a) + f′(a)(x − a) (L is a “linear function”, in that its graph is a straight line). The function f is well-approximated by L in the following sense:

(3) f(a) = L(a);
(4) lim_{x→a} (f(x) − L(x))/(x − a) = 0.
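Properties (3) and (4) can be observed numerically (a Python sketch, not part of the notes), here for f(x) = x³ at a = 1, where f′(a) = 3:

```python
f = lambda x: x**3
a = 1.0
L = lambda x: f(a) + 3 * a**2 * (x - a)   # tangent line at a

hs = (1e-1, 1e-2, 1e-3)
ratios = [(f(a + h) - L(a + h)) / h for h in hs]
# each ratio equals 3h + h^2 exactly, so the ratios shrink to 0 with h
```

The error f(x) − L(x) dies faster than x − a does; that is precisely what (4) asserts.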



Remark 20.2. There exists at most one linear function L having these properties. Uniqueness is an exercise, while existence is equivalent to differentiability.

There is a third equivalent formulation of differentiability. We motivate it as follows. Let f be differentiable at a. Define u : I → R by

u(x) = (f(x) − f(a) − f′(a)(x − a))/(x − a) if x ≠ a, and u(a) = 0.

Then lim_{x→a} u(x) = lim_{x→a} (f(x) − L(x))/(x − a) = 0, so that u is continuous at a. Moreover, f(x) = f(a) + f′(a)(x − a) + u(x)(x − a). Thus we see that if f is differentiable at a, then f differs from L by a function that tends to zero as x tends to a, even when divided by x − a.

Theorem 20.3. f is differentiable at a if and only if there exist a linear function L(x) = m(x − a) + b, and a function u(x), such that
(1) u(a) = 0;
(2) u is continuous at a;
(3) f(x) = L(x) + u(x)(x − a).
In this case, f′(a) = m (and of course, b = f(a)).

Proof. The ‘only if’ direction was proved in the remarks before the statement of the theorem. For the ‘if’ direction, let L and u be as in the statement of the theorem. Letting x = a in the third item of the statement gives f(a) = b. Then dividing by x − a, and letting x → a, we get

lim_{x→a} (f(x) − f(a))/(x − a) = lim_{x→a} (m(x − a) + u(x)(x − a))/(x − a) = lim_{x→a} (m + u(x)) = m,

since u is continuous at a with value 0. □

We now present some basic properties of differentiation.

Lemma 20.4. If f is differentiable at a, then f is continuous at a.

Proof.

lim_{x→a} f(x) = lim_{x→a} [(f(x) − f(a)) + f(a)] = lim_{x→a} [(f(x) − f(a))/(x − a)] · (x − a) + f(a) = f′(a) · 0 + f(a) = f(a). □

Lemma 20.5. d/dx (kx + ℓ) = k.

Proof. (exercise) □

Lemma 20.6. If f and g are both differentiable at a, then so are f + g, fg, and f/g (if g(a) ≠ 0), and

(f + g)′(a) = f′(a) + g′(a),
(fg)′(a) = f′(a)g(a) + f(a)g′(a),
(f/g)′(a) = (f′(a)g(a) − f(a)g′(a))/g(a)².

Proof. (exercises) □



Theorem 20.7. (The chain rule.) Let I, J ⊆ R be open, let f : I → R and g : J → R, let a ∈ I, suppose that f(a) ∈ J, and suppose that f is differentiable at a and g is differentiable at f(a). Then g ◦ f is differentiable at a, and (g ◦ f)′(a) = g′(f(a)) f′(a).

Proof. We apply Theorem 20.3 to f and g to obtain functions u : I → R and v : J → R such that
(1) u and v vanish at a and f(a), respectively;
(2) u and v are continuous at a and f(a), respectively;
(3) f(x) = f(a) + f′(a)(x − a) + u(x)(x − a), and
g(y) = g(f(a)) + g′(f(a))(y − f(a)) + v(y)(y − f(a)).

Then we have (where we let f(x) play the role of y):

g(f(x)) = g(f(a)) + g′(f(a))(f(x) − f(a)) + v(f(x))(f(x) − f(a))
= g(f(a)) + g′(f(a))(f′(a)(x − a) + u(x)(x − a)) + v(f(x))(f′(a)(x − a) + u(x)(x − a))
= g(f(a)) + g′(f(a))f′(a)(x − a) + [g′(f(a))u(x) + v(f(x))f′(a) + v(f(x))u(x)](x − a).

Then by Theorem 20.3 it suffices to show that the expression in square brackets vanishes and is continuous at x = a. We check this for each of the three terms separately. It is true for the first term because it is true for u. It is true for the second term because f is continuous at a (by Lemma 20.4), v is continuous, and vanishes, at f(a), and Theorem 8.11 applies. It is true for the third term by both of the above. □

We now draw out some consequences of differentiability on intervals. First we give a general definition.

Definition 20.8. Let X be a metric space, let U ⊆ X be open, let a ∈ U and let f : U → R. f has a local maximum (respectively, local minimum) at a if there is r > 0 such that for all x ∈ Br(a) we have f(x) ≤ f(a) (respectively, f(x) ≥ f(a)). Local maxima and minima are called local extrema.

Lemma 20.9. Let I ⊆ R be an open interval, let a ∈ I, and let f : I → R. Suppose that f is differentiable at a. If f has a local extremum at a, then f′(a) = 0.

Proof. We prove the contrapositive. Suppose that f′(a) ≠ 0. For definiteness we assume f′(a) > 0 (the proof in the case f′(a) < 0 is analogous). We then have that lim_{x→a} (f(x) − f(a))/(x − a) > 0. Then there is δ > 0 such that (a − δ, a + δ) ⊆ I, and such that for x ∈ I, if 0 < |x − a| < δ then (f(x) − f(a))/(x − a) > 0. Now, for any x with a − δ < x < a, we have x − a < 0. Since the difference quotient is positive, we must have f(x) − f(a) < 0; thus f does not have a local minimum at a. Similarly, for any x with a < x < a + δ, we have x − a > 0. Again, since the difference quotient is positive, we must have f(x) − f(a) > 0;



thus f does not have a local maximum at a. Therefore, f does not have a local extremum at a. □

This lemma has several famous applications.

Theorem 20.10. (Rolle’s theorem) Let f : [a, b] → R be continuous, and assume that f is differentiable on (a, b). Suppose further that f(a) = f(b) = 0. Then there exists c ∈ (a, b) such that f′(c) = 0.

Rolle’s theorem is a special case of the following theorem.

Theorem 20.11. (Mean value theorem) Let f : [a, b] → R be continuous, and assume that f is differentiable on (a, b). Then there exists c ∈ (a, b) such that f′(c) = (f(b) − f(a))/(b − a).

The idea of the theorem, and the proof, is easy to see from a simple sketch: on the graph of f, draw the straight line between the endpoints (a, f(a)) and (b, f(b)) of the graph. Let L(x) be the linear function whose graph passes through these two points. The point c in the theorem is (one of) the place(s) where the vertical distance between the graphs of f and L is stationary, i.e. has a local extremum. A little algebraic manipulation of the expression f(x) − L(x) yields the beginning of the following proof.

Proof. Let h(x) = (f(x) − f(a))(b − a) − (f(b) − f(a))(x − a). Then h is continuous on [a, b] and differentiable on (a, b). Also h(a) = h(b) = 0. By the extreme value theorem (Corollary 15.4), h takes on its maximum and minimum values on [a, b]. We note that at least one of these occurs in the interior (a, b). For if both occur at the endpoints, then h must be identically zero, and hence achieves its maximum and minimum at every point of [a, b]. Let c ∈ (a, b) be such a point. By Lemma 20.9 we have h′(c) = 0. Differentiating h gives h′(x) = f′(x)(b − a) − (f(b) − f(a)). Then the equation h′(c) = 0 gives the desired result. □
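For a concrete function the point c can be computed explicitly. A sketch (Python, not part of the notes): take f(x) = x³ on [0, 1]; the secant slope is 1, and f′(c) = 3c² = 1 at c = 1/√3, which indeed lies in (0, 1).

```python
f = lambda x: x**3
a, b = 0.0, 1.0
slope = (f(b) - f(a)) / (b - a)   # secant slope, here 1.0
c = (slope / 3) ** 0.5            # solves f'(c) = 3c^2 = slope
```

The theorem only asserts existence of some such c; here the equation f′(c) = slope happens to be solvable in closed form.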

Remark 20.12. There is an alternate phrasing of the mean value theorem that is often convenient. Let f : I → R be differentiable, where I is an open interval. Let a ∈ I and h ∈ R \ {0} be such that a + h ∈ I. If we wish to apply the mean value theorem to the closed interval having a and a + h as endpoints, we would like to express the conclusion without declaring which is the left, and which the right, endpoint. We avoid this inconvenience in the following way: the point c lies (strictly) between a and a + h if and only if there is a number 0 < θ < 1 such that c = a + θh. Thus we reexpress the mean value theorem in the following way: if a, a + h ∈ I then there exists 0 < θ < 1 such that f(a + h) = f(a) + hf′(a + θh).

Now we give some corollaries of the mean value theorem.

Corollary 20.13. Let I ⊆ R be an open interval, and let f : I → R be differentiable. If f′ = 0 on I, then f is constant on I.

Proof. Let x0 ∈ I, and apply the mean value theorem to the interval between x0 and x, for any x ∈ I. We find that there is c strictly between x0 and x such that f(x) − f(x0) = f′(c)(x − x0) = 0. Thus f(x) = f(x0) for all x ∈ I. □

Corollary 20.14. Let I be as in the previous corollary, and let f, g : I → R be differentiable. If f′ = g′ on I, then f − g is a constant function.

Proof. Apply the previous corollary to f − g. □



Definition 20.15. Let I be an interval, and f : I → R. We say that f is increasing(respectively, decreasing) on I if for all x, y ∈ I, if x < y then f(x) ≤ f(y) (respectively,f(x) ≥ f(y)). We say that f is strictly increasing (respectively, strictly decreasing) if theinequalities above involving f are strict rather than weak.

Corollary 20.16. Let I be as in the previous corollaries, and let f : I → R be differentiable.If f ′ ≥ 0 (respectively f ′ ≤ 0) on I, then f is increasing (respectively, decreasing) on I.If f ′ > 0 (respectively f ′ < 0) on I, then f is strictly increasing (respectively, strictlydecreasing) on I.

Proof. We will give the proof in the case that f′ > 0 on I; the other parts have similar proofs. Let x < y in I. By the mean value theorem there is x < c < y such that f(y) − f(x) = f′(c)(y − x). Since f′(c) > 0 and y − x > 0, it follows that f(y) > f(x). □

Definition 20.17. Let X and Y be metric spaces. A function f : X → Y is called Lipschitz (on X) if there is a positive constant M such that for all x_1, x_2 ∈ X we have d(f(x_1), f(x_2)) ≤ M d(x_1, x_2).

Corollary 20.18. Let I and f be as in the previous corollary. Suppose that f′ is bounded on I. Then f is Lipschitz on I.

Proof. Let |f′| ≤ M on I. Then for any x, y ∈ I, the mean value theorem provides c between x and y such that f(x) − f(y) = f′(c)(x − y). It follows that

|f(x) − f(y)| ≤ M|x − y|. □

We next wish to prove the inverse function theorem.

Theorem 20.19. Let I be an open interval, let f : I → R be differentiable, and suppose that f′ ≠ 0 on I. Then f(I) is an open interval, and f : I → f(I) is a homeomorphism. Moreover, f⁻¹ is differentiable, and

(f⁻¹)′(y) = 1 / f′(f⁻¹(y)).

The generalization of this theorem to higher dimensions is a very important result, and somewhat surprisingly, is much harder to prove. (We will tackle that next semester.) In dimension one, the job is easier because the assumption that f′ is nonzero means that f is monotone, once we know that f′ > 0 (or < 0) throughout I. If we assume that f is continuously differentiable, then this is immediate: the intermediate value theorem would apply to the continuous function f′, and we would know that f′ can't take on both positive and negative values if it is never zero. In the higher dimensional situation we will assume that f is continuously differentiable. However, it is remarkable that in dimension one, the result is true even if f′ is not continuous. This is because of the following simple observation: f′ satisfies the intermediate value property even if it is not continuous.

Theorem 20.20. Let I be an interval and let f : I → R be differentiable. Let a, b ∈ I, and assume that f′(a) < f′(b). If f′(a) < M < f′(b), then there exists c between a and b such that f′(c) = M.

Proof. We will prove this in the special case where a < b, f′(a) < 0 < f′(b), and M = 0. The general case follows easily from this, and we leave those details as an exercise. Since f′(a) = lim_{h→0} h⁻¹(f(a + h) − f(a)), there is h > 0 such that a + h < b and h⁻¹(f(a + h) − f(a)) < 0. It follows that f(a + h) < f(a). Therefore the minimum of f on [a, b] does not occur at a. A similar argument shows that the minimum does not occur at b. Hence f has a local minimum in the open interval (a, b), and at this point f′ = 0. □

Proof. (of Theorem 20.19) By Theorem 20.20 we know that f′ > 0 on I, or that f′ < 0 on I (otherwise f′ would take both positive and negative values, and the intermediate value property would give a zero of f′). By Corollary 20.16 it follows that f is strictly monotone on I. It follows from the intermediate value theorem that f(I) is an open interval, and that f⁻¹ is continuous. We now show that f⁻¹ is differentiable, and compute its derivative. For x ∈ I let y = f(x) ∈ f(I). For w ∈ f(I) with w ≠ y, there is t ∈ I such that w = f(t). Since f is one-to-one, t ≠ x. We have

(f⁻¹(w) − f⁻¹(y)) / (w − y) = (t − x) / (f(t) − f(x)) = ((f(t) − f(x)) / (t − x))⁻¹.

Since f⁻¹ is continuous, lim_{w→y} t = x. Moreover t ≠ x during this limiting process. Therefore

lim_{w→y} (f⁻¹(w) − f⁻¹(y)) / (w − y) = lim_{t→x} ((f(t) − f(x)) / (t − x))⁻¹ = 1/f′(x) = 1/f′(f⁻¹(y)). □

Corollary 20.21. If f is C^r (in addition to the hypotheses of the inverse function theorem), then so is f⁻¹.

Proof. The formula for (f⁻¹)′ shows that it is continuous if f′ is continuous. Similarly, it is differentiable if f′ is differentiable, and so on. □
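As an informal numerical check of the formula (f⁻¹)′(y) = 1/f′(f⁻¹(y)), take f = exp, so that f⁻¹ = log and the formula predicts (log)′(y) = 1/y. The Python sketch below (our own illustration, not part of the proof) compares a symmetric difference quotient of log with the right-hand side.

```python
import math

# Hypothetical numerical check of the inverse function theorem's formula
# (f^{-1})'(y) = 1 / f'(f^{-1}(y)), with f = exp, f^{-1} = log, f' = exp.

def num_deriv(g, y, h=1e-6):
    """Symmetric difference quotient approximation to g'(y)."""
    return (g(y + h) - g(y - h)) / (2 * h)

y = 5.0
lhs = num_deriv(math.log, y)            # (f^{-1})'(y), approximated numerically
rhs = 1.0 / math.exp(math.log(y))       # 1 / f'(f^{-1}(y)), which equals 1/y
assert abs(lhs - rhs) < 1e-8
assert abs(rhs - 1.0 / y) < 1e-12
```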

If you consider the function h used in the proof of the mean value theorem, you will notice the beginnings of some symmetry: the function f and the identity function play opposite roles. Remarkably, the identity function can be replaced by another function like f. The result is

Theorem 20.22. (Cauchy mean value theorem.) Let f, g : [a, b] → R be continuous, and differentiable on (a, b). Then there exists c ∈ (a, b) such that

(f(b) − f(a))g′(c) = (g(b) − g(a))f′(c).

Proof. Let h(t) = (f(b) − f(a))(g(t) − g(a)) − (f(t) − f(a))(g(b) − g(a)). Then h is continuous on [a, b], differentiable on (a, b), and h(a) = h(b) = 0. Now the mean value theorem gives the result. □
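For a concrete pair of functions one can locate the point c of the Cauchy mean value theorem numerically. The following Python sketch (an illustration of ours, not part of the proof) takes f(t) = t² and g(t) = t³ on [0, 1], where the conclusion reads 3c² = 2c, i.e. c = 2/3; we find c by bisection on the function whose vanishing expresses the conclusion.

```python
# Hypothetical numerical check of the Cauchy mean value theorem for
# f(t) = t^2, g(t) = t^3 on [0, 1]. The condition
#   (f(b)-f(a)) g'(c) = (g(b)-g(a)) f'(c)
# becomes 3c^2 = 2c, so c = 2/3. We locate c by bisection on
#   phi(t) = (f(b)-f(a)) g'(t) - (g(b)-g(a)) f'(t),
# which changes sign on (a, b).

a, b = 0.0, 1.0
f = lambda t: t ** 2
g = lambda t: t ** 3
fp = lambda t: 2 * t
gp = lambda t: 3 * t ** 2

def phi(t):
    return (f(b) - f(a)) * gp(t) - (g(b) - g(a)) * fp(t)

lo, hi = 0.1, 1.0           # phi(0.1) < 0 < phi(1.0)
assert phi(lo) < 0 < phi(hi)
for _ in range(60):         # 60 bisection steps: interval width ~ 0.9/2**60
    mid = 0.5 * (lo + hi)
    if phi(mid) < 0:
        lo = mid
    else:
        hi = mid
c = 0.5 * (lo + hi)
assert abs(c - 2 / 3) < 1e-9
assert abs((f(b) - f(a)) * gp(c) - (g(b) - g(a)) * fp(c)) < 1e-9
```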

We apply Cauchy's mean value theorem to prove L'Hopital's rule on the computation of indeterminate limits. The proof applies to any form of continuous limit; here we phrase it for one-sided limits.

Theorem 20.23. (L'Hopital's rule.) Let f, g : (a, b) → R be differentiable. Suppose that lim_{t→a+} f(t) = lim_{t→a+} g(t) = 0, and that g(t) ≠ 0 on (a, b). If lim_{t→a+} f′(t)/g′(t) = L, then lim_{t→a+} f(t)/g(t) = L.

Proof. Define f(a) = g(a) = 0. Then f and g are continuous on [a, b). By the hypothesis on the limit of f′/g′, we are implicitly assuming that g′(t) ≠ 0, at least for all t close enough to a. Replacing b by a smaller value, we may assume that g′ ≠ 0 on (a, b). Now, for t ∈ (a, b), we apply Cauchy's mean value theorem to f and g on the interval [a, t]. Thus there exists c ∈ (a, t) with (f(t) − f(a))g′(c) = (g(t) − g(a))f′(c). Since f(a) = g(a) = 0, we get f(t)g′(c) = g(t)f′(c). By hypothesis we have g(t) ≠ 0. Thus we have

f(t)/g(t) = f′(c)/g′(c).

Moreover, a < c < t. Of course, c depends on t, but we see that as t → a+ then also c → a+. Hence

lim_{t→a+} f(t)/g(t) = lim_{c→a+} f′(c)/g′(c) = L. □

With a bit more work, the same result can be proved in the case where we assume lim_{t→a+} |f(t)| = lim_{t→a+} |g(t)| = ∞. This is an interesting exercise, or you may look up the proof (e.g. in Rudin). If lim_{t→a+} f(t) = 0 while lim_{t→a+} g(t) = ±∞, evaluating the limit lim_{t→a+} f(t)g(t) presents us with the third kind of indeterminate form, namely 0 · ∞. In this case, we would instead consider the limit of f/(1/g), which is indeterminate of form 0/0.
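Here is an informal numerical illustration of the rule for the 0/0 form lim_{t→0+} (1 − cos t)/t² (our own example, not from the text): the quotient f/g and the quotient f′/g′ track each other as t → 0+, and both approach 1/2.

```python
import math

# Hypothetical numerical illustration of L'Hopital's rule for the 0/0 form
#   lim_{t->0+} (1 - cos t)/t^2 = lim_{t->0+} (sin t)/(2t) = 1/2.

f = lambda t: 1 - math.cos(t)
g = lambda t: t ** 2
fp = lambda t: math.sin(t)
gp = lambda t: 2 * t

for t in (1e-1, 1e-2, 1e-3):
    # The two quotients approach each other, and both approach 1/2.
    assert abs(f(t) / g(t) - fp(t) / gp(t)) < t   # crude but sufficient bound
    assert abs(f(t) / g(t) - 0.5) < t
```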

21. Higher order derivatives and Taylor’s theorem

If f is differentiable on an open interval I, then f′ is itself a function on I. f′ need not be continuous; if it is continuous, we say that f is continuously differentiable on I. Even if f is continuously differentiable, f′ need not be differentiable. If f is differentiable on I, and if f′ is differentiable at a point a ∈ I, we write f′′(a) for the derivative of f′ at a. (Note that in order to be able to consider whether f′′(a) exists, it is necessary that f be differentiable in a neighborhood of a; I is such a neighborhood.) If f′ is differentiable on I, we say that f is twice differentiable on I. (Of course, if f is twice differentiable, then f is necessarily continuously differentiable.) In general, if f′, f′′, . . ., f^(k) exist on I, we say that f is k-times differentiable (on I). In this case f is necessarily (k − 1)-times continuously differentiable.

Definition 21.1. Let I be an open interval, let a ∈ I, and let f : I → R. Suppose that f is k-times differentiable at a. The kth Taylor polynomial of f at a is

P_k(a, t) = Σ_{j=0}^{k} (f^(j)(a)/j!)(t − a)^j = f(a) + f′(a)(t − a) + (1/2!)f′′(a)(t − a)^2 + · · · + (1/k!)f^(k)(a)(t − a)^k.

Lemma 21.2. Let f : I → R be k-times differentiable at a ∈ I. Then P_k(a, t) has the property that

(d^j/dt^j) P_k(a, t)|_{t=a} = f^(j)(a),  j = 0, . . . , k.

Moreover, no other polynomial of degree k (or less) has this property. (Thus the Taylor polynomial of degree k is the best approximation to f at a among all polynomials of degree less than or equal to k.)

Proof. It is a simple calculation to check that P_k(a, t) has the indicated property. If q(t) = c_0 + c_1(t − a) + · · · + c_k(t − a)^k is a polynomial, then differentiating j times gives q^(j)(a) = j!c_j. Thus if q^(j)(a) = f^(j)(a) for 0 ≤ j ≤ k, then we must have j!c_j = f^(j)(a), as required. □

We see by this lemma that it is easy to find a polynomial that approximates f well at the point a. It is not as easy to see how well this polynomial approximates f near the point a. For this, we have Taylor's theorem. One can think of it as the generalization of the mean value theorem from order 0 to order k. The proof is a bit tricky; we will use Cauchy's mean value theorem.


Theorem 21.3. (Taylor's theorem.) Let I ⊆ R be open, let a ∈ I, let f : I → R. Suppose that f is (k + 1)-times differentiable on I. For t ∈ I there exists c between a and t such that

f(t) = P_k(a, t) + (f^(k+1)(c)/(k + 1)!)(t − a)^(k+1).

Proof. Let R(t) = f(t) − P_k(a, t). (R(t) is sometimes called the kth remainder.) It follows from Lemma 21.2 that R^(j)(a) = 0 for 0 ≤ j ≤ k, and we easily see that R^(k+1) = f^(k+1) on I, since P_k has degree at most k. Let h(x) = (x − a)^(k+1). Then h^(j)(a) = 0 for 0 ≤ j ≤ k also, and h^(k+1)(a) = (k + 1)!.

Now let x ∈ I, x ≠ a. We apply the Cauchy mean value theorem to R and h on the interval between a and x: thus there is c_1 between a and x such that

(R(x) − R(a))h′(c_1) = (h(x) − h(a))R′(c_1),

or equivalently, since R(a) = h(a) = 0 and h, h′ ≠ 0 away from a,

R(x)/h(x) = R′(c_1)/h′(c_1).

Now we apply the Cauchy mean value theorem again, to R′ and h′ on the interval between a and c_1: there is c_2 between a and c_1 such that

R′(c_1)/h′(c_1) = R′′(c_2)/h′′(c_2)

(again using the facts that R′(a) = h′(a) = 0 and h′, h′′ ≠ 0 away from a). We repeat this process k + 1 times, and we obtain

R(x)/h(x) = R′(c_1)/h′(c_1) = · · · = R^(k+1)(c_{k+1})/h^(k+1)(c_{k+1}) = f^(k+1)(c_{k+1})/(k + 1)!.

Unwinding this gives f(x) − P_k(a, x) = (1/(k + 1)!)f^(k+1)(c)(x − a)^(k+1), where c (≡ c_{k+1}) lies between a and x. □

Corollary 21.4. Let f : I → R be twice differentiable, and assume that f′′ ≥ 0 on I. Then at each point of I, the graph of f lies above its tangent line.

Proof. Let a, a + h ∈ I. By Taylor's theorem there is 0 < θ < 1 such that

f(a + h) = f(a) + f′(a)h + (1/2)f′′(a + θh)h² ≥ f(a) + f′(a)h.

The last expression is the second coordinate of the point on the tangent line at a whose first coordinate is a + h. □

Example 21.5. (Polynomial approximation.)

(1) Let f(x) = e^x. Then f^(j)(x) = e^x for all j, so f^(j)(0) = 1 for all j. Thus P_k(x) = Σ_{j=0}^{k} (1/j!)x^j is the kth order Taylor polynomial of f at 0. By Taylor's theorem, there is c between 0 and x such that

e^x − P_k(x) = (e^c/(k + 1)!)x^(k+1).

Now fix M > 0. For |x| ≤ M, we have

|e^x − P_k(x)| ≤ e^M M^(k+1)/(k + 1)! → 0 as k → ∞.

Thus the Taylor polynomials converge uniformly to e^x on any bounded interval.


(2) Define g : R → R by

g(x) = e^(−1/x) if x > 0, and g(x) = 0 if x ≤ 0.

It is a nice exercise to show that g has derivatives of all orders at 0 (this is clear at other points of R), and that g^(j)(0) = 0 for all j. Thus all Taylor polynomials of g at 0 are identically zero. Therefore the Taylor polynomials of g do not converge to g on any neighborhood of zero: they converge to the zero function, which differs from g at every x > 0.
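The two halves of this example can be checked numerically. The Python sketch below (illustrative only; the names are ours) verifies the remainder bound |e^x − P_k(x)| ≤ e^M M^(k+1)/(k + 1)! for |x| ≤ M, and evaluates the flat function g near 0.

```python
import math

# Hypothetical numerical companion to Example 21.5: Taylor polynomials of
# e^x at 0 obey |e^x - P_k(x)| <= e^M * M^(k+1)/(k+1)! for |x| <= M, while
# the flat function g of part (2) has every Taylor polynomial equal to 0.

def taylor_exp(x, k):
    """P_k(x) = sum_{j=0}^{k} x^j / j!, the kth Taylor polynomial of e^x at 0."""
    return sum(x ** j / math.factorial(j) for j in range(k + 1))

M, x = 2.0, 1.7
for k in range(1, 12):
    bound = math.exp(M) * M ** (k + 1) / math.factorial(k + 1)
    assert abs(math.exp(x) - taylor_exp(x, k)) <= bound

def g(x):
    """The function of Example 21.5(2): e^{-1/x} for x > 0, else 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

# g is strictly positive just to the right of 0, yet every Taylor
# polynomial of g at 0 is identically zero.
assert 0 < g(0.05) < 1e-8
assert g(-1.0) == 0.0
```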

22. The Riemann integral

In this section we will discuss Riemann integration. We gratefully cobbled together this treatment from the ideas of the analogous chapter of Pugh's book. Pugh's approach digs a little bit deeper than the usual ones, but it really is worth the extra effort.

What is integration all about? Of course we rely on your previous experience from calculus: the most basic answer is that we want to find the area of a region bounded by curved lines. (A region bounded by straight lines can be dealt with entirely by elementary geometry.) Our definition(s) are based on this idea. The next level of abstraction comes from the fundamental theorem of calculus: integration is the inverse operation to differentiation. That statement is a bit glib. After all, the derivative of a function is another function, whereas the integral of a function is a number. But the statement actually is correct, when it is fleshed out properly; that is the role of the fundamental theorem. What we take from this (or rather, what we imagine that we are explaining to first year calculus students) is that integration is a function of functions, i.e. a functional:

∫ : {functions} → R.

Among the first properties of integration that are presented in calculus are the "sum" and "scalar multiple" rules:

∫(f + g) = ∫f + ∫g;  ∫(cf) = c∫f.

In fact, these are indicating precisely that integration is a linear functional. Linear algebra is an essential part of modern analysis, and the analysis of linear functionals, functional analysis, is one of its broadest subdisciplines.

Well, the notion of linear map presupposes the idea of vector spaces: the domain and codomain of a linear map should be vector spaces. This is a fundamental idea that is almost completely lost in a calculus course: the collection of functions that can be integrated should be a vector space. To be candid, we don't really talk at all about the "space of integrable functions" in a calculus course. At best, we try to explain why certain functions are integrable, e.g. continuous, or piecewise continuous, functions. This time, we will directly address this question. We will not only carefully define what integrable means and prove that the set of integrable functions is a vector space; we will also give an independent characterization (due to Lebesgue) of exactly which functions are integrable. This is useful even just in the context of Riemann integration. Many important results that would otherwise require fussy proofs will become effortless (so to speak). But it also prods us to a larger view. Once we are able to see the space of Riemann integrable functions as a whole, we can also begin to see its limitations, and where it might give way to generalization. In the next semester we will spend some time (how much???) exploring Lebesgue's version of integration.

That is the end of the "introduction". We have to get started, and the beginning is very basic; after all, integration is just a lot of arithmetic. We will follow Pugh's idea of emphasizing the fact that there are two usual ways to present the integral; he refers to them as the Riemann and the Darboux approaches. Without any expertise in the history of mathematics, or any effort at tracking down that history, we will just adopt this terminology. First we give the Riemann approach. We let f be a real-valued function on a compact interval [a, b].

Definition 22.1. A partition of [a, b] is a finite set P ⊆ [a, b] such that a, b ∈ P .

The idea of a partition is that it defines a subdivision of [a, b] into a finite number of subintervals. The easiest way of indicating this is by giving the set of endpoints of the subintervals, which is what our definition does. We usually write a partition in the form

P = {x_0, x_1, . . . , x_n}, where a = x_0 < x_1 < · · · < x_n = b.

This is a slight abuse of notation, since the definition of P as a set does not indicate that the numbers in the set are given in (strictly) increasing order. From the partition P we obtain n subintervals of [a, b]: [x_0, x_1], . . ., [x_{n−1}, x_n]. Note that the number n associated with P is obtained from the relation n + 1 = #(P). We use the term mesh for the length of the largest subinterval: mesh(P) = max_{1≤i≤n}(x_i − x_{i−1}). The mesh is a rough sort of description of how fine the partition is.

Definition 22.2. A partition pair is a partition P together with a list T = (t_1, . . . , t_n) such that x_{i−1} ≤ t_i ≤ x_i for 1 ≤ i ≤ n.

Thus the list T consists of a selection of one element from each subinterval of the partition.

Definition 22.3. Let f : [a, b] → R, and let (P, T) be a partition pair for the interval [a, b]. The Riemann sum associated to this data is the number

R(f, P, T) = Σ_{i=1}^{n} f(t_i)Δx_i,

where Δx_i = x_i − x_{i−1}, the length of the ith subinterval.
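As an informal illustration of the definition, the Python sketch below (the example and names are ours) computes Riemann sums of f(x) = x² on [0, 1] over uniform partitions with randomly chosen tags. Since f is Lipschitz with constant 2 on [0, 1], every such sum lies within 2·mesh(P) of the integral 1/3.

```python
import random

# Hypothetical numerical illustration of Definition 22.3: Riemann sums of
# f(x) = x^2 on [0, 1], with arbitrary (random) tags t_i, approach 1/3 as
# the mesh of the partition shrinks.

def riemann_sum(f, partition, tags):
    """R(f, P, T) = sum_i f(t_i) * (x_i - x_{i-1})."""
    return sum(f(t) * (x1 - x0)
               for x0, x1, t in zip(partition, partition[1:], tags))

random.seed(0)
f = lambda x: x * x
for n in (10, 100, 1000):
    P = [i / n for i in range(n + 1)]                       # uniform partition
    T = [random.uniform(P[i], P[i + 1]) for i in range(n)]  # arbitrary tags
    # |R(f,P,T) - 1/3| <= 2 * mesh(P) = 2/n here, since f is Lipschitz
    # with constant 2 on [0, 1], regardless of the tag choice.
    assert abs(riemann_sum(f, P, T) - 1 / 3) <= 2 / n
```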

Now we have the terminology we need to define Riemann integrability and the Riemann integral. As mentioned above, Riemann sums are just a lot of (carefully organized) arithmetic. To pass to the integral is a limiting process. The following definition is the usual notion of limit, but is based on the mesh.

Definition 22.4. The function f : [a, b] → R is Riemann integrable if there is a number L such that for every ε > 0, there exists δ > 0 such that for every partition pair (P, T) of [a, b], if mesh(P) < δ then |R(f, P, T) − L| < ε.

We write L = lim_{mesh(P)→0} R(f, P, T) to indicate this limit. The number L is unique, if it exists. This is proved in the usual way of limits, and is left to you as an exercise. If f is Riemann integrable, we write ∫_a^b f (or ∫_a^b f dx, or ∫_a^b f(x) dx, or just ∫f) for the number L. We will write R[a, b] for the set of all Riemann integrable functions on [a, b].

There is an important detail hidden in the last definition. For the limit to exist it must be the case that the approximation holds independently of the choice of the list T in the partition pair. In other words, if P is a partition with mesh(P) < δ, then the Riemann sum is within ε of L for any choice of T.

We now give some consequences of the definition.

Theorem 22.5. If f is Riemann integrable then f is bounded.

Proof. We apply the definition of integrability with ε = 1: there exist L and δ > 0 such that if P is any partition with mesh(P) < δ, then |R(f, P, T) − L| < 1. (As we mentioned above, this estimate holds for any choice of T.) It follows from the triangle inequality that

|Σ_{i=1}^{n} f(t_i)Δx_i| < 1 + |L|.

We will show that f is bounded on each subinterval of [a, b] defined by P. It will then follow that f is bounded on [a, b]. Fix i_0 ∈ {1, 2, . . . , n}. For i ≠ i_0 choose t_i ∈ [x_{i−1}, x_i]. For any t ∈ [x_{i_0−1}, x_{i_0}] we apply the above inequality to the list T = (t_1, . . . , t_{i_0−1}, t, t_{i_0+1}, . . . , t_n):

|f(t)Δx_{i_0}| − |Σ_{i≠i_0} f(t_i)Δx_i| ≤ |R(f, P, T)| < 1 + |L|.

We find that

|f(t)| ≤ (Δx_{i_0})⁻¹ (1 + |L| + |Σ_{i≠i_0} f(t_i)Δx_i|).

Thus the right hand side is an upper bound for |f| on [x_{i_0−1}, x_{i_0}]. □

Theorem 22.6. R[a, b] is a vector space, and integration defines a linear functional on it.

Proof. We note that for a fixed partition pair (P, T), the Riemann sum is linear in f:

R(cf + g, P, T) = Σ_i (cf + g)(t_i)Δx_i = Σ_i (cf(t_i) + g(t_i))Δx_i = c Σ_i f(t_i)Δx_i + Σ_i g(t_i)Δx_i = cR(f, P, T) + R(g, P, T).

Since addition and multiplication in R are continuous, we get

lim_{mesh(P)→0} R(cf + g, P, T) = lim_{mesh(P)→0} (cR(f, P, T) + R(g, P, T)) = c lim_{mesh(P)→0} R(f, P, T) + lim_{mesh(P)→0} R(g, P, T).

Therefore cf + g is Riemann integrable, and hence R[a, b] is a vector space. Moreover the above calculation shows that ∫(cf + g) = c∫f + ∫g, i.e. that integration is a linear functional on R[a, b]. □

The following example and theorem are easy exercises using the definition of integrability.

Example 22.7. The constant function 1 is Riemann integrable, and ∫_a^b 1 = b − a.

Theorem 22.8. Let f, g ∈ R[a, b] with f ≤ g. Then ∫f ≤ ∫g. If |f| ≤ M on [a, b], then |∫_a^b f| ≤ M(b − a).


23. The “Darboux” approach

We now discuss the second way of defining the Riemann integral, which we call the Darboux method. Again, we need some preliminaries. Notice that for this method we must assume that the function is bounded.

Definition 23.1. Let f : [a, b] → R be a bounded function, and let P = {x_0, x_1, . . . , x_n} be a partition of [a, b]. We define

m_i = inf_{x_{i−1} ≤ t ≤ x_i} f(t),  M_i = sup_{x_{i−1} ≤ t ≤ x_i} f(t),

L(f, P) = Σ_i m_i Δx_i,  U(f, P) = Σ_i M_i Δx_i.

These are referred to as lower and upper sums. Notice that for any partition pair (P, T) we have that L(f, P) ≤ R(f, P, T) ≤ U(f, P). Finally we define

I̲(f) = sup_P L(f, P),  Ī(f) = inf_P U(f, P).

These are referred to as the lower and upper integrals of f on [a, b]. It is standard to denote I̲(f) by an integral sign with a bar underneath, and Ī(f) by an integral sign with a bar on top. Finally, we say that f is Darboux integrable on [a, b] if I̲ = Ī, and in this case the common value is called the (Darboux) integral.
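For a monotone function the infima and suprema in this definition are attained at subinterval endpoints, which makes lower and upper sums easy to compute. The Python sketch below (our own illustration) does this for f(x) = x² on [0, 1]; on the uniform partition with n subintervals, U(f, P) − L(f, P) telescopes to (f(1) − f(0))/n.

```python
# Hypothetical illustration of Definition 23.1 for the increasing function
# f(x) = x^2 on [0, 1]: on each subinterval the inf is the left endpoint
# value and the sup is the right endpoint value, so lower and upper sums
# are exact finite sums, and they squeeze the integral 1/3 under refinement.

def darboux_sums(f, partition):
    """(L(f,P), U(f,P)) for an increasing f: inf/sup at the endpoints."""
    L = sum(f(x0) * (x1 - x0) for x0, x1 in zip(partition, partition[1:]))
    U = sum(f(x1) * (x1 - x0) for x0, x1 in zip(partition, partition[1:]))
    return L, U

f = lambda x: x * x
for n in (4, 40, 400):
    P = [i / n for i in range(n + 1)]
    L, U = darboux_sums(f, P)
    assert L <= 1 / 3 <= U
    # For the uniform partition, U - L telescopes to (f(1) - f(0))/n = 1/n.
    assert abs((U - L) - 1 / n) < 1e-12
```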

Our goal for this section is to prove that the Riemann and Darboux approaches yield the same result. Before doing this we need to talk a bit about refinements of partitions, and their effect on upper and lower sums and integrals.

Definition 23.2. Let P and P ′ be partitions of [a, b]. We say that P ′ refines P if P ⊆ P ′.

It is easy to see that P′ refines P if and only if every subinterval associated to P′ is contained in one of the subintervals associated to P.

Lemma 23.3. (Refinement Principle) Let P′ refine P. Then L(f, P) ≤ L(f, P′) and U(f, P′) ≤ U(f, P).

In other words, refining the partition causes the lower sum to increase, and the upper sum to decrease. The idea of the proof is to proceed from P to P′ by adding one point at a time. Then the change in the lower and upper sums happens on only one subinterval of P. We leave as an exercise the writing of a precise proof.

In general, if P_1 and P_2 are two partitions of [a, b], then neither one need refine the other. Thus there is in general no relation between the upper and lower sums for two partitions. However, P_1 and P_2 always have a common refinement; for example, P_1 ∪ P_2 contains both P_1 and P_2. This device gives us the following important result: every lower sum for f is less than or equal to every upper sum for f.

Lemma 23.4. Let P1 and P2 be two partitions of [a, b]. Then L(f, P1) ≤ U(f, P2).


Proof. Let P′ be a common refinement. Then

L(f, P_1) ≤ L(f, P′) ≤ U(f, P′) ≤ U(f, P_2). □

We now give a Cauchy type characterization of Darboux integrability.

Corollary 23.5. f is Darboux integrable on [a, b] if and only if for every ε > 0 there is a partition P of [a, b] such that U(f, P) − L(f, P) < ε.

Proof. The forward direction follows easily from the definition, and we leave it as an exercise to write it out carefully. For the reverse direction, suppose that the Cauchy condition holds. We must show that I̲ = Ī. We already know that I̲ ≤ Ī. Let ε > 0. Choose a partition P such that U(f, P) − L(f, P) < ε. Then

Ī ≤ U(f, P) < L(f, P) + ε ≤ I̲ + ε.

This is true for every choice of ε, and hence Ī ≤ I̲. □

(This Cauchy type condition corresponds to a kind of limit. The limiting process going on here is that the partition becomes finer and finer, in the sense of refinement. This is a different kind of limit than the others we have seen. Until now we have seen limits based on a totally ordered set; for example, n → ∞ in N, t → t_0 in R, or t → ∞ in R. The limit taken as a partition of [a, b] becomes finer and finer is based on a partially ordered set, namely, the set of partitions ordered by refinement. It isn't hard to get used to this notion, and we may write

I̲ = lim_{P→∞} L(f, P),  Ī = lim_{P→∞} U(f, P).

If f is Darboux integrable, then we have ∫f = lim_{P→∞} L(f, P) = lim_{P→∞} U(f, P).)

We are now ready to prove the main theorem of this section.

Theorem 23.6. Let f : [a, b] → R. Then f is Riemann integrable if and only if f is Darboux integrable. For an integrable function, the two integrals coincide.

Proof. We first assume that f is Riemann integrable. Let ε > 0. There exist a number L and δ > 0 such that if P is any partition with mesh(P) < δ, then for any list T associated to P we have |R(f, P, T) − L| < ε. Fix any partition P with mesh(P) < δ. Then we have (for any T)

L − ε < R(f, P, T) < L + ε.

Recall that for any partition pair (P, T), we have L(f, P) ≤ R(f, P, T) ≤ U(f, P). Moreover, it is easy to see that

L(f, P) = inf_T R(f, P, T),  U(f, P) = sup_T R(f, P, T).

It follows that

L − ε ≤ L(f, P) and U(f, P) ≤ L + ε.

Therefore U(f, P) − L(f, P) ≤ 2ε. Hence f is Darboux integrable.


Now we assume that f is Darboux integrable. The proof of this direction is a bit trickier than the other one. In particular, it relies upon the standard technique of dividing the sum into two kinds of terms, and estimating them differently. Since f is bounded, there is K such that |f| ≤ K on [a, b]. Let L = ∫_a^b f (the Darboux integral of f). Let ε > 0. Choose a partition P such that

U(f, P) − L(f, P) < ε.

Write P = {x_0, x_1, . . . , x_n}. Set δ = ε/n. We will show that if (Q, T) is any partition pair with mesh(Q) < δ, then |R(f, Q, T) − L| < (2K + 1)ε, proving Riemann integrability (and also showing that the two integrals coincide). In fact, it will suffice to show that U(f, Q) − L(f, Q) < (2K + 1)ε, since both L and R(f, Q, T) lie between the lower and upper sums.

So let Q = {y_0, y_1, . . . , y_k} have mesh less than δ. We will write I_i = [x_{i−1}, x_i] for 1 ≤ i ≤ n, and J_j = [y_{j−1}, y_j] for 1 ≤ j ≤ k. We divide the subintervals associated to Q into two groups as follows:

S_1 = {j : there exists i with x_i ∈ int(J_j)},  S_2 = {1, 2, . . . , k} \ S_1.

Thus S_2 indicates those J_j's that are entirely contained in one of the I_i's; S_1 indicates those J_j's that straddle more than one of the I_i's. There are at most n elements in S_1 (in fact, there are at most n − 1). Now we will use m(I) and M(I) for the infimum and supremum of f over an interval I. For j ∈ S_1 we have

−K ≤ m(J_j) ≤ M(J_j) ≤ K.

For j ∈ S_2 there is i such that J_j ⊆ I_i. Then

m(I_i) ≤ m(J_j) ≤ M(J_j) ≤ M(I_i).

Hence for this j and i we have

M(J_j) − m(J_j) ≤ M(I_i) − m(I_i).

Now we estimate:

U(f, Q) − L(f, Q) = Σ_{j=1}^{k} (M(J_j) − m(J_j))Δy_j
  = Σ_{j∈S_1} (M(J_j) − m(J_j))Δy_j + Σ_{j∈S_2} (M(J_j) − m(J_j))Δy_j
  ≤ Σ_{j∈S_1} 2KΔy_j + Σ_{i=1}^{n} Σ_{j∈S_2, J_j⊆I_i} (M(I_i) − m(I_i))Δy_j
  < 2Knδ + Σ_{i=1}^{n} (M(I_i) − m(I_i))Δx_i
  = 2Kε + U(f, P) − L(f, P)
  < (2K + 1)ε. □


There are various situations where it is fairly easy to prove integrability (or non-integrability) using the Darboux definition. These are useful exercises in working with the definition. In the next section we will prove a deep theorem that will make them trivial to verify.

Example 23.7.

(1) Continuous functions are Riemann integrable.
(2) Monotone functions are Riemann integrable.
(3) Step functions are Riemann integrable. (A step function on [a, b] is a function for which there exists a partition of [a, b] such that the function is constant on the interior of each subinterval.) In particular, the characteristic function χ_[c,d] of a subinterval [c, d] of [a, b] is Riemann integrable over [a, b], where χ_E(x) = 1 if x ∈ E and χ_E(x) = 0 if x ∉ E.
(4) More generally, a bounded function that is continuous at all but finitely many points of [a, b] is Riemann integrable.
(5) The characteristic function of Q is not Riemann integrable over any interval.
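Item (5) can be seen concretely at the level of Riemann sums. The Python sketch below is a symbolic model of ours (floating point cannot represent irrationality): tags are either exact rationals, or rationals shifted by a nonzero rational multiple of 1/√2 and hence irrational. On the same partition the two choices of tags give Riemann sums 1 and 0, so no limit L can exist.

```python
from fractions import Fraction

# Hypothetical illustration of Example 23.7(5): for f = characteristic
# function of Q on [0, 1], the Riemann sum depends entirely on the tags.
# We model tags symbolically: a plain Fraction is rational; a pair (r, s)
# encodes the number r + s/sqrt(2), irrational whenever s != 0.

def chi(t):
    """Characteristic function of Q, evaluated on our symbolic tags."""
    if isinstance(t, Fraction):
        return 1
    r, s = t
    return 1 if s == 0 else 0

n = 100
P = [Fraction(i, n) for i in range(n + 1)]      # uniform partition of [0, 1]
dx = Fraction(1, n)
rational_tags = P[:-1]                          # t_i = x_{i-1}, rational
irrational_tags = [(x, dx) for x in P[:-1]]     # t_i = x_{i-1} + dx/sqrt(2),
                                                # which lies in [x_{i-1}, x_i]

R_rat = sum(chi(t) * dx for t in rational_tags)
R_irr = sum(chi(t) * dx for t in irrational_tags)
assert R_rat == 1 and R_irr == 0   # same partition, mesh 1/n, sums 1 apart
```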

24. Measure zero and integration

In order to characterize intrinsically the property of being Riemann integrable, we need to develop the concept of sets of measure zero. This is the first step in what is called measure theory (which I hope to cover a bit more fully next semester). Riemann integration is built around the elementary concept of length of an interval. It is natural to consider the "length" of a finite union of intervals, but the question of measuring more complicated subsets of R is not addressed in calculus. Nevertheless, this is a very important problem, resolved by Lebesgue in the first part of the twentieth century. One of his great insights is the main theorem below.

Definition 24.1. Let E ⊆ R. E has measure zero if for every ε > 0 there exist open intervals U_1, U_2, . . ., such that

(1) E ⊆ ∪_{i=1}^{∞} U_i.
(2) Σ_{i=1}^{∞} |U_i| < ε.

In this definition we write |U_i| for the length of the interval U_i. We expect that the notion of a convergent sum of positive real numbers is familiar from a previous course (even though we will review this idea later this semester). In any case, the definition is unchanged if we demand instead of (2) that Σ_{i=1}^{n} |U_i| < ε for all n. Here are some examples of sets of measure zero.

Example 24.2.

(1) Finite sets have measure zero. (In fact, only finitely many open intervals are necessary in this case.)
(2) Countable sets have measure zero.

Proof. This is a convenient place to introduce the "ε/2^n trick". Let E = {x_1, x_2, . . .}. Let ε > 0 be given, and set U_i = (x_i − ε/2^(i+2), x_i + ε/2^(i+2)), for i = 1, 2, . . .. Then |U_i| = ε/2^(i+1), so that Σ_i |U_i| = ε/2 < ε. It is obvious that E ⊆ ∪_i U_i. □

(3) Q has measure zero.
(4) A subset of a set of measure zero has measure zero.
(5) A countable union of sets of measure zero has measure zero.
(6) The definition is unchanged if arbitrary intervals (open, closed, or half-open) are used instead of open intervals.


Proof. Proofs for the previous three assertions are left as exercises. □

(7) The Cantor set C has measure zero.

Proof. Recall from our construction of C that C = ∩_{n=1}^{∞} F_n, where F_n is the union of 2^n closed intervals, each of length 3^(−n). Stretching each of these a little bit, we can produce 2^n open intervals U_i, each having length less than (2.5)^(−n), and having union containing F_n (and hence C). Then Σ_{i=1}^{2^n} |U_i| < 2^n (2.5)^(−n) = (2/2.5)^n, which tends to zero as n → ∞. □

(8) If a < b then [a, b] does not have measure zero. This is a good exercise, even if it isn't homework (but it might be).
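The geometric series behind the ε/2^n trick of item (2) can be checked by exact rational arithmetic. The Python sketch below (illustrative only) confirms that the partial sums of Σ_{i≥1} ε/2^(i+1) stay strictly below ε/2, with gap exactly ε/2^(N+1) after N terms.

```python
from fractions import Fraction

# Hypothetical arithmetic check of the "eps/2^n trick" in Example 24.2(2):
# intervals U_i of length eps/2^(i+1) have total length
#   sum_{i=1}^{infty} eps/2^(i+1) = eps/2 < eps,
# and every partial sum stays strictly below eps/2.

eps = Fraction(1, 10)
partial = Fraction(0)
for i in range(1, 60):
    partial += eps / 2 ** (i + 1)       # add |U_i| = eps/2^(i+1)
    assert partial < eps / 2 < eps
# After N = 59 terms the gap to the full sum eps/2 is exactly eps/2^60.
assert eps / 2 - partial == eps / 2 ** 60
```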

Before stating the main theorem, we recall the notion of oscillation of a function at a point. The definition makes sense for a function between general metric spaces, but for clarity we will state it only for functions whose codomain is R.

Definition 24.3. Let X be a metric space, and let f : X → R. Let a ∈ X. The oscillation of f at a is

osc(f, a) = inf_{r>0} ( sup_{x,y∈B_r(a)} |f(x) − f(y)| ).

This is the precise description of a very natural idea. Let's briefly take the definition apart. Fix r > 0. This defines an open ball about a. How much can the function vary over this ball? The supremum in the parentheses is exactly how much. If we let r become smaller, then the ball becomes smaller, so that there are fewer points in the ball to put inside of f. Thus as r decreases, the supremum also decreases. In fact, the infimum over r is actually equal to the limit as r → 0. This limiting value is the minimum amount that f can be made to jump, no matter to how small a ball (centered at a) you confine its argument. That is what we mean by the oscillation at a.

We can think of the oscillation of f at a as a measure of the size of the discontinuity of f at a. That is an interpretation of the first part of the following lemma (which should have been homework earlier in the semester).

Lemma 24.4. Let X be a metric space, let f : X → R, and let a ∈ X.

(1) f is continuous at a if and only if osc(f, a) = 0.
(2) For c > 0, the set {x ∈ X : osc(f, x) ≥ c} is closed.
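The definition of oscillation can be probed numerically. The Python sketch below is our own rough estimate (the sup is sampled on a finite grid and the inf is taken over a geometric sequence of radii): for the step function f = χ_[0,∞) it finds oscillation 1 at the jump point and 0 at a point of continuity.

```python
# Hypothetical numerical estimate of osc(f, a) from Definition 24.3 for the
# step function f = 0 on (-inf, 0) and f = 1 on [0, inf). The sup over
# B_r(a) is approximated by sampling, and the inf over r by shrinking r.

def osc_estimate(f, a, samples=1001):
    """Approximate inf_{r>0} sup_{x,y in B_r(a)} |f(x)-f(y)| by sampling."""
    best = float("inf")
    r = 1.0
    for _ in range(20):                          # r = 1, 1/2, 1/4, ...
        pts = [a - r + 2 * r * k / (samples - 1) for k in range(samples)]
        vals = [f(x) for x in pts]
        best = min(best, max(vals) - min(vals))  # sampled sup of |f(x)-f(y)|
        r /= 2
    return best

f = lambda x: 1.0 if x >= 0 else 0.0
assert osc_estimate(f, 0.0) == 1.0    # f jumps by 1 at 0, so osc = 1
assert osc_estimate(f, 0.5) == 0.0    # f is continuous at 0.5, so osc = 0
```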

Theorem 24.5. Let f : [a, b] → R be bounded. Let E be the set of points in [a, b] where f is discontinuous. Then f is Riemann integrable if and only if E has measure zero.

Proof. We first assume that f is Riemann integrable. Let E_n = {x ∈ [a, b] : osc(f, x) ≥ 1/n}. By Lemma 24.4 (1), we know that E = ∪_{n=1}^{∞} E_n. Thus it suffices to show that E_n has measure zero for each n. So now fix n, let ε > 0 be given, and choose a partition P such that U(f, P) − L(f, P) < ε/n. Let

S = {I : I is a subinterval of P, and int(I) ∩ E_n ≠ ∅}.

For I ∈ S we have that M(I) − m(I) ≥ 1/n. (The reason is that there must exist a point z ∈ E_n in the interior of I, so that I ⊇ B_r(z) for some r > 0.) But now we estimate

(1/n) Σ_{I∈S} |I| ≤ U(f, P) − L(f, P) < ε/n,


so that Σ_{I∈S} |I| < ε. Now the open intervals int(I), for I ∈ S, contain all points of E_n except possibly finitely many (certain points of the partition P). Let T be a collection of open intervals centered at these points with total length so small that Σ_{I∈S} |I| + Σ_{J∈T} |J| < ε. Then {int(I) : I ∈ S} ∪ T is a finite collection of open intervals covering E_n and having total length less than ε. Therefore E_n has measure zero.

Now we prove the converse. Suppose that E has measure zero. Let |f| ≤ K on [a, b], and let ε > 0 be given. Let E0 = {x ∈ [a, b] : osc(f, x) ≥ ε}. Then E0 ⊆ E, so that E0 also has measure zero. Let U1, U2, ... be open intervals such that E0 ⊆ ⋃_{i=1}^∞ Ui and ∑_{i=1}^∞ |Ui| < ε. By Lemma 24.4(2), E0 is closed. Since E0 ⊆ [a, b], E0 is compact. Thus there is n such that E0 ⊆ U1 ∪ ··· ∪ Un. Let P0 = ⋃_{i=1}^n (∂Ui ∩ [a, b]) ∪ {a, b}, a partition of [a, b]. We will find a suitable refinement P of P0 such that U(f, P) − L(f, P) < (2K + b − a)ε, which will conclude the proof. Since P0 contains the endpoints of the Ui's, each subinterval associated to P0 is either contained in some Ui, or is disjoint from all of the Ui's. Let S1 denote the collection of those subintervals that are contained in some Ui, and let S2 denote the remaining subintervals. Then for I ∈ S1 we have

M(I) − m(I) ≤ 2K.

Hence

∑_{I∈S1} (M(I) − m(I)) |I| ≤ 2K ∑_{i=1}^n |Ui| < 2Kε.

Now consider a subinterval I ∈ S2. Then I ∩ E0 = ∅, so the oscillation of f at each point of I is less than ε. Thus for each x ∈ I there is an open interval Ix centered at x such that M(Ix) − m(Ix) < ε. The collection {Ix : x ∈ I} is an open cover of the compact interval I, hence has a finite subcover: there are x1, ..., xk ∈ I such that I ⊆ ⋃_{i=1}^k Ixi. We define P by including into P0 all endpoints of the Ixi that lie in I:

P = P0 ∪ ⋃_{I∈S2} ⋃_i ((∂Ixi) ∩ I).

Let us consider the subintervals of P contained in some I ∈ S2; let J be one such. Then J ⊆ Ixi for some i, and hence M(J) − m(J) < ε. Therefore

∑_{I∈S2} ∑_{J⊆I} (M(J) − m(J)) |J| < ∑_{I∈S2} ∑_{J⊆I} ε|J| = ε ∑_{I∈S2} |I| ≤ ε(b − a).

We now have

U(f, P) − L(f, P) = ∑_{I∈S1} (M(I) − m(I)) |I| + ∑_{I∈S2} ∑_{J⊆I} (M(J) − m(J)) |J| < 2Kε + (b − a)ε. □

We now give several consequences of this theorem. These can be proved directly from the Riemann or Darboux definitions, but are deduced much more easily from the above characterization. The first of these is immediate from the theorem.

Corollary 24.6. A bounded function with only finitely many discontinuities is Riemann integrable. In particular, a piecewise continuous function is Riemann integrable.

Corollary 24.7. A function that is zero except at finitely many points is Riemann integrable. Moreover, the integral of such a function equals 0.


Proof. The integrability follows from the previous corollary. It is easy to use the definition of the integral to show that the integral is zero. □

Corollary 24.8. Riemann integrability, and the value of the Riemann integral, of a function are unaffected when the function is altered at finitely many points.

Proof. The altered function equals the sum of the original function with a function that is zero except at finitely many points. Thus the previous corollary, together with linearity of the integral, give the result. □

Corollary 24.9. Monotone functions are Riemann integrable.

Proof. This follows from the fact that a monotone function has countably many discontinuities. To see this, note that a monotone function has one-sided limits at all points, and is discontinuous at a point if and only if the two one-sided limits at that point are distinct. If we let

f(x±) = lim_{t→x±} f(t),

then for any x ≠ y we have (f(x−), f(x+)) ∩ (f(y−), f(y+)) = ∅. Thus if we let q(x) be a rational number in the interval (f(x−), f(x+)) for each discontinuity x of f, then q is a one-to-one function from the set of discontinuities into Q. Therefore the set of discontinuities is countable, and hence of measure zero. □

Corollary 24.10. The product of Riemann integrable functions is Riemann integrable.

Proof. The set of discontinuities of fg is contained in the union of the sets of discontinuities of f and g separately. □

Corollary 24.11. Let f be Riemann integrable on [a, b], and let ϕ be a continuous function defined on the range of f. Then ϕ ◦ f is Riemann integrable (also on [a, b]).

Proof. Since composition preserves continuity, the set of points where f is continuous is contained in the set of points where ϕ ◦ f is continuous. Hence the sets of discontinuities satisfy the reverse containment. □

Remark 24.12. The order in which the two functions are composed in the previous corollary is crucial: f ◦ ϕ need not be integrable. (You can remember which order preserves integrability by noting that in the corollary, the composition has the same domain as the integrable function.)

Corollary 24.13. If f is Riemann integrable, then so is |f|.

Proof. |f| = |·| ◦ f. □

Corollary 24.14. Let f be Riemann integrable on [a, b], and let [c, d] ⊆ [a, b]. Then f is Riemann integrable on [c, d]. Moreover, ∫_c^d f = ∫_a^b f·χ[c,d].

Proof. For the first statement, note that any discontinuity of f in [c, d] is also a discontinuity in [a, b]. The second statement follows easily from either definition of the integral by including {c, d} into a partition of [a, b]. □

Corollary 24.15. Let f be Riemann integrable on [a, b], and let c ∈ (a, b). Then

∫_a^b f = ∫_a^c f + ∫_c^b f.

Proof. This follows from linearity of the integral and Corollary 24.8, since fχ[a,b] and fχ[a,c] + fχ[c,b] can differ only at c. □

The image of a set of measure zero under a continuous function need not have measure zero. This is a pretty strange phenomenon. The upshot is that continuity is not really such a strong property. It is important that a stronger version of continuity is sufficient to preserve measure zero sets.

Lemma 24.16. Let g : [a, b] → R be a Lipschitz function, and let E ⊆ [a, b] have measure zero. Then g(E) has measure zero.

Proof. Let c > 0 be a Lipschitz constant for g. We claim that if I is an open interval contained in [a, b], then |g(I)| ≤ c|I|. To see this, let I = (t − r, t + r). Then by the Lipschitz condition, g(I) ⊆ (g(t) − cr, g(t) + cr). Now let ε > 0. Let U1, U2, ... be open intervals with E ⊆ ⋃_i Ui and ∑_i |Ui| < ε/c. Let us assume that Ui ⊆ [a, b]; this is not a serious restriction, as we may extend the domain of g to all of R (e.g. by letting g be constant on (−∞, a] and on [b, ∞)) without changing the Lipschitz constant. Then g(E) ⊆ ⋃_i g(Ui), and

∑_i |g(Ui)| ≤ c ∑_i |Ui| < c · (ε/c) = ε.

Therefore g(E) has measure zero. □

Corollary 24.17. Let f be Riemann integrable, and let ϕ be a continuous one-to-one function (defined on an interval) such that its inverse function ϕ−1 is Lipschitz. Then f ◦ ϕ is Riemann integrable. (Compare with Corollary 24.11.)

Proof. Let E denote the set of discontinuities of f. We first claim that the set of discontinuities of f ◦ ϕ is contained in ϕ−1(E). To see this, note that if x ∉ ϕ−1(E) then ϕ(x) ∉ E, so that f is continuous at ϕ(x). But then f ◦ ϕ is continuous at x. Hence any point where f ◦ ϕ is discontinuous must be contained in ϕ−1(E). By Lemma 24.16, ϕ−1(E) has measure zero. □

25. The fundamental theorem of calculus

Before giving the fundamental theorem, we present the usual notational expediencies.

Definition 25.1. If a < b and f is Riemann integrable on [a, b], we define ∫_b^a f = −∫_a^b f.

Theorem 25.2. Let f be bounded on an interval containing a, b and c. If two of ∫_a^b f, ∫_b^c f and ∫_c^a f exist, then so does the third, and

∫_a^b f + ∫_b^c f + ∫_c^a f = 0.

Proof. One of a, b and c lies between the other two. By symmetry, we may assume without loss of generality that it is b that lies in the middle. Again without loss of generality, we may assume that a < b < c. Now, if f is integrable on [a, c], we are done by Corollary 24.14 and Corollary 24.15. On the other hand, if f is integrable on [a, b] and on [b, c], then f is integrable on [a, c] by Theorem 24.5, since the set of discontinuities of f in [a, c] is contained in the union of its sets of discontinuities in [a, b] and in [b, c]. □

Remark 25.3. If a < b, then |∫_a^b f| ≤ ∫_a^b |f|.


Theorem 25.4. (The fundamental theorem of calculus.) Let f be Riemann integrable on [a, b]. For x ∈ [a, b] let F(x) = ∫_a^x f. Then F is Lipschitz (in particular, F is continuous). If f is continuous at x0 ∈ [a, b], then F is differentiable at x0, and F′(x0) = f(x0) (in particular, if f is continuous on [a, b], then F is differentiable on [a, b]).

Proof. We leave as an exercise the proof that F is Lipschitz. Suppose that f is continuous at x0. Let ε > 0. Then there is δ > 0 such that for all x ∈ [a, b], if |x − x0| < δ then |f(x) − f(x0)| < ε. Now for x ∈ [a, b] \ {x0} with |x − x0| < δ, we have

(F(x) − F(x0))/(x − x0) = (1/(x − x0)) (∫_a^x f − ∫_a^{x0} f) = (1/(x − x0)) ∫_{x0}^x f,

hence

|(F(x) − F(x0))/(x − x0) − f(x0)|
  = |(1/(x − x0)) ∫_{x0}^x f − (1/(x − x0)) ∫_{x0}^x f(x0)|
  = |(1/(x − x0)) ∫_{x0}^x (f − f(x0))|
  ≤ (1/|x − x0|) |∫_{x0}^x |f − f(x0)||
  ≤ (1/|x − x0|) · ε|x − x0|
  = ε. □
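A quick numerical sanity check of the theorem (a sketch, not part of the notes; the choice f = cos, the base point 0.7, and the step sizes are arbitrary): the symmetric difference quotient of F(x) = ∫_0^x f should be close to f(x0) at a point of continuity.

```python
# Verify F'(x0) = f(x0) numerically: F is computed by a midpoint Riemann sum.
import math

def F(f, x, n=20000):
    # midpoint-rule approximation of the integral of f over [0, x]
    h = x / n
    return h * sum(f((k + 0.5) * h) for k in range(n))

f = math.cos
x0, h = 0.7, 1e-4
deriv = (F(f, x0 + h) - F(f, x0 - h)) / (2 * h)   # difference quotient of F at x0
print(deriv, f(x0))   # both close to cos(0.7)
```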

Corollary 25.5. If f is continuous on [a, b], then f has an antiderivative on [a, b].

Corollary 25.6. If f is continuous on [a, b], and if G is an antiderivative for f on [a, b], then ∫_a^b f = G(b) − G(a).

Proof. Let F(x) = ∫_a^x f. Then F and G are both antiderivatives for f on [a, b]. Thus F and G are differentiable on [a, b], and F′ = G′. By Corollary 20.14, F − G is constant, say F − G = c. Then, since F(a) = 0,

∫_a^b f = F(b) = F(b) − F(a) = (G(b) + c) − (G(a) + c) = G(b) − G(a). □

We conclude this section with the change of variable theorem, which is the basis for the method of integration by substitution from elementary calculus. If we assume that the function f is continuous, then an easy short proof can be given using antiderivatives (you might try to find it as an exercise). Our characterization of integrability by means of sets of measure zero lets us prove a more general result, without too much extra work.

Theorem 25.7. (Change of variable theorem.) Let f be Riemann integrable on [a, b], and let g : [c, d] → [a, b] be continuously differentiable with g′ ≠ 0 on [c, d]. Then

∫_{g(c)}^{g(d)} f(y) dy = ∫_c^d f(g(x)) g′(x) dx.


Proof. Since g′ is continuous on [c, d], it does not change sign. We first consider the case where g′ > 0 on [c, d]. Then g(c) < g(d). Note that g−1 is also continuously differentiable, by the inverse function theorem, and hence that g−1 is Lipschitz. By Corollary 24.17, f ◦ g is Riemann integrable, and hence so is (f ◦ g)g′. Let L and L′ be the two integrals in the statement of the theorem, and let ε > 0. Let δ > 0 be such that for any partition pair (Q, U) of [g(c), g(d)] with mesh(Q) < δ we have |R(f, Q, U) − L| < ε. Since g is uniformly continuous on [c, d] there is η1 > 0 such that if |x − x′| < η1 then |g(x) − g(x′)| < δ. Choose η2 > 0 such that for any partition pair (P, T) of [c, d] with mesh(P) < η2 we have |R((f ◦ g)g′, P, T) − L′| < ε. Fix a partition P of [c, d] with mesh(P) < min{η1, η2}. Write P = {x0, x1, ..., xn}. Let yi = g(xi), and let Q = g(P) = {y0, y1, ..., yn}. Since mesh(P) < η1 we know that mesh(Q) < δ. The mean value theorem applied to g on [xi−1, xi] gives ti ∈ (xi−1, xi) such that

g(xi) − g(xi−1) = g′(ti)(xi − xi−1), i.e. ∆yi = g′(ti)∆xi.

Let ui = g(ti), and set T = (t1, ..., tn) and U = (u1, ..., un). Then (Q, U) is a partition pair of [g(c), g(d)], and

R(f, Q, U) = ∑_{i=1}^n f(ui)∆yi = ∑_{i=1}^n f(g(ti)) g′(ti)∆xi = R((f ◦ g)g′, P, T).

Therefore

|L − L′| ≤ |L − R(f, Q, U)| + |R((f ◦ g)g′, P, T) − L′| < ε + ε = 2ε.

Since ε > 0 was arbitrary, L = L′.

If, on the other hand, g′ < 0 on [c, d], then g(d) < g(c). Note that ∆yi = −g′(ti)∆xi (and i runs backward). But R(f, Q, U) approximates ∫_{g(d)}^{g(c)} f = −L. □
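As a numerical illustration (a sketch, not from the notes; f(y) = y² and g = exp on [0, 1] are arbitrary smooth choices with g′ > 0), both sides of the change of variable formula agree:

```python
# Check ∫_{g(c)}^{g(d)} f(y) dy = ∫_c^d f(g(x)) g'(x) dx numerically.
import math

def integral(func, a, b, n=100000):
    # midpoint-rule approximation
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

f = lambda y: y * y
g, gprime = math.exp, math.exp          # g(x) = e^x, g'(x) = e^x > 0

lhs = integral(f, g(0.0), g(1.0))                         # ∫_1^e y^2 dy
rhs = integral(lambda x: f(g(x)) * gprime(x), 0.0, 1.0)   # ∫_0^1 f(g(x)) g'(x) dx
print(lhs, rhs)   # both ≈ (e^3 - 1)/3
```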

26. The Weierstrass approximation theorem

In this section, we will apply the Riemann integral to prove a classic, and still very important, theorem from the 19th century, on polynomial approximation. The proof relies on a technique called smearing that is very useful in harmonic analysis, and many other areas of mathematics. We will discuss this idea first, and then see about the Weierstrass theorem.

As an introductory example of smearing, consider a continuous function f : R → R. Then f ∈ R[a, b] for any a < b. For n ∈ N we define the nth average of f by

An(x) = (n/2) ∫_{x−1/n}^{x+1/n} f(t) dt, for x ∈ R.

Thus An(x) is the average of f over the interval of length 2/n centered at x. We claim that An converges to f uniformly on any compact interval. For the proof, let [a, b] be the interval, and let ε > 0. Choose δ > 0 as in the definition of uniform continuity for f and ε on [a, b]. Let n > 1/δ. Then for any x ∈ [a, b],

|An(x) − f(x)| = |(n/2) ∫_{x−1/n}^{x+1/n} f(t) dt − f(x) · (n/2) ∫_{x−1/n}^{x+1/n} dt| ≤ (n/2) ∫_{x−1/n}^{x+1/n} |f(t) − f(x)| dt;

for t ∈ [x − 1/n, x + 1/n], |f(x) − f(t)| < ε, so this is

≤ (n/2) ∫_{x−1/n}^{x+1/n} ε dt = ε.
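The uniform convergence of the averages can be watched numerically (a sketch, not from the notes; the test function |x − 1/2|, the grid, and the quadrature size are illustrative choices):

```python
# Measure sup_{x in [0,1]} |A_n(x) - f(x)| for the n-th average
# A_n(x) = (n/2) * integral of f over [x - 1/n, x + 1/n].

def A(f, x, n, m=2000):
    # midpoint rule for the average of f over [x - 1/n, x + 1/n]
    a = x - 1.0 / n
    h = (2.0 / n) / m
    return (n / 2.0) * h * sum(f(a + (k + 0.5) * h) for k in range(m))

f = lambda x: abs(x - 0.5)

def sup_error(n, grid=101):
    pts = [i / (grid - 1) for i in range(grid)]
    return max(abs(A(f, x, n) - f(x)) for x in pts)

errs = [sup_error(n) for n in (5, 20, 80)]
print(errs)   # roughly 1/(2n): decreasing toward 0
```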

Observe that An is constructed from f and the auxiliary function (n/2)χ[−1/n,1/n]:

An(x) = (n/2) ∫_{x−1/n}^{x+1/n} f(t) dt = (n/2) ∫_{−1/n}^{1/n} f(t + x) dt = ∫_R f(t + x) ((n/2)χ[−1/n,1/n])(t) dt.

Let gn = (n/2)χ[−1/n,1/n]. Thus we may express An in the form

(∗) An(x) = ∫_R f(t + x) gn(t) dt.

The sequence of functions (gn) has the following key properties:

(1) gn ≥ 0.
(2) ∫_R gn = 1.
(3) For any δ > 0, lim_{n→∞} ∫_{−δ}^δ gn = 1. (This means that the "mass" of gn concentrates at 0 as n → ∞.)

An argument analogous to the above will work for any sequence of functions having these three properties. Such a sequence is sometimes called an approximate identity, and there are many important examples. Here is an example that we will use to prove the Weierstrass theorem.

Define hn : R → R by

hn(x) = (1 − x²)^n if |x| ≤ 1, and hn(x) = 0 if |x| > 1.

Let cn = ∫_{−1}^1 hn, and let gn = (1/cn)hn. (A sketch of hn, and of gn, will aid in understanding.) It is immediate that this sequence (gn) satisfies the first two properties above. For the third, note that 1 − t² ≥ 1 − t on [0, 1], so (1 − t²)^n ≥ (1 − t)^n, and hence

∫_0^1 (1 − t²)^n dt ≥ ∫_0^1 (1 − t)^n dt = 1/(n + 1),

so that

cn = ∫_{−1}^1 (1 − t²)^n dt ≥ 2/(n + 1).

Thus, for 0 < δ < 1,

∫_δ^1 gn(t) dt = (1/cn) ∫_δ^1 (1 − t²)^n dt ≤ ((n + 1)/2) ∫_δ^1 (1 − δ²)^n dt = ((1 − δ)/2) (n + 1)(1 − δ²)^n → 0

as n → ∞. Similarly, ∫_{−1}^{−δ} gn → 0 as n → ∞. Hence ∫_{−δ}^δ gn → 1.
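Property (3) for this kernel can be checked numerically as well (a sketch, not from the notes; δ = 1/4 and the quadrature size are arbitrary choices):

```python
# The normalized kernels g_n = (1 - t^2)^n / c_n concentrate their mass at 0:
# for fixed delta, the fraction of mass inside [-delta, delta] tends to 1.

def integral(func, a, b, m=20000):
    # midpoint rule
    h = (b - a) / m
    return h * sum(func(a + (k + 0.5) * h) for k in range(m))

def mass_inside(n, delta):
    h = lambda t: (1 - t * t) ** n
    c = integral(h, -1.0, 1.0)              # normalizing constant c_n
    return integral(h, -delta, delta) / c

masses = [mass_inside(n, 0.25) for n in (1, 10, 100)]
print(masses)   # increasing toward 1
```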

Now we will state and prove the Weierstrass approximation theorem.


Theorem 26.1. Let f : [a, b] → R be continuous. There is a sequence of polynomials pn converging uniformly to f on [a, b].

Proof. The proof is easier if we first make some reductions. Suppose we prove the theorem in the case that [a, b] = [0, 1]. Let f : [a, b] → R be continuous. Let ϕ(x) = a + (b − a)x and ψ(x) = (x − a)/(b − a). Then ϕ and ψ are inverses of each other, and ϕ([0, 1]) = [a, b]. Now f ◦ ϕ : [0, 1] → R is continuous, so we are assuming that there are polynomials qn converging to f ◦ ϕ uniformly on [0, 1]. Then qn ◦ ψ converges to f ◦ ϕ ◦ ψ = f uniformly on [a, b], and it is clear that the qn ◦ ψ are also polynomials.

Now suppose that we can prove the theorem in the case that [a, b] = [0, 1] and with the assumption that f(0) = f(1) = 0. Let f : [0, 1] → R be an arbitrary continuous function. Let w(x) = f(0) + (f(1) − f(0))x. Then f − w is continuous on [0, 1], and vanishes at 0 and at 1. By our assumption, there is a sequence qn of polynomials converging to f − w uniformly on [0, 1]. But then qn + w is a sequence of polynomials converging uniformly to f on [0, 1].

The above remarks mean that if we can prove the theorem in the case where [a, b] = [0, 1] and f(0) = f(1) = 0, then we will have proved the theorem in general. So now we consider such a function f. Since f is continuous on the compact set [0, 1], it is bounded. Let |f| ≤ M on [0, 1]. Extend the domain of f to all of R by setting f(t) = 0 for t ∉ [0, 1]. Then f is continuous on R. Let gn(t) = (1/cn)(1 − t²)^n χ[−1,1](t) be as above. For x ∈ [0, 1] define pn(x) by

pn(x) = ∫_R f(t + x) gn(t) dt.

(Note that pn is defined just like An via equation (∗) in our preliminary discussion.)

We now claim that pn converges to f uniformly on [0, 1]. For this, we use the properties of gn as an approximate identity. Let ε > 0. Choose δ > 0 as in the definition of uniform continuity of f on [0, 1]. Notice that because f vanishes outside the interval [0, 1], this δ satisfies the definition of uniform continuity for f on all of R. By property (3) of approximate identities, there is n0 ∈ N such that 1 − ∫_{−δ}^δ gn < ε for n ≥ n0. Now, for any n ≥ n0 and any x ∈ [0, 1], we have

|pn(x) − f(x)| = |∫_R f(t + x) gn(t) dt − f(x) ∫_R gn(t) dt|
  = |∫_R (f(t + x) − f(x)) gn(t) dt|
  ≤ ∫_R |f(t + x) − f(x)| gn(t) dt
  = ∫_{[−δ,δ]} |f(t + x) − f(x)| gn(t) dt + ∫_{[−1,−δ]∪[δ,1]} |f(t + x) − f(x)| gn(t) dt
  = C1 + C2,

where we have restricted the integration to the interval [−1, 1] because gn vanishes outside that interval. We estimate C1 and C2 separately. For |t| ≤ δ we have |f(t + x) − f(x)| < ε, so

C1 ≤ ∫_{−δ}^δ ε gn ≤ ε ∫_R gn = ε.


For |t| > δ we have |f(t + x) − f(x)| ≤ 2M, so

C2 ≤ 2M ∫_{[−1,−δ]∪[δ,1]} gn = 2M (1 − ∫_{−δ}^δ gn) < 2Mε.

Thus |pn(x) − f(x)| < (2M + 1)ε for all x ∈ [0, 1].

We will finish the proof by showing that pn is a polynomial on [0, 1]. For x ∈ [0, 1],

pn(x) = ∫_R f(t + x) gn(t) dt
  = ∫_{−x}^{1−x} f(t + x) gn(t) dt, since f = 0 outside [0, 1],
  = ∫_0^1 f(u) gn(u − x) du, by the change of variable u = x + t,
  = ∫_0^1 f(u) (1/cn)(1 − (u − x)²)^n du, since u − x ∈ [−1, 1] when u ∈ [0, 1].

Note that (1/cn)(1 − (u − x)²)^n is a polynomial in u and x:

(1/cn)(1 − (u − x)²)^n = ∑_{i,j=0}^{2n} aij u^i x^j = ∑_{j=0}^{2n} (∑_{i=0}^{2n} aij u^i) x^j.

It follows that

pn(x) = ∑_{j=0}^{2n} (∫_0^1 f(u) (∑_{i=0}^{2n} aij u^i) du) x^j

is a polynomial in x. □
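The whole construction can be imitated numerically (a sketch, not from the notes; f(x) = x(1 − x), which vanishes at the endpoints, the grid, and the quadrature size are illustrative choices). The uniform error of the approximants p_n shrinks as n grows:

```python
# Build p_n(x) = ∫_0^1 f(u) (1 - (u - x)^2)^n du / c_n and measure sup |p_n - f|.

def integral(func, a, b, m=4000):
    # midpoint rule
    h = (b - a) / m
    return h * sum(func(a + (k + 0.5) * h) for k in range(m))

f = lambda x: x * (1 - x)   # continuous on [0, 1], f(0) = f(1) = 0

def p(n, x):
    c = integral(lambda t: (1 - t * t) ** n, -1.0, 1.0)   # c_n
    return integral(lambda u: f(u) * (1 - (u - x) ** 2) ** n, 0.0, 1.0) / c

def sup_error(n, grid=41):
    pts = [i / (grid - 1) for i in range(grid)]
    return max(abs(p(n, x) - f(x)) for x in pts)

errs = [sup_error(n) for n in (4, 16, 64)]
print(errs)   # decreasing uniform errors
```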

27. Uniform convergence and the interchange of limits

There are many limiting processes in analysis, and it is frequently the case that two of them bump up against each other. We have seen this once already, in Theorem 19.7. Let's recall that statement: "Let f, fn : X → Rk, and let a ∈ X. Suppose that each fn is continuous at a, and that fn → f uniformly. Then f is continuous at a." We can rewrite this in the following way:

lim_{x→a} lim_{n→∞} fn(x) = lim_{n→∞} lim_{x→a} fn(x),

which shows that it is an example of the interchange of two limiting processes. We also saw an example where the above equation does not hold: in that example, the sequence of functions converges pointwise, but not uniformly. It is the uniform nature of the convergence that makes the theorem true. This points out another aspect of such situations: the order of two limiting processes may be reversed if appropriate conditions hold. As a general rule, you should always verify such conditions explicitly when making such an interchange. From the point of view of an instructor, the interchange of two limits in a solution is ALWAYS a red flag, and must be justified in detail by the student.

Theorem 27.1. Let fn ∈ R[a, b], let f : [a, b] → R, and suppose that fn → f uniformly on [a, b]. Then f ∈ R[a, b], and ∫_a^b f = lim_n ∫_a^b fn.


Proof. Let ε > 0 be given. Let η = ε/(1 + 2(b − a)). Choose N such that ‖fn − f‖u < η for n ≥ N. Let n ≥ N. Since fn ∈ R[a, b] there are step functions g0, h0 such that g0 ≤ fn ≤ h0 on [a, b], and ∫_a^b (h0 − g0) < η. Let g = g0 − η and h = h0 + η. Then g and h are step functions. If x ∈ [a, b], we have

g(x) = g0(x) − η ≤ fn(x) − η < f(x) < fn(x) + η ≤ h0(x) + η = h(x).

Moreover,

∫_a^b (h − g) = ∫_a^b (h0 − g0 + 2η) = ∫_a^b (h0 − g0) + 2η(b − a) < η(1 + 2(b − a)) = ε.

It follows that f ∈ R[a, b]. Finally, we have

|∫_a^b fn − ∫_a^b f| ≤ ∫_a^b |fn − f| ≤ η(b − a) < ε,

so we have that lim_{n→∞} ∫_a^b fn = ∫_a^b f. □

It is instructive to find an example of a sequence converging pointwise, but whose integrals do not converge to the integral of the limit function.
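One standard example of this kind (a sketch, not from the notes; this particular sequence is a common textbook choice, not one the notes single out): f_n(x) = n x (1 − x²)^n on [0, 1] converges pointwise to 0, yet ∫_0^1 f_n = n/(2(n + 1)) → 1/2.

```python
# Pointwise limit is 0, but the integrals converge to 1/2, not to 0.

def fn(n, x):
    return n * x * (1 - x * x) ** n

def integral_fn(n, m=200000):
    # midpoint rule for the integral of f_n over [0, 1]
    h = 1.0 / m
    return h * sum(fn(n, (k + 0.5) * h) for k in range(m))

print(fn(200, 0.3))      # tiny: at each fixed x the values shrink to 0
print(integral_fn(50))   # ≈ 50/102 ≈ 0.49, nowhere near 0
```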

We now turn to the derivative of the limit of a sequence of differentiable functions. Here, the situation is more complicated: uniform convergence seems to have nothing to do even with differentiability of the limit, let alone with convergence to the derivative of the limit if the limit is differentiable. For example, we know from the Weierstrass approximation theorem that every continuous function is a uniform limit of polynomials, which are certainly differentiable. But the continuous limit need not be differentiable. This is an indication that we need to assume more to get a theorem analogous to the previous one, but for derivatives. The key idea is that we have to assume that the sequence of derivatives converges uniformly. Actually, even though this is a very strong hypothesis, it is not quite enough. For example, any sequence of constant functions has derivatives that converge uniformly (since they are all identically zero), but the sequence of functions need not converge at all.

Theorem 27.2. Let I be an interval, and let fn : I → R be differentiable. Suppose that (f′n) converges uniformly on I to a function g. Suppose additionally that there is a ∈ I such that the sequence of function values (fn(a)) converges. Then (fn) converges to a differentiable function f, and f′ = g. Moreover, the convergence of fn to f is uniform on any bounded subinterval of I.

Proof. We first show that there is a function f to which (fn) converges, and that this convergence is uniform on bounded subintervals. Let ε > 0 be given. Choose N so that ‖f′n − f′m‖u < ε and |fn(a) − fm(a)| < ε for all m, n ≥ N. Let J ⊆ I be a bounded subinterval; say |x| ≤ M for x ∈ J. If x ∈ J, then by the mean value theorem applied to fn − fm there is c between x and a such that

|fn(x) − fm(x)| = |(fn − fm)(x)|
  ≤ |(fn − fm)(x) − (fn − fm)(a)| + |fn(a) − fm(a)|
  = |(fn − fm)′(c)| |x − a| + |fn(a) − fm(a)|
  ≤ ε|x − a| + ε = ε(|x − a| + 1) ≤ ε(M + |a| + 1).

Thus (fn) is uniformly Cauchy on J, and hence converges uniformly on J (and pointwise on all of I too).

Let f be the limit of fn. We now show that f is differentiable, and that f′ = g. Let ε > 0. Choose N so that ‖f′n − f′m‖u < ε/3 for m, n ≥ N. Letting m → ∞, we see also that ‖f′n − g‖u ≤ ε/3. Now fix n ≥ N, and fix x ∈ I. For any h ≠ 0 such that x + h ∈ I, we have

|(fn(x + h) − fn(x))/h − (f(x + h) − f(x))/h|
  = lim_{m→∞} |(fn(x + h) − fn(x))/h − (fm(x + h) − fm(x))/h|
  = lim_{m→∞} |((fn − fm)(x + h) − (fn − fm)(x))/h|
  = lim_{m→∞} |(fn − fm)′(x + θh)|, for some θ ≡ θ(n, m, x, h) ∈ (0, 1),
  ≤ ε/3.

Next we use the differentiability of fn (at x) to choose δ > 0 such that

|(fn(x + h) − fn(x))/h − f′n(x)| < ε/3

whenever 0 < |h| < δ and x + h ∈ I. Then for such h we have

|(f(x + h) − f(x))/h − g(x)|
  ≤ |(f(x + h) − f(x))/h − (fn(x + h) − fn(x))/h| + |(fn(x + h) − fn(x))/h − f′n(x)| + |f′n(x) − g(x)|
  < ε/3 + ε/3 + ε/3 = ε.

Therefore lim_{h→0} (f(x + h) − f(x))/h = g(x). Hence f is differentiable, and f′(x) = g(x). □

As a last example of the interchange of two limiting processes, we give a result on differentiating an integral. For this we recall from earlier experience the notion of partial derivative. Let f : [a, b] × [c, d] → R, and suppose that for each y ∈ [c, d] the function x ↦ f(x, y) is differentiable on [a, b]. The partial derivative of f with respect to x is defined by

(∂f/∂x)(x, y) = lim_{h→0} (f(x + h, y) − f(x, y))/h.

Theorem 27.3. Let f : [a, b] × [c, d] → R be continuous, and suppose that ∂f/∂x exists and is continuous on [a, b] × [c, d]. Let G : [a, b] → R be defined by G(x) = ∫_c^d f(x, y) dy. Then G is differentiable on [a, b], and G′(x) = ∫_c^d (∂f/∂x)(x, y) dy.


Proof. Let ε > 0. Since ∂f/∂x is continuous on the compact set [a, b] × [c, d], it is uniformly continuous. Let δ > 0 be as in the definition of uniform continuity for ∂f/∂x on [a, b] × [c, d] and for the positive quantity ε/(d − c). Now if x, x + h ∈ [a, b] with 0 < |h| < δ, then

|(G(x + h) − G(x))/h − ∫_c^d (∂f/∂x)(x, y) dy|
  = |∫_c^d ((f(x + h, y) − f(x, y))/h − (∂f/∂x)(x, y)) dy|
  = |∫_c^d ((∂f/∂x)(x + θh, y) − (∂f/∂x)(x, y)) dy|, for some θ ≡ θ(x, y, h) ∈ (0, 1) given by the mean value theorem,
  ≤ (d − c) · ε/(d − c) = ε.

It follows that G′(x) = ∫_c^d (∂f/∂x)(x, y) dy. □
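A numerical check (a sketch, not from the notes; f(x, y) = sin(xy) on [0, 1] × [0, 1] and the step sizes are arbitrary smooth choices): the difference quotient of G matches the integral of ∂f/∂x = y cos(xy).

```python
# Differentiation under the integral sign, checked numerically.
import math

def integral(func, a, b, m=20000):
    # midpoint rule
    h = (b - a) / m
    return h * sum(func(a + (k + 0.5) * h) for k in range(m))

def G(x):
    return integral(lambda y: math.sin(x * y), 0.0, 1.0)

x0, h = 0.6, 1e-4
numeric = (G(x0 + h) - G(x0 - h)) / (2 * h)                  # difference quotient of G
exact = integral(lambda y: y * math.cos(x0 * y), 0.0, 1.0)   # ∫ ∂f/∂x (x0, y) dy
print(numeric, exact)
```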

28. Infinite series

Definition 28.1. Let (an)_{n∈N} be a sequence in R. The infinite series ∑_{n=1}^∞ an is defined as follows. For each n let sn = ∑_{i=1}^n ai; sn is called the nth partial sum of the series. The infinite series is the sequence (sn)_{n∈N} of partial sums. The series ∑_{n=1}^∞ an converges (diverges) if the sequence of partial sums converges (diverges). The sum of a convergent series is the limit of the sequence of partial sums. The sum is usually denoted by ∑_{n=1}^∞ an.

Remark 28.2. Frequently, an infinite series uses N ∪ {0} as index set. Other intervals in Z are also used.

Remark 28.3. The Cauchy criterion for convergence of real sequences can be translated into the following criterion for convergence of series: ∑ an converges if and only if for every ε > 0, there exists n0 ∈ N such that for all n0 ≤ m ≤ n we have |∑_{i=m}^n ai| < ε.

Theorem 28.4. (Test for divergence.) Let ∑_{n=1}^∞ an be an infinite series. If the series converges, then lim_{n→∞} an = 0.

Proof. Suppose that ∑_{n=1}^∞ an converges. Then, since the limits of (sn) and (sn−1) both exist,

lim_{n→∞} an = lim_{n→∞} (sn − sn−1) = lim_{n→∞} sn − lim_{n→∞} sn−1 = ∑_{n=1}^∞ an − ∑_{n=1}^∞ an = 0. □

Example 28.5. (1) The series ∑_{n=0}^∞ (−1)^n diverges, since lim_{n→∞} (−1)^n does not exist (and hence is not equal to zero).

(2) The series ∑_{n=1}^∞ n^{−1/n} diverges, since lim_{n→∞} n^{−1/n} = 1/(lim_{n→∞} n^{1/n}) = 1 is nonzero.

Proposition 28.6. If ∑ an and ∑ bn converge, then so does ∑ (λan + µbn), and ∑ (λan + µbn) = λ ∑ an + µ ∑ bn.

Proof. This follows immediately from the corresponding results for sequences. □


Theorem 28.7. (Geometric series.) Let x ∈ R. The series ∑_{n=0}^∞ x^n converges if and only if |x| < 1, in which case its sum is 1/(1 − x). (The number x is called the ratio of the geometric series.)

Proof. First suppose that |x| < 1. Note that

x sn = ∑_{i=1}^{n+1} x^i = sn + x^{n+1} − 1,

and hence

sn = (1 − x^{n+1})/(1 − x).

Since |x| < 1, we have lim_{n→∞} x^{n+1} = 0. Thus lim_{n→∞} sn = 1/(1 − x).

Conversely, if |x| ≥ 1, then (x^n) does not converge to 0, so the series diverges by the test for divergence. □
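Numerically (a sketch, not from the notes; the ratio 0.9 and the cutoff are arbitrary), the partial sums match the closed form (1 − x^{n+1})/(1 − x) and approach 1/(1 − x):

```python
# Partial sums of the geometric series versus the closed form and the limit.

def partial_sum(x, n):
    # s_n = sum of x^i for i = 0, ..., n
    return sum(x ** i for i in range(n + 1))

x = 0.9
s = partial_sum(x, 200)
closed = (1 - x ** 201) / (1 - x)
print(s, closed, 1 / (1 - x))   # all ≈ 10
```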

Remark 28.8. Generally, it is very difficult to find the sum of a convergent infinite series. Often we content ourselves with being able to prove convergence (or divergence). Geometric series are one of the exceptions. For the next family of series, we will not worry about the value of the sum. First we make an observation about series with nonnegative terms.

Lemma 28.9. Let ∑_{n=1}^∞ an be an infinite series, and suppose that an ≥ 0 for all n. Then the series converges if and only if the sequence of partial sums is bounded.

Proof. The sequence of partial sums is increasing, since sn+1 = sn + an+1 ≥ sn. We already know that a monotone sequence converges if and only if it is bounded. □

Theorem 28.10. (p-series.) Let p ∈ R. The series ∑_{n=1}^∞ 1/n^p converges if and only if p > 1.

Proof. If p ≤ 0, then the terms of the series do not decrease, so 1/n^p cannot converge to zero, and the series diverges. Now assume that p > 0. In this case the terms of the series are decreasing. Let an = 1/n^p, and consider two cases. First, suppose that p > 1. We have that

∑_{i=1}^{2^n − 1} ai = a1 + (a2 + a3) + (a4 + ··· + a7) + ··· + (a_{2^{n−1}} + ··· + a_{2^n − 1}) ≤ a1 + 2a2 + 4a4 + ··· + 2^{n−1} a_{2^{n−1}},

since the terms are decreasing. Hence

s_{2^n − 1} ≤ ∑_{j=0}^{n−1} 2^j a_{2^j} = ∑_{j=0}^{n−1} 2^{j − jp} = ∑_{j=0}^{n−1} (2^{1−p})^j.

This last is the partial sum of a geometric series with ratio 2^{1−p}. Since p > 1, the ratio is less than 1, and hence the geometric series converges. It follows that the partial sums of ∑ 1/n^p are bounded, hence it converges.

Next we suppose that 0 < p ≤ 1. We have that

∑_{i=1}^{2^n} ai = a1 + a2 + (a3 + a4) + (a5 + ··· + a8) + ··· + (a_{2^{n−1}+1} + ··· + a_{2^n}) ≥ a2 + 2a4 + 4a8 + ··· + 2^{n−1} a_{2^n},

again since the terms are decreasing. Hence

s_{2^n} ≥ ∑_{j=0}^{n−1} 2^j a_{2^{j+1}} = ∑_{j=0}^{n−1} 2^{j − (j+1)p} = 2^{−p} ∑_{j=0}^{n−1} (2^{1−p})^j.

Since p ≤ 1, the ratio of this last geometric series is 2^{1−p} ≥ 1, hence it diverges. It follows that the partial sums of ∑ 1/n^p are unbounded, so it diverges also. □
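The dichotomy is visible numerically (a sketch, not from the notes; the cutoffs are arbitrary): for p = 2 the partial sums stabilize near π²/6, while for p = 1 they keep growing like log n.

```python
# Partial sums of the p-series for p = 2 (convergent) and p = 1 (divergent).

def partial_sum(p, n):
    return sum(1.0 / k ** p for k in range(1, n + 1))

print(partial_sum(2, 10000))    # ≈ 1.6448, near pi^2/6 ≈ 1.6449
print(partial_sum(1, 10000))    # ≈ 9.79 and still climbing
growth = partial_sum(1, 100000) - partial_sum(1, 10000)
print(growth)                   # ≈ ln(10) ≈ 2.30: no sign of settling
```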

Example 28.11. We particularly draw attention to the case p = 1: ∑_{n=1}^∞ 1/n is called the harmonic series, and it diverges.

Having a few series whose behavior is known makes it relatively easy to establish convergence or divergence of other series.

Theorem 28.12. (Comparison test.) Let ∑_{n=1}^∞ an and ∑_{n=1}^∞ bn be series, and suppose that an, bn ≥ 0 for all n. Suppose further that an ≤ bn for all n.

(1) If ∑_{n=1}^∞ bn converges, then so does ∑_{n=1}^∞ an.
(2) If ∑_{n=1}^∞ an diverges, then so does ∑_{n=1}^∞ bn.

(In fact, both conclusions hold if the corresponding inequalities are valid only for n ≥ n0.)

Proof. Let sn = ∑_{i=1}^n ai and tn = ∑_{i=1}^n bi. Since ai ≤ bi for all i, we know that sn ≤ tn for all n. If ∑_{n=1}^∞ bn converges, then (tn) is bounded above, and hence so is (sn). Therefore ∑_{n=1}^∞ an converges. If ∑_{n=1}^∞ an diverges, then (sn) is unbounded, and hence so is (tn). It follows that ∑_{n=1}^∞ bn diverges. □

Definition 28.13. Let ∑_{n=1}^∞ an be a convergent infinite series. It converges absolutely if the series ∑_{n=1}^∞ |an| converges. It converges conditionally if ∑_{n=1}^∞ |an| diverges.

Thus all infinite series may be classified into three (mutually exclusive) types: absolutely convergent, conditionally convergent, and divergent. Most of the previous results concern series whose terms are nonnegative; hence they are useful for establishing absolute convergence. It is appropriate to think of absolute convergence as "robust", and of conditional convergence as "touchy". In this course we will only treat absolute convergence (for lack of time). However, a few remarks about conditional convergence are in order. The simplest example of a conditionally convergent series is the alternating harmonic series:

1/1 − 1/2 + 1/3 − 1/4 + ···.

It is an easy exercise to prove that this series converges. Since the corresponding series of absolute values is the harmonic series, which we already know diverges, the alternating harmonic series is conditionally convergent. The difference between absolute and conditional convergence can be illustrated by the fact that the sum of the alternating harmonic series can be altered by changing the order in which the terms are added. This somewhat counter-intuitive fact can be explained by noticing that the "subseries" of odd terms by itself is divergent, as is the subseries of even terms. This phenomenon is completely general: the sum of an absolutely convergent series is unaffected by rearranging the terms, while the sum, and even convergence, of a conditionally convergent series can be changed arbitrarily by a suitable rearrangement of the terms. We will not discuss this theorem further.
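The rearrangement phenomenon can be seen directly (a sketch, not from the notes; the specific rearrangement, one positive term followed by two negative ones, is a standard illustrative choice): the usual order sums to ln 2, while the rearranged series sums to (1/2) ln 2.

```python
# Rearranging the alternating harmonic series changes its sum.

def usual(n_terms):
    # 1 - 1/2 + 1/3 - 1/4 + ...
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def one_pos_two_neg(n_blocks):
    # 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ...: block j is 1/(2j-1) - 1/(4j-2) - 1/(4j)
    total = 0.0
    for j in range(1, n_blocks + 1):
        total += 1.0 / (2 * j - 1) - 1.0 / (4 * j - 2) - 1.0 / (4 * j)
    return total

s1 = usual(200000)
s2 = one_pos_two_neg(100000)
print(s1, s2)   # ≈ 0.6931 (= ln 2) and ≈ 0.3466 (= (1/2) ln 2)
```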

The next two theorems give the most useful tests for absolute convergence.


Theorem 28.14. (Ratio test.) Let ∑_{n=1}^∞ an be a series of nonzero terms. Let

L = lim sup_{n→∞} |an+1/an|.

(1) If L < 1 then the series converges absolutely.
(2) If there exists n0 ∈ N such that |an+1/an| ≥ 1 for all n ≥ n0, then the series diverges.

Proof. (1) Choose r such that L < r < 1. Let n0 ∈ N be such that |an+1/an| ≤ r for n ≥ n0. Then for k ≥ 0 we have

|a_{n0+k}| ≤ r |a_{n0+k−1}| ≤ r² |a_{n0+k−2}| ≤ ··· ≤ r^k |a_{n0}|.

It follows that for n ≥ n0 we have |an| ≤ C r^n, where C = |a_{n0}| / r^{n0}. The absolute convergence of ∑ an now follows by comparison with the geometric series of ratio r.

(2) In this case, the hypotheses imply that |an| ≥ |a_{n0}| > 0 for all n ≥ n0, and hence an does not tend to zero. □

Theorem 28.15. (Root test.) Let ∑_{n=1}^∞ an be a series. Let

L = lim sup_{n→∞} |an|^{1/n}.

(1) If L < 1 then the series converges absolutely.
(2) If L > 1 then the series diverges.

Proof. (1) Choose r such that L < r < 1. Let n0 ∈ N be such that |an|^{1/n} ≤ r for n ≥ n0. Then |an| ≤ r^n for n ≥ n0. The absolute convergence of ∑ an now follows by comparison with the geometric series of ratio r.

(2) Since L > 1, there are infinitely many n with |an|^{1/n} > 1, and hence |an| > 1 for infinitely many n. Therefore an does not tend to zero. □

The ratio and root tests are useful in situations where the series converges at least as strongly as some geometric series. Note that both tests are inconclusive for the p-series (exercise!). The ratio test is usually easier to apply, but the root test is more effective: if the ratio test indicates convergence, then the root test does too. There are series for which the ratio test is inconclusive, but the root test indicates convergence (exercises).
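A concrete illustration of the last remark (a sketch, not from the notes; the sequence a_n = 2^{−n+(−1)^n} is a standard example of this phenomenon): the consecutive ratios oscillate between 1/8 and 2, so the ratio test is inconclusive, while the n-th roots approach 1/2, so the root test gives absolute convergence.

```python
# Ratio test inconclusive, root test conclusive, for a_n = 2^(-n + (-1)^n).

def a(n):
    return 2.0 ** (-n + (-1) ** n)

ratios = [a(n + 1) / a(n) for n in range(1, 30)]
roots = [a(n) ** (1.0 / n) for n in range(1, 30)]
print(min(ratios), max(ratios))   # 0.125 and 2.0: limsup of ratios is 2 > 1
print(roots[-1])                  # ≈ 0.49, approaching 1/2 < 1
```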

29. Series of functions

Definition 29.1. Let $X$ be a metric space, and let $f_n : X \to \mathbb{R}$. The series of functions $\sum_{n=1}^\infty f_n$ converges pointwise (uniformly) if the sequence of partial sums converges pointwise (uniformly).

It is easy to translate results about convergence of sequences of functions to statements about series of functions.

Theorem 29.2. (Cauchy criterion.) $\sum_{n=1}^\infty f_n$ converges uniformly on $X$ if and only if for every $\varepsilon > 0$ there exists $n_0 \in \mathbb{N}$ such that for all $n \ge m \ge n_0$ and for all $x \in X$, $\left| \sum_{i=m}^n f_i(x) \right| < \varepsilon$.

The following criterion for uniform convergence is called the Weierstrass M-test:


Corollary 29.3. Let $\sum_{n=1}^\infty f_n$ be a series of functions on $X$. Let $(M_n)_{n=1}^\infty$ be a sequence of constants such that $|f_n(x)| \le M_n$ for all $x \in X$ and all $n$. If $\sum_{n=1}^\infty M_n$ converges, then $\sum_{n=1}^\infty f_n$ converges uniformly.
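A quick numerical sketch of the M-test; the series $f_n(x) = \sin(nx)/n^2$ with $M_n = 1/n^2$ is my own illustration. The point is that a block of the series is bounded, uniformly in $x$, by the corresponding block of $\sum M_n$.

```python
import math

# Weierstrass M-test illustration: f_n(x) = sin(nx)/n^2, M_n = 1/n^2.
# A block sum of the f_n is bounded in absolute value, uniformly in x,
# by the same block of the convergent series sum M_n.
xs = [k * 0.01 for k in range(629)]          # samples of [0, 2*pi]

def block(x, m, n):                          # sum_{j=m}^{n} f_j(x)
    return sum(math.sin(j * x) / j**2 for j in range(m, n + 1))

sup_block = max(abs(block(x, 50, 150)) for x in xs)
M_block = sum(1.0 / j**2 for j in range(50, 151))
print(sup_block <= M_block)   # True: the bound does not depend on x
```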

We also have the following facts as immediate corollaries of the theorems on uniform convergence.

Theorem 29.4. Let $f_n : X \to \mathbb{R}$ be functions.
(1) If $f_n$ is continuous for all $n$, and $\sum f_n$ converges uniformly, then $\sum f_n$ is continuous.
(2) If $X = [a,b]$, $f_n \in \mathcal{R}[a,b]$ for all $n$, and $\sum f_n$ converges uniformly, then $\sum f_n \in \mathcal{R}[a,b]$, and $\int_a^b \sum f_n = \sum \int_a^b f_n$.
(3) If $X = [a,b]$, $f_n$ is differentiable for all $n$, $\sum f_n'$ converges uniformly, and $\sum f_n(x_0)$ converges for some $x_0 \in [a,b]$, then $\sum f_n$ is differentiable, and $\left( \sum f_n \right)' = \sum f_n'$.

30. Power series

Definition 30.1. Let $x_0 \in \mathbb{R}$ and let $(a_n)_{n=0}^\infty$ be a real sequence. The series $\sum_{n=0}^\infty a_n(x-x_0)^n$ is called a power series with center $x_0$. The domain of convergence of the power series is the set $D = \left\{ x \in \mathbb{R} : \sum_{n=0}^\infty a_n(x-x_0)^n \text{ converges} \right\}$.

Theorem 30.2. Let $\sum_{n=0}^\infty a_n(x-x_0)^n$ be a power series. There are three possibilities.
(1) The series converges absolutely for all $x \in \mathbb{R}$. The convergence is uniform on compact sets.
(2) There is $R > 0$ such that the series converges absolutely if $|x-x_0| < R$, and diverges if $|x-x_0| > R$. The convergence is uniform on compact subsets of the interval $(x_0-R, x_0+R)$.
(3) The series diverges for all $x \ne x_0$.

Proof. Let $x, y \in \mathbb{R}$ with $|y-x_0| < |x-x_0|$. First suppose that the series converges at $x$. Then $\lim_{n\to\infty} a_n(x-x_0)^n = 0$, so there is $M$ such that $|a_n(x-x_0)^n| \le M$ for all $n$. Now we have
\[ |a_n(y-x_0)^n| = |a_n(x-x_0)^n| \left| \frac{y-x_0}{x-x_0} \right|^n. \]
Thus $\sum a_n(y-x_0)^n$ converges absolutely by comparison with the geometric series $\sum M r^n$, where $r = |y-x_0|/|x-x_0| < 1$. The contrapositive of the above implication shows that, on the other hand, if the series diverges at $y$, then it must also diverge at $x$.

Now, if cases (1) and (3) do not apply, let $R = \sup\{|x-x_0| : \text{the series converges at } x\}$. It follows from the above that $0 < R < \infty$, and that the series converges absolutely for $|x-x_0| < R$ and diverges for $|x-x_0| > R$. We let $R = \infty$ in case (1), and we let $R = 0$ in case (3). In cases (1) and (2), if $K \subseteq \{x : |x-x_0| < R\}$ is a compact set, then there are $r, s$ such that $0 < r < s < R$, and $K \subseteq (x_0-r, x_0+r)$. Let $x = x_0 + s$, and let $M$ be as above: $|a_n(x-x_0)^n| \le M$ for all $n$. Let $M_n = M(r/s)^n$. Then $\sum_n M_n < \infty$. For any $y \in K$ we have
\[ |a_n(y-x_0)^n| = |a_n(x-x_0)^n| \left| \frac{y-x_0}{x-x_0} \right|^n \le M \left( \frac{r}{s} \right)^n = M_n. \]
Then the convergence is uniform on $K$ by the Weierstrass M-test. □

Definition 30.3. The number $R$ in Theorem 30.2 is called the radius of convergence of the power series.


We see that $(x_0-R, x_0+R) \subseteq D \subseteq [x_0-R, x_0+R]$, where $D$ is the domain of convergence of the power series.

Theorem 30.4. $R = \left( \limsup_{n\to\infty} \sqrt[n]{|a_n|} \right)^{-1}$.

Proof. We use the root test:
\[ \limsup_{n\to\infty} |a_n(x-x_0)^n|^{1/n} = \limsup_{n\to\infty} |a_n|^{1/n} |x-x_0|. \]
This is less than 1 if $|x-x_0| < \left( \limsup_{n\to\infty} \sqrt[n]{|a_n|} \right)^{-1}$, and greater than 1 if $|x-x_0| > \left( \limsup_{n\to\infty} \sqrt[n]{|a_n|} \right)^{-1}$. This identifies the number $R$ as in the statement of the theorem. □

Theorem 30.5. Let $\sum a_n(x-x_0)^n$ have a positive radius of convergence $R$. Then $\sum n a_n(x-x_0)^{n-1}$ and $\sum \bigl( a_n/(n+1) \bigr)(x-x_0)^{n+1}$ also have radius of convergence $R$. If $f : (x_0-R, x_0+R) \to \mathbb{R}$ is defined by $f(x) = \sum_{n=0}^\infty a_n(x-x_0)^n$, then these converge to $f'(x)$ and $F(x)$ for $x \in (x_0-R, x_0+R)$ (where $F(x) = \int_{x_0}^x f(t)\,dt$).

Proof. For $x \ne x_0$,
\[ \sum_{n=0}^\infty n a_n (x-x_0)^{n-1} = \sum_{n=0}^\infty \frac{n a_n}{x-x_0} (x-x_0)^n. \]
Since $\lim_n n^{1/n} = \lim_n |x-x_0|^{1/n} = 1$, we have
\[ \limsup_{n\to\infty} \left| \frac{n a_n}{x-x_0} \right|^{1/n} = \limsup_{n\to\infty} |a_n|^{1/n} = R^{-1}. \]
For the antidifferentiated series, first note that
\[ 1 \le (n+1)^{1/n} = \left( 1 + \frac{1}{n} \right)^{1/n} n^{1/n} \le 2^{1/n} n^{1/n} \to 1 \quad \text{as } n \to \infty. \]
Now we have
\[ \sum_{n=0}^\infty \frac{a_n}{n+1} (x-x_0)^{n+1} = \sum_{n=0}^\infty \left( \frac{a_n(x-x_0)}{n+1} \right) (x-x_0)^n, \]
and hence
\[ \limsup_{n\to\infty} \left| \frac{a_n(x-x_0)}{n+1} \right|^{1/n} = \limsup_{n\to\infty} |a_n|^{1/n} = R^{-1}. \]
Since the convergence of all three series is uniform on compact subsets of the interior of the domain of convergence, we may differentiate or integrate term-by-term, by Theorem 29.4. □
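As a quick sanity check of term-by-term differentiation, using the geometric series (not a series from this section): differentiating $\sum_{n \ge 0} x^n = 1/(1-x)$ termwise gives $\sum_{n \ge 1} n x^{n-1} = 1/(1-x)^2$ on $(-1,1)$.

```python
# Term-by-term differentiation of the geometric series at x = 0.5:
# sum n*x^(n-1) should equal 1/(1-x)^2 = 4 there.
x = 0.5
deriv_series = sum(n * x ** (n - 1) for n in range(1, 200))
print(deriv_series)   # approximately 4.0
```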

31. Compactness in function space

Let $X$ be a compact metric space (we will only consider the case where $X$ is compact). Recall that $C(X,\mathbb{R}^k)$ is a complete metric space with the uniform norm: $\|f-g\|_u = \sup_{x \in X} \|f(x)-g(x)\|$. It is important to remember that the Heine-Borel theorem does not hold in this setting: it is possible for a closed bounded set to be non-compact. In particular, the closed unit ball $B = \{f \in C(X,\mathbb{R}^k) : \|f\|_u \le 1\}$ is (usually) not compact.

Here is an example, with $X = [0,1]$. Let $f_n(x) = x^n$. Then $f_n \in B$. If $B$ were compact, then the sequence $(f_n)$ in $B$ would have a convergent subsequence, that is, a uniformly convergent subsequence. But we have already seen that there is no such subsequence, since the pointwise limit of $f_n$ exists and is discontinuous.
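The failure of compactness can also be seen numerically; the computation below is my own illustration, with sampled maxima standing in for the true uniform norm. Since $\sup_x |x^n - x^{2n}| = 1/4$ for every $n$ (substitute $t = x^n$ and maximize $t - t^2$), and any subsequence of $(f_n)$ contains pairs of indices $n, m$ with $m \ge 2n$, no subsequence is uniformly Cauchy.

```python
# ||f_n - f_{2n}||_u for f_n(x) = x^n on [0,1]: the sup of x^n - x^(2n)
# equals 1/4 for every n, so the f_n stay uniformly far apart.
def unif_dist(n, m, samples=10001):
    return max(abs((k / (samples - 1)) ** n - (k / (samples - 1)) ** m)
               for k in range(samples))

print([round(unif_dist(n, 2 * n), 3) for n in (1, 5, 20)])   # [0.25, 0.25, 0.25]
```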


We will characterize the compact subsets of $C(X,\mathbb{R}^k)$. Recall that a set is compact if and only if it is complete and totally bounded. Since $C(X,\mathbb{R}^k)$ is already a complete metric space, a subset is complete if and only if it is closed. Therefore we will focus our attention on the property of total boundedness: how can we describe in a more intrinsic way what it means for a subset of $C(X,\mathbb{R}^k)$ to be totally bounded?

Let $\mathcal{F} \subseteq C(X,\mathbb{R}^k)$ be totally bounded. Let $\varepsilon > 0$. Then there are $f_1, \ldots, f_n \in \mathcal{F}$ such that $\mathcal{F} \subseteq \bigcup_{i=1}^n B_\varepsilon(f_i)$. Since $X$ is compact, and the $f_i$ are continuous, they are uniformly continuous. Thus for each $i$ there is $\delta_i > 0$ such that for all $x, y \in X$, if $d(x,y) < \delta_i$ then $\|f_i(x) - f_i(y)\| < \varepsilon$. Let $\delta = \min\{\delta_1, \ldots, \delta_n\}$. We claim that for any function $f \in \mathcal{F}$, this $\delta$ works in the definition of uniform continuity (with $\varepsilon$ replaced by $3\varepsilon$). To see this, let $f \in \mathcal{F}$, and let $x, y \in X$ with $d(x,y) < \delta$. There is $i_0$, $1 \le i_0 \le n$, such that $\|f - f_{i_0}\|_u < \varepsilon$. Then
\[ \|f(x) - f(y)\| \le \|f(x) - f_{i_0}(x)\| + \|f_{i_0}(x) - f_{i_0}(y)\| + \|f_{i_0}(y) - f(y)\| < \varepsilon + \varepsilon + \varepsilon = 3\varepsilon. \]
Thus we have shown that the functions in the family $\mathcal{F}$ are "equally uniformly continuous". This phrase has been shortened to "equicontinuous".

Definition 31.1. Let $\mathcal{F}$ be a family of functions between metric spaces $X$ and $Y$. Let $x_0 \in X$.
(1) $\mathcal{F}$ is equicontinuous at $x_0$ if for each $\varepsilon > 0$ there is $\delta > 0$ such that for each $f \in \mathcal{F}$ and for all $x \in X$, if $d_X(x,x_0) < \delta$ then $d_Y(f(x), f(x_0)) < \varepsilon$. (I.e., $\delta$ is independent of the choice of $f \in \mathcal{F}$.)
(2) $\mathcal{F}$ is equicontinuous (on $X$) if it is equicontinuous at each point of $X$.
(3) $\mathcal{F}$ is uniformly equicontinuous (on $X$) if for each $\varepsilon > 0$ there is $\delta > 0$ such that for each $f \in \mathcal{F}$ and for all $x, z \in X$, if $d_X(x,z) < \delta$ then $d_Y(f(x), f(z)) < \varepsilon$.
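For contrast, here is a numerical sketch (my own example) of a family that is not equicontinuous at a point: for $f_n(x) = x^n$ on $[0,1]$ and $x_0 = 1$, no single $\delta$ can work for every $n$, since $|f_n(1) - f_n(1-\delta)| = 1 - (1-\delta)^n$ is nearly $1$ once $n$ is large.

```python
# The family {x^n} is not equicontinuous at x0 = 1: for each delta we can
# pick n with 1 - (1 - delta)^n close to 1, defeating any choice of delta.
for delta in (0.1, 0.01, 0.001):
    n = int(10 / delta)                          # n large compared to 1/delta
    print(round(1.0 - (1.0 - delta) ** n, 3))    # close to 1 each time
```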

Exercise 31.2. If $X$ is compact, and $\mathcal{F}$ is equicontinuous, then $\mathcal{F}$ is uniformly equicontinuous.

Because of this exercise, when $X$ is compact we need not distinguish between equicontinuity and uniform equicontinuity. We remark that there are stupid examples of equicontinuous families. For example, in $C(X,\mathbb{R})$, we may consider the family of all constant functions. This family is clearly equicontinuous, but is not totally bounded (or even bounded). For this reason we identify another property of a family of functions.

Definition 31.3. $\mathcal{F} \subseteq C(X,\mathbb{R}^k)$ is pointwise bounded if for each $x \in X$, the set $\mathcal{F}(x) := \{f(x) : f \in \mathcal{F}\}$ is a bounded subset of $\mathbb{R}^k$.

Exercise 31.4. If $\mathcal{F} \subseteq C(X,\mathbb{R}^k)$ is pointwise bounded and equicontinuous, then $\mathcal{F}$ is a bounded subset (of $C(X,\mathbb{R}^k)$).

We remark that a totally bounded subset of $C(X,\mathbb{R}^k)$ is also bounded, and hence pointwise bounded. Thus we have already proved the following result.

Lemma 31.5. Let $X$ be compact and $\mathcal{F} \subseteq C(X,\mathbb{R}^k)$. If $\mathcal{F}$ is totally bounded, then $\mathcal{F}$ is pointwise bounded and equicontinuous.

The Arzelà-Ascoli theorem is the converse of the lemma. It is usually phrased in terms of precompactness: a subset of a metric space is precompact if its closure is compact. In the setting of $C(X,\mathbb{R}^k)$, then, precompactness is the same as total boundedness.

Theorem 31.6. Let $X$ be a compact metric space, and let $\mathcal{F} \subseteq C(X,\mathbb{R}^k)$. Then $\mathcal{F}$ is precompact if and only if it is pointwise bounded and equicontinuous.


Proof. As remarked above, we have already proved the "only if" direction. So we assume that $\mathcal{F}$ is pointwise bounded and equicontinuous. We use Exercise 31.2; hence $\mathcal{F}$ is uniformly equicontinuous. Let $\varepsilon > 0$. Choose $\delta > 0$ as in the definition of uniform equicontinuity of $\mathcal{F}$. Since $X$ is compact, $X$ is totally bounded. Then there are $x_1, \ldots, x_p \in X$ such that $X = \bigcup_{i=1}^p B_\delta(x_i)$. Now we use the pointwise boundedness of $\mathcal{F}$. For each $i$, the set $\mathcal{F}(x_i) = \{f(x_i) : f \in \mathcal{F}\}$ is a bounded subset of $\mathbb{R}^k$, hence is totally bounded (by Lemma 14.27). Then the union $\bigcup_{i=1}^p \mathcal{F}(x_i)$ is also totally bounded. So we can choose points $y_1, \ldots, y_q \in \mathbb{R}^k$ such that
\[ \bigcup_{i=1}^p \mathcal{F}(x_i) \subseteq \bigcup_{j=1}^q B_\varepsilon(y_j). \]
Now we come to the interesting part of the argument. Let $f \in \mathcal{F}$. For each $i$, choose $j$ such that $f(x_i) \in B_\varepsilon(y_j)$. This defines a function $\eta_f : \{1, 2, \ldots, p\} \to \{1, 2, \ldots, q\}$. Thus $\eta_f$ satisfies the formula
\[ f(x_i) \in B_\varepsilon(y_{\eta_f(i)}). \]
But notice that there are only a finite number of possible functions $\eta : \{1, 2, \ldots, p\} \to \{1, 2, \ldots, q\}$. For each such function $\eta$, let
\[ C_\eta = \{f \in \mathcal{F} : \eta_f = \eta\}. \]
Then $\mathcal{F} \subseteq \bigcup_\eta C_\eta$, a finite union. Each $C_\eta$ is a subset of $C(X,\mathbb{R}^k)$. To finish the proof, we will show that $C_\eta$ has diameter at most $4\varepsilon$. Let $f, g \in C_\eta$, for some $\eta$. Then for $i = 1, \ldots, p$, we have $f(x_i), g(x_i) \in B_\varepsilon(y_{\eta(i)})$. For any $x \in X$ choose $i$ with $x \in B_\delta(x_i)$. Then
\[ \|f(x) - g(x)\| \le \|f(x) - f(x_i)\| + \|f(x_i) - g(x_i)\| + \|g(x_i) - g(x)\| < \varepsilon + \|f(x_i) - g(x_i)\| + \varepsilon, \]
by the uniform equicontinuity of $\mathcal{F}$ (and the choice of $\delta$),
\[ < \varepsilon + 2\varepsilon + \varepsilon = 4\varepsilon, \]
since $f(x_i)$ and $g(x_i)$ belong to a ball of radius $\varepsilon$. Thus $\|f - g\|_u \le 4\varepsilon$. Therefore $C_\eta$ has diameter at most $4\varepsilon$. □

32. Conditional convergence

Theorem 32.1. (Abel's theorem.) Let $\sum a_n$ have bounded partial sums, and let $(b_n)$ be a decreasing nonnegative sequence. If $M \ge 0$ is such that $\left| \sum_{j=1}^n a_j \right| \le M$ for all $n$, then for any $m \le n$ we have the estimate $\left| \sum_{j=m}^n a_j b_j \right| \le 2M b_m$. Moreover, if $b_n \to 0$, then $\sum a_n b_n$ converges.

Proof. Let $s_n = \sum_{j=1}^n a_j$ (with $s_0 = 0$). We have
\[ \sum_{j=m}^n a_j b_j = \sum_{j=m}^n (s_j - s_{j-1}) b_j = \sum_{j=m}^n s_j b_j - \sum_{j=m-1}^{n-1} s_j b_{j+1} = \sum_{j=m}^{n-1} s_j (b_j - b_{j+1}) + s_n b_n - s_{m-1} b_m. \]
We then have
\[ \left| \sum_{j=m}^n a_j b_j \right| \le \sum_{j=m}^{n-1} |s_j (b_j - b_{j+1})| + |s_n b_n| + |s_{m-1} b_m| \le \sum_{j=m}^{n-1} M(b_j - b_{j+1}) + M b_n + M b_m = 2M b_m, \]
since $b_j - b_{j+1}$, $b_n$, and $b_m$ are all $\ge 0$, and the last sum telescopes to $M(b_m - b_n)$. If $b_m \to 0$ as $m \to \infty$, the series $\sum a_n b_n$ converges by the Cauchy criterion. □

Remark 32.2. The estimate in Abel's theorem also shows that the "remainder" after $m$ terms satisfies $\left| \sum_{j>m} a_j b_j \right| \le 2M b_{m+1}$.

Corollary 32.3. (Alternating series test.) Let $(b_n)$ be a decreasing sequence with limit 0. Then the alternating series
\[ b_1 - b_2 + b_3 - \cdots = \sum_{n=1}^\infty (-1)^{n-1} b_n \]
converges, and $\left| \sum_{j=n+1}^\infty (-1)^{j-1} b_j \right| \le b_{n+1}$.

Proof. With $a_n = (-1)^{n-1}$, Abel's theorem proves convergence, and gives the estimate with a factor of 2. However, since the partial sums of $\sum a_n$ are all non-negative (either 0 or 1), the estimate in that proof can be improved as in the statement of the corollary. We leave the details to the interested reader. □
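The remainder bound can be checked numerically; the code is my own sketch, and the two limits used, $\log 2$ and $\pi/4$, are derived in Example 32.7 below.

```python
import math

# Remainder bound of the alternating series test for two classical series:
# sum (-1)^(n-1)/n = log 2 and 1 - 1/3 + 1/5 - ... = pi/4.
def alt_partial(b, n):
    return sum((-1) ** (j - 1) * b(j) for j in range(1, n + 1))

for total, b in ((math.log(2), lambda j: 1.0 / j),
                 (math.pi / 4, lambda j: 1.0 / (2 * j - 1))):
    for n in (10, 100, 1000):
        assert abs(total - alt_partial(b, n)) <= b(n + 1)
print("remainder bounds verified")
```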

Example 32.4. (1) (The alternating harmonic series.) $\sum_{n=1}^\infty (-1)^{n-1}/n = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots$ converges by the alternating series test. (We will see later that the sum is $\log 2$.)

(2) Let $\theta$ be an irrational number. (In fact, the argument we present applies to any non-integral real number $\theta$.) In the following, we will apply the formula for the sum of a finite geometric series to complex numbers.
\[ \sum_{j=1}^n \sin 2\pi j\theta = \operatorname{Im} \sum_{j=0}^n (\cos 2\pi j\theta + i \sin 2\pi j\theta) = \operatorname{Im} \sum_{j=0}^n (\cos 2\pi\theta + i \sin 2\pi\theta)^j = \operatorname{Im} \frac{1 - (\cos 2\pi\theta + i \sin 2\pi\theta)^{n+1}}{1 - (\cos 2\pi\theta + i \sin 2\pi\theta)}. \]
Hence
\[ \left| \sum_{j=1}^n \sin 2\pi j\theta \right| \le \frac{2}{\left| 1 - (\cos 2\pi\theta + i \sin 2\pi\theta) \right|} = \frac{2}{\sqrt{(1 - \cos 2\pi\theta)^2 + \sin^2 2\pi\theta}} = \sqrt{\frac{2}{1 - \cos 2\pi\theta}}. \]
Thus the series $\sum_n \sin 2\pi n\theta$ has bounded partial sums. By Abel's theorem, the series
\[ \sum_n \frac{\sin 2\pi n\theta}{n} \]
converges.
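The bound on the partial sums is easy to test numerically; taking $\theta = \sqrt{2}$ here is my own choice.

```python
import math

# Partial sums of sin(2*pi*n*theta) for theta = sqrt(2) stay within the
# bound sqrt(2 / (1 - cos(2*pi*theta))) derived above.
theta = math.sqrt(2)
bound = math.sqrt(2.0 / (1.0 - math.cos(2 * math.pi * theta)))

s, worst = 0.0, 0.0
for n in range(1, 100001):
    s += math.sin(2 * math.pi * n * theta)
    worst = max(worst, abs(s))
print(worst <= bound)   # True
```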

Abel also proved the following theorem on the behavior of a power series at an endpoint of the interval of convergence.

Theorem 32.5. Let $\sum_{n=0}^\infty a_n(x-x_0)^n$ have radius of convergence $0 < R < \infty$. Suppose that the series converges at an endpoint of the interval of convergence. Then the series converges uniformly on the closed interval from $x_0$ to that endpoint.

Corollary 32.6. With the hypotheses of the theorem, let $f(x)$ denote the sum of the series on its domain of convergence. Then $f$ is continuous.

Proof (of the theorem). A linear change of variables reduces the theorem to the case where $x_0 = 0$ and $R = 1$. We consider the case where the series converges at the right-hand endpoint; the other case has a similar proof. Thus we have a power series $\sum_{n=0}^\infty a_n x^n$ with radius of convergence 1, and such that $\sum a_n$ converges. Let $\varepsilon > 0$ be given. Applying the Cauchy criterion to $\sum a_n$, we obtain $n_0 \in \mathbb{N}$ such that for all $n_0 \le m \le n$ we have
\[ \left| \sum_{j=m}^n a_j \right| < \frac{\varepsilon}{2}. \]
For any $x \in [0,1]$, the sequence $(x^n)$ is decreasing. We apply Abel's theorem to the series $\sum_{j=n_0}^\infty a_j x^j$ to get
\[ \left| \sum_{j=m}^n a_j x^j \right| \le 2 \left( \frac{\varepsilon}{2} \right) x^m \le \varepsilon, \]
a uniform estimate. □

Example 32.7. (1) From the geometric series $1/(1-x) = \sum_{n=0}^\infty x^n$ for $|x| < 1$, we integrate term-by-term to obtain
\[ -\log(1-x) = \int_0^x \frac{dt}{1-t} = \sum_{n=0}^\infty \int_0^x t^n\,dt = \sum_{n=0}^\infty \frac{x^{n+1}}{n+1} = \sum_{n=1}^\infty \frac{x^n}{n}, \]
still with radius of convergence equal to 1. Replacing $x$ by $-x$ we get
\[ (**) \qquad \log(1+x) = \sum_{n=1}^\infty (-1)^{n-1} \frac{x^n}{n}, \]
valid for $|x| < 1$. When $x = 1$ we have the alternating harmonic series, which converges. By Abel's theorem, the power series converges uniformly on $[0,1]$, and hence the limit is continuous. Since the equality in $(**)$ holds on $[0,1)$, and both sides are continuous on $[0,1]$, the equality must hold at $x = 1$. This gives
\[ \log 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots. \]

(2) Again starting with the geometric series, we replace $x$ by $-x^2$ to get
\[ \frac{1}{1+x^2} = \sum_{n=0}^\infty (-1)^n x^{2n}. \]
Since $|-x^2| < 1$ if and only if $|x| < 1$, this equation is also valid for $|x| < 1$. Now we integrate term-by-term to get
\[ \arctan x = \int_0^x \frac{dt}{1+t^2} = \sum_{n=0}^\infty \frac{(-1)^n}{2n+1} x^{2n+1}, \]
valid for $|x| < 1$. Again, the series converges for $x = 1$ by the alternating series test. By Abel's theorem, the series is continuous on $[0,1]$, and so the above equation is still valid at $x = 1$. We obtain the classical series
\[ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots. \]

(3) We consider $f(x) = (1+x)^\alpha$ for $\alpha > 0$, $\alpha \notin \mathbb{N}$. Repeated differentiation gives $f^{(n)}(x) = \alpha(\alpha-1)\cdots(\alpha-n+1)(1+x)^{\alpha-n}$. Thus the Taylor series for $f$ is given by
\[ 1 + \sum_{n=1}^\infty \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!} x^n. \]
Letting $a_n = \alpha(\alpha-1)\cdots(\alpha-n+1)/n!$, we compute $|a_{n+1}/a_n| = |\alpha-n|/(n+1) \to 1$ as $n \to \infty$. Thus the ratio test implies that the radius of convergence of the series is 1. If we apply the convergence test of homework #49, we find that (for $n > \alpha$)
\[ n \left( \left| \frac{\alpha-n}{n+1} \right| - 1 \right) = \frac{n}{n+1}(-1-\alpha) \to -1-\alpha \]
as $n \to \infty$. Thus since $\alpha > 0$, the Taylor series converges at $x = \pm 1$.

Now we are faced with the following difficulty. Let $g(x) = 1 + \sum_{n=1}^\infty a_n x^n$ be the sum of the Taylor series of $f$. Then $f$ and $g$ are both defined on $[-1,1]$, and are continuous (by Abel's theorem, in the case of $g$). We would like to know that they are equal. In the previous two examples, we knew which function the power series represented because we began with the geometric series. The other method that we have seen involves using Taylor's theorem to prove that the Taylor polynomials converge to the function uniformly on compact subsets of the (interior of the) interval of convergence. In the current example, it isn't apparent how to bound the derivatives of $f$ so as to use Taylor's theorem. Instead, we will present a clever trick, courtesy of Folland's Real Analysis.

Notice that $f'(x) = \alpha(1+x)^{\alpha-1}$, and hence $\alpha^{-1}(1+x)f'(x) = (1+x)^\alpha$. Since we intend to prove that $g(x) = (1+x)^\alpha$, we investigate this expression using $g$. Differentiating term-by-term on $(-1,1)$, we get
\[ \alpha^{-1}(1+x)g'(x) = \alpha^{-1}(1+x) \sum_{n=1}^\infty n a_n x^{n-1} = \alpha^{-1} \left( \sum_{n=0}^\infty (n+1) a_{n+1} x^n + \sum_{n=1}^\infty n a_n x^n \right) = \alpha^{-1} \left( a_1 + \sum_{n=1}^\infty \bigl( (n+1) a_{n+1} + n a_n \bigr) x^n \right). \]
Note that $a_1 = \alpha$, and for $n \ge 1$,
\[ (n+1) a_{n+1} + n a_n = (n+1) \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)(\alpha-n)}{(n+1)!} + n \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!} = \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!} \bigl( (\alpha-n) + n \bigr) = a_n \alpha. \]
Hence
\[ \alpha^{-1} \bigl( (n+1) a_{n+1} + n a_n \bigr) = a_n. \]
Thus we see that $\alpha^{-1}(1+x)g'(x) = 1 + \sum_{n=1}^\infty a_n x^n = g(x)$. It follows that
\[ \frac{d}{dx}\,(1+x)^{-\alpha} g(x) = -\alpha(1+x)^{-\alpha-1} g(x) + (1+x)^{-\alpha} g'(x) = -\alpha(1+x)^{-\alpha-1} \alpha^{-1}(1+x) g'(x) + (1+x)^{-\alpha} g'(x) = 0. \]
Since $g(0) = 1$, we have $g(x) = (1+x)^\alpha = f(x)$ on $(-1,1)$. By continuity, they are equal on $[-1,1]$.

Finally, we remark that letting $t = 1+x \in [0,2]$, we obtain $t^\alpha = 1 + \sum_{n=1}^\infty a_n (t-1)^n$, the series converging uniformly on $[0,2]$. This is another explicit demonstration of Weierstrass' approximation theorem (for these functions).
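A numerical sketch of this example, with $\alpha = 1/2$ (my own choice): partial sums of the binomial series match $(1+x)^\alpha$ quickly inside $(-1,1)$, and more slowly at the endpoint $x = 1$.

```python
# Binomial series for (1+x)^alpha with alpha = 1/2.
alpha = 0.5

def a(n):                     # a_n = alpha*(alpha-1)*...*(alpha-n+1)/n!
    c = 1.0
    for k in range(n):
        c *= (alpha - k) / (k + 1)
    return c

def g(x, N=2000):             # partial sum 1 + sum_{n=1}^{N-1} a_n x^n
    return 1.0 + sum(a(n) * x ** n for n in range(1, N))

print(abs(g(0.3) - 1.3 ** alpha) < 1e-12)   # True: fast convergence inside (-1,1)
print(abs(g(1.0) - 2.0 ** alpha) < 1e-3)    # True: convergence at the endpoint
```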