Princeton MAT 215 All Lectures

MAT 215: Analysis in a single variable

Course notes, Fall 2012

Michael Damron

Compiled from lectures and exercises designed with Mark McConnell, following Principles of Mathematical Analysis, Rudin

Princeton University


Contents

1 Fundamentals
  1.1 Sets
  1.2 Relations and functions
  1.3 Cardinality
  1.4 Natural numbers and induction
  1.5 Cardinality and the natural numbers
  1.6 Exercises

2 The real numbers
  2.1 Rationals and suprema
  2.2 Existence and properties of real numbers
  2.3 Rⁿ for n ≥ 2
  2.4 Exercises

3 Metric spaces
  3.1 Definitions
  3.2 Open and closed sets
  3.3 Limit points
  3.4 Compactness
  3.5 Heine-Borel Theorem: compactness in Rⁿ
  3.6 The Cantor set
  3.7 Exercises

4 Sequences
  4.1 Definitions
  4.2 Subsequences, Cauchy sequences and completeness
  4.3 Special sequences
  4.4 Exercises

5 Series
  5.1 Definitions
  5.2 Ratio and root tests
  5.3 Non non-negative series
  5.4 Exercises

6 Function limits and continuity
  6.1 Function limits
  6.2 Continuity
  6.3 Relations between continuity and compactness
  6.4 Connectedness and the IVT
  6.5 Discontinuities
  6.6 Exercises

7 Derivatives
  7.1 Introduction
  7.2 Properties
  7.3 Mean value theorem
  7.4 L’Hopital’s rule
  7.5 Power series
  7.6 Taylor’s theorem
  7.7 Exercises

8 Integration
  8.1 Definitions
  8.2 Properties of integration
  8.3 Fundamental theorems
  8.4 Change of variables, integration by parts
  8.5 Exercises

A Real powers
  A.1 Natural roots
  A.2 Rational powers
  A.3 Real powers

B Logarithm and exponential functions
  B.1 Logarithm
  B.2 Exponential function
  B.3 Sophomore’s dream

C Dimension of the Cantor set
  C.1 Definitions
  C.2 The Cantor set
  C.3 Exercises

1 Fundamentals

1.1 Sets

We begin with the concepts of set, object and set membership. We will leave these as primitive in a sense; that is, undefined. You can think of a set as a collection of objects, and if a is an object and A is a set then a ∈ A means a is a member of A. If A and B are sets, we say that A is a subset of B (written A ⊂ B) if whenever a ∈ A we have a ∈ B. If A ⊂ B and B ⊂ A we say the sets are equal and we write A = B. A is a proper subset of B if A ⊂ B but A ≠ B. Note that ∅, the set with no elements, is a subset of every set.

There are many operations we can perform with sets.

• If A and B are sets, A ∪ B is the union of A and B and is the set

A ∪ B = {a : a ∈ A or a ∈ B} .

• If A and B are sets, A ∩ B is the intersection of A and B and is the set

A ∩ B = {a : a ∈ A and a ∈ B} .

• Of course we can generalize these to arbitrary numbers of sets. If C is a (possibly infinite) collection of sets (that is, a set whose elements are themselves sets), we define

⋃_{A∈C} A = {a : a ∈ A for some A ∈ C}

⋂_{A∈C} A = {a : a ∈ A for all A ∈ C} .

The sets A and B are called disjoint if A ∩ B = ∅.

These operations obey the following properties

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) .

Let us give a proof of the first. To show these sets are equal, we must show each is contained in the other. So let a ∈ A ∩ (B ∪ C). We would like to show that a ∈ (A ∩ B) ∪ (A ∩ C). We know a ∈ A and a ∈ (B ∪ C). One possibility is that a ∈ A and a ∈ B, in which case a ∈ A ∩ B, giving a ∈ (A ∩ B) ∪ (A ∩ C). The only other possibility is that a ∈ A and a ∈ C, since a must be in either B or C. Then a ∈ A ∩ C and the same conclusion holds. The other direction is an exercise.

If A and B are sets then define the difference A \ B as

A \ B = {a : a ∈ A but a ∉ B} .

One can verify the following as well.

A \ (B ∪ C) = (A \ B) ∩ (A \ C)

A \ (B ∩ C) = (A \ B) ∪ (A \ C) .

Finally the symmetric difference is

A∆B = (A \ B) ∪ (B \ A) .
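The identities of this section can be spot-checked on small finite sets. The following is a minimal sketch using Python's built-in set type; the particular sets A, B, C and the helper name check_set_identities are illustrative assumptions, and a finite check is of course not a proof.

```python
# Small example sets; the specific values are illustrative assumptions.
A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 5, 6}

def check_set_identities(A, B, C):
    """Return True if the identities of Section 1.1 hold for these three sets."""
    return all([
        A & (B | C) == (A & B) | (A & C),   # intersection distributes over union
        A | (B & C) == (A | B) & (A | C),   # union distributes over intersection
        A - (B | C) == (A - B) & (A - C),   # difference of a union
        A - (B & C) == (A - B) | (A - C),   # difference of an intersection
        A ^ B == (A - B) | (B - A),         # symmetric difference
    ])
```

Here |, &, - and ^ are Python's operators for union, intersection, difference and symmetric difference.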

1.2 Relations and functions

Our last important way to build a set from other sets is the product. We write

A×B = {(a, b) : a ∈ A, b ∈ B} .

Definition 1.2.1. A relation R between sets A and B is any subset of A × B. If (a, b) ∈ R we think of a as being related to b.

We will first mention types of relations on a single set.

• A relation R between A and A is reflexive if (a, a) ∈ R for all a ∈ A.

• It is symmetric if whenever (a1, a2) ∈ R we have (a2, a1) ∈ R.

• It is transitive if whenever (a1, a2) ∈ R and (a2, a3) ∈ R we have (a1, a3) ∈ R.

Definition 1.2.2. A relation R on A which is reflexive, symmetric and transitive is called an equivalence relation. Given a ∈ A and an equivalence relation R on A we write

[a]R = {a′ ∈ A : (a, a′) ∈ R}

for the equivalence class of a.

Sometimes the condition (a, a′) ∈ R is written a ∼ a′ (and sometimes R is not even mentioned). An example is equality of sets; that is, defining the relation A ∼ B if A = B gives an equivalence relation. And here we have not specified R or the “larger” set on which R is a relation. You can check set equality is reflexive, symmetric and transitive.

Proposition 1.2.3. If R is an equivalence relation on a nonempty set A and a1, a2 ∈ A then

either [a1]R = [a2]R or [a1]R ∩ [a2]R = ∅ .

Proof. We will first show that both conditions cannot simultaneously hold. Then we will show that at least one must hold. To show the first, note that a1 ∈ [a1]R and a2 ∈ [a2]R since R is reflexive. Therefore if [a1]R = [a2]R then a1 ∈ [a1]R ∩ [a2]R, giving nonempty intersection.

For the second claim, suppose that [a1]R ∩ [a2]R ≠ ∅, giving some a in the intersection. We claim that [a]R = [a1]R. If a′ ∈ [a]R then (a′, a) ∈ R by symmetry. But a ∈ [a1]R, so (a, a1) ∈ R. By transitivity, (a′, a1) ∈ R, so a′ ∈ [a1]R. This proves [a]R ⊆ [a1]R. To show the other containment, let a′ ∈ [a1]R so that (a′, a1) ∈ R. Again, (a, a1) ∈ R, giving (a1, a) ∈ R. Transitivity then implies (a′, a) ∈ R, so a′ ∈ [a]R. The same argument with a2 in place of a1 gives [a]R = [a2]R, and therefore [a1]R = [a2]R.

The picture is that the equivalence classes of R partition A.

Definition 1.2.4. A partition of A is a collection P of subsets of A such that

1. A = ⋃_{S∈P} S and

2. S1 ∩ S2 = ∅ whenever S1 and S2 in P are not equal.

Using this definition, we can say that if R is an equivalence relation on a set A then the collection

CR = {[a]R : a ∈ A}

of equivalence classes forms a partition of A.
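As a concrete illustration, one can compute the equivalence classes of a relation on a finite set and test the two conditions of Definition 1.2.4 directly. The sketch below assumes a hypothetical relation (equal remainder on division by 3) on the set {1, ..., 9}; the function names are chosen here for illustration only.

```python
def equivalence_classes(A, related):
    """Return the collection of classes [a]_R for a in A, each as a frozenset."""
    return {frozenset(b for b in A if related(a, b)) for a in A}

def is_partition(classes, A):
    """Check the two conditions of Definition 1.2.4."""
    covers = set().union(*classes) == set(A)          # the union is all of A
    disjoint = all(s.isdisjoint(t)                     # distinct members are disjoint
                   for s in classes for t in classes if s != t)
    return covers and disjoint

A = set(range(1, 10))

def same_remainder(a, b):
    # Hypothetical equivalence relation: a ~ b when 3 divides a - b.
    return (a - b) % 3 == 0

classes = equivalence_classes(A, same_remainder)
```

Because the classes are stored in a Python set, equal classes such as [1] and [4] are automatically identified, which mirrors Proposition 1.2.3.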

Just a note to conclude. If we have an equivalence relation R on a set A, it is standard notation to write

A/R = {[a]R : a ∈ A}

for the set of equivalence classes of A under R. This is known as taking the quotient by an equivalence relation. At times the relation R is written in an implied manner using a symbol like ∼. For instance, (a, b) ∈ R would be written a ∼ b. In this case, the quotient is A/∼.

We will spend much of the course talking about functions, which are special kinds of relations.

Definition 1.2.5. Let A and B be sets and f a relation between A and B. We say that f is a (well-defined) function from A to B, written f : A → B, if the following hold.

1. For each a ∈ A, there is at least one b ∈ B such that (a, b) ∈ f .

2. For each a ∈ A, there is at most one b ∈ B such that (a, b) ∈ f . That is, if we ever have (a, b1) ∈ f and (a, b2) ∈ f for b1, b2 ∈ B, it follows that b1 = b2.

The set A is called the domain of f and B is called the codomain of f .

Of course we will not continue to use this notation for a function, but the more familiar notation: if (a, b) ∈ f then because of item 2 above, we can unambiguously write f(a) = b.

We will be interested in certain types of functions.

Definition 1.2.6. The function f : A → B is called one-to-one (injective) if whenever a1 ≠ a2 then f(a1) ≠ f(a2). It is called onto (surjective) if for each b ∈ B there exists a ∈ A such that f(a) = b.

Another way to define onto is to first define the range of a function f : A→ B by

f(A) = {f(a) : a ∈ A}

and say that f is onto if f(A) = B.

Many times we want to compose functions to build other ones. Suppose that f : A → B and g : B → C are functions. Then

(g ◦ f) : A → C is defined as (g ◦ f)(a) = g(f(a)) .

Formally speaking we define g ◦ f ⊆ A× C by

(a, c) ∈ g ◦ f if (a, b) ∈ f and (b, c) ∈ g for some b ∈ B .

You can check that this defines a function.
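The formal definition of g ◦ f as a relation translates directly into code when the sets are finite. The following sketch stores relations as Python sets of pairs; the example sets and the names compose and is_function are assumptions made for illustration.

```python
def compose(g, f):
    """Relation g∘f: all (a, c) with (a, b) in f and (b, c) in g for some b."""
    return {(a, c) for (a, b) in f for (b2, c) in g if b == b2}

def is_function(rel, domain, codomain):
    """Definition 1.2.5: rel ⊆ domain × codomain with exactly one image per point."""
    in_product = all(a in domain and b in codomain for (a, b) in rel)
    one_image = all(sum(1 for (x, _) in rel if x == a) == 1 for a in domain)
    return in_product and one_image

# Illustrative finite example (values are assumptions).
A, B, C = {1, 2, 3}, {"x", "y"}, {10, 20}
f = {(1, "x"), (2, "y"), (3, "x")}     # a function A -> B
g = {("x", 10), ("y", 20)}             # a function B -> C
gf = compose(g, f)
```

In this example gf pairs each a with the single value g(f(a)), which is the content of the exercise on a small scale.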

Proposition 1.2.7. Let f : A→ B and g : B → C be functions.

1. If f and g are one-to-one then so is g ◦ f .

2. If f and g are onto then so is g ◦ f .

Proof. We start with the first statement. Suppose that f and g are one-to-one; we will show that g ◦ f must be one-to-one. Suppose then that a and a′ in A are such that (g ◦ f)(a) = (g ◦ f)(a′). Then by definition, g(f(a)) = g(f(a′)). But g is one-to-one, so f(a) = f(a′). Now since f is one-to-one, we find a = a′. This shows that if (g ◦ f)(a) = (g ◦ f)(a′) then a = a′, proving g ◦ f is one-to-one.

Suppose then that f and g are onto. To show that g ◦ f is onto we must show that for each c ∈ C there exists a ∈ A such that (g ◦ f)(a) = c. This is the same statement as g(f(a)) = c. We know that g is onto, so there exists b ∈ B such that g(b) = c. Furthermore, f is onto, so for this specific b, there exists a ∈ A such that f(a) = b. Putting these together,

(g ◦ f)(a) = g(f(a)) = g(b) = c .

This completes the proof.

If a function is both one-to-one and onto we can define an inverse function.

Definition 1.2.8. If f : A→ B is both one-to-one and onto we call f a bijection.

Theorem 1.2.9. Let f : A → B. There exists a function f⁻¹ : B → A such that

f⁻¹ ◦ f = idA and f ◦ f⁻¹ = idB , (1)

where idA : A → A and idB : B → B are the identity functions

idA(a) = a and idB(b) = b

if and only if f is a bijection. The meaning of the above equations is f⁻¹(f(a)) = a and f(f⁻¹(b)) = b for all a ∈ A and b ∈ B.

Proof. Suppose that f : A → B is a bijection. Then define f⁻¹ ⊆ B × A by

f⁻¹ = {(b, a) : (a, b) ∈ f} .

This is clearly a relation. We claim it is a function. To show this we must prove that

• for all b ∈ B there exists a ∈ A such that (b, a) ∈ f⁻¹ and

• for all b ∈ B there exists at most one a ∈ A such that (b, a) ∈ f⁻¹.

Restated, these are

• for all b ∈ B there exists a ∈ A such that f(a) = b and

• for all b ∈ B there exists at most one a ∈ A such that f(a) = b.

These are exactly the conditions that f be a bijection, so f⁻¹ is a function.

Now we must show that f⁻¹ ◦ f = idA and f ◦ f⁻¹ = idB. We show only the first; the second is an exercise. For each a ∈ A, there is a b ∈ B such that f(a) = b. By definition of f⁻¹, we then have (b, a) ∈ f⁻¹; that is, f⁻¹(b) = a. Therefore (a, b) ∈ f and (b, a) ∈ f⁻¹, giving (a, a) ∈ f⁻¹ ◦ f , or

(f⁻¹ ◦ f)(a) = a = idA(a) .

We have now shown that if f is a bijection then there is a function f⁻¹ that satisfies (1). For the other direction, suppose that f : A → B is a function and g : B → A is a function such that

g ◦ f = idA and f ◦ g = idB .

We must show then that f is a bijection. To show one-to-one, suppose that f(a1) = f(a2). Then a1 = idA(a1) = g(f(a1)) = g(f(a2)) = idA(a2) = a2, giving that f is one-to-one. To show onto, let b ∈ B; we claim that f maps the element g(b) to b. To see this, compute b = idB(b) = f(g(b)). This shows that f is onto and completes the proof.
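The construction in the proof, flipping every pair of f, can be tried on finite examples. The sketch below is a hedged illustration with relations stored as sets of pairs; the sets A, B and the relations f, h are assumptions chosen so that one is a bijection and the other is not.

```python
def inverse_relation(f):
    """Flip every pair of the relation f, as in the definition of the inverse."""
    return {(b, a) for (a, b) in f}

def is_function(rel, domain, codomain):
    """Definition 1.2.5: rel ⊆ domain × codomain with exactly one image per point."""
    in_product = all(a in domain and b in codomain for (a, b) in rel)
    one_image = all(sum(1 for (x, _) in rel if x == a) == 1 for a in domain)
    return in_product and one_image

A = {1, 2, 3}
B = {"x", "y", "z"}
f = {(1, "x"), (2, "y"), (3, "z")}   # a bijection A -> B
h = {(1, "x"), (2, "x"), (3, "z")}   # a function A -> B, neither one-to-one nor onto
```

For f the flipped relation is again a function from B to A. For h it is not: "x" receives two images and "y" receives none, matching the two bullet conditions in the proof.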

Here are some more facts about inverses and injectivity/surjectivity.

• If f : A → B is a bijection then so is f⁻¹ : B → A.

• If f : A→ B and g : B → C are bijections then so is g ◦ f .

• The identity map idA : A→ A is a bijection.

If a function f : A → B is not a bijection then there is no inverse function f⁻¹ : B → A. However we can in all cases consider the inverse image.

Definition 1.2.10. Given f : A → B and C ⊂ B we define the inverse image of C as

f⁻¹(C) = {a ∈ A : f(a) ∈ C} .

Note that if we let C be a singleton set {b} for some b ∈ B then we retrieve all elements a ∈ A mapped to b:

f⁻¹({b}) = {a ∈ A : f(a) = b} .

In the case that f is invertible, this just gives the singleton set consisting of the point f⁻¹(b). We note the following properties of inverse images (proved in the homework). For f : A → B and C1, C2 ⊂ B,

• f⁻¹(C1 ∩ C2) = f⁻¹(C1) ∩ f⁻¹(C2).

• f⁻¹(C1 ∪ C2) = f⁻¹(C1) ∪ f⁻¹(C2).
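Inverse images are easy to compute for a function on a finite domain, even when the function has no inverse. The following sketch assumes the squaring map on {−2, ..., 2} as an example; the name preimage and the sets C1, C2 are illustrative choices.

```python
def preimage(f, A, C):
    """Inverse image of C under f : A -> B, i.e. {a in A : f(a) in C}."""
    return {a for a in A if f(a) in C}

A = {-2, -1, 0, 1, 2}

def square(a):
    # Example function; it is not one-to-one on A, so it has no inverse function.
    return a * a

C1 = {0, 1}
C2 = {1, 4}
```

With these choices the inverse image of the singleton {4} has two elements, which shows why f⁻¹({b}) must be treated as a set when f is not invertible.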


1.3 Cardinality

The results of the previous section allow us to define an equivalence relation on sets:

Definition 1.3.1. If A and B are sets, we say that A and B are equivalent (A ≃ B or A and B have the same cardinality) if there exists a bijection f : A → B. The cardinality of a set A (written ♯(A)) is defined as the equivalence class of A under this relation. That is

♯(A) = {B : A ≃ B} .

To compare cardinalities, we introduce a new relation on sets.

Definition 1.3.2. If A and B are sets then we write ♯(A) ≤ ♯(B) if there exists a one-to-one function f : A → B. Write ♯(A) < ♯(B) if ♯(A) ≤ ♯(B) but ♯(A) ≠ ♯(B).

The following properties follow. (Exercise: verify the first two.)

1. (reflexivity) For each set A, ♯(A) ≤ ♯(A).

2. (transitivity) For all sets A, B, C, if ♯(A) ≤ ♯(B) and ♯(B) ≤ ♯(C) then ♯(A) ≤ ♯(C).

3. (antisymmetry) For all sets A and B, if ♯(A) ≤ ♯(B) and ♯(B) ≤ ♯(A) then ♯(A) = ♯(B).

Any relation on a set that satisfies these properties is called a partial order. For cardinality, establishment of antisymmetry is done by the Cantor-Bernstein theorem, which we will skip.

Theorem 1.3.3 (Cantor’s Theorem). For any set A let P(A) be the power set of A; that is, the set whose elements are the subsets of A. Then ♯(A) < ♯(P(A)).

Proof. We first show that ♯(A) ≠ ♯(P(A)). We proceed by contradiction. Suppose that A is a set but assume that ♯(A) = ♯(P(A)). Then there exists a bijection f : A → P(A). Using this function, define the set

S = {a ∈ A : a ∉ f(a)} .

Since this is a subset of A, it is an element of P(A). As f is a bijection, it is onto and therefore there exists s ∈ A such that f(s) = S. There are now two possibilities; either s ∈ S or s ∉ S. In either case we will derive a contradiction, proving that the assumption we made cannot be true: no such f can exist and ♯(A) ≠ ♯(P(A)).

In the first case, s ∈ S. Then as S = f(s), we have s ∈ f(s). But then by definition of S, it must actually be that s ∉ S, a contradiction. In the second case, s ∉ S, giving by the definition of S that s ∈ f(s). However f(s) = S so s ∈ S, another contradiction.

Second we must show that ♯(A) ≤ ♯(P(A)). To do this we define the function

f : A → P(A) by f(a) = {a} .

To prove injectivity, suppose that f(a1) = f(a2). Then {a1} = {a2} and therefore a1 = a2.
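For a small finite set the diagonal argument can be checked exhaustively: for every function f from A to P(A), the set S built in the proof is missing from the range of f. The sketch below is an illustration under the assumption that A is small enough to enumerate all such functions; the function names are chosen here and are not part of the notes.

```python
from itertools import combinations, product

def power_set(A):
    """All subsets of A, each as a frozenset."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_set(f, A):
    """S = {a in A : a not in f(a)} for f given as a dict from A to subsets of A."""
    return frozenset(a for a in A if a not in f[a])

def diagonal_always_missed(A):
    """Check, over every f : A -> P(A), that S is not in the range of f."""
    items = sorted(A)
    subsets = power_set(A)
    for images in product(subsets, repeat=len(items)):
        f = dict(zip(items, images))
        if diagonal_set(f, A) in set(f.values()):
            return False
    return True
```

For A = {0, 1, 2} this loops over 8³ = 512 functions. The proof above shows the check can never fail, for any set A.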


Let us now give an example of two sets with the same cardinality. If A and B are sets we write B^A for the set of functions f : A → B. Let F2 be a set with two elements, which we call 0 and 1. We claim that

♯(P(A)) = ♯(F2^A) .

To see this we must display a bijection between the two. Define f : P(A) → F2^A by the following. For any subset S ⊂ A associate the characteristic function χS : A → F2 by

χS(a) = 1 if a ∈ S ,
χS(a) = 0 if a ∉ S .

Exercise: show that the function f : P(A) → F2^A given by f(S) = χS is a bijection.
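On a finite set the correspondence between subsets and characteristic functions can be written out explicitly. The sketch below stores a function A → F2 as a Python dict; the names chi, subset_from and all_functions and the example set are assumptions for illustration, and the finite check does not replace the exercise.

```python
from itertools import product

def chi(S, A):
    """Characteristic function of S ⊂ A, stored as a dict a -> 0 or 1."""
    return {a: (1 if a in S else 0) for a in A}

def subset_from(g):
    """Recover the subset {a : g(a) = 1} from a function g : A -> {0, 1}."""
    return frozenset(a for a, bit in g.items() if bit == 1)

def all_functions(A):
    """Every function A -> {0, 1}, each stored as a dict."""
    items = sorted(A)
    return [dict(zip(items, bits)) for bits in product((0, 1), repeat=len(items))]

A = {1, 2, 3}
functions = all_functions(A)
subsets = {subset_from(g) for g in functions}
```

Here subset_from plays the role of the inverse of S ↦ χS: applying chi after subset_from returns the original function, and distinct functions give distinct subsets.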

1.4 Natural numbers and induction

To introduce the natural numbers in an axiomatic way we will use the Peano axioms.

Assumption. We assume the existence of a set N, an element 1 ∈ N and a function s : N → N with the following properties.

1. For each n ∈ N, s(n) (the successor of n) is not equal to 1.

2. s is injective.

3. (Inductive axiom) If any subset S ⊂ N contains 1 and has the property that whenever n ∈ S then s(n) ∈ S, it follows that S = N.

The third property seems a bit weird at first, but actually there are many sets which satisfy the first two properties and are not N. For instance, the set {n/2 : n ∈ N} does. So we need it to really pin down N.

From these axioms many properties follow. Here is one.

• for all n ∈ N, s(n) ≠ n.

Proof. Let S = {n ∈ N : s(n) ≠ n}. Clearly 1 ∈ S, since s(1) ≠ 1 by the first axiom. Now suppose that n ∈ S for some n. Then we claim that s(n) ∈ S. To see this, note that by injectivity of s, s(n) ≠ n implies that s(s(n)) ≠ s(n). Thus s(n) ∈ S. By the inductive axiom, since 1 ∈ S and whenever n ∈ S we have s(n) ∈ S, we see that S = N. In other words, s(n) ≠ n for all n.

Addition

It is customary to call s(1) = 2, s(2) = 3, and so on. We define addition on the natural numbers in a recursive manner:

• for any n ∈ N, define n + 1 to be s(n) and

• for any n, m ∈ N, define n + s(m) to be s(n + m).

That this indeed defines a function + : N × N → N requires proof, but we will skip this and assume that addition is defined normally. Of course, addition satisfies the commutative and associative laws.
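The recursive definition of addition can be mirrored in code. The sketch below models the natural numbers by Python integers n ≥ 1 with succ(n) = n + 1, which is an assumption of the illustration rather than part of the axiomatic development; it also uses m − 1 for the element whose successor is m, which the notes only justify later in this section.

```python
def succ(n):
    """Successor function s; natural numbers are modelled as Python ints >= 1."""
    return n + 1

def add(n, m):
    """Addition defined by n + 1 = s(n) and n + s(m) = s(n + m)."""
    if m == 1:
        return succ(n)
    # m != 1, so m = s(m - 1) and the second rule applies.
    return succ(add(n, m - 1))
```

Every value is built from succ alone, so, for instance, add(2, 3) unwinds to succ(succ(succ(2))).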

1. For any m,n, r ∈ N, m+ (n+ r) = (m+ n) + r.

Proof. First we show the statement for r = 1 and all m,n. We have

m+ (n+ 1) = m+ s(n) = s(m+ n) = (m+ n) + 1 ,

where we have used the inductive definition of addition. Now suppose that the formula holds for some r ∈ N; we will show it holds for s(r). Indeed,

m+ (n+ s(r)) = m+ (n+ (r + 1)) = m+ ((n+ r) + 1) = m+ s(n+ r)

= s(m+ (n+ r)) = s((m+ n) + r) = (m+ n) + s(r) .

In other words, the set

S = {r ∈ N : m+ (n+ r) = (m+ n) + r for all m,n ∈ N}

has 1 ∈ S and whenever r ∈ S, also s(r) ∈ S. By the inductive axiom, S = N.

2. For any m,n ∈ N, m+ n = n+m.

Proof. Again we use an inductive argument. Define

S = {n ∈ N : n+m = m+ n for all m ∈ N} .

The first step is to show that 1 ∈ S; that is, that 1 + m = m + 1 for all m ∈ N. For this we also do an induction. Set

T = {m ∈ N : 1 + m = m + 1} .

First, 1 ∈ T since 1 + 1 = 1 + 1. Suppose then that m ∈ T . We claim that this implies m + 1 ∈ T . To see this, write

1 + (m + 1) = (1 + m) + 1 = (m + 1) + 1 .

By the inductive axiom, T = N.

Now that we have shown 1 ∈ S, we assume n ∈ S and prove n+ 1 ∈ S. For m ∈ N,

(n+ 1) +m = n+ (1 +m) = n+ (m+ 1) = (n+m) + 1

= (m+ n) + 1 = m+ (n+ 1) .

By the inductive axiom, S = N and we are done.


3. For all n, m ∈ N, n + m ≠ n.

Proof. Define the set

S = {n ∈ N : n + m ≠ n for all m ∈ N} .

By commutativity and the Peano axioms,

1 + m = m + 1 = s(m) ≠ 1 for all m ∈ N ,

so 1 ∈ S. Suppose then that n ∈ S; that is, n is such that n + m ≠ n for all m ∈ N. Then by injectivity of s, for m ∈ N,

(n + 1) + m = (n + m) + 1 = s(n + m) ≠ s(n) = n + 1 ,

giving n + 1 ∈ S. By the inductive axiom, S = N and we are done.

Last, for proving facts about ordering we show

• s is a bijection from N to N \ {1}.

Proof. We know s does not map any element to 1 so s is in fact a function to N \ {1}.Also it is injective. To show surjective, consider the set

S = {1} ∪ {s(n) : n ∈ N} .

Clearly 1 ∈ S. Supposing that n ∈ S then n ∈ N, so s(n) ∈ S. Therefore S = N. Therefore if k ≠ 1 then k = s(n) for some n ∈ N.

The above lets us define n − 1 for n ≠ 1. It is the element such that (n − 1) + 1 = n.

Ordering

We also define an ordering on the natural numbers. We say that m ≤ n for m, n ∈ N if either m = n or m + a = n for some a ∈ N. This defines a total ordering of N; that is, it is a partial ordering that also satisfies

• for all m, n ∈ N, m ≤ n or n ≤ m.

In the case that m ≤ n but m ≠ n we write m < n. Note that by item 3 above, n < n + m for all n, m ∈ N. In particular, n < s(n).

Proposition 1.4.1. ≤ is a total ordering of N.


Proof. First each n ≤ n so it is reflexive. Next if n1 ≤ n2 and n2 ≤ n3 then if n1 = n2 or n2 = n3, we clearly have n1 ≤ n3. Otherwise there exist m1, m2 ∈ N such that n1 + m1 = n2 and n2 + m2 = n3. In this case,

n3 = n2 + m2 = (n1 + m1) + m2 = n1 + (m1 + m2) ,

giving n1 ≤ n3.

For antisymmetry, suppose that m ≤ n and n ≤ m. For a contradiction, if m ≠ n then there exist a, b ∈ N such that n = m + a and m = n + b. Then m = (m + a) + b = m + (a + b), a contradiction with item 3 above. Therefore m = n.

So far we have proved that ≤ is a partial order. We now prove ≤ is a total ordering. To begin with, we claim that for all n ∈ N, 1 ≤ n. Clearly this is true for n = 1. If we assume it holds for some n then

n + 1 = 1 + n ≥ 1 ,

verifying the claim by induction.

Now for any m > 1 (that is, m ∈ N with m ≠ 1), define the set

S = {n ∈ N : n ≤ m} ∪ {n ∈ N : m ≤ n} .

By the above remarks, 1 ∈ S. Supposing now that n ∈ S for some n ∈ N, we claim that n + 1 ∈ S. To show this, we have three cases.

1. Case 1: n = m. In this case, n + 1 = m + 1 ≥ m, giving n + 1 ∈ S.

2. Case 2: n > m, so there exists a ∈ N such that n = m + a. Then n + 1 = m + a + 1 ≥ m, giving n + 1 ∈ S.

3. Case 3: n < m, so there exists a ∈ N such that m = n + a. If a = 1 then n + 1 = m ∈ S. Otherwise a > 1, implying that a − 1 ∈ N (that is, a − 1 is defined), so

m = n + a = n + (a − 1) + 1 = (n + 1) + (a − 1) > n + 1 ,

so that n + 1 ∈ S.

By the inductive axiom, S = N and therefore for all n, we have n ≤ m or m ≤ n.

A consequence of these properties is trichotomy of the natural numbers. For any m, n ∈ N, exactly one of the following holds: m < n, m = n or n < m.

A property that relates addition and ordering is

• if m,n, r ∈ N such that m < n then m+ r < n+ r.

Proof. There must be a ∈ N such that n = m + a. Then n + r = m + a + r = (m + r) + a, giving m + r < n + r.

Clearly then if m ≤ n and r ∈ N we have m+ r ≤ n+ r.


• If n < k then n+ 1 ≤ k.

Proof. If n < k then there exists j ∈ N such that n + j = k. Because 1 ≤ j we find n + 1 ≤ n + j = k.

Multiplication

We define multiplication inductively by

n · 1 = n for all n ∈ N ,
n · s(m) = n + (n · m) for all n, m ∈ N .

One can prove the following properties (try it!). Let m, n, r, s ∈ N, where in item 4 the letter s denotes a natural number and not the successor function:

1. n · (m + r) = (n · m) + (n · r).

2. n · m = m · n.

3. (n · m) · r = n · (m · r).

4. if n < m and r ≤ s then r · n < s · m.
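Before attempting the proofs, the four properties can be tested numerically on small inputs. The sketch below again models N by Python integers n ≥ 1 (an assumption of the illustration) and defines add and mul only through the recursive rules; the variable t stands for the natural number called s in item 4, to avoid a clash with the successor.

```python
def succ(n):
    """Successor function; natural numbers are modelled as Python ints >= 1."""
    return n + 1

def add(n, m):
    """n + 1 = s(n) and n + s(m) = s(n + m)."""
    return succ(n) if m == 1 else succ(add(n, m - 1))

def mul(n, m):
    """n · 1 = n and n · s(m) = n + (n · m)."""
    return n if m == 1 else add(n, mul(n, m - 1))

def check_properties(bound):
    """Test properties 1-4 for all natural numbers up to `bound`."""
    N = range(1, bound + 1)
    for n in N:
        for m in N:
            if mul(n, m) != mul(m, n):                      # property 2
                return False
            for r in N:
                if mul(n, add(m, r)) != add(mul(n, m), mul(n, r)):  # property 1
                    return False
                if mul(mul(n, m), r) != mul(n, mul(m, r)):  # property 3
                    return False
                for t in N:                                  # property 4
                    if n < m and r <= t and not mul(r, n) < mul(t, m):
                        return False
    return True
```

A call such as check_properties(4) only examines finitely many cases, so it can suggest but not establish the general statements.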

1.5 Cardinality and the natural numbers

For each n ∈ N we write the set

J_n = {m ∈ N : m ≤ n} .

Note that J_1 = {1} and for n ≥ 1, we have

J_{n+1} = J_n ∪ {n + 1} .

To show the inclusion ⊃, let k be in the right side. If k = n + 1 then k ∈ J_{n+1}. Otherwise k ≤ n, giving by n ≤ n + 1 the inequality k ≤ n + 1, or k ∈ J_{n+1}. To prove the inclusion ⊂, suppose that k ∈ J_{n+1}. If k ∈ J_n we are done, so suppose that k ∉ J_n. Therefore k > n, so k ≥ n + 1. On the other hand, k ≤ n + 1, so k = n + 1.

Definition 1.5.1. For an arbitrary set A we say that A has cardinality n if A ≃ J_n. In this case we say A is finite and we write ♯(A) = n. If A is not equivalent to any J_n we say A is infinite.

In this definition, ♯(A) is an equivalence class of sets and n is a number, so what we have written here is purely symbolic: it means A ≃ J_n.

Lemma 1.5.2. If A and B are sets such that A ⊂ B then ♯(A) ≤ ♯(B).

Proof. Define f : A → B by f(a) = a. Then f is an injection.


Theorem 1.5.3. For all n ∈ N, ♯(J_n) < ♯(J_{n+1}) < ♯(N).

Proof. Each set above is a subset of the next, so the theorem holds using ≤ instead of <. We must then prove ≠ in each spot above. Assume first that we have proved that ♯(J_n) ≠ ♯(J_{n+1}) for all n ∈ N; we will show that ♯(J_n) ≠ ♯(N) for all n ∈ N. If we had equality, then we would find ♯(J_{n+1}) ≤ ♯(N) = ♯(J_n). Together with ♯(J_n) ≤ ♯(J_{n+1}) and antisymmetry, this gives ♯(J_n) = ♯(J_{n+1}), which contradicts the first inequality.

To prove the inequality ♯(J_n) ≠ ♯(J_{n+1}), we use induction. Clearly it holds for n = 1 since J_1 = {1} and J_2 = {1, 2} and any function from J_1 to J_2 can only have one element in its range (cannot be onto). Suppose then that ♯(J_n) ≠ ♯(J_{n+1}); we will prove that ♯(J_{n+1}) ≠ ♯(J_{n+2}) by contradiction. Assume that there is a bijection f : J_{n+1} → J_{n+2}. Then some element must be mapped to n + 2; call this k ∈ J_{n+1}. Define h : J_{n+1} → J_{n+1} by

h(m) = m if m ≠ k, n + 1 ,
h(m) = n + 1 if m = k ,
h(m) = k if m = n + 1 .

This function just swaps k and n + 1. It follows then that f̂ = f ◦ h : J_{n+1} → J_{n+2} is a bijection that maps n + 1 to n + 2.

Now J_n is just J_{n+1} \ {n + 1} and J_{n+1} is just J_{n+2} \ {n + 2}, so define g : J_n → J_{n+1} to do exactly what f̂ does: g(m) = f̂(m). It follows that g is a bijection from J_n to J_{n+1}, giving J_n ≃ J_{n+1}, a contradiction.

Because of the theorem, if a set A has A ≃ N it must be infinite. In this case we say that A is countably infinite. Otherwise, if A is infinite and ♯(A) ≠ ♯(N), we say it is uncountable. From this point on, we will be more loose about working with the natural numbers. For example, we will use the terms finite and infinite in the same way that we normally do: a set is finite if it has finitely many elements and infinite otherwise. Of course every proof we write from now on could be done using the Peano axioms, but we will be spared that.

Theorem 1.5.4. Let S be an infinite subset of N. Then S is countably infinite.

Proof. We must construct a bijection from N to S. We can actually do this using the well-ordering property: that each non-empty subset of N has a least element. Define f : N → S recursively: f(1) is the least element of S and, assuming we have defined f(1), . . . , f(n), define f(n + 1) to be the least element of S \ {f(1), . . . , f(n)}.

This is a bijection.
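The recursive construction of f can be sketched as a generator that repeatedly picks the least element of S not yet listed. The code below assumes that S is described by a membership test in_S on the positive integers; that interface and the example set are assumptions for illustration.

```python
from itertools import count, islice

def enumerate_subset(in_S):
    """Yield f(1), f(2), ... for the subset S = {k in N : in_S(k)}.

    Scanning k = 1, 2, 3, ... in increasing order means each yielded value is
    the least element of S that has not been listed before.
    """
    for k in count(1):
        if in_S(k):
            yield k

def one_mod_three(k):
    # Hypothetical infinite subset of N: numbers leaving remainder 1 mod 3.
    return k % 3 == 1

first_five = list(islice(enumerate_subset(one_mod_three), 5))
```

If S were finite, the generator would search forever after exhausting S, which reflects the hypothesis in the theorem that S is infinite.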

Definition 1.5.5. We say a set A is countable if it is either finite or countably infinite.

Note that A is countable if and only if there is an injection f : A → N; that is, that ♯(A) ≤ ♯(N).

Theorem 1.5.6. Let C be a countable collection of countable sets. Then ⋃_{A∈C} A is countable.

Proof. To prove this we need to construct a bijection from N. We will do this somewhat non-rigorously, thinking of a bijection from N as a listing of elements of ⋃_{A∈C} A in sequence. For example, given a countably infinite set S we may take a bijection f : N → S and list all of the elements of S as

f(1), f(2), f(3), . . .

If S is finite then this corresponds to a finite list.

Since each A ∈ C is countable, we may list its elements. The collection C itself is countable so we can list the elements of ⋃_{A∈C} A in an array:

a1 a2
b1 b2 b3 · · ·
c1
d1 d2 d3 d4 · · ·
· · ·

Note that some rows are finite. We now list the elements according to diagonals. That is, we write the list as

a1, b1, a2, c1, b2, d1, b3, d2, . . .

Because we want the list to correspond to a bijection, we need to make sure that no element is repeated. So, for instance, if b1 and a2 are equal we would only include the first.
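The diagonal listing, including the rule of skipping repeated elements, can be sketched for a finite piece of the array. In the code below each row is a Python list holding the known beginning of that row; the function name, the limit parameter and the truncation to finitely many rows are assumptions made for illustration.

```python
def diagonal_listing(rows, limit):
    """List up to `limit` distinct entries of a ragged array by diagonals.

    Diagonal d contains the entries rows[i][d - i]; within a diagonal we go
    from the lowest row upward, matching the order a1, b1, a2, c1, b2, ...
    """
    seen, out = set(), []
    max_d = len(rows) + max((len(r) for r in rows), default=0)
    d = 0
    while len(out) < limit and d < max_d:
        for i in range(min(d, len(rows) - 1), -1, -1):
            j = d - i
            if j < len(rows[i]) and rows[i][j] not in seen:
                seen.add(rows[i][j])       # skip anything already listed
                out.append(rows[i][j])
                if len(out) == limit:
                    break
        d += 1
    return out

rows = [
    ["a1", "a2"],
    ["b1", "b2", "b3", "b4"],
    ["c1"],
    ["d1", "d2", "d3", "d4"],
]
listing = diagonal_listing(rows, 8)
```

With these four rows the first eight entries come out in the same order as the list displayed above.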

1.6 Exercises

1. Let f : A → B and g : B → C be functions. Show that the relation g ◦ f ⊂ A × C,defined by

(a, c) ∈ g ◦ f if (a, b) ∈ f and (b, c) ∈ g for some b ∈ B

is a function.

2. Show that the function f : P(A) → F2^A mentioned at the end of Section 1.3 and given by f(S) = χS is a bijection.

3. Prove the properties of multiplication listed at the end of Section 1.4.

4. Prove the following statements by induction.

(a) For all n ∈ N,

1 + 2 + · · · + n = n(n + 1)/2 .

(b) For all n ∈ N,

1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6 .

5. Strong Induction. In this exercise we introduce strong mathematical induction, which, although being referred to as “strong,” is actually equivalent to mathematical induction. Suppose we are given a collection {P (n) : n ∈ N} of mathematical statements. To show P (n) is true for all n, mathematical induction dictates that we show two things hold: P (1) is true and if P (n) is true for some n ∈ N then P (n + 1) is true.

To argue instead using strong induction we prove that

• P (1) is true and

• if n ∈ N is such that P (k) is true for all k ≤ n then P (n + 1) is true.

(a) Define a sequence (a_n) of real numbers recursively by

a_1 = 1 and a_n = a_1 + · · · + a_[n/2] for n ≥ 2 .

(Here [n/2] is the largest integer no bigger than n/2.) Prove by strong induction that a_n ≤ 2^(n−1) for n ≥ 2. Is it possible to find b < 2 such that a_n ≤ b^(n−1) for all n ≥ 2?

(b) Why does strong induction follow from mathematical induction? In other words, in the second step of strong induction, why are we allowed to assume that P (k) is true for all k ≤ n to prove that P (n + 1) is true?

6. Prove that any non-empty subset S ⊂ N has a least element. That is, there is an s ∈ S such that for all t ∈ S we have s ≤ t. This is a major result about N, expressed by saying that N is well-ordered.

Hint. Assume there is no least element. Let

M = {m ∈ N : ∀t ∈ S, m ≤ t} .

Use Peano’s induction axiom to prove that M = N. Does this lead to a contradiction?


2 The real numbers

2.1 Rationals and suprema

From now on we will proceed through Rudin, using the standard notations

Z = {. . . , −1, 0, 1, . . .}

Q = {m/n : m, n ∈ Z and n ≠ 0} .

When thinking about the rational numbers, we quickly come to realize that they do not capture all that we wish to express using numbers. For instance,

Theorem 2.1.1. There is no rational number whose square is 2.

Proof. We argue by contradiction, so assume that 2 = (m/n)² for some m, n ∈ Z with n ≠ 0. We may assume that m and n are not both even; otherwise, we can “reduce the fraction,” removing enough factors of 2 from the numerator and denominator. Then

2n² = m² ,

so m² is even. This actually implies that m must be even, for otherwise m² would be odd (since the square of an odd number is odd). Therefore we can write m = 2s for some s ∈ Z. Plugging back in, we find

2n² = 4s² or n² = 2s² ,

so n² is also even, giving that n is even. This is a contradiction.

From the previous theorem, what we know as √2 is not a rational number. Therefore if we were to construct a theory from only rationals, we would have a “hole” where we think √2 should be. What is even stranger is that there are rational numbers arbitrarily close to this hole.

Theorem 2.1.2. If q ∈ Q satisfies 0 < q2 < 2 then we can find another rational q ∈ Q suchthat

q2 < q2 < 2 .

Similarly, for each r ∈ Q such that r2 > 2, there is another rational r such that 2 < r2 < r2.

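Proof. Suppose that q > 0 satisfies $q^2 < 2$ and define

$\hat{q} = q + \frac{2 - q^2}{q + 2}$ .

Then $\hat{q} > q$ and

$\hat{q}^2 - 2 = \frac{2(q^2 - 2)}{(q + 2)^2}$ ,

giving $\hat{q}^2 < 2$. The same formula applied to r > 0 with $r^2 > 2$ gives $\hat{r} < r$ and $\hat{r}^2 > 2$, which is the second statement.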
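The construction in this proof can be tried out with exact rational arithmetic. The following is only an illustrative sketch: the function name `improve` and the starting value q = 1 are choices made here, not part of the notes.

```python
from fractions import Fraction

def improve(q):
    """Return q_hat = q + (2 - q^2)/(q + 2) for a positive rational q."""
    return q + (2 - q * q) / (q + 2)

# Iterate the map starting from q = 1 and record each pair (q, q_hat).
q = Fraction(1)
steps = []
for _ in range(5):
    q_hat = improve(q)
    steps.append((q, q_hat))
    q = q_hat

# Every step moves to a strictly larger rational whose square is still below 2.
for old, new in steps:
    print(old, "->", new, " square:", float(new * new))
```

Each printed value is a rational number strictly larger than the previous one, and its square stays below 2, as the theorem predicts.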


We see from above that the set $\{q \in Q : q^2 < 2\}$ does not have a largest element. This leads us to study largest elements of sets more carefully.

Definition 2.1.3. If A is a set with a partial ordering ≤ we say that a ∈ A is an upper bound for a subset B ⊂ A if b ≤ a for all b ∈ B. We say that a is a least upper bound for B if a is an upper bound for B and whenever a′ is an upper bound for B, we have a ≤ a′. We define lower bound and greatest lower bound similarly.

Note that if a is a least upper bound for B then a is unique. Indeed, assume that a and a′ are least upper bounds. Since they are both upper bounds, we have a ≤ a′ and a′ ≤ a, so by antisymmetry of partial orderings, a = a′. Because of this uniqueness, there is no harm in writing

a = sup B when a is the least upper bound of B

and

a = inf B when a is the greatest lower bound of B .

Proposition 2.1.4. Let A be a totally ordered set and B a subset. Define C to be the set of all upper bounds for B. Then sup B = inf C.

Proof. We are trying to show that some element (inf C) is the supremum of B, so we must show two things: inf C is an upper bound for B and any other upper bound a for B satisfies inf C ≤ a. The second statement is easy because if a is an upper bound for B then a ∈ C. As inf C is a lower bound for C we then have inf C ≤ a.

For the first, assume that inf C is not an upper bound for B, so there exists b ∈ B such that inf C is not ≥ b. By trichotomy, inf C < b. We claim then that b is a lower bound for C which is larger than the greatest lower bound, a contradiction. Why is this? If c ∈ C then c is an upper bound for B, giving c ≥ b, or b ≤ c.

Note that Theorem 2.1.2 implies that the set $\{q \in Q : q^2 < 2\}$ does not have a supremum in Q. Indeed, if it did have a supremum r, then r would be a positive rational upper bound for this set. By the first statement of the theorem we cannot have $r^2 < 2$, so $r^2 > 2$, and then by the second statement we could find a smaller positive rational $\hat{r}$ with $\hat{r}^2 > 2$ that is still an upper bound, a contradiction. So one way of formulating the fact that there are "holes" in Q is to say that it does not have the least upper bound property.

Definition 2.1.5. Let A be a totally ordered set with order ≤. We say that A has the least upper bound property if each nonempty subset B ⊂ A with an upper bound in A has a least upper bound in A.

2.2 Existence and properties of real numbers

Therefore we are led to extend the rational numbers to fill in the holes. This is actually quite a difficult procedure and there are many routes to its end. We will not discuss these, however, and will instead state the main theorem about the existence of the real numbers without proof. The main point of this course will be to understand properties of the real numbers, and not their existence and uniqueness.


For the statement, one needs the definition of an ordered field, which is a certain type of totally ordered set with multiplication and addition (like the rationals).

Theorem 2.2.1 (Existence and uniqueness of R). There exists a unique ordered field with the least upper bound property.

The sense in which uniqueness holds is somewhat technical; it is not that any two ordered fields as above must be equal, but they must be isomorphic. Again we defer to Rudin for these definitions. We will now assume the existence of R, that it contains Q and Z, and its usual properties.

One extremely useful property of R that follows from the least upper bound property is

Theorem 2.2.2 (Archimedean property of R). Given x, y ∈ R with x ≠ 0, there exists n ∈ Z such that

nx > y .

Proof. First let x, y ∈ R such that x, y > 0 and assume that there is no such n. Then the set

{nx : n ∈ N}

is bounded above by y. As it is clearly nonempty, it has a supremum s. Then s − x < s, so s − x cannot be an upper bound, giving the existence of some m ∈ N such that

s − x < mx .

However this implies that s < (m + 1)x, so s was actually not an upper bound, a contradiction. This proves the statement for the case x, y > 0. The other cases can be obtained from this one by instead considering −x and/or −y.

The Archimedean property implies

Corollary 2.2.3 (Density of Q in R). Let x, y ∈ R with x < y. There exists q ∈ Q such that x < q < y.

Proof. Apply the Archimedean property to y − x and 1 to find n ∈ Z such that n(y − x) > 1; note that n > 0 because y − x > 0. Applying it again, we can also find integers m1 > nx and m2 > −nx, so

−m2 < nx < m1 .

It follows then that there is an m ∈ Z such that m− 1 ≤ nx < m. Finally,

nx < m ≤ 1 + nx < ny .

Dividing by n we get x < m/n < y.
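The proof is constructive, and its two steps can be sketched in a few lines. The function name `rational_between` is hypothetical, and the inputs are assumed to be exact `Fraction` values so that no rounding interferes; this is an illustration of the argument, not a statement about arbitrary real numbers.

```python
from fractions import Fraction
from math import floor

def rational_between(x, y):
    """Return a rational m/n with x < m/n < y, following the proof's two steps."""
    if not x < y:
        raise ValueError("need x < y")
    n = floor(1 / (y - x)) + 1   # a positive integer with n(y - x) > 1
    m = floor(n * x) + 1         # the integer with m - 1 <= nx < m
    return Fraction(m, n)

q = rational_between(Fraction(1, 3), Fraction(1, 2))
print(q)  # a rational strictly between 1/3 and 1/2
```

The choice of n guarantees that the interval (nx, ny) has length greater than 1, so the integer m just above nx still lies below ny.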

Now we return to countability.

Theorem 2.2.4. The set Q is countable, whereas R is uncountable.


Proof. We already know that N× N is countable: this is from setting up the array

(1, 1) (2, 1) (3, 1) · · ·
(1, 2) (2, 2) (3, 2) · · ·
(1, 3) (2, 3) (3, 3) · · ·
· · ·

and listing the elements along diagonals. On the other hand, there is an injection

f : Q+ → N× N ,

where $Q^+$ is the set of positive rationals. One such f is given by f(m/n) = (m, n), where m/n is the "reduced fraction" for the rational, expressed with m, n ∈ N. Therefore $Q^+$ is countable. Similarly, $Q^-$, the set of negative rationals, is countable. Last, $Q = Q^+ \cup Q^- \cup \{0\}$ is a union of 3 countable sets and is thus countable.
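The diagonal listing of N × N and the induced listing of the positive rationals can be sketched with generators. The names `pairs_by_diagonals` and `positive_rationals` are made up for this illustration, and the direction in which each diagonal is traversed is an arbitrary choice.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def pairs_by_diagonals():
    """Yield every pair (m, n) of natural numbers, one diagonal m + n = s at a time."""
    s = 2
    while True:
        for m in range(1, s):
            yield (m, s - m)
        s += 1

def positive_rationals():
    """Yield each positive rational exactly once, keeping only reduced fractions."""
    for m, n in pairs_by_diagonals():
        if gcd(m, n) == 1:
            yield Fraction(m, n)

first_pairs = list(islice(pairs_by_diagonals(), 6))
first_rationals = list(islice(positive_rationals(), 9))
print(first_pairs)
print([str(q) for q in first_rationals])
```

Skipping the non-reduced pairs is the code counterpart of choosing the reduced fraction in the definition of the injection f.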

To prove R is uncountable, we will use decimal expansions for real numbers. In other words, we write

$x = .a_1 a_2 a_3 \ldots$

where $a_i \in \{0, \ldots, 9\}$ for all i. Since we have not proved anything about decimal expansions, we are certainly assuming a lot here, but this is how things go. Note that each real number has at most 2 decimal expansions (for instance, 1/4 = .2500 . . . = .2499 . . .).

Assume that R is countable. Then as there are at most two decimal expansions for each real number, the set of decimal expansions is countable (check this!). Now write the set of all expansions in a list:

1   $.a_0 a_1 a_2 \ldots$
2   $.b_0 b_1 b_2 \ldots$
3   $.c_0 c_1 c_2 \ldots$
· · ·

We will show that no matter what list we are given (as above), there must be a sequence that is not in the list. This implies that there can be no such list, and thus R is uncountable.

Consider the diagonal element of the list. That is, we take $a_0$ for the first digit, $b_1$ for the second, $c_2$ for the third and so on:

$.a_0 b_1 c_2 d_3 \ldots$

We now have a rule to transform this diagonal element into a new one. We can use many, but here is one: change each digit to a 0 if it is not 0, and replace it with 9 if it is 0. For example,

.0119020 . . . −→ .9000909 . . .

Note that this procedure changes the diagonal number into a new one that differs from the diagonal element in every decimal place. Call this new expansion $A = .\alpha_0 \alpha_1 \ldots$ (we use a new letter for its digits to avoid confusion with the first entry of the list).

Now our original list contains all expansions, so it must contain A at some point; let us say that the n-th element of the list is A. Then consider the n-th digit $\alpha_n$ of A. On the one hand, by construction, $\alpha_n$ is not equal to the n-th digit of the diagonal element. On the other hand, by the position in the list, $\alpha_n$ equals the n-th digit of the diagonal element. This is a contradiction.
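The digit rule and the diagonal construction can be illustrated on a finite, truncated list. This is only a sketch: the rows below are arbitrary made-up digit strings, and the names `flip` and `diagonal_escape` are hypothetical. A finite table cannot of course prove uncountability; it only shows the mechanism.

```python
def flip_digit(d):
    """The rule from the text: any nonzero digit becomes 0, and 0 becomes 9."""
    return "0" if d != "0" else "9"

def flip(expansion):
    """Apply the rule to every digit of a string of digits."""
    return "".join(flip_digit(d) for d in expansion)

def diagonal_escape(rows):
    """Read off the diagonal of a square table of digit strings and flip it."""
    diagonal = "".join(row[i] for i, row in enumerate(rows))
    return flip(diagonal)

# An arbitrary 7-by-7 table of digits standing in for the start of a list.
rows = ["0119020", "1415926", "7182818", "3333333",
        "0000000", "9999999", "1234567"]
new = diagonal_escape(rows)

print(flip("0119020"))  # the example from the text
print(new)              # differs from row i in position i, for every i
```

Because `new` disagrees with the i-th row in the i-th digit, it cannot equal any row of the table.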


2.3 $R^n$ for n ≥ 2

A very important extension of R is given by n-dimensional Euclidean space.

Definition 2.3.1. For n ≥ 2, the set $R^n$ is defined as

$R^n = \{\vec{a} = (a_1, \ldots, a_n) : a_i \in R \text{ for all } i\}$ .

Addition of elements is defined as

$\vec{a} + \vec{b} = (a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n)$

and multiplication of elements by numbers is

$c\vec{a} = c(a_1, \ldots, a_n) = (ca_1, \ldots, ca_n), \quad c \in R$ .

Note that this definition gives us R for n = 1.

On $R^n$ we place a distance, but to do that, we need the existence of square roots. We will take this for granted now, since we will prove it later using continuity.

Lemma 2.3.2. For each x ∈ R with x ≥ 0 there exists a unique y ∈ R with y ≥ 0 such that $y^2 = x$. This element is written $y = \sqrt{x}$.

Definition 2.3.3. On the set $R^n$ we define the norm

$|\vec{a}| = |(a_1, \ldots, a_n)| = \sqrt{a_1^2 + \cdots + a_n^2}$

and inner product

$\vec{a} \cdot \vec{b} = (a_1, \ldots, a_n) \cdot (b_1, \ldots, b_n) = a_1 b_1 + \cdots + a_n b_n$ .

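Theorem 2.3.4. Suppose $\vec{a}, \vec{b}, \vec{c} \in R^n$ and c ∈ R. Then

1. $|\vec{a}| \ge 0$ with $|\vec{a}| = 0$ if and only if $\vec{a} = 0$.

2. $|c\vec{a}| = |c||\vec{a}|$.

3. (Cauchy-Schwarz inequality) $|\vec{a} \cdot \vec{b}| \le |\vec{a}||\vec{b}|$.

4. (Triangle inequality) $|\vec{a} + \vec{b}| \le |\vec{a}| + |\vec{b}|$.

5. $|\vec{a} - \vec{b}| \le |\vec{a} - \vec{c}| + |\vec{c} - \vec{b}|$.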

Proof. The first two follow easily; for instance since $a^2 \ge 0$ for all a ∈ R (this is actually part of the definition of ordered field), we get $a_1^2 + \cdots + a_n^2 \ge 0$ and therefore $|\vec{a}| \ge 0$. If $|\vec{a}| = 0$ then by uniqueness of square roots, $a_1^2 + \cdots + a_n^2 = 0$ and so $0 \ge a_i^2$ for all i, giving $a_i = 0$ for all i.

For the third item, we first give a lemma.


Lemma 2.3.5. If $ax^2 + bx + c \ge 0$ for all x ∈ R then $b^2 \le 4ac$.

Proof. If a = 0 then bx ≥ −c for all x. Then we claim b must be zero. If not, then plugging in x = −(c + 1)/b gives bx = −c − 1 < −c, a contradiction. Therefore if a = 0 we must have b = 0 and therefore $b^2 \le 4ac$ as claimed.

Otherwise a ≠ 0. First assume that a > 0. Plug in x = −b/(2a) to get

$-b^2/(4a) + c \ge 0$

giving $b^2 \le 4ac$. Last, the case a < 0 cannot occur. Indeed, choosing x > 0 so large that $|b|/x \le |a|/4$ and $|c|/x^2 \le |a|/4$, we would get $ax^2 + bx + c = x^2(a + b/x + c/x^2) \le x^2(a + |a|/2) = ax^2/2 < 0$, a contradiction.

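To prove Cauchy-Schwarz, note that for all x ∈ R,

$0 \le (a_1 x - b_1)^2 + \cdots + (a_n x - b_n)^2$

$= (a_1^2 + \cdots + a_n^2)x^2 - 2(a_1 b_1 + \cdots + a_n b_n)x + (b_1^2 + \cdots + b_n^2)$

$= |\vec{a}|^2 x^2 - 2(\vec{a} \cdot \vec{b})x + |\vec{b}|^2$ .

So using the lemma, $4(\vec{a} \cdot \vec{b})^2 \le 4|\vec{a}|^2|\vec{b}|^2$, that is, $(\vec{a} \cdot \vec{b})^2 \le |\vec{a}|^2|\vec{b}|^2$.

The last two items follow directly from the Cauchy-Schwarz inequality. Indeed,

$|\vec{a} + \vec{b}|^2 = (\vec{a} + \vec{b}) \cdot (\vec{a} + \vec{b})$

$= \vec{a} \cdot \vec{a} + 2\vec{a} \cdot \vec{b} + \vec{b} \cdot \vec{b}$

$\le |\vec{a}|^2 + 2|\vec{a}||\vec{b}| + |\vec{b}|^2$

$= (|\vec{a}| + |\vec{b}|)^2$ .

The last inequality follows by taking $\vec{a} - \vec{c}$ and $\vec{c} - \vec{b}$ in the previous.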
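As a sanity check, the inequalities can be evaluated on a concrete pair of vectors. The vectors below are arbitrary choices for illustration, and the helper names `dot` and `norm` are not from the notes. The Cauchy-Schwarz check is done in squared form with exact integers; the triangle inequality uses floating-point square roots.

```python
from math import sqrt

def dot(a, b):
    """Inner product a . b = a_1 b_1 + ... + a_n b_n."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm |a| = sqrt(a . a)."""
    return sqrt(dot(a, a))

a = (1, 2, 3)
b = (4, -5, 6)

# Cauchy-Schwarz in squared form, exact for integer vectors.
cs_holds = dot(a, b) ** 2 <= dot(a, a) * dot(b, b)

# Discriminant of |a|^2 x^2 - 2(a.b) x + |b|^2, nonpositive by the lemma.
disc = (2 * dot(a, b)) ** 2 - 4 * dot(a, a) * dot(b, b)

# Triangle inequality for a + b.
s = tuple(x + y for x, y in zip(a, b))
triangle_holds = norm(s) <= norm(a) + norm(b)

print(cs_holds, disc, triangle_holds)
```

The sign of `disc` is exactly the statement $b^2 \le 4ac$ of the lemma applied to the quadratic in the proof.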

2.4 Exercises

1. For each of the following examples, find the supremum and the infimum of the set S. Also state whether or not they are elements of S.

(a) S = {x ∈ [0, 5] : cos x = 0}.

(b) $S = \{x : x^2 - 2x - 3 < 0\}$.

(c) $S = \{s_n : s_n = \sum_{i=1}^{n} 2^{-i}\}$.

2. Prove by induction that for all n ∈ N and real numbers $x_1, \ldots, x_n$,

$|x_1 + \cdots + x_n| \le |x_1| + \cdots + |x_n|$ .

3. Let A,B ⊂ R be nonempty and bounded above.


(a) Define the sum set

A + B = {a + b : a ∈ A, b ∈ B} .

Prove that sup(A+B) = supA+ supB.

(b) Define the product set

A ·B = {a · b : a ∈ A, b ∈ B} .

Is it true that sup(A · B) = sup A · sup B? If so, provide a proof; otherwise, provide a counterexample.

4. Let C be a collection of open intervals (sets I = (a, b) for a < b) such that

• for all I ∈ C, I ≠ ∅ and

• if I, J ∈ C satisfy I ≠ J then I ∩ J = ∅.

Prove that C is countable.

Hint. Define a function f : C → S for some countable set S ⊂ R by setting f(I) equal to some carefully chosen number.


3 Metric spaces

3.1 Definitions

Definition 3.1.1. A set X with a function d : X × X → R is a metric space if for all x, y, z ∈ X,

1. d(x, y) ≥ 0 and equals 0 if and only if x = y,

2. d(x, y) = d(y, x) and

3. d(x, y) ≤ d(x, z) + d(z, y).

Then we call d a metric.

Examples.

1. A useful example of a metric space is $R^n$ with metric $d(\vec{a}, \vec{b}) = |\vec{a} - \vec{b}|$.

2. If X is any nonempty set we can define the discrete metric by

$d(x, y) = \begin{cases} 1 & \text{if } x \ne y \\ 0 & \text{if } x = y \end{cases}$ .

3. The set F [0, 1] of bounded functions f : [0, 1]→ R is a metric space with metric

d(f, g) = sup{|f(x)− g(x)| : x ∈ [0, 1]} .
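For the first two examples, the metric axioms can be spot-checked on a handful of points. This is only a finite sanity check under stated assumptions, not a proof: the function name `check_metric_axioms`, the sample points, and the small tolerance for floating-point rounding are all choices made for this sketch. Symmetry is checked as well, since Rudin's definition of a metric also requires it.

```python
from itertools import product
from math import dist

def discrete(x, y):
    """The discrete metric: 0 if the points agree, 1 otherwise."""
    return 0 if x == y else 1

def check_metric_axioms(points, d, tol=1e-12):
    """Test positivity, symmetry and the triangle inequality on all triples."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < 0:
            return False
        if (d(x, y) == 0) != (x == y):
            return False
        if abs(d(x, y) - d(y, x)) > tol:
            return False
        if d(x, y) > d(x, z) + d(z, y) + tol:
            return False
    return True

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (-3.0, 4.0)]
print(check_metric_axioms(pts, dist))      # Euclidean metric on R^2
print(check_metric_axioms(pts, discrete))  # discrete metric

# Squared Euclidean distance fails the triangle inequality on collinear points.
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(check_metric_axioms(line, lambda x, y: dist(x, y) ** 2))
```

The last call shows how such a check can expose a function that is not a metric: the squared distance from (0, 0) to (2, 0) is 4, which exceeds 1 + 1.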

3.2 Open and closed sets

Let (X, d) be a metric space. We are interested in the possible subsets of X and in what ways we can describe these using the metric d. Let’s start with the simplest.

Definition 3.2.1. Let r > 0. The neighborhood of radius r centered at x ∈ X is the set

$B_r(x) = \{y \in X : d(x, y) < r\}$ .

For example,

1. In R using the metric d(x, y) = |x − y| we have the open interval

$B_r(x) = (x - r, x + r) = \{y \in R : x - r < y < x + r\}$ .

2. In $R^n$ using the metric d(x, y) = |x − y| we have the open ball

$B_r(x) = \{(y_1, \ldots, y_n) : (x_1 - y_1)^2 + \cdots + (x_n - y_n)^2 < r^2\}$ .

To describe that these sets appear to be open (that is, no point is on the boundary), we introduce a formal definition of open.

Definition 3.2.2. Let (X, d) be a metric space. A set Y ⊂ X is open if for each y ∈ Y there exists r > 0 such that $B_r(y) \subset Y$.

For each point y we must be able to fit a (possibly tiny) neighborhood around y so that it still stays in the set Y. Thinking of Y as, for example, an open ball in $R^n$, as our point y approaches the boundary of this set, the radius we take for the neighborhood around this point will have to decrease.


Proposition 3.2.3. Any neighborhood is open.

Proof. Let x ∈ X and r > 0. To show that $B_r(x)$ is open we must choose $y \in B_r(x)$ and show that there exists some s > 0 such that $B_s(y) \subset B_r(x)$. The radius s will depend on how close y is to the boundary. Therefore, choose

s = r − d(x, y) .

To show that for this s, we have Bs(y) ⊂ Br(x) we take z ∈ Bs(y). Then

d(x, z) ≤ d(x, y) + d(y, z) < d(x, y) + s = r .
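The choice s = r − d(x, y) can be tried numerically in the plane. The center, radius and point y below are arbitrary values picked for this sketch, and the sample points sit at distance 0.99·s from y so that floating-point rounding does not matter.

```python
from math import cos, dist, pi, sin

x, r = (0.0, 0.0), 1.0
y = (0.6, 0.3)             # a point of B_r(x), since d(x, y) < 1
s = r - dist(x, y)         # the radius chosen in the proof

# Twelve sample points z with d(y, z) = 0.99 * s, spread around y.
angles = [2 * pi * k / 12 for k in range(12)]
samples = [(y[0] + 0.99 * s * cos(t), y[1] + 0.99 * s * sin(t)) for t in angles]
inside = all(dist(x, z) < r for z in samples)

# A point at distance 1.5 * s from y, pushed straight away from x, leaves B_r(x).
u = (y[0] / dist(x, y), y[1] / dist(x, y))
z_far = (y[0] + 1.5 * s * u[0], y[1] + 1.5 * s * u[1])
escapes = dist(x, z_far) > r

print(s, inside, escapes)
```

The second check suggests why the radius must shrink as y approaches the boundary: a neighborhood of y with radius larger than r − d(x, y) pokes out of the ball in Euclidean space.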

Some more examples:

1. In R, the only intervals that are open are the (surprise!) open intervals. For instance, let’s consider the half-open interval (0, 1] = {x ∈ R : 0 < x ≤ 1}. If it were open, we would be able to, given any x ∈ (0, 1], find r > 0 such that $B_r(x) \subset (0, 1]$. But clearly this is false because $B_r(1)$ contains 1 + r/2.

2. In R2, the set {(x, y) : y > 0} ∪ {(x, y) : y < −1} is open.

3. In R3, the set {(x, y, z) : y > 0} ∪ {(0, 0, 0)} is not open.

Proposition 3.2.4. Let C be a collection of open sets.

1. The union $\bigcup_{O \in \mathcal{C}} O$ is open.

2. If $\mathcal{C}$ is finite then $\bigcap_{O \in \mathcal{C}} O$ is open. This need not be true if the collection is infinite.

Proof. Let $x \in \bigcup_{O \in \mathcal{C}} O$. Then there exists $O \in \mathcal{C}$ such that x ∈ O. Since O is open, there exists r > 0 such that $B_r(x) \subset O$. This is also a subset of $\bigcup_{O \in \mathcal{C}} O$ so this set is open.

To show that we cannot allow infinite intersections, consider the sets (−1/n, 1 + 1/n) in R. We have

$\bigcap_{n=1}^{\infty} (-1/n, 1 + 1/n) = [0, 1]$ ,

which is not open (under the usual metric of R).

For finite intersections, let $O_1, \ldots, O_n$ be the open sets from $\mathcal{C}$ and $x \in \bigcap_{i=1}^{n} O_i$. Then for each i, we have $x \in O_i$ and therefore there exists $r_i > 0$ such that $B_{r_i}(x) \subset O_i$. Letting $r = \min\{r_1, \ldots, r_n\}$, we have $B_r(x) \subset B_{r_i}(x)$ for all i and therefore $B_r(x) \subset O_i$ for all i. This implies $B_r(x)$ is a subset of the intersection and we are done.

Definition 3.2.5. An interior point of Y ⊂ X is a point y ∈ Y such that there exists r > 0 with $B_r(y) \subset Y$. Write $Y^\circ$ for the set of interior points of Y.

Directly by definition, Y is open if and only if Y = Y ◦.


Examples:

1. The set of interior points of [0, 1] (under the usual metric) is (0, 1).

2. The set of interior points of

{(x, y) : y > 0} ∪ {(x, y) : x = −1, y ≥ 0}

is just {(x, y) : y > 0}.

3. What is the set of interior points of Q?

4. Define a metric on $R^2$ by d(x, y) = 1 if x ≠ y and 0 otherwise. This can be shown to be a metric. Given a set $Y \subset R^2$, what is $Y^\circ$?

Definition 3.2.6. A set Y ⊂ X is closed if its complement X \ Y is open.

Sets can be both open and closed. Consider ∅, whose complement is clearly open, making ∅ closed. It is also open.

Proposition 3.2.7. Let C be a collection of closed sets.

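1. $\bigcap_{C \in \mathcal{C}} C$ is closed.

2. If $\mathcal{C}$ is finite then $\bigcup_{C \in \mathcal{C}} C$ is closed.

Proof. Just use $X \setminus \left[\bigcap_{C \in \mathcal{C}} C\right] = \bigcup_{C \in \mathcal{C}} (X \setminus C)$ and $X \setminus \left[\bigcup_{C \in \mathcal{C}} C\right] = \bigcap_{C \in \mathcal{C}} (X \setminus C)$, together with Proposition 3.2.4.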

3.3 Limit points

There is an alternative characterization of closed sets in terms of limit points.

Definition 3.3.1. Let Y ⊂ X. A point x ∈ X is a limit point of Y if for each r > 0 there exists y ∈ Y such that y ≠ x and $y \in B_r(x)$. Write Y′ for the set of limit points of Y.

Examples:

1. 0 is a limit point of {1, 1/2, 1/3, . . .}.

2. {1, 2, 3} has no limit points.

3. In $R^2$, $B_1(0) \cup \{(0, y) : y \in R\} \cup \{(10, 10)\}$ has limit points

$\{(x, y) : x^2 + y^2 \le 1\} \cup \{(0, y) : y \in R\}$ .

Actually we could have given a different definition of limit point.

Proposition 3.3.2. x ∈ X is a limit point of Y if and only if for each r > 0 there are infinitely many points of Y in $B_r(x)$.


Proof. We need only show that if x is a limit point of Y and r > 0 then there are infinitely many points of Y in $B_r(x)$. We argue by contradiction; assume there are only finitely many and label the ones that are not equal to x as $y_1, \ldots, y_n$. Choosing $s = \min\{d(x, y_1), \ldots, d(x, y_n)\}$, we have 0 < s ≤ r, and $B_s(x)$ contains no points of Y except possibly x. This contradicts the fact that x is a limit point of Y.

Here is yet another definition of closed.

Theorem 3.3.3. Y is closed if and only if Y ′ ⊂ Y .

Proof. Suppose Y is closed and let y be a limit point of Y. If y ∉ Y then because X \ Y is open, we can find r > 0 such that $B_r(y) \subset (X \setminus Y)$. But for this r, there is no $x \in B_r(y)$ that is also in Y, so that y is not a limit point of Y, a contradiction.

Suppose conversely that Y′ ⊂ Y; we will show that Y is closed by showing that X \ Y is open. To do this, let z ∈ X \ Y. Since z ∉ Y and Y′ ⊂ Y, z cannot be a limit point of Y. Therefore there is an r > 0 such that $B_r(z)$ contains no points p ≠ z such that p ∈ Y. Since z is also not in Y, we must have $B_r(z) \subset (X \setminus Y)$, implying that X \ Y is open.

Examples:

1. Again the set {1, 2, 3} has no limit points (because from the above proposition, a finite set cannot have limit points). However it is closed by the above theorem.

2. Is Q closed in R? How about Q2 in R2?

3. The set Z has no limit points in R, so it is closed.

Definition 3.3.4. The closure of Y in X is the set $\overline{Y} = Y \cup Y'$.

Theorem 3.3.5. Let $\mathcal{C}$ be the collection of all sets C ⊂ X such that C is closed and Y ⊂ C. Then

$\overline{Y} = \bigcap_{C \in \mathcal{C}} C$ .

Proof. We first show the inclusion ⊂. To do this we need to show that each y ∈ Y and each y ∈ Y′ must be in the intersection on the right (call it J). First if y ∈ Y then because each $C \in \mathcal{C}$ contains Y, we have y ∈ J. Second, if y ∈ Y′ and $C \in \mathcal{C}$ we also claim that y ∈ C. This is because y, being a limit point of Y, is also a limit point of C (directly from the definition). However C is closed, so it contains its limit points, and y ∈ C.

For the inclusion ⊃, we will show that $\overline{Y} \in \mathcal{C}$. This implies that $\overline{Y}$ is one of the sets we are intersecting to form J, and so $J \subset \overline{Y}$. Clearly $\overline{Y} \supset Y$, so we need to show that $\overline{Y}$ is closed. If $x \notin \overline{Y}$ then x is not in Y and x is not a limit point of Y, so there exists r > 0 such that $B_r(x)$ does not intersect Y. Since $B_r(x)$ is open, each point in it has a neighborhood that is contained in $B_r(x)$ and therefore does not intersect Y. This means that each point in $B_r(x)$ is not in Y and is not a limit point of Y, giving $B_r(x) \subset (\overline{Y})^c$, so the complement of $\overline{Y}$ is open. Thus $\overline{Y}$ is closed.

From the theorem above, we have a couple of consequences:


1. For all Y ⊂ X, $\overline{Y}$ is closed. This is because the intersection of closed sets is closed.

2. $Y = \overline{Y}$ if and only if Y is closed. One direction is clear: that if $Y = \overline{Y}$ then Y is closed. For the other direction, if Y is closed then Y′ ⊂ Y and therefore $\overline{Y} = Y \cup Y' \subset Y$.

Examples:

1. $\overline{Q} = R$.

2. $\overline{R \setminus Q} = R$.

3. $\overline{\{1, 1/2, 1/3, \ldots\}} = \{1, 1/2, 1/3, \ldots\} \cup \{0\}$.

For some practice, we give Theorem 2.28 from Rudin:

Theorem 3.3.6. Let Y ⊂ R be nonempty and bounded above. Then $\sup Y \in \overline{Y}$ and therefore sup Y ∈ Y if Y is closed.

Proof. By the least upper bound property, s = sup Y exists. To show $s \in \overline{Y}$ we need to show that s ∈ Y or s ∈ Y′. If s ∈ Y we are done, so we assume s ∉ Y and prove that s ∈ Y′. Since s is the least upper bound, given r > 0 there must exist y ∈ Y such that

s − r < y ≤ s .

If this were not true, then s − r would be an upper bound for Y. But now we have found y ∈ Y such that y ≠ s (because s ∉ Y) and $y \in B_r(s)$, proving that s is a limit point for Y.

Note that supY is not always a limit point of Y . Indeed, consider the set

Y = {0} .

This set has sup Y = 0 but has no limit points. The set Y can even have limit points but just not with sup Y a limit point. Consider Y = {0} ∪ [−2, −1].

3.4 Compactness

It will be very important for us, during the study of continuity for instance, to understand exactly which sets Y ⊂ R have the following property: for each infinite subset E ⊂ Y, E has a limit point in Y. We will soon see that the interval [0, 1] has this property, whereas (0, 1) does not (take for example the subset {1, 1/2, 1/3, . . .}). The reason is that we will many times find ourselves exactly in this situation: with an infinite subset E of some set Y and we will want to find a limit point for E (and hope that it is also in Y). This property is what we will call on the problem set limit point compactness.

Limit point compactness was apparently one of the original notions of compactness (see the discussion in Munkres’ topology book at the beginning of the compactness section; thanks Prof. McConnell). However over time it became apparent that there was a stronger and more general version of compactness (equivalent in metric spaces, but not in all topological spaces) which could be formulated only in terms of open sets. We give this definition, now taken to be the standard one, below.


Definition 3.4.1. A subset K of a metric space X is compact if for every collection $\mathcal{C}$ of open sets such that $K \subset \bigcup_{C \in \mathcal{C}} C$, there are finitely many sets $C_1, \ldots, C_n \in \mathcal{C}$ such that $K \subset \bigcup_{i=1}^{n} C_i$.

The collection $\mathcal{C}$ is called an open cover for K and $\{C_1, \ldots, C_n\}$ is a finite subcover. The process of choosing this finite number of sets from $\mathcal{C}$ is referred to as extracting a finite subcover. The definition, in this language, states that K is compact if from every open cover of K we can extract a finite subcover of K.

It is quite difficult to gain intuition about the above definition, but it will develop as we go on and use compactness in various circumstances. The main point is that finite collections are much more useful than infinite collections. This is true for example with numbers: we already know that a set of finitely many numbers has a min and a max, whereas an infinite set does not necessarily. As we go through the course, to develop a clearer view of compactness, you should revisit the following phrase: oftentimes, compactness allows us to pass from "local" information (valid in each open set from the cover) to "global" information (valid on the whole space), by patching together the sets in the finite subcover.

Let us now give some properties of compact sets and try to emphasize where the ability to extract finite subcovers comes into the proofs.

Theorem 3.4.2. Any compact set is limit point compact.

Proof. Let K ⊂ X be compact and let E ⊂ K be an infinite set. Assume for a contradiction that E has no limit point in K, so for each x ∈ K we can find $r_x > 0$ such that $B_{r_x}(x)$ intersects E only possibly at x. The collection $\mathcal{C} = \{B_{r_x}(x) : x \in K\}$ is an open cover of K, so by compactness it can be reduced to a finite subcover of K (and thus of E). Each set in this subcover contains at most one point of E, so this means that E must have been finite, a contradiction.

Definition 3.4.3. A set E ⊂ X is bounded if there exists x ∈ X and R > 0 such that $E \subset B_R(x)$.

Theorem 3.4.4. Any compact K ⊂ X is bounded.

Proof. Pick x ∈ X and define a collection $\mathcal{C}$ of open sets by

$\mathcal{C} = \{B_R(x) : R \in N\}$ .

We claim that $\mathcal{C}$ is an open cover of K. We need just to show that each point of X is in at least one of the sets of $\mathcal{C}$. So let y ∈ X and choose R ∈ N with R > d(y, x). Then $y \in B_R(x)$.

Since K is compact, there exist $C_1, \ldots, C_n \in \mathcal{C}$ such that $K \subset \bigcup_{i=1}^{n} C_i$. By definition of the sets in $\mathcal{C}$ we can then find $R_1, \ldots, R_n$ such that $K \subset \bigcup_{i=1}^{n} B_{R_i}(x)$. Taking $R = \max\{R_1, \ldots, R_n\}$, we then have $K \subset B_R(x)$, completing the proof.

In the proof it was essential to extract a finite subcover because we wanted to take R to be the maximum of radii of sets in $\mathcal{C}$. This is clearly infinity if we have an infinite subcover, and so in this case the proof would break down (that is, if we were not able to extract a finite subcover).

Examples.


1. The set {1/2, 1/3, . . .} is not compact. This is because we can find an open cover that admits no finite subcover. Indeed, consider

$\mathcal{C} = \left\{ \left( \frac{1}{n} - \frac{1}{2n}, \frac{1}{n} + \frac{1}{2n} \right) : n \ge 2 \right\}$ .

Each one of the sets in the above collection covers only finitely many elements from {1/2, 1/3, . . .}, and so any finite subcollection cannot cover the whole set.

2. However if we add 0, by considering the set {1/2, 1/3, . . .} ∪ {0}, it becomes compact. To prove this, let $\mathcal{C}$ be any open cover; we will show that there are finitely many sets from $\mathcal{C}$ that still cover our set.

To do this, note first that there must be some $C \in \mathcal{C}$ such that 0 ∈ C. Since C is open, it contains some interval (−r, r) for r > 0. Then for n > 1/r, all points 1/n are in this interval, and thus C contains all but finitely many of the points from our set. Now we just need to cover the other points, of which there are finitely many. Writing 1/2, . . . , 1/N for these points, choose for each i a set $C_i$ from $\mathcal{C}$ such that $1/i \in C_i$. Then

$\{C, C_2, \ldots, C_N\}$

is a finite subcover.
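The counting step in example 2 can be made concrete. The sketch below assumes the set containing 0 includes the interval (−r, r) for some positive rational r; the function name `points_outside` is hypothetical. It lists the finitely many indices n ≥ 2 for which 1/n still has to be covered separately.

```python
from fractions import Fraction

def points_outside(r):
    """Return the indices n >= 2 with 1/n not in (-r, r), that is, with 1/n >= r."""
    n = 2
    outside = []
    # The loop stops because 1/n eventually drops below r (Archimedean property).
    while Fraction(1, n) >= r:
        outside.append(n)
        n += 1
    return outside

leftover = points_outside(Fraction(1, 10))
print(leftover)           # indices of the points that need their own covering set
print(len(leftover) + 1)  # an upper bound for the size of the finite subcover
```

However small r is, the list is finite, which is exactly why adding the point 0 makes the set compact.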

The main problem in example 1 was actually that the set was not closed. It is not immediately apparent how that was manifested in our inability to produce a finite subcover, but it is a general fact:

Theorem 3.4.5. Any compact K ⊂ X is closed.

Proof. We will show that $K^c = X \setminus K$ is open. Therefore pick $x \in K^c$; we will produce an r > 0 such that $B_r(x) \subset K^c$.

We first produce an open cover of K. For each y ∈ K, the distance d(x, y) must be positive, since x ≠ y (as x ∉ K). Therefore define the ball

$B_y = B_{d(x, y)/2}(y)$ .

We now define the collection

$\mathcal{C} = \{B_y : y \in K\}$ .

Since each $y \in B_y$ this is an open cover of K.

By compactness, we can extract a finite subcover $\{B_{y_1}, \ldots, B_{y_n}\}$. Choosing

$r = \min\{d(x, y_i)/2 : i = 1, \ldots, n\}$ ,

we claim then that $B_r(x) \subset K^c$. To show this, let $z \in B_r(x)$. Then d(z, x) < r and by the triangle inequality,

$d(y_i, x) \le d(z, y_i) + d(z, x) < d(z, y_i) + r$ ,


giving

$d(x, y_i)/2 \le d(y_i, x) - r \le d(z, y_i)$ for all i = 1, . . . , n .

In other words, $z \notin B_{y_i}$ for all i. But the $B_{y_i}$’s cover K and therefore z ∉ K. This means $K^c$ is open, or K is closed.

We now mention a useful way to produce new compact sets from old ones.

Theorem 3.4.6. If K ⊂ X is compact and L ⊂ K is closed, then L is compact.

Proof. Let C be an open cover of L. Define

D = C ∪ {Lc}

and note that $\mathcal{D}$ is actually an open cover of K. Therefore, as K is compact, we can extract from $\mathcal{D}$ a finite subcover $\{D_1, \ldots, D_n\}$. If $D_i \in \mathcal{C}$ for all i, then we are done; otherwise $L^c$ is in this set (say it is $D_n$) and we consider the collection $\{D_1, \ldots, D_{n-1}\}$. This is a finite subcollection of $\mathcal{C}$. We claim that it is an open cover of L as well. Indeed, if x ∈ L then there exists i ∈ {1, . . . , n} such that $x \in D_i$. Since $x \notin L^c$, $D_i$ cannot equal $L^c$, meaning that i ≠ n. This completes the proof.

3.5 Heine-Borel Theorem: compactness in $R^n$

In the above theorems, we see that a compact set is always closed and bounded. The converse is true, but not in all metric spaces. The fact that it is true in $R^n$ is called the Heine-Borel theorem.

Theorem 3.5.1 (Heine-Borel). A set $K \subset R^n$ is compact if and only if it is closed and bounded.

To prove this theorem, we will need some preliminary results. Recall that Rudin defines an n-cell to be a subset of $R^n$ of the form

$[a_1, b_1] \times \cdots \times [a_n, b_n]$ for $a_i \le b_i$, i = 1, . . . , n .

Lemma 3.5.2. Suppose that $C_1, C_2, \ldots$ are n-cells that are nested; that is, if

$C_i = \prod_{k=1}^{n} [a_i^{(k)}, b_i^{(k)}]$ ,

then

$[a_i^{(k)}, b_i^{(k)}] \supset [a_{i+1}^{(k)}, b_{i+1}^{(k)}]$ for all i and k .

Then $\bigcap_i C_i$ is nonempty.


Proof. We first consider the case n = 1. That is, take $C_i = [a_i, b_i]$ for i ≥ 1 and $a_i \le b_i$. Define $A = \{a_1, a_2, \ldots\}$ and a = sup A. We claim that

$a \in \bigcap_i C_i$ .

To see this, note that $a_i \le b_j$ for all i, j. Indeed,

$a_i \le a_j \le b_j$ if i ≤ j

and

$a_i \le b_i \le b_j$ if i ≥ j .

Therefore each $b_j$ is an upper bound for A (in particular sup A exists). But a is the least upper bound of A so $a \le b_j$ for all j. This gives

$a_i \le a \le b_i$ for all i ,

or $a \in \bigcap_i C_i$.

For the case n ≥ 2 we just do the same argument on each of the coordinates to find $(a^{(1)}, \ldots, a^{(n)})$ such that

$a_i^{(k)} \le a^{(k)} \le b_i^{(k)}$ for all i, k ,

or $(a^{(1)}, \ldots, a^{(n)}) \in \bigcap_i C_i$.

Lemma 3.5.3. Any n-cell is compact in $R^n$.

Proof. For simplicity, take $K = [0, 1] \times \cdots \times [0, 1] = [0, 1]^n$. Since $R^n$ is a metric space (with the usual metric), it suffices to prove that K is limit point compact; that is, that each infinite subset of K has a limit point in K. This is from exercise 11 at the end of the chapter, which states that compactness and limit point compactness are equivalent in metric spaces.

Suppose that E ⊂ K is infinite. We will produce a limit point of E inside K. We begin by dividing K into $2^n$ sub-cells by cutting each interval [0, 1] into two equal pieces. For instance, in $R^2$ we would consider the 4 sub-cells

[0, 1/2] × [0, 1/2], [0, 1/2] × [1/2, 1], [1/2, 1] × [0, 1/2], [1/2, 1] × [1/2, 1] .

At least one of these $2^n$ sub-cells must contain infinitely many points of E. Call this sub-cell $K_1$. Repeat, by dividing $K_1$ into $2^n$ equal sub-cells to find a sub-sub-cell $K_2$ which contains infinitely many points of E.

We continue this procedure ad infinitum, at stage i ≥ 1 finding a sub-cell $K_i$ of K of the form

$K_i = [r_{1,i} 2^{-i}, (r_{1,i} + 1) 2^{-i}] \times \cdots \times [r_{n,i} 2^{-i}, (r_{n,i} + 1) 2^{-i}]$

which contains infinitely many points of E. Note that the $K_i$’s satisfy the conditions of the previous lemma: they are nested n-cells. Therefore there exists $z \in \bigcap_i K_i$. Because each $K_i$ is a subset of K, we have z ∈ K.

We claim that z is a limit point of E. To show this, let r > 0. Note that for all points $x, y \in K_i$ we have

$|x - y|^2 = (x_1 - y_1)^2 + \cdots + (x_n - y_n)^2 \le n(2^{-i})^2 = \frac{n}{4^i}$ .


Therefore

$\mathrm{diam}(K_i) = \sup\{|x - y| : x, y \in K_i\} \le \frac{\sqrt{n}}{2^i} \le \frac{\sqrt{n}}{i}$ .

(You can prove the inequality $i \le 2^i$ for all i by induction.) So fix any $i > \sqrt{n}/r$; then for all $x \in K_i$ we have (because $z \in K_i$)

$|x - z| \le \mathrm{diam}(K_i) \le \frac{\sqrt{n}}{i} < r$ ,

so that $K_i \subset B_r(z)$. However $K_i$ contains infinitely many points of E, so we can find one not equal to z in $B_r(z)$. This means z is a limit point of E.
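The halving procedure can be sketched in dimension 1 for one specific infinite set. Everything here is an assumption made for illustration: the set is E = {1/k : k ≥ 1} inside [0, 1], and `has_infinitely_many` is a hypothetical oracle that is valid only for this E and only for the intervals the procedure produces (a closed interval [a, b] with b > 0 contains infinitely many points 1/k exactly when a = 0).

```python
from fractions import Fraction

def has_infinitely_many(a, b):
    """Oracle for E = {1/k}: [a, b] with b > 0 holds infinitely many 1/k iff a == 0."""
    return a == 0 and b > 0

def bisect(steps):
    """Repeatedly keep a half of [0, 1] containing infinitely many points of E."""
    a, b = Fraction(0), Fraction(1)
    cells = []
    for _ in range(steps):
        mid = (a + b) / 2
        if has_infinitely_many(a, mid):
            b = mid   # keep the left half
        else:
            a = mid   # otherwise the right half must contain infinitely many
        cells.append((a, b))
    return cells

cells = bisect(10)
print(cells[-1])  # a cell of length 2**-10 that still contains infinitely many 1/k
```

The nested cells shrink toward 0, which is the limit point of E that the proof produces as the common point of all the cells.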

Proof of Heine-Borel. Suppose that K is closed and bounded in $R^n$. Then there exists an n-cell C such that K ⊂ C. By the previous lemma, C is compact. But K is a closed subset of C so K is compact.

Suppose conversely that K is compact. Then we have already shown K is closed and bounded.

Therefore we find that

(closed and bounded) ⇒ (compact) in $R^n$ ,

(closed and bounded) ⇐ (compact) in metric spaces ,

and

(compact) ⇔ (limit point compact) in metric spaces .

3.6 The Cantor set

We now give a famous example of a compact set. This is the Cantor set, and it has many interesting properties. We construct it iteratively. At stage 0, we start with

$C_0 = [0, 1]$ ,

the entire unit interval in R. At stage 1, we remove the middle third of $C_0$ to produce

$C_1 = \left[0, \frac{1}{3}\right] \cup \left[\frac{2}{3}, 1\right]$ ,

a set which is a union of 2 disjoint closed intervals, each of length 1/3. We continue, at stage n, producing $C_n$ from $C_{n-1}$ by removing the middle third of each interval which comprises $C_{n-1}$. For example,

$C_2 = \left[0, \frac{1}{9}\right] \cup \left[\frac{2}{9}, \frac{1}{3}\right] \cup \left[\frac{2}{3}, \frac{7}{9}\right] \cup \left[\frac{8}{9}, 1\right]$ .

It follows that at stage n, the set $C_n$ is a union of $2^n$ disjoint closed intervals, each of length $3^{-n}$. We then define

$C = \bigcap_{n=0}^{\infty} C_n$

to be the Cantor set.
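The stages of the construction are easy to generate with exact fractions. The function names `next_stage` and `cantor_stage` are made up for this sketch; each stage is represented as a list of closed intervals given by their endpoints.

```python
from fractions import Fraction

def next_stage(intervals):
    """Remove the open middle third from every interval (a, b) in the list."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))   # left third
        out.append((b - third, b))   # right third
    return out

def cantor_stage(n):
    """Return the list of closed intervals whose union is the n-th stage."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = next_stage(intervals)
    return intervals

for a, b in cantor_stage(2):
    print(f"[{a}, {b}]")

stage5 = cantor_stage(5)
print(len(stage5), sum(b - a for a, b in stage5))  # number of intervals, total length
```

The count and total length printed for stage 5 match the general pattern stated above: $2^n$ intervals, each of length $3^{-n}$.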

Properties.


1. C is closed because it is an intersection of closed sets.

2. C is compact because it is closed and bounded (in R).

3. C has “total length” 0. Although we have not defined this, we can compute the lengthof Cn: it is composed of 2n intervals of length 3−n. Thus its “length” is (2/3)n. Becausethis number tends to 0 as n goes to infinity (don’t worry – we will define these thingsrigorously later),

length(C) ≤ length(Cn) = (2/3)n

for all n, giving length(C) = 0.

4. Although it looks like all that will remain in the end is the endpoints of the intervals used to construct $C$, in fact there is much more. The set of such endpoints is countable, whereas $C$ is uncountable. To see why this is true, note that each $x \in C$ can be given an “address.” The point $x$ is in $C_1$, so it is in exactly one of the two intervals of $C_1$; assign the value 0 to $x$ if it is in the first and 1 if it is in the second. Similarly, the set $C_2$ splits each interval of $C_1$ into two: give $x$ the value 0 if it is in the left such interval and 1 if it is in the right. Continuing in this way, we can assign to $x$ an infinite sequence of 0’s and 1’s:

$$x \mapsto 0111000110101\ldots$$

(In fact, this is nothing but the ternary expansion of $x$, replacing 2’s by 1’s.) The map sending $x$ to a sequence is actually a bijection from $C$ to the set of sequences of 0’s and 1’s, which we know is uncountable.

One example of an element of $C$ that is not an endpoint is $1/4$: its address is

$$1/4 \mapsto 010101\ldots$$

(Endpoints have 1-term repeating addresses, like $1/3 \mapsto 011111\ldots$)

5. Every point of $C$ is a limit point of $C$. To show this, we will prove more. We will show that for each $x \in C$ and each $r > 0$, there are points $y$ and $z$ in $(x - r, x + r)$ such that $y \ne x \ne z$ and $y \in C$, $z \notin C$. To do this, choose $N$ such that $N > 1/r$ and note that since $2^N > N$ we certainly have $3^N > N$, giving $3^{-N} < r$. Since $x \in C$ it follows that $x \in C_N$, so there is some subinterval $I$ of $C_N$ of length $3^{-N}$ such that $x \in I$. This interval is necessarily contained in $(x - r, x + r)$ and so both endpoints of $I$ (which survive through the construction of $C$) are in this neighborhood. At least one of these points is not equal to $x$, so we have found a point of $C$ other than $x$ in $(x - r, x + r)$, giving that $x$ is a limit point of $C$.

6. $C$ contains no open intervals. This proof proceeds as in the previous case. If $x \in C$ and $r > 0$ we can find some $N$ such that $C_N$ contains an interval $I$ entirely contained in $(x - r, x + r)$. In the next stage of the construction, we remove part of this interval and so we can find some $y \in C^c$ that is in $(x - r, x + r)$. So, each point of $C$ is also a limit point of $C^c$. This implies that no point of $C$ is an interior point, which in $\mathbb{R}$ means that $C$ contains no open intervals.
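The address map from property 4 can be computed digit by digit. The sketch below is only an illustration: the name `address` and the choice to raise an error for points outside $C$ are assumptions made here, and the input should be an exact `Fraction` so that the comparisons are exact.

```python
from fractions import Fraction

def address(x, digits):
    """First `digits` symbols of the address of x, assuming x lies in the Cantor set."""
    a, b = Fraction(0), Fraction(1)   # current interval of C_n containing x
    out = []
    for _ in range(digits):
        third = (b - a) / 3
        if x <= a + third:            # x is in the left closed third
            out.append("0")
            b = a + third
        elif x >= b - third:          # x is in the right closed third
            out.append("1")
            a = b - third
        else:
            raise ValueError("x is removed at this stage, so it is not in C")
    return "".join(out)

print(address(Fraction(1, 4), 8))   # the non-endpoint example from property 4
print(address(Fraction(1, 3), 6))   # an endpoint of C_1
```

A point such as $1/2$, which is removed at stage 1, triggers the `ValueError` branch on the first digit.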


Let’s finish with an observation. In exercise 9, you are asked to show that if $(O_n)$ is a countable collection of open dense subsets of $\mathbb{R}$ then $\bigcap_{n=1}^{\infty} O_n$ is nonempty. From this we can actually derive uncountability of the real numbers. Indeed, assume for a contradiction that $\mathbb{R}$ is countable and list its elements as $\{x_1, x_2, \ldots\}$. Define $O_n = \mathbb{R} \setminus \{x_n\}$. Each $O_n$ is open and dense in $\mathbb{R}$, so the intersection of all $O_n$’s is nonempty. This is a contradiction, since

$$\bigcap_n O_n = \bigcap_n \left[\mathbb{R} \setminus \{x_n\}\right] = \emptyset.$$

3.7 Exercises

1. For $X = \mathbb{R}^2$ define the function $d : X \times X \to \mathbb{R}$ by

$$d((x_1, y_1), (x_2, y_2)) = |x_1 - x_2| + |y_1 - y_2|.$$

Prove that $d$ is a metric on $X$. Describe the unit ball centered at the origin geometrically. Repeat this question using

$$d((x_1, y_1), (x_2, y_2)) = \max\{|x_1 - x_2|, |y_1 - y_2|\}.$$

2. Let $F[0, 1]$ be the set of all bounded functions from $[0, 1]$ to $\mathbb{R}$. Show that $d$ is a metric, where

$$d(f, g) = \sup\{|f(x) - g(x)| : x \in [0, 1]\}.$$

3. Let $X$ be the set of real valued sequences with only finitely many nonzero terms:

$$X = \{x = (x_1, x_2, \ldots) : x_i \in \mathbb{R} \text{ and } x_i \ne 0 \text{ for only finitely many } i\}.$$

For an element $x \in X$ write $n(x)$ for the largest $i \in \mathbb{N}$ such that $x_i \ne 0$. Define the function $d : X \times X \to \mathbb{R}$ by

$$d(x, y) = \left(\sum_{i=1}^{\max\{n(x), n(y)\}} (x_i - y_i)^2\right)^{1/2}.$$

(a) Show that $(X, d)$ is a metric space.

(b) For each $n \in \mathbb{N}$ define $e_n$ as the element of $X$ that has $n$-th coordinate equal to 1 and all others zero. Show that the set $\{e_n : n \in \mathbb{N}\}$ is closed and bounded but does not have a limit point.

4. For each of the following examples, verify that the collection $\mathcal{C}$ is an open cover of $E$ and determine if it can be reduced to a finite subcover. If it can, give a finite subcover; otherwise, show why it cannot be reduced.

(a) $E = \{1, 1/2, 1/4, \ldots\} = \{2^{-n} : n \ge 0\}$, $\mathcal{C} = \{(2^{-n-1}, 3 \cdot 2^{-n-1}) : n \ge 0\}$.

(b) $E = [0, 1]$, $\mathcal{C} = \{(x - 10^{-4}, x + 10^{-4}) : x \in \mathbb{Q} \cap [0, 1]\}$.

5. Prove that an uncountable set $E \subset \mathbb{R}$ cannot have countably many limit points.

Hint. Argue by contradiction and assume that there is an uncountable set $E \subset \mathbb{R}$ such that $E'$, the set of limit points of $E$, is countable. What can you say about $E \setminus E'$?

6. An open interval I ⊂ R is a set of the form

• (a, b) = {x ∈ R : a < x < b} for a ≤ b or

• (a,∞) = {x ∈ R : a < x} or

• (−∞, b) = {x ∈ R : x < b} or

• R = (−∞,∞).

Let $O \subset \mathbb{R}$ be a nonempty open set and let $x \in O$. Define $O_x$ as the union of all open intervals $I$ such that $x \in I$ and $I \subset O$. Prove that $O_x$ is a nonempty open interval.

7. Let $O \subset \mathbb{R}$ be a nonempty open set. By completing the following two steps, show that there exists a countable collection $\mathcal{C}$ of open intervals such that

$$\text{for all } I, J \in \mathcal{C} \text{ with } I \ne J, \text{ we have } I \cap J = \emptyset \qquad (2)$$

and

$$\bigcup_{I \in \mathcal{C}} I = O. \qquad (3)$$

(a) For $x \in O$, let $O_x$ be defined as in exercise 6. Show that if $O_x \cap O_y \ne \emptyset$ for some $x, y \in O$ then $O_x = O_y$.

(b) Define $\mathcal{C} = \{O_x : x \in O\}$ and complete the proof by showing that $\mathcal{C}$ is countable and has properties (2) and (3).

8. The Kuratowski closure and complement problem. For subsets $A$ of a metric space $X$, consider two operations, the closure $\overline{A}$, and the complement $A^c = X \setminus A$. We can perform these multiple times, as in $\overline{A^c}$, $(\overline{A})^c$, etc.

(a) Prove that, starting with a given $A$, one can form no more than 14 distinct sets by applying the two operations successively.

(b) Letting $X = \mathbb{R}$, find a subset $A \subseteq \mathbb{R}$ for which the maximum of 14 is attained.

Hint to get started. Clearly $(A^c)^c = A$, so two complements in a row get you nothing new. What about two closures in a row? See Rudin, Thm. 2.27.

9. A subset $E$ of a metric space $X$ is dense if each point of $X$ is in $E$ or is a limit point of $E$ (or both). Let $A_1, A_2, \ldots$ be open dense sets in $\mathbb{R}$. Show that $\bigcap_n A_n \ne \emptyset$.

Hint. Define a sequence of sets as follows. Choose $x_1 \in A_1$ and $r_1 > 0$ such that $B_{r_1}(x_1) \subset A_1$. Then argue that there exists $x_2 \in A_1 \cap A_2$ and $r_2 > 0$ such that $B_{r_2}(x_2) \subset B_{r_1/2}(x_1)$. Continuing, find infinite sequences $r_1, r_2, \ldots$ and $x_1, x_2, \ldots$ such that $x_n \in \bigcap_{k=1}^{n-1} A_k$ and $B_{r_n}(x_n) \subset B_{r_{n-1}/2}(x_{n-1})$. Then, for each $n$, define $B_n = B_{r_n/2}(x_n)$. What can you say about $\bigcap_n B_n$?

10. Show that both $\mathbb{Q}$ and $\mathbb{R} \setminus \mathbb{Q}$ are dense in $\mathbb{R}$ with the usual metric.

11. We now extend the definition of dense. If $E_1, E_2$ are subsets of a metric space $X$ then $E_1$ is dense in $E_2$ if each point of $E_2$ is in $E_1$ or is a limit point of $E_1$. Show that if $E_1, E_2, E_3$ are subsets of $X$ such that $E_1$ is dense in $E_2$ and $E_2$ is dense in $E_3$ then $E_1$ is dense in $E_3$.

12. We say a metric space $(X, d)$ has the finite intersection property if whenever $\mathcal{C}$ is a collection of closed sets in $X$ such that each finite subcollection has nonempty intersection, the full collection has nonempty intersection:

$$\bigcap_{C \in \mathcal{C}} C \ne \emptyset.$$

Show that $X$ has the finite intersection property if and only if $X$ is compact. (You may use Rudin, Theorem 2.36.)

13. Let $\mathcal{K}$ be a collection of compact subsets of a metric space $X$.

(a) Show that $\bigcap_{K \in \mathcal{K}} K$ is compact.

(b) Show that if $\mathcal{K}$ is finite then $\bigcup_{K \in \mathcal{K}} K$ is compact. Is this still true if $\mathcal{K}$ is infinite?

14. Let $(X, d)$ be a metric space. We say that a subset $E$ of $X$ is limit point compact if every infinite subset of $E$ has a limit point in $E$. We have seen in class that if $E$ is compact then $E$ is limit point compact. This exercise will serve to show the converse: that if $E$ is limit point compact then it is compact. For the following questions, fix a subset $E$ that is limit point compact.

(a) Show that $E$ is closed.

(b) Show that if $\delta > 0$ then there exist finitely many points $x_1, \ldots, x_n$ in $E$ such that

$$E \subset \bigcup_{i=1}^{n} B_\delta(x_i).$$

(c) Show that if $A \subset E$ is closed, then $A$ is also limit point compact.

(d) Show that if $A_1, A_2, \ldots$ are closed subsets of $E$ such that $A_n \supset A_{n+1}$ for all $n \ge 1$ then $\bigcap_n A_n \ne \emptyset$.

Hint. Define a set $\{x_n : n \ge 1\}$ by choosing $x_1 \in A_1$, $x_2 \in A_2$, and so on.

(e) Use the previous parts to argue that $E$ is compact.

Hint. Argue by contradiction and assume that there is an open cover $\mathcal{C}$ of $E$ that cannot be reduced to a finite subcover. Begin with $\delta_1 = 1/2$ and apply part (b) to get points $x^1_1, \ldots, x^1_{n_1} \in E$ such that $E \subset \bigcup_{k=1}^{n_1} B_{\delta_1}(x^1_k)$. Clearly

$$E \subset \bigcup_{k=1}^{n_1} \left[B_{\delta_1}(x^1_k) \cap E\right].$$

At least one of these sets, say $B_{\delta_1}(x^1_{j_1}) \cap E$, cannot be covered by a finite number of sets from $\mathcal{C}$, or else we would have a contradiction. By parts (a) and (c), it has the limit point property and $\mathcal{C}$ is a cover of it, so repeat the construction using this set instead of $E$ and $\delta_2 = 1/4$. Continue, at step $n \ge 3$ using $\delta_n = 2^{-n}$, to create a decreasing sequence of closed subsets of $E$. Use part (d).

15. In this exercise we will consider a construction similar to that of the Cantor set. We will define a countable collection of subsets $\{E_n : n \ge 0\}$ of the interval $[0, 1]$ and we will set $E = \bigcap_{n=0}^{\infty} E_n$.

We define $E_0 = [0, 1]$, the entire interval. To define $E_1$, we remove a subinterval of $E_0$ of length $1/4$ from the middle of $E_0$. Precisely, we set

$$E_1 = \left[0, \tfrac{3}{8}\right] \cup \left[\tfrac{5}{8}, 1\right].$$

Next, let $E_2$ be the set obtained by removing two subintervals, each of length $1/16$, one from the middle of each piece of $E_1$. Thus

$$E_2 = \left[0, \tfrac{5}{32}\right] \cup \left[\tfrac{7}{32}, \tfrac{3}{8}\right] \cup \left[\tfrac{5}{8}, \tfrac{25}{32}\right] \cup \left[\tfrac{27}{32}, 1\right].$$

Continuing, at each step $n \ge 3$, we create $E_n$ by removing $2^{n-1}$ subintervals, each of length $4^{-n}$, one from the middle of each piece of $E_{n-1}$. Define

$$E = \bigcap_{n=0}^{\infty} E_n.$$

(a) Show that each point of E is a limit point of E.

(b) Show that E does not contain any open interval.

(c) What is the total length of E?


4 Sequences

4.1 Definitions

Definition 4.1.1. Let $(X, d)$ be a metric space. A sequence is a function $f : \mathbb{N} \to X$.

We think of a sequence as a list of its elements. We typically write $x_1 = f(1)$, $x_2 = f(2)$ and forget about $f$, denoting the sequence as $(x_n)$ and the elements $x_1, x_2, \ldots$.

The most fundamental notion related to sequences is that of convergence.

Definition 4.1.2. A sequence $(x_n)$ converges to a point $x \in X$ if for every $\varepsilon > 0$ there exists $N$ such that if $n \ge N$ then

$$d(x_n, x) < \varepsilon.$$

In this case we write $x_n \to x$.

We can think of proving convergence of a sequence as follows. We have a sequence $(x_n)$ and you tell me it has a limit $x$. I ask “Oh yeah? Well can you show that the terms of the sequence get very close to $x$?” You say yes and I ask “Can you show that all but finitely many terms are within distance $\varepsilon = 1$ of $x$?” You say yes and provide an $N$ equal to 600. Then you proceed to show me that all $x_n$ for $n \ge 600$ have $d(x_n, x) < 1$. Temporarily satisfied, I ask, “Well you did it for 1, what about for $\varepsilon = .00001$?” You then dream up an $N$ equal to 40 billion such that for $n \ge N$, $d(x_n, x) < .00001$. This game can continue indefinitely, and as long as you can come up with an $N$ for each of my values of $\varepsilon$, then we say $x_n$ converges to $x$.

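Example. We all believe that the sequence $(x_n)$ given by $x_n = \frac{1}{n^2+n}$ (in $\mathbb{R}$) converges to 0. How do we prove it? Let $\varepsilon > 0$. We want $|x_n - 0| < \varepsilon$, so we solve:

$$\frac{1}{n^2+n} < \varepsilon, \quad \text{which is equivalent to} \quad n^2 + n > \frac{1}{\varepsilon}.$$

This will certainly be true if $n \ge \frac{1}{\varepsilon}$, so set

$$N = \left\lceil \frac{1}{\varepsilon} \right\rceil.$$

Now if $n \ge N$ then

$$\frac{1}{n^2+n} \le \frac{1}{N^2+N} < \frac{1}{N^2} \le \frac{1}{N} \le \varepsilon.$$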
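The choice $N = \lceil 1/\varepsilon \rceil$ from the example can be spot-checked numerically. This is only a sanity check over a finite window of indices, not a substitute for the proof; the helper names `cutoff` and `x` are illustrative.

```python
import math
from fractions import Fraction

def cutoff(eps):
    """The N = ceil(1/eps) chosen in the example."""
    return math.ceil(1 / eps)

def x(n):
    """The n-th term 1/(n^2 + n), as an exact rational."""
    return Fraction(1, n * n + n)

for eps in (Fraction(1), Fraction(1, 10), Fraction(1, 100000)):
    N = cutoff(eps)
    # Check |x_n - 0| < eps on a finite window n = N, ..., N + 999 only.
    assert all(x(n) < eps for n in range(N, N + 1000))
    print(f"eps = {eps}: N = {N}")
```

Using `Fraction` for both $\varepsilon$ and the terms avoids floating-point rounding in the comparison $x_n < \varepsilon$.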

In the previous example, to show convergence to something we could have noticed that the sequence is monotonic and bounded.

Definition 4.1.3. A sequence $(x_n)$ in $\mathbb{R}$ is

1. monotone increasing if $x_n < x_{n+1}$ for all $n$ (monotone non-decreasing if $x_n \le x_{n+1}$) and

2. monotone decreasing if $x_n > x_{n+1}$ for all $n$ (monotone non-increasing if $x_n \ge x_{n+1}$).

Theorem 4.1.4. If $(x_n)$ is monotone (any of the types above) and bounded (that is, $\{x_n : n \in \mathbb{N}\}$ is bounded) then it converges.

Proof. Suppose $(x_n)$ is monotone increasing. The other cases are similar. Then

$$X := \{x_n : n \in \mathbb{N}\}$$

is nonempty and bounded above so it has a supremum $x$. We claim that $x_n \to x$. To prove this, let $\varepsilon > 0$. Then $x - \varepsilon$ is not an upper bound for $X$ and there exists $N$ such that $x_N > x - \varepsilon$. Then if $n \ge N$,

$$x \ge x_n \ge x_N > x - \varepsilon,$$

giving $|x - x_n| < \varepsilon$, so $x_n \to x$.

We now recall some basic properties of limits. For one part we need a definition.

Definition 4.1.5. A sequence $(x_n)$ in a metric space $X$ is bounded if there exist $q \in X$ and $M \in \mathbb{R}$ such that

$$d(x_n, q) \le M \text{ for all } n \in \mathbb{N}.$$

Note that this is the same as saying that the set $\{x_n : n \in \mathbb{N}\}$ is bounded.

Theorem 4.1.6 (Rudin, Theorem 3.2). Let (xn) be a sequence in a metric space X.

1. $(x_n)$ converges to $x \in X$ if and only if every neighborhood of $x$ contains $x_n$ for all but finitely many $n$.

2. (Uniqueness of the limit) If $x, y \in X$ and $(x_n)$ converges to both $x$ and $y$, then $x = y$.

3. If $(x_n)$ converges then $(x_n)$ is bounded.

4. If $E \subset X$ and $x$ is a limit point of $E$ then there is a sequence $(x_n)$ in $E$ converging to $x$.

Proof. Part 1 is just a restatement of the definition of a limit. For the second part, suppose that $(x_n)$ converges to $x$ and to $y$. Let $\varepsilon > 0$, so that there exist $N_1$ and $N_2$ such that

$$\text{if } n \ge N_1 \text{ then } d(x_n, x) < \varepsilon/2 \quad \text{and} \quad \text{if } n \ge N_2 \text{ then } d(x_n, y) < \varepsilon/2.$$

Using the triangle inequality, for $N = \max\{N_1, N_2\}$, we get

$$d(x, y) \le d(x, x_N) + d(x_N, y) < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$

Thus $d(x, y) < \varepsilon$ for all $\varepsilon > 0$; this is only possible if $d(x, y) = 0$ and thus $x = y$.


For part 3, suppose that $(x_n)$ converges to $x \in X$ and let $\varepsilon = 1$. Then there exists $N$ such that if $n \ge N$ then $d(x, x_n) < 1$. Now choose

$$r = 1 + \max\{1, d(x, x_1), \ldots, d(x, x_{N-1})\}.$$

It follows then that $d(x, x_n) < r$ for all $n$, and so the set $\{x_n : n \in \mathbb{N}\}$ is contained in $B_r(x)$.

We now show part 4. Suppose $x$ is a limit point of $E$. For each $n$, choose any point (call it $x_n$) in the set $B_{1/n}(x) \cap E$, which is nonempty because $x$ is a limit point of $E$. We claim that this sequence of points $(x_n)$ converges to $x$. To see this, let $\varepsilon > 0$ and pick

$$N = \left\lceil \frac{1}{\varepsilon} \right\rceil.$$

Then if $n \ge N$, $d(x_n, x) < 1/n \le 1/N \le \varepsilon$.

In the above, we see that a limit of a sequence is unique. This is in contrast to the limit points (plural!) of a subset $E$ of $X$. The points in $E$ are in no particular order, and $E$ may have many limit points. But in a sequence, the points are ordered, and there can be at most one limit as $n$ runs through that chosen order.

In the case that the sequence is of real numbers, there is a nice compatibility with arithmetic operations.

Properties of real sequences. Let $(x_n)$ and $(y_n)$ be real sequences such that $x_n \to x$ and $y_n \to y$.

1. $x_n + y_n \to x + y$.

2. If $c \in \mathbb{R}$ then $c x_n \to c x$.

3. $x_n y_n \to x y$.

4. If $y \ne 0$ and $y_n \ne 0$ for all $n \in \mathbb{N}$ then $\frac{x_n}{y_n} \to \frac{x}{y}$.

Proofs of properties. Many of these are similar so we will prove only 1 and 3. Rudin contains all of the proofs. Suppose first that $x_n \to x$ and $y_n \to y$. Given $\varepsilon > 0$ choose $N_1$ and $N_2$ such that

$$\text{if } n \ge N_1 \text{ then } |x_n - x| < \varepsilon/2 \quad \text{and} \quad \text{if } n \ge N_2 \text{ then } |y_n - y| < \varepsilon/2.$$

Letting $N = \max\{N_1, N_2\}$, then if $n \ge N$ we have

$$|x_n + y_n - (x + y)| \le |x_n - x| + |y_n - y| < \varepsilon.$$

For the third part we write

$$|x_n y_n - x y| \le |y_n||x_n - x| + |x||y_n - y|.$$

Now note that since $(y_n)$ converges, it is bounded. Therefore we can find $M > 0$ such that $|x| \le M$ and $|y_n| \le M$ for all $n$. Given $\varepsilon > 0$ choose $N$ such that if $n \ge N$ then both

$$|x_n - x| < \varepsilon/(2M) \quad \text{and} \quad |y_n - y| < \varepsilon/(2M).$$

Then if $n \ge N$,

$$|x_n y_n - x y| < M\varepsilon/(2M) + M\varepsilon/(2M) = \varepsilon.$$

Note that for the last item above we required $y_n \ne 0$ for all $n$. This is not necessary for the following reasons.

Lemma 4.1.7. If $(y_n)$ is a real sequence such that $y_n \to y$ and $y \ne 0$ then $y_n = 0$ for at most finitely many $n \in \mathbb{N}$.

Proof. Suppose that $y_n \to y$ with $y \ne 0$ and let $\varepsilon = |y|$. Then there exists $N \in \mathbb{N}$ such that if $n \ge N$ then $|y_n - y| < \varepsilon$. By the triangle inequality, if $n \ge N$ then

$$|y_n| \ge |y| - |y_n - y| > 0,$$

giving $y_n \ne 0$.

The next lemma says that if we remove a finite number of terms from a convergent sequence, this does not affect the limit.

Lemma 4.1.8. Let $(y_n)$ be a sequence in a metric space $X$. For a fixed $k \in \mathbb{N}$ define a sequence $(z_n)$ by

$$z_n = y_{n+k} \text{ for } n \in \mathbb{N}.$$

Then $(y_n)$ converges if and only if $(z_n)$ does. If $y_n \to y$ then $z_n \to y$.

Proof. Suppose $y_n \to y$. If $\varepsilon > 0$ we can pick $N \in \mathbb{N}$ such that $d(y_n, y) < \varepsilon$ for $n \ge N$. For $n \ge N$,

$$d(z_n, y) = d(y_{n+k}, y) < \varepsilon,$$

since $n + k \ge N$ also. This means $z_n \to y$.

Conversely, if $z_n \to y$ then given $\varepsilon > 0$ we can find $N \in \mathbb{N}$ such that $n \ge N$ implies that $d(z_n, y) < \varepsilon$. Define $N' = N + k$. Then if $n \ge N'$, we have $n - k \ge N$ and so

$$d(y_n, y) = d(z_{n-k}, y) < \varepsilon.$$

Thus $y_n \to y$.

Now we can change the last property of real sequences as follows. If $(x_n)$ and $(y_n)$ are real sequences such that $x_n \to x$ and $y_n \to y$ with $y \ne 0$ then $x_n / y_n \to x / y$. To do this, we use the first lemma to find $k$ such that for all $n$, $y_{n+k} \ne 0$. Then we can consider the sequences $(x_{n+k})$ and $(y_{n+k})$ and prove the property for them. Since they only differ from $(x_n)$ and $(y_n)$ by a finite number of terms, the property also holds for $(x_n)$ and $(y_n)$.

We will mostly deal with sequences of real numbers (or elements of an arbitrary metric space), but it is useful to understand convergence in $\mathbb{R}^k$, $k \ge 2$. It can be reformulated in terms of convergence of each coordinate. That is, if $(x_n)$ is a sequence in $\mathbb{R}^k$, we write $x_n = (x_n^{(1)}, \ldots, x_n^{(k)})$. The sequence $(x_n)$ converges to $x \in \mathbb{R}^k$ if and only if each coordinate sequence $(x_n^{(j)})$ converges to $x^{(j)}$, the $j$-th coordinate of $x$.


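Theorem 4.1.9. Let $(x_n)$ and $(y_n)$ be sequences in $\mathbb{R}^k$ and $(\beta_n)$ a sequence of real numbers.

1. $(x_n)$ converges to $x \in \mathbb{R}^k$ if and only if $x_n^{(j)} \to x^{(j)}$ (in $\mathbb{R}$) for all $j = 1, \ldots, k$.

2. If $x_n \to x$, $y_n \to y$ in $\mathbb{R}^k$ and $\beta_n \to \beta$ in $\mathbb{R}$, then

$$x_n + y_n \to x + y, \quad x_n \cdot y_n \to x \cdot y, \quad \text{and} \quad \beta_n x_n \to \beta x,$$

where ‘$\cdot$’ is the standard dot product in $\mathbb{R}^k$.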

Proof. The second part follows from the first part and properties of limits in $\mathbb{R}$ we discussed above. To prove the first, suppose that $x_n \to x$ and let $j \in \{1, \ldots, k\}$. Given $\varepsilon > 0$, let $N$ be such that $n \ge N$ implies that $|x_n - x| < \varepsilon$. Then we have

$$|x_n^{(j)} - x^{(j)}| = \sqrt{(x_n^{(j)} - x^{(j)})^2} \le \sqrt{(x_n^{(1)} - x^{(1)})^2 + \cdots + (x_n^{(k)} - x^{(k)})^2} = |x_n - x| < \varepsilon.$$

So $x_n^{(j)} \to x^{(j)}$.

For the converse, suppose that $x_n^{(j)} \to x^{(j)}$ for all $j = 1, \ldots, k$ and let $\varepsilon > 0$. Pick $N_1, \ldots, N_k$ such that for $j = 1, \ldots, k$, if $n \ge N_j$ then $|x_n^{(j)} - x^{(j)}| < \varepsilon/\sqrt{k}$. Then for $N = \max\{N_1, \ldots, N_k\}$ and $n \ge N$, we have

$$|x_n - x| = \sqrt{(x_n^{(1)} - x^{(1)})^2 + \cdots + (x_n^{(k)} - x^{(k)})^2} < \sqrt{\varepsilon^2/k + \cdots + \varepsilon^2/k} = \varepsilon.$$

We finish this section with the idea of convergence to infinity.

Definition 4.1.10. A real sequence $(x_n)$ converges to $\infty$ if for each $M > 0$ there exists $N \in \mathbb{N}$ such that

$$n \ge N \text{ implies } x_n > M.$$

It converges to $-\infty$ if $(-x_n)$ converges to $\infty$.

As before we write $x_n \to \infty$ (or $x_n \to -\infty$) in this case. In this definition we think of $M$ as taking the role of $\varepsilon$ from before and we imagine that $(M, \infty)$ is a “neighborhood of infinity.”

Clearly a sequence that converges to infinity needs to be unbounded. The converse is not true. Consider $(x_n)$, defined by

$$x_n = \begin{cases} 1 & n \text{ odd} \\ n & n \text{ even.} \end{cases}$$

This sequence does not converge to infinity, but it is unbounded.
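A short sketch makes the distinction concrete for this sequence: for any bound $M$ some terms exceed it, yet no tail stays above $M$, because the odd-indexed terms keep returning to 1. The helper `exceeds` is a hypothetical name used only here.

```python
def x(n):
    """The sequence from the text: 1 for odd n, n for even n."""
    return 1 if n % 2 == 1 else n

def exceeds(M, upto):
    """Indices n <= upto with x(n) > M (witnesses that M is not an upper bound)."""
    return [n for n in range(1, upto + 1) if x(n) > M]

M = 50
print(exceeds(M, 60))                        # even indices larger than 50
print([x(n) for n in range(101, 112, 2)])    # odd indices beyond them still give 1
```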


4.2 Subsequences, Cauchy sequences and completeness

We now move back to sequences in general metric spaces. Sometimes the sequence does not converge, but if we remove many of the terms we can make it converge. Another way to say this is that a sequence might not converge but it may have a convergent subsequence.

Definition 4.2.1. Let $(x_n)$ be a sequence in a metric space $X$. Given a monotonically increasing sequence $(n_k)$ in $\mathbb{N}$ (that is, the sequence is such that $n_1 < n_2 < \cdots$), the sequence $(x_{n_k})$ is called a subsequence of $(x_n)$. If $x_{n_k} \to y$ as $k \to \infty$ then we call $y$ a subsequential limit of $(x_n)$.

Note that a sequence $(x_n)$ converges to $x$ if and only if each subsequence of $(x_n)$ converges to $x$. To prove this, suppose first that $x_n \to x$ and let $(x_{n_k})$ be a subsequence. Given $\varepsilon > 0$ we can find $N$ such that if $n \ge N$ then $d(x_n, x) < \varepsilon$. Because $(n_k)$ is monotone increasing, it follows that $n_k \ge k$ for all $k$, so choose $K = N$. Then for $k \ge K$, the element $x_{n_k}$ is a term of the sequence $(x_n)$ with index at least equal to $N$, giving $d(x_{n_k}, x) < \varepsilon$.

Conversely, suppose that each subsequence of $(x_n)$ converges to $x$. Then as $(x_n)$ is a subsequence of itself, we also have $x_n \to x$!

The next theorem is one of the most important in the course. It is a restatement of compactness; in general topological spaces, it is called sequential compactness.

Theorem 4.2.2. Let $(x_n)$ be a sequence in a compact metric space $X$. Then some subsequence of $(x_n)$ converges to a point $x$ in $X$.

Proof. It may be that the set of sequence elements $\{x_n : n \in \mathbb{N}\}$ is finite. In this case, at least one of these elements must appear in the sequence infinitely often. That is, there exists $x \in \{x_n : n \in \mathbb{N}\}$ and a monotone increasing sequence $(n_k)$ such that $x_{n_k} = x$ for all $k$. Clearly then $x_{n_k} \to x$ and the element $x \in X$ because the sequence terms are.

Otherwise, the set $\{x_n : n \in \mathbb{N}\}$ is infinite. Because compactness implies limit point compactness, there exists $x \in X$ which is a limit point of this set. Then we build a subsequence that converges to $x$ as follows. Since $d(x_n, x) < 1$ for infinitely many $n$, we can pick $n_1$ such that $d(x_{n_1}, x) < 1$. Continuing in this fashion, at stage $i$ we note that $d(x_n, x) < 1/i$ for infinitely many $n$, so we can pick $n_i > n_{i-1}$ such that $d(x_{n_i}, x) < 1/i$. Because $n_1 < n_2 < \cdots$, the sequence $(x_{n_i})$ is a subsequence of $(x_n)$. Further, given $\varepsilon > 0$, choose $I > 1/\varepsilon$, so that if $i \ge I$,

$$d(x_{n_i}, x) \le 1/i \le 1/I < \varepsilon.$$

Corollary 4.2.3 (Bolzano-Weierstrass). Each bounded sequence in $\mathbb{R}^k$ has a convergent subsequence.

Proof. If $(x_n)$ is bounded in $\mathbb{R}^k$ then we can fit the set $\{x_n : n \in \mathbb{N}\}$ into a $k$-cell, which is compact. Now, viewing $(x_n)$ as a sequence in this compact $k$-cell, we see by the previous theorem that it has a convergent subsequence.


The last topic of the section is Cauchy sequences. The motivation is as follows. Many times we are in a metric space $X$ that has “holes.” For instance, we may consider $\mathbb{Q}$ as a metric space inside of $\mathbb{R}$ (that is, using the metric $d(x, y) = |x - y|$ from $\mathbb{R}$). In this space, the sequence

$$(1, 1.4, 1.41, 1.414, \ldots)$$

does not converge (it converges in $\mathbb{R}$ to $\sqrt{2}$, but this element is not in our space). Although we cannot talk about this sequence converging, that is, getting close to some limit $x$, we can do the next best thing. We can say that the terms of the sequence get close to each other.

Definition 4.2.4. Let $(x_n)$ be a sequence in a metric space $X$. We say that $(x_n)$ is Cauchy if for each $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that

$$\text{if } m, n \ge N \text{ then } d(x_m, x_n) < \varepsilon.$$
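For the decimal truncations of $\sqrt{2}$ mentioned above, the Cauchy condition can be checked inside $\mathbb{Q}$ without ever referring to $\sqrt{2}$ itself. The sketch below is an illustration only; it indexes the truncations from $k = 0$ (so `truncation(0)` is 1), which is a convention chosen here, and it checks a finite range of indices rather than proving anything.

```python
from fractions import Fraction
from math import isqrt

def truncation(k):
    """sqrt(2) truncated to k decimal places, as an exact rational number."""
    scale = 10 ** k
    # isqrt gives the floor of the square root, so this is floor(sqrt(2) * 10^k) / 10^k.
    return Fraction(isqrt(2 * scale * scale), scale)

print([float(truncation(k)) for k in range(4)])

# For m, n >= N the terms differ by less than 10^(-N): the terms get close to each other.
N = 5
assert all(abs(truncation(m) - truncation(n)) < Fraction(1, 10 ** N)
           for m in range(N, N + 10) for n in range(N, N + 10))

# No term squares to 2, consistent with the "hole" in Q.
assert all(truncation(k) ** 2 != 2 for k in range(20))
```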

Just like before, the number $N$ gives us a cutoff in the sequence after which all terms are close to each other. Each convergent sequence $(x_n)$ (with some limit $x$) is Cauchy, for if $\varepsilon > 0$ then we can pick $N$ such that if $n \ge N$ then $d(x_n, x) < \varepsilon/2$. Then for $m, n \ge N$,

$$d(x_n, x_m) \le d(x_n, x) + d(x_m, x) < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$

One reason a Cauchy sequence might not converge was illustrated above; the “limit” may not be in the space. This is not possible, though, in a compact space.

Theorem 4.2.5. If X is a compact metric space, then all Cauchy sequences in X converge.

Proof. Let $X$ be compact and $(x_n)$ a Cauchy sequence. By the previous theorem, $(x_n)$ has a subsequence $(x_{n_k})$ such that $x_{n_k} \to x$, some point of $X$. We will show that since $(x_n)$ is already Cauchy, the full sequence must converge to $x$. The idea is to fix some element $x_{n_{k^*}}$ of the subsequence which is close to $x$. This term is chosen far enough along the initial sequence so that all later terms are close to it, and thus close to $x$.

Let $\varepsilon > 0$ and choose $N$ such that if $m, n \ge N$ then $d(x_m, x_n) < \varepsilon/2$. Choose also some $K$ such that if $k \ge K$ then $d(x_{n_k}, x) < \varepsilon/2$. Last, set $N' = \max\{N, K\}$. Because $(n_k)$ is monotone increasing, we have $n_k \ge k$ for all $k$, so we can fix $k^* \ge N'$ and obtain both $k^* \ge K$ and $n_{k^*} \ge N' \ge N$. Then for any $n \ge N'$, we have

$$d(x_n, x) \le d(x_n, x_{n_{k^*}}) + d(x_{n_{k^*}}, x) < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$

Definition 4.2.6. A metric space $X$ in which all Cauchy sequences converge is said to be complete.

The above theorem says that compact spaces are complete. This is also true of $\mathbb{R}^k$, though it is not compact.

Theorem 4.2.7. Rk (with the usual metric) is complete.


Proof. Let $(x_n)$ be a Cauchy sequence. We claim that it is bounded. The proof is almost the same as that of the fact that a convergent sequence is bounded. We can find $N$ such that if $n, m \ge N$ then $d(x_n, x_m) < 1$. Therefore $d(x_n, x_N) < 1$ for all $n \ge N$. Putting $R = \max\{d(x_N, x_1), \ldots, d(x_N, x_{N-1}), 1\}$, we then have $d(x_j, x_N) \le R$ for all $j$, so $(x_n)$ is bounded.

Since $(x_n)$ is bounded, we can put it in a $k$-cell $C$. Then we can view the sequence as being in the space $C$, which is compact. Now we use the fact that compact spaces are complete, giving some $x \in C$ such that $x_n \to x$. But $x \in \mathbb{R}^k$, so we are done.

4.3 Special sequences

Here we will list the limits Rudin gives in the book and prove a couple of them. One interesting one is the fourth. Imagine taking $\alpha = 10^{10}$ and $p = 10^{-10}$. Then it says that $(1 + 10^{-10})^n$ eventually outgrows $n^{10^{10}}$.

Theorem 4.3.1. The limits below evaluate as follows.

• If $p > 0$ then $n^{-p} \to 0$.

• If $p > 0$ then $p^{1/n} \to 1$.

• $n^{1/n} \to 1$.

• If $p > 0$ and $\alpha \in \mathbb{R}$ then $\frac{n^\alpha}{(1+p)^n} \to 0$.

• If $|x| < 1$ then $x^n \to 0$.

Proof. For the first limit, let $\varepsilon > 0$ and choose $N = \left\lceil 1/\varepsilon^{1/p} \right\rceil + 1$. Then if $n \ge N$ we have $n > 1/\varepsilon^{1/p}$ and therefore $n^{-p} < \varepsilon$.

For the second, Rudin uses the binomial theorem:

Lemma 4.3.2. For $x, y \in \mathbb{R}$ and $n \in \mathbb{N}$,

$$(x + y)^n = \sum_{j=0}^{n} \binom{n}{j} x^j y^{n-j}.$$

Proof. The proof is by induction. For $n = 0$ we get

$$1 = \binom{0}{0} = 1.$$


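Assuming it holds for some $n$, we show it holds for $n + 1$. We have

$$\begin{aligned}
(x + y)^{n+1} = (x + y)(x + y)^n &= (x + y) \sum_{j=0}^{n} \binom{n}{j} x^j y^{n-j} \\
&= \sum_{j=0}^{n} \binom{n}{j} x^{j+1} y^{n-j} + \sum_{j=0}^{n} \binom{n}{j} x^j y^{n+1-j} \\
&= \sum_{j=1}^{n+1} \binom{n}{j-1} x^j y^{n+1-j} + \sum_{j=0}^{n} \binom{n}{j} x^j y^{n+1-j} \\
&= \binom{n}{0} y^{n+1} + \binom{n}{n} x^{n+1} + \sum_{j=1}^{n} \left[\binom{n}{j-1} + \binom{n}{j}\right] x^j y^{n+1-j}.
\end{aligned}$$

But now we use the identity $\binom{n}{j-1} + \binom{n}{j} = \binom{n+1}{j}$, valid for $n \ge 0$ and $j = 1, \ldots, n$. This gives

$$y^{n+1} + x^{n+1} + \sum_{j=1}^{n} \binom{n+1}{j} x^j y^{n+1-j},$$

which is $\sum_{j=0}^{n+1} \binom{n+1}{j} x^j y^{n+1-j}$.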
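Both the binomial theorem and the identity used in the inductive step are easy to spot-check in exact arithmetic. The following sketch uses the standard library function `math.comb`; the name `binomial_sum` is an illustrative choice, and a finite check of this kind is a sanity test rather than a proof.

```python
from fractions import Fraction
from math import comb

def binomial_sum(x, y, n):
    """Right-hand side of the binomial theorem: sum over j of C(n, j) x^j y^(n-j)."""
    return sum(comb(n, j) * x**j * y**(n - j) for j in range(n + 1))

# The identity C(n, j-1) + C(n, j) = C(n+1, j) for j = 1, ..., n.
assert all(comb(n, j - 1) + comb(n, j) == comb(n + 1, j)
           for n in range(12) for j in range(1, n + 1))

# Compare both sides of the theorem on a few exact inputs.
x, y = Fraction(1, 2), Fraction(-3)
for n in range(8):
    assert binomial_sum(x, y, n) == (x + y) ** n
print("binomial theorem verified for n = 0, ..., 7 at x = 1/2, y = -3")
```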

Returning to the proof of the second limit, we first assume that $p > 1$ and set $y_n = p^{1/n} - 1$, so that $y_n > 0$. Computing,

$$p = (y_n + 1)^n \ge 1 + n y_n,$$

where we have taken only the first two terms from the binomial theorem. This means $0 \le y_n \le \frac{p-1}{n}$ and letting $n \to \infty$ we get $y_n \to 0$, completing the proof in the case $p > 1$. The case $p = 1$ is trivial. If $0 < p < 1$ then we consider $1/p$ and see that $(1/p)^{1/n} \to 1$. Taking reciprocals, we get $p^{1/n} \to 1$.

For the third limit, we use a different term in the binomial theorem. Set $x_n = n^{1/n} - 1$ and compute

$$n = (1 + x_n)^n \ge \binom{n}{2} x_n^2 = \frac{n(n-1)}{2} x_n^2,$$

so $0 \le x_n \le \sqrt{\frac{2}{n-1}} \le \sqrt{\frac{4}{n}}$ if $n \ge 2$. Since $n^{-1/2} \to 0$ we are done.

The fourth limit is a bit more difficult. Choose any positive integer $k > \alpha$ and consider $n > 2k$. Then

$$(1 + p)^n \ge \binom{n}{k} p^k = \frac{n(n-1) \cdots (n-k+1)}{k!} p^k \ge \frac{(n/2)^k p^k}{k!}.$$

This gives

$$0 \le \frac{n^\alpha}{(1+p)^n} \le \frac{2^k k!}{p^k} n^{\alpha - k} \to 0$$

since $\alpha < k$.

The last limit is proved in Chapter 5 in the theorem on geometric series.
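The fourth limit can look implausible for small $n$, since the ratio $n^\alpha / (1+p)^n$ may grow for a long time before it starts to decay. A minimal numerical sketch, working with logarithms to avoid overflow, illustrates this for the sample values $\alpha = 5$ and $p = 0.01$; these values and the name `log_ratio` are choices made here for illustration only.

```python
import math

def log_ratio(n, alpha, p):
    """Natural logarithm of n**alpha / (1 + p)**n, computed without overflow."""
    return alpha * math.log(n) - n * math.log1p(p)

alpha, p = 5, 0.01
for n in (10, 100, 1000, 10_000, 100_000):
    print(n, round(log_ratio(n, alpha, p), 2))
# The logarithm is positive for moderate n (the ratio is huge) and then
# becomes very negative (the ratio tends to 0) once n is large enough.
```

The turning point is near $n = \alpha / \log(1 + p)$, roughly 500 for these values, which is where the exponential factor starts to dominate.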


4.4 Exercises

1. For the following sequences, find the limit and prove your answer (using an $\varepsilon$-$N$ argument).

(a) $x_n = \sqrt{n^2 + 1} - n$, $n \in \mathbb{N}$.

(b) $x_n = n 2^{-n}$, $n \in \mathbb{N}$.

2. Determine whether or not the following sequence converges. If it does not, give a convergent subsequence (if one exists).

$$x_n = \sin\left(\frac{n\pi}{2}\right) + \cos(n\pi), \quad n \in \mathbb{N}.$$

3. Let $a_1, \ldots, a_k$ be positive numbers. Show that

$$\lim_{n \to \infty} \left(\frac{a_1^n + \cdots + a_k^n}{k}\right)^{1/n} = \max\{a_1, \ldots, a_k\}.$$

4. We have seen that if a metric space $(X, d)$ is compact then it must be complete. In this exercise we investigate the converse.

(a) Show that if $X$ is complete then it need not be compact.

(b) We say that a metric space $(X, d)$ is totally bounded if for each $\delta > 0$ we can find finitely many points $x_1, \ldots, x_n$ such that

$$X \subset \bigcup_{i=1}^{n} B_\delta(x_i).$$

(Here, $B_r(x)$ is the neighborhood $\{y \in X : d(x, y) < r\}$.) Show that $X$ is compact if and only if $X$ is both totally bounded and complete via the following steps.

i. Show that if $X$ is compact then $X$ is totally bounded.

ii. Assume that $X$ is totally bounded and let $E \subset X$ be infinite. We will try to construct a limit point for $E$ in $X$. Begin by finding $x^{(1)}_1, \ldots, x^{(1)}_{n_1} \in X$ such that

$$X \subset \bigcup_{i=1}^{n_1} B_{1/2}(x^{(1)}_i).$$

There must be $k_1 \in \{1, \ldots, n_1\}$ such that

$$E_1 = B_{1/2}(x^{(1)}_{k_1}) \cap E \text{ is infinite}.$$

Continue, at stage $n \ge 2$ choosing $x^{(n)}_{k_n} \in X$ such that $E_n = B_{2^{-n}}(x^{(n)}_{k_n}) \cap E_{n-1}$ is infinite. Show that $(x^{(1)}_{k_1}, x^{(2)}_{k_2}, \ldots)$ is a Cauchy sequence.

Hint. You may want to use the following fact (without proof). For $n \ge 1$, define $s_n = 1/2 + 1/4 + \cdots + 1/2^n$. Then $(s_n)$ converges.

iii. Assume that $X$ is totally bounded and complete and show that any $E \subset X$ which is infinite has a limit point in $X$. Conclude that $X$ is compact.

(c) Show that if $X$ is totally bounded then it need not be compact.

5. Sometimes we want to analyze sequences that do not converge. For this purpose we define upper and lower limits, numbers that exist for all real sequences. Let $(a_n)$ be a sequence in $\mathbb{R}$ and for each $n \ge 1$ define

$$u_n = \sup\{a_n, a_{n+1}, \ldots\} \quad \text{and} \quad l_n = \inf\{a_n, a_{n+1}, \ldots\}.$$

(Here we write $u_n = \infty$ if $\{a_k : k \ge n\}$ is not bounded above and $l_n = -\infty$ if it is not bounded below.)

(a) Show that $(u_n)$ and $(l_n)$ are monotonic. If $(a_n)$ is bounded, show that there exist numbers $u, l \in \mathbb{R}$ such that $u_n \to u$ and $l_n \to l$. We denote these numbers as the limit superior (upper limit) and limit inferior (lower limit) of $(a_n)$ and write

$$\limsup_{n \to \infty} a_n = u, \quad \liminf_{n \to \infty} a_n = l.$$

(b) Give reasonable definitions of $\limsup_{n \to \infty} a_n$ and $\liminf_{n \to \infty} a_n$ in the unbounded case. (Here your definitions should allow for the possibilities $\pm\infty$.)

(c) Show that $(a_n)$ converges if and only if

$$\limsup_{n \to \infty} a_n = \liminf_{n \to \infty} a_n.$$

(You may want to separate into cases depending on whether the lim inf and/or lim sup is finite or infinite.)

6. Let $(a_n)$ be a real sequence and write $E$ for the set of all subsequential limits of $(a_n)$; that is,

$$E = \{x \in \mathbb{R} : \text{there is a subsequence } (a_{n_k}) \text{ of } (a_n) \text{ such that } a_{n_k} \to x \text{ as } k \to \infty\}.$$

Assume that $(a_n)$ is bounded and prove that $\limsup_{n \to \infty} a_n = \sup E$. Explain how you would modify your proof to show that $\liminf_{n \to \infty} a_n = \inf E$. (These results are also true if $(a_n)$ is unbounded but you do not have to prove that.)

7. Let $a_0, b_0$ be real numbers with $0 \le b_0 \le a_0$. Define sequences $\{a_n\}$, $\{b_n\}$ by

$$a_{n+1} = \frac{a_n + b_n}{2} \quad \text{and} \quad b_{n+1} = \sqrt{a_n b_n} \quad \text{for } n \ge 0.$$

(a) Prove the arithmetic-geometric mean inequality: $b_1 \le a_1$. This trivially extends to $b_n \le a_n$ for all $n$.

(b) Prove that $\{a_n\}$ and $\{b_n\}$ converge, and to the same limit. This limit is called the arithmetic-geometric mean of $a_0$ and $b_0$.


8. This problem is not assigned; it is just for fun. Let $x_0 = 1$ and $x_{n+1} = \sin x_n$.

(a) Prove $x_n \to 0$.

(b) Find $\lim_{n \to \infty} \sqrt{n}\, x_n$.


5 Series

We now introduce series, which are special types of sequences. We will concentrate on them for the next couple of lectures.

5.1 Definitions

Definition 5.1.1. Let $(x_n)$ be a real sequence. For each $n \in \mathbb{N}$, define the partial sum

$$s_n = x_1 + \cdots + x_n = \sum_{j=1}^{n} x_j.$$

We say that the series $\sum x_n$ converges if $(s_n)$ converges.

Just as before, the tail behavior is all that matters (we can chop off as many initial terms as we want). In other words

$$\sum_{n=1}^{\infty} x_n \text{ converges iff } \sum_{n=N}^{\infty} x_n \text{ converges for each } N \ge 1.$$

The proof is the same as that for sequences.

We would like to characterize which series converge. We start with a simple criterion that must be satisfied.

Theorem 5.1.2. If the series $\sum x_n$ converges then the terms $(x_n)$ converge to 0.

Proof. Let $\varepsilon > 0$ and suppose that $\sum_n x_n = s$ (that is, $s_n \to s$). Then there exists $N \in \mathbb{N}$ such that if $n \ge N$ then $|s_n - s| < \varepsilon/2$. Now for $n \ge N + 1$,

$$|s_n - s| < \varepsilon/2 \quad \text{and} \quad |s_{n-1} - s| < \varepsilon/2,$$

implying that $|s_n - s_{n-1}| < \varepsilon$. Therefore

$$|x_n - 0| = |s_n - s_{n-1}| < \varepsilon$$

and we are done.

The above tells us that many series cannot converge. For instance, $\sum x_n$ diverges, where $x_n = (-1)^n$. However, it is not true that all series $\sum x_n$ with $x_n \to 0$ converge. For example,

Theorem 5.1.3. The harmonic series, $\sum_{n=1}^{\infty} 1/n$, diverges.

Proof. To prove this, we give a lemma that allows us to handle series of non-negative terms more easily.

Lemma 5.1.4. Let $(x_n)$ be a sequence of non-negative terms. Then $\sum x_n$ converges if and only if the sequence of partial sums $(s_n)$ is bounded.


Proof. This comes directly from the monotone convergence theorem. If $x_n \ge 0$ for all $n$, then

$$s_{n+1} = s_n + x_{n+1} \ge s_n,$$

giving that $(s_n)$ is monotone, and converges if and only if it is bounded.

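Returning to the proof, we will show that the partial sums of the harmonic series are unbounded. Let $M > 0$ and choose $n$ of the form $n = 2^k$ for $k > 2M$. Then we give a lower bound:

$$\begin{aligned}
s_n &= 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2^k} \\
&> 1 + \left(\frac{1}{2} + \frac{1}{3}\right) + \left(\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7}\right) + \cdots + \left(\frac{1}{2^{k-1}} + \cdots + \frac{1}{2^k - 1}\right) \\
&> \frac{1}{2} + 2\left(\frac{1}{4}\right) + 4\left(\frac{1}{8}\right) + \cdots + 2^{k-1}\left(\frac{1}{2^k}\right) \\
&= \frac{1}{2}(1 + 1 + 1 + \cdots + 1) = \frac{k}{2} > M.
\end{aligned}$$

So given any $M > 0$, there exists $n$ such that $s_n > M$. This implies that $(s_n)$ is unbounded and we are done.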
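The lower bound $s_{2^k} > k/2$ obtained in the proof can be checked exactly for small $k$. The sketch below is only an illustration (the name `harmonic` is chosen here); it also shows how slowly the partial sums grow, which is why divergence is not apparent from numerical experiments alone.

```python
from fractions import Fraction

def harmonic(n):
    """Exact partial sum s_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, j) for j in range(1, n + 1))

for k in range(1, 11):
    s = harmonic(2 ** k)
    assert s > Fraction(k, 2)          # the bound from the grouping argument
    print(k, 2 ** k, round(float(s), 4))
```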

In the proof above, we used an argument that can be generalized a bit.

Theorem 5.1.5 (Comparison test). Let $(x_n)$ and $(y_n)$ be non-negative real sequences such that $x_n \ge y_n$ for all $n$.

1. If $\sum x_n$ converges, then so does $\sum y_n$.

2. If $\sum y_n$ diverges, then so does $\sum x_n$.

Proof. The first part is implied by the second, so we need only show the second. Write $(s_n)$ and $(t_n)$ for the partial sums

$$s_n = x_1 + \cdots + x_n \quad \text{and} \quad t_n = y_1 + \cdots + y_n.$$

Since $y_n \ge 0$ for all $n$ we can use the above lemma to say that $(t_n)$ is unbounded, so given $M > 0$ choose $N$ such that $n \ge N$ implies that $t_n > M$. Now for such $n$,

$$s_n = x_1 + \cdots + x_n \ge y_1 + \cdots + y_n = t_n > M,$$

so that $(s_n)$ is unbounded and diverges.

This test can be generalized in at least two ways:

1. We only need $x_n \ge y_n$ for $n$ greater than some $N_0$. This is because we can consider convergence/divergence of $\sum_{n=N_0}^{\infty} x_n$, etc.

2. In the first part, we do not even need $y_n \ge 0$ as long as we modify the statement. Suppose that $(x_n)$ is non-negative such that $\sum x_n$ converges and $|y_n| \le x_n$ for all $n$. Then setting $s_n$ and $t_n$ as before, we can just show that $(t_n)$ is Cauchy. Since $(s_n)$ is, given $\varepsilon > 0$ we can find $N$ such that if $n > m \ge N$ then $|s_n - s_m| < \varepsilon$. Then

\[ |t_n - t_m| = |y_{m+1} + \cdots + y_n| \le |y_{m+1}| + \cdots + |y_n| \le x_{m+1} + \cdots + x_n = |s_n - s_m| < \varepsilon . \]

To use the comparison test, let us first introduce one of the simplest series of all time.

Theorem 5.1.6 (Geometric series). For $a \in \mathbb{R}$ define a sequence $(x_n)$ by $x_n = a^n$. Then the geometric series $\sum_n x_n$ converges if and only if $|a| < 1$. Furthermore,

\[ \sum_{n=0}^{\infty} a^n = \frac{1}{1-a} \quad \text{if } |a| < 1 . \]

Proof. The first thing to note is that $a^n \to 0$ if $|a| < 1$. We can prove this by showing that $|a|^n \to 0$. If $0 \le |a| < 1$ then the sequence $(|a|^n)$ is monotone decreasing,

\[ |a|^{n+1} = |a||a|^n \le |a|^n , \]

and is bounded below, so it has a limit, say $L$. Then we get

\[ L = \lim_{n\to\infty} |a|^n = |a| \lim_{n\to\infty} |a|^{n-1} = |a| L , \]

so $(1 - |a|)L = 0$, and as $|a| \ne 1$ we have $L = 0$.

Now continue to assume that $|a| < 1$ and compute the partial sum $s_n = 1 + a + \cdots + a^n$ for $n \ge 1$:

\[ s_n + a^{n+1} = s_{n+1} = 1 + a + a^2 + \cdots + a^{n+1} = 1 + a(1 + a + a^2 + \cdots + a^n) = 1 + a s_n . \]

Solving for $s_n$, we find $s_n(1 - a) = 1 - a^{n+1}$. Since $a \ne 1$,

\[ s_n = \frac{1 - a^{n+1}}{1 - a} . \]

We let $n \to \infty$ to get the result.

If $|a| \ge 1$ then the terms $a^n$ do not even go to zero, since $|a|^n \ge |a| \ne 0$, so the series diverges.
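The closed form for the partial sums is easy to check numerically. The following sketch (an illustration added here, with hypothetical helper names) sums the terms one by one and compares with $(1 - a^{n+1})/(1 - a)$ and with the limit $1/(1-a)$.

```python
def geometric_partial_sum(a, n):
    """s_n = 1 + a + ... + a^n, summed term by term."""
    total, term = 0.0, 1.0
    for _ in range(n + 1):
        total += term
        term *= a
    return total


def closed_form(a, n):
    """(1 - a^(n+1)) / (1 - a), valid for a != 1."""
    return (1 - a ** (n + 1)) / (1 - a)


for a in (0.5, -0.9, 0.99):
    n = 2000
    print(a, geometric_partial_sum(a, n), closed_form(a, n), 1 / (1 - a))
```

For $a$ close to 1 the convergence is slow, since the error $a^{n+1}/(1-a)$ decays only like $a^n$.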

Now we can prove facts about the “p-series.”

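Theorem 5.1.7. The series $\sum n^{-p}$ converges if and only if $p > 1$.

Proof. For $p \le 1$ we have $n^{-p} \ge 1/n$ and so the comparison test gives divergence. Suppose then that $p > 1$. We can group terms as before: taking $n = 2^k - 1$,

\[
\begin{aligned}
1 + \frac{1}{2^p} + \frac{1}{3^p} + \cdots + \frac{1}{(2^k - 1)^p}
&= 1 + \left(\frac{1}{2^p} + \frac{1}{3^p}\right) + \left(\frac{1}{4^p} + \frac{1}{5^p} + \frac{1}{6^p} + \frac{1}{7^p}\right) + \cdots + \left(\frac{1}{2^{(k-1)p}} + \cdots + \frac{1}{(2^k - 1)^p}\right) \\
&\le 1 + 2\left(\frac{1}{2^p}\right) + 4\left(\frac{1}{4^p}\right) + \cdots + 2^{k-1}\left(\frac{1}{2^{(k-1)p}}\right) \\
&= 1 + 2^{1-p} + 4^{1-p} + \cdots + (2^{k-1})^{1-p} \\
&= (2^{1-p})^0 + (2^{1-p})^1 + \cdots + (2^{1-p})^{k-1} \le \frac{1}{1 - 2^{1-p}} < \infty \quad \text{since } p > 1 .
\end{aligned}
\]

This means that if $s_n = \sum_{j=1}^{n} j^{-p}$, then $s_{2^k - 1} \le \frac{1}{1 - 2^{1-p}}$ for all $k$. Since $(s_n)$ is monotone, it is then bounded and the series converges.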
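The bound $s_{2^k - 1} \le 1/(1 - 2^{1-p})$ can be checked numerically for a few values of $p > 1$. This is only a floating-point sketch with illustrative function names, not a proof.

```python
def p_series_partial_sum(p, n):
    """s_n = sum of j^(-p) for j = 1, ..., n."""
    return sum(j ** (-p) for j in range(1, n + 1))


def dyadic_bound(p):
    """The geometric-series bound 1 / (1 - 2^(1-p)), valid for p > 1."""
    return 1 / (1 - 2 ** (1 - p))


for p in (1.5, 2, 3):
    s = p_series_partial_sum(p, 2 ** 15 - 1)
    print(p, s, dyadic_bound(p), s <= dyadic_bound(p))
```

As $p$ decreases to 1 the bound $1/(1 - 2^{1-p})$ blows up, consistent with divergence at $p = 1$.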

5.2 Ratio and root tests

We will continue to deal with series of non-negative terms. So far we have used the comparison test to compare series to the geometric series, whose sum we could compute explicitly. We will still do that, but in a more “refined” way.

Theorem 5.2.1 (Root test). Let $(x_n)$ be a real sequence and define $\alpha = \limsup_{n\to\infty} |x_n|^{1/n}$.

• If $\alpha < 1$ then $\sum x_n$ converges.

• If $\alpha > 1$ then $\sum x_n$ diverges.

• If $\alpha = 1$ then $\sum x_n$ could converge or diverge.

Proof. Define

\[ u_n = \sup\{|x_n|^{1/n}, |x_{n+1}|^{1/(n+1)}, \ldots\} . \]

In the homework set, the lim sup of a sequence was defined as the limit of such suprema of tails, so $\alpha = \lim_{n\to\infty} u_n$ (at least in the bounded case; you can adapt this proof to the unbounded case). If $\alpha < 1$ this means that given $p \in (\alpha, 1)$ there exists $N$ such that if $n \ge N$ then $u_n \le p$, or

\[ n \ge N \text{ implies } |x_n| \le p^n . \]

Now we just use the comparison test. Since $\sum p^n$ converges (as $0 < p < 1$), the second generalization of the comparison test above shows that $\sum x_n$ converges.

Suppose next that $\alpha > 1$. Recall from the homework that given a real sequence $(y_n)$, there always exists a subsequence $(y_{n_k})$ such that $y_{n_k} \to \limsup_{n\to\infty} y_n$. So we can find an increasing sequence $(n_k)$ such that $|x_{n_k}|^{1/n_k} \to \alpha$. Thus there exists $K$ such that

\[ k \ge K \text{ implies } |x_{n_k}| \ge 1^{n_k} = 1 , \]

and since $(x_n)$ does not converge to zero, we cannot have $\sum x_n$ convergent.

Last, if $\alpha = 1$ we cannot tell anything. First $(1/n)^{1/n} \to 1$, but also $(1/n^2)^{1/n} = \left((1/n)^{1/n}\right)^2 \to 1$. Since $\sum 1/n$ diverges and $\sum 1/n^2$ converges, the root test tells us nothing.

Applications.

1. The series $\sum \frac{n^2}{2^n}$ converges. We can see this by the root test:

\[ \limsup_{n\to\infty} \left(\frac{n^2}{2^n}\right)^{1/n} = \limsup_{n\to\infty} \frac{\left(n^{1/n}\right)^2}{2} = 1/2 < 1 . \]

2. Power series. Let $x \in \mathbb{R}$ and for a given real sequence $(a_n)$, consider the series

\[ \sum_{n=0}^{\infty} a_n x^n . \]

We would like to know for which values of $x$ this series converges. To find out, we simply use the root test. Consider

\[ \limsup_{n\to\infty} |a_n x^n|^{1/n} = |x| \limsup_{n\to\infty} |a_n|^{1/n} . \]

Setting $\alpha = \limsup_{n\to\infty} |a_n|^{1/n}$, we find that the series converges if $|x| < 1/\alpha$ and diverges if $|x| > 1/\alpha$. So it makes sense to define

\[ R := 1/\alpha \text{ as the radius of convergence of the power series } \sum_{n=0}^{\infty} a_n x^n . \]

Of course we cannot tell from the root test what happens when $x = \pm R$.

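3. Consider the series $\sum_{n=0}^{\infty} \frac{1}{n!}$. For $n \ge 1$ we have

\[ \left(\frac{1}{n!}\right)^{1/n} \le \left(\frac{1}{n(n-1)\cdots\lceil n/2 \rceil}\right)^{1/n} \le \left(\frac{1}{(n/2)^{n/2}}\right)^{1/n} = \sqrt{\frac{2}{n}} \to 0 . \]

So the root test gives convergence.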
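To see the first application numerically, one can compute $|x_n|^{1/n}$ for $x_n = n^2/2^n$ and watch it approach $1/2$. The sketch below (an added illustration) works through logarithms, because $2^n$ underflows ordinary floating point long before the $n$th root settles down.

```python
import math


def nth_root_of_term(n):
    """|x_n|^(1/n) for x_n = n^2 / 2^n, computed as exp(log|x_n| / n)."""
    return math.exp((2 * math.log(n) - n * math.log(2)) / n)


for n in (1, 10, 100, 10 ** 4, 10 ** 6):
    print(n, nth_root_of_term(n))
```

The values decrease toward $1/2$ only slowly, because $n^{2/n} \to 1$ at a logarithmic rate.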

Another useful test is the ratio test.

Theorem 5.2.2 (Ratio test). Let $(x_n)$ be a real sequence.

• If $\limsup_{n\to\infty} \left|\frac{x_{n+1}}{x_n}\right| < 1$ then $\sum x_n$ converges.

• If $x_{n+1} \ge x_n > 0$ for all $n \ge N_0$ (a fixed natural number) then $\sum x_n$ diverges.


Proof. Assume the limsup is $\alpha < 1$. Then as before, choosing $p \in (\alpha, 1)$ we can find $N$ such that if $n \ge N$ then $|x_{n+1}| < p|x_n|$. Iterating this from $n = N$ we find

\[ |x_{N+k}| < p^k |x_N| \text{ for all } k \ge 1 . \]

Therefore if we set $y_k = x_{N+k}$ then $|y_k| \le C p^k$ with $C$ the non-negative constant $|x_N|$. This implies by the comparison test that $\sum y_k$ converges. This is the tail of $\sum x_n$, so this converges as well.

Suppose on the other hand that $x_{n+1} \ge x_n > 0$ for all $n \ge N_0$. Then by iteration,

\[ x_{N_0+k} \ge x_{N_0} > 0 , \]

and the terms do not even converge to 0. This implies that $\sum x_n$ diverges.

The ratio test can be inconclusive, and in more ways than the root test can. First, if $\limsup_{n\to\infty} \left|\frac{x_{n+1}}{x_n}\right| > 1$ it can still be that the series converges (try to think of an example!). Also if this limsup equals 1, we could have convergence or divergence.

Note however that if

\[ \lim_{n\to\infty} \left|\frac{x_{n+1}}{x_n}\right| \text{ exists and is } > 1 \]

then we can apply the second criterion of the ratio test to $(|x_n|)$: the terms do not tend to 0, and we conclude divergence of $\sum x_n$.

Applications.

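1. The series $\sum x^n \frac{n^n}{n!}$ converges if $|x| < 1/C$, where $C = \sum_{n=0}^{\infty} 1/n!$. To see this, set $b_n = x^n \frac{n^n}{n!}$:

\[ \left|\frac{b_{n+1}}{b_n}\right| = |x| \frac{(n+1)^{n+1}}{n^n (n+1)} = |x| \left(1 + \frac{1}{n}\right)^n . \]

But

\[ \left(1 + \frac{1}{n}\right)^n = \sum_{j=0}^{n} \binom{n}{j} n^{-j} = \sum_{j=0}^{n} \frac{n(n-1)\cdots(n-j+1)}{j!\, n^j} \le \sum_{j=0}^{\infty} \frac{1}{j!} = C . \]

So $\limsup_{n\to\infty} \left|\frac{b_{n+1}}{b_n}\right| \le C|x| < 1$.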

2. Power series. Generally we can also test convergence of power series using the ratio test. Considering the series $\sum a_n x^n$, we compute

\[ \limsup_{n\to\infty} \left|\frac{a_{n+1} x^{n+1}}{a_n x^n}\right| = |x| \limsup_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = |x| \alpha , \]

where $\alpha = \limsup_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right|$. So if $|x| < 1/\alpha$ the series converges, whereas if $|x| \ge 1/\alpha$ we cannot tell. However, if $\beta = \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right|$ exists then for $|x| > 1/\beta$ we have divergence.
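The inequality $(1 + 1/n)^n \le \sum_{j \ge 0} 1/j!$ used in the first application can be observed numerically. The sketch below is an added illustration; it also uses the standard fact (not stated in these notes) that the constant $C$ is the number usually denoted $e$, available in Python as `math.e`.

```python
import math


def compound(n):
    """(1 + 1/n)^n, the ratio |b_{n+1} / b_n| divided by |x|."""
    return (1 + 1 / n) ** n


def factorial_series(terms=30):
    """Partial sum of 1/j! for j = 0, ..., terms - 1 (approximates C)."""
    return sum(1 / math.factorial(j) for j in range(terms))


C = factorial_series()
for n in (1, 10, 1000, 10 ** 6):
    print(n, compound(n), compound(n) <= C)
```

Since $(1 + 1/n)^n$ increases toward $C$, the ratio test here gives convergence exactly for $|x| < 1/C$ and no information at $|x| = 1/C$.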


Remark (from class). The root and ratio tests can give different answers. Consider the sequence $(a_n)$ given by

\[ a_n = \begin{cases} 1 & \text{if } n \text{ is even} \\ 2 & \text{if } n \text{ is odd.} \end{cases} \]

Then

\[ \limsup_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = 2 \quad \text{but} \quad \limsup_{n\to\infty} (|a_n|)^{1/n} = 1 . \]

Therefore if we consider the radius of convergence of $\sum a_n x^n$, the root test gives 1. If we were to define the radius of convergence using the ratio test (which we should not!) then we would get 1/2, which is smaller, and is not accurate, since for $x = 3/4$, for instance, the series converges. Generally speaking we have

\[ \limsup_{n\to\infty} (|a_n|)^{1/n} \le \limsup_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| . \]

See Rudin, Theorem 3.37.
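The remark can be illustrated numerically. In the sketch below (an added example with illustrative names), the ratios $a_{n+1}/a_n$ keep returning to 2, the roots $a_n^{1/n}$ approach 1, and the partial sums at $x = 3/4$ settle near $(1 + 2x)/(1 - x^2)$, the value obtained by summing the even and odd terms as two geometric series.

```python
def a(n):
    """a_n = 1 for even n, 2 for odd n."""
    return 1 if n % 2 == 0 else 2


def ratios(N):
    """The ratios a_{n+1} / a_n for n = 0, ..., N - 1."""
    return [a(n + 1) / a(n) for n in range(N)]


def roots(N):
    """The roots a_n^(1/n) for n = 1, ..., N - 1."""
    return [a(n) ** (1 / n) for n in range(1, N)]


def partial_sum(x, N):
    """Sum of a_n x^n for n = 0, ..., N - 1."""
    return sum(a(n) * x ** n for n in range(N))


print(max(ratios(100)), max(roots(5000)[1000:]))
print(partial_sum(0.75, 400), (1 + 2 * 0.75) / (1 - 0.75 ** 2))
```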

5.3 Non non-negative series

We saw in the last couple of lectures that when dealing with series with non-negative terms, we compare to the geometric series. This gave rise to the comparison test, the ratio test and the root test. For other series we have basically one tool: summation by parts. It enters in the proof of the following theorem.

Theorem 5.3.1 (Dirichlet test). Let $(a_n)$ and $(b_n)$ be real sequences such that, setting $A_n = \sum_{j=0}^{n} a_j$, we have $(A_n)$ bounded. If $(b_n)$ is monotonic with $b_n \to 0$ then $\sum a_n b_n$ converges.

Proof. Let’s suppose that $(b_n)$ is monotone decreasing; the other case is similar. The idea is to get a different representation for $\sum a_n b_n$ by setting $A_n = \sum_{j=0}^{n} a_j$ for $n \ge 0$ and $A_{-1} = 0$. Now

\[
\begin{aligned}
\sum_{n=0}^{N} a_n b_n = \sum_{n=0}^{N} (A_n - A_{n-1}) b_n &= \sum_{n=0}^{N} A_n b_n - \sum_{n=0}^{N} A_{n-1} b_n \\
&= \sum_{n=0}^{N} A_n b_n - \sum_{n=0}^{N-1} A_n b_{n+1} \\
&= \sum_{n=0}^{N-1} A_n (b_n - b_{n+1}) + A_N b_N .
\end{aligned}
\]

Now since $(A_n)$ is bounded and $b_n \to 0$ we have $A_n b_n \to 0$. We can show this as follows. Suppose that $|A_n| \le M$ for all $n$, where $M > 0$, and let $\varepsilon > 0$. Choose $N_0$ such that $n \ge N_0$ implies that $|b_n| < \varepsilon/M$. Then for $n \ge N_0$, $|A_n b_n| < M\varepsilon/M = \varepsilon$.

Since $A_N b_N \to 0$, the above representation gives that $\sum a_n b_n$ converges if and only if $\sum A_n (b_n - b_{n+1})$ converges. But now we use the comparison test: $|A_n (b_n - b_{n+1})| \le M|b_n - b_{n+1}| = M(b_n - b_{n+1})$, where we have used monotonicity to get $b_n - b_{n+1} \ge 0$. But

\[ \sum_{n=0}^{N} M(b_n - b_{n+1}) = M \sum_{n=0}^{N} (b_n - b_{n+1}) = M(b_0 - b_{N+1}) \to M b_0 \]

converges, so $\sum A_n (b_n - b_{n+1})$ converges, completing the proof.
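The summation-by-parts identity behind this proof, in the form $\sum_{n=0}^{N} a_n b_n = \sum_{n=0}^{N-1} A_n (b_n - b_{n+1}) + A_N b_N$, can be verified exactly on finite lists. The sketch below is an added illustration; the helper name and the sample sequences $a_n = (-1)^n$, $b_n = 1/(n+1)$ are choices made here, not taken from the notes.

```python
from fractions import Fraction
from itertools import accumulate


def summation_by_parts_sides(a, b):
    """Return (lhs, rhs) of the summation-by-parts identity for lists a, b."""
    N = len(a) - 1
    A = list(accumulate(a))  # A[n] = a[0] + ... + a[n]
    lhs = sum(a[n] * b[n] for n in range(N + 1))
    rhs = sum(A[n] * (b[n] - b[n + 1]) for n in range(N)) + A[N] * b[N]
    return lhs, rhs


a = [Fraction((-1) ** n) for n in range(50)]
b = [Fraction(1, n + 1) for n in range(50)]
lhs, rhs = summation_by_parts_sides(a, b)
print(lhs == rhs, float(lhs))
```

Using `Fraction` keeps the comparison exact, so equality of the two sides is not blurred by rounding.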

Note in the previous proof that we used a technique similar to integration by parts. Recall from calculus that the integral $\int_a^b u(x)v(x)\,dx$ can be written as

\[ \int_a^b u(x)v(x)\,dx = U(b)v(b) - U(a)v(a) - \int_a^b U(x)v'(x)\,dx , \]

where $U$ is an antiderivative of $u$. Here we are thinking of $A_n$ as the “antiderivative” of $a_n$ and $b_n - b_{n+1}$ as the “derivative” of $b_n$. In the sum case above, we only have one boundary term, because one of them corresponds to $A_{-1} b_0 = 0$.

Examples

1. Alternating series test. Let $(a_n)$ be a monotone non-increasing sequence converging to 0. Then $\sum (-1)^n a_n$ converges. This is obtained by applying the Dirichlet test, noting that the partial sums of $\sum (-1)^n$ are bounded by 1. As an example, we have

\[ \sum (-1)^n/n \text{ converges although } \sum 1/n \text{ does not.} \]

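2. For $n \in \mathbb{N}$, let $f(n)$ be the largest value of $k$ such that $2^k \le n$ (this is the integer part of $\log_2 n$). Then

\[ \sum_{n \ge 2} (-1)^n / f(n) \text{ converges.} \]

(The sum starts at $n = 2$ because $f(1) = 0$.)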

3. A series $\sum a_n$ is said to converge absolutely if $\sum |a_n|$ converges. If it does not converge absolutely but does converge then we say $\sum a_n$ converges conditionally. It is a famous theorem of Riemann that given any $L \in \mathbb{R}$ and a conditionally convergent series $\sum a_n$, there is a rearrangement $(b_n)$ of the terms of $(a_n)$ such that $\sum b_n = L$. See the last section of Rudin, Chapter 3 for more details.
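For the alternating series in example 1, the partial sums of $\sum_{n \ge 1} (-1)^n/n$ can be computed directly. The sketch below is an added illustration; it relies on the standard fact (not proved in these notes) that the sum equals $-\ln 2$, and shows odd-indexed partial sums lying below the limit and even-indexed ones above it.

```python
import math


def alternating_partial_sum(n):
    """Sum of (-1)^j / j for j = 1, ..., n."""
    return sum((-1) ** j / j for j in range(1, n + 1))


limit = -math.log(2)
for n in (1, 10, 100):
    print(alternating_partial_sum(2 * n - 1), limit, alternating_partial_sum(2 * n))
```

The gap between consecutive partial sums is $1/n$, so convergence is slow even though it is guaranteed.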

5.4 Exercises

1. Determine if the following series converge.

(a) $\displaystyle \sum_{n=1}^{\infty} \frac{n-3}{n^2 - 6n + 10}$

(b) $\displaystyle \sum_{n=1}^{\infty} \frac{n!}{1 \cdot 3 \cdots (2n-1)}$

2. Prove that the following series converges for all $x \in \mathbb{R}$:

\[ \sum_{n=1}^{\infty} \frac{\sin(nx)}{\sqrt{n}} . \]

Hint. Use Theorem 3.42, after multiplying by $\sin(x/2)$. Use the following identity, which is valid for all $a, b \in \mathbb{R}$:

\[ \sin a \sin b = \frac{1}{2}\left(\cos(a-b) - \cos(a+b)\right) . \]

Although we have not defined $\sin x$ you can use the fact that $|\sin x|$ and $|\cos x|$ are both bounded by 1.

3. Because $\sum 1/n$ diverges, any series $\sum x_n$ with $x_n \ge 1/n$ for all $n$ must diverge, by the comparison test. We might think then that if $\sum x_n$ converges and $x_n \ge 0$ for all $n$ then $x_n$ is smaller than $1/n$, in the sense that

\[ \lim_{n\to\infty} n x_n = 0 . \]

(a) Show that this is false; that is, there exist convergent series $\sum x_n$ with $x_n \ge 0$ for all $n$ such that $\{n x_n\}$ does not converge to 0.

(b) Show however that if $\{x_n\}$ is monotone non-increasing and non-negative with $\sum x_n$ convergent, then $n x_n \to 0$.

Hint. Use Theorems 3.23 and 3.27.

4. Here we give a different proof of the alternating series test. Let $\{x_n\}$ be a real sequence that is monotonically non-increasing with $x_n \to 0$.

(a) For $n \in \mathbb{N}$, let $A_n = \sum_{i=1}^{2n} (-1)^i x_i$ and $B_n = \sum_{i=1}^{2n-1} (-1)^i x_i$. Show that $\{A_n\}$ and $\{B_n\}$ converge.

(b) Prove that $A_n - B_n \to 0$ and use this to show that $\sum (-1)^n x_n$ converges.

5. Suppose that $(a_n)$ and $(b_n)$ are real sequences such that $b_n > 0$ for all $n$ and $\sum a_n$ converges. If

\[ \lim_{n\to\infty} \frac{a_n}{b_n} = L \ne 0 \]

then must $\sum b_n$ converge? (Here $L$ is a finite number.)


6. (a) For $0 \le k \le n$, recall the definition of the binomial coefficient

\[ \binom{n}{k} = \frac{n!}{k!(n-k)!} . \]

For $n \ge 0$, let $a_n = \binom{2n}{n}$. Find $\lim_{n\to\infty} \frac{a_{n+1}}{a_n}$.

(b) Find $R$, the radius of convergence of $\sum_{n=0}^{\infty} \binom{2n}{n} x^n$. (Fun fact: this actually equals $\frac{1}{\sqrt{1-4x}}$ when it converges; it is the Taylor series of this function.)

(c) Show that if $b_1, b_2, \ldots, b_m$ are non-negative real numbers then

\[ (1 + b_1)(1 + b_2) \cdots (1 + b_m) \ge 1 + b_1 + \cdots + b_m . \]

Write $c_n = a_n/4^n$ and

\[ \frac{c_0}{c_n} = \frac{c_{n-1}}{c_n} \cdot \frac{c_{n-2}}{c_{n-1}} \cdots \frac{c_1}{c_2} \cdot \frac{c_0}{c_1} . \]

Use this inequality to show that $\binom{2n}{n}/4^n \to 0$.

(d) Show that the sum $\sum \binom{2n}{n} x^n$ converges when $x = -R$. (Fun fact we will get to later: it converges to the value of $\frac{1}{\sqrt{1-4x}}$ at $x = -R$, which is $\frac{1}{\sqrt{2}}$.)

(e) What can you say about convergence when $x = R$?


6 Function limits and continuity

6.1 Function limits

So far we have only talked about limits for sequences. Now we step it up to functions. Pretty quickly, though, we will see that we can relate function limits to sequence limits. The point at which we consider the limit does not even need to be in the domain of the function $f$. It is important also to notice that $x$ is not allowed to equal $x_0$ below.

Definition 6.1.1. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces and $E \subset X$ with $x_0$ a limit point of $E$. If $f : E \to Y$ then we write

\[ \lim_{x\to x_0} f(x) = L \]

if for each $\varepsilon > 0$ there exists $\delta > 0$ such that whenever $x \in E$ and $0 < d_X(x, x_0) < \delta$, it follows that $d_Y(f(x), L) < \varepsilon$.

Why is it important that $x_0$ be allowed to be only a limit point of $E$ (and therefore not necessarily in $E$)? Consider $f : (0, \infty) \to \mathbb{R}$ defined by $f(x) = \frac{\sin x}{x}$. Then $f$ is not defined at 0 but we know from calculus that it has a limit of 1 as $x \to 0$.

Here we are using $\varepsilon$ as a measure of “closeness” just as before. We can imagine a dialogue similar to what occurred for sequences: you say that $f(x)$ approaches $L$ as $x$ approaches $x_0$. I say “Well, can you get $f(x)$ within .005 of $L$ as long as $x$ is close to $x_0$?” You say yes and produce a value of $\delta = .01$. You then qualify this by saying, “As long as $x$ is within $\delta = .01$ of $x_0$ then $f(x)$ will be within .005 of $L$.” This goes on and on, and if each time I give you an $\varepsilon > 0$ you manage to produce a corresponding $\delta > 0$, then we say the limit equals $L$.

As promised, there is an equivalent formulation of limits using sequences. Note that the statement below must hold for all sequences $(x_n)$ in $E$ with $x_n \to x_0$ but $x_n \ne x_0$ for all $n$.

Proposition 6.1.2. Let $f : E \to Y$ and $x_0$ a limit point of $E$. We have $\lim_{x\to x_0} f(x) = L$ if and only if for each sequence $(x_n)$ in $E$ such that $x_n \to x_0$ with $x_n \ne x_0$ for all $n$, it follows that $f(x_n) \to L$.

Proof. Suppose first that $\lim_{x\to x_0} f(x) = L$ and let $(x_n)$ be a sequence in $E$ such that $x_n \to x_0$ and $x_n \ne x_0$ for all $n$. We must show that $f(x_n) \to L$. So, let $\varepsilon > 0$ and choose $\delta > 0$ such that whenever $x \in E$ and $0 < d_X(x, x_0) < \delta$, we have $d_Y(f(x), L) < \varepsilon$. Now since $x_n \to x_0$ we can pick $N \in \mathbb{N}$ such that if $n \ge N$ then $d_X(x_n, x_0) < \delta$. For this $N$, if $n \ge N$ then

\[ 0 < d_X(x_n, x_0) < \delta, \text{ so } d_Y(f(x_n), L) < \varepsilon , \]

and it follows that $f(x_n) \to L$.

Suppose conversely that $f(x_n) \to L$ for all sequences $(x_n)$ in $E$ such that $x_n \to x_0$ and $x_n \ne x_0$ for all $n$. By way of contradiction, assume that $\lim_{x\to x_0} f(x) = L$ does not hold. So there must be at least one $\varepsilon > 0$ such that for every $\delta > 0$ there is an $x_\delta \in E$ with $0 < d_X(x_\delta, x_0) < \delta$ but $d_Y(f(x_\delta), L) \ge \varepsilon$. So create a sequence of these, using $\delta = 1/n$. In other words, for each $n \in \mathbb{N}$, pick $x_n \in E \setminus \{x_0\}$ such that $0 < d_X(x_n, x_0) < 1/n$ but $d_Y(f(x_n), L) \ge \varepsilon$. (This is possible in part because $x_0$ is a limit point of $E$.) Then clearly $x_n \to x_0$ with $x_n \ne x_0$ for all $n$ but we cannot have $f(x_n) \to L$. This is a contradiction.


One nice thing about the sequence formulation is that it allows us to immediately bring over theorems about convergence for sequences. For instance,

Proposition 6.1.3. Let $E \subset \mathbb{R}$ and $x_0$ a limit point of $E$. Let $f, g : E \to \mathbb{R}$ (with the standard metric) and $a \in \mathbb{R}$. If $\lim_{x\to x_0} f(x) = L$ and $\lim_{x\to x_0} g(x) = M$ exist, then

1. $\lim_{x\to x_0} a f(x) = aL$.

2. $\lim_{x\to x_0} (f(x) + g(x)) = L + M$.

3. $\lim_{x\to x_0} f(x)g(x) = LM$.

4. If $M \ne 0$ then

\[ \lim_{x\to x_0} \frac{f(x)}{g(x)} = \frac{L}{M} . \]

6.2 Continuity

We now give the definition of continuity. Note that $x_0$ must be an element of $E$, since $f$ needs to be defined there.

Definition 6.2.1. If $E \subset X$ and $f : E \to Y$ with $x_0 \in E$ then we say $f$ is continuous at $x_0$ if

\[ \lim_{x\to x_0} f(x) = f(x_0) . \]

We say $f$ is continuous on $E$ if $f$ is continuous at every $x_0 \in E$.

Here is the equivalent definition in terms of $\delta, \varepsilon$. The function $f$ is continuous at $x_0$ if for each $\varepsilon > 0$ there is $\delta > 0$ such that if $x \in E$ satisfies $d_X(x, x_0) < \delta$ then $d_Y(f(x), f(x_0)) < \varepsilon$. Note here that we are not restricting $0 < d_X(x, x_0)$ since we trivially have $d_Y(f(x_0), f(x_0)) < \varepsilon$ for all $\varepsilon > 0$. This caveat (or lack thereof) carries over to the corollary:

Corollary 6.2.2. The function $f$ is continuous at $x_0 \in E$ if and only if for each sequence $(x_n)$ in $E$ with $x_n \to x_0$ we have $f(x_n) \to f(x_0)$.

Proof. This is just a consequence of the sequence characterization of limits from the last section (Proposition 6.1.2).

There is yet another equivalent definition in terms of only open sets. This one is valid for functions continuous on all of $X$ (there is a more technical one for continuity at a point, but we will not get into that). To extend the theorem to functions that are continuous on subsets $E$ of $X$, one would need to talk about sets that are open in $E$.

Theorem 6.2.3. If $f : X \to Y$ then $f$ is continuous on $X$ if and only if for each open set $O \subset Y$, the preimage

\[ f^{-1}(O) = \{x \in X : f(x) \in O\} \]

is open in $X$.


Proof. Suppose that $f$ is continuous on $X$ and let $O \subset Y$ be open. We want to show that $f^{-1}(O)$ is open. So choose $x_0 \in f^{-1}(O)$. Since $f(x_0) \in O$ (by definition) and $O$ is open we can find $\varepsilon > 0$ such that $B_\varepsilon(f(x_0)) \subset O$. However $f$ is continuous at $x_0$ so there exists a corresponding $\delta > 0$ such that if $x \in X$ with $d_X(x, x_0) < \delta$ then $d_Y(f(x), f(x_0)) < \varepsilon$. So if $x \in B_\delta(x_0)$ then $f(x) \in B_\varepsilon(f(x_0))$. As $B_\varepsilon(f(x_0))$ was chosen to be a subset of $O$, we find

\[ \text{if } x \in B_\delta(x_0) \text{ then } f(x) \in O , \]

or $B_\delta(x_0) \subset f^{-1}(O)$. This means $x_0$ is an interior point of $f^{-1}(O)$ and this set is open.

Suppose that for each open $O \subset Y$ the set $f^{-1}(O)$ is open in $X$. To show $f$ is continuous on $X$ we must show that $f$ is continuous at each $x_0 \in X$. So let $x_0 \in X$ and $\varepsilon > 0$. The set $B_\varepsilon(f(x_0))$ is open in $Y$, so $f^{-1}(B_\varepsilon(f(x_0)))$ is open in $X$. Because $x_0$ is an element of this set (note that $f(x_0) \in B_\varepsilon(f(x_0))$) it must be an interior point, so there is a $\delta > 0$ such that $B_\delta(x_0) \subset f^{-1}(B_\varepsilon(f(x_0)))$. Now if $d_X(x, x_0) < \delta$ then $d_Y(f(x), f(x_0)) < \varepsilon$, so $f$ is continuous at $x_0$.

It is difficult to get intuition about this definition, but let us give an example to illustrate how it may work. Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by

\[ f(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x \ne 0. \end{cases} \]

We know from calculus that $f$ is not continuous because it is not continuous at 0:

\[ \lim_{x\to 0} f(x) = 0 \ne 1 = f(0) . \]

To see this in terms of the other definition, look at the open set $(1/2, 3/2)$. Then

\[ f^{-1}((1/2, 3/2)) = \{0\} , \]

which is not open. This only proves, however, that $f$ is not continuous everywhere.

Corollary 6.2.4. $f$ is continuous on $X$ if and only if for each closed $C \subset Y$, the set $f^{-1}(C)$ is closed in $X$.

Proof. $f$ is continuous on $X$ if and only if for each open $O \subset Y$, the set $f^{-1}(O)$ is open in $X$. If $C \subset Y$ is closed then $C^c$ is open in $Y$. Therefore

\[ f^{-1}(C) = \left(f^{-1}(C^c)\right)^c \text{ is closed in } X . \]

To check this equality, we have $x \in f^{-1}(C)$ iff $f(x) \in C$ iff $f(x) \notin C^c$ iff $x \in (f^{-1}(C^c))^c$.

The other direction is similar. If $f^{-1}(C)$ is closed in $X$ whenever $C$ is closed in $Y$, let $O$ be an open set in $Y$. Then $f^{-1}(O^c)$ is closed in $X$, giving

\[ f^{-1}(O) = \left(f^{-1}(O^c)\right)^c \text{ open in } X . \]


Examples.

1. The simplest. Take $f : X \to X$ as $f(x) = x$. Then for each open $O \subset X$, $f^{-1}(O) = O$ is open in $X$. So $f$ is continuous on $X$.

2. Let $f : \mathbb{R} \to \mathbb{R}$ be

\[ f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \notin \mathbb{Q}. \end{cases} \]

This function is continuous nowhere. If $x \in \mathbb{R}$ then suppose first $x$ is rational. Choose a sequence of irrationals $(x_n)$ converging to $x$ (this is possible by the fact that $\mathbb{R} \setminus \mathbb{Q}$ is dense in $\mathbb{R}$, from the homework). Then $\lim_{n\to\infty} f(x_n) = 0 \ne 1 = f(x)$. A similar argument holds for irrational $x$ and gives that $f$ is continuous nowhere.

Note that this conclusion cannot be obtained by showing that some open set $O$ has $f^{-1}(O)$ not open. (Take for instance $O = (-1/2, 1/2)$.) This would only prove that $f$ is not continuous everywhere.

3. The last function was discontinuous at the rationals and irrationals. This one is a nasty function that will be discontinuous only at the rationals. For any $q \in \mathbb{Q}$ write $q \sim (m, n)$ if $m/n$ is the “lowest terms” representation of $q$; that is, if $m, n$ are the unique numbers with $m \in \mathbb{Z}$, $n \in \mathbb{N}$ and $m, n$ have no common prime factors. Then define $f : \mathbb{R} \to \mathbb{R}$ by

\[ f(x) = \begin{cases} \frac{1}{n} & \text{if } x \in \mathbb{Q} \text{ and } x \sim (m, n) \text{ for some } m \in \mathbb{Z} \\ 0 & \text{if } x \notin \mathbb{Q}. \end{cases} \]

It is not hard to see that $f$ is discontinuous at rationals. Indeed, if $x$ is rational then $f(x) > 0$ but we can choose a sequence of irrationals $(x_n)$ such that $x_n \to x$, giving $0 = \lim_{n\to\infty} f(x_n)$.

On the other hand it is a bit more difficult to show that $f$ is continuous at the irrationals. Let $x \in \mathbb{R} \setminus \mathbb{Q}$ and let $\varepsilon > 0$. Choose $N \in \mathbb{N}$ such that $1/N < \varepsilon$. Consider the rational numbers in the interval $(x - 1, x + 1)$. There are only finitely many rationals in this interval that are represented as $(m, 1)$ for some $m$. There are also only finitely many in this interval represented as $(m, 2)$ for some $m$. Continuing, the set of rational numbers $X_N$ in this interval that are represented as $(m, n)$ for some $n \le N$ is finite and is therefore a closed set. Since $x \in X_N^c$ we can then find $\delta > 0$ such that $B_\delta(x) \subset X_N^c$; we may also take $\delta \le 1$, so that $B_\delta(x) \subset (x - 1, x + 1)$.

We claim that if $|y - x| < \delta$ then $|f(x) - f(y)| < \varepsilon$. To prove this, consider first $y$ irrational. Then $|f(x) - f(y)| = |0 - 0| = 0 < \varepsilon$. Next if $y$ is rational in $B_\delta(x)$ then $y \in X_N^c$ and so the representation of $y$ is $(m, n)$ with $n > N$. It follows that

\[ |f(x) - f(y)| = |0 - f(y)| = f(y) = \frac{1}{n} \le \frac{1}{N} < \varepsilon . \]

4. The last function was discontinuous exactly at the rationals. We will see in the homework that there is no function that is discontinuous exactly at the irrationals.
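The function in example 3 and the choice of $\delta$ in the continuity argument can be explored in code. The sketch below is an added illustration with hypothetical names (`thomae`, `delta_for`); since floating-point numbers are themselves rational, `math.sqrt(2)` only stands in for an irrational point.

```python
import math
from fractions import Fraction


def thomae(q):
    """f(q) = 1/n for a rational q = m/n; Fraction reduces to lowest terms."""
    return Fraction(1, Fraction(q).denominator)


def delta_for(x, N):
    """Distance from x to the nearest m/n with n <= N near (x - 1, x + 1)."""
    distances = []
    for n in range(1, N + 1):
        lo = math.floor((x - 1) * n)
        hi = math.ceil((x + 1) * n)
        for m in range(lo, hi + 1):
            distances.append(abs(x - m / n))
    return min(min(distances), 1.0)


x = math.sqrt(2)
print(thomae(Fraction(6, 8)), delta_for(x, 10))
```

Any rational within `delta_for(x, N)` of `x` has denominator larger than `N`, so the function value there is below `1/N`, which mirrors the $\varepsilon$, $\delta$ argument above.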


We saw last time that $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x$ is continuous everywhere. We will use this along with the following proposition to show that polynomials are also continuous.

Proposition 6.2.5. Let $X$ be a metric space and $f, g : X \to \mathbb{R}$ be continuous at $x_0 \in X$. If $a \in \mathbb{R}$ then the following functions are continuous at $x_0$:

1. $f + g$, $af$, and $fg$,

2. $f/g$ as long as $g(x_0) \ne 0$.

Proof. These follow using the limit properties from before; for example,

\[ \lim_{x\to x_0} (f + g)(x) = \lim_{x\to x_0} f(x) + \lim_{x\to x_0} g(x) = f(x_0) + g(x_0) = (f + g)(x_0) , \]

giving that $f + g$ is continuous at $x_0$. The others follow similarly.

The proposition implies:

• Any polynomial function is continuous on all of $\mathbb{R}$. That is, if $f(x) = a_n x^n + \cdots + a_1 x + a_0$ then $f$ is continuous on $\mathbb{R}$.

• Every rational function is continuous at each point for which the denominator is nonzero. That is, if $f(x) = g(x)/h(x)$, where $g$ and $h$ are polynomial functions, then $f$ is continuous at $x_0$ if and only if $h(x_0) \ne 0$.

Another way to build continuous functions is through composition.

Theorem 6.2.6. Let $X, Y, Z$ be metric spaces and $f : X \to Y$, $g : Y \to Z$ functions. If $f$ is continuous at $x_0 \in X$ and $g$ is continuous at $f(x_0) \in Y$ then $g \circ f$ is continuous at $x_0$.

Proof. Let $\varepsilon > 0$. Since $g$ is continuous at $f(x_0)$, we can choose $\delta' > 0$ such that if $d_Y(y, f(x_0)) < \delta'$ then $d_Z(g(y), g(f(x_0))) < \varepsilon$. Since $f$ is continuous at $x_0$, we can choose $\delta > 0$ such that if $d_X(x, x_0) < \delta$ then $d_Y(f(x), f(x_0)) < \delta'$. Putting these together, if $d_X(x, x_0) < \delta$ then $d_Y(f(x), f(x_0)) < \delta'$, giving $d_Z(g(f(x)), g(f(x_0))) < \varepsilon$. This means $d_Z((g \circ f)(x), (g \circ f)(x_0)) < \varepsilon$ and $g \circ f$ is continuous at $x_0$.

6.3 Relations between continuity and compactness

Continuous functions and compact sets work well together. The first basic theorem is:

Theorem 6.3.1. If $f : X \to Y$ is continuous and $E \subset X$ is compact then the image $f(E) = \{f(x) : x \in E\}$ is compact.

Proof. We need to show that any open cover of $f(E)$ can be reduced to a finite subcover, so let $\mathcal{C}$ be an open cover of $f(E)$. Define a collection $\mathcal{C}'$ of sets in $X$ by

\[ \mathcal{C}' = \{f^{-1}(O) : O \in \mathcal{C}\} . \]

Because $f$ is continuous, each set in $\mathcal{C}'$ is open. Furthermore $\mathcal{C}'$ covers $E$, as every point $x \in E$ is mapped to an element of $f(E)$, which is covered by some $O \in \mathcal{C}$. This means that $\mathcal{C}'$ is an open cover of $E$, and compactness of $E$ allows us to reduce it to a finite subcover $\{f^{-1}(O_1), \ldots, f^{-1}(O_n)\}$. We claim that $\{O_1, \ldots, O_n\}$ is a finite subcover of $f(E)$. To show this, let $y \in f(E)$ so that there exists some $x \in E$ with $f(x) = y$. There exists $k$ with $1 \le k \le n$ such that $x \in f^{-1}(O_k)$ and therefore $y = f(x) \in O_k$.

This theorem has many consequences.

Corollary 6.3.2. Let $f : X \to Y$ be continuous. If $E \subset X$ is compact then $f$ is bounded on $E$. That is, there exist $y \in Y$ and $R > 0$ such that $d_Y(y, f(x)) \le R$ for all $x \in E$.

Proof. From the theorem, the set $f(E)$ is compact and therefore bounded.

The next is for continuous functions to R.

Corollary 6.3.3 (Extreme value theorem). Let $f : X \to \mathbb{R}$ be continuous and $E \subset X$ compact and nonempty. Then $f$ takes a maximum on $E$; that is, there exists $x_0 \in E$ such that

\[ f(x_0) \ge f(x) \text{ for all } x \in E . \]

A similar statement holds for a minimum.

Proof. The set $f(E)$ is closed and bounded, so it contains its supremum, $y$. Since $y \in f(E)$ there exists $x_0 \in E$ such that $f(x_0) = y$. Then $f(x_0) \ge f(x)$ for all $x \in E$.

Continuous functions on compact sets actually satisfy a property that is stronger than continuity. To explain this, consider the function $f : (0, \infty) \to \mathbb{R}$ given by $f(x) = 1/x$. When we study continuity, what we are really interested in is how much a small change in $x$ will change the value of $f(x)$. (Recall continuity says that if we change $x$ by at most $\delta$ then $f(x)$ will change by at most $\varepsilon$.) Consider the effect of changing $x$ by a fixed amount, say .1, for different values of $x$. If $x$ is large, like 100, then changing $x$ by .1 can change $f$ so that it lies anywhere in the interval $(1/100.1, 1/99.9)$. If $x$ is small, like .15, then this same change in $x$ changes $f$ to lie in the interval $(1/.25, 1/.05) = (4, 20)$. This is a much larger interval, meaning that $f$ is more unstable to changes when $x$ is small compared to when $x$ is large.
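The comparison for $f(x) = 1/x$ can be made concrete with a few lines of code. The sketch below (an added illustration with hypothetical helper names) computes the interval of values of $1/t$ for $t$ within $h$ of $x$, assuming $0 < h < x$.

```python
def image_interval(x, h):
    """Endpoints of the set of values 1/t for t in (x - h, x + h), 0 < h < x."""
    return (1 / (x + h), 1 / (x - h))


def width(x, h):
    """Length of the interval returned by image_interval."""
    lo, hi = image_interval(x, h)
    return hi - lo


for x in (100, 0.15):
    print(x, image_interval(x, 0.1), width(x, 0.1))
```

With the same change of .1 in $x$, the spread of values is tiny near $x = 100$ and large near $x = .15$, which is exactly the non-uniformity described above.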

This motivates the idea of uniform continuity. For a uniformly continuous function, the measure of stability described above is uniform on the whole set. That is, there is an upper bound to how unstable the function is to changes. This corresponds to a uniform $\delta > 0$ over all $x$ for a given $\varepsilon > 0$:

Definition 6.3.4. A function $f : X \to Y$ is uniformly continuous if given $\varepsilon > 0$ there exists $\delta > 0$ such that if $x, y \in X$ satisfy $d_X(x, y) < \delta$ then $d_Y(f(x), f(y)) < \varepsilon$.

Theorem 6.3.5. If $f : X \to Y$ is continuous and $X$ is compact then $f$ is uniformly continuous.


Proof. The idea of the proof is as follows. Since $f$ is continuous at each $x$, given $\varepsilon > 0$ we can find a $\delta_x > 0$ from the definition of continuity that “works” at $x$. The balls of radius $\delta_x/2$ cover $X$ and by compactness we can find finitely many points $x_i$ such that the corresponding balls still cover $X$. Taking the minimum of the numbers $\delta_{x_i}/2$ will give us the required (positive) $\delta$.

Let $\varepsilon > 0$. For each $x \in X$, since $f$ is continuous at $x$, we can find $\delta_x > 0$ such that if $d_X(x, x') < \delta_x$ then $d_Y(f(x), f(x')) < \varepsilon/2$. The collection

\[ \{B_{\delta_x/2}(x) : x \in X\} \]

is an open cover for $X$, so since $X$ is compact, we can find $x_1, \ldots, x_n \in X$ such that $X \subset \cup_i B_{\delta_{x_i}/2}(x_i)$. Write $\delta_i = \delta_{x_i}$ and let

\[ \delta = \min\{\delta_1/2, \ldots, \delta_n/2\} ; \]

we claim that if $x, y \in X$ satisfy $d_X(x, y) < \delta$ then $d_Y(f(x), f(y)) < \varepsilon$. To prove this, pick such $x$ and $y$. We can then find $i$ such that $d_X(x_i, x) < \delta_i/2$. By the triangle inequality we then have

\[ d_X(x_i, y) \le d_X(x_i, x) + d_X(x, y) < \delta_i/2 + \delta \le \delta_i . \]

This means by definition of $\delta_i$ that

\[ d_Y(f(x), f(y)) \le d_Y(f(x), f(x_i)) + d_Y(f(y), f(x_i)) < \varepsilon/2 + \varepsilon/2 = \varepsilon . \]

Examples.

1. Not every continuous function on a non-compact set is uniformly continuous. If $E$ is any non-closed subset of $\mathbb{R}$ then there exists a continuous function on $E$ that is both unbounded and not uniformly continuous. Take $x_0$ to be any limit point of $E$ that is not in $E$. Then $f(x) = (x - x_0)^{-1}$ is continuous but unbounded. Further $f$ is not uniformly continuous because there is no $\delta > 0$ such that for all $x, y \in E$ with $|x - y| < \delta$ we have $|f(x) - f(y)| < 1$. If there were, we could just choose some $y \in E$ with $|y - x_0| < \delta/2$ and then deduce that all points $z \in E$ within distance $\delta$ of $y$ have $|f(z)| \le |f(y)| + 1$. But this is impossible, since such $z$ can be taken arbitrarily close to $x_0$, where $|f|$ is unbounded.

2. If $E$ is an unbounded subset of $\mathbb{R}$ then there is an unbounded continuous function on $E$: just take $f(x) = x$.

3. The only polynomials that are uniformly continuous on all of $\mathbb{R}$ are those of degree at most 1. Indeed, take $f(x) = a_n x^n + \cdots + a_1 x + a_0$ with $a_n \ne 0$ and $n \ge 2$ and assume that there exists $\delta > 0$ such that if $|x - y| < \delta$ then $|f(x) - f(y)| < 1$. Then consider points $x, y$ of the form $x, x + \delta/2$: you can check that

\[ \{|f(x) - f(x + \delta/2)| : x \in \mathbb{R}\} \]

is unbounded, giving a contradiction. (The problem here is that for fixed $\delta$, the quantity $|f(x) - f(x + \delta/2)|$ grows to infinity as $x \to \infty$. This is not the case if $n = 0$ or 1.)
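The growth of $|f(x) - f(x + \delta/2)|$ in example 3 is easy to observe for $f(x) = x^2$, and can be contrasted with a degree-one polynomial. This is an added floating-point sketch; the helper name `gap` and the sample functions are choices made here.

```python
def gap(f, x, delta):
    """|f(x) - f(x + delta/2)| for a fixed step delta/2."""
    return abs(f(x) - f(x + delta / 2))


def square(x):
    return x * x


def affine(x):
    return 3 * x + 1


delta = 0.01
for x in (1.0, 1e2, 1e4, 1e6):
    print(x, gap(square, x, delta), gap(affine, x, delta))
```

For the square the gap is $\delta x + \delta^2/4$, which grows without bound in $x$; for the affine function it stays at $3\delta/2$.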

There are other ways that functions can fail to be uniformly continuous. We will see later,however, that any differentiable function with bounded derivative is uniformly continuous.


6.4 Connectedness and the IVT

We would like to prove the intermediate value theorem from calculus and the simplest way to do this is to see that it is a consequence of a certain property of intervals in $\mathbb{R}$. Specifically, an interval is connected. The definition of connectedness is somewhat strange so we will try to motivate it. Instead of trying to envision what connectedness is, we will try to capture what it is not. That is, we want to call a metric space disconnected if we can write it as a union of two sets that do not intersect. There is a problem with this attempt at a definition, as we can see by considering $\mathbb{R}$. Certainly we can write it as $(-\infty, 1/2) \cup [1/2, \infty)$ and these sets do not intersect, but we still want to say that $\mathbb{R}$ is connected. The issue in this example is that the sets are not separated enough from each other. That is, one set contains limit points of the other. This problem is actually resolved if we require that both sets are open. (But you have to think about how this resolves the issue.)

Definition 6.4.1. A metric space $X$ is disconnected if there exist non-empty open sets $O_1$ and $O_2$ in $X$ such that $X = O_1 \cup O_2$ but $O_1 \cap O_2 = \emptyset$. If $X$ is not disconnected we say it is connected.

Connectedness and continuity also go well with each other.

Theorem 6.4.2. Let $X, Y$ be metric spaces and $f : X \to Y$ be continuous. If $X$ is connected then the image set $f(X)$, viewed as a metric space itself, is connected.

Proof. As stated above, we view $f(X) \subset Y$ as a metric space itself, using the metric it inherits from $Y$. To show that $f(X)$ is a connected space we will assume it is disconnected and obtain a contradiction. So assume that we can write $f(X) = O_1 \cup O_2$ with $O_1$ and $O_2$ nonempty, disjoint, and open (in the space $f(X)$). We will produce from this a disconnection of $X$ and obtain a contradiction.

Now consider $U_1 = f^{-1}(O_1)$ and $U_2 = f^{-1}(O_2)$. These are open sets in $X$ since $f$ is continuous. Their union is $X$, because every $f(x)$ lies in $f(X) = O_1 \cup O_2$. Further they do not intersect: if $x$ is in their intersection, then $f(x) \in O_1 \cap O_2$, which is empty. Last, they are nonempty because, for example, if $y \in O_1$ (which is nonempty by assumption) then because $O_1 \subset f(X)$, there exists $x \in X$ such that $f(x) = y$. This $x$ is in $f^{-1}(O_1)$.

So we find that $X$ is disconnected, a contradiction. This means $f(X)$ must have been connected.

• Let $X$ be a discrete metric space. If $X$ consists of at least two points then $X$ is disconnected. This is because we can let $O_1 = \{x\}$ for some $x \in X$ and $O_2 = O_1^c$. All subsets of $X$ are open, so these are open, disjoint, nonempty sets whose union is $X$.

• Every interval in $\mathbb{R}$ is connected. You will prove this in exercise 7.

Theorem 6.4.3 (Intermediate value theorem). Let $f : [a, b] \to \mathbb{R}$ for $a < b$ be continuous. Suppose that for some $L \in \mathbb{R}$,

\[ f(a) < L < f(b) . \]

Then there exists $c \in (a, b)$ such that $f(c) = L$.


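Proof. Since $f$ is continuous and $[a, b]$ is connected, the space $f([a, b])$ is connected. Since

\[ O_1 := (-\infty, L) \cap f([a, b]) \quad \text{and} \quad O_2 := (L, \infty) \cap f([a, b]) \]

are both nonempty (because $f(a) \in O_1$ and $f(b) \in O_2$), open in $f([a, b])$, and disjoint, it cannot be that their union is equal to $f([a, b])$. Therefore $L \in f([a, b])$ and there exists $c \in [a, b]$ with $f(c) = L$. Since $f(a) \ne L$ and $f(b) \ne L$, in fact $c \in (a, b)$.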
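The proof above is an existence argument and does not say how to find $c$. A standard way to approximate such a point numerically is bisection, which is a different technique from the connectedness argument and is not taken from these notes. The sketch below assumes $f$ is continuous with $f(a) < L < f(b)$; the function name and tolerance are illustrative.

```python
def bisect_for_value(f, a, b, L, tol=1e-12):
    """Approximate c in (a, b) with f(c) = L, assuming f(a) < L < f(b)."""
    if not f(a) < L < f(b):
        raise ValueError("need f(a) < L < f(b)")
    lo, hi = a, b
    # Invariant: f(lo) < L <= f(hi), so a solution lies in [lo, hi].
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < L:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


c = bisect_for_value(lambda x: x ** 3 + x, 0.0, 2.0, 5.0)
print(c, c ** 3 + c)
```

Each step halves an interval on which $f$ passes from below $L$ to at least $L$, so the intermediate value theorem guarantees that a solution stays inside it.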

6.5 Discontinuities

Let us spend a couple of minutes on types of discontinuities for real functions. Let E ⊂ R, f : E → R and x0 ∈ E. (Draw some pictures.)

• x0 is a removable discontinuity of f if $\lim_{x \to x_0} f(x)$ exists but is not equal to f(x0).

• x0 is a simple discontinuity of f if $\lim_{x \to x_0^-} f(x)$ exists, as does $\lim_{x \to x_0^+} f(x)$, but they are not equal. Here the first limit is a left limit; that is, we are considering f as being defined on the metric space E ∩ (−∞, x0] and taking the limit in this space. The second is a right limit, and we consider the space as E ∩ [x0, ∞). This corresponds to saying, for example, that

\[ \lim_{x \to x_0^-} f(x) = L \]

if for each ε > 0 there exists δ > 0 such that if x ∈ E and x0 − δ < x < x0 then |f(x) − L| < ε.

• x0 can be a discontinuity that is not captured above. Consider f : R → R given by

\[ f(x) = \begin{cases} \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 . \end{cases} \]

Here there is not even a limit as x → 0. This is because we can find a sequence (xn) converging to 0 such that (f(xn)) does not have a limit. Take

xn = 2/(nπ) ,

for which f(xn) = sin(nπ/2) cycles through the values 1, 0, −1, 0.
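The behavior along this sequence can be checked numerically. The following is a small Python sketch, assuming the standard-library `math.sin` as a stand-in for the sine function (which these notes only define rigorously later); the list name `values` and the rounding to six decimals are illustrative choices.

```python
import math


def f(x):
    # The function from the example: sin(1/x) away from 0, and 0 at 0.
    return math.sin(1.0 / x) if x != 0 else 0.0


# Evaluate f along x_n = 2/(n*pi) for n = 1, ..., 8.
# Rounding hides floating-point noise of size about 1e-16.
values = [round(f(2.0 / (n * math.pi)), 6) for n in range(1, 9)]
```

The entries of `values` repeat the pattern 1, 0, −1, 0, so (f(xn)) has several distinct subsequential limits and cannot converge.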

6.6 Exercises

1. Let f : [a, b] → R be continuous with f(x) > 0 for all x ∈ [a, b]. Show there exists δ > 0 such that f(x) ≥ δ for all x ∈ [a, b].

2. Determine if the following functions are continuous at x = 0. Prove your answer. (You may use standard facts about trigonometric functions although we have not introduced them rigorously.)

(a)

\[ f(x) = \begin{cases} x \cos\frac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 . \end{cases} \]

(b)

\[ g(x) = \begin{cases} \sin\frac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 . \end{cases} \]

3. (a) Let f, g : R → R be continuous. Show that h : R → R is continuous, where h is given by

h(x) = max{f(x), g(x)} .

(b) Let C be a set of continuous functions from R to R. For each x, assume that {f(x) : f ∈ C} is bounded above and define F : R → R by

F (x) = sup{f(x) : f ∈ C} .

Must F be continuous?

4. In this problem we will show that there is no real-valued function that is continuous exactly at the rationals. Fix any f : R → R.

(a) Show that for each n ∈ N, the set An is open, where

\[ A_n = \left\{ x : \exists \delta > 0 \text{ such that } |f(z) - f(y)| < \tfrac{1}{n} \text{ for all } y, z \in (x - \delta, x + \delta) \right\} . \]

(b) Prove that the set of points at which f is continuous is equal to ∩n∈N An.

(c) Prove that ∩n∈N An cannot equal Q.

Hint. Argue by contradiction and enumerate the rationals as {q1, q2, . . .}. Define Bn = An \ {qn} and obtain a contradiction using exercise 5 of Chapter 3.

5. Find metric spaces X, Y, a continuous function f : X → Y, and a Cauchy sequence {xn} in X such that {f(xn)} is not Cauchy in Y.

6. Read the last section of Chapter 4 in Rudin on limits at infinity. Prove that the function f : (0, ∞) → R given by f(x) = 1/x has $\lim_{x \to \infty} f(x) = 0$.

7. Prove that any interval I ⊂ R is connected.

Hint. Consider I as a metric space with the standard metric from R. Suppose that I = O1 ∪ O2 where O1 ∩ O2 = ∅ and both Oi's are nonempty and open in I. Then there must be a point x1 ∈ O1 and a point x2 ∈ O2. Suppose that x1 < x2 and define

I1 = {r ≥ x1 : [x1, r] ⊂ O1} .

What can you say about sup I1?

8. Let f : (a, b) → R be uniformly continuous. Prove that f has a unique continuous extension to [a, b). That is, there is a unique g : [a, b) → R which is continuous and agrees with f everywhere on (a, b). Show by example that it is not enough to assume f is only continuous, or even both continuous and bounded.

9. Show that the function f given by f(x) = 1/x is uniformly continuous on [1, ∞).

10. Show that the function f given by f(x) = √x is uniformly continuous on [0, ∞).

Hint. Use the fact that for a, b ≥ 0 we have $\sqrt{a} + \sqrt{b} \geq \sqrt{a + b}$.

11. Show that the function f given by f(x) = sin(1/x) is not uniformly continuous on (0, 1).

12. Suppose that f : [0, ∞) → R is continuous and has a finite limit $\lim_{x \to \infty} f(x)$. Show that f is uniformly continuous.

13. Give an example of functions f, g : [0, ∞) → R that are uniformly continuous but the product fg is not.

14. Let f : R → R be continuous with f(f(x)) = x for all x. Show there exists c ∈ R such that f(c) = c.

15. Let p be a polynomial with real coefficients and odd degree. That is,

\[ p(x) = a_n x^n + \cdots + a_1 x + a_0 \quad \text{with } a_n \neq 0 \text{ and } n \text{ odd} . \]

(a) Show there exists c such that p(c) = 0.

(b) Let L ∈ R. Show there exists c such that p(c) = L.

16. If E ⊂ R then a function f : E → R is called Lipschitz if there exists M > 0 such that

|f(x)− f(y)| ≤M |x− y| for all x, y ∈ E .

The smallest number M such that the above inequality holds for all x, y ∈ E is called the Lipschitz constant for f.

(a) Show that if f : E → R is Lipschitz then it is uniformly continuous. Does the converse hold?

(b) Show that the function f : R → R given by $f(x) = \sqrt{x^2 + 4}$ is Lipschitz on R. What is the Lipschitz constant?

(c) Is f : [0, ∞) → R given by f(x) = √x Lipschitz?

17. Let I be a closed interval. Let f : I → I and assume that f is Lipschitz with Lipschitz constant A < 1.

(a) Prove that there is a unique y ∈ I with the following property. Choose x1 ∈ I and define xn+1 = f(xn) for all n ∈ N. Then xn → y. This holds independently of the choice of x1.

(b) Show by counterexample that for (a) to work, we need I to be closed.

(c) Choose a1, a2, . . . , ak ∈ Q with ai > 0 for all i and with a1 · · · ak > 1. Starting from any x1 > 0, define a sequence {xn} by the continued fraction

\[ x_n = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\cdots + \cfrac{1}{a_k + x_{n-1}}}}} . \]

Prove that {xn} converges. Prove that its limit is the root of a quadratic polynomial with coefficients in Q. In older books this is stated: an infinite periodic continued fraction is a quadratic surd. “The devil is the eternal surd in the universal mathematic.” – C. S. Lewis, Perelandra.

18. Let f : I → R for some (open or closed) interval I ⊂ R. We say that f is convex if for all x, y ∈ I and λ ∈ [0, 1],

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) .

(a) Reformulate the above condition in terms of a relation between the graph of f and certain line segments.

(b) Suppose that f : R → R is convex and let x < z < y. Choose λ = (y − z)/(y − x) to show that

\[ \frac{f(z) - f(x)}{z - x} \leq \frac{f(y) - f(x)}{y - x} . \]

Interpret this inequality in terms of the graph of f. Argue similarly to show that

\[ \frac{f(y) - f(x)}{y - x} \leq \frac{f(y) - f(z)}{y - z} . \]

Combine these two to get

\[ \frac{f(z) - f(x)}{z - x} \leq \frac{f(y) - f(z)}{y - z} \]

and interpret this inequality in terms of the graph of f.

(c) Suppose that f : [a, b] → R is convex. Show that f is continuous on (a, b).

Hint. Let [c, d] be a subinterval of (a, b). Use the last inequality from (b) to show that f is Lipschitz on [c, d] with Lipschitz constant bounded above by

\[ \max\left\{ \frac{|f(c) - f(a)|}{|c - a|} , \frac{|f(b) - f(d)|}{|b - d|} \right\} . \]

19. Suppose that f : R→ R is continuous and satisfies

f(x+ y) = f(x) + f(y) for all x, y ∈ R .

(a) Show that there exists c ∈ R such that for all x ∈ Z, f(x) = cx.

(b) Show that there exists c ∈ R such that for all x ∈ Q, f(x) = cx.

(c) Show that there exists c ∈ R such that for all x ∈ R, f(x) = cx.

7 Derivatives

7.1 Introduction

Continuous functions are nicer than most functions. However we have seen that they can still be rather weird (recall the function that equals 1/q at a rational expressed in lowest terms as p/q). So we move on to study functions that are even nicer, and for this we henceforth restrict to functions from R to R. We could start at the very bottom, first studying constant functions f(x) = c and then linear functions f(x) = ax + b, then quadratics, etc. But I trust you learned about these functions earlier. Noting that constant functions are just special cases of linear ones, we set out to study functions that are somehow close to linear functions.

The idea we will pursue is that even if a function f is wild, it may be that very close to a particular point x0, it is well represented by a linear function. For a good choice of a linear function L, it would make sense to hope that

\[ \lim_{x \to x_0} (f(x) - L(x)) = 0 . \]

If f is already continuous then this is not much of a requirement: we just need L(x0) = f(x0). So this just means that L(x) can be written as L(x) = a(x − x0) + f(x0).

We will look for a stronger requirement on the speed at which this difference converges to zero. It should go to zero faster than x − x0 does (as x → x0). In other words, we will require that

\[ \lim_{x \to x_0} \frac{f(x) - L(x)}{x - x_0} = 0 \quad \text{or in shorthand,} \quad f(x) - L(x) = o(x - x_0) . \]

Plugging in our form of L, this means

\[ \lim_{x \to x_0} \left[ \frac{f(x) - f(x_0)}{x - x_0} - a \right] = 0 , \]

or

\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = a . \]

Rewriting this with the notation above, we get

\[ f(x) = f(x_0) + a(x - x_0) + o(x - x_0) , \]

or setting x = x0 + h,

\[ f(x_0 + h) = f(x_0) + ah + o(h) \]

as h → 0. Again, the symbol o(h) represents some term such that if we divide it by h and take h → 0, it goes to 0.

Definition 7.1.1. Let f : (a, b) → R. We say that f is differentiable at x0 ∈ (a, b) if

\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \quad \text{exists} . \]

In this case we write f′(x0) for the limit.

7.2 Properties

Proposition 7.2.1. Let f : (a, b)→ R be differentiable at x0. Then f is continuous at x0.

Proof.

\[ \lim_{x \to x_0} f(x) = f(x_0) + \lim_{x \to x_0} [f(x) - f(x_0)] = f(x_0) + \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \cdot \lim_{x \to x_0} (x - x_0) = f(x_0) . \]

The converse is not true. Consider the function f : R→ R given by f(x) = |x|. Then

f(x) = max{x, 0} −min{x, 0} ,

so it is continuous. However, trying to compute the derivative at x = 0, we get

\[ \lim_{x \to 0} \frac{|x|}{x} , \]

which does not exist (it has a right limit of 1 and left limit of −1).

We will now play the same game as we did for continuity, trying to find which functions are differentiable. Here are some examples.

1. f(x) = x:

\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = 1 . \]

So f′(x) exists for all x and equals 1.

2. f(x) = x^n for n ∈ N:

\[ \lim_{x \to x_0} \frac{x^n - x_0^n}{x - x_0} = \lim_{x \to x_0} \frac{(x - x_0)(x^{n-1} + x^{n-2} x_0 + \cdots + x x_0^{n-2} + x_0^{n-1})}{x - x_0} = \lim_{x \to x_0} \left[ x^{n-1} + \cdots + x_0^{n-1} \right] = n x_0^{n-1} . \]

So f′(x) exists for all x and equals n x^{n−1}.
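The limit in the second example can be watched numerically. Below is a minimal Python sketch, assuming illustrative choices n = 5 and x0 = 1.5 and a hypothetical helper `difference_quotient`; it compares the difference quotient of x^n with the claimed value n·x0^{n−1} for shrinking h.

```python
def difference_quotient(f, x0, h):
    # (f(x0 + h) - f(x0)) / h, the quantity whose limit defines f'(x0).
    return (f(x0 + h) - f(x0)) / h


n, x0 = 5, 1.5                 # illustrative choices
exact = n * x0 ** (n - 1)      # the value n * x0^(n-1) from the example

# Errors for h = 10^-1, ..., 10^-5.
errors = [
    abs(difference_quotient(lambda x: x ** n, x0, 10.0 ** -k) - exact)
    for k in range(1, 6)
]
```

The errors shrink roughly in proportion to h, which is consistent with f(x0 + h) = f(x0) + f′(x0)h + o(h). For much smaller h, floating-point cancellation would eventually dominate, so this is only an illustration and not a proof.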

Again we look at how to build differentiable functions from others.

Proposition 7.2.2. Let f, g : (a, b) → R be differentiable at x. Then the following functions are differentiable with derivatives:

1. (f + g)′(x) = f′(x) + g′(x).

2. (fg)′(x) = f′(x)g(x) + f(x)g′(x).

3. $(f/g)'(x) = \dfrac{f'(x) g(x) - f(x) g'(x)}{g^2(x)}$ if g(x) ≠ 0.

Proof. For the first we just use properties of limits:

\[ \lim_{y \to x} \frac{(f + g)(y) - (f + g)(x)}{y - x} = \lim_{y \to x} \frac{f(y) - f(x)}{y - x} + \lim_{y \to x} \frac{g(y) - g(x)}{y - x} = f'(x) + g'(x) . \]

For the second, we write

(fg)(y) − (fg)(x) = (f(y) − f(x))g(y) + f(x)(g(y) − g(x)) ,

divide by y − x and take a limit:

\[ \lim_{y \to x} \frac{(fg)(y) - (fg)(x)}{y - x} = \lim_{y \to x} \frac{f(y) - f(x)}{y - x} \cdot \lim_{y \to x} g(y) + f(x) \lim_{y \to x} \frac{g(y) - g(x)}{y - x} . \]

As g is differentiable at x, it is also continuous, so g(y) → g(x) as y → x. This gives the formula.

The last property can be derived in a similar fashion:

\[ (f/g)(y) - (f/g)(x) = \frac{1}{g(y) g(x)} \left[ f(y) g(x) - f(x) g(y) \right] = \frac{1}{g(y) g(x)} \left[ g(x)(f(y) - f(x)) - f(x)(g(y) - g(x)) \right] . \]

Dividing by y − x and taking the limit gives the result.

Again, from this proposition, we find that all polynomials are differentiable everywhere as are rational functions wherever the denominator is nonzero. The next way to build differentiable functions is to compose:

Theorem 7.2.3 (Chain rule). Let f : (a, b) → (c, d) be differentiable at x0 and g : (c, d) → R be differentiable at f(x0). Then g ◦ f is differentiable at x0 with derivative

(g ◦ f)′(x0) = f′(x0)g′(f(x0)) .

Proof. We will want to use a division by f(y) − f(x0) for y ≠ x0, so we must first deal with the case that this could be 0. If there exists a sequence (xn) in (a, b) with xn → x0 but xn ≠ x0 for all n with f(xn) = f(x0) for infinitely many n, then passing to the subsequence along which f(xn) = f(x0), we would have

\[ f'(x_0) = \lim_{y \to x_0} \frac{f(y) - f(x_0)}{y - x_0} = \lim_{n \to \infty} \frac{f(x_n) - f(x_0)}{x_n - x_0} = 0 , \]

so the right side of the equation in the theorem would be 0. The left side would also be zero for a similar reason:

\[ \lim_{y \to x_0} \frac{(g \circ f)(y) - (g \circ f)(x_0)}{y - x_0} = \lim_{n \to \infty} \frac{g(f(x_n)) - g(f(x_0))}{x_n - x_0} = 0 . \]

In the other case, every sequence (xn) in (a, b) with xn → x0 and xn ≠ x0 has f(xn) = f(x0) for at most finitely many n. Then as f is continuous at x0, we have f(xn) → f(x0) with f(xn) ≠ f(x0) for all large n and so

\[ \lim_{y \to x_0} \frac{(g \circ f)(y) - (g \circ f)(x_0)}{y - x_0} = \lim_{n \to \infty} \frac{g(f(x_n)) - g(f(x_0))}{x_n - x_0} = \lim_{n \to \infty} \frac{g(f(x_n)) - g(f(x_0))}{f(x_n) - f(x_0)} \cdot \lim_{n \to \infty} \frac{f(x_n) - f(x_0)}{x_n - x_0} = g'(f(x_0)) f'(x_0) . \]

Examples.

1. We know f(x) = |x| is continuous but not differentiable. To go one level deeper, consider

\[ f(x) = \begin{cases} x^2 & x \geq 0 \\ -x^2 & x < 0 . \end{cases} \]

The derivative at 0 is

\[ \lim_{h \to 0} \frac{f(0 + h)}{h} = 0 , \]

and the derivative elsewhere is

\[ f'(x) = \begin{cases} 2x & x > 0 \\ -2x & x < 0 . \end{cases} \]

Note that f′ is continuous. Then we say f ∈ C^1 (or f is in class C^1). However the second derivative does not exist at 0.

2. The function

\[ f(x) = \begin{cases} x^3 & x \geq 0 \\ -x^3 & x < 0 \end{cases} \]

is in class C^2, as it has two continuous derivatives. But it is not three times differentiable.

3. Generally, the function

\[ f(x) = \begin{cases} x^n & x \geq 0 \\ -x^n & x < 0 \end{cases} , \quad n \geq 1 \]

is in class C^{n−1}, meaning that it has n − 1 continuous derivatives. But it is not n times differentiable.
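To see concretely why the second derivative fails to exist in the first example, here is a short worked computation. It relies on the observation (immediate from the formulas for f′ in that example, together with f′(0) = 0) that f′(x) = 2|x| for all x.

```latex
\[
\frac{f'(h) - f'(0)}{h} = \frac{2|h|}{h} =
\begin{cases}
2 & h > 0 \\
-2 & h < 0 ,
\end{cases}
\]
```

The right limit as h → 0 is 2 and the left limit is −2, so the limit defining f′′(0) does not exist. This is the same obstruction as for |x| itself, pushed one derivative higher.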

7.3 Mean value theorem

We begin by looking at local extrema.

Definition 7.3.1. For X a metric space, let f : X → R. We say that x0 ∈ X is a local maximum for f if there exists r > 0 such that for all x ∈ Br(x0) we have f(x) ≤ f(x0). Similarly x0 is a local minimum for f if there exists r > 0 such that for all x ∈ Br(x0) we have f(x) ≥ f(x0).

In the case that X is R, if f is differentiable at a local extreme point, then the derivative must be zero.

Proposition 7.3.2. Let f : (a, b) → R and suppose that c ∈ (a, b) is a local extreme point for f. If f′(c) exists then f′(c) = 0.

Proof. Let c be a local max such that f′(c) exists. Then there exists r > 0 such that for all y with |y − c| < r, we have f(y) ≤ f(c). Therefore, looking at only right limits,

\[ \lim_{y \to c^+} \frac{f(y) - f(c)}{y - c} \leq 0 . \]

Looking only at left limits,

\[ \lim_{y \to c^-} \frac{f(y) - f(c)}{y - c} \geq 0 . \]

Putting these together, we find f′(c) = 0. The argument for local min is similar.

Theorem 7.3.3 (Rolle's theorem). For a < b, let f : [a, b] → R be continuous such that f is differentiable on (a, b). If f(a) = f(b) then there exists c ∈ (a, b) such that f′(c) = 0.

Proof. If f is constant on the interval then clearly the statement holds. Otherwise for some d ∈ (a, b) we have f(d) > f(a) or f(d) < f(a). Let us consider the first case; the second is similar. By the extreme value theorem, f takes a maximum on [a, b] and since f(d) > f(a) this max cannot occur at a or b. So it occurs at some c ∈ (a, b). Then c is a local max as well, so we can apply the previous proposition to find f′(c) = 0.

An important corollary is the following.

Corollary 7.3.4 (Mean value theorem). For a < b let f : [a, b] → R be continuous such that f is differentiable on (a, b). There exists c ∈ (a, b) such that

\[ f'(c) = \frac{f(b) - f(a)}{b - a} . \]

Proof. Define L(x) to be the line that connects the points (a, f(a)) and (b, f(b)):

\[ L(x) = \frac{f(b) - f(a)}{b - a} (x - a) + f(a) . \]

Then the function g = f − L satisfies g(a) = g(b) = 0. It is also continuous on [a, b] and differentiable on (a, b). Therefore by Rolle's theorem, we can find c ∈ (a, b) such that g′(c) = 0. This gives

\[ 0 = g'(c) = f'(c) - L'(c) = f'(c) - \frac{f(b) - f(a)}{b - a} , \]

implying the corollary.
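The theorem only asserts that some c exists. For a concrete function one can sometimes locate it numerically; the sketch below does this by bisection on g′ = f′ − (slope of the secant line), in the spirit of the function g = f − L from the proof. It makes an extra assumption that the theorem does not need: that f′ is continuous and g′ changes sign between the endpoints. The helper name `mvt_point` and the example f(x) = x^3 on [0, 2] are illustrative.

```python
def mvt_point(f, fprime, a, b, tol=1e-12):
    """Approximate c in (a, b) with f'(c) = (f(b) - f(a)) / (b - a).

    Extra assumption (not required by the mean value theorem itself):
    fprime is continuous and fprime - slope changes sign on [a, b].
    """
    slope = (f(b) - f(a)) / (b - a)

    def g_prime(x):
        # Derivative of g = f - L, where L is the secant line.
        return fprime(x) - slope

    lo, hi = a, b
    if g_prime(lo) * g_prime(hi) > 0:
        raise ValueError("g' does not change sign at the endpoints")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g_prime(lo) * g_prime(mid) <= 0:
            hi = mid   # a sign change lies in [lo, mid]
        else:
            lo = mid   # otherwise it lies in [mid, hi]
    return (lo + hi) / 2


# Illustrative use: f(x) = x^3 on [0, 2]; the secant slope is 4.
c = mvt_point(lambda x: x ** 3, lambda x: 3 * x ** 2, 0.0, 2.0)
```

For this example, solving 3c^2 = 4 by hand gives c = 2/√3, which the sketch reproduces to within the tolerance.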

The mean value theorem has a lot of consequences. It is one of the central tools to analyze derivatives.

Corollary 7.3.5. Let f : (a, b)→ R be differentiable.

1. If f ′(x) ≥ 0 for all x ∈ (a, b) then f is non-decreasing.

2. If f ′(x) ≤ 0 for all x ∈ (a, b) then f is non-increasing.

3. If f ′(x) = 0 for all x ∈ (a, b) then f is constant.

Proof. Suppose first that f′(x) ≥ 0 for all x ∈ (a, b). To show f is non-decreasing, let c < d in (a, b). By the mean value theorem, there exists x0 ∈ (c, d) such that

\[ f'(x_0) = \frac{f(d) - f(c)}{d - c} . \]

But this quantity is nonnegative, giving f(d) ≥ f(c). The second follows by considering −f instead of f. The third follows from the previous two.

7.4 L’Hopital’s rule

For the proof of L’Hopital’s rule, we need a generalized version of the mean value theorem.

Lemma 7.4.1 (Generalized MVT). If f, g : [a, b] → R are continuous and differentiable on (a, b) then there exists c ∈ (a, b) such that

(f(b) − f(a))g′(c) = (g(b) − g(a))f′(c) .

Proof. The proof is exactly the same as that of the MVT but using the function h : [a, b] → R given by

h(x) = (f(b) − f(a))g(x) − (g(b) − g(a))f(x) .

Indeed, h(a) = (f(b) − f(a))g(a) − (g(b) − g(a))f(a) = h(b), so applying Rolle's theorem, we find c ∈ (a, b) such that h′(c) = 0.

Theorem 7.4.2 (L'Hopital's rule). Suppose f, g : (a, b) → R are differentiable with g′(x) ≠ 0 for all x, where −∞ ≤ a < b < ∞. Suppose that

\[ \frac{f'(x)}{g'(x)} \to A \quad \text{as } x \to a . \]

If f(x) → 0 and g(x) → 0 as x → a or if g(x) → +∞ as x → a, then

\[ \frac{f(x)}{g(x)} \to A \quad \text{as } x \to a . \]

Proof. We will suppose that A, a ≠ ±∞; otherwise the argument is similar. We consider two cases. First suppose that f(x) → 0 and g(x) → 0 as x → a. Then let ε > 0 and choose δ > 0 such that if x ∈ (a, a + δ) then

\[ \left| \frac{f'(x)}{g'(x)} - A \right| < \varepsilon/2 . \]

We will now show that if x ∈ (a, a + δ) then also $\left| \frac{f(x)}{g(x)} - A \right| < \varepsilon$. Indeed, choose such an x and then pick any y ∈ (a, x). From the generalized MVT, there exists c ∈ (y, x) such that

\[ \frac{f(x) - f(y)}{g(x) - g(y)} = \frac{f'(c)}{g'(c)} . \]

Note that the denominator is nonzero since g is injective (just use the MVT). But since c ∈ (a, a + δ), we have

\[ \left| \frac{f(x) - f(y)}{g(x) - g(y)} - A \right| < \varepsilon/2 . \]

Let y → a and we find the result.

In the second case, we suppose that g(x) → +∞ as x → a. Again for ε > 0 pick δ1 > 0 such that if x ∈ (a, a + δ1) then

\[ \left| \frac{f'(x)}{g'(x)} - A \right| < \varepsilon/2 . \]

Fix x0 = a + δ1. By the generalized MVT, as before, for all x ∈ (a, x0),

\[ A - \varepsilon/2 < \frac{f(x) - f(x_0)}{g(x) - g(x_0)} < A + \varepsilon/2 . \tag{4} \]

Notice that since g(x) → ∞ as x → a,

\[ \lim_{\substack{x \to a \\ x \in (a, x_0)}} \frac{g(x) - g(x_0)}{g(x)} = 1 . \]

Therefore using equation (4), there exists δ2 < δ1 such that if x ∈ (a, a + δ2) then

\[ A - 3\varepsilon/4 < \frac{f(x) - f(x_0)}{g(x) - g(x_0)} \cdot \frac{g(x) - g(x_0)}{g(x)} < A + 3\varepsilon/4 . \tag{5} \]

Also since g(x) → ∞ as x → a,

\[ \lim_{\substack{x \to a \\ x \in (a, x_0)}} \frac{f(x_0)}{g(x)} = 0 . \]

Therefore using (5) we can find δ3 < δ2 such that if x ∈ (a, a + δ3) then

\[ A - \varepsilon < \frac{f(x) - f(x_0)}{g(x) - g(x_0)} \cdot \frac{g(x) - g(x_0)}{g(x)} + \frac{f(x_0)}{g(x)} < A + \varepsilon . \]

The middle expression simplifies to f(x)/g(x), so this means

\[ A - \varepsilon < \frac{f(x)}{g(x)} < A + \varepsilon \quad \text{for all } x \in (a, a + \delta_3) . \]

This proves that f(x)/g(x) → A as x → a.

7.5 Power series

We will derive some results about power series because they will help us on the problem set to define trigonometric functions. Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$ be a power series with radius of convergence R > 0. We wish to show that

• f is differentiable on (−R, R).

• The power series $\sum_{n=0}^{\infty} n a_n x^{n-1}$ also has radius of convergence R.

• For all x ∈ (−R, R), $f'(x) = \sum_{n=0}^{\infty} n a_n x^{n-1}$.

Step 1. The power series $\sum_{n=0}^{\infty} n a_n x^{n-1}$ also has radius of convergence R. To show this, we need a lemma.

Lemma 7.5.1. Suppose that (xn) and (yn) are non-negative real sequences such that xn → x > 0. Then

\[ \limsup_{n \to \infty} x_n y_n = x \limsup_{n \to \infty} y_n . \]

Proof. We will use the definition from the homework that $\limsup_{n \to \infty} b_n$ is the supremum of all subsequential limits of (bn). Let S be the set of subsequential limits of (yn) and T the corresponding set for (xn yn). We will prove the case that S and T are bounded above; the other case is left as an exercise.

We claim that

xS = T , where xS = {xs : s ∈ S} .

To prove this, let a ∈ xS. Then there exists a subsequence $(y_{n_k})$ such that $y_{n_k} \to a/x$. Now $x_{n_k} y_{n_k} \to x \cdot (a/x) = a$, giving that a ∈ T. Conversely, let b ∈ T so that there exists a subsequence $(x_{n_k} y_{n_k})$ such that $x_{n_k} y_{n_k} \to b$. Then $y_{n_k} = x_{n_k} y_{n_k} / x_{n_k} \to b/x$. This means that b = x · (b/x) ∈ xS.

To finish the proof we show that sup T = x sup S. First if t ∈ T we have t/x ∈ S, so t/x ≤ sup S. Therefore t ≤ x sup S and sup T ≤ x sup S. Conversely if s ∈ S then xs ∈ T, so xs ≤ sup T, giving s ≤ (1/x) sup T. This means sup S ≤ (1/x) sup T and therefore sup T ≥ x sup S.

To find the radius of convergence of $\sum_{n=0}^{\infty} n a_n x^{n-1}$, we use the root test:

\[ \limsup_{n \to \infty} (n |a_n|)^{1/n} = \limsup_{n \to \infty} n^{1/n} |a_n|^{1/n} . \]

Since n^{1/n} → 1 we can use the previous lemma to get a limsup of 1/R, where R is the radius of convergence of $\sum_{n=0}^{\infty} a_n x^n$. This means the radius of convergence of the new series is also R.

Step 2. The function f given by $f(x) = \sum_{n=0}^{\infty} a_n x^n$ is differentiable at x = 0.

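To prove this, we use 0 < |x| < R/2 and compute

\[ \frac{f(x) - f(0)}{x - 0} = \frac{\sum_{n=0}^{\infty} a_n x^n - a_0}{x} = \sum_{n=1}^{\infty} a_n x^{n-1} . \]

Pulling off the first term,

\[ \left| \frac{f(x) - f(0)}{x - 0} - a_1 \right| = \left| \sum_{n=2}^{\infty} a_n x^{n-1} \right| = |x| \left| \sum_{n=2}^{\infty} a_n x^{n-2} \right| . \]

We can use the triangle inequality for the last sum to get

\[ \left| \frac{f(x) - f(0)}{x - 0} - a_1 \right| \leq |x| \sum_{n=2}^{\infty} |a_n| |x|^{n-2} \leq |x| \sum_{n=2}^{\infty} |a_n| (R/2)^{n-2} . \]

Since R/2 lies inside the radius of convergence, the root test shows that the last series converges, so setting C equal to it, we find

\[ \left| \frac{f(x) - f(0)}{x - 0} - a_1 \right| \leq C |x| . \]

Now we can take the limit as x → 0 and find

\[ \lim_{x \to 0} \left| \frac{f(x) - f(0)}{x - 0} - a_1 \right| = 0 , \quad \text{or} \quad \lim_{x \to 0} \frac{f(x) - f(0)}{x - 0} = a_1 . \]

This means f′(0) = a1.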

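Step 3. We will now prove that f is differentiable at all |x| < R. So take such an x0 and use the binomial theorem:

\[ f(x) = \sum_{n=0}^{\infty} a_n (x - x_0 + x_0)^n = \sum_{n=0}^{\infty} a_n \left[ \sum_{j=0}^{n} \binom{n}{j} x_0^{n-j} (x - x_0)^j \right] = \sum_{n=0}^{\infty} \sum_{j=0}^{\infty} \mathbf{1}_{n \geq j} \, a_n \binom{n}{j} x_0^{n-j} (x - x_0)^j . \tag{6} \]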

We now state a lemma.

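Lemma 7.5.2. Let $a_{m,n}$, m, n ≥ 0 be a double sequence. If $\sum_{n=0}^{\infty} \left[ \sum_{m=0}^{\infty} |a_{m,n}| \right]$ converges then

\[ \sum_{n=0}^{\infty} \left[ \sum_{m=0}^{\infty} a_{m,n} \right] = \sum_{m=0}^{\infty} \left[ \sum_{n=0}^{\infty} a_{m,n} \right] . \]

Proof. Let ε > 0 and write S for the left side above and T for the right side above. For M, N ∈ N, define

\[ S_{M,N} = \sum_{n=0}^{N} \sum_{m=0}^{M} a_{m,n} \quad \text{and} \quad T_{M,N} = \sum_{m=0}^{M} \sum_{n=0}^{N} a_{m,n} . \]

Clearly $S_{M,N} = T_{M,N}$ for all M, N ∈ N. We claim that there exist M0, N0 such that if M ≥ M0 and N ≥ N0 then both $|S - S_{M,N}|$ and $|T - T_{M,N}|$ are less than ε/2. We need to only verify this for S because the same argument works for T. Once we show that, we have

\[ |S - T| \leq |S - S_{M,N}| + |S_{M,N} - T_{M,N}| + |T_{M,N} - T| < \varepsilon , \]

and since ε is arbitrary, this means S = T.

To prove the claim first use the fact that $\sum_{n=0}^{\infty} \left[ \sum_{m=0}^{\infty} |a_{m,n}| \right]$ converges to pick N0 such that $\sum_{n=N_0+1}^{\infty} \left[ \sum_{m=0}^{\infty} |a_{m,n}| \right] < \varepsilon/4$. Next because each sum

\[ \sum_{m=0}^{\infty} |a_{m,0}| , \ \sum_{m=0}^{\infty} |a_{m,1}| , \ \ldots , \ \sum_{m=0}^{\infty} |a_{m,N_0}| \]

converges, we can pick M0 such that $\sum_{n=0}^{N_0} \sum_{m=M_0+1}^{\infty} |a_{m,n}| < \varepsilon/4$. This gives for M ≥ M0 and N ≥ N0,

\[
\begin{aligned}
|S - S_{M,N}| &\leq \sum_{n=0}^{N} \left[ \sum_{m=M+1}^{\infty} |a_{m,n}| \right] + \sum_{n=N+1}^{\infty} \left[ \sum_{m=0}^{\infty} |a_{m,n}| \right] \\
&= \sum_{n=0}^{N_0} \left[ \sum_{m=M+1}^{\infty} |a_{m,n}| \right] + \sum_{n=N_0+1}^{N} \left[ \sum_{m=M+1}^{\infty} |a_{m,n}| \right] + \sum_{n=N+1}^{\infty} \left[ \sum_{m=0}^{\infty} |a_{m,n}| \right] \\
&\leq \sum_{n=0}^{N_0} \left[ \sum_{m=M+1}^{\infty} |a_{m,n}| \right] + \sum_{n=N_0+1}^{\infty} \left[ \sum_{m=0}^{\infty} |a_{m,n}| \right] \\
&\leq \sum_{n=0}^{N_0} \left[ \sum_{m=M_0+1}^{\infty} |a_{m,n}| \right] + \sum_{n=N_0+1}^{\infty} \left[ \sum_{m=0}^{\infty} |a_{m,n}| \right] < \varepsilon/2 .
\end{aligned}
\]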

We now want to apply the lemma to the sum in (6). To do this, we must verify that

\[ \sum_{n=0}^{\infty} \left[ \sum_{j=0}^{\infty} \mathbf{1}_{n \geq j} \, |a_n| \binom{n}{j} |x_0|^{n-j} |x - x_0|^j \right] \]

converges. But using the binomial theorem again, this sum equals

\[ \sum_{n=0}^{\infty} |a_n| (|x_0| + |x - x_0|)^n , \]

which converges as long as |x0| + |x − x0| < R. So pick such an x and we can exchange the order of summation:

\[ f(x) = \sum_{j=0}^{\infty} \left[ \sum_{n=j}^{\infty} a_n \binom{n}{j} x_0^{n-j} \right] (x - x_0)^j . \]

We can view this as a power series in x − x0 by setting g(x) = f(x + x0) and seeing that for |x| < R − |x0|,

\[ g(x) = \sum_{j=0}^{\infty} b_j x^j , \quad \text{with } b_j = \sum_{n=j}^{\infty} a_n \binom{n}{j} x_0^{n-j} . \]

Taking the derivative of this at x = 0 gives by the previous computation

\[ f'(x_0) = g'(0) = b_1 = \sum_{n=1}^{\infty} n a_n x_0^{n-1} . \]
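As a sanity check on the formula for f′, here is a small Python sketch for one familiar case. It assumes the geometric series with an = 1 for all n, so that f(x) = 1/(1 − x) on (−1, 1) and the differentiated series should sum to 1/(1 − x)^2. The helper name `derivative_series_partial_sum`, the truncation at 200 terms, and the test point x = 0.5 are all illustrative choices.

```python
def derivative_series_partial_sum(coeffs, x):
    # Partial sum of sum_{n >= 1} n * a_n * x^(n-1), using the given
    # finite list of coefficients a_0, a_1, ..., a_N.
    return sum(n * a * x ** (n - 1) for n, a in enumerate(coeffs) if n >= 1)


coeffs = [1.0] * 200   # a_n = 1: the geometric series, radius of convergence 1
x = 0.5                # a point inside (-1, 1)

approx = derivative_series_partial_sum(coeffs, x)
exact = 1.0 / (1.0 - x) ** 2   # derivative of 1/(1 - x)
```

Here `approx` and `exact` agree to within floating-point error, as the term-by-term formula predicts. A finite computation like this illustrates the result at one point; it does not replace Steps 1 to 3.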

7.6 Taylor’s theorem

Note that the theorem on power series actually gives that if $f(x) = \sum_{n=0}^{\infty} a_n x^n$ then f has infinitely many derivatives. (Just apply the theorem over and over.) Then we ask: is it true that if a function has infinitely many derivatives then it is equal to some power series?

Definition 7.6.1. A function f : (a, b) → R is called analytic if it equals some power series $\sum_{n=0}^{\infty} a_n x^n$.

The question now becomes: is every f ∈ C^∞ actually analytic? To try to answer this question we look at the derivatives of a power series: if $f(x) = \sum_{n=0}^{\infty} a_n x^n$, then

\[ f(0) = a_0 , \quad f'(0) = a_1 , \quad f''(0) = 2 a_2 , \quad f'''(0) = 6 a_3 , \ \ldots \]

So we can rewrite a power series as

\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n . \]

The sum on the right is called the Taylor series for f.

To try to go the other way (to try to build a power series from a function), suppose for simplicity that f : R → R and a < b. If f is differentiable on (a, b) and continuous on [a, b], the mean value theorem gives c1 ∈ (a, b) such that

f(b) = f(a) + f′(c1)(b − a) .

We can then ask, if f is twice differentiable, can we find c2 ∈ (a, b) such that

\[ f(b) = f(a) + f'(a)(b - a) + \frac{f''(c_2)}{2} (b - a)^2 , \]

or a c3 ∈ (a, b) such that

\[ f(b) = f(a) + f'(a)(b - a) + \frac{f''(a)}{2} (b - a)^2 + \frac{f'''(c_3)}{6} (b - a)^3 \ ? \]

The answer is yes, and in fact we can keep going to any order we like. For its statement, derivatives at a and b are understood as right and left derivatives, respectively.

Theorem 7.6.2 (Taylor's theorem). Suppose that f : [a, b] → R has n − 1 continuous derivatives on [a, b] and is n times differentiable on (a, b). There exists c ∈ (a, b) such that

\[ f(b) = \sum_{j=0}^{n-1} \frac{f^{(j)}(a)}{j!} (b - a)^j + \frac{f^{(n)}(c)}{n!} (b - a)^n . \]

Proof. See the proof in Rudin, Thm. 5.15. It is a repeated application of the mean value theorem.

We get from this a corollary:

Corollary 7.6.3. Suppose that f : [a, b] → R has infinitely many derivatives; that is, f ∈ C^∞([a, b]). Set

\[ M_n = \sup_{c \in (a, b)} |f^{(n)}(c)| . \]

If $\frac{M_n}{n!} (b - a)^n \to 0$ then

\[ f(b) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (b - a)^n . \]

We can see that in this corollary it is necessary to have this bound on Mn. Take for example f : [0, ∞) → R given by

\[ f(x) = \begin{cases} e^{-1/x} & \text{if } x > 0 \\ 0 & \text{if } x = 0 . \end{cases} \]

In this case, you can check that f^{(n)}(0) = 0 for all n. However, if $f(x) = \sum_{n=0}^{\infty} a_n x^n$ this would imply that an = 0 for all n, giving f(x) = 0 for all x, which is false since f(x) > 0 for x > 0.

This means in particular that we must not have the required growth on f^{(n)}(x) to apply the corollary. If you compute the n-th derivative, you can try to see why the corollary does not apply; that is, why f is not analytic. For instance, we have

\[ f'(x) = e^{-1/x} \left[ \frac{1}{x^2} \right] , \quad f''(x) = e^{-1/x} \left[ \frac{1}{x^4} - \frac{2}{x^3} \right] \quad \text{for } x > 0 \]

and the n-th derivative can be written as

\[ f^{(n)}(x) = e^{-1/x} P(1/x) , \]

where P is a polynomial in 1/x of degree 2n. For any given r > 0, you can show that

\[ \sup_{x \in [0, r]} \frac{|f^{(n)}(x)|}{n!} r^n \to \infty , \]

so that the corollary cannot apply.
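The fact behind f^{(n)}(0) = 0 is that e^{−1/x} tends to 0 faster than any power of x as x → 0+, so each expression e^{−1/x}P(1/x) also tends to 0. The Python sketch below illustrates this numerically for one power; the exponent k = 5 and the sample points are arbitrary illustrative choices, and `math.exp` is used as a stand-in for the exponential function.

```python
import math


def f(x):
    # The example function: exp(-1/x) for x > 0, and 0 at x = 0.
    return math.exp(-1.0 / x) if x > 0 else 0.0


k = 5  # illustrative power; any fixed k shows the same behavior
ratios = [f(x) / x ** k for x in (0.1, 0.05, 0.025, 0.0125)]
```

The entries of `ratios` decrease rapidly toward 0 even though the denominators x^k are themselves tiny. This is only numerical evidence; the claim for every k follows from the usual comparison of exponentials with polynomials.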

7.7 Exercises

1. Prove that for any c ∈ R, the polynomial equation x^3 − 3x + c = 0 does not have two distinct roots in [0, 1].

2. Suppose that f : R → R is differentiable and there exists C < 1 such that |f′(x)| ≤ C for all x.

(a) Show that there exists a unique fixed point; that is, an x such that f(x) = x.

(b) Show that if f(0) > 0 then the fixed point is positive.

3. Let f : R → R be continuous. Suppose that for some a < b, both of the following two conditions hold:

• f(a) = f(b) = 0 and

• f is differentiable at both a and b with f′(a)f′(b) > 0.

Show there exists c ∈ (a, b) such that f(c) = 0.

4. Assume f on [a, b] is continuous, and that f′ exists and is everywhere continuous and positive on (a, b). Let [c, d] be the image of f. Prove that f has an inverse function f^{-1} : [c, d] → [a, b] and that the derivative of f^{-1} is continuous on (c, d).

5. Let f : (−a, a) → R. Assume there is a C ∈ R such that for all x ∈ (−a, a), we have |f(x) − x| ≤ Cx^2. Does f′(0) exist? If so, what is it?

6. Use the Mean Value Theorem to prove that for x ≠ 0

\[ 1 + \frac{x}{2\sqrt{1 + x}} < \sqrt{1 + x} < 1 + x/2 . \]

7. If I is an open interval and f : I → R is differentiable, show that |f′(x)| is bounded on I by a constant M if and only if f is Lipschitz on I with Lipschitz constant bounded above by (this same) M.

8. Read example 5.6 in Rudin. Define f : R → R by

\[ f(x) = \begin{cases} x^{200} \sin\frac{1}{x} & x \neq 0 \\ 0 & x = 0 . \end{cases} \]

(a) For which n ∈ N does f^{(n)}(0), the n-th derivative of f at 0, exist?

(b) For which n ∈ N does $\lim_{x \to 0^+} f^{(n)}(x)$ exist?

(c) For which n ∈ N is f ∈ C^n(R)?

9. Let I ⊂ R be an open interval. Assume f : I → R is continuous on I and is differentiable on I except perhaps at c ∈ I. Suppose further that $\lim_{x \to c} f'(x)$ exists. Prove that f is differentiable at c and that f′ is continuous at c.

10. (Weierstrass M-test) Let I be any interval. For each n ∈ N, let fn : I → R be continuous and assume that there is a constant Mn such that |fn(x)| ≤ Mn for all x. Assume further that $\sum M_n$ converges.

(a) Show that for each x ∈ I, the sum $\sum_n f_n(x)$ converges. Call this number f(x). We say the partial sums of {fn} converge pointwise to f.

(b) Show that f : I → R given in the first part is continuous.

Hint. Given ε > 0, find N ∈ N such that $\left| \sum_{n=1}^{N} f_n(x) - f(x) \right| < \varepsilon/2$ for all x ∈ I. Then use the fact that $\sum_{n=1}^{N} f_n$ is continuous.

Remark. The condition above with ε/2 is called uniform convergence. Precisely, we say a family {fn} of functions from R to R converges uniformly to f if for each ε > 0 there exists N such that n ≥ N implies that |fn(x) − f(x)| < ε for all x. This problem is a special case of a more general theorem: if a family {fn} of continuous functions converges uniformly then the limit f is continuous. Try to think up an example where a family of continuous functions converges pointwise to f, but does not converge uniformly, and where f is not continuous.

11. (From J. Feldman.) In this problem we will construct a function that is continuous everywhere but differentiable nowhere. Define g : R → R by first setting for x ∈ [0, 2],

\[ g(x) = \begin{cases} x & x \in [0, 1] \\ 2 - x & x \in [1, 2] . \end{cases} \]

Then for x ∉ [0, 2], define g(x) so that it is periodic of period 2; that is, set $g(x) = g(\hat{x})$ for the unique $\hat{x} \in [0, 2)$ such that $x = \hat{x} + 2m$ for some m ∈ Z. (The graph of g forms a sequence of identical triangles with the x-axis, each of height 1 and base 2. Clearly g is continuous.) For each n ∈ N, define fn : [0, 1] → R by $f_n(x) = \left( \frac{3}{4} \right)^n g(4^n x)$.

(a) Make a sketch of f1 and f2 on [0, 1]. (Optional: use a computer algebra package to graph f1, f1 + f2, f1 + f2 + f3, etc.)

(b) Prove that the formula $f(x) = \sum_{n=1}^{\infty} f_n(x)$ defines a continuous function on [0, 1].

(c) Complete the following steps to show that f is not differentiable at any x.

i. Let x ∈ [0, 1] and for each m ∈ N, define hm to be either number in the set $\left\{ x - \frac{1}{2} 4^{-m} , \ x + \frac{1}{2} 4^{-m} \right\}$ such that there is no integer strictly between 4^m x and 4^m hm. Show that

\[ \text{if } n > m \text{ then } \frac{f_n(h_m) - f_n(x)}{h_m - x} = 0 . \]

ii. Show that

\[ \text{if } n = m \text{ then } \left| \frac{f_n(h_m) - f_n(x)}{h_m - x} \right| = 3^m . \]

iii. Show that

\[ \text{if } n < m \text{ then } \left| \frac{f_n(h_m) - f_n(x)}{h_m - x} \right| \leq 3^n . \]

Putting these three cases together, show that

\[ \left| \frac{f(h_m) - f(x)}{h_m - x} \right| \geq \frac{1}{2} (3^m + 3) \]

and deduce that f is not differentiable at x.

12. Define for x ∈ R,

\[ \sin x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n + 1)!} \quad \text{and} \quad \cos x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} . \]

(a) Show that for any x, both series converge absolutely and define continuous functions. Show that cos 0 = 1 and sin 0 = 0.

(b) Show that the derivative of sin x is cos x and the derivative of cos x is − sin x.

(c) Show that for any x, sin^2 x + cos^2 x = 1.

Hint. Take the derivative of the left side.

(d) For a given a ∈ R find the Taylor series of both f(x) = sin(a + x) and g(x) = cos(a + x) centered at x = 0.

(e) Use the previous part to show the identities

sin(x + y) = sin x cos y + cos x sin y and cos(x + y) = cos x cos y − sin x sin y .

13. Define the set

S = {x > 0 : cos x = 0} .

(a) Show that S is nonempty.

Hint. Assume it is empty. Since cos 0 = 1, show that then cos x would be positive for all x > 0 and therefore sin x would be strictly increasing. As sin x is bounded, it would have a limit as x → ∞. Deduce then that cos x would also have a limit L. Show that L = 2L^2 − 1 and that we must have L = 1. Argue that this implies sin x is unbounded.

(b) Define

π = 2 inf S .

Show that cos(π/2) = 0, sin(π/2) = 1. Then prove that sin(x + 2π) = sin x and cos(x + 2π) = cos x.

(c) Define $\tan x = \frac{\sin x}{\cos x}$ for all x such that cos x ≠ 0. Show that tan(π/4) = 1.

14. Please continue to use only the facts about trigonometry established in problems 12 and 13.

(a) Show that the derivative of tan x is sec^2 x, where we define sec x = 1/ cos x.

(b) From now on, restrict the domain of tan x to (−π/2, π/2). Show that tan x is strictly increasing on this domain. Show that its image is R. Therefore tan x has an inverse function arctan x mapping R → (−π/2, π/2). By problem 4, arctan x is of class C^1, and in particular continuous.

(c) Show that sec^2(arctan x) = 1 + x^2 for all x ∈ R. (It is not rigorous to draw a little right triangle with an angle θ = arctan x in one corner. Problems 12–13 involve no notion of angle or two-dimensional geometry.)

(d) By the definition of inverse function, tan(arctan x) = x for all x ∈ R. Use the Chain Rule to show the derivative of arctan x is $\frac{1}{1 + x^2}$.

(e) In the geometric series 1 + x + x^2 + x^3 + · · · , substitute −x^2 for x. Show that this power series converges to $\frac{1}{1 + x^2}$ for x ∈ (−1, 1). (Aside: is this uniform convergence?)

(f) Consider the power series

\[ A(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots \]

Show that this defines an analytic function on (−1, 1). Show that A(x) and arctan x have the same derivative. Therefore A(x) − arctan x is a constant. Checking at x = 0 to see what this constant is, show that A(x) = arctan x on (−1, 1).

(g) Show that arctan x is uniformly continuous on R.

(h) Since A(x) equals arctan x on (−1, 1), it is uniformly continuous on that open interval. By exercise 8 of Chapter 6, applied at each endpoint, it has a unique continuous extension to [−1, 1]. Conclude that

\[ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots \]

15. Abel’s limit theorem. Suppose that $f : (-1, 1] \to \mathbb{R}$ is a function such that (a) $f$ is continuous at $x = 1$ and (b) for all $x \in (-1, 1)$, $f(x) = \sum_{n=0}^\infty a_n x^n$ for some power series that converges for all $x \in (-1, 1)$. If, in addition, $\sum a_n$ converges, prove that

$$\sum_{n=0}^\infty a_n = f(1) .$$

Hint. For $x \in (-1, 1)$ write $f_n(x) = \sum_{k=0}^n a_k x^k$ and $A_n = \sum_{k=0}^n a_k$. Show that

$$f_n(x) = (1 - x)(A_0 + \cdots + A_{n-1} x^{n-1}) + A_n x^n .$$

Let $n \to \infty$ to get a different representation for $f(x)$. Next denote $A = \sum_{k=0}^\infty a_k$ and write

$$f(x) - A = f(x) - (1 - x)\sum_{n=0}^\infty A x^n .$$

Use the representation of $f(x)$ above to bound this difference for $x$ near 1.
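The summation-by-parts identity in the hint can be sanity-checked with exact rational arithmetic before proving it. The sketch below is only an illustration: the function name `partial_sum_identity` and the sample coefficients $a_k = (-1)^k/(2k+1)$ are assumptions made for the example, not part of the exercise.

```python
from fractions import Fraction

def partial_sum_identity(a, x):
    """Return (f_n(x), right-hand side) of the identity in the hint.

    a is the list [a_0, ..., a_n]; x is a Fraction.
    """
    n = len(a) - 1
    # f_n(x) = sum_{k=0}^{n} a_k x^k
    f_n = sum(a[k] * x**k for k in range(n + 1))
    # A_k = a_0 + ... + a_k
    A = []
    running = Fraction(0)
    for coeff in a:
        running += coeff
        A.append(running)
    # (1 - x)(A_0 + ... + A_{n-1} x^{n-1}) + A_n x^n
    rhs = (1 - x) * sum(A[k] * x**k for k in range(n)) + A[n] * x**n
    return f_n, rhs

# Illustrative coefficients (an assumption for this example)
coeffs = [Fraction((-1) ** k, 2 * k + 1) for k in range(8)]
lhs, rhs = partial_sum_identity(coeffs, Fraction(3, 4))
print(lhs == rhs)
```

A check like this does not replace the proof, but it catches index errors in the formula quickly.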


8 Integration

The standard motivation for integration is to find the area under the graph of a function. There are other very important reasons to study integration and one is that integration is a smoothing operation: the (indefinite) integral of a function has more derivatives than the original function does. Other motivations can be seen in abstract measure theory and the application to, for instance, probability theory.

8.1 Definitions

We will start at the bottom and try to find the area under a graph. We will place boxes under the graph and sum the area in these boxes. The x-coordinates of the sides of these boxes form an (ordered) partition. Although we have used this word before, it will take a new meaning here.

Definition 8.1.1. A partition $\mathcal{P}$ of the interval $[a, b]$ is a finite set $\{x_1, \ldots, x_n\}$ such that

$$a = x_1 < x_2 < \cdots < x_n = b .$$

Given a partition and a bounded function $f$ we can construct an upper sum and a lower sum. To do this, we consider a subinterval $[x_i, x_{i+1}]$ and let

$$m_i = \inf_{x \in [x_i, x_{i+1}]} f(x) \quad \text{and} \quad M_i = \sup_{x \in [x_i, x_{i+1}]} f(x) .$$

A box with base $[x_i, x_{i+1}]$ and height $M_i$ contains the entire area below $f$ in this interval, whereas the box with the same base but height $m_i$ is contained in this area. (Here we are thinking of $f \geq 0$, so these statements are slightly different otherwise.) Counting up the area of these boxes, we get the following definitions.

Definition 8.1.2. Given a partition $\mathcal{P} = \{x_1 < \cdots < x_n\}$ of $[a, b]$ and a bounded function $f : [a, b] \to \mathbb{R}$ we define the upper and lower sums of $f$ relative to the partition $\mathcal{P}$ as

$$U(f, \mathcal{P}) = \sum_{i=1}^{n-1} M_i (x_{i+1} - x_i) \quad \text{and} \quad L(f, \mathcal{P}) = \sum_{i=1}^{n-1} m_i (x_{i+1} - x_i) .$$

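There is a useful monotonicity property of upper and lower sums. To state this, we use the following term. A partition $\mathcal{Q}$ of $[a, b]$ is said to be a refinement of $\mathcal{P}$ if $\mathcal{P} \subset \mathcal{Q}$. This means that we have just thrown extra points into $\mathcal{P}$ (subdividing some of its subintervals) to form $\mathcal{Q}$.

Lemma 8.1.3. Let $f : [a, b] \to \mathbb{R}$ be bounded and $\mathcal{Q}$ a refinement of $\mathcal{P}$. Then

$$U(f, \mathcal{Q}) \leq U(f, \mathcal{P}) \quad \text{and} \quad L(f, \mathcal{Q}) \geq L(f, \mathcal{P}) .$$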


Proof. By iteration (or induction) it suffices to show the inequalities in the case that $\mathcal{Q}$ has just one more point than $\mathcal{P}$. So take $\mathcal{P} = \{x_1 < \cdots < x_n\}$ and $\mathcal{Q} = \{x_1 < \cdots < x_k < t < x_{k+1} < \cdots < x_n\}$. Since most intervals are unchanged,

$$\begin{aligned}
U(f, \mathcal{P}) - U(f, \mathcal{Q}) &= M_k (x_{k+1} - x_k) - \Big[ \sup_{y \in [x_k, t]} f(y) \Big] (t - x_k) - \Big[ \sup_{z \in [t, x_{k+1}]} f(z) \Big] (x_{k+1} - t) \\
&\geq M_k (x_{k+1} - x_k) - M_k (t - x_k) - M_k (x_{k+1} - t) \\
&= 0 .
\end{aligned}$$

The argument for lower sums is similar.
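To see the lemma in action, here is a minimal sketch that computes upper and lower sums exactly for a nondecreasing function, where the supremum and infimum on each subinterval sit at the right and left endpoints. The helper name `upper_lower_sums` and the choice $f(x) = x^2$ on $[0, 1]$ are assumptions made for illustration; for a general bounded $f$ the suprema and infima cannot be read off this way.

```python
from fractions import Fraction

def upper_lower_sums(f, partition):
    """Upper and lower sums of a nondecreasing f on a sorted partition.

    For nondecreasing f, sup f on [l, r] is f(r) and inf f on [l, r] is f(l).
    """
    pairs = list(zip(partition, partition[1:]))
    upper = sum(f(r) * (r - l) for l, r in pairs)
    lower = sum(f(l) * (r - l) for l, r in pairs)
    return upper, lower

def square(x):
    return x * x

P = [Fraction(0), Fraction(1, 2), Fraction(1)]
Q = sorted(set(P) | {Fraction(1, 4)})  # refinement of P with one extra point

U_P, L_P = upper_lower_sums(square, P)
U_Q, L_Q = upper_lower_sums(square, Q)
print(U_Q <= U_P, L_Q >= L_P)
```

Adding the point $1/4$ lowers the upper sum and raises the lower sum, as the lemma predicts.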

The above lemma says that upper sums decrease and lower sums increase when we add more points into the partition. Since we are thinking of taking very fine partitions, we define the upper and lower integrals

$$\overline{\int_a^b} f(x) \, dx = \inf_{\mathcal{P}} U(f, \mathcal{P}) \quad \text{and} \quad \underline{\int_a^b} f(x) \, dx = \sup_{\mathcal{P}} L(f, \mathcal{P})$$

for bounded $f : [a, b] \to \mathbb{R}$. Note that these are defined for all bounded $f$.

Definition 8.1.4. If $f : [a, b] \to \mathbb{R}$ is bounded then $f$ is integrable (written $f \in \mathcal{R}([a, b])$) if

$$\overline{\int_a^b} f(x) \, dx = \underline{\int_a^b} f(x) \, dx .$$

In this case we write $\int_a^b f(x) \, dx$ for the common value.

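Note the following property of upper and lower sums and integrals.

• For any partition $\mathcal{P}$ of $[a, b]$ and bounded function $f : [a, b] \to \mathbb{R}$,

$$L(f, \mathcal{P}) \leq \underline{\int_a^b} f(x) \, dx \leq \overline{\int_a^b} f(x) \, dx \leq U(f, \mathcal{P}) .$$

Proof. The only inequality that is not obvious is the one between the integrals. To show this, we first let $\varepsilon > 0$. By definition of the upper and lower integrals, there exist partitions $\mathcal{P}$ and $\mathcal{Q}$ of $[a, b]$ such that

$$L(f, \mathcal{P}) > \underline{\int_a^b} f(x) \, dx - \varepsilon/2 \quad \text{and} \quad U(f, \mathcal{Q}) < \overline{\int_a^b} f(x) \, dx + \varepsilon/2 .$$

Taking $\mathcal{P}'$ to be the common refinement of $\mathcal{P}$ and $\mathcal{Q}$ (that is, their union), we can use the previous lemma to find

$$\underline{\int_a^b} f(x) \, dx < L(f, \mathcal{P}) + \varepsilon/2 \leq L(f, \mathcal{P}') + \varepsilon/2 \leq U(f, \mathcal{P}') + \varepsilon/2 \leq U(f, \mathcal{Q}) + \varepsilon/2 < \overline{\int_a^b} f(x) \, dx + \varepsilon .$$

Taking $\varepsilon \to 0$ we are done.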


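There is an equivalent characterization of integrability. It is useful because the condition involves only one partition, whereas when dealing with both upper and lower integrals one would need to approximate using two partitions.

Theorem 8.1.5. Let $f : [a, b] \to \mathbb{R}$ be bounded. $f$ is integrable if and only if for each $\varepsilon > 0$ there is a partition $\mathcal{P}$ of $[a, b]$ such that $U(f, \mathcal{P}) - L(f, \mathcal{P}) < \varepsilon$.

Proof. Suppose first that $f$ is integrable and let $\varepsilon > 0$. Then the upper and lower integrals are equal. Choose $\mathcal{P}_1$ and $\mathcal{P}_2$ such that $L(f, \mathcal{P}_1) > \int_a^b f(x) \, dx - \varepsilon/2$ and $U(f, \mathcal{P}_2) < \int_a^b f(x) \, dx + \varepsilon/2$. Taking $\mathcal{P}$ to be the common refinement of $\mathcal{P}_1$ and $\mathcal{P}_2$ we find

$$L(f, \mathcal{P}) \geq L(f, \mathcal{P}_1) > \int_a^b f(x) \, dx - \varepsilon/2$$

and

$$U(f, \mathcal{P}) \leq U(f, \mathcal{P}_2) < \int_a^b f(x) \, dx + \varepsilon/2 .$$

Combining these two gives $U(f, \mathcal{P}) - L(f, \mathcal{P}) < \varepsilon$.

Conversely suppose that for each $\varepsilon > 0$ we can find a partition $\mathcal{P}$ such that $U(f, \mathcal{P}) - L(f, \mathcal{P}) < \varepsilon$. Then

$$\overline{\int_a^b} f(x) \, dx \leq U(f, \mathcal{P}) < L(f, \mathcal{P}) + \varepsilon \leq \underline{\int_a^b} f(x) \, dx + \varepsilon .$$

Since $\varepsilon > 0$ is arbitrary, we find $\overline{\int_a^b} f(x) \, dx \leq \underline{\int_a^b} f(x) \, dx$. The other inequality is obvious, so the upper and lower integrals are equal. In other words, $f \in \mathcal{R}$.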

Using this we can show that all continuous functions are integrable.

Theorem 8.1.6. Let f : [a, b]→ R be continuous. Then f is integrable.

Proof. Since $[a, b]$ is compact, $f$ is uniformly continuous. Then given $\varepsilon > 0$ we can find $\delta > 0$ such that if $x, y \in [a, b]$ with $|x - y| < \delta$ then $|f(x) - f(y)| < \varepsilon/(2(b - a))$. Now construct any partition $\mathcal{P}$ of $[a, b]$ such that, writing $\mathcal{P} = \{x_1 < x_2 < \cdots < x_n\}$, we have $|x_i - x_{i+1}| < \delta$ for all $i = 1, \ldots, n - 1$. Then in each subinterval $[x_i, x_{i+1}]$, we have

$$|f(x) - f(y)| < \varepsilon/(2(b - a)) \quad \text{for all } x, y \in [x_i, x_{i+1}] .$$

This gives $M_i - m_i \leq \varepsilon/(2(b - a)) < \varepsilon/(b - a)$. Therefore

$$U(f, \mathcal{P}) - L(f, \mathcal{P}) = \sum_{i=1}^{n-1} (M_i - m_i)(x_{i+1} - x_i) < \frac{\varepsilon}{b - a} \sum_{i=1}^{n-1} (x_{i+1} - x_i) = \varepsilon .$$

Using the last theorem, we are done.


So we know now that all continuous functions are integrable. There are some other questions we need to resolve.

1. Which other functions are integrable?

2. Which functions are not integrable?

3. How do we compute integrals?

Examples.

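• Let $f$ be the indicator function of the rationals:

$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases} .$$

We will now show that $f$ is not integrable on any $[a, b]$. Indeed, let $\mathcal{P}$ be any partition of $[a, b]$, written as $\{x_1 < x_2 < \cdots < x_n\}$. Then for each subinterval $[x_i, x_{i+1}]$, we have

$$M_i = \sup_{x \in [x_i, x_{i+1}]} f(x) = 1 \quad \text{and} \quad m_i = 0 .$$

Therefore $U(f, \mathcal{P}) - L(f, \mathcal{P}) = \sum_{i=1}^{n-1} (M_i - m_i)(x_{i+1} - x_i) = b - a$. Choosing any $\varepsilon > 0$ that is less than $b - a$, we see that there is no partition $\mathcal{P}$ such that $U(f, \mathcal{P}) - L(f, \mathcal{P}) < \varepsilon$. Therefore $f \notin \mathcal{R}$.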

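• Every monotone function is integrable. Indeed, take $f : [a, b] \to \mathbb{R}$ to be nondecreasing. If $\mathcal{P}_n$ is the partition

$$\mathcal{P}_n = \left\{ a < a + \frac{b - a}{n} < a + \frac{2(b - a)}{n} < \cdots < b \right\} ,$$

then, since $f$ is nondecreasing, the supremum and infimum of $f$ on each subinterval are attained at its right and left endpoints, so the difference of the sums telescopes:

$$U(f, \mathcal{P}_n) - L(f, \mathcal{P}_n) = \frac{b - a}{n} \sum_i \left[ f(x_{i+1}) - f(x_i) \right] = \frac{(b - a)(f(b) - f(a))}{n} .$$

Given $\varepsilon > 0$ take $n$ such that $(b - a)(f(b) - f(a))/n < \varepsilon$. This shows that $f \in \mathcal{R}$.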

• All bounded functions with countably many discontinuities are integrable. One example will be in the problem set. It is actually possible to show that some functions with uncountably many discontinuities are integrable, but we will not address this.

Let us prove a simple example, the function $f : [0, 1] \to \mathbb{R}$ given by

$$f(x) = \begin{cases} 0 & x \leq 1/2 \\ 1 & x > 1/2 \end{cases} .$$

Given $\varepsilon > 0$ (which we may assume is less than $3/2$) we construct a partition containing a very small subinterval around the discontinuity. Let $\mathcal{P} = \{0 < 1/2 - \varepsilon/3 < 1/2 + \varepsilon/3 < 1\}$. Then

$$\begin{aligned}
U(f, \mathcal{P}) - L(f, \mathcal{P}) &= \sum_{i=1}^{3} (M_i - m_i)(x_{i+1} - x_i) \\
&= 0 \cdot (1/2 - \varepsilon/3) + 1 \cdot (2\varepsilon/3) + 0 \cdot (1/2 - \varepsilon/3) = 2\varepsilon/3 < \varepsilon .
\end{aligned}$$

In this example we did not need to care about subintervals away from the discontinuity because the function is constant there (and thus has $M_i = m_i$). In general we would have to construct a partition with somewhat more complicated parts there too (possibly using continuity).
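The step-function computation can be checked exactly. The following sketch assumes the three-subinterval partition $\{0 < 1/2 - \varepsilon/3 < 1/2 + \varepsilon/3 < 1\}$ with $0 < \varepsilon < 3/2$; the names `step` and `gap_for_step` are made up for this illustration.

```python
from fractions import Fraction

def step(x):
    """The step function: 0 for x <= 1/2 and 1 for x > 1/2."""
    return 0 if x <= Fraction(1, 2) else 1

def gap_for_step(eps):
    """U(f, P) - L(f, P) for P = {0, 1/2 - eps/3, 1/2 + eps/3, 1}.

    Assumes eps is a Fraction with 0 < eps < 3/2 so that P is a partition.
    """
    half = Fraction(1, 2)
    P = [Fraction(0), half - eps / 3, half + eps / 3, Fraction(1)]
    gap = Fraction(0)
    for left, right in zip(P, P[1:]):
        # step is nondecreasing, so sup = step(right) and inf = step(left)
        gap += (step(right) - step(left)) * (right - left)
    return gap

eps = Fraction(1, 10)
print(gap_for_step(eps), gap_for_step(eps) < eps)
```

Only the middle subinterval contributes, which is why the gap equals its length $2\varepsilon/3$.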

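Let us now give an example of computing an integral by hand. Consider $f : [0, 1] \to \mathbb{R}$ given by $f(x) = x^2$. Take a partition $\mathcal{P}_n$ to be

$$\mathcal{P}_n = \left\{ 0 < \frac{1}{n} < \frac{2}{n} < \cdots < \frac{n - 1}{n} < 1 \right\} .$$

The upper sum is

$$\begin{aligned}
U(f, \mathcal{P}_n) &= \sum_{i=0}^{n-1} f\left( \frac{i + 1}{n} \right) (1/n) = (1/n) \sum_{i=0}^{n-1} \left( \frac{i + 1}{n} \right)^2 \\
&= \frac{1}{n^3} \sum_{i=1}^{n} i^2 = \frac{n(n + 1)(2n + 1)}{6n^3} \to 1/3 .
\end{aligned}$$

Similarly, $L(f, \mathcal{P}_n) \to 1/3$. This means that $\underline{\int_0^1} f(x) \, dx \geq 1/3$ and $\overline{\int_0^1} f(x) \, dx \leq 1/3$, giving

$$\int_0^1 x^2 \, dx = 1/3 .$$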
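The closed form for the upper sum, and the analogous one for the lower sum, can be confirmed with exact arithmetic. This is a small sketch under the assumption that $f(x) = x^2$ is increasing on $[0, 1]$, so the supremum and infimum on each subinterval are at the right and left endpoints; the function name `sums_for_square` is invented for the example.

```python
from fractions import Fraction

def sums_for_square(n):
    """Exact U(f, P_n) and L(f, P_n) for f(x) = x^2 on the uniform partition of [0, 1]."""
    width = Fraction(1, n)
    # sup on [i/n, (i+1)/n] is ((i+1)/n)^2, inf is (i/n)^2
    upper = sum(Fraction(i + 1, n) ** 2 * width for i in range(n))
    lower = sum(Fraction(i, n) ** 2 * width for i in range(n))
    return upper, lower

for n in (10, 100, 1000):
    upper, lower = sums_for_square(n)
    print(n, float(lower), float(upper))
```

The two sums differ by exactly $1/n$ and squeeze $1/3$ between them.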

8.2 Properties of integration

Here we state many properties of the integral. Because of the third item, we define

$$\int_b^a f(x) \, dx = - \int_a^b f(x) \, dx$$

and the third item remains valid for any $a, b, d$.


Proposition 8.2.1. Let $f, g : [a, b] \to \mathbb{R}$ be integrable and $c \in \mathbb{R}$.

1. The functions $f + g$ and $cf$ are integrable with

$$\int_a^b (f + g)(x) \, dx = \int_a^b f(x) \, dx + \int_a^b g(x) \, dx$$

and

$$\int_a^b (cf)(x) \, dx = c \int_a^b f(x) \, dx .$$

2. If $f(x) \leq g(x)$ for all $x \in [a, b]$ then

$$\int_a^b f(x) \, dx \leq \int_a^b g(x) \, dx .$$

3. If $d \in (a, b)$ then $f$ is integrable on $[a, d]$ and on $[d, b]$ with

$$\int_a^b f(x) \, dx = \int_a^d f(x) \, dx + \int_d^b f(x) \, dx .$$

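Proof. Let us show item 1 first. For $\varepsilon > 0$, take $\mathcal{P}$ and $\mathcal{Q}$ to be partitions such that

$$L(f, \mathcal{P}) \leq \int_a^b f(x) \, dx \leq U(f, \mathcal{P}) < L(f, \mathcal{P}) + \varepsilon/2$$

and

$$L(g, \mathcal{Q}) \leq \int_a^b g(x) \, dx \leq U(g, \mathcal{Q}) < L(g, \mathcal{Q}) + \varepsilon/2 .$$

Let $\mathcal{P}'$ be their common refinement so that

$$L(f, \mathcal{P}') + L(g, \mathcal{P}') \leq \int_a^b f(x) \, dx + \int_a^b g(x) \, dx < L(f, \mathcal{P}') + L(g, \mathcal{P}') + \varepsilon .$$

On the other hand you can check that

$$L(f, \mathcal{P}') + L(g, \mathcal{P}') \leq L(f + g, \mathcal{P}') \leq U(f + g, \mathcal{P}') \leq U(f, \mathcal{P}') + U(g, \mathcal{P}') .$$

(Here we have used that for bounded functions $h_1$ and $h_2$ and any set $S \subset \mathbb{R}$, $\inf_{x \in S} (h_1(x) + h_2(x)) \geq \inf_{x \in S} h_1(x) + \inf_{x \in S} h_2(x)$ and the corresponding statement for suprema, with the inequality reversed.) So we find both

$$U(f + g, \mathcal{P}') - L(f + g, \mathcal{P}') < \varepsilon$$

and

$$\left| L(f + g, \mathcal{P}') - \int_a^b f(x) \, dx - \int_a^b g(x) \, dx \right| < \varepsilon .$$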


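The first statement implies that $f + g$ is integrable and $\left| L(f + g, \mathcal{P}') - \int_a^b (f + g)(x) \, dx \right| < \varepsilon$. Combining this with the second statement gives

$$\left| \int_a^b (f + g)(x) \, dx - \int_a^b f(x) \, dx - \int_a^b g(x) \, dx \right| < 2\varepsilon .$$

Since $\varepsilon$ is arbitrary this gives the result.

If $c \in \mathbb{R}$ suppose first that $c > 0$ (the case $c = 0$ is trivial). Then for any set $S \subset \mathbb{R}$ and bounded function $h : S \to \mathbb{R}$ we have

$$\sup_{x \in S} (ch)(x) = c \sup_{x \in S} h(x) \quad \text{and} \quad \inf_{x \in S} (ch)(x) = c \inf_{x \in S} h(x) .$$

Therefore for any partition $\mathcal{P}$ of $[a, b]$,

$$U(cf, \mathcal{P}) = c \, U(f, \mathcal{P}) \quad \text{and} \quad L(cf, \mathcal{P}) = c \, L(f, \mathcal{P}) .$$

So given that $f$ is integrable and $\varepsilon > 0$, we can choose a partition $\mathcal{P}$ such that $U(f, \mathcal{P}) - L(f, \mathcal{P}) < \varepsilon/c$. Then $U(cf, \mathcal{P}) - L(cf, \mathcal{P}) < \varepsilon$, proving that $cf$ is integrable. Furthermore,

$$L(cf, \mathcal{P}) = c \, L(f, \mathcal{P}) \leq c \int_a^b f(x) \, dx \leq c \, U(f, \mathcal{P}) = U(cf, \mathcal{P}) < L(cf, \mathcal{P}) + \varepsilon ,$$

giving $\left| c \int_a^b f(x) \, dx - L(cf, \mathcal{P}) \right| < \varepsilon$. However we already know that

$$L(cf, \mathcal{P}) \leq \int_a^b (cf)(x) \, dx \leq U(cf, \mathcal{P}) < L(cf, \mathcal{P}) + \varepsilon ,$$

giving $\left| \int_a^b (cf)(x) \, dx - L(cf, \mathcal{P}) \right| < \varepsilon$. Combining these two and taking $\varepsilon \to 0$ proves $\int_a^b (cf)(x) \, dx = c \int_a^b f(x) \, dx$.

If instead $c < 0$ then we first prove the case $c = -1$. Then we have for any partition $\mathcal{P}$ of $[a, b]$ that $U(-f, \mathcal{P}) = -L(f, \mathcal{P})$ and $L(-f, \mathcal{P}) = -U(f, \mathcal{P})$. Thus if $U(f, \mathcal{P}) - L(f, \mathcal{P}) < \varepsilon$ we also have $U(-f, \mathcal{P}) - L(-f, \mathcal{P}) < \varepsilon$, proving that $-f$ is integrable. Further, as above,

$$L(-f, \mathcal{P}) \leq \int_a^b (-f)(x) \, dx < L(-f, \mathcal{P}) + \varepsilon$$

and

$$-U(f, \mathcal{P}) \leq - \int_a^b f(x) \, dx < -U(f, \mathcal{P}) + \varepsilon .$$

Combining these (recall that $L(-f, \mathcal{P}) = -U(f, \mathcal{P})$) and taking $\varepsilon \to 0$ gives $\int_a^b (-f)(x) \, dx = - \int_a^b f(x) \, dx$. Last, for any $c < 0$ we note that if $f$ is integrable, so is $-f$ and since $-c > 0$, so is $(-c)(-f) = cf$. Further,

$$\begin{aligned}
\int_a^b (cf)(x) \, dx &= \int_a^b (-(-cf))(x) \, dx = - \int_a^b (-cf)(x) \, dx = -(-c) \int_a^b f(x) \, dx \\
&= c \int_a^b f(x) \, dx .
\end{aligned}$$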


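For the second item, we just use the fact that for every partition $\mathcal{P}$ of $[a, b]$, $U(f, \mathcal{P}) \leq U(g, \mathcal{P})$ whenever $f(x) \leq g(x)$ for all $x \in [a, b]$. So given $\varepsilon > 0$, choose $\mathcal{P}$ such that $U(g, \mathcal{P}) < \int_a^b g(x) \, dx + \varepsilon$. Now

$$\int_a^b f(x) \, dx \leq U(f, \mathcal{P}) \leq U(g, \mathcal{P}) < \int_a^b g(x) \, dx + \varepsilon .$$

This is true for all $\varepsilon > 0$ so we deduce that $\int_a^b f(x) \, dx \leq \int_a^b g(x) \, dx$.

We move to the third item. Given $\varepsilon > 0$ choose a partition $\mathcal{P}$ of $[a, b]$ such that $U(f, \mathcal{P}) - L(f, \mathcal{P}) < \varepsilon$. Now refine $\mathcal{P}$ to a partition $\mathcal{Q}$ by adding the point $d$. Call $\mathcal{P}_1$ the partition of $[a, d]$ obtained from the points of $\mathcal{Q}$ up to $d$ and $\mathcal{P}_2$ the remaining points of $\mathcal{Q}$ (including $d$) that form a partition of $[d, b]$. Then, writing $x_i$ for the points of $\mathcal{Q}$,

$$U(f, \mathcal{P}_1) - L(f, \mathcal{P}_1) = \sum_{i : x_i < d} (M_i - m_i)(x_{i+1} - x_i) \leq U(f, \mathcal{Q}) - L(f, \mathcal{Q}) \leq U(f, \mathcal{P}) - L(f, \mathcal{P}) < \varepsilon .$$

This means $f$ is integrable on $[a, d]$. Similarly it is integrable on $[d, b]$. Furthermore, we have

$$L(f, \mathcal{P}_1) \leq \int_a^d f(x) \, dx \leq L(f, \mathcal{P}_1) + \varepsilon ,$$

and

$$L(f, \mathcal{P}_2) \leq \int_d^b f(x) \, dx \leq L(f, \mathcal{P}_2) + \varepsilon .$$

Combining these with

$$L(f, \mathcal{P}_1) + L(f, \mathcal{P}_2) = L(f, \mathcal{Q}) \leq \int_a^b f(x) \, dx \leq L(f, \mathcal{P}_1) + L(f, \mathcal{P}_2) + \varepsilon ,$$

we find

$$\left| \int_a^b f(x) \, dx - \int_a^d f(x) \, dx - \int_d^b f(x) \, dx \right| < 3\varepsilon .$$

Taking $\varepsilon$ to zero gives the result.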

Let us give one more important property of the integral.

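Proposition 8.2.2 (Triangle inequality for integrals). Let $f : [a, b] \to \mathbb{R}$ be integrable. Then so is $|f|$ and

$$\left| \int_a^b f(x) \, dx \right| \leq \int_a^b |f(x)| \, dx .$$

Proof. Let $\varepsilon > 0$ and choose a partition $\mathcal{P}$ of $[a, b]$ such that $U(f, \mathcal{P}) - L(f, \mathcal{P}) < \varepsilon$. For the proof we use the fact (which you can check using the triangle inequality) that for any set $S \subset \mathbb{R}$ and bounded function $g : S \to \mathbb{R}$,

$$\sup_{x \in S} |g(x)| - \inf_{x \in S} |g(x)| \leq \sup_{x \in S} g(x) - \inf_{x \in S} g(x) .$$

Applying this to $f$ on each subinterval of $\mathcal{P}$ implies that

$$U(|f|, \mathcal{P}) - L(|f|, \mathcal{P}) \leq U(f, \mathcal{P}) - L(f, \mathcal{P}) < \varepsilon ,$$

so $|f| \in \mathcal{R}$.

To prove the inequality in the proposition, note that $f(x) \leq |f(x)|$ for all $x$, so $\int_a^b f(x) \, dx \leq \int_a^b |f(x)| \, dx$. Similarly $-f(x) \leq |f(x)|$, so $- \int_a^b f(x) \, dx = \int_a^b (-f(x)) \, dx \leq \int_a^b |f(x)| \, dx$. Combining these gives the inequality.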

In fact this is an instance of a more general theorem, stated in Rudin. We will not prove it; the proof is similar to the above (but more complicated).

Theorem 8.2.3. Suppose that $f : [a, b] \to [c, d]$ is integrable and $\varphi : [c, d] \to \mathbb{R}$ is continuous. Then $\varphi \circ f$ is integrable.

Proof. See Rudin, Thm. 6.11.

From this theorem we find more integrable functions:

• If $f$ is integrable on $[a, b]$ then so is $f^2$. This follows by taking $\varphi(x) = x^2$ in the above theorem.

• If $f$ and $g$ are integrable on $[a, b]$ then so is $fg$. This follows by writing

$$fg = \frac{1}{4} \left[ (f + g)^2 - (f - g)^2 \right] .$$

8.3 Fundamental theorems

Of course we do not always have to compute integrals by hand. As we learn in calculus, we can compute an integral if we know the “antiderivative” of the function. Stated precisely,

Theorem 8.3.1 (Fundamental theorem of calculus part I). Let $f : [a, b] \to \mathbb{R}$ be integrable and $F : [a, b] \to \mathbb{R}$ a continuous function such that $F'(x) = f(x)$ for all $x \in (a, b)$. Then

$$F(b) - F(a) = \int_a^b f(x) \, dx .$$

Proof. Since $f$ is integrable, given $\varepsilon > 0$ we can find a partition $\mathcal{P}$ such that $U(f, \mathcal{P}) - L(f, \mathcal{P}) < \varepsilon$. We will use the mean value theorem to relate values of $f$ in the subintervals to values of $F$. That is, writing $\mathcal{P} = \{x_1 < \cdots < x_n\}$, we can find for each $i = 1, \ldots, n - 1$ a point $c_i \in (x_i, x_{i+1})$ such that

$$F(x_{i+1}) - F(x_i) = f(c_i)(x_{i+1} - x_i) .$$

Then we have

$$L(f, \mathcal{P}) \leq \sum_{i=1}^{n-1} f(c_i)(x_{i+1} - x_i) \leq L(f, \mathcal{P}) + \varepsilon .$$

Furthermore

$$L(f, \mathcal{P}) \leq \int_a^b f(x) \, dx \leq L(f, \mathcal{P}) + \varepsilon .$$

Using the equation derived by the mean value theorem above,

$$\sum_{i=1}^{n-1} f(c_i)(x_{i+1} - x_i) = \sum_{i=1}^{n-1} [F(x_{i+1}) - F(x_i)] = F(b) - F(a) .$$

Combining with the above,

$$\left| \int_a^b f(x) \, dx - [F(b) - F(a)] \right| < \varepsilon$$

and we are done.

As we learn in calculus, we are able now to say, for example, that

$$\int_a^b \cos x \, dx = \sin(b) - \sin(a)$$

and

$$\int_a^b x^n \, dx = \frac{1}{n + 1} \left[ b^{n+1} - a^{n+1} \right] .$$
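These two formulas can be compared against a direct Riemann-sum computation. The sketch below uses a midpoint sum, which is one convenient choice of sample points and not the upper or lower sum from the definition; the helper name `midpoint_sum` and the endpoints are assumptions for the example.

```python
import math

def midpoint_sum(f, a, b, n=10_000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

a, b = 0.3, 2.0
cos_approx = midpoint_sum(math.cos, a, b)
cos_exact = math.sin(b) - math.sin(a)

cube_approx = midpoint_sum(lambda x: x**3, 1.0, 2.0)
cube_exact = (2.0**4 - 1.0**4) / 4

print(abs(cos_approx - cos_exact), abs(cube_approx - cube_exact))
```

Both differences should be very small, reflecting that the Riemann sums converge to $F(b) - F(a)$.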

There is a second fundamental theorem of calculus. Whereas the first is about integrating a derivative, the second is about differentiating an integral. Both of them say that integration and differentiation are inverse operations. For example, in the first, when we start with $F$ and differentiate to get a function $f$, we integrate back to get $F$ (in a sense).

Theorem 8.3.2 (Fundamental theorem of calculus part II). Let $f : [a, b] \to \mathbb{R}$ be continuous. Define $F : [a, b] \to \mathbb{R}$ by

$$F(x) = \int_a^x f(t) \, dt .$$

Then $F$ is differentiable on $[a, b]$ with $F'(x) = f(x)$ for all $x$.

Proof. Let $x \in [a, b)$; the case of $x = b$ is similar and is calculated as a left derivative. For $h > 0$,

$$\frac{F(x + h) - F(x)}{h} = \frac{1}{h} \left[ \int_a^{x+h} f(t) \, dt - \int_a^x f(t) \, dt \right] = \frac{1}{h} \int_x^{x+h} f(t) \, dt .$$

Let $\varepsilon > 0$. Since $f$ is continuous at $x$ we can find $\delta > 0$ such that if $|t - x| < \delta$ then $|f(t) - f(x)| < \varepsilon$. This means that if $0 < h < \delta$ then

$$\begin{aligned}
\left| \frac{F(x + h) - F(x)}{h} - f(x) \right| &= \frac{1}{h} \left| \int_x^{x+h} f(t) \, dt - \int_x^{x+h} f(x) \, dt \right| = \frac{1}{h} \left| \int_x^{x+h} (f(t) - f(x)) \, dt \right| \\
&\leq \frac{1}{h} \int_x^{x+h} |f(t) - f(x)| \, dt \\
&\leq (1/h) \varepsilon h = \varepsilon .
\end{aligned}$$

In other words,

$$\lim_{h \to 0^+} \left| \frac{F(x + h) - F(x)}{h} - f(x) \right| = 0 .$$

A similar argument works for the left limit (in the case that $x \neq a$), using

$$\frac{F(x - h) - F(x)}{h} = \frac{1}{h} \int_x^{x-h} f(t) \, dt ,$$

and completes the proof.
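The key quantity in the proof is the average $\frac{1}{h} \int_x^{x+h} f(t) \, dt$, which should approach $f(x)$ as $h \to 0^+$. A small numerical sketch, assuming $f = \exp$ and $x = 1$ purely for illustration (the helper names are invented here):

```python
import math

def midpoint_integral(f, a, b, n=1000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def average_over_window(f, x, h):
    """(1/h) * integral of f over [x, x + h], which equals (F(x + h) - F(x)) / h."""
    return midpoint_integral(f, x, x + h) / h

x = 1.0
errors = [abs(average_over_window(math.exp, x, h) - math.exp(x))
          for h in (1e-1, 1e-2, 1e-3)]
print(errors)
```

The errors shrink roughly in proportion to $h$, consistent with the bound by $\sup_{|t - x| \le h} |f(t) - f(x)|$ used in the proof.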

8.4 Change of variables, integration by parts

We will now prove the “u-substitution” rule for integrals. As you know from calculus, this is a valuable tool to solve for the value of many definite integrals. The proof is essentially a combination of the chain rule and the fundamental theorem of calculus. Note that in its statement, the range of $f$ is a closed interval. This follows from the fact that $f$ is continuous on a closed interval. Indeed, the image must be connected and compact, therefore a closed interval as well.

Theorem 8.4.1 (Substitution rule). Let $f : [a, b] \to \mathbb{R}$ be $C^1$ and write $[c, d]$ for the range of $f$. If $g : [c, d] \to \mathbb{R}$ is continuous then

$$\int_{f(a)}^{f(b)} g(t) \, dt = \int_a^b g(f(x)) f'(x) \, dx .$$

Proof. Define a function $F : [c, d] \to \mathbb{R}$ by

$$F(x) = \int_{f(a)}^x g(t) \, dt .$$

Then because $g$ is continuous, by the fundamental theorem of calculus II, $F$ is differentiable and $F'(x) = g(x)$ (giving actually $F \in C^1$). Furthermore as $f$ is differentiable, the function $F \circ f : [a, b] \to \mathbb{R}$ is differentiable with $(F \circ f)'(x) = F'(f(x)) f'(x)$. Last, $F'$ is continuous and $f$ is integrable, so by Theorem 8.2.3, $F' \circ f$ is integrable. Since $f'$ is continuous, it is also integrable, so the product of $F' \circ f$ and $f'$ is integrable. By the fundamental theorem of calculus I,

$$F(f(b)) - F(f(a)) = \int_a^b F'(f(x)) f'(x) \, dx .$$

Plugging in,

$$\int_{f(a)}^{f(b)} g(t) \, dt = \int_a^b g(f(x)) f'(x) \, dx .$$
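Note that the theorem does not require $f$ to be monotone. The following numerical sketch compares both sides for the illustrative (assumed) choices $f(x) = \sin x$ on $[0, 3]$, which is not monotone there, and $g(t) = t^2$; both sides should equal $\sin^3(3)/3$.

```python
import math

def midpoint_integral(f, a, b, n=20_000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

a, b = 0.0, 3.0
f = math.sin          # C^1 and not monotone on [0, 3]
f_prime = math.cos

def g(t):
    return t * t

# Left side: integral of g from f(a) to f(b)
left = midpoint_integral(g, f(a), f(b))
# Right side: integral of g(f(x)) f'(x) from a to b
right = midpoint_integral(lambda x: g(f(x)) * f_prime(x), a, b)

print(left, right, math.sin(3.0) ** 3 / 3)
```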

Just as the substitution rule is related to the chain rule, integration by parts is related to the product rule.

Theorem 8.4.2 (Integration by parts). Let $f, g : [a, b] \to \mathbb{R}$ be $C^1$. Then

$$\int_a^b f(x) g'(x) \, dx = f(b) g(b) - f(a) g(a) - \int_a^b f'(x) g(x) \, dx .$$

Proof. This follows from the product rule $(fg)' = f'g + fg'$ and the fundamental theorem of calculus I, since both $f'g$ and $fg'$ are integrable.

8.5 Exercises

1. Let $f : [0, 1] \to \mathbb{R}$ be continuous.

(a) Suppose that $f(x) \geq 0$ for all $x$ and that $\int_0^1 f(x) \, dx = 0$. Show that $f$ is identically zero.

(b) Suppose that $f$ is not necessarily non-negative but that $\int_a^b f(x) \, dx = 0$ for all $a, b \in [0, 1]$ with $a < b$. Show that $f$ is identically zero.

2. Let $f : [0, 1] \to \mathbb{R}$ be continuous. Show that

$$\lim_{n \to \infty} \int_0^1 x^n f(x) \, dx = 0 .$$

Hint. For $c$ near 1, consider $[0, c]$ and $[c, 1]$ separately.

3. Let $f : [0, 1] \to \mathbb{R}$ be continuous. Prove that

$$\lim_{n \to \infty} \left[ \int_0^1 |f(x)|^n \, dx \right]^{1/n} = \max_{x \in [0, 1]} |f(x)| .$$

4. Define $f : [0, 1] \to \mathbb{R}$ by

$$f(x) = \begin{cases} 0 & \text{if } x \notin \mathbb{Q} \\ \frac{1}{n} & \text{if } x = \frac{m}{n} \in \mathbb{Q}, \text{ where } m \text{ and } n \text{ have no common divisor} \end{cases} .$$

Use Theorem 6.6 in Rudin to prove that $f$ is Riemann integrable.

5. Let $f$ and $g$ be continuous functions on $[0, 1]$ with $g(x) \geq 0$ for all $x$. Show there exists $c \in [0, 1]$ such that

$$\int_0^1 f(x) g(x) \, dx = f(c) \int_0^1 g(x) \, dx .$$

6. (a) Show that the Euler-Mascheroni constant

$$\gamma = \lim_{n \to \infty} \left[ \sum_{k=1}^{n} \frac{1}{k} - \log n \right] \quad \text{exists} .$$


Hint. Write the above quantity as

$$\frac{1}{n} + \sum_{k=1}^{n-1} \frac{1}{k} - \int_1^n \frac{dx}{x} = \frac{1}{n} + \sum_{k=1}^{n-1} \int_k^{k+1} \left[ \frac{1}{k} - \frac{1}{x} \right] dx .$$

Show the last sum converges.

(b) Use the last part to find the limit

$$\lim_{n \to \infty} \left[ \frac{1}{n} + \cdots + \frac{1}{2n} \right] .$$

7. Let $\{f_n\}$ be a sequence of continuous functions on $[0, 1]$. Suppose that $\{f_n\}$ converges uniformly to a function $f$. Recall from last problem set that this means that for any $\varepsilon > 0$ there exists $N$ such that $n \geq N$ implies that $|f_n(x) - f(x)| < \varepsilon$ for all $x \in [0, 1]$. Show that

$$\lim_{n \to \infty} \int_0^1 f_n(x) \, dx = \int_0^1 f(x) \, dx .$$

Give an example to show that we cannot only assume $f_n \to f$ pointwise (meaning that for each fixed $x \in [0, 1]$, $f_n(x) \to f(x)$).

Hint. Use the inequality $\left| \int_0^1 g(x) \, dx \right| \leq \int_0^1 |g(x)| \, dx$, valid for any integrable $g$.

8. Suppose that $\{f_n\}$ is a sequence of functions in $C^1([0, 1])$ and that the sequence $\{f_n'\}$ converges uniformly to some function $g$. Suppose there exists some $c \in [0, 1]$ such that the sequence $\{f_n(c)\}$ converges. By the fundamental theorem of calculus, we can write for $x \in [0, 1]$

$$f_n(x) = f_n(c) + \int_c^x f_n'(t) \, dt .$$

(a) Show that $\{f_n\}$ converges pointwise to some function $f$.

(b) Show that $f$ is differentiable and $f'(x) = g(x)$ for all $x$. (You will need to use Theorem 7.12 in Rudin.)

Remark. The above result gives a method to prove the form of a derivative of a power series. Suppose that $f(x) = \sum_{n=0}^\infty a_n x^n$ has radius of convergence $R > 0$. Setting

$$f_n(x) = \sum_{j=0}^{n} a_j x^j \quad \text{and} \quad g(x) = \sum_{j=1}^{\infty} j a_j x^{j-1} ,$$

one can show using the Weierstrass $M$-test that for any $r$ with $0 < r < R$, $f_n' \to g$ uniformly on $(-r, r)$. We can then conclude that $f'(x) = g(x)$.


9. You can solve either this question or the next one. In this problem we will show part of Stirling’s formula. It states that

$$\lim_{n \to \infty} \frac{n!}{n^n e^{-n} \sqrt{n}} = \sqrt{2\pi} .$$

We will only show the limit exists.

(a) Show that

$$\log \left( \frac{n^n}{n!} \right) = \sum_{k=1}^{n-1} \left[ \int_k^{k+1} \log(x/k) \, dx \right] + n - 1 - \log n .$$

Use a change of variable $u = x/k$ and continue to show that this equals

$$\sum_{k=1}^{n-1} \left[ k \int_0^{1/k} [\log(1 + u) - u] \, du + \frac{1}{2k} \right] + n - 1 - \log n .$$

(b) Prove that $\frac{n!}{n^n e^{-n} \sqrt{n}}$ converges if and only if

$$\lim_{n \to \infty} \sum_{k=1}^{n-1} \left[ k \int_0^{1/k} [\log(1 + u) - u] \, du \right] \quad \text{exists} .$$

(c) Show that for $u \in [0, 1]$,

$$-u^2/2 \leq \log(1 + u) - u \leq 0$$

and deduce that the limit in part (b) exists.

Hint. Use Taylor’s theorem.

10. You can solve either this question or the previous one. In this question, you will work out an alternate derivation of existence of the limit in Stirling’s formula.

(a) Define a continuous function $g$ such that $g(n) = \log n$ for $n \in \mathbb{N}$ and $g(x)$ is linear in each interval $[n, n + 1]$. Show that for $n$ large enough,

$$\log n + \frac{x - n}{n + 1} \leq g(x) \leq \log x \leq \log n + \frac{x - n}{n} \quad \text{for } x \in [n, n + 1] .$$

(b) Let $S_n = \int_1^n [\log x - g(x)] \, dx$. Use part (a) to show that $(S_n)$ is Cauchy and thus converges. Compute directly that

$$\int_1^n \log x \, dx = n \log n - n + 1$$

and $\int_1^n g(x) \, dx = \log n! - \frac{1}{2} \log n$. Conclude that the limit in Stirling’s formula exists.


A Real powers

The question is the following: we know what $2^2$ or $2^3$ means, or even $2^{2/3}$, the number whose cube equals $2^2$. But what does $2^{\sqrt{2}}$ mean? We will give the definition Rudin has in the exercises of Chapter 1. We will only use the following facts for $r, s > 0$, $n, m \in \mathbb{Z}$:

• $r^{n+m} = r^n r^m$.

• $(r^n)^m = r^{mn}$.

• $(rs)^n = r^n s^n$.

• if $r > 1$ and $m \geq n$ then $r^m \geq r^n$. If $r < 1$ and $m \geq n$ then $r^m \leq r^n$.

• if $s < r$ and $n > 0$ then $s^n < r^n$. If $s < r$ and $n < 0$ then $s^n > r^n$.

A.1 Natural roots

We first define the n-th root of a real number, for n ∈ N.

Theorem A.1.1. For any $r > 0$ and $n \in \mathbb{N}$ there exists a unique positive real number $y$ such that $y^n = r$.

Proof. The proof is Theorem 1.21 in Rudin. The idea is to construct the set

$$S = \{x > 0 : x^n \leq r\}$$

and to show that $S$ is nonempty, bounded above, and thus has a supremum. Calling $y$ this supremum, he then shows $y^n = r$. The proof of this is somewhat involved and is similar to our proof (from the first lecture) that $\{a \in \mathbb{Q} : a^2 < 2\}$ does not have a greatest element.

To show there is only one such $y$, we note that $0 < y_1 < y_2$ implies that $y_1^n < y_2^n$ and so if $y_1 \neq y_2$ are positive then $y_1^n \neq y_2^n$.

This definition extends to integer roots.

Definition A.1.2. If $r > 0$ and $n \in \mathbb{N}$ we define $r^{-1/n}$ as the unique positive real number $y$ such that $y^n = 1/r$.

A.2 Rational powers

The above definitions allow us to define rational powers.

Definition A.2.1 (Preliminary definition of rational powers). If $r > 0$ and $m, n \in \mathbb{N}$ we define $r^{m/n}$ to be the unique positive real number $y$ such that $y^n = r^m$. Also $r^{-m/n}$ is defined as $(1/r)^{m/n}$.

Because a rational number can have more than one representation $m/n$ we need to show this is well defined.


Proposition A.2.2. If a positive $a \in \mathbb{Q}$ can be represented by $m/n$ and $p/q$ for $m, n, p, q \in \mathbb{N}$ then for all $r > 0$,

$$r^{m/n} = r^{p/q} .$$

Proof. First note that $(r^{m/n})^{nq} = ((r^{m/n})^n)^q = r^{mq}$ and $(r^{p/q})^{nq} = ((r^{p/q})^q)^n = r^{pn}$. However as $m/n = p/q$ we have $pn = mq$ and so these numbers are equal. There is a unique $nq$-th root of this number, so $r^{m/n} = r^{p/q}$.

Note that the above proof applies to negative rational powers: suppose that $r > 0$ and $a \in \mathbb{Q}$ is negative such that $a = -m/n = -p/q$. Then

$$r^{-m/n} = (1/r)^{m/n} = (1/r)^{p/q} = r^{-p/q} .$$

Definition A.2.3 (Correct definition of rational powers). If $r > 0$ and $a > 0$ is rational we define $r^a = r^{m/n}$ for any $m, n \in \mathbb{N}$ such that $a = m/n$. If $a < 0$ is rational we define $r^a = (1/r)^{-a}$.

Properties of rational powers. Let $a, b \in \mathbb{Q}$ and $r, s > 0$.

• If $a = m/n$ for $m \in \mathbb{Z}$ and $n \in \mathbb{N}$ then $r^a$ is the unique positive number such that $(r^a)^n = r^m$.

Proof. For $m \geq 0$ this is the definition. For $m < 0$, this is because $(r^a)^n = ((1/r)^{-a})^n = (1/r)^{-m} = r^m$ and if $s$ is any other positive number satisfying $s^n = r^m$ then uniqueness of $n$-th roots gives $s = r^a$.

• $r^{a+b} = r^a r^b$.

Proof. Choose $m, p \in \mathbb{Z}$ and $n, q \in \mathbb{N}$ such that $a = m/n$ and $b = p/q$. Then $a + b = \frac{mq + np}{nq}$ and therefore $r^{a+b}$ is the unique positive number such that $(r^{a+b})^{nq} = r^{mq + np}$. But we can just compute

$$(r^a r^b)^{nq} = ((r^a)^n)^q ((r^b)^q)^n = r^{mq} r^{np} = r^{mq + np} .$$

And by uniqueness we get $r^a r^b = r^{a+b}$.

• $(r^a)^b = r^{ab}$.

Proof. Write $a = m/n$ and $b = p/q$ for $m, p \in \mathbb{Z}$ and $n, q \in \mathbb{N}$. Then $r^{ab}$ is the unique positive number such that $(r^{ab})^{nq} = r^{mp}$. But

$$((r^a)^b)^{nq} = (((r^a)^b)^q)^n = ((r^a)^p)^n = ((r^a)^n)^p = (r^m)^p = r^{mp} ,$$

giving $(r^a)^b = r^{ab}$.

• $(rs)^a = r^a s^a$.


Proof. Again write $a = m/n$ for $m \in \mathbb{Z}$ and $n \in \mathbb{N}$. Then $(rs)^a$ is the unique positive number such that $((rs)^a)^n = (rs)^m$. But

$$(r^a s^a)^n = (r^a)^n (s^a)^n = r^m s^m = (rs)^m .$$

• If $r > 1$ and $a \geq b$ then $r^a \geq r^b$. If $r < 1$ and $a \geq b$ are rational then $r^a \leq r^b$.

Proof. Suppose first that $r > 1$ and $a > 0$ with $a = m/n$ for $m, n \in \mathbb{N}$. Then if $r^a < 1$, we find $r^m = (r^a)^n < 1^n = 1$, a contradiction, as $r^m > 1$. So $r^a \geq 1$ (and $r^0 = 1$). Next if $a \geq b$ then $a - b \geq 0$ so $r^{a-b} \geq 1$. This gives $r^a = r^{a-b} r^b \geq r^b$.

If $r < 1$ then $r^a (1/r)^a = 1^a = 1$, so $r^{-a} = (1/r)^a$. Similarly $r^{-b} = (1/r)^b$. So since $1/r > 1$ we get $r^{-a} = (1/r)^a \geq (1/r)^b = r^{-b}$. Multiplying both sides by $r^a r^b$ we get $r^a \leq r^b$.

• If $s < r$ and $a > 0$ then $s^a < r^a$. If $s < r$ and $a < 0$ then $s^a > r^a$.

Proof. Let $a = m/n$ with $m, n \in \mathbb{N}$. Then if $s^a \geq r^a$ we must have $s^m = (s^a)^n \geq (r^a)^n = r^m$. But this is a contradiction since $s^m < r^m$. This proves the first statement. For the second, $1/s > 1/r$ so $s^a = (1/s)^{-a} > (1/r)^{-a} = r^a$.

A.3 Real powers

We define a real power as a supremum of rational powers.

Definition A.3.1. Given $r > 1$ and $t \in \mathbb{R}$ we set

$$r^t = \sup\{r^a : a \in \mathbb{Q} \text{ and } a \leq t\} .$$

If $0 < r < 1$ then define $r^t = (1/r)^{-t}$.
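To get a feel for this definition, one can approximate $2^{\sqrt{2}}$ from below by rational powers $2^a$ with $a \leq \sqrt{2}$. In the sketch below the rational exponents are decimal truncations of $\sqrt{2}$, computed exactly with integer square roots; the rational power itself is evaluated in floating point as a stand-in for the $n$-th root construction of Section A.2. The function name `rational_below_sqrt2` is an assumption for the example.

```python
from fractions import Fraction
from math import isqrt, sqrt

def rational_below_sqrt2(digits):
    """Largest fraction k / 10**digits whose square is at most 2."""
    scale = 10 ** digits
    # isqrt gives the largest integer k with k*k <= 2 * scale**2
    return Fraction(isqrt(2 * scale * scale), scale)

approximations = [rational_below_sqrt2(d) for d in range(1, 8)]
# Floating-point evaluation of 2^a for rational a (illustrative stand-in)
powers = [2.0 ** float(a) for a in approximations]

for a, p in zip(approximations, powers):
    print(a, p)
print("floating-point 2**sqrt(2):", 2.0 ** sqrt(2.0))
```

The printed powers increase with the exponent, in line with monotonicity of rational powers for base $r > 1$, and approach the supremum from below.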

Proposition A.3.2. If $a \in \mathbb{Q}$ then for $r > 0$, the definition above coincides with the rational definition.

Proof. For this proof, we take $r^a$ to be defined as in the rational powers section.

Suppose first that $r > 1$. Clearly $r^a \in \{r^b : b \in \mathbb{Q} \text{ and } b \leq a\}$. So to show it is the supremum we need only show it is an upper bound. This follows from the fact that $b \leq a$ implies $r^b \leq r^a$ (proved above).

If $0 < r < 1$ then $r^a = (r^{-1})^{-a} = (1/r)^{-a}$ so the definitions coincide here as well.

Properties of real powers. Let $t, u \in \mathbb{R}$ and $r, s > 0$.

• $r^{t+u} = r^t r^u$.


Proof. We will use the following statement, proved on the homework. If $A$ and $B$ are nonempty subsets of $[0, \infty)$ which are bounded above then define $AB = \{ab : a \in A, b \in B\}$. We have

$$\sup(AB) = \sup A \, \sup B . \tag{7}$$

If either of the sets consists only of 0, then the supremum of that set is 0 and both sides above are 0. Otherwise, both sets (and therefore also $AB$) contain positive elements. For any element $c \in AB$ we have $c = ab$ for some $a \in A$, $b \in B$. Therefore $c = ab \leq \sup A \, \sup B$ and therefore this is an upper bound for $AB$. As $\sup(AB)$ is the least upper bound, we get $\sup(AB) \leq \sup A \, \sup B$. Assuming now for a contradiction that we have strict inequality, because $\sup A > 0$ we also have $\sup(AB)/\sup A < \sup B$. Thus there exists $b \in B$ such that $\sup(AB)/\sup A < b$. As $b$ must be positive, we also have $\sup(AB)/b < \sup A$ and there exists $a \in A$ such that $\sup(AB)/b < a$, giving $\sup(AB) < ab$. This is clearly a contradiction.

Now to prove the property, suppose first that $r > 1$. By the statement we just proved, we need only show that

$$\{r^b : b \in \mathbb{Q} \text{ and } b \leq t + u\} = AB ,$$

where $A = \{r^c : c \in \mathbb{Q} \text{ and } c \leq t\}$ and $B = \{r^d : d \in \mathbb{Q} \text{ and } d \leq u\}$. (This is because these are sets of nonnegative numbers.) This holds because each rational $b \leq t + u$ can be written as a sum of two rationals $c, d$ such that $c \leq t$ and $d \leq u$.

For $0 < r < 1$ we have $r^{t+u} = (1/r)^{-(t+u)} = (1/r)^{(-t)+(-u)} = (1/r)^{-t} (1/r)^{-u} = r^t r^u$.

• $(rs)^t = r^t s^t$.

Proof. We first note that

$$r^{-t} = 1/r^t . \tag{8}$$

This is true because $r^{-t} r^t = r^0 = 1$, so $r^{-t} = 1/r^t$.

For the property, if $r, s > 1$ then we can just use equation (7), noting that $\{(rs)^a : a \in \mathbb{Q} \text{ and } a \leq t\} = AB$, where $A = \{r^a : a \in \mathbb{Q} \text{ and } a \leq t\}$ and $B = \{s^a : a \in \mathbb{Q} \text{ and } a \leq t\}$. If $0 < r < 1$ but $s > 1$ with $rs > 1$ we get $(rs)^t / r^t = (rs)^t (1/r)^t$, using (8) and the definition $r^{-t} = (1/r)^t$. We now use equation (7) again, noting that $\{s^a : a \in \mathbb{Q} \text{ and } a \leq t\} = AB$, where $A = \{(rs)^a : a \in \mathbb{Q} \text{ and } a \leq t\}$ and $B = \{(1/r)^a : a \in \mathbb{Q} \text{ and } a \leq t\}$. This gives $(rs)^t / r^t = s^t$. This same proof works if $r > 1$ but $0 < s < 1$ with $rs > 1$. If $0 < r < 1$ but $s > 1$ with $rs < 1$ we consider $s^t / (rs)^t = s^t (1/(rs))^t$ and use the above argument. This also works in the case $r > 1$ but $0 < s < 1$ with $rs < 1$. Finally, if $0 < r < 1$ and $0 < s < 1$ then $(rs)^t = (1/(rs))^{-t} = ((1/r)(1/s))^{-t} = (1/r)^{-t} (1/s)^{-t} = r^t s^t$.

• $(r^t)^u = r^{tu}$.


Proof. We will first show the equality in the case $r > 1$ and $t, u > 0$. We begin with the fact that $(r^t)^u$ is an upper bound for $\{r^a : a \in \mathbb{Q} \text{ and } a \leq tu\}$. So let $a \leq tu$ be rational and assume further that $a > 0$. In this case we can write $a = bc$ for positive $b, c \in \mathbb{Q}$ with $b \leq t$, $c \leq u$. By properties of rational exponents, we have $r^a = (r^b)^c$. As $r^b \leq r^t$ (by definition) we get from monotonicity that $(r^b)^c \leq (r^t)^c$. But this is an element of the set $\{(r^t)^d : d \in \mathbb{Q} \text{ and } d \leq u\}$, so $(r^t)^c \leq (r^t)^u$. Putting these together,

$$r^a = (r^b)^c \leq (r^t)^c \leq (r^t)^u .$$

This shows that $(r^t)^u$ is an upper bound for $\{r^a : a \in \mathbb{Q} \text{ and } 0 < a \leq tu\}$. For the case that $a \leq 0$ we can use monotonicity to write $r^a \leq r^0 \leq (r^t)^u$. Putting this together with the case $a > 0$ gives that $(r^t)^u$ is an upper bound for $\{r^a : a \in \mathbb{Q} \text{ and } a \leq tu\}$ and therefore $r^{tu} \leq (r^t)^u$.

To prove that $(r^t)^u \leq r^{tu}$ we must show that $r^{tu}$ is an upper bound for $\{(r^t)^a : a \in \mathbb{Q} \text{ and } a \leq u\}$. For this we observe that $r^t > 1$. This holds because $t > 0$ and therefore we can find some rational $b$ with $0 < b < t$, giving $r^t \geq r^b > r^0 = 1$. Now let $a$ be rational with $0 < a \leq u$; we claim that $(r^t)^a \leq r^{tu}$. Proving this will suffice since if $a \leq 0$ then $(r^t)^a \leq (r^t)^0 = 1 \leq r^{tu}$. To show the claim, note that if we show that $r^t \leq (r^{tu})^{1/a}$ we will be done. This is by properties of rational exponents: we would then have

$$(r^t)^a \leq \left( (r^{tu})^{1/a} \right)^a = r^{tu} .$$

So we are reduced to proving that

$$\sup\{r^b : b \in \mathbb{Q} \text{ and } b \leq t\} \leq (r^{tu})^{1/a} ,$$

which follows if we show that for each $b \in \mathbb{Q}$ such that $b \leq t$, we have $r^b \leq (r^{tu})^{1/a}$. Again, this is true if $r^{ab} \leq r^{tu}$ because then $r^b = (r^{ab})^{1/a} \leq (r^{tu})^{1/a}$. But $a \leq u$ and $b \leq t$ so $r^{ab} \leq r^{tu}$. This completes the proof of $(r^t)^u = r^{tu}$ in the case $r > 1$ and $t, u > 0$.

In the case $r > 1$ but $t > 0$ and $u < 0$, we can use (8):

$$(r^t)^u = 1/(r^t)^{-u} = 1/r^{-tu} = r^{tu} .$$

If instead $r > 1$ but $t < 0$ and $u > 0$,

$$(r^t)^u = (1/r^{-t})^u = 1/(r^{-t})^u = 1/r^{-tu} = r^{tu} .$$

Here we have used that for $s > 0$ and $x \in \mathbb{R}$, $(1/s)^x = 1/s^x$, which can be verified as $1 = (s(1/s))^x = s^x (1/s)^x$. Last if $r > 1$ but $t < 0$ and $u < 0$, we compute

$$(r^t)^u = ((1/r)^{-t})^u = 1/(r^{-t})^u = 1/r^{-tu} = r^{tu} ,$$

completing the proof in the case $r > 1$.

If $0 < r < 1$ then

$$(r^t)^u = ((1/r)^{-t})^u = (1/r)^{-tu} = r^{tu} .$$


• If $r > 1$ and $u \leq t$ then $r^u \leq r^t$. If $0 < r < 1$ and $u \leq t$ then $r^u \geq r^t$.

Proof. Assume $r > 1$. If $u = 0$ and $t > 0$ then we can find a rational $b$ such that $0 < b < t$, giving $r^t \geq r^b > r^0 = 1$. For general $u \leq t$ we note $1 \leq r^{t-u}$, so multiplying both sides by the (positive) $r^u$ we get the result.

If $0 < r < 1$ then $r^u = (1/r)^{-u} \geq (1/r)^{-t} = r^t$.

• If $s < r$ and $t > 0$ then $s^t < r^t$. If $s < r$ and $t < 0$ then $s^t > r^t$.

Proof. First consider the case that $s = 1$. Then $r > 1$ and for any $t > 0$ we can find a rational $b$ such that $0 < b < t$, giving $r^t \geq r^b > r^0 = 1$. For general $s < r$ we write $r^t = s^t (r/s)^t > s^t$. If $t < 0$ then $s^t = (1/s)^{-t} > (1/r)^{-t} = r^t$.

B Logarithm and exponential functions

B.1 Logarithm

We will use the integral definition of the natural logarithm. For x > 0 define

log x =

∫ x

1

1

tdt .

This is defined because 1/x is continuous on (0,∞).

Properties of logarithm.

• $\log 1 = 0$.

• $\log$ is $C^\infty$ on $(0,\infty)$.

• $\log$ is strictly increasing and therefore injective.

Proof. The derivative is $1/x$, which is positive.

• For $x, y > 0$, $\log(xy) = \log x + \log y$. Therefore $\log(1/x) = -\log x$.

Proof. For a fixed $y > 0$ define $f(x) = \log(xy) - \log y$. We have

$$f'(x) = \frac{y}{xy} = \frac{1}{x} = \frac{d}{dx}\log x .$$

Therefore $f(x) - \log x$ has zero derivative and must be a constant. Taking $x = 1$, we get

$$f(1) - \log 1 = \log y - \log y = 0 ,$$

so $f(x) = \log x$. This completes the proof.
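To see the integral definition in action, one can approximate $\int_1^x dt/t$ by a Riemann sum and compare the two sides of $\log(xy) = \log x + \log y$. This is only an illustrative sketch: the midpoint rule, the step count, and the name `log_by_integration` are assumptions made here, not part of the notes.

```python
def log_by_integration(x: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of 1/t from 1 to x, for x > 0."""
    # The width is negative when x < 1, which matches the convention that
    # the integral from 1 to x is minus the integral from x to 1.
    width = (x - 1.0) / steps
    total = 0.0
    for i in range(steps):
        midpoint = 1.0 + (i + 0.5) * width
        total += 1.0 / midpoint
    return total * width


x, y = 2.0, 3.0
lhs = log_by_integration(x * y)
rhs = log_by_integration(x) + log_by_integration(y)
print(lhs, rhs)                    # the two values agree to many digits
print(log_by_integration(1.0))     # 0.0
print(log_by_integration(0.5), -log_by_integration(2.0))  # log(1/x) = -log x
```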


• The range of $\log$ is $\mathbb{R}$.

Proof. We first claim that $\lim_{x\to\infty} \log x = \infty$. Because $\log$ is strictly increasing, it suffices to show that the set $\{\log x : x > 0\}$ is unbounded above. Note that

$$\log 2 = \int_1^2 \frac{1}{t}\,dt \ge \int_1^2 \frac{1}{2}\,dt = 1/2 .$$

Therefore $\log(2^n) = n \log 2 \ge n/2$. This proves the claim.

Because $\log$ is continuous and approaches infinity as $x \to \infty$, the intermediate value theorem, combined with the fact that $\log 1 = 0$, implies that the range of $\log$ includes $[0,\infty)$. Using $\log(1/x) = -\log x$, we get all of $\mathbb{R}$.

B.2 Exponential function

Because $\log$ is strictly increasing and differentiable, exercise 1, Chapter 7 implies that the inverse function of $\log$ exists and is differentiable. We define the inverse to be the exponential function:

$$\text{for } x \in \mathbb{R},\ e^x \text{ is the number such that } \log(e^x) = x .$$

Its derivative can be found using the chain rule:

$$x = \log(e^x), \quad \text{so} \quad 1 = \frac{1}{e^x}\,\frac{d}{dx}e^x ,$$

or $\frac{d}{dx}e^x = e^x$.
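The definition of $e^x$ as the inverse of $\log$ can be imitated numerically: since $\log$ is strictly increasing, the solution $y$ of $\log y = x$ can be located by bisection. In the sketch below, `math.log` stands in for the integral-defined logarithm, and the name `exp_by_inverting_log` and the tolerance are assumptions chosen for illustration.

```python
import math


def exp_by_inverting_log(x: float, tol: float = 1e-12) -> float:
    """Find y > 0 with log(y) = x by bisection, using that log is increasing."""
    # Bracket the solution: log(2**n) = n*log(2) >= n/2, so taking n >= 2|x|
    # gives log(2**-n) <= x <= log(2**n).
    n = max(1, math.ceil(2 * abs(x)))
    lo, hi = 2.0 ** (-n), 2.0 ** n
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if math.log(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(exp_by_inverting_log(0.0))   # close to 1
print(exp_by_inverting_log(1.0))   # close to e
a, b = 0.7, -1.3
print(exp_by_inverting_log(a + b),
      exp_by_inverting_log(a) * exp_by_inverting_log(b))
```

The bracket uses the bound $\log(2^n) \ge n/2$ from the proof that the range of $\log$ is $\mathbb{R}$.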

Properties of exponential.

• $e^0 = 1$.

• $e^x$ is $C^\infty$ on $\mathbb{R}$.

• For $x, y \in \mathbb{R}$, $e^{x+y} = e^x e^y$.

Proof. From properties of $\log$,

$$\log(e^{x+y}) = x + y = \log(e^x) + \log(e^y) = \log(e^x e^y) .$$

Since $\log$ is injective, this shows $e^{x+y} = e^x e^y$.

• $e^x > 0$ for all $x$. Therefore the exponential function is strictly increasing.

Proof. Because $e^x$ is the inverse function of $\log x$, which is defined on $(0,\infty)$, its range is $(0,\infty)$, giving $e^x > 0$. Since the derivative of $e^x$ is $e^x > 0$, the function is strictly increasing.


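• For any $x$,

$$e^x = \sum_{n=0}^\infty \frac{x^n}{n!} .$$

Proof. This follows from Taylor's theorem. For any $x$, the $n$-th derivative of the exponential function evaluated at $x$ is simply $e^x$. Therefore, writing $f(x) = e^x$ and expanding at $0$, for any $N \ge 1$,

$$e^x = \sum_{n=0}^{N-1} \frac{f^{(n)}(0)}{n!} x^n + \frac{f^{(N)}(c_N)}{N!} x^N = \sum_{n=0}^{N-1} \frac{x^n}{n!} + \frac{e^{c_N}}{N!} x^N ,$$

with $c_N$ some number between $0$ and $x$. Since $e^{c_N} \le e^{|x|}$, this remainder term is bounded by

$$\left| \frac{e^{c_N}}{N!} x^N \right| \le e^{|x|} \frac{|x|^N}{N!} \to 0 ,$$

because $|x|^N/N! \to 0$ as $N \to \infty$. This follows because the ratio test gives convergence of $\sum |x|^n/n!$, so the $n$-th term must go to $0$. By the corollary to Taylor's theorem, we get $e^x = \sum_{n=0}^\infty \frac{x^n}{n!}$.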
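The remainder estimate in this proof can be observed directly. The sketch below compares the error of the partial sum $\sum_{n=0}^{N-1} x^n/n!$ with the bound $e^{|x|}|x|^N/N!$; the function names and the sample point $x = -3$ are illustrative assumptions, and `math.exp` is used as the reference value.

```python
import math


def exp_partial_sum(x: float, N: int) -> float:
    """Sum of x**n / n! for n = 0, ..., N-1."""
    total, term = 0.0, 1.0
    for n in range(N):
        total += term
        term *= x / (n + 1)   # turns x**n/n! into x**(n+1)/(n+1)!
    return total


def remainder_bound(x: float, N: int) -> float:
    """The bound e**|x| * |x|**N / N! on the N-th Taylor remainder."""
    return math.exp(abs(x)) * abs(x) ** N / math.factorial(N)


x = -3.0
for N in (5, 10, 20, 30):
    error = abs(math.exp(x) - exp_partial_sum(x, N))
    print(N, error, remainder_bound(x, N))
```

For each `N` the observed error stays below the bound, and both shrink rapidly once `N` exceeds `|x|`.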

• Writing $e = e^1$, the exponential function is the $x$-th power of $e$ (defined earlier in terms of suprema).

Proof. For ease of reading, write $\exp(x)$ for the function we have defined here and $e^x$ for the $x$-th power of $e$, defined in terms of suprema. Then for $x = m/n \in \mathbb{Q}$ with $n \in \mathbb{N}$, we have

$$(\exp(x))^n = \exp(m/n)^n = \exp(m) = \exp(1)^m = e^m .$$

Because $e^{m/n}$ was defined as the unique positive number $y$ such that $y^n = e^m$, we have $\exp(x) = e^x$. Generally for $x \in \mathbb{R}$ we defined

$$e^x = \sup\{e^q : q \in \mathbb{Q} \text{ and } q < x\} .$$

(This was the definition for exponents whose bases are $\ge 1$, which is true in our case because $e^1 \ge e^0 = 1$.) Using equivalence over rationals,

$$e^x = \sup\{\exp(q) : q \in \mathbb{Q} \text{ and } q < x\} .$$

However $\exp$ is an increasing function, so writing $S$ for the set whose supremum we take above, $\exp(x) \ge \sup S = e^x$. On the other hand, because $\exp$ is continuous at $x$, we can pick any sequence $q_n$ of rationals converging up to $x$ and we have $\exp(q_n) \to \exp(x)$. This implies that $\exp(x) - r$ is not an upper bound for $S$ for any $r > 0$ and therefore $\exp(x) = \sup S = e^x$.


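We now show that the exponential function can be attained by the standard limit

$$e^x = \lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^n .$$

First we use the binomial formula

$$\left(1 + \frac{x}{n}\right)^n = \sum_{j=0}^n \binom{n}{j} \left(\frac{x}{n}\right)^j = \sum_{j=0}^n \frac{n!}{j!(n-j)!} \left(\frac{x}{n}\right)^j = \sum_{j=0}^n \frac{n(n-1)\cdots(n-j+1)}{n^j}\,\frac{x^j}{j!} .$$

To show the limit, let $\varepsilon > 0$. By convergence of $\sum_{j=0}^\infty \frac{|x|^j}{j!}$, we may choose $J$ such that

$$\sum_{j=J+1}^\infty \frac{|x|^j}{j!} < \varepsilon/3 .$$

Because $\sum_{j=0}^J \frac{n(n-1)\cdots(n-j+1)}{n^j}\,\frac{x^j}{j!}$ is a finite sum and the $j$-th term approaches $\frac{x^j}{j!}$ as $n \to \infty$, we can pick $N$ such that if $n \ge N$ then

$$\left| \sum_{j=0}^J \frac{n(n-1)\cdots(n-j+1)}{n^j}\,\frac{x^j}{j!} - \sum_{j=0}^J \frac{x^j}{j!} \right| < \varepsilon/3 .$$

Thus by the triangle inequality, we find for $n \ge N$,

$$\begin{aligned}
\left| \left(1 + \frac{x}{n}\right)^n - \sum_{j=0}^\infty \frac{x^j}{j!} \right|
&\le \left| \sum_{j=J+1}^\infty \frac{x^j}{j!} \right|
+ \left| \sum_{j=0}^J \frac{n(n-1)\cdots(n-j+1)}{n^j}\,\frac{x^j}{j!} - \sum_{j=0}^J \frac{x^j}{j!} \right|
+ \left| \sum_{j=J+1}^n \frac{n(n-1)\cdots(n-j+1)}{n^j}\,\frac{x^j}{j!} \right| \\
&< 2\varepsilon/3 + \sum_{j=J+1}^n \frac{|x|^j}{j!} \\
&< \varepsilon .
\end{aligned}$$

(The last sum was bounded using $0 \le \frac{n(n-1)\cdots(n-j+1)}{n^j} \le 1$.)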
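The two ingredients of this argument, the convergence of $(1 + x/n)^n$ and the fact that each binomial weight $n(n-1)\cdots(n-j+1)/n^j$ lies in $[0,1]$ and tends to $1$, can be illustrated numerically. This is a hedged sketch: the function names and the sample value $x = 1.5$ are assumptions, and `math.exp` serves as the reference.

```python
import math


def compound_limit(x: float, n: int) -> float:
    """The n-th term (1 + x/n)**n of the sequence converging to e**x."""
    return (1.0 + x / n) ** n


def binomial_weight(n: int, j: int) -> float:
    """The factor n(n-1)...(n-j+1) / n**j multiplying x**j/j! in the expansion."""
    weight = 1.0
    for i in range(j):
        weight *= (n - i) / n
    return weight


x = 1.5
for n in (10, 1_000, 100_000):
    print(n, compound_limit(x, n), abs(compound_limit(x, n) - math.exp(x)))

# For fixed j the weight increases to 1 as n grows.
print([binomial_weight(n, 5) for n in (10, 1_000, 100_000)])
```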

B.3 Sophomore’s dream

We end this appendix with a strange identity that is for some reason called the “Sophomore’s dream.” It is

$$\int_0^1 x^{-x}\,dx = \sum_{n=1}^\infty n^{-n} .$$


To prove this, we need to define the function $x^{-x}$. It is given by $x^{-x} = \exp(-x \log x)$. So the identity reads

$$\int_0^1 e^{-x \log x}\,dx = \sum_{n=1}^\infty n^{-n} .$$

For this integral to make sense, as the integrand is not defined at $x = 0$, we must use the right limit. Note that by l’Hôpital’s rule (which we didn’t cover but it is in Rudin),

$$\lim_{x\to 0^+} -x \log x = \lim_{x\to 0^+} \frac{-\log x}{1/x} = \lim_{x\to 0^+} \frac{-1/x}{-1/x^2} = 0 .$$

Using continuity of the exponential function, we find

$$\lim_{x\to 0^+} x^{-x} = 1 ,$$

so we can continuously extend $x^{-x}$ to $0$ by defining $0^0 = 1$. Thus $x^{-x}$ is continuous on $[0, 1]$ and integrable.

To find the integral we will use a power series expansion of $e^x$:

$$e^x = \sum_{n=0}^\infty \frac{x^n}{n!} ,$$

which has radius of convergence $R = \infty$. Therefore by the remark after exercise 6, Chapter 8, for any $M > 0$, this series converges uniformly for $x$ in $[-M, M]$. (The proof uses the Weierstrass $M$-test.) Because the number $|x \log x|$ is bounded by $e^{-1}$ on the interval $[0, 1]$ (do some calculus),

$$e^{-x \log x} = \sum_{n=0}^\infty \frac{(-x \log x)^n}{n!} \quad \text{converges uniformly on } [0, 1] .$$

We now use exercise 5, Chapter 8, which says that if $(f_n)$ is a sequence of continuous functions that converges uniformly on $[0, 1]$ to a function $f$ then $\int_0^1 f_n(x)\,dx \to \int_0^1 f(x)\,dx$. Noting that an infinite series of functions is just a limit of the sequence of partial sums (which converges uniformly in our case), we get

$$\int_0^1 x^{-x}\,dx = \sum_{n=0}^\infty \int_0^1 \frac{(-x \log x)^n}{n!}\,dx = \sum_{n=0}^\infty \frac{1}{n!} \int_0^1 (-x \log x)^n\,dx .$$

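Now we compute the integral $\int_0^1 (-x \log x)^n\,dx$ using integration by parts. We take $u = (-\log x)^n$ and $dv = x^n\,dx$ to get $du = -n(-\log x)^{n-1}/x\,dx$ and $v = x^{n+1}/(n+1)$:

$$\begin{aligned}
\int_0^1 (-x \log x)^n\,dx &= \left. \frac{(-\log x)^n x^{n+1}}{n+1} \right|_0^1 + \frac{n}{n+1} \int_0^1 x^n (-\log x)^{n-1}\,dx \\
&= \frac{n}{n+1} \int_0^1 x^n (-\log x)^{n-1}\,dx .
\end{aligned}$$

The boundary term vanishes for $n \ge 1$ because $\log 1 = 0$ and $x^{n+1} (\log x)^n \to 0$ as $x \to 0^+$.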


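Repeating this (each step lowers the power of $-\log x$ by one and contributes a factor $k/(n+1)$, for $k = n, n-1, \ldots, 1$, leaving $\int_0^1 x^n\,dx = 1/(n+1)$), we find

$$\int_0^1 (-x \log x)^n\,dx = \frac{n!}{(n+1)^{n+1}} .$$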
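The closed form for $\int_0^1 (-x\log x)^n\,dx$ can be sanity-checked by numerical integration. The sketch below uses a midpoint rule, which never evaluates the integrand at $x = 0$; the function name and the step count are illustrative assumptions.

```python
import math


def integral_neg_xlogx_power(n: int, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of (-x log x)**n over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h          # midpoints avoid evaluating log at 0
        total += (-x * math.log(x)) ** n
    return total * h


for n in range(4):
    closed_form = math.factorial(n) / (n + 1) ** (n + 1)
    print(n, integral_neg_xlogx_power(n), closed_form)
```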

So plugging back in, we find

$$\int_0^1 x^{-x}\,dx = \sum_{n=0}^\infty \frac{1}{(n+1)^{n+1}} = \sum_{n=1}^\infty n^{-n} .$$
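As a final sanity check, both sides of the Sophomore's dream can be approximated numerically. This is a sketch under stated assumptions: the integral is approximated by a midpoint rule with an arbitrarily chosen step count, the series is truncated after 20 terms (its terms decay extremely fast), and the function names are made up for illustration.

```python
import math


def sophomores_integral(steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of x**(-x) over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += math.exp(-x * math.log(x))   # x**(-x) = exp(-x log x)
    return total * h


def sophomores_series(terms: int = 20) -> float:
    """Partial sum of n**(-n) for n = 1, ..., terms."""
    return sum(float(n) ** (-n) for n in range(1, terms + 1))


print(sophomores_integral())
print(sophomores_series())
```

Both printed values agree to several decimal places, consistent with the identity.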

C Dimension of the Cantor set

In this section we will discuss how to assign a dimension to the Cantor set. One way is through the use of Hausdorff dimension. We will start with definitions and examples. This treatment is based on notes of J. Shah from UChicago.

C.1 Definitions

For any set $S \subset \mathbb{R}$ write $|S|$ for the diameter of $S$:

$$|S| = \sup\{|x - y| : x, y \in S\} .$$

For example, we have $|[0, 1]| = 1$, $|\mathbb{Q} \cap [0, 1]| = 1$ and $|(0, 1) \cup (2, 3)| = 3$.

Definition C.1.1. Let $S \subset \mathbb{R}$. A countable collection $\{C_n\}$ of subsets of $\mathbb{R}$ is called a countable cover of $S$ if

$$S \subset \bigcup_{n=1}^\infty C_n .$$

Note that the sets in a countable cover can be any sets whatsoever. For example, they do not need to be open or closed.

Definition C.1.2. If $\{C_n\}$ is a countable collection of sets in $\mathbb{R}$ and $\alpha > 0$, the $\alpha$-total length of $\{C_n\}$ is

$$\sum_{n=1}^\infty |C_n|^\alpha .$$

If $\alpha > 1$ then raising to the power $\alpha$ has the effect of increasing the diameter (that is, $|C_n|^\alpha > |C_n|$) when $|C_n|$ is large (bigger than 1) and decreasing it when $|C_n|$ is small (less than 1).

Example 1. Consider the interval $[0, 1]$. Let us build a very simple cover of this set by fixing $n$ and choosing our (finite) cover $\{C_1, \ldots, C_n\}$ by

$$C_i = \left[ \frac{i-1}{n}, \frac{i}{n} \right] .$$

For instance, for $n = 4$ we have

$$[0, 1/4],\ [1/4, 1/2],\ [1/2, 3/4] \text{ and } [3/4, 1] .$$


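Computing the $\alpha$-total length of this cover:

$$\sum_{i=1}^n \left| \left[ \frac{i-1}{n}, \frac{i}{n} \right] \right|^\alpha = \frac{n}{n^\alpha} .$$

The limit as $n$ approaches $\infty$ is

$$\begin{cases} \infty & \text{if } \alpha < 1 \\ 1 & \text{if } \alpha = 1 \\ 0 & \text{if } \alpha > 1 \end{cases} .$$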
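The three regimes in Example 1 are easy to tabulate. The sketch below computes the $\alpha$-total length of the uniform cover of $[0,1]$ by $n$ intervals of diameter $1/n$; the helper names are assumptions introduced for illustration.

```python
def alpha_total_length(diameters, alpha: float) -> float:
    """Sum of |C|**alpha over a finite list of diameters |C|."""
    return sum(d ** alpha for d in diameters)


def uniform_cover_diameters(n: int):
    """Diameters of the n intervals [(i-1)/n, i/n] covering [0, 1]."""
    return [1.0 / n] * n


for alpha in (0.5, 1.0, 2.0):
    lengths = [alpha_total_length(uniform_cover_diameters(n), alpha)
               for n in (10, 100, 10_000)]
    print(alpha, lengths)
```

The values grow like $n^{1-\alpha}$ for $\alpha < 1$, stay near $1$ for $\alpha = 1$, and shrink for $\alpha > 1$.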

This result gives us some hint that the dimension of a set is related to the $\alpha$-total length of countable covers of the set. Specifically we make the following definition:

Definition C.1.3. If $S \subset \mathbb{R}$ has $|S| < \infty$ and $\alpha > 0$ we define the $\alpha$-covered length of $S$ as

$$H_\alpha(S) = \inf\left\{ \sum_{n=1}^\infty |C_n|^\alpha : \{C_n\} \text{ is a countable cover of } S \right\} .$$

The Hausdorff dimension is defined as

$$\dim_H(S) = \inf\{\alpha > 0 : H_\alpha(S) = 0\} .$$

It is an exercise to show that for all $0 < \alpha < \dim_H(S)$, we have $H_\alpha(S) > 0$. Also, setting $0^0 = 1$, we have $H_0(S) > 0$ for all nonempty $S$. Thus we could define the Hausdorff dimension as

$$\sup\{\alpha \ge 0 : H_\alpha(S) > 0\} .$$

Note that Example 1 shows that $\dim_H([0, 1]) \le 1$. To show the other inequality, we must show that for all $\alpha < 1$, $H_\alpha([0, 1]) > 0$. To do this, let $\{C_n\}$ be a countable cover of $[0, 1]$. We may replace the $C_n$'s by $D_n = C_n \cap [0, 1]$, since the $D_n$'s will still cover $[0, 1]$ and will have no larger $\alpha$-total length. For $\alpha < 1$ we then have

$$\sum_{n=1}^\infty |D_n|^\alpha \ge \sum_{n=1}^\infty |D_n| ,$$

because $|D_n| \le 1$. Now it suffices to show the following.

Lemma C.1.4. If $\{D_n\}$ is a countable cover of $[0, 1]$ then

$$\sum_{n=1}^\infty |D_n| \ge 1 .$$

Proof. The proof is an exercise.


Assuming the lemma, we have $H_\alpha([0, 1]) \ge 1$ for all $\alpha < 1$ and therefore $\dim_H([0, 1]) = 1$.

If the concept of Hausdorff dimension is to agree with our current notion of dimension, it had better be that each subset of $\mathbb{R}$ has dimension no bigger than 1. This is indeed the case; we can argue similarly to before. If $S \subset \mathbb{R}$ has $|S| < \infty$ then we can find $M > 0$ such that $S \subset [-M, M]$. Now for each $n$ define a cover $\{C_1, \ldots, C_n\}$ by

$$C_i = \left[ -M + 2M\,\frac{i-1}{n},\ -M + 2M\,\frac{i}{n} \right] .$$

As before, for $\alpha > 1$, the $\alpha$-total length of $\{C_1, \ldots, C_n\}$ is

$$n \left[ \frac{2M}{n} \right]^\alpha \to 0 \text{ as } n \to \infty .$$

Therefore $H_\alpha(S) = 0$ for all $\alpha > 1$ and $\dim_H(S) \le 1$.

Example 2. Take $S$ to be any countable set with finite diameter (for instance the rationals in $[0, 1]$). We claim that $\dim_H(S) = 0$. To show this we must prove that for all $\alpha > 0$, $H_\alpha(S) = 0$. Let $\varepsilon > 0$ and define a countable cover of $S$ by first enumerating the elements of $S$ as $\{s_1, s_2, \ldots\}$ and, for $n \in \mathbb{N}$, letting $C_n$ be any interval containing $s_n$ of length $(\varepsilon/2^n)^{1/\alpha}$ (note that this is a positive number). Then the $\alpha$-total length of the cover is

$$\sum_{n=1}^\infty \frac{\varepsilon}{2^n} = \varepsilon ;$$

therefore $H_\alpha(S) \le \varepsilon$. This is true for all $\varepsilon > 0$ so $H_\alpha(S) = 0$.
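The cover in Example 2 can be imitated with a finite truncation: the $n$-th covering interval has diameter $(\varepsilon/2^n)^{1/\alpha}$, so its contribution to the $\alpha$-total length is $\varepsilon/2^n$ regardless of $\alpha$. The sketch below sums the first few contributions; the function name, the truncation at 40 terms, and the sample values of $\varepsilon$ and $\alpha$ are assumptions for illustration.

```python
def example2_total_length(eps: float, alpha: float, terms: int = 40) -> float:
    """alpha-total length of the first `terms` intervals of the cover in Example 2."""
    total = 0.0
    for n in range(1, terms + 1):
        diameter = (eps / 2 ** n) ** (1.0 / alpha)   # length of the n-th interval
        total += diameter ** alpha                    # contributes eps / 2**n
    return total


for alpha in (0.1, 0.5, 1.0):
    print(alpha, example2_total_length(0.01, alpha))
```

For every tested $\alpha$ the total stays just below $\varepsilon = 0.01$, matching the geometric series.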

C.2 The Cantor set

Let $S$ be the Cantor set. To remind you, the construction is as follows. We start with $S_0 = [0, 1]$. We remove the middle third of $S_0$ to get $S_1 = [0, 1/3] \cup [2/3, 1]$. In general, at the $k$-th step we have a set $S_k$ which is a union of $2^k$ intervals of length $3^{-k}$. We then remove the middle third of each interval to get $S_{k+1}$. The definition of $S$ is

$$S = \bigcap_{k=0}^\infty S_k .$$

Theorem C.2.1. The Hausdorff dimension of the Cantor set is

$$\dim_H(S) = \frac{\log 2}{\log 3} = \log_3 2 .$$

Proof. Set $\alpha = \log 2/\log 3$. We first prove that $\dim_H(S) \le \alpha$. For this we must show that if $\beta > \alpha$ then $H_\beta(S) = 0$. Pick $k \ge 0$ and let $I_1, \ldots, I_{2^k}$ be the intervals of length $3^{-k}$ that comprise $S_k$, the set at the $k$-th level of the construction of the Cantor set. Since $S \subset S_k$, this is a cover of $S$. We compute the $\beta$-total length of the cover. It is

$$\sum_{j=1}^{2^k} |I_j|^\beta = \sum_{j=1}^{2^k} 3^{-\beta k} = 2^k\, 3^{-\beta k} = e^{k[\log 2 - \beta \log 3]} ,$$

and this approaches zero as $k \to \infty$, because $\beta > \alpha$ gives $\log 2 - \beta \log 3 < 0$. Note that we have used above that, for example, $2^k = e^{k \log 2}$. Therefore $H_\beta(S) = 0$ and $\dim_H(S) \le \alpha$.
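The role of the critical exponent $\alpha = \log 2/\log 3$ shows up clearly if one tabulates the $\beta$-total length $2^k 3^{-\beta k}$ of the level-$k$ cover for exponents below, at, and above $\alpha$. The following sketch does this; the function name and the sample exponents are illustrative assumptions.

```python
import math

ALPHA = math.log(2) / math.log(3)   # the claimed Hausdorff dimension


def cantor_cover_length(k: int, beta: float) -> float:
    """beta-total length of the 2**k intervals of length 3**(-k) making up S_k."""
    return 2 ** k * (3.0 ** (-k)) ** beta


for beta in (0.5, ALPHA, 0.7):
    print(round(beta, 4), [cantor_cover_length(k, beta) for k in (1, 10, 30)])
```

Below $\alpha$ the totals grow with $k$, at $\alpha$ they stay at $1$ up to rounding, and above $\alpha$ they decay to $0$.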

For the other direction (to prove $\dim_H(S) \ge \alpha$) we will show that $H_\alpha(S) > 0$. Let $\{C_n\}$ be a countable cover of $S$. We will give a bound on the $\alpha$-total length of $\{C_n\}$. As before, we may assume that each $C_n$ is actually a subset of $[0, 1]$. By compactness one can show the following:

Lemma C.2.2. Given $\varepsilon > 0$ there exist finitely many open intervals $D_1, \ldots, D_m$ such that $S \subset \bigcup_{j=1}^m D_j$ and

$$\sum_{j=1}^m |D_j|^\alpha < \sum_{n=1}^\infty |C_n|^\alpha + \varepsilon .$$

Proof. The proof is an exercise. The idea is to first replace the $C_n$'s by closed intervals and then slightly widen them, while making them open. Then use compactness.

Now choose $k$ such that

$$\left( \frac{1}{3} \right)^k \le \min\{|D_j| : j = 1, \ldots, m\} .$$

For $l = 1, \ldots, k$ let $N_l$ be the number of sets $D_j$ such that $3^{-l} \le |D_j| < 3^{-l+1}$. Using $\alpha = \log 2/\log 3$ and the definition of $k$, we find

$$\sum_{j=1}^m |D_j|^\alpha \ge \sum_{l=1}^k N_l\, 3^{-l\alpha} = \sum_{l=1}^k N_l\, 2^{-l} , \tag{9}$$

so we will give a lower bound for the right side. Suppose that $D_j$ has $3^{-l} \le |D_j| < 3^{-l+1}$. Then $D_j$ can intersect at most 2 of the intervals in $S_l$, the $l$-th step in the construction of the Cantor set. Since each of these intervals produces $2^{k-l}$ subintervals at the $k$-th step of the construction, we find that $D_j$ intersects at most $2 \cdot 2^{k-l}$ subintervals at the $k$-th step of the construction. But there are $2^k$ subintervals at the $k$-th step, and each of them contains a point of $S$ and therefore intersects some $D_j$. So we find

$$2^k \le \sum_{l=1}^k N_l\, 2 \cdot 2^{k-l} \quad \text{or} \quad \frac{1}{2} \le \sum_{l=1}^k N_l\, 2^{-l} .$$

Combining this with (9),

$$\sum_{j=1}^m |D_j|^\alpha \ge 1/2 .$$

Now using the previous lemma with $\varepsilon = 1/4$, $\sum_{n=1}^\infty |C_n|^\alpha > 1/4$ and $H_\alpha(S) \ge 1/4$. Thus $\dim_H(S) \ge \alpha$.
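The two counting facts used in this argument can be spot-checked with exact rational arithmetic: a window of length just under $3^{-l+1}$ meets at most two intervals of $S_l$, and each interval of $S_l$ contains $2^{k-l}$ intervals of $S_k$. The sketch below only samples windows that start at endpoints of level-$l$ intervals, so it is an illustration rather than a proof; all function names and the `slack` parameter are assumptions.

```python
from fractions import Fraction


def cantor_intervals(level: int):
    """Closed intervals (a, b) making up the level-th step S_level, left to right."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(level):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))      # keep the left third
            refined.append((b - third, b))      # keep the right third
        intervals = refined
    return intervals


def count_hits(window, intervals) -> int:
    """Number of closed intervals that intersect the closed window."""
    lo, hi = window
    return sum(1 for a, b in intervals if a <= hi and lo <= b)


def max_hits(l: int, slack=Fraction(1, 10**6)) -> int:
    """Largest hit count among sampled windows of length 3**(1-l) - slack."""
    level = cantor_intervals(l)
    length = Fraction(1, 3 ** (l - 1)) - slack
    starts = [a for a, _ in level] + [b for _, b in level]
    return max(count_hits((s, s + length), level) for s in starts)


def subintervals_inside(parent, finer) -> int:
    """Number of intervals in `finer` contained in the interval `parent`."""
    a, b = parent
    return sum(1 for c, d in finer if a <= c and d <= b)


print([max_hits(l) for l in range(1, 6)])
print([subintervals_inside(p, cantor_intervals(5)) for p in cantor_intervals(2)])
```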


C.3 Exercises

1. Prove Lemma C.1.4.

2. Prove Lemma C.2.2.

3. Prove that if $S \subset \mathbb{R}$ with $|S| < \infty$ has nonempty interior then $\dim_H(S) = 1$.

4. What is the Hausdorff dimension of a modified Cantor set where we remove the middle 1/9-th of our intervals?

5. What is the Hausdorff dimension of the modified Cantor set from exercise 15, Chapter 3?
