Lectures on Real Analysis I

Stephan Ramon Garcia
Department of Mathematics
Pomona College
610 N. College Ave.
Claremont, CA 91711-6348

Preliminary Version
Last Revised: January 21, 2008

http://pages.pomona.edu/~sg064747
[email protected]



Contents

Lecture 1. Introduction
    1.1. Preliminaries
    1.2. Hippasus’ Theorem
    1.3. A Nonconstructive Proof
    1.4. A Third Proof of Hippasus’ Theorem

Lecture 2. The Archimedean Property and Its Consequences
    2.1. The Archimedean Property
    2.2. The Binomial Theorem & Bernoulli’s Inequality
    2.3. An Analytic Proof of Hippasus’ Theorem

Lecture 3. The Least Upper Bound Property
    3.1. The Least Upper Bound Property
    3.2. The Existence of $\sqrt{2}$

Lecture 4. Monotone Sequences and Series
    4.1. The Monotone Sequence Property and Infinite Series
    4.2. Series with Non-negative Terms and Decimal Expansions

Lecture 5. Bijections
    5.1. Counting Without Counting
    5.2. Galileo’s Paradox
    5.3. Injections
    5.4. Surjections
    5.5. Bijections

Lecture 6. Cardinality
    6.1. Cardinality
    6.2. Countable Sets

Lecture 7. Cantor’s Theorem
    7.1. Constructions with Countable Sets
    7.2. Cantor’s Diagonal Argument

Lecture 8. The Continuum Hypothesis
    8.1. Cantor’s Powerset Theorem
    8.2. Russell’s Paradox
    8.3. The Continuum Hypothesis
    8.4. Digression on Geometry

Lecture 9. Normed Vector Spaces
    9.1. Vector Spaces
    9.2. Norms on Vector Spaces

Lecture 10. Metric Spaces
    10.1. Metric Spaces
    10.2. Convergent Sequences

Lecture 11. Subsequences, Continuity
    11.1. Subsequences
    11.2. Continuity

Lecture 12. Sequences and Continuity
    12.1. Sequential Characterization of Continuity
    12.2. Continuity and Composition
    12.3. Limit, Accumulation, and Isolated Points

Lecture 13. Closed Sets
    13.1. Limit, Accumulation, and Isolated Points
    13.2. Closed Sets

Lecture 14. Open Sets
    14.1. Closed Sets
    14.2. Open Sets

Lecture 15. Set Operations with Open and Closed Sets
    15.1. Complements of Open and Closed Sets
    15.2. Set Operations with Open and Closed Sets

Lecture 16. Topological Characterization of Continuity
    16.1. Inverse Images
    16.2. Topological Characterization of Continuity

Lecture 17. Cauchy Sequences
    17.1. Cauchy Sequences
    17.2. Completeness

Lecture 18. Completeness
    18.1. Completions of Metric Spaces

Lecture 19. Infinite Series
    19.1. Cauchy Criterion for Series
    19.2. The Divergence and Comparison Tests

Lecture 20. Infinite Series
    20.1. An Extended Example

Lecture 21. Integral Test
    21.1. The Harmonic Series and Integral Test

Lecture 22. Alternating Series
    22.1. The Alternating Series Test
    22.2. Manipulating Series

Lecture 23. Rearrangements of Series
    23.1. Rearrangements of Series
    23.2. Cauchy Products of Series

Lecture 24. Products of Series
    24.1. Cauchy Products of Series
    24.2. The Cauchy Product of Convergent Series Can Diverge!
    24.3. The Euler Product Formula
    24.4. Euler’s Refinement of Euclid’s Theorem

Lecture 25. Compactness
    25.1. Compactness
    25.2. Compact Sets in $\mathbb{R}^n$

Lecture 26. The Cantor Set
    26.1. The Cantor Set
    26.2. The Cantor Ternary Function
    26.3. Cantor Set Trivia

Lecture 27. Compactness and Continuity
    27.1. Continuity and Compactness
    27.2. Uniform Continuity

Lecture 28. Uniform Continuity
    28.1. Nested Compact Sets

Lecture 29. Contraction Mapping Principle
    29.1. The Contraction Mapping Principle

Lecture 30. Derivatives
    30.1. Derivatives
    30.2. Basic Theorems

Lecture 31. Mean Value Theorem
    31.1. Basic Theorems

Lecture 32. Functions Behaving Badly
    32.1. Functions Behaving Badly

Lecture 33. Uniform Convergence
    33.1. Pointwise Convergence
    33.2. Uniform Convergence

Lecture 34. Uniform Convergence
    34.1. Completeness of $C(X)$

Lecture 35. Weierstrass M-test
    35.1. Weierstraß M-Test
    35.2. Weierstrass Approximation Theorem
    35.3. Cauchy’s Mean Value Theorem

Lecture 36. L’Hopital’s Rule and Taylor’s Theorem
    36.1. L’Hopital’s Rule
    36.2. Taylor’s Theorem

Lecture 37. Taylor Series
    37.1. Smoothness Classes
    37.2. Some Smooth Functions

Lecture 38. Initial Value Problems
    38.1. Existence and Uniqueness of Solutions

Lecture 39. Picard Iteration
    39.1. Initial Value Problems
    39.2. Extended Example

Appendix A. Basic Logic
    A.1. Primitive Concepts
    A.2. Negation (NOT)
    A.3. Conjunction (AND)
    A.4. Disjunction (OR)
    A.5. Manipulating Propositions
    A.6. Implication (P ⇒ Q)
    A.7. Converse (P ⇐ Q)
    A.8. If and only if (⇔)
    A.9. Contrapositive

Appendix B. Basic Set Theory
    B.1. Sets
    B.2. Using Properties to define Sets
    B.3. Russell’s Paradox
    B.4. Quantifiers
    B.5. Negating Propositions With Quantifiers
    B.6. Subsets
    B.7. Complement, Union, and Intersection
    B.8. Ordered Pairs
    B.9. Cartesian Products
    B.10. Power Sets
    B.11. Concerning Exceptional Penguins

Appendix C. Mathematical Induction
    C.1. The Power Sum Problem
    C.2. Mathematical Induction
    C.3. The Binomial Coefficient $\binom{n}{k}$
    C.4. Pascal’s Triangle
    C.5. The Binomial Theorem
    C.6. Bernoulli’s Solution to the Power Sum Problem

Appendix D. Ordered Fields
    D.1. Fields
    D.2. Ordered Fields

Appendix E. Prime Numbers
    E.1. Euclid’s Theorem
    E.2. The Prime Number Theorem

Appendix F. Galileo’s Paradox

Appendix G. Inner Product Spaces
    G.1. Review: The Dot Product
    G.2. Inner Products
    G.3. Norms Defined by Inner Products
    G.4. Orthogonal Vectors
    G.5. The Cauchy-Schwarz-Bunyakowsky Inequality
    G.6. The Triangle Inequality

Appendix H. Covering Compactness
    H.1. Covering Compactness
    H.2. Covering Compactness = Sequential Compactness
    H.3. Total Boundedness

LECTURE 1

Introduction

1.1. Preliminaries

Since the real number system (denoted by $\mathbb{R}$) is basic to real analysis, we need to know exactly what real numbers are. As we will see, this is a far less trivial problem than it first appears and it deserves serious consideration.

Although to some, the rigorous construction of the real number system can be endlessly fascinating, to others it may appear tedious and pedantic. There is a certain undeniable beauty to seeing the real number system built from the ground up, using logic and set theory alone. On the other hand, while the grand scheme may be inspiring, many of the details are quite mechanical and uninteresting. We will content ourselves with some sort of middle ground, leaving some of the details to the homework and later lectures, while omitting others altogether.

Since we all believe that $\mathbb{R}$ exists, in some form or another, we will introduce the real numbers somewhat axiomatically. This means that we will not go into the details of justifying why such a number system exists; in proving the existence of $\mathbb{R}$ we would stray too far into the realm of set theory and away from real analysis itself. We will simply state the basic properties of $\mathbb{R}$ as axioms and highlight their importance.

Among other things, the real number system contains a number of distinguished subsets:

Definition. $\mathbb{R}$ denotes the set of real numbers, $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$ denotes the set of natural numbers, $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$ denotes the set of integers¹, and $\mathbb{Q}$ denotes the set of rational numbers (fractions).

In particular, note that our definition of $\mathbb{N}$ includes the number $0$. This is somewhat standard, as far as set theory and logic go, but in other branches of mathematics you may see $\mathbb{N}$ introduced starting from $1$. This is not a major mathematical issue, but it is important to point out the notation that we will be using.

1.2. Hippasus’ Theorem

The complement of $\mathbb{Q}$ in $\mathbb{R}$ is the set

\[ \mathbb{I} = \mathbb{R} \setminus \mathbb{Q} = \{x \in \mathbb{R} : x \notin \mathbb{Q}\} \]

of all irrational numbers (i.e. real numbers which are not rational). It turns out that $\mathbb{I} \neq \emptyset$ (the empty set); in other words, irrational numbers exist. In particular, we will demonstrate that $\sqrt{2}$ (the length of the diagonal of a unit square) is irrational. While it appears that the original proof of this fact was due to Hippasus of Metapontum (who was a Pythagorean), the irrationality of $\sqrt{2}$ is often attributed to Pythagoras himself. Regardless of who thought of it, the proof is a standard example of proof by contradiction.

¹ The letter $\mathbb{Z}$ stands for the first letter of the German word Zahlen.


Theorem 1.1 (Hippasus of Metapontum). $\sqrt{2}$ is irrational.

Proof #1. Suppose toward a contradiction that $\sqrt{2}$ is rational. Let us write $\sqrt{2} = a/b$ where the fraction is reduced to lowest terms (so that the greatest common divisor $\gcd(a, b)$ of $a$ and $b$ is $1$). Squaring the preceding equation, we obtain $2b^2 = a^2$. This shows that $a^2$ is even, whence $a$ is even as well (since the square of an odd number is odd). Writing $a = 2c$, we find that $2b^2 = (2c)^2 = 4c^2$ and thus $b^2 = 2c^2$. This shows that $b^2$, and hence $b$ itself, is even. Therefore $a$ and $b$ are both divisible by $2$, a contradiction to the hypothesis that the fraction $a/b$ was reduced to lowest terms. Our initial assumption that $\sqrt{2}$ is rational is therefore false and we conclude that $\sqrt{2}$ is irrational. □

It is important to note that the preceding proof implicitly relied on the Fundamental Theorem of Arithmetic. There are several other basic properties of $\mathbb{N}$ that we often take for granted. Chief among these is the following:

Theorem 1.2 (Well-Ordering Property of $\mathbb{N}$). Every nonempty subset of $\mathbb{N}$ contains a smallest element. In other words, if $S \subseteq \mathbb{N}$ and $S \neq \emptyset$, then there exists $n \in S$ such that $n \le m$ for every $m \in S$.

The Well-Ordering Property of $\mathbb{N}$ can be proved using the Principle of Mathematical Induction, which can itself be proved from the axioms of set theory. However, we will not concern ourselves with such details. We now give another proof of Theorem 1.1 which relies on a minimality argument:

Proof #2. Suppose toward a contradiction that $\sqrt{2}$ is rational. Let $b$ be the smallest positive integer such that $\sqrt{2} = a/b$ for some $a \in \mathbb{Z}$. First observe that $a > b$ since otherwise

\[ 2b^2 = a^2 \le b^2 \]

whence $2 \le 1$, an absurdity. A few algebraic manipulations lead us to another representation of $\sqrt{2}$ as a rational number:

\[ \sqrt{2} = \sqrt{2}\left(\frac{\sqrt{2} - 1}{\sqrt{2} - 1}\right) = \frac{2 - \sqrt{2}}{\sqrt{2} - 1} = \frac{2 - \frac{a}{b}}{\frac{a}{b} - 1} = \frac{2b - a}{a - b}. \]

Since $a - b > 0$ and $\sqrt{2} > 0$, it follows from the preceding that $2b - a > 0$. However, $a > b$ and $2b - a > 0$ mean that

\[ 0 < a - b < b, \]

contradicting the minimality of $b$. We therefore conclude that $\sqrt{2}$ is irrational. □

1.3. A Nonconstructive Proof

Some proofs are constructive. In other words, the method of the proof can be used to explicitly construct examples of the objects which they assert exist. On the other hand, some proofs are nonconstructive in the sense that they establish the existence of something without directly constructing it. A particularly striking example of a nonconstructive proof is the following:

Theorem 1.3. There exist irrational numbers $a$ and $b$ so that $a^b$ is rational.

Proof. Let us consider the number $c = \sqrt{2}^{\sqrt{2}}$. This number is either rational or irrational and thus there are two possible cases to consider:

(i) If $c$ is rational, then let $a = b = \sqrt{2}$. In this case, $a^b = c$ is rational while $a$ and $b$ are irrational.

(ii) If $c$ is irrational, then let $a = c$ and $b = \sqrt{2}$. In this case, the usual rules for manipulating exponents yield

\[ a^b = \left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}} = \sqrt{2}^{\sqrt{2}\sqrt{2}} = \sqrt{2}^{2} = 2. \]

In this case, $a^b$ is rational while $a$ and $b$ are irrational.

Since both cases lead to the conclusion that there are irrational numbers $a$ and $b$ such that $a^b$ is rational, the proof is finished. □

Observe that the preceding proof does not tell us whether $c$ is irrational or not. It turns out that $c$ is irrational; this follows from the famed Gelfond-Schneider Theorem, a deep and difficult result in the theory of transcendental numbers.

1.4. A Third Proof of Hippasus’ Theorem

Before giving our third proof of Hippasus’ Theorem, we need a few preliminaries. A particularly familiar application of the Well-Ordering Property is the following:

Theorem 1.4 (Division Algorithm). Given $a, b \in \mathbb{Z}$ with $b > 0$, there exist unique $q, r \in \mathbb{Z}$ such that $a = qb + r$ and $0 \le r < b$.

In other words, when you divide $a$ by $b$, you wind up with a quotient $q$ and a remainder $r$ which satisfies $0 \le r < b$. Thus the Division Algorithm is just a familiar fact from grade school arithmetic. Although we omit the proof, it is important to mention that the Division Algorithm can be proved from more primitive notions. In particular, almost everything in mathematics can be built up from the basic axioms of set theory.
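For readers who like to experiment, the statement of the Division Algorithm can be checked numerically. The following is only a small sketch: the function name `division_algorithm` is our own, and it leans on Python's built-in `divmod`, whose remainder lies in $[0, b)$ whenever $b > 0$.

```python
def division_algorithm(a, b):
    """Return (q, r) with a == q*b + r and 0 <= r < b, assuming b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    # For b > 0, Python's floor division yields a remainder with 0 <= r < b,
    # even when a is negative.
    q, r = divmod(a, b)
    return q, r


for a, b in [(17, 5), (-17, 5), (0, 3)]:
    q, r = division_algorithm(a, b)
    print(f"{a} = {q}*{b} + {r}")
```

Note that for $a = -17$ and $b = 5$ the quotient is $-4$ and the remainder is $3$, so the remainder is non-negative as the theorem requires.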

If $a, b$ are two nonzero integers, then $\gcd(a, b)$ will denote the greatest common divisor of $a$ and $b$. An important fact about the greatest common divisor is the following theorem, the proof of which demonstrates both minimality and maximality arguments.

Theorem 1.5 (Linear Representation of GCD). Let $a, b$ be nonzero integers. If $g = \gcd(a, b)$, then there exist integers $x_0$ and $y_0$ such that $g = ax_0 + by_0$. In other words, the greatest common divisor of $a$ and $b$ is an integral linear combination of $a$ and $b$.

Proof. Without loss of generality, we may assume that $a, b > 0$. The set

\[ S = \{ax + by : x, y \in \mathbb{Z}\} \]

contains positive integers (as well as $0$). By the Well-Ordering Property of $\mathbb{N}$, there exist $x_0$ and $y_0$ such that $l = ax_0 + by_0$ is the smallest positive integer in $S$. It will turn out that $l$ is the greatest common divisor of $a$ and $b$, that is $l = g$ where $g = \gcd(a, b)$. Notice that

\[ 0 < l \le a, \qquad 0 < l \le b \]

by the definition of $l$.

We first need to show that $l$ is a common divisor of $a$ and $b$. By the Division Algorithm, we may write $a = lq + r$ where $0 \le r < l$ (i.e. $q$ is the quotient and $r$ is the remainder when $a$ is divided by $l$). Therefore

\[ \begin{aligned} r &= a - lq \\ &= a - q(ax_0 + by_0) \\ &= a(1 - qx_0) - b(qy_0). \end{aligned} \]

Since $r$ is of the form $ax + by$, it follows that $r \in S$. Since $0 \le r < l$ and $l$ is the smallest positive element of $S$, we see that $r = 0$. In other words, $l$ evenly divides $a$ (since the remainder, $r$, is zero). Similar reasoning shows that $l$ divides $b$ as well. Therefore $l$ is a common divisor of $a$ and $b$.

Since $g = \gcd(a, b)$ is the greatest common divisor of $a$ and $b$, it is a common divisor of $a$ and $b$. Thus we can write $a = gA$ and $b = gB$. Hence

\[ l = ax_0 + by_0 = g(Ax_0 + By_0) \]

and thus $g$ divides $l$ (in particular, $g \le l$). Since $g \le l$ and $l$ is itself a common divisor of $a$ and $b$, it must be the case that $g = l$ since $g$ is the greatest common divisor of $a$ and $b$. □
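The proof above is a pure existence argument and does not say how to find $x_0$ and $y_0$. As an illustration only, here is a sketch of one standard way to compute such a pair, the extended Euclidean algorithm. This is a different technique from the minimality argument in the proof, and the function name `extended_gcd` is our own.

```python
import math


def extended_gcd(a, b):
    """Return (g, x, y) with g == gcd(a, b) == a*x + b*y, for positive a, b."""
    # Invariant: old_r == a*old_x + b*old_y and r == a*x + b*y at every step.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


g, x0, y0 = extended_gcd(12, 42)
print(g, x0, y0, 12 * x0 + 42 * y0)
```

For $a = 12$ and $b = 42$ this produces $g = 6$ together with a pair such as $x_0 = -3$, $y_0 = 1$; the pair is not unique, since any solution can be shifted by a multiple of $(b/g, -a/g)$.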

In terms of Abstract Algebra, the preceding theorem states that the ideal generated by the integers $a, b$ in the ring $\mathbb{Z}$ (a principal ideal domain) must be generated by a single element, namely $g$.

We now present yet another proof of Hippasus’ Theorem. In fact, we prove that $\sqrt{n}$ is irrational when $n$ is not a perfect square. The following approach is not as well-known as the others and it has a completely different flavor altogether:

Theorem 1.6 (Hippasus of Metapontum). If a natural number $n$ is not a perfect square, then $\sqrt{n}$ is irrational.

Proof #3. Suppose that $\sqrt{n} = a/b$ where the fraction $a/b$ has been reduced to lowest terms. In other words, $a$ and $b$ share no common factors and hence $\gcd(a, b) = 1$. By the linear representation of the GCD, there exist integers $x, y$ so that $1 = ax + by$. It therefore follows that

\[ \begin{aligned} \sqrt{n} &= \sqrt{n}(ax + by) \\ &= (\sqrt{n}a)x + (\sqrt{n}b)y \\ &= bnx + ay \end{aligned} \]

since $\sqrt{n}a = bn$ and $\sqrt{n}b = a$. However, $bnx + ay$ is an integer, which implies that $\sqrt{n}$ is also an integer. Since this contradicts the hypothesis that $n$ is not a perfect square, we conclude that $\sqrt{n}$ is irrational. □

It turns out that we have one more proof of Hippasus’ Theorem in store...

LECTURE 2

The Archimedean Property and Its Consequences

2.1. The Archimedean Property

An extremely useful property of $\mathbb{R}$ is the so-called Archimedean Property:

Theorem 2.1 (Archimedean Property of $\mathbb{R}$). For every $\epsilon > 0$ and $M \in \mathbb{R}$, there exists $n \in \mathbb{N}$ so that $M < n\epsilon$.

The proof that $\mathbb{R}$ enjoys the Archimedean Property requires the Least Upper Bound Principle, which we will discuss relatively soon. It is important to note that there are mathematical structures (i.e. ordered fields; see Notes on Fields) which are similar to $\mathbb{R}$, yet which do not enjoy the Archimedean Property.

The Archimedean Property is often used in the following form:

Corollary 1. For every $\epsilon > 0$, there exists $n \in \mathbb{N}$ such that $\frac{1}{n} < \epsilon$.

Another consequence of the Archimedean Property is:

Theorem 2.2 (Greatest Integer Function). For each $x \in \mathbb{R}$ there exists a unique integer, denoted $[x]$, such that

\[ [x] \le x < [x] + 1. \tag{2.1} \]

Proof. Without loss of generality, suppose that $x > 0$. By the Archimedean Property (with $\epsilon = 1$ and $M = x$), the set $S = \{n \in \mathbb{N} : x < n\}$ is not empty. By the Well-Ordering Principle, $S$ has a smallest element, say $m$. In other words, we have

\[ m - 1 \le x < m. \]

It is clear that $[x] = m - 1$ has the desired property (2.1).

We must now show that $[x]$ is the unique integer which satisfies (2.1). Suppose that $a \in \mathbb{Z}$ satisfies $a \le x < a + 1$. We consider two possible cases. First, suppose that $[x] \ge a$. In this case, it follows that $0 \le [x] - a \le x - a < 1$, whence $[x] = a$ since both $a$ and $[x]$ are integers. A similar argument takes care of the case $[x] < a$. □

In light of the preceding, we introduce the following useful notation:

Definition. For each $x \in \mathbb{R}$, let $[x]$ denote the greatest integer $\le x$ and let $\langle x \rangle = x - [x]$ denote the fractional part of $x$. In other words, each real number $x$ can be written (uniquely) in the form $x = [x] + \langle x \rangle$ where $[x] \in \mathbb{Z}$ and $0 \le \langle x \rangle < 1$.
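To make the decomposition $x = [x] + \langle x \rangle$ concrete, here is a short sketch using exact rational arithmetic. The function name `integer_and_fractional_part` is our own choice; `math.floor` and `fractions.Fraction` come from the Python standard library, and we assume the input is an integer or a `Fraction` so that no rounding occurs.

```python
import math
from fractions import Fraction


def integer_and_fractional_part(x):
    """Return ([x], <x>) with x == [x] + <x>, [x] an integer and 0 <= <x> < 1."""
    whole = math.floor(x)      # the greatest integer <= x
    return whole, x - whole    # the fractional part is what is left over


for x in [Fraction(7, 2), Fraction(-7, 2), Fraction(5)]:
    whole, frac = integer_and_fractional_part(x)
    print(x, "=", whole, "+", frac)
```

Observe that $[-7/2] = -4$ and $\langle -7/2 \rangle = 1/2$: the greatest integer function rounds down, not toward zero.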

In case you have encountered the concept of inner products in linear algebra, we should mention that the notation $\langle x \rangle$ for the fractional part of $x$ is completely unrelated. In fact, many textbooks use the notation $\{x\}$ instead. Needless to say, in a course where one deals with sets all the time, the use of set brackets for the purpose of denoting fractional parts is quite unwise.

The following theorem asserts that $\mathbb{Q}$ is dense in $\mathbb{R}$, in the sense that between any two real numbers one can find a rational number:

Theorem 2.3 (Density of $\mathbb{Q}$ in $\mathbb{R}$). If $a < b$, then there exists $x \in \mathbb{Q}$ so that $a < x < b$.

Proof. Since $b - a > 0$, the Archimedean Property asserts that there exists $n \in \mathbb{N}$ such that $(b - a)n > 1$. Since $bn - an > 1$, it follows that there exists an integer $m$ such that

\[ an < m < bn. \tag{2.2} \]

Indeed, it follows from (2.1) that we may let $m = [an] + 1$:

\[ an < \underbrace{[an] + 1}_{m} \le an + 1 < bn. \]

Dividing (2.2) through by $n$, we find that $a < x < b$ where $x = \frac{m}{n}$. □
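The proof just given is constructive, and its two steps can be carried out mechanically. The sketch below follows the proof under the assumption that $a$ and $b$ are supplied as exact rationals (`Fraction` objects); the function name `rational_between` is ours.

```python
import math
from fractions import Fraction


def rational_between(a, b):
    """Return a Fraction x with a < x < b, mirroring the proof of Theorem 2.3."""
    if not a < b:
        raise ValueError("need a < b")
    # Archimedean step: pick n with n*(b - a) > 1.
    n = math.floor(1 / (b - a)) + 1
    # Greatest integer step: m = [a*n] + 1 satisfies a*n < m <= a*n + 1 < b*n.
    m = math.floor(a * n) + 1
    return Fraction(m, n)


x = rational_between(Fraction(1, 3), Fraction(1, 2))
print(x)
```

With $a = 1/3$ and $b = 1/2$ the recipe gives $n = 7$, $m = 3$, and hence $x = 3/7$.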

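The same statement holds for the set of irrational numbers $\mathbb{I} = \mathbb{R} \setminus \mathbb{Q}$:

Theorem 2.4 (Density of $\mathbb{I}$ in $\mathbb{R}$). If $a < b$, then there exists $x \in \mathbb{I}$ such that $a < x < b$.

Proof. By the preceding theorem, there exists a rational number $y$ such that

\[ \frac{a}{\sqrt{2}} < y < \frac{b}{\sqrt{2}}, \]

and we may take $y \neq 0$ (if the theorem happens to produce $y = 0$, apply it again to find a rational number between $0$ and $b/\sqrt{2}$). Equivalently,

\[ a < \underbrace{\sqrt{2}\,y}_{x} < b. \]

Since $\sqrt{2}$ is irrational and $y$ is a nonzero rational number, it follows easily that $x = \sqrt{2}\,y$ is irrational as well. □

It follows from the preceding theorems that:

MORAL: In between any two distinct real numbers there are infinitely many rational numbers and infinitely many irrational numbers.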

2.2. The Binomial Theorem & Bernoulli’s Inequality

The Binomial Theorem is one result from elementary algebra that turns out to be exceedingly useful in analysis:

Theorem 2.5 (Binomial Theorem). The formula

\[ (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \]

holds for any integer $n \ge 1$ and any real numbers $x, y$. Moreover, the binomial coefficient

\[ \binom{n}{k} = \frac{n!}{k!(n - k)!} \]

is always an integer.
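As a sanity check (not a proof), the Binomial Theorem can be tested for particular values using exact arithmetic. The helper name `binomial_expansion` below is our own; `math.comb` is the standard library function for $\binom{n}{k}$ (available in Python 3.8 and later) and always returns an integer.

```python
from fractions import Fraction
from math import comb


def binomial_expansion(x, y, n):
    """Return the right-hand side of the Binomial Theorem for the given x, y, n."""
    return sum(comb(n, k) * x**k * y**(n - k) for k in range(n + 1))


x, y, n = Fraction(1, 2), Fraction(3), 5
print(binomial_expansion(x, y, n) == (x + y) ** n)
```

Using `Fraction` inputs avoids floating point rounding, so the comparison of the two sides is exact.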

For a proof of the Binomial Theorem, you can consult the Notes on Induction. As an immediate consequence of the Binomial Theorem, we obtain the following:

Theorem 2.6 (Bernoulli’s Inequalities). The inequalities

\[ (1 + a)^n \ge 1 + na \qquad \text{(Weak Version)} \]

\[ (1 + a)^n \ge 1 + na + \frac{n(n - 1)}{2}a^2 \qquad \text{(Strong Version)} \]

hold for all $a \ge 0$ and $n \in \mathbb{N}$.

Proof #1. The right hand sides of these inequalities are simply the first two and three terms, respectively, in the binomial expansion of $(1 + a)^n$. Since each term in the binomial expansion is $\ge 0$, the desired result follows. □

Both versions of Bernoulli’s Inequality can be proved by Mathematical Induction. For instance, here is an inductive proof of the weak version of Bernoulli’s Inequality:

Proof #2. Let $P(n)$ be the statement

\[ (\forall a > 0)\,\big( (1 + a)^n \ge 1 + na \big). \tag{2.3} \]

We will use mathematical induction to show that the statements $P(0), P(1), \ldots$ are all true.

BASE CASE: Clearly $P(0)$ is true since the desired inequality reduces to $1 \ge 1$, which is obviously true.

INDUCTIVE STEP: Suppose that $P(n)$ is true for some value of $n$. In other words, suppose that (2.3) is true for this specific value of $n$. Multiplying the inequality in (2.3) through by $1 + a$ we find that

\[ \begin{aligned} (1 + a)^{n+1} &= (1 + a)^n (1 + a) \\ &\ge (1 + na)(1 + a) \\ &= 1 + na + a + na^2 \\ &= 1 + (n + 1)a + na^2 \\ &\ge 1 + (n + 1)a. \end{aligned} \]

In other words, the statement $P(n + 1)$ is also true; we have established that $P(n) \Rightarrow P(n + 1)$. This completes the inductive step.

CONCLUSION: By mathematical induction, it follows that $P(n)$ is true for $n = 0, 1, 2, \ldots$ and hence $(1 + a)^n \ge 1 + na$ for $a > 0$ and every integer $n \ge 0$. □
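Both versions of Bernoulli’s Inequality are easy to spot-check numerically. The following sketch tests them on a grid of rational values of $a \ge 0$ and small $n$; the function name `bernoulli_holds` and the choice of sample points are ours, and such a check is of course no substitute for the proofs above.

```python
from fractions import Fraction


def bernoulli_holds(a, n):
    """Check the weak and strong Bernoulli inequalities for a >= 0 and integer n >= 0."""
    lhs = (1 + a) ** n
    weak = 1 + n * a
    strong = weak + Fraction(n * (n - 1), 2) * a**2
    # For a >= 0 the strong bound dominates the weak one, and lhs dominates both.
    return lhs >= strong >= weak


samples = [Fraction(k, 10) for k in range(0, 31)]   # a = 0, 0.1, ..., 3.0
print(all(bernoulli_holds(a, n) for a in samples for n in range(0, 21)))
```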

As a consequence of the weak version of Bernoulli’s Inequality, we can prove the following well-known and useful result:

Theorem 2.7. If $x > 1$ and $M > 0$, then there exists $n \in \mathbb{N}$ such that $x^n > M$. Similarly, if $0 \le x < 1$ and $\epsilon > 0$, then there exists $n \in \mathbb{N}$ such that $0 \le x^n < \epsilon$.

Proof. Since the second assertion follows immediately from the first, we prove only the first statement in the theorem. If $x > 1$, then write $x = 1 + a$ where $a > 0$ and use the weak form of Bernoulli’s Inequality:

\[ x^n = (1 + a)^n \ge 1 + na. \]

By the Archimedean Property, there exists $n \in \mathbb{N}$ so that $na > M - 1$, whence $x^n > M$, as desired. □
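The proof of Theorem 2.7 actually tells us how to find a suitable exponent: any $n$ with $na > M - 1$ works. Here is a sketch of that recipe, assuming exact rational inputs; the function name `exponent_exceeding` is ours. The exponent it returns is usually far larger than necessary, since Bernoulli’s Inequality is a crude lower bound.

```python
import math
from fractions import Fraction


def exponent_exceeding(x, M):
    """Return some n with x**n > M, for x > 1 and M > 0, via the bound n*a > M - 1."""
    if x <= 1 or M <= 0:
        raise ValueError("need x > 1 and M > 0")
    a = x - 1
    # Archimedean step: the smallest natural number n with n*a > M - 1.
    return max(0, math.floor((M - 1) / a) + 1)


x, M = Fraction(11, 10), 100
n = exponent_exceeding(x, M)
print(n, x**n > M)
```

For $x = 11/10$ and $M = 100$ the recipe returns $n = 991$, which is guaranteed to work but is not the smallest exponent with $x^n > M$.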

2.3. An Analytic Proof of Hippasus’ Theorem

In this section, we provide yet another proof of Hippasus’ Theorem. The following proof is of a more analytical nature than our earlier proofs and it also illustrates some of the techniques that we have developed.

Theorem 2.8 (Hippasus of Metapontum). $\sqrt{2}$ is irrational.

Proof #4. Assume toward a contradiction that $\sqrt{2} = p/q$ where $p, q$ are integers and $q \ge 1$. Define the numbers $e_n$ via the formula

\[ e_n = \langle \sqrt{2} \rangle^n = (\sqrt{2} - 1)^n \]

and observe that

\[ 0 < \sqrt{2} - 1 < \frac{1}{2}. \tag{2.4} \]

Indeed, elementary arithmetic shows that the preceding inequality is equivalent to the obvious inequality $1 < 2 < \frac{9}{4}$ (i.e. we can establish (2.4) without the use of a calculator or decimal expansions). It follows from (2.4) and the definition of $e_n$ that

\[ 0 < e_n < \frac{1}{2^n} \tag{2.5} \]

for every integer $n \ge 1$. Now observe that for each $n \in \mathbb{N}$ there exist integers $a_n, b_n$ such that

\[ e_n = a_n + b_n\sqrt{2}. \tag{2.6} \]

Although this statement can be proved by Mathematical Induction, it can also be proved directly from the Binomial Theorem:

\[ e_n = (\sqrt{2} - 1)^n = \sum_{k=0}^{n} \binom{n}{k} (\sqrt{2})^k (-1)^{n-k}. \]

Since the binomial coefficients $\binom{n}{k}$ are integers and since $(\sqrt{2})^k$ is either an integer or an integer times $\sqrt{2}$, the desired formula (2.6) follows immediately. By (2.6), we have

\[ \begin{aligned} e_n &= a_n + b_n\sqrt{2} \\ &= a_n + b_n\left(\frac{p}{q}\right) \\ &= \frac{a_n q + b_n p}{q} \\ &= \frac{c_n}{q} \end{aligned} \]

where $c_n$ is an integer. Since $e_n > 0$ and $q \ge 1$, it follows that $c_n \ge 1$, whence $e_n \ge 1/q$. Putting this together with (2.5), we find that

\[ \frac{1}{q} \le e_n < \frac{1}{2^n} \]

for every integer $n \ge 1$. However, the resulting inequality $2^n < q$ fails for sufficiently large $n$ by Theorem 2.7. This contradiction shows that $\sqrt{2}$ must be irrational. □
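The integers $a_n, b_n$ in (2.6) can be generated by a simple recurrence, which is the inductive route mentioned in the proof: multiplying $a + b\sqrt{2}$ by $\sqrt{2} - 1$ gives $(2b - a) + (a - b)\sqrt{2}$. The sketch below implements this recurrence; the function name `sqrt2_minus_one_power` is our own.

```python
import math


def sqrt2_minus_one_power(n):
    """Return integers (a_n, b_n) with (sqrt(2) - 1)**n == a_n + b_n*sqrt(2)."""
    a, b = 1, 0                      # e_0 = 1 = 1 + 0*sqrt(2)
    for _ in range(n):
        # (a + b*sqrt(2)) * (sqrt(2) - 1) = (2b - a) + (a - b)*sqrt(2)
        a, b = 2 * b - a, a - b
    return a, b


for n in range(1, 6):
    a_n, b_n = sqrt2_minus_one_power(n)
    # Floating point comparison only; the integer pair itself is exact.
    print(n, (a_n, b_n), a_n + b_n * math.sqrt(2), 0.5**n)
```

For instance, $e_2 = 3 - 2\sqrt{2}$ and $e_3 = -7 + 5\sqrt{2}$. The printed decimal values illustrate (2.5): each $e_n$ is positive but smaller than $1/2^n$, even though $a_n$ and $b_n$ grow in absolute value.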

LECTURE 3

The Least Upper Bound Property

3.1. The Least Upper Bound Property

The key property that singles out $\mathbb{R}$ among all ordered fields (such as $\mathbb{Q}$ or $\mathbb{R}(x)$; see Notes on Ordered Fields for background) is the least upper bound property. To discuss this important property, we first need a few definitions.

Definition. If $A \subset \mathbb{R}$, then an upper bound for the set $A$ is a number $s \in \mathbb{R}$ such that $a \le s$ for all $a \in A$. If the set $A$ has an upper bound, then we say that it is bounded above.

In terms of the “real line” visualization of $\mathbb{R}$, an upper bound for $A$ is simply a point that lies to the right of the entire set $A$.

Definition. We call a real number $s$ a least upper bound (or supremum) for $A$ if

(i) $s$ is an upper bound for $A$

(ii) if $t$ is any upper bound for $A$, then $s \le t$.

This is written $s = \sup A$, where $\sup$ stands for supremum. If $A$ is not bounded above, then we say that $\sup A = \infty$.

The corresponding notion of greatest lower bound (also called the infimum) of a set $A$ (denoted $\inf A$) is defined analogously.

Note that $\sup A$, when it exists, is uniquely determined. Indeed, if $s_1, s_2$ are two least upper bounds for $A$, then $s_1, s_2$ are both upper bounds. Since $s_1$ is a least upper bound, it follows that $s_1 \le s_2$. Similarly, we find that $s_2 \le s_1$ and hence $s_1 = s_2$. Thus we can speak of the least upper bound of a set.

Example 3.1. If $A$ is a finite subset of $\mathbb{R}$, then $\sup A$ is simply the largest element of $A$ and $\inf A$ is the smallest.

Example 3.2. $\sup \mathbb{N} = \infty$ since $\mathbb{N}$ is not bounded above (this follows from the Archimedean Property).

Example 3.3. $\sup[0, 1) = 1$, where $[0, 1)$ denotes the half-open interval:

\[ [0, 1) = \{x \in \mathbb{R} : 0 \le x < 1\}. \]

Clearly $1$ is an upper bound for $[0, 1)$, so condition (i) in the definition is satisfied. Now let us check condition (ii). We claim that

\[ x \text{ is an upper bound for } [0, 1) \;\Rightarrow\; x \ge 1. \tag{3.1} \]

Instead of proceeding directly, we prove the contrapositive:

\[ x < 1 \;\Rightarrow\; x \text{ is not an upper bound for } [0, 1). \]

If $x < 0$, then $x$ is not an upper bound since $0 \in [0, 1)$, so suppose that $0 \le x < 1$. We can then write $x = 1 - \epsilon$, where $0 < \epsilon \le 1$. But then $x < 1 - \frac{\epsilon}{2} \in [0, 1)$ and hence $x$ is not an upper bound for $[0, 1)$. This proves (3.1) and thus $1$ is the least upper bound for $[0, 1)$. This proves that $\sup[0, 1) = 1$.

Note that the preceding example demonstrates that $\sup A$ does not have to belong to $A$.

Example 3.4. Consider the set

\[ A = \{1, 1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, \ldots\}. \]

The set $A$ is bounded above, since every element of $A$ is $\le 2$. In other words, $2$ is an upper bound for $A$. Of course, we also recognize that $A$ contains a list of better and better rational approximations to

\[ \sqrt{2} = 1.414213562\ldots. \]

The problem with the rational number system $\mathbb{Q}$ is that it has holes which must be repaired. Our intuition tells us that the sequence

\[ 1, 1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, \ldots \]

is increasing up to $\sqrt{2}$. That is, there should be some real number that is the least upper bound for $A$. In other words, the sequence above is “approaching a hole in $\mathbb{Q}$.” This hole will be plugged with the real number $\sqrt{2}$.

Although we will not prove the following theorem, rest assured that it can be proved from the basic axioms of Set Theory:

Theorem 3.1 (Least Upper Bound Property). Every nonempty subset of $\mathbb{R}$ that is bounded above has a least upper bound in $\mathbb{R}$.

It turns out that $\mathbb{R}$ is the only ordered field that has this property. In other words, no matter what “construction” is used to produce an ordered field with the least upper bound property, the end result will be essentially $\mathbb{R}$. Therefore it is almost silly to speak of least upper bounds in any other field than $\mathbb{R}$.

Theorem 3.2 (Approximation Property of Suprema). If $s = \sup A$ exists and is finite, then for every $\epsilon > 0$ there exists $a \in A$ such that $s - \epsilon < a$.

Proof. Suppose toward a contradiction that the statement is false. In other words, $(\exists \epsilon > 0)(\forall a \in A)(s - \epsilon \ge a)$. This means that $s - \epsilon$ is an upper bound for $A$, which is impossible since $s$ is supposedly the least (i.e. smallest) upper bound for $A$. Therefore the statement is true. □

3.2. The Existence of $\sqrt{2}$

In this section, we prove that there exists a real number $s > 0$ such that $s^2 = 2$. In other words, we prove that $\sqrt{2}$ actually exists in the system of real numbers (we never attempted to justify this before). The proof depends on the Least Upper Bound Principle and the following well-known, but often overlooked, theorem:

Theorem 3.3 (Trichotomy Law). If $x, y \in \mathbb{R}$, then one and only one of the following statements holds: $x > y$, $x < y$, or $x = y$.

We are now ready to proceed:

Theorem 3.4. There exists a real number $s > 0$ such that $s^2 = 2$. In other words, $\sqrt{2}$ exists in $\mathbb{R}$.

Proof. Define the set

\[ A = \{x \in \mathbb{R} : x \ge 0,\ x^2 \le 2\}. \]

Since $0 \in A$, it is clear that $A$ is nonempty. Furthermore, since

\[ 2 \le x \;\Rightarrow\; 2 < 4 \le x^2, \]

it follows that $A$ is bounded above by $2$. By the Least Upper Bound Principle, the set $A$ has a least upper bound in $\mathbb{R}$. Let $s = \sup A$ and note that $s \ge 1$ since $1 \in A$.

By the Trichotomy Law, there are three possible cases to check:

\[ s^2 < 2, \qquad s^2 > 2, \qquad s^2 = 2. \]

If we can show that the first two cases lead to contradictions, then we can conclude that $s^2 = 2$, as desired.

(i) If $s^2 < 2$, then we claim that $(s + \frac{1}{n})^2 < 2$ holds for sufficiently large $n \in \mathbb{N}$. Since

\[ \begin{aligned} \left(s + \frac{1}{n}\right)^2 &= s^2 + \frac{2s}{n} + \frac{1}{n^2} \\ &\le s^2 + \frac{2s}{n} + \frac{1}{n} \\ &= s^2 + \frac{2s + 1}{n}, \end{aligned} \]

if we make $n$ large enough so that

\[ s^2 + \frac{2s + 1}{n} < 2, \]

then $(s + \frac{1}{n})^2 < 2$ will hold. In particular, any value of $n$ such that

\[ \frac{2s + 1}{2 - s^2} < n \]

will suffice. This contradicts the fact that $s$ is an upper bound for $A$ since the larger number $s + \frac{1}{n}$ also belongs to $A$.

(ii) If $s^2 > 2$, then we claim that $(s - \frac{1}{n})^2 > 2$ holds for sufficiently large $n \in \mathbb{N}$. Since

\[ \left(s - \frac{1}{n}\right)^2 = s^2 - \frac{2s}{n} + \frac{1}{n^2} > s^2 - \frac{2s}{n}, \]

if we make $n$ large enough so that

\[ s^2 - \frac{2s}{n} > 2, \]

then $(s - \frac{1}{n})^2 > 2$ will hold. In particular, any value of $n$ such that

\[ n > \frac{2s}{s^2 - 2} \]

will suffice. In particular, $s - \frac{1}{n}$ is an upper bound for $A$ since $t > s - \frac{1}{n}$ implies that

\[ 2 < \left(s - \frac{1}{n}\right)^2 \le t^2 \]

whence $t \notin A$. This contradicts the fact that $s$ is the least upper bound for $A$.

Since (i) and (ii) led to contradictions, it follows from the Trichotomy Law that $s^2 = 2$. In other words, $\sqrt{2}$ exists in $\mathbb{R}$. □
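Although the proof above is an existence argument, one can get a numerical feel for $s = \sup A$ by trapping it between rational numbers. The sketch below uses bisection, which is our own illustrative choice and not the method of the proof; the function name `approximate_sup` is likewise ours. At every stage `lo` belongs to $A$ and `hi` is an upper bound for $A$, so $\text{lo} \le s \le \text{hi}$.

```python
from fractions import Fraction


def approximate_sup(steps):
    """Bracket s = sup{x >= 0 : x*x <= 2} between rationals lo <= s <= hi."""
    lo, hi = Fraction(1), Fraction(2)    # 1 belongs to A; 2 is an upper bound for A
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid                     # mid belongs to A, so s >= mid
        else:
            hi = mid                     # mid is an upper bound for A, so s <= mid
    return lo, hi


lo, hi = approximate_sup(20)
print(float(lo), float(hi))
```

After $k$ steps the bracket has width exactly $1/2^k$, so the two printed values agree with $\sqrt{2}$ to roughly six decimal places when $k = 20$.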

Similar, but more complicated, arguments guarantee the existence of $\sqrt[n]{x}$ whenever $x \ge 0$ and $n \ge 1$. We will not go into the details any further.

LECTURE 4

Monotone Sequences and Series

4.1. The Monotone Sequence Property and Infinite Series

You might recall the following definition from Math 101, Math 31H, or even Calculus II:

Definition. Let $a_n$ be a sequence of real numbers. We say that “the limit of the sequence $a_n$ is $L$,” written $\lim_{n\to\infty} a_n = L$, if for every $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that $|a_n - L| < \epsilon$ holds whenever $n \ge N$. In symbols, this reads:

\[ (\forall \epsilon > 0)(\exists N \in \mathbb{N})(n \ge N \Rightarrow |a_n - L| < \epsilon). \]

One of the most important consequences of the Least Upper Bound Property is the so-called Monotone Sequence Property, which asserts that an increasing sequence which is bounded above must converge:

Theorem 4.1 (Monotone Sequence Property). If $a_n$ is a sequence of real numbers which is monotonically increasing (i.e. $a_n \le a_{n+1}$ for all $n$) and which is bounded above (i.e. there exists $M \in \mathbb{R}$ so that $a_n \le M$ for all $n$), then $a_n$ is convergent.

The proof of the preceding theorem is requested on an upcoming homework assignment.

Definition. An infinite series $\sum_{i=0}^{\infty} a_i$ of real numbers is said to converge to $S$ if the sequence of partial sums

\[ S_n = \sum_{i=0}^{n} a_i \]

tends to $S$:

\[ \sum_{i=0}^{\infty} a_i = S \quad \text{means that} \quad \lim_{n\to\infty} S_n = S. \]

One of the simplest and most important examples of a convergent series of real numbers is the geometric series $\sum_{n=0}^{\infty} x^n$ whose common ratio $x$ satisfies $|x| < 1$:

Theorem 4.2 (Geometric Series Formula). If $|x| < 1$, then

\[ \sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}. \tag{4.1} \]

The preceding series is called the Geometric Series.

Proof. If $|x| < 1$ and $\epsilon > 0$ are given, then let $N \in \mathbb{N}$ be so large that

\[ |x|^N < |1 - x|\epsilon. \]

We are guaranteed that such an $N$ exists since $|x| < 1$ (see Theorem 2.7). Recalling that

\[ S_n = 1 + x + x^2 + \cdots + x^n = \frac{1 - x^{n+1}}{1 - x}, \]

it follows that if $n \ge N$, then

\[ \begin{aligned} \left| S_n - \frac{1}{1 - x} \right| &= \left| \frac{1 - x^{n+1}}{1 - x} - \frac{1}{1 - x} \right| \\ &= \left| \frac{x^{n+1}}{1 - x} \right| \\ &= \frac{|x|^{n+1}}{|1 - x|} \\ &\le \frac{|x|^N}{|1 - x|} \\ &< \epsilon. \end{aligned} \]

Thus

\[ \lim_{n\to\infty} S_n = \frac{1}{1 - x}, \]

which is equivalent to the desired formula (4.1). □
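To see the convergence in (4.1) numerically, one can compute the partial sums $S_n = 1 + x + \cdots + x^n$ (indexed as in the definition of a convergent series) and watch the gap to $\frac{1}{1-x}$ shrink. This is only an illustrative sketch with exact rational arithmetic; the function name `geometric_partial_sum` and the sample value $x = 1/3$ are our own choices.

```python
from fractions import Fraction


def geometric_partial_sum(x, n):
    """Return 1 + x + ... + x**n, computed term by term."""
    total, power = Fraction(0), Fraction(1)
    for _ in range(n + 1):
        total += power
        power *= x
    return total


x = Fraction(1, 3)
limit = 1 / (1 - x)
for n in [1, 5, 10, 20]:
    gap = abs(geometric_partial_sum(x, n) - limit)
    print(n, float(gap))
```

The gap equals $|x|^{n+1}/|1 - x|$ exactly, so it decays geometrically as $n$ grows.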

Let us now recall the following fact from elementary arithmetic:

Corollary 2. If $D$ is a block of decimal digits of length $d$, then
\[ 0.\overline{D} = \frac{D}{10^d - 1}. \tag{4.2} \]

The preceding is nothing more than the familiar recipe for dealing with repeating decimal expansions. For example:

Example 4.1.
\[ \frac{4}{9} = 0.\overline{4}, \qquad \frac{47}{99} = 0.\overline{47}, \qquad \frac{476}{999} = 0.\overline{476}, \quad \ldots. \]
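Formula (4.2) is easy to experiment with. The sketch below is an illustration only; the function name `repeating_block_value` is hypothetical, and the digit block is passed as a string so that leading zeros in the block are counted toward its length $d$.

```python
from fractions import Fraction

def repeating_block_value(block):
    """Value of 0.DDD... for the digit string D, computed via formula (4.2)."""
    d = len(block)                       # length of the repeating block
    return Fraction(int(block), 10**d - 1)

# The three cases listed in Example 4.1.
assert repeating_block_value("4") == Fraction(4, 9)
assert repeating_block_value("47") == Fraction(47, 99)
assert repeating_block_value("476") == Fraction(476, 999)

# Fraction reduces automatically: 0.142857142857... is 1/7.
print(repeating_block_value("142857"))
```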

4.2. Series with Non-negative Terms and Decimal Expansions

Theorem 4.3. If $a_i \geq 0$ for all $i \in \mathbb{N}$ and if the sequence $S_n = \sum_{i=0}^{n} a_i$ is bounded above, then the series $\sum_{i=0}^{\infty} a_i$ converges.

Proof. Since $a_i \geq 0$ for all $i \in \mathbb{N}$, it follows that the sequence $S_n$ is monotonically increasing ($S_n \leq S_{n+1}$ for all $n \in \mathbb{N}$). By hypothesis, the $S_n$ are bounded above and hence $\lim_{n\to\infty} S_n$ exists by the Monotone Sequence Property. □

As a result, we find that base-$b$ expansions are valid:

Theorem 4.4 (Existence of Base-$b$ Expansions). Let $b \geq 2$ be an integer. If $d_i \in \{0, 1, \ldots, b - 1\}$ for each $i$, then the infinite series
\[ \sum_{i=1}^{\infty} \frac{d_i}{b^i} \tag{4.3} \]
converges to a real number in $[0, 1]$, denoted
\[ (0.d_1 d_2 d_3 d_4 \ldots)_b. \tag{4.4} \]
Furthermore, each $x \in [0, 1)$ has a unique representation (called a base-$b$ expansion) of the form (4.4) which does not eventually terminate in a string of $(b - 1)$'s.


Proof. First note that if $d_i \in \{0, 1, \ldots, b - 1\}$ for each $i$, then the formula for the summation of a geometric series tells us that the partial sums of (4.3) are bounded above by
\[ \sum_{i=1}^{\infty} \frac{b - 1}{b^i} = (b - 1) \cdot \frac{\frac{1}{b}}{1 - \frac{1}{b}} = (b - 1) \cdot \frac{1}{b - 1} = 1. \]
By the preceding theorem, it follows that any series of the form (4.3) converges to a real number in $[0, 1]$.

Suppose now that $x \in [0, 1)$ and let $d_1$ be the largest natural number such that
\[ \frac{d_1}{b} \leq x. \tag{4.5} \]
In other words, let $d_1 = [xb]$. Observe that $0 \leq d_1 \leq b - 1$ since $d_1 \geq b$ would contradict the fact that $x < 1$. Similarly, let $d_2$ be the largest natural number such that
\[ \frac{d_1}{b} + \frac{d_2}{b^2} \leq x \tag{4.6} \]
(this can again be defined in terms of the greatest integer function) and observe that $0 \leq d_2 \leq b - 1$ since $d_2 \geq b$ would violate the maximality of $d_1$ in (4.5). Proceeding in this manner, we obtain a sequence $d_1, d_2, d_3, \ldots$ of base-$b$ digits of $x$ which satisfy the inequality
\[ \frac{d_1}{b} + \frac{d_2}{b^2} + \cdots + \frac{d_n}{b^n} \leq x \tag{4.7} \]
for each $n = 1, 2, 3, \ldots$. Let

\[ A = \left\{ \frac{d_1}{b} + \frac{d_2}{b^2} + \cdots + \frac{d_n}{b^n} : n = 1, 2, 3, \ldots \right\}. \]
Since $A \neq \emptyset$ and $x$ is an upper bound for $A$, it follows that $s = \sup A$ exists. The definition of $\sup A$ implies that $s \leq x$. We claim now that $s = x$.

Suppose toward a contradiction that $s < x$. Let $m \in \mathbb{N}$ be so large that
\[ \frac{1}{b^m} < x - s. \]
By the definition of $d_m$, it follows that
\[ x < \underbrace{\frac{d_1}{b} + \frac{d_2}{b^2} + \cdots + \frac{d_m}{b^m}}_{\in A} + \frac{1}{b^m} \leq s + \frac{1}{b^m} < x. \]
However, this implies that $x < x$, a contradiction. Since $s \leq x$, it follows that we must actually have $x = s$, as claimed. The remainder of the proof (the fact that the series (4.3) converges to $x$ and the uniqueness assertion) is left to the reader. □
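The greedy choice of digits in the proof translates directly into a short procedure. The following is a sketch under stated assumptions rather than anything from the notes: the name `base_b_digits` is hypothetical, and $x$ is taken to be an exact rational so that the greatest integer function can be evaluated without rounding error.

```python
from fractions import Fraction
from math import floor

def base_b_digits(x, b, n):
    """First n base-b digits of x in [0, 1), chosen greedily as in the proof."""
    digits = []
    remainder = Fraction(x)       # equals b**k * (x - partial sum) after k steps
    for _ in range(n):
        remainder *= b
        d = floor(remainder)      # largest digit keeping the partial sum <= x
        digits.append(d)
        remainder -= d
    return digits

x = Fraction(1, 3)
digits = base_b_digits(x, 10, 6)
partial = sum(Fraction(d, 10**(i + 1)) for i, d in enumerate(digits))

# Inequality (4.7), together with the maximality of the last digit chosen.
assert partial <= x < partial + Fraction(1, 10**6)
print(digits)
```

The final assertion is the two-sided bound used in the contradiction argument: the partial sum never exceeds $x$, yet adding $1/b^n$ always overshoots it.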

An important fact about infinite decimal expansions is that they can help us better understand the relationship between rational numbers and irrational numbers. The following theorem precisely characterizes rational and irrational numbers according to their infinite decimal expansions:

Theorem 4.5. A real number has an eventually repeating decimal expansion if and only if it is rational. In other words, a real number $x$ is a rational number if and only if its infinite decimal expansion is of the form
\[ x = A.B\overline{C} \]
where $A, B, C$ are finite blocks of decimal digits.

Proof. Suppose that the real number $x$ has a repeating decimal expansion:
\[ x = A.B\overline{C}, \]
where $A, B, C$ are blocks of digits of lengths $a, b, c$, respectively. Clearly,
\[ x - A = 0.B\overline{C}, \]
whence
\[ 10^b (x - A) = B.\overline{C} = B + 0.\overline{C}. \]
By Corollary 2, we know that
\[ 0.\overline{C} = \frac{C}{10^c - 1}. \]
Solving the equation
\[ 10^b (x - A) = B + \frac{C}{10^c - 1} \]
leads to
\[ x = A + 10^{-b} B + \frac{10^{-b} C}{10^c - 1}, \]
which shows that $x$ is a rational number.

On the other hand, if $x$ is rational, then $x = p/q$ where $p, q$ are integers with $q > 0$. When performing the long division $p/q$, past a certain point only “0's will drop down” since $p$ is an integer (i.e., $p = p.000000\ldots$). Once the division has proceeded to the point where only 0's drop down, there are only $q$ possible remainders at every step of the division. Eventually some remainder will be repeated and the division will form a loop (of length at most $q$). □
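The long-division argument can be made concrete by recording each remainder as it appears and stopping at the first repeat. This is only an illustrative sketch: the function name `decimal_expansion` and the names `p` (a non-negative numerator) and `q` (a positive denominator) are assumptions introduced here.

```python
def decimal_expansion(p, q):
    """Split p/q (p >= 0, q > 0) into integer part, pre-period digits, repeating digits."""
    integer_part, r = divmod(p, q)
    seen = {}                     # remainder -> index of the digit it produced
    digits = []
    # There are only q possible remainders, so this loop runs at most q times.
    while r != 0 and r not in seen:
        seen[r] = len(digits)
        d, r = divmod(10 * r, q)  # "drop down a 0" and divide
        digits.append(d)
    if r == 0:
        return integer_part, digits, []       # the expansion terminates
    start = seen[r]                           # the loop begins where r first appeared
    return integer_part, digits[:start], digits[start:]

print(decimal_expansion(1, 7))
print(decimal_expansion(1, 6))
```

In the notation of Theorem 4.5, the three returned pieces play the roles of the blocks $A$, $B$, and $C$.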

Example 4.2. The preceding theorem implies that
\[ 0.123456789101112131415\ldots \]
and
\[ 0.2030507011013017019\ldots \]
are irrational numbers. Although their infinite decimal expansions have definite patterns that we can concretely describe, their infinite decimal expansions do not eventually repeat. On the other hand,
\[ \pi = 3.1415926535\ldots \]
is irrational (this requires a complicated proof) and its infinite decimal expansion does not seem to have any discernible pattern.

LECTURE 5

Bijections

5.1. Counting Without Counting

The set
\[ A = \{\text{apple}, \text{bird}, \text{cat}\} \]
has three elements. What do we mean by three? This is a philosophical question, but clearly there is some property that the set $A$ shares with the set
\[ B = \{a, b, c\}. \]
There is some abstract notion of the number three that $A$ and $B$ share and we instantly recognize this property even though we cannot define it.

We know that the sets $A$ and $B$ above have the same number of elements since they both have three elements. Unfortunately, this procedure will not work for infinite sets – the types of sets that are of interest in real analysis. Nevertheless, by pairing up elements
\[ (\text{apple}, a), \quad (\text{bird}, b), \quad (\text{cat}, c) \]
and noting that there are no elements of $A$ or $B$ left over, we can conclude that $A$ and $B$ have the same number of elements without counting. In other words, to see that $A$ and $B$ have the same number of elements does not actually require us to count to three (or to even know what three is). For finite sets $A$ and $B$, we observe that

If there is a one-to-one correspondence between the elements of two finite sets $A$ and $B$, then $A$ and $B$ have the same number of elements.

In order to carry over this scheme of “counting without counting” to more general sets, we need to discuss functions and their properties. However, let us first examine what happens if we naively try to “count” infinite sets.

5.2. Galileo’s Paradox

In his final book The Discourses and Mathematical Demonstrations Relating to Two New Sciences (1638), Galileo has a dialogue between two characters about infinite sets. They discuss what is now known as Galileo's Paradox. Galileo did not have permission from the Inquisition to publish this book – after a heresy trial based on an earlier book, the Roman Inquisition banned Galileo from publishing anything. After failed attempts to publish his book in Germany, France, and Poland, it was finally published in the Netherlands.

Let
\[ S = \{0, 1, 4, 9, 16, \ldots\} \]
denote the set of perfect squares. Clearly $S$ is a proper subset of $\mathbb{N}$. In other words, $S \subset \mathbb{N}$ and $S \neq \mathbb{N}$ since clearly there are natural numbers (like 3) that are not perfect squares.


However, look at what happens when we “line up” the elements of $S$ and $\mathbb{N}$:

N:  0  1  2  3  4   5   6   7   ···
S:  0  1  4  9  16  25  36  49  ···

Galileo's Paradox is the apparent contradiction that although $S$ is “much smaller than $\mathbb{N}$,” we can still “pair off” elements of $\mathbb{N}$ with elements of $S$. According to our intuition obtained from studying finite sets, we might say that $\mathbb{N}$ and $S$ have the “same number of elements.”

In more precise terminology, Galileo's Paradox is essentially the observation that the set $\mathbb{N}$ properly contains
\[ S = \{0, 1, 4, 9, 16, 25, \ldots\}, \]
even though the function $f : \mathbb{N} \to S$ defined by $f(n) = n^2$ is a one-to-one correspondence between $S$ and $\mathbb{N}$:

n:     0  1  2  3  4   5   6   7   ···
f(n):  0  1  4  9  16  25  36  49  ···

Intuitively we know that this cannot happen for finite sets:

A finite set cannot be put into a one-to-one correspondence with a proper subset of itself.

For example, there is clearly no way that the set $\{1, 2, 3\}$ can be put into a one-to-one correspondence with $\{1, 2\}$. But what exactly do we mean by a one-to-one correspondence?

When we say that $f(n) = n^2$ is a one-to-one correspondence between $\mathbb{N}$ and $S$ we mean first of all that $f$ is a function. It assigns exactly one element of $S$ (the target set) to each element of $\mathbb{N}$ (the domain). Moreover, this function has two nice properties:

(i) $f$ is “one-to-one” (injective). This means that no two distinct inputs (natural numbers) get sent to the same output (a perfect square).

(ii) $f$ is “onto” (surjective). In other words, $f$ hits everything in $S$: all of the elements in the target set $S$ are outputs of $f$.

5.3. Injections

Let $A$ and $B$ be sets. A function $f : A \to B$ can be thought of as a rule which assigns to each $a \in A$ some corresponding element of $B$, called $b = f(a)$. A function does not have to be defined by a formula, however. It is simply some definite method of assigning an element of $B$ to each element of $A$.

Definition. A function $f : A \to B$ is injective (often called one-to-one) if $f(a_1) = f(a_2)$ implies that $a_1 = a_2$. A function that is injective is called an injection.

Essentially, the preceding definition states:

A function is injective if “distinct inputs lead to distinct outputs.”

Indeed the contrapositive of the definition is
\[ a_1 \neq a_2 \Rightarrow f(a_1) \neq f(a_2). \tag{5.1} \]

To determine whether a function is injective or not often requires a short proof.


Example 5.1. The function $f : \mathbb{N} \to \mathbb{N}$ defined by
\[ f(n) = n + 1 \]
is injective. If $f(x) = f(y)$, then $x + 1 = y + 1$ and hence $x = y$. Therefore $f$ is injective. Intuitively, it is clear that $f(n) = n + 1$ is a “one-to-one function.” The technical definition of injectivity represents this idea in a rigorous way.

Example 5.2. The function $f : \mathbb{Z} \to \mathbb{N}$ defined by $f(n) = |n|$ is not injective since $f(-1) = f(1)$, for instance. However, the function $g : \mathbb{N} \to \mathbb{N}$ defined by $g(n) = |n|$ is injective. This illustrates the fact that changing the domain of a function can affect injectivity.

Example 5.3. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$ is not injective. To prove that $f$ is not injective, all we have to do is find a pair of distinct elements in the domain $\mathbb{R}$ which $f$ maps to the same output. This is easy, since $f(-1) = f(1) = 1$, for example (to show that $f$ is not injective, we need only find one such pair).

Example 5.4. Let us prove that the function $f : [0, \infty) \to \mathbb{R}$ defined by $f(x) = x^2$ is injective. If $f(x) = f(y)$ for some $x, y \in [0, \infty)$, then $x^2 = y^2$. This implies that $(x - y)(x + y) = 0$ and hence either $x - y = 0$ or $x + y = 0$. There are two possible cases we must consider:

(i) If $x - y = 0$, then $x = y$.

(ii) If $x + y = 0$, then $x = -y$. But this implies that $0 \leq x = -y \leq 0$ (since $x \geq 0$ and $y \geq 0$), from which we see that $x = y = 0$.

Since both cases lead to the conclusion that $x = y$, it follows that $f$ is injective.

Observe that the preceding proof did not automatically assume that $f$ is invertible (i.e., we did not make any use of the square-root function). Using the square-root function would be inappropriate here since otherwise our reasoning would have been circular.

Another proof that $f$ is injective can be based upon the contrapositive (5.1) of the definition. If $x_1 \neq x_2$ and $x_1, x_2 \in [0, \infty)$, then without loss of generality suppose that $0 \leq x_1 < x_2$. It follows from this that $x_1^2 < x_2^2$ whence $f(x_1) \neq f(x_2)$. To be really picky, one should prove that $0 \leq x_1 < x_2$ implies that $x_1^2 < x_2^2$. Let $x_2 = x_1 + \delta$ where $\delta = x_2 - x_1 > 0$. It follows that
\[ x_2^2 = x_1^2 + 2\delta x_1 + \delta^2 > x_1^2 \]
since $\delta > 0$ and $x_1 \geq 0$.

The following theorem is useful for constructing various examples:

Theorem 5.1. If $f$ is differentiable on an open interval $I$ and $f'(x) \neq 0$ for all $x \in I$, then $f$ is injective on $I$.

Proof. Suppose toward a contradiction that $a, b \in I$, $a < b$, and $f(a) = f(b)$. By the Mean Value Theorem from Calculus I, there exists some $c$ with $a < c < b$ such that
\[ f(b) - f(a) = f'(c)(b - a). \]
Since $f(b) - f(a) = 0$ and $f'(c) \neq 0$, it follows that $b - a = 0$ whence $a = b$. This contradiction proves that $f$ is injective. □


5.4. Surjections

Definition. For a function $f : A \to B$, the set $A$ is called the domain of $f$. The set $B$ is sometimes called the target set of $f$. The range of $f$ is defined by
\[ \operatorname{Ran} f = \{b \in B : (\exists a \in A)(b = f(a))\}. \]
The range of $f$ is also sometimes called the image of $f$ and denoted $f(A)$.

Note that the range $f(A)$ of $f$ is not always equal to $B$.

Example 5.5. We can define a function $f : \mathbb{R} \to \mathbb{R}$ via the formula $f(x) = x^2$. Here $A = B = \mathbb{R}$ so that the domain and target set of $f$ are both $\mathbb{R}$. The range of $f$, however, is the interval $[0, \infty) = \{x \in \mathbb{R} : 0 \leq x\}$. This is because not every element of the target set $\mathbb{R}$ is “hit” by the function. This points out the distinction between the target set $B$ and the range of a function. The target set is “what you are aiming for” and the range is “what you hit”.

Definition. A function $f : A \to B$ is called surjective (with respect to $B$) if for every $b \in B$, there exists an $a \in A$ such that $f(a) = b$. A surjective function is called a surjection.

In symbols, the definition reads:
\[ (\forall b \in B)(\exists a \in A)(f(a) = b). \]
Another commonly used terminology (which you may have heard in your calculus class) is onto.

Observe that the target set $B$ is of fundamental importance in the definition of surjectivity. By definition, a function is surjective if and only if $\operatorname{Ran} f = B$. That is, if and only if the range of the function equals the entire target set $B$.

To say that a function is surjective is the same as saying that it “hits its entire target set.”

Whether a function is surjective or not depends heavily on the target set $B$.

Example 5.6. The function $f : \mathbb{N} \to \mathbb{N}$ defined by $f(n) = n + 1$ is not surjective since $f(n) \neq 0$ for any $n \in \mathbb{N}$.

Example 5.7. The function $f : \mathbb{Z} \to \mathbb{Z}$ defined by $f(n) = n + 1$ is surjective. Indeed, for any $b \in \mathbb{Z}$, there exists an $a \in \mathbb{Z}$ (namely $a = b - 1$) so that $f(a) = b$.

The preceding example illustrates the following rule:

A function $f : A \to B$ is surjective if and only if the equation $f(a) = b$ has a solution for every $b$ in $B$.

Example 5.8. The function $f : \mathbb{R} \to [-1, 1]$ defined by $f(x) = \sin x$ is surjective. Note that the equation $f(x) = y$ for $y \in [-1, 1]$ has infinitely many solutions. In particular, surjectivity does not guarantee that solutions to $f(a) = b$ are necessarily unique.


5.5. Bijections

Definition. If $f : A \to B$ is both injective and surjective, then we say that $f$ is bijective.

Bijections are special. If $f : A \to B$ is a bijection, then we can define an inverse function $f^{-1} : B \to A$ by setting $f^{-1}(b) = a$ whenever $f(a) = b$. This is well-defined since $f$ is both surjective and injective. Indeed, if $f$ is not surjective, then $f^{-1}(b)$ cannot be defined for those $b \in B \setminus f(A)$. If $f$ is not injective, then there may be two distinct $a_1, a_2 \in A$ such that $f(a_1) = f(a_2) = b$ and hence $f^{-1}(b)$ does not make sense.

Note that
\[ f^{-1} \circ f : A \to A; \qquad f \circ f^{-1} : B \to B \]
and hence $f^{-1} \circ f$ and $f \circ f^{-1}$ are different functions (they have different domains) unless $A = B$. Specifically, $f^{-1} \circ f = I_A$ and $f \circ f^{-1} = I_B$, where $I_A$ and $I_B$ denote the identity functions on $A$ and $B$, respectively.

Example 5.9. The table below covers a number of examples:

f(x)       Domain         Target Set   Range            Injective   Surjective   Bijection
x + 1      N              N            {1, 2, 3, ...}   Yes         No           No
x + 1      Z              Z            Z                Yes         Yes          Yes
sin x      R              R            [−1, 1]          No          No           No
x^3 − x    R              R            R                No          Yes          No
tan x      (−π/2, π/2)    R            R                Yes         Yes          Yes

Most of the entries in the table are relatively self-explanatory. A few are worth mentioning specifically, however. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3 - x$ is a surjection but not an injection. It is a surjection since $\lim_{x \to \pm\infty} f(x) = \pm\infty$ and $f$ is continuous (hence, by the Intermediate Value Theorem from Calculus I, its range is $\mathbb{R}$). It is not an injection since $f(-1) = f(1) = 0$.

Definition. Suppose that $f : A \to B$ and $g : B \to C$ are two functions. The composition $g \circ f$ is the function $g \circ f : A \to C$ defined by
\[ (g \circ f)(a) = g(f(a)). \]

Observe that function composition is associative. Indeed, if $h : C \to D$, then
\[ (h \circ (g \circ f))(a) = h((g \circ f)(a)) = h(g(f(a))) = (h \circ g)(f(a)) = ((h \circ g) \circ f)(a) \]
for all $a \in A$ and hence we may write $h \circ g \circ f$ without parentheses.

From our perspective, the most important property of function composition is that it respects the properties of injectivity, surjectivity, and bijectivity:

Theorem 5.2. Let $f : A \to B$ and $g : B \to C$ be functions.

(i) If $f$ and $g$ are injections, then $g \circ f : A \to C$ is an injection,

(ii) If $f$ and $g$ are surjections, then $g \circ f : A \to C$ is a surjection,

(iii) If $f$ and $g$ are bijections, then $g \circ f : A \to C$ is a bijection,

(iv) If $f : A \to B$ is a bijection, then $f^{-1} : B \to A$ is also a bijection.

Proof. We first prove (i). For each $a_1, a_2 \in A$ we have
\[ (g \circ f)(a_1) = (g \circ f)(a_2) \Leftrightarrow g(f(a_1)) = g(f(a_2)) \Leftrightarrow f(a_1) = f(a_2) \Leftrightarrow a_1 = a_2 \]
and hence $g \circ f$ is injective. The first $\Leftrightarrow$ is the definition of $g \circ f$; the last two hold because $g$ and $f$ are injections, respectively. Now for (ii). If $c \in C$, then we must find some $a \in A$ such that $(g \circ f)(a) = c$. Since $g$ is surjective, there exists some $b \in B$ such that $g(b) = c$. Since $f$ is surjective, there exists some $a \in A$ such that $f(a) = b$. Hence
\[ (g \circ f)(a) = g(f(a)) = g(b) = c \]
and $g \circ f$ is surjective. The proof of (iii) follows immediately from (i) and (ii). Statement (iv) was discussed above when we defined inverse functions. □
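For functions between finite sets, the definitions of injectivity, surjectivity, and composition can be tested mechanically. The sketch below is only an illustration: representing a function as a Python dictionary, and the helper names `is_injective`, `is_surjective`, and `compose`, are assumptions made here and are not part of the notes.

```python
def is_injective(f):
    """f is a dict sending each element of the domain to its image."""
    return len(set(f.values())) == len(f)      # no output is used twice

def is_surjective(f, target):
    """True when every element of the target set is an output of f."""
    return set(f.values()) == set(target)

def compose(g, f):
    """Return g o f as a dict, assuming every value of f is a key of g."""
    return {a: g[f[a]] for a in f}

f = {1: "a", 2: "b", 3: "c"}                   # a bijection {1,2,3} -> {a,b,c}
g = {"a": "x", "b": "y", "c": "z"}             # a bijection {a,b,c} -> {x,y,z}
h = compose(g, f)

# Consistent with Theorem 5.2 (iii): the composition is again a bijection.
assert is_injective(h) and is_surjective(h, {"x", "y", "z"})

# Squaring on {-1, 0, 1} is not injective, mirroring Example 5.3.
assert not is_injective({-1: 1, 0: 0, 1: 1})
```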

LECTURE 6

Cardinality

6.1. Cardinality

Definition. Let $A$ and $B$ be sets. If there exists a bijection $f : A \to B$, then $A$ and $B$ are said to have equal cardinality (or stated: $A$ and $B$ are of the same cardinality). This is written $A \cong B$.

Example 6.1. For finite sets, $A \cong B$ just means that “$A$ and $B$ have the same number of elements.” For instance, the sets
\[ A = \{\text{apple}, \text{bird}, \text{cat}\}, \qquad B = \{a, b, c\} \]
have the same cardinality since there is a bijection $f : A \to B$.

One of the most important properties of the symbol $\cong$ is that it is an equivalence relation. In other words, it “behaves like an equal sign”:

Theorem 6.1. $\cong$ is an equivalence relation. In other words, for any sets $A, B, C$ the following are true:

(i) $A \cong A$

(ii) $A \cong B$ implies that $B \cong A$

(iii) $A \cong B$ and $B \cong C$ implies that $A \cong C$.

Proof. (i) follows from the fact that the identity function $I_A : A \to A$ is a bijection. (ii) follows from the fact that a bijection $f : A \to B$ has an inverse function $f^{-1} : B \to A$ which is also a bijection. (iii) follows from the fact that the composition of bijections is also a bijection. □

The concept of cardinality allows us to divide up the universe of sets into various categories. Some important definitions are:

Definition. We say that a set $A$ is

(i) finite if $A = \emptyset$ or $A \cong \{1, 2, \ldots, n\}$ for some $n \in \mathbb{N}$

(ii) infinite if $A$ is not finite

(iii) countable if $A$ is finite or $A \cong \mathbb{N}$

(iv) uncountable if $A$ is not countable.

A countable infinite set is sometimes called countably infinite.

There is an alternate definition of infinite which is sometimes used. We state it in the form of a theorem (without proof):

Theorem 6.2. A set $A$ is infinite if and only if there exists a proper subset $B \subsetneq A$ such that $A \cong B$.


6.2. Countable Sets

Let us discuss some examples of countable sets and various methods for constructing them.

Example 6.2. $\mathbb{N} \cong \mathbb{N}$. Indeed, the identity function $I : \mathbb{N} \to \mathbb{N}$ defined by $I(n) = n$ for all $n \in \mathbb{N}$ is clearly a bijection.

Theorem 6.3. Any subset of a countable set is countable.

Sketch of Pf. If $A$ is a countable set, then we may list the elements of $A$:
\[ a_0, a_1, a_2, \ldots. \]
If $B \subseteq A$, then we simply make a new list by crossing out those elements of $A$ which do not belong to $B$. This produces a new list which provides a recipe for a bijection. □

Example 6.3. $S \cong \mathbb{N}$, where $S = \{0, 1, 4, 9, 16, \ldots\}$. Indeed, the function $f(n) = n^2$ is a bijection from $\mathbb{N}$ onto $S$.

If $A$ is a countable infinite set, then there is a bijection $f : \mathbb{N} \to A$ which provides a list of the elements of $A$:
\[ f(0), f(1), f(2), f(3), \ldots. \]
In other words, the elements of a countably infinite set can be listed.

Theorem 6.4. If $A$ and $B$ are countable, then $A \cup B$ is countable.

Proof. Without loss of generality, suppose that both $A$ and $B$ are countably infinite and disjoint: $A \cap B = \emptyset$. The elements of $A$ and $B$ can be listed:
\[ a_0, a_1, a_2, a_3, a_4, \ldots \]
\[ b_0, b_1, b_2, b_3, b_4, \ldots. \]
We can simply interlace the two lists to obtain a listing of every element of $A \cup B$:
\[ a_0, b_0, a_1, b_1, a_2, b_2, a_3, b_3, a_4, b_4, \ldots. \]
Since $A \cap B = \emptyset$, this list provides a bijection from $\mathbb{N}$ to $A \cup B$. In fact, the function $f : \mathbb{N} \to A \cup B$ defined by
\[ f(n) = \begin{cases} a_{n/2} & n \text{ even} \\ b_{(n-1)/2} & n \text{ odd} \end{cases} \]
is such a bijection. The case where one or both of $A, B$ is finite is left to the reader (as is the case where $A \cap B \neq \emptyset$). □
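The interlacing of two lists can be sketched with generators, which model endless listings conveniently. This is an illustration under stated assumptions: the name `interlace` is hypothetical, both inputs are assumed to be infinite iterables, and the even and odd natural numbers stand in for the disjoint sets $A$ and $B$.

```python
from itertools import count, islice

def interlace(a, b):
    """Yield a0, b0, a1, b1, ... from two infinite iterables."""
    a, b = iter(a), iter(b)
    while True:
        yield next(a)     # the term sitting at an even position of the new list
        yield next(b)     # the term sitting at an odd position of the new list

evens = (2 * n for n in count())        # stand-in listing of A
odds = (2 * n + 1 for n in count())     # stand-in listing of B

print(list(islice(interlace(evens, odds), 8)))
```

Position $n$ of the merged listing holds $a_{n/2}$ when $n$ is even and $b_{(n-1)/2}$ when $n$ is odd, in agreement with the piecewise description in the proof.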

Example 6.4. $\mathbb{Z} \cong \mathbb{N}$. Indeed,
\[ 0, 1, -1, 2, -2, 3, -3, 4, -4, \ldots \]
is a complete listing of $\mathbb{Z}$. Implicitly, this listing defines a function $f : \mathbb{N} \to \mathbb{Z}$ whose values are represented in the following table:

n:     0  1  2   3  4   5  6   7  ···
f(n):  0  1  −1  2  −2  3  −3  4  ···

In fact, an explicit formula for this function is
\[ f(n) = \begin{cases} -\frac{n}{2} & n \text{ even} \\ \frac{n+1}{2} & n \text{ odd.} \end{cases} \]
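The explicit formula can be checked against the table, and an inverse can be written down as well. In this sketch the names `nat_to_int` and `int_to_nat` are hypothetical, and the inverse formula is not stated in the notes; it is derived here from the listing by sending positive integers to odd positions and the rest to even positions.

```python
def nat_to_int(n):
    """The bijection f : N -> Z of Example 6.4."""
    return -(n // 2) if n % 2 == 0 else (n + 1) // 2

def int_to_nat(z):
    """Inverse of nat_to_int (an assumption derived from the listing)."""
    return 2 * z - 1 if z > 0 else -2 * z

# Reproduces the table: 0, 1, -1, 2, -2, 3, -3, 4.
print([nat_to_int(n) for n in range(8)])

# The two maps undo each other on a finite sample.
assert all(int_to_nat(nat_to_int(n)) == n for n in range(1000))
assert all(nat_to_int(int_to_nat(z)) == z for z in range(-500, 500))
```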


Example 6.5. $\mathbb{N}^2 \cong \mathbb{N}$. This example is so important that we explain it in two different ways. First, consider Figure 1, which illustrates a definite procedure for listing every element of $\mathbb{N}^2$. In fact, one can find a polynomial in two variables which accomplishes this task (we leave the derivation of this formula to the homework).

FIGURE 1. A listing of $\mathbb{N}^2$.

On a completely different note, there is also a brief number-theoretic argument which provides another bijection $f : \mathbb{N}^2 \to \mathbb{N}$. We claim that the function
\[ f(a, b) = 2^a (2b + 1) - 1 \]
is a bijection. If $f(a, b) = f(c, d)$, then
\[ 2^a (2b + 1) - 1 = 2^c (2d + 1) - 1 \]
whence
\[ 2^a (2b + 1) = 2^c (2d + 1). \]
Since $2b + 1$ and $2d + 1$ are odd, it follows from the Fundamental Theorem of Arithmetic that $a = c$. This implies that $2b + 1 = 2d + 1$ whence $b = d$. Therefore $f$ is injective. If $n \in \mathbb{N}$ is given, then use the Fundamental Theorem of Arithmetic to factor $n + 1$ to yield $a, b \in \mathbb{N}$ such that
\[ n + 1 = 2^a (2b + 1). \]
Clearly this implies that $f(a, b) = n$ whence $f$ is surjective.
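Both halves of the number-theoretic argument are constructive, so the bijection and its inverse can be computed. The following is a minimal sketch; the names `pair` and `unpair` are hypothetical, and the inverse simply strips factors of 2 from $n + 1$, which is one way to carry out the factorization used in the surjectivity step.

```python
def pair(a, b):
    """f(a, b) = 2**a * (2*b + 1) - 1."""
    return 2**a * (2 * b + 1) - 1

def unpair(n):
    """Recover (a, b) from n by writing n + 1 as a power of 2 times an odd number."""
    m = n + 1
    a = 0
    while m % 2 == 0:             # strip off the factors of 2
        m //= 2
        a += 1
    return a, (m - 1) // 2        # the odd part m equals 2*b + 1

# Every n in a finite sample is hit, and distinct pairs give distinct values.
assert all(pair(*unpair(n)) == n for n in range(2000))
assert len({pair(a, b) for a in range(8) for b in range(8)}) == 64
print([unpair(n) for n in range(6)])
```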

Example 6.6. $\mathbb{Q} \cap [0, 1] \cong \mathbb{N}$. Using a similar idea, we can construct a list of every rational number in the closed interval $[0, 1]$:
\[ 0, \; 1, \; \tfrac{1}{2}, \; \tfrac{1}{3}, \; \tfrac{2}{3}, \; \tfrac{1}{4}, \; \tfrac{3}{4}, \; \tfrac{1}{5}, \; \tfrac{2}{5}, \; \tfrac{3}{5}, \; \tfrac{4}{5}, \; \tfrac{1}{6}, \; \tfrac{5}{6}, \; \tfrac{1}{7}, \; \ldots \]
To do this, simply list the fractions in $[0, 1]$ with denominator $1, 2, 3, \ldots$ without repeats. Since there are at most $n + 1$ fractions in $[0, 1]$ having denominator $n$, it follows that each stage in this procedure is finite.
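The listing procedure can be sketched as a generator. This is an illustration only: the name `rationals_in_unit_interval` is hypothetical, and repeats are avoided by keeping a fraction only when it is in lowest terms, which is one reasonable reading of "without repeats."

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def rationals_in_unit_interval():
    """Yield each rational in [0, 1] exactly once, ordered by denominator."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:        # skip repeats such as 2/4 = 1/2
                yield Fraction(p, q)
        q += 1                        # each stage is finite, so we always move on

first = list(islice(rationals_in_unit_interval(), 14))
print([str(r) for r in first])
```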

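Example 6.7. It is possible to prove that the Newman function
\[ x \mapsto \frac{1}{[x] + 1 - \langle x \rangle} \]
(where $[x]$ denotes the greatest integer in $x$ and $\langle x \rangle = x - [x]$ its fractional part) recursively generates the Calkin-Wilf sequence
\[ \frac{1}{1} \to \frac{1}{2} \to \frac{2}{1} \to \frac{1}{3} \to \frac{3}{2} \to \frac{2}{3} \to \frac{3}{1} \to \frac{1}{4} \to \frac{4}{3} \to \cdots \]
which contains every positive rational number exactly once.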
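Iterating the Newman function is a one-line computation with exact rationals. The sketch below assumes the standard reading of the formula, with the greatest integer part and the fractional part of the argument; the names `newman` and `calkin_wilf` are hypothetical.

```python
from fractions import Fraction
from math import floor

def newman(x):
    """Send x to 1 / (floor(x) + 1 - frac(x))."""
    whole = floor(x)
    frac = x - whole
    return 1 / (whole + 1 - frac)

def calkin_wilf(n):
    """First n terms of the sequence obtained by iterating newman from 1."""
    x = Fraction(1)
    terms = []
    for _ in range(n):
        terms.append(x)
        x = newman(x)
    return terms

print([str(t) for t in calkin_wilf(9)])

# No value repeats among the first 500 terms (a finite sanity check only).
assert len(set(calkin_wilf(500))) == 500
```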

LECTURE 7

Cantor’s Theorem

7.1. Constructions with Countable Sets

Theorem 7.1. If $A_n$ is countable for each $n \in \mathbb{N}$, then $\bigcup_{n \in \mathbb{N}} A_n$ is countable. In other words, “the countable union of countable sets is countable.”

Sketch of Pf. Without loss of generality, suppose that each of the $A_n$ is countably infinite and that $A_i \cap A_j = \emptyset$ if $i \neq j$. For each $n \in \mathbb{N}$, arrange the elements of $A_n$ in a list:
\[ A_n = \{a_{0n}, a_{1n}, a_{2n}, a_{3n}, \ldots\}. \]
The function
\[ f(i, j) = \text{$i$th element in the listing of $A_j$} = a_{ij} \]
defines a bijection $f : \mathbb{N}^2 \to \bigcup_{n \in \mathbb{N}} A_n$. It therefore follows that
\[ \bigcup_{n \in \mathbb{N}} A_n \cong \mathbb{N}^2 \cong \mathbb{N}. \] □

Theorem 7.2. The Cartesian product $A \times B$ of two countable sets $A, B$ is countable.

Sketch of Pf. Without loss of generality, suppose that $A$ and $B$ are countably infinite sets. Let $a_0, a_1, a_2, \ldots$ be a listing of the elements of $A$. Since
\[ A \times B = \{(a, b) : a \in A, \, b \in B\} = \bigcup_{n \in \mathbb{N}} \{(a_n, b) : b \in B\} \]
and each set $A_n = \{(a_n, b) : b \in B\}$ is countable, it follows from the preceding theorem that $A \times B$ is countable. □

Example 7.1. $\mathbb{Z}^2 \cong \mathbb{N}$. Indeed, $\mathbb{Z} \cong \mathbb{N}$ (i.e., $\mathbb{Z}$ is countable) and hence the preceding theorem tells us that $\mathbb{Z}^2$ is countable as well. This can also be established via a “snake eating the dots argument.” First regard $\mathbb{Z}^2$ as a subset of the Euclidean plane. Starting at $(0, 0)$, trace out a “square spiral” pattern which hits every lattice point $(a, b) \in \mathbb{Z}^2$.

Example 7.2. $\mathbb{Q} \cong \mathbb{N}$. For each point $(a, b) \in \mathbb{Z}^2$, we can associate the fraction $a/b$. Some of these will be meaningless (if $b = 0$) and many will be repeats, since $1/2 = 2/4 = 3/6 = \cdots$, for example. We can, however, produce a list of all of $\mathbb{Q}$ by using the “snake argument” from the preceding example to produce a complete list of all possible rational numbers.

Another way to prove that $\mathbb{Q} \cong \mathbb{N}$ is to use the fact that $\mathbb{Q} \cap [0, 1] \cong \mathbb{N}$ and employ some of our theorems on constructing countable sets. We leave this as an exercise.


7.2. Cantor’s Diagonal Argument

One might begin to suspect that bijections between infinite sets are essentially meaningless and that “all infinities are the same.” Shockingly, it turns out that this is not the case. The following remarkable theorem is due to Georg Cantor:

Theorem 7.3 (Cantor). $\mathbb{R}$ is uncountable. In other words, there does not exist a bijection $f : \mathbb{N} \to \mathbb{R}$.

Proof. Suppose toward a contradiction that a bijection $f : \mathbb{N} \to \mathbb{R}$ exists (in fact, we will prove that no surjection $f : \mathbb{N} \to \mathbb{R}$ exists). We will use the fact that any real number can be written uniquely as a sequence of decimal digits¹. Since the function $f$ is supposed to be a bijection from $\mathbb{N}$ to $\mathbb{R}$, we obtain a complete listing
\[ f(0), f(1), f(2), f(3), \ldots \]
of all $\mathbb{R}$. Let us write this list as an array:
\[ \begin{aligned}
f(0) &= d_{00}.d_{01}d_{02}d_{03}d_{04}\ldots \\
f(1) &= d_{10}.d_{11}d_{12}d_{13}d_{14}\ldots \\
f(2) &= d_{20}.d_{21}d_{22}d_{23}d_{24}\ldots \\
f(3) &= d_{30}.d_{31}d_{32}d_{33}d_{34}\ldots \\
f(4) &= d_{40}.d_{41}d_{42}d_{43}d_{44}\ldots \\
&\;\;\vdots
\end{aligned} \]
where the $d_{i0}$'s are integers and the $d_{ij} \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ for $j \geq 1$. We will take the “diagonal number”:
\[ d_{00}.d_{11}d_{22}d_{33}d_{44}\ldots \]
and tweak it so that the resulting number cannot possibly be on our list. This will be our desired contradiction.

Consider the new number
\[ x = D_0.D_1D_2D_3\ldots \]
where the new digits $D_n$ are defined by
\[ D_n = \begin{cases} 4 & d_{nn} \neq 4 \\ 7 & d_{nn} = 4. \end{cases} \]
Note that for each $n \in \mathbb{N}$, the $n$th decimal place of $x$ is different than the $n$th decimal place of $f(n)$. In other words, $x$ cannot be any of the $f(n)$ and hence the function $f : \mathbb{N} \to \mathbb{R}$ is not surjective, a contradiction. □

The numbers 4 and 7 in the preceding proof are not important. We just do not want to use 9's in either case since otherwise the number $x$ produced might end in all 9's, which would cause a problem since we are using decimal expansions that do not trail off in all 9's.

MORAL: $\mathbb{R}$ is so much larger than $\mathbb{N}$ that it belongs to a higher class of infinite sets. In other words, there are different levels of infinity.

¹ If we agree not to end in all 9's. For instance: $0.50000\ldots = 0.4999999\ldots$.


Corollary 3. The set $\mathbb{R} \setminus \mathbb{Q}$ of irrational numbers is uncountable. In particular, there are more irrational numbers than rational numbers.

Proof. Recall that $\mathbb{Q}$ is countable. Since the union of two countable sets is countable, if $\mathbb{R} \setminus \mathbb{Q}$ were countable, then $\mathbb{R}$ would be countable too. This would be a contradiction to Cantor's Theorem. □

Corollary 4. Every subinterval $(a, b)$ of $\mathbb{R}$ is uncountable.

Sketch of Pf. It suffices to find a bijection between $(a, b)$ and $\mathbb{R}$. This can be done, for instance, by composing an appropriate linear function $f(x) = \alpha x + \beta$ (with $\alpha \neq 0$) with a continuous, monotone increasing function with two vertical asymptotes, such as $g(x) = \tan x$ on the interval $(-\pi/2, \pi/2)$ or $h(x) = x/(1 - x^2)$ on the interval $(-1, 1)$. □

The previous corollary asserts that not only does $\mathbb{R}$ contain vastly more elements than $\mathbb{N}$, any tiny subinterval of $\mathbb{R}$, no matter how small, does as well. This may seem paradoxical at first, and it takes a long time to digest. Try to think of this in the context of the fact that between any two rational numbers, there is an irrational number and that between any two irrational numbers, there is a rational number.

LECTURE 8

The Continuum Hypothesis

8.1. Cantor’s Powerset Theorem

Definition. If $A$ is a set, then the power set of $A$, denoted $\mathcal{P}(A)$, is defined to be the set of all subsets of $A$. In symbols:
\[ \mathcal{P}(A) = \{B : B \subseteq A\}. \]

Example 8.1. If $A = \{a, b, c\}$, then
\[ \mathcal{P}(A) = \big\{ \emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{b, c\}, \{a, c\}, A \big\}. \]

Example 8.2. Describing the power set of infinite sets is much trickier. For instance, $\mathcal{P}(\mathbb{N})$ contains every possible subset of $\mathbb{N}$ and hence contains the sets
\[ \emptyset, \; \{1\}, \; \{5, 23\}, \; \{2, 4, 6, 8, \ldots\}, \; \{100, 101, 102, \ldots\}, \; \{2, 3, 5, 7, 11, 13, \ldots\}. \]

It turns out that $\mathcal{P}(\mathbb{N})$ is much larger than $\mathbb{N}$ itself. In fact, Cantor showed that there are “infinitely many levels of infinity”:

Theorem 8.1 (Cantor). If $S$ is any set, then there does not exist a bijection $f : S \to \mathcal{P}(S)$. In other words, “$\mathcal{P}(S)$ is of a strictly larger cardinality than $S$.”

Proof. Assume toward a contradiction that $f : S \to \mathcal{P}(S)$ is a bijection. For each $x \in S$, we have $f(x) \subseteq S$ and hence either $x \in f(x)$ or $x \notin f(x)$. Let
\[ E = \{x \in S : x \notin f(x)\}. \]
Since $f$ is a bijection, there exists a $z \in S$ such that $f(z) = E$. However,
\[ \begin{aligned}
z \in E &\Leftrightarrow z \notin f(z) && \text{Def. of } E \\
&\Leftrightarrow z \notin E && \text{Since } f(z) = E.
\end{aligned} \]
This contradiction shows that no such $f$ can exist. □
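For a small finite set, the key step of the proof can be verified exhaustively: whatever function $f : S \to \mathcal{P}(S)$ is chosen, the set $E$ is never one of its values. The sketch below is only an illustration; the names `power_set` and `cantor_set` and the choice $S = \{0, 1, 2\}$ are assumptions made here.

```python
from itertools import combinations, product

def power_set(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def cantor_set(f):
    """E = {x in S : x not in f(x)} for f given as a dict from S to subsets of S."""
    return frozenset(x for x in f if x not in f[x])

S = [0, 1, 2]
subsets = power_set(S)                     # 8 subsets in total

# Try every one of the 8**3 = 512 functions f : S -> P(S).
for images in product(subsets, repeat=len(S)):
    f = dict(zip(S, images))
    assert cantor_set(f) not in f.values() # E is always missed by f

print(len(subsets), "subsets; no function from S reaches its own set E")
```

For finite sets the conclusion also follows from counting, since $2^n > n$; the point of the sketch is that the same set $E$ witnesses the failure of surjectivity in every case.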

The preceding theorem shows that if $S$ is an infinite set, then $\mathcal{P}(S)$ is “much bigger” than $S$ itself, so much bigger that it bumps up to a “higher level of infinity.” Moreover, we can obtain a chain of ever larger infinite sets:
\[ S, \; \mathcal{P}(S), \; \mathcal{P}(\mathcal{P}(S)), \; \ldots. \]
This is a somewhat shocking thought! Now for a big question:

Question. If $\mathbb{N} \subseteq A \subseteq \mathbb{R}$, then is it necessarily the case that either $A \cong \mathbb{N}$ or $A \cong \mathbb{R}$? Phrased another way, are there “intermediate cardinalities” between that of $\mathbb{N}$ and that of $\mathbb{R}$?


8.2. Russell’s Paradox

Having worked with sets a little bit, you might be surprised to learn that our approach to sets is not logically sound. In fact, it is called “naive set theory” to distinguish it from the rigorous axiomatic approach used in formal set theory. A startlingly simple logical paradox due to Bertrand Russell immediately shows that the basis of this approach to sets is unsound.

One of the basic principles of “naive set theory” is the General Comprehension Principle, which we implicitly used above. In the early days of set theory (around 1873–1900), mathematicians and logicians had always assumed that you can always define a set if you have a “definite property” $P(x)$. In other words, given a reasonable statement $P(x)$, the set of all $x$ for which $P(x)$ is true should exist, logically speaking. Essentially, they assumed that
\[ \{x : P(x)\} \]
should always exist and be something that we are allowed to think about and discuss logically. Surprisingly, this is not the case.

The death blow to naive set theory came in 1901 and it is called Russell's Paradox. Russell begins by letting
\[ R = \{x : (x \text{ is a set}) \wedge (x \notin x)\}. \]
In other words,

$R$ is the set of all sets that are not elements of themselves.

The expression
\[ P(x) = (x \text{ is a set}) \wedge (x \notin x) \]
is quite unambiguous. An object $x$ should either be a set or not a set. An object $x$ should either be an element of itself or not be an element of itself. Thus $P(x)$ looks like an unambiguous, if a little unusual, condition. As logical human beings, we should be permitted to think about the set $R$.

Russell then asks: Does $R$ contain itself or not? Unfortunately, the definition of $R$ implies that
\[ R \in R \Leftrightarrow R \notin R. \]

Neither $R \in R$ nor $R \notin R$ is logically possible! This means that we cannot treat $R$ as a set – it is simply “too large” of an idea to be considered in a logically sound manner. In other words, we cannot logically consider “the set of all sets that are not elements of themselves” without running into paradoxes. We just cannot – it is a law of the universe.

Russell's Paradox shows that the General Comprehension Principle is not correct. Russell discovered this paradox and sent it to Gottlob Frege (1848 – 1925) as Frege was finishing his Grundgesetze der Arithmetik, a work which attempted to rigorously derive the laws of arithmetic from supposedly logical axioms. Russell's Paradox invalidated much of Frege's work. Indeed, Frege noted:

A scientist can hardly meet with anything more undesirable than to have the foundation give way just as the work is finished. I was put in this position by a letter from Mr. Bertrand Russell when the work was nearly through the press.

There are many other logical paradoxes that have been discovered throughout the years, but Russell's paradox is one of the most important. It forced mathematicians and logicians to completely reevaluate mathematics and logic from the ground up. Russell's Paradox ushered in a new age in which sets would have to be treated in a rigorous axiomatic fashion. The rules would have to be explicitly stated in such a way that Russell's Paradox would not occur in the universe of Axiomatic Set Theory. Although we will not discuss axiomatic set theory in this course, it is important to be aware that sets and set theory are not as simple as they sound.

Here are a couple of paradoxes which are somewhat similar in spirit:

Example 8.3. A car is equipped with a Russell light on its dashboard. The light turns on to warn the driver if a light has burnt out. What happens when the Russell light burns out?

Example 8.4. The following paradox of Eubulides of Miletus¹ (4th century BCE) indicates that self-reference can be troublesome:

This statement is false.

This is a troublesome sentence (call it P) since

P is true ⇔ P is false.

Thus Eubulides’ statement is not a logical proposition. This paradox is similar to the liar paradox: I am lying.

8.3. The Continuum Hypothesis

Question. If N ⊆ A ⊆ R, then is it necessarily the case that either A ≅ N or A ≅ R? Phrased another way, are there “intermediate cardinalities” between that of N and that of R?

The Continuum Hypothesis (CH) asserts that if N ⊆ A ⊆ R, then either A ≅ N or A ≅ R. Georg Cantor believed CH to be true, and spent years attempting to prove it. David Hilbert, one of the greatest mathematicians in history, placed it first on his list of open questions presented to the 1900 International Mathematical Congress in Paris. Surprisingly, the question of whether CH is true or false cannot be settled from the standard axioms of set theory.

In 1940, Kurt Gödel proved that CH cannot be disproved from the axioms of set theory. Specifically, he showed that CH cannot be disproved using the Zermelo-Fraenkel (ZF) axioms or using the Zermelo-Fraenkel axioms with the addition of the (at one time controversial) Axiom of Choice (AC). This extended axiom system is denoted ZFC. In 1963, Paul Cohen demonstrated that CH cannot be proved from ZFC either and hence CH is logically independent of ZFC – it is neither provable nor disprovable from the standard axioms of set theory (of course, the results of Gödel and Cohen rely on the assumption that ZFC is not in itself flawed).

Using the standard (ZFC) axioms of set theory, one can add CH or its negation to obtain two different versions of mathematics, one in which CH is true and one in which CH is false. Each universe is as valid as the other – the truth or falsehood of CH is therefore a matter of opinion, since it cannot be proved or disproved from ZFC. This seems bizarre, but it is easier to understand if we examine a similar situation that occurred in classical geometry.

¹ I have actually been to Miletus (now known as Milet, in modern Turkey). There are many fascinating Roman era ruins, partially sunken below a swamp, which are open to the public. There are, however, few tourists who visit the site.


8.4. Digression on Geometry

Around 2300 years ago, the famous geometer Euclid of Alexandria (in modern Egypt) wrote the Elements, a monumental treatise on geometry and related topics. What is remarkable about the Elements is that it is an attempt to build geometry in a logical and rigorous manner from a few basic axioms. Although Euclid’s book contains numerous oversights and hidden assumptions, it is nonetheless a magnificent intellectual achievement (despite the fact that many of the results in the Elements were already known to others and that Euclid collected them together in a textbook).

Euclid worked with certain primitive notions, for example points, lines, and circles, although he did attempt to give some vague definitions. For instance, he says that:

Definition. A point is that which has no part.

Definition. A line is breadthless length.

Although this might appear silly, how would you define a “point”? One might be tempted to say that:

A point is an ordered pair (x, y) whose entries x and y are real numbers,

although the modern Cartesian definition above is not much better. Indeed, what exactly are real numbers anyway? One cannot simply say that

A real number is a point on a line,

since this would lead us in circles! Similarly, how would you define the words length and angle?

After making 23 somewhat lengthy definitions, defining everything from circles to isosceles triangles to rhomboids, Euclid is ready to begin talking about axioms² – statements which are given as true. Euclidean geometry then refers to the vast body of theorems which can be proved using Euclid’s definitions and axioms. Euclid’s axioms (he called them postulates) for geometry are:

Postulate 1. A straight line segment can be drawn joining any two points.

Postulate 2. Any straight line segment can be extended indefinitely in a straight line.

Postulate 3. Given any straight line segment, a circle can be drawn having the segment as radius and one endpoint as center.

Postulate 4. All right angles are congruent.

Postulate 5. If two lines are drawn which intersect a third in such a way that the sum of the inner angles on one side is less than two right angles, then the two lines inevitably must intersect each other on that side if extended far enough.

From these he proceeds to prove many well-known theorems on plane geometry. Unfortunately, Euclid’s 5th Postulate looks too complicated. Is the 5th Postulate something that we should accept as “true”? Euclid himself must have been unsatisfied with his 5th Postulate since he held off from using it as long as he could (until his twenty-ninth theorem – Proposition I.29).

Perhaps Postulate 5 is redundant and can be proved from Postulates 1–4? If that were possible, we would not need to assume that Postulate 5 is true at all – we could prove it from Postulates 1–4 and call it a theorem instead. This is precisely what people tried to do (unsuccessfully) for 2000 years.

² He actually also talked about common notions, which were mainly explicit rules for logical deduction.

Given only Postulates 1–4, it is impossible to prove or to disprove Postulate 5. In other words, Euclid’s 5th Postulate is neither true nor false in the mathematical universe generated by Postulates 1–4. This is not a statement about universal truth and universal falsehood, which is reserved for philosophy. It means only that if we are given only Postulates 1–4 as true, we cannot logically deduce the truth or falsehood of Postulate 5. One says that the 5th Postulate is logically independent of Postulates 1–4.

This opens up two possible mathematical universes, each as valid as the other. If one assumes that Euclid’s 5th Postulate is true and proceeds to prove theorems based on Postulates 1–5, then one is proving theorems about Euclidean (flat) geometry. If one assumes that Euclid’s 5th Postulate is false and proceeds to prove theorems in this setting, then one is proving theorems about hyperbolic geometry (a type of curved geometry). The existence of curved geometries is not surprising to us in the 21st century, since we are used to hearing of relativity and “curved space-time.” Many years ago, however, this was an extremely radical thought. Indeed, the philosopher Immanuel Kant went so far as to say that “Euclidean geometry is the inevitable necessity of thought.”

LECTURE 9

Normed Vector Spaces

9.1. Vector Spaces

You may have encountered vectors and vector spaces in other settings before. An abstract vector space is a generalization of $n$-dimensional Euclidean space, $\mathbb{R}^n$.

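Definition. A vector space is a set $V$ endowed with (and closed under) operations called vector addition and scalar multiplication such that the following hold:

(i) COMMUTATIVITY: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ for all $\mathbf{u}, \mathbf{v}$ in $V$.

(ii) ASSOCIATIVITY: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ for all $\mathbf{u}, \mathbf{v}, \mathbf{w}$ in $V$.

(iii) ADDITIVE IDENTITY: There exists a vector $\mathbf{0} \in V$ such that

\[ \mathbf{u} + \mathbf{0} = \mathbf{0} + \mathbf{u} = \mathbf{u} \]

for all $\mathbf{u} \in V$.

(iv) ADDITIVE INVERSE: For every $\mathbf{u} \in V$, there exists a $\mathbf{v} \in V$ such that $\mathbf{u} + \mathbf{v} = \mathbf{0}$.

(v) MULTIPLICATIVE IDENTITY: $1\mathbf{u} = \mathbf{u}$ for all $\mathbf{u} \in V$.

(vi) DISTRIBUTIVITY:

\[ a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v}, \qquad (a + b)\mathbf{u} = a\mathbf{u} + b\mathbf{u} \]

for all $a, b \in \mathbb{R}$ and $\mathbf{u}, \mathbf{v} \in V$.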

Note that in general we do not have a rule that lets us multiply two vectors (i.e., like a “cross-product”). An important theorem for constructing and identifying new vector spaces is the following:

Theorem 9.1. A nonempty subset $W$ of a vector space $V$ is itself a vector space (with the operations “inherited” from $V$) if and only if $c_1\mathbf{w}_1 + c_2\mathbf{w}_2 \in W$ for all $c_1, c_2 \in \mathbb{R}$ and $\mathbf{w}_1, \mathbf{w}_2 \in W$.

Example 9.1. $\mathbb{R}$ itself is a vector space. The operations are simply the usual operations of addition and multiplication.

Example 9.2. The simplest and most important nontrivial example of a vector space is $n$-dimensional Euclidean space, $\mathbb{R}^n$, with the usual operations of vector addition

\[ (x_1, \ldots, x_n) + (y_1, \ldots, y_n) = (x_1 + y_1, \ldots, x_n + y_n) \]

and scalar multiplication

\[ a(x_1, \ldots, x_n) = (ax_1, \ldots, ax_n). \]
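The componentwise operations of Example 9.2 can be sketched in a few lines of Python. This is only an illustration: the names `vec_add` and `scalar_mul` are hypothetical, vectors are modeled as tuples, and exact rational entries are used so that the spot checks of the axioms are not blurred by rounding.

```python
from fractions import Fraction

def vec_add(u, v):
    """Componentwise sum of two vectors in R^n (modeled as tuples)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return tuple(a + b for a, b in zip(u, v))

def scalar_mul(a, v):
    """Multiply every entry of the vector v by the scalar a."""
    return tuple(a * x for x in v)

# Sample vectors in R^3 with exact rational entries (illustrative data).
u = (Fraction(1), Fraction(2), Fraction(3))
v = (Fraction(-1), Fraction(1, 2), Fraction(0))
zero = (Fraction(0),) * 3

print(vec_add(u, v))      # componentwise sum
print(scalar_mul(2, u))   # every entry doubled
```

Checking an axiom on a few sample vectors is of course not a proof; the axioms hold for all vectors because they hold entry by entry in $\mathbb{R}$.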


Example 9.3. The set $M_n(\mathbb{R})$ of all $n \times n$ matrices is a vector space. Indeed, matrices can be added and multiplied by constants (one can check that the vector space axioms are satisfied). In fact, $M_n(\mathbb{R})$ is really a disguised version of $\mathbb{R}^{n^2}$.

Example 9.4. Let $P_n(\mathbb{R})$ denote the set of polynomials of degree $\leq n$:

\[ P_n(\mathbb{R}) = \{a_0 + a_1x + a_2x^2 + \cdots + a_nx^n : a_0, a_1, \ldots, a_n \in \mathbb{R}\}. \]

We consider each polynomial

\[ a_0 + a_1x + \cdots + a_nx^n \]

to be a vector in $P_n(\mathbb{R})$ and we define vector addition by

\[ (a_0 + \cdots + a_nx^n) + (b_0 + \cdots + b_nx^n) = (a_0 + b_0) + \cdots + (a_n + b_n)x^n \]

and scalar multiplication by

\[ c(a_0 + \cdots + a_nx^n) = ca_0 + \cdots + ca_nx^n. \]

Notice the similarity between $P_n(\mathbb{R})$ and $\mathbb{R}^{n+1}$. Each polynomial in $P_n(\mathbb{R})$ is uniquely determined by an $(n + 1)$-tuple $(a_0, a_1, \ldots, a_n)$ of real numbers.
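The identification of a polynomial with its coefficient tuple can be made concrete. The sketch below stores a polynomial of degree at most $n$ as the tuple $(a_0, \ldots, a_n)$; the function names are hypothetical, and both arguments of `poly_add` are assumed to have the same length $n + 1$.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient tuples (a_0, ..., a_n)."""
    if len(p) != len(q):
        raise ValueError("pad the shorter tuple with zeros first")
    return tuple(a + b for a, b in zip(p, q))

def poly_scale(c, p):
    """Multiply the polynomial p by the scalar c."""
    return tuple(c * a for a in p)

def poly_eval(p, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n by Horner's rule."""
    result = 0
    for coeff in reversed(p):
        result = result * x + coeff
    return result

p = (1, 2, 3)    # 1 + 2x + 3x^2
q = (0, -1, 4)   # -x + 4x^2
print(poly_add(p, q))                 # coefficients of p + q
print(poly_eval(poly_add(p, q), 2))   # value of (p + q) at x = 2
```

Adding coefficient tuples and then evaluating gives the same number as evaluating each polynomial and adding the values, which is why the tuple operations deserve to be called polynomial addition.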

Example 9.5. If $X \subseteq \mathbb{R}^n$, then the set $C(X)$ of all continuous real-valued functions $f : X \to \mathbb{R}$ is a vector space. Using the fact that the sum of continuous functions is continuous, one can verify that $C(X)$ is closed under the operations of addition and scalar multiplication. In particular, note that the zero function plays the role of the zero vector in $C(X)$.

Example 9.6. The set

\[ V = \{f : \mathbb{R} \to \mathbb{R} : (\forall x \in \mathbb{R})(\, f''(x) + f(x) = 0 \,)\} \]

of all solutions to the differential equation

\[ y''(x) + y(x) = 0 \tag{9.1} \]

is a vector space (with the regular multiplication and function addition playing the roles of scalar multiplication and vector addition).

Recall from Calculus I that every differentiable function is automatically continuous. It therefore follows that $V \subset C(\mathbb{R})$, a known vector space. By Theorem 9.1, to show that $V$ is a vector space, we need only show that if $y_1$ and $y_2$ are two solutions to (9.1) and $c_1, c_2 \in \mathbb{R}$, then $c_1y_1 + c_2y_2$ also satisfies the differential equation (9.1):

\begin{align*}
(c_1y_1 + c_2y_2)'' + (c_1y_1 + c_2y_2) &= c_1y_1'' + c_2y_2'' + c_1y_1 + c_2y_2 \\
&= c_1(y_1'' + y_1) + c_2(y_2'' + y_2) \\
&= c_1 \cdot 0 + c_2 \cdot 0 \\
&= 0.
\end{align*}

In any case, the main point of this discussion is to explain that vector spaces composed of functions often arise in natural settings.
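As a numerical sanity check of the computation in Example 9.6, one can take the two familiar solutions $\cos x$ and $\sin x$ of (9.1), form a linear combination, and estimate $y'' + y$ by a central difference. The helper names, the coefficients 2 and −3, the step size, and the sample grid are all illustrative choices, not part of the notes.

```python
import math

def combination(c1, c2):
    """Return the function y = c1*cos + c2*sin, a candidate solution of y'' + y = 0."""
    def y(x):
        return c1 * math.cos(x) + c2 * math.sin(x)
    return y

def residual(y, x, h=1e-3):
    """Central-difference estimate of y''(x) + y(x)."""
    second_derivative = (y(x + h) - 2.0 * y(x) + y(x - h)) / (h * h)
    return second_derivative + y(x)

y = combination(2.0, -3.0)
# Largest estimated residual over the sample points -5.0, -4.9, ..., 5.0.
worst = max(abs(residual(y, k / 10.0)) for k in range(-50, 51))
print(worst)  # small, but not exactly zero, because of discretization error
```

The residual is not exactly zero only because the second derivative is approximated; the algebra above shows that the exact residual vanishes.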

9.2. Norms on Vector Spaces

Definition. A norm on a vector space $V$ is any function $\|\cdot\| : V \to \mathbb{R}$ that satisfies the following conditions:

(i) $\|\mathbf{v}\| \geq 0$ for all $\mathbf{v} \in V$ and $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$,

(ii) $\|a\mathbf{v}\| = |a|\,\|\mathbf{v}\|$ for any $a \in \mathbb{R}$ and $\mathbf{v} \in V$,

(iii) $\|\mathbf{v} + \mathbf{w}\| \leq \|\mathbf{v}\| + \|\mathbf{w}\|$ for all $\mathbf{v}, \mathbf{w} \in V$.

A vector space $V$ is a normed linear space if there is a norm on $V$.

The inequality (iii) in the preceding definition is known as the Triangle Inequality.

Example 9.7. $\mathbb{R}$ is a normed linear space when equipped with the norm $\|a\| = |a|$. In fact, norms are generalizations of the absolute value function to vector spaces. Also observe that if $\rho > 0$, then $\|a\| = \rho|a|$ is also a norm on $\mathbb{R}$.

Often there are several possible norms on a given vector space. In that case, we should be specific about stating which norm we are using.

Example 9.8. There are many different norms on $\mathbb{R}^n$. For instance, the following norms on $\mathbb{R}^n$ are extremely important:

\[ \|\mathbf{v}\|_1 = \sum_{i=1}^n |v_i|, \]

\[ \|\mathbf{v}\|_2 = \sqrt{\sum_{i=1}^n |v_i|^2}, \]

\[ \|\mathbf{v}\|_\infty = \sup_{1 \leq i \leq n} |v_i|. \]

Here $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ denotes a typical vector in $\mathbb{R}^n$. Observe that the 2-norm $\|\cdot\|_2$ on $\mathbb{R}^n$ is simply the standard Euclidean norm that you encountered in Multivariable Calculus and/or Linear Algebra. You should check that the axioms (i), (ii), and (iii) for a norm are indeed satisfied by the above.
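The three norms of Example 9.8 are easy to compute directly. The following sketch uses hypothetical function names and two arbitrarily chosen vectors in $\mathbb{R}^2$; it evaluates each norm and lets one spot-check the Triangle Inequality on a single pair.

```python
import math

def norm_1(v):
    """Sum of the absolute values of the entries."""
    return sum(abs(x) for x in v)

def norm_2(v):
    """Euclidean norm: square root of the sum of squared entries."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def norm_inf(v):
    """Largest absolute value among the entries."""
    return max(abs(x) for x in v)

v = (3, -4)
w = (1, 2)
s = tuple(a + b for a, b in zip(v, w))  # the vector v + w

for norm in (norm_1, norm_2, norm_inf):
    # Triangle Inequality on this one pair: ||v + w|| <= ||v|| + ||w||.
    print(norm.__name__, norm(v), norm(s) <= norm(v) + norm(w))
```

A single pair of vectors does not prove the Triangle Inequality, but a failed check would immediately show that a proposed formula is not a norm.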

Example 9.9. Any positive multiple of a norm is also a norm. For instance,

\[ \|\mathbf{v}\| = \frac{1}{n} \sum_{i=1}^n |v_i| \]

is a norm on $\mathbb{R}^n$. It is $1/n$ times the 1-norm $\|\cdot\|_1$ on $\mathbb{R}^n$. Observe that this new norm is simply the mean of the absolute values of the entries of $\mathbf{v}$. In particular, it is not hard to see how this norm would come up in statistics.

Example 9.10. If $V = C([a, b])$, the vector space of continuous functions on $[a, b]$, then we have a choice of many possible norms. The following norms on $C([a, b])$ are all extremely important:

\[ \|f\|_1 = \int_a^b |f(x)|\, dx, \]

\[ \|f\|_2 = \sqrt{\int_a^b |f(x)|^2\, dx}, \]

\[ \|f\|_\infty = \sup_{a \leq x \leq b} |f(x)|. \]

Since the functions we are considering are continuous, the preceding norms are well-defined. For instance, if $f$ is continuous, then so is $|f|$, which therefore has an absolute maximum and minimum on $[a, b]$ by the Extreme Value Theorem from Calculus I, and so $\|f\|_\infty$ is well-defined. In fact, you have actually been using the $\infty$-norm (also called the sup norm) for most of your mathematical career since

\[ \|f\|_\infty = \text{the absolute maximum of } |f(x)| \text{ on } [a, b]. \]
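For a concrete continuous function, the three norms of Example 9.10 can be approximated numerically. The sketch below is an assumption-laden illustration: the function names are hypothetical, the integrals are replaced by midpoint Riemann sums with `n` subintervals, and the supremum is replaced by a maximum over equally spaced sample points (which can only underestimate the true sup).

```python
import math

def norm_1(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of |f| over [a, b]."""
    h = (b - a) / n
    return h * sum(abs(f(a + (i + 0.5) * h)) for i in range(n))

def norm_2(f, a, b, n=10_000):
    """Square root of the midpoint-rule approximation of the integral of |f|^2."""
    h = (b - a) / n
    return math.sqrt(h * sum(abs(f(a + (i + 0.5) * h)) ** 2 for i in range(n)))

def norm_inf(f, a, b, n=10_000):
    """Maximum of |f| over n + 1 equally spaced points (a lower estimate of the sup)."""
    return max(abs(f(a + (b - a) * i / n)) for i in range(n + 1))

def identity(x):
    return x

# For f(x) = x on [0, 1] the exact values are 1/2, 1/sqrt(3), and 1.
print(norm_1(identity, 0.0, 1.0))
print(norm_2(identity, 0.0, 1.0))
print(norm_inf(identity, 0.0, 1.0))
```

The same function has three different "sizes" depending on the norm, which is why it matters to say which norm is in use.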


Example 9.11. If $V$ denotes the vector space of all possible functions $f : [a, b] \to \mathbb{R}$, then there are no “useful” norms that can be defined on $V$. Indeed, the norms from the preceding examples are no longer well-defined since without any restrictions on $f$, the integrals defining the prospective norms need not exist (i.e., they can “blow up” or be undefined). In other words, this vector space is simply “too large” to have any useful geometric structure.

LECTURE 10

Metric Spaces

10.1. Metric Spaces

Having seen that normed linear spaces and inner product spaces (see Notes on Inner Products) are natural generalizations of $\mathbb{R}^n$, we now turn to metric spaces, which can be loosely characterized as “anything that we can have a halfway decent notion of distance in.”

Definition. A metric space is a set $M$, whose elements are called points, endowed with a metric $d : M \times M \to \mathbb{R}$ that satisfies the following properties:

(i) $d(x, y) \geq 0$ for all $x, y \in M$. Moreover, $d(x, y) = 0$ if and only if $x = y$,

(ii) $d(x, y) = d(y, x)$ for all $x, y \in M$,

(iii) $d(x, z) \leq d(x, y) + d(y, z)$ for all $x, y, z \in M$.

The third property is called the Triangle Inequality.

The basic idea is that $d$ is a distance function which assigns a distance $d(x, y)$ between any two points $x, y \in M$. Essentially, metric spaces are the most general mathematical object to which the notion of “distance” applies. Many metric spaces are familiar; some are quite strange and pathological. Since some metric spaces have more than one possible metric, we sometimes say that $(M, d)$ is a metric space if we want to be specific about which metric $d$ we will be using.

Example 10.1. Any normed vector space is automatically a metric space. The metric is simply given by

\[ d(x, y) = \|x - y\|. \]

Properties (i) and (ii) of a metric are obviously satisfied, and the triangle inequality (iii) follows from the short computation

\begin{align*}
d(x, z) &= \|x - z\| \\
&= \|(x - y) + (y - z)\| \\
&\leq \|x - y\| + \|y - z\| \\
&= d(x, y) + d(y, z).
\end{align*}

In particular, $\mathbb{R}^n$ is a metric space in several different ways (recall that there are many different ways to place a norm on $\mathbb{R}^n$). When confusion might occur, we should be specific about which metric we are using.

Example 10.2. The set

\[ M_n(\mathbb{R}) = \{A : A \text{ is an } n \times n \text{ matrix}\} \]

is a metric space. Indeed, it can be viewed as $\mathbb{R}^{n^2}$ and hence we can equip $M_n(\mathbb{R})$ with any of the metrics that we place on $\mathbb{R}^{n^2}$. For instance, if $a_{ij}$ denotes the $ij$th entry of $A$, then

\[ \|A\|_2 = \sqrt{\sum_{i,j=1}^n |a_{ij}|^2} \]

defines a norm on $M_n(\mathbb{R})$. Thus if $A$ and $B$ are $n \times n$ matrices with entries $a_{ij}$ and $b_{ij}$, then

\[ d_2(A, B) = \|A - B\|_2 = \sqrt{\sum_{i,j=1}^n |a_{ij} - b_{ij}|^2} \]

gives us a metric on $M_n(\mathbb{R})$. Other metrics that are useful are

\[ d_1(A, B) = \|A - B\|_1 = \sum_{i,j=1}^n |a_{ij} - b_{ij}| \]

and

\[ d_\infty(A, B) = \|A - B\|_\infty = \max_{1 \leq i,j \leq n} |a_{ij} - b_{ij}|. \]

One important thing to note is that, unlike normed vector spaces (which include inner product spaces), there is no notion of “adding” and “scalar multiplication” in general metric spaces.

Definition. If $X$ is a nonempty set, then we define the discrete metric on $X$ to be the metric defined by

\[ d(x, y) = \begin{cases} 0 & x = y, \\ 1 & x \neq y. \end{cases} \]

The discrete metric comes up occasionally in graph theory and computer science. If $X$ is a metric space equipped with the discrete metric, then all points of $X$ lie at a unit distance from all other points. This is a difficult thing to visualize. For instance, picture what the discrete metric on $\mathbb{R}$ “looks like.”
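On a finite set of points, the three metric axioms can be checked by brute force. The sketch below is illustrative only: `satisfies_metric_axioms` is a hypothetical helper that tests properties (i) to (iii) on the sample it is given, so a `True` result says nothing about points outside the sample, while a `False` result is a genuine counterexample.

```python
from itertools import product

def discrete_metric(x, y):
    """Distance 0 between equal points and 1 between distinct points."""
    return 0 if x == y else 1

def satisfies_metric_axioms(points, d):
    """Check positivity, symmetry, and the Triangle Inequality on a finite sample."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:
            return False
        if (d(x, y) == 0) != (x == y):   # d(x, y) = 0 exactly when x = y
            return False
        if d(x, y) != d(y, x):
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z):
            return False
    return True

print(satisfies_metric_axioms(["a", "b", "c"], discrete_metric))
print(satisfies_metric_axioms([0, 1, 2], lambda x, y: abs(x - y)))
# The squared distance fails the Triangle Inequality: 4 > 1 + 1.
print(satisfies_metric_axioms([0, 1, 2], lambda x, y: (x - y) ** 2))
```

The last call shows that not every natural-looking "distance" is a metric: squaring the usual distance on $\mathbb{R}$ breaks property (iii).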

Definition. If $X$ is a metric space and $Y \subseteq X$, then $Y$ is also a metric space (when equipped with the metric inherited from $X$). We say that $Y$ is a subspace of $X$.¹

Example 10.3. A Möbius strip in $\mathbb{R}^3$ (with the standard metric) is also a metric space. The set of all invertible $n \times n$ matrices is a metric subspace of $M_n(\mathbb{R})$, although it is not a vector subspace (it is not closed under addition).

Example 10.4. The spiral

\[ Y = \{(r \cos r, r \sin r) \in \mathbb{R}^2 : r \geq 0\} \]

is a subspace of $X = \mathbb{R}^2$ (equipped with the usual Euclidean metric).

Essentially every subset of a metric space gives you a new metric space. However, we must be careful with the word subspace. If $V$ is a normed vector space, then it is automatically a metric space with $d(x, y) = \|x - y\|$. Any subset of $V$ is automatically a metric space, but not all subsets of $V$ are normed vector spaces since not all subsets of $V$ are vector spaces.

¹ Do not confuse this with the notion of subspace from linear algebra.


10.2. Convergent Sequences

A sequence of points in a metric space $M$ is simply a list $x_0, x_1, x_2, \ldots$ of some points in $M$ (possibly with repetition). Formally speaking, a sequence is a function $f : \mathbb{N} \to M$ and what we think of as the $n$th term $x_n$ in the sequence is actually $f(n)$. In fact, every function $f : \mathbb{N} \to M$ defines a sequence and vice-versa. However, the subscript notation is more natural.

Definition. A sequence $x_n$ in a metric space $(M, d)$ converges to the limit $x \in M$ if for every $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that $d(x_n, x) < \epsilon$ whenever $n \geq N$. In symbols:

\[ (\forall \epsilon > 0)(\exists N \in \mathbb{N})(\forall n \in \mathbb{N})(\, n \geq N \Rightarrow d(x_n, x) < \epsilon \,). \]

We denote this by $\lim_{n \to \infty} x_n = x$, $x_n \to x$, or $x_n \xrightarrow{d} x$ (when we wish to be specific about which metric we are using).

It should be remarked that when using the notation $\lim_{n \to \infty} x_n = x$ or $x_n \to x$ it is important that we know which metric we are using. Only use this notation when there is no chance of confusion.
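The definition of convergence is a game: given $\epsilon$, produce an $N$. For the specific sequence $x_n = 1/(n+1)$ in $\mathbb{R}$ with the usual metric and limit 0, the sketch below finds the smallest such $N$ by direct search. The sequence, the helper name `find_N`, and the use of exact fractions are illustrative assumptions; the search is valid here only because this particular sequence is decreasing, so once one term is within $\epsilon$ of 0 all later terms are too.

```python
from fractions import Fraction

def x(n):
    """The n-th term of the sample sequence 1, 1/2, 1/3, ... (indexed from n = 0)."""
    return Fraction(1, n + 1)

def find_N(eps):
    """Smallest N with d(x_N, 0) < eps for the decreasing sequence x above."""
    N = 0
    while abs(x(N) - 0) >= eps:
        N += 1
    return N

eps = Fraction(1, 100)
N = find_N(eps)
print(N)
# Spot-check the definition on a finite range of indices n >= N.
print(all(abs(x(n) - 0) < eps for n in range(N, N + 1000)))
```

A computer can only test finitely many indices; the inequality $1/(n+1) < \epsilon$ for all $n \geq N$ still needs the one-line argument that the terms decrease.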

The following simple fact from arithmetic is often useful:

Lemma 1 (The $\epsilon$-Principle). Let $x \in \mathbb{R}$. If $|x| < \epsilon$ for every $\epsilon > 0$, then $x = 0$.

Proof. By the Trichotomy Law, either $x < 0$, $x = 0$, or $x > 0$. If $x > 0$, then let $\epsilon = x/2$ so that $0 < x = |x| < x/2$; dividing by $x/2$ leads to the contradiction $2 < 1$. Similar reasoning shows that $x < 0$ is impossible as well. Therefore $x = 0$. □

The following theorem is a classic example of an “$\epsilon/2$-argument”:

Theorem 10.1. Limits are unique (when they exist). In other words, if $x_n$ is a sequence of points in a metric space $M$ such that $x_n \to x$ and $x_n \to y$, then $x = y$.

Proof. Let $\epsilon > 0$. By the definition of convergence, there exist $N_x, N_y \in \mathbb{N}$ such that

\[ n \geq N_x \Rightarrow d(x_n, x) < \epsilon/2 \]

and

\[ n \geq N_y \Rightarrow d(x_n, y) < \epsilon/2. \]

Let $N = \max\{N_x, N_y\}$. If $n \geq N$, then it follows from the triangle inequality that

\begin{align*}
d(x, y) &\leq d(x, x_n) + d(x_n, y) \\
&< \tfrac{\epsilon}{2} + \tfrac{\epsilon}{2} \\
&= \epsilon.
\end{align*}

Thus $0 \leq d(x, y) < \epsilon$ for any $\epsilon > 0$, whence $d(x, y) = 0$ by the $\epsilon$-Principle. By the definition of a metric, it follows that $x = y$, as desired. □

Another important property of convergent sequences is that they are always bounded:

Theorem 10.2. A convergent sequence is bounded. In other words, if $\lim_{n \to \infty} x_n = x$, then there exists $R > 0$ so that $d(x_n, x) \leq R$ for all $n \in \mathbb{N}$.

Proof. Letting $\epsilon = 1$ in the definition of convergence, we find that there exists $N \in \mathbb{N}$ so that $d(x_n, x) < 1$ whenever $n \geq N$. Since

\[ d(x_0, x), d(x_1, x), \ldots, d(x_{N-1}, x) \]

is a finite sequence, there exists $R_0 > 0$ so that

\[ d(x_n, x) < R_0, \qquad n = 0, 1, 2, \ldots, N - 1. \]

This implies that the inequality

\[ d(x_n, x) < \max\{1, R_0\} = R \]

holds for any $n \in \mathbb{N}$. □

Geometrically speaking, the preceding theorem asserts that the “open ball”

\[ B_R(x) = \{y \in M : d(x, y) < R\} \]

centered at $x$ and with radius $R$ contains every point in the sequence $x_n$.

LECTURE 11

Subsequences, Continuity

11.1. Subsequences

Definition. If $x_n$ is a sequence in a metric space $(M, d)$, then we say that $y_k$ is a subsequence of $x_n$ if there is a sequence $n_k$ of natural numbers such that

\[ 0 \leq n_0 < n_1 < \cdots \]

and $y_k = x_{n_k}$.

Observe that the terms in the subsequence $y_k$ must appear in the same order that they appeared in $x_n$. Furthermore, note that $k \leq n_k$ for all $k \in \mathbb{N}$.

Example 11.1. The sequence

\[ 1, \tfrac{1}{2}, \tfrac{1}{3}, \ldots \]

converges to 0 in the metric space $(\mathbb{R}, d)$, where $d(x, y) = |x - y|$ is the usual metric. The sequence

\[ 1, \tfrac{1}{3}, \tfrac{1}{5}, \tfrac{1}{7}, \tfrac{1}{9}, \ldots \]

is a subsequence of the original sequence. On the other hand,

\[ \tfrac{1}{5}, \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{9}, 1, \ldots \]

is not a subsequence for a variety of reasons. First, the terms do not appear in the same order as they did in the original sequence. Second, $\tfrac{1}{3}$ appears twice, but occurs only once in the original sequence.
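Whether a list of terms is a subsequence comes down to whether its index sequence $n_0, n_1, \ldots$ is strictly increasing. The sketch below encodes the sequence of Example 11.1 as $x_n = 1/(n+1)$ with indices starting at 0 (an indexing convention assumed here, matching the definition above) and tests the two candidate index lists; the helper name is hypothetical.

```python
from fractions import Fraction

def x(n):
    """The n-th term of 1, 1/2, 1/3, ... with indices starting at n = 0."""
    return Fraction(1, n + 1)

def is_valid_index_sequence(indices):
    """True when the indices are natural numbers in strictly increasing order."""
    return all(n >= 0 for n in indices) and all(
        m < n for m, n in zip(indices, indices[1:])
    )

odd_reciprocals = [0, 2, 4, 6, 8]   # picks out 1, 1/3, 1/5, 1/7, 1/9
scrambled = [4, 2, 2, 8, 0]         # picks out 1/5, 1/3, 1/3, 1/9, 1

print([str(x(n)) for n in odd_reciprocals], is_valid_index_sequence(odd_reciprocals))
print([str(x(n)) for n in scrambled], is_valid_index_sequence(scrambled))
```

For the valid index list one can also confirm the observation $k \leq n_k$, which is the fact used in the proof of the next theorem.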

Theorem 11.1. Every subsequence of a convergent sequence converges and it converges to the same limit as the original sequence does.

Proof. Let $x_n \to x$ and let $y_k = x_{n_k}$ be a subsequence of $x_n$. If $\epsilon > 0$, let $N \in \mathbb{N}$ be so large that $d(x_n, x) < \epsilon$ for $n \geq N$. Since $k \leq n_k$ for all $k \in \mathbb{N}$, it follows that $d(y_k, x) = d(x_{n_k}, x) < \epsilon$ whenever $k \geq N$. Thus $y_k \to x$. □

Keep in mind the following examples of sequences and subsequences.

Example 11.2. The sequence $x_n = (-1)^n$ in $\mathbb{R}$ (with the usual metric) does not converge. However, the subsequences $1, 1, 1, 1, \ldots$ and $-1, -1, -1, \ldots$ do converge (to different limits).

Example 11.3. The sequence

\[ x_n = \begin{cases} 1/n & n \text{ even}, \\ n & n \text{ odd} \end{cases} \]

(for $n \geq 1$) in $\mathbb{R}$ (with the standard metric) does not converge. However, one can show that every subsequence of $x_n$ which does converge converges to 0.


11.2. Continuity

The principal objects of study in analysis are functions. In particular, we are interested in continuous functions. The following definition is a generalization of the notion of continuity encountered in calculus:

Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. We say that a function $f : A \to B$ is continuous at a point $x_0 \in A$ if

\[ (\forall \epsilon > 0)(\exists \delta > 0)(\forall x \in A)(\, d_A(x, x_0) < \delta \Rightarrow d_B(f(x), f(x_0)) < \epsilon \,). \]

We say that $f : A \to B$ is a continuous function on $A$ if $f$ is continuous at each point of $A$.

Example 11.4. The preceding definition coincides with the definition of continuity from calculus, when $A = B = \mathbb{R}$, equipped with the regular metric $d(x, y) = |x - y|$. Feel free to assume that the continuous functions you knew from calculus are actually continuous.

Example 11.5. Since a norm on a vector space always gives rise to a metric, it follows that $(M_n(\mathbb{R}), d)$ is a metric space where

\[ d(A, B) = \|A - B\|_2 = \sqrt{\sum_{i,j=1}^n |a_{ij} - b_{ij}|^2}. \]

In fact, this is essentially the Euclidean metric on $\mathbb{R}^{n^2}$! Thus the determinant function $\det : M_n(\mathbb{R}) \to \mathbb{R}$ is continuous (with respect to any of the metrics above) since $\det$ is a polynomial in the $n^2$ real variables $a_{ij}$ (where $1 \leq i, j \leq n$) and we know from Multivariable Calculus that polynomial functions are continuous functions.

The following theorem is an example of a standard continuity argument:

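Theorem 11.2. Let $(A, d_A)$ be a metric space and let $B$ be a normed vector space with norm $\|\cdot\|_B$.¹ If $f : A \to B$ and $g : A \to B$ are continuous, then $f + g$ is also continuous.

Proof. This is another $\epsilon/2$ argument. Fix a point $y \in A$. Let $\epsilon > 0$ be given and note that the definition of continuity (at $y$) gives us $\delta_1 > 0$ and $\delta_2 > 0$ so that

\[ d_A(x, y) < \delta_1 \Rightarrow d_B(f(x), f(y)) < \tfrac{\epsilon}{2}, \]

\[ d_A(x, y) < \delta_2 \Rightarrow d_B(g(x), g(y)) < \tfrac{\epsilon}{2}. \]

Let $\delta = \min\{\delta_1, \delta_2\}$. Then $d_A(x, y) < \delta$ implies that

\begin{align*}
d_B((f + g)(x), (f + g)(y)) &= \|(f + g)(x) - (f + g)(y)\|_B \\
&= \|(f(x) + g(x)) - (f(y) + g(y))\|_B \\
&= \|(f(x) - f(y)) + (g(x) - g(y))\|_B \\
&\leq \|f(x) - f(y)\|_B + \|g(x) - g(y)\|_B \\
&= d_B(f(x), f(y)) + d_B(g(x), g(y)) \\
&< \tfrac{\epsilon}{2} + \tfrac{\epsilon}{2} \\
&= \epsilon.
\end{align*}

Thus $f + g$ is continuous at $y$. Since $y \in A$ was arbitrary, $f + g$ is continuous. □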

¹ Recall that $B$ is automatically a metric space with metric $d_B(x, y) = \|x - y\|_B$.


Example 11.6. If we take $A = B = \mathbb{R}$ with the usual metric, then the preceding theorem simply states that the sum of two continuous functions is continuous.

LECTURE 12

Sequences and Continuity

12.1. Sequential Characterization of Continuity

The following theorem is extremely useful for establishing the convergence of certain sequences:

Theorem 12.1 (Squeeze Theorem). Let $a_n \geq 0$ be a sequence in $\mathbb{R}$ such that $a_n \to 0$. If $x_n$ is a sequence of points in a metric space $(M, d)$ such that $d(x, x_n) \leq a_n$ for all $n$, then $x_n \to x$.

Proof. Let $\epsilon > 0$ be given and let $N \in \mathbb{N}$ be so large that $n \geq N$ implies that $a_n < \epsilon$. It follows from this that $n \geq N$ implies $d(x, x_n) \leq a_n < \epsilon$, whence $x_n \to x$. □

Using the preceding, we can establish the following sequential characterization of continuity:

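Theorem 12.2. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. A function $f : A \to B$ is continuous at a point $a \in A$ if and only if, for every sequence $a_n$ in $A$,

\[ a_n \xrightarrow{d_A} a \;\Rightarrow\; f(a_n) \xrightarrow{d_B} f(a). \tag{12.1} \]

In particular, if $f$ is continuous at a point $a \in A$ and $a_n \to a$, then

\[ \lim_{n \to \infty} f(a_n) = f\Bigl( \lim_{n \to \infty} a_n \Bigr). \]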

Proof. (⇒) Suppose that $f$ is continuous at $a$ and that $a_n \xrightarrow{d_A} a$. Let $\epsilon > 0$ and use the definition of continuity to obtain a $\delta > 0$ so that

\[ d_A(a, a') < \delta \Rightarrow d_B(f(a), f(a')) < \epsilon. \]

The definition of convergence gives us $N \in \mathbb{N}$ so that

\[ n \geq N \Rightarrow d_A(a_n, a) < \delta. \]

Putting this together, we see that

\[ n \geq N \Rightarrow d_B(f(a_n), f(a)) < \epsilon \]

and hence $f(a_n) \xrightarrow{d_B} f(a)$ as desired.

(⇐) Suppose toward a contradiction that (12.1) holds for every sequence $a_n$ which converges to $a$ but that $f$ is not continuous at $a$. To see what this means, we must negate the following:

\[ (\forall \epsilon > 0)(\exists \delta > 0)(\forall a' \in A)(\, d_A(a, a') < \delta \Rightarrow d_B(f(a), f(a')) < \epsilon \,). \]

Performing the negation we find that

\[ (\exists \epsilon > 0)(\forall \delta > 0)(\exists a' \in A)(\, d_A(a, a') < \delta \wedge d_B(f(a), f(a')) \geq \epsilon \,). \]

Applying this successively with $\delta = \frac{1}{2^n}$ we obtain a sequence $a_n$ such that

\[ d_A(a, a_n) < \frac{1}{2^n} \quad\text{and}\quad d_B(f(a), f(a_n)) \geq \epsilon. \]

By the Squeeze Theorem, $a_n \xrightarrow{d_A} a$, whence $f(a_n) \xrightarrow{d_B} f(a)$ by hypothesis. However, this contradicts the fact that $d_B(f(a_n), f(a)) \geq \epsilon$ for all $n \in \mathbb{N}$. Thus $f$ must actually be continuous at $a$, as desired. □
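The sequential characterization is often the quickest way to show that a function is not continuous: exhibit one sequence $a_n \to a$ for which $f(a_n)$ does not approach $f(a)$. The sketch below does this for a step function at $a = 0$, using the same radii $1/2^n$ that appear in the proof. The function `step` and the choice of sequence are illustrative assumptions, not taken from the notes.

```python
from fractions import Fraction

def step(x):
    """A step function on R: 0 for x <= 0 and 1 for x > 0 (discontinuous at 0)."""
    return 0 if x <= 0 else 1

# The sequence a_n = 1/2^n converges to a = 0 in the usual metric.
a = [Fraction(1, 2 ** n) for n in range(1, 41)]

# The distances d(a_n, 0) shrink toward 0 ...
print(float(a[0]), float(a[9]), float(a[-1]))
# ... yet d(step(a_n), step(0)) stays equal to 1 for every n.
print({abs(step(t) - step(0)) for t in a})
```

Since $a_n \to 0$ while $d(f(a_n), f(0)) = 1$ for every $n$, condition (12.1) fails and the theorem says that `step` is not continuous at 0.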

12.2. Continuity and Composition

Using the sequential characterization of continuity, we can see that the composition of continuous functions is continuous:

Theorem 12.3. The composition of continuous functions is continuous.

Pf. 1. Let $(A, d_A)$, $(B, d_B)$, and $(C, d_C)$ be three metric spaces and let $f : A \to B$ and $g : B \to C$ be two continuous functions. Let $a_n \xrightarrow{d_A} a$ be a convergent sequence in $A$. Since $f$ is continuous, it follows that

\[ f(a_n) \xrightarrow{d_B} f(a). \]

Since $g$ is continuous, it follows that

\[ g(f(a_n)) \xrightarrow{d_C} g(f(a)). \]

Therefore $(g \circ f)(a_n) \xrightarrow{d_C} (g \circ f)(a)$ whenever $a_n \xrightarrow{d_A} a$. By the previous theorem, we can conclude that the composition $g \circ f : A \to C$ is continuous. □

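Of course, we can also prove this theorem using the $\epsilon$-$\delta$ definition:

Pf. 2. Let $(A, d_A)$, $(B, d_B)$, and $(C, d_C)$ be metric spaces and let $f : A \to B$ and $g : B \to C$ be continuous functions. Fix $a_1 \in A$ and let $b_1 = f(a_1)$. If $\epsilon > 0$ is given, the definition of continuity of $g$ at $b_1$ gives us some $\eta > 0$ so that

\[ d_B(b_1, b_2) < \eta \Rightarrow d_C(g(b_1), g(b_2)) < \epsilon. \]

By the definition of continuity of $f$ at $a_1$, there exists $\delta > 0$ so that

\[ d_A(a_1, a_2) < \delta \Rightarrow d_B(f(a_1), f(a_2)) < \eta. \]

We conclude that there exists $\delta > 0$ so that

\[ d_A(a_1, a_2) < \delta \Rightarrow d_C(g(f(a_1)), g(f(a_2))) < \epsilon, \]

whence $g \circ f$ is continuous at $a_1$. Since $a_1$ was arbitrary, $g \circ f$ is continuous. □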

12.3. Limit, Accumulation, and Isolated Points

Definition. Let $(M, d)$ be a metric space and let $S \subseteq M$.

(i) A point $x \in M$ is a limit point of $S$ if there exists a sequence $x_n$ in $S$ so that $x_n \to x$.

(ii) A point $x \in M$ is an accumulation point of $S$ if there exists a sequence $x_n$ of distinct points of $S$ so that $x_n \to x$.

(iii) A point $x \in M$ is called an isolated point of $S$ if there exists $\epsilon > 0$ such that $B_\epsilon(x) \cap S = \{x\}$.

Here $B_\epsilon(x)$ denotes the open ball of radius $\epsilon$ centered at $x$.

In particular, note that

• An accumulation point of $S$ is automatically a limit point of $S$,

• An isolated point of $S$ automatically belongs to $S$.

The following example illustrates more basic facts about limit, accumulation, and isolated points:

Example 12.1. Let $M = \mathbb{R}$ and $d(x, y) = |x - y|$. If

(i) $S = [0, 1)$, then 1 is a limit point and an accumulation point of $S$. In particular, note that neither a limit point nor an accumulation point of $S$ need actually belong to $S$.

(ii) $S = \{0\}$, then 0 is an isolated point of $S$. On the other hand, 0 is also a limit point of $S$ since the sequence $0, 0, 0, \ldots$ of points of $S$ converges to 0.

LECTURE 13

Closed Sets

13.1. Limit, Accumulation, and Isolated Points

Definition. Let $(M, d)$ be a metric space and let $S \subseteq M$.

(i) A point $x \in M$ is a limit point of $S$ if there exists a sequence $x_n$ in $S$ so that $x_n \to x$.

(ii) A point $x \in M$ is an accumulation point of $S$ if there exists a sequence $x_n$ of distinct points of $S$ so that $x_n \to x$.

(iii) A point $x \in M$ is called an isolated point of $S$ if there exists $\epsilon > 0$ such that $B_\epsilon(x) \cap S = \{x\}$.

Here $B_\epsilon(x)$ denotes the open ball of radius $\epsilon$ centered at $x$.

In particular, note that

• An accumulation point of $S$ is automatically a limit point of $S$,

• An isolated point of $S$ automatically belongs to $S$.

The following example illustrates more basic facts about limit, accumulation, and isolated points:

Example 13.1. Let $M = \mathbb{R}$ and $d(x, y) = |x - y|$. If

(i) $S = [0, 1)$, then 1 is a limit point and an accumulation point of $S$. In particular, note that neither a limit point nor an accumulation point of $S$ need actually belong to $S$.

(ii) $S = \{0\}$, then 0 is an isolated point of $S$. On the other hand, 0 is also a limit point of $S$ since the sequence $0, 0, 0, \ldots$ of points of $S$ converges to 0.

Theorem 13.1. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. If $a$ is an isolated point of $A$ and $f : A \to B$ is any function, then $f$ is continuous at $a$.

Proof. Since $a$ is an isolated point of $A$, there exists $\delta > 0$ such that $d_A(x, a) < \delta$ implies that $x = a$. If $\epsilon > 0$ is given, then observe that

\[ d_A(x, a) < \delta \Rightarrow x = a \Rightarrow d_B(f(x), f(a)) = 0 < \epsilon. \]

Thus $f$ is continuous at $a$. □

Theorem 13.2. If $(M, d)$ is a metric space and $S \subseteq M$, then $x \in S$ is an accumulation point of $S$ if and only if $x$ is not an isolated point of $S$.

Proof. (⇒) If $x \in S$ is an accumulation point of $S$, then there exists a sequence of distinct points $x_n$ of $S$ (none of which is $x$ itself, since at most one term can equal $x$ and it may be discarded) such that $x_n \to x$. It follows that for each $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that $n \geq N$ implies that $d(x, x_n) < \epsilon$. This implies that $x_N \in B_\epsilon(x) \cap S$ with $x_N \neq x$, whence $x$ is not an isolated point of $S$.

(⇐) Suppose that $x \in S$ is not an isolated point of $S$. We will inductively define a sequence $x_n$ of distinct points of $S$, all different from $x$, such that $x_n \to x$. First, select a point $x_0 \in S$ with $x_0 \neq x$ such that $d(x, x_0) < 1$; such a point exists since $B_1(x) \cap S \neq \{x\}$. Now assume that we have found distinct points $x_0, x_1, \ldots, x_n$ of $S$, all different from $x$, such that $d(x, x_n) < \frac{1}{2^n}$.

We claim that there exists a point $x_{n+1} \in S$, different from $x$ and not among the points $x_0, x_1, \ldots, x_n$, such that $d(x, x_{n+1}) < \frac{1}{2^{n+1}}$. Indeed, suppose toward a contradiction that no such choice of $x_{n+1}$ is possible. Let

\[ 0 < \epsilon < \min\Bigl\{d(x, x_0), d(x, x_1), \ldots, d(x, x_n), \tfrac{1}{2^{n+1}}\Bigr\} \]

and observe that

\[ B_\epsilon(x) \cap S = \{x\}, \]

whence $x$ is an isolated point of $S$. Since this is a contradiction, it follows that such an $x_{n+1}$ can always be found.

The sequence $x_n$ so constructed satisfies $d(x, x_n) < \frac{1}{2^n}$ for all $n \in \mathbb{N}$, whence $x_n \to x$ by the Squeeze Theorem. By definition, it follows that $x$ is an accumulation point of $S$. □

13.2. Closed Sets

Definition. The set of limit points of $S$ is denoted $\overline{S}$ and called the closure of $S$ (with respect to $M$ and $d$). A subset $S$ of a metric space $(M, d)$ is called closed (with respect to $M$ and $d$) if every limit point of $S$ belongs to $S$. In other words, $S$ is closed if and only if

\[ S = \overline{S}. \]

Example 13.2. If $(M, d)$ is a metric space, then $\emptyset$ and $M$ are both closed sets. Indeed, the closure of $\emptyset$ is $\emptyset$ since there are no elements of $\emptyset$ to make sequences with. On the other hand, $M$ is closed since any convergent sequence in $M$ converges to a point of $M$. Thus $M$ contains all of its limit points.

Lemma 2. $S \subseteq \overline{S}$ holds for any subset $S$ of a metric space $(M, d)$.

Proof. Each element $x \in S$ is automatically a limit point of $S$. Indeed, $x$ is the limit of the sequence $x, x, x, x, x, x, \ldots$. □

Example 13.3. This example illustrates that one needs to make sure that the “big metric space” is declared beforehand. For instance, $\mathbb{Q}$ is a closed subset of the metric space $(\mathbb{Q}, d)$, where $d$ denotes the standard metric $d(x, y) = |x - y|$. When using $\mathbb{Q}$ as the “big metric space,” it is as if we are assuming that irrational numbers no longer exist – as far as $\mathbb{Q}$ is concerned, they do not.

On the other hand,Q is not closed when considered as a subset of the metric space(R, d). For example,

√2 is a limit point ofQ in R and

√2 /∈ Q. In fact, the density ofQ in

R implies thatQ = R. We therefore see that the property of being closed depends stronglyon the “big metric space” which the set in question belongs to.

Example 13.4. ConsiderC([a, b]) with the “infinity metric”:

d∞(f, g) = ‖f − g‖∞ = supa≤x≤b

|f(x) − g(x)|.

Consider the subspaceP = {p(x) : p is a polynomial} of C([a, b]). What isP? Thisis a deep and difficult question (which was answered by Weierstrass). We will hopefullydiscuss the answer later in the course.

LECTURE 14

Open Sets

14.1. Closed Sets

Definition. The set of all limit points of $S$ is denoted $\overline{S}$ and called the closure of $S$ (with respect to $M$ and $d$).

Since each element $x \in S$ is a limit point of $S$ (i.e., $x$ is the limit of the sequence $x, x, x, \ldots$), it follows that

$$S \subseteq \overline{S}. \tag{14.1}$$

Definition. A subset $S$ of a metric space $(M, d)$ is called closed (with respect to $M$ and $d$) if every limit point of $S$ belongs to $S$. In other words, $S$ is closed if and only if

$$S = \overline{S}.$$

Example 14.1. If $(M, d)$ is a metric space, then $\emptyset$ and $M$ are both closed sets. Indeed, the closure of $\emptyset$ is $\emptyset$ since there are no elements of $\emptyset$ to make sequences with. On the other hand, $M$ is closed since any convergent sequence in $M$ converges to a point of $M$. Thus $M$ contains all of its limit points.

Theorem 14.1. If $(M, d)$ is a metric space and $S \subseteq M$, then

$$\overline{\overline{S}} = \overline{S}.$$

In other words, the closure of a set is a closed set.

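Proof. By (14.1), applied to the set $\overline{S}$, we need only prove that

$$\overline{\overline{S}} \subseteq \overline{S}. \tag{14.2}$$

Given $x \in \overline{\overline{S}}$, we must find a sequence $x_n$ in $S$ such that $x_n \to x$. Since $x \in \overline{\overline{S}}$, we know that there exists a sequence $y_n$ in $\overline{S}$ such that $y_n \to x$. On the other hand, since each $y_n$ belongs to $\overline{S}$, for each $n$ there exists $x_n \in S$ such that $d(x_n, y_n) < \frac{1}{2^n}$. Now let $\epsilon > 0$ and choose $N \in \mathbb{N}$ so large that

$$n \ge N \Rightarrow d(x, y_n) < \frac{\epsilon}{2} \quad \text{and} \quad \frac{1}{2^n} < \frac{\epsilon}{2}.$$

Putting this all together, we find that $n \ge N$ implies that

$$d(x, x_n) \le d(x, y_n) + d(y_n, x_n) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

Thus $x_n$ is a sequence in $S$ which converges to $x$, whence $x \in \overline{S}$. This establishes (14.2) and completes the proof. □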


Example 14.2. This example illustrates that one needs to make sure that the “big metric space” is declared beforehand. For instance, $\mathbb{Q}$ is a closed subset of the metric space $(\mathbb{Q}, d)$, where $d$ denotes the standard metric $d(x, y) = |x - y|$. When using $\mathbb{Q}$ as the “big metric space,” it is as if we are assuming that irrational numbers no longer exist; as far as $\mathbb{Q}$ is concerned, they do not.

On the other hand, $\mathbb{Q}$ is not closed when considered as a subset of the metric space $(\mathbb{R}, d)$. For example, $\sqrt{2}$ is a limit point of $\mathbb{Q}$ in $\mathbb{R}$ and $\sqrt{2} \notin \mathbb{Q}$. In fact, the density of $\mathbb{Q}$ in $\mathbb{R}$ implies that $\overline{\mathbb{Q}} = \mathbb{R}$. We therefore see that the property of being closed depends strongly on the “big metric space” to which the set in question belongs.

Example 14.3. Consider $C([a, b])$ with the “infinity metric”:

$$d_\infty(f, g) = \|f - g\|_\infty = \sup_{a \le x \le b} |f(x) - g(x)|.$$

Consider the subspace $P = \{p(x) : p \text{ is a polynomial}\}$ of $C([a, b])$. What is $\overline{P}$? This is a deep and difficult question (which was answered by Weierstrass). We will hopefully discuss the answer later in the course.

Theorem 14.2. Let $(M, d)$ be a metric space. For each $x \in M$ and for each $\delta > 0$, the set

$$C_\delta(x) = \{y \in M : d(x, y) \le \delta\}$$

is a closed subset of $M$. The set $C_\delta(x)$ is referred to as the closed ball of radius $\delta$ centered at $x$.

Proof. To prove that $C_\delta(x)$ is closed, we need only show that $C_\delta(x)$ contains all of its limit points. Suppose that $y_n$ is a sequence in $C_\delta(x)$ which converges to a point $y$. This means that for any $\epsilon > 0$, there exists $N \in \mathbb{N}$ so that

$$n \ge N \Rightarrow d(y_n, y) < \epsilon.$$

Therefore $n \ge N$ implies that

$$d(x, y) \le d(x, y_n) + d(y_n, y) < \delta + \epsilon.$$

Since this holds for every $\epsilon > 0$, it follows from the $\epsilon$-Principle that $d(x, y) \le \delta$, whence $y \in C_\delta(x)$. □

Example 14.4. It is not true in general that

$$\overline{B_\epsilon(x)} = C_\epsilon(x) = \{y \in M : d(x, y) \le \epsilon\}.$$

Let $M$ be any set which contains at least two points, let $d$ be the discrete metric on $M$, and let $\epsilon = 1$. In this case

$$B_\epsilon(x) = \{y \in M : d(x, y) < 1\} = \{x\},$$

whence $\overline{B_\epsilon(x)} = \{x\}$. On the other hand,

$$\{y \in M : d(x, y) \le 1\} = M \ne \{x\}$$

since $M$ contains at least two points.
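The computation in Example 14.4 can be checked on a small finite set. The following is a minimal sketch; the three-point set `M` and the helper names `open_ball` and `closed_ball` are illustrative assumptions, not part of the notes. In the discrete metric a convergent sequence is eventually constant, so the closure of the one-point set $\{x\}$ is again $\{x\}$.

```python
def discrete_metric(x, y):
    """Discrete metric: distance 0 between equal points, 1 otherwise."""
    return 0 if x == y else 1

def open_ball(M, d, x, r):
    """B_r(x) = {y in M : d(x, y) < r}."""
    return {y for y in M if d(x, y) < r}

def closed_ball(M, d, x, r):
    """C_r(x) = {y in M : d(x, y) <= r}."""
    return {y for y in M if d(x, y) <= r}

# Hypothetical three-point space used only for illustration.
M = {"a", "b", "c"}
B = open_ball(M, discrete_metric, "a", 1)
C = closed_ball(M, discrete_metric, "a", 1)

print("open ball of radius 1 about a:  ", B)
print("closed ball of radius 1 about a:", C)
print("the two sets agree:", B == C)
```

The open ball collapses to the single point, while the closed ball of the same radius is the whole space.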

Corollary 5. The interval $[a, b]$ (where $a < b$) is a closed subset of $\mathbb{R}$ (equipped with the standard metric).

Proof. The interval $[a, b]$ is simply

$$C_{\frac{b-a}{2}}\!\left(\tfrac{b+a}{2}\right) = \left\{y \in \mathbb{R} : \left|y - \tfrac{b+a}{2}\right| \le \tfrac{b-a}{2}\right\},$$

the closed ball in $\mathbb{R}$ of radius $(b - a)/2$ centered at $(b + a)/2$. Indeed, the inequalities

$$-\tfrac{b-a}{2} \le y - \tfrac{b+a}{2} \le \tfrac{b-a}{2}$$

are equivalent to $a \le y \le b$. □

14.2. Open Sets

Definition. A subset $S$ of a metric space $(M, d)$ is called open (with respect to $M$ and $d$) if for each $x \in S$ there exists a corresponding $\delta > 0$ such that

$$B_\delta(x) \subseteq S.$$

Example 14.5. If $(M, d)$ is a metric space, then $\emptyset$ and $M$ are open subsets of $M$.

We must be careful with the term “open ball” since we now have a technical definition for the term open. Is what we call an open ball actually an open set, according to our definition? Fortunately, the answer is yes.

Theorem 14.3. Let $(M, d)$ be a metric space. For each $x \in M$ and each $\delta > 0$, the subset

$$B_\delta(x) = \{y \in M : d(x, y) < \delta\}$$

is an open subset of $M$.

Proof. Let $x \in M$ and let $\delta > 0$. If $y \in B_\delta(x)$, then $d(x, y) < \delta$ and hence

$$r = \delta - d(x, y) > 0.$$

Consider the corresponding open ball of radius $r$ centered at $y$:

$$B_r(y) = \{z \in M : d(y, z) < r\}.$$

For each $z$ in $B_r(y)$, it follows that

$$d(x, z) \le d(x, y) + d(y, z) < d(x, y) + (\delta - d(x, y)) = \delta.$$

Therefore $z \in B_\delta(x)$, whence $B_r(y) \subseteq B_\delta(x)$. By the definition of open sets, it follows that $B_\delta(x)$ is open. □

LECTURE 15

Set Operations with Open and Closed Sets

15.1. Complements of Open and Closed Sets

The relationship between open and closed sets is the following “duality theorem”:

Theorem 15.1. In a metric space $(M, d)$,

(i) the complement of an open set is closed

(ii) the complement of a closed set is open.

Pf. of (i). Suppose that $S$ is an open subset of $M$. We wish to prove that its complement $S^c$ is closed. Suppose that $x_n$ is a sequence in $S^c$ and that $x_n$ converges to some point $x$ in $M$. Suppose toward a contradiction that $x \notin S^c$ (i.e., $x \in S$). Since $S$ is an open set, there exists $\epsilon > 0$ so that $B_\epsilon(x) \subseteq S$. By the definition of convergence, there exists $N \in \mathbb{N}$ so that $n \ge N$ implies that $d(x_n, x) < \epsilon$. This implies that

$$x_N \in B_\epsilon(x) \subseteq S.$$

Since $x_N \in S$ contradicts the fact that $x_N \in S^c$, we conclude that $x \in S^c$. Thus $S^c$ contains all of its limit points and is therefore closed. □

Pf. of (ii). Let $S \subseteq M$ be closed and suppose toward a contradiction that $S^c$ is not open. In other words, suppose that there exists some $x \in S^c$ such that $B_\epsilon(x) \not\subseteq S^c$ for every $\epsilon > 0$. It follows that for every $n \in \mathbb{N}$, there exists $x_n \in S$ such that $x_n \in B_{\frac{1}{2^n}}(x)$. This implies that $d(x_n, x) < \frac{1}{2^n}$, whence $x_n \to x$ by the Squeeze Theorem. Since $S$ is closed, it follows that $x$ belongs to $S$. However, this contradicts the fact that $x$ belongs to $S^c$. □

Example 15.1. The “half-open” intervals $[a, b)$ and $(a, b]$ are neither open nor closed in $\mathbb{R}$. On the other hand, $\emptyset$ and $\mathbb{R}$ are both closed and open in $\mathbb{R}$.

Definition. Let $(M, d)$ be a metric space. A subset $S \subseteq M$ is called clopen if $S$ is both open and closed.

Example 15.2. In a metric space $(M, d)$, the sets $\emptyset$ and $M$ are clopen. There are sometimes other clopen sets in a metric space; we will learn more about clopen sets when we discuss connectivity.

15.2. Set Operations with Open and Closed Sets

It turns out that open and closed sets have relatively nice properties, set theoretically speaking. The following theorem tells us how open sets react to the usual set theoretical operations:


Theorem 15.2. Let $(M, d)$ be a metric space.

(i) the arbitrary union of open sets is open

(ii) the intersection of finitely many open sets is open

(iii) $\emptyset$ and $M$ are open sets.

Pf. of (i). Let $I$ denote an index set and consider the indexed union

$$\bigcup_{i \in I} A_i = \{x \in M : (\exists i \in I)(x \in A_i)\}$$

of open sets $A_i$. If $x \in \bigcup_{i \in I} A_i$, then there exists $i \in I$ so that $x \in A_i$. Since the set $A_i$ is open, there exists $\delta > 0$ so that $B_\delta(x) \subseteq A_i \subseteq \bigcup_{i \in I} A_i$. Thus $\bigcup_{i \in I} A_i$ is open. □

Pf. of (ii). Let $A_1, A_2, \ldots, A_n$ be open subsets of $M$ and let

$$x \in \bigcap_{i=1}^n A_i.$$

Since each $A_i$ is open, there exist corresponding $\delta_i > 0$ so that $B_{\delta_i}(x) \subseteq A_i$. If

$$\delta = \min\{\delta_1, \delta_2, \ldots, \delta_n\},$$

then $\delta > 0$ and it follows that

$$B_\delta(x) \subseteq B_{\delta_i}(x) \subseteq A_i$$

for $i = 1, 2, \ldots, n$. Therefore $B_\delta(x) \subseteq \bigcap_{i=1}^n A_i$ and hence $\bigcap_{i=1}^n A_i$ is an open set. □

Pf. of (iii). Trivial. □

We can phrase the preceding theorem in terms of closed sets:

Corollary 6. Let $(M, d)$ be a metric space.

(i) the arbitrary intersection of closed sets is closed

(ii) the union of finitely many closed sets is closed

(iii) $\emptyset$ and $M$ are closed sets.

Proof. This follows from the preceding two theorems and de Morgan’s laws

$$\Big(\bigcap_{i \in I} A_i\Big)^c = \bigcup_{i \in I} A_i^c, \qquad \Big(\bigcup_{i \in I} A_i\Big)^c = \bigcap_{i \in I} A_i^c,$$

which are valid for any index set $I$ (whether finite or infinite). □

A useful theorem (which we shall not prove, at least yet) is the following:

Theorem 15.3. An open set $S \subseteq \mathbb{R}$ can be uniquely expressed as a countable union of disjoint open intervals in such a way that the endpoints of these intervals do not belong to $S$.

LECTURE 16

Topological Characterization of Continuity

16.1. Inverse Images

Definition. Let $f : A \to B$ be a function and let $Y \subseteq B$. The inverse image of $Y$ under $f$ is the set

$$f^{-1}(Y) = \{a \in A : f(a) \in Y\}.$$

Be aware of the fact that the notation $f^{-1}$ has nothing to do with reciprocals or inverse functions. In particular, $f^{-1}(Y)$ exists as a set, even if the function $f$ is not invertible or $Y = \emptyset$.

It is important to note that

$$a \in f^{-1}(Y) \Leftrightarrow f(a) \in Y.$$

It is rather convenient that inverse images work well with the standard set operations:

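Theorem 16.1. If $f : A \to B$ is a function and $C, D, Y \subseteq B$, then

(i) $f^{-1}(C \cup D) = f^{-1}(C) \cup f^{-1}(D)$,

(ii) $f^{-1}(C \cap D) = f^{-1}(C) \cap f^{-1}(D)$,

(iii) $\big(f^{-1}(Y)\big)^c = f^{-1}(Y^c)$.¹

Pf. of (i). Since

$$\begin{aligned}
x \in f^{-1}(C \cup D) &\Leftrightarrow f(x) \in C \cup D && \text{def. inv. img.} \\
&\Leftrightarrow (f(x) \in C) \vee (f(x) \in D) && \text{def. of } \cup \\
&\Leftrightarrow (x \in f^{-1}(C)) \vee (x \in f^{-1}(D)) && \text{def. inv. img.} \\
&\Leftrightarrow x \in f^{-1}(C) \cup f^{-1}(D) && \text{def. of } \cup,
\end{aligned}$$

it follows that the conditions for membership in $f^{-1}(C \cup D)$ and $f^{-1}(C) \cup f^{-1}(D)$ are identical. □

Pf. of (ii). This is similar to the previous proof:

$$\begin{aligned}
x \in f^{-1}(C \cap D) &\Leftrightarrow f(x) \in C \cap D && \text{def. inv. img.} \\
&\Leftrightarrow (f(x) \in C) \wedge (f(x) \in D) && \text{def. of } \cap \\
&\Leftrightarrow (x \in f^{-1}(C)) \wedge (x \in f^{-1}(D)) && \text{def. inv. img.} \\
&\Leftrightarrow x \in f^{-1}(C) \cap f^{-1}(D) && \text{def. of } \cap.
\end{aligned}$$

Since the conditions for membership in $f^{-1}(C \cap D)$ and $f^{-1}(C) \cap f^{-1}(D)$ are identical, it follows that $f^{-1}(C \cap D) = f^{-1}(C) \cap f^{-1}(D)$. □

¹ Observe that $Y^c \subseteq B$ and that $\big(f^{-1}(Y)\big)^c \subseteq A$. In other words, the complements are with respect to $B$ and $A$, respectively.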


We leave the proof of (iii) to the reader.
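For finite sets, all three identities of Theorem 16.1 can be checked directly. The following is a small sketch under stated assumptions: the sets `A`, `B`, `C`, `D` and the squaring map `f` are hypothetical choices made only for illustration, and `preimage` is a helper name introduced here. A finite check is of course not a proof.

```python
def preimage(f, A, Y):
    """Inverse image f^{-1}(Y) = {a in A : f(a) in Y} for a function f defined on A."""
    return {a for a in A if f(a) in Y}

# Hypothetical data: f(a) = a^2 maps A = {-3, ..., 3} into B = {0, ..., 9}.
A = set(range(-3, 4))
B = set(range(0, 10))

def f(a):
    return a * a

C = {0, 1, 4}
D = {4, 9}

union_ok = preimage(f, A, C | D) == preimage(f, A, C) | preimage(f, A, D)
intersection_ok = preimage(f, A, C & D) == preimage(f, A, C) & preimage(f, A, D)
# Complements are taken in A on the left and in B on the right.
complement_ok = (A - preimage(f, A, C)) == preimage(f, A, B - C)

print(union_ok, intersection_ok, complement_ok)
```

Note that `f` is not injective here, which illustrates that inverse images make sense even when $f$ has no inverse function.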

16.2. Topological Characterization of Continuity

Theorem 16.2. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces and let $f : A \to B$. The following statements are equivalent:

(i) $f$ is continuous (i.e., $f$ satisfies the $\epsilon$-$\delta$ definition).

(ii) $f^{-1}(Y)$ is a closed subset of $A$ whenever $Y$ is a closed subset of $B$.

(iii) $f^{-1}(Y)$ is an open subset of $A$ whenever $Y$ is an open subset of $B$.

(iv) $f(a_n) \xrightarrow{d_B} f(a)$ whenever $a_n \xrightarrow{d_A} a$.

Proof. (i) ⇔ (iv): This was a previous theorem proved in class.

(i) ⇒ (ii): Let $Y$ be a closed subset of $B$. Let $x_n$ be a sequence in $f^{-1}(Y)$ which converges to some point $x$ in $A$. Since $f$ is continuous, the equivalence (i) ⇔ (iv) ensures that

$$f(x_n) \xrightarrow{d_B} f(x).$$

Since the sequence $f(x_n)$ belongs to the closed subset $Y$, it follows that $f(x)$ also belongs to $Y$. Thus $x$ belongs to $f^{-1}(Y)$ and therefore $f^{-1}(Y)$ is a closed set.

(ii) ⇒ (iii): This follows immediately from the fact that

$$\big(f^{-1}(Y)\big)^c = f^{-1}(Y^c)$$

for $Y \subseteq B$, together with the fact that a set is open if and only if its complement is closed.

(iii) ⇒ (i): Let $x \in A$ and let $\epsilon > 0$, and note that $B_\epsilon(f(x))$ is an open subset of $B$. By condition (iii), the set $f^{-1}(B_\epsilon(f(x)))$ is an open subset of $A$ which contains $x$. Therefore there exists $\delta > 0$ so that

$$B_\delta(x) \subseteq f^{-1}(B_\epsilon(f(x))).$$

In other words,

$$\begin{aligned}
d_A(x, y) < \delta &\Rightarrow y \in B_\delta(x) \\
&\Rightarrow y \in f^{-1}(B_\epsilon(f(x))) \\
&\Rightarrow f(y) \in B_\epsilon(f(x)) \\
&\Rightarrow d_B(f(x), f(y)) < \epsilon,
\end{aligned}$$

which is exactly what we need to satisfy the $\epsilon$-$\delta$ condition. □

Example 16.1. The theorem does not say that $f(X)$ is open if $X$ is open. Indeed, let $A = B = \mathbb{R}$ and let $f$ be the zero function. Then $f(X) = \{0\}$ for any nonempty subset $X$ of $\mathbb{R}$, regardless of whether $X$ is open or not.

Example 16.2. The theorem does not say that $f(X)$ is closed if $X$ is closed. Indeed, let $A = B = \mathbb{R}$ and let $f(x) = \tan^{-1} x$. Note that $X = \mathbb{R}$ is a closed subset of $\mathbb{R}$ but that $f(X) = (-\pi/2, \pi/2)$, which is not closed.

In addition to providing an elegant topological characterization of continuity, the preceding theorem is extremely useful since it often provides a shortcut to proving that sets are open or closed.


Example 16.3. The set $S = \{\pi n : n \in \mathbb{Z}\}$ is a closed subset of $\mathbb{R}$ since $S = \sin^{-1}(\{0\})$. Here $\sin^{-1}(\{0\})$ denotes the inverse image of the closed set $\{0\}$ under the sine function (which we are assuming is continuous). There are of course other ways of proving that $S$ is closed, but none quite so easy.²

Example 16.4. The interior of an ellipse

$$S = \{(x, y) \in \mathbb{R}^2 : ax^2 + by^2 < c\}, \qquad a, b, c > 0,$$

is an open subset of $\mathbb{R}^2$ (with respect to the usual metric). Indeed, the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by

$$f(x, y) = ax^2 + by^2$$

is clearly continuous (it is the sum of the continuous functions $g(x, y) = ax^2$ and $h(x, y) = by^2$, which are themselves products of continuous functions . . .). Since the set $(-\infty, c)$ is an open subset of $\mathbb{R}$, it follows that

$$S = f^{-1}((-\infty, c))$$

is an open subset of $\mathbb{R}^2$. Similar arguments apply to other planar regions defined in terms of strict inequalities.

Example 16.5. Consider $\mathbb{R}^3$, equipped with the usual metric. Recall that a plane $P$ is a subset of the form

$$P = \{(x, y, z) \in \mathbb{R}^3 : ax + by + cz = d\}$$

where $a, b, c, d \in \mathbb{R}$ are constants. Since the function $f : \mathbb{R}^3 \to \mathbb{R}$ defined by

$$f(x, y, z) = ax + by + cz$$

is continuous, it follows that

$$f^{-1}(\{d\}) = \{(x, y, z) \in \mathbb{R}^3 : f(x, y, z) = d\} = P$$

is a closed subset of $\mathbb{R}^3$ since $\{d\}$ is a closed subset of $\mathbb{R}$. Similar arguments show that most “surfaces” in $\mathbb{R}^3$ are closed sets.

Example 16.6. There does not exist a continuous function $f : \mathbb{R} \to \mathbb{R}$ (with respect to the usual metric on $\mathbb{R}$) so that $f(x) \ge 0$ if $x \in \mathbb{Q}$ and $f(x) < 0$ if $x \notin \mathbb{Q}$. Indeed, if such an $f$ existed, then $f^{-1}([0, \infty))$ would be a closed subset of $\mathbb{R}$. However, $f^{-1}([0, \infty)) = \mathbb{Q}$ is not closed.

² Although $S$ is a union of the closed sets $\{\pi n\}$ ($n \in \mathbb{Z}$), this union is not a finite union.

LECTURE 17

Cauchy Sequences

17.1. Cauchy Sequences

Definition. Let $(M, d)$ be a metric space. A sequence $x_n$ in $M$ satisfies the Cauchy condition if for every $\epsilon > 0$, there exists $N \in \mathbb{N}$ so that

$$m, n \ge N \Rightarrow d(x_n, x_m) < \epsilon.$$

If the sequence $x_n$ satisfies the Cauchy condition, then we say that $x_n$ is a Cauchy sequence.

In more descriptive terms, one might say that a Cauchy sequence is a sequence whose terms eventually get arbitrarily close to each other. One important fact about Cauchy sequences is the following:

Theorem 17.1. Every convergent sequence is a Cauchy sequence.

Proof. Let $(M, d)$ be a metric space and suppose that $x_n \to x$. If $\epsilon > 0$, then let $N \in \mathbb{N}$ be so large that $n \ge N$ implies that $d(x_n, x) < \epsilon/2$. Therefore if $n, m \ge N$ we have

$$d(x_n, x_m) \le d(x_n, x) + d(x, x_m) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \qquad □$$

Using the preceding theorem we can easily prove that the harmonic series

$$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots \tag{17.1}$$

diverges.

Theorem 17.2. The harmonic series (17.1) diverges.

Proof. Suppose toward a contradiction that the series (17.1) converges. In other words, suppose that the sequence

$$S_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$$

of partial sums converges. Since $\lim_{n \to \infty} S_n$ exists, it follows that $S_n$ is a Cauchy sequence. Thus if $\epsilon = \frac{1}{2}$, there exists a corresponding $N \in \mathbb{N}$ such that

$$n, m \ge N \Rightarrow |S_m - S_n| < \tfrac{1}{2}.$$

However, this contradicts the observation that $2N, N \ge N$ and

$$\begin{aligned}
S_{2N} - S_N &= \Big(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2N}\Big) - \Big(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{N}\Big) \\
&= \frac{1}{N+1} + \frac{1}{N+2} + \cdots + \frac{1}{2N} \\
&\ge \underbrace{\frac{1}{2N} + \frac{1}{2N} + \cdots + \frac{1}{2N}}_{N \text{ times}} \\
&= \frac{1}{2}. \qquad □
\end{aligned}$$
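The key estimate in the proof, $S_{2N} - S_N \ge \frac{1}{2}$, can be checked in exact rational arithmetic. The following is a minimal sketch using Python's `fractions` module; the function name `block` is an illustrative choice.

```python
from fractions import Fraction

def block(N):
    """Return S_{2N} - S_N = 1/(N+1) + 1/(N+2) + ... + 1/(2N) as an exact fraction."""
    return sum(Fraction(1, k) for k in range(N + 1, 2 * N + 1))

for N in (1, 10, 100, 1000):
    gap = block(N)
    # Each of the N terms is at least 1/(2N), so the block never drops below 1/2.
    print(N, float(gap), gap >= Fraction(1, 2))
```

Since the gap between $S_N$ and $S_{2N}$ never falls below $\frac{1}{2}$, no matter how large $N$ is, the partial sums cannot satisfy the Cauchy condition with $\epsilon = \frac{1}{2}$.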

The following example illustrates that there are Cauchy sequences which do not converge:

Example 17.1. In the metric space $\mathbb{Q}$, endowed with the usual metric, the sequence

$$1, \; 1.4, \; 1.41, \; 1.414, \; \ldots \tag{17.2}$$

of rational approximations to $\sqrt{2}$ is Cauchy. Although this can be verified directly, it follows from the fact that the sequence (17.2) converges in $\mathbb{R}$. Indeed, the sequence (17.2) is Cauchy in $\mathbb{R}$ and is thus Cauchy in $\mathbb{Q}$ since the metrics on $\mathbb{Q}$ and $\mathbb{R}$ agree on $\mathbb{Q}$. However, the sequence (17.2) does not converge in $\mathbb{Q}$ since $\sqrt{2}$ is not rational.

This reflects the idea that $\mathbb{Q}$ is a metric space which has “holes.” In some sense, it is “incomplete.” In this case, we have already remedied the problem by enlarging $\mathbb{Q}$ to form $\mathbb{R}$ by using the Least Upper Bound Principle.
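The sequence (17.2) can be generated entirely inside $\mathbb{Q}$. The sketch below assumes that the $k$th term is the decimal truncation of $\sqrt{2}$ to $k$ places, computed with integer square roots; the helper name `truncation` is hypothetical. It checks the Cauchy condition for a sample of indices and confirms that no term squares to exactly $2$.

```python
from fractions import Fraction
from math import isqrt

def truncation(k):
    """Decimal truncation of sqrt(2) to k places, as an exact rational number."""
    # isqrt(2 * 10^(2k)) is the integer part of sqrt(2) * 10^k.
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)

terms = [truncation(k) for k in range(12)]
print([str(t) for t in terms[:4]])  # the fractions for 1, 1.4, 1.41, 1.414

# Cauchy condition: for m, n >= 5 the terms differ by less than 10^(-5).
tail = terms[5:]
cauchy_sample = all(abs(s - t) < Fraction(1, 10 ** 5) for s in tail for t in tail)

# No term is a rational square root of 2.
never_exact = all(t * t != 2 for t in terms)
print(cauchy_sample, never_exact)
```

Working with `Fraction` keeps every computation in $\mathbb{Q}$, which mirrors the point of the example: the terms cluster together, but the point they cluster around is missing from the space.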

17.2. Completeness

Definition. A metric space $(M, d)$ is called complete if every Cauchy sequence in $M$ converges (to a limit in $M$). Since every convergent sequence is Cauchy (Theorem 17.1), the Cauchy sequences in a complete metric space are precisely the convergent sequences.

One of the most important aspects of completeness is that any closed subspace of a complete metric space inherits the property of completeness:

Theorem 17.3. Every closed subset of a complete metric space is itself a complete metric space. In other words, if $(M, d)$ is a complete metric space and $S$ is a closed subset of $M$, then $(S, d)$ is a complete metric space.

Proof. Let $(M, d)$ be a complete metric space and suppose that $S$ is a closed subset of $M$. If $x_n$ is a Cauchy sequence in $S$, then clearly $x_n$ is a Cauchy sequence in $M$. Since $M$ is complete (with respect to $d$), it follows that $x_n$ converges to some point $x$ in $M$. Since $S$ is closed, this limit point $x$ must belong to $S$. Therefore every Cauchy sequence in $S$ converges to a limit in $S$ and hence $(S, d)$ is a complete metric space. □

We state the following theorem without proof:

Theorem 17.4. $\mathbb{R}$, endowed with the usual metric $d(x, y) = |x - y|$, is complete.

The proof that $\mathbb{R}$ is a complete metric space ultimately rests on the Least Upper Bound Principle. However, the details are somewhat lengthy and consequently the proof is omitted.

LECTURE 18

Completeness

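Lemma 3. The metrics $d_1, d_2, d_\infty$ on $\mathbb{R}^n$ satisfy

$$d_\infty(\mathbf{x}, \mathbf{y}) \le d_2(\mathbf{x}, \mathbf{y}) \le d_1(\mathbf{x}, \mathbf{y}) \le n \cdot d_\infty(\mathbf{x}, \mathbf{y}) \tag{18.1}$$

for all $\mathbf{x}, \mathbf{y}$ in $\mathbb{R}^n$.

Proof. Each of the inequalities can be verified directly from the definitions of $d_1, d_2, d_\infty$. □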
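The chain (18.1) is easy to spot-check numerically. The sketch below assumes the usual definitions (sum of absolute differences for $d_1$, Euclidean distance for $d_2$, largest absolute difference for $d_\infty$); the function names and the small floating-point tolerance `tol` are choices made here, not part of the notes.

```python
import math
import random

def d1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def d2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dinf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def chain_holds(x, y, tol=1e-12):
    """Check d_inf <= d_2 <= d_1 <= n * d_inf, allowing for rounding error."""
    n = len(x)
    return (dinf(x, y) <= d2(x, y) + tol
            and d2(x, y) <= d1(x, y) + tol
            and d1(x, y) <= n * dinf(x, y) + tol)

random.seed(0)
results = []
for _ in range(1000):
    n = random.randint(1, 6)
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    results.append(chain_holds(x, y))
print(all(results))
```

For instance, with $\mathbf{x} = (0, 0)$ and $\mathbf{y} = (3, 4)$ the four quantities in (18.1) are $4 \le 5 \le 7 \le 8$.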

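Theorem 18.1. A subset $A \subseteq \mathbb{R}^n$ is open (resp. closed) with respect to $d_1$, $d_2$, or $d_\infty$ if and only if it is open (resp. closed) with respect to either of the others. In other words, the metrics $d_1, d_2, d_\infty$ are equivalent in the sense that they induce the same open and closed sets.

Sketch of Pf. One verifies that the inequality (18.1) implies the equivalence of the following statements:

(i) $\mathbf{x}_i \xrightarrow{d_1} \mathbf{x}$

(ii) $\mathbf{x}_i \xrightarrow{d_2} \mathbf{x}$

(iii) $\mathbf{x}_i \xrightarrow{d_\infty} \mathbf{x}$,

where $\mathbf{x}_i$ denotes a sequence in $\mathbb{R}^n$ and $\mathbf{x} \in \mathbb{R}^n$. For instance, let us prove that (iii) implies (i).

If $\mathbf{x}_i \xrightarrow{d_\infty} \mathbf{x}$, then for any $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that

$$i \ge N \Rightarrow d_\infty(\mathbf{x}, \mathbf{x}_i) < \frac{\epsilon}{n}.$$

By (18.1), this implies that

$$i \ge N \Rightarrow d_1(\mathbf{x}, \mathbf{x}_i) \le n \cdot d_\infty(\mathbf{x}, \mathbf{x}_i) < \epsilon,$$

whence $\mathbf{x}_i \xrightarrow{d_1} \mathbf{x}$.

It follows from the equivalence of (i), (ii), and (iii) that the closures with respect to $d_1$, $d_2$, and $d_\infty$ of a subset $S$ of $\mathbb{R}^n$ are identical. Since a set is closed if and only if it equals its closure, it follows that the metric spaces $(\mathbb{R}^n, d_1)$, $(\mathbb{R}^n, d_2)$, $(\mathbb{R}^n, d_\infty)$ have exactly the same closed sets. Since the complement of a closed (resp. open) set is open (resp. closed), it follows that these metric spaces also have precisely the same open sets. □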

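Theorem 18.2. $\mathbb{R}^n$ is complete with respect to any of the metrics $d_1, d_2, d_\infty$.

Proof. By (18.1), a sequence in $\mathbb{R}^n$ which is Cauchy (resp. convergent) with respect to $d_1$, $d_2$, or $d_\infty$ is automatically Cauchy (resp. convergent) with respect to the other two metrics. It therefore suffices to prove that $\mathbb{R}^n$ is complete with respect to the metric $d_\infty$.

If a sequence $\mathbf{x}_j$ is Cauchy with respect to $d_\infty$, then the $i$th entries $\mathbf{x}_j(i)$ and $\mathbf{x}_k(i)$ of the $j$th and $k$th vectors $\mathbf{x}_j$ and $\mathbf{x}_k$ satisfy

$$|\mathbf{x}_j(i) - \mathbf{x}_k(i)| \le \max\{|\mathbf{x}_j(1) - \mathbf{x}_k(1)|, \ldots, |\mathbf{x}_j(n) - \mathbf{x}_k(n)|\} = d_\infty(\mathbf{x}_j, \mathbf{x}_k)$$

for $i = 1, 2, \ldots, n$. Thus if $\mathbf{x}_j$ is a Cauchy sequence with respect to the metric $d_\infty$ on $\mathbb{R}^n$, then $\mathbf{x}_j(i)$ is a Cauchy sequence in $\mathbb{R}$ (with the usual metric) for each $i = 1, 2, \ldots, n$. Since $\mathbb{R}$ is complete, we may define a vector $\mathbf{x} \in \mathbb{R}^n$ by setting

$$\mathbf{x} = (\mathbf{x}(1), \mathbf{x}(2), \ldots, \mathbf{x}(n)),$$

where the entries $\mathbf{x}(1), \mathbf{x}(2), \ldots, \mathbf{x}(n)$ are defined by

$$\lim_{j \to \infty} \mathbf{x}_j(i) = \mathbf{x}(i), \qquad i = 1, 2, \ldots, n.$$

Our next goal is to show that the sequence $\mathbf{x}_j$ in $\mathbb{R}^n$ converges to $\mathbf{x}$ with respect to the metric $d_\infty$.

Let $\epsilon > 0$ be given and let $M_1, M_2, \ldots, M_n \in \mathbb{N}$ be so large that

$$j \ge M_i \Rightarrow |\mathbf{x}_j(i) - \mathbf{x}(i)| < \epsilon$$

for $i = 1, 2, \ldots, n$. If

$$N = \max\{M_1, M_2, \ldots, M_n\},$$

then

$$j \ge N \Rightarrow (\forall i \in \{1, 2, \ldots, n\})(|\mathbf{x}_j(i) - \mathbf{x}(i)| < \epsilon).$$

Putting this together, we find that if $j \ge N$ then

$$d_\infty(\mathbf{x}_j, \mathbf{x}) = \max\{|\mathbf{x}_j(1) - \mathbf{x}(1)|, |\mathbf{x}_j(2) - \mathbf{x}(2)|, \ldots, |\mathbf{x}_j(n) - \mathbf{x}(n)|\} < \epsilon.$$

Therefore $\mathbf{x}_j \xrightarrow{d_\infty} \mathbf{x}$ and hence $\mathbb{R}^n$ is complete with respect to $d_\infty$. □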

Corollary 7. $M_n(\mathbb{R})$ is complete (with respect to any of the metrics $d_1, d_2, d_\infty$).

Proof. The metrics $d_1, d_2, d_\infty$ on $M_n(\mathbb{R})$ are essentially identical to the corresponding metrics on $\mathbb{R}^{n^2}$. In fact, $M_n(\mathbb{R})$ (as a normed vector space) is essentially $\mathbb{R}^{n^2}$ with a different labeling scheme. □

18.1. Completions of Metric Spaces

If $(M, d)$ is a metric space which is not complete, then there is a somewhat elaborate process by which $(M, d)$ may be embedded into a larger metric space $(M', d')$ which is complete. In other words, $(M', d')$ is a complete metric space and there exists an injection $f : M \to M'$ that satisfies $d(x, y) = d'(f(x), f(y))$ for all $x, y \in M$ (such a function is called an isometry). Thus we can embed a copy of $(M, d)$ inside $(M', d')$ such that the metric on the larger metric space $(M', d')$ agrees with the original metric on $M$.

Theorem 18.3. Every metric space $(M, d)$ can be completed. In other words, there exists a complete metric space $(M', d')$ and an injection $f : M \to M'$ which satisfies $d(x, y) = d'(f(x), f(y))$ for all $x, y \in M$.¹

¹ The completion $(M', d')$ depends heavily on $d$ and is “unique up to homeomorphism,” meaning that for all practical purposes any two completions of $(M, d)$ are “topologically identical.”

The proof of the theorem is somewhat lengthy and the difficulties are mostly notational (although the concepts are interesting). The basic idea is to construct $M'$ from “equivalence classes of Cauchy sequences from $M$.” We will not go into the details of the construction, however.

Example 18.1. Consider the metric space $(\mathbb{Q}, d)$ where $d(x, y) = |x - y|$. It can be shown that the completion of $(\mathbb{Q}, d)$ is essentially $\mathbb{R}$ (with the usual metric). In fact, some real analysis textbooks begin by constructing $\mathbb{R}$ explicitly from $\mathbb{Q}$ using this technique.

Example 18.2. It turns out that $C([a, b])$ is complete with respect to the metric $d_\infty$ (we will prove this later in the course), but not with respect to $d_1$ or $d_2$. The completions of $C([a, b])$ with respect to $d_1$ and $d_2$ turn out to be the Lebesgue spaces $L^1[a, b]$ and $L^2[a, b]$, respectively. To go into more detail would require a long digression on measure theory and the Lebesgue integral.

LECTURE 19

Infinite Series

19.1. Cauchy Criterion for Series

Definition. An infinite series $\sum_{n=0}^\infty a_n$ in a normed vector space $V$ is said to converge to a vector $S \in V$ if the partial sums $S_m = \sum_{n=0}^m a_n$ tend to $S$:

$$\sum_{n=0}^\infty a_n = S \quad \Leftrightarrow \quad \lim_{m \to \infty} S_m = S.$$

Here the limit is taken with respect to the metric $d(x, y) = \|x - y\|$ on $V$.

Definition. We say that a normed vector space $(V, \|\cdot\|)$ is complete if $V$ is a complete metric space when equipped with the metric $d(x, y) = \|x - y\|$. A complete normed vector space is sometimes called a Banach space.

Standard examples of Banach spaces are $\mathbb{R}^n$ and $M_n(\mathbb{R})$ (with respect to any of $d_1, d_2, d_\infty$). It also turns out that $(C([a, b]), d_\infty)$ is a Banach space, where $d_\infty$ denotes the metric

$$d_\infty(f, g) = \sup_{a \le x \le b} |f(x) - g(x)|.$$

Later in the course, we will prove that $(C([a, b]), d_\infty)$ is indeed complete. In graduate analysis or differential equations you will encounter many other Banach spaces, including the Lebesgue spaces and the Sobolev spaces. For now, we will mostly be concerned with $\mathbb{R}^n$ and $M_n(\mathbb{R})$.

In a Banach space, we have the Cauchy Convergence Criterion for series:

Theorem 19.1. A series $\sum_{n=0}^\infty a_n$ in a Banach space $V$ converges if and only if for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that

$$k \ge j \ge N \Rightarrow \Big\|\sum_{n=j}^k a_n\Big\| < \epsilon.$$

Proof. Since $V$ is complete, the given series converges if and only if the sequence

$$S_m = \sum_{n=0}^m a_n$$

of partial sums is a Cauchy sequence. If $S_m$ is a Cauchy sequence, then for each $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that $k \ge j \ge N$ implies that

$$\Big\|\sum_{n=j}^k a_n\Big\| = \|S_k - S_{j-1}\| = d(S_k, S_{j-1}) < \epsilon.$$

On the other hand, if the preceding condition holds, then the partial sums $S_m$ form a Cauchy sequence. Since $V$ is complete, it follows that $\lim_{m \to \infty} S_m$ exists. □


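The following often overlooked theorem is quite useful in many contexts:

Theorem 19.2. If $(V, \|\cdot\|)$ is a Banach space and if the series $\sum_{n=0}^\infty a_n$ converges, then

$$\lim_{m \to \infty} \Big(\sum_{n=m+1}^\infty a_n\Big) = 0.$$

Here $0$ denotes the zero vector in $V$. In other words, “the tail end of a convergent series tends to zero.”

Proof. Let $S$ denote the sum of the series and let $S_m$ denote its $m$th partial sum. Let $\epsilon > 0$ and find $N \in \mathbb{N}$ such that

$$m \ge N \Rightarrow \underbrace{\|S - S_m\|}_{d(S, S_m)} < \epsilon.$$

In other words, we have

$$m \ge N \Rightarrow \|(S - S_m) - 0\| < \epsilon,$$

whence $S - S_m$ converges to the vector $0$. However, this is just another way of saying that

$$\lim_{m \to \infty} \Big(\sum_{n=m+1}^\infty a_n\Big) = \lim_{m \to \infty} (S - S_m) = 0. \qquad □$$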

19.2. The Divergence and Comparison Tests

An important consequence of the Cauchy Criterion for Series is the so-called Divergence Test from Calculus II:

Theorem 19.3. Let $\sum_{n=0}^\infty a_n$ be a series in a Banach space $(V, \|\cdot\|)$.

(i) If $\sum_{n=0}^\infty a_n$ converges, then $\lim_{n \to \infty} a_n = 0$ (the zero vector). In other words: “the terms of a convergent series tend to $0$.”

(ii) If $\lim_{n \to \infty} a_n \ne 0$ (this includes the limit not existing), then $\sum_{n=0}^\infty a_n$ diverges.

Proof. Since (ii) is the contrapositive of (i), it suffices to prove (i). By the Cauchy Criterion for Series (applied with $k = j = n$), we find that for any $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that

$$n \ge N \Rightarrow \underbrace{\|S_n - S_{n-1}\|}_{\|a_n\|} < \epsilon.$$

Putting this all together, we find that for any $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that

$$n \ge N \Rightarrow \|a_n - 0\| < \epsilon,$$

from which it follows that $\lim_{n \to \infty} a_n = 0$. □

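Another important consequence of the Cauchy Criterion for Series is the following generalization of the Comparison Theorem from Calculus II:

Theorem 19.4. Let $\sum_{n=0}^\infty a_n$ be a series in a Banach space $(V, \|\cdot\|)$.

(i) If $\sum_{n=0}^\infty b_n$ is a convergent series of non-negative real numbers and if there exists $N \in \mathbb{N}$ so that

$$n \ge N \Rightarrow \|a_n\| \le b_n,$$

then $\sum_{n=0}^\infty a_n$ converges. In particular, if $\sum_{n=0}^\infty \|a_n\|$ converges (in $\mathbb{R}$), then $\sum_{n=0}^\infty a_n$ converges (in $V$). In other words, “every absolutely convergent series in $V$ converges.”

(ii) If $\sum_{n=0}^\infty c_n$ is a divergent series of non-negative real numbers and if there exists $N \in \mathbb{N}$ so that

$$n \ge N \Rightarrow 0 \le c_n \le \|a_n\|,$$

then $\sum_{n=0}^\infty \|a_n\|$ diverges; that is, $\sum_{n=0}^\infty a_n$ does not converge absolutely. (The series $\sum a_n$ itself may still converge, as the alternating harmonic series shows.)

Proof. If $\sum_{n=0}^\infty b_n$ is a convergent series of non-negative real numbers, then for each $\epsilon > 0$ there exists $N' \in \mathbb{N}$, which we may take to be at least $N$, so that

$$k \ge j \ge N' \Rightarrow \sum_{n=j}^k b_n < \epsilon.$$

Thus $k \ge j \ge N'$ implies that

$$\|S_k - S_{j-1}\| = \Big\|\sum_{n=j}^k a_n\Big\| \le \sum_{n=j}^k \|a_n\| \le \sum_{n=j}^k b_n < \epsilon.$$

By the Cauchy Criterion for Series in a Banach Space, it follows that the series $\sum_{n=0}^\infty a_n$ converges in $V$. This establishes (i). We leave the proof of (ii) to the reader. □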

The importance of the preceding theorem is that it allows us to conclude that a series of vectors in a Banach space converges if a corresponding numerical series converges. Needless to say, it is usually much easier to test for the convergence of a series of real numbers than a series of vectors in a normed vector space.

Definition. A series $\sum_{n=0}^\infty a_n$ in a Banach space is called absolutely convergent if $\sum_{n=0}^\infty \|a_n\|$ converges in $\mathbb{R}$.


Example 19.1. Not every convergent series (even in $\mathbb{R}$) converges absolutely. For instance, it can be shown that the Alternating Harmonic Series

$$\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$$

converges (in fact to the value $\ln 2$). However, the corresponding series of positive terms is simply the harmonic series, which diverges.
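The contrast in Example 19.1 shows up clearly in a short numerical experiment. This is only an illustrative sketch (floating-point sums, hypothetical function names), not a proof of either claim: the signed partial sums settle near $\ln 2$, while the partial sums of absolute values keep growing.

```python
import math

def alternating_partial_sum(m):
    """Partial sum of the alternating harmonic series: sum of (-1)^(n+1)/n for n = 1..m."""
    return math.fsum((-1) ** (n + 1) / n for n in range(1, m + 1))

def harmonic_partial_sum(m):
    """Partial sum of the series of absolute values: 1 + 1/2 + ... + 1/m."""
    return math.fsum(1.0 / n for n in range(1, m + 1))

for m in (10, 100, 1000, 10000):
    signed = alternating_partial_sum(m)
    print(m, signed, abs(signed - math.log(2)), harmonic_partial_sum(m))
```

The gap between the signed partial sum and $\ln 2$ is at most the size of the first omitted term, $\frac{1}{m+1}$, which is the standard error bound for an alternating series with decreasing terms.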

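Example 19.2. If $A$ is any $n \times n$ matrix, then we may define the exponential of $A$ by the series

$$\exp(A) = \sum_{n=0}^\infty \frac{A^n}{n!}.$$

Since $M_n(\mathbb{R})$ is complete with respect to $d_2$ (and $d_1$, $d_\infty$ as well), we need only show that the terms of the preceding series are bounded in norm by the terms of a convergent numerical series. Using the submultiplicativity of the norm $\|\cdot\|_2$ we find that

$$\Big\|\frac{A^n}{n!}\Big\|_2 = \frac{1}{n!}\|A^n\|_2 \le \frac{1}{n!}\|A\|_2^n.$$

Since

$$\sum_{n=0}^\infty \frac{\|A\|_2^n}{n!}$$

is a convergent series in $\mathbb{R}$ (via the Ratio Test from Calculus II), it therefore follows from the Comparison Theorem that

$$\sum_{n=0}^\infty \frac{A^n}{n!}$$

is a convergent series in $(M_n(\mathbb{R}), d_2)$. The limit matrix is defined to be $\exp(A)$.

This is important for the study of differential equations. Let $A$ be an $n \times n$ matrix and consider the initial value problem

$$\mathbf{y}'(t) = A\mathbf{y}(t), \qquad \mathbf{y}(0) = \mathbf{y}_0,$$

where

$$\mathbf{y} = \begin{pmatrix} y_1(t) \\ \vdots \\ y_n(t) \end{pmatrix}, \qquad \mathbf{y}' = \begin{pmatrix} y_1'(t) \\ \vdots \\ y_n'(t) \end{pmatrix}, \qquad \mathbf{y}_0 = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}.$$

One can show that the solution is given by the simple formula

$$\mathbf{y}(t) = e^{At}\mathbf{y}_0.$$

This is entirely analogous to the fact that the solution to

$$y'(t) = ay(t), \qquad y(0) = y_0$$

is given by $y(t) = e^{at}y_0$.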
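The series defining $\exp(A)$ can be summed numerically by accumulating the terms $A^k/k!$. The following is a rough sketch in plain Python; the helpers `mat_mul` and `mat_exp`, the cutoff of 30 terms, and the test matrix are all assumptions made for illustration. For the matrix $A$ with rows $(0, 1)$ and $(-1, 0)$ one has $A^2 = -I$, so the series sums to the rotation matrix with rows $(\cos 1, \sin 1)$ and $(-\sin 1, \cos 1)$, which gives something concrete to compare against.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=30):
    """Partial sum of exp(A) = sum of A^k / k! over k = 0, ..., terms - 1."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0 term: I
    term = [row[:] for row in total]
    for k in range(1, terms):
        # Turn A^(k-1)/(k-1)! into A^k/k! with one multiplication and one division.
        term = [[entry / k for entry in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

A = [[0.0, 1.0], [-1.0, 0.0]]
E = mat_exp(A)
expected = [[math.cos(1), math.sin(1)], [-math.sin(1), math.cos(1)]]
error = max(abs(E[i][j] - expected[i][j]) for i in range(2) for j in range(2))
print(E, error)
```

Because the terms are dominated by $\|A\|^k/k!$, the factorial in the denominator makes the partial sums stabilize after a modest number of terms for a matrix of this size.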

LECTURE 20

Infinite Series

20.1. An Extended Example

Matrices are particularly interesting to study because their algebraic structure (e.g., multiplication and inversion) is closely related to their analytic structure (e.g., metrics and convergence). This example highlights a few such connections.

The following algebraic lemma is quite useful. You have seen it before in the case of $1 \times 1$ matrices (i.e., real numbers):

Lemma 4. The formulae

$$I - A^m = (I - A)(I + A + A^2 + \cdots + A^{m-1}) = (I + A + A^2 + \cdots + A^{m-1})(I - A)$$

hold for all $A \in M_n(\mathbb{R})$ and all $m \in \mathbb{N}$. Here we use the notation

$$A^0 = I, \quad A^1 = A, \quad A^2 = AA, \quad A^3 = AAA, \quad \ldots,$$

where $I$ denotes the $n \times n$ identity matrix.

Proof. Matrix multiplication is not commutative in general. Fortunately, powers of $A$ commute since matrix multiplication is associative:

$$A^j A^k = \underbrace{(AA \cdots A)}_{j} \underbrace{(AA \cdots A)}_{k} = \underbrace{(AA \cdots A)}_{j+k} = \underbrace{(AA \cdots A)}_{k+j} = \underbrace{(AA \cdots A)}_{k} \underbrace{(AA \cdots A)}_{j} = A^k A^j.$$

Since any power of $A$ (including $A^0 = I$) commutes with any other power of $A$, the identities

$$(I - A)(I + A + \cdots + A^{m-1}) = I - A^m$$

and

$$(I + A + \cdots + A^{m-1})(I - A) = I - A^m$$

can be proved using the same arguments used in the real case. □

Lemma 5. If $\|A\|_2 < 1$, then $\sum_{n=0}^\infty A^n$ converges (the limit being taken with respect to the $d_2$ metric).

Proof. It suffices to prove that the sum is absolutely convergent. Note that we cannot assume that the series sums to $(I - A)^{-1}$ since we do not know yet whether $I - A$ is invertible.

Since we will be dealing only with the 2-norm, we will simply drop the subscript 2 in the following. The sum is absolutely convergent since

$$\sum_{n=0}^\infty \|A^n\| \le \sum_{n=0}^\infty \|A\|^n = \frac{1}{1 - \|A\|}.$$

Indeed, the preceding shows that the partial sums of the real series $\sum_{n=0}^\infty \|A^n\|$ (which has nonnegative terms) are bounded above. Also observe that we used the fact that $\|A\| < 1$ to sum the resulting real geometric series. Moreover, the inequality $\|A^n\| \le \|A\|^n$ follows from the submultiplicativity of the 2-norm. Since $M_n(\mathbb{R})$ is complete, we know that every absolutely convergent series in $M_n(\mathbb{R})$ converges in $M_n(\mathbb{R})$ and therefore the given series converges to some matrix $S$. □

Lemma 6. Matrix inverses are unique, when they exist. In other words, if $X, Y, Z \in M_n(\mathbb{R})$ satisfy $XY = YX = I$ and $XZ = ZX = I$, then $Y = Z$.

Proof. Using the fact that matrix multiplication is associative, we see that

$$Y = YI = Y(XZ) = (YX)Z = IZ = Z. \qquad □$$

Theorem 20.1. If $\|A\|_2 < 1$, then $I - A$ is invertible and

$$(I - A)^{-1} = \sum_{n=0}^\infty A^n,$$

where the series converges absolutely with respect to the $d_2$ metric.

Proof. By a preceding lemma, $\sum_{n=0}^\infty A^n$ converges to some matrix $S$. Let

$$S_m = \sum_{n=0}^{m-1} A^n$$

denote the $m$th partial sum of the series we are concerned with. Since

$$I - A^m = (I - A)S_m = S_m(I - A)$$

by a preceding lemma, we may pass to the limit and use the fact that $A^m \xrightarrow{d_2} 0$ (the zero matrix) since $\|A^m\|_2 \le \|A\|_2^m$ and $\|A\|_2 < 1$. Using the fact that multiplication by $I - A$ is continuous with respect to $d_2$, we have

$$I = (I - A)S = S(I - A),$$

from which it follows (using the uniqueness of inverses) that $S = (I - A)^{-1}$. □
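Theorem 20.1 can be illustrated numerically by summing powers of a small matrix and checking that the result inverts $I - A$. This is a sketch under stated assumptions: $\|\cdot\|_2$ is taken to be the entrywise 2-norm (the Euclidean norm of the entries, matching the identification of $M_n(\mathbb{R})$ with $\mathbb{R}^{n^2}$), and the matrix `A`, the helper names, and the cutoff of 200 terms are hypothetical choices.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def neumann_sum(A, terms=200):
    """Partial sum I + A + A^2 + ... + A^(terms - 1)."""
    n = len(A)
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    power = [row[:] for row in identity]
    for _ in range(1, terms):
        power = mat_mul(power, A)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

A = [[0.2, 0.1], [0.0, 0.3]]
norm_A = math.sqrt(sum(entry * entry for row in A for entry in row))  # about 0.374

S = neumann_sum(A)
I_minus_A = [[float(i == j) - A[i][j] for j in range(2)] for i in range(2)]
product = mat_mul(I_minus_A, S)
residual = max(abs(product[i][j] - float(i == j)) for i in range(2) for j in range(2))
print(norm_A, S, residual)
```

The residual measures how far $(I - A)S_m$ is from $I$; by Lemma 4 it equals the size of the entries of $A^m$, which shrinks geometrically when $\|A\|_2 < 1$.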

LECTURE 21

Integral Test

21.1. The Harmonic Series and Integral Test

In this section, we consider only series with real terms. In other words, we have $V = \mathbb{R}$ and our metric is implicitly given by $d(x, y) = |x - y|$.

Example 21.1. Recall that the harmonic series

$$\sum_{n=1}^\infty \frac{1}{n}$$

diverges. In particular, with $a_n = \frac{1}{n}$ we have

$$\lim_{n \to \infty} a_n = \lim_{n \to \infty} \frac{1}{n} = 0,$$

but $\sum_{n=1}^\infty a_n$ diverges. Thus the claim that $\lim_{n \to \infty} a_n = 0$ implies the convergence of $\sum_{n=1}^\infty a_n$ is false in general. Make sure you remember this.

Although we have already proved that the harmonic series diverges (Lecture 17), a second proof of the divergence of the harmonic series is requested on an upcoming homework assignment. A “cheap”¹ way of seeing that the harmonic series diverges is the Integral Test from Calculus II:

Theorem 21.1 (The Integral Test). Suppose that $f : [1, \infty) \to \mathbb{R}$ is continuous, positive, and decreasing. If $a_n = f(n)$, then $\int_1^{\infty} f(x)\,dx$ converges if and only if $\sum_{n=1}^{\infty} a_n$ converges.

Proof. By considering the graph of $f(x)$ and interpreting the partial sums as sums of the areas of boxes, one obtains the inequalities

\[ \int_1^n f(x)\,dx \le a_1 + a_2 + \cdots + a_n \le a_1 + \int_1^n f(x)\,dx. \tag{21.1} \]

In particular, the convergence of either the improper integral or the series implies the convergence of the other. ∎

Since we have not introduced the integral, let alone improper integrals, in a respectable manner yet, please do not use the integral test on the homework (until we cover integrals more formally). In any case, back to the harmonic series:

Example 21.2. The estimates (21.1) from the integral test (with $f(x) = 1/x$) imply that

\[ \ln n \le 1 + \frac{1}{2} + \cdots + \frac{1}{n} \le 1 + \ln n. \tag{21.2} \]

¹ In the sense that we have not introduced integrals in a rigorous manner.

In particular, the partial sums of the harmonic series tend to infinity, but extremely slowly. For instance, the sum of the first million terms satisfies

\[ 13.815\ldots < \sum_{n=1}^{10^6} \frac{1}{n} < 14.815\ldots. \]

The sum of the first billion terms satisfies

\[ 20.7233\ldots < \sum_{n=1}^{10^9} \frac{1}{n} < 21.7233\ldots. \]

In particular, observe that it would be very difficult indeed to conclude that the harmonic series diverges based on purely numerical evidence. In fact, by (21.2) a partial sum of the harmonic series cannot exceed 100 until $n > e^{99} \approx 9.9 \times 10^{42}$, and adding the first $e^{100} \approx 2.688 \times 10^{43}$ terms certainly suffices.

It follows from (21.2) that

\[ 0 \le \underbrace{1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n}_{=F(n)} \le 1. \]

This implies that the sequence $F(n)$ is non-negative (i.e., bounded below by 0). Moreover, $F(n)$ is a decreasing sequence since

\begin{align*}
F(n) - F(n+1) &= \left(1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n\right) - \left(1 + \frac{1}{2} + \cdots + \frac{1}{n+1} - \ln(n+1)\right) \\
&= \ln(n+1) - \ln n - \frac{1}{n+1} \\
&= \int_n^{n+1} \frac{dx}{x} - \frac{1}{n+1} \\
&> 0.
\end{align*}

The inequality follows from the fact that

\[ \frac{1}{x} > \frac{1}{n+1} \]

on the interval $[n, n+1)$, and hence the area under the graph of $f(x) = 1/x$ from $x = n$ to $x = n+1$ must be greater than $1/(n+1)$.

It follows from the preceding computations that

\[ F(n) = 1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n \]

is a decreasing sequence of real numbers which is bounded below. This implies that $\lim_{n\to\infty} F(n)$ exists. This limit is called the Euler-Mascheroni constant and it is denoted $\gamma$ (lower-case gamma):

\[ \gamma = \lim_{n\to\infty} \left(1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n\right) \approx 0.5772156649\ldots. \]

After $0$, $1$, $\pi$, $e$, and the imaginary unit $i$, $\gamma$ is perhaps the most important mathematical constant. The Euler-Mascheroni constant appears, among other places, in number theory and complex analysis. For instance, Dirichlet proved that if $d(n)$ denotes the number of divisors of a positive integer $n$, then

\[ \lim_{n\to\infty} \left(\frac{1}{n} \sum_{i=1}^{n} d(i) - \ln n\right) = 2\gamma - 1. \]

This is an interesting statement about the average number of divisors of positive integers.
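The estimate (21.2) and the slow decrease of $F(n) = H_n - \ln n$ toward $\gamma$ can be observed numerically. The following is a minimal sketch; the function names `harmonic` and `F` are illustrative choices, and floating-point arithmetic is assumed to be accurate enough for these modest values of $n$.

```python
import math

def harmonic(n):
    """Return the nth partial sum H_n = 1 + 1/2 + ... + 1/n."""
    return math.fsum(1.0 / k for k in range(1, n + 1))

def F(n):
    """Return F(n) = H_n - ln n, which decreases toward the Euler-Mascheroni constant."""
    return harmonic(n) - math.log(n)

for n in (10, 1_000, 100_000):
    H = harmonic(n)
    # Check the estimate (21.2): ln n <= H_n <= 1 + ln n.
    print(n, H, math.log(n) <= H <= 1 + math.log(n), F(n))
```

Even with one hundred thousand terms the partial sum is only about 12, which illustrates how little numerical evidence says about divergence here.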

LECTURE 22

Alternating Series

22.1. The Alternating Series Test

The following lemma is useful in a variety of scenarios:

Lemma 7. If $a_n$ is a sequence in a metric space $(M, d)$ such that $\lim_{n\to\infty} a_{2n} = L$ and $\lim_{n\to\infty} a_{2n+1} = L$, then $\lim_{n\to\infty} a_n = L$.

Proof. Considering $a_{2n}$ and $a_{2n+1}$ as sequences in their own right, it follows that for each $\epsilon > 0$ there exist $N_1, N_2 \in \mathbb{N}$ such that

\[ 2n \ge N_1 \Rightarrow d(a_{2n}, L) < \epsilon, \]
\[ 2n+1 \ge N_2 \Rightarrow d(a_{2n+1}, L) < \epsilon. \]

Now let $N = \max\{N_1, N_2\}$ and observe that

\[ n \ge N \Rightarrow d(a_n, L) < \epsilon, \]

regardless of whether $n$ is even or odd. ∎

The Alternating Series Test applies to series of real numbers whose terms alternate between positive and negative values.

Theorem 22.1 (Alternating Series Test). If $a_n \ge a_{n+1} > 0$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} a_n = 0$, then the alternating series

\[ \sum_{n=0}^{\infty} (-1)^n a_n = a_0 - a_1 + a_2 - a_3 + \cdots \]

converges.

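Proof. Let $S_m$ denote the $m$th partial sum of the given series. Observe that

\begin{align*}
S_0 &= a_0 \ge 0, \\
S_2 &= a_0 - a_1 + a_2 = S_0 + (a_2 - a_1) \le S_0, \\
S_4 &= a_0 - a_1 + a_2 - a_3 + a_4 = S_2 + (a_4 - a_3) \le S_2,
\end{align*}

and in general we have $0 \le \cdots \le S_4 \le S_2 \le S_0$. The lower bound holds because $S_{2n} = (a_0 - a_1) + (a_2 - a_3) + \cdots + (a_{2n-2} - a_{2n-1}) + a_{2n}$ is a sum of nonnegative quantities.

Since the evenly indexed partial sums $S_{2n}$ form a decreasing sequence which is bounded below, it follows that $\lim_{n\to\infty} S_{2n}$ exists. Let us denote this limit by $S$. A similar argument shows that $\lim_{n\to\infty} S_{2n+1}$ exists as well. By the preceding lemma, it suffices to show that $\lim_{n\to\infty} S_{2n+1} = S$:

\begin{align*}
\lim_{n\to\infty} S_{2n+1} &= \lim_{n\to\infty} (S_{2n} - a_{2n+1}) \\
&= \lim_{n\to\infty} S_{2n} - \lim_{n\to\infty} a_{2n+1} \\
&= S - 0 \\
&= S.
\end{align*} ∎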

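Example 22.1. The series

\[ \sum_{n=0}^{\infty} \frac{(-1)^n}{\sqrt{n+1}} = 1 - \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{3}} - \frac{1}{\sqrt{4}} + \frac{1}{\sqrt{5}} - \cdots \]

is convergent by the Alternating Series Test.

Recall that the alternating harmonic series is the series

\[ \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots \]

obtained by inserting $\pm$ signs into the harmonic series:

\[ \sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots. \]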

The Alternating Series Test asserts that the alternating harmonic series converges, but we can say much more. In fact, it is possible to find the sum of the alternating harmonic series explicitly. In particular, this eliminates the need to appeal to the Alternating Series Test in the first place: we can show that the alternating harmonic series converges without it, and we can compute the sum exactly.

Theorem 22.2. The alternating harmonic series converges to $\ln 2$:

\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots = \ln 2. \tag{22.1} \]

Proof. Let

\[ H_m = 1 + \frac{1}{2} + \cdots + \frac{1}{m} \]

denote the $m$th partial sum of the harmonic series and let

\[ S_m = 1 - \frac{1}{2} + \cdots + \frac{(-1)^{m+1}}{m} \]

denote the $m$th partial sum of the alternating harmonic series. Recall that the Euler-Mascheroni constant $\gamma$ is defined by the limit

\[ \gamma = \lim_{m\to\infty} (H_m - \ln m) \approx 0.577\ldots. \]

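In particular, we proved that the preceding limit exists. A clever trick now shows that the evenly indexed partial sums $S_{2m}$ of the alternating harmonic series converge to $\ln 2$. Observe that

\begin{align*}
S_{2m} &= 1 - \frac{1}{2} + \frac{1}{3} - \cdots + \frac{1}{2m-1} - \frac{1}{2m} \\
&= \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2m}\right) - 2\left(\frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \cdots + \frac{1}{2m}\right) \\
&= \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2m}\right) - \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{m}\right) \\
&= H_{2m} - H_m \\
&= H_{2m} - \ln(2m) + \ln(2m) - H_m \\
&= [H_{2m} - \ln(2m)] + \ln 2 + [\ln m - H_m].
\end{align*}

Taking the limit as $m \to \infty$ we find that

\begin{align*}
\lim_{m\to\infty} S_{2m} &= \ln 2 + \lim_{m\to\infty} [H_{2m} - \ln(2m)] - \lim_{m\to\infty} [H_m - \ln m] \\
&= \ln 2 + \gamma - \gamma \\
&= \ln 2.
\end{align*}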

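On the other hand,

\begin{align*}
\lim_{m\to\infty} S_{2m+1} &= \lim_{m\to\infty} \left(S_{2m} + \frac{1}{2m+1}\right) \\
&= \lim_{m\to\infty} S_{2m} + \lim_{m\to\infty} \frac{1}{2m+1} \\
&= \ln 2 + 0 \\
&= \ln 2.
\end{align*}

Since $\lim_{m\to\infty} S_{2m} = \lim_{m\to\infty} S_{2m+1} = \ln 2$, it follows that $\lim_{n\to\infty} S_n = \ln 2$. In other words, we have proved (22.1). ∎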
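The key identity $S_{2m} = H_{2m} - H_m$ from the proof, and the convergence to $\ln 2$, can be checked numerically. This is a minimal sketch in floating-point arithmetic; the function names are illustrative and do not come from the notes.

```python
import math

def harmonic(m):
    """Return H_m = 1 + 1/2 + ... + 1/m."""
    return math.fsum(1.0 / k for k in range(1, m + 1))

def alternating_harmonic(m):
    """Return S_m = 1 - 1/2 + 1/3 - ... + (-1)^(m+1)/m."""
    return math.fsum((-1) ** (k + 1) / k for k in range(1, m + 1))

m = 1000
# The even partial sum S_{2m} should agree with H_{2m} - H_m up to rounding.
print(alternating_harmonic(2 * m), harmonic(2 * m) - harmonic(m))
# Both values should be close to ln 2.
print(math.log(2))
```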

22.2. Manipulating Series

Theorem 22.3. If $\sum_{n=1}^{\infty} a_n = A$ and $\sum_{n=1}^{\infty} b_n = B$ are convergent series in a Banach space $V$, then

\[ \sum_{n=1}^{\infty} (a_n + b_n) = A + B \]

and

\[ \sum_{n=1}^{\infty} c a_n = cA \]

for any $c \in \mathbb{R}$.

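Proof. We prove only the first portion of the theorem. The proof of the second statement is considerably easier. The partial sums

\[ S_m = \sum_{n=1}^{m} a_n, \qquad T_m = \sum_{n=1}^{m} b_n \]

are convergent sequences in $V$. Therefore, if $\epsilon > 0$ is given, there exist $N_1, N_2 \in \mathbb{N}$ so that

\[ m \ge N_1 \Rightarrow \|S_m - A\| < \frac{\epsilon}{2}, \]
\[ m \ge N_2 \Rightarrow \|T_m - B\| < \frac{\epsilon}{2}. \]

If $m \ge N = \max\{N_1, N_2\}$, then

\begin{align*}
\left\| \sum_{n=1}^{m} (a_n + b_n) - (A + B) \right\| &= \left\| \left[\left(\sum_{n=1}^{m} a_n\right) - A\right] + \left[\left(\sum_{n=1}^{m} b_n\right) - B\right] \right\| \\
&\le \left\| \left(\sum_{n=1}^{m} a_n\right) - A \right\| + \left\| \left(\sum_{n=1}^{m} b_n\right) - B \right\| \\
&= \|S_m - A\| + \|T_m - B\| \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\end{align*} ∎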

In other words, we may add convergent series together (or multiply them by constants) without affecting convergence. Things “work as we would expect,” one might say. On the other hand, taking products of infinite series is tricky indeed. First of all, products are not defined in most complete normed vector spaces, since there is no notion of multiplication. Second, even for $M_n(\mathbb{R})$, which does have a notion of multiplication, we have the additional difficulty that multiplication is not commutative. Indeed, the situation is delicate enough when using only real numbers, as we shall shortly see.

Example 22.2. We know that

\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots = \ln 2. \tag{22.2} \]

Multiplying (22.2) by $\frac{1}{2}$ we find that

\[ \frac{1}{2} - \frac{1}{4} + \frac{1}{6} - \frac{1}{8} + \cdots = \frac{1}{2} \ln 2. \tag{22.3} \]

Inserting zeros between the terms of (22.3) we find that

\[ 0 + \frac{1}{2} + 0 - \frac{1}{4} + 0 + \frac{1}{6} + 0 - \frac{1}{8} + 0 + \cdots = \frac{1}{2} \ln 2. \tag{22.4} \]

Adding (22.2) and (22.4) term by term we find that

\[ 1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots = \frac{3}{2} \ln 2. \]

In other words, by rearranging the terms of the series (22.2) so that each negative term occurs after a pair of positive terms, we have changed the sum.

Theorem 22.4. Consider the rearrangement of the alternating harmonic series (22.2) consisting of blocks of $p$ positive terms followed by $q$ negative terms (the positive terms and the negative terms each stay in the same relative order). The sum of this rearrangement is $\log 2 + \frac{1}{2}(\log p - \log q)$.

Proof. Let $H_n$ denote the $n$th partial sum of the harmonic series and observe that

\[ H_n = \log n + \frac{1}{n} + \gamma_n \]

where $\gamma_n$ is a sequence converging to the Euler-Mascheroni constant $\gamma$. Indeed, note that

\begin{align*}
H_n - \log n &= H_n - \int_1^n \frac{dx}{x} \\
&= \sum_{k=1}^{n-1} \left(\frac{1}{k} - \int_k^{k+1} \frac{dx}{x}\right) + \frac{1}{n} \\
&= \underbrace{\sum_{k=1}^{n-1} \int_k^{k+1} \left(\frac{1}{k} - \frac{1}{x}\right) dx}_{\gamma_n} + \frac{1}{n} \\
&= \gamma_n + \frac{1}{n}.
\end{align*}

Since $\lim_{n\to\infty} (H_n - \log n) = \gamma$, it follows that $\lim_{n\to\infty} \gamma_n = \gamma$.

Consider only partial sums that consist of whole blocks of terms. Since the pattern has period $p + q$, we consider the partial sums $S_{(p+q)n}$. Since the sum of each successive block of $p + q$ terms tends to zero, it suffices to prove that

\[ \lim_{n\to\infty} S_{(p+q)n} = \log 2 + \frac{1}{2}(\log p - \log q). \tag{22.5} \]

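Since

\begin{align*}
S_{(p+q)n} &= \left(\frac{1}{1} + \frac{1}{3} + \cdots + \frac{1}{2pn-1}\right) - \left(\frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2qn}\right) \\
&= \left(\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2pn}\right) - \left(\frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2pn}\right) - \left(\frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2qn}\right) \\
&= H_{2pn} - \frac{1}{2} H_{pn} - \frac{1}{2} H_{qn} \\
&= \log(2pn) + \gamma_{2pn} + \frac{1}{2pn} - \frac{1}{2}\log(pn) - \frac{1}{2}\gamma_{pn} - \frac{1}{2pn} - \frac{1}{2}\log(qn) - \frac{1}{2}\gamma_{qn} - \frac{1}{2qn} \\
&= \log 2 + \log p + \log n - \frac{1}{2}\log p - \frac{1}{2}\log n - \frac{1}{2}\log q - \frac{1}{2}\log n + \gamma_{2pn} - \frac{1}{2}\gamma_{pn} - \frac{1}{2}\gamma_{qn} - \frac{1}{2qn} \\
&= \log 2 + \frac{1}{2}(\log p - \log q) + \gamma_{2pn} - \frac{1}{2}\gamma_{pn} - \frac{1}{2}\gamma_{qn} - \frac{1}{2qn},
\end{align*}

(22.5) follows upon passing to the limit, because $\gamma_{2pn} - \frac{1}{2}\gamma_{pn} - \frac{1}{2}\gamma_{qn} \to \gamma - \frac{1}{2}\gamma - \frac{1}{2}\gamma = 0$ and $\frac{1}{2qn} \to 0$. ∎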
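The formula of Theorem 22.4 can be tested numerically by summing whole blocks of $p$ positive terms followed by $q$ negative terms. The sketch below is only an illustration in floating-point arithmetic; the function name `rearranged_sum` and the choice of 100000 blocks are assumptions, not part of the notes.

```python
import math

def rearranged_sum(p, q, blocks):
    """Sum `blocks` blocks of the rearranged alternating harmonic series.

    Each block consists of the next p positive terms 1/1, 1/3, 1/5, ...
    followed by the next q negative terms -1/2, -1/4, -1/6, ...
    """
    total = 0.0
    next_odd, next_even = 1, 2
    for _ in range(blocks):
        for _ in range(p):
            total += 1.0 / next_odd
            next_odd += 2
        for _ in range(q):
            total -= 1.0 / next_even
            next_even += 2
    return total

for p, q in ((1, 1), (2, 1), (1, 4)):
    predicted = math.log(2) + 0.5 * (math.log(p) - math.log(q))
    print(p, q, rearranged_sum(p, q, 100_000), predicted)
```

The case $p = 2$, $q = 1$ reproduces the value $\frac{3}{2}\log 2$ from Example 22.2, and the case $p = 1$, $q = 4$ gives a rearrangement whose sum is $0$.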

LECTURE 23

Rearrangements of Series

23.1. Rearrangements of Series

Definition. A series $\sum_{n=0}^{\infty} a_n$ of real numbers is called conditionally convergent if it is convergent but not absolutely convergent.

Example 23.1. The alternating harmonic series is conditionally convergent.

Definition. Let $a_n$ be a sequence in a metric space $(M, d)$. A rearrangement of $a_n$ is a sequence $b_n$ which is of the form $b_n = a_{\varphi(n)}$ where $\varphi : \mathbb{N} \to \mathbb{N}$ is a bijection.

Theorem 23.1 (Weierstraß). If $\sum_{n=0}^{\infty} a_n$ is a conditionally convergent series of real numbers, then for each $\alpha \in \mathbb{R}$ there exists a rearrangement $b_n$ of $a_n$ such that $\sum_{n=0}^{\infty} b_n = \alpha$. Additionally, there also exist rearrangements whose sums diverge to $\infty$ or $-\infty$.
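Theorem 23.1 is stated here without proof. One standard way to build such a rearrangement is a greedy procedure: while the running total is at most $\alpha$, append the next unused positive term; otherwise append the next unused negative term. The sketch below applies this idea to the alternating harmonic series. It is an illustration under that assumption and not necessarily the argument intended in the notes; the function name `rearrange_to_target` is hypothetical.

```python
def rearrange_to_target(alpha, num_terms):
    """Greedily rearrange the alternating harmonic series toward the target alpha.

    Positive terms 1/1, 1/3, 1/5, ... and negative terms -1/2, -1/4, ...
    are each used in their original relative order. Returns the partial sum
    of the rearranged series after num_terms terms.
    """
    total = 0.0
    next_odd, next_even = 1, 2
    for _ in range(num_terms):
        if total <= alpha:
            total += 1.0 / next_odd      # running total too small: add a positive term
            next_odd += 2
        else:
            total -= 1.0 / next_even     # running total too large: add a negative term
            next_even += 2
    return total

for alpha in (2.0, 0.0, -1.0):
    print(alpha, rearrange_to_target(alpha, 200_000))
```

Each time the running total crosses $\alpha$, it overshoots by at most the size of the term just used, and those terms tend to zero. Every term is eventually used because the positive terms alone and the negative terms alone both form divergent series.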

There are cases when rearrangement does not affect the sum of a series:

Theorem 23.2. If $a_n \ge 0$ for all $n \in \mathbb{N}$ and $\sum_{n=0}^{\infty} a_n$ converges, then $\sum_{n=0}^{\infty} a_n = \sum_{n=0}^{\infty} b_n$ for every rearrangement $b_n$ of $a_n$. In particular, every rearrangement of $\sum_{n=0}^{\infty} a_n$ is convergent.

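Proof. Since $b_n$ is a rearrangement of $a_n$, there exists a bijection $\varphi : \mathbb{N} \to \mathbb{N}$ such that $b_n = a_{\varphi(n)}$ for all $n \in \mathbb{N}$. It suffices to show that $\sum_{n=0}^{\infty} b_n \le \sum_{n=0}^{\infty} a_n$. Indeed, if we can prove this then the reverse inequality $\sum_{n=0}^{\infty} a_n \le \sum_{n=0}^{\infty} b_n$ will follow since $a_n$ is also a rearrangement of $b_n$ (i.e., $a_n = b_{\varphi^{-1}(n)}$). For each $N \in \mathbb{N}$, we have

\[ \sum_{n=0}^{N} b_n = \sum_{n=0}^{N} a_{\varphi(n)} \le \sum_{n=0}^{M} a_n \le \sum_{n=0}^{\infty} a_n \]

where

\[ M = \max\{\varphi(0), \varphi(1), \ldots, \varphi(N)\}. \]

Since $b_n \ge 0$ for all $n \in \mathbb{N}$ and the partial sums of the series $\sum_{n=0}^{\infty} b_n$ are bounded above by $\sum_{n=0}^{\infty} a_n$, it follows that $\sum_{n=0}^{\infty} b_n$ converges and $\sum_{n=0}^{\infty} b_n \le \sum_{n=0}^{\infty} a_n$. ∎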

Along similar lines, the following theorem is true:

Theorem 23.3. If $\sum_{n=0}^{\infty} a_n$ is an absolutely convergent series in a Banach space, then $\sum_{n=0}^{\infty} a_n = \sum_{n=0}^{\infty} b_n$ for any rearrangement $b_n$ of $a_n$.

23.2. Cauchy Products of Series

If $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are two convergent series in $\mathbb{R}$, then what might we say about their product? Formally, we expect that term-by-term multiplication (the so-called Cauchy product¹ of the two series) gives

\begin{align*}
\left(\sum_{i=0}^{\infty} a_i\right)\left(\sum_{j=0}^{\infty} b_j\right) &= (a_0 + a_1 + a_2 + \cdots)(b_0 + b_1 + b_2 + \cdots) \\
&= a_0 b_0 + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) + \cdots \\
&= c_0 + c_1 + c_2 + \cdots \\
&= \sum_{n=0}^{\infty} c_n
\end{align*}

where the terms of the new series are given by

\[ c_n = \sum_{k=0}^{n} a_k b_{n-k}. \]

The following theorem of Mertens implies that products of series can be taken term-by-term as long as at least one of the series is absolutely convergent.

Theorem 23.4 (Mertens). If $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are both convergent (with sums $A$ and $B$, respectively) and if at least one of the two series is absolutely convergent, then the product of the two series may be taken term-by-term:

\[ \sum_{n=0}^{\infty} c_n = AB \]

where

\[ c_n = \sum_{k=0}^{n} a_k b_{n-k}. \]

In other words, an absolutely convergent series and a convergent series can be multiplied together term-by-term via the Cauchy formula.
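A small numerical check of Mertens' theorem: take the two absolutely convergent geometric series $\sum (1/2)^n = 2$ and $\sum (1/3)^n = 3/2$, form the Cauchy product terms $c_n$, and compare the sum with $AB = 3$. This is a minimal sketch; the helper name `cauchy_product_terms` and the truncation at 80 terms are assumptions made for illustration.

```python
def cauchy_product_terms(a, b):
    """Return [c_0, ..., c_{N-1}] with c_n = sum_{k=0}^{n} a[k] * b[n-k].

    Only the first N = min(len(a), len(b)) terms are computed, since those
    are the ones fully determined by the truncated inputs.
    """
    N = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(N)]

a = [0.5 ** n for n in range(80)]          # sums to A = 2
b = [(1.0 / 3.0) ** n for n in range(80)]  # sums to B = 3/2
c = cauchy_product_terms(a, b)

print(sum(a), sum(b), sum(c))              # the last value should be close to A * B = 3
```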

We will not prove Mertens' theorem here, since it is more important to understand this particular result than to reproduce its proof. A related theorem of N.H. Abel is the following:

Theorem 23.5 (Abel). If $\sum_{n=0}^{\infty} a_n = A$, $\sum_{n=0}^{\infty} b_n = B$, $\sum_{n=0}^{\infty} c_n = C$ are convergent series of real numbers (where $c_n = \sum_{k=0}^{n} a_k b_{n-k}$), then $C = AB$.

In other words, Abel's theorem says that if the term-by-term product series $\sum_{n=0}^{\infty} c_n$ converges (without the absolute convergence assumption that Mertens' theorem requires), then the sum must actually be $AB$. The proof requires a clever argument based on partial summation (a discrete analog of integration by parts) and a theorem on the boundary behavior of power series near their circle of convergence.

¹ This method of multiplication is introduced because we need to add every possible term $a_i b_j$. We cannot sum with respect to $i$ first since this would lead to the sum of infinitely many infinite series. Similarly, summing with respect to $j$ first would lead to the same problem. This is similar to the problem of counting $\mathbb{N} \times \mathbb{N}$: by thinking diagonally we can actually list every $a_i b_j$ without introducing infinitely many “. . .”'s. The sums defining the new terms $c_n$ are finite sums, and hence cause us no trouble.

LECTURE 24

Products of Series

24.1. Cauchy Products of Series

If $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are two convergent series in $\mathbb{R}$, then what might we say about their product? Formally, we expect that term-by-term multiplication (the so-called Cauchy product¹ of the two series)

\begin{align*}
\left(\sum_{i=0}^{\infty} a_i\right)\left(\sum_{j=0}^{\infty} b_j\right) &= (a_0 + a_1 + a_2 + \cdots)(b_0 + b_1 + b_2 + \cdots) \\
&= a_0 b_0 + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) + \cdots \\
&= c_0 + c_1 + c_2 + \cdots \\
&= \sum_{n=0}^{\infty} c_n
\end{align*}

where

\[ c_n = \sum_{k=0}^{n} a_k b_{n-k} \]

should yield a convergent series. As the following example shows, this is not always the case.

¹ This method of multiplication is introduced because we need to add every possible term $a_i b_j$. We cannot sum with respect to $i$ first since this would lead to the sum of infinitely many infinite series. Similarly, summing with respect to $j$ first would lead to the same problem. This is similar to the problem of counting $\mathbb{N} \times \mathbb{N}$: by thinking diagonally we can actually list every $a_i b_j$ without introducing infinitely many “. . .”'s. The sums defining the new terms $c_n$ are finite sums, and hence cause us no trouble.

24.2. The Cauchy Product of Convergent Series Can Diverge!

Consider the series

\[ \sum_{n=0}^{\infty} \frac{(-1)^n}{\sqrt{n+1}}. \]

By the Alternating Series Test, this series converges to some value $A$. What happens when we square this series and perform term-by-term multiplication? In other words, let

\[ a_n = b_n = \frac{(-1)^n}{\sqrt{n+1}} \]

and consider the Cauchy product of the two series $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$. The formula for the new terms $c_n$ tells us that

\begin{align*}
c_n &= \sum_{k=0}^{n} a_k b_{n-k} \\
&= \sum_{k=0}^{n} \left(\frac{(-1)^k}{\sqrt{k+1}}\right)\left(\frac{(-1)^{n-k}}{\sqrt{n-k+1}}\right) \\
&= \sum_{k=0}^{n} \frac{(-1)^k (-1)^{n-k}}{\sqrt{(n-k+1)(k+1)}} \\
&= (-1)^n \sum_{k=0}^{n} \frac{1}{\sqrt{(n-k+1)(k+1)}}.
\end{align*}

CLAIM: The inequality

\[ (n - k + 1)(k + 1) \le \left(\frac{n}{2} + 1\right)^2 \]

holds for $0 \le k \le n$.

Pf. of Claim. The inequality can be verified by simply multiplying it all out and checking to see whether we are led to a true inequality:

\begin{align*}
(n - k + 1)(k + 1) &\overset{?}{\le} \left(\frac{n}{2} + 1\right)^2 \\
nk + n - k^2 - k + k + 1 &\overset{?}{\le} \frac{n^2}{4} + n + 1 \\
nk - k^2 &\overset{?}{\le} \frac{n^2}{4} \\
0 &\overset{?}{\le} \frac{n^2}{4} - nk + k^2 \\
0 &\le \left(\frac{n}{2} - k\right)^2. \quad \text{(TRUE)}
\end{align*}

This last inequality is clearly true, and hence so is our desired inequality (i.e., working backward from the last inequality yields the desired inequality). ∎

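Returning to the formula for the terms $c_n$, the claim shows that

\begin{align*}
|c_n| &= \sum_{k=0}^{n} \frac{1}{\sqrt{(k+1)(n-k+1)}} \\
&\ge \sum_{k=0}^{n} \frac{1}{\sqrt{\left(\frac{n}{2} + 1\right)^2}} \\
&= \sum_{k=0}^{n} \frac{1}{\frac{n}{2} + 1} \\
&= \sum_{k=0}^{n} \frac{2}{n+2} \\
&= (n+1)\,\frac{2}{n+2} \\
&= \frac{2n+2}{n+2}.
\end{align*}

From this it is clear that $|c_n| \ge \frac{2n+2}{n+2} \ge 1$ for all $n$, so $c_n$ does not tend to $0$ and therefore the series $\sum_{n=0}^{\infty} c_n$ does not converge (by the so-called Divergence Test). In other words, attempting to compute

\[ \left(\sum_{j=0}^{\infty} \frac{(-1)^j}{\sqrt{j+1}}\right)\left(\sum_{k=0}^{\infty} \frac{(-1)^k}{\sqrt{k+1}}\right) \]

by multiplying term-by-term leads to a divergent series.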
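The terms of this Cauchy product are easy to compute directly, which makes the failure of convergence visible: the values $|c_n|$ stay at or above $\frac{2n+2}{n+2} \ge 1$, as the claim predicts. The sketch below is an illustration in floating-point arithmetic; the function name `cauchy_term` is an assumption.

```python
import math

def cauchy_term(n):
    """Return c_n = (-1)^n * sum_{k=0}^{n} 1/sqrt((k+1)(n-k+1))."""
    magnitude = sum(1.0 / math.sqrt((k + 1) * (n - k + 1)) for k in range(n + 1))
    return (-1) ** n * magnitude

for n in (0, 1, 2, 10, 100, 1000):
    lower_bound = (2 * n + 2) / (n + 2)
    # |c_n| should never fall below the bound (2n+2)/(n+2), which is at least 1.
    print(n, cauchy_term(n), lower_bound)
```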

24.3. The Euler Product Formula

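If $p$ is a fixed prime number and $z > 1$, then the series

\[ \sum_{n=0}^{\infty} \frac{1}{p^{zn}} = \sum_{n=0}^{\infty} \left(\frac{1}{p^z}\right)^n = \frac{1}{1 - \frac{1}{p^z}} \]

converges absolutely since $|1/p^z| < 1$. Since Mertens' theorem says that we may multiply absolutely convergent series term-by-term, it follows (using $p = 2, 3$ in the above) that

\begin{align*}
\frac{1}{1 - \frac{1}{2^z}} \cdot \frac{1}{1 - \frac{1}{3^z}} &= \left(1 + \frac{1}{2^z} + \frac{1}{2^{2z}} + \cdots\right)\left(1 + \frac{1}{3^z} + \frac{1}{3^{2z}} + \cdots\right) \tag{24.1} \\
&= \left(1 + \frac{1}{2^z} + \frac{1}{4^z} + \cdots\right)\left(1 + \frac{1}{3^z} + \frac{1}{9^z} + \cdots\right) \\
&= 1 + \frac{1}{2^z} + \frac{1}{3^z} + \frac{1}{4^z} + \frac{1}{6^z} + \frac{1}{8^z} + \frac{1}{9^z} + \frac{1}{12^z} + \cdots
\end{align*}

where the last sum includes terms corresponding exactly to those numbers whose prime factorizations only use 2 and 3. To see why, consider that we must multiply each term $1/2^{jz}$ with each term $1/3^{kz}$ when expanding out the multiplication on the right hand side of (24.1). Similarly, one can show that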

\begin{align*}
\frac{1}{1 - \frac{1}{2^z}} \cdot \frac{1}{1 - \frac{1}{3^z}} \cdot \frac{1}{1 - \frac{1}{5^z}} = 1 &+ \frac{1}{2^z} + \frac{1}{3^z} + \frac{1}{4^z} + \frac{1}{5^z} + \frac{1}{6^z} \\
&+ \frac{1}{8^z} + \frac{1}{9^z} + \frac{1}{10^z} + \frac{1}{12^z} + \frac{1}{15^z} + \cdots \tag{24.2}
\end{align*}

where this time the sum includes terms corresponding to all numbers whose factorizations use only the primes 2, 3, and 5. The great mathematician Leonhard Euler (1707-1783) recognized that this process can be repeated indefinitely, producing what is now known as the Euler product formula. We describe this amazing formula below.

The Riemann $\zeta$-function is the function

\[ \zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z}, \tag{24.3} \]

defined (at the moment) for a real variable $z > 1$. The Euler Product Formula relates the series (24.3) for the $\zeta$-function to an infinite product indexed by the prime numbers:

\[ \sum_{n=1}^{\infty} \frac{1}{n^z} = \prod_{p \in P} \left(1 - \frac{1}{p^z}\right)^{-1}. \tag{24.4} \]

Here $P = \{2, 3, 5, 7, \ldots\}$ denotes the set of all prime numbers. We will not go into the details of the proof here, although we will mention that it involves the Fundamental Theorem of Arithmetic, which tells us that each term $1/n^z$ with $n > 1$ can be written in the form

\[ \frac{1}{n^z} = \frac{1}{(p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r})^z} = \left(\frac{1}{p_1^{a_1}}\right)^z \left(\frac{1}{p_2^{a_2}}\right)^z \cdots \left(\frac{1}{p_r^{a_r}}\right)^z \]

in exactly one way. From (24.4), we can see exactly why the $\zeta$-function is important to number theorists. It connects analysis (i.e., infinite series and, later, functions of a complex variable) to the prime numbers.
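The Euler Product Formula (24.4) can be observed numerically at $z = 2$ by comparing a partial sum of the series with a product over the primes up to some bound. This is a minimal sketch; the function names and the cutoffs ($10^5$ terms, primes up to $10^4$) are assumptions chosen for illustration, and both quantities are truncations rather than exact values.

```python
import math

def primes_up_to(limit):
    """Return all primes <= limit (limit >= 2) using the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"                       # 0 and 1 are not prime
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            # Cross out the multiples of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if sieve[i]]

def zeta_partial(z, terms):
    """Return the partial sum 1/1^z + 1/2^z + ... + 1/terms^z."""
    return math.fsum(n ** -z for n in range(1, terms + 1))

def euler_product(z, prime_bound):
    """Return the product of (1 - p^(-z))^(-1) over primes p <= prime_bound."""
    product = 1.0
    for p in primes_up_to(prime_bound):
        product *= 1.0 / (1.0 - p ** -z)
    return product

print(zeta_partial(2, 100_000), euler_product(2, 10_000))
```

Both printed values should agree to several decimal places, and both are close to $\pi^2/6 \approx 1.6449$.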

24.4. Euler’s Refinement of Euclid’s Theorem

Recall that Euclid (over 2300 years ago) showed that the set of prime numbers is infinite. This nontrivial assertion, now known as Euclid's theorem, was proved in Book IX of Euclid's Elements. One proof of Euclid's theorem is in Lecture 1.² An 18th century proof based on the Euler Product Formula (24.4) is given below:

Theorem 24.1 (Euclid's Theorem). The number of primes is infinite.

Pf. 1 (Euler, 1737). If the set $P$ of primes were finite, then the product in the Euler Product Formula (24.4) would have only finitely many factors. Hence the product would remain bounded as $z \to 1^+$, which contradicts the fact that the series is unbounded as $z \to 1^+$ (since the harmonic series $\sum_{n=1}^{\infty} 1/n$ diverges). ∎

Using infinite series techniques, Euler proved a much sharper version of Euclid's theorem. Euler's version roughly tells us that there are enough primes to make the series of prime reciprocals

\[ \frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{11} + \frac{1}{13} + \cdots \tag{24.5} \]

diverge. Compare this with the series of reciprocals of perfect squares

\[ 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \frac{1}{25} + \cdots \tag{24.6} \]

which converges by the integral test (Euler also proved that (24.6) converges to $\pi^2/6$, an important result which we will discuss later). Although there are infinitely many primes and infinitely many perfect squares, the primes are packed close enough together to make (24.5) diverge while the perfect squares are far enough apart to make (24.6) converge.

The recent proof of Euler's theorem presented below is due to Clarkson:

Theorem 24.2 (Euler, 1737). If $p_n$ denotes the $n$th prime number, then the series

\[ \sum_{n=1}^{\infty} \frac{1}{p_n} \tag{24.7} \]

diverges. In particular, this implies that the number of primes is infinite.

² Suppose that the set $P = \{p_1, p_2, \ldots, p_n\}$ of all primes is finite. The number $N = p_1 p_2 \cdots p_n + 1$ is not divisible by any of the primes $p_j$, since division by any $p_j$ would leave a remainder of 1. Therefore the prime factors of $N$ do not belong to $P$, a contradiction.

Pf. (Clarkson, 1966). Suppose toward a contradiction that the series (24.7) converges. It follows that there exists a positive integer $k$ such that

\[ \sum_{m=k+1}^{\infty} \frac{1}{p_m} < \frac{1}{2}. \tag{24.8} \]

This is because the left hand side of (24.8) is the “tail end” of a convergent series and hence tends to 0 as $k \to \infty$ (i.e., we are letting $\epsilon = \frac{1}{2}$). Now let

\[ Q = p_1 p_2 \cdots p_k \]

and note that all of the numbers

\[ 1 + nQ, \qquad n = 1, 2, 3, \ldots \]

are not divisible by any of the primes $p_1, \ldots, p_k$. This follows from the same trick used in Euclid's original proof (see Lecture 1). Hence the prime factors³ of each number $1 + nQ$ all belong to the set $\{p_{k+1}, p_{k+2}, \ldots\}$.

For each $N \ge 1$ we have

\[ \sum_{n=1}^{N} \frac{1}{1 + nQ} \le \sum_{j=0}^{\infty} \left(\sum_{m=k+1}^{\infty} \frac{1}{p_m}\right)^j. \tag{24.9} \]

The inequality holds because the sum on the right hand side of (24.9), when expanded, includes every term on the left hand side. Now observe that (24.8) tells us that the right hand side of (24.9) is dominated by the convergent geometric series

\[ \sum_{j=0}^{\infty} \left(\frac{1}{2}\right)^j. \]

This implies that

\[ \sum_{n=1}^{\infty} \frac{1}{1 + nQ} \]

converges, since it is a series of positive terms which has bounded partial sums. The integral test, however, reveals that this is false.⁴ This contradiction shows that the original series (24.7) diverges. ∎

³ We are implicitly using the following:

Theorem 24.3 (Fundamental Theorem of Arithmetic). Every integer $n > 1$ can be expressed as a product of primes. Specifically, $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}$ where the $p_k$ are distinct primes and the $a_k$ are positive integers. The factorization of an integer $n > 1$ into primes is unique, apart from the order of the prime factors.

⁴ I.e., $\int_1^{\infty} \frac{dx}{1 + xQ}$ diverges.

There are many variants and further refinements of this theorem. For instance, a sharpened form says that

\[ \lim_{x\to\infty} \left(\sum_{p \le x} \frac{1}{p} - \log\log x\right) = B_1 \]

where $B_1 \approx 0.2614972\ldots$ is called Mertens' constant. This was first demonstrated independently by Meissel in 1866 and by Mertens in 1874. A shocking refinement of Euler's theorem is Brun's theorem (1919). This theorem states that the series of reciprocals of twin primes converges. In fact,

\[ \left(\frac{1}{3} + \frac{1}{5}\right) + \left(\frac{1}{5} + \frac{1}{7}\right) + \left(\frac{1}{11} + \frac{1}{13}\right) + \cdots \approx 1.9021606\ldots. \]

It is not known whether the constant $1.9021606\ldots$ is rational or irrational. Furthermore, it is not even known whether or not there are infinitely many twin primes.
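The extremely slow divergence of the series of prime reciprocals, and the appearance of Mertens' constant, can be seen numerically. The sketch below sums $1/p$ over the primes $p \le x$ and subtracts $\log\log x$. The function names and the cutoff $x = 10^6$ are assumptions for illustration, and the printed difference is only an approximation to $B_1$.

```python
import math

def primes_up_to(limit):
    """Return all primes <= limit (limit >= 2) using the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"                       # 0 and 1 are not prime
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if sieve[i]]

def prime_reciprocal_sum(x):
    """Return the sum of 1/p over all primes p <= x."""
    return math.fsum(1.0 / p for p in primes_up_to(x))

x = 10 ** 6
total = prime_reciprocal_sum(x)
# The sum over all primes up to one million is still below 3.
print(total, total - math.log(math.log(x)))
```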

LECTURE 25

Compactness

25.1. Compactness

Definition. Let $(M, d)$ be a metric space. A subset $S \subseteq M$ is called compact¹ if every sequence $x_n$ in $S$ has a subsequence $x_{n_k}$ which converges to a point in $S$.

Example 25.1. $\emptyset$ is a compact subset of any metric space.

Example 25.2. Any finite subset of a metric space is compact. Indeed, let $(M, d)$ be a metric space and let $S = \{a_1, a_2, \ldots, a_m\}$ be a finite subset of $M$. If $x_n$ is a sequence in $S$, then there exists $i \in \{1, 2, \ldots, m\}$ so that $x_n = a_i$ for infinitely many $n$. In other words, there exists a sequence $n_k$ of natural numbers such that the corresponding subsequence $x_{n_k}$ of $x_n$ is constant (each term is $a_i$). In particular, the subsequence $x_{n_k}$ converges to $a_i$.

Example 25.3. The set $S = \{\frac{1}{n} : n \in \mathbb{N}\}$ is not compact in $\mathbb{R}$ (with respect to the usual metric). Indeed, the sequence $x_n = \frac{1}{n}$ lies in $S$, but every subsequence of it converges to $0$. Since this limit point is not an element of $S$, $S$ is not compact. On the other hand, $S \cup \{0\}$ is a compact subset of $\mathbb{R}$.

Theorem 25.1. Every closed interval $[a, b] \subset \mathbb{R}$ is compact.²

Proof. Without loss of generality, suppose that $[a, b] = [0, 1]$ and that $x_n$ is a sequence in $[0, 1]$. Let $I_0 = [0, 1]$ and select any $x_{n_0} \in I_0$. Now observe that $x_n \in [0, \frac{1}{2}]$ or $x_n \in [\frac{1}{2}, 1]$ for infinitely many values of $n$. Let $I_1$ denote one of these subintervals which contains $x_n$ for infinitely many values of $n$ and select $x_{n_1} \in I_1$ where $n_0 < n_1$. Continuing this bisection procedure, we construct a sequence of subintervals $I_k$ such that

\[ \cdots \subset I_{k+1} \subset I_k \subset \cdots \subset I_1 \subset I_0 = [0, 1] \]

and

\[ \operatorname{Length}(I_k) = \frac{1}{2^k}, \]

and corresponding points $x_{n_k} \in I_k$ such that $n_0 < n_1 < \cdots$. The subsequence $x_{n_k}$ so constructed satisfies the condition

\[ j, k \ge N \Rightarrow |x_{n_k} - x_{n_j}| \le \frac{1}{2^N} \]

since the terms $x_{n_k}$ and $x_{n_j}$ are restricted to lie in the interval $I_N$. Therefore the subsequence $x_{n_k}$ is Cauchy. Since $[0, 1]$ is complete (it is a closed subset of the complete metric space $\mathbb{R}$), it follows that the subsequence $x_{n_k}$ converges to a limit in $[0, 1]$. ∎

¹ The term sequentially compact is sometimes used to distinguish this concept from covering compactness, which we will discuss later. It turns out that in metric spaces the two concepts are equivalent (this is a major theorem). Therefore we can safely use the term compact without worrying too much in the long run.

² By definition $[a, b]$ is of finite length $b - a$. In other words, we do not mean to include closed intervals of the form $[a, \infty)$ or $(-\infty, b]$.

Theorem 25.2. A “box” $[a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_n, b_n]$ is compact in $\mathbb{R}^n$.

Proof. We prove the theorem in $\mathbb{R}^2$. The general case (where $n > 2$) is similar, but the notation is more cumbersome. If $(x_n, y_n)$ is a sequence in the box $[a_1, b_1] \times [a_2, b_2]$, then $x_n$ is a sequence in the compact subset $[a_1, b_1]$ of $\mathbb{R}$. By the compactness of $[a_1, b_1]$ (as a subset of $\mathbb{R}$), there exists a subsequence $x_{n_k}$ of $x_n$ so that $x_{n_k}$ converges to some $x$ in $[a_1, b_1]$.

Now $y_{n_k}$ lives in the compact set $[a_2, b_2]$ and hence has a convergent subsequence $y_{n_{k_j}}$ which converges to some point $y$ in $[a_2, b_2]$. Since $x_{n_{k_j}}$ is a subsequence of $x_{n_k}$, it still converges to $x$. Therefore the subsubsequence $(x_{n_{k_j}}, y_{n_{k_j}})$ converges to the point $(x, y)$, which belongs to the box. This proves that the box $[a_1, b_1] \times [a_2, b_2]$ is compact. ∎

Theorem 25.3. Every compact subset of a metric space $(M, d)$ is closed and bounded.

Proof. Let $S$ be a compact subset of a metric space $(M, d)$. Suppose that $x_n$ is a sequence in $S$ which converges in $M$. In other words, there exists $x \in M$ so that $x_n \to x$. Since $S$ is compact, there is a subsequence $x_{n_k}$ of $x_n$ that converges to some point $y \in S$. But subsequences of a convergent sequence must converge to the original limit. Therefore $x = y \in S$ and hence $S$ is closed (since $S$ contains all of its limit points).

To see that $S$ is bounded, fix $x \in M$. Either $S$ is bounded (i.e., there exists $R > 0$ so that $S \subseteq B_R(x)$) or else for each $n \in \mathbb{N}$ there exists $x_n \in S$ so that $d(x, x_n) > n$. In the latter case, since $S$ is compact, there exists a subsequence $x_{n_k}$ of $x_n$ which converges to a point $y \in S$. Then

\[ n_k < d(x_{n_k}, x) \le d(x_{n_k}, y) + d(y, x) \]

for all $k \in \mathbb{N}$. However, since $x_{n_k} \to y$ and $d(y, x)$ is constant, the right hand side of the preceding inequality is bounded, while $n_k \to \infty$, a contradiction. Thus $S$ is bounded, as claimed. ∎

Theorem 25.4. Every closed subset of a compact metric space $(M, d)$ is compact.

Proof. Let $S$ be a closed subset of a compact metric space $(M, d)$. If $x_n$ is a sequence in $S$, then the compactness of $M$ yields a subsequence $x_{n_k}$ which converges to a point $x$ in $M$. However, $x_{n_k}$ is a sequence in the closed set $S$ and hence the limit point $x$ must belong to $S$. In particular, this means that every sequence in $S$ has a subsequence which converges in $S$. Thus $S$ is compact. ∎

Corollary 8. The arbitrary intersection of compact sets is compact. In other words, if $A_i$ is a compact subset of $(M, d)$ for all $i \in I$, then $\bigcap_{i \in I} A_i$ is also compact.

Proof. Each $A_i$ is compact and hence closed. Recall that the arbitrary intersection of closed sets is closed. It therefore follows that $\bigcap_{i \in I} A_i$ is a closed subset of a compact set (namely any of the $A_i$) and is thus compact by the preceding theorem. ∎

It is also true that the union of finitely many compact sets is compact.

25.2. Compact Sets in $\mathbb{R}^n$

Theorem 25.5 (Bolzano-Weierstrass). A bounded sequence in $\mathbb{R}^n$ has a convergent subsequence.³

³ Note that the limit point does not have to be a member of the sequence. The sequence $1, \frac{1}{2}, \frac{1}{3}, \ldots$ in $\mathbb{R}$ is bounded and has many convergent subsequences. However, the limit point $0$ does not belong to the original sequence.

Proof. A bounded sequence in $\mathbb{R}^n$ is contained in a box. Since boxes are compact, some subsequence converges to a limit contained in the box. ∎

The preceding theorem is sometimes stated without mention of sequences:

Theorem 25.6 (Bolzano-Weierstrass). A bounded, infinite subset $S$ of $\mathbb{R}^n$ has an accumulation point in $\mathbb{R}^n$.⁴

Recall that if a subset $S$ in a metric space $(M, d)$ is compact, then $S$ must be closed and bounded. In general, the converse is false (i.e., it is possible for $S$ to be closed and bounded, but not compact). However, $\mathbb{R}^n$ is particularly nice in the sense that the converse is true for subsets of $\mathbb{R}^n$:⁵

Theorem 25.7 (Heine-Borel). A subset $S$ of $\mathbb{R}^n$ is compact if and only if $S$ is closed and bounded.

Proof. We have already proved that a compact set must be closed and bounded. Indeed, this holds in any metric space $(M, d)$, not just $\mathbb{R}^n$.

On the other hand, suppose that $S$ is a closed and bounded subset of $\mathbb{R}^n$. Since $S$ is bounded, it follows that $S$ is contained in a box $B = [a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_n, b_n] \subset \mathbb{R}^n$. Since $B$ is compact, it follows that every sequence $x_n$ in $S$ has a subsequence $x_{n_k}$ which converges to a limit $x$ in $B$. However, since $S$ is closed, it follows that $x$ belongs to $S$. Therefore $S$ is compact. ∎

⁴ The limit point need not belong to $S$.

⁵ Here $\mathbb{R}^n$ is endowed with the standard Euclidean metric $d_2$. However, since a sequence in $\mathbb{R}^n$ converges with respect to one of $d_1, d_2, d_\infty$ if and only if it converges with respect to the other two metrics, it turns out that the following theorem holds regardless of which of the three metrics $d_1, d_2, d_\infty$ one uses.

LECTURE 26

The Cantor Set

26.1. The Cantor Set

The Cantor Set is an interesting and bizarre mathematical object and one of the first “fractals” to be discovered. It is an endless source of counterexamples and quite simply an interesting object to ponder. Moreover, the Cantor Set is the tip of the iceberg of a whole theory of fractals (self-similar objects) and can be used to produce examples of Peano curves (space-filling curves).

Define a sequence of subsets $C_n$ of $[0, 1]$ according to the following scheme:

\begin{align*}
C_0 &= [0, 1] \\
C_1 &= [0, \tfrac{1}{3}] \cup [\tfrac{2}{3}, 1] \\
C_2 &= [0, \tfrac{1}{9}] \cup [\tfrac{2}{9}, \tfrac{1}{3}] \cup [\tfrac{2}{3}, \tfrac{7}{9}] \cup [\tfrac{8}{9}, 1] \\
&\;\;\vdots
\end{align*}

where $C_{n+1}$ is obtained from $C_n$ by removing the open middle third of each of the closed intervals that make up $C_n$. To be more specific:

\[ C_n \text{ consists of } 2^n \text{ closed intervals of length } \frac{1}{3^n}. \]

The Cantor set is defined to be

\[ C = \bigcap_{n=0}^{\infty} C_n. \]

In other words, $C$ is the set that is “left over” after removing the open intervals

\[ (\tfrac{1}{3}, \tfrac{2}{3}),\ (\tfrac{1}{9}, \tfrac{2}{9}),\ (\tfrac{7}{9}, \tfrac{8}{9}),\ \ldots \]

from $[0, 1]$. One might at first think that $C$ is empty. Nothing could be further from the truth. In fact, it is immediately clear that $C$ is infinite since $0, 1, \frac{1}{3}, \frac{2}{3}, \frac{1}{9}, \ldots$ all belong to $C$. This is because each of these numbers belongs to every $C_n$ (i.e., these numbers are never “removed” from $[0, 1]$ during the construction of $C$). In other words, the endpoints of any of the closed subintervals that belong to any of the $C_n$ also belong to the Cantor Set $C$.

It turns out that $C$ is a highly nontrivial example of a compact set:

Theorem 26.1. $C$ is compact.

Proof. Each set $C_n$ is a finite union of closed intervals and hence closed. It follows that $C$, being an intersection of closed sets, is itself closed. Since $C$ is also bounded, it follows that $C$ is compact by the Heine-Borel theorem. □


One of the first things one notices about $C$ is that it is somewhat “sparse.” Although we have not discussed the concept of Lebesgue measure (a Math 137-138 topic), the following theorem is too curious to pass up:

Theorem 26.2. $C$ has “measure zero.” In other words, the “length” of $C$ is zero.

Proof. We will show that the complement $[0, 1] \setminus C$ of the Cantor set has length $1$. This is a much more intuitive thing to do since $[0, 1] \setminus C$ is simply a union of disjoint open intervals, each of which has a well-defined length. Therefore the length of $[0, 1] \setminus C$ is simply the sum of the lengths of the intervals that are removed in the formation of $C$. Recalling that the set $C_n$ from the $n$th stage of the construction of the Cantor set consists of $2^n$ closed intervals of length $\frac{1}{3^n}$, we compile the following table:

Stage   # of Intervals Removed   Length of Each Interval Removed   Total Removed
0       0                        0                                 0
1       1                        1/3                               1/3
2       2                        1/9                               2/9
3       4                        1/27                              4/27
...     ...                      ...                               ...
n       2^{n-1}                  1/3^n                             2^{n-1}/3^n
...     ...                      ...                               ...

The total removed is therefore

\[
\frac13 + \frac29 + \frac{4}{27} + \cdots
= \frac13 \left( 1 + \frac23 + \frac49 + \cdots \right)
= \frac13 \sum_{n=0}^{\infty} \left( \frac23 \right)^n
= \frac13 \cdot \frac{1}{1 - \frac23}
= 1.
\]

Thus $[0, 1] \setminus C$ has length $1$, which implies that $C$ has “zero length.” □
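The geometric series in the preceding proof can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original notes; the function name `removed_length` is a hypothetical helper introduced only for illustration.

```python
from fractions import Fraction


def removed_length(stages: int) -> Fraction:
    """Total length removed from [0, 1] during stages 1, ..., `stages`."""
    # Stage n removes 2^(n-1) open intervals, each of length 1/3^n.
    return sum(
        (Fraction(2 ** (n - 1), 3 ** n) for n in range(1, stages + 1)),
        Fraction(0),
    )


for stages in (1, 2, 3, 10, 50):
    total = removed_length(stages)
    # The partial sums equal 1 - (2/3)^stages, so they increase toward 1.
    print(stages, float(total), float(1 - total))
```

The amount left over after $N$ stages is exactly $(2/3)^N$, which is the total length of $C_N$ and tends to zero.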

Of course we may have expected the preceding since $C$ seems so “sparse” and “spread out.” However, in another sense $C$ is quite large:

Theorem 26.3. $C$ is uncountable. In fact, $C$ has the same cardinality as $[0, 1]$ itself.

Sketch of Pf. One notes that a number $x \in [0, 1]$ belongs to $C$ if and only if $x$ has a base-3 expansion which contains only the digits $0$ and $2$. In other words, each $x \in C$ can be represented uniquely in the form

\[ x = \frac{a_0}{3} + \frac{a_1}{3^2} + \frac{a_2}{3^3} + \cdots \]

where $a_i \in \{0, 2\}$ for each $i$. Thus $C$ has the same cardinality as the set of all infinite strings of $0$'s and $2$'s. However, this is the same cardinality as that of the set of all infinite binary strings – strings which use only the symbols $0$ and $1$. The set of all infinite binary strings corresponds to $[0, 1]$ itself, however, which is uncountable. □

The preceding theorem is quite remarkable, since it says that $C$ has the same cardinality as $[0, 1]$ (and hence $\mathbb{R}$ itself) despite the fact that the length of $C$, by any reasonable standard of measurement, is zero.
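The ternary-digit criterion can be tested mechanically for rational numbers. The sketch below is an illustration rather than anything from the original notes: the helper `in_cantor_set` is a hypothetical name, and the method (repeatedly rescaling the surviving third back onto $[0,1]$) is one standard way to read off base-3 digits. Since the rescaled values of a rational $p/q$ always have denominator dividing $q$, the loop must eventually repeat a value and stop.

```python
from fractions import Fraction


def in_cantor_set(x: Fraction) -> bool:
    """Decide whether a rational number x lies in the Cantor set."""
    x = Fraction(x)
    if not 0 <= x <= 1:
        return False
    seen = set()
    while x not in seen:
        seen.add(x)
        if Fraction(1, 3) < x < Fraction(2, 3):
            return False  # x lies in a removed open middle third
        # Rescale the closed third containing x back onto [0, 1].
        x = 3 * x if x <= Fraction(1, 3) else 3 * x - 2
    return True  # the values cycle without ever being removed


for x in (Fraction(1, 3), Fraction(1, 4), Fraction(1, 10), Fraction(1, 2), Fraction(1, 5)):
    print(x, in_cantor_set(x))
```

Note that $\frac14 = 0.020202\ldots_3$ belongs to $C$ even though it is not an endpoint of any of the intervals making up the sets $C_n$.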


Recall that the Cantor set $C$ is a peculiar compact subset of $[0, 1]$ which is uncountable, yet is of “measure $0$.” Unlike other sets that we have encountered in our cardinality discussions, the Cantor set is closed in $\mathbb{R}$. In other words, $C$ contains all of its limit points.1

In fact, it turns out that even more is true:

Theorem 26.4. Every point of $C$ is an accumulation point of $C$.

Proof. If $x \in C$ and $\epsilon > 0$ are given, then let $n \in \mathbb{N}$ be so large that $3^{-n} < \epsilon$. Since $x \in C_n$, there exists a closed interval $I$ of length $3^{-n}$ such that $x \in I \subset C_n$. If $y$ is an endpoint of $I$ which is distinct from $x$, then $y \in C$ (endpoints are never removed) and $0 < |x - y| \leq 3^{-n} < \epsilon$. This implies that $x$ is an accumulation point of $C$. □

Thus $C$ is an uncountable subset of $[0, 1]$ of measure zero which is somehow so “dense in itself” that every point of $C$ is the limit of a sequence of distinct points of $C$. Moreover, $C$ is totally disconnected, in the sense that between any two points $x, y \in C$, there exists a point $z$ in between them which is not an element of $C$. To show this, it suffices to prove that $C$ contains no intervals.

Theorem 26.5. $C$ contains no intervals.

Proof. Let $I$ be a subinterval of $[0, 1]$ of length $\delta > 0$. Let $n \in \mathbb{N}$ be so large that $3^{-n} < \delta$. Then $C_n$ consists entirely of pairwise disjoint closed intervals of length $< \delta$, which implies that $I \not\subset C_n$. In particular, this means that $I \not\subset C$. □

26.2. The Cantor Ternary Function

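Using the Cantor set, one can create a host of pathological functions. For instance, consider the Cantor ternary function (a.k.a. the Devil's Staircase) $f : [0, 1] \to [0, 1]$ defined by

\[
f(x) =
\begin{cases}
\frac12 & \text{if } x \in [\frac13, \frac23] \\
\frac14 & \text{if } x \in [\frac19, \frac29] \\
\frac34 & \text{if } x \in [\frac79, \frac89] \\
\;\vdots & \qquad \vdots
\end{cases}
\]

(see Figure 1). Clearly this is well defined if $x \notin C$. If $x$ does belong to $C$, then recall that $x \in C$ if and only if there exists a sequence $a_n$ of $0$'s and $2$'s so that

\[ x = \sum_{n=1}^{\infty} \frac{a_n}{3^n}. \]

For such $x$, we define

\[ f\left( \sum_{n=1}^{\infty} \frac{a_n}{3^n} \right) = \sum_{n=1}^{\infty} \frac{a_n}{2^{n+1}}. \]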

Theorem 26.6. The Cantor ternary function $f : [0, 1] \to [0, 1]$ is continuous and increasing and satisfies $f'(x) = 0$ for all $x \notin C$.

Sketch of Pf. Since $f$ is constant on each of the open intervals removed during the construction of $C$, we need only show that $f$ is continuous at each point of $C$ itself. Let $\epsilon > 0$ be given and let $n \in \mathbb{N}$ be so large that $1/2^n < \epsilon$. If $|x - y| < \delta = \frac{1}{3^n}$, then there are ternary expansions of $x$ and $y$ whose first $n$ symbols agree. Therefore the first $n$ symbols in the binary expansions of $f(x)$ and $f(y)$ agree, implying that

\[ |f(x) - f(y)| \leq \frac{1}{2^n} < \epsilon. \]

Since $f$ is constant on each of the open intervals that make up $[0, 1] \setminus C$, it follows that $f'(x) = 0$ for all $x \notin C$. □

1 For instance, contrast this with $\mathbb{Q} \cap [0, 1]$, which is certainly not closed.

FIGURE 1. Graph of the Cantor ternary function $f$.
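The definition of the Cantor ternary function translates directly into a digit-by-digit computation. The following sketch is an illustration and not the author's construction verbatim: `cantor_function` is a hypothetical name, and the `digits` cutoff is an assumption that truncates the binary series (exact for points whose ternary expansion terminates or hits a digit $1$, and accurate to within $2^{-\text{digits}}$ otherwise).

```python
from fractions import Fraction


def cantor_function(x, digits: int = 60) -> Fraction:
    """Approximate the Cantor ternary function at a rational x in [0, 1]."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    total, weight = Fraction(0), Fraction(1, 2)
    for _ in range(digits):
        x *= 3
        digit = int(x)  # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            # x lies in the closure of a removed interval, where f is constant.
            return total + weight
        total += weight * (digit // 2)  # ternary digit 2 becomes binary digit 1
        weight /= 2
    return total


for x in (Fraction(1, 9), Fraction(1, 3), Fraction(1, 2), Fraction(7, 9), Fraction(1, 4)):
    print(x, float(cantor_function(x)))
```

For instance, $\frac14 = 0.0202\ldots_3$ is sent to $0.0101\ldots_2 = \frac13$, up to the truncation error.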

In other words, the Cantor ternary function is flat “almost everywhere” yet is still increasing. Using such tricks, one can create even more bizarre functions:

Example 26.1. Let $f$ denote the Cantor ternary function. The function $g : [0, 1] \to \mathbb{R}$ defined by

\[ g(x) = x - 2f(x) \]

satisfies (see Figure 2) $g'(x) = 1$ “almost everywhere” (when $x \notin C$) while also satisfying

\[ g(0) = 0, \qquad g(1) = -1. \]

In particular, $g$ is increasing “almost everywhere,” yet manages to decrease overall.

There are many more strange fractals that can be created using this same circle of ideas. For instance, the Menger Sponge is depicted in Figure 3.

26.3. Cantor Set Trivia

There is a whole host of deeper theorems concerning the Cantor set (which we will not prove). For example:

Theorem 26.7. $D(C) = \{x - y : x, y \in C\} = [-1, 1]$.

In other words, the set of all differences of elements of $C$ is all of $[-1, 1]$. This is despite the fact that $C$ itself has measure zero.

Another shocking theorem is the following Cantor Surjection Theorem:

Theorem 26.8. If $(M, d)$ is a compact metric space, then there exists a continuous surjection $f : C \to M$ where $C$ is the Cantor set.2

2 Here $C$ is endowed with the usual metric $d(x, y) = |x - y|$.

FIGURE 2. Graph of $g(x) = x - 2f(x)$.

FIGURE 3. The second stage in the construction of the Menger sponge.

In other words, all compact metric spaces are continuous images of the Cantor set. This does not contradict anything we have learned about connectedness. Although the continuous image of a connected set is connected, the continuous image of a disconnected set can certainly be connected.

Another byproduct of the construction of the Cantor Set is the construction of Peano curves (space-filling curves). For example:

Theorem 26.9. There exists a continuous function $f : [0, 1] \to [0, 1] \times [0, 1]$ which is surjective.

Finally, we leave off with the shocking fact that a nontrivial metric space can be homeomorphic to its own Cartesian product:

Theorem 26.10. $C$ is homeomorphic to $C \times C$.

LECTURE 27

Compactness and Continuity

27.1. Continuity and Compactness

Theorem 27.1. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces and let $f : A \to B$ be a continuous function. If $S$ is a compact subset of $A$, then $f(S)$ is a compact subset of $B$. In other words, “the continuous image of a compact set is compact.”

Proof. Let $S$ be a compact subset of $A$ and suppose that $y_n$ is a sequence in $f(S)$. For each $n \in \mathbb{N}$, select $x_n$ in $S$ so that $f(x_n) = y_n$.1 Since $S$ is compact, it follows that some subsequence $x_{n_k}$ of $x_n$ converges to a limit $x \in S$. By the continuity of $f$, it follows that the sequence $y_{n_k} = f(x_{n_k})$ converges to the point $y = f(x)$, which clearly belongs to $f(S)$. In particular, we have shown that every sequence $y_n$ in $f(S)$ has a subsequence which converges in $f(S)$. Thus $f(S)$ is compact. □

Corollary 9. If $S$ is a closed and bounded subset of $\mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}^m$ is a continuous function, then $f(S)$ is a compact subset of $\mathbb{R}^m$.

Proof. This follows immediately from the Heine-Borel theorem (a subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded) and the fact that the continuous image of a compact set is compact. □

Example 27.1. Even for continuous functions $f : \mathbb{R} \to \mathbb{R}$, the image of a closed set need not be closed and the image of a bounded set need not be bounded. The reader is urged to come up with examples demonstrating this.

For real-valued continuous functions, the preceding theorem has the following important corollary:

Theorem 27.2 (Extreme Value Theorem). If $(A, d_A)$ is compact, then each real-valued continuous function $f : A \to \mathbb{R}$ is bounded (i.e., there exists $M > 0$ such that $|f(x)| \leq M$ for all $x \in A$). Moreover, it assumes global maximum and minimum values. A special case from Calculus I is:

A continuous function $f : [a, b] \to \mathbb{R}$ is bounded and assumes an absolute maximum and absolute minimum somewhere on $[a, b]$.

Proof. By the preceding theorem, $f(A)$ is a compact subset of $\mathbb{R}$. In particular, this means that $f(A)$ is closed and bounded. Since $f(A)$ is nonempty and bounded above (resp. below), it follows that $\sup f(A)$ (resp. $\inf f(A)$) exists and is finite. It is clear that

\[ \inf f(A) \leq f(x) \leq \sup f(A) \]

holds for all $x \in A$. We claim that $f$ attains the absolute maximum value $y = \sup f(A)$ at some point of $A$ (the proof that $f$ attains its absolute minimum value $\inf f(A)$ is similar).

1 In other words, let $x_n$ belong to $f^{-1}(\{y_n\})$.


By the Approximation Property of Suprema, there exists a sequence $y_n \in f(A)$ such that $y_n \to y$ (if such a sequence did not exist, then $y$ would not be the least upper bound of $f(A)$). Thus there exists a sequence $x_n \in A$ so that $f(x_n) \to y$. Since $A$ is compact, it follows that some subsequence $x_{n_k}$ converges to a point $x$ in $A$. In particular, the continuity of $f$ gives

\[ f(x) = \lim_{k \to \infty} f(x_{n_k}) = y = \sup f(A) \]

and thus $f$ assumes the maximum value $\sup f(A)$ at $x$. □

Example 27.2. There is a “hottest” place on Earth – if we imagine the surface of the earth to be the sphere $S = \{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 = 1\}$, we see that $S$ is compact (it is closed and bounded). Since the function $T(x, y, z)$ describing the temperature at any point of $S$ is continuous (one would assume), the preceding theorem says that there is a point on $S$ at which the temperature is an absolute maximum.

27.2. Uniform Continuity

Recall that a function $f : A \to B$ is said to be continuous on $A$ if $f$ is continuous at each point $x$ in $A$. In particular, observe that if $f$ is continuous on $A$, then the $\delta$ in the definition of continuity is allowed to depend upon $x$. In other words, given $\epsilon > 0$, the same $\delta$ is not guaranteed “to work” for each $x$ in $A$. Some $x$ will require smaller $\delta$'s than others.

There is a stronger notion of continuity that is extremely important in analysis:

Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. A function $f : A \to B$ is called uniformly continuous if

\[ (\forall \epsilon > 0)(\exists \delta > 0)(\forall x, y \in A)(\, d_A(x, y) < \delta \Rightarrow d_B(f(x), f(y)) < \epsilon \,). \]

The key difference between uniform continuity and continuity is that once $\epsilon > 0$ is fixed, the same $\delta > 0$ must work for all $x, y \in A$. Let us make this distinction explicit. A function $f : A \to B$ is continuous on $A$ if

\[ (\forall x \in A)(\forall \epsilon > 0)(\exists \delta > 0)(\forall y \in A)(\, d_A(x, y) < \delta \Rightarrow d_B(f(x), f(y)) < \epsilon \,) \]

and uniformly continuous on $A$ if

\[ (\forall \epsilon > 0)(\exists \delta > 0)(\forall x, y \in A)(\, d_A(x, y) < \delta \Rightarrow d_B(f(x), f(y)) < \epsilon \,). \]

Example 27.3. The function $f : [0, 1] \to \mathbb{R}$ defined by $f(x) = x^2$ is uniformly continuous. One can prove this directly from the $\epsilon$-$\delta$ definition, but we prefer a more clever approach. Consider the following:

\[
\begin{aligned}
|f(x) - f(y)| &= |x^2 - y^2| \\
&= |x + y|\,|x - y| \\
&\leq 2|x - y|.
\end{aligned}
\]

It follows that if $\epsilon > 0$ is given, then we may take $\delta = \epsilon/2$ for all $x, y \in [0, 1]$. Indeed, $|x - y| < \delta$ immediately implies, by the preceding inequalities, that $|f(x) - f(y)| < \epsilon$.

The following example indicates that the domain of the function plays a significant role in whether a continuous function is uniformly continuous:

Example 27.4. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$ is continuous, but not uniformly continuous. What does it mean to not be uniformly continuous? Since the definition of uniform continuity is somewhat complicated, let us explicitly negate the definition. In other words, let us evaluate the following logical expression:

\[ \sim(\forall \epsilon > 0)(\exists \delta > 0)(\forall x, y)(\, |x - y| < \delta \Rightarrow |f(x) - f(y)| < \epsilon \,). \]

It follows that $f$ is not uniformly continuous if and only if the following holds2:

\[ (\exists \epsilon > 0)(\forall \delta > 0)(\exists x, y)(\, |x - y| < \delta \ \wedge\ |f(x) - f(y)| \geq \epsilon \,). \]

Thus we must prove that the preceding statement is satisfied by our given function. Indeed, if $\epsilon = 1$ and $\delta > 0$ is given, we wish to find $x, y$ such that $|x - y| < \delta$ and $|x^2 - y^2| \geq 1$. We claim that if $x$ is sufficiently large and if $y = x + \frac{\delta}{2}$, then both conditions will hold. We therefore wish to find $x \geq 0$ such that the inequality

\[
\begin{aligned}
1 &\leq |f(x + \tfrac{\delta}{2}) - f(x)| \\
&= |(x + \tfrac{\delta}{2})^2 - x^2| \\
&= 2x(\tfrac{\delta}{2}) + (\tfrac{\delta}{2})^2 \\
&= x\delta + \tfrac{\delta^2}{4}
\end{aligned}
\]

is satisfied. A short computation reveals that any $x \geq 0$ which satisfies

\[ \frac{1}{\delta} - \frac{\delta}{4} \leq x \]

will do. In particular, this shows that $f$ is not uniformly continuous.

2 Recall that $P \Rightarrow Q$ is equivalent to $\sim P \vee Q$. Thus $\sim(P \Rightarrow Q)$ is equivalent to $P \wedge (\sim Q)$.
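The choice of witnesses in Example 27.4 can be checked with exact rational arithmetic. This is a minimal sketch, not part of the original notes; the helper name `witness` is hypothetical, and the clamp to $x \geq 0$ is an added assumption that handles large $\delta$.

```python
from fractions import Fraction


def witness(delta: Fraction):
    """Return (x, y) with |x - y| < delta but |x^2 - y^2| >= 1."""
    # Smallest admissible x from the example: x >= 1/delta - delta/4, x >= 0.
    x = max(Fraction(0), 1 / delta - delta / 4)
    y = x + delta / 2
    return x, y


for delta in (Fraction(4), Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    x, y = witness(delta)
    assert abs(x - y) < delta
    assert abs(x ** 2 - y ** 2) >= 1
    print(f"delta = {delta}: x = {x}, y = {y}, |x^2 - y^2| = {abs(x**2 - y**2)}")
```

As $\delta$ shrinks, the required $x$ grows like $1/\delta$, which is exactly why no single $\delta$ can work on all of $\mathbb{R}$.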

LECTURE 28

Uniform Continuity

Theorem 28.1. A continuous function on a compact set is uniformly continuous.

Proof. Let $(A, d_A)$ be a compact metric space, let $(B, d_B)$ be a metric space, and let $f : A \to B$ be continuous. Suppose toward a contradiction that $f$ is not uniformly continuous. Hence there exists $\epsilon > 0$ so that no matter how small $\delta > 0$ is, there exist $x, y \in A$ so that $d_A(x, y) < \delta$ but $d_B(f(x), f(y)) \geq \epsilon$.

Letting $\delta = \frac{1}{2^n}$ for $n \in \mathbb{N}$, we may therefore find sequences $x_n, y_n$ in $A$ so that

\[ d_A(x_n, y_n) < \frac{1}{2^n} \quad \text{and} \quad d_B(f(x_n), f(y_n)) \geq \epsilon. \]

Since $(A, d_A)$ is compact, it follows that some subsequence $x_{n_k}$ of $x_n$ converges to a point $x$ in $A$. However, since the right hand side of

\[ d_A(x, y_{n_k}) \leq d_A(x, x_{n_k}) + d_A(x_{n_k}, y_{n_k}) \]

tends to zero, it follows that $y_{n_k}$ also converges to $x$:

\[ \lim_{k \to \infty} x_{n_k} = \lim_{k \to \infty} y_{n_k} = x. \]

The sequential characterization of continuity tells us that

\[ \lim_{k \to \infty} f(x_{n_k}) = \lim_{k \to \infty} f(y_{n_k}) = f(x) \]

and thus the right hand side of

\[ d_B(f(x_{n_k}), f(y_{n_k})) \leq d_B(f(x_{n_k}), f(x)) + d_B(f(x), f(y_{n_k})) \]

tends to zero. In particular, it follows that $d_B(f(x_{n_k}), f(y_{n_k})) < \epsilon$ holds for sufficiently large $k$. However, this contradicts the fact that $d_B(f(x_{n_k}), f(y_{n_k})) \geq \epsilon$. □

Example 28.1. Let $A = C \cup [2, 3]$ where $C$ is the Cantor set. Since $A$ is closed and bounded in $\mathbb{R}$, it follows that $A$ is compact. If we define $f : A \to \mathbb{R}$ by

\[ f(x) = \frac{e^{-x^2} \sin\sqrt{x + 47} + \cos(\sin(\cos(\sqrt[3]{x + 47})))}{\sqrt[47]{e^x + 47}}, \]

then the preceding theorem asserts that $f$ is uniformly continuous on $A$. Clearly this is not something you would want to ever verify by direct computation.

28.1. Nested Compact Sets

Theorem 28.2. Let $(M, d)$ be a metric space. If $A_n$ is a sequence of nonempty, compact subsets of $M$ such that

\[ A_0 \supseteq A_1 \supseteq A_2 \supseteq \cdots \]

then $A = \bigcap_{n=0}^{\infty} A_n$ is compact and nonempty.

Proof. Recall that any closed subset of a compact set is compact. Since the arbitrary intersection of closed sets is closed, it follows that $A$ is a closed subset of $A_0$, whence $A$ is compact. We must now show that $A$ is nonempty.

For each $n \in \mathbb{N}$, select a point $x_n \in A_n$. Since $A_0$ is compact and since $x_n \in A_0$ for all $n \in \mathbb{N}$, it follows that some subsequence $x_{n_k}$ of $x_n$ converges to a limit $x \in A_0$. We wish to show that the limit point $x$ of this subsequence also belongs to each $A_n$.

To do this, note that for each $n \in \mathbb{N}$ the “tail” sequence

\[ x_n, x_{n+1}, x_{n+2}, \ldots \]

is a sequence in $A_n$ (a closed set) which has a subsequence which converges to $x$. Hence it follows that the limit point $x$ belongs to $A_n$ for each $n \in \mathbb{N}$. Therefore $x \in \bigcap_{n=0}^{\infty} A_n = A$ and $A \neq \emptyset$. □

Definition. Let $(M, d)$ be a metric space. The diameter $\operatorname{diam}(S)$ of a subset $S \subseteq M$ is defined by

\[ \operatorname{diam}(S) = \sup\{d(x, y) : x, y \in S\}. \]

In other words, the diameter of a set is the supremum of the distances between points of $S$.

Theorem 28.3. Let $(M, d)$ be a metric space. If $A_n$ is a sequence of nonempty, compact subsets of $M$ such that $A_0 \supseteq A_1 \supseteq A_2 \supseteq \cdots$ and

\[ \lim_{n \to \infty} \operatorname{diam}(A_n) = 0, \]

then $A = \bigcap_{n=0}^{\infty} A_n$ consists of a single point.

Proof. By the preceding theorem, $A \neq \emptyset$. On the other hand, $A \subseteq A_n$ for each $n \in \mathbb{N}$ and thus for any $x, y \in A$ it follows that

\[ 0 \leq d(x, y) \leq \operatorname{diam}(A_n) \xrightarrow{\,n \to \infty\,} 0. \]

In other words, $d(x, y) = 0$ for every $x, y \in A$, whence $x = y$ for all $x, y \in A$. Thus $A$ consists of exactly one point. □

LECTURE 29

Contraction Mapping Principle

29.1. The Contraction Mapping Principle

To avoid the unnecessary proliferation of parentheses, we will sometimes denote $T(x)$ by $Tx$ (as is done in linear algebra).

Definition. Let $(M, d)$ be a metric space. A function $T : M \to M$ is a

(i) contraction if $d(Tx, Ty) \leq d(x, y)$ for all $x, y \in M$,

(ii) strict contraction if $d(Tx, Ty) < d(x, y)$ for all $x, y \in M$ with $x \neq y$,

(iii) uniformly strict contraction if there exists $\alpha \in [0, 1)$ so that

\[ d(Tx, Ty) \leq \alpha\, d(x, y) \]

for all $x, y \in M$.

Lemma 8. If $T$ is a contraction, then $T$ is continuous.

Proof. Suppose that $T : M \to M$ is a contraction. Let $\epsilon > 0$ and let $\delta = \epsilon$. If $d(x, y) < \delta$, then $d(Tx, Ty) \leq d(x, y) < \delta = \epsilon$. Thus $T$ is continuous (in fact, uniformly continuous). □

Theorem 29.1 (Contraction Mapping Principle). If $T : M \to M$ is a uniformly strict contraction on a complete metric space $(M, d)$, then $T$ has a unique fixed point $x \in M$ (i.e., $Tx = x$). Furthermore, for any $x_0 \in M$ the iterates of $x_0$ under $T$ converge to $x$. In other words,

\[ \lim_{n \to \infty} T^n x_0 = x \]

for all $x_0 \in M$, where $T^n = \underbrace{T \circ T \circ \cdots \circ T}_{n}$.

Proof. We need to show that a fixed point $x$ of $T$ exists and that it is unique.

EXISTENCE: Since $T : M \to M$ is a uniformly strict contraction, there exists a constant $\alpha \in [0, 1)$ so that $d(Tx, Ty) \leq \alpha\, d(x, y)$ for all $x, y \in M$. Fix any $x_0 \in M$ and set $x_n = T^n x_0$ for $n \geq 1$. It follows that

\[
\begin{aligned}
d(x_{n+1}, x_n) &= d(T^{n+1} x_0, T^n x_0) \\
&= d(T(T^n x_0), T(T^{n-1} x_0)) \\
&= d(T x_n, T x_{n-1}) \\
&\leq \alpha\, d(x_n, x_{n-1}).
\end{aligned}
\]

By induction, $d(x_{n+1}, x_n) \leq \alpha^n d(x_1, x_0)$. It follows from a previous HW assignment that the sequence $x_n = T^n x_0$ is Cauchy and hence, since $M$ is complete, converges to some limit $x$:

\[ \lim_{n \to \infty} T^n x_0 = x. \]

Since $T$ is a contraction, it is continuous and hence

\[
\begin{aligned}
Tx &= T\left( \lim_{n \to \infty} x_n \right) \\
&= \lim_{n \to \infty} T x_n \\
&= \lim_{n \to \infty} x_{n+1} \\
&= x.
\end{aligned}
\]

Thus $Tx = x$ and $x$ is a fixed point of $T$. However, it appears that $x$ might depend upon the initial point $x_0$. We must show that this is not the case – in other words, we must show that $T$ can have at most one fixed point.

UNIQUENESS: If $y$ is a fixed point of $T$, then

\[ 0 \leq d(x, y) = d(Tx, Ty) \leq \alpha\, d(x, y), \]

which, since $\alpha < 1$, is possible if and only if $d(x, y) = 0$ (i.e., when $x = y$). □
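The existence part of the proof is constructive: iterating $T$ from any starting point converges to the fixed point. The sketch below illustrates this under assumptions not found in the original notes: the map $T(x) = \cos x$ on $M = [0, 1]$ is a hypothetical example (it maps $[0,1]$ into itself and satisfies $|T'(x)| \leq \sin 1 < 1$ there, so it is a uniformly strict contraction by the Mean Value Theorem), and `iterate_to_fixed_point` is an illustrative helper name.

```python
import math


def iterate_to_fixed_point(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = T(x_n) until successive iterates differ by less than tol."""
    x = x0
    for n in range(1, max_iter + 1):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


# Different starting points in [0, 1] lead to the same fixed point of cos.
for x0 in (0.0, 0.5, 1.0):
    fixed_point, steps = iterate_to_fixed_point(math.cos, x0)
    print(f"x0 = {x0}: fixed point ~ {fixed_point:.12f} after {steps} steps")
```

The number of steps needed is governed by the contraction constant $\alpha$, since $d(x_{n+1}, x_n) \leq \alpha^n d(x_1, x_0)$.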

The following example demonstrates the power of the Contraction Mapping Principle and of the entire metric space machinery that we have built up. In particular, the Contraction Mapping Principle can provide proofs that certain complicated differential and integral equations have unique solutions. Moreover, it also provides an algorithm by which these solutions can be computed.

Example 29.1. If $g : [0, 1] \to \mathbb{R}$ is continuous, then there exists a continuous real-valued function $f : [0, 1] \to \mathbb{R}$ such that

\[ f(x) - \int_0^x f(x - t)\, e^{-t^2}\, dt = g(x). \tag{29.1} \]

Define the function $T : C[0, 1] \to C[0, 1]$ by

\[ (Tf)(x) = g(x) + \int_0^x f(x - t)\, e^{-t^2}\, dt. \]

Clearly if $Tf = f$, then $f$ is a solution to (29.1). Moreover, any solution $f$ to (29.1) also satisfies $Tf = f$. In other words, the solutions of (29.1) (if any exist) correspond to the fixed points of the function $T$.

Recall that $C[0, 1]$ is complete with respect to $d_\infty$ (we mentioned this earlier and will prove it later in the course). Using some of the basic properties of integrals from Calculus II, we find that

\[
\begin{aligned}
d_\infty(Tf_1, Tf_2) &= \|Tf_1 - Tf_2\|_\infty \\
&= \sup_{0 \leq x \leq 1} \left| \left( g(x) + \int_0^x f_1(x - t)\, e^{-t^2}\, dt \right) - \left( g(x) + \int_0^x f_2(x - t)\, e^{-t^2}\, dt \right) \right| \\
&= \sup_{0 \leq x \leq 1} \left| \int_0^x [f_1(x - t) - f_2(x - t)]\, e^{-t^2}\, dt \right| \\
&\leq \sup_{0 \leq x \leq 1} \int_0^x |f_1(x - t) - f_2(x - t)|\, e^{-t^2}\, dt \\
&\leq \|f_1 - f_2\|_\infty \sup_{0 \leq x \leq 1} \int_0^x e^{-t^2}\, dt \\
&= \|f_1 - f_2\|_\infty \int_0^1 e^{-t^2}\, dt \\
&= \alpha \|f_1 - f_2\|_\infty
\end{aligned}
\]

where

\[ \alpha = \int_0^1 e^{-t^2}\, dt = 0.746824\ldots < 1. \]

The fact that $0 \leq \alpha < 1$ can be seen by considering the graph of $e^{-x^2}$ for $0 \leq x \leq 1$ (see Figure 1).

FIGURE 1. Graph of $y = e^{-x^2}$.

An alternate approach might be to use the fact that the series expansion for $e^{-x^2}$ is alternating:

\[
\begin{aligned}
e^{-x^2} &= \sum_{n=0}^{\infty} \frac{(-x^2)^n}{n!} \\
&= 1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6} + \cdots \\
&\leq 1 - x^2 + \frac{x^4}{2}
\end{aligned}
\]

for $x \in [0, 1]$. Thus

\[ \int_0^1 e^{-t^2}\, dt \leq \int_0^1 \left( 1 - t^2 + \frac{t^4}{2} \right) dt = 1 - \frac13 + \frac{1}{10} = \frac{23}{30} = 0.7666\ldots < 1. \]

By the Contraction Mapping Principle, it follows that $T$ has a unique fixed point. In other words, there exists a unique continuous real-valued function $f : [0, 1] \to \mathbb{R}$ such that (29.1) holds.
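The value of the contraction constant $\alpha$ can be confirmed numerically. The following sketch is an illustration only: `simpson` is a hypothetical helper implementing the composite Simpson rule (a technique not used in the notes), and the comparison value uses the standard identity $\int_0^1 e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}\operatorname{erf}(1)$ via Python's `math.erf`.

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson rule for the integral of func over [a, b] (n even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Interior nodes alternate between weights 4 (odd i) and 2 (even i).
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


alpha = simpson(lambda t: math.exp(-t * t), 0.0, 1.0)
closed_form = math.sqrt(math.pi) / 2 * math.erf(1.0)
series_bound = 23 / 30  # integral of 1 - t^2 + t^4/2 over [0, 1]

print(f"alpha (Simpson)   = {alpha:.6f}")
print(f"alpha (via erf)   = {closed_form:.6f}")
print(f"series upper bound = {series_bound:.6f}")
```

Both estimates agree with the value $0.746824\ldots$ quoted in the example and sit below the cruder bound obtained from the series.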

The preceding example shows the power of this abstract approach. It also indicates that we need to have a better understanding of integrals, derivatives, infinite series, and the $d_\infty$ metric in order to handle some of the sophisticated problems that one encounters in other branches of mathematics and in applications.

LECTURE 30

Derivatives

The following notion of convergence is commonly used in Calculus:

Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces, let $a \in A$, and let $f : A \setminus \{a\} \to B$ be a function. We say that

\[ \lim_{x \to a} f(x) = y \]

if for every $\epsilon > 0$ there exists $\delta > 0$ such that

\[ 0 < d_A(x, a) < \delta \Rightarrow d_B(f(x), y) < \epsilon. \]

The following theorem asserts that the preceding definition of limits agrees with our original definition and also agrees quite well with the $\epsilon$-$\delta$ definition of continuity:

Theorem 30.1. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces, let $a \in A$ and let $f : A \setminus \{a\} \to B$ be a function. The following are equivalent:

(i) $\lim_{x \to a} f(x) = y$,

(ii) $\lim_{n \to \infty} f(x_n) = y$ for any sequence $x_n$ in $A \setminus \{a\}$ which converges to $a$,

(iii) $f$ can be extended continuously to $a$ by setting $f(a) = y$. In other words, the function $\tilde{f} : A \to B$ defined by

\[
\tilde{f}(x) =
\begin{cases}
f(x) & x \neq a \\
y & x = a
\end{cases}
\]

is continuous on $A$.

30.1. Derivatives

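Definition. A function $f : (a, b) \to \mathbb{R}$ is said to be differentiable at $x_0 \in (a, b)$ if there exists $L \in \mathbb{R}$ such that for every $\epsilon > 0$ there exists $\delta > 0$ such that

\[ 0 < |x - x_0| < \delta \Rightarrow \left| \frac{f(x) - f(x_0)}{x - x_0} - L \right| < \epsilon. \tag{30.1} \]

We call $L$ the derivative of $f$ at $x_0$, denoted $f'(x_0)$.

The preceding condition is frequently shortened to

\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = L \]

or equivalently

\[ \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} = L. \]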


In other words, $f$ is differentiable at $x_0$ if the function

\[
g(x) =
\begin{cases}
\dfrac{f(x) - f(x_0)}{x - x_0} & x \neq x_0 \\
L & x = x_0
\end{cases}
\]

is continuous at $x_0$. One can also see that (30.1) is equivalent to saying that

\[ 0 < |x - x_0| < \delta \Rightarrow \Big| \underbrace{f(x) - f(x_0) - f'(x_0)(x - x_0)}_{E(x)} \Big| < \epsilon |x - x_0| \tag{30.2} \]

so that

\[ f(x) = f(x_0) + f'(x_0)(x - x_0) + E(x) \]

where the error term $E(x)$ has the property that for each $\epsilon > 0$, there exists $\delta > 0$ so that $0 < |x - x_0| < \delta$ implies that

\[ \left| \frac{E(x)}{x - x_0} \right| < \epsilon. \]

In other words, the error term $E(x)$ goes to zero faster than $x - x_0$ does:

\[ \lim_{x \to x_0} \frac{E(x)}{x - x_0} = 0. \]

The astute reader will observe that there is the slight problem of defining $E(x_0)$. However, one notes that (30.2) implies that $E(x)$ is uniformly continuous near $x_0$ and hence can be extended continuously to $x_0$ (see HW).

Summing things up, a differentiable function $f$ is well approximated by the linear function $f(x_0) + f'(x_0)(x - x_0)$ near $x_0$.

Theorem 30.2. If $f : (a, b) \to \mathbb{R}$ is a constant function, then $f'(x) = 0$ for all $x \in (a, b)$.

Proof. Since the difference quotient is always $0$, the theorem follows immediately from the definition of the derivative. □

30.2. Basic Theorems

Theorem 30.3. If $f$ is differentiable at $x_0$, then $f$ is continuous at $x_0$.

Proof. This follows from the fact that

\[
\begin{aligned}
\lim_{x \to x_0} |f(x) - f(x_0)| &= \lim_{x \to x_0} \left| \frac{f(x) - f(x_0)}{x - x_0} \right| |x - x_0| \\
&= \lim_{x \to x_0} \left| \frac{f(x) - f(x_0)}{x - x_0} \right| \cdot \lim_{x \to x_0} |x - x_0| \\
&= |f'(x_0)| \cdot \lim_{x \to x_0} |x - x_0| \\
&= 0. \qquad \square
\end{aligned}
\]

LECTURE 31

Mean Value Theorem

31.1. Basic Theorems

Definition. A function $f : (a, b) \to \mathbb{R}$ has a local maximum (resp. a local minimum) at $x_0 \in (a, b)$ if there exists an open interval $I$ such that $x_0 \in I \subseteq (a, b)$ and $f(x) \leq f(x_0)$ (resp. $f(x) \geq f(x_0)$) for all $x \in I$.

The following theorem is familiar from Calculus I. It justifies the traditional method of maximizing or minimizing differentiable functions by searching for zeros of the derivative.

Theorem 31.1. If $f : (a, b) \to \mathbb{R}$ has a local maximum or a local minimum at $x_0 \in (a, b)$ and $f$ is differentiable at $x_0$, then $f'(x_0) = 0$.

Proof. Without loss of generality, suppose that $f$ has a local maximum at $x_0$. Thus there exists an open interval $I \subseteq (a, b)$ such that $x_0 \in I$ and $f(x) \leq f(x_0)$ for all $x \in I$. In other words,

\[ f(x) - f(x_0) \leq 0 \]

for all $x \in I$. Now let $\delta > 0$ be so small that $(x_0 - \delta, x_0 + \delta) \subseteq I$ and then observe that

\[ x_0 < x < x_0 + \delta \Rightarrow \frac{f(x) - f(x_0)}{x - x_0} \leq 0 \]

whence $f'(x_0) \leq 0$. On the other hand, we also see that

\[ x_0 - \delta < x < x_0 \Rightarrow \frac{f(x) - f(x_0)}{x - x_0} \geq 0 \]

whence $f'(x_0) \geq 0$. Putting this all together, we find that $f'(x_0) = 0$, as desired. □

Another incredibly useful theorem from Calculus I is the following:

Theorem 31.2 (Mean Value Theorem). If $f : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then there exists $x_0 \in (a, b)$ so that

\[ f(b) - f(a) = f'(x_0)(b - a). \tag{31.1} \]

Proof. Let

\[ S = \frac{f(b) - f(a)}{b - a} \]

denote the slope of the secant of the graph of $f$ from $(a, f(a))$ to $(b, f(b))$. Next define the auxiliary function

\[ g(x) = f(x) - Sx \]

and note that $g(a) = g(b)$ since the net “rise” of both $f(x)$ and $Sx$ is the same over the interval $[a, b]$. Let us make this more precise:

\[
\begin{aligned}
g(a) &= f(a) - Sa \\
&= f(a) - a\,\frac{f(b) - f(a)}{b - a} \\
&= \frac{(b - a) f(a) - a f(b) + a f(a)}{b - a} \\
&= \frac{b f(a) - a f(b)}{b - a}, \\[4pt]
g(b) &= f(b) - Sb \\
&= f(b) - b\,\frac{f(b) - f(a)}{b - a} \\
&= \frac{(b - a) f(b) - b f(b) + b f(a)}{b - a} \\
&= \frac{b f(a) - a f(b)}{b - a}
\end{aligned}
\]

whence

\[ g(a) = g(b) = \frac{b f(a) - a f(b)}{b - a}. \]

There are two cases to investigate:

(i) If $g$ is constant on $[a, b]$, then

\[ 0 = g'(x) = f'(x) - S \]

for all $x \in (a, b)$, whence $f'(x_0) = S$ for every $x_0 \in (a, b)$. In particular, this proves (31.1).

(ii) Suppose now that $g$ is not constant. Since $g$ is continuous on $[a, b]$, it follows from the Extreme Value Theorem that $g$ assumes an absolute maximum and absolute minimum on $[a, b]$. Since $g$ is not constant and $g(a) = g(b)$, either the absolute maximum or absolute minimum value of $g$ on $[a, b]$ is attained at some $x_0 \in (a, b)$. It follows that $g$ has either a local maximum or local minimum at $x_0$, whence $g'(x_0) = 0$ by Theorem 31.1. However, this implies that $f'(x_0) = S$, whence (31.1) follows. □

An immediate corollary of the Mean Value Theorem is Rolle's Theorem:

Theorem 31.3 (Rolle's Theorem). If $f : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$, differentiable on $(a, b)$, and $f(a) = f(b)$, then there exists $x_0 \in (a, b)$ so that $f'(x_0) = 0$.

Example 31.1. If $f(x) = x^3 + px + q$ where $p > 0$, then $f$ has a unique real root. First let us observe that at least one root exists. Since $\lim_{x \to \pm\infty} f(x) = \pm\infty$, it follows that $f$ assumes both positive and negative values, whence a root must exist by the Intermediate Value Theorem. Now suppose toward a contradiction that there exist $a < b$ such that $f(a) = f(b) = 0$. By Rolle's Theorem there would exist a $c \in (a, b)$ such that $0 = f'(c) = 3c^2 + p > 0$. This is a contradiction.

Theorem 31.4. If $f$ is differentiable on $(a, b)$ and $f'(x) \geq 0$ for all $x \in (a, b)$, then $f$ is increasing on $(a, b)$. In other words, $x \leq y$ implies that $f(x) \leq f(y)$.

Proof. Let $x < y$ (the case $x = y$ is trivial) and note that the Mean Value Theorem gives us $c \in (x, y)$ so that

\[ f(y) - f(x) = f'(c)(y - x) \geq 0. \qquad \square \]


Example 31.2. The Mean Value Theorem can be used to prove various interesting inequalities for everyday functions. For example, given distinct $a, b \in (-\pi/2, \pi/2)$ there exists $c$ strictly between $a$ and $b$ so that

\[ \tan b - \tan a = (\sec^2 c)(b - a) \]

from which it follows that

\[ |\tan b - \tan a| \geq |b - a| \]

since $\sec^2 c \geq 1$ for all $c \in (-\pi/2, \pi/2)$.

Example 31.3. The function $f(x) = e^{x^2}$ is uniformly continuous on $[0, 1]$. Although this is guaranteed from the fact that $f$ is continuous and $[0, 1]$ is compact, we can prove this directly. Given $0 \leq x < y \leq 1$, the Mean Value Theorem asserts that there exists $c \in (x, y)$ so that

\[ e^{x^2} - e^{y^2} = 2c\, e^{c^2} (x - y). \]

Since $0 \leq c \leq 1$, it follows that

\[ |e^{x^2} - e^{y^2}| \leq 2e\,|x - y|. \]

If $\epsilon > 0$ is given, it follows from the preceding inequality that

\[ |x - y| < \delta = \frac{\epsilon}{2e} \Rightarrow |f(x) - f(y)| < \epsilon. \]

Since this $\delta$ depends only upon $\epsilon$, it follows that $f(x) = e^{x^2}$ is uniformly continuous on $[0, 1]$.

The following theorem (which we state without proof) asserts that the derivative of a differentiable function satisfies the intermediate value property:

Theorem 31.5. Let $f : [a, b] \to \mathbb{R}$ be differentiable at each point of $[a, b]$. If $f'(a) < y < f'(b)$ or $f'(b) < y < f'(a)$, then there exists $x \in (a, b)$ such that $f'(x) = y$.

Example 31.4. There does not exist a function $F : \mathbb{R} \to \mathbb{R}$ such that $F'(x) = [x]$ for all $x \in \mathbb{R}$. Since the greatest integer function certainly does not have the intermediate value property, it is clear from the preceding theorem that it cannot be the derivative of another function.

LECTURE 32

Functions Behaving Badly

32.1. Functions Behaving Badly

Since we “know” most of the standard theorems from Calculus I pertaining to derivatives, we will not bother to prove them here. It is simply more important to know how to use these classic theorems (and to understand exactly what they do and do not say) rather than how to prove them.

The following examples illustrate that the derivatives of differentiable functions can have discontinuities which are not jump discontinuities:

Example 32.1. The function $f : \mathbb{R} \to \mathbb{R}$ defined by

\[
f(x) =
\begin{cases}
x^2 \sin\frac{1}{x} & x > 0 \\
0 & x \leq 0
\end{cases}
\]

(see Figure 1) is differentiable everywhere – even at $x = 0$.

FIGURE 1. Graph of $f(x) = x^2 \sin(1/x)$.

Indeed, the standard formulas from Calculus I tell us that the derivative (see Figure 2) is given by

\[ f'(x) = 2x \sin\frac{1}{x} - \cos\frac{1}{x} \]

for $x > 0$ (and by $f'(x) = 0$ for $x < 0$). Using the definition of the derivative, we also see that

\[ \lim_{x \to 0^-} \left| \frac{f(x) - f(0)}{x - 0} \right| = \lim_{x \to 0^-} 0 = 0 \]

and

\[ \lim_{x \to 0^+} \left| \frac{f(x) - f(0)}{x - 0} \right| = \lim_{x \to 0^+} \left| x \sin\tfrac{1}{x} \right| \leq \lim_{x \to 0^+} x = 0 \]

whence $f'(0) = 0$.

FIGURE 2. Graph of $f'(x) = 2x \sin(1/x) - \cos(1/x)$.

Since $f'(x)$ oscillates wildly (with amplitude approaching $1$ as $x \to 0^+$), it follows that $\lim_{x \to 0} f'(x)$ does not exist and hence $f'$ is discontinuous at $x = 0$. In particular, note that $f'$ exists everywhere and the discontinuity at $x = 0$ is not a jump discontinuity.
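The oscillation of $f'$ near $0$ is easy to observe numerically. The sketch below is illustrative only; the sample points $x_k = 1/(k\pi)$ are a choice made here (not in the notes) because the sine term vanishes there, leaving $f'(x_k) = -\cos(k\pi) = (-1)^{k+1}$.

```python
import math


def f_prime(x: float) -> float:
    """Derivative of f(x) = x^2 sin(1/x) for x > 0, f(x) = 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)


# The points x_k = 1/(k*pi) tend to 0, yet f'(x_k) alternates between +1 and -1.
for k in (1, 2, 3, 100, 101, 10_000, 10_001):
    x_k = 1 / (k * math.pi)
    print(f"k = {k:>6}  x_k = {x_k:.3e}  f'(x_k) = {f_prime(x_k):+.6f}")
```

Since $f'$ takes values near both $+1$ and $-1$ arbitrarily close to $0$, it cannot have a limit there, even though $f'(0) = 0$ exists.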

Example 32.2. An even more bizarre function can be constructed by modifying the preceding example. Consider the function (see Figure 3)

\[
g(x) =
\begin{cases}
x^{3/2} \sin\frac{1}{x} & x > 0 \\
0 & x \leq 0.
\end{cases}
\]

FIGURE 3. Graph of $g(x) = x^{3/2} \sin(1/x)$.

By reasoning similar to that of the preceding example, we see that $g'(0) = 0$. However, the standard derivative formulas from Calculus I tell us that

\[ g'(x) = \frac{3}{2} \sqrt{x}\, \sin\frac{1}{x} - \frac{1}{\sqrt{x}} \cos\frac{1}{x} \]

for $x > 0$. Thus $g$ is differentiable at $x = 0$ but $g'$ oscillates with increasing frequency and unbounded amplitude as $x$ approaches zero (see Figure 4). In particular, $g'$ exists everywhere but is discontinuous at $0$ in an extreme way.

FIGURE 4. Graph of $g'(x) = \frac{3}{2} \sqrt{x}\, \sin(1/x) - \frac{1}{\sqrt{x}} \cos(1/x)$.

Example 32.3. This example should dispel the common misconception that if $f'(x_0) > 0$, then $f$ must be increasing in some neighborhood of $x_0$. Using similar reasoning to the preceding examples, one can show that the function (see Figure 5)

\[
h(x) =
\begin{cases}
x + 2x^2 \sin\frac{1}{x} & x \neq 0 \\
0 & x = 0
\end{cases}
\]

satisfies $h'(0) = 1 > 0$ and hence one might naively assume that $h$ is increasing in some small neighborhood of $0$ (this is not guaranteed by any theorem – read your calculus text more closely). This turns out to be false.

FIGURE 5. Graph of $h(x) = x + 2x^2 \sin(1/x)$.

Indeed, the derivative of $h$ is given by

\[
h'(x) =
\begin{cases}
1 + 4x \sin\frac{1}{x} - 2\cos\frac{1}{x} & x \neq 0 \\
1 & x = 0
\end{cases}
\]

which oscillates between positive and negative values infinitely often as $x$ approaches $0$. Thus $h$ is not increasing on any open interval that contains $0$.
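The failure of monotonicity can be exhibited with explicit points. The sketch below is an illustration under choices made here rather than in the notes: at $x_k = 1/(2\pi k)$ the derivative equals $1 - 2 = -1$, so $h$ is decreasing on a short window around $x_k$, and the half-width $\eta = x_k^2/100$ is an assumed value small enough to stay inside that window.

```python
import math


def h(x: float) -> float:
    """h(x) = x + 2 x^2 sin(1/x) for x != 0, and h(0) = 0."""
    return x + 2 * x * x * math.sin(1 / x) if x != 0 else 0.0


# Near x_k = 1/(2*pi*k) we have h'(x) close to -1, so h decreases there.
for k in (1, 10, 100):
    x_k = 1 / (2 * math.pi * k)
    eta = x_k ** 2 / 100
    a, b = x_k - eta, x_k + eta
    print(f"k = {k:>3}: a = {a:.8f} < b = {b:.8f}, h(a) - h(b) = {h(a) - h(b):.3e}")
```

Each printed difference $h(a) - h(b)$ is positive even though $a < b$, and the points $x_k$ accumulate at $0$.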

Example 32.4. Another common misconception is that if a function has a local minimum or maximum at a point, then the derivative of that function must undergo a "simple" change of sign at that point. Consider the function (see Figure 6)

\[ k(x) = \begin{cases} x^4\left(2 + \sin\frac{1}{x}\right) & x \neq 0, \\ 0 & x = 0. \end{cases} \]

FIGURE 6. Graph of $k(x) = x^4(2 + \sin(1/x))$.

In particular, we note that $k$ attains its absolute minimum $k(0) = 0$ at $x = 0$. Using the definition of the derivative, we can see that

\[ k'(x) = \begin{cases} 4x^3\left(2 + \sin\frac{1}{x}\right) - x^2\cos\frac{1}{x} & x \neq 0, \\ 0 & x = 0. \end{cases} \]

In particular, $k'(0) = 0$ as expected. However, the formula for $k'$ shows that $k'(x)$ assumes both positive and negative values in any neighborhood of $0$ and hence $k(x)$ is not monotonic on any interval $(0, \delta)$ or $(-\delta, 0)$ for any $\delta > 0$.

LECTURE 33

Uniform Convergence

33.1. Pointwise Convergence

Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. A sequence of functions $f_n : A \to B$ converges pointwise to a function $f : A \to B$ if

\[ \lim_{n\to\infty} f_n(a) = f(a) \]

for each $a \in A$.

In other words, a sequence $f_n$ converges pointwise to $f$ if and only if it converges "point-by-point." Unfortunately, pointwise convergence is of limited use since it does not respect continuity. Consider the following example:

Example 33.1. The functions $f_n : [0, 1] \to \mathbb{R}$ defined by $f_n(x) = x^n$ (see Figure 1) are each continuous. Nevertheless,

\[ \lim_{n\to\infty} f_n(x) = \begin{cases} 0 & \text{if } 0 \le x < 1, \\ 1 & \text{if } x = 1, \end{cases} \]

FIGURE 1. Graphs of $x, x^2, \ldots, x^{20}$.

and thus the pointwise limit of the functions $f_n$ is discontinuous at $x = 1$. In particular, this shows that the notion of pointwise convergence is not restrictive enough to ensure that "the limit of continuous functions is continuous."
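A small numerical illustration of why this convergence is "only" pointwise (a sketch with illustrative names, not taken from the notes): for every $n$ the point $x_n = (1/2)^{1/n}$ lies in $[0, 1)$ and satisfies $f_n(x_n) = 1/2$, so the gap between $f_n$ and its pointwise limit never drops below $1/2$ somewhere on the interval.

```python
def f_n(n, x):
    """f_n(x) = x**n on [0, 1]."""
    return x ** n

def pointwise_limit(x):
    """Pointwise limit of f_n on [0, 1]: 0 for 0 <= x < 1 and 1 at x = 1."""
    return 1.0 if x == 1 else 0.0

def witness_point(n):
    """A point x_n in [0, 1) with f_n(x_n) = 1/2, namely x_n = (1/2)**(1/n)."""
    return 0.5 ** (1.0 / n)

# At any fixed x < 1 the values x**n shrink to 0 ...
print([f_n(n, 0.9) for n in (1, 10, 100, 1000)])

# ... but the gap at the moving point x_n stays at 1/2 for every n.
for n in (1, 10, 100, 1000):
    x_n = witness_point(n)
    gap = abs(f_n(n, x_n) - pointwise_limit(x_n))
    print(n, round(x_n, 6), round(gap, 6))
```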

Even more disheartening is the fact that $[0, 1]$ is compact and hence each function $f_n$ is uniformly continuous (as opposed to merely continuous) on $[0, 1]$. We can verify this directly. Consider some specific function $f_n$ in our sequence. If $\epsilon > 0$ is given, then let $\delta = \epsilon/n$ so that $|x - y| < \delta$ implies that

\begin{align*}
|f_n(x) - f_n(y)| &= |x^n - y^n| \\
&= |(x - y)(x^{n-1} + x^{n-2}y + \cdots + xy^{n-2} + y^{n-1})| \\
&\le \underbrace{(1 + 1 + \cdots + 1 + 1)}_{n}\,|x - y| \\
&= n|x - y| \\
&< \epsilon,
\end{align*}

so that each $f_n$ is uniformly continuous. Thus even the pointwise limit of uniformly continuous functions need not be continuous. Although our $\delta$'s do not depend on $x, y$, they do depend on $n$ and $\epsilon$.

33.2. Uniform Convergence

Since pointwise convergence does not preserve continuity, we need a stronger, more restrictive notion of convergence. Fortunately, we have already laid some of the groundwork for this in our discussion of normed vector spaces.

Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. A sequence of functions $f_n : A \to B$ converges uniformly to a function $f : A \to B$ if for each $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that for all $a \in A$

\[ n \ge N \implies d_B(f_n(a), f(a)) < \epsilon. \]

The main way to picture uniform convergence (at least in the special case of functions from some closed interval $[a, b]$ to $\mathbb{R}$) is via "$\epsilon$-tubes" (see Figure 2).

FIGURE 2. Graphs of a function $f$ (in black), an "$\epsilon$-tube", and a function $f_n(x)$ (in blue) satisfying $|f(x) - f_n(x)| < \epsilon$ for all $x$.

Based upon the definitions, one can clearly see that a sequence which converges uniformly also converges pointwise. Moreover, uniform convergence is another name for something we have encountered before:

Theorem 33.1. A sequence $f_n \in C([a, b])$ converges uniformly to a function $f \in C([a, b])$ if and only if $f_n \xrightarrow{d_\infty} f$. In other words, convergence with respect to the metric $d_\infty$ is equivalent to uniform convergence.

Proof. This is due to the fact that $|f_n(x) - f(x)| < \epsilon$ for all $x \in [a, b]$ if and only if $\sup\{|f_n(x) - f(x)| : x \in [a, b]\} < \epsilon$ (the supremum is attained since $|f_n - f|$ is continuous on the compact interval $[a, b]$) if and only if $d_\infty(f_n, f) < \epsilon$. ∎

An important fact about uniform convergence is that it preserves continuity:

Theorem 33.2. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. If $f_n : A \to B$ is a sequence of continuous functions which converges uniformly to $f : A \to B$, then $f$ is continuous. In other words: "the uniform limit of continuous functions is continuous."

Proof. Suppose that $f_n : A \to B$ converges uniformly to some function $f : A \to B$. We wish to show that the limit function $f$ is continuous on $A$. It therefore suffices to show that $f$ is continuous at each point $x \in A$. To this end, let $\epsilon > 0$ and let $x \in A$. Since $f_n$ converges to $f$ uniformly on $A$, there exists $N \in \mathbb{N}$ so that

\[ n \ge N \implies d_B(f_n(y), f(y)) < \frac{\epsilon}{3} \tag{33.1} \]

for any $y \in A$ (in particular, this holds for $n = N$). Since $f_N$ is continuous at $x$, there exists $\delta > 0$ so that

\[ d_A(x, y) < \delta \implies d_B(f_N(x), f_N(y)) < \frac{\epsilon}{3}. \tag{33.2} \]

Putting (33.1) and (33.2) together we find that, whenever $d_A(x, y) < \delta$,

\begin{align*}
d_B(f(x), f(y)) &\le d_B(f(x), f_N(x)) + d_B(f_N(x), f_N(y)) + d_B(f_N(y), f(y)) \\
&< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.
\end{align*}

In other words, given any $\epsilon > 0$ and any $x \in A$, we can find a corresponding $\delta > 0$ so that

\[ d_A(x, y) < \delta \implies d_B(f(x), f(y)) < \epsilon. \]

Thus the limit function $f$ is continuous on $A$. ∎

LECTURE 34

Uniform Convergence

34.1. Completeness of $C(X)$

Definition. Let $(X, d)$ be a compact metric space. $C(X)$ denotes the normed vector space of all continuous functions $f : X \to \mathbb{R}$ endowed with the norm $\|f\|_\infty = \sup_{x\in X} |f(x)|$.

By the Extreme Value Theorem, it follows that $\|f\|_\infty$ is finite for each $f \in C(X)$ (this is why we need $X$ to be compact). Being a normed vector space, $C(X)$ is automatically a metric space when equipped with the associated metric

\[ d_\infty(f, g) = \sup_{x\in X} |f(x) - g(x)|. \]

Theorem 34.1. If $(X, d)$ is a compact metric space, then $C(X)$ is a complete metric space.

Proof. Let $f_n$ be a Cauchy sequence in $C(X)$. It follows from the fact that

\[ |f_n(x) - f_m(x)| \le \sup_{t\in X} |f_n(t) - f_m(t)| = d_\infty(f_n, f_m) \]

that $f_n(x)$ is Cauchy in $\mathbb{R}$ for each $x \in X$. Thus the sequence $f_n$ converges pointwise and we may define a function $f : X \to \mathbb{R}$ by the formula

\[ f(x) = \lim_{n\to\infty} f_n(x). \]

We claim that the sequence $f_n$ converges to $f$ uniformly.

If $\epsilon > 0$ is given, then the fact that $f_n$ is Cauchy (with respect to $d_\infty$) implies that there exists $N \in \mathbb{N}$ so that

\[ n, m \ge N \implies d_\infty(f_n, f_m) < \frac{\epsilon}{2}. \]

Since $f_n$ converges to $f$ pointwise, it follows that for each $x \in X$ there exists an $m(x) \ge N$ so that

\[ |f_{m(x)}(x) - f(x)| < \frac{\epsilon}{2}. \]

Putting this all together, we see that $n \ge N$ implies that

\begin{align*}
|f_n(x) - f(x)| &\le |f_n(x) - f_{m(x)}(x)| + |f_{m(x)}(x) - f(x)| \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon
\end{align*}

for any $x \in X$. This implies that the sequence $f_n$ converges to $f$ uniformly on $X$. Since the uniform limit of continuous functions is continuous, it follows that $f$ is continuous (and hence belongs to $C(X)$). Therefore $C(X)$ is complete. ∎

Another fact (which we shall not prove) is the following:

Theorem 34.2. If $f_n \in C([a, b])$ converges to $f$ uniformly, then

\[ \lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx. \]

In other words, it is legal to interchange integration and uniform limits.

The situation for derivatives is somewhat more complicated (see Figure 1).

FIGURE 1. The uniform limit of the differentiable functions fn(x) = √(x^2 + 1/2n) is f(x) = |x|, which is not differentiable at x = 0.

LECTURE 35

Weierstrass M-test

35.1. Weierstrass M-Test

Theorem 35.1 (Weierstrass M-Test). Let $(X, d)$ be a metric space and let $f_n : X \to \mathbb{R}$ be a sequence of functions satisfying

\[ |f_n(x)| \le M_n \]

for all $x \in X$ and for all $n \in \mathbb{N}$. If $\sum_{n=0}^\infty M_n$ converges, then $\sum_{n=0}^\infty f_n$ converges uniformly (and absolutely for each $x \in X$). In particular, if each $f_n$ is continuous, then the limit function $f : X \to \mathbb{R}$ is continuous.

Proof. For each $x \in X$, the numerical series $\sum_{n=0}^\infty |f_n(x)|$ converges by comparison with $\sum_{n=0}^\infty M_n$. In particular, $\sum_{n=0}^\infty f_n(x)$ is absolutely convergent for each $x \in X$, whence we may define a function $f : X \to \mathbb{R}$ by setting $f(x) = \sum_{n=0}^\infty f_n(x)$.

Given $\epsilon > 0$, let $N \in \mathbb{N}$ be so large that

\[ n \ge N \implies \sum_{j=n+1}^\infty M_j < \epsilon. \]

This follows from the fact that "the tail end of a convergent series goes to zero." For each $x \in X$ and each $n \ge N$ we now have

\begin{align*}
\Big| f(x) - \sum_{j=0}^n f_j(x) \Big| &= \Big| \sum_{j=n+1}^\infty f_j(x) \Big| \\
&\le \sum_{j=n+1}^\infty |f_j(x)| \\
&\le \sum_{j=n+1}^\infty M_j \\
&< \epsilon.
\end{align*}

Thus the series $\sum_{n=0}^\infty f_n(x)$ converges uniformly to $f(x)$.

Now suppose that each $f_n$ is continuous. Since the uniform limit of continuous functions is continuous, it follows that the limit function $f : X \to \mathbb{R}$ is continuous. ∎

The Weierstrass M-Test is a useful way of constructing continuous functions using infinite series:

Corollary 10. If $f_n : [a, b] \to \mathbb{R}$ is a sequence of continuous functions and $\sum_{n=0}^\infty \|f_n\|_\infty$ converges, then $f = \sum_{n=0}^\infty f_n$ is well-defined and continuous on $[a, b]$.

Example 35.1. Let $f_n(x) = a_n \sin nx$ where $|a_n| \le 1/n^2$ for $n \ge 1$. It follows that $f(x) = \sum_{n=1}^\infty a_n \sin nx$ converges uniformly on $\mathbb{R}$ (why?) and is continuous there.
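The uniform tail estimate behind the M-Test can be observed numerically. The sketch below assumes the particular choice $a_n = 1/n^2$ (one admissible choice, not fixed by the notes); the helper names are illustrative. It compares the change between two partial sums with the bound $\sum_{n > N} 1/n^2 < 1/N$, which does not depend on $x$.

```python
import math

def partial_sum(x, n_terms):
    """Partial sum of sum_{n>=1} sin(n x) / n^2 (the choice a_n = 1/n^2)."""
    return sum(math.sin(n * x) / n ** 2 for n in range(1, n_terms + 1))

def tail_bound(n_terms):
    """Bound for sum_{n > N} 1/n^2 via the integral comparison
    sum_{n > N} 1/n^2 < integral from N to infinity of dt/t^2 = 1/N."""
    return 1.0 / n_terms

def max_change(n_small, n_large, xs):
    """Largest |S_{n_large}(x) - S_{n_small}(x)| over the sample points xs."""
    return max(abs(partial_sum(x, n_large) - partial_sum(x, n_small)) for x in xs)

# Sample points in [-10, 10]; the same bound works at every one of them.
xs = [k * 0.1 for k in range(-100, 101)]
print(max_change(50, 2000, xs), "<=", tail_bound(50))
```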


The Weierstrass M-Test also furnishes a way of producing somewhat bizarre continuous functions. For example, one can show that a sequence of everywhere differentiable functions can converge uniformly to a nowhere differentiable function.

Example 35.2. Start with a sawtooth $w : \mathbb{R} \to \mathbb{R}$ defined by

\[ w(x) = 1 - |2\langle x \rangle - 1| \]

where $\langle x \rangle$ denotes the fractional part of $x$ (see Figure 1). The Weierstrass nowhere differentiable function is defined by

\[ W(x) = \sum_{n=0}^\infty \left(\frac{3}{4}\right)^n w(4^n x). \]

FIGURE 1. Graphs of $w(x)$, $w(4x)$, $w(16x)$.

By the Weierstrass M-Test, one sees that the series defining $W(x)$ converges uniformly on $[0, 1]$. In particular, $W(x)$ is continuous on $[0, 1]$ (and hence uniformly continuous).

However, one can show that $W(x)$ does not have a derivative at any point of $[0, 1]$. In light of how spiky the graphs of the functions $\sum_{n=0}^{3} (\frac{3}{4})^n w(4^n x)$ and $\sum_{n=0}^{4} (\frac{3}{4})^n w(4^n x)$ are (see Figure 2), it is not surprising that the limit function $W$ is not differentiable anywhere.

FIGURE 2. Graphs of the fourth and fifth partial sums of the Weierstrass series.
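The partial sums of this series are easy to evaluate, and the M-Test bound $M_n = (3/4)^n$ gives an explicit tail estimate. The following is a minimal sketch (the function names and the grid are illustrative assumptions, not from the notes) that checks the estimate on a grid of sample points.

```python
import math

def sawtooth(x):
    """w(x) = 1 - |2<x> - 1|, where <x> is the fractional part of x."""
    frac = x - math.floor(x)
    return 1.0 - abs(2.0 * frac - 1.0)

def weierstrass_partial(x, n_max):
    """Partial sum  sum_{n=0}^{n_max} (3/4)^n w(4^n x)."""
    return sum(0.75 ** n * sawtooth(4.0 ** n * x) for n in range(n_max + 1))

def tail_bound(n_max):
    """Since 0 <= w <= 1, the tail after n_max is at most
    sum_{n > n_max} (3/4)^n = 4 * (3/4)^(n_max + 1)."""
    return 4.0 * 0.75 ** (n_max + 1)

# The change between the 8th and 20th partial sums is controlled uniformly in x.
grid = [k / 1000 for k in range(1001)]
worst = max(abs(weierstrass_partial(x, 20) - weierstrass_partial(x, 8)) for x in grid)
print(worst, "<=", tail_bound(8))
```

The bound is uniform in $x$, which is exactly what uniform convergence requires; it says nothing about derivatives, consistent with $W$ being nowhere differentiable.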


35.2. Weierstrass Approximation Theorem

A famous result which we might prove in Math 132 is the Weierstrass Approximation Theorem:

Theorem 35.2. The polynomials are dense in $C([a, b])$ with respect to the $d_\infty$ metric. That is, given $f \in C([a, b])$, there exist polynomials $p_n$ such that $p_n$ converges to $f$ uniformly (i.e., $\|p_n - f\|_\infty \to 0$).

The Weierstrass Theorem is remarkable, for it asserts that even a continuous function which is not differentiable anywhere on $[a, b]$ (like the infamous Weierstrass function from the preceding section) is the uniform limit of polynomials (which are themselves infinitely differentiable).
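One concrete way to produce such approximating polynomials on $[0, 1]$ is via Bernstein polynomials. This construction is not discussed in these notes and is brought in here only as an illustration; the test function $|x - 1/2|$ and the helper names are likewise assumptions. The sketch measures the sup-norm error on a grid for two degrees.

```python
import math

def bernstein(f, n, x):
    """Value at x in [0, 1] of the n-th Bernstein polynomial of f:
    sum_{k=0}^{n} f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)."""
    return sum(
        f(k / n) * math.comb(n, k) * x ** k * (1.0 - x) ** (n - k)
        for k in range(n + 1)
    )

def sup_error(f, n, num_points=201):
    """Largest |B_n f(x) - f(x)| over a uniform grid of [0, 1]."""
    grid = [j / (num_points - 1) for j in range(num_points)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)

def kink(x):
    """A continuous function that is not differentiable at x = 1/2."""
    return abs(x - 0.5)

for n in (25, 100, 400):
    print(n, sup_error(kink, n))
```

The error shrinks as the degree grows even though the target function has a corner, in line with the theorem.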

35.3. Cauchy’s Mean Value Theorem

The following theorem is known as Cauchy's Mean Value Theorem or the Extended Mean Value Theorem:

Theorem 35.3 (Cauchy's Mean Value Theorem). If $f(x)$ and $g(x)$ are both continuous on the closed interval $[a, b]$, differentiable on the open interval $(a, b)$, and $g'(x) \neq 0$ for all $x \in (a, b)$, then there exists some $c \in (a, b)$ such that¹

\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \tag{35.1} \]

Setting $g(x) = x$ on $[a, b]$ yields the standard version of the Mean Value Theorem.

Proof. Define an auxiliary function

\[ h(x) = f(x) - \frac{f(b) - f(a)}{g(b) - g(a)}\, g(x) \]

and observe that

\[ h(a) = h(b) = \frac{f(a)g(b) - f(b)g(a)}{g(b) - g(a)} \]

(this is a straightforward, but slightly tedious, computation). Since $h$ is continuous on $[a, b]$ and differentiable on $(a, b)$, it follows from Rolle's Theorem that there exists $c \in (a, b)$ such that $h'(c) = 0$. In other words,

\[ 0 = f'(c) - \frac{f(b) - f(a)}{g(b) - g(a)}\, g'(c). \]

The preceding equation immediately implies (35.1). ∎

¹ Observe that the denominator $g(b) - g(a)$ is nonzero. Indeed, if $g(a) = g(b)$, then by Rolle's Theorem there exists $x_0 \in (a, b)$ such that $g'(x_0) = 0$. This contradicts the hypothesis of the theorem.
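As a concrete instance of Cauchy's Mean Value Theorem, take $f(x) = x^3$ and $g(x) = x^2$ on $[1, 2]$ (an illustrative choice, not from the notes). Then $(f(b) - f(a))/(g(b) - g(a)) = 7/3$ and $f'(c)/g'(c) = 3c/2$, so $c = 14/9$. The sketch below recovers this point by bisection on the derivative of the auxiliary function $h$ from the proof; it assumes that $h'$ changes sign between the endpoints, which holds for this example but not in general.

```python
def f(x):
    return x ** 3

def g(x):
    return x ** 2

def f_prime(x):
    return 3 * x ** 2

def g_prime(x):
    return 2 * x

def cauchy_point(a, b, tol=1e-12):
    """Bisection for a root of h'(x) = f'(x) - slope * g'(x) in (a, b),
    where slope = (f(b) - f(a)) / (g(b) - g(a))."""
    slope = (f(b) - f(a)) / (g(b) - g(a))

    def h_prime(x):
        return f_prime(x) - slope * g_prime(x)

    lo, hi = a, b
    if h_prime(lo) * h_prime(hi) > 0:
        raise ValueError("h' does not change sign on [a, b]; bisection not applicable")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Keep the half-interval on which h' still changes sign.
        if h_prime(lo) * h_prime(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

c = cauchy_point(1.0, 2.0)
print(c, f_prime(c) / g_prime(c))
```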

LECTURE 36

L'Hôpital's Rule and Taylor's Theorem

36.1. L'Hôpital's Rule

An important consequence of Cauchy’s Mean Value Theorem is the following:

Theorem 36.1 (L'Hôpital's Rule). If

(i) $f, g$ are differentiable on $(a, b)$,

(ii) $\lim_{x\to a^+} f(x) = \lim_{x\to a^+} g(x) = 0$,

(iii) $g(x) \neq 0$ and $g'(x) \neq 0$ for all $x \in (a, b)$,

(iv) $\dfrac{f'(x)}{g'(x)}$ tends to a finite limit $L$ as $x \to a^+$,

then

\[ \lim_{x\to a^+} \frac{f(x)}{g(x)} = \lim_{x\to a^+} \frac{f'(x)}{g'(x)} = L. \tag{36.1} \]

Similar statements hold in the cases where $x \to \pm\infty$ and/or $\lim_{x\to a^+} f(x) = \lim_{x\to a^+} g(x) = \pm\infty$.

Proof. Let $x > a$ and observe that (ii) ensures that $f$ and $g$ extend continuously to $[a, b)$ by setting

\[ f(a) = g(a) = 0. \]

Let $x_n$ be a sequence in $(a, b)$ tending to $a$. By Cauchy's Mean Value Theorem, there exists a sequence $c_n$ such that $a < c_n < x_n$ for all $n \in \mathbb{N}$ and such that

\[ \frac{f'(c_n)}{g'(c_n)} = \frac{f(x_n) - f(a)}{g(x_n) - g(a)}. \]

Since $f(a) = g(a) = 0$, it follows that

\[ \frac{f'(c_n)}{g'(c_n)} = \frac{f(x_n)}{g(x_n)} \]

for all $n \in \mathbb{N}$. As $x_n \to a^+$, it follows from the Squeeze Theorem that $c_n \to a^+$, whence

\[ \lim_{n\to\infty} \frac{f(x_n)}{g(x_n)} = \lim_{n\to\infty} \frac{f'(c_n)}{g'(c_n)} = L. \]

By (iv), the limit $L$ is independent of the sequence $c_n$. In particular, the preceding holds for every sequence $x_n$ in $(a, b)$ tending to $a$, from which the desired result (36.1) follows. ∎

Example 36.1. Condition (iv) is essential. Consider the functions

\[ f(x) = x + \sin x, \qquad g(x) = x. \]

Clearly

\[ \lim_{x\to\infty} \frac{f'(x)}{g'(x)} = \lim_{x\to\infty} (1 + \cos x), \]

which does not exist. On the other hand, it is clear that

\[ \lim_{x\to\infty} \frac{f(x)}{g(x)} = \lim_{x\to\infty} \frac{x + \sin x}{x} = 1 + \lim_{x\to\infty} \frac{\sin x}{x} = 1 + 0 = 1. \]

An incorrect application of L'Hôpital's rule here leads to the wrong answer.
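The contrast between the two quotients in this example can be seen numerically. The sketch below (function names are illustrative) evaluates $f/g$ at a large argument and $f'/g'$ along two sequences tending to infinity on which it takes the values $2$ and $0$.

```python
import math

def ratio(x):
    """f(x)/g(x) = (x + sin x)/x."""
    return (x + math.sin(x)) / x

def derivative_ratio(x):
    """f'(x)/g'(x) = 1 + cos x."""
    return 1.0 + math.cos(x)

# f/g settles down near 1 ...
print([ratio(10.0 ** p) for p in (2, 4, 6)])

# ... while f'/g' keeps returning to 2 (at even multiples of pi)
# and to 0 (at odd multiples of pi), so it has no limit.
print([derivative_ratio(2 * math.pi * k) for k in (10, 100, 1000)])
print([derivative_ratio((2 * k + 1) * math.pi) for k in (10, 100, 1000)])
```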

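Example 36.2. Condition (iii) is also essential. Consider the functions

\begin{align*}
f(x) &= x + \cos x \sin x, \\
g(x) &= e^{\sin x}(x + \cos x \sin x),
\end{align*}

and observe that

\begin{align*}
f'(x) &= 2\cos^2 x, \\
g'(x) &= 2e^{\sin x}\cos^2 x + e^{\sin x}\cos x\,(x + \cos x \sin x) \\
&= e^{\sin x}\cos x\,(x + \cos x \sin x + 2\cos x).
\end{align*}

In particular, note that $g'(\frac{2n+1}{2}\pi) = 0$ for each $n \in \mathbb{Z}$ because of the factor of $\cos x$ in the expression for $g'(x)$. On one hand,

\begin{align*}
\lim_{x\to\infty} \frac{f'(x)}{g'(x)} &= \lim_{x\to\infty} \frac{\frac{d}{dx}(x + \cos x \sin x)}{\frac{d}{dx}\, e^{\sin x}(x + \cos x \sin x)} \\
&= \lim_{x\to\infty} \frac{2\cos^2 x}{2e^{\sin x}\cos^2 x + e^{\sin x}\cos x\,(x + \cos x \sin x)} \\
&= \lim_{x\to\infty} \frac{2\cos x}{e^{\sin x}(x + \sin x \cos x + 2\cos x)} \\
&= 0,
\end{align*}

since the last quotient is bounded in absolute value by $\frac{2e}{x - 3}$ for $x > 3$, whereas

\[ \lim_{x\to\infty} \frac{f(x)}{g(x)} = \lim_{x\to\infty} \frac{1}{e^{\sin x}} \]

does not exist. An incorrect application of L'Hôpital's rule in this instance leads to the wrong answer.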

36.2. Taylor’s Theorem

An important generalization of the Mean Value Theorem is Taylor’s Theorem:

Theorem 36.2 (Taylor's Theorem). Let $n \ge 0$ and $f : [a, b] \to \mathbb{R}$. If

(i) $f', f'', \ldots, f^{(n)}$ are continuous on $(a, b)$,

(ii) $f^{(n+1)}$ exists on $(a, b)$,

(iii) $x, x_0 \in (a, b)$,

then there exists $\xi$ strictly between $x$ and $x_0$ such that

\[ f(x) = \underbrace{f(x_0) + f'(x_0)(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n}_{P_n(x)} + \underbrace{\frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}}_{R_n(x)}. \]

Proof. Fix $x \neq x_0$ and let $r_n = r_n(x)$ be the number defined by

\[ f(x) = \underbrace{\sum_{k=0}^n \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k}_{P_n(x)} + \underbrace{\frac{r_n}{(n+1)!}(x - x_0)^{n+1}}_{R_n(x)} \]

(for this specific value of $x$; we are certainly not asserting that $f$ is a polynomial of degree $n + 1$). We wish to show that $r_n = f^{(n+1)}(\xi)$ for some $\xi$ lying strictly between $x$ and $x_0$. Define the auxiliary function

\[ F(t) = \sum_{k=0}^n \frac{f^{(k)}(t)}{k!}(x - t)^k + \frac{r_n}{(n+1)!}(x - t)^{n+1} \]

and observe that $F(x_0) = F(x) = f(x)$.

A computation based on the "telescoping series" trick shows that

\begin{align*}
F'(t) &= \frac{f^{(n+1)}(t)}{n!}(x - t)^n - \frac{r_n}{n!}(x - t)^n \\
&= \frac{(x - t)^n}{n!}\left(f^{(n+1)}(t) - r_n\right).
\end{align*}

Since $F$ is continuous on the closed interval between $x_0$ and $x$ and differentiable on the open interval between $x_0$ and $x$, it follows from Rolle's Theorem that there exists a $\xi$ lying strictly between $x_0$ and $x$ such that $F'(\xi) = 0$. In other words,

\[ 0 = F'(\xi) = \frac{(x - \xi)^n}{n!}\left(f^{(n+1)}(\xi) - r_n\right), \]

whence $r_n = f^{(n+1)}(\xi)$ (since $\xi \neq x$), as desired. ∎

The expression

\[ P_n(x) = \sum_{k=0}^n \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k \]

is called the $n$th order Taylor approximation to $f$ at $x_0$. It is a polynomial of degree $\le n$. Observe that Taylor's Theorem provides the estimate

\[ |f(x) - P_n(x)| \le \frac{|f^{(n+1)}(\xi)|}{(n+1)!}\,|x - x_0|^{n+1} \]

of the error in approximating $f(x)$ by $P_n(x)$. The expression

\[ \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1} \]

is called the $n$th order remainder term.
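The error estimate can be tested on a familiar function. The sketch below uses $f(x) = e^x$ with $x_0 = 0$ (an illustrative choice, not from the notes); since every derivative of $f$ is $e^x$, we have $|f^{(n+1)}(\xi)| \le e^{|x|}$ for $\xi$ between $0$ and $x$, which gives a computable bound on the remainder.

```python
import math

def taylor_poly_exp(x, n):
    """P_n(x) for f = exp centered at x0 = 0: sum_{k=0}^{n} x^k / k!."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def remainder_bound(x, n):
    """Upper bound e^{|x|} |x|^{n+1} / (n+1)! for |f(x) - P_n(x)|,
    using |f^{(n+1)}(xi)| <= e^{|x|} for xi between 0 and x."""
    return math.exp(abs(x)) * abs(x) ** (n + 1) / math.factorial(n + 1)

for n in (2, 4, 8):
    x = 1.5
    error = abs(math.exp(x) - taylor_poly_exp(x, n))
    print(n, error, remainder_bound(x, n))
```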

LECTURE 37

Taylor Series

Theorem 37.1 (Taylor's Inequality). If $|f^{(n+1)}(x)| \le M$ for $|x - x_0| < r$, then the remainder $R_n(x)$ of the Taylor series satisfies

\[ |R_n(x)| \le \frac{M}{(n+1)!}\,|x - x_0|^{n+1} \]

for $|x - x_0| < r$.

Proof. This follows from Taylor's Theorem and the fact that $|f^{(n+1)}(\xi)| \le M$ for all $\xi$ between $x$ and $x_0$. ∎

Definition. If $f$ is infinitely differentiable at $x_0$, then

\[ \sum_{n=0}^\infty \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n \]

is called the Taylor series for $f$ centered at $x_0$. We do not claim that the series converges nor that it converges to $f$ on some open interval containing $x_0$.

Theorem 37.2 (Taylor Expansion Theorem). Suppose that

(i) the Taylor series for $f$ centered at $x_0$ converges¹ for $|x - x_0| < r$,

(ii) $\lim_{n\to\infty} R_n(x) = 0$ for $|x - x_0| < r$,

then

\[ f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n \]

for $|x - x_0| < r$. In other words, the Taylor series for $f(x)$ centered at $x_0$ converges to the value $f(x)$.

Proof. Let $|x - x_0| < r$ and recall that $f(x) - P_n(x) = R_n(x)$, where $R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}$ for some $\xi$ strictly between $x$ and $x_0$. Taking absolute values and applying (ii) we find that $\lim_{n\to\infty} |f(x) - P_n(x)| = 0$. Since $P_n(x)$ is simply the $n$th partial sum of the Taylor series for $f(x)$ centered at $x_0$, the desired result follows. ∎

To most students of Calculus II, it is surprising that both (i) and (ii) are required for the conclusion of the theorem to hold. We will consider several bizarre examples shortly.

¹ In particular, this implies that $f$ is infinitely differentiable at $x_0$.

37.1. Smoothness Classes

Definition. Let $f : (a, b) \to \mathbb{R}$ be a function.

• If $f$ is continuous on $(a, b)$, then $f$ is $C^0$ on $(a, b)$.

• If $f$ is continuously differentiable (i.e., $f'$ exists and is continuous) on $(a, b)$, then $f$ is $C^1$ on $(a, b)$.

• If $f$ is $n$th order continuously differentiable on $(a, b)$ (i.e., $f^{(n)}$ exists and is continuous), then $f$ is $C^n$ on $(a, b)$.

• If $f$ is infinitely differentiable on $(a, b)$ (i.e., $f^{(n)}$ exists for each $n \in \mathbb{N}$), then $f$ is $C^\infty$ on $(a, b)$. $C^\infty$ functions are also called smooth functions.

This leads to the hierarchy of smoothness classes:

\[ C^0 \supset C^1 \supset C^2 \supset \cdots \supset C^\infty = \bigcap_{n\in\mathbb{N}} C^n. \]

Each inclusion is a proper inclusion since one can show that

\begin{align*}
f_0(x) &= |x| && \text{is $C^0$ but not $C^1$,} \\
f_1(x) &= x|x| && \text{is $C^1$ but not $C^2$,} \\
f_2(x) &= |x|^3 && \text{is $C^2$ but not $C^3$,} \\
&\ \,\vdots
\end{align*}

Definition. A function $f : (a, b) \to \mathbb{R}$ is analytic on $(a, b)$ if for each $x_0 \in (a, b)$ there is a power series

\[ f(x) = \sum_{n=0}^\infty a_n (x - x_0)^n \]

centered at $x_0$ which converges in some interval $|x - x_0| < r$. If $f$ is analytic on $(a, b)$, then $f$ is said to be of class $C^\omega$.

We know from Calculus II that the coefficients are given by Taylor's Formula

\[ a_n = \frac{f^{(n)}(x_0)}{n!}. \]

Indeed, this is not hard to derive assuming that term-by-term differentiation is allowable. Also recall that a power series is infinitely differentiable on the interior of its interval of convergence.² In particular, this implies that $C^\omega \subseteq C^\infty$. We will discuss analytic functions in more detail when we have discussed uniform convergence. However, it is important (and not at all obvious) that $C^\omega \subsetneq C^\infty$.

37.2. Some Smooth Functions

² This requires proof, of course. The proper setting for power series is in complex analysis and you will learn more about them there.

Example 37.1. In this example, we demonstrate the existence of an infinitely differentiable function which does not equal its own Taylor series on any nontrivial open interval around its center. In particular, condition (ii) in the Taylor Expansion Theorem cannot be ignored. To be more specific, we claim that the function $f : \mathbb{R} \to \mathbb{R}$

\[ f(x) = \begin{cases} e^{-1/x^2} & x \neq 0, \\ 0 & x = 0 \end{cases} \]

is $C^\infty$ but not $C^\omega$ (see Figure 1).

FIGURE 1. Graph of $f(x)$ near $x_0 = 0$. Despite appearances, the function is nonconstant near $x = 0$.

To be specific, we claim that $f$ is infinitely differentiable at each point of $\mathbb{R}$, and that

\[ f^{(n)}(0) = 0, \qquad n = 0, 1, 2, \ldots. \]

In other words, we are claiming that the Taylor series of $f(x)$ centered at $x_0 = 0$ is the zero function. In particular, $f$ is an example of an infinitely differentiable function which does not equal its own Taylor series on any open interval containing $0$. This shows that $C^\omega$ is a proper subset of $C^\infty$.

By the standard differentiation formulas, it follows that $f$ is infinitely differentiable at $x_0$ as long as $x_0 \neq 0$. It therefore remains to show that $f^{(n)}(0)$ exists for each $n = 0, 1, 2, \ldots$ (in particular, we will show that $f^{(n)}(0) = 0$). We do this by induction.

BASE CASE: Since $f^{(0)}(0) = f(0) = 0$ by definition, the base case is trivial.

INDUCTIVE STEP: Suppose that we have already shown that

\[ f(0) = f'(0) = \cdots = f^{(n-1)}(0) = 0. \]

For $x \neq 0$, we see that

\begin{align*}
f'(x) &= 2x^{-3} e^{-1/x^2}, \\
f''(x) &= (4x^{-6} - 6x^{-4})\, e^{-1/x^2}, \\
f'''(x) &= (8x^{-9} - 36x^{-7} + 24x^{-5})\, e^{-1/x^2}, \\
&\ \,\vdots
\end{align*}

and, more generally, we observe that $f^{(n)}(x)$ is a polynomial³ in $1/x$ times $e^{-1/x^2}$:

\[ f^{(n)}(x) = P_n\!\left(\tfrac{1}{x}\right) e^{-1/x^2}, \qquad x \neq 0. \]

³ This requires a short inductive proof itself.

Using L'Hôpital's rule we see that

\begin{align*}
f^{(n)}(0) &= \lim_{x\to 0} \frac{f^{(n-1)}(x) - f^{(n-1)}(0)}{x - 0} \\
&= \lim_{x\to 0} \frac{f^{(n-1)}(x)}{x} \\
&= \lim_{x\to 0} \tfrac{1}{x}\, P_{n-1}\!\left(\tfrac{1}{x}\right) e^{-1/x^2} \\
&= \lim_{t\to\pm\infty} t\, P_{n-1}(t)\, e^{-t^2} \\
&= 0
\end{align*}

since $e^{-t^2}$ tends to zero faster⁴ than any polynomial can "blow up." Thus $f^{(n)}(0) = 0$ for all $n = 0, 1, 2, \ldots$ as claimed. In particular, the Taylor series for $f(x)$ at $x_0 = 0$ is the zero function!
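The extreme flatness of this function at the origin is easy to observe numerically. In the sketch below (illustrative names, not from the notes), `scaled(x, k)` computes $f(x)/x^k$; the induction above relies on quantities of this kind tending to $0$ as $x \to 0$ for every fixed $k$.

```python
import math

def flat(x):
    """f(x) = exp(-1/x^2) for x != 0 and f(0) = 0."""
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

def scaled(x, k):
    """f(x) / x^k for x != 0, which tends to 0 as x -> 0 for each fixed k."""
    return flat(x) / x ** k

# Even after dividing by x^10, the values collapse toward zero.
for x in (0.5, 0.2, 0.1, 0.05):
    print(x, flat(x), scaled(x, 10))

# The Taylor series at 0 is identically zero, yet f itself is not:
print(flat(1.0))
```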

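Definition. Let $f : \mathbb{R} \to \mathbb{R}$ be a function. The support of $f$ is the set

\[ \operatorname{supp}(f) = \overline{\{x \in \mathbb{R} : f(x) \neq 0\}}. \]

In other words, $\operatorname{supp}(f)$ is the closure of the set upon which $f$ does not vanish. In particular, $\operatorname{supp}(f)$ is always closed.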

Example 37.2. In this example, we construct a $C^\infty$ function with compact support. Consider the "bump function"

\[ f(x) = \begin{cases} e^{-1/(1 - x^2)} & \text{if } |x| < 1, \\ 0 & \text{otherwise.} \end{cases} \]

An argument similar to that used in Example 37.1 shows that $f \in C^\infty(\mathbb{R})$ (see Figure 2). We say that $f$ has compact support since $\operatorname{supp}(f) = [-1, 1]$ is compact. Keep in mind that $f$ accomplishes this despite being infinitely differentiable on $\mathbb{R}$. Bump functions are incredibly useful tools in advanced topology, advanced partial differential equations, Fourier analysis, and functional analysis.

FIGURE 2. Graph of a "bump function."

⁴ This requires yet another application of L'Hôpital's Rule to verify.

Example 37.3. Let $f(x)$ be the "bump function" constructed in the preceding example and let $A = \int_{-1}^{1} f(x)\,dx$. The function

\[ g(x) = \frac{1}{A} \int_{-1}^{x} f(t)\,dt \]

is $C^\infty$ (by the Fundamental Theorem of Calculus) and satisfies

\[ g(x) \begin{cases} = 0 & \text{if } x \le -1, \\ \in (0, 1) & \text{if } -1 < x < 1, \\ = 1 & \text{if } x \ge 1. \end{cases} \]

Thus we have a $C^\infty$ "ramp function."
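The bump and ramp functions can be approximated numerically. The following is a rough sketch under stated assumptions: the integral is approximated with a midpoint rule (`integrate_bump`, with an arbitrary step count), so the values of `ramp` are approximations to $g$ rather than exact values, and all names are illustrative.

```python
import math

def bump(x):
    """The bump function: exp(-1/(1 - x^2)) for |x| < 1 and 0 otherwise."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def integrate_bump(upper, steps=20000):
    """Midpoint-rule approximation of the integral of bump over [-1, upper].
    Values of upper beyond 1 are clipped, since bump vanishes there."""
    if upper <= -1:
        return 0.0
    upper = min(upper, 1.0)
    width = (upper + 1.0) / steps
    return width * sum(bump(-1.0 + (j + 0.5) * width) for j in range(steps))

AREA = integrate_bump(1.0)  # numerical stand-in for A

def ramp(x):
    """Approximation of g(x) = (1/A) * integral of bump from -1 to x."""
    return integrate_bump(x) / AREA

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, ramp(x))
```

By the symmetry of the bump function, the ramp should pass through $1/2$ at $x = 0$, which the printed values reflect up to discretization error.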

The following theorem shows that condition (i) in the Taylor Expansion Theorem cannot be ignored:

Theorem 37.3. For each sequence $a_0, a_1, \ldots$ of real numbers, there exists a function $f \in C^\infty(\mathbb{R})$ such that

\[ \frac{f^{(n)}(0)}{n!} = a_n \]

for all $n \in \mathbb{N}$. In particular, for each prospective Taylor series $\sum_{n=0}^\infty a_n x^n$, there exists a function $f$ whose Taylor coefficients are precisely $a_n$. Moreover, the choice $a_n = n^n$ yields a $C^\infty$ function whose Taylor series diverges whenever $x \neq 0$.

Sketch of Proof. Let $\varphi$ be a $C^\infty$ function supported in $[-2, 2]$ and such that $\varphi(x) = 1$ if $x \in [-1, 1]$ (it takes some work to justify the existence of such a function). In particular, observe that $\varphi^{(n)}(0) = 0$ for $n = 1, 2, 3, \ldots$ since $\varphi$ is constant on $[-1, 1]$. Now let

\[ f_n(x) = a_n x^n \varphi(\lambda_n x) \]

where $\lambda_n$ is a sequence of positive numbers to be defined later. Now observe that

\[ f_n^{(j)}(0) = \begin{cases} n!\,a_n & \text{if } j = n, \\ 0 & \text{if } j \neq n. \end{cases} \]

We need only choose $\lambda_n$ such that the series $f(x) = \sum_{n=0}^\infty f_n(x)$ converges to a $C^\infty$ function. We claim that

\[ \lambda_n = n + \sum_{j=0}^n |a_j| \]

does the job. ∎

LECTURE 38

Initial Value Problems

38.1. Existence and Uniqueness of Solutions

Consider the initial value problem

\[ y'(x) = F(x, y(x)), \qquad y(x_0) = y_0 \tag{38.1} \]

where $y$ is a function of $x$, $x_0$ and $y_0$ are real constants, and $F(x, y)$ is a continuous function of $x, y$. Many standard problems in pure and applied mathematics are of this form.

Theorem 38.1. Suppose that $F(x, y)$ and $\frac{\partial F}{\partial y}(x, y)$ are continuous on an open neighborhood of $(x_0, y_0) \in \mathbb{R}^2$. If $\epsilon > 0$ is given, then there exists a $\delta > 0$ such that there is a unique continuously differentiable function $y(x)$ on $I = [x_0 - \delta, x_0 + \delta]$ which satisfies the initial value problem (38.1) and for which $|y(x) - y_0| \le \epsilon$ for all $x \in I$.

Proof. By the hypotheses of the theorem, there exists a closed rectangle $R$ (a compact set) centered at $(x_0, y_0)$ and constants $M_0, M_1 > 0$ such that

\[ |F(x, y)| \le M_0 \quad \forall (x, y) \in R, \qquad \left| \frac{\partial F}{\partial y}(x, y) \right| \le M_1 \quad \forall (x, y) \in R. \]

Now let $(x, y_1), (x, y_2) \in R$. By the Mean Value Theorem applied to $F$ with respect to the variable $y$, there exists $c$ lying strictly between $y_1, y_2$ such that

\begin{align*}
|F(x, y_1) - F(x, y_2)| &= \left| \frac{\partial F}{\partial y}(x, c) \right| |y_1 - y_2| \\
&\le M_1 |y_1 - y_2|.
\end{align*}

In summary, we have obtained the inequalities

\begin{align*}
|F(x, y)| &\le M_0 && \forall (x, y) \in R, \\
|F(x, y_1) - F(x, y_2)| &\le M_1 |y_1 - y_2| && \forall (x, y_1), (x, y_2) \in R.
\end{align*}

By considering an even smaller rectangle, we may also presume that $R$ has width $2\delta$ where $\delta > 0$ is sufficiently small so that

\[ M_0 \delta \le \epsilon, \qquad M_1 \delta < 1. \]

Let $I = [x_0 - \delta, x_0 + \delta]$ and let $E$ denote the closed $\epsilon$-ball in $C(I)$ centered at the constant function $y_0$. In other words,

\[ E = \{ f \in C(I) : \|f - y_0\|_\infty \le \epsilon \}. \]

Since $E$ is a closed subset of the complete metric space $(C(I), d_\infty)$, it follows that $(E, d_\infty)$ is itself a complete metric space.

Now consider the function $T : E \to C(I)$ defined by

\[ [T\varphi](x) = y_0 + \int_{x_0}^{x} F(t, \varphi(t))\,dt, \qquad \varphi \in E. \]

We claim that $T(E) \subseteq E$. In other words, $T$ maps $E$ into itself and can be regarded as a function $T : E \to E$. Indeed, let $\varphi \in E$ (i.e., $\|\varphi - y_0\|_\infty \le \epsilon$). For each $x \in I$ it follows that

\begin{align*}
|[T\varphi](x) - y_0| &= \left| \int_{x_0}^{x} F(t, \varphi(t))\,dt \right| \\
&\le M_0 |x - x_0| \\
&\le M_0 \delta \\
&\le \epsilon.
\end{align*}

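Thus $\|T\varphi - y_0\|_\infty \le \epsilon$ and $T\varphi \in E$.

Having established that $T$ maps $E$ into $E$, we now show that $T : E \to E$ is a strict uniform contraction. In fact, we will show that the contraction constant is

\[ \alpha = M_1 \delta < 1. \]

Let $\varphi_1, \varphi_2 \in E$ and note that

\begin{align*}
\|T\varphi_1 - T\varphi_2\|_\infty &= \sup_{x\in I} \left| \int_{x_0}^{x} F(t, \varphi_1(t))\,dt - \int_{x_0}^{x} F(t, \varphi_2(t))\,dt \right| \\
&\le \sup_{x\in I} \left| \int_{x_0}^{x} |F(t, \varphi_1(t)) - F(t, \varphi_2(t))|\,dt \right| \\
&\le \sup_{x\in I} M_1 \left| \int_{x_0}^{x} |\varphi_1(t) - \varphi_2(t)|\,dt \right| \\
&\le M_1 \|\varphi_1 - \varphi_2\|_\infty \sup_{x\in I} \left| \int_{x_0}^{x} dt \right| \\
&= M_1 \delta \|\varphi_1 - \varphi_2\|_\infty \\
&= \alpha \|\varphi_1 - \varphi_2\|_\infty.
\end{align*}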

Since $\alpha = M_1 \delta < 1$, it follows that $T$ is a strict uniform contraction.

Since $T : E \to E$ is a strict uniform contraction, it follows from the Contraction Mapping Principle that $T$ has a unique fixed point $y \in E$. In other words, there exists a function $y \in E$ such that

\[ y(x) = y_0 + \int_{x_0}^{x} F(t, y(t))\,dt \tag{38.2} \]

for all $x \in I$. In particular, $y(x_0) = y_0$.

Moreover, taking the derivative of (38.2) and using the Fundamental Theorem of Calculus, it follows that $y' = F(x, y)$ for all $x \in I$. In other words, our fixed point $y \in E$ is a solution to the initial value problem (38.1). Moreover, it is the only solution in $E$. ∎

LECTURE 39

Picard Iteration

39.1. Initial Value Problems

Consider the initial value problem

\[ y'(x) = F(x, y(x)), \qquad y(x_0) = y_0 \tag{39.1} \]

where $y$ is a function of $x$, $x_0$ and $y_0$ are real constants, and $F(x, y)$ is a continuous function of $x, y$. Many standard problems in pure and applied mathematics are of this form.

Theorem 39.1. Suppose that $F(x, y)$ and $\frac{\partial F}{\partial y}(x, y)$ are continuous on an open neighborhood of $(x_0, y_0) \in \mathbb{R}^2$. If $\epsilon > 0$ is given, then there exists a $\delta > 0$ such that there is a unique continuously differentiable function $y(x)$ on $I = [x_0 - \delta, x_0 + \delta]$ which satisfies the initial value problem (39.1) and for which $|y(x) - y_0| \le \epsilon$ for all $x \in I$.

Let us recall a few things about the method of proof of the Existence and Uniqueness Theorem. First, we obtained the $\delta > 0$ at the beginning of the proof. Recall that the size of $\delta$ (and consequently the length of the interval upon which we expect a solution to (39.1) to exist) was determined by the behavior of $F$ and $\frac{\partial F}{\partial y}(x, y)$. Once $\delta > 0$ was determined, we defined $I = [x_0 - \delta, x_0 + \delta]$ and defined $E$ to be the closed $\epsilon$-ball in $C(I)$ centered at the constant function $y_0$.

Next, we noticed that $y(x)$ is a solution to the initial value problem (39.1) if and only if

\begin{align*}
y(x) - y(x_0) &= \int_{x_0}^{x} y'(t)\,dt \\
&= \int_{x_0}^{x} F(t, y(t))\,dt.
\end{align*}

Since the preceding is equivalent to

\[ y(x) = y_0 + \int_{x_0}^{x} F(t, y(t))\,dt, \]

it follows that $y \in C(I)$ is a solution to (39.1) if and only if $y$ is a fixed point of the integral operator

\[ [T\varphi](x) = y_0 + \int_{x_0}^{x} F(t, \varphi(t))\,dt. \]

39.2. Extended Example

Consider the initial value problem

\[ y' = 2x(1 + y), \qquad y(0) = 0. \]

Since the equation is both separable and linear, it is easy to solve (assuming you have taken an elementary course in differential equations):

\[ y(x) = e^{x^2} - 1. \]

In fact, it is easy to check that the above is indeed a solution to the initial value problem.

Let us see the Contraction Mapping Principle in action. Here $x_0 = y_0 = 0$ and $F(x, y) = 2x(1 + y)$. The corresponding initial value problem can be rewritten as the integral equation:

\[ \varphi(x) = \int_0^x 2t[1 + \varphi(t)]\,dt. \]

We therefore would like to find a fixed point of the integral operator

\[ [T\varphi](x) = \int_0^x 2t[1 + \varphi(t)]\,dt. \]

We let $\varphi_0(x) = y_0 = 0$ (the zero function) and repeatedly apply $T$:

\[ \varphi_n(x) = [T^n \varphi_0](x). \]

The sequence $\varphi_n$ should approach (with respect to $d_\infty$) the actual solution to our initial value problem (at least on some neighborhood of $x_0 = 0$). This method is known as Picard iteration.

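With the initial approximation $\varphi_0(x) = y_0 = 0$, it follows that

\begin{align*}
\varphi_1(x) &= [T\varphi_0](x) \\
&= \int_0^x 2t[1 + 0]\,dt \\
&= \int_0^x 2t\,dt \\
&= x^2.
\end{align*}

Similarly,

\begin{align*}
\varphi_2(x) &= [T\varphi_1](x) \\
&= \int_0^x 2t[1 + \varphi_1(t)]\,dt \\
&= \int_0^x 2t[1 + t^2]\,dt \\
&= \int_0^x (2t + 2t^3)\,dt \\
&= x^2 + \frac{x^4}{2}.
\end{align*}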

Computing again,

\begin{align*}
\varphi_3(x) &= [T\varphi_2](x) \\
&= \int_0^x 2t[1 + \varphi_2(t)]\,dt \\
&= \int_0^x 2t\left[1 + t^2 + \frac{t^4}{2}\right] dt \\
&= \int_0^x (2t + 2t^3 + t^5)\,dt \\
&= x^2 + \frac{x^4}{2} + \frac{x^6}{6}.
\end{align*}

The general pattern

\[ \varphi_n(x) = x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \cdots + \frac{x^{2n}}{n!} \]

can be established by mathematical induction. In particular, the Contraction Mapping Principle tells us that the sequence $\varphi_n$ converges uniformly on some interval containing $x_0 = 0$ to our solution.

The iterative method of solving the initial value problem is clearly leading us to the solution

\begin{align*}
e^{x^2} - 1 &= x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \cdots \\
&= \sum_{n=1}^\infty \frac{x^{2n}}{n!}.
\end{align*}

In particular, $\varphi_n$ is simply the $n$th partial sum of the Taylor series for $e^{x^2} - 1$.
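Because every iterate in this example is a polynomial, the Picard iteration can be carried out exactly on coefficient lists. The following is a minimal sketch using exact rational arithmetic; the representation (a list whose $k$th entry is the coefficient of $t^k$) and the function names are illustrative choices, not part of the notes.

```python
from fractions import Fraction

def picard_step(coeffs):
    """Apply [T phi](x) = integral from 0 to x of 2t(1 + phi(t)) dt
    to a polynomial phi, where coeffs[k] is the coefficient of t^k."""
    one_plus_phi = list(coeffs) if coeffs else [Fraction(0)]
    one_plus_phi[0] += 1                                         # 1 + phi(t)
    integrand = [Fraction(0)] + [2 * c for c in one_plus_phi]    # multiply by 2t
    # Integrate term by term: t^k becomes x^(k+1) / (k+1).
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(integrand)]

def picard_iterate(n):
    """Coefficients of phi_n, starting from phi_0 = 0."""
    phi = [Fraction(0)]
    for _ in range(n):
        phi = picard_step(phi)
    return phi

# phi_3 should be x^2 + x^4/2 + x^6/6.
for k, c in enumerate(picard_iterate(3)):
    if c != 0:
        print(f"coefficient of x^{k}: {c}")
```

The nonzero coefficients produced this way are $1/m!$ in degree $2m$, matching the partial sums of the series for $e^{x^2} - 1$.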

APPENDIX A

Basic Logic

A.1. Primitive Concepts

To do meaningful mathematics one needs to start out with various primitive concepts. There are many things that we cannot adequately define without some form of self-reference. For example, try to define the following without referring to other concepts that require further definitions:

• Idea

• Statement

• True, false

• Sets, objects

• Everything, nothing

There are a host of words that we use every day that we simply cannot define without reference to other, equally hard-to-define concepts. You might say, "a set is a collection of objects." But what is a collection? What are objects? Simply put, to convey information to someone, you must both already have a common language and several primitive concepts that both parties understand beforehand.

Another interesting example is that of numbers. What exactly is "2"? What is a whole number, exactly? Can you define it? Of course, one might just say that this is silly: we all know what numbers are, don't we? It turns out, however, that some languages only have words for "one" and "many," but no words to express the concept of "two," "three," etc.

There are certain ideas (such as sets, true, false, etc.) which mathematicians use freely, without worrying about any of the philosophical difficulties involved. On the other hand, many philosophers are not satisfied with this situation and seek further to clarify the meaning of some of these words (in analytic philosophy). In keeping with our main theme (learning about real analysis), we will not be overly picky with the philosophical details.

The terms sentence and statement will be used interchangeably in these notes to refer to an expression that is well-formed in the rules of the language in which it is written. This brings up the ideas of languages and of what exactly constitutes meaning (these are issues that are discussed in the realms of computer science and philosophy). There are expressions like "i(& * #dfs9[{" and "at the the up" that have no meaningful interpretation in the language in which they are written and we will not consider these to be sentences.

Sentences are classified according to their truth value:

Example A.1. Some sentences are true. The sentences

1 + 1 = 2

and

There are infinitely many prime numbers

are true. The fact that the second statement (known as Euclid’s theorem) is true is not obvious – it requires proof.

Example A.2. Some sentences are false. For instance

0 > 1

and

One can get an ‘A’ in Math 131 without doing the homework

are false statements.

We also adopt the convention that a statement cannot be simultaneously true and false, although a sentence can be neither true nor false.¹ A proposition is a statement which has a definite truth value (it is either true or false). For example, 1 + 1 = 3 is a proposition (which is false). Of course, there are many propositions whose exact truth value is unknown to us. For instance:

(i) There are infinitely many pairs of twin primes.²

(ii) There exists an odd perfect number.³

Nevertheless, either an odd perfect number exists or one does not. The sentence “There exists an odd perfect number” is a proposition. Unfortunately, we have not been able to determine its exact truth value at this point in time.

A.2. Negation (NOT)

There are several basic operations which allow us to create new propositions from old ones. The simplest of these operations is called negation, which simply reverses the truth value of its argument. The negation ∼P (read: not P) of a proposition P is the proposition

It is not the case that P .

When negating English sentences, one can often write things in a more elegant fashion.

Example A.3. If P is the proposition

Class meets at 9am ,

then ∼P would be

It is not the case that   class meets at 9am .
\_____________________/   \________________/
           ∼                      P

A shorter sentence which has the same meaning is

Class does not meet at 9am .

Example A.4. If P is the proposition 1 + 1 = 2, then ∼P is simply 1 + 1 ≠ 2.

Example A.5. If P is the proposition e^π < π^e, then ∼P is simply e^π ≥ π^e. Which proposition is true? This is actually a moderately interesting calculus problem. Can you determine (without a calculator) which of e^π and π^e is larger?

¹ This does not occur often in practice, but it does come up when considering meta-mathematical issues. We will consider only one such example in this course.

² Twin primes are prime numbers like (17 and 19) or (29 and 31) which differ from each other by 2.

³ A natural number n is called perfect if n is equal to the sum of its proper divisors. For instance, 6 = 1 + 2 + 3, so 6 is a perfect number. The next perfect number is 28 since 28 = 1 + 2 + 4 + 7 + 14. Can you find more?

A proposition P and its negation ∼P are related by the following truth table:

P   ∼P
T   F
F   T

Moreover, it is not hard to see that the expressions ∼∼P and P have the same truth table:

P   ∼P   ∼∼P
T   F    T
F   T    F

The importance of this observation is that the roles of P and ∼∼P can be interchanged in mathematical arguments. We say that P and ∼∼P are equivalent and write

P ⇔ ∼∼P

to symbolize this relationship.
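The two truth tables above can also be generated mechanically. The following is a minimal sketch in Python (not part of the original notes), in which the helper name negate is an illustrative assumption and Python's True/False stand in for T/F:

```python
def negate(p: bool) -> bool:
    """Negation (~): reverse the truth value of p."""
    return not p

# One row per truth value of P: (P, ~P, ~~P).
rows = [(p, negate(p), negate(negate(p))) for p in (True, False)]

for p, not_p, not_not_p in rows:
    print(p, not_p, not_not_p)

# P and ~~P agree in every row, which is what P <=> ~~P asserts.
double_negation_holds = all(p == not_not_p for p, _, not_not_p in rows)
print(double_negation_holds)
```

Comparing the first and last columns row by row is exactly the "same truth table" criterion used in the text.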

Example A.6. Many times, we have propositions which depend on variables. For instance, let x denote a real variable and let P(x) be the statement

x is rational .⁴ (A.1)

The truth value of P(x) depends, of course, on x. Since a real number which is not rational is called irrational, we can write ∼∼P(x) as

x is not irrational . (A.2)

Clearly, (A.1) and (A.2) are saying the same thing in two different ways.

A.3. Conjunction (AND)

If P and Q are propositions, then the new proposition P ∧ Q is interpreted as “P and Q,” just as in English. In other words, the sentence P ∧ Q is true if and only if both statements P and Q are true. Therefore the truth value of P ∧ Q is related to P and Q via the following table:

P   Q   P ∧ Q
T   T   T
T   F   F
F   T   F
F   F   F

Example A.7. If

P = It is Thursday

Q = It is raining today ,

then P ∧ Q is the proposition

It is Thursday   and   it is raining today .
\____________/   \_/   \_________________/
      P           ∧             Q

The proposition P ∧ Q is therefore true only on rainy Thursdays (when P and Q are both true).

⁴ A rational number is a “fraction,” a ratio a/b of integers a and b, where b ≠ 0.

Example A.8. Using truth tables, we can derive the associative law for ∧:

P ∧ (Q ∧ R) ⇔ (P ∧ Q) ∧ R

Indeed, we merely need to produce the truth tables for P ∧ (Q ∧ R) and (P ∧ Q) ∧ R and compare entries. Since these expressions have three propositional variables (P, Q, R), our truth table will have 8 = 2^3 rows since there are two possibilities for each variable (namely T or F).

P   Q   R   P ∧ Q   (P ∧ Q) ∧ R   Q ∧ R   P ∧ (Q ∧ R)
T   T   T   T       T             T       T
T   T   F   T       F             F       F
T   F   T   F       F             F       F
T   F   F   F       F             F       F
F   T   T   F       F             T       F
F   T   F   F       F             F       F
F   F   T   F       F             F       F
F   F   F   F       F             F       F

Since the truth tables for P ∧ (Q ∧ R) and (P ∧ Q) ∧ R are the same, they are equivalent statements.
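For readers who would rather let a computer enumerate the eight rows, here is a small Python sketch (an illustration, not part of the original notes). It assumes Python's `and` as a stand-in for ∧ and uses itertools.product to list every assignment of truth values to (P, Q, R):

```python
from itertools import product

# All 2**3 = 8 assignments of truth values to (P, Q, R).
assignments = list(product([True, False], repeat=3))

# The final columns of the two truth tables being compared.
left = [p and (q and r) for p, q, r in assignments]
right = [(p and q) and r for p, q, r in assignments]

for (p, q, r), lhs, rhs in zip(assignments, left, right):
    print(p, q, r, lhs, rhs)

# The columns agree entry by entry, so the two expressions are equivalent.
associative = left == right
print(associative)
```

The same pattern works for any pair of expressions in finitely many propositional variables: enumerate all assignments, then compare the resulting columns.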

A.4. Disjunction (OR)

If P and Q are propositions, then P ∨ Q is the new proposition

P or Q

where the word or is to be interpreted as an inclusive or (see below). Specifically, the truth value of the proposition P ∨ Q is related to P and Q via the following table:

P   Q   P ∨ Q
T   T   T
T   F   T
F   T   T
F   F   F

Example A.9. A mathematician is in a restaurant and sees a sign (which is presumably truthful) that says

Lunch comes with soup or salad .

Everyday English often uses an exclusive or, meaning that one can have either soup or salad, but not both. This is not how things work in mathematics: from the mathematician’s viewpoint, having both soup and salad with lunch is a definite possibility.

Example A.10. If

P = It is Thursday

Q = It is raining today ,

then P ∨ Q is the proposition

It is Thursday   or   it is raining today .
\____________/   \/   \_________________/
      P          ∨             Q

This proposition is false only on rain-free days that are not Thursday.

A.5. Manipulating Propositions

Now that we have introduced ∼, ∧, and ∨, we need to know how they interact with each other. One of the most important conventions is that ∼ takes priority over ∧ and ∨. Two other basic rules for manipulating propositions are called de Morgan’s laws:

∼(P ∧ Q) ⇔ ∼P ∨ ∼Q (A.3)

∼(P ∨ Q) ⇔ ∼P ∧ ∼Q. (A.4)

A short computation shows that the expressions ∼(P ∧ Q) and ∼P ∨ ∼Q have the same truth tables, which establishes (A.3):

P   Q   P ∧ Q   ∼(P ∧ Q)   ∼P   ∼Q   ∼P ∨ ∼Q
T   T   T       F          F    F    F
T   F   F       T          F    T    T
F   T   F       T          T    F    T
F   F   F       T          T    T    T

We could do something similar to show that (A.4) is correct, but there is a better way. Since (A.3) holds for any two propositions P and Q, we can insert ∼P and ∼Q in their place to obtain

∼(∼P ∧ ∼Q) ⇔ (∼∼P) ∨ (∼∼Q)
           ⇔ P ∨ Q.

Negating both sides gives

∼(P ∨ Q) ⇔ ∼∼(∼P ∧ ∼Q)
         ⇔ ∼P ∧ ∼Q,

which establishes (A.4), the second of de Morgan’s laws.
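Both of de Morgan's laws can be double-checked by brute force over the four rows of the truth table. The sketch below is only an illustration in Python (the variable names law_a3 and law_a4 are assumptions chosen to match the equation labels), with `not`, `and`, `or` standing in for ∼, ∧, ∨:

```python
from itertools import product

# The four possible assignments of truth values to (P, Q).
pairs = list(product([True, False], repeat=2))

# (A.3): ~(P and Q) has the same truth value as (~P) or (~Q) in every row.
law_a3 = all((not (p and q)) == ((not p) or (not q)) for p, q in pairs)

# (A.4): ~(P or Q) has the same truth value as (~P) and (~Q) in every row.
law_a4 = all((not (p or q)) == ((not p) and (not q)) for p, q in pairs)

print(law_a3, law_a4)
```

A check like this confirms the laws but does not replace the substitution argument above, which shows why (A.4) follows from (A.3).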

Example A.11. Let x denote an integer variable, x ≥ 2, and define

P(x) = x is prime

Q(x) = x is odd .

Thus

P(x) ∧ Q(x) = x is prime and x is odd
            = x is an odd prime

The proposition P(x) ∧ Q(x) is therefore true for

x = 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, . . .

and false for all other integers. By (A.3), the first of de Morgan’s laws, it follows that the negation of the proposition “x is an odd prime” is

∼(x is an odd prime) ⇔ ∼(x is prime and x is odd)
                     ⇔ ∼((x is prime) ∧ (x is odd))
                     ⇔ ∼(x is prime) ∨ ∼(x is odd)
                     ⇔ (x is not prime) ∨ (x is not odd)
                     ⇔ (x is composite) ∨ (x is even)
                     ⇔ x is composite or x is even
                     ⇔ x is composite or even .

Here “x is prime” is P(x), “x is odd” is Q(x), and “x is an odd prime” is P(x) ∧ Q(x). (Recall that an integer n is composite if it is divisible by a positive integer other than 1 and n.)
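The conclusion of this example can be spot-checked numerically. The following Python sketch is an illustration only: the helper functions and the finite range 2 ≤ x < 200 are assumptions made for the demonstration, and a finite check is of course not a proof for all integers.

```python
from math import isqrt

def is_prime(x: int) -> bool:
    """Trial division; adequate for the small range used here."""
    return x >= 2 and all(x % d != 0 for d in range(2, isqrt(x) + 1))

def is_composite(x: int) -> bool:
    """x is divisible by a positive integer other than 1 and x."""
    return any(x % d == 0 for d in range(2, x))

def is_odd(x: int) -> bool:
    return x % 2 == 1

def is_even(x: int) -> bool:
    return x % 2 == 0

xs = range(2, 200)

# Integers for which P(x) and Q(x) are both true.
odd_primes = [x for x in xs if is_prime(x) and is_odd(x)]
print(odd_primes[:10])

# ~(P(x) and Q(x)) should agree with "x is composite or x is even".
negation_matches = all(
    (not (is_prime(x) and is_odd(x))) == (is_composite(x) or is_even(x))
    for x in xs
)
print(negation_matches)
```

Note that is_composite is defined independently of is_prime, so the agreement is a genuine check of the chain of equivalences rather than a restatement of de Morgan's law.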

A.6. Implication (P ⇒ Q)

The proposition ∼P ∨ Q (called an implication) is read

If P, then Q

or

P implies Q

and is commonly denoted

P ⇒ Q

or

Q ⇐ P.

The proposition P is called the hypothesis of the implication and the proposition Q is called the conclusion. Be careful with the order since ∼P ∨ Q and ∼(P ∨ Q) are quite different expressions.⁵ Always remember that ∼ takes priority over ∧ and ∨.

The truth table

P   Q   ∼P   ∼P ∨ Q
T   T   F    T
T   F   F    F
F   T   T    T
F   F   T    T
                        (A.5)

for ∼P ∨ Q shows that the only case where P ⇒ Q is false is when P is true and Q is false.

In some texts, P ⇒ Q is defined to be

∼(P ∧ ∼Q).

There is no contradiction, since the truth table

P   Q   ∼Q   P ∧ ∼Q   ∼(P ∧ ∼Q)
T   T   F    F        T
T   F   T    T        F
F   T   F    F        T
F   F   T    F        T

for ∼(P ∧ ∼Q) is identical to the truth table (A.5) for ∼P ∨ Q and hence

∼P ∨ Q ⇔ ∼(P ∧ ∼Q).

This also follows from de Morgan’s laws.
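As a quick sanity check, the two definitions of implication can be compared row by row. This is a minimal Python sketch, not part of the original notes; the function names implies and implies_alt are illustrative assumptions:

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """P => Q, defined as (~P) or Q."""
    return (not p) or q

def implies_alt(p: bool, q: bool) -> bool:
    """The alternative definition ~(P and ~Q) used in some texts."""
    return not (p and not q)

# Truth table (A.5), keyed by the pair (P, Q).
table = {(p, q): implies(p, q) for p, q in product([True, False], repeat=2)}
for (p, q), value in table.items():
    print(p, q, value)

# The two definitions agree on every row.
definitions_agree = all(implies(p, q) == implies_alt(p, q) for p, q in table)
print(definitions_agree)
```

The only row where the implication is false is the one with a true hypothesis and a false conclusion, matching the discussion of (A.5).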

Example A.12. The term “implies” has a slightly different meaning in mathematics than in the everyday world. The truth value of an implication P ⇒ Q does not depend on the actual meaning of P and Q, only on their truth values. For instance,

• If 1 + 1 = 2, then penguins can swim. (TRUE)

• If 1 + 1 = 2, then penguins can fly. (FALSE)

• If penguins can fly, then 1 + 1 = 2. (TRUE)

• If penguins can fly, then 1 + 1 ≠ 2. (TRUE)

⁵ In fact, if you want to be extra careful, you can write (∼P) ∨ Q instead.

Example A.13. If

P = You miss the final

Q = You fail the course ,

then in plain English the proposition P ⇒ Q reads:

If you miss the final, then you fail the course.

Going back to the technical definition ∼P ∨ Q of P ⇒ Q, we see that P ⇒ Q can also be interpreted as

You do not miss the final   or   you fail the course .
\_______________________/   \/   \_________________/
           ∼P               ∨             Q

In other words, P ⇒ Q can be read as

You take the final or you fail the course .

Note that the or does not mean that taking the final and failing the course are mutually exclusive possibilities. Indeed, it is certainly possible to take the final and fail the course.

A.7. Converse (P ⇐ Q)

The converse of P ⇒ Q is the proposition Q ⇒ P, which we usually write as

P ⇐ Q.

In English, this might be read:

P is implied by Q.

Example A.14. If

P = Every student passes the course

Q = You pass the course ,

then P ⇒ Q is the proposition

If every student passes the course, then you pass the course.

On the other hand, the converse of P ⇒ Q is the proposition Q ⇒ P, which can be written as:

If you pass the course, then every student passes the course.

Clearly these mean quite different things. It is quite possible that P is false (someone will fail) and Q is true (you pass). In this case, P ⇒ Q is true but Q ⇒ P is false. This makes perfect sense – just because you pass the course does not mean that everyone else will.

A.8. If and only if (⇔)

The expression (P ⇒ Q) ∧ (P ⇐ Q) is so important that it has its own symbol, P ⇔ Q. We read this as

P if and only if Q

or P iff Q. In plain English, we sometimes say that

P and Q are equivalent statements .

The truth table for P ⇔ Q looks like

P   Q   P ⇒ Q   P ⇐ Q   P ⇔ Q
T   T   T       T       T
T   F   F       T       F
F   T   T       F       F
F   F   T       T       T

Essentially, P ⇔ Q means that P and Q are either simultaneously true or simultaneously false. Also of note is the fact that (make the appropriate truth table)

(P ⇔ Q) ⇔ (∼P ⇔ ∼Q).

This does not conflict with our earlier usage of ⇔. For instance, we wrote de Morgan’s first rule (A.3) as:

∼(P ∧ Q) ⇔ ∼P ∨ ∼Q. (A.6)

The preceding can itself be thought of as a statement, as opposed to simply relating the truth values of the two statements ∼(P ∧ Q) and ∼P ∨ ∼Q. A short computation shows that the expression (A.6) has the following truth table:

P   Q   ∼(P ∧ Q)   ∼P ∨ ∼Q   ∼(P ∧ Q) ⇔ ∼P ∨ ∼Q
T   T   F          F         T
T   F   T          T         T
F   T   T          T         T
F   F   T          T         T

In other words, the statement (A.6) is true regardless of the truth value (or meaning) of P and Q. Such statements (in more precise terminology, sentential forms) are called tautologies.
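The notion of a tautology lends itself to a mechanical test: evaluate the expression on every assignment of truth values and check that the result is always true. The Python sketch below is an illustration under that reading; the names iff, is_tautology, and the three sample formulas are assumptions introduced here, not notation from the notes.

```python
from itertools import product

def iff(p: bool, q: bool) -> bool:
    """P <=> Q: true exactly when P and Q have the same truth value."""
    return p == q

def is_tautology(formula, num_vars: int) -> bool:
    """True if formula evaluates to True on every truth assignment."""
    return all(formula(*values)
               for values in product([True, False], repeat=num_vars))

# Statement (A.6): de Morgan's first law viewed as a single statement.
de_morgan = lambda p, q: iff(not (p and q), (not p) or (not q))

# (P <=> Q) <=> (~P <=> ~Q), the exercise mentioned in the text.
negated_iff = lambda p, q: iff(iff(p, q), iff(not p, not q))

# A non-example: (P and Q) <=> (P or Q) fails when exactly one of P, Q is true.
not_a_tautology = lambda p, q: iff(p and q, p or q)

print(is_tautology(de_morgan, 2))
print(is_tautology(negated_iff, 2))
print(is_tautology(not_a_tautology, 2))
```

Including a non-example is deliberate: a tautology checker that never returns False would not be much of a check.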

A.9. Contrapositive

The contrapositive of an implication P ⇒ Q is defined to be

∼Q ⇒ ∼P.

The reason that contrapositives are so important is because they are equivalent to their original implications:

(P ⇒ Q) ⇔ (∼Q ⇒ ∼P).

This follows from examining the associated truth table:

P   Q   P ⇒ Q   ∼Q   ∼P   ∼Q ⇒ ∼P   (P ⇒ Q) ⇔ (∼Q ⇒ ∼P)
T   T   T       F    F    T         T
T   F   F       T    F    F         T
F   T   T       F    T    T         T
F   F   T       T    T    T         T

Thus if one wants to prove that the statement P ⇒ Q is true, one can prove ∼Q ⇒ ∼P instead.
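It is instructive to contrast the contrapositive with the converse from Section A.7. The following Python sketch (an illustration, with implies as an assumed helper implementing ∼P ∨ Q) checks both against the original implication on all four rows:

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """P => Q, defined as (~P) or Q."""
    return (not p) or q

pairs = list(product([True, False], repeat=2))

# Contrapositive: ~Q => ~P agrees with P => Q on every row.
contrapositive_equivalent = all(
    implies(p, q) == implies(not q, not p) for p, q in pairs
)

# Converse: Q => P does NOT agree with P => Q on every row.
converse_equivalent = all(
    implies(p, q) == implies(q, p) for p, q in pairs
)

# Rows on which an implication and its converse differ.
disagreements = [(p, q) for p, q in pairs if implies(p, q) != implies(q, p)]

print(contrapositive_equivalent, converse_equivalent, disagreements)
```

The implication and its converse disagree precisely on the rows where P and Q have different truth values, which is the situation described in Example A.14.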

Example A.15. A positive integer x ≥ 2 is called perfect if x equals the sum of its proper divisors. For instance, 6 and 28 are perfect numbers since

6 = 1 + 2 + 3

28 = 1 + 2 + 4 + 7 + 14.

Let x denote a positive integer ≥ 2 and let

P(x) = x is perfect

Q(x) = x is even .

In plain English, we might say that

(P(x) ⇒ Q(x)) = If x is perfect, then x is even

(∼Q(x) ⇒ ∼P(x)) = If x is odd, then x is not perfect .

Note that these two propositions say exactly the same thing in two different ways. It is unknown whether an odd perfect number exists (an unsolved problem for over 2000 years), so the truth value of the propositions above (asserted for every x) is unknown.

APPENDIX B

Basic Set Theory

B.1. Sets

Almost everything in mathematics can be defined in terms of sets. Indeed, most of mathematics fits comfortably inside the framework of set theory. What exactly is a set? As we mentioned before, this is a difficult question. According to Georg Cantor (the “founder of set theory”)

By a set we are to understand any collection into a whole of definite and separate objects of our intuition or our thought.

This definition is somewhat circular and it underlines one of the obstacles in talking about sets. One cannot define a set as “a collection” without first knowing what a collection is. After all, how does one define a “collection?” We will simply have to accept that the student understands what is meant by the term “set.” We do not have the time to grapple with the deep philosophical issues that are clearly at hand.

Sets have elements, also known as members. If A is a set, then x ∈ A stands for the proposition

x belongs to A

or

x is an element of A.

For example, 2 ∈ {0, 1, 2} is a true proposition. One way to describe a set is by just writing out its members between the set brackets { and }. The proposition ∼(x ∈ A), which translates as

x is not an element of A,

is usually written x ∉ A.

Example B.1. According to the definition, we have

2 ∉ {0, penguin, {0, 1, 2}}.

This example shows a couple of things. First, the elements of a set do not have to be the “same type of thing.” Second, a set (namely {0, 1, 2}) can be an element of another set. If one thinks of a set as being a “box” in which objects are placed, then one sees that it is not unreasonable for a box to contain some items and possibly another box.

Two sets A and B are called equal, written A = B, if and only if they have exactly the same elements. If two sets A and B are not equal, we write A ≠ B (which literally means ∼(A = B)). This is the case whenever A contains an object that B does not, or vice-versa.

Example B.2. According to our definition of set equality, a set is completely determined by its members. For instance,

{π, e, e, π} = {π, π, π, e} = {π, e} = {e, π}.

Repetition and order do not matter when listing the members of a set. Also observe that

{π, e, {e}} ≠ {π, e}

since {e} and e are not the same thing. One way to think about this is that e and a box containing e are not the same thing.
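Python's built-in set type happens to follow the same equality rule, which makes it a convenient way to experiment with this example. The sketch below is an illustration only; it assumes floating-point approximations of π and e from the math module and uses frozenset to model a set that is itself an element of another set (Python requires set elements to be hashable).

```python
from math import e, pi

# Repetition and order are ignored when a set is built from a listing.
a = {pi, e, e, pi}
b = {pi, pi, pi, e}
c = {e, pi}
all_equal = (a == b == c)
print(all_equal, len(a))

# A "box containing e" is a different object from e itself.
nested = {pi, e, frozenset({e})}
differs = nested != {pi, e}
print(differs, len(nested))

# Membership tests mirror the propositions x ∈ A and x ∉ A.
print(e in nested, frozenset({e}) in nested, frozenset({e}) in c)
```

The last line shows that the set {e} is an element of the nested set but not of {e, π}, which is exactly why the two sets are not equal.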

The set

∅ = { }

is called the empty set. It has no elements; it contains nothing. One can think of it as an empty box. There is one catch, however, for the empty set is considered to be unique – it is the only set with no elements.

Example B.3. Using the definition of set equality, we see that

∅ ≠ {∅}

since ∅ ∈ {∅} while ∅ ∉ ∅ (and therefore ∅ and {∅} do not have exactly the same elements). Think of it this way: A box with an empty box inside is not the same thing as an empty box.

Example B.4. The following sets

{∅}

{∅, {∅}}

{∅, {∅}, {∅, {∅}}}

...

are all distinct from one another. In fact, each successive set in our list contains all of the preceding ones as elements. They are all created from nothing, using only the primitive notion of sets. You can therefore build quite complicated sets without assuming that actual objects exist! In fact, if one wants to axiomatize set theory and construct all of mathematics rigorously from the basic principle of a set, one can take the sequence of sets above as a starting point for defining the natural numbers.
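The list in Example B.4 can be generated by repeatedly applying one rule, s ↦ s ∪ {s}. That rule is not stated in the notes; it is an assumption made here (it is the usual von Neumann construction) that happens to reproduce the three sets displayed above. A minimal Python sketch, using frozenset so that sets can be elements of other sets:

```python
def successor(s: frozenset) -> frozenset:
    """Return s ∪ {s}: the old elements together with s itself."""
    return s | frozenset({s})

empty = frozenset()

# sets[0] = ∅, sets[1] = {∅}, sets[2] = {∅, {∅}}, sets[3] = {∅, {∅}, {∅, {∅}}}
sets = [empty]
for _ in range(3):
    sets.append(successor(sets[-1]))

# Each set in the list has exactly as many elements as its index.
sizes = [len(s) for s in sets]
print(sizes)

# Each successive set contains all of the preceding ones as elements.
contains_predecessors = all(
    sets[j] in sets[i] for i in range(len(sets)) for j in range(i)
)

# All of the sets are distinct from one another.
all_distinct = len(set(sets)) == len(sets)
print(contains_predecessors, all_distinct)
```

The fact that the n-th set has exactly n elements is what makes this sequence a natural candidate for modelling the numbers 0, 1, 2, 3, and so on.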

Since this is a mathematics course, we will obviously be talking about numbers (of various sorts) quite often. Some important sets of numbers which we will frequently refer to are the following:

• P = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, . . .}, the set of prime numbers.¹

• N = {1, 2, 3, . . .}, the set of natural numbers.

• Z = {. . . , −2, −1, 0, 1, 2, . . .}, the set of integers.

• Q, the set of rational numbers (fractions).

• R, the set of real numbers.

¹ This is not standard mathematical notation.

Although all of the above can be rigorously constructed from the basic axioms of set theory, we will not do so here. In this course, we make the bold assumption that P, N, Z, and Q exist. The real numbers, however, are a different matter altogether. We will explore the nature and structure of R shortly. The real numbers, it turns out, are much more complicated than you might think.

B.2. Using Properties to define Sets

We can use propositions to define sets using the so-called set builder notation. If P(x) is a proposition (whose truth value depends on the variable object x), we define

{x : P(x)}

to be the set of all x such that P(x) is true. We are overlooking a few fine points of logic here,² but this definition will be sufficient for most purposes (although we will see how unrestricted use of the set builder notation can lead to logical paradoxes).

Example B.5. The set P of all prime numbers can be written as

{2, 3, 5, 7, 11, 13, . . .} = {x : x is a prime number}
                            = {y : y is a prime number}

Note that the particular symbol used as a variable is irrelevant. It is a “dummy variable” as in calculus:

∫_0^1 f(x) dx = ∫_0^1 f(y) dy.
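Set builder notation has a close analogue in Python's set comprehensions, with one important difference: a comprehension must draw its variable from an explicitly given collection. The sketch below is an illustration under that restriction; the finite universe range(50) and the helper is_prime are assumptions made for the demonstration.

```python
from math import isqrt

def is_prime(x: int) -> bool:
    """Trial division; adequate for the small universe used here."""
    return x >= 2 and all(x % d != 0 for d in range(2, isqrt(x) + 1))

# A finite universe standing in for "all objects x".
universe = range(50)

# {x : x is a prime number} and {y : y is a prime number}, restricted to the universe.
primes_x = {x for x in universe if is_prime(x)}
primes_y = {y for y in universe if is_prime(y)}

# The name of the dummy variable does not affect the set that is defined.
same_set = primes_x == primes_y
print(sorted(primes_x))
print(same_set)
```

Requiring an explicit universe is not merely a programming convenience; restricting where x may come from is one of the "fine points of logic" that the set builder notation glosses over.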

Example B.6. When using the set builder notation, we must be careful to use conditions that are unambiguous. For instance

{x : x is a lucky number}

is not well-defined since “x is a lucky number” is not a proposition depending on x (it is an opinion).

Example B.7. Using the set builder notation, one can easily create sets which exist, logically speaking, but whose elements are impossible to find explicitly. For instance, the elements of

{x : x is a finite string of digits from the decimal expansion of π}

are not possible to explicitly produce since we do not know the entire decimal expansion for π (for it is an infinite string of digits without any apparent pattern). We know that certain strings of digits, like 1415 and 535, belong to the set above, but in general it is not a set that we can grasp in its entirety.

B.3. Russell’s Paradox

Having worked with sets a little bit, you might be surprised to learn that our approach to sets is not logically sound. In fact, it is called “naive set theory” to distinguish it from the rigorous axiomatic approach used in formal set theory. A startlingly simple logical paradox due to Bertrand Russell immediately shows that the basis of this approach to sets is unsound.

One of the basic principles of “naive set theory” is the General Comprehension Principle, which we implicitly used above. In the early days of set theory (around 1873–1900), mathematicians and logicians had always assumed that you can define a set if you have a “definite property” P(x). In other words, given a reasonable statement P(x), the set of all x for which P(x) is true should exist, logically speaking. Essentially, they assumed that

{x : P(x)}

should always exist and be something that we are allowed to think about and discuss logically. Surprisingly, this is not the case.

² For instance, what is an object? Is x allowed to be any object? Clearly x should be restricted to “all objects for which P(x) makes sense,” whatever that would mean.

The death blow to naive set theory came in 1901 and it is called Russell’s Paradox. Russell begins by letting

R = {x : (x is a set) ∧ (x ∉ x)}

In other words,

R is the set of all sets that are not elements of themselves.

The expression

P(x) = (x is a set) ∧ (x ∉ x)

is quite unambiguous. An object x should either be a set or not a set. An object x should either be an element of itself or not be an element of itself. Thus P(x) looks like an unambiguous, if a little unusual, condition. As logical human beings, we should be permitted to think about the set R.

Russell then asks: Does R contain itself or not? This is a simple yes/no question and there are clearly only two possibilities.

CASE 1: If R ∈ R is true, then R ∉ R is true by the definition of R. However, this is not logically possible since R ∉ R is false when R ∈ R is true.

CASE 2: If R ∉ R is true, then R ∈ R is true by the definition of R. However, this is not logically possible since R ∈ R is false when R ∉ R is true.

Neither R ∈ R nor R ∉ R is logically possible! This means that we cannot treat R as a set – it is simply “too large” of an idea to be considered in a logically sound manner. In other words, we cannot logically consider the set of all sets that are not elements of themselves without running into paradoxes. We just cannot – it is a law of the universe. Russell’s Paradox shows that the General Comprehension Principle is not correct. Russell discovered this paradox and sent it to Gottlob Frege (1848 – 1925) as Frege was finishing his Grundgesetze der Arithmetik, a work which attempted to rigorously derive the laws of arithmetic from supposedly logical axioms. Russell’s Paradox invalidated much of Frege’s work. Indeed, Frege noted:

A scientist can hardly meet with anything more undesirable than to have the foundation give way just as the work is finished. I was put in this position by a letter from Mr. Bertrand Russell when the work was nearly through the press.

There are many other logical paradoxes that have been discovered throughout the years, but Russell’s paradox is one of the most important. It forced mathematicians and logicians to completely reevaluate mathematics and logic from the ground up. Russell’s Paradox ushered in a new age in which sets would have to be treated in a rigorous axiomatic fashion. The rules would have to be explicitly stated in such a way that Russell’s Paradox would not occur in the universe of axiomatic set theory. Although we will not discuss axiomatic set theory in this course, it is important to be aware that sets and set theory are not as simple as they sound.

Here are a couple of paradoxes which are somewhat similar in spirit:

Example B.8. A car is equipped with a Russell light on its dashboard. The light turns on to warn the driver if a light has burnt out. What happens when the Russell light burns out?

Example B.9. The following paradox of Eubulides of Miletus³ (4th century BCE) indicates that self-reference can be troublesome:

This statement is false.

This is a troublesome sentence (call it P) since

P is true ⇔ P is false .

Thus Eubulides’ statement is not a logical proposition. This paradox is similar to the liar paradox: I am lying .

B.4. Quantifiers

In mathematics, we often deal with propositions which depend on variables. The special symbols (called quantifiers) ∀ and ∃ will help us. The symbol ∀ stands either for “for all,” “for every,” or “for each” (depending on which makes more grammatical sense). The symbol ∃ stands for “there exists.”

There are many ways to use quantifiers and various ways to combine them with other symbols. The best way to understand how to read and translate sentences with quantifiers is to study a number of examples.

Example B.10. The statement

(∀x > 0)(∃y)(x = e^y)

can be translated in a number of ways:

• For every x > 0, there exists a y such that x = e^y.

• For each x > 0, there is a y so that x = e^y.

• For each positive x there exists a y such that x = e^y.

• Every positive number is the exponential of another number.

• Every positive number has a logarithm.

Note that we did not even bother specifying what type of object y is. To be more precise, we could have written ∃y ∈ R instead of ∃y. Most of the time, however, this level of precision is more cumbersome than it is worth. It is clear, in this context, that y is a real number and not (for instance) a function, a matrix, or a penguin.

The order in which quantifiers appear is extremely important. Changing the order of quantifiers often completely changes the meaning of a statement.

³ I have actually been to Miletus (now known as Milet, in modern Turkey). There are many fascinating Roman era ruins, partially sunken below a swamp, which are open to the public. There are, however, few tourists who visit the site.

Example B.11. The statement

(∃y)(∀x > 0)(x = e^y)

translates as

There exists a y such that for every x > 0, x = e^y.

In more proper English, this reads

There exists a y such that x = e^y for every x > 0.

This is completely false. It asserts that there is a single number y with the property that y = ln x for every positive x. Presumably, that is not what we intended to say!

Great liberties are often taken when translating from mathematical notation to mathematical English. It takes a while to get used to switching back and forth, and mathematicians normally do their thinking somewhere in between the two extremes.

Example B.12. The statement

(∀x, y ∈ Q)(((x + y) ∈ Q) ∧ (xy ∈ Q))

can be translated in many different ways:

• For all x, y ∈ Q, x + y ∈ Q and xy ∈ Q.

• For all x, y ∈ Q, both x + y and xy belong to Q.

• For any rational numbers x and y, it is the case that x + y and xy are rational.

• The sum and product of rational numbers are rational.

• The rational numbers are closed under addition and multiplication.

Although most mathematical writing is in English, you must always be able to break things down symbolically if you have any doubt as to the logical meaning of a statement. This is especially important if you need to negate a complicated logical statement. This occurs, for instance, when beginning a proof by contradiction.

Example B.13. Recall that P denotes the set of prime numbers, so that

P = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, . . .}.

The statement

(∀p ∈ P)(∃q ∈ P)(p < q)

translates as

For every prime p there exists a prime q such that p < q. (B.1)

In other words, this says that

For any prime, there is a bigger prime.

This proposition is true and it is known as Euclid’s Theorem,⁴ which is usually stated as

There are infinitely many prime numbers.

This is a somewhat liberal translation of (B.1), of course. Nevertheless, it emphasizes the fact that there is not a direct correspondence between mathematics and English.

⁴ There are several theorems that go by the name “Euclid’s Theorem.” The theorem we are discussing is Proposition IX.20 of the Elements.

Example B.14. As mentioned earlier, the order in which the quantifiers appear is crucial. For example, the proposition

(∃p ∈ P)(∀q ∈ P)(p < q)

means

There exists a prime p such that for every prime q, p < q.

The preceding statement asserts that there is a prime that is strictly less than all other primes. We can easily demonstrate that this is false. Take q = 2 and note that there is no prime p such that p < 2.
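On a finite domain, ∀ and ∃ correspond to Python's all and any, so the effect of swapping quantifiers can be observed directly. The sketch below is only an illustration: it assumes finite truncations of P (primes below 100 for p, primes below 200 for q), so it can suggest but not prove statements about all primes.

```python
from math import isqrt

def is_prime(x: int) -> bool:
    """Trial division; adequate for the small ranges used here."""
    return x >= 2 and all(x % d != 0 for d in range(2, isqrt(x) + 1))

# Finite stand-ins for the set P of primes (an assumption of this sketch).
primes_small = [p for p in range(2, 100) if is_prime(p)]
primes_large = [q for q in range(2, 200) if is_prime(q)]

# (∀p)(∃q)(p < q): every prime below 100 has a larger prime below 200.
forall_exists = all(any(p < q for q in primes_large) for p in primes_small)

# (∃p)(∀q)(p < q): some prime is strictly smaller than every prime.
exists_forall = any(all(p < q for q in primes_large) for p in primes_small)

print(forall_exists, exists_forall)
```

The second statement fails for the reason given in the text: whatever p is chosen, the choice q = 2 violates p < q.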

Example B.15. Another symbol which occurs quite frequently is the ! symbol. It stands for “unique,” so that (∃!x) is translated as “there exists a unique x such that.” Using R as our universal set, consider the statement (∃!x)(∀y)(xy = y). In other words,

There exists a unique x such that for every y, xy = y.

This is a true statement – in fact the x in question is simply the number 1. It is the only real number x with the property that xy = y for every y.

B.5. Negating Propositions With Quantifiers

We negate propositions involving quantifiers according to the rules:

∼[(∀x)P (x)] ⇔ (∃x)(∼P (x))

∼[(∃x)P (x)] ⇔ (∀x)(∼P (x)).

If the quantifiers have additional symbols attached, the rules are the same. For instance:

∼[(∀x ∈ A)P (x)] ⇔ (∃x ∈ A)(∼P (x))

∼[(∃x ∈ A)P (x)] ⇔ (∀x ∈ A)(∼P (x)).

Consider the following example:

Example B.16. There are a number of ways that the statement

(∃x ∈ Q)(x^2 = 2) (B.2)

could be translated. For instance, we might say that

• There exists an x in Q such that x^2 = 2.

• There exists a rational number x such that x^2 = 2.

• There is a rational number x such that x^2 = 2.

• 2 has a rational square root .

It turns out that (B.2) is false. In fact, √2 is an irrational number and hence cannot be written in the form a/b where a and b are integers.⁵

We can negate (B.2) to obtain a true statement. In words, we might say:

There does not exist a rational number x such that x^2 = 2.

We would like to write this in terms of quantifiers (“there does not exist” is not a quantifier). According to the rules for negating propositions with quantifiers, the negation of (B.2) is:

∼[(∃x ∈ Q)(x^2 = 2)] ⇔ (∀x ∈ Q)(∼(x^2 = 2))
                     ⇔ (∀x ∈ Q)(x^2 ≠ 2).

⁵ According to legend, this was discovered by the Pythagorean philosopher Hippasus of Metapontum (an ancient Greek colony in southern Italy) around 500 BCE. The numbers e and π were proved to be irrational by Euler and Lambert in 1737 and 1761, respectively.

There are several ways to interpret this:

• For every x in Q, x^2 ≠ 2.

• For each rational number x, it is the case that x^2 ≠ 2.

• For all rational x, x^2 ≠ 2.

Of course, the three interpretations above are all equivalent ways of saying:

√2 is irrational .

This is a true statement, known as Hippasus’ theorem (and often attributed to Pythagoras).
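The negation rule used in this example can be watched in action on a finite sample. The Python sketch below is an illustration, not a proof: it assumes a finite set of candidate fractions a/b with |a| ≤ 50 and 1 ≤ b ≤ 50 in place of all of Q, and uses exact arithmetic from the fractions module so that no rounding is involved.

```python
from fractions import Fraction

# A finite sample of rational numbers standing in for Q (an assumption).
candidates = {Fraction(a, b) for a in range(-50, 51) for b in range(1, 51)}

# (∃x)(x^2 = 2), restricted to the sample.
exists_root = any(x * x == 2 for x in candidates)

# (∀x)(x^2 ≠ 2), restricted to the sample.
forall_not_root = all(x * x != 2 for x in candidates)

# The negation of the existential statement agrees with the universal one.
rule_holds = (not exists_root) == forall_not_root

print(exists_root, forall_not_root, rule_holds)
```

No finite search can establish Hippasus' theorem, but the agreement between `not any(...)` and `all(not ...)` is precisely the quantifier negation rule stated at the start of this section.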

B.6. Subsets

Definition (Set Inclusion). If A and B are sets, then we say that A ⊆ B (read: A is a subset of B) if every member of A is also a member of B. In other words,

(A ⊆ B) ⇔ (∀x)(x ∈ A ⇒ x ∈ B). (B.3)

When we write ∀x, the variable x actually lives in some universal set U. Typically, U will be a set of numbers, functions, or other mathematical objects. Moreover, exactly what the universal set U is will typically be clear from context.

The following theorem can be proved from the basic definitions and logical principles (although we will not prove it in these notes):

Theorem B.1. (A = B) ⇔ [A ⊆ B ∧ B ⊆ A]

The importance of the theorem above is that if we wish to prove that A = B, it suffices to prove A ⊆ B and B ⊆ A separately. This is sometimes easier than proving A = B directly.

Example B.17. ∅ ⊆ {0, 1} ⊆ {0, 1, 2} ⊆ R. Indeed, {0, 1} is a subset of {0, 1, 2} since every element in {0, 1} (namely the numbers 0 and 1) also belongs to {0, 1, 2}. We also see that {0, 1, 2} is a subset of R since 0, 1, and 2 all are elements of R (the set of real numbers).

Example B.18. Observe that A ⊆ A holds for any set A. In other words, every set is a subset of itself. Indeed, the proposition

(∀x)((x ∈ A) ⇒ (x ∈ A))

is true for any x in our universal space. To see this, simply write out a truth table for the implication P ⇒ Q where P = (x ∈ A) and Q = (x ∈ A):

x ∈ A   x ∈ A   (x ∈ A) ⇒ (x ∈ A)
T       T       T
F       F       T

We can therefore say that

For any x ∈ A, it is the case that x ∈ A.

By the definition (B.3) we can conclude that A ⊆ A.

Example B.19. Also observe that ∅ ⊆ A for any set A. In other words, every set has the empty set ∅ as a subset. Indeed, the statement (∀x)((x ∈ ∅) ⇒ (x ∈ A)) is true since there are no elements of the empty set and hence the hypothesis x ∈ ∅ (the P in P ⇒ Q) of the implication above is always false:

x ∈ ∅   x ∈ A   (x ∈ ∅) ⇒ (x ∈ A)
F       T       T
F       F       T

We can therefore say that (∀x)((x ∈ ∅) ⇒ (x ∈ A)).

Note that any proposition of the form (x ∈ ∅) ⇒ Q(x) is true in the same way as the proposition

If a penguin can fly, then it will rule the world .

Logically, this is a true statement since there are no penguins that can fly and hence the hypothesis “a penguin can fly” is always false.
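This kind of "vacuous truth" is built into Python's all function, which returns True when there is nothing to check. The short sketch below is an illustration only; the set A and the empty list flying_penguins are hypothetical examples chosen to mirror the text.

```python
A = {0, 1, 2}
empty = set()

# (∀x)((x ∈ ∅) ⇒ (x ∈ A)): there is no x in ∅ to test, so the statement holds.
vacuously_true = all(x in A for x in empty)

# Python's subset operator agrees: ∅ ⊆ A.
empty_is_subset = empty <= A

# The penguin example: the conclusion is never examined because
# no penguin satisfies the hypothesis.
flying_penguins = []
penguins_rule_the_world = all(False for _ in flying_penguins)

print(vacuously_true, empty_is_subset, penguins_rule_the_world)
```

Even though the "conclusion" in the last check is the constant False, the universal statement is true, because its hypothesis is never satisfied.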

Instead of writing ∼(A ⊆ B) we write A ⊈ B, which is read: A is not a subset of B.

Example B.20. What doesA * B really mean?

A * B ⇔ ∼[(∀x)(x ∈ A ⇒ x ∈ B)]

⇔ (∃x)[∼(x ∈ A ⇒ x ∈ B)]

⇔ (∃x)[∼(∼(x ∈ A) ∨ (x ∈ B))]

⇔ (∃x)[(x ∈ A) ∧ (x /∈ B)].

Hence another way of saying thatA * B is:

There exists an x such that x is an element ofA and x is not an element of B.

In other words, to show thatA * B it suffices to show that there exists somex whichbelongs toA but not toB.
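The "witness" description of non-inclusion can be tried out on small finite sets. The following is only an illustrative sketch: the helper name `non_subset_witnesses` and the sample sets are assumptions made for the example, not part of the notes.

```python
def non_subset_witnesses(A, B):
    """Return every x with x in A and x not in B."""
    return {x for x in A if x not in B}

# Hypothetical sample sets chosen for illustration.
A = {0, 1, 2}
B = {0, 1}

witnesses = non_subset_witnesses(A, B)

# Python's `<=` on sets tests inclusion; it fails exactly when a witness exists.
print(A <= B)       # is A a subset of B?
print(witnesses)    # the elements of A that are missing from B
```

Here a single witness (the element 2) is enough to conclude that A is not a subset of B.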

Definition (Proper Subset). If A and B are sets, then we say that A ⊂ B (read: A is a proper subset of B) if A ⊆ B but A ≠ B. This is sometimes written A ⊊ B as well.

Example B.21. Recalling from previous lectures our definitions of standard sets of numbers, we immediately recognize that

P ⊂ N ⊂ Z ⊂ Q ⊂ R ⊂ C

Here we have added C, the set of complex numbers, to our list of number sets.⁶

⁶ We might also add C ⊂ H ⊂ O, where H is the quaternion number system and O is the octonion number system. The quaternions are a noncommutative 4-dimensional number system (whose elements are of the form a + bi + cj + dk where a, b, c, d ∈ R) discovered by the Irish mathematician William Rowan Hamilton. The idea came to him while he was walking to a meeting of the Irish Academy. Hamilton scratched the fundamental formulas i² = j² = k² = ijk = −1 on the stone of Brougham Bridge (Dublin). Hamilton’s graffiti remains to this day a mathematical tourist attraction. The octonions O are a bizarre number system (also called the Cayley numbers) which we do not describe here in any further detail.

B.7. Complement, Union, and Intersection

Having introduced sets and some basic logical principles, we are now ready to discuss relationships between sets and methods of constructing new sets from old ones. First we will discuss the notions of inclusion and complement.

Definition (Set Complement). If A and B are sets, then the complement of B in A is the set⁷

A\B = {x ∈ A : x ∉ B}.

Example B.22. Let A = {0, 1, {a, b}} and B = {a, b, 1}. Then A\B = {0, {a, b}}. On the other hand B\A = {a, b}.
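Example B.22 can be reproduced with Python's built-in sets, as a rough sketch. The strings "a" and "b" stand in for the symbols a and b, and the inner set {a, b} is modelled as a `frozenset` because Python sets may only contain hashable items; both choices are assumptions of this illustration.

```python
# The element {a, b} of A is itself a set, so it is stored as a frozenset.
inner = frozenset({"a", "b"})

A = {0, 1, inner}
B = {"a", "b", 1}

# Python writes the complement of B in A as A - B.
print(A - B == {0, inner})     # A\B = {0, {a, b}}
print(B - A == {"a", "b"})     # B\A = {a, b}
```

Note that the set {a, b} is an element of A, while a and b themselves are not, which is why neither a nor b is removed from B.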

Typically, one works inside of some universal set U and in terms of Venn diagrams, the complement of A in U is just the “outside of A.” If the universal set is declared beforehand (or obvious from context), then we sometimes denote the complement of A in U by A′ or Aᶜ.

Example B.23. If the universal set U = Z and A = N, then

Nᶜ = Z\N = {. . . , −3, −2, −1},

the set of negative integers.

Definition (Union). If A and B are sets then the union of A and B is the set

A ∪ B = {x : (x ∈ A) ∨ (x ∈ B)}.

Definition (Intersection). If A and B are sets then the intersection of A and B is the set

A ∩ B = {x : (x ∈ A) ∧ (x ∈ B)}.

There are many laws about how unions and intersections interact. They can be derived from the rules for ∧ and ∨. For example:

Theorem B.2. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

Proof. One way to show that the two sets A ∩ (B ∪ C) and (A ∩ B) ∪ (A ∩ C) are equal is to show that the conditions for membership in them are logically equivalent. We will therefore try to show that the statements x ∈ [A ∩ (B ∪ C)] and x ∈ [(A ∩ B) ∪ (A ∩ C)] are logically equivalent:⁸

x ∈ [A ∩ (B ∪ C)]
⇔ (x ∈ A) ∧ [x ∈ (B ∪ C)]                        (def. of ∩)
⇔ (x ∈ A) ∧ [(x ∈ B) ∨ (x ∈ C)]                  (def. of ∪)
⇔ [(x ∈ A) ∧ (x ∈ B)] ∨ [(x ∈ A) ∧ (x ∈ C)]      (dist. law for ∧, ∨)
⇔ [x ∈ (A ∩ B)] ∨ [x ∈ (A ∩ C)]                  (def. of ∩)
⇔ x ∈ [(A ∩ B) ∪ (A ∩ C)]                        (def. of ∪)

In conclusion, we have shown that x ∈ A ∩ (B ∪ C) if and only if x ∈ (A ∩ B) ∪ (A ∩ C) and therefore the two sets are equal. □
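As a sanity check (not a substitute for the proof), the distributive law can be tested exhaustively on every triple of subsets of a small universal set. The function names and the three-element universe below are assumptions made for this sketch.

```python
from itertools import combinations

def all_subsets(universe):
    """List every subset of a finite universe as a frozenset."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def distributive_law_holds(universe):
    """Check A ∩ (B ∪ C) == (A ∩ B) ∪ (A ∩ C) for all subsets A, B, C."""
    subsets = all_subsets(universe)
    return all(A & (B | C) == (A & B) | (A & C)
               for A in subsets for B in subsets for C in subsets)

# A three-element universe gives 8 subsets and 8**3 = 512 triples to test.
print(distributive_law_holds({0, 1, 2}))
```

Such a check only covers one finite universe, whereas the membership argument above covers all sets at once.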

⁷ Many books denote the set complement of B in A by A − B. Both notations are common in mathematics, but I prefer A\B to A − B since it is not uncommon in abstract algebra to consider sets of the form {a − b : (a ∈ A) ∧ (b ∈ B)} where A and B are sets of numbers (or, more generally, elements of a commutative group).

⁸ It is good form, especially when you are starting out, to write your reasoning on the side. You should write in such a manner that a fellow student could look at your work and understand the reasoning behind what you are doing.

Although Venn diagrams are not always accurate (they are limited by the constraints of being in two dimensions and are hence unsuitable for picturing complex relationships between large numbers of sets), they are generally a good tool for getting the feel of a statement. For instance, draw some Venn diagrams to convince yourself that the theorem above is true.

B.8. Ordered Pairs

Since we will typically be concerned with ordered pairs (and ordered n-tuples) of real numbers, we do not need to go further into the subject of ordered tuples and Cartesian products at this point. However, let us briefly mention the technical definition:

Definition (Ordered Pairs). The symbol (a, b) denotes an ordered pair. It has the property that if (c, d) is another ordered pair then

(a, b) = (c, d) ⇔ (a = c) ∧ (b = d).

It is important to note that the existence of a definition does not logically imply the existence of the object defined. For instance, we might make the following definition:

Definition. A penguin p is called exceptional if p can fly.

It is clear that no exceptional penguins exist, despite the nice definition we made for them. We have not actually proved that ordered pairs exist or that some structure satisfying the definition can be constructed using sets. However, one can actually define the ordered pair (a, b) to be the set

(a, b) = {{a}, {a, b}}

and then verify that this set satisfies the property of the definition. However, we will not go through the (somewhat tedious) proof.
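Although the general proof is omitted, the defining property can at least be spot-checked by machine on a few values. This is a small sketch only: the function name `kpair` and the test values are assumptions made for the example, and nested sets are modelled with `frozenset` so that they can be elements of other sets.

```python
def kpair(a, b):
    """Model the set {{a}, {a, b}} using nested frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# Check (a, b) = (c, d)  <=>  (a = c) and (b = d) over a few sample values.
values = [0, 1, 2]
property_holds = all(
    (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    for a in values for b in values
    for c in values for d in values
)
print(property_holds)

# When a = b the set collapses to {{a}}, yet the property still holds.
print(kpair(1, 1) == frozenset({frozenset({1})}))
```

The degenerate case a = b is the part of the proof that usually needs separate attention.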

B.9. Cartesian Products

Definition (Cartesian Product). If A and B are sets, then the Cartesian product of A and B is the set

A × B = {(a, b) : a ∈ A, b ∈ B}.

Example B.24. If A = R and B = R then the Cartesian product R × R is denoted R². This is typically thought of as the xy-plane in analytic geometry.

Example B.25. [−1, 1] × (0, 1) = {(x, y) : (−1 ≤ x ≤ 1) ∧ (0 < y < 1)}.

Example B.26. If A = {0, 1} and B = R, then A × B can be thought of as the set of points which are on the vertical lines x = 0 and x = 1 in the xy-plane.
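For finite sets the Cartesian product can be listed explicitly. The sketch below uses `itertools.product` from the Python standard library; the finite set B is a hypothetical stand-in, since R itself cannot be enumerated.

```python
from itertools import product

A = {0, 1}
B = {"a", "b", "c"}   # hypothetical finite stand-in for the second factor

# A × B as a set of ordered pairs (Python tuples).
AxB = set(product(A, B))

print(len(AxB) == len(A) * len(B))       # the sizes multiply
print((0, "a") in AxB, ("a", 0) in AxB)  # order matters in an ordered pair
```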

B.10. Power Sets

Definition. If A is a set, then the power set of A, denoted P(A), is defined to be the set of all subsets of A. In symbols:

P(A) = {B : B ⊆ A}.

Example B.27. If A = ∅, then P(A) = {∅}.

Example B.28. If A = {a}, then P(A) = {∅, A}.

Example B.29. If A = {a, b}, then P(A) = {∅, {a}, {b}, A}.

Example B.30. If A = {a, b, c}, then

P(A) = {∅, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, A}.

You can probably see a pattern forming here. In this case, our intuition is correct:

Theorem B.3. If A is finite and #A = n, then #P(A) = 2ⁿ.

Sketch of Pf. Since A is finite and has exactly n elements, we may index the elements of A:

A = {a₁, a₂, . . . , aₙ}.

To see why the theorem is true, the easiest way is to think like a computer. There is a one-to-one correspondence between subsets of A and binary strings (strings of 0’s and 1’s) of length n. For a given subset B of A, the jth binary digit of the corresponding string is 0 if aⱼ ∉ B and 1 if aⱼ ∈ B. For instance, the subset

B = {a₁, a₃, a₅}

corresponds to the binary string 10101000···000. Since there are 2ⁿ possible strings, there are 2ⁿ possible subsets. □

Another way to think of the preceding “sketch” is to consider how many choices one has when creating a subset of A. To construct a subset of A, one has to choose whether to include a₁ or not. Then one has to choose whether to include a₂ or not, and so forth. In all, there will be n choices to make and there are 2ⁿ possible ways of doing this.
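The correspondence between subsets and binary strings translates directly into a short program. The following is a minimal sketch, assuming the elements are supplied in some fixed order; the function name `power_set` is chosen for the illustration.

```python
def power_set(elements):
    """Build every subset of a finite collection from the binary codes 0 .. 2**n - 1."""
    elements = list(elements)
    n = len(elements)
    subsets = []
    for code in range(2 ** n):
        # Bit j of `code` records the yes/no choice for elements[j].
        subset = frozenset(a for j, a in enumerate(elements) if (code >> j) & 1)
        subsets.append(subset)
    return subsets

subsets = power_set(["a", "b", "c"])
print(len(subsets))               # number of subsets of a 3-element set
print(len(set(subsets)))          # all of them are distinct
```

Each integer from 0 to 2ⁿ − 1 plays the role of one binary string of length n, so the loop visits every subset exactly once.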

Example B.31. Describing the power set of infinite sets is much trickier. For instance, P(N) contains every possible subset of N and hence the sets

∅, {1}, {5, 23}, {2, 4, 6, 8, . . .}, {100, 101, 102, . . .}, {2, 3, 5, 7, 11, 13, . . .}

all belong to P(N).

B.11. Concerning Exceptional Penguins

Definition. A penguin p is called exceptional if p can fly.

This is clearly a definition whose use may lead to contradictions and falsehoods. Just because we can define something, it does not mean that this thing necessarily exists, that it is logically sound, or that it is necessarily a useful concept. Furthermore, you can never be sure that you will not run into contradictions far down the road if you just “invent” a new mathematical concept.

The present digression concerns odd perfect numbers, a class of much-studied numbers which few people actually believe exist. Nevertheless, there is an enormous literature on the subject and many theorems have been proved about them.

The Pythagoreans (who were numerologists) regarded the number 6 as special because it is equal to the sum of its proper divisors. Specifically

1 + 2 + 3 = 6.

The next largest numbers with this property are 28, 496, and 8128 since:

28 = 1 + 2 + 4 + 7 + 14

496 = 1 + 2 + 4 + 8 + 16 + 31 + 62 + 124 + 248

8128 = 1 + 2 + 4 + 8 + 16 + 32 + 64 + 127

+254 + 508 + 1016 + 2032 + 4064.
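These divisor sums are easy to confirm by machine. The sketch below searches for numbers equal to the sum of their proper divisors; the function name and the search bound of 10000 are assumptions made for the illustration.

```python
def proper_divisor_sum(n):
    """Sum of the divisors of n that are smaller than n."""
    if n < 2:
        return 0
    total = 1                      # 1 divides every n >= 2
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:        # avoid counting a square root twice
                total += n // d
        d += 1
    return total

# Numbers below 10000 that equal the sum of their proper divisors.
perfect = [n for n in range(2, 10000) if proper_divisor_sum(n) == n]
print(perfect)   # [6, 28, 496, 8128]
```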

One of the cornerstones of Pythagorean philosophy was the assignment of mystical qualities to numbers. They chose to call numbers like 6, 28, 496, and 8128 perfect numbers.

Later philosophers and theologians like St. Augustine and Alcuin of York would expound the special nature of such numbers. For instance, in The City of God, St. Augustine (354-430) said:

Six is a number perfect in itself, and not because God created the world in six days; rather the contrary is true. God created the world in six days because this number is perfect, and it would remain perfect, even if the work of the six days did not exist.

The fact that it takes 28 days for the Moon to travel round the Earth was also seen to confirm the importance of perfect numbers.

In his book Introductio Arithmetica, Nicomachus of Gerasa (ca. 60-120 C.E.) conjectured that there is one perfect number with exactly k digits for each k ≥ 1 and that they alternate ending in 6 and 8. Both of these claims are incorrect, since the fifth and sixth perfect numbers are 33,550,336 and 8,589,869,056.

After Euclid and until Euler, most mathematicians implicitly assumed that all perfect numbers were generated by a formula due to Euclid and Euler⁹. This formula produces only even perfect numbers. Some, like Descartes and Mersenne, admitted that they saw no reason why odd perfect numbers should not exist, despite the fact that no one had yet found one.

Euler was one of the first to attack one of the most intriguing (and one of the oldest) problems in number theory and proved an important theorem on odd perfect numbers. Although no odd perfect numbers are known to exist, there are many conditions that a hypothetical odd perfect number must satisfy. As Sylvester (1814-1897) noted:

. . . the existence of [an odd perfect number] – its escape, so to say, from the complex web of conditions which hem it in on all sides – would be little short of a miracle.

A (by no means complete) list of conditions that an odd perfect number n must satisfy is given below:

• n has at least four distinct prime factors. (Cole, 1888)

• If n is not divisible by 3, 5, or 7, then n has at least 26 distinct prime factors. (Catalan, 1888)

• n has at least 5 distinct prime factors and n > 2 · 10⁶. (Turcanov, 1908)

• n has at least 6 distinct prime factors. (Gradshtein, 1925)

• Not all of the even exponents kᵢ can be 2. (Steuerwald, 1937)

• Not all of the even exponents kᵢ can be 4. Nor is it possible for one of them to be 4 and the others 2. (Kanold, 1941)

• n > 10²⁰. (Kanold, 1957)

• Not all of the even exponents kᵢ can be 6. (Haggis, McDaniel, 1972)

• n > 10³⁶. (Tuckerman, 1976)

• If 3 does not divide n then n has at least 11 distinct prime factors. (Haggis, 1983)

⁹ Marking a collaboration of “Eu” mathematicians 2000 years apart!

• n > 10¹⁶⁰. (Brent, Cohen, 1987)

• n > 10³⁰⁰. (Brent, 1991)

A whole theory has been developed about a class of numbers that probably do not exist. The goal of these mathematicians is to show that odd perfect numbers are so strange that they cannot exist (so the whole topic of odd perfect numbers can be thought of as a giant proof by contradiction). Their work makes it seem extremely plausible that odd perfect numbers do not exist, but so far a proof has evaded mathematicians for over 2000 years.

This indicates that one should take definitions seriously. One can define a concept (odd perfect numbers, for instance) and prove many theorems about this concept, but nevertheless the concept may be vacuous since the concept may contradict earlier axioms. For instance, someone might prove one day that odd perfect numbers do not exist. The theorems quoted above would therefore be about a class of numbers that do not exist.¹⁰

¹⁰ Although in this case, fortunately, the aim of these theorems is to show that the properties that an odd perfect number must satisfy are so restrictive that no number can satisfy them. In particular, no one actually believes that odd perfect numbers exist.

APPENDIX C

Mathematical Induction

C.1. The Power Sum Problem

\[ \text{What is } 1 + 2 + 3 + \cdots + 100\,? \tag{C.1} \]

According to mathematical folklore, the correct answer (namely 5050) was given immediately by the young Carl Friedrich Gauss (1777-1855) when his teachers assigned his class this “busy work” problem. His teachers soon realized Gauss’ prodigious talent and his education was later sponsored by the Duke of Brunswick.

The young Gauss found the sum (C.1) using the formula

\[ 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}. \tag{C.2} \]

This formula¹ can be derived by adding the two equations

\[
\begin{aligned}
1 + 2 + 3 + \cdots + n &= S \\
n + (n-1) + (n-2) + \cdots + 1 &= S
\end{aligned}
\]

together (each of the n columns on the left sums to n + 1) to obtain the equation

\[ n(n+1) = 2S \]

for the unknown sum S. Dividing by 2 yields the formula $S = \tfrac{1}{2}\,n(n+1)$. In particular, setting n = 100 in Gauss’ formula (C.2) gives the answer 5050 to (C.1).

What about sums of squares? Noting that

\[
\begin{aligned}
1^2 &= 1, \\
1^2 + 2^2 &= 5, \\
1^2 + 2^2 + 3^2 &= 14, \\
1^2 + 2^2 + 3^2 + 4^2 &= 30, \\
1^2 + 2^2 + 3^2 + 4^2 + 5^2 &= 55,
\end{aligned}
\]

some trial and error might lead us to conjecture the formula

\[ 1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}. \tag{C.3} \]

Of course, we have not actually proved that (C.3) is the correct formula since checking a finite number of cases does not prove that the formula is valid for every n = 1, 2, . . .. Consider the following example:

Example C.1. Let p(n) = n² + n + 41 so that p(0) = 41, p(1) = 43, p(2) = 47, p(3) = 53, p(4) = 61, . . . . Do you notice a pattern? It appears that p(0), p(1), p(2), . . . are always primes. In fact, p(n) is prime for n = 0, 1, 2, . . . , 39 but p(40) is composite.

¹ The formula (C.2) was known since ancient times (and hence merely “rediscovered” by the young Gauss).

Indeed, p(40) = 40² + 40 + 41 = 40(40 + 1) + 41 = 40 · 41 + 41 = 41 · 41. This striking example is due to Leonhard Euler.
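Euler's example is easy to explore numerically. The following sketch uses a simple trial-division primality test (the helper name `is_prime` and the search range are assumptions of the illustration) to locate the first n for which p(n) fails to be prime.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def p(n):
    return n * n + n + 41

# First n >= 0 for which p(n) is composite.
first_composite = next(n for n in range(100) if not is_prime(p(n)))
print(first_composite)        # 40
print(p(40) == 41 * 41)       # True
```

Forty consecutive successes followed by a failure is a good reminder that numerical evidence, however long the run, is not a proof.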

Is (C.3) correct? Can we prove it? Furthermore, how do we find formulas for sums of cubes and higher powers?

C.2. Mathematical Induction

Suppose that P(n) is a statement depending on the natural number n. The Principle of Mathematical Induction² states that if both

(1) P(1) is true,

(2) If P(n) is true, then P(n + 1) is true,

are true statements, then P(n) is true for all n ≥ 1.

The informal reason why is the following. By (1), P(1) is true. By (2), P(2) = P(1 + 1) is true since P(1) is true. By (2), P(3) = P(2 + 1) is true since P(2) is true, and so on. Another way to think about induction is to imagine climbing an infinite ladder. Condition (2) means that if you are on step #n of the ladder, then you are able to climb to step #(n + 1). Condition (1) says that you start on step #1. We conclude from this that we can eventually reach every single step on the ladder.³

Proving that P(1) is true is called the “base case” of the induction. The second step is a little more conceptually difficult. We must prove the implication

If P(n) is true, then P(n + 1) is true.

We are not asserting that P(n) IS true for all n, but rather we are trying to prove that P(n + 1) is true under the inductive hypothesis that P(n) is true. The key word is “if.”

Theorem C.1. The summation formula

\[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2} \tag{C.4} \]

holds for all n ∈ N.

Proof. Let P(n) be the formula (C.4). We want to show that P(n) is true for all (integers) n ≥ 1.

BASE CASE: P(1) is true since $1 = \frac{1 \cdot 2}{2}$. This establishes the base case.

INDUCTIVE STEP: If P(n) is true for some number n, does it follow that P(n + 1) is also true? We basically want to prove the statement “If P(n) is true, then P(n + 1) is true.” In other words, we must show that

\[
\left( 1 + 2 + \cdots + n = \frac{n(n+1)}{2} \right)
\;\Rightarrow\;
\left( 1 + 2 + \cdots + n + (n+1) = \frac{(n+1)((n+1)+1)}{2} \right).
\]

Therefore our goal is to derive the formula

\[ 1 + 2 + \cdots + n + (n+1) = \frac{(n+1)((n+1)+1)}{2} \]

² It is possible to prove the principle of mathematical induction from the axioms of set theory, but we will not do that here.

³ Note that there is no “infinity step” and that it is improper to speak of a “last step” since there is no “last natural number.”

from the formula

\[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2}. \]

Adding n + 1 to both sides of the preceding formula gives

\[
\begin{aligned}
(1 + 2 + \cdots + n) + (n+1) &= \frac{n(n+1)}{2} + (n+1) \\
&= \frac{n(n+1) + 2(n+1)}{2} \\
&= \frac{(n+1)(n+2)}{2} \\
&= \frac{(n+1)((n+1)+1)}{2}.
\end{aligned}
\]

Therefore P(n + 1) is true if P(n) is true. By induction, the formula holds for all n ∈ N. □

Similarly, we can prove that (C.3) is correct by mathematical induction:

Theorem C.2. The summation formula

\[ 1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \tag{C.5} \]

holds for all natural numbers n ≥ 1.

Proof. We proceed using mathematical induction.

BASE CASE: The base case n = 1 is easily verified:

\[ 1^2 = \frac{1 \cdot 2 \cdot 3}{6}. \]

INDUCTIVE STEP: Suppose that the formula (C.5) holds for some unspecified value of n (this is our inductive hypothesis). In other words, suppose that

\[ 1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \tag{C.6} \]

for some specific value of n. Adding $(n+1)^2$ to both sides of (C.6) we obtain

\[
\begin{aligned}
1^2 + 2^2 + \cdots + n^2 + (n+1)^2
&= \frac{n(n+1)(2n+1)}{6} + (n+1)^2 \\
&= \frac{n(n+1)(2n+1)}{6} + (n^2 + 2n + 1) \\
&= \frac{n(n+1)(2n+1)}{6} + \frac{6(n^2 + 2n + 1)}{6} \\
&= \tfrac{1}{6}\left( (n^2 + n)(2n+1) + 6n^2 + 12n + 6 \right) \\
&= \tfrac{1}{6}\left( 2n^3 + 3n^2 + n + 6n^2 + 12n + 6 \right) \\
&= \tfrac{1}{6}\left( 2n^3 + 9n^2 + 13n + 6 \right) \\
&= \tfrac{1}{6}\left( (2n^3 + 7n^2 + 6n) + (2n^2 + 7n + 6) \right) \\
&= \tfrac{1}{6}(n+1)(2n^2 + 7n + 6) \\
&= \tfrac{1}{6}(n+1)(n+2)(2n+3) \\
&= \frac{(n+1)((n+1)+1)(2(n+1)+1)}{6}.
\end{aligned}
\]

Hence if (C.5) holds for some value of n, it must also hold for n + 1 as well. Since we have established that the formula holds for n = 1, it follows that it also holds for n = 2. Since it holds for n = 2, it must hold for n = 3, and so on. This is the essence of mathematical induction. □
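A computer can compare both closed forms with brute-force sums for many values of n. As Example C.1 warns, this is only a sanity check and not a proof; the function names and the range of n tested below are assumptions made for the sketch.

```python
def sum_formula(n):
    """Closed form for 1 + 2 + ... + n."""
    return n * (n + 1) // 2

def square_sum_formula(n):
    """Closed form for 1^2 + 2^2 + ... + n^2."""
    return n * (n + 1) * (2 * n + 1) // 6

# Compare each closed form with the directly computed sum.
formulas_agree = all(
    sum_formula(n) == sum(range(1, n + 1))
    and square_sum_formula(n) == sum(k * k for k in range(1, n + 1))
    for n in range(1, 501)
)
print(formulas_agree)
print(sum_formula(100))   # 5050
```

Integer division `//` is safe here because n(n + 1) is always even and n(n + 1)(2n + 1) is always divisible by 6.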

Based on the facts that

\[ 1 + 1 + 1 + \cdots + 1 = n \]

\[ 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \tag{C.7} \]

\[ 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \tag{C.8} \]

we might conjecture that the power sum

\[ S_m(n) = 1^m + 2^m + \cdots + n^m \tag{C.9} \]

is always a polynomial in the variable n of degree m + 1. This was proved by Jacob Bernoulli (1654-1705) in his book Ars Conjectandi (published posthumously in 1713). He proudly proclaimed that “in less than half a quarter of an hour” he was able to sum the tenth powers of the first thousand integers. Before we solve this old problem, we need to reintroduce the binomial coefficient, first encountered in calculus.

C.3. The Binomial Coefficient $\binom{n}{k}$

You may have seen the binomial coefficient

\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \tag{C.10} \]

before. Here n! (where n is a natural number) denotes the product

\[ n! = 1 \cdot 2 \cdot 3 \cdots (n-1) \cdot n \]

if n ≥ 1. If n = 0, then 0! is defined to be 1. Although it is not obvious, it turns out that $\binom{n}{k}$ is always an integer. Indeed, looking at the formula (C.10), it is actually somewhat remarkable that $\binom{n}{k}$ is an integer! We will see why shortly.

The symbol $\binom{n}{k}$ is sometimes read “n choose k” since it turns out that this also represents the number of ways to choose k objects from a collection of n. In other words, $\binom{n}{k}$ is the number of k element subsets of a set with n elements:

Theorem C.3. Let S denote a set containing exactly n elements. For any non-negative integer k ≤ n, the number of subsets of S containing precisely k elements is given by $\binom{n}{k}$.

Proof. If S has n elements and we wish to form an ordered list with exactly k (necessarily distinct) elements, we have n ways to choose the first element, n − 1 ways to choose the second, and so forth, down to n − k + 1 ways to choose the kth. There are therefore n(n − 1) · · · (n − k + 1) = n!/(n − k)! ordered lists of k elements from S. Each subset with k elements appears in exactly k! of these lists (once for each ordering of its elements), so the number of distinct subsets containing precisely k elements is n!/(k!(n − k)!). □

Corollary 11. $\binom{n}{k}$ is always an integer.

Although it follows from the combinatorial interpretation that $\binom{n}{k}$ is always an integer, we will present an independent proof of this fact later. The following example shows how the preceding theorem works:

Example C.2. Consider the set S = {a, b, c, d, e}. How many two element subsets of S are there? To make a two element subset, we first need to choose one element, and there are 5 ways of doing this. Let’s say that we pick a:

{a}.

Now we have to choose an additional element of S to put into our subset. There are 4 additional ways of doing this. Let’s pick b:

{a, b}.

We have produced a two element subset of S. There were

\[ 5 \times 4 = \frac{5!}{3!} \]

ways of doing this. However, if we had chosen b first and then a, we would have {b, a} instead. But {a, b} = {b, a} and we must therefore divide $\frac{5!}{3!}$ by the number of ways to order a set with 2 objects, namely 2!. Therefore the total number of 2 element subsets of S is $\binom{5}{2}$ or “5 choose 2.” Of course this example does not prove anything, but it gives you a little bit of the feel for the proof of the preceding theorem.
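The count in this example can be confirmed by listing the subsets explicitly. The sketch below uses `itertools.combinations` and `math.comb` (available in Python 3.8 and later) from the standard library; the strings stand in for the symbols a, b, c, d, e.

```python
from itertools import combinations
from math import comb, factorial

S = ["a", "b", "c", "d", "e"]
n = len(S)

# Count the k element subsets by brute force and compare with n!/(k!(n-k)!).
counts_match = all(
    sum(1 for _ in combinations(S, k))
    == factorial(n) // (factorial(k) * factorial(n - k))
    == comb(n, k)
    for k in range(n + 1)
)
print(counts_match)
print(len(list(combinations(S, 2))))   # 10 two element subsets
```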

C.4. Pascal’s Triangle

A useful mnemonic for remembering binomial coefficients is Pascal’s Triangle, the first few rows of which are reproduced below:

                    1
                  1   1
                1   2   1
              1   3   3   1
            1   4   6   4   1
          1   5  10  10   5   1          (C.11)

From Pascal’s Triangle, one can deduce Pascal’s Rule which describes the (n + 1)st row of Pascal’s Triangle in terms of the nth row.

Theorem C.4 (Pascal’s Rule). For n, k ≥ 0,

\[ \binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1}. \]

Proof. This is a straightforward computation:

\[
\begin{aligned}
\binom{n}{k} + \binom{n}{k+1}
&= \frac{n!}{k!\,(n-k)!} + \frac{n!}{(k+1)!\,(n-k-1)!} \\
&= \frac{n!\,(k+1)}{(k+1)!\,(n-k)!} + \frac{n!\,(n-k)}{(k+1)!\,(n-k)!} \\
&= \frac{n!\,(k+1+n-k)}{(k+1)!\,((n+1)-(k+1))!} \\
&= \frac{(n+1)!}{(k+1)!\,((n+1)-(k+1))!} \\
&= \binom{n+1}{k+1}. \qquad \square
\end{aligned}
\]

Using Pascal’s Rule we see that the entries $\binom{n+1}{k}$ in the (n + 1)st row of the triangle are integers precisely because the entries $\binom{n}{k}$ in the preceding row are integers. Sometimes Pascal’s Rule is written in the form:

\[ \binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k} \]

for n ≥ 1 and 1 ≤ k ≤ n.

Corollary 12. $\binom{n}{k}$ is always an integer.

Proof. We prove this by induction on n. Here P(n) is the statement

$\binom{n}{k}$ is an integer for all k ∈ {0, 1, 2, . . . , n}.

We may start our induction at n = 1 since $\binom{0}{0} = 1$ is clearly an integer.

BASE CASE: The statement P(1) is true since $\binom{1}{0} = \binom{1}{1} = 1$ follows immediately from the definition of $\binom{n}{k}$.

INDUCTIVE STEP: Now we must show that

If P(n) is true, then P(n + 1) is true

to complete the proof. If P(n) is true, then $\binom{n}{k}$ is an integer when 0 ≤ k ≤ n. We must then show, under this hypothesis, that $\binom{n+1}{k}$ is an integer when 0 ≤ k ≤ n + 1. This is where Pascal’s Rule comes in.

For each k ∈ {1, 2, 3, . . . , n} Pascal’s Rule says that

\[ \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}. \]

If P(n) is true, then $\binom{n}{k-1}$ and $\binom{n}{k}$ are both integers and hence so is $\binom{n+1}{k}$. Therefore $\binom{n+1}{k}$ is an integer when 1 ≤ k ≤ n. For k = 0 and k = n + 1,

\[ \binom{n+1}{0} = 1, \qquad \binom{n+1}{n+1} = 1 \]

follow from the definition of the binomial coefficient. Therefore $\binom{n+1}{k}$ is an integer when 0 ≤ k ≤ n + 1 and hence P(n) implies P(n + 1). This completes the proof. □
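The inductive argument above is effectively an algorithm: each row of Pascal's Triangle is built from the previous one using only integer addition. A minimal sketch follows; the function name `pascal_rows` is an assumption of the illustration, and `math.comb` is used only to cross-check the result.

```python
from math import comb

def pascal_rows(num_rows):
    """Return rows 0 .. num_rows - 1 of Pascal's Triangle, built with Pascal's Rule."""
    rows = [[1]]
    for n in range(1, num_rows):
        prev = rows[-1]
        # Interior entries: C(n, k) = C(n-1, k-1) + C(n-1, k); the ends are 1.
        row = [1] + [prev[k - 1] + prev[k] for k in range(1, n)] + [1]
        rows.append(row)
    return rows

rows = pascal_rows(6)
print(rows[5])   # [1, 5, 10, 10, 5, 1]
```

No division ever occurs in this construction, which is another way of seeing why every entry is an integer.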

C.5. The Binomial Theorem

Expanding out (by brute-force) $(x+y)^n$ shows that the coefficient of the term $x^k y^{n-k}$ in the expansion of $(x+y)^n$ is given by $\binom{n}{k}$. The first few binomial expansions (for small integer exponents) are written below:

\[
\begin{aligned}
(x+y)^0 &= 1 \\
(x+y)^1 &= x + y \\
(x+y)^2 &= x^2 + 2xy + y^2 \\
(x+y)^3 &= x^3 + 3x^2y + 3xy^2 + y^3 \\
(x+y)^4 &= x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4 \\
(x+y)^5 &= x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5.
\end{aligned}
\]

The binomial theorem says that this pattern (based on Pascal’s triangle) continues indefinitely:

Theorem C.5 (Binomial Theorem). The formula

\[ (x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \tag{C.12} \]

holds for any integer n ≥ 1 and any real numbers x, y.

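Proof. We prove this by induction on n. Here P(n) is the statement

\[ (x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}. \]

BASE CASE: The statement P(1) is true since $(x+y)^1 = x + y$.

INDUCTIVE STEP: Now we must prove the statement “If P(n) is true, then P(n + 1) is true.” In other words we must show that

\[
\left[ (x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \right]
\;\Rightarrow\;
\left[ (x+y)^{n+1} = \sum_{k=0}^{n+1} \binom{n+1}{k} x^k y^{(n+1)-k} \right].
\]

If P(n) is true, then

\[
\begin{aligned}
(x+y)^{n+1} &= (x+y)(x+y)^n \\
&= (x+y) \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \\
&= \sum_{k=0}^{n} \binom{n}{k} x^{k+1} y^{n-k} + \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k+1} \\
&= \sum_{k=1}^{n+1} \binom{n}{k-1} x^k y^{n-(k-1)} + \sum_{k=0}^{n} \binom{n}{k} x^k y^{(n+1)-k} \\
&= \sum_{k=1}^{n+1} \binom{n}{k-1} x^k y^{(n+1)-k} + \sum_{k=0}^{n} \binom{n}{k} x^k y^{(n+1)-k} \\
&= \binom{n}{0} x^0 y^{n+1} + \left( \sum_{k=1}^{n} \left[ \binom{n}{k-1} + \binom{n}{k} \right] x^k y^{(n+1)-k} \right) + \binom{n}{n} x^{n+1} y^0 \\
&= \binom{n+1}{0} x^0 y^{n+1} + \sum_{k=1}^{n} \binom{n+1}{k} x^k y^{(n+1)-k} + \binom{n+1}{n+1} x^{n+1} y^0 \\
&= \sum_{k=0}^{n+1} \binom{n+1}{k} x^k y^{(n+1)-k}.
\end{aligned}
\]

We have shown “If P(n) is true, then P(n + 1) is true” and hence P(n) is true for all n ∈ N by induction. □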

Example C.3. The equation

\[ 2^n = (1+1)^n = \sum_{k=0}^{n} \binom{n}{k} \]

follows from the binomial theorem. Recall that $\binom{n}{k}$ is the number of k element subsets of a set with n elements. The preceding equation tells us that there are $2^n$ total subsets of a set with n elements.
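The binomial theorem and the special case above can be spot-checked with exact integer arithmetic. This is only a sketch under stated assumptions: the function name `binomial_expansion` and the sample values of x, y, and n are chosen for illustration.

```python
from math import comb

def binomial_expansion(x, y, n):
    """Right-hand side of (C.12): the sum of C(n, k) x^k y^(n-k) for k = 0 .. n."""
    return sum(comb(n, k) * x**k * y**(n - k) for k in range(n + 1))

# Compare with (x + y)^n for a few integer samples.
samples = [(3, -5, 7), (2, 2, 10), (1, 1, 12), (-4, 9, 5)]
theorem_checks = all(binomial_expansion(x, y, n) == (x + y) ** n
                     for x, y, n in samples)
print(theorem_checks)

# Setting x = y = 1 gives the count of all subsets of an n element set.
print(binomial_expansion(1, 1, 10) == 2 ** 10)
```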

C.6. Bernoulli’s Solution to the Power Sum Problem

Jacob Bernoulli discovered an extremely clever solution to the power sum problem, which we present here. Using the Binomial Theorem to expand $(x+1)^{m+1}$ we find that

\[
\begin{aligned}
(x+1)^{m+1} - x^{m+1}
&= \left( \sum_{k=0}^{m+1} \binom{m+1}{k} x^k \cdot 1^{m+1-k} \right) - x^{m+1} \\
&= \left( \sum_{k=0}^{m+1} \binom{m+1}{k} x^k \right) - x^{m+1} \\
&= \left[ 1 + \binom{m+1}{1} x + \binom{m+1}{2} x^2 + \cdots + \binom{m+1}{m} x^m + x^{m+1} \right] - x^{m+1} \\
&= 1 + \binom{m+1}{1} x + \binom{m+1}{2} x^2 + \cdots + \binom{m+1}{m} x^m,
\end{aligned}
\]

yielding the formula

\[ (x+1)^{m+1} - x^{m+1} = 1 + \binom{m+1}{1} x + \binom{m+1}{2} x^2 + \cdots + \binom{m+1}{m} x^m. \]

Since this holds for x = 1, 2, 3, . . . , n, we may add this equation to itself as x goes from 1 to n to obtain

\[
\sum_{x=1}^{n} \left[ (x+1)^{m+1} - x^{m+1} \right]
= \sum_{x=1}^{n} \left[ 1 + \binom{m+1}{1} x + \binom{m+1}{2} x^2 + \cdots + \binom{m+1}{m} x^m \right].
\]

The sum on the left “telescopes” and hence

\[
\begin{aligned}
(n+1)^{m+1} - 1
&= \sum_{x=1}^{n} \left[ 1 + \binom{m+1}{1} x + \binom{m+1}{2} x^2 + \cdots + \binom{m+1}{m} x^m \right] \\
&= \left( \sum_{x=1}^{n} 1 \right) + \binom{m+1}{1} \left( \sum_{x=1}^{n} x \right) + \binom{m+1}{2} \left( \sum_{x=1}^{n} x^2 \right) + \cdots + \binom{m+1}{m} \left( \sum_{x=1}^{n} x^m \right) \\
&= n + \binom{m+1}{1} S_1(n) + \binom{m+1}{2} S_2(n) + \cdots + \binom{m+1}{m} S_m(n)
\end{aligned}
\]

where

\[ S_m(n) = 1^m + 2^m + \cdots + n^m \]

denotes the sum of the first n mth powers. All of these computations yield Bernoulli’s formula

\[ (n+1)^{m+1} - 1 = n + \binom{m+1}{1} S_1(n) + \binom{m+1}{2} S_2(n) + \cdots + \binom{m+1}{m} S_m(n). \]

This is a recursive formula for $S_m(n)$. In other words, if we have formulas for $S_k(n)$ for k = 1, 2, . . . , m − 1 we can solve the equation above for $S_m(n)$.
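The recursion can be carried out numerically for a fixed n. The sketch below is one possible reading of Bernoulli's formula as a procedure, not a reconstruction of his own computation: it solves the formula for S_j(n) in turn for j = 1, . . . , m, using the fact that the coefficient of S_j(n) is C(j + 1, j) = j + 1. The function name `power_sum` is an assumption, and m ≥ 1 is required.

```python
from math import comb

def power_sum(m, n):
    """Compute S_m(n) = 1^m + ... + n^m via Bernoulli's recursive formula (m >= 1)."""
    sums = {}
    for j in range(1, m + 1):
        # Contribution of the lower-order sums S_1(n), ..., S_{j-1}(n).
        lower = sum(comb(j + 1, k) * sums[k] for k in range(1, j))
        # (n+1)^(j+1) - 1 = n + lower + (j+1) * S_j(n); solve for S_j(n).
        sums[j] = ((n + 1) ** (j + 1) - 1 - n - lower) // (j + 1)
    return sums[m]

# Cross-check against the directly computed sums.
print(power_sum(3, 10) == sum(x ** 3 for x in range(1, 11)))
print(power_sum(10, 1000) == sum(x ** 10 for x in range(1, 1001)))
```

The second check is the computation Bernoulli boasted about: the sum of the tenth powers of the first thousand integers.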

Example C.4. Recall that our experimentation suggested that

\[ S_3(n) = 1^3 + 2^3 + \cdots + n^3 = \left[ \frac{n(n+1)}{2} \right]^2. \]

This formula can be derived from Bernoulli’s recursive procedure. Indeed, we have

\[ S_1(n) = \frac{n(n+1)}{2}, \qquad S_2(n) = \frac{n(n+1)(2n+1)}{6} \]

and hence setting m = 3 in Bernoulli’s formula we see that

\[ (n+1)^4 - 1 = n + \binom{4}{1} \frac{n(n+1)}{2} + \binom{4}{2} \frac{n(n+1)(2n+1)}{6} + \binom{4}{3} S_3(n). \]

Expanding out both sides of the preceding equation yields

\[ n^4 + 4n^3 + 6n^2 + 4n = n + (2n^2 + 2n) + (2n^3 + 3n^2 + n) + 4 S_3(n). \]

Collecting common terms reduces the preceding to

\[ n^4 + 2n^3 + n^2 = 4 S_3(n) \]

from which it follows that

\[ S_3(n) = \left[ \frac{n(n+1)}{2} \right]^2 \]

as desired. Although this formula could also be proved using mathematical induction, one would first have to know the formula beforehand (i.e. via numerical computations and guesswork, as we have done). The advantage of Bernoulli’s method is that knowledge of lower order power sums leads directly to formulas for higher order power sums, without having to derive formulas from numerical computations and inspired guesswork.

APPENDIX D

Ordered Fields

D.1. Fields

The two prominent modern methods of constructing the real numbers (starting only with the rational numbers, set theory, and logic) are through Dedekind cuts or equivalence classes of Cauchy sequences. We will briefly touch on these later on in the course and through the homework. However, we will not dwell on them now. Rather, we will examine the properties of the real numbers that make them what (we think) they are.¹

Let us assume for the moment that R exists. What type of object is R? Where does it fit into the grand scheme of things? In algebraic terminology, the real numbers R form a field, a type of “generalized number system” which shares many of the standard properties of elementary arithmetic.

Definition. A field is a set K endowed with two operations, denoted + and ·, which satisfy the following axioms:

(i) COMMUTATIVITY: x + y = y + x and x · y = y · x for every x, y ∈ K.

(ii) ASSOCIATIVITY: (x + y) + z = x + (y + z) and (x · y) · z = x · (y · z) for every x, y, z ∈ K.

(iii) DISTRIBUTIVITY: x · (y + z) = x · y + x · z for every x, y, z ∈ K.

(iv) ADDITIVE AND MULTIPLICATIVE IDENTITIES: There are distinct elements called 0 and 1 of K such that x + 0 = x and 1 · x = x for every x ∈ K.

(v) ADDITIVE AND MULTIPLICATIVE INVERSES: For each x ∈ K, there exists an element of K, denoted −x, such that x + (−x) = 0. For any nonzero x ∈ K, there exists an element of K, denoted x⁻¹, such that x · x⁻¹ = 1.

It is important to be explicit about these axioms, for there are many algebraic systems which do not obey all of the rules above. For instance, one can add and multiply n × n matrices, but matrix multiplication is not commutative, nor does every nonzero matrix have an inverse.

Most of the rules of basic algebra that you are familiar with from grade school can be proved from these basic axioms. Unless you have taken abstract algebra, you might not have known that there are many other “number systems” that obey these rules too.

It is important to understand that many different fields exist, and that the operations + and · do not necessarily correspond to our usual understanding of addition and multiplication. Furthermore, the symbols 0 and 1 do not necessarily correspond to the numbers 0 and 1, in the usual sense. Consider the following examples:

¹ Fortunately, mathematicians have proved that the real number system exists and that it satisfies the properties of a “complete ordered field.” These properties are not assumed as axioms, rather they can be deduced logically from either construction method referred to above.

Example D.1. Let K be a set containing the symbols 0 and 1 and define the operations + and · by the following tables:

+ | 0  1          · | 0  1
--+------        --+------
0 | 0  1          0 | 0  0
1 | 1  0          1 | 0  1

One can check that K = {0, 1}, equipped with the operations above, forms a field. In fact, you are already familiar with this field since it corresponds to the “algebra of even and odd numbers” (represented by 0 and 1).

Example D.2. One can sometimes make new fields from pieces of old number systems. If p is a prime number, then the set Z_p = {0, 1, 2, . . . , p − 1} forms a field² when the operations are defined by

x + y = remainder of x + y when divided by p

x · y = remainder of x · y when divided by p

As expected, 0 and 1 play the role of additive and multiplicative identities in this field. Note also that Z₂ is simply the field from the previous example.
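The axiom most sensitive to the choice of modulus is the existence of multiplicative inverses, and it can be tested by brute force for small moduli. In the sketch below the function name `has_all_inverses` and the range of moduli tried are assumptions made for the illustration.

```python
def has_all_inverses(n):
    """True if every nonzero x in {0, ..., n-1} has some y with x*y leaving remainder 1 mod n."""
    return all(any((x * y) % n == 1 for y in range(1, n))
               for x in range(1, n))

# Moduli between 2 and 12 for which arithmetic mod n has all inverses.
print([n for n in range(2, 13) if has_all_inverses(n)])   # [2, 3, 5, 7, 11]

# For n = 4 the element 2 has no inverse: 2*y mod 4 is always 0 or 2.
print(sorted({(2 * y) % 4 for y in range(4)}))
```

The moduli that pass are exactly the primes in the tested range, in line with the statement of the example.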

Example D.3. The rational numbersQ, endowed with the standard operations, form afield. It is asubfieldof R.

Example D.4. The setQ(

√2) = {a + b

√2 : a, b ∈ Q},

endowed with the usual operations of addition and multiplication, is also field.

Example D.5. The complex number system $\mathbb{C}$ is a field. Notice also that

\[ \mathbb{Q} \subset \mathbb{Q}(\sqrt{2}) \subset \mathbb{R} \subset \mathbb{C}. \]

Example D.6. $\mathbb{R}(x)$, the set of (real) rational functions, is a field (when endowed with the usual addition and multiplication of functions). The constant functions $0$ and $1$ are the additive and multiplicative identities.

MORAL: Although $\mathbb{R}$ is a field, the field axioms (i.e., the standard properties of commutativity, associativity, and distributivity) do not narrow things down to the point where $\mathbb{R}$ is the only such object. Can we list more properties of $\mathbb{R}$? In fact, can we find a list of properties that characterize $\mathbb{R}$ completely?

D.2. Ordered Fields

One property that helps to distinguish $\mathbb{R}$ from other fields is the fact that $\mathbb{R}$ comes equipped with an ordering. Specifically, the real numbers form what is called an ordered field. In addition to the standard field axioms, an ordered field also satisfies the following:

Definition. A field $K$ is an ordered field if there is a subset $K^+$ of $K$ such that

(i) If $x, y \in K^+$, then $x + y \in K^+$ and $x \cdot y \in K^+$.

(ii) TRICHOTOMY: For each $x \in K$, one and only one of the following is true: $x \in K^+$, $x = 0$, $-x \in K^+$.

2 If $p$ is not a prime number, then $\mathbb{Z}_p$ is not a field. For instance, $2$ has no multiplicative inverse in $\mathbb{Z}_4$.


One then says that $x < y$ if $y - x \in K^+$. The elements of $K^+$ are called positive and the elements $x$ such that $-x \in K^+$ are called negative.

Example D.7. $\mathbb{Q}$, $\mathbb{Q}(\sqrt{2})$, and $\mathbb{R}$, endowed with the usual notions of positive and negative, are ordered fields.

Example D.8. The field $\mathbb{R}(x)$ of all rational functions in the variable $x$ can be ordered. Specifically, we say that $f >_{\mathbb{R}(x)} 0$ if $f$ is eventually positive. In other words,

\[ (f >_{\mathbb{R}(x)} 0) \iff (\exists M > 0)(\forall x > M)(f(x) > 0). \]

Although this example may seem somewhat alien at first, it directly corresponds to the intuitive notion of "how strong a function is as $x \to \infty$." For instance, in the ordering of $\mathbb{R}(x)$ we have $x^2 > x > 1/x$. Unlike $\mathbb{R}$, however, $\mathbb{R}(x)$ does not have the Archimedean Property. Indeed, in $\mathbb{R}(x)$ we have $1 \le_{\mathbb{R}(x)} x$, but $n \cdot 1 \le_{\mathbb{R}(x)} x$ also holds for every $n \in \mathbb{N}$, so no multiple of $1$ ever exceeds $x$.

Every ordered field comes equipped with an absolute value, defined by:

\[ |x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases} \]

It is not too hard to show that the absolute value enjoys the standard features that we all expect it to. However, there are two important properties that are often forgotten:

|x + y| ≤ |x| + |y| (Triangle Inequality)

||x| − |y|| ≤ |x − y| (Reverse Triangle Inequality).

Know these inequalities well; you will use them many times in this course.

Another important consequence of the order axioms is the Trichotomy Law:

Theorem D.1 (Trichotomy Law). Let $K$ be an ordered field. Given $x, y \in K$, one and only one of the following statements is true: $x < y$, $x = y$, $x > y$.

Example D.9. Ordered fields are much rarer than fields. For example, no finite field is an ordered field. Furthermore, the complex number system $\mathbb{C}$ is not ordered. Indeed, if $\mathbb{C}$ were an ordered field, then by the Trichotomy Law, either $i > 0$, $i = 0$, or $i < 0$. Manipulating these inequalities quickly leads to contradictions (try it).

Example D.10. Some fields can be ordered in more than one way. For example, $\mathbb{Q}(\sqrt{2})$ sits "inside" $\mathbb{R}$, and as such has a natural ordering. However, one can declare a new ordering by saying that $a + b\sqrt{2}$ is positive in the "new sense" if $a - b\sqrt{2}$ is positive in the usual sense. It requires some checking, but it turns out that this gives $\mathbb{Q}(\sqrt{2})$ two possible orderings. Fortunately, $\mathbb{Q}$ and $\mathbb{R}$ themselves can be ordered in one and only one way (this requires checking too).

Adding the order axioms to the field axioms narrows things down a bit. We are closer to obtaining a list of properties that characterizes $\mathbb{R}$. However, $\mathbb{Q}$, $\mathbb{Q}(\sqrt{2})$, and $\mathbb{R}(x)$ are also ordered fields. We therefore need to add at least one more axiom to make sure that we have completely characterized $\mathbb{R}$.

APPENDIX E

Prime Numbers

E.1. Euclid’s Theorem

Recall that the prime numbers are the building blocks of all integers. You are probably at least informally acquainted (via grade school arithmetic) with many of their basic properties.

Definition. An integer $p > 1$ is called a prime number if there is no (integer) divisor $d$ of $p$ such that $1 < d < p$. An integer $n > 1$ that is not prime is called a composite number.

Example E.1. The integers 2, 3, 5, and 7 are primes and 4, 6, 8, and 9 are composites. Less obvious examples are $1299709$ (the $100000$th prime number) and $1299711$, which is divisible by $3$ and hence composite.

Theorem E.1 (Fundamental Theorem of Arithmetic). Every integer $n > 1$ can be expressed as a product of primes. Specifically, we may write $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}$ where the $p_k$ are distinct primes and the $a_k$ are positive integers. The factorization of an integer $n > 1$ into primes is unique, apart from the order of the prime factors.

This theorem first appeared (somewhat vaguely) as Proposition 14 of Book IX of Euclid's book the Elements (ca. 300 BCE):

"If a number be the least that is measured by prime numbers, it will not be measured by any other prime except those originally measuring it."

However, C.F. Gauss (in his groundbreaking 1801 treatise Disquisitiones Arithmeticae) was the first to state and prove the Fundamental Theorem of Arithmetic in a rigorous way. Incidentally, Gauss was also the first to prove the Fundamental Theorem of Algebra in a rigorous way!

An important mathematical fact is that the set

\[ P = \{2, 3, 5, 7, 11, 13, 17, 19, 23, \ldots\} \]

of prime numbers is infinite. This nontrivial assertion, now known as Euclid's theorem, was proved in Book IX of Euclid's book the Elements. Euclid's proof, along with the irrationality of $\sqrt{2}$ (commonly attributed to Pythagoras, but most likely due to the Pythagorean Hippasus of Metapontum), is considered one of the most mathematically elegant contributions of the ancient Greeks.

In his famous book A Mathematician's Apology, the great early 20th century English mathematician G.H. Hardy stated that

I can hardly do better than go back to the Greeks. I will state and prove two of the famous theorems of Greek mathematics. They are 'simple' theorems, both in idea and in execution, but there is no doubt at all about their being theorems of the highest class. Each is as fresh and significant as when it was discovered . . . two thousand years have not written a wrinkle on either of them . . . The first is Euclid's proof of the existence of an infinity of prime numbers. . . . My second example is Pythagoras's proof of the 'irrationality' of $\sqrt{2}$. . .

Euclid's proof is startling in its simplicity and its elegant use of reductio ad absurdum (proof by contradiction). As Hardy says:

Reductio ad absurdum, which Euclid loved so much, is one of a mathematician's finest weapons. It is a far finer gambit than any chess play: a chess player may offer the sacrifice of a pawn or even a piece, but a mathematician offers the game.

We are now ready to prove Euclid’s theorem:

Theorem E.2 (Euclid of Alexandria). The number of primes is infinite.

Proof. Suppose toward a contradiction that the set $P = \{p_1, p_2, \ldots, p_n\}$ of all primes is finite. If this is the case, then the number $N = p_1 p_2 \cdots p_n + 1$ is not divisible by any of the primes $p_j$. Indeed, division by $p_j$ leaves a remainder of $1$ since $N$ is one more than $p_1 p_2 \cdots p_n$, which is divisible by $p_j$. Since $N > 1$, Theorem E.1 guarantees that $N$ has at least one prime factor, and by the preceding remark no prime factor of $N$ can belong to the set $P$ (which was supposed to contain all of the prime numbers). This contradicts our hypothesis that $P$ contains every prime number and hence this hypothesis must be false. In other words, the set of all primes cannot be finite; it must be infinite. ∎
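The construction in Euclid's proof can be carried out on any finite list of primes. The sketch below is an illustration added to these notes, with hypothetical helper names `smallest_prime_factor` and `euclid_witness`; it forms $N = p_1 p_2 \cdots p_n + 1$ and finds a prime factor of $N$ by trial division. Note that $N$ itself need not be prime: the proof only says that its prime factors lie outside the list.

```python
from math import prod


def smallest_prime_factor(n):
    """Smallest prime dividing n (for n >= 2), found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_witness(primes):
    """Return N = (product of the given primes) + 1 and a prime factor of N."""
    N = prod(primes) + 1
    return N, smallest_prime_factor(N)


primes = [2, 3, 5, 7, 11, 13]
N, q = euclid_witness(primes)
print(N, q)                      # 30031 = 59 * 509, so N is composite here
print([N % p for p in primes])   # every remainder is 1
```

Here the witness `q` is a prime that does not appear in the starting list, which is exactly what the argument predicts.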

Although there are infinitely many primes, it is always possible to find arbitrarily large gaps between consecutive primes. The proof of this fact is relatively straightforward and provides an example of a direct proof:

Theorem E.3. There are arbitrarily large gaps in the sequence of primes: for each integer $n \ge 2$, there exists a sequence of $n$ consecutive composite integers.

Proof. Let $n \ge 2$ be a given integer and note that $(n+1)!$, being the product of $1, 2, \ldots, n, n+1$, is divisible by each of the numbers $2, 3, \ldots, n+1$. Therefore

\begin{align*}
(n+1)! + 2 &\text{ is divisible by } 2, \\
(n+1)! + 3 &\text{ is divisible by } 3, \\
&\;\;\vdots \\
(n+1)! + (n+1) &\text{ is divisible by } n+1.
\end{align*}

Each of these $n$ consecutive integers is strictly larger than the divisor listed beside it, so each is composite. Hence there exist $n$ consecutive composite numbers. ∎

Example E.2. For $n = 4$, the construction used in the proof of Theorem E.3 produces the sequence

\begin{align*}
122 &= 2 \cdot 61 \\
123 &= 3 \cdot 41 \\
124 &= 4 \cdot 31 \\
125 &= 5 \cdot 25
\end{align*}

of four consecutive composite integers. However, $24, 25, 26, 27$ and $32, 33, 34, 35$ are both much smaller sequences of composite integers. In general, the method of Theorem E.3 produces much larger sequences than necessary. This also illustrates the fact that although a proof might work, it does not mean that the methods used are necessarily "optimal."
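The gap between the factorial construction and the smallest possible run can be checked directly. This is a rough sketch with made-up function names (`factorial_run`, `first_run`), assuming the run $(n+1)! + 2, \ldots, (n+1)! + (n+1)$ suggested by Example E.2; the brute-force search is only practical for small $n$.

```python
from math import factorial, isqrt


def is_composite(m):
    """True when m has a divisor d with 1 < d < m."""
    return any(m % d == 0 for d in range(2, isqrt(m) + 1))


def factorial_run(n):
    """The n consecutive integers (n+1)! + 2, ..., (n+1)! + (n+1)."""
    base = factorial(n + 1)
    return [base + k for k in range(2, n + 2)]


def first_run(n):
    """Smallest start of a run of n consecutive composites (brute force)."""
    start = 2
    while not all(is_composite(start + j) for j in range(n)):
        start += 1
    return start


run = factorial_run(4)
print(run, all(is_composite(m) for m in run))
print(first_run(4))   # start of the earliest run of four composites
```

For $n = 4$ the construction starts at 122, while the earliest run of four composites starts at 24, matching the example.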


E.2. The Prime Number Theorem

Legendre was the first to publicly make a significant conjecture regarding the large scale distribution of prime numbers. In his Essai sur la Théorie des Nombres (1798), he proposed that

\[ \lim_{x \to \infty} \frac{\pi(x)}{x / (\log x - 1.08366)} = 1, \]

where $\pi(x)$ denotes the number of primes $\le x$ and $\log$ denotes the natural logarithm. Based on numerical evidence, Gauss (as a child) conjectured that

\[ \lim_{x \to \infty} \frac{\pi(x)}{x / \log x} = 1 \tag{E.1} \]

and

\[ \lim_{x \to \infty} \frac{\pi(x)}{\operatorname{Li}(x)} = 1, \tag{E.2} \]

where the function

\[ \operatorname{Li}(x) = \int_2^x \frac{dt}{\log t} \]

is called the logarithmic integral. It appears that Gauss' work on the subject began in 1791 (at the age of fourteen), well before Legendre's book was written. The conjecture (E.1) is true, and it is now known as the Prime Number Theorem. The proof of the Prime Number Theorem would have to wait until the end of the 19th century.

A major step was taken in 1850, when the Russian mathematician Pafnuty Lvovich Chebyshev proved that there exist constants $c_1, c_2$ such that

\[ c_1 \frac{x}{\log x} < \pi(x) < c_2 \frac{x}{\log x} \]

for sufficiently large $x$. He also proved that if

\[ \lim_{x \to \infty} \frac{\pi(x)}{x / \log x} \]

exists, then this limit must equal $1$. Unfortunately, Chebyshev was not able to prove that the limit actually exists.

In 1896, Hadamard and de la Vallée Poussin (independently) proved the celebrated Prime Number Theorem:

Theorem E.4 (Prime Number Theorem).

\[ \lim_{x \to \infty} \frac{\pi(x)}{x / \log x} = 1. \]

Their proofs are technical and involve the use of complex function theory and the Riemann $\zeta$-function. In 1949, Selberg and Erdős succeeded in proving the Prime Number Theorem without using complex function theory. Their so-called elementary proof is exceedingly complicated, but does not use advanced complex analysis.

It is interesting to note that the conjecture (E.2) of the fourteen year old Gauss is also true and more accurate than the standard prime number theorem.
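The claim that (E.2) is the more accurate approximation can be examined numerically. The following sketch is an added illustration, not a proof: it counts primes with a sieve of Eratosthenes and approximates $\operatorname{Li}(x)$ by a crude midpoint rule. The names `prime_count_table` and `li`, and the choice of 100000 integration steps, are assumptions made for this example.

```python
from math import isqrt, log


def prime_count_table(limit):
    """Sieve of Eratosthenes; returns a list pi with pi[x] = #{primes <= x}.

    Assumes limit >= 1.
    """
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for d in range(2, isqrt(limit) + 1):
        if is_prime[d]:
            for m in range(d * d, limit + 1, d):
                is_prime[m] = False
    pi, count = [], 0
    for flag in is_prime:
        count += flag
        pi.append(count)
    return pi


def li(x, steps=100000):
    """Midpoint-rule approximation of the integral of dt / log t over [2, x]."""
    h = (x - 2) / steps
    return sum(h / log(2 + (k + 0.5) * h) for k in range(steps))


pi = prime_count_table(10**6)
for x in (10**3, 10**4, 10**5, 10**6):
    # columns: x, pi(x), pi(x) / (x / log x), pi(x) / Li(x)
    print(x, pi[x], round(pi[x] / (x / log(x)), 4), round(pi[x] / li(x), 4))
```

In the range tested, the ratio against $\operatorname{Li}(x)$ sits noticeably closer to 1 than the ratio against $x/\log x$, which is consistent with the remark above.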

A result of Littlewood (1914) shows that the difference $\pi(x) - \operatorname{Li}(x)$ assumes both positive and negative values infinitely often. However, the first value of $x$ for which $\pi(x) > \operatorname{Li}(x)$ is not known. In 1933, Skewes proved that such an $x$ must occur before

\[ e^{e^{e^{79}}} \approx 10^{10^{10^{34}}}. \tag{E.3} \]

The number (E.3) is called Skewes number and is widely believed to be the largest number that has ever appeared for a genuine purpose. Subsequently this extravagant bound has been reduced to $1.165 \times 10^{1165}$ by Lehman (1966), $8.185 \times 10^{370}$ by te Riele (1987), and it is now known to be somewhat less than $1.39822 \times 10^{316}$.

APPENDIX F

Galileo’s Paradox

The following is the passage from The Discourses and Mathematical Demonstrations Relating to Two New Sciences concerning Galileo's Paradox:

SIMPLICIO: Here a difficulty presents itself which appears to me insoluble. Since it is clear that we may have one line greater than another, each containing an infinite number of points, we are forced to admit that, within one and the same class, we may have something greater than infinity, because the infinity of points in the long line is greater than the infinity of points in the short line. This assigning to an infinite quantity a value greater than infinity is quite beyond my comprehension.

SALVIATI: This is one of the difficulties which arise when we attempt, with our finite minds, to discuss the infinite, assigning to it those properties which we give to the finite and limited; but this I think is wrong, for we cannot speak of infinite quantities as being the one greater or less than or equal to another. To prove this I have in mind an argument which, for the sake of clearness, I shall put in the form of questions to Simplicio who raised this difficulty. I take it for granted that you know which of the numbers are squares and which are not.

SIMPLICIO: I am quite aware that a squared number is one which results from the multiplication of another number by itself; thus 4, 9, etc., are squared numbers which come from multiplying 2, 3, etc., by themselves.

SALVIATI: Very well; and you also know that just as the products are called squares so the factors are called sides or roots; while on the other hand those numbers which do not consist of two equal factors are not squares. Therefore if I assert that all numbers, including both squares and non-squares, are more than the squares alone, I shall speak the truth, shall I not?

SIMPLICIO: Most certainly.

SALVIATI: If I should ask further how many squares there are one might reply truly that there are as many as the corresponding number of roots, since every square has its own root and every root its own square, while no square has more than one root and no root more than one square.

SIMPLICIO: Precisely so.

SALVIATI: But if I inquire how many roots there are, it cannot be denied that there are as many as the numbers because every number is the root of some square. This being granted, we must say that there are as many squares as there are numbers because they are just as numerous as their roots, and all the numbers are roots. Yet at the outset we said that there are many more numbers than squares, since the larger portion of them are not squares. Not only so, but the proportionate number of squares diminishes as we pass to larger numbers. Thus up to 100 we have 10 squares, that is, the squares constitute 1/10 part of all the numbers; up to 10000, we find only 1/100 part to be squares; and up to a million only 1/1000 part; on the other hand in an infinite number, if one could conceive of such a thing, he would be forced to admit that there are as many squares as there are numbers taken all together.

SAGREDO: What then must one conclude under these circumstances?

SALVIATI: So far as I see we can only infer that the totality of all numbers is infinite, that the number of squares is infinite, and that the number of their roots is infinite; neither is the number of squares less than the totality of all the numbers, nor the latter greater than the former; and finally the attributes "equal," "greater," and "less," are not applicable to infinite, but only to finite, quantities. When therefore Simplicio introduces several lines of different lengths and asks me how it is possible that the longer ones do not contain more points than the shorter, I answer him that one line does not contain more or less or just as many points as another, but that each line contains an infinite number.
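The two observations in Salviati's argument, the thinning proportion of squares and the one-to-one pairing of roots with squares, are easy to reproduce. The sketch below is a modern illustration added to the passage; the function names `count_squares` and `root_square_pairs` are invented for this purpose.

```python
from math import isqrt


def count_squares(limit):
    """Number of perfect squares among 1, 2, ..., limit."""
    return isqrt(limit)


def root_square_pairs(k):
    """The first k pairs of the correspondence root <-> square."""
    return [(r, r * r) for r in range(1, k + 1)]


for limit in (100, 10_000, 1_000_000):
    squares = count_squares(limit)
    # proportion of squares among the numbers up to limit, as 1/m
    print(limit, squares, f"1/{limit // squares}")

print(root_square_pairs(5))
```

The proportions 1/10, 1/100, and 1/1000 are the ones quoted in the dialogue, while the pairing never runs out of partners on either side.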

APPENDIX G

Inner Product Spaces

G.1. Review: The Dot Product

Let us recall some ideas you may have seen in your basic Linear Algebra and/or Multivariable Calculus course. Recall that the norm of a vector $\mathbf{x} = (x_1, x_2, x_3)$ in $\mathbb{R}^3$ is given by

\[ \|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + x_3^2}. \]

You may also recall that the dot product (or scalar product) of two vectors $\mathbf{x} = (x_1, x_2, x_3)$ and $\mathbf{y} = (y_1, y_2, y_3)$ in $\mathbb{R}^3$ is defined by the formula

\[ \mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + x_3 y_3. \]

Note that the dot product takes two vectors as input and outputs a scalar. Hence the dot product does not provide us a way to multiply two vectors together to obtain another vector.

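Of paramount importance is the geometric relation:

\[ \mathbf{x} \cdot \mathbf{y} = \|\mathbf{x}\| \|\mathbf{y}\| \cos\theta, \]

where $\theta$ is the "angle between $\mathbf{x}$ and $\mathbf{y}$." This easily implies the Cauchy-Schwarz-Bunyakowsky inequality

\[ |\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\| \|\mathbf{y}\| \]

for all $\mathbf{x}, \mathbf{y}$ in $\mathbb{R}^3$.

One of the most important properties of the dot product is the following:

\begin{align*}
\mathbf{x} \cdot \mathbf{x} &= x_1 x_1 + x_2 x_2 + x_3 x_3 \\
&= x_1^2 + x_2^2 + x_3^2 \\
&= \|\mathbf{x}\|^2.
\end{align*}

In particular, we note that

\[ \mathbf{x} \cdot \mathbf{x} \ge 0 \]

for any vector $\mathbf{x}$. The dot product also satisfies the properties: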

(x + y) · z = x · z + y · z

and

x · y = y · x

as you should recall.


G.2. Inner Products

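Since the dot product proved so useful in Vector Calculus and basic Linear Algebra, we would like to generalize it as much as possible. Motivated by the ideas in the preceding section, we are led to the following formal definition:

Definition. An inner product on a vector space $V$ is a function

\[ \langle \cdot, \cdot \rangle : V \times V \to \mathbb{R} \]

such that:

(i) (POSITIVITY) $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$ for all $\mathbf{v} \in V$;

(ii) (DEFINITENESS) $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ if and only if $\mathbf{v} = \mathbf{0}$;

(iii) (ADDITIVITY IN FIRST SLOT) $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$;

(iv) (SYMMETRY) $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$;

(v) (HOMOGENEITY) $\langle a\mathbf{u}, \mathbf{v} \rangle = a \langle \mathbf{u}, \mathbf{v} \rangle$ for all $a \in \mathbb{R}$.

An inner product space is simply a vector space $V$ equipped with an inner product.

There are a couple of additional properties that inner products have, which follow quickly from the definitions. For example, combining (iii) and (v) yields:

\[ \langle a\mathbf{u} + b\mathbf{v}, \mathbf{w} \rangle = a \langle \mathbf{u}, \mathbf{w} \rangle + b \langle \mathbf{v}, \mathbf{w} \rangle \]

for all $a, b \in \mathbb{R}$ and $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$.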

Example G.1. $\mathbb{R}^n$, when equipped with the dot product, is an inner product space. With our new notation, we have

\[ \langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^{n} x_i y_i, \]

where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$.

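Example G.2. If $A$ is an invertible $n \times n$ matrix, then

\[ \langle \mathbf{x}, \mathbf{y} \rangle_A = \langle A\mathbf{x}, A\mathbf{y} \rangle \]

defines an inner product on $\mathbb{R}^n$. Here $\langle A\mathbf{x}, A\mathbf{y} \rangle$ refers to the standard inner product on $\mathbb{R}^n$ from the preceding example. Let us briefly check that this satisfies properties (i) through (v):

(i) If $\mathbf{x} \in \mathbb{R}^n$, then $\langle \mathbf{x}, \mathbf{x} \rangle_A = \langle A\mathbf{x}, A\mathbf{x} \rangle \ge 0$ since the standard inner product (i.e., the dot product) satisfies (i). More geometrically, we note that

\[ \|A\mathbf{x}\| = \sqrt{\langle A\mathbf{x}, A\mathbf{x} \rangle}, \]

the Euclidean norm of the vector $A\mathbf{x} \in \mathbb{R}^n$.

(ii) If $\langle \mathbf{x}, \mathbf{x} \rangle_A = 0$, then $\langle A\mathbf{x}, A\mathbf{x} \rangle = 0$ whence $A\mathbf{x} = \mathbf{0}$ since the standard inner product satisfies (ii). Since $A$ is invertible, the homogeneous system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, and it follows that $\mathbf{x} = \mathbf{0}$.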

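(iii) This is a straightforward computation using the fact that multiplication by $A$ is linear:

\begin{align*}
\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle_A &= \langle A(\mathbf{u} + \mathbf{v}), A\mathbf{w} \rangle \\
&= \langle A\mathbf{u} + A\mathbf{v}, A\mathbf{w} \rangle \\
&= \langle A\mathbf{u}, A\mathbf{w} \rangle + \langle A\mathbf{v}, A\mathbf{w} \rangle \\
&= \langle \mathbf{u}, \mathbf{w} \rangle_A + \langle \mathbf{v}, \mathbf{w} \rangle_A.
\end{align*}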

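(iv) Since the standard inner product satisfies (iv) it follows that

\[ \langle \mathbf{u}, \mathbf{v} \rangle_A = \langle A\mathbf{u}, A\mathbf{v} \rangle = \langle A\mathbf{v}, A\mathbf{u} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle_A. \]

(v) Using the fact that the standard inner product satisfies (v) along with the fact that $A(a\mathbf{u}) = a(A\mathbf{u})$ for all $a \in \mathbb{R}$ and $\mathbf{u} \in \mathbb{R}^n$, we see that

\[ \langle a\mathbf{u}, \mathbf{v} \rangle_A = \langle A(a\mathbf{u}), A\mathbf{v} \rangle = \langle aA\mathbf{u}, A\mathbf{v} \rangle = a \langle A\mathbf{u}, A\mathbf{v} \rangle = a \langle \mathbf{u}, \mathbf{v} \rangle_A. \]

In summary, there are many possible inner products on $\mathbb{R}^n$. It turns out that the inner products described above are the only possible inner products on $\mathbb{R}^n$.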
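The verification in Example G.2 can be spot-checked on concrete vectors. The following is a small sketch in plain Python; the helper names (`dot`, `mat_vec`, `inner_A`) and the particular matrices are assumptions chosen for illustration. It also shows what goes wrong with definiteness when the matrix is not invertible.

```python
def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def mat_vec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [dot(row, x) for row in A]


def inner_A(A, x, y):
    """The form <x, y>_A = <Ax, Ay>."""
    return dot(mat_vec(A, x), mat_vec(A, y))


A = [[2, 1], [0, 1]]          # invertible (determinant 2)
u, v, w = [1, -2], [3, 4], [-1, 5]
u_plus_v = [a + b for a, b in zip(u, v)]

print(inner_A(A, u, v), inner_A(A, v, u))                        # symmetry
print(inner_A(A, u_plus_v, w), inner_A(A, u, w) + inner_A(A, v, w))  # additivity
print(inner_A(A, [3 * t for t in u], v), 3 * inner_A(A, u, v))   # homogeneity

B = [[1, 1], [1, 1]]          # singular (determinant 0)
x = [1, -1]
print(inner_A(B, x, x))       # 0 although x is nonzero: definiteness fails
```

A numerical check on a few vectors is of course not a proof; it only illustrates each axiom and the role played by invertibility in (ii).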

G.3. Norms Defined by Inner Products

Recall that a norm on a vector space $V$ is a function $\|\cdot\| : V \to \mathbb{R}$ that satisfies the following conditions:

(i) $\|\mathbf{v}\| \ge 0$ for all $\mathbf{v} \in V$ and $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$,

(ii) $\|a\mathbf{v}\| = |a| \|\mathbf{v}\|$ for any $a \in \mathbb{R}$ and $\mathbf{v} \in V$,

(iii) $\|\mathbf{v} + \mathbf{w}\| \le \|\mathbf{v}\| + \|\mathbf{w}\|$.

It turns out that an inner product space is always a normed vector space. In fact, the following definition is a generalization of the fact that if $\mathbf{x} = (x_1, x_2, x_3)$ is a vector in $\mathbb{R}^3$, then its Euclidean length $\|\mathbf{x}\|$ is given by $\|\mathbf{x}\|^2 = \mathbf{x} \cdot \mathbf{x}$.

Definition. If $V$ is an inner product space and $\mathbf{v} \in V$, then the norm on $V$ induced by the inner product is defined by

\[ \|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}. \tag{G.1} \]

It turns out that (G.1) indeed defines a norm on $V$. In other words, one can verify that the axioms for a norm are satisfied by the expression (G.1):

Theorem G.1. If $V$ is an inner product space, then $\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$ defines a norm on $V$. In particular, $\|\mathbf{v}\|$ satisfies the axioms (i), (ii), and (iii) for a norm on $V$ and $V$ is thus a normed vector space.

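Proof. Property (i) is easily verified: $\sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} \ge 0$ for all $\mathbf{v} \in V$ is automatic since $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$ for all $\mathbf{v} \in V$ by the definition of an inner product. If $\sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} = 0$, then $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ whence $\mathbf{v} = \mathbf{0}$ by the definition of an inner product.

Property (ii) is slightly trickier:

\begin{align*}
\|a\mathbf{v}\|^2 &= \langle a\mathbf{v}, a\mathbf{v} \rangle \\
&= a \langle \mathbf{v}, a\mathbf{v} \rangle \\
&= a \langle a\mathbf{v}, \mathbf{v} \rangle \\
&= a^2 \langle \mathbf{v}, \mathbf{v} \rangle \\
&= |a|^2 \|\mathbf{v}\|^2.
\end{align*}

Make sure you see why each step was valid; look at the axioms for inner products to see which rules we used. Taking square roots yields the desired formula

\[ \|a\mathbf{v}\| = |a| \|\mathbf{v}\|. \]

We postpone the proof of Property (iii), the Triangle Inequality, until later. ∎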

Example G.3. We can define an inner product on $C([a, b])$, the vector space of continuous (real-valued) functions on the closed interval $[a, b]$, by defining

\[ \langle f, g \rangle = \int_a^b f(x) g(x) \, dx. \]

The reason for using continuous functions is to ensure that the preceding integral exists and is finite. Something that requires proof is that

\[ \langle f, f \rangle = \int_a^b |f(x)|^2 \, dx \]

equals zero if and only if $f$ is the zero function. We will overlook this for the moment.

The preceding product is not so bizarre. In fact, vectors in $\mathbb{R}^n$ are just functions, if you think of them the right way. One usually thinks of a vector $\mathbf{f} \in \mathbb{R}^n$ as an $n$-tuple

\[ \mathbf{f} = (a_1, a_2, \ldots, a_n). \]

One can also think of $\mathbf{f}$ as the function

\[ f : \{1, 2, 3, \ldots, n\} \to \mathbb{R} \]

such that $f(x) = a_x$ for each $x \in \{1, 2, 3, \ldots, n\}$. From this point of view, if $\mathbf{g} = (b_1, b_2, \ldots, b_n)$ is regarded as a function $g$ in the same way, the inner product on $\mathbb{R}^n$ is simply

\[ \langle \mathbf{f}, \mathbf{g} \rangle = \sum_{x=1}^{n} a_x b_x = \sum_{x=1}^{n} f(x) g(x). \]

Keeping in mind that integration is a type of "summation process" (think Riemann sums), one begins to see the relationship between the standard inner products on $\mathbb{R}^n$ and $C([a, b])$. They are essentially the same, except that one is discrete and one is continuous. In light of this revelation, we will begin using the symbols $f, g$ to denote generic vectors (as opposed to $\mathbf{u}, \mathbf{v}, \ldots$).

G.4. Orthogonal Vectors

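Definition. Two vectors $\mathbf{u}, \mathbf{v} \in V$ are called orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$.

Example G.4. In the real inner product space $\mathbb{R}^n$, vectors $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ and $\mathbf{b} = (b_1, b_2, \ldots, b_n)$ are orthogonal if and only if

\[ \langle \mathbf{a}, \mathbf{b} \rangle = \sum_{k=1}^{n} a_k b_k = 0. \]

Recall that in $\mathbb{R}^n$ we have

\[ \langle \mathbf{u}, \mathbf{v} \rangle = \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta, \]

where $\theta$ denotes the angle between $\mathbf{u}$ and $\mathbf{v}$. Therefore, for nonzero vectors, $\langle \mathbf{u}, \mathbf{v} \rangle = 0$ if and only if $\mathbf{u}$ and $\mathbf{v}$ are perpendicular. In light of the preceding example, we see that the concept of orthogonality is a generalization of the notion of perpendicularity in $\mathbb{R}^n$. Indeed, we study inner products precisely because we want to import as many geometric notions into the study of abstract inner product spaces as possible.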


Example G.5. If $m, n \in \mathbb{Z}$, then $\cos 2\pi n x$ and $\sin 2\pi m x$ are orthogonal in $C([0, 1])$ with respect to the inner product $\langle f, g \rangle = \int_0^1 f(x) g(x) \, dx$. Indeed, the following integral can be verified directly:

\[ \int_0^1 \cos(2\pi n x) \sin(2\pi m x) \, dx = 0. \]

This is the main observation behind the theory of Fourier series.
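The orthogonality relation in Example G.5 can be checked numerically. The following is a minimal sketch that approximates the inner product on $C([0,1])$ by a midpoint Riemann sum; the helper names `inner`, `c`, and `s`, and the step count, are assumptions made for this illustration.

```python
from math import cos, pi, sin


def inner(f, g, a=0.0, b=1.0, steps=10000):
    """Midpoint-rule approximation of the integral of f(x) g(x) over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h)
                   for k in range(steps))


def c(n):
    """The function x -> cos(2 pi n x)."""
    return lambda x: cos(2 * pi * n * x)


def s(m):
    """The function x -> sin(2 pi m x)."""
    return lambda x: sin(2 * pi * m * x)


print(inner(c(2), s(3)))   # approximately 0: orthogonal
print(inner(c(3), s(3)))   # approximately 0 even when n = m
print(inner(c(2), c(2)))   # approximately 1/2: not orthogonal to itself
```

The last line is a sanity check that the approximation is not simply returning zero for everything.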

Given two perpendicular line segments which form the sides of a right triangle, the Pythagorean theorem tells us how to find the length of the hypotenuse. Although this is one of the most basic theorems in all of mathematics, a surprising number of math majors do not know how to prove it from basic principles. Here is a simple proof:

Theorem G.2 (Classical Pythagorean Theorem). If $a, b, c$ are the lengths of the two sides and hypotenuse of a right triangle, respectively, then $a^2 + b^2 = c^2$.

Proof. Put four copies of the triangle around a square of side $c$ to make a square of side $a + b$. Comparing the area of the big square to the sum of the areas of the components we get:

\[ (a + b)^2 = c^2 + 4\left(\tfrac{1}{2} ab\right). \]

Expanding and canceling terms shows that $a^2 + b^2 = c^2$. ∎

Properly interpreted, the Pythagorean theorem suggests something about inner product spaces. The Euclidean plane is simply the inner product space $\mathbb{R}^2$ and the sides of our triangle are orthogonal vectors $\mathbf{u}$ and $\mathbf{v}$. In this form, the Pythagorean theorem states

\[ \|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2. \]

This is true in complete generality and it is one of the most fundamental properties of abstract inner product spaces:

Theorem G.3 (Abstract Pythagorean Theorem). If $f$ and $g$ are orthogonal vectors in an inner product space, then

\[ \|f + g\|^2 = \|f\|^2 + \|g\|^2. \]

Proof. If $f, g$ are orthogonal, then $\langle f, g \rangle = 0$ by definition. Thus

\begin{align*}
\|f + g\|^2 &= \langle f + g, f + g \rangle \\
&= \langle f, f \rangle + \langle f, g \rangle + \langle g, f \rangle + \langle g, g \rangle \\
&= \|f\|^2 + \|g\|^2. \qquad ∎
\end{align*}

Another geometrically inspired theorem is the following:

Theorem G.4 (Parallelogram Identity). If $f, g$ are vectors in an inner product space, then

\[ \|f + g\|^2 + \|f - g\|^2 = 2(\|f\|^2 + \|g\|^2). \]

Proof. The proof is a straightforward computation based upon the properties of an inner product. It is important to note that the Parallelogram Identity does not hold for normed vector spaces in general. The Parallelogram Identity is a special property of norms arising from inner products. ∎
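To see concretely that the Parallelogram Identity can fail for a norm that does not come from an inner product, one can compare the Euclidean norm on $\mathbb{R}^2$ with the norm $\|\mathbf{v}\|_1 = |v_1| + |v_2|$. The sketch below is an added illustration; `norm1`, `norm2`, and `parallelogram_gap` are hypothetical names, and the choice of $\|\cdot\|_1$ as the counterexample is ours rather than the notes'.

```python
def norm1(v):
    """The 1-norm |v_1| + ... + |v_n| (not induced by any inner product)."""
    return sum(abs(t) for t in v)


def norm2(v):
    """The Euclidean norm, induced by the dot product."""
    return sum(t * t for t in v) ** 0.5


def parallelogram_gap(norm, f, g):
    """||f+g||^2 + ||f-g||^2 - 2(||f||^2 + ||g||^2); zero when the identity holds."""
    plus = [a + b for a, b in zip(f, g)]
    minus = [a - b for a, b in zip(f, g)]
    return norm(plus) ** 2 + norm(minus) ** 2 - 2 * (norm(f) ** 2 + norm(g) ** 2)


f, g = [1, 0], [0, 1]
print(parallelogram_gap(norm2, f, g))   # zero up to rounding error
print(parallelogram_gap(norm1, f, g))   # 4: the identity fails
```

For these two vectors the 1-norm gives $4 + 4$ on the left and $2(1 + 1)$ on the right, so the two sides differ by 4.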

The abstract Pythagorean Theorem highlights the usefulness of orthogonal vectors. In fact, the same idea of projection of one vector along another (think back to the dot product) also applies in arbitrary inner product spaces. If $f, g \in V$, then the equation

\[ f = cg + (f - cg) \]

obviously holds for all $c \in \mathbb{R}$. It will be useful to find a constant $c$ such that

\[ \langle f - cg, cg \rangle = 0. \]

In other words, we want to write $f$ as a scalar multiple of $g$ plus something orthogonal to $g$. To do this, we assume that $g \neq 0$ and solve the above equation for the constant $c$:

\begin{align*}
0 &= \langle f - cg, cg \rangle \\
&= c \langle f, g \rangle - c^2 \langle g, g \rangle \\
&= c\left(\langle f, g \rangle - c\|g\|^2\right)
\end{align*}

and thus either $c = 0$ or

\[ c = \frac{\langle f, g \rangle}{\|g\|^2}. \]

We obtain the orthogonal decomposition $f = cg + h$ where the vector

\[ h = f - \frac{\langle f, g \rangle}{\|g\|^2} g \]

is orthogonal to $g$. Notice the important fact that $h = 0$ if and only if $f$ and $g$ are scalar multiples of one another.
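The orthogonal decomposition can be computed exactly for vectors with integer entries. Here is a small sketch using the standard inner product on $\mathbb{R}^n$ and Python's `fractions` module to avoid rounding; the function name `decompose` and the sample vectors are assumptions for illustration only.

```python
from fractions import Fraction


def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def decompose(f, g):
    """Return (c, h) with f = c*g + h and <h, g> = 0.

    Assumes g is nonzero and that f and g have integer entries.
    """
    c = Fraction(dot(f, g), dot(g, g))
    h = [fi - c * gi for fi, gi in zip(f, g)]
    return c, h


f, g = [1, 2, 3], [1, 1, 0]
c, h = decompose(f, g)
print(c, h)
print(dot(h, g))                                  # h is orthogonal to g
print(dot(f, f), c * c * dot(g, g) + dot(h, h))   # Pythagorean check
```

The final line compares $\|f\|^2$ with $\|cg\|^2 + \|h\|^2$, which is the Abstract Pythagorean Theorem applied to the orthogonal pair $cg$ and $h$.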

G.5. The Cauchy-Schwarz-Bunyakowsky Inequality

One of the most useful inequalities in all of mathematics is the Cauchy-Schwarz-Bunyakowsky Inequality. In the west, the following has traditionally been known as the Schwarz Inequality or the Cauchy-Schwarz Inequality. In Eastern Europe, it is frequently called the Bunyakowsky Inequality. In light of this, many authors simply refer to it as the CSB Inequality.

Theorem G.5 (Cauchy-Schwarz-Bunyakowsky Inequality). If $\langle \cdot, \cdot \rangle$ is an inner product on $V$, then $|\langle f, g \rangle| \le \|f\| \|g\|$ for all $f, g \in V$. Equality holds if and only if $f$ and $g$ are scalar multiples of one another.

Pf. #1. If either $f$ or $g$ is the zero vector, then the inequality is obviously true. Thus it suffices to check the case where neither $f$ nor $g$ is zero. Write down the orthogonal decomposition of $f$ with respect to $g$:

\[ f = \frac{\langle f, g \rangle}{\|g\|^2} g + h. \]

Here the vector $h$ is orthogonal to $g$. The Pythagorean Theorem states that:

\begin{align*}
\|f\|^2 &= \left\| \frac{\langle f, g \rangle}{\|g\|^2} g \right\|^2 + \|h\|^2 \\
&\ge \frac{|\langle f, g \rangle|^2 \|g\|^2}{\|g\|^4} \\
&= \frac{|\langle f, g \rangle|^2}{\|g\|^2},
\end{align*}

which implies the CSB inequality. Equality holds in the CSB inequality if and only if $h = 0$, which by the comment at the end of the preceding section implies that $f$ and $g$ are scalar multiples of one another. ∎

There is an entirely different proof of the Cauchy-Schwarz inequality that is interesting in and of itself. We present this below:

Pf. #2. Let $f, g \in V$ and let $t \in \mathbb{R}$ be any real scalar. Furthermore, suppose that $f \neq 0$ and $g \neq 0$ to avoid any trivialities. Now observe that

\[ p(t) = \|tf + g\|^2 \]

is a real-valued function of the variable $t$ and furthermore $p(t) \ge 0$ for all $t$. We can use the definition of the norm and some basic properties of inner products to derive an explicit formula for $p(t)$:

\begin{align*}
p(t) &= \|tf + g\|^2 \\
&= \langle tf + g, tf + g \rangle \\
&= \langle tf, tf \rangle + \langle tf, g \rangle + \langle g, tf \rangle + \langle g, g \rangle \\
&= t^2 \langle f, f \rangle + 2t \langle f, g \rangle + \langle g, g \rangle \\
&= \|f\|^2 t^2 + 2 \langle f, g \rangle t + \|g\|^2.
\end{align*}

We can rewrite this as

\[ p(t) = at^2 + bt + c \]

where $a = \|f\|^2 > 0$, $b = 2\langle f, g \rangle$, and $c = \|g\|^2$. The graph of $p(t)$ is a parabola which opens upward. Moreover, $p(t)$ is always nonnegative and hence the discriminant is nonpositive:

\[ b^2 - 4ac \le 0. \]

Substituting in for $a, b, c$ yields the CSB inequality.

If equality held in the CSB inequality, then $b^2 - 4ac = 0$ whence the quadratic polynomial $p(t)$ has a unique real root, say $t_0$. Thus $p(t_0) = \|t_0 f + g\|^2 = 0$, whence $t_0 f + g$ is the zero vector. In particular, this implies that $f$ and $g$ are scalar multiples of each other. ∎

Example G.6. Applying the CSB inequality to the inner product

\[ \langle f, g \rangle = \int_a^b f(x) g(x) \, dx \]

on $C([a, b])$ yields the highly nontrivial inequality

\[ \left| \int_a^b f(x) g(x) \, dx \right| \le \sqrt{\int_a^b |f(x)|^2 \, dx} \, \sqrt{\int_a^b |g(x)|^2 \, dx}, \]

valid for all continuous functions $f, g$ on $[a, b]$. Try proving that directly!

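Example G.7. If $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ are real numbers, then

\[ \left| \sum_{i=1}^{n} x_i y_i \right|^2 \le \left( \sum_{i=1}^{n} i |x_i|^2 \right) \left( \sum_{i=1}^{n} \frac{|y_i|^2}{i} \right). \]

Why? Because of the CSB inequality. Let

\begin{align*}
\mathbf{x} &= (x_1, \sqrt{2}\, x_2, \sqrt{3}\, x_3, \ldots, \sqrt{n}\, x_n), \\
\mathbf{y} &= \left( y_1, \frac{y_2}{\sqrt{2}}, \frac{y_3}{\sqrt{3}}, \ldots, \frac{y_n}{\sqrt{n}} \right).
\end{align*}

Since $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, we may use the CSB inequality for the standard inner product to get

\[ |\langle \mathbf{x}, \mathbf{y} \rangle| \le \|\mathbf{x}\| \|\mathbf{y}\|, \]

which (when squared) yields exactly the strange inequality proposed above.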
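The weighted inequality of Example G.7 can be tested on sample data. The sketch below is an illustration with an invented function name, `weighted_csb_sides`, which returns the two sides of the inequality so they can be compared; the random trials use a fixed seed so the run is repeatable.

```python
import random


def weighted_csb_sides(x, y):
    """Return (lhs, rhs) of |sum x_i y_i|^2 <= (sum i x_i^2)(sum y_i^2 / i)."""
    lhs = sum(a * b for a, b in zip(x, y)) ** 2
    rhs = (sum(i * a * a for i, a in enumerate(x, start=1))
           * sum(b * b / i for i, b in enumerate(y, start=1)))
    return lhs, rhs


random.seed(0)
for _ in range(5):
    x = [random.uniform(-1, 1) for _ in range(6)]
    y = [random.uniform(-1, 1) for _ in range(6)]
    lhs, rhs = weighted_csb_sides(x, y)
    print(round(lhs, 6), "<=", round(rhs, 6))

# Equality case: the rescaled vectors are proportional when y_i = i * x_i.
print(weighted_csb_sides([1, 1, 1], [1, 2, 3]))
```

The equality case reflects the last sentence of Theorem G.5: with $y_i = i x_i$ the two rescaled vectors in the example coincide, so the CSB inequality becomes an equality.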


G.6. The Triangle Inequality

We mentioned earlier (Theorem G.1) that whenever you have an inner product, you get a norm for free via the formula

\[ \|\mathbf{u}\| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle}. \]

We proved that this proposed "norm" satisfies (i) and (ii) of the axioms for a norm, but we never showed that (iii), the Triangle Inequality, was satisfied.

A fundamental theorem in plane geometry says that the sum of the lengths of two sides of a triangle is always greater than the length of the other side. The following theorem generalizes this idea to inner product spaces:

Lemma 9 (Triangle Inequality). Let $V$ be an inner product space. If $f, g \in V$, then

\[ \|f + g\| \le \|f\| + \|g\|. \]

Equality holds if and only if $f$ and $g$ are nonnegative scalar multiples of each other.

Proof.

\begin{align*}
\|f + g\|^2 &= \langle f + g, f + g \rangle \\
&= \langle f, f \rangle + \langle f, g \rangle + \langle g, f \rangle + \langle g, g \rangle \\
&= \|f\|^2 + 2\langle f, g \rangle + \|g\|^2 \\
&\le \|f\|^2 + 2|\langle f, g \rangle| + \|g\|^2 \\
&\le \|f\|^2 + 2\|f\| \|g\| + \|g\|^2 \\
&= (\|f\| + \|g\|)^2.
\end{align*}

Taking square roots of both sides yields the triangle inequality. ∎

In conclusion, we have the following relationships between vector spaces, normed vector spaces, and inner product spaces:

\[ \text{inner product spaces} \subsetneq \text{normed vector spaces} \subsetneq \text{vector spaces}. \]

APPENDIX H

Covering Compactness

It turns out that there is a completely different approach to the concept of compactness. These notes give a brief introduction to this viewpoint.

H.1. Covering Compactness

Definition. Let $(M, d)$ be a metric space and let $S \subseteq M$. We say that $S$ is covering compact if, whenever $S$ is contained in the union of a collection of open subsets of $M$, $S$ is contained in the union of a finite number of these open subsets.

This definition is frequently stated as:

"$S$ is covering compact if every open cover of $S$ has a finite subcover."

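Example H.1. Any finite set $S = \{x_1, x_2, \ldots, x_n\}$ in a metric space $(M, d)$ is covering compact. Let $\{A_\alpha\}_{\alpha \in I}$ be an open cover of $S$. In other words, $I$ is an index set¹ and for each $\alpha \in I$ we have an open subset $A_\alpha$ of $M$. Since

\[ S \subseteq \bigcup_{\alpha \in I} A_\alpha, \]

it follows that each $x_i$ belongs to at least one of the $A_\alpha$. In other words, there exist $\alpha_1, \alpha_2, \ldots, \alpha_n \in I$ so that $x_i \in A_{\alpha_i}$ for $i = 1, 2, \ldots, n$. In particular,

\[ S \subseteq \bigcup_{i=1}^{n} A_{\alpha_i}. \]

Thus the open cover $\{A_\alpha\}_{\alpha \in I}$ for $S$ can be refined to produce a subcover

\[ \{A_{\alpha_1}, A_{\alpha_2}, \ldots, A_{\alpha_n}\} \]

of $S$ containing at most $n$ of the $A_\alpha$.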

Example H.2. $(0, 1]$ is not covering compact since the open cover defined by

\[ A_\epsilon = (\epsilon, 1 + \epsilon), \quad \epsilon > 0 \]

does not have a finite subcover which still covers $(0, 1]$. Indeed, take $n$ of the $A_\epsilon$:

\[ A_{\epsilon_1}, A_{\epsilon_2}, \ldots, A_{\epsilon_n} \]

and note that

\[ x < \min\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\} \implies x \notin \bigcup_{i=1}^{n} A_{\epsilon_i}. \]

In other words, the union of any finite number of the $A_\epsilon$ excludes points of $(0, 1]$ which are sufficiently close to zero. Since there exists an open cover of $(0, 1]$ which cannot be refined to produce a finite subcover of $(0, 1]$, it follows that $(0, 1]$ is not covering compact.
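The argument in Example H.2 is constructive: given any finite selection of the sets $A_\epsilon$, it names a point of $(0, 1]$ that they miss. The following is a rough sketch of that step, with invented function names `in_A` and `uncovered_point`; halving the smallest $\epsilon$ (capped at 1) is one convenient choice of witness among many.

```python
def in_A(eps, x):
    """Membership test for the open interval A_eps = (eps, 1 + eps)."""
    return eps < x < 1 + eps


def uncovered_point(eps_list):
    """A point of (0, 1] lying in none of the intervals A_eps, eps in eps_list.

    Assumes eps_list is a nonempty list of positive numbers.
    """
    return min(min(eps_list), 1.0) / 2


eps_list = [0.5, 0.1, 0.01, 0.25]
x = uncovered_point(eps_list)
print(x, 0 < x <= 1, any(in_A(eps, x) for eps in eps_list))
```

The printed point lies in $(0, 1]$ but in none of the chosen intervals, so this particular finite subfamily fails to cover $(0, 1]$; the same function produces a witness for any other finite subfamily.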

1 The index set can be finite, countably infinite, or even uncountable; there are no restrictions.


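Example H.3. The subset $S = \{\frac{1}{n} : n \in \mathbb{N}\} \cup \{0\}$ of $\mathbb{R}$ is covering compact. If $\{A_\alpha\}_{\alpha \in I}$ is an open cover of $S$, then there exists some $\alpha_0 \in I$ so that $0 \in A_{\alpha_0}$. Since this set $A_{\alpha_0}$ is open, there exists $\delta > 0$ so that $B_\delta(0) \subseteq A_{\alpha_0}$. Since $\lim_{n \to \infty} \frac{1}{n} = 0$, there exists $N \in \mathbb{N}$ so that $n \geq N$ implies that $\frac{1}{n} < \delta$. In particular, all but finitely many of the $\frac{1}{n}$ belong to $A_{\alpha_0}$ and hence only finitely many other $A_\alpha$ are needed to cover $S$. Thus $S$ is covering compact.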
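To make the counting in Example H.3 concrete, here is a small Python sketch (an added illustration; the function names `tail_index` and `points_outside_ball` are hypothetical). For a given radius delta it computes the index N from the example and lists the finitely many points 1/n that fall outside the ball of radius delta about 0, each of which needs at most one further set from the cover.

```python
from fractions import Fraction
from math import floor

def tail_index(delta):
    """Smallest N such that 1/n < delta for every n >= N."""
    return floor(1 / delta) + 1

def points_outside_ball(delta):
    """The finitely many points 1/n of S that do not lie in B_delta(0)."""
    return [Fraction(1, n) for n in range(1, tail_index(delta))]

delta = Fraction(1, 10)
N = tail_index(delta)
leftover = points_outside_ball(delta)
# One set for the ball around 0, plus at most one set per leftover point.
print(N, len(leftover) + 1)
```

The second printed number is an upper bound on the size of the finite subcover obtained by the argument above; the bound depends on delta, which in turn depends on the cover.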

Example H.4. $\mathbb{N}$, regarded as a subset of $\mathbb{R}$, is not covering compact. Indeed, if $A_n = (n - 1, n + 1)$ for $n \in \mathbb{N}$, then clearly each $A_n$ is open and $\mathbb{N} \subseteq \bigcup_{n \in \mathbb{N}} A_n$. Nevertheless, there is no finite subcollection of the $A_n$ whose union contains all of $\mathbb{N}$ since each $A_n$ contains only one natural number (namely $n$).
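The key observation in Example H.4 can be verified mechanically. The sketch below is an added illustration with hypothetical helper names; it assumes the convention that the natural numbers start at 1. It confirms that the interval (n - 1, n + 1) contains exactly one natural number and exhibits a natural number missed by any given finite subfamily.

```python
def naturals_in(n, limit):
    """Natural numbers k <= limit lying in the open interval A_n = (n - 1, n + 1)."""
    return [k for k in range(1, limit + 1) if n - 1 < k < n + 1]

def missed_natural(indices):
    """A natural number covered by no A_n with n in the finite list `indices`."""
    return max(indices) + 1

indices = [3, 7, 20]          # an arbitrary finite subfamily A_3, A_7, A_20
m = missed_natural(indices)
print(naturals_in(7, 100), m)
```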

H.2. Covering Compactness = Sequential Compactness

Covering compactness is a somewhat difficult property of a set to verify directly, since it involves checking that every possible open cover of a set always reduces to a finite subcover of that set. Fortunately, we have the following theorem:

Theorem H.1. Let $(M, d)$ be a metric space. A subset $S \subseteq M$ is compact if and only if $S$ is covering compact. In other words, the notions of sequential compactness$^2$ and covering compactness are equivalent.

Proof. Suppose toward a contradiction that $S$ is covering compact but not sequentially compact. Thus there exists a sequence $x_n$ in $S$ which has no subsequence that converges in $S$. In particular, this implies that $x_n$ assumes infinitely many distinct values, since otherwise $x_n$ would have a subsequence $x_{n_k}$ which is constant.

Since no subsequence of $x_n$ converges to a point of $S$, for each $a \in S$ there exists $\delta > 0$, which depends on $a$, such that $B_\delta(a)$ contains only finitely many of the $x_n$. Therefore

\[ \{B_\delta(a) : a \in S\} \]

forms an open cover of $S$. Since $S$ is covering compact, there exists a finite subcover

\[ \{B_{\delta_1}(a_1), B_{\delta_2}(a_2), \ldots, B_{\delta_n}(a_n)\} \]

of $S$. However, each $B_{\delta_i}(a_i)$ contains only finitely many terms of the sequence $x_n$. On the other hand, since

\[ S \subseteq \bigcup_{i=1}^{n} B_{\delta_i}(a_i), \]

it follows that the sequence $x_n$ assumes only finitely many values, a contradiction.

The proof that sequential compactness implies covering compactness is significantly more difficult (it would take a couple of pages) and is therefore omitted. □

H.3. Total Boundedness

Definition. A set $S \subseteq M$ is totally bounded if for each $\epsilon > 0$ there exists a finite covering of $S$ by $\epsilon$-balls. In other words, $S$ is totally bounded if for each $\epsilon > 0$ there exist $x_1, x_2, \ldots, x_n \in M$ such that $S \subseteq \bigcup_{i=1}^{n} B_\epsilon(x_i)$.
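As an illustration of the definition, the following Python sketch builds a finite covering by balls of a given radius for the interval [0, 1] in the real line with the usual metric. The choice of example and the names `epsilon_net` and `is_covered` are assumptions made here for illustration; the notes do not single out this set.

```python
from fractions import Fraction
from math import ceil

def epsilon_net(eps):
    """Centers 0, eps, 2*eps, ... whose eps-balls cover [0, 1]."""
    count = ceil(1 / eps) + 1
    return [k * eps for k in range(count)]

def is_covered(x, centers, eps):
    """Return True if x lies in the open eps-ball about some center."""
    return any(abs(x - c) < eps for c in centers)

eps = Fraction(1, 4)
centers = epsilon_net(eps)
# Spot-check the covering on a grid of sample points of [0, 1].
print(centers, all(is_covered(Fraction(j, 1000), centers, eps) for j in range(1001)))
```

The grid check is only a sanity test. The actual reason the balls cover [0, 1] is that consecutive centers are eps apart, so every point of the interval is within eps/2 of some center.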

Theorem H.2. Let $(M, d)$ be a metric space and let $S \subseteq M$. The following are equivalent:

(i) $S$ is (sequentially) compact

(ii) $S$ is covering compact

(iii) $S$ is closed and totally bounded (provided that $M$ is complete)

(iv) $S$ is complete and totally bounded

If $M = \mathbb{R}^n$ and $d$ is the Euclidean metric, then the conditions above are equivalent to $S$ being closed and bounded.

$^2$ What we have been referring to as “compactness.”

There are essentially two totally different ways of looking at compactness. We have chosen to use the sequential approach because it is somewhat more intuitive. The covering approach is a little more abstract and difficult to motivate. Nevertheless, the concept of covering compactness is open to greater generalization. When one studies point-set topology (typically in graduate school), one no longer considers metric spaces, but rather topological spaces where open and closed sets exist, but there is no notion of distance. Consequently, the notion of compactness one encounters there is actually covering compactness.

For each theorem about compactness which we proved using the sequential definition, there is typically a corresponding proof which uses the covering definition. For example:

Theorem H.3. A continuous function on a compact metric space is uniformly continuous.

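Pf. (via Covering Compactness). Let $(A, d_A)$ be compact, let $(B, d_B)$ be a metric space, and let $f : A \to B$ be continuous. Fix $\epsilon > 0$. For each $x \in A$ there exists a number $\delta(x) > 0$ so that

\[ d_A(x, y) < \delta(x) \implies d_B(f(x), f(y)) < \frac{\epsilon}{2}. \]

The open balls $B_{\delta(x)/2}(x)$ form an open cover of $A$ since $x \in B_{\delta(x)/2}(x)$ for each $x$. Since $A$ is (covering) compact, there exist $x_1, x_2, \ldots, x_n \in A$ so that

\[ A \subseteq \bigcup_{i=1}^{n} B_{\delta(x_i)/2}(x_i). \]

Now let

\[ \delta = \min\left\{ \frac{\delta(x_1)}{2}, \frac{\delta(x_2)}{2}, \ldots, \frac{\delta(x_n)}{2} \right\}. \]

Now suppose that $x, y \in A$ and $d_A(x, y) < \delta$, and let $x \in B_{\delta(x_i)/2}(x_i)$ for some $i \in \{1, 2, \ldots, n\}$. It follows that

\begin{align*}
d_A(x_i, y) &\leq d_A(x_i, x) + d_A(x, y) \\
&< \frac{\delta(x_i)}{2} + \delta \\
&\leq \frac{\delta(x_i)}{2} + \frac{\delta(x_i)}{2} \\
&= \delta(x_i).
\end{align*}

Therefore

\[ d_A(x_i, y) < \delta(x_i) \quad \text{and} \quad d_A(x, x_i) < \delta(x_i), \]

which implies that

\[ d_B(f(x_i), f(y)) < \frac{\epsilon}{2} \quad \text{and} \quad d_B(f(x), f(x_i)) < \frac{\epsilon}{2}. \]

Putting this all together, we have shown that $d_A(x, y) < \delta$ implies that

\begin{align*}
d_B(f(x), f(y)) &\leq d_B(f(x), f(x_i)) + d_B(f(x_i), f(y)) \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} \\
&= \epsilon.
\end{align*}

Since this $\delta$ depends only upon $\epsilon$ (and not $x$ or $y$) it follows that $f$ is uniformly continuous. □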
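The construction in this proof can be traced numerically on one example. The sketch below is an added illustration under stated assumptions: the function f(x) = x^2 on A = [0, 1], the particular pointwise choice `delta_at`, and the names `uniform_delta` and `centers` are all hypothetical choices made here, not taken from the notes. It picks finitely many centers whose balls of radius delta(x_i)/2 cover [0, 1] and returns the minimum of these radii, which plays the role of the uniform delta.

```python
def f(x):
    return x * x

def delta_at(x, eps):
    """A pointwise delta(x): |y - x| < delta(x) implies |f(y) - f(x)| < eps / 2.

    Justification: |y^2 - x^2| <= |y - x| * (2|x| + |y - x|), and |y - x| < 1.
    """
    return min(1.0, eps / (2 * (2 * abs(x) + 1)))

def uniform_delta(eps):
    """Choose centers in [0, 1] whose balls of radius delta(x_i)/2 cover [0, 1].

    Each new center sits on the right edge of the previous ball, so the open
    balls overlap and leave no gaps. Returns the centers and the minimum radius.
    """
    centers, x = [], 0.0
    while True:
        centers.append(x)
        if x >= 1.0:
            break
        x = min(1.0, x + delta_at(x, eps) / 2)
    return centers, min(delta_at(c, eps) / 2 for c in centers)

centers, delta = uniform_delta(0.1)
print(len(centers), delta)
```

Here the finite subcover is found by walking across the interval rather than by invoking compactness abstractly; the walk terminates because `delta_at` is bounded below by a positive constant on [0, 1], which is special to this example.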


Theorem H.4. Let $(M, d)$ be a metric space. If $A_n$ is a sequence of nonempty, compact subsets of $M$ such that

\[ A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots, \]

then $A = \bigcap_{n=1}^{\infty} A_n$ is also compact and nonempty.$^3$

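Pf. (via Covering Compactness). We have already seen that the arbitrary intersection of compact sets is compact. Thus $A$ is a compact subset of $(M, d)$. We must now show that $A$ is nonempty.

Replacing $M$ by $A_1$ if necessary, we may assume that $(M, d)$ is itself compact. Suppose toward a contradiction that $A = \bigcap_{n=1}^{\infty} A_n = \emptyset$. This implies that

\[ M = \bigcup_{n=1}^{\infty} A_n^c. \]

Since each compact set $A_n$ is closed, each complement $A_n^c$ is open and hence $\{A_n^c : n \in \mathbb{N}\}$ is an open cover of $M$. Since $(M, d)$ is compact, it follows that the open cover $\{A_n^c : n \in \mathbb{N}\}$ has a finite subcover. In other words, there exist

\[ n_1 < n_2 < \cdots < n_m \]

so that

\[ M = A_{n_1}^c \cup A_{n_2}^c \cup \cdots \cup A_{n_m}^c. \]

Since

\[ A_{n_1} \supseteq A_{n_2} \supseteq A_{n_3} \supseteq \cdots \supseteq A_{n_m}, \]

it follows that

\[ A_{n_1}^c \subseteq A_{n_2}^c \subseteq A_{n_3}^c \subseteq \cdots \subseteq A_{n_m}^c, \]

whence

\[ M \subseteq A_{n_m}^c \iff A_{n_m} \subseteq \emptyset, \]

which contradicts the assumption that $A_{n_m}$ is nonempty. □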

$^3$ The important part of the theorem is the assertion that the intersection is nonempty!