
GEM 360. Foundations of Analysis

Franz Lemmermeyer, Spring 2001

March 9, 2002

Abstract

The mathematicians who invented calculus (Newton, Leibniz) and developed it (the Bernoullis, Euler, etc.) used geometric insight freely in their proofs, and even Gauss considered the fact that e.g. a polynomial of odd degree has at least one real zero as being obvious. In the 19th century, however, people began to ask why it is that x² = 2 has a solution in R but not in Q. The result on which this fact is based is called the ‘completeness property’ of the real numbers, and is essential for nearly every theorem in calculus.

In most textbooks on analysis (including Ross’s), however, the completeness of the reals is taken on faith, and the situation is not really improved by calling it an axiom (why don’t we call the fundamental theorem of calculus an axiom?). There are other omissions I’m not really happy with: the square root of real numbers, for example, is freely used but never defined, and I challenge all of you to read Chapter 1 of Ross and then tell me where to find the proof that √2 is a real number but √−1 is not. And while we’re at it, let me repeat Dedekind’s lament that most students (actually he was talking about mathematicians) have never seen a proof of √2 · √3 = √6.

In short: I don’t consider Chapter 1 of Ross as a ‘Foundation of Analysis’ because it doesn’t contain a foundation of the real numbers. To quote the author (p. vi):

Even after studying this book (or writing it) it will not be easy to handle questions such as “What is a number?”

And even though the construction of the reals (and the proof of the completeness property) is quite abstract and takes much time, I will at least try to explain how mathematicians create the universe of real numbers and analysis out of the natural numbers.

Contents

1 From N to Q
  1.1 The Natural Numbers N
  1.2 The Integers Z
  1.3 The Rationals Q

2 Cauchy Sequences in Q
  2.1 Sequences in Q
  2.2 Cauchy Sequences

3 From Q to R (and C)
  3.1 The Reals R
  3.2 The Complex Numbers
  3.3 The p-adic numbers Qp
  3.4 Bolzano-Weierstrass
  3.5 Absolute Convergence

4 Continuous Functions
  4.1 Continuity
  4.2 Properties of Continuous Functions
  4.3 Uniform Continuity

5 The Riemann Integral
  5.1 Riemann Sums
  5.2 Main Properties of Riemann Integrals

6 Differentiable Functions
  6.1 Derivatives
  6.2 Fundamental Theorem of Calculus

Chapter 1

From N to Q

1.1 The Natural Numbers N

In every deductive theory there are certain statements you must take for granted: you can’t prove theorems by assuming nothing. What we are taking for granted here are elementary notions of sets and the basic properties of natural numbers as encoded by the following statements called the Peano axioms: Let N be a set together with a ‘successor’ function s such that

N1 1 ∈ N;

N2 if x ∈ N, then s(x) ∈ N;

N3 there is no x ∈ N with s(x) = 1;

N4 if s(x) = s(y), then x = y;

N5 if S is a subset of N containing 1, and if s(n) ∈ S whenever n ∈ S, then S = N.

Remark 1. Ross is not really exact in his statement of the Peano axioms as he already assumes the existence of an addition on N and then simply replaces s(n) by n + 1.

Remark 2. Axiom [N2] states that s is a map N −→ N, that is: each element of N gets mapped to another element of N.

Axiom [N4] states that the map s : N −→ N is injective. A map f : A −→ B is called injective if f(a) = f(a′) for a, a′ ∈ A implies that a = a′, in other words: if different elements get mapped to different images.

Remark 3. Axiom [N5] is called the Principle of Induction. Assume you want to prove a statement P(n) (like n² + n is even) for all n ∈ N; let S denote the set of natural numbers n ∈ N for which P(n) is true. If you can show that P(1) holds (i.e. that 1 ∈ S) and that P(s(n)) holds whenever P(n) does (i.e. that s(n) ∈ S whenever n ∈ S) then this axiom allows you to conclude that P(n) holds for every natural number.

Naively speaking, these axioms describe the basic properties of natural numbers; logicians can prove that if a set N with such a successor function s satisfying [N1]–[N5] exists, then it is essentially unique (this means that the Peano axioms characterize the natural numbers), but we won’t need this.

What we want to do here is to show how the arithmetic of the natural numbers can be defined from the Peano axioms. We start by giving the natural numbers their usual names: we put 2 := s(1), 3 := s(2), 4 := s(3), etc.; in particular N = {1, 2, 3, 4, . . .}.

Exercise. Which of these Peano axioms are satisfied by the ring Z of integers and successor function z ↦ z + 1?

Exercise. Consider the set N = {1} with successor function s : N −→ N : 1 ↦ 1. Show that this system satisfies all Peano axioms except one – which one?

Exercise. Consider the set N = {1, 2} with successor function s : N −→ N mapping 1 ↦ 2 and 2 ↦ 2. Show that this system satisfies all Peano axioms except one – which one?

Exercise. Consider the set M = N ∪ (N + 1/2), where N = {1, 2, 3, . . .} and N + 1/2 = {1/2 + 1, 1/2 + 2, . . .}. Define a successor function s : M −→ M by mapping n ↦ n + 1 and 1/2 + n ↦ 1/2 + (n + 1) for all n ∈ N. Show that this system satisfies all Peano axioms except one – which one?

Addition

Next we define an operation + on N that we call addition. We have to say what m + n should mean. How can we do that in terms of our axioms? Well, we can certainly define m + 1 by putting

m + 1 := s(m). (1.1)

Once we know what m + 1 is, we can put m + 2 = (m + 1) + 1. In order to find a simple rule that does it in general we observe that m + 2 = m + s(1), and that our definition reads m + 2 = s(m + 1). This suggests that we define addition ‘inductively’: assume that we already know what m + n means; we then define

m + s(n) := s(m + n), (1.2)

that is, m + (n + 1) := (m + n) + 1.

Here’s our first theorem (just kidding):

Theorem 1.1. We have 1 + 1 = 2.

Proof. By (1.1), we have 1 + 1 = s(1). On the other hand, we have defined 2 = s(1).
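The recursive definition of addition in (1.1) and (1.2) can be tried out on a computer. The following is only a sketch under an assumed encoding that is not part of the notes: the number 1 is modelled by the empty tuple and s(x) by the one-element tuple containing x; the names ONE, s and add are hypothetical.

```python
# Assumed encoding (not from the notes): 1 is the empty tuple,
# and the successor s(x) wraps x in a one-element tuple.
ONE = ()

def s(x):
    """Successor function of the Peano axioms."""
    return (x,)

def add(m, n):
    """Addition defined by (1.1) and (1.2), recursing on n."""
    if n == ONE:
        return s(m)            # (1.1): m + 1 := s(m)
    pred = n[0]                # n = s(pred)
    return s(add(m, pred))     # (1.2): m + s(pred) := s(m + pred)

TWO = s(ONE)
THREE = s(TWO)

assert add(ONE, ONE) == TWO    # Theorem 1.1
assert add(TWO, ONE) == THREE
```

Nothing here uses Python's built-in arithmetic; the only operations are wrapping and unwrapping tuples, which mirrors the fact that addition is built from s alone.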


Remark. The way we have done it, the equation 2 = 1 + 1 follows immediately from the definitions. There are other axiom systems (for example, it is possible to define natural numbers using the axioms of set theory), and in some of them the statement 1 + 1 = 2 is a theorem whose proof requires some work. In their book ‘Principia Mathematica’, Whitehead & Russell prove this equation after about 500 pages. This means that they must be doing something we don’t, and they do. Not only are we using naive set theory (we haven’t defined notions like sets, subsets, elements etc.), we haven’t even defined what a proof is, or by which methods we are allowed to draw conclusions. It is possible to make all this precise, but you shouldn’t be surprised to learn that this gets quite complicated (as a matter of fact, it doesn’t ‘get’, it is complicated right from the start). In any case, this is part of what is called ‘mathematical logic’, and some people like it. You can even get famous by doing it: Gödel was a logician, and Hofstadter’s book Gödel, Escher, Bach was a bestseller in the early 1980s. One of Gödel’s surprising (if not shocking) results was the proof that there are statements about natural numbers that are true but cannot be proved, that is to say: cannot be derived from the Peano axioms in a finite number of steps. In fact, for each finite set of axioms including those of Peano, there are true statements about natural numbers that can’t be proved. That hasn’t kept number theorists from proving theorems, however: life goes on.

Lemma 1.2. Addition m + n is now defined for all m,n ∈ N.

Proof. The following proof is fairly typical for much that follows. Make sure you understand everything.

Let m ∈ N be any natural number. Let S be the set of all n ∈ N for which m + n is defined. We want to show that m + n is defined for all n ∈ N, i.e., that S = N. We shall accomplish this by using axiom N5 (induction).

First, we have 1 ∈ S since, by (1.1), m + 1 is defined as s(m).

Next, if n ∈ S, then m + n is defined, and since m + s(n) = s(m + n) by (1.2), so is m + s(n). In other words: if n ∈ S, then s(n) ∈ S.

By the Induction axiom [N5], we conclude that S = N, hence addition m + n is defined for all n ∈ N (and also for all m ∈ N since m was arbitrary).

Now we can prove all the laws that are satisfied by the addition of natural numbers; we shall be content with a few examples, however.

Proposition 1.3 (Associativity of Addition). For all x, y, z ∈ N, we have x + (y + z) = (x + y) + z.

Proof. Let x, y ∈ N be arbitrary and put

S = {z ∈ N : x + (y + z) = (x + y) + z}.

Again, S is the set of natural numbers z ∈ N for which the claim is true, and our task is to show that S = N.

Now 1 ∈ S because

x + (y + 1) = x + s(y)       by (1.1)
            = s(x + y)       by (1.2)
            = (x + y) + 1    by (1.1)

Next assume that z ∈ S. Then we want to show that s(z) ∈ S, and to this end we have to prove that x + (y + s(z)) = (x + y) + s(z). Here we go:

x + (y + s(z)) = x + s(y + z)       by (1.2)
               = s(x + (y + z))     by (1.2)
               = s((x + y) + z)     since z ∈ S
               = (x + y) + s(z)     by (1.2)

By the induction principle, this proves that S = N and we are done.

Note that there’s not the smallest possibility of a gap in our proof; we explained each and every step, and the result holds without any shred of doubt.

Lemma 1.4. For all x ∈ N, we have x + 1 = 1 + x.

Proof. Let S denote the set of all x ∈ N for which x + 1 = 1 + x. Then 1 ∈ S since 1 + 1 = 1 + 1. Now assume that x ∈ S. Then

s(x) + 1 = (x + 1) + 1    by (1.1)
         = (1 + x) + 1    since x ∈ S
         = 1 + (x + 1)    by Prop. 1.3
         = 1 + s(x)       by (1.1)

Thus S = N by the induction principle.

Now we can prove

Proposition 1.5 (Commutativity of Addition). For all x, y ∈ N we have x + y = y + x.

Proof. You know the game by now: for an arbitrary x ∈ N, let S denote the set of all y ∈ N such that x + y = y + x. By Lemma 1.4, we have 1 ∈ S.

Now assume that y ∈ S. Then

x + s(y) = s(x + y)       by (1.2)
         = s(y + x)       since y ∈ S
         = y + s(x)       by (1.2)
         = y + (x + 1)    by (1.1)
         = y + (1 + x)    by Lemma 1.4
         = (y + 1) + x    by Prop. 1.3
         = s(y) + x       by (1.1)

Thus S = N, and we are done.

OK, now it’s your turn: the proofs of the following results are left as exercises.

Proposition 1.6 (Cancellation Law). If x + z = y + z for some x, y, z ∈ N, then x = y.

Hint. This is easy for z = 1 (use Peano); now use induction.

Proposition 1.7. For x, y ∈ N, we have x + y ≠ x.

Hint. Use induction on x.

Theorem 1.8 (Trichotomy Law for Addition). For any x, y ∈ N, exactly one of the following three cases holds:

(i) x = y;

(ii) x = y + z for some z ∈ N;

(iii) y = x + z for some z ∈ N.

Proof. We first show that no two of these statements can hold simultaneously. Assume that (i) and (ii) are both true. Then x = x + z, contradicting Prop. 1.7. The claim that (i) and (iii) [or (ii) and (iii)] cannot hold simultaneously is left as an exercise.

Now we have to prove that, given x, y ∈ N, at least one of these claims is true. We consider an arbitrary y ∈ N and do induction on x, that is, we put

S = {x ∈ N : (i) or (ii) or (iii) is true}.

We claim that 1 ∈ S. If 1 = y, this is clear, so assume that 1 ≠ y. In this case, y must be the successor of some natural number, say y = s(z) (by [N5], the set consisting of 1 and all successors is all of N). But then y = s(z) = z + 1 = 1 + z by the definition of addition and commutativity, and y = 1 + z shows that (iii) is satisfied.

Now we claim that x ∈ S implies s(x) ∈ S, so assume that x ∈ S. Then we are in exactly one of three cases:

a) x = y; then s(x) = s(y) = y + 1, so (ii) holds for s(x);

b) x = y + z for some z ∈ N; then s(x) = s(y + z) = y + s(z), so again (ii) is true.

c) y = x + z for some z ∈ N. If z = 1, then y = s(x), and (i) holds. If z ≠ 1, then z = s(v) for some v ∈ N, hence

y = x + z = x + s(v) = x + (v + 1) = (x + 1) + v = s(x) + v,

where we have used various properties of addition, so (iii) holds.

Thus if x ∈ S, then s(x) ∈ S, hence S = N by induction, and we are done.
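The induction in this proof is constructive: it tells us how to find out which of the three cases holds. Here is a sketch that follows the proof step by step, under the assumption (made only for this illustration) that natural numbers are modelled by Python integers ≥ 1; the function name compare is hypothetical.

```python
def compare(x, y):
    """Return which case of Theorem 1.8 holds, following the induction on x.

    Natural numbers are modelled by Python ints >= 1 (an assumption of this sketch).
    Result: ("i", None) if x = y, ("ii", z) if x = y + z, ("iii", z) if y = x + z.
    """
    if x == 1:
        # Base case: either y = 1, or y = s(z) = 1 + z.
        return ("i", None) if y == 1 else ("iii", y - 1)
    case, z = compare(x - 1, y)      # induction hypothesis for the predecessor of x
    if case == "i":                  # case a): x - 1 = y, so x = y + 1
        return ("ii", 1)
    if case == "ii":                 # case b): x - 1 = y + z, so x = y + s(z)
        return ("ii", z + 1)
    # case c): y = (x - 1) + z
    if z == 1:                       # y = s(x - 1) = x
        return ("i", None)
    return ("iii", z - 1)            # z = s(v), so y = x + v

assert compare(5, 3) == ("ii", 2)    # 5 = 3 + 2
assert compare(3, 5) == ("iii", 2)   # 5 = 3 + 2
```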


Finally, let us introduce the well known (or so I hope) sigma notation for sums: for natural numbers $x_1, \ldots, x_n, \ldots \in \mathbb{N}$ we define $\sum_{k=1}^{n} x_k$ inductively by

$$\sum_{k=1}^{1} x_k = x_1 \tag{1.3}$$

and

$$\sum_{k=1}^{s(n)} x_k = \Big( \sum_{k=1}^{n} x_k \Big) + x_{n+1}. \tag{1.4}$$

Generalizing the properties proved above to finite sums is then again left as an exercise in induction; let us just mention the examples

$$\sum_{k=n+1}^{n+m} x_k = \sum_{k=1}^{m} x_{n+k},$$

$$\sum_{k=1}^{n} x_k + \sum_{k=1}^{m} x_{n+k} = \sum_{k=1}^{n+m} x_k,$$

$$\sum_{k=1}^{n} x_k + \sum_{k=1}^{n} y_k = \sum_{k=1}^{n} (x_k + y_k).$$

Let us see how to prove the first claim. For m = 1, the claim is

$$\sum_{k=n+1}^{n+1} x_k = \sum_{k=1}^{1} x_{n+k},$$

and by our definition (1.3), both sides are equal to $x_{n+1}$.

Now assume that the claim is true for some m ∈ N; then we claim that it also holds for s(m). In fact,

$$\begin{aligned}
\sum_{k=n+1}^{n+s(m)} x_k &= \Big( \sum_{k=n+1}^{n+m} x_k \Big) + x_{n+m+1} && \text{by (1.4)} \\
&= \Big( \sum_{k=1}^{m} x_{n+k} \Big) + x_{n+m+1} && \text{by assumption} \\
&= \sum_{k=1}^{s(m)} x_{n+k} && \text{by (1.4),}
\end{aligned}$$

and the claim follows. Induction does the rest.
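The inductive definition (1.3), (1.4) translates directly into a recursive function. The sketch below assumes, for illustration only, that natural numbers are Python integers ≥ 1 and that a sequence is given as a function k ↦ x_k; the names sigma, x and y and the sample sequences are hypothetical. It spot-checks the second and third identities on one example, which is of course no substitute for the induction proofs.

```python
def sigma(x, n):
    """Sum x(1) + ... + x(n), defined recursively as in (1.3) and (1.4)."""
    if n == 1:
        return x(1)                    # (1.3)
    return sigma(x, n - 1) + x(n)      # (1.4)

def x(k):
    return k * k        # sample sequence x_k = k^2 (an arbitrary choice)

def y(k):
    return 2 * k + 1    # sample sequence y_k = 2k + 1 (an arbitrary choice)

n, m = 4, 3

# Second identity: sum up to n, plus the shifted sum, equals the sum up to n + m.
assert sigma(x, n) + sigma(lambda k: x(n + k), m) == sigma(x, n + m)

# Third identity: sums can be added term by term.
assert sigma(x, n) + sigma(y, n) == sigma(lambda k: x(k) + y(k), n)
```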

Exercise. Prove the other two formulas.

We also should mention the following generalization of associativity: the sum $\sum_{k=1}^{n} x_k$ is by definition equal to $((((x_1 + x_2) + x_3) + x_4) + \ldots) + x_n$; “of course” this sum does not depend on how we place the brackets, but we still have to prove this. A concise formulation of this property is the following: if $x_1, \ldots, x_n$ is a finite set of natural numbers, and if $y_1, \ldots, y_n$ is a permutation of the $x_k$, then $\sum_{k=1}^{n} x_k = \sum_{k=1}^{n} y_k$.

Multiplication

We are now going to define the multiplication of natural numbers. For the definition of x · y we use induction on y. First we put

x · 1 = x (1.5)

(of course). Now assume that we have defined x · y; then we put

x · s(y) = x · y + x (1.6)

(in other words: we put x · (y + 1) := x · y + x). It should be obvious by now that the induction principle guarantees that xy is defined for any x, y ∈ N. In general, we omit the multiplication sign · and write xy instead of x · y. We shall also write xy + z instead of (xy) + z and agree that we always evaluate expressions by multiplying first and then adding the products.
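As with addition, the definitions (1.5) and (1.6) can be run directly. The sketch below again uses an assumed tuple encoding that is not part of the notes (1 is the empty tuple, s(x) is the tuple containing x); the helper names from_int and to_int are hypothetical and exist only to make the result readable.

```python
ONE = ()                       # assumed encoding of the number 1

def s(x):
    return (x,)                # successor

def add(m, n):
    # (1.1) and (1.2)
    return s(m) if n == ONE else s(add(m, n[0]))

def mul(x, y):
    """Multiplication defined by (1.5) and (1.6), recursing on y."""
    if y == ONE:
        return x                        # (1.5): x * 1 = x
    return add(mul(x, y[0]), x)         # (1.6): x * s(y') = x * y' + x

def from_int(k):
    """Encode a Python int k >= 1 (helper for this sketch only)."""
    return ONE if k == 1 else s(from_int(k - 1))

def to_int(n):
    """Decode by counting successor applications."""
    return 1 if n == ONE else 1 + to_int(n[0])

assert to_int(mul(from_int(3), from_int(4))) == 12
```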

Next we prove the basic properties of multiplication:

Proposition 1.9 (Left Distributive Law). For x, y, z ∈ N we have x(y + z) = xy + xz.

Proof. Take x, y ∈ N and do induction on z. We find

x(y + 1) = x · s(y)     by (1.1)
         = xy + x       by (1.6)
         = xy + x · 1   by (1.5).

Next we assume that the left distributive law holds for z and prove that it also holds for s(z):

x(y + s(z)) = x · s(y + z)      by (1.2)
            = x(y + z) + x      by (1.6)
            = (xy + xz) + x     by assumption
            = xy + (xz + x)     by Prop. 1.3
            = xy + x · s(z)     by (1.6)

This proves the claim by induction.

Where there’s a left distributive law, there’s a

Proposition 1.10 (Right Distributive Law). We have (x + y)z = xz + yz for all x, y, z ∈ N.

This proof is left as an exercise. Note that right distributivity would follow immediately from left distributivity if we already knew that multiplication was commutative. Fact is, however: we don’t. But it comes right next on our (that is your) agenda: we start out with commutativity for multiplication by 1:

Lemma 1.11. For all x ∈ N, we have 1 · x = x.

and then do induction:

Proposition 1.12 (Commutativity of Multiplication). For all x, y ∈ N, we have xy = yx.

Yet another exercise:

Proposition 1.13 (Associativity of Multiplication). For x, y, z ∈ N, we have x(yz) = (xy)z.

And another one:

Proposition 1.14 (Cancellation Law of Multiplication). If xz = yz for x, y, z ∈ N, then x = y.

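Exercise. Prove that $a \big( \sum_{k=1}^{n} x_k \big) = \sum_{k=1}^{n} (a x_k)$ for $a, x_1, \ldots, x_n \in \mathbb{N}$.

Exercise. Define $\prod_{k=1}^{n} x_k$ for $x_1, \ldots, x_n \in \mathbb{N}$.

Exercise. Prove that $\prod_{k=1}^{m} x_k \cdot \prod_{k=m+1}^{n} x_k = \prod_{k=1}^{n} x_k$.

Exercise. Prove that $\prod_{k=1}^{n} x_k \cdot \prod_{k=1}^{n} y_k = \prod_{k=1}^{n} (x_k y_k)$.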

Now that we know how to multiply, we can go forth and define exponentiation $a^n$ for a, n ∈ N: we put $a^1 = a$, and if $a^n$ is already defined, then $a^{s(n)} = a^n \cdot a$. Armed with this definition, we perform the same procedure as last time:

• $a^n$ is defined for all a, n ∈ N (induction on n)

• $a^{m+n} = a^m a^n$ for a, m, n ∈ N (induction on n)

• $a^{mn} = (a^m)^n$ for a, m, n ∈ N (induction on n)

• $a^n b^n = (ab)^n$ for a, b, n ∈ N.

This is recommended as an exercise unless it is perfectly clear for you how to proceed.
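Before attempting the proofs it can be reassuring to see the recursive definition at work. The following sketch assumes, purely for illustration, that natural numbers are Python integers ≥ 1; the function name power is hypothetical. It spot-checks the three laws on a small range, which is evidence and not a proof.

```python
def power(a, n):
    """a^n for Python ints a, n >= 1, via a^1 = a and a^{s(n)} = a^n * a."""
    if n == 1:
        return a
    return power(a, n - 1) * a

# Spot-check the exponent laws for small values (the bound 5 is arbitrary).
for a in range(1, 5):
    for b in range(1, 5):
        for m in range(1, 5):
            for n in range(1, 5):
                assert power(a, m + n) == power(a, m) * power(a, n)
                assert power(a, m * n) == power(power(a, m), n)
                assert power(a, n) * power(b, n) == power(a * b, n)
```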

There is one last set of properties of the naturals that we have not yet touched upon: those based on the relation <.

N as a well-ordered set

We start by defining the relevant concept. For x, y ∈ N we write

• x < y if there is an n ∈ N such that x + n = y;

• x > y if y < x;

• x ≤ y if x < y or x = y;

• x ≥ y if y ≤ x.

We say that a set R is simply ordered if we have a relation < such that the following conditions are satisfied for all x, y, z ∈ R:

O1 Trichotomy: exactly one of x < y, x = y, x > y holds.

O2 Transitivity: if x < y and y < z then x < z.

The proofs of the following claims are now straightforward:

Proposition 1.15. The set N of natural numbers is simply ordered.

Proof. That condition O1 holds follows immediately from the trichotomy law of addition (Thm. 1.8).

Now assume that x < y and y < z. By definition, there exist m, n ∈ N such that y = x + m and z = y + n. But then z = y + n = (x + m) + n = x + (m + n), hence x < z. This proves O2.

Proposition 1.16. For x, y, z ∈ N, < and ≤ have the following properties:

• x ≤ x;

• if x ≤ y and y ≤ x then x = y;

• either x ≤ y or y ≤ x;

• if x < y, then x + z < y + z for z ∈ N.

• if x < y then xz < yz.

We shall also need a couple of almost trivial results:

Proposition 1.17. We have

1. There is no x ∈ N with x < 1;

2. x < s(y) if and only if x ≤ y, where x, y ∈ N;

3. 1 ≤ x for all x ∈ N;

4. s(y) ≤ x if and only if y < x, where x, y ∈ N;

For a subset S ⊆ N, we say that S has a smallest element if there is an s ∈ S such that s ≤ t for all t ∈ S. The following result is basic (we say that N is well-ordered):

Theorem 1.18. Every nonempty subset S ⊆ N has a smallest element.

Proof. The idea is quite simple: since S is non-empty, there is an s1 ∈ S. If s1 ≤ s for all s ∈ S, then s1 is a smallest element, and we are done; if not, then there is an s2 ∈ S such that s2 < s1. If s2 is a smallest element, the proof is complete, if not then pick an s3 ∈ S with s3 < s2. After finitely many steps, this process must terminate.

While the last claim may be intuitively clear (and almost obvious), it is quite hard to prove because we are going ‘from the top to the bottom’, whereas our principal method of proof (induction) goes in the inverse direction. If someone can come up with a simpler proof, I’m all ears.

Let S ⊆ N be non-empty, and assume for contradiction that S does not contain a smallest element. Define

R = {x ∈ N : x < y for all y ∈ S}.

Then R ⊆ N \ S. In fact, if x ∈ R were also an element of S, then x < y does not hold for all y ∈ S because it fails for y = x.

We now prove by induction that R = N. This gives the desired contradiction, since R = N and R ⊆ N \ S imply that N ⊆ N \ S, hence S = ∅.

We start by showing that 1 ∈ R. We have 1 ≤ y for all y ∈ S by Prop. 1.17.3. Now if 1 ∈ S, then 1 would be a smallest element in S; since by assumption there is no such beast, we have 1 ∉ S, and thus 1 < y for all y ∈ S.

Next we have to show that if x ∈ R, then s(x) ∈ R. So assume that x ∈ R, and let y ∈ S. Then x < y, so s(x) ≤ y by Prop. 1.17.4. If s(x) ∈ S, then s(x) would be a smallest element in S contrary to our assumption. Hence s(x) ∉ S, and so s(x) < y for all y ∈ S, thus s(x) ∈ R.
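The informal descent s1 > s2 > s3 > . . . from the beginning of the proof is easy to imitate on a computer. The following sketch assumes that the subset S is given as a finite, nonempty collection of Python integers ≥ 1, which is a strong simplification of an arbitrary subset of N; the function name smallest_element is hypothetical.

```python
def smallest_element(S):
    """Follow the descent s1 > s2 > ... described in the proof idea.

    S is a nonempty finite collection of Python ints >= 1; modelling a
    subset of N by a finite set is an assumption of this sketch.
    """
    S = set(S)
    if not S:
        raise ValueError("S must be nonempty")
    current = next(iter(S))                      # some s1 in S
    while True:
        smaller = [t for t in S if t < current]
        if not smaller:                          # current <= t for all t in S
            return current
        current = smaller[0]                     # pick some element below current

assert smallest_element({7, 3, 12, 5}) == 3
```

The loop terminates because each step strictly decreases the current element; this is exactly the claim that the rigorous proof has to establish by induction.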

Let us also prove a simple result that will evolve into the Archimedean property of the reals:

Proposition 1.19. If x < y are natural numbers, then there exists an n ∈ N such that nx > y.

Proof. Put n = y + 1. Then nx = (y + 1)x ≥ y + 1 > y, since 1 ≤ x.

Finally we take the opportunity to explain the difference between arbitrarily long and infinitely long sequences:

• there exist arbitrarily long decreasing sequences of natural numbers;

• there do not exist infinitely long decreasing sequences of natural numbers.

The first statement means that we can find sequences of decreasing natural numbers that are as long as we wish: in fact, if we want a sequence of length N, we simply pick N, N − 1, N − 2, . . . , 2, 1. On the other hand, there clearly are no infinite sequences of this type: if the sequence starts with a number N, then the sequence cannot have more than N elements, and in particular it is finite.

1.2 The Integers Z

The first step in constructing the real numbers from N is of course the construction of Z, that is, of the (ring of) integers. Clearly mathematicians don’t like to mumble something about debts or negative temperatures when defining negative integers. So how do they do it?

Here’s how. We can represent every natural number n as a difference of two natural numbers in many ways, e.g. 2 = 3 − 1 = 4 − 2 = 5 − 3 = . . .. Thus we can represent 2 by the pairs (3, 1), (4, 2), (5, 3) etc. of natural numbers. If we already knew negative numbers, then of course −2 = 1 − 3 = 2 − 4 = . . . would be represented by the pairs (1, 3), (2, 4), (3, 5) etc. The idea is now to turn everything around and create negative integers using pairs (m, n) of natural numbers.

We define an equivalence relation on the set

W = {(m,n) : m,n ∈ N}

of these pairs by putting (m, n) ∼ (m′, n′) if m + n′ = n + m′. This is indeed an equivalence relation:

• Reflexivity: (m, n) ∼ (m, n);

• Symmetry: (m, n) ∼ (m′, n′) =⇒ (m′, n′) ∼ (m, n);

• Transitivity: (m, n) ∼ (m′, n′) and (m′, n′) ∼ (m′′, n′′) =⇒ (m, n) ∼ (m′′, n′′).

Of course there is something to prove here. DO IT!

Note that (3, 1) ∼ (4, 2) ∼ (5, 3) ∼ . . ..

Let [m, n] = {(x, y) : (x, y) ∼ (m, n)} denote the equivalence class of (m, n), and let Z = {[m, n] : m, n ∈ N} denote the set of all equivalence classes.

We can make N into a subset of Z by identifying a natural number n with the equivalence class [n + 1, 1] (in our example, we would identify 2 with [3, 1]). Moreover, we shall simply write −n for the class [1, 1 + n] and introduce the symbol 0 = [1, 1].
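To get a feeling for these equivalence classes, here is a small sketch that models pairs of natural numbers as Python tuples of integers ≥ 1 (an assumption made only for this illustration). The function names equivalent and canonical are hypothetical; canonical uses Python's built-in subtraction as a shortcut to pick, in each class, the representative of the form (x, 1) or (1, x).

```python
def equivalent(p, q):
    """(m, n) ~ (m', n') iff m + n' = n + m', using only addition in N."""
    (m, n), (m2, n2) = p, q
    return m + n2 == n + m2

def canonical(p):
    """Representative of the class [m, n] of the form (x, 1) or (1, x)."""
    m, n = p
    if m >= n:
        return (m - n + 1, 1)    # the natural number m - n, or 0 if m == n
    return (1, n - m + 1)        # the negative integer -(n - m)

assert equivalent((3, 1), (4, 2)) and equivalent((4, 2), (5, 3))
assert canonical((5, 3)) == (3, 1)      # the class identified with 2
assert canonical((2, 4)) == (1, 3)      # the class written -2
assert canonical((6, 6)) == (1, 1)      # the class 0
```

Two pairs are equivalent exactly when they have the same canonical representative, so canonical gives a concrete way of storing an integer as a single pair.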

Remark. At this point I regret that I have not defined N as N = {0, 1, 2, . . .} (simply replace 1 in the Peano axioms by 0): if I had, then we could identify natural numbers n with the integers [n, 0] instead of [n + 1, 1]; this seems more natural, and it would make several formulas below considerably simpler.

Remark. There are many definitions of what mathematics is (or should be), each of them emphasizing different aspects. The definition that describes what we are doing here is

Mathematics is the art of identifying different things.

This ‘identification’ can be given a precise mathematical formulation by introducing the map ι : N −→ Z that identifies N with a subset of Z: we put ι(n) = [n + 1, 1].

    N −→ Z
    ⋮      ⋮
    3 ↦ [4, 1] ≃ 3
    2 ↦ [3, 1] ≃ 2
    1 ↦ [2, 1] ≃ 1
        [1, 1] ≃ 0
        [1, 2] ≃ −1
        [1, 3] ≃ −2
           ⋮

Now we only are allowed to ‘identify’ N with its image ι(N) ⊆ Z if ι does not map two natural numbers to the same integer, that is, if ι is injective. Let’s check this:

Assume that ι(m) = ι(n), i.e., that [m + 1, 1] = [n + 1, 1]. By definition of these equivalence classes this means that (m + 1, 1) ∼ (n + 1, 1), that is, m + 1 + 1 = n + 1 + 1. The cancellation law then implies m = n, hence ι is injective.

We remark that

Z = −N ∪ {0} ∪ N = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}.

To see this, we have to prove that for every (m, n) ∈ W there is an x ∈ N such that (m, n) ∼ (x, 1) or (m, n) ∼ (1, x). If (m, n) ∼ (x, 1) with x = k + 1 > 1, then [m, n] = ι(k) = k ∈ N; if x = 1, then [m, n] = [1, 1] = 0; and if (m, n) ∼ (1, x) with x = k + 1 > 1, then [m, n] = [1, 1 + k] = −k. The existence of such an x follows from the Trichotomy Law.

We also mention that Z is countable. Recall that a set A is said to be countable if there exists a bijection between A and N. The fact that ι : N ↪→ Z is not a bijection does not imply that Z is not countable: it simply means that ι is not bijective, so we have to look for another map.

This isn’t hard: the sequence 0, 1, −1, 2, −2, etc. contains every integer once, hence 0 ←→ 1, 1 ←→ 2, −1 ←→ 3, 2 ←→ 4, −2 ←→ 5 etc. gives us the desired bijection Z ←→ N. We could write down a formula if we wanted to, but this would only complicate things.
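For readers who do want to see a formula, here is one possible way to write the bijection down, as a sketch using ordinary Python integers; the function names z_to_n and n_to_z are hypothetical. Positive integers go to even positions and the remaining integers to odd positions of the list 0, 1, −1, 2, −2, . . .

```python
def z_to_n(z):
    """Position of the integer z in the list 0, 1, -1, 2, -2, ..."""
    return 2 * z if z > 0 else 1 - 2 * z

def n_to_z(n):
    """Inverse map: the integer at position n >= 1 of that list."""
    return n // 2 if n % 2 == 0 else -((n - 1) // 2)

# The first five positions reproduce the list from the text.
assert [n_to_z(n) for n in range(1, 6)] == [0, 1, -1, 2, -2]

# The two maps undo each other on a sample range.
assert all(z_to_n(n_to_z(n)) == n for n in range(1, 100))
assert all(n_to_z(z_to_n(z)) == z for z in range(-50, 50))
```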

Addition

We now show that we can define addition, multiplication and an ordering < on Z in such a way that the properties of N proved in the last section continue to hold.

We start by defining addition ⊕ on Z. We have to say what [r, s] ⊕ [t, u] is supposed to be. Clearly we would like to have [r, s] = r − s, [t, u] = t − u, so the sum should be r − s + t − u = r + t − (s + u) = [r + t, s + u]. With this idea in mind we now define

[r, s]⊕ [t, u] = [r + t, s + u], (1.7)

where the addition inside the brackets is the addition in N.

There is no need to prove that we have defined addition on all elements of Z, because this is obvious here; the point is that we did not define ⊕ inductively, we simply wrote down a formula.

Nevertheless there’s a lot of work to do. First we have to prove that this addition is well defined (this is something that comes up whenever we define something on equivalence classes). What this means is: assume that [r, s] = [r′, s′] and [t, u] = [t′, u′]. On the one hand, we have

[r, s]⊕ [t, u] = [r + t, s + u].

If we replace the left hand side by [r′, s′]⊕ [t′, u′], then we clearly get

[r′, s′]⊕ [t′, u′] = [r′ + t′, s′ + u′].

But if our addition is to make any sense, then the right hand sides should be equal because, after all, the left hand sides are. Thus we want to show that

[r′ + t′, s′ + u′] = [r + t, s + u]. (1.8)

What do we know? We know that [r, s] = [r′, s′], which by definition means (r, s) ∼ (r′, s′), that is, r + s′ = s + r′. Similarly, [t, u] = [t′, u′] implies t + u′ = u + t′. Adding these equations and using commutativity and associativity for natural numbers we get r + t + s′ + u′ = s + u + r′ + t′, which in turn is equivalent to (1.8).
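What ‘well defined’ means can also be tested mechanically: equivalent inputs must produce equivalent outputs. The sketch below models pairs as Python tuples of integers ≥ 1 (an assumption of this illustration) and checks this for all small representatives; the names equivalent and add_pairs are hypothetical, and the bound on the entries is arbitrary. It is a spot check, not a replacement for the argument above.

```python
from itertools import product

def equivalent(p, q):
    return p[0] + q[1] == p[1] + q[0]          # (m, n) ~ (m', n')

def add_pairs(p, q):
    return (p[0] + q[0], p[1] + q[1])          # formula (1.7) on representatives

# All pairs with entries in 1..4, and all ways of choosing p, p', q, q'.
pairs = list(product(range(1, 5), repeat=2))
for p, p2, q, q2 in product(pairs, repeat=4):
    if equivalent(p, p2) and equivalent(q, q2):
        # Replacing p by p' and q by q' does not change the class of the sum.
        assert equivalent(add_pairs(p, q), add_pairs(p2, q2))
```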

Next we have to show that the two additions agree on N; after all, we are using the very same symbols for natural numbers 1, 2, . . . and their images 1, 2, . . . under ι in Z. This can only work if, for natural numbers m, n, the sum m + n is the same whether evaluated in N or in Z. In other words: we want to be sure that

ι(m + n) = ι(m)⊕ ι(n).

This is a straightforward computation:

ι(m) ⊕ ι(n) = [m + 1, 1] ⊕ [n + 1, 1]    by definition of ι
            = [m + n + 2, 2]              by (1.7)
            = [m + n + 1, 1]              since (m + n + 2, 2) ∼ (m + n + 1, 1)
            = ι(m + n)                    by definition of ι

Now that there is no need to distinguish between the two types of addition anymore, we shall often write + instead of ⊕ for the addition on Z.

Of course we have to prove that associativity and commutativity also hold for our addition in Z. So why is (x + y) + z = x + (y + z) for all x, y, z ∈ Z? Write x = [r, s], y = [t, u] and z = [v, w] with r, s, t, u, v, w ∈ N. Then (x + y) + z = [r + t, s + u] + [v, w] = [(r + t) + v, (s + u) + w], and similarly x + (y + z) = [r + (t + v), s + (u + w)]. Because addition in N is associative, so is addition in Z (again, observe that there’s no need for invoking induction here).

Exercise. Prove that addition on Z is commutative.

As we know, we also can subtract integers in Z. So how do we define x − y? Well, we write x = [r, s] and y = [t, u]; we cannot put x − y = [r − t, s − u] because r − t and s − u might not be natural numbers; but if they were, we would have [r − t, s − u] = [r + u, s + t], and nothing prevents us from defining

[r, s] ⊖ [t, u] = [r, s] ⊕ [u, t] = [r + u, s + t]. (1.9)

Again, we have to show that it is well defined. Once this is done, it is easy to prove that Z is a group with respect to addition, and that 0 = [1, 1] is the neutral element.

What does that mean? A group is a set G of elements together with a composition, that is, a map + : G × G −→ G that maps a pair of elements (g, g′) ∈ G × G to another element g + g′ ∈ G; we also demand that this composition satisfy the following rules:

G1 there is a neutral element 0 ∈ G such that g + 0 = g for all g ∈ G;

G2 for every g ∈ G there is an element g′ ∈ G such that g + g′ = 0 (we shall write g′ = −g);

G3 the composition is associative: we have (g + g′) + g′′ = g + (g′ + g′′) for all g, g′, g′′ ∈ G.

If the group also satisfies the condition

G4 g + g′ = g′ + g for all g, g′ ∈ G;

then we say that G is commutative (abelian).

The set N of natural numbers is not a group with respect to +: there is a composition + : N × N −→ N, but there is no neutral element (let alone inverses).

The set Z of integers, on the other hand, is a group with respect to + (CHECK IT!). In fact, Z is not only a group, it also carries the structure of a ring. But to see this, we have to define multiplication in Z first.

Multiplication

We want to define [r, s] ⊙ [t, u] in such a way that multiplication of natural numbers N ⊂ Z agrees with the one we have defined before, namely that ι(m) ⊙ ι(n) = ι(mn). In other words, our multiplication should give

[m + 1, 1] ⊙ [n + 1, 1] = [mn + 1, 1].

More generally, if, for a minute, we think of [r, s] as the ‘integer’ r − s, then we want [r, s] ⊙ [t, u] ≃ (r − s)(t − u) = rt + su − ru − st ≃ [rt + su, ru + st], and this suggests the definition

[r, s] ⊙ [t, u] = [rt + su, ru + st]. (1.10)

Exercise. Show that the operation [r, s] ∗ [t, u] = [rt, su] is not well defined.

Once more we have to show that the multiplication (1.10) is well defined and that it agrees with multiplication in N (actually we have defined it in such a way that it must; what we have to check here is that ι(m) ⊙ ι(n) = ι(mn)). Then one generalizes distributivity, commutativity, associativity and the cancellation law to integers in Z.

Let us just note in passing that

(−1) · (−1) = [1, 2] ⊙ [1, 2]    by our identification
            = [5, 4]             by (1.10)
            = [2, 1]             since (5, 4) ∼ (2, 1)
            = +1                 since ι(1) = [2, 1]

In fact, for m,n ∈ Z it is now easy to prove that

(−m) · n = −mn,

m · (−n) = −mn,

(−m) · (−n) = mn.

Thus the rules for multiplying signs come out naturally from our definition of multiplication on Z, which in turn was forced upon us as the simplest way of extending multiplication on N.
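The sign rules can be watched coming out of formula (1.10) in a short experiment. The sketch below models pairs as Python tuples of integers ≥ 1 (an assumption of this illustration); the names equivalent, mul_pairs, pos and neg are hypothetical, and the check only covers m, n ∈ N in a small range rather than all integers.

```python
def equivalent(p, q):
    return p[0] + q[1] == p[1] + q[0]          # (m, n) ~ (m', n')

def mul_pairs(p, q):
    """Formula (1.10) on representatives: (r, s), (t, u) -> (rt + su, ru + st)."""
    (r, s), (t, u) = p, q
    return (r * t + s * u, r * u + s * t)

def pos(n):
    return (n + 1, 1)     # representative of the class identified with n

def neg(n):
    return (1, n + 1)     # representative of the class written -n

assert mul_pairs(neg(1), neg(1)) == (5, 4)             # as computed in the text
assert equivalent(mul_pairs(neg(1), neg(1)), pos(1))   # (-1)(-1) = +1

for m in range(1, 6):
    for n in range(1, 6):
        assert equivalent(mul_pairs(neg(m), pos(n)), neg(m * n))
        assert equivalent(mul_pairs(pos(m), neg(n)), neg(m * n))
        assert equivalent(mul_pairs(neg(m), neg(n)), pos(m * n))
```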

Now we are ready to state that Z is a ring. A ring R is a set on which two kinds of compositions are defined; they are usually denoted by + and ·. Of course, these compositions are to satisfy certain conditions; first of all, r + s and r · s should be elements of R whenever r and s are. Moreover, we demand

R1 R is an abelian group with respect to +;

R2 (associativity): r(st) = (rs)t for r, s, t ∈ R;

R3 (distributivity): we have r(s + t) = rs + rt and (r + s)t = rt + st for r, s, t ∈ R;

R4 R contains a unit element e satisfying er = re = r for all r ∈ R;

If R also satisfies rs = sr for all r, s ∈ R, then we say that R is commutative. Finally, a ring R is called a domain if xy = 0 implies x = 0 or y = 0.

In any ring we have 0x = 0: in fact,

0x = (0 + 0)x    since 0 is the neutral element of +
   = 0x + 0x     by distributivity,

so subtracting 0x from both sides gives 0 = 0x.

Theorem 1.20. The integers Z form a commutative domain with respect to addition and multiplication.

Let us prove that Z is indeed a domain. Assume that xy = 0 for x, y ∈ Z. Write x = [r, s] and y = [t, u]. Then [1, 1] = 0 = xy = [r, s] ⊙ [t, u] = [rt + su, ru + st] by assumption, hence rt + su + 1 = ru + st + 1, and by the cancellation law, rt + su = ru + st.

Now assume that x ≠ 0; then r + m = s or r = s + m for some m ∈ N by Theorem 1.8. In the first case, rt + (r + m)u = rt + su = ru + st = ru + (r + m)t, hence mu = mt and so u = t, that is, y = 0. The case r = s + m is treated similarly.

Z as an ordered domain

Last but not least, we have to extend the relation < to Z. We put

[r, s] < [t, u] if r + u < t + s. (1.11)

This is well defined and agrees with the ordering on N. For showing that the relation is well defined, we have to assume that (r, s) ∼ (r′, s′) and (t, u) ∼ (t′, u′), and then show that r + u < t + s implies r′ + u′ < t′ + s′. DO IT.

For showing that the order just defined agrees with the one we know from N we have to prove that n < m if and only if ι(n) < ι(m).

Proposition 1.21. The set Z is simply ordered with respect to <.

An ordered domain is a domain R together with an order < such that

OD1 R is simply ordered with respect to <.

OD2 If x < y, then x + z < y + z for x, y, z ∈ R.

OD3 If x < y and 0 < z, then xz < yz.

Proposition 1.22. Z is an ordered domain with respect to <.

Proof. Write x = [r, s] and y = [t, u]. If z ∈ N, then we may put z = [v, 1]. Now x < y means r + u < t + s, and xz < yz is equivalent to (r + u)v < (t + s)v. Now look back at Prop. 1.16.

The rest is left as an exercise.

Proposition 1.23. In any ordered ring R, the following assertions are true:

1. If x < 0, then −x > 0.

2. If x < y and z < 0, then xz > yz.

3. We have x² ≥ 0 for all x ∈ R, with equality if and only if x = 0.

Proof. 1. If x < 0, then x + (−x) < 0 + (−x) by OD2, and so 0 < −x.

2. We have 0 < −z, hence −xz = x · (−z) < y · (−z) = −yz, hence xz > yz.

3. If x ≥ 0, then multiplying through by x ≥ 0 gives x² ≥ 0; if x ≤ 0, then multiplying through by x ≤ 0 gives x² ≥ 0.

We now introduce absolute values in any ordered domain by putting

\[
|x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases}
\]

Here are a few simple properties of the absolute value:

Lemma 1.24. In any ordered domain R, the absolute value |·| has the following properties.

1. |x| ≥ 0: this is clear if x ≥ 0; if x < 0, multiply through by −1 < 0.

2. |xy| = |x| · |y|: Just consider the four possible cases: 1) if x > 0, y > 0, then xy > 0, so the claim is xy = xy, which obviously holds; 2) if x > 0 and y < 0, then xy < 0, hence the claim is −xy = x · (−y). The other two cases are treated similarly.

3. If s ≥ 0 and −s ≤ r ≤ s, then |r| ≤ s. In fact, it is sufficient to prove that r ≤ s and −r ≤ s. The first one is true by assumption, the second one follows from multiplying −s ≤ r through by −1.

Proposition 1.25 (Triangle Inequality). For all x, y in an ordered domain, we have |x + y| ≤ |x| + |y|.

Proof. By adding −|x| ≤ x ≤ |x| and −|y| ≤ y ≤ |y| we obtain −(|x| + |y|) ≤ x + y ≤ |x| + |y|. Now apply Lemma 1.24.3 to r = x + y and s = |x| + |y|.

1.3 The Rationals Q

Just as mathematicians avoid thinking of debts when introducing negative integers, they don’t think of splitting pies when introducing rationals. Instead, they play the game of constructing them using ... pairs of integers, of course.

In this case, we want to identify a fraction a/b with the equivalence class [a, b], where a, b ∈ Z are integers. Since we want a/b = c/d whenever ad = bc, we start by defining an equivalence relation on the set

V = {(r, s) : r ∈ Z, s ∈ N}

of pairs of an integer and a natural number, and of course we put

(r, s) ∼ (t, u) ⇐⇒ ru = st.

It is easily seen that this is an equivalence relation (DO IT!), and we now put

[r, s] = {(x, y) ∈ V : (x, y) ∼ (r, s)}.

Let Q denote the set of all equivalence classes [r, s] with (r, s) ∈ V; a class [r, s] is called a rational number, and we often write r/s instead of [r, s].

Remark. This should be no problem at all for computer scientists: if they were to write a program that does arithmetic with rational numbers, they would realize r/s as a pair [r, s] of integers, identify [r, s] = [nr, ns] for n ∈ N, and define sums and products of such pairs in exactly the same way as we are doing here.

First we identify an integer z ∈ Z with the rational number [z, 1], that is, we consider the map ι : Z −→ Q defined by ι(z) = [z, 1] and first prove that ι is injective (that is, no two different integers correspond to the same rational number). In fact, assume that x, y ∈ Z are such that ι(x) = ι(y). Then [x, 1] = [y, 1], i.e., (x, 1) ∼ (y, 1), and by definition of equivalence in V this means x · 1 = y · 1, hence x = y.

We want to have r/s + t/u = (ru + st)/(su), so we are led to define

[r, s]⊕ [t, u] = [ru + st, su]. (1.12)

This is well defined (!), and agrees with addition on Z under the identification ι: in fact,

\begin{align*}
\iota(x) \oplus \iota(y) &= [x, 1] \oplus [y, 1] \\
&= [x \cdot 1 + 1 \cdot y, 1 \cdot 1] = [x + y, 1] \\
&= \iota(x + y).
\end{align*}

Thus it does not matter whether we add in Z and then identify the result with a rational number, or first view the integers as elements of Q and add there.

Next we define multiplication of rationals by

[r, s] ⊙ [t, u] = [rt, su]. (1.13)

Here we were motivated by r/s · t/u = rt/su. Again, this is well defined and agrees with multiplication of integers on the subset Z ⊂ Q: we have ι(x) ⊙ ι(y) = ι(xy) because

\begin{align*}
\iota(x) \odot \iota(y) &= [x, 1] \odot [y, 1] && \text{by definition of } \iota \\
&= [xy, 1] && \text{by definition (1.13)} \\
&= \iota(xy) && \text{by definition of } \iota.
\end{align*}
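The program alluded to in the remark on computer scientists might look as follows. This is only a sketch under stated assumptions: a rational is a plain tuple (r, s) with r an integer and s a natural number, no reduction to lowest terms is performed, and the function names are illustrative rather than taken from the notes.

```python
# Rationals as pairs (r, s) with r an integer and s a natural number (s >= 1).
# Function names are hypothetical.

def equivalent(a, b):
    """(r, s) ~ (t, u) iff r*u == s*t."""
    (r, s), (t, u) = a, b
    return r * u == s * t

def add(a, b):
    """Addition (1.12): [r, s] + [t, u] = [ru + st, su]."""
    (r, s), (t, u) = a, b
    return (r * u + s * t, s * u)

def mul(a, b):
    """Multiplication (1.13): [r, s] * [t, u] = [rt, su]."""
    (r, s), (t, u) = a, b
    return (r * t, s * u)

def iota(z):
    """Embed an integer z as the class [z, 1]."""
    return (z, 1)

total = add((1, 2), (1, 3))        # (5, 6)
same_total = add((2, 4), (3, 9))   # other representatives give (30, 36)
well_defined_here = equivalent(total, same_total)
```

The last line checks, for one pair of representatives, what "well defined" means: different representatives of the same classes give equivalent results.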

Remark. The map ι : Z −→ Q from the ring of integers to the ring of rational numbers satisfies

ι(x) ⊕ ι(y) = ι(x + y),
ι(x) ⊙ ι(y) = ι(xy).

Maps R −→ S between rings with these properties (we say that they ‘respect the ring structure’) are called ring homomorphisms if they map the unit element of R to the unit element of S. In particular, our ‘identification map’ ι is a ring homomorphism.

Using these definitions, we can prove associativity, commutativity, distributivity, thereby verifying that Q is a ring. In fact, Q is even a field!

A field F is a commutative ring in which we can divide by nonzero elements: thus F is a field if F satisfies the ring axioms, and if in addition

F1 For every r ∈ F \ {0} there is an s ∈ F such that rs = 1.

This is a strong axiom: together with some other ring axioms it implies that fields are domains, in other words: if xy = 0 for field elements x, y ∈ F, then x = 0 or y = 0. In fact, if y ≠ 0, then there is some z ∈ F such that yz = 1 by F1, hence 0 = 0z = (xy)z = x(yz) = x · 1 = x.

One of the most important properties of fields F is

Proposition 1.26. If F is a field and if xy = 0 for x, y ∈ F, then x = 0 or y = 0.

Proof. In fact, assume that xy = 0 and y ≠ 0. Since the nonzero elements of F form a group, y has an inverse, that is, there is a z ∈ F such that yz = 1. But now 0 = xy implies 0 = 0z = (xy)z = x(yz) = x · 1 = x; here we have used associativity of multiplication.

Theorem 1.27. The set Q of rational numbers forms a field with respect to addition and multiplication.

Starting from N, we have step by step constructed larger domains: we introduced Z in order to be able to subtract, and we went over to Q in order to be able to divide by nonzero elements. Now in Q we can add, subtract, multiply and divide by nonzero elements, so why should we want to extend Q again? One reason is that there ‘seem to exist’ numbers that are not rational:

Proposition 1.28. There is no x ∈ Q with x² = 2.

Proof. Assume that x² = 2 for x = m/n. Assume that m and n are not both even (if they are, we can cancel factors of 2 until one of them is odd). Multiplying through by n² gives 2n² = m². Since the left hand side is even so is the right hand side, and this implies that m is even, say m = 2p. Then 2n² = 4p², hence n² = 2p². This in turn shows that n is even: contradiction, since we assumed that m or n is odd.

There is another proof that uses less number theory and is much more powerful:

Theorem 1.29. If n ∈ N is not the square of an integer, then it is not the square of a rational number.

Proof. In fact, if n is not a square of an integer, then it lies between two squares, that is, we can find an integer a such that a² < n < (a + 1)². Assume that √n = p/q with q > 0 minimal. Then p² = nq², hence p(p − aq) = p² − apq = nq² − apq = q(nq − ap), so

\[
\frac{p}{q} = \frac{nq - ap}{p - aq}.
\]

But a < p/q < a + 1 implies 0 < p − aq < q: this contradicts the minimality of the denominator q.

Observe the difference between the statement ‘2 is not a square in Q’ and ‘√2 is an irrational number’: the second statement only makes sense if we already know what √2 means.

Countability of Q

Although there are ‘many more’ rational numbers than integers, in some way their cardinality is the same:

Theorem 1.30. The set Q is countable.

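Proof. We first show that the positive rational numbers are countable. To this end we consider the lattice of numbers

\[
\begin{array}{ccccc}
\frac{1}{1} & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \ldots \\
\frac{2}{1} & \frac{2}{2} & \frac{2}{3} & \frac{2}{4} & \ldots \\
\frac{3}{1} & \frac{3}{2} & \frac{3}{3} & \frac{3}{4} & \ldots \\
\frac{4}{1} & \frac{4}{2} & \frac{4}{3} & \frac{4}{4} & \ldots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{array}
\]

Clearly, every positive rational number is contained in here. Now we form a sequence by starting at the left top like this:

\[
\frac{1}{1}, \frac{1}{2}, \frac{2}{1}, \frac{3}{1}, \frac{2}{2}, \frac{1}{3}, \frac{1}{4}, \frac{2}{3}, \frac{3}{2}, \frac{4}{1}, \ldots
\]

that is, we enumerate the fractions m/n according to growing m + n. Now here we have counted several rational numbers twice (as a matter of fact, infinitely often), because 1/1 = 2/2 = 3/3 = . . . , 1/2 = 2/4 = 3/6 = . . . etc. But if we omit every rational number that has appeared before then we get a sequence that contains every positive rational number exactly once:

\[
\frac{1}{1}, \frac{1}{2}, \frac{2}{1}, \frac{3}{1}, \frac{1}{3}, \frac{1}{4}, \frac{2}{3}, \frac{3}{2}, \frac{4}{1}, \ldots
\]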

Now we have to take care of 0 and negatives. But this is easy: put 0 at the beginning, and after each positive rational number squeeze in its negative:

\[
0, \frac{1}{1}, -\frac{1}{1}, \frac{1}{2}, -\frac{1}{2}, \frac{2}{1}, -\frac{2}{1}, \ldots
\]

This sequence contains each rational number exactly once, and our claim is proved.
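The enumeration in this proof can be turned into a short program. The sketch below assumes the zigzag order shown above (diagonals m + n = 2, 3, 4, . . . traversed in alternating direction) and uses the observation that a fraction m/n has appeared before exactly when it is not in lowest terms. The generator names are illustrative.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def positive_rationals():
    """Walk the diagonals m + n = 2, 3, 4, ... of the lattice of fractions m/n,
    alternating direction, and skip fractions that are not in lowest terms."""
    d = 2
    while True:
        numerators = range(1, d)
        if d % 2 == 0:
            numerators = reversed(numerators)
        for m in numerators:
            n = d - m
            if gcd(m, n) == 1:   # m/n has not appeared on an earlier diagonal
                yield Fraction(m, n)
        d += 1

def all_rationals():
    """Put 0 first, then follow each positive rational by its negative."""
    yield Fraction(0)
    for q in positive_rationals():
        yield q
        yield -q

first_positive = list(islice(positive_rationals(), 9))
first_all = list(islice(all_rationals(), 7))
```

Here `first_positive` reproduces the duplicate-free sequence 1/1, 1/2, 2/1, 3/1, 1/3, 1/4, 2/3, 3/2, 4/1 from the proof.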

Part of the above arguments can be formalized easily. For example, if you look at our proof carefully, you should be able to extract a proof of

Proposition 1.31. If A is a subset of a countable set B, then A is countable.

Hilbert’s Hotel.

The intuitive problems with infinite countable sets become apparent in Hilbert’s Hotel. This is a hotel with infinitely many rooms: one room for each natural number 1, 2, 3, . . .. Assume that all rooms are taken, and that a new guest arrives. There’s nothing you can do in a hotel with finitely many rooms, but the manager of Hilbert’s hotel simply tells the guest in room n to go to room s(n) = n + 1 and gives room #1 to the new guest. Similarly, if m guests arrive, he sends the guest in room n to room n + m and gives the m guests the rooms 1, 2, . . . , m.

It is hard to imagine how much space there is in Hilbert’s hotel even if there are no vacancies: assume that Hilbert’s bus arrives, bringing infinitely but countably many new guests. The trick above doesn’t work, but the manager saves the day by sending the guest in room n to room 2n, and then giving the room 2n − 1 to the new guest with number n.
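The manager's two tricks are just maps on room numbers, which the following sketch writes out. The function names are illustrative, and the check at the end only looks at a finite stretch of the hotel.

```python
def room_after_arrivals(n, m=1):
    """m new guests arrive: the guest in room n moves to room n + m."""
    return n + m

def room_for_old_guest(n):
    """Hilbert's bus: the guest in room n moves to room 2n."""
    return 2 * n

def room_for_bus_passenger(n):
    """Hilbert's bus: passenger number n gets room 2n - 1."""
    return 2 * n - 1

# After the bus has arrived, the first 10 rooms are each occupied exactly once:
# five by old guests (even rooms) and five by bus passengers (odd rooms).
rooms = sorted(
    [room_for_old_guest(n) for n in range(1, 6)]
    + [room_for_bus_passenger(n) for n in range(1, 6)]
)
```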

Proposition 1.32. If A and B are countable sets, then so is A ∪B.

Proof. Exercise. (Hint: think of the elements of A as guests at Hilbert’s Hotel, and of the elements of B as newly arrived people in Hilbert’s Bus.)

Q as an ordered set

We define an order relation < on Q by putting

[r, s] < [t, u] ⇐⇒ ru < st

(recall that s, u ∈ N). This is well defined: if [r, s] = [r′, s′] and [t, u] = [t′, u′], then rs′ = r′s and tu′ = t′u. Now

\begin{align*}
[r, s] < [t, u] &\iff ru < st && \text{by definition} \\
&\iff rus'u' < sts'u' && \text{since } s'u' > 0 \\
&\iff r'suu' < ss't'u && \text{since } rs' = r's \text{ and } tu' = t'u \\
&\iff r'u' < s't' && \text{since } su > 0 \\
&\iff [r', s'] < [t', u'] && \text{by definition.}
\end{align*}

Now we have

Theorem 1.33. Q is an ordered domain (even field).

Proof. Since exactly one of the relations ru < st, ru = st or ru > st is true by the trichotomy law for integers, exactly one of x < y, x = y or x > y is true, where x = [r, s] and y = [t, u].

Next assume that x < y and y < z, where z = [v, w]. Then ru < st and tw < uv, hence ruw < stw and stw < suv since w, s > 0; transitivity for the integers gives ruw < suv, and since u > 0, this is equivalent to rw < sv, i.e., x < z.

This shows that Q is simply ordered. The rest of the proof that Q is an ordered domain is left as an exercise.

Thus everything proved for general ordered domains holds for the rationals; in particular, x² ≥ 0 for all x ∈ Q, and |x + y| ≤ |x| + |y| for x, y ∈ Q.

Now let us collect a few simple results that will turn out to be useful.

Lemma 1.34. We have |x| < |y| if and only if n|x| < n|y| for some n ∈ N.

Proof. Exercise.

Proposition 1.35. If x, y ∈ Z satisfy |x− y| < 1, then x = y.

Proof. If x ≠ y, then x = y + z for some nonzero z ∈ Z. This means |z| ∈ N, hence |z| ≥ 1, and we see that |x − y| = |z| ≥ 1.

Proposition 1.36. Let x, y ∈ Q and assume that for every rational ε > 0 we have |x − y| < ε; then x = y.

Proof. Assume that this is false, i.e. that x − y ≠ 0. Then ε = |x − y| is a positive rational number, so by assumption we have |x − y| < ε. This implies ε < ε, which is a contradiction.

Proposition 1.37. Let 0 < x < y be rational numbers. Then there is an n ∈ N such that nx > y.

Proof. Write x = r/s and y = t/u with r, s, t, u ∈ N (here we have used that x, y > 0). Then x < y is equivalent to ru < st; by Prop. 1.19 there is an n ∈ N such that n(ru) > st. But the last inequality is equivalent to nx > y.
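For concrete rationals, a suitable n can simply be computed. The sketch below does not follow the proof above (which appeals to Prop. 1.19); it assumes instead that we may take n to be one more than the integer part of y/x. The function name is hypothetical.

```python
from fractions import Fraction
from math import floor

def archimedean_multiplier(x, y):
    """Return a natural number n with n * x > y, for rationals 0 < x < y.
    Assumption: n = floor(y / x) + 1, so that n > y / x."""
    return floor(y / x) + 1

x, y = Fraction(2, 7), Fraction(9, 4)
n = archimedean_multiplier(x, y)
```

With these sample values y/x = 63/8, so the function returns n = 8, and indeed 8 · 2/7 = 16/7 exceeds 9/4.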

Chapter 2

Cauchy Sequences in Q

In this chapter, ε will always denote a rational number > 0.

2.1 Sequences in Q

A sequence (s_n) in Q is a sequence s_1, s_2, s_3, . . . of elements s_i ∈ Q. In mathematical terms, we could define it as a function s : N −→ Q assigning a rational number s_n to each natural number n.

If s is a rational number such that the elements s_n get as close as you want to s, then we say that s_n tends to s, or that s is the limit of the sequence (s_n). Here’s the correct definition; it does away with all the dynamics of ‘tending’ or ‘getting closer’, and replaces it by something static.

Definition.¹ A sequence (s_n) in Q is said to converge to a rational number s ∈ Q if the following condition is satisfied:

For all ε > 0 there is an N ∈ N such that |s_n − s| < ε for all n > N.

If this is true, then we write s = lim_{n→∞} s_n or simply s = lim s_n.

Remark. What the definition says is that s_n converges to s if we can make the difference |s_n − s| as small as we want if we only choose n large enough.

Remark. A definition equivalent to the above was given already in 1665 by John Wallis, a contemporary of Pierre Fermat.

As for proofs, you should convince yourself of the following facts:

1. If you want to use the fact that s_n converges, then you pick an ε > 0; the fact that s_n converges provides you with an N such that |s_n − s| < ε for n > N.

2. If you have to prove that s_n converges to s, you are given an ε > 0, and you have to find an N ∈ N such that |s_n − s| < ε for all n > N.

¹ Ross, Def. 7.1.

Go through the proofs below to see what this remark means.

Example 1. Consider the sequence s_n = 1/n. Then the elements of the sequence get closer and closer to 0. As a matter of fact, they also get closer and closer to −1; but while the difference between 1/n and 0 can be made as small as you want by choosing n sufficiently large, the difference between 1/n and −1 is always larger than 1.

In fact, we claim that lim 1/n = 0. For a proof, we have to show that for every ε > 0 we can find a natural number N with the property that s_n = |s_n − 0| < ε for all n > N. Since s_n = 1/n, we have to check that 1/n < ε, i.e., that n > 1/ε. If we choose N as any natural number larger than 1/ε, then n > 1/ε for all n > N.
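The choice of N in this example is completely explicit, as the following sketch shows. It assumes that ε is given as an exact rational and picks N = ⌊1/ε⌋ + 1, which is one natural number larger than 1/ε among many. The function name is illustrative.

```python
from fractions import Fraction
from math import floor

def threshold(eps):
    """Return a natural number N > 1/eps, so that 1/n < eps for all n > N."""
    return floor(1 / eps) + 1

eps = Fraction(3, 100)
N = threshold(eps)
# Spot-check the defining inequality on a finite range of n > N.
ok = all(Fraction(1, n) < eps for n in range(N + 1, N + 1000))
```

The finite spot-check is only an illustration; that the inequality holds for all n > N is what the argument above proves.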

Example 2. A sequence (s_n) of integers converges if and only if there is an N ∈ N and an s ∈ Z such that s_n = s for all n > N. In this case, lim s_n = s.

This shows that the notion of limits is not very important for sequences of integers. Let’s now prove our claim.

Assume first that s_n converges. We pick ε = 1/2; by definition of convergence there is an s ∈ Q and an N ∈ N such that |s_n − s| < ε for all n > N. Let m > N be another integer; then |s_n − s_m| = |s_n − s + s − s_m| ≤ |s_n − s| + |s_m − s| < 2ε = 1. But s_n, s_m ∈ Z, and if two integers differ by less than 1, then they are equal. This proves that s_n = s_m =: t for some t ∈ Z and all m, n > N. Since we clearly have lim s_n = t, the uniqueness of limits proves that s must be an integer.

Now assume that there is an N ∈ N and an s ∈ Z such that s_n = s for all n > N. Then we claim that (s_n) converges (to s). In fact, given any ε > 0, we have |s_n − s| = 0 < ε for all n > N.

Here are some simple properties of the limit.

Proposition 2.1.² Limits are unique: if lim s_n = s and lim s_n = t, then s = t.

Proof. Assume that s ≠ t and put ε = |s − t|/2. Since lim s_n = s, there is an N_1 ∈ N such that |s_n − s| < ε for all n > N_1. Similarly, there is an N_2 ∈ N such that |s_n − t| < ε for all n > N_2. Let N = max {N_1, N_2}. Then |s_n − s| < ε and |s_n − t| < ε for all n > N. But now

\[
2\varepsilon = |s - t| = |s - s_n + s_n - t| \le |s_n - s| + |s_n - t| < \varepsilon + \varepsilon = 2\varepsilon,
\]

hence 2 < 2, which is a contradiction.

Proposition 2.2. If (s_n) converges, then so does the sequence (a_n) defined by a_1 = s_1 and a_{n+1} = s_{n+1} − s_n for n ≥ 1, and we have lim a_n = 0.

² Ross, p. 27.

Proof. Given ε > 0 we have to find an N ∈ N such that |a_n| = |a_n − 0| < ε for all n > N. Our only chance of getting such an N is using what we know about (s_n). So how are a_n and s_n related? Well, |a_{n+1}| = |s_{n+1} − s_n|. We have to make this small, but all we know is that |s_n − s| is small for large n, where s = lim s_n. Here is where you learn to appreciate the power of the triangle inequality:

|sn+1 − sn| = |sn+1 − s + s− sn| ≤ |sn+1 − s|+ |sn − s|.

But the last two terms can be made small, so we have won!

Here’s the formal proof: We know that there is an N ∈ N such that |s_n − s| < ε/2 for all n > N. Then

\[
|a_{n+1} - 0| = |s_{n+1} - s_n| \le |s_{n+1} - s| + |s_n - s| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon
\]

for all n > N, hence lim a_n = 0.

Beginners often feel that the converse of this proposition should also be true, i.e., that if the s_n differ less and less from each other, then the sequence (s_n) should converge. But: Beware of the harmonic series!

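In fact, consider the sequence s_1 = 1, s_2 = 1 + 1/2, . . . , s_n = 1 + 1/2 + . . . + 1/n, . . . . Here we have a_{n+1} = s_{n+1} − s_n = 1/(n + 1), so clearly lim a_n = 0. Nevertheless, (s_n) does not converge. In fact,

\begin{align*}
\frac{1}{3} + \frac{1}{4} &> \frac{1}{4} + \frac{1}{4} = \frac{1}{2}, \\
\frac{1}{5} + \ldots + \frac{1}{8} &> 4 \cdot \frac{1}{8} = \frac{1}{2}, \\
\frac{1}{9} + \ldots + \frac{1}{16} &> 8 \cdot \frac{1}{16} = \frac{1}{2}, \\
&\;\;\vdots \\
\frac{1}{2^n + 1} + \ldots + \frac{1}{2^{n+1}} &> 2^n \cdot \frac{1}{2^{n+1}} = \frac{1}{2}, \\
&\;\;\vdots
\end{align*}

Thus s_4 > 1 + 1/2 + 1/2 = 2, s_8 > 1 + 3 · 1/2 = 5/2, s_16 > 1 + 4 · 1/2 = 3, . . . , and in general s_{2^n} > 1 + n · 1/2 = (n + 2)/2 for n ≥ 2. Thus the sequence s_n is not bounded, and we shall see below (Prop. 2.4) that such sequences never converge.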
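The lower bound for the partial sums s_{2^n} of the harmonic series can be checked with exact rational arithmetic. The following is a small sketch (the function name is illustrative) that compares s_{2^n} with (n + 2)/2 for the first few n.

```python
from fractions import Fraction

def harmonic(n):
    """Partial sum s_n = 1 + 1/2 + ... + 1/n as an exact rational number."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

# Triples (n, s_{2^n}, (n + 2)/2) for n = 1, ..., 7.
bounds = [(n, harmonic(2 ** n), Fraction(n + 2, 2)) for n in range(1, 8)]
bound_holds = all(s >= b for _, s, b in bounds)
```

For n = 1 the two sides agree (s_2 = 3/2); from n = 2 on the inequality is strict, e.g. s_4 = 25/12 > 2.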

Nevertheless the result Prop. 4 is very useful for guessing limits: see the example at the beginning of Section 2.2.

Theorem 2.3.³ Assume that the sequences s_n and t_n converge. Then the sequences s_n ± t_n and s_n t_n converge, and we have lim(s_n ± t_n) = lim s_n ± lim t_n and lim(s_n t_n) = (lim s_n)(lim t_n).

Remark. This shows that the set C of converging sequences is a ring: if s_n and t_n converge, then so do s_n ± t_n and s_n t_n. The neutral element with respect to addition is the sequence defined by s_n = 0, the unit element is the sequence 1, 1, 1, . . . defined by s_n = 1. Converging sequences do not form a field, however: we cannot divide by sequences s_n containing a 0.

³ Ross, 9.3, 9.4.

The map lim is a map from C to the ring (even field) Q, and Theorem 2.3 says that lim is a ring homomorphism. A map f : R −→ S between rings is a ring homomorphism if it respects the structure, that is, if f(r + r′) = f(r) + f(r′) and f(rr′) = f(r)f(r′) for all r, r′ ∈ R, and if it maps the identity of R to the identity of S. The unit element of C is the sequence 1, 1, 1, 1, . . ., and lim maps it to 1 ∈ Q.

Proposition 2.4.⁴ Convergent sequences (s_n) are bounded: there exists an M ∈ N such that |s_n| < M for all n ∈ N.

Proof. The idea is simple: assume that lim s_n = s; then for ε = 1, there is an N such that |s_n − s| < ε = 1 for all n > N. Put M_0 = max {|s_1|, . . . , |s_N|} (this is ok since the set is finite). Then |s_n| ≤ M_0 for all n ≤ N and |s_n| = |s_n − s + s| ≤ |s_n − s| + |s| < |s| + 1 for all n > N. Now let M be any natural number larger than max {M_0, |s| + 1}.

Proof of Thm. 2.3. There’s a lot of work to do; we’ll start with verifying that (s_n + t_n) converges if (s_n) and (t_n) do.

So assume that lim s_n = s and lim t_n = t, and assume we are given some ε > 0. We have to find an N ∈ N such that |s_n + t_n − (s + t)| < ε for all n > N. We know that we can make |s_n − s| and |t_n − t| small, so we start by using the triangle inequality

|sn + tn − (s + t)| = |sn − s + tn − t| ≤ |sn − s|+ |tn − t|.

Since s_n converges to s, there is an N_1 ∈ N such that |s_n − s| < ε/2 for all n > N_1. Since t_n converges to t, there is an N_2 ∈ N such that |t_n − t| < ε/2 for all n > N_2. Put N = max {N_1, N_2}. Then these inequalities hold for any n > N, and we find

\[
|s_n + t_n - (s + t)| \le |s_n - s| + |t_n - t| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,
\]

which completes the proof.

The proof for (s_n − t_n) is basically the same, so let’s consider the product s_n t_n. Here we have to make |s_n t_n − st| small. How is this related to |s_n − s| and |t_n − t|? Here’s how:

|sntn − st| = |sntn − snt + snt− st| ≤ |sn| · |tn − t|+ |t| · |sn − s|.

This looks very good because we can make |s_n − s| and |t_n − t| small. There is a problem, however: |t_n − t| is multiplied by |s_n|, so even if we can make |t_n − t| small, |s_n| may be so large that it kills all efforts to make the product small.

Fortunately, however, |s_n| can’t be too large: after all, (s_n) converges, hence is bounded, say by |s_n| ≤ M for some constant M. Now choose N_2 so large that |t_n − t| < ε/(2M) for all n > N_2, and N_1 so large that |s_n − s| < ε/(2(|t| + 1)) for all n > N_1 (we don’t want to divide by 0 if (t_n) happens to converge to 0). Then put N = max {N_1, N_2} again and observe that

\[
|s_n t_n - st| \le |s_n| \cdot |t_n - t| + |t| \cdot |s_n - s| < M \cdot \frac{\varepsilon}{2M} + |t| \cdot \frac{\varepsilon}{2(|t| + 1)} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,
\]

where we have used that |t|/(|t| + 1) < 1. This shows that lim s_n t_n = st and completes the proof.

⁴ Ross, Thm. 9.1.

Example. In order to show that lim 1/n² = 0, we would have to work with N’s and ε’s if we started with the definition. No one in his right mind would do so, however (excluding textbook authors). What we do is observe that 1/n² = 1/n · 1/n; since we have already shown that lim 1/n = 0, we get lim 1/n² = 0.

Exercise. Let B denote the set of bounded sequences. Show that B is a ring containing the ring C of converging sequences as a subring.

Exercise. Show that the set K of constant sequences in Q forms a subring of C.

We have already remarked that we cannot divide by sequences that contain a 0 term; if t_n ≠ 0 for all n ∈ N, then we can introduce the ‘quotient’ (s_n/t_n). What we would like to have is lim (s_n/t_n) = (lim s_n)/(lim t_n), but this cannot possibly hold if lim t_n = 0. If we exclude these bad guys, however, we get

Proposition 2.5. Assume that s_n and t_n converge; assume moreover that t_n ≠ 0 for all n ∈ N and that lim t_n ≠ 0. Then s_n/t_n converges, and we have lim (s_n/t_n) = (lim s_n)/(lim t_n).

Proof. The proof of the convergence of quotients is similar to that of products, but with an additional twist. In fact, we need to make |s_n/t_n − s/t| small. We find

\[
\left| \frac{s_n}{t_n} - \frac{s}{t} \right| = \left| \frac{s_n t - t_n s}{t_n t} \right| \le \frac{|s_n t - st| + |ts - t_n s|}{|t_n t|}.
\]

This looks good, because we can make |s_n − s| and |t_n − t| small and since s and t are constants, but we get a problem from the t_n in the denominator: if the t_n are very small, then their inverse will be very large, which is bad. But can the t_n be very small without t_n converging to 0? They cannot:

Lemma 2.6.⁵ If (t_n) converges to t ≠ 0, then there exists an N ∈ N such that for all n > N we have

\[
\begin{cases} t_n > \frac{1}{2} t & \text{if } t > 0, \\ t_n < \frac{1}{2} t & \text{if } t < 0. \end{cases}
\]

In particular, |t_n| > |t|/2 for all n > N.

⁵ Ross, Example 6, §8.

Assuming this for a moment, we can prove Prop. 2.5 as follows: let an ε > 0 be given; choose N_1 ∈ N such that |s_n − s| < |t|ε/4 for all n > N_1; choose N_2 ∈ N such that |t_n − t| < t²ε/(4|s|) for all n > N_2 (if s = 0, the term |s||t − t_n| below vanishes and N_2 is not needed); and choose N_3 ∈ N such that |t_n| > |t|/2 for all n > N_3. Then

\[
\left| \frac{s_n}{t_n} - \frac{s}{t} \right| \le \frac{|t||s_n - s| + |s||t - t_n|}{|t_n t|} < \frac{t^2 \varepsilon/4 + t^2 \varepsilon/4}{t^2/2} = \varepsilon
\]

whenever n > N := max {N_1, N_2, N_3}.

Proof of Lemma 2.6. Put ε = |t|/2. Then there exists an N ∈ N such that |t_n − t| < ε for all n > N. But then |t| = |t − t_n + t_n| ≤ |t_n − t| + |t_n|, hence |t_n| ≥ |t| − |t_n − t| > |t|/2.

If t > 0, then |t_n| > t/2 and |t_n − t| < t/2 imply t_n > t/2. If t < 0, on the other hand, the same argument shows that t_n < t/2.

In some proofs, subsequences of sequences will turn up. These are just what the name says: if a_1, a_2, a_3, . . . is a sequence of rational numbers, then any infinite sequence you get by omitting terms from (a_n) is called a subsequence of (a_n). In mathematical terms: (b_n) is called a subsequence of (a_n) if there are integers

n_1 < n_2 < . . . < n_k < n_{k+1} < . . .

such that b_k = a_{n_k} for all k ∈ N. Note that n_k ≥ k (induction: n_1 ≥ 1; if n_k ≥ k, then n_{k+1} > n_k ≥ k, hence n_{k+1} ≥ k + 1).

Subsequences are well behaved:

Proposition 2.7.⁶ If (a_{n_k}) is a subsequence of a converging sequence (a_n), then (a_{n_k}) converges, and we have

\[
\lim_{k \to \infty} a_{n_k} = \lim_{n \to \infty} a_n.
\]

Proof. Let a = lim a_n; we have to make |a_{n_k} − a| small. We know that |a_n − a| is small, but that’s enough because the a_{n_k}’s are among the a_n somewhere.

In fact, let ε > 0; then there is an N ∈ N such that |a_k − a| < ε for all k > N. But then we have |a_{n_k} − a| < ε for all k > N because n_k ≥ k, and that’s it.

Proposition 2.8. Let (s_n) be a sequence converging to s. If a ≤ s_n ≤ b for almost all n (that is, for all n larger than some integer N), then a ≤ s ≤ b.

Proof. It is sufficient to prove that a ≤ s_n for almost all n implies a ≤ s. Assume that this is false; then a > s, and since s_n ≥ a for all n > N, we get s_n − s = s_n − a + a − s ≥ a − s. If we put ε = a − s > 0, then this shows s_n − s ≥ ε, in particular |s_n − s| ≥ ε, for all n > N. This contradicts the fact that |s_n − s| can be made arbitrarily small.

⁶ Ross, Thm. 11.2.

Note that strict inequalities in general do not survive the limit: we have 1/n > 0 for all n ∈ N, but lim 1/n = 0.

Consider the sequence defined by a_1 = 2 and a_{n+1} = a_n/2 + 1/a_n. The first few terms are a_1 = 2, a_2 = 1.5, a_3 = 1.41666 . . ., a_4 = 1.41421 . . ., so it seems the sequence converges. But does it?

Let us start with the observation

\[
a_{n+1}^2 = \left( \frac{1}{2} a_n + \frac{1}{a_n} \right)^2 = \left( \frac{1}{2} a_n - \frac{1}{a_n} \right)^2 + 2;
\]

we deduce that a_{n+1}^2 − 2 ≥ 0; in fact, since the a_n are rational numbers (induction!), we must have a_{n+1}^2 > 2 for all n ≥ 1. This in turn implies that

\[
a_{n+1}^2 = \frac{1}{4} a_n^2 + 1 + \frac{1}{a_n^2} < \frac{1}{4} a_n^2 + \frac{3}{2},
\]

hence

\[
a_{n+1}^2 - 2 < \frac{1}{4} (a_n^2 - 2).
\]

What this means is that the difference between a_n^2 and 2 shrinks by at least a factor of 4 every time we increase n by 1. In other words: we have lim(a_n^2 − 2) = 0, or lim a_n^2 = 2.

Now if a_n converged, say lim a_n = a, then the above discussion would show that a² = 2. But there is no rational a whose square equals 2, so a_n does not converge. By now you should have the impression that this is due to the fact that Q does not contain enough elements to provide limits for sequences that seem to converge. This observation is at the basis for our construction of the reals: we simply ‘adjoin to Q the limits of sequences that seem to converge’. We shall make this precise in the next section by introducing Cauchy sequences.
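Since all terms of this sequence are rational, the shrinking of a_n^2 − 2 can be observed without any rounding. The sketch below assumes exact arithmetic with Python's `Fraction` type; the function name is illustrative.

```python
from fractions import Fraction

def sqrt2_sequence(count):
    """First `count` terms of a_1 = 2, a_{n+1} = a_n/2 + 1/a_n, as exact rationals."""
    a = Fraction(2)
    terms = []
    for _ in range(count):
        terms.append(a)
        a = a / 2 + 1 / a
    return terms

terms = sqrt2_sequence(6)
errors = [a * a - 2 for a in terms]   # the quantities a_n^2 - 2
# Each error is positive and less than a quarter of the previous one.
shrinks = all(0 < errors[i + 1] < errors[i] / 4 for i in range(len(errors) - 1))
```

The first terms come out as 2, 3/2, 17/12, 577/408, and no error is ever exactly 0, in line with the fact that no rational number has square 2.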

Some Algebra

[This is for those of you who are familiar with algebra. Don’t be alarmed if you are not.]

We have seen in Theorem 2.3 that the set C of converging sequences in Q forms a ring with respect to addition and multiplication, and that lim : C −→ Q is a ring homomorphism.

Now consider the subset N ⊂ C of sequences converging to 0; clearly, the sum and product of sequences converging to 0 again converge to 0, hence N is not only a subset but a subring of C. Even better: N is an ideal!

Recall that a subring S of R is called an ideal if not only products of elements of S are again in S, but if products of elements of S with elements of R are elements of S again, that is: if S is closed under multiplication by elements of R.

In our case, the product of a converging sequence and a sequence converging to 0 is a sequence converging to 0 (PROOF!), that is, N is closed with respect to multiplication by elements in C.

In algebra you learn that ideals are exactly the kernels of ring homomorphisms. The only ring homomorphism around here was the map lim : C −→ Q; the kernel of a ring homomorphism is the set of elements that get mapped to 0, so the kernel of lim is indeed N. Since lim is clearly surjective (given q ∈ Q, the constant sequence q, q, q, . . . converges to q), a standard theorem in algebra shows

Theorem 2.9. We have C/N ≃ Q, where the isomorphism is induced by lim.

Series

Out of any sequence a_n we can form a new sequence s_n = ∑_{k=1}^{n} a_k. If (s_n) converges to s = lim s_n, then we write

\[
s = \sum_{n=1}^{\infty} a_n
\]

and call s an infinite series.

The most important series in much of calculus and complex analysis is the geometric series; in addition, it does us the favor of converging:

Theorem 2.10. The series 1 + q + q² + q³ + . . . is called the geometric series. It converges for any q ∈ Q with |q| < 1, and we have ∑_{k=0}^{∞} q^k = 1/(1 − q).

Proof. We have to show that the sequence S_n = ∑_{k=0}^{n} q^k of partial sums converges. This is easy if you remember the little trick involved:

\begin{align*}
S_n &= 1 + q + q^2 + \ldots + q^n \\
qS_n &= q + q^2 + \ldots + q^n + q^{n+1} \\
S_n - qS_n &= 1 - q^{n+1}
\end{align*}

hence S_n(1 − q) = 1 − q^{n+1} and finally

\[
S_n = \frac{1 - q^{n+1}}{1 - q}
\]

whenever q ≠ 1. If |q| < 1, then lim q^n = 0, hence lim S_n = 1/(1 − q) as claimed.
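The closed formula for the partial sums is easy to test on examples. The following sketch assumes partial sums starting at k = 0 (so that S_n = 1 + q + . . . + q^n, as in the trick above) and uses exact rationals; the function names and the sample value q = −2/3 are illustrative.

```python
from fractions import Fraction

def partial_sum(q, n):
    """S_n = 1 + q + q^2 + ... + q^n, summed term by term."""
    return sum(q ** k for k in range(n + 1))

def closed_form(q, n):
    """S_n = (1 - q^(n+1)) / (1 - q), valid for q != 1."""
    return (1 - q ** (n + 1)) / (1 - q)

q = Fraction(-2, 3)
limit = 1 / (1 - q)
formulas_agree = all(partial_sum(q, n) == closed_form(q, n) for n in range(20))
# Distances |S_n - 1/(1 - q)| for n = 0, 10, 20, 30.
gaps = [abs(partial_sum(q, n) - limit) for n in range(0, 40, 10)]
```

For this q the limit is 3/5, and the entries of `gaps` decrease, reflecting lim q^n = 0.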

Here’s a little application:

Exercise. An m × m-matrix M is called nilpotent if there exists an integer n ≥ 1 such that M^n = 0 (if you’ve never seen this, show that upper triangular matrices with 0’s on the diagonal are all nilpotent). If M is nilpotent, show that E − M has an inverse, where E is the m × m unit matrix.

2.2 Cauchy Sequences

The problem with our definition of convergence is: how can we prove that a sequence converges if we can’t write down its limit? Well, we can’t.

What we can do is to define a (possibly larger) set of sequences that ‘behave as if they would converge’. This definition should not involve the limit of the sequence (because, in general, we don’t know it).

How do we do that? If a sequence (s_n) converges to s, then the difference between s_n and s becomes as small as we want for large n. This means that the difference between any two terms s_n and s_m becomes small if only n and m are large enough. So here’s what we’ll do:

A Cauchy sequence in Q is a sequence (sn) with the following property:

For every ε > 0 there is an N > 0 such that |s_n − s_m| < ε for all m, n > N.

Remark. This condition is due to Cauchy (Analyse algébrique (1821), p. 125). Actually he proved that sequences of real numbers converge if and only if they satisfy the condition above.

From the way we arrived at the definition we expect that converging sequences are Cauchy. They are:

Proposition 2.11.⁷ Any converging sequence is a Cauchy sequence.

Proof. Let (s_n) be a sequence converging to s. This means that we can make |s_n − s| small. However, we need to make |s_n − s_m| small. How do we do that? You should be able to guess the answer by now:

|sn − sm| = |sn − s + s− sm| ≤ |sn − s|+ |sm − s|, (2.1)

and these terms can be made small.

OK, here’s the formal proof: given ε > 0, we pick an integer N such that |s_n − s| < ε/2 for all n > N. By (2.1), this implies that |s_n − s_m| < ε for all m, n > N, and this proves that (s_n) is Cauchy.

As a matter of fact, Cauchy sequences share most of the properties of converging sequences; as an Exercise, formulate the analog of Prop. 2.2 and prove it. As an example of how to do it, we will go through the proof of the following

Proposition 2.12.⁸ Cauchy sequences are bounded.

Proof. Let (a_n) be a Cauchy sequence. Since it suffices to prove that (|a_n|) is bounded, we may assume without loss of generality that a_n ≥ 0.

Since (a_n) is Cauchy, there is an N ∈ N such that |a_m − a_n| < 1 for all m, n > N. Assume that (a_n) is not bounded; then for m > N there exists an n > m such that a_n > a_m + 1: otherwise, max {a_1, . . . , a_m, a_m + 1} would be an upper bound. But now |a_m − a_n| = a_n − a_m > 1: contradiction!

⁷ Ross 10.9.
⁸ Ross, Lemma 10.10.

We shall also need the analog of Lemma 2.6 for Cauchy sequences:

Lemma 2.13. Let (t_n) be a Cauchy sequence that does not converge to 0. Then there exists an N ∈ N and a τ > 0 such that either t_n > τ for all n > N or t_n < −τ for all n > N. In particular, we have |t_n| > τ for all n > N.

Proof. We clearly have to exploit the fact that (t_n) does not converge to 0 for getting a τ > 0 that will work. Now (t_n) converges to zero if

∀ε > 0 ∃N ∈ N ∀n > N : |tn| = |tn − 0| < ε.

What does this tell us about sequences not converging to 0? Well, if (tn) doesnot converge to 0, then

∃ε > 0 ∀N ∈ N ∃n > N : |tn| = |tn − 0| > ε.

In plain words: there must be an ε > 0 such that no matter how large youchoose N , there is always some n > N such that tn lies outside the ε-striparound 0.

Now put τ = ε/2 and choose N ∈ N such that |tm − tn| < ε/2 for allm,n > N . Now we pick a particular m by demanding that |tm| > ε, which wecan do by the preceding discussion.

If tm > ε, then tm − tn < ε/2 gives tn > tm − ε/2 > ε/2 for all n > N , andour claim follows.

If tm < −ε, then tm − tn > −ε/2 gives tn < tm + ε/2 < −ε/2 for all n > N ,and again we are home. This concludes the proof.

We have seen above that converging sequences are Cauchy. Conversely, Cauchy sequences in $\mathbb{Z}$ converge: in fact, let $(s_n)$ be a Cauchy sequence of integers, and pick $\varepsilon = \frac12$; then there is an $N \in \mathbb{N}$ such that $|s_m - s_n| < \varepsilon$ for all $m, n > N$. This implies that $s_m = s_n$ for all $m, n > N$, that is, the sequence eventually becomes constant; in particular, it converges.

Unfortunately, this result does not carry over to $\mathbb{Q}$: there are Cauchy sequences in $\mathbb{Q}$ that do not converge. In fact, the sequence $(a_n)$ constructed above is such a beast. Since we already know that it does not converge, all we need to prove is that it is Cauchy. This follows from a slightly more general result which shows that any sequence $(a_n)$ with $|a_{n+1} - a_n|$ decreasing quickly is Cauchy:

Proposition 2.14.⁹ Let $(a_n)$ be a sequence with $|a_{n+1} - a_n| < Cq^n$ for constants $C > 0$ and $0 \le q < 1$. Then $(a_n)$ is a Cauchy sequence.

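Proof. For $m > n$ we have

$$\begin{aligned}
|a_m - a_n| &= |a_m - a_{m-1} + a_{m-1} - \ldots - a_{n+1} + a_{n+1} - a_n| \\
&\le |a_m - a_{m-1}| + |a_{m-1} - a_{m-2}| + \ldots + |a_{n+1} - a_n| \\
&< C[q^{m-1} + q^{m-2} + \ldots + q^n] \\
&= Cq^n[1 + q + q^2 + \ldots + q^{m-n-1}].
\end{aligned}$$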

⁹ Ross, Exercise 10.6.


The sum in the brackets is at most $\sum_{k=0}^{\infty} q^k = \frac{1}{1-q}$, hence $|a_m - a_n| < q^n \frac{C}{1-q}$. Since we may make this expression as small as we wish by choosing $n$ sufficiently large, the claim follows.
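The estimate in this proof can be checked numerically on a concrete example. The sketch below is only an illustration under assumptions of our own: the sequence with $a_{n+1} - a_n = (-q)^n C/2$ and the names `sequence` and `tail_bound` are hypothetical choices, not taken from the text. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def sequence(C, q, length):
    """Terms a_1..a_length with a_{n+1} - a_n = (-q)^n * C / 2,
    so that |a_{n+1} - a_n| = C q^n / 2 < C q^n (hypothetical example)."""
    terms = [Fraction(0)]
    for n in range(1, length):
        terms.append(terms[-1] + (-q) ** n * C / 2)
    return terms

def tail_bound(C, q, n):
    """The bound C q^n / (1 - q) from the proof."""
    return C * q ** n / (1 - q)

C, q = Fraction(3), Fraction(1, 2)
a = sequence(C, q, 30)
# a[i] holds a_{i+1}; check |a_m - a_n| < C q^n / (1 - q) for all n < m <= 30
ok = all(
    abs(a[m - 1] - a[n - 1]) < tail_bound(C, q, n)
    for n in range(1, 31)
    for m in range(n + 1, 31)
)
print(ok)
```

The bound depends only on the smaller index $n$, which is exactly what the Cauchy condition requires.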

So $(a_n)$ is a Cauchy sequence, but, as we have already seen, it does not converge. Here’s another proof for the last claim: assume that $a = \lim a_n$; then $a \ne 0$ because clearly $a_n \ge 1$ for all $n \ge 1$ (induction). Moreover $a = \lim a_{n+1}$, hence $a = \lim a_{n+1} = \lim\big(\frac12 a_n + \frac{1}{a_n}\big) = \frac a2 + \frac1a$, hence $a^2 = 2$. But there is no $a \in \mathbb{Q}$ such that $a^2 = 2$.

There are other useful criteria that allow us to conclude that a given sequence is Cauchy; the following is particularly simple and useful:

Theorem 2.15. Every monotone bounded sequence is Cauchy.

Here, a bounded sequence is a sequence $(a_n)$ such that $|a_n| \le M$ for all $n \in \mathbb{N}$ and some constant $M$; a monotone sequence $(a_n)$ is one that satisfies one of the following conditions:

$$a_1 \le a_2 \le a_3 \le \ldots \quad\text{or}\quad a_1 \ge a_2 \ge a_3 \ge \ldots$$

We call $(a_n)$ monotone increasing or decreasing according as we are in the first or the second case.

Proof of Thm. 2.15. Assume that $(a_n)$ is monotone (say decreasing; otherwise, consider $(-a_n)$) and not Cauchy; we have to show that it is not bounded.

Here’s my original attempt: Let $\varepsilon > 0$ and put $s_1 = a_1$; since $(a_n)$ is not Cauchy, there is an $m > 1$ such that $|a_1 - a_m| \ge \varepsilon$; we put $s_2 = a_m$. By repeating this construction I could show that $(a_n)$ is not bounded.

However, not being Cauchy does not guarantee that, for any $\varepsilon > 0$, there is a pair $m, n \in \mathbb{N}$ with $|a_n - a_m| \ge \varepsilon$: if the sequence is bounded and $\varepsilon$ is large enough, all terms will differ from each other by something less than $\varepsilon$.

Here’s the correct way of exploiting that $(a_n)$ is not Cauchy: we know that $(a_n)$ is Cauchy if

$$\forall \varepsilon > 0 \ \ \exists N \in \mathbb{N} \ \ \forall n, m > N : \ |a_n - a_m| < \varepsilon.$$

Thus if $(a_n)$ is not Cauchy, there must be some $\varepsilon > 0$ for which this is false; more exactly:

$$\exists \varepsilon > 0 \ \ \forall N \in \mathbb{N} \ \ \exists n, m > N : \ |a_n - a_m| \ge \varepsilon.$$

This works in general: to find the negation of a statement involving quantifiers, you simply replace $\exists$ by $\forall$ and vice versa, and you replace the actual claim by its negation.

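Now let $\varepsilon > 0$ be a positive number such that for all $N \in \mathbb{N}$ there exist $m, n > N$ with $|a_n - a_m| \ge \varepsilon$. Pick any such pair $n_1 < m_1$ (they are necessarily different, and the order doesn’t matter) and put $N = m_1$. Then pick a pair $n_2 < m_2$ with $n_2, m_2 > N$ and $|a_{n_2} - a_{m_2}| \ge \varepsilon$, put $N = m_2$, and repeat this reasoning. This way we get a sequence of indices $n_1 < m_1 < n_2 < m_2 < n_3 < \ldots$ such that $a_{n_1} > a_{m_1} \ge a_{n_2} > a_{m_2} \ge a_{n_3} > \ldots$; in fact, we have

$$\begin{aligned}
a_{n_1} - a_{m_k} &= (a_{n_1} - a_{m_1}) + (a_{m_1} - a_{n_2}) + (a_{n_2} - a_{m_2}) + \ldots + (a_{m_{k-1}} - a_{n_k}) + (a_{n_k} - a_{m_k}) \\
&\ge (a_{n_1} - a_{m_1}) + (a_{n_2} - a_{m_2}) + \ldots + (a_{n_{k-1}} - a_{m_{k-1}}) + (a_{n_k} - a_{m_k}) \\
&\ge k\varepsilon,
\end{aligned}$$

where we have dropped the nonnegative terms $a_{m_i} - a_{n_{i+1}}$. By making $k$ large enough we can make $a_{n_1} - a_{m_k}$ as large as we please (by the Archimedean property of the rationals): this means that $(a_n)$ can’t be bounded.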

Ring of Cauchy Sequences

We have seen in the preceding section that the set of converging sequences in $\mathbb{Q}$ forms a ring. The same is true for the set $\mathcal{R}$ of Cauchy sequences: if $(a_n)$ and $(b_n)$ are Cauchy sequences, then so are $(a_n \pm b_n)$ and $(a_n b_n)$.

Exercise. Prove that $\mathcal{R}$ is a ring.

Here’s a hint for the case $a_n + b_n$. Given an $\varepsilon > 0$, you have to find an integer $N$ such that $|(a_m + b_m) - (a_n + b_n)| < \varepsilon$ for all $m, n > N$. Now use the fact that $(a_n)$ and $(b_n)$ are Cauchy, and don’t forget the triangle inequality.

Exercise. Prove that if $(a_n)$ is a Cauchy sequence not converging to 0 and such that $a_n \ne 0$ for all $n \in \mathbb{N}$, then $(1/a_n)$ is also Cauchy.

Exercise. Prove that $\mathcal{R}$ contains the set $\mathcal{N}$ of sequences converging to 0 as a subring.

Actually, much more is true: $\mathcal{N}$ is not only a subring of $\mathcal{R}$, it is an ideal. This means that, for any Cauchy sequence $(a_n) \in \mathcal{R}$ and any sequence $(b_n) \in \mathcal{N}$, the product $(a_n b_n)$ is not only Cauchy, but lies again in $\mathcal{N}$.

Exercise. Prove that $\mathcal{N}$ is an ideal in $\mathcal{R}$.

Remark. Those with a working knowledge about rings and ideals can now immediately define the real numbers $\mathbb{R}$ as the ring $\mathcal{R}/\mathcal{N}$. We shall define $\mathbb{R}$ in the next chapter as a set of equivalence classes, and then define a ring structure on this set: in the algebraic approach, we get this ring structure for free.

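Remark. We can define the following sets of sequences of rational numbers:

$\mathcal{N}$ = sequences converging to 0
$\mathcal{C}$ = converging sequences
$\mathcal{R}$ = Cauchy sequences
$\mathcal{B}$ = bounded sequences
$\mathcal{S}$ = sequences

These are all rings, and in fact

$$\mathcal{N} \subset \mathcal{C} \subset \mathcal{R} \subset \mathcal{B} \subset \mathcal{S},$$

where all inclusions are proper.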


Chapter 3

From Q to R (and C)

3.1 The Reals R

The construction of the real numbers differs fundamentally from the construction of the integers or the rationals, and in fact some mathematicians (e.g. Kronecker) did not accept the following construction (and thus the existence of real numbers as we know them) for mainly philosophical reasons.

Let $\mathcal{R}$ denote the set of all Cauchy sequences of rational numbers. We say that two sequences $(a_n)$ and $(b_n)$ are equivalent (and write $(a_n) \sim (b_n)$) if $\lim(a_n - b_n) = 0$, that is, if they differ by a null sequence (a sequence converging to 0).

WARNING. This does not imply that $\lim a_n = \lim b_n$, because these limits may not exist!

Let $\mathbb{R} = \mathcal{R}/\!\sim$ be the set of all equivalence classes of $\mathcal{R}$. Then we can identify $\mathbb{Q}$ with a subset of $\mathbb{R}$: in fact, for $r \in \mathbb{Q}$, the constant sequence $r_n = r$ is certainly Cauchy, so the map $\iota : \mathbb{Q} \to \mathbb{R}$ sending $r$ to the real number $[r_n] := [(r_n)]$ allows us to identify $r$ with the real number $\iota(r)$.

The map $\iota$ is injective: if $\iota(r) = \iota(s)$, then $(r_n) \sim (s_n)$ for the constant sequences $r_n = r$ and $s_n = s$; this implies that $r_n - s_n = r - s$ converges to 0, and so $r = s$.

Defining addition and multiplication on the reals is a breeze: we set

$$[r_n] \pm [s_n] = [r_n \pm s_n], \tag{3.1}$$
$$[r_n] \cdot [s_n] = [r_n \cdot s_n]. \tag{3.2}$$

First we have to verify that this is defined at all: but if $(r_n)$ and $(s_n)$ are Cauchy sequences, then so are $(r_n \pm s_n)$ and $(r_n s_n)$, so the expressions on the right hand side of (3.1) and (3.2) make sense.

Next we claim that (3.1) is well defined: assume that $[r_n] = [r'_n]$ and $[s_n] = [s'_n]$; we have to show that $[r_n + s_n] = [r'_n + s'_n]$. But $r_n - r'_n$ and $s_n - s'_n$ are sequences converging to 0, hence so is $r_n + s_n - (r'_n + s'_n)$, and this implies the claim. The proof that multiplication is well defined is similar.


The new addition and multiplication on real numbers agree with the corresponding operations on $\mathbb{Q}$ under the embedding $\iota : \mathbb{Q} \hookrightarrow \mathbb{R}$; in other words, the map $\iota$ is a ring homomorphism. In fact, if $r$ and $s$ are rational numbers, and if $\iota(r) = [r_n]$ and $\iota(s) = [s_n]$ are the corresponding real numbers defined by the constant sequences $r_n = r$ and $s_n = s$, then

$$\iota(r) + \iota(s) = [r_n] + [s_n] = [r_n + s_n] = \iota(r + s),$$

and similarly for products. Moreover, $\iota(0)$ is the neutral element of addition, and $\iota(1)$ that of multiplication.
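The termwise definitions (3.1) and (3.2) translate almost literally into code if a Cauchy sequence is modelled as a function from indices to rationals. The following is a minimal sketch under that assumption; the names `iota`, `add`, `mul` and `root2` are hypothetical, and `root2` assumes the recursion $a_{n+1} = \frac12 a_n + \frac1{a_n}$ with starting value 1. Note that equality of real numbers (equivalence of sequences) cannot be decided by inspecting finitely many terms, so the sketch only compares individual terms.

```python
from fractions import Fraction

def iota(r):
    """Embed a rational r as the constant sequence r, r, r, ..."""
    r = Fraction(r)
    return lambda n: r

def add(x, y):
    """Termwise sum, representing [x_n] + [y_n] = [x_n + y_n]."""
    return lambda n: x(n) + y(n)

def mul(x, y):
    """Termwise product, representing [x_n] * [y_n] = [x_n * y_n]."""
    return lambda n: x(n) * y(n)

def root2(n):
    """n-th term of a rational Cauchy sequence whose squares approach 2
    (assumed recursion a -> a/2 + 1/a, started at 1)."""
    a = Fraction(1)
    for _ in range(n):
        a = a / 2 + 1 / a
    return a

# iota respects addition: the sequences for iota(1) + iota(2) and iota(3) agree
print(add(iota(1), iota(2))(7) == iota(3)(7))
# the product of root2 with itself has terms close to 2
print(float(mul(root2, root2)(4)))
```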

So far we know that $\mathbb{R}$ is a ring containing $\mathbb{Q}$ as a subring. Before we can show (or even claim) that $\mathbb{R}$ is a field we have to define division, which is a bit more complicated: if $(r_n)$ and $(s_n)$ are Cauchy sequences such that $(s_n)$ does not converge to 0, then we cannot simply put

$$[r_n]/[s_n] = [r_n/s_n]$$

because although $(s_n)$ does not converge to 0, some of the $s_n$ might vanish. Here’s one possible way around it: we claim

Lemma 3.1. If $(s_n)$ is a Cauchy sequence not converging to 0, then $s_n \ne 0$ for almost all $n$, that is, for all $n$ larger than some natural number $N$.

Proof. Assume not. Then there are infinitely many indices $n_1 < n_2 < \ldots$ such that $s_{n_k} = 0$. Since $(s_n)$ is Cauchy, for every $\varepsilon > 0$ there is an $N$ such that $|s_n - s_m| < \varepsilon$ for all $m, n > N$. In particular, $|s_{n_k} - s_m| < \varepsilon$ for all $m, n_k > N$. But $s_{n_k} = 0$, hence we have proved that for every $\varepsilon > 0$ we have $|s_m| < \varepsilon$ for all sufficiently large $m$: but this implies that $(s_m)$ converges to 0.

Actually this lemma is a special case of Lemma 2.13; the proof given here is slightly simpler because we have proved less.

Thus if $(s_n)$ is a Cauchy sequence not converging to 0, only finitely many terms actually vanish. The sequence $(s'_n)$ defined by

$$s'_n = \begin{cases} s_n & \text{if } s_n \ne 0, \\ \frac1n & \text{if } s_n = 0 \end{cases}$$

differs from $(s_n)$ in only finitely many places. In particular, $s_n - s'_n$ is a null sequence, hence $[s_n] = [s'_n]$. Moreover, we know that $(1/s'_n)$ is a Cauchy sequence by the exercise on $(1/a_n)$ in the section on the ring of Cauchy sequences (in fact, $s_n = s'_n$ for all $n > N$ for some $N$, and the sequence $a_n = s_{n+N}$ is a Cauchy sequence satisfying the conditions of that exercise. Thus $1/a_n$ is a Cauchy sequence, and since adding finitely many terms to $a_n$ does not change this, so is $1/s'_n$). Thus if $(r_n)$ and $(s_n)$ are Cauchy sequences and if $(s_n)$ is not a null sequence, then $(r_n/s'_n)$ is a Cauchy sequence, and we may define division by

$$[r_n]/[s_n] = [r_n/s'_n]. \tag{3.3}$$

Using these definitions, it is now straightforward (if tedious) to verify that the operations on the reals as constructed above satisfy the usual properties, and in particular that the set $\mathbb{R}$ of reals forms a field.


Let us show for example that every nonzero element of $\mathbb{R}$ has an inverse. To this end, consider a real number $s \ne 0$; then $s = [s_n]$ for a Cauchy sequence $(s_n)$ that does not converge to 0. Replacing $(s_n)$ by $(s'_n)$ as above we get a real number $r = [1/s'_n]$, and clearly $rs = \iota(1)$, where $\iota(1)$ is the real number represented by the constant sequence $1, 1, 1, \ldots$.

Theorem 3.2. The real numbers R form a field containing Q as a subfield.

What we would like to do now is to prove that $\mathbb{R}$ does what we want it to do, namely provide a limit for every Cauchy sequence $(r_n)$ in $\mathbb{Q}$. But what does it mean that $\lim r_n = r$? It should mean that we can make $|r_n - r|$ small, but this does not yet make sense because we do not yet have an absolute value on $\mathbb{R}$. This will be taken care of next.

R as an ordered set

We say that $x > y$ if $x - y > 0$, so it is sufficient to define when $r > 0$ for an $r \in \mathbb{R}$. Assume that $r \ne 0$ and let $r = [r_n]$. By Lemma 2.13, we either have $r_n > 0$ or $r_n < 0$ for sufficiently large $n$. In the first case, we put $r > 0$, in the second we put $r < 0$. As for rationals, we say $r \ge 0$ if $r = 0$ or $r > 0$, and define $r \le 0$ accordingly.

We claim that this is well defined. In order to prove this, let $(r_n)$ and $(r'_n)$ be two Cauchy sequences in $\mathbb{Q}$ representing the same real number $r = [r_n] = [r'_n]$. If $r > 0$, then there is a $\tau > 0$ such that $r_n > \tau$ for all $n > N$ for a suitable $N \in \mathbb{N}$. Since $(r_n - r'_n)$ converges to 0, there is an $N_0 \in \mathbb{N}$ such that $|r_n - r'_n| < \frac12\tau$ for all $n > N_0$. But then $r'_n = r'_n - r_n + r_n > r_n - \frac12\tau > \frac12\tau$ for all $n > \max\{N, N_0\}$, so $r'_n > 0$ for all sufficiently large $n$. Thus it does not matter whether we represent $r$ by $(r_n)$ or $(r'_n)$.

It is now easy to arrest the usual suspects:

Proposition 3.3. The order $<$ on $\mathbb{R}$ satisfies the usual properties:

1. The identification map $\iota : \mathbb{Q} \hookrightarrow \mathbb{R}$ respects the order: for $x, y \in \mathbb{Q}$ we have $\iota(x) < \iota(y)$ if and only if $x < y$.

2. (Trichotomy Law) For $r \in \mathbb{R}$, exactly one of the statements $r < 0$, $r = 0$ and $r > 0$ holds.

3. If $x, y \in \mathbb{R}$ and $x > 0$, $y > 0$, then $xy > 0$. Similarly, if $x > 0$ and $y < 0$, then $xy < 0$, and if $x, y < 0$, then $xy > 0$.

4. We have $x^2 \ge 0$ for all $x \in \mathbb{R}$. In particular, there is no real number whose square equals $-1$.

5. If $x < y$ and $y < z$, then $x < z$.

6. If $x \ge y$ and $y \ge x$, then $x = y$.


Proof. We’ll leave some of the proofs as an exercise. Assume that $x, y \in \mathbb{Q}$. Then $\iota(x)$ is the real number that is represented by the Cauchy sequence $x_n = x$ for $n \in \mathbb{N}$. Thus $\iota(x) - \iota(y) = \iota(x - y)$ is represented by the sequence $z_n = x - y$. The claim is that $x - y < 0$ if and only if $[z_n] < 0$. But if $x < y$, then the terms of $(z_n)$ (namely $x - y$) are all negative, so $[z_n] < 0$ by our definition, and the converse is just as obvious. This proves 1.

For 2, observe that $r < 0$ and $r = 0$ cannot hold at the same time by definition. Moreover, we cannot have $r < 0$ and $r > 0$: in fact, put $r = [r_n]$. Then $r < 0$ implies that $r_n < 0$ for all sufficiently large $n$, whereas $r > 0$ implies $r_n > 0$ for large $n$. But the $r_n$ are rational, and by the trichotomy law for rationals we cannot have $r_n > 0$ and $r_n < 0$ at the same time.

Now let $r = [r_n]$ be a real number. If $(r_n)$ converges to 0, then $r = 0$ because $(r_n)$ and the sequence $0, 0, 0, \ldots$ differ by a null sequence. If $(r_n)$ does not converge to 0, then Lemma 2.13 shows that either $r_n > 0$ or $r_n < 0$ for all sufficiently large $n$, hence by definition either $r > 0$ or $r < 0$. This proves the trichotomy law for $\mathbb{R}$.

Now assume that $x$ and $y$ are real numbers with $x > 0$ and $y > 0$. By definition this means that if $x = [x_n]$ and $y = [y_n]$ for Cauchy sequences $(x_n)$ and $(y_n)$ of rational numbers, then there exist $N_1, N_2 \in \mathbb{N}$ with $x_n > 0$ for $n > N_1$ and $y_n > 0$ for $n > N_2$. But then $x_n y_n > 0$ for all $n > \max\{N_1, N_2\}$, hence $xy = [x_n y_n] > 0$.

The proofs of the other claims are left as an exercise.

Now we introduce an absolute value on $\mathbb{R}$ by defining

$$|r| = \begin{cases} +r & \text{if } r \ge 0, \\ -r & \text{if } r < 0. \end{cases}$$

It is easy to check that $|\cdot|$ has the usual properties; in particular, it satisfies the triangle inequality.

We shall occasionally need to show that certain real numbers are small; assume that $\varepsilon > 0$ is rational and $r = [r_n]$ is a real number. Then $|r| < \varepsilon$ means $-\varepsilon < r < \varepsilon$, and by definition of $<$ this means that there is a natural number $N$ such that $-\varepsilon < r_n < \varepsilon$ for all $n > N$.

Finally, let’s prove the Archimedean property.

Lemma 3.4. If $x > 0$ is a real number, then there is a rational number $r$ such that $0 < r < x$.

Proof. Let $(x_n)$ be a sequence of rational numbers with $x = [x_n]$. Lemma 2.13 provides us with a positive rational number $r > 0$ such that $x_n > r$ for all sufficiently large $n \in \mathbb{N}$. By definition of the order, this means $x > r$.

Corollary 3.5 (Archimedean Property).¹ If $a, b > 0$ are real numbers, then there is a natural number $n \in \mathbb{N}$ such that $na > b$.

¹ Ross, 4.6.


Proof. Put $x = \frac ab$; by the above Lemma there is a rational number $\frac pq$ (with natural numbers $p, q$) such that $0 < \frac pq < x$. Then $\frac ba < \frac qp \le q < q + 1$, so $n = q + 1 \in \mathbb{N}$ does it.

Corollary 3.6. For any real $x > 0$, there is an integer $z \ge 0$ such that $z \le x < z + 1$.

Proof. Consider the set $M = \{n \in \mathbb{N} : n > x\}$. Then $M$ is a non-empty set of natural numbers, since applying the Archimedean property with $a = 1$ and $b = x$ shows that there is some $n \in \mathbb{N}$ with $n > x$. Since $\mathbb{N}$ is well-ordered, $M$ has a minimal element $n$. Put $z = n - 1$. Then $0 \le z \in \mathbb{Z}$ and $z + 1 > x$ follow immediately, and $z \le x$ also holds: for if not, then $z = n - 1 > x$, which contradicts the minimality of $n$ unless $n = 1$, in which case $0 = z > x$ contradicts the assumption $x > 0$.

The integer $z$ in the above corollary is called the largest integer less than or equal to $x$; we write $z = \lfloor x \rfloor$.

Theorem 3.7.² The rationals $\mathbb{Q}$ are dense in $\mathbb{R}$: for all real numbers $a < b$, there is a rational number $q$ such that $a < q < b$.

Proof. We have $b - a > 0$, so by the Archimedean property there is an integer $n > 0$ with $n(b - a) > 1$, that is, with $nb > na + 1$. Put $z = \lfloor na \rfloor + 1$ (if $na \le 0$, apply Corollary 3.6 to $na + k$ for a natural number $k > -na$ and subtract $k$ again); then $z - 1 \le na < z$, hence $na < z \le na + 1 < nb$. Dividing by $n$ shows $a < \frac zn < b$.
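The density argument is constructive: pick $n$ with $n(b - a) > 1$ and then an integer strictly between $na$ and $nb$. A minimal sketch, restricted to rational endpoints so that the arithmetic is exact (the theorem itself is about real $a < b$); the function name `rational_between` is a hypothetical choice, and the integer is taken to be the smallest one exceeding $na$:

```python
from fractions import Fraction
from math import floor

def rational_between(a, b):
    """Return a rational q with a < q < b, for rationals a < b."""
    if not a < b:
        raise ValueError("need a < b")
    # Archimedean step: a natural number n with n * (b - a) > 1.
    n = floor(1 / (b - a)) + 1
    # Smallest integer z with z > n * a; then n*a < z <= n*a + 1 < n*b.
    z = floor(n * a) + 1
    return Fraction(z, n)

print(rational_between(Fraction(1, 3), Fraction(1, 2)))
```

Choosing the smallest integer above $na$ guarantees both inequalities are strict, also when $nb$ happens to be an integer.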

R as a complete field

Now that we have an absolute value, we can carry over a couple of definitions from the rationals: a sequence $(r_n)$ of real numbers is called

• bounded if there exists an $M \in \mathbb{R}$ such that $|r_n| < M$ for all $n \in \mathbb{N}$.

• convergent if there is an $r \in \mathbb{R}$ such that, for all $\varepsilon > 0$, there is an $N \in \mathbb{N}$ such that $|r_n - r| < \varepsilon$ for all $n > N$; we call $r$ the limit of $(r_n)$ and write $r = \lim r_n$.

• a Cauchy sequence if for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that $|r_n - r_m| < \varepsilon$ for all $m, n > N$.

Let $\mathcal{N}$ ($\mathcal{C}$, $\mathcal{R}$, $\mathcal{B}$) denote the set of null (converging, Cauchy, bounded) sequences of real numbers. Again, all these sets are rings, in fact subrings of the ring $\mathcal{S}$ of sequences of real numbers. The proofs are exactly the same. The main difference is that in the inclusions

$$\mathcal{N} \subset \mathcal{C} \subseteq \mathcal{R} \subset \mathcal{B} \subset \mathcal{S},$$

the inclusion $\mathcal{C} \subseteq \mathcal{R}$ is not proper anymore: a major theorem that we shall prove below is that $\mathcal{C} = \mathcal{R}$ for sequences of real numbers. Moreover, converging sequences have exactly one limit (same proof).

Now we are ready to show

² Ross, 4.7.


Proposition 3.8. Every Cauchy sequence in $\mathbb{Q}$ converges to some real number.

Proof. Let $(r_n)$ be a Cauchy sequence of rational numbers. We claim that $\lim r_n = r$, where $r = [r_n]$. To this end, let an $\varepsilon > 0$ be given (we may assume that $\varepsilon$ is rational; if it is not, we replace it by some smaller positive rational number); we have to find an integer $N$ such that $|r_n - r| < \varepsilon$ for all $n > N$. By definition of the order $<$, this holds if and only if the inequality $|r_n - r_m| < \varepsilon$ of rational numbers holds for all sufficiently large $m$. But since $(r_n)$ is a Cauchy sequence, there is an $N$ such that $|r_n - r_m| < \varepsilon$ for all $m, n > N$.

We now formulate the result on which the rest of this course is built:

Theorem 3.9. The field $\mathbb{R}$ of reals is complete, that is: every Cauchy sequence converges.

At first it might seem that there is nothing to prove here. After all, didn’t we introduce $\mathbb{R}$ by ‘adding’ the limits of every Cauchy sequence to $\mathbb{Q}$? Doesn’t that mean that every Cauchy sequence converges?

We have to be very careful here: by Prop. 3.8, only Cauchy sequences consisting of rational numbers converge in $\mathbb{R}$. But maybe there is a Cauchy sequence consisting of real numbers whose limit is not real? Well, there is no such thing, but this must be proved.

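Proof. Assume that $(a_n)$ is a Cauchy sequence of real numbers. We have to find a real number $a$ such that $\lim a_n = a$. Real numbers are limits of Cauchy sequences in $\mathbb{Q}$, so we need to construct a Cauchy sequence in $\mathbb{Q}$ with the same limit as $(a_n)$. Here’s how we do that:

Every real number $a_n$ has the form $a_n = [a_{n,k}]$, where $(a_{n,k})$ is a Cauchy sequence of rational numbers. By Proposition 3.8, this Cauchy sequence converges to $a_n$. Let us arrange all these numbers as follows (each column is a rational Cauchy sequence, and the last row lists the limits of the columns):

$$\begin{array}{cccccc}
a_{1,1} & a_{2,1} & a_{3,1} & \ldots & a_{n,1} & \ldots \\
a_{1,2} & a_{2,2} & a_{3,2} & \ldots & a_{n,2} & \ldots \\
a_{1,3} & a_{2,3} & a_{3,3} & \ldots & a_{n,3} & \ldots \\
\vdots & \vdots & \vdots & & \vdots & \\
a_{1,n} & a_{2,n} & a_{3,n} & \ldots & a_{n,n} & \ldots \\
\vdots & \vdots & \vdots & & \vdots & \\
a_1 & a_2 & a_3 & \ldots & a_n & \ldots
\end{array}$$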

Now the first idea that came to my mind was making up a new sequence by putting $s_1 = a_{1,1}$, $s_2 = a_{2,2}$, \ldots, $s_n = a_{n,n}$. If one could show that this sequence of rational numbers is Cauchy, it would have a limit $a$, and then a proof that $a = \lim a_n$ would be mere child’s play. After an hour of futile attempts to prove that $(s_n)$ is Cauchy I realized that it is easy to show that this approach cannot possibly work: all you have to do is to replace $a_{n,n}$ by $n$; then for all $m \in \mathbb{N}$, $(a_{m,\nu})_\nu$ would still be a Cauchy sequence converging to $a_m$, but our sequence $s_n = n$ would certainly not converge.


This shows that you have to make sure that the $s_m$ you pick from the sequence $(a_{m,n})_n$ is very close to the limit $a_m$. Here’s how to do that: since $(a_{m,n})_n$ converges to $a_m$, we can find an $n$ such that $|a_{m,n} - a_m| < \frac1m$ and then put $s_m = a_{m,n}$.

Now we make two claims:

A. the sequence (sn) is Cauchy;

B. we have lim sn = lim an.

A) $(s_n)$ is a Cauchy sequence. We have to make $|s_\mu - s_\nu|$ small. All we know about the $s_\mu$ is that they are close to $a_\mu$, so we try to invoke the old triangle:

$$|s_\mu - s_\nu| = |s_\mu - a_\mu + a_\mu - a_\nu + a_\nu - s_\nu| \le |s_\mu - a_\mu| + |a_\mu - a_\nu| + |a_\nu - s_\nu|.$$

This looks promising.

In fact, we can now prove A. Let $\varepsilon > 0$ be given. By our choice of $s_\mu$, we have $|s_\mu - a_\mu| < 1/\mu$; for $\mu > 3/\varepsilon$, we therefore have $|s_\mu - a_\mu| < \varepsilon/3$. Similarly, we find $|s_\nu - a_\nu| < \varepsilon/3$ for $\nu > 3/\varepsilon$. Finally, there is an $N_0 \in \mathbb{N}$ such that $|a_\mu - a_\nu| < \varepsilon/3$ whenever $\mu, \nu > N_0$. Thus, for $N = \max\{N_0, \lfloor 3/\varepsilon \rfloor + 1\}$ and all $\mu, \nu > N$, we have

$$|s_\mu - s_\nu| < \frac\varepsilon3 + \frac\varepsilon3 + \frac\varepsilon3 = \varepsilon.$$

B) Since $(s_n)$ is a Cauchy sequence of rational numbers, it converges to a real number $a$ by Prop. 3.8. We now claim that $a = \lim a_n$. The triangle inequality gives $|a_n - a| \le |a_n - s_n| + |s_n - a|$. We know $|a_n - s_n| < \frac1n$ by construction, so for $\varepsilon > 0$ we can pick $N_1 > 2/\varepsilon$, and then $|a_n - s_n| < \varepsilon/2$ for all $n > N_1$. Moreover, $|s_n - a| < \varepsilon/2$ for all $n$ larger than some $N_2$ (since $a = \lim s_n$), hence $|a_n - a| < \varepsilon$ for all $n > \max\{N_1, N_2\}$, and this implies $a = \lim a_n$. This completes the proof of B.

Supremum and Infimum

Let $S \subset \mathbb{R}$ be a nonempty set of real numbers. We say that $M \in \mathbb{R}$ is an upper bound of $S$ if $s \le M$ for all $s \in S$. We say that $M$ is a least upper bound (or supremum) if $M$ is an upper bound of $S$ and if there is no upper bound $m < M$ of $S$. The corresponding concept for lower bounds is the infimum of $S$ (the greatest lower bound). We write $\sup S$ and $\inf S$ to denote the supremum and the infimum of $S$ (in fact, if a supremum (infimum) exists, then it is unique: EXERCISE).

It is extremely important to understand the difference between the supremum and the maximum of a set: for $S = \{1 - \frac1n : n \in \mathbb{N}\}$, we have $\sup S = 1$, but $\max S$ does not exist.

Corollary 3.10. Every nonempty set $S$ of real numbers with an upper bound $M$ has a supremum.


Proof. Put $m_1 = M$ and pick $s_1 \in S$. If $m_1 = s_1$, then $m_1$ is a supremum of $S$ (in fact, a maximum). If $m_1 > s_1$, then we consider the number $M_2 = \frac12(m_1 + s_1)$. If $s \le M_2$ for all $s \in S$, then $M_2$ is an upper bound, and we set $m_2 = M_2$ and $s_2 = s_1$; if not, then there is an $s_2 \in S$ with $s_2 > M_2$, and we put $m_2 = m_1$.

Assume that we have constructed upper bounds $m_1 \ge m_2 \ge \ldots \ge m_n$ and elements $s_1 \le s_2 \le \ldots \le s_n$ of $S$ with $m_k - s_k \le \frac12(m_{k-1} - s_{k-1})$ for all $2 \le k \le n$. Put $M_{n+1} = \frac12(m_n + s_n)$. Again there are two cases:

• $s \le M_{n+1}$ for all $s \in S$; then $M_{n+1}$ is an upper bound for $S$, and we set $m_{n+1} = M_{n+1}$ and $s_{n+1} = s_n$. Observe that $m_{n+1} - s_{n+1} = M_{n+1} - s_n = \frac12(m_n - s_n)$.

• $s > M_{n+1}$ for some $s \in S$; put $s_{n+1} = s$ and $m_{n+1} = m_n$. Observe that $m_{n+1} - s_{n+1} = m_n - s_{n+1} < m_n - M_{n+1} = \frac12(m_n - s_n)$.

We note for later use that

$$0 \le m_{n+1} - s_{n+1} \le \frac12(m_n - s_n). \tag{3.4}$$

Clearly $(m_n)$ is a monotone decreasing sequence, and since $m_n \ge s$ for all $s \in S$, we have $m_n \ge s_1$, that is, $(m_n)$ is bounded from below by $s_1$. But monotone decreasing sequences bounded from below are Cauchy, and since $\mathbb{R}$ is complete, $(m_n)$ converges; put $m = \lim m_n$.

Similarly, $(s_n)$ is a monotone increasing sequence bounded from above by $m_1$, hence it is Cauchy, so converges in $\mathbb{R}$. Put $s = \lim s_n$; we claim that $m = s$. In fact, applying the limit to the inequalities (3.4), we get $0 \le m - s \le \frac12(m - s)$. Subtracting $\frac12(m - s)$ from both sides of $m - s \le \frac12(m - s)$ gives $\frac12(m - s) \le 0$; on the other hand, $\frac12(m - s) \ge 0$ since $m - s \ge 0$, hence $m = s$.

Now we claim that $m$ is a least upper bound of $S$. There are two things to prove:

• $m$ is an upper bound. Assume it isn’t. Then there is an $s_0 \in S$ such that $s_0 > m$. Put $\varepsilon = s_0 - m$. Then there is an integer $N$ such that $|m_n - m| < \varepsilon$ for all $n > N$; moreover, $s_0 \le m_n$ since by construction, the $m_n$ were upper bounds for $S$. But now

$$0 \le m_n - s_0 = m_n - m + m - s_0 < \varepsilon - \varepsilon = 0,$$

which is a contradiction.

• There is no upper bound $m' < m$ of $S$. For assume there is; then $s \le m'$ for all $s \in S$, in particular $s_n \le m'$ for the elements of the sequence $(s_n)$; but then $\lim s_n \le m' < m$, contradicting the fact that $\lim s_n = m$.
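The halving construction in this proof can be run on a concrete set. The sketch below is an illustration under assumptions of our own: the proof only asserts that some element above the midpoint exists (or not), so the code needs an oracle for that, here the hypothetical function `find_above`, written for the set $S = \{1 - \frac1n\}$ mentioned earlier. The name `bisect_sup` is likewise hypothetical.

```python
from fractions import Fraction
from math import floor

def bisect_sup(find_above, upper, element, steps):
    """Run `steps` rounds of the halving construction.

    find_above(M) must return some element of S that is > M,
    or None when M is an upper bound of S (a hypothetical oracle).
    """
    m, s = upper, element
    for _ in range(steps):
        mid = (m + s) / 2
        witness = find_above(mid)
        if witness is None:
            m = mid          # mid is an upper bound: lower m
        else:
            s = witness      # some element exceeds mid: raise s
    return m, s

def find_above(M):
    """Oracle for S = {1 - 1/n : n = 1, 2, 3, ...}."""
    if M >= 1:
        return None
    n = floor(1 / (1 - M)) + 1   # smallest n with 1 - 1/n > M
    return 1 - Fraction(1, n)

m, s = bisect_sup(find_above, Fraction(2), Fraction(0), 20)
print(m, float(s))
```

After each round the gap between the upper bound and the element of $S$ is at most half of what it was, which is the content of (3.4).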


Decimal Expansion

Fix a natural number $g \ge 2$. Given any sequence $(d_n)$ of integers $d_n \in \{0, 1, \ldots, g-1\}$, the sequence of rational numbers

$$s_n = \sum_{k=1}^{n} d_k g^{-k}$$

is a Cauchy sequence. In fact, if $m > n > N$ for some $N \in \mathbb{N}$, then

$$\begin{aligned}
|s_m - s_n| &= \sum_{k=n+1}^{m} d_k g^{-k} \le \sum_{k=n+1}^{m} (g-1) g^{-k} < (g-1) \sum_{k=N+1}^{\infty} g^{-k} \\
&= (g-1) g^{-N} \sum_{k=1}^{\infty} g^{-k} = (g-1) g^{-N} \frac{1}{g-1} = g^{-N}.
\end{aligned}$$

Thus if we are given an $\varepsilon > 0$, and if we choose $N$ so large that $g^{-N} < \varepsilon$, then $|s_m - s_n| < \varepsilon$ for all $m, n > N$. This means that all numbers of the form $\sum d_k g^{-k}$ are real numbers, and in fact they are real numbers in the interval $[0, 1]$.

Conversely, let $r \in [0, 1)$ be a real number. We claim that it has a $g$-adic expansion. Assume for the moment we already have what we are looking for: $r = 0.d_1d_2d_3\ldots$; then $gr = d_1.d_2d_3\ldots$, hence $d_1 = \lfloor gr \rfloor$. Turning this observation around we see how we have to proceed.

In fact, put $d_1 = \lfloor gr \rfloor$ and $r_1 = gr - d_1$; then $d_1 \in \{0, 1, \ldots, g-1\}$ and $r_1 \in [0, 1)$. The first claim follows from $d_1$ being an integer by definition and the inequality $0 \le gr < g$; the second claim follows from the definition of the floor function. This allows us to define a sequence $(d_n)$ of integers $d_n \in \{0, 1, \ldots, g-1\}$ inductively: assume that we have found $d_1, \ldots, d_n$, and that $r_n \in [0, 1)$; then put $d_{n+1} = \lfloor g r_n \rfloor$ and $r_{n+1} = g r_n - d_{n+1}$.

What remains to show is that we do in fact have $r = 0.d_1d_2d_3\ldots$. By construction, $0 \le gr - d_1 < 1$; using induction, we can prove $0 \le g^n r - (d_1 g^{n-1} + d_2 g^{n-2} + \ldots + d_n) < 1$, which is equivalent to $0 \le r - 0.d_1d_2\ldots d_n < g^{-n}$. This implies the claim, and we have proved:

Theorem 3.11. Every real number has a decimal expansion.
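The inductive definition of the digits is an algorithm: take the floor of $g r_n$ as the next digit and keep the fractional part. A minimal sketch for rational input (so that the arithmetic is exact); the function name `g_adic_digits` is a hypothetical choice:

```python
from fractions import Fraction
from math import floor

def g_adic_digits(r, g, count):
    """First `count` digits d_1, d_2, ... of r in base g, for 0 <= r < 1."""
    if not 0 <= r < 1:
        raise ValueError("need 0 <= r < 1")
    digits = []
    for _ in range(count):
        d = floor(g * r)   # d_{n+1} = floor(g * r_n)
        r = g * r - d      # r_{n+1} = g * r_n - d_{n+1}, again in [0, 1)
        digits.append(d)
    return digits

print(g_adic_digits(Fraction(1, 3), 10, 5))
print(g_adic_digits(Fraction(5, 8), 2, 4))
```

For example, $\frac58$ in base 2 yields the digits 1, 0, 1 followed by zeros, since $\frac58 = \frac12 + \frac18$.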

It is a bit unfortunate that the decimal expansion of real numbers is not always unique:

Proposition 3.12. We have 0.99999 . . . = 1.

Proof. By definition,

$$0.99999\ldots = \sum_{n=1}^{\infty} 9 \cdot 10^{-n} = 9 \sum_{n=1}^{\infty} 10^{-n} = 9 \cdot \frac{1}{10 - 1} = 1.$$


In fact we have

Proposition 3.13. Every real number either has a unique $g$-adic expansion, or it has two: one that terminates and another one ending in infinitely many $(g-1)$’s.

Proof. If $r = 0.d_1d_2\ldots d_n > 0$ has a terminating $g$-adic expansion (with $d_n \ne 0$), then we also have $r = 0.d_1d_2\ldots(d_n - 1)(g-1)(g-1)(g-1)\ldots$. The harder problem is showing that reals have at most two $g$-adic expansions.

It is sufficient to do this for reals in the interval $[0, 1]$. So assume that $r = 0.d_1d_2d_3\ldots = 0.e_1e_2e_3\ldots$ are two different $g$-adic expansions of $r \in [0, 1]$, and let $m$ denote the smallest index such that $d_m \ne e_m$. We may assume that $d_m < e_m$.

We claim that $d_{m+1} = d_{m+2} = \ldots = g - 1$; in fact, if $d_{m+k} < g - 1$ for some $k \ge 1$, then using the geometric series we get

$$\begin{aligned}
r &= \sum_{j=1}^{m} d_j g^{-j} + \sum_{j=m+1}^{\infty} d_j g^{-j} \\
&< \sum_{j=1}^{m} d_j g^{-j} + \sum_{j=m+1}^{\infty} (g-1) g^{-j} \\
&= \sum_{j=1}^{m} d_j g^{-j} + g^{-m} \\
&= 0.d_1d_2\ldots(d_m + 1) \le 0.e_1e_2\ldots e_m \le r,
\end{aligned}$$

which is a contradiction.

In a similar way, we can prove that $e_{m+k} = 0$ for all $k \ge 1$. Thus if a real number has two different $g$-adic expansions, then one of them is terminating, and there is exactly one other expansion, namely one ending with $0.d_1\ldots d_m(g-1)(g-1)(g-1)\ldots$.

Theorem 3.14 (Cantor). The set of real numbers (0, 1) is not countable.

Proof. Our only chance is a proof by contradiction, so let us assume that we can enumerate the real numbers in $(0, 1)$, that is, assume that the sequence of real numbers $r_1, r_2, r_3, \ldots$ contains each and every element of $(0, 1)$. Now we write each of these real numbers in decimal notation:

$$\begin{aligned}
r_1 &= 0.r_{11}r_{12}r_{13}\ldots, \\
r_2 &= 0.r_{21}r_{22}r_{23}\ldots, \\
r_3 &= 0.r_{31}r_{32}r_{33}\ldots,
\end{aligned}$$

and make once more use of the diagonal to construct another real number. What we do is define a real number

$$s = 0.s_1s_2s_3\ldots$$


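by putting

$$s_n = \begin{cases} r_{nn} + 1 & \text{if } r_{nn} \le 5, \\ r_{nn} - 1 & \text{if } r_{nn} \ge 6. \end{cases}$$

This is a genuine real number because $s_n \in \{0, 1, \ldots, 9\}$, and clearly $s \in [0, 1]$. In fact, $s \ne 0, 1$ because $1 \le s_n \le 8$.

Now what is so special about our number $s$? Well, for one thing, $s \ne r_1$, because these numbers differ at the first decimal: $s_1 = r_{11} \pm 1 \ne r_{11}$. Similarly, $s \ne r_2$ because $s_2 \ne r_{22}$, and in general, $s \ne r_n$ because $s_n = r_{nn} \pm 1 \ne r_{nn}$. (Since no digit of $s$ is 0 or 9, Prop. 3.13 shows that $s$ has only one decimal expansion, so a differing digit really does mean a different number.) But this means that our real number $s$ does not occur in the sequence $r_1, r_2, r_3, \ldots$: contradiction.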
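On a finite table of digits the diagonal rule can be tried out directly. The following sketch applies it to the first $k$ digits of $k$ numbers; the function name `diagonal_digits` and the sample table are hypothetical illustrations, and of course a finite table only shows the mechanism, not the theorem.

```python
def diagonal_digits(rows):
    """Digits s_1..s_k built from the diagonal of a k x k digit table."""
    result = []
    for n, row in enumerate(rows):
        d = row[n]                      # the diagonal digit r_nn
        result.append(d + 1 if d <= 5 else d - 1)
    return result

table = [
    [1, 4, 1, 5],
    [7, 1, 8, 2],
    [0, 0, 9, 0],
    [3, 3, 3, 6],
]
s = diagonal_digits(table)
print(s)  # [2, 2, 8, 5]
```

Each digit of the result differs from the corresponding diagonal entry, and no digit is 0 or 9.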

Exercise. Every rational number has a decimal expansion; we know that rationals are countable. Where does the proof above go wrong if we attempted to use it to prove that rationals are not countable?

Remark. Given sets $A$ and $B$, we say that $A$ and $B$ have the same cardinality if there is a bijection between $A$ and $B$. We say that $B$ has greater cardinality than $A$ if they do not have the same cardinality and if there is a bijection between $A$ and a subset of $B$. The fact that $\mathbb{Q}$ is countable means that $\mathbb{N}$ and $\mathbb{Q}$ have the same cardinality. The theorem above shows that there are many more real than rational numbers, or more exactly that the cardinality of $\mathbb{R}$ is greater than that of $\mathbb{Q}$. Cantor, who invented the theory of countable sets and cardinal numbers, also asked the following apparently simple question: is there a subset $C$ of $\mathbb{R}$ with $\mathbb{Q} \subset C \subset \mathbb{R}$ such that the cardinality of $C$ is larger than that of $\mathbb{Q}$ but smaller than that of $\mathbb{R}$?

Cantor conjectured that there is no such set (this conjecture became known as the continuum hypothesis; here continuum is Cantor’s word for the reals) and thought several times that he had found a proof, but each time found a gap. The striking result of Gödel and Cohen³ is that this statement is independent of ZFC, the axiom system of set theory that is currently accepted (named after Zermelo and Fraenkel, with C representing the axiom of choice). What that means is that not only is it impossible to prove the continuum hypothesis within this system of axioms, it is also impossible to prove that it is wrong. One possible way of making some sense of this affair is to say that our current set of axioms is not strong enough to decide Cantor’s question.

Finally let me point out that there is no intuition to guide you through the theory of cardinalities, essentially because the human mind is not used to dealing with infinities. Here’s a striking example:

Theorem 3.15. The interval $[0, 1) \subset \mathbb{R}$ and the unit square $[0, 1) \times [0, 1) \subset \mathbb{R}^2$ have the same cardinality.

³ Gödel proved in 1938 that assuming the continuum hypothesis does not contradict ZFC, that is, you can’t prove that it is wrong from ZFC; in 1963, Paul Cohen showed that you also can’t prove the continuum hypothesis from ZFC.


Naively, one would have expected the 2-dimensional square to have many more points than the interval. When Cantor stumbled across this result, he sent off a letter to Dedekind and wrote ‘I see it, but I don’t believe it’. The idea of the proof, as a matter of fact, is so simple that it almost hurts: we have to find a bijection between the two sets above. Write all real numbers as decimals (not allowing them to end with an infinite string of 9’s to make the representation unique). Then we define a map $[0, 1) \to [0, 1) \times [0, 1)$ by

$$0.a_1a_2a_3\ldots \mapsto (0.a_1a_3a_5\ldots,\ 0.a_2a_4a_6\ldots).$$

This map is obviously surjective; unfortunately it is not injective, since e.g.

$$0.0109090909\ldots \mapsto (0,\ 0.19999\ldots) = (0,\ 0.2)$$

and

$$0.02 \mapsto (0,\ 0.2).$$

Theorem 3.16. For any real number $a > 0$, there is exactly one real $b > 0$ with $b^2 = a$.

The element $b$ in the above theorem is denoted by $\sqrt{a}$.

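Proof. Consider the sequence $(a_n)$ defined by $a_1 = 1$ and

$$a_{n+1} = \frac12\Big(a_n + \frac{a}{a_n}\Big).$$

We find that

$$a_{n+1}^2 - a = \frac14\Big(a_n - \frac{a}{a_n}\Big)^2$$

for all $n \ge 1$; in particular, we have $a_n^2 \ge a$ for all $n \ge 2$. For such $n$ this gives

$$a_{n+1}^2 = \frac14 a_n^2 + \frac a2 + \frac{a^2}{4a_n^2} \le \frac14 a_n^2 + \frac{3a}{4},$$

hence

$$0 \le a_{n+1}^2 - a \le \frac14(a_n^2 - a).$$

Thus $a_n^2$ converges to $a$. Finally,

$$a_n - a_{n+1} = \frac{1}{2a_n}(a_n^2 - a),$$

and since $1/a_n$ is bounded (from $a_n > 0$ and $a_n^2 \ge a$ we get $a_n \ge \min\{1, a\}$ for $n \ge 2$) and $|a_n^2 - a| \le 16\,|a_2^2 - a|\,4^{-n}$ for $n \ge 2$, $(a_n)$ is Cauchy by the argument of Prop. 2.14, hence converges. This gives $a = \lim a_n^2 = (\lim a_n)^2$, and $b = \lim a_n \ge \min\{1, a\} > 0$ is a positive root. It is the only one: if $b, c > 0$ satisfy $b^2 = c^2 = a$, then $(b - c)(b + c) = 0$ with $b + c > 0$, hence $b = c$. This proves the claim.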
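The recursion used in this proof is easy to run with exact rational arithmetic, which makes the behaviour of $a_n^2 - a$ visible. This is a sketch for rational $a$ only; the function name `sqrt_iterates` and the choice $a = 5$ are hypothetical illustrations.

```python
from fractions import Fraction

def sqrt_iterates(a, count):
    """Return [a_1, ..., a_count] with a_1 = 1, a_{n+1} = (a_n + a/a_n) / 2."""
    terms = [Fraction(1)]
    while len(terms) < count:
        x = terms[-1]
        terms.append((x + Fraction(a) / x) / 2)
    return terms

terms = sqrt_iterates(5, 6)
errors = [x * x - 5 for x in terms]   # the differences a_n^2 - a
for x, err in zip(terms, errors):
    print(float(x), float(err))
```

From the second term on, the differences are nonnegative and shrink by at least a factor 4 per step, in line with the inequality derived in the proof; in practice they shrink much faster.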

Proposition 3.17. If $a, b \in \mathbb{R}$ are positive, then $\sqrt{ab} = \sqrt{a}\sqrt{b}$.


Proof. By Theorem 3.16, it is sufficient to prove that $\sqrt{a}\sqrt{b}$ is a positive root of $x^2 - ab$. But this follows from what we know about multiplication of real numbers: $\sqrt{a}\sqrt{b} > 0$ and $(\sqrt{a}\sqrt{b})^2 = (\sqrt{a})^2(\sqrt{b})^2 = ab$.

We remark that the inequality $x^2 \ge 0$ has some beautiful and useful consequences. A very important corollary is the inequality between arithmetic and geometric means:

Proposition 3.18. For real numbers $a, b \ge 0$, we have $\frac{a+b}{2} \ge \sqrt{ab}$, with equality if and only if $a = b$.

The proof is left as homework. The inequality is powerful enough to help us solve some simple minimax problems.

Exercise. Among the rectangles with circumference $c$, determine the one with maximal area.

Let $x$ and $y$ denote the sides of the rectangle; then $x + y = \frac{c}{2}$. We have to maximize the area $A = xy$. By the inequality of arithmetic and geometric means, we have $\sqrt{A} = \sqrt{xy} \le \frac{x+y}{2} = \frac14 c$, hence $A \le \frac{1}{16}c^2$ with equality if and only if $x = y$. Thus the rectangle with given circumference and maximal area is the square.
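As a quick numerical sanity check of this exercise, one can scan candidate side lengths for a fixed circumference. This is only a sketch: the circumference `c = 20.0`, the grid of 1001 sample points, and the helper name `area` are arbitrary choices made for illustration.

```python
def area(x: float, c: float) -> float:
    """Area of the rectangle with one side x and circumference c."""
    return x * (c / 2 - x)


c = 20.0
# Sample the side length x on an even grid between 0 and c/2.
samples = [k * c / 2 / 1000 for k in range(1001)]
best_area, best_x = max((area(x, c), x) for x in samples)
print(best_x, best_area, c * c / 16)
```

On this grid the largest area is attained at the side length $c/4$, i.e. for the square, and it equals $c^2/16$.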

The following result is often useful:

Proposition 3.19 (Sandwich Lemma). Let $(a_n)$ and $(b_n)$ be sequences of real numbers with the same limit $s$. If $(s_n)$ is a sequence with $a_n \le s_n \le b_n$ for all $n$ larger than some $N \in \mathbb{N}$, then $(s_n)$ converges, and $\lim s_n = s$.

Proof. Let $\varepsilon > 0$ be given. Then there are integers $N_1, N_2$ such that $|a_n - s| < \varepsilon$ for $n > N_1$ and $|b_n - s| < \varepsilon$ for $n > N_2$. Now $-\varepsilon < a_n - s \le s_n - s \le b_n - s < \varepsilon$ shows that $|s_n - s| < \varepsilon$ for all $n > \max\{N, N_1, N_2\}$.

e

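We now can define Euler’s number
\[ e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \ldots = \sum_{n=0}^{\infty} \frac{1}{n!}, \]
but of course we have to show that this series converges. Since the sequence of partial sums $e_k = \sum_{n=0}^{k} \frac{1}{n!}$ is monotone, it suffices to show that the sequence is bounded. This is quite pretty: just observe that $\frac{1}{n!} = \frac{1}{n(n-1)\cdots 1} \le \frac{1}{n(n-1)}$ for $n \ge 2$, and then ‘do the telescope’:
\begin{align*}
e_k = \sum_{n=0}^{k} \frac{1}{n!} &\le 2 + \sum_{n=2}^{k} \frac{1}{n(n-1)} = 2 + \sum_{n=2}^{k} \Big(\frac{1}{n-1} - \frac{1}{n}\Big) \\
&= 2 + 1 - \frac12 + \frac12 - \frac13 + \frac13 - \ldots - \frac{1}{k-1} + \frac{1}{k-1} - \frac{1}{k} \\
&= 2 + 1 - \frac{1}{k} < 3.
\end{align*}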
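The telescoping bound can be checked term by term with exact fractions. In this sketch the names `e_partial` and `telescope_bound` are illustrative and do not come from the notes; the check covers only the finitely many values of $k$ that are listed.

```python
from fractions import Fraction
from math import factorial


def e_partial(k: int) -> Fraction:
    """Exact partial sum e_k = sum of 1/n! for n = 0, ..., k."""
    return sum((Fraction(1, factorial(n)) for n in range(k + 1)), Fraction(0))


def telescope_bound(k: int) -> Fraction:
    """The bound 2 + 1 - 1/k obtained from the telescoping sum (k >= 2)."""
    return 3 - Fraction(1, k)


for k in (2, 5, 10, 20):
    print(k, float(e_partial(k)), float(telescope_bound(k)))
```

The partial sums increase with $k$ and stay below the telescoping bound, which itself stays below 3.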

For those who are tired of seeing proofs that $\sqrt{2}$ is irrational, let us prove that

Proposition 3.20. The number $e$ is irrational.

Proof. Of course we do that by contradiction. Assume that $e = \frac{p}{q}$ with positive integers $p$ and $q$. Then $q!\,e = (q-1)! \cdot p$ is an integer. On the other hand,
\[ q!\,e = q! + q! + \frac{q!}{2!} + \frac{q!}{3!} + \ldots + \frac{q!}{q!} + \frac{q!}{(q+1)!} + \ldots \]
is an integer plus
\[ X = \frac{q!}{(q+1)!} + \frac{q!}{(q+2)!} + \ldots = \frac{1}{q+1} + \frac{1}{(q+1)(q+2)} + \frac{1}{(q+1)(q+2)(q+3)} + \ldots . \]
We will have reached a contradiction if we can prove that $0 < X < 1$. Clearly $X > 0$; on the other hand,
\begin{align*}
X &< \frac{1}{q+1} + \frac{1}{(q+1)(q+2)} + \frac{1}{(q+2)(q+3)} + \ldots \\
&= \frac{1}{q+1} + \frac{1}{q+1} - \frac{1}{q+2} + \frac{1}{q+2} - \frac{1}{q+3} + \ldots \\
&= \frac{2}{q+1},
\end{align*}
hence $X < 1$ for every $q \ge 1$, and this is the desired contradiction.

3.2 The Complex Numbers

Constructing complex numbers from the reals is really easy again (the hard part was constructing R from Q): consider the set

C = {(x, y) : x, y ∈ R}

of pairs of real numbers; we call any such pair a complex number. We define addition and multiplication on C by

(r, s) + (t, u) = (r + t, s + u),   (3.5)
(r, s) · (t, u) = (rt − su, ru + st).   (3.6)

Since we have defined these operations on elements (not on equivalence classes), there is no well-definedness to prove here. We identify x ∈ R with the pair (x, 0); it is then easy to check that C is a field containing R as a subfield. More exactly: consider the map ι : R −→ C defined by ι(x) = (x, 0). Then ι is an injective ring homomorphism, i.e., ι(x + y) = ι(x) + ι(y), ι(xy) = ι(x)ι(y), and ι(1) = (1, 0) is the multiplicative identity in C.

We now introduce the abbreviations 1 = (1, 0) and i = (0, 1); then (x, y) = (x, 0) + (0, y) = x · 1 + y · i =: x + yi. Note that $i^2 = -1$ since (0, 1) · (0, 1) = (−1, 0).
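The rules (3.5) and (3.6) translate directly into code on pairs. This is a minimal sketch; the names `add`, `mul` and the alias `Pair` are chosen here for illustration, and integer pairs are used so that all comparisons are exact.

```python
Pair = tuple[int, int]


def add(z: Pair, w: Pair) -> Pair:
    """Rule (3.5): (r, s) + (t, u) = (r + t, s + u)."""
    (r, s), (t, u) = z, w
    return (r + t, s + u)


def mul(z: Pair, w: Pair) -> Pair:
    """Rule (3.6): (r, s) * (t, u) = (rt - su, ru + st)."""
    (r, s), (t, u) = z, w
    return (r * t - s * u, r * u + s * t)


one: Pair = (1, 0)
i: Pair = (0, 1)
print(mul(i, i))            # the pair representing i^2
print(mul((1, 2), (3, 4)))  # the pair representing (1 + 2i)(3 + 4i)
```

The product of `i` with itself is the pair (−1, 0), and `one` acts as the multiplicative identity, in line with the identification of x ∈ R with (x, 0).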

It can be shown that C cannot be ordered in such a way that $z_1 > 0$ and $z_2 > 0$ imply $z_1 z_2 > 0$. On the other hand, we can easily define an absolute value on C by putting
\[ |x + iy| = \sqrt{x^2 + y^2}. \]

Proposition 3.21. For z, w ∈ C we have

1. |z| ≥ 0, with equality if and only if z = 0;

2. |zw| = |z| · |w|;

3. Triangle inequality: |z + w| ≤ |z|+ |w|.

The proof is left as an exercise. Using | · |, it is easy to define Cauchy sequences exactly as in R. As a matter of fact, even the proofs carry over without changing a single word; thus we have

Theorem 3.22. Every Cauchy sequence in C converges. The set $\mathcal{C}$ of Cauchy sequences in C forms a ring, and $\lim : \mathcal{C} \longrightarrow \mathbb{C}$ is a ring homomorphism.

Given any sequence $(z_n)$ of complex numbers, we can form two sequences of real numbers by taking the real and the imaginary part: we simply write $z_n = x_n + i y_n$ with $x_n, y_n \in \mathbb{R}$. What does it mean in terms of $(x_n)$ and $(y_n)$ that $(z_n)$ is Cauchy? Well, the connection is as simple as it can be:

Proposition 3.23. The sequence $(z_n)$ with $z_n = x_n + i y_n$ is a Cauchy sequence if and only if $(x_n)$ and $(y_n)$ are; in that case, $\lim z_n = \lim x_n + i \lim y_n$.

Most of the results that we prove for sequences and series of real numbers also hold for complex numbers; the only things that can possibly go wrong are those connected with the order on R, because there is nothing like this in C. Consider e.g. the proof of the existence of square roots of positive reals in R: the proof there fails to hold in C because we used the order of R. Nevertheless, square roots exist, even for all complex numbers:

Proposition 3.24. For every z ∈ C there is a w ∈ C such that $w^2 = z$. This number is unique up to sign, that is, w and −w are the only complex numbers with $w^2 = z$.

Proof. We reduce everything to R by writing z = x + iy and w = s + ti. The equation $w^2 = z$ gives $s^2 - t^2 = x$ and $2st = y$. There are two cases:

• y = 0. Then s = 0 or t = 0.

If x < 0, the first case must hold because of $s^2 - t^2 = x$, and then $t^2 = -x > 0$ has a solution in R. But then w = 0 + it solves our problem.

If x > 0, the second case must hold, and then $s^2 = x > 0$ has a solution in R, so w = s + 0i does it. (For x = 0 we simply take w = 0.)

• y ≠ 0. Then s, t ≠ 0, and plugging $t = y/2s$ into $s^2 - t^2 = x$ gives $s^4 - y^2/4 = x s^2$. This is a biquadratic equation that can be solved easily; first, we put $u = s^2$ and get $u^2 - xu - y^2/4 = 0$, hence $u_{1,2} = \frac12\big(x \pm \sqrt{x^2 + y^2}\,\big)$. Since we want to extract the square root of u, we must make sure that the solution we get is nonnegative. But $x + \sqrt{x^2 + y^2} > 0$ because $|x| < \sqrt{x^2 + y^2}$. Thus we set
\[ s = \sqrt{\frac{x + \sqrt{x^2 + y^2}}{2}}, \]
then solve $2st = y$ for t, and then verify that w = s + it does what we want.

This proves that square roots of complex numbers always exist. Moreover, if $w^2 = z$, then also $(-w)^2 = z$, so for z ≠ 0 there are always at least two square roots. But there can’t be more than two: if $w_k^2 = z$ for k = 1, 2, then $w_1^2 = w_2^2$, hence $(w_1 - w_2)(w_1 + w_2) = 0$. Since C is a field, a product is 0 if and only if one of its factors is 0, and this implies $w_1 = w_2$ or $w_1 = -w_2$.
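The case distinction of this proof can be followed step by step in floating-point arithmetic. The sketch below makes several assumptions: the name `complex_sqrt` is illustrative, real square roots are taken with `math.sqrt`, and $\sqrt{x^2 + y^2}$ is computed with `math.hypot`. No attempt is made to control rounding errors.

```python
import math


def complex_sqrt(x: float, y: float) -> tuple[float, float]:
    """Return (s, t) with (s + t*i)^2 = x + y*i, following the proof's cases."""
    if y == 0:
        if x < 0:
            # First case: s = 0 and t^2 = -x.
            return (0.0, math.sqrt(-x))
        # Second case: t = 0 and s^2 = x (this also covers x = 0).
        return (math.sqrt(x), 0.0)
    # Case y != 0: s^2 = (x + sqrt(x^2 + y^2)) / 2, then solve 2st = y for t.
    s = math.sqrt((x + math.hypot(x, y)) / 2)
    t = y / (2 * s)
    return (s, t)


print(complex_sqrt(3.0, 4.0))   # a root of 3 + 4i
print(complex_sqrt(-4.0, 0.0))  # a root of -4
```

The function returns one of the two roots; the other one is obtained by negating both components, in agreement with the uniqueness up to sign.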

Note that we have used the uniqueness of positive square roots in R to prove that $\sqrt{ab} = \sqrt{a}\,\sqrt{b}$. In C, not only does our proof not carry over, not even the result does:
\[ -1 = \sqrt{-1}\,\sqrt{-1} = \sqrt{(-1)\cdot(-1)} = \sqrt{1} = 1, \]
where the first and the last two equality signs hold without doubt, showing that the second equality sign does not hold in this case.

3.3 The p-adic numbers Qp

The importance of the construction of the reals by ‘adjoining’ the limits of Cauchy sequences to Q lies with the fact that the same method can be used to construct other fields containing Q that are fundamentally different from the real numbers (or, if you don’t like number theory: there is a whole branch of topology dealing with generalizing the above to the construction of completions of uniform spaces; uniform spaces are topological spaces that behave very much like metric spaces, and in fact every metric space is uniform). Actually, the construction itself is exactly the same, the only difference being that the absolute value | · | is replaced by other functions with similar properties. Let us introduce these functions now.

Fix a prime number p ≥ 2; then every nonzero rational number x can be written uniquely in the form $x = p^a y$, where a is an integer and $y = \frac{r}{s}$ is a fraction with $p \nmid rs$ (we have extracted the power of p from x). Now we define $|x|_p = p^{-a}$ and $|0|_p = 0$. It is seen directly from the definition that $|x|_p$ is small if and only if the numerator of x (when written in lowest terms) is divisible by a high power of p. In particular, $|p|_p = \frac{1}{p}$, $|p^2|_p = \frac{1}{p^2}$ etc.

This p-adic valuation shares the following properties with the usual absolute value:

Proposition 3.25. For x, y ∈ Q, we have

1. |x|p ≥ 0, with equality if and only if x = 0;

2. |xy|p = |x|p|y|p;

3. |x + y|p ≤ |x|p + |y|p; in fact, | · |p satisfies the stronger (ultrametric)triangle inequality |x + y|p ≤ max {|x|p, |y|p}.

Proof. Exercise.
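The definition of $|x|_p$ is easy to experiment with. This is a minimal sketch using exact fractions; the function name `padic_abs` is an assumption made here, and the prime `p` is not validated.

```python
from fractions import Fraction


def padic_abs(x: Fraction, p: int) -> Fraction:
    """Return |x|_p = p^(-a), where x = p^a * r/s with p dividing neither r nor s."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    a = 0
    num, den = x.numerator, x.denominator
    while num % p == 0:  # extract powers of p from the numerator
        num //= p
        a += 1
    while den % p == 0:  # extract powers of p from the denominator
        den //= p
        a -= 1
    return Fraction(1, p) ** a


x, y = Fraction(3), Fraction(6)
print(padic_abs(x, 3), padic_abs(y, 3), padic_abs(x + y, 3))
```

In this example $|3|_3 = |6|_3 = \frac13$ while $|3 + 6|_3 = \frac19$, so the sum is 3-adically smaller than both summands, as the ultrametric inequality permits.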

Thus we have a whole family of absolute values, and in order to be able to make statements about all of them at the same time we put $|\cdot|_\infty := |\cdot|$.

Using any absolute value with these properties, we can define a distance on Q × Q by putting
\[ d_p(x, y) := |x - y|_p. \]

The usual distance between rational numbers is given by $d_\infty(x, y) = |x - y|$. This ‘distance function’ is called a metric: it has the following properties:

Proposition 3.26. For all x, y, z ∈ Q we have

1. dp(x, y) ≥ 0, with equality if and only if x = y;

2. dp(x, y) = dp(y, x);

3. dp(x, z) ≤ dp(x, y) + dp(y, z).

The simple proof is left as an exercise. We say that any of these functions $d_p$ makes Q into a metric space.

Now the definition of a Cauchy sequence makes sense in any metric space: a sequence $(a_n)$ is Cauchy if for every ε > 0 there is an integer N such that $d_p(a_m, a_n) < \varepsilon$ for all m, n > N.

In fact, for p = ∞, this is the usual definition of Cauchy sequences. For p ≠ ∞, these Cauchy sequences are not at all like the Cauchy sequences we are used to: for example, $1, p, p^2, p^3, \ldots$ is a Cauchy sequence with respect to $d_p$; in fact, this sequence converges to 0.

Question: does every Cauchy sequence with respect to $d_p$ converge in Q? Well, the answer is no, and the proof isn’t hard using a little number theory. As a matter of fact, not even Cauchy sequences in Z necessarily converge whenever p ≠ ∞.

Now we can construct the completion of Q with respect to any of these functions $d_p$ by looking at equivalence classes of Cauchy sequences modulo Cauchy sequences converging to 0, and the result turns out to be a field for every p: for p = ∞, we get back R, and for primes p we find the field $\mathbb{Q}_p$ of p-adic numbers.

Just as real numbers have a decimal expansion, p-adic numbers can be represented as
\[ a_{-n}p^{-n} + \ldots + a_{-1}p^{-1} + a_0 + a_1 p + a_2 p^2 + \ldots = \sum_{k \ge -n} a_k p^k \]
with $a_k \in \{0, 1, \ldots, p-1\}$. These fields are very important in number theory. It can be shown, for example, that for odd primes p not dividing an integer a, the field $\mathbb{Q}_p$ contains $\sqrt{a}$ if and only if a is a quadratic residue modulo p.

3.4 Bolzano-Weierstrass

The theorem of Bolzano-Weierstrass asserts that any bounded sequence has a convergent subsequence. The assumption that the sequence is bounded cannot be removed: $a_n = n$ defines a sequence not containing a convergent subsequence. On the other hand, there are sequences containing a whole lot of converging subsequences: the sequence $1, \frac12, 2, 3, 1, \frac13, \frac14, \ldots$ by which we enumerated the positive rationals has a subsequence converging to any given $\frac{p}{q}$: just take $\frac{p+1}{q+1}, \frac{2p+1}{2q+1}, \frac{3p+1}{3q+1}, \ldots$.

Theorem 3.27. 4 Every sequence $(s_n)$ of real numbers contains a monotone subsequence.

Proof. We call the term $s_n$ dominant if it is greater than every term that follows, i.e. if $s_n > s_m$ for all m > n. Now there are two cases:

1. $(s_n)$ contains infinitely many dominant terms $s_{n_1}, s_{n_2}, \ldots$; then the $s_{n_i}$ form a monotone decreasing subsequence of $(s_n)$.

2. $(s_n)$ contains only finitely many dominant terms. Then there is an N ∈ N such that there are no dominant terms $s_n$ with n > N. Pick any $n_1 > N$. Since $s_{n_1}$ is not dominant, there is an $n_2 > n_1$ such that $s_{n_1} \le s_{n_2}$. For the same reason, there is an $n_3 > n_2$ such that $s_{n_2} \le s_{n_3}$. Continuing in this way we get a monotone increasing subsequence $(s_{n_i})$.
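The notion of a dominant term can be illustrated on a finite list. This is only a sketch under a clear limitation: for a finite list the last term is trivially dominant, so the case distinction of the proof (infinitely many versus finitely many dominant terms) cannot be reproduced, only the fact that dominant terms form a decreasing subsequence. The function name `dominant_indices` is an illustrative choice.

```python
def dominant_indices(s: list[float]) -> list[int]:
    """Indices n such that s[n] > s[m] for every later index m > n."""
    result = []
    largest_later = None  # maximum of the terms to the right of position n
    for n in range(len(s) - 1, -1, -1):
        if largest_later is None or s[n] > largest_later:
            result.append(n)
            largest_later = s[n]
    result.reverse()
    return result


s = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
idx = dominant_indices(s)
print(idx, [s[n] for n in idx])
```

Scanning from the right keeps track of the largest later term, so each term is compared only once.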

4Ross, Thm. 11.3.


Theorem 3.28 (Bolzano-Weierstrass). Every bounded sequence of real numbers has a convergent subsequence.

Proof. By Theorem 3.27, the sequence contains a monotone subsequence, and this subsequence is bounded because the sequence is. It is therefore Cauchy by Theorem 2.15, hence it converges.

Remark. This result was frequently ascribed to Weierstrass until it was discovered that it already had been published by Bolzano in 1817.

Let us sketch a second proof for the theorem of Bolzano-Weierstrass. Assume that $a_1 < s_n < b_1$ for all n ∈ N and constants $a_1, b_1 \in \mathbb{R}$ (these exist because the sequence is bounded).

Now $[a_1, b_1]$ contains infinitely many (namely all) $s_n$. If we split this interval into two parts, at least one of them must contain infinitely many $s_n$. If $[a_1, \frac12(a_1 + b_1)]$ contains infinitely many $s_n$, we put $a_2 = a_1$ and $b_2 = \frac12(a_1 + b_1)$, otherwise we put $a_2 = \frac12(a_1 + b_1)$ and $b_2 = b_1$. Repeating this game we construct inductively a sequence of numbers $a_k < b_k$ such that $[a_k, b_k]$ contains infinitely many $s_n$. Clearly $(a_k)$ is monotone increasing and bounded, whereas $(b_k)$ is monotone decreasing and bounded. Thus $(a_k)$ and $(b_k)$ converge. Moreover, $|a_{k+1} - b_{k+1}| = \frac12 |a_k - b_k|$, hence $(a_k - b_k)$ converges to 0, and we conclude that $\lim a_k = \lim b_k$.

Now where’s our subsequence? Well, just pick some $s_{n_1} \in [a_1, b_1]$, $s_{n_2} \in [a_2, b_2]$ with $n_2 > n_1$, . . . (each of these intervals contains infinitely many $s_n$, so this is possible). Then $a_k \le s_{n_k} \le b_k$, hence $(s_{n_k})$ converges (to $\lim a_k$) by the sandwich lemma.

The theorem of Bolzano-Weierstrass also holds for complex numbers.

Theorem 3.29. Each bounded sequence of complex numbers contains a converging subsequence.

Proof. Let $(z_n)$ be a bounded sequence of complex numbers. Then $z_n = x_n + i y_n$, and since $|z_n| < M$ for some constant M, we find $x_n^2 + y_n^2 < M^2$, so in particular $|x_n|, |y_n| < M$. The first idea now is to apply the real Bolzano-Weierstrass to the sequences $(x_n)$ and $(y_n)$, but this doesn’t work: if $(x_{n_i})$ and $(y_{m_i})$ are converging subsequences, then $(x_{n_i} + i\,y_{m_i})$ need not be a subsequence of $(z_n)$; for example, we might have $x_{n_1} = x_1$ and $y_{m_1} = y_2$.

Thus we have to be a little more careful: the real Bolzano-Weierstrass gives us a converging subsequence $(x_{n_i})$. Now consider the sequence $(y_{n_i})$ defined by $z_{n_i} = x_{n_i} + i\,y_{n_i}$. This is a bounded sequence of real numbers, hence contains a converging subsequence $(y_{n_{i_k}})$. But now $z_{n_{i_k}} = x_{n_{i_k}} + i\,y_{n_{i_k}}$ is a convergent subsequence of $(z_n)$ because the real and imaginary parts converge.

Instead of applying the real Bolzano-Weierstrass we also could have used the idea of our second proof: In the real case, we kept on dividing the interval $[a_1, b_1]$ into smaller and smaller parts. In the complex case, we have a rectangle $[a_1, b_1] \times [c_1, d_1]$ containing the real and imaginary parts of $(z_n)$: $a_1 < x_n < b_1$ and $c_1 < y_n < d_1$. This follows easily from the fact that $(z_n)$ is bounded. But now we divide this rectangle into four equal parts; at least one of them contains infinitely many $z_n$, and we take this as our second rectangle. The rest of the proof now goes through easily.

3.5 Absolute Convergence

Consider the series
\[ S = 1 - \frac12 + \frac13 - \frac14 + \frac15 - + \ldots . \]
Since this is an alternating series whose terms decrease in absolute value to 0, it converges, and it is immediately clear that 0 < S < 1: indeed, each of the terms $1 - \frac12$, $\frac13 - \frac14$ etc. is positive, and each of the terms $-\frac12 + \frac13$, $-\frac14 + \frac15$ etc. is negative. As a matter of fact it can be shown that S = log 2, where log is the natural logarithm.

Now we consider 2S and rearrange a bit:
\begin{align*}
2S &= \frac21 - \frac22 + \frac23 - \frac24 + \frac25 - \frac26 \pm \ldots \\
&= \Big(\frac21 - \frac22\Big) - \frac24 + \Big(\frac23 - \frac26\Big) - \frac28 + \Big(\frac25 - \frac2{10}\Big) - \frac2{12} \pm \ldots \\
&= 1 - \frac12 + \frac13 - \frac14 + \frac15 - \frac16 \pm \ldots \\
&= S,
\end{align*}
and since S ≠ 0, we can cancel and get 2 = 1. Note that we have not cheated: every summand was taken care of in the rearrangement! In fact, the $\frac{2}{m}$ and the $\frac{2}{2m}$ with m odd form the pairs $\big(\frac{2}{m} - \frac{2}{2m}\big) = \frac{1}{m}$, whereas the $\frac{2}{m}$ with m divisible by 4 end up as the negative terms in the rearrangement.

Once you know that 1 = 2, you can prove a whole lot of things; it is known that if A and B are finite sets with A ⊆ B and #A = #B, then A = B. Using that (and 1 = 2) you can prove that you are the Pope by considering the sets A = {you} and B = {you, Pope}. Clearly A ⊆ B, #A = 1 = 2 = #B, hence A = B qed.

Similarly you can prove that any theorem is true if 1 = 2, which means that the whole of mathematics would be utterly useless. Now fortunately our ‘proof’ above isn’t a proof at all: while we may rearrange (finite) sums $a_1 + \ldots + a_n$ by using the laws of associativity and commutativity that we have proved, we have proved no such thing for series! In fact we shall show below that we may rearrange series if and only if they are absolutely convergent.
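The effect of the rearrangement can be observed numerically. The sketch below uses floating-point partial sums, and the function names are chosen here for illustration. It compares the series for 2S in its original order with the rearranged series, taken in the blocks $\big(\frac{2}{m} - \frac{2}{2m}\big) - \frac{2}{2m+2}$ for m = 1, 3, 5, . . .

```python
import math


def doubled_partial_sum(n_terms: int) -> float:
    """Partial sum of 2/1 - 2/2 + 2/3 - 2/4 + ... in its original order."""
    return sum((-1) ** (k + 1) * 2.0 / k for k in range(1, n_terms + 1))


def rearranged_partial_sum(n_blocks: int) -> float:
    """Partial sum of the rearranged series, one block per odd m."""
    total = 0.0
    for j in range(1, n_blocks + 1):
        m = 2 * j - 1
        total += 2.0 / m - 2.0 / (2 * m) - 2.0 / (2 * m + 2)
    return total


print(doubled_partial_sum(10000), 2 * math.log(2))
print(rearranged_partial_sum(5000), math.log(2))
```

The partial sums in the original order approach 2 log 2, those of the rearranged series approach log 2: the same terms, summed in a different order, produce a different value.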

A series $\sum a_n$ is called absolutely convergent if $\sum |a_n|$ converges. Clearly the series above is not absolutely convergent since $\sum \big|(-1)^{n+1}\frac{1}{n}\big| = \sum \frac{1}{n}$ is the harmonic series that we know to be divergent. Converging series that are not absolutely convergent are called conditionally convergent. Now we can state the following two theorems:

Theorem 3.30. If $(a_n)$ is a sequence, $(b_n)$ a rearrangement of $(a_n)$, and if $\sum a_n$ is absolutely convergent, then so is $\sum b_n$, and we have $\sum a_n = \sum b_n$.

Theorem 3.31 (Riemann). If $\sum a_n$ is conditionally convergent, then for any real number s ∈ R there is a rearrangement $(b_n)$ of $(a_n)$ such that $\sum b_n = s$.

We call $(b_n)$ a rearrangement of $(a_n)$ if there is a bijection π : N −→ N with $a_n = b_{\pi(n)}$ (what this means is that the sets $\{a_n : n \in \mathbb{N}\}$ and $\{b_n : n \in \mathbb{N}\}$ are equal, but that their indices are permuted, so the terms appear in a possibly different order).

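Proof of Thm. 3.30. Let $s_n = \sum_{k=1}^{n} a_k$ and $t_n = \sum_{k=1}^{n} b_k$ denote the partial sums of the two series, and put $s = \sum a_n$. Let ε > 0; since $\sum a_n$ converges absolutely, there is an N ∈ N such that
\[ \sum_{k=N+1}^{\infty} |a_k| < \frac{\varepsilon}{2}. \]
In particular,
\[ |s_N - s| = \Big|\sum_{k=N+1}^{\infty} a_k\Big| \le \sum_{k=N+1}^{\infty} |a_k| < \frac{\varepsilon}{2}. \]
Let π : N −→ N be the bijection such that $a_n = b_{\pi(n)}$, and put $M = \max\{\pi(1), \ldots, \pi(N)\}$. Then we have
\[ \{a_1, \ldots, a_N\} \subseteq \{b_1, \ldots, b_M\}. \]
Now for any m ≥ M, the sum $t_m - s_N$ contains only $a_k$’s with k > N. Thus
\[ |t_m - s_N| \le \sum_{k=N+1}^{\infty} |a_k| < \frac{\varepsilon}{2}. \]
This implies
\[ |t_m - s| \le |t_m - s_N| + |s_N - s| < \varepsilon \]
for all m ≥ M, and this shows that $\sum b_n$ converges to $\lim t_m = s$. Applying the same argument to $\sum |a_n|$ shows that $\sum b_n$ converges absolutely.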

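Proof of Thm. 3.31. Assume that $\sum a_n$ converges, but not absolutely; define two sequences $(r_n)$ and $(s_n)$ by
\[ r_n = \begin{cases} a_n & \text{if } a_n \ge 0 \\ 0 & \text{if } a_n < 0, \end{cases} \qquad\text{and}\qquad s_n = \begin{cases} 0 & \text{if } a_n \ge 0 \\ -a_n & \text{if } a_n < 0. \end{cases} \]
We claim that $\sum r_n$ and $\sum s_n$ both diverge. In fact, if they both converged, then so would $\sum r_n + \sum s_n = \sum |a_n|$. If $\sum r_n$ converged and $\sum s_n$ didn’t (or vice versa), then $\sum_{n=1}^{k} a_n = \sum_{n=1}^{k} r_n - \sum_{n=1}^{k} s_n$ would show that $\sum a_n$ does not converge, contrary to assumption.

Thus both $\sum r_n$ and $\sum s_n$ diverge, and since $r_n, s_n \ge 0$, the sequences of partial sums $\sum_{n=1}^{k} r_n$ and $\sum_{n=1}^{k} s_n$ are monotone, so they can’t be bounded.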

The idea behind Riemann’s proof is now very simple: assume you want to rearrange the $a_n$ in such a way that the resulting series converges to a given real number r ≥ 0 (the case r ≤ 0 follows upon replacing $a_n$ by $-a_n$). Here’s what you do: there is an index $N_1$ such that $B_1 = \sum_{n=1}^{N_1} r_n > r$ because $\sum r_n$ is not bounded; now you subtract terms $s_1, s_2, \ldots, s_{M_1}$ until the sum becomes smaller than r, then you add terms $r_{N_1+1}, \ldots, r_{N_2}$ until it becomes larger than r, and so on. This way you get a rearrangement $b_1 = r_1, \ldots, b_{N_1} = r_{N_1}$, $b_{N_1+1} = -s_1, \ldots, b_{N_1+M_1} = -s_{M_1}$, $b_{N_1+M_1+1} = r_{N_1+1}, \ldots, b_{N_2+M_1} = r_{N_2}$, $b_{N_2+M_1+1} = -s_{M_1+1}, \ldots$, and all you have to do is show that the corresponding series $\sum b_n$ converges to r.

What does this mean? It means that for every ε > 0 there is an index $N_0$ such that $|r - \sum_{n=1}^{N} b_n| < \varepsilon$ for all $N \ge N_0$. [DO IT].
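The greedy procedure described above can be tried on the alternating harmonic series $1 - \frac12 + \frac13 - \ldots$. This is a sketch under assumptions: the function name `riemann_rearrangement`, the target 1.5 and the number of terms are illustrative choices, the computation uses floating-point numbers, and a finite run can only suggest convergence, not prove it.

```python
from itertools import count


def riemann_rearrangement(target: float, n_terms: int) -> list[float]:
    """First n_terms of a rearrangement of 1 - 1/2 + 1/3 - ... aimed at target.

    While the running sum does not exceed the target, the next unused
    positive term is appended; otherwise the next unused negative term.
    """
    positives = (1.0 / k for k in count(1, 2))   # 1, 1/3, 1/5, ...
    negatives = (-1.0 / k for k in count(2, 2))  # -1/2, -1/4, -1/6, ...
    total = 0.0
    terms = []
    for _ in range(n_terms):
        term = next(positives) if total <= target else next(negatives)
        total += term
        terms.append(term)
    return terms


terms = riemann_rearrangement(1.5, 10000)
print(terms[:8])
print(sum(terms))
```

Every term of the original series is used at most once and in its original relative order within the positive and the negative terms; the running sum oscillates around the target with an amplitude controlled by the size of the terms currently being added.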

Summary

In this chapter, we have constructed the real numbers and proved a lot of important theorems: R is a complete ordered field, R is not countable, the existence of square roots in R and C, the theorem of Bolzano-Weierstrass, the existence of supremum and infimum of bounded sets of real numbers, and the fact that we may rearrange series if and only if they converge absolutely.

Chapter 4

Continuous Functions

In this chapter, we shall study functions f : I −→ R defined on intervals I. In general, f will denote the function and f(x) its value at x.

4.1 Continuity

Let a < b be real numbers and I = (a, b) be an open interval; a function f : I −→ R is said to be continuous at $x_0 \in I$ if for every ε > 0 there is a δ > 0 such that $|f(x) - f(x_0)| < \varepsilon$ whenever $|x - x_0| < \delta$. This means that f(x) should be close to $f(x_0)$ if x is close to $x_0$.

We say that f is continuous on I if it is continuous at every x ∈ I.

The function
\[ f : (-1, 1) \longrightarrow \mathbb{R} : x \longmapsto \begin{cases} -1 & \text{if } -1 < x < 0 \\ +1 & \text{if } 0 \le x < 1 \end{cases} \]
is not continuous at $x_0 = 0$: we have $|f(x) - f(x_0)| = |-1 - 1| = 2$ for any x ∈ (−1, 0), so e.g. for ε = 1 we can make δ as small as we want without ever achieving $|f(x) - f(x_0)| < \varepsilon$.

Here’s the only example of a continuous function for which we shall apply the definition of continuity directly:

Lemma 4.1. The functions f(x) = 1 and g(x) = x defined on any interval I are continuous on I.

Proof. Let $x_0 \in I = (a, b)$ and ε > 0 be given. We claim that $\delta = \min\{\varepsilon, \frac12|x_0 - a|, \frac12|x_0 - b|\}$ does it (the last two elements make sure that we are working entirely within I). In fact, if $|x - x_0| < \delta$, then $|f(x) - f(x_0)| = 0 < \varepsilon$ and $|g(x) - g(x_0)| = |x - x_0| < \delta \le \varepsilon$.

In general, we use the ε-δ-criterion for verifying continuity as often as we use the Cauchy criterion for checking convergence: almost never. What we’ll do is prove a couple of results on continuous functions that allow us to reduce everything to Lemma 4.1.

To this end, we have to introduce some structure on the set $F_I$ of functions f : I −→ R defined on an interval I. First, we can introduce an addition on $F_I$ by letting f + g denote the function defined by (f + g)(x) = f(x) + g(x). Similarly, (fg)(x) := f(x)g(x). With these operations, $F_I$ is a ring whose zero element is the function that vanishes identically on I, and whose unit element is the constant function f : x ↦ 1.

Proposition 4.2. Let f : I −→ R be a real-valued function on the open interval I. Then the following conditions are equivalent:

1. f is continuous at x ∈ I;

2. For any converging sequence $(x_n)$ with $x_n \in I$ and $\lim x_n = x$ we have $\lim f(x_n) = f(x)$.

Proof. Assume that f is continuous at x, and let $(x_n)$ be a sequence of elements $x_n \in I$ converging to x. We have to show that $f(x_n)$ converges to f(x). This means that, given some ε > 0, we must find an N such that $|f(x_n) - f(x)| < \varepsilon$ for all n > N.

Here’s what we know: given any ε > 0, there is a δ > 0 such that |f(ξ) − f(x)| < ε whenever |ξ − x| < δ and ξ ∈ I. This is almost what we want: all we have to do is make sure that the $x_n$ above differ by less than δ from x. But that is easy: since $x_n$ converges to x, there is an N such that $|x_n - x| < \delta$ for all n > N. For these n, we have $|f(x_n) - f(x)| < \varepsilon$ from continuity.

Now assume that the sequence $(f(x_n))$ converges to f(x) for any sequence $(x_n)$ converging to x. We have to show that f is continuous at x, that is, given ε > 0 we have to find δ > 0 such that |f(ξ) − f(x)| < ε whenever |ξ − x| < δ. We prove this by contradiction.

We know that f is continuous at x if

∀ε > 0 ∃δ > 0 ∀ξ ∈ (x − δ, x + δ) : |f(ξ) − f(x)| < ε.

Thus f is not continuous at x if (see footnote 1)

∃ε > 0 ∀δ > 0 ∃ξ ∈ (x − δ, x + δ) : |f(ξ) − f(x)| ≥ ε.

Fix an ε > 0 for which this is true. For each positive $\delta = \frac{1}{n}$ we can find a real number $\xi_n \in I$ with $|\xi_n - x| < \frac{1}{n}$ such that $|f(\xi_n) - f(x)| \ge \varepsilon$. We clearly have $\lim \xi_n = x$, hence $\lim f(\xi_n) = f(x)$ by assumption; but this contradicts the inequality $|f(\xi_n) - f(x)| \ge \varepsilon$.

1 Let P(x) be a property of x (say ‘x is even’) and let Q(x) be its negation (that is, ‘x is not even’); then the negation of ∀x ∈ N : P(x) (‘x is even for all x’) is ∃x ∈ N : Q(x) (‘there is some x such that x is not even’); similarly, the negation of ∃x ∈ N : P(x) is ∀x ∈ N : Q(x). Applying this observation repeatedly shows that the negation of ∀x ∃y : P(x, y) is ∃x ∀y : Q(x, y) etc.

If $f(x_n)$ converges to f(ξ) for every sequence $(x_n)$ in I converging to ξ, we write $f(\xi) = \lim_{x \to \xi} f(x)$. In particular, continuity at $\xi = \lim x_n$ means that $f(\lim x_n) = \lim f(x_n)$.

Theorem 4.3. The set $C^0_I$ of continuous functions on an open interval I forms a ring with respect to addition and multiplication; in fact, $C^0_I$ is a subring of $F_I$. For any $x_0 \in I$, the evaluation map $f \longmapsto f(x_0)$ is a ring homomorphism (in fact an R-homomorphism) $C^0_I \longrightarrow \mathbb{R}$.

Proof. We have to show that if $f, g \in C^0_I$, then so is f + g, in other words: if f and g are continuous functions on I, then so is f + g. Assume therefore that f and g are continuous, and let an $x_0 \in I$ and an ε > 0 be given. We have to make $|(f + g)(x) - (f + g)(x_0)|$ small; it should be clear how to proceed: there exist $\delta_f > 0$ and $\delta_g > 0$ such that $|f(x) - f(x_0)| < \varepsilon/2$ for all x ∈ I with $|x - x_0| < \delta_f$, and $|g(x) - g(x_0)| < \varepsilon/2$ for all x ∈ I with $|x - x_0| < \delta_g$. Now put $\delta = \min\{\delta_f, \delta_g\}$; then
\begin{align*}
|(f + g)(x) - (f + g)(x_0)| &= |f(x) - f(x_0) + g(x) - g(x_0)| \\
&\le |f(x) - f(x_0)| + |g(x) - g(x_0)| \\
&< \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon
\end{align*}
whenever x ∈ I and $|x - x_0| < \delta$. Thus f + g is continuous at any $x_0 \in I$.

This proof was an exercise in using the ε-δ criterion. Using limits, everything becomes much easier: Assume that f and g are continuous at $x_0$. Let $(x_n)$ be a sequence in I converging to $x_0$. Then
\begin{align*}
\lim (f + g)(x_n) = \lim [f(x_n) + g(x_n)] &= \lim f(x_n) + \lim g(x_n) \\
&= f(x_0) + g(x_0) = (f + g)(x_0).
\end{align*}

Nice, huh? The proof that fg is continuous is just as easy. Finally, commutativity, associativity and distributivity are inherited from the ring of functions on I.

Thus e.g. $f(x) = 2x^2 + 7$ is continuous on any interval I because, with g(x) = x as in Lemma 4.1, f(x) = 2g(x) · g(x) + 7 · 1 is formed by adding and multiplying functions that are continuous on I.

Proposition 4.4. Assume that f and g are functions I −→ R continuous at $x_0 \in I$. If $g(x_0) \ne 0$, then there is an interval J ⊆ I containing $x_0$ such that f/g is defined on J and continuous at $x_0$.

Proof. We want to show that if $g(x_0) \ne 0$, then actually g(x) ≠ 0 for all x ∈ J, where J is some interval containing $x_0$.

Assume that this is wrong. Then each interval $[x_0 - \frac{1}{n}, x_0 + \frac{1}{n}]$ contains a $\xi_n$ with $g(\xi_n) = 0$. Clearly $\lim \xi_n = x_0$, and since g is continuous at $x_0$, we have $g(x_0) = \lim g(\xi_n) = \lim 0 = 0$: contradiction.

Now f/g is a function J −→ R. We claim that f/g is continuous at $x_0$. Let $(x_n)$ be a sequence in J with limit $x_0$. Then $\lim f(x_n)/g(x_n) = \lim f(x_n) / \lim g(x_n) = f(x_0)/g(x_0)$ by the limit theorems and the fact that f and g are continuous at $x_0$.

Theorem 4.5. If f : I −→ R and g : J −→ R are functions such that f(I) ⊆ J, then g ◦ f is continuous if f and g are.

Proof. Let $x_0 \in I$; then $y_0 = f(x_0) \in J$. We have to prove that if f is continuous at $x_0$ and g is continuous at $y_0$, then g ◦ f is continuous at $x_0$. Let $(x_n)$ be a sequence in I converging to $x_0$. Then $(f(x_n))$ converges to $y_0 = f(x_0)$ because f is continuous at $x_0$. But now g is continuous at $y_0$, and $(f(x_n))$ is a sequence converging to $y_0$, hence $(g(f(x_n)))$ converges to $g(y_0) = g(f(x_0))$.

4.2 Properties of Continuous Functions

The results above form the formal part of the investigation of continuous functions. Now we get to more interesting theorems.

Continuous functions on open intervals need not be bounded: $f(x) = \frac{1}{x}$ is continuous on (0, 1), but clearly not bounded. Moreover, f does not assume a maximum or minimum on (0, 1). Continuous functions on closed intervals, on the other hand, have these properties:

Theorem 4.6. 2 Let f : [a, b] −→ R be a real-valued continuous function. Then f is bounded. Moreover, f assumes a maximum and a minimum on [a, b].

Proof. Assume that f is not bounded on I = [a, b]. Then for each n ∈ N there must be an $x_n \in I$ with $|f(x_n)| > n$. Bolzano-Weierstrass gives us a convergent subsequence $(\xi_n)$ of $(x_n)$; put $x = \lim \xi_n$. Since $a \le \xi_n \le b$, we must have a ≤ x ≤ b (here the proof would go wrong if we started with an open interval). Now f is continuous at x, hence $f(x) = \lim f(\xi_n)$. But the values $|f(\xi_n)|$ are unbounded, so the sequence $f(\xi_n)$ does not converge: contradiction.

Now f is bounded. This implies that the set S = {f(x) : x ∈ I} is a bounded set of real numbers. Such sets have a supremum; let M = sup S. For each n ∈ N, the number $M - \frac{1}{n}$ can’t be an upper bound of S because M is the least upper bound. Thus there must exist some $y_n \in I$ with $M - \frac{1}{n} < f(y_n) \le M$. By definition of convergence, this implies $\lim f(y_n) = M$. By Bolzano-Weierstrass (you’re beginning to see how important this result is), there is a convergent subsequence $(\eta_n)$ of $(y_n)$; put $y = \lim \eta_n$. Again, y ∈ I, and since f is continuous at y, we find $f(y) = \lim f(\eta_n)$. But $f(\eta_n)$ is a subsequence of the converging sequence $f(y_n)$, hence $\lim f(\eta_n) = \lim f(y_n) = M$. But now we just have proved that M = f(y), that is: the supremum is actually a maximum!

Replacing f by −f and applying the result we just proved, it follows that f also has a minimum because the maximum of −f is the minimum of f.

Theorem 4.7 (Intermediate Value Theorem). 3 Let f : I = [a, b] −→ R be a continuous function. If f(a) < y < f(b), then there exists an x ∈ (a, b) such that f(x) = y.

2 Ross, Thm. 18.1.
3 Ross, Thm. 18.2.

Proof. Put S = {x ∈ I : f(x) < y}; then S ≠ ∅ since a ∈ S. Since S is bounded (by a and b), it has a supremum: put ξ = sup S. Since b is an upper bound for S and ξ is the least upper bound, we have ξ ≤ b. Moreover, a ∈ S implies ξ ≥ a. Thus ξ ∈ I. Finally, we can’t have ξ = a or ξ = b since f(a) < y and f(b) > y and f is continuous at a and b.

Now for each n ∈ N, $\xi - \frac{1}{n}$ is not an upper bound for S. Therefore there exists an $s_n \in S$ with $\xi - \frac{1}{n} < s_n \le \xi$. This implies $\lim s_n = \xi$. Since $f(s_n) < y$, we have $\lim f(s_n) \le y$, and continuity gives f(ξ) ≤ y.

It remains to show that f(ξ) ≥ y. To this end, put $t_n = \min(\xi + \frac{1}{n}, b)$; then $t_n \in I$, and $(t_n)$ converges to ξ. Since $t_n > \xi$, we have $t_n \notin S$ and therefore $f(t_n) \ge y$; since f is continuous at ξ, we find $f(\xi) = \lim f(t_n) \ge y$.

A function f : I −→ R is called strictly increasing on I if for every pairx, x′ ∈ I with x < x′ we have f(x) < f(x′).

Lemma 4.8. Strictly increasing functions are injective.

Proof. Take x, x′ ∈ I and assume that x ≠ x′; we want to show that f(x) ≠ f(x′). But x ≠ x′ implies x < x′ or x′ < x; since f is strictly increasing, this implies f(x) < f(x′) or f(x′) < f(x). In either case, f(x) ≠ f(x′).

Now we can show that strictly increasing continuous functions have an inverse with respect to composition:

Theorem 4.9 (Existence of inverse function). 4 Let f : I −→ R be a strictly increasing continuous function. Then f(I) is an interval, and there exists a function $f^{-1} : f(I) \longrightarrow I$ such that $f^{-1} \circ f$ is the identity map on I. The function $f^{-1}$ is strictly increasing and continuous.

Proof. Let I = [a, b]; we claim that f(I) = [f(a), f(b)]. Note that a < b implies f(a) < f(b), so [f(a), f(b)] really is an interval. For proving the claim we have to show that if f(a) < y < f(b), then y = f(x) for some x ∈ I. But this is the intermediate value theorem.

By the lemma, f is injective, hence f : I −→ f(I) is a bijection, and we can define a function $f^{-1} : f(I) \longrightarrow I$ by writing y ∈ f(I) as y = f(x) and mapping y ↦ x.

We have to show that $f^{-1}$ is strictly increasing and continuous. Assume that y < y′ for y, y′ ∈ f(I). Write y = f(x) and y′ = f(x′); if we had $x = f^{-1}(y) \ge f^{-1}(y') = x'$, then we would also have y = f(x) ≥ f(x′) = y′: contradiction.

Finally, let us prove that $f^{-1}$ is continuous at η ∈ f(I). There are two ways to do this: we can use the limit definition or the ε-δ-definition of continuity.

First Proof. Let’s use the limit definition first. Let $(y_n)$ be a sequence in J = f(I) with $y_n \longrightarrow \eta$. We have to show that $x_n := f^{-1}(y_n)$ converges to $\xi := f^{-1}(\eta)$. Assume it doesn’t; then
\[ \exists\, \varepsilon > 0\ \forall\, N \in \mathbb{N}\ \exists\, n > N : |f^{-1}(y_n) - f^{-1}(\eta)| \ge \varepsilon. \]

4Ross, Thm. 18.4.


Pick an ε > 0 with this property; then there is an $n_1 > 1$ such that $|f^{-1}(y_{n_1}) - f^{-1}(\eta)| \ge \varepsilon$; next there is an $n_2 > n_1$ such that $|f^{-1}(y_{n_2}) - f^{-1}(\eta)| \ge \varepsilon$, and by repeating this construction we get a subsequence $(y_{n_k})$ of $(y_n)$ with the property that
\[ |f^{-1}(y_{n_k}) - f^{-1}(\eta)| \ge \varepsilon \quad\text{for all } k \in \mathbb{N}. \tag{4.1} \]
Then $(f^{-1}(y_{n_k}))$ is a sequence in I, hence bounded, so Bolzano-Weierstraß gives us a subsequence $(y_{n_{k_l}})$ such that $(f^{-1}(y_{n_{k_l}}))$ converges. Let x ∈ I denote the limit. Then evaluating f at $x = \lim f^{-1}(y_{n_{k_l}})$ gives $f(x) = f(\lim f^{-1}(y_{n_{k_l}}))$. But f is continuous, hence
\[ f(\lim f^{-1}(y_{n_{k_l}})) = \lim f(f^{-1}(y_{n_{k_l}})) = \lim y_{n_{k_l}} = \eta. \]
Thus $f^{-1}(y_{n_{k_l}})$ converges to $x = f^{-1}(\eta)$, contradicting (4.1): in fact, if all $f^{-1}(y_{n_k})$ differ from $f^{-1}(\eta)$ by at least some fixed ε > 0, then there can’t be a subsequence converging to $f^{-1}(\eta)$.

Second Proof. The second proof uses the $\varepsilon$-$\delta$ criterion. We use some facts we have already proved: that $f^{-1}$ is a strictly increasing function mapping an interval $J$ onto an interval $I$ (that is, $f^{-1}(J) = I$). The claim thus will follow once we have proved

Proposition 4.10. If $g$ is a strictly increasing function mapping an interval $J$ onto another interval $I$, then $g$ is continuous on $J$.

Proof. Write $I = [a, b]$ and $J = [c, d]$; since $g : J \to I$ is strictly increasing and onto, we actually have $a = g(c)$ and $b = g(d)$. Pick $\eta \in J$; we want to show that $g$ is continuous at $\eta$. To this end, we first assume that $\eta \ne c, d$. Since $g$ is strictly increasing, this implies $g(\eta) \ne a, b$. Thus there exists some $\varepsilon_0 > 0$ such that $[g(\eta) - \varepsilon_0, g(\eta) + \varepsilon_0] \subseteq I$.

Now let $\varepsilon > 0$; we have to find a $\delta > 0$ such that $|g(y) - g(\eta)| < \varepsilon$ for $|y - \eta| < \delta$; it is clearly sufficient to prove this for small $\varepsilon$, so we are free to assume $\varepsilon < \varepsilon_0$.

Since $g$ maps $J$ onto $I$, there exist $y_1, y_2 \in J$ such that $g(y_1) = g(\eta) - \varepsilon$ and $g(y_2) = g(\eta) + \varepsilon$. Since $g$ is strictly increasing, we have $y_1 < \eta < y_2$. Now whenever $y_1 < y < y_2$, we have $g(y_1) < g(y) < g(y_2)$, hence

$$g(\eta) - \varepsilon = g(y_1) < g(y) < g(y_2) = g(\eta) + \varepsilon.$$

But this implies $|g(y) - g(\eta)| < \varepsilon$. Thus we have proved the inequality we wanted for all $y$ such that $y_1 < y < y_2$. Now put $\delta = \min\{\eta - y_1, y_2 - \eta\} > 0$. Then $|y - \eta| < \delta$ implies $y - \eta < y_2 - \eta$, so $y < y_2$; similarly, $\eta - y < \eta - y_1$ gives $y > y_1$. Thus $|y - \eta| < \delta$ implies $y_1 < y < y_2$, which in turn implies $|g(y) - g(\eta)| < \varepsilon$. The endpoints $\eta = c$ and $\eta = d$ are treated in the same way, using one-sided neighbourhoods.
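The existence part of Theorem 4.9 rests on the intermediate value theorem, and the usual bisection argument behind it can be turned into a small numerical procedure. The following is only a sketch under stated assumptions: the function name `inverse_by_bisection`, the tolerance `tol`, and the sample function `cube_plus_x` are illustrative choices that do not appear in the notes.

```python
def inverse_by_bisection(f, a, b, y, tol=1e-12):
    """Approximate f^{-1}(y) for f strictly increasing and continuous on [a, b]."""
    if not (f(a) <= y <= f(b)):
        raise ValueError("y must lie in [f(a), f(b)]")
    lo, hi = a, b
    # Invariant: f(lo) <= y <= f(hi); the interval is halved on every pass.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def cube_plus_x(x):
    # Hypothetical sample function: strictly increasing and continuous on [0, 2].
    return x**3 + x


root = inverse_by_bisection(cube_plus_x, 0.0, 2.0, 2.0)   # f(1) = 2
smaller = inverse_by_bisection(cube_plus_x, 0.0, 2.0, 1.0)
```

Since `cube_plus_x(1) = 2`, the value `root` should be close to 1, and `smaller < root` reflects the fact that the inverse is again strictly increasing.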


4.3 Uniform Continuity

Consider the function $f : \mathbb{R} \to \mathbb{R} : x \mapsto x^2$. Let us look at the proof that $f$ is continuous. Given $\xi \in \mathbb{R}$, we have to find for every $\varepsilon > 0$ a $\delta > 0$ such that $|x - \xi| < \delta$ implies $|f(x) - f(\xi)| < \varepsilon$. Now $f(x) - f(\xi) = x^2 - \xi^2 = (x - \xi)(x + \xi)$; if we assume that $|x - \xi| < \delta$, then we can bound $|x + \xi|$ by $|x + \xi| = |x - \xi + 2\xi| \le |x - \xi| + 2|\xi| < \delta + 2|\xi|$, and we get $|f(x) - f(\xi)| < \delta(\delta + 2|\xi|)$.

Can we make the right hand side small by choosing $\delta$ small enough? Certainly, as long as $\xi$ is fixed, because taking

$$\delta = \min\Big\{1, \frac{\varepsilon}{1 + 2|\xi|}\Big\}$$

gives $\delta + 2|\xi| \le 1 + 2|\xi|$ and

$$|f(x) - f(\xi)| < \delta(\delta + 2|\xi|) \le \varepsilon$$

as desired.

Note, however, that our choice of $\delta$ depends on $\xi$: we don't have a uniform choice valid for all $\xi$, and the larger $|\xi|$, the smaller we have to make our $\delta$.

If $f : I \to \mathbb{R}$ is a function for which such a uniform choice of $\delta$ is possible, then we say that $f$ is uniformly continuous on $I$. In other words: $f$ is uniformly continuous on $I$ if

∀ ε > 0 ∃ δ > 0 ∀ x, ξ ∈ I, |x− ξ| < δ : |f(x)− f(ξ)| < ε.

Compare this with the definition of ‘regular’ continuity on $I$: $f$ is continuous on $I$ if

∀ ξ ∈ I ∀ ε > 0 ∃ δ > 0 ∀ x ∈ I, |x− ξ| < δ : |f(x)− f(ξ)| < ε.
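To see the dependence of $\delta$ on $\xi$ concretely, one can evaluate the choice $\delta = \min\{1, \varepsilon/(1 + 2|\xi|)\}$ from the discussion of $x \mapsto x^2$ at a few points. This is a small illustrative sketch; the helper names `delta_for_square` and `violates` and the sample values are assumptions made here, not part of the notes.

```python
def delta_for_square(xi, eps):
    """The delta used in the text for f(x) = x^2: min(1, eps / (1 + 2|xi|))."""
    return min(1.0, eps / (1.0 + 2.0 * abs(xi)))


def violates(xi, delta, eps):
    """Test the single point x = xi + delta/2, which satisfies |x - xi| < delta."""
    x = xi + delta / 2
    return abs(x * x - xi * xi) >= eps


eps = 0.1
points = (0.0, 1.0, 10.0, 100.0)
deltas = [delta_for_square(xi, eps) for xi in points]

# Each delta works at its own point xi ...
ok_locally = [not violates(xi, d, eps) for xi, d in zip(points, deltas)]
# ... but the delta chosen for xi = 0 fails at xi = 100.
fails_far_out = violates(100.0, deltas[0], eps)
```

The list `deltas` shrinks as $|\xi|$ grows, which is exactly the lack of uniformity on all of $\mathbb{R}$ described above.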

The following result is important for defining the Riemann integral (it is also important for showing that functions like exp, sin, cos, log defined by power series are continuous):

Theorem 4.11. If $f : I \to \mathbb{R}$ is continuous on the closed interval $I = [a, b]$, then $f$ is uniformly continuous on $I$.

Proof. Assume not. Then

∃ ε > 0 ∀ δ > 0 ∃ x, ξ ∈ I, |x− ξ| < δ : |f(x)− f(ξ)| ≥ ε.

Pick such an $\varepsilon > 0$; then for each $\delta = \frac{1}{n}$ (you see we are constructing sequences; expect to see Bolzano-Weierstrass soon) there exist $x_n, \xi_n \in I$ such that $|x_n - \xi_n| < \frac{1}{n}$ and

$$|f(x_n) - f(\xi_n)| \ge \varepsilon. \tag{4.2}$$

By Bolzano-Weierstrass there is a convergent subsequence $(x_{n_k})$ of $(x_n)$. Since $a \le x_{n_k} \le b$, we have $a \le \lim x_{n_k} \le b$ (here we use the fact that $I$ is closed).

Since $(x_n - \xi_n)$ is a null sequence, we have $\lim x_{n_k} = \lim \xi_{n_k} =: z$. Using the fact that $f$ is continuous we get

$$f(z) = \lim f(x_{n_k}), \qquad f(z) = \lim f(\xi_{n_k}).$$

Thus $f(x_{n_k}) - f(\xi_{n_k})$ converges to 0, contradicting (4.2).


Chapter 5

The Riemann Integral

Consider a function $f : I \to \mathbb{R}$ on a closed interval $I = [a, b]$. The integral of $f$ defines the ‘area’ between the graph of the function and the x-axis. There are several ways of defining integrals: the most powerful among the simple definitions is that of the Riemann integral, and even for it there is a variety of almost equivalent definitions. There are integrals that are more powerful than Riemann's, in particular the Lebesgue integral (which, in some sense, can be shown to be the most powerful); the Lebesgue integral is more powerful than Riemann's because every function that can be integrated by a Riemann integral can also be integrated by Lebesgue's, but not conversely: Dirichlet's function

$$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational,} \\ 0 & \text{if } x \text{ is irrational} \end{cases}$$

has Lebesgue integral 0 on the interval $I = [0, 1]$, but is not Riemann integrable.

In any case, all definitions of integrals yield the same results for the class of continuous functions, so unless you are interested in functions with lots of discontinuities, the Riemann integral is fine.

5.1 Riemann Sums

Here we want to motivate the definition of the integral of a function $f : [a, b] \to \mathbb{R}$. We do this by writing down a couple of properties such an integral should have if it is to give us the area beneath the graph of a function.

No matter how the area is defined, if we add the area below $f$ on $[a, b]$ and the area on $[b, c]$ we should get the area on $[a, c]$: so what we want is

[INT1]  $\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx.$

We also want the integral of the function $k \cdot f(x)$ to be $k$ times the integral of $f(x)$:

[INT2]  $\int_a^b k \cdot f(x)\,dx = k \int_a^b f(x)\,dx.$


Finally, put $m_I(f) = \inf\{f(x) : x \in I\}$ and $M_I(f) = \sup\{f(x) : x \in I\}$. Then we want the area of $f$ on $I = [a, b]$ to be bounded by $(b - a)\,m_I(f)$ from below and by $(b - a)\,M_I(f)$ from above:

[INT3]  $(b - a)\,m_I(f) \le \int_a^b f(x)\,dx \le (b - a)\,M_I(f).$

The last condition determines the integral of constant functions: if $f(x) = c$ on $I = [a, b]$, then $m_I(f) = M_I(f) = c$, hence [INT3] implies $\int_a^b f(x)\,dx = (b - a)c$.

There is of course no guarantee that an integral with these three properties exists. As a matter of fact, however, both the Riemann and the Lebesgue integral have these properties, so we can't expect that these properties determine the integral.

The key to the definition of the Riemann integral is the observation that [INT3] gives good approximations if the interval $I$ is small. How can you improve the approximation if $I$ is large? Well, you might try to divide the interval into smaller parts: if $I = [a, b]$, look at ‘partitions’ $P = \{a = t_0 < t_1 < t_2 < \ldots < t_n = b\}$; by [INT1], our integral should satisfy

$$\int_a^b f(x)\,dx = \int_{t_0}^{t_1} f(x)\,dx + \int_{t_1}^{t_2} f(x)\,dx + \ldots + \int_{t_{n-1}}^{t_n} f(x)\,dx.$$

Now if we put $m_i(f) = \inf\{f(x) : t_i \le x \le t_{i+1}\}$ (sometimes we shall also write $m(f, [t_i, t_{i+1}])$ for $m_i(f)$) and define $M_i(f)$ accordingly, with sup in place of inf, then [INT3] gives us the bounds

$$\begin{aligned}
L(f, P) &= (t_1 - t_0)\,m_0(f) + (t_2 - t_1)\,m_1(f) + \ldots + (t_n - t_{n-1})\,m_{n-1}(f) \\
&\le \int_a^b f(x)\,dx \\
&\le (t_1 - t_0)\,M_0(f) + \ldots + (t_n - t_{n-1})\,M_{n-1}(f) = U(f, P),
\end{aligned}$$

and if the intervals $[t_i, t_{i+1}]$ are small, we expect that these bounds are much better than those you get by applying [INT3] directly to $f$ and $[a, b]$.

But if we get better and better approximations by making our partitions finer and finer, then we would expect that the ‘true’ value of the integral is something like the ‘limit’ of these approximations $L(f, P)$ and $U(f, P)$, at least in the case when both lower and upper bounds tend to the same limit. Actually, we should not talk about limits here because partitions do not form a sequence: in fact, even the partitions of $[a, b]$ consisting of three points $\{a, t_1, b\}$ are not countable because $t_1$ can be any real number in $(a, b)$, and the real numbers in a nonempty open interval can't be counted!

We can, however, construct sequences of partitions by looking only at partitions that are refinements of one another. Let us call a partition $Q = \{s_0, s_1, \ldots, s_n\}$ of an interval $[a, b]$ finer than a partition $P = \{t_0, t_1, \ldots, t_m\}$ of $[a, b]$ if $P \subseteq Q$ (as sets).

Lemma 5.1. If $f : I \to \mathbb{R}$ is a bounded function, and if $P$ and $Q$ are partitions of $I$ with $P \subseteq Q$, then $L(f, P) \le L(f, Q) \le U(f, Q) \le U(f, P)$.


Proof. The partition $Q$ is finer than $P$; this means that we can get $Q$ from $P$ by ‘adding’ some points to $P$. Using induction it is therefore sufficient to prove the claim for

$$P = \{a = t_0 < t_1 < \ldots < t_k < t_{k+1} < \ldots < t_n = b\}$$

and

$$Q = \{a = t_0 < t_1 < \ldots < t_k < u < t_{k+1} < \ldots < t_n = b\}.$$

The sums $L(f, P)$ and $L(f, Q)$ don't differ much; in fact, if we form the difference, everything cancels except

$$L(f, Q) - L(f, P) = m(f, [t_k, u])(u - t_k) + m(f, [u, t_{k+1}])(t_{k+1} - u) - m(f, [t_k, t_{k+1}])(t_{k+1} - t_k).$$

We have to show that this is nonnegative. To this end, observe that for sets $\emptyset \ne S \subseteq T$ of real numbers, we have $\inf T \le \inf S \le \sup S \le \sup T$ (assuming these exist, e.g. in the case where $T$ (and hence $S$) is bounded): in fact, since $S \subseteq T$, every lower bound for $T$ is a lower bound for $S$, in particular the greatest lower bound for $T$ is a lower bound for $S$, and so $\inf T \le \inf S$.

Applying this to the values of $f$ on $[t_k, u]$ and on $[u, t_{k+1}]$, which form subsets of the values of $f$ on $[t_k, t_{k+1}]$, shows that

$$\begin{aligned}
m(f, [t_k, t_{k+1}])(t_{k+1} - t_k) &= m(f, [t_k, t_{k+1}])\{(t_{k+1} - u) + (u - t_k)\} \\
&\le m(f, [u, t_{k+1}])(t_{k+1} - u) + m(f, [t_k, u])(u - t_k).
\end{aligned}$$

This takes care of $L(f, P) \le L(f, Q)$. The inequality $U(f, Q) \le U(f, P)$ can be proved similarly, or, if you want to, by reducing it to the above: in fact, $\sup\{-f(x) : x \in I\} = -\inf\{f(x) : x \in I\}$ implies that $L(-f, P) = -U(f, P)$, hence the inequality $L(-f, P) \le L(-f, Q)$, which we have just proved for every bounded function, is equivalent to $U(f, Q) \le U(f, P)$.

The middle inequality finally is obvious in view of $\inf S \le \sup S$ for bounded nonempty sets $S$.
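The chain of inequalities in Lemma 5.1 can be checked on a small example. The sketch below assumes an increasing function, so that the infimum and supremum on each subinterval are attained at the left and right endpoints; this restriction, the function name `lower_upper_sums`, and the sample partitions are illustrative assumptions made to keep the arithmetic exact.

```python
from fractions import Fraction


def lower_upper_sums(f, partition):
    """Return (L(f, P), U(f, P)) for an increasing function f.

    For increasing f the infimum on [left, right] is f(left) and the
    supremum is f(right), so both sums can be computed exactly.
    """
    lower = upper = Fraction(0)
    for left, right in zip(partition, partition[1:]):
        width = right - left
        lower += f(left) * width
        upper += f(right) * width
    return lower, upper


def square(x):
    # Increasing on [0, 1].
    return x * x


P = [Fraction(0), Fraction(1, 2), Fraction(1)]
# Q is finer than P: it contains all points of P and two more.
Q = sorted(set(P) | {Fraction(1, 4), Fraction(3, 4)})

LP, UP = lower_upper_sums(square, P)
LQ, UQ = lower_upper_sums(square, Q)
```

Here `LP <= LQ <= UQ <= UP` holds, in line with the lemma: refining the partition raises the lower sum and lowers the upper sum.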

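Definition. Let $f : I = [a, b] \to \mathbb{R}$ be a bounded function (we need boundedness to ensure that the suprema and infima involved in the definition of $L(f, P)$ and $U(f, P)$ exist; of course continuous functions on closed intervals are bounded, but we want to define our integral for functions that are not necessarily continuous). Put $L(f) = \sup_P L(f, P)$ and $U(f) = \inf_P U(f, P)$, where $P$ runs through all partitions of $I$. Then $f$ is called (Riemann) integrable if $L(f) = U(f)$; in this case the common value is denoted by $\int_a^b f(x)\,dx$.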

Example 1. Constant functions are integrable. In fact, if $f(x) = c$, then we expect the integral of $f$ over $I = [a, b]$ to be the area of the rectangle with vertices $(a, 0)$, $(b, 0)$, $(b, c)$ and $(a, c)$, namely $A = c(b - a)$. And this is what we get: take any partition $P = \{a = t_0 < t_1 < \ldots < t_k = b\}$; then $m_i(f) = \inf\{f(x) : t_i \le x \le t_{i+1}\} = c$ since $f$ is constant, and similarly $M_i(f) = c$, so

$$L(f, P) = \sum_{i=1}^{k} c(t_i - t_{i-1}) = c(t_1 - t_0) + c(t_2 - t_1) + \ldots + c(t_k - t_{k-1}) = c(t_k - t_0) = c(b - a).$$

Similarly, $U(f, P) = c(b - a)$ for any partition, hence $L(f)$ and $U(f)$ exist and both equal $c(b - a)$. This proves that $\int_a^b f(x)\,dx = c(b - a)$ for $f(x) = c$.

Example 2. Consider $f(x) = x$. Here $m(f, [t_i, t_{i+1}]) = t_i$ and $M(f, [t_i, t_{i+1}]) = t_{i+1}$. Let us use the partitions $P_k$ defined by $t_i = a + \frac{i}{k}(b - a)$; then $t_0 = a$, $t_1 = a + \frac{b-a}{k}$, ..., $t_k = a + (b - a) = b$, hence $t_i - t_{i-1} = \frac{b-a}{k}$, and

$$\begin{aligned}
U(f, P_k) &= \sum_{i=1}^{k} t_i (t_i - t_{i-1}) = \sum_{i=1}^{k} \Big(a + \frac{i}{k}(b - a)\Big) \frac{b - a}{k} \\
&= \frac{b - a}{k} \Big( \sum_{i=1}^{k} a + \frac{b - a}{k} \sum_{i=1}^{k} i \Big) = \frac{b - a}{k} \cdot ka + \frac{(b - a)^2}{k^2} \cdot \frac{k(k + 1)}{2} \\
&= ba - a^2 + \tfrac{1}{2}(b - a)^2 + \tfrac{1}{2k}(b - a)^2 = \tfrac{1}{2}(b^2 - a^2) + \tfrac{1}{2k}(b - a)^2.
\end{aligned}$$

Clearly $\lim U(f, P_k) = \frac{1}{2}(b^2 - a^2)$. In particular, $U(f) \le \frac{1}{2}(b^2 - a^2)$. The proof that $\lim L(f, P_k)$ exists and has the same value is left as an exercise. What we may conclude from the above is that $L(f) \ge \frac{1}{2}(b^2 - a^2)$ and $U(f) \le \frac{1}{2}(b^2 - a^2)$, in particular $L(f) \ge U(f)$. This does, however, not yet prove that $f(x) = x$ is integrable because we have only considered a very special type of partitions; for proving that the integral of $f$ exists we have to look at all partitions.
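The closed form for $U(f, P_k)$ in Example 2 can be double-checked with exact rational arithmetic. This is a sketch only; the function names `upper_sum_identity` and `closed_form` and the sample endpoints are choices made here for illustration.

```python
from fractions import Fraction


def upper_sum_identity(a, b, k):
    """U(f, P_k) for f(x) = x on the partition t_i = a + i(b - a)/k."""
    a, b = Fraction(a), Fraction(b)
    t = [a + Fraction(i, k) * (b - a) for i in range(k + 1)]
    # On [t_{i-1}, t_i] the supremum of f(x) = x is t_i.
    return sum(t[i] * (t[i] - t[i - 1]) for i in range(1, k + 1))


def closed_form(a, b, k):
    """The value (b^2 - a^2)/2 + (b - a)^2 / (2k) derived in Example 2."""
    a, b = Fraction(a), Fraction(b)
    return (b * b - a * a) / 2 + (b - a) ** 2 / (2 * k)


agreement = all(
    upper_sum_identity(a, b, k) == closed_form(a, b, k)
    for (a, b) in [(0, 1), (-1, 3)]
    for k in (1, 2, 5, 100)
)
```

For instance, on $[0, 1]$ with $k = 4$ both expressions give $5/8$, and the excess over $\frac{1}{2}(b^2 - a^2)$ is $(b - a)^2/(2k)$, which tends to 0 as $k$ grows.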

Since the number of partitions is not countable, this task is hopeless without the help of a theorem that does this for us. What we would like to have is a theorem telling us that $L(f) \le U(f)$. But isn't this obvious? After all, we have $L(f, P) \le U(f, P)$ for every partition $P$.

Actually, what might happen is that there exist partitions $P$ and $Q$ such that $L(f, P) > U(f, Q)$: this is not excluded by the inequality above, and if that happened, $L(f) = \sup L(f, P)$ would be larger than $U(f) = \inf U(f, P)$. Fortunately, this doesn't happen. For a proof, we will need to compare $L(f, P)$ and $U(f, Q)$ for different partitions, and this is done by constructing a partition containing both. But this is easy: if $P$ and $Q$ are partitions, then so is $P \cup Q$, and since $P \subseteq P \cup Q$ and $Q \subseteq P \cup Q$, this partition is a common refinement of $P$ and $Q$.

The following result is exactly what we need:

Proposition 5.2. If $f : I \to \mathbb{R}$ is a bounded function on $I = [a, b]$, then $L(f) \le U(f)$.

In fact, if we know that $L(f) \le U(f)$ from the proposition and $L(f) \ge U(f)$ by our calculation, then we can conclude that $L(f) = U(f)$, hence $f(x) = x$ is integrable on $[a, b]$, and $\int_a^b f(x)\,dx = \frac{1}{2}(b^2 - a^2)$.

Lemma 5.3. If $f : I \to \mathbb{R}$ is a bounded function on $I = [a, b]$, and if $P$ and $Q$ are partitions of $I$, then $L(f, P) \le U(f, Q)$.


Proof. Applying Lemma 5.1 to P , P ∪Q and Q we get

L(f, P ) ≤ L(f, P ∪Q) ≤ U(f, P ∪Q) ≤ U(f,Q).

Proof of Prop. 5.2. Let $P$ be a partition of $I$. For any partition $Q$ of $I$, we have $L(f, P) \le U(f, Q)$ by Lemma 5.3. Thus $L(f, P)$ is a lower bound for the set of all $U(f, Q)$, hence the greatest lower bound $U(f)$ of this set can't be smaller: $L(f, P) \le U(f)$.

This inequality now is true for any partition $P$; thus $U(f)$ is an upper bound for the set of all $L(f, P)$, which in turn means that the least upper bound $L(f)$ can't be larger than $U(f)$, i.e., $L(f) \le U(f)$.

This concludes the proof of $\int_a^b x\,dx = \frac{1}{2}(b^2 - a^2)$.

Next we prove a criterion for integrability that will come in handy in proofs:

Proposition 5.4. A bounded function $f : I \to \mathbb{R}$ is integrable if and only if for every $\varepsilon > 0$ there exists a partition $P = \{a = t_0 < t_1 < t_2 < \ldots < t_n = b\}$ of $I = [a, b]$ such that $U(f, P) - L(f, P) < \varepsilon$.

Proof. Note that we don't need absolute values around $U(f, P) - L(f, P)$ since we know that this term is nonnegative.

Assume first that $f$ has a Riemann integral on $I$, and let $\varepsilon > 0$ be given. Since $L(f) - \frac{\varepsilon}{2}$ is not an upper bound for the set of $L(f, P)$, there must be a partition $P_1$ of $I$ such that $L(f, P_1) > L(f) - \frac{\varepsilon}{2}$. Similarly, there is a partition $P_2$ such that $U(f, P_2) < U(f) + \frac{\varepsilon}{2}$.

Now put $P = P_1 \cup P_2$. Since $P$ is finer than both $P_1$ and $P_2$, applying Lemma 5.1 gives

$$\begin{aligned}
U(f, P) - L(f, P) &\le U(f, P_2) - L(f, P_1) \\
&< U(f) + \frac{\varepsilon}{2} - L(f) + \frac{\varepsilon}{2} = U(f) - L(f) + \varepsilon,
\end{aligned}$$

but since $f$ is integrable, we have $L(f) = U(f)$. This implies the claim $U(f, P) - L(f, P) < \varepsilon$.

Now suppose that for each $\varepsilon > 0$ there is a partition $P$ of $I$ with $U(f, P) - L(f, P) < \varepsilon$. Then

$$U(f) \le U(f, P) = U(f, P) - L(f, P) + L(f, P) < \varepsilon + L(f, P) \le \varepsilon + L(f).$$

Since this is true for any $\varepsilon > 0$, we must have $U(f) \le L(f)$. But $L(f) \le U(f)$ by Prop. 5.2, and our claim follows.

5.2 Main Properties of Riemann Integrals

The main theorems we shall prove about the Riemann integral are that every continuous function is integrable, and that the Riemann integral has the properties [INT1] - [INT3] (as well as a few more).

Let us start with


Proposition 5.5. Integrable functions I −→ R form a ring.

Proof. Assume that $f$ and $g$ are integrable. Given $\varepsilon > 0$, we have to find a partition $P$ of $[a, b]$ such that $U(f + g, P) - L(f + g, P) < \varepsilon$.

Now we know that $f$ is integrable, so there is a partition $P_1$ of $[a, b]$ such that $U(f, P_1) - L(f, P_1) < \varepsilon/2$; similarly, there is a partition $P_2$ of $[a, b]$ such that $U(g, P_2) - L(g, P_2) < \varepsilon/2$. Now consider $P = P_1 \cup P_2$. Then

$$U(f, P) - L(f, P) \le U(f, P_1) - L(f, P_1) < \frac{\varepsilon}{2},$$

$$U(g, P) - L(g, P) \le U(g, P_2) - L(g, P_2) < \frac{\varepsilon}{2}.$$

Now for any subinterval $J$ of $I$ we have (Exercise!)

$$\inf\{f(x) + g(x) : x \in J\} \ge \inf\{f(x) : x \in J\} + \inf\{g(x) : x \in J\},$$

hence $m_J(f + g) \ge m_J(f) + m_J(g)$, which in turn implies $L(f + g, P) \ge L(f, P) + L(g, P)$. Similarly, we can prove $U(f + g, P) \le U(f, P) + U(g, P)$, hence

$$U(f + g, P) - L(f + g, P) \le U(f, P) + U(g, P) - L(f, P) - L(g, P) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

Thus $f + g$ is Riemann integrable. Moreover,

$$\begin{aligned}
\int_a^b (f + g)(x)\,dx = U(f + g) &\le U(f + g, P) \le U(f, P) + U(g, P) \\
&< L(f, P) + L(g, P) + \varepsilon \le L(f) + L(g) + \varepsilon \\
&= \int_a^b f(x)\,dx + \int_a^b g(x)\,dx + \varepsilon
\end{aligned}$$

and

$$\begin{aligned}
\int_a^b (f + g)(x)\,dx = L(f + g) &\ge L(f + g, P) \ge L(f, P) + L(g, P) \\
&> U(f, P) + U(g, P) - \varepsilon \ge U(f) + U(g) - \varepsilon \\
&= \int_a^b f(x)\,dx + \int_a^b g(x)\,dx - \varepsilon,
\end{aligned}$$

so, since $\varepsilon > 0$ was arbitrary, we conclude that $\int_a^b (f + g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.

Since constant functions are Riemann integrable, the neutral element with respect to addition, the zero function, is integrable. Moreover, if $f$ is Riemann integrable, then so is $-f$. In fact, we have

$$m_J(-f) = \inf\{-f(x) : x \in J\} = -\sup\{f(x) : x \in J\} = -M_J(f),$$

hence $L(-f, P) = -U(f, P)$ and $L(-f) = -U(f)$. Similarly, $U(-f) = -L(f)$, hence $L(f) = U(f)$ implies $L(-f) = U(-f)$.

Now let us show that if $f$ and $g$ have Riemann integrals, then so does their product $fg$. It is sufficient to prove this for $g = f$: in fact, if we know that $f^2$ is integrable as long as $f$ is, then $4fg = (f + g)^2 - (f - g)^2$ is integrable since it is a sum (difference) of integrable functions.

Now consider $U(f^2, P)$ and $L(f^2, P)$ for partitions $P$ of $I$. We know that

$$f(x)^2 - f(y)^2 = [f(x) + f(y)][f(x) - f(y)].$$

Our biggest source of integrable functions are the continuous functions:

Theorem 5.6. Every continuous function $f : I = [a, b] \to \mathbb{R}$ is Riemann integrable.

Proof. We want to apply Prop. 5.4, so let an $\varepsilon > 0$ be given. We have to construct a partition $P$ of $I$ such that $U(f, P) - L(f, P) < \varepsilon$. How can we do this? We have to use the fact that $f$ is continuous somehow. Continuous at $x_0$ means that we can find a $\delta > 0$ such that $|f(x) - f(x_0)| < \varepsilon$ whenever $|x - x_0| < \delta$. Thus we have found a $\delta$-interval around $x_0$, but this is not exactly what we want; what we'd like to have is a whole lot of $\delta$-intervals that cover $I = [a, b]$ because then we would have a partition of $I$. But as long as we exploit the usual continuity, our $\delta$ will depend on the choice of $x_0$. Fortunately, continuous functions on closed intervals $I$ are uniformly continuous, so there is a $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon/(b - a)$ whenever $x, y \in I$ and $|x - y| < \delta$. Now we choose a partition $P = \{a = t_0 < t_1 < \ldots < t_n = b\}$ such that the lengths of the intervals $[t_k, t_{k+1}]$ are bounded by $\delta$, i.e., such that $t_{k+1} - t_k < \delta$ for $k = 0, 1, \ldots, n - 1$.

On each $I_k = [t_k, t_{k+1}]$, the function $f$ assumes a maximum $M_k$ (at $x_k \in I_k$, say) and a minimum $m_k$ (at $y_k \in I_k$), and then $M_k - m_k = f(x_k) - f(y_k) < \varepsilon/(b - a)$ since $|x_k - y_k| < \delta$. But then

$$U(f, P) - L(f, P) = \sum_{k=0}^{n-1} (M_k - m_k)(t_{k+1} - t_k) < \sum_{k=0}^{n-1} \frac{\varepsilon}{b - a}(t_{k+1} - t_k) = \frac{\varepsilon}{b - a}(t_n - t_0) = \varepsilon,$$

and therefore $f$ is Riemann integrable.

Proposition 5.7. If $f$ is integrable, then so is $|f|$, and we have $\Big|\int_a^b f(x)\,dx\Big| \le \int_a^b |f(x)|\,dx$.

Proof.

Proposition 5.8. If $f : I = [a, b] \to \mathbb{R}$ is integrable, then $F(x) := \int_a^x f(t)\,dt$ defines a continuous function $I \to \mathbb{R}$.

Proof. We have to show that, given $x_0 \in I$ and $\varepsilon > 0$, there is a $\delta > 0$ such that $|F(x) - F(x_0)| < \varepsilon$ whenever $|x - x_0| < \delta$. We clearly have $F(x) - F(x_0) = \int_{x_0}^{x} f(t)\,dt$; assume that $x > x_0$. Since $f$ is bounded, we have $|f(t)| \le M$ for $t \in I$ (with some $M > 0$), hence $\Big|\int_{x_0}^{x} f(t)\,dt\Big| \le \int_{x_0}^{x} |f(t)|\,dt \le M|x - x_0|$. So if we pick $\delta = \varepsilon/M$, then $|x - x_0| < \delta$ implies $|F(x) - F(x_0)| < \varepsilon$. The case $x < x_0$ is treated similarly.


Proposition 5.9. If $f, g : I \to \mathbb{R}$ are integrable, and if $f(x) \le g(x)$ for all $x \in I$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.

Proof.

Theorem 5.10 (Intermediate Value Theorem for Integrals). If $f : I \to \mathbb{R}$ is a continuous function on $I = [a, b]$, then there is an $x \in I$ such that

$$f(x) = \frac{1}{b - a} \int_a^b f(t)\,dt.$$

Proof. Since continuous functions on closed intervals assume a maximum $M$ and a minimum $m$, Proposition 5.9 gives us the bounds

$$m(b - a) \le \int_a^b f(t)\,dt \le M(b - a).$$

Dividing through by $b - a > 0$ shows that $\frac{1}{b - a} \int_a^b f(t)\,dt$ lies between $m$ and $M$; by the intermediate value theorem, $f$ assumes this value somewhere in $[a, b]$.


Chapter 6

Differentiable Functions

6.1 Derivatives

Just as we used two definitions of continuity, there are very good reasons for using two definitions of differentiable functions. The most natural one seems to be the following: a function $f : I \to \mathbb{R}$ defined on an open interval is said to be differentiable at $a \in I$ if the limit

$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$$

exists; in this case, we denote it by $f'(a)$ and call it the derivative of $f$ at $a$.

A simple sketch shows that $f'(a)$ can be interpreted as the slope of a tangent to $f$ at $x = a$.

The derivative of the function $f(x) = x^k$ is computed easily (assuming it exists): we consider the sequence $x_n = a + \frac{1}{n}$ with $x_n \to a$; then

$$x_n^k - a^k = (x_n - a)(x_n^{k-1} + a x_n^{k-2} + a^2 x_n^{k-3} + \ldots + a^{k-2} x_n + a^{k-1}),$$

hence

$$\lim_{n \to \infty} \frac{f(x_n) - f(a)}{x_n - a} = \lim_{n \to \infty} (x_n^{k-1} + a x_n^{k-2} + \ldots + a^{k-1}) = k a^{k-1}.$$

Note that for proving $f'(a) = k a^{k-1}$ we would have to consider every sequence $x_n \to a$, not only the one we have looked at.
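The factorization of $x_n^k - a^k$ and the resulting limit can be checked numerically along the particular sequence $x_n = a + 1/n$. This is only an illustration for one sequence (as the text stresses, a proof needs all sequences); the function names `difference_quotient` and `factor_sum` are assumptions introduced here.

```python
from fractions import Fraction


def difference_quotient(k, a, n):
    """(x_n^k - a^k) / (x_n - a) for x_n = a + 1/n, computed exactly."""
    a = Fraction(a)
    x_n = a + Fraction(1, n)
    return (x_n**k - a**k) / (x_n - a)


def factor_sum(k, a, n):
    """x_n^(k-1) + a x_n^(k-2) + ... + a^(k-1), the second factor in the text."""
    a = Fraction(a)
    x_n = a + Fraction(1, n)
    return sum(x_n ** (k - 1 - j) * a**j for j in range(k))


# For k = 3 and a = 2 the expected derivative is k * a^(k-1) = 12.
same = difference_quotient(3, 2, 10) == factor_sum(3, 2, 10)
gap = abs(difference_quotient(3, 2, 10**6) - 12)
```

With `n = 10**6` the difference quotient differs from 12 by roughly $6 \cdot 10^{-6}$, consistent with the limit $k a^{k-1}$.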

As for continuity, there are two definitions of differentiability; as a rule, the following is easier to apply in proofs:

Proposition 6.1. Let $f : I \to \mathbb{R}$ be a function defined on an open interval $I$. Then the following statements are equivalent:

1. $f$ is differentiable at $a \in I$;

2. $f(x) = f(a) + (x - a)(f'(a) + u(x))$ for some real number, denoted by $f'(a)$, and some function $u : I \to \mathbb{R}$ such that $\lim_{x \to a} u(x) = 0$.

Proof. Assume that $f'(a)$ exists; then $u(x) = \frac{f(x) - f(a)}{x - a} - f'(a)$ is a function defined on $I \setminus \{a\}$ such that $\lim_{x \to a} u(x) = 0$ (and we may put $u(a) = 0$).

Conversely, assume that $u$ has the properties in 2. Then $u(x) = \frac{f(x) - f(a)}{x - a} - f'(a)$ for $x \ne a$, and since the left hand side has a limit as $x \to a$, so does the right hand side. This implies that $f$ has a derivative at $x = a$ with value $f'(a)$.

Corollary 6.2. If $f$ is differentiable at $a$, then it is continuous at $a$.

Proof. If $f$ is differentiable at $a$, then

$$f(x) - f(a) = (x - a)(f'(a) + u(x))$$

for some function $u : I \to \mathbb{R}$ such that $\lim_{x \to a} u(x) = 0$. Thus if $(x_n)$ is any sequence converging to $a$, then $f(x_n) - f(a)$ converges to 0.

Thus the set of functions differentiable at $a$ is a subset of the ring of functions continuous at $a$. As a matter of fact, it is a subring:

Proposition 6.3. Let $I$ be an open interval; the functions $f : I \to \mathbb{R}$ differentiable at $a \in I$ form a subring of the functions $f : I \to \mathbb{R}$ continuous at $a$. More exactly, we have $(f + g)'(a) = f'(a) + g'(a)$ and the product rule $(fg)'(a) = f(a)g'(a) + f'(a)g(a)$.

The reason for choosing an open interval is that we don't want to define differentiability from the left or right.

Proof. Let $f$ and $g$ be functions $I \to \mathbb{R}$ that are differentiable at $a \in I$. Let $(x_n)$ be a sequence in $I \setminus \{a\}$ converging to $a$; then

$$\frac{(f + g)(x_n) - (f + g)(a)}{x_n - a} = \frac{f(x_n) - f(a)}{x_n - a} + \frac{g(x_n) - g(a)}{x_n - a},$$

and since the terms on the right hand side have limits as $x_n \to a$ by assumption, so does the term on the left hand side. This shows that $f + g$ has a derivative at $x = a$, and the proof shows that in fact $(f + g)'(a) = f'(a) + g'(a)$.

Now let us look at the product of differentiable functions. We have

$$\begin{aligned}
\frac{(fg)(x_n) - (fg)(a)}{x_n - a} &= \frac{f(x_n)g(x_n) - f(x_n)g(a) + f(x_n)g(a) - f(a)g(a)}{x_n - a} \\
&= f(x_n)\,\frac{g(x_n) - g(a)}{x_n - a} + g(a)\,\frac{f(x_n) - f(a)}{x_n - a}.
\end{aligned}$$

Now the terms on the right hand side have limits for $x_n \to a$ (recall that $f(x_n) \to f(a)$ by Corollary 6.2), hence so does the left hand side; this shows that $fg$ has a derivative at $x = a$ if $f$ and $g$ do, and that $(fg)'(a) = f(a)g'(a) + f'(a)g(a)$.

The existence of neutral elements with respect to addition and multiplication is clear since constant functions are differentiable. Moreover, if $f$ is differentiable at $a \in I$, then so is $-f$.


Since $f(x) = x$ has derivative $f'(x) = 1$, repeated application of the product rule proves that $f(x) = x^n$ has derivative $f'(x) = n x^{n-1}$ for all $n \ge 0$. For negative $n$ we need

Proposition 6.4. If $f : I \to \mathbb{R}$ is differentiable at $a \in I$, and if $f(a) \ne 0$, then $1/f$ is differentiable at $a$, and we have $\big(\frac{1}{f}\big)'(a) = -f'(a)/f(a)^2$.

Proof. Remember when we proved that $1/f$ is continuous at $a$ if $f$ is continuous at $a$? We proved that, under these assumptions, $f$ is nonzero on some interval containing $a$, and I complained that Ross didn't prove this. Here's how he begins his proof that $1/f$ is differentiable at $a$:

Since $f(a) \ne 0$ and $f$ is continuous at $a$, there exists an open interval $I$ containing $a$ such that $f(x) \ne 0$ for $x \in I$.

Now we may use this since we proved it, so for any $x \in I$ with $x \ne a$ we have

$$\frac{1}{f(x)} - \frac{1}{f(a)} = \frac{f(a) - f(x)}{f(a)f(x)} = -\frac{f(x) - f(a)}{x - a} \cdot \frac{x - a}{f(x)f(a)},$$

hence

$$\frac{1/f(x) - 1/f(a)}{x - a} = -\frac{f(x) - f(a)}{x - a} \cdot \frac{1}{f(x)f(a)},$$

and letting $x \to a$ we see that $1/f$ is differentiable at $x = a$ and has derivative $-f'(a)/f(a)^2$.

Now we claim

Proposition 6.5 (Chain Rule). Assume that $f : I \to \mathbb{R}$ is differentiable at $a \in I$, and that $g : J \to \mathbb{R}$ is differentiable at $f(a) \in J$. Then $g \circ f$ is differentiable at $a$, and we have $(g \circ f)'(a) = f'(a)\,g'(f(a))$.

Proof. Put $h = g \circ f$ and $b = f(a)$. Since $f$ is differentiable at $x = a$ and $g$ at $y = b$, there exist functions $u, v$ such that

$$f(x) = f(a) + (x - a)(f'(a) + u(x)), \qquad g(y) = g(b) + (y - b)(g'(b) + v(y)),$$

where $\lim_{x \to a} u(x) = \lim_{y \to b} v(y) = 0$. Now, with $y = f(x)$,

$$\begin{aligned}
h(x) - h(a) &= g(f(x)) - g(f(a)) \\
&= [f(x) - f(a)] \cdot [g'(b) + v(y)] \\
&= (x - a)[f'(a) + u(x)] \cdot [g'(b) + v(y)],
\end{aligned}$$

so for all $x \ne a$ we have

$$\frac{h(x) - h(a)}{x - a} = [g'(b) + v(y)] \cdot [f'(a) + u(x)].$$

If we let $x \to a$, then $y = f(x) \to b$ since $f$ is continuous at $a$, and the right hand side tends to $g'(b)f'(a)$ as claimed.


Remember the theorem on the existence of inverse functions? We proved that strictly increasing functions $I \to J = f(I)$ on intervals $I$ have an inverse function $f^{-1} : J \to I$, and that $f^{-1}$ is continuous if $f$ is. Now we shall prove

Theorem 6.6. Let $f : I \to \mathbb{R}$ be an injective function on an open interval $I$, and let $J = f(I)$. If $f$ is differentiable at $a \in I$, and if $f'(a) \ne 0$, then $f^{-1} : J \to I$ is differentiable at $b = f(a)$, and we have $(f^{-1})'(b) = 1/f'(a)$.

Proof. Here using the second definition of differentiability simplifies the proof considerably: compare with Ross.

Since $f$ is differentiable at $a$, there is a function $u : I \to \mathbb{R}$ with $\lim_{x \to a} u(x) = 0$ and

$$f(x) = f(a) + (x - a)(f'(a) + u(x)). \tag{6.1}$$

Now we are interested in $\frac{g(y) - g(b)}{y - b}$, where $g = f^{-1}$; with $g(y) = x$ and $g(b) = a$, this is nothing but $\frac{x - a}{f(x) - f(a)}$; using (6.1), this becomes

$$\frac{g(y) - g(b)}{y - b} = \frac{x - a}{f(x) - f(a)} = \frac{1}{f'(a) + u(x)}.$$

Now we can write $1/(f'(a) + u(x)) = 1/f'(a) + v(x)$, where $\lim_{x \to a} v(x) = 0$; thus

$$\frac{g(y) - g(b)}{y - b} = \frac{1}{f'(a)} + v(x),$$

which implies that $g$ is differentiable at $y = b$ with $g'(b) = 1/f'(a)$ (here we use that $x = g(y) \to a$ as $y \to b$, i.e., that $g$ is continuous at $b$, which holds for instance in the situation of Theorem 4.9).

Example: if $f(x) = x^2$, then $g = f^{-1}$ is given by $g(y) = \sqrt{y}$ as long as $y \ge 0$. In order to apply Theorem 6.6 we need $f$ to be injective, so we have to restrict the domain of $f$ to the nonnegative reals. There $f'(a) \ne 0$ as long as $a \ne 0$, so let us assume that $a > 0$. Then $g$ is differentiable at $b = f(a) = a^2$, and we have $g'(b) = 1/f'(a) = 1/(2a) = 1/(2\sqrt{b})$.
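The formula $(f^{-1})'(b) = 1/f'(a)$ can be compared with a plain difference quotient for the square root. The sketch below uses floating point numbers, so it only illustrates the statement; the helper name `inverse_derivative`, the point `b = 4.0`, and the step `h` are assumptions chosen for this example.

```python
import math


def inverse_derivative(f_prime, g, b):
    """Value predicted by Theorem 6.6: 1 / f'(a) with a = g(b), g = f^{-1}."""
    return 1.0 / f_prime(g(b))


def square_prime(a):
    # Derivative of f(x) = x^2.
    return 2 * a


b = 4.0
predicted = inverse_derivative(square_prime, math.sqrt, b)   # 1 / (2 * sqrt(4))

# Difference quotient of g(y) = sqrt(y) at y = b with a small step h.
h = 1e-6
numeric = (math.sqrt(b + h) - math.sqrt(b)) / h
```

Both numbers are close to $1/(2\sqrt{4}) = 0.25$; the small remaining gap comes from using a finite step `h` rather than a limit.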

We say that $f$ attains a local maximum at $x_0$ if there is a $\delta > 0$ such that $f(x) \le f(x_0)$ for all $x \in (x_0 - \delta, x_0 + \delta)$. The following criterion gives a necessary condition for the existence of local extrema:

Theorem 6.7. Let $f : I \to \mathbb{R}$ be a differentiable function on some open interval $I$. If $f$ attains a local maximum or a local minimum at $a \in I$, then $f'(a) = 0$.

Proof. Again, this is a point where the second definition comes in handy: there is a function $u(x)$ with $\lim_{x \to a} u(x) = 0$ and

$$f(x) = f(a) + (x - a)(f'(a) + u(x)).$$

Now if $f$ attains a local maximum at $a$, then $f(x) \le f(a)$ for all $x$ close to $a$. But this inequality is equivalent to $(x - a)(f'(a) + u(x)) \le 0$. If $x < a$, this implies $f'(a) \ge -u(x)$, and letting $x \to a$ we find $f'(a) \ge 0$. If $x > a$, on the other hand, we find $f'(a) \le -u(x)$, and letting $x \to a$, we find $f'(a) \le 0$. Thus $f'(a) = 0$.

This takes care of the case when $f$ has a maximum at $a$. If $f$ has a minimum at $a$, then $-f$ has a maximum, hence $-f'(a) = 0$ by what we just proved.


Note that the condition $f'(a) = 0$ does not guarantee the existence of a maximum or minimum at $a$, as the example $f(x) = x^3$ with $a = 0$ shows.

The mean value theorem for differentiable functions is going to be our key to the fundamental theorem of calculus (it is also the key to finding formulas for the arc length of (smooth) curves, curvature, etc.):

Theorem 6.8 (Mean Value Theorem). If $f : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then there is an $x \in (a, b)$ such that

$$f(b) - f(a) = (b - a) f'(x).$$

We shall prove a special case first:

Theorem 6.9 (Rolle's Theorem). Let $f : I \to \mathbb{R}$ be continuous on $I = [a, b]$ and differentiable on $(a, b)$. If $f(a) = f(b)$, then there exists an $x \in (a, b)$ such that $f'(x) = 0$.

Proof. Since $f$ is continuous on a closed interval, it assumes its maximum and minimum somewhere, so there are $x_0, y_0 \in I$ such that $f(x_0) \le f(x) \le f(y_0)$ for all $x \in I$. If both $x_0$ and $y_0$ are endpoints of $I$, then $f$ is constant since $f(a) = f(b)$ by assumption. But then $f'(x) = 0$ for all $x \in (a, b)$. Otherwise, $f$ assumes either a maximum at $y_0 \in (a, b)$ or a minimum at $x_0 \in (a, b)$, and then $f'(x) = 0$ for $x = y_0$ or $x = x_0$ by Theorem 6.7.

The Mean Value Theorem now follows by applying Rolle's theorem to $g(x) = f(x) - \frac{f(b) - f(a)}{b - a}\,x$, which satisfies $g(a) = g(b)$.
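The reduction of the Mean Value Theorem to Rolle's theorem can be illustrated on a concrete function: subtracting the linear term with the secant slope produces a function with equal values at the endpoints. The choice $f(x) = x^3$ on $[0, 2]$ and the names `slope`, `g`, and `xi` are assumptions made for this sketch.

```python
import math


def f(x):
    return x**3


a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)   # secant slope, here 4.0


def g(x):
    """Auxiliary function f(x) - slope * x, which has g(a) = g(b)."""
    return f(x) - slope * x


# For this f we can solve f'(xi) = 3 * xi^2 = slope by hand.
xi = math.sqrt(slope / 3)
```

Here `g(a) == g(b)`, so Rolle's theorem applies to `g`, and the point `xi` (about 1.1547) lies in $(0, 2)$ with $f'(\xi)$ equal to the secant slope.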

Proposition 6.10. Let $f : I \to \mathbb{R}$ be a differentiable function on some open interval $I$. Then $f$ is strictly increasing on $I$ if $f'(a) > 0$ for all $a \in I$.

Proof. Assume that $f'(a) > 0$ for all $a \in I$. If $x_1 < x_2$, then by the mean value theorem

$$\frac{f(x_2) - f(x_1)}{x_2 - x_1} = f'(x)$$

for some $x \in (x_1, x_2)$; since $f'(x) > 0$, this implies $f(x_2) > f(x_1)$, hence $f$ is strictly increasing.

Note that the converse is not completely true (it is almost true, however): the function $f(x) = x^3$ is strictly increasing on $(-1, 1)$ but $f'(0) = 0$.

6.2 Fundamental Theorem of Calculus

Let $f : [a, b] \to \mathbb{R}$ be integrable, and consider a subinterval $[x, x + \Delta]$ of $[a, b]$. The area below $f$ in this subinterval is approximately the product of the length $\Delta$ of the interval and the approximate height $f(x)$. If $F$ is a function such that $F(x + \Delta) - F(x)$ gives the area below $f$ in $[x, x + \Delta]$, then $F(x + \Delta) - F(x) \approx \Delta \cdot f(x)$, thus $\frac{F(x + \Delta) - F(x)}{\Delta} \approx f(x)$. Letting $\Delta \to 0$, we get $F'(x) = f(x)$.


This argument involves lots of hand-waving, and can be made exact by replacing the estimate $F(x + \Delta) - F(x) \approx \Delta \cdot f(x)$ by something exact using the mean value theorem. Of course we have to do this for every small subinterval defined by a partition to get good results.

Theorem 6.11 (Fundamental Theorem of Calculus). Let $F : [a, b] \to \mathbb{R}$ be a continuous function that is differentiable in $(a, b)$. If $f = F'$ has a Riemann integral, then

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

Proof. The fact that $f$ has a Riemann integral means that for every $\varepsilon > 0$ there exists a partition $P = \{a = t_0 < t_1 < \ldots < t_n = b\}$ of $[a, b]$ such that $U(f, P) - L(f, P) < \varepsilon$. The following estimate in fact holds for every partition $P$ of $[a, b]$.

In each subinterval $[t_i, t_{i+1}]$ of the partition we apply the mean value theorem to $F$ and get $F(t_{i+1}) - F(t_i) = (t_{i+1} - t_i) f(\xi_i)$ for some $\xi_i \in (t_i, t_{i+1})$. Summing all these equations (telescoping sums!) gives

$$F(b) - F(a) = \sum_{i=0}^{n-1} f(\xi_i)(t_{i+1} - t_i).$$

Now $m(f, [t_i, t_{i+1}]) \le f(\xi_i) \le M(f, [t_i, t_{i+1}])$, hence $L(f, P) \le \sum f(\xi_i)(t_{i+1} - t_i) \le U(f, P)$. Thus $L(f, P) \le F(b) - F(a) \le U(f, P)$ for every partition $P$, and taking the supremum over the left and the infimum over the right inequality we see $L(f) \le F(b) - F(a) \le U(f)$. Since $f$ is Riemann integrable, we have $L(f) = U(f) = \int_a^b f(x)\,dx$, hence this must equal $F(b) - F(a)$ as claimed.
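The telescoping step of this proof can be traced on a concrete example with exact arithmetic. The sketch assumes $F(x) = x^3/3$ and $f(x) = x^2$ on $[0, 1]$, where $f$ is increasing so that the infimum and supremum on each subinterval sit at the endpoints; the partition `P` and the helper name `mean_value_terms` are illustrative choices.

```python
from fractions import Fraction


def F(x):
    return x**3 / 3   # an antiderivative of f(x) = x^2


def f(x):
    return x * x


def mean_value_terms(partition):
    """The terms F(t_{i+1}) - F(t_i), each equal to f(xi_i) * (t_{i+1} - t_i)."""
    return [F(right) - F(left) for left, right in zip(partition, partition[1:])]


P = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)]
terms = mean_value_terms(P)
total = sum(terms)   # telescopes to F(1) - F(0)

# f is increasing on [0, 1]: inf at the left endpoint, sup at the right one.
lower = sum(f(l) * (r - l) for l, r in zip(P, P[1:]))
upper = sum(f(r) * (r - l) for l, r in zip(P, P[1:]))
```

The sum `total` equals $F(1) - F(0) = 1/3$ and lies between `lower` and `upper`, which is the inequality $L(f, P) \le F(b) - F(a) \le U(f, P)$ used in the proof.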
