
MA3051: Mathematical Analysis II

Course Notes

Stephen Wills

2014/15

Contents

0 Assumed Knowledge
  Sets and maps
  Sequences and series
  Metric spaces
  Complex analysis
  Linear algebra

1 Motivation

2 Inner Product Spaces

3 Normed Vector Spaces

4 Completeness and Convexity

5 Orthogonality

6 Continuous/Bounded Linear Maps

7 Fourier Series

0 Assumed Knowledge

The following is material that I believe you have encountered already; consequently I won't be proving it again, nor (explicitly) asking you to reprove it. The symbol K denotes the real (R) or complex (C) numbers.

Sets and maps

1. A map f : X → Y between sets X and Y is:

(a) injective/one-to-one if f(x1) = f(x2) ⇒ x1 = x2;

(b) surjective/onto if ∀ y ∈ Y ∃ x ∈ X such that f(x) = y;

(c) bijective if it is injective and surjective, equivalently if it is invertible, i.e. ∃ g : Y → X such that (g ∘ f)(x) = x and (f ∘ g)(y) = y ∀ x ∈ X, y ∈ Y.

Here g ∘ f denotes the composition of g and f: (g ∘ f)(x) = g(f(x)).

2. If f : X → Y is a map then for A ⊂ X and B ⊂ Y,

f(A) := {f(x) : x ∈ A}   (the image of A),

f⁻¹(B) := {x ∈ X : f(x) ∈ B}   (the preimage of B).

Preimages behave better than images; for any family of subsets (B_i)_{i∈I} of Y,

f⁻¹(⋃_i B_i) = ⋃_i f⁻¹(B_i)   and   f⁻¹(⋂_i B_i) = ⋂_i f⁻¹(B_i).

Also f⁻¹(Bᶜ) = (f⁻¹(B))ᶜ, where Bᶜ denotes the complement of B. On the other hand, for sets A_i ⊂ X,

f(⋃_i A_i) = ⋃_i f(A_i)   and   f(⋂_i A_i) ⊂ ⋂_i f(A_i),

with the inclusion above not necessarily being an equality (here I use ⊂ to denote any subset, denoting proper subsets with ⊊, rather than the ⊆/⊂ convention).
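These set identities can be checked numerically; the sketch below is not from the notes, and the map f and the sets X, A1, A2, B1, B2 (and the helper names `image`, `preimage`) are my own choices. The non-injective map x ↦ x² exhibits the strict inclusion for images of intersections.

```python
# Preimages commute with unions and intersections; images can fail for
# intersections. All sets here are finite, so we can compare directly.

def image(f, A):
    return {f(x) for x in A}

def preimage(f, X, B):
    return {x for x in X if f(x) in B}

X = set(range(-3, 4))            # domain {-3, ..., 3}
f = lambda x: x * x              # not injective: f(-1) = f(1)

B1, B2 = {0, 1, 4}, {1, 9}
assert preimage(f, X, B1 | B2) == preimage(f, X, B1) | preimage(f, X, B2)
assert preimage(f, X, B1 & B2) == preimage(f, X, B1) & preimage(f, X, B2)

A1, A2 = {-1, 0}, {0, 1}
assert image(f, A1 | A2) == image(f, A1) | image(f, A2)
# f(A1 ∩ A2) = {0} is a proper subset of f(A1) ∩ f(A2) = {0, 1}
assert image(f, A1 & A2) < image(f, A1) & image(f, A2)
```

The failure for intersections traces back exactly to the lack of injectivity: −1 ∈ A1 and 1 ∈ A2 map to the same point.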

3. An infinite set X is countable if there is a bijection f : X → N = {1, 2, . . .}, otherwise it is uncountable. If X1, X2, . . . are countable, so are ⋃_{i=1}^∞ X_i and X1 × · · · × X_N for each N ∈ N.

Sequences and series

1. If (a_n)_{n=1}^∞ and (b_n)_{n=1}^∞ are convergent sequences of numbers in K with limits a and b respectively then a_n + b_n → a + b, a_n b_n → ab, and, if b ≠ 0, a_n/b_n → a/b.

2. If ∑_{n=1}^∞ a_n is a convergent series of numbers in K then a_n → 0; if ∑_{n=1}^∞ |b_n| is a convergent series then so is ∑_{n=1}^∞ b_n (the converse is not true).

Metric spaces

1. A metric space is a set X together with a metric or distance function d : X × X → R such that

(i) d(x, y) ≥ 0 for all x, y ∈ X, with equality if and only if x = y;

(ii) d(x, y) = d(y, x);

(iii) d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X.

This gives rise to a family of subsets called the open subsets of X, which are those subsets U ⊂ X such that for each x ∈ U there is some r_x > 0 such that

B(x, r_x) := {y ∈ X : d(x, y) < r_x} ⊂ U.

Here B(x, r_x) is the open ball of radius r_x, centre x, and the choice of r_x is allowed to depend on x.

The closed subsets are the complements of the open sets. Unions of open sets are open, hence intersections of closed sets are closed.

Intersections of finite numbers of open sets are open, with a corresponding statement for finite unions of closed sets.

2. A sequence (x_n)_{n=1}^∞ in a metric space converges to x if and only if one of the following equivalent conditions holds:

(i) ∀ ε > 0 ∃ N ≥ 1 such that d(x_n, x) < ε ∀ n ≥ N;

(ii) d(x_n, x) → 0 (in R);

(iii) ∀ open sets U that contain x ∃ N ≥ 1 such that x_n ∈ U ∀ n ≥ N.

3. Let A and B be subsets of a metric space X. The following conditions on B are equivalent:

(i) B is the smallest closed set containing A;

(ii) ∀ x ∈ B, if U ⊂ X is open and x ∈ U then U ∩ A ≠ ∅;

(iii) ∀ x ∈ B ∃ a sequence (x_n)_{n=1}^∞ ⊂ A such that x_n → x.

The set B that satisfies these conditions is the closure of A, denoted A̅ (it always exists).

Consequence: A is closed if and only if A = A̅; from this it follows that A being closed is equivalent to the condition that whenever (x_n)_{n=1}^∞ ⊂ A is convergent to something in X then in fact we have lim_{n→∞} x_n ∈ A.

4. A subset A of a metric space X is dense if A̅ = X. X is separable if it contains a countable dense subset (finite or infinite).

5. A Cauchy sequence in a metric space is a sequence (x_n)_{n=1}^∞ ⊂ X that satisfies d(x_m, x_n) → 0 (formally: ∀ ε > 0 ∃ N ≥ 1 such that d(x_m, x_n) < ε ∀ m, n ≥ N). A metric space is complete if every Cauchy sequence is convergent.

6. A subset of a complete metric space is complete (in itself) if and only if it is closed.

7. If (X, d_X) and (Y, d_Y) are metric spaces, then the following are all metrics on X × Y:

d_1((x1, y1), (x2, y2)) = d_X(x1, x2) + d_Y(y1, y2),

d_2((x1, y1), (x2, y2)) = (d_X(x1, x2)² + d_Y(y1, y2)²)^{1/2},

d_∞((x1, y1), (x2, y2)) = max{d_X(x1, x2), d_Y(y1, y2)}.

However, these all induce the same topology on X × Y, the product topology, i.e. if U ⊂ X × Y is open with respect to d_1 then it is open with respect to d_2 and d_∞, and so on.
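As a minimal numerical sketch (not from the notes; the points p, q are arbitrary), the three product metrics on R × R satisfy the standard comparison d_∞ ≤ d_2 ≤ d_1 ≤ 2·d_∞, which is one way to see that they induce the same topology.

```python
# The three product metrics d1, d2, d∞ on X × Y with X = Y = R.
import math

dX = dY = lambda s, t: abs(s - t)   # the usual metric on R

def d1(p, q):
    return dX(p[0], q[0]) + dY(p[1], q[1])

def d2(p, q):
    return math.sqrt(dX(p[0], q[0]) ** 2 + dY(p[1], q[1]) ** 2)

def dinf(p, q):
    return max(dX(p[0], q[0]), dY(p[1], q[1]))

p, q = (0.0, 0.0), (3.0, 4.0)
# mutual comparability: every d1-ball contains a d∞-ball and vice versa
assert dinf(p, q) <= d2(p, q) <= d1(p, q) <= 2 * dinf(p, q)
assert (d1(p, q), d2(p, q), dinf(p, q)) == (7.0, 5.0, 4.0)
```

The two-sided bound means every open ball for one metric contains an open ball for the others, so the open sets coincide.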

8. (a) A map f : X → Y between metric spaces is continuous at the point x ∈ X if any of the following equivalent conditions hold:

(i) ∀ ε > 0 ∃ δ > 0 such that d_X(x, x′) < δ ⇒ d_Y(f(x), f(x′)) < ε;

(ii) for every sequence (x_n)_{n=1}^∞ that converges to x we have f(x_n) → f(x);

(iii) for each open set V that contains f(x) there is an open set U containing x such that f(U) ⊂ V.

(b) A map f : X → Y between metric spaces is continuous if any of the following equivalent conditions hold:

(i) f is continuous at every x ∈ X;

(ii) ∀ open V ⊂ Y, f⁻¹(V) is open in X;

(iii) ∀ closed F ⊂ Y, f⁻¹(F) is closed in X.

9. The composition of continuous maps is continuous.

[Note: (g ∘ f)⁻¹(A) = f⁻¹(g⁻¹(A)).]

10. The distance from a point x to a (nonempty) subset A ⊂ X of a metric space is dist(x, A) = inf{d(x, a) : a ∈ A}. The map x ↦ dist(x, A) is continuous.

11. A subset C of X is compact if every open cover has a finite subcover. Equivalently, C is compact if every sequence (x_n)_{n=1}^∞ in C has a subsequence that converges to a point in C.

The image of a compact subset under a continuous map is compact.

A compact subset C of a metric space is closed and bounded (for each x ∈ X ∃ K > 0 such that C ⊂ B(x, K)); the converse is true for subsets of Kⁿ (the Heine–Borel Theorem) but not in general.

Complex analysis

1. A map f : A ⊂ C → C is differentiable at z ∈ A if lim_{h→0} h⁻¹(f(z + h) − f(z)) exists, in which case the limit is denoted f′(z). It is analytic/holomorphic at z if there is some open ball or open set U containing z such that f is differentiable at each w ∈ U.

If f is analytic on and inside a contour C (a closed curve in the complex plane without too many corners) then Cauchy's Integral Formula states that

f(a) = (1/2πi) ∫_C f(z)/(z − a) dz

for any a from the interior of the contour, where the integral is carried out anticlockwise around the contour.
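Cauchy's Integral Formula can be verified numerically: a hedged sketch (not from the notes; the function f(z) = z² and the point a are my own choices) approximates the contour integral around the unit circle by a Riemann sum and compares it with f(a).

```python
# Riemann-sum approximation of (1/2πi) ∮ f(z)/(z − a) dz on |z| = 1,
# traversed anticlockwise; for analytic f and |a| < 1 this equals f(a).
import cmath

def cauchy_formula(f, a, n=20000):
    total = 0.0 + 0.0j
    for k in range(n):
        t = 2 * cmath.pi * k / n
        z = cmath.exp(1j * t)             # point on the unit circle
        dz = 1j * z * (2 * cmath.pi / n)  # z'(t) dt, anticlockwise
        total += f(z) / (z - a) * dz
    return total / (2j * cmath.pi)

f = lambda z: z * z
a = 0.3 + 0.2j
assert abs(cauchy_formula(f, a) - f(a)) < 1e-6
```

The equally spaced sum converges very quickly here because the integrand is smooth and periodic in the angle variable.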


Linear algebra

1. A subset U of a vector space V is a subspace if either of the following equivalent conditions hold:

(i) x + y ∈ U ∀ x, y ∈ U and λx ∈ U ∀ λ ∈ K, x ∈ U;

(ii) λx + µy ∈ U ∀ λ, µ ∈ K, x, y ∈ U.

2. If U and W are subspaces of V then so are U ∩ W and U + W := {u + w : u ∈ U, w ∈ W}. [U ∩ W is the largest subspace contained in both U and W; U + W is the smallest subspace containing U and W.]

Other notation: for A ⊂ V, x ∈ V and λ ∈ K, A + x := {a + x : a ∈ A} (translation of A by x) and λA := {λa : a ∈ A} (scaling of A by λ).

3. If S ⊂ V, its linear span is the set Lin S := {∑_{i=1}^n λ_i x_i : n ∈ N, λ_i ∈ K, x_i ∈ S}, the set of all linear combinations of (finite subsets of) S. It is the smallest subspace that contains S.

4. A map T : V → W between vector spaces is linear if any of the following equivalent conditions holds:

(i) T(x + y) = Tx + Ty ∀ x, y ∈ V and T(λx) = λTx ∀ λ ∈ K, x ∈ V;

(ii) T(λx + µy) = λTx + µTy ∀ λ, µ ∈ K, x, y ∈ V;

(iii) T(x + λy) = Tx + λTy ∀ λ ∈ K, x, y ∈ V.

The composition of linear maps is linear.

The range of T, Ran T := T(V), is a subspace of W; the kernel of T, Ker T := {x ∈ V : Tx = 0}, is a subspace of V.

5. If K = C then a map S : V → W is conjugate linear if S(x + y) = Sx + Sy and S(λx) = λ̄Sx, where λ̄ denotes the complex conjugate of λ.

6. A subset S of V is linearly independent if whenever ∑_{i=1}^n λ_i x_i = 0 for n ∈ N, λ_i ∈ K and distinct elements x_i of S, then λ_1 = · · · = λ_n = 0. It is linearly dependent if it is not linearly independent.

7. If V = Lin S for a finite subset S ⊂ V then V is finite dimensional, with dim V equal to the size of any linearly independent set S′ such that Lin S′ = V; such sets all have the same size, which is the size of the largest linearly independent set contained in V; these sets are bases of V. V is infinite dimensional if no such finite set exists, equivalently if it contains an infinite linearly independent set.

If dim V = n < ∞ then there is a linear bijection (more usually called a linear isomorphism) T : V → Kⁿ. An example of such a T is obtained by picking a basis v_1, . . . , v_n and setting T(∑_{i=1}^n λ_i v_i) = (λ_1, . . . , λ_n).

8. Rank–nullity formula: if V is finite dimensional and T : V → W is linear then dim V = dim Ran T + dim Ker T.

9. If V and W are vector spaces, then V ×W is a vector space if we define

(v1, w1) + (v2, w2) = (v1 + v2, w1 + w2), λ(v, w) = (λv, λw).


1 Motivation

Consider the following problem of an applied mathematical nature: a chain of length L is suspended vertically from a hook, and then subjected to a small perturbation and released from rest. Find the subsequent displacement of the chain.

Applying the laws of physics and standard techniques of applied mathematics one obtains the following equation of motion:

∂²u/∂t² = ∂/∂x (x ∂u/∂x)

where u(x, t) is the horizontal displacement of the chain a distance x from the bottom at time t. We have assumed that the chain has uniform density. We also have boundary/initial conditions:

u(L, t) = 0,  0 ≤ t < ∞,
u(x, 0) = u_0(x),  0 ≤ x ≤ L,
∂u/∂t (x, 0) = 0,  0 ≤ x ≤ L,
sup |u(x, t)| < ∞,  0 ≤ x ≤ L, 0 ≤ t < ∞.

A standard technique to solve this is to apply separation of variables, i.e. to look for a solution of the form

u(x, t) = f(x)g(t).

One then obtains

f(x)g″(t) = xf″(x)g(t) + f′(x)g(t)

and so

g″(t)/g(t) = x f″(x)/f(x) + f′(x)/f(x).

Both sides must equal a constant, −λ say, since the left-hand side is a function of t only whereas the right-hand side is a function of x only. Thus we have to solve

g″(t) + λg(t) = 0   and   d/dx (x df/dx) + λf(x) = 0.

The first equation has general solution g(t) = A cos(√λ t) + B sin(√λ t), and the second can be solved by use of Bessel functions. However, when we start to fit the boundary conditions we find that only certain values λ_1 < λ_2 < · · · are allowable, depending on L, giving an associated sequence u_j(x, t) = f_j(x)g_j(t), where f_j and g_j are the solutions for this λ_j. But now we want to write the general solution of the PDE as a sum of the normal modes u_j, that is, write

u(x, t) = ∑_{j=1}^∞ α_j u_j(x, t)

for constants α_j chosen suitably so that we can fit the boundary conditions. Several questions should be apparent:


1. In what sense does the series converge?

2. Are there any restrictions on the initial displacement? For example, whatguarantees that the solution can be expressed as a sum of normal modes? Inparticular are there enough? What does “enough” mean in this context?

3. Most fundamentally: why can we add normal modes together to get another solution of the PDE?

The answer to the last one is that both the PDE and the related ODEs are linear, i.e. the map u ↦ u_tt is linear, etc.

Both of the ODEs are examples of Sturm–Liouville problems, and were solved in the 1830s, long before the advent of Hilbert space theory. However, as soon as we want to generalise the above problem, the methods of Hilbert space theory, and functional analysis more generally, give a framework that makes this reasonably straightforward.

Problems of this nature were the inspiration for the creation of functional analysis, which combines the methods of analysis and linear algebra, in particular when dealing with infinite-dimensional vector spaces of functions. Note that the maps

S : g ↦ g″   and   T : f ↦ d/dx (x df/dx)

are both linear maps, i.e. T(f_1 + µf_2) = Tf_1 + µTf_2, and to say that f satisfies the ODE above amounts to saying

Tf + λf = 0 ⇔ Tf = −λf,

that is, −λ is an eigenvalue of T with f an associated eigenvector.

More generally, the set of solutions of any homogeneous, linear ODE (e.g. the types of equation considered in MA2054) forms a subspace of the vector space of suitably differentiable functions. For example, solutions to

y″ + 6y′ − 7y = 0

form a subspace X of C²[0, 1]. If we turn this into a nonhomogeneous problem by changing the right-hand side as follows:

y″ + 6y′ − 7y = f,

then the set of solutions becomes g + X, where g is any solution to the nonhomogeneous problem. Subsets of vector spaces of this form, "vector + subspace", are called affine-linear spaces.

Another example where we need to consider a combination of both metric and linear algebraic ideas is the collection of all probability measures/distributions on a set Ω. The set of such measures is not a subspace, nor an affine-linear space, but it is convex. See Example 4.11 for more details.


2 Inner Product Spaces

Recall the scalar product of two vectors in R³:

(a1, a2, a3) · (b1, b2, b3) = a1b1 + a2b2 + a3b3.

We have

a · b = |a||b| cos θ,

where θ is the angle between the vectors a and b, and |a| = (a1² + a2² + a3²)^{1/2} is the length of a. In particular we get

−|a||b| ≤ a · b ≤ |a||b| ⇔ |a · b| ≤ |a||b|

with

a · b = 0 ⇔ a, b are orthogonal/perpendicular.

We want to generalise this structure to other vector spaces. All our vector spaces will be real or complex, and we will use K to denote R or C.

Definition 2.1. An inner product on a vector space X over K is a map ⟨·, ·⟩ : X × X → K that satisfies:

IP1. ⟨x, x⟩ ≥ 0 for all x ∈ X, with equality if and only if x = 0;

IP2. ⟨x, y + λz⟩ = ⟨x, y⟩ + λ⟨x, z⟩ for all x, y, z ∈ X and λ ∈ K;

IP3. ⟨x, y⟩ = ⟨y, x⟩‾ for all x, y ∈ X, where w‾ denotes the complex conjugate of w.

The space X together with ⟨·, ·⟩ is called an inner product space, or a pre-Hilbert space.

Remarks. (i) The three properties are usually known as positivity, linearity and (conjugate) symmetry respectively.

(ii) When K = R, IP3 just means ⟨x, y⟩ = ⟨y, x⟩, since these are real numbers.

Lemma 2.2. If X is a complex inner product space then for any x, y, z ∈ X and λ ∈ C we have

⟨x + λy, z⟩ = ⟨x, z⟩ + λ̄⟨y, z⟩.

Proof. We have

⟨x + λy, z⟩ = ⟨z, x + λy⟩‾
            = (⟨z, x⟩ + λ⟨z, y⟩)‾
            = ⟨z, x⟩‾ + λ̄ ⟨z, y⟩‾
            = ⟨x, z⟩ + λ̄⟨y, z⟩

as required.

Exercise 2.3. Show that for a real inner product space we have ⟨x + λy, z⟩ = ⟨x, z⟩ + λ⟨y, z⟩.


Property IP2 of an inner product space just says that the map y ↦ ⟨x, y⟩ is linear for each fixed choice of x. Lemma 2.2 then says that for a complex inner product space the map x ↦ ⟨x, y⟩ is conjugate linear, but it is linear in the case of a real inner product space.

Health warning: some authors use the opposite convention regarding which variable is linear and which is conjugate linear for complex inner product spaces.

We can summarise this by saying that the map ⟨·, ·⟩ : X × X → K is bilinear (linear in both arguments) if K = R, and sesquilinear (1½-times linear) if K = C.

One immediate consequence is that for any x ∈ X we have

⟨x, 0⟩ = ⟨0, x⟩ = 0.

To see this put z = y and λ = −1 in property IP2 of the definition to get

⟨x, 0⟩ = ⟨x, y − y⟩ = ⟨x, y⟩ − ⟨x, y⟩ = 0,

and similarly for ⟨0, x⟩.

Lemma 2.4. Let X be an inner product space. If ⟨x, z⟩ = ⟨y, z⟩ for all z ∈ X then x = y.

Proof. We have

0 = ⟨x, z⟩ − ⟨y, z⟩ = ⟨x − y, z⟩

for all z ∈ X. In particular this is true when z = x − y, and so we have

⟨x − y, x − y⟩ = 0 ⇒ x − y = 0 ⇒ x = y.

Example 2.5.

1. Kⁿ equipped with the usual inner product:

⟨(λ_1, . . . , λ_n), (µ_1, . . . , µ_n)⟩ = ∑_{i=1}^n λ̄_i µ_i.

2. C([a, b]; K) = {f : [a, b] → K | f continuous} with the vector space operations

(f + g)(x) = f(x) + g(x),  (λf)(x) = λf(x),

and equipped with inner product ⟨f, g⟩ = ∫_a^b f̄(t)g(t) dt.

More generally, if w ∈ C[a, b] with w(t) > 0 for all t ∈ [a, b], then ⟨f, g⟩_w = ∫_a^b f̄(t)g(t)w(t) dt is an inner product, involving the weight function w. In the original example we have w(t) = 1 for all t ∈ [a, b].

Most of the inner product axioms are easy to check, but you should convince yourself of the following fact: if f : [a, b] → K is continuous with ∫_a^b |f(t)| dt = 0 then f(t) = 0 for all t ∈ [a, b].


3. P_n = {polynomials of degree ≤ n}, z_0, . . . , z_n ∈ C distinct points, and

⟨p, q⟩ := ∑_{i=0}^n p̄(z_i) q(z_i).

Again we need an argument to show why ⟨p, p⟩ = 0 only happens if p ≡ 0. This time use the Fundamental Theorem of Algebra to factorise p into linear factors, of which there can be at most n.

4. On the space M_{m,n}(K) of m × n matrices with entries from K we can define

⟨A, B⟩ = tr(A*B)

where A* denotes the conjugate transpose of A, and tr C = ∑_{i=1}^n c_{ii} for any n × n matrix C, the trace of C. This example can be viewed as a special case of the first one, by forming a vector of length mn out of the rows of each matrix in M_{m,n}(K).

5. An important infinite-dimensional example is

l² = {x = (x_n)_{n=1}^∞ : x_n ∈ K, ∑_{n=1}^∞ |x_n|² < ∞},

the set of square-summable sequences, with inner product

⟨x, y⟩ = ∑_{n=1}^∞ x̄_n y_n.

To ensure that this one works we first need to check that l² is a vector space under the operations

x + y = (x_n + y_n)_{n=1}^∞,  λx = (λx_n)_{n=1}^∞.

This is postponed for now.

6. Another pair of infinite-dimensional examples that appear in control problems in electrical engineering are

RL² := {f(z) : f is rational and analytic on the unit circle |z| = 1}

and its subspace

RH² := {f(z) : f is rational and analytic on the closed unit disc |z| ≤ 1}.

This time we can take as inner product

⟨f, g⟩ = (1/2πi) ∫_{|z|=1} f̄(z)g(z) dz/z,

noting that the integrand is continuous on the circle, so the integral makes sense, and the inner product space properties can be verified just as in part 2. This example has an advantage over part 2 in that the integrals/inner products can be computed easily using Cauchy's Integral Formula.


Given an inner product space X, we define ‖x‖ := √⟨x, x⟩ ∈ [0, ∞) for each x ∈ X. For our example of the usual scalar product on R³ we have ‖a‖ = |a|, the usual Euclidean length.

Proposition 2.6. Let X be an inner product space and x, y ∈ X, λ ∈ K. Then:

(i) ‖x ± y‖² = ‖x‖² ± 2 Re⟨x, y⟩ + ‖y‖² for all x, y ∈ X. (Note that Re is superfluous when K = R.)

(ii) ‖x‖ ≥ 0, with ‖x‖ = 0 if and only if x = 0.

(iii) ‖λx‖ = |λ|‖x‖.

Proof. Immediate from the definition of inner products. For example

‖x ± y‖² = ⟨x ± y, x ± y⟩ = ⟨x, x ± y⟩ ± ⟨y, x ± y⟩
         = ⟨x, x⟩ ± ⟨x, y⟩ ± ⟨y, x⟩ + (±1)²⟨y, y⟩
         = ‖x‖² ± 2 Re⟨x, y⟩ + ‖y‖².

Similarly

‖λx‖ = √⟨λx, λx⟩ = √(λ̄λ⟨x, x⟩) = √(|λ|²⟨x, x⟩) = |λ|‖x‖.

We next prove that the inequality satisfied by the usual scalar product on R³ carries over to general inner product spaces.

Theorem 2.7 (Cauchy–Schwarz Inequality). Let X be an inner product space. Then for all x, y ∈ X we have |⟨x, y⟩| ≤ ‖x‖‖y‖, with equality if and only if x, y are linearly dependent.

Proof. Since ⟨x, y⟩ = 0 if x = 0 or y = 0, we may assume both vectors are nonzero, so that ‖x‖, ‖y‖ > 0. Pick θ ∈ [0, 2π) such that e^{−iθ}⟨x, y⟩ = |⟨x, y⟩|. Then for all r ∈ R we have

0 ≤ ‖re^{iθ}x − y‖² = ‖re^{iθ}x‖² + 2 Re⟨re^{iθ}x, −y⟩ + ‖−y‖²
                    = |re^{iθ}|²‖x‖² − 2r Re e^{−iθ}⟨x, y⟩ + ‖y‖²
                    = r²‖x‖² − 2r|⟨x, y⟩| + ‖y‖².

Thus the quadratic in r has at most one root. Its discriminant is

(−2|⟨x, y⟩|)² − 4‖x‖²‖y‖² = 4(|⟨x, y⟩|² − (‖x‖‖y‖)²)

and so we have

discriminant ≤ 0 ⇒ |⟨x, y⟩| ≤ ‖x‖‖y‖.

Moreover,

|⟨x, y⟩| = ‖x‖‖y‖ ⇔ discriminant = 0 ⇔ ∃ repeated root
⇔ re^{iθ}x − y = 0 for some r
⇔ x, y linearly dependent.
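A quick numerical sanity check of the theorem (not from the notes; the vectors and helper names are my own choices), using the same conjugate-in-the-first-argument convention as the notes: the inequality holds for an arbitrary pair in C³, and equality is attained when one vector is a scalar multiple of the other.

```python
# Cauchy–Schwarz in C^3: |⟨x, y⟩| ≤ ‖x‖‖y‖, with equality for
# linearly dependent vectors.
import math

def inner(x, y):
    # conjugate linear in the first argument, as in the notes
    return sum(a.conjugate() * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

x = [1 + 1j, 2, -1j]
y = [3, 1 - 2j, 0.5]
assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12

z = [(2 - 1j) * a for a in x]          # z is a scalar multiple of x
assert abs(abs(inner(x, z)) - norm(x) * norm(z)) < 1e-9
```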


Proposition 2.8. Let X be an inner product space, and x, y ∈ X. Then

‖x + y‖ ≤ ‖x‖ + ‖y‖.

Proof. Using the Cauchy–Schwarz inequality we get

‖x + y‖² = ‖x‖² + 2 Re⟨x, y⟩ + ‖y‖²
         ≤ ‖x‖² + 2|⟨x, y⟩| + ‖y‖²
         ≤ ‖x‖² + 2‖x‖‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)²,

and the result follows.

This now allows us to give the details for one of our examples.

Example 2.9. The space l² of square-summable sequences is a vector space, and the inner product is well-defined.

To see this let x, y ∈ l², so that x = (x_n)_{n=1}^∞, y = (y_n)_{n=1}^∞ with ∑_{n=1}^∞ |x_n|² < ∞ and ∑_{n=1}^∞ |y_n|² < ∞. We must show x + y = (x_n + y_n)_{n=1}^∞ ∈ l², so choose some N ∈ N and consider x(N) = (x_1, . . . , x_N), y(N) = (y_1, . . . , y_N) ∈ K^N, the truncations of x and y to N-dimensional vectors. We can apply Proposition 2.8 to x(N) and y(N) to get

‖x(N) + y(N)‖ ≤ ‖x(N)‖ + ‖y(N)‖,

i.e.

(∑_{n=1}^N |x_n + y_n|²)^{1/2} ≤ (∑_{n=1}^N |x_n|²)^{1/2} + (∑_{n=1}^N |y_n|²)^{1/2}.

Now let N → ∞ on the right-hand side to get

(∑_{n=1}^N |x_n + y_n|²)^{1/2} ≤ (∑_{n=1}^∞ |x_n|²)^{1/2} + (∑_{n=1}^∞ |y_n|²)^{1/2},

which is valid for all N ∈ N. It follows that ∑_{n=1}^∞ |x_n + y_n|² < ∞, and so x + y ∈ l² as required. We can also show easily that if x ∈ l² then λx = (λx_1, λx_2, . . .) ∈ l². Finally, if x, y ∈ l² then

∑_{n=1}^N |x_n y_n| ≤ (∑_{n=1}^N |x_n|²)^{1/2} (∑_{n=1}^N |y_n|²)^{1/2} ≤ (∑_{n=1}^∞ |x_n|²)^{1/2} (∑_{n=1}^∞ |y_n|²)^{1/2} < ∞

for all N ∈ N, by applying the Cauchy–Schwarz inequality in K^N to the truncations. Consequently the series ∑_{n=1}^∞ x̄_n y_n is absolutely convergent in K, hence convergent.


3 Normed Vector Spaces

The properties satisfied by the quantity ‖x‖ defined at the end of the last section, which we think of as the length of x, are the basis of a theory more general than just inner product spaces.

Definition 3.1. A normed vector space (NVS) over K is a vector space X over K together with a map ‖ · ‖ : X → R satisfying the following:

N1. ‖x‖ ≥ 0 for all x ∈ X, with ‖x‖ = 0 if and only if x = 0.

N2. ‖λx‖ = |λ|‖x‖ for all x ∈ X, λ ∈ K.

N3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ X.

So we can summarise our work above as follows:

Proposition 3.2. If X is an inner product space then it is a normed vector space when we define ‖x‖ = √⟨x, x⟩.

Of course, we should give some examples of normed vector spaces, and determine if we have anything new from this definition.

Example 3.3.

1. Let X = Kⁿ = {x = (x_1, . . . , x_n) : x_i ∈ K}, the vector space of n-tuples (which is n-dimensional). Three possible norms on X are

‖x‖_1 = ∑_{i=1}^n |x_i|,  ‖x‖_2 = (∑_{i=1}^n |x_i|²)^{1/2},  ‖x‖_∞ = max_{1≤i≤n} |x_i|.

That ‖ · ‖_1 and ‖ · ‖_∞ are norms is easy to show; that ‖ · ‖_2 defines a norm follows from Propositions 2.6 and 2.8.
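These three norms can be computed directly; a minimal sketch (not from the notes; the sample vector and helper names are my own choices), including the standard comparison ‖x‖_∞ ≤ ‖x‖_2 ≤ ‖x‖_1 that holds for every x ∈ Kⁿ:

```python
# The 1-norm, 2-norm and ∞-norm on R^4 for a sample vector.
import math

def norm1(x):
    return sum(abs(t) for t in x)

def norm2(x):
    return math.sqrt(sum(abs(t) ** 2 for t in x))

def norm_inf(x):
    return max(abs(t) for t in x)

x = [1.0, -2.0, 2.0, 0.0]
assert (norm1(x), norm2(x), norm_inf(x)) == (5.0, 3.0, 2.0)
assert norm_inf(x) <= norm2(x) <= norm1(x)
```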

2. For any a < b in R let X = C([a, b]; K) = {f : [a, b] → K | f is continuous}, as in part 2 of Example 2.5. Possible norms on X are

‖f‖_1 = ∫_a^b |f(x)| dx,  ‖f‖_2 = (∫_a^b |f(x)|² dx)^{1/2},  ‖f‖_∞ = sup{|f(x)| : a ≤ x ≤ b}.

That these are well-defined follows since [a, b] is compact in R, so the integrals make sense.

3. X = C(R; K) = {f : R → K | f is continuous} is a vector space in the same way as part 2, but defining a norm is not easy. It is better to consider subspaces such as

C_c(R; K) = {f ∈ X : ∃ M > 0 such that f(x) = 0 ∀ |x| > M},
C_0(R; K) = {f ∈ X : |f(x)| → 0 as |x| → ∞},
C_b(R; K) = {f ∈ X : ∃ C > 0 such that |f(x)| ≤ C ∀ x ∈ R},

i.e. the continuous functions of compact support, the continuous functions that vanish at infinity, and the bounded continuous functions, respectively.

Note that

C_c(R; K) ⊂ C_0(R; K) ⊂ C_b(R; K) ⊂ C(R; K),

and each is a proper subspace of the next. Moreover, we can define a norm on the first three by setting ‖f‖_∞ := sup_{x∈R} |f(x)|, but this is not well-defined on all of X.

In particular some of the examples above are those coming from inner product spaces, since in part 1 we have ‖x‖_2 = √⟨x, x⟩ and in part 2 we have ‖f‖_2 = √⟨f, f⟩ for the inner products on Kⁿ and C[a, b] given in Example 2.5. However, some of the above are not inner product spaces, which can be shown by adapting an elementary result of Euclidean geometry (based on Pythagoras' Theorem) which asserts that for vectors x, y in the plane we have

‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²). (3.1)

Definition 3.4. A normed vector space X is said to satisfy the parallelogram lawif (3.1) holds for all x, y ∈ X.

Theorem 3.5. Let X be a normed vector space. The norm on X arises from an inner product if and only if X satisfies the parallelogram law. The inner product is related to the norm through the polarisation identities:

⟨x, y⟩ = ¼(‖x + y‖² − ‖x − y‖²)   if K = R,

⟨x, y⟩ = ¼ ∑_{n=0}^3 (1/iⁿ)‖x + iⁿy‖²   if K = C
       = ¼(‖x + y‖² − i‖x + iy‖² − ‖x − y‖² + i‖x − iy‖²).

Proof. Suppose X is an inner product space, then

‖x + y‖² + ‖x − y‖² = (‖x‖² + 2 Re⟨x, y⟩ + ‖y‖²) + (‖x‖² + 2 Re⟨x, −y⟩ + ‖−y‖²)
                    = (‖x‖² + 2 Re⟨x, y⟩ + ‖y‖²) + (‖x‖² − 2 Re⟨x, y⟩ + ‖y‖²)
                    = 2(‖x‖² + ‖y‖²).

Verifying that the inner product can be written in terms of the norm through the polarisation identities is shown similarly.

Conversely, suppose X is a normed vector space satisfying the parallelogram law, that K = R, and define ⟨·, ·⟩ : X × X → R by

⟨x, y⟩ = ¼(‖x + y‖² − ‖x − y‖²).


Then ⟨x, x⟩ = ¼‖2x‖² = ‖x‖² ≥ 0, with equality if and only if x = 0. Also, ⟨x, y⟩ = ⟨y, x⟩ is easy to see. For any x, y, z ∈ X, two applications of the parallelogram law give

⟨x, y⟩ + ⟨x, z⟩ = ¼(‖x + y‖² + ‖x + z‖² − (‖x − y‖² + ‖x − z‖²))
               = ¼(½‖2x + y + z‖² + ½‖y − z‖² − ½‖2x − y − z‖² − ½‖y − z‖²)
               = ¼((2²/2)‖x + (y + z)/2‖² − (2²/2)‖x − (y + z)/2‖²)
               = 2⟨x, (y + z)/2⟩.

However, for any w ∈ X we have

⟨x, 2w⟩ = ¼(‖x + 2w‖² − ‖x − 2w‖²)
        = ¼(‖(x + w) + w‖² − ‖(x − w) − w‖²)
        = ¼(2(‖x + w‖² + ‖w‖²) − ‖x‖² − (2(‖x − w‖² + ‖w‖²) − ‖x‖²))
        = 2⟨x, w⟩.

Hence we have ⟨x, y⟩ + ⟨x, z⟩ = ⟨x, y + z⟩, i.e. the map is additive in the second argument. Now for all m, n ∈ N we get

m⟨x, y⟩ = ⟨x, my⟩ = ⟨x, n·(m/n)y⟩ = n⟨x, (m/n)y⟩,

so that q⟨x, y⟩ = ⟨x, qy⟩ for all q ∈ Q, q > 0. But it follows readily from the definition that ⟨x, 0⟩ = 0, and that ⟨x, −y⟩ = −⟨x, y⟩, and so we now get

q⟨x, y⟩ = ⟨x, qy⟩   ∀ q ∈ Q, x, y ∈ X.

Finally, using density of Q in R, and continuity of norms, we can replace q ∈ Q by any λ ∈ R, giving linearity in the second argument. A similar argument works if K = C.
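The complex polarisation identity from Theorem 3.5 can be checked numerically; a hedged sketch (not from the notes; the vectors and helper names are my own choices), using the notes' convention that the inner product is conjugate linear in the first argument:

```python
# ⟨x, y⟩ = ¼ Σ_{n=0}^{3} (1/i^n) ‖x + i^n y‖² on C^2, where
# ⟨x, y⟩ = Σ conj(x_k) y_k and ‖x‖² = ⟨x, x⟩.

def inner(x, y):
    return sum(a.conjugate() * b for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def polarise(x, y):
    total = 0
    for n in range(4):
        shifted = [a + (1j ** n) * b for a, b in zip(x, y)]
        total += (1j ** -n) * norm_sq(shifted)
    return total / 4

x = [1 + 2j, -1j]
y = [0.5, 3 - 1j]
assert abs(polarise(x, y) - inner(x, y)) < 1e-9
```

Expanding ‖x + iⁿy‖² = ‖x‖² + ‖y‖² + 2 Re(iⁿ⟨x, y⟩) and summing against 1/iⁿ kills the norm terms and reassembles the real and imaginary parts of ⟨x, y⟩, which is what the assertion confirms.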

Example 3.6. The normed vector spaces (Kⁿ, ‖ · ‖_1), (Kⁿ, ‖ · ‖_∞), (C[a, b], ‖ · ‖_1) and (C[a, b], ‖ · ‖_∞) are not inner product spaces. For example if we take x = (1, 0, 0, . . . , 0) and y = (0, 1, 0, . . . , 0) in Kⁿ then

‖x + y‖_1² + ‖x − y‖_1² = ‖(1, 1, 0, . . . , 0)‖_1² + ‖(1, −1, 0, . . . , 0)‖_1² = 2 × 2² = 8,

whereas 2(‖x‖_1² + ‖y‖_1²) = 2(1² + 1²) = 4. Similar calculations work for the other cases.

Theorem 3.7. If (X, ‖ · ‖) is a normed vector space then d(x, y) := ‖x − y‖ defines a metric on X that satisfies

(i) d(x + z, y + z) = d(x, y) ∀ x, y, z ∈ X;

(ii) d(λx, λy) = |λ|d(x, y) ∀ x, y ∈ X, λ ∈ K.

Conversely, if X is a vector space over K and d is a metric on X that satisfies (i) and (ii), then defining ‖x‖ := d(x, 0) turns X into a normed vector space.


Proof. If X is a normed vector space then

d(x, y) = ‖x − y‖ ≥ 0, with equality iff x − y = 0 ⇔ x = y;

d(x, y) = ‖x − y‖ = ‖(−1)(y − x)‖ = |−1|‖y − x‖ = ‖y − x‖ = d(y, x); and

d(x, y) = ‖x − y‖ = ‖(x − z) + (z − y)‖ ≤ ‖x − z‖ + ‖z − y‖ = d(x, z) + d(z, y),

so that d really is a metric on X. Moreover,

d(x + z, y + z) = ‖(x + z) − (y + z)‖ = ‖x − y‖ = d(x, y); and

d(λx, λy) = ‖λx − λy‖ = ‖λ(x − y)‖ = |λ|‖x − y‖ = |λ|d(x, y).

The converse, starting from a vector space equipped with a metric that satisfies (i) and (ii), is proved similarly.

Since a norm induces a metric, we can talk about open and closed balls: if (X, ‖ · ‖) is a normed vector space then we will write

B(x, r) := {y ∈ X : d(x, y) < r} = {y ∈ X : ‖x − y‖ < r}   and
B̄(x, r) := {y ∈ X : ‖x − y‖ ≤ r},

the open (respectively closed) ball of radius r and with centre x. In particular B(0, 1) is the open unit ball, and B̄(0, 1) is the closed unit ball. Note that

y ∈ B(x, r) ⇔ ‖y − x‖ < r ⇔ ‖(y − x)/r‖ < 1 ⇔ (y − x)/r ∈ B(0, 1) ⇔ y ∈ x + rB(0, 1),

so that the ball B(x, r) is obtained from the open unit ball by scaling and translation. Here we have used some standard notation for operations defined on a subset A of a vector space V: given v ∈ V and λ ∈ K we set

v + A := {v + w : w ∈ A},  λA := {λw : w ∈ A},

i.e. the subsets obtained from A by translating all vectors in A by v, or, respectively, scaling all vectors in A by λ.

Definition 3.8. Let X be a normed vector space. A subset A ⊂ X is bounded if there is some K > 0 such that ‖x‖ ≤ K for all x ∈ A, i.e. A ⊂ B̄(0, K).

Proposition 3.9. Let X be a normed vector space.

(i) Let (x_n)_{n=1}^∞ be a sequence in X, x ∈ X. Then x_n → x ⇔ x_n − x → 0 (in X) ⇔ ‖x_n − x‖ → 0 (in R).

(ii) The map x ↦ ‖x‖ from X to R is continuous.

(iii) If (x_n), (y_n) ⊂ X are sequences with x_n → x and y_n → y then x_n + y_n → x + y. If (λ_n) ⊂ K with λ_n → λ then λ_n x_n → λx.


Proof. (i) By definition

x_n → x in X ⇔ d(x_n, x) → 0 in R
            ⇔ d(x_n − x, 0) = ‖x_n − x‖ → 0 in R (by translation invariance)
            ⇔ x_n − x → 0 in X.

(ii) Follows from the reverse triangle inequality: |‖x‖ − ‖y‖| ≤ ‖x − y‖.

(iii) Fix ε > 0. There is some N such that d(x_n, x) < ε/2 and d(y_n, y) < ε/2 for all n ≥ N. But then

d(x_n + y_n, x + y) = ‖(x_n + y_n) − (x + y)‖ = ‖(x_n − x) + (y_n − y)‖
                    ≤ ‖x_n − x‖ + ‖y_n − y‖ < ε/2 + ε/2 = ε

for all n ≥ N, as required. Scalar multiplication follows similarly.

Part (iii) shows that the maps X × X ∋ (x, y) ↦ x + y ∈ X and K × X ∋ (λ, x) ↦ λx ∈ X are continuous (when X × X and K × X are given their product topology, generated by many different natural metrics on these cartesian products), i.e. the metric and (linear) algebraic aspects of X are in sync. More is true when we specialise to inner product spaces.

Corollary 3.10. If X is an inner product space then the map ⟨·, ·⟩ : X × X → K is continuous.

Proof. Suppose (x_n, y_n) → (x, y) in X × X, i.e. x_n → x and y_n → y. Then, by the triangle inequality (in K) and the Cauchy–Schwarz inequality,

|⟨x_n, y_n⟩ − ⟨x, y⟩| = |⟨x_n − x, y_n − y⟩ + ⟨x, y_n − y⟩ + ⟨x_n − x, y⟩|
                     ≤ ‖x_n − x‖‖y_n − y‖ + ‖x‖‖y_n − y‖ + ‖x_n − x‖‖y‖ → 0

as n → ∞.

We will need to spend some time dealing with subspaces of normed vector spaces in various guises. Recall the following:

Definition 3.11. (a) A subset Y of a vector space X is a subspace if x + λy ∈ Y for all x, y ∈ Y and λ ∈ K. Equivalently, Y is closed under addition and scaling.

(b) A subset Y of a metric space X is closed if X \ Y is open; equivalently Y is closed if whenever (x_n)_{n=1}^∞ ⊂ Y is a convergent sequence then the limit, x say, must belong to Y.

(c) If Y is a subset of a metric space X then its closure, denoted Y̅, is the smallest closed subset of X that contains Y. Note that y ∈ Y̅ if and only if there is some sequence (y_n)_{n=1}^∞ ⊂ Y such that y = lim_{n→∞} y_n, equivalently if and only if for every ε > 0 there is some y′ ∈ Y such that d(y, y′) < ε.

(d) A subset Y of a metric space X is dense if X = Y̅.

Since a normed vector space is both a vector space and a metric space, it makes sense to talk about closed subspaces of X. We will show shortly that every finite-dimensional subspace of a normed vector space is always closed, but the same need not be true of infinite-dimensional subspaces.


Example 3.12. Let c00 denote the set of all eventually zero sequences. That is,

x = (xn)∞n=1 ∈ c00 ⇔ ∃N = Nx ∈ N such that xn = 0 ∀n > N.

This is a subset of l2 since

x ∈ c00 ⇒ x = (x1, . . . , xN, 0, 0, . . .) for some N ⇒ ∑∞n=1 |xn|² = ∑Nn=1 |xn|² < ∞.

Moreover it is a subspace: if x, y ∈ c00 then we can choose Nx, Ny such that xn = 0 for all n > Nx and yn = 0 for all n > Ny. If we set N = max{Nx, Ny} then for all n > N we have

(x + λy)n = xn + λyn = 0 + λ × 0 = 0,

so that x + λy ∈ c00.

However, it is not closed as a subspace of l2. Indeed, consider the sequence

x = (1, 1/2, 1/3, . . .) = (1/n)∞n=1,

which is in l2 since ∑∞n=1 n^{−2} < ∞, but is clearly not in c00. However, for each N ∈ N let

x(N) = (1, 1/2, . . . , 1/N, 0, 0, . . .) ∈ c00.

Then (x(N))∞N=1 ⊂ c00 is a sequence (of sequences!) and, moreover,

‖x − x(N)‖2 = ‖(0, 0, . . . , 0, 1/(N + 1), 1/(N + 2), . . .)‖2 = (∑∞n=N+1 1/n²)^{1/2} → 0

as N → ∞, since we have the tail of a convergent series. Thus the sequence in c00 converges to a point outside, and so c00 is not closed.

It is not hard to show, however, that c00 is a dense subspace of l2, i.e. its closure c̅00 is all of l2.
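This density can be seen numerically: the l2 distance from x = (1/n) to its truncation x(N) is the tail (∑n>N 1/n²)^{1/2}, which shrinks as N grows. A small sketch (the infinite tail is approximated by a large finite sum, and the helper name is ours, not from the notes):

```python
import math

def l2_tail_norm(N, terms=10**6):
    # ‖x - x(N)‖_2 for x = (1/n): the tail sum over n > N of 1/n^2,
    # approximated here by a large finite sum
    return math.sqrt(sum(1.0 / n**2 for n in range(N + 1, terms)))

errs = [l2_tail_norm(N) for N in (1, 10, 100, 1000)]
print(errs)  # strictly decreasing towards 0
```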

Proposition 3.13. Let X be a normed vector space and A ⊂ X a subset of X.

(a) There is precisely one subspace of X, called the linear span of A (denoted LinA), that satisfies any of the following equivalent properties:

(i) LinA is the smallest subspace that contains A;

(ii) LinA is the intersection of all subspaces that contain A;

(iii) x ∈ LinA if and only if x = ∑ki=1 αisi for some k ∈ N, αi ∈ K and si ∈ A.

(b) There is precisely one subspace of X, called the closed linear span of A (denoted L̅i̅n̅A), that satisfies any of the following equivalent properties:

(i) L̅i̅n̅A is the smallest closed subspace that contains A;

(ii) L̅i̅n̅A is the intersection of all closed subspaces that contain A;

(iii) L̅i̅n̅A is the closure of LinA, i.e. x ∈ L̅i̅n̅A if and only if x = lim n→∞ xn for some sequence (xn)∞n=1 ⊂ LinA.


Proof. Exercise.

Remarks. (i) Note that X is always a subspace of itself, and always closed, so X is always a closed subspace that contains A, no matter what we choose for A. Hence there is always at least one subspace in these intersections.

(ii) Each vector xn in (b)(iii) can be written as xn = ∑kn i=1 α(n)i s(n)i, i.e. as a finite linear combination of vectors from A, but the number kn of terms is allowed to vary between the different xn.

(iii) For Example 3.12, if we write

δ(n) := (0, 0, . . . , 0, 1, 0, . . .),

where the 1 is in the nth position, and D = {δ(n) : n ∈ N}, then it follows that LinD = c00, and so L̅i̅n̅D = l2. That is, the linear span of the sequence of δ(n) gives a dense subspace of l2.

Example 3.14. Consider the space C[0, 1] of continuous real-valued functions on [0, 1]. A subspace of C[0, 1] is the subspace P of all polynomials. One function that does not belong to P is f(x) = e^x. However, this function has a power series representation:

f(x) = ∑∞n=0 x^n/n!,

and this series converges uniformly on any bounded interval of R since the radius of convergence is infinite. What does this say in terms of norms? Suppose we give C[0, 1] the supremum norm ‖ · ‖∞, and write fk(x) = ∑kn=0 x^n/n!; then

‖f − fk‖∞ = sup x∈[0,1] |∑∞n=k+1 x^n/n!| → 0

as k → ∞, i.e. f ∈ P̅, the closure of P, with respect to this norm.
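The uniform convergence can be checked numerically by estimating ‖f − fk‖∞ on a grid (a sketch only: a grid maximum underestimates the true supremum, and the helper names are ours):

```python
import math

def sup_dist(k, samples=1000):
    # grid estimate of ‖f - f_k‖_∞ on [0,1], where f(x) = e^x and
    # f_k is the degree-k partial sum of the exponential series
    def fk(x):
        return sum(x**n / math.factorial(n) for n in range(k + 1))
    xs = (i / samples for i in range(samples + 1))
    return max(abs(math.exp(x) - fk(x)) for x in xs)

print([sup_dist(k) for k in (1, 3, 5, 10)])  # rapidly decreasing
```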

Example 3.15. In the previous example we saw that the exponential function belongs to the closure of the subspace P of polynomials, with respect to the supremum norm. In fact more is true courtesy of the Weierstrass Approximation Theorem (Theorem 7.8): P is in fact a dense subspace of (C[a, b], ‖ · ‖∞) for any a < b. From this it is not hard to show that P is also dense in (C[a, b], ‖ · ‖1) and (C[a, b], ‖ · ‖2).

The definition of linear independence for a not necessarily finite-dimensional vector space is based on the finite-dimensional version.

Definition 3.16. Let V be a vector space over K. A subset A ⊂ V is linearly independent if for every finite subset F = {v1, . . . , vn} ⊂ A we have

∑ni=1 αivi = 0 ⇒ α1 = · · · = αn = 0.

That is, every finite subset of A is linearly independent in the MA2055 sense.


Example 3.17. The subset {δ(n)} of l2 and the subset {fn ∈ C[a, b] : fn(t) = t^n} are both easily seen to be linearly independent, but their spans are not all of l2 and C[a, b] respectively, so these are not bases in the usual sense. However, they do have dense linear spans, which is nearly what the usual definition requires.

Definition 3.18. Let X and Y be normed vector spaces. They are isometrically isomorphic if there is a linear map U : X → Y that is onto, and such that ‖Ux‖ = ‖x‖ for all x ∈ X (i.e. U is an isometry, since it preserves lengths).

Proposition 3.19. Suppose that U : X → Y is a linear isometry. Then U is injective.

Proof. If x, x′ ∈ X such that Ux = Ux′ then

0 = ‖Ux− Ux′‖ = ‖U(x− x′)‖ = ‖x− x′‖

and so x = x′, i.e. U is injective.

Thus any surjective linear isometry U is bijective, and so invertible. Moreover, U−1 is then automatically linear and isometric.

If X is a finite-dimensional vector space, and T : X → X is a linear map, then the Rank–Nullity Formula states:

dimX = dim RanT + dim KerT.

It follows that

T onto ⇔ RanT = X ⇔ dim RanT = dimX ⇔ dim KerT = 0 ⇔ KerT = {0} ⇔ T one-to-one,

in which case T is bijective. The same is not true in infinite dimensions; for example let X = l2 and consider the map T : l2 → l2 given by

T(x1, x2, x3, . . .) = (0, x1, x2, x3, . . .).

The map T is known as the right-shift operator. It is easy to see that T is linear, and that T is isometric (‖Tx‖ = ‖x‖ for every x), hence T is injective, but it is not surjective (e.g. (1, 0, 0, . . .) ∉ RanT), hence not invertible.

Proposition 3.20. Let X and Y be inner product spaces, and U : X → Y a linear map. Then U is isometric if and only if it preserves inner products, that is

〈Ux,Uy〉 = 〈x, y〉 ∀x, y ∈ X.

Proof. Exercise: one way is very easy, the other uses polarisation.

Example 3.21. The spaces (R2, ‖ · ‖1) and (R2, ‖ · ‖∞) from Example 3.3 are isometrically isomorphic. One way to see this is to draw their closed unit balls. These are both squares, so there is an invertible linear map that takes one square onto the other; this map will define an isometry (check!).


Definition 3.22. Let X be a vector space, and let ‖ · ‖1 and ‖ · ‖2 be norms on X. The norms are called equivalent if there are constants 0 < a ≤ b such that

a‖x‖1 ≤ ‖x‖2 ≤ b‖x‖1

for all x ∈ X.

Example 3.23. The norms ‖·‖1, ‖·‖2 and ‖·‖∞ on Kn are all equivalent. Indeed, it can be shown that

‖x‖∞ ≤ ‖x‖2 ≤ ‖x‖1 ≤ √n ‖x‖2 ≤ n‖x‖∞

for all x ∈ Kn.
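These inequalities are easy to spot-check numerically; the sketch below tests the whole chain on random vectors in R^7, with a small tolerance for floating-point rounding:

```python
import math, random

def norms(x):
    # (‖x‖_1, ‖x‖_2, ‖x‖_∞) for a real vector x
    return (sum(abs(t) for t in x),
            math.sqrt(sum(t * t for t in x)),
            max(abs(t) for t in x))

random.seed(0)
n, eps, ok = 7, 1e-9, True
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(n)]
    n1, n2, ninf = norms(x)
    ok = ok and (ninf <= n2 + eps and n2 <= n1 + eps
                 and n1 <= math.sqrt(n) * n2 + eps
                 and math.sqrt(n) * n2 <= n * ninf + eps)
print(ok)
```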

Proposition 3.24. The relation of equivalence of norms defines an equivalence relation on the set of all norms on a vector space X.

Proof. Exercise.

Proposition 3.25. Let ‖ · ‖1 and ‖ · ‖2 be equivalent norms on a vector space X, and let (xn)∞n=1 be a sequence in X. The sequence converges with respect to ‖ · ‖1 if and only if it converges with respect to ‖ · ‖2, in which case the limit is the same in both cases.

Proof. Suppose that (xn)∞n=1 converges to x with respect to ‖ · ‖1; then

0 ≤ ‖x − xn‖2 ≤ b‖x − xn‖1 → b × 0 = 0

as n → ∞, and so (xn)∞n=1 converges to x with respect to ‖ · ‖2. Conversely, if (xn)∞n=1 converges to x with respect to ‖ · ‖2, then

0 ≤ ‖x − xn‖1 ≤ a−1‖x − xn‖2 → a−1 × 0 = 0

as n → ∞, and so (xn)∞n=1 converges to x with respect to ‖ · ‖1.

Example 3.26. Returning to C[0, 1], note that for any g ∈ C[0, 1] we have

|g(t)| ≤ sup x∈[0,1] |g(x)| = ‖g‖∞

for all t ∈ [0, 1]. It follows that

‖g‖1 = ∫₀¹ |g(t)| dt ≤ ∫₀¹ ‖g‖∞ dt = ‖g‖∞.

Consequently f(x) = e^x is in the closure of P, the subspace of polynomials, with respect to ‖ · ‖1 since

‖f − fk‖1 ≤ ‖f − fk‖∞ → 0 as k → ∞,

where fk(x) = ∑kn=0 x^n/n! is the truncation of the Maclaurin series.

However, it can be shown that there is no constant a > 0 such that

a‖g‖∞ ≤ ‖g‖1 ∀g ∈ C[0, 1],

and so the norms ‖ · ‖1 and ‖ · ‖∞ are not equivalent. A similar result is true if we try to compare the norms ‖ · ‖∞ and ‖ · ‖2.


Corollary 3.27. If ‖ · ‖1 and ‖ · ‖2 are equivalent norms on a vector space X then any subset A ⊂ X is open with respect to ‖ · ‖1 if and only if it is open with respect to ‖ · ‖2.

Proof. We have the following equivalences:

A open w.r.t. ‖ · ‖1 ⇔ X \ A closed w.r.t. ‖ · ‖1
⇔ if (xn)∞n=1 ⊂ X \ A converges to x w.r.t. ‖ · ‖1 then the limit x ∈ X \ A
⇔ if (xn)∞n=1 ⊂ X \ A converges to x w.r.t. ‖ · ‖2 then the limit x ∈ X \ A
⇔ X \ A closed w.r.t. ‖ · ‖2
⇔ A open w.r.t. ‖ · ‖2,

where the middle equivalence is Proposition 3.25.

Exercise 3.28. Construct another proof based on nesting of open balls defined with respect to the two norms, where the radii depend on the constants a and b.

Corollary 3.29. If ‖ · ‖1 and ‖ · ‖2 are equivalent norms on a vector space X then any subset A ⊂ X is bounded with respect to ‖ · ‖1 if and only if it is bounded with respect to ‖ · ‖2.

Proof. Exercise.

Corollary 3.30. If ‖ · ‖1 and ‖ · ‖2 are equivalent norms on a vector space X and f : X → Y is a map into another metric space Y, then f is continuous with respect to ‖ · ‖1 if and only if it is continuous with respect to ‖ · ‖2.

Proof. Exercise.

Next, an important result that says that from a topological point of view it is immaterial which norm we put on a finite-dimensional vector space.

Theorem 3.31. All norms on a finite-dimensional vector space are equivalent.

Proof. Let X be a vector space with dimX = n, and pick a basis e = {e1, . . . , en} of X. Consider the invertible linear map T : Kn → X given by Tα := ∑ni=1 αiei, where α = (α1, . . . , αn). This allows us to define a norm on X by

‖x‖e := ‖T−1x‖ = (∑ni=1 |αi|²)^{1/2} if x = ∑ni=1 αiei.

Now take any other norm ‖·‖X on X. The maps fi : α ↦ αiei are all continuous (Kn, ‖ · ‖2) → (X, ‖ · ‖X), being compositions of the map Kn ∋ α ↦ αi ∈ K and the map K ∋ λ ↦ λei. Hence so is the map

F : Kn → [0, ∞), F(α) = ‖∑ni=1 αiei‖X = ‖Tα‖X,

since it is the sum of f1, . . . , fn, composed with the (new) norm. The unit sphere S := {α ∈ Kn : ‖α‖2 = 1} ⊂ Kn is closed and bounded, hence compact (by the Heine–Borel Theorem), so its image under F is compact, thus a := inf α∈S F(α) and b := sup α∈S F(α) exist and are attained. In particular, since α ≠ 0 for all α ∈ S, F(α) > 0 for all α ∈ S, and so 0 < a ≤ b. But from this we get

a ≤ ‖Tα‖X ≤ b ∀α ∈ S
⇒ a‖β‖2 = a‖Tβ‖e ≤ ‖Tβ‖X ≤ b‖β‖2 = b‖Tβ‖e ∀β ∈ Kn.

The latter implication follows by writing α = β/‖β‖2 to get α ∈ S when β ≠ 0, and noting that when β = 0 we have equality throughout the last line.

Thus every norm on X is equivalent to ‖ · ‖e, and the result now follows since equivalence of norms is an equivalence relation.

Recall that a subset A of a metric space X is compact if every sequence (xn)∞n=1 ⊂ A has a subsequence that is convergent to some point in A. It follows that every compact set must be closed and bounded, but the converse is not always true.

Example 3.32. Consider the closed unit ball B̅(0, 1) of l2. This closed and bounded subset is not compact. To see this consider the sequence (δ(n))∞n=1 ⊂ B̅(0, 1). (The inclusion follows since ‖δ(n)‖2 = 1 for all n.) We have for any m < n

‖δ(m) − δ(n)‖2 = ‖(0, . . . , 0, 1, 0, . . . , 0, −1, 0, . . .)‖2 = √(1² + (−1)²) = √2.

Consequently there cannot be any subsequence that is convergent, since no subsequence can be Cauchy, and so the set is not compact.
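The computation with the vectors δ(n) can be reproduced in finitely many coordinates (a sketch; truncating δ(n) to a finite vector loses nothing here, since only finitely many entries are nonzero):

```python
import math

def delta(n, dim):
    # δ(n) truncated to its first `dim` coordinates: 1 in the nth place
    v = [0.0] * dim
    v[n - 1] = 1.0
    return v

def l2_dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

d = l2_dist(delta(2, 10), delta(7, 10))
print(d)  # √2, independently of the pair m ≠ n chosen
```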

Corollary 3.33. If X is a finite-dimensional normed vector space then a subset A is compact if and only if it is closed and bounded.

Proof. This follows since the given norm ‖ · ‖X is equivalent to the norm ‖ · ‖e from the proof of Theorem 3.31, together with the following equivalences:

(i) A is closed and bounded in (X, ‖ · ‖X) if and only if it is closed and bounded in (X, ‖ · ‖e), by Corollaries 3.27 and 3.29;

(ii) Since (X, ‖ · ‖e) is isometrically isomorphic to (Kn, ‖ · ‖2) by construction, A is compact with respect to ‖ · ‖e if and only if it is closed and bounded (by the Heine–Borel Theorem);

(iii) A is compact in (X, ‖ · ‖X) if and only if it is compact in (X, ‖ · ‖e), by Proposition 3.25.


4 Completeness and Convexity

Recall the following:

Definition 4.1. (a) A sequence (xn)∞n=1 in a metric space X is a Cauchy sequence if for every ε > 0 we can find some N ∈ N such that d(xm, xn) < ε for all m, n > N.

(b) A metric space is complete if every Cauchy sequence is convergent.

Recall that R and hence C are complete (for C consider the sequences of real and imaginary parts to reduce it to the case of R), but Q is not complete: for example there is a sequence (xn)∞n=1 ⊂ Q with xn → √2 ∉ Q, hence the sequence (xn)∞n=1 is Cauchy, but not convergent within Q.

Definition 4.2. A Banach space is a complete normed vector space. A Hilbert space is a complete inner product space.

In particular, since every inner product space is a normed vector space, it follows that every Hilbert space is a Banach space, but the converse is not true.

Example 4.3. The space Kn is a Banach space with respect to any norm on it. To prove this all one needs to do is prove it is complete with respect to one norm, then use Theorem 3.31 and Proposition 3.25. The main part of the proof is similar to the next example, which is more complicated.

Example 4.4. The space l2 is a Hilbert space. We already know from Example 2.9 that l2 is an inner product space, so it only remains to show that it is complete. So let (x(k))∞k=1 ⊂ l2 be a Cauchy sequence, i.e. ‖x(k) − x(j)‖2 → 0 as k, j → ∞. That is, we have a sequence of sequences:

x(1) = (x(1)1, x(1)2, x(1)3, . . .)
x(2) = (x(2)1, x(2)2, x(2)3, . . .)
x(3) = (x(3)1, x(3)2, x(3)3, . . .)

Now for any j, k, n ∈ N we have

|x(k)n − x(j)n|² ≤ ∑∞n=1 |x(k)n − x(j)n|² = ‖x(k) − x(j)‖2² → 0

as k, j → ∞. It follows that for each n the sequence (x(k)n)∞k=1 of components is a Cauchy sequence in the complete space K, and hence convergent to some xn ∈ K. Set x = (x1, x2, . . .) where xn = lim k→∞ x(k)n. We want to show x ∈ l2 and that x(k) → x with respect to ‖ · ‖2. Fix ε > 0. There is some K ∈ N such that ‖x(k) − x(j)‖2 < ε for all k, j > K. Then for any N ∈ N and k, j > K

∑Nn=1 |x(k)n − x(j)n|² ≤ ‖x(k) − x(j)‖2² ≤ ε².

Taking the limit j → ∞ gives

∑Nn=1 |x(k)n − xn|² ≤ ε² ∀k > K,

since we only have finitely many terms on the left-hand side. But since the above is true for all N ∈ N we get

∑∞n=1 |x(k)n − xn|² ≤ ε²,

that is, x(k) − x ∈ l2, hence x = x(k) − (x(k) − x) ∈ l2. Moreover ‖x(k) − x‖2 ≤ ε for each k > K, so that x(k) → x with respect to ‖ · ‖2.

Exercise 4.5. Consider the following two sets:

l1 := {(xn)∞n=1 : ∑∞n=1 |xn| < ∞},  l∞ := {(xn)∞n=1 : sup n≥1 |xn| < ∞}.

That is, l1 is the set of all summable sequences, and l∞ is the set of all bounded sequences. Show that both are vector spaces with respect to the same operations we defined on l2, and that both are Banach spaces with respect to the following norms:

‖x‖1 := ∑∞n=1 |xn| for x ∈ l1,  ‖x‖∞ := sup n≥1 |xn| for x ∈ l∞.

Furthermore show that neither is a Hilbert space.

Show that the following is a closed subspace of l∞:

c0 := {(xn)∞n=1 : lim n→∞ xn = 0},

the set of sequences that converge to 0. Thus show that it is a Banach space, but that it is also not a Hilbert space.

Finally, show that c00 is a dense subspace of c0, l1 and l2, but not a dense subspace of l∞. Hence show that c00 with any of the norms ‖ · ‖1, ‖ · ‖2 or ‖ · ‖∞ is not complete.

Example 4.6. The space C[a, b] is a Banach space when given the norm ‖ · ‖∞. To see this let (fn)∞n=1 be a Cauchy sequence in C[a, b]; then for each t ∈ [a, b] we have

|fn(t) − fm(t)| ≤ max s∈[a,b] |fn(s) − fm(s)| = ‖fn − fm‖∞ → 0 as m, n → ∞,

so that f(t) = lim n→∞ fn(t) exists, being the limit of the Cauchy sequence (fn(t))∞n=1. This defines a function f : [a, b] → K such that fn → f pointwise. In fact, imitating the technique of the previous example, it is not hard to show that the convergence is in fact uniform, which implies in particular that the limit function f is continuous, so belongs to C[a, b].

The above shows that C[a, b] is complete with respect to the supremum norm; however it is also not hard to show that it is not an inner product space, so this is an example of a Banach space that is not a Hilbert space.


Example 4.7. The space C[a, b] is not complete when given the norm ‖ · ‖2, a norm arising from an inner product. To prove this formally is actually quite fiddly, but consider the following illustrative example: let (fn)∞n=1 ⊂ C[−1, 1] be defined by

fn(x) = 0 if x ∈ [−1, 0),
        nx if x ∈ [0, 1/n),
        1 if x ∈ [1/n, 1].

Sketching these graphs it is clear that pointwise the sequence is convergent to the discontinuous function f = 1(0,1], the indicator function of the subset (0, 1], where

f(x) = 1(0,1](x) = 0 if x ∈ [−1, 0],
                   1 if x ∈ (0, 1].

Clearly f ∉ C[−1, 1], but this in itself is not an immediate problem; what we need to show is that there is no g ∈ C[−1, 1] such that ‖fn − g‖2 → 0 as n → ∞. This can be done.
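For this particular sequence the distances to the limit can be computed directly: ‖fn − f‖2² = ∫₀^{1/n} (nx − 1)² dx = 1/(3n) → 0. A numerical sketch (Riemann-sum approximation of the integral; helper names ours):

```python
import math

def fn(n, x):
    # the ramp function from the example, on [-1, 1]
    if x < 0:
        return 0.0
    return n * x if x < 1.0 / n else 1.0

def f(x):
    # pointwise limit: indicator of (0, 1]
    return 1.0 if x > 0 else 0.0

def l2_dist(n, samples=200000):
    # midpoint Riemann-sum estimate of ‖f_n − f‖_2 on [-1, 1]
    h = 2.0 / samples
    s = sum((fn(n, -1.0 + (i + 0.5) * h) - f(-1.0 + (i + 0.5) * h)) ** 2
            for i in range(samples))
    return math.sqrt(s * h)

print([l2_dist(n) for n in (1, 10, 100)])  # ≈ 1/√(3n), tending to 0
```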

Now, it appears that one possible solution is to enlarge C[−1, 1] and instead look at R[−1, 1], the vector space of Riemann-integrable functions [−1, 1] → K, to which f belongs, since the inner product makes sense on this space, and also ‖fn − f‖2 → 0. However, this space is still not complete.

The solution to this problem is to instead use the space L2[a, b] of Lebesgue-measurable functions f : [a, b] → K such that

∫ᵇₐ |f(t)|² dt < ∞,  with inner product 〈f, g〉 = ∫ᵇₐ f̄(t)g(t) dt.

This contains R[a, b], and hence C[a, b], plus additional functions. However, there is in fact one additional complication, which again can be illustrated using the sequence (fn)∞n=1 above. We noted that f = 1(0,1] is the pointwise limit of the fn, but consider the function h = 1[0,1] where we have added the left endpoint. Although fn(0) ↛ h(0), it is the case that

‖f − h‖2² = ∫¹₋₁ |f(t) − h(t)|² dt.

But

f(t) − h(t) = −1 if t = 0,
              0 if t ≠ 0,

which is a Riemann-integrable function that has integral 0, i.e. f − h ≠ 0 yet ‖f − h‖2 = 0. The solution to this is to define an equivalence relation on L2[a, b] by saying that

f ∼ g ⇔ ‖f − g‖2 = 0 ⇔ ∫ᵇₐ |f(t) − g(t)|² dt = 0,

which turns out to be equivalent to saying that f(t) and g(t) agree with one another except perhaps for some values t belonging to a subset of [a, b] of Lebesgue measure zero. The space L2[a, b] is then, officially, the vector space of equivalence classes, where we define

[f] + [g] = [f + g],  λ[f] = [λf]

if [f] denotes the equivalence class of f. In practice this distinction can be safely ignored in most circumstances.

A more abstract answer to the incompleteness problem of (C[a, b], ‖ · ‖2) is given by the following:

Theorem 4.8. Let X be an incomplete normed vector space. Then there is a Banach space Y and a linear isometry T : X → Y (i.e. ‖Tx‖ = ‖x‖ for all x ∈ X) such that RanT = {Tx : x ∈ X} is a dense subspace of Y.

The space Y is called the completion of X; it is not unique, but if Y′ is any other Banach space that satisfies the conditions of Theorem 4.8 for some linear map T′ : X → Y′ then there is a linear, length-preserving, invertible map S : Y → Y′ such that ST = T′. In particular Y and Y′ are isometrically isomorphic.

Remark. Although Theorem 4.8 says that any incomplete normed vector space can be viewed as a dense subspace of a Banach space, and this can in particular be applied to the incomplete space (C[a, b], ‖ · ‖2), it is usually advantageous to use the concrete realisation of this completion given by the space L2[a, b].

An alternative viewpoint on completeness comes from looking at series. If X is a normed vector space and xi ∈ X, i ∈ N, then the series ∑∞i=1 xi is convergent if the sequence (sN := ∑Ni=1 xi)∞N=1 of partial sums is convergent; it is absolutely convergent if the series of nonnegative numbers satisfies ∑∞i=1 ‖xi‖ < ∞.

Proposition 4.9. Let X be a normed vector space. It is a Banach space if and only if every absolutely convergent series is convergent.

Proof. Suppose X is a Banach space, and let ∑∞i=1 xi be an absolutely convergent series. For N > M we have, using the triangle inequality,

‖sN − sM‖ = ‖∑Ni=1 xi − ∑Mi=1 xi‖ = ‖∑Ni=M+1 xi‖ ≤ ∑Ni=M+1 ‖xi‖ → 0

as M, N → ∞, since ∑∞i=1 ‖xi‖ < ∞. Thus (sN)∞N=1 is Cauchy, hence convergent, as required.

Suppose, instead, that every absolutely convergent series is convergent, and let (xn)∞n=1 be a Cauchy sequence. Then we can choose N1 < N2 < · · · such that ‖xn − xm‖ < 2^{−r} for all n, m > Nr. Now set yr = x_{N_{r+1}} − x_{N_r}, so in particular ‖yr‖ ≤ 2^{−r}, and hence ∑∞r=1 ‖yr‖ ≤ 1 < ∞, by the Comparison Test and basic facts about geometric series. So there is some y ∈ X such that

∑Kr=1 yr = ∑Kr=1 (x_{N_{r+1}} − x_{N_r}) = x_{N_{K+1}} − x_{N_1} → y

as K → ∞. But then lim K→∞ x_{N_K} = y + x_{N_1}, i.e. the Cauchy sequence (xn) has a convergent subsequence, and so must itself be convergent (also to y + x_{N_1}).


Completeness is a crucial ingredient in the next result.

Definition 4.10. A subset C of a vector space V over K is convex if whenever x, y ∈ C and t ∈ [0, 1] then tx + (1 − t)y ∈ C. Note that tx + (1 − t)y = y + t(x − y), so this just says that the line segment from y to x is contained in C for any pair x, y ∈ C.

Example 4.11. Let Ω = {1, . . . , n}. A probability measure/distribution on Ω is a collection of n numbers p1, . . . , pn with the properties

pi ≥ 0 ∀i, and ∑ni=1 pi = 1.

The interpretation is that pi is the probability of outcome i occurring. Note that we can think of this as a vector in Rn by writing p = (p1, . . . , pn). If P is then the set of all probability measures on Ω then it is a convex subset (exercise), but not a subspace. In this case a general vector x ∈ Rn corresponds to a general (signed) measure on Ω.
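The convexity claim is easy to verify in code: a convex combination of two probability vectors again has nonnegative entries summing to 1 (a sketch with hypothetical data):

```python
def is_prob(p, tol=1e-12):
    # membership test for the set P of probability measures on {1,...,n}
    return all(pi >= -tol for pi in p) and abs(sum(p) - 1.0) < tol

p = [0.2, 0.5, 0.3]
q = [0.7, 0.1, 0.2]
t = 0.4
mix = [t * a + (1 - t) * b for a, b in zip(p, q)]  # tp + (1 - t)q
print(mix, is_prob(mix))
```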

Theorem 4.12. Let X be an inner product space, and C a complete, convex subset. For each x ∈ X there is a unique point Px ∈ C such that

‖x − Px‖ = dist(x, C) := inf{‖x − z‖ : z ∈ C}.

Proof. Put d = dist(x, C); then by definition there is a sequence (zn) ⊂ C such that ‖x − zn‖ → d. We can apply the parallelogram law to get

0 ≤ ‖zm − zn‖² = ‖(x − zn) − (x − zm)‖²
= 2(‖x − zn‖² + ‖x − zm‖²) − ‖2x − zn − zm‖²
= 2(‖x − zn‖² + ‖x − zm‖²) − 2²‖x − (zn + zm)/2‖²
≤ 2(‖x − zn‖² + ‖x − zm‖²) − 4d²

since ½(zn + zm) ∈ C by convexity. But now

2(‖x − zn‖² + ‖x − zm‖²) − 4d² → 2(d² + d²) − 4d² = 0 ⇒ ‖zm − zn‖² → 0

as m, n → ∞, so that the sequence (zn) is a Cauchy sequence in the complete space C, hence is convergent to some limit z ∈ C. Moreover

‖x − z‖ = lim n ‖x − zn‖ = d

by continuity of norms.

If y ∈ C is such that ‖x − y‖ = d then

‖z − y‖² = 2(‖x − y‖² + ‖x − z‖²) − ‖(x − y) + (x − z)‖²
= 2(d² + d²) − 2²‖x − (y + z)/2‖² ≤ 4d² − 4d² = 0,

and so z = y, i.e. the nearest point is unique.
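For a concrete closed convex set the nearest point can often be written down. A sketch in R² with the Euclidean inner product, where C is the closed unit ball and the projection simply rescales points outside the ball (this formula for the ball is our illustration, not something derived in the notes):

```python
import math

def project_ball(x):
    # nearest point of the closed unit ball to x: Theorem 4.12 guarantees
    # existence and uniqueness; for the ball it is x / max(1, ‖x‖)
    nx = math.sqrt(sum(t * t for t in x))
    return list(x) if nx <= 1.0 else [t / nx for t in x]

x = [3.0, 4.0]                 # ‖x‖ = 5
Px = project_ball(x)           # the unique nearest point (0.6, 0.8)
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, Px)))
print(Px, dist)                # distance 5 − 1 = 4
```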


Particular examples of convex sets are subspaces; moreover, if X is a Hilbert space, i.e. a complete inner product space, then closed subsets are complete, and vice versa.

Example 4.13. Let X = C[0, 1] equipped with the supremum norm ‖f‖∞ = max t∈[0,1] |f(t)|. Then X is a Banach space and Y = {f ∈ X : f(0) = 0} is a closed subspace (exercise). Thus Y is a complete, convex subset of X.

Now consider the constant function g(t) ≡ 1. For any f ∈ Y we have

‖g − f‖∞ = sup t∈[0,1] |g(t) − f(t)| ≥ |g(0) − f(0)| = 1,

i.e. dist(g, Y) ≥ 1. But f1(t) ≡ 0 satisfies ‖g − f1‖∞ = sup t∈[0,1] |1 − 0| = 1. That is, the minimum distance is 1, and it is attained.

However, f0(t) = t is in Y, and ‖g − f0‖∞ = sup t∈[0,1] |1 − t| = 1 as well, so there is not a unique nearest point to g, and hence X cannot be a Hilbert space. In fact, ‖g − fα‖∞ = 1 for uncountably many fα ∈ Y: we can define fα ∈ Y by

fα(t) = 0 if 0 ≤ t ≤ α,
        (t − α)/(1 − α) if α ≤ t ≤ 1,

for each α ∈ [0, 1).

Approximation theory involves picking simpler objects that are in some sense close to more complicated objects of interest. The following result works under different assumptions to Theorem 4.12, and the resulting point is not always so easy to compute.

Lemma 4.14. Let X be a metric space and A ⊂ X a nonempty, compact subset. For each x ∈ X there is some xA ∈ A such that

d(x, xA) = inf{d(x, y) : y ∈ A} =: dist(x, A).

That is, there is a closest point in A to x.

Proof. We can choose a sequence (xn)∞n=1 in A such that d(x, xn) → dist(x, A) as n → ∞. But A is compact, so this sequence in A must have a convergent subsequence, (xnk)∞k=1 say. If we set xA = lim k→∞ xnk then

d(x, xA) = d(x, lim k→∞ xnk) = lim k→∞ d(x, xnk) = dist(x, A)

as required, using continuity of the metric.

Example 4.15. Let X = R with the usual metric, and consider the subsets A = [0, 1], B = [0, 1) and C = [0, ∞). Then A is compact (it is closed and bounded); if x > 1 then xA = 1, if x < 0 then xA = 0, and if x ∈ A then xA = x.

On the other hand, if we take x = 2 then there is no nearest point in B to x. Here, B is not compact, since it is not closed. However, for each x ∈ R there is a nearest point xC ∈ C, even though C is not compact, since it is not bounded.


Example 4.16. Let X = R with the discrete metric:

d(x, y) = 0 if x = y,
          1 if x ≠ y.

Then a subset A is compact if and only if it is finite. For example A = {0, 1} is compact. If we take x = 2, then d(0, 2) = d(1, 2) = 1, and so we have a non-unique choice for xA, since both points in A are the same distance from x.

However, problems in approximation theory create difficulties with applying Lemma 4.14 directly. For example, if X = C[a, b], and we choose some continuous map f ∈ X, then we may want to approximate f by a polynomial, or by a piecewise linear function, or similar, and such sets of functions are rarely compact. For example if P ⊂ X is the set of polynomials, then it is a subspace and, for each f ∈ X and ε > 0, the Weierstrass Approximation Theorem says that there is some p ∈ P such that ‖f − p‖∞ < ε. But is there a best approximation? For computational reasons we may seek to limit the degree, i.e. work with Pn, the set of polynomials of degree no more than n, which is a finite-dimensional subspace of X, but not bounded, so not compact.

Proposition 4.17. Let X be a normed vector space and A ⊂ X a finite-dimensional subspace. Then for each x ∈ X there is some xA ∈ A such that

‖x − xA‖ = inf{‖x − y‖ : y ∈ A}.

Proof. Fix x ∈ X, and consider the following closed ball in A:

B := {y ∈ A : ‖y‖ ≤ 2‖x‖} = 2‖x‖ B̅A(0, 1).

Now B is compact by Corollary 3.33, since A is finite dimensional, and thus we can choose xA ∈ B such that

‖x − xA‖ = inf{‖x − y‖ : y ∈ B}.

But now consider any z ∈ A \ B. We must have ‖z‖ > 2‖x‖, and so

‖x − z‖ ≥ ‖z‖ − ‖x‖ > 2‖x‖ − ‖x‖ = ‖x‖.

However, since 0 ∈ B, we have

‖x − xA‖ ≤ ‖x − 0‖ = ‖x‖ < ‖x − z‖,

and so in fact xA is a nearest point in all of A to x.


5 Orthogonality

Definition 5.1. Let X be an inner product space. Vectors x, y ∈ X are orthogonal if 〈x, y〉 = 0. A subset S ⊂ X is an orthogonal set if 〈x, y〉 = 0 for all x ≠ y ∈ S. It is an orthonormal set if, in addition, ‖x‖ = 1 for all x ∈ S (i.e. the vectors are normalised).

Example 5.2. (i) In Kn the standard basis

E = {e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1)}

is an orthonormal set, as is any subset of E.

(ii) The sequence of sequences {δ(n) : n ∈ N}, where

δ(n) = (0, . . . , 0, 1, 0, . . .)

with 1 in the nth place, is an orthonormal subset of l2.

(iii) In L2(−π, π) (or C[−π, π] if you prefer) there is an orthonormal set given by {en : n ∈ Z}, where en(t) = (2π)^{−1/2} e^{int} for t ∈ [−π, π]. This follows since

〈en, em〉 = (1/2π) ∫^π₋π e^{i(m−n)t} dt = 1 if m = n,
                                          0 if m ≠ n.

Part (c) of the following generalises Pythagoras’ Theorem.

Lemma 5.3. Let X be an inner product space and S = {xi}i∈I an orthogonal set of vectors. Then

(a) S is linearly independent if 0 ∉ S.

(b) {αixi}i∈I is orthogonal for any choice of scalars αi ∈ K.

(c) For each finite set F ⊂ I, ‖∑i∈F xi‖² = ∑i∈F ‖xi‖².

Proof. (a) Suppose that F ⊂ I is a finite set and we have chosen αi ∈ K for i ∈ F such that

∑i∈F αixi = 0.

For each j ∈ F we have

0 = 〈xj, ∑i∈F αixi〉 = ∑i∈F αi〈xj, xi〉.

But 〈xj, xi〉 = 0 unless i = j, since the family is orthogonal, and so

0 = αj‖xj‖² ⇒ αj = 0, since xj ≠ 0.

(b) We have for any i ≠ j that

〈αixi, αjxj〉 = ᾱiαj〈xi, xj〉 = 0.


(c) We have

‖∑i∈F xi‖² = 〈∑i∈F xi, ∑j∈F xj〉 = ∑i∈F 〈xi, xi〉 + ∑(i,j)∈F×F, i≠j 〈xi, xj〉,

with 〈xi, xj〉 = 0 when i ≠ j.

Recall that if X is an inner product space and Y ⊂ X is an n-dimensional subspace for some n ∈ N, then Y is complete; to see this note that we can always define a norm on Y that makes it look identical, as a normed vector space, to (Kn, ‖ · ‖2), and this norm is equivalent to the given one (Theorem 3.31); since (Kn, ‖ · ‖2) is complete, so is Y with respect to either norm (Proposition 3.25). In this situation it is easy to compute the nearest point whose existence is guaranteed by Theorem 4.12, once one has a basis that is an orthonormal set. Such a basis can always be manufactured by applying the Gram–Schmidt process to any basis of Y.
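The Gram–Schmidt process mentioned above is short to implement; a sketch for real vectors (the classical, numerically naive version):

```python
import math

def gram_schmidt(vectors):
    # orthonormalise a linearly independent list: subtract from each vector
    # its components along the previously built orthonormal vectors,
    # then normalise what is left
    ortho = []
    for v in vectors:
        w = list(v)
        for e in ortho:
            c = sum(a * b for a, b in zip(e, w))      # 〈e, w〉
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(sum(t * t for t in w))
        ortho.append([t / norm for t in w])
    return ortho

E = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
print(E)  # an orthonormal basis with the same span
```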

Proposition 5.4. Let X be an inner product space and suppose that e1, . . . , en are orthonormal vectors in X. Let Y = Lin{e1, . . . , en}, the n-dimensional subspace spanned by these vectors, so that Y is closed. For each x ∈ X the nearest point Px in Y to x is given by

Px = ∑ni=1 〈ei, x〉ei,  with ‖x − Px‖² = ‖x‖² − ∑ni=1 |〈ei, x〉|².

Proof. Let y = ∑ni=1 λiei and ci = 〈ei, x〉. Then, using Lemma 5.3 (b) and (c),

‖x − y‖² = 〈x − ∑ni=1 λiei, x − ∑nj=1 λjej〉
= ‖x‖² − ∑ni=1 λ̄i〈ei, x〉 − ∑nj=1 λj〈x, ej〉 + ‖∑ni=1 λiei‖²
= ‖x‖² − ∑ni=1 λ̄ici − ∑ni=1 c̄iλi + ∑ni=1 |λi|²
= ‖x‖² − ∑ni=1 c̄ici + ∑ni=1 (λ̄iλi − λ̄ici − c̄iλi + c̄ici)
= ‖x‖² − ∑ni=1 |ci|² + ∑ni=1 |λi − ci|².

As the λi vary it is only the last sum on the right-hand side that changes. Moreover this quantity is nonnegative, and equal to zero precisely when λi = ci for all i, so these are the values of λi required to obtain the minimum distance.
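The formula for Px and the error identity are easy to check numerically; a sketch in R³ with a (hypothetical) orthonormal pair that does not span the whole space:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def best_approx(x, basis):
    # Px = Σ_i 〈e_i, x〉 e_i for an orthonormal list (e_i), as in Prop. 5.4
    coeffs = [dot(e, x) for e in basis]
    Px = [sum(c * e[k] for c, e in zip(coeffs, basis)) for k in range(len(x))]
    return Px, coeffs

e1, e2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]   # orthonormal, span a plane
x = [2.0, -1.0, 3.0]
Px, c = best_approx(x, [e1, e2])
err_sq = dot(x, x) - sum(ci * ci for ci in c)  # ‖x‖² − Σ|c_i|²
print(Px, err_sq)  # [2.0, -1.0, 0.0] and 9.0 = ‖x − Px‖²
```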

Corollary 5.5. If x ∈ Lin{e1, . . . , en} then

x = ∑ni=1 〈ei, x〉ei  and  ‖x‖² = ∑ni=1 |〈ei, x〉|².


For any orthonormal family {ei}i∈I, and x ∈ X, the numbers 〈ei, x〉 are known as the Fourier coefficients of x. The following is a fundamental inequality regarding these numbers, and paves the way to generalise Proposition 5.4 to the case when X contains an infinite orthonormal sequence. Note that this can only happen when X is infinite dimensional, courtesy of part (a) of Lemma 5.3.

Theorem 5.6 (Bessel's Inequality). Suppose that (en)∞n=1 is an orthonormal sequence in an inner product space X. Then for all x ∈ X we have

∑∞n=1 |〈en, x〉|² ≤ ‖x‖².

Proof. Fix N ∈ N, and let yN = ∑Nn=1 〈en, x〉en. By Proposition 5.4 we have

‖x − yN‖² = ‖x‖² − ∑Nn=1 |〈en, x〉|²

and so

∑Nn=1 |〈en, x〉|² = ‖x‖² − ‖x − yN‖² ≤ ‖x‖².

The result follows by letting N → ∞ on the left-hand side.
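A quick numerical illustration of Bessel's inequality (a sketch: the pair below is orthonormal in R⁴ but far from a basis, so the inequality is typically strict):

```python
import math, random

e1 = [0.5, 0.5, 0.5, 0.5]      # an orthonormal pair in R^4
e2 = [0.5, -0.5, 0.5, -0.5]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

random.seed(1)
x = [random.uniform(-1, 1) for _ in range(4)]
bessel_sum = dot(e1, x) ** 2 + dot(e2, x) ** 2   # Σ |〈e_n, x〉|²
print(bessel_sum, dot(x, x))                     # first ≤ second
```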

The importance of this inequality is seen courtesy of the following convergence result for series of orthogonal vectors in a Hilbert space.

Proposition 5.7 (Riesz–Fischer). Let X be a Hilbert space, and let (en)∞n=1 be an orthonormal sequence. The series ∑∞n=1 λnen is convergent in X if and only if ∑∞n=1 |λn|² < ∞.

Proof. First suppose that ∑∞n=1 λnen is convergent, and write x for the value of this series. Then x = lim N→∞ ∑Nn=1 λnen, and so for each m ∈ N

〈em, x〉 = 〈em, lim N→∞ ∑Nn=1 λnen〉 = lim N→∞ ∑Nn=1 λn〈em, en〉 = λm,

using the facts that the sequence is orthonormal and inner products are continuous. But then Bessel's inequality gives

∑∞m=1 |λm|² = ∑∞m=1 |〈em, x〉|² ≤ ‖x‖² < ∞

as required.

Suppose, instead, that we choose λn ∈ K with ∑∞n=1 |λn|² < ∞, and set sN = ∑Nn=1 λnen, the Nth partial sum of the series that we want to show is convergent. By parts (b) and (c) of Lemma 5.3 we have for any M < N that

‖sN − sM‖² = ‖∑Nn=M+1 λnen‖² = ∑Nn=M+1 |λn|².

This latter sum converges to 0 as M, N → ∞. Thus (sN)∞N=1 is a Cauchy sequence, hence convergent since we assumed X is complete.


So now if X is an infinite-dimensional Hilbert space, (en)∞n=1 an orthonormal sequence, and x ∈ X, then by Theorem 5.6 and Proposition 5.7 it follows that the following series is convergent:

∑∞n=1 〈en, x〉en.

But does the limit equal x? An answer can be given using the following construction.

Definition 5.8. Let X be an inner product space, S ⊂ X a subset. The orthogonal complement of S is

S⊥ = {x ∈ X : 〈x, y〉 = 0 ∀y ∈ S}.

Proposition 5.9. Let X be an inner product space, S ⊂ X. Then

(i) S⊥ is a closed subspace of X;

(ii) If T ⊂ S then T⊥ ⊃ S⊥;

(iii) S⊥ = (LinS)⊥ = (L̅i̅n̅S)⊥;

(iv) S ∩ S⊥ = {0} if 0 ∈ S, and = ∅ otherwise;

(v) If x ∈ S and y ∈ S⊥ then ‖x + y‖² = ‖x‖² + ‖y‖²;

(vi) X⊥ = {0} and {0}⊥ = X;

(vii) If S = ⋃i∈I Si then S⊥ = ⋂i∈I S⊥i.

Proof. (i) If x, y ∈ S⊥, λ ∈ K, then for all z ∈ S we have

〈x + λy, z〉 = 〈x, z〉 + λ̄〈y, z〉 = 0 + 0 = 0,

so that S⊥ is a subspace of X. If (xn) ⊂ S⊥ with xn → x ∈ X then for each z ∈ S we have

〈x, z〉 = 〈lim n xn, z〉 = lim n 〈xn, z〉 = 0,

by continuity of inner products, hence S⊥ is closed.

(ii) Trivial.

(iii) By (ii), S⊥ ⊃ (LinS)⊥ ⊃ (L̅i̅n̅S)⊥. If x ∈ S⊥ and y ∈ LinS then y = ∑ni=1 λiyi for some n ∈ N, λi ∈ K and yi ∈ S, and so

〈x, y〉 = 〈x, ∑ni=1 λiyi〉 = ∑ni=1 λi〈x, yi〉 = 0,

hence x ∈ (LinS)⊥. But now if w ∈ L̅i̅n̅S then w = lim k→∞ wk for some wk ∈ LinS, and so

〈x, w〉 = lim k→∞ 〈x, wk〉 = 0,

where we have used continuity of the inner product.

(iv) If x ∈ S ∩ S⊥ then ‖x‖² = 〈x, x〉 = 0, so x = 0.

(v) If x ∈ S and y ∈ S⊥ then ‖x + y‖² = ‖x‖² + 2 Re〈x, y〉 + ‖y‖² = ‖x‖² + ‖y‖².

(vi) Let x ∈ X⊥. Since x ∈ X, 〈x, x〉 = 0, and so x = 0. A similar calculation shows {0}⊥ = X.

(vii) Exercise.

Theorem 5.10. Let X be a Hilbert space and let {e_n}_{n=1}^N be an orthonormal set, where N ∈ N ∪ {∞}. The following are equivalent:

(i) {e_n}⊥ = {0};

(ii) x = ∑_{n=1}^N 〈e_n, x〉e_n for all x ∈ X;

(iii) X = \overline{Lin{e_n}};

(iv) ‖x‖² = ∑_{n=1}^N |〈e_n, x〉|² for every x ∈ X;

(v) {e_n}_{n=1}^N is maximal, i.e. it is not properly contained in any other orthonormal set.

Proof. (i) ⇒ (ii): Put y = x − ∑_{n=1}^N 〈e_n, x〉e_n, noting that the series converges by Theorem 5.6 and Proposition 5.7. For any m we have

〈e_m, y〉 = 〈e_m, x〉 − 〈e_m, ∑_{n=1}^N 〈e_n, x〉e_n〉
= 〈e_m, x〉 − ∑_{n=1}^N 〈e_n, x〉〈e_m, e_n〉 (inner products are linear and continuous)
= 〈e_m, x〉 − 〈e_m, x〉 = 0.

Thus y ∈ {e_n}⊥, hence y = 0, and (ii) holds.

(ii) ⇒ (iii): Immediate, since x is the limit of a convergent series whose partial sums belong to Lin{e_n}.

(iii) ⇒ (i): Follows by Proposition 5.9(iii): {0} = X⊥ = (\overline{Lin{e_n}})⊥ = {e_n}⊥.

(ii) ⇒ (iv): Follows from Lemma 5.3:

‖x‖² = ‖lim_{m→N} ∑_{n=1}^m 〈e_n, x〉e_n‖² = lim_{m→N} ∑_{n=1}^m |〈e_n, x〉|².

(iv) ⇒ (i): We assume (i) is false, so there is some nonzero x ∈ {e_n}⊥. Then 〈e_n, x〉 = 0 for all n, and so

∑_{n=1}^N |〈e_n, x〉|² = 0 ≠ ‖x‖².

That is, (iv) is also false.

(i) ⇔ (v): We have, by scaling vectors,

{e_n}⊥ = {0} ⇔ there is no nonzero x ∈ {e_n}⊥ ⇔ there is no unit vector x ∈ {e_n}⊥ ⇔ {e_n} is maximal.

Definition 5.11. In a Hilbert space H, a total or complete orthonormal set, or orthonormal basis, is an orthonormal set S ⊂ H satisfying any of the conditions of Theorem 5.10.

All of the examples in Example 5.2 turn out to be complete orthonormal sets; that they are orthonormal is straightforward, and completeness of both {e_1, …, e_n} ⊂ Kⁿ and {δ^{(n)}}_{n=1}^∞ ⊂ l2 is not hard to show. Completeness of the trigonometric functions in L2(−π, π) is harder to show.

Definition 5.12. A Hilbert space is separable if it contains a complete orthonormal sequence (finite or countably infinite).

Exercise 5.13. Show that if H is a real separable Hilbert space, with {e_n}_{n=1}^N a complete orthonormal set, then the set {∑_{i=1}^n λ_i e_i : n ∈ N, λ_i ∈ Q} is a countable dense subset of H, hence H is separable according to the usual metric space definition. Prove the analogous statement for complex Hilbert spaces.

There are nonseparable Hilbert spaces, i.e. spaces in which no countable orthonormal set is complete, and there is an analogous version of Theorem 5.10 for such spaces.

Theorem 5.14. Let H be a separable Hilbert space. Either dim H = n for some n ∈ N, in which case H is isometrically isomorphic to (Kⁿ, ‖·‖₂), or H is infinite-dimensional and isometrically isomorphic to l2.

Proof. We do the case when H contains a complete orthonormal sequence {e_n}_{n=1}^∞. In this case we can define Tx = (〈e_n, x〉)_{n=1}^∞, so that T maps H into l2 by Bessel's inequality (Theorem 5.6), and is isometric by Theorem 5.10. It is onto since for any λ = (λ_n) ∈ l2 we have that x := ∑_{n=1}^∞ λ_n e_n ∈ H is well-defined (Proposition 5.7), and that Tx = λ.

The finite-dimensional case is similar.

Corollary 5.15. All separable infinite-dimensional Hilbert spaces are isometrically isomorphic.

Although we have introduced orthogonal complements in terms of inner products, it turns out to be possible to characterise the orthogonal complements of subspaces directly in terms of the norm.

Lemma 5.16. Let Y be a subspace of an inner product space X and x ∈ X. Then x ∈ Y⊥ if and only if

‖x − y‖ ≥ ‖x‖ for all y ∈ Y. (∗)

Proof. If x ∈ Y⊥ then by Proposition 5.9(v) we have

‖x − y‖² = ‖x‖² + ‖−y‖² ≥ ‖x‖²,

and so (∗) holds.

Suppose, conversely, that x ∈ X satisfies (∗), and choose y ∈ Y. Pick θ ∈ [0, 2π) such that e^{iθ}〈x, y〉 = |〈x, y〉|. Then for all r > 0 we have re^{iθ}y ∈ Y, and so

‖x‖² ≤ ‖x − re^{iθ}y‖² = ‖x‖² − 2 Re(re^{iθ}〈x, y〉) + ‖re^{iθ}y‖²
⇒ 0 ≤ −2r|〈x, y〉| + r²‖y‖²
⇒ |〈x, y〉| ≤ (r/2)‖y‖².

Since this is true for all r > 0, letting r → 0 gives 〈x, y〉 = 0 as required.

Theorem 5.17. Let K be a closed subspace of a Hilbert space H. Then H = K ⊕ K⊥, that is,

H = K + K⊥ := {y + z : y ∈ K, z ∈ K⊥} and K ∩ K⊥ = {0}.

Proof. We already know that K ∩ K⊥ = {0} by Proposition 5.9, so take any x ∈ H. Let Px ∈ K be the nearest point in K to x, whose existence is guaranteed by Theorem 4.12. For all y′ ∈ K we have

‖x − Px‖ ≤ ‖x − y′‖.

If we put z = x − Px, then for any y ∈ K we have y′ = Px + y ∈ K, and so

‖z‖ = ‖x − Px‖ ≤ ‖x − y′‖ = ‖x − (Px + y)‖ = ‖z − y‖,

so that z = x − Px ∈ K⊥ by Lemma 5.16. That is, x = Px + (x − Px) where Px ∈ K and x − Px ∈ K⊥.

Remark. The notation V = W ⊕ X has at least two meanings in linear algebra. In general it signifies that V is a vector space and W, X are subspaces such that W + X = V and W ∩ X = {0}, so that V is the (internal) direct sum of these subspaces. In this case it follows that every v ∈ V can be written as v = w + x for a unique choice of w ∈ W and x ∈ X. If, further, V is an inner product space, then it typically also means that we insist that 〈w, x〉 = 0.

Thus, Theorem 5.17 establishes that given any Hilbert space H and closed subspace K, H splits up as the direct sum of the closed subspaces K and K⊥. This ability to always find a complementary closed subspace L to K (i.e. some closed subspace L such that H = K + L with K ∩ L = {0}) is a property that turns out to characterise normed vector spaces that can be turned into Hilbert spaces by changing to an equivalent norm. For example, there is no closed subspace N of l∞ such that l∞ = c₀ ⊕ N.

Corollary 5.18. If H is a Hilbert space and K a closed subspace then (K⊥)⊥ = K.

Proof. It is easy to see that for any subset S ⊂ H we have S ⊂ (S⊥)⊥. So if K is a closed subspace and x ∈ (K⊥)⊥ then we know that x = y + z for some y ∈ K and z ∈ K⊥ by Theorem 5.17. But then 〈x, z〉 = 0, and so

0 = 〈x, z〉 = 〈y + z, z〉 = 〈y, z〉 + 〈z, z〉 = ‖z‖²,

showing that z = 0, and hence x = y ∈ K as required.
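The decomposition H = K ⊕ K⊥ of Theorem 5.17 can be checked by hand in a small example. The following Python sketch (the vectors u and x are assumed examples, not from the notes) works in the Hilbert space R³ with K the span of a single vector.

```python
# Orthogonal decomposition x = Px + z with Px in K = Lin{u} and z in K^perp.

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

u = [1.0, 1.0, 0.0]          # K = Lin{u}
x = [1.0, 2.0, 3.0]

c = dot(u, x) / dot(u, u)    # coefficient of the nearest point Px in K
Px = [c * t for t in u]      # Px = orthogonal projection of x onto K
z = [a - b for a, b in zip(x, Px)]  # z = x - Px

# z is orthogonal to K, and Pythagoras holds (Proposition 5.9(v)).
assert abs(dot(z, u)) < 1e-12
assert abs(dot(x, x) - (dot(Px, Px) + dot(z, z))) < 1e-12
```

The two assertions verify exactly the properties used in the proof: x − Px ∈ K⊥, and ‖x‖² = ‖Px‖² + ‖x − Px‖².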

Exercise 5.19. Show that L2(−1, 1), equipped with its usual inner product, isthe orthogonal direct sum of its subspaces of even and odd functions.

Exercise 5.20. Let H = L2[0, 1], the Hilbert space of square-integrable Lebesgue-measurable functions on [0, 1]. Divide the interval [0, 1] into n subintervals by picking n − 1 numbers 0 < t_1 < ⋯ < t_{n−1} < 1, and setting t_0 = 0, t_n = 1. Let K stand for the subspace of functions that are constant on each of the intervals (t_{i−1}, t_i) for i = 1, …, n (noting that the values at the end points are irrelevant in L2[0, 1], since we take functions up to equality except on a set of measure zero).

Now, K is n-dimensional, with basis 1_{[t_0,t_1)}, …, 1_{[t_{n−1},t_n)} (which is orthogonal but not orthonormal), hence it is a closed subspace of H. Moreover, for any f ∈ H and g ∈ K we have

g = Pf, the nearest point in K to f ⇔ f − g ∈ K⊥
⇔ f − g ∈ (Lin{1_{[t_0,t_1)}, …, 1_{[t_{n−1},t_n)}})⊥
⇔ f − g ∈ {1_{[t_0,t_1)}, …, 1_{[t_{n−1},t_n)}}⊥,

by Theorem 5.17, Theorem 4.12 and Proposition 5.9. However, g ∈ K means that g = ∑_{i=1}^n α_i 1_{[t_{i−1},t_i)} for some constants α_i ∈ K, and so we see that

g = Pf ⇔ 0 = 〈1_{[t_{i−1},t_i)}, f − g〉 = ∫_0^1 1_{[t_{i−1},t_i)}(t)(f(t) − g(t)) dt = ∫_{t_{i−1}}^{t_i} (f(t) − g(t)) dt ∀ i.

But note that this last condition is equivalent to requiring

∫_{t_{i−1}}^{t_i} f(t) dt = ∫_{t_{i−1}}^{t_i} g(t) dt = α_i(t_i − t_{i−1}) ∀ i. (†)

That is, we have shown that ‖f − g‖₂ as g varies through K is minimised by taking

g = ∑_{i=1}^n α_i 1_{[t_{i−1},t_i)} for α_i = (1/(t_i − t_{i−1})) ∫_{t_{i−1}}^{t_i} f(t) dt.

In probability theory, the fact that g is a function in K satisfying (†) means that g is a conditional expectation of the random variable f, and here we see that the conditional expectation is obtained by taking the orthogonal projection of f onto K.

In this example the basis 1_{[t_0,t_1)}, …, 1_{[t_{n−1},t_n)} is not an orthonormal basis, since the norm of each indicator function is √(t_i − t_{i−1}), but these functions are at least orthogonal to one another.
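The projection formula above can be illustrated numerically. In the following Python sketch (the function f(t) = t² and the partition are assumed examples, not from the notes), the interval averages α_i are computed exactly and the L² distance is approximated by a Riemann sum; perturbing the averages can only increase the distance, as the projection property predicts.

```python
# Nearest piecewise-constant function to f in L^2[0,1]: take the average of f
# on each subinterval of the partition.

f = lambda t: t * t
ts = [0.0, 0.3, 0.7, 1.0]  # partition 0 = t_0 < t_1 < t_2 < t_3 = 1

def l2_dist_sq(alphas, n=30000):
    """Midpoint Riemann sum for ||f - g||_2^2, g = sum_i alpha_i 1_[t_{i-1},t_i)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        i = max(j for j in range(len(ts) - 1) if ts[j] <= t)
        total += (f(t) - alphas[i]) ** 2 * h
    return total

# alpha_i = (1/(t_i - t_{i-1})) * integral of t^2 over [t_{i-1}, t_i], exactly.
alphas = [(ts[i + 1] ** 3 - ts[i] ** 3) / (3 * (ts[i + 1] - ts[i]))
          for i in range(len(ts) - 1)]

best = l2_dist_sq(alphas)
worse = l2_dist_sq([a + 0.05 for a in alphas])  # any perturbation does worse
assert best < worse
```

In the probabilistic reading, `alphas` is the conditional expectation of f given the partition, obtained here purely as an orthogonal projection.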

6 Continuous/Bounded Linear Maps

We first recall two definitions:

1. If X and Y are vector spaces over K then a map T : X → Y is linear if

(a) T(x_1 + x_2) = Tx_1 + Tx_2 ∀ x_1, x_2 ∈ X, and

(b) T(λx) = λTx ∀ λ ∈ K, x ∈ X.

This is equivalent to requiring T(x_1 + λx_2) = Tx_1 + λTx_2 for all λ ∈ K and x_1, x_2 ∈ X, and implies, in particular, that T0 = 0.

2. If X and Y are metric spaces then a map f : X → Y is continuous at x_0 ∈ X if

∀ ε > 0 ∃ δ > 0 s.t. d(x, x_0) < δ ⇒ d(f(x), f(x_0)) < ε,

equivalently,

whenever (x_n)_{n=1}^∞ ⊂ X with lim_{n→∞} x_n = x_0, then lim_{n→∞} f(x_n) = f(x_0).

The map is continuous if it is continuous at each x_0 ∈ X, equivalently if

f(lim_{n→∞} x_n) = lim_{n→∞} f(x_n) for every convergent sequence (x_n)_{n=1}^∞.

Definition 6.1. If X, Y are normed vector spaces over K, then L(X; Y) denotes the set of all linear maps X → Y, whereas B(X; Y) denotes the subset of continuous linear maps.

In the next two results we will write B = B(0, 1) for the closed unit ball of agiven normed vector space X.

Proposition 6.2. Let X, Y be normed vector spaces over K, and T : X → Y linear. The following are equivalent:

(i) T is continuous (i.e. T ∈ B(X; Y));

(ii) T is continuous at 0;

(iii) ∃ K > 0 such that ‖Tx‖ ≤ K‖x‖ ∀ x ∈ X;

(iv) the image of B is bounded, i.e. ∃ C > 0 such that ‖Tx‖ ≤ C whenever ‖x‖ ≤ 1.

Proof. (i) ⇒ (ii) is trivial.

(ii) ⇒ (iii): T continuous at 0 implies, taking ε = 1, that there is some δ > 0 such that if ‖x − 0‖ = ‖x‖ < δ then ‖Tx − T0‖ = ‖Tx‖ < 1. Now suppose x ∈ X with x ≠ 0. Then (δ/(2‖x‖))x ∈ X with

‖(δ/(2‖x‖))x‖ = (δ/2)(‖x‖/‖x‖) = δ/2 < δ,

and so

‖T((δ/(2‖x‖))x)‖ = ‖(δ/(2‖x‖))Tx‖ = (δ/(2‖x‖))‖Tx‖ < 1 ⇒ ‖Tx‖ < (2/δ)‖x‖.

For x = 0 we have ‖Tx‖ = (2/δ)‖x‖ = 0, and so (iii) holds with K = 2/δ.

(iii) ⇒ (i): Suppose that (x_n) is a sequence convergent to x ∈ X, i.e. ‖x_n − x‖ → 0. Then we have

‖Tx_n − Tx‖ = ‖T(x_n − x)‖ ≤ K‖x_n − x‖ → 0 as n → ∞,

and so (Tx_n) is convergent to Tx as required.

(iii) ⇒ (iv): If x ∈ B then ‖x‖ ≤ 1 and so ‖Tx‖ ≤ K‖x‖ ≤ K, i.e. (iv) holds with C = K.

(iv) ⇒ (iii): Let x ∈ X with x ≠ 0; then x/‖x‖ ∈ B, and so

(1/‖x‖)‖Tx‖ = ‖T(x/‖x‖)‖ ≤ C ⇒ ‖Tx‖ ≤ C‖x‖ ∀ x ∈ X.

Example 6.3.

1. Let X = C[0, 1] = {continuous maps [0, 1] → K} with norm ‖·‖∞, and define T : X → K by Tf = f(0). Since [0, 1] is compact, f is bounded, and

‖f‖∞ = sup_{x∈[0,1]} |f(x)| ≥ |f(0)| = |Tf|.

Thus T is continuous. Here K is equipped with the norm obtained from the modulus: ‖λ‖ = |λ|.

2. With X = C[0, 1] again, let g ∈ X, and define M : X → X by Mf = gf. Then (Mf)(x) = (gf)(x) = g(x)f(x), and so

|(Mf)(x)| = |g(x)||f(x)| ≤ (sup_{y∈[0,1]} |g(y)|)|f(x)| = ‖g‖∞|f(x)|.

Hence

‖Mf‖∞ = sup_{x∈[0,1]} |(Mf)(x)| ≤ ‖g‖∞ sup_{x∈[0,1]} |f(x)| = ‖g‖∞‖f‖∞.

Thus M is continuous.

Proposition 6.4. Let X, Y be normed vector spaces over K and T ∈ B(X; Y). Set

E = {K > 0 : ‖Tx‖ ≤ K‖x‖ ∀ x ∈ X} ⊂ [0, ∞), and
F = {‖Tx‖ : x ∈ B} (the image of B under x ↦ ‖Tx‖).

Then inf E = sup F, and setting ‖T‖ = inf E defines a norm on B(X; Y).

Proof. Let K ∈ E; then ‖Tx‖ ≤ K for all x ∈ B, as shown in Proposition 6.2. That is, K is an upper bound for F, hence sup F exists with sup F ≤ K. But this holds for all K ∈ E, i.e. sup F is a lower bound for E, and so

sup F ≤ inf E.

However, since sup F is an upper bound for F, if x ≠ 0 then x/‖x‖ ∈ B (with norm 1), and so

‖T(x/‖x‖)‖ = (1/‖x‖)‖Tx‖ ≤ sup F ⇒ ‖Tx‖ ≤ (sup F)‖x‖.

This shows that sup F ∈ E, hence inf E ≤ sup F, giving equality as required.

To show that ‖T‖ = sup F = inf E is a norm, note that ‖T‖ ≥ 0 by construction, and if T ≠ 0 then Tx ≠ 0 for some x ∈ B, and so sup_{x∈B} ‖Tx‖ > 0. That is, ‖T‖ = 0 if and only if T = 0.

If T ∈ B(X; Y) and λ ∈ K, then λT : X → Y is defined by (λT)x = λ(Tx), and so

{‖λTx‖ : x ∈ B} = {|λ|‖Tx‖ : x ∈ B} = |λ|{‖Tx‖ : x ∈ B},

from which we get ‖λT‖ = |λ|‖T‖.

Finally, if S, T ∈ B(X; Y) then for any x ∈ B we have

‖(S + T)x‖ = ‖Sx + Tx‖ ≤ ‖Sx‖ + ‖Tx‖ ≤ ‖S‖ + ‖T‖,

and so ‖S + T‖ = sup_{x∈B} ‖(S + T)x‖ ≤ ‖S‖ + ‖T‖.

Exercise 6.5. Check that we must have E = [‖T‖, ∞), and F = [0, ‖T‖) or F = [0, ‖T‖]. Give examples to show both possibilities for F can occur.

In Example 6.3 we have ‖T‖ = 1 and ‖M‖ = ‖g‖∞: in both cases apply T or M to the constant function f(x) ≡ 1, which satisfies ‖f‖∞ = 1, to get ‖T‖ ≥ |Tf| = 1, and similarly for M.

Example 6.6. Consider X = R² with norm ‖(x_1, x_2)‖∞ = max{|x_1|, |x_2|}, and let

A = ( 1  −4
      2   1 ).

Then A defines a linear map T : X → X through

T(x_1, x_2) = (A(x_1, x_2)ᵀ)ᵀ = (x_1 − 4x_2, 2x_1 + x_2).

Now B = B(0, 1) = {(x_1, x_2) ∈ R² : max{|x_1|, |x_2|} ≤ 1} is a square with corners (1, 1), (−1, 1), (−1, −1) and (1, −1). Under the map T these are sent to (−3, 3), (−5, −1), (3, −3) and (5, 1) respectively, and the edges of the square B are mapped by the linear map T to straight lines connecting the images of the corners. Consequently T(B) is a parallelogram with these corners, and so

‖T‖ = sup{‖y‖∞ : y ∈ T(B)} = sup{max{|y_1|, |y_2|} : y ∈ T(B)} = 5.

More generally, any m × n matrix B = [b_{ij}] ∈ M_{m,n}(K) defines a linear map Kⁿ → Kᵐ, and if the domain and target spaces are each given the norm ‖·‖∞ then it can be shown that

‖B‖ = max_{1≤i≤m} ∑_{j=1}^n |b_{ij}|,

the maximum absolute row sum. For our A above this is

‖A‖ = max{|1| + |−4|, |2| + |1|} = 5.
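The row-sum formula can be verified numerically for the matrix A above. The following Python sketch (an added illustration; it uses the fact that the sup of ‖Ax‖∞ over the unit ball of ‖·‖∞ is attained at a corner with entries ±1) compares the two computations.

```python
from itertools import product

# Operator norm of a matrix on (R^2, ||.||_inf) equals the maximum absolute
# row sum; the sup over the unit ball is attained at a corner x with entries +-1.

A = [[1, -4], [2, 1]]

row_sum_norm = max(sum(abs(a) for a in row) for row in A)

def apply(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

corner_norm = max(max(abs(v) for v in apply(A, x))
                  for x in product([-1, 1], repeat=2))

assert row_sum_norm == corner_norm == 5
```

The corner enumeration mirrors the geometric argument in Example 6.6: the image of the square B is the parallelogram spanned by the images of its corners.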

Proposition 6.7. Let X, Y, Z be normed vector spaces, S ∈ B(X; Y) and T ∈ B(Y; Z). Then TS ∈ B(X; Z) with ‖TS‖ ≤ ‖T‖‖S‖.

Proof. Compositions of linear maps are linear and compositions of continuous maps are continuous; hence TS ∈ B(X; Z). Moreover, for all x ∈ B(0, 1) ⊂ X,

‖(TS)x‖ = ‖T(Sx)‖ ≤ ‖T‖‖Sx‖ ≤ ‖T‖‖S‖‖x‖ ≤ ‖T‖‖S‖.

Example 6.8. Let P := {polynomial functions [0, 1] → R}, a subspace of C[0, 1], which we equip with the supremum norm. Define D : P → P by

Df = f′.

If f_n(t) := tⁿ, then ‖f_n‖∞ = sup_{t∈[0,1]} |tⁿ| = 1, but (Df_n)(t) = nt^{n−1} = nf_{n−1}(t), so that ‖Df_n‖∞ = n. That is, the image of the unit ball is not bounded, hence D is not continuous.

Lemma 6.9. If (X, ‖·‖_X) and (Y, ‖·‖_Y) are normed vector spaces, and T : X → Y is linear, then ‖x‖_T := ‖x‖_X + ‖Tx‖_Y is a norm on X that makes T : (X, ‖·‖_T) → (Y, ‖·‖_Y) a contraction.

Proof. Exercise.

In particular, the differentiation map D : P → P can be made into a continuous map by altering the norm on the domain copy of P.

Corollary 6.10. All linear maps from finite-dimensional normed vector spaces are continuous.

Proof. If X is a finite-dimensional normed vector space then all norms on X are equivalent (Theorem 3.31), and it then follows from Proposition 3.25 that T : (X, ‖·‖_X) → (Y, ‖·‖_Y) is continuous if and only if T : (X, ‖·‖_T) → (Y, ‖·‖_Y) is continuous. The latter map is continuous by Lemma 6.9, and so the result follows.

Example 6.11. Consider n investment schemes, and denote the return per unit invested by r_1, …, r_n. We can create a portfolio X by buying or selling quantities of each scheme, i.e. X = (x_1, …, x_n) ∈ Rⁿ denotes the fact that we have x_i units of scheme i. Thus the return from our investment is

R_X := x_1r_1 + ⋯ + x_nr_n = ∑_{i=1}^n x_i r_i.

Suppose that the quantities r_i are random and are modelled by a finite probability space Ω = {1, …, m}, i.e. the k ∈ Ω are the possible outcomes, and each r_i is a function/random variable r_i : Ω → R. It follows that R_X is also a random variable:

R_X : Ω → R, R_X(k) = ∑_{i=1}^n x_i r_i(k).

Finally, suppose that each outcome k ∈ Ω occurs with probability p_k, that is

p_k ≥ 0 ∀ 1 ≤ k ≤ m and ∑_{k=1}^m p_k = 1.

We can write the probability measure as a vector P = (p_1, …, p_m) ∈ Rᵐ. The expected return from our portfolio is then

E_P[R_X] = ∑_{k=1}^m p_k R_X(k).

The following is then a fundamental result in financial mathematics:

Theorem 6.12. Exactly one of the following holds:

(i) There is a probability measure P such that

E_P[r_i] = 0 ∀ i ⇔ E_P[R_X] = 0 ∀ portfolios X.

(ii) There is a portfolio X such that

∑_{i=1}^n x_i r_i(k) > 0 ∀ k ⇔ E_P[R_X] > 0 ∀ probability measures P.

This says that either there is some way of assigning probabilities to the outcomes so that all portfolios are risk-less (i.e. all bets are fair, and P is a risk-neutral probability measure), or there is some portfolio that has a guaranteed profit.

Proof. Observe that

E_P[R_X] = ∑_{k=1}^m p_k R_X(k) = ∑_{k=1}^m p_k (∑_{i=1}^n x_i r_i(k)) = ∑_{i=1}^n (∑_{k=1}^m r_i(k)p_k) x_i = 〈Y, X〉,

where we take the usual inner product on Rⁿ, and define Y ∈ Rⁿ by setting

y_i = ∑_{k=1}^m r_i(k)p_k.

That is, Y = RP, where

R = ( r_1(1) r_1(2) ⋯ r_1(m)
      r_2(1) r_2(2) ⋯ r_2(m)
        ⋮      ⋮    ⋱    ⋮
      r_n(1) r_n(2) ⋯ r_n(m) ),   P = (p_1, …, p_m)ᵀ.

Set C = {RP : P ∈ P}, where P denotes the set of all probability measures. We consider what happens when 0 ∈ C and when 0 ∉ C.

0 ∈ C: Let P be a probability measure such that Y = RP = 0; then E_P[R_X] = 〈0, X〉 = 0 for every X, and so we have (i).

0 ∉ C: Note that P is a convex and compact subset of Rᵐ (exercise: check these claims!). It follows that C is the image of P under the map Rᵐ ∋ Z ↦ RZ ∈ Rⁿ, which is linear and continuous (Corollary 6.10). Since P is convex, C must be convex. Moreover, since P is compact, so is its image C. Thus C is a closed subset of Rⁿ, hence complete. It follows from Theorem 4.12 that there is a nearest point Y′ ∈ C to 0 ∈ Rⁿ. Since 0 ∉ C we must have Y′ ≠ 0, and we take X = Y′ to be our portfolio. It is not hard to show (cf. Lemma 5.16) that this nearest point satisfies

〈Y′, Y′ − Y〉 ≤ 0 ∀ Y ∈ C, (†)

hence for any probability measure P, with Y = RP ∈ C,

E_P[R_X] = 〈Y, X〉 = 〈Y − Y′ + Y′, Y′〉 = −〈Y′ − Y, Y′〉 + ‖Y′‖² > 0,

since the first term is at least 0 by (†), and ‖Y′‖ > 0 since Y′ ≠ 0. Thus (ii) holds.

In the above example we used the fact that a probability measure P determines the expectation map E_P, which is a linear functional on the vector space of random variables, that is, a linear map from that vector space to the underlying field R. We also used Hilbert space ideas, which worked because we were in finite dimensions. More care is required over the analytical matters if we take Ω to be an infinite set.
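The nearest-point construction in the proof of Theorem 6.12 can be carried out numerically. The following Python sketch uses an assumed toy market (two schemes, two outcomes; the return values are illustrative, not from the notes) and a brute-force scan of the simplex to locate the nearest point Y′ of C to 0, then checks that the portfolio X = Y′ has a strictly positive return in every outcome.

```python
# Toy market: r_1 = (1, -1), r_2 = (2, 1). The set C = {RP} is the convex hull
# of the columns (r_1(k), r_2(k)) of R, here the segment between two points.
cols = [(1.0, 2.0), (-1.0, 1.0)]  # outcome k = 1, 2

def RP(t):
    """Image of the probability measure P = (t, 1 - t) under P -> RP."""
    return tuple(t * a + (1 - t) * b for a, b in zip(cols[0], cols[1]))

# Scan the (1-dimensional) simplex for the nearest point Y' of C to 0.
grid = [k / 1000 for k in range(1001)]
t_best = min(grid, key=lambda t: sum(v * v for v in RP(t)))
Y_best = RP(t_best)

# 0 is not in C here, so taking the portfolio X = Y' gives a guaranteed positive
# return in every outcome, as case (ii) of the theorem predicts.
returns = [sum(x * c for x, c in zip(Y_best, col)) for col in cols]
assert min(returns) > 0
```

For markets with m > 2 outcomes the same idea applies with a higher-dimensional simplex; a grid scan is then only a crude substitute for a proper projection onto the convex set C.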

The following notation is standard (although not universal): if X is a normed vector space then we have the following vector spaces of linear functionals:

X′ := {all linear maps X → K} (algebraic dual),
X∗ := {all continuous linear maps X → K} = B(X; K) (topological dual).

Example 6.13. Let X = C[0, 1] with the supremum norm. Pick t_0 ∈ [0, 1] and g ∈ X. We can define maps ϕ_{t_0} : X → K and ψ_g : X → K by

ϕ_{t_0}(f) = f(t_0), ψ_g(f) = ∫_0^1 g(t)f(t) dt.

Then ϕ_{t_0} and ψ_g are linear functionals. We saw earlier that ‖ϕ_{t_0}‖ = 1 (Example 6.3, and following remarks). Also,

|ψ_g(f)| = |∫_0^1 g(t)f(t) dt| ≤ ∫_0^1 |g(t)f(t)| dt.

But

|g(t)f(t)| = |g(t)||f(t)| ≤ |g(t)| sup_{s∈[0,1]} |f(s)| = |g(t)|‖f‖∞,

and so

|ψ_g(f)| ≤ ∫_0^1 |g(t)|‖f‖∞ dt = ‖f‖∞ ∫_0^1 |g(t)| dt.

That is, ψ_g ∈ X∗ with ‖ψ_g‖ ≤ ∫_0^1 |g(t)| dt = ‖g‖₁. Again, this is actually an equality, although the easiest way to show it is via the Dominated Convergence Theorem of Lebesgue integration.

Now for any λ, μ ∈ K we can define λϕ_{t_0} + μψ_g ∈ X∗, and then

‖λϕ_{t_0} + μψ_g‖ ≤ |λ|‖ϕ_{t_0}‖ + |μ|‖ψ_g‖ = |λ| + |μ| ∫_0^1 |g(t)| dt.

But what is the general form of ϕ ∈ X∗?

By definition X∗ ⊂ X′, with X∗ = X′ if X is finite-dimensional, courtesy of Corollary 6.10. In fact the converse is true, so that X∗ ⊊ X′ if X is infinite-dimensional. A route to showing this uses Zorn's Lemma to adapt the following example.

Example 6.14. Let X = c₀₀ = {sequences that are eventually 0} = Lin{δ^{(m)} : m ≥ 1}, where

δ^{(m)}_n = 1 if n = m, and 0 if n ≠ m.

Equip X with the inner product

〈x, y〉 = ∑_{n=1}^∞ \overline{x_n} y_n,

so that X is an incomplete, dense subspace of l2. Define ϕ : X → K by ϕ(x) = ∑_{n=1}^∞ n x_n. Since for each x ∈ X there is some N ≥ 1 (depending on x) such that x_n = 0 for all n > N, we have ϕ(x) = ∑_{n=1}^N n x_n, so the series is convergent, and this consequently defines a linear functional.

However, ‖δ^{(m)}‖ = 1 and ϕ(δ^{(m)}) = ∑_n n δ^{(m)}_n = m for each m ≥ 1, so that ϕ is an unbounded linear functional, i.e. it lies in X′ \ X∗.

Question/problem: given a normed vector space X, can we characterise/identify all the elements of X∗?

Lemma 6.15. Let X be an inner product space. For each y ∈ X, ϕy(x) := 〈y, x〉defines a bounded linear functional with ‖ϕy‖ = ‖y‖.

Proof. First, ϕ_y is linear by definition of inner products. Moreover, the Cauchy–Schwarz inequality gives

|ϕ_y(x)| = |〈y, x〉| ≤ ‖y‖‖x‖,

showing that ϕ_y is continuous, with ‖ϕ_y‖ ≤ ‖y‖. Now, if y ≠ 0, let x = y/‖y‖, so that ‖x‖ = ‖y‖/‖y‖ = 1, and

|ϕ_y(x)| = |〈y, y/‖y‖〉| = 〈y, y〉/‖y‖ = ‖y‖²/‖y‖ = ‖y‖,

so that ‖ϕ_y‖ ≥ ‖y‖. If, however, y = 0 then clearly ϕ_y = 0, and so ‖ϕ_y‖ = ‖y‖ in this case as well.

The map y ↦ ϕ_y is a map X → X∗ that is isometric (i.e. ‖y‖ = ‖ϕ_y‖) and conjugate-linear (exercise). It turns out to be onto when X is complete, that is, all continuous linear functionals are of this form.

Theorem 6.16 (Riesz Representation Theorem / Riesz–Fréchet Theorem). Let H be a Hilbert space and ψ ∈ H∗. Then there is a unique y_ψ ∈ H such that ψ(x) = 〈y_ψ, x〉 for all x ∈ H.

Proof. If ψ = 0 then ψ(x) = 〈0, x〉 for all x. So assume ψ ≠ 0; then K := Ker ψ is a closed subspace of H with K ≠ H (otherwise ψ = 0), and so K⊥ ≠ {0} by Theorem 5.17. Thus we can choose some z ∈ K⊥ with ‖z‖ = 1. For every x ∈ H we have

ψ(ψ(x)z − ψ(z)x) = ψ(x)ψ(z) − ψ(z)ψ(x) = 0,

so that ψ(x)z − ψ(z)x ∈ K, and hence

0 = 〈z, ψ(x)z − ψ(z)x〉 = ψ(x)〈z, z〉 − ψ(z)〈z, x〉.

Thus we have

ψ(x) = (ψ(z)/〈z, z〉)〈z, x〉 = 〈y_ψ, x〉

for y_ψ = \overline{ψ(z)} z. Uniqueness of y_ψ follows from Lemma 2.4 (exercise).

There are many analogous results that characterise the topological duals of Banach spaces, by showing that they are isometrically isomorphic to some (concrete) Banach space. The following is one such result.

Proposition 6.17. l∞ is isometrically isomorphic to (l1)∗, written (l1)∗ ∼= l∞.

Proof. Pick any x ∈ l1 and y ∈ l∞; then

∑_{n=1}^∞ |x_n y_n| ≤ ∑_{n=1}^∞ (sup_m |y_m|)|x_n| = ‖y‖∞‖x‖₁, (6.1)

i.e. the series ∑_n x_n y_n is absolutely convergent, hence convergent. So we can define a map ϕ_y : l1 → K by

ϕ_y(x) := ∑_{n=1}^∞ x_n y_n.

Claim: the map T : y ↦ ϕ_y is an isometric isomorphism of l∞ onto (l1)∗. To verify this we must show ϕ_y is linear, continuous, ‖ϕ_y‖ = ‖y‖∞, and that T is linear and surjective.

ϕ_y linear: Follows since

ϕ_y(x + λz) = ∑_n (x_n + λz_n)y_n = ∑_n (x_n y_n + λz_n y_n) = ∑_n x_n y_n + λ ∑_n z_n y_n = ϕ_y(x) + λϕ_y(z).

ϕ_y continuous: From (6.1) we have

|ϕ_y(x)| = |∑_n x_n y_n| ≤ ∑_n |x_n y_n| ≤ ‖x‖₁‖y‖∞,

and so ϕ_y is continuous with ‖ϕ_y‖ ≤ ‖y‖∞.

‖ϕ_y‖ = ‖y‖∞: Given any y,

‖ϕ_y‖ = sup{|ϕ_y(x)| : x ∈ l1, ‖x‖₁ ≤ 1} ≥ sup{|ϕ_y(δ^{(n)})| : n ∈ N} = sup_{n∈N} |y_n| = ‖y‖∞,

and so the result follows.

T is linear: Pick w, y ∈ l∞ and λ ∈ K; then for each x ∈ l1 we have

ϕ_{w+λy}(x) = ∑_n x_n(w_n + λy_n) = ∑_n x_n w_n + λ ∑_n x_n y_n = ϕ_w(x) + λϕ_y(x) = (ϕ_w + λϕ_y)(x),

that is, T(w + λy) = ϕ_{w+λy} = ϕ_w + λϕ_y = Tw + λTy.

T is surjective: Pick ϕ ∈ (l1)∗. Set y_n = ϕ(δ^{(n)}) ∈ K, where the sequences δ^{(n)} are as above. Then we have

|y_n| = |ϕ(δ^{(n)})| ≤ ‖ϕ‖‖δ^{(n)}‖₁ = ‖ϕ‖,

and so the sequence y = (y_1, y_2, …) ∈ l∞. Moreover, if x = (x_1, …, x_N, 0, …) = ∑_{n=1}^N x_n δ^{(n)} ∈ c₀₀ ⊂ l1 then

ϕ(x) = ∑_{n=1}^N ϕ(x_n δ^{(n)}) = ∑_{n=1}^N x_n y_n = ϕ_y(x). (6.2)

That is, ϕ and Ty = ϕ_y agree on the dense subspace c₀₀. But given any x′ ∈ l1 we have x′ = lim_{N→∞} x′(N), where x′(N) = (x′_1, …, x′_N, 0, 0, …) ∈ c₀₀, and so it follows that

ϕ(x′) = ϕ(lim_{N→∞} x′(N)) = lim_{N→∞} ϕ(x′(N)) = lim_{N→∞} ϕ_y(x′(N)) = ϕ_y(lim_{N→∞} x′(N)) = ϕ_y(x′),

since both maps are continuous, so that ϕ = Ty as required.

Remark. In a similar fashion we can show that (c₀)∗ ≅ l1 and that (l2)∗ ≅ l2. The latter result is nothing but the Riesz Representation Theorem! The dual space of C[0, 1] can be identified with the normed vector space consisting of signed (Radon) measures on [0, 1].

Let X and Y be sets, X_0 ⊂ X, and f_0 : X_0 → Y, f : X → Y maps. Then f extends f_0 if f(x) = f_0(x) for all x ∈ X_0; alternatively we say that f_0 is the restriction of f to X_0.

Restrictions of linear (respectively continuous) maps are linear (resp. continuous), but the converse is not true. Note: if T ∈ B(X; Y) and X_0 is a subspace, then

‖T|_{X_0}‖ = sup_{x∈X_0, ‖x‖≤1} ‖Tx‖ ≤ sup_{x∈X, ‖x‖≤1} ‖Tx‖ = ‖T‖.

In many settings we want to produce extensions with specified properties. TheHahn–Banach Theorem is a fundamental result in functional analysis, and saysthat norm-preserving extensions of linear functionals always exist. The usual proofmakes use of heavy machinery from set theory (Zorn’s Lemma), but an easy proofis possible for the Hilbert space case courtesy of the Riesz Representation Theorem.

Theorem 6.18. Let H be a Hilbert space and K a closed subspace. Each ϕ ∈ K∗ has a unique norm-preserving extension to some ψ ∈ H∗.

Proof. Since K is a closed subspace of H, it is complete and hence a Hilbert space itself. Thus the given ϕ ∈ K∗ corresponds to some y ∈ K through

ϕ(x) = 〈y, x〉 ∀ x ∈ K, with ‖ϕ‖ = ‖y‖.

If we now define ψ : H → K by ψ(z) = 〈y, z〉, it follows that ψ ∈ H∗ with ‖ψ‖ = ‖y‖ = ‖ϕ‖, and that ψ|_K = ϕ.

All that remains is to show that the extension ψ is unique. So let χ ∈ H∗, with w ∈ H the corresponding element of H, i.e. χ(z) = 〈w, z〉 for all z ∈ H. Then

χ|_K = ϕ ⇔ χ(x) = ϕ(x) ∀ x ∈ K ⇔ 〈w, x〉 = 〈y, x〉 ∀ x ∈ K
⇔ 〈w − y, x〉 = 0 ∀ x ∈ K
⇔ w − y ∈ K⊥
⇔ w = y + v for some v ∈ K⊥.

However, since y ∈ K and v ∈ K⊥, we then also have

‖χ‖² = ‖y + v‖² = ‖y‖² + ‖v‖² = ‖ϕ‖² + ‖v‖²,

and so χ is a norm-preserving extension if and only if v = 0 as well.

7 Fourier Series

Our main result in this section is the following:

Theorem 7.1. The sequence {e_n}_{n∈Z}, where e_n(x) = (2π)^{−1/2}e^{inx}, x ∈ [−π, π], is an orthonormal basis of (i.e. a complete orthonormal sequence in) L2[−π, π].

We have already seen that the sequence is orthonormal, but we still need to establish completeness, which amounts to showing that \overline{Lin{e_n}} = L2[−π, π]. To do this we will in fact establish the following:

Proposition 7.2. Let

X = {f = g|_{[−π,π]} : g : R → K is continuous and 2π-periodic},

equivalently

X = {f ∈ C[−π, π] : f(−π) = f(π)}.

Then X is contained in \overline{Lin{e_n}}.

Proof of Theorem 7.1. Recall that by definition/construction C[−π, π] is dense in the Hilbert space L2[−π, π], and it is not hard to show that X is dense in C[−π, π] with respect to ‖·‖₂ (although it is not dense with respect to ‖·‖∞). So once we establish Proposition 7.2 we will have

X ⊂ \overline{Lin{e_n}} ⊂ L2[−π, π],

hence, taking closures,

L2[−π, π] = \overline{X} ⊂ \overline{Lin{e_n}} ⊂ L2[−π, π],

as required.

Before we prove Proposition 7.2 we need some notation and some preliminaries. For each f ∈ L2[−π, π] and m ∈ Z₊ define

s_m(f) := ∑_{n=−m}^m 〈e_n, f〉e_n ∈ Lin{e_n}.

Ideally we would like to show that s_m(f) → f as m → ∞ with respect to ‖·‖₂, but in fact we will show that

σ_m(f) := (1/(m+1))(s_0(f) + s_1(f) + ⋯ + s_m(f)) → f

as m → ∞ for each f ∈ X, with respect to ‖·‖₂, which shows Cesàro summability of the sequence of partial sums. This will be enough, since σ_m(f) ∈ Lin{e_n} for each m, and hence we will have f ∈ \overline{Lin{e_n}}.

Now,

σ_m(f)(y) = (1/(m+1)) ∑_{j=0}^m s_j(f)(y) = (1/(m+1)) ∑_{j=0}^m ∑_{k=−j}^j 〈e_k, f〉e_k(y)
= (1/(m+1)) ∑_{j=0}^m ∑_{k=−j}^j ∫_{−π}^π (1/(2π)) e^{−ikx} f(x) e^{iky} dx
= (1/(2π)) ∫_{−π}^π f(x) [ (1/(m+1)) ∑_{j=0}^m ∑_{k=−j}^j e^{ik(y−x)} ] dx.

That is, if we define

K_m(t) = (1/(m+1)) ∑_{j=0}^m ∑_{k=−j}^j e^{ikt},

then

σ_m(f)(y) = (1/(2π)) ∫_{−π}^π f(x) K_m(y − x) dx = (f ∗ K_m)(y).

Definition 7.3. The function K_m is called the mth Fejér kernel, and for any f, g ∈ L2[−π, π], the function

(f ∗ g)(x) := (1/(2π)) ∫_{−π}^π f(y)g(x − y) dy

is the convolution of f and g.

Lemma 7.4. For any t ∈ R \ {2πa : a ∈ Z},

K_m(t) = (1/(m+1)) · sin²((m+1)t/2) / sin²(t/2).

Proof. Write z = e^{it}; then, since \overline{z} = e^{−it},

∑_{k=−j}^j e^{ikt} = ∑_{k=−j}^j z^k = \overline{z}^j(1 + z + z² + ⋯ + z^{2j}) = \overline{z}^j(1 − z^{2j+1})/(1 − z) = (\overline{z}^j − z^{j+1})/(1 − z),

and so

(m+1)K_m(t) = ∑_{j=0}^m (\overline{z}^j − z^{j+1})/(1 − z)
= (1/(1 − z)) [ (1 − \overline{z}^{m+1})/(1 − \overline{z}) − z(1 − z^{m+1})/(1 − z) ]
= (1/(1 − z)) [ (1 − \overline{z}^{m+1})/(1 − \overline{z}) + (1 − z^{m+1})/(1 − \overline{z}) ]
= (2 − z^{m+1} − \overline{z}^{m+1}) / |1 − z|²,

where to go from the second to the third line we multiplied the second term in the brackets top and bottom by \overline{z}.

Now,

|1 − z| = |1 − e^{it}| = |e^{−it/2} − e^{it/2}| = 2|sin(t/2)|

and

z^{m+1} − 2 + \overline{z}^{m+1} = e^{i(m+1)t} − 2 + e^{−i(m+1)t} = (e^{i(m+1)t/2} − e^{−i(m+1)t/2})² = (2i sin((m+1)t/2))² = −4 sin²((m+1)t/2).

The result follows.
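The closed form of Lemma 7.4 can be checked numerically against the defining double sum. The following Python sketch (an added illustration; the particular values of m and t are arbitrary, chosen away from multiples of 2π) compares the two.

```python
import cmath
import math

# Fejer kernel: defining double sum versus the closed form of Lemma 7.4.

def K_sum(m, t):
    total = 0.0
    for j in range(m + 1):
        for k in range(-j, j + 1):
            total += cmath.exp(1j * k * t).real  # imaginary parts cancel in pairs
    return total / (m + 1)

def K_closed(m, t):
    return (math.sin((m + 1) * t / 2) ** 2 / math.sin(t / 2) ** 2) / (m + 1)

for m in (1, 3, 7):
    for t in (0.3, 1.1, 2.5):
        assert abs(K_sum(m, t) - K_closed(m, t)) < 1e-9
```

The nonnegativity of the closed form also makes Lemma 7.5(i) evident at a glance.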

Lemma 7.5. The Fejér kernel K_m satisfies the following:

(i) K_m(t) ≥ 0 for all t ∈ R, m ∈ Z₊.

(ii) ∫_{−π}^π K_m(t) dt = 2π for all m ∈ Z₊.

(iii) For any δ ∈ (0, π) we have

lim_{m→∞} ∫_{[−π,−δ)∪(δ,π]} K_m(t) dt = 0.

Proof. (i) is immediate from Lemma 7.4 if t ≠ 2πa for all a ∈ Z, and then follows for such t as well, since K_m is a continuous function. For (ii) note that

∫_{−π}^π e^{ikt} dt = 2π if k = 0, and 0 if k ≠ 0.

Finally, for (iii), if t ∈ [−π, −δ] ∪ [δ, π] then

sin²(t/2) ≥ sin²(δ/2) and sin²((m+1)t/2) ≤ 1.

Hence

0 ≤ K_m(t) = (1/(m+1)) sin²((m+1)t/2)/sin²(t/2) ≤ (1/(m+1)) csc²(δ/2),

and so

0 ≤ ∫_{−π}^{−δ} K_m(t) dt + ∫_δ^π K_m(t) dt ≤ 2π csc²(δ/2)/(m+1) → 0

as m → ∞.

Theorem 7.6 (Fejér). Let f : R → K be continuous and 2π-periodic. Then σ_m(f) → f uniformly on R as m → ∞.

Proof. Choose y ∈ [−π, π]. Letting x = y − t in Lemma 7.5(ii) gives

∫_{y−π}^{y+π} K_m(y − x) dx = 2π ⇒ f(y) = (1/(2π)) ∫_{y−π}^{y+π} f(y)K_m(y − x) dx.

But K_m is 2π-periodic, since it is a linear combination of the e_n, hence so is the function x ↦ f(x)K_m(y − x), and thus its integral over any interval of length 2π is the same. So

σ_m(f)(y) = (1/(2π)) ∫_{−π}^π f(x)K_m(y − x) dx = (1/(2π)) ∫_{y−π}^{y+π} f(x)K_m(y − x) dx,

and so

|σ_m(f)(y) − f(y)| = |(1/(2π)) ∫_{y−π}^{y+π} (f(x) − f(y))K_m(y − x) dx| ≤ (1/(2π)) ∫_{y−π}^{y+π} |f(x) − f(y)| K_m(y − x) dx.

Fix ε > 0. Since f is continuous on the compact set [−π, π], it is bounded and uniformly continuous. Thus there is some M > 0 such that |f(x)| ≤ M for all x ∈ R, and there is some 0 < δ < π such that

y ∈ [−π, π] and |x − y| < δ ⇒ |f(x) − f(y)| < ε/2. (∗)

Using the change of variable x = y − t in Lemma 7.5(iii) shows that there is some m_0 ∈ N such that for each m ≥ m_0 we have

∫_{y−π}^{y−δ} K_m(y − x) dx + ∫_{y+δ}^{y+π} K_m(y − x) dx < πε/(2M).

Moreover, |f(x) − f(y)| ≤ 2M for all x, y ∈ R, and so

(1/(2π)) ∫_{y−π}^{y−δ} |f(x) − f(y)| K_m(y − x) dx + (1/(2π)) ∫_{y+δ}^{y+π} |f(x) − f(y)| K_m(y − x) dx < (1/(2π)) · 2M · πε/(2M) = ε/2.

But for x ∈ [y − δ, y + δ] we have the estimate in (∗), from which we get

(1/(2π)) ∫_{y−δ}^{y+δ} |f(x) − f(y)| K_m(y − x) dx ≤ (ε/2) · (1/(2π)) ∫_{y−δ}^{y+δ} K_m(y − x) dx ≤ ε/2,

since the last integral is bounded above by 2π (Lemma 7.5). Hence

|σ_m(f)(y) − f(y)| ≤ (1/(2π)) ∫_{[y−π,y−δ)∪(y+δ,y+π]} |f(x) − f(y)| K_m(y − x) dx + (1/(2π)) ∫_{[y−δ,y+δ]} |f(x) − f(y)| K_m(y − x) dx < ε/2 + ε/2 = ε

for every y ∈ R (by periodicity) and every m ≥ m_0.

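Fejér's Theorem can be illustrated numerically. The following Python sketch uses the function f(x) = |x| on [−π, π] (an example assumed here, not in the notes), which is continuous and 2π-periodic; its cosine coefficients are the standard values a_n = −4/(πn²) for odd n, with constant term π/2. The sup-norm error of the Cesàro means σ_m decreases as m grows.

```python
import math

# Cesaro means of the Fourier partial sums of f(x) = |x| on [-pi, pi]:
# sigma_m(x) = pi/2 + sum_{n=1}^{m} (1 - n/(m+1)) a_n cos(n x), a_n as above.

def sigma(m, x):
    total = math.pi / 2
    for n in range(1, m + 1):
        if n % 2 == 1:
            a_n = -4.0 / (math.pi * n * n)
            total += (1 - n / (m + 1)) * a_n * math.cos(n * x)
    return total

def sup_error(m, pts=400):
    """Maximum of |sigma_m(x) - |x|| over a grid of points in [-pi, pi]."""
    return max(abs(sigma(m, -math.pi + 2 * math.pi * k / pts)
                   - abs(-math.pi + 2 * math.pi * k / pts))
               for k in range(pts + 1))

assert sup_error(60) < sup_error(5)  # uniform error shrinks with m
```

The decrease is slow, which is typical of Cesàro means, but unlike the raw partial sums the convergence here is uniform.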

Proof of Proposition 7.2. For any continuous function g : [−π, π] → K we have

‖g‖₂ = (∫_{−π}^π |g(t)|² dt)^{1/2} ≤ (∫_{−π}^π ‖g‖∞² dt)^{1/2} = √(2π) ‖g‖∞,

hence for f ∈ X we have, by Fejér's Theorem,

‖σ_m(f) − f‖₂ ≤ √(2π) ‖σ_m(f) − f‖∞ → 0

as m → ∞.

Theorem 5.10, in this context, then says the following:

Corollary 7.7. If f ∈ L2[−π, π] then the Fourier series ∑_{n=−∞}^∞ 〈e_n, f〉e_n is convergent, with

lim_{m→∞} ‖f − ∑_{n=−m}^m 〈e_n, f〉e_n‖₂ = 0 and ‖f‖₂² = ∑_{n=−∞}^∞ |〈e_n, f〉|².

Moreover, if g ∈ L2[−π, π] then

〈f, g〉 = ∑_{n=−∞}^∞ 〈f, e_n〉〈e_n, g〉.

Note that in the corollary we only have convergence with respect to ‖·‖₂; not necessarily pointwise, and certainly not uniform. Although Fejér's Theorem states that the averages of the partial sums, σ_m(f), converge uniformly to f when f is continuous and 2π-periodic, there are continuous functions f such that the partial sums s_m(f)(x) = ∑_{n=−m}^m 〈e_n, f〉e_n(x) fail to converge at certain values of x. However, it turns out that if the Fourier series does converge pointwise at a point x where f is continuous, then the value of the limit must be f(x).
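Parseval's identity in Corollary 7.7 can be checked numerically for a simple example. For f(x) = x on [−π, π] (an example chosen here, not from the notes), a short integration by parts gives |〈e_n, f〉|² = 2π/n² for n ≠ 0 and 〈e_0, f〉 = 0, while ‖f‖₂² = 2π³/3. The following Python sketch sums the coefficient series partially.

```python
import math

# Parseval for f(x) = x on [-pi, pi]: sum over n != 0 of 2*pi/n^2 should
# converge to ||f||_2^2 = 2*pi^3/3 (this is equivalent to sum 1/n^2 = pi^2/6).

norm_sq = 2 * math.pi ** 3 / 3

def parseval_partial(m):
    # terms for n and -n are equal, hence the factor 2
    return 2 * sum(2 * math.pi / n ** 2 for n in range(1, m + 1))

err_small = abs(parseval_partial(10) - norm_sq)
err_big = abs(parseval_partial(10000) - norm_sq)
assert err_big < err_small
```

Read backwards, this Parseval computation is one of the classical derivations of Euler's sum ∑ 1/n² = π²/6.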

Theorem 7.8 (Weierstrass' Approximation Theorem). Let P denote the vector space of polynomials with coefficients from K. It is a dense subspace of C[a, b] for any a < b, with respect to the supremum norm ‖·‖∞.

Proof. By a rescaling and translation argument it is enough to show that \overline{P} = C[−π, π], where \overline{P} denotes the closure of P with respect to ‖·‖∞.

Now the power series for e^z converges uniformly on any compact subset of C, from which we get e^{inx} ∈ \overline{P} for each n ∈ Z. Next, if f ∈ C[−π, π] is such that f(−π) = f(π) (i.e. f ∈ X from before) then σ_m(f) → f with respect to ‖·‖∞ as m → ∞, and σ_m(f) ∈ \overline{P} for each m. Hence f ∈ \overline{P}.

Finally, if g ∈ C[−π, π] with g(−π) ≠ g(π), then f(x) = g(x) + cx is continuous for any choice of c ∈ K. A particular choice will ensure that f(π) = f(−π) (which?), and so

g(x) = f(x) − cx ∈ \overline{P},

since \overline{P} is a subspace of C[−π, π].

Corollary 7.9. For each n = 0, 1, 2, …, let f_n(x) = xⁿ. For any a < b, if we apply the Gram–Schmidt process to (f_n)_{n=0}^∞ then we obtain an orthonormal basis of L2[a, b].

Proof. If (g_n)_{n=0}^∞ is the sequence obtained by applying the Gram–Schmidt process to (f_n)_{n=0}^∞ then

Lin{g_n} = Lin{f_n} = P,

where P is the subspace of polynomials in L2[a, b].

Now if f ∈ L2[a, b] and ε > 0, then there is some g ∈ C[a, b] such that ‖f − g‖₂ < ε/2. Moreover, the Weierstrass Approximation Theorem says that there is some h ∈ P such that ‖g − h‖∞ < ε/(2√(b − a)). But ‖k‖₂ ≤ √(b − a) ‖k‖∞ for any k ∈ C[a, b], and so

‖f − h‖₂ ≤ ‖f − g‖₂ + ‖g − h‖₂ < ε/2 + √(b − a) ‖g − h‖∞ < ε/2 + √(b − a) · ε/(2√(b − a)) = ε.
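On [a, b] = [−1, 1] the Gram–Schmidt process of Corollary 7.9 produces, up to normalisation, the classical Legendre polynomials; this is a well-known fact rather than something proved in the notes. The following Python sketch carries out the process exactly on 1, x, x², using ∫_{−1}^1 x^k dx = 2/(k+1) for even k (and 0 for odd k), and checks the output against P_2(x) = (3x² − 1)/2.

```python
# Gram-Schmidt on the monomials in L^2[-1, 1]; polynomials are coefficient lists.

def multiply(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    # <p, q> = integral of p*q over [-1, 1]; odd powers integrate to zero.
    return sum(c * 2.0 / (k + 1) for k, c in enumerate(multiply(p, q)) if k % 2 == 0)

def axpy(a, p, q):
    """q + a*p, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p, q = p + [0.0] * (n - len(p)), q + [0.0] * (n - len(q))
    return [qi + a * pi for pi, qi in zip(p, q)]

monomials = [[1.0], [0.0, 1.0], [0.0, 0.0, 1.0]]  # 1, x, x^2
ortho = []
for f in monomials:
    g = f
    for e in ortho:
        g = axpy(-inner(e, g), e, g)          # subtract component along e
    g = [c / inner(g, g) ** 0.5 for c in g]   # normalise
    ortho.append(g)

# The third output is proportional to the Legendre polynomial P_2 = (3x^2 - 1)/2.
ratio = ortho[2][2] / 1.5
assert abs(ortho[2][0] / -0.5 - ratio) < 1e-9
```

Continuing the loop over higher monomials produces the higher Legendre polynomials in the same way.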