Linear Analysis 2010




Linear Analysis
course code: 151124

    October 2010

    University of Twente


    Preface

This course is an introduction to Functional Analysis, with the main difference from a standard treatment being that topology is left out almost entirely.

The topics in the notes for the year 2010-2011 differ only marginally from those of previous years, but the text is

    substantially different and, we hope, more precise and easier to read.



    Contents

1 Introduction: real and complex vectors and matrices
  1.1 Vectors and matrices in R^n and R^{k×n}
  1.2 The dot product and orthogonality
  1.3 Euclidean norm
  1.4 Pythagoras
  1.5 Orthogonal complement in R^n
  1.6 Subspace, column space and null space
  1.7 Projection
  1.8 Transpose
  1.9 Normal equations and the projection operator
  1.10 Vectors and matrices in C^n and C^{k×n}
  1.11 Problems

2 Vector space
  2.1 Real vector space
  2.2 Complex vector space
  2.3 Subspace
  2.4 Linear combination and span
  2.5 Basis and dimension
  2.6 Problems

3 Linear transformation
  3.1 Linear transformation
  3.2 Familiar linear transformations
  3.3 Kernel, image and dimension
  3.4 Linear transformation on R^n
  3.5 Matrix representation and eigenvectors
  3.6 Problems

4 Normed vector space
  4.1 Norm
  4.2 Cauchy sequence
  4.3 Banach space = complete vector space
  4.4 Bounded linear operator
  4.5 Problems

5 Inner product
  5.1 Real inner product
  5.2 Complex inner product
  5.3 Norm
  5.4 Orthogonal complement
  5.5 Cauchy-Schwarz
  5.6 More examples
  5.7 Orthogonal projection
  5.8 Orthonormal sequences and Parseval
  5.9 Gram-Schmidt process
  5.10 Problems

6 Hilbert space
  6.1 Hilbert space
  6.2 Complete orthonormal basis
  6.3 Adjoint operator on Hilbert space
  6.4 Self-adjoint operators
  6.5 Unitary operators and norm preservation
  6.6 Problems


Index



    Notation

tr(A)    trace of a square matrix: tr(A) = ∑_i a_ii
det(A)   determinant of a square matrix A

    Natural, integer, rational, real, complex numbers:

N    set of positive integers {1, 2, 3, . . .}
N0   set of nonnegative integers {0, 1, 2, 3, . . .}
Z    set of integers {. . . , −2, −1, 0, 1, 2, . . .}
Q    set of rational numbers { n/k | n, k ∈ Z, k ≠ 0 }
R    set of real numbers
C    set of complex numbers

    Real and complex vectors and matrices:

R^n   set of ordered n-tuples (u1, . . . , un) with uk ∈ R, k = 1, 2, . . . , n
C^n   set of ordered n-tuples (u1, . . . , un) with uk ∈ C, k = 1, 2, . . . , n

Sequence space:

ℓ          sequence space {(u1, u2, . . . ) | uk ∈ R, k ∈ N}. It can also be written as {u : N → R}
ℓ(A,B)     {u : A → B} with A ⊆ Z. For instance, ℓ = ℓ(N,R)
ℓ2         {u : N → R | ∑_{k=1}^∞ u_k^2 < ∞}
ℓ2(A,C)    {u : A → C | ∑_{k∈A} |u_k|^2 < ∞}
ℓfinite    {u : N → R | u_k ≠ 0 for only finitely many k ∈ N}

Function space:

F(A,B)     {f : A → B}. This is the set of functions that map from some set A to some set B,
           for instance R^n = F({1, . . . , n}, R) and ℓ = F(N, R). Typically, though, F is used
           for function spaces such as F([0, 1], R)
L2[a, b]   The square integrable functions on [a, b] ⊆ R: {f : [a, b] → B | ∫_a^b |f(t)|^2 dt < ∞}
           with either B = R or B = C
L1[a, b]   {f : [a, b] → B | ∫_a^b |f(t)| dt < ∞} with either B = R or B = C
C[a, b]    {f : [a, b] → B | f is continuous} with either B = R or B = C
Pn(A,B)    The space of polynomials of degree n or less, that map from A to B. Here A ⊆ R
P          The space of polynomials of arbitrary degree, P = ∪_{n≥0} Pn





    1 Introduction: real and complex

    vectors and matrices

In this introductory chapter we review familiar facts about vectors and matrices in R^n and R^{k×n} and their complex counterparts, and we introduce a version of the projection theorem. It is this projection theorem, and most notably its proof, that we use as a motivation for the abstractions and generalizations of the following chapters. It is these abstractions and generalizations that are the main focus of this course. In the end the real and complex vectors and matrices play only a marginal role, but it is where our story begins.

1.1 Vectors and matrices in R^n and R^{k×n}

The set R^n is the set of ordered n-tuples (x1, x2, . . . , xn) with xi ∈ R, i ∈ {1, 2, . . . , n}. Commonly these n-tuples are identified with column vectors, so we write

R^n = { x | x = [ x1
                  x2
                  ...
                  xn ]  with xi ∈ R }.

Likewise R^{n×m} denotes the set of n × m real matrices. Matrices are denoted by capital letters and their elements by lower case letters with two subscript indices. The first index is the row index, the second the column index, for example

A = [ a11 a12 a13 · · · a1m
      a21 a22 a23 · · · a2m
      ...
      an1 an2 an3 · · · anm ]  ∈ R^{n×m}.

The transpose A^T is formed by considering all rows of A as columns of A^T,

A^T = [ a11 a21 · · · an1
        a12 a22 · · · an2
        a13 a23 · · · an3
        ...
        a1m a2m · · · anm ]  ∈ R^{m×n}.

It is convenient to think of the transpose A^T as the result of reflecting A in its diagonal. The kth column of a matrix A is denoted by A_k and, similarly, A_r means its rth row. The zero (matrix) in whatever dimension n × m is usually denoted simply as 0; the square n × n identity matrix is denoted by I_n or simply by I,

0 = [ 0 · · · 0
      .........
      0 · · · 0 ] ,      I = [ 1 0 · · · 0
                               0 1 · · · 0
                               .........
                               0 0 · · · 1 ].

    We assume familiarity with the common matrix addition

    and matrix multiplication.

    1.2 The dot product and orthogonality

    Definition 1.2.1 (Dot product and orthogonality in Rn ).

The dot product x · y of two vectors x, y ∈ R^n is the real number defined as

x · y = x1 y1 + x2 y2 + · · · + xn yn.

We say that two vectors x, y ∈ R^n are orthogonal (with respect to the dot product) if x · y = 0.

Orthogonality of x and y is often denoted as x ⊥ y.

    Example 1.2.2 (Orthogonality with respect to the

    dot product). Consider the vectors v and w shown in

    Fig. 1.1(a), that is,

v = (v1, v2) = (2, 1),    w = (w1, w2) = (−1, 2).

These two vectors are orthogonal because

v · w = (2 · −1) + (1 · 2) = 0.

It is not hard to show that the set of vectors x ∈ R^2 for which v · x ≥ 0 is the half space shown in Fig. 1.1(b).

Figure 1.1: Orthogonal vectors; panel (b) shows the half space {x ∈ R^2 | v · x ≥ 0}

For R^2 and R^3 the dot product being zero agrees with our intuition of being orthogonal (perpendicular), but realize that we take x · y = 0 to be the definition of orthogonality and that this is the definition for any R^n.
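As a quick numerical check of this definition, here is a minimal Python/NumPy sketch (an illustration added here, not part of the original notes), using the vectors v and w of Example 1.2.2:

```python
import numpy as np

v = np.array([2.0, 1.0])
w = np.array([-1.0, 2.0])

# Dot product v . w = v1*w1 + v2*w2
print(np.dot(v, w))           # 0.0, so v and w are orthogonal

# Membership in the half space {x in R^2 : v . x >= 0}
x = np.array([1.0, 3.0])
print(np.dot(v, x) >= 0)      # True: this x lies in the half space
```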



    1.3 Euclidean norm

    Definition 1.3.1 (Euclidean norm). The Euclidean norm

|x| of x ∈ R^n is defined as

|x| = √(x1^2 + x2^2 + · · · + xn^2).

The set {x ∈ R^n | |x| ≤ 1} is known as the unit ball (in the Euclidean norm). For n = 1 it is the unit interval [−1, 1] and for n = 2 it is the unit disc, see Fig. 1.2.

    The Euclidean norm of x equals the square root of the

dot product of x with itself,

|x| = √(x · x).

Figure 1.2: Unit ball {x ∈ R^2 | |x| ≤ 1} in the Euclidean norm, n = 2

    1.4 Pythagoras

Now that orthogonality is defined as having zero dot prod-

    uct, the Pythagorean theorem is trivial:

Theorem 1.4.1 (Pythagorean theorem). Let x, y ∈ R^n. Then

x ⊥ y  ⟹  |x + y|^2 = |x|^2 + |y|^2.

    Proof.

|x + y|^2 = (x + y) · (x + y)
          = x · (x + y) + y · (x + y)
          = (x · x) + (x · y) + (y · x) + (y · y)
          = |x|^2 + 2(x · y) + |y|^2.

If x ⊥ y then x · y = 0, so the cross term vanishes. Here we used that z · (x + y) = z · x + z · y and that x · y = y · x. Convince yourself of these properties.
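A short numerical sanity check of the theorem (again a NumPy sketch with an arbitrarily chosen orthogonal pair, added for illustration only):

```python
import numpy as np

x = np.array([2.0, 1.0, 0.0])
y = np.array([-1.0, 2.0, 3.0])

assert np.isclose(np.dot(x, y), 0.0)          # x and y are orthogonal
lhs = np.linalg.norm(x + y) ** 2              # |x + y|^2
rhs = np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2
print(np.isclose(lhs, rhs))                   # True
```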

1.5 Orthogonal complement in R^n

The orthogonal complement of some set S ⊆ R^n is the set of all vectors that are orthogonal to all elements of S. The orthogonal complement is denoted S^⊥.

Example 1.5.1 (Orthogonal complement). Consider

x := [ 1
       3
       10 ]  ∈ R^3.

Its orthogonal complement is

x^⊥ = {y ∈ R^3 | x · y = 0}
    = {y ∈ R^3 | y1 + 3y2 + 10y3 = 0}
    = {y ∈ R^3 | y3 = −(1/10) y1 − (3/10) y2}
    = { (a, b, −(1/10) a − (3/10) b) | a, b ∈ R }.

The orthogonal complement here is a plane.

We write V ⊥ W whenever all elements of V are perpendicular to all elements of W.

    1.6 Subspace, column space and null space

Figure 1.3: (a) subspace; (b) affine subspace; (c,d) not subspaces

Very loosely speaking a subspace of R^n is a subset that is flat, extends in all directions and contains the origin, see Fig. 1.3(a). It is not too hard to formalize subspace:



Definition 1.6.1 (Subspace). A subset S of R^n is a subspace if

1. The zero vector 0 is in S,
2. u, v ∈ S implies u + v ∈ S,    (closed under addition)
3. v ∈ S, α ∈ R implies αv ∈ S.    (closed under scaling)

It is customary to use 0 for both the origin (i.e. the zero vector) and the zero number. If S is a subspace and x ∈ R^n then x + S is referred to as an affine subspace, see Fig. 1.3(b).

Example 1.6.2 (Column space and Null space). The set

S := {v ∈ R^3 | v = (v1, v2, 0), v1, v2 ∈ R}

is a subspace of R^3. It is the (x, y)-plane. Let us verify the three defining properties of subspace:

1. Clearly 0 = (0, 0, 0) ∈ S.
2. If v, w ∈ S then v = (v1, v2, 0) and w = (w1, w2, 0). Hence v + w = (v1, v2, 0) + (w1, w2, 0) = (v1 + w1, v2 + w2, 0) and since its last entry is zero also this vector is in S.
3. If v ∈ S then v = (v1, v2, 0) so that αv = α(v1, v2, 0) = (αv1, αv2, 0) and this clearly is again an element of S.

    This subspace can be represented in many different ways:

    Let

A = [ 1 0
      0 1
      0 0 ]  ∈ R^{3×2}.

Our set S equals the column space Col(A) of the matrix A. This is the set of all possible linear combinations of the columns of A,

Col(A) := {x | x = Ay, y ∈ R^2}.

    Let

W = [0 0 1] ∈ R^{1×3}.

The null space, Null(W), of a matrix W is the set of vectors x for which Wx = 0. It will be no surprise that Null(W) = S for our W.

    We can also interpret the null space with dot products.

    Let w be the above W, now seen as a vector

w = [ 0
      0
      1 ]  ∈ R^3.

The set S is the orthogonal complement w^⊥:

S = {v ∈ R^3 | v · w = 0}.

Equivalently, it is the orthogonal complement of the entire column space,

S = Col( [ 0
           0
           1 ] )^⊥.

This is just to say that the (x, y)-plane is the set of vectors that are orthogonal to the z-axis.

The following lemma states that any subspace of R^n can be represented by matrices.

Lemma 1.6.3 (Matrix representation of subspace). Let S be a subset of R^n. The following four statements are equivalent:

• S is a subspace
• S = Col(A) for some matrix A ∈ R^{n×k} and some k ∈ N
• S = Null(W) for some W ∈ R^{m×n} and some m ∈ N
• S = W^⊥ for some set W ⊆ R^n.

    Given a subspace S there are many matrices A and W for

    which S = Col(A) = Null(W).

    1.7 Projection

Figure 1.4: Orthogonal projection in R^3

    With orthogonality, norm and subspace defined it is now

    possible to formulate our intuition that connects minimal

    distance (norm) with orthogonality. Here is our first ver-

    sion. Have a look at the proof because it is a basis for later

    generalizations.

Definition 1.7.1 (Best approximation). An element v* ∈ V ⊆ R^n is a best approximation in V of x ∈ R^n if

|x − v*| ≤ |x − v|   for all v ∈ V.

    See Fig. 1.4.



Theorem 1.7.2 (A projection theorem). Let x ∈ R^n and let V be a subspace of R^n. Then

1. v* is a best approximation in V of x iff1 (x − v*) ⊥ V,
2. If the best approximation v* exists then it is unique and it satisfies

|x − v*|^2 = |x|^2 − |v*|^2.

Proof. Suppose (x − v*) ⊥ V for some v* ∈ V. Then for any v ∈ V the difference v* − v is in V by the subspace property, and so by Pythagoras we get

|x − v|^2 = |(x − v*) + (v* − v)|^2 = |x − v*|^2 + |v* − v|^2 ≥ |x − v*|^2.

Hence if v ≠ v* then the norm of x − v exceeds that of x − v*, making v* the unique best approximation.

Conversely, suppose (x − v*) is not orthogonal to V. Then by definition there is a ṽ ∈ V such that (x − v*) is not orthogonal to ṽ, i.e. such that (x − v*) · ṽ ≠ 0. In particular this ṽ is nonzero. We construct an improved approximation of x of the form v* + αṽ with the real number α yet to be determined:

|x − (v* + αṽ)|^2 = |(x − v*) − αṽ|^2
                  = |x − v*|^2 − 2(x − v*) · (αṽ) + |αṽ|^2
                  = |x − v*|^2 − 2α[(x − v*) · ṽ] + α^2 |ṽ|^2.

This quadratic expression in α is minimized for α = [(x − v*) · ṽ] / |ṽ|^2, rendering it equal to

|x − v*|^2 − 2 [(x − v*) · ṽ]^2 / |ṽ|^2 + [(x − v*) · ṽ]^2 / |ṽ|^2
    = |x − v*|^2 − [(x − v*) · ṽ]^2 / |ṽ|^2
    < |x − v*|^2.

So then v* is not a best approximation.

The equality |x − v*|^2 = |x|^2 − |v*|^2 is a restatement of Pythagoras, see Fig. 1.4.

The theorem avoids the issue of existence of the best approximation v* because we prefer not to worry about it now. Here (in R^n) it does exist though.

    1.8 Transpose

    1iff means if-and-only-if

    For explicit representations of the best approximation we

    remind you of an alternative representation of the dot prod-

    uct in terms of transpose of vectors,

x · v = v^T x = [v1 v2 · · · vn] [ x1
                                   x2
                                   ...
                                   xn ].

Then we get the handy rule that for any k × n matrix A and vectors x ∈ R^n, y ∈ R^k, the matrix A can be moved from one side of the dot product to the other:

(Ax) · y = x · (A^T y).

Indeed, (Ax) · y = y^T (Ax) = (A^T y)^T x = x · (A^T y).

1.9 Normal equations and the projection operator

If we have the subspace V given in the explicit form

V = Col(A),

then the best approximation v* ∈ V of x can be obtained rather explicitly:

    Lemma 1.9.1 (Explicit projection normal equations).

Let x ∈ R^n and A ∈ R^{n×k}. Then

y* = arg min_{y ∈ R^k} |x − Ay|

iff y* ∈ R^k satisfies the normal equations

A^T A y* = A^T x.    (1.1)

The best approximation v* ∈ Col(A) of x then is v* = Ay*.

Proof. This is the projection theorem for V = Col(A) and v* = Ay*. By the projection theorem we need only establish that (x − Ay*) ⊥ V:

(x − Ay*) ⊥ V  ⟺  (x − Ay*) · (Ay) = 0 for all y ∈ R^k
               ⟺  y^T A^T (x − Ay*) = 0 for all y ∈ R^k
               ⟺  A^T (x − Ay*) = 0    (see Problem 1.2)
               ⟺  A^T x = A^T A y*.

    This result clearly shows that the transpose is a conve-

    nient notion. With it, projections can be formulated ex-

    plicitly, something we will come back to later (at which

    point we generalize transpose to something called adjoint).The equations (1.1) are known as the normal equations.

    4

  • 8/2/2019 Linear Analysis 2010

    11/66

The lemma does not require that A^T A is invertible and indeed the solution y* of the normal equations need not be unique, but if A^T A is invertible then (1.1) yields

y* = (A^T A)^{-1} A^T x

and hence the best approximation v* = Ay* equals

v* = A (A^T A)^{-1} A^T x.    (1.2)

    Example 1.9.2 (Projection in R2). Let

V = Col( [ 3
           1 ] )   and   x = [ 0
                               1 ].

According to (1.2) the best approximation in V of x is v* = A(A^T A)^{-1} A^T x with A = [3; 1]. Here A^T A = 10 and A^T x = 1, so

v* = [ 3   · (1/10) = [ 0.3
       1 ]              0.1 ].
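The computation of Example 1.9.2 is easy to reproduce numerically. The following NumPy sketch (an added illustration; it assumes A^T A is invertible, as it is here) solves the normal equations (1.1) and checks the orthogonality condition of Theorem 1.7.2:

```python
import numpy as np

A = np.array([[3.0],
              [1.0]])          # V = Col(A)
x = np.array([0.0, 1.0])

# Normal equations: A^T A y* = A^T x
y_star = np.linalg.solve(A.T @ A, A.T @ x)
v_star = A @ y_star            # best approximation of x in Col(A)
print(v_star)                  # [0.3 0.1]

# The residual x - v* is orthogonal to Col(A)
print(A.T @ (x - v_star))      # approximately [0.]
```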

1.10 Vectors and matrices in C^n and C^{k×n}

    We briefly summarize the complex counterpart of the reals.

The set C^n is the set of ordered n-tuples (x1, x2, . . . , xn) with xi ∈ C, i ∈ {1, 2, . . . , n}. As in the real case these n-tuples are often identified with column vectors, so we write

C^n = { x | x = [ x1
                  x2
                  ...
                  xn ]  with xi ∈ C }.

The set C^{n×m} denotes the set of n × m complex valued matrices. Given a complex matrix

A = [ a11 a12 a13 · · · a1m
      a21 a22 a23 · · · a2m
      ...
      an1 an2 an3 · · · anm ]  ∈ C^{n×m},

its complex conjugate transpose2 A^H is the matrix defined as

A^H = [ ā11 ā21 · · · ān1
        ā12 ā22 · · · ān2
        ā13 ā23 · · · ān3
        ...
        ā1m ā2m · · · ānm ]  ∈ C^{m×n}.

    2or Hermitian transpose or conjugate transpose.

The complex conjugate transpose A^H can be obtained by reflecting A in its diagonal and then replacing each element by its complex conjugate. We say that a matrix is Hermitian if A = A^H. If the matrix happens to be real then A^H = A^T. There are two well accepted notations for complex conjugate transpose: A^H and A*. We choose A^H to set it apart from the adjoint operators that we introduce later.

Example 1.10.1 (Pauli matrix). The Pauli matrices3 σ1, σ2, σ3 are the three 2 × 2 matrices

σ1 = [ 0 1      σ2 = [ 0 −i      σ3 = [ 1  0
       1 0 ],          i  0 ],          0 −1 ].

All three are Hermitian and they have the property that

σ1^2 = σ2^2 = σ3^2 = −i σ1 σ2 σ3 = I2.

Let us verify that σ2^2 = I2:

σ2^2 = [ 0 −i   [ 0 −i    = [ 1 0
         i  0 ]   i  0 ]      0 1 ].

Since σi^H = σi we also have that σi^H σi = I2. This property is what we later call unitary.

The dot product x · y for complex vectors x and y of equal dimension is defined as

x · y = y^H x = [ȳ1 · · · ȳn] [ x1
                                ...
                                xn ]  = ∑_{k=1}^n ȳk xk.

Example 1.10.2 (Norm of complex vector). For v = (1, 2 + i, 3i) ∈ C^3 we have

v · v = v̄1 v1 + v̄2 v2 + v̄3 v3
      = |v1|^2 + |v2|^2 + |v3|^2
      = 1^2 + |2 + i|^2 + |3i|^2
      = 1^2 + (2^2 + 1^2) + 3^2 = 15.

The norm, defined as |v| = √(v · v), hence is √15.
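In NumPy the complex dot product x · y = y^H x corresponds to np.vdot(y, x), which conjugates its first argument. A small check of this example (an added illustration):

```python
import numpy as np

v = np.array([1, 2 + 1j, 3j])

# v . v = sum_k conj(v_k) * v_k = |v1|^2 + |v2|^2 + |v3|^2
print(np.vdot(v, v).real)            # 15.0
print(np.sqrt(np.vdot(v, v).real))   # |v| = sqrt(15) ~ 3.873
```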

    1.11 Problems

1.1 Let

x = [ 1
      2
      3 ].

3 σ is the common notation for Pauli matrices in physics. In this course we typically denote matrices with capital letters however.



    a) Determine a matrix A such that x^⊥ = Col(A)
    b) How many columns of A are needed?

1.2 Show that x · y = 0 for all y ∈ R^n implies that x = 0.

1.3 Let W ∈ R^{m×n}. Prove that Null(W) is a subspace.

1.4 Let A ∈ R^{n×k}. Prove that Col(A) is a subspace.

1.5 Let S ⊆ R^n. Show that
    a) S ⊆ (S^⊥)^⊥
    b) S is a subspace iff (S^⊥)^⊥ = S

1.6 Let S1, S2 ⊆ R^n. Is the intersection S1 ∩ S2 a subspace if S1 and S2 are subspaces?

1.7 Consider

A = [ 1 0
      1 4
      0 1 ].

Compute the best approximation in V = Col(A) of x = (0, 0, 1).

1.8 Redo the previous example but now for

A = [ 1 2
      1 2
      1 2 ].

1.9 Let A ∈ R^{n×3} and let (as always) A_k denote its kth column. Show that

A^T A = [ |A1|^2    A2 · A1   A3 · A1
          A1 · A2   |A2|^2    A3 · A2
          A1 · A3   A2 · A3   |A3|^2 ].

1.10 Let V = Null([1 1 1]).
    a) Express V as V = Col(A) for some matrix A
    b) Determine the best approximation in V of x = (0, 0, 1).
    c) Sketch V and both x = (0, 0, 1) and its best approximation.

    1.11 Prove the two properties used in the proof of the

    Pythagorean theorem:

x · y = y · x,    z · (x + y) = (z · x) + (z · y).

1.12 Suppose Q is a 2 × 2 matrix such that |Qx| = |x| for all x ∈ R^2.
    a) Show that Q^T Q = I
    b) Show that Q has the form

       Q = [ cos(θ) −sin(θ)       or    Q = [ cos(θ)  sin(θ)
             sin(θ)  cos(θ) ]                 sin(θ) −cos(θ) ].

Figure 1.5: Minimum norm element v* of an affine subspace x + V

    1.13 A version of the projection theorem that appears often

    in applications is the following (see Fig. 1.5):

Let x ∈ R^n and let V be a subspace of R^n. A vector v* ∈ x + V is a minimal norm element of the affine subspace x + V if and only if v* ⊥ V.

    Prove it.

1.14 Sketch the affine subspace (0, 1) + Col([1; 2]) and determine the minimal norm element of this set.

1.15 Determine the complex conjugate transpose of

    a) [ 1 + i   1 + 2i
         1 + 3i  1 + 4i ]

    b) [ 3       2 + i   3 + 2i
         4 + 2i  4 + i   4      ]

    c) [ i   0   i   1 + i ]

    d) [ 0       1 + i   3 + 4i
         1 + i   0       2 − 6i
         3 + 4i  2 − 6i  0      ]

1.16 Let

x = [ 1
      2i ]  ∈ C^2.

Determine a complex matrix A and W such that x^⊥ = Col(A) = Null(W). (In the complex case, Col(A) is the set of vectors of the form Ay with y ∈ C^k, where k is the number of columns of A.)

1.17 What is the smallest subspace of R^3 that contains the unit circle {(x, y, z) | x^2 + y^2 = 1, z = 0}?



1.18 Show that
    a) Col(A)^⊥ = Null(A^T)
    b) Col(A A^T)^⊥ = Null(A^T)
    c) Col(A) = Col(A A^T)

1.19 Formulate and prove a projection theorem for x ∈ C^n and V a subspace of C^n. This also requires that you think about what subspace should mean in C^n (this chapter only defines it for real vectors).

Figure 1.6: Least squares fit (data points (t_k, x_k) and errors ε_k)

    1.20 Least squares approximation. A very common prob-

    lem is to approximate a set of pairs of real numbers,

    (t1,x1), (t2,x2), . . . , (tn,xn)

    by a straight line, see Fig. 1.6. This can be seen as an

application of the projection theorem in R^n with n the number of pairs. We write the candidate straight line as

x(t) = y1 + y2 t,  with y1, y2 ∈ R,

and the approximation error of the kth pair we write as ε_k := x_k − x(t_k), see Fig. 1.6. Ideally ε_k = 0 for all k, which would mean that the straight line interpolates all pairs. In practice we try to make the errors as small as possible, and the most popular way of doing this is by least squares approximation:

    a) Express the vector (ε1, . . . , εn) of errors as

       [ ε1        [ x1        [ ? ?
         ε2          x2          ? ?     [ y1
         ...    =    ...    −    ...       y2 ]
         εn ]        xn ]        ? ? ]

       (that is, determine the matrix A in ε = x − Ay)

    b) Show that

       A^T A = ∑_{k=1}^n [ 1    t_k
                           t_k  t_k^2 ].

    c) Show that A^T A is invertible iff t_k ≠ t_j for at least one pair (j, k). (This might be a tough problem.)

    d) Show that the sum of squares ∑_{k=1}^n ε_k^2 of the errors equals |x − Ay|^2 and write down the corresponding normal equations in terms of the available data (t_k, x_k).

    e) The least squares fit is defined as the straight line that minimizes the sum of squares ∑_{k=1}^n ε_k^2. Determine the least squares fit (that is, determine the optimal y1, y2 as functions of t_k, x_k. You may assume that A^T A is invertible).
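For illustration, here is a minimal NumPy sketch of such a least squares fit on made-up data (the data values are hypothetical, added here only as an example; the sketch just solves the normal equations A^T A y = A^T x for the line x(t) = y1 + y2 t):

```python
import numpy as np

# Hypothetical data points (t_k, x_k)
t = np.array([0.0, 1.0, 2.0, 3.0])
x = np.array([0.1, 0.9, 2.1, 2.9])

# Columns of A multiply y1 and y2 in x(t) = y1 + y2 * t
A = np.column_stack([np.ones_like(t), t])

# Solve the normal equations A^T A y = A^T x
y = np.linalg.solve(A.T @ A, A.T @ x)
print(y)                          # intercept y1 and slope y2
print(np.sum((x - A @ y) ** 2))   # minimized sum of squared errors
```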





    2 Vector space

    Let us say that it is our purpose to generalize the pro-

    jection theorem. Then we should generalize the various

    players in the projection theorem. These are

• space R^n,
• subspace V of R^n,
• dot product,
• Euclidean norm.

In this chapter we generalize space R^n (to be called vector

    space) and subspace V (still to be called subspace). Vector

    spaces and subspaces can be recognized in loads of appli-

    cations, the projection theorem being just one of them.

    2.1 Real vector space

    What properties ofRn did we implicitly use in the projec-

    tion theorem and its proof? Have a look at Thm. 1.7.2 and

    its proof and you will probably agree that the following

    eight properties will do:

Definition 2.1.1 (Real vector space). A real vector space (X, ∔, ·) is a nonempty set of elements X, called vectors, on which vector addition ∔ : X × X → X and real scalar multiplication · : R × X → X is defined with the following eight properties for all v, w ∈ X and all scalars α, β ∈ R:

1. u ∔ v = v ∔ u    (commutative)
2. (u ∔ v) ∔ w = u ∔ (v ∔ w)    (associative)
3. There is a zero vector, also known as origin, 0̲ ∈ X such that u ∔ 0̲ = u for all u ∈ X
4. For each v ∈ X there is an additive inverse −v ∈ X such that v ∔ (−v) = 0̲
5. 1 · v = v
6. α · (β · v) = (αβ) · v    (associative)
7. (α + β) · v = α · v ∔ β · v    (distributive)
8. α · (u ∔ v) = α · u ∔ α · v    (distributive)

    If this is your first contact with such a formal definition

    then please realize this: we have the freedom to define our

    own addition and multiplication and we may dream up

    really weird sets X; but the moment that X with that addi-tion and multiplication satisfies the eight axioms of vector

    space then automatically all results we will derive for gen-

eral vector spaces hold for our weird X as well. That's the

    beauty of generality and abstraction.

    Before entering a series of examples, you will want to

    know that the 8 axioms of vector space imply a host of

    other properties. Here are some basic ones:

    Theorem 2.1.2 (Basic properties of vector space). Sup-

pose (X, ∔, ·) is a real vector space. Then

1. The origin 0̲ ∈ X is unique
2. The additive inverse is unique: if v ∔ w1 = 0̲ and v ∔ w2 = 0̲ then w1 = w2.
3. 0 · v = 0̲
4. α · 0̲ = 0̲
5. The additive inverse −v equals (−1) · v
6. α · v = 0̲, v ≠ 0̲  ⟹  α = 0

Proof.

1. Suppose that 0̲1 and 0̲2 are two zero vectors. Then 0̲1 ∔ 0̲2 = 0̲1 and 0̲1 ∔ 0̲2 = 0̲2. So the two zero vectors are the same.

2. Let w1 and w2 be two additive inverses of v. Then w1 = w1 ∔ 0̲ = w1 ∔ (v ∔ w2) = (w1 ∔ v) ∔ w2 = 0̲ ∔ w2 = w2.

3. 0 · v = 0 · v ∔ 0̲ = 0 · v ∔ (0 · v ∔ (−0 · v)) = (0 + 0) · v ∔ (−0 · v) = 0 · v ∔ (−0 · v) = 0̲.

4. We proved it already for α = 0. If α ≠ 0 then v ∔ α · 0̲ = α · ((1/α) · v ∔ 0̲) = α · ((1/α) · v) = 1 · v = v for every v. Hence α · 0̲ satisfies the conditions of the zero vector.

5. v ∔ (−1) · v = 1 · v ∔ (−1) · v = (1 − 1) · v = 0 · v = 0̲.

6. Suppose α · v = 0̲ and v ≠ 0̲. If α ≠ 0 then (1/α) · (α · v) = ((1/α)α) · v = 1 · v = v ≠ 0̲ while (1/α) · (α · v) = (1/α) · 0̲ = 0̲. This is a contradiction. Hence α = 0.

In fact properties 3, 4 and 6 of the above theorem can be combined into

α · v = 0̲  ⟺  (α = 0 and/or v = 0̲).

    One may choose to include any number of the above prop-

    erties into the definition of vector space but it is customary

    not to do that. We prefer to strip a property from a defi-

nition if it is implied by other properties (axioms) of the

    definition.

Example 2.1.3 (R^n). The space R^n of ordered sequences of given length n ∈ N, with entries in R,

R^n = {u | u = (u1, u2, . . . , un), uk ∈ R}



    is a vector space under the vector addition and scalar mul-

    tiplication defined elementwise as

u ∔ v := (u1 + v1, u2 + v2, . . . , un + vn),    α · u := (αu1, αu2, . . . , αun).

The subtlety is that the plus-sign in u ∔ v represents addition of two vectors whereas the plus-sign in u1 + v1 represents ordinary addition of two real numbers. Likewise α · u is a product of scalar α and vector u while αu1 simply means a product of two real numbers. It is easy to verify that the 8 defining properties of vector space hold, i.e. that this (R^n, ∔, ·) is a real vector space.

Example 2.1.4 (Sequence space). The space ℓ(N; R) is the set of one-sided infinite sequences

ℓ(N; R) = {u | u = (u1, u2, . . .), uk ∈ R, k ∈ N}.

As in R^n it is a vector space under the addition and scalar multiplication defined elementwise as

u ∔ v := (u1 + v1, u2 + v2, u3 + v3, . . .),    α · u := (αu1, αu2, αu3, . . .).

We leave it to the reader to establish that the 8 properties of

    real vector space indeed hold.

Figure 2.1: Two vectors u, v ∈ R^25 and their sum u ∔ v

Figure 2.1 depicts vector addition in R^25. The reason to include this figure is to convince you of the fact that also function spaces can be seen as vector spaces and that conceptually the step from R^n to function space is marginal.

    Example 2.1.5 (Function space). The set of functions

F([0, 1], R) := {f : [0, 1] → R}

that map from [0, 1] to R, is a vector space under addition and scalar multiplication defined pointwise, at each t, as

(f ∔ g)(t) = f(t) + g(t),    (α · f)(t) = α f(t).

See Fig. 2.2. It is a bit of a bore to verify the eight defining rules of vector space, but for once we have to do it:

1. f ∔ g = g ∔ f because (f ∔ g)(t) = f(t) + g(t) = g(t) + f(t) = (g ∔ f)(t). The vector addition inherits the commutative property of addition of real numbers.
2. (f ∔ g) ∔ p = f ∔ (g ∔ p) indeed, and its proof is very similar to that of part 1.
3. the function n(t) = 0 for all t satisfies f ∔ n = f for every function f, so n is a zero vector
4. −f defined pointwise as (−f)(t) = (−1)f(t) is an additive inverse of f because then f ∔ (−f) = n
5. 1 · f = f because (1 · f)(t) = 1(f(t)) = f(t) for all t.
6. α · (β · f) = (αβ) · f. This is possibly the trickiest to prove. Its proof is a series of applications of the definition of scalar multiplication on our function space: (α · f)(t) = α f(t) for all t. Here we go:

   (α · (β · f))(t) = α((β · f)(t)) = α(β f(t)) = (αβ) f(t) = ((αβ) · f)(t).

   So α · (β · f) and (αβ) · f are indeed the same functions.
7. (α + β) · f = α · f ∔ β · f, see Problem 2.6.
8. α · (f ∔ g) = α · f ∔ α · g, see Problem 2.7.

Figure 2.2: Graph of functions f and g and their sum f ∔ g
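To make the pointwise definitions concrete, here is a small Python sketch (an added illustration, not part of the original notes; functions are represented as ordinary callables):

```python
import math

def add(f, g):
    """Pointwise sum (f + g)(t) = f(t) + g(t)."""
    return lambda t: f(t) + g(t)

def scale(alpha, f):
    """Pointwise scaling (alpha * f)(t) = alpha * f(t)."""
    return lambda t: alpha * f(t)

f = math.sin
g = math.cos
h = add(f, scale(2.0, g))     # the function t -> sin(t) + 2*cos(t)
print(h(0.0))                 # 2.0
```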

    Notation cleanup

    To avoid unduly cumbersome notation we simplify the no-

    tation somewhat.

    The dot on top of vector addition was used to empha-

    size that it differs from addition of scalars. Now that the

    difference is clear, we almost always skip the dot on vector

    addition and so + from now on means both vector additionand scalar addition. The context makes clear which one it

    is.

Similarly the dot in scalar-vector multiplication such as in α · v is deleted altogether: αv.



Also the underline in the zero vector 0̲ is usually omitted, so from now on 0 is used both for the scalar zero and the zero vector.

Finally, we typically say "X is a vector space" instead of the more precise but also more cumbersome "(X, +, ·) is a vector space".

    2.2 Complex vector space

A complex vector space differs from a real vector space only in that the scalars (the α's and β's) in a complex vector space are taken from C instead of R. For completeness: a complex vector space X is a nonempty set of elements, called vectors, on which vector addition X × X → X and complex scalar multiplication C × X → X is defined that satisfy the 8 properties of Definition 2.1.1 for all v, w ∈ X and all α, β ∈ C. From the context it will be clear whether we deal with real or complex vector spaces and we refer to the α's and β's simply as scalars.

The basic properties of Theorem 2.1.2 also hold for complex vector space (the proof is identical).

Example 2.2.1 (C^n). The space C^n is the set of ordered n-tuples of complex numbers,

C^n = {u | u = (u1, u2, . . . , un); u1, . . . , un ∈ C}.

It is a vector space under the addition and scalar multiplication defined elementwise as

u + v = (u1 + v1, u2 + v2, . . . , un + vn),    αu = (αu1, αu2, . . . , αun).

Example 2.2.2 (Doubly infinite complex sequence). The space ℓ(Z; C) is the set of doubly infinite ordered sequences

ℓ(Z; C) = {u | u = (. . . , u−1, u0, u1, . . .), uk ∈ C, k ∈ Z}.

It is a vector space under the addition and scalar multiplication defined elementwise as

u + v = (. . . , u−1 + v−1, u0 + v0, u1 + v1, . . .),    αu = (. . . , αu−1, αu0, αu1, . . .).

    Example 2.2.3 (Function space). Complex-valued func-

    tions

F([0, 1], C) := {f : [0, 1] → C}

that map from [0, 1] to C can be seen as a vector space with addition and scalar multiplication defined pointwise as

(f + g)(t) = f(t) + g(t)  for all t ∈ [0, 1],    (αf)(t) = α(f(t))  for all t ∈ [0, 1].

The zero element is the function n(t) that is zero for every t ∈ [0, 1].

    2.3 Subspace

A subset of a vector space may be a vector space itself. For instance the (x, y)-plane of the vector space R^3 is itself a vector space with addition and scalar multiplication borrowed from the vector space R^3. If it has been settled that X is a vector space, then to test whether or not a subset V ⊆ X is a vector space, we need not redo all the 8 defining properties of vector space. It is sufficient to check that the set is closed under addition and scalar multiplication. All other axioms of vector space are then inherited from those of X. Such subsets, when nonempty, we call subspaces.

Definition 2.3.1 (Subspace). A subset V of a vector space X is a subspace of X if for all u, v ∈ V and scalar α:

1. 0 ∈ V,
2. u + v ∈ V,    (closed under addition)
3. αv ∈ V.    (closed under scaling)

In a non-empty set V the third condition implies the first (take α = 0). Therefore the first condition in effect only says that subspaces are not allowed to be empty.

    Example 2.3.2 (Subspace of function space). The set

S = {f : R → R | there exist c, d ∈ R such that f(t) = c cos(t) + d sin(t) for all t ∈ R}

is a subspace of F(R, R). Let us verify:

1. the zero function n(t) = 0 for all t of F(R, R) is an element of S (take c = d = 0),
2. it is closed under addition, for if f_k(t) = c_k cos(t) + d_k sin(t) ∈ S then so is their sum (f1 + f2)(t) = (c1 + c2) cos(t) + (d1 + d2) sin(t) ∈ S.
3. it is closed under scalar multiplication, for if f(t) := c cos(t) + d sin(t) is in S then so is αf(t) = (αc) cos(t) + (αd) sin(t).

    Our intuition forR3 that says that a subspace is something

    flat may fail for function space. It is a subspace nonethe-

    less.

    Example 2.3.3 (Finitely nonzero sequence space). The

    set of infinite sequences of which only finitely many entries

    are nonzero,

ℓfinite(N, R) := {u : N → R | only finitely many uk are nonzero}

is a subspace of ℓ(N; R). See Problem 2.14.

The next example is important. It considers the set of square summable sequences and they play a key role in functional analysis.



    Example 2.3.4 (Square summable sequence). The set

of square summable sequences u = (u1, u2, . . . ) of real numbers is denoted ℓ2(N; R). That is,

ℓ2(N; R) = {u = (u1, u2, . . . ) | un ∈ R, ∑_{n=1}^∞ u_n^2 < ∞}.

2.6 Problems

2.26 Suppose dim(X) = n > 0. Is it true that any set of n elements that spans X is a basis of X?

    2.27 A subset S of a vector space is an affine subspace if

    it is closed under affine combination, meaning that if

x, y ∈ S then

α1 x + α2 y ∈ S

for all α1 and α2 that add up to one, α1 + α2 = 1.

    a) Consider R^2 and two elements x1 = (0, 1) and x2 = (2, 1). Sketch in the plane the set of all affine combinations of x1 and x2
    b) Show that a nonempty S is an affine subspace (of some vector space X) iff S = x0 + V for some x0 ∈ X and some subspace V of X.
    c) Let S be an affine subspace. Show that for any n and any x1, . . . , xn ∈ S we have

       ∑_{i=1}^n αi xi ∈ S

       whenever ∑_{i=1}^n αi = 1.

2.28 Let n > 0. Show that R^n is not a subspace of C^n.

2.29 Suppose V is a subspace of X and that dim(X) < ∞. Show that V = X iff dim(V) = dim(X).

    2.30 Prove that a subspace of a vector space is itself a vec-

    tor space.

2.31 Consider P3 with basis {1, x − 3, (x − 3)^2, (x − 3)^3}. Determine the coordinates with respect to this basis of
    a) 1
    b) x
    c) x^2

2.32 Consider span{1, e^{ix}, e^{−ix}} ⊆ F(R, C) with obvious basis S = {1, e^{ix}, e^{−ix}}. With respect to this basis, determine the vector of coordinates of
    a) sin(x)



    b) 1 + cos(x)

2.33 Alternative definition of vector space. Less common but more concise is this definition of vector space:

    A real vector space is a nonempty set V with an addition ∔ : V × V → V and scalar multiplication · : R × V → V that satisfy the following six axioms for all x, y, z ∈ V and all α, β ∈ R:

    • x ∔ (y ∔ z) = (x ∔ y) ∔ z
    • 0 · x does not depend on x
    • (α + β) · x = (α · x) ∔ (β · x)
    • α · (x ∔ y) = (α · x) ∔ (α · y)
    • α · (β · x) = (αβ) · x
    • 1 · x = x

    We denote 0 · x as 0̲. We abbreviate (−1) · x to −x and x ∔ (−y) to x − y.

    a) Show that Definition 2.1.1 implies the above six axioms
    b) Show that the above six axioms imply the eight of Definition 2.1.1.

    In other words, the two definitions of vector space are

    equivalent.



    3 Linear transformation

Figure 3.1: A mapping F from V to W

    Linear transformations (also known as linear operators

and linear mappings) are everywhere. For instance the projection theorem (Theorem 1.7.2) states that the best approximation v* of an x is unique, so we can consider the mapping F that sends x to its best approximation v* = F(x). This is just one of the many mappings F that turn out to be linear.

    3.1 Linear transformation

    Definition 3.1.1 (Linearity). Let V and W be two vector

    spaces (both real or both complex vector spaces). A map-

ping F from V to W is linear if for every v1, v2, v ∈ V and scalar α:

1. F(v1 + v2) = F(v1) + F(v2),    (additive)
2. F(αv) = αF(v).    (homogeneous)

If F maps from V to W then we write F : V → W. We can apply F to elements (vectors) v ∈ V but also to sets S ⊆ V, and we use the notation F(S) to mean

F(S) = {F(v) | v ∈ S}.

The range of F : V → W is defined as F(V), i.e. it is the set of all possible outcomes of the mapping. The range is also known as the image (of its domain) and is denoted as Im(F). The set W to which F maps is sometimes referred to as the codomain of F. The codomain W may well be a much bigger set than the range of F.

    Example 3.1.2 (Linearity on function space). This is

an attempt to graphically explain what linearity means on function space. Suppose that F maps functions x : R → R to functions y : R → R, and suppose that F maps two given input graphs to two given output graphs. Then additivity implies that F maps the sum of the two input graphs to the sum of the two output graphs, and homogeneity implies that F maps a scaled input graph to the correspondingly scaled output graph. (The small illustrating graphs are omitted here.)

    The vector addition and scalar multiplication of the

    codomain W induce a form of addition of mappings and

    scalar multiplication with mappings. Specifically, for any

two mappings F, G : V → W we define the sum of the two mappings as

(F + G)(x) := F(x) + G(x)

and the product of scalar α and the mapping is defined as

(αF)(x) := α(F(x)).

Also, if F1 : V1 → V2 and F2 : V2 → V3 are two mappings then F2 F1 : V1 → V3 by definition is the mapping defined as

(F2 F1)(x) := F2(F1(x)).

    3.2 Familiar linear transformations

    Well, the most familiar linear transformations are the ones

that map from R^n to R^k (see a later section) but here are

    other standard ones. It is easy to verify that they are indeed

    linear. Following the colloquial definition some trickier

    issues regarding domain and codomain are added.

    Example 3.2.1 (Fourier transform). The Fourier trans-

    formation is a linear transformation that sends continuous

time functions x : R → C to continuous frequency functions x̂ : R → C, defined as

x̂ = F(x) :  x̂(ω) = ∫_{−∞}^{∞} x(t) e^{−iωt} dt.

As domain V we could take the set of absolutely integrable functions {x : R → C | ∫_R |x(t)| dt < ∞} (with standard addition and multiplication) because then x̂(ω) is well defined for every ω ∈ R. As codomain we may take W := F(R, C).

Example 3.2.2 (Fourier series). The Fourier series can

    be seen as a linear mapping that sends continuous time

functions on a finite interval, x : [0, T] → R, to countably many Fourier coefficients x̂ : Z → C,

x̂ = F(x) :  x̂_k = (1/T) ∫_0^T x(t) e^{−ik 2πt/T} dt,    k ∈ Z.



As domain V we could take the set of continuous functions

    on [0, T] (but other sensible domains can be dreamed up).

Codomain ℓ(Z; C) is natural.
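The Fourier coefficients can be approximated numerically. The sketch below (an added NumPy illustration; the choice T = 2 and x(t) = cos(2πt/T) is made up for the example) uses a plain Riemann sum for the integral:

```python
import numpy as np

T = 2.0
N = 4000
t = np.linspace(0.0, T, N, endpoint=False)   # uniform grid on [0, T)
x = np.cos(2 * np.pi * t / T)

def fourier_coefficient(k):
    """Riemann-sum approximation of (1/T) * int_0^T x(t) exp(-i k 2 pi t / T) dt."""
    integrand = x * np.exp(-1j * k * 2 * np.pi * t / T)
    return integrand.mean()      # equals (1/T) * sum(integrand) * (T/N)

for k in (-1, 0, 1, 2):
    print(k, np.round(fourier_coefficient(k), 4))
# k = +1 and k = -1 give about 0.5; all other coefficients are about 0
```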

Example 3.2.3 (Laplace transform). Likewise the unilateral Laplace transform L is linear as well,

X = L(x) :  X(s) = ∫_0^∞ x(t) e^{−st} dt.

    One might remember that every bounded function x (t) has

a Laplace transform X(s) that is defined for all s ∈ C with Re(s) > 0. So if the domain is V = {x : [0, ∞) → R | there exists c > 0 such that |x(t)| < c for all t} then as codomain W we might take the functions defined on the open right-half complex plane, W = {x : ((0, ∞) + iR) → C}.

Example 3.2.4 (Convolution and Fredholm). Here is an-

    other familiar linear mapping: the convolution Ch ,

y = C_h(u) :  y(t) = (h ∗ u)(t) := ∫_{−∞}^{∞} h(τ) u(t − τ) dτ.

    The convolution is in fact a special case of the general

linear mapping from F(R, R) to F(R, R),

y = F_fredholm(u) :  y(t) = ∫_a^b K(t, s) u(s) ds.

    If a and b are finite and K(t, s) is continuous and u is

    continuous as well, then the operator is well defined and

    its outcome is continuous. The equation relating u and y

    is often called Fredholm equation (and the game then is to

    find u for given K and y).

    Example 3.2.5 (Differentiator). Also linear is the differ-

    entiator D,

f = D(g) :  f(t) = g^{(1)}(t).

    As domain V we should take a vector space whose ele-

    ments are differentiable, such as

V = {f : R → R | f is differentiable}.

Codomain F(R; R) will do. Let us verify linearity. For one it is additive, because for any g, h ∈ V the derivative of the sum is the sum of the derivatives,

D(g + h) = (g + h)^{(1)} = g^{(1)} + h^{(1)} = (Dg) + (Dh),

and it is homogeneous as well,

D(αg) = (αg)^{(1)} = α(g^{(1)}) = α(Dg).

Figure 3.2: Original signal v(t), sampled signal w_k

    Example 3.2.6 (Sampler). The ideal sampler Sh maps

functions to sequences, see Fig. 3.2. More specifically, for

    a given sampling period h > 0, it is defined as

w = S_h(v) :  w(k) = v(kh),  k ∈ Z.

It is a well defined linear transformation if we choose as domain, say, V = {v : R → R | v is continuous} and as codomain W = ℓ(Z; R), both with their standard addition and multiplication. Additivity in words means that the samples of the sum equal the sum of the samples. Indeed,

(S_h(f + g))(k) = (f + g)(kh) = f(kh) + g(kh) = (S_h(f))(k) + (S_h(g))(k).

It is also homogeneous: the samples of the scaled signal are the scaled samples of the signal (or scaling commutes with sampling):

(S_h(αf))(k) = (αf)(kh) = α(f(kh)) = α(S_h(f))(k).

    3.3 Kernel, image and dimension

Let F : V → W be a linear mapping from vector space V to vector space W. Recall that the kernel is ker(F) := {v ∈ V | F(v) = 0}. It is readily verified that ker(F) is a subspace of the domain V and that Im(F) is a subspace of the codomain W (Problem 3.7). Now suppose that we have to find the solutions x of the equation

F(x) = w.

There are two possibilities: either w ∉ Im(F), so then no solution x exists, or

w ∈ Im(F).

In that case there is at least one x0 for which F(x0) = w. We claim that the complete solution set is the affine subspace

x0 + ker(F).

Indeed, if F(x0) = w then x satisfies

F(x) = w  ⟺  F(x) = F(x0)  ⟺  F(x − x0) = 0  ⟺  x − x0 ∈ ker(F)  ⟺  x ∈ x0 + ker(F).



    Example 3.3.1. Let V be the subspace of twice differen-

tiable functions in F(R; R) and let D : V → F(R; R) be the differential operator defined as

D(y) = y^{(2)} + y.

What is the complete solution set (in V) of

(Dy)(t) = 2 e^t?

Clearly y0(t) = e^t is one solution. The complete solution set hence is

e^t + ker(D) = e^t + span{sin, cos}.

From linear algebra one may recall that any n × m matrix through elementary row and column operations can be transformed into the form

[ I_r        0_{r,m−r}
  0_{n−r,r}  0_{n−r,m−r} ]  ∈ R^{n×m}.

In this form it is immediate that the kernel has dimension m − r and that the image has dimension r. These two dimensions add up to m, which is the number of columns

    of the matrix. This result holds in greater generality (no

    proof):

Lemma 3.3.2 (A dimension theorem). Let F : V → W be a linear operator from vector space V to vector space W and assume that V is finite dimensional. Then

dim(ker(F)) + dim(Im(F)) = dim(V).

In particular, if dim(V) = dim(W) < ∞, then the above says that F is injective iff it is surjective:

ker(F) = {0}  ⟺  Im(F) = W.

    Example 3.3.3 (Differentiator). Consider the vector

    space of polynomials Pn of degree at most n, and the

differentiator D : Pn → Pn defined as D(p) = p′. The kernel of D is

ker(D) = {p ∈ Pn | p′ = 0} = {p ∈ Pn | p is constant} = P0.

Clearly this kernel has dimension 1. So by the dimension theorem the range, Im(D), has dimension dim(Pn) − 1 = n. It does:

Im(D) = {D(p) | p(t) = a_n t^n + · · · + a_1 t + a_0, a_i ∈ R}
      = {n a_n t^{n−1} + (n − 1) a_{n−1} t^{n−2} + · · · + a_1 | a_i ∈ R}
      = P_{n−1}.

    Example 3.3.4 (Abstract interpolation). Clearly given

any two points (x1, y1) and (x2, y2) in R^2, with x1 ≠ x2,

    there is a unique degree-1 or constant polynomial that in-

    terpolates these points:


    With the dimension theorem this can be generalized as fol-

lows. Consider an arbitrary set of n + 1 points (xi, yi) ∈ R^2 with all xi distinct. We show that there is a unique polyno-

    mial of degree n or less that interpolates these points. To

    this end consider the mapping

F : Pn → R^{n+1}

    that sends a polynomial p to

    F(p) = (p(x1), p(x2) , . . . , p(xn+1)).

A polynomial p interpolates (x1, y1), . . . , (xn+1, yn+1) iff F(p) = y where y = (y1, . . . , yn+1). The mapping F is linear (verify this yourself). Now it is well known that a polynomial of degree n or less does not have n + 1 zeros, unless it is the zero function. Hence on Pn we have F(p) = 0 only if p is the zero element, so

ker(F) = {0}.

By the dimension theorem and the fact that Pn and R^{n+1} have the same dimension we thus have

Im(F) = R^{n+1}.

In other words for every y = (y1, . . . , yn+1) there is a p0 ∈ Pn that interpolates the n + 1 points (xi, yi). In fact the solution is unique because the general solution is p0 + ker(F), and ker(F) = {0}. See Fig. 3.3.

Figure 3.3: There is a unique p ∈ P2 that interpolates the three points (x1, y1), (x2, y2), (x3, y3)
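Numerically, solving F(p) = y amounts to solving a linear system with a Vandermonde matrix. A minimal NumPy sketch for three hypothetical points, so n = 2 (an added illustration, not part of the original notes):

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])       # distinct interpolation nodes x_i
ys = np.array([1.0, 3.0, 2.0])       # prescribed values y_i

# Row i of V is (1, x_i, x_i^2), so V @ coeffs = ys is exactly F(p) = y
# with coeffs the coordinates of p w.r.t. the basis {1, t, t^2}.
V = np.vander(xs, N=3, increasing=True)
coeffs = np.linalg.solve(V, ys)

p = np.poly1d(coeffs[::-1])           # poly1d expects highest degree first
print(p(xs))                          # reproduces ys: [1. 3. 2.]
```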

3.4 Linear transformation on R^n

    On Rn linear mappings are often identified with matrices.



Figure 3.4: Rotation and reflection

    Example 3.4.1 (Rotation in R2). Figure 3.4(a) illustrates

the rotation operator F : R^2 → R^2. It rotates its argument over an angle of θ (counter clockwise). It is a linear mapping (verify this). In particular it maps the unit vector e1 := (1, 0) to y1 := (cos(θ), sin(θ)) and the unit vector e2 := (0, 1) to y2 := (−sin(θ), cos(θ)). Combining the two outcomes in a matrix

F_rotation := [y1 y2] = [ cos(θ) −sin(θ)
                          sin(θ)  cos(θ) ]

    is the standard way of representing this linear mapping.

    Example 3.4.2 (Reflection in R2). Figure 3.4(b) depicts

the reflection transformation F : R^2 → R^2. It reflects its argument with respect to the line with angle θ/2. The matrix F now becomes

F_reflection = [ cos(θ)  sin(θ)
                 sin(θ) −cos(θ) ].
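A small NumPy check of these two matrices (an added illustration; θ = π/3 is an arbitrary choice):

```python
import numpy as np

theta = np.pi / 3
F_rot = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

e1 = np.array([1.0, 0.0])
print(F_rot @ e1)                    # (cos(theta), sin(theta))
print(np.linalg.norm(F_rot @ e1))    # 1.0: rotation preserves length

F_refl = np.array([[np.cos(theta),  np.sin(theta)],
                   [np.sin(theta), -np.cos(theta)]])
print(F_refl @ F_refl)               # identity: reflecting twice is the identity
```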

    Example 3.4.3 (transformation on R3). Suppose we

    have a mapping T that we know to be linear and that

    sends the unit cube to a stretched version, see Fig. 3.5, in

    particular that

    T(e1) = e1, T(e2) = 2e2, T(e3) = e3.

    The matrix T associated with this mapping (with respect

    to the standard basis) is

T = [ 1 0 0
      0 2 0
      0 0 1 ].

    Identifying linear mappings with their matrix has to do

    with the fact that the linear mapping is completely specified

    by its matrix (a proof follows shortly). The drawback of

    such a matrix approach is that it assumes that we all agree

    on what the standard basis is and while this may be so

    (well) in Rn , for other vector spaces this may not be soobvious.

Figure 3.5: Unit cube linearly transformed: T(e1) = e1, T(e2) = 2e2, T(e3) = e3

    3.5 Matrix representation and eigenvectors

    A message of the previous section is this: once we settle

    on a basis then the linear mapping may be identified with a

    matrix of scalars. As mentioned earlier, a drawback of such

    an approach is that it assumes agreement on the choice of

    basis. On the other hand, an advantage is that it translates

    the linear mapping into a matrix of numbers, which makes

    it explicit (e.g. matlabable). Consistent with the previoussection we define:

    Definition 3.5.1 (Matrix representation of linear map-

    pings). Let V be a vector space with finite ordered basis

S = {v1, v2, . . . , vn}. For any x ∈ V let x_S ∈ R^n (or C^n) denote the column vector of coordinates of x with respect to the basis, that is,

x = ∑_{i=1}^n vi x_{S,i}.    (3.1)

For any linear transformation

F : V → V

the matrix F_SS of F with respect to the basis S is defined as the n × n matrix whose columns are the coordinate vectors of the transformed basis elements,

F_SS = [ [F(v1)]_S  [F(v2)]_S  · · ·  [F(vn)]_S ].

    The connection (3.1) between x and xS may be written

    compactly using a row vector of basis elements, as

x = [v1 v2 · · · vn] x_S.

For x = F(vi) this shows that [F(vi)]_S is determined by the equation

F(vi) = [v1 v2 · · · vn] [F(vi)]_S,

and that the matrix F_SS, since it is just the collection of all these [F(vi)]_S, is determined by

[F(v1) F(v2) · · · F(vn)] = [v1 v2 · · · vn] F_SS.

The following lemma says that linear transformations on finite dimensional vector space are completely specified by their matrix:



    Lemma 3.5.2 (Matrix representation of linear transfor-

    mation). Let V be a vector space with finite ordered ba-

sis S = {v1, . . . , vn}, let x, y ∈ V and suppose that F : V → V is linear. Then

y = F(x)  ⟺  y_S = F_SS x_S.

Proof. By definition of x_S we have x = [v1 · · · vn] x_S. Using linearity we get F(x) = F([v1 · · · vn] x_S) = [F(v1) · · · F(vn)] x_S = [v1 · · · vn] F_SS x_S. So y = F(x) iff [v1 · · · vn] y_S = [v1 · · · vn] F_SS x_S. As the {v1, . . . , vn} are linearly independent this last equality holds iff y_S = F_SS x_S.

Lemma 3.5.3 (Eigenvalue and eigenvector). Let λ be a

scalar. Consider a linear mapping F : V → V and let F_SS be the matrix of this mapping, given some basis S of V.

    The following statements are equivalent.

1. There is an x ∈ V, x ≠ 0, such that F(x) = λx.
2. λ is an eigenvalue of the matrix F_SS.

Such nonzero x we call an eigenvector of the mapping, and the scalar λ an eigenvalue of the mapping.

Proof. Apply Lemma 3.5.2 for y = λx, and realize that x = 0 iff x_S = 0.

    The lemma implies that the eigenvalues of FS S do not

    depend on the choice of basis. Better yet, the notion of

    eigenvalue does not require the notion of basis. For com-

    plicated linear mappings it may however be hard to find the

    eigenvalues and eigenfunctions and then a matrix represen-tation may help.

    Example 3.5.4 (Differentiator). Consider the differentia-

tor D : Pn → Pn that sends polynomials p of degree at most n to their derivative D(p) := p^{(1)}. A basis for Pn clearly is

S := {1, t, t^2, . . . , t^n}

and they map to

{0, 1, 2t, . . . , n t^{n−1}}.

With respect to this basis S, the matrix D_SS that represents the differentiator on Pn can be derived from

[D(1) D(t) D(t^2) · · · D(t^n)] = [0 1 2t · · · n t^{n−1}]
                                = [1 t t^2 · · · t^n] D_SS,

where

D_SS = [ 0 1 0 · · · 0
         0 0 2 · · · 0
         ...       ...
         0 0 0 · · · n
         0 0 0 · · · 0 ].

The matrix D_SS is not invertible, hence neither is the differentiator. Indeed the differentiator is not invertible because every constant maps to 0. The only eigenvalue that the matrix has is λ = 0, hence the differentiator has no eigenvalues other than λ = 0. Indeed, the derivative of any polynomial is of lower degree, so nonconstant eigenfunctions do not exist. The eigenfunctions with eigenvalue 0 are the nonzero constant functions.

If we choose as domain V = span{e^t, e^{2t}} with obvious

basis V = {e^t, e^{2t}} then the matrix D_VV of the differentiator becomes

D_VV = [ 1 0
         0 2 ]

because

[D(e^t) D(e^{2t})] = [e^t 2e^{2t}] = [e^t e^{2t}] [ 1 0
                                                    0 2 ].

Now D_VV is invertible, hence the differentiator is invertible on span{e^t, e^{2t}}, and indeed it is. Also, its eigenvalues are 1 and 2, hence f ∈ span{e^t, e^{2t}} exist with D(f) = f and D(f) = 2f. Clearly such f exist.
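The matrix D_SS of Example 3.5.4 is easy to build and inspect numerically. A minimal NumPy sketch for n = 3, i.e. S = {1, t, t², t³} (an added illustration):

```python
import numpy as np

n = 3
# Column j holds the coordinates of D(t^j) = j * t^(j-1) w.r.t. {1, t, ..., t^n}
D = np.zeros((n + 1, n + 1))
for j in range(1, n + 1):
    D[j - 1, j] = j

print(D)
print(np.linalg.eigvals(D))        # all eigenvalues are 0
print(np.linalg.matrix_rank(D))    # rank n: dim Im(D) = n, dim ker(D) = 1
```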

Figure 3.6: g(t) = p(1 − t)

    Example 3.5.5 (Eigenfunction). Consider the mapping

    F : P2 P2 defined as

    g = F(p) : g(t) = p(1 t).

    The graph (t, g(t)) is the graph (t, p(t)) reflected in the

vertical axis at t = 1/2, see Fig. 3.6. With respect to the standard basis S = {1, t, t^2} the matrix F_SS follows as

[F(1) F(t) F(t^2)] = [1  1−t  (1−t)^2]
                   = [1 t t^2] [ 1  1  1
                                 0 −1 −2
                                 0  0  1 ]  =: [1 t t^2] F_SS.

Because of its upper-triangular structure, the eigenvalues of F_SS are the diagonal elements,

λ = 1 (twice) and λ = −1.

It is readily verified that the corresponding eigenvectors (modulo scaling etc.) are

λ = 1 :  v_1 = (1, 0, 0),  v_1′ = (0, −1, 1)



and

λ = −1 :  v_{−1} = (−1, 2, 0).

This corresponds to the eigenfunctions

p_1(t) = [1 t t^2] (1, 0, 0)^T = 1,    p_1′(t) = [1 t t^2] (0, −1, 1)^T = t^2 − t,

and

p_{−1}(t) = [1 t t^2] (−1, 2, 0)^T = 2t − 1.

See Fig. 3.7. Since the eigenvector v_{−1} for λ = −1 is unique (up to scaling) the eigenfunction p_{−1} with eigenvalue −1 is unique as well (up to scaling). The eigenfunctions with eigenvalue 1 are the linear combinations of p_1 and p_1′.

Figure 3.7: Three eigenfunctions (Example 3.5.5)
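Such small matrix computations are easy to check numerically; the following numpy sketch confirms the eigenvalues of F_SS and, as one illustration, that t² − t is an eigenfunction with eigenvalue 1:

    import numpy as np

    # Matrix of F(p)(t) = p(1 - t) on P_2 with respect to S = {1, t, t^2}
    F_SS = np.array([[1.0,  1.0,  1.0],
                     [0.0, -1.0, -2.0],
                     [0.0,  0.0,  1.0]])

    print(np.linalg.eigvals(F_SS))   # 1, -1, 1 (in some order)

    # Coordinate vector of p(t) = t^2 - t is (0, -1, 1); F_SS maps it to itself.
    p = np.array([0.0, -1.0, 1.0])
    print(F_SS @ p)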

    3.5.1 Eigenspace

Eigenvectors are not unique. If v is an eigenvector then so are 2v and 3v, all with the same eigenvalue. For any eigenvalue λ of a linear mapping F, the set of all eigenvectors, including the zero element, equals

E_λ := {v | F(v) = λv} = {v | 0 = (λI − F)(v)} = ker(λI − F).

This set E_λ is a subspace and we call it the eigenspace of F for eigenvalue λ.

Example 3.5.6 (Eigenspace on infinite dimensional vector space). Let L : F(ℝ, ℝ) → F(ℝ, ℝ) be the linear mapping defined as

(Lf)(t) = t² f(t)   for all t ∈ ℝ.

We determine the eigenvalues and eigenspaces of this mapping. Now a nonzero f ∈ F(ℝ, ℝ) is an eigenvector with eigenvalue λ if

t² f(t) = λ f(t)   for all t ∈ ℝ.   (3.2)

Since t² is real, any eigenvalue is necessarily real as well. Among these we distinguish three cases:

• If λ < 0 then (3.2) holds only if f(t) = 0 for all t. But the zero function is by definition not an eigenvector. Hence no λ < 0 is an eigenvalue.

• If λ = 0 then (3.2) implies that f(t) = 0 for all t ≠ 0. The value f(t) at t = 0 may be anything as long as it is nonzero, because eigenvectors are by definition nonzero. So

f₁(t) = 1 if t = 0, and f₁(t) = 0 if t ≠ 0

is an eigenvector with eigenvalue 0 and the corresponding eigenspace is the 1-dimensional

E₀ = span{f₁}.

• If λ > 0 then (3.2) holds at t = ±√λ irrespective of f. At all other t we need f(t) = 0. Now

f₂(t) = 1 if t = √λ and 0 elsewhere,   f₃(t) = 1 if t = −√λ and 0 elsewhere

are two independent eigenvectors with eigenvalue λ, and the eigenspace in this case equals

E_λ = span{f₂, f₃}.

It has dimension two.

Notice that in the above example every real number λ ≥ 0 is an eigenvalue of the mapping. This is in stark contrast with mappings on finite dimensional vector spaces, which have finitely many eigenvalues only.

Example 3.5.7. The differentiator D : Pn → Pn of Example 3.5.4 has one eigenvalue only, λ = 0, and the eigenvectors were shown to equal the nonzero constant functions. The eigenspace for λ = 0 is E_{λ=0} = span{1}. It is the set of all constant functions, including the zero function.

Example 3.5.8. The mapping of Example 3.5.5 has two eigenvalues, λ = 1 and λ = −1. The eigenspaces are

E_{λ=1} = span{1, t² − t},   E_{λ=−1} = span{2t − 1}.


    3.5.2 Diagonalization

A linear transformation F : V → V is said to be diagonalizable if V has a basis S with respect to which the matrix F_SS is diagonal. More succinctly, it is diagonalizable if the space has a basis of eigenvectors of F.

Example 3.5.9 (Differentiator). The differentiator D : Pn → Pn of Example 3.5.4 is not diagonalizable because only the constant functions are eigenfunctions and these do not form a basis of Pn (unless n = 0). The same differentiator D : V → V but now with V = span{e^t, e^2t} is diagonalizable.

Example 3.5.10. Consider the linear mapping A : ℝ² → ℝ² that, with respect to some basis S = {s₁, s₂}, has matrix representation

\[
A_{SS} = \begin{bmatrix} 1 & 1 \\ 6 & 2 \end{bmatrix}.
\]

This matrix has characteristic polynomial

\[
\det(\lambda I - A_{SS}) = \det \begin{bmatrix} \lambda - 1 & -1 \\ -6 & \lambda - 2 \end{bmatrix} = \lambda^2 - 3\lambda - 4
\]

and its zeros are λ₁ = 4 and λ₂ = −1. The corresponding eigenspaces follow as

\[
E_{\lambda=4} = \ker(4I - A) = \ker \begin{bmatrix} 3 & -1 \\ -6 & 2 \end{bmatrix} = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 3 \end{bmatrix} \right\}
\]

and

\[
E_{\lambda=-1} = \ker(-I - A) = \ker \begin{bmatrix} -2 & -1 \\ -6 & -3 \end{bmatrix} = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ -2 \end{bmatrix} \right\}.
\]

Hence V := {v₁, v₂} defined as

v₁ = [s₁ s₂] (1, 3)ᵀ,   v₂ = [s₁ s₂] (1, −2)ᵀ

are eigenvectors of A, and the matrix A_VV with respect to this basis is the diagonal matrix of eigenvalues,

\[
A_{VV} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}.
\]

The A_SS we started with can now be written as a product of three matrices, each with its own interpretation: the rightmost factor transforms coordinates in basis S to coordinates in basis V, the middle factor applies the mapping in coordinates of basis V, and the leftmost factor transforms coordinates in basis V back to coordinates in basis S:

\[
A_{SS} =
\underbrace{\begin{bmatrix} 1 & 1 \\ 3 & -2 \end{bmatrix}}_{\text{basis } V \,\to\, \text{basis } S}
\;
\underbrace{\begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}}_{\text{mapping in basis } V}
\;
\underbrace{\begin{bmatrix} 1 & 1 \\ 3 & -2 \end{bmatrix}^{-1}}_{\text{basis } S \,\to\, \text{basis } V} .
\]
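The same factorization can be reproduced with numpy; the matrices below are exactly those of the example:

    import numpy as np

    A_SS = np.array([[1.0, 1.0],
                     [6.0, 2.0]])

    # Columns of T are the coordinate vectors (w.r.t. S) of the eigenvectors
    # v1 = s1 + 3 s2 and v2 = s1 - 2 s2.
    T = np.array([[1.0,  1.0],
                  [3.0, -2.0]])

    A_VV = np.linalg.inv(T) @ A_SS @ T
    print(np.round(A_VV, 10))                       # diag(4, -1)

    # Reassemble A_SS from the three factors, as in the text.
    print(T @ np.diag([4.0, -1.0]) @ np.linalg.inv(T))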

    3.6 Problems

3.1 Let L : F(ℝ, ℝ) → F(ℝ, ℝ) be the operator defined as (Lf)(x) = x² f(x). Show that L is linear.

3.2 Determine which of the following mappings are linear:

a) F : ℝ → ℝ : F(t) = 3t + 1
b) A : Pn → ℝ : A(p) = p^(1)(3)
c) B : P → P : B(p) = p^(1)
d) G : ℂⁿ → ℂ : G(x) = aᴴx (where a ∈ ℂⁿ is some given vector)

3.3 The plus sign + appears four times in Section 3.1. Which of these four plus signs indicate the same type of addition?

3.4 Let V, W be two real vector spaces or two complex vector spaces and let L(V, W) be the set of linear operators from V to W. On this set of operators we define addition and scalar multiplication as

(A + B)(x) := A(x) + B(x),   (λA)(x) := λ(A(x)).

a) Show that A + B is linear if A, B are linear
b) Show that λA is linear if A is linear and λ is a scalar
c) Show that L(V, W) is a vector space
d) Briefly comment on a link between L(ℂᵏ, ℂⁿ) and n × k complex matrices

3.5 Let V be a subspace of ℝⁿ. Show that the orthogonal projection from x to its best approximation v (Thm. 1.7.2) is linear.

3.6 Assume F is linear. Show that for any m ∈ ℕ and scalars a₁, . . . , aₘ and vectors v₁, . . . , vₘ there holds

F(a₁v₁ + a₂v₂ + ··· + aₘvₘ) = a₁F(v₁) + a₂F(v₂) + ··· + aₘF(vₘ).

3.7 Suppose F : V → W is linear and that V and W are complex vector spaces.

a) Show that ker(F) is a subspace of V
b) Show that Im(F) is a subspace of W

3.8 Let B ∈ ℂ^{n×n}. Show that the mapping L : ℂ^{n×n} → ℂ^{n×n} defined as L(A) = AB − BA is linear.

3.9 Let C([a, b], ℝ) denote the subspace of continuous functions in F([a, b], ℝ). Is the integral operator J : C[a, b] → C[a, b] defined as

f = J(g) :   f(t) = ∫ₐᵗ g(τ) dτ

linear?


3.10 Consider the linear transformation F : P1 → P1 defined by

F(α₀ + α₁t) = α₀ + (8α₀ − α₁)t.

a) Determine the matrix of F with respect to the standard basis of P1.
b) Determine the matrix of F with respect to basis {t + 1, t − 1}.
c) Determine the eigenvalues of the above two matrices.
d) Determine the eigenvalues of F without using the matrices.

3.11 Consider the mapping from ℓ(ℕ; ℝ) to ℓ(ℕ; ℝ) that sends a sequence g to the sequence f defined as

f_k = k g_k.

a) Show that the mapping is linear
b) What are the eigenvalues of this mapping?

3.12 Consider the complex vector space of infinitely often differentiable functions

C^∞(ℝ, ℂ) = {u + iv | u^(k), v^(k) ∈ F(ℝ, ℝ) for all k ∈ ℕ}.

Consider on this space the differentiator D(f) = f^(1). Determine all eigenvalues of D.

3.13 Let A, B, C : V → V and suppose V has a finite basis S. Show that

A = BC  ⟺  A_SS = B_SS C_SS.

3.14 Consider the subspace W := span{1, sin(x), sin(2x)} of F(ℝ, ℝ) and the second derivative T : W → W, T(g) = g^(2).

a) Determine the eigenvalues and eigenspaces of T

b) Is T : W → W diagonalizable?

3.15 Let V be a vector space and A : V → V a linear transformation.

a) Suppose A = A². Show that 0 or 1 are the only possible eigenvalues
b) Suppose Aᵏ = 0 for some k ∈ ℕ. Which eigenvalues are possible?
c) Construct a V and a linear A : V → V for which A ≠ 0 while A² = 0.

3.16 Consider the mapping F : P2 → P2 defined as F(p)(t) = p(−t).

a) Determine a basis S of P2
b) Determine the matrix F_SS of the mapping with respect to this basis S
c) Find the eigenvalues and eigenvectors of F_SS
d) Find the eigenvalues and eigenfunctions p ∈ P2 of F.

3.17 Repeat the previous question but now for the mapping F(p)(t) = p(t + 1).

3.18 Determine eigenvalues and eigenvectors of A and check whether or not A can be diagonalized, for

\[
\text{a)}\;\; A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}, \qquad
\text{b)}\;\; A = \begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix}, \qquad
\text{c)}\;\; A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.
\]

3.19 Show that

\[
A = \begin{bmatrix} 1 & 3 \\ 1 & 1 \end{bmatrix}
\]

is diagonalizable. Use this to compute A⁴.

    3.20 Is the operator of Example 3.5.5 diagonalizable?


    4 Normed vector space

A normed vector space loosely speaking is a vector space in which a length (a size) of a vector is available. This additional structure allows us to deal with optimal approximation and with limits of vectors. We denote the length of a vector x by ‖x‖ and call it the norm of x.

    4.1 Norm

Definition 4.1.1 (norm). Let V be a real or complex vector space. A mapping ‖·‖ from V to ℝ is a norm if for all x, y ∈ V and all scalars λ it satisfies the three axioms:

1. ‖λx‖ = |λ| ‖x‖, (positive homogeneous)
2. ‖x + y‖ ≤ ‖x‖ + ‖y‖, (triangle inequality)
3. ‖x‖ > 0 for every x ≠ 0. (positive definite)

For λ = 0 the first axiom tells us that ‖0‖ = 0. So a norm ‖x‖ is zero if and only if x is the zero vector. A normed vector space is a vector space on which a norm is defined. Formally one should say (V, ‖·‖) is a normed vector space but we usually just say V is a normed vector space, assuming that the choice of norm is clear from the problem at hand. Be aware, however, that a vector space can be equipped with many different norms.

Figure 4.1: Unit balls in the p-norm for p = 1, 2, ∞

Example 4.1.2 (Three different norms on ℝ²).

1. The 1-norm is defined as

‖x‖₁ = |x₁| + |x₂|.

In the first quadrant where x₁ and x₂ are nonnegative the 1-norm is just the sum of the entries, ‖x‖₁ = x₁ + x₂. In the first quadrant therefore the norm is at most 1 iff x₂ ≤ 1 − x₁, which is the triangular region with corners (0, 0), (1, 0) and (0, 1). Combined with the other three quadrants we get that the unit ball {x | ‖x‖₁ ≤ 1} is a polytope, a square in fact, see Fig. 4.1(a).

2. The Euclidean norm, also known as the 2-norm, is defined as

‖x‖₂ := √(x₁² + x₂²).

In this norm the unit ball {x | ‖x‖₂ ≤ 1} is the unit disc, see Fig. 4.1(b).

3. The max-norm, or ∞-norm, is defined as

‖x‖∞ = max(|x₁|, |x₂|).

Now in this norm the unit ball {x | ‖x‖∞ ≤ 1} is a square with its axes parallel to the x₁- and x₂-axis, see Fig. 4.1(c).
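For concrete vectors these norms are readily computed, for instance with numpy (the vector below is just an arbitrary example):

    import numpy as np

    x = np.array([3.0, -4.0])
    print(np.linalg.norm(x, 1))       # 1-norm:   |3| + |-4| = 7
    print(np.linalg.norm(x, 2))       # 2-norm:   sqrt(9 + 16) = 5
    print(np.linalg.norm(x, np.inf))  # max-norm: max(3, 4) = 4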

The 1-norm is sometimes called the manhattan norm because in a rectangular street grid (which is common in US cities) the 1-norm ‖x − y‖₁ is the minimal Euclidean distance required to travel from junction x to junction y, see Fig. 4.2.

Figure 4.2: Manhattan norm: all three routes are equally long, ‖x − y‖₁

The triangle inequality ‖x + y‖ ≤ ‖x‖ + ‖y‖ loosely speaking says that in any norm traveling from 0 to x + y via x or y can only mean a detour. Moving the ‖y‖ to the left-hand side of the inequality turns the triangle inequality into a statement that says that any side in a triangle is at least the difference of the other two sides:

‖x + y‖ − ‖y‖ ≤ ‖x‖.

This is sometimes called the reverse triangle inequality and it is commonly formulated in terms of z = x + y as:

Lemma 4.1.3. | ‖z‖ − ‖y‖ | ≤ ‖z − y‖.

In this form it is immediate that if two vectors z and y are close then their norms are close as well. This implies that norms are continuous in some way (see Section 4.4.1).


Example 4.1.4. The space of finitely nonzero sequences ℓ_finite(ℕ; ℝ) is a normed vector space in the 1-norm defined as

‖f‖₁ := Σ_{i=1}^∞ |f_i|.

See Problem 4.4.

Example 4.1.5 (Continuous functions in max-norm). The standard norm on the vector space C[a, b] of continuous functions on the real interval [a, b] is the max-norm, also known as ∞-norm, defined as

‖f‖∞ = max_{x ∈ [a,b]} |f(x)|.

We now verify that this indeed satisfies the three axioms of a norm:

1. For every scalar λ we have

‖λf‖∞ = max_x |λf(x)| = max_x |λ| |f(x)| = |λ| max_x |f(x)| = |λ| ‖f‖∞.

2. The max-norm inherits the triangle inequality from ℝ: since for every p, q ∈ ℝ we have that |p + q| ≤ |p| + |q|, we also have for every f, g ∈ C[a, b] that

‖f + g‖∞ = max_x |f(x) + g(x)| ≤ max_x (|f(x)| + |g(x)|) ≤ max_x |f(x)| + max_x |g(x)| = ‖f‖∞ + ‖g‖∞.

3. If f is not the zero function then f(x₀) ≠ 0 for at least one x₀ ∈ [a, b]. Now ‖f‖∞ ≥ |f(x₀)| > 0.
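On a computer the max-norm can only be approximated by sampling, but even such an approximation makes the axioms tangible; in the sketch below the interval, the grid and the two sample functions are arbitrary choices:

    import numpy as np

    # Approximate the max-norm on C[0, 1] by sampling on a fine grid.
    t = np.linspace(0.0, 1.0, 10001)
    f = np.sin(2 * np.pi * t)
    g = t**2 - 0.5

    def max_norm(h):
        return np.max(np.abs(h))

    print(max_norm(f + g) <= max_norm(f) + max_norm(g))  # True: triangle inequality
    print(max_norm(3.0 * f), 3.0 * max_norm(f))          # equal: positive homogeneity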

In some literature the vector space C[a, b] is identified with the normed vector space (C[a, b], ‖·‖∞). This is unfortunate since we may want to consider other norms on the space of continuous functions, for instance:

Example 4.1.6. On C[a, b]

‖f‖₁ := ∫ₐᵇ |f(x)| dx   (4.1)

is a norm (Problem 4.5).

Notice that in this example the norm ‖f‖₁ exists (is finite) for every continuous function. For arbitrary functions in F([a, b], ℝ) that need not be the case, and this is the reason we restricted attention to C[a, b]. However the space of continuous functions also has its drawbacks for this norm:

Example 4.1.7 (Limit does not exist in the space). Consider C[−1, 1] and the 1-norm defined in (4.1). In this norm the sequence of functions

\[
f_n(t) = \begin{cases} 0 & t \in [-1, 0] \\ nt & t \in (0, \tfrac{1}{n}) \\ 1 & t \in [\tfrac{1}{n}, 1] \end{cases}
\]

does not converge in the space C[−1, 1] because no continuous function f exists for which lim_{n→∞} ‖fn − f‖₁ = 0. (Convince yourself of this.) Nevertheless the sequence of functions does approach one another in the sense that

sup_{n>N, m>N} ‖fn − fm‖₁

goes to zero as N → ∞. This follows from the fact that for any n, m > N we have

‖fn − fm‖₁ = ∫_{−1}^{1} |fn(t) − fm(t)| dt = ∫_0^{1/min(n,m)} |fn(t) − fm(t)| dt ≤ ∫_0^{1/N} 1 dt = 1/N.

What fails in this example is that lim_{n→∞} fn does not exist in the space, even though the fn become arbitrarily close to one another in the given norm. We thus need to make a distinction between convergent sequences and sequences whose elements become closer and closer. The latter is called a Cauchy sequence and it is the topic of the next section. Incidentally this difference is not specific to vector spaces. It also shows up in sets like the rational numbers ℚ. Indeed, in ℚ we can construct sequences that approach one another in absolute value but that do not have a limit in the set of rational numbers. An example is the sequence of rational numbers {3, 3.1, 3.14, 3.141, . . .} that converges to the nonrational π.
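The bound ‖fn − fm‖₁ ≤ 1/N is easy to observe numerically; in the sketch below the 1-norm is approximated by a Riemann sum on an arbitrarily chosen grid:

    import numpy as np

    t, dt = np.linspace(-1.0, 1.0, 200001, retstep=True)

    def f(n):
        # f_n(t) = 0 on [-1, 0], n*t on (0, 1/n), 1 on [1/n, 1]
        return np.clip(n * t, 0.0, 1.0)

    for n, m in [(10, 20), (100, 200), (1000, 2000)]:
        dist = np.sum(np.abs(f(n) - f(m))) * dt   # Riemann sum for the 1-norm
        print(n, m, dist)                         # roughly 1/(2n) - 1/(2m), below 1/n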

    4.2 Cauchy sequence

Definition 4.2.1 (Cauchy sequence and convergent sequence). Let X be a normed vector space and let {xₙ}_{n∈ℕ} be a sequence in X.

• {xₙ} is a Cauchy sequence if for every ε > 0 there is an N ∈ ℕ such that

n, m > N  ⟹  ‖xₙ − xₘ‖ < ε.

• {xₙ} is a convergent sequence if there is an x ∈ X such that lim_{n→∞} ‖xₙ − x‖ = 0.

It can be shown that for sequences {αₙ} of real numbers the two notions are equivalent, i.e. a real sequence converges iff it is a Cauchy sequence. Figure 4.3 makes this plausible.


Figure 4.3: Cauchy criterion for real sequences

Example 4.2.2 (Integral test for real-valued sequences). Consider the real sequence σₙ = 1 + 1/2² + 1/3² + ··· + 1/n². Now for every m ≥ n > N we have

|σₘ − σₙ| = Σ_{k=n+1}^{m} 1/k² ≤ ∫ₙᵐ x⁻² dx = 1/n − 1/m < 1/N,

and this upper bound goes to zero as N → ∞. Hence {σₙ} is a Cauchy sequence, and we did not need to know its limit to see this.

Conversely, every convergent sequence in a normed vector space is a Cauchy sequence: if fₙ → f then for every ε > 0 there is an N such that ‖fₙ − f‖ < ε/2 for all n > N. Then by the triangle inequality ‖fₙ − fₘ‖ = ‖(fₙ − f) − (fₘ − f)‖ ≤ ‖fₙ − f‖ + ‖fₘ − f‖ < ε/2 + ε/2 = ε for every n, m > N. So {fₙ} is Cauchy.
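The Cauchy behaviour of the partial sums σₙ can also be observed numerically; the values of N, n, m below are arbitrary choices with n, m > N:

    import numpy as np

    # Partial sums sigma_n = 1 + 1/2^2 + ... + 1/n^2
    sigma = np.cumsum(1.0 / np.arange(1, 10001, dtype=float) ** 2)

    N = 100
    n, m = 150, 9000                          # any n, m > N
    print(abs(sigma[m - 1] - sigma[n - 1]))   # difference of two partial sums ...
    print(1.0 / N)                            # ... stays below 1/N, as the integral test predicts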

    4.3 Banach space = complete vector space

Definition 4.3.1 (Banach space). A normed vector space X is said to be complete if every Cauchy sequence has a limit in X. Complete normed vector spaces are called Banach spaces.

In a Banach space therefore a sequence converges if and only if it is a Cauchy sequence. This is beneficial because the Cauchy property is often easier to check since it does not require knowledge of the limit, see Example 4.2.2, and more importantly all sorts of limits are then guaranteed to exist. This will be of great help in the final chapter of this course.

Over the years many spaces have been shown to be Banach spaces, and also many have been shown to fail the Banach property. In this introductory course we will not worry about completeness proofs because the proofs are often intricate. We simply list a couple in the remainder of this section.

Theorem 4.3.2 (continuous functions with max norm). C[a, b] is a Banach space in the max-norm ‖·‖∞.

Proof. Suppose fₙ is a Cauchy sequence. Then for every ε > 0 there is an N > 0 such that ‖fₙ − fₘ‖∞ < ε for all n, m > N. Now at any t ∈ [a, b] we have

|fₙ(t) − fₘ(t)| ≤ ‖fₙ − fₘ‖∞ < ε   for all n, m > N.

So for every fixed t ∈ [a, b] the sequence of real numbers {fₙ(t)} is Cauchy. Since ℝ is a Banach space we hence have that the pointwise limit f(t) := lim_{n→∞} fₙ(t) exists. For m → ∞ we obtain that

|fₙ(t) − f(t)| ≤ ε   for all n > N

and that this N does not depend on t. Hence ‖fₙ − f‖∞ → 0 as n → ∞. It remains to show that this f is continuous. Let ε > 0 and fix an n > N, where N is chosen such that ‖fₙ − f‖∞ < ε/3. By continuity of fₙ we have at each t that |fₙ(t) − fₙ(t + h)| < ε/3 for all h ∈ [−δₜ, δₜ] for some small enough δₜ > 0. For all such h there holds

|f(t + h) − f(t)| = |f(t + h) − fₙ(t + h) + fₙ(t + h) − fₙ(t) + fₙ(t) − f(t)|
  ≤ |f(t + h) − fₙ(t + h)| + |fₙ(t + h) − fₙ(t)| + |fₙ(t) − f(t)|
  < ε/3 + ε/3 + ε/3 = ε.

So f is continuous.

Notice that C[a, b] is not complete in the 1-norm (Example 4.1.7), thus completeness is norm dependent. On a finite dimensional space it does not depend on the norm:

Theorem 4.3.3 (Finite dimensional space). Every finite dimensional normed vector space is a Banach space.

Proof (idea only). Suppose S := {v₁, . . . , vₘ} is a basis of the space. If fₙ is a Cauchy sequence then it may be shown that its coordinate vectors fₙ,S form a Cauchy sequence in ℝᵐ in, say, the Euclidean norm. This implies that each fixed entry of these vectors is a Cauchy sequence. Since these entries are real numbers, they have a limit. The vectors fₙ,S hence converge entry-wise to some f_S ∈ ℝᵐ as n → ∞. The corresponding f := [v₁ ··· vₘ] f_S is well defined, and one can show that lim_{n→∞} ‖fₙ − f‖ = 0.


4.3.1 Sequence space ℓ1, ℓ2, ℓ∞

On the infinite sequence space ℓ(ℕ; ℝ) the 1-norm, 2-norm and ∞-norm that we defined on ℝⁿ become the infinite sums and suprema

‖v‖₁ := |v₁| + |v₂| + |v₃| + |v₄| + ···
‖v‖₂ := √(|v₁|² + |v₂|² + |v₃|² + |v₄|² + ···)
‖v‖∞ := sup(|v₁|, |v₂|, |v₃|, |v₄|, . . .).

These, however, are not norms on ℓ(ℕ; ℝ) because they are not finite for some sequences. For instance all three norms are infinite for the growing sequence

v = (1, 2, 3, 4, 5, . . .).

The way out of this problem is as simple as it is elegant. Merely restricting the sequence space to those elements that have finite norm will do the job, and the result is a Banach space (we skip the proof):

Theorem 4.3.4 (Complete sequence spaces). The three sequence spaces

ℓ1 := {v ∈ ℓ(ℕ, ℝ) | ‖v‖₁ < ∞}
ℓ2 := {v ∈ ℓ(ℕ, ℝ) | ‖v‖₂ < ∞}
ℓ∞ := {v ∈ ℓ(ℕ, ℝ) | ‖v‖∞ < ∞}

are all complete in their respective norms.

Example 4.3.5 (Cauchy or not Cauchy). Consider the infinite sequence

vₙ = (1, 1/2, 1/3, . . . , 1/n, 0, 0, . . .)

depending on n ∈ ℕ. For every n the vₙ has only finitely many nonzero entries, so it has finite 1-, 2- and ∞-norm and thus is in all three vector spaces ℓ1, ℓ2 and ℓ∞. The sequence vₙ pointwise converges to

v = (1, 1/2, 1/3, 1/4, 1/5, 1/6, . . .)

as n → ∞. This v is not in ℓ1 because

‖v‖₁ = 1 + 1/2 + 1/3 + ··· = ∞

but it is in ℓ2 and ℓ∞ with respective norms

‖v‖₂ = √(1 + 1/2² + 1/3² + ···) < ∞,   ‖v‖∞ = sup(1, 1/2, 1/3, . . .) = 1 < ∞.

This is consistent with the observations that

• {vₙ}_{n∈ℕ} is not Cauchy in the 1-norm because no matter how large N is, the quantity

‖vₙ − vₘ‖₁ = 1/(n+1) + 1/(n+2) + ··· + 1/m

can be taken arbitrarily large by appropriate choice of m ≥ n > N.

• {vₙ}_{n∈ℕ} is Cauchy in the 2-norm because for all n, m > N we have ‖vₙ − vₘ‖₂² < 1/N → 0 as N → ∞ (see Example 4.2.2). Since ℓ2 is a Banach space the vₙ hence converge in ℓ2. Indeed, the limit is v ∈ ℓ2.

• {vₙ}_{n∈ℕ} is Cauchy in the ∞-norm because for all n, m > N we have ‖vₙ − vₘ‖∞ < 1/N → 0 as N → ∞.

    4.3.2 Lebesgue space L1 and L2

The function space equivalent of ℓ1 we naively define as

L1[a, b] := {f : [a, b] → ℝ | ‖f‖₁ < ∞}

where the 1-norm is now defined as

‖f‖₁ = ∫ₐᵇ |f(t)| dt.

We allow a = −∞ and b = +∞. This definition of L1[a, b] is not precise because it still depends on the definition of the integral ∫ₐᵇ |f(t)| dt. The Riemann integral definition is not ideal because one can construct a Cauchy sequence of Riemann integrable functions whose limiting function is so crazy that its Riemann integral is no longer well defined. Hence the space L1[a, b] would then fail to be complete. The desire of having a complete function space was so strong that it prompted mathematicians to look for alternative definitions of integration! In the beginning of the 20th century the issue was settled by Henri Lebesgue. He devised the Lebesgue measure and Lebesgue integration, with respect to which the space L1[a, b] is complete. The interested reader should follow a course on measure theory. The symbol L is standard in the math literature and it is in honor of its inventor Lebesgue. The difference between Riemann and Lebesgue integration only shows up in really weird functions and in this course we need not worry about such functions. We simply accept that:

Theorem 4.3.6 (Complete / Banach). L1[a, b] is complete in the 1-norm.

Built in in the definition of L1 is that its elements have a well defined 1-norm. This space contains all continuous functions but also many more, and they need not be bounded.

Example 4.3.7 (Several L1 functions). All functions of Fig. 4.4 are elements of L1[0, 1], except the last function f₉(t) = 0.1/t. Indeed

∫₀¹ f₉(t) dt = 0.1 log(t) |₀¹ = ∞.

Figure 4.4: The first 8 functions are in L1[0, 1], the 9th is not



We should first fix a possibly unsettling problem: part of the definition of a norm is that

‖f‖ > 0 for all f ≠ 0,

but here that is not the case! The 8th function of Fig. 4.4, for instance,

f(t) = 1 if t = 1/2, and f(t) = 0 elsewhere,

is not the zero function, yet its 1-norm is zero. The simplistic way out of this problem is to identify every function f with zero norm with the zero function. That is not far fetched because if ‖f‖₁ = 0 then

‖f‖₁ = ∫ₐᵇ |f(t)| dt = 0,

implying that f(t) is zero almost everywhere. (In a course on measure theory this identification is formalized through equivalence classes, and then the notion of "almost everywhere" is properly defined.) From now on we do not distinguish between functions f and g when their difference has norm zero, so from now on by definition

f = g  ⟺  ‖f − g‖₁ = 0.

The counterpart of ℓ2 is the space of square integrable functions:

Lemma 4.3.8 (Lebesgue space L2). The space of square integrable functions

L2[a, b] := {f : [a, b] → ℝ | ∫ₐᵇ f(t)² dt < ∞}

is complete in the 2-norm defined as

‖f‖₂ := √(∫ₐᵇ |f(t)|² dt).

Here a = −∞ and b = +∞ are allowed. The top three functions of Fig. 4.4 are in L2[0, 1]. The fourth and fifth function of that figure are not in L2[0, 1].

Example 4.3.9 (Complete in L2, not complete in C). Consider the standard 2-norm of functions. All functions fₙ : [0, 1] → ℝ defined as

\[
f_n(t) = \begin{cases} n^{4/5}\, t & 0 \le t \le 1/n \\[2pt] \dfrac{1}{t^{1/5}} & 1/n < t \le 1 \end{cases}
\]

are continuous. All fₙ are therefore in C[0, 1] as well as in L2[0, 1]. The pointwise limit

\[
f(t) = \begin{cases} 0 & t = 0 \\[2pt] \dfrac{1}{t^{1/5}} & 0 < t \le 1 \end{cases}
\]

is not in C[0, 1] because it is not continuous and in fact it is not bounded. It is in L2[0, 1], however, because

‖f‖₂² = ∫₀¹ f²(t) dt = ∫₀¹ t^(−2/5) dt = (5/3) t^(3/5) |₀¹ = 5/3

is finite. One can show that fₙ is a Cauchy sequence in the 2-norm. Since C[a, b] is not complete in this norm, its limit is not guaranteed to exist in the space C[a, b], and indeed it does not exist. The space L2[a, b] however is complete in this norm and hence lim_{n→∞} fₙ exists in L2[a, b]. Indeed it does.

    4.4 Bounded linear operator

Having a norm of vectors allows us to come up with bounds for mappings on vectors.

Definition 4.4.1 (Bounded operator). Let X and Y be normed vector spaces. A linear operator F : X → Y is bounded if a c ≥ 0 exists such that

‖F(x)‖_Y ≤ c ‖x‖_X for all x ∈ X.   (4.2)

The smallest possible c in (4.2) gives an indication on how big the operator is. If c for instance is < 1 then we know that the norm of the image F(x) is less than that of x, irrespective of the choice of x. Likewise if (4.2) holds for


c = 2 then the norm of F(x) will never be more than twice the norm ‖x‖. Et cetera. The smallest possible c is what is called the operator norm².

Definition 4.4.2 (Operator norm). Let X, Y be normed vector spaces and F : X → Y a bounded operator. The operator norm ‖F‖ of F is defined as³

\[
\|F\| = \sup_{x \neq 0} \frac{\|F(x)\|_Y}{\|x\|_X}.
\]

If X = {0} then we define ‖F‖ = 0.

By definition of operator norm we have for every nontrivial vector space X and every x ∈ X that

‖F(x)‖_Y ≤ c ‖x‖_X   (4.3)

if c = ‖F‖, while for every c less than ‖F‖ there are x that violate (4.3).

Example 4.4.3 (Bounded operator). We determine the operator norm of A : C[a, b] → ℝ defined as

A(f) = ∫ₐᵇ f(t) dt.

On C[a, b] we take the max-norm, on ℝ we take the absolute value. Then

|A(f)| = | ∫ₐᵇ f(t) dt | ≤ ∫ₐᵇ |f(t)| dt ≤ ∫ₐᵇ ‖f‖∞ dt = (b − a) ‖f‖∞.

The operator A thus is bounded and its operator norm is at most b − a. For the constant function f(t) = 1, the above is an equality,

|A(1)| = ∫ₐᵇ 1 dt = (b − a) = (b − a) ‖1‖∞.

The operator norm hence equals b − a.

²The attentive reader will wonder why we call it the operator norm. Doesn't this require that some set of operators F is a vector space and that on this vector space the operator norm has the properties of a norm? The answers are yes and yes, but we will not deal with such matters in this course, even though we are very close to settling it.
³Supremum means least upper bound.

Example 4.4.4 (Unbounded operator). Consider C[0, 1] with the 1-norm. On this space the operator δ : C[0, 1] → ℝ defined as

δ(f) = f(0)

is unbounded. To see this take for instance the sequence of functions

\[
f_n(t) = \begin{cases} n(1 - nt) & 0 \le t \le \tfrac{1}{n} \\ 0 & \text{elsewhere.} \end{cases}
\]

The 1-norm of each fₙ is 1/2 while |δ(fₙ)| = n. The ratio |δ(fₙ)| / ‖fₙ‖₁ = 2n is unbounded. This shows that δ is an unbounded operator.
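The growth of this ratio is easy to observe numerically; in the sketch below the 1-norm is approximated by a Riemann sum on an arbitrarily chosen grid:

    import numpy as np

    t, dt = np.linspace(0.0, 1.0, 100001, retstep=True)

    def f(n):
        # f_n(t) = n(1 - n t) on [0, 1/n] and 0 elsewhere: a spike of height n at t = 0
        return np.maximum(n * (1.0 - n * t), 0.0)

    for n in [10, 100, 1000]:
        one_norm = np.sum(f(n)) * dt                 # approximately 1/2 for every n
        point_value = f(n)[0]                        # value at t = 0, equal to n
        print(n, one_norm, point_value / one_norm)   # the ratio grows like 2n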

    4.4.1 Continuity of maps

We say that a mapping A on a normed vector space is continuous at y if for every ε > 0 there is a δ > 0 such that

‖x − y‖ < δ  ⟹  ‖A(x) − A(y)‖ < ε.

If the mapping is continuous at y for every y in the domain, then A is said to be continuous. For linear mappings, boundedness and continuity are equivalent:

Theorem 4.4.5 (Bounded = continuous for linear maps). For a linear operator A the following three statements are equivalent.

1. A is continuous
2. A is continuous at 0
3. A is bounded

Proof. (1. ⟹ 2.) is trivial. Now (2. ⟹ 3.): If A is