Linear Analysis 2010




Linear Analysis
course code: 151124

    October 2010

    University of Twente


    Preface

This course is an introduction to Functional Analysis, with the main difference from a standard treatment being that topology is left out almost entirely.

The topics in the notes for the year 2010-2011 differ only marginally from those of previous years, but the text is

    substantially different and, we hope, more precise and easier to read.



    Contents

1 Introduction: real and complex vectors and matrices
  1.1 Vectors and matrices in R^n and R^{k×n}
  1.2 The dot product and orthogonality
  1.3 Euclidean norm
  1.4 Pythagoras
  1.5 Orthogonal complement in R^n
  1.6 Subspace, column space and null space
  1.7 Projection
  1.8 Transpose
  1.9 Normal equations and the projection operator
  1.10 Vectors and matrices in C^n and C^{k×n}
  1.11 Problems

2 Vector space
  2.1 Real vector space
  2.2 Complex vector space
  2.3 Subspace
  2.4 Linear combination and span
  2.5 Basis and dimension
  2.6 Problems

3 Linear transformation
  3.1 Linear transformation
  3.2 Familiar linear transformations
  3.3 Kernel, image and dimension
  3.4 Linear transformation on R^n
  3.5 Matrix representation and eigenvectors
  3.6 Problems

4 Normed vector space
  4.1 Norm
  4.2 Cauchy sequence
  4.3 Banach space = complete vector space
  4.4 Bounded linear operator
  4.5 Problems

5 Inner product
  5.1 Real inner product
  5.2 Complex inner product
  5.3 Norm
  5.4 Orthogonal complement
  5.5 Cauchy-Schwarz
  5.6 More examples
  5.7 Orthogonal projection
  5.8 Orthonormal sequences and Parseval
  5.9 Gram-Schmidt process
  5.10 Problems

6 Hilbert space
  6.1 Hilbert space
  6.2 Complete orthonormal basis
  6.3 Adjoint operator on Hilbert space
  6.4 Self-adjoint operators
  6.5 Unitary operators and norm preservation
  6.6 Problems


Index



    Notation

tr(A)    trace of a square matrix: tr(A) = ∑_i a_ii
det(A)   determinant of a square matrix A

    Natural, integer, rational, real, complex numbers:

N    set of positive integers {1, 2, 3, . . .}
N0   set of nonnegative integers {0, 1, 2, 3, . . .}
Z    set of integers {. . . , −2, −1, 0, 1, 2, . . .}
Q    set of rational numbers { n/k | n, k ∈ Z, k ≠ 0 }
R    set of real numbers
C    set of complex numbers

    Real and complex vectors and matrices:

R^n   set of ordered n-tuples (u1, . . . , un) with uk ∈ R, k = 1, 2, . . . , n
C^n   set of ordered n-tuples (u1, . . . , un) with uk ∈ C, k = 1, 2, . . . , n

Sequence space:

ℓ          sequence space {(u1, u2, . . . ) | uk ∈ R, k ∈ N}. It can also be written as {u : N → R}
ℓ(A,B)     {u : A → B} with A ⊆ Z. For instance, ℓ = ℓ(N,R)
ℓ2         {u : N → R | ∑_{k=1}^∞ u_k^2 < ∞}
ℓ2(A,C)    {u : A → C | ∑_{k∈A} |u_k|^2 < ∞}
ℓfinite    {u : N → R | u_k ≠ 0 for only finitely many k ∈ N}

Function space:

F(A,B)     {f : A → B}. This is the set of functions that map from some set A to some set B,
           for instance R^n = F({1, . . . , n}, R) and ℓ = F(N, R). Typically, though, F is used
           for function spaces such as F([0, 1], R)
L2[a, b]   The square integrable functions on [a, b] ⊆ R: {f : [a, b] → B | ∫_a^b |f(t)|^2 dt < ∞}
           with either B = R or B = C
L1[a, b]   {f : [a, b] → B | ∫_a^b |f(t)| dt < ∞} with either B = R or B = C
C[a, b]    {f : [a, b] → B | f is continuous} with either B = R or B = C
Pn(A,B)    The space of polynomials of degree n or less, that map from A to B. Here A ⊆ R
P          The space of polynomials of arbitrary degree, P = ∪_{n≥0} Pn





    1 Introduction: real and complex

    vectors and matrices

In this introductory chapter we review familiar facts about vectors and matrices in R^n and R^{k×n} and their complex counterparts, and we introduce a version of the projection theorem. It is this projection theorem, and most notably its proof, that we use as a motivation for the abstractions and generalizations of the following chapters. It is these abstractions and generalizations that are the main focus of this course. In the end the real and complex vectors and matrices play only a marginal role, but it is where our story begins.

1.1 Vectors and matrices in R^n and R^{k×n}

The set R^n is the set of ordered n-tuples (x1, x2, . . . , xn) with xi ∈ R, i ∈ {1, 2, . . . , n}. Commonly these n-tuples are identified with column vectors, so we write

R^n = { x | x = [ x1
                  x2
                  ...
                  xn ]  with xi ∈ R }.

Likewise R^{n×m} denotes the set of n × m real matrices. Matrices are denoted by capital letters and their elements by lower case letters with two subscript indices. The first index is the row index, the second the column index, for example

A = [ a11 a12 a13 · · · a1m
      a21 a22 a23 · · · a2m
      ...
      an1 an2 an3 · · · anm ]  ∈ R^{n×m}.

The transpose A^T is formed by considering all rows of A as columns of A^T,

A^T = [ a11 a21 · · · an1
        a12 a22 · · · an2
        a13 a23 · · · an3
        ...
        a1m a2m · · · anm ]  ∈ R^{m×n}.

It is convenient to think of the transpose A^T as the result of reflecting A in its diagonal. The kth column of a matrix A is denoted by A_k and, similarly, A_r means its rth row. The zero (matrix) in whatever dimension n × m is usually denoted simply as 0; the square n × n identity matrix is denoted by I_n or simply by I,

0 = [ 0 · · · 0
      .........
      0 · · · 0 ] ,      I = [ 1 0 · · · 0
                               0 1 · · · 0
                               .........
                               0 0 · · · 1 ].

    We assume familiarity with the common matrix addition

    and matrix multiplication.

    1.2 The dot product and orthogonality

    Definition 1.2.1 (Dot product and orthogonality in Rn ).

The dot product x · y of two vectors x, y ∈ R^n is the real number defined as

x · y = x1 y1 + x2 y2 + · · · + xn yn.

We say that two vectors x, y ∈ R^n are orthogonal (with respect to the dot product) if x · y = 0.

Orthogonality of x and y is often denoted as x ⊥ y.

    Example 1.2.2 (Orthogonality with respect to the

    dot product). Consider the vectors v and w shown in

    Fig. 1.1(a), that is,

v = (v1, v2) = (2, 1),    w = (w1, w2) = (−1, 2).

These two vectors are orthogonal because

v · w = (2 · −1) + (1 · 2) = 0.

It is not hard to show that the set of vectors x ∈ R^2 for which v · x ≥ 0 is the half space shown in Fig. 1.1(b).

Figure 1.1: Orthogonal vectors; panel (b) shows the half space {x ∈ R^2 | v · x ≥ 0}

For R^2 and R^3 the dot product being zero agrees with our intuition of being orthogonal (perpendicular), but realize that we take x · y = 0 to be the definition of orthogonality and that this is the definition for any R^n.
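As a quick numerical check of this definition, here is a minimal Python/NumPy sketch (an illustration added here, not part of the original notes), using the vectors v and w of Example 1.2.2:

```python
import numpy as np

v = np.array([2.0, 1.0])
w = np.array([-1.0, 2.0])

# Dot product v . w = v1*w1 + v2*w2
print(np.dot(v, w))           # 0.0, so v and w are orthogonal

# Membership in the half space {x in R^2 : v . x >= 0}
x = np.array([1.0, 3.0])
print(np.dot(v, x) >= 0)      # True: this x lies in the half space
```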



    1.3 Euclidean norm

    Definition 1.3.1 (Euclidean norm). The Euclidean norm

|x| of x ∈ R^n is defined as

|x| = √(x1^2 + x2^2 + · · · + xn^2).

The set {x ∈ R^n | |x| ≤ 1} is known as the unit ball (in the Euclidean norm). For n = 1 it is the unit interval [−1, 1] and for n = 2 it is the unit disc, see Fig. 1.2.

    The Euclidean norm of x equals the square root of the

dot product of x with itself,

|x| = √(x · x).

Figure 1.2: Unit ball {x ∈ R^2 | |x| ≤ 1} in the Euclidean norm, n = 2

    1.4 Pythagoras

Now that orthogonality is defined as having zero dot prod-

    uct, the Pythagorean theorem is trivial:

Theorem 1.4.1 (Pythagorean theorem). Let x, y ∈ R^n. Then

x ⊥ y  ⟹  |x + y|^2 = |x|^2 + |y|^2.

    Proof.

|x + y|^2 = (x + y) · (x + y)
          = x · (x + y) + y · (x + y)
          = (x · x) + (x · y) + (y · x) + (y · y)
          = |x|^2 + 2(x · y) + |y|^2.

If x ⊥ y then x · y = 0, so the cross term vanishes. Here we used that z · (x + y) = z · x + z · y and that x · y = y · x. Convince yourself of these properties.
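A short numerical sanity check of the theorem (again a NumPy sketch with an arbitrarily chosen orthogonal pair, added for illustration only):

```python
import numpy as np

x = np.array([2.0, 1.0, 0.0])
y = np.array([-1.0, 2.0, 3.0])

assert np.isclose(np.dot(x, y), 0.0)          # x and y are orthogonal
lhs = np.linalg.norm(x + y) ** 2              # |x + y|^2
rhs = np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2
print(np.isclose(lhs, rhs))                   # True
```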

1.5 Orthogonal complement in R^n

The orthogonal complement of some set S ⊆ R^n is the set of all vectors that are orthogonal to all elements of S. The orthogonal complement is denoted S^⊥.

Example 1.5.1 (Orthogonal complement). Consider

x := [ 1
       3
       10 ]  ∈ R^3.

Its orthogonal complement is

x^⊥ = {y ∈ R^3 | x · y = 0}
    = {y ∈ R^3 | y1 + 3y2 + 10y3 = 0}
    = {y ∈ R^3 | y3 = −(1/10) y1 − (3/10) y2}
    = { (a, b, −(1/10) a − (3/10) b) | a, b ∈ R }.

The orthogonal complement here is a plane.

We write V ⊥ W whenever all elements of V are perpendicular to all elements of W.

    1.6 Subspace, column space and null space

Figure 1.3: (a) subspace; (b) affine subspace; (c,d) not subspaces

Very loosely speaking a subspace of R^n is a subset that is flat, extends in all directions and contains the origin, see Fig. 1.3(a). It is not too hard to formalize subspace:



Definition 1.6.1 (Subspace). A subset S of R^n is a subspace if

1. The zero vector 0 is in S,
2. u, v ∈ S implies u + v ∈ S,    (closed under addition)
3. v ∈ S, α ∈ R implies αv ∈ S.    (closed under scaling)

It is customary to use 0 for both the origin (i.e. the zero vector) and the zero number. If S is a subspace and x ∈ R^n then x + S is referred to as an affine subspace, see Fig. 1.3(b).

Example 1.6.2 (Column space and Null space). The set

S := {v ∈ R^3 | v = (v1, v2, 0), v1, v2 ∈ R}

is a subspace of R^3. It is the (x, y)-plane. Let us verify the three defining properties of subspace:

1. Clearly 0 = (0, 0, 0) ∈ S.
2. If v, w ∈ S then v = (v1, v2, 0) and w = (w1, w2, 0). Hence v + w = (v1, v2, 0) + (w1, w2, 0) = (v1 + w1, v2 + w2, 0) and since its last entry is zero also this vector is in S.
3. If v ∈ S then v = (v1, v2, 0) so that αv = α(v1, v2, 0) = (αv1, αv2, 0) and this clearly is again an element of S.

    This subspace can be represented in many different ways:

    Let

A = [ 1 0
      0 1
      0 0 ]  ∈ R^{3×2}.

Our set S equals the column space Col(A) of the matrix A. This is the set of all possible linear combinations of the columns of A,

Col(A) := {x | x = Ay, y ∈ R^2}.

    Let

W = [0 0 1] ∈ R^{1×3}.

The null space, Null(W), of a matrix W is the set of vectors x for which Wx = 0. It will be no surprise that Null(W) = S for our W.

    We can also interpret the null space with dot products.

    Let w be the above W, now seen as a vector

w = [ 0
      0
      1 ]  ∈ R^3.

The set S is the orthogonal complement w^⊥:

S = {v ∈ R^3 | v · w = 0}.

Equivalently, it is the orthogonal complement of the entire column space,

S = Col( [ 0
           0
           1 ] )^⊥.

This is just to say that the (x, y)-plane is the set of vectors that are orthogonal to the z-axis.

The following lemma states that any subspace of R^n can be represented by matrices.

Lemma 1.6.3 (Matrix representation of subspace). Let S be a subset of R^n. The following four statements are equivalent:

• S is a subspace
• S = Col(A) for some matrix A ∈ R^{n×k} and some k ∈ N
• S = Null(W) for some W ∈ R^{m×n} and some m ∈ N
• S = W^⊥ for some set W ⊆ R^n.

    Given a subspace S there are many matrices A and W for

    which S = Col(A) = Null(W).

    1.7 Projection

Figure 1.4: Orthogonal projection in R^3

    With orthogonality, norm and subspace defined it is now

    possible to formulate our intuition that connects minimal

    distance (norm) with orthogonality. Here is our first ver-

    sion. Have a look at the proof because it is a basis for later

    generalizations.

Definition 1.7.1 (Best approximation). An element v* ∈ V ⊆ R^n is a best approximation in V of x ∈ R^n if

|x − v*| ≤ |x − v|   for all v ∈ V.

    See Fig. 1.4.



Theorem 1.7.2 (A projection theorem). Let x ∈ R^n and let V be a subspace of R^n. Then

1. v* is a best approximation in V of x iff1 (x − v*) ⊥ V,
2. If the best approximation v* exists then it is unique and it satisfies

|x − v*|^2 = |x|^2 − |v*|^2.

Proof. Suppose (x − v*) ⊥ V for some v* ∈ V. Then for any v ∈ V the difference v* − v is in V by the subspace property, and so by Pythagoras we get

|x − v|^2 = |(x − v*) + (v* − v)|^2 = |x − v*|^2 + |v* − v|^2 ≥ |x − v*|^2.

Hence if v ≠ v* then the norm of x − v exceeds that of x − v*, making v* the unique best approximation.

Conversely, suppose (x − v*) is not orthogonal to V. Then by definition there is a ṽ ∈ V such that (x − v*) is not orthogonal to ṽ, i.e. such that (x − v*) · ṽ ≠ 0. In particular this ṽ is nonzero. We construct an improved approximation of x of the form v* + αṽ with the real number α yet to be determined:

|x − (v* + αṽ)|^2 = |(x − v*) − αṽ|^2
                  = |x − v*|^2 − 2(x − v*) · (αṽ) + |αṽ|^2
                  = |x − v*|^2 − 2α[(x − v*) · ṽ] + α^2 |ṽ|^2.

This quadratic expression in α is minimized for α = [(x − v*) · ṽ] / |ṽ|^2, rendering it equal to

|x − v*|^2 − 2 [(x − v*) · ṽ]^2 / |ṽ|^2 + [(x − v*) · ṽ]^2 / |ṽ|^2
    = |x − v*|^2 − [(x − v*) · ṽ]^2 / |ṽ|^2
    < |x − v*|^2.

So then v* is not a best approximation.

The equality |x − v*|^2 = |x|^2 − |v*|^2 is a restatement of Pythagoras, see Fig. 1.4.

The theorem avoids the issue of existence of the best approximation v* because we prefer not to worry about it now. Here (in R^n) it does exist though.

    1.8 Transpose

    1iff means if-and-only-if

    For explicit representations of the best approximation we

    remind you of an alternative representation of the dot prod-

    uct in terms of transpose of vectors,

x · v = v^T x = [v1 v2 · · · vn] [ x1
                                   x2
                                   ...
                                   xn ].

Then we get the handy rule that for any k × n matrix A and vectors x ∈ R^n, y ∈ R^k, the matrix A can be moved from one side of the dot product to the other:

(Ax) · y = x · (A^T y).

Indeed, (Ax) · y = y^T (Ax) = (A^T y)^T x = x · (A^T y).

1.9 Normal equations and the projection operator

If we have the subspace V given in the explicit form

V = Col(A),

then the best approximation v* ∈ V of x can be obtained rather explicitly:

    Lemma 1.9.1 (Explicit projection normal equations).

Let x ∈ R^n and A ∈ R^{n×k}. Then

y* = arg min_{y ∈ R^k} |x − Ay|

iff y* ∈ R^k satisfies the normal equations

A^T A y* = A^T x.    (1.1)

The best approximation v* ∈ Col(A) of x then is v* = Ay*.

Proof. This is the projection theorem for V = Col(A) and v* = Ay*. By the projection theorem we need only establish that (x − Ay*) ⊥ V:

(x − Ay*) ⊥ V  ⟺  (x − Ay*) · (Ay) = 0 for all y ∈ R^k
               ⟺  y^T A^T (x − Ay*) = 0 for all y ∈ R^k
               ⟺  A^T (x − Ay*) = 0    (see Problem 1.2)
               ⟺  A^T x = A^T A y*.

    This result clearly shows that the transpose is a conve-

    nient notion. With it, projections can be formulated ex-

    plicitly, something we will come back to later (at which

    point we generalize transpose to something called adjoint).The equations (1.1) are known as the normal equations.

    4

  • 8/2/2019 Linear Analysis 2010

    11/66

The lemma does not require that A^T A is invertible and indeed the solution y* of the normal equations need not be unique, but if A^T A is invertible then (1.1) yields

y* = (A^T A)^{-1} A^T x

and hence the best approximation v* = Ay* equals

v* = A (A^T A)^{-1} A^T x.    (1.2)

    Example 1.9.2 (Projection in R2). Let

V = Col( [ 3
           1 ] )   and   x = [ 0
                               1 ].

According to (1.2) the best approximation in V of x is v* = A(A^T A)^{-1} A^T x with A = [3; 1]. Here A^T A = 10 and A^T x = 1, so

v* = [ 3   · (1/10) = [ 0.3
       1 ]              0.1 ].
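The computation of Example 1.9.2 is easy to reproduce numerically. The following NumPy sketch (an added illustration; it assumes A^T A is invertible, as it is here) solves the normal equations (1.1) and checks the orthogonality condition of Theorem 1.7.2:

```python
import numpy as np

A = np.array([[3.0],
              [1.0]])          # V = Col(A)
x = np.array([0.0, 1.0])

# Normal equations: A^T A y* = A^T x
y_star = np.linalg.solve(A.T @ A, A.T @ x)
v_star = A @ y_star            # best approximation of x in Col(A)
print(v_star)                  # [0.3 0.1]

# The residual x - v* is orthogonal to Col(A)
print(A.T @ (x - v_star))      # approximately [0.]
```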

1.10 Vectors and matrices in C^n and C^{k×n}

    We briefly summarize the complex counterpart of the reals.

The set C^n is the set of ordered n-tuples (x1, x2, . . . , xn) with xi ∈ C, i ∈ {1, 2, . . . , n}. As in the real case these n-tuples are often identified with column vectors, so we write

C^n = { x | x = [ x1
                  x2
                  ...
                  xn ]  with xi ∈ C }.

The set C^{n×m} denotes the set of n × m complex valued matrices. Given a complex matrix

A = [ a11 a12 a13 · · · a1m
      a21 a22 a23 · · · a2m
      ...
      an1 an2 an3 · · · anm ]  ∈ C^{n×m},

its complex conjugate transpose2 A^H is the matrix defined as

A^H = [ ā11 ā21 · · · ān1
        ā12 ā22 · · · ān2
        ā13 ā23 · · · ān3
        ...
        ā1m ā2m · · · ānm ]  ∈ C^{m×n}.

    2or Hermitian transpose or conjugate transpose.

The complex conjugate transpose A^H can be obtained by reflecting A in its diagonal and then replacing each element by its complex conjugate. We say that a matrix is Hermitian if A = A^H. If the matrix happens to be real then A^H = A^T. There are two well accepted notations for complex conjugate transpose: A^H and A*. We choose A^H to set it apart from the adjoint operators that we introduce later.

Example 1.10.1 (Pauli matrix). The Pauli matrices3 σ1, σ2, σ3 are the three 2 × 2 matrices

σ1 = [ 0 1      σ2 = [ 0 −i      σ3 = [ 1  0
       1 0 ],          i  0 ],          0 −1 ].

All three are Hermitian and they have the property that

σ1^2 = σ2^2 = σ3^2 = −i σ1 σ2 σ3 = I2.

Let us verify that σ2^2 = I2:

σ2^2 = [ 0 −i   [ 0 −i    = [ 1 0
         i  0 ]   i  0 ]      0 1 ].

Since σi^H = σi we also have that σi^H σi = I2. This property is what we later call unitary.

The dot product x · y for complex vectors x and y of equal dimension is defined as

x · y = y^H x = [ȳ1 · · · ȳn] [ x1
                                ...
                                xn ]  = ∑_{k=1}^n ȳk xk.

Example 1.10.2 (Norm of complex vector). For v = (1, 2 + i, 3i) ∈ C^3 we have

v · v = v̄1 v1 + v̄2 v2 + v̄3 v3
      = |v1|^2 + |v2|^2 + |v3|^2
      = 1^2 + |2 + i|^2 + |3i|^2
      = 1^2 + (2^2 + 1^2) + 3^2 = 15.

The norm, defined as |v| = √(v · v), hence is √15.
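In NumPy the complex dot product x · y = y^H x corresponds to np.vdot(y, x), which conjugates its first argument. A small check of this example (an added illustration):

```python
import numpy as np

v = np.array([1, 2 + 1j, 3j])

# v . v = sum_k conj(v_k) * v_k = |v1|^2 + |v2|^2 + |v3|^2
print(np.vdot(v, v).real)            # 15.0
print(np.sqrt(np.vdot(v, v).real))   # |v| = sqrt(15) ~ 3.873
```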

    1.11 Problems

1.1 Let

x = [ 1
      2
      3 ].

3 σ is the common notation for Pauli matrices in physics. In this course we typically denote matrices with capital letters however.



    a) Determine a matrix A such that x^⊥ = Col(A)
    b) How many columns of A are needed?

1.2 Show that x · y = 0 for all y ∈ R^n implies that x = 0.

1.3 Let W ∈ R^{m×n}. Prove that Null(W) is a subspace.

1.4 Let A ∈ R^{n×k}. Prove that Col(A) is a subspace.

1.5 Let S ⊆ R^n. Show that
    a) S ⊆ (S^⊥)^⊥
    b) S is a subspace iff (S^⊥)^⊥ = S

1.6 Let S1, S2 ⊆ R^n. Is the intersection S1 ∩ S2 a subspace if S1 and S2 are subspaces?

1.7 Consider

A = [ 1 0
      1 4
      0 1 ].

Compute the best approximation in V = Col(A) of x = (0, 0, 1).

1.8 Redo the previous example but now for

A = [ 1 2
      1 2
      1 2 ].

1.9 Let A ∈ R^{n×3} and let (as always) A_k denote its kth column. Show that

A^T A = [ |A1|^2    A2 · A1   A3 · A1
          A1 · A2   |A2|^2    A3 · A2
          A1 · A3   A2 · A3   |A3|^2 ].

1.10 Let V = Null([1 1 1]).
    a) Express V as V = Col(A) for some matrix A
    b) Determine the best approximation in V of x = (0, 0, 1).
    c) Sketch V and both x = (0, 0, 1) and its best approximation.

    1.11 Prove the two properties used in the proof of the

    Pythagorean theorem:

x · y = y · x,    z · (x + y) = (z · x) + (z · y).

1.12 Suppose Q is a 2 × 2 matrix such that |Qx| = |x| for all x ∈ R^2.
    a) Show that Q^T Q = I
    b) Show that Q has the form

       Q = [ cos(θ) −sin(θ)       or    Q = [ cos(θ)  sin(θ)
             sin(θ)  cos(θ) ]                 sin(θ) −cos(θ) ].

Figure 1.5: Minimum norm element v* of an affine subspace x + V

    1.13 A version of the projection theorem that appears often

    in applications is the following (see Fig. 1.5):

Let x ∈ R^n and let V be a subspace of R^n. A vector v* ∈ x + V is a minimal norm element of the affine subspace x + V if and only if v* ⊥ V.

    Prove it.

1.14 Sketch the affine subspace (0, 1) + Col([1; 2]) and determine the minimal norm element of this set.

1.15 Determine the complex conjugate transpose of

    a) [ 1 + i   1 + 2i
         1 + 3i  1 + 4i ]

    b) [ 3       2 + i   3 + 2i
         4 + 2i  4 + i   4      ]

    c) [ i   0   i   1 + i ]

    d) [ 0       1 + i   3 + 4i
         1 + i   0       2 − 6i
         3 + 4i  2 − 6i  0      ]

1.16 Let

x = [ 1
      2i ]  ∈ C^2.

Determine a complex matrix A and W such that x^⊥ = Col(A) = Null(W). (In the complex case, Col(A) is the set of vectors of the form Ay with y ∈ C^k, where k is the number of columns of A.)

1.17 What is the smallest subspace of R^3 that contains the unit circle {(x, y, z) | x^2 + y^2 = 1, z = 0}?



1.18 Show that
    a) Col(A)^⊥ = Null(A^T)
    b) Col(A A^T)^⊥ = Null(A^T)
    c) Col(A) = Col(A A^T)

1.19 Formulate and prove a projection theorem for x ∈ C^n and V a subspace of C^n. This also requires that you think about what subspace should mean in C^n (this chapter only defines it for real vectors).

Figure 1.6: Least squares fit (data points (t_k, x_k) and errors ε_k)

    1.20 Least squares approximation. A very common prob-

    lem is to approximate a set of pairs of real numbers,

    (t1,x1), (t2,x2), . . . , (tn,xn)

    by a straight line, see Fig. 1.6. This can be seen as an

application of the projection theorem in R^n with n the number of pairs. We write the candidate straight line as

x(t) = y1 + y2 t,  with y1, y2 ∈ R,

and the approximation error of the kth pair we write as ε_k := x_k − x(t_k), see Fig. 1.6. Ideally ε_k = 0 for all k, which would mean that the straight line interpolates all pairs. In practice we try to make the errors as small as possible, and the most popular way of doing this is by least squares approximation:

    a) Express the vector (ε1, . . . , εn) of errors as

       [ ε1        [ x1        [ ? ?
         ε2          x2          ? ?     [ y1
         ...    =    ...    −    ...       y2 ]
         εn ]        xn ]        ? ? ]

       (that is, determine the matrix A in ε = x − Ay)

    b) Show that

       A^T A = ∑_{k=1}^n [ 1    t_k
                           t_k  t_k^2 ].

    c) Show that A^T A is invertible iff t_k ≠ t_j for at least one pair (j, k). (This might be a tough problem.)

    d) Show that the sum of squares ∑_{k=1}^n ε_k^2 of the errors equals |x − Ay|^2 and write down the corresponding normal equations in terms of the available data (t_k, x_k).

    e) The least squares fit is defined as the straight line that minimizes the sum of squares ∑_{k=1}^n ε_k^2. Determine the least squares fit (that is, determine the optimal y1, y2 as functions of t_k, x_k. You may assume that A^T A is invertible).
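For illustration, here is a minimal NumPy sketch of such a least squares fit on made-up data (the data values are hypothetical, added here only as an example; the sketch just solves the normal equations A^T A y = A^T x for the line x(t) = y1 + y2 t):

```python
import numpy as np

# Hypothetical data points (t_k, x_k)
t = np.array([0.0, 1.0, 2.0, 3.0])
x = np.array([0.1, 0.9, 2.1, 2.9])

# Columns of A multiply y1 and y2 in x(t) = y1 + y2 * t
A = np.column_stack([np.ones_like(t), t])

# Solve the normal equations A^T A y = A^T x
y = np.linalg.solve(A.T @ A, A.T @ x)
print(y)                          # intercept y1 and slope y2
print(np.sum((x - A @ y) ** 2))   # minimized sum of squared errors
```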





    2 Vector space

    Let us say that it is our purpose to generalize the pro-

    jection theorem. Then we should generalize the various

    players in the projection theorem. These are

• space R^n,
• subspace V of R^n,
• dot product,
• Euclidean norm.

In this chapter we generalize space R^n (to be called vector

    space) and subspace V (still to be called subspace). Vector

    spaces and subspaces can be recognized in loads of appli-

    cations, the projection theorem being just one of them.

    2.1 Real vector space

    What properties ofRn did we implicitly use in the projec-

    tion theorem and its proof? Have a look at Thm. 1.7.2 and

    its proof and you will probably agree that the following

    eight properties will do:

Definition 2.1.1 (Real vector space). A real vector space (X, ∔, ·) is a nonempty set of elements X, called vectors, on which vector addition ∔ : X × X → X and real scalar multiplication · : R × X → X is defined with the following eight properties for all v, w ∈ X and all scalars α, β ∈ R:

1. u ∔ v = v ∔ u    (commutative)
2. (u ∔ v) ∔ w = u ∔ (v ∔ w)    (associative)
3. There is a zero vector, also known as origin, 0̲ ∈ X such that u ∔ 0̲ = u for all u ∈ X
4. For each v ∈ X there is an additive inverse −v ∈ X such that v ∔ (−v) = 0̲
5. 1 · v = v
6. α · (β · v) = (αβ) · v    (associative)
7. (α + β) · v = α · v ∔ β · v    (distributive)
8. α · (u ∔ v) = α · u ∔ α · v    (distributive)

    If this is your first contact with such a formal definition

    then please realize this: we have the freedom to define our

    own addition and multiplication and we may dream up

    really weird sets X; but the moment that X with that addi-tion and multiplication satisfies the eight axioms of vector

    space then automatically all results we will derive for gen-

eral vector spaces hold for our weird X as well. That's the

    beauty of generality and abstraction.

    Before entering a series of examples, you will want to

    know that the 8 axioms of vector space imply a host of

    other properties. Here are some basic ones:

    Theorem 2.1.2 (Basic properties of vector space). Sup-

pose (X, ∔, ·) is a real vector space. Then

1. The origin 0̲ ∈ X is unique
2. The additive inverse is unique: if v ∔ w1 = 0̲ and v ∔ w2 = 0̲ then w1 = w2.
3. 0 · v = 0̲
4. α · 0̲ = 0̲
5. The additive inverse −v equals (−1) · v
6. α · v = 0̲, v ≠ 0̲  ⟹  α = 0

Proof.

1. Suppose that 0̲1 and 0̲2 are two zero vectors. Then 0̲1 ∔ 0̲2 = 0̲1 and 0̲1 ∔ 0̲2 = 0̲2. So the two zero vectors are the same.

2. Let w1 and w2 be two additive inverses of v. Then w1 = w1 ∔ 0̲ = w1 ∔ (v ∔ w2) = (w1 ∔ v) ∔ w2 = 0̲ ∔ w2 = w2.

3. 0 · v = 0 · v ∔ 0̲ = 0 · v ∔ (0 · v ∔ (−0 · v)) = (0 + 0) · v ∔ (−0 · v) = 0 · v ∔ (−0 · v) = 0̲.

4. We proved it already for α = 0. If α ≠ 0 then v ∔ α · 0̲ = α · ((1/α) · v ∔ 0̲) = α · ((1/α) · v) = 1 · v = v for every v. Hence α · 0̲ satisfies the conditions of the zero vector.

5. v ∔ (−1) · v = 1 · v ∔ (−1) · v = (1 − 1) · v = 0 · v = 0̲.

6. Suppose α · v = 0̲ and v ≠ 0̲. If α ≠ 0 then (1/α) · (α · v) = ((1/α)α) · v = 1 · v = v ≠ 0̲ while (1/α) · (α · v) = (1/α) · 0̲ = 0̲. This is a contradiction. Hence α = 0.

In fact properties 3, 4 and 6 of the above theorem can be combined into

α · v = 0̲  ⟺  (α = 0 and/or v = 0̲).

    One may choose to include any number of the above prop-

    erties into the definition of vector space but it is customary

    not to do that. We prefer to strip a property from a defi-

nition if it is implied by other properties (axioms) of the

    definition.

Example 2.1.3 (R^n). The space R^n of ordered sequences of given length n ∈ N, with entries in R,

R^n = {u | u = (u1, u2, . . . , un), uk ∈ R}



    is a vector space under the vector addition and scalar mul-

    tiplication defined elementwise as

u ∔ v := (u1 + v1, u2 + v2, . . . , un + vn),    α · u := (αu1, αu2, . . . , αun).

The subtlety is that the plus-sign in u ∔ v represents addition of two vectors whereas the plus-sign in u1 + v1 represents ordinary addition of two real numbers. Likewise α · u is a product of scalar α and vector u while αu1 simply means a product of two real numbers. It is easy to verify that the 8 defining properties of vector space hold, i.e. that this (R^n, ∔, ·) is a real vector space.

Example 2.1.4 (Sequence space). The space ℓ(N; R) is the set of one-sided infinite sequences

ℓ(N; R) = {u | u = (u1, u2, . . .), uk ∈ R, k ∈ N}.

As in R^n it is a vector space under the addition and scalar multiplication defined elementwise as

u ∔ v := (u1 + v1, u2 + v2, u3 + v3, . . .),    α · u := (αu1, αu2, αu3, . . .).

We leave it to the reader to establish that the 8 properties of

    real vector space indeed hold.

Figure 2.1: Two vectors u, v ∈ R^25 and their sum u ∔ v

Figure 2.1 depicts vector addition in R^25. The reason to include this figure is to convince you of the fact that also function spaces can be seen as vector spaces and that conceptually the step from R^n to function space is marginal.

    Example 2.1.5 (Function space). The set of functions

F([0, 1], R) := {f : [0, 1] → R}

that map from [0, 1] to R, is a vector space under addition and scalar multiplication defined pointwise, at each t, as

(f ∔ g)(t) = f(t) + g(t),    (α · f)(t) = α f(t).

See Fig. 2.2. It is a bit of a bore to verify the eight defining rules of vector space, but for once we have to do it:

1. f ∔ g = g ∔ f because (f ∔ g)(t) = f(t) + g(t) = g(t) + f(t) = (g ∔ f)(t). The vector addition inherits the commutative property of addition of real numbers.
2. (f ∔ g) ∔ p = f ∔ (g ∔ p) indeed, and its proof is very similar to that of part 1.
3. the function n(t) = 0 for all t satisfies f ∔ n = f for every function f, so n is a zero vector
4. −f defined pointwise as (−f)(t) = (−1)f(t) is an additive inverse of f because then f ∔ (−f) = n
5. 1 · f = f because (1 · f)(t) = 1(f(t)) = f(t) for all t.
6. α · (β · f) = (αβ) · f. This is possibly the trickiest to prove. Its proof is a series of applications of the definition of scalar multiplication on our function space: (α · f)(t) = α f(t) for all t. Here we go:

   (α · (β · f))(t) = α((β · f)(t)) = α(β f(t)) = (αβ) f(t) = ((αβ) · f)(t).

   So α · (β · f) and (αβ) · f are indeed the same functions.
7. (α + β) · f = α · f ∔ β · f, see Problem 2.6.
8. α · (f ∔ g) = α · f ∔ α · g, see Problem 2.7.

Figure 2.2: Graph of functions f and g and their sum f ∔ g
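To make the pointwise definitions concrete, here is a small Python sketch (an added illustration, not part of the original notes; functions are represented as ordinary callables):

```python
import math

def add(f, g):
    """Pointwise sum (f + g)(t) = f(t) + g(t)."""
    return lambda t: f(t) + g(t)

def scale(alpha, f):
    """Pointwise scaling (alpha * f)(t) = alpha * f(t)."""
    return lambda t: alpha * f(t)

f = math.sin
g = math.cos
h = add(f, scale(2.0, g))     # the function t -> sin(t) + 2*cos(t)
print(h(0.0))                 # 2.0
```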

    Notation cleanup

    To avoid unduly cumbersome notation we simplify the no-

    tation somewhat.

    The dot on top of vector addition was used to empha-

    size that it differs from addition of scalars. Now that the

    difference is clear, we almost always skip the dot on vector

    addition and so + from now on means both vector additionand scalar addition. The context makes clear which one it

    is.

Similarly the dot in scalar-vector multiplication such as in α · v is deleted altogether: αv.



Also the underline in the zero vector 0̲ is usually omitted, so from now on 0 is used both for the scalar zero and the zero vector.

Finally, we typically say "X is a vector space" instead of the more precise but also more cumbersome "(X, +, ·) is a vector space".

    2.2 Complex vector space

A complex vector space differs from a real vector space only in that the scalars (the α's and β's) in a complex vector space are taken from C instead of R. For completeness: a complex vector space X is a nonempty set of elements, called vectors, on which vector addition X × X → X and complex scalar multiplication C × X → X is defined that satisfy the 8 properties of Definition 2.1.1 for all v, w ∈ X and all α, β ∈ C. From the context it will be clear whether we deal with real or complex vector spaces and we refer to the α's and β's simply as scalars.

The basic properties of Theorem 2.1.2 also hold for complex vector space (the proof is identical).

Example 2.2.1 (C^n). The space C^n is the set of ordered n-tuples of complex numbers,

C^n = {u | u = (u1, u2, . . . , un); u1, . . . , un ∈ C}.

It is a vector space under the addition and scalar multiplication defined elementwise as

u + v = (u1 + v1, u2 + v2, . . . , un + vn),    αu = (αu1, αu2, . . . , αun).

Example 2.2.2 (Doubly infinite complex sequence). The space ℓ(Z; C) is the set of doubly infinite ordered sequences

ℓ(Z; C) = {u | u = (. . . , u−1, u0, u1, . . .), uk ∈ C, k ∈ Z}.

It is a vector space under the addition and scalar multiplication defined elementwise as

u + v = (. . . , u−1 + v−1, u0 + v0, u1 + v1, . . .),    αu = (. . . , αu−1, αu0, αu1, . . .).

    Example 2.2.3 (Function space). Complex-valued func-

    tions

F([0, 1], C) := {f : [0, 1] → C}

that map from [0, 1] to C can be seen as a vector space with addition and scalar multiplication defined pointwise as

(f + g)(t) = f(t) + g(t)  for all t ∈ [0, 1],    (αf)(t) = α(f(t))  for all t ∈ [0, 1].

The zero element is the function n(t) that is zero for every t ∈ [0, 1].

    2.3 Subspace

A subset of a vector space may be a vector space itself. For instance the (x, y)-plane of the vector space R^3 is itself a vector space with addition and scalar multiplication borrowed from the vector space R^3. If it has been settled that X is a vector space, then to test whether or not a subset V ⊆ X is a vector space, we need not redo all the 8 defining properties of vector space. It is sufficient to check that the set is closed under addition and scalar multiplication. All other axioms of vector space are then inherited from those of X. Such subsets, when nonempty, we call subspaces.

Definition 2.3.1 (Subspace). A subset V of a vector space X is a subspace of X if for all u, v ∈ V and scalar α:

1. 0 ∈ V,
2. u + v ∈ V,    (closed under addition)
3. αv ∈ V.    (closed under scaling)

In a non-empty set V the third condition implies the first (take α = 0). Therefore the first condition in effect only says that subspaces are not allowed to be empty.

    Example 2.3.2 (Subspace of function space). The set

S = {f : R → R | there exist c, d ∈ R such that f(t) = c cos(t) + d sin(t) for all t ∈ R}

is a subspace of F(R, R). Let us verify:

1. the zero function n(t) = 0 for all t of F(R, R) is an element of S (take c = d = 0),
2. it is closed under addition, for if f_k(t) = c_k cos(t) + d_k sin(t) ∈ S then so is their sum (f1 + f2)(t) = (c1 + c2) cos(t) + (d1 + d2) sin(t) ∈ S.
3. it is closed under scalar multiplication, for if f(t) := c cos(t) + d sin(t) is in S then so is αf(t) = (αc) cos(t) + (αd) sin(t).

    Our intuition forR3 that says that a subspace is something

    flat may fail for function space. It is a subspace nonethe-

    less.

    Example 2.3.3 (Finitely nonzero sequence space). The

    set of infinite sequences of which only finitely many entries

    are nonzero,

ℓfinite(N, R) := {u : N → R | only finitely many uk are nonzero}

is a subspace of ℓ(N; R). See Problem 2.14.

The next example is important. It considers the set of square summable sequences and they play a key role in functional analysis.



    Example 2.3.4 (Square summable sequence). The set

of square summable sequences u = (u1, u2, . . . ) of real numbers is denoted ℓ2(N; R). That is,

ℓ2(N; R) = {u = (u1, u2, . . . ) | un ∈ R, ∑_{n=1}^∞ u_n^2 < ∞}.

2.6 Problems

2.26 Suppose dim(X) = n > 0. Is it true that any set of n elements that spans X is a basis of X?

    2.27 A subset S of a vector space is an affine subspace if

    it is closed under affine combination, meaning that if

x, y ∈ S then

α1 x + α2 y ∈ S

for all α1 and α2 that add up to one, α1 + α2 = 1.

    a) Consider R^2 and two elements x1 = (0, 1) and x2 = (2, 1). Sketch in the plane the set of all affine combinations of x1 and x2
    b) Show that a nonempty S is an affine subspace (of some vector space X) iff S = x0 + V for some x0 ∈ X and some subspace V of X.
    c) Let S be an affine subspace. Show that for any n and any x1, . . . , xn ∈ S we have

       ∑_{i=1}^n αi xi ∈ S

       whenever ∑_{i=1}^n αi = 1.

2.28 Let n > 0. Show that R^n is not a subspace of C^n.

2.29 Suppose V is a subspace of X and that dim(X) < ∞. Show that V = X iff dim(V) = dim(X).

    2.30 Prove that a subspace of a vector space is itself a vec-

    tor space.

2.31 Consider P3 with basis {1, x − 3, (x − 3)^2, (x − 3)^3}. Determine the coordinates with respect to this basis of
    a) 1
    b) x
    c) x^2

2.32 Consider span{1, e^{ix}, e^{−ix}} ⊆ F(R, C) with obvious basis S = {1, e^{ix}, e^{−ix}}. With respect to this basis, determine the vector of coordinates of
    a) sin(x)



    b) 1 + cos(x)

2.33 Alternative definition of vector space. Less common but more concise is this definition of vector space:

    A real vector space is a nonempty set V with an addition ∔ : V × V → V and scalar multiplication · : R × V → V that satisfy the following six axioms for all x, y, z ∈ V and all α, β ∈ R:

    • x ∔ (y ∔ z) = (x ∔ y) ∔ z
    • 0 · x does not depend on x
    • (α + β) · x = (α · x) ∔ (β · x)
    • α · (x ∔ y) = (α · x) ∔ (α · y)
    • α · (β · x) = (αβ) · x
    • 1 · x = x

    We denote 0 · x as 0̲. We abbreviate (−1) · x to −x and x ∔ (−y) to x − y.

    a) Show that Definition 2.1.1 implies the above six axioms
    b) Show that the above six axioms imply the eight of Definition 2.1.1.

    In other words, the two definitions of vector space are

    equivalent.



    3 Linear transformation

Figure 3.1: A mapping F from V to W

    Linear transformations (also known as linear operators

and linear mappings) are everywhere. For instance the projection theorem (Theorem 1.7.2) states that the best approximation v* of an x is unique, so we can consider the mapping F that sends x to its best approximation v* = F(x). This is just one of the many mappings F that turn out to be linear.

    3.1 Linear transformation

    Definition 3.1.1 (Linearity). Let V and W be two vector

    spaces (both real or both complex vector spaces). A map-

ping F from V to W is linear if for every v1, v2, v ∈ V and scalar α:

1. F(v1 + v2) = F(v1) + F(v2),    (additive)
2. F(αv) = αF(v).    (homogeneous)

If F maps from V to W then we write F : V → W. We can apply F to elements (vectors) v ∈ V but also to sets S ⊆ V, and we use the notation F(S) to mean

F(S) = {F(v) | v ∈ S}.

The range of F : V → W is defined as F(V), i.e. it is the set of all possible outcomes of the mapping. The range is also known as the image (of its domain) and is denoted as Im(F). The set W to which F maps is sometimes referred to as the codomain of F. The codomain W may well be a much bigger set than the range of F.

    Example 3.1.2 (Linearity on function space). This is

an attempt to graphically explain what linearity means on function space. Suppose that F maps functions x : R → R to functions y : R → R, and suppose that F maps two given input graphs to two given output graphs. Then additivity implies that F maps the sum of the two input graphs to the sum of the two output graphs, and homogeneity implies that F maps a scaled input graph to the correspondingly scaled output graph. (The small illustrating graphs are omitted here.)

    The vector addition and scalar multiplication of the

    codomain W induce a form of addition of mappings and

    scalar multiplication with mappings. Specifically, for any

two mappings F, G : V → W we define the sum of the two mappings as

(F + G)(x) := F(x) + G(x)

and the product of scalar α and the mapping is defined as

(αF)(x) := α(F(x)).

Also, if F1 : V1 → V2 and F2 : V2 → V3 are two mappings then F2 F1 : V1 → V3 by definition is the mapping defined as

(F2 F1)(x) := F2(F1(x)).

    3.2 Familiar linear transformations

    Well, the most familiar linear transformations are the ones

that map from R^n to R^k (see a later section) but here are

    other standard ones. It is easy to verify that they are indeed

    linear. Following the colloquial definition some trickier

    issues regarding domain and codomain are added.

    Example 3.2.1 (Fourier transform). The Fourier trans-

    formation is a linear transformation that sends continuous

time functions x : R → C to continuous frequency functions x̂ : R → C, defined as

x̂ = F(x) :  x̂(ω) = ∫_{−∞}^{∞} x(t) e^{−iωt} dt.

As domain V we could take the set of absolutely integrable functions {x : R → C | ∫_R |x(t)| dt < ∞} (with standard addition and multiplication) because then x̂(ω) is well defined for every ω ∈ R. As codomain we may take W := F(R, C).

Example 3.2.2 (Fourier series). The Fourier series can

    be seen as a linear mapping that sends continuous time

functions on a finite interval, x : [0, T] → R, to countably many Fourier coefficients x̂ : Z → C,

x̂ = F(x) :  x̂_k = (1/T) ∫_0^T x(t) e^{−ik 2πt/T} dt,    k ∈ Z.



As domain V we could take the set of continuous functions

    on [0, T] (but other sensible domains can be dreamed up).

Codomain ℓ(Z; C) is natural.
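The Fourier coefficients can be approximated numerically. The sketch below (an added NumPy illustration; the choice T = 2 and x(t) = cos(2πt/T) is made up for the example) uses a plain Riemann sum for the integral:

```python
import numpy as np

T = 2.0
N = 4000
t = np.linspace(0.0, T, N, endpoint=False)   # uniform grid on [0, T)
x = np.cos(2 * np.pi * t / T)

def fourier_coefficient(k):
    """Riemann-sum approximation of (1/T) * int_0^T x(t) exp(-i k 2 pi t / T) dt."""
    integrand = x * np.exp(-1j * k * 2 * np.pi * t / T)
    return integrand.mean()      # equals (1/T) * sum(integrand) * (T/N)

for k in (-1, 0, 1, 2):
    print(k, np.round(fourier_coefficient(k), 4))
# k = +1 and k = -1 give about 0.5; all other coefficients are about 0
```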

Example 3.2.3 (Laplace transform). Likewise the unilateral Laplace transform L is linear as well,

X = L(x) :  X(s) = ∫_0^∞ x(t) e^{−st} dt.

    One might remember that every bounded function x (t) has

a Laplace transform X(s) that is defined for all s ∈ C with Re(s) > 0. So if the domain is V = {x : [0, ∞) → R | there exists c > 0 such that |x(t)| < c for all t} then as codomain W we might take the functions defined on the open right-half complex plane, W = {x : ((0, ∞) + iR) → C}.

Example 3.2.4 (Convolution and Fredholm). Here is an-

    other familiar linear mapping: the convolution Ch ,

y = C_h(u) :  y(t) = (h ∗ u)(t) := ∫_{−∞}^{∞} h(τ) u(t − τ) dτ.

    The convolution is in fact a special case of the general

linear mapping from F(R, R) to F(R, R),

y = F_fredholm(u) :  y(t) = ∫_a^b K(t, s) u(s) ds.

    If a and b are finite and K(t, s) is continuous and u is

    continuous as well, then the operator is well defined and

    its outcome is continuous. The equation relating u and y

    is often called Fredholm equation (and the game then is to

    find u for given K and y).

    Example 3.2.5 (Differentiator). Also linear is the differ-

    entiator D,

f = D(g) :  f(t) = g^{(1)}(t).

    As domain V we should take a vector space whose ele-

    ments are differentiable, such as

V = {f : R → R | f is differentiable}.

Codomain F(R; R) will do. Let us verify linearity. For one it is additive, because for any g, h ∈ V the derivative of the sum is the sum of the derivatives,

D(g + h) = (g + h)^{(1)} = g^{(1)} + h^{(1)} = (Dg) + (Dh),

and it is homogeneous as well,

D(αg) = (αg)^{(1)} = α(g^{(1)}) = α(Dg).

Figure 3.2: Original signal v(t), sampled signal w_k

    Example 3.2.6 (Sampler). The ideal sampler Sh maps

functions to sequences, see Fig. 3.2. More specifically, for

    a given sampling period h > 0, it is defined as

w = S_h(v) :  w(k) = v(kh),  k ∈ Z.

It is a well defined linear transformation if we choose as domain, say, V = {v : R → R | v is continuous} and as codomain W = ℓ(Z; R), both with their standard addition and multiplication. Additivity in words means that the samples of the sum equal the sum of the samples. Indeed,

(S_h(f + g))(k) = (f + g)(kh) = f(kh) + g(kh) = (S_h(f))(k) + (S_h(g))(k).

It is also homogeneous: the samples of the scaled signal are the scaled samples of the signal (or scaling commutes with sampling):

(S_h(αf))(k) = (αf)(kh) = α(f(kh)) = α(S_h(f))(k).

    3.3 Kernel, image and dimension

Let F : V → W be a linear mapping from vector space V to vector space W. Recall that the kernel is ker(F) := {v ∈ V | F(v) = 0}. It is readily verified that ker(F) is a subspace of the domain V and that Im(F) is a subspace of the codomain W (Problem 3.7). Now suppose that we have to find the solutions x of the equation

F(x) = w.

There are two possibilities: either w ∉ Im(F), so then no solution x exists, or

w ∈ Im(F).

In that case there is at least one x0 for which F(x0) = w. We claim that the complete solution set is the affine subspace

x0 + ker(F).

Indeed, if F(x0) = w then x satisfies

F(x) = w  ⟺  F(x) = F(x0)  ⟺  F(x − x0) = 0  ⟺  x − x0 ∈ ker(F)  ⟺  x ∈ x0 + ker(F).



    Example 3.3.1. Let V be the subspace of twice differen-

tiable functions in F(R; R) and let D : V → F(R; R) be the differential operator defined as

D(y) = y^{(2)} + y.

What is the complete solution set (in V) of

(Dy)(t) = 2 e^t?

Clearly y0(t) = e^t is one solution. The complete solution set hence is

e^t + ker(D) = e^t + span{sin, cos}.

From linear algebra one may recall that any n × m matrix through elementary row and column operations can be transformed into the form

[ I_r        0_{r,m−r}
  0_{n−r,r}  0_{n−r,m−r} ]  ∈ R^{n×m}.

In this form it is immediate that the kernel has dimension m − r and that the image has dimension r. These two dimensions add up to m, which is the number of columns

    of the matrix. This result holds in greater generality (no

    proof):

Lemma 3.3.2 (A dimension theorem). Let F : V → W be a linear operator from vector space V to vector space W and assume that V is finite dimensional. Then

dim(ker(F)) + dim(Im(F)) = dim(V).

In particular, if dim(V) = dim(W) < ∞, then the above says that F is injective iff it is surjective:

ker(F) = {0}  ⟺  Im(F) = W.

    Example 3.3.3 (Differentiator). Consider the vector

    space of polynomials Pn of degree at most n, and the

differentiator D : Pn → Pn defined as D(p) = p′. The kernel of D is

ker(D) = {p ∈ Pn | p′ = 0} = {p ∈ Pn | p is constant} = P0.

Clearly this kernel has dimension 1. So by the dimension theorem the range, Im(D), has dimension dim(Pn) − 1 = n. It does:

Im(D) = {D(p) | p(t) = a_n t^n + · · · + a_1 t + a_0, a_i ∈ R}
      = {n a_n t^{n−1} + (n − 1) a_{n−1} t^{n−2} + · · · + a_1 | a_i ∈ R}
      = P_{n−1}.

    Example 3.3.4 (Abstract interpolation). Clearly given

any two points (x1, y1) and (x2, y2) in R^2, with x1 ≠ x2,

    there is a unique degree-1 or constant polynomial that in-

    terpolates these points:


    With the dimension theorem this can be generalized as fol-

lows. Consider an arbitrary set of n + 1 points (xi, yi) ∈ R^2 with all xi distinct. We show that there is a unique polyno-

    mial of degree n or less that interpolates these points. To

    this end consider the mapping

F : Pn → R^{n+1}

    that sends a polynomial p to

    F(p) = (p(x1), p(x2) , . . . , p(xn+1)).

A polynomial p interpolates (x1, y1), . . . , (xn+1, yn+1) iff F(p) = y where y = (y1, . . . , yn+1). The mapping F is linear (verify this yourself). Now it is well known that a polynomial of degree n or less does not have n + 1 zeros, unless it is the zero function. Hence on Pn we have F(p) = 0 only if p is the zero element, so

ker(F) = {0}.

By the dimension theorem and the fact that Pn and R^{n+1} have the same dimension we thus have

Im(F) = R^{n+1}.

In other words for every y = (y1, . . . , yn+1) there is a p0 ∈ Pn that interpolates the n + 1 points (xi, yi). In fact the solution is unique because the general solution is p0 + ker(F), and ker(F) = {0}. See Fig. 3.3.

Figure 3.3: There is a unique p ∈ P2 that interpolates the three points (x1, y1), (x2, y2), (x3, y3)
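Numerically, solving F(p) = y amounts to solving a linear system with a Vandermonde matrix. A minimal NumPy sketch for three hypothetical points, so n = 2 (an added illustration, not part of the original notes):

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0])       # distinct interpolation nodes x_i
ys = np.array([1.0, 3.0, 2.0])       # prescribed values y_i

# Row i of V is (1, x_i, x_i^2), so V @ coeffs = ys is exactly F(p) = y
# with coeffs the coordinates of p w.r.t. the basis {1, t, t^2}.
V = np.vander(xs, N=3, increasing=True)
coeffs = np.linalg.solve(V, ys)

p = np.poly1d(coeffs[::-1])           # poly1d expects highest degree first
print(p(xs))                          # reproduces ys: [1. 3. 2.]
```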

3.4 Linear transformation on R^n

    On Rn linear mappings are often identified with matrices.



Figure 3.4: Rotation and reflection

    Example 3.4.1 (Rotation in R2). Figure 3.4(a) illustrates

the rotation operator F : R^2 → R^2. It rotates its argument over an angle of θ (counter clockwise). It is a linear mapping (verify this). In particular it maps the unit vector e1 := (1, 0) to y1 := (cos(θ), sin(θ)) and the unit vector e2 := (0, 1) to y2 := (−sin(θ), cos(θ)). Combining the two outcomes in a matrix

F_rotation := [y1 y2] = [ cos(θ) −sin(θ)
                          sin(θ)  cos(θ) ]

    is the standard way of representing this linear mapping.

    Example 3.4.2 (Reflection in R2). Figure 3.4(b) depicts

the reflection transformation F : R^2 → R^2. It reflects its argument with respect to the line with angle θ/2. The matrix F now becomes

F_reflection = [ cos(θ)  sin(θ)
                 sin(θ) −cos(θ) ].
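A small NumPy check of these two matrices (an added illustration; θ = π/3 is an arbitrary choice):

```python
import numpy as np

theta = np.pi / 3
F_rot = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

e1 = np.array([1.0, 0.0])
print(F_rot @ e1)                    # (cos(theta), sin(theta))
print(np.linalg.norm(F_rot @ e1))    # 1.0: rotation preserves length

F_refl = np.array([[np.cos(theta),  np.sin(theta)],
                   [np.sin(theta), -np.cos(theta)]])
print(F_refl @ F_refl)               # identity: reflecting twice is the identity
```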

    Example 3.4.3 (transformation on R3). Suppose we

    have a mapping T that we know to be linear and that

    sends the unit cube to a stretched version, see Fig. 3.5, in

    particular that

    T(e1) = e1, T(e2) = 2e2, T(e3) = e3.

    The matrix T associated with this mapping (with respect

    to the standard basis) is

T = [ 1 0 0
      0 2 0
      0 0 1 ].

    Identifying linear mappings with their matrix has to do

    with the fact that the linear mapping is completely specified

    by its matrix (a proof follows shortly). The drawback of

    such a matrix approach is that it assumes that we all agree

    on what the standard basis is and while this may be so

    (well) in Rn , for other vector spaces this may not be soobvious.

Figure 3.5: Unit cube linearly transformed: T(e1) = e1, T(e2) = 2e2, T(e3) = e3

    3.5 Matrix representation and eigenvectors

    A message of the previous section is this: once we settle

    on a basis then the linear mapping may be identified with a

    matrix of scalars. As mentioned earlier, a drawback of such

    an approach is that it assumes agreement on the choice of

    basis. On the other hand, an advantage is that it translates

    the linear mapping into a matrix of numbers, which makes

    it explicit (e.g. matlabable). Consistent with the previoussection we define:

    Definition 3.5.1 (Matrix representation of linear map-

    pings). Let V be a vector space with finite ordered basis

S = {v1, v2, . . . , vn}. For any x ∈ V let x_S ∈ R^n (or C^n) denote the column vector of coordinates of x with respect to the basis, that is,

x = ∑_{i=1}^n vi x_{S,i}.    (3.1)

For any linear transformation

F : V → V

the matrix F_SS of F with respect to the basis S is defined as the n × n matrix whose columns are the coordinate vectors of the transformed basis elements,

F_SS = [ [F(v1)]_S  [F(v2)]_S  · · ·  [F(vn)]_S ].

    The connection (3.1) between x and xS may be written

    compactly using a row vector of basis elements, as

x = [v1 v2 · · · vn] x_S.

For x = F(vi) this shows that [F(vi)]_S is determined by the equation

F(vi) = [v1 v2 · · · vn] [F(vi)]_S,

and that the matrix F_SS, since it is just the collection of all these [F(vi)]_S, is determined by

[F(v1) F(v2) · · · F(vn)] = [v1 v2 · · · vn] F_SS.

The following lemma says that linear transformations on finite dimensional vector space are completely specified by their matrix:



    Lemma 3.5.2 (Matrix representation of linear transfor-

    mation). Let V be a vector space with finite ordered ba-

sis S = {v1, . . . , vn}, let x, y ∈ V and suppose that F : V → V is linear. Then

y = F(x)  ⟺  y_S = F_SS x_S.

Proof. By definition of x_S we have x = [v1 · · · vn] x_S. Using linearity we get F(x) = F([v1 · · · vn] x_S) = [F(v1) · · · F(vn)] x_S = [v1 · · · vn] F_SS x_S. So y = F(x) iff [v1 · · · vn] y_S = [v1 · · · vn] F_SS x_S. As the {v1, . . . , vn} are linearly independent this last equality holds iff y_S = F_SS x_S.

Lemma 3.5.3 (Eigenvalue and eigenvector). Let λ be a

scalar. Consider a linear mapping F : V → V and let F_SS be the matrix of this mapping, given some basis S of V.

    The following statements are equivalent.

1. There is an x ∈ V, x ≠ 0, such that F(x) = λx.
2. λ is an eigenvalue of the matrix F_SS.

Such nonzero x we call an eigenvector of the mapping, and the scalar λ an eigenvalue of the mapping.

Proof. Apply Lemma 3.5.2 for y = λx, and realize that x = 0 iff x_S = 0.

    The lemma implies that the eigenvalues of FS S do not

    depend on the choice of basis. Better yet, the notion of

    eigenvalue does not require the notion of basis. For com-

    plicated linear mappings it may however be hard to find the

    eigenvalues and eigenfunctions and then a matrix represen-tation may help.

    Example 3.5.4 (Differentiator). Consider the differentia-

tor D : Pn → Pn that sends polynomials p of degree at most n to their derivative D(p) := p^{(1)}. A basis for Pn clearly is

S := {1, t, t^2, . . . , t^n}

and they map to

{0, 1, 2t, . . . , n t^{n−1}}.

With respect to this basis S, the matrix D_SS that represents the differentiator on Pn can be derived from

[D(1) D(t) D(t^2) · · · D(t^n)] = [0 1 2t · · · n t^{n−1}]
                                = [1 t t^2 · · · t^n] D_SS,

where

D_SS = [ 0 1 0 · · · 0
         0 0 2 · · · 0
         ...       ...
         0 0 0 · · · n
         0 0 0 · · · 0 ].

The matrix D_SS is not invertible, hence neither is the differentiator. Indeed the differentiator is not invertible because every constant maps to 0. The only eigenvalue that the matrix has is λ = 0, hence the differentiator has no eigenvalues other than λ = 0. Indeed, the derivative of any polynomial is of lower degree, so nonconstant eigenfunctions do not exist. The eigenfunctions with eigenvalue 0 are the nonzero constant functions.

If we choose as domain V = span{e^t, e^{2t}} with obvious

basis V = {e^t, e^{2t}} then the matrix D_VV of the differentiator becomes

D_VV = [ 1 0
         0 2 ]

because

[D(e^t) D(e^{2t})] = [e^t 2e^{2t}] = [e^t e^{2t}] [ 1 0
                                                    0 2 ].

Now D_VV is invertible, hence the differentiator is invertible on span{e^t, e^{2t}}, and indeed it is. Also, its eigenvalues are 1 and 2, hence f ∈ span{e^t, e^{2t}} exist with D(f) = f and D(f) = 2f. Clearly such f exist.
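The matrix D_SS of Example 3.5.4 is easy to build and inspect numerically. A minimal NumPy sketch for n = 3, i.e. S = {1, t, t², t³} (an added illustration):

```python
import numpy as np

n = 3
# Column j holds the coordinates of D(t^j) = j * t^(j-1) w.r.t. {1, t, ..., t^n}
D = np.zeros((n + 1, n + 1))
for j in range(1, n + 1):
    D[j - 1, j] = j

print(D)
print(np.linalg.eigvals(D))        # all eigenvalues are 0
print(np.linalg.matrix_rank(D))    # rank n: dim Im(D) = n, dim ker(D) = 1
```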

Figure 3.6: g(t) = p(1 − t)

    Example 3.5.5 (Eigenfunction). Consider the mapping

    F : P2 P2 defined as

    g = F(p) : g(t) = p(1 t).

    The graph (t, g(t)) is the graph (t, p(t)) reflected in the

vertical axis at t = 1/2, see Fig. 3.6. With respect to the standard basis S = {1, t, t^2} the matrix F_SS follows as

[F(1) F(t) F(t^2)] = [1  1−t  (1−t)^2]
                   = [1 t t^2] [ 1  1  1
                                 0 −1 −2
                                 0  0  1 ]  =: [1 t t^2] F_SS.

Because of its upper-triangular structure, the eigenvalues of F_SS are the diagonal elements,

λ = 1 (twice) and λ = −1.

It is readily verified that the corresponding eigenvectors (modulo scaling etc.) are

λ = 1 :  v_1 = (1, 0, 0),  v_1′ = (0, −1, 1)



and

λ = −1 :  v_{−1} = (−1, 2, 0).

This corresponds to the eigenfunctions

p_1(t) = [1 t t^2] (1, 0, 0)^T = 1,    p_1′(t) = [1 t t^2] (0, −1, 1)^T = t^2 − t,

and

p_{−1}(t) = [1 t t^2] (−1, 2, 0)^T = 2t − 1.

See Fig. 3.7. Since the eigenvector v_{−1} for λ = −1 is unique (up to scaling) the eigenfunction p_{−1} with eigenvalue −1 is unique as well (up to scaling). The eigenfunctions with eigenvalue 1 are the linear combinations of p_1 and p_1′.

Figure 3.7: Three eigenfunctions (Example 3.5.5)
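Such small matrix computations are easy to check numerically; the following numpy sketch confirms the eigenvalues of F_SS and, as one illustration, that t² − t is an eigenfunction with eigenvalue 1:

    import numpy as np

    # Matrix of F(p)(t) = p(1 - t) on P_2 with respect to S = {1, t, t^2}
    F_SS = np.array([[1.0,  1.0,  1.0],
                     [0.0, -1.0, -2.0],
                     [0.0,  0.0,  1.0]])

    print(np.linalg.eigvals(F_SS))   # 1, -1, 1 (in some order)

    # Coordinate vector of p(t) = t^2 - t is (0, -1, 1); F_SS maps it to itself.
    p = np.array([0.0, -1.0, 1.0])
    print(F_SS @ p)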

    3.5.1 Eigenspace

Eigenvectors are not unique. If v is an eigenvector then so are 2v and 3v, all with the same eigenvalue. For any eigenvalue λ of a linear mapping F, the set of all eigenvectors, including the zero element, equals

E_λ := {v | F(v) = λv} = {v | 0 = (λI − F)(v)} = ker(λI − F).

This set E_λ is a subspace and we call it the eigenspace of F for eigenvalue λ.

Example 3.5.6 (Eigenspace on infinite dimensional vector space). Let L : F(ℝ, ℝ) → F(ℝ, ℝ) be the linear mapping defined as

(Lf)(t) = t² f(t)   for all t ∈ ℝ.

We determine the eigenvalues and eigenspaces of this mapping. Now a nonzero f ∈ F(ℝ, ℝ) is an eigenvector with eigenvalue λ if

t² f(t) = λ f(t)   for all t ∈ ℝ.   (3.2)

Since t² is real, any eigenvalue is necessarily real as well. Among these we distinguish three cases:

• If λ < 0 then (3.2) holds only if f(t) = 0 for all t. But the zero function is by definition not an eigenvector. Hence no λ < 0 is an eigenvalue.

• If λ = 0 then (3.2) implies that f(t) = 0 for all t ≠ 0. The value f(t) at t = 0 may be anything as long as it is nonzero, because eigenvectors are by definition nonzero. So

f₁(t) = 1 if t = 0, and f₁(t) = 0 if t ≠ 0

is an eigenvector with eigenvalue 0 and the corresponding eigenspace is the 1-dimensional

E₀ = span{f₁}.

• If λ > 0 then (3.2) holds at t = ±√λ irrespective of f. At all other t we need f(t) = 0. Now

f₂(t) = 1 if t = √λ and 0 elsewhere,   f₃(t) = 1 if t = −√λ and 0 elsewhere

are two independent eigenvectors with eigenvalue λ, and the eigenspace in this case equals

E_λ = span{f₂, f₃}.

It has dimension two.

Notice that in the above example every real number λ ≥ 0 is an eigenvalue of the mapping. This is in stark contrast with mappings on finite dimensional vector spaces, which have finitely many eigenvalues only.

Example 3.5.7. The differentiator D : Pn → Pn of Example 3.5.4 has one eigenvalue only, λ = 0, and the eigenvectors were shown to equal the nonzero constant functions. The eigenspace for λ = 0 is E_{λ=0} = span{1}. It is the set of all constant functions, including the zero function.

Example 3.5.8. The mapping of Example 3.5.5 has two eigenvalues, λ = 1 and λ = −1. The eigenspaces are

E_{λ=1} = span{1, t² − t},   E_{λ=−1} = span{2t − 1}.


    3.5.2 Diagonalization

A linear transformation F : V → V is said to be diagonalizable if V has a basis S with respect to which the matrix F_SS is diagonal. More succinctly, it is diagonalizable if the space has a basis of eigenvectors of F.

Example 3.5.9 (Differentiator). The differentiator D : Pn → Pn of Example 3.5.4 is not diagonalizable because only the constant functions are eigenfunctions and these do not form a basis of Pn (unless n = 0). The same differentiator D : V → V but now with V = span{e^t, e^2t} is diagonalizable.

Example 3.5.10. Consider the linear mapping A : ℝ² → ℝ² that, with respect to some basis S = {s₁, s₂}, has matrix representation

\[
A_{SS} = \begin{bmatrix} 1 & 1 \\ 6 & 2 \end{bmatrix}.
\]

This matrix has characteristic polynomial

\[
\det(\lambda I - A_{SS}) = \det \begin{bmatrix} \lambda - 1 & -1 \\ -6 & \lambda - 2 \end{bmatrix} = \lambda^2 - 3\lambda - 4
\]

and its zeros are λ₁ = 4 and λ₂ = −1. The corresponding eigenspaces follow as

\[
E_{\lambda=4} = \ker(4I - A) = \ker \begin{bmatrix} 3 & -1 \\ -6 & 2 \end{bmatrix} = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 3 \end{bmatrix} \right\}
\]

and

\[
E_{\lambda=-1} = \ker(-I - A) = \ker \begin{bmatrix} -2 & -1 \\ -6 & -3 \end{bmatrix} = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ -2 \end{bmatrix} \right\}.
\]

Hence V := {v₁, v₂} defined as

v₁ = [s₁ s₂] (1, 3)ᵀ,   v₂ = [s₁ s₂] (1, −2)ᵀ

are eigenvectors of A, and the matrix A_VV with respect to this basis is the diagonal matrix of eigenvalues,

\[
A_{VV} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}.
\]

The A_SS we started with can now be written as a product of three matrices, each with its own interpretation: the rightmost factor transforms coordinates in basis S to coordinates in basis V, the middle factor applies the mapping in coordinates of basis V, and the leftmost factor transforms coordinates in basis V back to coordinates in basis S:

\[
A_{SS} =
\underbrace{\begin{bmatrix} 1 & 1 \\ 3 & -2 \end{bmatrix}}_{\text{basis } V \,\to\, \text{basis } S}
\;
\underbrace{\begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}}_{\text{mapping in basis } V}
\;
\underbrace{\begin{bmatrix} 1 & 1 \\ 3 & -2 \end{bmatrix}^{-1}}_{\text{basis } S \,\to\, \text{basis } V} .
\]
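The same factorization can be reproduced with numpy; the matrices below are exactly those of the example:

    import numpy as np

    A_SS = np.array([[1.0, 1.0],
                     [6.0, 2.0]])

    # Columns of T are the coordinate vectors (w.r.t. S) of the eigenvectors
    # v1 = s1 + 3 s2 and v2 = s1 - 2 s2.
    T = np.array([[1.0,  1.0],
                  [3.0, -2.0]])

    A_VV = np.linalg.inv(T) @ A_SS @ T
    print(np.round(A_VV, 10))                       # diag(4, -1)

    # Reassemble A_SS from the three factors, as in the text.
    print(T @ np.diag([4.0, -1.0]) @ np.linalg.inv(T))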

    3.6 Problems

3.1 Let L : F(ℝ, ℝ) → F(ℝ, ℝ) be the operator defined as (Lf)(x) = x² f(x). Show that L is linear.

3.2 Determine which of the following mappings are linear:

a) F : ℝ → ℝ : F(t) = 3t + 1
b) A : Pn → ℝ : A(p) = p^(1)(3)
c) B : P → P : B(p) = p^(1)
d) G : ℂⁿ → ℂ : G(x) = aᴴx (where a ∈ ℂⁿ is some given vector)

3.3 The plus sign + appears four times in Section 3.1. Which of these four plus signs indicate the same type of addition?

3.4 Let V, W be two real vector spaces or two complex vector spaces and let L(V, W) be the set of linear operators from V to W. On this set of operators we define addition and scalar multiplication as

(A + B)(x) := A(x) + B(x),   (λA)(x) := λ(A(x)).

a) Show that A + B is linear if A, B are linear
b) Show that λA is linear if A is linear and λ is a scalar
c) Show that L(V, W) is a vector space
d) Briefly comment on a link between L(ℂᵏ, ℂⁿ) and n × k complex matrices

3.5 Let V be a subspace of ℝⁿ. Show that the orthogonal projection from x to its best approximation v (Thm. 1.7.2) is linear.

3.6 Assume F is linear. Show that for any m ∈ ℕ and scalars a₁, . . . , aₘ and vectors v₁, . . . , vₘ there holds

F(a₁v₁ + a₂v₂ + ··· + aₘvₘ) = a₁F(v₁) + a₂F(v₂) + ··· + aₘF(vₘ).

3.7 Suppose F : V → W is linear and that V and W are complex vector spaces.

a) Show that ker(F) is a subspace of V
b) Show that Im(F) is a subspace of W

3.8 Let B ∈ ℂ^{n×n}. Show that the mapping L : ℂ^{n×n} → ℂ^{n×n} defined as L(A) = AB − BA is linear.

3.9 Let C([a, b], ℝ) denote the subspace of continuous functions in F([a, b], ℝ). Is the integral operator J : C[a, b] → C[a, b] defined as

f = J(g) :   f(t) = ∫ₐᵗ g(τ) dτ

linear?


3.10 Consider the linear transformation F : P1 → P1 defined by

F(α₀ + α₁t) = α₀ + (8α₀ − α₁)t.

a) Determine the matrix of F with respect to the standard basis of P1.
b) Determine the matrix of F with respect to basis {t + 1, t − 1}.
c) Determine the eigenvalues of the above two matrices.
d) Determine the eigenvalues of F without using the matrices.

3.11 Consider the mapping from ℓ(ℕ; ℝ) to ℓ(ℕ; ℝ) that sends a sequence g to the sequence f defined as

f_k = k g_k.

a) Show that the mapping is linear
b) What are the eigenvalues of this mapping?

3.12 Consider the complex vector space of infinitely often differentiable functions

C^∞(ℝ, ℂ) = {u + iv | u^(k), v^(k) ∈ F(ℝ, ℝ) for all k ∈ ℕ}.

Consider on this space the differentiator D(f) = f^(1). Determine all eigenvalues of D.

3.13 Let A, B, C : V → V and suppose V has a finite basis S. Show that

A = BC  ⟺  A_SS = B_SS C_SS.

3.14 Consider the subspace W := span{1, sin(x), sin(2x)} of F(ℝ, ℝ) and the second derivative T : W → W, T(g) = g^(2).

a) Determine the eigenvalues and eigenspaces of T

b) Is T : W → W diagonalizable?

3.15 Let V be a vector space and A : V → V a linear transformation.

a) Suppose A = A². Show that 0 or 1 are the only possible eigenvalues
b) Suppose Aᵏ = 0 for some k ∈ ℕ. Which eigenvalues are possible?
c) Construct a V and a linear A : V → V for which A ≠ 0 while A² = 0.

3.16 Consider the mapping F : P2 → P2 defined as F(p)(t) = p(−t).

a) Determine a basis S of P2
b) Determine the matrix F_SS of the mapping with respect to this basis S
c) Find the eigenvalues and eigenvectors of F_SS
d) Find the eigenvalues and eigenfunctions p ∈ P2 of F.

3.17 Repeat the previous question but now for the mapping F(p)(t) = p(t + 1).

3.18 Determine eigenvalues and eigenvectors of A and check whether or not A can be diagonalized, for

\[
\text{a)}\;\; A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}, \qquad
\text{b)}\;\; A = \begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix}, \qquad
\text{c)}\;\; A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.
\]

3.19 Show that

\[
A = \begin{bmatrix} 1 & 3 \\ 1 & 1 \end{bmatrix}
\]

is diagonalizable. Use this to compute A⁴.

    3.20 Is the operator of Example 3.5.5 diagonalizable?


    4 Normed vector space

A normed vector space loosely speaking is a vector space in which a length (a size) of a vector is available. This additional structure allows us to deal with optimal approximation and with limits of vectors. We denote the length of a vector x by ‖x‖ and call it the norm of x.

    4.1 Norm

Definition 4.1.1 (norm). Let V be a real or complex vector space. A mapping ‖·‖ from V to ℝ is a norm if for all x, y ∈ V and all scalars λ it satisfies the three axioms:

1. ‖λx‖ = |λ| ‖x‖, (positive homogeneous)
2. ‖x + y‖ ≤ ‖x‖ + ‖y‖, (triangle inequality)
3. ‖x‖ > 0 for every x ≠ 0. (positive definite)

For λ = 0 the first axiom tells us that ‖0‖ = 0. So a norm ‖x‖ is zero if and only if x is the zero vector. A normed vector space is a vector space on which a norm is defined. Formally one should say (V, ‖·‖) is a normed vector space but we usually just say V is a normed vector space, assuming that the choice of norm is clear from the problem at hand. Be aware, however, that a vector space can be equipped with many different norms.

Figure 4.1: Unit balls in the p-norm for p = 1, 2, ∞

Example 4.1.2 (Three different norms on ℝ²).

1. The 1-norm is defined as

‖x‖₁ = |x₁| + |x₂|.

In the first quadrant where x₁ and x₂ are nonnegative the 1-norm is just the sum of the entries, ‖x‖₁ = x₁ + x₂. In the first quadrant therefore the norm is at most 1 iff x₂ ≤ 1 − x₁, which is the triangular region with corners (0, 0), (1, 0) and (0, 1). Combined with the other three quadrants we get that the unit ball {x | ‖x‖₁ ≤ 1} is a polytope, a square in fact, see Fig. 4.1(a).

2. The Euclidean norm, also known as the 2-norm, is defined as

‖x‖₂ := √(x₁² + x₂²).

In this norm the unit ball {x | ‖x‖₂ ≤ 1} is the unit disc, see Fig. 4.1(b).

3. The max-norm, or ∞-norm, is defined as

‖x‖∞ = max(|x₁|, |x₂|).

Now in this norm the unit ball {x | ‖x‖∞ ≤ 1} is a square with its axes parallel to the x₁- and x₂-axis, see Fig. 4.1(c).
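For concrete vectors these norms are readily computed, for instance with numpy (the vector below is just an arbitrary example):

    import numpy as np

    x = np.array([3.0, -4.0])
    print(np.linalg.norm(x, 1))       # 1-norm:   |3| + |-4| = 7
    print(np.linalg.norm(x, 2))       # 2-norm:   sqrt(9 + 16) = 5
    print(np.linalg.norm(x, np.inf))  # max-norm: max(3, 4) = 4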

The 1-norm is sometimes called the manhattan norm because in a rectangular street grid (which is common in US cities) the 1-norm ‖x − y‖₁ is the minimal Euclidean distance required to travel from junction x to junction y, see Fig. 4.2.

Figure 4.2: Manhattan norm: all three routes are equally long, ‖x − y‖₁

The triangle inequality ‖x + y‖ ≤ ‖x‖ + ‖y‖ loosely speaking says that in any norm traveling from 0 to x + y via x or y can only mean a detour. Moving the ‖y‖ to the left-hand side of the inequality turns the triangle inequality into a statement that says that any side in a triangle is at least the difference of the other two sides:

‖x + y‖ − ‖y‖ ≤ ‖x‖.

This is sometimes called the reverse triangle inequality and it is commonly formulated in terms of z = x + y as:

Lemma 4.1.3. | ‖z‖ − ‖y‖ | ≤ ‖z − y‖.

In this form it is immediate that if two vectors z and y are close then their norms are close as well. This implies that norms are continuous in some way (see Section 4.4.1).


Example 4.1.4. The space of finitely nonzero sequences ℓ_finite(ℕ; ℝ) is a normed vector space in the 1-norm defined as

‖f‖₁ := Σ_{i=1}^∞ |f_i|.

See Problem 4.4.

Example 4.1.5 (Continuous functions in max-norm). The standard norm on the vector space C[a, b] of continuous functions on the real interval [a, b] is the max-norm, also known as ∞-norm, defined as

‖f‖∞ = max_{x ∈ [a,b]} |f(x)|.

We now verify that this indeed satisfies the three axioms of a norm:

1. For every scalar λ we have

‖λf‖∞ = max_x |λf(x)| = max_x |λ| |f(x)| = |λ| max_x |f(x)| = |λ| ‖f‖∞.

2. The max-norm inherits the triangle inequality from ℝ: since for every p, q ∈ ℝ we have that |p + q| ≤ |p| + |q|, we also have for every f, g ∈ C[a, b] that

‖f + g‖∞ = max_x |f(x) + g(x)| ≤ max_x (|f(x)| + |g(x)|) ≤ max_x |f(x)| + max_x |g(x)| = ‖f‖∞ + ‖g‖∞.

3. If f is not the zero function then f(x₀) ≠ 0 for at least one x₀ ∈ [a, b]. Now ‖f‖∞ ≥ |f(x₀)| > 0.
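On a computer the max-norm can only be approximated by sampling, but even such an approximation makes the axioms tangible; in the sketch below the interval, the grid and the two sample functions are arbitrary choices:

    import numpy as np

    # Approximate the max-norm on C[0, 1] by sampling on a fine grid.
    t = np.linspace(0.0, 1.0, 10001)
    f = np.sin(2 * np.pi * t)
    g = t**2 - 0.5

    def max_norm(h):
        return np.max(np.abs(h))

    print(max_norm(f + g) <= max_norm(f) + max_norm(g))  # True: triangle inequality
    print(max_norm(3.0 * f), 3.0 * max_norm(f))          # equal: positive homogeneity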

In some literature the vector space C[a, b] is identified with the normed vector space (C[a, b], ‖·‖∞). This is unfortunate since we may want to consider other norms on the space of continuous functions, for instance:

Example 4.1.6. On C[a, b]

‖f‖₁ := ∫ₐᵇ |f(x)| dx   (4.1)

is a norm (Problem 4.5).

Notice that in this example the norm ‖f‖₁ exists (is finite) for every continuous function. For arbitrary functions in F([a, b], ℝ) that need not be the case, and this is the reason we restricted attention to C[a, b]. However the space of continuous functions also has its drawbacks for this norm:

Example 4.1.7 (Limit does not exist in the space). Consider C[−1, 1] and the 1-norm defined in (4.1). In this norm the sequence of functions

\[
f_n(t) = \begin{cases} 0 & t \in [-1, 0] \\ nt & t \in (0, \tfrac{1}{n}) \\ 1 & t \in [\tfrac{1}{n}, 1] \end{cases}
\]

does not converge in the space C[−1, 1] because no continuous function f exists for which lim_{n→∞} ‖fn − f‖₁ = 0. (Convince yourself of this.) Nevertheless the sequence of functions does approach one another in the sense that

sup_{n>N, m>N} ‖fn − fm‖₁

goes to zero as N → ∞. This follows from the fact that for any n, m > N we have

‖fn − fm‖₁ = ∫_{−1}^{1} |fn(t) − fm(t)| dt = ∫_0^{1/min(n,m)} |fn(t) − fm(t)| dt ≤ ∫_0^{1/N} 1 dt = 1/N.

What fails in this example is that lim_{n→∞} fn does not exist in the space, even though the fn become arbitrarily close to one another in the given norm. We thus need to make a distinction between convergent sequences and sequences whose elements become closer and closer. The latter is called a Cauchy sequence and it is the topic of the next section. Incidentally this difference is not specific to vector spaces. It also shows up in sets like the rational numbers ℚ. Indeed, in ℚ we can construct sequences that approach one another in absolute value but that do not have a limit in the set of rational numbers. An example is the sequence of rational numbers {3, 3.1, 3.14, 3.141, . . .} that converges to the nonrational π.
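The bound ‖fn − fm‖₁ ≤ 1/N is easy to observe numerically; in the sketch below the 1-norm is approximated by a Riemann sum on an arbitrarily chosen grid:

    import numpy as np

    t, dt = np.linspace(-1.0, 1.0, 200001, retstep=True)

    def f(n):
        # f_n(t) = 0 on [-1, 0], n*t on (0, 1/n), 1 on [1/n, 1]
        return np.clip(n * t, 0.0, 1.0)

    for n, m in [(10, 20), (100, 200), (1000, 2000)]:
        dist = np.sum(np.abs(f(n) - f(m))) * dt   # Riemann sum for the 1-norm
        print(n, m, dist)                         # roughly 1/(2n) - 1/(2m), below 1/n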

    4.2 Cauchy sequence

Definition 4.2.1 (Cauchy sequence and convergent sequence). Let X be a normed vector space and let {xₙ}_{n∈ℕ} be a sequence in X.

• {xₙ} is a Cauchy sequence if for every ε > 0 there is an N ∈ ℕ such that

n, m > N  ⟹  ‖xₙ − xₘ‖ < ε.

• {xₙ} is a convergent sequence if there is an x ∈ X such that lim_{n→∞} ‖xₙ − x‖ = 0.

It can be shown that for sequences {αₙ} of real numbers the two notions are equivalent, i.e. a real sequence converges iff it is a Cauchy sequence. Figure 4.3 makes this plausible.


Figure 4.3: Cauchy criterion for real sequences

Example 4.2.2 (Integral test for real-valued sequences). Consider the real sequence σₙ = 1 + 1/2² + 1/3² + ··· + 1/n². Now for every m ≥ n > N we have

|σₘ − σₙ| = Σ_{k=n+1}^{m} 1/k² ≤ ∫ₙᵐ x⁻² dx = 1/n − 1/m < 1/N,

and this upper bound goes to zero as N → ∞. Hence {σₙ} is a Cauchy sequence, and we did not need to know its limit to see this.

Conversely, every convergent sequence in a normed vector space is a Cauchy sequence: if fₙ → f then for every ε > 0 there is an N such that ‖fₙ − f‖ < ε/2 for all n > N. Then by the triangle inequality ‖fₙ − fₘ‖ = ‖(fₙ − f) − (fₘ − f)‖ ≤ ‖fₙ − f‖ + ‖fₘ − f‖ < ε/2 + ε/2 = ε for every n, m > N. So {fₙ} is Cauchy.
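The Cauchy behaviour of the partial sums σₙ can also be observed numerically; the values of N, n, m below are arbitrary choices with n, m > N:

    import numpy as np

    # Partial sums sigma_n = 1 + 1/2^2 + ... + 1/n^2
    sigma = np.cumsum(1.0 / np.arange(1, 10001, dtype=float) ** 2)

    N = 100
    n, m = 150, 9000                          # any n, m > N
    print(abs(sigma[m - 1] - sigma[n - 1]))   # difference of two partial sums ...
    print(1.0 / N)                            # ... stays below 1/N, as the integral test predicts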

    4.3 Banach space = complete vector space

Definition 4.3.1 (Banach space). A normed vector space X is said to be complete if every Cauchy sequence has a limit in X. Complete normed vector spaces are called Banach spaces.

In a Banach space therefore a sequence converges if and only if it is a Cauchy sequence. This is beneficial because the Cauchy property is often easier to check since it does not require knowledge of the limit, see Example 4.2.2, and more importantly all sorts of limits are then guaranteed to exist. This will be of great help in the final chapter of this course.

Over the years many spaces have been shown to be Banach spaces, and also many have been shown to fail the Banach property. In this introductory course we will not worry about completeness proofs because the proofs are often intricate. We simply list a couple in the remainder of this section.

Theorem 4.3.2 (continuous functions with max norm). C[a, b] is a Banach space in the max-norm ‖·‖∞.

Proof. Suppose fₙ is a Cauchy sequence. Then for every ε > 0 there is an N > 0 such that ‖fₙ − fₘ‖∞ < ε for all n, m > N. Now at any t ∈ [a, b] we have

|fₙ(t) − fₘ(t)| ≤ ‖fₙ − fₘ‖∞ < ε   for all n, m > N.

So for every fixed t ∈ [a, b] the sequence of real numbers {fₙ(t)} is Cauchy. Since ℝ is a Banach space we hence have that the pointwise limit f(t) := lim_{n→∞} fₙ(t) exists. For m → ∞ we obtain that

|fₙ(t) − f(t)| ≤ ε   for all n > N

and that this N does not depend on t. Hence ‖fₙ − f‖∞ → 0 as n → ∞. It remains to show that this f is continuous. Let ε > 0 and fix an n > N, where N is chosen such that ‖fₙ − f‖∞ < ε/3. By continuity of fₙ we have at each t that |fₙ(t) − fₙ(t + h)| < ε/3 for all h ∈ [−δₜ, δₜ] for some small enough δₜ > 0. For all such h there holds

|f(t + h) − f(t)| = |f(t + h) − fₙ(t + h) + fₙ(t + h) − fₙ(t) + fₙ(t) − f(t)|
  ≤ |f(t + h) − fₙ(t + h)| + |fₙ(t + h) − fₙ(t)| + |fₙ(t) − f(t)|
  < ε/3 + ε/3 + ε/3 = ε.

So f is continuous.

Notice that C[a, b] is not complete in the 1-norm (Example 4.1.7), thus completeness is norm dependent. On a finite dimensional space it does not depend on the norm:

Theorem 4.3.3 (Finite dimensional space). Every finite dimensional normed vector space is a Banach space.

Proof (idea only). Suppose S := {v₁, . . . , vₘ} is a basis of the space. If fₙ is a Cauchy sequence then it may be shown that its coordinate vectors fₙ,S form a Cauchy sequence in ℝᵐ in, say, the Euclidean norm. This implies that each fixed entry of these vectors is a Cauchy sequence. Since these entries are real numbers, they have a limit. The vectors fₙ,S hence converge entry-wise to some f_S ∈ ℝᵐ as n → ∞. The corresponding f := [v₁ ··· vₘ] f_S is well defined, and one can show that lim_{n→∞} ‖fₙ − f‖ = 0.


4.3.1 Sequence space ℓ1, ℓ2, ℓ∞

On the infinite sequence space ℓ(ℕ; ℝ) the 1-norm, 2-norm and ∞-norm that we defined on ℝⁿ become the infinite sums and suprema

‖v‖₁ := |v₁| + |v₂| + |v₃| + |v₄| + ···
‖v‖₂ := √(|v₁|² + |v₂|² + |v₃|² + |v₄|² + ···)
‖v‖∞ := sup(|v₁|, |v₂|, |v₃|, |v₄|, . . .).

These, however, are not norms on ℓ(ℕ; ℝ) because they are not finite for some sequences. For instance all three norms are infinite for the growing sequence

v = (1, 2, 3, 4, 5, . . .).

The way out of this problem is as simple as it is elegant. Merely restricting the sequence space to those elements that have finite norm will do the job, and the result is a Banach space (we skip the proof):

Theorem 4.3.4 (Complete sequence spaces). The three sequence spaces

ℓ1 := {v ∈ ℓ(ℕ, ℝ) | ‖v‖₁ < ∞}
ℓ2 := {v ∈ ℓ(ℕ, ℝ) | ‖v‖₂ < ∞}
ℓ∞ := {v ∈ ℓ(ℕ, ℝ) | ‖v‖∞ < ∞}

are all complete in their respective norms.

Example 4.3.5 (Cauchy or not Cauchy). Consider the infinite sequence

vₙ = (1, 1/2, 1/3, . . . , 1/n, 0, 0, . . .)

depending on n ∈ ℕ. For every n the vₙ has only finitely many nonzero entries, so it has finite 1-, 2- and ∞-norm and thus is in all three vector spaces ℓ1, ℓ2 and ℓ∞. The sequence vₙ pointwise converges to

v = (1, 1/2, 1/3, 1/4, 1/5, 1/6, . . .)

as n → ∞. This v is not in ℓ1 because

‖v‖₁ = 1 + 1/2 + 1/3 + ··· = ∞

but it is in ℓ2 and ℓ∞ with respective norms

‖v‖₂ = √(1 + 1/2² + 1/3² + ···) < ∞,   ‖v‖∞ = sup(1, 1/2, 1/3, . . .) = 1 < ∞.

This is consistent with the observations that

• {vₙ}_{n∈ℕ} is not Cauchy in the 1-norm because no matter how large N is, the quantity

‖vₙ − vₘ‖₁ = 1/(n+1) + 1/(n+2) + ··· + 1/m

can be taken arbitrarily large by appropriate choice of m ≥ n > N.

• {vₙ}_{n∈ℕ} is Cauchy in the 2-norm because for all n, m > N we have ‖vₙ − vₘ‖₂² < 1/N → 0 as N → ∞ (see Example 4.2.2). Since ℓ2 is a Banach space the vₙ hence converge in ℓ2. Indeed, the limit is v ∈ ℓ2.

• {vₙ}_{n∈ℕ} is Cauchy in the ∞-norm because for all n, m > N we have ‖vₙ − vₘ‖∞ < 1/N → 0 as N → ∞.

    4.3.2 Lebesgue space L1 and L2

The function space equivalent of ℓ1 we naively define as

L1[a, b] := {f : [a, b] → ℝ | ‖f‖₁ < ∞}

where the 1-norm is now defined as

‖f‖₁ = ∫ₐᵇ |f(t)| dt.

We allow a = −∞ and b = +∞. This definition of L1[a, b] is not precise because it still depends on the definition of the integral ∫ₐᵇ |f(t)| dt. The Riemann integral definition is not ideal because one can construct a Cauchy sequence of Riemann integrable functions whose limiting function is so crazy that its Riemann integral is no longer well defined. Hence the space L1[a, b] would then fail to be complete. The desire of having a complete function space was so strong that it prompted mathematicians to look for alternative definitions of integration! In the beginning of the 20th century the issue was settled by Henri Lebesgue. He devised the Lebesgue measure and Lebesgue integration, with respect to which the space L1[a, b] is complete. The interested reader should follow a course on measure theory. The symbol L is standard in the math literature and it is in honor of its inventor Lebesgue. The difference between Riemann and Lebesgue integration only shows up in really weird functions and in this course we need not worry about such functions. We simply accept that:

Theorem 4.3.6 (Complete / Banach). L1[a, b] is complete in the 1-norm.

Built in in the definition of L1 is that its elements have a well defined 1-norm. This space contains all continuous functions but also many more, and they need not be bounded.

Example 4.3.7 (Several L1 functions). All functions of Fig. 4.4 are elements of L1[0, 1], except the last function f₉(t) = 0.1/t. Indeed

∫₀¹ f₉(t) dt = 0.1 log(t) |₀¹ = ∞.

Figure 4.4: The first 8 functions are in L1[0, 1], the 9th is not



We should first fix a possibly unsettling problem: part of the definition of a norm is that

‖f‖ > 0 for all f ≠ 0,

but here that is not the case! The 8th function of Fig. 4.4, for instance,

f(t) = 1 if t = 1/2, and f(t) = 0 elsewhere,

is not the zero function, yet its 1-norm is zero. The simplistic way out of this problem is to identify every function f with zero norm with the zero function. That is not far fetched because if ‖f‖₁ = 0 then

‖f‖₁ = ∫ₐᵇ |f(t)| dt = 0,

implying that f(t) is zero almost everywhere. (In a course on measure theory this identification is formalized through equivalence classes, and then the notion of "almost everywhere" is properly defined.) From now on we do not distinguish between functions f and g when their difference has norm zero, so from now on by definition

f = g  ⟺  ‖f − g‖₁ = 0.

The counterpart of ℓ2 is the space of square integrable functions:

Lemma 4.3.8 (Lebesgue space L2). The space of square integrable functions

L2[a, b] := {f : [a, b] → ℝ | ∫ₐᵇ f(t)² dt < ∞}

is complete in the 2-norm defined as

‖f‖₂ := √(∫ₐᵇ |f(t)|² dt).

Here a = −∞ and b = +∞ are allowed. The top three functions of Fig. 4.4 are in L2[0, 1]. The fourth and fifth function of that figure are not in L2[0, 1].

Example 4.3.9 (Complete in L2, not complete in C). Consider the standard 2-norm of functions. All functions fₙ : [0, 1] → ℝ defined as

\[
f_n(t) = \begin{cases} n^{4/5}\, t & 0 \le t \le 1/n \\[2pt] \dfrac{1}{t^{1/5}} & 1/n < t \le 1 \end{cases}
\]

are continuous. All fₙ are therefore in C[0, 1] as well as in L2[0, 1]. The pointwise limit

\[
f(t) = \begin{cases} 0 & t = 0 \\[2pt] \dfrac{1}{t^{1/5}} & 0 < t \le 1 \end{cases}
\]

is not in C[0, 1] because it is not continuous and in fact it is not bounded. It is in L2[0, 1], however, because

‖f‖₂² = ∫₀¹ f²(t) dt = ∫₀¹ t^(−2/5) dt = (5/3) t^(3/5) |₀¹ = 5/3

is finite. One can show that fₙ is a Cauchy sequence in the 2-norm. Since C[a, b] is not complete in this norm, its limit is not guaranteed to exist in the space C[a, b], and indeed it does not exist. The space L2[a, b] however is complete in this norm and hence lim_{n→∞} fₙ exists in L2[a, b]. Indeed it does.

    4.4 Bounded linear operator

Having a norm of vectors allows us to come up with bounds for mappings on vectors.

Definition 4.4.1 (Bounded operator). Let X and Y be normed vector spaces. A linear operator F : X → Y is bounded if a c ≥ 0 exists such that

‖F(x)‖_Y ≤ c ‖x‖_X for all x ∈ X.   (4.2)

The smallest possible c in (4.2) gives an indication on how big the operator is. If c for instance is < 1 then we know that the norm of the image F(x) is less than that of x, irrespective of the choice of x. Likewise if (4.2) holds for


c = 2 then the norm of F(x) will never be more than twice the norm ‖x‖. Et cetera. The smallest possible c is what is called the operator norm².

Definition 4.4.2 (Operator norm). Let X, Y be normed vector spaces and F : X → Y a bounded operator. The operator norm ‖F‖ of F is defined as³

\[
\|F\| = \sup_{x \neq 0} \frac{\|F(x)\|_Y}{\|x\|_X}.
\]

If X = {0} then we define ‖F‖ = 0.

By definition of operator norm we have for every nontrivial vector space X and every x ∈ X that

‖F(x)‖_Y ≤ c ‖x‖_X   (4.3)

if c = ‖F‖, while for every c less than ‖F‖ there are x that violate (4.3).

Example 4.4.3 (Bounded operator). We determine the operator norm of A : C[a, b] → ℝ defined as

A(f) = ∫ₐᵇ f(t) dt.

On C[a, b] we take the max-norm, on ℝ we take the absolute value. Then

|A(f)| = | ∫ₐᵇ f(t) dt | ≤ ∫ₐᵇ |f(t)| dt ≤ ∫ₐᵇ ‖f‖∞ dt = (b − a) ‖f‖∞.

The operator A thus is bounded and its operator norm is at most b − a. For the constant function f(t) = 1, the above is an equality,

|A(1)| = ∫ₐᵇ 1 dt = (b − a) = (b − a) ‖1‖∞.

The operator norm hence equals b − a.

²The attentive reader will wonder why we call it the operator norm. Doesn't this require that some set of operators F is a vector space and that on this vector space the operator norm has the properties of a norm? The answers are yes and yes, but we will not deal with such matters in this course, even though we are very close to settling it.
³Supremum means least upper bound.

Example 4.4.4 (Unbounded operator). Consider C[0, 1] with the 1-norm. On this space the operator δ : C[0, 1] → ℝ defined as

δ(f) = f(0)

is unbounded. To see this take for instance the sequence of functions

\[
f_n(t) = \begin{cases} n(1 - nt) & 0 \le t \le \tfrac{1}{n} \\ 0 & \text{elsewhere.} \end{cases}
\]

The 1-norm of each fₙ is 1/2 while |δ(fₙ)| = n. The ratio |δ(fₙ)| / ‖fₙ‖₁ = 2n is unbounded. This shows that δ is an unbounded operator.
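The growth of this ratio is easy to observe numerically; in the sketch below the 1-norm is approximated by a Riemann sum on an arbitrarily chosen grid:

    import numpy as np

    t, dt = np.linspace(0.0, 1.0, 100001, retstep=True)

    def f(n):
        # f_n(t) = n(1 - n t) on [0, 1/n] and 0 elsewhere: a spike of height n at t = 0
        return np.maximum(n * (1.0 - n * t), 0.0)

    for n in [10, 100, 1000]:
        one_norm = np.sum(f(n)) * dt                 # approximately 1/2 for every n
        point_value = f(n)[0]                        # value at t = 0, equal to n
        print(n, one_norm, point_value / one_norm)   # the ratio grows like 2n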

    4.4.1 Continuity of maps

We say that a mapping A on a normed vector space is continuous at y if for every ε > 0 there is a δ > 0 such that

‖x − y‖ < δ  ⟹  ‖A(x) − A(y)‖ < ε.

If the mapping is continuous at y for every y in the domain, then A is said to be continuous. For linear mappings, boundedness and continuity are equivalent:

Theorem 4.4.5 (Bounded = continuous for linear maps). For a linear operator A the following three statements are equivalent.

1. A is continuous
2. A is continuous at 0
3. A is bounded

Proof. (1. ⟹ 2.) is trivial. Now (2. ⟹ 3.): If A is