
Math 5327, Spring 2011
The Cayley-Hamilton Theorem and Minimal Polynomials

Here are some notes on the Cayley-Hamilton Theorem, with a few extras thrown in. First, the proof of the Cayley-Hamilton theorem, that the characteristic polynomial is an annihilating polynomial for A. The proof started out this way: given a matrix A, we consider
$$(xI - A)\,\operatorname{adj}(xI - A) = \det(xI - A)\, I = c_A(x)\, I.$$
It would be nice if we could just plug A in for x in this equation. Certainly, we cannot do that, because the matrix adj(xI - A) has entries which are polynomials in x, so we would end up with a matrix with matrix entries. As we proceed, we will use an example to illustrate the difficulties.

Suppose that
$$A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}.
\qquad\text{Then}\qquad
\operatorname{adj}(xI - A) = \begin{pmatrix} x^2-4x+3 & x-1 & x-1 \\ x-1 & x^2-4x+3 & x-1 \\ x-1 & x-1 & x^2-4x+3 \end{pmatrix},$$
and $(xI - A)\operatorname{adj}(xI - A) = (x^3 - 6x^2 + 9x - 4)\,I$.

We write this out as $(xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1}) = c_A(x)\,I$:
$$\left(xI - \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}\right)
\left(\begin{pmatrix} 3 & -1 & -1 \\ -1 & 3 & -1 \\ -1 & -1 & 3 \end{pmatrix}
+ x\begin{pmatrix} -4 & 1 & 1 \\ 1 & -4 & 1 \\ 1 & 1 & -4 \end{pmatrix}
+ x^2 I\right) = (x^3 - 6x^2 + 9x - 4)\,I.$$
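To reproduce this bookkeeping by machine, here is a minimal sketch (my addition, not part of the notes), assuming Python with the sympy library is available. It verifies the identity $(xI - A)\operatorname{adj}(xI - A) = c_A(x)I$ for this A and reads off the coefficient matrices $B_0, B_1, B_2$.

```python
# Sketch (not from the notes): check (xI - A) adj(xI - A) = c_A(x) I and
# extract the matrix coefficients B_k of adj(xI - A) for the 3x3 example.
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[2, 1, 1],
               [1, 2, 1],
               [1, 1, 2]])
n = A.shape[0]

adj = (x * sp.eye(n) - A).adjugate()           # adj(xI - A): entries are polynomials in x
cA = ((x * sp.eye(n) - A).det()).expand()      # c_A(x) = x**3 - 6*x**2 + 9*x - 4

# The product identity, checked entrywise (prints the zero matrix):
print(((x * sp.eye(n) - A) * adj - cA * sp.eye(n)).expand())

# B_0, B_1, B_2 with adj(xI - A) = B_0 + x*B_1 + x**2*B_2:
for k in range(n):
    Bk = adj.applyfunc(lambda p: p.expand().coeff(x, k))
    print(f"B_{k} = {Bk.tolist()}")
```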

At this point, the expressions would still make sense if we replaced x by A, but we are not guaranteed that the resulting equation is valid. For example, if we replace x by the matrix
$$\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix},$$
we obtain
$$\left(\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} - \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}\right)
\left(\begin{pmatrix} 3 & -1 & -1 \\ -1 & 3 & -1 \\ -1 & -1 & 3 \end{pmatrix}
+ \begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}\begin{pmatrix} -4 & 1 & 1 \\ 1 & -4 & 1 \\ 1 & 1 & -4 \end{pmatrix}
+ \begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}^{2}\right)$$
$$= \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\left(\begin{pmatrix} 3 & -1 & -1 \\ -1 & 3 & -1 \\ -1 & -1 & 3 \end{pmatrix}
+ \begin{pmatrix} -5 & 0 & -5 \\ -1 & -6 & -1 \\ -1 & -1 & -6 \end{pmatrix}
+ \begin{pmatrix} 7 & 6 & 9 \\ 5 & 6 & 6 \\ 5 & 5 & 7 \end{pmatrix}\right)$$
$$= \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 5 & 5 & 3 \\ 3 & 3 & 4 \\ 3 & 3 & 4 \end{pmatrix}
= \begin{pmatrix} 3 & 3 & 4 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

whereas
$$c_A\!\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}
= -4I + 9\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}
- 6\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}^{2}
+ \begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}^{3}$$
$$= \begin{pmatrix} -4 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & -4 \end{pmatrix}
+ \begin{pmatrix} 18 & 9 & 18 \\ 9 & 18 & 9 \\ 9 & 9 & 18 \end{pmatrix}
- \begin{pmatrix} 42 & 36 & 54 \\ 30 & 36 & 36 \\ 30 & 30 & 42 \end{pmatrix}
+ \begin{pmatrix} 29 & 28 & 38 \\ 22 & 23 & 28 \\ 22 & 22 & 29 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 2 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},$$

    a very different answer.
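As a quick machine check of this failure (my addition, assuming Python with sympy), the following sketch evaluates both sides with this matrix substituted for x and confirms that they disagree, precisely because it does not commute with A.

```python
# Sketch (not from the notes): substituting the non-commuting matrix
# M = [[2,1,2],[1,2,1],[1,1,2]] into the two sides gives different answers.
import sympy as sp

A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
M = sp.Matrix([[2, 1, 2], [1, 2, 1], [1, 1, 2]])
I = sp.eye(3)

B0 = sp.Matrix([[3, -1, -1], [-1, 3, -1], [-1, -1, 3]])
B1 = sp.Matrix([[-4, 1, 1], [1, -4, 1], [1, 1, -4]])
B2 = I

lhs = (M - A) * (B0 + M * B1 + M**2 * B2)        # "plug M in for x" on the factored side
cA_M = M**3 - 6 * M**2 + 9 * M - 4 * I           # c_A(M) = M**3 - 6M**2 + 9M - 4I

print(lhs)              # Matrix([[3, 3, 4], [0, 0, 0], [0, 0, 0]])
print(cA_M)             # Matrix([[1, 1, 2], [1, 1, 1], [1, 1, 1]])
print(A * M == M * A)   # False: M does not commute with A
```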

Thus, $(xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1}) = c_A(x)\,I$ is correct for scalars x, but does not appear to work if x is a matrix. Now if x is a scalar,
$$(xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1})
= -AB_0 + (xB_0 - AxB_1) + (x^2B_1 - Ax^2B_2) + \cdots + x^nB_{n-1}$$
$$= -AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1}.$$


If we write $c_A(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} + x^n$, then we have that
$$-AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1}
= a_0I + xa_1I + \cdots + x^{n-1}a_{n-1}I + x^nI.$$
Two polynomials are equal if and only if they have the same coefficients (you might think about why this is true, even if the coefficients are matrices), so $a_0I = -AB_0,\ \ldots,\ B_{n-1} = I$. This means that for scalars AND matrices x,
$$-AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1} = c_A(x)\,I.$$
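For the running 3x3 example, these coefficient-matching equations can be checked directly. Here is a small sketch (my addition, assuming sympy) confirming $a_0I = -AB_0$, $a_1I = B_0 - AB_1$, $a_2I = B_1 - AB_2$, and $B_2 = I$.

```python
# Sketch (not from the notes): the coefficient-matching equations for the 3x3 example,
# where c_A(x) = x**3 - 6x**2 + 9x - 4, so a0 = -4, a1 = 9, a2 = -6.
import sympy as sp

A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
I = sp.eye(3)
B0 = sp.Matrix([[3, -1, -1], [-1, 3, -1], [-1, -1, 3]])
B1 = sp.Matrix([[-4, 1, 1], [1, -4, 1], [1, 1, -4]])
B2 = I
a0, a1, a2 = -4, 9, -6

print(-A * B0 == a0 * I)        # True
print(B0 - A * B1 == a1 * I)    # True
print(B1 - A * B2 == a2 * I)    # True
print(B2 == I)                  # True
```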

You might check that the matrix
$$\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$$
can be substituted in for x here, and a correct result follows. So we have the following: For any x, scalar OR matrix,
$$c_A(x)\,I = -AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1},$$

and if x is a scalar,
$$-AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1}
= (xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1}),$$
but if x is a matrix, then
$$(xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1})
= -AB_0 + (xB_0 - AxB_1) + (x^2B_1 - Ax^2B_2) + \cdots + x^nB_{n-1}.$$
If x is a matrix for which


$$(*)\qquad -AB_0 + (xB_0 - AxB_1) + (x^2B_1 - Ax^2B_2) + \cdots + x^nB_{n-1}
= -AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1},$$
then we could perform the calculation in this way:
$$c_A(x)\,I = -AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1} \quad\text{(already true)}$$
$$= -AB_0 + (xB_0 - AxB_1) + (x^2B_1 - Ax^2B_2) + \cdots + x^nB_{n-1}$$
$$= (xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1}).$$

For which matrices will this work? The answer is that we can do this so long as x commutes with any power of A, and this was the problem with
$$\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}:$$
it does not commute with A. Since A commutes with any power of A, it is legal to substitute A for x into the equation, and we obtain
$$(A - A)(B_0 + \cdots + A^{n-1}B_{n-1}) = c_A(A),$$
and in particular, that $c_A(A) = 0$.
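A one-line check (my addition, assuming sympy) that this conclusion holds for the 3x3 example:

```python
# Sketch (not from the notes): c_A(A) = A**3 - 6A**2 + 9A - 4I is the zero matrix.
import sympy as sp

A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
print(A**3 - 6 * A**2 + 9 * A - 4 * sp.eye(3))   # prints the zero matrix
```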

So the point of this proof is the following:
$$(**)\qquad (xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1}) = c_A(x)\,I$$
is true for any x which commutes with A (that is, for which xA = Ax). As an exercise, you might try
$$x = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},$$
which commutes with A, and show that (**) works for this x.
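Here is a sketch of that exercise (my addition, assuming sympy): the all-ones matrix J commutes with A, and substituting it into (**) gives a true matrix equation.

```python
# Sketch (not from the notes): the all-ones matrix J commutes with A, so (**)
# holds when J is substituted for x.
import sympy as sp

A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
J = sp.ones(3, 3)
I = sp.eye(3)

B0 = sp.Matrix([[3, -1, -1], [-1, 3, -1], [-1, -1, 3]])
B1 = sp.Matrix([[-4, 1, 1], [1, -4, 1], [1, 1, -4]])

print(A * J == J * A)                       # True: J commutes with A
lhs = (J - A) * (B0 + J * B1 + J**2)        # (xI - A)(B0 + xB1 + x^2 B2) at x = J
rhs = J**3 - 6 * J**2 + 9 * J - 4 * I       # c_A evaluated at J
print(lhs == rhs)                           # True
```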


Here is an extension of the Cayley-Hamilton Theorem. It uses adj(xI - A) to calculate the minimal polynomial of A. Suppose that the greatest common divisor of all the entries in adj(xI - A) is g(x). Then $m_A(x) = \dfrac{c_A(x)}{g(x)}$. The proof is very similar to the proof of the Cayley-Hamilton theorem: we can write
$$c_A(x)\,I = (xI - A)\,\operatorname{adj}(xI - A) = (xI - A)\,g(x)(C_0 + xC_1 + \cdots + x^mC_m)$$
for some m. Dividing by g(x), we have
$$[c_A(x)/g(x)]\,I = (xI - A)(C_0 + xC_1 + \cdots + x^mC_m).$$
As before, the right hand side of the above can be multiplied out to get a polynomial with matrix coefficients equal to $[c_A(x)/g(x)]\,I$, and this will all be legal as long as x commutes with A. Thus,
$$(c_A/g)(A) = (A - A)(C_0 + AC_1 + \cdots + A^mC_m) = 0.$$

Now suppose that $f(x) = x^nC_n + x^{n-1}C_{n-1} + \cdots + xC_1 + C_0$ is ANY polynomial with matrix coefficients. If x is a variable that commutes with A, then we can write $f(x) = (xI - A)\,q(x) + R$ for some polynomial q(x) with matrix coefficients, where R is some remainder matrix. In particular, if f is an annihilating polynomial for A, then $f(A) = 0 = (A - A)q(A) + R$ shows $R = 0$. Thus, $f(x) = (xI - A)q(x)$ for some polynomial q(x). If $m_A(x)$ is the minimal polynomial of A and $c_A(x) = h(x)m_A(x)$, we have
$$c_A(x)\,I = h(x)m_A(x)\,I = (xI - A)h(x)q(x) = (xI - A)h(x)Q(x),$$
where q(x) is a polynomial with matrix coefficients and Q(x) is the same thing viewed as a matrix with polynomial entries. Comparing this to
$$c_A(x)\,I = (xI - A)\,\operatorname{adj}(xI - A),$$


we have that $h(x)Q(x) = \operatorname{adj}(xI - A)$, so h(x) must be a divisor of each entry of adj(xI - A), and therefore h(x) divides g(x). On the other hand, $(c_A/g)(A) = 0$ shows that $m_A(x) = c_A(x)/h(x)$ divides $c_A(x)/g(x)$, so g(x) divides h(x). This proves that h = g.

As an example, consider the matrix:
$$A = \begin{pmatrix} 2 & 1 & 1 & 1 \\ -1 & 0 & -1 & -1 \\ 1 & 1 & 2 & 1 \\ -1 & -1 & -1 & 0 \end{pmatrix},
\qquad
xI - A = \begin{pmatrix} x-2 & -1 & -1 & -1 \\ 1 & x & 1 & 1 \\ -1 & -1 & x-2 & -1 \\ 1 & 1 & 1 & x \end{pmatrix}.$$
We have (after MUCH work)
$$\operatorname{adj}(xI - A) = \begin{pmatrix}
x(x-1)^2 & (x-1)^2 & (x-1)^2 & (x-1)^2 \\
-(x-1)^2 & (x-1)^2(x-2) & -(x-1)^2 & -(x-1)^2 \\
(x-1)^2 & (x-1)^2 & x(x-1)^2 & (x-1)^2 \\
-(x-1)^2 & -(x-1)^2 & -(x-1)^2 & (x-1)^2(x-2)
\end{pmatrix},$$
so $g(x) = (x-1)^2$, and $m_A(x) = c_A(x)/g(x) = (x-1)^4/(x-1)^2 = (x-1)^2$.
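The same answer can be obtained mechanically. The sketch below (my addition, assuming sympy) computes adj(xI - A), takes the gcd of its entries, and divides it into $c_A(x)$.

```python
# Sketch (not from the notes): the adjugate/gcd recipe for the 4x4 example,
# g(x) = gcd of the entries of adj(xI - A) and m_A(x) = c_A(x) / g(x).
import sympy as sp
from functools import reduce

x = sp.symbols('x')
A = sp.Matrix([[ 2,  1,  1,  1],
               [-1,  0, -1, -1],
               [ 1,  1,  2,  1],
               [-1, -1, -1,  0]])
n = A.shape[0]

adj = (x * sp.eye(n) - A).adjugate()
cA = sp.factor((x * sp.eye(n) - A).det())
g = reduce(sp.gcd, list(adj))            # gcd of all entries of the adjugate
mA = sp.cancel(cA / g)

print(cA)                                # (x - 1)**4
print(sp.factor(g))                      # (x - 1)**2
print(sp.factor(mA))                     # (x - 1)**2
print((A - sp.eye(n))**2)                # the zero matrix, so (x - 1)**2 annihilates A
```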

One last thing mentioned in class: matrices over the integers, or matrices with polynomial entries, can be put into something called Smith Normal form: Given A, there are integer (or polynomial) matrices P and Q so that det(P) = 1, det(Q) = 1, and
$$PAQ = \begin{pmatrix} c_1 & & & \\ & c_2 & & \\ & & \ddots & \\ & & & c_n \end{pmatrix},$$
a diagonal matrix, with $c_n$ divisible by $c_{n-1}$ divisible by $\cdots$ divisible by $c_1$. This is not quite true: the first several c's will be 0 if det(A) = 0. It is only after the c's become nonzero that they start dividing each other. The c's will be integers if A is an integer matrix, polynomials if A is a polynomial matrix. If det(A) ≠ 0, then the


product of the c's is det(A). Finally, if we do this for xI - A, whose determinant is $c_A(x)$, then the c's can be taken to be monic polynomials, and the largest of these, $c_n$, is the minimal polynomial. The proof for this rests on the fact that if B(x) is a matrix with polynomial entries, and we make a single row or column operation on B(x) to get B'(x), then the greatest common divisor of the entries in adj(B'(x)) is the same as for adj(B(x)). Once this is established, one may perform any number of row or column operations without changing the gcd of the entries of the adjoint. Finally, the adjoint of a diagonal matrix is extremely easy to figure out, and one gets that the gcd will be the product $c_1 \cdots c_{n-1}$, so the minimal polynomial is
$$\frac{c_A(x)}{c_1 \cdots c_{n-1}} = \frac{c_1 \cdots c_{n-1}\, c_n}{c_1 \cdots c_{n-1}} = c_n.$$
Let's verify that a single row operation does not change the gcd of the adjoint for one case when B is 3x3.

    I will use the cofactor matrix rather than the adjoint below (avoids a transpose). I

    will use C(B) for the cofactor matrix of B.

Suppose
$$B = \begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix},
\qquad\text{and}\qquad
B' = \begin{pmatrix} a_1 - yb_1 & a_2 - yb_2 & a_3 - yb_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix}.$$
Now let g(x) be

the gcd of the entries in C(B), and h(x) the gcd of the entries in C(B'). When we take cofactors, the top row will be unchanged. This means that g(x) and h(x) both divide the entries in the top row of either cofactor matrix. Let's look at the bottom row. This will be $(a_2b_3 - a_3b_2,\ a_3b_1 - a_1b_3,\ a_1b_2 - a_2b_1)$ for B and $((a_2 - yb_2)b_3 - (a_3 - yb_3)b_2,\ \text{something},\ \text{something})$ for B'. But
$$(a_2 - yb_2)b_3 - (a_3 - yb_3)b_2 = a_2b_3 - a_3b_2,$$
so the bottom rows are the same as well. This leaves just the second row to check. The second rows will be $(a_3c_2 - a_2c_3,\ a_1c_3 - a_3c_1,\ a_2c_1 - a_1c_2)$ for B, and
$$((a_3 - yb_3)c_2 - (a_2 - yb_2)c_3,\ (a_1 - yb_1)c_3 - (a_3 - yb_3)c_1,\ (a_2 - yb_2)c_1 - (a_1 - yb_1)c_2)$$
for B'. These are clearly different. Now g(x) divides each of the entries for B.


In particular, say, g(x) divides $a_1c_3 - a_3c_1$. But g(x) also divides $b_1c_3 - b_3c_1$, because this is (up to sign) the (1,2)-cofactor, an entry in C(B). Since g(x) divides both $a_1c_3 - a_3c_1$ and $b_1c_3 - b_3c_1$, it also divides $a_1c_3 - a_3c_1 - y(b_1c_3 - b_3c_1)$, the (2,2)-cofactor for B'. In a similar way, we see that g(x) divides all the cofactors of B', so it must divide the greatest common divisor of these. That is, h(x) is divisible by g(x). But row operations are reversible. Because of this, the same argument shows that g(x) is divisible by h(x). Since each divides the other, they are, up to scalar multiples, the same polynomial.
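As a spot-check of this invariance (my addition, assuming sympy; the parameter y and the particular row operation are chosen for illustration), the sketch below applies one row operation to xI - A for the 4x4 example and compares the gcds of the entries of the two cofactor matrices.

```python
# Sketch (not from the notes): one row operation on xI - A leaves the gcd of the
# entries of the cofactor matrix unchanged for the 4x4 example.
import sympy as sp
from functools import reduce

x, y = sp.symbols('x y')
A = sp.Matrix([[ 2,  1,  1,  1],
               [-1,  0, -1, -1],
               [ 1,  1,  2,  1],
               [-1, -1, -1,  0]])
B = x * sp.eye(4) - A

# B': replace row 1 by (row 1) - y*(row 2), keep the other rows.
Bp = sp.Matrix.vstack(B.row(0) - y * B.row(1), B[1:, :])

gcd_B = reduce(sp.gcd, list(B.cofactor_matrix()))
gcd_Bp = reduce(sp.gcd, list(Bp.cofactor_matrix()))
print(sp.factor(gcd_B), sp.factor(gcd_Bp))   # both (x - 1)**2
```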

Here is an example (same as before):
$$A = \begin{pmatrix} 2 & 1 & 1 & 1 \\ -1 & 0 & -1 & -1 \\ 1 & 1 & 2 & 1 \\ -1 & -1 & -1 & 0 \end{pmatrix},
\qquad
xI - A = \begin{pmatrix} x-2 & -1 & -1 & -1 \\ 1 & x & 1 & 1 \\ -1 & -1 & x-2 & -1 \\ 1 & 1 & 1 & x \end{pmatrix}.$$

We have:
$$\begin{pmatrix} x-2 & -1 & -1 & -1 \\ 1 & x & 1 & 1 \\ -1 & -1 & x-2 & -1 \\ 1 & 1 & 1 & x \end{pmatrix}
\xrightarrow{\text{row operations}}
\begin{pmatrix} x-1 & 0 & 0 & x-1 \\ 0 & x-1 & 0 & -x+1 \\ 0 & 0 & x-1 & x-1 \\ 1 & 1 & 1 & x \end{pmatrix}
\ \text{(to simplify)}$$
$$\xrightarrow{\text{row operations}}
\begin{pmatrix} 1 & 1 & 1 & x \\ 0 & x-1 & 0 & -x+1 \\ 0 & 0 & x-1 & x-1 \\ 0 & -x+1 & -x+1 & -x^2+2x-1 \end{pmatrix}
\xrightarrow{\text{column operations}}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & x-1 & 0 & -x+1 \\ 0 & 0 & x-1 & x-1 \\ 0 & -x+1 & -x+1 & -x^2+2x-1 \end{pmatrix}$$
$$\longrightarrow
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & x-1 & 0 & -x+1 \\ 0 & 0 & x-1 & x-1 \\ 0 & 0 & -x+1 & -x^2+x \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & x-1 & 0 & 0 \\ 0 & 0 & x-1 & x-1 \\ 0 & 0 & -x+1 & -x^2+x \end{pmatrix}$$
$$\longrightarrow
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & x-1 & 0 & 0 \\ 0 & 0 & x-1 & x-1 \\ 0 & 0 & 0 & -x^2+2x-1 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & x-1 & 0 & 0 \\ 0 & 0 & x-1 & 0 \\ 0 & 0 & 0 & (x-1)^2 \end{pmatrix},$$
so again, the minimal polynomial is $(x-1)^2$.
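The diagonal entries above can also be recovered without tracking the operations, using the standard fact (my addition, not stated in the notes) that $d_k$, the gcd of all $k \times k$ minors, is unchanged by row and column operations, and the k-th diagonal entry of the Smith form is $d_k/d_{k-1}$. The sketch below (assuming sympy; sympy also ships a smith_normal_form helper, but the direct gcd computation avoids assuming it handles polynomial domains) reproduces $1,\ x-1,\ x-1,\ (x-1)^2$.

```python
# Sketch (not from the notes): d_k = gcd of all k x k minors of xI - A, and the
# k-th diagonal entry of the Smith form is d_k / d_{k-1}.
import sympy as sp
from functools import reduce
from itertools import combinations

x = sp.symbols('x')
A = sp.Matrix([[ 2,  1,  1,  1],
               [-1,  0, -1, -1],
               [ 1,  1,  2,  1],
               [-1, -1, -1,  0]])
M = x * sp.eye(4) - A
n = 4

d = [sp.Integer(1)]                                  # d_0 = 1 by convention
for k in range(1, n + 1):
    minors = [M.extract(list(r), list(c)).det()
              for r in combinations(range(n), k)
              for c in combinations(range(n), k)]
    d.append(sp.factor(reduce(sp.gcd, minors)))

print([sp.factor(sp.cancel(d[k] / d[k - 1])) for k in range(1, n + 1)])
# [1, x - 1, x - 1, (x - 1)**2]  -> the last entry is the minimal polynomial
```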


Finally, for this set of notes, we look at the relationship between minimal polynomials and diagonalization of operators or matrices. First, a result on kernels of compositions of operators (or of products of matrices).

Lemma. If S and T are linear operators on V, then
$$\dim(\ker(ST)) \le \dim\ker(S) + \dim\ker(T).$$
Proof: $\ker(ST) = \{v \mid ST(v) = 0\} = \{v \mid T(v) = 0\} \cup \{v \mid T(v) \ne 0 \text{ but } ST(v) = 0\}$. Suppose that ker(T) has basis $\{u_1, u_2, \ldots, u_m\}$ and ker(ST) has basis $\{u_1, u_2, \ldots, u_m\} \cup \{v_1, v_2, \ldots, v_k\}$. Then $\dim(\ker(T)) = m$ and $\dim(\ker(ST)) = m + k$. Now $\{T(v_1), T(v_2), \ldots, T(v_k)\}$ is a linearly independent set (you should check that this is true; it is an important fact about linear transformations). How big can k be? Since $S(T(v_i)) = 0$, each $T(v_i)$ is in the kernel of S. Since a vector space can't contain more independent vectors than its dimension, $k \le \dim(\ker(S))$. Thus,
$$m + k \le m + \dim(\ker(S)) = \dim(\ker(T)) + \dim(\ker(S)).$$
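For a concrete instance of the lemma tied to the example above (my addition, assuming sympy), take S = T = A - I for the 4x4 matrix A from earlier: ker(ST) is all of V, while each factor contributes a 3-dimensional kernel.

```python
# Sketch (not from the notes): dim ker(ST) <= dim ker(S) + dim ker(T)
# with S = T = A - I for the 4x4 example; here 4 <= 3 + 3.
import sympy as sp

A = sp.Matrix([[ 2,  1,  1,  1],
               [-1,  0, -1, -1],
               [ 1,  1,  2,  1],
               [-1, -1, -1,  0]])
N = A - sp.eye(4)                       # S = T = A - I

dim_ker = lambda M: len(M.nullspace())  # kernel dimension = number of basis vectors
print(dim_ker(N * N), "<=", dim_ker(N) + dim_ker(N))   # 4 <= 6
```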

Theorem. Let T be a linear operator on a finite dimensional vector space V. Then T is diagonalizable if and only if its minimal polynomial $m_T(x)$ factors into distinct linear terms over F.

Proof: One direction was easy: If T is diagonalizable, then in some basis B for V, $[T]_B$ is a diagonal matrix. The minimal polynomial for T is the same as the minimal polynomial for $[T]_B$, and it is easy to check that the minimal polynomial for a diagonal matrix factors into distinct linear terms.


For the other direction, let $m_T(x) = (x - a_1)(x - a_2)\cdots(x - a_m)$. Then $0 = m_T(T)$ is the m-fold composition of the linear operators $T - a_1I, \ldots, T - a_mI$. Now $V = \ker(0)$. Consequently, by the lemma, we have
$$\dim V = \dim\ker(0) \le \dim\ker(T - a_1I) + \cdots + \dim\ker(T - a_mI) \le \dim V,$$
and this can only be true if
$$\dim V = \dim\ker(T - a_1I) + \cdots + \dim\ker(T - a_mI).$$
Since $\ker(T - a_iI)$ is the eigenspace of $a_i$, this implies that T is diagonalizable by a previous theorem.
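Tying the two halves of these notes together (my addition, assuming sympy): the 4x4 example has minimal polynomial $(x-1)^2$, which has a repeated root, so by the theorem it is not diagonalizable; by contrast, a diagonal matrix such as diag(1, 2, 2) has minimal polynomial $(x-1)(x-2)$ with distinct roots.

```python
# Sketch (not from the notes): the 4x4 example is not diagonalizable, consistent
# with its minimal polynomial (x - 1)**2 having a repeated root, while diag(1, 2, 2)
# is diagonalizable and is annihilated by the distinct-root polynomial (x-1)(x-2).
import sympy as sp

A = sp.Matrix([[ 2,  1,  1,  1],
               [-1,  0, -1, -1],
               [ 1,  1,  2,  1],
               [-1, -1, -1,  0]])
print(A.is_diagonalizable())                    # False

D = sp.diag(1, 2, 2)
print(D.is_diagonalizable())                    # True
print((D - sp.eye(3)) * (D - 2 * sp.eye(3)))    # the zero matrix
```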
