
Math 5327, Spring 2011
The Cayley-Hamilton Theorem and Minimal Polynomials

Here are some notes on the Cayley-Hamilton Theorem, with a few extras thrown in. First, the proof of the Cayley-Hamilton theorem, that the characteristic polynomial is an annihilating polynomial for A. The proof started out this way: given a matrix A, we consider
$$(xI - A)\,\operatorname{adj}(xI - A) = \det(xI - A)\, I = c_A(x)\, I.$$
It would be nice if we could just plug A in for x in this equation. Certainly, we cannot do that, because the matrix adj(xI - A) has entries which are polynomials in x, so we would end up with a matrix with matrix entries. As we proceed, we will use an example to illustrate the difficulties.

Suppose that
$$A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}.
\qquad\text{Then}\qquad
\operatorname{adj}(xI - A) = \begin{pmatrix} x^2-4x+3 & x-1 & x-1 \\ x-1 & x^2-4x+3 & x-1 \\ x-1 & x-1 & x^2-4x+3 \end{pmatrix},$$
and $(xI - A)\operatorname{adj}(xI - A) = (x^3 - 6x^2 + 9x - 4)\,I$.

We write this out as $(xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1}) = c_A(x)\,I$:
$$\left(xI - \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}\right)
\left(\begin{pmatrix} 3 & -1 & -1 \\ -1 & 3 & -1 \\ -1 & -1 & 3 \end{pmatrix}
+ x\begin{pmatrix} -4 & 1 & 1 \\ 1 & -4 & 1 \\ 1 & 1 & -4 \end{pmatrix}
+ x^2 I\right) = (x^3 - 6x^2 + 9x - 4)\,I.$$
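To reproduce this bookkeeping by machine, here is a minimal sketch (my addition, not part of the notes), assuming Python with the sympy library is available. It verifies the identity $(xI - A)\operatorname{adj}(xI - A) = c_A(x)I$ for this A and reads off the coefficient matrices $B_0, B_1, B_2$.

```python
# Sketch (not from the notes): check (xI - A) adj(xI - A) = c_A(x) I and
# extract the matrix coefficients B_k of adj(xI - A) for the 3x3 example.
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[2, 1, 1],
               [1, 2, 1],
               [1, 1, 2]])
n = A.shape[0]

adj = (x * sp.eye(n) - A).adjugate()           # adj(xI - A): entries are polynomials in x
cA = ((x * sp.eye(n) - A).det()).expand()      # c_A(x) = x**3 - 6*x**2 + 9*x - 4

# The product identity, checked entrywise (prints the zero matrix):
print(((x * sp.eye(n) - A) * adj - cA * sp.eye(n)).expand())

# B_0, B_1, B_2 with adj(xI - A) = B_0 + x*B_1 + x**2*B_2:
for k in range(n):
    Bk = adj.applyfunc(lambda p: p.expand().coeff(x, k))
    print(f"B_{k} = {Bk.tolist()}")
```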

At this point, the expressions would still make sense if we replaced x by A, but we are not guaranteed that the resulting equation is valid. For example, if we replace x by the matrix
$$\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix},$$
we obtain
$$\left(\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} - \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}\right)
\left(\begin{pmatrix} 3 & -1 & -1 \\ -1 & 3 & -1 \\ -1 & -1 & 3 \end{pmatrix}
+ \begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}\begin{pmatrix} -4 & 1 & 1 \\ 1 & -4 & 1 \\ 1 & 1 & -4 \end{pmatrix}
+ \begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}^{2}\right)$$
$$= \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\left(\begin{pmatrix} 3 & -1 & -1 \\ -1 & 3 & -1 \\ -1 & -1 & 3 \end{pmatrix}
+ \begin{pmatrix} -5 & 0 & -5 \\ -1 & -6 & -1 \\ -1 & -1 & -6 \end{pmatrix}
+ \begin{pmatrix} 7 & 6 & 9 \\ 5 & 6 & 6 \\ 5 & 5 & 7 \end{pmatrix}\right)$$
$$= \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 5 & 5 & 3 \\ 3 & 3 & 4 \\ 3 & 3 & 4 \end{pmatrix}
= \begin{pmatrix} 3 & 3 & 4 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

whereas
$$c_A\!\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}
= -4I + 9\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}
- 6\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}^{2}
+ \begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}^{3}$$
$$= \begin{pmatrix} -4 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & -4 \end{pmatrix}
+ \begin{pmatrix} 18 & 9 & 18 \\ 9 & 18 & 9 \\ 9 & 9 & 18 \end{pmatrix}
- \begin{pmatrix} 42 & 36 & 54 \\ 30 & 36 & 36 \\ 30 & 30 & 42 \end{pmatrix}
+ \begin{pmatrix} 29 & 28 & 38 \\ 22 & 23 & 28 \\ 22 & 22 & 29 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 2 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},$$

    a very different answer.
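As a quick machine check of this failure (my addition, assuming Python with sympy), the following sketch evaluates both sides with this matrix substituted for x and confirms that they disagree, precisely because it does not commute with A.

```python
# Sketch (not from the notes): substituting the non-commuting matrix
# M = [[2,1,2],[1,2,1],[1,1,2]] into the two sides gives different answers.
import sympy as sp

A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
M = sp.Matrix([[2, 1, 2], [1, 2, 1], [1, 1, 2]])
I = sp.eye(3)

B0 = sp.Matrix([[3, -1, -1], [-1, 3, -1], [-1, -1, 3]])
B1 = sp.Matrix([[-4, 1, 1], [1, -4, 1], [1, 1, -4]])
B2 = I

lhs = (M - A) * (B0 + M * B1 + M**2 * B2)        # "plug M in for x" on the factored side
cA_M = M**3 - 6 * M**2 + 9 * M - 4 * I           # c_A(M) = M**3 - 6M**2 + 9M - 4I

print(lhs)              # Matrix([[3, 3, 4], [0, 0, 0], [0, 0, 0]])
print(cA_M)             # Matrix([[1, 1, 2], [1, 1, 1], [1, 1, 1]])
print(A * M == M * A)   # False: M does not commute with A
```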

Thus, $(xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1}) = c_A(x)\,I$ is correct for scalars x, but does not appear to work if x is a matrix. Now if x is a scalar,
$$(xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1})
= -AB_0 + (xB_0 - AxB_1) + (x^2B_1 - Ax^2B_2) + \cdots + x^nB_{n-1}$$
$$= -AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1}.$$


If we write $c_A(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} + x^n$, then we have that
$$-AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1}
= a_0I + xa_1I + \cdots + x^{n-1}a_{n-1}I + x^nI.$$
Two polynomials are equal if and only if they have the same coefficients (you might think about why this is true, even if the coefficients are matrices), so $a_0I = -AB_0,\ \ldots,\ B_{n-1} = I$. This means that for scalars AND matrices x,
$$-AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1} = c_A(x)\,I.$$
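For the running 3x3 example, these coefficient-matching equations can be checked directly. Here is a small sketch (my addition, assuming sympy) confirming $a_0I = -AB_0$, $a_1I = B_0 - AB_1$, $a_2I = B_1 - AB_2$, and $B_2 = I$.

```python
# Sketch (not from the notes): the coefficient-matching equations for the 3x3 example,
# where c_A(x) = x**3 - 6x**2 + 9x - 4, so a0 = -4, a1 = 9, a2 = -6.
import sympy as sp

A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
I = sp.eye(3)
B0 = sp.Matrix([[3, -1, -1], [-1, 3, -1], [-1, -1, 3]])
B1 = sp.Matrix([[-4, 1, 1], [1, -4, 1], [1, 1, -4]])
B2 = I
a0, a1, a2 = -4, 9, -6

print(-A * B0 == a0 * I)        # True
print(B0 - A * B1 == a1 * I)    # True
print(B1 - A * B2 == a2 * I)    # True
print(B2 == I)                  # True
```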

You might check that the matrix
$$\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$$
can be substituted in for x here, and a correct result follows. So we have the following: For any x, scalar OR matrix,
$$c_A(x)\,I = -AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1},$$

and if x is a scalar,
$$-AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1}
= (xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1}),$$
but if x is a matrix, then
$$(xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1})
= -AB_0 + (xB_0 - AxB_1) + (x^2B_1 - Ax^2B_2) + \cdots + x^nB_{n-1}.$$
If x is a matrix for which


$$(*)\qquad -AB_0 + (xB_0 - AxB_1) + (x^2B_1 - Ax^2B_2) + \cdots + x^nB_{n-1}
= -AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1},$$
then we could perform the calculation in this way:
$$c_A(x)\,I = -AB_0 + x(B_0 - AB_1) + x^2(B_1 - AB_2) + \cdots + x^nB_{n-1} \quad\text{(already true)}$$
$$= -AB_0 + (xB_0 - AxB_1) + (x^2B_1 - Ax^2B_2) + \cdots + x^nB_{n-1}$$
$$= (xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1}).$$

For which matrices will this work? The answer is that we can do this so long as x commutes with any power of A, and this was the problem with
$$\begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}:$$
it does not commute with A. Since A commutes with any power of A, it is legal to substitute A for x into the equation, and we obtain
$$(A - A)(B_0 + \cdots + A^{n-1}B_{n-1}) = c_A(A),$$
and in particular, that $c_A(A) = 0$.
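A one-line check (my addition, assuming sympy) that this conclusion holds for the 3x3 example:

```python
# Sketch (not from the notes): c_A(A) = A**3 - 6A**2 + 9A - 4I is the zero matrix.
import sympy as sp

A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
print(A**3 - 6 * A**2 + 9 * A - 4 * sp.eye(3))   # prints the zero matrix
```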

So the point of this proof is the following:
$$(**)\qquad (xI - A)(B_0 + xB_1 + x^2B_2 + \cdots + x^{n-1}B_{n-1}) = c_A(x)\,I$$
is true for any x which commutes with A (that is, for which xA = Ax). As an exercise, you might try
$$x = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},$$
which commutes with A, and show that (**) works for this x.
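Here is a sketch of that exercise (my addition, assuming sympy): the all-ones matrix J commutes with A, and substituting it into (**) gives a true matrix equation.

```python
# Sketch (not from the notes): the all-ones matrix J commutes with A, so (**)
# holds when J is substituted for x.
import sympy as sp

A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
J = sp.ones(3, 3)
I = sp.eye(3)

B0 = sp.Matrix([[3, -1, -1], [-1, 3, -1], [-1, -1, 3]])
B1 = sp.Matrix([[-4, 1, 1], [1, -4, 1], [1, 1, -4]])

print(A * J == J * A)                       # True: J commutes with A
lhs = (J - A) * (B0 + J * B1 + J**2)        # (xI - A)(B0 + xB1 + x^2 B2) at x = J
rhs = J**3 - 6 * J**2 + 9 * J - 4 * I       # c_A evaluated at J
print(lhs == rhs)                           # True
```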


Here is an extension of the Cayley-Hamilton Theorem. It uses adj(xI - A) to calculate the minimal polynomial of A. Suppose that the greatest common divisor of all the entries in adj(xI - A) is g(x). Then $m_A(x) = \dfrac{c_A(x)}{g(x)}$. The proof is very similar to the proof of the Cayley-Hamilton theorem: we can write
$$c_A(x)\,I = (xI - A)\,\operatorname{adj}(xI - A) = (xI - A)\,g(x)(C_0 + xC_1 + \cdots + x^mC_m)$$
for some m. Dividing by g(x), we have
$$[c_A(x)/g(x)]\,I = (xI - A)(C_0 + xC_1 + \cdots + x^mC_m).$$
As before, the right hand side of the above can be multiplied out to get a polynomial with matrix coefficients equal to $[c_A(x)/g(x)]\,I$, and this will all be legal as long as x commutes with A. Thus,
$$(c_A/g)(A) = (A - A)(C_0 + AC_1 + \cdots + A^mC_m) = 0.$$

Now suppose that $f(x) = x^nC_n + x^{n-1}C_{n-1} + \cdots + xC_1 + C_0$ is ANY polynomial with matrix coefficients. If x is a variable that commutes with A, then we can write $f(x) = (xI - A)\,q(x) + R$ for some polynomial q(x) with matrix coefficients, where R is some remainder matrix. In particular, if f is an annihilating polynomial for A, then $f(A) = 0 = (A - A)q(A) + R$ shows $R = 0$. Thus, $f(x) = (xI - A)q(x)$ for some polynomial q(x). If $m_A(x)$ is the minimal polynomial of A and $c_A(x) = h(x)m_A(x)$, we have
$$c_A(x)\,I = h(x)m_A(x)\,I = (xI - A)h(x)q(x) = (xI - A)h(x)Q(x),$$
where q(x) is a polynomial with matrix coefficients and Q(x) is the same thing viewed as a matrix with polynomial entries. Comparing this to
$$c_A(x)\,I = (xI - A)\,\operatorname{adj}(xI - A),$$


we have that $h(x)Q(x) = \operatorname{adj}(xI - A)$, so h(x) must be a divisor of each entry of adj(xI - A), and therefore h(x) divides g(x). On the other hand, $(c_A/g)(A) = 0$ shows that $m_A(x) = c_A(x)/h(x)$ divides $c_A(x)/g(x)$, so g(x) divides h(x). This proves that h = g.

As an example, consider the matrix:
$$A = \begin{pmatrix} 2 & 1 & 1 & 1 \\ -1 & 0 & -1 & -1 \\ 1 & 1 & 2 & 1 \\ -1 & -1 & -1 & 0 \end{pmatrix},
\qquad
xI - A = \begin{pmatrix} x-2 & -1 & -1 & -1 \\ 1 & x & 1 & 1 \\ -1 & -1 & x-2 & -1 \\ 1 & 1 & 1 & x \end{pmatrix}.$$
We have (after MUCH work)
$$\operatorname{adj}(xI - A) = \begin{pmatrix}
x(x-1)^2 & (x-1)^2 & (x-1)^2 & (x-1)^2 \\
-(x-1)^2 & (x-1)^2(x-2) & -(x-1)^2 & -(x-1)^2 \\
(x-1)^2 & (x-1)^2 & x(x-1)^2 & (x-1)^2 \\
-(x-1)^2 & -(x-1)^2 & -(x-1)^2 & (x-1)^2(x-2)
\end{pmatrix},$$
so $g(x) = (x-1)^2$, and $m_A(x) = c_A(x)/g(x) = (x-1)^4/(x-1)^2 = (x-1)^2$.
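The same answer can be obtained mechanically. The sketch below (my addition, assuming sympy) computes adj(xI - A), takes the gcd of its entries, and divides it into $c_A(x)$.

```python
# Sketch (not from the notes): the adjugate/gcd recipe for the 4x4 example,
# g(x) = gcd of the entries of adj(xI - A) and m_A(x) = c_A(x) / g(x).
import sympy as sp
from functools import reduce

x = sp.symbols('x')
A = sp.Matrix([[ 2,  1,  1,  1],
               [-1,  0, -1, -1],
               [ 1,  1,  2,  1],
               [-1, -1, -1,  0]])
n = A.shape[0]

adj = (x * sp.eye(n) - A).adjugate()
cA = sp.factor((x * sp.eye(n) - A).det())
g = reduce(sp.gcd, list(adj))            # gcd of all entries of the adjugate
mA = sp.cancel(cA / g)

print(cA)                                # (x - 1)**4
print(sp.factor(g))                      # (x - 1)**2
print(sp.factor(mA))                     # (x - 1)**2
print((A - sp.eye(n))**2)                # the zero matrix, so (x - 1)**2 annihilates A
```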

One last thing mentioned in class: matrices over the integers, or matrices with polynomial entries, can be put into something called Smith Normal form: Given A, there are integer (or polynomial) matrices P and Q so that det(P) = 1, det(Q) = 1, and
$$PAQ = \begin{pmatrix} c_1 & & & \\ & c_2 & & \\ & & \ddots & \\ & & & c_n \end{pmatrix},$$
a diagonal matrix, with $c_n$ divisible by $c_{n-1}$ divisible by $\cdots$ divisible by $c_1$. This is not quite true: the first several c's will be 0 if det(A) = 0. It is only after the c's become nonzero that they start dividing each other. The c's will be integers if A is an integer matrix, polynomials if A is a polynomial matrix. If det(A) ≠ 0, then the


product of the c's is det(A). Finally, if we do this for xI - A, whose determinant is $c_A(x)$, then the c's can be taken to be monic polynomials, and the largest of these, $c_n$, is the minimal polynomial. The proof for this rests on the fact that if B(x) is a matrix with polynomial entries, and we make a single row or column operation on B(x) to get B'(x), then the greatest common divisor of the entries in adj(B'(x)) is the same as for adj(B(x)). Once this is established, one may perform any number of row or column operations without changing the gcd of the entries of the adjoint. Finally, the adjoint of a diagonal matrix is extremely easy to figure out, and one gets that the gcd will be the product $c_1 \cdots c_{n-1}$, so the minimal polynomial is
$$\frac{c_A(x)}{c_1 \cdots c_{n-1}} = \frac{c_1 \cdots c_{n-1}\, c_n}{c_1 \cdots c_{n-1}} = c_n.$$
Let's verify that a single row operation does not change the gcd of the adjoint for one case when B is 3x3.

    I will use the cofactor matrix rather than the adjoint below (avoids a transpose). I

    will use C(B) for the cofactor matrix of B.

Suppose
$$B = \begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix},
\qquad\text{and}\qquad
B' = \begin{pmatrix} a_1 - yb_1 & a_2 - yb_2 & a_3 - yb_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix}.$$
Now let g(x) be

the gcd of the entries in C(B), and h(x) the gcd of the entries in C(B'). When we take cofactors, the top row will be unchanged. This means that g(x) and h(x) both divide the entries in the top row of either cofactor matrix. Let's look at the bottom row. This will be $(a_2b_3 - a_3b_2,\ a_3b_1 - a_1b_3,\ a_1b_2 - a_2b_1)$ for B and $((a_2 - yb_2)b_3 - (a_3 - yb_3)b_2,\ \text{something},\ \text{something})$ for B'. But
$$(a_2 - yb_2)b_3 - (a_3 - yb_3)b_2 = a_2b_3 - a_3b_2,$$
so the bottom rows are the same as well. This leaves just the second row to check. The second rows will be $(a_3c_2 - a_2c_3,\ a_1c_3 - a_3c_1,\ a_2c_1 - a_1c_2)$ for B, and
$$((a_3 - yb_3)c_2 - (a_2 - yb_2)c_3,\ (a_1 - yb_1)c_3 - (a_3 - yb_3)c_1,\ (a_2 - yb_2)c_1 - (a_1 - yb_1)c_2)$$
for B'. These are clearly different. Now g(x) divides each of the entries for B.


In particular, say, g(x) divides $a_1c_3 - a_3c_1$. But g(x) also divides $b_1c_3 - b_3c_1$, because this is (up to sign) the (1,2)-cofactor, an entry in C(B). Since g(x) divides both $a_1c_3 - a_3c_1$ and $b_1c_3 - b_3c_1$, it also divides $a_1c_3 - a_3c_1 - y(b_1c_3 - b_3c_1)$, the (2,2)-cofactor for B'. In a similar way, we see that g(x) divides all the cofactors of B', so it must divide the greatest common divisor of these. That is, h(x) is divisible by g(x). But row operations are reversible. Because of this, the same argument shows that g(x) is divisible by h(x). Since each divides the other, they are, up to scalar multiples, the same polynomial.
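As a spot-check of this invariance (my addition, assuming sympy; the parameter y and the particular row operation are chosen for illustration), the sketch below applies one row operation to xI - A for the 4x4 example and compares the gcds of the entries of the two cofactor matrices.

```python
# Sketch (not from the notes): one row operation on xI - A leaves the gcd of the
# entries of the cofactor matrix unchanged for the 4x4 example.
import sympy as sp
from functools import reduce

x, y = sp.symbols('x y')
A = sp.Matrix([[ 2,  1,  1,  1],
               [-1,  0, -1, -1],
               [ 1,  1,  2,  1],
               [-1, -1, -1,  0]])
B = x * sp.eye(4) - A

# B': replace row 1 by (row 1) - y*(row 2), keep the other rows.
Bp = sp.Matrix.vstack(B.row(0) - y * B.row(1), B[1:, :])

gcd_B = reduce(sp.gcd, list(B.cofactor_matrix()))
gcd_Bp = reduce(sp.gcd, list(Bp.cofactor_matrix()))
print(sp.factor(gcd_B), sp.factor(gcd_Bp))   # both (x - 1)**2
```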

Here is an example (same as before):
$$A = \begin{pmatrix} 2 & 1 & 1 & 1 \\ -1 & 0 & -1 & -1 \\ 1 & 1 & 2 & 1 \\ -1 & -1 & -1 & 0 \end{pmatrix},
\qquad
xI - A = \begin{pmatrix} x-2 & -1 & -1 & -1 \\ 1 & x & 1 & 1 \\ -1 & -1 & x-2 & -1 \\ 1 & 1 & 1 & x \end{pmatrix}.$$

We have:
$$\begin{pmatrix} x-2 & -1 & -1 & -1 \\ 1 & x & 1 & 1 \\ -1 & -1 & x-2 & -1 \\ 1 & 1 & 1 & x \end{pmatrix}
\xrightarrow{\text{row operations}}
\begin{pmatrix} x-1 & 0 & 0 & x-1 \\ 0 & x-1 & 0 & -x+1 \\ 0 & 0 & x-1 & x-1 \\ 1 & 1 & 1 & x \end{pmatrix}
\ \text{(to simplify)}$$
$$\xrightarrow{\text{row operations}}
\begin{pmatrix} 1 & 1 & 1 & x \\ 0 & x-1 & 0 & -x+1 \\ 0 & 0 & x-1 & x-1 \\ 0 & -x+1 & -x+1 & -x^2+2x-1 \end{pmatrix}
\xrightarrow{\text{column operations}}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & x-1 & 0 & -x+1 \\ 0 & 0 & x-1 & x-1 \\ 0 & -x+1 & -x+1 & -x^2+2x-1 \end{pmatrix}$$
$$\longrightarrow
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & x-1 & 0 & -x+1 \\ 0 & 0 & x-1 & x-1 \\ 0 & 0 & -x+1 & -x^2+x \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & x-1 & 0 & 0 \\ 0 & 0 & x-1 & x-1 \\ 0 & 0 & -x+1 & -x^2+x \end{pmatrix}$$
$$\longrightarrow
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & x-1 & 0 & 0 \\ 0 & 0 & x-1 & x-1 \\ 0 & 0 & 0 & -x^2+2x-1 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & x-1 & 0 & 0 \\ 0 & 0 & x-1 & 0 \\ 0 & 0 & 0 & (x-1)^2 \end{pmatrix},$$
so again, the minimal polynomial is $(x-1)^2$.
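The diagonal entries above can also be recovered without tracking the operations, using the standard fact (my addition, not stated in the notes) that $d_k$, the gcd of all $k \times k$ minors, is unchanged by row and column operations, and the k-th diagonal entry of the Smith form is $d_k/d_{k-1}$. The sketch below (assuming sympy; sympy also ships a smith_normal_form helper, but the direct gcd computation avoids assuming it handles polynomial domains) reproduces $1,\ x-1,\ x-1,\ (x-1)^2$.

```python
# Sketch (not from the notes): d_k = gcd of all k x k minors of xI - A, and the
# k-th diagonal entry of the Smith form is d_k / d_{k-1}.
import sympy as sp
from functools import reduce
from itertools import combinations

x = sp.symbols('x')
A = sp.Matrix([[ 2,  1,  1,  1],
               [-1,  0, -1, -1],
               [ 1,  1,  2,  1],
               [-1, -1, -1,  0]])
M = x * sp.eye(4) - A
n = 4

d = [sp.Integer(1)]                                  # d_0 = 1 by convention
for k in range(1, n + 1):
    minors = [M.extract(list(r), list(c)).det()
              for r in combinations(range(n), k)
              for c in combinations(range(n), k)]
    d.append(sp.factor(reduce(sp.gcd, minors)))

print([sp.factor(sp.cancel(d[k] / d[k - 1])) for k in range(1, n + 1)])
# [1, x - 1, x - 1, (x - 1)**2]  -> the last entry is the minimal polynomial
```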


Finally, for this set of notes, we look at the relationship between minimal polynomials and diagonalization of operators or matrices. First, a result on kernels of compositions of operators (or of products of matrices).

Lemma. If S and T are linear operators on V, then
$$\dim(\ker(ST)) \le \dim\ker(S) + \dim\ker(T).$$
Proof: $\ker(ST) = \{v \mid ST(v) = 0\} = \{v \mid T(v) = 0\} \cup \{v \mid T(v) \ne 0 \text{ but } ST(v) = 0\}$. Suppose that ker(T) has basis $\{u_1, u_2, \ldots, u_m\}$ and ker(ST) has basis $\{u_1, u_2, \ldots, u_m\} \cup \{v_1, v_2, \ldots, v_k\}$. Then $\dim(\ker(T)) = m$ and $\dim(\ker(ST)) = m + k$. Now $\{T(v_1), T(v_2), \ldots, T(v_k)\}$ is a linearly independent set (you should check that this is true; it is an important fact about linear transformations). How big can k be? Since $S(T(v_i)) = 0$, each $T(v_i)$ is in the kernel of S. Since a vector space can't contain more independent vectors than its dimension, $k \le \dim(\ker(S))$. Thus,
$$m + k \le m + \dim(\ker(S)) = \dim(\ker(T)) + \dim(\ker(S)).$$
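For a concrete instance of the lemma tied to the example above (my addition, assuming sympy), take S = T = A - I for the 4x4 matrix A from earlier: ker(ST) is all of V, while each factor contributes a 3-dimensional kernel.

```python
# Sketch (not from the notes): dim ker(ST) <= dim ker(S) + dim ker(T)
# with S = T = A - I for the 4x4 example; here 4 <= 3 + 3.
import sympy as sp

A = sp.Matrix([[ 2,  1,  1,  1],
               [-1,  0, -1, -1],
               [ 1,  1,  2,  1],
               [-1, -1, -1,  0]])
N = A - sp.eye(4)                       # S = T = A - I

dim_ker = lambda M: len(M.nullspace())  # kernel dimension = number of basis vectors
print(dim_ker(N * N), "<=", dim_ker(N) + dim_ker(N))   # 4 <= 6
```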

Theorem. Let T be a linear operator on a finite dimensional vector space V. Then T is diagonalizable if and only if its minimal polynomial $m_T(x)$ factors into distinct linear terms over F.

Proof: One direction was easy: If T is diagonalizable, then in some basis B for V, $[T]_B$ is a diagonal matrix. The minimal polynomial for T is the same as the minimal polynomial for $[T]_B$, and it is easy to check that the minimal polynomial for a diagonal matrix factors into distinct linear terms.


For the other direction, let $m_T(x) = (x - a_1)(x - a_2)\cdots(x - a_m)$. Then $0 = m_T(T)$ is the m-fold composition of the linear operators $T - a_1I, \ldots, T - a_mI$. Now $V = \ker(0)$. Consequently, by the lemma, we have
$$\dim V = \dim\ker(0) \le \dim\ker(T - a_1I) + \cdots + \dim\ker(T - a_mI) \le \dim V,$$
and this can only be true if
$$\dim V = \dim\ker(T - a_1I) + \cdots + \dim\ker(T - a_mI).$$
Since $\ker(T - a_iI)$ is the eigenspace of $a_i$, this implies that T is diagonalizable by a previous theorem.
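Tying the two halves of these notes together (my addition, assuming sympy): the 4x4 example has minimal polynomial $(x-1)^2$, which has a repeated root, so by the theorem it is not diagonalizable; by contrast, a diagonal matrix such as diag(1, 2, 2) has minimal polynomial $(x-1)(x-2)$ with distinct roots.

```python
# Sketch (not from the notes): the 4x4 example is not diagonalizable, consistent
# with its minimal polynomial (x - 1)**2 having a repeated root, while diag(1, 2, 2)
# is diagonalizable and is annihilated by the distinct-root polynomial (x-1)(x-2).
import sympy as sp

A = sp.Matrix([[ 2,  1,  1,  1],
               [-1,  0, -1, -1],
               [ 1,  1,  2,  1],
               [-1, -1, -1,  0]])
print(A.is_diagonalizable())                    # False

D = sp.diag(1, 2, 2)
print(D.is_diagonalizable())                    # True
print((D - sp.eye(3)) * (D - 2 * sp.eye(3)))    # the zero matrix
```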
