
Chapter 7

Powers of Matrices

7.1 Introduction

In chapter 5 we had seen that many natural (repetitive) processes such as the rabbit examples can be described by a discrete linear system

(1) ~u_{n+1} = A~u_n, n = 0, 1, 2, . . . ,

and that the solution of such a system has the form

(2) ~u_n = A^n ~u_0.

In addition, we had learned how to find an explicit expression for A^n, and hence for ~u_n, by using the constituent matrices of A.

In this chapter, however, we will be primarily interested in "long range predictions", i.e. we want to know the behaviour of ~u_n for n large (or as n → ∞). In view of (2), this is implied by an understanding of the limit matrix

A^∞ = lim_{n→∞} A^n,

provided that this limit exists. Thus we shall:

1) Establish a criterion to guarantee the existence of this limit (cf. Theorem 7.6);

2) Find a quick method to compute this limit when it exists (cf. Theorems 7.7 and 7.8).

We then apply these methods to analyze two application problems:

1) The shipping of commodities (cf. section 7.6)

2) Rat mazes (cf. section 7.7)

As we had seen in section 5.9, the latter give rise to Markov chains, which we shall study here in more detail (cf. section 7.7).


7.2 Powers of Numbers

As was mentioned in the introduction, we want to study here the behaviour of the powers A^n of a matrix A as n → ∞. However, before considering the general case, it is useful to first investigate the situation for 1 × 1 matrices, i.e. to study the behaviour of the powers a^n of a number a.

(a) Powers of Real Numbers

Let us first look at a few examples.

Example 7.1. Determine the behaviour of a^n for a = 2, −1/2, −1.

(a) a = 2

n   | 1  2  3  4   5   · · ·
a^n | 2  4  8  16  32  · · ·

The sequence diverges (the limit doesn't exist).

(b) a = −1/2

n   | 1     2    3     4     5      · · ·
a^n | −1/2  1/4  −1/8  1/16  −1/32  · · ·

lim_{n→∞} (−1/2)^n = 0; the sequence converges to 0.

(c) a = −1

n   | 1   2  3   4  5   · · ·
a^n | −1  1  −1  1  −1  · · ·

The limit doesn't exist (the sequence bounces back and forth between −1 and 1).

These three examples generalize to yield the following facts.

Theorem 7.1. Let a ∈ R.

(a) If |a| < 1, then lim_{n→∞} a^n = 0.

(b) If |a| > 1, then the sequence {a^n} diverges.

(c) If |a| = 1, then lim_{n→∞} a^n exists if and only if a = 1 (and then lim_{n→∞} a^n = 1).

(b) Powers of Complex Numbers

Next, we consider the behaviour of the powers c^n of a complex number c = a + bi. Here we first need to clarify the notion of the limit of a sequence {c_n}_{n≥1} of complex numbers.

Definition. If c_n = a_n + i b_n, n = 1, 2, . . . , is a sequence of complex numbers such that

a = lim_{n→∞} a_n and b = lim_{n→∞} b_n

exist, then we say that c = a + ib is the limit of {c_n}_{n≥1} and write

lim_{n→∞} c_n = a + ib.


Remarks. 1) If either lim_{n→∞} a_n or lim_{n→∞} b_n does not exist, then neither does lim_{n→∞} c_n.

2) The following rule gives an alternate definition of the limit of the sequence {c_n}_{n≥1} (and is sometimes useful for calculating the limit c):

lim_{n→∞} c_n = c ⇔ lim_{n→∞} |c_n − c| = 0,

where |c_n − c| denotes the complex absolute value.

[To see this equivalence, write c_n = a_n + i b_n and c = a + ib, so |c_n − c|^2 = (a_n − a)^2 + (b_n − b)^2. Then lim c_n = c ⇔ lim a_n = a and lim b_n = b ⇔ lim |a_n − a| = 0 and lim |b_n − b| = 0 ⇔ lim(a_n − a)^2 = 0 and lim(b_n − b)^2 = 0 ⇔ lim((a_n − a)^2 + (b_n − b)^2) = 0 ⇔ lim |c_n − c| = 0.]

3) We also observe that if c = lim_{n→∞} c_n exists, then we have

lim_{n→∞} |c_n| = |c|.

[For lim |c_n|^2 = lim(a_n^2 + b_n^2) = lim a_n^2 + lim b_n^2 = a^2 + b^2 = |c|^2, and hence lim |c_n| = |c|.]

Theorem 7.1 naturally extends to complex numbers in the following way.

Theorem 7.2. Let α ∈ C.

(a) If |α| < 1, then lim_{n→∞} α^n = 0. "The powers spiral in to 0."

(b) If |α| > 1, then lim_{n→∞} |α^n| = ∞, so the sequence {α^n} diverges. "The powers spiral out to ∞."

(c) If |α| = 1, then lim_{n→∞} α^n does not exist except if α = 1.

Proof. (a) If |α| < 1, then lim |α^n| = lim |α|^n = 0, and hence lim α^n = 0 by Remark 2) above (with c = 0). (Here we have used the multiplicative rule |α^n| = |α|^n of the absolute value.)

(b) If |α| > 1, then lim |α^n| = lim |α|^n = ∞, so the limit lim |α^n| does not exist, and hence neither does the limit lim α^n, by Remark 3) above.

(c) Suppose that β := lim α^n exists and that |α| = 1. Then |β| = lim |α^n| = lim |α|^n = 1, so β ≠ 0. Thus we have αβ = α lim α^n = lim α^(n+1) = β, so αβ = β and hence α = 1.

Corollary. lim_{n→∞} α^n exists if and only if α = 1 or |α| < 1. Moreover,

lim_{n→∞} α^n = 0 if |α| < 1.

Example 7.2. Analyze the behaviour of the powers of α = 5/6 + (5/12)i, β = 11/12 + (11/24)i and γ = cos(2π/100) + i sin(2π/100).

Solution. Since |α| = (5/12)√5 ≈ 0.931694990 < 1, the powers α^n spiral in to 0, as is shown in figure 7.1.

Since |β| = (11/24)√5 ≈ 1.024864490 > 1, the powers β^n spiral out to ∞, as is evident in figure 7.1.

Finally, since |γ| = 1, the powers γ^n run around on the unit circle (cf. figure 7.1).

[Figure 7.1 (plot omitted): Powers of the numbers α = (5/12)(2 + i), β = (11/24)(2 + i) and γ = cos(2π/100) + i sin(2π/100) in the complex plane. The powers α^n spiral in to 0, the powers β^n spiral out, and the powers γ^n run around the unit circle.]

Exercises 7.2.

1. Calculate the following limits whenever they exist:

(a) lim_{n→∞} (1 + (−1)^n i)/(1 + 2^n i);  (b) lim_{n→∞} (5 + (−1)^n i)/(1 + 2^10000 i);  (c) lim_{n→∞} ((−2)^n + i)/(1 + 2^n);

(d) lim_{n→∞} Σ_{k=0}^{n} ((1 + i)/2)^k;  (e) lim_{n→∞} Σ_{k=0}^{n} e^{ki(1+i)}.

2. (a) Find the first n for which |(3 + 4i)^n| > 10^7.

(b) Find the first n for which |(.3 + .4i)^n| < 10^(−7).

3. Write out a detailed proof of Theorem 7.1.


7.3 Sequences of powers of matrices

We now extend the results of the previous section about limits of powers of numbers to limits of powers of matrices. Before doing so, we first have to clarify the meaning of a limit for matrices.

Definition. Let A_1, A_2, A_3, . . . be a sequence of m × n matrices, say A_k = (a_ij^(k)). Then we say that

A = lim_{k→∞} A_k

if A = (a_ij) and lim_{k→∞} a_ij^(k) = a_ij, for all i, j with 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Example 7.3. (a) If A_k =
( 1/k      1        )
( 1/k^2   (k+1)/k   ),
then lim_{k→∞} A_k = ( 0 1 ; 0 1 ) because

lim_{k→∞} 1/k = 0 = lim_{k→∞} 1/k^2 and lim_{k→∞} (k+1)/k = 1.

(b) lim_{k→∞} ( (−1)^k  0 ; 0  1 ) does not exist!
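Since the limit of a matrix sequence is taken entry by entry, it can be checked numerically by evaluating A_k for a large k. Here is a minimal Python/NumPy sketch for Example 7.3(a), added in editing as an illustration only:

import numpy as np

# Entrywise limit of Example 7.3(a): A_k -> [[0, 1], [0, 1]] as k -> infinity.
def A(k):
    return np.array([[1/k,      1.0],
                     [1/k**2, (k + 1)/k]])

print(A(10**6))   # already very close to the limit matrix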

Theorem 7.3 (Rules for limits). Suppose that lim_{k→∞} A_k = A exists and let B, C be matrices such that the product C A_k B is defined. Then:

(a) lim_{k→∞} (A_k B) = AB,

(b) lim_{k→∞} (C A_k) = CA,

(c) lim_{k→∞} (C A_k B) = CAB.

Proof. Use the definition of matrix multiplication and properties of (usual) limits.

Definition. A square matrix A is called power convergent if lim_{k→∞} A^k exists.

Example 7.4. (a) If A = Diag(1/2, 1/3, 1/4), then

lim_{k→∞} A^k = lim_{k→∞} Diag((1/2)^k, (1/3)^k, (1/4)^k) = Diag(0, 0, 0) = 0,

so A is power convergent.

(b) A = ( 1 0 ; 0 −1 ) is not power convergent, for A^k = ( 1 0 ; 0 (−1)^k ), and we know that the sequence (−1)^k does not converge.


In view of Theorem 7.2, the above examples show the following general rule:

Theorem 7.4. A diagonal matrix A = Diag(λ_1, λ_2, . . . , λ_m) is power convergent if and only if for each i with 1 ≤ i ≤ m we have:

(3) either |λ_i| < 1 or λ_i = 1.

More generally, a diagonable matrix A is power convergent if and only if each eigenvalue λ_i of A satisfies (3).

Proof. Since A is diagonable, there is a matrix P such that D := P^(−1)AP is a diagonal matrix D = Diag(λ_1, . . . , λ_m). Thus, for every k ≥ 1 we have:

A^k = (PDP^(−1))^k = PD^kP^(−1), by Theorem 6.1.

Thus, by Theorem 7.3:

lim_{k→∞} A^k = lim_{k→∞} PD^kP^(−1) = P (lim_{k→∞} D^k) P^(−1) = P (lim_{k→∞} Diag(λ_1^k, . . . , λ_m^k)) P^(−1).

This means that A is power convergent ⇔ D is power convergent ⇔ (3) holds for all λ_i (the last step by Theorem 7.2).

Remark. Note that the above proof also shows the following more general fact: If A and B = P^(−1)AP are two similar matrices, then A is power convergent ⇔ B is power convergent. Moreover, if this is the case then we have

lim_{k→∞} A^k = P (lim_{k→∞} B^k) P^(−1).

When is a general matrix power convergent? Following the strategy introduced in section 6.6, we first analyze this question for Jordan blocks, then for Jordan matrices and finally for arbitrary matrices.

Theorem 7.5. A Jordan block J = J(λ,m) is power convergent if and only if

(4) either: |λ| < 1 or: λ = 1 and m = 1.

Example 7.5. a) If J = J(1/2, 2) = ( 1/2 1 ; 0 1/2 ), then

J^k = ( (1/2)^k  k(1/2)^(k−1) ; 0  (1/2)^k ) → ( 0 0 ; 0 0 ).

Thus, J is power convergent and converges to 0.

b) If J = J(1, 2) = ( 1 1 ; 0 1 ), then J^k = ( 1 k ; 0 1 ), so the sequence does not converge, i.e. J is not power convergent.
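The two cases of Example 7.5 can also be observed numerically. A short NumPy sketch (ours, for illustration only):

import numpy as np

# Powers of the Jordan blocks J(1/2, 2) and J(1, 2) from Example 7.5.
J_half = np.array([[0.5, 1.0],
                   [0.0, 0.5]])
J_one  = np.array([[1.0, 1.0],
                   [0.0, 1.0]])

print(np.linalg.matrix_power(J_half, 100))  # ~ 0: entries (1/2)^k and k(1/2)^(k-1) die out
print(np.linalg.matrix_power(J_one, 100))   # upper right entry equals k = 100: divergent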


Proof of Theorem 7.5. Recall from chapter 6, Theorem 6.5, that J(λ, m)^n is the upper triangular matrix with entries

a_ij^(n) = 0 if i > j, and a_ij^(n) = C(n, j−i) λ^(n−(j−i)) if i ≤ j,

where C(n, k) denotes the binomial coefficient. Thus the diagonal entries of J(λ, m)^n are λ^n, the entries just above the diagonal are nλ^(n−1), and the top right entry is C(n, m−1) λ^(n−m+1).

We now check the following four different cases:

Case 1: |λ| ≥ 1 (but λ ≠ 1). Since the sequence a_11^(n) = λ^n doesn't converge for n → ∞ (cf. Theorem 7.2), neither does J(λ, m)^n.

Case 2: λ = 1 and m > 1. Since the sequence a_12^(n) = nλ^(n−1) = n does not converge for n → ∞, neither does J(λ, m)^n. (Note that since m > 1, this entry is actually present.)

Case 3: λ = 1 and m = 1. The sequence of 1 × 1 matrices {(1)^n} clearly converges to the 1 × 1 matrix J^∞ = (1).

Case 4: |λ| < 1. In this case we shall show that {a_ij^(n)} converges for all i, j. In view of the above formula for a_ij^(n), this follows once we have shown:

Claim. |λ| < 1 ⇒ lim_{n→∞} C(n, a) λ^(n−a) = 0 for all a ∈ Z (a ≥ 0).

Proof of claim. Since C(n, a) = n(n−1) · · · (n−a+1)/a! ≤ n · n · . . . · n = n^a, we have

|C(n, a) λ^(n−a)| ≤ |n^a λ^(n−a)| = (1/|λ|^a) |n^a λ^n| = (1/|λ|^a) n^a e^(n log|λ|).

We now use the following fact from Analysis:

lim_{t→∞} t^a e^(−tb) = 0, if a, b > 0,

i.e. exponentials grow much faster than polynomials. This is applicable here because −b = log|λ| < 0 (since |λ| < 1), and so we see that

C(n, a) λ^(n−a) → 0, as n → ∞.

This, therefore, proves the claim. Putting these four cases together yields the assertion of Theorem 7.5.


For future reference, we note that in the course of the above proof we had also shown:

Corollary. If |λ| < 1, then lim_{n→∞} J(λ, m)^n = 0.

We now want to extend Theorem 7.5 to general matrices. To state the result in a convenient form, we first introduce the following two definitions.

Definition. An eigenvalue λ_i of a matrix A is called regular if its geometric multiplicity is the same as its algebraic multiplicity, i.e. if m_A(λ_i) = ν_A(λ_i).

Remark. Actually, this concept had already been introduced in section 6.2. Recall that we had seen there that an eigenvalue λ_i of a matrix A is regular if and only if all the Jordan blocks with eigenvalue λ_i of its associated Jordan canonical form J are of size 1 × 1; cf. Corollary 3 of Theorem 6.4.

In particular, A is diagonable if and only if all of its eigenvalues are regular (cf. Theorem 6.4, Corollary 2′).

Example 7.6. If A = ( 5 0 ; 0 5 ), then 5 is a regular eigenvalue: ν_A(5) = m_A(5) = 2;

if B = ( 5 1 ; 0 5 ), then 5 is not a regular eigenvalue: ν_B(5) = 1, m_B(5) = 2.

Definition. An eigenvalue λ_i of A is called dominant if we have

|λ_j| < |λ_i|, for every eigenvalue λ_j ≠ λ_i of A.

Example 7.7. a) If A = Diag(1, 2, 3, 4, 4), then 4 is a dominant eigenvalue of A.

b) If A = Diag(1, 2, 3, 4, −4), then there is no dominant eigenvalue because |−4| = |4|.

c) If A = Diag(1, i), there is no dominant eigenvalue because |i| = |1|.

We can now characterize power convergent matrices as follows.

Theorem 7.6. A square matrix A is power convergent if and only if

(5) either |λ_i| < 1 for all eigenvalues λ_i of A, or 1 is a regular, dominant eigenvalue of A.

Example 7.8. a) Determine whether A = ( 0 −1 ; 1 0 ) is power convergent.

Here ch_A(t) = t^2 + 1 = (t − i)(t + i), so λ_1 = i, λ_2 = −i. Thus A is not power convergent since |λ_1| = |λ_2| = 1. [In fact: A^2 = −I, A^4 = I, so A is clearly not power convergent.]

b) A =
( 1  0    0   )
( 0  1/2  1   )
( 0  0    1/2 )
is power convergent: since ch_A(t) = (t − 1)(t − 1/2)^2, we see that λ_1 = 1 is a dominant, regular eigenvalue (the latter because ν_A(λ_1) = m_A(λ_1) = 1).
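Condition (5) can also be tested numerically. The sketch below is an editorial addition: the function name and tolerance are our own choices, and a floating-point test of multiplicities is a heuristic, not a proof.

import numpy as np

# Heuristic numerical test of condition (5) in Theorem 7.6.
def is_power_convergent(A, tol=1e-9):
    lams = np.linalg.eigvals(A)
    if all(abs(l) < 1 - tol for l in lams):
        return True                      # case: |lambda_i| < 1 for all i
    # otherwise 1 must be a regular, dominant eigenvalue
    m_alg = sum(abs(l - 1) < tol for l in lams)                 # algebraic multiplicity of 1
    n = A.shape[0]
    m_geo = n - np.linalg.matrix_rank(A - np.eye(n), tol=tol)   # geometric multiplicity of 1
    dominant = all(abs(l) < 1 - tol for l in lams if abs(l - 1) >= tol)
    return m_alg >= 1 and m_alg == m_geo and dominant

A = np.array([[0.0, -1.0], [1.0, 0.0]])  # Example 7.8(a): eigenvalues +-i
print(is_power_convergent(A))            # False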

Proof of Theorem 7.6. Step 1. The theorem is true for Jordan blocks J(λ, m) by Theorem 7.5.


Step 2. It is true for Jordan matrices J = Diag(J_11, . . .) because it is true for each block by step 1; cf. Problem 4 of Exercises 7.3.

Step 3. If A is an arbitrary matrix, then by Jordan's theorem A = PJP^(−1), for some Jordan matrix J. Thus, by the remark following Theorem 7.4 we see that A is power convergent ⇔ J is power convergent ⇔ (5) holds for J ⇔ (5) holds for A (the second equivalence by step 2).

Exercises 7.3.

1. Which of the following matrices are power convergent? Justify your answer.

(1/2) ( 1 1 ; −1 3 );   (1/2) ( 0 1 ; −2 3 );   (1/3)
( 1  1  1 )
( 0  3  2 )
( 4 −2  1 );
(1/3)
( 5 −8  2 )
( 4 −7  2 )
( 0  0  1 ).

2. Determine which of the following matrices are power convergent.

A = (1/2)
( 0  0  1  1 )
( 0  0 −1  3 )
( 0  1  0  0 )
( −2 3  0  0 );
B =
( 0  0  0  1 )
( 0  0 −1  0 )
( 0  1  0  0 )
( 2  0  0  0 ).

3. (a) Is the matrix A = (1/2)
( 1  0  1  0 )
( 0  1  0  1 )
( 1  0  1  0 )
( 0  1  0  1 )
power convergent? Explain.

(b) Find all α ∈ C such that the matrix B =
( 0    1     0  )
( 0    0     1  )
( α^3 −3α^2  3α )
is power convergent. Justify your answer!

4. (a) Show that A = Diag(B, C) is power convergent if and only if both B and C are power convergent.

(b) Show that A = Diag(A_1, . . . , A_r) is power convergent if and only if each A_i, 1 ≤ i ≤ r, is power convergent.

5. Show that if A is a power convergent matrix then |det(A)| ≤ 1.

6. (a) If A is an integral 2 × 2 matrix which is power convergent, prove that its only eigenvalues are 0 and 1.

(b) Use part (a) to write down all integral 2 × 2 matrices which are power convergent.


7.4 Finding lim_{n→∞} A^n (when A is power convergent)

Having analyzed when A is power convergent, i.e. when the limit lim_{n→∞} A^n exists, we now want to determine this limit.

Naive method: find an explicit formula for A^n and use it to calculate the limit of each entry.

Better method: use the spectral decomposition theorem (without calculating all the E_ik's explicitly).

Example 7.9. If A = ( 0 1 ; −2 3 ), B = (1/3)A and C = (1/2)A, find lim_{n→∞} B^n and lim_{n→∞} C^n.

Solution. (a) Here ch_A(t) = (t − 1)(t − 2),

⇒ ch_B(t) = (t − 1/3)(t − 2/3)

⇒ B is power convergent and B^n = (1/3)^n E_10 + (2/3)^n E_20

⇒ lim_{n→∞} B^n = 0 since (1/3)^n, (2/3)^n → 0.

(b) As before, ch_A(t) = (t − 1)(t − 2),

⇒ ch_C(t) = (t − 1/2)(t − 1)

⇒ C is power convergent and C^n = (1/2)^n E_10 + 1^n E_20

⇒ lim_{n→∞} C^n = E_20.

Thus, to find the limit, it is enough to calculate E_20. Using the formula for the constituent (polynomials and) matrices (cf. (9) and (13)) we obtain

E_20 = (1/(λ_2 − λ_1))(C − λ_1 I) = 2C − I = A − I = ( −1 1 ; −2 2 ),

and hence lim_{n→∞} C^n = ( −1 1 ; −2 2 ).

Note that in the above example, we were able to calculate the limits without first calculating B^n and C^n in detail. Generalizing this idea leads to the following two theorems.

Theorem 7.7. If |λ_i| < 1 for every eigenvalue λ_i of A, then A is power convergent and lim_{n→∞} A^n = 0.

Proof. By the Corollary of Theorem 7.5, this is true for Jordan blocks. Using the same argument as in the proof of Theorem 7.6, the assertion follows for an arbitrary matrix.

Theorem 7.8. If λ_1 = 1 is a regular, dominant eigenvalue of A, then A is power convergent and we have

lim_{n→∞} A^n = E_10 ≠ 0,

where E_10 is the first constituent matrix of A associated to λ_1 = 1. Moreover, the other constituent matrices E_1k associated to λ_1 = 1 are equal to zero:

E_11 = · · · = E_1,m_1−1 = 0.


Corollary. lim_{n→∞} A^n = 0 ⇔ |λ_i| < 1 for all eigenvalues λ_i of A.

Proof. (⇐) Theorem 7.7.

(⇒) If lim_{n→∞} A^n = 0, then A is in particular power convergent. By Theorem 7.6, A satisfies either the hypothesis of Theorem 7.7 or that of Theorem 7.8. However, in the latter case we have lim A^n = E_10 ≠ 0, so we must be in the situation of Theorem 7.7.

It remains to prove Theorem 7.8. Before proving it, however, let us illustrate it with the following example.

Example 7.10. Find lim_{n→∞} A^n when A = (1/4)B and B =
( 3  0  0  1 )
( 1  3 −1 −1 )
( 1 −1  3 −1 )
( 1  0  0  3 ).

Solution. Expanding det(B − tI) along the first or last row yields ch_B(t) = (t − 4)^2 (t − 2)^2, and so ch_A(t) = (t − 1)^2 (t − 1/2)^2. Row reducing B − 4I yields

B − 4I =
( −1  0  0  1 )
(  1 −1 −1 −1 )
(  1 −1 −1 −1 )
(  1  0  0 −1 )
→
( 1  0  0 −1 )
( 0 −1 −1  0 )
( 0  0  0  0 )
( 0  0  0  0 ),

and so we see that ν_A(1) = ν_B(4) = 4 − 2 = 2 = m_A(1); thus, 1 is a regular eigenvalue. Moreover, 1 is also dominant (because |λ_2| = 1/2 < 1), so A is power convergent. Thus, by Theorem 7.8 we have lim_{n→∞} A^n = E_10 (and E_11 = 0).

To find the limit E_10, write down the spectral decomposition theorem:

f(A) = f(1)E_10 + f′(1)E_11 + f(1/2)E_20 + f′(1/2)E_21, for every f ∈ C[t],

where E_11 = 0. Choose f such that f(1/2) = f′(1/2) = 0, i.e. f(t) = (2t − 1)^2. Then:

(2A − I)^2 = f(1)E_10 + f′(1)E_11 + 0·E_20 + 0·E_21 = (2·1 − 1)^2 E_10 = E_10.

Thus, using 4A = B,

lim_{n→∞} A^n = E_10 = (2A − I)^2 = (1/4)(B − 2I)^2 = (1/4)
( 1  0  0  1 )^2
( 1  1 −1 −1 )
( 1 −1  1 −1 )
( 1  0  0  1 )
= (1/4)
( 2  0  0  2 )
( 0  2 −2  0 )
( 0 −2  2  0 )
( 2  0  0  2 ).

To prove Theorem 7.8, we first verify the following fact:

Theorem 7.9. If λ_1 is a regular eigenvalue of A, then its associated constituent matrices E_10, . . . , E_1,m_1−1 satisfy:

E_10 ≠ 0 and E_11 = . . . = E_1,m_1−1 = 0.


Proof. We follow the usual strategy by considering first (special) Jordan matrices, and then arbitrary matrices.

Case 1. A = J is a Jordan matrix with only one (regular) eigenvalue λ_1.

Then, since λ_1 is a regular eigenvalue, we have J = λ_1 I, and so for every f ∈ C[t]

f(J) = f(λ_1)I = f(λ_1)I + f′(λ_1)·0 + . . . + f^(m_1−1)(λ_1)·0.

On the other hand, by the spectral decomposition theorem we have

f(J) = f(λ_1)E_10 + f′(λ_1)E_11 + . . . + f^(m_1−1)(λ_1)E_1,m_1−1, for all f ∈ C[t],

and so it follows that

E_10 = I, E_11 = 0, . . . , E_1,m_1−1 = 0,

either by appealing to the uniqueness property of the spectral decomposition or, more directly, as follows.

Take f(t) = 1. Then f(λ_1) = 1, f′(λ_1) = · · · = f^(m_1−1)(λ_1) = 0, so I = f(λ_1)I = f(J) = 1·E_10 + 0·E_11 + · · · + 0·E_1,m_1−1, which means that E_10 = I.

Next, take f(t) = (t − λ_1)^k (k > 0). Then f^(k)(λ_1) = k! and f^(l)(λ_1) = 0 for l ≠ k, so k!E_1k = f(J) = f(λ_1)I = 0 and hence E_1k = 0.

Case 2. A = Diag(J_1, J_2), where J_1 = λ_1 I and λ_1 is not an eigenvalue of J_2.

Then E^A_{1k} = Diag(E^{J_1}_{1k}, 0) (because λ_1 is not an eigenvalue of J_2; cf. Theorem 7.11(b) below). But by case 1 we have E^{J_1}_{1k} = 0, for 1 ≤ k ≤ m_1 − 1, and so it also follows that E^A_{1k} = 0 for k ≥ 1. Moreover, E^A_{10} = Diag(I, 0) ≠ 0. Thus, the assertion is true in this case as well.

Case 3. A arbitrary.

By Jordan's theorem (and the fact that λ_1 is regular), there is a matrix P such that

J = P^(−1)AP = ( λ_1 I  0 ; 0  J_2 )

has the form of case 2. Thus, since A = PJP^(−1), we have for 1 ≤ k ≤ m_1 − 1 that

E^A_{1k} = P E^J_{1k} P^(−1) = P · 0 · P^(−1) = 0,

by case 2. Moreover, E^A_{10} = P E^J_{10} P^(−1) = P Diag(I_{m_1}, 0) P^(−1) ≠ 0, and so the theorem follows.

Corollary. If λ_i is a regular eigenvalue of A, then

E_i0 = (1/g_i(λ_i)) g_i(A), where g_i(t) = ch_A(t)/(t − λ_i)^(m_i).

Proof. Take f(t) = g_i(t) in the spectral decomposition formula. Since g_i has a zero of order m_j at each λ_j ≠ λ_i (so that g_i^(k)(λ_j) = 0 for k ≤ m_j − 1), and since E_ik = 0 for k ≥ 1 by Theorem 7.9, the formula reduces to g_i(A) = g_i(λ_i)E_i0. Thus, the assertion follows since g_i(λ_i) ≠ 0.


Proof of Theorem 7.8. Let λ_1 = 1, λ_2, . . . , λ_s denote the distinct eigenvalues of A and m_1, . . . , m_s their respective algebraic multiplicities. Then by the spectral decomposition theorem applied to f(t) = t^n we obtain

A^n = Σ_{i=1}^{s} Σ_{k=0}^{m_i−1} k! C(n, k) λ_i^(n−k) E_ik

because f^(k)(t) = n(n − 1) · · · (n − k + 1) t^(n−k) = k! C(n, k) t^(n−k). By hypothesis, λ_1 = 1 is a regular eigenvalue and so E_11 = . . . = E_1,m_1−1 = 0 by Theorem 7.9. Thus, since λ_1^n = 1 for all n, we obtain

A^n = E_10 + Σ_{i=2}^{s} Σ_{k=0}^{m_i−1} k! C(n, k) λ_i^(n−k) E_ik.

Now since λ_1 = 1 is a dominant eigenvalue, we have |λ_i| < 1 for all i > 1, and so by the claim in the proof of Theorem 7.5 the coefficients of the above sum tend to 0 as n → ∞. Thus, lim_{n→∞} A^n = E_10, and E_10 ≠ 0 by Theorem 7.9.

The above formula for E_i0 (cf. Theorem 7.9, Corollary) can be used to quickly compute the limit lim_{n→∞} A^n in many cases.

Example 7.11. Find lim_{n→∞} A^n when A = (1/4)
( 3   0  −1 )
( 0   4   0 )
( −1  0   3 ).

Solution. First note that since A is real and symmetric, it is diagonable by the Principal Axis Theorem 5.4, and so we could compute the limit by diagonalizing A. However, it is quicker to use Theorem 7.9 and the above corollary; the latter is applicable since all eigenvalues of a diagonable matrix are regular.

The characteristic polynomial of A is

ch_A(t) = (−1)^3 det(A − tI) = −(1/4^3) ((4 − 4t)(3 − 4t)^2 − (4 − 4t)) = (t − 1)^2 (t − 1/2).

Thus, 1 is a dominant, regular eigenvalue of A. Since g_1(t) = ch_A(t)/(t − 1)^2 = t − 1/2, it follows from Theorem 7.8 and the above corollary to Theorem 7.9 that

lim_{n→∞} A^n = E_10 = g_1(A)/g_1(1) = 2A − I = (1/2)
( 1   0  −1 )
( 0   2   0 )
( −1  0   1 ).

In the case that λ_i is a simple eigenvalue, there is a more direct formula for E_i0. Here, the term "simple" means:


Definition. An eigenvalue λ_i of A is called simple if its algebraic multiplicity is 1, i.e. if m_A(λ_i) = 1.

Note. Clearly, each simple eigenvalue is regular. [Indeed, if λ is a simple eigenvalue then 1 = m_A(λ) ≥ ν_A(λ) ≥ 1, so m_A(λ) = ν_A(λ), i.e. λ is regular.]

Before giving the formula for E_i0 when λ_i is simple, we first note:

Observation. A square matrix A and its transpose A^t have the same eigenvalues with the same algebraic and geometric multiplicities; i.e. we have

m_{A^t}(λ) = m_A(λ) and ν_{A^t}(λ) = ν_A(λ), for all λ ∈ C.

Indeed, since det(A^t) = det(A), we see that ch_{A^t}(t) = ch_A(t), and so the algebraic multiplicities are the same. Furthermore, ν_{A^t}(λ) = n − rank(A^t − λI) = n − rank((A − λI)^t) = n − rank(A − λI) = ν_A(λ), and so the geometric multiplicities are also the same.

In fact, one can also show that A and A^t are similar (see Problem 4 of the exercises), so they actually have the same Jordan canonical form.

Theorem 7.10. If A is power convergent and 1 is a simple eigenvalue of A, then

lim_{n→∞} A^n = E_10 = (1/(~u^t ~v)) ~u ~v^t,

where ~u^t ~v is a scalar, ~u ~v^t is a matrix, ~u ∈ E_A(1) is any non-zero 1-eigenvector of A, and ~v ∈ E_{A^t}(1) is any non-zero 1-eigenvector of A^t.

Proof. Let P be such that J = P^(−1)AP is a Jordan matrix, which we may take to be of the form J = Diag(J(1, 1), . . .). Then, since |λ_i| < 1 for i > 1, we see that

(6) lim_{n→∞} A^n = P (lim_{n→∞} J^n) P^(−1) = P Diag(1, 0, . . . , 0) P^(−1) = P ~e_1 ~e_1^t P^(−1),

where ~e_1 = (1, 0, . . . , 0)^t. Now since ~e_1 is a 1-eigenvector of J, it follows that ~x := P~e_1 (= the first column of P) is a 1-eigenvector of A because A~x = AP~e_1 = PJ~e_1 = P~e_1 = ~x.

Similarly, ~y^t := ~e_1^t P^(−1) (= the first row of P^(−1)) is (the transpose of) a 1-eigenvector of A^t. Indeed, since ~e_1^t J = ~e_1^t, we see that ~y^t A = ~e_1^t P^(−1) A = ~e_1^t J P^(−1) = ~e_1^t P^(−1) = ~y^t. Taking the transpose yields A^t ~y = ~y, which means that ~y is a 1-eigenvector of A^t. Thus by the above equation (6) we obtain

lim_{n→∞} A^n = P ~e_1 ~e_1^t P^(−1) = ~x ~y^t = (1/(~x^t ~y)) ~x ~y^t,

the latter because ~x^t ~y = ~y^t ~x = ~e_1^t P^(−1) P ~e_1 = ~e_1^t ~e_1 = 1. Thus, the theorem holds if we take ~u = ~x and ~v = ~y.


Now if we make another choice of ~u, then we must have ~u = k~x for some k ≠ 0 because the 1-eigenspace of A is one-dimensional. (Recall that the algebraic multiplicity m_A(1) = 1 by hypothesis, and so the corresponding geometric multiplicity is also ν_A(1) = 1.) Similarly, any other choice of ~v has the form ~v = c~y for some c ≠ 0.

We then have ~u^t ~v = (k~x^t)(c~y) = kc ~x^t ~y = kc and similarly ~u ~v^t = kc ~x ~y^t. Thus, (1/(~u^t ~v)) ~u ~v^t = ~x ~y^t = lim_{n→∞} A^n, as desired.

Example 7.12. Find lim_{n→∞} A^n when A = (1/2)
( 2   0  3 )
( 0   2  2 )
( −1  1  0 ).

Solution. Here ch_A(t) = (t − 1)(t − 1/2)^2, so 1 is a simple dominant eigenvalue, and hence A is power convergent. Moreover,

E_A(1) = {c(1, 1, 0)^t} because A − I = (1/2)
( 0   0   3 )
( 0   0   2 )
( −1  1  −2 ),

E_{A^t}(1) = {c(2, −3, 0)^t} because A^t − I = (1/2)
( 0  0  −1 )
( 0  0   1 )
( 3  2  −2 ),

so we can take ~u = (1, 1, 0)^t and ~v = (2, −3, 0)^t in Theorem 7.10.

Now ~u^t ~v = (1)(2) + (1)(−3) + (0)(0) = −1, and

~u ~v^t = (1, 1, 0)^t (2, −3, 0) =
( 2  −3  0 )
( 2  −3  0 )
( 0   0  0 ),

hence

lim_{n→∞} A^n = (1/(~u^t ~v)) ~u ~v^t = (1/(−1))
( 2  −3  0 )
( 2  −3  0 )
( 0   0  0 )
=
( −2  3  0 )
( −2  3  0 )
(  0  0  0 ).

Appendix: Further Properties of Constituent Matrices

In order to state these properties in a convenient manner, it is useful to introduce the following notation.

Notation. As usual, let A be an m × m matrix with distinct eigenvalues λ_1, . . . , λ_s of multiplicities m_1, . . . , m_s, respectively, so

ch_A(t) = (t − λ_1)^(m_1) · · · (t − λ_s)^(m_s),

and let E_ik be the constituent matrices of A. Note that the numbering of the E_ik's depends on the ordering of the eigenvalues, so it is useful to label the E_ik's in a way which is independent of the numbering.


For any λ ∈ C and any k ≥ 0 put

E^A_{λ,k} = E_ik if λ = λ_i and k ≤ m_i − 1, and E^A_{λ,k} = 0_{m×m} otherwise.

We then have:

Theorem 7.11. Let λ ∈ C and let k ≥ 0 be an integer.

(a) If B = P^(−1)AP, then E^B_{λ,k} = P^(−1) E^A_{λ,k} P.

(b) If A = Diag(A_1, A_2, . . . , A_r), then E^A_{λ,k} = Diag(E^{A_1}_{λ,k}, . . . , E^{A_r}_{λ,k}).

Proof. (a) By the spectral decomposition formula (Theorem 5.12) we have for every f ∈ C[t] that

f(A) = Σ_{i=1}^{s} Σ_{k=0}^{m_i−1} f^(k)(λ_i) E^A_{λ_i,k},

so by Theorem 5.2 we obtain

f(B) = P^(−1) f(A) P = Σ_{i=1}^{s} Σ_{k=0}^{m_i−1} f^(k)(λ_i) P^(−1) E^A_{λ_i,k} P.

Since B has the same eigenvalues and multiplicities as A, the spectral decomposition formula for B gives

f(B) = Σ_{i=1}^{s} Σ_{k=0}^{m_i−1} f^(k)(λ_i) E^B_{λ_i,k}.

By the uniqueness of the matrices appearing in the spectral decomposition formula, it thus follows that E^B_{λ_i,k} = P^(−1) E^A_{λ_i,k} P for 1 ≤ i ≤ s and 0 ≤ k ≤ m_i − 1. Thus, if λ = λ_i for some i, and if 0 ≤ k ≤ m_i − 1, then we see that the assertion holds. On the other hand, if these conditions do not hold, then E^A_{λ,k} = 0 = E^B_{λ,k}, so the assertion holds here as well.

(b) Let λ_1, . . . , λ_s be the eigenvalues of A = Diag(A_1, . . . , A_r). Since the eigenvalues of A_j, for 1 ≤ j ≤ r, are a subset of those of A, and since m_{A_j}(λ_i) ≤ m_A(λ_i), we can write the spectral decomposition formula for A_j in the form

f(A_j) = Σ_{i=1}^{s} Σ_{k=0}^{m_i−1} f^(k)(λ_i) E^{A_j}_{λ_i,k}

because the extra terms in this sum are all 0 by our conventions. Thus, by Theorem 5.10 we have for any f ∈ C[t] that

f(A) = Diag(f(A_1), . . . , f(A_r)) = Σ_{i=1}^{s} Σ_{k=0}^{m_i−1} f^(k)(λ_i) Diag(E^{A_1}_{λ_i,k}, . . . , E^{A_r}_{λ_i,k}).


Comparing this to the spectral decomposition formula of A, we conclude as in part (a) that E^A_{λ_i,k} = Diag(E^{A_1}_{λ_i,k}, . . . , E^{A_r}_{λ_i,k}), for 1 ≤ i ≤ s and 0 ≤ k ≤ m_i − 1. Thus, the assertion of part (b) holds when λ = λ_i for some i and k ≤ m_i − 1, and hence in general because in the other cases all the matrices are equal to 0.

Corollary 1. If A = Diag(A_1, . . . , A_r), then

E^A_{λ,k} = 0 for k ≥ max(m_{A_1}(λ), . . . , m_{A_r}(λ)).

Proof. By definition we have that E^{A_j}_{λ,k} = 0 for k ≥ m_{A_j}(λ), for all j = 1, . . . , r. Thus, the assertion follows directly from Theorem 7.11(b).

We can use the above Theorem 7.11 and its Corollary to relate the vanishing of the constituent matrices to the maximum size of the Jordan blocks of the Jordan canonical form J = J_A of A. For this, let λ ∈ C and put

t_A(λ) := max{k ≥ 0 : J(λ, k) appears as a Jordan block in J_A}.

Corollary 2. If A is an m × m matrix and if λ ∈ C, then

E^A_{λ,k} = 0 ⇔ k ≥ t_A(λ).

Proof. By Jordan's theorem, J = J_A = P^(−1)AP for some P, so by Theorem 7.11(a) we have E^A_{λ,k} = P E^J_{λ,k} P^(−1). In particular, E^A_{λ,k} = 0 ⇔ E^J_{λ,k} = 0.

If t_A(λ) = 0, i.e. if λ is not an eigenvalue of J = J_A, then the assertion is clear. Thus, assume that t_A(λ) ≥ 1, say λ = λ_1. Then J = Diag(J_11, . . . , J_1ν, J′), where J_1j = J(λ_1, k_1j) and J′ is a Jordan matrix with m_{J′}(λ) = 0. Now by Example 5.25 we have that

E^{J_1j}_{λ,k} = (1/k!) J(0, k_1j)^k, and E^{J_1j}_{λ,k} = 0 ⇔ k ≥ k_1j.

Thus, by Theorem 7.11(b) we have that E^J_{λ,k} = (1/k!) Diag(J(0, k_11)^k, . . . , J(0, k_1ν)^k, 0) = 0 if and only if k ≥ max(k_11, . . . , k_1ν) = t_J(λ) = t_A(λ), and so the assertion follows.

Remark. 1) Recall from Corollary 3 of Theorem 6.4 that λ is a regular eigenvalue of A if and only if t_A(λ) = 1. Thus, if m_A(λ) ≥ 1, then by Corollary 2 we have that

λ is a regular eigenvalue of A ⇔ E^A_{λ,k} = 0 for all k ≥ 1.

Note that this is a restatement of Theorem 7.9, so Corollary 2 generalizes this theorem.

2) The number t_A(λ) is connected with the generalized geometric multiplicities ν^p_A(λ) in the following way:

ν^k_A(λ) = ν^(k+1)_A(λ) ⇔ k ≥ t_A(λ).

Indeed, by definition of t_A(λ) we have that the Jordan canonical form of A has no Jordan blocks J(λ, k′) of size k′ ≥ k if and only if k ≥ t_A(λ) + 1, and by Theorem 6.5(c) we have that this is the case if and only if ν^k_A(λ) = ν^(k−1)_A(λ).


3) If we put

µ_A(t) := (t − λ_1)^(t_A(λ_1)) · · · (t − λ_s)^(t_A(λ_s)),

where λ_1, . . . , λ_s are the distinct eigenvalues of A, then µ_A(t) | ch_A(t) and we have that

µ_A(A) = 0

by the spectral decomposition formula and by Corollary 2. This, therefore, refines the Cayley-Hamilton Theorem. Note that µ_A(t) is the minimal polynomial of A of the Appendix of Chapter 6.

Exercises 7.4.

1. By using shortcuts, find the constituent matrices of the matrices

A =
( 1  0  −1    2   −1   )
( 0  1   0   −3    0   )
( 0  0  1/2   0    0   )
( 0  0   0   1/2   0   )
( 0  0   0    0  −1/3  )
and B = (1/2)
( 1  0  0  0 −1 )
( 1  2  0  0  1 )
( 1  0  2  0  1 )
( 1  0  0  2  1 )
( 0  0  0  0  1 ),

and use these to calculate lim_{n→∞} A^n and lim_{n→∞} B^n.

2. Find lim_{n→∞} A^n when

(a) A = (1/4)
( 1  1  0  0 )
( 0  1  0  0 )
( 2  1  1  2 )
( 2  1  3  2 )
and (b) A = (1/10)
( 5  2  2 )
( 3  4  6 )
( 2  4  2 ).

3. Let A be an n × n matrix which is power convergent with limit B = lim_{k→∞} A^k, and let m = m_A(1) ≥ 0 denote the algebraic multiplicity of 1. Show:

(a) ch_B(t) = (t − 1)^m t^(n−m);

(b) 0 and 1 are regular eigenvalues of B (whenever they occur).

[Hint: Look at the Jordan canonical form of A.]

4. (a) Let J = J(λ, 3). Show that J is similar to its transpose J^t. In other words, find P such that

P
( λ  1  0 )
( 0  λ  1 )
( 0  0  λ )
P^(−1) =
( λ  0  0 )
( 1  λ  0 )
( 0  1  λ ).

(b) As in (a), but for any Jordan block J = J(λ, m). Extend this to a general Jordan matrix.

(c) Conclude that for any square matrix A, its transpose A^t is similar to A.


7.5 The spectral radius and Gersgorin’s theorem

In much of our preceding work we were able to solve problems concerning a square matrix A provided we knew certain facts about its spectrum, i.e. about its set of eigenvalues. However, often only a crude estimate of where the eigenvalues lie is enough to enable us to solve the problem. For example, to determine that lim_{n→∞} A^n = 0 we need only know that |λ| < 1 for all eigenvalues λ of A.

Now it turns out that it is frequently easy to get such crude estimates by using a remarkable theorem due to Gersgorin.

To be able to state this theorem in a convenient form, we first introduce the following terminology.

Definition. Let A be a square matrix. Its spectral radius is the real number defined by

ρ(A) = max{|λ| : λ ∈ C is an eigenvalue of A}.

Example 7.13. Find the spectral radius of A = Diag(1, i, 1 + i).

Solution. The spectrum of A is {1, i, 1 + i}, so

ρ(A) = max(|1|, |i|, |1 + i|) = max(1, 1, √2) = √2.

With the help of the spectral radius we can restate some of our previous theorems such as Theorems 7.6, 7.7 and 7.8 in a more convenient form:

Theorem 7.6′. A is power convergent if and only if either ρ(A) < 1, or ρ(A) = 1 and 1 is a dominant, regular eigenvalue of A.

Theorem 7.8′. lim_{n→∞} A^n = 0 ⇔ ρ(A) < 1.

One of the interesting facts concerning the spectral radius is that we can estimate it without knowing the eigenvalues explicitly.

There are several such methods available. The first one is based on Gersgorin's theorem and relates the spectral radius ρ(A) of A to the number

||A|| = max_j Σ_{i=1}^{m} |a_ij|,

i.e. the maximum of the absolute column sums, which is often called the norm of the m × m matrix A = (a_ij). (Note that then ||A^t|| is the maximum of the absolute row sums of A.)

Remarks. 1) The norm || · || satisfies the following properties:

(0) ||A|| ≥ 0, and ||A|| = 0 ⇔ A = 0,

(1) ||A + B|| ≤ ||A|| + ||B||,

(2) ||cA|| = |c| ||A||, if c ∈ C,

(3) ||A · B|| ≤ ||A|| · ||B||.


To see this, note first that properties (0) and (2) are clear and that (1) follows immediately from the triangle inequality. Finally, to verify (3), write A = (a_ij), B = (b_ij) and C = AB = (c_ij). Then c_ij = Σ_{k=1}^{m} a_ik b_kj, so for each j

Σ_i |c_ij| = Σ_i |Σ_k a_ik b_kj| ≤ Σ_k (Σ_i |a_ik|) |b_kj| ≤ Σ_k ||A|| |b_kj| ≤ ||A|| · ||B||,

and taking the maximum over j yields (3).

2) Actually, any function || · || : M_m(C) → R satisfying the four properties (0)-(3) is called a (matrix) norm. There are many others; for example, ||A||_t := ||A^t|| also satisfies these properties, as does ||A||_2 := max(||~v_1||, . . . , ||~v_m||) if A = (~v_1| . . . |~v_m).

Some of these are implemented in MAPLE: ||A|| is given by the MAPLE command norm(A, 1), ||A^t|| by norm(A, infinity), and ||A||_2 by norm(A, 2).

As we shall see presently, the following estimate on the spectral radius follows easily from Gersgorin's theorem (which gives much more precise information about the location of the eigenvalues).

Theorem 7.12. If A is an m × m matrix, then

ρ(A) ≤ min(||A||, ||A^t||).

Example 7.14. Estimate the spectral radius of A =
( 0    1/3  1/4 )
( 1/2  0    1/4 )
( 1/2  1/3  0   ).

Solution. The sums of the absolute values of the columns are 1, 2/3, 1/2, respectively, so ||A|| = 1. On the other hand, the row sums are 7/12, 3/4, 5/6, so ||A^t|| = 5/6. Thus, by Theorem 7.12 we see

ρ(A) ≤ min(1, 5/6) = 5/6.

Note that this estimate already shows that A is power convergent (with limit 0).

To see how good this estimate is, let us compute the characteristic polynomial of A:

ch_A(t) = t^3 − (3/8)t − 1/12,

which doesn't have any rational roots. Its roots are all real; they are (approximately)

−0.420505805, −0.282066739, 0.702574789.

Thus, the spectral radius is

ρ(A) ≈ 0.702574789,

which is not too far away from the bound 5/6 = 0.83333 . . .
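The quantities of Example 7.14 take one line each in NumPy. This sketch is ours (it assumes the column-sum convention for ||A|| used above) and lets the bound 5/6 be compared directly with the true spectral radius:

import numpy as np

A = np.array([[0, 1/3, 1/4],
              [1/2, 0, 1/4],
              [1/2, 1/3, 0]])
col_norm = np.abs(A).sum(axis=0).max()   # ||A||   = 1   (maximal column sum)
row_norm = np.abs(A).sum(axis=1).max()   # ||A^t|| = 5/6 (maximal row sum)
rho = max(abs(np.linalg.eigvals(A)))     # ~ 0.70257
print(col_norm, row_norm, rho)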


We now turn to Gersgorin's theorem, for which we first require some additional notation (and terminology).

Notation. If A = (a_ij) ∈ M_m(C) is an m × m matrix, then for each k with 1 ≤ k ≤ m put

r_k = Σ_{j≠k} |a_kj|,

D_k = D_k(A) = {z ∈ C : |z − a_kk| ≤ r_k}.

Thus, r_k is the sum of the absolute values of the entries of the k-th row excluding the diagonal element, and D_k is the disc of radius r_k centered at the diagonal element a_kk ∈ C. These discs are called the Gersgorin discs.

Example 7.15. (a) A = ( 0 1 ; 2 2 ): a_11 = 0, r_1 = |1| = 1; a_22 = 2, r_2 = |2| = 2. (So D_1 is the disc of radius 1 about 0 and D_2 the disc of radius 2 about 2.)

(b) A = ( 0 1 ; 1 0 ): a_11 = a_22 = 0, r_1 = r_2 = 1, so D_1 = D_2 is the unit disc.

(c) A =
( 0  1 −1 )
( 0  0  1 )
( 1  0  1 ):
a_11 = 0, r_1 = |1| + |−1| = 2; a_22 = 0, r_2 = |0| + |1| = 1; a_33 = 1, r_3 = |1| + |0| = 1.

(d) A =
( i  0  i )
( 0 −i  1 )
( 1  1  0 ):
a_11 = i, r_1 = 1; a_22 = −i, r_2 = 1; a_33 = 0, r_3 = 2.

(e) A =
( 3i   i  −1     )
( −1   4   1     )
( 2   −1  −4 + i ):
a_11 = 3i, r_1 = |i| + |−1| = 2; a_22 = 4, r_2 = |−1| + |1| = 2; a_33 = −4 + i, r_3 = |2| + |−1| = 3.

[The sketches of the corresponding Gersgorin discs in the complex plane are omitted here.]

Theorem 7.13 (Gersgorin, 1931). Each eigenvalue λ of A lies in at least one Gersgorin disc D_k(A) of A.


Proof. Since λ is an eigenvalue of A, we have A~v = λ~v for some ~v = (v_1, . . . , v_m)^t ≠ ~0. Then, for each k, we have λv_k = (A~v)_k = Σ_{j=1}^{m} a_kj v_j = a_kk v_k + Σ_{j≠k} a_kj v_j, and so

(7) (λ − a_kk) v_k = Σ_{j≠k} a_kj v_j.

Now choose k such that

|v_k| = max_{1≤j≤m} |v_j|.

Then, applying the triangle inequality to (7) yields:

|λ − a_kk| · |v_k| = |Σ_{j≠k} a_kj v_j| ≤ Σ_{j≠k} |a_kj| |v_j| ≤ (Σ_{j≠k} |a_kj|) · |v_k| = r_k · |v_k|.

Since |v_k| ≠ 0 (for otherwise ~v = ~0, contradiction), we obtain |λ − a_kk| ≤ r_k, i.e. λ ∈ D_k(A).

Corollary. Suppose that the diagonal entries of the matrix A = (a_ij) are much larger than its off-diagonal entries in the sense that

(8) |a_kk| > Σ_{j≠k} |a_kj|, for k = 1, . . . , m.

Then A is invertible.

Proof. The hypothesis means that |0 − a_kk| > r_k, or that 0 ∉ D_k(A), for 1 ≤ k ≤ m. By Gersgorin's theorem, this means that 0 is not an eigenvalue of A, and hence A is invertible.

Example 7.16. If A is as in Example 7.15(e), then the centers and radii computed there show that 0 does not lie in any Gersgorin disc, so A is invertible.
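The disc data of Example 7.15(e) and the invertibility test of the corollary can be computed as follows (a NumPy sketch added in editing; the variable names are ours):

import numpy as np

A = np.array([[3j, 1j, -1],
              [-1, 4, 1],
              [2, -1, -4 + 1j]])
centers = np.diag(A)                             # the diagonal entries a_kk
radii = np.abs(A).sum(axis=1) - np.abs(centers)  # the r_k (row sums minus |a_kk|)
for c, r in zip(centers, radii):
    print("disc: center", c, "radius", r)
# Here |a_kk| > r_k for every k, so 0 lies in no disc and A is invertible.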

Proof of Theorem 7.12. We first show:

(9) ρ(A) ≤ ||A^t|| = max_k Σ_{j=1}^{m} |a_kj|

(recall that ||A^t|| is the maximum of the absolute row sums of A). For this, let λ be an eigenvalue of A of maximal absolute value, i.e. |λ| = ρ(A). Now by Gersgorin's theorem, λ ∈ D_k(A) for some k, i.e.

|λ − a_kk| ≤ Σ_{j≠k} |a_kj|, or |λ − a_kk| + |a_kk| ≤ Σ_{j=1}^{m} |a_kj| ≤ ||A^t||.

On the other hand, since ρ(A) = |λ| = |(λ − a_kk) + a_kk| ≤ |λ − a_kk| + |a_kk| by the triangle inequality, it follows from the above inequality that ρ(A) ≤ |λ − a_kk| + |a_kk| ≤ ||A^t||, which proves (9).

Next we apply (9) to A^t to get ρ(A^t) ≤ ||(A^t)^t|| = ||A||. But since A and A^t have the same eigenvalues, ρ(A) = ρ(A^t) ≤ ||A||. Thus we have ρ(A) ≤ ||A|| and ρ(A) ≤ ||A^t||, and so Theorem 7.12 follows.

For real matrices, there are other methods of estimating the spectral radius. One of these is based on the trace of the matrix.

Definition. If A = (a_ij) is an m × m matrix, then its trace is defined as the sum of its diagonal elements:

tr(A) = a_11 + a_22 + · · · + a_mm.

Example 7.17. a) tr ( 1 2 ; 3 4 ) = 1 + 4 = 5.

b) tr
( 1 −2  3 )
( 4 −6  6 )
( 7 −8  9 )
= 1 − 6 + 9 = 4.

Properties of the trace. If A, B and P are m × m matrices and P is invertible, then

(1) tr(AB) = tr(BA),

(2) tr(P^(−1)AP) = tr(A),

(3) tr(A) = m_A(λ_1)λ_1 + · · · + m_A(λ_s)λ_s = sum of the eigenvalues of A, counted with their multiplicities.

[Indeed, to verify (1), write A = (a_ij) and B = (b_ij). Then

tr(AB) = Σ_{i=1}^{m} Σ_{k=1}^{m} a_ik b_ki = Σ_{k=1}^{m} Σ_{i=1}^{m} b_ki a_ik = tr(BA),

which is (1). From this (2) follows because

tr(P^(−1)AP) = tr(P^(−1) · (AP)) = tr((AP) · P^(−1)) = tr(A), by (1).

Finally, (3) is immediate for Jordan matrices J, since the diagonal entries of J are the eigenvalues, each repeated according to its algebraic multiplicity. If A is an arbitrary matrix, then by Jordan's theorem P^(−1)AP = J is a Jordan matrix for some P, and thus tr(A) = tr(P^(−1)AP) = tr(J) by (2).]


Theorem 7.14. If A is a real m × m matrix, then

(10) ρ(A) ≤ √(ρ(A^t A)) ≤ √(tr(A^t A)).

Remark. Even though √(ρ(A^t A)) is a better estimate of ρ(A) than √(tr(A^t A)), the price we pay for this improvement is often not worth the effort, for ρ(A^t A) is in general as hard (or harder) to compute as ρ(A). (For example, if A is a symmetric matrix (A^t = A), then ρ(A^t A) = ρ(A^2) = ρ(A)^2, so the first inequality in (10) is an equality in this case and hence is vacuous.)

However, the combination of this inequality with other methods frequently leads to better results, as the following example shows.

Example 7.18. Estimate ρ(A) when A =
( 1  −3  4   0 )
( 2   1  2   1 )
( 1  −1  1  −1 )
( −1  2  1  −1 ).

Solution. We try in turn the various methods at our disposal.

Method 1: use Theorem 7.12. Here ||A|| = max(5, 7, 8, 3) = 8 and ||A^t|| = max(8, 6, 4, 5) = 8, so by Theorem 7.12 we obtain ρ(A) ≤ min(8, 8) = 8.

Method 2: use Theorem 7.14. Since the diagonal entries of A^t A are 7, 15, 22, 3, we see that tr(A^t A) = 47, and hence by Theorem 7.14 we obtain ρ(A) ≤ √47 < 6.86.

Method 3: combine Theorems 7.12 and 7.14. We have

A^t A =
( 7  −4   8  2 )
( −4  15 −9  0 )
( 8  −9  22  0 )
( 2   0   0  3 ).

Applying Method 1 to A^t A yields ρ(A^t A) ≤ ||A^t A|| = max(21, 28, 39, 5) = 39, and hence by the first inequality of (10) we obtain ρ(A) ≤ √(ρ(A^t A)) ≤ √39 < 6.25, which is the best estimate.

[However, none of these three estimates is particularly good, for

ch_A(t) = t^4 − 2t^3 + 3t^2 + 16t + 45,

which has (by MAPLE) the approximate roots 2.36496 ± 2.57701i, −1.36496 ± 1.34728i, and so ρ(A) ≈ |2.36496 + 2.57701i| ≈ 3.49772, which is almost half of the above estimate!]


The proof of Theorem 7.14 is based on the fact that A^t A is a positive semi-definite matrix, so we first review some properties of such matrices.

Insert: Properties of positive semi-definite matrices

Let A be a real symmetric m × m matrix. Then A is said to be positive semi-definite (notation: A ≥ 0) if

~v^t A ~v ≥ 0, for all ~v ∈ R^m.

(Note that ~v^t A ~v is a 1 × 1 matrix, i.e. a number.)

1) A ∈ M_m(R) ⇒ A^t A ≥ 0.

[Clearly, A^t A is real symmetric, for (A^t A)^t = A^t (A^t)^t = A^t A. Furthermore, ~v^t (A^t A) ~v = (~v^t A^t) A ~v = (A~v)^t A~v = ||A~v||^2 ≥ 0, so A^t A is positive semi-definite.]

2) A ≥ 0 ⇒ A is diagonable and has real non-negative eigenvalues λ_i ≥ 0.

[Indeed, since A^t = A, the Principal Axis Theorem (cf. Chapter 5) tells us that A is diagonable and that the eigenvalues λ_i are real. Now if ~v_i is an associated eigenvector, then λ_i ||~v_i||^2 = λ_i ~v_i^t ~v_i = ~v_i^t (A ~v_i) ≥ 0, so λ_i ≥ 0 since ||~v_i||^2 > 0.]

3) A ≥ 0 ⇔ A^t = A and λ_i ≥ 0 for every eigenvalue λ_i of A.

[Clearly, if A ≥ 0 then A^t = A by definition and λ_i ≥ 0 by property 2). Moreover, the converse is clear if A = Diag(λ_1, . . . , λ_m) is diagonal. In the general case, A = P^t D P for some orthogonal matrix P (so P^(−1) = P^t) and diagonal D with non-negative entries, and then ~v^t A ~v = ~v^t P^t D P ~v = (P~v)^t D (P~v) ≥ 0.]

4) A ≥ 0 ⇒ ρ(A) ≤ tr(A).

[Indeed, let λ_1 ≥ λ_2 ≥ · · · ≥ λ_m ≥ 0 be the eigenvalues of A, listed according to their multiplicities. Then ρ(A) = λ_1 ≤ λ_1 + λ_2 + · · · + λ_m = tr(A).]

In addition to these elementary properties, we also require the following more subtle fact.

Lemma. Let A ∈ M_m(R) be a real m × m matrix and let λ ∈ C be an eigenvalue of A. If 0 ≤ µ_1 ≤ · · · ≤ µ_m denote the eigenvalues of A^t A (counted with multiplicities), then there are non-negative real numbers θ_1, . . . , θ_m ≥ 0 such that

(11) |λ|^2 = Σ_{i=1}^{m} θ_i µ_i and Σ_{i=1}^{m} θ_i = 1.

Proof. Note that even though A is real, some of its eigenvalues and eigenvectors may be non-real. We thus introduce the complex conjugate ~v̄ and complex norm ||~v|| of a vector ~v = (v_1, . . . , v_m)^t ∈ C^m:

~v̄ = (v̄_1, . . . , v̄_m)^t,

||~v|| = √(~v̄ · ~v) = √(|v_1|^2 + · · · + |v_m|^2).


This norm has the familiar properties:

(12) ||~v|| ≥ 0, and ||~v|| = 0 ⇔ ~v = ~0;

(13) ||α~v|| = |α| · ||~v||.

We apply this to an eigenvector ~v ≠ ~0 associated to our given eigenvalue λ. (Thus ~v ∈ E_A(λ).) Since A is real, we have A~v̄ = conj(A~v), and so

~v̄^t A^t A ~v = conj(A~v)^t (A~v) = ||A~v||^2 = ||λ~v||^2 = |λ|^2 ||~v||^2, by (13).

Now we apply the Principal Axis Theorem to the symmetric matrix A^t A to find an orthogonal matrix P ∈ M_m(R) such that

P^(−1)(A^t A)P = Diag(µ_1, µ_2, . . . , µ_m) = D, or A^t A = PDP^(−1) = PDP^t.

Put ~w = P^t ~v. Then, since P is real, ~w̄ = conj(P^t ~v) = P^t ~v̄, and so

~w̄^t D ~w = (P^t ~v̄)^t D P^t ~v = ~v̄^t P D P^t ~v = ~v̄^t (A^t A) ~v = |λ|^2 ||~v||^2

by the above. On the other hand, since D is a diagonal matrix,

~w̄^t D ~w = Σ_{i=1}^{m} µ_i w̄_i w_i = Σ_{i=1}^{m} µ_i |w_i|^2, if ~w = (w_1, . . . , w_m)^t.

Thus, combining this with the previous equation yields

Σ_{i=1}^{m} µ_i |w_i|^2 = |λ|^2 ||~v||^2,

and so we see that the first equation of (11) holds with

θ_i = |w_i|^2 / ||~v||^2 ≥ 0.

Moreover, the second equation of (11) also holds because

||~v||^2 Σ_{i=1}^{m} θ_i = Σ_{i=1}^{m} |w_i|^2 = ||~w||^2 = ||P^t ~v||^2 = (P^t ~v̄)^t (P^t ~v) = ~v̄^t P P^t ~v = ||~v||^2,

and so, dividing through by ||~v||^2, we obtain that Σ_{i=1}^{m} θ_i = 1.


Proof of Theorem 7.14. Since A^t A is positive semi-definite (cf. property 1) of the above insert), we have ρ(A^t A) ≤ tr(A^t A) by property 4). This proves the second inequality of (10).

To prove the first inequality, let λ ∈ C be an eigenvalue of A such that |λ| = ρ(A), and put M = max{µ_i : 1 ≤ i ≤ m}, where the µ_i's are the eigenvalues of A^t A. Thus, since A^t A is positive semi-definite, M = ρ(A^t A). Then by the Lemma we have

|λ|^2 = Σ_{i=1}^{m} µ_i θ_i ≤ Σ_{i=1}^{m} M θ_i = M Σ_{i=1}^{m} θ_i = M = ρ(A^t A),

and so ρ(A)^2 = |λ|^2 ≤ ρ(A^t A). From this the first inequality of (10) follows by taking square roots.

Exercises 7.5.

1. (a) Find the Gersgorin discs of the following matrices.

A = ( 0 1 ; −1 i );   B =
( 0 −1  1 )
( 1  2  1 )
( 1  0 −1 );
C =
( 2i  1  1 )
( 0   2  0 )
( 1   0 −1 ).

(b) What can you say about the approximate location of the eigenvalues?

(c) Compute the eigenvalues and compare your answer to that obtained in (b).

2. Suppose A is a 3 × 3 matrix with spectrum {1 + i, 1 − i, 1}.

(a) What is the spectral radius of A?

(b) What is the spectral radius of B = A^2 − 2A + 2I?

(c) Is B power convergent?

3. Let A be an m × m matrix with characteristic polynomial

ch_A(t) = t^m + a_(m−1) t^(m−1) + . . . + a_1 t + a_0.

Show that tr(A) = −a_(m−1) and det(A) = (−1)^m a_0.

4. (a) Show that for all m × m matrices A, B, C we have

tr(ABC) = tr(BCA) = tr(CAB).

(b) Find three 2 × 2 matrices A, B, C such that tr(ABC) ≠ tr(ACB).


7.6 Geometric Series

In the previous sections we learned how to determine whether a given matrix A is power convergent and also how to find the limit lim_{n→∞} A^n. We now apply these methods to the following problem arising in economics.

Problem: Four regions (or cities) R_1, R_2, R_3, and R_4 ship non-renewable commodities (such as antique paintings, rental cars) among themselves. Each week each region ships a fixed percentage of the goods it currently has to the other regions, as follows:

R_1 ships 70% of its goods to R_2, 10% to R_3, and 10% to R_4;
R_2 ships 30% of its goods to R_1 and 10% to R_4.

Today's distribution:

R_1 has $300,000 worth of goods,
R_2 has $200,000 worth of goods,
R_3 has $100,000 worth of goods,
R_4 has $100,000 worth of goods.

Questions: 1) What is the distribution of goods after n weeks?

2) What happens in the long run?

Analysis: Let v_1n = the amount (in units of $100,000) of goods present in R_1 after n weeks, v_2n = the amount of goods in R_2 after n weeks, etc.

Thus, ~v_n = (v_1n, v_2n, v_3n, v_4n)^t represents the distribution of goods after n weeks, and today's distribution is represented by the vector ~v_0 = (3, 2, 1, 1)^t. We now determine how the distribution of the goods changes from week n to week n + 1:

R_1 retains 10% of its goods (because it sends 70 + 10 + 10 = 90% away) and receives 30% of those of R_2;
R_2 retains 60% of its goods (because it sends 30 + 10 = 40% away) and receives 70% of those of R_1;
R_3 retains all of its goods and receives 10% of those of R_1;
R_4 retains all of its goods and receives 10% of those of R_1 and 10% of those of R_2.


We thus have the equations:

v_{1,n+1} = (1/10)v_{1,n} + (3/10)v_{2,n}
v_{2,n+1} = (7/10)v_{1,n} + (6/10)v_{2,n}
v_{3,n+1} = (1/10)v_{1,n} + v_{3,n}
v_{4,n+1} = (1/10)v_{1,n} + (1/10)v_{2,n} + v_{4,n}

This means that we have the discrete linear system

~v_{n+1} = A~v_n

in which

A = (1/10)
( 1  3  0   0  )
( 7  6  0   0  )
( 1  0  10  0  )
( 1  1  0   10 )
and ~v_0 = (3, 2, 1, 1)^t.

From chapter 5 (discrete linear systems) we therefore know that

~v_n = A^n ~v_0.

How can we compute A^n efficiently? We could of course apply our previous method (using constituent matrices) to the matrix A, but since A has the special shape

A = ( T 0 ; B I ), where T = (1/10) ( 1 3 ; 7 6 ) and B = (1/10) ( 1 0 ; 1 1 ),

it is better to use the following rule.

it is better to use the following rule.

Observation (Rule for multiplying partitioned matrices). Given the partitioned matricesas indicated, we have

m1{

n1︷︸︸︷A11

n2︷︸︸︷A12 . . .

ns︷︸︸︷A1s

m2{ A21 . . . . . . A2s

.... . . . . .

...mr{ Ar1 . . . . . . Ars

k1︷︸︸︷B11

k2︷︸︸︷B12 . . .

kt︷︸︸︷B1t }n1

B21 . . . . . . B2t }n2

.... . . . . .

...Bs1 . . . . . . Bst }ns

=

k1︷︸︸︷C11

k2︷︸︸︷C12 . . .

kt︷︸︸︷C1t }m1

C21 . . . . . . C2t }m2

.... . . . . .

...Cr1 . . . . . . Crt }mr

where Cij = Ai1B1j + Ai2B2j + · · ·+ AisBsj.

Notes: 1) As indicated, Aij is an mi × nj matrix and Bij an ni × kj matrix.

2) This generalizes the rules we learned for multiplying block matrices.


Applying this rule to our situation with A = ( T 0 ; B I ) yields:

A^2 = ( T 0 ; B I )( T 0 ; B I ) = ( T^2 + 0·B  T·0 + 0·I ; B·T + I·B  B·0 + I^2 ) = ( T^2  0 ; B(T + I)  I ).

Similarly,

A^3 = A^2 · A = ( T^2  0 ; B(T + I)  I )( T 0 ; B I ) = ( T^3  0 ; B(T + I)T + B  I ) = ( T^3  0 ; B(T^2 + T + I)  I ).

Thus, continuing in this manner leads to the formula

(14) A^n = ( T^n  0 ; B(T^(n−1) + T^(n−2) + · · · + T + I)  I ).

This formula therefore shows that in order to find A^n, we only have to do computations involving 2 × 2 (rather than 4 × 4) matrices! (A sketch verifying (14) numerically for small n is given below.)
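As announced, formula (14) is easy to confirm numerically for small n. The following sketch is our illustration (np.block assembles a matrix from blocks); it compares A^3 with the right-hand side of (14):

import numpy as np

T = np.array([[1, 3], [7, 6]]) / 10
B = np.array([[1, 0], [1, 1]]) / 10
A = np.block([[T, np.zeros((2, 2))],
              [B, np.eye(2)]])

n = 3
S = sum(np.linalg.matrix_power(T, k) for k in range(n))   # I + T + T^2
lhs = np.linalg.matrix_power(A, n)
rhs = np.block([[np.linalg.matrix_power(T, n), np.zeros((2, 2))],
                [B @ S, np.eye(2)]])
print(np.allclose(lhs, rhs))   # True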

However, while we know how to compute T^n efficiently, how do we compute the sum

T^(n−1) + · · · + T + I?

Recall (from high school): Geometric series. If a ≠ 1 is any number, then

1 + a + a^2 + · · · + a^(n−1) = (1 − a^n)/(1 − a),

and hence, if |a| < 1, then the infinite series (called a geometric series)

Σ_{n=0}^{∞} a^n := lim_{n→∞} (1 + a + · · · + a^(n−1)) = 1/(1 − a).

(Recall from analysis that an (infinite) series is the limit of a sequence of partial sums.)

This idea works for matrices as well, as we shall now see.

Definition. Let T be a square matrix. Then the sequence {S_n}_{n≥1} defined by

S_n = I + T + . . . + T^(n−1)

is called the geometric series generated by T. The series converges if the sequence {S_n}_{n≥1} converges; we then write

Σ_{n=0}^{∞} T^n = lim_{n→∞} S_n.


Theorem 7.15. The geometric series generated by T converges if and only if

(15) ρ(T) < 1, i.e. |λ_i| < 1 for each eigenvalue λ_i of T.

If this condition holds, then I − T is invertible and we have

(16) S_n := Σ_{k=0}^{n−1} T^k = (I − T)^(−1)(I − T^n),

and hence the series converges to

(17) Σ_{k=0}^{∞} T^k = (I − T)^(−1).

Remark. Note that by the Corollary of Theorem 7.8 the above condition (15) is equivalent to:

(18) lim_{n→∞} T^n = 0.

Proof of Theorem 7.15. (⇒) Suppose first that the sequence {S_n} converges, and put S = lim_{n→∞} S_n. Then, since

S_{n+1} − S_n = (T^n + T^(n−1) + · · · + I) − (T^(n−1) + · · · + I) = T^n,

it follows that lim_{n→∞} T^n = lim_{n→∞} (S_{n+1} − S_n) = lim_{n→∞} S_{n+1} − lim_{n→∞} S_n = S − S = 0. Thus, T is power convergent with limit 0, so by the Corollary of Theorem 7.8 we see that (15) holds.

(⇐) Now suppose that (15) holds. Then the eigenvalues of I − T are 1 − λ_i ≠ 0, so I − T is invertible. Furthermore,

T · S_n = T(I + T + · · · + T^(n−1)) = T + T^2 + · · · + T^n = (I + T + · · · + T^(n−1)) + T^n − I = S_n + T^n − I.

Thus, T·S_n − S_n = T^n − I, i.e. (I − T)S_n = I − T^n. Since I − T is invertible, formula (16) follows.

Finally, from (15) and Theorem 7.7 we have that lim_{n→∞} T^n = 0, so

lim_{n→∞} S_n = (I − T)^(−1)(I − lim_{n→∞} T^n) = (I − T)^(−1) · I,

which proves (17).


Example 7.19. Find the geometric series Σ_{n=0}^{∞} T^n when T = (1/10) ( 1 3 ; 7 6 ).

Solution. We first verify that T satisfies the hypothesis (15).

Method 1: By Theorem 7.12 we have ρ(T) ≤ ||T|| = max(8/10, 9/10) = 9/10 < 1, so T satisfies (15).

Method 2: Put T′ = 10T = ( 1 3 ; 7 6 ). Then ch_{T′}(t) = (1 − t)(6 − t) − 21 = t^2 − 7t − 15 = (t − (7 + √109)/2)(t − (7 − √109)/2), so ch_T(t) = (t − λ_1)(t − λ_2) with λ_1 = (7 + √109)/20 and λ_2 = (7 − √109)/20.

Since 0 < (7 + √109)/20 < (7 + √121)/20 = 18/20 < 1, and 0 > (7 − √109)/20 > (7 − √100)/20 = −3/20 > −1,

we see that ρ(T) = max(|λ_1|, |λ_2|) = (7 + √109)/20 < 9/10 < 1, so T satisfies (15).

Thus, by Theorem 7.15, the geometric series generated by T converges and

Σ_{n=0}^{∞} T^n = (I − T)^(−1) = ((1/10) ( 9 −3 ; −7 4 ))^(−1) = 10 ( 9 −3 ; −7 4 )^(−1) = (10/(36 − 21)) ( 4 3 ; 7 9 ) = (2/3) ( 4 3 ; 7 9 ).

Example 7.20. Application to the "shipping of commodities" problem:

Find lim_{n→∞} A^n when A = ( T 0 ; B I ), with T as in Example 7.19 and B = (1/10) ( 1 0 ; 1 1 ).

Solution. By the previous discussion we have, using (14) and Theorem 7.15:

A^n = ( T^n  0 ; B(I + · · · + T^(n−1))  I ), so lim_{n→∞} A^n = ( 0  0 ; B(I − T)^(−1)  I ).

Now by Example 7.19,

(I − T)^(−1) = (2/3) ( 4 3 ; 7 9 ), so B(I − T)^(−1) = (1/10) ( 1 0 ; 1 1 ) · (2/3) ( 4 3 ; 7 9 ) = (1/15) ( 4 3 ; 11 12 ),

and hence

lim_{n→∞} A^n = (1/15)
( 0   0   0   0  )
( 0   0   0   0  )
( 4   3   15  0  )
( 11  12  0   15 ).

This means that the distribution of goods in the long run is

lim_{n→∞} ~v_n = lim_{n→∞} A^n ~v_0 = (1/15)
( 0   0   0   0  )
( 0   0   0   0  )
( 4   3   15  0  )
( 11  12  0   15 )
(3, 2, 1, 1)^t = (1/15)(0, 0, 33, 72)^t = (0, 0, 2.2, 4.8)^t.

In other words, R_1 and R_2 do not have any goods in the long run, R_3 has $220,000 worth of goods in the long run, and R_4 has $480,000 worth of goods in the long run.


Exercises 7.6.

1. For each of the following matrices, determine (and justify) whether or not the geometric series generated by the matrix converges and, if so, find its limit.

A = (1/2) ( 1 1 ; −1 1 ),   B = (1/4)
( 3  0  1 )
( 1  2 −1 )
( 1  0  3 ),
C = (1/5)
( 4  0  0  0 )
( 1  4 −1  0 )
( 0  0  3  0 )
( 1  3  0  0 ).

2. Find lim_{n→∞} A^n when

A = (1/2)
( 1   0  1  0  0  0 )
( 0   1  1  0  0  0 )
( −1  1  1  0  0  0 )
( 0   0  1  1  0  0 )
( 0   1  0  0  1  0 )
( 1   0  0  0  0  1 ).

3. (a) Let T be an r × r matrix, B an s × r matrix and put m = r + s. Show that the m × m matrix A = ( T 0 ; B I ) is invertible if and only if T is invertible, and that in this case

A^(−1) = ( T^(−1)  0 ; −BT^(−1)  I ).

(b) Write down the inverse of the 10 × 10 matrix

A =
( 10  0  . . .  . . .  0 )
( 9   1   0    . . .   0 )
( .   .   .     .      . )
( 2   0  . . .  1      0 )
( 1   0  . . .  0      1 )

without performing any calculations.

4. Let P = ( A C ; 0 B ), where A is an r × r matrix, B is an s × s matrix and C is an r × s matrix.

(a) Show that P is invertible if and only if A and B are invertible. [Hint: Consider the linear independence of suitable rows and/or columns.]

(b) If A and B are invertible, show that the inverse of P is given by

P^(−1) = ( A^(−1)  X ; 0  B^(−1) ), where X = −A^(−1)CB^(−1).

Note: A quick method to prove (a) is to use the general fact that

det(P) = det(A) det(B),

but you are not supposed to use this rule in this problem.


7.7 Stochastic matrices and Markov chains

The discrete linear system

~v_{n+1} = A~v_n

of the previous example ("shipping of commodities") had the following two properties:

1) Each entry of A (= transition matrix) is non-negative.

2) The sum of the entries of each column is = 1 (= sum of percentages).

A discrete linear system of this type is called a stochastic system, and its transition matrix A is called a stochastic matrix. We had encountered such matrices already in subsection 5.9.3 in connection with Markov chains. More generally:

Definition. A real p× q matrix A = (aij) is called stochastic if

aij ≥ 0, for all i, j;(19)

Σp · A = Σq,(20)

where Σp = (1, 1, . . . , 1)︸ ︷︷ ︸p

(viewed as a 1× p matrix).

Remark. Condition (20) means: the sum of each column of A is equal to 1.

Example 7.21. (a) If A =
( 2  4 )
( 3  2 )
( 1  5 ),
then Σ_3 A = (1, 1, 1)A = (2 + 3 + 1, 4 + 2 + 5) = (6, 11) ≠ (1, 1), so A is not stochastic.

(b) If B = (1/5)
( 1  1 )
( 2  3 )
( 2  1 ),
then Σ_3 B = (1, 1) = Σ_2, so B is stochastic.

Theorem 7.16. If A is a stochastic matrix of size p× qand B is a stochastic matrix of size q × r,then AB is a stochastic matrix of size p× r.

Proof. Clearly AB has non-negative entries (since A and B do), so it is enough to showthat: Σp(AB) = Σr. Now:

Σp(AB) = (ΣpA)BA stochastic

= (Σq)BB stochastic

= Σr ,

and so AB is stochastic.

Corollary. If A is a square stochastic matrix, then so is An for all n ≥ 0. If, moreover,~v is a stochastic vector, then An~v is a stochastic vector for all n ≥ 0.

Page 35: Chapter 7 Powers of Matrices - Queen's U

Section 7.7: Stochastic matrices and Markov chains 345

Proof. Induct on n. Since the statement is clear for n = 0, assume n > 0. By theinduction hypothesis we have that An−1 is stochastic, and hence An = A · An−1 is alsostochastic by Theorem 7.16. Thus, the first assertion holds for all n ≥ 0.

The second assertion follows from the first by applying Theorem 7.16 to An and ~v.

Remark. In section 5.9 we had seen that if (S,A) is a Markov chain, then its transitionmatrix A and probability distribution vectors ~vn are all stochastic, and that these satisfythe discrete linear system

~vn+1 = A~vn.

In the sequel we shall frequently refer to any such discrete linear system (in which A andthe ~vn’s are stochastic) as an (abstract) Markov chain.

We have the following basic result about the eigenvalues of a square stochastic matrix:

Theorem 7.17. Let A be a square stochastic matrix of size m×m. Then

(a) 1 is a regular eigenvalue of A.

(b) We have |λi| ≤ 1, for every eigenvalue λi of A; i.e. ρ(A) = 1.

Proof. (a) We first prove that λ1 = 1 is an eigenvalue. Since At and A have the sameeigenvalues, this follows from:

(21) Σtm is an eigenvector of At with eigenvalue λ1 = 1.

Indeed, A stochastic ⇒ ΣmA = Σm ⇒ AtΣtm = Σt

m ⇒ Σtm is a 1-eigenvector of At.

To prove that 1 is a regular eigenvalue of A, we will use the norm of a matrix as wasdefined in section 7.5. Since A is stochastic, we have that

(22) ||A|| = max(1, 1, . . . , 1) = 1.

Thus, since An is also stachastic, we have that ||An|| = 1, for any n ≥ 1. Thus, by usingproperty (3) of the norm, we see that for any invertible matrix P we have

(23) ||P−1AnP ||∗ ≤ cP := ||P−1|| · ||P ||, for all n ≥ 1.

Now take P such that P−1AP = J is a Jordan matrix. Then by (23) we see that||Jn|| ≤ cP , for all n ≥ 1. This implies that J cannot contain a Jordan block J(1, k) ofsize k ≥ 2 because then

J(1, k)n =

1 n ∗ ∗0 1 n ∗. . . . . . . . . . . .0 . . . . . . 1

,

where the ∗’s are positive numbers, and so ||Jn|| ≥ ||J(1, k)n|| ≥ 1+n, which contradicts(23). Thus, 1 is a regular eigenvalue of A.

Page 36: Chapter 7 Powers of Matrices - Queen's U

346 Chapter 7: Powers of Matrices

(b) By Theorem 7.12 and (22) we see the spectral radius ρ(A) ≤ ||A|| = 1. Since bypart (a) we know that 1 is an eigenvalue of A, it follows that ρ(A) = 1.

Warning. Theorem 7.17 does not imply that A is power convergent, for 1 need notbe a dominant eigenvalue, i.e. there may be other eigenvalue λi 6= 1 with |λi| = 1. For

example: A =„

0 11 0

«is stochastic, but not power convergent (since its eigenvalues are

1,−1). Note that A2 = I, so A2n+1 = A,A2n = I.

Corollary. A stochastic matrix A is power convergent if and only if 1 is a dominanteigenvalue of A, i.e. if and only if λ = 1 is the only eigenvalue of A of absolute value 1.

Proof. Combine Theorems 7.6′ and 7.17.

While a stochastic matrix does not need to be power convergent in general, thereis a fairly simple criterion which ensures its power convergence. This criterion, whichwas discovered by Oscar Perron (1880–1975) in 1907, depends on the following simpleconcept.

Definition. A square stochastic matrix A is called primitive if for some n ≥ 1 the matrixAn has no entries equal to 0.

Remark. Note that An has no zero entries, then the same is true for An+1 etc. Thuswe see that A is primitive if and only if An is primitive for some (hence every) n.

Example 7.22. (a) A = 110

(1 29 8

)is clearly primitive (take n = 1).

(b) A = 110

(0 110 9

)is also primitive, for A2 = 1

100

(0 110 9

)= 1

100

(10 990 91

).

(c) Upper/lower triangular (block) matrices are never primitive. In particular, thetransition matrix A of the commodities example (Example 7.19) is not primitive.

Theorem 7.18 (Perron’s Theorem). If A is a primitive, stochastic matrix, then:

(a) λ1 = 1 is a simple, dominant eigenvalue of A.

(b) The associated eigenspace

EA(1) = {c~p : c ∈ C}

is generated by a (unique) stochastic vector ~p (called the Perron vector of A).

(c) All the entries of ~p are positive.

Proof. Without proof.1

Corollary. If A is primitive and stochastic, then A is power convergent with limit

limn→∞

An = (~p|~p| . . . |~p)︸ ︷︷ ︸m

,

where ~p denotes the Perron vector of A. Moreover, for any vector ~v we have

limn→∞

An~v = c~p, where c = Σm~v.

1A proof can be found in F.R. Gantmacher’s book, The Theory of Matrices, vol. II, p. 53ff.

Page 37: Chapter 7 Powers of Matrices - Queen's U

Section 7.7: Stochastic matrices and Markov chains 347

In particular, if ~v is stochastic, then limn→∞

An~v = ~p does not depend on ~v.

Proof. By Perron’s theorem (Theorem 7.18) we know that 1 is a dominant, simpleeigenvalue of A, so A is power convergent. Moreover, by Theorem 7.10 we have

limk→∞

Ak =1

~ut~v~u · ~vt,

where ~u is a 1-eigenvector of A and ~v is a 1-eigenvector of At. Here we can take~u = ~p (Perron’s vector)~v = (1, . . . , 1)t = Σt

m since A is stochastic.

Now ~ut · ~v = 1 since ~p is stochastic,~u · ~vt = ~p · (1, . . . , 1) = (~p| . . . |~p),

so first formula follows. Moreover,

limk→∞

Ak~v = ( limk→∞

Ak)~v = (~p| . . . |~p)~v = ~pΣm~v︸︷︷︸c

.

Remarks. 1) As the above proof shows, the conclusion of the corollary holds moregenerally for stochastic matrices for which 1 is a simple, dominant eigenvalue (and ~p isa stochastic 1-eigenvector). However, in this more general case the entries of ~p need notbe positive.

2) The above theorem and its corollary reveal the following interesting fact abouta stochastic discrete linear system ~vn+1 = A~vn: if A is primitive, then no state will becompletely drained in the long run (because the limit c~p has no zero entries).

Note that in the commodities example (Example 7.19), the states R1 and R2 weredrained in the long run (but there the transition matrix A was not primitive!)

Example 7.23. Consider the economics example 7.19 again, but assume now that thetransition matrix A describing the fractions aij of the value of goods shipped weekly fromregion Rj to Ri is

A =

.1 .2 .1 .1.7 .6 .1 .1.1 .1 .7 .1.1 .1 .1 .7

and the initial distribution vector is ~v =

v1

v2

v3

v4

where v1 + v2 + v3 + v4 = 7.

Find the distribution of the goods in the long run.

Solution. Note that here we are not specifying how many $100, 000 worth of goods waspresent initially in each region, only that the total present initially was $700, 000. So ~vcould be (3, 2, 1, 1)t or (7, 0, 0, 0)t, etc.

Qualitative Solution. Thanks to O. Perron (through the corollary to his theorem) weknow (with no calculation) that there is a vector ~p = (p1, . . . , p4)

t (all of whose entries

Page 38: Chapter 7 Powers of Matrices - Queen's U

348 Chapter 7: Powers of Matrices

are positive (pj > 0)) that depends only on A (not on ~v) such that limn→∞An~v = 7~p.Therefore, in the long run, no matter how ~v was chosen among all the vectors with entrysum 7, eventually Rj will have $700, 000pj worth of the commodity. This tells us a greatdeal about the economic situation in a qualitative way.

Quantitative Solution. If we want more quantitative information we need to find ~p.This is a routine matter of finding the eigenvectors for 1. We know in advance (againthanks to O. Perron) that 1 is simple, so any solution ~x of (A−I)~x = ~0 will be a multipleof ~p, i.e. ~x = c~p, with c = Σ4~x. Now by row reducing A − I (or by inspection) we findthat

~x =

6161111

is such a solution, so

~p =1

Σ4~x~x =

1

44

6161111

=

3/224/111/41/4

·=

.14.36.25.25

.

Since limn→∞

An~v = 7~p, this means that

R1 gets $95, 455 = 3/22× $700, 000 worth of goods eventually,

R2 gets $254, 545 = 4/11× $700, 000 worth of goods eventually,

R3 gets $175, 000 = 1/4× $700, 000 worth of goods eventually,

R4 gets $175, 000 = 1/4× $700, 000 worth of goods eventually,

regardless of how the $700, 000 worth of goods was initially distributed!

The only effect of the choice of ~v has on the system is to determine how soon the“ultimate” distribution is reached. For example, if

~v = 7~p =

0.952.551.751.75

,

then it’s reached immediately; however if

~v =

1.42.11.751.75

then it will take longer. (Using the computer, it is not difficult to find the first n forwhich the dollar difference between $106(An~v)j and $7 × 106pj is negligible for all j.

Page 39: Chapter 7 Powers of Matrices - Queen's U

Section 7.7: Stochastic matrices and Markov chains 349

Here, ‘negligible’ means less than $10, 000, $1, 000, $100, $10, $1 or $0.01 depending onthe situation at hand.)

Example 7.24. Find the limit limn→∞

An when A =

0 12

13

12

0 23

12

12

0

.

Solution. Clearly, A is stochastic. Even though A has some zero entries, A is stillprimitive because

A2 =

0 12

13

12

0 23

12

12

0

0 1

213

12

0 23

12

12

0

=1

12

5 2 4

4 7 2

3 3 6

has no zero entries. Thus, by Perron’s theorem, A is power convergent and

limn→∞

An = (~p|~p|~p),

where ~p is the Perron vector of A.To compute ~p, solve (A− I)~x = ~0:

1

6

−6 3 23 −6 43 3 −6

~x = ~0⇒ ~x = c

8109

, so ~p =1

27

8109

.

Thus, limn→∞

An Th. 7.18= 1

27

8 8 810 10 109 9 9

.

Example 7.25. Find the limit limn→∞

Bn when B =

0 12

12

12

0 12

13

23

0

.

Solution. We first observe that even though B is not stochastic (for the columns do notadd to 1), A = Bt is stochastic, and so the theory of the present section applies. In fact,since A is the matrix of Example 7.24, we know that

limn→∞

An = 127

8 8 810 10 109 9 9

.

Now since Bn = (At)n = (An)t, we therefore obtain

limn→∞

Bn = limn→∞

(An)t = 127

8 8 810 10 109 9 9

t

= 127

8 10 98 10 98 10 9

.

Page 40: Chapter 7 Powers of Matrices - Queen's U

350 Chapter 7: Powers of Matrices

We now use Perron’s Theorem to analyze the following examples of “rat mazes” whichmight arise in Biology experiments.

Example 7.26 (A Rat Maze). As in Example 5.30, consider the following system ofthree chambers connected by passages as shown.

1

2

3

At time t0, a rat is placed in one of the cham-bers (say in chamber 1).

Each minute thereafter, the rat is driven outof its present chamber by some stimulus andis prevented from re-entering immediately.

Assume: the rat chooses the exits of eachchamber at random.

Question: What is the probability that the rat is in a certain chamber in the long run?

Solution. Recall from the solution of Example 5.30 that this problem gives rise to aMarkov chain

~vn+1 = A~vn with ~v0 = (1, 0, 0)t,

in which ~vn = (v1,n, v2,n, v3,n)t denotes the state probability vector (with vi,n = theprobability that the rat is in chamber i after n minutes) and that the transition matrixA is given by

A =

0 12

13

12

0 23

12

12

0

(cf. Example 5.30), which is the same as that of Example 7.24. Thus, by the latter exam-ple we see that the matrix A is primitive, and that the Perron vector is ~p = 1

27(8, 10, 9)t.

By Perron’s theorem we therefore have

limn→∞

An = 127

8 8 810 10 109 9 9

and limn→∞

~vn = ~p = 127

8109

.

This means that in the long run the rat spends:

827

of its time in chamber 1,1027

of its time in chamber 2,13

of its time in chamber 3.

Note: This result does not depend on where (i.e. in which chamber) the rat was originallyplaced.

Page 41: Chapter 7 Powers of Matrices - Queen's U

Section 7.7: Stochastic matrices and Markov chains 351

Example 7.27. This example is the same experiment as in Example 7.26, except thatthe maze is different. Moreover, if the rat chooses an exit A or B, then it leaves the maze(and the experiment is over).

1 2

3

B

A

��

��

���

��

��

��� Analysis: To interpret this as a Markov chain, we let

sj be the state: ‘the rat is in chamber j’ for j = 1, 2, 3;s4 be the state: ‘the rat went to exit A’;s5 be the state: ‘the rat went to exit B’.

The corresponding transition matrix (assuming random selections by the rat) is:

A =

0 1

314

0 012

0 14

0 012

13

0 0 0

0 13

14

1 0

0 0 14

0 1

.

This matrix is of the form A =

(T OB I

)where

T =

0 13

14

12

0 14

12

13

0

, B =

(0 1

314

0 0 14

), O =

0 00 00 0

and I =

(1 00 1

).

The matrix A isn’t primitive, but we can use Theorem 7.15 to find limn→∞

An, as we did

in the problem of section 7.6 (cf. Example 7.20).

We have limn→∞

An =

(0 0

B(I − T )−1 I

)because ρ(T ) < 1 (cf. Example 7.14).

Now (I − T )−1 = 178

132 60 4890 126 5496 72 120

, and so limn→∞

An =1

13

0 0 0 0 00 0 0 0 00 0 0 0 09 10 8 13 04 3 5 0 13

.

If we place the rat in chamber i with probability qi initially, then the initial vector is~v0 = (q1, q2, q3, 0, 0)t and

limn→∞

~vn = limn→∞

An~v0 = (0, 0, 0, 1− q, q)t, where q = 413q1 + 3

13q2 + 5

13q3.

Page 42: Chapter 7 Powers of Matrices - Queen's U

352 Chapter 7: Powers of Matrices

Example 7.28. Consider the following rat maze:

1 3

4 2

Find the probability that the rat is in a given chamber after n minutes when n is large.

Solution. Here we have the Markov chain ~vn+1 = A~vn whose transition matrix is

A =

0 0 1

412

0 0 34

12

13

12

0 023

12

0 0

=

(O BC O

),

if we let B = 14

(1 23 2

), C = 1

6

(2 34 3

)and O =

(0 00 0

). Thus A2 =

„BC 00 CB

«, so

A2k =

((BC)k 0

0 (CB)k

)and A2k+1 = AA2k =

(0 B(CB)k

C(BC)k 0

), for all k ≥ 0.

This shows that A is not power convergent. To find out what An looks like when n islarge, we therefore need to distinguish two cases:

(a) n large and even,(b) n large and odd.

Case (a). We calculate limk→∞

(BC)k = (~p, ~p) where ~p is the Perron vector of the primitive

stochastic matrix

BC = 124

(10 914 15

).

We find that

~p = 123

(914

)and hence

L1 := limk→∞

(BC)k = 123

(9 914 14

).

Then we calculate limk→∞

(CB)k = (~q, ~q) where ~q is the Perron vector of the primitive

stochastic matrix CB = 124

(11 1013 14

). We find that ~q = 1

23

„1013

«and hence

L2 := limk→∞

(CB)k = 123

(10 10

13 13

).

Page 43: Chapter 7 Powers of Matrices - Queen's U

Section 7.7: Stochastic matrices and Markov chains 353

Therefore

limk→∞

A2k = Diag(L1, L2) =1

23

9 9 0 014 14 0 00 0 10 100 0 13 13

.

Thus, when n is large and even,

~vn = An~v0.= Diag(L1, L2)~v0 = 1

23

9(v1 + v2)14(v1 + v2)10(v3 + v4)13(v3 + v4)

when ~v0 =

v1

v2

v3

v4

.

For example,

if ~v0 =

0BB@120120

1CCA then ~v1024·= 1

46

0BB@9141013

1CCA and if ~v0 =

0BB@1000

1CCA then ~v1024·= 1

23

0BB@91400

1CCA.The latter computation can be interpreted as follows. If we place the rat in chamber1 at time t = 0, then after 1024 minutes the probability that it is in chamber 1 isapproximately 9

23and that it is in chamber 2 is (approximately) 14

23. On the other hand,

it is highly unlikely (i.e. the probability is almost 0) that it is in chambers 3 or 4.

Case (b). Here we have

limk→∞

A2k+1 = A limk→∞

A2k = 123

(0 BL2

CL1 0

)=

1

23

0 0 9 90 0 14 1410 10 0 013 13 0 0

.

Thus when n is large and odd,

~vn·=

1

23

0 0 9 90 0 14 1410 10 0 013 13 0 0

~v0 = 123

9(v3 + v4)14(v3 + v4)10(v1 + v2)13(v1 + v2)

.

For example,

if ~v0 =

0BB@1000

1CCA then ~v1025·= 1

23

0BB@001013

1CCA and if ~v0 =

0BB@0010

1CCA then ~v1025·= 1

23

0BB@91400

1CCA.Again, these computations can be interpreted as follows. If we place the rat in chamber1 at time t = 0, then it is highly unlikely that after 1025 minutes it is in chambers 1or 2, whereas the probability that it is in chambers 3 and 4 is approximately 10

23and 13

23,

respectively. Similarly, if we place it in chamber 3 at time t = 0, then it is highly unlikelythat it is in chambers 3 or 4 after 1025 minutes, whereas the probability that it is inchambers 1 and 2 at that time is approximately 9

23and 14

23, respectively.

Page 44: Chapter 7 Powers of Matrices - Queen's U

354 Chapter 7: Powers of Matrices

Example 7.29. The game of craps.

This game requires only an ordinary pair of dice. It has two stages:

“Coming Out” The “shooter” (=player) rolls the dice. If she “rolls” (i.e. the sum ofthe dots on the tops of the dice is) 7 or 11, she wins; if she rolls 2, 3 or 12 she loses. Inthese two cases the game is over. However if 4, 5, 6, 8, 9 or 10 is rolled then that numberis called the shooter’s “point” and we go to the second stage:

“Making the Point” Now the shooter rolls the dice again. She wins if she rolls herpoint (“makes” her point) and the game is over. If she rolls any other number then sherolls the dice again, winning if she makes her point, (the same point as determined inthe first stage), losing on 7 and rolling again if any other number turns up.

For example: Suppose the shooter rolls 6 at first (so 6 is the point). Then she rolls 8(not the point, not 7), so she rolls again. Say she rolls a 3 (3 is not 7, not 6), so she rollsagain, say a 2 (not 7, not 6), so she rolls again, etc. until either 6 is rolled (she wins) or7 (she loses).

We consider 8 states: s1 : the point is 4s2 : the point is 10s3 : the point is 5s4 : the point is 9s5 : the point is 6s6 : the point is 8s7 : the shooter losess8 : the shooter wins

The transition matrix is A = 136

27 0 0 0 0 0 0 00 27 0 0 0 0 0 00 0 26 0 0 0 0 00 0 0 26 0 0 0 00 0 0 0 25 0 0 00 0 0 0 0 25 0 06 6 6 6 6 6 36 03 3 4 4 5 5 0 36

and the initial vector is ~v0 = 1

36(3, 3, 4, 4, 5, 5, 4, 8)t.

Therefore w, the shooter’s probability of winning eventually is the 8th entry in limn→∞

An~v.

Now A =(

T OB I

)where T is the diagonal matrix 1

36Diag(27, 27, 26, 26, 25, 25) occupying

the first 6 rows and columns of A, so ρ(T ) = 2736< 1 hence lim

n→∞An~v0 =

„0 0

B(I − T )−1 I

«~v0

where B = 136

„6 6 6 6 6 63 3 4 4 5 5

«and (I − T )−1 = 36 Diag(1

9, 1

9, 1

10, 1

10, 1

10, 1

11, 1

11). Thus,

B(I − T )−1 = 1165

„110 110 99 99 99 9055 55 66 66 75 75

«and so the last row of A∞ = limn→∞An is

1165

(55, 55, 66, 66, 75, 75, 0, 165) = (13, 1

3, 2

5, 2

5, 5

11, 5

11, 0, 1). Thus, the desired probability is

w = (13, 1

3, 2

5, 2

5, 5

11, 5

11, 0, 1) · ~v0 = 976

1980= 244

495

.= 0.4929.

Page 45: Chapter 7 Powers of Matrices - Queen's U

Section 7.7: Stochastic matrices and Markov chains 355

Exercises 7.7.

1. Consider the following stochastic matrices

A = 14

2 0 10 4 22 0 1

and B = 14

2 2 10 2 22 0 1

.

Show that B is primitive but that A is not. (Suggestion: Show that A violatesPerron’s Theorem.)

2. Let A = (aij) be a stochastic matrix of size m×m. Show that:

(a) If for every pair i, j with 1 ≤ i, j ≤ m there is an index k (depending on i, j)with aik 6= 0 and akj 6= 0, then A is primitive. [Hint: Consider A2.]

(b) If each row and each column of A contains less than m2

zeros, then A is primitive.[Hint: Use part(a).]

3. A salesman’s territory consists of 3 cities C1, C2, C3. He never sells in the same cityon successive days. If he sells in C1, then the next day he sells in C2. If he sells ineither C2 or C3, then the next day he is twice as likely to sell in C1 as in the othercity.

Let vkn denote the probability that he is in city Ck on the nth day, and let ~vn =(v1n, v2n, v3n)t denote the associated probability distribution vector.

(a) Draw a transition diagram and write down the transition matrix A such that~vn+1 = A~vn.

(b) Find the matrix A2. If he is in city C2 on one day, what is the probability thathe is in C1 two days later?

(c) Show that the matrix A is primitive.

(d) What proportion of his time does the salesman spend in each city in the longrun?

4. A sequence of random numbers x1, x2, x3, . . . , xn, . . . is generated by rolling one die

repeatedly and recording the number xn shown at the top of the die at the nth roll.Let yn = max(x1, x2, . . . , xn), the largest number shown in n rolls. This is gives riseto a Markov chain in which the state sj is the event: “the largest number shownso far is j” (for j = 1, 2, 3, 4, 5, 6).

(a) Write down the transition matrix and initial vector of the Markov chain.

(b) By means of (a), or otherwise, compute the probability that 4 is the largestnumber shown in three rolls of the die. What happens in the long run?

Page 46: Chapter 7 Powers of Matrices - Queen's U

356 Chapter 7: Powers of Matrices

7.8 Application: Finding approximate eigenvalues of

a matrix

Very often it is a difficult if not impossible task to find the exact eigenvalues of a relativelysmall (even a 5×5) matrix, and so we have to be content with finding approximate values.However, in place of finding first the characteristic polynomial and then its approximateroots, there is a better and quicker method based on the ideas of this chapter. Thismethod, while not always successful, works frequently enough so that most computerprograms for finding an approximate eigenvalues are based on it.

Before presenting the method, let us first observe the following variant of Theorem7.8.

Theorem 7.19. If α 6= 0 is a dominant, regular eigenvalue of A, then

(a) B = 1αA is power convergent and

(b) each non-zero column of limn→∞

Bn is an α-eigenvector of A.

The proof of this theorem is based on the following fact, which we had already usedseveral times in the examples.

Observation. Let σ be a complex number and let λ1, . . . λs be the distinct eigenvaluesof an m×m matrix of A. Then σλ1, . . . , σλs are the distinct eigenvalues of σA and wehave

mσA(σλi) = mA(λi) and νσA(σλi) = νA(λi), for 1 ≤ i ≤ s.

In particular, λi is a regular eigenvalue of A if and only if σλi is a regular eigenvalue ofσA.

[Write chA(t) = (−1)m det(A− tI) = (t− λ1)m1 . . . (t− λs)

ms . Then

chσA(t) = (−1)m det(σA− tI)= (−1)m det(σ(A− t

σI))

= (−1)mσm det(A− tσI)

= σm( tσ− λ1)

m1 . . . ( tσ− λs)

ms

= (t− σλ1)m1 . . . (t− σλs)

ms ,

which shows that σA has the eigenvalues σλ1, . . . , σλs with respective algebraic multi-plicities m1, . . . ,ms. Finally, since EσA(σλi) = Nullspace(σA−σλiI) = Nullspace(σ(A−λiI)) = Nullspace(A − λiI) = EA(λi), we have in particular that νσA(σλi) = νA(λi) bytaking dimensions.]

Proof of Theorem 7.19. (a) Let λ1 = α, λ2, . . . , λs be the eigenvalues of A. Then bythe above observation the eigenvalues of B are

1,λ2

α, . . . ,

λs

α.

Page 47: Chapter 7 Powers of Matrices - Queen's U

Section 7.8: Application: Finding approximate eigenvalues of a matrix 357

Since α is a regular eigenvalue for A, 1 is a regular eigenvalue for B (by the same

observation). Moreover, since α is dominant, we have ‖λj

α‖ < 1 for j > 1. Thus B

satisfies the conditions of Theorem 7.6 for the power convergence and hence is powerconvergent.

(b) Put L = limn→∞

Bn, and let ~v = L~ej be any non-zero column of L. Since

BL = B limn→∞

Bn = limn→∞

Bn+1 = limn→∞

Bn = L,

we see that B~v = BL~ej = L~ej = ~v and so A~v = α~v. This means that ~v ∈ EA(α), asclaimed.

From the above theorem we see that for large n, every non-zero column of α−nAn andtherefore every non-zero column of An is approximately an eigenvector of A belonging tothe dominant eigenvalue α; furthermore we see that the approximation becomes betterthe larger n is.

Let

~v =

0BBB@v1

v2

.

..vm

1CCCAbe any non-zero column of An where n is large. Since ~v is an approximate eigenvector ofA i.e. A~v

·= α~v, the numbers

(A~v)j

vj

for which vj 6= 0

differ little from each other and from α; in fact, if ~v were an exact eigenvector then thesenumbers would all coincide with the corresponding eigenvalue (which is close to α).

If we are given a matrix A and asked to find an eigenvalue of A, even if we don’tknow in advance that A has a dominant, regular eigenvalue we could look at the powersof A (it’s easy to do this with a computer) and test a non-zero column ~cn.

First we will look at a 2×2 example (even though we can find the eigenvalues exactlyfor such small matrices) to illustrate this method (called the power method).

Example 7.30. Find an approximate eigenvalue of A =(1 11 2

).

Solution. We have A2 =

(2 33 5

), squaring A2 we get

A4 =

(13 2121 34

), squaring A4 we get

A8 =

(610 987987 1597

).

Page 48: Chapter 7 Powers of Matrices - Queen's U

358 Chapter 7: Powers of Matrices

Suppose we stopped squaring at the third step and chose the first column of A4 asour candidate ~v for an eigenvector of A. How good is ~v ? Is A~v close to a multiple of ~v ?

Here ~v =„

1321

«, A~v =

„3455

«and

{(A~v)1

v1= 34

13

·= 2.61538

(A~v)2v2

= 5521

·= 2.61905

.

We could take either 2.61538 or 2.61905 as our candidate for the corresponding eigenvalue;let us compromise by taking λ to be their average: 1

2(34

13+ 55

21)

·= 2.61722. Then ~v, λ will

be good candidates if A~v is close to λ~v. How close is A~v to λ~v ?

||A~v − λ~v|| =√

(34− 13λ)2 + (5− 21λ)2 <√

0.0005669 + 0.0014793 < 0.046.

Is that small enough? That depends on

(a) how good an approximation is required and

(b) on the size of ~v.To standardize comparisons between different eigenvector candidates (ones coming fromdifferent powers of A) we could take care of (b) by normalizing ~v but A( 1

||~v||)~v−λ( 1||~v||)~v =

( 1||~v||)||A~v−λ~v|| < 0.0022 in our case. If we are satisfied with that degree of approximation,

we stop here. If not, (suppose e.g. we wanted ||A( 1||~v||)~v − λ( 1

||~v||)~v|| < 0.00001) then we

square the power of A we were looking at (in this case A4) and repeat the procedure onthat matrix (in this case A8). Taking the first column of A8 (we could use any non-zerocolumn) as our new candidate for an eigenvector of A we get

~v =„

610987

«, A~v =

„15972584

«, and

{(A~v)1

v1= 1597

610(A~v)2

v2= 2584

987

;

these last two numbers average out to λ.= 2.618033618. To see how good ~v, λ are, we

calculate the tolerance ∆ which is defined by

(24) ∆(~v, λ) =||A~v − λ~v||||~v||

.

Here we get

∆ =1

||~v||

∣∣∣∣∣∣∣∣( 1597− 1597.005072584− 2583.999181

)∣∣∣∣∣∣∣∣ < 1

||~v||(0.0052) < 0.000005

which is good enough since we prespecified ∆ < 0.00001.Moreover, we know that λ ≥ |µ| for all other eigenvalues µ of A (so λ is “almost”

dominant). If, instead of 0.00001, we had specified 0.000001, then we would have to

square A8 and repeat the process. In this case we obtain λ·= 2.618033989 and ∆

·= 0 to

9 decimal places.

Page 49: Chapter 7 Powers of Matrices - Queen's U

Section 7.8: Application: Finding approximate eigenvalues of a matrix 359

Thus we have found one approximate eigenvalue λi·= 2.6180... of A. To find the other

one, we shall use the trace tr(A) of A. Recall (cf. section 7.5) that the

tr(A) = sum of diagonal entries of A

= sum of eigenvalues of A (counted with their multiplicities),

hence in our case tr(A) = λ1 + λ3 = 1 + 2 = 3 hence λ2 = 3 − λ1 = 0.381966011 to 9decimal places.

To see how good an approximation this is, let us compute λ1 and λ2 exactly:

chA(t) = det

(1− t 1

1 2− t

)= (1− t)(2− t)− 1 = t2 − 3t+ 1 = (t− λ1)(t− λ2),

with λ1 = 3+√

9−42

= 32

+ 12

√5.= 2.618033989 and λ2 = 3−

√5

2

.= 0.391966011. Thus, the

approximate eigenvalues which we found agree with the exact eigenvalues up to 9 decimalplaces.

Example 7.31. Find an eigenvalue λ of

A =

1 2 4 11 1 0 32 1 4 1−1 1 2 4

with tolerance ∆(λ,~v) < 0.0001 (cf. (24)).

Solution. The first column of A8 is

~c =

917758558547970125515537

.

Since its entries are a bit too large to deal with comfortably, we choose ~v = 10−5 times~c, i.e.

~v =

9.177585.585479.701255.15537

, and so A~v =

64.3088939.4067467.9010036.43187

.

The corresponding value of λ is

λ =1

4

{64.30889

9.177758+ · · ·+ 36.43187

5.15537

}.= 7.032094636.

To see how good the approximation is, we calculate:

||A~v − λ~v|| ·= 0.4503433331

||~v|| ·= 15.36611666

Page 50: Chapter 7 Powers of Matrices - Queen's U

360 Chapter 7: Powers of Matrices

therefore ∆(λ,~v).= .0450343/15.36611

.= 0.029307 > 0.0001. So we look at 10−2 times

the first column of (10−5A8)(10−5A8), call it ~w. We get

~w =

5.4056743.3325535.7009113.086350

and A~w =

37.96079165

. . .

. . .

. . .

.

The corresponding λ is

λ =1

4

(37.96079165

5.405674+

. . .

3.332533+

. . .

5.700911+

. . .

3.086350

).= 7.022464671.

Since ||A~w − λ~w|| .= 0.00077075 and ||~w|| .= 9.074864, we see that ∆(λ, ~w) = ||A~w −λ~w||/||~w|| .= 0.00077075/9.074864

.= 0.0000849 < 0.0001, and so we can stop here and

take λ, ~w as our approximate eigenvalue and eigenvector.

Power Method Algorithm

Suppose A is any m × m matrix and ε > 0 is the degree of tolerance required for ourapproximation (in our last example ε = 0.0001).

Successively carry out the following steps until ∆ < ε: We will suppose we have triedA,A2, A4, . . . , A2n−1

and are now looking at M = A2n.

1) Let ~v be the first column of M (or any other non-zero column).

2) Let ~y = A~v.

3) Calculate the average of{

yj

vj: vj 6= 0

}= λ.

4) Calculate ∆ = ||~y−λ~v||||~v|| .

5) If ∆ < ε, stop. Then (λ,~v) are the required approximation.

If ∆ > ε, then square A2n, call the new matrix M and do steps 1-5 again.

This is just a rough outline. We have left out the possibility of replacing M by ascalar multiple of M if the entries in M are unwieldy. In addition, some instructionshould be given about when to quit if the method does not seem to be working.

We will point out in the exercises how the power method sometimes can give us aneigenvalue of A even when A doesn’t have a dominant regular eigenvalue. Of course, themethod always works when A has a dominant, regular eigenvalue.

Deflation Method

Having found one eigenvalue of A, we would like to find the rest. One popular method isto replace A by a smaller matrix that contains the rest of the eigenvalues of A and keeprepeating the process until all the eigenvalues are found.

Page 51: Chapter 7 Powers of Matrices - Queen's U

Section 7.8: Application: Finding approximate eigenvalues of a matrix 361

Theorem 7.20. Suppose A = (aij) is an m × m matrix and ~v = (v1, . . . , vn)t is aneigenvector for the eigenvalue α. If v1 6= 0, define the (m− 1)× (m− 1) matrix A2 by

A2 =

a22 a23 . . . a2m

a32 a33 . . . a3m...am2 am2 . . . amm

− 1

v1

v2

v3...vm

(a12, a13, . . . , a1m).

Then we havechA(t) = (t− α) chA2(t).

Thus, if λ2, λ3, . . . , λm denote the eigenvalues of A2 (counted with multiplicities), thenα, λ2, . . . , λm are those of A.

Before we prove the theorem, let us see how it is used.

Example 7.32. Use the deflation method to find the eigenvalues of

A =

1 1 84 4 20 6 4

.

Solution. Since all row sums are 10 we see without further calculation that

A

111

= 10

111

.

So 10 is an eigenvalue and

0@ 111

1A is one of its eigenvectors. Thus

A2 =

(4 2

6 4

)−(

1

1

)(11

)(1, 8) =

(4 2

6 4

)−(

1 8

1 8

)=

(3 − 6

5 − 4

).

The rest of the eigenvalues of A are the eigenvalues of A2 which we find by calculatingthe roots of the characteristic polynomial of A2. Here

chA2(t) = (t− 3)(t+ 4) + 30 = t2 + t+ 18

and its roots are −1±i√

712

, so the eigenvalues of A are

10,−1 + i

√71

2,−1− i

√71

2.

Page 52: Chapter 7 Powers of Matrices - Queen's U

362 Chapter 7: Powers of Matrices

Example 7.33. Find the eigenvalues of A =

1 1 0 10 2 1 01 −1 3 0−1 1 2 1

by the deflation method.

Solution. Since the row sum is 3, we see that 3 is an eigenvalue of A with eigenvector(1, 1, 1, 1)t. Here

A2 =

2 1 0−1 3 0

1 2 1

− 1

11

( 1 0 1)

=

2 1 0−1 3 0

1 2 1

− 1 0 1

1 0 11 0 1

=

1 1 −1−2 3 −1

0 2 0

.

We could use the power method to find the eigenvalues of A2 but instead we shall computeits characteristic polynomial. It is chA2(t) = (t− 2)(t2− 2t+ 3), so the eigenvalues of A2

are 2, 1 + i√

2, 1− i√

2 and hence the eigenvalues of A are 3, 2, 1 + i√

2, 1− i√

2.

Proof of Theorem 7.20. We know that A~v = α~v and v1 6= 0. Define an m×m matrixP by

P =

v1 0 0 . . . 0v2 1 0 . . . 0v3 0 1 . . . 0...

......

. . . 0vm 0 0 . . . 1

.

If we let ~w = (v2, v3, . . . , vm)t, then P =

(v1

~0t

~w I

), where I is the (m − 1) × (m − 1)

identity matrix. The first column of AP is A~v, but A~v = α~v, so

AP =

αv1 a12 . . . . . . . . . . . . a1m

αv2 a22 . . . . . . . . . . . . a2m

αv3 a32 . . . . . . . . . . . . a3m...

......

.... . .

...αvm am2 . . . . . . . . . . . . amm

.

If we put ~c = (a12, . . . , a1m)t, then AP =„

αv1 ~ct

α~w B

«. Moreover,

P−1 =

(v−1

1~0t

−v−11 ~w I

)because, by block multiplication,(

v−11

~0t

−v−11 ~w I

)(v1

~0t

~w I

)=

(v−1

1 v1 +~0tw v−11~0t +~0tI

(−v−11 ~w)v1 + ~w −v−1

1 ~wot + I

)=

(1 ~0t

~0 ~I

),

Page 53: Chapter 7 Powers of Matrices - Queen's U

Section 7.8: Application: Finding approximate eigenvalues of a matrix 363

which is the m×m identity matrix. Therefore

P−1AP =

(v−1

1~0t

−v−11 ~w I

)(αv1 ~ct

α~w B

)=

(α+~0t(α~w) v−1

1 ~ct +~0tB−α~w + α~w −v−1

1 ~wct +B

)=

(α v−1

1 ~ct

~0 A2

).

Since similar matrices have the same characteristic polynomials, we obtain

chA(t) = chP−1AP (t)

= det(tI − P−1AP )

= det

(t− α −v−1

1 ~ct

~0 tI − A2

).

Expanding the minor of the first column of this determinant yields chA(t) = (t −α) det(tI − A2) = (t− α) chA2(t), as desired.

Here is a version of Theorem 7.20 which can be used if the first entry of the eigenvector~v is zero. We omit the proof since it is similar to that of Theorem 7.20.

Theorem 7.20′. Suppose A = (aij) is an m × m matrix, ~v = (v1, . . . , vm)t is aneigenvector for the eigenvalue α and vj 6= 0. Define

A2 = A(j) −(

1

vj

)

v1

v2...

vj−1

vj+1...vm

(aj1, aj2, . . . , ajj−1, ajj+1, . . . , ajm)

where A(j) is the (m − 1) × (m − 1) matrix obtained from A by deleting its jth row andcolumn. Then

chA(t) = (t− α) chA2(t)

and hence α, λ2, λ3, . . . , λm are the eigenvalues of A if λ2, λ3, . . . , λm are those of A2.

Exercises 7.8.

1. Let A =

0 2 −2 40 2 −1 1−1 2 −1 4−1 1 −1 4

. Given that 2 is an eigenvalue of A, use the deflation

method (at least once) to find the spectrum of A.

Page 54: Chapter 7 Powers of Matrices - Queen's U

364 Chapter 7: Powers of Matrices

7.9 Review Exercises

Review Problems, Part A

1. Which of the following matrices is power convergent?

(a) 14

3 −3 10 2 0−1 −5 3

, (b) 14

3 3 10 4 01 −3 3

, (c)

1 −1 00 1 00 −1 1

,

(d)

0 1.9 .90 .9 0.9 .9 0

, (e)

(.8 .4.2 .6

).

For those matrices A which are power convergent, find limn→∞

An.

2. Find the spectral radius of each matrix:

(a)

(2 13 1

); (b)

(1 23 2

); (c)

3 4 5−1 2 1

1 0 1

; (d)

7 3 1 −10 1 2 30 0 −1 40 0 0 −9

.

3. Find the spectral radius of A if chA(t) = (t2 + 6t+ 9)(t2 + 2t+ 11)2.

4. Find the spectral radius of A if

limn→∞

An =

1 1 12 2 23 3 3

.

5. Suppose that A =

2 .5 0 .5.2 i .4 −.1.2 0 1 + i 1

1+i2

.1 0 3

, and that λ is any eigenvalue of A.

(a) Draw the Gersgorin discs Dj(A), j = 1, 2, 3, 4 in the plane.

(b) Use (a) and (b) to obtain upper and lower bounds on Re(λ), Im(λ) and |λ|.(c) Does your answer in (a) give you enough information to decide:

(i) If A is invertible? If so, is it?

(ii) If 13A is power convergent? If so, is it?

Page 55: Chapter 7 Powers of Matrices - Queen's U

Section 7.9: Review Exercises 365

6. Suppose A =

3 1 0 −1−2 5 1 0−1 0 4 1

2 −1 −1 7

.

(a) Use Method 1 (Theorem 7.12) to estimate ρ(A).

(b) Use Method 2 (Theorem 7.14) to estimate ρ(A).

(c) Use Method 3 (Theorems 7.12 and 7.14) to estimate ρ(A).

(d) Do you have enough information from (a), (b), and (c) to decide that

limn→∞

(19A)n

= 0?

Explain briefly.

7. For each of the following matrices T , determine whether the geometric series gen-

erated by T converges. If so, find the sum∞∑

n=1

T n.

(a) T = .3, (b) T =1

4

(1 23 2

), (c) T =

1

5

(1 23 2

), (d) T =

0 1 20 0 30 0 0

.

8. (a) Use the technique of the economics example to evaluate

~w = limn→∞

.2 .4 0 0.4 .4 0 0.3 .1 1 0.1 .1 0 1

n

v1

v2

v3

v4

.

(b) Find two different stochastic vectors such that the vectors ~w calculated in (a)are different.

9. (a) Use the fact that A is a primitive, stochastic matrix to evaluate limn→∞

An when

A =

.5 .2 .1.3 .4 .8.2 .4 .1

.

(b) If ~v = (v1, v2, v3)t is an arbitrary stochastic vector, find lim

n→∞An~v.

(c) Is there a pair of stochastic vectors ~v, ~w such that

limn→∞

An~v 6= limn→∞

An ~w?

Page 56: Chapter 7 Powers of Matrices - Queen's U

366 Chapter 7: Powers of Matrices

10. Suppose we alter the game of “craps” so that the player wins (instead of losing) ifhe rolls a 2 at the first stage of the game.

(a) Find the initial vector ~v0 for the Markov chain describing the new game.

(b) Find the probability that the player wins eventually under this new rule.

11. If A is an n× n stochastic matrix and At is also stochastic, show that

limk→∞

Ak =1

n(Σt

n| . . . |Σn) =1

n

1 1 · · · 11 1 · · · 1...

......

1 1 · · · 1

.

12. Players A and B play the following game:

Player A begins the game with $1 and B begins with $3. A rolls two dice. If Arolls 3, 5, or 6 then A wins $1. If A rolls 2 then A wins $2 (or $1, if that is all themoney that B has). If A rolls any other number then B wins $1. Playe A continuesto roll until A or B has no money left. Find the probability that A wins eventually.

Suggestion. Use a Markov chain with 5 states:

s1: A has $1s2: A has $2s3: A has $3s4: A has $4 (i.e. B has no more money)s5: A has $0 (i.e. A has no more money)

Sample game:

Beforeroll no.

A has B hasThenA rolls

0 $1 $3 2 — A wins $2 from B1 3 1 9 — A loses $1 to B2 2 2 6 — A wins $1 from B3 3 1 4 — A loses $1 to B4 2 2 3 — A wins $1 from B5 3 1 2 — so A wins only $1 from B6 4 0 — — B loses: game over

13. Given that ~v = (0, 2, 0, 1)t is an eigenvector of A =

0 −1 1 22 4 −1 −24 2 1 −4−3 −1 1 5

:

(a) Find the eigenvalue of A belonging to the given vector.

(b) Find the spectrum of A by using deflation at least once.

Page 57: Chapter 7 Powers of Matrices - Queen's U

Section 7.9: Review Exercises 367

Review Problems, Part B

1. Find

limn→∞

.1 0 0 00 .2 0 0.8 .4 1 0.1 .4 0 1

n

and limn→∞

.1 0 0 0 00 .2 0 0 00 0 .3 0 0.8 .4 .3 1 0.1 .4 .4 0 1

n

.

2. Suppose

A =

x 1 1 1 10 .5 1 2 10 0 1 0 00 0 0 1 00 0 0 0 .7

For which values of x (if any) is A power convergent? Explain briefly.

3. Suppose that the characteristic polynomial of an m ×m matrix A is (t − 1)2(t +1)3(t2 + 4)(t− 2) and B =

(12

)A.

(a) Find m and the trace of A.

(b) Find a spectrum of B.

(c) Is B power-convergent? Justify your answer.

4. Suppose

A =

−1 + 3i 0 1 2

0 1 + 3i 0.3 0.20.5 −2 3.5 0

0 1 −1.5 3

Apply Gersgorin’s theorem to A and the transpose of A,

(a) to estimate the minimum of the real parts of the eigenvalues of A,

(b) to estimate the maximum of the real parts of the eigenvalues of A.

5. Suppose A =

1 + i −1 .2.2 1− i −1.3 .2 −1

, and λ is any eigenvalue of A.

(a) Sketch the Gersgorin discs Dj(A), j = 1, 2, 3.

(b) Obtain upper and lower bounds on |λ|,Re(λ), and Im(λ).

c) Does the information above enable you to decide if A is invertible?

d) Does the information above enable you to decide if(

12

)A is power convergent?

Page 58: Chapter 7 Powers of Matrices - Queen's U

368 Chapter 7: Powers of Matrices

6. Suppose that the characteristic polynomial of an m × m matrix A is (t2 + 6t +9)(t2 + 2t+ 10)(t2 + 2t− 8).

(a) Find m, the determinant and trace of A.

(b) If r is the spectral radius of A, is(

1r

)A power convergent? Justify your answer.

7. Suppose A =

1 1 1 0−1 1 2 1−1 1 2 −1

1 −2 1 1

and B = (14)A.

(a) Estimate the spectral radius of A by any of the methods of the text.

(b) Show that B is power-convergent.

8. Let A =

0.8 0.3 0.20 0.2 00 0.5 0.4

(a) Find the spectral radius of A.

(b) Solve the equation∞∑

n=0

Anx =

111

9. Let A =

3i 1 1− 4

3i 0

0 0 2 i−1 1 3 −1

1 0 0 −2

.

(a) For each of the Gersgorin discs of A, write down its centre and radius.

(b) Give a rough sketch of the above discs.

(c) Use Gersgorin’s theorem to estimate the spectral radius ρ(A) of A.

10. Alex, Barbara and Charles are playing a game which has been analysed as a Markovprocess with 5 states: 1 - Alex plays, 2 - Barbara plays, 3 - Charles plays, 4 - Alexwins, 5 - Alex loses. The matrix, M , of transition probabilities and the matrixM∗ = lim

n→∞Mn are given below.

M =

38

38

18

0 018

38

38

0 038

18

38

0 018

0 0 1 0

0 18

18

0 1

, M∗ =

0 0 0 0 0

0 0 0 0 0

0 0 0 0 01126

413

726

1 01526

913

1926

0 1

Page 59: Chapter 7 Powers of Matrices - Queen's U

Section 7.9: Review Exercises 369

1) What is the chance that Charles will have the next turn if Alex is playingnow?

2) What is the chance that Alex will have the next turn if Charles is playingnow?

3) If Alex is now allowed to start the game, whom should he prefer to have startit? Give your reason.

4) What are the chances that Alex will win if the starting player is chosen atrandom from among the three of them?

5) If M is written as

(A OB I

)how is M∗ written in terms of A,B, I, O?

6) Fill in the last three rows of a new matrix N which represents the analysismade when the state “Alex loses” is broken up into 5 - “Barbara wins” and6 - “Charles wins”. You need to know that only the person actually playingcan win on that turn.

N =

38

38

18

0 0 018

38

38

0 0 038

18

38

0 0 0

? ? ? 1 0 0

? ? ? 0 1 0

? ? ? 0 0 1

11. Find limn→∞

An where A is the column (and row) stochastic matrix

.8 .1 .1.2 .7 .10 .2 .8

.

12. Let A =

4 1 1 11 3 0 01 0 1 01 0 0 1

.

(a) Show that 1 is an eigenvalue of A.

(b) Without calculating the characteristic polynomial of A, show that the eigen-values of A are real numbers lying in the interval 0 < x ≤

√33.

(c) By using the deflation method, find a 3 × 3 matrix B such that f(x) = (x −1)g(x) where f(x) and g(x) are the characteristic polynomials of A and Brespectively.

(d) Given that the other two eigenvalues of B are 12(3 ±

√5), why does L =

limn→∞

(1

5A

)n

exist? Find L.

Page 60: Chapter 7 Powers of Matrices - Queen's U

370 Chapter 7: Powers of Matrices

13. Suppose A =

0 4 4 42 1 5 72 2 6 −71 1 3 3

. Use deflation at least once to find a spectrum of

A, given that A

21−1

0

=

0000

.

14. Let A =

−1 1 32

−1 0 0−3

232

52

.

(a) Show that

1−1

2

is an eigenvector of A belonging to eigenvalue 1.

(b) Using the method of deflation find all other eigenvalues.

(c) Is A power convergent? If yes find limn→∞

An.

15. Let A =

0 1 00 0 1

12 −16 7

, ~v =

124

.

(a) Verify by direct matrix multiplication that ~v is a 2-eigenvector of the matrixA.

(b) Apply deflation and find the remaining eigenvalues of A.