Chapter 7
Powers of Matrices
7.1 Introduction
In chapter 5 we saw that many natural (repetitive) processes, such as the rabbit examples, can be described by a discrete linear system
(1) ~u_{n+1} = A~u_n, n = 0, 1, 2, …,
and that the solution of such a system has the form
(2) ~u_n = A^n ~u_0.
In addition, we learned how to find an explicit expression for A^n, and hence for ~u_n, by using the constituent matrices of A.

In this chapter, however, we will be primarily interested in "long range predictions", i.e. we want to know the behaviour of ~u_n for n large (or as n → ∞). In view of (2), this follows from an understanding of the limit matrix

A^∞ = lim_{n→∞} A^n,

provided that this limit exists. Thus we shall:
1) Establish a criterion to guarantee the existence of this limit (cf. Theorem 7.6);
2) Find a quick method to compute this limit when it exists (cf. Theorems 7.7 and 7.8).
We then apply these methods to analyze two application problems:
1) The shipping of commodities (cf. section 7.7)
2) Rat mazes (cf. section 7.7)
As we saw in section 5.9, the latter give rise to Markov chains, which we shall study here in more detail (cf. section 7.7).
7.2 Powers of Numbers
As was mentioned in the introduction, we want to study here the behaviour of the powers A^n of a matrix A as n → ∞. However, before considering the general case, it is useful to first investigate the situation for 1 × 1 matrices, i.e. to study the behaviour of the powers a^n of a number a.
(a) Powers of Real Numbers
Let us first look at a few examples.
Example 7.1. Determine the behaviour of a^n for a = 2, −1/2, −1.
(a) a = 2
n   | 1  2  3  4   5   ⋯
a^n | 2  4  8  16  32  ⋯

The sequence diverges (the limit doesn't exist).
(b) a = −1/2

n   | 1     2    3     4     5      ⋯
a^n | −1/2  1/4  −1/8  1/16  −1/32  ⋯

lim_{n→∞} (−1/2)^n = 0; the sequence converges to 0.
(c) a = −1
n   | 1   2  3   4  5   ⋯
a^n | −1  1  −1  1  −1  ⋯

The limit doesn't exist (the sequence bounces back and forth).
These three examples generalize to yield the following facts.
Theorem 7.1. Let a ∈ R.

(a) If |a| < 1, then lim_{n→∞} a^n = 0.

(b) If |a| > 1, then the sequence {a^n} diverges.

(c) If |a| = 1, then lim_{n→∞} a^n exists if and only if a = 1 (and then lim_{n→∞} a^n = 1).
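Theorem 7.1 is easy to check numerically. The following sketch (the helper name power_sequence is ours, not part of the text) illustrates the three cases:

```python
# Numerical illustration of Theorem 7.1: powers of a real number a
# converge to 0 for |a| < 1, diverge for |a| > 1, and for |a| = 1
# converge only when a = 1.
def power_sequence(a, n):
    """Return [a^1, a^2, ..., a^n]."""
    return [a**k for k in range(1, n + 1)]

small = power_sequence(-0.5, 60)   # |a| < 1: terms shrink toward 0
big   = power_sequence(2.0, 60)    # |a| > 1: terms blow up
alt   = power_sequence(-1.0, 60)   # |a| = 1, a != 1: terms oscillate

assert abs(small[-1]) < 1e-15      # (-1/2)^60 is essentially 0
assert big[-1] > 1e15              # 2^60 is huge
assert alt[-2] == -1.0 and alt[-1] == 1.0   # keeps bouncing, no limit
```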
(b) Powers of Complex Numbers
Next, we consider the behaviour of the powers c^n of a complex number c = a + bi. Here we first need to clarify the notion of the limit of a sequence {c_n}_{n≥1} of complex numbers.

Definition. If c_n = a_n + i b_n, n = 1, 2, …, is a sequence of complex numbers such that

a = lim_{n→∞} a_n and b = lim_{n→∞} b_n

exist, then we say that c = a + ib is the limit of {c_n}_{n≥1} and write

lim_{n→∞} c_n = a + ib.
Remarks. 1) If either lim_{n→∞} a_n or lim_{n→∞} b_n does not exist, then neither does lim_{n→∞} c_n.

2) The following rule gives an alternate definition of the limit of the sequence {c_n}_{n≥1} (and is sometimes useful for calculating the limit c):

lim_{n→∞} c_n = c ⇔ lim_{n→∞} |c_n − c| = 0,

where | · | denotes the complex absolute value.

[To see this equivalence, write c_n = a_n + i b_n and c = a + ib, so |c_n − c|^2 = (a_n − a)^2 + (b_n − b)^2. Then lim c_n = c ⇔ lim a_n = a and lim b_n = b ⇔ lim |a_n − a| = 0 and lim |b_n − b| = 0 ⇔ lim (a_n − a)^2 = 0 and lim (b_n − b)^2 = 0 ⇔ lim ((a_n − a)^2 + (b_n − b)^2) = 0 ⇔ lim |c_n − c| = 0.]
3) We also observe that if c = lim_{n→∞} c_n exists, then we have

lim_{n→∞} |c_n| = |c|.

[For lim |c_n|^2 = lim(a_n^2 + b_n^2) = lim a_n^2 + lim b_n^2 = a^2 + b^2 = |c|^2, and hence lim |c_n| = |c|.]
Theorem 7.1 naturally extends to complex numbers in the following way.
Theorem 7.2. Let α ∈ C.

(a) If |α| < 1, then lim_{n→∞} α^n = 0. ("The powers spiral in to 0.")

(b) If |α| > 1, then lim_{n→∞} α^n = ∞. ("The powers spiral out to ∞.")

(c) If |α| = 1, then lim_{n→∞} α^n does not exist except if α = 1.

Proof. (a) If |α| < 1, then lim |α^n| = lim |α|^n = 0, and hence lim α^n = 0 by Remark 2) above (with c = 0). (Here we have used the multiplicative rule |α^n| = |α|^n of the absolute value.)

(b) If |α| > 1, then the limit lim |α^n| = lim |α|^n = ∞ does not exist, and hence neither does the limit lim α^n by Remark 3) above.

(c) Suppose that β := lim α^n exists and that |α| = 1. Then |β| = lim |α^n| = lim |α|^n = 1, so β ≠ 0. Thus we have αβ = α lim α^n = lim α^{n+1} = β, so αβ = β and hence α = 1.
Corollary. lim_{n→∞} α^n exists if and only if α = 1 or |α| < 1. Moreover,

lim_{n→∞} α^n = 0 if |α| < 1.
Example 7.2. Analyze the behaviour of the powers of α = 5/6 + (5/12)i, β = 11/12 + (11/24)i and γ = cos(2π/100) + i sin(2π/100).

Solution. Since |α| = (5/12)√5 ≐ 0.931694990 < 1, the powers α^n spiral in to 0, as is shown in figure 7.1. Since |β| = (11/24)√5 ≐ 1.024864490 > 1, the powers β^n spiral out to ∞, as is evident in figure 7.1. Finally, since |γ| = 1, the powers γ^n run around on the unit circle (cf. figure 7.1).
[Figure 7.1: Powers of the numbers α = (5/12)(2 + i), β = (11/24)(2 + i) and γ = cos(2π/100) + i sin(2π/100), plotted in the complex plane.]
Exercises 7.2.
1. Calculate the following limits whenever they exist:

(a) lim_{n→∞} (1 + (−1)^n i)/(1 + 2^n i);  (b) lim_{n→∞} (5 + (−1)^n i)/(1 + 2^{10000} i);  (c) lim_{n→∞} ((−2)^n + i)/(1 + 2^n);

(d) lim_{n→∞} Σ_{k=0}^{n} ((1 + i)/2)^k;  (e) lim_{n→∞} Σ_{k=0}^{n} e^{ki(1+i)}.
2. (a) Find the first n for which |(3 + 4i)n| > 107.
(b) Find the first n for which |(.3 + .4i)n| < 10−7.
3. Write out a detailed proof of Theorem 7.1.
7.3 Sequences of powers of matrices
We now extend the results of the previous section about limits of powers of numbers to limits of powers of matrices. Before doing so, we first have to clarify the meaning of a limit for matrices.

Definition. Let A_1, A_2, A_3, … be a sequence of m × n matrices. Then we say that

A = lim_{k→∞} A_k

if A = (a_ij) and lim_{k→∞} a^{(k)}_{ij} = a_ij for all i, j with 1 ≤ i ≤ m and 1 ≤ j ≤ n, where A_k = (a^{(k)}_{ij}).
Example 7.3. (a) If A_k = [ 1/k 1; 1/k^2 (k+1)/k ], then lim_{k→∞} A_k = [ 0 1; 0 1 ] because

lim_{k→∞} 1/k = 0 = lim_{k→∞} 1/k^2 and lim_{k→∞} (k + 1)/k = 1.

(b) lim_{k→∞} [ (−1)^k 0; 0 1 ] does not exist!
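Since matrix limits are taken entry by entry, they are easy to explore numerically. A minimal NumPy sketch of Example 7.3(a) (the function name A_k is ours):

```python
# Entrywise matrix limits: for large k, A_k should be close to the
# claimed limit matrix in every entry.
import numpy as np

def A_k(k):
    # the 2x2 matrix of Example 7.3(a)
    return np.array([[1.0 / k, 1.0],
                     [1.0 / k**2, (k + 1.0) / k]])

limit = np.array([[0.0, 1.0],
                  [0.0, 1.0]])

# For large k, A_k agrees with the limit to within 1e-5 in every entry.
assert np.allclose(A_k(10**6), limit, atol=1e-5)
```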
Theorem 7.3 (Rules for limits). Suppose that lim_{k→∞} A_k = A exists and let B, C be matrices such that the product C A_k B is defined. Then:

(a) lim_{k→∞} (A_k B) = AB,

(b) lim_{k→∞} (C A_k) = CA,

(c) lim_{k→∞} (C A_k B) = CAB.

Proof. Use the definition of matrix multiplication and properties of (usual) limits.

Definition. A square matrix A is called power convergent if lim_{k→∞} A^k exists.
Example 7.4. (a) If A = Diag(1/2, 1/3, 1/4), then

lim_{k→∞} A^k = lim_{k→∞} [ (1/2)^k 0 0; 0 (1/3)^k 0; 0 0 (1/4)^k ] = [ 0 0 0; 0 0 0; 0 0 0 ],

so A is power convergent.

(b) A = [ 1 0; 0 −1 ] is not power convergent, for A^k = [ 1 0; 0 (−1)^k ], and we know that the sequence (−1)^k does not converge.
In view of Theorem 7.2, the above examples show the following general rule:
Theorem 7.4. A diagonal matrix A = Diag(λ_1, λ_2, …, λ_m) is power convergent if and only if for each i with 1 ≤ i ≤ m we have:

(3) either |λ_i| < 1 or λ_i = 1.

More generally, a diagonable matrix A is power convergent if and only if each eigenvalue λ_i of A satisfies (3).
Proof. Since A is diagonable, there is a matrix P such that D := P^{−1}AP is a diagonal matrix D = Diag(λ_1, …, λ_m). Thus, for every k ≥ 1 we have

A^k = (PDP^{−1})^k = PD^kP^{−1}, by Theorem 6.1.

Thus, by Theorem 7.3:

lim_{k→∞} A^k = lim_{k→∞} PD^kP^{−1} = P (lim_{k→∞} D^k) P^{−1} = P (lim_{k→∞} Diag(λ_1^k, …, λ_m^k)) P^{−1}.

This means that

A is power convergent ⇔ D is power convergent ⇔ (3) holds for all λ_i's (the latter by Theorem 7.2).

Remark. Note that the above proof also shows the following more general fact: if A and B = P^{−1}AP are two similar matrices, then A is power convergent ⇔ B is power convergent. Moreover, if this is the case then we have

lim_{k→∞} A^k = P (lim_{k→∞} B^k) P^{−1}.
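The remark can be illustrated with NumPy: for a diagonable A = PDP^{−1} whose eigenvalues satisfy (3), a large power of A agrees with P (lim D^k) P^{−1}. A sketch (the matrices P and D below are our own illustrative choices):

```python
# Power convergence through similarity: A = P D P^{-1} with D = Diag(1, 1/2),
# so lim A^k = P Diag(1, 0) P^{-1}.
import numpy as np

P = np.array([[1.0, 1.0],
              [0.0, 1.0]])
D = np.diag([1.0, 0.5])              # eigenvalues 1 and 1/2 satisfy condition (3)
A = P @ D @ np.linalg.inv(P)

A_big = np.linalg.matrix_power(A, 60)   # A^60: essentially the limit already
D_lim = np.diag([1.0, 0.0])             # lim D^k = Diag(1, 0)
expected = P @ D_lim @ np.linalg.inv(P)

assert np.allclose(A_big, expected, atol=1e-12)
```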
When is a general matrix power convergent? Following the strategy introduced in section 6.6, we first analyze this question for Jordan blocks, then for Jordan matrices, and finally for arbitrary matrices.
Theorem 7.5. A Jordan block J = J(λ,m) is power convergent if and only if
(4) either: |λ| < 1 or: λ = 1 and m = 1.
Example 7.5. a) If J = J(1/2, 2) = [ 1/2 1; 0 1/2 ], then

J^k = [ (1/2)^k k(1/2)^{k−1}; 0 (1/2)^k ] → [ 0 0; 0 0 ].

Thus, J is power convergent and converges to 0.

b) If J = J(1, 2) = [ 1 1; 0 1 ], then J^k = [ 1 k; 0 1 ], so the sequence does not converge, i.e. J is not power convergent.
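Both parts of Example 7.5 can be confirmed numerically; the following sketch computes the Jordan block powers directly (the variable names are ours):

```python
# Powers of the two Jordan blocks of Example 7.5.
import numpy as np

J_half = np.array([[0.5, 1.0],
                   [0.0, 0.5]])    # J(1/2, 2): power convergent, limit 0
J_one  = np.array([[1.0, 1.0],
                   [0.0, 1.0]])    # J(1, 2): not power convergent

k = 80
Jh_k = np.linalg.matrix_power(J_half, k)
# (1/2)^k and k(1/2)^{k-1} both tend to 0
assert np.allclose(Jh_k, np.zeros((2, 2)), atol=1e-12)

Jo_k = np.linalg.matrix_power(J_one, k)
# the (1,2) entry of J(1,2)^k is exactly k, so the powers cannot converge
assert Jo_k[0, 1] == 80.0
```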
Proof of Theorem 7.5. Recall from chapter 6, Theorem 6.5, that J(λ, m)^n is the upper triangular matrix with λ^n along the diagonal, nλ^{n−1} along the first superdiagonal, and C(n, m−1)λ^{n−m+1} in the top right corner; i.e.

J(λ, m)^n = (a^{(n)}_{ij}), where a^{(n)}_{ij} = 0 if i > j and a^{(n)}_{ij} = C(n, j−i) λ^{n−(j−i)} if i ≤ j

(here C(n, k) denotes the binomial coefficient).
We now check the following four different cases:
Case 1: |λ| ≥ 1 (but λ ≠ 1)

Since the sequence a^{(n)}_{11} = λ^n doesn't converge as n → ∞ (cf. Theorem 7.2), neither does J(λ, m)^n.

Case 2: λ = 1 and m > 1

Since the sequence a^{(n)}_{12} = nλ^{n−1} = n does not converge as n → ∞, neither does J(λ, m)^n. (Note that since m > 1, this entry is actually present.)

Case 3: λ = 1 and m = 1

The sequence of 1 × 1 matrices {(1)^n} clearly converges to the 1 × 1 matrix J^∞ = (1).
Case 4: |λ| < 1

In this case we shall show that {a^{(n)}_{ij}} converges for all i, j. In view of the above formula for a^{(n)}_{ij}, this follows once we have shown:

Claim. |λ| < 1 ⇒ lim_{n→∞} C(n, a) λ^{n−a} = 0 for all a ∈ Z (a ≥ 0).
Proof of claim. Since C(n, a) = n(n−1)⋯(n−a+1)/a! ≤ n · n · ⋯ · n = n^a, we have

|C(n, a) λ^{n−a}| ≤ |n^a λ^{n−a}| = (1/|λ|^a)|n^a λ^n| = (1/|λ|^a) n^a e^{n log|λ|}.
We now use the following fact from Analysis:

lim_{t→∞} t^a e^{−tb} = 0, if a, b > 0,

i.e. exponentials grow much faster than polynomials. This is applicable here because −b = log|λ| < 0 (since |λ| < 1), and so we see that

C(n, a) λ^{n−a} → 0, as n → ∞.

This, therefore, proves the claim. Putting these four cases together yields the assertion of Theorem 7.5.
For future reference, we note that in the course of the above proof we had also shown:
Corollary. If |λ| < 1, then lim_{n→∞} J(λ, m)^n = 0.
We now want to extend Theorem 7.5 to general matrices. To state the result in a convenient form, we first introduce the following two definitions.

Definition. An eigenvalue λ_i of a matrix A is called regular if its geometric multiplicity is the same as its algebraic multiplicity, i.e. if m_A(λ_i) = ν_A(λ_i).

Remark. Actually, this concept was already introduced in section 6.2. Recall that we saw there that an eigenvalue λ_i of a matrix A is regular if and only if all the Jordan blocks with eigenvalue λ_i of its associated Jordan canonical form J are of size 1 × 1; cf. Corollary 3 of Theorem 6.4.

In particular, A is diagonable if and only if all of its eigenvalues are regular (cf. Theorem 6.4, Corollary 2′).
Example 7.6. If A = [ 5 0; 0 5 ], then 5 is a regular eigenvalue: ν_A(5) = m_A(5) = 2; if B = [ 5 1; 0 5 ], then 5 is not a regular eigenvalue: ν_B(5) = 1, m_B(5) = 2.
Definition. An eigenvalue λ_i of A is called dominant if we have

|λ_j| < |λ_i|, for every eigenvalue λ_j ≠ λ_i of A.

Example 7.7. a) If A = Diag(1, 2, 3, 4, 4), then 4 is a dominant eigenvalue of A.

b) If A = Diag(1, 2, 3, 4, −4), then there is no dominant eigenvalue because |−4| = |4|.

c) If A = Diag(1, i), there is no dominant eigenvalue.
We can now characterize power convergent matrices as follows.
Theorem 7.6. A square matrix A is power convergent if and only if

(5) either |λ_i| < 1 for all eigenvalues λ_i of A, or 1 is a regular, dominant eigenvalue of A.
Example 7.8. a) Determine whether A = [ 0 −1; 1 0 ] is power convergent.

Here ch_A(t) = t^2 + 1 = (t − i)(t + i), so λ_1 = i, λ_2 = −i. Thus A is not power convergent, since |λ_1| = |λ_2| = 1. [In fact: A^2 = −I and A^4 = I, so A is clearly not power convergent.]

b) A = [ 1 0 0; 0 1/2 1; 0 0 1/2 ] is power convergent: since ch_A(t) = (t − 1)(t − 1/2)^2, we see that λ_1 = 1 is a dominant, regular eigenvalue (the latter because ν_A(λ_1) = m_A(λ_1) = 1).
Proof of Theorem 7.6. Step 1. The theorem is true for Jordan blocks J(λ, m) by Theorem 7.5.

Step 2. It is true for Jordan matrices J = Diag(J_11, …) because it is true for each block by step 1; cf. Problem 4 of Exercises 7.3.

Step 3. If A is an arbitrary matrix, then by Jordan's theorem A = PJP^{−1} for some Jordan matrix J. Thus, by the remark following Theorem 7.4 we see that A is power convergent ⇔ J is power convergent ⇔ (5) holds for J (by step 2) ⇔ (5) holds for A.
Exercises 7.3.
1. Which of the following matrices are power convergent? Justify your answer.
(1/2)[ 1 1; −1 3 ];  (1/2)[ 0 1; −2 3 ];  (1/3)[ 1 1 1; 0 3 2; 4 −2 1 ];  (1/3)[ 5 −8 2; 4 −7 2; 0 0 1 ].
2. Determine which of the following matrices are power convergent.

A = (1/2)[ 0 0 1 1; 0 0 −1 3; 0 1 0 0; −2 3 0 0 ]  and  B = [ 0 0 0 1; 0 0 −1 0; 0 1 0 0; 2 0 0 0 ].
3. (a) Is the matrix A = (1/2)[ 1 0 1 0; 0 1 0 1; 1 0 1 0; 0 1 0 1 ] power convergent? Explain.

(b) Find all α ∈ C such that the matrix B = [ 0 1 0; 0 0 1; α^3 −3α^2 3α ] is power convergent. Justify your answer!
4. (a) Show that A = Diag(B,C) is power convergent if and only if both B and C arepower convergent.
(b) Show that A = Diag(A1, . . . , Ar) is power convergent if and only if each Ai,1 ≤ i ≤ r, is power convergent.
5. Show that if A is a power convergent matrix then | det(A)| ≤ 1.
6. (a) If A is an integral 2× 2 matrix which is power convergent, prove that its onlyeigenvalues are 0 and 1.
(b) Use part (a) to write down all integral 2×2 matrices which are power convergent.
7.4 Finding lim_{n→∞} A^n (when A is power convergent)
Having analyzed when A is power convergent, i.e. when the limit lim_{n→∞} A^n exists, we now want to determine this limit.
Naive method: find an explicit formula for An and use it to calculate the limit of eachentry.
Better method: use the spectral decomposition theorem (without calculating all the E_ik's explicitly).
Example 7.9. If A = [ 0 1; −2 3 ], B = (1/3)A and C = (1/2)A, find lim_{n→∞} B^n and lim_{n→∞} C^n.
Solution. (a) Here ch_A(t) = (t − 1)(t − 2),

⇒ ch_B(t) = (t − 1/3)(t − 2/3)

⇒ B is power convergent and B^n = (1/3)^n E_10 + (2/3)^n E_20

⇒ lim_{n→∞} B^n = 0 since (1/3)^n, (2/3)^n → 0.
(b) As before, ch_A(t) = (t − 1)(t − 2),

⇒ ch_C(t) = (t − 1/2)(t − 2/2)

⇒ C is power convergent and C^n = (1/2)^n E_10 + 1^n E_20

⇒ lim_{n→∞} C^n = E_20.

Thus, to find the limit, it is enough to calculate E_20. Using the formula for the constituent (polynomials and) matrices (cf. (9) and (13)) we obtain

E_20 = (1/(λ_2 − λ_1))(A − λ_1 I) = [ −1 1; −2 2 ],

and hence lim_{n→∞} C^n = [ −1 1; −2 2 ].
Note that in the above example, we were able to calculate the limits without first calculating B^n and C^n in detail. Generalizing this idea leads to the following two theorems.

Theorem 7.7. If |λ_i| < 1 for every eigenvalue λ_i of A, then A is power convergent and lim_{n→∞} A^n = 0.
Proof. By the Corollary of Theorem 7.5, this is true for Jordan blocks. Using the sameargument as in the proof of Theorem 7.6, the assertion follows for an arbitrary matrix.
Theorem 7.8. If λ_1 = 1 is a regular, dominant eigenvalue of A, then A is power convergent and we have

lim_{n→∞} A^n = E_10 ≠ 0,

where E_10 is the first constituent matrix of A associated to λ_1 = 1. Moreover, the other constituent matrices E_1k associated to λ_1 = 1 are equal to zero:

E_11 = ⋯ = E_{1,m_1−1} = 0.
Corollary. lim_{n→∞} A^n = 0 ⇔ |λ_i| < 1 for all eigenvalues λ_i of A.
Proof. (⇐) Theorem 7.7.
(⇒) If lim_{n→∞} A^n = 0, then A is in particular power convergent. By Theorem 7.6, A satisfies either the hypothesis of Theorem 7.7 or that of 7.8. However, in the latter case we have lim A^n ≠ 0, so we must be in the situation of Theorem 7.7.

It remains to prove Theorem 7.8. Before proving it, however, let us illustrate it with the following example.
Example 7.10. Find lim_{n→∞} A^n when A = (1/4)B and B = [ 3 0 0 1; 1 3 −1 −1; 1 −1 3 −1; 1 0 0 3 ].
Solution. Expanding det(B − tI) along the 1st or last row yields ch_B(t) = (t − 4)^2(t − 2)^2, and so ch_A(t) = (t − 1)^2(t − 1/2)^2. Row reducing B − 4I yields

B − 4I = [ −1 0 0 1; 1 −1 −1 −1; 1 −1 −1 −1; 1 0 0 −1 ] ↔ [ 1 0 0 −1; 0 −1 −1 0; 0 0 0 0; 0 0 0 0 ],

and so we see that ν_A(1) = ν_B(4) = 4 − 2 = 2 = m_A(1); thus 1 is a regular eigenvalue. Moreover, 1 is also dominant (because |λ_2| = 1/2 < 1), so A is power convergent. Thus, by Theorem 7.8 we have lim_{n→∞} A^n = E_10 (and E_11 = 0).

To find the limit E_10, write down the spectral decomposition theorem:
f(A) = f(1)E_10 + f′(1)E_11 + f(1/2)E_20 + f′(1/2)E_21, for every f ∈ C[t],

where E_11 = 0. Choose f such that f(1/2) = f′(1/2) = 0, i.e. f(t) = (2t − 1)^2. Then f(1) = (2·1 − 1)^2 = 1, so

(2A − I)^2 = f(1)E_10 + f′(1)·0 + 0·E_20 + 0·E_21 ⇒ E_10 = (2A − I)^2.
Thus

lim_{n→∞} A^n = E_10 = (2A − I)^2 = (1/4)(4A − 2I)^2 = (1/4)(B − 2I)^2

= (1/4)[ 1 0 0 1; 1 1 −1 −1; 1 −1 1 −1; 1 0 0 1 ]^2 = (1/4)[ 2 0 0 2; 0 2 −2 0; 0 −2 2 0; 2 0 0 2 ].
To prove Theorem 7.8, we first verify the following fact:
Theorem 7.9. If λ_1 is a regular eigenvalue of A, then its associated constituent matrices E_10, …, E_{1,m_1−1} satisfy:

E_10 ≠ 0 and E_11 = ⋯ = E_{1,m_1−1} = 0.
Proof. We follow the usual strategy by considering first (special) Jordan matrices, and then arbitrary matrices.

Case 1. A = J is a Jordan matrix with only one (regular) eigenvalue λ_1.

Then, since λ_1 is a regular eigenvalue, we have J = λ_1 I, and so for every f ∈ C[t]

f(J) = f(λ_1)I = f(λ_1)I + f′(λ_1)·0 + ⋯ + f^{(m_1−1)}(λ_1)·0.
On the other hand, by the spectral decomposition theorem we have
f(J) = f(λ_1)E_10 + f′(λ_1)E_11 + ⋯ + f^{(m_1−1)}(λ_1)E_{1,m_1−1}, for all f ∈ C[t],

and so it follows that

E_10 = I, E_11 = 0, …, E_{1,m_1−1} = 0,

either by appealing to the uniqueness property of the spectral decomposition or, more directly, as follows.

Take f(t) = 1. Then f(λ_1) = 1 and f′(λ_1) = ⋯ = f^{(m_1−1)}(λ_1) = 0, so I = f(λ_1)I = f(J) = 1·E_10 + 0·E_11 + ⋯ + 0·E_{1,m_1−1}, which means that E_10 = I.

Next, take f(t) = (t − λ_1)^k (k > 0). Then f^{(k)}(λ_1) = k! and f^{(l)}(λ_1) = 0 for l ≠ k, so k!E_1k = f(J) = f(λ_1)I = 0 and hence E_1k = 0.
Case 2. A = Diag(J1, J2), where J1 = λ1I and λ1 is not an eigenvalue of J2.
Then E^A_{1k} = Diag(E^{J_1}_{1k}, 0) (because λ_1 is not an eigenvalue of J_2; cf. Theorem 7.11(b) below). But by case 1 we have E^{J_1}_{1k} = 0 for 1 ≤ k ≤ m_1 − 1, and so it also follows that E^A_{1k} = 0 for k ≥ 1. Moreover, E^A_{10} = Diag(I, 0) ≠ 0. Thus, the assertion is true in this case as well.
Case 3. A arbitrary.
By Jordan's theorem (and the fact that λ_1 is regular), there is a matrix P such that

J = P^{−1}AP = [ λ_1 I 0; 0 J_2 ]

has the form of case 2. Thus, since A = PJP^{−1}, we have for 1 ≤ k ≤ m_1 − 1 that

E^A_{1k} = P E^J_{1k} P^{−1} = P·0·P^{−1} = 0, by case 2.

Moreover, E^A_{10} = P E^J_{10} P^{−1} = P Diag(I_{m_1}, 0) P^{−1} ≠ 0, and so the theorem follows.
Corollary. If λ_i is a regular eigenvalue of A, then

E_i0 = (1/g_i(λ_i)) g_i(A), where g_i(t) = ch_A(t)/(t − λ_i)^{m_i}.

Proof. Take f(t) = g_i(t) in the spectral decomposition formula. Since g_i(t) has a zero of order m_j at each λ_j ≠ λ_i (so the terms f^{(k)}(λ_j)E_jk with j ≠ i all vanish), and since E_ik = 0 for k ≥ 1 by Theorem 7.9, the formula reduces to g_i(A) = g_i(λ_i)E_i0. Thus, the assertion follows since g_i(λ_i) ≠ 0.
Proof of Theorem 7.8. Let λ_1 = 1, λ_2, …, λ_s denote the distinct eigenvalues of A and m_1, …, m_s their respective algebraic multiplicities. Then by the spectral decomposition theorem applied to f(t) = t^n we obtain

A^n = Σ_{i=1}^{s} Σ_{k=0}^{m_i−1} k! C(n, k) λ_i^{n−k} E_ik

because f^{(k)}(t) = n(n−1)⋯(n−k+1) t^{n−k} = k! C(n, k) t^{n−k}. By hypothesis, λ_1 = 1 is a regular eigenvalue, and so E_11 = ⋯ = E_{1,m_1−1} = 0 by Theorem 7.9. Thus, since λ_1^n = 1 for all n, we obtain

A^n = E_10 + Σ_{i=2}^{s} Σ_{k=0}^{m_i−1} k! C(n, k) λ_i^{n−k} E_ik.

Now since λ_1 = 1 is a dominant eigenvalue, we have |λ_i| < 1 for all i > 1, and so by the claim in the proof of Theorem 7.5 the coefficients of the above sum tend to 0 as n → ∞. Thus lim_{n→∞} A^n = E_10, and E_10 ≠ 0 by Theorem 7.9.
The above formula for E_i0 (cf. Theorem 7.9, Corollary) can be used to quickly compute the limit lim_{n→∞} A^n in many cases.
Example 7.11. Find lim_{n→∞} A^n when A = (1/4)[ 3 0 −1; 0 4 0; −1 0 3 ].
Solution. First note that since A is real and symmetric, it is diagonable by the Principal Axis Theorem 5.4, and so we could compute the limit by diagonalizing A. However, it is quicker to use Theorem 7.9 and the above corollary; the latter is applicable since all eigenvalues of a diagonable matrix are regular.
The characteristic polynomial of A is

ch_A(t) = (−1)^3 det(A − tI) = −det((1/4)[ 3 − 4t 0 −1; 0 4 − 4t 0; −1 0 3 − 4t ])

= −(1/4^3)((4 − 4t)(3 − 4t)^2 − (4 − 4t)) = (t − 1)^2(t − 1/2).
Thus, 1 is a dominant, regular eigenvalue of A. Since g_1(t) = ch_A(t)/(t − 1)^2 = t − 1/2, it follows from Theorem 7.8 and the above corollary to Theorem 7.9 that

lim_{n→∞} A^n = E_10 = g_1(A)/g_1(1) = 2A − I = (1/2)[ 1 0 −1; 0 2 0; −1 0 1 ].
In the case that λ_i is a simple eigenvalue, there is a more direct formula for E_i0. Here, the term "simple" means:
324 Chapter 7: Powers of Matrices
Definition. An eigenvalue λ_i of A is called simple if its algebraic multiplicity is 1, i.e. if m_A(λ_i) = 1.

Note. Clearly, each simple eigenvalue is regular. [Indeed, if λ is a simple eigenvalue then 1 = m_A(λ) ≥ ν_A(λ) ≥ 1, so m_A(λ) = ν_A(λ), i.e. λ is regular.]
Before giving the formula for Ei0 when λi is simple, we first note
Observation. A square matrix A and its transpose A^t have the same eigenvalues with the same algebraic and geometric multiplicities; i.e. we have

m_{A^t}(λ) = m_A(λ) and ν_{A^t}(λ) = ν_A(λ), for all λ ∈ C.

Indeed, since det(A^t) = det(A), we see that ch_{A^t}(t) = ch_A(t), and so the algebraic multiplicities are the same. Furthermore, ν_{A^t}(λ) = n − rank(A^t − λI) = n − rank((A − λI)^t) = n − rank(A − λI) = ν_A(λ), and so the geometric multiplicities are also the same.

In fact, one can also show that A and A^t are similar (see Problem 4 of the exercises), so they actually have the same Jordan canonical form.
Theorem 7.10. If A is power convergent and 1 is a simple eigenvalue of A, then

lim_{n→∞} A^n = E_10 = (1/(~u^t · ~v)) ~u · ~v^t

(note that ~u^t · ~v is a scalar, while ~u · ~v^t is a matrix), where ~u ∈ E_A(1) is any non-zero 1-eigenvector of A, and ~v ∈ E_{A^t}(1) is any non-zero 1-eigenvector of A^t.
Proof. Let P be such that J = P^{−1}AP is a Jordan matrix, which we may take to be of the form J = Diag(J(1, 1), …). Then, since |λ_i| < 1 for i > 1, we see that

(6) lim_{n→∞} A^n = P (lim_{n→∞} J^n) P^{−1} = P Diag(1, 0, …, 0) P^{−1} = P ~e_1 ~e_1^t P^{−1},

where ~e_1 = (1, 0, …, 0)^t. Now since ~e_1 is a 1-eigenvector of J, it follows that ~x := P~e_1 (= the first column of P) is a 1-eigenvector of A, because A~x = AP~e_1 = PJ~e_1 = P~e_1 = ~x.
Similarly, ~y^t := ~e_1^t P^{−1} (= the first row of P^{−1}) is (the transpose of) a 1-eigenvector of A^t. Indeed, since ~e_1^t J = ~e_1^t, we see that ~y^t A = ~e_1^t P^{−1} A = ~e_1^t J P^{−1} = ~e_1^t P^{−1} = ~y^t. Taking the transpose yields A^t ~y = ~y, which means that ~y is a 1-eigenvector of A^t. Thus by the above equation (6) we obtain
lim_{n→∞} A^n = P ~e_1 ~e_1^t P^{−1} = ~x ~y^t = (1/(~x^t ~y)) ~x ~y^t,

the latter because ~x^t~y = ~y^t~x = ~e_1^t P^{−1} P ~e_1 = ~e_1^t ~e_1 = 1. Thus, the theorem holds if we take ~u = ~x and ~v = ~y.
Now if we make another choice of ~u, then we must have ~u = k~x for some k ≠ 0, because the 1-eigenspace of A is one-dimensional. (Recall that the algebraic multiplicity m_A(1) = 1 by hypothesis, and so the corresponding geometric multiplicity is also ν_A(1) = 1.) Similarly, any other choice of ~v has the form ~v = c~y for some c ≠ 0.

We then have ~u^t~v = (k~x^t)(c~y) = kc~x^t~y = kc, and similarly ~u~v^t = kc~x~y^t. Thus,

(1/(~u^t~v)) ~u~v^t = ~x~y^t = lim_{n→∞} A^n,

as desired.
Example 7.12. Find lim_{n→∞} A^n when A = (1/2)[ 2 0 3; 0 2 2; −1 1 0 ].

Solution. Here ch_A(t) = (t − 1)(t − 1/2)^2, so 1 is a simple dominant eigenvalue, and hence A is power convergent. Moreover,

E_A(1) = {c(1, 1, 0)^t} because A − I = (1/2)[ 0 0 3; 0 0 2; −1 1 −2 ],

E_{A^t}(1) = {c(2, −3, 0)^t} because A^t − I = (1/2)[ 0 0 −1; 0 0 1; 3 2 −2 ],

so we can take ~u = (1, 1, 0)^t and ~v = (2, −3, 0)^t in Theorem 7.10.
Now

~u^t · ~v = (1, 1, 0)(2, −3, 0)^t = −1,

~u · ~v^t = (1, 1, 0)^t (2, −3, 0) = [ 2 −3 0; 2 −3 0; 0 0 0 ],

hence

lim_{n→∞} A^n = (1/(~u^t~v)) ~u~v^t = (1/(−1))[ 2 −3 0; 2 −3 0; 0 0 0 ] = [ −2 3 0; −2 3 0; 0 0 0 ].
Appendix: Further Properties of Constituent Matrices
In order to state these properties in a convenient manner, it is useful to introduce thefollowing notation.
Notation. As usual, let A be an m × m matrix with distinct eigenvalues λ_1, …, λ_s of multiplicities m_1, …, m_s, respectively, so

ch_A(t) = (t − λ_1)^{m_1} ⋯ (t − λ_s)^{m_s},

and let E_ik be the constituent matrices of A. Note that the numbering of the E_ik's depends on the ordering of the eigenvalues, so it is useful to label the E_ik's in a way which is independent of the numbering.
For any λ ∈ C and any k ≥ 0 put

E^A_{λ,k} = E_ik if λ = λ_i and k ≤ m_i − 1, and E^A_{λ,k} = 0_{m×m} otherwise.
We then have:
Theorem 7.11. Let λ ∈ C and let k ≥ 0 be an integer.

(a) If B = P^{−1}AP, then E^B_{λ,k} = P^{−1} E^A_{λ,k} P.

(b) If A = Diag(A_1, A_2, …, A_r), then E^A_{λ,k} = Diag(E^{A_1}_{λ,k}, …, E^{A_r}_{λ,k}).
Proof. (a) By the spectral decomposition formula (Theorem 5.12) we have for every f ∈ C[t] that

f(A) = Σ_{i=1}^{s} Σ_{k=0}^{m_i−1} f^{(k)}(λ_i) E^A_{λ_i,k},

so by Theorem 5.2 we obtain

f(B) = P^{−1} f(A) P = Σ_{i=1}^{s} Σ_{k=0}^{m_i−1} f^{(k)}(λ_i) P^{−1} E^A_{λ_i,k} P.

Since B has the same eigenvalues and multiplicities as A, the spectral decomposition formula for B gives

f(B) = Σ_{i=1}^{s} Σ_{k=0}^{m_i−1} f^{(k)}(λ_i) E^B_{λ_i,k}.
By the uniqueness of the matrices appearing in the spectral decomposition formula, it thus follows that E^B_{λ_i,k} = P^{−1} E^A_{λ_i,k} P for 1 ≤ i ≤ s and 0 ≤ k ≤ m_i − 1. Thus, if λ = λ_i for some i, and if 0 ≤ k ≤ m_i − 1, then we see that the assertion holds. On the other hand, if these conditions do not hold, then E^A_{λ,k} = 0 = E^B_{λ,k}, so the assertion holds here as well.

(b) Let λ_1, …, λ_s be the eigenvalues of A = Diag(A_1, …, A_r). Since the eigenvalues of A_j, for 1 ≤ j ≤ r, are a subset of those of A, and since m_{A_j}(λ_i) ≤ m_A(λ_i), we can write the spectral decomposition formula for A_j in the form

f(A_j) = Σ_{i=1}^{s} Σ_{k=0}^{m_i−1} f^{(k)}(λ_i) E^{A_j}_{λ_i,k}
because the extra terms in this sum are all 0 by our conventions. Thus, by Theorem 5.10 we have for any f ∈ C[t] that

f(A) = Diag(f(A_1), …, f(A_r)) = Σ_{i=1}^{s} Σ_{k=0}^{m_i−1} f^{(k)}(λ_i) Diag(E^{A_1}_{λ_i,k}, …, E^{A_r}_{λ_i,k}).
Comparing this to the spectral decomposition formula of A, we conclude as in part (a) that E^A_{λ_i,k} = Diag(E^{A_1}_{λ_i,k}, …, E^{A_r}_{λ_i,k}), for 1 ≤ i ≤ s and 0 ≤ k ≤ m_i − 1. Thus, the assertion of part (b) holds when λ = λ_i for some i and k ≤ m_i − 1, and hence in general, because in the other cases all the matrices are equal to 0.
Corollary 1. If A = Diag(A_1, …, A_r), then

E^A_{λ,k} = 0 for k ≥ max(m_{A_1}(λ), …, m_{A_r}(λ)).
Proof. By definition we have E^{A_j}_{λ,k} = 0 for k ≥ m_{A_j}(λ), for each j = 1, …, r. Thus, the assertion follows directly from Theorem 7.11(b).
We can use the above Theorem 7.11 and its Corollary to relate the vanishing of the constituent matrices to the maximum size of the Jordan blocks of the Jordan canonical form J = J_A of A. For this, let λ ∈ C and put

t_A(λ) := max{k ≥ 0 : J(λ, k) appears as a Jordan block in J_A}.
Corollary 2. If A is an m × m matrix and if λ ∈ C, then

E^A_{λ,k} = 0 ⇔ k ≥ t_A(λ).
Proof. By Jordan's theorem, J = J_A = P^{−1}AP for some P, so by Theorem 7.11(a) we have E^A_{λ,k} = P E^J_{λ,k} P^{−1}. In particular, E^A_{λ,k} = 0 ⇔ E^J_{λ,k} = 0.

If t_A(λ) = 0, i.e. if λ is not an eigenvalue of J = J_A, then the assertion is clear. Thus, assume that t_A(λ) ≥ 1, say λ = λ_1. Then J = Diag(J_11, …, J_1ν, J′), where J_1j = J(λ_1, k_1j) and J′ is a Jordan matrix with m_{J′}(λ) = 0. Now by Example 5.25 we have that
E^{J_1j}_{λ,k} = (1/k!) J(0, k_1j)^k, and E^{J_1j}_{λ,k} = 0 ⇔ k ≥ k_1j.

Thus, by Theorem 7.11(b) we have that E^J_{λ,k} = (1/k!) Diag(J(0, k_11)^k, …, J(0, k_1ν)^k, 0) = 0 if and only if k ≥ max(k_11, …, k_1ν) = t_J(λ) = t_A(λ), and so the assertion follows.
Remark. 1) Recall from Corollary 3 of Theorem 6.4 that λ is a regular eigenvalue of A if and only if t_A(λ) = 1. Thus, if m_A(λ) ≥ 1, then by Corollary 2 we have that

λ is a regular eigenvalue of A ⇔ E^A_{λ,k} = 0 for all k ≥ 1.
Note that this is a restatement of Theorem 7.9, so Corollary 2 generalizes this theorem.
2) The number t_A(λ) is connected with the generalized geometric multiplicities ν^p_A(λ) in the following way:

ν^k_A(λ) = ν^{k+1}_A(λ) ⇔ k ≥ t_A(λ).

Indeed, by definition of t_A(λ) we have that the Jordan canonical form of A has no Jordan blocks J(λ, k) of size ≥ k if and only if k ≥ t_A(λ) + 1, and by Theorem 6.5(c) we have that this is the case if and only if ν^k_A(λ) = ν^{k−1}_A(λ).
3) If we put

μ_A(t) := (t − λ_1)^{t_A(λ_1)} ⋯ (t − λ_s)^{t_A(λ_s)},

where λ_1, …, λ_s are the distinct eigenvalues of A, then μ_A(t) | ch_A(t) and we have that

μ_A(A) = 0

by the spectral decomposition formula and by Corollary 2. This, therefore, refines the Cayley-Hamilton Theorem. Note that μ_A(t) is the minimal polynomial of A of the Appendix of Chapter 6.
Exercises 7.4.
1. By using shortcuts, find the constituent matrices of the matrices
A = [ 1 0 −1 2 −1; 0 1 0 −3 0; 0 0 1/2 0 0; 0 0 0 1/2 0; 0 0 0 0 −1/3 ]  and  B = (1/2)[ 1 0 0 0 −1; 1 2 0 0 1; 1 0 2 0 1; 1 0 0 2 1; 0 0 0 0 1 ],

and use these to calculate lim_{n→∞} A^n and lim_{n→∞} B^n.
2. Find lim_{n→∞} A^n when

(a) A = (1/4)[ 1 1 0 0; 0 1 0 0; 2 1 1 2; 2 1 3 2 ]  and  (b) A = (1/10)[ 5 2 2; 3 4 6; 2 4 2 ].
3. Let A be an n × n matrix which is power convergent with limit B = lim_{k→∞} A^k, and let m = m_A(1) ≥ 0 denote the algebraic multiplicity of 1. Show:

(a) ch_B(t) = (t − 1)^m t^{n−m};

(b) 0 and 1 are regular eigenvalues of B (whenever they occur).

[Hint: Look at the Jordan canonical form of A.]
4. (a) Let J = J(λ, 3). Show that J is similar to its transpose J^t. In other words, find P such that

P [ λ 1 0; 0 λ 1; 0 0 λ ] P^{−1} = [ λ 0 0; 1 λ 0; 0 1 λ ].

(b) As in (a), but for any Jordan block J = J(λ, m). Extend this to a general Jordan matrix.
(c) Conclude that for any square matrix A, its transpose A^t is similar to A.
7.5 The spectral radius and Gersgorin’s theorem
In much of our preceding work we were able to solve problems concerning a square matrix A provided we knew certain facts about its spectrum, i.e. about its set of eigenvalues. However, often only a crude estimate of where the eigenvalues lie is enough to enable us to solve the problem. For example, to determine that lim_{n→∞} A^n = 0 we need only know that |λ| < 1 for all eigenvalues λ of A.

Now it turns out that it is frequently easy to get such crude estimates by using a remarkable theorem due to Gersgorin.

To be able to state this theorem in a convenient form, we first introduce the following terminology.
Definition. Let A be a square matrix. Its spectral radius is the real number defined by
ρ(A) = max{|λ| : λ ∈ C is an eigenvalue of A}.
Example 7.13. Find the spectral radius of A = Diag(1, i, 1 + i).

Solution. The spectrum of A is {1, i, 1 + i}, so

ρ(A) = max(|1|, |i|, |1 + i|) = max(1, 1, √2) = √2.
With the help of the spectral radius we can restate some of our previous theorems, such as Theorems 7.6, 7.7 and 7.8, in a more convenient form:

Theorem 7.6′. A is power convergent if and only if either ρ(A) < 1, or ρ(A) = 1 and 1 is a dominant, regular eigenvalue of A.

Theorem 7.8′. lim_{n→∞} A^n = 0 ⇔ ρ(A) < 1.

One of the interesting facts concerning the spectral radius is that we can estimate it without knowing the eigenvalues explicitly.
There are several such methods available. The first one is based on Gersgorin's theorem and relates the spectral radius ρ(A) of A to the number

||A|| = max_{1≤i≤m} Σ_{j=1}^{m} |a_ij|,

which is often called the norm of the m × m matrix A = (a_ij).
Remarks. 1) The norm || · || satisfies the following properties:
(0) ||A|| ≥ 0, and ||A|| = 0 ⇔ A = 0,

(1) ||A + B|| ≤ ||A|| + ||B||,

(2) ||cA|| = |c| ||A||, if c ∈ C,

(3) ||A · B|| ≤ ||A|| · ||B||.
To see this, note first that properties (0) and (2) are clear and that (1) follows immediately from the triangle inequality. Finally, to verify (3), write A = (a_ij), B = (b_ij) and C = AB = (c_ij). Then c_ij = Σ_{k=1}^{m} a_ik b_kj, so

Σ_{j=1}^{m} |c_ij| = Σ_j |Σ_k a_ik b_kj| ≤ Σ_j Σ_k |a_ik b_kj| = Σ_k |a_ik| Σ_j |b_kj| ≤ Σ_k |a_ik| · ||B|| ≤ ||A|| · ||B||.
2) Actually, any function || · || : M_m(C) → R satisfying the four properties (0)–(3) is called a (matrix) norm. There are many others; for example, ||A||_t := ||A^t|| also satisfies these properties, as does ||A||_2 := max(||~v_1||, …, ||~v_m||) if A = (~v_1 | … | ~v_m).

Some of these are implemented in MAPLE: ||A|| (the maximum absolute row sum) is given by the MAPLE command norm(A, infinity), ||A^t|| by norm(A, 1), and ||A||_2 by norm(A, 2).
As we shall see presently, the following estimate on the spectral radius follows easily from Gersgorin's theorem (which gives much more precise information about the location of the eigenvalues).
Theorem 7.12. If A is an m×m matrix, then
ρ(A) ≤ min(||A||, ||At||).
Example 7.14. Estimate the spectral radius of

A = [ 0 1/3 1/4 ; 1/2 0 1/4 ; 1/2 1/3 0 ].

Solution. The sums of the absolute values of the columns are 1, 2/3, 1/2, respectively, so ||A|| = 1. On the other hand, the row sums are 7/12, 3/4, 5/6, so ||A^t|| = 5/6. Thus, by Theorem 7.12 we see that

ρ(A) ≤ min(1, 5/6) = 5/6.
Note that this estimate already shows that A is power convergent (with limit 0). To see how good this estimate is, let us compute the characteristic polynomial of A,

ch_A(t) = t^3 − (3/8)t − 1/12,

which doesn't have any rational roots. Its roots are all real; they are (approximately)

−0.420505805, −0.282066739, 0.702574789.

Thus, the spectral radius is

ρ(A) ≈ 0.702574789,

which is not too far away from the bound 5/6 = 0.83333 . . .
Section 7.5: The spectral radius and Gersgorin’s theorem 331
We now turn to Gersgorin's theorem, for which we first require some additional notation (and terminology).

Notation. If A = (a_ij) ∈ M_m(C) is an m × m matrix, then for each k with 1 ≤ k ≤ m put

r_k = Σ_{j≠k} |a_kj|,    D_k = D_k(A) = {z ∈ C : |z − a_kk| ≤ r_k}.

Thus, r_k is the sum of the absolute values of the entries of the k-th row excluding the diagonal element, and D_k is the disc of radius r_k centered at the diagonal element a_kk ∈ C. These discs are called the Gersgorin discs.
Example 7.15. (In each case the discs D_k can be sketched in the complex plane; the figures are omitted here.)

(a) A = [ 0 1 ; 2 2 ]:  a_11 = 0, r_1 = |1| = 1;  a_22 = 2, r_2 = |2| = 2.

(b) A = [ 0 1 ; 1 0 ]:  a_11 = a_22 = 0, r_1 = r_2 = 1, so D_1 = D_2.

(c) A = [ 0 1 −1 ; 0 0 1 ; 1 0 1 ]:  a_11 = 0, r_1 = |1| + |−1| = 2;  a_22 = 0, r_2 = |0| + |1| = 1;  a_33 = 1, r_3 = |1| + |0| = 1.

(d) A = [ i 0 i ; 0 −i 1 ; 1 1 0 ]:  a_11 = i, r_1 = 1;  a_22 = −i, r_2 = 1;  a_33 = 0, r_3 = 2.

(e) A = [ 3i i −1 ; −1 4 1 ; 2 −1 −4+i ]:  a_11 = 3i, r_1 = |i| + |−1| = 2;  a_22 = 4, r_2 = |−1| + |1| = 2;  a_33 = −4+i, r_3 = |2| + |−1| = 3.
Theorem 7.13 (Gersgorin, 1931). Each eigenvalue λ of A lies in at least one Gersgorin disc D_k(A) of A.
Proof. Since λ is an eigenvalue of A, we have A~v = λ~v for some ~v = (v_1, . . . , v_m)^t ≠ ~0. Then, for each k, we have λv_k = (A~v)_k = Σ_{j=1}^m a_kj v_j = a_kk v_k + Σ_{j≠k} a_kj v_j, and so

(7) (λ − a_kk)v_k = Σ_{j≠k} a_kj v_j.

Now choose k such that

|v_k| = max_{1≤j≤m} |v_j|.

Then, applying the triangle inequality to (7) yields:

|λ − a_kk| · |v_k| = | Σ_{j≠k} a_kj v_j | ≤ Σ_{j≠k} |a_kj| |v_j| ≤ ( Σ_{j≠k} |a_kj| ) · |v_k| = r_k · |v_k|.

Since |v_k| ≠ 0 (for otherwise ~v = ~0, contradiction), we obtain |λ − a_kk| ≤ r_k, i.e. λ ∈ D_k(A).
Corollary. Suppose that the diagonal entries of the matrix A = (a_ij) are much larger than its off-diagonal entries in the sense that

(8) |a_kk| > Σ_{j≠k} |a_kj|, for k = 1, . . . , m.

Then A is invertible.

Proof. The hypothesis means that |0 − a_kk| > r_k, or that 0 ∉ D_k(A), for 1 ≤ k ≤ m. By Gersgorin's theorem, this means that 0 is not an eigenvalue of A and hence A is invertible.
Example 7.16. If A is as in Example 7.15(e), then the picture shows that 0 does notlie in any Gersgorin disc, so A is invertible.
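The corollary gives a purely arithmetic invertibility test. The following Python sketch (an illustration, not part of the text) computes the Gersgorin discs and applies the test to the matrix of Example 7.15(e).

```python
def gersgorin_discs(A):
    """Return (center, radius) of each Gersgorin disc D_k of A."""
    return [(A[k][k], sum(abs(A[k][j]) for j in range(len(A)) if j != k))
            for k in range(len(A))]

def strictly_diag_dominant(A):
    """Condition (8): |a_kk| > r_k for all k, i.e. 0 lies in no disc."""
    return all(abs(c) > r for c, r in gersgorin_discs(A))

# The matrix of Example 7.15(e):
A = [[3j, 1j, -1], [-1, 4, 1], [2, -1, -4 + 1j]]
print(gersgorin_discs(A))         # centers 3i, 4, -4+i with radii 2, 2, 3
print(strictly_diag_dominant(A))  # True: 0 is in no disc, so A is invertible
```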
Proof of Theorem 7.12. We first show:

(9) ρ(A) ≤ ||A^t|| = max_k ( Σ_{j=1}^m |a_kj| ).

For this, let λ be an eigenvalue of A of maximal absolute value, i.e. |λ| = ρ(A). Now by Gersgorin's theorem, λ ∈ D_k(A) for some k, i.e.

|λ − a_kk| ≤ Σ_{j≠k} |a_kj|.

Since ρ(A) = |λ| = |(λ − a_kk) + a_kk| ≤ |λ − a_kk| + |a_kk| by the triangle inequality, it follows that

ρ(A) ≤ |λ − a_kk| + |a_kk| ≤ Σ_{j=1}^m |a_kj| ≤ ||A^t||,

which proves (9).
Next we apply (9) to A^t. But since A and A^t have the same eigenvalues, this gives ρ(A) = ρ(A^t) ≤ ||(A^t)^t|| = ||A||. Thus we have ρ(A) ≤ ||A|| and ρ(A) ≤ ||A^t||, and so Theorem 7.12 follows.
For real matrices, there are other methods of estimating the spectral radius. One of these is based on the trace of the matrix.

Definition. If A = (a_ij) is an m × m matrix, then its trace is defined as the sum of its diagonal elements:

tr(A) = a_11 + a_22 + · · · + a_mm.
Example 7.17. a) tr [ 1 2 ; 3 4 ] = 1 + 4 = 5.

b) tr [ 1 −2 3 ; 4 −6 6 ; 7 −8 9 ] = 1 − 6 + 9 = 4.
Properties of the trace. If A, B and P are m × m matrices and P is invertible, then

(1) tr(AB) = tr(BA);
(2) tr(P^{-1}AP) = tr(A);
(3) tr(A) = m_A(λ_1)λ_1 + · · · + m_A(λ_s)λ_s = the sum of the eigenvalues of A, counted with their multiplicities.
[Indeed, to verify (1), write A = (a_ij) and B = (b_ij). Then

tr(AB) = Σ_{i=1}^m Σ_{k=1}^m a_ik b_ki = Σ_{k=1}^m Σ_{i=1}^m b_ki a_ik = tr(BA),

which is (1). From this (2) follows because, applying (1) to the two factors P^{-1} and AP,

tr(P^{-1}AP) = tr(P^{-1}(AP)) = tr((AP)P^{-1}) = tr(A).

Finally, (3) is immediate for Jordan matrices J. If A is an arbitrary matrix, then by Jordan's theorem P^{-1}AP = J is a Jordan matrix for some P, and thus tr(A) = tr(P^{-1}AP) = tr(J) by (2).]
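Property (1) is easy to test numerically. The following Python sketch (an illustration, not part of the text) checks tr(AB) = tr(BA) on a pair of sample matrices for which AB ≠ BA.

```python
def trace(A):
    """The sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, -2], [4, -6]]   # sample matrices (not from the text)
B = [[0, 1], [2, 3]]

# Property (1): tr(AB) = tr(BA), even though AB != BA here.
assert matmul(A, B) != matmul(B, A)
assert trace(matmul(A, B)) == trace(matmul(B, A))
print(trace(matmul(A, B)))  # -18
```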
Theorem 7.14. If A is a real m × m matrix, then

(10) ρ(A) ≤ √ρ(A^tA) ≤ √tr(A^tA).
Remark. Even though √ρ(A^tA) is a better estimate of ρ(A) than √tr(A^tA), the price we pay for this improvement is often not worth the effort, for ρ(A^tA) is in general as hard to compute as ρ(A) itself (or harder). (For example, if A is a symmetric matrix (A^t = A), then ρ(A^tA) = ρ(A^2) = ρ(A)^2, so the first inequality in (10) is an equality in this case and hence gives nothing new.)
However, the combination of this inequality with other methods frequently leads to better results, as the following example shows.
Example 7.18. Estimate ρ(A) when

A = [ 1 −3 4 0 ; 2 1 2 1 ; 1 −1 1 −1 ; −1 2 1 −1 ].
Solution. We try in turn the various methods at our disposal.
Method 1: use Theorem 7.12.

Here ||A|| = max(5, 7, 8, 3) = 8 and ||A^t|| = max(8, 6, 4, 5) = 8, so by Theorem 7.12 we obtain ρ(A) ≤ min(8, 8) = 8.
Method 2: use Theorem 7.14.

The diagonal entries of A^tA are 7, 15, 22, 3, so tr(A^tA) = 47 and hence by Theorem 7.14 we obtain ρ(A) ≤ √47 < 6.86.
Method 3: combine Theorems 7.12 and 7.14.

We have

A^tA = [ 7 −4 8 2 ; −4 15 −9 0 ; 8 −9 22 0 ; 2 0 0 3 ].

Applying Method 1 to A^tA yields ρ(A^tA) ≤ ||A^tA|| = max(21, 28, 39, 5) = 39, and hence by the first inequality of (10) we obtain ρ(A) ≤ √ρ(A^tA) ≤ √39 < 6.25, which is the best estimate.
[However, none of these three estimates is particularly good, for

ch_A(t) = t^4 − 2t^3 + 3t^2 + 16t + 45,

which has (by MAPLE) the approximate roots 2.36496 ± 2.57701i, −1.36496 ± 1.34728i, and so ρ(A) ≈ |2.36496 ± 2.57701i| ≈ 3.49772, which is almost 1/2 of the above estimate!]
The proof of Theorem 7.14 is based on the fact that A^tA is a positive semi-definite matrix, so we first review some properties of such matrices.

Insert: Properties of positive semi-definite matrices

Let A be a real symmetric m × m matrix. Then A is said to be positive semi-definite (notation: A ≥ 0) if

~v^t A ~v ≥ 0, for all ~v ∈ R^m.

(Note that ~v^t A ~v is a 1 × 1 matrix, i.e. a number.)
1) A ∈ M_m(R) ⇒ A^tA ≥ 0.

[Clearly, A^tA is real symmetric, for (A^tA)^t = A^t(A^t)^t = A^tA. Furthermore, ~v^t(A^tA)~v = (~v^tA^t)A~v = (A~v)^t A~v = ||A~v||^2 ≥ 0, so A^tA is positive semi-definite.]
2) A ≥ 0 ⇒ A is diagonable and has real non-negative eigenvalues λ_i ≥ 0.

[Indeed, since A^t = A, the Principal Axis Theorem (cf. Chapter 5) tells us that A is diagonable and that the eigenvalues λ_i are real. Now if ~v_i is an associated eigenvector, then λ_i ||~v_i||^2 = λ_i ~v_i^t ~v_i = ~v_i^t (A~v_i) ≥ 0, so λ_i ≥ 0 since ||~v_i||^2 > 0.]
3) A ≥ 0 ⇔ A^t = A and λ_i ≥ 0 for every eigenvalue λ_i of A.

[Clearly, if A ≥ 0 then A^t = A by definition and λ_i ≥ 0 by property 2). Moreover, the converse is clear if A = Diag(λ_1, . . . , λ_m) is diagonal. In the general case, A = P^t D P for some orthogonal matrix P (so P^{-1} = P^t) and some diagonal matrix D with non-negative entries, and then ~v^t A ~v = ~v^t P^t D P ~v = (P~v)^t D (P~v) ≥ 0.]
4) A ≥ 0 ⇒ ρ(A) ≤ tr(A).

[Indeed, let λ_1 ≥ λ_2 ≥ · · · ≥ λ_m ≥ 0 be the eigenvalues of A, listed according to their multiplicities. Then ρ(A) = λ_1 ≤ λ_1 + λ_2 + · · · + λ_m = tr(A).]
In addition to these elementary properties, we also require the following more subtle fact.
Lemma. Let A ∈ M_m(R) be a real m × m matrix and let λ ∈ C be an eigenvalue of A. If 0 ≤ μ_1 ≤ · · · ≤ μ_m denote the eigenvalues of A^tA (counted with multiplicities), then there are non-negative real numbers θ_1, . . . , θ_m ≥ 0 such that

(11) |λ|^2 = Σ_{i=1}^m θ_i μ_i and Σ_{i=1}^m θ_i = 1.
Proof. Note that even though A is real, some of its eigenvalues and eigenvectors may be non-real. We thus introduce the complex conjugate ~v̄ and complex norm ||~v|| of a vector ~v = (v_1, . . . , v_m)^t ∈ C^m:

~v̄ = (v̄_1, . . . , v̄_m)^t,
||~v|| = √(~v̄ · ~v) = √(|v_1|^2 + · · · + |v_m|^2).
This norm has the familiar properties:

(12) ||~v|| ≥ 0, and ||~v|| = 0 ⇔ ~v = ~0;
(13) ||α~v|| = |α| · ||~v||.
We apply this to an eigenvector ~v ≠ ~0 associated to our given eigenvalue λ. (Thus ~v ∈ E_A(λ).) Since A is real, we have A~v̄ = conj(A~v), and so

~v̄^t A^tA ~v = (A~v̄)^t (A~v) = ||A~v||^2 = ||λ~v||^2 = |λ|^2 ||~v||^2

by (13). Now we apply the Principal Axis Theorem to the symmetric matrix A^tA to find an orthogonal matrix P ∈ M_m(R) such that

P^{-1}(A^tA)P = Diag(μ_1, μ_2, . . . , μ_m) = D, or A^tA = PDP^{-1} = PDP^t.
Put ~w = P^t ~v. Then, since P is real, ~w̄ = P^t ~v̄, and so

~w̄^t D ~w = (P^t ~v̄)^t D (P^t ~v) = ~v̄^t PDP^t ~v = ~v̄^t (A^tA) ~v = |λ|^2 ||~v||^2

by the above. On the other hand, since D is a diagonal matrix,

~w̄^t D ~w = Σ_{i=1}^m w̄_i μ_i w_i = Σ_{i=1}^m μ_i |w_i|^2, if ~w = (w_1, . . . , w_m)^t.

Thus, combining this with the previous equation yields

Σ_{i=1}^m μ_i |w_i|^2 = |λ|^2 ||~v||^2,

and so we see that the first equation of (11) holds with

θ_i = |w_i|^2 / ||~v||^2 ≥ 0.
Moreover, the second equation of (11) also holds because

||~v||^2 Σ_{i=1}^m θ_i = Σ_{i=1}^m |w_i|^2 = ||~w||^2 = ||P^t ~v||^2 = (P^t ~v̄)^t (P^t ~v) = ~v̄^t P P^t ~v = ||~v||^2,

and so, dividing through by ||~v||^2, we obtain that

Σ_{i=1}^m θ_i = 1.
Proof of Theorem 7.14. Since A^tA is positive semi-definite (cf. property 1) of the above insert), we have ρ(A^tA) ≤ tr(A^tA) by property 4). This proves the second inequality of (10).
To prove the first inequality, let λ ∈ C be an eigenvalue of A such that |λ| = ρ(A), and put M = max{μ_i : 1 ≤ i ≤ m}, where the μ_i's are the eigenvalues of A^tA. Thus, since A^tA is positive semi-definite, M = ρ(A^tA). Then by the Lemma we have

|λ|^2 = Σ_{i=1}^m μ_i θ_i ≤ Σ_{i=1}^m M θ_i = M Σ_{i=1}^m θ_i = M = ρ(A^tA),

and so ρ(A)^2 = |λ|^2 ≤ ρ(A^tA). From this the first inequality of (10) follows by taking square roots.
Exercises 7.5.
1. (a) Find the Gersgorin discs of the following matrices.

A = [ 0 1 ; −1 i ];   B = [ 0 −1 1 ; 1 2 1 ; 1 0 −1 ];   C = [ 2i 1 1 ; 0 2 0 ; 1 0 −1 ].
(b) What can you say about the approximate location of the eigenvalues?
(c) Compute the eigenvalues and compare your answer to that obtained in (b).
2. Suppose A is a 3 × 3 matrix with spectrum {1 + i, 1 − i, 1}.

(a) What is the spectral radius of A?
(b) What is the spectral radius of B = A2 − 2A+ 2I?
(c) Is B power convergent?
3. Let A be an m × m matrix with characteristic polynomial

ch_A(t) = t^m + a_{m−1}t^{m−1} + . . . + a_1 t + a_0.

Show that tr(A) = −a_{m−1} and det(A) = (−1)^m a_0.
4. (a) Show that for all m × m matrices A, B, C we have
tr(ABC) = tr(BCA) = tr(CAB).
(b) Find three 2 × 2 matrices A, B, C such that tr(ABC) ≠ tr(ACB).
7.6 Geometric Series
In the previous sections we learned how to determine whether a given matrix A is power convergent and also how to find the limit lim_{n→∞} A^n. We now apply these methods to the following problem arising in economics.

Problem: Four regions (or cities) R1, R2, R3, and R4 ship non-renewable commodities (such as antique paintings, rental cars) among themselves. Each region ships per week a fixed percentage of its goods (percentage in terms of the goods present) to the other regions, as follows:
R1 sends 70% of its goods to R2, 10% to R3 and 10% to R4; R2 sends 30% of its goods to R1 and 10% to R4.

Today's distribution:

R1 has $300,000 worth of goods,
R2 has $200,000 worth of goods,
R3 has $100,000 worth of goods,
R4 has $100,000 worth of goods.
Questions: 1) What is the distribution of goods after n weeks?
2) What happens in the long run?
Analysis: Let v_{1,n} = the amount (in $100,000) of goods present in R1 after n weeks, v_{2,n} = the amount of goods in R2 after n weeks, etc.
Thus, ~v_n = (v_{1,n}, v_{2,n}, v_{3,n}, v_{4,n})^t represents the distribution of goods after n weeks, and today's distribution is represented by the vector ~v_0 = (3, 2, 1, 1)^t. We now determine how the distribution of the goods changes from week n to week n + 1:
R1 retains 10% of its goods (because it sends 70 + 10 + 10 = 90% away)+ receives 30% of those of R2
R2 retains 60% of its goods (because it sends 30 + 10 = 40% away)+ receives 70% of those of R1
R3 retains all of its goods+ receives 10% of those of R1
R4 retains all of its goods+ receives 10% of those of R1
We thus have the equations:

v_{1,n+1} = (1/10)v_{1,n} + (3/10)v_{2,n}
v_{2,n+1} = (7/10)v_{1,n} + (6/10)v_{2,n}
v_{3,n+1} = (1/10)v_{1,n} + 1 · v_{3,n}
v_{4,n+1} = (1/10)v_{1,n} + (1/10)v_{2,n} + 1 · v_{4,n}
This means that we have the discrete linear system
~vn+1 = A~vn
in which
A = 1/10 [ 1 3 0 0 ; 7 6 0 0 ; 1 0 10 0 ; 1 1 0 10 ]   and   ~v_0 = (3, 2, 1, 1)^t.
From chapter 5 (discrete linear systems) we therefore know that
~vn = An~v0.
How can we compute A^n efficiently? We could of course apply our previous method (using constituent matrices) to the matrix A, but since A has the special shape

A = [ T 0 ; B I ],  where T = 1/10 [ 1 3 ; 7 6 ]  and  B = 1/10 [ 1 0 ; 1 1 ],

it is better to use the following rule.
it is better to use the following rule.
Observation (Rule for multiplying partitioned matrices). Suppose A is partitioned into blocks A_ij (1 ≤ i ≤ r, 1 ≤ j ≤ s), where A_ij has size m_i × n_j, and B is partitioned into blocks B_ij (1 ≤ i ≤ s, 1 ≤ j ≤ t), where B_ij has size n_i × k_j. Then AB = C is partitioned into blocks C_ij (1 ≤ i ≤ r, 1 ≤ j ≤ t) of size m_i × k_j, where

C_ij = A_i1 B_1j + A_i2 B_2j + · · · + A_is B_sj.

Notes: 1) As indicated, A_ij is an m_i × n_j matrix and B_ij an n_i × k_j matrix, so the column partition of A matches the row partition of B and all the products above make sense.

2) This generalizes the rules we learned for multiplying block matrices.
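The rule is easy to check in code. The following Python sketch (an illustration, not part of the text) glues a 2 × 2 grid of blocks into an ordinary matrix and verifies the blockwise formula A^2 = [ T^2 0 ; B(T+I) I ] for the matrix A of the shipping problem.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def assemble(blocks):
    """Glue a 2x2 grid of blocks [[A11, A12], [A21, A22]] into one matrix."""
    return [r1 + r2 for row in blocks for r1, r2 in zip(*row)]

T = [[F(1, 10), F(3, 10)], [F(7, 10), F(6, 10)]]
B = [[F(1, 10), F(0)], [F(1, 10), F(1, 10)]]
O = [[F(0), F(0)], [F(0), F(0)]]
I = [[F(1), F(0)], [F(0), F(1)]]

A = assemble([[T, O], [B, I]])   # the 4x4 matrix of the shipping problem
A2 = matmul(A, A)

# Blockwise: A^2 = [[T^2, 0], [B(T+I), I]]
TpI = [[T[i][j] + I[i][j] for j in range(2)] for i in range(2)]
assert A2 == assemble([[matmul(T, T), O], [matmul(B, TpI), I]])
```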
Applying this rule to our situation with A = [ T 0 ; B I ] yields:

A^2 = [ T 0 ; B I ] [ T 0 ; B I ] = [ T^2 + 0·B , T·0 + 0·I ; B·T + I·B , B·0 + I^2 ] = [ T^2 0 ; B(T + I) I ].

Similarly,

A^3 = A^2 · A = [ T^2 0 ; B(T + I) I ] [ T 0 ; B I ] = [ T^3 0 ; B(T + I)T + B I ] = [ T^3 0 ; B(T^2 + T + I) I ].

Thus, continuing in this manner leads to the formula

(14) A^n = [ T^n 0 ; B(T^{n−1} + T^{n−2} + · · · + T + I) I ].
This formula therefore shows that in order to find A^n, we only have to do computations involving 2 × 2 (rather than 4 × 4) matrices!
However, while we know how to compute T^n efficiently, how do we compute the sum

T^{n−1} + · · · + T + I?
Recall (from high school): Geometric series. If a ≠ 1 is any number, then

1 + a + a^2 + · · · + a^{n−1} = (1 − a^n)/(1 − a),

and hence, if |a| < 1, then the infinite series (called a geometric series) converges:

Σ_{n=0}^∞ a^n := lim_{n→∞} (1 + a + · · · + a^{n−1}) = 1/(1 − a).

(Recall from analysis that an (infinite) series is the limit of its sequence of partial sums.)
This idea works for matrices as well, as we shall now see.
Definition. Let T be a square matrix. Then the sequence {S_n}_{n≥1} defined by

S_n = I + T + . . . + T^{n−1}

is called the geometric series generated by T. The series converges if the sequence {S_n}_{n≥1} converges; we then write

Σ_{n=0}^∞ T^n = lim_{n→∞} S_n.
Theorem 7.15. The geometric series generated by T converges if and only if

(15) ρ(T) < 1; i.e. |λ_i| < 1, for each eigenvalue λ_i of T.

If this condition holds, then I − T is invertible and we have

(16) S_n := Σ_{k=0}^{n−1} T^k = (I − T)^{-1}(I − T^n),

and hence the series converges to

(17) Σ_{k=0}^∞ T^k = (I − T)^{-1}.
Remark. Note that by the Corollary of Theorem 7.8 the above condition (15) is equivalent to:

(18) lim_{n→∞} T^n = 0.
Proof of Theorem 7.15. (⇒) Suppose first that the sequence {S_n}_n converges, and put S = lim_{n→∞} S_n. Then, since

S_{n+1} − S_n = (T^n + T^{n−1} + · · · + I) − (T^{n−1} + · · · + I) = T^n,

it follows that lim_{n→∞} T^n = lim_{n→∞}(S_{n+1} − S_n) = lim_{n→∞} S_{n+1} − lim_{n→∞} S_n = S − S = 0. Thus, T is power convergent with limit 0, and so by (18) we see that (15) holds.

(⇐) Now suppose that (15) holds. Then the eigenvalues of I − T are 1 − λ_i ≠ 0, so I − T is invertible. Furthermore,

T · S_n = T(I + T + · · · + T^{n−1}) = T + T^2 + · · · + T^n = (I + T + · · · + T^{n−1}) + T^n − I = S_n + T^n − I.

Thus, T·S_n − S_n = T^n − I and hence (I − T)S_n = I − T^n. Since I − T is invertible, formula (16) follows.
Finally, from (15) and (18) we have lim_{n→∞} T^n = 0, so

lim_{n→∞} S_n = (I − T)^{-1}(I − lim_{n→∞} T^n) = (I − T)^{-1} · I,

which proves (17).
Example 7.19. Find the geometric series Σ_{n=0}^∞ T^n when T = 1/10 [ 1 3 ; 7 6 ].
Solution. We first verify that T satisfies the hypothesis (15).

Method 1: By Theorem 7.12 we have ρ(T) ≤ ||T|| = max(8/10, 9/10) = 9/10 < 1, so T satisfies (15).

Method 2: Put T′ = 10T = [ 1 3 ; 7 6 ]. Then ch_{T′}(t) = (1 − t)(6 − t) − 21 = t^2 − 7t − 15 = (t − (7+√109)/2)(t − (7−√109)/2), so

ch_T(t) = (t − (7+√109)/20)(t − (7−√109)/20),

with roots λ_1 = (7+√109)/20 and λ_2 = (7−√109)/20. Since

0 < (7+√109)/20 < (7+√121)/20 = 18/20 < 1,
and 0 > (7−√109)/20 > (7−√121)/20 = −4/20 > −1,

we see that ρ(T) = max(|λ_1|, |λ_2|) = (7+√109)/20 < 9/10 < 1, so T satisfies (15).
Thus, by Theorem 7.15, the geometric series generated by T converges and

Σ_{n=0}^∞ T^n = (I − T)^{-1} = ( 1/10 [ 9 −3 ; −7 4 ] )^{-1} = 10 [ 9 −3 ; −7 4 ]^{-1} = 10/(36 − 21) [ 4 3 ; 7 9 ] = 2/3 [ 4 3 ; 7 9 ].
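The convergence can also be observed numerically. The following Python sketch (an illustration, not part of the text; exact rational arithmetic) sums the first 200 powers of T and compares with (I − T)^{-1} = 2/3 [ 4 3 ; 7 9 ].

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

T = [[F(1, 10), F(3, 10)], [F(7, 10), F(6, 10)]]
I = [[F(1), F(0)], [F(0), F(1)]]

# Partial sums S_n = I + T + ... + T^{n-1}
S, P = [[F(0), F(0)], [F(0), F(0)]], I
for _ in range(200):
    S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    P = matmul(P, T)

limit = [[F(8, 3), F(2)], [F(14, 3), F(6)]]   # (2/3)[[4,3],[7,9]]
err = max(abs(S[i][j] - limit[i][j]) for i in range(2) for j in range(2))
print(float(err))  # tiny, since rho(T) ~ 0.87 < 1
```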
Example 7.20. Application to the "shipping of commodities" problem:

Find lim_{n→∞} A^n when A = [ T O ; B I ], with T as in Example 7.19 and B = 1/10 [ 1 0 ; 1 1 ].
Solution. By the previous discussion we have, by (14),

A^n = [ T^n 0 ; B(I + · · · + T^{n−1}) I ],  so by Theorem 7.15,  lim_{n→∞} A^n = [ 0 0 ; B(I − T)^{-1} I ].

Now by Example 7.19,

(I − T)^{-1} = 2/3 [ 4 3 ; 7 9 ],  so  B(I − T)^{-1} = 1/10 [ 1 0 ; 1 1 ] · 2/3 [ 4 3 ; 7 9 ] = 1/15 [ 4 3 ; 11 12 ],
and hence

lim_{n→∞} A^n = 1/15 [ 0 0 0 0 ; 0 0 0 0 ; 4 3 15 0 ; 11 12 0 15 ].

This means that the distribution of goods in the long run is

lim_{n→∞} ~v_n = lim_{n→∞} A^n ~v_0 = 1/15 [ 0 0 0 0 ; 0 0 0 0 ; 4 3 15 0 ; 11 12 0 15 ] (3, 2, 1, 1)^t = 1/15 (0, 0, 33, 72)^t = (0, 0, 2.2, 4.8)^t.

In other words, R1 and R2 do not have any goods in the long run,
R3 has $220,000 worth of goods in the long run,
R4 has $480,000 worth of goods in the long run.
Exercises 7.6.
1. For each of the following matrices, determine (and justify) whether or not the geometric series generated by the matrix converges and, if so, find its limit.

A = 1/2 [ 1 1 ; −1 1 ],   B = 1/4 [ 3 0 1 ; 1 2 −1 ; 1 0 3 ],   C = 1/5 [ 4 0 0 0 ; 1 4 −1 0 ; 0 0 3 0 ; 1 3 0 0 ].
2. Find lim_{n→∞} A^n when

A = 1/2 [ 1 0 1 0 0 0 ; 0 1 1 0 0 0 ; −1 1 1 0 0 0 ; 0 0 1 1 0 0 ; 0 1 0 0 1 0 ; 1 0 0 0 0 1 ].
3. (a) Let T be an r × r matrix, B an s × r matrix and put m = r + s. Show that the m × m matrix A = [ T 0 ; B I ] is invertible if and only if T is invertible, and that in this case

A^{-1} = [ T^{-1} 0 ; −BT^{-1} I ].

(b) Write down, without performing any calculations, the inverse of the 10 × 10 matrix A whose first column is (10, 9, 8, . . . , 2, 1)^t, whose remaining diagonal entries are all 1, and whose other entries are all 0.
4. Let P = [ A C ; 0 B ], where A is an r × r matrix, B is an s × s matrix and C is an r × s matrix.
(a) Show that P is invertible if and only if A and B are invertible. [Hint: Consider the linear independence of suitable rows and/or columns.]
(b) If A and B are invertible, show that the inverse of P is given by

P^{-1} = [ A^{-1} X ; 0 B^{-1} ],  where X = −A^{-1}CB^{-1}.
Note: A quick method to prove (a) is to use the general fact that
det(P ) = det(A) det(B),
but you are not supposed to use this rule in this problem.
7.7 Stochastic matrices and Markov chains
The discrete linear system

~v_{n+1} = A~v_n

of the previous example ("shipping of commodities") had the following two properties:

1) Each entry of A (= the transition matrix) is non-negative.

2) The sum of the entries of each column is 1 (= the sum of the percentages).

A discrete linear system of this type is called a stochastic system, and its transition matrix A is called a stochastic matrix. We had encountered such matrices already in subsection 5.9.3 in connection with Markov chains. More generally:
Definition. A real p × q matrix A = (a_ij) is called stochastic if

(19) a_ij ≥ 0, for all i, j;
(20) Σ_p · A = Σ_q,

where Σ_p = (1, 1, . . . , 1) (viewed as a 1 × p matrix).
Remark. Condition (20) means: the sum of each column of A is equal to 1.
Example 7.21. (a) For A = [ 2 4 ; 3 2 ; 1 5 ] we have

Σ_3 A = (1, 1, 1) [ 2 4 ; 3 2 ; 1 5 ] = (2 + 3 + 1, 4 + 2 + 5) = (6, 11) ≠ (1, 1) = Σ_2,

so A is not stochastic.

(b) For B = 1/5 [ 1 1 ; 2 3 ; 2 1 ] we have Σ_3 B = (1, 1, 1) · 1/5 [ 1 1 ; 2 3 ; 2 1 ] = (1, 1) = Σ_2, so B is stochastic.
Theorem 7.16. If A is a stochastic matrix of size p × q and B is a stochastic matrix of size q × r, then AB is a stochastic matrix of size p × r.

Proof. Clearly AB has non-negative entries (since A and B do), so it is enough to show that Σ_p(AB) = Σ_r. Now

Σ_p(AB) = (Σ_p A)B = Σ_q B = Σ_r,

using first that A is stochastic and then that B is stochastic, and so AB is stochastic.
Corollary. If A is a square stochastic matrix, then so is A^n for all n ≥ 0. If, moreover, ~v is a stochastic vector, then A^n~v is a stochastic vector for all n ≥ 0.
Proof. Induct on n. Since the statement is clear for n = 0, assume n > 0. By the induction hypothesis we have that A^{n−1} is stochastic, and hence A^n = A · A^{n−1} is also stochastic by Theorem 7.16. Thus, the first assertion holds for all n ≥ 0.
The second assertion follows from the first by applying Theorem 7.16 to A^n and ~v.
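Both the definition and the closure property are easy to test in code. The following Python sketch (an illustration, not part of the text) uses the matrix of Example 7.21(b) and an arbitrary 2 × 2 stochastic matrix.

```python
from fractions import Fraction as F

def is_stochastic(A):
    """Conditions (19) and (20): non-negative entries and unit column sums."""
    nonneg = all(x >= 0 for row in A for x in row)
    col_sums = [sum(A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
    return nonneg and all(s == 1 for s in col_sums)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

B3 = [[F(1, 5), F(1, 5)], [F(2, 5), F(3, 5)], [F(2, 5), F(1, 5)]]  # Example 7.21(b)
C2 = [[F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)]]  # a 2x2 stochastic matrix (made up here)

print(is_stochastic(B3))              # True
print(is_stochastic(matmul(B3, C2)))  # True: Theorem 7.16 in action
```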
Remark. In section 5.9 we had seen that if (S, A) is a Markov chain, then its transition matrix A and probability distribution vectors ~v_n are all stochastic, and that these satisfy the discrete linear system
~vn+1 = A~vn.
In the sequel we shall frequently refer to any such discrete linear system (in which A and the ~v_n's are stochastic) as an (abstract) Markov chain.
We have the following basic result about the eigenvalues of a square stochastic matrix:
Theorem 7.17. Let A be a square stochastic matrix of size m×m. Then
(a) 1 is a regular eigenvalue of A.
(b) We have |λi| ≤ 1, for every eigenvalue λi of A; i.e. ρ(A) = 1.
Proof. (a) We first prove that λ_1 = 1 is an eigenvalue. Since A^t and A have the same eigenvalues, this follows from:

(21) Σ_m^t is an eigenvector of A^t with eigenvalue λ_1 = 1.

Indeed, A stochastic ⇒ Σ_m A = Σ_m ⇒ A^t Σ_m^t = Σ_m^t ⇒ Σ_m^t is a 1-eigenvector of A^t.
To prove that 1 is a regular eigenvalue of A, we will use the norm of a matrix as it was defined in section 7.5. Since A is stochastic, we have that

(22) ||A|| = max(1, 1, . . . , 1) = 1.

Thus, since A^n is also stochastic, we have ||A^n|| = 1, for any n ≥ 1. Thus, by using property (3) of the norm, we see that for any invertible matrix P we have

(23) ||P^{-1}A^nP|| ≤ ||P^{-1}|| · ||A^n|| · ||P|| = c_P, for all n ≥ 1, where c_P := ||P^{-1}|| · ||P||.

Now take P such that P^{-1}AP = J is a Jordan matrix. Then by (23) we see that ||J^n|| ≤ c_P, for all n ≥ 1. This implies that J cannot contain a Jordan block J(1, k) of size k ≥ 2, because then J(1, k)^n is an upper triangular matrix with 1's on the diagonal and the positive entry n on the superdiagonal, and so ||J^n|| ≥ ||J(1, k)^n|| ≥ 1 + n, which contradicts (23). Thus, 1 is a regular eigenvalue of A.
(b) By Theorem 7.12 and (22) we see that the spectral radius satisfies ρ(A) ≤ ||A|| = 1. Since by part (a) we know that 1 is an eigenvalue of A, it follows that ρ(A) = 1.
Warning. Theorem 7.17 does not imply that A is power convergent, for 1 need not be a dominant eigenvalue; i.e. there may be other eigenvalues λ_i ≠ 1 with |λ_i| = 1. For example, A = [ 0 1 ; 1 0 ] is stochastic, but not power convergent (since its eigenvalues are 1, −1). Note that A^2 = I, so A^{2n+1} = A and A^{2n} = I.
Corollary. A stochastic matrix A is power convergent if and only if 1 is a dominanteigenvalue of A, i.e. if and only if λ = 1 is the only eigenvalue of A of absolute value 1.
Proof. Combine Theorems 7.6′ and 7.17.
While a stochastic matrix does not need to be power convergent in general, there is a fairly simple criterion which ensures its power convergence. This criterion, which was discovered by Oscar Perron (1880–1975) in 1907, depends on the following simple concept.
Definition. A square stochastic matrix A is called primitive if for some n ≥ 1 the matrix A^n has no entries equal to 0.
Remark. Note that if A^n has no zero entries, then the same is true for A^{n+1}, etc. Thus we see that A is primitive if and only if A^n is primitive for some (hence every) n.
Example 7.22. (a) A = 1/10 [ 1 2 ; 9 8 ] is clearly primitive (take n = 1).

(b) A = 1/10 [ 0 1 ; 10 9 ] is also primitive, for

A^2 = ( 1/10 [ 0 1 ; 10 9 ] )^2 = 1/100 [ 10 9 ; 90 91 ].
(c) Upper/lower triangular (block) matrices are never primitive. In particular, the transition matrix A of the commodities example (Example 7.19) is not primitive.
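Primitivity can be tested by simply computing successive powers. A Python sketch (an illustration, not part of the text; the cut-off max_power is an arbitrary choice here, so a False answer only means "no positive power found up to that cut-off"):

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_primitive(A, max_power=20):
    """Return True if some power A^n (1 <= n <= max_power) has no zero entry."""
    P = A
    for n in range(1, max_power + 1):
        if all(x != 0 for row in P for x in row):
            return True
        P = matmul(P, A)
    return False

A1 = [[F(0), F(1, 10)], [F(1), F(9, 10)]]  # Example 7.22(b): primitive (A^2 > 0)
A2 = [[F(1, 2), F(0)], [F(1, 2), F(1)]]    # lower triangular: never primitive
print(is_primitive(A1), is_primitive(A2))  # True False
```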
Theorem 7.18 (Perron’s Theorem). If A is a primitive, stochastic matrix, then:
(a) λ1 = 1 is a simple, dominant eigenvalue of A.
(b) The associated eigenspace
EA(1) = {c~p : c ∈ C}
is generated by a (unique) stochastic vector ~p (called the Perron vector of A).
(c) All the entries of ~p are positive.
Proof. Omitted.1
Corollary. If A is primitive and stochastic, then A is power convergent with limit

lim_{n→∞} A^n = (~p |~p | . . . |~p)  (m copies of ~p),

where ~p denotes the Perron vector of A. Moreover, for any vector ~v we have

lim_{n→∞} A^n ~v = c~p, where c = Σ_m ~v.
1A proof can be found in F.R. Gantmacher’s book, The Theory of Matrices, vol. II, p. 53ff.
In particular, if ~v is stochastic, then lim_{n→∞} A^n ~v = ~p does not depend on ~v.

Proof. By Perron's theorem (Theorem 7.18) we know that 1 is a dominant, simple eigenvalue of A, so A is power convergent. Moreover, by Theorem 7.10 we have

lim_{k→∞} A^k = (1/(~u^t ~v)) ~u · ~v^t,

where ~u is a 1-eigenvector of A and ~v is a 1-eigenvector of A^t. Here we can take ~u = ~p (the Perron vector) and ~v = (1, . . . , 1)^t = Σ_m^t, since A is stochastic. Now

~u^t · ~v = Σ_m ~p = 1, since ~p is stochastic, and
~u · ~v^t = ~p · (1, . . . , 1) = (~p | . . . |~p),

so the first formula follows. Moreover,

lim_{k→∞} A^k ~v = (lim_{k→∞} A^k)~v = (~p | . . . |~p)~v = ~p (Σ_m ~v) = c~p.
Remarks. 1) As the above proof shows, the conclusion of the corollary holds more generally for stochastic matrices for which 1 is a simple, dominant eigenvalue (and ~p is a stochastic 1-eigenvector). However, in this more general case the entries of ~p need not be positive.

2) The above theorem and its corollary reveal the following interesting fact about a stochastic discrete linear system ~v_{n+1} = A~v_n: if A is primitive, then no state will be completely drained in the long run (because the limit c~p has no zero entries).

Note that in the commodities example (Example 7.19), the states R1 and R2 were drained in the long run (but there the transition matrix A was not primitive!)
Example 7.23. Consider the economics example 7.19 again, but assume now that the transition matrix A describing the fractions a_ij of the value of goods shipped weekly from region R_j to R_i is

A = [ .1 .2 .1 .1 ; .7 .6 .1 .1 ; .1 .1 .7 .1 ; .1 .1 .1 .7 ]

and the initial distribution vector is ~v = (v_1, v_2, v_3, v_4)^t, where v_1 + v_2 + v_3 + v_4 = 7.
Find the distribution of the goods in the long run.
Solution. Note that here we are not specifying how many $100,000 worth of goods was present initially in each region, only that the total present initially was $700,000. So ~v could be (3, 2, 1, 1)^t or (7, 0, 0, 0)^t, etc.
Qualitative Solution. Thanks to O. Perron (through the corollary to his theorem) we know (with no calculation) that there is a vector ~p = (p_1, . . . , p_4)^t, all of whose entries are positive (p_j > 0), that depends only on A (not on ~v), such that lim_{n→∞} A^n ~v = 7~p. Therefore, in the long run, no matter how ~v was chosen among all the vectors with entry sum 7, eventually R_j will have $700,000 · p_j worth of the commodity. This tells us a great deal about the economic situation in a qualitative way.
Quantitative Solution. If we want more quantitative information, we need to find ~p. This is a routine matter of finding the eigenvectors for 1. We know in advance (again thanks to O. Perron) that 1 is simple, so any solution ~x of (A − I)~x = ~0 will be a multiple of ~p, i.e. ~x = c~p, with c = Σ_4~x. Now by row reducing A − I (or by inspection) we find that ~x = (6, 16, 11, 11)^t is such a solution, so

~p = (1/Σ_4~x) ~x = 1/44 (6, 16, 11, 11)^t = (3/22, 4/11, 1/4, 1/4)^t ≈ (.14, .36, .25, .25)^t.
Since lim_{n→∞} A^n ~v = 7~p, this means that
R1 gets $95,455 = 3/22 × $700,000 worth of goods eventually,
R2 gets $254,545 = 4/11 × $700,000 worth of goods eventually,
R3 gets $175,000 = 1/4 × $700,000 worth of goods eventually,
R4 gets $175,000 = 1/4 × $700,000 worth of goods eventually,
regardless of how the $700, 000 worth of goods was initially distributed!
The only effect the choice of ~v has on the system is to determine how soon the "ultimate" distribution is reached. For example, if

~v = 7~p ≈ (0.95, 2.55, 1.75, 1.75)^t,

then it is reached immediately; however, if

~v = (1.4, 2.1, 1.75, 1.75)^t,

then it will take longer. (Using the computer, it is not difficult to find the first n for which the dollar difference between $10^5 (A^n~v)_j and $7 × 10^5 p_j is negligible for all j.
Here, ‘negligible’ means less than $10, 000, $1, 000, $100, $10, $1 or $0.01 depending onthe situation at hand.)
Example 7.24. Find the limit lim_{n→∞} A^n when

A = [ 0 1/2 1/3 ; 1/2 0 2/3 ; 1/2 1/2 0 ].
Solution. Clearly, A is stochastic. Even though A has some zero entries, A is still primitive because

A^2 = [ 0 1/2 1/3 ; 1/2 0 2/3 ; 1/2 1/2 0 ]^2 = 1/12 [ 5 2 4 ; 4 7 2 ; 3 3 6 ]

has no zero entries. Thus, by Perron's theorem, A is power convergent and

lim_{n→∞} A^n = (~p |~p |~p),
where ~p is the Perron vector of A. To compute ~p, solve (A − I)~x = ~0:

1/6 [ −6 3 2 ; 3 −6 4 ; 3 3 −6 ] ~x = ~0  ⇒  ~x = c (8, 10, 9)^t,  so  ~p = 1/27 (8, 10, 9)^t.

Thus, by Theorem 7.18,

lim_{n→∞} A^n = 1/27 [ 8 8 8 ; 10 10 10 ; 9 9 9 ].
Example 7.25. Find the limit lim_{n→∞} B^n when

B = [ 0 1/2 1/2 ; 1/2 0 1/2 ; 1/3 2/3 0 ].
Solution. We first observe that even though B is not stochastic (for the columns do not add to 1), A = B^t is stochastic, and so the theory of the present section applies. In fact, since A is the matrix of Example 7.24, we know that

lim_{n→∞} A^n = 1/27 [ 8 8 8 ; 10 10 10 ; 9 9 9 ].

Now since B^n = (A^t)^n = (A^n)^t, we therefore obtain

lim_{n→∞} B^n = lim_{n→∞} (A^n)^t = ( 1/27 [ 8 8 8 ; 10 10 10 ; 9 9 9 ] )^t = 1/27 [ 8 10 9 ; 8 10 9 ; 8 10 9 ].
We now use Perron’s Theorem to analyze the following examples of “rat mazes” whichmight arise in Biology experiments.
Example 7.26 (A Rat Maze). As in Example 5.30, consider the following system ofthree chambers connected by passages as shown.
1
2
3
At time t0, a rat is placed in one of the cham-bers (say in chamber 1).
Each minute thereafter, the rat is driven outof its present chamber by some stimulus andis prevented from re-entering immediately.
Assume: the rat chooses the exits of eachchamber at random.
Question: What is the probability that the rat is in a certain chamber in the long run?
Solution. Recall from the solution of Example 5.30 that this problem gives rise to a Markov chain

~v_{n+1} = A~v_n with ~v_0 = (1, 0, 0)^t,

in which ~v_n = (v_{1,n}, v_{2,n}, v_{3,n})^t denotes the state probability vector (with v_{i,n} = the probability that the rat is in chamber i after n minutes) and the transition matrix A is given by

A = [ 0 1/2 1/3 ; 1/2 0 2/3 ; 1/2 1/2 0 ]

(cf. Example 5.30), which is the same as that of Example 7.24. Thus, by the latter example we see that the matrix A is primitive, and that the Perron vector is ~p = 1/27 (8, 10, 9)^t. By Perron's theorem we therefore have

lim_{n→∞} A^n = 1/27 [ 8 8 8 ; 10 10 10 ; 9 9 9 ]   and   lim_{n→∞} ~v_n = ~p = 1/27 (8, 10, 9)^t.
This means that in the long run the rat spends:

8/27 of its time in chamber 1,
10/27 of its time in chamber 2,
9/27 = 1/3 of its time in chamber 3.
Note: This result does not depend on where (i.e. in which chamber) the rat was originallyplaced.
Example 7.27. This example is the same experiment as in Example 7.26, except thatthe maze is different. Moreover, if the rat chooses an exit A or B, then it leaves the maze(and the experiment is over).
[Figure: a maze of three chambers 1, 2, 3 with two exits, A and B.]

Analysis: To interpret this as a Markov chain, we let

s_j be the state: 'the rat is in chamber j' for j = 1, 2, 3;
s_4 be the state: 'the rat went to exit A';
s_5 be the state: 'the rat went to exit B'.
The corresponding transition matrix (assuming random selections by the rat) is:
\[
A = \begin{pmatrix}
0 & 1/3 & 1/4 & 0 & 0 \\
1/2 & 0 & 1/4 & 0 & 0 \\
1/2 & 1/3 & 0 & 0 & 0 \\
0 & 1/3 & 1/4 & 1 & 0 \\
0 & 0 & 1/4 & 0 & 1
\end{pmatrix}.
\]
This matrix is of the form
\[
A = \begin{pmatrix} T & O \\ B & I \end{pmatrix} \quad\text{where}\quad
T = \begin{pmatrix} 0 & 1/3 & 1/4 \\ 1/2 & 0 & 1/4 \\ 1/2 & 1/3 & 0 \end{pmatrix},\quad
B = \begin{pmatrix} 0 & 1/3 & 1/4 \\ 0 & 0 & 1/4 \end{pmatrix},\quad
O = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}
\quad\text{and}\quad
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]

The matrix A isn't primitive, but we can use Theorem 7.15 to find lim_{n→∞} A^n, as we did in the problem of section 7.6 (cf. Example 7.20). We have
\[
\lim_{n\to\infty} A^n = \begin{pmatrix} O & O \\ B(I-T)^{-1} & I \end{pmatrix}
\]
because ρ(T) < 1 (cf. Example 7.14). Now
\[
(I-T)^{-1} = \frac{1}{78}\begin{pmatrix} 132 & 60 & 48 \\ 90 & 126 & 54 \\ 96 & 72 & 120 \end{pmatrix},
\quad\text{and so}\quad
\lim_{n\to\infty} A^n = \frac{1}{13}\begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
9 & 10 & 8 & 13 & 0 \\
4 & 3 & 5 & 0 & 13
\end{pmatrix}.
\]
If we place the rat in chamber i with probability q_i initially, then the initial vector is ~v_0 = (q_1, q_2, q_3, 0, 0)^t and
\[
\lim_{n\to\infty}\vec{v}_n = \lim_{n\to\infty} A^n\vec{v}_0 = (0, 0, 0, 1-q, q)^t,
\quad\text{where}\quad
q = \tfrac{4}{13}q_1 + \tfrac{3}{13}q_2 + \tfrac{5}{13}q_3.
\]
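The fundamental-matrix computation above is easy to reproduce exactly. The sketch below (plain Python fractions; only the matrices T and B are taken from the text, the Gauss–Jordan `inverse` helper is our own) recomputes (I − T)^{-1} and the absorption block B(I − T)^{-1}:

```python
from fractions import Fraction as F

def inverse(M):
    # Exact Gauss-Jordan inverse of a square matrix of Fractions.
    n = len(M)
    aug = [list(M[i]) + [F(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [x / d for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

T = [[F(0), F(1, 3), F(1, 4)],
     [F(1, 2), F(0), F(1, 4)],
     [F(1, 2), F(1, 3), F(0)]]
B = [[F(0), F(1, 3), F(1, 4)],
     [F(0), F(0), F(1, 4)]]

N = inverse([[F(int(i == j)) - T[i][j] for j in range(3)] for i in range(3)])
assert N == [[F(132, 78), F(60, 78), F(48, 78)],
             [F(90, 78), F(126, 78), F(54, 78)],
             [F(96, 78), F(72, 78), F(120, 78)]]

absorb = [[sum(B[i][k] * N[k][j] for k in range(3)) for j in range(3)]
          for i in range(2)]
assert absorb == [[F(9, 13), F(10, 13), F(8, 13)],
                  [F(4, 13), F(3, 13), F(5, 13)]]
```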
Example 7.28. Consider the following rat maze:

[Figure: four chambers, numbered 1, 2, 3 and 4, connected by passages.]

Find the probability that the rat is in a given chamber after n minutes when n is large.
Solution. Here we have the Markov chain ~v_{n+1} = A~v_n whose transition matrix is
\[
A = \begin{pmatrix}
0 & 0 & 1/4 & 1/2 \\
0 & 0 & 3/4 & 1/2 \\
1/3 & 1/2 & 0 & 0 \\
2/3 & 1/2 & 0 & 0
\end{pmatrix}
= \begin{pmatrix} O & B \\ C & O \end{pmatrix},
\]
if we let
\[
B = \frac{1}{4}\begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix},\quad
C = \frac{1}{6}\begin{pmatrix} 2 & 3 \\ 4 & 3 \end{pmatrix}
\quad\text{and}\quad
O = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
Thus
\[
A^2 = \begin{pmatrix} BC & O \\ O & CB \end{pmatrix}, \quad\text{so}\quad
A^{2k} = \begin{pmatrix} (BC)^k & O \\ O & (CB)^k \end{pmatrix}
\quad\text{and}\quad
A^{2k+1} = A\,A^{2k} = \begin{pmatrix} O & B(CB)^k \\ C(BC)^k & O \end{pmatrix},
\quad\text{for all } k \ge 0.
\]
This shows that A is not power convergent. To find out what A^n looks like when n is large, we therefore need to distinguish two cases:

(a) n large and even,
(b) n large and odd.
Case (a). We calculate lim_{k→∞} (BC)^k = (~p, ~p) where ~p is the Perron vector of the primitive stochastic matrix
\[
BC = \frac{1}{24}\begin{pmatrix} 10 & 9 \\ 14 & 15 \end{pmatrix}.
\]
We find that ~p = (1/23)(9, 14)^t and hence
\[
L_1 := \lim_{k\to\infty} (BC)^k = \frac{1}{23}\begin{pmatrix} 9 & 9 \\ 14 & 14 \end{pmatrix}.
\]
Then we calculate lim_{k→∞} (CB)^k = (~q, ~q) where ~q is the Perron vector of the primitive stochastic matrix
\[
CB = \frac{1}{24}\begin{pmatrix} 11 & 10 \\ 13 & 14 \end{pmatrix}.
\]
We find that ~q = (1/23)(10, 13)^t and hence
\[
L_2 := \lim_{k\to\infty} (CB)^k = \frac{1}{23}\begin{pmatrix} 10 & 10 \\ 13 & 13 \end{pmatrix}.
\]
Therefore
\[
\lim_{k\to\infty} A^{2k} = \mathrm{Diag}(L_1, L_2) = \frac{1}{23}\begin{pmatrix}
9 & 9 & 0 & 0 \\
14 & 14 & 0 & 0 \\
0 & 0 & 10 & 10 \\
0 & 0 & 13 & 13
\end{pmatrix}.
\]
Thus, when n is large and even,
\[
\vec{v}_n = A^n\vec{v}_0 \doteq \mathrm{Diag}(L_1, L_2)\vec{v}_0 = \frac{1}{23}\begin{pmatrix}
9(v_1+v_2) \\ 14(v_1+v_2) \\ 10(v_3+v_4) \\ 13(v_3+v_4)
\end{pmatrix}
\quad\text{when}\quad
\vec{v}_0 = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix}.
\]
For example, if ~v_0 = (1/2, 0, 1/2, 0)^t then ~v_{1024} ≐ (1/46)(9, 14, 10, 13)^t, and if ~v_0 = (1, 0, 0, 0)^t then ~v_{1024} ≐ (1/23)(9, 14, 0, 0)^t.

The latter computation can be interpreted as follows. If we place the rat in chamber 1 at time t = 0, then after 1024 minutes the probability that it is in chamber 1 is approximately 9/23 and that it is in chamber 2 is (approximately) 14/23. On the other hand, it is highly unlikely (i.e. the probability is almost 0) that it is in chambers 3 or 4.
Case (b). Here we have
\[
\lim_{k\to\infty} A^{2k+1} = A \lim_{k\to\infty} A^{2k}
= \begin{pmatrix} O & BL_2 \\ CL_1 & O \end{pmatrix}
= \frac{1}{23}\begin{pmatrix}
0 & 0 & 9 & 9 \\
0 & 0 & 14 & 14 \\
10 & 10 & 0 & 0 \\
13 & 13 & 0 & 0
\end{pmatrix}.
\]
Thus when n is large and odd,
\[
\vec{v}_n \doteq \frac{1}{23}\begin{pmatrix}
0 & 0 & 9 & 9 \\
0 & 0 & 14 & 14 \\
10 & 10 & 0 & 0 \\
13 & 13 & 0 & 0
\end{pmatrix}\vec{v}_0
= \frac{1}{23}\begin{pmatrix}
9(v_3+v_4) \\ 14(v_3+v_4) \\ 10(v_1+v_2) \\ 13(v_1+v_2)
\end{pmatrix}.
\]
For example, if ~v_0 = (1, 0, 0, 0)^t then ~v_{1025} ≐ (1/23)(0, 0, 10, 13)^t, and if ~v_0 = (0, 0, 1, 0)^t then ~v_{1025} ≐ (1/23)(9, 14, 0, 0)^t.

Again, these computations can be interpreted as follows. If we place the rat in chamber 1 at time t = 0, then it is highly unlikely that after 1025 minutes it is in chambers 1 or 2, whereas the probability that it is in chambers 3 and 4 is approximately 10/23 and 13/23, respectively. Similarly, if we place it in chamber 3 at time t = 0, then it is highly unlikely that it is in chambers 3 or 4 after 1025 minutes, whereas the probability that it is in chambers 1 and 2 at that time is approximately 9/23 and 14/23, respectively.
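The even/odd oscillation is visible numerically as well. This sketch (floating point this time; the matrix is the one from the example, and 1025 steps matches the discussion above) simply iterates ~v_{n+1} = A~v_n starting from chamber 1:

```python
# Transition matrix of the four-chamber maze (columns sum to 1).
A = [[0.0, 0.0, 0.25, 0.5],
     [0.0, 0.0, 0.75, 0.5],
     [1/3, 0.5, 0.0, 0.0],
     [2/3, 0.5, 0.0, 0.0]]

v = [1.0, 0.0, 0.0, 0.0]          # rat starts in chamber 1
for _ in range(1025):             # an odd number of minutes
    v = [sum(A[i][j] * v[j] for j in range(4)) for i in range(4)]

# After an odd number of steps the mass sits in chambers 3 and 4:
assert abs(v[2] - 10/23) < 1e-9
assert abs(v[3] - 13/23) < 1e-9
assert v[0] < 1e-9 and v[1] < 1e-9
```

Replacing 1025 by 1024 puts the mass back in chambers 1 and 2, in the proportions 9 : 14.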
Example 7.29. The game of craps.
This game requires only an ordinary pair of dice. It has two stages:
"Coming Out" The "shooter" (= player) rolls the dice. If she "rolls" (i.e. the sum of the dots on the tops of the dice is) 7 or 11, she wins; if she rolls 2, 3 or 12 she loses. In these two cases the game is over. However, if 4, 5, 6, 8, 9 or 10 is rolled, then that number is called the shooter's "point" and we go to the second stage:

"Making the Point" Now the shooter rolls the dice again. She wins if she rolls her point ("makes" her point) and the game is over. If she rolls any other number then she rolls the dice again, winning if she makes her point (the same point as determined in the first stage), losing on 7, and rolling again if any other number turns up.

For example: Suppose the shooter rolls 6 at first (so 6 is the point). Then she rolls 8 (not the point, not 7), so she rolls again. Say she rolls a 3 (3 is not 7, not 6), so she rolls again, say a 2 (not 7, not 6), so she rolls again, etc., until either 6 is rolled (she wins) or 7 (she loses).
We consider 8 states:

s_1: the point is 4
s_2: the point is 10
s_3: the point is 5
s_4: the point is 9
s_5: the point is 6
s_6: the point is 8
s_7: the shooter loses
s_8: the shooter wins
The transition matrix is
\[
A = \frac{1}{36}\begin{pmatrix}
27 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 27 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 26 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 26 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 25 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 25 & 0 & 0 \\
6 & 6 & 6 & 6 & 6 & 6 & 36 & 0 \\
3 & 3 & 4 & 4 & 5 & 5 & 0 & 36
\end{pmatrix}
\]
and the initial vector is ~v_0 = (1/36)(3, 3, 4, 4, 5, 5, 4, 8)^t. Therefore w, the shooter's probability of winning eventually, is the 8th entry in lim_{n→∞} A^n~v_0.
Now A = \begin{pmatrix} T & O \\ B & I \end{pmatrix} where T is the diagonal matrix (1/36) Diag(27, 27, 26, 26, 25, 25) occupying the first 6 rows and columns of A, so ρ(T) = 27/36 < 1, and hence
\[
\lim_{n\to\infty} A^n\vec{v}_0 = \begin{pmatrix} O & O \\ B(I-T)^{-1} & I \end{pmatrix}\vec{v}_0,
\]
where
\[
B = \frac{1}{36}\begin{pmatrix} 6 & 6 & 6 & 6 & 6 & 6 \\ 3 & 3 & 4 & 4 & 5 & 5 \end{pmatrix}
\quad\text{and}\quad
(I-T)^{-1} = 36\,\mathrm{Diag}\bigl(\tfrac{1}{9}, \tfrac{1}{9}, \tfrac{1}{10}, \tfrac{1}{10}, \tfrac{1}{11}, \tfrac{1}{11}\bigr).
\]
Thus,
\[
B(I-T)^{-1} = \frac{1}{165}\begin{pmatrix} 110 & 110 & 99 & 99 & 90 & 90 \\ 55 & 55 & 66 & 66 & 75 & 75 \end{pmatrix}
\]
and so the last row of A^∞ = lim_{n→∞} A^n is
\[
\tfrac{1}{165}(55, 55, 66, 66, 75, 75, 0, 165) = \bigl(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{2}{5}, \tfrac{2}{5}, \tfrac{5}{11}, \tfrac{5}{11}, 0, 1\bigr).
\]
Thus, the desired probability is
\[
w = \bigl(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{2}{5}, \tfrac{2}{5}, \tfrac{5}{11}, \tfrac{5}{11}, 0, 1\bigr)\cdot\vec{v}_0 = \tfrac{976}{1980} = \tfrac{244}{495} \doteq 0.4929.
\]
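The value w = 244/495 can be cross-checked without any matrices: given that the point is pt, every roll other than pt or 7 is irrelevant, so the shooter makes her point with probability p(pt)/(p(pt) + p(7)). A sketch in exact arithmetic (the helper `p` below is just the usual two-dice distribution, not something from the text):

```python
from fractions import Fraction as F

def p(s):
    # Probability that the sum of two fair dice equals s (2 <= s <= 12).
    return F(6 - abs(s - 7), 36)

# First stage: win outright on 7 or 11; otherwise, for each possible point,
# win the second stage with probability p(pt) / (p(pt) + p(7)).
win = p(7) + p(11)
for pt in (4, 5, 6, 8, 9, 10):
    win += p(pt) * p(pt) / (p(pt) + p(7))

assert win == F(244, 495)
print(float(win))   # ≈ 0.4929
```

This agrees with the Markov-chain computation above, as it must.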
Exercises 7.7.
1. Consider the following stochastic matrices
\[
A = \frac{1}{4}\begin{pmatrix} 2 & 0 & 1 \\ 0 & 4 & 2 \\ 2 & 0 & 1 \end{pmatrix}
\quad\text{and}\quad
B = \frac{1}{4}\begin{pmatrix} 2 & 2 & 1 \\ 0 & 2 & 2 \\ 2 & 0 & 1 \end{pmatrix}.
\]
Show that B is primitive but that A is not. (Suggestion: Show that A violates Perron's Theorem.)
2. Let A = (aij) be a stochastic matrix of size m×m. Show that:
(a) If for every pair i, j with 1 ≤ i, j ≤ m there is an index k (depending on i, j) with a_{ik} ≠ 0 and a_{kj} ≠ 0, then A is primitive. [Hint: Consider A².]

(b) If each row and each column of A contains fewer than m/2 zeros, then A is primitive. [Hint: Use part (a).]
3. A salesman's territory consists of 3 cities C1, C2, C3. He never sells in the same city on successive days. If he sells in C1, then the next day he sells in C2. If he sells in either C2 or C3, then the next day he is twice as likely to sell in C1 as in the other city.

Let v_{kn} denote the probability that he is in city C_k on the nth day, and let ~v_n = (v_{1n}, v_{2n}, v_{3n})^t denote the associated probability distribution vector.

(a) Draw a transition diagram and write down the transition matrix A such that ~v_{n+1} = A~v_n.

(b) Find the matrix A². If he is in city C2 on one day, what is the probability that he is in C1 two days later?
(c) Show that the matrix A is primitive.
(d) What proportion of his time does the salesman spend in each city in the long run?
4. A sequence of random numbers x1, x2, x3, . . . , xn, . . . is generated by rolling one die repeatedly and recording the number x_n shown at the top of the die at the nth roll. Let y_n = max(x1, x2, . . . , xn), the largest number shown in n rolls. This gives rise to a Markov chain in which the state s_j is the event: "the largest number shown so far is j" (for j = 1, 2, 3, 4, 5, 6).

(a) Write down the transition matrix and initial vector of the Markov chain.

(b) By means of (a), or otherwise, compute the probability that 4 is the largest number shown in three rolls of the die. What happens in the long run?
7.8 Application: Finding approximate eigenvalues of a matrix

Very often it is a difficult if not impossible task to find the exact eigenvalues of even a relatively small (say 5×5) matrix, and so we have to be content with finding approximate values. However, in place of finding first the characteristic polynomial and then its approximate roots, there is a better and quicker method based on the ideas of this chapter. This method, while not always successful, works frequently enough that most computer programs for finding approximate eigenvalues are based on it.
Before presenting the method, let us first observe the following variant of Theorem 7.8.

Theorem 7.19. If α ≠ 0 is a dominant, regular eigenvalue of A, then

(a) B = (1/α)A is power convergent, and
(b) each non-zero column of lim_{n→∞} B^n is an α-eigenvector of A.

The proof of this theorem is based on the following fact, which we had already used several times in the examples.
Observation. Let σ ≠ 0 be a complex number and let λ1, . . . , λs be the distinct eigenvalues of an m×m matrix A. Then σλ1, . . . , σλs are the distinct eigenvalues of σA and we have

m_{σA}(σλ_i) = m_A(λ_i) and ν_{σA}(σλ_i) = ν_A(λ_i), for 1 ≤ i ≤ s.

In particular, λ_i is a regular eigenvalue of A if and only if σλ_i is a regular eigenvalue of σA.
[Write ch_A(t) = (−1)^m det(A − tI) = (t − λ1)^{m_1} · · · (t − λs)^{m_s}. Then
\[
\begin{aligned}
\mathrm{ch}_{\sigma A}(t) &= (-1)^m \det(\sigma A - tI) \\
&= (-1)^m \det\bigl(\sigma(A - \tfrac{t}{\sigma}I)\bigr) \\
&= (-1)^m \sigma^m \det(A - \tfrac{t}{\sigma}I) \\
&= \sigma^m\bigl(\tfrac{t}{\sigma} - \lambda_1\bigr)^{m_1} \cdots \bigl(\tfrac{t}{\sigma} - \lambda_s\bigr)^{m_s} \\
&= (t - \sigma\lambda_1)^{m_1} \cdots (t - \sigma\lambda_s)^{m_s},
\end{aligned}
\]
which shows that σA has the eigenvalues σλ1, . . . , σλs with respective algebraic multiplicities m1, . . . , ms. Finally, since E_{σA}(σλ_i) = Nullspace(σA − σλ_iI) = Nullspace(σ(A − λ_iI)) = Nullspace(A − λ_iI) = E_A(λ_i), we have in particular that ν_{σA}(σλ_i) = ν_A(λ_i) by taking dimensions.]
Proof of Theorem 7.19. (a) Let λ1 = α, λ2, . . . , λs be the eigenvalues of A. Then by the above observation the eigenvalues of B are

1, λ2/α, . . . , λs/α.
Since α is a regular eigenvalue for A, 1 is a regular eigenvalue for B (by the same observation). Moreover, since α is dominant, we have |λ_j/α| < 1 for j > 1. Thus B satisfies the conditions of Theorem 7.6 for power convergence and hence is power convergent.
(b) Put L = lim_{n→∞} B^n, and let ~v = L~e_j be any non-zero column of L. Since
\[
BL = B\lim_{n\to\infty} B^n = \lim_{n\to\infty} B^{n+1} = \lim_{n\to\infty} B^n = L,
\]
we see that B~v = BL~e_j = L~e_j = ~v and so A~v = α~v. This means that ~v ∈ E_A(α), as claimed.
From the above theorem we see that for large n, every non-zero column of α^{−n}A^n, and therefore every non-zero column of A^n, is approximately an eigenvector of A belonging to the dominant eigenvalue α; furthermore, we see that the approximation becomes better the larger n is.
Let ~v = (v1, v2, . . . , vm)^t be any non-zero column of A^n where n is large. Since ~v is an approximate eigenvector of A, i.e. A~v ≐ α~v, the numbers

(A~v)_j / v_j (for which v_j ≠ 0)

differ little from each other and from α; in fact, if ~v were an exact eigenvector then these numbers would all coincide with the corresponding eigenvalue (which is close to α).
If we are given a matrix A and asked to find an eigenvalue of A, even if we don't know in advance that A has a dominant, regular eigenvalue, we could look at the powers of A (it's easy to do this with a computer) and test a non-zero column ~c_n.

First we will look at a 2×2 example (even though we can find the eigenvalues exactly for such small matrices) to illustrate this method (called the power method).
Example 7.30. Find an approximate eigenvalue of A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.

Solution. We have
\[
A^2 = \begin{pmatrix} 2 & 3 \\ 3 & 5 \end{pmatrix};
\quad\text{squaring } A^2 \text{ we get}\quad
A^4 = \begin{pmatrix} 13 & 21 \\ 21 & 34 \end{pmatrix};
\quad\text{squaring } A^4 \text{ we get}\quad
A^8 = \begin{pmatrix} 610 & 987 \\ 987 & 1597 \end{pmatrix}.
\]
358 Chapter 7: Powers of Matrices
Suppose we stopped squaring at the third step and chose the first column of A^4 as our candidate ~v for an eigenvector of A. How good is ~v? Is A~v close to a multiple of ~v?

Here ~v = (13, 21)^t, A~v = (34, 55)^t and
\[
\frac{(A\vec{v})_1}{v_1} = \frac{34}{13} \doteq 2.61538, \qquad
\frac{(A\vec{v})_2}{v_2} = \frac{55}{21} \doteq 2.61905.
\]
We could take either 2.61538 or 2.61905 as our candidate for the corresponding eigenvalue; let us compromise by taking λ to be their average: λ = (1/2)(34/13 + 55/21) ≐ 2.61722. Then ~v, λ will be good candidates if A~v is close to λ~v. How close is A~v to λ~v?
\[
\|A\vec{v} - \lambda\vec{v}\| = \sqrt{(34 - 13\lambda)^2 + (55 - 21\lambda)^2} < \sqrt{0.0005669 + 0.0014793} < 0.046.
\]
Is that small enough? That depends on

(a) how good an approximation is required and
(b) the size of ~v.

To standardize comparisons between different eigenvector candidates (ones coming from different powers of A) we could take care of (b) by normalizing ~v: here ‖A(~v/‖~v‖) − λ(~v/‖~v‖)‖ = (1/‖~v‖)‖A~v − λ~v‖ < 0.0022 in our case. If we are satisfied with that degree of approximation, we stop here. If not (suppose e.g. we wanted ‖A(~v/‖~v‖) − λ(~v/‖~v‖)‖ < 0.00001), then we square the power of A we were looking at (in this case A^4) and repeat the procedure on that matrix (in this case A^8). Taking the first column of A^8 (we could use any non-zero column) as our new candidate for an eigenvector of A we get
~v = (610, 987)^t, A~v = (1597, 2584)^t, and
\[
\frac{(A\vec{v})_1}{v_1} = \frac{1597}{610}, \qquad
\frac{(A\vec{v})_2}{v_2} = \frac{2584}{987};
\]
these last two numbers average out to λ ≐ 2.618033618. To see how good ~v, λ are, we calculate the tolerance ∆ which is defined by
\[
(24)\qquad \Delta(\vec{v}, \lambda) = \frac{\|A\vec{v} - \lambda\vec{v}\|}{\|\vec{v}\|}.
\]
Here we get
\[
\Delta = \frac{1}{\|\vec{v}\|}\left\|\begin{pmatrix} 1597 - 1597.00507 \\ 2584 - 2583.999181 \end{pmatrix}\right\| < \frac{1}{\|\vec{v}\|}(0.0052) < 0.000005,
\]
which is good enough since we prespecified ∆ < 0.00001. Moreover, we know that λ ≥ |µ| for all other eigenvalues µ of A (so λ is "almost" dominant). If, instead of 0.00001, we had specified 0.000001, then we would have to square A^8 and repeat the process. In this case we obtain λ ≐ 2.618033989 and ∆ ≐ 0 to 9 decimal places.
Thus we have found one approximate eigenvalue λ1 ≐ 2.6180... of A. To find the other one, we shall use the trace tr(A) of A. Recall (cf. section 7.5) that

tr(A) = sum of diagonal entries of A = sum of eigenvalues of A (counted with their multiplicities),

hence in our case tr(A) = λ1 + λ2 = 1 + 2 = 3, and hence λ2 = 3 − λ1 = 0.381966011 to 9 decimal places.
To see how good an approximation this is, let us compute λ1 and λ2 exactly:
\[
\mathrm{ch}_A(t) = \det\begin{pmatrix} 1-t & 1 \\ 1 & 2-t \end{pmatrix} = (1-t)(2-t) - 1 = t^2 - 3t + 1 = (t - \lambda_1)(t - \lambda_2),
\]
with λ1 = (3 + √(9 − 4))/2 = 3/2 + (1/2)√5 ≐ 2.618033989 and λ2 = (3 − √5)/2 ≐ 0.381966011. Thus, the approximate eigenvalues which we found agree with the exact eigenvalues up to 9 decimal places.
Example 7.31. Find an eigenvalue λ of
\[
A = \begin{pmatrix} 1 & 2 & 4 & 1 \\ 1 & 1 & 0 & 3 \\ 2 & 1 & 4 & 1 \\ -1 & 1 & 2 & 4 \end{pmatrix}
\]
with tolerance ∆(λ, ~v) < 0.0001 (cf. (24)).

Solution. The first column of A^8 is
\[
\vec{c} = \begin{pmatrix} 917758 \\ 558547 \\ 970125 \\ 515537 \end{pmatrix}.
\]
Since its entries are a bit too large to deal with comfortably, we choose ~v = 10^{−5} times ~c, i.e.
\[
\vec{v} = \begin{pmatrix} 9.17758 \\ 5.58547 \\ 9.70125 \\ 5.15537 \end{pmatrix},
\quad\text{and so}\quad
A\vec{v} = \begin{pmatrix} 64.30889 \\ 39.40674 \\ 67.90100 \\ 36.43187 \end{pmatrix}.
\]
The corresponding value of λ is
\[
\lambda = \frac{1}{4}\left\{ \frac{64.30889}{9.17758} + \cdots + \frac{36.43187}{5.15537} \right\} \doteq 7.032094636.
\]
To see how good the approximation is, we calculate:
\[
\|A\vec{v} - \lambda\vec{v}\| \doteq 0.4503433331, \qquad \|\vec{v}\| \doteq 15.36611666,
\]
therefore ∆(λ, ~v) ≐ 0.450343/15.36611 ≐ 0.029307 > 0.0001. So we look at 10^{−2} times the first column of (10^{−5}A^8)(10^{−5}A^8), call it ~w. We get
\[
\vec{w} = \begin{pmatrix} 5.405674 \\ 3.332553 \\ 5.700911 \\ 3.086350 \end{pmatrix}
\quad\text{and}\quad
A\vec{w} = \begin{pmatrix} 37.96079165 \\ \ldots \\ \ldots \\ \ldots \end{pmatrix}.
\]
The corresponding λ is
\[
\lambda = \frac{1}{4}\left( \frac{37.96079165}{5.405674} + \frac{\ldots}{3.332553} + \frac{\ldots}{5.700911} + \frac{\ldots}{3.086350} \right) \doteq 7.022464671.
\]
Since ‖A~w − λ~w‖ ≐ 0.00077075 and ‖~w‖ ≐ 9.074864, we see that ∆(λ, ~w) = ‖A~w − λ~w‖/‖~w‖ ≐ 0.00077075/9.074864 ≐ 0.0000849 < 0.0001, and so we can stop here and take λ, ~w as our approximate eigenvalue and eigenvector.
Power Method Algorithm

Suppose A is any m × m matrix and ε > 0 is the degree of tolerance required for our approximation (in our last example ε = 0.0001). Successively carry out the following steps until ∆ < ε. We will suppose we have tried A, A², A⁴, . . . , A^{2^{n−1}} and are now looking at M = A^{2^n}.

1) Let ~v be the first column of M (or any other non-zero column).

2) Let ~y = A~v.

3) Calculate λ = the average of { y_j/v_j : v_j ≠ 0 }.

4) Calculate ∆ = ‖~y − λ~v‖/‖~v‖.

5) If ∆ < ε, stop. Then (λ, ~v) are the required approximation. If ∆ > ε, then square A^{2^n}, call the new matrix M, and do steps 1)–5) again.
This is just a rough outline. We have left out the possibility of replacing M by a scalar multiple of M if the entries in M are unwieldy. In addition, some instruction should be given about when to quit if the method does not seem to be working.

We will point out in the exercises how the power method sometimes can give us an eigenvalue of A even when A doesn't have a dominant, regular eigenvalue. Of course, the method always works when A has a dominant, regular eigenvalue.
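The outline above translates almost line by line into code. The following sketch (plain Python, floating point; the rescaling of M and the iteration cap are our own choices, cf. the remarks above, and all the helper names are ours) implements steps 1)–5) and reproduces the estimate of Example 7.30:

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm(v):
    return sum(x * x for x in v) ** 0.5

def power_method(A, eps=1e-4, max_squarings=20):
    # Repeatedly square A; test the first column of each power.
    M = [row[:] for row in A]
    for _ in range(max_squarings):
        v = [row[0] for row in M]                        # step 1
        s = norm(v)
        v = [x / s for x in v]                           # normalize the candidate
        y = mat_vec(A, v)                                # step 2
        ratios = [yj / vj for yj, vj in zip(y, v) if abs(vj) > 1e-12]
        lam = sum(ratios) / len(ratios)                  # step 3
        delta = norm([yj - lam * vj for yj, vj in zip(y, v)]) / norm(v)  # step 4
        if delta < eps:                                  # step 5
            return lam, v
        M = mat_mul(M, M)                                # square and rescale
        big = max(abs(x) for row in M for x in row)
        M = [[x / big for x in row] for row in M]
    raise RuntimeError("power method did not converge")

lam, v = power_method([[1.0, 1.0], [1.0, 2.0]])
print(lam)   # ≈ 2.618034, i.e. close to (3 + 5**0.5) / 2
```

The rescaling after each squaring keeps the entries of M manageable without changing the direction of its columns.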
Deflation Method

Having found one eigenvalue of A, we would like to find the rest. One popular method is to replace A by a smaller matrix that contains the rest of the eigenvalues of A, and to keep repeating the process until all the eigenvalues are found.
Theorem 7.20. Suppose A = (a_{ij}) is an m × m matrix and ~v = (v1, . . . , vm)^t is an eigenvector for the eigenvalue α. If v1 ≠ 0, define the (m − 1) × (m − 1) matrix A2 by
\[
A_2 = \begin{pmatrix}
a_{22} & a_{23} & \ldots & a_{2m} \\
a_{32} & a_{33} & \ldots & a_{3m} \\
\vdots & & & \vdots \\
a_{m2} & a_{m3} & \ldots & a_{mm}
\end{pmatrix}
- \frac{1}{v_1}\begin{pmatrix} v_2 \\ v_3 \\ \vdots \\ v_m \end{pmatrix}(a_{12}, a_{13}, \ldots, a_{1m}).
\]
Then we have
\[
\mathrm{ch}_A(t) = (t - \alpha)\,\mathrm{ch}_{A_2}(t).
\]
Thus, if λ2, λ3, . . . , λm denote the eigenvalues of A2 (counted with multiplicities), then α, λ2, . . . , λm are those of A.
Before we prove the theorem, let us see how it is used.
Example 7.32. Use the deflation method to find the eigenvalues of
\[
A = \begin{pmatrix} 1 & 1 & 8 \\ 4 & 4 & 2 \\ 0 & 6 & 4 \end{pmatrix}.
\]

Solution. Since all row sums are 10, we see without further calculation that
\[
A\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = 10\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
\]
So 10 is an eigenvalue and (1, 1, 1)^t is one of its eigenvectors. Thus
\[
A_2 = \begin{pmatrix} 4 & 2 \\ 6 & 4 \end{pmatrix} - \frac{1}{1}\begin{pmatrix} 1 \\ 1 \end{pmatrix}(1, 8)
= \begin{pmatrix} 4 & 2 \\ 6 & 4 \end{pmatrix} - \begin{pmatrix} 1 & 8 \\ 1 & 8 \end{pmatrix}
= \begin{pmatrix} 3 & -6 \\ 5 & -4 \end{pmatrix}.
\]
The rest of the eigenvalues of A are the eigenvalues of A2, which we find by calculating the roots of the characteristic polynomial of A2. Here
\[
\mathrm{ch}_{A_2}(t) = (t - 3)(t + 4) + 30 = t^2 + t + 18
\]
and its roots are (−1 ± i√71)/2, so the eigenvalues of A are
\[
10, \quad \frac{-1 + i\sqrt{71}}{2}, \quad \frac{-1 - i\sqrt{71}}{2}.
\]
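The deflation step of Theorem 7.20 is mechanical enough to code directly. A sketch in exact fractions (`deflate` below is our own name for the displayed formula for A2; it is checked against the example just computed):

```python
from fractions import Fraction as F

def deflate(A, v):
    # One deflation step (Theorem 7.20): assumes v is an eigenvector
    # of A with v[0] != 0; returns the (m-1) x (m-1) matrix A2 whose
    # entries are a_ij - (v_i / v_1) a_1j for i, j >= 2.
    m = len(A)
    return [[A[i][j] - v[i] * A[0][j] / v[0] for j in range(1, m)]
            for i in range(1, m)]

A = [[F(1), F(1), F(8)],
     [F(4), F(4), F(2)],
     [F(0), F(6), F(4)]]

A2 = deflate(A, [F(1), F(1), F(1)])    # eigenvector for the eigenvalue 10
assert A2 == [[F(3), F(-6)], [F(5), F(-4)]]
```

The remaining eigenvalues of A can then be read off from the (smaller) characteristic polynomial of A2, exactly as in the solution above.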
Example 7.33. Find the eigenvalues of
\[
A = \begin{pmatrix} 1 & 1 & 0 & 1 \\ 0 & 2 & 1 & 0 \\ 1 & -1 & 3 & 0 \\ -1 & 1 & 2 & 1 \end{pmatrix}
\]
by the deflation method.

Solution. Since each row sum is 3, we see that 3 is an eigenvalue of A with eigenvector (1, 1, 1, 1)^t. Here
\[
A_2 = \begin{pmatrix} 2 & 1 & 0 \\ -1 & 3 & 0 \\ 1 & 2 & 1 \end{pmatrix} - \frac{1}{1}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}(1, 0, 1)
= \begin{pmatrix} 2 & 1 & 0 \\ -1 & 3 & 0 \\ 1 & 2 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & -1 \\ -2 & 3 & -1 \\ 0 & 2 & 0 \end{pmatrix}.
\]
We could use the power method to find the eigenvalues of A2, but instead we shall compute its characteristic polynomial. It is ch_{A2}(t) = (t − 2)(t² − 2t + 3), so the eigenvalues of A2 are 2, 1 + i√2, 1 − i√2, and hence the eigenvalues of A are 3, 2, 1 + i√2, 1 − i√2.
Proof of Theorem 7.20. We know that A~v = α~v and v1 ≠ 0. Define an m × m matrix P by
\[
P = \begin{pmatrix}
v_1 & 0 & 0 & \ldots & 0 \\
v_2 & 1 & 0 & \ldots & 0 \\
v_3 & 0 & 1 & \ldots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
v_m & 0 & 0 & \ldots & 1
\end{pmatrix}.
\]
If we let ~w = (v2, v3, . . . , vm)^t, then P = \begin{pmatrix} v_1 & \vec{0}^t \\ \vec{w} & I \end{pmatrix}, where I is the (m − 1) × (m − 1) identity matrix. The first column of AP is A~v, but A~v = α~v, so
\[
AP = \begin{pmatrix}
\alpha v_1 & a_{12} & \ldots & a_{1m} \\
\alpha v_2 & a_{22} & \ldots & a_{2m} \\
\alpha v_3 & a_{32} & \ldots & a_{3m} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha v_m & a_{m2} & \ldots & a_{mm}
\end{pmatrix}.
\]
If we put ~c = (a12, . . . , a1m)^t and let B denote the lower right (m − 1) × (m − 1) block of A, then AP = \begin{pmatrix} \alpha v_1 & \vec{c}^t \\ \alpha\vec{w} & B \end{pmatrix}. Moreover,
\[
P^{-1} = \begin{pmatrix} v_1^{-1} & \vec{0}^t \\ -v_1^{-1}\vec{w} & I \end{pmatrix}
\]
because, by block multiplication,
\[
\begin{pmatrix} v_1^{-1} & \vec{0}^t \\ -v_1^{-1}\vec{w} & I \end{pmatrix}
\begin{pmatrix} v_1 & \vec{0}^t \\ \vec{w} & I \end{pmatrix}
= \begin{pmatrix} v_1^{-1}v_1 + \vec{0}^t\vec{w} & v_1^{-1}\vec{0}^t + \vec{0}^tI \\ (-v_1^{-1}\vec{w})v_1 + \vec{w} & -v_1^{-1}\vec{w}\vec{0}^t + I \end{pmatrix}
= \begin{pmatrix} 1 & \vec{0}^t \\ \vec{0} & I \end{pmatrix},
\]
which is the m × m identity matrix. Therefore
\[
P^{-1}AP = \begin{pmatrix} v_1^{-1} & \vec{0}^t \\ -v_1^{-1}\vec{w} & I \end{pmatrix}
\begin{pmatrix} \alpha v_1 & \vec{c}^t \\ \alpha\vec{w} & B \end{pmatrix}
= \begin{pmatrix} \alpha + \vec{0}^t(\alpha\vec{w}) & v_1^{-1}\vec{c}^t + \vec{0}^tB \\ -\alpha\vec{w} + \alpha\vec{w} & -v_1^{-1}\vec{w}\vec{c}^t + B \end{pmatrix}
= \begin{pmatrix} \alpha & v_1^{-1}\vec{c}^t \\ \vec{0} & A_2 \end{pmatrix}.
\]
Since similar matrices have the same characteristic polynomials, we obtain
\[
\mathrm{ch}_A(t) = \mathrm{ch}_{P^{-1}AP}(t) = \det(tI - P^{-1}AP)
= \det\begin{pmatrix} t - \alpha & -v_1^{-1}\vec{c}^t \\ \vec{0} & tI - A_2 \end{pmatrix}.
\]
Expanding this determinant along the first column yields ch_A(t) = (t − α) det(tI − A2) = (t − α) ch_{A2}(t), as desired.
Here is a version of Theorem 7.20 which can be used if the first entry of the eigenvector ~v is zero. We omit the proof since it is similar to that of Theorem 7.20.
Theorem 7.20′. Suppose A = (a_{ij}) is an m × m matrix, ~v = (v1, . . . , vm)^t is an eigenvector for the eigenvalue α, and v_j ≠ 0. Define
\[
A_2 = A^{(j)} - \frac{1}{v_j}\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_{j-1} \\ v_{j+1} \\ \vdots \\ v_m \end{pmatrix}
(a_{j1}, a_{j2}, \ldots, a_{j,j-1}, a_{j,j+1}, \ldots, a_{jm}),
\]
where A^{(j)} is the (m − 1) × (m − 1) matrix obtained from A by deleting its jth row and column. Then
\[
\mathrm{ch}_A(t) = (t - \alpha)\,\mathrm{ch}_{A_2}(t),
\]
and hence α, λ2, λ3, . . . , λm are the eigenvalues of A if λ2, λ3, . . . , λm are those of A2.
Exercises 7.8.
1. Let
\[
A = \begin{pmatrix} 0 & 2 & -2 & 4 \\ 0 & 2 & -1 & 1 \\ -1 & 2 & -1 & 4 \\ -1 & 1 & -1 & 4 \end{pmatrix}.
\]
Given that 2 is an eigenvalue of A, use the deflation method (at least once) to find the spectrum of A.
7.9 Review Exercises
Review Problems, Part A
1. Which of the following matrices is power convergent?
\[
\text{(a)}\ \frac{1}{4}\begin{pmatrix} 3 & -3 & 1 \\ 0 & 2 & 0 \\ -1 & -5 & 3 \end{pmatrix},\quad
\text{(b)}\ \frac{1}{4}\begin{pmatrix} 3 & 3 & 1 \\ 0 & 4 & 0 \\ 1 & -3 & 3 \end{pmatrix},\quad
\text{(c)}\ \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix},\quad
\text{(d)}\ \begin{pmatrix} 0 & 1.9 & .9 \\ 0 & .9 & 0 \\ .9 & .9 & 0 \end{pmatrix},\quad
\text{(e)}\ \begin{pmatrix} .8 & .4 \\ .2 & .6 \end{pmatrix}.
\]
For those matrices A which are power convergent, find lim_{n→∞} A^n.
2. Find the spectral radius of each matrix:
\[
\text{(a)}\ \begin{pmatrix} 2 & 1 \\ 3 & 1 \end{pmatrix};\quad
\text{(b)}\ \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix};\quad
\text{(c)}\ \begin{pmatrix} 3 & 4 & 5 \\ -1 & 2 & 1 \\ 1 & 0 & 1 \end{pmatrix};\quad
\text{(d)}\ \begin{pmatrix} 7 & 3 & 1 & -1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & -1 & 4 \\ 0 & 0 & 0 & -9 \end{pmatrix}.
\]

3. Find the spectral radius of A if ch_A(t) = (t² + 6t + 9)(t² + 2t + 11)².

4. Find the spectral radius of A if
\[
\lim_{n\to\infty} A^n = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 3 & 3 \end{pmatrix}.
\]
5. Suppose that
\[
A = \begin{pmatrix}
2 & .5 & 0 & .5 \\
.2 & i & .4 & -.1 \\
.2 & 0 & 1+i & 1 \\
\frac{1+i}{2} & .1 & 0 & 3
\end{pmatrix},
\]
and that λ is any eigenvalue of A.

(a) Draw the Gersgorin discs D_j(A), j = 1, 2, 3, 4 in the plane.

(b) Use (a) to obtain upper and lower bounds on Re(λ), Im(λ) and |λ|.

(c) Does your answer in (a) give you enough information to decide:

(i) If A is invertible? If so, is it?

(ii) If (1/3)A is power convergent? If so, is it?
Section 7.9: Review Exercises 365
6. Suppose
\[
A = \begin{pmatrix} 3 & 1 & 0 & -1 \\ -2 & 5 & 1 & 0 \\ -1 & 0 & 4 & 1 \\ 2 & -1 & -1 & 7 \end{pmatrix}.
\]

(a) Use Method 1 (Theorem 7.12) to estimate ρ(A).

(b) Use Method 2 (Theorem 7.14) to estimate ρ(A).

(c) Use Method 3 (Theorems 7.12 and 7.14) to estimate ρ(A).

(d) Do you have enough information from (a), (b), and (c) to decide that
\[
\lim_{n\to\infty}\left(\tfrac{1}{9}A\right)^n = 0?
\]
Explain briefly.
7. For each of the following matrices T, determine whether the geometric series generated by T converges. If so, find the sum Σ_{n=1}^∞ T^n.
\[
\text{(a)}\ T = .3,\quad
\text{(b)}\ T = \frac{1}{4}\begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix},\quad
\text{(c)}\ T = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix},\quad
\text{(d)}\ T = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{pmatrix}.
\]
8. (a) Use the technique of the economics example to evaluate
\[
\vec{w} = \lim_{n\to\infty}\begin{pmatrix}
.2 & .4 & 0 & 0 \\
.4 & .4 & 0 & 0 \\
.3 & .1 & 1 & 0 \\
.1 & .1 & 0 & 1
\end{pmatrix}^{\!n}\begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix}.
\]
(b) Find two different stochastic vectors such that the vectors ~w calculated in (a) are different.
9. (a) Use the fact that A is a primitive, stochastic matrix to evaluate lim_{n→∞} A^n when
\[
A = \begin{pmatrix} .5 & .2 & .1 \\ .3 & .4 & .8 \\ .2 & .4 & .1 \end{pmatrix}.
\]
(b) If ~v = (v1, v2, v3)^t is an arbitrary stochastic vector, find lim_{n→∞} A^n~v.

(c) Is there a pair of stochastic vectors ~v, ~w such that
\[
\lim_{n\to\infty} A^n\vec{v} \ne \lim_{n\to\infty} A^n\vec{w}?
\]
10. Suppose we alter the game of "craps" so that the player wins (instead of losing) if he rolls a 2 at the first stage of the game.
(a) Find the initial vector ~v0 for the Markov chain describing the new game.
(b) Find the probability that the player wins eventually under this new rule.
11. If A is an n × n stochastic matrix and A^t is also stochastic, show that
\[
\lim_{k\to\infty} A^k = \frac{1}{n}\begin{pmatrix}
1 & 1 & \cdots & 1 \\
1 & 1 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
1 & 1 & \cdots & 1
\end{pmatrix}.
\]
12. Players A and B play the following game:

Player A begins the game with $1 and B begins with $3. A rolls two dice. If A rolls 3, 5, or 6 then A wins $1. If A rolls 2 then A wins $2 (or $1, if that is all the money that B has). If A rolls any other number then B wins $1. Player A continues to roll until A or B has no money left. Find the probability that A wins eventually.

Suggestion. Use a Markov chain with 5 states:

s1: A has $1
s2: A has $2
s3: A has $3
s4: A has $4 (i.e. B has no more money)
s5: A has $0 (i.e. A has no more money)

Sample game:

Before roll no. | A has | B has | Then A rolls
0 | $1 | $3 | 2 — A wins $2 from B
1 | $3 | $1 | 9 — A loses $1 to B
2 | $2 | $2 | 6 — A wins $1 from B
3 | $3 | $1 | 4 — A loses $1 to B
4 | $2 | $2 | 3 — A wins $1 from B
5 | $3 | $1 | 2 — so A wins only $1 from B
6 | $4 | $0 | — B loses: game over
13. Given that ~v = (0, 2, 0, 1)^t is an eigenvector of
\[
A = \begin{pmatrix} 0 & -1 & 1 & 2 \\ 2 & 4 & -1 & -2 \\ 4 & 2 & 1 & -4 \\ -3 & -1 & 1 & 5 \end{pmatrix}:
\]

(a) Find the eigenvalue of A belonging to the given vector.

(b) Find the spectrum of A by using deflation at least once.
Review Problems, Part B
1. Find
\[
\lim_{n\to\infty}\begin{pmatrix}
.1 & 0 & 0 & 0 \\
0 & .2 & 0 & 0 \\
.8 & .4 & 1 & 0 \\
.1 & .4 & 0 & 1
\end{pmatrix}^{\!n}
\quad\text{and}\quad
\lim_{n\to\infty}\begin{pmatrix}
.1 & 0 & 0 & 0 & 0 \\
0 & .2 & 0 & 0 & 0 \\
0 & 0 & .3 & 0 & 0 \\
.8 & .4 & .3 & 1 & 0 \\
.1 & .4 & .4 & 0 & 1
\end{pmatrix}^{\!n}.
\]
2. Suppose
\[
A = \begin{pmatrix}
x & 1 & 1 & 1 & 1 \\
0 & .5 & 1 & 2 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & .7
\end{pmatrix}.
\]
For which values of x (if any) is A power convergent? Explain briefly.
3. Suppose that the characteristic polynomial of an m × m matrix A is (t − 1)²(t + 1)³(t² + 4)(t − 2), and B = (1/2)A.

(a) Find m and the trace of A.

(b) Find the spectrum of B.

(c) Is B power convergent? Justify your answer.
4. Suppose
\[
A = \begin{pmatrix}
-1+3i & 0 & 1 & 2 \\
0 & 1+3i & 0.3 & 0.2 \\
0.5 & -2 & 3.5 & 0 \\
0 & 1 & -1.5 & 3
\end{pmatrix}.
\]
Apply Gersgorin's theorem to A and the transpose of A,

(a) to estimate the minimum of the real parts of the eigenvalues of A,

(b) to estimate the maximum of the real parts of the eigenvalues of A.
5. Suppose A = \begin{pmatrix} 1+i & -1 & .2 \\ .2 & 1-i & -1 \\ .3 & .2 & -1 \end{pmatrix}, and λ is any eigenvalue of A.

(a) Sketch the Gersgorin discs D_j(A), j = 1, 2, 3.

(b) Obtain upper and lower bounds on |λ|, Re(λ), and Im(λ).

(c) Does the information above enable you to decide if A is invertible?

(d) Does the information above enable you to decide if (1/2)A is power convergent?
368 Chapter 7: Powers of Matrices
6. Suppose that the characteristic polynomial of an m × m matrix A is (t² + 6t + 9)(t² + 2t + 10)(t² + 2t − 8).

(a) Find m, the determinant and the trace of A.

(b) If r is the spectral radius of A, is (1/r)A power convergent? Justify your answer.
7. Suppose A = \begin{pmatrix} 1 & 1 & 1 & 0 \\ -1 & 1 & 2 & 1 \\ -1 & 1 & 2 & -1 \\ 1 & -2 & 1 & 1 \end{pmatrix} and B = (1/4)A.

(a) Estimate the spectral radius of A by any of the methods of the text.

(b) Show that B is power convergent.
8. Let A = \begin{pmatrix} 0.8 & 0.3 & 0.2 \\ 0 & 0.2 & 0 \\ 0 & 0.5 & 0.4 \end{pmatrix}.

(a) Find the spectral radius of A.

(b) Solve the equation
\[
\sum_{n=0}^{\infty} A^n\vec{x} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
\]
9. Let A = \begin{pmatrix} 3i & 1 & 1 - \frac{4}{3}i & 0 \\ 0 & 0 & 2 & i \\ -1 & 1 & 3 & -1 \\ 1 & 0 & 0 & -2 \end{pmatrix}.

(a) For each of the Gersgorin discs of A, write down its centre and radius.

(b) Give a rough sketch of the above discs.

(c) Use Gersgorin's theorem to estimate the spectral radius ρ(A) of A.
10. Alex, Barbara and Charles are playing a game which has been analysed as a Markov process with 5 states: 1 - Alex plays, 2 - Barbara plays, 3 - Charles plays, 4 - Alex wins, 5 - Alex loses. The matrix M of transition probabilities and the matrix M* = lim_{n→∞} M^n are given below.
\[
M = \begin{pmatrix}
3/8 & 3/8 & 1/8 & 0 & 0 \\
1/8 & 3/8 & 3/8 & 0 & 0 \\
3/8 & 1/8 & 3/8 & 0 & 0 \\
1/8 & 0 & 0 & 1 & 0 \\
0 & 1/8 & 1/8 & 0 & 1
\end{pmatrix}, \qquad
M^* = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
11/26 & 4/13 & 7/26 & 1 & 0 \\
15/26 & 9/13 & 19/26 & 0 & 1
\end{pmatrix}.
\]
Section 7.9: Review Exercises 369
1) What is the chance that Charles will have the next turn if Alex is playing now?

2) What is the chance that Alex will have the next turn if Charles is playing now?

3) If Alex is not allowed to start the game, whom should he prefer to have start it? Give your reason.

4) What are the chances that Alex will win if the starting player is chosen at random from among the three of them?

5) If M is written as \begin{pmatrix} A & O \\ B & I \end{pmatrix}, how is M* written in terms of A, B, I, O?

6) Fill in the last three rows of a new matrix N which represents the analysis made when the state "Alex loses" is broken up into 5 - "Barbara wins" and 6 - "Charles wins". You need to know that only the person actually playing can win on that turn.
\[
N = \begin{pmatrix}
3/8 & 3/8 & 1/8 & 0 & 0 & 0 \\
1/8 & 3/8 & 3/8 & 0 & 0 & 0 \\
3/8 & 1/8 & 3/8 & 0 & 0 & 0 \\
? & ? & ? & 1 & 0 & 0 \\
? & ? & ? & 0 & 1 & 0 \\
? & ? & ? & 0 & 0 & 1
\end{pmatrix}
\]
11. Find lim_{n→∞} A^n where A is the column (and row) stochastic matrix
\[
A = \begin{pmatrix} .8 & .1 & .1 \\ .2 & .7 & .1 \\ 0 & .2 & .8 \end{pmatrix}.
\]
12. Let A = \begin{pmatrix} 4 & 1 & 1 & 1 \\ 1 & 3 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}.

(a) Show that 1 is an eigenvalue of A.

(b) Without calculating the characteristic polynomial of A, show that the eigenvalues of A are real numbers lying in the interval 0 < x ≤ √33.

(c) By using the deflation method, find a 3 × 3 matrix B such that f(x) = (x − 1)g(x), where f(x) and g(x) are the characteristic polynomials of A and B respectively.

(d) Given that the other two eigenvalues of B are (1/2)(3 ± √5), why does L = lim_{n→∞} ((1/5)A)^n exist? Find L.
13. Suppose A = \begin{pmatrix} 0 & 4 & 4 & 4 \\ 2 & 1 & 5 & 7 \\ 2 & 2 & 6 & -7 \\ 1 & 1 & 3 & 3 \end{pmatrix}. Use deflation at least once to find a spectrum of A, given that
\[
A\begin{pmatrix} 2 \\ 1 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\]
14. Let A = \begin{pmatrix} -1 & 1 & 3/2 \\ -1 & 0 & 0 \\ -3/2 & 3/2 & 5/2 \end{pmatrix}.

(a) Show that (1, −1, 2)^t is an eigenvector of A belonging to eigenvalue 1.

(b) Using the method of deflation, find all other eigenvalues.

(c) Is A power convergent? If yes, find lim_{n→∞} A^n.
15. Let A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 12 & -16 & 7 \end{pmatrix}, ~v = \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}.

(a) Verify by direct matrix multiplication that ~v is a 2-eigenvector of the matrix A.

(b) Apply deflation and find the remaining eigenvalues of A.