
Chapter 6

Vector Spaces

§6.1 Definition and Some Basic Properties

In Chapter 3, we have learnt some properties of the vector space Fn. In this chapter, we shall consider the concept of a vector space in general. Now let us give a definition of a general vector space.

Definition 6.1.1 Let F be a field. A vector space V over F is a non-empty set with two laws of combination called vector addition “+” (or simply addition) and scalar multiplication “·” satisfying the following axioms:

(V1) + : V × V → V is a mapping and +(α, β), written as α + β, is called the sum of α and β.

(V2) + is associative.

(V3) + is commutative.

(V4) There is an element, denoted by 0, such that α + 0 = α for all α ∈ V . Note that such a vector is unique (see Exercise 6.1-3). It is called the zero vector of V .

(V5) For each α ∈ V there is an element in V , denoted by −α such that α+ (−α) = 0.

(V6) · : F × V → V is a mapping which associates with a ∈ F and α ∈ V a unique element denoted by a · α or simply aα in V . This mapping is called the scalar multiplication.

(V7) Scalar multiplication is associative, i.e., a(bα) = (ab)α for all a, b ∈ F, α ∈ V .

(V8) Scalar multiplication is distributive with respect to +, i.e., a(α+β) = aα+aβ forall a ∈ F, α, β ∈ V .

(V9) For each a, b ∈ F, α ∈ V , (a+ b)α = aα+ bα.

(V10) For each α ∈ V , 1 · α = α, where 1 is the unity of F.


Elements of V and F are called vectors and scalars, respectively. In this book, vectors are often denoted by lower case Greek letters α, β, γ, . . . and scalars are often denoted by lower case Latin letters a, b, c, . . . .

Mathematical statements often contain quantifiers “∀” and “∃”. The quantifier “∀”is read “for each”, “for every” or “for all” and the quantifier “∃” is read “there exists”,“there is” or “for some”.

Lemma 6.1.2 (Cancellation Law) Suppose α, β and γ are vectors in a vector space. If α + β = α + γ, then β = γ.

Proof: By (V1), we have −α+(α+β) = −α+(α+γ). By (V2), we have (−α+α)+β =(−α+ α) + γ. By (V3), (V5) and (V4) we have the lemma. �

Corollary 6.1.3 Suppose α and β are vectors in a vector space. If α + β = α, thenβ = 0.

Proof: Since α+ β = α = α+ 0, by Lemma 6.1.2 we have the corollary. �

Proposition 6.1.4 Let V be a vector space over F. We have

(a) ∀α ∈ V , 0α = 0.

(b) ∀α ∈ V , (−1)α = −α.

(c) ∀a ∈ F, a0 = 0.

Proof: By (V6) we know that 0α ∈ V . By (V9) we have 0α = (0 + 0)α = 0α + 0α. By Corollary 6.1.3 we have (a).

By (V9), (V10) and (a) we have (−1)α + α = [(−1) + 1]α = 0α = 0. By the uniqueness of the inverse (see Exercise 6.1-4), we have (−1)α = −α.

By (V4) and (V8) we have a0 = a(0 + 0) = a0 + a0. By Corollary 6.1.3 we have (c). �

The following are examples of vector spaces. F is assumed to be a field. The reader can easily define + and · and check that they satisfy all the axioms of a vector space.

1. Let 0 be the zero of F. Then {0} is a vector space over F.

2. Let n ∈ N. Fn is a vector space over F. In particular, F is a vector space over F. Rn is a vector space over R.

3. Let m,n ∈ N. The set Mm,n(F) is a vector space over F under the usual addition andscalar multiplication.

4. Let S be a non-empty set. Let V be the set of all functions from S into F. For f, g ∈ V , f + g is defined by the formula (f + g)(a) = f(a) + g(a) ∀a ∈ S. Also for c ∈ F, cf is defined by (cf)(a) = cf(a) ∀a ∈ S. Then V is a vector space over F.


5. Suppose I is an interval of R. Let C0(I ) be the set of all continuous real valuedfunctions defined on I . Then C0(I ) is a vector space over R.

6. Let L[a, b] be the set of all integrable real valued functions defined on the closedinterval [a, b]. Then L[a, b] is a vector space over R.

7. Recall that F[x] is the set of all polynomials in the indeterminate x over F. Underthe usual addition and scalar multiplication of polynomials, F[x] is a vector spaceover F.

8. For n ∈ N, let Pn(F) be the subset of F[x] consisting of all polynomials in x of degree less than n (of course, together with the zero polynomial). Then Pn(F) is a vector space over F with the same addition and scalar multiplication as in F[x] defined in the previous example. Namely, Pn(F) can be written as

   Pn(F) = { a0 + a1x + · · · + an−1x^{n−1} | ai ∈ F }.
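As an informal illustration of Example 8 (this sketch and the helper names add and scale are ours, not the text's), a polynomial in P3(R) can be modelled on a computer by its list of coefficients, with the two laws of combination computed coordinatewise:

```python
# A sketch of P3(R): polynomials a0 + a1*x + a2*x**2 stored as coefficient
# lists [a0, a1, a2]; vector addition and scalar multiplication act
# coordinatewise, exactly as for triples in R^3.

def add(p, q):
    """Vector addition in P3(R)."""
    return [a + b for a, b in zip(p, q)]

def scale(c, p):
    """Scalar multiplication in P3(R)."""
    return [c * a for a in p]

p = [1, 0, 2]                # 1 + 2x^2
q = [0, 3, -1]               # 3x - x^2
print(add(scale(2, p), q))   # [2, 3, 3], i.e. 2 + 3x + 3x^2
```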

Exercise 6.1

6.1-1. Let α be any vector of a vector space. Prove that α+ (−α) = (−α) + α withoutusing the commutative law (V3).

6.1-2. Let α be any vector of a vector space. Prove that 0 + α = α without using thecommutative law (V3).

6.1-3. Prove that the zero vector 0 of a vector space V is unique, i.e., if 0′ is an element in V such that α + 0′ = α for all α ∈ V , then 0′ = 0.

6.1-4. Prove that for each α ∈ V , −α is unique, i.e., if β ∈ V is such that α + β = 0, then β = −α.

6.1-5. Determine whether or not the following are vector spaces over R.

(a) The set of polynomials in P9(R) of even degree together with the zero poly-nomial.

(b) The set of polynomials in P4(R) with at least one real root.

(c) The set of all even real-valued functions defined on [−1, 1].

6.1-6. Show that the set of solutions of the differential equation d^2y/dx^2 + y = 0 is a vector space over R.

6.1-7. Show that C can be considered as a vector space over R. Also show that R can be considered as a vector space over Q. Can R be a vector space over C? Justify your answer.

6.1-8. Let V be any plane through the origin in R3. Show that points in V form a vectorspace under the standard addition and scalar multiplication of vectors in R3.


6.1-9. Let V = R2. Define scalar multiplication and addition on V by

c(x, y) = (cx, cy), (x1, y1) + (x2, y2) = (x1 + x2, 0),

where c ∈ R. Is V a vector space over R? Justify your answer.

6.1-10. Let V = R. Define addition, denoted by ⊕, by x ⊕ y = max{x, y} and scalarmultiplication c · x = cx, the usual multiplication of real numbers. Is V a vectorspace over R? Justify!

6.1-11. Let V denote the set of all infinite sequences of real numbers. Define additionand scalar multiplication by

{an}∞n=1 + {bn}∞n=1 = {an + bn}∞n=1, c{an}∞n=1 = {can}∞n=1, ∀c ∈ R.

Show that V is a vector space over R.

6.1-12. Suppose W is the set of all convergent sequences of real numbers. Define addition and scalar multiplication as in Exercise 6.1-11. Is W still a vector space over R?

6.1-13. Let V be a vector space over F.

(a) Show that if a ∈ F, a ≠ 0, α ∈ V and aα = 0, then α = 0.

(b) Show that if a ∈ F, α ∈ V , α ≠ 0, and aα = 0, then a = 0.

§6.2 Linear Dependence and Linear Independence

We have learnt the concepts of linear combination and linear independence in Chapter 3. These concepts can be extended to any vector space.

Definition 6.2.1 Let V be a vector space over F. Suppose α1, α2, . . . , αn and β are vectors in V . β is said to be a linear combination of α1, α2, . . . , αn if there are scalars a1, a2, . . . , an such that β = a1α1 + a2α2 + · · · + anαn. We also say that β is linearly dependent on {α1, α2, . . . , αn}.

By the above definition, 0 is always a linear combination of a set of vectorsα1, α2, . . . , αn in a vector space, since we can choose ai = 0 for all 1 ≤ i ≤ n.

Definition 6.2.2 Vectors α1, α2, . . . , αn are linearly dependent over F if there exist a1, a2, . . . , an in F, not all zero, such that a1α1 + a2α2 + · · · + anαn = 0. Vectors α1, α2, . . . , αn that are not linearly dependent over F are called linearly independent over F. This is equivalent to the following statement:

If a1α1 + a2α2 + · · · + anαn = 0 for some ai ∈ F, then ai = 0 for all i.


A set S ⊆ V , where V is a vector space over F, is said to be linearly dependent if there exist α1, α2, . . . , αn ∈ S such that α1, α2, . . . , αn are linearly dependent over F. A set is said to be linearly independent if it is not linearly dependent.
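Over R, linear dependence of a finite list of vectors in Rn can be tested mechanically by comparing the rank of the matrix having the vectors as rows with the number of vectors. This is a computational aside, not part of the text's development; the sketch below assumes numpy is available and the function name linearly_independent is ours.

```python
import numpy as np

def linearly_independent(vectors, tol=1e-10):
    """Over R, the vectors are linearly independent iff the matrix having
    them as rows has rank equal to the number of vectors."""
    A = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(A, tol=tol) == len(vectors)

print(linearly_independent([(1, 0, 2), (1, 2, 2)]))             # True
print(linearly_independent([(1, 0, 2), (1, 2, 2), (2, 2, 4)]))  # False: row3 = row1 + row2
```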

Remark 6.2.3 Let V be a vector space over a field F.

1. Let S be a subset of V . If 0 ∈ S, then S is linearly dependent.

2. For α ∈ V and α ≠ 0, the set {α} is linearly independent.

3. The empty set ∅ is always linearly independent.

4. Any set containing a linearly dependent subset is linearly dependent.

5. Any subset of a linearly independent set is linearly independent.

Examples 6.2.4

1. The set {1, x, x2, x2+x+1, x3} is linearly dependent in the vector space F[x] over F.

2. The set {1, x, x2, x3} is linearly independent in F[x] over F.

3. In general, the set of monomials {1, x, x2, x3, . . . } is linearly independent in F[x] overF. �

Theorem 6.2.5 If α is linearly dependent on {β1, . . . , βn} and each βi is linearly dependent on {γ1, . . . , γm}, then α is linearly dependent on {γ1, . . . , γm}.

Proof: From the definition we have α = ∑_{i=1}^{n} aiβi and βi = ∑_{j=1}^{m} cijγj for some scalars ai's and cij's. Then

α = ∑_{i=1}^{n} ai ( ∑_{j=1}^{m} cijγj ) = ∑_{j=1}^{m} ( ∑_{i=1}^{n} aicij ) γj .

Theorem 6.2.6 A set of non-zero vectors {α1, . . . , αn} is linearly dependent if and only if there is a vector αk (k ≥ 2) that is a linear combination of α1, α2, . . . , αk−1.

Proof: Suppose {α1, . . . , αn} is linearly dependent. Then ∃a1, . . . , an ∈ F not all zero such that ∑_{i=1}^{n} aiαi = 0. Suppose k is the largest index such that ak ≠ 0. Clearly k ≥ 2 (since α1 ≠ 0). Thus

αk = −ak^{−1} ( ∑_{i=1}^{k−1} aiαi ) = ∑_{i=1}^{k−1} (−ak^{−1} ai)αi.

The converse is trivial. �

We rewrite the contrapositive of Theorem 6.2.6 below.


Theorem 6.2.7 A set of non-zero vectors {α1, . . . , αn} is linearly independent if andonly if for each k (2 ≤ k ≤ n), αk is not a linear combination of α1, α2, . . . , αk−1.

Exercise 6.2

6.2-1. For which values of λ do the vectors (λ,−1,−1), (−1, λ,−1), (−1,−1, λ) form alinearly dependent set in R3?

6.2-2. Let α1, α2, α3 be linearly independent vectors. Suppose β1 = α1 + α2 + α3, β2 = α1 + aα2, β3 = α1 + bα3. Find the condition that must be satisfied by a, b in order that β1, β2 and β3 are linearly independent.

6.2-3. Regarding R as a vector space over Q, show that {1, √2, √3} is linearly independent.

6.2-4. Let C0[a, b] be the space of all real continuous functions defined on the closedinterval [a, b].

(a) Show that sinx, sin 2x, sin 3x are linearly independent (over R) in C0[0, 2π].

(b) Show that 2x and |x| are linearly independent in C0[−1, 1].

(c) Show that 2x and |x| are linearly dependent in C0[0, 1].

(d) Show that e^x, e^{2x}, . . . , e^{nx} are linearly independent in C0[0, 1].

6.2-5. Are the matrices A, B and C linearly independent? Here

       A = ( 2 0     B = ( 0 1     C = ( 1 1
             1 3 ),        1 0 ),        0 1 ).

6.2-6. Are the functions x, e^x, xe^x, (2 − 3x)e^x linearly independent in C0(R) (over R)?

6.2-7. Show that if {α1, . . . , αn} is linearly independent over F and if {α1, . . . , αn, β} islinearly dependent over F, then β depends on α1, . . . , αn.

6.2-8. Let V be a vector space over R. Show that {α, β, γ} is linearly independent ifand only if {α+ β, β + γ, γ + α} is linearly independent.

6.2-9. Give an example of three linearly dependent vectors in R3 such that any two of them are linearly independent.

6.2-10. Suppose α1, . . . , αn are linearly independent vectors and α = a1α1 + · · · + anαn for some fixed scalars a1, . . . , an in F. Show that the vectors α − α1, α − α2, . . . , α − αn are linearly independent if and only if a1 + a2 + · · · + an ≠ 1.

6.2-11. Suppose 0 ≤ θ1 < θ2 < θ3 < π/2 are three constants. Let fi(x) = sin(x + θi), i = 1, 2, 3. Is {f1, f2, f3} linearly independent? Why?

6.2-12. Let V = R[x]. Let f ∈ V be a polynomial of degree n. Is {f, f′, f′′, . . . , f^(n)} linearly independent? Justify! Here f^(k) is the k-th derivative of f .


§6.3 Subspaces

Definition 6.3.1 A subspace W of a vector space V is a non-empty subset of V which isitself a vector space with respect to the vector addition and scalar multiplication definedin V .

By the same proof of Lemma 3.2.4 we have the following lemma.

Lemma 6.3.2 Suppose V is a vector space over F and W is a non-empty subset of V .The following statements are equivalent:

(a) If α, β ∈W , then aα+ bβ ∈W for any a, b ∈ F.

(b) If α, β ∈W , then aα+ β ∈W for any a ∈ F.

(c) If α, β ∈W , then α+ β ∈W and aα ∈W for any a ∈ F.

Proposition 6.3.3 Suppose V is a vector space over F and W is a non-empty subsetof V . Then W is a subspace of V if and only if for any α, β ∈ W , a ∈ F we haveaα+ β ∈W .

Proof: The only if part is trivial.

For the if part we have to check all the axioms of a vector space. By Lemma 6.3.2, (V1) and (V6) hold. Since V is a vector space, axioms (V2), (V3), (V7), (V8), (V9) and (V10) hold automatically. Let α ∈ W (it exists since W ≠ ∅). By Lemma 6.3.2, 0α + 0α = 0 ∈ W . Thus (V4) holds. For any α ∈ W , since 0 ∈ W , by the assumption (−1)α + 0 = −α ∈ W . Thus (V5) holds. Therefore, W is a subspace of V . �

Examples 6.3.4

1. Let V = R2 and W = {(x, 0) | x ∈ R}. Then W is a subspace of V .

2. The set Pn(F) is a subspace of the vector space F[x] over F.

3. Q ⊂ R but Q is not a subspace of R over R. This is because Q is not a vector space over R. �

Theorem 6.3.5 The intersection of any collection of subspaces is still a subspace.

Proof: Let {Wλ | λ ∈ Λ} be a collection of subspaces of a vector space V over F. Let W = ⋂_{λ∈Λ} Wλ. Since 0 ∈ Wλ for all λ ∈ Λ, W ≠ ∅. For any α, β ∈ W and a ∈ F, we have α, β ∈ Wλ for all λ ∈ Λ. Since Wλ is a subspace, aα + β ∈ Wλ. Since this holds for each λ ∈ Λ, aα + β ∈ W . Hence W is a subspace of V . �

Note that the union of two subspaces need not be a subspace. Can you provide anexample?

From Chapter 3, we have learnt the concept of a spanning set of a vector space. In this chapter, we make a general definition of a spanning set.


Definition 6.3.6 Let V be a vector space over F. Suppose S ⊆ V . The span of S,denoted by span(S), is defined as the intersection of all subspaces of V containing S. Sis called a spanning set of span(S).

By the above definition we have the following lemma.

Lemma 6.3.7 Suppose A ⊆ B ⊆ V , where V is a vector space. Then span(A) ⊆span(B).

For convenience, when S is a finite set, say {α1, . . . , αn}, we often write span(S) = span{α1, . . . , αn}.

Remark 6.3.8 One can show that span(S) is the smallest (with respect to set inclusion) subspace of V containing S. Hence if S is a subspace then span(S) = S. Thus span(span(A)) = span(A) for any subset A of V . Also, by the definition, span(∅) = {0}.

From Chapter 3, we know that if S is a finite subset of a vector space, then span(S) is the set of all linear combinations of vectors in S. In the following we shall show that the two definitions are consistent.

Theorem 6.3.9 Let V be a vector space over F. Suppose ∅ ≠ S ⊆ V . Then

span(S) = { ∑_{i=1}^{n} aiαi | αi ∈ S, ai ∈ F and n ∈ N }.

Proof: Let U = { ∑_{i=1}^{n} aiαi | αi ∈ S, ai ∈ F and n ∈ N }. Clearly U is a subspace of V containing S. Since span(S) is the smallest subspace of V containing S, span(S) ⊆ U .

On the other hand, suppose W is a subspace of V containing S. Then ∀α ∈ U , α = ∑_{i=1}^{n} aiαi for some αi ∈ S. Since S ⊆ W and W is a subspace, α ∈ W . Then we have U ⊆ W . In particular, span(S) ⊇ U . Hence we have the theorem. �
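Theorem 6.3.9 also gives a practical membership test over R: β ∈ span(S) for a finite S exactly when appending β to a matrix of the vectors of S does not increase its rank. The following sketch is our own illustration (numpy assumed, helper name in_span ours):

```python
import numpy as np

def in_span(beta, spanning_vectors):
    """Test (over R) whether beta is a linear combination of the given
    vectors, i.e. whether beta lies in their span."""
    A = np.array(spanning_vectors, dtype=float)
    B = np.vstack([A, np.array(beta, dtype=float)])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B)

S = [(1, 0, 2), (1, 2, 2)]
print(in_span((3, 2, 6), S))   # True:  (3,2,6) = 2*(1,0,2) + 1*(1,2,2)
print(in_span((0, 0, 1), S))   # False
```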

Lemma 6.3.10 Let {α1, . . . , αn} be a linearly independent set of a vector space V .Suppose α ∈ V \ span{α1, . . . , αn}. Then {α1, . . . , αn, α} is linearly independent.

Proof: By Theorem 6.2.7 αk is not a linear combination of α1, . . . , αk−1 for 2 ≤ k ≤ n. Since α ∉ span{α1, . . . , αn}, α is not a linear combination of α1, . . . , αn. By Theorem 6.2.7 again {α1, . . . , αn, α} is linearly independent. �

Theorem 6.3.11 Let S be a subset of a vector space V over F. If α ∈ S is linearlydependent on other vectors of S then span(S) = span(S \ {α}).

Proof: By Lemma 6.3.7 we have span(S \ {α}) ⊆ span(S). On the other hand, since α is linearly dependent on other vectors of S, α = ∑_{i=1}^{n} aiαi for some αi ∈ S \ {α} and ai ∈ F, 1 ≤ i ≤ n. Hence α ∈ span(S \ {α}) and then S ⊆ span(S \ {α}). By Remark 6.3.8, we have span(S) ⊆ span(S \ {α}). This proves that span(S) = span(S \ {α}). �


Exercise 6.3

6.3-1. Prove Lemma 6.3.7.

6.3-2. Let C ′[a, b] be the set of all real functions defined on [a, b] and differentiable on(a, b). Let C1[a, b] be the subset of C ′[a, b] consisting of all continuously differen-tiable functions on (a, b). Show that C ′[a, b] is a subspace of C0[a, b] and C1[a, b]is a subspace of C ′[a, b].

6.3-3. Let I[a, b] be the set of all integrable real functions defined on [a, b]. Show thatI[a, b] is a subspace of the space of all real functions defined on [a, b].

6.3-4. Let V be the space of all 2 × 2 matrices. Let S consist of the matrices

            ( 1 0     ( 0 0     ( 0 1
              0 0 ),    0 1 ),    1 0 ).

       What is span(S)?

6.3-5. S = {1− x2, x+ 2, x2}. Is span(S) = P3(R)?

6.3-6. Let S1 and S2 be subsets of a vector space V . Assume that S1 ∩ S2 ≠ ∅. Is span(S1 ∩ S2) = span(S1) ∩ span(S2)?

§6.4 Bases of Vector Spaces

The concept of a basis was introduced in Chapter 3. In this section we shall study the properties of bases for a general vector space.

Definition 6.4.1 Let V be a vector space over F. A subset A of V is said to be a basisof V if A is linearly independent over F and span(A ) = V .

Examples 6.4.2

1. We know that {e1, e2, . . . , en} is the standard basis of Fn.

2. The vector space Pn(F) has {1, x, . . . , xn−1} as a basis, which is called the standard basis of Pn(F).

3. The vector space F[x] has a basis {1, x, . . . , xk, . . . }. This basis is called the standard basis of F[x].

4. Suppose W is a subspace of Fn. The standard basis of W is a basis {α1, . . . , αk} of W such that the matrix A whose rows are α1, . . . , αk (from top to bottom) is in rref. �
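For a subspace W of Fn given by a finite spanning set, the standard basis in the sense of Example 4 can be read off from the reduced row echelon form. A sketch using sympy (the particular spanning rows are our own example):

```python
from sympy import Matrix

# The nonzero rows of the rref of a matrix of spanning rows form the
# 'standard basis' of W = span of those rows.
spanning_rows = Matrix([[1, 0, 2], [1, 2, 2], [2, 2, 4]])
R, pivots = spanning_rows.rref()
standard_basis = [R.row(i) for i in range(len(pivots))]
print(standard_basis)   # [Matrix([[1, 0, 2]]), Matrix([[0, 1, 0]])]
```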

By the same proof as that of Proposition 3.3.13 we have the following proposition and corollary.


Proposition 6.4.3 Suppose α1, α2, . . . , αk are linearly independent vectors of a vector space over F. If α = ∑_{i=1}^{k} aiαi for some a1, a2, . . . , ak ∈ F, then a1, a2, . . . , ak are unique.

Corollary 6.4.4 Suppose {α1, α2, . . . , αn} is a basis for a vector space V over F. Then for every α ∈ V there exist unique scalars a1, . . . , an ∈ F such that α = ∑_{i=1}^{n} aiαi.

Proposition 6.4.5 Let A be a linearly independent set of a vector space. Suppose ∑_{i=1}^{n} aiαi = ∑_{j=1}^{m} bjβj for some distinct αi ∈ A , some distinct βj ∈ A , and some non-zero scalars ai, bj. Then n = m and {α1, . . . , αn} = {β1, . . . , βm}.

Proof: We claim that {α1, . . . , αn} ∩ {β1, . . . , βm} ≠ ∅. For if not, then {α1, . . . , αn} ∪ {β1, . . . , βm} ⊆ A is a linearly independent set. Then from ∑_{i=1}^{n} aiαi − ∑_{j=1}^{m} bjβj = 0 we must have ai = 0 and bj = 0 for all i, j. This contradicts the assumption.

Thus {α1, . . . , αn} ∩ {β1, . . . , βm} ≠ ∅. After a suitable reordering we may assume that there exists a positive integer k, 1 ≤ k ≤ min{n,m}, such that αi = βi for i = 1, . . . , k and {αk+1, . . . , αn} ∩ {βk+1, . . . , βm} = ∅. From the assumption we have

∑_{i=1}^{k} (ai − bi)αi + ∑_{i=k+1}^{n} aiαi − ∑_{j=k+1}^{m} bjβj = 0.

Since {α1, . . . , αn, βk+1, . . . , βm} ⊆ A is linearly independent, and all ai's, bj's are non-zero, we must have ai = bi for 1 ≤ i ≤ k and n ≤ k, m ≤ k. Thus n = m = k. �

Theorem 6.4.6 (Steinitz Replacement Theorem) Let V be a vector space over F. Suppose {α1, α2, . . . , αn} spans V . Then every linearly independent set {β1, β2, . . . , βm} contains at most n elements.

Proof: Since β1 ∈ V and V = span{α1, α2, . . . , αn}, β1 = ∑_{i=1}^{n} aiαi for some a1, a2, . . . , an ∈ F. Since β1 ≠ 0, not all ai can be zero. Without loss of generality, we may assume that a1 ≠ 0. Thus α1 depends on {β1, α2, . . . , αn}. By Theorem 6.3.11 V = span{β1, α2, . . . , αn}.

Now as β2 ∈ V , we have β2 = b1β1 + ∑_{i=2}^{n} ciαi for some b1, c2, . . . , cn ∈ F. Since β1 and β2 are linearly independent, at least one of the ci is non-zero. Again, we assume that c2 ≠ 0. Hence α2 depends on {β1, β2, α3, . . . , αn}. So by Theorem 6.3.11 V = span{β1, β2, α3, . . . , αn}.

Continuing this process, at the k-th step (k ≤ m) we have

V = span{β1, β2, . . . , βk, αk+1, . . . , αn}.

So if k < m, then by the above argument {β1, β2, . . . , βk, βk+1, αk+1, . . . , αn} is linearlydependent and so one of αi, i > k, depends on the other vectors. After renumbering


{αk+1, . . . , αn} if necessary, we may assume that it is αk+1. Then we replace αk+1 by βk+1 and obtain that V = span{β1, β2, . . . , βk, βk+1, αk+2, . . . , αn}. Now if n < m, then the above process enables us to obtain a spanning set {β1, β2, . . . , βn}. But βn+1 ∈ V , so βn+1 depends on β1, β2, . . . , βn. This contradicts the fact that {β1, β2, . . . , βm} is linearly independent. Therefore, we have m ≤ n. �

Corollary 6.4.7 If a vector space has one basis with n elements then all the other basesalso have n elements.

Due to the above corollary we can make the following definition.

Definition 6.4.8 A vector space V over F with a finite basis is called a finite dimensional vector space and the number of elements in a basis is called the dimension of V over F and is denoted by dimF V (or dimV ). V is called an infinite dimensional vector space if V is not of finite dimension.

Examples 6.4.9

(a) dimF Fn = n.

(b) dimC C = 1 but dimR C = 2 with {1, i} as a basis.

(c) dimF Pn(F) = n and dimF F[x] =∞.

(d) The vector space {0} has the empty set as a spanning set which is linearly indepen-dent and therefore ∅ is a basis of {0}. Thus dim{0} = 0. �

By using the Steinitz Replacement Theorem we have the following corollary.

Corollary 6.4.10 Suppose m > n. Then any m vectors in an n-dimensional vectorspace must be linearly dependent.

Theorem 6.4.11 Let V be an n-dimensional vector space and A = {α1, α2, . . . , αn} bea set of vectors in V . Then the following statements are equivalent:

(a) A is a basis.

(b) A is linearly independent.

(c) V = span(A ).

Proof:

[(a)⇒(b)] Clear.

[(b)⇒(c)] If span(A ) ≠ V , then there is an α ∈ V \ span(A ). By Lemma 6.3.10 A ∪ {α} is linearly independent with n + 1 elements. By Corollary 6.4.10 this is impossible.


[(c)⇒(a)] Suppose V = span(A ). We have to show that A is linearly independent. Suppose not; then by Theorem 6.2.6 some αk depends on α1, . . . , αk−1. Also by Theorem 6.3.11 V = span{α1, . . . , αk−1, αk+1, . . . , αn}. By Theorem 6.4.6 dimV ≤ n − 1. This is impossible. �

Corollary 6.4.12 Suppose W1 and W2 are two subspaces of V . If W1 ⊆ W2 anddimW1 = dimW2 <∞, then W1 = W2.

Proof: Suppose {α1, . . . , αm} is a basis of W1. Then {α1, . . . , αm} ⊂ W2 is linearlyindependent. Since dimW1 = dimW2, by Theorem 6.4.11 it is also a basis of W2.Therefore, W1 = W2. �

Note that two vector spaces having the same dimension may not be the same vectorspace. Here the condition that W1 ⊆W2 is crucial.

Theorem 6.4.13 In a finite dimensional vector space, every spanning set contains abasis.

Proof: Let S be a spanning set of an n-dimensional vector space V . If dimV = 0, then the empty set is a basis of V . If V ≠ {0}, then S must contain at least one non-zero vector, say α1. If span{α1} = V , then {α1} is a basis of V . If span{α1} ≠ V , then there exists α2 ∈ S \ span{α1}. By Lemma 6.3.10 {α1, α2} is linearly independent. If span{α1, α2} = V , then {α1, α2} is already a basis. Otherwise, we continue the above process as long as we can. This process must stop as we cannot find more than n linearly independent vectors in S. �

Alternative proof: Let S be a spanning set of an n-dimensional vector space V . Let A be a maximal linearly independent subset of S, i.e., there is no linearly independent subset B of S such that A ⊂ B (such an A must exist because, by Theorem 6.4.6, every linearly independent subset of S contains at most n vectors). We shall show that A is a basis of V .

If S ⊆ span(A ), then by Lemma 6.3.7 we have V = span(S) ⊆ span(span(A )) = span(A ) and hence span(A ) = V . So A is a basis. If S is not a subset of span(A ), then let α ∈ S \ span(A ). By Lemma 6.3.10 A ∪ {α} is linearly independent. This contradicts the assumption that A is a maximal linearly independent subset of S. �

From Chapter 3 we know that we can extend a linearly independent set of Fn to a basis of Fn. In the following we shall extend this result to a general vector space.

Theorem 6.4.14 In a finite dimensional vector space, any linearly independent set ofvectors can be extended to a basis.

Proof: Let {β1, . . . , βm} be a linearly independent set in an n-dimensional vector space V . Let {α1, . . . , αn} be a basis of V . Clearly m ≤ n and {β1, . . . , βm, α1, . . . , αn} spans V . If m = 0, then there is nothing to prove. So we assume m > 0. Thus {β1, . . . , βm, α1, . . . , αn} is linearly dependent. Then there are b1, . . . , bm, a1, . . . , an ∈ F not all zero such that ∑_{i=1}^{m} biβi + ∑_{j=1}^{n} ajαj = 0. We claim that at least one aj ≠ 0.


For otherwise, if all the aj's are zero, then we have ∑_{i=1}^{m} biβi = 0 and by the assumption, b1 = · · · = bm = 0. This is impossible.

Thus by Theorem 6.3.11 {β1, . . . , βm, α1, . . . , αj−1, αj+1, . . . , αn} still spans V . If m > 1, then this set is linearly dependent and we can apply the above argument to discard another αj and still obtain a spanning set of V . We continue this process until we get n spanning vectors, m of which are β1, . . . , βm. This is a required basis. �

Remark 6.4.15 From the proof above, we see that there is more than one way of extending a linearly independent set to a basis.

Let V be a finite dimensional vector space and let W be a subspace of V . What is the dimension of W ? In particular, does W have a finite basis at all? Is there any relation between the dimension of W and the dimension of V ? We shall answer these questions below.

Theorem 6.4.16 A subspace W of an n-dimensional vector space V is a finite dimen-sional vector space of dimension at most n.

Proof: If W = {0}, then W is 0-dimensional.

Assume W ≠ {0}. Then there exists α1 ∈ W with α1 ≠ 0. If span{α1} = W , then W is 1-dimensional. Otherwise, choose α2 ∈ W \ span{α1}. By Lemma 6.3.10 {α1, α2} is linearly independent. Continuing in this fashion, after k steps, we have a linearly independent set {α1, . . . , αk} and span{α1, . . . , αk} ≠ W . Choose αk+1 ∈ W \ span{α1, . . . , αk}. By Lemma 6.3.10 {α1, . . . , αk, αk+1} is linearly independent.

This process cannot go on infinitely, for otherwise we would obtain more thann linearly independent vectors in V . Hence there must be an integer m such thatspan{α1, . . . , αm} = W . Thus dimW = m and clearly m ≤ n. �

Theorem 6.4.17 Let W be a subspace of V with a basis B = {α1, . . . , αm}. Assumethat dimV = n. Then there exists a basis B ∪ {αm+1, . . . , αn} of V for some vectorsαm+1, . . . , αn in V .

Proof: This follows from Theorem 6.4.14. �

Remark 6.4.18 Every infinite dimensional vector space also has a basis. However, to show this we have to apply Zorn’s lemma, which is beyond the scope of this book.

Exercise 6.4

6.4-1. Let V be the vector space spanned by α1 = cos^2 x, α2 = sin^2 x and α3 = cos 2x. Is {α1, α2, α3} a basis of V ? If not, find a basis of V .

6.4-2. Let F = {a + b√3 | a, b ∈ Q}. Show that (i) F is a field, (ii) F is a vector space over Q, (iii) dimQ F = 2.


6.4-3. Show that a maximal (with respect to set inclusion) linearly independent set is abasis.

6.4-4. Show that a minimal (with respect to set inclusion) spanning set is a basis.

6.4-5. Find a basis for the subspace W in Q4 consisting of all vectors of the form(a+ b, a− b+ 2c, b, c).

6.4-6. Let W be the set of all polynomials of the form ax2 + bx + 2a + 3b. Show thatW is a subspace of P3(F). Find a basis for W .

6.4-7. Let V = Mn(R). Let W be the set of all symmetric matrices.

(a) Show that W is a subspace of V .

(b) Compute dimW .

6.4-8. Show that if W1 and W2 are subspaces, then W1 ∪W2 is a subspace if and onlyif one is a subspace of the other.

6.4-9. Let S, T and T ′ be three subspaces of a vector space V for which (a) S∩T = S∩T ′,(b) S + T = S + T ′, (c) T ⊆ T ′. Show that T = T ′.

§6.5 Sums and Direct Sums of Subspaces

Sometimes we want to consider a new subspace which is spanned by two subspaces. For example, suppose we have two lines L1 and L2 in R3 which pass through the origin and are not the same. Then the subspace spanned by these two lines is a plane passing through the origin. We shall denote this plane by L1 + L2. Now we make a precise definition of this concept.

Definition 6.5.1 Let W1, . . . ,Wk be k subsets of a vector space V . We put

W1 + · · · + Wk = ∑_{i=1}^{k} Wi = { ∑_{i=1}^{k} αi | αi ∈ Wi, 1 ≤ i ≤ k }.

By the definition, the following lemma is easy to verify.

Lemma 6.5.2 Suppose A1 ⊆ B1 and A2 ⊆ B2 are nonempty subsets of a vector space.Then A1 +A2 ⊆ B1 +B2.

Proposition 6.5.3 If W1, . . . ,Wk are subspaces of a vector space, then so is ∑_{i=1}^{k} Wi.

Proof: Suppose α, β ∈ ∑_{i=1}^{k} Wi and a ∈ F. Then α = ∑_{i=1}^{k} αi, β = ∑_{i=1}^{k} βi for some αi, βi ∈ Wi. Now we have

aα + β = a ∑_{i=1}^{k} αi + ∑_{i=1}^{k} βi = ∑_{i=1}^{k} (aαi + βi).


Since αi, βi ∈ Wi and Wi is a subspace, aαi + βi ∈ Wi. Thus, aα + β ∈ ∑_{i=1}^{k} Wi. By Proposition 6.3.3, ∑_{i=1}^{k} Wi is a subspace. �

Definition 6.5.4 If W1, . . . ,Wk are subspaces of a vector space, then ∑_{i=1}^{k} Wi is called the sum of the subspaces W1, . . . ,Wk.

Theorem 6.5.5 Suppose W1, . . . ,Wk are subspaces of a vector space. Then

(a) ∑_{i=1}^{k} Wi = span( ⋃_{i=1}^{k} Wi ).

(b) If span(Ai) = Wi for 1 ≤ i ≤ k, then span( ⋃_{i=1}^{k} Ai ) = ∑_{i=1}^{k} Wi.

Proof: For simplicity we shall assume that k = 2. The general case follows easily by mathematical induction.

To prove (a) we first note that 0 ∈ W1, so W2 = {0} + W2 ⊆ W1 + W2. Similarly we have W1 ⊆ W1 + W2. By Lemma 6.3.7 and Remark 6.3.8 span(W1 ∪ W2) ⊆ W1 + W2. On the other hand, since W1 ⊆ span(W1 ∪ W2), W2 ⊆ span(W1 ∪ W2) and span(W1 ∪ W2) is a subspace, W1 + W2 ⊆ span(W1 ∪ W2). Thus W1 + W2 = span(W1 ∪ W2).

To prove (b) we first note that from Lemma 6.3.7 we have W1 = span(A1) ⊆ span(A1 ∪ A2), W2 = span(A2) ⊆ span(A1 ∪ A2). By Lemma 6.3.7 and Remark 6.3.8 we have span(W1 ∪ W2) ⊆ span(A1 ∪ A2). But as A1 ⊆ W1 and A2 ⊆ W2 we have A1 ∪ A2 ⊆ W1 ∪ W2. Thus by Lemma 6.3.7 again we have span(A1 ∪ A2) ⊆ span(W1 ∪ W2). Therefore, span(A1 ∪ A2) = span(W1 ∪ W2). From (a) we obtain (b). �

Theorem 6.5.6 Let W1 and W2 be any two subspaces of a finite dimensional vectorspace. Then dim(W1 +W2) = dimW1 + dimW2 − dim(W1 ∩W2).

Proof: Let dim(W1 ∩ W2) = r, dimW1 = r + s and dimW2 = r + t. Suppose{α1, . . . , αr} is a basis of W1 ∩W2. We extend it to bases {α1, . . . , αr, β1, . . . , βs} and{α1, . . . , αr, γ1, . . . , γt} of W1 and W2, respectively. Clearly

span{α1, . . . , αr, β1, . . . , βs, γ1, . . . , γt} = W1 +W2.

We have only to show that {α1, . . . , αr, β1, . . . , βs, γ1, . . . , γt} is linearly independent. Suppose

∑_{i=1}^{r} aiαi + ∑_{j=1}^{s} bjβj + ∑_{k=1}^{t} ckγk = 0 for some ai, bj , ck ∈ F.

Then ∑_{k=1}^{t} ckγk = −( ∑_{i=1}^{r} aiαi + ∑_{j=1}^{s} bjβj ) ∈ W1. Also ∑_{k=1}^{t} ckγk ∈ W2. Thus ∑_{k=1}^{t} ckγk ∈ W1 ∩ W2. Hence ∑_{k=1}^{t} ckγk = ∑_{i=1}^{r} diαi for some di ∈ F. But then

∑_{i=1}^{r} diαi − ∑_{k=1}^{t} ckγk = 0


and since {α1, . . . , αr, γ1, . . . , γt} is linearly independent, di = 0, ck = 0 ∀i, k. Thus we have

∑_{i=1}^{r} aiαi + ∑_{j=1}^{s} bjβj = 0.

Since {α1, . . . , αr, β1, . . . , βs} is linearly independent, ai = 0 and bj = 0 ∀i, j. Thus {α1, . . . , αr, β1, . . . , βs, γ1, . . . , γt} is a basis of W1 + W2.

Therefore, dim(W1 + W2) = r + s + t = (r + s) + (r + t) − r = dim(W1) + dim(W2) − dim(W1 ∩ W2). �

Example 6.5.7 Let V = R3. Suppose W1 and W2 are subspaces of dimension 2. Then

dim(W1 +W2) + dim(W1 ∩W2) = 4.

If dim(W1 + W2) = 3, then dim(W1 ∩ W2) = 1. So W1 and W2 intersect in a line.

If dim(W1 + W2) = 2, then dim(W1 ∩ W2) = 2. But W1 ∩ W2 ⊆ W1 and they have the same dimension. So W1 ∩ W2 = W1. Similarly, we also have W1 ∩ W2 = W2 and then W1 = W2. �

Example 6.5.8 Let V = R3. Suppose

W1 = span{(1, 0, 2), (1, 2, 2)}, W2 = span{(1, 1, 0), (0, 1, 1)}.

We would like to use the idea of the proof of Theorem 6.5.6 to find a basis for W1 + W2. By an easy inspection we see that both W1 and W2 have dimension 2. Let α ∈ W1 ∩ W2. Since α ∈ W1, α = a(1, 0, 2) + b(1, 2, 2) = (a + b, 2b, 2a + 2b) for some a, b ∈ R. Also, since α ∈ W2, α = c(1, 1, 0) + d(0, 1, 1) = (c, c + d, d) for some c, d ∈ R. Then

a + b = c,
2b = c + d,
2a + 2b = d.

Solving this system, we get b = −3a, c = −2a, d = −4a. Thus α = −2a(1, 3, 2). This shows that {(1, 3, 2)} is a basis of W1 ∩ W2. Since dimW1 = 2, and (1, 3, 2), (1, 0, 2) are linearly independent, {(1, 3, 2), (1, 0, 2)} is a basis of W1. Similarly {(1, 3, 2), (1, 1, 0)} is a basis of W2. Hence {(1, 3, 2), (1, 0, 2), (1, 1, 0)} is a basis of W1 + W2.

Note that, by Theorem 6.5.5 (b), W1 + W2 = span{(1, 0, 2), (1, 2, 2), (1, 1, 0), (0, 1, 1)}. We can use the Casting-out Method to find a basis of W1 + W2. Please refer to Chapter 3.
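The computation in Example 6.5.8 can be checked mechanically with Theorem 6.5.6. The following sympy sketch is ours and only verifies the dimensions and the intersection vector found above:

```python
from sympy import Matrix

W1 = Matrix([[1, 0, 2], [1, 2, 2]])     # rows span W1
W2 = Matrix([[1, 1, 0], [0, 1, 1]])     # rows span W2

sum_rank = Matrix.vstack(W1, W2).rank()           # dim(W1 + W2)
inter_dim = W1.rank() + W2.rank() - sum_rank      # dim(W1 ∩ W2) by Theorem 6.5.6
print(sum_rank, inter_dim)                        # 3 1

alpha = Matrix([[1, 3, 2]])                       # candidate basis vector of W1 ∩ W2
print(Matrix.vstack(W1, alpha).rank() == W1.rank())   # True: alpha lies in W1
print(Matrix.vstack(W2, alpha).rank() == W2.rank())   # True: alpha lies in W2
```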

Definition 6.5.9 Let W1 and W2 be subspaces of a vector space. The sum W1+W2 iscalled a direct sum of W1 and W2 if W1 ∩W2 = {0}. In this case W1 +W2 is denotedby W1 ⊕W2. Note that W1 ⊕W2 = W2 ⊕W1.

Proposition 6.5.10 Let W1 and W2 be subspaces of a finite dimensional vector space.Then

(a) dim(W1 ⊕W2) = dimW1 + dimW2.


(b) The sum W1 + W2 is direct if and only if ∀α ∈ W1 + W2, α is decomposed in aunique way as α = α1 + α2 with α1 ∈W1 and α2 ∈W2.

Proof: (a) is trivial. We have to prove (b) only.

Suppose α ∈ W1 + W2 and α = α1 + α2 = β1 + β2, where α1, β1 ∈ W1 and α2, β2 ∈ W2. Then since α1 − β1 = β2 − α2 ∈ W1 ∩ W2 = {0}, we have α1 = β1 and α2 = β2.

Conversely, suppose for each α ∈ W1 + W2, α = α1 + α2 in a unique way withα1 ∈ W1 and α2 ∈ W2. We have to show that W1 ∩W2 = {0}. For α ∈ W1 ∩W2, fromα = α+0 = 0+α ∈W1+W2 and the uniqueness of decomposition we have that α = 0.Thus, W1 ∩W2 = {0} and the sum is direct. �

Definition 6.5.11 Suppose W1 and W2 are subspaces of a vector space V . If V = W1 ⊕ W2, then we say that W1 and W2 are complementary and that W2 is a complementary subspace of W1 or a complement of W1. The codimension of W1 is defined to be the dimension of W2 and denoted by codimW1.

Theorem 6.5.12 If W is a subspace of an n-dimensional vector space V , then thereexists a subspace W ′ such that V = W ⊕W ′.

Proof: Let {α1, . . . , αm} be a basis of W . By Theorem 6.4.14 we can extend thislinearly independent set to a basis {α1, . . . , αm, αm+1, . . . , αn} of V . PutW ′ = span{αm+1, . . . , αn}. Clearly V = W ⊕W ′. �

Thus every subspace of a finite dimensional vector space has a complementary subspace. However, the complementary subspace is not unique, for there is more than one way to extend a linearly independent set to a basis of the whole space. For example, W1 = span{(0, 1)} and W2 = span{(1, 1)} are two different complementary subspaces of W = span{(1, 0)} in R2.

Definition 6.5.13 Let W1, . . . ,Wk be subspaces of a vector space. The sum W1 + · · · + Wk is said to be direct if for each i, Wi ∩ ( ∑_{1≤j≤k, j≠i} Wj ) = {0}. We denote this sum by W1 ⊕ · · · ⊕ Wk or ⊕_{j=1}^{k} Wj.

Proposition 6.5.14 Let W1, . . . ,Wk be subspaces of a vector space. The sum W1 +· · ·+Wk is direct if and only if for each α ∈W1+ · · ·+Wk, α can be expressed uniquelyin the form α = α1 + · · ·+ αk with αi ∈Wi, 1 ≤ i ≤ k.

We leave the proof to the reader (exercise).

Theorem 6.5.15 Suppose W1, . . . ,Wk are subspaces of a finite dimensional vector space. Then the sum ∑_{i=1}^{k} Wi is direct if and only if dim( ∑_{i=1}^{k} Wi ) = ∑_{i=1}^{k} dimWi.


Proof: It is easy to show by induction that dim( ∑_{i=1}^{r} Wi ) ≤ ∑_{i=1}^{r} dimWi for each r ≥ 1.

We prove the “if” part first. For each j,

∑_{i=1}^{k} dimWi = dim( ∑_{i=1}^{k} Wi ) = dim( Wj + ∑_{i≠j} Wi )
                 = dimWj + dim( ∑_{i≠j} Wi ) − dim( Wj ∩ ∑_{i≠j} Wi )
                 ≤ dimWj + ( ∑_{i≠j} dimWi ) − dim( Wj ∩ ∑_{i≠j} Wi ).

This implies that dim( Wj ∩ ∑_{i≠j} Wi ) = 0, or equivalently Wj ∩ ∑_{i≠j} Wi = {0}.

Now, we prove the “only if” part by induction on k. For k = 1, the statement is trivial. Assume the statement is true for k ≥ 1. That is, if ∑_{i=1}^{k} Wi is direct, then dim( ∑_{i=1}^{k} Wi ) = ∑_{i=1}^{k} dimWi.

Now we assume that ∑_{i=1}^{k+1} Wi is direct. Then

dim( ∑_{i=1}^{k+1} Wi ) = dim( ∑_{i=1}^{k} Wi + Wk+1 )
                       = dim( ∑_{i=1}^{k} Wi ) + dimWk+1 − dim( Wk+1 ∩ ∑_{i=1}^{k} Wi )
                       = dim( ∑_{i=1}^{k} Wi ) + dimWk+1.                        (6.1)

Since ∑_{i=1}^{k+1} Wi is direct, for each j with 1 ≤ j ≤ k

{0} ⊆ Wj ∩ ( ∑_{1≤i≤k, i≠j} Wi ) ⊆ Wj ∩ ( ∑_{1≤i≤k+1, i≠j} Wi ) = {0}.

Hence ∑_{i=1}^{k} Wi is direct. So by the induction hypothesis, dim( ∑_{i=1}^{k} Wi ) = ∑_{i=1}^{k} dimWi. Therefore, Equation (6.1) becomes dim( ∑_{i=1}^{k+1} Wi ) = ∑_{i=1}^{k+1} dimWi. �


Exercise 6.5

6.5-1. Prove Proposition 6.5.14.

6.5-2. Let W1 = span{(1, 1, 2, 0), (−2, 1, 2, 0)} and W2 = span{(2, 0, 1, 1), (−3, 2, 0, 4)}be two subspaces of R4. Is W1 +W2 direct?

6.5-3. Let W = span{(1, 1, 0, 1), (−1, 0, 1, 1)}. Find a subspace W ′ such that W ⊕W ′ =R4.

6.5-4. Let V be the vector space of all real functions defined on R. Let W1 be the setof all even functions in V and W2 be the set of all odd functions in V .

(a) Prove that W1 and W2 are subspaces of V .

(b) Show that W1 +W2 = V .

(c) Is the sum in (b) direct?

6.5-5. Let W1 and W2 be subspaces of a finite dimensional vector space V . Show that

       codim(W1 ∩ W2) = codimW1 + codimW2

       if and only if W1 + W2 = V .


Chapter 7

Linear Transformations and Matrix Representations

§7.1 Linear Transformations

We want to compare two vector spaces. So we need a correspondence between two vector spaces which preserves the structure of a vector space. Such a correspondence is called a homomorphism in the theory of algebra, but in linear algebra it is often called a linear transformation. Its definition follows.

Definition 7.1.1 Let U and V be vector spaces over F. A linear transformation σ ofU into V is a mapping of U into V such that

∀α, β ∈ U, a ∈ F, σ(aα+ β) = aσ(α) + σ(β).

Note that if σ is a linear transformation then ∀α, β ∈ U, ∀a, b ∈ F, σ(aα + bβ) =aσ(α) + bσ(β).

Proposition 7.1.2 If σ is a linear transformation, then σ(0) = 0.

Proof: Since σ(0) = σ(0 + 0) = σ(0) + σ(0), by the cancellation law we have σ(0) = 0. �

Definition 7.1.3 Suppose σ : U → V is a linear transformation. If σ is injective (i.e.,one to one) then σ is called a monomorphism. If σ is surjective (i.e., onto) then σ iscalled an epimorphism. A linear transformation that is both an epimorphism and amonomorphism is called an isomorphism.

Theorem 7.1.4 Let σ : U → V be an isomorphism. Then σ−1 : V → U is also anisomorphism.

Proof: We have only to show that σ−1 is linear.

Let α′, β′ ∈ V and a ∈ F. Since σ is surjective, ∃α, β ∈ U such that σ(α) = α′ and σ(β) = β′. Since σ is linear, σ(aα + β) = aσ(α) + σ(β) = aα′ + β′. Thus by the definition of the inverse mapping we have

σ−1(aα′ + β′) = aα+ β = aσ−1(α′) + σ−1(β′). �


Example 7.1.5 Let U = V = F[x]. For α ∈ U , α = ∑_{k=0}^{n} akx^k for some n; define σ(α) = ∑_{k=1}^{n} kakx^{k−1}. Note that if F = R, then σ = d/dx is the derivative operator of real valued functions. It is easy to check that σ is linear.

Suppose F = R. Define τ(α) = ∑_{k=0}^{n} (ak/(k + 1)) x^{k+1}. Then τ is also linear. Note that σ is surjective but not injective and τ is injective but not surjective. �
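The two maps of Example 7.1.5 over R can be explored in sympy; this is our own illustration (F[x] is identified with sympy expressions, and sigma, tau are our names for σ and τ):

```python
import sympy as sp

x = sp.symbols('x')

def sigma(p):
    """The formal derivative map of Example 7.1.5."""
    return sp.diff(p, x)

def tau(p):
    """The 'integration' map sending x**k to x**(k+1)/(k+1) (no constant term)."""
    return sp.integrate(p, x)

p = 3*x**2 + 2*x + 1
print(sp.expand(sigma(tau(p))))   # 3*x**2 + 2*x + 1 : sigma recovers p
print(sp.expand(tau(sigma(p))))   # 3*x**2 + 2*x     : the constant term is lost
```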

Example 7.1.6 For U = V and a fixed a ∈ F, define the mapping σ : U → V byσ(α) = aα. Then σ is a linear transformation and is called a scalar transformation. Ifa = 1, then we have the identity linear transformation. �

Proposition 7.1.7 Let U , V and W be vector spaces over F. Suppose σ : U → V andτ : V →W are linear. Then the composition τ ◦ σ : U →W is linear.

Proof: The proof is straightforward. �

Proposition 7.1.8 Let U and V be vector spaces over F. If σ, τ : U → V are two linear transformations and a ∈ F, then σ + τ and aσ are linear. Here σ + τ is the map defined by (σ + τ)(α) = σ(α) + τ(α) and aσ is the map defined by (aσ)(α) = aσ(α) for each α ∈ U .

Proof: The proof is trivial. �

Example 7.1.9 Let V be an n-dimensional vector space over F. Suppose {α1, . . . , αn} is a basis of V . For each α ∈ V there exist unique a1, . . . , an ∈ F such that α = ∑_{i=1}^{n} aiαi. Define a mapping σ : V → Fn (respectively Fn×1) by assigning α to (a1, . . . , an) (respectively (a1 · · · an)T ). Then σ is an isomorphism of V onto Fn (respectively Fn×1). �

Theorem 7.1.10 Suppose σ : U → V is a linear transformation. Then σ(U), the imageof U under σ, is a subspace of V .

Proof: The proof is left to the reader as an exercise. �

Corollary 7.1.11 Keep the notation and assumption of Theorem 7.1.10. If U1 is asubspace of U , then σ(U1) is a subspace of V .

The rank of a matrix was defined in Chapter 2. In the following we define the rank of a linear transformation. We will obtain a result similar to Proposition 2.4.3.

Definition 7.1.12 Let σ : U → V be a linear transformation. The rank of σ is definedto be the dimension of σ(U) and is denoted by rank(σ).

Theorem 7.1.13 Let σ : U → V be a linear transformation. Suppose dim(U) = n anddim(V ) = m. Then rank(σ) ≤ min{n,m}.


Proof: Since rank(σ) = dim(σ(U)) and σ(U) is a subspace of V , by Theorem 6.4.16 we have rank(σ) ≤ m.

We first note that if {α1, . . . , αs} is linearly dependent in U then {σ(α1), . . . , σ(αs)} is linearly dependent in V . Since U is n-dimensional, there cannot be more than n linearly independent vectors in U . Thus if {σ(α1), . . . , σ(αs)} is linearly independent in σ(U), then s ≤ n. Hence dim(σ(U)) ≤ n. Therefore, rank(σ) ≤ n. �

Proposition 7.1.14 Suppose σ : U → V is a linear transformation and W is a subspaceof V . Then σ−1[W ] = {α ∈ U | σ(α) ∈W}, the pre-image of W under σ, is a subspaceof U .

Proof: The proof is left to the reader as an exercise. �

Corollary 7.1.15 σ−1[{0}] is a subspace of U .

The nullity of a matrix was defined in Chapter 3. Now we define the nullity of alinear transformation. We will obtain a result similar to Theorem 3.6.9.

Definition 7.1.16 Suppose σ : U → V is a linear transformation. σ−1[{0}] is calledthe kernel of σ and is denoted by ker(σ). The dimension of ker(σ) is called the nullityof σ and is denoted by nullity(σ).

Theorem 7.1.17 Let U and V be vector spaces over F and let σ : U → V be a lineartransformation. Suppose dimU = n. Then rank(σ) + nullity(σ) = n.

Proof: Since ker(σ) is a subspace of U , it is finite dimensional. Suppose nullity(σ) = k. Choose a basis {α1, . . . , αk, αk+1, . . . , αn} of U such that {α1, . . . , αk} is a basis of ker(σ).

For each α ∈ U , α = ∑_{i=1}^{n} aiαi for some ai ∈ F. Thus σ(α) = ∑_{i=k+1}^{n} aiσ(αi). That is, span{σ(αk+1), . . . , σ(αn)} = σ(U).

If ∑_{i=k+1}^{n} ciσ(αi) = 0 for some ci ∈ F, then from the linearity of σ we have σ( ∑_{i=k+1}^{n} ciαi ) = 0. This implies that ∑_{i=k+1}^{n} ciαi ∈ ker(σ). Hence ∑_{i=k+1}^{n} ciαi = ∑_{j=1}^{k} djαj for some dj ∈ F. From this, we have ∑_{j=1}^{k} djαj − ∑_{i=k+1}^{n} ciαi = 0. By the linear independence of the basis, we must have ci = 0 ∀i. Thus {σ(αk+1), . . . , σ(αn)} is a basis of σ(U). By definition rank(σ) = n − k. Thus rank(σ) + nullity(σ) = n. �
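For a linear map given by a matrix A acting on Rn×1 by α ↦ Aα, Theorem 7.1.17 says that rank(A) plus the dimension of the null space of A equals n. A quick check with sympy on a matrix of our own choosing:

```python
from sympy import Matrix

A = Matrix([[1, 2, 0, 1],
            [0, 1, 1, 0],
            [1, 3, 1, 1]])          # third row = first + second, so rank is 2

rank = A.rank()
nullity = len(A.nullspace())        # number of basis vectors of the kernel
print(rank, nullity, rank + nullity)   # 2 2 4  -- and dim U = 4
```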

Theorem 7.1.18 Let U and V be finite dimensional vector spaces. Let σ : U → V bea linear transformation. Then

(a) σ is a monomorphism if and only if nullity(σ) = 0.

(b) σ is an epimorphism if and only if rank(σ) = dimV .


Proof:

(a) Suppose σ is injective. Let α ∈ ker(σ). Then σ(α) = 0. Since σ(0) = 0 and σ isinjective, α = 0. Thus, ker(σ) = {0} and hence nullity(σ) = 0.

Conversely, suppose ker(σ) = {0}. If α, β ∈ U are such that σ(α) = σ(β), thenσ(α− β) = 0. Then α− β ∈ ker(σ). Hence α− β = 0. Therefore, σ is injective.

(b) If σ is surjective, then σ(U) = V . So rank(σ) = dim(σ(U)) = dimV .

Conversely, if rank(σ) = dimV , then since σ(U) ⊆ V , by Corollary 6.4.12 we haveσ(U) = V . That is, σ is surjective. �

Note that if dimU < dimV , then by Theorem 7.1.17 rank(σ) = dimU−nullity(σ) ≤dimU < dimV . Thus σ cannot be an epimorphism. If dimU > dimV , then nullity(σ) =dimU − rank(σ) ≥ dimU − dimV > 0. Thus σ cannot be a monomorphism.

Theorem 7.1.19 Let U and V be finite dimensional vector spaces of the same dimen-sion. Suppose σ : U → V is a linear transformation. Then the following statements areequivalent:

(a) σ is an isomorphism;

(b) σ is a monomorphism;

(c) σ is an epimorphism.

Proof:

[(a)⇒(b)] It is clear.

[(b)⇒(c)] By Theorem 7.1.18 nullity(σ) = 0. By Theorem 7.1.17,

dim(σ(U)) = rank(σ) = dimU = dimV.

Thus σ is surjective.

[(c)⇒(a)] It suffices to show that σ is injective. Since σ is surjective, rank(σ) = dimV .Hence nullity(σ) = dimU−rank(σ) = dimU−dimV = 0. By Theorem 7.1.18σ is injective. �

Let A and B be sets and let f : A → B be a mapping. Suppose C ⊆ A. Let g : C → B be defined by g(c) = f(c) for all c ∈ C. The mapping g is called the restriction of f to C and is denoted by g = f |C .

Theorem 7.1.20 Let U, V and W be finite dimensional vector spaces. Let σ : U → Vand τ : V →W be linear transformations. Then

rank(σ) = rank(τ ◦ σ) + dim(σ(U) ∩ ker(τ)).


Proof: Let φ = τ |σ(U) : σ(U) → W . Then ker(φ) = σ(U) ∩ ker(τ). Also,

rank(φ) = dim(φ(σ(U))) = dim((τ ◦ σ)(U)) = rank(τ ◦ σ).

By Theorem 7.1.17 we have

rank(φ) + nullity(φ) = dim(σ(U)) = rank(σ).

Therefore, rank(σ) = rank(τ ◦ σ) + nullity(φ) = rank(τ ◦ σ) + dim(σ(U) ∩ ker(τ)). �

Corollary 7.1.21 Keep the same hypothesis as Theorem 7.1.20. Then

rank(τ ◦ σ) = dim(σ(U) + ker(τ))− nullity(τ).

Proof: Since

dim(σ(U) + ker(τ)) = dim(σ(U)) + dim(ker(τ)) − dim(σ(U) ∩ ker(τ))
                   = rank(σ) + nullity(τ) − (rank(σ) − rank(τ ◦ σ))
                   = nullity(τ) + rank(τ ◦ σ),

we have the corollary. �

Corollary 7.1.22 Keep the same hypothesis as Theorem 7.1.20. If ker(τ) ⊆ σ(U), then

rank(σ) = rank(τ ◦ σ) + nullity(τ).

Theorem 7.1.23 Let σ : U → V and τ : V → W be linear transformations. Thenrank(τ ◦ σ) ≤ min{rank(σ), rank(τ)}, where U, V and W are finite dimensional.

Proof: By definition, rank(τ ◦ σ) = dim((τ ◦ σ)(U)) = dim(τ(σ(U))) ≤ dim(τ(V )) = rank(τ).

It is easy to see from the equality in Theorem 7.1.20 that rank(τ ◦ σ) ≤ rank(σ). Thus the theorem holds. �

Theorem 7.1.24 Let σ : U → V and τ : V → W be linear transformations. If σ issurjective, then rank(τ ◦ σ) = rank(τ). If τ is injective, then rank(τ ◦ σ) = rank(σ).Here U, V and W are finite dimensional.

Proof: If σ is surjective, then ker(τ) ⊆ V = σ(U). So by Corollary 7.1.22

rank(τ ◦ σ) = rank(σ)− nullity(τ) = dimV − nullity(τ) = rank(τ).

If τ is injective, then ker(τ) = {0}. By Theorem 7.1.20 rank(τ ◦ σ) = rank(σ). �

Corollary 7.1.25 The rank of a linear transformation is not changed by composingwith an isomorphism on either side.


Theorem 7.1.26 Let {α1, . . . , αn} be any basis of a vector space U . Suppose β1, . . . , βn are any n vectors (not necessarily distinct) in a vector space V . Then there exists a unique linear transformation σ : U → V such that σ(αi) = βi for i = 1, 2, . . . , n.

Proof: For α ∈ U , α = ∑_{i=1}^{n} aiαi for uniquely determined scalars a1, . . . , an. Define σ(α) = ∑_{i=1}^{n} aiβi. Then it is easy to check that σ is linear and σ(αi) = βi for i = 1, 2, . . . , n. Clearly, σ is unique. �

Corollary 7.1.27 Let {α1, . . . , αk} be any linearly independent set in an n-dimensional vector space U . Suppose β1, . . . , βk are any k vectors (not necessarily distinct) in a vector space V . Then there exists a linear transformation σ : U → V such that σ(αi) = βi for i = 1, 2, . . . , k.

Proof: Extend {α1, . . . , αk} to a basis {α1, . . . , αk, . . . , αn} of U . Define σ(αi) = βi for i = 1, 2, . . . , k and σ(αj) arbitrarily for j = k + 1, . . . , n. Extend σ linearly (by Theorem 7.1.26) to get the required linear transformation. Note that such a σ is not unique if k < n. �

Under what condition will the left or the right cancellation law hold for compositionof linear transformations?

We shall use O to denote the zero linear transformation, i.e., O(α) = 0 for everyvector α in the domain of O.

Theorem 7.1.28 Let V be an m-dimensional vector space and σ : U → V be a lin-ear transformation. Then σ is surjective if and only if for any linear transformationτ : V →W , τ ◦ σ = O implies τ = O.

Proof:

[⇒] Suppose σ is surjective. Assume τ : V → W is a linear transformation such that τ ◦ σ = O. ∀β ∈ V , since σ is surjective, ∃α ∈ U such that σ(α) = β. Then τ(β) = τ(σ(α)) = O(α) = 0. So τ = O.

[⇐] Suppose σ is not surjective. Then σ(U) ⊂ V . Hence we can find a basis {α1, . . . , αk, . . . , αm} of V such that {α1, . . . , αk} is a basis of σ(U). Clearly k < m. By Theorem 7.1.26 there exists a linear transformation τ : V → V such that τ(αi) = 0 for i ≤ k and τ(αi) = αi for k < i ≤ m. Then τ ◦ σ = O with τ ≠ O. �

Corollary 7.1.29 A linear transformation σ : U → V is surjective if and only ifτ1 ◦ σ = τ2 ◦ σ implies τ1 = τ2 for any linear transformations τ1, τ2 : V → W . Here Vis finite dimensional.

Theorem 7.1.30 Let U be an n-dimensional vector space and σ : U → V be a lineartransformation. Then σ is injective if and only if for any linear transformationη : S → U , σ ◦ η = O implies η = O.

Proof:


[⇒] Suppose σ is injective. Assume η : S → U is a linear transformation such thatσ ◦ η = O. ∀α ∈ S by assumption σ(η(α)) = 0. Since σ is injective, η(α) = 0. Soη = O.

[⇐] Suppose σ is not injective. Then by Theorem 7.1.18 ker(σ) ⊃ {0}. That is, ∃α ∈ ker(σ) such that α ≠ 0. Let S = U and let {α1, . . . , αn} be a basis of U . By Theorem 7.1.26 there exists a linear transformation η : U → U such that η(αi) = α for 1 ≤ i ≤ n. Then η ≠ O but (σ ◦ η)(αi) = σ(α) = 0 for all i. This means that σ ◦ η = O. �

Corollary 7.1.31 A linear transformation σ : U → V is injective if and only if σ◦η1 =σ ◦ η2 implies η1 = η2 for any linear transformations η1, η2 : S → U . Here U is finitedimensional.

In the following we shall introduce a linear transformation called a projection. This transformation will be used in the decomposition of a vector space.

Definition 7.1.32 A linear transformation π : V → V is called a projection on V if π^2 = π ◦ π = π.

Theorem 7.1.33 If π is a projection on V , then V = π(V ) ⊕ ker(π) and π(α) = α for every α ∈ π(V ).

Proof: First we note that if β ∈ V , then π(π(β)) = π^2(β) = π(β). Thus π(α) = α for every α ∈ π(V ). Now suppose β ∈ V . Put γ = β − π(β). Then π(γ) = π(β) − π^2(β) = 0. Thus γ ∈ ker(π) and β = π(β) + γ. Hence V = π(V ) + ker(π). The sum is also direct, for if α ∈ π(V ) ∩ ker(π), then α = π(α) = 0. �
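A concrete projection on R2 illustrates Theorem 7.1.33; the matrix P below is our own example (π(x, y) = (x, 0)):

```python
from sympy import Matrix

P = Matrix([[1, 0],
            [0, 0]])                   # pi(x, y) = (x, 0)
print(P * P == P)                      # True: pi^2 = pi

beta = Matrix([3, 5])
image_part = P * beta                  # lies in pi(V)
kernel_part = beta - P * beta          # lies in ker(pi)
print(image_part.T, kernel_part.T)     # Matrix([[3, 0]]) Matrix([[0, 5]])
print(P * image_part == image_part)    # True: pi fixes every vector of pi(V)
```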

Exercise 7.1

7.1-1. Prove Theorem 7.1.10.

7.1-2. Prove Proposition 7.1.14.

7.1-3. Let σ : P3(R) → P3(R) be defined by σ(f) = f ′′ + 2f ′ − f . Here f ′ is thederivative of f . Is σ surjective? Can you find a polynomial f ∈ P3(R) such thatf ′′(x) + 2f ′(x)− f(x) = x (i.e., x ∈ σ(P3(R)) )? If so, find f .

7.1-4. Determine whether the following linear transformations are (a) injective; (b) sur-jective.

(1) σ : R3 → R2; σ(x, y, z) = (x− y, z).

(2) σ : R2 → R3; σ(x, y) = (x+ y, 0, 2x− y).

(3) σ : R3 → R3; σ(x, y, z) = (x, x+ y, x+ y + z).

(4) σ : P4(R)→M2(R), for f ∈ P4(R), let σ(f) =

(f(1) f(2)

f(3) f(4)

).

Page 27: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear Transformations and Matrix Representation 143

7.1-5. Define σ : P3(R) → R4 by σ(f) = (f(1), f ′(0),∫ 1

0f(x)dx,

∫ 1

−1xf(x)dx). Find

ker(σ).

7.1-6. Define σ : M2(R) → P3(R) by σ

(a b

c d

)= a + b + 2dx + bx2. Find ker(σ) and

also find nullity(σ) and rank(σ).

§7.2 Coordinates

From now on, we shall assume all the vector spaces are finite dimensional unlessotherwise stated.

In coordinate geometry, coordinate axes play an indispensable role. In a vectorspace, basis plays a similar role. Each vector in a vector space can be expressed by aunique linear combination of a given ordered basis vectors. These coefficients in thecombination determine a column matrix.

For a linear transformation σ from vector space U to vector space V , we wouldlike to give σ ‘clothes’, that is, the representing matrix via ordered bases of U and Vrespectively. This way we shall be able to make full use of the matrix algebra to studylinear transformation.

Let A be a basis of a finite dimensional vector space V . We say that A is an orderedbasis if A is viewed as an ordered set (i.e., a finite sequence). Note that, from now onthe order is very important.

Definition 7.2.1 Let A = {α1, . . . , αn} be an ordered basis of V over F. For α ∈ V

there exist a1, a2, . . . , an ∈ F such that α =n∑

i=1aiαi. We define the coordinate of α

relative to A , denoted [α]A , by

[α]A =

⎛⎜⎜⎜⎜⎝a1

a2...

an

⎞⎟⎟⎟⎟⎠ ∈ Fn×1.

The scalar ai is called the i-th coordinate of α relative to A .

Note that we normally identify the element (a1, . . . , an) ∈ Fn with the column vector(a1 · · · an)T .Proposition 7.2.2 Let A = {α1, . . . , αn} be an ordered basis of V over F. For α, β ∈ Vand a ∈ F, [aα+ β]A = a[α]A + [β]A .

Proof: Suppose α =n∑

i=1aiαi, and β =

n∑i=1

biαi for some ai, bi ∈ F. Then

aα+ β = a

n∑i=1

aiαi +

n∑i=1

biαi =

n∑i=1

(aai + bi)αi.

Page 28: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

144 Linear Transformations and Matrix Representation

Thus,

[aα+ β]A =

⎛⎜⎜⎜⎜⎝aa1 + b1

aa2 + b2...

aan + bn

⎞⎟⎟⎟⎟⎠ = a

⎛⎜⎜⎜⎜⎝a1

a2...

an

⎞⎟⎟⎟⎟⎠+

⎛⎜⎜⎜⎜⎝b1

b2...

bn

⎞⎟⎟⎟⎟⎠ = a[α]A + [β]A .

Corollary 7.2.3 Keeping the hypothesis of Proposition 7.2.2, the mapping α �→ [α]Ais an isomorphism from V onto Fn.

Suppose U and V are finite dimensional vector spaces over F. Let L(U, V ) be theset of all linear transformations from U to V . Now we are going to study the struc-ture of L(U, V ). Note that under the addition and scalar multiplication denoted inProposition 7.1.8 L(U, V ) is a vector space over F.

Theorem 7.2.4 Suppose U and V are finite dimensional vector spaces over F withordered bases A = {α1, . . . , αn} and B = {β1, . . . , βm}, respectively. Then with respectto these bases, there is a bijection between L(U, V ) and Mm,n(F).

Proof: Let σ ∈ L(U, V ). Since σ(αj) ∈ V , σ(αj) =m∑i=1

aijβi for some aij ∈ F. Thus we

obtain a matrix A = (aij) ∈Mm,n(F).Conversely, suppose A = (aij) ∈ Mm,n(F). Then by Theorem 7.1.26 there exists a

unique linear transformation σ ∈ L(U, V ) such that σ(αj) =m∑i=1

aijβi for j = 1, 2, . . . , n.

Therefore, the mapping σ �→ (aij) is a bijection. �

Definition 7.2.5 With the notation defined in the proof of Theorem 7.2.4, we call thatσ is represented by A with respect to the ordered bases A and B and write A = [σ]AB . IfU = V and A = B, we call that σ is represented by A with respect to the ordered basisA , we simplify the notation A = [σ]AA to A = [σ]A .

Note that any matrix representation of a linear transformation must be relativeto some ordered bases. Thus by bases we shall mean ordered bases for any lineartransformation representation.

Note also that the entries of the j-th column of A are the coefficients of σ(αj) withrespect to basis B. Here αj is the j-th vector in A .

Theorem 7.2.6 Let the bijection ϕ : σ �→ A = [σ]AB be defined in the proof ofTheorem 7.2.4. Then ϕ is an isomorphism.

Proof: It suffices to show that ϕ is linear. Suppose σ, τ ∈ L(U, V ). Let A and B bethe bases defined in Theorem 7.2.4 and let A = (aij) = [σ]AB = ϕ(σ) and B = (bij) =[τ ]AB = ϕ(τ). For any c ∈ F,

(cσ + τ)(αj) = cσ(αj) + τ(αj) = c

m∑i=1

aijβi +

m∑i=1

bijβi =

m∑i=1

(caij + bij)βi.

Page 29: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear Transformations and Matrix Representation 145

Thus [cσ + τ ]AB = (caij + bij) = cA + B = c[σ]AB + [τ ]AB . That is, ϕ(cσ + τ) =cϕ(σ) + ϕ(τ). �

Examples 7.2.7

1. Let U = V = R2. Suppose σ is the rotation through an angle θ in the anticlockwisedirection. Then σ maps (1, 0) to (cos θ, sin θ) and (0, 1) to (− sin θ, cos θ), respectively.Thus σ is represented by (

cos θ − sin θ

sin θ cos θ

)with respect to the standard (ordered) basis.

2. Let U = V = R2. Suppose σ is the reflection about the line x = y. Then σ(x, y) =

(y, x). Thus σ(1, 0) = (0, 1) and σ(0, 1) = (1, 0). Hence σ is represented by

(0 1

1 0

)with respect to the standard basis.

3. The zero transformation represented by the zero matrix with respect to any bases.The identity transformation is represented by the identity matrix with respect to anybasis. Note that if we choose two difference bases A and B for the same vectorspace, then the identity transformation is represented by a non-identity matrix. Weshall discuss this fact later.

4. Let A ∈ Mm,n(F). We define σA : Fn → Fm by σA(X) = AX for all X ∈ Fn (whichis identified with F1×n). Then with respect to the standard bases Sn and Sm of Fn

and Fm respectively, [σA]SnSm

= A. This means that for any m × n matrix, it is amatrix representation of a linear transformation of some suitable vector spaces withrespect to some suitable bases. �

Example 7.2.8 Let σ : R3 → R2 be defined by σ(x, y, z) = (x − z, 3x − 2y + z) foreach (x, y, z) ∈ R3.

Now we first verify that σ is linear. For any α = (x1, y1, z1), β = (x2, y2, z2) ∈ R3,c ∈ R, cα+ β = (cx1 + x2, cy1 + y2, cz1 + z2). Then

σ(cα+ β) =((cx1 + x2)− (cz1 + z2), 3(cx1 + x2)− 2(cy1 + y2) + (cz1 + z2)

)=

(cx1 − cz1, 3cx1 − 2cy1 + cz1

)+

(x2 − z2, 3x2 − 2y2 + z2

)= c(x1 − z1, 3x1 − 2y1 + z1) + (x2 − z2, 3x2 − 2y2 + z2) = cσ(α) + σ(β).

Hence σ is linear.Suppose A = {e1, e2, e3} and B = {e′1, e′2} be the standard bases of R3 and R2,

respectively. Then

σ(e1) = σ(1, 0, 0) = (1, 3) = e′1 + 3e′2,σ(e2) = σ(0, 1, 0) = (0,−2) = −2e′2,σ(e3) = σ(0, 0, 1) = (−1, 1) = −e′1 + e′2.

Page 30: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

146 Linear Transformations and Matrix Representation

Hence [σ]AB =

(1 0 −13 −2 1

).

Clearly the matrix obtained above is of rank 2. So rank(σ) = 2 and hence nullity(σ) =3− 2 = 1. �

Theorem 7.2.9 Let σ ∈ L(U, V ) and τ ∈ L(V,W ). Suppose A = {α1, . . . , αn},B = {β1, . . . , βm} and C = {γ1, . . . , γp} are bases of U , V and W , respectively. Then[τ ◦ σ]AC = [τ ]BC [σ]AB .

Proof: Let [σ]AB = (aij) ∈Mm,n(F) and [τ ]BC = (bki) ∈Mp,m(F). Then

σ(αj) =

m∑i=1

aijβi, and τ(βi) =

p∑k=1

bkiγk.

Then

(τ ◦ σ)(αj) = τ(σ(αj)) = τ

(m∑i=1

aijβi

)

=

m∑i=1

aijτ(βi) =

m∑i=1

aij

p∑k=1

bkiγk =

p∑k=1

(m∑i=1

bkiaij

)γk.

Thus we have the theorem. �

This theorem explains why the multiplication of matrices is so defined.

Corollary 7.2.10 Let σ ∈ L(U, V ) be an isomorphism. Suppose A and B are bases ofU and V , then [σ]AB is an invertible matrix.

Proof: Since σ−1 ◦ σ = ι, the identity transformation from U to U and [ι]A = I, thecorollary follows. �

Theorem 7.2.11 Let σ ∈ L(U, V ). Suppose A and B are bases of U and V , respec-tively. Then for α ∈ U , [σ(α)]B = [σ]AB [α]A .

Proof: Let A = {α1, . . . , αn}, B = {β1, . . . , βm}. Let

[σ]AB = (aij) and [α]A =

⎛⎜⎜⎜⎜⎝x1

x2...

xn

⎞⎟⎟⎟⎟⎠ .

Then we have

σ(α) =

n∑j=1

xjσ(αj) =

n∑j=1

xj

m∑i=1

aijβi =

m∑i=1

⎛⎝ n∑j=1

aijxj

⎞⎠βi.

Page 31: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear Transformations and Matrix Representation 147

Thus

[σ(α)]B =

⎛⎜⎜⎜⎜⎜⎝n∑

j=1a1jxj

...n∑

j=1amjxj

⎞⎟⎟⎟⎟⎟⎠ = [σ]AB [α]A .

Corollary 7.2.12 Keep the notation as in Theorem 7.2.11. If [σ(α)]B = A[α]A for allα ∈ U , then A = [σ]AB .

Proof: From the assumption and Theorem 7.2.11 we have (A − [σ]AB )[α]A = 0 in Fn.Since α is arbitrary and from Corollary 7.2.3, (A− [σ]AB )x = 0 for all x ∈ Fn. By settingx = e1, e2, . . . , en, we have A− [σ]AB = O. Hence the corollary holds. �

Suppose σ : U → V is linear. Let β ∈ V . To solve the linear problem σ(α) = β,we choose ordered bases A = {α1, . . . , αn} of U and B = {β1, . . . , βm} of V . Then theunknown vector α is represented by n×1 matrix X, given vector β by m×1 matrix b andlinear transformation σ by m× n matrix A. Thus we convert the linear problem to the

matrix equation AX = b. If we can find X =

⎛⎜⎜⎝x1...

xn

⎞⎟⎟⎠, then we have found α =n∑

i=1xiαi

as our solution.

As another application of the representing matrix we can compute nullity(σ) withoutknowing ker(σ). Since rank(σ) = dim span{σ(α1), . . . , σ(αn)} which is the dimension ofC(A) and can be easily seen via the ‘new clothes’, i.e., the rref of A. Then nullity(σ) =n− rank(σ) is determined immediately.

Exercise 7.2

7.2-1. Let A and B be the standard bases of Rn and Rm, respectively. Show that thefollowing maps σ are linear and compute [σ]AB , rank(σ) and nullity(σ).

(a) σ : R2 → R3 is defined by σ(x, y) = (2x− y, 3x+ 4y, x).

(b) σ : R3 → R3 is defined by σ(x, y, z) = (x− y + 2z, 2x+ y,−x− 2y + 2z).

(c) σ : R3 → R is defined by σ(x, y, z) = 2x+ y − z.

(d) σ : Rn → Rn is defined by σ(x1, x2, . . . , xn) = (x1, x1, . . . , x1).

(e) σ : Rn → Rn is defined by σ(x1, x2, . . . , xn) = (xn, xn−1, . . . , x1).

7.2-2. Let A = {(1, 2), (2, 3)} and B = {(1, 1, 0), (0, 1, 1), (2, 2, 3)}. Show that Aand B are bases of R2 and R2, respectively. Define σ : R2 → R3 by σ(x, y) =(x− y, x+ y, y). Compute [σ]AB .

Page 32: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

148 Linear Transformations and Matrix Representation

7.2-3. Define σ : M2(R) → P3(R) by σ

(a b

c d

)= (a + b) + 2dx + bx2. Let A =

{E1,1, E1,2, E2,1, E2,2} and B = {1, x, x2} be the standard bases of M2(R) andP3(R), respectively. Show that σ is linear and compute [σ]AB .

7.2-4. Define σ : M2(R)→M2(R) by σ(X) = AX −XA, here A =

(1 −10 2

).

(a) Show that σ is linear.

(b) Represent σ by matrix using the standard basis of M2(R) (see Problem 7.2-3).

(c) Find the eigenpairs of the matrix obtained from (b) (note that, these eigen-pairs are called eigenpairs of σ).

(d) Find all X such that AX = XA.

7.2-5. Define σ : M2(R) → M2(R) by σ(A) = AT . Show that σ is linear and compute[σ]A , where A is the standard basis of M2(R).

7.2-6. Define σ : M2(R) → R by σ(A) = Tr(A). Show that σ is linear and compute[σ]AB , where A and B are the standard bases of M2(R) and R respectively.

7.2-7. Let V be a vector space with ordered basis A = {α1, α2, . . . , αn}. Put α0 = 0.Suppose σ : V → V is a linear transformation such that σ(αj) = αj + αj−1 for1 ≤ j ≤ n. Compute [σ]A .

7.2-8. Let D : P3(R) → P3(R) be the differential operatord

dx. Find the matrix repre-

sentation of D with respect to the bases {1, 1 + 2x, 4x2 − 3} and {1, x, x2}.

7.2-9. Let A = {(2, 4), (3, 1)} be a basis of R2. What is the 2 × 1 matrix X thatrepresents the vector (2, 1) with respect to A ?

7.2-10. Find the formula of the reflection about the line y =√3x in the xy-plane.

§7.3 Change of Basis

The ‘clothes’ of a linear transformation σ : U → V is just the matrix representingthe transformation via bases of U and V respectively. The ‘look’ of the transformationdepends on the bases chosen. If we change ‘clothes’ the transformation will have adifferent ‘look’. Let us consider the identity transformation first.

Let A = {α1, . . . , αn}, B = {β1, . . . , βn} be two bases of U . For avoiding confusion,we let U ′ be the vector space U with the ordered basis B. Since βj ∈ U for 1 ≤ j ≤ n,

there exist pij ∈ F such that βj =n∑

i=1pijαi. The associated matrix P = (pij) is called

the matrix of transition from A to B or transition matrix from A to B.

Page 33: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear Transformations and Matrix Representation 149

Consider the identity transformation ι : U ′ → U . Since ι(βj) = βj =n∑

i=1pijαi,

[ι]BA = P . On the other hand, we consider the identity transformation ι : U → U ′ andwant to find [ι]AB .

Let Q = [ι]AB = (qij), i.e., αi =n∑

k=1

qkiβk. Then for each i,

αi =

n∑k=1

qkiβk =

n∑k=1

qki

n∑j=1

pjkαj =

n∑j=1

(n∑

k=1

pjkqki

)αj .

Thus we haven∑

k=1

pjkqki = δji. Hence PQ = I. That is, [ι]AB = P−1.

Let α ∈ U = U ′. By Theorem 7.2.11

[α]A = [ι(α)]A = [ι]BA [α]B = P [α]B or [α]B = P−1[α]A . (7.1)

So the above formula shows the relation of the coordinates between different bases.

Theorem 7.3.1 Suppose σ ∈ L(U, V ). Suppose A = [σ]AC , where A and C are basesof U and V , respectively. Suppose A′ = [σ]BD , where B and D are other bases of U andV , respectively. Let P = [ιU ]

BA and Q = [ιV ]

DC be matrices of transition from A to B

and C to D , respectively. Here ιU and ιV denote the identity transformation of U andV , respectively. Then A′ = Q−1AP .

Proof: Since ιV ◦ σ = σ = σ ◦ ιU and by Theorem 7.2.9,

[ιV ]DC [σ]

BD = [σ]BC = [σ]AC [ιU ]

BA .

Thus QA′ = [σ]BC = AP , hence A′ = Q−1AP . �

Corollary 7.3.2 Let us keep the notation in Theorem 7.3.1. If A = B, then A′ =Q−1A.

Keeping the notation in Theorem 7.3.1. Suppose U = V , A = C and B = D . Thenby Theorem 7.3.1 we have the following corollary:

Corollary 7.3.3 Suppose σ ∈ L(V, V ). Let A and B be two bases of V . Let P be thematrix of transition from A to B. Then [σ]B = P−1[σ]A P .

Example 7.3.4 Let σ : R3 → R2 be a linear transformation that σ(e1) = (1, 1),σ(e2) = (0,−2) and σ(e3) = (1, 3). Let B = {(1, 1, 1), (1, 0,−1), (0, 0, 1)} andD = {(1,−1), (1, 1)} be bases of R3 and R2, respectively. Find [σ]BD .

Let A = S3 and C = S2 be the standard bases of R3 and R2, respectively. By thedefinition of σ we can easy to see that

A = [σ]S3S2

=

(1 0 1

1 −2 3

).

Page 34: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

150 Linear Transformations and Matrix Representation

It is also easy to see that the matrix transition from S3 to B is P =

⎛⎜⎝ 1 1 0

1 0 0

1 −1 1

⎞⎟⎠ .

The matrix transition from S2 to D is Q =

(1 1

−1 1

). Then

[σ]BD = Q−1AP = 12

(1 −11 1

)(1 0 1

1 −2 3

)⎛⎜⎝ 1 1 0

1 0 0

1 −1 1

⎞⎟⎠ =

(0 1 −12 −1 2

). �

Theorem 7.3.5 Suppose σ ∈ L(U, V ). Let A = {α1, . . . , αn} and C = {β1, . . . , βm}be bases of U and V , respectively. If rank(σ) = r, then rank([σ]AC ) = r.

Proof: Choose a basis B = {α′1, . . . , α′n} of U such that {α′r+1, . . . , α′n} is a basis of

ker(σ). Since rank(σ) = r, {σ(α′1), . . . , σ(α′r)} is a basis of σ(U) and hence can beextended to a basis D = {β′1, . . . , β′m} of V . Hence σ(α′i) = β′i for i = 1, 2 . . . , r andσ(α′j) = 0 for j = r + 1, . . . , n. Let A′ = [σ]BD . Then

Q−1AP = A′ =

(Ir Or,n−r

Om−r,r Om−r,n−r

),

where P and Q are the respective transition matrices. Then rank(A′) = r. Since P andQ are invertible, by Theorem 2.4.11 rank(A) = r. �

Suppose A ∈ Mm,n(F). We can choose a basis A of an n-dimensional vector spaceU (for example Fn) and a basis C of an m dimensional vector space V (for exampleFm). Then by Theorem 7.1.26, we can define a linear transformation σ : U → V suchthat [σ]AC = A. If rank(A) = r, then by Theorem 7.3.5 rank(σ) = r.

The main purpose of changing basis is to make the linear transformation σ : U → Vlooks nicer, for example, upper triangular or diagonal. To do this we may have to changethe basis of U . This is equivalent to multiply the representing matrix on the left by anon-singular matrix. To attain this goal, we may simply apply a sequence of elementaryrow operations to the representing matrix. Or we would like to change basis of V whichcan sometimes be obtained by applying a sequence of elementary column operations.

When U = V , the representing matrix of σ is a square matrix. Diagonalizing asquare matrix is just choosing the right basis of U so that the new representing matrixis diagonal if this is possible.

Exercise 7.3

7.3-1. Find the matrix of transition in P3(R) from the basis A = {1, 1 + 2x, 4x2 − 3}to S = {1, x, x2}. [Hint: Find the matrices of transitions from S to A first.]

7.3-2. Suppose A = {x2−x+1, x+1, x2+1} and B = {x2+x+4, 4x2−3x+2, 2x2+3}are bases of P3(Q). What is the matrix of transition from A to B? [Hint: Find

Page 35: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear Transformations and Matrix Representation 151

the transition matrices from the standard basis S = {1, x, x2} to A and to B,respectively.]

7.3-3. Let σ : P3(R)→ P2(R) be defined by σ(p) = p′, the derivative of p ∈ P3(R). LetA = {x2 − x+ 1, x+ 1, x2 + 1} and C = {1 + x, 1− x} be bases of P3(R) andP2(R), respectively. Using transition matrices find [σ]AC .

7.3-4. Let σ be the linear transformation from R3 to R3 defined by

σ(x, y, z) = (3x− y − 2z, 2x− 2z, 2x− y − z).

Find the matrix A representing σ with respect to the basis

{(1, 1, 1), (1, 2, 0), (0,−2, 1)}.

Page 36: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Chapter 8

Diagonal Form and Jordan Form

§8.1 Diagonal Form

In Chapter 7, we knew that the matrix representation of a linear transformationbetween two vector spaces U and V depends on the choice of bases. In this chapter, weconsider the special case when U = V . In this case we choose the same (ordered) basisA for U and V . If we change the basis from A to B, then for σ ∈ L(V, V ) we knewfrom Theorem 7.3.1 or Corollary 7.3.3 that

[σ]B = P−1[σ]A P,

where P is the matrix of transition from A to B. In this chapter, we want to find abasis of V such that the matrix representation of σ is as simple as possible. In matrixtheory terminology, it is equivalent to finding an invertible matrix P such that P−1AP assimple as possible for a given square matrix A. The simplest form is diagonal matrix. Soour question is whether A is diagonalizable. We have discussed this topic in Chapter 5.We have two tests for the diagonalizability of A in Theorems 5.3.6 and 5.3.10. Howeverthe proof of Theorem 5.3.10 has not yet completed. We shall finish the proof of it. Inthis chapter, all vector spaces are finite dimensional and all linear transformations arein L(V, V ) for some vector space V .

Let f(x) = akxk + · · ·+ a1x+ a0 ∈ F[x] and σ ∈ L(V, V ). We define that

f(σ) = akσk + · · ·+ a1σ + a0ι,

where ι is the identity mapping and σi =

i times︷ ︸︸ ︷σ ◦ · · · ◦ σ for i ≥ 1.

Since the characteristic polynomials of two similar matrices are the same, we canmake the following definition.

Definition 8.1.1 The characteristic polynomial of any matrix representing the lineartransformation σ is called the characteristic polynomial of σ. This is well defined sinceany two representing matrices are similar and hence have the same characteristic poly-nomial. We shall write the characteristic polynomial of σ as Cσ(x) or C(x).

152

Page 37: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 153

It is easy to see that the (monic) minimum polynomials of two similar matricesare the same (Proposition 5.2.7). So we say that the monic minimum polynomial of alinear transformation σ is the minimum polynomial of any matrix representing it. Forshort, we call the monic minimum polynomial of a linear transformation the minimumpolynomial.

Only the zero matrix represents the zero linear transformation and vice versa. Sup-pose f(x) is the characteristic or minimum polynomial of a linear transformation σ.Then f(σ) = O, the zero linear transformation.

Definition 8.1.2 Let σ ∈ L(V, V ). σ is diagonalizable if there is a basis B of V suchthat [σ]B is a diagonal matrix. That is, the basis B consists of eigenvectors of σ.

Theorem 5.3.10 (Second Test for Diagonalizability) A ∈Mn(F) is diagonalizableover F if and only if the monic minimum polynomial of A has the form

m(x) = (x− λ1)(x− λ2) · · · (x− λp),

where λ1, λ2, . . . , λp are distinct scalars (eigenvalues of A).

Proof: It remains to prove the “ if” part as the “only if” part had been done in Chapter5. By Example 7.2.7 we can choose a linear transformation σA : V → V such that A isits representing matrix. Suppose

m(x) = (x− λ1)(x− λ2) · · · (x− λp)

is the minimum polynomial of A with λi all distinct.

If p = 1, then σ = λ1ι. Hence it is diagonalizable. So we assume p ≥ 2. LetMj = ker(σ−λjι) 1 ≤ j ≤ p. Nonzero vectors in Mj are eigenvectors of σ corresponding

to λj . By Theorem 5.3.1 Mj ∩∑i �=j

Mi = {0}. Thus the sump∑

j=1Mj is direct. Let

mj = dimMj = nullity(σ − λjι) and rank(σ − λjι) = rj . Since M1 ⊕ · · · ⊕Mp ⊆ V , wehave m1 + · · ·+mp ≤ n. By Theorem 7.1.17 we have

rj = rank(σ − λjι) = n− nullity(σ − λjι) = n−mj .

Also, by Theorem 7.1.20

dim([(σ − λ1ι) ◦ (σ − λ2ι)](V )

)= rank[(σ − λ1ι) ◦ (σ − λ2ι)]

= rank(σ − λ2ι)− dim[(σ − λ2ι)(V ) ∩ ker(σ − λ1ι)

]≥ r2 −m1 = n− (m2 +m1).

Repeating use of the same idea, we have

0 = dim(m(σ)(V )) = dim([(σ − λ1ι) ◦ · · · ◦ (σ − λpι)](V )

) ≥ n−p∑

j=1

mj .

Page 38: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

154 Diagonal Form and Jordan Form

Hencem1+· · ·+mp ≥ n. Therefore,m1+· · ·+mp = n. This show that V = M1⊕· · ·⊕Mp.Since every nonzero vectors of Mj is an eigenvector of σ, V has a basis consisting ofeigenvectors of σ. Hence σ is diagonalizable, i.e., A is diagonalizable. �

This theorem is not useful in practice as it is usually very tedious to find a minimumpolynomial. In fact, we may just go ahead to find a basis consisting of eigenvectors ifsuch a basis exists. If so, then the matrix is diagonalizable. Otherwise, it is not.

The subspace ker(σ − λι) for some eigenvalue λ in the proof of Theorem 5.3.10 iscalled the eigenspace of λ and is denoted by E (λ). This concept is agreed with theeigenspace defined in Chapter 5. So we use the same notation.

Examples 8.1.3

1. Let A =

⎛⎜⎝ −1 2 2

2 2 2

−3 −6 −6

⎞⎟⎠ ∈ M3(Q). Then the eigenvalues are −2,−3 and 0. By

Corollary 5.3.8 A is diagonalizable. In fact, we can find three linearly independent

eigenvectors α1 =

⎛⎜⎝ 2

−10

⎞⎟⎠, α2 =

⎛⎜⎝ 1

0

−1

⎞⎟⎠ and α3 =

⎛⎜⎝ 0

1

−1

⎞⎟⎠ corresponding to the

eigenvalues −2, −3 and 0, respectively. Thus

P =

⎛⎜⎝ 2 1 0

−1 0 1

0 −1 −1

⎞⎟⎠ and P−1 =

⎛⎜⎝ 1 1 1

−1 −2 −21 2 1

⎞⎟⎠ ,

and hence

P−1AP =

⎛⎜⎝ −2 0 0

0 −3 0

0 0 0

⎞⎟⎠ .

2. Let A =

⎛⎜⎝ 1 1 −1−1 3 −1−1 2 0

⎞⎟⎠ ∈ M3(Q). Then the eigenvalues are 2, 1 and 1. An

eigenvector corresponding to λ1 = 2 is

⎛⎜⎝ 0

1

1

⎞⎟⎠. However, for λ2 = 1, we can only

obtain one linearly independent eigenvector

⎛⎜⎝ 1

1

1

⎞⎟⎠. Thus A is not diagonalizable.�

Definition 8.1.4 Let V be a vector space and σ ∈ L(V, V ). A subspace W of V is saidto be invariant under σ if σ(W ) ⊆W . W is also called an invariant subspace.

Page 39: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 155

Clearly, if σ ∈ L(V, V ), then V and {0} and eigenspaces and their sums are invariantsubspaces.

Theorem 8.1.5 Let V be a vector space with a basis consisting of eigenvectors of σ ∈L(V, V ). If W is a subspace of V invariant under σ, then W also has a basis consistingof eigenvectors of σ.

Proof: Suppose {β1, . . . , βn} is a basis of V consisting of eigenvectors of σ. Let α ∈W .

Then α =n∑

i=1aiβi for some scalar a1, . . . , an. Suppose ai �= 0, aj �= 0 and βi, βj ∈ E (λ).

Then γ = aiβi+ ajβj ∈ E (λ). We do this for each i with ai �= 0. Then we can represent

α as α =p∑

i=1γi for some p, where γi are eigenvectors of σ with distinct eigenvalues λi,

respectively.

Since W is invariant under σ, W is invariant under σ − kι for any scalar k. Thus[(σ − λ2ι) ◦ · · · ◦ (σ − λpι)

](α) ∈W . But

[(σ − λ2ι) ◦ · · · ◦ (σ − λpι)

](α) =

[(σ − λ2ι) ◦ · · · ◦ (σ − λpι)

]( p∑i=1

γi

)

=

p∑i=1

[(σ − λ2ι) ◦ · · · ◦ (σ − λpι)

](γi)

= (λ1 − λ2)(λ1 − λ3) · · · (λ1 − λp)γ1.

Since λj �= λ1 for all j �= 1, we have γ1 ∈W .

Similarly, we can show that γ2, . . . , γp ∈W . Since α is arbitrary, this show that W isspanned by eigenvectors of σ in W . Thus W contains a basis consisting of eigenvectorsof σ. �

Theorem 8.1.6 Let σ ∈ L(V, V ). If V has a basis consisting of eigenvectors of σ, thenfor every invariant subspace S there is an invariant subspace T such that V = S ⊕ T .

Proof: Suppose B = {β1, . . . , βn} is a basis of V consisting of eigenvectors. Let Sbe any invariant subspace of V . By Theorem 8.1.5 S has a basis consisting of eigen-vectors of σ, say {γ1, . . . , γs}. Using the method used in the proof of the Steintz Re-placement Theorem (Theorem 6.4.6) we obtain a basis {γ1, . . . , γs, β′s+1, . . . , β

′n} where

{β′s+1, . . . , β′n} ⊆ B. Put T = span(β′s+1, . . . , β

′n). Then clearly T is invariant under σ

and V = S ⊕ T . �

In general, the converse of the above theorem is not true. But it is true for vectorspaces over C.

Theorem 8.1.7 Let V be a vector space over C and let σ ∈ L(V, V ). Suppose that forevery subspace S invariant under σ there is a subspace T also invariant under σ suchthat V = S ⊕ T . Then V has a basis consisting of eigenvectors of σ.

Page 40: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

156 Diagonal Form and Jordan Form

Proof: We shall prove this theorem by induction on n = dimV .

If n = 1, then any nonzero vector is an eigenvector of σ and hence the theorem istrue.

Assume the theorem holds for vector spaces of dimension less than n, n ≥ 2. Nowsuppose V is an n-dimensional vector space over C satisfying the hypothesis of thistheorem. By fundamental theorem of algebra, the characteristic polynomial of σ has atleast one root λ1, i.e., λ1 is an eigenvalue of σ. Let α1 be an eigenvector corresponding toλ1. Clearly, the subspace S1 = span(α1) is invariant under σ. By the assumption, thereis a subspace T1 invariant under σ such that V = S1 ⊕ T1. Clearly dimT1 = n− 1. Toapply the induction hypothesis we have to show that every subspace S2 of T1 invariantunder σ1 = σ

∣∣T1

there is a subspace T2 of T1 which is invariant under σ1.

Now suppose S2 is a subspace of T1 invariant under σ1. Then σ(S2) = σ1(S2) ⊆ S2.That is, S2 is invariant under σ. Thus by the hypothesis there exists a subspace T ′2 ofV invariant under σ such that V = S2 ⊕ T ′2. Since S2 ⊆ T1,

T1 = T1 ∩ V = T1 ∩ (S2 ⊕ T ′2) = (T1 ∩ S2)⊕ (T1 ∩ T ′2) = S2 ⊕ (T1 ∩ T ′2)

(we leave the proof of the above equalities to the reader). Since T1∩T ′2 is invariant underσ, it is invariant under σ1. Put T2 = T1 ∩ T ′2. By induction hypothesis T1 has a basis{α2, . . . , αn} consisting of eigenvectors of σ1. Since these vectors are in T1, they are alsoeigenvectors of σ. Thus {α1, α2, . . . , αn} is a basis of V consisting of eigenvectors of σ.

Note that we do not require F = C in the above theorem. All we need is σ has allits eigenvalues in F. That means the characteristic polynomial Cσ(x) can be written asa product of linear factors over F. In this case, we say that Cσ(x) splits over F.

Example 8.1.8 Let A =

(0 −21 3

)∈M2(Q). We want to find Ak for k ∈ N.

Solution: It is easy to see that the eigenvalues of A are 1 and 2. Since the eigenvalues

are distinct, A is diagonalizable. It is easy to see that if P =

(2 2

−1 −1

), then

P−1 =

(1 1

−1 −2

)and P−1AP =

(1 0

0 2

)= D. Now since A = PDP−1,

Ak = (PDP−1)k = (PDP−1)(PDP−1) · · · (PDP−1)︸ ︷︷ ︸k times

= PDkP−1

=

(1 1

−1 −2

)(1 0

0 2k

)(1 1

−1 −2

)=

(2− 2k 2− 2k+1

−1 + 2k −1 + 2k+1

).

Example 8.1.9 Prove that if A ∈ Mn(Q) such that A3 = A then A is diagonalizableover Q. Moreover, the eigenvalues of A are either 0, 1 or −1.

Page 41: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 157

Proof: Let g(x) = x3 − x. Then g(A) = O. So if m(x) is the minimum polynomial ofA (remind that it is the monic one), then m(x)|g(x) = x(x−1)(x+1). Thus m(x) mustfactor into distinct linear factors. Hence A is diagonalizable.

Thus, there exists an invertible matrix P such that P−1AP = D, a diagonal matrix.Then

D3 = (P−1AP )3 = P−1A3P = P−1AP = D.

Put D = diag{d1, d2, . . . , dn}. Then we have d3i = di for all i. Thus di = 0, 1 or −1.Since di are just the eigenvalues of A, hence our assertion is proved. Note that thisresult also follows from Theorem 5.2.9 and Corollary 5.2.9. �

Example 8.1.10 Characterize all the real 2×2 matrices A for which A2−3A+2I = O.

Solution: Let g(x) = x2 − 3x + 2 = (x − 1)(x − 2). From g(A) = O we see that theminimum polynomial m(x) of A must be one of form: x− 1, x− 2 or (x− 1)(x− 2).

If m(x) = x− 1, then A = I.If m(x) = x− 2, then A = 2I.

If m(x) = (x− 1)(x− 2), then A is diagonalizable and hence similar to

(1 0

0 2

). �

Exercise 8.1

8.1-1. Let σ ∈ L(V, V ). Prove that if g(x) ∈ F [x] and α is an eigenvector of σ corre-sponding to eigenvalue λ, then g(σ)(α) = g(λ)α.

8.1-2. Show that if every vector in V is an eigenvector of σ, then σ must be a scalartransformation, i.e., σ = kι for some k ∈ F.

8.1-3. Let A =

⎛⎜⎝ −3 1 5

7 2 −61 1 1

⎞⎟⎠ ∈M3(R). Find Ak for k ∈ N.

8.1-4. Let A =

⎛⎜⎝ 4 1 1

0 4 5

0 3 6

⎞⎟⎠ ∈ M3(R). Find a matrix B such that B2 = A. [Hint:

First, find an invertible matrix P such that P−1AP = D, a diagonal matrix.Next, find C such that C2 = D.

8.1-5. Let x0 = 0, x1 =12 and xk = 1

2(xk−1 + xk−2) for k ≥ 2. Find a formula for xk bysetting a matrix A and diagonalize it. Also find lim

k→∞xk.

8.1-6. Show that the matrix

(1 c

0 1

)with c �= 0 is not diagonalizable.

8.1-7. Prove that the matrix Jn is similar to

(n 0Tn−1

0n−1 On−1

).

Page 42: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

158 Diagonal Form and Jordan Form

8.1-8. Can you find two matrices that have the same characteristic polynomial but arenot similar?

8.1-9. Prove that two diagonal matrices are similar if and only if the diagonal elementsof one are simply a rearrangement of the diagonal elements of the other.

§8.2 Jordan Form

There are some matrices that cannot be diagonalized. The best possible simple formthat we can get is so-called Jordan form. Before introducing Jordan form, we definesome concepts first.

Definition 8.2.1 Let σ ∈ L(V, V ). A nonzero vector α ∈ V is called a generalizedeigenvector of σ if there exists a scalar λ and positive integer s such that (σ−λι)s(α) = 0.We say that α is a generalized eigenvector corresponding to λ. Hence, an eigenvector isalso a generalized eigenvector.

Note that if α is a generalized eigenvector corresponding to λ, then λ is an eigenvalueof σ. If s is the smallest positive integer such that (σ − λι)s(α) = 0, then β = (σ −λι)s−1(α) �= 0 is an eigenvector of σ corresponding to the eigenvalue λ.

Definition 8.2.2 Let σ ∈ L(V, V ). Suppose α is a generalized eigenvector of σ corre-sponding to eigenvalue λ. If s denotes the smallest positive integer such that(σ − λι)s(α) = 0, then the ordered set

Z(α;σ, λ) = {(σ − λι)s−1(α), (σ − λι)s−2(α), . . . , (σ − λι)(α), α} = (Z(α;λ) for short)

is called a cycle of generalized eigenvectors of σ corresponding to λ. The elements(σ−λι)s−1(α) and α are called the initial vector and the end vector of the cycle, respec-tively. Also we say that length of the cycle is s.

Let λ be an eigenvalue of σ ∈ L(V, V ) and α be a generalized eigenvector corre-sponding to λ. Then each element of Z(α;λ) is a generalized eigenvector of σ. Let

K (λ) = {α ∈ V | α is a generalized eigenvector corresponding to λ} ∪ {0}.

Then Z(α;λ) ⊆ K (λ). In the following we shall consider the set K (λ). Under somecondition we try to partition it into a collection of mutually disjoint cycles of generalizedeigenvectors.

Theorem 8.2.3 Let λ be an eigenvalue of σ ∈ L(V, V ). The set K (λ) is a subspace ofV containing the eigenspace E (λ) and is invariant under σ.

Proof: Clearly E (λ) ⊆ K (λ). Suppose α, β ∈ K (λ) and c ∈ F. There are positiveintegers s and t such that (σ−λι)s(α) = 0 and (σ−λι)t(β) = 0. Then (σ−λι)r(cα+β) =

Page 43: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 159

0, where r = max{s, t}. Hence K (λ) is a subspace of V . For each α ∈ K (λ), there isa positive integer s such that (σ − λι)s(α) = 0. Then

(σ − λι)s(σ(α)) = [(σ − λι)s ◦ σ](α) = [σ ◦ (σ − λι)s](α)

= σ((σ − λι)s(α)) = 0.

Therefore, σ(α) ∈ K (λ) and hence K (λ) is invariant under σ. �

Definition 8.2.4 The subspace K (λ) is called the generalized eigenspace correspondingto λ.

Theorem 8.2.5 Let the ordered set {α1, . . . , αk} be a cycle of generalized eigenvectorsof σ corresponding to eigenvalue λ. Then the initial vector α1 is an eigenvector and αi

is not an eigenvector of σ for each i ≥ 2. Also {α1, . . . , αk} is linearly independent.

Proof: We have only to prove that {α1, . . . , αk} is linearly independent. To do this, weprove by induction on k, the length of the cycle.

If k = 1, then the theorem is trivial.

Assume k ≥ 2 and that cycles of length less than k are linearly independent.

Let {α1, . . . , αk} be a cycle of generalized eigenvectors of σ corresponding to λ.

Suppose a1, . . . , ak are scalars such thatk∑

i=1aiαi = 0. By the definition of cycle, we have

αi = (σ − λι)k−i(αk) for 1 ≤ i ≤ k. Now

0 = (σ − λι)

(k∑

i=1

aiαi

)=

k∑i=1

ai(σ − λι) ◦ (σ − λι)k−i(αk)

=

k∑i=2

ai(σ − λι)k−i+1(αk) =

k∑i=2

aiαi−1.

This is a linear relation among the ordered set Z = {α1, . . . , αk−1}. Clearly Z isa cycle of generalized eigenvectors of σ corresponding to eigenvalue λ of length k − 1.Thus by induction hypothesis, we have ai = 0 for 2 ≤ i ≤ k. This shows that a1 = 0 aswell, so {α1, . . . , αk} is linearly independent. �

Let σ ∈ L(V, V ). Suppose that there is a basis B of V such that

J = [σ]B =

⎛⎜⎜⎜⎜⎝J1 O · · · O

O J2 · · · O...

.... . .

...

O O · · · Jp

⎞⎟⎟⎟⎟⎠

Page 44: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

160 Diagonal Form and Jordan Form

for some p ≥ 1, where Ji is an mi ×mi matrix of the form (λ)1×1 when mi = 1 or⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

λ 1 0 · · · 0 0

0 λ 1 · · · 0 0...

. . .. . .

. . ....

...

0. . .

. . .. . . 1 0

0 0. . .

. . . λ 1

0 0 0 · · · 0 λ

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠when mi ≥ 2,

for some eigenvalue λ of σ.Such a matrix Ji is called a Jordan block corresponding to λ, and the matrix J is

called a Jordan canonical form of σ or Jordan form for short. B is called a Jordancanonical basis corresponding to σ or Jordan basis corresponding to σ for short.

Example 8.2.6 Let

J =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

3 1 0

0 3 1

0 0 3

0

0

0

0 0

0 0

0 0

0 0

0 0

0 0

0 0 0 2 0 0 0 0

0 0 0

0 0 0

0

0

2 1

0 2

0 0

0 0

0 0 0

0 0 0

0

0

0 0

0 0

0 1

0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠.

Then J is a Jordan form. Let σ ∈ L(C8,C8) defined by σ(X) = JX for X ∈ C8. Then[σ]S = J , where S is the standard basis of C8 over C.

It is clear that CJ(x) = Cσ(x) = (3−x)3(2−x)3x2. One can check that the minimumpolynomial of σ is (x− 3)3(x− 2)2x2. By Theorem 5.3.10 σ is not diagonalizable. Alsowe observe that of the vectors e1, e2, . . . , e8 only e1, e4, e5, e7 are eigenvectors of σ.

Consider η = σ − 3ι. Then

η(e3) = (σ − 3ι)(e3) = σ(e3)− 3e3 = 3e3 + e2 − 3e3 = e2;

η2(e3) = η(e2) = e1.

So Z(e3; 3) = {e1, e2, e3}. Similarly Z(e4; 2) = {e4}, Z(e6; 2) = {e5, e6} and Z(e8; 0) ={e7, e8}. �

From the above example, we see that a Jordan basis of V corresponding to σ is adisjoint union of cycles of generalized eigenvectors. The converse is clearly true. It istrue in general. Now we state and prove this theorem.

Theorem 8.2.7 Let B be a basis of V and let σ ∈ L(V, V ). Then B is a Jordan basisfor V corresponding to σ if and only if B is a disjoint union of cycles of generalizedeigenvectors of σ.

Page 45: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 161

Proof: Suppose [σ]B = J =

⎛⎜⎜⎜⎜⎜⎝J1 O · · · O...

. . .. . .

...

O. . .

. . . O

O O · · · Jp

⎞⎟⎟⎟⎟⎟⎠, where Ji’s are Jordan blocks,

1 ≤ i ≤ p. Suppose Ji is an mi × mi matrix. Then n =p∑

i=1mi. We partition B

into p classes B1, . . . ,Bp corresponding to the sizes of the Jordan blocks, i.e., B1 con-sists of the first m1 vectors of B, B2 consists the next m2 vectors of B, and so on.Let

Bi = {β1, . . . , βmi} and Ji =

⎛⎜⎜⎜⎜⎜⎝λ 1 · · · 0

0 λ. . .

......

. . .. . . 1

0 0 · · · λ

⎞⎟⎟⎟⎟⎟⎠ .

Then by the definition of [σ]B,

σ(βmi) = λβmi + βmi−1;σ(βmi−1) = λβmi−1 + βmi−2;

...

σ(β2) = λβ2 + β1;

σ(β1) = λβ1.

If we let η = σ − λι, then

η(βmi) = βmi−1, η2(βmi) = βmi−2, . . . , η

mi−1(βmi) = β1, ηmi(βmi) = 0.

Then Bi = Z(βmi ;λ). Therefore, B is a disjoint union of cycles of generalized eigen-vectors of σ.

The “if” part is trivial. �

Lemma 8.2.8 Let σ ∈ L(V, V ) and let λ1, λ2, . . . , λk be distinct eigenvalues of σ. Foreach i, 1 ≤ i ≤ k, let αi ∈ K (λi). If α1 + α2 + · · ·+ αk = 0, then αi = 0 for all i.

Proof: We prove the lemma by induction on k. It is trivial for k = 1. Assume that thelemma holds for any k − 1 distinct eigenvalues, where k ≥ 2.

Suppose αi ∈ K (λi) for 1 ≤ i ≤ k satisfying α1+α2+ · · ·+αk = 0. Suppose α1 �= 0.For each i, let si be the smallest nonnegative integer for which (σ−λiι)

si(αi) = 0. Sinceα1 �= 0, s1 ≥ 1. Let β = (σ−λ1ι)

s1−1(α1). Then β is an eigenvector of σ correspondingto λ1. Let g(x) = (x − λ2)

s2 · · · (x − λk)sk . Then g(σ)(αi) = 0 for all i > 1. By the

result of Exercise 8.1-1,

g(σ)(β) = g(λ1)β = (λ1 − λ2)s2 · · · (λ1 − λk)

skβ �= 0.

Page 46: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

162 Diagonal Form and Jordan Form

On the other hand,

g(σ)(β) = g(σ)((σ − λ1ι)

s1−1(α1))= [g(σ) ◦ (σ − λ1ι)

s1−1](α1)

= −(σ − λ1ι)s1−1(g(σ)(α2 + · · ·+ αk)

)= 0.

This is a contradiction. So α1 must be 0 and then α2 + · · · + αk = 0. By induction,αi = 0 for all i, 2 ≤ i ≤ k. The lemma follows. �

Corollary 8.2.9 The sum K (λ1) + K (λ2) + · · ·+ K (λk) is direct.

Proof: This corollary follows from Proposition 6.5.14 and Lemma 8.2.8. �

Lemma 8.2.10 Let σ ∈ L(V, V ) with distinct eigenvalues λ1, λ2, . . . , λk. For each i =1, 2, . . . , k, let Si be a linearly independent subset of K (λi). Then S1, S2, . . . , Sk aremutually disjoint and S = S1 ∪ S2 ∪ · · · ∪ Sk is a linearly independent subset of V .

Proof: Suppose α ∈ Si ∩ Sj for some i �= j. Let β = −α. Then α ∈ K (λi) andβ ∈ K (λj) with α + β = 0. By Lemma 8.2.8 we have α = 0. This contradicts to thefact that α lies in a linearly independent set. Thus Si ∩ Sj = ∅ for i �= j.

Suppose Si = {αi,1, . . . , αi,mi}. Then S = {αi,j | 1 ≤ j ≤ mi, 1 ≤ i ≤ k}. Supposek∑

i=1

mi∑j=1

ai,jαi,j = 0. Let βi =mi∑j=1

ai,jαi,j . Then βi ∈ K (λi) for each i and β1+ · · ·+βk =

0. By Lemma 8.2.8 βi = 0 for all i. Since Si is linearly independent for each i, ai,j = 0for all j. Therefore, S is linearly independent. �

Is K (λ1) ⊕ K (λ2) ⊕ · · · ⊕ K (λk) = V for some k? The answer is not affirma-tive. We have to impose some condition on the characteristic polynomial of the lineartransformation σ.

Theorem 8.2.11 Let σ ∈ L(V, V ). For each i with 1 ≤ i ≤ q let Zi be a cycle ofgeneralized eigenvectors of σ corresponding to the same λ with initial vector βi. If

{β1, . . . , βq} is linearly independent, then Zi’s are mutually disjoint and Z =q⋃

i=1Zi is

linearly independent.

Proof: Suppose α ∈ Zi ∩ Zj . There are two nonnegative integers s, t such that(σ − λι)s(α) = βi and (σ − λι)t(α) = βj . Suppose s ≤ t. Then βj = (σ − λι)s(α) ∈ Zj .Since βi and βj are eigenvectors of σ and βj is the only eigenvector in Zj , βi = βj andhence i = j. Thus Zi ∩ Zj = ∅ if i �= j.

We prove the linearly independence of Z by induction on |Z| = n. If n = 1, thenthis is trivial. Assume that for n ≥ 2, the result is true for |Z| < n.

Now suppose |Z| = n. Let W = span(Z) and let η = σ − λι. Clearly W isinvariant under η and dimW ≤ n. Let φ = η

∣∣W

: W → W . For each i let Z ′i denotethe cycle obtained from Zi by deleting the end vector. Then Z ′i = φ(Zi) \ {0}. Let

Z ′ =q⋃

i=1Z ′i. Then span(Z ′) = φ(W ). Since |Z ′| = n − q < n and the set of initial

vectors of Z ′i’s are the same as those of Zi’s. By induction Z ′ is linearly independent.Then rank(φ) = dim(φ(W )) = n− q.

Page 47: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 163

Since φ(βi) = η(βi) = 0, βi ∈ ker(φ). Since {β1, . . . , βq} is linearly independent,dim(ker(φ)) = nullity(φ) ≥ q. By Theorem 7.1.17,

n ≥ dimW = rank(φ) + nullity(φ) ≥ n− q + q = n.

So dimW = n and Z is a basis of W , and hence Z is linearly independent. �

Lemma 8.2.12 Let σ ∈ L(V, V ) and let W be an invariant subspace of V under σ.Then the characteristic polynomial of σ

∣∣W

divides the characteristic polynomial of σ.

Proof: Let A = {α1 . . . , αk} be a basis of W . Extend A to a basis B of V . Let

A = [σ]B and B =[σ∣∣W

]A. Then it is clear that A =

(B C

O D

)for some matrices C

and D. Then by Exercise 4.3-6 we have

CA(x) = det(A− xI) = det

(B − xI C

O D − xI

)= CB(x)CD(x).

Thus the lemma follows. �

Theorem 8.2.13 Let V be an n-dimensional vector space over F and let σ ∈ L(V, V ).If the characteristic polynomial Cσ(x) splits over F, then there is a Jordan basis forV corresponding to σ; i.e., there is a basis for V that is a disjoint union of cycles ofgeneralized eigenvectors of σ.

Proof: We prove the theorem by induction on n. Clearly, the theorem is true forn = 1. Assume that the theorem is true for any vector space of dimension less thann > 1. Suppose dimV = n. Since Cσ(x) splits over F, there is an eigenvalue λ1 of σ.Let r = rank(σ − λ1ι). Since λ1 is an eigenvalue, U = (σ − λ1ι)(V ) is an r-dimensionalinvariant subspace of V under σ and r < n. Let φ = σ

∣∣U. By Lemma 8.2.12 we have

Cφ(x)|Cσ(x) and hence Cφ(x) splits over F. By induction, there is a Jordan basis A forU corresponding to φ as well as to σ.

We want to extend A to be a Jordan basis for V . Suppose σ has k distinct eigenvaluesλ1, . . . , λk. For each j let Sj consist of the generalized eigenvectors in A correspondingto λj . Since A is a basis, Sj is linearly independent that is a disjoint union of cycles ofgeneralized eigenvectors of λj . Let Z1 = Z(α1;λ1), Z2 = Z(α2;λ1), . . . , Zp = Z(αp;λ1)be the disjoint cycles whose union is S1, p ≥ 1. For each i, since Zi ⊆ (σ − λ1ι)(V ),there is a βi ∈ V such that (σ − λ1ι)(βi) = αi. Then Z ′i = Zi ∪ {βi} = Z(βi;λ1) is acycle of generalized eigenvectors of σ corresponding λ1 with end vector βi.

Let γi be the initial vector of Zi for each i. Then {γ1, . . . , γp} is a linearly independentsubset of ker(σ− λ1ι), and this set can be extended to a basis {γ1, . . . , γp, . . . , γn−r} forker(σ − λ1ι). If p < n− r, then let Z ′j = {γj} for p < j ≤ n− r. Then Z ′1, . . . , Z ′n−r is acollection of disjoint cycles of generalized eigenvectors corresponding to λ1.

Let S′1 =n−r⋃i=1

Z ′i. Since the initial vectors of these cycles form a linearly independent

set, by Theorem 8.2.11 S′1 is linearly independent. Then B = S′1∪S2 · · ·∪Sk is obtained

Page 48: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

164 Diagonal Form and Jordan Form

from A by adjoining n − r vectors. By Theorem 8.2.7, B is a Jordan basis for Vcorresponding to σ. �

In the proof of Theorem 8.2.13 the cycles Z ′i’s were constructed so that the set of theirinitial vectors {γ1, . . . , γn−r} is a basis of ker(σ− λ1ι) = E (λ1). Then, in the context ofthe construction of the proof of Theorem 8.2.13, the number of cycles corresponding toλ equals dimE (λ). These relations are true for any Jordan basis. We do not want toprove it in this textbook∗.

Following we investigate the connection between the generalized eigenspaces K (λ)and the characteristic polynomial Cσ(x) of σ.

Theorem 8.2.14 Let V be an n-dimensional vector space over F. Let σ ∈ L(V, V ) besuch that Cσ(x) splits over F. Suppose λ1, . . . , λk are the distinct eigenvalues of σ withalgebraic multiplicities m1, . . . ,mk, respectively. Then

(a) dimK (λi) = mi for all i, 1 ≤ i ≤ k.

(b) For each i, if Ai is a basis of K (λi), then A = A1 ∪ · · · ∪Ak is a basis of V .

(c) If B is a Jordan basis of V corresponding to σ, then for each i, Bi = B ∩K (λi)is a basis of K (λi).

(d) K (λi) = ker((σ − λiι)mi) for all i.

(e) σ is diagonalizable if and only if E (λi) = K (λi) for all i.

Proof: We shall prove (a), (b) and (c) simultaneously. Let J = [σ]B be a Jordan form,and let ri = dimK (λi) for 1 ≤ i ≤ k.

For each i, the vectors in Bi are in one-to-one correspondence with the columns of Jthat contain λi as the diagonal entry. Since Cσ(x) = CJ(x) and J is an upper triangularmatrix, the number of occurrence of λi on the diagonal is mi. Therefore, |Bi| = mi.Since Bi is a linearly independent subset of K (λi), mi ≤ ri for all i.

Since Ai’s are linearly independent and mutually disjoint, by Lemma 8.2.10 A is

linearly independent. So we have n =k∑

i=1mi ≤

k∑i=1

ri ≤ n. Hence we have mi = ri for

all i andk∑

i=1ri = n. Then |A | = n and A is a basis of V . Since mi = ri, each Bi is a

basis of K (λi). Therefore, (a), (b) and (c) hold.It is clear that ker((σ − λiι)

mi) ⊆ K (λi). Suppose α ∈ K (λi). By Theorem 8.2.5,Z(α;λi) is linearly independent. Since dimK (λi) = mi, by (c) the length of Z(α;λi)cannot be greater than mi. Thus (σ − λiι)

mi(α) = 0, i.e., α ∈ ker((σ − λiι)mi). So (d)

is proved.σ is diagonalizable if and only if there is a basis C of V consisting of eigenvectors of

σ. By (c) K (λi) has a basis Ci = C ∩K (λi) consists of eigenvectors of σ correspondingto λi. Then mi = |Ci| ≤ dimE (λi). Since E (λi) ⊆ K (λi), they are equal. Conversely,if E (λi) = K (λi) for all i, then by (a) dimE (λi) = mi for all i. By (b), V has a basisconsisting of eigenvectors of σ. So σ is diagonalizable. Hence (e) is proved. �

∗The interested reader may refer to the book: S.H. Friedberg, A.J. Insel and L.E. Spence LinearAlgebra, 2nd editor, Prentice-Hall, 1989.

Page 49: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 165

Corollary 8.2.15 Let V be a vector space over F. Let σ ∈ L(V, V ) be such that Cσ(x)splits over F. Suppose λ1, . . . , λk are the distinct eigenvalues of σ. Then

V = K (λ1)⊕ · · · ⊕K (λk).

By the fundamental theorem of algebra, we have the following corollaries.

Corollary 8.2.16 Let V be a vector space over C. Let σ ∈ L(V, V ). Then there isJordan basis of V corresponding to σ.

Corollary 8.2.17 Suppose A ∈Mn(C). Then there is an invertible matrix P ∈Mn(C)such that P−1AP is in the Jordan form.

Suppose V has a Jordan basis corresponding to σ. Let K (λ) be one of the generalizedeigenspace of V . By Theorem 8.2.14 (c) and Theorem 8.2.7 K (λ) has a basis B(λ) whichis a disjoint union of cycles of generalized eigenvectors of σ. Suppose B(λ) = Z1∪· · ·∪Zk

and the length of the cycle Zi is li. Without loss of generality, we may assume l1 ≥· · · ≥ lk.

It can be shown that the sequence l1, . . . , lk is uniquely determined by σ. Theinterested reader may refer to the book written by S.H Friedberg et al. Thus under thisordering of the basis of K (λ), the Jordan form of σ is uniquely determined.

Exercise 8.2

8.2-1. Let σ ∈ L(V, V ), where V is a finite dimensional vector space over F. Prove that

(a) ker(σ) ⊆ ker(σ2) ⊆ · · · ⊆ ker(σk) ⊆ ker(σk+1) ⊆ · · · .(b) If rank(σm) = rank(σm+1) for some m ∈ N, then rank(σk) = rank(σm) for

k ≥ m.

(c) If rank(σm) = rank(σm+1) for some m ∈ N, then ker(σk) = ker(σm) fork ≥ m.

(d) Let λ be an eigenvalue of σ. Prove that if rank((σ−λι)m) = rank((σ−λι)m+1)for some m ∈ N, then K (λ) = ker((σ − λι)m).

(e) (Third Test for Diagonalizability) Suppose Cσ(x) splits over F. Supposeλ1, . . . , λk are the distinct eigenvalues of σ. Then σ is diagonalizable if andonly if rank(σ − λiι) = rank((σ − λiι)

2) for all i.

8.2-2. Let Z = Z(α;σ, λ) be a cycle of generalized eigenvectors of σ ∈ L(V, V ). Provethat span(Z) is an invariant subspace under σ.

§8.3 Algorithms for Finding Jordan Basis

Jordan form is very useful in solving differential equation. So we provide two algo-rithms for finding the Jordan form of a given linear transformation or a square matrix.

Page 50: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

166 Diagonal Form and Jordan Form

Since for a given square matrix A we can define a linear transformation σ such that Arepresents σ. Conversely, for a given linear transformation σ, it can be represented bya square matrix. Thus in the following we only talk about finding an invertible P suchthat P−1AP is in the Jordan form, where A ∈ Mn(F) whose characteristic polynomialsplits over F. To find P is equivalent to find a Jordan basis of Fn. So in this section, weassume the characteristic polynomial of any matrix splits over F.

Recurrent Algorithm:

Let A ∈ Mn(F) with λ as an eigenvalue of algebraic multiplicity m. Suppose{X1, . . . , Xk} is a cycle of generalized eigenvectors corresponding to λ. Then

(A− λI)X1 = 0

(A− λI)X2 = X1

...

(A− λI)Xk = Xk−1

(8.1)

Thus we have to choose X1 properly so that the second equation of Equation (8.1) hasa solution X2. Furthermore, when Xi is chosen, the equation (A − λI)X = Xi has asolution Xi+1 for i = 1, . . . , k − 1. We proceed as follows:

Step 1: Form the augmented matrix(A− λI|B)

where B =(y1, · · · , yn

)T.

Step 2: Use elementary row operations to reduce(A−λI|B)

to the row reduced echelonform.

Step 3: If nullity(A − λI) = m, then we get m linearly independent eigenvectors im-mediately by putting B = 0 and then go to Step 2 for another unconsideredeigenvalue, else go to Step 4.

Step 4: Assume that nullity(A−λI) = ν < m. Then after Step 2, the augmented matrix(A− λI|B)

is row-equivalent to⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

Hf1...

fn−ν

Ofn−ν+1

...

fn

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠,

where H is in rref and f1, . . . , fn are linear functions of y1, . . . , yn. Since ν < m,some eigenvector will generate a cycle of length larger than 1. Thus to find an ini-tial vectorX1 of the cycle, we cannot simply putB = 0. We have to take the con-ditionsfn−ν+1 = 0, . . . , fn = 0 into consideration. We may use the coefficients of

Page 51: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 167

y1, . . . , yn in the linear equations fn−ν+1 = 0, . . . , fn = 0 to form a matrix K.Then reduce the matrix

⎛⎜⎜⎜⎜⎜⎝ Hf1...

fn−ν

K O

⎞⎟⎟⎟⎟⎟⎠ to

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

H ′f ′1...

f ′r

Of ′r+1...

f ′n

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠.

From this rref, we may choose the right eigenvector X1 by putting y1 = · · · =yn = 0 if the length of the cycle is 2. If the length of the cycle is larger than 2,then this procedure can be continued.

Step 5: After X1 =(x1 · · · xn

)Thas been chosen properly, we may put B = X1 into

the linear functions f1(y1, . . . , yn), . . . , fn−ν(y1, . . . , yn) and compute X2 from⎛⎜⎜⎝ Hf1(y1, . . . , yn)

...

fn−ν(y1, . . . , yn)

⎞⎟⎟⎠ .

If thisX2 will generateX3, then we have to take the conditions f ′r+1(y1, . . . , yn) =0, . . . , f ′n(y1, . . . , yn) = 0 into consideration.

Step 6: This algorithm terminates if there is no new linear relation among y1, . . . , ynrequired. Thus, we shall be able to find a cycle of generalized eigenvectorscorresponding to λ.

Step 7: If ν > 1, then we can choose another eigenvector X ′1 independent of X1 and go

to Step 4.

Step 8: After we have done with λ, we go to Step 2 for another unconsidered eigenvalue.

Step 9: Finally we obtain a Jordan basis of Fn and hence an invertible matrix P suchthat P−1AP is in the Jordan form.

We use the following examples to illustrate the algorithm. We assume all the matricesconsidered in the following examples are over Q, R or C.

Example 8.3.1 Let A =

⎛⎜⎝ 5 −3 −28 −5 −4

−4 3 3

⎞⎟⎠. Then C(x) = −(x− 1)3.

Page 52: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

168 Diagonal Form and Jordan Form

Steps 1-3: Reduce the augmented matrix

(A− I|B)

=

⎛⎜⎝ 4 −3 −2 y1

8 −6 −4 y2

−4 3 2 y3

⎞⎟⎠

⎛⎜⎝ 4 −3 −2 y1

0 0 0 −2y1+y2

0 0 0 y1 +y3

⎞⎟⎠ . (8.2)

In practice, we do not need to reduce to the rref. Thus we have two cycles.(In this stage, we know that one of length 1 and the other length 2.)

Step 4: In order to find the cycle of length 2, we have to choose x1, x2, x3 such that⎧⎪⎨⎪⎩4x1 − 3x2 − 2x3 = 0

−2x1 + x2 = 0

x1 + x3 = 0

.

To do this, we form the matrix

⎛⎜⎝ 4 −3 −2−2 1 0

1 0 1

⎞⎟⎠ and reduce to

⎛⎜⎝ 1 0 1

0 1 2

0 0 0

⎞⎟⎠.

Hence choose X1 =

⎛⎜⎝ 1

2

−1

⎞⎟⎠.

Step 5: Then put y1 = 1, y2 = 2, y3 = −1. Hence X2 can be chosen from⎛⎜⎝ 4 −3 −2 1

0 0 0 0

0 0 0 0

⎞⎟⎠. We choose X2 =

⎛⎜⎝ 0

−11

⎞⎟⎠.

Step 7: To find the other linearly independent eigenvector, we go back to Equa-tion (8.2). By putting y1 = y2 = y3 = 0 we obtain x1 = 1, x2 = 0, x3 = 2.

That is X3 =

⎛⎜⎝ 1

0

2

⎞⎟⎠ which is linearly independent of X1.

Step 8: Therefore, P =

⎛⎜⎝ 1 0 1

2 −1 0

−1 1 2

⎞⎟⎠ and J = P−1AP =

⎛⎜⎝ 1 1 0

0 1 0

0 0 1

⎞⎟⎠. �

Example 8.3.2 Let A =

⎛⎜⎜⎜⎜⎝3 1 0 0

−4 −1 0 0

7 1 2 1

−7 −6 −1 0

⎞⎟⎟⎟⎟⎠. Then C(x) = (x− 1)4.

Page 53: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 169

Steps 1-3: Reduce the matrix

(A− I|B)

=

⎛⎜⎜⎜⎜⎝2 1 0 0

−4 −2 0 0

7 1 1 1

−7 −6 −1 −1

⎞⎟⎟⎟⎟⎠

⎛⎜⎜⎜⎜⎝1 0 0 0 1

2y1 +110(y3 + y4)

0 1 0 0 −15(y3 + y4)

0 0 1 1 −72y1 +

12y3 − 1

2y4

0 0 0 0 2y1 + y2

⎞⎟⎟⎟⎟⎠ . (8.3)

Thus, there is only one linearly independent eigenvector and hence there is acycle of length 4. To achieve this, we have to assume 2x1+x2 = 0. Since thereis only one linearly independent eigenvector, it is not necessary to continuethe reducing process for finding the condition of the eigenvector. That is,the condition 2x1 + x2 = 0 will hold until the final step. So we only put

y1 = y2 = y3 = 0 = y4 = 0 in Equation (8.3) to get X1 =

⎛⎜⎜⎜⎜⎝0

0

1

−1

⎞⎟⎟⎟⎟⎠.

Step 5: Put y1 = 0, y2 = 0, y3 = 1, y4 = −1 into Equation (8.3) we obtain X2 easily

from

⎛⎜⎜⎜⎜⎝1 0 0 0 0

0 1 0 0 0

0 0 1 1 1

0 0 0 0 0

⎞⎟⎟⎟⎟⎠, namely, X2 =

⎛⎜⎜⎜⎜⎝0

0

1

0

⎞⎟⎟⎟⎟⎠.

Step 5: X3 may be chosen from

⎛⎜⎜⎜⎜⎝1 0 0 0 1

10

0 1 0 0 −15

0 0 1 1 12

0 0 0 0 0

⎞⎟⎟⎟⎟⎠ to be X3 =

⎛⎜⎜⎜⎜⎝110

−15

012

⎞⎟⎟⎟⎟⎠. Finally,

X4 may be chosen from

⎛⎜⎜⎜⎜⎝1 0 0 0 1

10

0 1 0 0 − 110

0 0 1 1 −35

0 0 0 0 0

⎞⎟⎟⎟⎟⎠ to be X4 =

⎛⎜⎜⎜⎜⎝110

− 110

−35

0

⎞⎟⎟⎟⎟⎠.

Step 9: Thus P =

⎛⎜⎜⎜⎜⎝0 0 1

10110

0 0 −15 − 1

10

1 1 0 −35

−1 0 12 0

⎞⎟⎟⎟⎟⎠ and J =

⎛⎜⎜⎜⎜⎝1 1 0 0

0 1 1 0

0 0 1 1

0 0 0 1

⎞⎟⎟⎟⎟⎠.

Page 54: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

170 Diagonal Form and Jordan Form

Example 8.3.3 Let A =

⎛⎜⎜⎜⎜⎜⎜⎝1 0 −1 1 0

−4 1 −3 2 1

−2 −1 0 1 1

−3 −1 −3 4 1

−8 −2 −7 5 4

⎞⎟⎟⎟⎟⎟⎟⎠. Then C(x) = −(x− 2)5.

Step 1: Consider(A− 2I|B)

=

⎛⎜⎜⎜⎜⎜⎜⎝−1 0 −1 1 0 y1

−4 −1 −3 2 1 y2

−2 −1 −2 1 1 y3

−3 −1 −3 2 1 y4

−8 −2 −7 5 2 y5

⎞⎟⎟⎟⎟⎟⎟⎠.

Step 4: Reduce it to the form⎛⎜⎜⎜⎜⎜⎜⎝1 0 0 0 0 y1−y2+y3

0 1 0 1 −1 2y1 −y30 0 1 −1 0 −2y1+y2−y30 0 0 0 0 −2y1−y2−y3 +y5

0 0 0 0 0 −y1 −y3+y4

⎞⎟⎟⎟⎟⎟⎟⎠ . (8.4)

There are only two linearly independent eigenvectors and hence two cycles ofgeneralized eigenvectors. In order to get generalized eigenvectors we have tointroduce two conditions{

−2x1−x2−x3 +x5 = 0

−x1 −x3+x4 = 0

Step 5: Apply elementary row operation to reduce⎛⎜⎜⎜⎜⎜⎜⎝1 0 0 0 0 y1−y2+y3

0 1 0 1 −1 2y1 −y30 0 1 −1 0 −2y1+y2−y3

−2 −1 −1 0 1 0

−1 0 −1 1 0 0

⎞⎟⎟⎟⎟⎟⎟⎠ to

⎛⎜⎜⎜⎜⎜⎜⎝1 0 0 0 0 y1−y2+y3

0 1 0 1 −1 2y1 −y30 0 1 −1 0 −2y1+y2−y30 0 0 0 0 2y1−y20 0 0 0 0 y1

⎞⎟⎟⎟⎟⎟⎟⎠ . (8.5)

Thus there are two cycles of length greater than 1. (In this case, we know thatone is of length 3 and the other is of length 2.)

Page 55: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 171

Step 5: Thus apply elementary row operations again to reduce⎛⎜⎜⎜⎜⎜⎜⎝1 0 0 0 0 y1−y2+y3

0 1 0 1 −1 2y1 −y30 0 1 −1 0 −2y1+y2−y32 −1 0 0 0 0

1 0 0 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎠ to

⎛⎜⎜⎜⎜⎜⎜⎝1 0 0 0 0 y1−y2+y3

0 1 0 1 −1 0

0 0 1 −1 0 −2y1+y2−y30 0 0 1 −1 2y1 −y30 0 0 0 0 −y1+y2−y3

⎞⎟⎟⎟⎟⎟⎟⎠ . (8.6)

Here we can see again that there is only one cycle of length greater than 2. Ifwe put the condition −x1 + x2 − x3 = 0 into consideration, then we reduce theaugmented matrix⎛⎜⎜⎜⎜⎜⎜⎝

1 0 0 0 0 y1−y2+y3

0 1 0 1 −1 0

0 0 1 −1 0 −2y1+y2−y30 0 0 1 −1 2y1 −y3

−1 1 −1 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎠ to

⎛⎜⎜⎜⎜⎜⎜⎝I5

∣∣∣∣∣∣∣∣∣∣∣∣

y1−y2 +y3

0

0

2y1−y2 −y3−y2+2y3

⎞⎟⎟⎟⎟⎟⎟⎠There is no other linear relation among y1, . . . , y5 required. So there is no cycleof length greater than 3.

Step 6: We find a cycle of length 3 first. To find X1, we put yi = 0 for i = 1, . . . , 5 in

Equation (8.6) to get X1 =(0 0 1 1 1

)T.

Then put y1 = 0, y2 = 0, y3 = y4 = y5 = 1 in Equation (8.5) to get X2 =(1 − 1 − 1 0 0

)T. Similarly, put the data of X2 into Equation (8.4) we get

X3 =(1 3 − 2 0 0

)T.

Step 7: We try to get another cycle. Since this cycle will be of length 2. So we start

from Equation (8.5). Put yi = 0 for all i we obtain X4 =(0 1 0 0 1

)Twhich is

linearly independent of X1. Thus by putting this data of X4 into Equation (8.4)

we obtain X5 =(− 1 0 1 0 0

)T.

Step 9: Hence P =

⎛⎜⎜⎜⎜⎜⎜⎝0 1 1 0 −10 −1 3 1 0

1 −1 −2 0 1

1 0 0 0 0

1 0 0 1 0

⎞⎟⎟⎟⎟⎟⎟⎠ and J =

⎛⎜⎜⎜⎜⎜⎜⎝2 1 0 0 0

0 2 1 0 0

0 0 2 0 0

0 0 0 2 1

0 0 0 0 2

⎞⎟⎟⎟⎟⎟⎟⎠. �

Page 56: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

172 Diagonal Form and Jordan Form

Example 8.3.4 Let A =

⎛⎜⎜⎜⎜⎜⎜⎝5 −1 −3 2 −50 2 0 0 0

1 0 1 1 −20 −1 0 3 1

1 −1 −1 1 1

⎞⎟⎟⎟⎟⎟⎟⎠. Then C(x) = −(x− 2)3(x− 3)2.

Step 1: Consider(A− 2I|B)

=

⎛⎜⎜⎜⎜⎜⎜⎝3 −1 −3 2 −5 y1

0 0 0 0 0 y2

1 0 −1 1 −2 y3

0 −1 0 1 1 y4

1 −1 −1 1 −1 y5

⎞⎟⎟⎟⎟⎟⎟⎠.

Step 4: We reduce it to the form⎛⎜⎜⎜⎜⎜⎜⎝1 0 −1 0 −2 −y4 +y5

0 1 0 0 −1 y3 −y50 0 0 1 0 y3+y4 −y50 0 0 0 0 y1 −y3+y4−2y50 0 0 0 0 y2

⎞⎟⎟⎟⎟⎟⎟⎠ (8.7)

There are two linearly independent eigenvectors, hence there are two cycles,one of length 1 and the other of length 2. In order to find a proper eigenvector,we have to take the conditions x1 − x3 + x4 − 2x5 = 0 and x2 = 0 intoconsideration. Thus we form the matrix⎛⎜⎜⎜⎜⎜⎜⎝

1 0 −1 0 −2 −y4+y5

0 1 0 0 −1 y3 −y50 0 0 1 0 y3+y4−y51 0 −1 1 −2 0

0 1 0 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎠ and reduce it to

⎛⎜⎜⎜⎜⎜⎜⎝1 0 −1 0 0 −2y3−y4 +y5

0 1 0 0 0 0

0 0 0 1 0 y3+y4 −y50 0 0 0 1 −y3 +y5

0 0 0 0 0 −y3 +2y5

⎞⎟⎟⎟⎟⎟⎟⎠ .

Thus we choose X1 =(1 0 1 0 0

)T.

Step 5: Now put y1 = y3 = 1 and y2 = y4 = y5 = 0 in Equation (8.7). Then we

get X2 =(0 1 0 1 0

)T. For the other cycle, we put yi = 0 for all i in

Equation (8.7) and get X3 =(2 1 0 0 1

)T.

Page 57: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 173

Steps 1-4: For λ = 3, we again reduce(A− 3I|B)

to⎛⎜⎜⎜⎜⎜⎜⎝1 0 0 1 0 −4y2−y3+2y4+2y5

0 1 0 0 0 −y20 0 1 0 0 −y2−y3 +y5

0 0 0 0 1 −y2 +y4

0 0 0 0 0 y1 −y2−y3 +y4 −y5

⎞⎟⎟⎟⎟⎟⎟⎠ .

There is only one linearly independent eigenvector, hence only one cycle of

length 2. Choose X4 =( − 1 0 0 1 0

)T. It is clear that it satisfies the

condition x1 − x2 − x3 + x4 − x5 = 0. So we can compute X5.

Step 5: By the data of X4 and the above matrix, we solve that X5 =(2 0 0 0 1

)T.

Step 9: Thus P =

⎛⎜⎜⎜⎜⎜⎜⎝1 0 2 −1 2

0 1 1 0 0

1 0 0 0 0

0 1 0 1 0

0 0 1 0 1

⎞⎟⎟⎟⎟⎟⎟⎠ and hence J =

⎛⎜⎜⎜⎜⎜⎜⎝2 1

0 2

0

0

0 0

0 0

0 0 2 0 0

0 0

0 0

0

0

3 1

0 3

⎞⎟⎟⎟⎟⎟⎟⎠ . �

Example 8.3.5 Let

A =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

2 1 0 0 1 1 0

0 2 6 0 −27 9 0

0 −2 −10 1 50 −18 0

0 0 −4 2 18 −6 0

0 0 −4 0 20 −6 0

0 2 −4 −1 22 −4 0

0 2 1 −1 0 1 2

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠.

Then C(x) = −(x−2)7. After applying some elementary row operations on [A−2I | Y ]we have

A =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

0 1 0 0 0 2 0 y1 −3y2−2y3 +y4 −2y70 0 1 0 0 −3 0 18y2+9y3+2y4 +9y7

0 0 0 1 0 0 0 2y1+10y2+5y3 +y4 +4y7

0 0 0 0 1 −1 0 3y2+2y3 −y4 +2y7

0 0 0 0 0 0 0 2y2 +3y4

0 0 0 0 0 0 0 2y2 +y3 −y4 +y6

0 0 0 0 0 0 0 −y4+y5

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠. (8.8)

Since the nullity of A− 2I is 3, there are 3 cycles of generalized eigenvectors.

Page 58: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

174 Diagonal Form and Jordan Form

For cycle of length greater than 1, we have to take the last three linear relations intoconsideration. After taking some elementary row operation we have

A =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

0 1 0 0 0 0 0 −3y1−17y2 −8y3−3y4 −2y70 0 1 0 0 0 0 6y1+39y2+18y3+8y4+15y7

0 0 0 1 0 0 0 2y1+10y2 +5y3 +y4 +4y7

0 0 0 0 1 0 0 2y1+10y2 +5y3 +y4 +4y7

0 0 0 0 0 1 0 2y1 +7y2 +3y3+2y4

0 0 0 0 0 0 0 4y2 +y3+3y4

0 0 0 0 0 0 0 −2y2 −3y4 −y7

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠. (8.9)

Thus, there are only two cycles of length greater than 1. For cycle of length greater than2, we take the last two linear relations into consideration. After taking some elementaryrow operations we have

A =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

0 1 0 0 0 0 0 −3y1−17y2 −8y3−3y4 −2y70 0 1 0 0 0 0 6y1+39y2+18y3+8y4+15y7

0 0 0 1 0 0 0 2y1+10y2 +5y3 +y4 +4y7

0 0 0 0 1 0 0 2y1+10y2 +5y3 +y4 +4y7

0 0 0 0 0 1 0 2y1 +7y2 +3y3+2y4

0 0 0 0 0 0 1 4y2 +y3+3y4

0 0 0 0 0 0 0 −y2 −y3 +y4 −3y7

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠. (8.10)

Thus, there is only one cycle of length greater than 2. Hence, there is precisely one cycleof length 1, one of length 2 and the other of length 4.

To get a cycle of length 4, put y1 = · · · = y7 = 0 into Equation (8.10) and getan initial vector X1 = (1 0 0 0 0 0 0)T . Now put y1 = 1, y2 = · · · = y7 = 0 intoEquation (8.10) and get X2 = (0 − 3 6 2 2 2 0)T . Putting this data into Equation (8.9)and get X3 = (0 − 3 7 2 2 1 0)T . Substitute the data into Equation (8.8) and getX4 = ( −3 1 3 7 3 0 0)T .

To get a cycle of length 2, put yi = 0 into Equation (8.9) and get an initial vectorX5 = (0 0 0 0 0 0 1)T . Put y1 = · · · = y6 = 0 and y7 = 1 into Equation (8.8) and getX6 = (0 − 2 9 4 2 0 0)T .

Finally, it is easy to see from Equation (8.8) that the cycle of length 1 consists ofthe vector X7 = (0 − 2 3 0 1 1 0)T . �

Null Space Algorithm:

Let σ ∈ L(V, V ). Suppose V has a Jordan basis B corresponding to σ. Let λ bean eigenvalue of σ. Suppose k = nullity(σ − λι), the geometric multiplicity of λ. Weknow that there are k cycles of generalized eigenvectors of σ corresponding to λ. FromExercise 8.2-1 (d) we know that if rank((σ − λι)j) = rank((σ − λι)j+1) for some j ∈ N,then K (λ) = ker((σ − λι)j). How can we determine the length of each of the cycles?How do we find the initial vector of each cycle?

Page 59: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 175

Let J = [σ]B. Suppose that J1, . . . , Jt are the Jordan blocks. Without loss ofgenerality, we may assume that the first k Jordan blocks whose diagonal entries areλ. Let mi be the size of Ji for 1 ≤ i ≤ t. Without loss of generality, we may assumem1 ≥ · · · ≥ mk. For convenience, we let η = σ− λι, N = J − λI and Ni = Ji− λImi for1 ≤ i ≤ t. Also we denote N by diag{N1, . . . , Nk, . . . , Nt}. Then it is easy to see thatNmi

i = O but Nmi−1i �= O for 1 ≤ i ≤ k. Since Nj is invertible for j ≥ k + 1, so is N s

j

for s ≥ 0. Then dimK (λ) = nullity(Nm1).Let Z(α1), . . . , Z(αk) be the cycles of generalized eigenvectors of σ corresponding to

λ. In order to determine these cycles we have to find all the αi’s. If mi = m1, thenηm1−1αi �= 0 but ηm1αi = 0. Hence αi ∈ ker(ηm1)\ker(ηm1−1). So we can determine theend vectors of the cycles whose length are m1. Let mh+1 be the second large numberof the sequence m1, · · · ,mk. The rank of ηm1−2 must be greater than rank(ηm1−1)by at least h. If rank(ηm1−2) > rank(ηm1−1) + h or equivalently nullity(ηm1−2) <nullity(ηm1−1)+h, then there are some end vectors of the cycles whose length are mh+1.Then we can determine all such end vectors. If rank(ηm1−2) = rank(ηm1−1) + h, thenconsider the rank of ηm1−3. This process can be continued until all the end vectors havebeen found.

Following is the algorithm. Let A ∈ Mn(F) be such that CA(x) splits over F. Letm = dimK (λ), i.e., the algebraic multiplicity of λ.

Step 1: Let λ be an eigenvalue of A. Let N = A − λI. Starting with i = 1, computeN i and ni = nullity(N i) until ni+1 = ni. Let m be the least positive integersuch that nm+1 = nm. Let Ni = null(N i) for 1 ≤ i ≤ m and let n0 = 0. Definedi = ni − ni−1 for 1 ≤ i ≤ m. Set the banks Bi = ∅ for 1 ≤ i ≤ m. Set acounter k = m.

Step 2: Find a basis Am for Nm. Find dm linearly independent vectors fromNm−1(Am).Let them be β1, . . . , βdm . Trace back these vectors to the basis Am and denotethem by α1, . . . , αdm accordingly. Note that βj = Nm−1(αj) for 1 ≤ i ≤ dm.Put N i(αj) into Bm−i for 1 ≤ j ≤ dm and 0 ≤ i ≤ m− 1.

Step 3: Subtract the counter k by 1. If k = 0, then go to Step 1 for another unconsideredeigenvalue until all eigenvalues have been considered. If dk − dk+1 = 0 thenrepeat this step, otherwise go to Step 4.

Step 4: Extend the setk⋃

i=1Bi to a basis Ak for Nk. Find the maximal dk linearly inde-

pendent subset from Nk−1(Ak) containing the vectors of B1 = {β1, . . . , βdk+1}.

Let the vectors ink⋃

i=1Bi but not in B1 be denoted by βdk+1+1, . . . , βdk . Trace

back these dk − dk+1 vectors to Ak. Let the corresponding vectors be αdk+1+1,. . . , αdk . Put N

i(αj) into Bk−i for dk+1 + 1 ≤ j ≤ dk and 0 ≤ i ≤ k− 1. Go toStep 3.

Page 60: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

176 Diagonal Form and Jordan Form

Note that by the choice of B1, it guarantees that B1 is linearly independent. Sinced1 = dimE (λ), B1 is a basis of E (λ). Each βj is the initial vector of Z(αj ;λ). Sinced1⋃j=1

Z(αj ;λ) =m⋃i=1

Bi and the construction of Bi, we have

∣∣∣∣∣ d1⋃j=1

Z(αj ;λ)

∣∣∣∣∣ = m∑i=1|Bi| =

m∑i=1

di = nm = dimK (λ). Thusd1⋃j=1

Z(αj ;λ) is a basis of K (λ).

We use the following examples to illustrate the algorithm. We assume all the matricesconsidered in the following examples are over F, where F is Q, R or C.

Example 8.3.6 We consider Example 8.3.2 again.

Step 1: N = A − I =

⎛⎜⎜⎜⎜⎝2 1 0 0

−4 −2 0 0

7 1 1 1

−7 −6 −1 −1

⎞⎟⎟⎟⎟⎠. rref(N) =

⎛⎜⎜⎜⎜⎝1 0 0 0

0 1 0 0

0 0 1 1

0 0 0 0

⎞⎟⎟⎟⎟⎠. Then

n1 = 1.

N2 =

⎛⎜⎜⎜⎜⎝0 0 0 0

0 0 0 0

10 0 0 0

10 10 0 0

⎞⎟⎟⎟⎟⎠. Then n2 = 2.

N3 =

⎛⎜⎜⎜⎜⎝0 0 0 0

0 0 0 0

20 10 0 0

−20 −10 0 0

⎞⎟⎟⎟⎟⎠. Then n3 = 3. and N4 = O. Hence n4 = 4 = n5.

So m = 4 and d4 = d3 = d2 = d1 = 1.

Step 2: Since N4 = F4. We let A4 be the standard basis. Since d4 = 1 and N3e2 �= 0,we choose α1 = e2 (since N3e1 �= 0, we may also choose e1).

Step 3: Since d4, d3, d2, d1 = 1 are the same, no other step is needed.

Then Z(e2) = {(0 0 10 − 10)T , (0 0 0 10)T , (1 − 2 1 − 6)T , (0 1 0 0)T } and

P =

⎛⎜⎜⎜⎜⎝0 0 1 0

0 0 −2 1

10 0 1 0

−10 10 −6 0

⎞⎟⎟⎟⎟⎠ .

Then P−1AP is in the Jordan form as in Example 8.3.2. �

Page 61: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 177

Example 8.3.7 We consider Example 8.3.3 again.

Step 1: N = A− 2I =

⎛⎜⎜⎜⎜⎜⎜⎝−1 0 −1 1 0

−4 −1 −3 2 1

−2 −1 −2 1 1

−3 −1 −3 2 1

−8 −2 −7 5 2

⎞⎟⎟⎟⎟⎟⎟⎠. After taking row operations we have

rref(N) =

⎛⎜⎜⎜⎜⎜⎜⎝1 0 0 0 0

0 1 0 1 −10 0 1 −1 0

0 0 0 0 0

0 0 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎠ . Thus n1 = 2.

N2 =

⎛⎜⎜⎜⎜⎜⎜⎝0 0 0 0 0

0 0 0 0 0

−1 0 −1 1 0

−1 0 −1 1 0

−1 0 −1 1 0

⎞⎟⎟⎟⎟⎟⎟⎠ and rref(N2) =

⎛⎜⎜⎜⎜⎜⎜⎝1 0 1 −1 0

0 0 0 0 0

0 0 0 0 0

0 0 0 0 0

0 0 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎠. Thus

n2 = 4.

N3 = O. Thus n3 = 5 = n4. So m = 3 and d3 = 1, d2 = 2, d1 = 2.

Step 2: Since N3 = F5, we let A3 be the standard basis.N2e1 = (0 0 − 1 − 1 − 1)T = N2e3 = −N2e4 and N2e2 = N2e5 = 0. So wemay choose any one vectors from e1, e3 and e4. Now we choose α1 = e1. ThenB3 = {e1}, B2 = {Ne1 = (−1 − 4 − 2 − 3 − 8)T } and B1 = {N2e1 = β1 =(0 0 − 1 − 1 − 1)T }.

Step 3: k = 2, d2 − d3 = 1. There is one new vector.

Step 4: Now {e5, e2, γ1 = (−1 0 1 0 0)T , γ2 = (1 0 0 1 0)T } is a basis of N2.Consider {Ne1, N

2e1, e5, e2, γ1, γ2}. By casting-out method we obtain A2 ={Ne1, N

2e1, e5, e2} as a basis of N2. N(A2) = {N2e1,0, Ne5 = (0 1 1 1 2)T ,Ne2 = (0 − 1 − 1 − 1 − 2)T }. Then {N2e1, Ne5 = (0 1 1 1 2)T } is a linearlyindependent subset of N(A2). Thus we choose α2 = e5 and hence β2 = Ne5.

B2 = {Ne1, e5} and B1 = {β1, β2}. (In practice, since we have already found5 linearly independent vectors in K (2), we stop here.)

Step 3: k = 1, d1 − d2 = 0. Repeat Step 3, Since k = 0, stop.

So

Z(α1) = {(0 0 − 1 − 1 − 1)T , (−1 − 4 − 2 − 3 − 8)T , (1 0 0 0 0)T },Z(α2) = {(0 1 1 1 2)T , (0 0 0 0 1)T }.

Page 62: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

178 Diagonal Form and Jordan Form

Thus

P =

⎛⎜⎜⎜⎜⎜⎜⎝0 −1 1 0 0

0 −4 0 1 0

−1 −2 0 1 0

−1 −3 0 1 0

−1 −8 0 2 1

⎞⎟⎟⎟⎟⎟⎟⎠ ,

and P−1AP is in the Jordan form as in Example 8.3.3. �

Example 8.3.8 We consider Example 8.3.4 again.

Consider the case λ = 2.

Step 1: Then N = A− 2I =

⎛⎜⎜⎜⎜⎜⎜⎝3 −1 −3 2 −50 0 0 0 0

1 0 −1 1 −20 −1 0 1 1

1 −1 −1 1 −1

⎞⎟⎟⎟⎟⎟⎟⎠. From Example 8.3.4 we know

that n1 = 2.

N2 =

⎛⎜⎜⎜⎜⎜⎜⎝1 0 −1 0 −20 0 0 0 0

0 0 0 0 0

1 −2 −1 2 0

1 −1 −1 1 −1

⎞⎟⎟⎟⎟⎟⎟⎠ and

rref(N2) =

⎛⎜⎜⎜⎜⎜⎜⎝1 0 −1 0 −20 1 0 −1 −10 0 0 0 0

0 0 0 0 0

0 0 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎠ . (8.11)

So n2 = 3.

N3 =

⎛⎜⎜⎜⎜⎜⎜⎝1 0 −1 0 −10 0 0 0 0

0 0 0 0 0

2 −3 −2 3 −11 −1 −1 1 −1

⎞⎟⎟⎟⎟⎟⎟⎠. We can compute that n3 = 3. Then m = 3,

d2 = 1 and d1 = 2.

Step 2: By Equation (8.11) we obtainA2 = {(1 0 1 0 0)T , (0 1 0 1 0)T , (2 1 0 0 1)T }.N(A2) = {0, (1 0 1 0 0)T , 0}. Then we choose α1 = (0 1 0 1 0)T . Henceβ1 = (1 0 1 0 0)T . B2 = {α1} and B1 = {β1}.

Page 63: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 179

Step 3-4: Now k = 1. From Equation (8.7) we obtain that {(1 0 1 0 0)T , (2 1 0 0 1)T }is a basis of N . So A1 = {(1 0 1 0 0)T , (2 1 0 0 1)T }. Thenα2 = (2 1 0 0 1)T = β2. And B1 = {β1, β2}.

Then

Z(α1; 2) ={(1 0 1 0 0)T , (0 1 0 1 0)T }Z(α2; 2) ={(2 1 0 0 1)T }

Consider the case λ = 3.

Step 1: N = A − 3I =

⎛⎜⎜⎜⎜⎜⎜⎝2 −1 −3 2 −50 −1 0 0 0

1 0 −2 1 −20 −1 0 0 1

1 −1 −1 1 −2

⎞⎟⎟⎟⎟⎟⎟⎠. By Example 8.3.4 we know that

n1 = 1.

N2 =

⎛⎜⎜⎜⎜⎜⎜⎝−4 2 5 −4 8

0 1 0 0 0

−2 0 3 −2 4

1 0 −1 1 −2−1 1 1 −1 2

⎞⎟⎟⎟⎟⎟⎟⎠ and rref(N2) =

⎛⎜⎜⎜⎜⎜⎜⎝1 0 0 1 −20 1 0 0 0

0 0 1 0 0

0 0 0 0 0

0 0 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎠. So

n2 = 2.

N3 =

⎛⎜⎜⎜⎜⎜⎜⎝5 −2 −6 5 −100 −1 0 0 0

3 0 −4 3 −6−1 0 1 −1 2

1 −1 −1 1 −2

⎞⎟⎟⎟⎟⎟⎟⎠ and n3 = 2. So m = 2. (In fact, we need

not compute N3. It is because that the algebraic multiplicity of λ = 3 is 2.Then the maximum length of any cycle is 2.)

Step 2: A2 = {(1 0 0 − 1 0)T , (2 0 0 0 1)T }. Since N(A2) = {0, (−1 0 0 1 0)T }. Wechoose α1 = (2 0 0 0 1)T and then β1 = (−1 0 0 1 0)T .

Hence we have

Z(α1; 3) = {(−1 0 0 1 0)T , (2 0 0 0 1)T }.Finally, we have a Jordan basis

B = {(1 0 1 0 0)T , (0 1 0 1 0)T , (2 1 0 0 1)T , (−1 0 0 1 0)T , (2 0 0 0 1)T }

of F5 corresponding to a linear transformation induced by A. Coincidentally, it is thesame as what we have found in Example 8.3.4. �

Page 64: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

180 Diagonal Form and Jordan Form

Example 8.3.9 Consider Example 8.3.5 again. let N = A− 2I. We compute ni first.Since

rref(N) =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

0 1 0 0 0 2 0

0 0 1 0 0 −3 0

0 0 0 1 0 0 0

0 0 0 0 1 −1 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠rref(N2) =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

0 1 0 −12 2 0 0

0 0 1 0 −92

32 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

rref(N3) =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

0 1 0 −12 2 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

0 0 0 0 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠and N4 = O. Thus n1 = 3, n2 = 5, n3 = 6, n4 = 7 = n5 and hence m = 4, d4 = 1,d3 = 1, d2 = 2, d1 = 3.

Let A4 be the standard basis of F7 = N4. Since N3e4 = e1, we choose α1 = e4 andput it into B4. Put Ne4 = (0 0 1 0 0 − 1 − 1)T into B3, N

2e4 = (−1 − 3 6 2 2 2 0)T

into B2 and β1 = N3e4 = e1 into B1.From rref(N2) we have that

{e1, e7, (0, 1, 0, 2, 0, 0, 0), (0,−4, 9, 0, 2, 0, 0), (0, 0,−3, 0, 0, 2, 0)} is a basis of N2 (notethat, for convenience we write the column vectors as 7-tuples). Then we have

A2 = {e1, (−1,−3, 6, 2, 2, 2, 0), e7, (0, 1, 0, 2, 0, 0, 0), (0,−4, 9, 0, 2, 0, 0)}.

Then N(A2) = {0, e1, 0, e1, (−2, 0, 0, 0, 0, 0, 1)}. So we chooseα2 = (0,−4, 9, 0, 2, 0, 0). Hence β2 = (−2, 0, 0, 0, 0, 0, 1). ThenB2 = {(−1,−3, 6, 2, 2, 2, 0), (0,−4, 9, 0, 2, 0, 0)}, B1 = {e1, (−2, 0, 0, 0, 0, 0, 1)}.

From rref(N) we have that {e1, e7, (0,−2, 3, 0, 1, 1, 0)} is a basis of N1. ThenA1 = {e1, (−2, 0, 0, 0, 0, 0, 1), (0,−2, 3, 0, 1, 1, 0)}. Hence α3 = β3 = (0,−2, 3, 0, 1, 1, 0)and B1 = {e1, (−2, 0, 0, 0, 0, 0, 1), (0,−2, 3, 0, 1, 1, 0)}.

Therefore,

Z(α1) ={e1, (−1,−3, 6, 2, 2, 2, 0), (0, 0, 1, 0, 0,−1,−1), e4},Z(α2) ={(−2, 0, 0, 0, 0, 0, 1), (0,−4, 9, 0, 2, 0, 0)},Z(α3) ={(0,−2, 3, 0, 1, 1, 0)}. �

Page 65: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Diagonal Form and Jordan Form 181

Exercise 8.3

8.3-1. Let A =

⎛⎜⎜⎜⎜⎝1 −3 0 3

−2 −6 0 13

0 −3 1 3

−1 −4 0 8

⎞⎟⎟⎟⎟⎠ over C. Find a invertible matrix P such that

P−1AP is in Jordan form.

8.3-2. Put the matrix A =

⎛⎜⎜⎜⎜⎝−3 −1 1 −79 −3 −7 −10 0 4 −80 0 2 −4

⎞⎟⎟⎟⎟⎠ ∈M4(C) into Jordan form.

8.3-3. Put the matrix A =

⎛⎜⎜⎜⎜⎝−1 1 0 0

−5 3 1 0

3 0 −1 1

11 −3 −3 3

⎞⎟⎟⎟⎟⎠ ∈M4(Q) into Jordan form.

8.3-4. Let A =

⎛⎜⎜⎜⎜⎝1 0 0 0

0 1 0 0

5 −1 −1 1

10 −2 −4 3

⎞⎟⎟⎟⎟⎠ ∈M4(R). Find an invertible matrix P such that

P−1AP is in Jordan form.

8.3-5. Let A =

⎛⎜⎜⎜⎜⎝−2 1 0 0

−4 2 0 0

3 0 −2 1

12 −3 −4 2

⎞⎟⎟⎟⎟⎠ ∈ M4(R). Find an invertible matrix P such

that P−1AP is in Jordan form.

Page 66: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Chapter 9

Linear and Quadratic Forms

For real problem we always need to find a numerical data or value of a function. To findan approximate value of a given function, the usual method is linear approximation. Ifwe need to have a more precise value, the higher order approximation is used. The usualhigher approximation is quadratic approximation. In mathematical language, supposethere is an n-variable function f : Rn → R. For a fixed point c ∈ Rn, we know the exactvalue f(c) but do not know the value of x that near the point c . So we need to find anapproximate value of f(x) for x close to c. The usual linear approximation is

f(x) ∼= f(c) +∇f(c) · (x− c).

The quadratic approximation is

f(x) ∼= f(c) +∇f(c) · (x− c) + (x− c)TD2f(c)(x− c),

where D2f(c) =( ∂2f∂xi∂xj

(c))is the second derivative matrix of f at c. If ∇f(c) = 0,

then the positivity of the matrix D2f(c) determines the property of the extrema.The functions ∇f(c) · (x − c) and (x − c)TD2f(c)(x − c) are a linear form and a

quadratic form, respectively. In this chapter we shall discuss linear forms and quadraticforms.

§9.1 Linear Forms

In this section we study linear forms first. Let V and W be vectors spaces over F.We know that L(V,W ) is a vector space over F. In particular if we choose W = F, thenL(V,F) is a vector space. We shall use V to denote L(V,F).

Definition 9.1.1 Let V be a vector space over F. A linear transformation of V into Fis called a linear form or linear functional on V . The space V = L(V,F) is called thedual space of V .

Let V and W be n and m dimensional vectors spaces over F. From Theorems 7.2.4and 7.2.6 we know that L(V,W ) is isomorphic to Mm,n(F). Let Ei,j be the matrix

182

Page 67: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 183

defined in Section 1.5 for 1 ≤ i ≤ m and 1 ≤ j ≤ n. Then it is easy to see that

{Ei,j | 1 ≤ i ≤ m, 1 ≤ j ≤ n}

is a basis of Mm,n(F). This means that dimL(V,W ) = dimMm,n(F) = mn. So combin-ing with Theorem 7.1.8 we have the following theorem.

Theorem 9.1.2 Let V be a vector space (not necessary finite dimensional) over F.Then V is a vector space over F. If dimV = n, then dim V = n.

From the above discussion we have a basis for Mm,n(F). Use the proof of Theo-rem 7.2.4 we should have a corresponding basis for L(V,W ). Let A = {α1, . . . , αn}and B = {β1, . . . , βm} be bases of V and W , respectively. From the proof of Theo-rem 7.2.4 and 7.2.6 we know that the corresponding is Λ : σ �→ [σ]AB . So the basis forL(V,W ) should be the linear transformations whose images under Λ are the matricesEi,j . Precisely, we define σi,j : V → F by

σi,j(αj) = βi and σi,j(αk) = 0 for k �= j.

Hence {σi,j | 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for L(V,W ) and [σi,j ]AB = Ei,j .

For the linear functional case, we choose B = {1} as the basis of F. Then thecorresponding basis of V is φj : V → F which is defined by φj(αj) = 1 and φj(αk) = 0for k �= j. In other word,

φj(αk) = δjk · 1 = δjk, for 1 ≤ j ≤ n.

Then [φj ]AB = ej , where {e1, . . . , en} is the standard basis of F1×n. The linear form φj

is called the j-th coordinate function.

Definition 9.1.3 The basis {φ1, . . . , φn} constructed above is called the basis dual toA or the dual basis of A . They are characterized by the relations φi(αj) = δij for all1 ≤ i, j ≤ n.

Let A = {α1, . . . , αn} be a basis of V and let Φ = {φ1, . . . , φn} be the dual basis of

A . Suppose φ ∈ V and α ∈ V . There are bi, xj ∈ F, 1 ≤ i, j ≤ n, such that φ =n∑

i=1biφi

and α =n∑

j=1xjαj . Then

φ(α) =

(n∑

i=1

biφi

)⎛⎝ n∑j=1

xjαj

⎞⎠ =

n∑i=1

n∑j=1

bixjφi(αj) =

n∑i=1

n∑j=1

bixjδij =

n∑i=1

bixi.

Thus if we use the matrix B =(b1 · · · bn

)Tto represent φ and the matrix X =(

x1 · · · xn

)Tto represent α, i.e., [φ]Φ = B and [α]A = X, then

φ(α) = BTX. (9.1)

Page 68: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

184 Linear and Quadratic Forms

Note that if we view φ as a vector in V , then [φ]Φ = B is the coordinate vector of φrelative to the basis Φ. However if we view φ as a linear transformation from V to F,then [φ]AB = BT (which is a row vector). So [φ(α)]B = [φ]AB [α]A = BTX. This agreeswith (9.1) whose right hand side is the product of two matrices.

Suppose A = {α1, . . . , αn} is a basis of V . Let {φ1, . . . , φn} be the dual basis of A .

Let α ∈ V . Then α =n∑

j=1ajαj for some aj ∈ F. Then φi(α) =

n∑j=1

ajφi(αj) =n∑

j=1ajδij =

ai. So

α =

n∑j=1

φj(α)αj . (9.2)

Suppose φ ∈ V . Then φ =n∑

i=1biφi for some bi ∈ F. Then

φ(αj) =n∑

i=1biφi(αj) =

n∑i=1

biδij = bj . So φ =n∑

i=1φ(αi)φi.

Example 9.1.4 Let t1, . . . , tn be any distinct real numbers and c1, . . . , cn be any realnumbers. Find a polynomial f ∈ R[x] such that f(ti) = ci for all i and deg f < n.Solution: Let V = Pn(R). For p ∈ V , define φi(p) = p(ti) for i = 1, . . . , n. Thenclearly φ ∈ V .

Supposen∑

i=1aiφi = O, then

(n∑

i=1aiφi

)(p) = 0 for all p ∈ V . In particular, for p = xj

with 0 ≤ j ≤ n− 1, we haven∑

i=1ait

ji = 0 for 0 ≤ j ≤ n− 1. Since the matrix

⎛⎜⎜⎜⎜⎝1 1 · · · 1

t1 t2 · · · tn...

... · · · ...

tn−11 tn−12 · · · tn−1n

⎞⎟⎟⎟⎟⎠is non-singular (see Example 4.3.5), we have c1 = · · · = cn = 0. Thus {φ1, . . . , φn} is abasis of V .

Since φi(xj) = tji �= δij in general, {φ1, . . . , φn} is not a basis dual to the standard

basis {1, x, . . . , xn−1}. Let

p1(x) =(x− t2)(x− t3) · · · (x− tn)

(t1 − t2)(t1 − t3) · · · (t1 − tn)

...

pi(x) =(x− t1) · · · (x− ti−1)(x− ti+1) · · · (x− tn)

(ti − t1) · · · (ti − ti−1)(t1 − ti+1) · · · (ti − tn)

...

pn(x) =(x− t1)(x− t2) · · · (x− tn−1)(tn − t1)(tn − t2) · · · (tn − tn−1)

.

Page 69: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 185

It is easy to see that {p1, . . . , pn} is linearly independent and hence a basis of V , and{φ1, . . . , φn} is its dual basis.

Then by Equation 9.2 the required polynomial is

f =n∑

j=1

cjpj .

This is called the Lagrange interpolation. Note that the above method works in any fieldthat contains at least n elements. �

Exercise 9.1

9.1-1. Find the dual basis of the basis {(1, 0, 1), (−1, 1, 1), (0, 1, 1)} of R3.

9.1-2. Let V be a vector space of finite dimension n ≥ 2 over F. Let α and β be twovectors in V such that {α, β} is linearly independent. Show that there exists alinear form φ such that φ(α) = 1 and φ(β) = 0.

9.1-3. Show that if α is a nonzero vector in a finite dimensional vector space, then thereis a linear form φ such that φ(α) �= 0.

9.1-4. LetW be a proper subspace of a finite dimensional vector space V . Let α ∈ V \W .Show that there is a linear form φ such that φ(α) = 1 and φ(β) = 0 for all β ∈W .

9.1-5. Let α, β be vectors in a finite dimensional vector space V such that wheneverφ ∈ V , φ(β) = 0 implies φ(α) = 0. Show that α is a multiple of β.

9.1-6. Find a polynomial f(x) ∈ P5(R) such that f(0) = 2, f(1) = −1, f(2) = 0,f(−1) = 4 and f(4) = 3.

§9.2 The Dual Space

Let V be a vector space. For a fixed vector α ∈ V we define a function J(α) on the

space V by J(α)(φ) = φ(α) for all φ ∈ V . It is clear that J(α) is linear. So J(α) ∈ V ,

the dual space of V .

Theorem 9.2.1 Let V be a finite dimensional vector space over F and letV be the

space of all linear forms on V . Let J : V → V be defined by J(α)(φ) = φ(α) for all

α ∈ V and φ ∈ V . Then J is an isomorphism between V andV and is sometimes

referred as the natural isomorphism.

Proof: Let α, β ∈ V , a ∈ F and φ ∈ V . Then

J(aα+ β)(φ) = φ(aα+ β) = aφ(α) + φ(β) = aJ(α)(φ) + J(β)(φ) = [aJ(α) + J(β)](φ).

Page 70: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

186 Linear and Quadratic Forms

Since φ ∈ V is chosen arbitrarily, we have J(aα+ β) = aJ(α) + J(β). So J is linear.Suppose {α1, . . . , αn} is a basis of V and {φ1, . . . , φn} its dual basis. Let α ∈ V ,

α =n∑

j=1ajαj for some aj ∈ F. If J(α) = O, then J(α)(φ) = 0 ∀φ ∈ V . In particular,

J(α)(φi) = 0 ∀i. That is, 0 = φ(α) = ai ∀i. Hence α = 0 and J is a monomorphism.

Since dimV = dim V = dimV = n, By Theorem 7.1.19 J is an isomorphism between V

andV . �

Corollary 9.2.2 Let V be a finite dimensional vector space and let V be its dual space.Then every basis of V is the dual basis of some basis of V .

Proof: Let {φ1, . . . , φn} be a basis of V . By Theorem 9.1.2 we can find its dual

basis {Φ1, . . . , Φn} inV . By means of the isomorphism J defined above we can find

α1, . . . , αn ∈ V such that J(αi) = Φi for i = 1, . . . , n. Then {α1, . . . , αn} is the requiredbasis. This is because φj(αi) = J(αi)(φj) = Φi(φj) = δij . �

By the above theorem, we can identify V withV , and consider V as the space of all

linear forms on V . Thus V and V are called dual spaces. A basis {α1, . . . , αn} of V anda basis {φ1, . . . , φn} of V are called dual bases if φi(αj) = δij for all i, j.

Suppose A and B are bases of a finite dimensional vector space V and Φ and Ψare their dual bases, respectively. It is known that there is a linear relation between thebases A and B. Namely, there is a matrix of transition P from A to B. Similarlythere is a matrix of transition Q from Φ to Ψ. What is the relation between P and Q?

Let A = {α1, . . . , αn}, B = {β1, . . . , βn}, Φ = {φ1, . . . , φn} and Ψ = {ψ1, . . . , ψn}.Suppose P = (pij) and Q = (qij). Then

βj =n∑

i=1

pijαi and ψk =n∑

r=1

qrkφr, 1 ≤ j, k ≤ n.

Now observe that ψk(αj) =n∑

r=1qrkφr(αj) =

n∑r=1

qrkδrj = qjk. Then

δkj = ψk(βj) = ψk

(n∑

i=1

pijαi

)=

n∑i=1

pijψk(αi) =

n∑i=1

pijqik =

n∑i=1

(QT )k,i(P )i,j .

Thus QTP = I and hence QT = P−1 or Q = (P−1)T = (P T )−1. Hence P T = Q−1

is the matrix of transition from Ψ to Φ. Thus φ =n∑

j=1pijψj .

Let B =(b1 · · · bn

)Tand B′ =

(b′1 · · · b′n

)Tbe the coordinate vectors of the

linear form φ with respect to the bases Φ and Ψ, respectively. That is, [φ]Φ = B and[φ]Ψ = B′. Then

n∑j=1

b′jψj = φ =

n∑i=1

biφi =

n∑i=1

bi

⎛⎝ n∑j=1

pijψj

⎞⎠ =

n∑i=1

⎛⎝ n∑j=1

bipij

⎞⎠ψj .

Page 71: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 187

Thus B′T = BTP or B′ = P TB = Q−1B. That is, [φ]Ψ = Q−1[φ]Φ. This result canbe obtained from Equation (9.1) directly.

Example 9.2.3 Let V = R3 and β1 = (1, 0,−1), β2 = (−1, 1, 0) and β3 = (0, 1, 1).Then clearly B = {β1, β2, β3} is a basis of V . From Equation (9.1) we know thatφ(x, y, z) = b1x+ b2y + b3z for some bi ∈ R. To find the dual basis Ψ = {ψ1, ψ2, ψ3} ofB we can solve bi by considering the equations ψi(βj) = δi,j for all 1 ≤ i, j ≤ 3.

Now we would like to use the result above to find the dual basis of B. Let Sbe the standard basis of V . Let Φ = {φ1, φ2, φ3} be the dual basis of S . Then

[φ1]Φ =

⎛⎜⎝1

0

0

⎞⎟⎠, [φ2]Φ =

⎛⎜⎝0

1

0

⎞⎟⎠ and [φ3]Φ =

⎛⎜⎝0

0

1

⎞⎟⎠. That is, φ1(x, y, z) = x, φ2(x, y, z) = y

and φ3(x, y, z) = z. The transition matrix P from S to B is

⎛⎜⎝ 1 −1 0

0 1 1

−1 0 1

⎞⎟⎠. Then

Q = (P T )−1 = 12

⎛⎜⎝ 1 −1 1

1 1 1

−1 −1 1

⎞⎟⎠. Thus, the required dual basis will consist of

[ψ1]Φ =

⎛⎜⎝1212

−12

⎞⎟⎠ , [ψ2]Φ =

⎛⎜⎝−1212

−12

⎞⎟⎠ , [ψ3]Φ =

⎛⎜⎝121212

⎞⎟⎠ .

That is, ψ1 =12φ1+

12φ2− 1

2φ3, etc. Equivalently, ψ1(x, y, z) =12x+

12y− 1

2z, etc. �

In Chapter 2, we were asked to find the null space of a matrix. The equivalent ques-tion was asked in Chapter 7. It asked us to find the kernel of a given linear transforma-tion. In this chapter, we are concerned on linear form. Let V be a finite dimensionalspace. We may be asked to find the solution space of φ(α) = 0 for a given linear formφ. On the other hand, α can be viewed as a linear form of V . So we may also be askedto find all the solutions φ ∈ V such that φ(α) = 0.

Definition 9.2.4 Let V be a vector space. Let S be a subset of V (S can be empty).The set of all linear forms φ satisfying the condition that φ(α) = 0 ∀α ∈ S is called theannihilator of S and denoted by S0. Any linear form φ ∈ S0 is called an annihilator ofS.

Theorem 9.2.5 The annihilator W 0 of a subset W is a subspace of V . In addition,suppose W is a subspace of V and if dimV = n and dimW = r, then dimW 0 = n− r.

Proof: Let φ, ψ ∈ W 0 and a ∈ F. Then (aφ + ψ)(α) = aφ(α) + ψ(α) = 0 ∀α ∈ W .Thus W 0 is a subspace of V .

Suppose W is a subspace of V . Let {α1, . . . , αr} be a basis of W . Extend it to a basis{α1, . . . , αn} of V . Let {φ1, . . . , φn} be its dual basis. We shall show that φr+1, . . . , φn

span W 0 and hence form a basis of W 0.

Page 72: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

188 Linear and Quadratic Forms

Suppose α ∈ W . Then α =r∑

i=1aiαi for some ai ∈ F. If r + 1 ≤ j ≤ n, φj(α) =

r∑i=1

aiφj(αi) = 0. Thus φr+1, . . . , φn ∈W 0.

Suppose φ ∈ W 0. Then φ ∈ V and so φ =n∑

i=1biφi for some bi ∈ F. Then for

1 ≤ j ≤ r, 0 = φ(αj) = bj . Thus φ =n∑

i=r+1biφi. Thus W

0 = span{φr+1, . . . , φn}. �

Example 9.2.6 Let W = span{(1, 0,−1), (−1, 1, 0), (0, 1,−1)} be a subspace of R3.We would like to find a basis of W 0.

We first observe that dimW = 2 and W has a basis {(1, 0,−1), (−1, 1, 0)}. Extendthis basis to a basis {(1, 0,−1), (−1, 1, 0), (0, 1, 1)} of R3. By Example 9.2.3 we obtainthe dual basis {ψ1, ψ2, ψ3}. Then by the proof of Theorem 9.2.5 we see that {ψ3}is a basis of W 0. That is, ψ3(x, y, z) = 1

2(x + y + z). Note that we can also let

[φ3]Φ =(b1 b2 b3

)Tor φ(x, y, z) = b1x + b2y + b3z in W 0 and use definition to

determine b1, b2, b3. �

Since the dual space V of V is also a vector space. We can also consider the an-nihilator of a subset S of V (S can be empty). The annihilator S0 is a subspace ofV . Since we have identified

V with V , S0 is viewed as a subspace of V . Namely,

S0 = {α ∈ V | φ(α) = 0 ∀φ ∈ S}. By Theorem 9.2.5 we have

Theorem 9.2.7 Let S be a subspace of V . If dimV = n and dimS = s, thendimS0 = n− s.

Theorem 9.2.8 If W is a subset of a vector space V , then W ⊆ (W 0)0. If W is asubspace and dimV is finite, then (W 0)0 = W .

Proof: It is clear that W ⊆ (W 0)0 = {α ∈ V | φ(α) = 0 ∀φ ∈W 0}.If dimW < ∞, then dim(W 0)0 = dim V − dimW 0 = n − (n − dimW ) = dimW .

Hence (W 0)0 = W . �

Theorem 9.2.9 Let W1 and W2 be two subsets containing 0 in a finite dimensionalvector space V . Then (W1 + W2)

0 = W 01 ∩ W 0

2 . If W1 and W2 are subspaces, then(W1 ∩W2)

0 = W 01 +W 0

2 .

Proof: Suppose φ ∈ (W1 + W2)0. Then for α1 ∈ W1, since 0 ∈ W2, α1 = α1 + 0 ∈

W1 +W2. Thus φ(α1) = 0 and hence φ ∈W 01 . Similarly, φ ∈W 0

2 . Hence φ ∈W 01 ∩W 0

2

and (W1 +W2)0 ⊆W 0

1 ∩W 02 .

Conversely, suppose φ ∈ W 01 ∩ W 0

2 . Then for α ∈ W1 + W2, α = α1 + α2 withα1 ∈ W1 and α2 ∈ W2, φ(α) = φ(α1) + φ(α2) = 0. Hence φ ∈ (W1 + W2)

0 andW 0

1 ∩W 02 ⊆ (W1 +W2)

0.Suppose W1 and W2 are subspaces. Then W1∩W2 = (W 0

1 )0∩(W 0

2 )0. Since both W 0

1

and W 02 contain O in V , (W 0

1 +W 02 )

0 = (W 01 )

0 ∩ (W 02 )

0 by what we have just proved.Thus, W1 ∩W2 = (W 0

1 +W 02 )

0. By Theorem 9.2.8 we have (W1 ∩W2)0 = W 0

1 +W 02 .�

Page 73: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 189

Remark: In establishing the second equality of Theorem 9.2.9, we do need W1 and W2

to be subspaces. For, if we let V = R2, W1 = {(0, 0), (1, 0)} and W2 = {(0, 0), (−1, 0)},then W1 ∩W2 = {(0, 0)} and hence (W1 ∩W2)

0 = V . However, W 01 = W 0

2 , so W 01 +W 0

2

is of dimension one.

Exercise 9.2

9.2-1. Let V be a finite dimensional vector space. Suppose S is a proper subspace of V .Let φ ∈ V \S. Show that there exists an α ∈ V such that φ(α) = 1 and ψ(α) = 0for all ψ ∈ S.

9.2-2. Let φ, ψ ∈ V be such that φ(α) = 0 always implies ψ(α) = 0 for all α ∈ V . Showthat ψ is a multiplies of φ, i.e., there is a scalar c ∈ F such that ψ = cφ.

9.2-3. Let W = span{(1, 0,−1), (1,−1, 0), (0, 1,−1)} be a real vector subspace. FindW 0.

9.2-4. Let W1 = {(1, 2, 3, 0), (0, 1,−1, 1)} and W2 = {(1, 0, 1, 2)} be real vector sub-spaces. Find (W1 +W2)

0 and W 01 ∩W 0

2 .

9.2-5. Show that if S and T are subspaces of V such that V = S+T , then S0∩T 0 = {0}.9.2-6. Show that if S and T are subspaces of V such that V = S⊕T , then V = S0⊕T 0.

§9.3 The Dual of a Linear Transformation

For vector spaces U and V , we will study the relation of L(U, V ) and L(V , U) in thissection.

Theorem 9.3.1 Let U and V be vector spaces and σ ∈ L(U, V ). The mappingσ : V → U defined by σ(φ) = φ ◦ σ for φ ∈ V is linear.

Proof: Since σ ∈ L(U, V ) and φ ∈ V = L(V,F), φ ◦ σ : U → F is linear and henceσ(φ) = φ ◦ σ ∈ U .

For any φ, ψ ∈ V and a ∈ F, α ∈ U , we have(σ(aφ+ ψ)

)(α) =

((aφ+ ψ) ◦ σ)(α) = (aφ+ ψ)(σ(α)) = aφ(σ(α)) + ψ(σ(α))

=(aσ(φ)

)(α) +

(σ(ψ)

)(α) =

(aσ(φ) + σ(ψ)

)(α).

Thus σ(aφ+ ψ) = aσ(φ) + σ(ψ). �

Definition 9.3.2 The mapping σ defined in Theorem 9.3.1 is called the dual of σ.

Suppose σ ∈ L(U, V ) is represented by the matrix A = (aij) with respect to thebases A = {α1, . . . , αn} of U and B = {β1, . . . , βm}, respectively.

Let Φ = {φ1, . . . , φn} be the basis in U dual to A and let Ψ = {ψ1, . . . , ψm} be thebasis in V dual to B. Let G = (gij) = [σ]ΨΦ . Now consider

Page 74: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

190 Linear and Quadratic Forms

(σ(ψi)

)(αj) = (ψi ◦ σ)(αj) = ψi

(σ(αj)

)= ψi

(m∑k=1

akjβk

)

=

m∑k=1

akjψi(βk) =

m∑k=1

akjδik = aij = (A)i,j = (AT )j,i.

On the other hand, we have σ(ψi) =∑n

s=1 gsiφs and hence

(σ(ψi)

)(αj) =

(n∑

s=1

gsiφs

)(αj) =

n∑s=1

gsiφs(αj) =

n∑s=1

gsjδsj = gji = (G)j,i.

This shows the following theorem:

Theorem 9.3.3 Suppose A and B are bases of finite dimensional vector spaces U andV , respectively. Let Φ and Ψ be the dual bases of A and B, respectively. If σ ∈ L(U, V ),

then σ ∈ L(V , U) and [σ]ΨΦ =([σ]AB

)T.

Let G and A be as defined in the above theorem. If we use the m× 1 matrix X torepresent ψ (i.e., [ψ]Ψ) and the n× 1 matrix Y to represent φ (i.e., [φ]Φ = Y ), then thematrix equation for σ(ψ) = φ will have the form GX = Y . But as G = AT , we haveXTA = Y T .

The dual linear transformation σ of a linear transformation σ has many same prop-erties as what σ has. We shall show some of them below.

Theorem 9.3.4 Let σ ∈ L(U, V ). Then ker(σ) = (σ(U))0.

Proof: ψ ∈ ker(σ) if and only if σ(ψ) = O

if and only if(σ(ψ)

)(α) = 0 ∀α ∈ U

if and only if ψ(σ(α)

)= 0 ∀α ∈ U

if and only if ψ ∈ (σ(U))0.

Corollary 9.3.5 Suppose V is finite dimensional and suppose σ ∈ L(U, V ). The linearproblem σ(ξ) = β has a solution if and only if β ∈ (ker(σ))0.

Theorem 9.3.6 Let σ ∈ L(U, V ), V finite dimensional and β ∈ V . Then either thereis ξ ∈ U such that σ(ξ) = β or there is φ ∈ V such that σ(φ) = O and φ(β) = 1.

Proof: Let β ∈ V . If β ∈ σ(U), then there exists ξ ∈ U such that σ(ξ) = β. Nowsuppose not, then by Corollary 9.3.5 that β �∈ (ker(σ))0. Thus there exists ψ ∈ ker(σ)such that ψ(β) = c �= 0. Let φ = c−1ψ. Then σ(φ) = O and φ(β) = 1. �

Theorem 9.3.7 Let U and V be finite dimensional vector spaces. Suppose σ ∈ L(U, V ).Then rank(σ) = rank(σ).

Page 75: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 191

Proof: This is because

rank(σ) = dim V − nullity(σ) = dimV − dim(ker(σ))

= dim((ker(σ))0

)= dim(σ(U)) = rank(σ).

Theorem 9.3.8 Let σ ∈ L(V, V ). If W is a subspace invariant under σ, then W 0 is asubspace of V also invariant under σ.

Proof: Let φ ∈ W 0. Then for each α ∈ W ,(σ(φ)

)(α) = φ(σ(α)) = 0 since σ(α) ∈ W

and φ ∈W 0. Thus σ(φ) ∈W 0. �

Lemma 9.3.9 Let σ, τ ∈ L(U, V ), c ∈ F. Then cσ + τ = cσ + τ .

Proof: To prove the lemma, we have to prove that for each φ ∈ V , linear forms(cσ + τ)(φ) and (cσ + τ)(φ) in U are equal.

For each α ∈ U , we have((cσ + τ)(φ)

)(α) =

(φ ◦ (cσ + τ)

)(α) = φ

((cσ + τ)(α)) = φ

((cσ(α) + τ(α)

)= φ

(cσ(α)

)+ φ

(τ(α)

)= cφ

(σ(α)

)+ φ

(τ(α)

)= c(φ ◦ σ)(α) + (φ ◦ τ)(α) = c

(σ(φ)

)(α) +

(τ(φ)

)(α)

=(cσ(φ)

)(α) +

(τ(φ)

)(α) =

(cσ(φ) + τ(φ)

)(α) =

((cσ + τ)(φ)

)(α)

Thus cσ + τ = cσ + τ . �

From the above lemma, it shows that the mapping σ �→ σ is a linear transforma-tion from L(U, V ) to L(V , U). It is easy to show that this linear transformation is anisomorphism.

Theorem 9.3.10 Suppose V is a finite dimensional vector space and σ ∈ L(V, V ). Ifλ is an eigenvalue of σ, then λ is also an eigenvalue of σ.

Proof: Let λ be an eigenvalue of σ and put η = σ − λι. By Theorem 9.3.7 rank(η) =rank(η). Since η is singular, rank(η) < dimV = dim V . Thus rank(η) < dim V and η issingular. Clearly, by Lemma 9.3.9 η = σ − λι. It is clear that ι is the identity mappingon V . Therefore, λ is an eigenvalue of σ. �

This is an expected result, since we know that if λ is an eigenvalue of A then λ isalso an eigenvalue of AT .

Assume that U and V are two finite dimensional vector spaces. Let σ ∈ L(U, V ).

Then σ ∈ L(V , U). Let ˆσ be the dual of σ. Then ˆσ ∈ L(U,

V ). Let σ′ : U → V be the

mapping J−1V ◦ ˆσ ◦ JU , where JU : U → U and JV : V →

V are the isomorphisms in

Theorem 9.2.1. Now for α ∈ U , φ ∈ V , we have(ˆσ(JU (α))

)(φ) =

(JU (α)

)(σ(φ)) =

(σ(φ)

)(α) = φ(σ(α)) =

(JV (σ(α))

)(φ).

Thus ˆσ(JU (α)) = JV (σ(α)) for all α ∈ U . Hence ˆσ ◦ JU = JV ◦ σ. That is, σ =J−1V ◦ ˆσ ◦ JU = σ′. Thus, we may regard σ as the dual of σ.

Page 76: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

192 Linear and Quadratic Forms

Example 9.3.11 It is known that every functional φ on Fn is of the form φ(x1, . . . , xn) =n∑

i=1aixi for some ai ∈ F. Let us consider a simpler case. Let U = R2 and V = R3.

Suppose σ ∈ L(U, V ) defined by σ(x, y) = (x + y, x − 2y, −x). For any φ ∈ V ,φ(x, y, z) = ax+ by+ cz for some a, b, c ∈ R. Then σ(φ) is a linear functional on U . Bydefinition we have

σ(φ)(x, y) = φ(σ(x, y)) = φ(x+ y, x− 2y, −x)= a(x+ y) + b(x− 2y) + c(−x) = (a+ b− c)x+ (a− 2b)y. �

Exercise 9.3

9.3-1. Let F be a field and let φ be the linear functional on F2 defined by φ(x, y) = ax+by.For each of the following linear transformation σ, let ψ = σ(φ), find σ and ψ(x, y).

(a) σ(x, y) = (x, 0).

(b) σ(x, y) = (−y, x).(c) σ(x, y) = (x− y, x+ y).

9.3-2. Let V = R[x]. Let a and b be fixed real numbers and let φ be the linear functional

on V defined by φ(p) =

∫ b

ap(t)dt. If D is the differentiation operation on V , what

is D(φ)?

9.3-3. Let V = Mn(F) and let B ∈ V be fixed. Let σ ∈ L(V, V ) be defined by σ(A) =AB −BA. Let φ =Tr be the trace function. What is σ(φ)?

§9.4 Quadratic Forms

Let A ∈Mm,n(F). F (X,Y ) = XTAY , for X ∈ Fm and Y ∈ Fn, is a function definedon Fm × Fn. If we fix Y as a constant vector, then f(X) = F (X,Y ) is a linear formdefined on Fm. Similarly, if we fix X as a constant, then g(Y ) = F (X,Y ) is a linearfrom on Fn. Such F is called a bilinear form.

Definition 9.4.1 Let U and V be vector spaces over F. A mapping f : U × V → F issaid to be bilinear if f is linear in each variable, that is,

(1) f(aα1 + α2, β) = af(α1, β) + f(α2, β) and

(2) f(α, bβ1 + β2) = bf(α, β1) + f(α, β2)

for all α, α1, α2 ∈ U , β, β1, β2 ∈ V and a, b ∈ F. f is called a bilinear form.

Example 9.4.2 Let U = V = Rn and F = R. Suppose α = (x1, . . . , xn) and β =

(y1, . . . , yn). Define f(α, β) =n∑

i=1xiyi. Then f is a bilinear form and is known as the

inner product or dot product. �

Page 77: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 193

Example 9.4.3 Let U = V be the space of all real valued continuous functions on [0, 1].

Let F = R. Define f(α, β) =

∫ 1

0α(x)β(x)dx. Then f is a bilinear form. Note that it is

also an inner product on U . We shall discuss inner product in Chapter 10. �

Let A = {α1, . . . , αm} be a basis of U and let B = {β1, . . . , βn} be a basis of V .

Then for α ∈ U and β ∈ V , we have α =m∑i=1

xiαi and β =n∑

j=1yjβj with xi, yj ∈ F. Then

f(α, β) = f

⎛⎝ m∑i=1

xiαi,n∑

j=1

yjβj

⎞⎠ =m∑i=1

n∑j=1

xiyjf(αi, βj).

Put bij = f(αi, βj) ∀i, j. This defines an m × n matrix B = (bij). B is called thematrix representing f with respect to bases A and B. We denote this matrix by {f}A

B .Suppose [α]A = X = (x1, · · · , xm)T and [β]B = Y = (y1, · · · , yn)T , then

f(α, β) = XTBY = ([α]A )T {f}AB [β]B.

Now suppose A ′ = {α′1, . . . , α′m} is another basis of U with transition matrixP = (pij) and suppose B′ = {β′1, . . . , β′n} is another basis of V with transitionmatrix Q = (qij). Suppose {f}A ′

B′ = B′ = (b′ij). Then

b′ij = f(α′i, β′j) = f

(m∑r=1

priαr,

n∑s=1

qsjβs

)=

m∑r=1

pri

(n∑

s=1

qsjf(αr, βs)

)

=

m∑r=1

n∑s=1

pribrsqsj .

Thus B′ = P TBQ. Hence rank(B′) = rank(B).In particular, when U = V we can choose A = B and A ′ = B′. Then Q = P and

B′ = P TBP .

Definition 9.4.4 Two square matrices A and B are said to be congruent if there existsa non-singular matrix P such that B = P TAP .

Proposition 9.4.5 For a fixed natural number n, congruence is an equivalence relationon Mn,n(F).

Definition 9.4.6 A bilinear form f : V × V → F is said to be symmetric if f(α, β) =f(β, α) ∀α, β ∈ V . f is said to be skew-symmetric if f(α, α) = 0 ∀α ∈ V .

Theorem 9.4.7 For a finite dimensional vector space V , bilinear form f is symmetricif and only if any representing matrix is symmetric.

Proof: Suppose the bilinear form f : V × V → F is symmetric. Let A = {α1, . . . , αn}be a basis of V . Let {f}A

A = B = (bij). Since f is symmetric, bij = f(αi, αj) =f(αj , αi) = bji ∀i, j. Thus B is symmetric.

Page 78: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

194 Linear and Quadratic Forms

Conversely, suppose B = {f}AA is a symmetric matrix. For α, β ∈ V , let [α]A = X

and [β]A = Y , we have f(α, β) = XTBY = (XTBY )T = Y TBTX = Y TBX =f(β, α). �

Note that if A is another matrix representing f , then A = P TBP for some invertiblematrix P . Hence B is symmetry if and only if A is symmetry.

Theorem 9.4.8 If a bilinear form f on a finite dimensional vector space is skew-symmetric, then any matrix B representing f is skew-symmetric.

Proof: Let α, β ∈ V . Then

0 = f(α+ β, α+ β) = f(α, α) + f(α, β) + f(β, α) + f(β, β) = f(α, β) + f(β, α).

Thus f(α, β) = −f(α, β). Hence, for any basis {α1, . . . , αn} of V , we have

bij = f(αi, αj) = −f(αj , αi) = −bji ∀i, j.That is, BT = −B. �

For the converse of the above theorem, we need an additional condition.

Theorem 9.4.9 If 1 + 1 �= 0 in F and the matrix B representing f is skew-symmetric,then f is skew-symmetric.

Proof: Let {α1, . . . , αn} be a basis of a vector space such that B represents f withrespect to this basis. Since BT = −B, f(αi, αj) = −f(αj , αi) ∀i, j. Hence f(α, β) =−f(β, α) ∀α, β ∈ V . In particular, f(α, α) = −f(α, α) ∀α ∈ V . Thus (1+1)f(α, α) = 0.Since 1 + 1 �= 0, f(α, α) = 0. Therefore, f is skew-symmetric. �

For convenience, we shall use 2 to denote 1 + 1 in any field. Thus 2 = 0 in the fieldZ2.

Theorem 9.4.10 If the field is such that 2 �= 0, then any bilinear from f : V × V → Fcan be expressed uniquely as a sum of symmetric bilinear form and a skew-symmetricbilinear form.

Proof: Define g, h : V × V → F by

g(α, β) = 2−1[f(α, β) + f(β, α)] and h(α, β) = 2−1[f(α, β)− f(β, α)], α, β ∈ V.

Then it is easy to see that g is a symmetric bilinear form, h is a skew-symmetricbilinear form and f = g + h. To show the uniqueness, suppose f = g1 + h1 is anothersuch decomposition with g1 symmetric and h1 skew-symmetric. Then by the definitionsof g and g1,

g(α, β) = 2−1[f(α, β) + f(β, α)] = 2−1[g1(α, β) + h1(α, β) + g1(β, α) + h1(β, α)]

= 2−1[2g1(α, β)] = g1(α, β).

Hence h1 = f − g1 = f − g = h. �

Note that if 2 �= 0 in a field F, then for any square matrix A over F, A can alwaysbe expressed uniquely as A = B+C with B symmetric and C skew-symmetric. We cansimply put B = 2−1(A+AT ) and C = 2−1(A−AT ).

Page 79: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 195

Definition 9.4.11 A function q from a vector space V into the scalar field F is calleda quadratic form if there exists a bilinear form f : V × V → F such that q(α) = f(α, α).

Remark 9.4.12 Suppose the bilinear form f is of the form g + h with g symmetricand h skew-symmetric. Then q(α) = f(α, α) = g(α, α). Thus q is determined by thesymmetric part of f only.

Theorem 9.4.13 Suppose 2 �= 0. There is a bijection between symmetric bilinear formsand quadratic forms.

Proof: From Remark 9.4.12 we have seen that a symmetric bilinear form determines aquadratic form. So there is a mapping, written as φ, between symmetric bilinear formsand quadratic forms. Now we want to show that this mapping is a bijection.

Note that, suppose q is a quadratic form defined by a symmetric bilinear form f .Then by definition

q(α+ β)− q(α)− q(β) = f(α+ β, α+ β)− f(α, α)− f(β, β)

= f(α, β) + f(β, α) = 2f(α, β). (9.3)

Suppose there are two symmetric bilinear forms f and f1 both mapped to the samequadratic form q under φ. Then by Equation (9.3) we have 2f(α, β) = 2f1(α, β) for allα, β ∈ V . Since 2 �= 0, f = f1. Thus, φ is an injection.

Suppose q is a quadratic form. By definition there is a bilinear from f such thatq(α) = f(α, α) for α ∈ V . Since 2 �= 0, we can define a bilinear form g by g(α, β) =2−1[f(α, β)+f(β, α)]. Clearly g is symmetric and g(α, α) = q(α) for α ∈ V . This meansthat φ is a surjection. �

A matrix representing a quadratic form is a symmetric matrix which represents thecorresponding symmetric bilinear form given by Theorem 9.4.13. Here, we assume that2 �= 0.

Example 9.4.14 Let q : R2 → R be defined by q(x, y) = x2 − 4xy + 3y2. Then q is a

quadratic, since q(x, y) = f((x, y), (x, y)), where f((x, y), (u, v)) = (x, y)

(1 0

−4 3

)(u

v

)is a bilinear form. We can choose a symmetric bilinear form

g((x, y), (u, v)) = (x, y)

(1 −2

−2 3

)(u

v

).

It is easy to see that q(x, y) = f((x, y), (x, y)) and it can be expressed as the matrixproduct

(x, y)

(1 −2

−2 3

)(u

v

).

Page 80: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

196 Linear and Quadratic Forms

In general, if 2 �= 0, then the quadratic form XTAX = q(x1, . . . , xn) =n∑

i=1

n∑j=1

aijxixj

has the representing matrix S = (sij) with sij = 2−1(aij + aji).

Example 9.4.15 Let V = Z22. Let f be a bilinear form whose representing matrix is(

1 1

1 0

)with respect to the standard basis. Clearly f is symmetric. f determines the

quadratic form q(x, y) = x2. One can see that there is another symmetric f1 whose

representing matrix is

(1 0

0 0

)determining q. �

Given a quadratic form, we want to change the variables such that the quadratic formbecomes as simple as possible. Any quadratic form can be represented by a symmetricmatrix A. If we change the basis of the domain space of the quadratic form, thenthe representing matrix becomes P TAP for some invertible matrix P . So the problembecomes to find an invertible matrix P such that P TAP is in the simplest form.

Theorem 9.4.16 Let f be a symmetric bilinear form on V , where V is finite dimen-sional vector space over F. If 2 �= 0 in F, then there is a basis B of V such that {f}B

Bis diagonal.

Proof: Let A be a basis of V . Let A = {f}AA .

We shall prove the theorem by induction on n, the dimension of V . If n = 1, thenthe theorem is obvious.

Suppose the theorem holds for all symmetric bilinear forms on vector spaces ofdimension n − 1. Let f be a symmetric bilinear form on V with dimV = n > 1. IfA = O, then it is already diagonal. Thus we assume A �= O.

Let q be the quadratic form induced from f . Then f(α, β) = 2−1[q(α+ β)− q(α)−q(β)]. If q(α) = 0 ∀α ∈ V , then f(α, β) = 0 ∀α, β ∈ V . Since A �= O, there existsβ1 ∈ V such that q(β1) = d1 �= 0. Define φ : V → F by φ(α) = f(β1, α). Then φ is anonzero linear form since φ(β1) = d1 �= 0.

Let W = ker(φ). Then dimW = n − 1. Suppose g is the restriction of f onW ×W . Then g is a symmetric bilinear form on W . By induction, there is a basisB1 = {β2, . . . , βn} of W such that {g}B1

B1is diagonal. That is, g(βi, βj) = 0, for i �= j,

2 ≤ i, j ≤ n. However, for i ≥ 2, f(βi, β1) = f(β1, βi) = φ(βi) = 0. Thus f(βi, βj) = 0,for i �= j, 1 ≤ i, j ≤ n. Let B = {β1}∪B1. Then {f}B

B = diag{d1, d2, . . . , dn}, for somed2, . . . , dn ∈ F. �

For any given symmetric matrix A, it is a representing matrix of a bilinear form onsome finite dimensional vector space. For example, we may choose V = Fn, A is thestandard basis of Fn and f(X,Y ) = XTAY for all X,Y ∈ Fn. By Theorem 9.4.16 if welet P be the matrix of transition from A to B, then we have the following corollary.

Corollary 9.4.17 Let F be a field such that 2 �= 0. Then for any symmetric matrix Aover F, there is an invertible matrix P such that P TAP is diagonal.

Page 81: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 197

The above result may not hold if 1 + 1 = 0. For example, F = Z2 and A =

(0 1

1 0

).

Then we cannot find an invertible matrix such that P TAP is diagonal.

Note that, the di’s along the main diagonal of {f}BB = P TAP (for some P ) described

in the proof of Theorem 9.4.16 are not unique. We can introduce a third basis C ={γ1, . . . , γn} by putting γi = aiβi with ai �= 0 for all i. Then the matrix of transition Qfrom B to C is diag{a1, . . . , an}. Then the matrix representing f with respect to C is(PQ)TAPQ = diag{a21d1, . . . , a2ndn}. Thus the elements in the main diagonal may bemultiplied by arbitrary non-zero squares from F.

Theorem 9.4.18 If F = C, then every symmetric matrix is congruent to a diagonalmatrix in which all the non-zero elements are 1’s.

Proof: We simply choose ai for each i for which di �= 0 such that a2i di = 1. �

In practice, how do we find an invertible matrix P or the diagonal form D? We shallprovide three methods for finding such matrices.

Inductive method

Let A = (aij) be a symmetric matrix. We shall use the idea used in the proof ofTheorem 9.4.16 to find an invertible matrix P such that P TAP is diagonal. As before,A determines a symmetric bilinear form f and a quadratic form q with respect to somebasis A = {α1, . . . , αn} of some vector space V . Here 2 �= 0 in F.

First choose β1 such that f(β1, β1) = q(β1) �= 0. If a11 �= 0, then we let β1 = α1, fora11 = f(α1, α1). If a11 = 0 but a22 �= 0, then we let β1 = α2. If a11 = a22 = 0 but a12 �=0, then we let β1 = α1+α2. Then f(β1, β1) = f(α1, α1)+2f(α1, α2)+f(α2, α2) = 2a12.If a11 = a12 = a22 = 0 but a33 �= 0, then we let β1 = α3. Thus, unless A = O, thisprocess enables us to choose β1 such that f(β1, β1) �= 0.

Define φ1 ∈ V by φ1(α) = f(β1, α) ∀α ∈ V . Now suppose [β1]A =

⎛⎜⎜⎝p11...

pn1

⎞⎟⎟⎠ and

[α]A =

⎛⎜⎜⎝x1...

xn

⎞⎟⎟⎠. That is, β1 =n∑

i=1pi1αi and α =

n∑i=1

xiαi. Then φ1(α) =n∑

j=1

n∑i=1

pi1aijxj .

Thus φ1 is represented by the 1× n matrix(p11 · · · pn1

)A.

Next we put W1 = ker(φ1). If q∣∣W1

= O, then we are done. Otherwise, choose

β2 ∈ W1 such that q(β2) = f(β2, β2) �= 0. Define φ2 ∈ V by φ2(α) = f(β2, α) ∀α ∈ V .

Suppose [β2]A =

⎛⎜⎜⎝p12...

pn2

⎞⎟⎟⎠, then by the above argument, φ2 is represented by the 1 × n

matrix(p12 · · · pn2

)A.

Page 82: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

198 Linear and Quadratic Forms

Let W2 = W1 ∩ ker(φ2). If q∣∣W2

= O, then we are done. Otherwise, choose β3 ∈W2

such that q(β3) = f(β3, β3) �= 0. Define φ3 ∈ V by φ3(α) = f(β3, α) ∀α ∈ V . Thisprocess can be continued until we have found {β1, . . . , βn} and {φ1, . . . , φn} so that

D = P TAP = diag{φ1(β1), φ2(β2), . . . , φn(βn)}.

Here φi(βi) = f(βi, βi) and P = (pij).

Example 9.4.19 Let A =

⎛⎜⎜⎜⎜⎝0 1 −1 2

1 1 0 −1−1 0 −1 1

2 −1 1 0

⎞⎟⎟⎟⎟⎠ over R.

For convenience to compute, we usually choose A to be the standard basis (of R4).Since (A)1,1 = 0 and (A)2,2 = 1 �= 0, we put β1 = e2 = (0, 1, 0, 0). Now the linear formφ1 is represented by (

0 1 0 0)A =

(1 1 0 −1

).

Thus φ1(β1) = 1. Next, we have to choose β2 such that φ1(β2) = 0 and f(β2, β2) �= 0, ifpossible.

If β2 = (x1, x2, x3, x4), then φ1(β2) = 0 implies x1 + x2 − x4 = 0. If we chooseβ2 = (1, 0, 0, 1) then the linear form φ2 is represented by(

1 0 0 1)A =

(2 0 0 2

).

In this case, f(β2, β2) = 4 �= 0. Next we have to choose β3 such that φ1(β3) = φ2(β3) = 0but f(β3, β3) �= 0, if possible.

If β3 = (x1, x2, x3, x4), then{x1 + x2 − x4 = 0,

x1 + x4 = 0.

Solving this system we get β3 = (1,−2, 0,−1). Thus φ3 is represented by(1 −2 0 −1

)A =

(−4 0 −2 4

).

Also φ3(β3) = −8 �= 0.Finally, we choose β4 such that φ1(β4) = φ2(β4) = φ3(φ4) = 0. Hence we have to

solve the system ⎧⎪⎨⎪⎩x1 + x2 − x4 = 0,

x1 + x4 = 0,

2x1 + x3 − 2x4 = 0.

It is easy to see that we can choose β4 = (−1, 2, 4, 1). Then φ4 is defined by(−1 2 4 1

)A =

(0 0 −2 0

).

Page 83: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 199

Thus φ4(β4) = −8. So

P =

⎛⎜⎜⎜⎜⎝0 1 1 −11 0 −2 2

0 0 0 4

0 1 −1 1

⎞⎟⎟⎟⎟⎠ and D = P TAP =

⎛⎜⎜⎜⎜⎝1 0 0 0

0 4 0 0

0 0 −8 0

0 0 0 −8

⎞⎟⎟⎟⎟⎠ .

Elementary row and column operations method

We shall use the elementary row operations and elementary column operations toreduce a symmetric matrix to a diagonal matrix. Let A be a symmetric matrix. ByCorollary 9.4.17, there exists an invertible matrix P such that P TAP = D a diago-nal matrix. Since P is invertible, by Theorem 2.2.5 P = E1E2 · · ·Er is a product ofelementary matrices. Thus,

D = (E1E2 · · ·Er)TA(E1E2 · · ·Er) = ET

r

(· · · (ET

2 (ET1 AE1)E2

) · · ·)Er.

Note that ET1 is also an elementary matrix and ET

1 A is obtained from A by applyingan elementary row operation to A. Hence (ET

1 A)E1 is obtained by applying the samecolumn operation to ET

1 A. Thus the diagonal form will be obtained after successive useof elementary operations and the corresponding column operations.

In practice, we form the augmented matrix(A|I), where the function of I is to record

the product of elementary matrices ET1 , E

T2 , . . . , E

Tr . After applying an elementary row

operation (or a sequence of elementary row operations) to(A|I) we also apply the same

elementary column operation (or the same sequence of column operations) to A. Thisprocess is to be continued until A becomes a diagonal matrix. At this time I becomesthe matrix ET

1 ET2 · · ·ET

r , i.e., PT . Note that we need 1 + 1 �= 0 in F. Also note that

exchanging two rows is always of no use, since we have to exchange the correspondingcolumns.

Page 84: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

200 Linear and Quadratic Forms

Example 9.4.20 Let A =

⎛⎜⎝0 1 2

1 0 1

2 1 0

⎞⎟⎠ over R. Then

(A|I) =

⎛⎜⎝ 0 1 2 1 0 0

1 0 1 0 1 0

2 1 0 0 0 1

⎞⎟⎠ 1R2+R1−−−−−→

⎛⎜⎝ 1 1 3 1 1 0

1 0 1 0 1 0

2 1 0 0 0 1

⎞⎟⎠1C2+C1−−−−→

⎛⎜⎝ 2 1 3 1 1 0

1 0 1 0 1 0

3 1 0 0 0 1

⎞⎟⎠ − 12R1+R2

− 32R2+R3−−−−−−−→

⎛⎜⎝ 2 1 3 1 1 0

0 −12 −1

2 −12

12 0

0 −12 −9

2 −32 −3

2 1

⎞⎟⎠− 1

2C1+C2

− 32C2+C3−−−−−−→

⎛⎜⎝ 2 0 0 1 1 0

0 −12 −1

2 −12

12 0

0 −12 −9

2 −32 −3

2 1

⎞⎟⎠(−1)R2+R3−−−−−−−−→

⎛⎜⎝ 2 0 0 1 1 0

0 −12 −1

2 −12

12 0

0 0 −4 −1 −2 1

⎞⎟⎠(−1)C2+C3−−−−−−−→

⎛⎜⎝ 2 0 0 1 1 0

0 −12 0 −1

212 0

0 0 −4 −1 −2 1

⎞⎟⎠ .

Thus P T =

⎛⎜⎝ 1 1 0

−12

12 0

−1 −2 1

⎞⎟⎠ and P TAP =

⎛⎜⎝ 2 0 0

0 −12 0

0 0 −4

⎞⎟⎠. �

For comparison, we shall use elementary row and column operation method to reducethe matrix in Example 9.4.19 to diagonal form.

Page 85: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 201

Example 9.4.21 Let A be the matrix in Example 9.4.19.

⎛⎜⎜⎜⎜⎝

0 1 −1 2 1 0 0 0

1 1 0 −1 0 1 0 0

−1 0 −1 1 0 0 1 0

2 −1 1 0 0 0 0 1

⎞⎟⎟⎟⎟⎠

1R4+R1−−−−−−→

⎛⎜⎜⎜⎜⎝

2 0 0 2 1 0 0 1

1 1 0 −1 0 1 0 0

−1 0 −1 1 0 0 1 0

2 −1 1 0 0 0 0 1

⎞⎟⎟⎟⎟⎠

1C4+C1−−−−−→

⎛⎜⎜⎜⎜⎝

4 0 0 2 1 0 0 1

0 1 0 −1 0 1 0 0

0 0 −1 1 0 0 1 0

2 −1 1 0 0 0 0 1

⎞⎟⎟⎟⎟⎠

− 12R1+R4−−−−−−−−→

⎛⎜⎜⎜⎜⎝

4 0 0 2 1 0 0 1

0 1 0 −1 0 1 0 0

0 0 −1 1 0 0 1 0

0 −1 1 −1 − 12

0 0 12

⎞⎟⎟⎟⎟⎠

− 12C1+C4−−−−−−−→

⎛⎜⎜⎜⎜⎝

4 0 0 0 1 0 0 1

0 1 0 −1 0 1 0 0

0 0 −1 1 0 0 1 0

0 −1 1 −1 − 12

0 0 12

⎞⎟⎟⎟⎟⎠

1R2+R4−−−−−−→

⎛⎜⎜⎜⎜⎝

4 0 0 0 1 0 0 1

0 1 0 −1 0 1 0 0

0 0 −1 1 0 0 1 0

0 0 1 −2 − 12

1 0 12

⎞⎟⎟⎟⎟⎠

1C2+C4−−−−−→

⎛⎜⎜⎜⎜⎝

4 0 0 0 1 0 0 1

0 1 0 0 0 1 0 0

0 0 −1 1 0 0 1 0

0 0 1 −2 − 12

1 0 12

⎞⎟⎟⎟⎟⎠

1R3+R4−−−−−−→

⎛⎜⎜⎜⎜⎝

4 0 0 0 1 0 0 1

0 1 0 0 0 1 0 0

0 0 −1 1 0 0 1 0

0 0 0 −1 − 12

1 1 12

⎞⎟⎟⎟⎟⎠

1C3+C4−−−−−→

⎛⎜⎜⎜⎜⎝

4 0 0 0 1 0 0 1

0 1 0 0 0 1 0 0

0 0 −1 0 0 0 1 0

0 0 0 −1 − 12

1 1 12

⎞⎟⎟⎟⎟⎠

2R42C4−−−−−→

⎛⎜⎜⎜⎜⎝

4 0 0 0 1 0 0 1

0 1 0 0 0 1 0 0

0 0 −1 0 0 0 1 0

0 0 0 −4 −1 2 2 1

⎞⎟⎟⎟⎟⎠

.

In this case, P T =

⎛⎜⎜⎜⎜⎝1 0 0 1

0 1 0 0

0 0 1 0

−1 2 2 1

⎞⎟⎟⎟⎟⎠, i.e., P =

⎛⎜⎜⎜⎜⎝1 0 0 −10 1 0 2

0 0 1 2

1 0 0 1

⎞⎟⎟⎟⎟⎠, and

P TAP =

⎛⎜⎜⎜⎜⎝4 0 0 0

0 1 0 0

0 0 −1 0

0 0 0 −4

⎞⎟⎟⎟⎟⎠ .

Note that, actually the last step is not necessary. �

Page 86: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

202 Linear and Quadratic Forms

Completing square method

Consider the quadratic form

XTAX =

n∑i=1

n∑j=1

xiaijxj =

n∑i=1

aiix2i + 2

⎛⎝ ∑1≤i<j≤n

aijxixj

⎞⎠ .

Of course we assume 2 �= 0 in F.

Case 1. Suppose akk �= 0 for some k. Put x′k =n∑

j=1akjxj . Then

XTAX − a−1kk x′k2 =

n∑i=1

aiix2i + 2

⎛⎝ ∑1≤i<j≤n

aijxixj

⎞⎠− a−1kk

⎡⎣ n∑j=1

a2kjx2j + 2

⎛⎝ ∑1≤i<j≤n

akiakjxixj

⎞⎠⎤⎦ .

=

n∑i=1

(aii − a−1kk a

2ki

)x2i + 2

⎡⎣ ∑1≤i<j≤n

(aij − akiakja

−1kk

)xixj

⎤⎦

=n∑

i=1

(aii − a−1kk a

2ki

)x2i + 2

⎡⎢⎢⎣ ∑1≤i<j≤ni �=k, j �=k

(aij − akiakja

−1kk

)xixj

⎤⎥⎥⎦ .

This is a quadratic form which does not involve xk, thus the inductive stepsapply.

Case 2. Suppose akk = 0 ∀k = 1, 2, . . . , n. Then XTAX = 2

( ∑1≤i<j≤n

aijxixj

).

Suppose ars �= 0 for some r < s. Put x′r = xr + xs and x′s = xr − xs. That is,xr = 2−1(x′r + x′s), xs = 2−1(x′r − x′s) and xrxs = 2−2(x′r2 − x′s2). Thus

XTAX = 2

⎛⎝ ∑1≤i<j≤n

aijxixj

⎞⎠ = 2−1ars(x′r2 − x′s

2) + · · · .

Then, we can apply Case 1.

Example 9.4.22 Let A =

⎛⎜⎝ 1 2 3

2 0 −13 −1 1

⎞⎟⎠ be a real matrix. Then XTAX = x21 +

x23 + 4x1x2 + 6x1x3 − 2x2x3. Since (A)1,1 = 1 �= 0, we let x′1 = x1 + 2x2 + 3x3. Then

XTAX − x′12 = −4x22 − 14x2x3 − 8x23 = −4(x22 + 7

2x2x3 +4916x

23) +

174 x

23.

Page 87: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 203

If we put x′2 = x22 +74x3 and x′3 = x3, then

XTAX = x′12 − 4x′2

2 + 174 x

′32.

So we have

X ′ =

⎛⎜⎝x′1x′2x′3

⎞⎟⎠ =

⎛⎜⎝1 2 3

0 1 74

0 0 1

⎞⎟⎠⎛⎜⎝x1

x2

x3

⎞⎟⎠ .

If we writeXTAX =(XT (P−1)T (P TAP )P−1X)

)= X ′TDX ′, whereD = diag{1,−4, 174 }.

Then X ′ = P−1X and the transition matrix

P =

⎛⎜⎝ 1 −2 12

0 1 −74

0 0 1

⎞⎟⎠ .

Exercise 9.4

9.4-1. Consider elements of R2 as column vectors. Define f : R2 → R2 by f(X,Y ) =

det(X Y

). Here X =

(x1

x2

), Y =

(y1

y2

)and

(X Y

)=

(x1 y1

x2 y2

). Is f

bilinear?

9.4-2. Define f : R3 × R3 → R by f((x1, x2, x3), (y1, y2, y3)

)= x1y1 − 2x2y2 + 2x2y1 −

x3y3. Show that f is bilinear. Also find the matrix representing f with respectto the basis A = {(1, 0, 1), (1, 0,−1), (0, 1, 0)}.

9.4-3. Let V = M2,2(F). Define f : V × V → V by f(A,B) = Tr(A)Tr(B). Show thatf is bilinear. Also find the matrix representing f with respect to the basis

A =

{E1,1 =

(1 0

0 0

), E1,2 =

(0 1

0 0

), E2,1 =

(0 0

1 0

), E2,2 =

(0 0

0 1

)}.

9.4-4. Show that a skew-symmetric matrix over F of odd order must be singular.

9.4-5. Show that if A is skew-symmetric and congruent to a diagonal matrix, thenA = O. Here 2 �= 0.

9.4-6. Reduce the following matrices to diagonal form which are congruent to them.

(a)

⎛⎜⎝ 1 3 0

3 −2 −10 −1 1

⎞⎟⎠; (b)

⎛⎜⎝ 1 2 2

2 1 −22 −2 1

⎞⎟⎠; (c)

⎛⎜⎜⎜⎜⎝0 1 2 3

1 0 1 2

2 1 0 1

3 2 1 0

⎞⎟⎟⎟⎟⎠.

Page 88: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

204 Linear and Quadratic Forms

§9.5 Real Quadratic Forms

In this section and the following section we only consider finite dimensional vectorspace. In this section we only consider vector space over R.

From the previous section we knew that given a quadratic form q we can choosea basis such that the matrix representing q is diagonal. On the diagonal of this di-agonal matrix, we shall show that the numbers of positive and negative numbers areindependent of the bases chosen.

Theorem 9.5.1 Let q be a quadratic form over R. Let P and N be the number ofpositive terms and negative terms in a diagonalized representation of q, respectively.Then these two numbers are independent of the representations chosen.

Proof: Let {α1, . . . , αn} be a basis of a vector space V which yields a diagonalizedrepresentation of q with P positive terms and N negative terms in the main diagonal.

Without loss of generality, we may assume that the first P elements of the maindiagonal are positive. Suppose {β1, . . . , βn} is another basis yielding a diagonalizedrepresentation of q with the first P ′ elements of the main diagonal positive.

Let U = span{α1, . . . , αP } and W = span{βP ′+1, . . . , βn}. For α ∈ U , α =P∑i=1

aiαi

for some ai ∈ R. Then

q(α) =P∑i=1

P∑j=1

aiajf(αi, αj) =P∑i=1

a2i f(αi, αi) ≥ 0,

here f is the symmetric bilinear form determining q. Thus q(α) = 0 ⇔ ai = 0 ∀i =1, . . . , P . That is, q(α) = 0 ⇔ α = 0. Also, if β ∈ W , then by the definition, q(β) ≤ 0.Thus U ∩W = {0}. Since dimU = P , dimW = n− P ′ and dim(U +W ) ≤ n, we haveP + n− P ′ = dimU + dimW = dim(U +W ) ≤ n. Therefore, P ≤ P ′.

Similarly, we can show P ′ ≤ P . Thus P = P ′ and N = r − P = r − P ′ = N ′, here ris the rank of the representation matrix. �

Remark 9.5.2 For a symmetric matrix A ∈Mn(R), define q(X) = XTAX for X ∈ Rn.Then q is a quadratic form with representing matrix A with respect to the standard basisof Rn. By Theorem 9.5.1 if A is congruent to a diagonal matrix D, i.e., QTAQ = D forsome non-singular matrix Q, then the number of positive terms and negative terms inthe diagonal of D are independent of the choice of Q.

Definition 9.5.3 The number S = P −N is called the signature of the quadratic formq. By Theorem 9.5.1, S is well-defined. A quadratic form q is called non-negative definiteif S = r, where r is the rank of the matrix representing q. If S = n, then q is said tobe positive definite. Similarly, if S = −r, then q is called non-positive definite; and ifS = −n, then q is said to be negative definite.

Remark 9.5.4 Since r = P +N , q is non-negative definite if and only if P −N = S =P +N if and only if N = 0. Also q is positive definite if and only if S = n if and only ifP −N = n if and only if P = n and N = 0. Similarly, q is non-positive definite if andonly if P = 0; and q is negative definite if and only if N = n and P = 0.

Page 89: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 205

Proposition 9.5.5 Suppose A is real symmetric matrix which represents a positive def-inite quadratic form. Then detA > 0.

Proof: By Corollary 9.4.17 there exists an invertible matrix P such that P TAP = Dis diagonal. Since q is positive definite, all the diagonal entries of D are positive. Thus, detD > 0. But detD = det(P TAP ) = (detP )2(detA). Hence detA > 0. �

Definition 9.5.6 A symmetric real matrix is called positive definite, negative definite,non-negative definite or non-positive definite if it represents a positive definite, negativedefinite, non-negative definite or non-positive definite quadratic form, respectively. Thesignature of a symmetric real matrix is the signature of quadratic form it defines.

Definition 9.5.7 Let V be a vector space over R. Suppose q is a quadratic form onV and f is the symmetric bilinear form such that q(α) = f(α, α) ∀α ∈ V . The Gramdeterminant of the vectors α1, . . . , αk with respect to q is defined to be the determinant

G(α1, . . . , αk) =

∣∣∣∣∣∣∣∣f(α1, α1) f(α1, α2) · · · f(α1, αk)

......

......

f(αk, α1) f(αk, α2) · · · f(αk, αk)

∣∣∣∣∣∣∣∣ .Theorem 9.5.8 Let V be a vector space over R with a positive definite quadratic formq. Then α1, . . . , αk are linearly independent if and only if G(α1, . . . , αk) > 0.

Proof: Let q be the quadratic form induced from a symmetric bilinear form f .Suppose α1, . . . , αk are linearly independent. Let W = span{α1, . . . , αk}. Then

B = {α1, . . . , αk} is a basis of W . Let q′ = q∣∣W. Then q′ is a positive definite quadratic

form on W . With respect to the basis B, q′ is represented by the k×k matrix A = (aij),where aij = f(αi, αj). Since q

′ is positive definite, by Proposition 9.5.5 detA > 0. Thatis, G(α1, . . . , αk) > 0.

Conversely, suppose α1, . . . , αk are linearly dependent. Then there exist real numbers

c1, . . . , ck not all zero such thatk∑

i=1ciαi = 0. Hence f

(αj ,

k∑i=1

ciαi

)= 0 ∀j = 1, 2, . . . , k.

Hence the system of linear equations⎧⎪⎪⎪⎪⎨⎪⎪⎪⎪⎩f(α1, α1)x1+ · · · + f(α1, αk)xk = 0

f(α2, α1)x1+ · · · + f(α2, αk)xk = 0

...

f(αk, α1)x1+ · · · + f(αk, αk)xk = 0

has non-trivial solution (c1, c2, . . . , ck). Thus the determinant of the coefficient matrixmust be zero. That is, G(α1, . . . , αk) = 0. �

Note that, from the above proof we can see that the Gram determinant with respectto a positive definite quadratic form must be non-negative.

Exercise 9.5

Page 90: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

206 Linear and Quadratic Forms

9.5-1. Find the signature of the following matrices.

(a)

(0 1

1 0

); (b)

⎛⎜⎝2 1 1

1 2 1

1 1 2

⎞⎟⎠; (c)

⎛⎜⎝ 3 1 2

1 4 0

2 0 −1

⎞⎟⎠.

9.5-2. Suppose A is a real symmetric matrix. Suppose A2 = O. Is A = O? Why?

9.5-3. Reduce the quadratic form q(x1, x2, x3) = 2x1x2 + 4x1x3 − x22 + 6x2x3 + 4x23 to

diagonal form. That is, find an invertible matrix P such that if Y = P−1

⎛⎜⎝x1

x2

x3

⎞⎟⎠ =

⎛⎜⎝y1

y2

y3

⎞⎟⎠, then q(x1, x2, x3) is a quadratic form with variables y1, y2 and y3 but with

no mixed terms yiyj for i �= j.

9.5-4. Reduce the quadratic form q(x, y, z, w) = x2 − 2xy − y2 + 4yw + w2 − 6zw + z2

to diagonal form.

§9.6 Hermitian Forms

After considering real quadratic forms, in this section we consider complex quadraticforms which are called Hermitian quadratic forms. They will have similar properties asreal quadratic forms.

Definition 9.6.1 Let F ⊆ C and let V be a vector space over F. A mappingf : V × V → C is called a Hermitian form if ∀α, β, β1, β2 ∈ V , b ∈ F

(1) f(α, β) = f(β, α), where z is the complex conjugate of z, and

(2) f(α, bβ1 + β2) = bf(α, β1) + f(α, β2).

That is, f is linear in the second variable and conjugate linear in the first variable. Fora given Hermitian form f , the mapping q : V → C defined by q(α) = f(α, α) is called aHermitian quadratic form. Note that q(α) ∈ R.

Definition 9.6.2 Let B = {α1, . . . , αn} be a basis of V . Define hij = f(αi, αj) ∀i, j =1, . . . , n. Put H = (hij). Then H is the representing matrix of f with respect to B.

Since hji = f(αj , αi) = f(αi, αj) = hij , HT = H where H = (hij). Such a matrix is

called a Hermitian matrix.

For any square matrix A over C, we denote A∗ = A T = AT . Thus, H is Hermitianif and only if H∗ = H. Note that if H is Hermitian, then (H)i,i ∈ R. Also if H is a realmatrix, then H is Hermitian if any only if H is symmetric.

Page 91: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 207

Proposition 9.6.3 Let H = (hij) be the matrix representing a Hermitian form f withrespect to a basis A = {α1, . . . , αn}. Suppose B = {β1, . . . , βn} is another basis andsuppose P = (pij) is the matrix of transition from A to B. If K = (kij) is the matrixrepresenting f with respect to the basis B, then K = P ∗HP .

Proof: Since βj =n∑

i=1pijαi for each j, we have

kij = f(βi, βj) = f( n∑

r=1

priαr,

n∑s=1

psjαs

)=

n∑r=1

n∑s=1

prif(αr, αs)psj =

n∑r=1

n∑s=1

prihrspsj .

Definition 9.6.4 Two matrices H and K are said to be Hermitian congruent if thereexists an invertible matrix P such that P ∗HP = K.

Note that it is easy to see that Hermitian congruence is an equivalence relation.

Theorem 9.6.5 For a given Hermitian matrix H over C, there is an invertible matrixP such that P ∗HP = D is a diagonal matrix.

Proof: Choose a vector space V and a basis A . Then with respect to this basis,H defines a Hermitian form f and hence a Hermitian quadratic form q. The relationbetween f and q is

f(α, β) =1

4[q(α+ β)− q(α− β)− iq(α+ iβ) + iq(α− iβ)].

As in the proof of Theorem 9.4.16, choose β1 ∈ V such that q(β1) �= 0. If this is notpossible, then f = 0 and we are finished.

Define a linear form φ on V by the formula φ(α) = f(β1, α) ∀α ∈ V . If β1 is

represented by(p11 · · · pn1

)Tand α by

(x1 · · · xn

)Twith respect to A , then

φ(α) =n∑

j=1

(n∑

k=1

pk1hkj

)xj . Thus φ is represented by the matrix

(p11 · · · pn1

)H.

The rest of the proof is very much the same as that of the proof of Theorem 9.4.16and hence we shall omit it. �

Note that, if H is a real Hermitian matrix, then the above theorem is just Theo-rem 9.4.16. From the proof of Theorem 9.6.5, it is easy to see that the field to whichentries of H belong is not necessary the whole complex field C. It can be any subfieldof C that containing i.

We can use the inductive method or row and column operations method to reducea Hermitian matrix to a diagonal matrix. But when using the elementary row andcolumn operations method, we have only to make one modification. That is, wheneverwe multiple one row by a complex numbers c we have to multiply the correspondingcolumn by c. Also, instead of getting P T we get P ∗.

Page 92: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

208 Linear and Quadratic Forms

Example 9.6.6 Let H =

⎛⎜⎝ 1 i 0

−i 1 i

0 −i 1

⎞⎟⎠. We shall apply the elementary row and

column operations method to find an invertible matrix P such that P ∗HP is diagonal.

(H|I) iR1+R2−−−−−→

⎛⎜⎝ 1 i 0 1 0 0

0 0 i i 1 0

0 −i 1 0 0 1

⎞⎟⎠ −iC1+C2−−−−−→

⎛⎜⎝ 1 0 0 1 0 0

0 0 i i 1 0

0 −i 1 0 0 1

⎞⎟⎠1R3+R2−−−−−→

⎛⎜⎝ 1 0 0 1 0 0

0 −i 1 + i i 1 1

0 −i 1 0 0 1

⎞⎟⎠ 1C3+C2−−−−→

⎛⎜⎝ 1 0 0 1 0 0

0 1 1 + i i 1 1

0 1− i 1 0 0 1

⎞⎟⎠(−1+i)R2+R3−−−−−−−−−→

⎛⎜⎝ 1 0 0 1 0 0

0 1 1 + i i 1 1

0 0 −1 −1− i −1 + i i

⎞⎟⎠(−1−i)C2+C3−−−−−−−−→

⎛⎜⎝ 1 0 0 1 0 0

0 1 0 i 1 1

0 0 −1 −1− i −1 + i i

⎞⎟⎠ .

Hence P ∗ =

⎛⎜⎝ 1 0 0

i 1 1

−1− i −1 + i i

⎞⎟⎠, so P =

⎛⎜⎝1 −i −1 + i

0 1 −1− i

0 1 −i

⎞⎟⎠ and

P ∗HP =

⎛⎜⎝ 1 0 0

0 1 0

0 0 −1

⎞⎟⎠. �

Remark 9.6.7 Since the diagonal matrix P ∗HP = D = diag{d1, . . . , dn} is Hermitian,the diagonal entries are real. However, these diagonal entries are not unique. Wecan transform D into another diagonal matrix D′ by means of a diagonal matrix Q =diag{a1, . . . , an} with ai �= 0 ∀i. Then

D′ = Q∗DQ = diag{d1|a1|2, . . . dr|ar|2, 0, . . . , 0}.Here r = rank(H). Note that even though we are dealing with complex numbers, thetransformation multiplies the diagonal entries of D by positive real numbers.

Definition 9.6.8 A Hermitian form or a Hermitian matrix is said to be positive definite(respectively, non-negative definite, negative definite or non-positive semi-definite) if theassociated quadratic form is positive definite (respectively, non-negative definite, nega-tive definite or non-positive definite). By Remark 9.6.7 and the fact that the associatedquadratic form is a real quadratic form, we see that these definitions are well defined.

The definition of the signature of a Hermitian quadratic or Hermitian matrix is thesame as that of the real quadratic form or symmetric real matrix.

Page 93: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Linear and Quadratic Forms 209

Remark 9.6.9 For infinite dimensional vector space V , a quadratic form q is said tobe positive definite (negative definite respectively) if the corresponding Hermitian formf satisfied the condition that f(α, α) > 0 (f(α, α) < 0 respectively) ∀α ∈ V \ {0}. q issaid to be non-negative definite (non-positive definite respectively) if q(α) ≥ 0 (q(α) ≤ 0respectively) ∀α ∈ V . For finite dimensional vector spaces, these two definitions areeasily seen to be equivalent. Thus suppose A is a real symmetric (Hermitian respectively)matrix. Then A is positive definite if and only if XTAX > 0 (X∗AX > 0) for all non-zero n× 1 matrix X. The definition of the other definite or definite matrix is similarlydefined.

Exercise 9.6

9.6-1. Reduce the following matrices to diagonal form which are Hermitian congruentto them. Also find their signatures.

(a)

⎛⎜⎝ 1 i 1− i

−i −1 0

1 + i 0 1

⎞⎟⎠; (b)

⎛⎜⎝ 1 i 1 + i

−i 1 i

1− i −i 1

⎞⎟⎠9.6-2. Show that if H is a positive definite Hermitian matrix, then there exists an

invertible matrix P such that H = P ∗P . Also show that if H is real then we canchoose P to be real.

9.6-3. Show that if A is an invertible matrix, then A∗A is positive definite Hermitianmatrix.

9.6-4. Show that if A is a square matrix, then A∗A is Hermitian non-negative definite.

9.6-5. Show that if H is a Hermitian non-negative definite matrix, then there exists amatrix R such that H = R∗R. Moreover, if H is real, then R can be chosen tobe real.

9.6-6. Show that if A is a square matrix such that A∗A = O, then A = O.

9.6-7. Let H1, . . . , Hk be Hermitian matrices. Show that if H21 + · · · + H2

k = O, thenH1 = · · · = Hk = O.

9.6-8. Let A be an n×n Hermitian non-negative definite matrix. Show that if rankA =1, then there exists an 1× n matrix X such that A = X∗X.

9.6-9. Suppose A is a real m× n matrix with rank(A) = n. Show that ATA is positivedefinite.

Page 94: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Chapter 10

Inner Product Spaces andUnitary Transformations

§10.1 Inner Product

In secondary school we have learnt dot product in R3. This is one of the innerproduct in R3. We shall generalize this concept to some general vector spaces.

In this chapter, we shall denote F to be the field C or a subfield of C.

Definition 10.1.1 Let V be a vector space over F. An inner product or scalar producton V is a positive definite Hermitian form f on V . Since f will be fixed throughout ourdiscussion, we write 〈α, β〉 instead f(α, β). For α ∈ V , we define the norm or length ofα by ‖α‖ = √〈α, α〉. Since the inner product is positive definite, ‖α‖ ≥ 0 ∀α ∈ V . Also‖α‖ = 0 if and only if α = 0.

Examples 10.1.2

1. Let V = Rn. For α = (x1, . . . , xn), β = (y1, . . . , yn) in V , define 〈α, β〉 =n∑

i=1xiyi.

Then it is easy to see that this defines an inner product on Rn, sometimes called the

dot product, ‖α‖ =√

n∑i=1

x2i .

2. Let V = Cn. For α = (x1, . . . , xn), β = (y1, . . . , yn) in V , define 〈α, β〉 =n∑

i=1xiyi.

Again, this defines an inner product on Cn, ‖α‖ =√

n∑i=1|xi|2.

3. Let V be the vector space of all complex valued continuous functions defined on theclosed interval [a, b]. Then it is easy to see that the formula 〈f, g〉 = ∫ b

a f(x)g(x)dx

defines an inner product on V . Also ‖f‖2 = ∫ ba |f(x)|2dx. �

210

Page 95: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 211

Definition 10.1.3 A vector space with an inner product is called an inner productspace. A Euclidean space is a finite dimensional inner product space over R. A finitedimensional inner product space over C is called a unitary space.

Theorem 10.1.4 (Cauchy-Schwarz’s inequality) Let V be an inner product space.Then for α, β ∈ V , |〈α, β〉| ≤ ‖α‖‖β‖.

Proof: If α = 0, then 〈α, β〉 = 0 and thus the inequality holds. If α �= 0, then ‖α‖ �= 0.

�������������

α

β

β − 〈α,β〉‖α‖2 α

〈α,β〉‖α‖2 α

From

0 ≤∥∥∥∥〈α, β〉‖α‖2 α− β

∥∥∥∥2 = ‖β‖2 − |〈α, β〉|2‖α‖2 ,

we get |〈α, β〉|2 ≤ ‖α‖2‖β‖2. That is,|〈α, β〉| ≤ ‖α‖‖β‖. �

Remark 10.1.5 The equality holds if and only if β is a multiple of α or α = 0. For, inthe proof of Theorem 10.1.4, we see that the equality holds only if α = 0 or 〈α,β〉‖α‖2 α−β = 0.

Conversely, if α = 0 or β = cα for some scalar c, then the equality holds.

Theorem 10.1.6 (Triangle inequality) Let V be an inner product space. Then forα, β ∈ V , ‖α + β‖ ≤ ‖α‖ + ‖β‖. The equality holds if and only if α = 0 or β = cα forsome non-negative real number c.

Proof: Consider

‖α+ β‖2 = 〈α+ β, α+ β〉 = ‖α‖2 + 2Re〈α, β〉+ ‖β‖2≤ ‖α‖2 + 2|〈α, β〉|+ ‖β‖2 ≤ ‖α‖2 + 2‖α‖‖β‖+ ‖β‖2= (‖α‖+ ‖β‖)2.

Hence ‖α+ β‖ ≤ ‖α‖+ ‖β‖.Note that the equality holds if and only if Re〈α, β〉 = |〈α, β〉| = ‖α‖‖β‖. By Re-

mark 10.1.5, |〈α, β〉| = ‖α‖‖β‖ if and only if α = 0 or β = cα for some c ∈ C.If β = cα, then 〈α, β〉 = 〈α, cα〉 = c‖α‖2. However, Re〈α, β〉 = |〈α, β〉| if and only if

〈α, β〉 is real and non-negative. Hence the equality holds if and only if α = 0 or β = cαfor some c ≥ 0. �

Definition 10.1.7 Let V be an inner product space. Two vectors α, β ∈ V are said tobe orthogonal if 〈α, β〉 = 0. A vector α is said to be a unit vector if ‖α‖ = 1. A setS ⊆ V is called an orthogonal set if all pairs of distinct vectors in S are orthogonal. Anorthogonal set S is called an orthonormal set if all the vectors in S are unit vectors. Abasis of V is said to be an orthogonal basis (orthonormal basis) if it is also an orthogonalset (orthonormal set).

Theorem 10.1.8 Let V be an inner product space. Suppose S (finite set or infiniteset) is an orthogonal set of non-zero vectors. Then S is a linearly independent set.

Page 96: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

212 Inner Product Spaces and Unitary Transformations

Proof: Suppose ξ1, . . . , ξk ∈ S are distinct and thatk∑

i=1ciξi = 0. Then for each j,

1 ≤ j ≤ k,

0 = 〈ξj ,0〉 =⟨ξj ,

k∑i=1

ciξi

⟩=

k∑i=1

ci〈ξj , ξi〉 = cj‖ξj‖2.

Since ‖ξj‖2 �= 0, each cj = 0. Thus S is a linearly independent set. �

Corollary 10.1.9 Let V be an inner product space. Suppose S is an orthonormal set.Then S is linearly independent.

Theorem 10.1.10 (Gram-Schmidt orthonormalization process )Suppose {α1, . . . , αs} is a linearly independent set in an inner product space V . Thenthere exists an orthonormal set {ξ1, . . . , ξs} such that for each k

ξk =

k∑i=1

aikαi for some scalars aik (10.1)

and span{α1, . . . , αs} = span{ξ1, . . . , ξs}.

Proof: Since α1 �= 0, ‖α1‖ > 0. Put ξ1 =α1‖α1‖ . Then ‖ξ1‖ = 1.

Suppose an orthonormal set {ξ1, . . . , ξr} has been found so that each ξk, k = 1, . . . , rsatisfies (10.1) and span{α1, . . . , αr} = span{ξ1, . . . , ξr}. Assume r < s. Let

α′r+1 = αr+1 −r∑

j=1

〈ξj , αr+1〉ξj .

Then for each ξi, 1 ≤ i ≤ r,

〈ξi, α′r+1〉 = 〈ξi, αr+1〉 −r∑

j=1

〈ξj , αr+1〉〈ξi, ξj〉 = 〈ξi, αr+1〉 −r∑

j=1

〈ξj , αr+1〉δij = 0.

Since each ξk ∈ span{α1, . . . , αk} for 1 ≤ k ≤ r, α′r+1 ∈ span{α1, . . . , αr, αr+1}. Alsoα′r+1 �= 0, for otherwise αr+1 will be a linear combination of α1, . . . , αr.

Put ξr+1 =α′r+1

‖α′r+1‖ . {ξ1, . . . , ξr+1} is an orthonormal set with the desired properties.

Also as αr+1 ∈ span{ξ1, . . . , ξr, α′r+1} = span{ξ1, . . . , ξr, ξr+1} we have

span{α1, . . . , αr+1} = span{ξ1, . . . , ξr+1}

We continue this process until r = s. �

Applying the Gram-Schmidt process to a basis of a finite dimensional inner productspace we have the following corollary:

Corollary 10.1.11 Let V be a finite dimensional inner product space. Then V has anorthonormal basis.

Page 97: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 213

Corollary 10.1.12 Let V be a finite dimensional inner product space. For any givenunit vector α, there is an orthonormal basis with α as the first element.

Proof: Extend the linearly independent set {α} to be a basis of V with α as thefirst element. Applying the Gram-Schmidt process to this basis we obtain a desiredorthonormal basis. �

Examples 10.1.131. Let V = R4. It is clear that α1 = (0, 1, 1, 0), α2 = (0, 5,−3,−2) and α3 =

(−3,−3, 5,−7) are linearly independent. We would like to find an orthonormal basisof span{α1, α2, α3}.Put ξ1 =

α1‖α1‖ =

1√2(0, 1, 1, 0).

Let α′2 = α2 − 〈ξ1, α2〉ξ1 = 2(0, 2,−2,−1). Thus ξ2 = α′2

‖α′2‖ =

13(0, 2,−2,−1).

Let α′3 = α3 − 〈ξ1, α3〉ξ1 − 〈ξ2, α3〉ξ2 = (−3,−2, 2,−8). Hence

ξ3 =α′3

‖α′3‖ =

19(−3,−2, 2,−8). �

2. Let V be the vector space of all real continuous functions defined on closed interval[0, 1]. V has an inner product defined by 〈f, g〉 = ∫ 1

0 f(x)g(x)dx. We would like tofind an orthonormal basis of span{1, x, x2}.

Let f1(x) = 1, f2(x) = x, f3(x) = x2. Then ‖f1‖ =∫ 1

0dx = 1. Put g1 = f1.

Let f ′2(x) = f2(x)− 〈g1, f2〉g1 = x−∫ 1

0xdx = x− 1

2, so

‖f ′2‖ =√∫ 1

0

(x− 1

2

)2

dx =1

2√3. Put g2(x) =

f ′2‖f ′2‖

= 2√3(x− 1

2).

Let f ′3(x) = f3(x)− 〈g1, f3〉g1(x)− 〈g2, f3〉g2(x)

= x2 −∫ 1

0x2dx−

[2√3

∫ 1

0x2

(x− 1

2

)dx

]2√3

(x− 1

2

)= x2 − x+

1

6.

Then ‖f ′3‖ =√∫ 1

0

(x2 − x+

1

6

)2

dx =1

6√5.

Put g3(x) =f ′3‖f ′3‖

= 6√5

(x2 − x+

1

6

).

The required orthonormal basis is{1, 2√3(x− 1

2), 6√5(x2 − x+ 1

6)}. �

Application–QR decomposition

Let α1, . . . , αs be s linearly independent vectors in an inner product space V . Bythe Gram-Schmidt process, we obtain

α′1 = α1, ξ1 =α′1‖α′1‖

, and for k > 1, α′k = αk−〈ξ1, αk〉ξ1−· · ·−〈ξk−1, αk〉ξk−1, ξk =α′k‖α′k‖

.

Page 98: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

214 Inner Product Spaces and Unitary Transformations

Put rkk = ‖α′k‖ for k = 1, . . . , s and rik = 〈ξi, αk〉 for i < k, k = 2, . . . , s. Thus,α1 = α′1 = ‖α′1‖ξ1 = r11ξi. Also from α′k = ‖α′k‖ξk = rkkξk = αk−r1kξ1−· · ·−rk−1 kξk−1for k = 2, . . . , s, we have

αk = r1kξ1 + · · ·+ rk−1 kξk−1 + rkkξk for k = 1, 2, . . . , s. (10.2)

Suppose dimV = m and with respect to some basis αk and ξk have coordinate

vectors

⎛⎜⎜⎝a1k...

amk

⎞⎟⎟⎠ and

⎛⎜⎜⎝q1k...

qmk

⎞⎟⎟⎠ , respectively. Then the relations (10.2) in the matrix form

are

⎛⎜⎜⎜⎜⎜⎜⎜⎝

a1k...

ajk...

amk

⎞⎟⎟⎟⎟⎟⎟⎟⎠=

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

k∑i=1

rikq1i

...k∑

i=1rikqji

...k∑

i=1rikqmi

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠=

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

k∑i=1

q1irik

...k∑

i=1qjirik

...k∑

i=1qmirik

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠.

Thus

A =

⎛⎜⎜⎜⎜⎝a11 a12 · · · a1s

a21 a22 · · · a2s...

......

...

am1 am2 · · · ams

⎞⎟⎟⎟⎟⎠ =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

q11r11 q11r12 + q12r22 · · ·s∑

i=1q1iris

q21r11 q21r12 + q22r22 · · ·s∑

i=1qmiris

......

......

qm1r11 qm1r12 + qm2r22 · · ·s∑

i=1qmiris

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

=

⎛⎜⎜⎜⎜⎝q11 q12 · · · q1s

q21 q22 · · · q2s...

......

...

qm1 qm2 · · · qms

⎞⎟⎟⎟⎟⎠⎛⎜⎜⎜⎜⎝r11 r12 · · · r1s

0 r22 · · · r2s...

.... . .

...

0 0 · · · rss

⎞⎟⎟⎟⎟⎠ = QR.

Here Q = (qij) is an m × n matrix with orthonormal columns and R = (rij) is ans × s invertible upper triangular matrix. Therefore, we have proved the so-called QRdecomposition theorem:

Theorem 10.1.14 Suppose A is an m× n matrix of rank n. Then A = QR, where Qis an m×n matrix with orthonormal columns and R is an n×n invertible upper trianglematrix.

Example 10.1.15 Let A =

⎛⎜⎜⎜⎜⎝1 1 2

1 2 3

1 2 1

1 1 6

⎞⎟⎟⎟⎟⎠. Clearly, rank(A) = 3. In this case, we let V =

Page 99: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 215

R4 under the usual dot product. And we choose the standard basis as an orthonormalbasis for V .

Then α1 = (1, 1, 1, 1) = α′1, ξ1 =12(1, 1, 1, 1). Thus α1 = 2ξ1.

Now α2 = (1, 2, 2, 1), α′2 = α2 − 〈ξ1, α2〉ξ1 = α2 − 3ξ1 = 12(−1, 1, 1,−1). Hence

ξ2 =12(−1, 1, 1,−1) and α2 = 3ξ1 + ξ2.

α3 = (2, 3, 1, 6), α′3 = α3 − 〈ξ1, α3〉ξ1 − 〈ξ2, α3〉ξ2 = α3 − 6ξ1 + 2ξ2 = (−2, 1,−1, 2).Hence ξ3 =

1√10(−2, 1,−1, 2) and α3 = 6ξ1 − 2ξ2 +

√10ξ3.

Thus Q =

⎛⎜⎜⎜⎜⎜⎝12 − 1

2 − 2√10

12

12

1√10

12

12 − 1√

1012 −1

22√10

⎞⎟⎟⎟⎟⎟⎠, R =

⎛⎜⎝ 2 3 6

0 1 −20 0

√10

⎞⎟⎠ and A = QR. �

Definition 10.1.16 Let {ξ1, ξ2, . . . , } be an orthonormal set in an inner product spaceV and let α ∈ V . The scalars ai = 〈ξi, α〉, i = 1, 2, . . . are called the Fourier coefficientsof α with respect to this orthonormal set.

Suppose ξ is a unit vector in R3 (or generally in Rn). The absolute value of Fouriercoefficient 〈ξ, α〉 of α is the length of α projected onto the vector ξ. If we identify thepoint P with its position vector α, then α − 〈ξ, α〉ξ is a vector orthogonal to ξ. Thedistance from P to the straight line passing through the origin with the direction ξ is thelength (norm) of the vector α−〈ξ, α〉ξ. This geometrical property can be generalized toproject a point onto a plane passing through the origin (a subspace) in a general innerproduct space.

Theorem 10.1.17 Let V be an inner product space. Suppose {ξ1, . . . , ξk} is an or-

thonormal set in V . For α ∈ V , we have minxi∈F1≤i≤k

∥∥∥∥α− k∑i=1

xiξi

∥∥∥∥ =

∥∥∥∥α− k∑i=1

aiξi

∥∥∥∥, whereai = 〈ξi, α〉 are the Fourier coefficients of α. Moreover,

∥∥∥∥α− k∑i=1

xiξi

∥∥∥∥ =

∥∥∥∥α− k∑i=1

aiξi

∥∥∥∥if and only if xi = ai for all i. Also

k∑i=1|ai|2 ≤ ‖α‖2 and

⟨ξj , α−

k∑i=1

aiξi

⟩= 0, ∀j =

1, . . . , k. Hence, we see that α−k∑

i=1aiξi is orthogonal to each vector in span{ξ1, . . . , ξk}.

Proof: Consider∥∥∥∥∥α−k∑

i=1

xiξi

∥∥∥∥∥2

=

⟨α−

k∑i=1

xiξi, α−k∑

i=1

xiξi

⟩= 〈α, α〉 −

k∑i=1

xiai −k∑

i=1

xiai +

k∑i=1

xixi

= ‖α‖2 +k∑

i=1

(ai − xi)(ai − xi)−k∑

i=1

|ai|2

= ‖α‖2 +k∑

i=1

|ai − xi|2 −k∑

i=1

|ai|2.

Page 100: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

216 Inner Product Spaces and Unitary Transformations

Thus

∥∥∥∥α− k∑i=1

xiξi

∥∥∥∥2 ≥ ‖α‖2 − k∑i=1|ai|2 =

∥∥∥∥α− k∑i=1

aiξi

∥∥∥∥2. The equality holds if and

only ifk∑

i=1|ai−xi|2 = 0. This is equivalent to xi = ai ∀i = 1, . . . , k. That is, the minimum

of

∥∥∥∥α− k∑i=1

xiξi

∥∥∥∥ is attained when xi = ai ∀i = 1, . . . , k. Since ‖α||2 −k∑

i=1|ai|2 =∥∥∥∥α− k∑

i=1aiξi

∥∥∥∥2 ≥ 0, we havek∑

i=1|ai|2 ≤ ‖α‖2. It is clear that

⟨ξj , α−

k∑i=1

aiξi

⟩= 0 for

all j. �

Application–least squares problem

Let V be an inner product space with W as its finite dimensional subspace. Givenα ∈ V \W . We want to find β ∈W such that ‖α−β‖ = min

ξ∈W‖α− ξ‖. One way of doing

this is to find an orthonormal basis of W and proceed as above.An alternative method is to pick any basis, say A = {η1, . . . , ηk}, of W . Suppose

β =k∑

j=1xjηj ∈ W is the required vector, its existence is asserted in Theorem 10.1.17

yet to be determined. Then by Theorem 10.1.17 we have 〈ηi, α − β〉 = 0 ∀i = 1, . . . , k.

So

⟨ηi, α−

k∑j=1

xjηj

⟩= 0 and then 〈ηi, α〉 =

k∑j=1〈ηi, ηj〉xj . This system has the matrix

form GX = N , where G = (〈ηi, ηj〉) is the representing matrix of the inner product withrespect to A . The determinant of G is the Gram determinant of the vectors η1, . . . , ηk

with respect to the inner product and N =

⎛⎜⎜⎝〈η1, α〉

...

〈ηk, α〉

⎞⎟⎟⎠. By Theorem 9.5.8, G is invertible

and hence we have a unique solution β. Such vector β is called the best approximationto α in the least squares sense.

Example 10.1.18 Let W = {(x, y, z, w) ∈ R4 | x− y− z+w = 0} be a subspace of R4

under the usual inner product (i.e., the dot product). Find a vector in W that is closestto α = (−2, 1, 2, 1).Solution: Clearly {η1 = (1, 1, 0, 0), η2 = (1, 0, 1, 0), η3 = (−1, 0, 0, 1)} is a basis

of W . Then G =(〈ηi, ηj〉

)=

⎛⎜⎝ 2 1 −11 2 −1

−1 −1 2

⎞⎟⎠ and N =

⎛⎜⎝ −10

3

⎞⎟⎠. Solve the

equation GX = N we get X =

⎛⎜⎝ x1

x2

x3

⎞⎟⎠ =

⎛⎜⎝ 0

1

2

⎞⎟⎠. Thus, the required vector is

0η1 + 1η2 + 2η3 = (−1, 0, 1, 2). �

In most applications, our inner product space V is Rm and the corresponding innerproduct is the usual dot product. In this content, we rephrase the least squares problem

Page 101: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 217

as follows:Given A ∈ Mm,k(R) and B ∈ Mm,1(R). We want to find a matrix X0 ∈ Mk,1(R)

such that ‖AX0 − B‖ is minimum among all k × 1 matrices X. Let W = C(A) be thecolumn space of A, i.e., W = {β | β = AX,X ∈ Rk}. Then any vector in W is of theform AX for some X ∈ Mk,1(R). By Theorem 10.1.17, we know that X0 is such thatAX0 − B is orthogonal to each vector in W . Thus using the dot product as our innerproduct, we must have

(AX)T (AX0 −B) = 0 for all X ∈Mk,1(R).

That is XTAT (AX0 − B) = 0 for all X. This could happen only if AT (AX0 − B) = 0.That means X0 is a solution of the so-called normal equation

ATAX = ATB.

Moreover if rank(A) = k, then ATA is non-singular (Exercise 9.6-9) and the normalequation has unique solution in X0 = (ATA)−1(ATB).

Geometrically the solution of the normal equation enables us to find the vector inW that has the least distance from B. Thus, AX0 is just the ‘projection’ of B onto thesubspace W . As B varies, we get the so-called projection matrix

P = A(ATA)−1AT .

It is easy to see that P 2 = P and P is symmetric.

Example 10.1.19 Let W = span{(1, 1, 1, 1)T , (−1, 0, 1, 2)T , (0, 1, 2, 3)T } and letB = (0, 2, 1, 2)T . Then

A =

⎛⎜⎜⎜⎜⎝1 −1 0

1 0 1

1 1 2

1 2 3

⎞⎟⎟⎟⎟⎠ .

Then the normal equation ATAX = ATB takes the form⎛⎜⎝ 4 2 6

2 6 8

6 8 14

⎞⎟⎠⎛⎜⎝x1

x2

x3

⎞⎟⎠ =

⎛⎜⎝ 5

5

10

⎞⎟⎠ .

One can get a solution x1 = 1, x2 =12 and x3 = 0. Then the vector in W that yields

the least distance is given by

AX0 =

⎛⎜⎜⎜⎜⎝12

132

2

⎞⎟⎟⎟⎟⎠ .

Thus the least distance is√(12 − 0)2 + (1− 2)2 + (32 − 1)2 + (2− 2)2 =

√62 . �

Page 102: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

218 Inner Product Spaces and Unitary Transformations

Following example is a well known problem in statistics called regression.

Example 10.1.20 We would like to find a polynomial of degree n such that it fitsgiven m points (x1, y1), . . . , (xm, ym) in the plane in the least squares sense, in general,

m > n+1. We put y =n∑

j=0cjx

j , where cj ’s are to be determined such thatm∑i=1

(yi− yi)2

is the least. Put yi =n∑

j=0cjx

ji , i = 1, . . . ,m. Then we have to solve the system

⎛⎜⎜⎜⎜⎝1 x1 x21 · · · xn11 x2 x22 · · · xn2...

......

......

1 xm x2m · · · xnm

⎞⎟⎟⎟⎟⎠⎛⎜⎜⎜⎜⎝c0

c1...

cn

⎞⎟⎟⎟⎟⎠ =

⎛⎜⎜⎜⎜⎝y1

y2...

ym

⎞⎟⎟⎟⎟⎠in the least squares sense. Thus we need to find an X0 such that

‖AX0 −B‖ = minX∈Rn+1

‖AX −B‖

with

A =

⎛⎜⎜⎜⎜⎝1 x1 x21 · · · xn11 x2 x22 · · · xn2...

......

......

1 xm x2m · · · xnm

⎞⎟⎟⎟⎟⎠ , X =

⎛⎜⎜⎜⎜⎝c0

c1...

cn

⎞⎟⎟⎟⎟⎠ and B =

⎛⎜⎜⎜⎜⎝y1

y2...

ym

⎞⎟⎟⎟⎟⎠ .

When n = 1, the problem is called the linear regression problem. In this case

A =

⎛⎜⎜⎜⎜⎝1 x1

1 x2...

...

1 xm

⎞⎟⎟⎟⎟⎠ , X =

(c0

c1

)and B =

⎛⎜⎜⎜⎜⎝y1

y2...

ym

⎞⎟⎟⎟⎟⎠ .

Now W = span{(1, 1, . . . , 1), (x1, x2, . . . , xn)}. In practice as, x1, . . . , xn are not allequal, so dimW = 2. Then

ATA =

⎛⎜⎜⎝ nn∑

i=1xi

n∑i=1

xin∑

i=1x2i

⎞⎟⎟⎠ =

⎛⎝ n nx

nxn∑

i=1x2i

⎞⎠ , ATB =

⎛⎜⎜⎝n∑

i=1yi

n∑i=1

xiyi

⎞⎟⎟⎠ =

⎛⎝ nyn∑

i=1xiyi

⎞⎠ ,

where x = 1n

n∑i=1

xi and y = 1n

n∑i=1

yi.

Then det(ATA) = nn∑

i=1x2i − n2x2 = n

n∑i=1

(xi − x)2. Hence

X0 = (ATA)−1(ATB) =1

nn∑

i=1(xi − x)2

⎛⎜⎜⎝nyn∑

i=1x2i − nx

n∑i=1

xiyi

nn∑

i=1xiyi − n2xy

⎞⎟⎟⎠ .

Page 103: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 219

So

c1 =

n∑i=1

xiyi − nxy

n∑i=1

(xi − x)2=

n∑i=1

(xi − x)(yi − y)

n∑i=1

(xi − x)2and c0 =

yn∑

i=1x2i − x

n∑i=1

xiyi

n∑i=1

(xi − x)2.

Exercise 10.1

10.1-1. Let V be an inner product space over F. Prove the following polarization iden-tities:

(a) If F ⊆ R, then

〈α, β〉 = 1

4‖α+ β‖2 − 1

4‖α− β‖2, ∀α, β ∈ V.

(b) If i ∈ F ⊆ C, then

〈α, β〉 = 1

4‖α+ β‖2 − 1

4‖α− β‖2 − i

4‖α+ iβ‖2 + i

4‖α− iβ‖2, ∀α, β ∈ V.

10.1-2. Let V be an inner product space over F, where F ⊆ R or i ∈ F ⊆ C. Prove theparallelogram law:

‖α+ β‖2 + ‖α− β‖2 = 2‖α‖2 + 2‖β‖2, ∀α, β ∈ V.

10.1-3. Let V be a real inner product space. Show that if α, β ∈ V are orthogonal ifand only if ‖α+ β‖2 = ‖α‖2 + ‖β‖2 (Pythagoream theorem).

10.1-4. Given the basis {(1, 0, 1, 0), (1, 1, 0, 0), (0, 1, 1, 1), (0, 1, 1, 0)} of R4. Apply theGram-Schmidt process to obtain an orthonormal basis.

10.1-5. Find an orthonormal basis of R3 starting with 1√3(1, 2,−1).

10.1-6. Let V = C0[0, 1] be the inner product space of real continuous defined on [0, 1].The inner product is defined by

〈f, g〉 =∫ 1

0f(t)g(t)dt.

Apply the Gram-Schmidt process to the standard basis {1, x, x2, x3} of the sub-space P4(R).

10.1-7. Let S be a finite set of mutually non-zero orthogonal vectors in an inner productspace V . Show that if the only vector orthogonal to each vector in S is the zerovector, then S is a basis of V .

Page 104: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

220 Inner Product Spaces and Unitary Transformations

10.1-8. Find an orthonormal basis for the plane x+ y + 2z = 0 in R3.

10.1-9. Let W = span{(1, 1, 0, 1), (3, 1, 1,−1)}. Find a vector in W that is closest to(0, 1,−1, 1).

10.1-10. Find the minimum distance of (1, 1, 1, 1) to the subspace{(x1, x2, x3, x4) ∈ R4 | x1 + x2 + x3 + x4 = 0}.

10.1-11. Find a linear equation that fits the data (0, 1), (3, 4), (6, 5) in the least squaressense.

10.1-12. Find a quadratic equation that fits the data (1, 4), (2, 0), (−1, 1), (0, 2) in theleast squares sense.

10.1-13. Find the least squares solution to each of the following system:

(a)

⎛⎜⎝1 1

2 −30 0

⎞⎟⎠(x1

x2

)=

⎛⎜⎝3

1

2

⎞⎟⎠; (b)

⎛⎜⎜⎜⎜⎝1 1 1

−1 1 1

0 −1 1

1 0 1

⎞⎟⎟⎟⎟⎠⎛⎜⎝x1

x2

x3

⎞⎟⎠ =

⎛⎜⎜⎜⎜⎝4

0

1

2

⎞⎟⎟⎟⎟⎠;

(c)

⎛⎜⎝ 1 1 3

−1 3 1

1 2 4

⎞⎟⎠⎛⎜⎝x1

x2

x3

⎞⎟⎠ =

⎛⎜⎝−208

⎞⎟⎠.

10.1-14. Find the QR decomposition of

⎛⎜⎜⎜⎜⎝1√2 0

0 1 0

0 1 2

0 0 0

⎞⎟⎟⎟⎟⎠.

§10.2 The Adjoint Transformation

An inner product on a vector space V is a positive definite Hermitian form. So if wefix the first variable of the inner product, then it becomes a linear form. We shall showthat this is a one-one corresponding between V and V if V is finite dimensional.

Theorem 10.2.1 Let V be a finite dimensional inner product space. Then for anylinear form φ ∈ V , there exists a unique γ ∈ V such that φ(α) = 〈γ, α〉.

Proof: Let {α1, . . . , αn} be an orthogonal basis of V and let {φ1, . . . , φn} be the dual

basis. Then φ =n∑

i=1biφi with bi = φ(αi). Define γ =

n∑i=1

biαi. Then for each αj ,

〈γ, αj〉 =⟨

n∑i=1

biαi, αj

⟩=

n∑i=1

bi〈αi, αj〉 = bj = φ(αj).

Page 105: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 221

Thus if α ∈ V , α =n∑

j=1ajαj ,

φ(α) =n∑

j=1

ajφ(αj) =n∑

j=1

aj〈γ, αj〉

=

⟨γ,

n∑j=1

ajαj

⟩= 〈γ, α〉.

If γ′ is another vector in V such that φ(α) = 〈γ′, α〉, then 〈γ, α〉 = 〈γ′, α〉 ∀α ∈ V .That is, 〈γ − γ′, α〉 = 0 ∀α ∈ V . In particular, for α = γ − γ′, we have 〈γ − γ′, γ − γ′〉 =‖γ − γ′‖2 = 0. Hence γ = γ′. �

Of course, if V is inner product space with another inner product, then the vector γmust be changed.

Theorem 10.2.2 Let V be a finite dimensional inner product space. Let Λ : V → Vbe the map which associates each φ ∈ V a unique γ ∈ V such that φ(α) = 〈Λ(φ), α〉 =〈γ, α〉. Then Λ is bijective and conjugate linear, that is, Λ(aφ + ψ) = aΛ(φ) + Λ(ψ).Also Λ−1 is conjugate linear.

Proof: Suppose β ∈ V . Define φ : V → F by φ(α) = 〈β, α〉. Clearly, φ ∈ V andby Theorem 10.2.1, Λ(φ) = β. Thus Λ is surjective. Suppose φ, ψ ∈ V are such thatΛ(φ) = Λ(ψ). Then by the definition of Λ, φ(α) = 〈Λ(φ), α〉 = 〈Λ(ψ), α〉 = ψ(α) ∀α ∈ V .Thus φ = ψ and hence Λ is injective.

Now let φ, ψ ∈ V , a ∈ F and α ∈ V . Consider

〈Λ(aφ+ψ), α〉 = (aφ+ψ)(α) = aφ(α)+ψ(α) = a〈Λ(φ), α〉+〈Λ(ψ), α〉 = 〈aΛ(φ)+Λ(ψ), α〉.Hence 〈Λ(aφ+ψ)− aΛ(φ)−Λ(ψ), α〉 = 0 ∀α ∈ V . Therefore, Λ(aφ+ψ) = aΛ(φ)+Λ(ψ).Similarly, Λ−1 is conjugate linear. �

Corollary 10.2.3 If F is a subfield of R, then Λ is linear and hence an isomorphism.Thus for Euclidean space V , any linear form on V is an inner product with a fixed vectorin V .

Theorem 10.2.4 Let σ ∈ L(V, V ), where V is a finite dimensional inner product space.Then there exists a unique σ∗ ∈ L(V, V ) such that 〈σ∗(α), β〉 = 〈α, σ(β)〉 ∀α, β ∈ V .

Proof: For α ∈ V , define φ ∈ V by φ(β) = 〈α, σ(β)〉 for β ∈ V . Put σ∗(α) = Λ(φ),where Λ is defined in Theorem 10.2.2. Then we have 〈σ∗(α), β〉 = 〈Λ(φ), β〉 = φ(β) =〈α, σ(β)〉. Now for α1, α2, β ∈ V , a ∈ F,

〈σ∗(aα1 + α2), β〉 = 〈aα1 + α2, σ(β)〉 = a〈α1, σ(β)〉+ 〈α2, σ(β)〉= a〈σ∗(α1), β〉+ 〈σ∗(α2), β〉 = 〈aσ∗(α1) + σ∗(α2), β〉.

Since β is arbitrary, we have σ∗(aα1 +α2) = aσ∗(α1) + σ∗(α2). It is easy to see that σ∗

with this property is unique. �

Note that if there exists a linear transformation σ∗ ∈ L(V, V ) such that 〈σ∗(α), β〉 =〈α, σ(β)〉 ∀α, β ∈ V for a given σ ∈ L(V, V ), where V is a general inner product space,then σ∗ is unique.

Page 106: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

222 Inner Product Spaces and Unitary Transformations

Definition 10.2.5 Let σ ∈ L(V, V ), where V is an inner product space. The uniquelinear transformation σ∗ : V → V defined by 〈σ∗(α), β〉 = 〈α, σ(β)〉 ∀α, β ∈ V is calledthe adjoint of σ.

By Theorem 10.2.4, for finite dimensional inner product space V and any lineartransformation on V , the adjoint always exists.

Theorem 10.2.6 Let σ ∈ L(V, V ), where V is an inner product space. Suppose σ∗

exists. Then σ∗∗ = (σ∗)∗ exists and equals σ.

Proof: Let α, β ∈ V . Then from the definition of adjoint transformation, we must havethat

〈α, σ∗(β)〉 = σ∗(β), α〉 = 〈β, σ(α)〉 = 〈σ(α), β〉.Since α, β are arbitrary, we have σ∗∗ exists and σ∗∗ = σ. �

Suppose V is a finite dimensional inner product space. Let σ : V → V be a lineartransformation. Recall that σ : V → V is defined by σ(φ) = φ ◦ σ for φ ∈ V . Now forα, β ∈ V , we have

〈Λ ◦ σ ◦ Λ−1(α), β〉 = 〈Λ(σ ◦ Λ−1(α)), β〉 = (σ ◦ Λ−1(α))(β)= Λ−1(α)(σ(β)) = 〈α, σ(β)〉 = 〈σ∗(α), β〉.

Hence σ∗ = Λ ◦ σ ◦ Λ−1. Let A = (aij) and A′ = (a′ij) be respectively matrices repre-senting σ and σ∗ with respect to some orthonormal basis {ξ1, . . . , ξn}. Then

〈σ∗(ξj), ξk〉 = 〈ξj , σ(ξk)〉 =⟨ξj ,

n∑i=1

aikξi

⟩=

n∑i=1

aik〈ξj , ξi〉 = ajk.

On the other hand

〈σ∗(ξj), ξk〉 = 〈ξj , σ(ξk)〉 =⟨

n∑i=1

a′ijξi, ξk

⟩=

n∑i=1

a′ik〈ξj , ξk〉 = a′kj .

Thus A′ = AT = A∗.

Example 10.2.7 Let σ : R2 → R2 be given by σ(x, y) = (x + y, 2x − y). Then with

the standard basis of R2, σ has the representing matrix A =

(1 1

2 −1

). Thus σ∗ is

represented by AT with respect to the standard basis.

Now

(1 2

1 −1

)(x

y

)=

(x+ 2y

x− y

). Hence σ∗(x, y) = (x+ 2y, x− y). �

Proposition 10.2.8 Let V be a finite dimensional inner product space. Suppose W isa subset of V . Then Λ(W 0) is a subspace of V . Moreover,

Λ(W 0) = {β ∈ V | 〈β, α〉 = 0 ∀α ∈W}.

Page 107: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 223

Proof: Let φ, ψ ∈W 0 ⊆ V , a ∈ F. Then aΛ(φ)+Λ(ψ) = Λ(aφ+ψ) ∈ Λ(W 0) since W 0

is a subspace of V . Let U = {β ∈ V | 〈β, α〉 = 0 ∀α ∈ W}. From Theorem 10.2.2 forany β ∈ V there is a unique φ ∈ V such that β = Λ(φ). Then β ∈ Λ(W 0) if and onlyif β = Λ(φ) for some φ ∈ W 0. This is equivalent to 〈β, α〉 = φ(α) = 0 ∀α ∈ W . HenceΛ(W 0) = U . �

In general inner product space V , we shall use W⊥ to denote the subspace

{β ∈ V | 〈β, α〉 = 0 ∀α ∈W}

in the sequel, where W ⊆ V .

Theorem 10.2.9 Let V be an n-dimensional inner product space. If W is a subspaceof V of dimension r, then W⊥ is a subspace of V of dimension n − r. Moreover,W ∩W⊥ = {0} and hence V = W ⊕W⊥.

Proof: Let α ∈ W ∩W⊥. Then by the definition of W⊥, 0 = 〈α, α〉 = ‖α‖2. Henceα = 0. Thus W ∩W⊥ = {0}. Thus it suffices to show that V = W +W⊥.

For α ∈ V , by Theorem 10.1.17 there is a unique β ∈ W such that β is the bestapproximation to α that lies in W . Moreover α − β is orthogonal to any vector in W ,i.e., α− β ∈W⊥. Then

α = β + (α− β) ∈W +W⊥.

Hence we have the theorem. �

In the proof above, if we denote the best approximation vector β to α in W as σ(α),then one can show that σ ∈ L(V, V ). The vector β = σ(α) is also called the orthogonalprojection of α on W . The linear transformation σ is called the orthogonal projectionof V on W .

Definition 10.2.10 Suppose W1 and W2 are subspaces of an inner product space V .If V = W1 + W2 and vectors of W1 are orthogonal to vectors of W2, then W1 is saidto be an orthogonal complement of W2 and is denoted by V = W1⊥W2. Note thatW1 ∩W2 = {0}.

Theorem 10.2.11 Let V be a finite dimensional inner product space. Then any sub-space W of V has a unique orthogonal complement.

Proof: The existence of orthogonal complement follows from Theorem 10.2.9. Theuniqueness is left as an exercise. �

Theorem 10.2.12 Suppose W is a subspace of an inner product space V invariantunder a linear transformation σ. Then W⊥ is invariant under σ∗, the adjoint transfor-mation of σ.

Proof: Let α ∈ W⊥. Then for β ∈ W , we have 〈σ∗(α), β〉 = 〈α, σ(β)〉 = 0 sinceσ(β) ∈W and α ∈W⊥. Hence σ∗(α) ∈W⊥. �

Page 108: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

224 Inner Product Spaces and Unitary Transformations

Theorem 10.2.13 Let σ ∈ L(V, V ), where V is an inner product space, and let σ∗ beits adjoint transformation. Then ker(σ∗) = σ(V )⊥.

Proof: Let α ∈ ker(σ∗). Then σ∗(α) = 0. Thus ∀β ∈ V , 〈α, σ(β)〉 = 〈σ∗(α), β〉 = 0.That is, α ∈ σ(V )⊥.

Conversely, suppose α ∈ σ(V )⊥. Then 〈σ∗(α), σ∗(α)〉 = 〈α, σ(σ∗(α))〉 = 0. Thusσ∗(α) = 0. That is, α ∈ ker(σ∗). �

Theorem 10.2.14 Let V be a finite dimensional inner product space. For each con-jugate bilinear form f , there exists a unique linear transformation σ on V such thatf(α, β) = 〈α, σ(β)〉 ∀α, β ∈ V .

Proof: For each α ∈ V , f(α, β) is linear in β. Thus by Theorem 10.2.1, there exists aunique vector γ such that f(α, β) = 〈γ, β〉 ∀β ∈ V .

Let τ : V → V be the mapping which associates α to γ. We claim that τ is linear.For any α1, α2, β ∈ V and a ∈ F,

〈τ(aα1 + α2), β〉 = f(aα1 + α2, β) = af(α1, β) + f(α2, β) = 〈aτ(α1), β〉+ 〈τ(α2), β〉= 〈aτ(α1) + τ(α2), β〉.

Thus τ(aα1 + α2) = aτ(α1) + τ(α2).

Let σ = τ∗. Then we have f(α, β) = 〈τ(α), β〉 = 〈α, σ(β)〉. Thus, σ is the requiredlinear transformation. Clearly, σ is unique. �

The linear transformation σ defined above is called the linear transformation asso-ciated with the conjugate bilinear form f .

Theorem 10.2.15 Let V be a finite dimensional inner product space. Suppose f is aconjugate bilinear form on V and σ is the associated linear transformation. Then f andσ are represented by the same matrix with respect to an orthonormal basis.

Proof: Let {ξ1, . . . , ξn} be an orthonormal basis and let A = (aij) be the matrixrepresenting σ with respect to this basis. Then

f(ξi, ξj) = 〈ξi, σ(ξj)〉 =⟨ξi,

n∑k=1

akjξk

⟩= aij .

By the above theorem we can define that the eigenvalues and eigenvectors of a con-jugate bilinear form as the eigenvalues and eigenvectors of the associated linear trans-formation. It is easy to see that there is a bijection between linear transformations andconjugate bilinear forms.

Definition 10.2.16 A linear transformation σ : V → V is said to be self-adjoint ifσ∗ = σ.

Page 109: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 225

Proposition 10.2.17 Suppose V is a finite dimensional inner product space. A lineartransformation σ : V → V is self-adjoint if and only if the matrix representing σ withrespect to an orthonormal basis is Hermitian.

Proof: This follows from the paragraph between Theorem 10.2.6 and Example 10.2.7.�

Exercise 10.2

10.2-1. Suppose W1 and W2 are two subspace of an inner product space V . Prove thatif W1 ⊆W2, then W⊥

1 ⊇W⊥2 .

10.2-2. Suppose V is an inner product space. Show that if U , W and W ′ are subspacesof V such that V⊥W = U⊥W ′, then W = W ′.

10.2-3. Let V be a finite dimensional inner product space. Show that if σ ∈ L(V, V ) isan isomorphism, then (σ∗)−1 = (σ−1)∗.

§10.3 Isometry Transformations

In the real world (R3), there are many motions of rigid objects. For example, trans-lations and rotations (rotational dynamics in physics). They preserve angle of any twovectors. Now we shall consider an abstract case in mathematical sense called orthogonaland unitary transformations. They will preserve the inner product, and hence preserveangles between vectors.

Definition 10.3.1 Let V be an inner product space over F. A linear transformationσ : V → V is called an isometry if ‖σ(α)‖ = ‖α‖ ∀α ∈ V . If F = R, then σ is called anorthogonal transformation. If F = C, then σ is called a unitary transformation. Notethat an isometry is always injective.

Theorem 10.3.2 Let σ ∈ L(V, V ), where V is an inner product space over F withF ⊆ R or i ∈ F. Then σ is an isometry if and only if σ preserves inner product; that is,〈σ(α), σ(β)〉 = 〈α, β〉 ∀α, β ∈ V .

Proof: If 〈σ(α), σ(β)〉 = 〈α, β〉 for α, β ∈ V , then in particular, when α = β, we have‖σ(α)‖2 = ‖α‖2. Hence σ is an isometry.

Now suppose σ is an isometry. For F ⊆ R, by polarization identity we have

〈α, β〉 = 1

4‖α+ β‖2 − 1

4‖α− β‖2

=1

4‖σ(α+ β)‖2 − 1

4‖σ(α− β)‖2

=1

4‖σ(α) + σ(β)‖2 − 1

4‖σ(α)− σ(β)‖2 = 〈σ(α), σ(β)〉.

Page 110: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

226 Inner Product Spaces and Unitary Transformations

For i ∈ F ⊆ C, by polarization identity we have

〈α, β〉 = 1

4

[‖α+ β‖2 − ‖α− β‖2 − i‖α+ iβ‖2 + i‖α− iβ‖2]=

1

4

[‖σ(α+ β)‖2 − ‖σ(α− β)‖2 − i‖σ(α+ iβ)‖2 + i‖σ(α− iβ)‖2]=

1

4

[‖σ(α) + σ(β)‖2 − ‖σ(α)− σ(β)‖2 − i‖σ(α) + iσ(β)‖2 + i‖σ(α)− iσ(β)‖2]= 〈σ(α), σ(β)〉.

Corollary 10.3.3 Let V be a finite dimensional inner product space over F, whereF ⊆ R or i ∈ F ⊆ C. If σ ∈ L(V, V ) is an isometry, then σ maps orthonormal basis toorthonormal basis.

We shall obtain a stronger result of the converse of the above corollary. It is statedas follows.

Theorem 10.3.4 Let σ be a linear transformation on an inner product space V . If σmaps orthonormal basis onto orthonormal basis, then σ is an isometry.

Proof: Let {ξi}i∈I be an orthonormal basis of V such that {σ(ξi)}i∈I is also an

orthonormal basis. Let α ∈ V . Then α =m∑k=1

aikξik for some m. Hence

‖σ(α)‖2 = 〈σ(α), σ(α)〉 =⟨

m∑k=1

aikσ(ξik),

m∑j=1

aijσ(ξij )

=

m∑k=1

m∑j=1

aikaij⟨σ(ξik), σ(ξij )

⟩=

m∑k=1

|aik |2 = ‖α‖2.

Theorem 10.3.5 Suppose V is a finite dimensional inner product space over F withF ⊆ R or i ∈ F and σ ∈ L(V, V ). Then σ is an isometry if and only if σ∗ = σ−1.

Proof: Suppose σ is an isometry. By Theorem 10.3.2, 〈σ(α), σ(β)〉 = 〈α, β〉 ∀α, β ∈ V .By Theorem 10.2.4, 〈σ(α), σ(β)〉 = 〈σ∗(σ(α)), β〉. Thus (σ∗ ◦ σ)(α) = α ∀α ∈ V . Henceσ∗ ◦ σ is the identity mapping. So σ is injective. Since V is finite dimension, σ is anisomorphism. Thus σ−1 exists. Hence σ∗ = σ−1.

Conversely, suppose σ∗ = σ−1. Then 〈σ(α), σ(β)〉 = 〈σ∗(σ(α)), β〉 = 〈α, β〉 ∀α, β ∈V . Thus σ is an isometry. �

Following we shall study the matrix representing a unitary transformation or anorthogonal transformation.

Theorem 10.3.6 Let V be a unitary space. Then a complex matrix U represents aunitary transformation with respect to an orthonormal basis if and only if U∗ = U−1.

Page 111: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 227

Proof: Suppose U = [σ]A for some orthonormal basis A . By the description betweenTheorem 10.2.6 and Example 10.2.7 we have U∗ = [σ∗]A . By Theorem 10.3.5, σ is aunitary transformation if and only if σ∗ = σ−1, if and only if U∗ = U−1. �

By the same proof we have

Corollary 10.3.7 Let V be a Euclidean space. Then a real matrix U represents anorthogonal transformation with respect to an orthonormal basis if and only if UT = U−1.

Definition 10.3.8 A complex matrix U is said to be unitary if U∗U = I. If U is a realmatrix and U∗ = UT = U−1, then U is called an orthogonal matrix.

Proposition 10.3.9 Let U ∈Mn(F). Then the followings are equivalent:

(1) U is unitary (or orthogonal).

(2) The columns of U are orthonormal.

(3) The rows of U are orthonormal.

Proof: Since U∗U = I, we have UU∗ = I. Thusn∑

r=1(U)r,i(U)r,j = δij =

n∑s=1

(U)i,s(U)j,s ∀i, j. �

Theorem 10.3.10 The matrix of transition from one orthonormal basis to another isunitary (or orthogonal if F ⊆ R).

Proof: Suppose A = {ξ1, . . . , ξn} and B = {ζ1, . . . , ζn} are two orthonormal bases

with P = (pij) as the matrix of transition from A to B. Then ζj =n∑

r=1prjξr. Thus

δij = 〈ζi, ζj〉 =⟨

n∑r=1

priξr,

n∑s=1

psjξs

⟩=

n∑r=1

priprj .

Hence P ∗P = I. �

Definition 10.3.11 Two square matrices A and B are unitary (respectively orthogonal)similar if there exists a unitary (respectively orthogonal) matrix P such that B =P−1AP = P ∗AP (or B = P−1AP = P TAP ).

It is easy to see that unitary similar and orthogonal similar are equivalence relationson Mn(F).

Exercise 10.3

10.3-1. Show that the product of unitary (respectively orthogonal) matrices is unitary(respectively orthogonal).

10.3-2. Let {ξ1, ξ2, ξ3} be an orthonormal basis of V over R or C. Find an isometry thatmaps ξ1 onto 1

3(ξ1 + 2ξ2 + 2ξ3).

10.3-3. Let A be an orthogonal matrix. Suppose Aij is the cofactor of (A)i,j . Show thatAij = (A)i,j detA.

Page 112: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

228 Inner Product Spaces and Unitary Transformations

§10.4 Upper Triangular Form

Given a square matrix A over C we know that it is similar to a Jordan form, thatis there is an invertible matrix P such that P−1AP is in Jordan form. Jordan form isa particular upper triangular matrix. In this section we shall show that every squarematric over C is unitary similar to an upper triangular matrix. For the real case, weknow that if characteristic polynomial of a real square matrix A factors into linearfactors, then it is similar to a Jordan form. Similar to the complex case, we shall showthat every real square matrix is orthogonal similar to an upper triangular matrix.

Theorem 10.4.1 Let V be a unitary space. Suppose σ ∈ L(V, V ). Then there existsan orthonormal basis with respect to which the matrix representing σ is in the uppertriangular form, that is, each entry below the main diagonal is zero.

Proof: We shall prove by mathematical induction on n = dimV . Clearly, the theoremholds for n = 1.

Assume the theorem holds for dimV < n with n ≥ 2. Suppose dimV = n. Since Vis a vector space over C, σ∗ has at least one eigenvalue. Let λ be an eigenvalue of σ∗

with unit vector ξn as its corresponding eigenvector. Put W = span(ξn)⊥. Then W is

of dimension n − 1. By Theorems 10.2.6 and 10.2.12 W is invariant under (σ∗)∗ = σ.Thus, by induction hypothesis, there is an orthonormal basis {ξ1, . . . , ξn−1} of W such

that σ(ξk) =k∑

i=1aikξi for k = 1, . . . , n− 1. Clearly, {ξ1, . . . , ξn} is a required basis. �

Corollary 10.4.2 Over the field C, every square matrix is unitary similar to an uppertriangular matrix.

Now we are going to consider the real case.

Theorem 10.4.3 Let V be a Euclidean space. Suppose σ ∈ L(V, V ) whose characteris-tic polynomial factors into real linear factors. Then there is an orthonormal basis of Vwith respect to which the matrix representing σ is in the upper triangular form.

Proof: Again, we shall prove by mathematical induction on n = dimV . Clearly, thetheorem holds for n = 1.

Now we assume that the theorem holds for all Euclidean space of dimension lessthan n with n ≥ 2. Let λ be an eigenvalue of σ with unit vector ξ1 as its correspondingeigenvector. By the Gram-Schmidt process, we obtain an orthonormal basis {ξ1, . . . , ξn}of V .

Let W = span{ξ2, . . . , ξn}. We define a mapping τ : V → W as follows. For each

α =n∑

i=1aiξi ∈ V , defined τ(α) =

n∑i=2

aiξi. Then clearly, τ is linear and τ(α) = α ∀α ∈W .

Let σ′ = (τ ◦σ)∣∣W. Then σ′ ∈ L(W,W ). To apply the induction hypothesis we have

to show that the characteristic polynomial of σ′ factors into real linear factors.

Page 113: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 229

First, we note that if α ∈ W , then (τ ◦ σ)(α) = (τ ◦ σ ◦ τ)(α). If α = cξ1 for c ∈ R,then (τ ◦ σ)(α) = λcτ(ξ1) = 0 = (τ ◦ σ ◦ τ)(α). Thus τ ◦ σ = τ ◦ σ ◦ τ . Hence by an easyinduction, we can show that

(τ ◦ σ)k = τ ◦ σk, k = 1, 2, . . . .

Suppose f(x) = knxn + kn−1xn−1 + · · · + k1x + k0 is the characteristic polynomial

of σ. Then for α ∈W ,

f(σ′)(α) = knσ′n(α) + kn−1σ′n−1 + · · ·+ k1σ

′(α) + k0α

= kn(τ ◦ σ)n(α) + kn−1(τ ◦ σ)n−1 + · · ·+ k1(τ ◦ σ)(α) + k0τ(α)

= kn(τ ◦ σn)(α) + kn−1(τ ◦ σn−1) + · · ·+ k1(τ ◦ σ)(α) + k0τ(α)

= τ(knσn(α) + kn−1σn−1(α) + · · ·+ k1σ(α) + k0α) = τ(f(σ)(α)) = 0.

Thus f(σ′) = O. Hence the minimum polynomial of σ′ divides f(x). Since f(x)factors into real linear factors, the minimum polynomial of σ′ also factors into linearfactors. Thus the characteristic polynomial of σ′ must also factor into real linear factors.By the induction hypothesis, W has an orthonormal basis {ζ2, . . . , ζn} such that σ′(ζk) =k∑

i=2aikζi for k = 2, . . . , n. Let ζ1 = ξ1. Then σ(ζ1) = σ(ξ1) = λζ1. For k ≥ 2, since

σ(ζk) =n∑

j=2bjkζj + b1kζ1 for some bjk ∈ R,

σ′(ζk) =k∑

i=2

aikζi = τ(σ(ζk)) =n∑

j=2

bjkτ(ζj) + 0 =n∑

j=2

bjkζj , for some aik ∈ R.

Thus bjk = 0 for j > k, i.e., σ(ζk) = b1kζ1+k∑

j=2bjkζj . Hence {ζ1, ζ2, . . . , ζn} is a required

basis. �

Corollary 10.4.4 Every real square matrix whose characteristic polynomial factors intoreal linear factors is orthogonal similar to an upper triangular matrix.

Theorems 10.4.1 and 10.4.3 are usually referred as Schur’s Theorems.

Given a square real matrix whose eigenvalues are all real. Following we provide analgorithm for finding an upper triangular matrix which is orthogonal similar to thissquare matrix.

Let A ∈Mn(R) be such that all its eigenvalues are real. Then by Corollary 10.4.4 Ais orthogonal similar to an upper triangular matrix. To put A into an upper triangularfrom, we proceed as follows.

Step 1. Find one eigenvalue λ1 and choose a corresponding eigenvector ξ1 with unitlength.

Step 2. Extend ξ1 to an orthonormal basis {ξ1, . . . , ξn} of Rn. Form the transitionmatrix P1, which is an orthogonal matrix. Indeed, the i-th column is ξi.

Page 114: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

230 Inner Product Spaces and Unitary Transformations

Step 3. Compute P T1 AP1, which is of the form⎛⎜⎜⎜⎜⎝

λ1 · · · · · · · · ·0... A1

0

⎞⎟⎟⎟⎟⎠ .

Here A1 is a real (n− 1)× (n− 1) matrix whose eigenvalues are also eigenvaluesof A, hence they are all real. If A1 is already upper triangular, then we aredone.

Step 4. Suppose A1 is not upper triangular, then we apply Step 1 and Step 2 to A1 toobtain an (n − 1) × (n − 1) orthogonal matrix Q1 such that QT

1 AQ1 is of theform ⎛⎜⎜⎜⎜⎝

λ2 · · · · · · · · ·0... A2

0

⎞⎟⎟⎟⎟⎠ ,

for some eigenvalue λ2 and some (n− 2)× (n− 2) matrix A2 whose eigenvaluesare also eigenvalues of A. Now let

P2 =

⎛⎜⎜⎜⎜⎝1 0 · · · 00... Q1

0

⎞⎟⎟⎟⎟⎠ .

Clearly P2 is orthogonal and

P T2 P T

1 AP1P2 =

⎛⎜⎜⎜⎜⎜⎜⎜⎝

λ1 ∗ · · · · · · · · ·0 λ2 · · · · · · · · ·0 0...

... A2

0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎠.

Note that P1P2 is an orthogonal matrix (see Exercise 10.3-1.)

Step 5. If A2 is upper triangular, then we are done. Otherwise, we repeat the abovesteps until we obtain an upper triangular matrix.

Example 10.4.5 Let A =

⎛⎜⎝ 2 2 −1−1 −1 1

−1 −2 2

⎞⎟⎠. Then 1 is an eigenvalue of A. ξ1 =

1√2(1, 0, 1) is a corresponding eigenvector of unit length. By Gram-Schmidt process, we

have an orthonormal basis{

1√2(1, 0, 1), 1√

2(1, 0,−1), (0, 1, 0)

}of R3.

Page 115: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 231

Put P1 =

⎛⎜⎜⎝1√2

1√2

0

0 0 11√2− 1√

20

⎞⎟⎟⎠. Then P T1 AP1 =

⎛⎜⎝ 1 0 0

0 3 2√2

0 −√2 −1

⎞⎟⎠. Now consider

the 2×2 matrix

(3 2

√2

−√2 −1

). This matrix has an eigenvalue 1 and a corresponding

eigenvector 1√3(√2,−1). By an easy inspection, we see that

{1√3(√2,−1), 1√

3(1,√2)}

is an orthonormal basis of R2.

Form P2 =

⎛⎜⎜⎝1 0 0

0√

23

√13

0 −√

13

√23

⎞⎟⎟⎠. Then P T2 P T

1 AP1P2 =

⎛⎜⎝ 1 0 0

0 1 3√2

0 0 1

⎞⎟⎠. �

Exercise 10.4

10.4-1. Let A =

⎛⎜⎝ 1 1 −1−1 3 −1−1 2 0

⎞⎟⎠. Find an orthogonal matrix P such that P TAP is

upper triangular.

§10.5 Normal Linear Transformations

We learned that every matrix whose minimum polynomial can be factorized intolinear factors is similar to a diagonal matrix. We also learned every symmetric orHermitian matrix is congruent to a diagonal matrix. Under what condition can a matrixbe simultaneously similar and congruent to a diagonal matrix, i.e., unitary or orthogonalsimilar to a diagonal matrix?

Definition 10.5.1 A linear transformation σ ∈ L(V, V ), where V is an inner productspace, is called a normal linear transformation if σ ◦ σ∗ = σ∗ ◦ σ.

Theorem 10.5.2 Let V be an inner product space over F, where F ⊆ R or i ∈ F.Suppose σ ∈ L(V, V ). Then the followings are equivalent:

(1) σ is normal.

(2) 〈σ(α), σ(β)〉 = 〈σ∗(α), σ∗(β)〉 ∀α, β ∈ V .

(3) ‖σ(α)‖ = ‖σ∗(α)‖ ∀α ∈ V .

Proof:

[(1) ⇒ (2)]: This is because

〈σ(α), σ(β)〉 = 〈σ∗ ◦ σ(α), β〉 = 〈σ ◦ σ∗(α), β〉 = 〈σ∗(α), σ∗(β)〉.

Page 116: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

232 Inner Product Spaces and Unitary Transformations

[(2) ⇒ (1)]: Since

〈α, σ ◦ σ∗(β)〉 = 〈σ∗(α), σ∗(β)〉 = 〈σ(α), σ(β)〉 = 〈α, σ∗ ◦ σ(β)〉 ∀α, β ∈ V,

so σ ◦ σ∗ = σ∗ ◦ σ.[(2) ⇒ (3)]: Just put α = β in (2).

[(3) ⇒ (2)]: We have to consider the following two cases:

Case 1: F ⊆ R. Then

〈σ(α), σ(β)〉 = 1

4

(‖σ(α+ β)‖2 − ‖σ(α− β)‖2)=

1

4

(‖σ∗(α+ β)‖2 − ‖σ∗(α− β)‖2) = 〈σ∗(α), σ∗(β)〉.Case 2: i ∈ F. Then

〈σ(α), σ(β)〉 = 1

4

(‖σ(α+ β)‖2 − ‖σ(α− β)‖2

− i‖σ(α+ iβ)‖2 + i‖σ(α− iβ)‖2)=

1

4

(‖σ∗(α+ β)‖2 − ‖σ∗(α− β)‖2

− i‖σ∗(α+ iβ)‖2 + i‖σ∗(α− iβ)‖2)= 〈σ∗(α), σ∗(β)〉.

Theorem 10.5.3 Let σ : V → V be a normal linear transformation of an inner productspace V . Then ker(σ) = ker(σ∗). Moreover, if V is finite dimensional, then σ(V ) =σ∗(V ).

Proof: By the proof of Theorem 10.5.2, ‖σ(α)‖ = ‖σ∗(α)‖. Thus σ(α) = 0 if and onlyif σ∗(α) = 0. Hence ker(σ) = ker(σ∗).

By Theorem 10.2.13, ker(σ∗) = σ(V )⊥ and ker(σ) = σ∗(V )⊥. Hence from ker(σ) =ker(σ∗) if V is finite dimensional, then by the uniqueness of orthogonal complement wehave σ(V ) = σ∗(V ). �

Theorem 10.5.4 Let σ : V → V be a normal linear transformation on an inner productspace V . If ξ is an eigenvector of σ corresponding to eigenvalue λ, then ξ is also aneigenvector of σ∗ corresponding to eigenvalue λ.

Proof: Since σ is normal, by the proof of Theorem 10.5.2 〈σ(ξ), σ(ξ)〉 = 〈σ∗(ξ), σ ∗ (ξ)〉.Thus

0 = ‖σ(ξ)− λξ‖2 = 〈σ(ξ)− λξ, σ(ξ)− λξ〉= 〈σ(ξ), σ(ξ)〉 − λ〈ξ, σ(ξ)〉 − λ〈σ(ξ), ξ〉+ |λ|2〈ξ, ξ〉= 〈σ∗(ξ), σ∗(ξ)〉 − λ〈σ∗(ξ), ξ〉 − λ〈ξ, σ∗(ξ)〉+ |λ|2〈ξ, ξ〉= ‖σ∗(ξ)− λξ‖2.

Page 117: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 233

Thus σ∗(ξ) = λξ. �

By applying the above theorem we have

Theorem 10.5.5 Let σ : V → V be a normal linear transformation of an inner productspace V . Suppose ξ1 and ξ2 are eigenvectors corresponding to eigenvalues λ1 and λ2,respectively. If λ1 �= λ2, then 〈ξ1, ξ2〉 = 0.

Proof: Since

λ2〈ξ1, ξ2〉 = 〈ξ1, λ2ξ2〉 = 〈ξ1, σ(ξ2)〉 = 〈σ∗(ξ1), ξ2〉 = 〈λ1ξ1, ξ2〉 = λ1〈ξ1, ξ2〉,

(λ1 − λ2)〈ξ1, ξ2〉 = 0. Since λ1 �= λ2, 〈ξ1, ξ2〉 = 0. �

Lemma 10.5.6 Let σ : V → V be a normal linear transformation of an inner productspace V . If S is a set of eigenvectors of σ, then S⊥ is invariable under σ.

Proof: Let α ∈ S⊥. Suppose ξ is an eigenvector in S corresponding to eigenvalue λ.Then 〈σ(α), ξ〉 = 〈α, σ∗(ξ)〉 = 〈α, λξ〉 = λ〈α, ξ〉 = 0. Thus σ(α) ∈ S⊥. �

Lemma 10.5.7 Let σ : V → V be a normal linear transformation of an inner productspace V . If W is a subspace invariant under both σ and σ∗, then σ′, the restriction ofσ on W is a normal linear transformation on W .

Proof: Let α, β ∈W . Since 〈σ′∗(α), β〉 = 〈α, σ′(β)〉 = 〈α, σ(β)〉, σ′∗ = σ∗∣∣W.

Let α, β ∈ W . 〈σ′(α), σ′(β)〉 = 〈σ(α), σ(β)〉 = 〈σ∗(α), σ∗(β)〉 = 〈σ′∗(α), σ′∗(β)〉.Thus by the proof of Theorem 10.5.2 σ′ is normal linear transformation on W . �

Theorem 10.5.8 Let V be a unitary space and σ : V → V a normal linear transforma-tion. If W is a subspace invariant under σ, then W is also invariant under σ∗. Henceσ∣∣W

is a normal linear transformation on W .

Proof: We shall prove by mathematical induction on n = dimV . Clearly, the theoremholds for n = 1.

Now for n ≥ 2, we assume the theorem holds for all unitary spaces of dimension lessthan n. Suppose V is a unitary space of dimension n and let W be a subspace invariantunder σ. Then by Theorem 10.2.12 W⊥ is invariant under σ∗. Since W⊥ is also afinite dimensional vector space over C, σ∗ has at least one eigenvector ξ in W⊥. ByTheorem 10.5.4, ξ is also an eigenvector of σ. By Lemma 10.5.6, span{ξ}⊥ is invariantunder both σ and σ∗. Then by Lemma 10.5.7, σ is a normal linear transformation onspan{ξ}⊥. Since span{ξ} ⊆ W⊥, W ⊆ span{ξ}⊥ and dim(span{ξ})⊥ = n − 1, theinduction hypothesis applies. Thus W is invariant under σ∗ and σ is a normal lineartransformation on W . �

Theorem 10.5.9 Let V be a finite dimensional inner product space and let σ ∈ L(V, V ).If V has an orthonormal basis consisting of eigenvectors of σ, then σ is a normal lineartransformation.

Page 118: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

234 Inner Product Spaces and Unitary Transformations

Proof: It follows from Theorem 10.5.2(2). �

Theorem 10.5.10 If V is a unitary space and σ : V → V is a normal linear transfor-mation, then V has an orthonormal basis consisting of eigenvectors of σ.

Proof: We shall prove by mathematical induction on n = dimV . For n = 1, thetheorem is clear, for any unit vector will do.

Now for n ≥ 2, we assume the theorem holds for all unitary spaces of dimension lessthan n. Suppose V is a unitary space of dimension n. Since V is a finite dimensionalvector space over C, σ has at least one eigenvalue λ and hence has a correspondingeigenvector ξ1 of unit length. By Lemma 10.5.6 W = span{ξ1}⊥ is invariant under σ. ByTheorem 10.5.8, σ

∣∣W

is a normal linear transformation onW . Since dimW = n−1, so byinduction hypothesis, W has an orthonormal basis {ξ2, . . . , ξn} consisting of eigenvectorsof σ

∣∣W. Since V = span{ξ1}⊥W , {ξ1, ξ2, . . . , ξn} is an orthonormal basis of V consisting

of eigenvector of σ. �

Note that in Theorem 10.5.9, we do not require that F = C. However in the proofof Theorem 10.5.10 we do not need F = C to ensure eigenvalues exist.

Theorem 10.5.11 Let V be a unitary space and let σ be a normal linear transformationon V . Then σ is self-adjoint if and only if all its eigenvalues are real.

Proof: Suppose σ is self-adjoint. Let λ be an eigenvalue of σ with ξ as a correspondingeigenvector. Then by Theorem 10.5.4 we have λξ = σ(ξ) = σ∗(ξ) = λξ and therefore(λ− λ)ξ = 0. Hence λ = λ. That is, λ must be real.

Conversely, suppose all the eigenvalue of σ are real. By Theorem 10.5.10 V has anorthonormal basis {ξ1, . . . , ξn} consisting of eigenvectors of σ. Let λi be the eigenvaluecorresponding to ξi. Then σ∗(ξi) = λiξi = λiξi = σ(ξi). Since σ and σ∗ agree on a basisof V , σ = σ∗. �

Theorem 10.5.12 Let V be an inner product space. Then all the eigenvalues of anisometry on V are of absolute value 1. If dimV is finite and σ ∈ L(V, V ) is normalwhose eigenvalues are of absolute value 1, then σ is an isometry.

Proof: Suppose σ is an isometry. Let λ be an eigenvalue of σ with ξ as a correspondingeigenvector. Then ‖ξ‖ = ‖σ(ξ)‖ = ‖λξ‖ = |λ|‖ξ‖. Hence |λ| = 1. So we have the firstpart of the theorem.

Now suppose V is finite dimensional and σ is a normal linear transformation on Vwhose eigenvalues are of absolute value 1. By Theorem 10.5.10 V has an orthonormalbasis {ξ1, . . . , ξn} consisting of eigenvectors of σ. Let λi be the eigenvalue correspondingto ξi. Then 〈σ(ξi), σ(ξj)〉 = 〈λiξi, λjξj〉 = λiλj〈ξi, ξj〉 = δij . Since σ maps orthonormalbasis onto orthonormal basis, by Theorem 10.3.4 σ is isometry. �

Now let us consider the matrix counterpart.

Definition 10.5.13 A square matrix A for which A∗A = AA∗ is called a normalmatrix.Thus, a matrix that represents a normal linear transformation must be normal.

Page 119: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 235

Example 10.5.14 Unitary matrices and Hermitian matrices are normal matrices. Also,diagonal matrices are normal matrices. �

Lemma 10.5.15 An upper triangular matrix A is normal if and only if A is diagonal.

Proof: Suppose A = (aij) is a normal matrix with aij = 0 if i > j. If A were notdiagonal, then there would be a smallest positive integer r for which there exists aninteger s > r such that ars �= 0. This implies that for any i < r, we would have air = 0.Then

(A∗A)r,r =

n∑i=1

airair =

n∑i=1

|air|2 = |arr|2,

and

(AA∗)r,r =n∑

j=1

arjarj =

n∑j=r

|arj |2.

Thus we would have |arr|2 = |arr|2 + · · · + |ars|2 + · · · + |arn|2. Since |ars|2 > 0, this isclearly a contradiction.

The conversely is clear. �

Lemma 10.5.16 A matrix that is unitary similar to a normal matrix is also normal.

Proof: Suppose A is normal and B = U∗AU for some unitary matrix U . Then

B∗B = (U∗AU)∗(U∗AU) = U∗A∗UU∗AU = U∗A∗AU

= U∗AA∗U = U∗AUU∗A∗U = BB∗.

Theorem 10.5.17 A square matrix A that is unitary similar to a diagonal matrix ifand only if A is normal.

Proof: Suppose A is normal. By Corollary 10.4.2 A is unitary similar to an uppertriangular matrix S. Then by Lemma 10.5.16 S is normal. By Lemma 10.5.15 S mustbe diagonal.

Conversely, suppose A is unitary similar to a diagonal matrix D. Then since D isnormal, by Lemma 10.5.16 A must be normal. �

Theorem 10.5.18 Suppose H is a Hermitian matrix. Then H is unitary similar to adiagonal matrix and all its eigenvalues are real. Conversely, if H is normal and all itseigenvalues are real, then H is Hermitian.

Proof: Suppose H is Hermitian. Then H is normal, and by Theorem 10.5.17 H isunitary similar to a diagonal matrix D. That is, D = U∗HU for some unitary matrixU . Since

D∗ = (U∗HU)∗ = U∗H∗U = U∗HU = D,

the diagonal entries of D are real. Since the diagonal entries of D are also eigenvaluesof H, all the eigenvalues of H are real.

Conversely, if H is normal with real eigenvalues, then D = U∗HU with U unitaryand D a real diagonal matrix. Hence H = UDU∗ and H∗ = UD∗U∗ = UDU∗ = H asD = D∗. Thus H is Hermitian. �

Page 120: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

236 Inner Product Spaces and Unitary Transformations

Theorem 10.5.19 If A is unitary, then A is similar to a diagonal matrix and theeigenvalues of A are of absolute value 1. Conversely, if A is normal and all eigenvaluesare of absolute value 1, then A is unitary.

Proof: Suppose A is unitary. Then A is normal, so by Theorem 10.5.17 there exists aunitary matrix U and a diagonal matrix D such that U∗AU = D. Since D is a productof unitary matrices, D is unitary. Therefore, I = D∗D = DD, where D is the matrixobtained from D by taking complex conjugate of all entries of D. Hence the diagonalentries of D are of absolute value 1. Since the diagonal entries of D are also eigenvaluesof A, we have the first part of the theorem.

Conversely, suppose A is normal and all eigenvalues of A are of absolute value 1.Then by Theorem 10.5.17 A is unitary similar to a diagonal matrix D whose diagonalentries are eigenvalues of A. Thus D∗D = DD = I, D is unitary. Therefore, A is alsounitary. �

If A is orthogonal, then it is not necessary that A is orthogonal similar to a diagonal

matrix. For consider the matrix A =

(0 −11 0

). A is an orthogonal matrix with

eigenvalues i and −i. Thus there cannot exist orthogonal matrix P such that P TAP =(i 0

0 −i

).

Theorem 10.5.20 Let A ∈ Mn(R). A is symmetric if and only if A is orthogonalsimilar to a diagonal matrix.

Proof: Suppose A is symmetric. Then A is Hermitian. By Theorem 10.5.18 all theeigenvalues of A are real. Thus, by Corollary 10.4.4 A is orthogonal similar to anupper triangular matrix S. Thus there exists an orthogonal (real) matrix P such thatS = P TAP . Thus S is real matrix and

S∗S = STS = P TATPP TAP = P TAATP = SST = SS∗.

Thus S is normal and by Lemma 10.5.15 is a diagonal matrix.

Conversely, suppose A is orthogonal similar to a diagonal matrix D, then thereexists an orthogonal matrix P such that P TAP = D. Then A = PDP T and henceAT = PDTP T = PDP T = A. Thus A is symmetric. �

Theorem 10.5.21 Suppose A ∈ Mn(R). Then A is symmetric if and only if A is anormal and all its eigenvalues are real.

Proof: Suppose A is symmetric. Obviously A is normal. By Theorem 10.5.20 A isorthogonal similar to a diagonal matrix whose diagonal entries are eigenvalues of A.Hence all the eigenvalues of A are real.

Conversely, suppose A is normal and all the eigenvalues of A are real. Then byCorollary 10.4.4 and Lemma 10.5.15, A is orthogonal similar to a diagonal matrix.Hence A is symmetric. �

Page 121: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 237

In practice, how do we find an orthogonal matrix to reduce a given symmetric orHermitian matrix into diagonal form? We shall use the following examples to demon-strate.

Example 10.5.22 The matrix A =

⎛⎜⎜⎜⎜⎝0 0 3 1

0 0 1 3

3 1 0 0

1 3 0 0

⎞⎟⎟⎟⎟⎠ is a real symmetric matrix which

has four distinct eigenvalues, namely, −4,−2, 2 and 4. The corresponding eigenvectorsare

(−1,−1, 1, 1), (1,−1,−1, 1), (−1, 1,−1, 1) and (1, 1, 1, 1),

respectively. Since the eigenvalues are distinct, these eigenvectors are mutually orthog-onal. Thus, we obtain an orthonormal basis

{12(−1,−1, 1, 1), 12(1,−1,−1, 1), 1

2(−1, 1,−1, 1), 12(1, 1, 1, 1)}

and the orthogonal matrix of transition is

P =1

2

⎛⎜⎜⎜⎜⎝−1 1 −1 1

−1 −1 1 1

1 −1 −1 1

1 1 1 1

⎞⎟⎟⎟⎟⎠ . �

Example 10.5.23 The symmetric matrix A =

⎛⎜⎝ 1 2 2

2 1 −22 −2 1

⎞⎟⎠ has characteristic

polynomial−(x − 3)2(x + 3). For the eigenvalue −3, we obtain the eigenvector (1,−1,−1). Forthe eigenvalue 3, we obtain two linearly independent eigenvectors (1, 1, 0) and (1, 0, 1).Apply the Gram-Schmidt process to these two vectors we obtain an orthonormal set{12(1, 1, 0), 1√

6(1,−1, 2)} also consisting of eigenvectors. Thus

{ 1√3(1,−1,−1), 1

2(1, 1, 0),1√6(1,−1, 2)}

is an orthonormal basis of eigenvectors. The orthogonal matrix of transition is

P =

⎛⎜⎜⎝1√3

1√2

1√6

− 1√3

1√2− 1√

6

− 1√3

0 2√6

⎞⎟⎟⎠ . �

Example 10.5.24 Let A =

(1 1− i

1 + i 3

). Then A is Hermitian and its characteristic

polynomial is (x − 4)(x − 1). For eigenvalue 1, we obtain eigenvector (−1 + i, 1). For

Page 122: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

238 Inner Product Spaces and Unitary Transformations

eigenvalue 4, we obtain eigenvector (1, 1+i). Thus, an orthonormal basis of eigenvectorsis { 1√

3(−1 + i, 1), 1√

3(1, 1 + i)}. The unitary matrix of transition is

U =1√3

(−1 + i 1

1 1 + i

). �

Example 10.5.25 Let A =

⎛⎜⎝cos θ 0 − sin θ

0 1 0

sin θ 0 cos θ

⎞⎟⎠, where 0 < θ < π. Then A is an

orthogonal matrix that is not symmetric. It is easy to see that the characteristic poly-nomial of A is −(x − 1)(x2 − 2x cos θ + 1). The eigenvalues are 1, cos θ + i sin θ andcos θ− i sin θ. The corresponding eigenvectors are (0, 1, 0), (1, 0,−i) and (1, 0, i), respec-tively. Since the eigenvalues are distinct, these eigenvectors are mutually orthogonal.After normalizing, we obtain the orthonormal basis {(0, 1, 0), 1√

2(1, 0,−i), 1√

2(1, 0, i)}.

The unitary matrix of transition is U = 1√2

⎛⎜⎝ 0 1 1√2 0 0

0 −i i

⎞⎟⎠. �

Example 10.5.26 Determine the type of the conic curve 2x2 − 4xy + 5y2 − 36 = 0.

Solution: The associated quadratic form is 2x2− 4xy+5y2 which has the representing

matrix B =

(2 −2

−2 5

). Then there exists an orthogonal matrix P =

(2√5− 1√

51√5

2√5

),

so that P TBP =

(1 0

0 6

). Putting

X ′ =

(x′

y′

)= P−1

(x

y

)or

(x

y

)= PX ′ =

(2√5− 1√

51√5

2√5

)(x′

y′

).

Thus x = 1√5(2x′ − y′) and y = 1√

5(x+ 2y′). Substitute x, y into the original equation,

we get (x′)2 + 6(y′)2 = 36. This is an ellipse. �

Remark 10.5.27 In the above example, we have used a rotation of the axis to the majoraxis of the conic. We do not change the conic at all. While if we use the ‘congruent’ asin Chapter 9, we can reduce the conic to the form 2x′2 + 3y′2 = 36 via the non-singular

matrix P =

(1 1

0 1

).

Even though the curve changes its shape, but it is still an ellipse.

By applying Theorems 10.5.20 and 9.5.1 we have

Page 123: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 239

Theorem 10.5.28 Suppose A ∈Mn(R) and symmetric. Then

(1) A is positive definite if and only if all the eigenvalues of A are positive.

(2) A is non-negative definite if and only if all the eigenvalues of A are non-negative.

Theorem 10.5.29 Let A and B be two real symmetric m ×m matrices. Then A andB are congruent if and only if A and B have the same numbers of positive, negative andzero eigenvalues.

Proof: Let p, n, z and p′, n′, z′ be respectively the numbers of positive, negative andzero eigenvalues of A and B.

Suppose A and B are congruent. Then there exists an invertible matrix Q such thatB = QTAQ. By Remark 9.5.2 we have p = p; and n = n′, hence z = z′.

Conversely, there are orthogonal matrices Q and Q′ such that

QTAQ = D = diag{λ1, . . . , λp, λp+1, . . . , λp+n, 0, . . . , 0} and

Q′TBQ′ = D′ = diag{λ′1, . . . , λ′p, λ′p+1, . . . , λ′p+n, 0, . . . , 0}.

Let dj = (λ′j)−1λj , 1 ≤ j ≤ p+n, then dj > 0. Let P = diag{√d1, . . . ,

√dp+n, 1, . . . , 1}.

Then QP TQ′TBQ′PQT = A. �

Application – Hessenberg form

We denote vectors in Rn by column vectors X =

⎛⎜⎜⎝x1...

xn

⎞⎟⎟⎠, Z =

⎛⎜⎜⎝z1...

zn

⎞⎟⎟⎠, etc. Assume

that Rn is endowed with the usual inner product 〈X,Z〉 = XTZ =n∑

i=1xizi. Suppose Z

is a unit vector in Rn. Put H = I− 2ZZT . Then it is easy to see that H is a symmetricorthogonal matrix. Such a matrix H is usually called a Householder matrix.

Now HX = (I − 2ZZT )X = X − 2Z(ZTX). So geometrically, HX is just thereflection of X with respect to the space span{Z}⊥. (For the complex case, we considerH = I − 2ZZ∗, then H is Hermitian and unitary.)

Let X =(x1 x2 · · · xn

)Tand e1 =

(1 0 · · · 0

)Tbe given. We would like to

find a unit vector Z =(z1 z2 · · · zn

)Tsuch that Householder matrix H = I−2ZZT

satisfies HX = ae1 for some a ∈ R. Since H is orthogonal, ‖X‖ = ‖HX‖ = ‖ae1‖ = |a|.Thus, |a| = ‖X‖ is a necessary condition.

Without loss of generality, we may assume a > 0. From HX = ae1 we have

X = HTHX = H2X = H(ae1) = a(e1 − 2z1Z).

Comparing coordinates, we have

x1 = a(1− 2z21)

x2 = −2az1z2...

xn = −2az1zn.

Page 124: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

240 Inner Product Spaces and Unitary Transformations

If x1 = a, then x2 = · · · = xn = 0. In this case choose Z = e2 and we are done.Assume that x1 �= a, in fact x1 < a. Then from the above equations we have

z1 = ±√

a−x12a and zi = − xi

2az1for i = 2, 3, . . . , n. Now we choose z1 = −

√a−x12a and set

b = a(a− x1). Then −2az1 =√

2a(a− x1) =√2b. It follows that

Z = − 1

2az1

(−2az21 x2 · · · xn

)T=

1√2b

(x1 − a x2 · · · xn

)T.

Put W =(x1 − a x2 · · · xn

)T. Then

‖W‖2 = (x1 − a)2 +n∑

i=2x2i = 2a(a − x1) = 2b. Hence ‖W‖ =

√2b and Z = 1√

2bW .

Therefore, H = I − 2ZZT = I − 1bWW T is a matrix such that HX = ae1. The matrix

H defines the so-called Householder transformation which enables us to transform amatrix into the upper triangular form.

Given X =

⎛⎜⎜⎝x1...

xn

⎞⎟⎟⎠. We want to zero out the last n− k components of X, 1 ≤ k ≤ n.

To do this, we write X =

(X(1)

X(2)

), where X(1) =

⎛⎜⎜⎝x1...

xk−1

⎞⎟⎟⎠ and X(2) =

⎛⎜⎜⎝xk...

xn

⎞⎟⎟⎠. Then

by the above argument, we construct an (n− k + 1)× (n− k + 1) Householder matrix

H(2)k = I(2) − 1

bkWkW

Tk satisfying H

(2)k X(2) = ‖X(2)‖e(2)1 , where I(2) = In−k+1, Wk is a

unit vector in Rn−k+1 and e(2)1 =

(1 0 · · · 0

)Tin Rn−k+1. Let Hk =

(I(1) O

O H(2)k

),

where I(1) = Ik−1. Then

HkX =

(I(1) O

O H(2)k

)(X(1)

X(2)

)=

(X(1)

H(2)k X(2)

)=

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

x1...

xk−1n∑

i=k

x2i

0...

0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠.

Let Y =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

y1...

yk−1yk...

yn

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠. Then HkY =

(Y (1)

H(2)k Y (2)

), where Y (1) =

⎛⎜⎜⎝y1...

yk−1

⎞⎟⎟⎠ and

Page 125: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 241

Y (2) =

⎛⎜⎜⎝yk...

yn

⎞⎟⎟⎠. Thus Hk acts like the identity on the first k − 1 coordinates of any

vector Y . In particular, if Y (2) = 0, then HkY = Y .

Example 10.5.30 Let A =

⎛⎜⎝0 1 1

1 0 1

0 −1 0

⎞⎟⎠. Find a Householder matrix H so that HA

is upper triangular.

Solution: First let X =

⎛⎜⎝0

1

0

⎞⎟⎠, a = ‖X‖ = 1, x1 = 0 and b = 1. Thus W =

⎛⎜⎝−110

⎞⎟⎠.

Then WW T =

⎛⎜⎝ 1 −1 0

−1 1 0

0 0 0

⎞⎟⎠. So,

H1 = I − 1

bWW T =

⎛⎜⎝0 1 0

1 0 0

0 0 1

⎞⎟⎠ and H1A =

⎛⎜⎝1 0 1

0 1 1

0 −1 0

⎞⎟⎠ .

Next we consider A =

(1 1

−1 0

). Let X =

(1

−1

). In this case, a = ‖X‖ = √2 and

b =√2(√2− 1) = 2−√2. Thus, W =

(1−√2

1

)and WW T =

((1−√2)2

√2− 1√

2− 1 1

).

I − 1bWW T =

( √2−1

2−√2−√2−1

2−√2

−√2−1

2−√2−√2−1

2−√2

)=

(1√2− 1√

2

− 1√2− 1√

2

)So,

H2 =

⎛⎜⎜⎝1 0 0

0 1√2− 1√

2

0 − 1√2− 1√

2

⎞⎟⎟⎠ =1√2

⎛⎜⎝√2 0 0

0 1 −10 −1 −1

⎞⎟⎠ .

Therefore, H2H1A = 1√2

⎛⎜⎝√2 0

√2

0 2 1

0 0 −1

⎞⎟⎠. �

We may also use Householder transformation to put an m×n matrix A with m ≥ n

into a matrix of the form R if m = n and

(R

O

)if m > n, where R is an upper triangular

matrix. We proceed as follows.

Page 126: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

242 Inner Product Spaces and Unitary Transformations

First, we find a Householder transformation H1 = Im− 1b1W1W

T1 which when applies

to the first column of A gives a multiple of e1. Then H1A has the form

⎛⎜⎜⎜⎜⎝∗ ∗ · · · ∗0...

0

A2

⎞⎟⎟⎟⎟⎠ .

Then we can find a Householder transformation H2 that zeros out the last m − 2entries in the second column of H1A while leaving the first entry in the second columnand all the entries in the first column unchanged. Then H2H1A is of the form⎛⎜⎜⎜⎜⎜⎜⎜⎝

∗ ∗ ∗ · · · ∗0 ∗ ∗ · · · ∗0...

0

0...

0

A3

⎞⎟⎟⎟⎟⎟⎟⎟⎠.

Continuing in this fashion, we obtain at most n−1 Householder matricesH1, . . . , Hn−1such that Hn−1 · · ·H1A = R if m = n. (Note that this yields an alternative method tofind the QR decomposition.) If m > n, then we obtain at most n Householder matrix

H1, . . . , Hn such that Hn · · ·H1A =

(R

O

).

Theorem 10.5.31 Let A ∈ Mn(R). Then there exists P = H1 · · ·Hn−2, a product ofHouseholder matrices so that H = PAP is of the Hessenberg form, i.e., (H)i,j = 0 ifi > j + 1.

Proof: As in the above discussion, we find a Householder matrix H1 which zeros out

the last n − 2 entries in the first column of A. Then H1 =

(1 0T

0 H ′

), where H ′ is an

(n − 1) × (n − 1) matrix. It is easy to see that H1A and H1AH1 have the same firstcolumn. Thus applying this procedure to the second, third, . . . , and (n− 2)-th columnsrespectively, we obtain H2, . . . , Hn−2 so that Hn−2 · · ·H1AH1 · · ·Hn−2 has the desiredform. �

Application – Singular value decomposition

Theorem 10.5.32 Let A ∈Mm,n(R) with m > n. Then there exist an m×m orthogonalmatrix U and an n× n orthogonal matrix V such that UTAV is of the form

Page 127: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 243

S =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

d1 0 · · · 0

0. . .

......

......

. . . 0

0 · · · 0 dn

0 · · · · · · 0...

......

...

0 · · · · · · 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠with d1 ≥ d2 ≥ · · · ≥ dn ≥ 0.

Proof: Since ATA is non-negative define (see Exercise 9.6-4) , ATA has non-negativeeigenvalues, say λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0. Put di =

√λi, i = 1, 2, . . . , n. Suppose

rank(ATA) = r, then d1 ≥ d2 ≥ · · · ≥ dr > 0 and dr+1 = · · · = dn = 0. Since ATA issymmetric, there is an orthogonal matrix V such that V TATAV = D, an n×n diagonal

matrix of the form diag{d21, . . . , d2r , 0, . . . , 0}. Let S be the m × n matrix

(Sr O

O O

),

where Sr is an r × r diagonal matrix diag{d1, d2, . . . , dr}. Then we have D = STS.Let V1, . . . , Vn be the columns of V . Then Vi is an eigenvector of ATA with eigenvalue

d2i for i = 1, 2, . . . , r and Vr+1, . . . , Vn are eigenvectors corresponding to eigenvalue 0.

Let V (1) =(V1 V2 · · · Vr

)be the n × r matrix and V (2) =

(Vr+1 · · · Vn

)be the n × (n − r) matrix. Clearly,

(V (1)

)TV (1) = Ir and

(V (2)

)TV (2) = In−r. Since

ATAVi = 0 for i = r + 1, . . . , n, we have(AV (2)

)TAV (2) =

(V (2)

)TATAV (2) = O

and hence AV (2) = O. Since ATAV (1) = V (1)S2r , we have

S−1r

(V (1)

)TATAV (1)S−1r = Ir.

Now we put U (1) = AV (1)S−1r . Then(U (1)

)TU (1) = Ir. Hence U (1) is an m × r

matrix with orthonormal columns U1, . . . , Ur, say. Extend {U1, . . . , Ur} to an orthonor-

mal basis {U1, . . . , Ur, . . . , Um} of Rm. Let U =(U1 · · · Um

)be them×mmatrix and

U (2) =(Ur+1 · · · Um

)be the m× (m− r) matrix. Thus U =

(U (1) U (2)

). Now we

consider

UTAV =

((U (1)

)T(U (2)

)T)A

(V (1) V (2)

)=

((U (1)

)T(U (2)

)T)(

AV (1) AV (2))

=

((U (1)

)T(U (2)

)T)(

AV (1) O)=

((U (1)

)TAV (1) O(

U (2))T

AV (1) O

).

Since(U (1)

)TAV (1) = S−1r

(V (1)

)TATAV (1) = Sr and

(U (2)

)TAV (1) =

(U (2)

)TU (1)Sr = O,

Page 128: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

244 Inner Product Spaces and Unitary Transformations

we have UTAV =

(Sr O

O O

). �

Remark 10.5.33

(1) Since the diagonal entries of S are non-negative square roots of the eigenvalues ofATA, hence are unique. The di’s are called singular values of A and the factorizationA = USV T is called the singular value decomposition (SVD) of A.

(2) The matrices U and V are not unique as we can easily see from the proof of Theo-rem 10.5.32.

(3) Since UTAATU = SST is diagonal, U diagonalizes AAT and hence the columns ofU are eigenvectors of AAT .

(4) Since(AV1 · · · AVn

)= A

(V1 · · · Vn

)= AV = US

=(U1 · · · Um

)

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

d1 0 · · · 0

0. . .

......

......

. . . 0

0 · · · 0 dn

0 · · · · · · 0...

......

...

0 · · · · · · 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠=

(d1U1 · · · dnUn

),

we have AVj = djUj for j = 1, 2, . . . , n.

Also from the matrix equation ATU = V ST we have

ATUj = djVj for j = 1, 2, . . . , n;

ATUj = 0 for j = n+ 1, . . . ,m.

Therefore, AATUj = djAVj = d2jUj = λjUj for j = 1, 2, . . . , n and AATUj = 0 for

j = n + 1, . . . ,m. Hence Uj for j = 1, 2, . . . ,m are eigenvectors of AAT and forj = n+ 1, . . . ,m, Uj ’s are eigenvectors corresponding to eigenvalue 0.

Example 10.5.34 Let A =

⎛⎜⎝1 1

1 1

0 0

⎞⎟⎠. We want to compute the singular values and the

singular value decomposition of A.

ATA =

(2 2

2 2

)has eigenvalues 4 and 0. Consequently, the singular values of A are

2 and 0. An eigenvector corresponding to 4 is

(1

1

)and an eigenvector corresponding

to 0 is

(1

−1

). Then the orthogonal matrix V = 1√

2

(1 1

1 −1

)diagonalizes ATA.

Page 129: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 245

Let U1 = AV1S−11 =

⎛⎜⎝1 1

1 1

0 0

⎞⎟⎠(1√21√2

)(12

)=

⎛⎜⎜⎝1√21√2

0

⎞⎟⎟⎠. The remaining columns of U

must be eigenvectors of AAT corresponding to the eigenvalue 0.

Now AAT =

⎛⎜⎝2 2 0

2 2 0

0 0 0

⎞⎟⎠ has U2 =

⎛⎜⎜⎝1√2

− 1√2

0

⎞⎟⎟⎠ and U3 =

⎛⎜⎝0

0

1

⎞⎟⎠ as eigenvectors whose

eigenvalues are 0. Then

A = USV T =

⎛⎜⎜⎝1√2

1√2

01√2− 1√

20

0 0 1

⎞⎟⎟⎠⎛⎜⎝2 0

0 0

0 0

⎞⎟⎠(1√2

1√2

1√2− 1√

2

)

is a singular value decomposition of A. �

Example 10.5.35 Find a singular value decomposition of A =

⎛⎜⎜⎜⎜⎝2 0 0

0 2 1

0 1 2

0 0 0

⎞⎟⎟⎟⎟⎠.

Here m = 4, n = 3 and ATA =

⎛⎜⎝4 0 0

0 5 4

0 4 5

⎞⎟⎠. Then the characteristics polynomial

of ATA is −(x− 4)(x− 1)(x− 9). Thus the eigenvalues are 9, 4 and 1 and the singularvalues are 3, 2 and 1.

A unit eigenvector corresponding to 9 is V1 = 1√2

⎛⎜⎝0

1

1

⎞⎟⎠. A unit eigenvector corre-

sponding to 4 is V2 =

⎛⎜⎝1

0

0

⎞⎟⎠. A unit eigenvector corresponding to 1 is V3 = 1√2

⎛⎜⎝ 0

1

−1

⎞⎟⎠.

Then V (1) = V = 1√2

⎛⎜⎝ 0√2 0

1 0 1

1 0 −1

⎞⎟⎠ diagonalizes ATA. Since S3 =

⎛⎜⎝3 0 0

0 2 0

0 0 1

⎞⎟⎠,

Page 130: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

246 Inner Product Spaces and Unitary Transformations

S−13 =

⎛⎜⎝13 0 0

0 12 0

0 0 1

⎞⎟⎠. So we obtain

U (1) = AV (1)S−13 = AV S−13 =1√2

⎛⎜⎜⎜⎜⎝0√2 0

1 0 1

1 0 −10 0 0

⎞⎟⎟⎟⎟⎠ .

To obtain U4 we have to find an eigenvector corresponding to the zero eigenvalue

of AAT . Now AAT =

⎛⎜⎜⎜⎜⎝4 0 0 0

0 5 4 0

0 4 5 0

0 0 0 0

⎞⎟⎟⎟⎟⎠. From this, we obtain U4 =

⎛⎜⎜⎜⎜⎝0

0

0

1

⎞⎟⎟⎟⎟⎠ as a unit

eigenvector corresponding to 0. Thus, we have U = 1√2

⎛⎜⎜⎜⎜⎝0√2 0 0

1 0 1 0

1 0 −1 0

0 0 0√2

⎞⎟⎟⎟⎟⎠. �

For complex matrices, we consider A∗A instead of ATA and proceed as in the realcase with proper modifications.

Exercise 10.5

10.5-1. Show that every real skew-symmetric matrix is normal. Find a complex sym-metric matrix which is not normal.

10.5-2. Find an orthogonal transformation to reduced the quadratic form x2 + 6xy −2y2 − 2yz + z2 to diagonal form.

10.5-3. Let A =

⎛⎜⎝ 8 4 −14 −7 4

−1 4 8

⎞⎟⎠. Find an orthogonal matrix P such that P TAP is

diagonal.

10.5-4. Let A =

⎛⎜⎝0 1 0

1 0 0

0 0 1

⎞⎟⎠.

(a) Find an orthogonal matrix P such that P TAP is diagonal.

(b) Compute eA.

Page 131: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Inner Product Spaces and Unitary Transformations 247

10.5-5. Let A =

⎛⎜⎝ 1 i 0

−i 1 i

0 −i 1

⎞⎟⎠. Find a unitary matrix U such that U∗AU is diagonal.

10.5-6. Let A = 13

⎛⎜⎝ 2 −2 1

2 1 −21 2 2

⎞⎟⎠. Find a unitary matrix U such that U∗AU is

diagonal.

10.5-7. A =

⎛⎜⎜⎜⎜⎝0 1 −3 0

1 0 0 −3−3 0 0 1

0 −3 1 0

⎞⎟⎟⎟⎟⎠. Find an orthogonal matrix P such that P TAP is

diagonal.

10.5-8. Let A and B be real symmetric matrices with A positive definite. For real x,define the polynomial f(x) = det(B − xA). Show that there exist an invertiblematrix Q such that QTAQ = I and QTBQ is a diagonal matrix whose diagonalelements are roots of f(x).

10.5-9. Show that every real skew-symmetric matrix has the form A = P TBP where Pis orthogonal and B2 is a diagonal.

10.5-10. Show that every non-zero real skew-symmetric matrix cannot be orthogonalsimilar to a diagonal matrix.

10.5-11. Show that the eigenvalues of a normal matrix A are all equal if and only if A isa scalar multiple of the identity matrix.

10.5-12. Let W =

⎛⎜⎜⎜⎜⎝9

1

5

1

⎞⎟⎟⎟⎟⎠. Find H the Householder matrix defined by W . Also find the

reflection of X =

⎛⎜⎜⎜⎜⎝3

1

5

1

⎞⎟⎟⎟⎟⎠ with respect to subspace span(W )⊥.

10.5-13. Prove that if A is a real symmetric matrix with eigenvalues λ1, . . . , λn, then thesingular values of A are |λ1|, . . . , |λn|.

10.5-14. Find the singular value decomposition of

⎛⎜⎝1 3

3 1

0 0

⎞⎟⎠.

Page 132: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

248 Inner Product Spaces and Unitary Transformations

10.5-15. Show that if A is a real symmetric positive define matrix, then there is an uppertriangular matrix R such that A = RTR.

10.5-16. Suppose that A has eigenvalues 0, 1 and 2 corresponding to eigenvectors

⎛⎜⎝ 1

2

0

⎞⎟⎠,

⎛⎜⎝ 2

−10

⎞⎟⎠ and

⎛⎜⎝ 0

0

1

⎞⎟⎠, respectively. Find A. Is A normal? Why?

10.5-17. Show that Theorem 10.5.5 is false if σ is not normal.

10.5-18. Show that if A is normal and Ak = O for some positive integer k, then A = O.

10.5-19. Show that the product of two upper triangular matrices is upper triangular.Also, show that the inverse of an invertible upper triangular matrix is uppertriangular. Hence show that each symmetric positive definite matrix A has thefactorization A = LLT , where LT is upper triangular.

Page 133: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Appendices

§A.1 Greatest Common Division

Proof of Theorem 0.2.6: Applying the Division Algorithm repeatedly, since ri ≥ 0and b > r1 > r2 > · · · , the process will stop. Thus we obtain recurrence relations (0.1).

By Lemma 0.2.5,

(a, b) = (a− bq1, b) = (r1, b) = (r1, b− r1q2)

= (r1, r2) = (r1 − r2q3, r2) = (r3, r2).

Continuing this process, we get (a, b) = (rn−1, rn) = (rn, 0) = rn.Since (

a

b

)=

(q1 1

1 0

)(b

r1

),

(b

r1

)=

(q2 1

1 0

)(r1

r2

),(

r1

r2

)=

(q3 1

1 0

)(r2

r3

)· · · ,

(rn−2rn−1

)=

(qn 1

1 0

)(rn−1rn

),(

rn−1rn

)=

(qn+1 1

1 0

)(rn

0

),

we have (b

r1

)=

(0 1

1 −q1

)(a

b

),

(r1

r2

)=

(0 1

1 −q2

)(b

r1

),(

r2

r3

)=

(0 1

1 −q3

)(r1

r2

), · · · ,

(rn−1rn

)=

(0 1

1 −qn

)(rn−2rn−1

),(

rn

0

)=

(0 1

1 −qn+1

)(rn−1rn

).

Therefore, (rn

0

)=

(0 1

1 −qn+1

)(0 1

1 −qn

)· · ·

(0 1

1 −q2

)(0 1

1 −q1

)(a

b

)

=

(s t

u v

)(a

b

)

249

Page 134: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

250 Appendices

for some s, t, u, v ∈ Z. This completes the proof. �

Proof of Remark 0.2.8: From the proof of Theorem 0.2.6, for 0 ≤ i ≤ n, we have(ri

ri+1

)=

(si ti

ui vi

)(a

b

)

for some si, ti, ui, vi ∈ Z, where r0 = b,

(s0 t0

u0 v0

)=

(0 1

1 −q1

)and

(si ti

ui vi

)=

(0 1

1 −qi+1

)(si−1 ti−1ui−1 vi−1

)

=

(ui−1 vi−1

si−1 − qi+1ui−1 ti−1 − qi+1vi−1

).

For convenience, let s−1 = 1, s0 = 0, t−1 = 0 and t0 = 1, then the above equation holdsfor 0 ≤ i ≤ n. So we have

si = ui−1, ti = vi−1, ui = si−1 − qi+1ui−1, and vi = ti−1 − qi+1vi−1.

Then we have

si = si−2 − qiui−2 = si−2 − qisi−1, 1 ≤ i ≤ n.

Similarly,

ti = ti−2 − qiti−1, 1 ≤ i ≤ n.

Then g.c.d.(a, b) = rn = asn + btn. �

§A.2 Block Matrix Multiplication

Theorem A.2.1 Let A ∈Mm,n(F) and B ∈Mn,p(F). Suppose A and B are partitionedas follows:

A =

⎛⎜⎜⎝A1,1 · · · A1,s

......

...

Ar,1 · · · Ar,s

⎞⎟⎟⎠ , B =

⎛⎜⎜⎝B1,1 · · · B1,t

......

...

Bs,1 · · · Bs,t

⎞⎟⎟⎠ ,

where each Ai,j is an mi × nj submatrices of A, each Bj,k is an nj × pk submatricesof B, m = m1 + · · · + mr, n = n1 + · · · + ns and p = p1 + · · · pt for some positive

integers r, s, t. Let C = AB. Then there are Ci,k =s∑

j=1Ai,jBj,k ∈ Mmi,pk(F) such that

C =

⎛⎜⎜⎝C1,1 · · · C1,t

......

...

Cr,1 · · · Cr,t

⎞⎟⎟⎠, 1 ≤ i ≤ r, 1 ≤ k ≤ t.

Page 135: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Appendices 251

Proof: To prove this theorem, we first define a number q(A,B) = r + s + t. Clearlyq(A,B) ≥ 3.

We shall prove the theorem by induction on q(A,B).

When q(A,B) = 3. Then r = s = t = 1. The theorem is always true.

When q(A,B) = 4. Then there are 3 cases as below:

Case 1: Suppose t = 2, s = r = 1. Then B =(B1 B2

), where

B1 =(B∗1 · · · B∗p1

)∈Mn,p1(F) and B2 =

(B∗(p1+1) · · · B∗p

)∈Mn,p2(F).

Thus,

AB = A(B∗1 · · · B∗p1 B∗(p1+1) · · · B∗p

)=

(AB∗1 · · · AB∗p1 AB∗(p1+1) · · · AB∗p

)=

(AB1 AB2

).

Let C1,1 = AB1 and C1,2 = AB2. Then we have the theorem for this case.

Case 2: Suppose r = 2, s = t = 1. Then A =

(A1

A2

). Consider CT = (AB)T = BTAT .

By Case 1 we have CT =(BT (A1)

T BT (A2)T). Thus C =

(A1B

A2B

). So by

putting C1,1 = A1B and C2,1 = A2B we have the theorem for this case.

Case 3: Suppose s = 2, r = t = 1. Then A =(A1 A2

)and B =

(B1

B2

), where A1, A2,

B1 and B2 are m× n1, m× n2, n1 × p and n2 × p matrices respectively. Then

(C)i,k =n∑

j=1

(A)i,j(B)j,k =

n1∑j=1

(A)i,j(B)j,k +n∑

j=n1+1

(A)i,j(B)j,k.

Now

n1∑j=1

(A)i,j(B)j,k = (A1B1)i,k andn∑

j=n1+1

(A)i,j(B)j,k = (A2B2)i,k.

So we have

(C)i,k = (A1B1 +A2B2)i,k

and hence C = C1,1 = A1B1 +A2B2.

Thus, the theorem holds for q(A,B) = 4.

Suppose the theorem holds for q(A,B) ≤ q, where q ≥ 4. Now we assume q(A,B) =q + 1. Then one of r, s and t must be greater than 1.

Page 136: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

252 Appendices

Case 1: Suppose t ≥ 2. Then we rewrite B as B =(B1 B2

), where

B1 =

⎛⎜⎜⎝B1,1 · · · B1,t−1...

......

Bs,1 · · · Bs,t−1

⎞⎟⎟⎠ and B2 =

⎛⎜⎜⎝B1,t

...

Bs,t

⎞⎟⎟⎠ .

Then AB = A(B1 B2

). In this case, q(A,B) = 1 + 1 + 2 = 4. By induction

AB =(AB1 AB2

).

Since q(A,B1) = r + s+ (t− 1) = q, by induction again

C =

⎛⎜⎜⎝C1,1 · · · C1,t−1...

......

Cr,1 · · · Cr,t−1

⎞⎟⎟⎠where Ci,k =

s∑j=1

Ai,jBj,k ∈Mmi,pk(F) for 1 ≤ i ≤ r and 1 ≤ k ≤ t− 1.

Similarly, since q(A,B2) = r + s+ 1 ≤ q, we have

AB2 =

⎛⎜⎜⎝C1,t

...

Cr,t

⎞⎟⎟⎠ , where Ci,t =s∑

j=1

Ai,jBj,t ∈Mmi,pt(F),

for 1 ≤ i ≤ r.

Combine these two results we have the theorem for this case.

Case 2: Suppose r ≥ 2. Then consider CT = BTAT . We will get the result.

Case 3: Suppose r = t = 1 and s ≥ 2. Then

A =(

A1,1 · · · A1,s−1 A1,s)

and B =

⎛⎜⎜⎜⎜⎝B1,1

...

Bs−1,1

Bs,1

⎞⎟⎟⎟⎟⎠ .

We rewrite A =(A1 A1,s

)and B =

(B1

Bs,1

). By induction AB = A1B1 +

A1,sBs,t. Since q(A1, B1) = 1 + (s − 1) + 1 = s + 1 = q, by induction A1B1 =s−1∑j=1

A1,jBj,1. Thus we have C = C1,1 =s∑

j=1A1,jBj,1.

Therefore, by induction the theorem holds for q(A,B) ≥ 3. �

Page 137: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Appendices 253

§A.3 Jacobian Matrix

In finite dimensional calculus we define the derivative of a mapping f from Rn intoRm as a linear transformation.

Suppose f : Rn → Rm is a mapping. f is said to be differentiable at x ∈ Rn if thereexists a linear transformation σx ∈ L(Rn,Rm) such that for h ∈ Rn \ {0} we have

‖f(x+ h)− f(x)− σx(h)‖m‖h‖n → 0 as ‖h‖n → 0.

Here ‖ · ‖m and ‖ · ‖n are the usual norms of Rm and Rn, respectively. One can easilycheck that if such σx exists then it is unique and we shall denote it by Df(x).

Definition A.3.1 Let f : Rn → Rm be a mapping and x ∈ Rn. Let h be a unit vectorin Rn. Then the directional derivative of f at x in the direction h, denoted by Dhf(x),is defined by

Dhf(x) = limt→0

f(x+ th)− f(x)

t.

Proposition A.3.2 Let f : Rn → Rm be differentiable at x. Then for each unit vectorh ∈ Rn, Dhf(x) exists and Dhf(x) =

(Df(x)

)(h).

In particular, if x = (x1, . . . , xn) ∈ Rn and h = ej in Rn, then Dejf(x) is the usualpartial derivative with respect to xj .

Proposition A.3.3 Let f : Rn → Rm be a mapping. Suppose that for x = (x1, . . . , xn) ∈Rn, f(x) = (f1(x), . . . , fm(x)). Let A = {e1, . . . , en} and B = {e∗1, . . . , e∗m} be stan-dard bases of Rn and Rm respectively. If f is differentiable at x0, then

[Df(x0)]AB =

(∂fi(x0)

∂xj

)1≤i≤m1≤j≤n

.

Proof: Let [Df(x0)]AB = (aij). Then Df(x0)(ej) =

m∑i=1

aije∗i , 1 ≤ j ≤ n. Since

Df(x0)(ej) = Dejf(x0) =

(∂f1(x0)

∂xj, · · · , ∂fm(x0)

∂xj

)=

m∑i=1

∂fi(x0)

∂xje∗i .

Thus aij =∂fi(x0)

∂xjfor i = 1, 2, . . . ,m and j = 1, 2, . . . , n. �

The matrix [Df(x)]AB is called the Jacobian matrix of f at x.

Corollary A.3.4 If f : Rn → R is differentiable at x. Then with respect to the standardbases of Rn and R, Df(x) has representing matrix⎛⎜⎜⎝

∂f(x)∂x1...

∂f(x)∂xn

⎞⎟⎟⎠ .

Page 138: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

254 Appendices

Remark A.3.5 With the same notation as above, Df(x) is a linear functional on Rn.By Theorem 10.2.1 there exists a unique vector denoted by ∇f(x) ∈ Rn such thatDf(x)(v) = ∇f(x) · v. Here v ∈ Rn and “ · ” is the usual dot product in Rn. ∇f(x) is

called the gradient vector of f at x. In standard basis of Rn, ∇f(x) =

(∂f

∂x1, . . . ,

∂f

∂xn

).

Page 139: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Numerical Answers

Chapter 0

Exercise 0.2

0.2-1. (a)

i −1 0 1 2 3 4 5

−qi −14 −1 −1 −16 −2si 1 0 1 −1 2 −33 68

ti 0 1 −14 15 −29 479 −987

s = −33, t = 479

(b)

i −1 0 1 2 3 4 5 6 7

−qi −2 −3 −1 −5 −4 −1 −2si 1 0 1 −3 4 −23 96 −119 334

ti 0 1 −2 7 −9 52 −217 269 −755

s = −119, t = 269

0.2-2. (a)

i −1 0 1 2 3 4

−qi −1 −2 −2 −10si 1 0 1 −2 5 −52ti 0 1 −1 3 −7 73

x = 5, y = 7

(b)

i −1 0 1 2 3 4

−qi −1 −4 −2 −2si 1 0 1 −4 9 −22ti 0 1 −1 5 −11 27

x = −11, y = 9

(c) x = −4, y = 4, z = 1

Exercise 0.4

0.4-1. The inverse of 32 is 44; the inverse of 47 is 10

0.4-4. 8:00am

0.4-6. z = 17

0.4-7. z = 123 + 385s ∀s ∈ N0

255

Page 140: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

256 Numerical Answers

Exercise 0.5

0.5-4. (x2 − 2)(x2 +2) over Q; (x−√2)(x+√2)(x2 +2) over R; (x−√2)(x+

√2)(x−√

2i)(x+√2i) over C

0.5-5. (x−2)(x+2)(x2+4) overQ; (x−2)(x+2)(x2+4) over R; (x−2)(x+2)(x−2i)(x+2i)over C

0.5-7.x +1 x4 +x3 +2x2 +x −1 x3 +1 1

2x

x4 +x3 +x +1 x3 −x2x 2x2 −2 x +1 −1

2

2x2 +2x x +1

−2x −2 0

−1 0 1 2 3 4

−x− 1 −12x −2x 1

2

1 0 1 −12x x2 + 1 1

2x2 − 1

2x+ 12

0 1 −x− 1 12x(x+ 1) + 1 −x3 − x2 − 3x− 1 −1

2x3 − x+ 1

2

Then (x2 + 1)g(x) + (−x3 − x2 − 3x− 1)f(x) = −2x− 2

Chapter 1

Exercise 1.2

1.2-1. (a)

(4 −2 −6

−8 −2 0

)

(b)

(4 10 6

−8 1 3

)(c) It cannot work, since the size of B is different from C

(d) It cannot work, since the size of BC is different from CB

(e)

(8 36

2 9

)

(f)

⎛⎜⎝ 20 2 −62 2 3

−6 3 9

⎞⎟⎠(g)

(6 −3 −3

−24 6 9

)

1.2-2.

⎛⎜⎜⎜⎜⎝1 0 0 0

0 1 0 0

k 0 1 0

0 k 0 1

⎞⎟⎟⎟⎟⎠

Page 141: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

Numerical Answers 257

1.2-3. AAT = (a2 + b2 + c2 + d2)I4 = ATA

1.2-4.r−1∑i=0

(ki

)N i (since IN = NI)

1.2-5.r−1∑i=0

(ki

)Ak−iN i

1.2-14. A =

(1 0

0 0

), B =

(0 0

0 1

), C =

(0 0

0 2

)1.2-15. Tr(A) = 15, Tr(AB) = 8, Tr(AB) = 107, Tr(BA) = 107

Exercise 1.3

1.3-2. − 1a0(a1I + a2A+ · · ·+ amAm−1)

1.3-3. (a)

⎛⎜⎝ 3 16 −90 69 −410 −41 28

⎞⎟⎠

(b) (A3 −A+ I)−1 = A−1 = −A2 + 2I =

⎛⎜⎝ 1 −1 −10 0 1

0 1 1

⎞⎟⎠

1.3-7.

⎛⎜⎜⎜⎜⎜⎜⎝−1

434 0 0 0

12 −1

2 0 0 0

0 0 1 −1 0

0 0 0 1 −10 0 0 0 1

⎞⎟⎟⎟⎟⎟⎟⎠

1.3-8.

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

0 0 0 14 0 0

1 0 0 0 0 0

0 12 0 0 0 0

0 0 13 0 0 0

0 0 0 0 −2 1

0 0 0 0 32 −1

2

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠Exercise 1.5

1.5-2.

⎛⎜⎜⎜⎜⎝1 0 0 0 −736

85

0 1 0 0 4

0 0 1 0 1185

0 0 0 1 13985

⎞⎟⎟⎟⎟⎠1.5-3. I4

Page 142: Vector Spaceszeng/Teaching/math3407/shiu.pdfChapter 6 Vector Spaces 6.1 Definition and Some Basic Properties InChapter3,wehavelearntsomepropertiesofvectorspaceofFn.Inthischapter

258 Numerical Answers

1.5-4. rref(A) + rref(B) =

⎛⎜⎝ 2 0 5 3231

0 2 −1 −2331

0 0 1 5131

⎞⎟⎠, rref(A+B) =

⎛⎜⎝ 1 0 0 −2371

0 1 0 571

0 0 1 7071

⎞⎟⎠

1.5-5.

⎛⎜⎝1 0 0

0 1 0

0 0 1

⎞⎟⎠,

⎛⎜⎝1 0 ∗0 1 ∗0 0 0

⎞⎟⎠,

⎛⎜⎝1 ∗ 0

0 0 1

0 0 0

⎞⎟⎠,

⎛⎜⎝1 ∗ ∗0 0 0

0 0 0

⎞⎟⎠,

⎛⎜⎝0 1 ∗0 0 0

0 0 0

⎞⎟⎠,

⎛⎜⎝0 1 0

0 0 1

0 0 0

⎞⎟⎠,

⎛⎜⎝0 0 1

0 0 0

0 0 0

⎞⎟⎠,

⎛⎜⎝0 0 0

0 0 0

0 0 0

⎞⎟⎠

Chapter 2

Exercise 2.2

2.2-1. \begin{pmatrix} 9 & 0 & 1 \\ 0 & 8 & 4 \\ 1 & 2 & 8 \end{pmatrix}

2.2-2. \begin{pmatrix} -2 & 0 & 1 \\ 0 & -3 & 4 \\ 1 & 2 & -3 \end{pmatrix}

2.2-3. \begin{pmatrix} 2 & 0 & 1 \\ 0 & 3 & 0 \\ 1 & 0 & 2 \end{pmatrix}

Exercise 2.4

2.4-1. (a) 3

(b) 3

(c) 2

2.4-2. (a) {(0, 0, 0, 0)}
(b) {(1, 2, t, (193 − 4t)/10, (69 − 2t)/2) | t ∈ R}

Exercise 2.5

2.5-1. (a) P = \begin{pmatrix} -4/15 & 2/5 & -1/3 \\ 1/3 & 0 & -1/3 \\ 2/15 & -1/5 & 2/3 \end{pmatrix}, Q = \begin{pmatrix} -4/15 & 2/15 & -1/3 & 1 \\ 1/3 & 0 & -1/3 & -1 \\ 2/15 & -1/5 & 2/3 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix}


(b) P = \begin{pmatrix} 1/7 & 0 & 0 & -2/7 \\ 1/7 & -1/5 & 1/5 & 1/5 \\ 0 & 1/5 & 2/5 & 9/35 \\ -2/7 & 1/5 & -8/5 & 16/35 \end{pmatrix}, Q = \begin{pmatrix} 1/7 & 0 & -3/7 & 10/7 \\ 1/7 & -1/5 & 6/35 & 1/35 \\ 0 & 1/5 & 2/5 & -8/5 \\ -2/7 & 1/5 & 9/35 & -16/35 \end{pmatrix}

(c) P = \begin{pmatrix} -1/3 & 1/4 & -5/7 & 12/7 \\ 5/14 & 4/7 & -3/7 & -5/7 \\ 1/7 & 6/7 & 1/14 & 2/7 \\ 2/7 & 3/14 & -1/7 & 1/7 \end{pmatrix}, Q = \begin{pmatrix} -1/9 & -1/9 & 5/18 \\ 2/9 & 2/9 & -1/18 \\ 2/3 & -1/3 & -3/8 \end{pmatrix}

2.5-5. M = \begin{pmatrix} 25 & 15 & 21 & 0 \\ 8 & 1 & 22 & 5 \\ 0 & 23 & 15 & 14 \end{pmatrix}. “you have won”

Chapter 3

Exercise 3.3

3.3-1. (a), (b), (c) and (e) are linearly independent sets. For (d), (2, 0, 1) = 5(1, 1, 0) − 2(0, 1, 1) − 3(1, 1, −1)

3.3-2. (1, 2, 3, 0)T = (1, 1, 2,−1)T + (1, 0, 1, 1)T + (−1, 1, 0, 0)T

3.3-3. a = 1 or 2

3.3-9. {β₁, β₂, β₃} is linearly independent if and only if ab + a + b ≠ 0

3.3-10. V = span{(1, 0)} and W = span{(0, 1)}

Exercise 3.5

3.5-1. {(1, 0, 0, −1), (0, 1, 0, 3), (0, 0, 1, −2)}

3.5-2. R(A) = span{(1, 0, 0, 0, 2, 0), (0, 1, 1, 0, −1, 0), (0, 0, 0, 1, 1, 0), (0, 0, 0, 0, 0, 1)}
C(A) = span{(1, 0, 0, 0)ᵀ, (0, 1, 0, 0)ᵀ, (0, 0, 1, 0)ᵀ, (0, 0, 0, 1)ᵀ}

Exercise 3.6

3.6-1. We choose {(1, 1, 1, 3), (1, −2, 4, 0), (2, 0, 5, 3), (3, 4, 6, −1)} as a basis. Then
(0, 3, 2, −2) = −(17/3)(1, 1, 1, 3) − (13/3)(1, −2, 4, 0) + 5(2, 0, 5, 3)
(1, 3, 5, 7) = −(181/21)(1, 1, 1, 3) − (170/21)(1, −2, 4, 0) + (74/7)(2, 0, 5, 3) − (8/7)(3, 4, 6, −1)

3.6-2. Using row operations, the augmented matrix (B | I₄), where B = \begin{pmatrix} 1 & 2 & 5 \\ 1 & 3 & 3 \\ -1 & -4 & -3 \\ 2 & 2 & 3 \end{pmatrix}, becomes
\begin{pmatrix} 1 & 2 & 5 & 1 & 0 & 0 & 0 \\ 0 & 1 & -2 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0.5 & -1 & -0.5 & 0 \\ 0 & 0 & 0 & 1.5 & -9 & -5.5 & 1 \end{pmatrix}.
So A = \begin{pmatrix} 1.5 & -9 & -5.5 & 1 \end{pmatrix}, or equivalently A = \begin{pmatrix} 3 & -18 & -11 & 2 \end{pmatrix}


3.6-3. {(1, 1, 0), (−1, 2, 1), (0, 1, 0)}

3.6-4. {(1, 1, 0, 1), (−1, 0, 1, 1), (0, 0, 1, 0), (0, 0, 0, 1)}

3.6-5. {(1, −2, 0, 1), (1, 0, 0, 1), (1, −2, 1, 1), (0, 1, 1, 1)} or {(1, −2, 0, 1), (1, −2, 1, 1), (1, −1, 0, 1), (0, 1, 1, 1)}

Exercise 3.7

3.7-1. (x₁, x₂, x₃, x₄, x₅) = (−1 + 4t − s, 2 + 2s − t, t, 3s, s)

3.7-2. a = −4

3.7-3. four 10-cent coins, three 50-cent coins, seven 1-dollar coins

3.7-4. (250 − 3t, 2050 + t, t), where t ∈ N and 0 ≤ t ≤ 83

Chapter 4

Exercise 4.1

4.1-1. −1

4.1-2. θ ∘ σ = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 3 & 6 & 4 & 5 & 1 \end{pmatrix}, σ ∘ θ = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 3 & 1 & 5 & 4 & 2 & 6 \end{pmatrix}

Exercise 4.2

4.2-1. det A = \prod_{i=1}^{n} a_{ii}, where A is an n × n matrix

Exercise 4.2

4.2-1. 0

4.2-3. (−1)^{n(n−1)/2} · (1/2) n^{n−1}(n + 1),  0,  (−1)ⁿ \sum_{i=1}^{n} a_i b_i

4.2-4. \prod_{i=1}^{n} \prod_{j=i+1}^{n+1} \begin{vmatrix} a_i & b_i \\ a_j & b_j \end{vmatrix}

4.2-8. \begin{pmatrix} 1 & -1/2 & -1/2 \\ 1/3 & 1/2 & 1/6 \\ -1/3 & 0 & 1/3 \end{pmatrix}

4.2-10. (d) Any nonzero integers x, y, z satisfying 7x + 13y − 4z = ±1 will do; for example, x = 2, y = 1 and z = 7.

Exercise 4.3

4.3-1. (x₁, x₂, x₃) = (−1, −1/2, −7/2)


4.3-2. (x₁, x₂, x₃) = (t, −2t/3, t/3)

Chapter 5

Exercise 5.1

5.1-1. n− 1

Exercise 5.2

5.2-1. \begin{pmatrix} 1 & 2 & -1 \\ 1 & 0 & -1 \end{pmatrix} x² + \begin{pmatrix} 1 & 0 & -1 \\ 4 & 2 & 0 \end{pmatrix} x + \begin{pmatrix} 1 & -3 & -1 \\ -3 & 0 & 1 \end{pmatrix}

5.2-2. −x³ + 2x² + x − 2 = −(x − 2)(x − 1)(x + 1)

5.2-3. 2, 1, −1. The algebraic and geometric multiplicities of each eigenvalue are 1

5.2-5. (−1)ⁿ p(x)

Exercise 5.3

5.3-3. (n+ 1)!

5.3-5. (a) \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}

(c) \frac{1}{4} \begin{pmatrix} e + e^{-1} + 2 & -e + e^{-1} & e + e^{-1} - 2 \\ -2e + 2e^{-1} & 2e + 2e^{-1} & -2e + 2e^{-1} \\ e + e^{-1} - 2 & -e + e^{-1} & e + e^{-1} + 2 \end{pmatrix}

5.3-6. \frac{1}{2^n \sqrt{5}} \begin{pmatrix} C^{n+1}_k 5^{k/2} & C^{n}_k 5^{(k-1)/2} \\ 2C^{n+1}_k 5^{k/2} & 2C^{n}_k 5^{(k-1)/2} \end{pmatrix}

Chapter 6

Exercise 6.2

6.2-1. λ = 2 or −1

6.2-2. −a + ab − b ≠ 0

6.2-5. Yes

6.2-6. No

6.2-11. Yes


Exercise 6.3

6.3-4. span(S) = \left\{ \begin{pmatrix} a & b \\ b & c \end{pmatrix} \;\middle|\; a, b, c ∈ F \right\}

6.3-5. Yes

Exercise 6.4

6.4-1. No. {α1, α2} is a basis of V

6.4-5. {(1, 1, 0, 0), (1, −1, 1, 0), (0, 2, 0, 1)}

6.4-6. {x² + 2, x + 3}

6.4-7. n(n + 1)/2

Exercise 6.5

6.5-2. Yes

6.5-3. W ′ = span{(0, 0, 1, 0), (0, 0, 0, 1)}

Chapter 7

Exercise 7.1

7.1-3. σ is onto. f = −(x+ 2)

7.1-4. (1) surjective, (2) injective, (3) injective and surjective, (4) injective and surjective

7.1-5. {0}

7.1-6. ker(σ) = \left\{ \begin{pmatrix} 0 & 0 \\ c & 0 \end{pmatrix} \;\middle|\; c ∈ R \right\}, nullity(σ) = 1, rank(σ) = 2

Exercise 7.2

7.2-1. (a) [σ]^A_B = \begin{pmatrix} 2 & -1 \\ 3 & 4 \\ 1 & 0 \end{pmatrix}, rank(σ) = 2, nullity(σ) = 0

(b) \begin{pmatrix} 1 & -1 & 2 \\ 2 & 1 & 0 \\ -1 & -2 & 2 \end{pmatrix}, rank(σ) = 3, nullity(σ) = 0

(c) \begin{pmatrix} 2 & 1 & -1 \end{pmatrix}, rank(σ) = 1, nullity(σ) = 2

(d) \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 1 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 1 & 0 & \cdots & 0 \end{pmatrix}, rank(σ) = 1, nullity(σ) = n − 1


(e) the n × n matrix with 1's on the anti-diagonal and 0's elsewhere (its j-th column is e_{n+1−j}), rank(σ) = n, nullity(σ) = 0

7.2-2. \begin{pmatrix} 13 & 1 \\ 4 & 6 \\ -23 & -1 \end{pmatrix}

7.2-3. \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 \end{pmatrix}

7.2-4. (b) [σ]_A = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 1 & -1 & 0 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}

(c) Eigenvalues: −1, 0, 0, 1, with corresponding eigenvectors \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ -1 \\ 1 \\ 1 \end{pmatrix}

(d) a \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} + b \begin{pmatrix} 0 & -1 \\ 0 & 1 \end{pmatrix} for all a, b ∈ R

7.2-5. \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}

7.2-6. \begin{pmatrix} 1 & 0 & 0 & 1 \end{pmatrix}

7.2-7. \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & 1 \\ 0 & \cdots & 0 & 0 & 1 \end{pmatrix}


7.2-8. \begin{pmatrix} 0 & 2 & 0 \\ 0 & 0 & 8 \\ 0 & 0 & 0 \end{pmatrix}

7.2-9. (11035)

7.2-10. \begin{pmatrix} 1/2 & \sqrt{3}/2 \\ \sqrt{3}/2 & -1/2 \end{pmatrix}

Exercise 7.3

7.3-1. \begin{pmatrix} 1 & -1/2 & 3/4 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/4 \end{pmatrix}

7.3-2. \begin{pmatrix} 2 & 1 & 1 \\ 3 & -2 & 1 \\ -1 & 3 & 1 \end{pmatrix}

7.3-3. \begin{pmatrix} 1/2 & 1/2 & 1 \\ -3/2 & 1/2 & -1 \end{pmatrix}

7.3-4. \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}

Chapter 8

Exercise 8.1

8.1-3. \frac{1}{70} \begin{pmatrix} 72(-4)^k + 40\cdot 3^k - 42 & 35(3^k - 1) & -72(-4)^k - 5\cdot 3^k + 77 \\ -82(-4)^k + 40\cdot 3^k + 42 & 35(3^k + 1) & 82(-4)^k - 5\cdot 3^k - 77 \\ 2(-4)^k + 40\cdot 3^k - 42 & 35(3^k - 1) & -2(-4)^k - 5\cdot 3^k + 77 \end{pmatrix}
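The closed form in 8.1-3 can be sanity-checked symbolically: setting k = 0 must give the identity matrix. A small sympy check of just that consistency (the matrix A itself is not restated here):

```python
from sympy import Matrix, eye, symbols

k = symbols('k')
Ak = Matrix([
    [ 72*(-4)**k + 40*3**k - 42, 35*(3**k - 1), -72*(-4)**k - 5*3**k + 77],
    [-82*(-4)**k + 40*3**k + 42, 35*(3**k + 1),  82*(-4)**k - 5*3**k - 77],
    [  2*(-4)**k + 40*3**k - 42, 35*(3**k - 1),  -2*(-4)**k - 5*3**k + 77],
]) / 70

# At k = 0 the closed form must reduce to A^0 = I.
print(Ak.subs(k, 0) == eye(3))   # True
```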

8.1-4. \begin{pmatrix} 2 & 7/30 & 1/6 \\ 0 & 7/4 & 5/4 \\ 0 & 3/4 & 9/4 \end{pmatrix}

8.1-5. A = \begin{pmatrix} 0 & 1 \\ 0.5 & 0.5 \end{pmatrix}, x_k = (1/6)(2 + (0.5)^{k−1}), \lim_{k\to\infty} x_k = 1/3
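The limit 1/3 in 8.1-5 reflects the convergence of the powers of A. A brief numerical illustration (only the matrix A and the quoted closed form are used; the initial terms of the sequence are not restated here):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.5, 0.5]])

# High powers of A settle down, which is what drives the sequence x_k to a limit.
print(np.linalg.matrix_power(A, 50))

# The quoted closed form tends to 1/3 as k grows.
x = lambda k: (2 + 0.5 ** (k - 1)) / 6
print([round(x(k), 6) for k in (1, 5, 10, 30)])
```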

8.1-8. \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} and O₂; their characteristic polynomials are both x²


Exercise 8.3

8.3-1. P = \begin{pmatrix} 3 & 0 & 7 & 6 \\ 1 & -2 & 2 & 2 \\ 3 & 0 & 0 & 0 \\ 1 & -1 & 2 & 2 \end{pmatrix}, P^{-1}AP = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}

8.3-2. \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -3+3i & 0 \\ 0 & 0 & 0 & -3-3i \end{pmatrix}

8.3-3. \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}

8.3-4. P = \begin{pmatrix} 0 & 1 & 0 & -1/5 \\ 0 & 1 & 1 & 0 \\ 5 & 0 & 0 & 0 \\ 10 & 1 & 1 & 1 \end{pmatrix}, P^{-1}AP = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}

8.3-5. P = \begin{pmatrix} -2 & 1 & 0 & 0 \\ -4 & 0 & 0 & 0 \\ 1 & 1 & -2 & 1 \\ 8 & 0 & -4 & 0 \end{pmatrix}, P^{-1}AP = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}

Chapter 9

Exercise 9.1

9.1-1. φ1(x, y, z) = −y + z, φ2 = −x− y + z, φ3 = x+ 2y − z

9.1-6. −(31/120)x⁴ + (27/20)x³ − (29/120)x² − (77/20)x + 2

Exercise 9.2

9.2-3. {ψ(x, y, z) = a(x + y + z) | a ∈ R}

9.2-4. {ψ(x, y, z, w) = a(3x − z − w) | a ∈ R}

Exercise 9.3

9.3-1. (a) ax (b) bx− ay (c) (a+ b)x− (a− b)y

9.3-2. (D(φ))(f) = φ(D ∘ f) = f(b) − f(a)


9.3-3. σ(φ) = O (the zero map)

Exercise 9.4

9.4-1. Yes

9.4-2. \begin{pmatrix} 0 & 2 & 0 \\ 2 & 0 & 0 \\ 0 & 0 & -2 \end{pmatrix}

9.4-3. \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}

9.4-6. (a) \begin{pmatrix} 4 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 3/2 \end{pmatrix} with P = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & -1/2 \\ 1 & 0 & 1 \end{pmatrix}

(b) \begin{pmatrix} 1 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 9 \end{pmatrix} with P = \begin{pmatrix} 1 & -2 & 2 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix}

(c) \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & -1/2 & 0 & 0 \\ 0 & 0 & -4 & 0 \\ 0 & 0 & 0 & -3 \end{pmatrix} with P = \begin{pmatrix} 1 & 1/2 & -1 & 1/2 \\ 1 & -1/2 & -2 & 0 \\ 0 & 0 & 1 & 3/2 \\ 0 & 0 & 0 & -1 \end{pmatrix}

Exercise 9.5

9.5-1. (a) 0 (b) 3 (c) 1

9.5-3. P = \begin{pmatrix} 1 & 0 & -3 \\ 1 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix}, q(y₁, y₂, y₃) = y₁² − y₂² − 14y₃²

9.5-4. q(x′, y′, z′, w′) = x′² − 2y′² + 3z′² − 2w′², where x = x′ + y′ + z′ + w′, y = y′ + z′ + w′, z = z′ + w′ and w = w′

Exercise 9.6

9.6-1. (a) \begin{pmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 0 \end{pmatrix} with P = \begin{pmatrix} 1 & -i & -1/2 + i/2 \\ 0 & 1 & 1/2 + i/2 \\ 0 & 0 & 1 \end{pmatrix}, 0

(b) \begin{pmatrix} 1 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 5/3 \end{pmatrix} with P = \begin{pmatrix} 1 & -1-2i & 1 - i/3 \\ 0 & 1 & -2/3 + 2i/3 \\ 0 & 1 & 1/3 + 2i/3 \end{pmatrix}, 1


Chapter 10

Exercise 10.1

10.1-4. (1/√2)(1, 0, 1, 0), (1/√6)(1, 2, −1, 0), (1/√21)(−2, 2, 2, 3), (1/7)(−2, −5, 2, −4)

10.1-6. {1, 2√3(x − 1/2), 6√5(x² − x + 1/6), 10√28(x³ − (3/2)x² + (3/5)x − 1/20)}
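The orthonormal sets in 10.1-4 and 10.1-6 are produced by the Gram-Schmidt process. A minimal numpy sketch of the procedure, applied to placeholder vectors rather than the exercise's own starting vectors:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize the rows of `vectors` by the Gram-Schmidt process."""
    basis = []
    for v in np.asarray(vectors, dtype=float):
        w = v - sum(np.dot(v, u) * u for u in basis)   # remove components along earlier vectors
        norm = np.linalg.norm(w)
        if norm > 1e-12:                               # skip linearly dependent vectors
            basis.append(w / norm)
    return np.array(basis)

# Placeholder input, not the vectors of Exercise 10.1-4:
Q = gram_schmidt([[1, 0, 1, 0], [1, 2, 0, 0], [0, 1, 1, 1]])
print(np.round(Q @ Q.T, 10))   # identity: the rows are orthonormal
```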

10.1-8. {(1/√3)(1, 1, −1), (1/√2)(−1, 1, 0)}

10.1-9. (0, 2/3, −1/3, 4/3)

10.1-10. 2

10.1-11. 3y = 2x + 4

10.1-12. y = (1/2)(4 + x − x²)

10.1-13. (a) (2, 1)
(b) (1/5)(8, 3, 6)
(c) (2, 1, 0)

10.1-14. Q = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1/\sqrt{2} & -1/\sqrt{2} & 0 \\ 0 & -1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, R = \begin{pmatrix} 1 & \sqrt{2} & 0 \\ 0 & -\sqrt{2} & -\sqrt{2} \\ 0 & 0 & \sqrt{2} \\ 0 & 0 & 0 \end{pmatrix}
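A QR decomposition such as the one in 10.1-14 can be checked numerically. numpy's qr may differ from the answer above by signs of columns of Q and rows of R; the matrix below is a placeholder, not the A of the exercise:

```python
import numpy as np

# Placeholder matrix, not the A of Exercise 10.1-14.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

Q, R = np.linalg.qr(A, mode='complete')   # Q is 4x4 orthogonal, R is 4x3 upper triangular
print(np.allclose(Q @ R, A))              # True
print(np.allclose(Q.T @ Q, np.eye(4)))    # True
```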

Exercise 10.3

10.3-2. ξ₂ ↦ (1/3)(−2ξ₁ + 2ξ₂ − ξ₃), ξ₃ ↦ (1/3)(−2ξ₁ − ξ₂ + 2ξ₃)

Exercise 10.4

10.4-1. P = \begin{pmatrix} 1 & 1/4 & 1/2 \\ 0 & 1/4 & -3/2 \\ 0 & 0 & 2 \end{pmatrix}; PᵀAP = \begin{pmatrix} 1 & 1/2 & -3 \\ 0 & 1/4 & -5/2 \\ 0 & 0 & 2 \end{pmatrix}

Exercise 10.5

10.5-2. x = x′ − (1/4)y′, y = −(1/12)y′, z = −(1/12)y′ − z′; x′² − (1/16)y′² + z′²

10.5-3. \begin{pmatrix} 1 & 1 & 1 \\ -4 & 14 & -4 \\ 1 & 0 & -17 \end{pmatrix}

10.5-4. (a) \begin{pmatrix} -1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}


(b) \frac{1}{2} \begin{pmatrix} e^{-1} + e & -e^{-1} + e & 0 \\ -e^{-1} + e & e^{-1} + e & 0 \\ 0 & 0 & 2e \end{pmatrix} = \begin{pmatrix} \cosh 1 & \sinh 1 & 0 \\ \sinh 1 & \cosh 1 & 0 \\ 0 & 0 & e \end{pmatrix}

10.5-5. \begin{pmatrix} -1/2 & 1/\sqrt{2} & -1/2 \\ -i/\sqrt{2} & i & i/\sqrt{2} \\ 1/2 & 1/\sqrt{2} & 1/2 \end{pmatrix}

10.5-6. \begin{pmatrix} 1/\sqrt{2} & -i/2 & -i/2 \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -i/2 & i/2 \end{pmatrix}

10.5-7. \begin{pmatrix} 1 & 1 & -1 & -1 \\ -1 & 1 & 1 & -1 \\ 1 & 1 & 1 & 1 \\ -1 & 1 & -1 & 1 \end{pmatrix}

10.5-12. H = I − 2WWᵀ = \begin{pmatrix} -161 & -18 & -90 & -18 \\ -18 & -1 & -10 & -2 \\ -90 & -10 & -49 & -10 \\ -18 & -2 & -10 & -1 \end{pmatrix}, HX = \begin{pmatrix} -969 \\ -107 \\ -535 \\ -107 \end{pmatrix}
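The matrix in 10.5-12 has the form H = I − 2WWᵀ. Taking W = (9, 1, 5, 1)ᵀ (an inference from the printed entries, not a restatement of the exercise) reproduces it exactly:

```python
import numpy as np

W = np.array([[9.0], [1.0], [5.0], [1.0]])   # inferred from the printed H; not quoted from the exercise
H = np.eye(4) - 2 * W @ W.T

print(H)
# [[-161. -18. -90. -18.]
#  [ -18.  -1. -10.  -2.]
#  [ -90. -10. -49. -10.]
#  [ -18.  -2. -10.  -1.]]
```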

10.5-14. U = \begin{pmatrix} -1/\sqrt{2} & -1/\sqrt{2} & 0 \\ -1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}, S = \begin{pmatrix} 4 & 0 \\ 0 & 2 \\ 0 & 0 \end{pmatrix}, V = \begin{pmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}; A = USVᵀ
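The factors in 10.5-14 can be multiplied out: USVᵀ gives the matrix A used below, and numpy's SVD recovers the singular values 4 and 2 (numpy returns Vᵀ as Vh and may choose different signs):

```python
import numpy as np

# Multiplying out the printed factors U S V^T gives this A (a consistency check,
# not a restatement of the exercise).
A = np.array([[1.0, 3.0],
              [3.0, 1.0],
              [0.0, 0.0]])

U, s, Vh = np.linalg.svd(A)          # numpy returns V^T as Vh
print(s)                             # [4. 2.], matching the diagonal of S above
S = np.zeros_like(A)
S[:2, :2] = np.diag(s)
print(np.allclose(U @ S @ Vh, A))    # True
```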

10.5-16. \begin{pmatrix} -4/25 & -2/25 & 0 \\ -2/25 & 1/25 & 0 \\ 0 & 0 & 2 \end{pmatrix}

Note: If you find any wrong answers, please send an e-mail to the author, Dr. W.C. Shiu ([email protected]), so that they can be corrected.


Index

n-tuple, 43

Addition, 2
table, 4
Adjoint, 92, 222
Algebraic multiplicity, 100
Annihilator, 187
Annihilator of a matrix, 106
Augmented matrix, 44
Basis, 65, 125
basis dual to, 183
dual basis, 183
ordered basis, 143
standard basis, 125
Best approximation, 216
Bilinear, 192
form, 192
Bilinear form
skew-symmetric, 193
symmetric, 193
Cancellation law, 118
Canonical basis, 71
Casting-out method, 73
Cauchy-Schwarz's inequality, 211
Cayley-Hamilton Theorem, 105
Characteristic equation, 99
Characteristic polynomial, 99
Characteristic value, 98
Characteristic vector, 98
Classical adjoint, 92
Codimension, 133
Coefficient matrix, 43
Cofactor, 90
Column, 20
Column space, 70
Companion matrix, 109
Congruence, 12

Congruent, 115, 193

Conjugate bilinear form

eigenvalue, 224

eigenvectors, 224

Conjugate linear, 206

Coordinate, 143

i-th, 143

Coordinate function, 183

Cramer’s rule, 96

Cycle of generalized eigenvectors, 158

Determinant, 85

Diagonal matrix, 21

Diagonalizable, 110

Dimension, 67

Divide, 16

Divisible, 6, 16

Division algorithm, 6

Divisor

common divisor, 6

greatest common divisor, 6

Dot product, 192

Dual bases, 186

Dual of a linear transformation, 189

Dual space, 182

Dual spaces, 186

Eigenpair, 98, 148

Eigenspace, 100, 154

Eigenvalue, 98

Eigenvalue problem, 98

Eigenvector, 98

generalized, 158

Elementary column operations, 38

Elementary matrix, 35

type, 35

Elementary row operations, 34

End vector of a cycle, 158


Epimorphism, 136
Equivalence class, 10
Euclidean space, 211
Factor, 16
Field, 2
finite, 5
subfield, 4
Fourier coefficients, 215
Free variables, 50
Full column rank, 80
Gaussian elimination, 40
General solution, 79
Generalized eigenspace, 159
Geometric multiplicity, 100
Gram determinant, 205
Gram-Schmidt orthonormalization, 212
Hermite normal form, 39
Hermitian congruent, 207
Hermitian form, 206
negative definite, 208
non-negative definite, 208
non-positive definite, 208
positive definite, 208
signature, 208
Hermitian matrix, 206
negative definite, 208
non-negative semi-definite, 208
non-positive definite, 208
positive definite, 208
signature, 208
Hermitian quadratic form, 206
Hessenberg form, 242
Householder matrix, 239
Householder transformation, 240
Idempotent, 113
Identity linear transformation, 137
Identity matrix, 21
Initial vector of a cycle, 158
Inner product, 192, 210
Inner product space, 211
Invariant, 154
Inverse, 2
Inversion, 83

Involutory, 113

Isometry, 225

Isomorphism, 136

Jordan block, 160

Jordan canonical basis, 160

Jordan canonical form, 160

Kernel, 138

Kronecker delta, 21

Lagrange interpolation, 185

Lead variables, 50

Leading

column, 39, 50

one, 39

Least squares problem, 216

Length, 210

Length of a cycle, 158

Linear combination, 63, 120

Linear form, 182

Linear functional, 182

Linear relation, 64

non-trivial, 64

trivial, 64

Linear system, 43

consistent, 44

homogeneous, 43

inconsistent, 44

non-trivial solution, 44

trivial solution, 44

Linear transformation, 136

adjoint, 222

characteristic polynomial, 152

diagonalizable, 153

matrix representation, 144

minimum polynomial, 153

normal, 231

rank, 137

represented by matrix, 144

self-adjoint, 224

Linear transformation associated with a conjugate bilinear form, 224

Linearly dependent, 65, 120, 121

Linearly dependent on, 63, 120

Linearly dependent over F, 65


Linearly independent, 65, 120, 121

Main diagonal, 21
Matrix, 19
m × n, 19
addition, 22
block multiplication, 26
canonical form, 53
column rank, 70
congruent, 193
defective, 110
element, 20
entry, 20
equal, 20
inverse, 31
invertible, 31
lower triangular, 21
negative definite, 113, 205
negative semi-definite, 113
non-negative definite, 113, 205
non-positive definite, 113
non-singular, 31
normal, 234
null space, 75
orthogonal, 115, 227
orthogonal similar, 227
partitioned multiplication, 26
positive definite, 113, 205
positive semi-definite, 113
product, 23
rank, 49
representing, 193
row rank, 70
scalar multiplication, 22
scalar product, 22
similar, 102
simple, 110
singular, 31
size, 20
skew-symmetric, 25
square, 20
submatrix, 26
sum, 22
symmetric, 25
trace, 30, 99
transpose, 25

unitary, 227

unitary similar, 227

upper triangular, 21

Vandermonde, 92

Matrix of transition, 148

Maximal linearly independent set, 66

Minimum polynomial, 106

Minor, 90

Modulo, 11

Monomorphism, 136

Multiplication, 2

table, 4

Norm, 210

Normal equation, 217

Null space, 63

Nullity, 75, 138

Orthogonal, 211

Orthogonal basis, 211

Orthogonal complement, 223

Orthogonal projection, 223

Orthogonal set, 211

Orthogonal transformation, 225

Orthonormal basis, 211

Orthonormal set, 211

Parallelogram law, 219

Particular solution, 79

Partition, 10

Permutation, 83

even, 83

identity, 83

odd, 83

Permutation matrix, 53

Pivot, 39

Polarization identities, 219

Polynomial, 14

coefficients, 14

common divisor, 16

greatest common divisor, 16

constant polynomial, 14

constant term, 14

degree, 14

division algorithm, 15

equal, 14


in A, 104

irreducible, 16

leading coefficient, 14

leading term, 14

monic, 14

product, 15

reducible, 16

relatively prime, 16

scalar multiplication, 15

sum, 14

zero polynomial, 14

Projection, 142

Projection matrix, 217

Pythagorean theorem, 219

QR decomposition, 213

Quadratic form, 195

negative definite, 204

non-negative definite, 204

non-positive definite, 204

positive definite, 204

signature, 204

Quotient set, 10

Reduced row echelon form, 39

Reduced row echelon matrix, 39

Related to, 9

Relation

binary relation, 9

equivalence, 10

reflexive, 9

symmetric, 9

transitive, 9

Restriction, 139

Ring

commutative ring, 3

polynomial ring, 15

polynomial ring with two indeterminates, 18

Root, 18

Row, 20

Row echelon form, 41

Row space, 70

Row-equivalent, 38

Scalar, 118

Scalar product, 210
Scalar transformation, 137
Schur's Theorem, 229
Self-adjoint, 224
Sign, 83
Singular value decomposition, 244
Singular values, 244
Solution set, 43
Solution space, 63
Span, 124
Span of, 63
Spanning set, 63, 124
Split, 156
Standard basis for Fn, 66
standard basis for M2(R), 148
standard basis for P3(R), 148
Standard unit vectors, 21
Steinitz Replacement Theorem, 126
Stochastic matrix, 103
Subspace, 69, 123
complement, 133
complementary, 133
direct sum, 132
invariant, 154
sum, 131, 133
Sum of subspaces, 64
Symmetric matrix
signature, 205
System
coefficients, 43
of linear equations, 43
solution, 43
Term, 14
Test for Diagonalizability
first, 110
second, 111, 153
third, 165
Trace, 30, 99
Transformation
orthogonal, 225
unitary, 225
Transition matrix, 148
Trapezoidal form, 41

Unit vector, 211


Unitary space, 211
Unitary transformation, 225
Unity, 2
Vandermonde determinant, 91
Vector, 62, 118
addition, 117
scalar multiplication, 117
Vector space, 117
finite dimension, 127
infinite dimension, 127
of Fn, 62
Zero, 2
Zero matrix, 21