CHAPTER IV
OPERATORS ON INNER PRODUCT SPACES
§1. Complex Inner Product Spaces
1.1. Let us recall the inner product (or the dot product) for the real n–dimensional
Euclidean space Rn: for vectors x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) in Rn, the
inner product 〈x,y〉 (denoted by x · y in some books) is defined to be
〈x,y〉 = x1y1 + x2y2 + · · · + xnyn,
and the norm (or the magnitude) ‖x‖ is given by
‖x‖ = √〈x,x〉 = √(x1² + x2² + · · · + xn²).
For complex vectors, we cannot copy this definition directly. We need to use complex
conjugation to modify this definition in such a way that 〈x,x〉 ≥ 0 so that the definition
of magnitude ‖x‖ = √〈x,x〉 still makes sense. Recall that the conjugate of a complex
number z = a + ib, where a and b are real, is given by z̄ = a − ib, and
z̄ z = (a − ib)(a + ib) = a² + b² = |z|².
The identity z̄ z = |z|² turns out to be very useful and should be kept in mind.
Recall that the addition and the scalar multiplication of vectors in Cn are defined
as follows: for x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) in Cn, and a in C,
x + y = (x1 + y1, x2 + y2, . . . , xn + yn) and ax = (ax1, ax2, . . . , axn).
The inner product (or the scalar product) 〈x,y〉 of vectors x and y is defined by
〈x,y〉 = x1ȳ1 + x2ȳ2 + · · · + xnȳn. (1.1.1)
Notice that 〈x,x〉 = x1x̄1 + x2x̄2 + · · · + xnx̄n = |x1|² + |x2|² + · · · + |xn|² ≥ 0, which is what
we ask for. The norm of x is given by
‖x‖ = 〈x,x〉^{1/2} = √(|x1|² + |x2|² + · · · + |xn|²).
Remark: In (1.1.1), it is not clear why we prefer to take complex conjugates of components
of y instead of components of x. Actually this is more or less due to the tradition of
mathematics, rather than our preference. (Physicists have a different tradition!)
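Here is a minimal numerical sketch of (1.1.1), assuming Python with NumPy (the vectors are arbitrary illustrative data; note that NumPy's np.vdot conjugates its first argument, whereas (1.1.1) conjugates the components of y):

```python
import numpy as np

# Sketch: the inner product (1.1.1) and the norm in C^3.
# np.vdot(a, b) conjugates its *first* argument, so the text's
# <x, y> = x1*conj(y1) + ... corresponds to np.vdot(y, x).
x = np.array([1 + 1j, 2j, 3.0])
y = np.array([2.0, 1 - 1j, 1j])

inner = np.sum(x * np.conj(y))            # <x, y> in the convention of (1.1.1)
print(np.isclose(inner, np.vdot(y, x)))   # True

norm_x = np.sqrt(np.sum(np.abs(x) ** 2))  # ||x|| = sqrt(<x, x>)
print(np.isclose(norm_x, np.linalg.norm(x)))  # True
```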
The space Cn provides us with the typical example of complex inner product spaces,
defined as follows:
Definition. By an inner product on a complex vector space we mean a device of
assigning to each pair of vectors x and y a complex number denoted by 〈x,y〉, such that
the following conditions are satisfied:
(C1) 〈x,x〉 ≥ 0, and 〈x,x〉 = 0 if and only if x = 0.
(C2) 〈y,x〉 = 〈x,y〉̄ (the bar denoting complex conjugation).
(C3) The inner product is a “sesquilinear map”, i.e.
〈a1x1 + a2x2, y〉 = a1〈x1,y〉 + a2〈x2,y〉 and 〈x, b1y1 + b2y2〉 = b̄1〈x,y1〉 + b̄2〈x,y2〉.
(Actually the second identity of (C3) above is a consequence of the first, together with (C2).)
Inner products for real vector spaces can be defined in a similar fashion. It is slightly
simpler because there is no need to take complex conjugation. This is simply because the
conjugate of a real number is just itself.
Besides Cn, another example of complex inner product space is given as follows.
Consider a space F of well-behaved complex-valued functions over an interval, say [a, b];
(here we do not specify the technical meaning of being well-behaved). The inner product
〈f, g〉 of f, g ∈ F is given by
〈f, g〉 = (1/(b − a)) ∫_a^b f(t) ḡ(t) dt, for f, g ∈ F.
(On the right hand side, 1/(b− a) is a normalization factor added for convenience in the
future.) The norm induced by this inner product is
‖f‖ ≡ 〈f, f〉^{1/2} = ( (1/(b − a)) ∫_a^b |f(t)|² dt )^{1/2} for f ∈ F.
In the future we will take F to be the space of trigonometric polynomials and [a, b] is
any interval of length 2π, such as [0, 2π] and [−π, π].
1.2. Let V be a complex vector space with an inner product 〈·, ·〉. We say that
two vectors x and y in V are orthogonal or perpendicular if their inner product
is zero and we write x⊥y in this case. Thus, by our definition here,
x⊥y ⇐⇒ 〈x,y〉 = 0.
From the definition of orthogonality you should recognize that, first, the zero vector 0
is orthogonal to every vector (indeed, for each vector x in V , 〈0,x〉 = 〈0 + 0,x〉 =
〈0,x〉 + 〈0,x〉 by (C3) and hence 〈0,x〉 = 0); second, 0 is the only vector orthogonal to
itself (this follows from (C1)) and hence 0 is the only vector orthogonal to every vector;
third, x⊥y implies y⊥x (indeed, if 〈x,y〉 = 0, then 〈y,x〉 = 〈x,y〉̄ = 0̄ = 0).
A set of nonzero vectors S is called an orthogonal system if each vector in S is
orthogonal to all other vectors in S. If, furthermore, each vector in S has length 1,
then S is called an orthonormal system. (Notice the difference of the endings of the
words “orthogonal” and “orthonormal”.) We have the following generalized Pythagoras
theorem: If v1,v2, · · · ,vn form an orthogonal system, then
‖v1 + v2 + · · · + vn‖² = ‖v1‖² + ‖v2‖² + · · · + ‖vn‖². (1.2.1)
We prove this by induction on n. When n = 1, (1.2.1) becomes ‖v1‖² = ‖v1‖² and there is
nothing to prove. So let n ≥ 2 and assume that the theorem is true for n− 1 vectors. Let
w = v2 + v3 + · · · + vn. Then, by our induction hypothesis, ‖w‖² = ∑_{k=2}^n ‖vk‖². Thus
(1.2.1) becomes ‖v1 + w‖² = ‖v1‖² + ‖w‖², which remains to be verified. Notice that
〈v1,w〉 = 〈v1,v2〉 + 〈v1,v3〉 + · · · + 〈v1,vn〉 = 0.
Hence
‖v1 + w‖² = 〈v1 + w, v1 + w〉
= 〈v1,v1〉 + 〈v1,w〉 + 〈w,v1〉 + 〈w,w〉
= 〈v1,v1〉 + 〈v1,w〉 + 〈v1,w〉̄ + 〈w,w〉
= 〈v1,v1〉 + 〈w,w〉 = ‖v1‖² + ‖w‖².
Hence (1.2.1) is valid.
Given an orthonormal system E = {e1, e2, . . . , en} in V , and a vector v which can
be written as a linear combination of vectors in E, say
v = v1e1 + v2e2 + · · · + vnen ≡ ∑_{k=1}^n vk ek,
we look for an explicit expression for the coefficients vk in this linear combination. By
the linearity in the “first slot” of the inner product, we have
〈v, ej〉 = 〈∑_{k=1}^n vk ek, ej〉 = ∑_{k=1}^n vk〈ek, ej〉.
Note that 〈ek, ej〉 are zeros except when k = j, which gives 1 in this case; (in short,
〈ek, ej〉 = δjk). So the above identity becomes 〈v, ej〉 = vj . Thus
v = ∑_{k=1}^n 〈v, ek〉ek = 〈v, e1〉e1 + 〈v, e2〉e2 + · · · + 〈v, en〉en. (1.2.2)
Since ‖〈v, ek〉ek‖ = |〈v, ek〉|‖ek‖ = |〈v, ek〉|, the generalized Pythagoras theorem gives
‖v‖² = |〈v, e1〉|² + |〈v, e2〉|² + · · · + |〈v, en〉|², (1.2.3)
if v is in the linear span of the orthonormal system E = {e1, e2, . . . , en}. The last
identity is a general fact about orthonormal systems that should be kept in mind.
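As a small numerical sketch of (1.2.2) and (1.2.3), assuming Python with NumPy (the unitary factor of a QR factorization supplies an orthonormal basis of C³; all data are arbitrary illustrations):

```python
import numpy as np

# Sketch: expand a vector in an orthonormal basis of C^3 and check
# (1.2.2) and (1.2.3).  Q is unitary, so its columns e_k are orthonormal.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# coefficients v_k = <v, e_k> = sum_j v_j * conj((e_k)_j)
coeffs = np.array([np.sum(v * np.conj(Q[:, k])) for k in range(3)])
recon = sum(coeffs[k] * Q[:, k] for k in range(3))            # identity (1.2.2)

print(np.allclose(recon, v))                                  # True
print(np.isclose(np.sum(np.abs(coeffs) ** 2), np.linalg.norm(v) ** 2))  # (1.2.3)
```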
1.3. Next we consider a slightly more general problem: given a vector v in an inner
product space V and a subspace W of V , spanned by a given orthogonal system S =
{w1,w2, . . . ,wr} of nonzero vectors (〈wk,wj〉 = 0 for k ≠ j and 〈wk,wk〉 ≠ 0, where k
and j run between 1 and r), find the so-called orthogonal decomposition of v:
v = w + h, (1.3.1)
where w ∈W and h ⊥W (that is, h is perpendicular to all vectors in W ). The vector w
here will be called the (orthogonal) projection of v onto W . Since w is in W and W is
spanned by w1,w2, . . . ,wr, we can write
w = a1w1 + a2w2 + · · · + arwr. (1.3.2)
We have to find a1, a2, . . ., ar. Identity (1.3.1) can be rewritten as
v = w + h = ∑_{k=1}^r ak wk + h.
Take any vector from w1,w2, . . . ,wr, say wj, and form the inner product of wj with each
side of the above identity. By the linearity of the “first slot” of inner product, we have
〈v,wj〉 = ∑_{k=1}^r ak〈wk,wj〉 + 〈h,wj〉. Note that 〈wk,wj〉 are zeros except when k = j.
Hence ∑_{k=1}^r ak〈wk,wj〉 reduces to aj〈wj,wj〉. On the other hand, 〈h,wj〉 = 0
because h is perpendicular to W and wj is in W . Thus we arrive at 〈v,wj〉 = aj〈wj,wj〉,
or aj = 〈v,wj〉/〈wj,wj〉. Substitute this expression for aj into (1.3.2), switching the index
j to k, to obtain:
w = ∑_{k=1}^r (〈v,wk〉/〈wk,wk〉) wk ≡ (〈v,w1〉/〈w1,w1〉) w1 + (〈v,w2〉/〈w2,w2〉) w2 + · · · + (〈v,wr〉/〈wr,wr〉) wr, (1.3.3)
which is the required projection.
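A minimal sketch of formula (1.3.3), assuming Python with NumPy (real case, with arbitrary illustrative vectors): we project v onto the span of an orthogonal system and check that the remainder is perpendicular to it.

```python
import numpy as np

# Sketch: project v onto W = span{w1, w2} using formula (1.3.3);
# w1 and w2 form an orthogonal (not normalized) system.
w1 = np.array([1.0, -1.0, 0.0])
w2 = np.array([1.0, 1.0, 2.0])          # <w1, w2> = 0
v = np.array([3.0, 1.0, 1.0])

w = sum((v @ wk) / (wk @ wk) * wk for wk in (w1, w2))   # formula (1.3.3)
h = v - w                                               # v = w + h

print(w)                                                # the projection
print(np.isclose(h @ w1, 0.0), np.isclose(h @ w2, 0.0)) # True True: h ⊥ W
```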
Now we consider two special cases: The first case is that S consists of a single (nonzero)
vector, say u. Write down the orthogonal decomposition
v = (〈v,u〉/〈u,u〉) u + h, where h ⊥ u.
The generalized Pythagoras theorem gives ‖v‖² = |〈v,u〉/〈u,u〉|² ‖u‖² + ‖h‖². Using
〈u,u〉 = ‖u‖², we rewrite the first term on the right-hand side as |〈v,u〉|²/‖u‖². Then
we show our generosity by dropping the second term ‖h‖² on the right to obtain the
inequality ‖v‖² ≥ |〈v,u〉|²/‖u‖². We can rearrange this into
|〈v,u〉| ≤ ‖v‖‖u‖, (1.3.4)
which is the celebrated Cauchy–Schwarz inequality.
The second special case is that S consists of an orthonormal system, say S =
{e1, e2, . . . , en}. In this case
w = ∑_{k=1}^n 〈v, ek〉ek with ‖w‖² = ∑_{k=1}^n |〈v, ek〉|².
The orthogonal decomposition v = w + h tells us that ‖v‖² = ‖w‖² + ‖h‖². Dropping
‖h‖², we get ‖v‖² ≥ ‖w‖², or ‖w‖² ≤ ‖v‖². We have arrived at
∑_{k=1}^n |〈v, ek〉|² ≤ ‖v‖². (1.3.5)
Notice that this inequality also holds for an infinite orthonormal system {ek}, k = 1, 2, . . . .
Indeed, for any n, applying this inequality to the finite system {e1, . . . , en}, we get (1.3.5)
above. Letting n → ∞, we obtain
∑_{k=1}^∞ |〈v, ek〉|² ≤ ‖v‖²,
which is usually called Bessel’s inequality.
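A tiny numerical sketch of (1.3.5), assuming Python with NumPy (arbitrary illustrative vectors): two orthonormal vectors in C³ cannot capture more than ‖v‖².

```python
import numpy as np

# Sketch: Bessel's inequality (1.3.5) for a *partial* orthonormal system.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)   # e1, e2 orthonormal
v = np.array([1.0 + 2j, -1.0, 3j])

lhs = sum(abs(np.sum(v * np.conj(e))) ** 2 for e in (e1, e2))
print(lhs, np.linalg.norm(v) ** 2)            # 10.0 <= 15.0
print(lhs <= np.linalg.norm(v) ** 2 + 1e-12)  # True: Bessel's inequality
```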
1.4. In the present section, we give some examples of orthonormal systems.
Example 1.4.1. In Cn, the standard basis consisting of vectors
e1 = (1, 0, . . . , 0, 0), e2 = (0, 1, . . . , 0, 0), . . . , en = (0, 0, . . . , 0, 1)
clearly forms an orthonormal basis.
Example 1.4.2.* Fix a positive integer n and let ω = e^{2πi/n}, which is called a
primitive nth root of unity. Consider the following vectors in Cn:
fk = (1/√n) (1, ω^{k−1}, ω^{2(k−1)}, . . . , ω^{(n−1)(k−1)}); 1 ≤ k ≤ n.
We write down the first three of them to see the general pattern:
f1 = (1, 1, 1, . . . , 1)/√n,
f2 = (1, ω, ω², . . . , ω^{n−1})/√n,
f3 = (1, ω², ω⁴, . . . , ω^{2(n−1)})/√n.
We claim that fk (1 ≤ k ≤ n) form an orthonormal basis in Cn. First we check that
they are unit vectors:
‖fk‖² = (1/n) (1² + |ω^{k−1}|² + |ω^{2(k−1)}|² + · · · + |ω^{(n−1)(k−1)}|²) = 1
in view of |ω| = 1. Next we show that, for k ≠ ℓ, 〈fk, fℓ〉 = 0. For definiteness, let us
assume 1 ≤ ℓ < k ≤ n. By using ω̄ = ω^{−1}, we get
〈fk, fℓ〉 = (1 + ω^k ω̄^ℓ + ω^{2k} ω̄^{2ℓ} + · · · + ω^{(n−1)k} ω̄^{(n−1)ℓ})/n
= (1 + ω^{k−ℓ} + ω^{2(k−ℓ)} + · · · + ω^{(n−1)(k−ℓ)})/n
= (1 + η + η² + · · · + η^{n−1})/n,
where η = ω^{k−ℓ}. Now
(1 − η)(1 + η + η² + · · · + η^{n−1}) = 1 − η^n = 1 − ω^{(k−ℓ)n} = 1 − (ω^n)^{k−ℓ} = 1 − 1 = 0.
Since 0 < k − ℓ < n, η ≡ ω^{k−ℓ} ≠ 1, or 1 − η ≠ 0. Hence 1 + η + η² + · · · + η^{n−1} = 0. Now
〈fk, fℓ〉 = 0 is clear. This example will be referred to in the next section when we discuss
the finite Fourier transform (in Example 2.7.1).
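A minimal numerical sketch of this example, assuming Python with NumPy (the index k is 0-based in the code; n = 5 is an arbitrary choice):

```python
import numpy as np

# Sketch of Example 1.4.2: with omega = e^{2πi/n}, the vectors
# f_k = (1, ω^{k-1}, ..., ω^{(n-1)(k-1)})/√n form an orthonormal basis.
n = 5
omega = np.exp(2j * np.pi / n)
# row k (0-based) holds f_{k+1}; entry j is omega^(j*k) / sqrt(n)
F = np.array([[omega ** (j * k) for j in range(n)] for k in range(n)]) / np.sqrt(n)

gram = F @ F.conj().T                 # Gram matrix: entries <f_k, f_l>
print(np.allclose(gram, np.eye(n)))   # True: an orthonormal basis of C^n
```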
Example 1.4.3*. Consider the space of all periodic functions of period 2π. The
inner product of two such functions f and g is defined to be
〈f, g〉 = (1/(2π)) ∫_0^{2π} f(t) ḡ(t) dt.
We claim that the system e^{int} (−∞ < n < ∞), where n ranges over all integers, is
orthonormal. First we check that each of them is of unit length:
‖e^{int}‖² = (1/(2π)) ∫_0^{2π} |e^{int}|² dt = (1/(2π)) ∫_0^{2π} 1 dt = 1.
Next, for n ≠ m, we have
〈e^{int}, e^{imt}〉 = (1/(2π)) ∫_0^{2π} e^{int} e^{−imt} dt = (1/(2π)) ∫_0^{2π} e^{i(n−m)t} dt
= (1/(2π)) · e^{i(n−m)t}/(i(n − m)) |_0^{2π} = (1/(2πi(n − m))) (1 − 1) = 0.
The orthogonal decomposition of a function f in this space gives its Fourier series:
f(t) = ∑_{−∞<n<∞} cn e^{int}, where cn = (1/(2π)) ∫_0^{2π} f(t) e^{−int} dt.
Bessel’s inequality says
∑_{−∞<n<∞} |cn|² ≤ (1/(2π)) ∫_0^{2π} |f(t)|² dt,
showing that the infinite sum on the left hand side is a finite number. This fact is often
stated as follows: the sequence of Fourier coefficients is square summable.
Example 1.4.4*. Consider the space of all even functions of period 2π. The inner
product of two such functions f and g is defined to be
〈f, g〉 = (1/π) ∫_0^π f(t) ḡ(t) dt.
Then the following functions
1, √2 cos t, √2 cos 2t, √2 cos 3t, . . .
form an orthonormal system of this space. To show this, we need to check
(2/π) ∫_0^π cos mt cos nt dt = δmn = { 1 if m = n; 0 if m ≠ n } (1.4.1)
which is left to the reader as an exercise.
Example 1.4.5*. In this example we introduce the so-called Chebyshev polynomials
Tn(x) and Un(x), which have extensive applications in numerical analysis and some
extremal problems arising from electrical engineering. First we observe, from Euler's identity,
cos nt + i sin nt = e^{int} = (e^{it})^n = (cos t + i sin t)^n. (1.4.2)
We may try to use the binomial formula to expand the right hand side of (1.4.2) and, if we
are patient enough, we can see the following pattern:
e^{int} = Tn(x) + i Un−1(x) sin t with x = cos t, (1.4.3)
where Tn and Un−1 are some polynomials of degrees n and n − 1 respectively.
Alternatively, we can use induction to verify (1.4.3). When n = 1, we simply put T1(x) = x
and U0(x) = 1. Assume the validity for n = k. Then
e^{i(k+1)t} = e^{ikt} e^{it} = (Tk(x) + i Uk−1(x) sin t)(cos t + i sin t)
= Tk(x) cos t − Uk−1(x) sin²t + i (Tk(x) sin t + Uk−1(x) sin t cos t)
= Tk(x) x − Uk−1(x)(1 − x²) + i (Tk(x) + Uk−1(x) x) sin t.
Thus we have e^{i(k+1)t} = T_{k+1}(x) + i Uk(x) sin t, where
T_{k+1}(x) = x Tk(x) − (1 − x²) Uk−1(x), Uk(x) = Tk(x) + x Uk−1(x).
The last two identities tell us how to generate the polynomials Tn(x) and Un(x) recursively.
Comparing the real parts of both sides of (1.4.3), we get cos nt = Tn(x) = Tn(cos t). The
orthogonality relation (1.4.1) in the last example can be rewritten as
(2/π) ∫_0^π Tm(cos t) Tn(cos t) dt = δmn.
Now apply the change of variable x = cos t. Notice that cos 0 = 1, cos π = −1 and
dx = − sin t dt, which gives dt = −dx/sin t = −dx/√(1 − cos²t) = −dx/√(1 − x²); (notice
that, for 0 ≤ t ≤ π, we have sin t ≥ 0). Observe that, when t runs from 0 to π, cos t drops
from 1 to −1. Thus we have
(2/π) ∫_{−1}^{1} Tm(x) Tn(x) dx/√(1 − x²) = δmn.
This shows that if we define the inner product of two polynomial functions f and g by
〈f, g〉 = (2/π) ∫_{−1}^{1} f(x) g(x) dx/√(1 − x²),
then the Chebyshev polynomials Tn(x) (n = 1, 2, 3, . . .) form an orthonormal system.
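A quick numerical sketch of the Chebyshev orthogonality relation, assuming Python with NumPy (using the substitution x = cos t, the weighted integral becomes the plain trigonometric integral checked below; the grid size is an arbitrary choice):

```python
import numpy as np

# Sketch: (2/π)∫_0^π cos(mt)cos(nt) dt = δ_{mn} for m, n ≥ 1, which is the
# Chebyshev relation after the substitution x = cos t.
t = np.linspace(0.0, np.pi, 20001)

def inner(m, n):
    return (2.0 / np.pi) * np.trapz(np.cos(m * t) * np.cos(n * t), t)

print(round(inner(3, 3), 6), round(inner(2, 5), 6))   # ≈ 1.0 and 0.0
```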
1.5. Given a list of linearly independent vectors v1, v2, . . . , vn in an inner product
space V , there is a procedure of constructing an orthonormal system e1, e2, . . . , en,
called the Gram-Schmidt process, with the property that
span {v1,v2, . . . ,vk} = span {e1, e2, . . . , ek}
for each k = 1, 2, . . . , n. To make things easier, let us describe how to construct an
orthogonal basis b1,b2, . . . ,bn with a similar property. After this we normalize b’s to
get e’s — a simple finishing touch.
We construct b’s in n steps: the kth step is the one to obtain bk, (1 ≤ k ≤ n). The
first step is the easiest one: just take v1 to be b1. Now suppose that the (k − 1)th step
has been accomplished: we have obtained an orthogonal system {b1,b2, . . . ,bk−1} which
spans the same subspace as {v1,v2, . . . ,vk−1} does, say Wk−1. Consider the vectors
b1, b2, . . . , bk−1, vk, vk+1, . . . , vn.
Let wk be the projection of vk onto the subspace Wk−1, which is given by
wk = (〈vk,b1〉/〈b1,b1〉) b1 + (〈vk,b2〉/〈b2,b2〉) b2 + · · · + (〈vk,bk−1〉/〈bk−1,bk−1〉) bk−1
according to (1.3.3). Now let bk = vk −wk. Then bk ⊥Wk−1, and hence b1,b2, . . . ,bk
form an orthogonal system. Also, from the fact that wk is in Wk−1, we can see that
the set {b1,b2, . . . ,bk} spans the same subspace as {v1,v2, . . . ,vk} does; (this subspace
should be denoted by Wk.) As we have mentioned before, once we get the orthogonal basis
b1,b2, . . . ,bn, the required orthonormal basis e1, e2, . . . , en can be obtained immediately
by normalization:
e1 = b1/‖b1‖, e2 = b2/‖b2‖, . . . , en = bn/‖bn‖.
The Gram–Schmidt process is more or less a way of taking a given bunch of vectors and,
one by one, progressively “straightening them up”. Each time, you turn a vector to make it
orthogonal to all the previous vectors which have already been “straightened up”. In this way
v1,v2, . . . ,vn is gradually replaced by b1,b2, . . . ,bn, one vector at a time.
Example 1.5.1. Apply the Gram–Schmidt process to the basis consisting of v1 =
(1, 1, 1), v2 = (2, 0, 1) and v3 = (0, 0, 3) in C3 to obtain an orthonormal basis.
Solution. Let b1 = v1 = (1, 1, 1),
b2 = v2 − (〈v2,b1〉/〈b1,b1〉) b1 = (2, 0, 1) − (3/3)(1, 1, 1) = (1, −1, 0),
b3 = v3 − (〈v3,b1〉/〈b1,b1〉) b1 − (〈v3,b2〉/〈b2,b2〉) b2 = (0, 0, 3) − (3/3)(1, 1, 1) − (0/2)(1, −1, 0) = (−1, −1, 2).
Upon normalization, we obtain the following orthonormal basis:
e1 = (1/√3, 1/√3, 1/√3), e2 = (1/√2, −1/√2, 0), e3 = (−1/√6, −1/√6, 2/√6).
We can use the Gram–Schmidt process to prove that every finite dimensional inner product space
has an orthonormal basis. Indeed, if V is a finite dimensional inner product space, either over R or over
C, we can take any basis in V and apply the Gram–Schmidt process to this basis to obtain an
orthonormal basis of V .
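A minimal sketch of the process just described, assuming Python with NumPy and the inner-product convention of §1.1 (so 〈v, b〉 is np.vdot(b, v)); as a usage example it reproduces Example 1.5.1:

```python
import numpy as np

# Sketch of the Gram–Schmidt process: produce the orthogonal b_k first,
# then normalize to get the e_k.
def gram_schmidt(vectors):
    bs = []                                  # the orthogonal vectors b_k
    for v in vectors:
        # w_k: projection of v_k onto span{b_1, ..., b_{k-1}}, as in (1.3.3)
        w = sum((np.vdot(b, v) / np.vdot(b, b)) * b for b in bs)
        bs.append(v - w)                     # b_k = v_k - w_k
    return [b / np.linalg.norm(b) for b in bs]   # e_k = b_k / ||b_k||

# Reproduce Example 1.5.1:
e = gram_schmidt([np.array([1.0, 1, 1]), np.array([2.0, 0, 1]), np.array([0.0, 0, 3])])
print(np.round(e, 4))   # rows ≈ (1,1,1)/√3, (1,-1,0)/√2, (-1,-1,2)/√6
```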
EXERCISE SET IV.1.
Review Questions. What is the main difference between a complex and a real inner
product space? What is the orthogonal projection onto a subspace? How do we compute it
when an orthogonal basis of this subspace is given? What is the Gram–Schmidt process?
What is it good for?
Drills
1. In each of the following cases, find the inner product 〈u,v〉 and the norms |u| and |v| of vectors u and v in C3 (with the standard inner product).
(a) u = (1, i, 2), v = (−2, i, 1).
(b) u = (i, 2,−2), v = (2, 2i, i).
(c) u = (1 + √3 i, 1, 1 − √3 i), v = (1 − √3 i, 1 + √3 i, 1).
(d) u = (i cos α, sin α, cos α + i sin α), v = (cos β, i sin β, sin β + i cos β).
2. In each of the following cases, find the orthogonal projection of a vector u in an inner
product space to the 1-dimensional subspace spanned by v.
(a) V = R2; u = (1/√2, −1/√2) and v = (1, 0).
(b) V = R3; u = (2, 1, 1) and v = (1, 2, 3).
(c) V = C2; u = (1 + i, 1 − i) and v = (1, i).
(d) V = C3; u = (2, 1, 3) and v = (1, 1, i).
(e) V is the space of continuous functions on [0, 1] with the inner product 〈f, g〉 =
∫_0^1 f(x) ḡ(x) dx; u is the function f(x) ≡ 1 and v is g(x) = e^{iπx}. (Hint: Use the
identities (e^z)̄ = e^{z̄} and ∫ e^{ax} dx = (1/a) e^{ax} + C, where a ≠ 0.)
(f) Same V as in (e); u is the function h(x) = e^{(2πi)x} and v is k(x) = e^{(4πi)x}.
3. In each of the following cases, find the projection of a vector u in an inner product
space V onto the subspace spanned by v1,v2.
(a) V = R2; u = (your age, your weight), v1 = (1, 0) and v2 = (1, 1).
(b) V = R3; u = (3, 1, 1), v1 = (1,−1, 0) and v2 = (1, 1, 2). (Notice that v1 ⊥ v2.)
(c) V = R3; u = (2, 3, 1), v1 = (1, 2, 0) and v2 = (4, 7, 0). (Hint: Determine the
subspace spanned by v1 and v2 first.)
(d) V = C3; u = (3i, 1,−1), v1 = (1, i, i) and v2 = (2i, 1, 1). (Notice that v1 ⊥ v2)
4. True or false (u and v are vectors in a complex inner product space V , M and N are
subspaces of V and z is a complex number; all of them are arbitrary):
(a) 〈u, zv〉 = 〈zu,v〉.
(b) 〈zu, zv〉 = |z|²〈u,v〉.
(c) 〈u,v〉〈v,u〉 = |〈u,v〉|².
(d) If the identity 〈u,v〉 = 〈v,u〉 holds, then 〈u,v〉 must be a real number.
(e) If u1 ⊥ v1 and u2 ⊥ v2, then u1 + u2 is orthogonal to v1 + v2.
(f) If u1 ⊥ v and u2 ⊥ v, then u1 + u2 is orthogonal to v.
(g) If u ⊥ v, then v ⊥ u.
(h) If u ⊥ v and v ⊥ w, then u ⊥ w.
(i) If the orthogonal projections of u and v on the subspace M (of an inner product
space V ) are the same, then u− v is orthogonal to M .
(j) The Gram-Schmidt orthogonalization process is a process to construct an in-
ner product on a vector space such that a given basis in this space becomes an
orthonormal basis.
5. In each of the following parts, apply the Gram-Schmidt orthogonalization process to
the given linearly independent set of vectors (in the given order) in C4.
(a) (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1).
(b) (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 0).
(c) (1, 1, 1, 1), (1, 0, 1, 0), (0, 0, 1, 1), (0, 0, 0, 1).
(d) (1, i, i, i), (1, i, i, 0), (1, i, 0, 0), (1, 0, 0, 0).
(e) (0, 0, 2, 0), (1, 0, 4, 0), (5, 2, 0, 1).
Exercises
1. Let V be a complex inner product space and denote by VR the real vector space
obtained from V by restricting scalars to R.
(a) Show that the recipe 〈u,v〉_R = Re〈u,v〉 defines an inner product for the real
space VR. (Re z and Im z stand for the real part and the imaginary part, respectively, of a complex number z.)
(b) Show that the recipe 〈u,v〉_I = Im〈u,v〉 does not give an inner product for VR.
(c) Check the identity 〈u,v〉 = 〈u,v〉_R + i〈u, iv〉_R.
2. Let E = {e1, e2, . . . , en} be an orthonormal basis of a complex inner product space V .
Show that, if [v]E = (v1, v2, . . . , vn) and [w]E = (w1, w2, . . . , wn) for v,w ∈ V , then
〈v,w〉 = v1w̄1 + v2w̄2 + · · · + vnw̄n.
3. Let u and v be two vectors in a complex inner product space.
(a) Show that |u + v|² = |u|² + |v|² + 2 Re〈u,v〉.
(b) From the above identity derive that 4 Re〈u,v〉 = |u + v|² − |u − v|².
(c) Show that the imaginary part of 〈u,v〉 is Re〈u, iv〉. Use this fact and (b) to
derive the following polarization identity for complex inner product spaces:
〈u,v〉 = (1/4)(|u + v|² − |u − v|² + i|u + iv|² − i|u − iv|²).
(A neat way to rewrite this is 〈u,v〉 = (1/4) ∑_{k=0}^3 i^k |u + i^k v|².)
4. Let v1 and v2 be linearly independent vectors in a real inner product space V . Show
that the area A of the parallelogram stretched by v1 and v2 is equal to the square
root of
∣ 〈v1,v1〉 〈v1,v2〉 ∣
∣ 〈v2,v1〉 〈v2,v2〉 ∣ .
(You also have to explain why the above determinant cannot be negative so that we
can take its square root.) Hint: Write v2 = w + h, where w is the projection of v2
onto v1. Then A2 = |v1|2|h|2.
§2. Operators on Inner Product Spaces
2.1. In this section we consider operators on a finite dimensional inner product space
over either R or C. Because of an extra structure on the vector space, namely, the inner
product, these operators have a new feature called adjoint, which behaves like the complex
conjugation for complex numbers. For notational convenience, we limit our discussion to
operators on inner product spaces, although all material from §2.1 to §2.3 is applicable
to linear mappings between inner product spaces.
Let T be a linear operator on a (finite dimensional) inner product space V , either real
or complex. The adjoint of T , denoted by T ∗, is the linear operator on the same space V
such that the identity
〈Tx,y〉 = 〈x, T ∗y〉 (2.1.1)
holds for all vectors x and y in V . At the outset it is not clear whether such T ∗ exists, and
if it does exist, it is not clear if it is uniquely determined by T . We have to establish the
existence and uniqueness of the operator T ∗ which satisfies (2.1.1) in order to justify this
definition. Aside: The definition of adjoint here is unusual. Instead of telling us exactly
what T ∗ is, it singles out the most desirable property of T ∗. We may call it a “priority
definition”, because this desirable property here has priority over anything else. “Priority
definitions” are not rare in mathematics. To justify the above “priority definition” for the
adjoint of an operator, we must prove the following statement:
(*) For every linear operator T on V , there is a unique linear operator S on V such
that 〈Tx,y〉 = 〈x, Sy〉 for all x,y ∈ V .
Once this statement is proven, we can define T ∗ to be the unique operator S described in
this statement. The “uniqueness” part is easier to prove. Assume that both S1 and S2 have
the same property as S as described, namely 〈Tx,y〉 = 〈x, S1y〉 and 〈Tx,y〉 = 〈x, S2y〉 for all x and y in V . Let R = S1 − S2. Then 〈x, Ry〉 = 〈x, S1y〉 − 〈x, S2y〉 = 0 for arbitrary
x,y in V . Thus, for every y in V , Ry is orthogonal to all vectors in V and hence Ry = 0.
Therefore R = O, or S1 = S2.
The proof of the existence part of (∗) is based on the following lemma, which is a
“baby version” of a famous theorem called the Riesz representation theorem.
Lemma 2.1.1. If φ is a linear functional on V (that is, φ is in V ′), then there exists
a unique vector a in V such that φ(x) = 〈x, a〉 for all x in V .
Take an orthonormal basis E = {e1, e2, . . . , en} in V . (The last remark in the last section
guarantees its existence.) Then, for each vector x in V , we have x = ∑_{k=1}^n 〈x, ek〉ek (see
identity (1.2.2) in the last section) and hence
φ(x) = φ(∑_{k=1}^n 〈x, ek〉ek) = ∑_{k=1}^n 〈x, ek〉φ(ek) = ∑_{k=1}^n 〈x, φ(ek)̄ ek〉 = 〈x, ∑_{k=1}^n φ(ek)̄ ek〉.
Hence φ(x) = 〈x,a〉, where a = ∑_{k=1}^n φ(ek)̄ ek. The uniqueness of a is left for you to check
as an exercise.
Now we return to the proof of the existence part of (∗). Take any y in V and consider
the linear functional φy defined by putting φy(x) = 〈Tx,y〉. By the above lemma we know
that there exists a unique vector determined by y, say Sy, such that φy(x) = 〈x, Sy〉. Thus
we have 〈Tx,y〉 = 〈x, Sy〉 for all x in V . The linearity of S is left for you to check as an
exercise. Thus S is the required operator T ∗.
We consider the matrix representation [T ] E of an operator T with an orthonormal
basis E = {e1, e2, . . . , en} in V . The first column of [T ]E is filled with the coordinates of
Te1. Since E is an orthonormal basis, we have (see (1.2.2) in the last section)
Te1 = 〈Te1, e1〉e1 + 〈Te1, e2〉e2 + · · · + 〈Te1, en〉en.
Hence the first column of [T ] E is filled with 〈Te1, e1〉, 〈Te1, e2〉, etc. The same method
allows us to figure out other columns. We arrive at:
[T]E =
[ 〈Te1, e1〉 〈Te2, e1〉 · · · 〈Ten, e1〉 ]
[ 〈Te1, e2〉 〈Te2, e2〉 · · · 〈Ten, e2〉 ]
[    ...        ...             ...   ]
[ 〈Te1, en〉 〈Te2, en〉 · · · 〈Ten, en〉 ] .
The (j, k)-entry of [T ]E , denoted by tjk, is given by
tjk = 〈Tek, ej〉.
Reversing the order of j, k looks awkward, but things turn out that way and we cannot
help it. Now the (k, j)–entry of [T ∗]E , denoted by t∗kj , is given as follows:
t∗kj ≡ 〈T∗ej, ek〉 = 〈ek, T∗ej〉̄ = 〈Tek, ej〉̄ = t̄jk.
Thus the matrix of T ∗ relative to E is the conjugate transpose of the matrix of T relative
to E. We also call the conjugate transpose of a matrix A the adjoint of A and denote it
by A∗. Thus A∗ = Ā⊤. We have shown that “the matrix of the adjoint of T is the adjoint
of the matrix of T” relative to any orthonormal basis E :
[T ∗] E = [T ]∗E (2.1.2)
We give some quick examples of adjoints of matrices as follows:
[ 2i  1 − i ]∗   [ −2i    −2i   ]        [ z  w ]∗   [ z̄  w̄ ]
[ 2i  1 + i ]  = [ 1 + i  1 − i ] ,      [ w  z ]  = [ w̄  z̄ ] .
Example 2.1.1. Let E = {e1, e2, . . . , en} be an orthonormal basis of an inner product
space. By the forward shift relative to this basis we mean the operator S satisfying
Se1 = e2, Se2 = e3, . . . , Sen = 0.
What is its adjoint S∗ ? Well, the representing matrix of S relative to E is
[S]E =
[ 0 0 0 · · · 0 0 0 ]
[ 1 0 0 · · · 0 0 0 ]
[ 0 1 0 · · · 0 0 0 ]
[ 0 0 1 · · · 0 0 0 ]
[        ...        ]
[ 0 0 0 · · · 1 0 0 ]
[ 0 0 0 · · · 0 1 0 ]
with
[S]∗E =
[ 0 1 0 · · · 0 0 0 ]
[ 0 0 1 · · · 0 0 0 ]
[        ...        ]
[ 0 0 0 · · · 0 1 0 ]
[ 0 0 0 · · · 0 0 1 ]
[ 0 0 0 · · · 0 0 0 ] .
From [S∗] E = [S]∗E we know
S∗e1 = 0, S∗e2 = e1, S∗e3 = e2, . . . , S∗en = en−1.
Naturally, S∗ is called the backward shift relative to E.
Example 2.1.2. An operator D on V is diagonal relative to the orthonormal basis
E = {e1, . . . , en} if there are scalars λ1, λ2, . . . , λn such that Dek = λkek for all k =
1, 2, . . . , n. In this case the representing matrix of D relative to E is a diagonal matrix
[D]E =
[ λ1 0 . . . 0 ]
[ 0 λ2 . . . 0 ]
[      ...     ]
[ 0 0 . . . λn ]
with
[D]∗E =
[ λ̄1 0 . . . 0 ]
[ 0 λ̄2 . . . 0 ]
[      ...     ]
[ 0 0 . . . λ̄n ] .
The representing matrix of the adjoint D∗ is also a diagonal matrix, obtained by replacing
each entry on the main diagonal by its complex conjugate. Therefore D∗ is also a diagonal
operator relative to the basis E with D∗ek = λ̄kek for k = 1, 2, . . . , n. Notice that, in case
the scalars λ1, λ2, . . . , λn are real numbers, D and D∗ have the same representation matrix
relative to E and hence D = D∗. In this case D is called a Hermitian operator or a
self–adjoint operator.
2.2. The following elementary properties about adjoints should be kept in mind:
(S + T)∗ = S∗ + T∗, (αS)∗ = ᾱS∗, (ST)∗ = T∗S∗,
T∗∗ = T, O∗ = O, I∗ = I, (2.2.1)
where S and T are operators on a (finite dimensional) inner product space, and α is an
arbitrary scalar. One way to prove these identities is by definition. For example, to show
(ST )∗ = T ∗S∗, we only have to check the identity 〈STx,y〉 = 〈x, T ∗S∗y〉. This is an easy
thing to do, provided you understand the definition of adjoint:
〈x, T ∗S∗y〉 = 〈x, T ∗(S∗y)〉 = 〈Tx, S∗y〉 = 〈STx,y〉.
In the special case V = Cn or Rn with the standard inner product, every linear operator
on V is induced by a matrix, i.e. all linear operators are of the form MA for some n × n
matrix A. The verification of the following proposition is left to you as an exercise.
Proposition 2.2.1. If T is induced by A, then T∗ is induced by A∗, i.e. (MA)∗ = MA∗.
(The adjoint of the induced operator is the induced operator of the adjoint.)
We have seen some advantages of studying matrices by investigating the operators induced
by them. We will discover that the above simple fact is very handy in treating matrix
problems by this approach.
2.3. Let M be a subspace of an inner product space V over C or R. The orthogonal
complement of M , denoted by M⊥, is the set of vectors in V perpendicular to all vectors
in M . Thus v is in M⊥ if v ⊥ M , that is, v ⊥ x for all vectors x in M . Using set–
theoretical notation, we can write
M⊥ = {v ∈ V : 〈v,x〉 = 0 for all x ∈M}.
If x is in both M and M⊥, then we have 〈x,x〉 = 0 and hence x = 0. Thus M∩M⊥ = {0}.
On the other hand, for any vector v in V , we have the orthogonal decomposition v = w+h
with w ∈ M and h ∈ M⊥; see §1.3 of the last section. This shows V = M + M⊥. It
follows from Theorem 2.3.2 in Chapter II that
dimV = dimM + dimM⊥. (2.3.1)
It is clear from the definition of orthogonal complement that M is contained in M⊥⊥. On
the other hand, the above identity tells us that M and M⊥⊥ have the same dimension.
Hence M⊥⊥ = M .
For T ∈ L (V ), where V is an inner product space over C or R, we have
Theorem 2.3.1. The kernel of T ∗ is the orthogonal complement of the range of T :
kerT ∗ = T (V )⊥.
Remark: Since T ∗∗ = T and W⊥⊥ = W for a subspace W of V , we can deduce from this
theorem that kerT = T ∗(V )⊥, T (V ) = (kerT ∗)⊥, and T ∗(V ) = (kerT )⊥.
The proof of this important theorem is short and neat:
v ∈ T(V)⊥ ⇔ 〈v, Tx〉 = 0 for all x ∈ V
⇔ 〈T∗v, x〉 = 0 for all x ∈ V
⇔ T∗v = 0
⇔ v ∈ kerT∗.
The proof is complete.
We give two interesting applications of the above theorem. In the first application
we use it to prove “row rank = column rank”. First we do this for a real, square matrix,
say A of size n × n. Consider the operator T on Rn induced by A, i.e. T = MA. The
column rank of A is the dimension of the subspace spanned by column vectors of A and
this subspace is just T (V ). By the above theorem,
dim kerT ∗ = dimT (V )⊥ = n− dimT (V ). (2.3.2)
The last identity follows from (2.3.1). On the other hand,
dim kerT ∗ + dimT ∗(V ) = n. (2.3.3)
From (2.3.2) and (2.3.3) we obtain dimT (V ) = dimT ∗(V ). However T ∗ is the operator
induced by the matrix A∗, which is the transpose A⊤ of A (because A is real). So the
last identity tells us that the column ranks of A and A⊤ are the same. But the column
rank of A⊤ is just the row rank of A! So the statement is proven for a real square matrix.
What can we do if A is not real? In this case we work with Cn instead of Rn. The same
argument allows us to conclude A and A∗ have the same column rank. But here we have a
small trouble: A∗ is the conjugate transpose of A, instead of the transpose A⊤. However,
observe that the column rank of a matrix does not change if we replace all entries by their
complex conjugates. So, this “small trouble” is in fact not a trouble. What can we do if
the given matrix is not a square matrix? In this case we “augment” this matrix with more
rows or columns of zeros to convert it into a square matrix. The row rank and the column
rank of the enlarged matrix clearly remain the same. Thus we have proved “row rank =
column rank” in its full generality.
The next application is about the least square approximation. Let T be a linear
operator on a finite dimensional inner product space V and let b be a vector not in its
range T (V ). Consider the following “ill–posed problem”: solve Tx = b. Since b is
not in T (V ), this equation has no solution. The best we can do is to find some x so that
the difference between Tx and b is minimized. So now we are asked to find a vector x0
at which ‖Tx − b‖ is minimized. The minimization requirement tells us that y0 = Tx0
is, among all vectors in the subspace T (V ), the one nearest to b. So y0 − b must be
perpendicular to the subspace T (V ). Theorem 2.3.1 tells us that y0 −b is in the kernel of
T ∗, that is T ∗(y0 − b) = 0, or T ∗(Tx0 − b) = 0, that is
T ∗Tx0 = T ∗b. (2.3.4)
The argument here can be reversed: if (2.3.4) holds, then y0 − b ⊥ T (V ). We have proved
that the least square solutions to Tx = b are the same as the solutions to T∗Tx = T∗b.
Example 2.3.2. Find the least square solution(s) to x1 − 2x2 = 1, −x1 + 2x2 = 3.
Solution. Write the system of equations as Ax = b with
A = [  1 −2 ] ,  x = [ x1 ] ,  b = [ 1 ]
    [ −1  2 ]        [ x2 ]        [ 3 ] .
It is easy to see that this is an ill–posed problem and hence we should look for the least
square solution(s) by solving A∗Ax = A∗b. Now
A∗A = [  1 −1 ] [  1 −2 ] = [  2 −4 ] ,   A∗b = [  1 −1 ] [ 1 ] = [ −2 ]
      [ −2  2 ] [ −1  2 ]   [ −4  8 ]          [ −2  2 ] [ 3 ]   [  4 ] .
Thus A∗Ax = A∗b becomes 2x1 − 4x2 = −2, −4x1 + 8x2 = 4, giving us x1 − 2x2 = −1.
Introducing the parameter t = x2, we can write down the solutions as x1 = 2t−1, x2 = t.
Example 2.3.3. Find the least square solution(s) to ix1 + x2 = 2, x1 + ix2 = −2,
x1 + x2 = 4.
Solution. Write the system of equations as Ax = b with
A = [ i 1 ] ,  x = [ x1 ] ,  b = [  2 ]
    [ 1 i ]        [ x2 ]        [ −2 ]
    [ 1 1 ]                      [  4 ] .
We look for the least square solution(s) by solving A∗Ax = A∗b. Now
A∗A = [ −i  1 1 ] [ i 1 ] = [ 3 1 ] ,   A∗b = [ −i  1 1 ] [  2 ] = [ 2 − 2i ]
      [  1 −i 1 ] [ 1 i ]   [ 1 3 ]          [  1 −i 1 ] [ −2 ]   [ 6 + 2i ] .
                  [ 1 1 ]                                [  4 ]
Thus A∗Ax = A∗b becomes 3x1 + x2 = 2 − 2i, x1 + 3x2 = 6 + 2i, giving us x1 = −i and
x2 = 2 + i.
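A minimal numerical sketch, assuming Python with NumPy: solving the normal equations A∗Ax = A∗b reproduces the solution of Example 2.3.3 just computed, and agrees with NumPy's built-in least-squares routine.

```python
import numpy as np

# Sketch: least squares via the normal equations (2.3.4), A*A x = A* b,
# for the system of Example 2.3.3.
A = np.array([[1j, 1.0],
              [1.0, 1j],
              [1.0, 1.0]])
b = np.array([2.0, -2.0, 4.0])

x = np.linalg.solve(A.conj().T @ A, A.conj().T @ b)
print(np.round(x, 10))            # [-1j, 2+1j]: matches Example 2.3.3
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```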
2.4. An operator H on a complex inner product space V is called a self-adjoint operator,
or a Hermitian operator, if H = H∗. In the same way, we call a square matrix A a
self-adjoint matrix or a Hermitian matrix if A = A∗, i.e. if A is equal to its conjugate
transpose. Thus, a 2 × 2 Hermitian matrix must have the form
[ c      a + bi ]   ( such as [ 4      2 + 3i ] ) ,
[ a − bi d      ]             [ 2 − 3i 7      ]
where a, b, c, d are real numbers.
Notice that a complex number z is real if and only if z = z̄, which is the one-
dimensional version of the identity T = T∗. Hence the situation of Hermitian operators
among other operators resembles that of real numbers among complex numbers.
Example 2.4.1. Verify that eigenvalues of Hermitian operators are real.
Solution. Let T be a Hermitian operator on V and let λ be an eigenvalue for T . Then
there is a nonzero vector v in V such that Tv = λv. So 〈Tv,v〉 = 〈λv,v〉 = λ〈v,v〉. On
the other hand,
〈Tv,v〉 = 〈v, T∗v〉 = 〈v, Tv〉 = 〈v, λv〉 = λ̄〈v,v〉.
Hence λ〈v,v〉 = λ̄〈v,v〉. As v ≠ 0, we have 〈v,v〉 = ‖v‖² ≠ 0, and hence 〈v,v〉 in the
last identity can be canceled. Thus λ = λ̄. Therefore λ is real.
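A quick numerical illustration of Example 2.4.1, assuming Python with NumPy, using the 2 × 2 Hermitian matrix displayed in §2.4 above:

```python
import numpy as np

# Sketch: the eigenvalues of a Hermitian matrix are real.
H = np.array([[4.0, 2 + 3j],
              [2 - 3j, 7.0]])
print(np.allclose(H, H.conj().T))   # True: H is Hermitian
print(np.linalg.eigvalsh(H))        # real eigenvalues ≈ [1.595, 9.405]
```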
We consider a similar concept for the real case. An operator T on a real inner product
space V satisfying T = T ∗ is called a symmetric operator. A real square matrix A is called
a symmetric matrix if A = A⊤. The representation matrix of a symmetric operator relative
to an orthonormal basis is symmetric. A real symmetric matrix is clearly Hermitian and
hence its eigenvalues are real. We can translate this statement into an assertion about
symmetric operators on a real inner product space:
Proposition 2.4.1. If T is a symmetric operator on a real, finite dimensional, inner
product space, then T has a real eigenvalue λ and consequently ker(T − λI) ≠ {0}.
2.5. An operator T defined on a (finite dimensional) real inner product space V is an
orthogonal operator if it preserves the inner product of V , i.e.
〈Tx, Ty〉 = 〈x,y〉 for all x and y in V. (2.5.1)
We can rewrite the above identity as 〈x, T ∗Ty〉 = 〈x,y〉. This gives 〈x, (T ∗T − I)y〉 = 0
for all x and y in V . We deduce that T ∗T − I = O, or T ∗T = I, i.e. T is invertible
and its inverse is T ∗. By reversing the above argument, we can show that, conversely, if
T−1 = T ∗, then T is orthogonal. We conclude: A linear operator T on a real inner product
space is orthogonal if and only if T ∗T = TT ∗ = I.
An orthogonal matrix is a real square matrix A satisfying AA⊤ = A⊤A = I.
The representing matrix of an orthogonal operator relative to an orthonormal basis is an
orthogonal matrix. (Verify this statement!)
By letting y = x in identity (2.5.1), we have 〈Tx, Tx〉 = 〈x,x〉, or ‖Tx‖² = ‖x‖².
Hence we have ‖Tx‖ = ‖x‖ for all x ∈ V . In other words, an orthogonal operator preserves
the norm. It turns out that the converse of this statement is also true:
Proposition 2.5.1. A norm-preserving linear operator is an orthogonal operator.
To prove this fact, we have to express the inner product of two vectors in terms of the
norms of certain linear combinations of them, called the polarization identity:
4〈x,y〉 = ‖x + y‖² − ‖x − y‖². (2.5.2)
To prove (2.5.2), we begin with the elementary identity ‖v‖² = 〈v,v〉, which holds for all
vectors v in an inner product space. Applying this identity for v = x + y, we have
‖x + y‖² = 〈x + y, x + y〉 = 〈x,x〉 + 〈x,y〉 + 〈y,x〉 + 〈y,y〉 = ‖x‖² + 2〈x,y〉 + ‖y‖². (2.5.3)
Letting v = x− y instead, we will get a similar result:
‖x − y‖² = ‖x‖² − 2〈x,y〉 + ‖y‖². (2.5.4)
Now you can see that the polarization identity (2.5.2) is obtained by subtracting (2.5.4)
from (2.5.3) and a simple rearrangement of sides.
From the polarization identity (2.5.2) we can deduce Proposition 2.5.1 stated above.
Indeed, if T is a linear operator on V satisfying ‖T(x)‖ = ‖x‖ for all x ∈ V , then
4〈Tx, Ty〉 = ‖Tx + Ty‖² − ‖Tx − Ty‖²
= ‖T(x + y)‖² − ‖T(x − y)‖²
= ‖x + y‖² − ‖x − y‖² = 4〈x,y〉.
Canceling 4, we get the required identity which characterizes orthogonal operators.
Let σ be a permutation of the set {1, 2, . . . , n}. Then Tσ on Rn defined by
Tσ(x1, x2, . . . , xn) = (xσ(1), xσ(2), . . . , xσ(n))
is an orthogonal operator, because the sum of squares of x1, x2, . . . , xn remains the same if
their order is changed. For example, if σ sends 1, 2, 3, 4, 5 to 4, 1, 5, 2, 3 respectively,
then Tσ(x1, x2, x3, x4, x5) = (x4, x1, x5, x2, x3). Another example of orthogonal operators
is the operator MA on R2 (with the standard inner product) induced by
A = [ cos θ −sin θ ]
    [ sin θ  cos θ ] ,
where θ is a fixed real number. You can check directly that A is an orthogonal matrix.
From this you may conclude that MA is an orthogonal operator.
Example 2.5.2. By an orthogonal projection we mean a self–adjoint projection.
Thus, if P is an orthogonal projection, then P 2 = P and P ∗ = P . Verify the following
assertion: if P is an orthogonal projection, then I − 2P is an orthogonal operator.
Solution. Since P is an orthogonal projection, we have P 2 = P and P ∗ = P . So
(I − 2P)∗(I − 2P) = (I − 2P)(I − 2P) = I − 2P − 2P + 4P² = I − 4P + 4P = I.
Similarly we have (I − 2P )(I − 2P )∗ = I. Hence I − 2P is an orthogonal operator.
2.6. Let A be an n × n real matrix. Denote by E = {e1, e2, . . . , en} the standard basis
of Rn and let T = MA be the operator on Rn induced by A. Then, as we know
very well by now, vj ≡ Tej is the jth column of A, for each j. If A is an orthogonal matrix,
then T is an orthogonal operator and hence v1,v2, . . . ,vn, which are the images of vectors
in the standard orthonormal basis under the operator T , also form an orthonormal basis.
(Notice that, since T preserves inner products, it sends an orthonormal basis to another.)
Conversely, suppose that the columns v1,v2, . . . ,vn of A form an orthonormal basis of
Rn, that is, 〈vj,vk〉 = δjk. (Recall that the Kronecker delta δjk stands for 1 if j = k and
0 if j ≠ k.) Then, for vectors x = (x1, . . . , xn) = ∑_k xk ek and y = (y1, . . . , yn) = ∑_j yj ej
in Rn, we have Tx = ∑_k xk Tek = ∑_k xk vk and similarly Ty = ∑_j yj vj, and hence
〈Tx, Ty〉 = ∑_{k,j} xk yj 〈vk,vj〉 = ∑_{k,j} xk yj δkj = ∑_k xk yk = 〈x,y〉.
This says that T is an orthogonal operator. Hence A is an orthogonal matrix. We conclude:
Proposition 2.6.1. A real n × n matrix is an orthogonal matrix if and only if its
columns form an orthonormal basis of Rn.
For example, we observe that (−1/3, 2/3, 2/3), (2/3,−1/3, 2/3), (2/3, 2/3,−1/3) form an
orthonormal basis of R3; (this can be checked directly). Therefore the 3 × 3 matrix
A = [ −1/3  2/3  2/3 ]
    [  2/3 −1/3  2/3 ]
    [  2/3  2/3 −1/3 ]
is orthogonal. The induced operator MA given by
MA(x1, x2, x3) = (1/3)(−x1 + 2x2 + 2x3, 2x1 − x2 + 2x3, 2x1 + 2x2 − x3)
is an orthogonal operator on R3.
Example 2.6.2. Notice that the matrix
H1 = (1/√2) [  1 1 ]   with columns v1 = [  1/√2 ] , v2 = [ 1/√2 ]
            [ −1 1 ]                     [ −1/√2 ]        [ 1/√2 ]
is an orthogonal matrix, since we can check that its columns v1, v2 form an orthonormal
basis in R2. Now we describe a process to define the Hadamard matrix Hn. Let
A = [ a11 a12 ]
    [ a21 a22 ]
be a 2 × 2 matrix and let B be an n × n matrix. We define their tensor product A ⊗ B
to be the 2n × 2n matrix given by
A ⊗ B = [ a11B a12B ]
        [ a21B a22B ] .
We have the following basic identities about tensor products of matrices:
aA ⊗ bB = ab(A ⊗ B), (A ⊗ B)∗ = A∗ ⊗ B∗, (A ⊗ B)(C ⊗ D) = AC ⊗ BD. (2.6.1)
A consequence of these identities is: if A and B are orthogonal (or unitary), then so is
A⊗B. For example
H2 ≡ H1 ⊗ H1 ≡ (1/√2) [  H1 H1 ] = (1/2) [  1  1  1  1 ]
                      [ −H1 H1 ]         [ −1  1 −1  1 ]
                                         [ −1 −1  1  1 ]
                                         [  1 −1 −1  1 ] .
We can define Hn inductively by putting
Hn = H1 ⊗ Hn−1 = (1/√2) [  Hn−1 Hn−1 ]
                        [ −Hn−1 Hn−1 ] ,
which is a 2^n × 2^n orthogonal matrix, called the Hadamard matrix. We remark that
tensoring is an important operation used in many areas, such as quantum information and
quantum computation.
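A minimal sketch of this construction, assuming Python with NumPy, where np.kron is the Kronecker (tensor) product defined above:

```python
import numpy as np

# Sketch: build H3 of Example 2.6.2 with the tensor product and check
# that the result is still orthogonal.
H1 = np.array([[1.0, 1.0],
               [-1.0, 1.0]]) / np.sqrt(2)

H = H1
for _ in range(2):             # H1 -> H2 -> H3
    H = np.kron(H1, H)         # H_n = H1 ⊗ H_{n-1}

print(H.shape)                           # (8, 8): a 2^3 x 2^3 matrix
print(np.allclose(H @ H.T, np.eye(8)))   # True: H3 is orthogonal
```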
Notice that the transpose A⊤ of an orthogonal matrix A is also an orthogonal matrix.
In fact, from AA⊤ = A⊤A = I we immediately get (A⊤)⊤A⊤ = A⊤(A⊤)⊤ = I. Since
transposing a matrix changes its columns into rows, from Proposition 2.6.1 we deduce: a
real n×n matrix is an orthogonal matrix if and only if its rows form an orthonormal basis
in Rn.
2.7. Unitary operators are the complex version of orthogonal operators. A linear
operator U on a complex inner product space V is a unitary operator if
〈Ux, Uy〉 = 〈x,y〉
for all x and y in V , i.e. U preserves the inner product of V . As in the real case, a
linear operator on a complex inner product space is a unitary operator if and only if
U∗U = UU∗ = I, that is, U∗ is the inverse of U . As before, unitary operators are norm-
preserving. The converse is also true but the proof is more difficult than the orthogonal
case. Similarly, a complex square matrix A is called a unitary matrix if AA∗ = A∗A = I.
By recycling previous arguments we can show that an n × n complex matrix is unitary if
and only if its columns (or its rows) form an orthonormal basis. A quick example:
[ cos θ  i sin θ ]   ( such as (1/√2) [ 1 i ] and (1/2) [ √3  i ] )
[ i sin θ  cos θ ]                    [ i 1 ]           [ i  √3 ]
is a unitary matrix for each real θ.
Example 2.7.1. Let ω = e^{2πi/n}. The columns of the following matrix Fn are the
orthonormal basis of Cn from Example 1.4.2 in §1.4:
Fn = (1/√n) [ 1 1 1 1 1 · · · 1 ]
            [ 1 ω ω² ω³ ω⁴ · · · ω^{n−1} ]
            [ 1 ω² ω⁴ ω⁶ ω⁸ · · · ω^{2(n−1)} ]
            [                ...              ]
            [ 1 ω^{n−1} ω^{2(n−1)} ω^{3(n−1)} ω^{4(n−1)} · · · ω^{(n−1)(n−1)} ]
and hence Fn is a unitary matrix. The linear mapping associated with this matrix is
called the finite Fourier transform. Speeding up this transform by special methods has
become crucial in recent years for reducing the cost of communication networks. The
rediscovery of the so–called FFT (Fast Fourier Transform) has great practical value in
cutting costs substantially. Historians of mathematics can now trace the FFT method back
as early as Gauss, who certainly did not have this sort of application in mind!
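A minimal sketch relating Fn to NumPy's FFT, assuming Python with NumPy (note the sign and normalization conventions differ: np.fft uses e^{−2πi/n} and no 1/√n factor, so Fn x = √n · ifft(x)):

```python
import numpy as np

# Sketch: the matrix F_n above versus NumPy's FFT routines.
n = 8
omega = np.exp(2j * np.pi / n)
F = np.array([[omega ** (j * k) for k in range(n)] for j in range(n)]) / np.sqrt(n)

print(np.allclose(F.conj().T @ F, np.eye(n)))           # True: F_n is unitary
x = np.arange(n, dtype=complex)
print(np.allclose(F @ x, np.sqrt(n) * np.fft.ifft(x)))  # True
```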
2.8. In the above subsection we have seen that unitary matrices come from unitary
operators. Here we describe another source of such matrices: change of orthonormal basis.
In §2 of Chapter III we described the connection between matrices [T]E and [T]F
representing an operator T on a vector space V relative to two different bases E and F in
V : [T ] E and [T ]F are similar, i.e. there is an invertible matrix P such that
[T ]F = P [T ] E P−1. (2.8.1)
Now we make the further assumptions that V is an inner product space and both E and
F are orthonormal bases. If we go over the argument in §2 in Chapter III again, we can
check that the matrix P in (2.8.1) is a unitary matrix in the complex case and an
orthogonal matrix in the real case. This leads to the following two definitions: two n × n
complex matrices A and B are unitarily equivalent if there is a unitary matrix U such
that UAU∗ = B; two n × n real matrices A and B are orthogonally equivalent if there
is an orthogonal matrix P such that PAP⊤ = B. Using the terminology here, we have
1. Matrices representing the same operator on a finite dimensional complex inner
product space relative to different orthonormal bases are unitarily equivalent;
2. Matrices representing the same operator on a finite dimensional real inner product
space relative to different orthonormal bases are orthogonally equivalent.
EXERCISE SET IV.2.
Review Questions. Can I state the definitions and give examples of the following terms?
adjoint of a linear operator (of a matrix), Hermitian operator (Hermitian matrix),
unitary operator (unitary matrix), orthogonal operator (orthogonal matrix), sym-
metric operator (symmetric matrix).
To what numbers do they correspond in the one-dimensional case?
Drills
1. In each of the following cases, find the adjoint A∗ of the given matrix A:
(a) A = [ 1 + 2i 2 + 3i 3 + 4i ]   (b) A = [ 0 i 0 ]   (c) A = [ i 1 i ]
        [ 4 + 5i 5 + 6i 6 + 7i ]           [ 0 0 1 ]           [ i 1 i ]
        [ 7 + 8i 8 + 9i 9 + 9i ]           [ 0 0 0 ]           [ i 1 i ] .
2. Let A and B be n× n matrices and let a be a complex number. Verify the following
identities. (♠ These identities will be used freely without giving explicit references.
So you must get familiar with them. ♠.)
(A + B)∗ = A∗ + B∗, (aA)∗ = āA∗, (AB)∗ = B∗A∗, (A∗A)∗ = A∗A.
Also, if A is invertible, then so is A∗ and (A−1)∗ = (A∗)−1.
3. Find the missing entry (or entries) indicated by ∗ in each of the following unitary
matrices:
(1/√2) [ 1 ∗ ] ,  (1/√2) [ 1 ∗ ] ,  (1/5) [ −3 4i ] ,  (1/2) [ 1 + i 1 − i ] ,
       [ 1 1 ]          [ i 1 ]          [ 4i  ∗ ]         [  ∗    1 + i ]

(1/3) [ 1 2 ∗ ] ,  (1/3) [ i 2  2 ] ,  (1/2) [ 1  ∗  ∗  1 ]
      [ 2 1 ∗ ]          [ 2 ∗  ∗ ]          [ 1 −1 −1  ∗ ]
      [ 2 ∗ 1 ]          [ 2 2i ∗ ]          [ 1  1  1  ∗ ]
                                             [ 1 −1  1  ∗ ] .
4. Find the least square solution(s) to each of the following inconsistent systems
(a) x1 + ix2 = 0, x1 + ix2 = 2.
(b) x1 + x2 = 1, x1 − x2 = 1, x1 + 2x2 = 5.
(c) x1 + x2 + x3 = 1, x1 − x2 + x3 = 1, x1 − x2 − x3 = 1, x1 + x2 − x3 = 1.
5. True or False:
(a) The sum of two Hermitian matrices is Hermitian.
(b) The product of two Hermitian matrices is Hermitian.
(c) If a Hermitian matrix is invertible, then its inverse is also Hermitian.
(d) The sum of two unitary matrices is unitary.
(e) The product of two unitary matrices is unitary.
(f) Unitary matrices are invertible and their inverses are also unitary.
(g) An orthogonal matrix is a matrix orthogonal to a set of given matrices.
(h) If H is a Hermitian matrix and if U is a unitary matrix, then UHU−1 is a
Hermitian matrix.
(i) If H is a Hermitian matrix and if P is an invertible matrix, then PHP−1 is a
Hermitian matrix.
(j) If A is an arbitrary matrix, then A∗A is a Hermitian matrix.
6. Write down each of the following matrices explicitly:
(a) the 8 × 8 Hadamard matrix H3 (for notation, see Example 2.6.2);
(b) unitary matrices F2, F3, F4, F6, F8 in finite Fourier transform (for notation, see
Example 2.7.1).
Exercises
1. Let R be a linear operator on a complex inner product space V such that R2 = I.
Show that R is unitary if and only if R is Hermitian.
2. Let T be a linear operator on a finite dimensional complex inner product space V .
Show that there exist unique Hermitian operators H and K on V such that T =
H+ iK. (Aside: This is the analogue of the identity z = x+ iy (where x and y are the
real part and the imaginary part of z) for complex numbers. Notice that T∗ = H − iK,
which is analogous to z̄ = x − iy.)
3. Recall that a projection is a linear operator E satisfying E² = E. If, furthermore,
the space V on which E is defined is a complex inner product space and E is a Hermitian
operator (thus E² = E = E∗), then E is called an orthogonal projection. Show that
a projection E on an inner product space is an orthogonal projection if and only if its
kernel is orthogonal to its range: kerE ⊥ E(V ).
4. Let V be a 2-dimensional inner product space and let T be a linear operator on V .
Show that T 2 = O if and only if there is an orthogonal system {e, f} in V such that
Tx = 〈x, f〉e for all x ∈ V . Hint: The rank of T is 0 or 1. (Aside: It is straightforward
to check that if T is an operator having the form Tx = 〈x, f〉e with e ⊥ f , then T 2 = O.
Indeed, for each x ∈ V , T 2x = T (Tx) = T (〈x, f〉e) = 〈x, f〉Te = 〈x, f〉〈e, f〉e = 0,
due to the assumption that 〈e, f〉 = 0.)
5. Let T be a linear operator on a finite dimensional inner product space V .
(a) Show that, for all x ∈ V , 〈T ∗Tx,x〉 ≥ 0.
(b) Show that T is invertible if and only if T ∗T is invertible.
6. Show that (a) if P is an orthogonal matrix, then det(P) is either 1 or −1, and (b) if
U is a unitary matrix, then |det(U)| = 1.
7. Let A be a 2 × 2 orthogonal matrix. Show that
(a) in case det(A) = 1, A is a rotation matrix, i.e.
A = [ cos θ −sin θ ]
    [ sin θ  cos θ ]
for some real number θ,
(b) in case det(A) = −1,
A = [ cos θ  sin θ ]
    [ sin θ −cos θ ] , and
(c) in case det(A) = −1, A² = I; (Aside: A represents a reflection.)
8. Show that a 2 × 2 unitary matrix U with det(U) = 1 can always be expressed as
[  z1 z2 ]
[ −z̄2 z̄1 ] ,
where z1 and z2 are complex numbers satisfying |z1|2 + |z2|2 = 1.
9. Show that, if H is a Hermitian operator on a finite dimensional complex inner product
space V , then H − iI is invertible and U ≡ (H + iI)(H − iI)−1 is a unitary operator
on V . (Aside: U is called the Cayley transform of H.)
10. Let A, B, C be 2 × 2 real matrices. Check that
(a) (A⊗B) ⊗ C = A⊗ (B ⊗ C)
(b) there is a 4 × 4 permutation matrix P such that P (B ⊗A)P−1 = A⊗B.
11*. Let T be a linear operator on a complex inner product space. Prove that
(a) T is Hermitian if and only if 〈Tx,x〉 is real for each x ∈ V .
(b) T = O if and only if 〈Tx,x〉 = 0 for each x ∈ V .
§3. Orthogonal Diagonalization
3.1. Question: Which operators on a finite dimensional inner product space possess
orthonormal bases consisting of eigenvectors? In other words, which operators can be
represented by diagonal matrices relative to appropriate orthonormal bases? For short,
which operators are orthogonally diagonalizable?
This question does not specify whether the space is real or complex. We have to
consider both situations. Moreover, we have to consider them separately, because they
come up with different answers. In both situations we take the same approach: find a
necessary condition first (an easy step) and then prove the sufficiency of this condition
(the hard part). Before we proceed, let us make an advertisement for the forthcoming
answers. There are three great things about them: first, they are thorough; second, they are
neat and pleasant; and third, they are extremely important, for both theoretical and practical
purposes! Without exaggeration, we can say that these answers are among the best things we
can learn in a subject called linear algebra.
Let us start with our investigation. First we consider the real case. Let T be a
linear operator on a finite dimensional real inner product space V . Suppose that T does
have an orthonormal basis E consisting of e1, e2, . . . , en which are eigenvectors of T , say
Tej = λjej for j = 1, 2, . . . , n. Here, of course, λ1, λ2, . . . , λn are real numbers. The
matrix of T relative to E is diagonal:
[T] ≡ [T]E = [ λ1           ]
             [    λ2        ]
             [       . . .  ]
             [           λn ] .
The unspecified entries of the above matrix are filled with zeros. By what we have seen in
§2.1 of the last section (to be more specific, identity (2.1.2)), the matrix [T ∗] of the adjoint
T ∗, also relative to E , is just its transpose [T ]⊤. But [T ], as shown above, is diagonal and
hence [T ]⊤ = [T ]. So [T ∗] = [T ], from which it follows T ∗ = T , that is, T is a symmetric
operator.
The above conclusion is easy to get and short to say. Now a wonderful thing happens:
the converse is also true!
Theorem 3.1.1. If T is a symmetric operator on a finite dimensional real inner
product space V , then there is an orthonormal basis consisting of eigenvectors of T .
To prove this theorem, we have to find an orthonormal basis E consisting of e1, e2, . . . , en
which are eigenvectors of T . Here, n of course is dim V . Our proof proceeds by induction
on n. When n = 1, i.e. V is one dimensional, take any unit vector e1 and form the
orthonormal basis E consisting of the single vector e1. This clearly will do for our purpose.
Now we make the inductive hypothesis that the theorem is true for all symmetric operators
on spaces of dimension m. Assume that the dimension of the space V on which T (the
operator we are investigating) is defined is m+1: dimV = n = m+1. By Proposition 2.4.1
we know the existence of a real eigenvalue for T , say λ1, so that ker(T − λ1I) ≠ {0}.
Let us take any vector e1 in ker(T − λ1I) with ‖e1‖ = 1. Let L be the one dimensional
subspace spanned by e1:
L = {x ∈ V | x = αe1 for some α ∈ R }.
Let M = L⊥ = {e1}⊥ = {v ∈ V | 〈v, e1〉 = 0 }. (Here {e1} stands for the set consisting of the
single vector e1.) Then L + M = V and dimM = dimV − dimL = (m + 1) − 1 = m. For each
y ∈M ≡ L⊥,
〈Ty, e1〉 = 〈y, Te1〉 = 〈y, λ1e1〉 = 0.
The first identity follows from the assumption T = T∗, the second from e1 ∈ ker(T − λ1I),
and the last from y ⊥ L and λ1e1 ∈ L.
Denote by S the linear operator on M obtained by restricting T to M , that is, S is
the operator defined on the subspace M by putting Sx = Tx for x ∈M . Notice that, the
above argument shows that the range of S is in M and hence it is indeed a linear operator
on M , not just a linear transformation from M to V . Also notice that the linearity of S
is inherited from T . (Aside: In general, if M is an invariant subspace of T , that is, M is
a subspace of V with the property that x ∈ M implies Tx ∈ M , then it is legitimate to
consider the restriction of T to M .) As we have noticed, T being a symmetric operator
can be described by the following condition:
〈Tx,y〉 = 〈x, Ty〉 for all x,y ∈ V.
If x and y are actually inM , then we can rewrite Tx and Ty as Sx and Sy respectively, and
the above identity becomes 〈Sx,y〉 = 〈x, Sy〉. This shows that S is a symmetric operator
on M , which is m–dimensional. So we can apply the induction hypothesis to assert that M
has an orthonormal basis consisting of eigenvectors of S, say e2, e3, . . . , em+1. Now you
can see that the m + 1 vectors e1, e2, . . . , em+1 form an orthonormal basis of V consisting
of eigenvectors of T . The proof is complete.
The “matrix version” of Theorem 3.1.1 is the following:
Theorem 3.1.2. If A is a real symmetric matrix, i.e. A = A⊤, then there is a real
diagonal matrix D and an orthogonal matrix P such that A = PDP⊤. (In short, a real
symmetric matrix is orthogonally diagonalizable.)
The converse of Theorem 3.1.2 is true and very easy to prove (and hence not very
exciting to us): if A = PDP⊤ for some diagonal D and some orthogonal P , then we have
A⊤ = (PDP⊤)⊤ = P⊤⊤D⊤P⊤ = PDP⊤ = A; (recall that P⊤⊤ = P and D⊤ = D).
Now we start the proof of the above theorem. Assume that A is an n × n real symmetric
matrix. Consider the “God-given” operator T ≡MA on Rn induced by A, i.e. T (x) = Ax
for x ∈ Rn. Then T is a symmetric operator on Rn (which is the real inner product space
equipped with the standard inner product). By Theorem 3.1.1, there is an orthonormal
basis v1, v2, . . . , vn in Rn such that Tvk ≡ Avk = λkvk for some scalars λk, (k =
1, 2, . . . , n). Let
P = [v1 v2 · · · vn],
that is, the matrix with v1, v2, etc. as its column vectors. Then P is an orthogonal
matrix. We can check that AP = PD as follows, where D is the diagonal matrix with
λ1, λ2, . . . , λn as its diagonal entries.
AP = A[v1 v2 · · · vn] = [Av1 Av2 · · · Avn] = [λ1v1 λ2v2 · · · λnvn],
With the last block matrix written in the correct way, we have:
[v1λ1 v2λ2 · · · vnλn] = [v1 v2 · · · vn] [ λ1           ] = PD.
                                         [     . . .    ]
                                         [           λn ]
Hence AP = PD, giving us A = APP⊤ = PDP⊤.
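A minimal sketch of Theorem 3.1.2 in action, assuming Python with NumPy (np.linalg.eigh handles symmetric/Hermitian matrices; the test matrix is an arbitrary illustration):

```python
import numpy as np

# Sketch: eigh returns the real eigenvalues and an orthogonal P with
# orthonormal eigenvector columns, so that A = P D P^T.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
eigvals, P = np.linalg.eigh(A)
D = np.diag(eigvals)

print(np.allclose(P @ P.T, np.eye(3)))   # True: P is orthogonal
print(np.allclose(P @ D @ P.T, A))       # True: A = P D P^T
```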
3.2. Now we consider the similar question in the complex case:
Question: Which linear operator on a finite dimensional complex inner product space
has an orthonormal basis consisting of eigenvectors?
We proceed in the same way as the real case. However, the complex case is not
just more complex, it is more tricky. Suppose that an operator T on a finite dimensional
complex inner product space V does have an orthonormal basis E = {e1, e2, . . . , en} which
are eigenvectors of T , say Tej = λjej for j = 1, 2, . . . , n. The matrix of T relative to E is
diagonal:
[T] ≡ [T]E = [ λ1           ]
             [    λ2        ]
             [       . . .  ]
             [           λn ] .
By what we have seen in the last section (identity (2.1.2)) the matrix of the adjoint T ∗,
also relative to E , is given by
[T∗]E ≡ [T∗] = [T]∗ = [ λ̄1           ]
                      [    λ̄2        ]
                      [       . . .  ]
                      [           λ̄n ] .
Hence both [T ][T ∗] and [T ∗][T ] are equal to the diagonal matrix with
|λ1|² (= λ1λ̄1 = λ̄1λ1), |λ2|², . . . , |λn|²
as the diagonal entries. Thus we have [TT ∗] = [T ][T ∗] = [T ∗][T ] = [T ∗T ]. That is, the
operators TT ∗ and T ∗T have the same matrix representation relative to E . Since a linear
operator is completely determined by its matrix representation, we must have TT ∗ = T ∗T .
The discussion here leads to
Definition: A normal operator is a linear operator T on a complex inner product
space satisfying the identity TT ∗ = T ∗T . Similarly, a normal matrix is a (complex)
square matrix A satisfying AA∗ = A∗A.
Normal operators include both Hermitian operators and unitary operators. Recall
that an operator H (on a complex inner product space) is Hermitian if H = H∗. If H is
a Hermitian operator, then HH∗ and H∗H are equal, because both of them are equal to
H2. Hence Hermitian operators are normal. Also recall that an operator U is unitary if
UU∗ = U∗U = I. This identity clearly indicates that U is normal. In the same fashion,
Hermitian matrices and unitary matrices are normal matrices.
Example 3.2.1. Consider the matrix
A = [ 1 − i  i     ]   with A∗ = [ 1 + i  i     ] .
    [ −i     1 − i ]            [ −i     1 + i ]
Then we can check that
AA∗ = A∗A = [ 3   2i ] ,
            [ −2i 3  ]
showing that A is normal, but neither Hermitian nor unitary.
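A minimal numerical check of this example, assuming Python with NumPy:

```python
import numpy as np

# Sketch: the matrix of Example 3.2.1 is normal but neither Hermitian
# nor unitary.
A = np.array([[1 - 1j, 1j],
              [-1j, 1 - 1j]])
A_star = A.conj().T

print(np.allclose(A @ A_star, A_star @ A))   # True: A is normal
print(np.allclose(A, A_star))                # False: not Hermitian
print(np.allclose(A @ A_star, np.eye(2)))    # False: not unitary
```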
3.3. The previous discussion establishes that, if an operator T possesses an orthonor-
mal basis consisting of its eigenvectors, then T is a normal operator. Now we witness a
miracle: the converse is also true.
Theorem 3.3.1. A linear operator T on a (finite dimensional) complex inner product
space V has a diagonal matrix representation relative to some orthonormal basis if and
only if T is normal, that is, TT ∗ = T ∗T .
Assume that T is normal. We have to find an orthonormal basis E = {e1, e2, . . . , en}
consisting of eigenvectors of T. Here n, of course, is the dimension of V. Our proof proceeds
by induction on n. When n = 1, as in the real case, take any unit vector e1 and form the
orthonormal basis E consisting of the single vector e1. This clearly will do for our purpose.
Now we assume the existence of a diagonalizing orthonormal basis for normal operators on
spaces of dimension m, and the dimension of the space V on which T (the operator under
investigation) is defined is m+ 1, that is, n = dimV = m+ 1. Let λ0 be an eigenvalue of
T and let e0 be an eigenvector corresponding to λ0 with norm one, that is, ‖e0‖ = 1. Let
M be the one dimensional subspace spanned by e0:
M = {x ∈ V | x = αe0 for some α ∈ C }.
For convenience, let us write T0 for T −λ0I. Notice that, “x ∈M” implies “T0x = 0”. We
proceed our proof step-by-step as follows. Firstly, notice that T0 is also normal. Indeed,
T0T0∗ = (T − λ0I)(T∗ − λ̄0I) = TT∗ − λ̄0T − λ0T∗ + λ0λ̄0I
      = T∗T − λ̄0T − λ0T∗ + λ0λ̄0I = (T∗ − λ̄0I)(T − λ0I) = T0∗T0.
Secondly, for x ∈ M, in addition to T0x = 0, we also have T0∗x = 0. (Aside: Attention!
This is the crucial step.) Indeed, we have
‖T0∗x‖² = 〈T0∗x, T0∗x〉 = 〈T0T0∗x, x〉 = 〈T0∗T0x, x〉 = 〈0, x〉 = 0.
Hence T0∗x = 0. Thirdly, we claim that M⊥ (the orthogonal complement of M) is invariant
for both T and T∗, i.e. if y ∈ M⊥, then both Ty and T∗y belong to M⊥. To prove this,
let us suppose y ∈ M⊥. Notice that T = T0 + λ0I. Hence, for each x ∈ M, we have
〈Ty, x〉 = 〈(T0 + λ0I)y, x〉 = 〈T0y + λ0y, x〉 = 〈T0y, x〉 + λ0〈y, x〉 = 〈y, T0∗x〉 + 0 = 〈y, 0〉 = 0,
and, in the same fashion, we can show 〈T∗y, x〉 = 0 for all x ∈ M. Therefore both Ty and
T∗y are in M⊥.
Now the rest of the argument is very similar to that of Theorem 3.1.1. Denote by S
the linear operator on M⊥ obtained by restricting T to M⊥, that is, S is the operator defined on
the subspace M⊥ by putting Sx = Tx for x ∈ M⊥. The third step above tells us that
S is indeed a linear operator on M⊥. We check that S ∈ L(M⊥) is also normal. To this
end, we need to find its adjoint S∗ and show that S and S∗ commute. So let us take
x, y ∈ M⊥. Then
〈x, S∗y〉 = 〈Sx, y〉 = 〈Tx, y〉 = 〈x, T∗y〉.
Since x is an arbitrary vector in M⊥ and since both S∗y and T ∗y are in M⊥, we must have
S∗y = T ∗y. In other words, S∗ is just the restriction of T ∗ to M⊥. Hence, for y ∈M⊥,
SS∗y = S(S∗y) = S(T ∗y) = T (T ∗y) = TT ∗y.
In the same way, we can show that S∗Sy = T ∗Ty. As T is normal, TT ∗y = T ∗Ty and
hence SS∗y = S∗Sy. As y is an arbitrary vector in M⊥ (the domain of S), we have
SS∗ = S∗S, that is, S is normal.
We have shown that S is a normal operator on M⊥. From the fact that M is one-
dimensional and V has dimension m + 1, we see that M⊥ is m-dimensional. Therefore,
by our induction hypothesis, M⊥ has an orthonormal basis consisting of eigenvectors of
S, say e1, e2, . . . , em. Now the m + 1 vectors e0, e1, e2, . . . , em form an orthonormal
basis of V consisting of eigenvectors of T.
3.4. The “matrix version” of Theorem 3.3.1 is the following:
Theorem 3.4.1. If A is a normal matrix, i.e. AA∗ = A∗A, then there is a diagonal
matrix D and a unitary matrix U such that A = UDU∗.
The converse of the above theorem is true and very easy to prove (and hence not very
interesting). Indeed, if A = UDU∗ for some diagonal D and some unitary U, then
AA∗ = (UDU∗)(UDU∗)∗ = UDU∗UD∗U∗ = UDD∗U∗ = UD∗DU∗ = UD∗U∗UDU∗ = (UDU∗)∗(UDU∗) = A∗A;
(recall that U∗∗ = U, UU∗ = U∗U = I, and that diagonal matrices commute, so DD∗ = D∗D.)
We can derive Theorem 3.4.1 from Theorem 3.3.1 in the same way as we derived Theorem
3.1.2 from Theorem 3.1.1, except that we work with the standard complex space Cn instead
of the real Rn. The argument will not be repeated here. Recall the following
definition of unitary equivalence: we say that n × n (complex) matrices A and B are
unitarily equivalent if and only if A = U∗BU for some unitary matrix U. Now Theorem
3.4.1 can be restated as follows:
An n × n complex matrix is unitarily equivalent to a diagonal matrix if and only if it is a normal matrix.
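The easy direction of this statement can be seen numerically: starting from a diagonal D and a unitary U (obtained here, for illustration, from the QR factorization of a random complex matrix), the matrix A = UDU∗ is normal. A minimal Python/numpy sketch, not part of the notes:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    U, _ = np.linalg.qr(X)                   # U is unitary
    D = np.diag([1j, 2.0, -1.0 + 1j])        # an arbitrary diagonal matrix
    A = U @ D @ U.conj().T
    assert np.allclose(A @ A.conj().T, A.conj().T @ A)   # A is normal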
Since Hermitian operators and unitary operators are normal operators, Theorem 3.3.1 is
applicable to these types of operators. Thus, if T is a Hermitian operator (or a unitary
operator) on a finite dimensional complex inner product space V, then there is an
orthonormal basis E such that the representing matrix [T]E relative to E is diagonal, say
[T]E = diag(λ1, λ2, . . . , λn).
Notice that the diagonal elements of [T]E are eigenvalues of T. In case T is Hermitian,
the diagonal elements λk are real. In case T is unitary, |λk| = 1 for all k.
We have shown that a Hermitian operator has an orthonormal basis consisting of eigen-
vectors with real eigenvalues. The matrix version of this statement is: a Hermitian matrix
A is unitarily equivalent to a real diagonal matrix, that is, there is a real diagonal matrix
D and a unitary matrix U such that A = UDU∗; (the converse is also true but not very
interesting.)
3.5. A 1 × 1 complex matrix is simply a complex number. A 1 × 1 unitary matrix
is a unit modulus number, that is, a complex number z with |z| = 1. A 1 × 1 Hermitian
matrix is just a real number. A good way to think of Hermitian matrices or operators is to
regard them as an extension of real numbers. In the present subsection we study operators
and matrices which can be considered as an extension of positive numbers.
Definition. We say that a linear operator P on a complex inner product space V
is positive if P is Hermitian and 〈Px, x〉 ≥ 0 for all x in V.
Notice that eigenvalues of positive operators are nonnegative real numbers. Indeed, if λ is
an eigenvalue of a positive operator P , say Pv = λv for some vector v with ‖v‖ = 1,
then we have 〈Pv,v〉 = 〈λv,v〉 = λ‖v‖2 = λ and hence λ ≥ 0.
Example 3.5.1. If P is a positive operator on V and if T is any operator on V ,
then the operator T ∗PT is also positive. Indeed, for any vector x in V ,
〈T ∗PTx,x〉 = 〈PTx, Tx〉 = 〈Py,y〉 ≥ 0, with y = Tx.
In particular, for any operator T on an inner product space, operators T ∗T and TT ∗
are positive.
Example 3.5.2. If T is a Hermitian operator on V with nonnegative eigenvalues,
then T is positive. Indeed, since T is Hermitian (and hence normal), it follows from
Theorem 3.3.1 that there is an orthonormal basis E = {e1, e2, . . . , en} in V consisting
of eigenvectors of T, say Tek = λkek with λk ≥ 0 (1 ≤ k ≤ n). Any vector v can be
written as a linear combination of the basis vectors, say v = ∑_k vkek; (here we briefly
recall that vk = 〈v, ek〉, even though this will not be used here). Thus
〈Tv, v〉 = 〈T(∑_k vkek), ∑_j vjej〉 = ∑_{k,j} vk v̄j λk 〈ek, ej〉 = ∑_k λk |vk|² ≥ 0.
Hence T is positive.
Let P be a positive operator on a complex inner product space V. By Theorem
3.3.1, there is an orthonormal basis E = {e1, e2, . . . , en} in V consisting of eigenvectors
of P, say Pek = λkek with λk ≥ 0 (1 ≤ k ≤ n). In other words, the matrix [P]E
representing P relative to E is a diagonal matrix with λk ≥ 0 (1 ≤ k ≤ n) as its
diagonal elements. Now let Q be the operator such that its matrix representation [Q]E is
the diagonal matrix with √λk (1 ≤ k ≤ n) as its diagonal elements. Thus
[P]E = diag(λ1, λ2, . . . , λn)   and   [Q]E = diag(√λ1, √λ2, . . . , √λn).
Clearly [Q]²E = [P]E. Hence we have Q² = P. From Example 3.5.2 above we know that
Q is positive. We have proved the existence part of the following theorem.
Theorem 3.5.1. If P is a positive operator, then there exists a unique positive
operator Q such that Q2 = P .
The proof of the uniqueness of Q is rather technical and hence is omitted here. The
operator Q in the above theorem is called the square root of P and is denoted by P^{1/2}
or √P.
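The construction in the existence proof translates directly into a computation: diagonalize P with an orthonormal basis of eigenvectors and take square roots of the eigenvalues. A minimal Python/numpy sketch (illustration only; the matrix is an arbitrary example):

    import numpy as np

    B = np.array([[1.0, 2.0], [0.0, 1.0]])
    P = B.T @ B                              # B^T B is positive (Example 3.5.1)
    lam, V = np.linalg.eigh(P)               # eigenvalues >= 0, orthonormal eigenvectors
    Q = V @ np.diag(np.sqrt(lam)) @ V.T
    assert np.allclose(Q @ Q, P)             # Q is the positive square root of P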
Take any operator T on an inner product space V. According to Example 3.5.1
above, T∗T is a positive operator and hence its square root is defined. We denote √(T∗T)
by |T|. Thus, |T| is a positive operator satisfying
|T |2 = T ∗T. (3.5.1)
In general, |T | and |T ∗| are not the same.
Example 3.5.3. Prove that |T | = |T ∗| if and only if T is normal.
Solution. Suppose |T| = |T∗|. Then |T|² = |T∗|². Now |T|² = T∗T and |T∗|² =
(T∗)∗(T∗) = TT∗. Hence T∗T = TT∗, that is, T is normal. The steps can be reversed to
show that, if T is normal, then |T∗| = |T|.
The eigenvalues of |T|, arranged in decreasing order, say µ1 ≥ µ2 ≥ · · · ≥ µn, are
called the singular values, or s–numbers, of T. They are important in many areas, but we
do not plan to say more about this.
Take any complex n × n matrix A = [ajk] and consider the linear operator MA
induced by A, defined on the complex space Cn with the standard inner product. If
MA is a positive operator, we say that A is positive semi–definite. If, furthermore,
MA is invertible, we say that A is positive definite. It can be easily checked that, if
v = (z1, z2, . . . , zn), then
〈MAv, v〉 = v∗Av = ∑_{k,j} ajk zk z̄j.
Thus a Hermitian matrix A = [ajk] is positive semi–definite if and only if
∑_{k,j} ajk zk z̄j ≥ 0
for all complex numbers z1, z2, . . . , zn.
All of the above discussion about operators can be applied to matrices. For example, for
any matrix A, A∗A is positive semi–definite and hence |A| = √(A∗A) exists.
Example 3.5.4. Take any set of vectors v1, v2, . . . , vr in an inner product space
and let G = [gjk] be the r × r matrix with gjk = 〈vj, vk〉. Check that G is positive
semi–definite.
Solution. For all complex numbers z1, z2, . . . , zr, we have
∑_{j,k=1}^{r} gjk zj z̄k = ∑_{j,k=1}^{r} 〈vj,vk〉 zj z̄k = ∑_{j,k=1}^{r} 〈zjvj, zkvk〉 = 〈∑_{j=1}^{r} zjvj, ∑_{k=1}^{r} zkvk〉 = 〈w, w〉 ≥ 0,
where w = ∑_{j=1}^{r} zjvj. A matrix of the form described here is called a Gram matrix.
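Numerically, one can form the Gram matrix of a few vectors and confirm that it is Hermitian with nonnegative eigenvalues. A Python/numpy sketch (illustration only; the vectors are random):

    import numpy as np

    rng = np.random.default_rng(1)
    v = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))  # rows are v1..v4
    G = v @ v.conj().T                       # G[j,k] = <v_j, v_k>
    assert np.allclose(G, G.conj().T)        # G is Hermitian
    print(np.linalg.eigvalsh(G))             # all eigenvalues >= 0 (up to rounding)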
3.6.* Let T be an operator on a complex inner product space and, as before, write
|T| = √(T∗T). Let us check that |T| and T have the same kernel:
ker |T| = ker T.    (3.6.1)
Indeed, for any vector v, we have
‖Tv‖2 = 〈Tv, Tv〉 = 〈T ∗Tv,v〉 = 〈|T |2v,v〉 = 〈|T |v, |T |v〉 = ‖|T |v‖2, (3.6.2)
which tells us that Tv = 0 if and only if |T |v = 0.
Now assume that T is invertible. From (3.6.1) we know that |T| is also invertible.
Let U = T|T|⁻¹. Then T = U|T|,
U∗U = (|T|⁻¹T∗)(T|T|⁻¹) = |T|⁻¹|T|²|T|⁻¹ = I
and
UU∗ = (T|T|⁻¹)(|T|⁻¹T∗) = T(|T|²)⁻¹T∗ = T(T∗T)⁻¹T∗ = T(T⁻¹(T∗)⁻¹)T∗ = I,
showing that U is unitary; (here we have used U∗ = (T|T|⁻¹)∗ = |T|⁻¹T∗, which holds
because |T|, and hence |T|⁻¹, is Hermitian). We have proved that T can be written as a product UP of a
unitary operator U and a positive operator P. Now we check that U and P are uniquely
determined by T. In fact, from T = UP we have |T|² = T∗T = (UP)∗(UP) = PU∗UP = P².
By the uniqueness of the positive square root, we have P = |T|, from which we also
have U = TP⁻¹ = T|T|⁻¹. The expression UP here is called the polar decomposition
of T. In the one–dimensional case, we can identify T with a complex number z, and
the polar representation z = re^{iθ} of z corresponds to the polar decomposition of T.
There is a matrix version of the polar decomposition, defined in the same manner. The polar
decomposition of an n × n matrix A is A = U|A|, where |A| = √(A∗A) and U is a
unitary matrix.
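In floating point arithmetic the polar decomposition is conveniently obtained from the singular value decomposition: if A = WΣV∗, then |A| = VΣV∗ and U = WV∗. A minimal Python/numpy sketch (illustration only), using the matrix of Example 3.6.1 below:

    import numpy as np

    A = np.array([[3.0, 3j], [1j, 1.0]])
    W, s, Vh = np.linalg.svd(A)              # A = W diag(s) Vh
    absA = Vh.conj().T @ np.diag(s) @ Vh     # |A| = V diag(s) V*
    U = W @ Vh                               # the unitary factor
    assert np.allclose(U @ absA, A)          # A = U |A|
    assert np.allclose(U @ U.conj().T, np.eye(2))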
Example 3.6.1. Find the polar decomposition for A = [ 3  3i ; i  1 ].
Solution. Direct computation shows
A∗A = [ 10  8i ; −8i  10 ]
with eigenvalues λ1 = 18, λ2 = 2 and corresponding eigenvectors v1 = (i, 1) and v2 = (1, i).
Hence |A| = √(A∗A) has the eigenvalues √λ1 = 3√2 and √λ2 = √2 with the same set of eigenvectors. Let
D = [ 3√2  0 ; 0  √2 ]   and   W = (1/√2) [ i  1 ; 1  i ].
Then (A∗A)W = WD², |A|W = WD, and W is unitary. A direct computation shows
|A| = WDW∗ = √2 [ 2  i ; −i  2 ]   and   U = A|A|⁻¹ = (1/√2) [ 1  i ; i  1 ].
Hence the required polar decomposition is A = U|A|, with U and |A| as given above.
Now we briefly describe the polar decomposition for an operator T which is not necessarily
invertible. Equation (3.6.2) tells us that ‖|T|w‖ = ‖Tw‖ for all w. Since |T| is Hermitian,
its range is the orthogonal complement of its kernel. We define an operator U by specifying
its values for a vector v in the range or in the kernel of |T|. When v is in the kernel of |T|,
we simply set Uv = 0. When v is in the range of |T|, say v = |T|w, we let Uv = Tw;
(U is well defined: if |T|w = |T|w′, then applying (3.6.2) to w − w′ gives Tw = Tw′).
Notice that
‖Uv‖ = ‖Tw‖ = ‖|T|w‖ = ‖v‖.
This shows that, on the range of |T |, U is isometric. From the way U is defined, we have
T = U |T | and kerU = ker |T |. Here U in general is not unitary since it may not be
invertible. However, it resembles a unitary operator, in view of the following identities
which can be verified:
UU∗U = U, U∗UU∗ = U∗.
Let e1, e2, . . . , en be an orthonormal basis consisting of eigenvectors of |T| and let
µ1, µ2, . . . , µn be the corresponding eigenvalues of |T|, which are the singular values of T:
µ1 ≥ µ2 ≥ · · · ≥ µn.
So we have |T|ek = µkek (1 ≤ k ≤ n). Let r be the rank of T, which is also the rank of
|T|. Thus we have µ1 ≥ µ2 ≥ · · · ≥ µr > 0 and µ_{r+1} = · · · = µn = 0. Since the vectors
e1, e2, . . . , er are in the range of |T|, and the operator U is isometric on the range
of |T|, the vectors
fk = Uek,   1 ≤ k ≤ r,
form an orthonormal system. One can check that
Tx = ∑_{k=1}^{r} µk 〈x, ek〉 fk    (3.6.3)
for any vector x. The above identity is called the singular decomposition of T.
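In matrix form, identity (3.6.3) is the singular value decomposition. A quick numerical check (Python/numpy sketch, illustration only) reconstructs a random matrix from its singular values µk and the vectors ek, fk:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    F, mu, Eh = np.linalg.svd(A)             # columns of F are f_k; rows of Eh are e_k*
    B = sum(mu[k] * np.outer(F[:, k], Eh[k]) for k in range(3))
    assert np.allclose(B, A)                 # A x = sum_k mu_k <x, e_k> f_k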
Example 3.6.2. Find the singular decomposition of A = [ 1  i ; 1  i ].
Solution. Direct computation shows
|A|² = A∗A = [ 2  2i ; −2i  2 ]
with eigenvalues 4 and 0. Furthermore, e = (i/√2, 1/√2) is an eigenvector of |A|² corresponding to
the eigenvalue 4 with ‖e‖ = 1. From |A|²e = 4e we have |A|e = 2e. Furthermore, we
have Ae = U|A|e = U(2e) = 2Ue = 2f. Thus f = (1/2)Ae = (i/√2, i/√2). Thus the
singular decomposition of A is given by Ax = 2〈x, e〉f, with e = (i/√2, 1/√2) and
f = (i/√2, i/√2).
EXERCISE SET IV. 3.
Review Questions. What is the orthogonal diagonalization problem? What is the neat
and thorough answer to this problem in the real case? In the complex case? What is the
matrix version of this problem and what is the corresponding answer? (Again, you have
to consider the real case and the complex case separately.)
Drills
1. Show that each of the following pairs of matrices A and B are unitarily equivalent by
finding a unitary matrix U such that B = U∗AU.
(a) A = [ a  b ; c  d ], B = [ d  c ; b  a ].   (b) A = [ a  b ; c  d ], B = [ a  −b ; −c  d ].
(c) A = [ 0  a ; 0  0 ], B = [ 0  |a| ; 0  0 ].   (d) A = [ 1  1 ; −1  −1 ], B = [ 0  2 ; 0  0 ].
Hint for (c): try a diagonal U. Hint for (d): try the 45° rotation.
2. (a) Verify that N = [ a  b ; c  a ] is normal if and only if |b| = |c|.
(b) Show that the circulant matrix given in Exercise 4 of EXERCISE SET III.1 is
normal.
3. True or false:
(a) If an n× n matrix A is both Hermitian and unitary, then A = I.
(b) The sum of two normal operators is normal.
(c) The product of two normal operators is normal.
(d) If a normal operator is invertible, then its inverse is also normal.
(e) The sum of two unitary operators is unitary.
(f) The product of two unitary operators is unitary.
(g) The sum of two Hermitian operators is Hermitian.
(h) The product of two Hermitian operators is Hermitian.
4. (Aside: It is clear that unitary equivalence implies similarity, but not vice versa. The
present exercise helps you to compare these two concepts.)
(a) Show that, if A and B are unitarily equivalent n × n matrices, then A∗A and
B∗B are also unitarily equivalent.
(b) Give an example of a pair of 2 × 2 matrices which are similar but not unitarily
equivalent.
(c) Give an example of similar 2 × 2 matrices A and B such that A∗A and B∗B are
not similar.
(d) Prove that if normal matrices A and B are similar, then they are unitarily
equivalent.
5. For each of the following Hermitian matrices, find the eigenvalues and corresponding
eigenvectors and find an appropriate diagonalizing unitary matrix (or orthogonal
matrix):
A = [ 1  1 ; 1  1 ],   B = [ 2  1 ; 1  2 ],   C = [ 1  2 ; 2  −2 ],   D = [ 0  −i ; i  0 ],
S = [ 1  1  1 ; 1  1  1 ; 1  1  1 ],   T = [ 0  1  0 ; 1  0  1 ; 0  1  0 ].
6. For each of the following matrices, find the eigenvalues and corresponding eigenvectors
and find an appropriate diagonalizing unitary matrix (or orthogonal matrix):
A = [ cos θ  sin θ ; sin θ  −cos θ ],   B = [ cos θ  −sin θ ; sin θ  cos θ ],   W = [ cos θ  i sin θ ; i sin θ  cos θ ],
where θ is an arbitrary real number such that sin θ > 0.
Exercises
1. Find an orthogonal matrix P and a diagonal matrix D such that D = PAP⊤, where
A = [ 1  1  0 ; 1  0  1 ; 0  1  1 ].
(Hint: Notice that (1, 1, 1) is an eigenvector of A.)
2. Following the guidance given here, show that the n-th term of the Fibonacci sequence
{an}, in which each term is the sum of the preceding two, with the first few terms
1, 1, 2, 3, 5, 8, . . . , is given by
an = (1/√5) ((1 + √5)/2)^n − (1/√5) ((1 − √5)/2)^n.
(a) Use the recursive relation a_{n+2} = a_{n+1} + a_n to verify X_{n+1} = AX_n, where
A = [ 1  1 ; 1  0 ],   X_n = [ a_{n+1} ; a_n ],   and   X_0 = [ 1 ; 0 ],
and derive X_n = A^n X_0. (b) Notice that A is a real symmetric matrix and hence we
can write A = PDP⁻¹, where P is an invertible matrix and D is a diagonal matrix.
Find the explicit expressions of D and P, which enable us to get A^n and hence X_n.
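(A small numerical check of part (a), as a Python sketch; it is an illustration only and does not replace the proof asked for.)

    import numpy as np

    A = np.array([[1, 1], [1, 0]])
    X0 = np.array([1, 0])
    phi, psi = (1 + 5 ** 0.5) / 2, (1 - 5 ** 0.5) / 2
    for n in range(1, 10):
        a_n = (np.linalg.matrix_power(A, n) @ X0)[1]    # a_n from X_n = A^n X_0
        assert a_n == round((phi ** n - psi ** n) / 5 ** 0.5)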
3*. Let V be a finite dimensional complex inner product space. Prove that if N is a
normal operator on V and if TN = NT for some T ∈ L(V), then TN∗ = N∗T.
4*. Prove that if P is an invertible positive operator on an inner product space V , then
the inequality
|〈x,y〉|2 ≤ 〈Px,x〉〈P−1y,y〉
holds for all x, y in V .
5*. Prove that if H is a Hermitian operator on a finite dimensional inner product space
V , then eitH is a unitary operator for all real t.
6*. Prove that an operator H on a finite dimensional inner product space V is a Hermitian
operator if eitH is unitary for all real t.
Appendices for Chapter IV
Appendix A*: Positive Semidefiniteness, Gram Matrices
Recall that a matrix A = [ajk]_{1≤j,k≤r} is positive semidefinite if, for all complex
numbers z1, z2, . . . , zr,
∑_{j,k=1}^{r} ajk zj z̄k ≥ 0.
A positive semidefinite matrix is necessarily Hermitian and its eigenvalues are nonnegative
real numbers. It follows from the spectral theory (for normal operators) that it is a sum
of positive semidefinite matrices of rank one, which are necessarily of the form
[ v1v̄1  v1v̄2  · · ·  v1v̄r
  v2v̄1  v2v̄2  · · ·  v2v̄r
    ...
  vrv̄1  vrv̄2  · · ·  vrv̄r ].    (A1)
An interesting consequence of this observation is that, if A = [ajk] and B = [bjk] are
positive semidefinite matrices, then so is their Schur product A ∘ B = [ajk bjk] (certainly
this is not the usual kind of matrix multiplication). Indeed, the above discussion tells us
that it is enough to consider the case when B is the matrix given as (A1) above. In that
case, we have
∑_{j,k=1}^{r} ajk bjk zj z̄k = ∑_{j,k=1}^{r} ajk vj v̄k zj z̄k = ∑_{j,k=1}^{r} ajk wj w̄k ≥ 0,
where wj = vj zj.
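A numerical illustration of this fact (Python/numpy sketch; the entrywise product is numpy's *, and the matrices are random positive semidefinite examples):

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    Y = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = X @ X.conj().T                       # positive semidefinite
    B = Y @ Y.conj().T                       # positive semidefinite
    S = A * B                                # Schur (entrywise) product
    print(np.linalg.eigvalsh(S))             # all eigenvalues >= 0 (up to rounding)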
Let v1, v2, . . . , vr be a set of vectors in an inner product space V. By the Gram
matrix associated with this set of vectors we mean the following r × r matrix
Γ = [ 〈v1,v1〉  〈v1,v2〉  · · ·  〈v1,vr〉
      〈v2,v1〉  〈v2,v2〉  · · ·  〈v2,vr〉
        ...
      〈vr,v1〉  〈vr,v2〉  · · ·  〈vr,vr〉 ].    (A2)
Its determinant Gr = det Γ is called the Gramian. Notice that Γ is positive semidef-
inite. Indeed, since the (j, k)–entry of Γ is 〈vj,vk〉, for arbitrary complex numbers
z1, z2, . . . , zr, we have
∑_{j,k=1}^{r} 〈vj,vk〉 zj z̄k = ∑_{j,k=1}^{r} 〈zjvj, zkvk〉 = 〈∑_j zjvj, ∑_k zkvk〉 = ‖∑_j zjvj‖² ≥ 0.
Furthermore, this argument shows that Γ is positive definite if and only if the vectors
v1, v2, . . . , vr are linearly independent. In terms of the Gramian, the vectors v1, v2, . . . , vr
are linearly independent if and only if Gr ≡ det Γ > 0. Conversely, given a positive
definite matrix A = [ajk]_{1≤j,k≤r}, there exist an inner product space V and a set of vectors
v1, v2, . . . , vr in V such that ajk = 〈vj,vk〉 for all j, k; in other words, every positive
definite matrix can be regarded as a Gram matrix. Indeed, given such a matrix A, we
define an inner product on V = Cr by putting
〈v,w〉 = ∑_{j,k=1}^{r} ajk vj w̄k
for all v = (v1, . . . , vr) and w = (w1, . . . , wr) in Cr. It is straightforward to check
that this indeed defines an inner product and ajk = 〈ej, ek〉, where ek is the kth vector
of the standard basis for Cr.
Now we consider an interesting expression which resembles the Gramian det Γ, the
determinant of the matrix Γ given in (A2):
g = det [ 〈v1,v1〉  〈v1,v2〉  · · ·  〈v1,vr−1〉  v1
          〈v2,v1〉  〈v2,v2〉  · · ·  〈v2,vr−1〉  v2
            ...
          〈vr,v1〉  〈vr,v2〉  · · ·  〈vr,vr−1〉  vr ].    (A3)
Notice that the entries of the last column of the above determinant are the vectors v1, v2, . . . , vr. If we
take the cofactor expansion along the last column, we can write g as a linear combination
of v1, v2, . . . , vr with
G_{r−1} = det [ 〈v1,v1〉  〈v1,v2〉  · · ·  〈v1,vr−1〉
                〈v2,v1〉  〈v2,v2〉  · · ·  〈v2,vr−1〉
                  ...
                〈vr−1,v1〉  〈vr−1,v2〉  · · ·  〈vr−1,vr−1〉 ]
as the coefficient of vr. Thus we may write
g = G_{r−1} vr + w,    (A4)
with w in S = span{v1, . . . , vr−1}.
with w in S = span{v1, . . . ,vr−1}. Now, for each vk with 1 ≤ k ≤ r − 1, we have
〈g,vk〉 = det [ 〈v1,v1〉  〈v1,v2〉  · · ·  〈v1,vr−1〉  〈v1,vk〉
               〈v2,v1〉  〈v2,v2〉  · · ·  〈v2,vr−1〉  〈v2,vk〉
                 ...
               〈vr,v1〉  〈vr,v2〉  · · ·  〈vr,vr−1〉  〈vr,vk〉 ] = 0
because the last column is the same as the kth column. This shows g ∈ S⊥. Assume that
v1, v2, . . . , vr−1 are linearly independent, so that G_{r−1} ≠ 0. Rewrite (A4) as
vr = h + p,   where h = G_{r−1}⁻¹ g ∈ S⊥ and p = −G_{r−1}⁻¹ w ∈ S.
Thus p is the projection of vr onto the subspace spanned by v1, v2, . . . , vr−1.
Example. Find the projection of v = (0, 0, 1) onto the subspace spanned by the
vectors v1 = (1, 1, 1) and v2 = (1, 2, 2).
Solution. Form the vector
g = det [ 〈v1,v1〉  〈v1,v2〉  v1
          〈v2,v1〉  〈v2,v2〉  v2
          〈v,v1〉   〈v,v2〉   v ]
  = det [ 3  5  v1
          5  9  v2
          1  2  v ]
  = v1 − v2 + 2v.
So v = (1/2)g − (1/2)v1 + (1/2)v2. The required projection is p = −(1/2)v1 + (1/2)v2 = (0, 1/2, 1/2).
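The example can be confirmed by solving the normal equations Γc = (〈v,v1〉, 〈v,v2〉) for the coefficients of the projection directly (Python/numpy sketch, illustration only):

    import numpy as np

    v1, v2, v = np.array([1., 1., 1.]), np.array([1., 2., 2.]), np.array([0., 0., 1.])
    V = np.column_stack([v1, v2])
    Gamma = V.T @ V                          # the Gram matrix [[3, 5], [5, 9]]
    c = np.linalg.solve(Gamma, V.T @ v)      # coefficients: c = (-1/2, 1/2)
    p = V @ c
    print(p)                                 # [0.  0.5 0.5], as computed above
    assert np.allclose(V.T @ (v - p), 0)     # v - p is orthogonal to v1 and v2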
Appendix B*: Numerical Characters of Operators: Norm, Spectral Radius, Etc.
For a linear operator T on a finite dimensional inner product space V, the uniform norm,
or simply the norm, of T is defined to be
‖T‖ = max{‖Tx‖ : x ∈ V, ‖x‖ = 1}.
The following basic properties of the norm hold: 1. ‖T‖ ≥ 0, and ‖T‖ = 0 if and
only if T = O; 2. ‖S + T‖ ≤ ‖S‖ + ‖T‖; 3. ‖aT‖ = |a|‖T‖; 4. ‖ST‖ ≤ ‖S‖‖T‖; 5.
‖Tv‖ ≤ ‖T‖‖v‖; 6. ‖T∗‖ = ‖T‖. The last equality follows from the following observation:
‖T‖ = max{|〈Tx,y〉| : x, y ∈ V, ‖x‖ = ‖y‖ = 1}.
A less trivial property is the following “C∗–identity”:
‖T∗T‖ = ‖T‖².
Indeed, ‖T∗T‖ ≤ ‖T∗‖‖T‖ = ‖T‖², and, on the other hand, from
‖Tx‖² = 〈Tx, Tx〉 = 〈T∗Tx,x〉 ≤ ‖T∗Tx‖‖x‖ ≤ ‖T∗T‖‖x‖²
we have ‖T‖² = max_{‖x‖=1} ‖Tx‖² ≤ ‖T∗T‖. Hence ‖T∗T‖ = ‖T‖².
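The C∗–identity is easy to test numerically, taking for ‖·‖ the matrix 2–norm (largest singular value). A Python/numpy sketch, illustration only:

    import numpy as np

    rng = np.random.default_rng(4)
    T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    nT = np.linalg.norm(T, 2)                # operator norm = largest singular value
    nTT = np.linalg.norm(T.conj().T @ T, 2)
    assert np.isclose(nTT, nT ** 2)          # ||T*T|| = ||T||^2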
One purpose of introducing the notion of norm is to study convergence of operators.
We say that a sequence of operators {Tn} converges to T if lim_{n→∞} ‖Tn − T‖ = 0.
Also, we say that a series of operators ∑_{n=0}^{∞} Tn converges if the sequence of its partial
sums Sn = T0 + T1 + · · · + Tn converges. For example, it can be checked that, if ‖T‖ < 1,
then I − T is invertible and the series ∑_{n=0}^{∞} T^n converges to (I − T)⁻¹.
Recall that the spectrum σ(T ) of T is the set of all eigenvalues of T . The spectral
radius of T is defined to be
r(T ) = max{|λ| : λ ∈ σ(T )}.
It is easy to see that r(T ) ≤ ‖T‖. Indeed, we can choose an eigenvalue λ such that
|λ| = r(T ), and, letting v be a unit vector such that Tv = λv, we have
r(T ) = |λ| = ‖λv‖ = ‖Tv‖ ≤ ‖T‖.
Notice that r(T) = 0 if and only if 0 is the only eigenvalue of T, or, equivalently,
T is a nilpotent operator. When S and T commute, that is, ST = TS, we have the
inequalities r(S + T) ≤ r(S) + r(T) and r(ST) ≤ r(S)r(T). But in general, without
this commutativity condition, these two inequalities are not true. We have the following
important identity for the spectral radius:
r(T) = lim_{n→∞} ‖T^n‖^{1/n}.
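The identity can be watched numerically: the quantities ‖T^n‖^{1/n} settle down to r(T), even when ‖T‖ itself is much larger. A Python/numpy sketch (illustration only; the matrix is an arbitrary example):

    import numpy as np

    T = np.array([[0.0, 4.0], [0.0, 0.5]])   # r(T) = 0.5 while ||T|| > 4
    r = max(abs(np.linalg.eigvals(T)))
    for n in (10, 50, 200):
        print(np.linalg.norm(np.linalg.matrix_power(T, n), 2) ** (1.0 / n))
    print(r)                                 # the printed values approach r(T) = 0.5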
The usual proof of this identity uses complex analysis. The set
W(T) = {〈Tx,x〉 : ‖x‖ = 1}
is called the numerical range of T. It is true but highly nontrivial that W(T) is always
a convex set in the complex plane. The number w(T) = max{|λ| : λ ∈ W(T)} is called
the numerical radius of T. For all operators T, we have
r(T ) ≤ w(T ) ≤ ‖T‖ ≤ 2w(T ).
Given operators S and T, their Hilbert–Schmidt inner product is defined to be
〈S, T〉HS = ∑_{j=1}^{n} 〈Sej, Tej〉,
where {ej}1≤j≤n is an orthonormal basis of V. It can be checked that 〈S, T〉HS given
here is independent of the choice of the orthonormal basis {ej}1≤j≤n and hence is well
defined. The Hilbert–Schmidt norm of an operator T is defined to be
‖T‖HS = √(〈T, T〉HS) = √(∑_{k=1}^{n} ‖Tek‖²).
Let µ1 ≥ µ2 ≥ · · · ≥ µn be the singular values of T, that is, the eigenvalues of |T|
arranged in decreasing order. Then ‖T‖ = µ1 and ‖T‖²HS = ∑_{k=1}^{n} µk². For any
number p ≥ 1, one can define the p–norm ‖T‖p of T by putting
‖T‖p = (∑_{k=1}^{n} µk^p)^{1/p}.
Then ‖T‖HS = ‖T‖2, that is, the Hilbert–Schmidt norm is just the 2–norm. The following
properties of the p–norm are highly nontrivial: 1. ‖S + T‖p ≤ ‖S‖p + ‖T‖p; 2. ‖UT‖p =
‖TU‖p = ‖T‖p if U is unitary (or orthogonal in the real case); 3. ‖ST‖p ≤ ‖S‖‖T‖p
and ‖ST‖p ≤ ‖S‖p‖T‖; 4. ‖T‖ ≤ ‖T‖p ≤ ‖T‖1; 5. ‖T∗‖p = ‖T‖p. The norm ‖T‖1 is
called the trace norm of T. Notice that ‖T‖1 = tr |T|, the trace of |T|.
We can use an inner product space V and operators on V to model some quantum
system. A pure state of this system is a unit vector in V. An observable is an operator
on V. An eigenstate for an observable A is an eigenvector of A, and the corresponding
eigenvalue is the observed value of A at that state, as measured in a lab. If v is
not an eigenstate for A, then 〈Av,v〉 is the expected value (the word “expected” is in the
probabilistic sense) of A at the state v. By a “mixed” state we mean a positive operator
T with ‖T‖1 ≡ tr T = 1. If µ1, µ2, . . . , µn are the eigenvalues of T with corresponding
eigenvectors v1, v2, . . . , vn which form an orthonormal basis, then
〈A, T〉HS = ∑_{k=1}^{n} µk 〈Avk,vk〉,
which is the expected value of the observable A at the mixed state T. A pure state v
can be identified with the rank one positive operator T defined by Tx = 〈x,v〉v. Notice
that ‖v‖ = 1 implies tr T = 1. A mixed state is a convex combination of a set of pure
states.
Appendix C*: Linear Groups
By a linear group here we mean a group of linear operators (another word for these
is linear transformations), not a group that is linear. Let V be a vector space with
dim V = n. Recall that L(V) is the set of all linear operators on V. We say that a subset
G of L(V) is a linear group or simply a group if it satisfies the following conditions:
(LG1) The identity transformation I belongs to G.
(LG2) G is closed under multiplication, that is, if S and T are in G, then so is their
product ST .
(LG3) If S is in G, then S is invertible and its inverse S⁻¹ is also in G.
For example, all invertible elements in L(V) form a group denoted by GL(V), called the
general linear group. When the vector space V is Fn, we may identify GL(V) with
the group GL(n; F) of all invertible n × n matrices over F. A subgroup of GL(n; F)
is called a matrix group. For example, all orthogonal (real) n × n matrices form a group
denoted by O(n), called the orthogonal group. Notice that, for A ∈ O(n), we have
det(AA⊤) = det(I) = 1, or (det A)² = 1, and hence det A is either 1 or −1. The subgroup
of O(n) consisting of orthogonal matrices of determinant 1, called the special orthogonal
group, is denoted by SO(n). All unitary n × n matrices form a subgroup of GL(n; C),
denoted by U(n), called the unitary group. The special unitary group is defined to be
SU(n) = {A ∈ U(n) : det A = 1} ≡ {A ∈ GL(n; C) : AA∗ = A∗A = I, det A = 1}.
This group is important in several areas, including particle physics.
Let G be a group of n×n invertible matrices. An n×n matrix A is called a tangent
vector of G at I if A = Φ′(0) for some smooth curve Φ(t) in G satisfying Φ(0) = I.
Denote by LG the set of all tangent vectors of G at I.
Now we derive some basic properties of LG. First, take arbitrary A and B in LG. We
claim: A + B is also in LG. Indeed, by assumption, we have Φ′(0) = A and Ψ′(0) = B for
some parametric curves Φ(t) and Ψ(t) in G with Φ(0) = Ψ(0) = I. Then Θ(t) ≡ Φ(t)Ψ(t)
is also a parametric curve in G with Θ(0) = Φ(0)Ψ(0) = I. So Θ′(0) is in LG. The product
rule gives
Θ′(0) = Φ(0)Ψ′(0) + Φ′(0)Ψ(0) = I·B + A·I = A + B.
Hence A + B is in LG. Next we claim: if A is in LG, say A = Φ′(0) for a parametric
curve Φ(t) with Φ(0) = I, and if λ is a scalar, then λA is also in LG. Indeed, consider the
new parametric curve Ψ(t) = Φ(λt), which also lies in G. Clearly Ψ(0) = I and, by the
chain rule, Ψ′(t) = λΦ′(λt); consequently λA = λΦ′(0) = Ψ′(0) ∈ LG. Putting these two
claims together, we see that LG is a vector space. Now we make the third
claim: if A ∈ LG and B ∈ G, then BAB⁻¹ ∈ LG. Indeed, from A ∈ LG we know that
A = Φ′(0) for some curve Φ(t) in G satisfying Φ(0) = I. Let Ψ(t) = BΦ(t)B⁻¹, which
is a parametric curve in G satisfying Ψ(0) = I with Ψ′(0) = BΦ′(0)B⁻¹ = BAB⁻¹, and
hence BAB⁻¹ ∈ LG. The final claim is: if A, B are in LG, then so is AB − BA. Indeed,
we have Ψ′(0) = B for some parametric curve Ψ(t) in G with Ψ(0) = I. By our third
claim, C(t) = Ψ(t)⁻¹AΨ(t) is a parametric curve in LG. Since, by the first and
second claims, LG is a linear space of n × n matrices, the derivative C′(t)
of C(t), a curve in LG, is also in LG. In particular, C′(0) is in LG. Now
C′ = (Ψ⁻¹AΨ)′ = (Ψ⁻¹)′AΨ + Ψ⁻¹AΨ′ = −Ψ⁻¹Ψ′Ψ⁻¹AΨ + Ψ⁻¹AΨ′;
(here we have used (Ψ⁻¹)′ = −Ψ⁻¹Ψ′Ψ⁻¹). From Ψ(0) = I and Ψ′(0) = B, we obtain
C′(0) = −BA + AB = AB − BA. The expression AB − BA is called the Lie bracket
or Lie product of A and B, and is denoted by [A,B]. We call a set L of n × n matrices
a real (matrix) Lie algebra if, for all A and B in L and for all real numbers λ and µ,
both λA+ µB and [A,B] are in L. We have arrived at the following fact: If G is a matrix
group, then LG is a Lie algebra. Naturally we call LG the Lie algebra of G.
The description of tangent vectors of G at I is not easy to work with in concrete cases.
The following criterion is handy: a matrix A is in LG if and only if e^{tA} ∈ G for all t. Using
this criterion, we can easily find out the Lie algebras of matrix groups mentioned above.
Writing Mn(F) for the set of all n× n matrices with entries in F, we have
L O(n) = {A ∈ Mn(R) : A+A⊤ = O};
L SO(n) = {A ∈ Mn(R) : A+A⊤ = O, tr A = 0};
L U(n) = {A ∈ Mn(C) : A+A∗ = O};
L SU(n) = {A ∈ Mn(C) : A+A∗ = O, tr A = 0}.
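The criterion can be tested numerically: exponentiating a matrix with A + A⊤ = O should land in SO(n). A Python sketch using scipy.linalg.expm for the matrix exponential (illustration only):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0, -2.0],
                  [-1.0, 0.0, 3.0],
                  [2.0, -3.0, 0.0]])         # skew symmetric: A + A^T = O
    Q = expm(A)                              # e^A
    assert np.allclose(Q @ Q.T, np.eye(3))   # Q is orthogonal
    assert np.isclose(np.linalg.det(Q), 1.0) # det Q = 1, so Q is in SO(3)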
Appendix D*: Rotations
By a rotation we mean an element A in the matrix group SO(3); in other words,
A is a 3 × 3 real matrix with AA⊤ = A⊤A = I and det A = 1. Since A is a 3 × 3 real
matrix, its characteristic polynomial p(x) is a real polynomial of (odd) degree 3. Thus
p(x) must have a real root, say r, and the other two are either both real or a conjugate
pair. Let v be an eigenvector corresponding to r, that is, Av = rv, v ≠ 0. Now
‖v‖ = ‖Av‖ = ‖rv‖ = |r|‖v‖ and hence |r| = 1. So r is either 1 or −1. In case the other
two eigenvalues form a conjugate pair, say λ and λ̄, we have
1 = det A = rλλ̄ = r|λ|²,
which implies r > 0 and hence r = 1. If the other two eigenvalues are also real, say
r1, r2 ∈ R, then we also have |r1| = 1 and |r2| = 1. Furthermore, 1 = det A = r r1 r2, and
hence one of r, r1, r2 is positive. Thus we have shown that 1 is always an eigenvalue of a
rotation matrix A. Let v1 be a unit vector such that Av1 = v1. Applying A−1 to
both sides, we get v1 = A−1v1, that is, A−1v1 = v1. Let S = {v1}⊥, the orthogonal
complement of v1. Notice that, if v ∈ S, then
〈Av,v1〉 = 〈v, A⊤v1〉 = 〈v, A−1v1〉 = 〈v,v1〉 = 0
and hence Av ∈ S. This shows that S is an invariant subspace of A. The restriction of A
to S, say AS, is necessarily an orthogonal operator with determinant 1. Thus, if vectors
v2, v3 form an orthonormal basis of the 2–dimensional space S, the matrix representation
of AS relative to this basis is necessarily of the form
[ cos θ  −sin θ ; sin θ  cos θ ],
a rotation matrix with θ as its angle of rotation. Relative to the orthonormal basis B =
{v1, v2, v3}, the representation matrix of A is given by
[A]B = [ 1  0  0 ; 0  cos θ  −sin θ ; 0  sin θ  cos θ ].    (C1)
Geometrically, vector v1 gives the direction of the axis of rotation and θ is the angle of
rotation. The connection between A and [A]B is a matter of change of basis. Let
V = [v1 v2 v3], an orthogonal matrix. Then A = V[A]BV⁻¹. Taking traces on both
sides, we get
tr A = tr(V[A]BV⁻¹) = tr [A]B = 1 + 2 cos θ.    (C2)
This gives us the recipe for finding the angle of rotation. Next we describe a way to find
the axis of rotation. As we have seen, Av1 = v1 and A⊤v1 = A⁻¹v1 = v1. Hence
(A − A⊤)v1 = 0. But C = A − A⊤ is a skew symmetric matrix, that is, C⊤ = −C. We
can put C in the following form
C = [ 0  −a3  a2 ; a3  0  −a1 ; −a2  a1  0 ];
(see the final part of §2.4 in Chapter I). Since Cx = a × x, where a = (a1, a2, a3), we have
Ca = 0. We can set v1 = ‖a‖⁻¹a.
Example. Find the angle and the axis of the “rotation sequence”
R = [ cos α  −sin α  0 ; sin α  cos α  0 ; 0  0  1 ] [ 1  0  0 ; 0  cos β  −sin β ; 0  sin β  cos β ]
  ≡ [ cos α  −sin α cos β  sin α sin β ; sin α  cos α cos β  −cos α sin β ; 0  sin β  cos β ].
Solution. Denote by θ the angle of the rotation R. Then
1 + 2 cos θ = tr R = cos α + cos α cos β + cos β = (1 + cos α)(1 + cos β) − 1,
and hence cos θ = (1/2)(1 + cos α)(1 + cos β) − 1, from which θ can be obtained. Form the
skew symmetric matrix (for simplicity, we do not specify the lower left part of this matrix)
R − R⊤ = [ 0  −sin α (1 + cos β)  sin α sin β ; ∗  0  −sin β (1 + cos α) ; ∗  ∗  0 ].
The axis of rotation is parallel to the vector
v = (sin β (1 + cos α), sin α sin β, sin α (1 + cos β)).
A brute force computation shows Rv = v. We remark that computing rotation sequences
is useful in some practical problems, such as navigation.
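The recipe of this appendix is easy to run numerically for particular values of α and β (Python/numpy sketch, illustration only):

    import numpy as np

    a, b = 0.7, 1.1                          # alpha and beta
    Rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
    Rx = np.array([[1, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
    R = Rz @ Rx
    theta = np.arccos((np.trace(R) - 1) / 2)             # the angle, from (C2)
    v = np.array([np.sin(b) * (1 + np.cos(a)),
                  np.sin(a) * np.sin(b),
                  np.sin(a) * (1 + np.cos(b))])          # the axis, from R - R^T
    assert np.allclose(R @ v, v)                         # v spans the axis of rotation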
Appendix E*: SU(2), Quaternions, and Spinors
Recall that SU(2) is the matrix group of all 2 × 2 unitary matrices of determinant
equal to 1:
SU(2) = {U ∈ U(2) : det(U) = 1}.
Let U be in SU(2). Write down U and UU∗ explicitly as follows:
U = [ z  w ; u  v ]   and   UU∗ = [ z  w ; u  v ] [ z̄  ū ; w̄  v̄ ] = [ |z|² + |w|²   zū + wv̄ ; uz̄ + vw̄   |u|² + |v|² ].
From UU∗ = I we get |z|² + |w|² = 1 and uz̄ + vw̄ = 0. Assume w ≠ 0 and z ≠ 0.
Then we may write u = αw̄ and v = βz̄ for some α and β. Now uz̄ + vw̄ = 0 gives
(α + β)z̄w̄ = 0 and hence α + β = 0. Thus
1 = det(U) = zv − wu = z(βz̄) − w(αw̄) = z(βz̄) − w(−βw̄) = β(|z|² + |w|²) = β.
Therefore U is of the form
U = [ z  w ; −w̄  z̄ ],   where |z|² + |w|² ≡ zz̄ + ww̄ = 1.    (E.1)
In case z = 0 or w = 0, U has the same form (please check this). We conclude: a 2 × 2
matrix U is in SU(2) if and only if it can be expressed as in (E.1) above.
Writing z = x0 + ix1 and w = x2 + ix3 in (E.1), we have
U = [ z  w ; −w̄  z̄ ] = [ x0 + ix1  x2 + ix3 ; −x2 + ix3  x0 − ix1 ] = x0 1 + x1 i + x2 j + x3 k,    (E.2)
where
1 = [ 1  0 ; 0  1 ],   i = [ i  0 ; 0  −i ],   j = [ 0  1 ; −1  0 ],   k = [ 0  i ; i  0 ].    (E.3)
The matrix U in (E.1) belongs to SU(2) if and only if
|z|² + |w|² ≡ x0² + x1² + x2² + x3² = 1.
An expression written as the RHS of (E.2), without the condition x0² + x1² + x2² + x3² = 1
imposed, is called a quaternion. Since the theory of quaternions was discovered by
Hamilton, we denote the collection of all quaternions by H. The algebra of quaternions is
determined by the following identities among the basic units 1, i, j, k:
1q = q1 = q,   i² = j² = k² = −1,   ij = −ji = k,   jk = −kj = i,   ki = −ik = j,    (E.4)
where q is any quaternion. These identities can be checked by direct computation. We
usually suppress the unit 1 of the quaternion algebra H and write x0 for x0 1. Let q be
the quaternion given as (E.2), which is a 2 × 2 complex matrix. Its adjoint is given by
q∗ = [ z̄  −w ; w̄  z ] = [ x0 − ix1  −x2 − ix3 ; x2 − ix3  x0 + ix1 ] = x0 − x1i − x2j − x3k,
which is also called the conjugate of q. A direct computation shows
q∗q = qq∗ = (|z|² + |w|²)1 ≡ |z|² + |w|² = det(q) = x0² + x1² + x2² + x3².
The square root of the last expression is called the norm of q and is denoted by ‖q‖. Thus
q∗q = qq∗ = ‖q‖².
So, q is in SU(2) if and only if ‖q‖ = 1:
SU(2) = {q = x0 + x1i + x2j + x3k ∈ H : ‖q‖² ≡ x0² + x1² + x2² + x3² = 1}.
Regarding H as the 4-dimensional space with rectangular coordinates x0, x1, x2, x3, we
may identify SU(2) with the 3-dimensional sphere x0² + x1² + x2² + x3² = 1, which will be
simply called the 3-sphere. Notice that, if we write z = x0 + x1i and w = x2 + x3i,
then q = x0 + x1i + x2j + x3k can be written as q = z + wj, in view of ij = k.
For a quaternion q = x0 + x1i + x2j + x3k, we often write q = x0 + x, where x0 is
called the scalar part and x = x1i + x2j + x3k is called the vector part. From (E.4) we
see how to multiply “pure vector” quaternions. It is easy to check that the product of two
quaternions q = x0 + x and r = y0 + y is determined by
qr = (x0 + x)(y0 + y) = x0y0 + x0y + y0x + xy,   where xy = −x · y + x × y.    (E.5)
The “scalar plus vector” decomposition q = x0 + x of a quaternion is also convenient for
finding its conjugate, as we can easily check that
q∗ = (x0 + x)∗ = x0 − x,    (E.6)
which resembles the conjugation z = x + iy ↦ z̄ = x − iy of complex numbers. From (E.6) we see that
a quaternion q is a pure vector if and only if q∗ = −q, that is, q is skew Hermitian.
We identify a pure vector x = x1i + x2j + x3k with the vector x = (x1, x2, x3) in
R³. For each q ∈ SU(2), define a linear transformation R(q) on R³ by putting
R(q)x = q∗xq.
(The definition of R(q) here comes from the adjoint representation of a matrix group,
which is SU(2) in the present case, described in Appendix C above.) We
can check that y ≡ R(q)x is indeed in R³:
y∗ = (R(q)x)∗ = (q∗xq)∗ = q∗x∗q = q∗(−x)q = −q∗xq = −y.
The most interesting thing about R(q) is that it is an isometry: x and y ≡ R(q)x have
the same length. Indeed,
‖y‖² = y∗y = (q∗xq)∗(q∗xq) = q∗x∗qq∗xq = q∗x∗xq = q∗‖x‖²q = ‖x‖²q∗q = ‖x‖².
Using a connectedness argument from topology, one can show that R(q) is actually a
rotation (not a reflection) in 3–space. It turns out that every rotation in 3–space can be
written in the form R(q), and we call this the spinor representation of the rotation. Also,
we call SU(2) the spinor group. It is an essential mathematical device for describing
electron spin and for studying aircraft stability. It is also used to explain how a falling cat
can turn its body 180° in midair in order to achieve a safe landing, without violating
the basic physical law of conservation of angular momentum.
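A numerical sketch of the spinor representation (Python/numpy, illustration only): pick a unit quaternion q as a 2 × 2 matrix, conjugate a pure vector x by it, and read off the rotated vector. For the particular q chosen here, the map R(q) is a rotation by 120° about the axis (1, 1, 1).

    import numpy as np

    one = np.eye(2)
    i = np.array([[1j, 0], [0, -1j]])
    j = np.array([[0, 1], [-1, 0]])
    k = np.array([[0, 1j], [1j, 0]])

    q = 0.5 * (one + i + j + k)              # ||q|| = 1, so q is in SU(2)

    def embed(x):                            # x in R^3 as a pure vector quaternion
        return x[0] * i + x[1] * j + x[2] * k

    x = np.array([1.0, 2.0, 3.0])
    y = q.conj().T @ embed(x) @ q            # R(q)x = q* x q
    y_vec = np.array([y[0, 0].imag, y[0, 1].real, y[0, 1].imag])
    print(y_vec)                             # the rotated vector
    assert np.isclose(np.linalg.norm(y_vec), np.linalg.norm(x))   # same length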