CHAPTER IV
OPERATORS ON INNER PRODUCT SPACES
§1. Complex Inner Product Spaces
1.1. Let us recall the inner product (or the dot product) for the real n–dimensional
Euclidean space Rn: for vectors x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) in Rn, the
inner product 〈x,y〉 (denoted by x · y in some books) is defined to be
〈x,y〉 = x1y1 + x2y2 + · · · + xnyn,
and the norm (or the magnitude) ‖x‖ is given by
‖x‖ = √〈x,x〉 = √(x1² + x2² + · · · + xn²).
For complex vectors, we cannot copy this definition directly. We need to use complex
conjugation to modify this definition in such a way that 〈x,x〉 ≥ 0 so that the definition
of magnitude ‖x‖ = √〈x,x〉 still makes sense. Recall that the conjugate of a complex
number z = a + ib, where a and b are real, is given by z̄ = a − ib, and
z̄ z = (a − ib)(a + ib) = a² + b² = |z|².
The identity z̄ z = |z|² turns out to be very useful and should be kept in mind.
Recall that the addition and the scalar multiplication of vectors in Cn are defined
as follows: for x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) in Cn, and a in C,
x + y = (x1 + y1, x2 + y2, . . . , xn + yn) and ax = (ax1, ax2, . . . , axn).
The inner product (or the scalar product) 〈x,y〉 of vectors x and y is defined by
〈x,y〉 = x1ȳ1 + x2ȳ2 + · · · + xnȳn. (1.1.1)
Notice that 〈x,x〉 = x1x̄1 + x2x̄2 + · · · + xnx̄n = |x1|² + |x2|² + · · · + |xn|² ≥ 0, which is what
we ask for. The norm of x is given by
‖x‖ = 〈x,x〉^{1/2} = √(|x1|² + |x2|² + · · · + |xn|²).
Remark: In (1.1.1), it is not clear why we prefer to take complex conjugates of components
of y instead of components of x. Actually this is more or less due to the tradition of
mathematics, rather than our preference. (Physicists have a different tradition!)
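Here is a minimal numerical sketch of (1.1.1), assuming Python with NumPy (the vectors are arbitrary illustrative data; note that NumPy's np.vdot conjugates its first argument, whereas (1.1.1) conjugates the components of y):

```python
import numpy as np

# Sketch: the inner product (1.1.1) and the norm in C^3.
# np.vdot(a, b) conjugates its *first* argument, so the text's
# <x, y> = x1*conj(y1) + ... corresponds to np.vdot(y, x).
x = np.array([1 + 1j, 2j, 3.0])
y = np.array([2.0, 1 - 1j, 1j])

inner = np.sum(x * np.conj(y))            # <x, y> in the convention of (1.1.1)
print(np.isclose(inner, np.vdot(y, x)))   # True

norm_x = np.sqrt(np.sum(np.abs(x) ** 2))  # ||x|| = sqrt(<x, x>)
print(np.isclose(norm_x, np.linalg.norm(x)))  # True
```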
The space Cn provides us with the typical example of complex inner product spaces,
defined as follows:
Definition. By an inner product on a complex vector space we mean a device of
assigning to each pair of vectors x and y a complex number denoted by 〈x,y〉, such that
the following conditions are satisfied:
(C1) 〈x,x〉 ≥ 0, and 〈x,x〉 = 0 if and only if x = 0.
(C2) 〈y,x〉 = 〈x,y〉̄ (the bar denoting complex conjugation).
(C3) The inner product is a “sesquilinear map”, i.e.
〈a1x1 + a2x2, y〉 = a1〈x1,y〉 + a2〈x2,y〉 and 〈x, b1y1 + b2y2〉 = b̄1〈x,y1〉 + b̄2〈x,y2〉.
(Actually the second identity of (C3) above is a consequence of the first, together with (C2).)
Inner products for real vector spaces can be defined in a similar fashion. It is slightly
simpler because there is no need to take complex conjugation. This is simply because the
conjugate of a real number is just itself.
Besides Cn, another example of complex inner product space is given as follows.
Consider a space F of well-behaved complex-valued functions over an interval, say [a, b];
(here we do not specify the technical meaning of being well-behaved). The inner product
〈f, g〉 of f, g ∈ F is given by
〈f, g〉 = (1/(b − a)) ∫_a^b f(t) ḡ(t) dt, for f, g ∈ F.
(On the right hand side, 1/(b− a) is a normalization factor added for convenience in the
future.) The norm induced by this inner product is
‖f‖ ≡ 〈f, f〉^{1/2} = ( (1/(b − a)) ∫_a^b |f(t)|² dt )^{1/2} for f ∈ F.
In the future we will take F to be the space of trigonometric polynomials and [a, b] is
any interval of length 2π, such as [0, 2π] and [−π, π].
1.2. Let V be a complex vector space with an inner product 〈·, ·〉. We say that
two vectors x and y in V are orthogonal or perpendicular if their inner product
is zero and we write x⊥y in this case. Thus, by our definition here,
x⊥y ⇐⇒ 〈x,y〉 = 0.
From the definition of orthogonality you should recognize that, first, the zero vector 0
is orthogonal to every vector (indeed, for each vector x in V , 〈0,x〉 = 〈0 + 0,x〉 =
〈0,x〉 + 〈0,x〉 by (C3) and hence 〈0,x〉 = 0); second, 0 is the only vector orthogonal to
itself (this follows from (C1)) and hence 0 is the only vector orthogonal to every vector;
third, x⊥y implies y⊥x (indeed, if 〈x,y〉 = 0, then 〈y,x〉 = 〈x,y〉̄ = 0̄ = 0).
A set of nonzero vectors S is called an orthogonal system if each vector in S is
orthogonal to all other vectors in S. If, furthermore, each vector in S has length 1,
then S is called an orthonormal system. (Notice the difference of the endings of the
words “orthogonal” and “orthonormal”.) We have the following generalized Pythagoras
theorem: If v1,v2, · · · ,vn form an orthogonal system, then
‖v1 + v2 + · · · + vn‖² = ‖v1‖² + ‖v2‖² + · · · + ‖vn‖². (1.2.1)
We prove this by induction on n. When n = 1, (1.2.1) becomes ‖v1‖² = ‖v1‖² and there is
nothing to prove. So let n ≥ 2 and assume that the theorem is true for n− 1 vectors. Let
w = v2 + v3 + · · · + vn. Then, by our induction hypothesis, ‖w‖² = ∑_{k=2}^n ‖vk‖². Thus
(1.2.1) becomes ‖v1 + w‖² = ‖v1‖² + ‖w‖², which remains to be verified. Notice that
〈v1,w〉 = 〈v1,v2〉 + 〈v1,v3〉 + · · · + 〈v1,vn〉 = 0.
Hence
‖v1 + w‖² = 〈v1 + w, v1 + w〉
= 〈v1,v1〉 + 〈v1,w〉 + 〈w,v1〉 + 〈w,w〉
= 〈v1,v1〉 + 〈v1,w〉 + 〈v1,w〉̄ + 〈w,w〉
= 〈v1,v1〉 + 〈w,w〉 = ‖v1‖² + ‖w‖².
Hence (1.2.1) is valid.
Given an orthonormal system E = {e1, e2, . . . , en} in V , and a vector v which can
be written as a linear combination of vectors in E, say
v = v1e1 + v2e2 + · · · + vnen ≡ ∑_{k=1}^n vk ek,
we look for an explicit expression for the coefficients vk in this linear combination. By
the linearity in the “first slot” of the inner product, we have
〈v, ej〉 = 〈∑_{k=1}^n vk ek, ej〉 = ∑_{k=1}^n vk〈ek, ej〉.
Note that 〈ek, ej〉 are zeros except when k = j, which gives 1 in this case; (in short,
〈ek, ej〉 = δjk). So the above identity becomes 〈v, ej〉 = vj . Thus
v = ∑_{k=1}^n 〈v, ek〉ek = 〈v, e1〉e1 + 〈v, e2〉e2 + · · · + 〈v, en〉en. (1.2.2)
Since ‖〈v, ek〉ek‖ = |〈v, ek〉|‖ek‖ = |〈v, ek〉|, the generalized Pythagoras theorem gives
‖v‖² = |〈v, e1〉|² + |〈v, e2〉|² + · · · + |〈v, en〉|², (1.2.3)
if v is in the linear span of the orthonormal system E = {e1, e2, . . . , en}. The last
identity is a general fact about orthonormal systems that should be kept in mind.
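As a small numerical sketch of (1.2.2) and (1.2.3), assuming Python with NumPy (the unitary factor of a QR factorization supplies an orthonormal basis of C³; all data are arbitrary illustrations):

```python
import numpy as np

# Sketch: expand a vector in an orthonormal basis of C^3 and check
# (1.2.2) and (1.2.3).  Q is unitary, so its columns e_k are orthonormal.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# coefficients v_k = <v, e_k> = sum_j v_j * conj((e_k)_j)
coeffs = np.array([np.sum(v * np.conj(Q[:, k])) for k in range(3)])
recon = sum(coeffs[k] * Q[:, k] for k in range(3))            # identity (1.2.2)

print(np.allclose(recon, v))                                  # True
print(np.isclose(np.sum(np.abs(coeffs) ** 2), np.linalg.norm(v) ** 2))  # (1.2.3)
```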
1.3. Next we consider a slightly more general problem: given a vector v in an inner
product space V and a subspace W of V , spanned by a given orthogonal system S =
{w1,w2, . . . ,wr} of nonzero vectors (〈wk,wj〉 = 0 for k ≠ j and 〈wk,wk〉 ≠ 0, where k
and j run between 1 and r), find the so-called orthogonal decomposition of v:
v = w + h, (1.3.1)
where w ∈W and h ⊥W (that is, h is perpendicular to all vectors in W ). The vector w
here will be called the (orthogonal) projection of v onto W . Since w is in W and W is
spanned by w1,w2, . . . ,wr, we can write
w = a1w1 + a2w2 + · · · + arwr. (1.3.2)
We have to find a1, a2, . . ., ar. Identity (1.3.1) can be rewritten as
v = w + h = ∑_{k=1}^r ak wk + h.
Take any vector from w1,w2, . . . ,wr, say wj, and form the inner product of wj with each
side of the above identity. By the linearity of the “first slot” of inner product, we have
〈v,wj〉 = ∑_{k=1}^r ak〈wk,wj〉 + 〈h,wj〉. Note that 〈wk,wj〉 are zeros except when k = j.
Hence ∑_{k=1}^r ak〈wk,wj〉 reduces to aj〈wj,wj〉. On the other hand, 〈h,wj〉 = 0
because h is perpendicular to W and wj is in W . Thus we arrive at 〈v,wj〉 = aj〈wj,wj〉,
or aj = 〈v,wj〉/〈wj,wj〉. Substitute this expression for aj into (1.3.2), switching the index
j to k, to obtain:
w = ∑_{k=1}^r (〈v,wk〉/〈wk,wk〉) wk ≡ (〈v,w1〉/〈w1,w1〉) w1 + (〈v,w2〉/〈w2,w2〉) w2 + · · · + (〈v,wr〉/〈wr,wr〉) wr, (1.3.3)
which is the required projection.
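A minimal sketch of formula (1.3.3), assuming Python with NumPy (real case, with arbitrary illustrative vectors): we project v onto the span of an orthogonal system and check that the remainder is perpendicular to it.

```python
import numpy as np

# Sketch: project v onto W = span{w1, w2} using formula (1.3.3);
# w1 and w2 form an orthogonal (not normalized) system.
w1 = np.array([1.0, -1.0, 0.0])
w2 = np.array([1.0, 1.0, 2.0])          # <w1, w2> = 0
v = np.array([3.0, 1.0, 1.0])

w = sum((v @ wk) / (wk @ wk) * wk for wk in (w1, w2))   # formula (1.3.3)
h = v - w                                               # v = w + h

print(w)                                                # the projection
print(np.isclose(h @ w1, 0.0), np.isclose(h @ w2, 0.0)) # True True: h ⊥ W
```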
Now we consider two special cases: The first case is that S consists of a single (nonzero)
vector, say u. Write down the orthogonal decomposition
v = (〈v,u〉/〈u,u〉) u + h, where h ⊥ u.
The generalized Pythagoras theorem gives ‖v‖² = |〈v,u〉/〈u,u〉|² ‖u‖² + ‖h‖². Using
〈u,u〉 = ‖u‖², we rewrite the first term on the right-hand side as |〈v,u〉|²/‖u‖². Then
we show our generosity by dropping the second term ‖h‖² on the right to obtain the
inequality ‖v‖² ≥ |〈v,u〉|²/‖u‖². We can rearrange this into
|〈v,u〉| ≤ ‖v‖‖u‖, (1.3.4)
which is the celebrated Cauchy–Schwarz inequality.
The second special case is that S consists of an orthonormal system, say S =
{e1, e2, . . . , en}. In this case
w = ∑_{k=1}^n 〈v, ek〉ek with ‖w‖² = ∑_{k=1}^n |〈v, ek〉|².
The orthogonal decomposition v = w + h tells us that ‖v‖² = ‖w‖² + ‖h‖². Dropping
‖h‖², we get ‖v‖² ≥ ‖w‖², or ‖w‖² ≤ ‖v‖². We have arrived at
∑_{k=1}^n |〈v, ek〉|² ≤ ‖v‖². (1.3.5)
Notice that this inequality also holds for an infinite orthonormal system {ek}, k = 1, 2, . . . .
Indeed, for any n, applying this inequality to the finite system {e1, . . . , en}, we get (1.3.5)
above. Letting n → ∞, we obtain
∑_{k=1}^∞ |〈v, ek〉|² ≤ ‖v‖²,
which is usually called Bessel’s inequality.
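A tiny numerical sketch of (1.3.5), assuming Python with NumPy (arbitrary illustrative vectors): two orthonormal vectors in C³ cannot capture more than ‖v‖².

```python
import numpy as np

# Sketch: Bessel's inequality (1.3.5) for a *partial* orthonormal system.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)   # e1, e2 orthonormal
v = np.array([1.0 + 2j, -1.0, 3j])

lhs = sum(abs(np.sum(v * np.conj(e))) ** 2 for e in (e1, e2))
print(lhs, np.linalg.norm(v) ** 2)            # 10.0 <= 15.0
print(lhs <= np.linalg.norm(v) ** 2 + 1e-12)  # True: Bessel's inequality
```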
1.4. In the present section, we give some examples of orthonormal systems.
Example 1.4.1. In Cn, the standard basis consisting of vectors
e1 = (1, 0, . . . , 0, 0), e2 = (0, 1, . . . , 0, 0), . . . , en = (0, 0, . . . , 0, 1)
clearly forms an orthonormal basis.
Example 1.4.2.* Fix a positive integer n and let ω = e^{2πi/n}, which is called a
primitive nth root of unity. Consider the following vectors in Cn:
fk = (1/√n) (1, ω^{k−1}, ω^{2(k−1)}, . . . , ω^{(n−1)(k−1)}); 1 ≤ k ≤ n.
We write down the first three of them to see the general pattern:
f1 = (1, 1, 1, . . . , 1)/√n,
f2 = (1, ω, ω², . . . , ω^{n−1})/√n,
f3 = (1, ω², ω⁴, . . . , ω^{2(n−1)})/√n.
We claim that fk (1 ≤ k ≤ n) form an orthonormal basis in Cn. First we check that
they are unit vectors:
‖fk‖² = (1/n) (1² + |ω^{k−1}|² + |ω^{2(k−1)}|² + · · · + |ω^{(n−1)(k−1)}|²) = 1
in view of |ω| = 1. Next we show that, for k ≠ ℓ, 〈fk, fℓ〉 = 0. For definiteness, let us
assume 1 ≤ ℓ < k ≤ n. By using ω̄ = ω^{−1}, we get
〈fk, fℓ〉 = (1 + ω^k ω̄^ℓ + ω^{2k} ω̄^{2ℓ} + · · · + ω^{(n−1)k} ω̄^{(n−1)ℓ})/n
= (1 + ω^{k−ℓ} + ω^{2(k−ℓ)} + · · · + ω^{(n−1)(k−ℓ)})/n
= (1 + η + η² + · · · + η^{n−1})/n,
where η = ω^{k−ℓ}. Now
(1 − η)(1 + η + η² + · · · + η^{n−1}) = 1 − η^n = 1 − ω^{(k−ℓ)n} = 1 − (ω^n)^{k−ℓ} = 1 − 1 = 0.
Since 0 < k − ℓ < n, η ≡ ω^{k−ℓ} ≠ 1, or 1 − η ≠ 0. Hence 1 + η + η² + · · · + η^{n−1} = 0. Now
〈fk, fℓ〉 = 0 is clear. This example will be referred to in the next section when we discuss
the finite Fourier transform (in Example 2.7.1).
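A minimal numerical sketch of this example, assuming Python with NumPy (the index k is 0-based in the code; n = 5 is an arbitrary choice):

```python
import numpy as np

# Sketch of Example 1.4.2: with omega = e^{2πi/n}, the vectors
# f_k = (1, ω^{k-1}, ..., ω^{(n-1)(k-1)})/√n form an orthonormal basis.
n = 5
omega = np.exp(2j * np.pi / n)
# row k (0-based) holds f_{k+1}; entry j is omega^(j*k) / sqrt(n)
F = np.array([[omega ** (j * k) for j in range(n)] for k in range(n)]) / np.sqrt(n)

gram = F @ F.conj().T                 # Gram matrix: entries <f_k, f_l>
print(np.allclose(gram, np.eye(n)))   # True: an orthonormal basis of C^n
```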
Example 1.4.3*. Consider the space of all periodic functions of period 2π. The
inner product of two such functions f and g is defined to be
〈f, g〉 = (1/(2π)) ∫_0^{2π} f(t) ḡ(t) dt.
We claim that the system e^{int} (−∞ < n < ∞), where n ranges over all integers, is
orthonormal. First we check that each of them is of unit length:
‖e^{int}‖² = (1/(2π)) ∫_0^{2π} |e^{int}|² dt = (1/(2π)) ∫_0^{2π} 1 dt = 1.
Next, for n ≠ m, we have
〈e^{int}, e^{imt}〉 = (1/(2π)) ∫_0^{2π} e^{int} e^{−imt} dt = (1/(2π)) ∫_0^{2π} e^{i(n−m)t} dt
= (1/(2π)) · e^{i(n−m)t}/(i(n − m)) |_0^{2π} = (1/(2πi(n − m))) (1 − 1) = 0.
The orthogonal decomposition of a function f in this space gives its Fourier series:
f(t) = ∑_{−∞<n<∞} cn e^{int}, where cn = (1/(2π)) ∫_0^{2π} f(t) e^{−int} dt.
Bessel’s inequality says
∑_{−∞<n<∞} |cn|² ≤ (1/(2π)) ∫_0^{2π} |f(t)|² dt,
showing that the infinite sum on the left hand side is a finite number. This fact is often
stated as follows: the sequence of Fourier coefficients is square summable.
Example 1.4.4*. Consider the space of all even functions of period 2π. The inner
product of two such functions f and g is defined to be
〈f, g〉 = (1/π) ∫_0^π f(t) ḡ(t) dt.
Then the following functions
1, √2 cos t, √2 cos 2t, √2 cos 3t, . . .
form an orthonormal system of this space. To show this, we need to check
(2/π) ∫_0^π cos mt cos nt dt = δmn = { 1 if m = n; 0 if m ≠ n } (1.4.1)
which is left to the reader as an exercise.
Example 1.4.5*. In this example we introduce the so-called Chebyshev polynomials
Tn(x) and Un(x), which have extensive applications in numerical analysis and some
extremal problems arising from electrical engineering. First we observe, from Euler's identity,
cos nt + i sin nt = e^{int} = (e^{it})^n = (cos t + i sin t)^n. (1.4.2)
We may try to use the binomial formula to expand the right hand side of (1.4.2) and, if we
are patient enough, we can see the following pattern:
e^{int} = Tn(x) + i Un−1(x) sin t with x = cos t, (1.4.3)
where Tn and Un−1 are some polynomials of degrees n and n − 1 respectively.
Alternatively, we can use induction to verify (1.4.3). When n = 1, we simply put T1(x) = x
and U0(x) = 1. Assume the validity for n = k. Then
e^{i(k+1)t} = e^{ikt} e^{it} = (Tk(x) + i Uk−1(x) sin t)(cos t + i sin t)
= Tk(x) cos t − Uk−1(x) sin²t + i (Tk(x) sin t + Uk−1(x) sin t cos t)
= Tk(x) x − Uk−1(x)(1 − x²) + i (Tk(x) + Uk−1(x) x) sin t.
Thus we have e^{i(k+1)t} = T_{k+1}(x) + i Uk(x) sin t, where
T_{k+1}(x) = x Tk(x) − (1 − x²) Uk−1(x), Uk(x) = Tk(x) + x Uk−1(x).
The last two identities tell us how to generate the polynomials Tn(x) and Un(x) recursively.
Comparing the real parts of both sides of (1.4.3), we get cos nt = Tn(x) = Tn(cos t). The
orthogonality relation (1.4.1) in the last example can be rewritten as
(2/π) ∫_0^π Tm(cos t) Tn(cos t) dt = δmn.
Now apply the change of variable x = cos t. Notice that cos 0 = 1, cos π = −1 and
dx = − sin t dt, which gives dt = −dx/sin t = −dx/√(1 − cos²t) = −dx/√(1 − x²); (notice
that, for 0 ≤ t ≤ π, we have sin t ≥ 0). Observe that, when t runs from 0 to π, cos t drops
from 1 to −1. Thus we have
(2/π) ∫_{−1}^{1} Tm(x) Tn(x) dx/√(1 − x²) = δmn.
This shows that if we define the inner product of two polynomial functions f and g by
〈f, g〉 = (2/π) ∫_{−1}^{1} f(x) g(x) dx/√(1 − x²),
then the Chebyshev polynomials Tn(x) (n = 1, 2, 3, . . .) form an orthonormal system.
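A quick numerical sketch of the Chebyshev orthogonality relation, assuming Python with NumPy (using the substitution x = cos t, the weighted integral becomes the plain trigonometric integral checked below; the grid size is an arbitrary choice):

```python
import numpy as np

# Sketch: (2/π)∫_0^π cos(mt)cos(nt) dt = δ_{mn} for m, n ≥ 1, which is the
# Chebyshev relation after the substitution x = cos t.
t = np.linspace(0.0, np.pi, 20001)

def inner(m, n):
    return (2.0 / np.pi) * np.trapz(np.cos(m * t) * np.cos(n * t), t)

print(round(inner(3, 3), 6), round(inner(2, 5), 6))   # ≈ 1.0 and 0.0
```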
1.5. Given a list of linearly independent vectors v1, v2, . . . , vn in an inner product
space V , there is a procedure of constructing an orthonormal system e1, e2, . . . , en,
called the Gram-Schmidt process, with the property that
span {v1,v2, . . . ,vk} = span {e1, e2, . . . , ek}
for each k = 1, 2, . . . , n. To make things easier, let us describe how to construct an
orthogonal basis b1,b2, . . . ,bn with a similar property. After this we normalize b’s to
get e’s — a simple finishing touch.
We construct b’s in n steps: the kth step is the one to obtain bk, (1 ≤ k ≤ n). The
first step is the easiest one: just take v1 to be b1. Now suppose that the (k − 1)th step
has been accomplished: we have obtained an orthogonal system {b1,b2, . . . ,bk−1} which
spans the same subspace as {v1,v2, . . . ,vk−1} does, say Wk−1. Consider the vectors
b1, b2, . . . , bk−1, vk, vk+1, . . . , vn.
Let wk be the projection of vk onto the subspace Wk−1, which is given by
wk = (〈vk,b1〉/〈b1,b1〉) b1 + (〈vk,b2〉/〈b2,b2〉) b2 + · · · + (〈vk,bk−1〉/〈bk−1,bk−1〉) bk−1
according to (1.3.3). Now let bk = vk −wk. Then bk ⊥Wk−1, and hence b1,b2, . . . ,bk
form an orthogonal system. Also, from the fact that wk is in Wk−1, we can see that
the set {b1,b2, . . . ,bk} spans the same subspace as {v1,v2, . . . ,vk} does; (this subspace
should be denoted by Wk.) As we have mentioned before, once we get the orthogonal basis
b1,b2, . . . ,bn, the required orthonormal basis e1, e2, . . . , en can be obtained immediately
by normalization:
e1 = b1/‖b1‖, e2 = b2/‖b2‖, . . . , en = bn/‖bn‖.
The Gram–Schmidt process is more or less a way of taking a given bunch of vectors and,
one by one, progressively “straightening them up”. Each time, you turn a vector to make it
orthogonal to all the previous vectors which have already been “straightened up”. In this way
v1,v2, . . . ,vn is gradually replaced by b1,b2, . . . ,bn, one vector at a time.
Example 1.5.1. Apply the Gram–Schmidt process to the basis consisting of v1 =
(1, 1, 1), v2 = (2, 0, 1) and v3 = (0, 0, 3) in C3 to obtain an orthonormal basis.
Solution. Let b1 = v1 = (1, 1, 1),
b2 = v2 − (〈v2,b1〉/〈b1,b1〉) b1 = (2, 0, 1) − (3/3)(1, 1, 1) = (1, −1, 0),
b3 = v3 − (〈v3,b1〉/〈b1,b1〉) b1 − (〈v3,b2〉/〈b2,b2〉) b2 = (0, 0, 3) − (3/3)(1, 1, 1) − (0/2)(1, −1, 0) = (−1, −1, 2).
Upon normalization, we obtain the following orthonormal basis:
e1 = (1/√3, 1/√3, 1/√3), e2 = (1/√2, −1/√2, 0), e3 = (−1/√6, −1/√6, 2/√6).
We can use the Gram–Schmidt process to prove that every finite dimensional inner product space
has an orthonormal basis. Indeed, if V is a finite dimensional inner product space, either over R or over
C, we can take any basis in V and apply the Gram–Schmidt process to this basis to obtain an
orthonormal basis of V .
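A minimal sketch of the process just described, assuming Python with NumPy and the inner-product convention of §1.1 (so 〈v, b〉 is np.vdot(b, v)); as a usage example it reproduces Example 1.5.1:

```python
import numpy as np

# Sketch of the Gram–Schmidt process: produce the orthogonal b_k first,
# then normalize to get the e_k.
def gram_schmidt(vectors):
    bs = []                                  # the orthogonal vectors b_k
    for v in vectors:
        # w_k: projection of v_k onto span{b_1, ..., b_{k-1}}, as in (1.3.3)
        w = sum((np.vdot(b, v) / np.vdot(b, b)) * b for b in bs)
        bs.append(v - w)                     # b_k = v_k - w_k
    return [b / np.linalg.norm(b) for b in bs]   # e_k = b_k / ||b_k||

# Reproduce Example 1.5.1:
e = gram_schmidt([np.array([1.0, 1, 1]), np.array([2.0, 0, 1]), np.array([0.0, 0, 3])])
print(np.round(e, 4))   # rows ≈ (1,1,1)/√3, (1,-1,0)/√2, (-1,-1,2)/√6
```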
EXERCISE SET IV.1.
Review Questions. What is the main difference between a complex and a real inner
product space? What is the orthogonal projection onto a subspace? How do we compute it
when an orthogonal basis of this subspace is given? What is the Gram–Schmidt process?
What is it good for?
Drills
1. In each of the following cases, find the inner product 〈u,v〉 and the norms |u| and |v| of vectors u and v in C3 (with the standard inner product).
(a) u = (1, i, 2), v = (−2, i, 1).
(b) u = (i, 2,−2), v = (2, 2i, i).
(c) u = (1 + √3 i, 1, 1 − √3 i), v = (1 − √3 i, 1 + √3 i, 1).
(d) u = (i cos α, sin α, cos α + i sin α), v = (cos β, i sin β, sin β + i cos β).
2. In each of the following cases, find the orthogonal projection of a vector u in an inner
product space to the 1-dimensional subspace spanned by v.
(a) V = R2; u = (1/√2, −1/√2) and v = (1, 0).
(b) V = R3; u = (2, 1, 1) and v = (1, 2, 3).
(c) V = C2; u = (1 + i, 1 − i) and v = (1, i).
(d) V = C3; u = (2, 1, 3) and v = (1, 1, i).
(e) V is the space of continuous functions on [0, 1] with the inner product 〈f, g〉 =
∫_0^1 f(x) ḡ(x) dx; u is the function f(x) ≡ 1 and v is g(x) = e^{iπx}. (Hint: Use the
identities (e^z)̄ = e^{z̄} and ∫ e^{ax} dx = (1/a) e^{ax} + C, where a ≠ 0.)
(f) Same V as in (e); u is the function h(x) = e^{(2πi)x} and v is k(x) = e^{(4πi)x}.
3. In each of the following cases, find the projection of a vector u in an inner product
space V onto the subspace spanned by v1,v2.
(a) V = R2; u = (your age, your weight), v1 = (1, 0) and v2 = (1, 1).
(b) V = R3; u = (3, 1, 1), v1 = (1,−1, 0) and v2 = (1, 1, 2). (Notice that v1 ⊥ v2.)
(c) V = R3; u = (2, 3, 1), v1 = (1, 2, 0) and v2 = (4, 7, 0). (Hint: Determine the
subspace spanned by v1 and v2 first.)
(d) V = C3; u = (3i, 1,−1), v1 = (1, i, i) and v2 = (2i, 1, 1). (Notice that v1 ⊥ v2)
4. True or false (u and v are vectors in a complex inner product space V , M and N are
subspaces of V and z is a complex number; all of them are arbitrary):
(a) 〈u, zv〉 = 〈zu,v〉.
(b) 〈zu, zv〉 = |z|²〈u,v〉.
(c) 〈u,v〉〈v,u〉 = |〈u,v〉|².
(d) If the identity 〈u,v〉 = 〈v,u〉 holds, then 〈u,v〉 must be a real number.
(e) If u1 ⊥ v1 and u2 ⊥ v2, then u1 + u2 is orthogonal to v1 + v2.
(f) If u1 ⊥ v and u2 ⊥ v, then u1 + u2 is orthogonal to v.
(g) If u ⊥ v, then v ⊥ u.
(h) If u ⊥ v and v ⊥ w, then u ⊥ w.
(i) If the orthogonal projections of u and v on the subspace M (of an inner product
space V ) are the same, then u− v is orthogonal to M .
(j) The Gram-Schmidt orthogonalization process is a process to construct an in-
ner product on a vector space such that a given basis in this space becomes an
orthonormal basis.
5. In each of the following parts, apply the Gram-Schmidt orthogonalization process to
the given linearly independent set of vectors (in the given order) in C4.
(a) (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1).
(b) (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 0).
(c) (1, 1, 1, 1), (1, 0, 1, 0), (0, 0, 1, 1), (0, 0, 0, 1).
(d) (1, i, i, i), (1, i, i, 0), (1, i, 0, 0), (1, 0, 0, 0).
(e) (0, 0, 2, 0), (1, 0, 4, 0), (5, 2, 0, 1).
Exercises
1. Let V be a complex inner product space and denote by VR the real vector space
obtained from V by restricting scalars to R.
(a) Show that the recipe 〈u,v〉_R = Re〈u,v〉 defines an inner product for the real
space VR. (Re z and Im z stand for the real part and the imaginary part, respectively, of a complex number z.)
(b) Show that the recipe 〈u,v〉_I = Im〈u,v〉 does not give an inner product for VR.
(c) Check the identity 〈u,v〉 = 〈u,v〉_R + i〈u, iv〉_R.
2. Let E = {e1, e2, . . . , en} be an orthonormal basis of a complex inner product space V .
Show that, if [v]E = (v1, v2, . . . , vn) and [w]E = (w1, w2, . . . , wn) for v,w ∈ V , then
〈v,w〉 = v1w̄1 + v2w̄2 + · · · + vnw̄n.
3. Let u and v be two vectors in a complex inner product space.
(a) Show that |u + v|² = |u|² + |v|² + 2 Re〈u,v〉.
(b) From the above identity derive that 4 Re〈u,v〉 = |u + v|² − |u − v|².
(c) Show that the imaginary part of 〈u,v〉 is Re〈u, iv〉. Use this fact and (b) to
derive the following polarization identity for complex inner product spaces:
〈u,v〉 = (1/4)(|u + v|² − |u − v|² + i|u + iv|² − i|u − iv|²).
(A neat way to rewrite this is 〈u,v〉 = (1/4) ∑_{k=0}^3 i^k |u + i^k v|².)
4. Let v1 and v2 be linearly independent vectors in a real inner product space V . Show
that the area A of the parallelogram stretched by v1 and v2 is equal to the square
root of
∣ 〈v1,v1〉 〈v1,v2〉 ∣
∣ 〈v2,v1〉 〈v2,v2〉 ∣ .
(You also have to explain why the above determinant cannot be negative so that we
can take its square root.) Hint: Write v2 = w + h, where w is the projection of v2
onto v1. Then A2 = |v1|2|h|2.
§2. Operators on Inner Product Spaces
2.1. In this section we consider operators on a finite dimensional inner product space
over either R or C. Because of an extra structure on the vector space, namely, the inner
product, these operators have a new feature called adjoint, which behaves like the complex
conjugation for complex numbers. For notational convenience, we limit our discussion to
operators on inner product spaces, although all material from §2.1 to §2.3 is applicable
to linear mappings between inner product spaces.
Let T be a linear operator on a (finite dimensional) inner product space V , either real
or complex. The adjoint of T , denoted by T ∗, is the linear operator on the same space V
such that the identity
〈Tx,y〉 = 〈x, T ∗y〉 (2.1.1)
holds for all vectors x and y in V . At the outset it is not clear whether such T ∗ exists, and
if it does exist, it is not clear if it is uniquely determined by T . We have to establish the
existence and uniqueness of the operator T ∗ which satisfies (2.1.1) in order to justify this
definition. Aside: The definition of adjoint here is unusual. Instead of telling us exactly
what T ∗ is, it singles out the most desirable property of T ∗. We may call it a “priority
definition”, because this desirable property here has priority over anything else. “Priority
definitions” are not rare in mathematics. To justify the above “priority definition” for the
adjoint of an operator, we must prove the following statement:
(*) For every linear operator T on V , there is a unique linear operator S on V such
that 〈Tx,y〉 = 〈x, Sy〉 for all x,y ∈ V .
Once this statement is proven, we can define T ∗ to be the unique operator S described in
this statement. The “uniqueness” part is easier to prove. Assume that both S1 and S2 have
the same property as S as described, namely 〈Tx,y〉 = 〈x, S1y〉 and 〈Tx,y〉 = 〈x, S2y〉 for all x and y in V . Let R = S1 − S2. Then 〈x, Ry〉 = 〈x, S1y〉 − 〈x, S2y〉 = 0 for arbitrary
x,y in V . Thus, for every y in V , Ry is orthogonal to all vectors in V and hence Ry = 0.
Therefore R = O, or S1 = S2.
The proof of the existence part of (∗) is based on the following lemma, which is a
“baby version” of a famous theorem called the Riesz representation theorem.
Lemma 2.1.1. If φ is a linear functional on V (that is, φ is in V ′), then there exists
a unique vector a in V such that φ(x) = 〈x, a〉 for all x in V .
Take an orthonormal basis E = {e1, e2, . . . , en} in V . (The last remark in the last section
guarantees its existence.) Then, for each vector x in V , we have x = ∑_{k=1}^n 〈x, ek〉ek (see
identity (1.2.2) in the last section) and hence
φ(x) = φ(∑_{k=1}^n 〈x, ek〉ek) = ∑_{k=1}^n 〈x, ek〉φ(ek) = ∑_{k=1}^n 〈x, φ(ek)̄ ek〉 = 〈x, ∑_{k=1}^n φ(ek)̄ ek〉.
Hence φ(x) = 〈x,a〉, where a = ∑_{k=1}^n φ(ek)̄ ek. The uniqueness of a is left for you to check
as an exercise.
Now we return to the proof of the existence part of (∗). Take any y in V and consider
the linear functional φy defined by putting φy(x) = 〈Tx,y〉. By the above lemma we know
that there exists a unique vector determined by y, say Sy, such that φy(x) = 〈x, Sy〉. Thus
we have 〈Tx,y〉 = 〈x, Sy〉 for all x in V . The linearity of S is left for you to check as an
exercise. Thus S is the required operator T ∗.
We consider the matrix representation [T ] E of an operator T with an orthonormal
basis E = {e1, e2, . . . , en} in V . The first column of [T ]E is filled with the coordinates of
Te1. Since E is an orthonormal basis, we have (see (1.2.2) in the last section)
Te1 = 〈Te1, e1〉e1 + 〈Te1, e2〉e2 + · · · + 〈Te1, en〉en.
Hence the first column of [T ] E is filled with 〈Te1, e1〉, 〈Te1, e2〉, etc. The same method
allows us to figure out other columns. We arrive at:
[T]E =
[ 〈Te1, e1〉 〈Te2, e1〉 · · · 〈Ten, e1〉 ]
[ 〈Te1, e2〉 〈Te2, e2〉 · · · 〈Ten, e2〉 ]
[    ...        ...             ...   ]
[ 〈Te1, en〉 〈Te2, en〉 · · · 〈Ten, en〉 ] .
The (j, k)-entry of [T ]E , denoted by tjk, is given by
tjk = 〈Tek, ej〉.
Reversing the order of j, k looks awkward, but things turn out that way and we cannot
help it. Now the (k, j)–entry of [T ∗]E , denoted by t∗kj , is given as follows:
t∗kj ≡ 〈T∗ej, ek〉 = 〈ek, T∗ej〉̄ = 〈Tek, ej〉̄ = t̄jk.
Thus the matrix of T ∗ relative to E is the conjugate transpose of the matrix of T relative
to E. We also call the conjugate transpose of a matrix A the adjoint of A and denote it
by A∗. Thus A∗ = Ā⊤. We have shown that “the matrix of the adjoint of T is the adjoint
of the matrix of T” relative to any orthonormal basis E :
[T ∗] E = [T ]∗E (2.1.2)
We give some quick examples of adjoints of matrices as follows:
[ 2i  1 − i ]∗   [ −2i    −2i   ]        [ z  w ]∗   [ z̄  w̄ ]
[ 2i  1 + i ]  = [ 1 + i  1 − i ] ,      [ w  z ]  = [ w̄  z̄ ] .
Example 2.1.1. Let E = {e1, e2, . . . , en} be an orthonormal basis of an inner product
space. By the forward shift relative to this basis we mean the operator S satisfying
Se1 = e2, Se2 = e3, . . . , Sen = 0.
What is its adjoint S∗ ? Well, the representing matrix of S relative to E is
[S]E =
[ 0 0 0 · · · 0 0 0 ]
[ 1 0 0 · · · 0 0 0 ]
[ 0 1 0 · · · 0 0 0 ]
[ 0 0 1 · · · 0 0 0 ]
[        ...        ]
[ 0 0 0 · · · 1 0 0 ]
[ 0 0 0 · · · 0 1 0 ]
with
[S]∗E =
[ 0 1 0 · · · 0 0 0 ]
[ 0 0 1 · · · 0 0 0 ]
[        ...        ]
[ 0 0 0 · · · 0 1 0 ]
[ 0 0 0 · · · 0 0 1 ]
[ 0 0 0 · · · 0 0 0 ] .
From [S∗] E = [S]∗E we know
S∗e1 = 0, S∗e2 = e1, S∗e3 = e2, . . . , S∗en = en−1.
Naturally, S∗ is called the backward shift relative to E.
Example 2.1.2. An operator D on V is diagonal relative to the orthonormal basis
E = {e1, . . . , en} if there are scalars λ1, λ2, . . . , λn such that Dek = λkek for all k =
1, 2, . . . , n. In this case the representing matrix of D relative to E is a diagonal matrix
[D]E =
[ λ1 0 . . . 0 ]
[ 0 λ2 . . . 0 ]
[      ...     ]
[ 0 0 . . . λn ]
with
[D]∗E =
[ λ̄1 0 . . . 0 ]
[ 0 λ̄2 . . . 0 ]
[      ...     ]
[ 0 0 . . . λ̄n ] .
The representing matrix of the adjoint D∗ is also a diagonal matrix, obtained by replacing
each entry on the main diagonal by its complex conjugate. Therefore D∗ is also a diagonal
operator relative to the basis E with D∗ek = λ̄kek for k = 1, 2, . . . , n. Notice that, in case
the scalars λ1, λ2, . . . , λn are real numbers, D and D∗ have the same representation matrix
relative to E and hence D = D∗. In this case D is called a Hermitian operator or a
self–adjoint operator.
2.2. The following elementary properties about adjoints should be kept in mind:
(S + T)∗ = S∗ + T∗, (αS)∗ = ᾱS∗, (ST)∗ = T∗S∗,
T∗∗ = T, O∗ = O, I∗ = I, (2.2.1)
where S and T are operators on a (finite dimensional) inner product space, and α is an
arbitrary scalar. One way to prove these identities is by definition. For example, to show
(ST )∗ = T ∗S∗, we only have to check the identity 〈STx,y〉 = 〈x, T ∗S∗y〉. This is an easy
thing to do, provided you understand the definition of adjoint:
〈x, T ∗S∗y〉 = 〈x, T ∗(S∗y)〉 = 〈Tx, S∗y〉 = 〈STx,y〉.
In the special case V = Cn or Rn with the standard inner product, every linear operator
on V is induced by a matrix, i.e. all linear operators are of the form MA for some n × n
matrix A. The verification of the following proposition is left to you as an exercise.
Proposition 2.2.1. If T is induced by A, then T∗ is induced by A∗, i.e. (MA)∗ = MA∗.
(The adjoint of the induced operator is the induced operator of the adjoint.)
We have seen some advantages of studying matrices by investigating the operators induced
by them. We will discover that the above simple fact is very handy in treating matrix
problems by this approach.
2.3. Let M be a subspace of an inner product space V over C or R. The orthogonal
complement of M , denoted by M⊥, is the set of vectors in V perpendicular to all vectors
in M . Thus v is in M⊥ if v ⊥ M , that is, v ⊥ x for all vectors x in M . Using set–
theoretical notation, we can write
M⊥ = {v ∈ V : 〈v,x〉 = 0 for all x ∈M}.
If x is in both M and M⊥, then we have 〈x,x〉 = 0 and hence x = 0. Thus M∩M⊥ = {0}.
On the other hand, for any vector v in V , we have the orthogonal decomposition v = w+h
with w ∈ M and h ∈ M⊥; see §1.3 of the last section. This shows V = M + M⊥. It
follows from Theorem 2.3.2 in Chapter II that
dimV = dimM + dimM⊥. (2.3.1)
It is clear from the definition of orthogonal complement that M is contained in M⊥⊥. On
the other hand, the above identity tells us that M and M⊥⊥ have the same dimension.
Hence M⊥⊥ = M .
For T ∈ L (V ), where V is an inner product space over C or R, we have
Theorem 2.3.1. The kernel of T ∗ is the orthogonal complement of the range of T :
kerT ∗ = T (V )⊥.
Remark: Since T ∗∗ = T and W⊥⊥ = W for a subspace W of V , we can deduce from this
theorem that kerT = T ∗(V )⊥, T (V ) = (kerT ∗)⊥, and T ∗(V ) = (kerT )⊥.
The proof of this important theorem is short and neat:
v ∈ T(V)⊥ ⇔ 〈v, Tx〉 = 0 for all x ∈ V
⇔ 〈T∗v, x〉 = 0 for all x ∈ V
⇔ T∗v = 0
⇔ v ∈ kerT∗.
The proof is complete.
We give two interesting applications of the above theorem. In the first application
we use it to prove “row rank = column rank”. First we do this for a real, square matrix,
say A of size n × n. Consider the operator T on Rn induced by A, i.e. T = MA. The
column rank of A is the dimension of the subspace spanned by column vectors of A and
this subspace is just T (V ). By the above theorem,
dim kerT ∗ = dimT (V )⊥ = n− dimT (V ). (2.3.2)
The last identity follows from (2.3.1). On the other hand,
dim kerT ∗ + dimT ∗(V ) = n. (2.3.3)
From (2.3.2) and (2.3.3) we obtain dimT (V ) = dimT ∗(V ). However T ∗ is the operator
induced by the matrix A∗, which is the transpose A⊤ of A (because A is real). So the
last identity tells us that the column ranks of A and A⊤ are the same. But the column
rank of A⊤ is just the row rank of A! So the statement is proven for a real square matrix.
What can we do if A is not real? In this case we work with Cn instead of Rn. The same
argument allows us to conclude A and A∗ have the same column rank. But here we have a
small trouble: A∗ is the conjugate transpose of A, instead of the transpose A⊤. However,
observe that the column rank of a matrix does not change if we replace all entries by their
complex conjugates. So, this “small trouble” is in fact not a trouble. What can we do if
the given matrix is not a square matrix? In this case we “augment” this matrix with more
rows or columns of zeros to convert it into a square matrix. The row rank and the column
rank of the enlarged matrix clearly remain the same. Thus we have proved “row rank =
column rank” in its full generality.
The next application is about the least square approximation. Let T be a linear
operator on a finite dimensional inner product space V and let b be a vector not in its
range T (V ). Consider the following “ill–posed problem”: solve Tx = b. Since b is
not in T (V ), this equation has no solution. The best we can do is to find some x so that
the difference between Tx and b is minimized. So now we are asked to find a vector x0
at which ‖Tx − b‖ is minimized. The minimization requirement tells us that y0 = Tx0
is, among all vectors in the subspace T (V ), the one nearest to b. So y0 − b must be
perpendicular to the subspace T (V ). Theorem 2.3.1 tells us that y0 −b is in the kernel of
T ∗, that is T ∗(y0 − b) = 0, or T ∗(Tx0 − b) = 0, that is
T ∗Tx0 = T ∗b. (2.3.4)
The argument here can be reversed: if (2.3.4) holds, then y0 − b ⊥ T (V ). We have proved
that the least square solutions to Tx = b are the same as the solutions to T∗Tx = T∗b.
Example 2.3.2. Find the least square solution(s) to x1 − 2x2 = 1, −x1 + 2x2 = 3.
Solution. Write the system of equations as Ax = b with
A = [  1 −2 ] ,  x = [ x1 ] ,  b = [ 1 ]
    [ −1  2 ]        [ x2 ]        [ 3 ] .
It is easy to see that this is an ill–posed problem and hence we should look for the least
square solution(s) by solving A∗Ax = A∗b. Now
A∗A = [  1 −1 ] [  1 −2 ] = [  2 −4 ] ,   A∗b = [  1 −1 ] [ 1 ] = [ −2 ]
      [ −2  2 ] [ −1  2 ]   [ −4  8 ]          [ −2  2 ] [ 3 ]   [  4 ] .
Thus A∗Ax = A∗b becomes 2x1 − 4x2 = −2, −4x1 + 8x2 = 4, giving us x1 − 2x2 = −1.
Introducing the parameter t = x2, we can write down the solutions as x1 = 2t−1, x2 = t.
Example 2.3.3. Find the least square solution(s) to ix1 + x2 = 2, x1 + ix2 = −2,
x1 + x2 = 4.
Solution. Write the system of equations as Ax = b with
A = [ i 1 ] ,  x = [ x1 ] ,  b = [  2 ]
    [ 1 i ]        [ x2 ]        [ −2 ]
    [ 1 1 ]                      [  4 ] .
We look for the least square solution(s) by solving A∗Ax = A∗b. Now
A∗A = [ −i  1 1 ] [ i 1 ] = [ 3 1 ] ,   A∗b = [ −i  1 1 ] [  2 ] = [ 2 − 2i ]
      [  1 −i 1 ] [ 1 i ]   [ 1 3 ]          [  1 −i 1 ] [ −2 ]   [ 6 + 2i ] .
                  [ 1 1 ]                                [  4 ]
Thus A∗Ax = A∗b becomes 3x1 + x2 = 2 − 2i, x1 + 3x2 = 6 + 2i, giving us x1 = −i and
x2 = 2 + i.
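A minimal numerical sketch, assuming Python with NumPy: solving the normal equations A∗Ax = A∗b reproduces the solution of Example 2.3.3 just computed, and agrees with NumPy's built-in least-squares routine.

```python
import numpy as np

# Sketch: least squares via the normal equations (2.3.4), A*A x = A* b,
# for the system of Example 2.3.3.
A = np.array([[1j, 1.0],
              [1.0, 1j],
              [1.0, 1.0]])
b = np.array([2.0, -2.0, 4.0])

x = np.linalg.solve(A.conj().T @ A, A.conj().T @ b)
print(np.round(x, 10))            # [-1j, 2+1j]: matches Example 2.3.3
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```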
2.4. An operator H on a complex inner product space V is called a self-adjoint operator,
or a Hermitian operator, if H = H∗. In the same way, we call a square matrix A a
self-adjoint matrix or a Hermitian matrix if A = A∗, i.e. if A is equal to its conjugate
transpose. Thus, a 2 × 2 Hermitian matrix must have the form
[ c      a + bi ]   ( such as [ 4      2 + 3i ] ) ,
[ a − bi d      ]             [ 2 − 3i 7      ]
where a, b, c, d are real numbers.
Notice that a complex number z is real if and only if z = z̄, which is the one-
dimensional version of the identity T = T∗. Hence the situation of Hermitian operators
among other operators resembles that of real numbers among complex numbers.
Example 2.4.1. Verify that eigenvalues of Hermitian operators are real.
Solution. Let T be a Hermitian operator on V and let λ be an eigenvalue for T . Then
there is a nonzero vector v in V such that Tv = λv. So 〈Tv,v〉 = 〈λv,v〉 = λ〈v,v〉. On
the other hand,
〈Tv,v〉 = 〈v, T∗v〉 = 〈v, Tv〉 = 〈v, λv〉 = λ̄〈v,v〉.
Hence λ〈v,v〉 = λ̄〈v,v〉. As v ≠ 0, we have 〈v,v〉 = ‖v‖² ≠ 0, and hence 〈v,v〉 in the
last identity can be canceled. Thus λ = λ̄. Therefore λ is real.
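A quick numerical illustration of Example 2.4.1, assuming Python with NumPy, using the 2 × 2 Hermitian matrix displayed in §2.4 above:

```python
import numpy as np

# Sketch: the eigenvalues of a Hermitian matrix are real.
H = np.array([[4.0, 2 + 3j],
              [2 - 3j, 7.0]])
print(np.allclose(H, H.conj().T))   # True: H is Hermitian
print(np.linalg.eigvalsh(H))        # real eigenvalues ≈ [1.595, 9.405]
```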
We consider a similar concept for the real case. An operator T on a real inner product
space V satisfying T = T ∗ is called a symmetric operator. A real square matrix A is called
a symmetric matrix if A = A⊤. The representation matrix of a symmetric operator relative
to an orthonormal basis is symmetric. A real symmetric matrix is clearly Hermitian and
hence its eigenvalues are real. We can translate this statement into an assertion about
symmetric operators on a real inner product space:
Proposition 2.4.1. If T is a symmetric operator on a real, finite dimensional, inner
product space, then T has a real eigenvalue λ and consequently ker(T − λI) ≠ {0}.
2.5. An operator T defined on a (finite dimensional) real inner product space V is an
orthogonal operator if it preserves the inner product of V , i.e.
〈Tx, Ty〉 = 〈x,y〉 for all x and y in V. (2.5.1)
We can rewrite the above identity as 〈x, T ∗Ty〉 = 〈x,y〉. This gives 〈x, (T ∗T − I)y〉 = 0
for all x and y in V . We deduce that T ∗T − I = O, or T ∗T = I, i.e. T is invertible
and its inverse is T ∗. By reversing the above argument, we can show that, conversely, if
T−1 = T ∗, then T is orthogonal. We conclude: A linear operator T on a real inner product
space is orthogonal if and only if T ∗T = TT ∗ = I.
An orthogonal matrix is a real square matrix A satisfying AA⊤ = A⊤A = I.
The representing matrix of an orthogonal operator relative to an orthonormal basis is an
orthogonal matrix. (Verify this statement!)
By letting y = x in identity (2.5.1), we have 〈Tx, Tx〉 = 〈x,x〉, or ‖Tx‖² = ‖x‖².
Hence we have ‖Tx‖ = ‖x‖ for all x ∈ V . In other words, an orthogonal operator preserves
the norm. It turns out that the converse of this statement is also true:
Proposition 2.5.1. A norm-preserving linear operator is an orthogonal operator.
To prove this fact, we have to express the inner product of two vectors in terms of the
norms of certain linear combinations of them, called the polarization identity:
4〈x,y〉 = ‖x + y‖² − ‖x − y‖². (2.5.2)
To prove (2.5.2), we begin with the elementary identity ‖v‖² = 〈v,v〉, which holds for all
vectors v in an inner product space. Applying this identity for v = x + y, we have
‖x + y‖² = 〈x + y, x + y〉 = 〈x,x〉 + 〈x,y〉 + 〈y,x〉 + 〈y,y〉 = ‖x‖² + 2〈x,y〉 + ‖y‖². (2.5.3)
Letting v = x− y instead, we will get a similar result:
‖x − y‖² = ‖x‖² − 2〈x,y〉 + ‖y‖². (2.5.4)
Now you can see that the polarization identity (2.5.2) is obtained by subtracting (2.5.4)
from (2.5.3) and a simple rearrangement of sides.
From the polarization identity (2.5.2) we can deduce Proposition 2.5.1 stated above.
Indeed, if T is a linear operator on V satisfying ‖T(x)‖ = ‖x‖ for all x ∈ V , then
4〈Tx, Ty〉 = ‖Tx + Ty‖² − ‖Tx − Ty‖²
= ‖T(x + y)‖² − ‖T(x − y)‖²
= ‖x + y‖² − ‖x − y‖² = 4〈x,y〉.
Canceling 4, we get the required identity which characterizes orthogonal operators.
Let σ be a permutation of the set {1, 2, . . . , n}. Then Tσ on Rn defined by
Tσ(x1, x2, . . . , xn) = (xσ(1), xσ(2), . . . , xσ(n))
is an orthogonal operator, because the sum of squares of x1, x2, . . . , xn remains the same if
their order is changed. For example, if σ sends 1, 2, 3, 4, 5 to 4, 1, 5, 2, 3 respectively,
then Tσ(x1, x2, x3, x4, x5) = (x4, x1, x5, x2, x3). Another example of orthogonal operators
is the operator MA on R2 (with the standard inner product) induced by
A = [ cos θ −sin θ ]
    [ sin θ  cos θ ] ,
where θ is a fixed real number. You can check directly that A is an orthogonal matrix.
From this you may conclude that MA is an orthogonal operator.
Example 2.5.2. By an orthogonal projection we mean a self–adjoint projection.
Thus, if P is an orthogonal projection, then P 2 = P and P ∗ = P . Verify the following
assertion: if P is an orthogonal projection, then I − 2P is an orthogonal operator.
Solution. Since P is an orthogonal projection, we have P 2 = P and P ∗ = P . So
(I − 2P)∗(I − 2P) = (I − 2P)(I − 2P) = I − 2P − 2P + 4P² = I − 4P + 4P = I.
Similarly we have (I − 2P )(I − 2P )∗ = I. Hence I − 2P is an orthogonal operator.
2.6. Let A be an n × n real matrix. Denote by E = {e1, e2, . . . , en} the standard basis
of Rn and let T = MA be the operator on Rn induced by A. Then, as we know
very well by now, vj ≡ Tej is the jth column of A, for each j. If A is an orthogonal matrix,
then T is an orthogonal operator and hence v1,v2, . . . ,vn, which are the images of vectors
in the standard orthonormal basis under the operator T , also form an orthonormal basis.
(Notice that, since T preserves inner products, it sends an orthonormal basis to another.)
Conversely, suppose that the columns v1,v2, . . . ,vn of A form an orthonormal basis of
Rn, that is, 〈vj,vk〉 = δjk. (Recall that the Kronecker delta δjk stands for 1 if j = k and
0 if j ≠ k.) Then, for vectors x = (x1, . . . , xn) = ∑_k xk ek and y = (y1, . . . , yn) = ∑_j yj ej
in Rn, we have Tx = ∑_k xk Tek = ∑_k xk vk and similarly Ty = ∑_j yj vj, and hence
〈Tx, Ty〉 = ∑_{k,j} xk yj 〈vk,vj〉 = ∑_{k,j} xk yj δkj = ∑_k xk yk = 〈x,y〉.
This says that T is an orthogonal operator. Hence A is an orthogonal matrix. We conclude:
Proposition 2.6.1. A real n × n matrix is an orthogonal matrix if and only if its
columns form an orthonormal basis of Rn.
For example, we observe that (−1/3, 2/3, 2/3), (2/3,−1/3, 2/3), (2/3, 2/3,−1/3) form an
orthonormal basis of R3; (this can be checked directly). Therefore the 3 × 3 matrix
A = [ −1/3  2/3  2/3 ]
    [  2/3 −1/3  2/3 ]
    [  2/3  2/3 −1/3 ]
is orthogonal. The induced operator MA given by
MA(x1, x2, x3) = (1/3)(−x1 + 2x2 + 2x3, 2x1 − x2 + 2x3, 2x1 + 2x2 − x3)
is an orthogonal operator on R3.
Example 2.6.2. Notice that the matrix
H1 = (1/√2) [  1 1 ]   with columns v1 = [  1/√2 ] , v2 = [ 1/√2 ]
            [ −1 1 ]                     [ −1/√2 ]        [ 1/√2 ]
is an orthogonal matrix, since we can check that its columns v1, v2 form an orthonormal
basis in R2. Now we describe a process to define the Hadamard matrix Hn. Let
A = [ a11 a12 ]
    [ a21 a22 ]
be a 2 × 2 matrix and let B be an n × n matrix. We define their tensor product A ⊗ B
to be the 2n × 2n matrix given by
A ⊗ B = [ a11B a12B ]
        [ a21B a22B ] .
We have the following basic identities about tensor products of matrices:
aA ⊗ bB = ab(A ⊗ B), (A ⊗ B)∗ = A∗ ⊗ B∗, (A ⊗ B)(C ⊗ D) = AC ⊗ BD. (2.6.1)
A consequence of these identities is: if A and B are orthogonal (or unitary), then so is
A⊗B. For example
H2 ≡ H1 ⊗ H1 ≡ (1/√2) [  H1 H1 ] = (1/2) [  1  1  1  1 ]
                      [ −H1 H1 ]         [ −1  1 −1  1 ]
                                         [ −1 −1  1  1 ]
                                         [  1 −1 −1  1 ] .
We can define Hn inductively by putting
Hn = H1 ⊗ Hn−1 = (1/√2) [  Hn−1 Hn−1 ]
                        [ −Hn−1 Hn−1 ] ,
which is a 2^n × 2^n orthogonal matrix, called the Hadamard matrix. We remark that
tensoring is an important operation used in many areas, such as quantum information and
quantum computation.
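A minimal sketch of this construction, assuming Python with NumPy, where np.kron is the Kronecker (tensor) product defined above:

```python
import numpy as np

# Sketch: build H3 of Example 2.6.2 with the tensor product and check
# that the result is still orthogonal.
H1 = np.array([[1.0, 1.0],
               [-1.0, 1.0]]) / np.sqrt(2)

H = H1
for _ in range(2):             # H1 -> H2 -> H3
    H = np.kron(H1, H)         # H_n = H1 ⊗ H_{n-1}

print(H.shape)                           # (8, 8): a 2^3 x 2^3 matrix
print(np.allclose(H @ H.T, np.eye(8)))   # True: H3 is orthogonal
```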
Notice that the transpose A⊤ of an orthogonal matrix A is also an orthogonal matrix.
In fact, from AA⊤ = A⊤A = I we immediately get (A⊤)⊤A⊤ = A⊤(A⊤)⊤ = I. Since
transposing a matrix changes its columns into rows, from Proposition 2.6.1 we deduce: a
real n×n matrix is an orthogonal matrix if and only if its rows form an orthonormal basis
in Rn.
2.7. Unitary operators are the complex version of orthogonal operators. A linear
operator U on a complex inner product space V is a unitary operator if
〈Ux, Uy〉 = 〈x,y〉
for all x and y in V , i.e. U preserves the inner product of V . As in the real case, a
linear operator on a complex inner product space is a unitary operator if and only if
U∗U = UU∗ = I, that is, U∗ is the inverse of U . As before, unitary operators are norm-
preserving. The converse is also true but the proof is more difficult than the orthogonal
case. Similarly, a complex square matrix A is called a unitary matrix if AA∗ = A∗A = I.
By recycling previous arguments we can show that an n × n complex matrix is unitary if
and only if its columns (or its rows) form an orthonormal basis. A quick example:
[ cos θ  i sin θ ]   ( such as (1/√2) [ 1 i ] and (1/2) [ √3  i ] )
[ i sin θ  cos θ ]                    [ i 1 ]           [ i  √3 ]
is a unitary matrix for each real θ.
Example 2.7.1. Let ω = e^{2πi/n}. The columns of the following matrix Fn are the
orthonormal basis of Cn from Example 1.4.2 in §1.4:
Fn = (1/√n) [ 1 1 1 1 1 · · · 1 ]
            [ 1 ω ω² ω³ ω⁴ · · · ω^{n−1} ]
            [ 1 ω² ω⁴ ω⁶ ω⁸ · · · ω^{2(n−1)} ]
            [                ...              ]
            [ 1 ω^{n−1} ω^{2(n−1)} ω^{3(n−1)} ω^{4(n−1)} · · · ω^{(n−1)(n−1)} ]
and hence Fn is a unitary matrix. The linear mapping associated with this matrix is
called the finite Fourier transform. Speeding up this transform by special methods has
become crucial in recent years for reducing the cost of communication networks. The
rediscovery of the so–called FFT (Fast Fourier Transform) has great practical value in
cutting costs substantially. Historians of mathematics can now trace the FFT method back
as early as Gauss, who certainly did not have this sort of application in mind!
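A minimal sketch relating Fn to NumPy's FFT, assuming Python with NumPy (note the sign and normalization conventions differ: np.fft uses e^{−2πi/n} and no 1/√n factor, so Fn x = √n · ifft(x)):

```python
import numpy as np

# Sketch: the matrix F_n above versus NumPy's FFT routines.
n = 8
omega = np.exp(2j * np.pi / n)
F = np.array([[omega ** (j * k) for k in range(n)] for j in range(n)]) / np.sqrt(n)

print(np.allclose(F.conj().T @ F, np.eye(n)))           # True: F_n is unitary
x = np.arange(n, dtype=complex)
print(np.allclose(F @ x, np.sqrt(n) * np.fft.ifft(x)))  # True
```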
2.8. In the above subsection we have seen that unitary matrices come from unitary
operators. Here we describe another source of such matrices: change of orthonormal basis.
In §2 of Chapter III we described the connection between matrices [T]E and [T]F
representing an operator T on a vector space V relative to two different bases E and F in
V : [T ] E and [T ]F are similar, i.e. there is an invertible matrix P such that
[T ]F = P [T ] E P−1. (2.8.1)
Now we make the further assumptions that V is an inner product space and both E and
F are orthonormal bases. If we go over the argument in §2 in Chapter III again, we can
check that the matrix P in (2.8.1) is a unitary matrix in the complex case and an
orthogonal matrix in the real case. This leads to the following two definitions: two n × n
complex matrices A and B are unitarily equivalent if there is a unitary matrix U such
that UAU∗ = B; two n × n real matrices A and B are orthogonally equivalent if there
is an orthogonal matrix P such that PAP⊤ = B. Using the terminology here, we have
1. Matrices representing the same operator on a finite dimensional complex inner
product space relative to different orthonormal bases are unitarily equivalent;
2. Matrices representing the same operator on a finite dimensional real inner product
space relative to different orthonormal bases are orthogonally equivalent.
EXERCISE SET IV.2.
Review Questions. Can I state the definitions and give examples of the following terms?
adjoint of a linear operator (of a matrix), Hermitian operator (Hermitian matrix),
unitary operator (unitary matrix), orthogonal operator (orthogonal matrix), sym-
metric operator (symmetric matrix).
To what numbers do they correspond in the one-dimensional case?
Drills
1. In each of the following cases, find the adjoint A∗ of the given matrix A:
(a) A = [ 1 + 2i 2 + 3i 3 + 4i ]   (b) A = [ 0 i 0 ]   (c) A = [ i 1 i ]
        [ 4 + 5i 5 + 6i 6 + 7i ]           [ 0 0 1 ]           [ i 1 i ]
        [ 7 + 8i 8 + 9i 9 + 9i ]           [ 0 0 0 ]           [ i 1 i ] .
2. Let A and B be n× n matrices and let a be a complex number. Verify the following
identities. (♠ These identities will be used freely without giving explicit references.
So you must get familiar with them. ♠.)
(A + B)∗ = A∗ + B∗, (aA)∗ = āA∗, (AB)∗ = B∗A∗, (A∗A)∗ = A∗A.
Also, if A is invertible, then so is A∗ and (A−1)∗ = (A∗)−1.
3. Find the missing entry (or entries) indicated by ∗ in each of the following unitary
matrices:
(1/√2) [ 1 ∗ ] ,  (1/√2) [ 1 ∗ ] ,  (1/5) [ −3 4i ] ,  (1/2) [ 1 + i 1 − i ] ,
       [ 1 1 ]          [ i 1 ]          [ 4i  ∗ ]         [  ∗    1 + i ]

(1/3) [ 1 2 ∗ ] ,  (1/3) [ i 2  2 ] ,  (1/2) [ 1  ∗  ∗  1 ]
      [ 2 1 ∗ ]          [ 2 ∗  ∗ ]          [ 1 −1 −1  ∗ ]
      [ 2 ∗ 1 ]          [ 2 2i ∗ ]          [ 1  1  1  ∗ ]
                                             [ 1 −1  1  ∗ ] .
4. Find the least square solution(s) to each of the following inconsistent systems
(a) x1 + ix2 = 0, x1 + ix2 = 2.
(b) x1 + x2 = 1, x1 − x2 = 1, x1 + 2x2 = 5.
(c) x1 + x2 + x3 = 1, x1 − x2 + x3 = 1, x1 − x2 − x3 = 1, x1 + x2 − x3 = 1.
5. True or False:
(a) The sum of two Hermitian matrices is Hermitian.
(b) The product of two Hermitian matrices is Hermitian.
(c) If a Hermitian matrix is invertible, then its inverse is also Hermitian.
(d) The sum of two unitary matrices is unitary.
(e) The product of two unitary matrices is unitary.
(f) Unitary matrices are invertible and their inverses are also unitary.
(g) An orthogonal matrix is a matrix orthogonal to a set of given matrices.
(h) If H is a Hermitian matrix and if U is a unitary matrix, then UHU−1 is a
Hermitian matrix.
(i) If H is a Hermitian matrix and if P is an invertible matrix, then PHP−1 is a
Hermitian matrix.
(j) If A is an arbitrary matrix, then A∗A is a Hermitian matrix.
6. Write down each of the following matrices explicitly:
(a) the 8 × 8 Hadamard matrix H3 (for notation, see Example 2.6.2);
(b) unitary matrices F2, F3, F4, F6, F8 in finite Fourier transform (for notation, see
Example 2.7.1).
Exercises
1. Let R be a linear operator on a complex inner product space V such that R2 = I.
Show that R is unitary if and only if R is Hermitian.
2. Let T be a linear operator on a finite dimensional complex inner product space V .
Show that there exist unique Hermitian operators H and K on V such that T =
H+ iK. (Aside: This is the analogue of the identity z = x+ iy (where x and y are the
real part and the imaginary part of z) for complex numbers. Notice that T∗ = H − iK,
which is analogous to z̄ = x − iy.)
3. Recall that a projection is a linear operator E satisfying E² = E. If, furthermore,
the space V on which E is defined is a complex inner product space and E is a Hermitian
operator (thus E² = E = E∗), then E is called an orthogonal projection. Show that
a projection E on an inner product space is an orthogonal projection if and only if its
kernel is orthogonal to its range: kerE ⊥ E(V ).
4. Let V be a 2-dimensional inner product space and let T be a linear operator on V .
Show that T 2 = O if and only if there is an orthogonal system {e, f} in V such that
Tx = 〈x, f〉e for all x ∈ V . Hint: The rank of T is 0 or 1. (Aside: It is straightforward
to check that if T is an operator having the form Tx = 〈x, f〉e with e ⊥ f , then T 2 = O.
Indeed, for each x ∈ V , T 2x = T (Tx) = T (〈x, f〉e) = 〈x, f〉Te = 〈x, f〉〈e, f〉e = 0,
due to the assumption that 〈e, f〉 = 0.)
5. Let T be a linear operator on a finite dimensional inner product space V .
(a) Show that, for all x ∈ V , 〈T ∗Tx,x〉 ≥ 0.
(b) Show that T is invertible if and only if T ∗T is invertible.
6. Show that (a) if P is an orthogonal matrix, then det(P) is either 1 or −1, and (b) if
U is a unitary matrix, then |det(U)| = 1.
7. Let A be a 2 × 2 orthogonal matrix. Show that
(a) in case det(A) = 1, A is a rotation matrix, i.e.
A = [ cos θ −sin θ ]
    [ sin θ  cos θ ]
for some real number θ,
(b) in case det(A) = −1,
A = [ cos θ  sin θ ]
    [ sin θ −cos θ ] , and
(c) in case det(A) = −1, A² = I; (Aside: A represents a reflection.)
8. Show that a 2 × 2 unitary matrix U with det(U) = 1 can always be expressed as
[  z1 z2 ]
[ −z̄2 z̄1 ] ,
where z1 and z2 are complex numbers satisfying |z1|2 + |z2|2 = 1.
9. Show that, if H is a Hermitian operator on a finite dimensional complex inner product
space V , then H − iI is invertible and U ≡ (H + iI)(H − iI)−1 is a unitary operator
on V . (Aside: U is called the Cayley transform of H.)
10. Let A, B, C be 2 × 2 real matrices. Check that
(a) (A⊗B) ⊗ C = A⊗ (B ⊗ C)
(b) there is a 4 × 4 permutation matrix P such that P (B ⊗A)P−1 = A⊗B.
11*. Let T be a linear operator on a complex inner product space. Prove that
(a) T is Hermitian if and only if 〈Tx,x〉 is real for each x ∈ V .
(b) T = O if and only if 〈Tx,x〉 = 0 for each x ∈ V .
§3. Orthogonal Diagonalization
3.1. Question: Which operators on a finite dimensional inner product space possess
orthonormal bases consisting of eigenvectors? In other words, which operators can be
represented by diagonal matrices relative to appropriate orthonormal bases? For short,
which operators are orthogonally diagonalizable?
This question does not specify whether the space is real or complex. We have to
consider both situations. Moreover, we have to consider them separately, because they
come up with different answers. In both situations we take the same approach: find a
necessary condition first (an easy step) and then prove the sufficiency of this condition
(the hard part). Before we proceed, let us make an advertisement for the forthcoming
answers. There are three great things about them: first, they are thorough; second, they are
neat and pleasant; and third, they are extremely important, for both theoretical and practical
purposes! Without exaggeration, we can say that these answers are among the best things we
can learn in a subject called linear algebra.
Let us start with our investigation. First we consider the real case. Let T be a
linear operator on a finite dimensional real inner product space V . Suppose that T does
have an orthonormal basis E consisting of e1, e2, . . . , en which are eigenvectors of T , say
Tej = λjej for j = 1, 2, . . . , n. Here, of course, λ1, λ2, . . . , λn are real numbers. The
matrix of T relative to E is diagonal:
[T] ≡ [T]E = [ λ1           ]
             [    λ2        ]
             [       . . .  ]
             [           λn ] .
The unspecified entries of the above matrix are filled with zeros. By what we have seen in
§2.1 of the last section (to be more specific, identity (2.1.2)), the matrix [T ∗] of the adjoint
T ∗, also relative to E , is just its transpose [T ]⊤. But [T ], as shown above, is diagonal and
hence [T ]⊤ = [T ]. So [T ∗] = [T ], from which it follows T ∗ = T , that is, T is a symmetric
operator.
The above conclusion is easy to get and short to say. Now a wonderful thing happens:
the converse is also true!
Theorem 3.1.1. If T is a symmetric operator on a finite dimensional real inner
product space V , then there is an orthonormal basis consisting of eigenvectors of T .
To prove this theorem, we have to find an orthonormal basis E consisting of e1, e2, . . . , en
which are eigenvectors of T . Here, n of course is dim V . Our proof proceeds by induction
on n. When n = 1, i.e. V is one dimensional, take any unit vector e1 and form the
orthonormal basis E consisting of the single vector e1. This clearly will do for our purpose.
Now we make the inductive hypothesis that the theorem is true for all symmetric operators
on spaces of dimension m. Assume that the dimension of the space V on which T (the
operator we are investigating) is defined is m+1: dimV = n = m+1. By Proposition 2.4.1
we know the existence of a real eigenvalue for T , say λ1, so that ker(T − λ1I) ≠ {0}.
Let us take any vector e1 in ker(T − λ1I) with ‖e1‖ = 1. Let L be the one dimensional
subspace spanned by e1:
L = {x ∈ V | x = αe1 for some α ∈ R }.
Let M = L⊥ = {e1}⊥ = {v ∈ V | 〈v, e1〉 = 0 }. (Here {e1} stands for the set consisting of the
single vector e1.) Then L + M = V and dimM = dimV − dimL = (m + 1) − 1 = m. For each
y ∈M ≡ L⊥,
〈Ty, e1〉 = 〈y, Te1〉 = 〈y, λ1e1〉 = 0.
The first identity follows from the assumption T = T∗, the second from e1 ∈ ker(T − λ1I),
and the last from y ⊥ L and λ1e1 ∈ L.
Denote by S the linear operator on M obtained by restricting T to M , that is, S is
the operator defined on the subspace M by putting Sx = Tx for x ∈M . Notice that, the
above argument shows that the range of S is in M and hence it is indeed a linear operator
on M , not just a linear transformation from M to V . Also notice that the linearity of S
is inherited from T . (Aside: In general, if M is an invariant subspace of T , that is, M is
a subspace of V with the property that x ∈ M implies Tx ∈ M , then it is legitimate to
consider the restriction of T to M .) As we have noticed, T being a symmetric operator
can be described by the following condition:
〈Tx,y〉 = 〈x, Ty〉 for all x,y ∈ V.
If x and y are actually inM , then we can rewrite Tx and Ty as Sx and Sy respectively, and
the above identity becomes 〈Sx,y〉 = 〈x, Sy〉. This shows that S is a symmetric operator
on M , which is m–dimensional. So we can apply the induction hypothesis to assert that M
has an orthonormal basis consisting of eigenvectors of S, say e2, e3, . . . , em+1. Now you
can see that the m + 1 vectors e1, e2, . . . , em+1 form an orthonormal basis of V consisting
of eigenvectors of T . The proof is complete.
The “matrix version” of Theorem 3.1.1 is the following:
Theorem 3.1.2. If A is a real symmetric matrix, i.e. A = A⊤, then there is a real
diagonal matrix D and an orthogonal matrix P such that A = PDP⊤. (In short, a real
symmetric matrix is orthogonally diagonalizable.)
The converse of Theorem 3.1.2 is true and very easy to prove (and hence not very
exciting to us): if A = PDP⊤ for some diagonal D and some orthogonal P , then we have
A⊤ = (PDP⊤)⊤ = P⊤⊤D⊤P⊤ = PDP⊤ = A; (recall that P⊤⊤ = P and D⊤ = D).
Now we start the proof of the above theorem. Assume that A is an n × n real symmetric
matrix. Consider the “God-given” operator T ≡MA on Rn induced by A, i.e. T (x) = Ax
for x ∈ Rn. Then T is a symmetric operator on Rn (which is the real inner product space
equipped with the standard inner product). By Theorem 3.1.1, there is an orthonormal
basis v1, v2, . . . , vn in Rn such that Tvk ≡ Avk = λkvk for some scalars λk, (k =
1, 2, . . . , n). Let
P = [v1 v2 · · · vn],
that is, the matrix with v1, v2, etc. as its column vectors. Then P is an orthogonal
matrix. We can check that AP = PD as follows, where D is the diagonal matrix with
λ1, λ2, . . . , λn as its diagonal entries.
AP = A[v1 v2 · · · vn] = [Av1 Av2 · · · Avn] = [λ1v1 λ2v2 · · · λnvn],
With the last block matrix written in the correct way, we have:
[v1λ1 v2λ2 · · · vnλn] = [v1 v2 · · · vn] [ λ1           ] = PD.
                                         [     . . .    ]
                                         [           λn ]
Hence AP = PD, giving us A = APP⊤ = PDP⊤.
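A minimal sketch of Theorem 3.1.2 in action, assuming Python with NumPy (np.linalg.eigh handles symmetric/Hermitian matrices; the test matrix is an arbitrary illustration):

```python
import numpy as np

# Sketch: eigh returns the real eigenvalues and an orthogonal P with
# orthonormal eigenvector columns, so that A = P D P^T.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
eigvals, P = np.linalg.eigh(A)
D = np.diag(eigvals)

print(np.allclose(P @ P.T, np.eye(3)))   # True: P is orthogonal
print(np.allclose(P @ D @ P.T, A))       # True: A = P D P^T
```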
3.2. Now we consider the similar question in the complex case:
Question: Which linear operator on a finite dimensional complex inner product space
has an orthonormal basis consisting of eigenvectors?
We proceed in the same way as the real case. However, the complex case is not
just more complex, it is more tricky. Suppose that an operator T on a finite dimensional
complex inner product space V does have an orthonormal basis E = {e1, e2, . . . , en} which
are eigenvectors of T , say Tej = λjej for j = 1, 2, . . . , n. The matrix of T relative to E is
diagonal:
[T] ≡ [T]E = [ λ1           ]
             [    λ2        ]
             [       . . .  ]
             [           λn ] .
By what we have seen in the last section (identity (2.1.2)) the matrix of the adjoint T ∗,
also relative to E , is given by
[T∗]E ≡ [T∗] = [T]∗ = [ λ̄1           ]
                      [    λ̄2        ]
                      [       . . .  ]
                      [           λ̄n ] .
Hence both [T ][T ∗] and [T ∗][T ] are equal to the diagonal matrix with
|λ1|² (= λ1λ̄1 = λ̄1λ1), |λ2|², . . . , |λn|²
as the diagonal entries. Thus we have [TT ∗] = [T ][T ∗] = [T ∗][T ] = [T ∗T ]. That is, the
operators TT ∗ and T ∗T have the same matrix representation relative to E . Since a linear
operator is completely determined by its matrix representation, we must have TT ∗ = T ∗T .
The discussion here leads to
Definition: A normal operator is a linear operator T on a complex inner product
space satisfying the identity TT ∗ = T ∗T . Similarly, a normal matrix is a (complex)
square matrix A satisfying AA∗ = A∗A.
Normal operators include both Hermitian operators and unitary operators. Recall
that an operator H (on a complex inner product space) is Hermitian if H = H∗. If H is
a Hermitian operator, then HH∗ and H∗H are equal, because both of them are equal to
H2. Hence Hermitian operators are normal. Also recall that an operator U is unitary if
UU∗ = U∗U = I. This identity clearly indicates that U is normal. In the same fashion,
Hermitian matrices and unitary matrices are normal matrices.
Example 3.2.1. Consider the matrix
A = [ 1 − i  i     ]   with A∗ = [ 1 + i  i     ] .
    [ −i     1 − i ]            [ −i     1 + i ]
Then we can check that
AA∗ = A∗A = [ 3   2i ] ,
            [ −2i 3  ]
showing that A is normal, but neither Hermitian nor unitary.
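A minimal numerical check of this example, assuming Python with NumPy:

```python
import numpy as np

# Sketch: the matrix of Example 3.2.1 is normal but neither Hermitian
# nor unitary.
A = np.array([[1 - 1j, 1j],
              [-1j, 1 - 1j]])
A_star = A.conj().T

print(np.allclose(A @ A_star, A_star @ A))   # True: A is normal
print(np.allclose(A, A_star))                # False: not Hermitian
print(np.allclose(A @ A_star, np.eye(2)))    # False: not unitary
```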
3.3. The previous discussion establishes that, if an operator T possesses an orthonor-
mal basis consisting of its eigenvectors, then T is a normal operator. Now we witness a
miracle: the converse is also true.
Theorem 3.3.1. A linear operator T on a (finite dimensional) complex inner product
space V has a diagonal matrix representation relative to some orthonormal basis if and
only if T is normal, that is, TT ∗ = T ∗T .
Assume that T is normal. We have to find an orthonormal basis E = {e1, e2, . . . , en}
consisting of eigenvectors of T. Here n, of course, is the dimension of V. Our proof proceeds
by induction on n. When n = 1, as in the real case, take any unit vector e1 and form the
orthonormal basis E consisting of the single vector e1. This clearly will do for our purpose.
Now we assume the existence of a diagonalizing orthonormal basis for normal operators on
spaces of dimension m, and the dimension of the space V on which T (the operator under
investigation) is defined is m+ 1, that is, n = dimV = m+ 1. Let λ0 be an eigenvalue of
T and let e0 be an eigenvector corresponding to λ0 with norm one, that is, ‖e0‖ = 1. Let
M be the one dimensional subspace spanned by e0:
M = {x ∈ V | x = αe0 for some α ∈ C }.
For convenience, let us write T0 for T −λ0I. Notice that, “x ∈M” implies “T0x = 0”. We
proceed our proof step-by-step as follows. Firstly, notice that T0 is also normal. Indeed,
T0T0∗ = (T − λ0I)(T∗ − λ̄0I) = TT∗ − λ̄0T − λ0T∗ + λ0λ̄0I
      = T∗T − λ̄0T − λ0T∗ + λ0λ̄0I = (T∗ − λ̄0I)(T − λ0I) = T0∗T0.
Secondly, for x ∈ M, in addition to T0x = 0, we also have T0∗x = 0. (Aside: Attention!
This is the crucial step.) Indeed, we have
‖T0∗x‖² = 〈T0∗x, T0∗x〉 = 〈T0T0∗x, x〉 = 〈T0∗T0x, x〉 = 〈0, x〉 = 0.
Hence T0∗x = 0. Thirdly, we claim that M⊥ (the orthogonal complement of M) is invariant
for both T and T∗, i.e. if y ∈ M⊥, then both Ty and T∗y belong to M⊥. To prove this,
let us suppose y ∈ M⊥. Notice that T = T0 + λ0I. Hence, for each x ∈ M, we have
〈Ty, x〉 = 〈(T0 + λ0I)y, x〉 = 〈T0y + λ0y, x〉 = 〈T0y, x〉 + λ0〈y, x〉 = 〈y, T0∗x〉 + 0 = 〈y, 0〉 = 0,
and, in the same fashion, we can show 〈T∗y, x〉 = 0 for all x ∈ M. Therefore both Ty and
T∗y are in M⊥.
Now the rest of the argument is very similar to that of Theorem 3.1.1. Denote by S
the linear operator on M⊥ obtained by restricting T to M⊥, that is, S is the operator defined on
the subspace M⊥ by putting Sx = Tx for x ∈ M⊥. The third step above tells us that
S is indeed a linear operator on M⊥. We check that S ∈ L(M⊥) is also normal. To this
end, we need to find its adjoint S∗ and show that S and S∗ commute. So let us take
x, y ∈ M⊥. Then
〈x, S∗y〉 = 〈Sx, y〉 = 〈Tx, y〉 = 〈x, T∗y〉.
Since x is an arbitrary vector in M⊥ and since both S∗y and T ∗y are in M⊥, we must have
S∗y = T ∗y. In other words, S∗ is just the restriction of T ∗ to M⊥. Hence, for y ∈M⊥,
SS∗y = S(S∗y) = S(T ∗y) = T (T ∗y) = TT ∗y.
In the same way, we can show that S∗Sy = T ∗Ty. As T is normal, TT ∗y = T ∗Ty and
hence SS∗y = S∗Sy. As y is an arbitrary vector in M⊥ (the domain of S), we have
SS∗ = S∗S, that is, S is normal.
We have shown that S is a normal operator on M⊥. From the fact that M is one-
dimensional and V has dimension m + 1, we see that M⊥ is m-dimensional. Therefore,
by our induction hypothesis, M⊥ has an orthonormal basis consisting of eigenvectors of
S, say e1, e2, . . . , em. Now the m + 1 vectors e0, e1, e2, . . . , em form an orthonormal
basis of V consisting of eigenvectors of T.
3.4. The “matrix version” of Theorem 3.3.1 is the following:
Theorem 3.4.1. If A is a normal matrix, i.e. AA∗ = A∗A, then there is a diagonal
matrix D and a unitary matrix U such that A = UDU∗.
The converse of the above theorem is true and very easy to prove (and hence not very
interesting). Indeed, if A = UDU∗ for some diagonal D and some unitary U, then
AA∗ = (UDU∗)(UDU∗)∗ = UDU∗UD∗U∗ = UDD∗U∗ = UD∗DU∗ = UD∗U∗UDU∗ = (UDU∗)∗(UDU∗) = A∗A;
(recall that U∗∗ = U, UU∗ = U∗U = I, and that diagonal matrices commute, so DD∗ = D∗D.)
We can derive Theorem 3.4.1 from Theorem 3.3.1 in the same way as we derived Theorem
3.1.2 from Theorem 3.1.1, except that we work with the standard complex space Cn instead
of the real Rn. The argument will not be repeated here. Recall the following
definition of unitary equivalence: we say that n × n (complex) matrices A and B are
unitarily equivalent if and only if A = U∗BU for some unitary matrix U. Now Theorem
3.4.1 can be restated as follows:
An n × n complex matrix is unitarily equivalent to a diagonal matrix if and only if it is a normal matrix.
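The easy direction of this statement can be seen numerically: starting from a diagonal D and a unitary U (obtained here, for illustration, from the QR factorization of a random complex matrix), the matrix A = UDU∗ is normal. A minimal Python/numpy sketch, not part of the notes:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    U, _ = np.linalg.qr(X)                   # U is unitary
    D = np.diag([1j, 2.0, -1.0 + 1j])        # an arbitrary diagonal matrix
    A = U @ D @ U.conj().T
    assert np.allclose(A @ A.conj().T, A.conj().T @ A)   # A is normal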
Since Hermitian operators and unitary operators are normal operators, Theorem 3.3.1 is
applicable to these types of operators. Thus, if T is a Hermitian operator (or a unitary
operator) on a finite dimensional complex inner product space V, then there is an
orthonormal basis E such that the representing matrix [T]E relative to E is diagonal, say
[T]E = diag(λ1, λ2, . . . , λn).
Notice that the diagonal elements of [T]E are eigenvalues of T. In case T is Hermitian,
the diagonal elements λk are real. In case T is unitary, |λk| = 1 for all k.
We have shown that a Hermitian operator has an orthonormal basis consisting of eigen-
vectors with real eigenvalues. The matrix version of this statement is: a Hermitian matrix
A is unitarily equivalent to a real diagonal matrix, that is, there is a real diagonal matrix
D and a unitary matrix U such that A = UDU∗; (the converse is also true but not very
interesting.)
3.5. A 1 × 1 complex matrix is simply a complex number. A 1 × 1 unitary matrix
is a unit modulus number, that is, a complex number z with |z| = 1. A 1 × 1 Hermitian
matrix is just a real number. A good way to think of Hermitian matrices or operators is to
regard them as an extension of real numbers. In the present subsection we study operators
and matrices which can be considered as an extension of positive numbers.
Definition. We say that a linear operator P on a complex inner product space V
is positive if P is Hermitian and 〈Px, x〉 ≥ 0 for all x in V.
Notice that eigenvalues of positive operators are nonnegative real numbers. Indeed, if λ is
an eigenvalue of a positive operator P , say Pv = λv for some vector v with ‖v‖ = 1,
then we have 〈Pv,v〉 = 〈λv,v〉 = λ‖v‖2 = λ and hence λ ≥ 0.
Example 3.5.1. If P is a positive operator on V and if T is any operator on V ,
then the operator T ∗PT is also positive. Indeed, for any vector x in V ,
〈T ∗PTx,x〉 = 〈PTx, Tx〉 = 〈Py,y〉 ≥ 0, with y = Tx.
In particular, for any operator T on an inner product space, operators T ∗T and TT ∗
are positive.
Example 3.5.2. If T is a Hermitian operator on V with nonnegative eigenvalues,
then T is positive. Indeed, since T is Hermitian (and hence normal), it follows from
Theorem 3.3.1 that there is an orthonormal basis E = {e1, e2, . . . , en} in V consisting
of eigenvectors of T, say Tek = λkek with λk ≥ 0 (1 ≤ k ≤ n). Any vector v can be
written as a linear combination of the basis vectors, say v = ∑_k vkek; (here we briefly
recall that vk = 〈v, ek〉, even though this will not be used here). Thus
〈Tv, v〉 = 〈T(∑_k vkek), ∑_j vjej〉 = ∑_{k,j} vk v̄j λk 〈ek, ej〉 = ∑_k λk |vk|² ≥ 0.
Hence T is positive.
Let P be a positive operator on a complex inner product space V. By Theorem
3.3.1, there is an orthonormal basis E = {e1, e2, . . . , en} in V consisting of eigenvectors
of P, say Pek = λkek with λk ≥ 0 (1 ≤ k ≤ n). In other words, the matrix [P]E
representing P relative to E is a diagonal matrix with λk ≥ 0 (1 ≤ k ≤ n) as its
diagonal elements. Now let Q be the operator such that its matrix representation [Q]E is
the diagonal matrix with √λk (1 ≤ k ≤ n) as its diagonal elements. Thus
[P]E = diag(λ1, λ2, . . . , λn)   and   [Q]E = diag(√λ1, √λ2, . . . , √λn).
Clearly [Q]²E = [P]E. Hence we have Q² = P. From Example 3.5.2 above we know that
Q is positive. We have proved the existence part of the following theorem.
Theorem 3.5.1. If P is a positive operator, then there exists a unique positive
operator Q such that Q2 = P .
The proof of the uniqueness of Q is rather technical and hence is omitted here. The
operator Q in the above theorem is called the square root of P and is denoted by P^{1/2}
or √P.
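The construction in the existence proof translates directly into a computation: diagonalize P with an orthonormal basis of eigenvectors and take square roots of the eigenvalues. A minimal Python/numpy sketch (illustration only; the matrix is an arbitrary example):

    import numpy as np

    B = np.array([[1.0, 2.0], [0.0, 1.0]])
    P = B.T @ B                              # B^T B is positive (Example 3.5.1)
    lam, V = np.linalg.eigh(P)               # eigenvalues >= 0, orthonormal eigenvectors
    Q = V @ np.diag(np.sqrt(lam)) @ V.T
    assert np.allclose(Q @ Q, P)             # Q is the positive square root of P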
Take any operator T on an inner product space V. According to Example 3.5.1
above, T∗T is a positive operator and hence its square root is defined. We denote √(T∗T)
by |T|. Thus, |T| is a positive operator satisfying
|T |2 = T ∗T. (3.5.1)
In general, |T | and |T ∗| are not the same.
Example 3.5.3. Prove that |T | = |T ∗| if and only if T is normal.
Solution. Suppose |T| = |T∗|. Then |T|² = |T∗|². Now |T|² = T∗T and |T∗|² =
(T∗)∗(T∗) = TT∗. Hence T∗T = TT∗, that is, T is normal. The steps can be reversed to
show that, if T is normal, then |T∗| = |T|.
The eigenvalues of |T|, arranged in decreasing order, say µ1 ≥ µ2 ≥ · · · ≥ µn, are
called the singular values, or s–numbers, of T. They are important in many areas, but we
do not plan to say more about this.
Take any complex n × n matrix A = [ajk] and consider the linear operator MA
induced by A, defined on the complex space Cn with the standard inner product. If
MA is a positive operator, we say that A is positive semi–definite. If, furthermore,
MA is invertible, we say that A is positive definite. It can be easily checked that, if
v = (z1, z2, . . . , zn), then
〈MAv, v〉 = v∗Av = ∑_{k,j} ajk zk z̄j.
Thus a Hermitian matrix A = [ajk] is positive semi–definite if and only if
∑_{k,j} ajk zk z̄j ≥ 0
for all complex numbers z1, z2, . . . , zn.
All of the above discussion about operators can be applied to matrices. For example, for
any matrix A, A∗A is positive semi–definite and hence |A| = √(A∗A) exists.
Example 3.5.4. Take any set of vectors v1, v2, . . . , vr in an inner product space
and let G = [gjk] be the r × r matrix with gjk = 〈vj, vk〉. Check that G is positive
semi–definite.
Solution. For all complex numbers z1, z2, . . . , zr, we have
∑_{j,k=1}^{r} gjk zj z̄k = ∑_{j,k=1}^{r} 〈vj,vk〉 zj z̄k = ∑_{j,k=1}^{r} 〈zjvj, zkvk〉 = 〈∑_{j=1}^{r} zjvj, ∑_{k=1}^{r} zkvk〉 = 〈w, w〉 ≥ 0,
where w = ∑_{j=1}^{r} zjvj. A matrix of the form described here is called a Gram matrix.
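Numerically, one can form the Gram matrix of a few vectors and confirm that it is Hermitian with nonnegative eigenvalues. A Python/numpy sketch (illustration only; the vectors are random):

    import numpy as np

    rng = np.random.default_rng(1)
    v = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))  # rows are v1..v4
    G = v @ v.conj().T                       # G[j,k] = <v_j, v_k>
    assert np.allclose(G, G.conj().T)        # G is Hermitian
    print(np.linalg.eigvalsh(G))             # all eigenvalues >= 0 (up to rounding)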
3.6.* Let T be an operator on a complex inner product space and, as before, write
|T| = √(T∗T). Let us check that |T| and T have the same kernel:
ker |T| = ker T.    (3.6.1)
Indeed, for any vector v, we have
‖Tv‖2 = 〈Tv, Tv〉 = 〈T ∗Tv,v〉 = 〈|T |2v,v〉 = 〈|T |v, |T |v〉 = ‖|T |v‖2, (3.6.2)
which tells us that Tv = 0 if and only if |T |v = 0.
Now assume that T is invertible. From (3.6.1) we know that |T| is also invertible.
Let U = T|T|⁻¹. Then T = U|T|,
U∗U = (|T|⁻¹T∗)(T|T|⁻¹) = |T|⁻¹|T|²|T|⁻¹ = I
and
UU∗ = (T|T|⁻¹)(|T|⁻¹T∗) = T(|T|²)⁻¹T∗ = T(T∗T)⁻¹T∗ = T(T⁻¹(T∗)⁻¹)T∗ = I,
showing that U is unitary; (here we have used U∗ = (T|T|⁻¹)∗ = |T|⁻¹T∗, which holds
because |T|, and hence |T|⁻¹, is Hermitian). We have proved that T can be written as a product UP of a
unitary operator U and a positive operator P. Now we check that U and P are uniquely
determined by T. In fact, from T = UP we have |T|² = T∗T = (UP)∗(UP) = PU∗UP = P².
By the uniqueness of the positive square root, we have P = |T|, from which we also
have U = TP⁻¹ = T|T|⁻¹. The expression UP here is called the polar decomposition
of T. In the one–dimensional case, we can identify T with a complex number z, and
the polar representation z = re^{iθ} of z corresponds to the polar decomposition of T.
There is a matrix version of the polar decomposition, defined in the same manner. The polar
decomposition of an n × n matrix A is A = U|A|, where |A| = √(A∗A) and U is a
unitary matrix.
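In floating point arithmetic the polar decomposition is conveniently obtained from the singular value decomposition: if A = WΣV∗, then |A| = VΣV∗ and U = WV∗. A minimal Python/numpy sketch (illustration only), using the matrix of Example 3.6.1 below:

    import numpy as np

    A = np.array([[3.0, 3j], [1j, 1.0]])
    W, s, Vh = np.linalg.svd(A)              # A = W diag(s) Vh
    absA = Vh.conj().T @ np.diag(s) @ Vh     # |A| = V diag(s) V*
    U = W @ Vh                               # the unitary factor
    assert np.allclose(U @ absA, A)          # A = U |A|
    assert np.allclose(U @ U.conj().T, np.eye(2))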
Example 3.6.1. Find the polar decomposition for A = [ 3  3i ; i  1 ].
Solution. Direct computation shows
A∗A = [ 10  8i ; −8i  10 ]
with eigenvalues λ1 = 18, λ2 = 2 and corresponding eigenvectors v1 = (i, 1) and v2 = (1, i).
Hence |A| = √(A∗A) has the eigenvalues √λ1 = 3√2 and √λ2 = √2 with the same set of eigenvectors. Let
D = [ 3√2  0 ; 0  √2 ]   and   W = (1/√2) [ i  1 ; 1  i ].
Then (A∗A)W = WD², |A|W = WD, and W is unitary. A direct computation shows
|A| = WDW∗ = √2 [ 2  i ; −i  2 ]   and   U = A|A|⁻¹ = (1/√2) [ 1  i ; i  1 ].
Hence the required polar decomposition is A = U|A|, with U and |A| as given above.
Now we briefly describe the polar decomposition for an operator T which is not necessarily
invertible. Equation (3.6.2) tells us that ‖|T|w‖ = ‖Tw‖ for all w. Since |T| is Hermitian,
its range is the orthogonal complement of its kernel. We define an operator U by specifying
its values for a vector v in the range or in the kernel of |T|. When v is in the kernel of |T|,
we simply set Uv = 0. When v is in the range of |T|, say v = |T|w, we let Uv = Tw;
(U is well defined: if |T|w = |T|w′, then applying (3.6.2) to w − w′ gives Tw = Tw′).
Notice that
‖Uv‖ = ‖Tw‖ = ‖|T|w‖ = ‖v‖.
This shows that, on the range of |T |, U is isometric. From the way U is defined, we have
T = U |T | and kerU = ker |T |. Here U in general is not unitary since it may not be
invertible. However, it resembles a unitary operator, in view of the following identities
which can be verified:
UU∗U = U, U∗UU∗ = U∗.
Let e1, e2, . . . , en be an orthonormal basis consisting of eigenvectors of |T| and let
µ1, µ2, . . . , µn be the corresponding eigenvalues of |T|, which are the singular values of T:
µ1 ≥ µ2 ≥ · · · ≥ µn.
So we have |T|ek = µkek (1 ≤ k ≤ n). Let r be the rank of T, which is also the rank of
|T|. Thus we have µ1 ≥ µ2 ≥ · · · ≥ µr > 0 and µ_{r+1} = · · · = µn = 0. Since the vectors
e1, e2, . . . , er are in the range of |T|, and the operator U is isometric on the range
of |T|, the vectors
fk = Uek,   1 ≤ k ≤ r,
form an orthonormal system. One can check that
Tx = ∑_{k=1}^{r} µk 〈x, ek〉 fk    (3.6.3)
for any vector x. The above identity is called the singular decomposition of T.
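In matrix form, identity (3.6.3) is the singular value decomposition. A quick numerical check (Python/numpy sketch, illustration only) reconstructs a random matrix from its singular values µk and the vectors ek, fk:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    F, mu, Eh = np.linalg.svd(A)             # columns of F are f_k; rows of Eh are e_k*
    B = sum(mu[k] * np.outer(F[:, k], Eh[k]) for k in range(3))
    assert np.allclose(B, A)                 # A x = sum_k mu_k <x, e_k> f_k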
Example 3.6.2. Find the singular decomposition of A = [ 1  i ; 1  i ].
Solution. Direct computation shows
|A|² = A∗A = [ 2  2i ; −2i  2 ]
with eigenvalues 4 and 0. Furthermore, e = (i/√2, 1/√2) is an eigenvector of |A|² corresponding to
the eigenvalue 4 with ‖e‖ = 1. From |A|²e = 4e we have |A|e = 2e. Furthermore, we
have Ae = U|A|e = U(2e) = 2Ue = 2f. Thus f = (1/2)Ae = (i/√2, i/√2). Thus the
singular decomposition of A is given by Ax = 2〈x, e〉f, with e = (i/√2, 1/√2) and
f = (i/√2, i/√2).
EXERCISE SET IV. 3.
Review Questions. What is the orthogonal diagonalization problem? What is the neat
and thorough answer to this problem in the real case? In the complex case? What is the
matrix version of this problem and what is the corresponding answer? (Again, you have
to consider the real case and the complex case separately.)
Drills
1. Show that each of the following pairs of matrices A and B are unitarily equivalent by
finding a unitary matrix U such that B = U∗AU.
(a) A = [ a  b ; c  d ], B = [ d  c ; b  a ].   (b) A = [ a  b ; c  d ], B = [ a  −b ; −c  d ].
(c) A = [ 0  a ; 0  0 ], B = [ 0  |a| ; 0  0 ].   (d) A = [ 1  1 ; −1  −1 ], B = [ 0  2 ; 0  0 ].
Hint for (c): try a diagonal U. Hint for (d): try the 45° rotation.
2. (a) Verify that N = [ a  b ; c  a ] is normal if and only if |b| = |c|.
(b) Show that the circulant matrix given in Exercise 4 of EXERCISE SET III.1 is
normal.
3. True or false:
(a) If an n× n matrix A is both Hermitian and unitary, then A = I.
(b) The sum of two normal operators is normal.
(c) The product of two normal operators is normal.
(d) If a normal operator is invertible, then its inverse is also normal.
(e) The sum of two unitary operators is unitary.
(f) The product of two unitary operators is unitary.
(g) The sum of two Hermitian operators is Hermitian.
(h) The product of two Hermitian operators is Hermitian.
4. (Aside: It is clear that unitary equivalence implies similarity, but not vice versa. The
present exercise helps you to compare these two concepts.)
(a) Show that, if A and B are unitarily equivalent n × n matrices, then A∗A and
B∗B are also unitarily equivalent.
(b) Give an example of a pair of 2 × 2 matrices which are similar but not unitarily
equivalent.
(c) Give an example of similar 2 × 2 matrices A and B such that A∗A and B∗B are
not similar.
(d) Prove that if normal matrices A and B are similar, then they are unitarily
equivalent.
5. For each of the following Hermitian matrices, find the eigenvalues and corresponding
eigenvectors and find an appropriate diagonalizing unitary matrix (or orthogonal
matrix):
A = [ 1  1 ; 1  1 ],   B = [ 2  1 ; 1  2 ],   C = [ 1  2 ; 2  −2 ],   D = [ 0  −i ; i  0 ],
S = [ 1  1  1 ; 1  1  1 ; 1  1  1 ],   T = [ 0  1  0 ; 1  0  1 ; 0  1  0 ].
6. For each of the following matrices, find the eigenvalues and corresponding eigenvectors
and find an appropriate diagonalizing unitary matrix (or orthogonal matrix):
A = [ cos θ  sin θ ; sin θ  −cos θ ],   B = [ cos θ  −sin θ ; sin θ  cos θ ],   W = [ cos θ  i sin θ ; i sin θ  cos θ ],
where θ is an arbitrary real number such that sin θ > 0.
Exercises
1. Find an orthogonal matrix P and a diagonal matrix D such that D = PAP⊤, where
A = [ 1  1  0 ; 1  0  1 ; 0  1  1 ].
(Hint: Notice that (1, 1, 1) is an eigenvector of A.)
2. Following the guidance given here, show that the n-th term of the Fibonacci sequence
{an}, in which each term is the sum of the preceding two, with the first few terms
1, 1, 2, 3, 5, 8, . . . , is given by
an = (1/√5) ((1 + √5)/2)^n − (1/√5) ((1 − √5)/2)^n.
(a) Use the recursive relation a_{n+2} = a_{n+1} + a_n to verify X_{n+1} = AX_n, where
A = [ 1  1 ; 1  0 ],   X_n = [ a_{n+1} ; a_n ],   and   X_0 = [ 1 ; 0 ],
and derive X_n = A^n X_0. (b) Notice that A is a real symmetric matrix and hence we
can write A = PDP⁻¹, where P is an invertible matrix and D is a diagonal matrix.
Find the explicit expressions of D and P, which enable us to get A^n and hence X_n.
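(A small numerical check of part (a), as a Python sketch; it is an illustration only and does not replace the proof asked for.)

    import numpy as np

    A = np.array([[1, 1], [1, 0]])
    X0 = np.array([1, 0])
    phi, psi = (1 + 5 ** 0.5) / 2, (1 - 5 ** 0.5) / 2
    for n in range(1, 10):
        a_n = (np.linalg.matrix_power(A, n) @ X0)[1]    # a_n from X_n = A^n X_0
        assert a_n == round((phi ** n - psi ** n) / 5 ** 0.5)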
3*. Let V be a finite dimensional complex inner product space. Prove that if N is a
normal operator on V and if TN = NT for some T ∈ L(V), then TN∗ = N∗T.
4*. Prove that if P is an invertible positive operator on an inner product space V , then
the inequality
|〈x,y〉|2 ≤ 〈Px,x〉〈P−1y,y〉
holds for all x, y in V .
5*. Prove that if H is a Hermitian operator on a finite dimensional inner product space
V , then eitH is a unitary operator for all real t.
6*. Prove that an operator H on a finite dimensional inner product space V is a Hermitian
operator if eitH is unitary for all real t.
Appendices for Chapter IV
Appendix A*: Positive Semidefiniteness, Gram Matrices
Recall that a matrix A = [ajk]_{1≤j,k≤r} is positive semidefinite if, for all complex
numbers z1, z2, . . . , zr,
∑_{j,k=1}^{r} ajk zj z̄k ≥ 0.
A positive semidefinite matrix is necessarily Hermitian and its eigenvalues are nonnegative
real numbers. It follows from the spectral theory (for normal operators) that it is a sum
of positive semidefinite matrices of rank one, which are necessarily of the form
[ v1v̄1  v1v̄2  · · ·  v1v̄r
  v2v̄1  v2v̄2  · · ·  v2v̄r
    ...
  vrv̄1  vrv̄2  · · ·  vrv̄r ].    (A1)
An interesting consequence of this observation is that, if A = [ajk] and B = [bjk] are
positive semidefinite matrices, then so is their Schur product A ∘ B = [ajk bjk] (certainly
this is not the usual kind of matrix multiplication). Indeed, the above discussion tells us
that it is enough to consider the case when B is the matrix given as (A1) above. In that
case, we have
∑_{j,k=1}^{r} ajk bjk zj z̄k = ∑_{j,k=1}^{r} ajk vj v̄k zj z̄k = ∑_{j,k=1}^{r} ajk wj w̄k ≥ 0,
where wj = vj zj.
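A numerical illustration of this fact (Python/numpy sketch; the entrywise product is numpy's *, and the matrices are random positive semidefinite examples):

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    Y = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = X @ X.conj().T                       # positive semidefinite
    B = Y @ Y.conj().T                       # positive semidefinite
    S = A * B                                # Schur (entrywise) product
    print(np.linalg.eigvalsh(S))             # all eigenvalues >= 0 (up to rounding)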
Let v1, v2, . . . , vr be a set of vectors in an inner product space V. By the Gram
matrix associated with this set of vectors we mean the following r × r matrix
Γ = [ 〈v1,v1〉  〈v1,v2〉  · · ·  〈v1,vr〉
      〈v2,v1〉  〈v2,v2〉  · · ·  〈v2,vr〉
        ...
      〈vr,v1〉  〈vr,v2〉  · · ·  〈vr,vr〉 ].    (A2)
Its determinant Gr = det Γ is called the Gramian. Notice that Γ is positive semidef-
inite. Indeed, since the (j, k)–entry of Γ is 〈vj,vk〉, for arbitrary complex numbers
z1, z2, . . . , zr, we have
∑_{j,k=1}^{r} 〈vj,vk〉 zj z̄k = ∑_{j,k=1}^{r} 〈zjvj, zkvk〉 = 〈∑_j zjvj, ∑_k zkvk〉 = ‖∑_j zjvj‖² ≥ 0.
Furthermore, this argument shows that Γ is positive definite if and only if the vectors
v1, v2, . . . , vr are linearly independent. In terms of the Gramian, the vectors v1, v2, . . . , vr
are linearly independent if and only if Gr ≡ det Γ > 0. Conversely, given a positive
definite matrix A = [ajk]_{1≤j,k≤r}, there exist an inner product space V and a set of vectors
v1, v2, . . . , vr in V such that ajk = 〈vj,vk〉 for all j, k; in other words, every positive
definite matrix can be regarded as a Gram matrix. Indeed, given such a matrix A, we
define an inner product on V = Cr by putting
〈v,w〉 = ∑_{j,k=1}^{r} ajk vj w̄k
for all v = (v1, . . . , vr) and w = (w1, . . . , wr) in Cr. It is straightforward to check
that this indeed defines an inner product and ajk = 〈ej, ek〉, where ek is the kth vector
of the standard basis for Cr.
Now we consider an interesting expression which resembles the Gramian det Γ, the
determinant of the matrix Γ given in (A2):
g = det [ 〈v1,v1〉  〈v1,v2〉  · · ·  〈v1,vr−1〉  v1
          〈v2,v1〉  〈v2,v2〉  · · ·  〈v2,vr−1〉  v2
            ...
          〈vr,v1〉  〈vr,v2〉  · · ·  〈vr,vr−1〉  vr ].    (A3)
Notice that the entries of the last column of the above determinant are the vectors v1, v2, . . . , vr. If we
take the cofactor expansion along the last column, we can write g as a linear combination
of v1, v2, . . . , vr with
G_{r−1} = det [ 〈v1,v1〉  〈v1,v2〉  · · ·  〈v1,vr−1〉
                〈v2,v1〉  〈v2,v2〉  · · ·  〈v2,vr−1〉
                  ...
                〈vr−1,v1〉  〈vr−1,v2〉  · · ·  〈vr−1,vr−1〉 ]
as the coefficient of vr. Thus we may write
g = G_{r−1} vr + w,    (A4)
with w in S = span{v1, . . . , vr−1}.
with w in S = span{v1, . . . ,vr−1}. Now, for each vk with 1 ≤ k ≤ r − 1, we have
〈g,vk〉 = det [ 〈v1,v1〉  〈v1,v2〉  · · ·  〈v1,vr−1〉  〈v1,vk〉
               〈v2,v1〉  〈v2,v2〉  · · ·  〈v2,vr−1〉  〈v2,vk〉
                 ...
               〈vr,v1〉  〈vr,v2〉  · · ·  〈vr,vr−1〉  〈vr,vk〉 ] = 0
because the last column is the same as the kth column. This shows g ∈ S⊥. Assume that
v1, v2, . . . , vr−1 are linearly independent, so that G_{r−1} ≠ 0. Rewrite (A4) as
vr = h + p,   where h = G_{r−1}⁻¹ g ∈ S⊥ and p = −G_{r−1}⁻¹ w ∈ S.
Thus p is the projection of vr onto the subspace spanned by v1, v2, . . . , vr−1.
Example. Find the projection of v = (0, 0, 1) onto the subspace spanned by the
vectors v1 = (1, 1, 1) and v2 = (1, 2, 2).
Solution. Form the vector
g = det [ 〈v1,v1〉  〈v1,v2〉  v1
          〈v2,v1〉  〈v2,v2〉  v2
          〈v,v1〉   〈v,v2〉   v ]
  = det [ 3  5  v1
          5  9  v2
          1  2  v ]
  = v1 − v2 + 2v.
So v = (1/2)g − (1/2)v1 + (1/2)v2. The required projection is p = −(1/2)v1 + (1/2)v2 = (0, 1/2, 1/2).
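The example can be confirmed by solving the normal equations Γc = (〈v,v1〉, 〈v,v2〉) for the coefficients of the projection directly (Python/numpy sketch, illustration only):

    import numpy as np

    v1, v2, v = np.array([1., 1., 1.]), np.array([1., 2., 2.]), np.array([0., 0., 1.])
    V = np.column_stack([v1, v2])
    Gamma = V.T @ V                          # the Gram matrix [[3, 5], [5, 9]]
    c = np.linalg.solve(Gamma, V.T @ v)      # coefficients: c = (-1/2, 1/2)
    p = V @ c
    print(p)                                 # [0.  0.5 0.5], as computed above
    assert np.allclose(V.T @ (v - p), 0)     # v - p is orthogonal to v1 and v2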
Appendix B*: Numerical Characters of Operators: Norm, Spectral Radius, Etc.
For a linear operator T on a finite dimensional inner product space V, the uniform norm,
or simply the norm, of T is defined to be
‖T‖ = max{‖Tx‖ : x ∈ V, ‖x‖ = 1}.
The following basic properties of the norm hold: 1. ‖T‖ ≥ 0, and ‖T‖ = 0 if and
only if T = O; 2. ‖S + T‖ ≤ ‖S‖ + ‖T‖; 3. ‖aT‖ = |a|‖T‖; 4. ‖ST‖ ≤ ‖S‖‖T‖; 5.
‖Tv‖ ≤ ‖T‖‖v‖; 6. ‖T∗‖ = ‖T‖. The last equality follows from the following observation:
‖T‖ = max{|〈Tx,y〉| : x, y ∈ V, ‖x‖ = ‖y‖ = 1}.
A less trivial property is the following “C∗–identity”:
‖T∗T‖ = ‖T‖².
Indeed, ‖T∗T‖ ≤ ‖T∗‖‖T‖ = ‖T‖², and, on the other hand, from
‖Tx‖² = 〈Tx, Tx〉 = 〈T∗Tx,x〉 ≤ ‖T∗Tx‖‖x‖ ≤ ‖T∗T‖‖x‖²
we have ‖T‖² = max_{‖x‖=1} ‖Tx‖² ≤ ‖T∗T‖. Hence ‖T∗T‖ = ‖T‖².
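The C∗–identity is easy to test numerically, taking for ‖·‖ the matrix 2–norm (largest singular value). A Python/numpy sketch, illustration only:

    import numpy as np

    rng = np.random.default_rng(4)
    T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    nT = np.linalg.norm(T, 2)                # operator norm = largest singular value
    nTT = np.linalg.norm(T.conj().T @ T, 2)
    assert np.isclose(nTT, nT ** 2)          # ||T*T|| = ||T||^2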
One purpose of introducing the notion of norm is to study convergence of operators.
We say that a sequence of operators {Tn} converges to T if lim_{n→∞} ‖Tn − T‖ = 0.
Also, we say that a series of operators ∑_{n=0}^{∞} Tn converges if the sequence of its partial
sums Sn = T0 + T1 + · · · + Tn converges. For example, it can be checked that, if ‖T‖ < 1,
then I − T is invertible and the series ∑_{n=0}^{∞} T^n converges to (I − T)⁻¹.
Recall that the spectrum σ(T ) of T is the set of all eigenvalues of T . The spectral
radius of T is defined to be
r(T ) = max{|λ| : λ ∈ σ(T )}.
It is easy to see that r(T ) ≤ ‖T‖. Indeed, we can choose an eigenvalue λ such that
|λ| = r(T ), and, letting v be a unit vector such that Tv = λv, we have
r(T ) = |λ| = ‖λv‖ = ‖Tv‖ ≤ ‖T‖.
Notice that r(T) = 0 if and only if 0 is the only eigenvalue of T, or, equivalently,
T is a nilpotent operator. When S and T commute, that is, ST = TS, we have the
inequalities r(S + T) ≤ r(S) + r(T) and r(ST) ≤ r(S)r(T). But in general, without
this commutativity condition, these two inequalities are not true. We have the following
important identity for the spectral radius:
r(T) = lim_{n→∞} ‖T^n‖^{1/n}.
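The identity can be watched numerically: the quantities ‖T^n‖^{1/n} settle down to r(T), even when ‖T‖ itself is much larger. A Python/numpy sketch (illustration only; the matrix is an arbitrary example):

    import numpy as np

    T = np.array([[0.0, 4.0], [0.0, 0.5]])   # r(T) = 0.5 while ||T|| > 4
    r = max(abs(np.linalg.eigvals(T)))
    for n in (10, 50, 200):
        print(np.linalg.norm(np.linalg.matrix_power(T, n), 2) ** (1.0 / n))
    print(r)                                 # the printed values approach r(T) = 0.5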
The usual proof of this identity uses complex analysis. The set
W(T) = {〈Tx,x〉 : ‖x‖ = 1}
is called the numerical range of T. It is true but highly nontrivial that W(T) is always
a convex set in the complex plane. The number w(T) = max{|λ| : λ ∈ W(T)} is called
the numerical radius of T. For all operators T, we have
r(T ) ≤ w(T ) ≤ ‖T‖ ≤ 2w(T ).
Given operators S and T, their Hilbert–Schmidt inner product is defined to be
〈S, T〉HS = ∑_{j=1}^{n} 〈Sej, Tej〉,
where {ej}1≤j≤n is an orthonormal basis of V. It can be checked that 〈S, T〉HS given
here is independent of the choice of the orthonormal basis {ej}1≤j≤n and hence is well
defined. The Hilbert–Schmidt norm of an operator T is defined to be
‖T‖HS = √(〈T, T〉HS) = √(∑_{k=1}^{n} ‖Tek‖²).
Let µ1 ≥ µ2 ≥ · · · ≥ µn be the singular values of T, that is, the eigenvalues of |T|
arranged in decreasing order. Then ‖T‖ = µ1 and ‖T‖²HS = ∑_{k=1}^{n} µk². For any
number p ≥ 1, one can define the p–norm ‖T‖p of T by putting
‖T‖p = (∑_{k=1}^{n} µk^p)^{1/p}.
Then ‖T‖HS = ‖T‖2, that is, the Hilbert–Schmidt norm is just the 2–norm. The following
properties of the p–norm are highly nontrivial: 1. ‖S + T‖p ≤ ‖S‖p + ‖T‖p; 2. ‖UT‖p =
‖TU‖p = ‖T‖p if U is unitary (or orthogonal in the real case); 3. ‖ST‖p ≤ ‖S‖‖T‖p
and ‖ST‖p ≤ ‖S‖p‖T‖; 4. ‖T‖ ≤ ‖T‖p ≤ ‖T‖1; 5. ‖T∗‖p = ‖T‖p. The norm ‖T‖1 is
called the trace norm of T. Notice that ‖T‖1 = tr |T|, the trace of |T|.
We can use an inner product space V and operators on V to model some quantum
system. A pure state of this system is a unit vector in V. An observable is an operator
on V. An eigenstate for an observable A is an eigenvector of A, and the corresponding
eigenvalue is the observed value of A at that state, as measured in a lab. If v is
not an eigenstate for A, then 〈Av,v〉 is the expected value (the word “expected” is in the
probabilistic sense) of A at the state v. By a “mixed” state we mean a positive operator
T with ‖T‖1 ≡ tr T = 1. If µ1, µ2, . . . , µn are the eigenvalues of T with corresponding
eigenvectors v1, v2, . . . , vn which form an orthonormal basis, then
〈A, T〉HS = ∑_{k=1}^{n} µk 〈Avk,vk〉,
which is the expected value of the observable A at the mixed state T. A pure state v
can be identified with the rank one positive operator T defined by Tx = 〈x,v〉v. Notice
that ‖v‖ = 1 implies tr T = 1. A mixed state is a convex combination of a set of pure
states.
Appendix C*: Linear Groups
By a linear group here we mean a group of linear operators (another word for these
is linear transformations), not a group that is linear. Let V be a vector space with
dim V = n. Recall that L(V) is the set of all linear operators on V. We say that a subset
G of L(V) is a linear group or simply a group if it satisfies the following conditions:
(LG1) The identity transformation I belongs to G.
(LG2) G is closed under multiplication, that is, if S and T are in G, then so is their
product ST .
(LG3) If S is in G, then S is invertible and its inverse S⁻¹ is also in G.
For example, all invertible elements in L(V) form a group denoted by GL(V), called the
general linear group. When the vector space V is Fn, we may identify GL(V) with
the group GL(n; F) of all invertible n × n matrices over F. A subgroup of GL(n; F)
is called a matrix group. For example, all orthogonal (real) n × n matrices form a group
denoted by O(n), called the orthogonal group. Notice that, for A ∈ O(n), we have
det(AA⊤) = det(I) = 1, or (det A)² = 1, and hence det A is either 1 or −1. The subgroup
of O(n) consisting of orthogonal matrices of determinant 1, called the special orthogonal
group, is denoted by SO(n). All unitary n × n matrices form a subgroup of GL(n; C),
denoted by U(n), called the unitary group. The special unitary group is defined to be
SU(n) = {A ∈ U(n) : det A = 1} ≡ {A ∈ GL(n; C) : AA∗ = A∗A = I, det A = 1}.
This group is important in several areas, including particle physics.
Let G be a group of n×n invertible matrices. An n×n matrix A is called a tangent
vector of G at I if A = Φ′(0) for some smooth curve Φ(t) in G satisfying Φ(0) = I.
Denote by LG the set of all tangent vectors of G at I.
Now we derive some basic properties of LG. First, take arbitrary A and B in LG. We
claim: A + B is also in LG. Indeed, by assumption, we have Φ′(0) = A and Ψ′(0) = B for
some parametric curves Φ(t) and Ψ(t) in G with Φ(0) = Ψ(0) = I. Then Θ(t) ≡ Φ(t)Ψ(t)
is also a parametric curve in G with Θ(0) = Φ(0)Ψ(0) = I. So Θ′(0) is in LG. The product
rule gives
Θ′(0) = Φ(0)Ψ′(0) + Φ′(0)Ψ(0) = I·B + A·I = A + B.
Hence A + B is in LG. Next we claim: if A is in LG, say A = Φ′(0) for a parametric
curve Φ(t) with Φ(0) = I, and if λ is a scalar, then λA is also in LG. Indeed, consider the
new parametric curve Ψ(t) = Φ(λt), which also lies in G. Clearly Ψ(0) = I and, by the
chain rule, Ψ′(t) = λΦ′(λt); consequently λA = λΦ′(0) = Ψ′(0) ∈ LG. Putting these two
claims together, we see that LG is a vector space. Now we make the third
claim: if A ∈ LG and B ∈ G, then BAB⁻¹ ∈ LG. Indeed, from A ∈ LG we know that
A = Φ′(0) for some curve Φ(t) in G satisfying Φ(0) = I. Let Ψ(t) = BΦ(t)B⁻¹, which
is a parametric curve in G satisfying Ψ(0) = I with Ψ′(0) = BΦ′(0)B⁻¹ = BAB⁻¹, and
hence BAB⁻¹ ∈ LG. The final claim is: if A, B are in LG, then so is AB − BA. Indeed,
we have Ψ′(0) = B for some parametric curve Ψ(t) in G with Ψ(0) = I. By our third
claim, C(t) = Ψ(t)⁻¹AΨ(t) is a parametric curve in LG. Since, by the first and
second claims, LG is a linear space of n × n matrices, the derivative C′(t)
of C(t), a curve in LG, is also in LG. In particular, C′(0) is in LG. Now
C′ = (Ψ⁻¹AΨ)′ = (Ψ⁻¹)′AΨ + Ψ⁻¹AΨ′ = −Ψ⁻¹Ψ′Ψ⁻¹AΨ + Ψ⁻¹AΨ′;
(here we have used (Ψ⁻¹)′ = −Ψ⁻¹Ψ′Ψ⁻¹). From Ψ(0) = I and Ψ′(0) = B, we obtain
C′(0) = −BA + AB = AB − BA. The expression AB − BA is called the Lie bracket
or Lie product of A and B, and is denoted by [A,B]. We call a set L of n × n matrices
a real (matrix) Lie algebra if, for all A and B in L and for all real numbers λ and µ,
both λA+ µB and [A,B] are in L. We have arrived at the following fact: If G is a matrix
group, then LG is a Lie algebra. Naturally we call LG the Lie algebra of G.
The description of tangent vectors of G at I is not easy to work with in concrete cases.
The following criterion is handy: a matrix A is in LG if and only if e^{tA} ∈ G for all t. Using
this criterion, we can easily find out the Lie algebras of matrix groups mentioned above.
Writing Mn(F) for the set of all n× n matrices with entries in F, we have
L O(n) = {A ∈ Mn(R) : A+A⊤ = O};
L SO(n) = {A ∈ Mn(R) : A+A⊤ = O, tr A = 0};
L U(n) = {A ∈ Mn(C) : A+A∗ = O};
L SU(n) = {A ∈ Mn(C) : A+A∗ = O, tr A = 0}.
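The criterion can be tested numerically: exponentiating a matrix with A + A⊤ = O should land in SO(n). A Python sketch using scipy.linalg.expm for the matrix exponential (illustration only):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0, -2.0],
                  [-1.0, 0.0, 3.0],
                  [2.0, -3.0, 0.0]])         # skew symmetric: A + A^T = O
    Q = expm(A)                              # e^A
    assert np.allclose(Q @ Q.T, np.eye(3))   # Q is orthogonal
    assert np.isclose(np.linalg.det(Q), 1.0) # det Q = 1, so Q is in SO(3)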
Appendix D*: Rotations
By a rotation we mean an element A in the matrix group SO(3); in other words,
A is a 3 × 3 real matrix with AA⊤ = A⊤A = I and det A = 1. Since A is a 3 × 3 real
matrix, its characteristic polynomial p(x) is a real polynomial of (odd) degree 3. Thus
p(x) must have a real root, say r, and the other two are either both real or a conjugate
pair. Let v be an eigenvector corresponding to r, that is, Av = rv, v ≠ 0. Now
‖v‖ = ‖Av‖ = ‖rv‖ = |r|‖v‖ and hence |r| = 1. So r is either 1 or −1. In case the other
two eigenvalues form a conjugate pair, say λ and λ̄, we have
1 = det A = rλλ̄ = r|λ|²,
which implies r > 0 and hence r = 1. If the other two eigenvalues are also real, say
r1, r2 ∈ R, then we also have |r1| = 1 and |r2| = 1. Furthermore, 1 = det A = r r1 r2, and
hence one of r, r1, r2 is positive. Thus we have shown that 1 is always an eigenvalue of a
rotation matrix A. Let v1 be a unit vector such that Av1 = v1. Applying A−1 to
both sides, we get v1 = A−1v1, that is, A−1v1 = v1. Let S = {v1}⊥, the orthogonal
complement of v1. Notice that, if v ∈ S, then
〈Av,v1〉 = 〈v, A⊤v1〉 = 〈v, A−1v1〉 = 〈v,v1〉 = 0
and hence Av ∈ S. This shows that S is an invariant subspace of A. The restriction of A
to S, say AS, is necessarily an orthogonal operator with determinant 1. Thus, if vectors
v2, v3 form an orthonormal basis of the 2–dimensional space S, the matrix representation
of AS relative to this basis is necessarily of the form
[ cos θ  −sin θ ; sin θ  cos θ ],
a rotation matrix with θ as its angle of rotation. Relative to the orthonormal basis B =
{v1, v2, v3}, the representation matrix of A is given by
[A]B = [ 1  0  0 ; 0  cos θ  −sin θ ; 0  sin θ  cos θ ].    (C1)
Geometrically, vector v1 gives the direction of the axis of rotation and θ is the angle of
rotation. The connection between A and [A]B is a matter of change of basis. Let
V = [v1 v2 v3], an orthogonal matrix. Then A = V[A]BV⁻¹. Taking traces on both
sides, we get
tr A = tr(V[A]BV⁻¹) = tr [A]B = 1 + 2 cos θ.    (C2)
This gives us the recipe for finding the angle of rotation. Next we describe a way to find
the axis of rotation. As we have seen, Av1 = v1 and A⊤v1 = A⁻¹v1 = v1. Hence
(A − A⊤)v1 = 0. But C = A − A⊤ is a skew symmetric matrix, that is, C⊤ = −C. We
can put C in the following form
C = [ 0  −a3  a2 ; a3  0  −a1 ; −a2  a1  0 ];
(see the final part of §2.4 in Chapter I). Since Cx = a × x, where a = (a1, a2, a3), we have
Ca = 0. We can set v1 = ‖a‖⁻¹a.
Example. Find the angle and the axis of the “rotation sequence”
R = [ cos α  −sin α  0 ; sin α  cos α  0 ; 0  0  1 ] [ 1  0  0 ; 0  cos β  −sin β ; 0  sin β  cos β ]
  ≡ [ cos α  −sin α cos β  sin α sin β ; sin α  cos α cos β  −cos α sin β ; 0  sin β  cos β ].
Solution. Denote by θ the angle of the rotation R. Then
1 + 2 cos θ = tr R = cos α + cos α cos β + cos β = (1 + cos α)(1 + cos β) − 1,
and hence cos θ = (1/2)(1 + cos α)(1 + cos β) − 1, from which θ can be obtained. Form the
skew symmetric matrix (for simplicity, we do not specify the lower left part of this matrix)
R − R⊤ = [ 0  −sin α (1 + cos β)  sin α sin β ; ∗  0  −sin β (1 + cos α) ; ∗  ∗  0 ].
The axis of rotation is parallel to the vector
v = (sin β (1 + cos α), sin α sin β, sin α (1 + cos β)).
A brute force computation shows Rv = v. We remark that computing rotation sequences
is useful in some practical problems, such as navigation.
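The recipe of this appendix is easy to run numerically for particular values of α and β (Python/numpy sketch, illustration only):

    import numpy as np

    a, b = 0.7, 1.1                          # alpha and beta
    Rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
    Rx = np.array([[1, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
    R = Rz @ Rx
    theta = np.arccos((np.trace(R) - 1) / 2)             # the angle, from (C2)
    v = np.array([np.sin(b) * (1 + np.cos(a)),
                  np.sin(a) * np.sin(b),
                  np.sin(a) * (1 + np.cos(b))])          # the axis, from R - R^T
    assert np.allclose(R @ v, v)                         # v spans the axis of rotation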
Appendix E*: SU(2), Quaternions, and Spinors
Recall that SU(2) is the matrix group of all 2 × 2 unitary matrices of determinant
equal to 1:
SU(2) = {U ∈ U(2) : det(U) = 1}.
Let U be in SU(2). Write down U and UU∗ explicitly as follows:
U = [ z  w ; u  v ]   and   UU∗ = [ z  w ; u  v ] [ z̄  ū ; w̄  v̄ ] = [ |z|² + |w|²   zū + wv̄ ; uz̄ + vw̄   |u|² + |v|² ].
From UU∗ = I we get |z|² + |w|² = 1 and uz̄ + vw̄ = 0. Assume w ≠ 0 and z ≠ 0.
Then we may write u = αw̄ and v = βz̄ for some α and β. Now uz̄ + vw̄ = 0 gives
(α + β)z̄w̄ = 0 and hence α + β = 0. Thus
1 = det(U) = zv − wu = z(βz̄) − w(αw̄) = z(βz̄) − w(−βw̄) = β(|z|² + |w|²) = β.
Therefore U is of the form
U = [ z  w ; −w̄  z̄ ],   where |z|² + |w|² ≡ zz̄ + ww̄ = 1.    (E.1)
In case z = 0 or w = 0, U has the same form (please check this). We conclude: a 2 × 2
matrix U is in SU(2) if and only if it can be expressed as in (E.1) above.
Writing z = x0 + ix1 and w = x2 + ix3 in (E.1), we have
U = [ z  w ; −w̄  z̄ ] = [ x0 + ix1  x2 + ix3 ; −x2 + ix3  x0 − ix1 ] = x0 1 + x1 i + x2 j + x3 k,    (E.2)
where
1 = [ 1  0 ; 0  1 ],   i = [ i  0 ; 0  −i ],   j = [ 0  1 ; −1  0 ],   k = [ 0  i ; i  0 ].    (E.3)
The matrix U in (E.1) belongs to SU(2) if and only if
|z|² + |w|² ≡ x0² + x1² + x2² + x3² = 1.
An expression written as the RHS of (E.2), without the condition x0² + x1² + x2² + x3² = 1
imposed, is called a quaternion. Since the theory of quaternions was discovered by
Hamilton, we denote the collection of all quaternions by H. The algebra of quaternions is
determined by the following identities among the basic units 1, i, j, k:
1q = q1 = q,   i² = j² = k² = −1,   ij = −ji = k,   jk = −kj = i,   ki = −ik = j,    (E.4)
where q is any quaternion. These identities can be checked by direct computation. We
usually suppress the unit 1 of the quaternion algebra H and write x0 for x0 1. Let q be
the quaternion given as (E.2), which is a 2 × 2 complex matrix. Its adjoint is given by
q∗ = [ z̄  −w ; w̄  z ] = [ x0 − ix1  −x2 − ix3 ; x2 − ix3  x0 + ix1 ] = x0 − x1i − x2j − x3k,
which is also called the conjugate of q. A direct computation shows
q∗q = qq∗ = (|z|² + |w|²)1 ≡ |z|² + |w|² = det(q) = x0² + x1² + x2² + x3².
The square root of the last expression is called the norm of q and is denoted by ‖q‖. Thus
q∗q = qq∗ = ‖q‖².
So, q is in SU(2) if and only if ‖q‖ = 1:
SU(2) = {q = x0 + x1i + x2j + x3k ∈ H : ‖q‖² ≡ x0² + x1² + x2² + x3² = 1}.
Regarding H as the 4-dimensional space with rectangular coordinates x0, x1, x2, x3, we
may identify SU(2) with the 3-dimensional sphere x0² + x1² + x2² + x3² = 1, which will be
simply called the 3-sphere. Notice that, if we write z = x0 + x1i and w = x2 + x3i,
then q = x0 + x1i + x2j + x3k can be written as q = z + wj, in view of ij = k.
For a quaternion q = x0 + x1i + x2j + x3k, we often write q = x0 + x, where x0 is
called the scalar part and x = x1i + x2j + x3k is called the vector part. From (E.4) we
see how to multiply “pure vector” quaternions. It is easy to check that the product of two
quaternions q = x0 + x and r = y0 + y is determined by
qr = (x0 + x)(y0 + y) = x0y0 + x0y + y0x + xy,   where xy = −x · y + x × y.    (E.5)
The “scalar plus vector” decomposition q = x0 + x of a quaternion is also convenient for
finding its conjugate, as we can easily check that
q∗ = (x0 + x)∗ = x0 − x,    (E.6)
which resembles the conjugation z = x + iy ↦ z̄ = x − iy of complex numbers. From (E.6) we see that
a quaternion q is a pure vector if and only if q∗ = −q, that is, q is skew Hermitian.
We identify a pure vector x = x1i + x2j + x3k with the vector x = (x1, x2, x3) in
R³. For each q ∈ SU(2), define a linear transformation R(q) on R³ by putting
R(q)x = q∗xq.
(The definition of R(q) here comes from the adjoint representation of a matrix group,
which is SU(2) in the present case, described in Appendix C above.) We
can check that y ≡ R(q)x is indeed in R³:
y∗ = (R(q)x)∗ = (q∗xq)∗ = q∗x∗q = q∗(−x)q = −q∗xq = −y.
The most interesting thing about R(q) is that it is an isometry: x and y ≡ R(q)x have
the same length. Indeed,
‖y‖² = y∗y = (q∗xq)∗(q∗xq) = q∗x∗qq∗xq = q∗x∗xq = q∗‖x‖²q = ‖x‖²q∗q = ‖x‖².
Using a connectedness argument from topology, one can show that R(q) is actually a
rotation (not a reflection) in 3–space. It turns out that every rotation in 3–space can be
written in the form R(q), and we call this the spinor representation of the rotation. Also,
we call SU(2) the spinor group. It is an essential mathematical device for describing
electron spin and for studying aircraft stability. It is also used to explain how a falling cat
can turn its body 180° in midair in order to achieve a safe landing, without violating
the basic physical law of conservation of angular momentum.
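A numerical sketch of the spinor representation (Python/numpy, illustration only): pick a unit quaternion q as a 2 × 2 matrix, conjugate a pure vector x by it, and read off the rotated vector. For the particular q chosen here, the map R(q) is a rotation by 120° about the axis (1, 1, 1).

    import numpy as np

    one = np.eye(2)
    i = np.array([[1j, 0], [0, -1j]])
    j = np.array([[0, 1], [-1, 0]])
    k = np.array([[0, 1j], [1j, 0]])

    q = 0.5 * (one + i + j + k)              # ||q|| = 1, so q is in SU(2)

    def embed(x):                            # x in R^3 as a pure vector quaternion
        return x[0] * i + x[1] * j + x[2] * k

    x = np.array([1.0, 2.0, 3.0])
    y = q.conj().T @ embed(x) @ q            # R(q)x = q* x q
    y_vec = np.array([y[0, 0].imag, y[0, 1].real, y[0, 1].imag])
    print(y_vec)                             # the rotated vector
    assert np.isclose(np.linalg.norm(y_vec), np.linalg.norm(x))   # same length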