
Undergraduate Research Opportunity Programme in Science

NORMS IN VECTOR SPACE

Agus Leonardi Soenjaya
Supervisor: Assoc. Prof. Victor Tan
Department of Mathematics
National University of Singapore
2010


Contents

List of Symbols
Abstract

1 Vector Norms
1.1 Introduction
1.2 Basic Properties of Vector Norms
1.3 Norms and Inner Products
1.4 Analytic Properties of Vector Norms
1.5 Geometric Properties of Vector Norms
1.6 Duality of Vector Norms

2 Matrix Norms
2.1 Basic Properties of Matrix Norms
2.2 Induced Matrix Norms
2.3 Generalised Matrix Norms

3 Applications of Norms
3.1 Sequences and Series of Matrices
3.2 Bounds for Roots of Algebraic Equations
3.3 Perturbation of Eigenvalues

Bibliography


List of Symbols

The list of common symbols used throughout this paper is included below for reference.

R                    the real numbers
C                    the complex numbers
F                    a field, which is either R or C
Rn                   space of n-tuples of real numbers
Cn                   space of n-tuples of complex numbers
Fn                   space of n-tuples of either real numbers or complex numbers
Mn                   space of n × n matrices over the field F
Re(z)                real part of a complex number z
Im(z)                imaginary part of a complex number z
〈u,v〉                inner product of u and v
‖v‖                  norm of a vector v
‖v‖D                 dual norm of a vector v
v*                   conjugate transpose of a vector v
B‖·‖                 unit ball of the vector norm ‖·‖
|||A|||              norm of a matrix A
G(A)                 generalised matrix norm of a matrix A
A*                   conjugate transpose of a matrix A
A^{-1}               inverse of a matrix A
A = [aij]            matrix A with (i,j)-th entry equal to aij
diag(d1, . . . , dn) diagonal matrix in Mn with diagonal entries d1, . . . , dn
ρ(A)                 spectral radius of a matrix A
det(A)               determinant of a matrix A
κ(A)                 condition number of a matrix A with respect to a given matrix norm


Abstract

This paper discusses the notion of norms and their properties, mainly in finite-dimensional vector spaces over the real or complex field, as an abstraction of the 'size' of vectors and matrices. Some applications of norms will also be studied.

We begin by discussing the basic properties of vector norms, as well as their analytic and geometric properties, in Chapter 1. Furthermore, we study some important classes of vector norms, including vector norms derived from inner products and the duals of vector norms. In Chapter 2, we will generalise the notion of norms to matrices in Mn. Some important classes of matrix norms, as well as a further generalisation, namely the generalised matrix norm on Mn, will also be studied.

Some applications of norms will be discussed in Chapter 3. We will generalise the notion of series to matrices in Mn, using norms to derive various properties analogous to those of series of real-valued functions. We will also derive some bounds for the roots of complex polynomials using the matrix norms discussed earlier. Lastly, some basic theory on the perturbation of eigenvalues, which is important in numerical linear algebra, will be discussed.


Chapter 1

Vector Norms

1.1 Introduction

In Mathematics, it is often necessary to measure the 'size' of a vector in Cn or a matrix in Mn. One notion of size which arises naturally in R2 or R3 is the Euclidean length. For instance, given a vector v = (v1, v2) ∈ R2, one can define the Euclidean length of this vector to be (v1^2 + v2^2)^{1/2}. How could one generalise the notion of 'size' further, for instance to complex-valued vectors in an n-dimensional complex vector space? What about the 'size' of a matrix? One way to answer these questions is to study the notion of norms for vectors and matrices.

1.2 Basic Properties of Vector Norms

Definition 1.2.1 (Vector Norm Axioms). Let V be a vector space over the field F (R or C). A function ‖·‖ : V → R is a vector norm if for all x, y ∈ V, the following axioms are satisfied:

1. (Non-negative) ‖x‖ ≥ 0.

2. (Positive) ‖x‖ = 0 if and only if x = 0.

3. (Homogeneous) ‖cx‖ = |c|‖x‖ for all scalars c ∈ F.

4. (Triangle Inequality) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Proposition 1.2.2 (Generalised Triangle Inequality). If ‖·‖ is a vector norm on V, then

| ‖x‖ − ‖y‖ | ≤ ‖x + y‖ ≤ ‖x‖ + ‖y‖

for all x, y ∈ V.

Proof. The inequality ‖x + y‖ ≤ ‖x‖ + ‖y‖ follows from the norm axioms. It remains to prove the other inequality. Since y = −x + (x + y), we have

‖y‖ ≤ ‖−x‖ + ‖x + y‖ = ‖x‖ + ‖x + y‖

by the triangle inequality and the homogeneity axiom. From this, it follows that

‖y‖ − ‖x‖ ≤ ‖x + y‖.

Similarly, by writing x = −y + (x + y), we have

‖x‖ − ‖y‖ ≤ ‖x + y‖.

Hence, we have proven ±(‖x‖ − ‖y‖) ≤ ‖x + y‖, which is what was to be proven.

Below are some examples of common vector norms on finite-dimensional vector spaces. In these examples, we let x = (x1, x2, . . . , xn) ∈ Fn.

Example 1.2.3 (l1 norm on Fn).

‖x‖1 ≡ |x1| + |x2| + . . . + |xn|

The l1 norm is sometimes called the sum norm or the Manhattan norm.

Example 1.2.4 (l2 norm on Fn).

‖x‖2 ≡ (|x1|^2 + |x2|^2 + . . . + |xn|^2)^{1/2}

The l2 norm is also called the Euclidean norm.

Example 1.2.5 (l∞ norm on Fn).

‖x‖∞ ≡ max{|x1|, |x2|, . . . , |xn|}

The l∞ norm is also called the max norm.

Example 1.2.6 (lp norm on Fn).

‖x‖p ≡ (∑_{i=1}^{n} |xi|^p)^{1/p}

for p ≥ 1.
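The norms in Examples 1.2.3–1.2.6 are straightforward to compute numerically. The following is a minimal illustrative sketch, assuming NumPy is available; the test vector x and the values of p are arbitrary choices, not taken from the text.

```python
# Illustrative sketch of Examples 1.2.3-1.2.6; x and p are arbitrary choices.
import numpy as np

x = np.array([3.0, -4.0, 1.0])

l1   = np.sum(np.abs(x))               # l1 (sum / Manhattan) norm: 8.0
l2   = np.sqrt(np.sum(np.abs(x)**2))   # l2 (Euclidean) norm: sqrt(26)
linf = np.max(np.abs(x))               # l-infinity (max) norm: 4.0

def lp_norm(x, p):
    """The lp norm of Example 1.2.6, for p >= 1."""
    return np.sum(np.abs(x)**p) ** (1.0 / p)

print(l1, l2, linf)
print(lp_norm(x, 1), lp_norm(x, 2))    # agree with l1 and l2 above
```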

To prove that the lp norm is in general a vector norm (in particular, to prove the triangle inequality axiom), we need to introduce some inequalities. More details can be found in the references given.

Theorem 1.2.7 (Hölder's Inequality). Let 1 < p, q < ∞ and p^{-1} + q^{-1} = 1. Suppose z1, z2, . . . , zn, w1, w2, . . . , wn ∈ C. Then

∑_{k=1}^{n} |zk wk| ≤ (∑_{k=1}^{n} |zk|^p)^{1/p} (∑_{k=1}^{n} |wk|^q)^{1/q}

with equality if and only if all the zk's are 0, or there exists a constant M ≥ 0 such that |wk|^q = M|zk|^p for all k.

Proof. See [6] for proofs and more details.

Theorem 1.2.8 (Minkowski's Inequality). Let 1 ≤ p < ∞. Suppose a1, a2, . . . , an, b1, b2, . . . , bn ∈ C. Then

(∑_{k=1}^{n} |ak + bk|^p)^{1/p} ≤ (∑_{k=1}^{n} |ak|^p)^{1/p} + (∑_{k=1}^{n} |bk|^p)^{1/p}

Proof. See [6] for proofs and more details.

Now, we are able to justify rigorously that the lp-norm is indeed a vector norm. Moreover, we will show that the l∞-norm is also a vector norm.

Proposition 1.2.9. The lp-norm is a vector norm on Fn for 1 ≤ p < ∞.

Proof. We need to show that all the norm axioms are satisfied. Let x = (x1, . . . , xn) and c ∈ F.

1. (Non-negative) Clearly ‖x‖p = (∑_{k=1}^{n} |xk|^p)^{1/p} ≥ 0.

2. (Positive) The inequality above holds with equality if and only if |xi|^p = 0 for i = 1, 2, . . . , n, which implies that all the xi's are 0, i.e. x = 0.

3. (Homogeneous) ‖cx‖p = (∑_{k=1}^{n} |cxk|^p)^{1/p} = (|c|^p ∑_{k=1}^{n} |xk|^p)^{1/p} = |c|‖x‖p.

4. (Triangle Inequality) This follows immediately from Minkowski's Inequality.

Hence, the lp-norm is a vector norm on Fn for 1 ≤ p < ∞.

Proposition 1.2.10. The l∞-norm is a vector norm on Fn.

Proof. We need to show that all the norm axioms are satisfied. Let x = (x1, . . . , xn) and y = (y1, . . . , yn) be vectors in Fn and c ∈ F.

1. (Non-negative) Clearly ‖x‖∞ = max_{1≤k≤n} |xk| ≥ 0.

2. (Positive) As |xi| ≥ 0 for each i = 1, 2, . . . , n, the inequality above holds with equality if and only if all the xi's are 0, i.e. x = 0.

3. (Homogeneous) ‖cx‖∞ = max_{1≤k≤n} |cxk| = |c| max_{1≤k≤n} |xk| = |c|‖x‖∞.

4. (Triangle Inequality) Using the triangle inequality for real numbers, we have

‖x + y‖∞ = max_{1≤k≤n} |xk + yk|
         ≤ max_{1≤k≤n} (|xk| + |yk|)
         ≤ max_{1≤k≤n} |xk| + max_{1≤k≤n} |yk|
         = ‖x‖∞ + ‖y‖∞

Hence, we have proven that the l∞-norm is a vector norm on Fn.

Furthermore, the theorem below shows how the lp-norm is connected with the l∞-norm.

Theorem 1.2.11. ‖x‖∞ = lim_{p→∞} ‖x‖p for all x ∈ Fn.

Proof. Let x = (x1, . . . , xn). For each k = 1, 2, . . . , n, we have

∑_{i=1}^{n} |xi|^p ≥ |xk|^p, i.e. (∑_{i=1}^{n} |xi|^p)^{1/p} ≥ |xk|.

Then we have

(∑_{i=1}^{n} |xi|^p)^{1/p} ≥ max_{1≤k≤n} |xk|

i.e.

‖x‖p ≥ ‖x‖∞    (1.1)

Moreover, for p ≥ 1, we have

(‖x‖p)^p = ∑_{i=1}^{n} |xi|^p ≤ n (max_{1≤k≤n} |xk|)^p = n‖x‖∞^p    (1.2)

Hence, combining inequalities (1.1) and (1.2), we have ‖x‖∞ ≤ ‖x‖p ≤ n^{1/p}‖x‖∞. Taking the limit, and using the fact that lim_{p→∞} n^{1/p} = 1, we have the conclusion.
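The limit in Theorem 1.2.11, together with the sandwich bound ‖x‖∞ ≤ ‖x‖p ≤ n^{1/p}‖x‖∞ used in its proof, can be observed numerically. The sketch below assumes NumPy and an arbitrary test vector; factoring out the maximum entry, exactly as in the proof, also avoids floating-point overflow for large p.

```python
# Numerical companion to Theorem 1.2.11; the vector x is an arbitrary example.
import numpy as np

def lp_norm(x, p):
    # factor out max|x_i|, as in the proof; this also prevents overflow
    m = np.max(np.abs(x))
    if m == 0.0:
        return 0.0
    return m * np.sum((np.abs(x) / m) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 1.0])
n, linf = x.size, np.max(np.abs(x))

for p in [1, 2, 5, 20, 100, 1000]:
    lp = lp_norm(x, p)
    # sandwich bound from (1.1)-(1.2): ||x||_inf <= ||x||_p <= n^(1/p) ||x||_inf
    assert linf <= lp <= n ** (1.0 / p) * linf + 1e-12
    print(p, lp)   # the values decrease towards ||x||_inf = 4.0
```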

Now we want to construct even bigger classes of vector norms. It can be shown that any positive linear combination of vector norms forms a new vector norm, and that the maximum of several vector norms also forms a new vector norm. We can also look at several other ways to construct new vector norms. The following propositions establish these results more precisely.

Proposition 1.2.12. Let V be a finite-dimensional vector space over the field F (R or C). Let ‖·‖α and ‖·‖β be two given vector norms and k1, k2 ∈ R+. Then ‖·‖γ ≡ k1‖·‖α + k2‖·‖β is also a vector norm.

Proof. Clearly, the non-negativity and positivity axioms are satisfied. We check that the remaining axioms are also satisfied. Let x, y ∈ V and c ∈ F.

1. (Homogeneous) ‖cx‖γ = k1‖cx‖α + k2‖cx‖β = |c|(k1‖x‖α + k2‖x‖β) = |c|‖x‖γ by the homogeneity of each given vector norm.

2. (Triangle Inequality) We have

‖x + y‖γ = k1‖x + y‖α + k2‖x + y‖β
         ≤ k1(‖x‖α + ‖y‖α) + k2(‖x‖β + ‖y‖β)
         = (k1‖x‖α + k2‖x‖β) + (k1‖y‖α + k2‖y‖β)
         = ‖x‖γ + ‖y‖γ

Therefore, ‖·‖γ is a vector norm.

Proposition 1.2.13. Let V be a finite-dimensional vector space over the field F (R or C). Let ‖·‖α and ‖·‖β be two given vector norms. Then ‖·‖γ ≡ max{‖·‖α, ‖·‖β} is also a vector norm.

Proof. Clearly, the non-negativity and positivity axioms are satisfied. We check that the remaining norm axioms are satisfied. Let x, y ∈ V and c ∈ F.

1. (Homogeneous) ‖cx‖γ = max{‖cx‖α, ‖cx‖β} = max{|c|‖x‖α, |c|‖x‖β} = |c|‖x‖γ by the homogeneity of each given vector norm.

2. (Triangle Inequality) We have

‖x + y‖γ = max{‖x + y‖α, ‖x + y‖β}
         ≤ max{‖x‖α + ‖y‖α, ‖x‖β + ‖y‖β}
         ≤ max{‖x‖α, ‖x‖β} + max{‖y‖α, ‖y‖β}
         = ‖x‖γ + ‖y‖γ

where we used the triangle inequality for each vector norm. Hence, the result is proven.

Proposition 1.2.14. Let ‖·‖ be a vector norm on Fn, where F is R or C. If T ∈ Mn is non-singular, then ‖·‖T defined by ‖x‖T ≡ ‖Tx‖, where x ∈ Fn, is also a vector norm on Fn.

Proof. Clearly, the non-negativity and positivity axioms are satisfied (positivity uses the non-singularity of T, since Tx = 0 if and only if x = 0). We check that the remaining norm axioms are satisfied. Let x, y ∈ Fn and c ∈ F.

1. (Homogeneous) ‖cx‖T = ‖T(cx)‖ = |c|‖Tx‖ = |c|‖x‖T by the homogeneity of the given vector norm.

2. (Triangle Inequality) We have

‖x + y‖T = ‖T(x + y)‖ ≤ ‖Tx‖ + ‖Ty‖ = ‖x‖T + ‖y‖T

hence proving the result.
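The constructions of Propositions 1.2.12–1.2.14 are easy to experiment with. The following sketch, assuming NumPy, builds the three new norms from the l1 and l∞ norms and spot-checks the triangle inequality; the weights k1, k2 and the matrix T are arbitrary illustrative choices.

```python
# Sketch of Propositions 1.2.12-1.2.14 on R^2; k1, k2 and T are arbitrary.
import numpy as np

l1   = lambda x: np.sum(np.abs(x))
linf = lambda x: np.max(np.abs(x))

k1, k2 = 2.0, 3.0
combo = lambda x: k1 * l1(x) + k2 * linf(x)     # Proposition 1.2.12
maxed = lambda x: max(l1(x), linf(x))           # Proposition 1.2.13

T = np.array([[2.0, 1.0],
              [0.0, 1.0]])                      # any non-singular T works
t_norm = lambda x: l1(T @ x)                    # Proposition 1.2.14

x, y = np.array([1.0, -2.0]), np.array([0.5, 4.0])
for norm in (combo, maxed, t_norm):             # spot-check the triangle inequality
    assert norm(x + y) <= norm(x) + norm(y) + 1e-12
```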

Another way to construct a new norm from a given vector norm is by using the concept of duality. This will be discussed in a later section.

1.3 Norms and Inner Products

In this section, we will study another class of vector norms which is just as important as the vector norms previously described. These are vector norms derived from the so-called 'inner product'. The notion of an inner product comes from the study of the angle between two vectors in Cn. We will formalise this concept in this section.

Definition 1.3.1 (Inner Product Axioms). Let V be a vector space over the field F (R or C). A function 〈·,·〉 : V × V → F is an inner product if for all x, y, z ∈ V, the following axioms are satisfied:

1. (Non-negative) 〈x,x〉 ≥ 0.

2. (Positive) 〈x,x〉 = 0 if and only if x = 0.

3. (Additive) 〈x + y, z〉 = 〈x,z〉 + 〈y,z〉.

4. (Homogeneous) 〈cx,y〉 = c〈x,y〉 for all scalars c ∈ F.

5. (Hermitian) 〈x,y〉 = \overline{〈y,x〉}, where the bar denotes complex conjugation.

Below are some useful properties of inner products, as consequences of the above axioms.

Proposition 1.3.2. Let V be a vector space over the field F (R or C) equipped with the inner product 〈·,·〉. Let x, y, z ∈ V and c ∈ F. Then

1. 〈x, cy〉 = \overline{c}〈x,y〉.

2. 〈x, y + z〉 = 〈x,y〉 + 〈x,z〉.

3. 〈x,y〉 = 0 for all y ∈ V if and only if x = 0.

4. 〈x, 〈x,y〉y〉 = |〈x,y〉|^2.

Proof. 1. We have

〈x, cy〉 = \overline{〈cy,x〉} = \overline{c〈y,x〉} = \overline{c} · \overline{〈y,x〉} = \overline{c}〈x,y〉

2. We have

〈x, y + z〉 = \overline{〈y + z, x〉} = \overline{〈y,x〉 + 〈z,x〉} = \overline{〈y,x〉} + \overline{〈z,x〉} = 〈x,y〉 + 〈x,z〉

3. (⇒) Suppose that 〈x,y〉 = 0 for all y ∈ V. Observe that, in particular, the identity is satisfied when y = x, i.e. 〈x,x〉 = 0, which by the positivity axiom implies x = 0, as required.

(⇐) Suppose that x = 0. Then we have

〈0,y〉 = 〈0 + 0, y〉 = 〈0,y〉 + 〈0,y〉

i.e. 〈0,y〉 = 0 for all y ∈ V, as required.

4. Treating 〈x,y〉 as a constant and using identity 1, we have

〈x, 〈x,y〉y〉 = \overline{〈x,y〉}〈x,y〉 = |〈x,y〉|^2

Below is an important inequality involving inner products, known as the Cauchy-Schwarz Inequality.

Theorem 1.3.3 (Cauchy-Schwarz Inequality). If 〈·,·〉 is an inner product on a vector space V over the field F (R or C), then

|〈x,y〉|^2 ≤ 〈x,x〉〈y,y〉

for all x, y ∈ V. Equality occurs if and only if x and y are linearly dependent.

Proof. If y = 0, then the assertion is trivial. Assume y ≠ 0, and let t ∈ R. Consider

p(t) ≡ 〈x + ty, x + ty〉 = 〈x,x〉 + 2t Re 〈x,y〉 + t^2〈y,y〉

which is a quadratic polynomial in t with real coefficients. By the non-negativity axiom of the inner product, we have p(t) ≥ 0 for all real values of t. The discriminant of p(t) is therefore non-positive, i.e.

(2 Re 〈x,y〉)^2 − 4〈y,y〉〈x,x〉 ≤ 0

and hence

(Re 〈x,y〉)^2 ≤ 〈x,x〉〈y,y〉.

Since this inequality holds for any pair of vectors, we may replace y by 〈x,y〉y. Then we have

(Re 〈x, 〈x,y〉y〉)^2 ≤ 〈x,x〉〈y,y〉|〈x,y〉|^2.

However, we have Re 〈x, 〈x,y〉y〉 = Re |〈x,y〉|^2 = |〈x,y〉|^2. Therefore the inequality above is equivalent to

|〈x,y〉|^4 ≤ 〈x,x〉〈y,y〉|〈x,y〉|^2.

Now, if 〈x,y〉 = 0, then the assertion is trivial, as it follows directly from the non-negativity axiom of the inner product. Otherwise, we may divide both sides of the inequality by |〈x,y〉|^2 to obtain the desired result.

Finally, by the positivity axiom of the inner product, p(t) can have a real (double) root if and only if x + ty = 0 for some t; together with the trivial cases above, this corresponds exactly to x and y being linearly dependent.
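A quick numerical illustration of Theorem 1.3.3 for the standard inner product 〈x,y〉 = y*x on Cn follows; it is a random-vector sanity check assuming NumPy, not a proof.

```python
# Random-vector check of the Cauchy-Schwarz Inequality for <x, y> = y* x on C^4.
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    lhs = abs(np.vdot(y, x)) ** 2                     # |<x, y>|^2
    rhs = np.vdot(x, x).real * np.vdot(y, y).real     # <x, x><y, y>
    assert lhs <= rhs + 1e-9

# equality case: linearly dependent vectors
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = (2.0 - 1.0j) * x
assert np.isclose(abs(np.vdot(y, x))**2, np.vdot(x, x).real * np.vdot(y, y).real)
```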

Now, we are in a position to define another class of norms, namely those derived from an inner product. This is stated in the theorem below.

Theorem 1.3.4. If 〈·,·〉 is an inner product on a vector space V over the field F (R or C), then ‖x‖ ≡ √〈x,x〉 is a vector norm on V. In this case, ‖·‖ is said to be a vector norm derived from the inner product 〈·,·〉.

Proof. Let x, y ∈ V and c ∈ F.

1. (Non-negative) By the inner product axioms, √〈x,x〉 ≥ 0.

2. (Positive) By the inner product axioms, equality above occurs only when x = 0.

3. (Homogeneous) We have ‖cx‖ = √〈cx,cx〉 = √(c\overline{c}〈x,x〉) = |c|√〈x,x〉 = |c|‖x‖.

4. (Triangle Inequality) We have

‖x + y‖^2 = 〈x + y, x + y〉
          = 〈x,x〉 + 〈x,y〉 + 〈y,x〉 + 〈y,y〉
          = ‖x‖^2 + 2 Re (〈x,y〉) + ‖y‖^2
          ≤ ‖x‖^2 + 2√(〈x,x〉〈y,y〉) + ‖y‖^2
          = ‖x‖^2 + 2‖x‖‖y‖ + ‖y‖^2
          = (‖x‖ + ‖y‖)^2

where above we used the Cauchy-Schwarz Inequality. Therefore, we have ‖x + y‖ ≤ ‖x‖ + ‖y‖ as required.

Hence, we have shown that ‖·‖ is a vector norm.

Next, we will formulate a necessary and sufficient condition for a vector norm to be derived from an inner product.

Theorem 1.3.5 (Parallelogram and Polarisation Identities). Let V be a vector space over the field F (R or C) equipped with a vector norm ‖·‖. Then ‖·‖ is derived from an inner product if and only if the Parallelogram Identity

‖x + y‖^2 + ‖x − y‖^2 = 2(‖x‖^2 + ‖y‖^2)

is satisfied for all x, y ∈ V. In such a case, the inner product is necessarily given by the Polarisation Identity:

1. (For a real vector space) 〈x,y〉 = (1/4)(‖x + y‖^2 − ‖x − y‖^2).

2. (For a complex vector space) 〈x,y〉 = (1/4)(‖x + y‖^2 − ‖x − y‖^2 + i‖x + iy‖^2 − i‖x − iy‖^2).

Proof. (⇒) Suppose ‖·‖ is derived from an inner product 〈·,·〉. Expanding the left-hand side of the Parallelogram Identity, we have

‖x + y‖^2 + ‖x − y‖^2 = 〈x + y, x + y〉 + 〈x − y, x − y〉
                      = (〈x,x〉 + 〈x,y〉 + 〈y,x〉 + 〈y,y〉) + (〈x,x〉 − 〈x,y〉 − 〈y,x〉 + 〈y,y〉)
                      = 2(〈x,x〉 + 〈y,y〉)
                      = 2(‖x‖^2 + ‖y‖^2)

proving the Parallelogram Identity.

(⇐) Refer to [6] for the proof of sufficiency.

For the Polarisation Identity in a real vector space, expanding the right-hand side, we have

‖x + y‖^2 − ‖x − y‖^2 = 〈x + y, x + y〉 − 〈x − y, x − y〉
                      = (〈x,x〉 + 〈x,y〉 + 〈y,x〉 + 〈y,y〉) − (〈x,x〉 − 〈x,y〉 − 〈y,x〉 + 〈y,y〉)
                      = 2(〈x,y〉 + 〈y,x〉)
                      = 4〈x,y〉

hence proving the Polarisation Identity in a real vector space.

By a similar calculation, for the case of a complex vector space, we have

‖x + y‖^2 − ‖x − y‖^2 = 2〈x,y〉 + 2〈y,x〉    (1.3)

and

‖x + iy‖^2 − ‖x − iy‖^2 = 2〈x, iy〉 + 2〈iy, x〉    (1.4)

Adding (1.3) to i times (1.4), and using 〈x, iy〉 = −i〈x,y〉 and 〈iy, x〉 = i〈y,x〉, we have

‖x + y‖^2 − ‖x − y‖^2 + i‖x + iy‖^2 − i‖x − iy‖^2 = 2〈x,y〉 + 2〈y,x〉 + 2i〈x, iy〉 + 2i〈iy, x〉
                                                  = 2〈x,y〉 + 2〈y,x〉 + 2〈x,y〉 − 2〈y,x〉
                                                  = 4〈x,y〉

proving the Polarisation Identity in a complex vector space.

By the above theorem, we can now show that some vector norms are not derived from any inner product. The following proposition applies the theorem to the case of the l∞-norm.

Proposition 1.3.6. The l∞-norm is not derived from any inner product.

Proof. Let x = (1, 1) and y = (0, −1), so that x + y = (1, 0) and x − y = (1, 2). Then it can be calculated that

(1/2)(‖x + y‖∞^2 + ‖x − y‖∞^2) = (1/2)(1^2 + 2^2) = 5/2

but

‖x‖∞^2 + ‖y‖∞^2 = 1^2 + 1^2 = 2

hence the Parallelogram Identity is not satisfied, proving the assertion.
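The dichotomy in Theorem 1.3.5 and Proposition 1.3.6 can be observed numerically: the l2 norm satisfies the Parallelogram Identity (and the real Polarisation Identity recovers the dot product), while the l∞ norm fails it on the vectors used above. A minimal sketch, assuming NumPy:

```python
# Parallelogram gap ||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2) for two norms.
import numpy as np

def gap(norm, x, y):
    return norm(x + y)**2 + norm(x - y)**2 - 2 * (norm(x)**2 + norm(y)**2)

l2   = lambda v: np.sqrt(np.sum(v**2))
linf = lambda v: np.max(np.abs(v))

x, y = np.array([1.0, 1.0]), np.array([0.0, -1.0])   # as in Proposition 1.3.6
print(gap(l2, x, y))     # 0.0: the identity holds (in fact for every x, y)
print(gap(linf, x, y))   # 1.0: the identity fails, so l-inf has no inner product

# real polarisation identity: <x, y> = (||x+y||^2 - ||x-y||^2) / 4
print((l2(x + y)**2 - l2(x - y)**2) / 4, np.dot(x, y))   # both -1.0
```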

1.4 Analytic Properties of Vector Norms

In the previous sections, we have derived several classes of vector norms. This variety is necessary because one norm may be more appropriate in some situations than others. For instance, the l2 norm is most commonly used in optimisation theory because it is continuously differentiable almost everywhere (see [7]). On the other hand, the l1 norm is more naturally used in statistics, as it gives a robust estimator in some statistical problems (see [8]). It turns out that in a finite-dimensional vector space, all vector norms are 'equivalent' in a certain sense, as we will see in this section. We will begin by examining some useful analytic properties of vector norms.

Definition 1.4.1. Let V be a vector space over the field F (R or C) and let ‖·‖ be a norm on V. A sequence {xk} of vectors in V is said to converge to a vector x ∈ V with respect to the norm ‖·‖ if and only if ‖xk − x‖ → 0 as k → ∞. In such a case, we write

lim_{k→∞} xk = x with respect to ‖·‖

Furthermore, the theorem below guarantees that the limit of a sequence of vectors, if it exists, is unique.

Theorem 1.4.2. Let ‖·‖ be a vector norm on V. If xk → x with respect to ‖·‖ and xk → y with respect to the same vector norm ‖·‖, then x = y.

Proof. By the triangle inequality, we have

0 ≤ ‖x − y‖ ≤ ‖xk − x‖ + ‖xk − y‖    (1.5)

The right-hand side of (1.5) converges to 0 as k → ∞ by assumption. Since ‖x − y‖ does not depend on k, this forces ‖x − y‖ = 0, and hence x = y as required.

To compare one vector norm with another, we need the notion of the equivalence of vector norms, as defined below.

Definition 1.4.3 (Equivalence of Vector Norms). Let V be a vector space over the field F (R or C). Let ‖·‖α and ‖·‖β be any two vector norms. Then ‖·‖α and ‖·‖β are said to be equivalent if and only if there exist finite positive constants Cm and CM such that

Cm‖x‖α ≤ ‖x‖β ≤ CM‖x‖α

for all x ∈ V.

Furthermore, in a finite-dimensional vector space, there is an equivalent criterion for the convergence of a sequence of vectors, namely the Cauchy criterion. We will first define this notion and then formulate the criterion precisely in the following theorem.

Definition 1.4.4 (Cauchy Sequence). A sequence {xk} in a vector space V is said to be a Cauchy sequence with respect to the vector norm ‖·‖ if for each ε > 0, there exists a positive integer N = N(ε) such that whenever m, n ≥ N,

‖xm − xn‖ < ε

Theorem 1.4.5. Let ‖·‖ be a given vector norm on a finite-dimensional real or complex vector space V, and let {xk} be a given sequence of vectors in V. The sequence {xk} converges to a vector in V if and only if it is a Cauchy sequence with respect to the norm ‖·‖.

Proof. By choosing a basis B for V, performing a change of coordinates, and using the equivalence of norms in a finite-dimensional vector space, we see that there is no loss of generality in assuming V = Cn for some integer n.

(⇐) Suppose {xk} is a Cauchy sequence. Then so is each component sequence {xk^{(i)}} of complex numbers, for each i = 1, . . . , n. Since a Cauchy sequence of complex numbers must have a limit, for each i = 1, . . . , n there exists a scalar x^{(i)} such that lim_{k→∞} xk^{(i)} = x^{(i)}. It is easily verified that lim_{k→∞} xk = x, where x = (x^{(1)}, . . . , x^{(n)}) ∈ V.

(⇒) Conversely, if there exists x ∈ V such that lim_{k→∞} xk = x, then by the Triangle Inequality,

‖xm − xn‖ ≤ ‖xm − x‖ + ‖xn − x‖

where both terms on the right-hand side converge to 0; hence the given sequence is a Cauchy sequence.

We will now discuss the notion of the equivalence of norms in a finite-dimensional vector space. To do so, we need a result on the continuity of vector norms.

Lemma 1.4.6. Let ‖·‖ be a vector norm on a vector space V over the field F (R or C), and let x1, x2, . . . , xm be given vectors. Then the function g : Fm → R defined by

g(z1, z2, . . . , zm) ≡ ‖z1 x1 + z2 x2 + . . . + zm xm‖

is a uniformly continuous function.

Proof. Let u = ∑_{i=1}^{m} ui xi and v = ∑_{i=1}^{m} vi xi. Then we have

|g(u1, . . . , um) − g(v1, . . . , vm)| = | ‖u‖ − ‖v‖ |
                                     ≤ ‖u − v‖
                                     = ‖∑_{i=1}^{m} (ui − vi) xi‖
                                     ≤ ∑_{i=1}^{m} |ui − vi| ‖xi‖
                                     ≤ C max_{1≤i≤m} |ui − vi|

where C ≡ m max_{1≤i≤m} ‖xi‖. Now, if the xi's are all zero vectors, then there is nothing to show. If not, given ε > 0, in order to have |g(u1, . . . , um) − g(v1, . . . , vm)| < ε, we only need to choose max_{1≤i≤m} |ui − vi| < ε/C, proving the result.

A useful corollary, which is almost immediate from the above lemma, is stated below.

Corollary 1.4.7. Every vector norm on Fn is a uniformly continuous function.

Proof. In Lemma 1.4.6, choose the given vectors x1, . . . , xn to be a basis for Fn. Then every vector in Fn can be written as a linear combination of the basis vectors. The result then follows.

The following theorem is a slightly more general result, which we will need to establish the equivalence of all vector norms on a finite-dimensional vector space.

Theorem 1.4.8. Let f1 and f2 be two real-valued functions on a finite-dimensional vector space V over the field F (R or C), and let B = {x1, . . . , xn} be a basis for V. Furthermore, assume that both f1 and f2 are:

1. Positive: fi(x) ≥ 0 for all x ∈ V, and fi(x) = 0 if and only if x = 0;

2. Homogeneous: fi(αx) = |α| fi(x) for all α ∈ F and all x ∈ V;

3. Continuous: fi(x(z)) is continuous on Fn, where z = (z1, z2, . . . , zn) ∈ Fn and x(z) ≡ z1 x1 + . . . + zn xn.

Then there exist finite positive constants Cm and CM such that

Cm f1(x) ≤ f2(x) ≤ CM f1(x)

for all x ∈ V.

Proof. Define h(z) ≡ f2(x(z))/f1(x(z)) on the Euclidean unit sphere S = {z ∈ Fn : ‖z‖2 = 1}, which is closed and bounded in Fn. Note that the denominator f1(x(z)) does not vanish on S by assumption (1), and therefore h(z) is continuous on S by assumption (3). By the Weierstrass theorem (see [7] for a proof and more details), the continuous function h achieves a finite positive maximum CM and a positive minimum Cm on the closed and bounded set S. Hence, we have

Cm f1(x(z)) ≤ f2(x(z)) ≤ CM f1(x(z))    (1.6)

for all z ∈ S. Now, every non-zero vector z ∈ Fn can be normalised to z/‖z‖2 ∈ S, so by the homogeneity assumption (2), the inequality (1.6) holds for all non-zero z ∈ Fn; the case z = 0 holds trivially. Finally, every vector x ∈ V is of the form x = x(z) for some z ∈ Fn because B is a basis, hence the inequality holds for all x ∈ V.

It then follows immediately from Theorem 1.4.8 that all vector norms on a finite-dimensional vector space are equivalent. This is stated in the corollary below.

Corollary 1.4.9. Let V be a finite-dimensional vector space over the field F (R or C). Then all vector norms on V are equivalent.

The following theorem gives equivalent statements on the equivalence of two vector norms.

Theorem 1.4.10. Let ‖·‖α and ‖·‖β be two vector norms on a vector space V over the field F (R or C). Then the following statements are equivalent:

1. ‖·‖α and ‖·‖β are equivalent vector norms.

2. There exist finite positive constants Cm and CM such that

Cm‖x‖α ≤ ‖x‖β ≤ CM‖x‖α

for all x ∈ V.

3. lim_{k→∞} xk = x with respect to ‖·‖α if and only if lim_{k→∞} xk = x with respect to ‖·‖β.

Proof. (1) ⇔ (2) follows immediately from the definition.

(2) ⇒ (3): By assumption, we have

Cm‖xk − x‖α ≤ ‖xk − x‖β ≤ CM‖xk − x‖α

for all k and some finite positive constants Cm and CM. Then it follows that ‖xk − x‖β → 0 if ‖xk − x‖α → 0 as k → ∞. Similarly, we have

0 ≤ ‖xk − x‖α ≤ Cm^{-1}‖xk − x‖β

from which it follows that ‖xk − x‖α → 0 if ‖xk − x‖β → 0 as k → ∞.

(3) ⇒ (2): Consider f(x) ≡ ‖x‖β/‖x‖α on the unit sphere S of ‖·‖β, i.e. f(x) = 1/‖x‖α on S. Suppose f is unbounded on S. Then there exists a sequence {xn} in S with f(xn) > n, i.e. 0 < ‖xn‖α < 1/n and ‖xn‖β = 1, for each positive integer n. But this implies ‖xn‖α → 0 while ‖xn‖β = 1 for all n, i.e. xn → 0 with respect to ‖·‖α but not with respect to ‖·‖β, contradicting (3). Hence, f must be bounded above on S, and by interchanging the roles of ‖·‖α and ‖·‖β, the same argument shows that f is also bounded away from zero on S. That is, there exist positive constants Cm and CM such that

Cm ≤ ‖x‖β/‖x‖α ≤ CM

for all non-zero x ∈ V (by homogeneity), which upon rearrangement gives the result.
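For concrete norms, the equivalence constants of Definition 1.4.3 are often explicit; for instance, ‖x‖2 ≤ ‖x‖1 ≤ √n ‖x‖2 on Fn. The sketch below, assuming NumPy, samples random vectors as a sanity check of these particular constants; sampling of course does not prove the bounds.

```python
# Sanity check of the known equivalence constants between the l1 and l2 norms.
import numpy as np

rng = np.random.default_rng(0)
n = 5
for _ in range(1000):
    x = rng.standard_normal(n)
    l1, l2 = np.sum(np.abs(x)), np.sqrt(np.sum(x**2))
    assert l2 <= l1 <= np.sqrt(n) * l2 + 1e-12   # C_m = 1, C_M = sqrt(n)
```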

1.5 Geometric Properties of Vector Norms

We will look at some of the geometric properties of vector norms in this section. In particular, the properties of the unit ball of a vector norm will be studied.

Definition 1.5.1. Let ‖·‖ be a vector norm on a finite-dimensional vector space V over the field F (R or C) and let x be a vector in V. Let r > 0 be given. The ball of radius r around x is defined to be the set

B‖·‖(r, x) ≡ {y ∈ V : ‖y − x‖ ≤ r}

In particular, the unit ball of ‖·‖ is the set

B‖·‖ ≡ B‖·‖(1, 0) = {y ∈ V : ‖y‖ ≤ 1}

The following gives an example of the relation between vector norms and their unit balls.

Example 1.5.2. Let ‖·‖α and ‖·‖β be two vector norms on a finite-dimensional vector space V over the field F (R or C). Define a new vector norm ‖·‖ by ‖·‖ ≡ max(‖·‖α, ‖·‖β). Then B‖·‖ = B‖·‖α ∩ B‖·‖β.

Proof. We have x ∈ B‖·‖ if and only if max(‖x‖α, ‖x‖β) ≤ 1, which is equivalent to ‖x‖α ≤ 1 and ‖x‖β ≤ 1, i.e. x ∈ B‖·‖α and x ∈ B‖·‖β. Hence, x ∈ B‖·‖ if and only if x ∈ B‖·‖α ∩ B‖·‖β, proving the result.

The ordering of vector norms can be described geometrically by the containment of their unit balls, as seen in the following proposition.

Proposition 1.5.3. Let ‖·‖α and ‖·‖β be two vector norms on a finite-dimensional vector space V over the field F (R or C). Then ‖x‖α ≤ ‖x‖β for all x ∈ V if and only if B‖·‖β ⊆ B‖·‖α.

Proof. (⇒) Suppose ‖x‖α ≤ ‖x‖β for all x ∈ V. Then for any z ∈ B‖·‖β, i.e. ‖z‖β ≤ 1, we have ‖z‖α ≤ ‖z‖β ≤ 1, i.e. z ∈ B‖·‖α. Thus we have proven B‖·‖β ⊆ B‖·‖α.

(⇐) Conversely, suppose B‖·‖β ⊆ B‖·‖α. Then for any z such that ‖z‖β ≤ 1, we have ‖z‖α ≤ 1. Now, for any non-zero z ∈ V, we have ‖ z/‖z‖β ‖β = 1, which implies ‖ z/‖z‖β ‖α ≤ 1, i.e. ‖z‖α ≤ ‖z‖β. Since the inequality holds trivially for z = 0, this proves the claim.

We will now characterise the unit ball of a vector norm by several properties that it possesses.

Proposition 1.5.4. Let ‖·‖ be a vector norm on a finite-dimensional vector space V over the field F (R or C). The unit ball B‖·‖ of ‖·‖ has the following properties:

1. B‖·‖ contains 0 as an interior point.

2. B‖·‖ is equilibrated, i.e. if x ∈ B‖·‖, then αx ∈ B‖·‖ for all scalars α such that |α| = 1.

3. B‖·‖ is convex, i.e. for all x, y ∈ B‖·‖ and for all t ∈ [0, 1], tx + (1 − t)y ∈ B‖·‖.

Proof. (1) We have ‖0‖ = 0 < 1; moreover, the open set {y ∈ V : ‖y‖ < 1} is contained in B‖·‖, hence 0 is an interior point of B‖·‖.

(2) Let v ∈ B‖·‖ and |α| = 1; then ‖αv‖ = |α|‖v‖ ≤ 1, i.e. αv ∈ B‖·‖.

(3) Let x, y ∈ B‖·‖ and t ∈ [0, 1]; then

‖tx + (1 − t)y‖ ≤ ‖tx‖ + ‖(1 − t)y‖ = t‖x‖ + (1 − t)‖y‖ ≤ t + (1 − t) = 1

i.e. tx + (1 − t)y ∈ B‖·‖.

The above properties are in fact sufficient to characterise the unit balls of vector norms on a finite-dimensional vector space.

Theorem 1.5.5. A set B in a finite-dimensional vector space V over the field F (R or C) is the unit ball of a vector norm on V if and only if B is a compact, convex, and equilibrated set with 0 as an interior point.

Proof. The necessity of the conditions has been proven in Proposition 1.5.4 (compactness holds since B‖·‖ is closed and bounded). To see that they suffice to define a norm, consider any nonzero point x ∈ V and the ray {αx : α ≥ 0} from the origin through x. We define ‖x‖ as the proportional distance along this ray from the origin to x, with the interval of the ray from the origin to the unique point where the ray crosses the boundary of the unit ball serving as one unit. By defining ‖·‖ in this way, the unit ball B completely characterises the vector norm ‖·‖. Formally, define ‖x‖ by

‖x‖ = 0 if x = 0, and ‖x‖ = min{1/t : t > 0, tx ∈ B} if x ≠ 0.

Observe that this function is finite for each nonzero vector x because B is compact and 0 is an interior point of B. It remains to check that ‖·‖ is a vector norm.

1. (Non-negative and Positive) It follows immediately from the definition of ‖·‖ that ‖x‖ ≥ 0 for all x ∈ V. Observe that min{1/t : t > 0, tx ∈ B} cannot be zero, by the compactness (in particular, the boundedness) of B. Hence we have ‖x‖ = 0 if and only if x = 0.

2. (Homogeneous) This is trivially true when x = 0 or α = 0. Suppose x ≠ 0 and let α ∈ F with α ≠ 0. Substituting t = t′/|α|, we have

‖αx‖ = min{1/t : t > 0 and tαx ∈ B}
     = min{|α|/t′ : t′ > 0 and t′(α/|α|)x ∈ B}
     = min{|α|/t′ : t′ > 0 and t′x ∈ B}    (by the equilibration assumption, since α/|α| has modulus 1)
     = |α| min{1/t′ : t′ > 0 and t′x ∈ B}
     = |α|‖x‖

3. (Triangle Inequality) This is trivially true when x = 0 or y = 0. Let x, y be nonzero vectors in V; then x/‖x‖ and y/‖y‖ are unit vectors lying on the boundary of B. By the convexity assumption, the vector

z = (‖x‖/(‖x‖ + ‖y‖)) · (x/‖x‖) + (‖y‖/(‖x‖ + ‖y‖)) · (y/‖y‖) = (x + y)/(‖x‖ + ‖y‖)

also lies in B, i.e. ‖z‖ ≤ 1, which is equivalent to ‖x + y‖ ≤ ‖x‖ + ‖y‖.

The above theorem shows that the unit ball of a norm is sufficient to characterise the vector norm, a property which will be used to prove the duality theorem in a later section.
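The construction in the proof of Theorem 1.5.5 can be imitated numerically: given only a membership test for a candidate unit ball B, the formula ‖x‖ = min{1/t : t > 0, tx ∈ B} recovers the norm. Below is a crude grid-search sketch, assuming NumPy, using the l1 unit ball, so the recovered values should approximate the l1 norm; the grid is an arbitrary stand-in for the exact minimum.

```python
# Recovering a norm from its unit ball, as in the proof of Theorem 1.5.5.
import numpy as np

def in_ball(v):                        # membership test for B = {v : ||v||_1 <= 1}
    return np.sum(np.abs(v)) <= 1.0

def norm_from_ball(x):
    if not np.any(x):
        return 0.0
    ts = np.linspace(1e-4, 10.0, 100001)               # crude grid over t > 0
    admissible = np.array([in_ball(t * x) for t in ts])
    return 1.0 / np.max(ts[admissible])                # min 1/t = 1/(largest t)

x = np.array([0.5, -1.5])
print(norm_from_ball(x), np.sum(np.abs(x)))            # both approximately 2.0
```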

1.6 Duality of Vector Norms

Using the fact that the unit ball of any vector norm on Rn or Cn is compact, we will study another important method of generating a new class of vector norms, as well as its properties, through the concept of the 'dual norm'.

Definition 1.6.1. Let ‖·‖ be a vector norm on a finite-dimensional vector space V over the field F (R or C). The function defined by

‖y‖D ≡ max_{‖x‖=1} Re y*x

for y ∈ V is the dual norm of ‖·‖.

The dual norm is a well-defined function on V. Observe that Re y*x is a continuous function of x for each fixed y ∈ V. Furthermore, as the unit sphere of ‖·‖ is compact, by the Weierstrass theorem the maximum of Re y*x is attained at some point x0 on the unit sphere of ‖·‖. Below we give an equivalent definition of the dual norm.

Proposition 1.6.2. Let ‖·‖ be a norm on a finite-dimensional vector space V over the field F (R or C) and let ‖·‖D be its dual norm. Then for all y ∈ V,

‖y‖D = max_{‖x‖=1} |y*x|

Proof. For any scalar c ∈ F with |c| = 1, we have ‖cx‖ = ‖x‖ by homogeneity, and |y*x| = max_{|c|=1} Re (c y*x). Hence

max_{‖x‖=1} |y*x| = max_{‖x‖=1} max_{|c|=1} Re (c y*x)
                 = max_{‖x‖=1} max_{|c|=1} Re y*(cx)
                 = max_{|c|=1} max_{‖cx‖=1} Re y*(cx)
                 = max_{‖x‖=1} Re y*x

hence the two definitions are equivalent.

The dual norm is named as such because the dual norm of a vector norm is again a vector norm. This is proved in the following proposition.

Proposition 1.6.3. Let ‖·‖D be the dual norm of a vector norm ‖·‖ on a finite-dimensional vector space V over the field F (R or C). Then ‖·‖D is a vector norm.

Proof. We will check that ‖·‖D satisfies the norm axioms. Let y, z ∈ V.

1. (Homogeneity) Let c ∈ F. Then we have

‖cy‖D = max_{‖x‖=1} |(cy)*x| = max_{‖x‖=1} |c||y*x| = |c| max_{‖x‖=1} |y*x| = |c|‖y‖D

2. (Positive and Non-Negative) For y ≠ 0,

‖y‖D = max_{‖x‖=1} |y*x| ≥ |y*(y/‖y‖)| = ‖y‖2^2/‖y‖ > 0

Furthermore, we have ‖0‖D = 0. Hence ‖y‖D ≥ 0, with ‖y‖D = 0 if and only if y = 0.

3. (Triangle Inequality) Let y, z ∈ V. Then

‖y + z‖D = max_{‖x‖=1} |(y + z)*x|
         ≤ max_{‖x‖=1} (|y*x| + |z*x|)
         ≤ max_{‖x‖=1} |y*x| + max_{‖x‖=1} |z*x|
         = ‖y‖D + ‖z‖D

We will derive the duals of some common vector norms in the following propositions.

Proposition 1.6.4. The dual of the l1-norm is the l∞-norm, and the dual of the l∞-norm is the l1-norm.

Proof. Let x, y ∈ Fn. By the Triangle Inequality, we have

|y*x| = |∑_{i=1}^{n} \overline{yi} xi| ≤ ∑_{i=1}^{n} |yi xi| ≤ max_{1≤i≤n} |yi| ∑_{j=1}^{n} |xj| = ‖y‖∞‖x‖1    (1.7)

Now, given a vector y, equality holds in (1.7) when x is a unit vector (with respect to ‖·‖1) such that xi = 1 for one value of i for which |yi| = ‖y‖∞, and xi = 0 otherwise, where 1 ≤ i ≤ n. Hence, we have

(‖y‖1)D = max_{‖x‖1=1} |y*x| = ‖y‖∞

from which we conclude that the dual of ‖·‖1 is ‖·‖∞. Similarly, |y*x| ≤ ‖y‖1‖x‖∞, and given a vector y ≠ 0, equality holds when x is a unit vector (with respect to ‖·‖∞) such that xi = yi/|yi| for all i where yi ≠ 0, and xi = 0 otherwise. Then we have

(‖y‖∞)D = max_{‖x‖∞=1} |y*x| = ‖y‖1

from which we conclude that (‖·‖∞)D = ‖·‖1.

Proposition 1.6.5. The dual of the l2-norm is itself.

Proof. Let x, y ∈ Fn. Then by the Cauchy-Schwarz Inequality,

|y*x| = |∑_{i=1}^{n} \overline{yi} xi| ≤ ‖y‖2‖x‖2

with equality when x = y/‖y‖2. By a similar argument as in Proposition 1.6.4, we conclude that (‖·‖2)D = ‖·‖2.
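Propositions 1.6.4 and 1.6.5 can be checked by brute force: sampling many unit vectors of a norm gives a lower bound on max_{‖x‖=1}|y*x| that approaches the dual norm as the sample grows. A sketch assuming NumPy; the vector y and the sample size are arbitrary choices.

```python
# Monte-Carlo approximation of ||y||^D = max_{||x|| = 1} |y* x| on R^3.
import numpy as np

rng = np.random.default_rng(1)

def approx_dual(y, normalise, samples=100000):
    xs = rng.standard_normal((samples, y.size))
    xs = normalise(xs)                     # project each sample onto ||x|| = 1
    return np.max(np.abs(xs @ y))          # lower bound on the true maximum

unit_l1 = lambda xs: xs / np.sum(np.abs(xs), axis=1, keepdims=True)
unit_l2 = lambda xs: xs / np.sqrt(np.sum(xs**2, axis=1, keepdims=True))

y = np.array([1.0, -3.0, 2.0])
print(approx_dual(y, unit_l1), np.max(np.abs(y)))      # approx 3.0 = ||y||_inf
print(approx_dual(y, unit_l2), np.sqrt(np.sum(y**2)))  # approx ||y||_2
```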

In Proposition 1.6.5, it is observed that the dual of the l2-norm is itself. The l2-norm is, in fact, the only vector norm whose dual is itself. To prove this, it is necessary to first establish the following inequality, which is a natural generalisation of the Cauchy-Schwarz Inequality.

Proposition 1.6.6. Let ‖·‖ be a vector norm on a finite-dimensional vector space V over the field F (R or C). Then for all x, y ∈ V,

|y*x| ≤ ‖x‖ ‖y‖D    (1.8)

|y*x| ≤ ‖x‖D ‖y‖    (1.9)

Proof. When x = 0, inequality (1.8) holds trivially. Suppose x ≠ 0. Then we have

|y*(x/‖x‖)| ≤ max_{‖z‖=1} |y*z| = ‖y‖D

and hence |y*x| ≤ ‖x‖‖y‖D, proving inequality (1.8). Inequality (1.9) follows since |y*x| = |x*y|.

Proposition 1.6.7. Let ‖·‖ be a vector norm on a finite-dimensional vector space V over the field F (R or C), and let ‖·‖D be its dual norm. Let c > 0 be given. Then ‖x‖ = c‖x‖D for all x ∈ V if and only if ‖·‖ = √c ‖·‖2. In particular, ‖·‖ = ‖·‖D if and only if ‖·‖ is the l2-norm.

Proof. (⇐) Suppose ‖·‖ = √c ‖·‖2. Let x ∈ V. Then

‖x‖D = max_{‖y‖=1} |x*y| = max_{√c‖y‖2=1} |x*y| = max_{‖y‖2=1} |x*(y/√c)|
     = (1/√c) max_{‖y‖2=1} |x*y|
     = (1/√c)(‖x‖2)D
     = (1/√c)‖x‖2    (by Proposition 1.6.5)
     = (1/c)‖x‖

Hence ‖x‖ = c‖x‖D for all x ∈ V, as required.

(⇒) Conversely, suppose ‖x‖ = c‖x‖D for all x ∈ V and some c > 0. Then by Proposition 1.6.6,

‖x‖2^2 = |x*x| ≤ ‖x‖‖x‖D = (1/c)‖x‖^2    (1.10)

so ‖x‖ ≥ √c‖x‖2. Moreover, by Proposition 1.6.5, we have

|x*y| ≤ ‖x‖2‖y‖2    (1.11)

with equality when y = x/‖x‖2, which satisfies ‖y‖2 = 1. Hence,

max_{y≠0} |x*(y/‖y‖2)| = max_{‖y‖2=1} |x*y| = ‖x‖2    (1.12)

where this maximum is attained at y = x/‖x‖2. Using (1.10) and (1.12), we establish the reverse bound for x ≠ 0 by considering

(1/c)‖x‖ = ‖x‖D = max_{‖y‖=1} |x*y|
         = max_{y≠0} |x*(y/‖y‖)|
         = max_{y≠0} |x*(y/‖y‖2)| · (‖y‖2/‖y‖)
         ≤ (1/√c) max_{y≠0} |x*(y/‖y‖2)|    (since ‖y‖ ≥ √c‖y‖2 by (1.10), i.e. ‖y‖2/‖y‖ ≤ 1/√c)
         = (1/√c)‖x‖2

so ‖x‖ ≤ √c‖x‖2. Hence, combining this with (1.10), we have proven ‖x‖ = √c‖x‖2.

By taking c = 1, the final assertion follows, and we have shown that the l2-norm is the only norm which is its own dual.

We will now establish the equivalence between dual norms, as well as between a vector norm and its dual. This is one of the nice and useful properties that finite-dimensional vector spaces have.

Lemma 1.6.8. Let ‖·‖α and ‖·‖β be two given vector norms on a finite-dimensional vector space V over the field F (R or C), and let ‖·‖Dα and ‖·‖Dβ be their respective duals. Suppose there exists some constant C > 0 such that ‖x‖α ≤ C‖x‖β for all x ∈ V. Then ‖x‖Dβ ≤ C‖x‖Dα for all x ∈ V.

Proof. We have

‖x‖Dα = max_{‖y‖α=1} |x*y| = max_{y≠0} |x*y|/‖y‖α
      ≥ max_{y≠0} |x*y|/(C‖y‖β)
      = (1/C) max_{y≠0} |x*y|/‖y‖β
      = (1/C) max_{‖y‖β=1} |x*y|
      = (1/C) ‖x‖Dβ

which upon rearranging gives the conclusion.

Theorem 1.6.9. Let ‖·‖α and ‖·‖β be two given vector norms on a finite-dimensional vector space V over the field F (R or C). Suppose that ‖·‖α and ‖·‖β are equivalent vector norms. Then their duals ‖·‖Dα and ‖·‖Dβ are also equivalent.

Proof. By the equivalence of ‖·‖α and ‖·‖β, we have, for all x ∈ V and some positive constants cm and CM,

cm‖x‖β ≤ ‖x‖α ≤ CM‖x‖β

In particular, ‖x‖α ≤ CM‖x‖β implies ‖x‖Dβ ≤ CM‖x‖Dα, and ‖x‖β ≤ cm^{-1}‖x‖α implies ‖x‖Dα ≤ cm^{-1}‖x‖Dβ, for all x ∈ V, by Lemma 1.6.8. Hence, we have shown

CM^{-1}‖x‖Dβ ≤ ‖x‖Dα ≤ cm^{-1}‖x‖Dβ

proving the equivalence of ‖·‖Dα and ‖·‖Dβ.

Theorem 1.6.10. Let ‖·‖ be a vector norm and ‖·‖D be its dual on a finite-dimensional vector space V over the field F (R or C). Then ‖·‖ and ‖·‖D are equivalent.

Proof. Note that in a finite-dimensional vector space we have the equivalence of ‖·‖ and the l2-norm:

cm‖x‖ ≤ ‖x‖2 ≤ CM‖x‖    (1.13)

for all x ∈ V and some positive constants cm and CM. Using (1.11) and (1.13), for the upper bound we have

max_{x≠0} ‖x‖D/‖x‖ = max_{x≠0} (max_{‖y‖=1} |x*y|)/‖x‖
                   = max_{x≠0} max_{‖y‖=1} |(x/‖x‖)*y|
                   = max_{‖x‖=1} max_{‖y‖=1} |x*y|
                   ≤ max_{‖x‖=1} max_{‖y‖=1} ‖x‖2‖y‖2
                   ≤ max_{‖x‖=1} max_{‖y‖=1} CM‖x‖ CM‖y‖
                   = CM^2

By a similar argument, for the lower bound, consider

min_{x≠0} ‖x‖D/‖x‖ = min_{‖x‖=1} max_{‖y‖=1} |x*y| ≥ min_{‖x‖=1} |x*x| = min_{‖x‖=1} ‖x‖2^2 ≥ min_{‖x‖=1} cm^2‖x‖^2 = cm^2

Hence, combining the upper and lower bounds, we have

cm^2 ≤ ‖x‖D/‖x‖ ≤ CM^2

or equivalently

cm^2‖x‖ ≤ ‖x‖D ≤ CM^2‖x‖

for all non-zero x ∈ V (and trivially for x = 0), as required. Therefore, ‖·‖ and ‖·‖D are equivalent.

We will conclude this chapter with the following duality theorem, which says that the dual of the dual norm is the norm itself. This is related to the convexity of the unit ball of the vector norm, as is evident from the proof of the theorem below.

Theorem 1.6.11 (Duality Theorem). Let ‖·‖ be a vector norm on a finite-dimensional vector space V over the field F (R or C). Let ‖·‖D denote the dual norm of ‖·‖ and let ‖·‖DD denote the dual norm of ‖·‖D. Let

B ≡ {x ∈ V : ‖x‖ ≤ 1}    (1.14)

B′′ ≡ {x ∈ V : ‖x‖DD ≤ 1}    (1.15)

denote the unit ball of ‖·‖ and the unit ball of ‖·‖DD respectively. Then B = B′′, i.e. ‖·‖DD = ‖·‖.

Proof. First we will prove the following claim.

Claim 1: B′′ ⊆ Co B, where Co B is the closed convex hull of B, i.e. the intersection of all closed convex sets containing B (see [7] for a more rigorous treatment of convex hulls and half-spaces).

Proof of Claim 1: Observe that a set of the form {t ∈ V : Re t*v ≤ 1} is a general closed half-space that contains the origin. Now, let u ∈ B′′ be a given point and observe that

u ∈ {t ∈ V : Re t*v ≤ 1 for every v such that ‖v‖D ≤ 1}
  = {t ∈ V : Re t*v ≤ 1 for every v such that Re v*w ≤ 1 for every w such that ‖w‖ ≤ 1}
  = {t ∈ V : Re t*v ≤ 1 for every v such that Re w*v ≤ 1 for all w ∈ B}

This implies that u lies in every closed half-space of the above form that contains every point of B, i.e. u lies in every closed half-space that contains B. Since the intersection of all such closed half-spaces is the closed convex hull of B, we conclude that u ∈ Co B. This implies B′′ ⊆ Co B, proving Claim 1.

Next we prove another claim.

Claim 2: B ⊆ B′′.

Proof of Claim 2: By Proposition 1.6.6,

‖x‖DD = max_{‖y‖D=1} |y*x| ≤ max_{‖y‖D=1} ‖x‖‖y‖D = ‖x‖

which is equivalent to B ⊆ B′′ by Proposition 1.5.3, proving Claim 2.

Now, as B is the unit ball of a vector norm, by Theorem 1.5.5, B is a compact convex set. Hence, the smallest closed convex set containing B is B itself, i.e. we have B = Co B. Then, by Claim 1 and Claim 2, we have the following chain of inclusions:

Co B = B ⊆ B′′ ⊆ Co B

which implies B = B′′, i.e. ‖·‖DD = ‖·‖, as required.

Chapter 2

Matrix Norms

We will now generalise the notion of norm introduced in Chapter 1 to measure the 'size' of matrices. Since Mn is itself a vector space of dimension n^2, one may measure the 'size' of a matrix by using any vector norm on Fn^2. However, Mn has a natural multiplication operation, and it is often useful to relate the 'size' of the matrix AB to the 'sizes' of A and B. In this chapter, the notion of a matrix norm and its properties will be studied.

2.1 Basic Properties of Matrix Norms

Definition 2.1.1 (Matrix Norm Axioms). A function |||·||| : Mn → R is a matrix norm on Mn if for all A, B ∈ Mn, the following axioms are satisfied:

1. (Non-negative) |||A||| ≥ 0.

2. (Positive) |||A||| = 0 if and only if A = 0.

3. (Homogeneous) |||cA||| = |c| |||A||| for all scalars c ∈ F.

4. (Triangle Inequality) |||A + B||| ≤ |||A||| + |||B|||.

5. (Submultiplicative) |||AB||| ≤ |||A||| |||B|||.

We will now establish some basic results concerning matrix norms.

Proposition 2.1.2. Let |||·||| be a matrix norm on Mn. Then

1. |||A^k||| ≤ |||A|||^k for every positive integer k.

2. |||I||| ≥ 1.

Proof. We will prove (1) by induction on k. When k = 1, the statement is trivially true. Suppose the statement is true for some positive integer k; we will show that it is true for k + 1. We have

|||A^{k+1}||| = |||A A^k||| ≤ |||A||| |||A^k||| ≤ |||A||| |||A|||^k = |||A|||^{k+1}

by the submultiplicativity axiom and the induction hypothesis. Hence, the statement is true for all positive integers k by induction.

For (2), we have

|||I||| = |||I^2||| ≤ |||I||| |||I|||

which implies |||I||| ≤ |||I|||^2. As |||I||| ≠ 0, we have |||I||| ≥ 1 as required.

We will now give some examples of commonly used matrix norms, and prove directly, in the case of the Frobenius norm, that it is indeed a matrix norm. Later, in Proposition 2.2.4, |||·|||1 will also be shown to be a matrix norm, by showing that it is induced; |||·|||∞ can be shown to be a matrix norm by a similar method.

Example 2.1.3 (maximum column sum matrix norm on Mn).

|||A|||1 ≡ max_{1≤j≤n} ∑_{i=1}^{n} |aij|

Example 2.1.4 (maximum row sum matrix norm on Mn).

|||A|||∞ ≡ max_{1≤i≤n} ∑_{j=1}^{n} |aij|

Example 2.1.5 (Frobenius norm on Mn).

|||A|||F ≡ (∑_{i,j=1}^{n} |aij|^2)^{1/2}

Proof. The non-negativity and positivity axioms are easy to check. We will show that |||·|||F is a matrix norm by checking the remaining matrix norm axioms.

1. (Homogeneous) Let c ∈ F. Then

|||cA|||F = (∑_{i,j=1}^{n} |c aij|^2)^{1/2} = (|c|^2 ∑_{i,j=1}^{n} |aij|^2)^{1/2} = |c| (∑_{i,j=1}^{n} |aij|^2)^{1/2} = |c| |||A|||F

2. (Triangle Inequality) The triangle inequality follows from Minkowski's Inequality (Theorem 1.2.8) by taking p = 2.

3. (Submultiplicative) Let A, B ∈ Mn. Then by the Cauchy-Schwarz Inequality (Theorem 1.3.3),

|||AB|||F^2 = ∑_{i,j=1}^{n} |∑_{k=1}^{n} aik bkj|^2
           ≤ ∑_{i,j=1}^{n} [(∑_{k=1}^{n} |aik|^2)(∑_{m=1}^{n} |bmj|^2)]
           = (∑_{i,k=1}^{n} |aik|^2)(∑_{m,j=1}^{n} |bmj|^2)
           = |||A|||F^2 |||B|||F^2

Hence, the Frobenius norm is indeed a matrix norm.
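The properties established so far are cheap to verify numerically for the Frobenius norm. The sketch below, assuming NumPy and arbitrary test matrices, checks submultiplicativity and the bounds of Proposition 2.1.2; note that |||I|||F = √n ≥ 1, consistent with Proposition 2.1.2 (and, since √n > 1 for n ≥ 2, the Frobenius norm cannot be induced, by Proposition 2.2.3 below).

```python
# Checks of Proposition 2.1.2 and Definition 2.1.1(5) for the Frobenius norm.
import numpy as np

fro = lambda A: np.sqrt(np.sum(np.abs(A)**2))

rng = np.random.default_rng(2)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

assert fro(A @ B) <= fro(A) * fro(B) + 1e-9            # submultiplicativity
for k in range(1, 6):
    assert fro(np.linalg.matrix_power(A, k)) <= fro(A)**k + 1e-9

print(fro(np.eye(3)))   # sqrt(3) >= 1, consistent with |||I||| >= 1
```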

Example 2.1.6 (spectral norm on Mn).

|||A|||2 ≡ max{√λ : λ is an eigenvalue of A*A}

Notice that if A*Ax = λx and x ≠ 0, then x*A*Ax = ‖Ax‖2^2 = λ‖x‖2^2, hence λ is real and non-negative, and so |||A|||2 is well-defined. It can be checked that the spectral norm is a matrix norm.
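The four matrix norms of Examples 2.1.3–2.1.6 can be computed directly and compared against numpy.linalg.norm, which implements the same quantities; the matrix A below is an arbitrary example, assuming NumPy.

```python
# Direct computation of the matrix norms in Examples 2.1.3-2.1.6.
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  0.5]])

col_sum = np.max(np.sum(np.abs(A), axis=0))                     # |||A|||_1
row_sum = np.max(np.sum(np.abs(A), axis=1))                     # |||A|||_inf
frob    = np.sqrt(np.sum(np.abs(A)**2))                         # |||A|||_F
spec    = np.sqrt(np.max(np.linalg.eigvalsh(A.conj().T @ A)))   # |||A|||_2

assert np.isclose(col_sum, np.linalg.norm(A, 1))
assert np.isclose(row_sum, np.linalg.norm(A, np.inf))
assert np.isclose(frob,    np.linalg.norm(A, 'fro'))
assert np.isclose(spec,    np.linalg.norm(A, 2))
```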

As in the case of vector norms, all matrix norms on Mn are equivalent. To prove this fact, we first need the following lemma, similar to Lemma 1.4.6.

Lemma 2.1.7. Let |||·||| be a matrix norm on Mn, and let A1, A2, . . . , Am be given matrices. Then the function g : Fm → R defined by

g(z1, z2, . . . , zm) ≡ |||z1 A1 + z2 A2 + . . . + zm Am|||

is a uniformly continuous function.

Proof. Let U = ∑_{i=1}^{m} ui Ai and V = ∑_{i=1}^{m} vi Ai. Then we have

|g(u1, . . . , um) − g(v1, . . . , vm)| = | |||U||| − |||V||| |
                                     ≤ |||U − V|||
                                     = |||∑_{i=1}^{m} (ui − vi) Ai|||
                                     ≤ ∑_{i=1}^{m} |ui − vi| |||Ai|||
                                     ≤ C max_{1≤i≤m} |ui − vi|

where C ≡ m max_{1≤i≤m} |||Ai|||. Now, if all the Ai's are zero matrices, then there is nothing to show. If not, given ε > 0, in order to have |g(u1, . . . , um) − g(v1, . . . , vm)| < ε, we only need to choose max_{1≤i≤m} |ui − vi| < ε/C, proving the result.

Corollary 2.1.8. Every matrix norm on Mn is a uniformly continuous function.

Proof. Using Lemma 2.1.7, choose the given matrices A1, . . . , A_{n^2} to be a basis of Mn. Then every matrix in Mn can be written as a linear combination of the chosen basis. The result then follows.

We can now formulate the equivalence of matrix norms on Mn, in a sense similar to the equivalence of vector norms on Fn.

Theorem 2.1.9. Let |||·|||α and |||·|||β be two matrix norms on Mn. Then |||·|||α and |||·|||β are equivalent, in the sense that there exist positive constants cm and CM such that

cm|||A|||α ≤ |||A|||β ≤ CM|||A|||α

for all A ∈ Mn.

Proof. This follows immediately from Theorem 1.4.8, as a matrix norm |||·||| is positive and homogeneous by definition, and uniformly continuous by Lemma 2.1.7. The conclusion then follows.

2.2 Induced Matrix Norms

In this section, we will define another class of matrix norms, each related to a vector norm in a natural way, and derive some of its important properties.

Definition 2.2.1 (Induced Matrix Norms). Let ‖·‖ be a vector norm on Fn, where F is R or C. Define |||·||| on Mn by

|||A||| ≡ max_{‖x‖=1} ‖Ax‖ = max_{x≠0} ‖Ax‖/‖x‖

Then |||·||| is said to be the matrix norm induced by ‖·‖, or the operator norm associated with ‖·‖. Note that the use of 'max' in the above definition is justified, since ‖Ax‖ is a continuous function of x and the unit ball B‖·‖ is compact.

We will now show that the induced norm defined above is indeed a matrix norm.

Proposition 2.2.2. |||·||| defined in Definition 2.2.1 is a matrix norm.

Proof. We will show that it is a matrix norm by checking the matrix norm axioms.

1. (Non-Negative and Positive) Non-negativity follows from the fact that |||A||| is the maximum of a non-negative valued function. Moreover, by the definition, |||A||| = 0 if and only if Ax = 0 for all x ≠ 0, which is equivalent to A = 0.

2. (Homogeneous) Let c ∈ F. We have

|||cA||| = max_{‖x‖=1} ‖cAx‖ = |c| max_{‖x‖=1} ‖Ax‖ = |c| |||A|||

3. (Triangle Inequality) Let A, B ∈ Mn. Then

|||A + B||| = max_{‖x‖=1} ‖(A + B)x‖ ≤ max_{‖x‖=1} (‖Ax‖ + ‖Bx‖)
           ≤ max_{‖x‖=1} ‖Ax‖ + max_{‖x‖=1} ‖Bx‖
           = |||A||| + |||B|||

4. (Submultiplicative) Let A, B ∈ Mn. For x not in the nullspace of B,

‖ABx‖/‖x‖ = (‖ABx‖/‖Bx‖)(‖Bx‖/‖x‖) ≤ max_{y≠0} (‖Ay‖/‖y‖) max_{x≠0} (‖Bx‖/‖x‖) = |||A||| |||B|||

When x is in the nullspace of B, then ABx = 0 and the bound holds trivially. Taking the maximum over all x ≠ 0 gives |||AB||| ≤ |||A||| |||B|||.

Hence, we have proven that |||·||| is a matrix norm.

Now, we will derive some basic properties of induced matrix norms.

Proposition 2.2.3. Let |||·||| be a matrix norm induced by the vector norm ‖·‖. Then

1. ‖Ax‖ ≤ |||A||| ‖x‖, for all A ∈ Mn and all x ∈ Fn.

2. |||I||| = 1.

Proof. First we will prove (1). When x = 0, the inequality is trivially true. By the definition of the induced matrix norm, for all x ≠ 0, we have ‖A(x/‖x‖)‖ ≤ |||A|||. By the homogeneity of the vector norm, we then have ‖Ax‖ ≤ |||A||| ‖x‖ as required.

Now, for (2), we have

|||I||| = max_{‖x‖=1} ‖Ix‖ = max_{‖x‖=1} ‖x‖ = 1

Note that by the proposition above, the condition |||I||| = 1 is a necessary condition for |||·||| to be an induced matrix norm. However, it is not sufficient.

In the following, we will find the matrix norms induced by some common vector norms.

Proposition 2.2.4. The maximum column sum matrix norm |||·|||1 is induced by the l1-norm.

Proof. Write A ∈ Mn in terms of its columns as A = [a1 a2 . . . an]. Then

|||A|||1 = max_{1≤i≤n} ‖ai‖1

If x = (x1, . . . , xn), then

‖Ax‖1 = ‖x1 a1 + . . . + xn an‖1 ≤ ∑_{i=1}^{n} ‖xi ai‖1
      = ∑_{i=1}^{n} |xi| ‖ai‖1
      ≤ (max_{1≤k≤n} ‖ak‖1) ∑_{i=1}^{n} |xi|
      = |||A|||1 ‖x‖1

Hence, we have max_{‖x‖1=1} ‖Ax‖1 ≤ |||A|||1. Now, choose x = ek (the k-th standard basis vector, which satisfies ‖ek‖1 = 1). Then for any k = 1, . . . , n we have

max_{‖x‖1=1} ‖Ax‖1 ≥ ‖Aek‖1 = ‖ak‖1

and therefore

max_{‖x‖1=1} ‖Ax‖1 ≥ max_{1≤k≤n} ‖ak‖1 = |||A|||1

Hence, we have the required conclusion.
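Proposition 2.2.4 can be sanity-checked numerically: over random l1-unit vectors, ‖Ax‖1 never exceeds the maximum column sum, and the standard basis vector selecting the heaviest column attains it. A sketch assuming NumPy; the matrix and sample size are arbitrary choices.

```python
# Sampling check that the l1-induced norm equals the maximum column sum.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

col_sums = np.sum(np.abs(A), axis=0)
induced  = np.max(col_sums)                             # |||A|||_1

xs = rng.standard_normal((100000, 4))
xs /= np.sum(np.abs(xs), axis=1, keepdims=True)         # now ||x||_1 = 1
sampled = np.max(np.sum(np.abs(xs @ A.T), axis=1))      # max ||Ax||_1 over samples

k = np.argmax(col_sums)                                 # e_k attains the maximum
attained = np.sum(np.abs(A[:, k]))                      # = ||A e_k||_1

assert sampled <= induced + 1e-12
assert np.isclose(attained, induced)
```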

In a similar way, one can show that |||·|||∞ is induced by the l∞-norm. Below are two more examples of matrix norms induced by vector norms.

Example 2.2.5. The maximum row sum matrix norm |||·|||∞ defined on Mn is induced by the l∞-norm.

Example 2.2.6. The spectral norm |||·|||2 defined on Mn is induced by the l2-norm.

We have seen in the previous section that all matrix norms on Mn are equivalent. In this section, we are going to explore this notion of equivalence further, specifically for the case of induced matrix norms. Moreover, the relation between different induced matrix norms will also be studied. Before that, we will introduce the following lemma.

Lemma 2.2.7. Let ‖·‖ be a given vector norm on Fn, where F is R or C, and let y ∈ Fn be a given fixed vector. Then there exists a vector y0 ∈ Fn such that both of the following are satisfied:

1. (y0)*y = ‖y‖; and

2. |(y0)*x| ≤ ‖x‖ for all x ∈ Fn.

Proof. By the Duality Theorem, we have

‖y‖ = ‖y‖DD = max_{‖z‖D=1} |y*z|

Moreover, by the compactness of the unit sphere of the vector norm ‖·‖D, the maximum is actually achieved at some vector z = y0 such that ‖y0‖D = 1, so ‖y‖ = |y*y0|. Multiplying y0 by a suitable factor of modulus 1, the quantity y*y0 can be made real and positive, so that (y0)*y = \overline{y*y0} = ‖y‖, satisfying (1). Moreover, by Proposition 1.6.6, we have

|(y0)*x| ≤ ‖y0‖D‖x‖ = ‖x‖

for all x ∈ Fn, satisfying (2). Hence, we have found the required vector y0.

Theorem 2.2.8. Let ‖·‖α and ‖·‖β be two given vector norms on Fn, where F is R or C. Let |||·|||α and |||·|||β be the respective induced matrix norms on Mn. Define

Rαβ ≡ max_{x≠0} ‖x‖α/‖x‖β and Rβα ≡ max_{x≠0} ‖x‖β/‖x‖α    (2.1)

Then

max_{A≠0} |||A|||α/|||A|||β = max_{A≠0} |||A|||β/|||A|||α = Rαβ Rβα    (2.2)

Proof. Let A ∈ Mn and x ∈ Fn be given, and suppose that x ≠ 0 and Ax ≠ 0. Then

‖Ax‖α/‖x‖α = (‖Ax‖α/‖Ax‖β)(‖Ax‖β/‖x‖β)(‖x‖β/‖x‖α) ≤ Rαβ (‖Ax‖β/‖x‖β) Rβα

an inequality which also holds when Ax = 0. Thus we have

|||A|||α ≡ max_{x≠0} ‖Ax‖α/‖x‖α ≤ Rαβ Rβα max_{x≠0} ‖Ax‖β/‖x‖β = Rαβ Rβα |||A|||β

and hence

|||A|||α/|||A|||β ≤ Rαβ Rβα    (2.3)

for all nonzero A ∈ Mn.

Now, rewrite (2.1) as follows: setting y = x/‖x‖2 and using homogeneity,

max_{x≠0} ‖x‖α/‖x‖β = max_{‖y‖2=1} ‖y‖α/‖y‖β

Hence, by the compactness of the Euclidean unit sphere and the Weierstrass Theorem, each of the extrema in (2.1) is achieved at some nonzero vector, i.e. there exist vectors y, z ∈ Fn such that ‖y‖2 = ‖z‖2 = 1, ‖y‖α = Rαβ‖y‖β and ‖z‖β = Rβα‖z‖α. By Lemma 2.2.7 (applied to the norm ‖·‖β and the vector z), there exists a vector z0 ∈ Fn such that

1. (z0)*z = ‖z‖β; and

2. |(z0)*x| ≤ ‖x‖β for all x ∈ Fn.

Now, consider the matrix A0 ≡ y z0*. By (1), we have

‖A0 z‖α/‖z‖α = ‖y (z0)*z‖α/‖z‖α = ‖y‖α |(z0)*z|/‖z‖α = ‖y‖α ‖z‖β/‖z‖α

so by the definition of the induced matrix norm, we have the lower bound

|||A0|||α ≥ ‖y‖α ‖z‖β/‖z‖α = ‖y‖α Rβα = Rαβ Rβα ‖y‖β    (2.4)

Moreover, by (2), we have, for all x ≠ 0,

‖A0 x‖β/‖x‖β = ‖y (z0)*x‖β/‖x‖β = ‖y‖β |(z0)*x|/‖x‖β ≤ ‖y‖β ‖x‖β/‖x‖β = ‖y‖β

so by the definition of the induced matrix norm, we have the upper bound

|||A0|||β ≤ ‖y‖β    (2.5)

Combining (2.4) and (2.5),

|||A0|||α/|||A0|||β ≥ Rαβ Rβα ‖y‖β/‖y‖β = Rαβ Rβα

which shows that equality is attained in (2.3), hence establishing one part of (2.2). The complete assertion in (2.2) follows by the symmetry in α and β.

The above theorem has several interesting corollaries, which are presented below. Corollary 2.2.9 shows that two different vector norms induce the same matrix norm if and only if one of the vector norms is a constant multiple of the other.

Corollary 2.2.9. Let ‖·‖α and ‖·‖β be vector norms on Fn, where F is R or C. Let |||·|||α and |||·|||β denote the respective induced matrix norms on Mn. Then |||A|||α = |||A|||β for all A ∈ Mn if and only if there exists a positive constant c such that ‖x‖α = c‖x‖β for all x ∈ Fn.

Proof. Observe that

Rβα = max_{x≠0} ‖x‖β/‖x‖α = [min_{x≠0} ‖x‖α/‖x‖β]^{-1} ≥ [max_{x≠0} ‖x‖α/‖x‖β]^{-1} = 1/Rαβ

Hence, we have the general inequality

Rαβ Rβα ≥ 1    (2.6)

with equality if and only if

min_{x≠0} ‖x‖α/‖x‖β = max_{x≠0} ‖x‖α/‖x‖β

which can occur if and only if the function ‖x‖α/‖x‖β is constant for all x ≠ 0. Therefore, if ‖x‖α ≡ c‖x‖β, we have Rαβ Rβα = 1, hence |||A|||α ≤ |||A|||β and |||A|||β ≤ |||A|||α by Theorem 2.2.8, implying |||A|||α = |||A|||β for all A ∈ Mn.

Conversely, if the two induced matrix norms are identical, then Rαβ Rβα = 1, again by Theorem 2.2.8, and hence equality holds in (2.6), so the ratio ‖x‖α/‖x‖β is constant by the preceding argument, proving the result.

Moreover, we have the following corollary, which says that no induced matrix norm can be uniformly dominated by another induced matrix norm. This is made precise as follows.

Corollary 2.2.10. Let ‖·‖α and ‖·‖β be vector norms on Fn, where F is R or C. Let |||·|||α and |||·|||β denote the respective induced matrix norms on Mn. Then |||A|||α ≤ |||A|||β for all A ∈ Mn if and only if |||A|||α = |||A|||β for all A ∈ Mn.

Proof. If |||A|||α ≤ |||A|||β for all A ∈ Mn, then Rαβ Rβα ≤ 1, which, because of (2.6) in the previous corollary, implies Rαβ Rβα = 1. Therefore, |||A|||α = |||A|||β for all A ∈ Mn by a similar argument as in Corollary 2.2.9.

Corollary 2.2.10 says that no induced matrix norm can be uniformly dominated by another induced matrix norm. The following theorem examines the case where we compare an induced matrix norm with another (not necessarily induced) matrix norm.

Theorem 2.2.11. Let |||·||| be a given matrix norm on Mn, and let |||·|||α be a given induced matrix norm on Mn. Then

1. There is an induced matrix norm |||·|||β such that |||A|||β ≤ |||A||| for all A ∈ Mn; and

2. |||A||| ≤ |||A|||α for all A ∈ Mn if and only if |||A||| = |||A|||α for all A ∈ Mn.

Proof. Define the vector norm ‖·‖ on Fn by

‖x‖ ≡ |||X|||, where X ≡ [x x . . . x] ∈ Mn    (2.7)

i.e. X is the matrix whose columns are all equal to the vector x. Consider the matrix norm |||·|||β on Mn induced by ‖·‖. For any A ∈ Mn, we have

|||A|||β ≡ max_{x≠0} ‖Ax‖/‖x‖ = max_{x≠0} ||| [Ax Ax . . . Ax] ||| / ||| [x x . . . x] |||
        = max_{x≠0} |||AX|||/|||X|||
        ≤ max_{x≠0} |||A||| |||X|||/|||X|||
        = |||A|||

which proves (1).

To prove (2), suppose that |||A||| ≤ |||A|||α for all A ∈ Mn. Then by (1) just proven above, we have

|||A|||β ≤ |||A||| ≤ |||A|||α

for all A ∈ Mn. However, both |||·|||β and |||·|||α are induced matrix norms, hence |||A|||β = |||A|||α by Corollary 2.2.10, and hence |||A||| = |||A|||α for all A ∈ Mn.

The above result motivates the following definition of a minimal matrix norm.

Definition 2.2.12. A matrix norm ||| · ||| onMn is said to be a minimal matrix norm if the only

matrix norm ||| · |||α on Mn such that |||A |||α ≤ |||A ||| for all A ∈Mn is ||| · |||α = ||| · |||.

We will now establish the properties of the minimal matrix norm. The following theorem

gives some equivalent conditions.

Theorem 2.2.13. Let ||| · ||| be a matrix norm onMn. Let ||| · |||y be the matrix norm induced by

the vector norm defined by ‖x‖y ≡ |||xy∗ ||| for a given y ∈ Fn. Then the following are equivalent:

1. ||| · ||| is an induced matrix norm.

2. ||| · ||| is a minimal matrix norm.

3. ||| · ||| = ||| · |||y for all nonzero y ∈ Fn.

Proof. That (1) implies (2) was proven in Theorem 2.2.11. Moreover, (3) implies (1) is trivial, because ||| · |||y is induced by definition. It remains to prove that (2) implies (3).

Observe that ‖ · ‖y is a vector norm on Fn with the property that for all A ∈Mn,

‖Ax‖y = |||A(xy∗) ||| ≤ |||A ||| |||xy∗ ||| = |||A ||| ‖x‖y

Now we have, for all A ∈Mn,

|||A |||y ≡ max_{x≠0} ‖Ax‖y/‖x‖y ≤ max_{x≠0} |||A ||| ‖x‖y/‖x‖y = |||A |||


If ||| · ||| is a minimal matrix norm, then the above inequality implies |||A ||| = |||A |||y for all

A ∈Mn.

Hence, we have proven the statement.

We have proven in the above theorem that the induced matrix norms are minimal among

all matrix norms. Subsequently, we will characterise the minimal matrix norms among some

important classes of matrix norms.

Definition 2.2.14. Let ||| · ||| be a matrix norm onMn such that |||A ||| = |||UAV ||| for all A ∈Mn

and all unitary matrices U, V ∈Mn. Then ||| · ||| is said to be a unitarily invariant matrix norm.

Some examples of unitarily invariant matrix norms include the Frobenius norm and the spectral norm. Next, we will define the notion of the adjoint of a matrix norm.

Definition 2.2.15. Let ||| · ||| be a matrix norm on Mn. Then the function ||| · |||∗ defined by

|||A |||∗ = |||A∗ |||

for all A ∈ Mn is again a matrix norm, said to be the adjoint of ||| · |||.

Proof. We will show that ||| · |||∗ is a matrix norm by checking the norm axioms.

1. (Non-negative and Positive) Non-negativity follows immediately from the definition. Moreover, |||A |||∗ = 0 if and only if A∗ = 0, which holds if and only if A = 0.

2. (Homogeneous) Let c ∈ F. Then

||| cA |||∗ = ||| (cA)∗ ||| = ||| cA∗ ||| = |c| |||A∗ ||| = |c| |||A |||∗

3. (Triangle Inequality) Let A,B ∈Mn. Then

|||A+B |||∗ = ||| (A+B)∗ ||| = |||A∗ +B∗ |||

≤ |||A∗ |||+ |||B∗ |||

= |||A |||∗ + |||B |||∗

4. (Submultiplicative) Let A,B ∈Mn. Then

|||AB |||∗ = ||| (AB)∗ ||| = |||B∗A∗ ||| ≤ |||B∗ ||| |||A∗ ||| = |||A |||∗ |||B |||∗

Hence ||| · |||∗ is indeed a matrix norm.

We will now define the notion of a self-adjoint matrix norm, analogous to the notion of a self-adjoint matrix.

Definition 2.2.16. Let ||| · ||| be a matrix norm on Mn and let ||| · |||∗ be its adjoint. Then ||| · ||| is said to be self-adjoint if |||A |||∗ = |||A ||| for all A ∈ Mn.

Some examples of self-adjoint matrix norms include the Frobenius norm and the spectral norm. Next, we will show that every unitarily invariant matrix norm is in fact self-adjoint.


Proposition 2.2.17. Every unitarily invariant matrix norm on Mn is self-adjoint.

Proof. Let A ∈ Mn and let A = VΣW∗ be a singular value decomposition of A, where V, W ∈ Mn are unitary and Σ is diagonal with non-negative real entries (see [2]). Then A∗ = WΣV∗, and by unitary invariance (applied twice, with the unitary pairs (W, V∗) and (V, W∗))

|||A |||∗ = |||A∗ ||| = |||WΣV∗ ||| = |||Σ ||| = |||VΣW∗ ||| = |||A |||

We are now ready to find the matrix norms which are minimal among the class of unitarily

invariant matrix norms and self-adjoint matrix norms. It turns out that the minimal matrix

norm in this case is just the spectral norm.

Theorem 2.2.18. If ||| · ||| is a unitarily invariant matrix norm onMn, then |||A |||2 ≤ |||A ||| for all

A ∈Mn. The spectral norm is the only matrix norm onMn that is both induced and unitarily

invariant.

Proof. Suppose that ||| · ||| is a given unitarily invariant matrix norm. By Theorem 2.2.11, there

exists a matrix norm ||| · |||β such that |||A |||β ≤ |||A ||| for all A ∈ Mn, where ||| · |||β is induced by

the vector norm ‖ · ‖ defined in the statement (2.7) of the proof in Theorem 2.2.11.

If U ∈Mn is unitary, then

‖Ux‖ = |||UX ||| = |||X ||| = ‖x‖

Now observe that for any nonzero x ∈ Fn, there exists a unitary matrix U such that Ux = ‖x‖2 e1, where e1 is the basis vector of Fn with the first entry equal to 1 and 0 otherwise. Thus, we have

‖x‖ = ‖Ux‖ = ‖ ‖x‖2 e1 ‖ = ‖x‖2 ‖e1‖

for all x ∈ Fn (the case x = 0 being trivial). The vector norm ‖ · ‖ is therefore a scalar multiple of the Euclidean norm. By Corollary 2.2.9, ||| · |||β (the matrix norm induced by ‖ · ‖) equals ||| · |||2 (the matrix norm induced by ‖ · ‖2). Therefore, |||A |||2 = |||A |||β ≤ |||A ||| for all A ∈ Mn.

If ||| · ||| is in addition assumed to be induced, then it is minimal, and hence |||A |||2 = |||A ||| for all A ∈ Mn by Theorem 2.2.13, hence proving the statement.
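As a quick numerical sanity check (an illustration added here, not part of the original development), the following Python/NumPy sketch verifies that the Frobenius norm is unchanged by unitary similarity and dominates the spectral norm, as Theorem 2.2.18 predicts:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # an orthogonal (unitary) matrix

frob = np.linalg.norm(A, 'fro')                    # a unitarily invariant matrix norm
print(np.isclose(np.linalg.norm(Q @ A @ Q.T, 'fro'), frob))   # unitary invariance
print(np.linalg.norm(A, 2) <= frob + 1e-12)        # spectral norm is dominated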

Next, we will determine the minimal matrix norm in the class of self-adjoint matrix norms.

For that, we need to establish several lemmas.

Lemma 2.2.19. Let ||| · ||| be a given matrix norm on Mn. Then ||| · |||∗ is an induced matrix

norm if and only if ||| · ||| is an induced matrix norm.

Proof. Let ||| · |||α be a matrix norm on Mn such that |||A |||α ≤ |||A |||∗ = |||A∗ ||| for all A ∈ Mn. Replacing A by A∗, this gives |||A |||∗α = |||A∗ |||α ≤ |||A ||| for all A ∈ Mn.

If ||| · ||| is induced, hence minimal by Theorem 2.2.13, then |||A∗ |||α = |||A ||| for all A ∈ Mn, which implies |||A |||α = |||A∗ ||| = |||A |||∗ for all A ∈ Mn, and therefore ||| · |||α = ||| · |||∗. So the only matrix norm dominated by ||| · |||∗ is ||| · |||∗ itself, i.e. ||| · |||∗ is a minimal (hence induced) matrix norm.

The converse can be established by similar reasoning.

Lemma 2.2.20. Let ||| · ||| be a given matrix norm on Mn. If the matrix norm ||| · ||| is induced

by the vector norm ‖ · ‖, then ||| · |||∗ is induced by the dual norm ‖ · ‖D.


Proof. Suppose that ||| · ||| is induced by the vector norm ‖ · ‖. By the duality theorem, we have

|||A |||∗ = |||A∗ ||| = max_{‖x‖=1} ‖A∗x‖ = max_{‖x‖=1} (‖A∗x‖D)D = max_{‖x‖=1} max_{‖z‖D=1} |(A∗x)∗z| = max_{‖z‖D=1} max_{‖x‖=1} |x∗Az| = max_{‖z‖D=1} ‖Az‖D

and hence ||| · |||∗ is induced by ‖ · ‖D by definition.

Now we are in a position to prove that the spectral norm is in fact the only induced matrix norm in the class of self-adjoint matrix norms.

Theorem 2.2.21. The spectral norm ||| · |||2 is the only matrix norm that is both induced and

self-adjoint.

Proof. Observe that if the matrix norm ||| · ||| is induced by the vector norm ‖ · ‖, and if ||| · ||| = ||| · |||∗, then by Lemma 2.2.20, ||| · ||| is also induced by ‖ · ‖D. However, Corollary 2.2.9 says that the vector norm that induces a given matrix norm is uniquely determined up to a positive scalar factor. Hence, there exists some c > 0 such that ‖ · ‖D = c‖ · ‖. Now, by Proposition 1.6.7, we then have ‖ · ‖ = ‖ · ‖2/√c. Since the given norm is a positive multiple of the Euclidean vector norm, they both induce the same matrix norm, hence we conclude that ||| · ||| = ||| · |||2.

2.3 Generalised Matrix Norms

When we relax the submultiplicativity axiom in the definition of matrix norms, we obtain a larger class of norms which is useful in several important applications. In this section, we will explore the properties of this class of norms.

Definition 2.3.1 (Generalised Matrix Norm Axioms). A function G(·) : Mn → R is said to be a generalised matrix norm on Mn, or a vector norm on Mn, if for all A, B ∈ Mn, the following axioms are satisfied:

1. (Non-negative) G(A) ≥ 0.

2. (Positive) G(A) = 0 if and only if A = 0.

3. (Homogeneous) G(cA) = |c|G(A) for all scalars c ∈ F.

4. (Triangle Inequality) G(A+B) ≤ G(A) +G(B).

We will now give examples of some generalised matrix norms which are not matrix norms.

We will also check the norm axioms for some of the cases.

Example 2.3.2. Let ||| · ||| be a matrix norm on Mn and let T, S ∈ Mn be non-singular. Then the function GT,S(·) defined by

GT,S(A) ≡ |||TAS |||


for all A ∈ Mn is a generalised matrix norm on Mn. However, in general it is not a matrix

norm on Mn.

Proof. Now, we will check that the above function is a generalised matrix norm on Mn.

1. (Non-negative and Positive) By definition of matrix norm, GT,S(A) ≥ 0 for all A ∈ Mn. We also have GT,S(A) = 0 if and only if TAS = 0; as T and S are non-singular, this holds if and only if A = 0.

2. (Homogeneous) Let c ∈ F. Then

GT,S(cA) = |||T (cA)S ||| = |c| |||TAS ||| = |c|GT,S(A)

3. (Triangle Inequality) Let A,B ∈Mn. Then

GT,S(A+B) = |||T (A+B)S ||| = |||TAS + TBS |||

≤ |||TAS |||+ |||TBS |||

= GT,S(A) +GT,S(B)

Hence, GT,S(·) is a generalised matrix norm on Mn.

Taking T = A = B = I and

S = ( 1/8  1/4 )
    ( 1/4  1/8 )

and the matrix norm to be ||| · |||∞, we have GT,S(AB) = |||S |||∞ = 3/8, but GT,S(A)GT,S(B) = |||S |||∞ |||S |||∞ = 9/64 < 3/8, hence the submultiplicativity axiom is not satisfied, and GT,S(·) is not a matrix norm on Mn in general.

Example 2.3.3. Define the Hadamard product of two matrices A = [aij ] and B = [bij ] of the

same size to be the entry-wise product A ◦ B ≡ [aijbij ]. If H = [hij ] ∈ Mn is a given matrix

with no zero entries, and if ||| · ||| is any matrix norm on Mn, then the function GH(·) given by

GH(A) ≡ |||H ◦A |||

is a generalised matrix norm on Mn.

However, in general, it is not a matrix norm on Mn.

Proof. We will check that the above function is a vector norm on Mn.

1. By definition of matrix norm, we have GH(A) ≥ 0 for all A ∈Mn. Moreover, GH(A) = 0

if and only if H ◦A = 0. As H contains no zero entries, we must have A = 0.

2. Let A = [aij ] ∈Mn and c ∈ F. Then we have

GH(cA) = |||H ◦ cA ||| = ||| [hij caij ] ||| = ||| c[hij aij ] ||| = |c| |||H ◦A ||| = |c|GH(A)


3. Let A = [aij ], B = [bij ] ∈Mn. Then we have

GH(A+B) = |||H ◦ (A+B) ||| = ||| [hij ] ◦ [aij + bij ] |||

= ||| [hij(aij + bij)] |||

= ||| [hij aij ] + [hij bij ] |||

= |||H ◦A+H ◦B |||

≤ |||H ◦A |||+ |||H ◦B |||

= GH(A) +GH(B)

Hence, GH is a generalised matrix norm on Mn.

Taking

H = ( 1/2  1/2 ),   A = ( 0  1 ),   B = ( 0  0 ),
    ( 1/2  1/2 )        ( 0  0 )        ( 1  0 )

and the matrix norm to be ||| · |||∞, we have GH(AB) = 1/2, but GH(A)GH(B) = 1/4, hence the submultiplicativity axiom is not satisfied, and GH(·) is not a matrix norm on Mn in general.

Example 2.3.4. The function G∞(·) defined by

G∞(A) ≡ max_{1≤i,j≤n} |aij|

for all A = [aij] ∈ Mn is a generalised matrix norm on Mn. However, G∞(·) is not a matrix norm on Mn in general: for instance, if J ∈ M2 is the matrix with all entries equal to 1, then G∞(J²) = G∞(2J) = 2 > 1 = G∞(J)G∞(J).
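The three counterexamples above are easy to verify numerically. Below is a minimal Python/NumPy sketch (our own illustration, not part of the original text; op_inf is a hypothetical helper computing the maximum row sum norm):

import numpy as np

def op_inf(M):                              # maximum row sum norm ||| . |||_inf
    return np.abs(M).sum(axis=1).max()

# Example 2.3.2 with T = I, so G_{T,S}(M) = ||| M S |||_inf
S = np.array([[1/8, 1/4], [1/4, 1/8]])
GTS = lambda M: op_inf(M @ S)
I2 = np.eye(2)
print(GTS(I2 @ I2), GTS(I2) * GTS(I2))      # 0.375 > 0.140625: not submultiplicative

# Example 2.3.3 (the Hadamard product is '*' in NumPy)
H = np.full((2, 2), 0.5)
A = np.array([[0., 1.], [0., 0.]]); B = np.array([[0., 0.], [1., 0.]])
GH = lambda M: op_inf(H * M)
print(GH(A @ B), GH(A) * GH(B))             # 0.5 > 0.25

# Example 2.3.4 on the all-ones matrix J: G_inf(J^2) = 2 > 1 = G_inf(J)^2
J = np.ones((2, 2))
print(np.abs(J @ J).max(), np.abs(J).max() ** 2)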

Some properties of matrix norms do carry over to the case of generalised matrix norms. One

of the useful properties that generalised matrix norms also enjoy is the equivalence property in

finite-dimensional vector space.

Theorem 2.3.5. Let G(·) and Gα(·) be generalised matrix norms on Mn. Then there exist

finite positive constants cm and CM such that

cmGα(A) ≤ G(A) ≤ CMGα(A) (2.8)

for all A ∈ Mn. In particular, this inequality also holds when Gα(·) is replaced by any matrix

norm ||| · ||| on Mn, i.e. we also have

cm |||A ||| ≤ G(A) ≤ CM |||A ||| (2.9)

for all A ∈Mn.

Proof. This follows almost immediately from Theorem 1.4.8, once we know that G(·) is a continuous function. But the statement of Lemma 2.1.7 concerning the uniform continuity of matrix norms remains valid for generalised matrix norms: the definition of a generalised matrix norm differs from that of a matrix norm only in the absence of the submultiplicativity axiom, and nowhere in the proof of Lemma 2.1.7 is submultiplicativity used. Hence, we have the conclusion.


Considering the fact that the definition of generalised matrix norm differs from matrix norm

only in the submultiplicativity property, we would expect these two classes of norms to be related

closely. The following result shows that a generalised matrix norm can be made into a matrix

norm by multiplying it with appropriate positive constant. First, we prove a lemma.

Lemma 2.3.6. Let G(·) be a generalised matrix norm on Mn. Define

c(G) ≡ max_{A≠0, B≠0} G(AB) / (G(A)G(B))    (2.10)

If ||| · ||| is a matrix norm on Mn such that

cm |||A ||| ≤ G(A) ≤ CM |||A |||    (2.11)

for all A ∈ Mn, then c(G) ≤ CM/cm².

Proof. We have

c(G) ≡ max_{A≠0, B≠0} G(AB) / (G(A)G(B)) = max_{G(A′)=G(B′)=1} G(A′B′)

which is finite and positive by the continuity of G(·) and the compactness of its unit sphere. Then we have

G(AB) ≤ CM |||AB ||| ≤ CM |||A ||| |||B ||| ≤ CM [G(A)/cm] [G(B)/cm] = (CM/cm²) G(A)G(B)

This implies

G(AB) / (G(A)G(B)) ≤ CM/cm²

for all nonzero A, B ∈ Mn. Hence we have c(G) ≤ CM/cm² as required.

Theorem 2.3.7. Let G(·) be a generalised matrix norm on Mn and let c(G) be as defined in Lemma 2.3.6. Define the function ||| · ||| by |||A ||| ≡ kG(A), where k is a positive constant. If k ≥ c(G), then ||| · ||| is a matrix norm.

In particular, ||| · ||| ≡ (CM/cm²) G(·) is a matrix norm.

Proof. We will check that ||| · ||| satisfies the matrix norm axioms.

1. (Non-negative and Positive) Clearly a positive multiple of a generalised matrix norm is

always non-negative. Moreover, |||A ||| = 0 if and only if G(A) = 0, which implies A = 0.

2. (Homogeneous) Let c ∈ F. Then

||| cA ||| = kG(cA) = |c|kG(A) = |c| |||A |||


3. (Triangle Inequality) Let A,B ∈Mn. Then

|||A+B ||| = kG(A+B) ≤ k(G(A) +G(B)) = |||A |||+ |||B |||

4. (Submultiplicativity) Let A,B ∈Mn. Then by definition of c(G),

|||AB ||| = kG(AB) ≤ kc(G)G(A)G(B)

≤ kG(A) kG(B)

= |||A ||| |||B |||

Hence, ||| · ||| is a matrix norm.

In particular, ||| · ||| ≡ (CM/cm²) G(·) is a matrix norm, as CM/cm² ≥ c(G) by Lemma 2.3.6, hence satisfying the hypothesis of the above theorem.

At this point, to derive further properties of generalised matrix norms, we are going to define

the notion of spectral radius of a matrix. Further properties and applications of this spectral

radius will be given in the next chapter.

Definition 2.3.8 (Spectral Radius). The spectral radius ρ(A) of a matrix A ∈Mn is

ρ(A) ≡ max{|λ| : λ is an eigenvalue of A}

Below is the relation between spectral radius and matrix norm that we require at the moment.

Theorem 2.3.9. If ||| · ||| is any matrix norm on Mn, then ρ(A) ≤ |||A ||| for all A ∈Mn.

Proof. Observe that if λ is any eigenvalue of A, then |λ| ≤ ρ(A). Moreover, there is at least one eigenvalue λ for which |λ| = ρ(A). Let x be an eigenvector associated with such a λ. Consider the matrix X ∈ Mn, all the columns of which are equal to the eigenvector x. Then AX = λX, and we have

|λ| |||X ||| = |||λX ||| = |||AX ||| ≤ |||A ||| |||X |||

Since x ≠ 0, we have |||X ||| > 0, and dividing by |||X ||| gives |λ| = ρ(A) ≤ |||A ||| as required.
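A short numerical illustration of Theorem 2.3.9 (a NumPy sketch of ours, not part of the original text): the spectral radius of a random matrix is dominated by each of the common matrix norms.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))

rho = np.abs(np.linalg.eigvals(A)).max()      # spectral radius rho(A)
norms = [np.abs(A).sum(axis=0).max(),         # ||| A |||_1   (max column sum)
         np.abs(A).sum(axis=1).max(),         # ||| A |||_inf (max row sum)
         np.linalg.norm(A, 2),                # ||| A |||_2   (spectral norm)
         np.linalg.norm(A, 'fro')]            # Frobenius norm
print(all(rho <= v for v in norms))           # True, by Theorem 2.3.9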

Next, we will define the notion of compatible matrix norm as well as compatible vector norm

on matrices.

Definition 2.3.10. The vector norm ‖ · ‖ on a finite-dimensional vector space V over the field

F (R or C) is said to be compatible with the generalised matrix norm G(·) in Mn if

‖Ax‖ ≤ G(A)‖x‖

for all x ∈ V and all A ∈Mn.

Similarly, ‖ · ‖ is compatible with the matrix norm ||| · ||| on Mn if

‖Ax‖ ≤ |||A ||| ‖x‖

for all x ∈ V and all A ∈Mn.


In Proposition 2.2.3, we have shown that given a vector norm on Fn, there is a matrix norm compatible with it (the induced matrix norm). Here, we will show the converse.

Proposition 2.3.11. If ||| · ||| is a matrix norm on Mn, then there is some vector norm on Fn

that is compatible with it.

Proof. If we define a vector norm ‖ · ‖ on Fn by ‖x‖ ≡ ||| [x 0 . . . 0] |||, then we have

‖Ax‖ = ||| [Ax 0 . . . 0] ||| = |||A[x 0 . . . 0] |||

≤ |||A ||| ||| [x 0 . . . 0] |||

= |||A ||| ‖x‖

Hence, we find a vector norm ‖ · ‖ compatible with given ||| · ||| as required.

Now we are going to extend the above notion of compatibility to the case of generalised

matrix norm. In this case, the situation is more complicated as some generalised matrix norms

on Mn have compatible vector norms in Fn, while others do not. This will be explored in the

next few theorems.

Theorem 2.3.12. Let G(·) be a generalised matrix norm on Mn that has a compatible vector

norm ‖ · ‖ on Fn. Then G(A) ≥ ρ(A) for all A ∈Mn. More generally,

G(A1)G(A2) . . . G(Ak) ≥ ρ(A1A2 . . . Ak) (2.12)

for all A1, A2, . . . , Ak ∈Mn and all k = 1, 2, . . ..

Proof. We will first prove the following claim.

Claim: G(A1)G(A2) . . . G(Ak)‖x‖ ≥ ‖A1A2 . . . Akx‖ for A1, . . . , Ak ∈ Mn and x ∈ Fn, where

‖ · ‖ is compatible with G(·).

Proof of Claim: We will proceed by induction.

Let P(m) be the statement ‘G(A1)G(A2) . . . G(Am)‖x‖ ≥ ‖A1A2 . . . Amx‖ for all A1, . . . , Am ∈ Mn and x ∈ Fn’.

Clearly, P(1) is true by the definition of ‖ · ‖ being compatible with G(·). Suppose P(m) is true for some positive integer m. We will prove P(m+1) is true. We have

G(A1) . . . G(Am)G(Am+1)‖x‖ ≥ G(A1) . . . G(Am)‖Am+1x‖

≥ ‖A1 . . . AmAm+1x‖

Hence P (m+ 1) is true. Therefore the claim is proven by mathematical induction.

Now, let x be a nonzero vector such that A1 . . . Akx = λx, where |λ| = ρ(A1 . . . Ak). Then

G(A1)G(A2) . . . G(Ak)‖x‖ ≥ ‖A1A2 . . . Akx‖ = ‖λx‖ = ρ(A1A2 . . . Ak)‖x‖

Since ‖x‖ > 0, dividing by ‖x‖ gives the conclusion.

We have observed in Theorem 2.3.12 a necessary condition for a given generalised matrix norm on Mn to have a compatible vector norm on Fn. It is in fact sufficient. To show this, we need another lemma.

Lemma 2.3.13. Let G(·) be a generalised matrix norm on Mn that satisfies the condition (2.12) in Theorem 2.3.12. Let ||| · |||2 denote the spectral norm on Mn. Then there exists a finite positive constant c = c(G) such that

G(A1)G(A2) . . . G(Ak) ≥ c |||A1A2 . . . Ak |||2

for all A1, A2, . . . , Ak ∈ Mn and all k = 1, 2, . . ..

Proof. Let k be a given positive integer and let A1, A2, . . . , Ak ∈Mn be given.

By singular value decomposition theorem, there exist unitary matrices V and W and a diagonal

matrix Σ = diag(σ1, σ2, . . . , σn) with all σi ≥ 0 for i = 1, . . . , n, such that A1A2 . . . Ak = V ΣW ∗

and ρ(Σ) = max{σ1, . . . , σn} = |||A1A2 . . . Ak |||2.

(Refer to [2] for more details on singular value decomposition theorem, and the proof of the

above assertion).

By the hypothesis, we have

G(V∗)G(A1)G(A2) . . . G(Ak)G(W) ≥ ρ(V∗A1A2 . . . AkW) = ρ(Σ) = |||Σ |||2 = |||V∗A1A2 . . . AkW |||2 = |||A1A2 . . . Ak |||2

where we used the fact that the spectral norm is unitarily invariant.

Furthermore, by the equivalence between generalised matrix norms and matrix norms on Mn, there exists a finite positive constant b = b(G) such that |||A |||2 ≥ bG(A) for all A ∈ Mn. We then have

G(A1)G(A2) . . . G(Ak) ≥ [1/(G(V∗)G(W))] |||A1A2 . . . Ak |||2 ≥ [b²/(|||V∗ |||2 |||W |||2)] |||A1A2 . . . Ak |||2 = b² |||A1A2 . . . Ak |||2

since |||V∗ |||2 = |||W |||2 = 1 for unitary matrices. The conclusion follows by taking c = b².

Now, we are ready to prove that the necessary condition above is also sufficient for a generalised matrix norm to have a compatible vector norm.

Theorem 2.3.14. Let G(·) be a generalised matrix norm on Mn. There exists a vector norm

‖ · ‖ on Fn such that

‖Ax‖ ≤ G(A)‖x‖


for all x ∈ Fn and all A ∈Mn if and only if

G(A1)G(A2) . . . G(Ak) ≥ ρ(A1A2 . . . Ak)

for all A1, A2, . . . , Ak ∈Mn and all k = 1, 2, . . ..

Proof. Necessity has been proven in Theorem 2.3.12. For sufficiency, we will show that there exists a matrix norm ||| · ||| on Mn such that G(A) ≥ |||A ||| for all A ∈ Mn. Granting this, let ‖ · ‖ be a vector norm on Fn that is compatible with ||| · ||| (which is guaranteed to exist by Proposition 2.3.11) and let x ∈ Fn and A ∈ Mn be given. Then ‖Ax‖ ≤ |||A ||| ‖x‖ ≤ G(A)‖x‖. So we will be done if we can construct a matrix norm that is dominated by G(·).

Now observe that for a given matrix A ∈ Mn, there are various ways to represent A as a product of matrices or as a sum of products of matrices. Define

|||A ||| ≡ inf { Σi G(Ai1) . . . G(Aiki) : Σi Ai1 . . . Aiki = A and all Aij ∈ Mn }    (2.13)

It remains to check that ||| · ||| defined above is indeed a matrix norm.

1. (Non-negative and Positive) By Lemma 2.3.13 and the triangle inequality for the spectral norm, we have

Σi G(Ai1) . . . G(Aiki) ≥ Σi c |||Ai1 . . . Aiki |||2 ≥ c ||| Σi Ai1 . . . Aiki |||2 = c |||A |||2

from which it follows that ||| · ||| is non-negative. Moreover, we have |||A ||| = 0 if and only if A = 0, for if A ≠ 0 then by the inequality above, |||A ||| ≥ c |||A |||2 > 0 by the property of matrix norms.

2. (Homogeneous) Let c ∈ F. The case c = 0 is immediate, so suppose c ≠ 0. Every representation Σi Bi1 . . . Biki = cA of cA as a sum of products gives rise to the representation Σi Bi1 . . . (c^{−1}Bij) . . . Biki = A of A (scaling one factor in each product), and conversely. Since G(c^{−1}B) = |c|^{−1}G(B), the corresponding sums of products differ exactly by the factor |c|. Taking infima over all representations, we obtain ||| cA ||| = |c| |||A |||.

3. (Triangle Inequality) Let A, B ∈ Mn and let C = A+B. Consider the following sets:

A′ ≡ { Σi G(Ai1) . . . G(Aiki) : Σi Ai1 . . . Aiki = A and all Aij ∈ Mn }

B′ ≡ { Σi G(Bi1) . . . G(Biki) : Σi Bi1 . . . Biki = B and all Bij ∈ Mn }

C′ ≡ { Σi G(Ci1) . . . G(Ciki) : Σi Ci1 . . . Ciki = C and all Cij ∈ Mn }

where each set defined above is the set of sums of products of generalised matrix norms over the various representations of the matrix concerned as a sum of products. Define the addition of sets by

A′ + B′ ≡ {a + b : a ∈ A′, b ∈ B′}

Then we have (A′ + B′) ⊆ C′, because every pair of representations of A and B separately as sums of products yields a representation of C as a sum of products, but not all representations of C arise in this way. Therefore

|||A+B ||| = |||C ||| = inf C′ ≤ inf(A′ + B′) = inf(A′) + inf(B′) = |||A ||| + |||B |||

4. (Submultiplicative) Let A, B ∈ Mn and let C = AB. Let the sets A′, B′, C′ be as defined above. Define the product of sets by

A′B′ ≡ {ab : a ∈ A′, b ∈ B′}

Then we have A′B′ ⊆ C′, for the same reason that every pair of representations of A and B separately as sums of products yields a representation of C as a sum of products, but not all representations of C arise in this way. Therefore

|||AB ||| = |||C ||| = inf C′ ≤ inf(A′B′) = inf(A′) inf(B′) = |||A ||| |||B |||

Hence, we have proven that ||| · ||| is a matrix norm and the conclusion follows.

Now we have understood the useful necessary and sufficient conditions for a generalised

matrix norm on Mn to have a compatible vector norm on Fn. Furthermore, we also know that

given a vector norm on Fn, there exists a matrix norm onMn that is compatible with it (which

is the induced matrix norm). Now we are going to show that given a vector norm on Fn, one

can always find a compatible generalised matrix norm onMn that is not submultiplicative (i.e.

not a matrix norm).

Proposition 2.3.15. Let ‖ · ‖ be a given vector norm on Fn. Then there exists a generalised

matrix norm G(·) on Mn, which is not a matrix norm and is such that

‖Ax‖ ≤ G(A)‖x‖

for all x ∈ Fn and all A ∈Mn.


Proof. Let P ∈ Mn be any permutation matrix whose entries on the main diagonal are all zero. For instance, take P = [pij] with

pij = 1 if j = i+1, or if i = n and j = 1; and pij = 0 otherwise.

Let ||| · ||| denote the matrix norm on Mn which is induced by the vector norm ‖ · ‖. Let A = [aij] ∈ Mn. Define G(·) on Mn by

G(A) ≡ |||A ||| + |||P ||| |||P^T ||| max_{1≤i≤n} |aii|

We will now verify that G(·) is a generalised matrix norm onMn. Non-negativity and Positivity

axioms are almost immediate. We will check the remaining axioms.

1. (Homogeneous) Let c ∈ F. Then

G(cA) = ||| cA ||| + |||P ||| |||P^T ||| max_{1≤i≤n} |caii| = |c| |||A ||| + |||P ||| |||P^T ||| |c| max_{1≤i≤n} |aii| = |c| ( |||A ||| + |||P ||| |||P^T ||| max_{1≤i≤n} |aii| ) = |c| G(A)

2. (Triangle Inequality) Let A, B ∈ Mn. Then

G(A+B) = |||A+B ||| + |||P ||| |||P^T ||| max_{1≤i≤n} |aii + bii| ≤ |||A ||| + |||B ||| + |||P ||| |||P^T ||| ( max_{1≤i≤n} |aii| + max_{1≤i≤n} |bii| ) = G(A) + G(B)

Hence G(·) is a generalised matrix norm on Mn. Moreover G(A) ≥ |||A ||| for all A ∈ Mn, and

‖Ax‖ ≤ |||A ||| ‖x‖ ≤ G(A)‖x‖

for all A ∈ Mn and all x ∈ Fn. Observe that P is orthogonal, so PP^T = I, and ||| I ||| = 1 by Proposition 2.2.3. Since P and P^T have zero main diagonal while I does not, we have

G(PP^T) = G(I) = ||| I ||| + |||P ||| |||P^T ||| = 1 + |||P ||| |||P^T |||

G(P) = |||P |||,  G(P^T) = |||P^T |||

Hence, we have

G(PP^T) = 1 + G(P)G(P^T) > G(P)G(P^T)

Hence, the vector norm G(·) on Mn is compatible with the given vector norm ‖ · ‖ on Fn, but it is not submultiplicative.

We have observed in Theorems 2.3.12 and 2.3.14 that the condition G(A1) . . . G(Ak) ≥ ρ(A1 . . . Ak) for all k is necessary and sufficient for a generalised matrix norm on Mn to have a compatible vector norm ‖ · ‖ on Fn; in particular, such norms satisfy G(A) ≥ ρ(A) for all A ∈ Mn. Subsequently, we will study and characterise the generalised matrix norms on Mn that have this latter property.

Definition 2.3.16. Let G(·) be a generalised matrix norm on Mn. If G(A) ≥ ρ(A) for all

A ∈Mn, then G(·) is said to be spectrally dominant.

Definition 2.3.17. Let G(·) be a generalised matrix norm on Mn. Define the spectral characteristic of G(·) to be

m(G) = max_{G(A)≤1} ρ(A)

A generalised matrix norm G(·) on Mn is said to be minimally spectrally dominant if m(G) = 1.

Observe that any matrix norm induced by a vector norm is an example of a minimally spectrally dominant matrix norm.

Proposition 2.3.18. Any induced matrix norm ||| · ||| on Mn is minimally spectrally dominant.

Proof. We will show that m(||| · |||) = 1. We have

m(||| · |||) = max_{|||A |||≤1} ρ(A) ≤ max_{|||A |||≤1} |||A ||| ≤ 1

Moreover, we have ||| I ||| = 1 by Proposition 2.2.3 and ρ(I) = 1, hence the maximum above is actually attained, and we have m(||| · |||) = 1 as required.

We will now explore some of the properties of spectrally dominant generalised matrix norms.

Proposition 2.3.19. Let G(·) be a generalised matrix norm on Mn. Then G(·) is spectrally

dominant if and only if m(G) ≤ 1.

Proof. First, suppose G(A) ≥ ρ(A) for all A ∈ Mn. Then we have

m(G) = max_{G(A)≤1} ρ(A) ≤ max_{G(A)≤1} G(A) ≤ 1

Conversely, suppose m(G) ≤ 1. Then we have

max_{A≠0} ρ(A)/G(A) = max_{A≠0} ρ(A/G(A)) = max_{G(A)=1} ρ(A) ≤ max_{G(A)≤1} ρ(A) = m(G) ≤ 1

Thus we have ρ(A) ≤ G(A) for all nonzero matrices A. For A = 0, the result is trivially true, hence proving the statement for all A ∈ Mn.


Now, any generalised matrix norm can be made into a spectrally dominant norm by multi-

plying it with a suitable constant, as illustrated below.

Theorem 2.3.20. Let G(·) be a generalised matrix norm on Mn. Then G′(·) defined by

G′(A) ≡ m(G)G(A)

for all A ∈Mn is a spectrally dominant generalised matrix norm on Mn.

Proof. It can be checked that G′(·) is a vector norm on Mn. We now need to show that G′(A) ≥ ρ(A) for all A ∈ Mn. We have, for nonzero A,

ρ(A)/G(A) ≤ max_{B≠0} ρ(B)/G(B) = max_{B≠0} ρ(B/G(B)) = max_{G(B)=1} ρ(B) ≤ max_{G(B)≤1} ρ(B) = m(G)

This implies

m(G)G(A) ≥ ρ(A)    (2.14)

for all A ∈ Mn. Hence G′(A) ≥ ρ(A) for all A ∈ Mn as required.

We will end this chapter with some sufficient conditions of spectrally dominant generalised

matrix norm. For that, we need to prove a lemma.

Lemma 2.3.21. Let G(·) be a generalised matrix norm and let A1, A2, . . . be a sequence in Mn such that ρ(Aj) = 1 for j = 1, 2, . . .. Then G(Aj) does not tend to 0 as j → ∞.

Proof. Suppose G(Aj) → 0 as j → ∞. Now when ρ(B) = 1, where B ∈ Mn, we have m(G)G(B) ≥ ρ(B) = 1 by statement (2.14) in Theorem 2.3.20, which implies G(B) ≥ 1/m(G) > 0.

However, for the above sequence, we have G(Aj) < 1/m(G) by taking j sufficiently large, which is a contradiction.

Theorem 2.3.22. Let G(·) be a generalised matrix norm on Mn and A ∈ Mn. If there exists

a constant γA (depending on G(·) and A) such that for all integers k > 0,

G(Ak) ≤ γAG(A)k (2.15)

then G is spectrally dominant.

Proof. Suppose that γA exists with the above property for each A ∈ Mn, but that G(·) is not spectrally dominant: say ρ(A) = m > G(A) for some A ∈ Mn (note that m > 0, since G(A) ≥ 0).

Then we have G(A)/m < 1, which implies (1/m^k)G(A)^k → 0 as k → ∞. This implies (γA/m^k)G(A)^k → 0. By statement (2.15), this implies

(1/m^k)G(A^k) → 0,  i.e.  G((1/m^k)A^k) → 0 as k → ∞.

Moreover, we have ρ(A^k) = m^k, which implies ρ((1/m^k)A^k) = 1 for all k. However, this is a contradiction by Lemma 2.3.21, hence proving the statement.

Another application of Lemma 2.3.21 yields a different sufficient condition for spectral dominance.

Proposition 2.3.23. Let G(·) be a generalised matrix norm on Mn. If for some fixed positive

integer k,

G(Ak) ≤ G(A)k (2.16)

for all A ∈Mn, then G(·) is spectrally dominant.

Proof. We will first prove that G(A^{k^l}) ≤ G(A)^{k^l} for all positive integers l. Let P(l) be this statement. The case l = 1 is given to be true. Suppose the statement is true for some l. We will show that P(l+1) is true:

G(A^{k^{l+1}}) = G[(A^{k^l})^k] ≤ G(A^{k^l})^k ≤ [G(A)^{k^l}]^k = G(A)^{k^{l+1}}

proving the statement.

Now, suppose ρ(A) = m but G(A) < m. Then, exactly as in the proof of Theorem 2.3.22, we have ρ((1/m^{k^l})A^{k^l}) = 1 for all l, while (1/m^{k^l})G(A^{k^l}) → 0 as l → ∞, i.e. G((1/m^{k^l})A^{k^l}) → 0, which is a contradiction by Lemma 2.3.21. Hence G(·) is spectrally dominant.


Chapter 3

Applications of Norms

In this chapter, some applications of the various classes of norms discussed in previous chapters

will be given. In particular, we will discuss the notion of convergence of series of matrices

and bounds for the roots of algebraic equations using the norms we have discussed so far.

Furthermore, we will give some simple applications of norms on perturbation of eigenvalues,

which is an important notion in numerical linear algebra.

3.1 Sequences and Series of Matrices

We will discuss the infinite sequences and series of matrices, as well as power series of matrices,

in this section. This can be thought of as a generalisation of infinite sequences and series of real

numbers. We will need the notion of spectral radius as defined in Definition 2.3.8 as well as its

basic property in Theorem 2.3.9. Now we will define the notion of convergence inMn formally.

This is a natural extension of the convergence of sequences of vectors in Fn.

Note that in the following, we do not specify with respect to which matrix norm the sequence converges, as all matrix norms on Mn are equivalent by Theorem 2.1.9.

Definition 3.1.1. Let ||| · ||| be a matrix norm on Mn. Then the sequence {Ak} of matrices in Mn converges to a matrix A ∈ Mn if and only if |||Ak − A ||| → 0 as k → ∞. In such a case, we write

lim_{k→∞} Ak = A

Definition 3.1.2. Let ||| · ||| be a matrix norm on Mn. Then the sequence {Ak} of matrices in

Mn is said to be a Cauchy sequence if for each ε > 0, there exists a positive integer N = N(ε),

such that whenever m,n ≥ N ,

|||Am −An ||| < ε

Now that we have these definitions, equivalence of matrix norms in Mn established earlier,

and the similarity of the axioms of vector norms and matrix norms, all the analytic properties

of vector norms carry over to the case of matrix norms. Below we will state some of them. The

proof is essentially the same as those in the case of vector norms in Section 1.4 (by replacing

vector norms with matrix norms, and basis of Fn with basis of Mn).

Theorem 3.1.3. Let ||| · ||| be a matrix norm on Mn and let A,B ∈ Mn. If Ak → A and

Ak → B, then A = B.


Theorem 3.1.4. Let ||| · ||| be a matrix norm on Mn and let {Ak} be a sequence of matrices on

Mn. The sequence {Ak} converges to a matrix A ∈Mn if and only if it is a Cauchy sequence.

We will now formulate several lemmas to determine the behaviour of the sequence of matrices

{Ak} as k → ∞, which will be useful later to determine the convergence of power series of

matrices.

Lemma 3.1.5. If ||| · ||| is a matrix norm on Mn and if S ∈ Mn is non-singular, the function ||| · |||S defined by

|||A |||S ≡ |||SAS^{−1} |||

is a matrix norm.

Proof. The Non-negativity and Homogeneity axioms are easy to verify. We will check that the remaining axioms are satisfied.

1. (Triangle Inequality) Let A, B ∈ Mn. Then

|||A+B |||S = |||S(A+B)S^{−1} ||| = |||SAS^{−1} + SBS^{−1} ||| ≤ |||SAS^{−1} ||| + |||SBS^{−1} ||| = |||A |||S + |||B |||S

2. (Submultiplicative) Let A, B ∈ Mn. Then

|||AB |||S = |||SABS^{−1} ||| = ||| (SAS^{−1})(SBS^{−1}) ||| ≤ |||SAS^{−1} ||| |||SBS^{−1} ||| = |||A |||S |||B |||S

Hence, ||| · |||S is a matrix norm on Mn.

Lemma 3.1.6. Let A ∈ Mn and ε > 0 be given. Then there exists a matrix norm ||| · ||| such

that ρ(A) ≤ |||A ||| ≤ ρ(A) + ε.

Proof. By Schur’s Triangularisation Theorem (see [3] for a proof), there exist a unitary matrix U and an upper triangular matrix ∆ such that A = U∗∆U. Let Dt ≡ diag(t, t², . . . , t^n), and let the entries of ∆ be

[∆]ij = dij if i ≤ j, and 0 otherwise.

Then Dt∆Dt^{−1} is upper triangular with (i, j)-th entry t^{i−j}dij; explicitly,

Dt∆Dt^{−1} = ( d11  t^{−1}d12  t^{−2}d13  . . .  t^{−n+1}d1n )
             ( 0    d22        t^{−1}d23  . . .  t^{−n+2}d2n )
             ( ...             ...        . . .  ...         )
             ( 0    0          0          . . .  dnn         )

Hence, by taking t > 0 sufficiently large, we can make the sum of the absolute values of all the off-diagonal entries arbitrarily small (i.e. less than ε). Since the diagonal entries dii are the eigenvalues of A, taking the maximum column sum matrix norm we have |||Dt∆Dt^{−1} |||1 ≤ ρ(A) + ε for t > 0 sufficiently large.


Now, construct a function ||| · ||| defined by

|||A ||| ≡ |||Dt(UAU∗)Dt^{−1} |||1 = ||| (DtU)A(DtU)^{−1} |||1    (3.1)

(note UAU∗ = ∆), which is a matrix norm on Mn by Lemma 3.1.5. Hence, we have constructed a matrix norm such that |||A ||| ≤ ρ(A) + ε. Since |||A ||| ≥ ρ(A) for any matrix norm by Theorem 2.3.9, the conclusion follows.

Note that Lemma 3.1.6 implies that ρ(A) = inf{|||A ||| : ||| · ||| is a matrix norm}.
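The scaling trick in the proof of Lemma 3.1.6 can be seen concretely. In the following NumPy sketch (our illustration, not part of the original text; the matrix is chosen upper triangular so that its Schur form is itself, with U = I), the maximum column sum of Dt∆Dt^{−1} decreases towards ρ(A) as t grows:

import numpy as np

A = np.array([[1.0, 100.0],
              [0.0,   2.0]])        # rho(A) = 2, but ||| A |||_1 = 102

for t in [1.0, 10.0, 1000.0]:
    Dt = np.diag([t, t ** 2])                   # D_t = diag(t, t^2)
    M = Dt @ A @ np.linalg.inv(Dt)              # off-diagonal entry becomes 100/t
    print(t, np.abs(M).sum(axis=0).max())       # 102.0, 12.0, 2.1 -> rho(A) = 2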

Lemma 3.1.7. Let A ∈ Mn be a given matrix. If there is a matrix norm ||| · ||| such that |||A ||| < 1, then lim_{k→∞} A^k = 0; that is, all the entries of A^k tend to zero as k → ∞.

Proof. If |||A ||| < 1, then |||A^k ||| ≤ |||A |||^k → 0 as k → ∞. As all matrix norms on Mn are equivalent, this implies |||A^k |||∞ → 0, where ||| · |||∞ is the maximum row sum norm; since |||A^k |||∞ dominates the absolute value of every entry, the entries of A^k → 0 as k → ∞.

We are now ready to state and prove a theorem which explains the behaviour of A^k as k → ∞.

Theorem 3.1.8. Let A ∈ Mn. Then lim_{k→∞} A^k = 0 if and only if ρ(A) < 1. In this case, the matrix A is said to be convergent.

Proof. Suppose A^k → 0 as k → ∞. If x ≠ 0 is an eigenvector of A such that Ax = λx, then A^k x = λ^k x → 0, which forces |λ| < 1. Since this must hold for every eigenvalue λ of A, we conclude that ρ(A) < 1.

Conversely, suppose ρ(A) < 1. Then by Lemma 3.1.6, there exists some matrix norm ||| · ||| such that |||A ||| < 1. Thus, A^k → 0 as k → ∞ by Lemma 3.1.7.

We can derive several corollaries from the above theorem. One of the most useful is Gelfand’s formula for the spectral radius. Another is a useful bound on the size of the entries of A^k as k → ∞.

Corollary 3.1.9. Let A ∈ Mn be a given matrix, and let ε > 0 be given. Then there exists a

constant C = C(A, ε) such that

|(Ak)ij | ≤ C[ρ(A) + ε]k

for all k = 1, 2, . . . and all i, j = 1, 2, . . . , n, where (Ak)ij denotes the (i, j)-entry of the matrix

Ak.

Proof. Consider the matrix Â ≡ [ρ(A) + ε]^{−1}A, which has spectral radius ρ(Â) = ρ(A)/(ρ(A) + ε) < 1. Then by Theorem 3.1.8, Â^k → 0 as k → ∞. In particular, the entries of the matrices in the sequence {Â^k} are bounded, i.e. there exists a constant C > 0 such that |(Â^k)ij| ≤ C for all k. This implies |(A^k)ij| ≤ C[ρ(A) + ε]^k as required.


Corollary 3.1.10 (Gelfand’s Formula). Let ||| · ||| be a matrix norm on Mn. Then

ρ(A) = lim_{k→∞} |||A^k |||^{1/k}

for all A ∈ Mn.

Proof. Since ρ(A)^k = ρ(A^k) ≤ |||A^k |||, we have ρ(A) ≤ |||A^k |||^{1/k} for all k = 1, 2, . . ..

Given ε > 0, the matrix Â ≡ [ρ(A) + ε]^{−1}A has spectral radius strictly less than 1, hence it is convergent. By the definition of limits, there exists a natural number N such that ||| Â^k ||| < 1 for all k ≥ N, i.e. |||A^k ||| ≤ [ρ(A) + ε]^k for all k ≥ N, which implies |||A^k |||^{1/k} ≤ ρ(A) + ε.

Then we have ρ(A) ≤ |||A^k |||^{1/k} ≤ ρ(A) + ε for all k ≥ N. Since ε > 0 is arbitrary, the result follows.
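Gelfand’s formula lends itself to a direct numerical check. Below is a NumPy sketch (our own illustration, not part of the original text; the norm used is the maximum row sum norm):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)) / 3
rho = np.abs(np.linalg.eigvals(A)).max()

for k in [1, 5, 20, 80]:
    Ak = np.linalg.matrix_power(A, k)
    gel = np.abs(Ak).sum(axis=1).max() ** (1.0 / k)   # ||| A^k |||_inf ^ (1/k)
    print(k, gel, rho)                                # gel approaches rho(A)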

We have established several useful results on the convergence of A^k. We will now extend this concept of convergence to infinite series and power series of matrices.

Theorem 3.1.11. Let {Ak} ⊂ Mn be a given infinite sequence of matrices. If there exists a matrix norm ||| · ||| on Mn such that the series of real numbers Σ_{k=0}^∞ |||Ak ||| is convergent, then the series Σ_{k=0}^∞ Ak is convergent.

Proof. By Cauchy’s Criterion, the convergence of Σ_{k=0}^∞ |||Ak ||| implies that given ε > 0, there exists a positive integer N0 such that for all natural numbers n, m with n > m > N0, we have Σ_{k=m+1}^n |||Ak ||| < ε.

Now, given ε > 0, for all n, m such that n > m > N0, we have

||| Σ_{k=m+1}^n Ak ||| ≤ Σ_{k=m+1}^n |||Ak ||| < ε

By Cauchy’s Criterion, Σ_{k=0}^∞ Ak is convergent.

Theorem 3.1.12. Let A ∈ Mn and let {ak} be a sequence of complex numbers. Then the series Σ_{k=0}^∞ ak A^k converges if there exists a matrix norm ||| · ||| on Mn such that the series of real numbers Σ_{k=0}^∞ |ak| |||A |||^k converges.

Proof. Similarly to the proof of Theorem 3.1.11, the convergence of Σ_{k=0}^∞ |ak| |||A |||^k implies that given any ε > 0, there exists a positive integer N0 such that for all natural numbers n, m with n > m > N0, we have Σ_{k=m+1}^n |ak| |||A |||^k < ε.


Now, given ε > 0, for all n, m such that n > m > N0, we have

||| Σ_{k=m+1}^n ak A^k ||| ≤ Σ_{k=m+1}^n ||| ak A^k ||| = Σ_{k=m+1}^n |ak| |||A^k ||| ≤ Σ_{k=m+1}^n |ak| |||A |||^k < ε

By Cauchy’s Criterion, Σ_{k=0}^∞ ak A^k converges.

The notion of absolute convergence and radius of convergence for power series of real numbers

(see [1] for further details) can also be carried over to the case of power series of matrices in the

following way.

Theorem 3.1.13. Let the function f(z) be defined by f(z) = Σ_{k=0}^∞ ak z^k, with radius of convergence R > 0, and let ||| · ||| be a matrix norm on Mn. Then f(A) ≡ Σ_{k=0}^∞ ak A^k is well-defined for all A ∈ Mn such that |||A ||| < R.

In particular, f(A) is well-defined for all A ∈ Mn such that ρ(A) < R.

Proof. The series of real numbers Σ_{k=0}^∞ |ak| |||A |||^k converges, because a power series converges absolutely inside its radius of convergence and |||A ||| < R. By Theorem 3.1.12, the first conclusion follows immediately. For the second, if ρ(A) < R, then by Lemma 3.1.6 we may choose a matrix norm ||| · ||| with |||A ||| < R.

The above theorem enables us to define power series of matrices similar to the case of power

series for real numbers.

Example 3.1.14. The matrix exponential is given by the power series

e^A ≡ Σ_{k=0}^∞ (1/k!) A^k

which is well-defined for all A ∈ Mn, because the corresponding power series for real numbers, e^x = Σ_{k=0}^∞ (1/k!) x^k, has radius of convergence R = ∞.
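The partial sums of this series can be computed directly. Below is a NumPy sketch (our own illustration, not part of the original text, using a rotation generator whose exponential is known in closed form):

import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])              # A^2 = -I, so e^A is a rotation by 1 radian

E = np.zeros((2, 2)); term = np.eye(2)   # running sum and current term A^k / k!
for k in range(30):
    E = E + term
    term = term @ A / (k + 1)            # next term A^(k+1) / (k+1)!

exact = np.array([[np.cos(1.0), np.sin(1.0)],
                  [-np.sin(1.0), np.cos(1.0)]])
print(np.abs(E - exact).max())           # agrees to machine precision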

Other types of functions can also be defined similarly. For instance, one can define trigonometric functions of a matrix, analogous to the case of power series for real numbers. Another important power series of matrices, which will be used later, is the power series expression for the inverse of a matrix.

Proposition 3.1.15. Let A ∈ Mn. Then A is invertible if there exists a matrix norm ||| · ||| such that ||| I − A ||| < 1. In such a case, we have

A^{−1} = Σ_{k=0}^∞ (I − A)^k


Proof. If ||| I − A ||| < 1, then the series Σ_{k=0}^∞ (I − A)^k converges to some matrix C ∈ Mn, because the radius of convergence of the series Σ_{k=0}^∞ z^k is 1. However, we have

A Σ_{k=0}^N (I − A)^k = [I − (I − A)] Σ_{k=0}^N (I − A)^k = I − (I − A)^{N+1}

which tends to the matrix I as N → ∞. Hence, we conclude that C = A^{−1}.
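The Neumann series above can be checked numerically. Below is a NumPy sketch (our illustration, not part of the original text, for a matrix chosen so that ||| I − A |||∞ = 0.2 < 1):

import numpy as np

A = np.array([[1.0, 0.2],
              [-0.1, 0.9]])
I = np.eye(2)
print(np.abs(I - A).sum(axis=1).max())       # ||| I - A |||_inf = 0.2 < 1

C = np.zeros((2, 2)); P = np.eye(2)          # partial sums of sum_k (I - A)^k
for _ in range(60):
    C = C + P
    P = P @ (I - A)
print(np.abs(C - np.linalg.inv(A)).max())    # essentially zero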

The above result will be useful when we study perturbation theory in the solution of linear systems later. Moreover, we have the following results as corollaries.

Corollary 3.1.16. Let ||| · ||| be a matrix norm on Mn. Suppose that a given matrix A ∈ Mn is related to another matrix B ∈ Mn by |||BA − I ||| < 1. Then A and B are both invertible.

Proof. By Proposition 3.1.15, BA is invertible. This implies det(BA) ≠ 0, hence det(A) ≠ 0 and det(B) ≠ 0, proving the result.

Corollary 3.1.17. Let A = [aij] ∈ Mn. Suppose that

|aii| > Σ_{j=1, j≠i}^n |aij| for all i = 1, 2, . . . , n    (3.2)

Then A is invertible. A matrix which satisfies condition (3.2) is said to be strictly diagonally dominant.

Proof. The hypothesis (3.2) ensures that all main diagonal entries aii are nonzero. Set D ≡ diag(a11, . . . , ann), so that D is an invertible diagonal matrix and D^{−1}A has all 1’s on the main diagonal. Then the matrix B ≡ I − D^{−1}A has zero entries on the main diagonal and bij = −aij/aii for i ≠ j, where bij denotes the (i, j)-th entry of B.

Consider the maximum row sum norm ||| · |||∞. The hypothesis guarantees |||B |||∞ < 1, so that I − B = D^{−1}A is invertible by Proposition 3.1.15, and hence A is invertible.

3.2 Bounds for Roots of Algebraic Equations

In this section, we will examine how matrix norms can be used in various ways to give a bound for

the roots of polynomials with real or complex coefficients. First we begin with some definitions.

Definition 3.2.1. Any polynomial f(z) of degree at least 1 can be written in the form f(z) = Az^k p(z), where A is a nonzero constant and

p(z) = z^n + a_{n−1}z^{n−1} + . . . + a1 z + a0

with a0 ≠ 0.


The companion matrix of p(z) is given by

C(p) ≡ ( −a_{n−1}  −a_{n−2}  . . .  −a1  −a0 )
       ( 1         0         . . .  0    0   )
       ( 0         1         . . .  0    0   )
       ( ...       ...       . . .  ...  ... )
       ( 0         0         . . .  1    0   )    (3.3)

and the characteristic polynomial of C(p) is exactly p(z).

The following proposition is the key result used to obtain bounds for the roots of p(z).

Proposition 3.2.2. If z is a root of p(z) and ||| · ||| is any matrix norm on Mn, then |z| ≤ |||C(p) |||.

Proof. z is a root of p(z) = 0 if and only if z is an eigenvalue of C(p). Now, for any eigenvalue z of C(p), we have |z| ≤ ρ[C(p)] ≤ |||C(p) ||| as required.

We will now make use of the above proposition, varying the matrix norm used, to obtain various upper bounds for the roots of p(z).

Proposition 3.2.3 (Montel’s Upper Bound). Let z be a root of p(z). Then

|z| ≤ 1 + |a0| + |a1| + . . . + |a_{n−1}|

Proof. Using the maximum row sum norm ||| · |||∞ in Proposition 3.2.2, we have

|z| ≤ max{1, |a0| + |a1| + . . . + |a_{n−1}|} ≤ 1 + |a0| + |a1| + . . . + |a_{n−1}|

as required.

Proposition 3.2.4 (Cauchy’s Upper Bound). Let z be a root of p(z). Then

|z| ≤ 1 + max{|a0|, |a1|, . . . , |a_{n−1}|}

Proof. Using the maximum column sum norm ||| · |||1 in Proposition 3.2.2, we have

|z| ≤ max{|a0|, 1 + |a1|, . . . , 1 + |a_{n−1}|} ≤ 1 + max{|a0|, |a1|, . . . , |a_{n−1}|}

as required.
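Both bounds are easy to test numerically against the actual roots. Below is a NumPy sketch (our own illustration on a sample cubic, not part of the original text):

import numpy as np

coef = [1.0, -2.0, 1.0, -3.0]                 # p(z) = z^3 - 2z^2 + z - 3
roots = np.roots(coef)
a = np.abs(coef[1:])                          # |a_{n-1}|, ..., |a_0|

montel = 1 + a.sum()                          # Proposition 3.2.3: bound 7
cauchy = 1 + a.max()                          # Proposition 3.2.4: bound 4
print(np.abs(roots).max(), cauchy, montel)    # max |z| is about 2.17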

Observe that Cauchy’s upper bound is a stronger bound compared to Montel’s upper bound.

Now, we will derive another upper bound in a slightly different way.

Proposition 3.2.5 (Carmichael and Mason’s Upper Bound). Let z be a root of p(z). Then

|z| ≤ (1 + |a0|² + |a1|² + . . . + |a_{n−1}|²)^{1/2}


Proof. Write C(p) = S + R, where S is the matrix with ones on the subdiagonal and zeros elsewhere,

S = ( 0  0  . . .  0  0 )
    ( 1  0  . . .  0  0 )
    ( 0  1  . . .  0  0 )
    ( ...          ...  )
    ( 0  0  . . .  1  0 )

and R is the matrix whose first row is (−a_{n−1}, −a_{n−2}, . . . , −a1, −a0) and whose other entries are all zero.

We have S∗R = R∗S = 0. Moreover, S∗S = diag(1, 1, . . . , 1, 0), so that |||S∗S |||2 = max{√λ : λ is an eigenvalue of (S∗S)∗(S∗S) = (S∗S)²} = 1. Similarly, we have |||R∗R |||2 = |a0|² + |a1|² + . . . + |a_{n−1}|².

Then we have

|||C(p) |||2² = |||C(p)∗C(p) |||2 = ||| (S + R)∗(S + R) |||2 = |||S∗S + R∗R |||2 ≤ |||S∗S |||2 + |||R∗R |||2

which implies |z| ≤ (1 + |a0|² + |a1|² + . . . + |a_{n−1}|²)^{1/2} as required.

We will now generalise Cauchy’s upper bound to derive another upper bound due to Kojima.

Proposition 3.2.6 (Generalised Cauchy’s Upper Bound). Let z be a root of p(z). Let D ≡ diag(p1, p2, . . . , pn) be any diagonal matrix with all pi > 0. Then

|z| ≤ max{ |a0| pn/p1, |a1| p_{n−1}/p1 + p_{n−1}/pn, |a2| p_{n−2}/p1 + p_{n−2}/p_{n−1}, . . . , |a_{n−2}| p2/p1 + p2/p3, |a_{n−1}| + p1/p2 }

Proof. Observe that ρ(A) = ρ(D^{−1}AD) for any non-singular matrix D.

Then |z| ≤ ρ[C(p)] = ρ[D^{−1}C(p)D] ≤ |||D^{−1}C(p)D ||| for any matrix norm ||| · ||| on Mn. We have

D^{−1}C(p)D = ( −a_{n−1}  −a_{n−2} p2/p1  . . .  −a1 p_{n−1}/p1  −a0 pn/p1 )
              ( p1/p2     0               . . .  0               0         )
              ( 0         p2/p3           . . .  0               0         )
              ( ...       ...             . . .  ...             ...       )
              ( 0         0               . . .  p_{n−1}/pn      0         )

In particular, using the matrix norm ||| · |||1, we have

|z| ≤ |||D^{−1}C(p)D |||1 = max{ |a0| pn/p1, |a1| p_{n−1}/p1 + p_{n−1}/pn, |a2| p_{n−2}/p1 + p_{n−2}/p_{n−1}, . . . , |a_{n−2}| p2/p1 + p2/p3, |a_{n−1}| + p1/p2 }

as required.

Proposition 3.2.7 (Kojima’s Upper Bound). Let z be a root of p(z). If all the ai’s are nonzero, then

|z| ≤ max{ |a0/a1|, 2|a1/a2|, 2|a2/a3|, . . . , 2|a_{n−2}/a_{n−1}|, 2|a_{n−1}| }

Proof. Using Proposition 3.2.6, choose pk ≡ p1/|a_{n−k+1}| for k = 2, 3, . . . , n, which is always positive. Then the conclusion follows almost immediately.

Now that we have established several upper bounds for the roots of a polynomial, we are interested in establishing the corresponding lower bounds. The following lemma is required.

Lemma 3.2.8. If p(z) is given by p(z) = z^n + a_{n−1}z^{n−1} + . . . + a1 z + a0 with a0 ≠ 0, then the function q(z) defined by

q(z) ≡ (1/a0) z^n p(1/z) = z^n + (a1/a0) z^{n−1} + (a2/a0) z^{n−2} + . . . + (a_{n−1}/a0) z + 1/a0

is a polynomial of degree n whose roots are exactly the reciprocals of the roots of p(z).

Proof. Note that every root of p(z) is nonzero, since a0 ≠ 0. If z0 is a root of p(z), then q(1/z0) = (1/a0) z0^{−n} p(z0) = 0, i.e. 1/z0 is a root of q(z). Thus every root of p(z) gives rise to a root of q(z), giving a total of n roots, as required.

We are now in position to examine various lower bounds for the roots of p(z). We will derive

the lower bound by applying each of the upper bounds established previously to the polynomial

q(z) defined above. By combining the lower and upper bounds, it is possible to locate the roots

of the polynomial p(z) in the annulus {z : r1 ≤ |z| ≤ r2}.

Proposition 3.2.9 (Montel’s Lower Bound). Let z be a root of p(z). Then

|z| ≥ |a0| / (1 + |a0| + |a1| + . . . + |a_{n−1}|)

Proof. Applying Montel’s upper bound to q(z), we have

|1/z| ≤ 1 + |1/a0| + |a_{n−1}/a0| + . . . + |a1/a0| = (1 + |a0| + |a1| + . . . + |a_{n−1}|) / |a0|

Hence, the conclusion follows by taking reciprocals.


Proposition 3.2.10 (Cauchy’s Lower Bound). Let z be a root of p(z). Then

|z| ≥ |a0| / (|a0| + max{1, |a_{n−1}|, |a_{n−2}|, . . . , |a1|})

Proof. Applying Cauchy’s upper bound to q(z), we have

|1/z| ≤ 1 + max{ |1/a0|, |a1/a0|, . . . , |a_{n−1}/a0| } = (|a0| + max{1, |a1|, . . . , |a_{n−1}|}) / |a0|

Hence, the conclusion follows by taking reciprocals.

Proposition 3.2.11 (Carmichael and Mason’s Lower Bound). Let z be a root of p(z). Then

|z| ≥ |a0| / (1 + |a0|² + . . . + |a_{n−1}|²)^{1/2}

Proof. Applying Carmichael and Mason’s upper bound to q(z), we have

|1/z| ≤ (1 + |1/a0|² + |a_{n−1}/a0|² + . . . + |a1/a0|²)^{1/2} = [(1 + |a0|² + |a1|² + . . . + |a_{n−1}|²) / |a0|²]^{1/2}

Hence, the conclusion follows by taking reciprocals.

Proposition 3.2.12 (Kojima’s Lower Bound). Let z be a root of p(z). If all the ai’s are nonzero, then

|z| ≥ min{ |a_{n−1}|, |a_{n−2}/(2a_{n−1})|, . . . , |a0/(2a1)| }

Proof. Applying Kojima’s upper bound to q(z), we have

|1/z| ≤ max{ |(1/a0)/(a_{n−1}/a0)|, 2|(a_{n−1}/a0)/(a_{n−2}/a0)|, . . . , 2|a1/a0| } = max{ 1/|a_{n−1}|, 2|a_{n−1}/a_{n−2}|, . . . , 2|a1/a0| }

Hence, the conclusion follows by taking reciprocals.

We will end this section with an example on how to use the various bounds established above

to find the location of the roots of a polynomial.

Example 3.2.13. Consider

f(z) = (1/n!) z^n + (1/(n−1)!) z^{n−1} + . . . + (1/2) z² + z + 1

which is the n-th partial sum of the power series for the exponential function e^z, where n is a positive integer. Then all roots z of f(z) satisfy the inequality

1/2 ≤ |z| ≤ 1 + n!

Proof. Write f(z) as (1/n!) p(z), where

p(z) = z^n + (n!/(n−1)!) z^{n−1} + (n!/(n−2)!) z^{n−2} + . . . + (n!/1!) z + n!

We then only need to consider the roots of p(z). Let z denote a root of p(z). Then using Cauchy’s upper bound, we have

|z| ≤ 1 + max{ n!, n!/1!, n!/2!, . . . , n!/(n−1)! } = 1 + n!

Using Cauchy’s lower bound (note that here a1 = n!/1! = n!, so the maximum in the denominator equals n!), we have

|z| ≥ n! / (n! + max{ 1, n!/1!, n!/2!, . . . , n!/(n−1)! }) = n! / (n! + n!) = 1/2

which gives the conclusion as required.
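This example can be checked numerically as well. Below is a NumPy sketch (our illustration, not part of the original text, for the hypothetical choice n = 8):

import numpy as np
from math import factorial

n = 8
coef = [1.0 / factorial(k) for k in range(n, -1, -1)]   # z^n/n! + ... + z + 1
r = np.abs(np.roots(coef))
print(r.min(), r.max())                                 # all moduli lie in [1/2, 1 + n!]
print(r.min() >= 0.5 and r.max() <= 1 + factorial(n))   # True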

3.3 Perturbation of Eigenvalues

As another application of matrix and vector norms, we consider what happens to the eigenvalues of a matrix when the matrix is perturbed. This is an important question in numerical linear algebra, especially since many computations are done by computer, where errors of rounding and truncation are unavoidable. Vector and matrix norms can quantify this ‘error’ precisely.

We will begin this section with the definition and some properties of the condition number, which is a measure of the sensitivity of the error due to a small perturbation.

Definition 3.3.1 (Condition Number). The condition number of A with respect to the matrix norm ||| · ||| on Mn is defined to be

κ(A) = |||A ||| |||A^{−1} ||| if A is non-singular, and κ(A) = ∞ if A is singular.

Definition 3.3.2 (Well-conditioned and Ill-Conditioned). Let A ∈Mn.

A is said to be well-conditioned with respect to ||| · ||| if κ(A) is small (near 1).

A is said to be ill-conditioned with respect to ||| · ||| if κ(A) is large.

If κ(A) = 1, then A is said to be perfectly conditioned.

It turns out that when a well-conditioned matrix A is slightly perturbed, its eigenvalues will not change by very much. However, in the case of an ill-conditioned matrix, a small perturbation of its entries may change its eigenvalues by a large amount. Below is a concrete example to emphasise the importance of the condition number.


Example 3.3.3 (Wilkinson Bidiagonal Matrix). Consider the 20×20 upper bidiagonal matrix B with diagonal entries 20, 19, . . . , 1 and all superdiagonal entries equal to 20:

B = ( 20  20  0   . . .  0  )
    ( 0   19  20  . . .  0  )
    ( 0   0   18  . . .  0  )
    ( ...         . . .  20 )
    ( 0   0   0   . . .  1  )

Note that using MATLAB, we find κ(B) ≈ 5.3 × 10^8 with respect to the Frobenius norm. Moreover, the eigenvalues of B are 1, 2, . . . , 20. If we perturb the (20, 1)-entry of B by ε = 10^{−10}, then using MATLAB, we find that the eigenvalues change drastically, with some even becoming complex. A plot of the original and perturbed eigenvalues of B in the Argand diagram is shown in Figure 3.1 below.

Figure 3.1: A plot of original and perturbed eigenvalues of matrix B
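The experiment behind Figure 3.1 can be reproduced in a few lines. Below is a NumPy sketch (our own illustration; the report’s figure was produced with MATLAB):

import numpy as np

n = 20
B = np.diag(np.arange(20.0, 0.0, -1.0)) + np.diag(np.full(n - 1, 20.0), 1)

print(np.linalg.cond(B, 'fro'))             # about 5.3e8, as quoted above
print(np.sort(np.linalg.eigvals(B).real))   # 1, 2, ..., 20 since B is triangular

Bp = B.copy()
Bp[n - 1, 0] = 1e-10                 # perturb the (20,1)-entry by eps = 1e-10
w = np.linalg.eigvals(Bp)
print(np.abs(w.imag).max())          # of order 1: many eigenvalues are now complex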

It is clear from Definition 3.3.1 that the condition number of a matrix A depends on the matrix norm used. In fact, all condition numbers are equivalent in the sense described below.

Theorem 3.3.4. Let κα(·) be the condition number of a matrix with respect to ||| · |||α and κβ(·) be the condition number with respect to ||| · |||β. Then there exist finite positive constants cm and CM such that

cm κα(A) ≤ κβ(A) ≤ CM κα(A)

for all invertible A ∈ Mn.

Proof. By the equivalence of matrix norms on Mn, there exist finite positive constants c1, c2, c′1, c′2 such that

c1 |||A^{−1} |||α ≤ |||A^{−1} |||β ≤ c2 |||A^{−1} |||α    (3.4)

c′1 |||A |||α ≤ |||A |||β ≤ c′2 |||A |||α    (3.5)

Multiplying (3.4) and (3.5), we have

c1c′1 |||A |||α |||A^{−1} |||α ≤ |||A |||β |||A^{−1} |||β ≤ c2c′2 |||A |||α |||A^{−1} |||α

Hence, the conclusion follows by taking cm = c1c′1 and CM = c2c′2.

Below are some useful lower bounds for obtaining a rough estimate of the condition number.

Proposition 3.3.5. For any non-singular matrix A ∈ Mn and any matrix norm,

κ(A) ≥ max{|λA|} / min{|λA|}

where max{|λA|} denotes the maximum modulus of the eigenvalues of A, and min{|λA|} denotes the minimum modulus of the eigenvalues of A.

Proof. Observe that λ is an eigenvalue of A if and only if λ^{−1} is an eigenvalue of A^{−1}. Hence, we have

κ(A) = |||A ||| |||A^{−1} ||| ≥ ρ(A)ρ(A^{−1}) = max{|λA|} max{|λA^{−1}|} = max{|λA|} max{|λA|^{−1}} = max{|λA|} / min{|λA|}
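A small numerical illustration of this lower bound (a NumPy sketch of ours, not part of the original text, using the spectral norm condition number):

import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 0.01]])
w = np.abs(np.linalg.eigvals(A))
lower = w.max() / w.min()                 # eigenvalue-ratio lower bound: 300
kappa = np.linalg.cond(A, 2)              # kappa(A) = ||| A |||_2 ||| A^{-1} |||_2
print(lower, kappa, kappa >= lower)       # kappa is about 333 >= 300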

Proposition 3.3.6. Let B ∈ Mn be any singular matrix. For any non-singular matrix A ∈ Mn and any matrix norm ||| · ||| on Mn,

κ(A) ≥ |||A ||| / |||A − B |||

Proof. We have B = A − (A − B) = A[I − A^{−1}(A − B)], which is singular. Hence I − A^{−1}(A − B) is singular, and so by Proposition 3.1.15 we have |||A^{−1}(A − B) ||| ≥ 1. Hence,

|||A^{−1} ||| |||A − B ||| |||A ||| ≥ |||A^{−1}(A − B) ||| |||A ||| ≥ |||A |||

which implies κ(A) ≥ |||A ||| / |||A − B ||| as required.

We will now move to the actual application of the above notion of condition number to the theory of perturbation of eigenvalues. First, we begin with a theorem on the location of the eigenvalues of a matrix, due to Gershgorin.

Theorem 3.3.7 (Gershgorin Circle Theorem). Let A = [aij] ∈ Mn and let

R′i(A) = Σ_{j=1, j≠i}^n |aij|, 1 ≤ i ≤ n

denote the deleted absolute row sums of A. Then all eigenvalues of A are located in the union of n discs

G(A) ≡ ∪_{i=1}^n { z ∈ C : |z − aii| ≤ R′i(A) }

Proof. Let λ be an eigenvalue of A and suppose Ax = λx, where x = [xi] ≠ 0. Then there exists an entry of x with the largest absolute value, say |xp| ≥ |xi| for all i = 1, 2, . . . , n, and xp ≠ 0. Then

λxp = [λx]p = [Ax]p = Σ_{j=1}^n apj xj

where [x]p denotes the p-th entry of the vector x. This is equivalent to

xp(λ − app) = Σ_{j=1, j≠p}^n apj xj

By the Triangle Inequality,

|xp| |λ − app| = | Σ_{j=1, j≠p}^n apj xj | ≤ Σ_{j=1, j≠p}^n |apj| |xj| ≤ |xp| Σ_{j=1, j≠p}^n |apj| = |xp| R′p(A)

Hence, |λ − app| ≤ R′p(A). Since we do not know which p is appropriate to each eigenvalue λ (unless we know its associated eigenvector, in which case we could just calculate λ exactly), we can only conclude that λ lies in the union of all such discs.
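Gershgorin’s theorem is easily checked numerically. The following NumPy sketch (our own illustration, not part of the original text) verifies that each computed eigenvalue lies in at least one disc:

import numpy as np

A = np.array([[4.0, 1.0, 0.5],
              [0.2, -3.0, 0.3],
              [0.1, 0.4, 1.0]])

centres = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centres)   # deleted row sums R'_i(A)

for lam in np.linalg.eigvals(A):
    print(lam, bool(np.any(np.abs(lam - centres) <= radii)))   # always True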

We will apply the above theorem to find a relation between the eigenvalues of a perturbed matrix and those of the original matrix. We only consider the case when the matrix is diagonalisable (see [2] for a more detailed treatment). The basic idea is contained in the following proposition.

Proposition 3.3.8. Let D = diag(λ1, λ2, . . . , λn) ∈ Mn. Let E = [eij] ∈ Mn and consider the perturbed matrix D + E. If λ is an eigenvalue of D + E, then there exists some eigenvalue λi of D such that |λ − λi| ≤ |||E |||∞.

Proof. By Theorem 3.3.7, the eigenvalues of D + E are contained in the union of discs

S = ∪_{i=1}^n { z ∈ C : |z − λi − eii| ≤ R′i(D + E) = Σ_{j=1, j≠i}^n |eij| }    (3.6)


We will first show that the set S in (3.6) above is contained in the union of discs

T = ∪_{i=1}^n { z ∈ C : |z − λi| ≤ Ri(E) = Σ_{j=1}^n |eij| }    (3.7)

Let z0 ∈ S, say z0 lies in the i-th disc of S. Then we have

|z0 − λi| = |z0 − λi − eii + eii| ≤ |z0 − λi − eii| + |eii| ≤ Σ_{j=1, j≠i}^n |eij| + |eii| = Σ_{j=1}^n |eij| = Ri(E)

Hence, z0 ∈ T, proving the claim that S ⊆ T.

This implies that if λ is an eigenvalue of D + E, then there exists some eigenvalue λi of D such that |λ − λi| ≤ Ri(E) ≤ |||E |||∞ as required.

We can extend the above argument to the case in which the matrix is diagonalisable.

Theorem 3.3.9. Let A ∈ Mn be diagonalisable with A = SΛS^{−1} and Λ = diag(λ1, . . . , λn). Let E ∈ Mn and let ||| · ||| be a matrix norm such that |||D ||| = max_{1≤i≤n} |di| for all diagonal matrices D = diag(d1, . . . , dn) ∈ Mn.

If λ is an eigenvalue of A + E, then there exists some eigenvalue λi of A for which

|λ − λi| ≤ κ(S) |||E |||    (3.8)

where κ(·) is the condition number with respect to the matrix norm ||| · |||.

Proof. Observe that A + E and S⁻¹(A + E)S = Λ + S⁻¹ES have the same eigenvalues. If λ is an eigenvalue of Λ + S⁻¹ES, then λI − Λ − S⁻¹ES is singular.

Now if λI − Λ is singular, then λ = λ_i for some i and the bound (3.8) is trivially satisfied. Suppose, however, that λI − Λ is non-singular. In this case, the matrix

(λI − Λ)⁻¹(λI − Λ − S⁻¹ES) = I − (λI − Λ)⁻¹S⁻¹ES

is singular. Hence we have |||(λI − Λ)⁻¹S⁻¹ES||| ≥ 1 by Proposition 3.1.15.

By the assumption about the behaviour of the matrix norm |||·||| on diagonal matrices, we have

1 ≤ |||(λI − Λ)⁻¹S⁻¹ES||| ≤ |||S⁻¹ES||| |||(λI − Λ)⁻¹||| = |||S⁻¹ES||| max_{1≤i≤n} |λ − λ_i|⁻¹ = |||S⁻¹ES||| / min_{1≤i≤n} |λ − λ_i|


Hence,

min_{1≤i≤n} |λ − λ_i| ≤ |||S⁻¹ES||| ≤ |||S⁻¹||| |||S||| |||E||| = κ(S) |||E|||

as required.
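Since the spectral norm satisfies |||D|||_2 = max_{1≤i≤n} |d_i| on diagonal matrices, the theorem can be checked numerically in that norm. A minimal sketch follows (numpy assumed; a random matrix is almost surely diagonalisable):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
lam_A, S = np.linalg.eig(A)         # A = S (diag lam_A) S^{-1}

E = 1e-3 * rng.standard_normal((4, 4))
bound = np.linalg.cond(S, 2) * np.linalg.norm(E, 2)    # kappa(S) |||E|||_2

for lam in np.linalg.eigvals(A + E):
    print(np.abs(lam - lam_A).min() <= bound)          # True for each eigenvalue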

All our estimates so far have been a priori bounds on the perturbations induced in the eigenvalues; they do not involve the computed eigenvalues or eigenvectors or any quantity derived from them. Suppose now that an 'approximate eigenvector' and an associated 'approximate eigenvalue' have been found somehow. We can then estimate how close the approximate eigenvalue is to an actual eigenvalue by using the residual vector.

Theorem 3.3.10. Let A ∈ Mn be diagonalisable with A = SΛS⁻¹ and Λ = diag(λ_1, . . . , λ_n). Let ‖·‖ be a vector norm on Cn and let |||·||| be the matrix norm on Mn induced by ‖·‖. Moreover, suppose |||D||| = max_{1≤i≤n} |d_i| whenever D = diag(d_1, . . . , d_n) ∈ Mn.

Let x ∈ Cn be a given nonzero 'approximate eigenvector' of A, let λ be a given 'approximate eigenvalue' associated with x, and let r = Ax − λx be the residual vector. Then there exists some eigenvalue λ_i of A for which

|λ − λ_i| ≤ κ(S) ‖r‖ / ‖x‖

Proof. Write A = SΛS⁻¹ and suppose that λ is not exactly equal to any eigenvalue of A (otherwise the bound holds trivially). Then Λ − λI is non-singular, and

r = Ax − λx = S(Λ − λI)S⁻¹x

so that x = S(Λ − λI)⁻¹S⁻¹r. Then we have

‖x‖ = ‖S(Λ − λI)⁻¹S⁻¹r‖ ≤ |||S(Λ − λI)⁻¹S⁻¹||| ‖r‖ ≤ |||S||| |||S⁻¹||| |||(Λ − λI)⁻¹||| ‖r‖ = κ(S) |||(Λ − λI)⁻¹||| ‖r‖ = κ(S) (min_{1≤i≤n} |λ_i − λ|)⁻¹ ‖r‖

Hence,

‖x‖ min_{1≤i≤n} |λ_i − λ| ≤ κ(S) ‖r‖

which upon rearranging gives the desired conclusion.
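As an illustration of this a posteriori bound, the following sketch (numpy assumed; illustrative only) perturbs an exact eigenpair to manufacture an approximate one and compares the eigenvalue error with κ(S)‖r‖/‖x‖ in the 2-norm:

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
lam_A, S = np.linalg.eig(A)

# Manufacture an 'approximate' eigenpair by perturbing an exact one.
x = S[:, 0] + 1e-4 * rng.standard_normal(4)
lam = lam_A[0] + 1e-4

r = A @ x - lam * x                                    # residual vector
bound = np.linalg.cond(S, 2) * np.linalg.norm(r) / np.linalg.norm(x)
print(np.abs(lam - lam_A).min(), bound)                # error <= bound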

We end this chapter by noting that a typical example of a matrix norm |||·||| on Mn satisfying the condition stated in Theorem 3.3.9 and Theorem 3.3.10, namely |||D||| = max_{1≤i≤n} |d_i| whenever D = diag(d_1, . . . , d_n) ∈ Mn, is the matrix norm induced by an l_p vector norm. In particular, |||·|||_1, |||·|||_2 and |||·|||_∞ can all be used as the matrix norm in the above theorems. For further results and discussion on matrix norms with this property, see [3] or [5].


Bibliography

[1] Bartle, R. G., and Sherbert, D. R., Introduction to Real Analysis. John Wiley and Sons, Inc., New York, 2000.

[2] Datta, B. N., Numerical Linear Algebra and Applications. Society for Industrial and Applied Mathematics (SIAM), 2010.

[3] Horn, R. A., and Johnson, C. R., Matrix Analysis. Cambridge University Press, Cambridge, 1990.

[4] Householder, A. S., The Theory of Matrices in Numerical Analysis. Blaisdell, New York, 1964.

[5] Lancaster, P., and Tismenetsky, M., The Theory of Matrices with Applications. Academic Press, New York, 1985.

[6] Ponnusamy, S., Foundations of Functional Analysis. Taylor and Francis, London, 2003.

[7] Sundaram, R. K., A First Course in Optimization Theory. Cambridge University Press, Cambridge, 1996.

[8] Wilcox, R. R., Introduction to Robust Estimation and Hypothesis Testing. Academic Press, New York, 2005.
