Source: physics.ut.ac.ir/~khatibi/note.pdf


A Companion to

Elementary Linear Algebra

for Physics Students

Gol Mohammad Nafisi∗

University of Tehran

Fall 2018

gmnafisi@ut.ac.ir


Contents

1 Preliminaries

1.1 Operations on Sets

1.2 On Algebraic Structures

2 Vector Spaces

2.1 Vector Space

2.2 Linear Maps and Dual Space

3 Inner Product Space

3.1 Inner Product, Norm and Inequalities

3.2 Orthonormal Basis and Gram-Schmidt Procedure

3.3 Orthogonal Polynomials

3.4 Metric Space

3.5 Hilbert Space and Operator Algebra


A Note to the Reader

This outline note is intended to serve as supplementary material to the Linear Algebra part of the undergraduate “Mathematical Physics” course in the Physics department curriculum. I prepared it to be used in weekly recitation sessions and it is NOT going to be a replacement for the class lectures or the course reference textbooks. With this goal in mind, I am not going to treat Linear Algebra rigorously (as is done in Math departments). Rather, my approach will be practical, meaning that it is more of a “survival kit for Physics undergrads”. Just as “One does not simply walk into Mordor”, one does not simply learn Mathematics by memorizing things and watching others do them. Therefore we will try many examples and exercises. Problems marked as exercise are going to be solved during the recitations, so I highly encourage you to ponder them and try to arrive at a solution on your own or by discussing them with your peers. I do not claim any originality of the material whatsoever. To quote the wonderful David Tong, “My primary contribution has been to borrow, steal and assimilate the best discussions and explanations I could find from the vast literature on the subject”. I will list a couple of my favorite resources at the end of this section just in case you might like to check them out. Last but not least, all comments are most welcome.

• Some Useful Resources:

- Linear Algebra Done Right, Sheldon Axler, Springer

- Mathematics of Classical and Quantum Physics, F. W. Byron and R. W. Fuller, Dover

- Essential Linear Algebra, Joel G. Broida

- Mathematical Methods for Physicists (7th edition), G. B. Arfken, Academic Press

- Principles of Quantum Mechanics (Chapter 1), R. Shankar, Springer


1 Preliminaries

“The beginner should not be discouraged if he finds he does not have the prerequisites for reading the prerequisites.”

- Paul Halmos

Prelude

In this note we are going to dive into the realm of Linear Algebra, but you might ask “What is Algebra in the

first place?”. Literally, Algebra means ’bone-setting’ or ’the reunion of broken parts’! If this is not helpful, you

can consider Algebra as a branch of Mathematics in which people study objects (like Sets), structures that these

objects build (like Categories, Groups, Rings, Modules,...) and the relations between them. Familiar Set-like

objects that you have encountered during your education are the number systems: Natural (N), Integer (Z),

Rational (Q), Real (R) and Complex (C) numbers for which we have the following:

N ⊂ Z ⊂ Q ⊂ R ⊂ C

Now if we want to build algebraic structures from Sets, first we need to define operations on them.

1.1 Operations on Sets

Since the foundation of Mathematics (up until 1945) is based on Sets, we begin by recalling some of its concepts

that we need. Fortunately we all learned a thing or two about Sets at least during our high school education so I

will spare you the conceptual details.

Definition 1 (Map). Let X and Y be sets. A map f is a rule which assigns to each x ∈ X an element y ∈ Y, and is denoted

by

f : X → Y

Definition 2 (n-ary Cartesian Product). Let X1, ..., Xn be n sets. Then n-ary Cartesian product of these n

sets is the set of n-tuples defined as:

X1 × · · · ×Xn = {(x1, ..., xn) | ∀i ∈ {1, ..., n} : xi ∈ Xi}


Definition 3 (n-ary Cartesian Power). Let X be a set. Then the n-ary Cartesian power of X is the set of n-tuples defined as:

X^n := X × X × · · · × X (n factors) = {(x1, ..., xn) | ∀i ∈ {1, ..., n} : xi ∈ X}

A familiar example is the 2-dimensional plane R^2 = R × R, which is the set of all points (x, y) where x, y ∈ R.
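As a quick sketch, the Cartesian product and Cartesian power can be built with the standard library; the finite sets X and Y below are hypothetical examples of my own:

```python
# A small sketch of the Cartesian product / power using itertools.
from itertools import product

X = {0, 1}
Y = {"a", "b"}

XY = set(product(X, Y))          # X × Y: all ordered pairs (x, y)
X3 = set(product(X, repeat=3))   # X^3: all 3-tuples over X

print(len(XY))  # 4 pairs
print(len(X3))  # 2^3 = 8 triples
```

Note that `product` returns tuples, matching the n-tuple definition above.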

Now an n-ary operation on X takes n elements of X and combines them into a single element of X. Examples

of such operations are Unary (1-ary), Binary (2-ary), Ternary (3-ary) , ... . We will restrict ourselves to Binary

operations throughout this note.

Definition 4 (Binary Operation). Let X be a non-empty set. A map

⋆ : X ×X → X

is called a Binary operation on X. It takes x1, x2 ∈ X and combines them into a single element x1 ⋆ x2 ∈ X.

Now we say that ⋆ is

• associative if ∀x1, x2, x3 ∈ X : x1 ⋆ (x2 ⋆ x3) = (x1 ⋆ x2) ⋆ x3

• commutative if ∀x1, x2 ∈ X : x1 ⋆ x2 = x2 ⋆ x1

Familiar examples of Binary operations are Addition (+) and Multiplication (·), both of which can be defined in any

algebraic structure.
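As a quick numerical sketch (the sample triples are arbitrary choices of mine), we can test associativity and commutativity for integer addition and for subtraction, a familiar binary operation that satisfies neither:

```python
# Check the two conditions on a handful of sample integer triples.
samples = [(1, 2, 3), (4, -1, 7), (0, 5, 2)]

add = lambda a, b: a + b
sub = lambda a, b: a - b

def associative(op):
    return all(op(a, op(b, c)) == op(op(a, b), c) for a, b, c in samples)

def commutative(op):
    return all(op(a, b) == op(b, a) for a, b, c in samples)

print(associative(add), commutative(add))  # True True
print(associative(sub), commutative(sub))  # False False
```

Of course, passing on samples only fails to disprove a property; a genuine proof must cover all elements.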

1.2 On Algebraic Structures

One can classify Algebraic structures based on the number of sets and operations that are involved. Examples of

one set structures are:

• Group-like structures which involve one Binary operation (e.g. Monoid, Group, ...)

• Ring-like structures which involve two Binary operations (e.g. Ring, Field, ...)


Examples of two set structures are:

• Module-like structures which involve at least two Binary operations (e.g. Module, Vector Space, ...)

• Algebra-like structures which are defined over a ring and a module and involve two operations on the ring,

two on the module and one involving both the ring and the module, making up to five Binary operations

(e.g. Lie algebra, Inner Product Space, ...)

Note that we can construct other classes of structures by using these Algebraic structures together with structures

which are not Algebraic themselves (like Topological). One important example of such classes is called “Normed

Vector Space” which is the Mathematical foundation of Quantum Mechanics and we will soon meet her majesty.

By the way, you do not need to worry about these fancy terms here. Let me tell you a story.

Suppose that a good friend of yours suggests a piece of music. Let’s say it is “Joe Satriani’s Jumpin’ In”. You

hear it and it blows your mind. You want to play it! But you can’t play it just by hearing it repeatedly (which was

your method of playing so far) since there are sounds that you don’t know how to produce. So you grab its music

sheet, but you don’t understand a thing! What does that piece of music have to do with these alien symbols? Later

you find that every pitch of that track can be represented by those strange symbols and the relations between them

will produce different parts of solos and riffs. You even might dig deeper and find out that in scales and octaves

one can find traces of group theory or the Riemann zeta function. Now in my naive view, I guess that is the case with Physics, us and Mathematics. If one wants to understand Nature, one could at least do the courtesy of learning its alphabet and grammar. The term “universe” that we love to use so much comes from the Latin “universum” meaning “one verse, or one song”, after all.


2 Vector Spaces

“Young man, in mathematics you don’t understand things. You just get used to them.”

- John von Neumann

2.1 Vector Space

In order to define a vector space, we first need to define a Field since, as I mentioned earlier, a vector space is a Module-like structure, which means it involves two sets. You might note that when we use the term vector, we do not mean a Geometrical arrow-like object per se, but rather an abstract object. Later we will see that we can represent this abstract object by an interesting and familiar class of objects called Matrices.

Definition 5 (Field). A structure (F,+, ·) is a field F if F is a non-empty set with at least two elements, with

“ + ” (addition) and “ · ” (multiplication) as two Binary operations defined on F such that ∀a, b, c ∈ F we have

the following axioms:

A.1. a+ b = b+ a (commutativity of addition)

A.2. a+ (b+ c) = (a+ b) + c (associativity of addition)

A.3. ∃0 ∈ F s.t. a+ 0 = a (existence of an additive identity)

A.4. ∃ − a ∈ F s.t. a+ (−a) = 0 (existence of additive inverse)

A.5. a · b = b · a (commutativity of multiplication)

A.6. a · (b · c) = (a · b) · c (associativity of multiplication)

A.7. (a+ b) · c = a · c+ b · c (distributivity)

A.8. ∃1 ∈ F s.t. 1 ≠ 0 and a · 1 = a (existence of multiplicative identity)

A.9. if a ≠ 0 then ∃ a−1 ∈ F (i.e. 1/a) s.t. a · a−1 = 1 (existence of multiplicative inverse)

A exercise 1. One can show that Q, R and C are fields by checking that they satisfy the field axioms. Now state an argument about whether Z is a field or not.¹

¹The symbol A amounts to nothing but fun! It’s just a label that I like to use.


Definition 6 (Vector Space). Let V be a non-empty set (whose elements we call vectors) and F be a field

(whose elements we call scalars) and define two binary operations as:

• + : V × V → V s.t. ∀u,v ∈ V : u+ v ∈ V (vector addition)

• · : F × V → V s.t. ∀u ∈ V, a ∈ F : a · u ∈ V (scalar multiplication)

Now (V,F ,+, ·) is a vector space V over F , if ∀u,v,w ∈ V and ∀ a, b ∈ F the following eight axioms hold:

B.1. u+ v = v + u (commutativity of addition)

B.2. u+ (v +w) = (u+ v) +w (associativity of addition)

B.3. ∃0 ∈ V s.t. u+ 0 = u (identity element of addition)

B.4. ∃ − u ∈ V s.t. u+ (−u) = 0 (inverse elements of addition)

B.5. a · (u+ v) = a · u+ a · v (distributivity of scalar multiplication with respect to vector addition)

B.6. (a+ b) · u = a · u+ b · u (distributivity of scalar multiplication with respect to field addition)

B.7. ∃1 ∈ F s.t. 1 · u = u (identity element of scalar multiplication)

B.8. a · (b · u) = (a · b) · u (associativity of scalar multiplication with field multiplication)

Remark 1. From now on we will drop “·” from scalar multiplication a · v and write av instead. Also the bold

letters and numbers denote vectors while the plain ones denote scalars. For example the plain 0 is a scalar while the bold 0 is the additive identity vector.

Example 1. Show that ∀a ∈ F and ∀v ∈ V we have:

i. 0v = 0    ii. a0 = 0    iii. (−1)v = −v

Solution 1.

i. We know that 0 + 1 = 1, hence

(0 + 1)v = v ⇒ (0 + 1)v + (−v) = 0

Now using axiom (B.6) of def.6, (0 + 1)v = 0v + 1v = 0v + v, so the left-hand side becomes 0v + v + (−v), and therefore:

0v = 0


ii. We have:

a0 = a(0+ 0) = a0+ a0

Now adding additive inverse of a0 which is −a0 to both sides and using axiom (B.4) we get:

0 = a0

iii. We know that:

1 + (−1) = 0 ⇒ (1 + (−1))v = 0v ⇒ v + (−1)v = 0

Therefore from axiom (B.4), (−1)v must be the additive inverse of v i.e. (−1)v = −v

Remark 2. Throughout this note we restrict F to be either R or C and the vector spaces defined on them are

called real or complex vector spaces respectively.

Some examples of vector spaces include:

• ∀xi ∈ R and ∀i ∈ {1, ..., n}, the set of n-component column vectors (x1, ..., xn)ᵀ forms a real vector space.

• ∀xij ∈ C, ∀i ∈ {1, ..., m} and ∀j ∈ {1, ..., n}, the set of m × n matrices with entries xij (rows indexed by i, columns by j) forms a complex vector space.

• Let S be a non-empty set and F be a field, and let F^S denote the set of functions from S to F. Now if:

i. ∀f, g ∈ F^S and x ∈ S we have (f + g)(x) = f(x) + g(x), so that f + g ∈ F^S

ii. ∀λ ∈ F we have (λf)(x) = λf(x), so that λf ∈ F^S


Then F^S is a vector space over F. For example, R^[0,1], which is the set of real-valued functions on the interval [0, 1], is a vector space over R.

• The set Pn[F ] of all polynomials p(x) of the form:

∀x, p(x), ai ∈ F : p(x) = a0 + a1x+ · · ·+ anxn

form a vector space (where n is a positive integer and is called degree of polynomial), in which the vector

addition and scalar multiplication are defined respectively as:

n∑i=0

aixi+

n∑i=0

bixi =

n∑i=0

(ai + bi)xi

cn∑

i=0

aixi =

n∑i=0

(cai)xi

A exercise 2. If V = C and F = R, then C is a vector space on R. Now let V = R and F = C. Does V form a

vector space on F ?

Definition 7 (Subspace). Let V be a vector space on F. A subset W of V is a subspace of V if W itself forms a vector space on F under the operations inherited from V.

Example 2. Let M be the set of all 2 × 2 real matrices, which forms a real vector space. Consider the subset N = {A ∈ M | det(A) = 0}. Is N a subspace of M?

Solution 2. Consider the matrices A = (1 0; 0 0), B = (0 0; 0 1) ∈ N (rows separated by semicolons). From def.6 we see that if N is a vector space then it must be closed under vector addition, i.e. A + B should be in N. But A + B = (1 0; 0 1) ∉ N since det(A + B) = 1. Hence from def.7, N is not a vector space, ergo it is not a subspace of M.
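The counterexample above can be checked numerically; this is a minimal sketch, with the determinant of a 2 × 2 matrix computed directly:

```python
# det(A) = det(B) = 0, yet det(A + B) = 1, so N is not closed under addition.
def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

A = [[1, 0], [0, 0]]
B = [[0, 0], [0, 1]]
S = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]  # A + B

print(det2(A), det2(B), det2(S))  # 0 0 1
```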


Definition 8 (Linear Combination). Let V be a vector space over F and let S = {v1, ...,vn} be a list which

contains n vectors in V. Given any scalars a1, ..., an ∈ F, the vector

∑_{i=1}^{n} ai vi = a1v1 + · · · + anvn    (1)

is called a linear combination of n vectors and the set of all such linear combinations of elements in S is called

the subspace spanned by S and denoted by span(v1, ...,vn).

Remark 3. A vector space V is said to be finite dimensional if it is spanned by some list of vectors in V. Otherwise

it is infinite dimensional.

Definition 9 (Linear Independence). A list of vectors {v1, ..., vn} in V is linearly independent if the equation

a1v1 + · · · + anvn = 0    (2)

has only the trivial solution a1 = a2 = · · · = an = 0. Otherwise it is linearly dependent. If a list of vectors is linearly independent, it means that we cannot write any member of the list in terms of the others.

Example 3. Check the linear dependency of these vectors:

i. x1 = (1, 0, 0) , x2 = (0, 1, 0) , x3 = (0, 0, 1) are three vectors in R3.

ii. y1 = (1, 0, 0) , y2 = (0, 1, 2) , y3 = (1, 3, 6) are three vectors in R3.

Solution 3.

i. Since a1x1 + a2x2 + a3x3 = (a1, a2, a3), the equation (a1, a2, a3) = 0 holds only for a1 = a2 = a3 = 0. Hence the three vectors are linearly independent.

ii. Since y3 = y1 + 3y2, we can write one vector in terms of the others, hence the three vectors are linearly dependent.
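One standard way to automate such checks (a sketch of my own, not a method from the note) is Gaussian elimination: the rank of the matrix whose rows are the given vectors equals the number of linearly independent vectors among them.

```python
# Row-reduce and count pivots; rank < number of vectors means dependence.
def rank(rows, eps=1e-12):
    m = [list(map(float, r)) for r in rows]
    r = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if abs(m[i][col]) > eps), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]          # move pivot row into place
        for i in range(len(m)):
            if i != r and abs(m[i][col]) > eps:
                f = m[i][col] / m[r][col]    # eliminate column entry
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

print(rank([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # 3 -> independent
print(rank([(1, 0, 0), (0, 1, 2), (1, 3, 6)]))  # 2 -> dependent
```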

Now if the vector space is the space of r-times continuously differentiable functions on an open interval I, denoted by C^r(I), then there is a simple way to check the linear dependency of a list of its elements. First we need to define the Wronskian and use the theorem that follows.


Definition 10 (Wronskian). If f1, ..., fn are n real- or complex-valued (n − 1)-times differentiable functions on an interval I, then the Wronskian W(f1, ..., fn) is a function on I defined as the determinant:

W(f1, ..., fn)(x) := | f1(x)         f2(x)         · · ·  fn(x)         |
                     | f1′(x)        f2′(x)        · · ·  fn′(x)        |
                     |   ⋮             ⋮           ⋱       ⋮           |
                     | f1^(n−1)(x)   f2^(n−1)(x)   · · ·  fn^(n−1)(x)   |    (3)

where x ∈ I

Theorem 1. Let f1, ..., fn be differentiable functions on the interval I. If for some x0 ∈ I the Wronskian W(f1, ..., fn)(x0) ≠ 0, then f1, ..., fn are linearly independent. If f1, ..., fn are linearly dependent, then

∀x ∈ I : W(f1, ..., fn)(x) = 0    (4)

Example 4. Let {e^x, e^{2x}, e^{3x}} be a list of functions in the vector space C^∞(−1, 1). Check their linear dependency.

Solution 4. By computing the Wronskian we have:

W(x) = | e^x    e^{2x}    e^{3x}  |
       | e^x    2e^{2x}   3e^{3x} |
       | e^x    4e^{2x}   9e^{3x} |

     = e^x e^{2x} e^{3x} · | 1  1  1 |
                           | 1  2  3 |
                           | 1  4  9 | = 2e^{6x}

Now since W(x) ≠ 0 on (−1, 1), by theorem (1) these vectors are linearly independent.
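Since the k-th derivative of e^{nx} is n^k e^{nx}, this Wronskian can be cross-checked numerically; a minimal sketch, evaluating the 3 × 3 determinant at a sample point and comparing with the closed form 2e^{6x}:

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def wronskian(x):
    # rows k = 0, 1, 2: the functions, first and second derivatives at x
    return det3([[math.exp(n * x) * n**k for n in (1, 2, 3)] for k in (0, 1, 2)])

x = 0.3
print(math.isclose(wronskian(x), 2 * math.exp(6 * x)))  # True
```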

Another important definition is the basis of a vector space.

Definition 11 (basis). A basis of a vector space V over F is a list of vectors {v1, ..., vn} which is linearly independent and spans V, so that every v ∈ V can be written uniquely in the form:

v = ∑_{i=1}^{n} ai vi ,  ai ∈ F

The dimension of V, denoted by dim(V), is the number of its basis vectors.


One important example of a vector space and its basis is the real vector space of 2× 2 Hermitian matrices which

appears in Quantum Mechanics. I just mention that the basis of this space is the list {1, σ1, σ2, σ3} in which 1 is

the identity matrix and σi are the Pauli matrices defined as:

σ1 = (0 1; 1 0) , σ2 = (0 −i; i 0) , σ3 = (1 0; 0 −1)

(rows separated by semicolons), so this vector space is 4-dimensional. Now we turn ourselves to the concept of linear map and dual space.
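A small numeric sketch of this expansion: since tr(σi σj) = 2δij over the list {1, σ1, σ2, σ3}, the coefficients of a Hermitian H = a0·1 + a1σ1 + a2σ2 + a3σ3 are ai = tr(σi H)/2. The matrix H below is a hypothetical example of my own.

```python
I  = [[1, 0], [0, 1]]
s1 = [[0, 1], [1, 0]]
s2 = [[0, -1j], [1j, 0]]
s3 = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def tr(a):
    return a[0][0] + a[1][1]

H = [[2, 1 - 1j], [1 + 1j, -3]]  # Hermitian: H equals its conjugate transpose

# Coefficients in the {1, sigma} basis, then reconstruct H from them.
coeffs = [tr(matmul(s, H)) / 2 for s in (I, s1, s2, s3)]
recon = [[sum(c * s[i][j] for c, s in zip(coeffs, (I, s1, s2, s3)))
          for j in range(2)] for i in range(2)]

print(all(abs(recon[i][j] - H[i][j]) < 1e-12 for i in range(2) for j in range(2)))  # True
```

The reconstruction succeeding for a generic Hermitian H illustrates why the list spans the space.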

2.2 Linear Maps and Dual Space

Definition 12 (Linear Map). A linear map (also called a linear transformation) from a vector space V to a vector space W is a map T : V → W that satisfies

T (v1 + v2) = Tv1 + Tv2 (additivity)

T (λv) = λ(Tv) (homogeneity)

The set of all linear maps from V to W denoted by L(V,W) is a vector space if for S, T ∈ L(V,W), λ ∈ F we

have:

(S + T )v = Sv + Tv

(λT )v = λ(Tv)

A exercise 3. Let C(R) be the vector space of real functions. Define the map T as T (f(x)) = (f(x))2 for

f ∈ C(R). Determine whether T is a linear map or not.

Example 5. Let V = C^∞(R) be the vector space of all C^∞ real-valued functions (smooth functions, differentiable to all orders). Let L(V) be the vector space of all linear transformations from V to V. Prove that:

i. Differentiation, the map D defined as D(f(x)) = (d/dx) f(x), and integration, the map S defined as S(f(x)) = ∫ f(x) dx, for all f ∈ V, are linear maps.

ii. Let T1, T2, T3 ∈ L(V) be defined as:

T1(f(x)) = (d/dx) f(x) , T2(f(x)) = (d²/dx²) f(x) , T3(f(x)) = ∫_0^x f(t) dt

Then determine whether the list {T1, T2, T3} is linearly independent or not.


Solution 5.

i. For f, g ∈ C^∞(R) and λ, µ ∈ R we have:

D(λf(x) + µg(x)) = (d/dx)(λf(x) + µg(x)) = λ (d/dx) f(x) + µ (d/dx) g(x) = λD(f(x)) + µD(g(x))

The same argument goes for integration as well:

S(λf(x) + µg(x)) = ∫ (λf(x) + µg(x)) dx = λ ∫ f(x) dx + µ ∫ g(x) dx = λS(f(x)) + µS(g(x))

ii. We need to show that a1T1 + a2T2 + a3T3 = 0 has only the trivial solution a1 = a2 = a3 = 0, where the operator equation must hold for all f ∈ C^∞(R). First let f(x) = 1. Then we have:

T1(1) = T2(1) = 0 , T3(1) = ∫_0^x 1 dt = x ⇒ a3x = 0 ⇒ a3 = 0

Now let f(x) = x. Then:

T1(x) = 1 , T2(x) = 0 ⇒ a1 = 0

Finally let f(x) = x². Then:

T2(x²) = 2 ⇒ 2a2 = 0 ⇒ a2 = 0

Therefore only the trivial solution exists, ergo the list is linearly independent.
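The argument in part ii can be mirrored in code by representing a polynomial as its coefficient list [a0, a1, ...]; this representation is a sketch of my own, not taken from the note. Applying T1, T2, T3 to the test monomials 1, x, x² picks off a3, a1, a2 in turn.

```python
def T1(p):  # derivative: d/dx sum a_k x^k = sum k a_k x^(k-1)
    return [k * p[k] for k in range(1, len(p))] or [0]

def T2(p):  # second derivative
    return T1(T1(p))

def T3(p):  # integral from 0 to x
    return [0] + [p[k] / (k + 1) for k in range(len(p))]

one, x, x2 = [1], [0, 1], [0, 0, 1]
print(T1(one), T2(one), T3(one))  # [0] [0] [0, 1.0] -> only T3 survives on f = 1
print(T1(x), T2(x))               # [1] [0]          -> only T1 survives on f = x
print(T2(x2))                     # [2]              -> pins down a2
```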

One can safely say that linear algebra deals with vector spaces and linear maps, and the objects that can represent them. If a linear map is from a vector space to itself, it is called an Operator, and the vector space of operators is denoted by L(V). These are the maps Physicists use in Quantum Mechanics and, to quote Axler, “The deepest and most important parts of linear algebra deal with operators.”

Now since L(V,W) is a vector space, it admits one nice extra feature: the ability to multiply its vectors (maps or operators), which will come in quite handy, for example when applying multiple operators to a state ket for measuring an observable in Quantum Mechanics.


Definition 13 (Product of Linear Maps). Let T ∈ L(U ,V) and S ∈ L(V,W), then the product ST ∈ L(U ,W)

is defined by:

(ST )u = S(Tu) ∀u ∈ U

with the properties:

• associativity: (T1T2)T3 = T1(T2T3)

• identity: TI = IT = T

• distributivity: (S1 + S2)T = S1T + S2T and S(T1 + T2) = ST1 + ST2

There is a special type of linear map which maps a vector space into its field F. Such maps are called linear functionals, and the vector space they form defines the Dual Space.

Definition 14 (Linear Functional). A linear functional on vector space V is a linear map ϕ : V → F .

For example, these maps are linear functionals:

• ϕ : R³ → R s.t. ϕ(x, y, z) = 4x − 5y + 2z

• For fixed (a1, ..., an) ∈ F^n, ϕ : F^n → F s.t. ϕ(x1, ..., xn) = ∑_{i=1}^{n} ai xi

• ϕ : P[R] → R s.t. ϕ(p) = ∫_0^1 p(x) dx

Definition 15 (Dual Space). The dual space of V, denoted by V∗, is the vector space of all linear functionals on V; for finite-dimensional V one has dim(V) = dim(V∗). Its elements are called “dual vectors”, “co-vectors” or “one-forms”.

Just as V has a basis, the dual space V∗ also has a basis, called the dual basis. In Dirac notation (i.e. bra-ket notation), each vector v is called a “ket” and denoted by |v⟩, and each dual vector ϕ is called a “bra”, denoted by ⟨ϕ|. We should note that the kets and bras are elements of a special type of vector space called a complex Hilbert space, denoted by H, which will be introduced later. Hence the action of a linear functional ϕ : H → C is written neatly as ⟨ϕ|v⟩, which is an element of C.

Definition 16 (Dual Basis). If {v1, ..., vn} is a basis for V, then its dual basis is the list {v1∗, ..., vn∗} in V∗ of linear functionals on V such that:

vi∗(vj) = δij where δij = 1 if i = j, and δij = 0 if i ≠ j    (5)


Warning 1. Sometimes people use the convention v^i ≡ vi∗ to denote the dual vectors. Moreover, sometimes people (especially in General Relativity texts) choose to write a vector by its components as V = V^i e_i, denoting V^i as the components (scalars) and e_i as the basis of the vector space. Then they write a dual vector as V∗ = V_i e^{∗i}, denoting V_i as the dual components and e^{∗i} as the dual basis. Frustrating, right?! The important thing one should do while reading a book or a paper is to determine which conventions the author is using.

Remark 4 (Dual Basis in Dirac Notation). In our bra-ket notation, if {|e1⟩ , ..., |en⟩} is a basis of H, then

the dual basis of {|e1⟩ , ..., |en⟩} is the list {⟨ε1| , ..., ⟨εn|} of elements of H∗ where each ⟨εi| is the linear functional

on H such that

⟨εi|ej⟩ = δij where δij = 1 if i = j, and δij = 0 if i ≠ j

Remark 5. Any vector |ψ⟩ can be expressed in terms of basis vectors as |ψ⟩ = ∑_j αj |ej⟩ for αj ∈ C. Any linear functional ⟨ϕ| can be expressed in terms of dual basis vectors as ⟨ϕ| = ∑_i βi ⟨εi| for βi ∈ C. Therefore the action of a one-form on a vector can be expressed as:

⟨ϕ|ψ⟩ = ∑_{ij} αj βi ⟨εi|ej⟩ = ∑_{ij} αj βi δij = ∑_i αi βi    (6)

Remark 6. One can talk about a one-to-one correspondence between elements in H and H∗ for fixed bases of both spaces. In this case, corresponding to a ket |ψ⟩ = ∑_k αk |ek⟩ ∈ H, there exists a bra ⟨ψ| = ∑_k ᾱk ⟨εk| ∈ H∗, where ᾱk is the complex conjugate of αk. Note that from now on, I will choose the standard “bar” notation for complex conjugation, to distinguish it from the “asterisk (∗)” which is used for dual vectors. The reason for this conjugation is to enable one to define a norm. We will discuss these issues later.

Remark 7. One useful way of representing bras and kets is to consider |ψ⟩ as a column vector and ⟨ϕ| as a row vector:

|ψ⟩ := (α1, ..., αn)ᵀ  and  ⟨ϕ| := (β1 · · · βn)

so that ⟨ϕ|ψ⟩ is regarded as just the matrix multiplication of a row vector and a column vector, which yields the scalar

∑_{i=1}^{n} αi βi ∈ C


Example 6. Let’s consider R2 with the familiar basis {i, j}. Now suppose that a linear functional f : R2 → R is

defined such that we know f(i− j) = 2 and f(i+ j) = 0. What is the action of f on individual basis vectors i.e.

f(i) and f(j) ?

a. f(i) = 1 , f(j) = 0 b. f(i) = −1 , f(j) = 1 c. f(i) = 1 , f(j) = −1 d. f(i) = 2 , f(j) = 0

Solution 6. Using the fact that a linear functional is indeed linear, i.e. f(αi ± βj) = αf(i) ± βf(j), we have:

f(i) − f(j) = 2 , f(i) + f(j) = 0

Now we have a system of two linear equations, and by solving it with the methods you are familiar with, we get:

f(i) = 1 , f(j) = −1

so the correct choice is (c).
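A tiny numerical cross-check of this system, a sketch using Cramer's rule (the 2 × 2 coefficient matrix and right-hand side are read off from the two equations above):

```python
# System: f(i) - f(j) = 2 and f(i) + f(j) = 0.
a, b, c, d = 1, -1, 1, 1   # coefficient matrix [[1, -1], [1, 1]]
r1, r2 = 2, 0              # right-hand side

det = a * d - b * c        # determinant of the coefficient matrix
fi = (r1 * d - b * r2) / det
fj = (a * r2 - r1 * c) / det

print(fi, fj)  # 1.0 -1.0
```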


3 Inner Product Space

“You know, you remind me of a poem I can’t remember,

and a song that may never have existed, and a place I’m not sure I’ve ever been to.”

- Grampa Simpson

One can define additional structures on a given algebraic or topological structure. This enables one to introduce and explore extra rich structures and, in the case of a Physicist, to apply them to the Physical phenomena that he/she is studying. Now for vector spaces, one can define an additional structure called an inner product, which is a generalization of the usual dot product familiar from our geometric intuition about vectors, like the length of a vector and the angle between vectors. In fact one can think of an inner product space as a generalization of Euclidean space, in which the inner product is the scalar or dot product, which we will review at the beginning of this section. The vector spaces endowed with an inner product (inner product spaces) are truly important in Physics. Among these types of spaces, there is a very special one upon which all of Quantum Mechanics is built. It is named the “Hilbert Space” after the influential 19th and early 20th century Mathematician David Hilbert. Although the treatment of Hilbert space is done in a branch of Analysis called “Functional Analysis”, which is beyond the scope of this note, we will indeed scratch its surface to hopefully prepare some essentials for your future journey into the heart of Quantum Mechanics.

3.1 Inner Product, Norm and Inequalities

First let us recall the definition of dot product on Euclidean space.

Definition 17 (Dot Product). ∀x, y ∈ R^n, the dot product of x and y, denoted by x · y, is defined as:

x · y := x1y1 + x2y2 + · · · + xnyn = ∑_{i=1}^{n} xi yi    (7)

where x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn), and it satisfies these conditions:

1. x · x ≥ 0    2. x · x = 0 iff x = 0    3. x · y = y · x

Definition 18 (Inner Product). Let V be a vector space on F (where F is either R or C). An inner product

on V is a map ⟨· , ·⟩ : V × V → F such that ∀u,v,w ∈ V and ∀λ ∈ F we have the following axioms:


C.1. ⟨v,v⟩ ≥ 0 (positivity)

C.2. ⟨v,v⟩ = 0 iff v = 0 (definiteness)

C.3. ⟨u+ v,w⟩ = ⟨u,w⟩+ ⟨v,w⟩ (additivity in first slot)

C.4. ⟨λu,v⟩ = λ ⟨u,v⟩ (homogeneity in first slot)

C.5. ⟨u,v⟩ = \overline{⟨v,u⟩} (conjugate symmetry)

Remark 8. Following the standard notation for conjugation, i.e. “bar” instead of “asterisk”, for z = x + iy ∈ C we have:

1. z̄ = x − iy

2. \overline{z1 z2} = z̄1 z̄2

3. \overline{z1 ± z2} = z̄1 ± z̄2

4. |z̄| = |z|

5. z z̄ = |z|²

6. \overline{(z1/z2)} = z̄1/z̄2

Warning 2. You should note that in axiom (C.4), I have chosen a convention used in the Mathematics literature, since I learned a great deal of this stuff from it and, to be honest, I am more comfortable with those conventions. Now you might encounter a case (most often in the Physics literature) where the author chooses an alternative convention and writes ⟨λu,v⟩ = λ̄ ⟨u,v⟩ instead. Also, they might demand additivity in the second slot, ⟨u,v + w⟩ = ⟨u,v⟩ + ⟨u,w⟩, instead of the first slot that we chose. Although it is OK to choose any convention (as long as the author keeps it to the end, as I mentioned earlier), you should know that the convention for additivity in the second slot comes from defining the inner product through the bilinear form and its generalization the sesquilinear form, in which case one denotes the inner product by (· , ·) instead of the ⟨· , ·⟩ which we have chosen. Please take my warning.1 together with this one seriously to avoid any confusion from now on.

Example 7. Axiom (C.3) states that the inner product is additive in the first slot. Does this property hold in the second slot as well?

Solution 7. Using axioms (C.4) and (C.5) we have:

⟨u, λv + βw⟩ = \overline{⟨λv + βw,u⟩} = \overline{λ ⟨v,u⟩ + β ⟨w,u⟩} = λ̄ ⟨u,v⟩ + β̄ ⟨u,w⟩

Hence the inner product is anti-linear or conjugate linear in the second slot. Note that a map f on a complex vector space is anti-linear if f(av + bw) = ā f(v) + b̄ f(w).


Some examples of inner products are:

• The inner product on C^n defined as:

⟨x,y⟩ := ∑_{i=1}^{n} xi ȳi    (8)

for x = (x1, ..., xn), y = (y1, ..., yn) ∈ C^n

• The most general form of an inner product on Cn is called Hermitian form and defined as:

⟨x,y⟩ := y†Mx (9)

where “†” (dagger) denotes the conjugate transpose (also called the Hermitian conjugate or Hermitian adjoint) and M is any positive-definite Hermitian matrix, i.e. a Hermitian matrix all of whose eigenvalues are positive.

• Let C[a,b] be the vector space of all continuous complex-valued functions on the interval [a, b]. Then the inner product on C[a,b] is defined by:

⟨f, g⟩ := ∫_a^b f(x) ḡ(x) dx   (10)

• The inner product on P[R] is defined by:

⟨p, q⟩ := ∫_0^∞ p(x) q(x) e^{−x} dx   (11)

Definition 19 (Inner Product Space). An inner product space is a vector space V with an inner product defined

on V.

Definition 20 (Norm). The norm of a vector v ∈ V, denoted by ∥v∥, is given by:

∥v∥ := √⟨v, v⟩   (12)

with the properties:

i. ∥v∥ = 0 iff v = 0 ii. ∥λv∥ = |λ| ∥v∥

A exercise 4. Let x = (x1, ..., xn) ∈ Rn and z = (z1, ..., zn) ∈ Cn . Write down the explicit formulas for ∥x∥ and ∥z∥ .


Figure 1: going from A to B in a taxicab world

Note that the norm defined by def.(20) is called the Euclidean norm if v ∈ Rn, and the 2-norm (or L2-norm) in general. And guess what?! We have other kinds of norms for a given vector as well. They are called “p-norms”.

Definition 21 (p-norm). Let x = (x1, ..., xn) be a vector in Fn. For a real number p ≥ 1, the p-norm (or Lp-norm) of x is defined as:

∥x∥p := ( |x1|^p + |x2|^p + · · · + |xn|^p )^{1/p} = ( ∑_{i=1}^{n} |xi|^p )^{1/p}   (13)

When p → ∞ we have the L∞-norm, called the maximum norm or uniform norm, given by:

∥x∥∞ = max{|x1|, |x2|, ..., |xn|} (14)

A exercise 5. Let v = (1, 2, 3) be a vector in R3. Calculate ∥v∥p for p = 1, 2, 3 and its L∞-norm.
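As a quick numerical companion to definition 21, here is a small Python sketch (not part of the original note) that computes p-norms directly from eq. (13); the vector (3, −4) is just an illustrative choice:

```python
import math

def p_norm(x, p):
    """Lp-norm of a vector for real p >= 1, straight from eq. (13)."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def max_norm(x):
    """L-infinity (maximum) norm from eq. (14)."""
    return max(abs(xi) for xi in x)

x = (3.0, -4.0)
print(p_norm(x, 1))   # 7.0  (|3| + |-4|)
print(p_norm(x, 2))   # 5.0  (the familiar Euclidean length)
print(max_norm(x))    # 4.0
```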

It is fun to dig into the p = 1 or L1-norm a little, since people have given it an interesting name: the “Taxicab” or “Manhattan” norm! Imagine that we live in a city laid out on a grid, where the unit of distance is the edge of one square cell (fig.1). Say your home is at A and you want to go to your friend’s at B. You can’t get there by flying, of course, or by jumping rooftops, since you’re not Batman/Catwoman. Are you? So you either have to walk or take a cab (or any other vehicle). How many units of distance do you need to travel to get there?

If you look again at fig.1, you can see that it’s 12 units. In fact some possible paths that you might take are shown by the red, blue and yellow lines. What about the green line? That’s for when you can fly straight from A to B. How many units for the green line? Yes, approximately 8.49 units, using the ancient Pythagoras’ theorem.


More formally, the metric (or p-distance) dp between two vectors x = (x1, ..., xn) and y = (y1, ..., yn) is defined by (the notion of metric comes from a more general space called a metric space):

dp(x, y) = ∥x − y∥p = ( ∑_{i=1}^{n} |xi − yi|^p )^{1/p}   (15)

So for our example, if we set the coordinates of your home (x1, y1) at origin (0, 0) and your friend’s at (x2, y2) =

(6, 6), then the taxicab distance will be d1 = |x1 − x2|+ |y1 − y2| = 12.
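The taxicab walk above is easy to check in code. A minimal Python sketch (the coordinates (0, 0) and (6, 6) are the ones from the text):

```python
import math

def taxicab(p, q):
    """L1 (Manhattan) distance: sum of coordinate-wise absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def straight_line(p, q):
    """L2 (Euclidean) distance: the green 'flying' path."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

home, friend = (0, 0), (6, 6)
print(taxicab(home, friend))                  # 12
print(round(straight_line(home, friend), 2))  # 8.49
```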

Definition 22 (Matrix Norm). For matrices, one can define different types of norms as well. The formal definition comes from the operator norm or induced norm. Let K^{m×n} be the vector space of all m × n matrices over K. Let ∥·∥p be a norm on Kn and ∥·∥q be a norm on Km. Then the matrix norm induced by these norms is defined by:

∥A∥p,q = max{ ∥Ax∥q / ∥x∥p : x ∈ Kn and x ≠ 0 }   (16)

If we set p = q (which we will assume from now on), then:

∥A∥p = max{ ∥Ax∥p / ∥x∥p : x ∈ Kn and x ≠ 0 }   (17)

and the intuitive idea behind the concept of a matrix norm is how much the matrix A amplifies a given input vector. Why? First, in functional analysis there is a special class of linear operators called bounded operators, which have the property:

∀v ∈ V : ∥Lv∥ ≤ ∥L∥op ∥v∥   (18)

The action of the operator on the elements of the vector space forms a set whose elements are the ratios ∥Lv∥/∥v∥ of the norm of the output to the norm of the input vector. The norm of L is then defined as the supremum of this set:

∥L∥op := sup{ ∥Lv∥/∥v∥ : v ∈ V with v ≠ 0 }   (19)

From this definition one can see that for any bounded operator, the norm is the largest factor by which it can scale an input. Now, since any bounded operator on a Hilbert space admits a matrix representation (we can represent it by a specific matrix), our argument for the operator norm carries over to the matrix norm as well. Second, as we will see below, one can define a norm of a matrix through its largest eigenvalue. But what is the significance of eigenvalues in the first place?


Although we will come back to matrices in detail later in this note, for now just think of eigenvalues as the scaling factors of certain vectors under the action of a given matrix. This comes from the classic eigenvalue problem, in which λ ∈ C being an eigenvalue of a matrix A means satisfying the equation:

Av = λv

where the vector v is called an eigenvector. So the largest λ corresponds to the greatest scaling of v.

Remark 9. Calculating a matrix norm using the definition above is not an easy thing to do. However, there are some special cases in which the computation becomes simpler. These are the cases p = 1, 2, ∞, together with the Frobenius norm, defined as:

i. (maximum absolute column sum norm): For p = 1 we have

∥A∥1 = max_j ∑_{i=1}^{m} |aij|   (20)

ii. (spectral norm): Let A† be the conjugate transpose of A and {λi(A†A)} be the eigenvalues of A†A. Then for p = 2 we have

∥A∥2 = max_i √( λi(A†A) )   (21)

iii. (maximum absolute row sum norm): The ∞-norm is

∥A∥∞ = max_i ∑_{j=1}^{n} |aij|   (22)

iv. (Frobenius norm):

∥A∥F = √( ∑_{i=1}^{m} ∑_{j=1}^{n} |aij|² ) = √( tr(A†A) )   (23)

If A is an operator on a Hilbert space, the Frobenius norm is called the Hilbert-Schmidt norm, denoted by ∥A∥HS .

Example 8. Let

A = ( 1  2
      3  4 )

Calculate the norms ∥A∥1 , ∥A∥2 , ∥A∥∞ , ∥A∥F .


Solution 8.

a.

∥A∥1 = max{|1|+ |3|, |2|+ |4|} = max{4, 6} = 6

b.

∥A∥∞ = max{|1|+ |2|, |3|+ |4|} = max{3, 7} = 7

c.

∥A∥F = √( |1|² + |2|² + |3|² + |4|² ) = √30 ≈ 5.4772

d.

A†A = ( 10  14
        14  20 )

Now to find its eigenvalues, we need to find the roots of its characteristic equation:

det( 10 − λ    14
     14     20 − λ ) = 0 ⇒ λ² − 30λ + 4 = 0 ⇒ λ = 15 ± √221

Hence ∥A∥2 = max{ √(15 + √221) , √(15 − √221) } = √(15 + √221) ≈ 5.4649
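The four norms of Example 8 can be verified numerically. The sketch below (not part of the original note) uses plain Python; for the spectral norm it exploits the closed-form eigenvalues λ = (tr ± √(tr² − 4 det))/2 of the 2 × 2 symmetric matrix AᵀA:

```python
import math

A = [[1.0, 2.0], [3.0, 4.0]]

# eq. (20): maximum absolute column sum
norm_1 = max(abs(A[0][j]) + abs(A[1][j]) for j in range(2))
# eq. (22): maximum absolute row sum
norm_inf = max(abs(row[0]) + abs(row[1]) for row in A)
# eq. (23): Frobenius norm
norm_F = math.sqrt(sum(a * a for row in A for a in row))

# eq. (21): spectral norm via the eigenvalues of B = A^T A (A is real here)
B = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
tr = B[0][0] + B[1][1]
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
lam_max = (tr + math.sqrt(tr * tr - 4 * det)) / 2
norm_2 = math.sqrt(lam_max)

print(norm_1, norm_inf)      # 6.0 7.0
print(round(norm_F, 4))      # 5.4772
print(round(norm_2, 4))      # 5.465
```

The closed-form 2 × 2 eigenvalue formula is only a shortcut for this small example; for larger matrices one would use a proper eigenvalue routine.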

A exercise 6. Calculate the ∥A∥1 , ∥A∥2 , ∥A∥∞ , ∥A∥F norms of

A = (  2  −2   1
      −1   3  −1
       2  −4   1 )

Definition 23 (Orthogonality). Two vectors u,v are called orthogonal if ⟨u,v⟩ = 0 .

Example 9 (Pythagorean Theorem). Suppose that u,v are orthogonal vectors in V. Prove that ∥u+ v∥2 =

∥u∥2 + ∥v∥2 .

Solution 9. ∥u + v∥² = ⟨u + v, u + v⟩ = ⟨u,u⟩ + ⟨u,v⟩ + ⟨v,u⟩ + ⟨v,v⟩ = ∥u∥² + ∥v∥², since orthogonality gives ⟨u,v⟩ = ⟨v,u⟩ = 0.

Theorem 2 (Cauchy-Schwarz Inequality). Let V be an inner product space. Then for u, v ∈ V the following inequality holds:

|⟨u,v⟩| ≤ ∥u∥ ∥v∥   (24)


The equality holds iff one of the vectors is a scalar multiple of the other. In Rn this inequality becomes:

( ∑_{i=1}^{n} ui vi )² ≤ ( ∑_{i=1}^{n} ui² ) ( ∑_{i=1}^{n} vi² )   (25)

We are not going to prove this theorem or the following one here; you can find the proofs in almost every related textbook. Instead, we will use the inequality to obtain a relation between the L1 and L2 norms.

Example 10. Prove that ∀x ∈ Rn : ∥x∥1 ≤ √n ∥x∥2 .

Solution 10. We have from the definition:

∥x∥1 = ∑_{i=1}^{n} |xi| = ∑_{i=1}^{n} |xi| · 1

The right hand side is the inner product of the vectors (|x1|, ..., |xn|) and (1, ..., 1), so it satisfies the Cauchy-Schwarz inequality. Hence

∥x∥1 ≤ ( ∑_{i=1}^{n} |xi|² )^{1/2} ( ∑_{i=1}^{n} 1² )^{1/2} = √n ∥x∥2
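The bound just derived is easy to probe numerically. Here is a small randomized Python check (my own illustration, not from the note; the random vectors and tolerance are arbitrary choices):

```python
import math
import random

def holds(x):
    """Check ||x||_1 <= sqrt(n) * ||x||_2 for one vector x."""
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return l1 <= math.sqrt(len(x)) * l2 + 1e-9  # tolerance for rounding error

random.seed(0)
ok = all(holds([random.uniform(-10, 10) for _ in range(5)]) for _ in range(1000))
print(ok)  # True
```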

Theorem 3 (Triangle Inequality). ∥u + v∥ ≤ ∥u∥ + ∥v∥ . Again, the equality holds iff one vector is a non-negative multiple of the other.

A exercise 7. Prove ∥u + v∥² + ∥u − v∥² = 2( ∥u∥² + ∥v∥² ).

3.2 Orthonormal Basis and Gram-Schmidt Procedure

Definition 24 (Orthonormal Basis). A list of vectors {e1, ..., en} is called orthonormal if each vector in the list has norm 1 and is orthogonal to all the other vectors in the list. In other words:

⟨ei, ej⟩ = δij   (26)

Any such set is called complete if it is not a subset of any larger orthonormal list of vectors in the given vector space. Any complete set is a basis. An orthonormal list of vectors which is also a basis is called an orthonormal basis.

A exercise 8. Do the vectors (1/√3, 1/√3, 1/√3), (−1/√2, 1/√2, 0) and (1/√6, 1/√6, −2/√6) form an orthonormal list?


Orthonormal bases are especially important in Quantum Mechanics. For example, for spin-1/2 systems we can choose either |Sz;±⟩ or |Sx;±⟩ as the basis kets. Then we want to know how these two descriptions are related to each other. That is done by a unitary transformation, and the process is called a change of basis or change of representation. If we are talking about matrices, a change of basis will be a rotation. The condition for performing such a process is to have at least two orthonormal bases. Another important example is the case of function spaces, as in Fourier analysis. The reason a function can be represented by its Fourier decomposition is that the set of exponential functions {e^{int}} forms an orthonormal basis:

(1/2π) ∫_{−π}^{π} e^{int} · e^{−imt} dt = { 1 if n = m ; 0 if n ≠ m }   (27)

Now if we are given a basis for some inner product space V, it is possible to construct an orthonormal basis for the given space. It can be done by a process called the “Gram-Schmidt procedure”.

Definition 25 (Gram-Schmidt Procedure). Suppose {v1, ..., vm} is a linearly independent list of vectors in V. Let

e1 = v1 / ∥v1∥

Now for j = 2, 3, ..., m, define ej by:

ej = ( vj − ⟨vj, e1⟩ e1 − · · · − ⟨vj, ej−1⟩ ej−1 ) / ∥ vj − ⟨vj, e1⟩ e1 − · · · − ⟨vj, ej−1⟩ ej−1 ∥   (28)

Then {e1, ..., em} is an orthonormal list of vectors in V.

Example 11. For the given vectors in R3:

v1 = (1, 1,−2) , v2 = (1, 2,−3) , v3 = (0, 1, 1)

find an orthonormal set of vectors.

Solution 11. By the Gram-Schmidt procedure, letting e1 = v1/∥v1∥ we have:

∥v1∥ = √⟨v1, v1⟩ = √( 1² + 1² + (−2)² ) = √6 ⇒ e1 = (1/√6)(1, 1, −2)

Now for the next vector we have:

e2 = ( v2 − ⟨v2, e1⟩ e1 ) / ∥ v2 − ⟨v2, e1⟩ e1 ∥


Hence:

v2 − ⟨v2, e1⟩ e1 = (1, 2, −3) − (9/√6)(1/√6)(1, 1, −2) = (−1/2, 1/2, 0) ⇒ ∥v2 − ⟨v2, e1⟩ e1∥ = 1/√2

⇒ e2 = (1/√2)(−1, 1, 0)

Now for the last vector we have:

e3 = ( v3 − ⟨v3, e1⟩ e1 − ⟨v3, e2⟩ e2 ) / ∥ v3 − ⟨v3, e1⟩ e1 − ⟨v3, e2⟩ e2 ∥

So that:

v3 − ⟨v3, e1⟩ e1 − ⟨v3, e2⟩ e2 = (0, 1, 1) − (−1/√6)(1/√6)(1, 1, −2) − (1/√2)(1/√2)(−1, 1, 0) = (2/3)(1, 1, 1)

∥v3 − ⟨v3, e1⟩ e1 − ⟨v3, e2⟩ e2∥ = 2/√3 ⇒ e3 = (1/√3)(1, 1, 1)

Therefore the obtained {e1, e2, e3} is the desired orthonormal set of vectors.
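The procedure of definition 25 translates almost line by line into code. A minimal Python sketch (my own illustration), run on the vectors of Example 11, where the real inner product is the ordinary dot product:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of real vectors (eq. 28)."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(v, e)                       # <v_j, e_i>
            w = [wi - c * ei for wi, ei in zip(w, e)]
        nrm = math.sqrt(dot(w, w))
        basis.append([wi / nrm for wi in w])
    return basis

e1, e2, e3 = gram_schmidt([(1, 1, -2), (1, 2, -3), (0, 1, 1)])
print([round(c, 4) for c in e3])  # [0.5774, 0.5774, 0.5774], i.e. (1,1,1)/sqrt(3)
```

This is the classical form of the algorithm; in floating-point work one usually prefers the modified Gram-Schmidt variant for numerical stability.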

Example 12. Find an orthonormal basis of P(R) with the basis {1, x, x²} where the inner product is given by ⟨p, q⟩ = ∫_{−1}^{1} p(x) q(x) dx .

Solution 12. By the Gram-Schmidt procedure, we first evaluate e1 from v1 ≡ 1 by e1 = 1/∥1∥, where ∥1∥ is obtained from the given inner product:

∥1∥ = √⟨1, 1⟩ = √( ∫_{−1}^{1} 1² dx ) = √2 ⇒ e1 = 1/√2

Now for the next basis vector we have:

e2 = ( v2 − ⟨v2, e1⟩ e1 ) / ∥ v2 − ⟨v2, e1⟩ e1 ∥ = ( x − ⟨x, e1⟩ e1 ) / ∥ x − ⟨x, e1⟩ e1 ∥

So:

x − ⟨x, e1⟩ e1 = x − ( ∫_{−1}^{1} x (1/√2) dx ) (1/√2) = x and ∥x∥ = √( ∫_{−1}^{1} x² dx ) = √(2/3) ⇒ e2 = √(3/2) x

Now for the third basis vector we have:

e3 = ( v3 − ⟨v3, e1⟩ e1 − ⟨v3, e2⟩ e2 ) / ∥ v3 − ⟨v3, e1⟩ e1 − ⟨v3, e2⟩ e2 ∥ = ( x² − ⟨x², e1⟩ e1 − ⟨x², e2⟩ e2 ) / ∥ x² − ⟨x², e1⟩ e1 − ⟨x², e2⟩ e2 ∥

Now:

x² − ⟨x², e1⟩ e1 − ⟨x², e2⟩ e2 = x² − ( ∫_{−1}^{1} x² (1/√2) dx ) (1/√2) − ( ∫_{−1}^{1} x² √(3/2) x dx ) √(3/2) x = x² − 1/3


And:

∥x² − 1/3∥ = √( ∫_{−1}^{1} ( x⁴ − (2/3)x² + 1/9 ) dx ) = √(8/45) ⇒ e3 = √(45/8) ( x² − 1/3 )

Therefore { 1/√2 , √(3/2) x , √(45/8)(x² − 1/3) } is an orthonormal list in P(R) .
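The same procedure works for the polynomial case if we encode polynomials as coefficient lists and compute ∫₋₁¹ p(x) q(x) dx exactly, term by term. A Python sketch of my own (the coefficient-list encoding is an assumption of this illustration, not the note's notation):

```python
import math

def poly_mul(p, q):
    """Product of polynomials given as coefficient lists [c0, c1, ...]."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1], computed exactly:
    x^k integrates to 2/(k+1) for even k and to 0 for odd k."""
    return sum(2.0 * c / (k + 1) for k, c in enumerate(poly_mul(p, q)) if k % 2 == 0)

def gram_schmidt(polys):
    basis = []
    for v in polys:
        w = list(v)
        for e in basis:
            c = inner(v, e)
            for k in range(len(w)):
                w[k] -= c * e[k]
        nrm = math.sqrt(inner(w, w))
        basis.append([ck / nrm for ck in w])
    return basis

# the monomials 1, x, x^2, zero-padded to a common length
e1, e2, e3 = gram_schmidt([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print([round(c, 4) for c in e3])  # coefficients of sqrt(45/8) * (x^2 - 1/3)
```

Running it reproduces the normalized polynomials of Solution 12, with e3 matching √(45/8)(x² − 1/3).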

A exercise 9. Find e4 and e5 for Example 12 by continuing the Gram-Schmidt procedure.

3.3 Orthogonal Polynomials

The orthonormal basis {ei} that we found in the previous example is among the important special polynomials used in Physics. They are called the “Legendre polynomials” and they appear, for example, in the solutions of the Laplace equation ∇²Φ = 0 in Electrostatics and of the Schrödinger equation in Quantum Mechanics, when both are separated in spherical coordinates. For the Laplace equation we have:

d/dx [ (1 − x²) dP(x)/dx ] + [ l(l + 1) − m²/(1 − x²) ] P(x) = 0   s.t. x = cos θ and l ∈ Z⁺ ∪ {0}
d²U(r)/dr² − [ l(l + 1)/r² ] U(r) = 0
  (29)

Now if we require the angular equation to have no azimuthal dependence (i.e. m = 0), we get the standard Legendre differential equation for the angular part, together with an ordinary differential equation for the radial part:

d/dx [ (1 − x²) dP(x)/dx ] + l(l + 1) P(x) = 0
d²U(r)/dr² − [ l(l + 1)/r² ] U(r) = 0
  (30)

which has the general power series solution (describing an electrostatic potential inside or outside a spherical region):

Φ(r, θ) = ∑_{l=0}^{∞} [ Al r^l + Bl r^{−(l+1)} ] Pl(cos θ)   (31)

where the Pl(cos θ) are the Legendre polynomials that we obtained in the example:

P0(x) = 1 , P1(x) = x = cos θ , P2(x) = (1/2)(3x² − 1) = (1/4)(3 cos 2θ + 1)   (32)


You should note that the difference in the coefficients comes from the normalization condition that we imposed. The orthogonality condition for the Legendre polynomials is expressed as:

∫_{−1}^{1} Pn(x) Pm(x) dx = { 2/(2n + 1) if n = m ; 0 if n ≠ m }   (33)
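Equation (33) can be confirmed directly for the first few polynomials with the same exact-integration trick used for Gram-Schmidt. A minimal Python check (my own sketch):

```python
# P0, P1, P2 as coefficient lists [c0, c1, ...]
P = [[1.0], [0.0, 1.0], [-0.5, 0.0, 1.5]]

def poly_mul(p, q):
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def integral(p):
    """Exact integral over [-1, 1]: x^k contributes 2/(k+1) for even k, 0 for odd k."""
    return sum(2.0 * c / (k + 1) for k, c in enumerate(p) if k % 2 == 0)

for n in range(3):
    for m in range(3):
        value = integral(poly_mul(P[n], P[m]))
        expected = 2.0 / (2 * n + 1) if n == m else 0.0
        assert abs(value - expected) < 1e-12
print("eq. (33) verified for n, m <= 2")
```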

Remark 10. The Legendre polynomials are among the special class of functions called “orthogonal polynomials” or “orthogonal functions”, which play an important role in describing various physical systems. Given their importance and their relation to inner product spaces, I will give their definition with examples.

Definition 26 (System of Orthogonal Polynomials). Let {Φn(x)} be a list of polynomials defined on the interval a < x < b such that Φn is of degree n, and let w(x) > 0 be a function defined on the same interval, called the weight function. We call the positive numbers ∥Φn∥² the squared norms, defined by:

∥Φn∥² = ∫_a^b [Φn(x)]² w(x) dx   (34)

Then {Φn(x)} is said to be orthogonal over a < x < b with respect to the weight function if:

∫_a^b Φm(x) Φn(x) w(x) dx = { ∥Φn∥² if m = n ; 0 if m ≠ n }   s.t. m, n ∈ Z⁺ ∪ {0}   (35)

The normalized system of polynomials {ϕn(x)}, where

ϕn(x) = Φn(x) / ∥Φn(x)∥   (36)

is said to be orthonormal if:

∫_a^b ϕm(x) ϕn(x) w(x) dx = { 1 if m = n ; 0 if m ≠ n }   s.t. m, n ∈ Z⁺ ∪ {0}   (37)

Remark 11. One useful thing that you can do with orthogonal polynomials is to approximate (expand) polynomials, and more general functions, in terms of them. That comes from the properties of an orthonormal basis in a vector space, as you may recall.


Definition 27 (Orthogonal Expansion). Let’s assume that we have a system {ϕn(x)} of orthogonal polynomials over a < x < b with respect to w(x). Then any polynomial f(x) of degree at most m can be expanded as:

f(x) = ∑_{k=0}^{m} ak ϕk(x)   s.t. ak = ⟨f, ϕk⟩ = ∫_a^b f(x) ϕk(x) w(x) dx   (38)

where m indicates the degree of approximation (i.e. an expansion in terms of the first m + 1 orthogonal polynomials); for a more general function f, the same formula gives its best approximation in this sense.

A exercise 10. Using the definition of norm for the orthogonal polynomials, find an orthonormal basis of P(R)

with the basis {1, x, x2} over 0 < x < ∞ with respect to the weight function w(x) = e−x . These orthogonal

polynomials will be the first three Laguerre polynomials. Then using them, expand the function f(x) = e−2x with

the approximation degree of m = 3 by finding the coefficients am .

Now we will end this section by introducing a space that we have mentioned on many occasions in this note, i.e. the Hilbert space. First, note that here we will only deal with finite-dimensional Hilbert spaces (which is useful for learning basic Quantum Mechanics, for example when discussing spin, and is of course essential to Quantum Information and Quantum Computation as well), since dealing with the infinite-dimensional case requires many techniques from Analysis, which is studied in Functional Analysis rather than Linear Algebra, which deals with finite-dimensional vector spaces. But we need to learn a bit about metric spaces first.

3.4 Metric Space

Definition 28 (Metric Space). A metric space (X, d) is a set X together with a real-valued function d : X×X →

R≥0 called metric or distance function such that ∀x, y, z ∈ X the following axioms hold:

D.1. d(x, y) = 0 iff x = y (definiteness)

D.2. d(x, y) = d(y, x) (symmetry)

D.3. d(x, z) ≤ d(x, y) + d(y, z) (sub-additivity or triangle inequality)

Now the important thing about metric space regarding our vector space is the following proposition that relates

metric to the norm:

Proposition 1. If ∥·∥p is a norm on a vector space V then the lp-metric defined by:

dlp(x, y) = ∥x − y∥p = ( ∑_{i=1}^{n} |xi − yi|^p )^{1/p}   (39)

is a distance on V, turning it into a metric space.


Figure 2: Hierarchy of Mathematical Spaces

Remark 12. From proposition 1, one can see that every normed vector space is indeed a metric space, i.e. every norm induces a metric. But the reverse does not always hold. The useful sketch in fig.2 (which I have borrowed from our beloved Wikipedia) shows the relationships between different mathematical spaces. Two familiar examples of metric spaces are the set of real numbers with the metric d(x, y) = |x − y| and the set of continuous real-valued functions on the interval [a, b] with the metric d(f, g) = ∫_a^b |f(x) − g(x)| dx . Now, in order to proceed, we need to define a Cauchy sequence.

Definition 29 (Cauchy sequence). A sequence {xn} of elements of a metric space (X, d) is called a Cauchy sequence if for every ϵ > 0 there exists a number N such that n, m ≥ N implies d(xn, xm) < ϵ .

Example 13. Consider the sequence { xj = ∑_{n=1}^{j} 1/n² } of real numbers. Let’s see whether this is a Cauchy sequence or not. To do so, we need to bound |xj − xk| for j, k > N. Assume j ≥ k; since the partial sum from k to j is certainly smaller than the sum from N to infinity, we have:

|xj − xk| ≤ ∑_{n=N}^{∞} 1/n² ≤ ∑_{n=N}^{∞} 1/( n(n − 1) ) = ∑_{n=N}^{∞} { 1/(n − 1) − 1/n } = 1/(N − 1) ⇒ |xj − xk| ≤ 1/(N − 1)

Since we can make |xj − xk| arbitrarily small by taking N arbitrarily large, {xn} is a Cauchy sequence. In fact, any convergent sequence is Cauchy. Fig.3 shows a sketch of a Cauchy sequence.
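The bound 1/(N − 1) from Example 13 can also be watched numerically. A quick Python sketch (the choice N = 100 and the window of 40 terms are arbitrary):

```python
def x(j):
    """Partial sum x_j = sum_{n=1}^{j} 1/n^2."""
    return sum(1.0 / (n * n) for n in range(1, j + 1))

N = 100
# every pair of terms beyond N stays within the derived bound 1/(N-1)
worst = max(abs(x(j) - x(k)) for j in range(N, N + 40) for k in range(N, N + 40))
print(worst <= 1.0 / (N - 1))  # True
```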

Definition 30. A metric space in which every Cauchy sequence converges is called a complete metric space

or Cauchy space.

Now we have a powerful proposition regarding our finite-dimensional spaces which states:

Proposition 2. Every Cauchy sequence in a finite-dimensional inner product space over R or C converges. That

is every finite-dimensional real or complex inner product space is complete with respect to the norm induced by its

inner product.


Figure 3: A Cauchy sequence, where the distances between its elements get smaller and smaller

3.5 Hilbert Space and Operator Algebra

Definition 31 (Hilbert Space). A complete inner product space (with respect to the norm induced by its inner

product) is a Hilbert Space which is denoted by H .

Remark 13. From proposition 2, we see that all finite-dimensional real or complex inner product spaces are real

or complex Hilbert spaces, like Cn and Rn.

Now, if you recall, we said some things about Dirac notation and dual spaces when we were discussing linear functionals. You went through all those abstract concepts for a reason! That is, to adopt the Dirac notation from now on without worrying about its mathematical exactness. Often when physicists speak of bra-kets, they mean an inner product. But so far we have seen that inner products and bras (which are in fact linear functionals) are two completely different concepts. Fear not! There is a powerful theorem called the Riesz lemma or Riesz representation theorem that connects them. I will present it here in terms of Dirac notation.

Theorem 4 (Riesz lemma). Let H be a complex Hilbert space and let H∗ be its dual. Then for |v⟩ , |w⟩ ∈ H and ⟨v| ∈ H∗ we have:

⟨v|w⟩ = ⟨ |w⟩ , |v⟩ ⟩   (40)

Remark 14. From now on, we will use the Dirac notation and work in a finite-dimensional Hilbert space (unless otherwise stated) and denote vectors by kets, so that for

∀ |v⟩, |w⟩ ∈ H , with |v⟩ := (v1, v2, ..., vn)ᵀ , |w⟩ := (w1, w2, ..., wn)ᵀ

and the corresponding bras given by the conjugated row vectors

⟨v| := ( v̄1 · · · v̄n ) , ⟨w| := ( w̄1 · · · w̄n )


we have:

⟨v|w⟩ = ( v̄1 v̄2 · · · v̄n ) (w1, w2, ..., wn)ᵀ = ∑_{i=1}^{n} v̄i wi   (41)

This way we can define some useful concepts.

Definition 32 (Adjoint). Let α ∈ C, |v⟩ , |w⟩ ∈ H and let T, U ∈ L(V) be linear operators. Then the adjoint or Hermitian conjugate, denoted by “†”, is defined such that:

a. (|v⟩)† = ⟨v|   b. (T|v⟩)† = ⟨v|T†   c. ( ⟨v|T|w⟩ )† = ⟨w|T†|v⟩

with the properties:

1. (αT)† = ᾱ T†   2. (T + U)† = T† + U†   3. (T†)† = T   4. (TU)† = U† T†

Example 14. Suppose T : C³ → C³ is defined by

T (α1, α2, α3)ᵀ = ( α1 − iα2 + α3 , iα1 − α3 , α1 − α2 + iα3 )ᵀ

Find its adjoint T† .

Solution 13. Since from the definition we have ( ⟨v|T|w⟩ )† = ⟨w|T†|v⟩, we first define

|v⟩ = (α1, α2, α3)ᵀ , |w⟩ = (β1, β2, β3)ᵀ

then we use the left hand side, operate with T on |w⟩, and compare with the right hand side to obtain the adjoint:

( ⟨v|T|w⟩ )† = [ ( ᾱ1 ᾱ2 ᾱ3 ) ( β1 − iβ2 + β3 , iβ1 − β3 , β1 − β2 + iβ3 )ᵀ ]†

= ( ᾱ1β1 − iᾱ1β2 + ᾱ1β3 + iᾱ2β1 − ᾱ2β3 + ᾱ3β1 − ᾱ3β2 + iᾱ3β3 )†

= β̄1α1 − iβ̄1α2 + β̄1α3 + iβ̄2α1 − β̄2α3 + β̄3α1 − β̄3α2 − iβ̄3α3


The last expression is nothing but the usual row-times-column product of ⟨w| = ( β̄1 β̄2 β̄3 ) with a column vector, which we identify as T†|v⟩. Hence:

⟨w|T†|v⟩ = ( β̄1 β̄2 β̄3 ) ( α1 − iα2 + α3 , iα1 − α3 , α1 − α2 − iα3 )ᵀ

Therefore:

T† (α1, α2, α3)ᵀ = ( α1 − iα2 + α3 , iα1 − α3 , α1 − α2 − iα3 )ᵀ
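Example 14 can be double-checked by writing T as a complex matrix and taking the conjugate transpose. A Python sketch of my own using built-in complex numbers (the test vectors v and w are arbitrary choices):

```python
# T from Example 14, acting on column vectors of C^3
T = [[1, -1j, 1],
     [1j, 0, -1],
     [1, -1, 1j]]

def dagger(M):
    """Conjugate transpose of a complex matrix."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def braket(u, v):
    """<u|v> = sum_i conj(u_i) v_i, as in eq. (41)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

Td = dagger(T)
v = [1 + 2j, -1j, 3]
w = [2, 1j, -1 + 1j]
# the defining property <v, T w> = <T† v, w> holds
assert abs(braket(v, apply(T, w)) - braket(apply(Td, v), w)) < 1e-12
print(Td[2])  # last row of T†, matching alpha1 - alpha2 - i*alpha3 above
```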

A exercise 11. Consider the operator A = x d/dx defined on the Hilbert space with the inner product given by:

⟨f|g⟩ = ∫_{−∞}^{+∞} f̄(x) g(x) dx

If both f, g vanish at ±∞, i.e. f(±∞) = g(±∞) = 0, then find A† .

Definition 33 (Expectation Value). The expectation value of an operator T in the ket |v⟩, denoted by ⟨T⟩v, is the number defined by:

⟨T⟩v = ⟨v|T|v⟩   (42)

A exercise 12. Using the definition of the inner product in exercise 11 and the condition within, find the expectation value of the operator p = −i d/dx for a function of the form ψ(x) = e^{iθ} f(x) , where f(x) is a real-valued function and θ ∈ R .

Definition 34 (Hermitian Operator). An operator (matrix) H is Hermitian if H† = H. It is called anti-

Hermitian if H† = −H. The expectation value of a Hermitian operator is real.

A exercise 13. Check whether these matrices are Hermitian or anti-Hermitian:

a) ( 0  −i
     i   0 )

b) (1/√3) ( 1  0   0
            0  1   0
            0  0  −2 )

c) (  0  0  1   0
      0  0  0  −1
     −1  0  0   0
      0  1  0   0 )


Definition 35 (Positive Definite Operator). A Hermitian operator H on an inner product space is called positive definite, denoted by H ≥ 0, if ∀ |v⟩ ≠ |0⟩ : ⟨v|H|v⟩ ≥ 0 . It is called strictly positive if ⟨v|H|v⟩ > 0 . A strictly positive operator is invertible and its inverse is denoted by H−1 . Unitary operators preserve the inner product, i.e.

⟨ U|v⟩ , U|w⟩ ⟩ = ⟨v|U†U|w⟩ = ⟨v|1|w⟩ = ⟨v|w⟩   (43)

Definition 36 (Unitary Operator). An operator U is called a unitary operator if U † = U−1. Hence the

unitary operators satisfy U † U = 1 . If U is defined on a real vector space, it is called an orthogonal operator.

A exercise 14. Let the operator U : C² → C² be given by:

U (α1, α2)ᵀ = ( iα1/√2 − iα2/√2 , α1/√2 + α2/√2 )ᵀ

Find U† and check if it is unitary.

A exercise 15. Show that the product of two unitary operators is always unitary.

Definition 37 (Outer Product). The outer product of two kets |v⟩ , |w⟩ ∈ H is defined by:

|v⟩ ⊗ |w⟩ ≡ |v⟩⟨w| = (v1, v2, ..., vn)ᵀ ( w̄1 w̄2 · · · w̄n ) = ( v1w̄1  v1w̄2  · · ·  v1w̄n
                                                               v2w̄1  v2w̄2  · · ·  v2w̄n
                                                                ...    ...   ...    ...
                                                               vnw̄1  vnw̄2  · · ·  vnw̄n )   (44)

which can act on any ket |u⟩ ∈ H as:

( |v⟩⟨w| ) ( |u⟩ ) = |v⟩ ⟨w|u⟩ = ⟨w|u⟩ |v⟩   (45)

Definition 38 (Completeness Relation). Let {|ei⟩} be an orthonormal basis for V, such that any vector can be written in terms of them. Then:

∑_{i=1}^{n} |ei⟩⟨ei| = 1   (46)

which is known as the completeness relation. This relation is very useful in Quantum Mechanics, and operators like P = |ei⟩⟨ei| satisfying P² = P are called projectors.
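Definitions 37 and 38 are easy to check in code: build |v⟩⟨w| entrywise, then verify P² = P for a projector and that projectors onto an orthonormal basis sum to the identity. A minimal Python sketch on a two-dimensional example of my own choosing:

```python
import math

def outer(v, w):
    """|v><w| as a matrix with entries v_i * conj(w_j), as in eq. (44)."""
    return [[vi * wj.conjugate() for wj in w] for vi in v]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

s = 1 / math.sqrt(2)
e1, e2 = [s, s], [s, -s]            # an orthonormal basis of R^2

P = outer(e1, e1)                   # projector onto e1
P2 = matmul(P, P)
assert all(abs(P2[i][j] - P[i][j]) < 1e-12 for i in range(2) for j in range(2))

# completeness: |e1><e1| + |e2><e2| = identity (eq. 46)
Q = outer(e2, e2)
S = [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
assert all(abs(S[i][j] - (1.0 if i == j else 0.0)) < 1e-12 for i in range(2) for j in range(2))
print("P^2 = P and completeness hold")
```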


Remark 15. One useful application of the completeness relation is to represent a linear operator in the outer product notation. Let A : V → W be a linear operator, {|vi⟩} an orthonormal basis of V and {|wj⟩} an orthonormal basis of W. Then we can represent A by:

A = 1W A 1V = ∑_{ij} ⟨wj|A|vi⟩ |wj⟩⟨vi|   (47)

such that ⟨wj|A|vi⟩ is the matrix element of A in the ith column and jth row with respect to the input basis {|vi⟩} and output basis {|wj⟩} .

Now we turn ourselves to the important types of unitary transformations.

Definition 39 (Unitary Transformation). Suppose that we perform a unitary transformation on the Hilbert space by acting with a unitary operator U on all its vectors, i.e. |v′⟩ = U|v⟩ . Under this transformation we have:

i. (Basis Change): Let {|ei⟩} be a basis such that any vector can be written as |v⟩ = ∑_{i=1}^{n} vi |ei⟩, and let {|e′i⟩} be the new basis under this transformation, such that any vector in this basis is represented by |v′⟩ = ∑_{i=1}^{n} v′i |e′i⟩ . The operator U that enables us to perform this basis change is built from outer products:

U = ∑_k |e′k⟩⟨ek|   (48)

and its matrix elements will be:

U = ( ⟨e1|e′1⟩  ⟨e1|e′2⟩  · · ·  ⟨e1|e′n⟩
      ⟨e2|e′1⟩   · · ·    · · ·  ⟨e2|e′n⟩
        ...      ...      ...      ...
      ⟨en|e′1⟩   · · ·    · · ·  ⟨en|e′n⟩ )   (49)

Given a vector represented in the old basis |ei⟩, we can obtain its representation in the new basis |e′i⟩ as:

(v′1, v′2, ..., v′n)ᵀ = U† (v1, v2, ..., vn)ᵀ   (50)

ii. (Similarity Transformation): A linear operator X changes under this transformation to X′, a change we call a similarity transformation, such that:

X′ = U† X U   (51)


Example 15. In Quantum Mechanics, a single spin-1/2 particle (like an electron) has a two-dimensional Hilbert space with the orthonormal basis {|+⟩ , |−⟩} with respect to the spin operator Sz, i.e. spin along the z-axis. In this basis, the spin basis along the x-axis can be represented by |Sx;±⟩ = (1/√2)(|+⟩ ± |−⟩) . Now assume that {|+⟩ , |−⟩} is the old basis and |Sx;±⟩ is the new basis. Find the representation of the old in the new basis. Use the fact that |+⟩ = (1, 0)ᵀ and |−⟩ = (0, 1)ᵀ .

Solution 14. First we need to find the unitary matrix that takes us to the new basis. We have:

U = ( ⟨+|Sx;+⟩  ⟨+|Sx;−⟩
      ⟨−|Sx;+⟩  ⟨−|Sx;−⟩ ) = (1/√2) ( ⟨+|+⟩ + ⟨+|−⟩   ⟨+|+⟩ − ⟨+|−⟩
                                       ⟨−|+⟩ + ⟨−|−⟩   ⟨−|+⟩ − ⟨−|−⟩ )

Now since {|+⟩ , |−⟩} is orthonormal, i.e. ⟨+|+⟩ = ⟨−|−⟩ = 1 , ⟨+|−⟩ = ⟨−|+⟩ = 0, we have:

U = (1/√2) ( 1   1
             1  −1 ) ⇒ U† = (1/√2) ( 1   1
                                     1  −1 )

|+⟩ = 1√2

1 1

1 −1

1

0

=1√2

1

1

and |−⟩ = 1√2

1 1

1 −1

0

1

=1√2

1

−1
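Solution 14 can be verified mechanically: the matrix U there is real and symmetric, so U† is just its transpose, and one can confirm both unitarity and the new-basis components. A short Python sketch of my own:

```python
import math

s = 1 / math.sqrt(2)
U = [[s, s], [s, -s]]               # change-of-basis matrix from Solution 14

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    return [[M[j][i] for j in range(len(M))] for i in range(len(M[0]))]

# U is unitary: U† U = 1 (U is real here, so the dagger is a plain transpose)
I = matmul(transpose(U), U)
assert all(abs(I[i][j] - (1.0 if i == j else 0.0)) < 1e-12 for i in range(2) for j in range(2))

# |+> = (1, 0)^T in the new basis, via eq. (50)
plus_new = [sum(transpose(U)[i][j] * c for j, c in enumerate([1.0, 0.0])) for i in range(2)]
print([round(c, 4) for c in plus_new])  # [0.7071, 0.7071]
```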

A exercise 16. A Hilbert space has five basis functions defined on the surface of a unit sphere and expressed in spherical coordinates by:

χ1 = √(15/4π) sin θ cos θ cos φ , χ2 = √(15/4π) sin θ cos θ sin φ , χ3 = √(15/4π) sin²θ sin φ cos φ

χ4 = √(15/16π) sin²θ ( cos²φ − sin²φ ) , χ5 = √(5/16π) ( 3 cos²θ − 1 )

They are orthonormal with respect to the inner product ⟨f|g⟩ = ∫_0^π sin θ dθ ∫_0^{2π} dφ f̄(θ, φ) g(θ, φ) . This Hilbert space can have another orthonormal basis:

χ′1 = −√(15/8π) sin θ cos θ e^{iφ} , χ′2 = √(15/8π) sin θ cos θ e^{−iφ} , χ′3 = √(15/32π) sin²θ e^{2iφ}

χ′4 = √(15/32π) sin²θ e^{−2iφ} , χ′5 = χ5

Find the unitary matrix that describes the transformation from the unprimed to the primed basis.
