
Image Analysis, A Tutorial

Part 1. Basics

Lawrence Sirovich

June 1, 2001

1 Introduction

Generally, in these notes we will be interested in image data. (The underlying analysis and mathematical apparatus encompasses more general frameworks.) As an illustration, and one which we will return to for inspiration, consider an ensemble of pictures of human faces. Six exemplars are shown below (from a collection of 280 photos located at http://camelot.mssm.edu/imaging.html). Symbolically, we can represent the ensemble of such images by

f = f(t,x), (1)

where t is the index of the image (this might be the timestamp of when the photo was taken), x the pixel location, and f the gray level of image t at pixel location x. Figure 1 exhibits f(t,x) for t = 1, ..., 6. A word about notation: in what follows I will be cavalier about variables being either continuous or discrete; virtually everything will be independent of this distinction. This gives us the option of thinking of a continuous or a discrete model, whichever is more convenient.

Each image in Figure 1 contains roughly 128 × 128 = O(10^4) pixels, and therefore each face in the ensemble is a point in a vector space of O(10^4) dimensions; the point's coordinates in this vector space give the gray level at each pixel location. One says that the state space of the ensemble of faces has O(10^4) dimensions.
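
To make the bookkeeping concrete, here is a minimal NumPy sketch of this state-space view: an ensemble of T images, each 128 × 128, stored as T points in a roughly 10^4-dimensional vector space. The array names and the random gray levels are placeholders for illustration only, not the actual face database.

import numpy as np

# Hypothetical stand-in for the ensemble: T images of 128 x 128 gray levels.
# (Random numbers here; the actual database is distributed separately.)
T, H, W = 6, 128, 128
rng = np.random.default_rng(0)
images = rng.random((T, H, W))           # f(t, x) with x a two-dimensional pixel index

# Flattening each image makes it a single point in a state space of H*W ~ O(10^4) dimensions.
F = images.reshape(T, H * W)             # row t is the flattened image f(t, .)
print(F.shape)                           # (6, 16384)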

The issue that I want to take up is whether there is a best space in which to represent the faces. What is meant by best will emerge below, but for the moment we take it to mean that we seek ways of identifying or locating any member of the collection, or for that matter any face, by means more concise than individual pixel specification. This challenge is referred to as the Rogue's Gallery Problem.¹

¹This problem was introduced in 1987 (Sirovich and Kirby 1987), see also (Kirby and Sirovich 1990), and was originally intended as a hobby horse problem for more complicated problems in turbulence. Subsequent face recognition technology has its origin in these two papers.


Figure 1: Six chosen from a collection of 280 faces. Note that the sixth face is the mirror image of the second.

2 Mathematical Preliminaries

There are diverse ways of arriving at the solution of the problem, and in the spirit that "more is better" I will indicate some of these later. To start, we point out that the state space description treats each pixel independently, and thus is not sensitive to the fact that adjacent pixels might be correlated. We can imagine that a more efficient description will be attained if globally smooth functions are used, instead of specifying individual pixels, since such functions carry implied (spatial, or pixel-pixel) correlations. With this in mind, suppose for any integer N that {ϕn(x)}, n = 1, ..., N, is a collection of as yet unknown functions for the description of the faces. Since it is good practice, we take these to be orthonormal:

(ϕn, ϕm)_x = Σ_x ϕn(x) ϕm(x) = δ_nm.   (2)

(If x is continuous, Σ_x → ∫ dx.) To pose the problem we ask: for what collection of orthonormal functions {ϕn(x)} is there a best fit to f(t,x), in the sense that we can choose constants {αn(t)} so that

⟨ ‖ f(t,x) − Σ_{n=1}^{N} αn(t) ϕn(x) ‖²_x ⟩_t   (3)

is a minimum for any fixed integer N. (⟨·⟩_t represents the average over the index t.) Clearly, if we can find such sets {ϕn(x)} and {αn}, we have obtained a best set in the sense that for any N we commit the smallest error, εN, in writing f(t,x) as

f(t,x) = Σ_{n=1}^{N} αn(t) ϕn(x) + εN(t,x).   (4)

As a first step in solving the posed problem, observe that the projection of f onto the space spanned by {ϕn}, n = 1, ..., N, is given by

P_N f = Σ_{n=1}^{N} (ϕn, f) ϕn.   (5)

Next, it is straightforward to verify that the quantity appearing under the average in (3) satisfies the Pythagorean relation

‖ f − Σ_{n=1}^{N} αn ϕn ‖²_x = ‖ (f − P_N f) + (P_N f − Σ_{n=1}^{N} αn ϕn) ‖²_x
                             = ‖ f − P_N f ‖²_x + ‖ P_N f − Σ_{n=1}^{N} αn ϕn ‖²_x ,   (6)

and we leave this as an Exercise. The first leg of the triangle in the second form of (6) is independent of the choice of the {αn} and depends only on f. From this it follows that for any orthonormal set {ϕn} the best choice of the {αn} is one for which the second term of (6) vanishes, and hence such that

αn = (ϕn, f)_x .   (7)

In other words, the best fit to f is P_N f, the projection of f onto the space spanned by {ϕn}, a fact which is surely intuitive.
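
The following short sketch illustrates (5)-(7) numerically: given any orthonormal set {ϕn} (here manufactured by a QR factorization of random data, purely for illustration), the coefficients αn = (ϕn, f)_x give the projection P_N f, and the residual is orthogonal to the span of the ϕn. All names and data are hypothetical.

import numpy as np

rng = np.random.default_rng(1)
P, N = 16384, 10                         # pixels per image and number of basis functions

# An arbitrary orthonormal set {phi_n(x)} (columns), manufactured by QR for illustration.
phi, _ = np.linalg.qr(rng.standard_normal((P, N)))

f = rng.random(P)                        # one flattened image

alpha = phi.T @ f                        # best-fit coefficients alpha_n = (phi_n, f)_x, eq. (7)
PNf = phi @ alpha                        # projection P_N f = sum_n alpha_n phi_n, eq. (5)

# The residual is orthogonal to span{phi_n}, which is the content of the Pythagorean relation (6).
print(np.allclose(phi.T @ (f - PNf), 0.0))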

This still leaves open the determination of the set {ϕn}. To answer this we digress for a moment and construct the spatial covariance

K(x,y) = T 〈f(t,x)f(t,y)〉t = K(y,x), (8)

where the factor T, the number of snapshots, is introduced for convenience. Regarded as an operator, K is symmetric and non-negative, i.e., for arbitrary ψ(x)

(ψ,Kψ)x = (Kψ,ψ)x ≥ 0 (9)


where

Kψ := Σ_y K(x,y) ψ(y).   (10)

(Recall that under the convention Σ_y ↔ ∫ dy both the discrete and continuous cases are being treated.) The assertion (9) follows immediately from (8).

Assertion: The sought-after set {ϕn}, n = 1, ..., N, is given by the first N eigenfunctions of K(x,y),

Kϕn = λnϕn, (11)

under the ordering convention

λ1 ≥ λ2 ≥ λ3 ≥ .... (12)

By standard theorems of linear algebra (or analysis), since K is symmetric, (8), we are guaranteed that the {ϕn} exist and can be taken to be orthonormal.

We skip the proof of the above Assertion (an adequate demonstration is contained in the following exercise) and instead pursue a different line below.

Exercise. (a) Show that the maximum of

C(φ) = ⟨ (f, φ)²_x ⟩_t   (13)

for ‖φ‖²_x = 1 is given by (11) for n = 1. (b) Show that the maximum of C for (φ, φ1)_x = 0 and ‖φ‖²_x = 1 is given by (11) for n = 2. And so on.

As already mentioned, several routes to the development are possible, and we now take a seemingly unrelated approach. Suppose {ϕn(x)} is an unspecified complete set of orthonormal functions (we use the symbol ϕ to denote these for good reason) and {an(t)} a set of orthonormal functions in t-space,

(an, am)_t = Σ_{t=1}^{T} an(t) am(t) = δ_mn.   (14)

Next we require that an and ϕn be such that

f(t,x) = Σ_n μn an(t) ϕn(x),   (15)

where {μn} represents a still-to-be-determined set of constants. The reader should note that such an expansion, i.e. a sum of products of orthogonal functions, generally cannot be assumed to exist a priori, and might even be regarded as remarkable. If (14) and (2) are applied to (15) we obtain

an = (ϕn, f)x/µn (16)

and


ϕn = (an, f)t/µn. (17)

If (16) is back substituted into (17) we obtain

Kϕn = λnϕn(x), (18)

which is just (11) with λn = μn². Thus we have shown that the assertion (15) leads to the same eigenfunction framework as the optimization principle posed earlier.

Numerically, the eigenfunction problem posed by (11) appears formidable. Formally, the order of the matrix is equal to the number of pixels in a typical image. Thus for the modest case of 128 × 128 pixels, the order of K is O(10^4), and diagonalization becomes computationally challenging. Fortunately, other considerations simplify the problem. From (8) we have

K(x,y) = Σ_{t=1}^{T} f(t,x) f(t,y).   (19)

A kernel of an integral equation which is such a sum of products is said to be degenerate, and the problem can be reduced considerably. In particular, substitution of (19) into (11) shows that every eigenfunction of K is an admixture of the T snapshots f(t,x). It immediately follows that the eigenfunction problem reduces to the diagonalization of a T × T matrix. Thus if T = 1000, a nominal number of images, the eigenfunction problem is reduced by two orders of magnitude. This reduction is known as the snapshot method (Sirovich 1987). Instead of developing this directly we arrive at it next by an essentially equivalent procedure.

Instead of dealing with (18) we can back-substitute (17) into (16) in the above to obtain

C an = Σ_{s=1}^{T} C(t,s) an(s) = λn an(t),   (20)

where

C = C(t, s) = (f(t,x), f(s,x))x, (21)

which is proportional to the auto-covariance. Clearly the symmetric, non-negative matrix C(t,s) = C(s,t) generates an orthonormal eigenbasis and is of order no more than T.

Since both (11) and (20) lead to symmetric operators, the {an} and {ϕn} satisfying (14) and (2) exist, and hence the assertion that we can expand in the form (15) has been demonstrated. We note that only one of the two eigenfunction problems, (20) or (11), need be solved, for once one set of eigenfunctions is determined the complementary set follows from (16) or (17), whichever is appropriate.
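
A minimal sketch of this snapshot route, assuming the images have been flattened into a T × P data array F (rows are snapshots): we diagonalize the T × T matrix C of (21) and recover the spatial eigenfunctions via (17). The placeholder data and array names are mine, not the tutorial's database.

import numpy as np

rng = np.random.default_rng(2)
T, P = 280, 16384                        # snapshots and pixels, with T << P
F = rng.random((T, P))                   # row t is the flattened image f(t, x); placeholder data

# Method of snapshots: diagonalize the T x T matrix C(t,s) = (f(t,.), f(s,.))_x of (21)
# rather than the P x P covariance K(x,y).
C = F @ F.T
lam, A = np.linalg.eigh(C)               # eigh returns eigenvalues in ascending order
order = np.argsort(lam)[::-1]            # reorder so that lambda_1 >= lambda_2 >= ...
lam, A = lam[order], A[:, order]

# Recover the spatial eigenfunctions from (17): phi_n = (a_n, f)_t / mu_n with mu_n = sqrt(lambda_n).
mu = np.sqrt(np.clip(lam, 0.0, None))
keep = mu > 1e-10 * mu[0]
Phi = (F.T @ A[:, keep]) / mu[keep]      # columns are phi_n(x)

# The recovered phi_n are orthonormal in pixel space, as required by (2).
print(np.allclose(Phi.T @ Phi, np.eye(keep.sum()), atol=1e-8))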


3 Rogue's Gallery Problem

We illustrate the procedure by returning to the problem introduced earlier. An ensemble of faces can be found at http://camelot.mssm.edu/imaging.html. (A description of these, as well as instructions for accessing the database, is available on the website.) These images have been prepared so that the eye locations are in a fixed position, the lighting has been normalized, and so forth. As is clear from Figure 1, the 128 × 128 pixels furnish an adequately resolved image.

The number of faces in the ensemble is 280, but this number is in reality double an original set of 140 faces. To explain the doubling, we denote the original set of faces by

fn(x, y), n = 1, ..., 140.   (22)

Here y denotes the vertical variable and x the horizontal variable measured from the midline of the face (the perpendicular bisector of the line segment between the eyes). The original ensemble has been doubled by including fn(−x, y), the mirror image of each face. As noted in the caption, the sixth face of Figure 1 is the mirror image of the second.

Obviously each image generated in this way is an admissible face, and one might believe that in a large enough population of faces something like a mirror-image face would eventually be realized. Aside from doubling our original ensemble, this maneuver has the interesting consequence that it makes every eigenfunction either even in the midline

ϕn(x, y) = ϕn(−x, y) (23)

or odd in the midline

ϕn(−x, y) = −ϕn(x, y). (24)

And even though the ensemble size has been doubled to 280, we only have to diagonalize two matrices of order 140.

Exercise: For the above-described ensemble, in which for each face the mirror-image face also appears, prove that the eigenfunctions are either odd or even in the midline. (Hint:

K = Σ_n fn(x,y) fn(x′,y′)
  = ½ Σ_n ( fn(x,y) + fn(−x,y) ) ( fn(x′,y′) + fn(−x′,y′) )
  + ½ Σ_n ( fn(x,y) − fn(−x,y) ) ( fn(x′,y′) − fn(−x′,y′) )
  = K₊ + K₋ .   (25)


Any eigenfunction of K can be found from K±ϕ = λϕ.)
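
A small numerical check of this even/odd splitting, using made-up one-dimensional "faces": if each snapshot is accompanied by its mirror image, the eigenfunctions obtained by the snapshot method come out even or odd about the midline (generically, i.e. barring degenerate eigenvalues). Everything here is illustrative.

import numpy as np

rng = np.random.default_rng(3)
n_faces, width = 5, 8                    # toy ensemble of one-dimensional "faces"
faces = rng.random((n_faces, width))
F = np.vstack([faces, faces[:, ::-1]])   # double the ensemble with the mirror images

# Snapshot-method eigenfunctions of the doubled ensemble.
lam, A = np.linalg.eigh(F @ F.T)
order = np.argsort(lam)[::-1]
lam, A = lam[order], A[:, order]
Phi = F.T @ A[:, lam > 1e-8 * lam[0]]    # columns proportional to phi_n(x)

# Barring degenerate eigenvalues (generic data has none), each eigenfunction is
# purely even or purely odd under the reflection x -> -x.
for n in range(Phi.shape[1]):
    phi = Phi[:, n]
    parity = ("even" if np.allclose(phi, phi[::-1], atol=1e-8)
              else "odd" if np.allclose(phi, -phi[::-1], atol=1e-8)
              else "mixed")
    print(n + 1, parity)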

Figure 2: The first 6 eigenfaces arranged in descending order of the eigenvalues λn, for the ensemble specified in Figure 1.

The first six eigenfunctions are shown in Figure 2, and we observe that the first five are even; only at the sixth eigenfunction does oddness appear.

Of interest is the range of values of the eigenvalues. It is clear from the expansion (15) that λn measures the degree of importance of the corresponding ϕn (and an). Another way of seeing this comes from the observation that

λn = T ⟨ (ϕn, f)²_x ⟩_t,   (26)

so that λn is a clear measure of the energy in the corresponding eigenfunction.

It is natural to associate the eigenfunctions at the end of the range with noise, and in Figure 3 we show the last three eigenfunctions, all of which are odd in the midline. The fact that these eigenfunctions still carry remnants of a face (look, for example, at the teeth) shows the shortcomings of dealing with such a small population.


Figure 3: The last 3 eigenfaces arranged in descending order of the eigenvalues λn, for the ensemble specified in Figure 1.

In Figure 4 we plot λn versus n for the Rogue's Gallery problem. It is reasonable to assume that noisy eigenfunctions are not important. To judge how many such negligible eigenfunctions are present, we run a straight line through the log of the noisy eigenvalues and note that it departs from the plot at an index of n ≈ 100. Thus each face of the ensemble should be well represented by a sum truncated at N = 100,

Figure 4: The eigenvalues as a function of their index (log scale), for the ensemble specified in Figure 1.


f(t,x) ≈ Σ_{n=1}^{100} an(t) ϕn(x).   (27)

In Figure 5 we show a typical approximate representation of an in-ensemble face. This approximation may be given an analytical form. Let us consider the relative error in taking only N terms,

Figure 5: A typical in-ensemble face appears in the upper left. Its approximations by 100, 66 and 33 eigenfunctions appear in clockwise order.

εN = ‖ f − Σ_{n=1}^{N} an(t) ϕn(x) ‖²_x / ‖ f(t,x) ‖²_x = ‖ Σ_{n=N+1}^{T} an(t) ϕn(x) ‖²_x / ‖ f(t,x) ‖²_x ,   (28)

where T is the total number in the ensemble. It is plausible that each image has the same power,

‖ f(t,x) ‖²_x ≈ (1/T) Σ_t ‖ f(t,x) ‖²_x = (1/T) Σ_{n=1}^{T} λn,   (29)


and therefore

⟨εN⟩_t ≈ ( Σ_{n=N+1}^{T} λn ) / ( Σ_{n=1}^{T} λn ).   (30)

For the Rogue's Gallery problem, with T = 280 and N = 100, we get

√⟨εN⟩_t ≈ 0.03,   (31)

so that there is an average 3% error in signal.

The ensemble of 280 faces is a relatively modest-size population, and the resulting eigenfunctions {ϕn} cannot be regarded as robust. We can nevertheless try these eigenfunctions on an out-of-ensemble face. The result is shown in Figure 6. Thus, while the fit, unlike that for the in-ensemble faces, is equivocal at N = 100, it is reasonably good if the full set of eigenfunctions is used.
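
The error estimate (30) is easy to evaluate once the eigenvalue spectrum is in hand; the sketch below does this for a placeholder spectrum (the actual Rogue's Gallery eigenvalues are not reproduced here).

import numpy as np

def mean_relative_error(lam, N):
    """Average relative truncation error <eps_N>_t of eq. (30):
    the energy in the discarded modes divided by the total energy."""
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]   # enforce lambda_1 >= lambda_2 >= ...
    return lam[N:].sum() / lam.sum()

# Placeholder spectrum with a rough exponential decay; NOT the actual ensemble's eigenvalues.
lam = 1e9 * np.exp(-0.05 * np.arange(280))
print(np.sqrt(mean_relative_error(lam, N=100)))   # compare with the ~0.03 quoted in (31)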

Figure 6: A non-ensemble face appears at the upper left. Its approximation by all 280 eigenfunctions appears next, and by 100 at the right.

Roughly speaking, we see that the empirical eigenfunctions reduce the dimensionality of face space from O(10^4) to O(10^2). This reduction is accomplished at a cost. A great many objects can be imaged with an assembly of 100 × 100 pixels, but only faces can be constructed with the O(100) or so eigenfaces that are significant. This is illustrated in the next figure, in which we try to fit a monkey face with the set of eigenfaces.


Figure 7: A monkey face appears at the left. Its approximation by all 280 eigenfunctions appears next, and by 100 at the right.

Remarks

It has been seen that for a dataset f(t,x) of T images and P pixels there are at most T eigenvalues (all of which are non-negative). However, if we solve in pixel space there would appear to be a far greater number of eigenvalues, namely P.

The apparent discrepancy is easily resolved. In pixel space each image f(t,x), for t fixed, can be thought of as a vector, and thus the entire database is composed of T vectors. Clearly there must be P − T vectors orthogonal to our database, and hence K has a large nullspace, i.e. λn = 0, with a degeneracy of at least order P − T.

Another issue of interest concerns the time-like index t. As a little reflection reveals, if the sequence of images is shuffled, the eigenpictures ϕn(x) remain the same. This invariance under shuffling is of course not true for the an(t), each of which undergoes the corresponding shuffling of its time course under the transformation. For the Rogue's Gallery Problem the order in which we acquire pictures is of no consequence. However, in many applications the time course is important, since it may reflect the dynamics underlying the phenomenon. In such cases the dynamical evolution of the coefficients {an(t)} takes on an importance equal to that of the modes {ϕn} themselves.
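
As a quick check of this shuffling invariance, the sketch below computes the eigenpictures from a toy data matrix and again from a row-shuffled copy; the spatial modes agree up to an irrelevant sign per mode. Data and names are placeholders.

import numpy as np

rng = np.random.default_rng(6)
T, P = 12, 50                            # small toy sizes; placeholders
F = rng.random((T, P))                   # rows are snapshots f(t, x)

def eigenpictures(F):
    """Spatial modes phi_n(x) via the snapshot matrix C(t,s) of (20)-(21)."""
    lam, A = np.linalg.eigh(F @ F.T)
    order = np.argsort(lam)[::-1]
    lam, A = lam[order], A[:, order]
    keep = lam > 1e-10 * lam[0]
    return (F.T @ A[:, keep]) / np.sqrt(lam[keep])

Phi = eigenpictures(F)
Phi_shuffled = eigenpictures(F[rng.permutation(T)])   # shuffle the time-like index t

# The eigenpictures are unchanged by the shuffle, up to a sign per mode.
print(np.allclose(np.abs(Phi), np.abs(Phi_shuffled), atol=1e-8))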

4 Singular Value Decomposition

We can go back to the decomposition

f(t,x) = Σ_{n=1}^{T} an(t) μn ϕn(x)   (32)


and cast it into the framework of matrix theory. For each t we construct a vector of pixels by concatenating the rows of the image. The resulting matrix, denoted by M, is

M = [ f(1,1)  f(1,2)  ⋯  f(1,P)
      f(2,1)  f(2,2)  ⋯  f(2,P)
        ⋮                  ⋮
      f(T,1)  f(T,2)  ⋯  f(T,P) ] ,   (33)

and hence each row of the matrix is an image of the ensemble. We perform a like operation on the an(t),

An = [ an(1)
         ⋮
       an(T) ] ,   (34)

and also on the ϕn(x) and write

Vn = [ ϕn(1)
         ⋮
       ϕn(P) ] .   (35)

In such terms (32) can be re-expressed as

M = Σ_{n=1}^{T} An μn Vn† ,   (36)

or if we assemble the matrices

A = [ A1  A2  ⋯  AT ] ,   U = diag( μ1, μ2, ..., μT ) ,   (37)

and

V = [ V1  V2  ⋯  VT ] ,   (38)

then (36) and hence (32) can be written as

M = AUV† (39)

in which the columns of A are orthonormal, as are the columns of V. This is known as the singular value decomposition (SVD) of the matrix M.

The SVD is a general result for arbitrary matrix arrays. The assertion that an arbitrary matrix M has the decomposition (36), with the sets {An} and {Vn} each orthonormal, leads to their construction. This, in essence, is what was proven in Section 2. The SVD is also a standard numerical tool and may be found, for example, as a function in Matlab.
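
For concreteness, here is a sketch of the same construction using the SVD routine in NumPy (an equivalent of Matlab's svd); the data matrix is random and merely stands in for (33). The squared singular values coincide with the eigenvalues of C = M M†, and the columns of A and V are orthonormal, as asserted above.

import numpy as np

rng = np.random.default_rng(4)
T, P = 20, 400
M = rng.random((T, P))                   # rows are flattened images, as in (33); placeholder data

# Thin SVD: M = A U V^dagger with orthonormal columns in A and V, cf. (39).
A, mu, Vh = np.linalg.svd(M, full_matrices=False)   # A: T x T, mu: singular values, Vh: T x P

# mu_n^2 = lambda_n: the squared singular values are the eigenvalues of C = M M^dagger.
lam = np.sort(np.linalg.eigvalsh(M @ M.T))[::-1]
print(np.allclose(mu**2, lam))

# Orthonormality of the columns of A and of V (the rows of Vh).
print(np.allclose(A.T @ A, np.eye(T)), np.allclose(Vh @ Vh.T, np.eye(T)))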

It is interesting to note that any image may itself be regarded as a matrix. Thus any face can be put into the form

F = [ f(x1,y1)  f(x2,y1)  ⋯  f(xN,y1)
      f(x1,y2)  f(x2,y2)  ⋯  f(xN,y2)
         ⋮                      ⋮
      f(x1,yM)  f(x2,yM)  ⋯  f(xN,yM) ]   (40)

in terms of the gray levels at each of the M × N pixel locations. If the SVD is applied to F we get, say (supposing M ≤ N),

F = auv† (41)

where a is M × M, u is M × M, and v is N × M.

Figure 8: Spectrum of a face regarded as a matrix (normalized eigenvalue spectrum; eigenvalue versus index on a logarithmic scale).

To illustrate the use of the SVD on an image regarded as a matrix, we consider another in-ensemble face. Viewed as a matrix, this has the SVD eigenvalue spectrum shown in Figure 8. The precipitous fall-off of the eigenvalues suggests that we can compress an image by taking fewer than M elements from a and v. This is illustrated in Figure 9.
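
A sketch of this compression, with a synthetic array standing in for the face: truncating the SVD of the image at k principal components gives a rank-k approximation, and the relative error is governed by the discarded singular values. (For random data the fall-off is slow; for an actual face it is precipitous, as Figure 8 shows.)

import numpy as np

rng = np.random.default_rng(5)
M_rows, N_cols = 128, 128
image = rng.random((M_rows, N_cols))     # synthetic stand-in for a face matrix F, eq. (40)

# SVD of the image itself: F = a u v^dagger, as in (41).
a, u, vh = np.linalg.svd(image, full_matrices=False)

def rank_k(k):
    """Approximation of the image by its first k principal components."""
    return (a[:, :k] * u[:k]) @ vh[:k, :]

for k in (1, 2, 3, 4, 5, 10, 20):
    rel_err = np.linalg.norm(image - rank_k(k)) / np.linalg.norm(image)
    print(k, round(rel_err, 3))
# For random pixels the error decays slowly; a real face, whose spectrum falls off
# precipitously (Figure 8), is already well approximated by a handful of components.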


Figure 9: A typical face, lower right, is approximated by 1, 2, 3, 4, 5, 10 and 20 principal components, as indicated in the figure.

Acknowledgement. I am grateful to Takeshi Yokoo and Andrew Sornborger for help and suggestions in the preparation of these notes.

References

Kirby, M. and L. Sirovich (1990). Application of the Karhunen-Loève procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(1), 103–108.

Sirovich, L. (1987). Turbulence and the dynamics of coherent structures, Parts I, II, and III. Quarterly of Applied Mathematics XLV(3), 561–590.

Sirovich, L. and M. Kirby (1987). Low-dimensional procedure for the characterization of human faces. J. Optical Society of America 4, 519–524.
