Vol. 6, No. 6/June 1989/J. Opt. Soc. Am. A 827

Group-theoretical model of feature extraction

Reiner Lenz

Linköping University, S-58183 Linköping, Sweden

Received August 11, 1988; accepted January 30, 1989

A group-theoretically motivated investigation of feature extraction is described. A feature-extraction unit is defined as a complex-valued function on a signal space. It is assumed that the signal space possesses a group-theoretically defined regularity that we introduce. First the concept of a symmetrical signal space is derived. Feature mappings then are introduced on signal spaces, and some properties of feature mappings on symmetrical signal spaces are investigated. Next the investigation is restricted to linear features, and an overview of all possible linear features is given. Also it is shown how a set of linear features can be used to construct a nonlinear feature that has the same value for all patterns in a class of similar patterns. These results are used to construct filter functions that can be used to detect patterns in two- and three-dimensional images independent of the orientation of the pattern in the image. Finally it is sketched briefly how the theory developed here can be applied to solve other, symmetrical problems in image processing.

1. INTRODUCTION

Feature extraction is one of the most important tasks in pattern recognition. In this paper a general model of a feature-extraction unit is investigated. Basically we shall assume that such a unit transforms an input signal s into a response signal f(s). We call f(s) the feature extracted from the input signal s. For the purpose of this paper we can consider the input signal s a part of an image detected by a camera or the retina. The feature-extraction unit is then a cell or a pattern-recognition algorithm that analyzes the incoming gray-value pattern. The computed result f(s) is a description of the incoming pattern s. In the case in which we consider the feature-extraction unit to be a cell, we can say that the incoming signal s describes the output of the cells in the receptive field of the feature-extraction cell. The output f(s) is the reaction of the feature-extraction cell to the incoming signal. The purpose of this paper, however, is not a description of natural cells but the development of a mathematical model that is based on mathematically motivated criteria.

The feature-extraction unit obviously is specified completely if we know how the response f depends on the input s. Designing such a unit is equivalent to constructing the mapping f. Many approaches for designing such units use an ad hoc strategy depending on the problem at hand. Others study natural systems, such as the human vision system, to find out how well working systems have solved the problem. Many of these approaches have in common that they treat the set of input signals as just an unstructured set. In reality, however, we find that the space of input signals is often highly structured in the sense that the space can be partitioned into different subspaces, in which each subspace consists of essentially the same signals. In this paper we assume that the signal space can be partitioned into subspaces in such a way that all the signals in a subspace are connected to one another with the help of a group-theoretically defined transformation. We shall also say that the subspaces are invariant under the group operation or that the signal space is symmetrical.

As an example, consider a feature-extraction unit analyzing the gray-value distribution in the neighborhood of a point in an image. Assume further that the goal of this unit is to find out whether the incoming gray-value pattern is edgelike. We shall refer to this problem as the two-dimensional (2-D) edge-detection problem. In what follows, we shall denote by sv, sh, and sn the gray-value distributions of a vertical edge, a horizontal edge, and a noise pattern, respectively. In the case in which we assume that the signal space is unstructured, we would treat sv, sh, and sn as just three input patterns. We would completely ignore the additional information that sv and sh have more in common than sv and sn. In contrast to this unstructured approach, we shall in what follows describe a method that is based on the observation that an effective feature-extraction unit should use the additional information. In this theory we shall divide the space of input signals into subspaces of "essentially the same signals." (This expression is defined in Section 2.) We call such a subspace an invariant subspace. In the edge-detection example one such subspace would consist of all rotated versions of a fixed gray-value distribution. Now sv and sh lie in the same subspace, whereas sn lies in another subspace. We can thus say that the similarity between sv and sh is greater than the similarity between sv and sn.

The purpose of this paper is to show how to construct a feature-extraction unit that can use the regularity of the space of input signals. We show that such a unit computes two essentially equal feature values f(s1) and f(s2) for two essentially equal signals s1 and s2. We demonstrate also that the unit can recognize all signals in a given invariant subspace once it has seen one member of it. In the edge-detection example mentioned above, we could say that such a unit can recognize a whole class of patterns, the class of edgelike gray-value distributions. Furthermore, the system can recognize all edges once it has encountered one special example of an edge.

In Section 2 we develop the concept of a symmetrical signal space. We then review some important results from the theory of compact groups. These results then are used to get an overview of all linear feature-extraction units on symmetric signal spaces, and we shall derive some important properties of these feature-extraction units. The general theory is illustrated here with two examples in which the signals are gray-value distributions and the regularities or symmetries are defined with the help of the group of 2-D or three-dimensional (3-D) rotations.

0740-3232/89/060827-08$02.00 © 1989 Optical Society of America

2. DESCRIPTION OF THE REGULARITIES IN THE SIGNAL SPACE

From the edge-detection example in the Introduction we can see that we would like to introduce some type of structure in the space of all possible input signals. In this section we shall introduce a method that allows us to divide the space of all possible input patterns into subspaces of essentially the same patterns.

When we develop our model of a regular or a symmetrical signal space it might be useful to have a concrete example in mind against which we can compare the general definitions and constructs. We shall use mainly the familiar 2-D edge-detection problem as such an example. The receptive field of the feature-extraction unit consists now of a number of detectors located on a disk. If x denotes a point on this disk, then we denote by s(x) the intensity measured by the detector located at x. The input to the unit, the signal s, is in this case the light intensity measured by the sensors on the disk. Since the detectors are located on a disk, we can assume that s is a function defined on some disk. For notational simplicity we can assume that this disk has a radius of 1, and we also assume that the function s is square integrable. We denote the unit disk by D, and we assume that our signal space is the space of all square-integrable functions on the unit disk. This space is denoted by L2(D). Two typical step edges from this space are shown in Fig. 1.

From functional analysis we know that this signal space forms a Hilbert space; i.e., there is a scalar product (·,·) with which we can measure the length of a signal and the cosine of the angle between two signals (for an exact definition of a Hilbert space the reader is referred to Appendix A and the literature; see, for example, Refs. 1 and 2). These two values are given by ||s||^2 = (s, s) and ||s1|| ||s2|| cos(s1, s2) = (s1, s2). Furthermore, we can compute the distance between two signals as ||s1 - s2||. This is about all that we can say about the signals as elements of the Hilbert space. In our case these measurements can be quite misleading, as Fig. 1 shows. As elements of the Hilbert space these two signals are orthogonal, i.e., as uncorrelated or different as possible. At a higher level of understanding they are, however, nearly identical: they are only rotated against each other. We thus must add a higher level of similarity to our basic, low-level Hilbert-space model.

Fig. 1. Two orthogonal, equivalent gray-value distributions.
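The situation in Fig. 1 is easy to reproduce numerically. The following sketch (our illustration, not code from the paper) discretizes the simpler one-dimensional analogue, signals on the unit circle: two step edges rotated 90 deg against each other have scalar product zero, although one is just a transformed version of the other.

```python
import numpy as np

# Sample the circle at N points; a "step edge" is +1 on one half, -1 on the other.
N = 360
phi = np.arange(N)
s1 = np.where(phi < N // 2, 1.0, -1.0)   # an edge in one orientation
s2 = np.roll(s1, N // 4)                 # the same edge rotated by 90 degrees

# Discretized Hilbert-space scalar product: (s1, s2) = sum s1[i]*s2[i] / N
inner = np.dot(s1, s2) / N
print(inner)  # 0.0: as Hilbert-space elements the signals are orthogonal,
              # yet s2 is merely a rotated copy of s1
```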

In the edge-detection problem we introduce such a similarity by defining two gray-value distributions to be essentially equal if they are rotated versions of each other. We thus have a higher-level symmetry or a regularity defined by the set of all rotations. In this section we show how to build a model that describes such a regular signal space. We then investigate what can be said about feature-extraction units that can use these regularities.

We begin by introducing some basic concepts from group theory that will be of fundamental importance below.

Definition 1
1. A transformation of a space X is a continuous, linear, one-to-one mapping of X onto itself. The set of all transformations of X is a group under the usual composition of mappings, and it is denoted by GL(X).
2. A regular or a symmetrical signal space is a triple (H, G, T) of a Hilbert space H, a compact group G, and a mapping T: G → GL(H); g ↦ T(g) from the group G into the group of transformations of H. T preserves also the group structure; i.e., we have T(g1g2) = T(g1)T(g2) for all g1, g2 ∈ G. The Hilbert space H is called the signal space, the group G is called the symmetry group of the symmetrical signal space, and the mapping T is called its representation.
3. If g ∈ G is a fixed group element and s ∈ H is a fixed signal, then T(g)s is the transformed version of s. Sometimes we shall also write s^g instead of T(g)s.

In our edge-detection example these definitions translate as follows: The Hilbert space H is the signal space L2(D) of square-integrable intensity distributions on the unit disk. If s ∈ H is a given intensity distribution and g is a 2-D rotation, then we can map this signal s(x, y) onto the rotated intensity distribution s[g^-1(x, y)]. Now we select a fixed rotation g. The mapping T(g) that maps a function s ∈ H to its rotated version is obviously a linear mapping from H into H. The symmetry group G is in this example the group of 2-D rotations, denoted by SO(2). The mapping T maps the rotation g to the rotation operator T(g), where [T(g)s](x, y) = s[g^-1(x, y)]. Note that T(g1g2)s = s[(g1g2)^-1] = s(g2^-1 g1^-1) = T(g1)[s(g2^-1)] = [T(g1)T(g2)]s, which explains the use of the inverse of g in the definition of T.
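For a discretized version of this example the rotation operator becomes a cyclic shift of the sampled intensities, and the homomorphism property T(g1g2) = T(g1)T(g2) can be checked directly. This is a minimal sketch under that discretization assumption:

```python
import numpy as np

N = 12  # samples on the circle; the group elements are shifts k mod N

def T(k):
    """Rotation operator: [T(k)s](x) = s(g^-1 x), here a cyclic shift by k."""
    return lambda s: np.roll(s, k)

s = np.arange(N, dtype=float)  # an arbitrary sampled signal
g1, g2 = 3, 7

# Composing the two operators equals the operator of the composed rotation:
lhs = T((g1 + g2) % N)(s)   # T(g1 g2) s
rhs = T(g1)(T(g2)(s))       # T(g1) [T(g2) s]
print(np.array_equal(lhs, rhs))  # True
```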

For the definition of a compact group and groups in general the reader is referred to Appendixes A and B and the literature (see, for example, Refs. 3 and 4). Our definition of a symmetrical signal space is slightly more restrictive than is necessary; for most of our results it would have been sufficient to require that the group G be locally compact instead of compact. The resulting theory is, however, technically more complicated, and therefore we shall restrict ourselves in what follows to compact groups (for a treatment of the simplest noncompact, noncommutative case, the reader may consult Refs. 5 and 6). Among the possible symmetrical signal spaces there are, of course, also uninteresting ones. As an example, take L2(D) as the Hilbert space and G = SO(2) as described above. As representation, select the mapping g ↦ id, where id is the identity in GL(H). For every signal s ∈ H and all g ∈ G we have thus T(g)s = s; the only transformed version of the signal is the signal itself. This example demonstrates that the mapping T is responsible for the power of the constraints that the symmetry group has on the signal space: In the case in which all group elements are mapped to one element, the identity, we have no constraint at all, and the pattern classes consist of one element only. If the mapping T is such that for a fixed s the set {T(g)s | g ∈ G} is a large set of signals, then the constraints imposed by the group are powerful.

We now define exactly what we mean when we speak of essentially the same signals. If (H, G, T) is a symmetrical signal space, then we define two signals s1, s2 ∈ H as GT equivalent if there is a g ∈ G such that T(g)s1 = s2; in this case we write s2 = s1^g or s1 GT s2. If p is an element in H, then we denote the set of all signals s ∈ H that are equivalent to p by pG = {s ∈ H | s = p^g; g ∈ G}. We call pG a pattern class and call p the prototype pattern of pG. The set of all pattern classes in H is denoted here by H/G. We shall also denote a complete set of prototype patterns by H/G, keeping in mind that this is an inexact notation.

From the properties of T it is clear that GT equivalence is an equivalence relation and that H can thus be partitioned into pattern classes. This means that H is the disjoint union of the different pattern classes; i.e., H = ∪_{H/G} pG, and if s1G ∩ s2G ≠ ∅, then s1G = s2G. These definitions describe the high-level similarity referred to above: two signals represent essentially the same pattern if they lie in the same pattern class, independent of their Hilbert-space relation.
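For a finite version of the same construction, the pattern classes p^G can be enumerated explicitly. The sketch below (our illustration) uses the group of cyclic shifts acting on sampled signals and checks that pattern classes either coincide or are disjoint:

```python
import numpy as np

N = 8

def pattern_class(p):
    """The pattern class pG: the set of all transformed versions of p."""
    return {tuple(np.roll(p, k)) for k in range(N)}

p = np.array([1, 1, 0, 0, 0, 0, 0, 0])   # a prototype pattern
s = np.roll(p, 3)                        # GT equivalent to p
n = np.array([1, 0, 1, 0, 1, 0, 1, 0])   # not equivalent to p

print(tuple(s) in pattern_class(p))          # True: s lies in the class of p
print(tuple(n) in pattern_class(p))          # False: n lies in another class
print(pattern_class(s) == pattern_class(p))  # True: equal classes, or disjoint
```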

In Section 3 we introduce the definitions and results from the theory of group representations that are used in the rest of the paper. In Section 4 we develop the general theory of feature extraction, and in Section 5 we apply this theory to the edge-detection problem.

3. SOME RESULTS FROM THE THEORY OF COMPACT GROUPS

We now collect some important results from the theory of compact groups. The interested reader may consult the literature (for example, see Refs. 3, 4, and 7) for a detailed treatment.

Let H be a finite-dimensional Hilbert space, and let T: G → GL(H) be a representation. If we select a fixed basis in H, then we can describe every linear mapping in GL(H) by a matrix. The representation T thus maps a group element g onto a complex matrix T(g) such that those matrices satisfy the equation T(g1g2) = T(g1)T(g2) for all group elements g1 and g2. Now, assume further that b = {e1, ..., eN} and b' = {e1', ..., eN'} are two sets of basis elements connected by means of the matrix multiplication b' = Ab. In the coordinate systems defined by these two basis sets the map T(g) will then be described by the two matrices T(g) and T'(g), respectively. It is easy to show that the two matrices T(g) and T'(g) are linked by the equation T(g) = AT'(g)A^-1. A mapping T: G → GL(n, C) as defined above is called a matrix representation of G. For notational simplicity we sometimes ignore the difference between the representation T and its corresponding matrix representation. In this case we also denote the matrix representation by T.

The next definition introduces some important properties of representations.

Definition 2
1. A representation is called a finite representation if H is finite dimensional.
2. Let T and T' be two finite representations in Hilbert spaces of the same dimension. These two representations define two matrix representations T and T'. The two representations are called equivalent if there is a matrix A such that T'(g) = AT(g)A^-1 for all g ∈ G.
3. A finite representation T of G is called reducible if there is a basis {e1, ..., em, em+1, ..., eN} of H such that all the maps T(g) leave the two spaces generated by {e1, ..., em} and {em+1, ..., eN} invariant. A representation is irreducible if it is not reducible.
4. If T is reducible, then we can find a basis of H such that the matrices T(g) with respect to this basis have the form

T(g) = [T1(g)    0  ]
       [  0    T2(g)]

with m × m matrices T1(g) and (N - m) × (N - m) matrices T2(g).
5. We say also that the matrix representation is reducible if there is a matrix A such that

A^-1 T(g) A = [T1(g)    0  ]
              [  0    T2(g)]

6. If T is reducible and T1 and T2 are defined as above, then T is the direct sum of T1 and T2.
7. If the representation preserves the scalar product in the Hilbert space, then the representation is unitary. In this case we have, for all g ∈ G and all s1, s2 ∈ H, (s1, s2) = (T(g)s1, T(g)s2).

In most cases of interest to us we can decompose a given representation T into a sum of irreducible representations Tk. Consequently we can decompose the underlying Hilbert space H into a sum of smaller subspaces H1, H2, ... such that all Hk are invariant under all mappings T(g). These subspaces are also minimal in the sense that they do not contain smaller invariant subspaces. Tk: G → GL(Hk) is then an irreducible representation of the group G on the Hilbert space Hk. We shall sometimes refer to Hk as the subspace belonging to Tk.

It is easy to show that equivalence of representations is an equivalence relation on the space of representations, and it is therefore natural to select, from each class of equivalent representations, one member with convenient properties. Some ways of doing this are suggested in the next few theorems, which collect some important results from the theory of compact groups.

Theorem 1
Any finite-dimensional, unitary representation is completely reducible; i.e., either it is irreducible or it is the direct sum of irreducible representations.

Theorem 2
Let T be a finite-dimensional representation of a compact group G; then T is equivalent to a unitary representation; i.e., there is a fixed matrix A such that A^-1 T(g) A is a unitary matrix.

Theorem 3
All irreducible, unitary representations of a compact group are finite dimensional.

Theorem 4
Every irreducible, finite-dimensional representation of a commutative group G is one dimensional (1-D).
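Theorem 4 can be observed numerically for SO(2): the real 2 × 2 rotation matrices form a representation that splits over the complex numbers into two 1-D representations e^(i theta) and e^(-i theta). The following sketch (our illustration; the change-of-basis matrix A is our own choice) diagonalizes the representation with one unitary matrix, simultaneously for every group element:

```python
import numpy as np

def R(theta):
    """The defining real 2-D representation of the rotation group SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Unitary change of basis to the complex eigenvectors (1, -i) and (1, i)
A = np.array([[1, 1], [-1j, 1j]]) / np.sqrt(2)

theta = 0.7
D = A.conj().T @ R(theta) @ A  # A is unitary, so A^-1 = conjugate transpose
# D = diag(e^{i theta}, e^{-i theta}): two 1-D irreducible representations
print(np.round(D, 10))
```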


The simplest illustration of these results is provided by the symmetrical signal space (H, G, T) in which H is the space of all square-integrable functions on the unit circle, G = SO(2), and T is the rotation operator [T(g)s](φ) = s(φ - ψ) (ψ is the rotation angle of the rotation g). An element s ∈ H can be decomposed into a Fourier series s(φ) = Σn cn exp(2πinφ); i.e., H is the direct sum of the one-dimensional subspaces Hk consisting of the complex multiples of exp(2πikφ). The representation T can be reduced to the sum of 1-D, irreducible representations Tk defined by Tk(ψ)[ck exp(2πikφ)] = ck exp[2πik(φ - ψ)]. The (1-D) matrix representation is given by the matrix entry tk(g) = exp(-2πikψ).
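In discretized form this reduction is exactly the discrete Fourier transform: the k-th Fourier coefficient spans an invariant 1-D subspace, and rotating the signal only multiplies that coefficient by the phase tk(g). A numerical sketch (our illustration):

```python
import numpy as np

N = 64
phi = 2 * np.pi * np.arange(N) / N
rng = np.random.default_rng(0)
s = np.cos(3 * phi) + 0.5 * np.sin(5 * phi) + rng.normal(0, 0.1, N)

m = 10                 # rotation by psi = m/N turns, i.e. a cyclic shift by m
s_rot = np.roll(s, m)

c = np.fft.fft(s)      # components of s in the invariant subspaces H_k
c_rot = np.fft.fft(s_rot)

# Rotation multiplies each coefficient c_k by the phase t_k(g) = exp(-2 pi i k m / N)
k = np.arange(N)
phase = np.exp(-2j * np.pi * k * m / N)
print(np.allclose(c_rot, phase * c))  # True: each H_k is invariant
```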

The next theorem, known as the Peter-Weyl theorem, is one of the most important results in the theory of compact groups. It describes how a function on the group can be decomposed into a series of simple basis functions. It is also possible to say that the theorem describes a kind of Fourier expansion of functions on compact groups.

Theorem 5
Let Tk (k = 1, 2, ...) be a complete set of inequivalent, irreducible, unitary representations of a compact group G, let dk be the dimension of Tk, and let tnm^k(g) be the matrix elements of the transformation Tk; then the functions

(dk)^(1/2) tnm^k(g),   (1)

with k = 1, 2, ... and 1 ≤ n, m ≤ dk, form a complete orthonormal system of square-integrable functions on G with respect to the inner product based on the invariant integration on G.

We can reformulate the result as follows: each function s(g) can be written as a series:

s(g) = Σk Σ(n,m=1 to dk) anm^k tnm^k(g),   (2)

with

anm^k = dk ∫G s(g) [tnm^k(g)]* dg.   (3)

4. APPLICATION TO FEATURE EXTRACTION

We can now apply the results presented so far to investigate feature-extraction units. For this purpose we assume that the signal space of our unit is a symmetrical signal space (H, G, T) as defined in definition 1. In the discussion of this definition we noted that GT equivalence is an equivalence relation, and we partitioned H into the equivalence classes H/G. As a result we can now decompose an arbitrary signal s ∈ H into a component depending on G and one depending on the prototype; we write s = s(g, p). We now define a feature.

Definition 3
1. A feature is a complex-valued function f on H. If we write f(s) = f[s(g, p)] = f(g, p), then we also assume that f, as a function of g, is square integrable on G.
2. A linear feature is a feature that is also a linear map; i.e., it satisfies the condition f(c1s1 + c2s2) = c1f(s1) + c2f(s2) for all signals s1, s2 ∈ H and all complex constants c1, c2.

With the help of the Peter-Weyl theorem we can now give a simple description of a feature: if we keep the prototype signal p ∈ H/G fixed, then f is a function of g, and we can use the Peter-Weyl theorem to get a series expansion:

f(g, p) = Σk Σ(n,m) anm^k(p) tnm^k(g).   (4)

This expansion describes how f depends on p and g; we could also say that we have separated the dependencies of f on p and g.

Using this expression for the feature f and the orthogonality of the functions tnm^k(g), we can apply Haar integration to compute the mean value of f over one pattern class or the correlation between the signals in two pattern classes. For the mean we get ∫G f(g, p)dg = a0(p), where a0 is the coefficient belonging to the trivial, 1-D representation defined by t(g) = 1 for all g ∈ G. For the correlation we get

∫G f(g, p1) [f(g, p2)]* dg = ∫G [Σk Σ(n,m) anm^k(p1) tnm^k(g)] [Σl Σ(i,j) aij^l(p2) tij^l(g)]* dg

= Σk Σ(n,m) Σl Σ(i,j) anm^k(p1) [aij^l(p2)]* ∫G tnm^k(g) [tij^l(g)]* dg

= Σk Σ(n,m) anm^k(p1) [anm^k(p2)]* dk^-1.

For our purposes these features are too general, and in the rest of this section we therefore consider only linear features and simple functions of linear features.

Assume now that (H, G, T) is a symmetrical signal space and that the representation T is the direct sum of a complete set of inequivalent, unitary, irreducible representations Tk of G. We denote the Hilbert space belonging to the irreducible, unitary representation Tk by Hk. The dimension of this space is denoted by dk, and the basis elements of Hk are signals e1^k, ..., edk^k. The matrix entries of Tk with respect to this basis are the functions tnm^k(g). For a fixed index k we denote the matrix of size dk × dk and elements tnm^k(g) by Tk(g). For simplicity we shall also assume that H is the direct sum of all the subspaces Hk. A basis of H is thus given by the elements en^k.

Using the fact that the elements en^k form a basis of H, we find for a prototype signal p the following expansion:

p = Σk Σ(n=1 to dk) βn^k en^k.   (5)

Using the scalar product in the Hilbert space H, we can compute the coefficients βn^k as usual by βn^k = (p, en^k). The subspaces Hk are invariant under the group operations, and the transformation of the basis elements is described by the matrices Tk(g) = [tnm^k(g)]. For an arbitrary signal s = s(g, p) = p^g, we therefore find the expansion

s = p^g = Σk Σ(n,m) βm^k tnm^k(g) en^k.   (6)

From this expansion and the linearity of the feature function f, we get the following equation for the value of the linear feature f:

f(s) = f(p^g) = Σk Σ(n,m) βm^k tnm^k(g) f(en^k).   (7)

This equation shows how f depends on the prototype signal p, the group element g, and the values of f at the basis elements en^k. We see also that a linear feature function f is defined completely by its values at the basis elements en^k. Among the linear features there is a set of especially simple ones, which are defined below.

Definition 4

1. A linear feature that has a value of 1 at one fixed basis element en^k and is 0 for all other basis elements is called a basic linear feature. It is denoted here by fn^k.

2. A basic feature vector f^k is a map f^k: H → C^dk of the form s ↦ f^k(s) = [f1^k(s), ..., fdk^k(s)], where the fn^k are the basic features belonging to the subspace Hk of the representation Tk.

Equation (7) shows that all the linear features are linear combinations of basic linear features, and in the rest of this paper we therefore restrict ourselves to the study of basic linear features.

For the prototype signal p we get, from Eqs. (5) and (7), f(p) = Σk Σn βn^k f(en^k), and, if f is a basic linear feature, say, fν^μ, then we have

fν^μ(p) = βν^μ.   (8)

If s is a signal that is GT equivalent to p: s = p^g, then we find, for the feature value [see Eq. (7)],

fν^μ(s) = fν^μ(p^g) = Σk Σ(n,m) βm^k tnm^k(g) fν^μ(en^k) = Σm βm^μ tνm^μ(g) = Σm fm^μ(p) tνm^μ(g).   (9)

Using the basic feature vector notation, we then find the following theorem.

Theorem 6
First,

f^μ(s) = f^μ(p^g) = T^μ(g) f^μ(p).

Second, T^μ is a unitary representation, and therefore we have, for all g ∈ G,

||f^μ(p)|| = ||f^μ(p^g)||.   (10)

The magnitude of the feature vector is thus invariant under the transformation p → p^g.

The previous results show that we can treat the basic feature vectors f^k1 and f^k2 separately if k1 ≠ k2 but that we must consider simultaneously all basic linear features belonging to the same representation. Equation (10) is, of course, valid only for basic feature vectors; if the different basis elements have different feature values, then we do not in general have the same magnitude for all feature vectors of GT-equivalent signals.

The results in theorem 6 suggest the following procedure for solving the pattern-recognition problem of detecting a certain class of signals.

Assume that we know that our signal space is symmetric, and assume further that we know the symmetry group of our problem and the irreducible, unitary representations of this group. In the learning phase we present to the system one prototype signal p. The feature-extraction unit computes from this signal the feature vector f^k(p) = [f1^k(p), ..., fdk^k(p)] = ((p, e1^k), ..., (p, edk^k)), where the en^k form a basis of the subspace Hk belonging to the irreducible, unitary representation Tk. The next time that the unit receives an unknown signal s at its input, it computes f^k(s) = [f1^k(s), ..., fdk^k(s)] = ((s, e1^k), ..., (s, edk^k)). It then compares ||f^k(p)|| and ||f^k(s)||, and, if these two values are different, then it concludes that the two signals do not belong to the same class. If they are more or less equal, then we can conclude that they might be equivalent, i.e., that there might be a group element g ∈ G such that s = p^g. However, it is not guaranteed that they are similar. If it is necessary to find the transformation g also, then we might try to recover g from the matrix equation f^k(s) = Tk(g)f^k(p). However, this is not always possible, as we shall see in Section 5.

The Hilbert space H is in practically all cases infinite dimensional; i.e., there are infinitely many basic feature vectors fk. However, in a real application we can compute only finitely many feature values fnk(s) = (s, enk). Therefore we must select a finite number of these basic features. This problem cannot be solved by using symmetry considerations only; instead we must take into account the problem at hand. As one example of how we can select suitable feature vectors, we consider again the problem in which we want to recognize transformed versions of a prototype signal p. The feature vector fk(p) describes in this case the projection of p into the subspace Hk, and ||fk(p)|| is a measure of the portion of p that lies in Hk. Therefore it seems to be reasonable to choose the subspace Hk such that a maximal part of p lies in Hk; i.e., we select k such that ||fk(p)|| is maximal. This is a direct generalization of the optimal basis-function approach to edge detection developed by Hummel and Zucker.8,9 The theory also generalizes the pattern-recognition strategies based on Fourier-Mellin transforms (see, for example, Refs. 10 and 11).

5. GROUP-THEORETICAL DESIGN OF FILTER FUNCTIONS

In this section we shall use the abstract results of the preceding sections to get an overview of the symmetrical signal spaces, and we shall also discuss methods of constructing filter functions.

Consider now a fixed signal space H and a fixed (compact) symmetry group G. If (H, G, T) is a symmetrical signal space, then we know from the previous theorems that T is equivalent to the direct sum of finite-dimensional, irreducible representations Tk. This gives all the possible spaces (H, G, T). As is shown above, the properties of T describe


the strength of the symmetry constraints: if T(g) is the identity mapping for all g ∈ G, then we impose no group-theoretical constraint, since every signal s ∈ H is GT equivalent only to itself. On the other hand, if T is the direct sum of a complete set of irreducible representations of G, then T imposes the strongest constraint possible.

Let us now return to our 2-D edge-detection problem. The signal space is in this case the space of all square-integrable functions defined on the unit disk D: H = L2(D). The symmetry group of our problem is the group of 2-D rotations, which means that we want to detect the pattern independent of its orientation in the image. This group is commutative, and the irreducible, unitary representations are therefore all 1-D (see theorem 4). If φ is the rotation angle of the rotation g, then the matrix entry t11k(g) belonging to the irreducible representation Tk is the function exp(-2πikφ). The Peter-Weyl theorem states that, in this case, the functions t11k(g) = exp(-2πikφ) form a complete function set on the unit circle. Thus the Peter-Weyl theorem is a generalization of the Fourier expansion of periodic functions on a finite interval. The 1-D subspaces Hk of H belonging to T are spanned by functions of the form Wk(r)exp(2πikφ) with a radial weight function Wk. These are the so-called rotation-invariant operators investigated in a number of papers (see, for example, Refs. 12 and 13).

Now suppose that Tk is an irreducible representation of SO(2); then Tk transforms a signal s by multiplying the kth term in the Fourier expansion by the factor exp(-2πikφ). If the signal s = s(r, θ) has the Fourier decomposition s(r, θ) = Σl wl(r)exp(2πilθ), then the transformed pattern Tk(g)s becomes

Tk(g)s = Tk(φ)s(r, θ)

    = Σ(l≠k) wl(r)exp(2πilθ) + wk(r)exp[2πik(θ − φ)]. (11)

From this equation we infer that two signals are GTk equivalent if their Fourier coefficients are equal for all l ≠ k and if the Fourier coefficients for the index k have the same magnitude. An arbitrary representation is the direct sum of irreducible, 1-D representations, and two signals are equivalent if their Fourier coefficients are equal for all indices that are not involved in the representation. For all indices involved in the decomposition of the representation, the Fourier coefficients have equal magnitudes. The number of Fourier components involved in the representation T describes how powerful the symmetry constraint imposed by the representation T is.

The scalar product in H is given by the integral2

(s1(r, φ), s2(r, φ)) = π⁻¹ ∫∫ s1(r, φ) s̄2(r, φ) r dφ dr.

If Wk(r)exp(2πikφ) is a fixed basis element e1k, and if the signal s has the Fourier decomposition s(r, φ) = Σl wl(r)exp(2πilφ), then the computed feature value becomes (s, e1k) = ∫₀¹ W̄k(r)wk(r)r dr. Feature extraction thus amounts to a convolution of the signal function s with the filter function Wk(r)exp(2πikφ) or to a Fourier decomposition of the signal in polar coordinates.
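The computation of a feature value as a radial integral over an angular Fourier coefficient can be sketched numerically as follows. This is our own hypothetical discretization (midpoint rule in r, uniform sampling of the normalized angle); the test signal and weight function are arbitrary choices for illustration.

```python
import numpy as np

def polar_feature(s, k, W, n_r=64, n_phi=256):
    """Approximate (s, e1k) = integral of conj(W_k(r)) * w_k(r) * r dr,
    where w_k(r) is the kth angular Fourier coefficient of s at radius r."""
    r = (np.arange(n_r) + 0.5) / n_r            # midpoint rule on [0, 1]
    phi = np.arange(n_phi) / n_phi              # normalized angle in [0, 1)
    R, PHI = np.meshgrid(r, phi, indexing="ij")
    # angular Fourier coefficient w_k(r), one value per radius
    w_k = np.mean(s(R, PHI) * np.exp(-2j * np.pi * k * PHI), axis=1)
    return np.sum(np.conj(W(r)) * w_k * r) / n_r

# test signal: third angular harmonic with a linear radial profile
s = lambda r, phi: r * np.cos(2 * np.pi * 3 * phi)
W = lambda r: r                                  # radial weight function W_k
f = polar_feature(s, 3, W)                       # analytically 1/8
```

Here w_3(r) = r/2, so the feature value is ∫₀¹ r · (r/2) · r dr = 1/8, which the discretization reproduces to within the quadrature error.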

Equation (11) shows also that two different group elements g1 ≠ g2 can produce the same transformation Tk(g1) = Tk(g2); we need only to take as g1 the rotation with an angle φ and to take as g2 the rotation with an angle φ + 2π/k. Therefore, in general, we cannot recover the group element g from the feature values.
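This ambiguity is easy to check numerically. In the sketch below (our own notation, with the rotation angle normalized to [0, 1)), two rotations that differ by 1/k of a full turn produce identical representation values, so the phase of a kth-order feature determines the rotation only modulo 2π/k.

```python
import numpy as np

k = 4
t = lambda phi: np.exp(-2j * np.pi * k * phi)   # matrix entry t11k(g)

phi1 = 0.10
phi2 = phi1 + 1.0 / k        # a rotation differing by 2*pi/k radians

print(np.isclose(t(phi1), t(phi2)))   # True: Tk(g1) = Tk(g2) although g1 != g2
```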

In the next example we shall study the same problem as in the previous example but for 3-D images. The signal space is in this case the space of square-integrable functions on the unit ball, and the symmetry group is the group of 3-D rotations SO(3). This symmetry group is unfortunately no longer commutative. Therefore we cannot apply theorem 4, and we must consider representations of dimensions greater than 1. The representation Tk (k = 0, 1, ...) has now the dimension 2k + 1, and the basis elements in the spaces Hk are the surface harmonics Pkm(cos θ)exp(2πimφ) (−k ≤ m ≤ k), where (r, θ, φ) are the polar coordinates in three dimensions and the Pkm are the associated Legendre polynomials. The basic features are computed now by convolving the signal with the filter functions Wmk(r)Pkm(cos θ)exp(2πimφ) with radial weight functions Wmk(r). For a detailed study of these functions and their properties, the reader is referred to the literature (see, for example, Refs. 14 and 15). For an application to filter design see also Refs. 13 and 16.

6. FURTHER APPLICATIONS

The feature-extraction problem was the starting point for the development of the group-theoretical model presented in this paper. The model is, however, so general that it seems to be useful also in other areas of image processing and pattern recognition. We mention here only a few areas in which such a strategy might be useful.

The problem of detecting a pattern independent of its orientation in an image is the simplest example in which group-theoretical methods are useful in image processing. Its natural generalization is connected with the analysis of camera motion. This can be seen as follows. Describe the image that is detected by a camera located at a fixed position in space by the function s. If the camera is rotated around its optical axis, then we get after the rotation g the image sg, where g is a 2-D rotation as in the edge-detection example. If we now allow arbitrary Euclidean motions g, then in the same way we get a transformed image sg. It can be shown that the investigation of this problem leads to the study of the group SL(2, C), the group of 2 × 2 matrices with complex entries and determinant 1. This is the simplest noncommutative, noncompact group, and it is known that its irreducible representations are no longer finite dimensional.5,6,17

As another example, consider image coding. Here the problem is to compress the information content in an image. One approach to solving this problem is to partition an image into small segments. The gray-value distribution in such a segment then is described by a series. The sender then transmits the coefficients in this series and not the original gray values in this segment. The problem here is, of course, which function set should be used in the series expansion.

One approach might be as follows. Assume that we want to encode the signals in one class pG. Assume further that f is an arbitrary basis function with ||f|| = 1. We measure the quality of f by the mean-squared error M(f) = ∫G ||pg − (f, pg)f||² dg. A good basis function is thus a function that has a low mean-squared error. This value can be computed with the help of the Peter-Weyl theorem as follows.

We first use the properties of the scalar product to get the


following expressions for the squared error and the mean-squared error:

||pg − (f, pg)f||² = ||pg||² − |(f, pg)|²,

M(f) = ∫G [||pg||² − |(f, pg)|²]dg = ||p||² − ∫G (f, pg)(pg, f)dg.

We now use the expansion for p and pg as in Eqs. (5) and (6) to find

∫G (f, pg)(pg, f)dg = Σknm Σlij ∫G t̄nmk(g)tijl(g)β̄mk βjl (f, enk)(eil, f)dg

    = Σmnk |βmk|² dk⁻¹ |(enk, f)|²

    = Σk dk⁻¹ (Σm |βmk|²)(Σn |(enk, f)|²).

The first part of the sum describes the part of the prototype pattern that lies in the subspace Hk spanned by the e1k, ..., edkk. The second sum describes the part of the filter function that lies in Hk. We conclude that it is best to select the basis function f = enk such that the space Hk contains the largest proportion of p. All basis functions enk in this space have the same importance.
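For the circle group, where all dk = 1, this selection rule can be checked numerically: the mean-squared error of a basis function ek reduces to ||p||² minus the energy of p in the kth Fourier mode, so the best basis function is the mode carrying the most energy of p. The sketch below is our own discretized illustration (rotations approximated by cyclic shifts; the test signal is arbitrary).

```python
import numpy as np

n = 256
u = np.arange(n) / n                       # normalized angle in [0, 1)
p = 2.0 * np.cos(2 * np.pi * 5 * u) + 0.3 * np.sin(2 * np.pi * 2 * u)

def msq_error(p, k):
    """M(e_k): average over all (discretized) rotations g of
    ||p^g - (e_k, p^g) e_k||^2."""
    e = np.exp(2j * np.pi * k * u)         # basis function with ||e_k|| = 1
    errs = []
    for shift in range(n):
        pg = np.roll(p, shift)
        coeff = np.mean(np.conj(e) * pg)   # projection coefficient of p^g on e_k
        errs.append(np.mean(np.abs(pg - coeff * e) ** 2))
    return np.mean(errs)

best = min(range(8), key=lambda k: msq_error(p, k))
print(best)   # 5: the subspace H_5 contains the largest part of p
```

Since p is dominated by its fifth harmonic, k = 5 minimizes the mean-squared error, in agreement with the selection rule above.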

Our last example is motivated by the optimal basis-function approach to filter design8,9 and by some problems in the study of neural networks.18 In both cases it turns out that we must find the solutions of an eigenvalue problem of the form

∫X k(ξ, η)f(ξ)dμ(ξ) = λf(η). (12)

Here X is a set, k is the kernel of the equation, the integral is a Lebesgue integral, λ is a complex constant, the eigenvalue, and f is the eigenfunction. In the case of the neural networks the solutions would describe the stable states of the network, and in the filter design method the solutions are the required filter functions. We want to show that we can deduce a number of properties of the solutions from only some general properties of the kernel and the integral. We make the following assumptions:

* The group elements g are one-to-one mappings from X onto X, and the integral is invariant under G; i.e., ∫X f(ξ)dμ(ξ) = ∫X f[g(ξ)]dμ(ξ) for all functions f and all group elements g ∈ G.

* The kernel is symmetrical in the sense that

∫X k(ξ, η)f(ξ)dμ(ξ) = ∫X k[g(ξ), g(η)]f(ξ)dμ(ξ)

for all g ∈ G and all f.

We now define the transformation T as T(g)f(ξ) = fg(ξ) = f[g⁻¹(ξ)], and we finally define (the unknown) Hilbert space H as the space of the eigenfunctions belonging to a fixed eigenvalue λ in Eq. (12). We then have a symmetrical signal space (H, G, T), since fg is an element of H for all f ∈ H and all g ∈ G, as can be seen by applying the properties of the kernel and the integral introduced above. For a large number of groups G, the representation spaces H and many of

their properties are known. Since H is related to G by a representation, we can now conclude that H must be one of these known Hilbert spaces. If, for example, G is a compact, Abelian group, then we find immediately from the representation-theoretical theorems that T is equivalent to the direct sum of 1-D unitary, irreducible representations. In the case in which X is the unit circle or the unit disk and G is a subgroup of SO(2) we find again as basic solutions the complex exponentials. In the 3-D case in which the symmetry group is given by a subgroup of SO(3) we find the surface harmonics as solutions.

This result can also be reformulated in the following way: Assume that H' is the Hilbert space L2(X) and that G and T are defined as above; then the mapping K defined as K(f)(η) = ∫X k(ξ, η)f(ξ)dμ(ξ) is an operator from H' to H'. Under the conditions formulated above we find that K commutes with the action of G, i.e., K(fg) = [K(f)]g for all f and all g. If we now define H as the space of all eigenfunctions belonging to the fixed eigenvalue λ, then we find that (H, G, T) is the same symmetrical signal space as is described above. This new formulation of the integral equation problem has, however, the advantage that we can now study arbitrary operators K that commute with the group operation. In the case of the rotation groups SO(2) and SO(3), we find that the complex exponentials and the surface harmonics are eigenfunctions of operators that commute with the actions of SO(2) and SO(3), respectively. In particular, they are eigenfunctions of the Fourier transform and the Laplacian. This is a group-theoretical explanation of why the filter functions obtained by the optimal basis-function approach have a number of other nice properties, such as Fourier-transform invariance.
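The discrete analog of this statement is easy to verify: an operator that commutes with the rotations of the circle becomes, after uniform sampling, a circulant matrix, and every sampled complex exponential is an eigenvector of any circulant matrix, regardless of the particular kernel. A sketch under these assumptions (the Gaussian-like kernel below is an arbitrary choice of ours):

```python
import numpy as np

n = 64
u = np.arange(n) / n
# a kernel k(xi, eta) that depends only on the circular distance xi - eta
# commutes with rotation; after sampling it becomes a circulant matrix K
h = np.exp(-20 * np.minimum(u, 1 - u) ** 2)
K = np.array([[h[(i - j) % n] for j in range(n)] for i in range(n)]) / n

ok = True
for k in (0, 1, 3):
    e = np.exp(2j * np.pi * k * u)          # sampled complex exponential e_k
    lam = (K @ e)[0] / e[0]                 # candidate eigenvalue
    ok = ok and np.allclose(K @ e, lam * e) # e_k is an eigenfunction of K
print(ok)   # True
```

The eigenvalue for each k is simply the kth Fourier coefficient of the kernel row, which mirrors the continuous statement that convolution-type operators are diagonalized by the exponentials.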

These examples may demonstrate the power of the group-theoretical method. The method allows us to find results that we could obtain otherwise only by means of lengthy computations, and it gives an explanation of otherwise unrelated properties. We shall not go into further detail here, but the interested reader can turn to the literature.15,19-21

APPENDIX A: GROUPS AND HILBERT SPACES

In this appendix we summarize some notations, definitions, and basic properties of groups and Hilbert spaces.

If A and B are sets and f is a mapping from A into B, then we denote this by f: A → B; a ↦ b = f(a). By R and C we denote the real line and the complex numbers, respectively. Rn and Cn are the spaces of n-dimensional real and complex vectors, respectively. The unit circle is denoted by S, and the unit disk, i.e., the unit circle together with its interior, is denoted by D. The complex conjugate of the complex number z is denoted by z̄.

We now define the algebraic concept of a group.

Definition 5
A nonempty set G is called a group if a product ∘: G × G → G; (g1, g2) ↦ g1 ∘ g2 is defined for every two elements g1 and g2 in G, for which the following conditions hold:

* (g1 ∘ g2) ∘ g3 = g1 ∘ (g2 ∘ g3). We say that the product is associative.

* There is a unique element e ∈ G such that e ∘ g = g ∘ e = g for all g ∈ G. e is called the identity element.

* For every element g ∈ G there is a unique element g⁻¹ such that g⁻¹ ∘ g = g ∘ g⁻¹ = e. g⁻¹ is called the inverse of g.


A group is called commutative or Abelian if g1 ∘ g2 = g2 ∘ g1 for all g1, g2 ∈ G.

Definition 6
1. Assume that H is a (complex) vector space. A scalar product is a mapping (·, ·) with the following properties:

* (·, ·): H × H → C.
* (x + y, z) = (x, z) + (y, z) for all x, y, z ∈ H.
* (a·x, y) = a(x, y) for all x, y ∈ H and all a ∈ C.
* (x, y) is the complex conjugate of (y, x) for all x, y ∈ H.
* (x, x) ≥ 0, and (x, x) = 0 if and only if x = 0.

2. A vector space together with a scalar product is called an inner product space or a pre-Hilbert space.

Theorem 7
If H is a pre-Hilbert space, then ||x|| = (x, x)^(1/2) defines a norm on H.

Definition 7
1. A normed space is complete if every Cauchy sequence converges in this space.
2. A Hilbert space is a pre-Hilbert space that is complete in the norm defined by its scalar product.

APPENDIX B: TOPOLOGICAL GROUPS, REPRESENTATIONS, AND THE HAAR INTEGRAL

The definitions and the theorem below introduce the concepts of a topological group, representations of such groups, and invariant integration on a group.

Definition 8
1. If the set G underlying the group is also a topological space, then G is a topological group if the product (g1, g2) ↦ g1 ∘ g2 and the inversion g ↦ g⁻¹ are continuous maps.
2. A topological group is called a compact group if G as a topological space is compact, i.e., if any open covering of G has a finite subcovering.

Definition 9
Assume that G is a topological group, H is a Hilbert space, and GL(H) is the space of continuous, linear, one-to-one mappings of H into itself. A representation of G is a map T: G → GL(H); g ↦ T(g) that satisfies the following conditions:

* T(e) = id (e is the identity in G and id is the identity function on H).
* T(g1g2) = T(g1)T(g2) for all g1, g2 ∈ G.
* (x, g) ↦ T(g)x is a continuous mapping from H × G onto H.

Theorem 8
For compact groups (and the slightly more general locally compact groups) it is possible to construct a positive, right-invariant, and normed integral, i.e., an integral that satisfies the following three conditions:

1. ∫G f(g)dg ≥ 0 if f(g) ≥ 0 for all g ∈ G.
2. ∫G f(g ∘ g1)dg = ∫G f(g)dg for all g1 ∈ G.
3. ∫G 1 dg = 1.

Definition 10
The integral described above is unique, and it is called the Haar integral of the group G. It is also possible to define a left-invariant integral of a group, and it can be shown that those integrals are identical for compact groups. Integration over a group is an averaging operation in which all elements in the group are equally likely. This is a direct consequence of the fact that the integral is normed and invariant.
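For a finite rotation group the Haar integral is simply the uniform average over the group elements, and the invariance and normalization conditions can be checked directly. A sketch for the cyclic group Z_n, our own discrete stand-in for SO(2):

```python
import numpy as np

n = 12
group = range(n)            # Z_n: rotations by 2*pi*g/n, g = 0, ..., n-1

def haar(f):
    """Haar integral on Z_n: the average with all elements equally likely."""
    return np.mean([f(g) for g in group])

f = lambda g: np.cos(2 * np.pi * g / n) ** 2 + 1.0
g1 = 5

print(np.isclose(haar(lambda g: f((g + g1) % n)), haar(f)))  # right invariance
print(np.isclose(haar(lambda g: 1.0), 1.0))                  # normed: integral of 1 is 1
```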

ACKNOWLEDGMENT

The research reported in this paper was supported by a grant from the Swedish National Board for Technical Development.

REFERENCES

1. F. Hirzebruch and W. Scharlau, Einführung in die Funktionalanalysis (Bibliographisches Institut, Mannheim, Federal Republic of Germany, 1971).

2. K. Yosida, Functional Analysis (Springer-Verlag, Berlin, 1978).

3. M. A. Naimark and A. I. Stern, Theory of Group Representations (Springer-Verlag, New York, 1982).

4. L. Pontrjagin, Topological Groups, Princeton Mathematical Series 2 (Princeton University, Princeton, N.J., 1946).

5. I. M. Gelfand, M. I. Graev, and N. Y. Vilenkin, Integral Geometry and Representation Theory, Vol. 5 of Generalized Functions (Academic, New York, 1966).

6. M. A. Naimark, Linear Representations of the Lorentz Group (Pergamon, Oxford, 1964).

7. D. P. Zelobenko, Compact Lie Groups and Their Representations (American Mathematical Society, Providence, R.I., 1973).

8. R. A. Hummel, "Feature detection using basis functions," Comput. Graphics Image Process. 9, 40-55 (1979).

9. S. W. Zucker and R. A. Hummel, "A three-dimensional edge operator," IEEE Trans. Pattern Anal. Mach. Intell. PAMI-3, 324-331 (1981).

10. Y. Sheng and H. H. Arsenault, "Experiments on pattern recognition using invariant Fourier-Mellin descriptors," J. Opt. Soc. Am. A 3, 771-776 (1986).

11. H. Stark and R. Wu, "Rotation-invariant pattern recognition using optimum feature extraction," Appl. Opt. 24, 179-184 (1985).

12. P.-E. Danielsson, "Rotation invariant linear operators with directional response," in Proceedings of the 5th International Conference on Pattern Recognition (Institute of Electrical and Electronics Engineers Computer Society, New York, 1980), pp. 1171-1176.

13. R. Lenz, "Optimal filters for the detection of linear patterns in 2-d and higher dimensional images," Pattern Recognition 20, 163-172 (1987).

14. I. M. Gelfand, R. A. Minlos, and Z. Y. Shapiro, Representations of the Rotation and Lorentz Groups and Their Applications (Pergamon, New York, 1963).

15. H. Dym and H. P. McKean, Fourier Series and Integrals (Academic, New York, 1972).

16. R. Lenz, "Reconstruction, processing and display of 3D-images," doctoral dissertation (Linköping University, Linköping, Sweden, 1986).

17. R. Lenz, "The Mellin transform and the analysis of camera motion," Tech. Rep. 0977 (Department of Electrical Engineering, Linköping University, Linköping, Sweden, 1988).

18. T. Kohonen, Self-Organization and Associative Memory (Springer-Verlag, Berlin, 1984).

19. A. Terras, Harmonic Analysis on Symmetric Spaces and Applications I (Springer-Verlag, Berlin, 1985).

20. R. Lenz, "Symmetry and filter design," Tech. Rep. LiTH-ISY-I-0897 (Department of Electrical Engineering, Linköping University, Linköping, Sweden, 1988).

21. R. Lenz, "Recognizing group invariant pattern classes," Tech. Rep. LiTH-ISY-I-0925 (Department of Electrical Engineering, Linköping University, Linköping, Sweden, 1988).
