
Chapter 12 Object Recognition



• Image regions are treated as objects or patterns.

• Object recognition → pattern recognition.

• Two approaches to pattern recognition:

(a) Decision-theoretic: based on quantitative descriptors, e.g., length, area, texture.

(b) Structural: based on qualitative descriptors, e.g., relational descriptors.


12.1 Patterns and Pattern Classes

• A pattern is an arrangement of descriptors; the term feature is often used to denote a descriptor.

• A pattern class is a family of patterns that share some common properties.

• Pattern recognition assigns patterns to their respective classes.

• Three common pattern arrangements are vectors, strings, and trees.

• A pattern vector is x = [x1, x2, …, xn]ᵀ, where each component xi represents the ith descriptor and n is the total number of descriptors.


• Example: a flower is described by the two measurements x = [x1, x2]ᵀ, where x1 and x2 correspond to petal length and petal width.


• Alternatively, each object may be represented by its signature, and the pattern vector formed by letting x1 = r(θ1), x2 = r(θ2), …, xn = r(θn).


• Pattern classes can be characterized by quantitative information or by structural relationships.

• For example, fingerprint recognition is based on the interrelationships of print features called minutiae.

• Minutiae, together with their relative sizes and locations, are primitive components that describe ridge properties such as endings, branchings, and mergings.


12.2 Recognition based on decision-theoretic methods

• Decision-theoretic approaches to recognition are based on the use of decision (or discriminant) functions.

• Let x = [x1, x2, …, xn]ᵀ represent an n-dimensional pattern vector. For W pattern classes ω1, …, ωW, find W decision functions d1(x), d2(x), …, dW(x) with the property that, if a pattern x belongs to class ωi, then

    di(x) > dj(x)   for j = 1, 2, …, W; j ≠ i

• The decision boundary separating class ωi from ωj is given by the values of x for which di(x) = dj(x).


12.2.1 Matching

• Recognition techniques based on matching represent each class by a prototype pattern vector.

• The simplest approach is the minimum-distance classifier, which computes the (Euclidean) distance between the unknown pattern and each of the prototype vectors.

• The unknown pattern is assigned to the class of the closest prototype.


• Minimum distance classifier: the prototype of each pattern class is taken to be the mean vector of the patterns of that class,

    mj = (1/Nj) Σ_{x∈ωj} x,   j = 1, 2, …, W

where Nj is the number of patterns from class ωj.

• The Euclidean distance is used as the measure of closeness:

    Dj(x) = ║x − mj║,   j = 1, 2, …, W

where ║a║ = (aᵀa)^(1/2) is the Euclidean norm.

• We then assign x to class ωi if Di(x) is the smallest distance.


• Selecting the smallest distance is equivalent to evaluating the functions

    dj(x) = xᵀmj − ½ mjᵀmj,   j = 1, 2, …, W

and assigning x to class ωi if di(x) yields the largest numerical value.

• The decision boundary between ωi and ωj for a minimum-distance classifier is

    dij(x) = di(x) − dj(x) = xᵀ(mi − mj) − ½ (mi − mj)ᵀ(mi + mj) = 0

• This boundary is the perpendicular bisector (a line, plane, or hyperplane) of the line segment joining mi and mj.


• Example: the sample means of the two classes are m1 = (4.3, 1.3)ᵀ and m2 = (1.5, 0.3)ᵀ. The decision functions are

    d1(x) = xᵀm1 − ½ m1ᵀm1 = 4.3x1 + 1.3x2 − 10.1
    d2(x) = xᵀm2 − ½ m2ᵀm2 = 1.5x1 + 0.3x2 − 1.17

and the equation of the decision boundary is

    d12(x) = d1(x) − d2(x) = 2.8x1 + 1.0x2 − 8.9 = 0
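The minimum-distance rule is straightforward to implement. Below is a minimal sketch in Python/NumPy (not from the slides) that evaluates dj(x) = xᵀmj − ½ mjᵀmj for the two mean vectors of the example above and picks the largest value:

```python
import numpy as np

# Class prototypes (mean vectors) taken from the example above.
prototypes = {
    "omega_1": np.array([4.3, 1.3]),
    "omega_2": np.array([1.5, 0.3]),
}

def min_distance_classify(x, prototypes):
    """Assign x to the class whose mean vector is closest in Euclidean distance,
    which is equivalent to maximizing d_j(x) = x.m_j - 0.5 * m_j.m_j."""
    x = np.asarray(x, dtype=float)
    scores = {name: x @ m - 0.5 * (m @ m) for name, m in prototypes.items()}
    return max(scores, key=scores.get), scores

label, scores = min_distance_classify([3.0, 1.0], prototypes)
print(label, scores)   # omega_1 wins: d_1 ~ 4.11 > d_2 ~ 3.63
```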


• Matching by correlation: the correlation between an image f(x, y) and a template (subimage) w(x, y) is

    c(x, y) = Σs Σt f(s, t) w(x + s, y + t)

for x = 0, 1, 2, …, M − 1 and y = 0, 1, 2, …, N − 1.

• Because c(x, y) is sensitive to changes in the amplitude of f and w, a normalized correlation coefficient is used instead:

    γ(x, y) = Σs Σt [f(s, t) − f̄(s, t)][w(x + s, y + t) − w̄] / {Σs Σt [f(s, t) − f̄(s, t)]² · Σs Σt [w(x + s, y + t) − w̄]²}^(1/2)

where w̄ is the average value of the template and f̄(s, t) is the average value of f in the region coincident with the current location of w.

(Figure: an image f(x, y), a template w(x, y), and the resulting correlation coefficient γ(x, y); the best match occurs at the peak of γ.)
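As an illustration, here is a minimal sketch of template matching with the normalized correlation coefficient, written directly from the formula above (a brute-force double loop; real implementations usually use FFT-based or library routines, and the exact indexing convention may differ slightly from the slides'):

```python
import numpy as np

def normalized_correlation(f, w):
    """Slide template w over image f and return the correlation coefficient
    gamma at every valid top-left position."""
    f = np.asarray(f, dtype=float)
    w = np.asarray(w, dtype=float)
    M, N = f.shape
    m, n = w.shape
    w_zero_mean = w - w.mean()                     # w - w_bar
    gamma = np.zeros((M - m + 1, N - n + 1))
    for x in range(M - m + 1):
        for y in range(N - n + 1):
            region = f[x:x + m, y:y + n]
            r = region - region.mean()             # f - f_bar over the window
            denom = np.sqrt((r ** 2).sum() * (w_zero_mean ** 2).sum())
            gamma[x, y] = (r * w_zero_mean).sum() / denom if denom != 0 else 0.0
    return gamma

# Example: the peak of gamma marks the best-match location of w in f.
f = np.random.rand(64, 64)
w = f[20:28, 30:38].copy()
g = normalized_correlation(f, w)
print(np.unravel_index(np.argmax(g), g.shape))   # expected near (20, 30)
```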


12.2.2 Optimal statistical classifier

• Let p(ωi|x) denote the probability that a particular pattern x comes from class ωi.

• If the classifier decides that x came from ωj when it actually came from ωi, it incurs a loss, denoted Lij.

• Since pattern x may belong to any one of W classes, the average loss incurred in assigning x to class ωj is

    rj(x) = Σ_{k=1}^{W} Lkj p(ωk|x)


• Using the Bayes rule p(A|B) = p(A)p(B|A)/p(B), this becomes

    rj(x) = (1/p(x)) Σ_{k=1}^{W} Lkj p(x|ωk) p(ωk)

• Since p(x) is common to all the rj(x), it can be dropped:

    rj(x) = Σ_{k=1}^{W} Lkj p(x|ωk) p(ωk)

• The classifier that minimizes the total average loss is called the Bayes classifier.


• The Bayes classifier assigns an unknown pattern x to class ωi if ri(x) < rj(x) for j = 1, 2, …, W and j ≠ i, that is, if

    Σ_{k=1}^{W} Lki p(x|ωk) p(ωk) < Σ_{q=1}^{W} Lqj p(x|ωq) p(ωq)

• Assume the loss function Lij = 1 − δij (zero loss for a correct decision, unit loss for an incorrect one). Then

    rj(x) = Σ_{k=1}^{W} (1 − δkj) p(x|ωk) p(ωk) = p(x) − p(x|ωj) p(ωj)

• The Bayes classifier therefore assigns a pattern to class ωi if

    p(x) − p(x|ωi) p(ωi) < p(x) − p(x|ωj) p(ωj)

or, equivalently, if

    di(x) = p(x|ωi) p(ωi) > p(x|ωj) p(ωj) = dj(x)   for all j ≠ i


• Bayes classifier for Gaussian pattern classes: consider a 1-D problem with two classes (W = 2). The Bayes decision function is

    dj(x) = p(x|ωj) p(ωj) = [1/(√(2π) σj)] exp[−(x − mj)²/(2σj²)] p(ωj),   j = 1, 2

• The boundary between the two classes is the point x = x0 for which d1(x0) = d2(x0).

• For the equally-likely case, p(ω1) = p(ω2) = 1/2, the boundary is where p(x0|ω1) = p(x0|ω2), as shown in Fig. 12.10.

• If the classes are not equally likely, x0 moves away from the mean of the more likely class, enlarging that class's decision region.


• In the n-dimensional case, the Gaussian density of the vectors in the jth pattern class has the form

    p(x|ωj) = [1/((2π)^(n/2) |Cj|^(1/2))] exp[−½ (x − mj)ᵀ Cj⁻¹ (x − mj)]

where the mean vector is mj = Ej{x} and the covariance matrix is Cj = Ej{(x − mj)(x − mj)ᵀ}.

• Approximating the expectations by averages over the Nj training patterns of class ωj gives

    mj = (1/Nj) Σ_{x∈ωj} x   and   Cj = (1/Nj) Σ_{x∈ωj} x xᵀ − mj mjᵀ


• The decision function can also be written in logarithmic form:

    dj(x) = ln[p(x|ωj) p(ωj)] = ln p(x|ωj) + ln p(ωj)

• For the n-dimensional Gaussian density this becomes

    dj(x) = ln p(ωj) − (n/2) ln 2π − ½ ln|Cj| − ½ (x − mj)ᵀ Cj⁻¹ (x − mj)

• The term (n/2) ln 2π is the same for all classes and can be dropped:

    dj(x) = ln p(ωj) − ½ ln|Cj| − ½ (x − mj)ᵀ Cj⁻¹ (x − mj)

• If all covariance matrices are equal, Cj = C for all j, the terms common to all classes drop out and

    dj(x) = ln p(ωj) + xᵀ C⁻¹ mj − ½ mjᵀ C⁻¹ mj

• If, in addition, C = I and p(ωj) = 1/W, this reduces to the minimum-distance classifier:

    dj(x) = xᵀ mj − ½ mjᵀ mj


• Example: consider two classes of 3-D pattern vectors with sample statistics

    m1 = ¼ (3, 1, 1)ᵀ,   m2 = ¼ (1, 3, 3)ᵀ,   C1 = C2 = C = (1/16) [3 1 1; 1 3 −1; 1 −1 3]

whose inverse is

    C⁻¹ = [8 −4 −4; −4 8 4; −4 4 8]

• Assuming the equally-likely case p(ω1) = p(ω2) = 1/2, the decision functions

    dj(x) = xᵀ C⁻¹ mj − ½ mjᵀ C⁻¹ mj

evaluate to

    d1(x) = 4x1 − 1.5   and   d2(x) = −4x1 + 8x2 + 8x3 − 5.5

• The decision surface is

    d12(x) = d1(x) − d2(x) = 8x1 − 8x2 − 8x3 + 4 = 0
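A minimal sketch of this Gaussian Bayes classifier in Python/NumPy. The training patterns below are hypothetical 3-D patterns whose sample statistics happen to match the means and covariance of the example above; any labeled training set could be substituted:

```python
import numpy as np

class GaussianBayesClassifier:
    """Bayes classifier assuming Gaussian pattern classes."""

    def fit(self, patterns_by_class, priors=None):
        # patterns_by_class: dict mapping class label -> (N_j, n) array of patterns
        self.stats = {}
        W = len(patterns_by_class)
        for label, X in patterns_by_class.items():
            X = np.asarray(X, dtype=float)
            m = X.mean(axis=0)                       # m_j = (1/N_j) sum x
            C = X.T @ X / len(X) - np.outer(m, m)    # C_j = (1/N_j) sum x x^T - m_j m_j^T
            prior = priors[label] if priors else 1.0 / W
            self.stats[label] = (m, np.linalg.inv(C),
                                 np.linalg.slogdet(C)[1], np.log(prior))
        return self

    def decision(self, x, label):
        # d_j(x) = ln p(w_j) - 0.5*ln|C_j| - 0.5*(x - m_j)^T C_j^-1 (x - m_j)
        m, C_inv, logdet, logprior = self.stats[label]
        diff = np.asarray(x, dtype=float) - m
        return logprior - 0.5 * logdet - 0.5 * diff @ C_inv @ diff

    def classify(self, x):
        return max(self.stats, key=lambda label: self.decision(x, label))

# Hypothetical training patterns (their sample statistics match the example above).
data = {
    1: np.array([[0, 0, 0], [1, 0, 0], [1, 0, 1], [1, 1, 0]], dtype=float),
    2: np.array([[0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 1, 1]], dtype=float),
}
clf = GaussianBayesClassifier().fit(data)
print(clf.classify([1.0, 0.2, 0.1]))   # expected: class 1
```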


• Example: a multispectral scanner responds to four selected wavelength bands: 0.40–0.44 μm (violet), 0.58–0.62 μm (green), 0.66–0.72 μm (red), and 0.80–1.00 μm (infrared). Every point on the ground is thus represented by a 4-element pattern vector x = [x1, x2, x3, x4]ᵀ.


12.2.3 Neural Networks

• The patterns used to estimate the parameters of each class (its mean vector and covariance matrix) are called training patterns; a set of such patterns from each class is a training set.

• The process by which a training set is used to obtain the decision functions is called learning or training.

• In practice, the statistical properties of the pattern classes are often unknown or cannot be estimated reliably.

• One solution is to use neural networks, which learn the decision functions directly from training patterns.


• Non-linear computing elements organized as a network, modeled loosely on neurons in the brain, are called neural networks, neurocomputers, parallel distributed processing (PDP) models, etc.

• Interest in neural networks dates back to the 1940s.

• During the 1950s and 1960s, learning machines such as the perceptron were proposed by Rosenblatt.

• In 1969, Minsky and Papert demonstrated the limitations of perceptron-like machines, which discouraged research in this area.

• In 1986, Rumelhart, Hinton, and Williams revived interest by developing training methods for multilayer perceptrons.


• Perceptron for two pattern classes (Fig. 12.14): the response of the basic device is based on a weighted sum of its inputs,

    d(x) = Σ_{i=1}^{n} wi xi + w_{n+1}

which is a linear decision function with respect to the pattern vector. The coefficients wi, i = 1, …, n, n + 1, are the weights.

• The function that maps the output of the summation into the output of the device is called the activation function.

• When d(x) > 0, the threshold element causes the output of the perceptron to be +1, indicating that x belongs to ω1; when d(x) < 0, the output is −1, indicating ω2.

• The decision boundary is d(x) = Σ_i wi xi + w_{n+1} = 0.


• Another formulation augments the pattern vector by appending an (n + 1)st element that is always equal to 1.

• An augmented pattern vector y is created from a pattern vector x by letting yi = xi, i = 1, …, n, and y_{n+1} = 1.

• The decision function then becomes

    d(y) = Σ_{i=1}^{n+1} wi yi = wᵀy

where y = (y1, y2, …, yn, 1)ᵀ is the augmented pattern vector and w = (w1, w2, …, wn, w_{n+1})ᵀ is the weight vector.


• Training algorithm for the linearly separable case: given two training sets belonging to pattern classes ω1 and ω2, let w(1) be an initial weight vector chosen arbitrarily.

• At the kth iterative step, if y(k) ∈ ω1 and wᵀ(k)y(k) ≤ 0, replace w(k) by w(k + 1) = w(k) + c y(k), where c is a positive correction increment.

• Conversely, if y(k) ∈ ω2 and wᵀ(k)y(k) ≥ 0, replace w(k) by w(k + 1) = w(k) − c y(k).

• Otherwise leave w(k) unchanged: w(k + 1) = w(k).

• This algorithm is referred to as the fixed increment correction rule. It converges if the two training pattern sets are linearly separable.


• Example (Fig. 12.15): consider the two training sets of augmented patterns

    {(0, 0, 1)ᵀ, (0, 1, 1)ᵀ} ∈ ω1   and   {(1, 0, 1)ᵀ, (1, 1, 1)ᵀ} ∈ ω2

• Let c = 1 and w(1) = 0. Presenting the patterns in sequence gives

    wᵀ(1)y(1) = (0, 0, 0)(0, 0, 1)ᵀ = 0  →  w(2) = w(1) + y(1) = (0, 0, 1)ᵀ
    wᵀ(2)y(2) = (0, 0, 1)(0, 1, 1)ᵀ = 1 > 0  →  w(3) = w(2)
    wᵀ(3)y(3) = (0, 0, 1)(1, 0, 1)ᵀ = 1 ≥ 0 with y(3) ∈ ω2  →  w(4) = w(3) − y(3) = (−1, 0, 0)ᵀ
    wᵀ(4)y(4) = (−1, 0, 0)(1, 1, 1)ᵀ = −1 < 0  →  w(5) = w(4)


• Example (continued): a solution is obtained only when the algorithm completes an error-free iteration through all the training patterns.

• Convergence is achieved at k = 14, yielding the solution weight vector w(14) = (−2, 0, 1)ᵀ.

• The corresponding decision function is d(y) = −2y1 + 1 or, in terms of the original pattern variables, d(x) = −2x1 + 1.
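A minimal sketch of the fixed increment correction rule in Python/NumPy, applied to the augmented training patterns of this example (the iteration is organized in passes over the training set; the correction increment c and the epoch limit are illustrative):

```python
import numpy as np

def perceptron_train(class1, class2, c=1.0, max_epochs=100):
    """Fixed increment correction rule for two linearly separable classes.

    class1, class2: lists of augmented pattern vectors y (last component = 1).
    Returns a weight vector w with w.y > 0 for class 1 and w.y < 0 for class 2.
    """
    patterns = [(np.asarray(y, dtype=float), +1) for y in class1] + \
               [(np.asarray(y, dtype=float), -1) for y in class2]
    w = np.zeros(len(patterns[0][0]))              # w(1) = 0
    for _ in range(max_epochs):
        errors = 0
        for y, sign in patterns:
            if sign > 0 and w @ y <= 0:            # y in omega_1 misclassified
                w = w + c * y
                errors += 1
            elif sign < 0 and w @ y >= 0:          # y in omega_2 misclassified
                w = w - c * y
                errors += 1
        if errors == 0:                            # one error-free pass => converged
            return w
    return w

w = perceptron_train([(0, 0, 1), (0, 1, 1)], [(1, 0, 1), (1, 1, 1)])
print(w)   # converges to the separating solution (-2, 0, 1) for this example
```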


• Non-separable classes (non-linear case) → the least-mean-square (LMS) delta rule.

• Consider the criterion function

    J(w) = ½ (r − wᵀy)²

where r is the desired response (r = +1 if y ∈ ω1 and r = −1 if y ∈ ω2, consistent with the perceptron convention above).

• The LMS delta rule adjusts w incrementally in the direction of the negative gradient of J(w), in order to minimize J(w); the minimum occurs when r = wᵀy.

• At the kth iterative step, w(k) is updated as

    w(k + 1) = w(k) − α [∂J(w)/∂w]_{w = w(k)}

where α > 0 is the correction increment.


• Differentiating J(w) = ½ (r − wᵀy)² with respect to w gives ∂J/∂w = −(r − wᵀy) y, so

    w(k + 1) = w(k) + α [r(k) − wᵀ(k)y(k)] y(k)

or w(k + 1) = w(k) + α e(k) y(k), with e(k) = r(k) − wᵀ(k)y(k).

• If we change w(k) to w(k + 1) but leave the pattern the same, the error becomes e(k) = r(k) − wᵀ(k + 1)y(k), so the change in error is

    Δe(k) = −[wᵀ(k + 1) − wᵀ(k)] y(k) = −α e(k) yᵀ(k) y(k) = −α e(k) ║y(k)║²

i.e., for sufficiently small α each step reduces the error on the current pattern.
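For comparison, a minimal sketch of the LMS (delta-rule) update. Unlike the perceptron rule, it adjusts the weights on every presentation whether or not the pattern is misclassified, and it does not require the classes to be separable; the learning increment alpha and epoch count below are illustrative:

```python
import numpy as np

def lms_train(class1, class2, alpha=0.1, epochs=50):
    """Delta rule: w(k+1) = w(k) + alpha * e(k) * y(k),
    with e(k) = r(k) - w(k).y(k), r = +1 for class 1 and r = -1 for class 2."""
    patterns = [(np.asarray(y, dtype=float), +1.0) for y in class1] + \
               [(np.asarray(y, dtype=float), -1.0) for y in class2]
    w = np.zeros(len(patterns[0][0]))
    for _ in range(epochs):
        for y, r in patterns:
            e = r - w @ y              # error for the current pattern
            w = w + alpha * e * y      # gradient-descent step on J(w) = 0.5*e^2
    return w

w = lms_train([(0, 0, 1), (0, 1, 1)], [(1, 0, 1), (1, 1, 1)])
# w approaches the least-squares solution, which is about (-2, 0, 1) for these patterns
print(w, [float(np.sign(w @ np.array(y, dtype=float))) for y in [(0, 0, 1), (1, 1, 1)]])
```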


• Multilayer feedforward neural networks (Fig. 12.16): each neuron has the same form as the perceptron model, except that the hard-limiting activation function is replaced by a soft-limiting “sigmoid” function, which has the necessary differentiability:

    hj(Ij) = 1 / (1 + e^{−(Ij + θj)/θ0})

where Ij, j = 1, 2, …, NJ, is the input to the activation element of each node in layer J, θj is an offset, and θ0 controls the shape of the sigmoid function. The sigmoid function is plotted in Fig. 12.17.


• From Fig. 12.17, the offset θj is analogous to the weight w_{n+1} in the perceptron.

• In Fig. 12.16, the input to a node in any layer is the weighted sum of the outputs from the previous layer.

• If layer K precedes layer J, the input to the activation element of node j in layer J is

    Ij = Σ_{k=1}^{NK} wjk Ok,   j = 1, 2, …, NJ

where NJ and NK are the numbers of nodes in layers J and K, the wjk are the weights connecting the two layers, and the outputs of layer K are Ok = hk(Ik), k = 1, 2, …, NK.


• The same formulation applies to every node in layer J, but each node weights its inputs individually; for example, w1k and w2k, k = 1, 2, …, NK, are the weights on the inputs to the first and second nodes of layer J. Substituting Ij into the sigmoid gives

    hj(Ij) = 1 / (1 + e^{−(Σ_{k=1}^{NK} wjk Ok + θj)/θ0})

• The main problem in training a multilayer network lies in adjusting the weights in the so-called hidden layers (those other than the output layer).


• Training by back propagation: beginning with the output layer Q, the total squared error between the desired responses rq and the actual outputs Oq of the nodes in layer Q is

    EQ = ½ Σ_{q=1}^{NQ} (rq − Oq)²

• Similar to the delta rule, the training rule adjusts the weights in each layer so as to seek a minimum of this error function:

    Δwqp = −α ∂EQ/∂wqp

where wqp is the weight connecting node p in the preceding layer P to node q in layer Q, and α is the learning increment.

• Using the chain rule,

    ∂EQ/∂wqp = (∂EQ/∂Iq)(∂Iq/∂wqp) = (∂EQ/∂Iq) Op


• Therefore Δwqp = −α (∂EQ/∂Iq) Op = α δq Op, where δq = −∂EQ/∂Iq.

• From the chain rule, δq = −(∂EQ/∂Oq)(∂Oq/∂Iq), where ∂EQ/∂Oq = −(rq − Oq) and ∂Oq/∂Iq = ∂[hq(Iq)]/∂Iq = h′q(Iq).

• Finally, δq = (rq − Oq) h′q(Iq), so

    Δwqp = α δq Op = α (rq − Oq) h′q(Iq) Op


• Now consider the hidden layer P, which follows layer J and precedes the output layer Q. Proceeding in the same manner as above gives

    Δwpj = α δp Oj,   with δp = −∂EP/∂Ip = −(∂EP/∂Op)(∂Op/∂Ip) = −(∂EP/∂Op) h′p(Ip)

• Because P is not the output layer, the term ∂EP/∂Op cannot be written in terms of a desired response rp; instead, it is obtained from the quantities already computed for layer Q:

    −∂EP/∂Op = −Σ_{q=1}^{NQ} (∂EQ/∂Iq)(∂Iq/∂Op) = −Σ_{q=1}^{NQ} (∂EQ/∂Iq) wqp = Σ_{q=1}^{NQ} δq wqp


• The error term for the hidden layer is therefore

    δp = h′p(Ip) Σ_{q=1}^{NQ} δq wqp

• After the error terms and weights have been computed for layer P, these quantities can be used in the same way to compute the error and weights for the layer preceding P.

• Summary, for any pair of adjacent layers K and J, where K precedes J (and P denotes the layer following J):
  1. Compute the corrections Δwjk = α δj Ok for the weights connecting the two layers.
  2. If J is the output layer, δj = (rj − Oj) h′j(Ij).
  3. If J is an internal (hidden) layer, δj = h′j(Ij) Σ_p δp wpj, for j = 1, 2, …, NJ.
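A minimal sketch of this back-propagation procedure for a network with one hidden layer, assuming the common simplification that the offsets θj are absorbed into bias weights and θ0 = 1, so h(I) = 1/(1 + e^(−I)) and h′(I) = h(I)[1 − h(I)]. The XOR-style training data and hyperparameters at the end are illustrative only:

```python
import numpy as np

def sigmoid(I):
    return 1.0 / (1.0 + np.exp(-I))               # h(I); h'(I) = h(I) * (1 - h(I))

def train_backprop(X, R, hidden_nodes=4, alpha=0.5, epochs=5000, seed=0):
    """One-hidden-layer network trained with the back-propagation rule.

    X: (num_patterns, n) input patterns; R: (num_patterns, num_outputs) desired responses.
    Bias inputs play the role of the offsets theta_j."""
    rng = np.random.default_rng(seed)
    n, n_out = X.shape[1], R.shape[1]
    W_jk = rng.normal(scale=0.5, size=(hidden_nodes, n + 1))      # input -> hidden (with bias)
    W_qp = rng.normal(scale=0.5, size=(n_out, hidden_nodes + 1))  # hidden -> output (with bias)
    for _ in range(epochs):
        for x, r in zip(X, R):
            x_aug = np.append(x, 1.0)                  # append bias input
            O_p = sigmoid(W_jk @ x_aug)                # hidden-layer outputs
            O_p_aug = np.append(O_p, 1.0)
            O_q = sigmoid(W_qp @ O_p_aug)              # output-layer outputs
            delta_q = (r - O_q) * O_q * (1 - O_q)      # (r_q - O_q) h'(I_q)
            # hidden-layer error term: h'(I_p) * sum_q delta_q * w_qp (bias column excluded)
            delta_p = O_p * (1 - O_p) * (W_qp[:, :-1].T @ delta_q)
            W_qp += alpha * np.outer(delta_q, O_p_aug) # delta_w_qp = alpha * delta_q * O_p
            W_jk += alpha * np.outer(delta_p, x_aug)   # delta_w_jk = alpha * delta_j * O_k
    return W_jk, W_qp

# Illustrative use: an XOR-like two-class problem (not separable by a single perceptron).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
R = np.array([[0], [1], [1], [0]], dtype=float)
W_jk, W_qp = train_backprop(X, R)
for x in X:
    O_p = np.append(sigmoid(W_jk @ np.append(x, 1.0)), 1.0)
    print(x, sigmoid(W_qp @ O_p))    # outputs should be close to the desired responses
```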


12.3 Structural Methods

• Structural methods are based on the structural relationships inherent in a pattern's shape.

• 12.3.1 Matching Shape Numbers
• 12.3.2 String Matching
• 12.3.3 Syntactic Recognition of Strings


12.3.1 Matching Shape Numbers

• Referring to Section 11.2.2, the degree of similarity, k, between two region boundaries (shapes) is defined as the largest order for which their shape numbers still coincide.

• For example, let a and b denote the shape numbers of closed boundaries represented by 4-directional chain codes. These two shapes have a degree of similarity k if

    sj(a) = sj(b)   for j = 4, 6, 8, …, k   and
    sj(a) ≠ sj(b)   for j = k + 2, k + 4, …

where s indicates shape number and the subscript indicates order.

• The distance between two shapes a and b is defined as the inverse of their degree of similarity:

    D(a, b) = 1/k


• The distance satisfies the following properties:
  – D(a, b) ≥ 0
  – D(a, b) = 0 if and only if a = b
  – D(a, c) ≤ max[D(a, b), D(b, c)]

• Example: find the closest match between a given shape f and five other shapes (a–e), as shown in Fig. 12.24.
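A minimal sketch of this distance in Python. It assumes the shape numbers of each boundary have already been computed (Section 11.2.2) and are supplied as a dict mapping order to shape-number string; the example values below are placeholders, not the shapes of Fig. 12.24:

```python
def degree_of_similarity(shape_a, shape_b, max_order=32):
    """shape_a, shape_b: dicts mapping order (4, 6, 8, ...) to the shape number
    (a string of chain-code symbols) at that order.

    Returns k, the largest order for which the shape numbers still coincide."""
    k = 0
    for order in range(4, max_order + 1, 2):
        if order in shape_a and order in shape_b and shape_a[order] == shape_b[order]:
            k = order
        else:
            break
    return k

def shape_distance(shape_a, shape_b):
    """D(a, b) = 1/k; smaller distance means more similar shapes."""
    k = degree_of_similarity(shape_a, shape_b)
    return float('inf') if k == 0 else 1.0 / k

# Placeholder shape numbers for two boundaries (coincide up to order 6, differ at order 8).
a = {4: '0303', 6: '003033', 8: '00330033'}
b = {4: '0303', 6: '003033', 8: '00303303'}
print(degree_of_similarity(a, b), shape_distance(a, b))   # 6 and 1/6
```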


12.3.2 String Matching

• Suppose two region boundaries, a and b, are coded into the strings a1a2…an and b1b2…bm, respectively.

• Let α represent the number of matches between the two strings, where a match occurs in the kth position if ak = bk. The number of symbols that do not match is

    β = max(|a|, |b|) − α

where |a| is the length (number of symbols) of string a; β = 0 if and only if a and b are identical.

• A simple measure of similarity between a and b is the ratio

    R = α/β

which is infinite for a perfect match and 0 when no symbols match.

• Because matching is done symbol by symbol, the starting point on each boundary is important.

• Example: Figs. 12.25(a) and (b) show sample boundaries of two object classes, and Figs. 12.25(c) and (d) show their polygonal approximations. Strings were formed from the polygons by computing the interior angle θ between segments as each polygon was traversed clockwise.
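A minimal sketch of this similarity measure in Python; the symbol strings below are illustrative placeholders rather than the actual strings of Fig. 12.25:

```python
def string_similarity(a, b):
    """R = alpha / beta, where alpha is the number of position-wise matches
    and beta = max(|a|, |b|) - alpha is the number of mismatches."""
    alpha = sum(1 for x, y in zip(a, b) if x == y)
    beta = max(len(a), len(b)) - alpha
    return float('inf') if beta == 0 else alpha / beta

# Illustrative angle-symbol strings (one of eight symbols per 45-degree bin).
print(string_similarity("abbcbbab", "abbcbbab"))   # identical -> infinite R
print(string_similarity("abbcbbab", "abbcbbbb"))   # 7 matches, 1 mismatch -> R = 7.0
```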


• Angles are coded into one of eight possible symbols corresponding to 45° increments.


12.3.3 Syntactic Recognition of Strings

• Syntactic pattern recognition requires: (1) a set of pattern primitives; (2) a set of rules (a grammar) that governs their interconnection; and (3) a recognizer (automaton) whose structure is determined by the set of rules in the grammar.

• A string grammar is defined as a 4-tuple G = (N, Σ, P, S), where
  – N is a finite set of variables called nonterminals,
  – Σ is a finite set of constants called terminals,
  – P is a set of rewriting rules called productions, and
  – S ∈ N is the starting symbol.
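To make the definition concrete, here is a minimal sketch using a hypothetical grammar (not the one in the slides' example): G = (N, Σ, P, S) with N = {S}, Σ = {a, b}, and productions S → aS and S → b, which generates strings of the form a…ab. A pattern is assigned to the class associated with G if its primitive string can be derived by the grammar; the simple recursive recognizer below checks this directly:

```python
def generates(s):
    """Recognizer for the hypothetical grammar G = ({S}, {a, b}, {S -> aS, S -> b}, S).

    Accepts exactly the strings a^n b (n >= 0): zero or more 'a's followed by one 'b'."""
    if s == "b":              # production S -> b
        return True
    if s.startswith("a"):     # production S -> aS, then derive the rest from S
        return generates(s[1:])
    return False

# Patterns whose primitive strings are derivable by G are assigned to this class.
for s in ["b", "ab", "aaab", "aba", "aa"]:
    print(s, generates(s))    # True, True, True, False, False
```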
