Digital Image Processing - Image Restoration

Unit IV: Image Restoration

Two mark Questions

1. What is image restoration?

Restoration is a process of reconstructing or recovering an image that has been degraded by using a priori knowledge of the degradation phenomenon. Thus restoration techniques are oriented towards modeling the degradation and applying the inverse process in order to recover the original image.


2. What is meant by unconstrained restoration?

In the absence of any knowledge about the noise n, a meaningful criterion is to seek an estimate $\hat{f}$ such that $H\hat{f}$ approximates g in a least-squares sense, assuming the noise term is as small as possible.

It is also known as the least-squares error approach.

$$n = g - Hf$$

To estimate the original image, the noise norm $\|n\|^2 = \|g - H\hat{f}\|^2$ is minimized, which gives $\hat{f} = H^{-1} g$ when H is square and invertible.

Where,

H = system operator (degradation matrix)

$\hat{f}$ = estimated input image

g = degraded image

3. What is meant by constrained restoration?

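It is also known as the constrained least-squares approach. With $n = g - Hf$, the estimate $\hat{f}$ is chosen to minimize a criterion $\|C\hat{f}\|^2$, where C is a linear operator on f, subject to the constraint $\|g - H\hat{f}\|^2 = \|n\|^2$. The solution is $\hat{f} = (H^T H + \alpha C^T C)^{-1} H^T g$, where $\alpha$ is adjusted so that the constraint is satisfied.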

4. What is inverse filtering?

The simplest approach to restoration is direct inverse filtering: an estimate $\hat{F}(u,v)$ of the transform of the original image is obtained simply by dividing the transform of the degraded image, G(u,v), by the degradation function H(u,v).

$$\hat{F}(u,v) = \frac{G(u,v)}{H(u,v)}$$

Inverse filtering is the process of recovering the input of the system from its output.

5. What is iterative restoration?

In general, iterative restoration refers to any technique that attempts to minimize a function of the form M(f) using an updating rule for the partially restored image.

6. What is a pattern?

Pattern is a quantitative or structural description of an object or some other entity of interest in an image. It is formed by one or more descriptors.

7. What is a pattern class?

A pattern class is a family of patterns that share some common properties. Pattern classes are denoted w1, w2, w3, ..., wM, where M is the number of classes.

8. What are optimal statistical classifiers?

Probability considerations arise in most fields that deal with measuring and interpreting physical events, and they are important in pattern recognition because of the randomness under which pattern classes normally are generated. It is possible to derive a classification approach that is optimal in the sense that, on average, its use yields the lowest probability of committing classification errors.

9. Give the mathematical form of the Bayes decision function.

10. What are artificial neural networks?

11. What is a multilayer feedforward neural network?

It consists of layers of structurally identical computing nodes (neurons) arranged so that the output of every neuron in one layer feeds into the input of every neuron in the next layer. The number of neurons in the first layer, called layer A, is NA. Often NA = n, the dimensionality of the input pattern vectors. The number of neurons in the output layer, called layer Q, is denoted NQ. NQ equals W, the number of pattern classes that the neural network has been trained to recognize. The network recognizes a pattern vector x as belonging to class wi if the ith output of the network is “high” while all other outputs are “low”.

12. What is meant by “training” in artificial neural network?

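Training sets the weights so as to minimize the output error, for example with the Levenberg-Marquardt algorithm (a multidimensional descent method related to steepest descent). Training is computationally very expensive. If there are x input nodes, t output nodes, and p hidden nodes in a single hidden layer, then the number of weights is (x + t)p, not counting bias terms.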

13. What is the concept behind algebraic approach to restoration?

The concept of algebraic approach is to estimate the original image which minimizes a predefined criterion of performances.

14. What are the methods of estimating the degradation function?

The three methods of estimating the degradation function are:

Observation

Experimentation

Mathematical modeling

15. How is the blur caused by uniform linear motion removed?

An image f(x,y) undergoes planar motion in the x and y-direction and x0(t) and y0(t) are the time varying components of motion. The total exposure at any point of the recording medium (digital memory) is obtained by integrating the instantaneous exposure over the time interval during which the imaging system shutter is open.

16. What is meant by least mean square filter?

The limitation of the inverse and pseudo-inverse filters is that they are very sensitive to noise. Wiener filtering (the least mean square filter) is a method of restoring images in the presence of blur as well as noise.

17. Give the difference between Enhancement and Restoration.

Enhancement techniques are based primarily on the pleasing aspects they might present to the viewer, for example contrast stretching, whereas removal of image blur by applying a deblurring function is considered a restoration technique.

18. What are the properties of Linear Operator?

Additivity

Homogeneity

19. What are the methods of algebraic approach?

Unconstrained restoration approach

Constrained restoration approach

20. What are the types of noise models?

Gaussian noise

Rayleigh noise

Erlang noise

Exponential noise

Uniform noise

Impulse noise
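The noise models listed above can be sampled with NumPy's random generator. This is a minimal sketch: the scale parameters, the image size, and the helper name `add_impulse_noise` are illustrative assumptions, not values from these notes.

```python
import numpy as np

rng = np.random.default_rng(0)
size = (256, 256)  # assumed image size

# Each array holds one noise field; the parameters are arbitrary choices.
gaussian = rng.normal(loc=0.0, scale=10.0, size=size)
rayleigh = rng.rayleigh(scale=10.0, size=size)
erlang = rng.gamma(shape=2, scale=5.0, size=size)  # Erlang = gamma with integer shape
exponential = rng.exponential(scale=10.0, size=size)
uniform = rng.uniform(low=-10.0, high=10.0, size=size)


def add_impulse_noise(image, p_pepper=0.05, p_salt=0.05):
    """Return a copy of image with pepper (0) and salt (255) impulses."""
    noisy = image.copy()
    r = rng.random(image.shape)
    noisy[r < p_pepper] = 0.0
    noisy[r > 1.0 - p_salt] = 255.0
    return noisy


flat = np.full(size, 128.0)  # constant gray test image
impulse = add_impulse_noise(flat)
```

The first five fields are additive (the noisy image is the image plus the field), while impulse noise replaces pixel values outright.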

21. What is meant by noise probability density function?

The spatial noise descriptor is the statistical behavior of gray level values in the noise component of the model.

22. What is meant by blind image restoration?

Degradation may be difficult to measure or may be time varying in an unpredictable manner. In such cases information about the degradation must be extracted from the observed image either explicitly or implicitly. This task is called blind image restoration.

23. What are the approaches for blind image restoration?

Direct measurement

Indirect estimation

24. What is blur impulse response and noise levels?

Blur impulse response: This parameter is measured by isolating an image of a suspected object within a picture.

Noise levels: The noise of an observed image can be estimated by measuring the image covariance over a region of constant background luminance.

25. What is meant by indirect estimation?

Indirect estimation method employs temporal or spatial averaging to either obtain a restoration or to obtain key elements of an image restoration algorithm.

Twelve mark Questions

1. Explain the degradation model for (i) continuous functions and (ii) the discrete formulation.

• Restoration attempts to reconstruct or recover an image that has been degraded by using a priori knowledge of the degradation phenomenon.

• Restoration techniques are oriented toward modeling the degradation and applying the inverse process in order to recover the original image.

Degradation models:

• Many types of degradation can be approximated by linear, space-invariant processes, which can take advantage of the mature techniques developed for linear systems.

• Non-linear and space-variant models are more accurate, but they are difficult to solve or unsolvable.

Estimating the degradation function

Estimation by image observation: the degradation system H is completely characterized by its impulse response.

• Select a small section of the degraded image, $g_s(x,y)$.

• Reconstruct an unblurred image of the same size, $\hat{f}_s(x,y)$.

• The degradation function can then be estimated by

$$H_s(u,v) = \frac{G_s(u,v)}{\hat{F}_s(u,v)}$$

Estimation by experimentation: ignoring the noise term, G(u,v) = F(u,v)H(u,v). If F(u,v) is the Fourier transform of a point source (impulse), which is a constant, then G(u,v) approximates H(u,v) up to that constant.

Fig: A model of the image degradation / restoration process

Continuous degradation model

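Motion blur. It occurs when there is relative motion between the object and the camera during exposure.

$$h(i) = \begin{cases} \dfrac{1}{L}, & \text{if } -\dfrac{L}{2} \le i \le \dfrac{L}{2} \\ 0, & \text{otherwise} \end{cases}$$

Atmospheric turbulence. It is due to random variations in the refractive index of the medium between the object and the imaging system, and it occurs in the imaging of astronomical objects.

$$h(i,j) = K \exp\left(-\frac{i^2 + j^2}{2\sigma^2}\right)$$

Uniform out-of-focus blur

$$h(i,j) = \begin{cases} \dfrac{1}{\pi R^2}, & \text{if } \sqrt{i^2 + j^2} \le R \\ 0, & \text{otherwise} \end{cases}$$

Uniform 2-D blur

$$h(i,j) = \begin{cases} \dfrac{1}{L^2}, & \text{if } -\dfrac{L}{2} \le i, j \le \dfrac{L}{2} \\ 0, & \text{otherwise} \end{cases}$$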
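The four blur models above can be turned into discrete kernels. The sketch below is one possible discretization: the function names are hypothetical, and the disc and Gaussian kernels are normalized by their discrete sum (an assumption made here so that each kernel preserves mean brightness) rather than by the continuous constants.

```python
import numpy as np


def motion_blur_1d(L):
    """1-D motion blur: L equal taps of height 1/L."""
    return np.full(L, 1.0 / L)


def uniform_2d_blur(L):
    """L x L box blur: every tap equals 1/L^2."""
    return np.full((L, L), 1.0 / L**2)


def out_of_focus_blur(R):
    """Disc of radius R, normalized so the taps sum to 1."""
    r = int(np.ceil(R))
    i, j = np.mgrid[-r:r + 1, -r:r + 1]
    h = (i**2 + j**2 <= R**2).astype(float)
    return h / h.sum()


def turbulence_blur(sigma, half_width):
    """Gaussian-shaped kernel K * exp(-(i^2 + j^2) / (2 sigma^2))."""
    i, j = np.mgrid[-half_width:half_width + 1, -half_width:half_width + 1]
    h = np.exp(-(i**2 + j**2) / (2.0 * sigma**2))
    return h / h.sum()  # K is chosen so that the kernel sums to 1
```

For example, `out_of_focus_blur(3.0)` returns a 7 x 7 array whose nonzero entries form a disc.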

Two-dimensional discrete degradation model: circular convolution

Suppose we have a two-dimensional discrete signal $f(i,j)$ of size $A \times B$ samples which is subject to a degradation process. The degradation can now be modeled by a two-dimensional discrete impulse response $h(i,j)$ of size $C \times D$ samples. We form the extended versions of $f(i,j)$ and $h(i,j)$, both of size $M \times N$, where $M \ge A + C - 1$ and $N \ge B + D - 1$, and periodic with period $M \times N$. These can be denoted as $f_e(i,j)$ and $h_e(i,j)$. For a space-invariant degradation process we obtain

$$y_e(i,j) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f_e(m,n)\, h_e(i-m, j-n) + n_e(i,j)$$

Using matrix notation we can write

$$y = Hf + n$$

where f and y are MN-dimensional column vectors that represent the lexicographic ordering of the images $f_e(i,j)$ and $y_e(i,j)$ respectively, and H is the block-circulant matrix

$$H = \begin{bmatrix} H_0 & H_{M-1} & \cdots & H_1 \\ H_1 & H_0 & \cdots & H_2 \\ \vdots & \vdots & \ddots & \vdots \\ H_{M-1} & H_{M-2} & \cdots & H_0 \end{bmatrix}$$

$$H_j = \begin{bmatrix} h_e(j,0) & h_e(j,N-1) & \cdots & h_e(j,1) \\ h_e(j,1) & h_e(j,0) & \cdots & h_e(j,2) \\ \vdots & \vdots & \ddots & \vdots \\ h_e(j,N-1) & h_e(j,N-2) & \cdots & h_e(j,0) \end{bmatrix}$$

The analysis of the diagonalisation of H is a straightforward extension of the one-dimensional case. In that case we end up with the following set of $M \times N$ scalar problems:

$$Y(u,v) = MN\,H(u,v)\,F(u,v) + N(u,v), \quad u = 0, 1, \ldots, M-1, \quad v = 0, 1, \ldots, N-1$$
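The equivalence between circular convolution of the extended images and a per-frequency product can be checked numerically. The sketch below uses arbitrary small sizes and random data as assumptions. Note that NumPy places the 1/(MN) factor in the inverse transform, so the product appears without the explicit MN factor that arises when the forward DFT is the normalized one.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = 5, 6, 3, 3          # assumed image and kernel sizes
M, N = A + C - 1, B + D - 1      # extended size

f = rng.random((A, B))
h = rng.random((C, D))

# Zero-padded (extended) versions of f and h, both M x N.
fe = np.zeros((M, N))
fe[:A, :B] = f
he = np.zeros((M, N))
he[:C, :D] = h

# Direct circular convolution: indices wrap around with period M x N.
y = np.zeros((M, N))
for i in range(M):
    for j in range(N):
        for m in range(M):
            for n in range(N):
                y[i, j] += fe[m, n] * he[(i - m) % M, (j - n) % N]

# Same result as a set of M x N scalar products in the DFT domain.
y_fft = np.real(np.fft.ifft2(np.fft.fft2(fe) * np.fft.fft2(he)))
```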

2. Explain algebraic approach to (i) unconstrained restoration and (ii)

constrained restoration.

(i) unconstrained restoration

In the absence of any knowledge about the noise n, a meaningful criterion is to seek an estimate $\hat{f}$ such that $H\hat{f}$ approximates g in a least-squares sense, assuming the noise term is as small as possible.

It is also known as the least-squares error approach.

$$n = g - Hf$$

To estimate the original image, the noise norm $\|n\|^2 = \|g - H\hat{f}\|^2$ is minimized, which gives $\hat{f} = (H^T H)^{-1} H^T g$, and $\hat{f} = H^{-1} g$ when H is square and invertible.

Where,

H = system operator (degradation matrix)

$\hat{f}$ = estimated input image

g = degraded image
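As a small illustration of the unconstrained least-squares estimate, the sketch below builds a 1-D circulant blur matrix and recovers a signal from its blurred version. The kernel taps, the signal length, and the noise level are assumptions chosen so that H is invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

# Circulant blur matrix: column k is the kernel shifted circularly by k.
kernel = np.zeros(n)
kernel[[0, 1, n - 1]] = [0.6, 0.2, 0.2]
H = np.stack([np.roll(kernel, k) for k in range(n)], axis=1)

f = rng.random(n)   # "original" signal
g = H @ f           # degraded signal, noise-free

# Least-squares estimate, computed two equivalent ways.
f_hat_lstsq = np.linalg.lstsq(H, g, rcond=None)[0]
f_hat_normal = np.linalg.solve(H.T @ H, H.T @ g)

# With noise, the same estimate passes the noise through the inverse of H.
noise = 0.01 * rng.standard_normal(n)
f_hat_noisy = np.linalg.solve(H, g + noise)
```

In the noise-free case both estimates reproduce f. In the noisy case the error is the noise passed through the inverse of H, which is why the constrained approach is introduced next.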

(ii) constrained restoration

The set-based approach described previously can be generalized so that any number of prior constraints can be imposed, as long as the constraint sets are closed and convex. If the constraint sets have a non-empty intersection, then a solution that belongs to the intersection set can be found by the method of projections onto convex sets (POCS). Any solution in the intersection set is consistent with the a priori constraints and is therefore a feasible solution. Let $Q_1, Q_2, \ldots, Q_m$ be closed convex sets in a finite-dimensional vector space, with $P_1, P_2, \ldots, P_m$ their respective projectors. The iterative procedure

$$f_{k+1} = P_1 P_2 \cdots P_m f_k$$

converges to a vector that belongs to the intersection of the sets $Q_i$, $i = 1, 2, \ldots, m$, for any starting vector $f_0$. An iteration of the form $f_{k+1} = P_1 P_2 f_k$ can be applied to the problem described previously, where we seek an image which lies in the intersection of the two ellipsoids defined by

$$Q_{f|y} = \{ f \mid \|y - Hf\|^2 \le E^2 \} \quad \text{and} \quad Q_f = \{ f \mid \|Cf\|^2 \le \varepsilon^2 \}$$

The respective projections $P_1 f$ and $P_2 f$ are defined by

$$P_1 f = f + \lambda_1 (I + \lambda_1 H^T H)^{-1} H^T (y - Hf)$$

$$P_2 f = [I - \lambda_2 (I + \lambda_2 C^T C)^{-1} C^T C] f$$

3. What is the Inverse Filtering method for restoration of images? Explain.

Inverse filtering:

Degradation model

$$g(x,y) = f(x,y) * h(x,y) + \eta(x,y)$$

$$G(u,v) = F(u,v)\,H(u,v) + N(u,v)$$

This expression tells us that even if we know the degradation function we cannot recover the undegraded image exactly because of the random noise, whose Fourier transform is not known. Image restoration is an ill-posed problem. When H(u,v) is very small, the noise term dominates the restoration result.

Inverse filter

$$\hat{F}(u,v) = \frac{G(u,v)}{H(u,v)} = F(u,v) + \frac{N(u,v)}{H(u,v)}$$

Assume h is known (typically a low-pass filter).

Inverse filter: writing the restoration filter as G(u,v) (in this paragraph only, G denotes the filter, not the degraded image), G(u,v) = 1 / H(u,v).

Problems with inverse filtering:

• H(u,v) = 0 for some (u,v).

• In the noisy case, y = x * h + n, where n is additive noise, so the estimate $\hat{X}(u,v) = Y(u,v)\,G(u,v)$ contains the amplified noise term N(u,v)/H(u,v).

The simplest approach to restoration is direct inverse filtering, where we compute an estimate, $\hat{F}(u,v)$, of the transform of the original image simply by dividing the transform of the degraded image, G(u,v), by the degradation function:

$$\hat{F}(u,v) = \frac{G(u,v)}{H(u,v)} = F(u,v) + \frac{N(u,v)}{H(u,v)}$$

This is an interesting expression. It tells us that even if we know the degradation function we cannot recover the undegraded image exactly, because N(u,v) is a random function whose Fourier transform is not known. There is more bad news. If the degradation function has zero or very small values, then the ratio N(u,v)/H(u,v) could easily dominate the estimate $\hat{F}(u,v)$. This, in fact, is frequently the case. One approach to get around the zero or small-value problem is to limit the filter frequencies to values near the origin.

H(0,0) is equal to the average value of h(x,y), and this is usually the highest value of H(u,v) in the frequency domain. Thus, by limiting the analysis to frequencies near the origin, we reduce the probability of encountering zero values.

The objective is to minimize

$$J(f) = \|n(f)\|^2 = \|y - Hf\|^2$$

We set the first derivative of the cost function equal to zero:

$$\frac{\partial J(f)}{\partial f} = 0 \Rightarrow -2H^T (y - Hf) = 0 \Rightarrow f = (H^T H)^{-1} H^T y$$

If $M = N$ and $H^{-1}$ exists then

$$f = H^{-1} y$$

According to the previous analysis, if H (and therefore $H^{-1}$) is block circulant, the above problem can be solved as a set of $M \times N$ scalar problems as follows:

$$F(u,v) = \frac{H^*(u,v)\,Y(u,v)}{|H(u,v)|^2} \Rightarrow f(i,j) = \mathcal{F}^{-1}\left[\frac{H^*(u,v)\,Y(u,v)}{|H(u,v)|^2}\right]$$

Computational issues concerning inverse filtering

(I) Suppose first that the additive noise $n(i,j)$ is negligible. A problem arises if $H(u,v)$ becomes very small or zero for some point $(u,v)$ or for a whole region in the $(u,v)$ plane. In that region inverse filtering cannot be applied. Note that in most real applications $H(u,v)$ drops off rapidly as a function of distance from the origin. The solution is that if these points are known they can be neglected in the computation of $F(u,v)$.

(II) In the presence of external noise we have that

$$F(u,v) = \frac{H^*(u,v)\,[Y(u,v) - N(u,v)]}{|H(u,v)|^2} = \frac{H^*(u,v)\,Y(u,v)}{|H(u,v)|^2} - \frac{H^*(u,v)\,N(u,v)}{|H(u,v)|^2}$$

so the inverse-filter estimate $\hat{F}(u,v) = H^*(u,v)\,Y(u,v) / |H(u,v)|^2$ satisfies

$$\hat{F}(u,v) = F(u,v) + \frac{N(u,v)}{H(u,v)}$$

If $H(u,v)$ becomes very small, the term $N(u,v)/H(u,v)$ dominates the result. The solution is again to carry out the restoration process in a limited neighborhood about the origin where $H(u,v)$ is not very small. This procedure is called pseudoinverse filtering. In that case we set

$$\hat{F}(u,v) = \begin{cases} \dfrac{H^*(u,v)\,Y(u,v)}{|H(u,v)|^2}, & |H(u,v)| \ge T \\ 0, & |H(u,v)| < T \end{cases}$$

The threshold T is defined by the user. In general, the noise may very well possess large components at high frequencies $(u,v)$, while $H(u,v)$ and $Y(u,v)$ normally will be dominated by low-frequency components.
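The thresholded rule above can be sketched with NumPy's FFT, under the assumption that the blur acts as a circular convolution. The function name `pseudoinverse_filter`, the test kernel, and the default threshold are illustrative choices, not part of the notes.

```python
import numpy as np


def pseudoinverse_filter(y, h, T=0.1):
    """Inverse-filter y where |H| >= T and set the estimate to 0 elsewhere."""
    H = np.fft.fft2(h, s=y.shape)   # zero-pad h to the image size
    Y = np.fft.fft2(y)
    keep = np.abs(H) >= T
    F_hat = np.zeros_like(Y)
    F_hat[keep] = np.conj(H[keep]) * Y[keep] / np.abs(H[keep]) ** 2
    return np.real(np.fft.ifft2(F_hat))


# Demo: a mild blur whose |H| never falls below 0.2, so nothing is discarded.
rng = np.random.default_rng(0)
f = rng.random((16, 16))
h = np.array([[0.0, 0.1, 0.0],
              [0.1, 0.6, 0.1],
              [0.0, 0.1, 0.0]])
y = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h, s=f.shape)))
restored = pseudoinverse_filter(y, h, T=0.1)
```

With a stronger blur or a larger T, some frequencies are zeroed and the restoration loses the corresponding detail instead of amplifying noise there.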

4. Discuss the Wiener Filter method for restoration of images.

Wiener filter:

• It removes the additive noise and inverts the blurring simultaneously.

• Wiener filtering is optimal in terms of the mean square error.

• In most images, adjacent pixels are highly correlated, while the gray levels of widely separated pixels are only loosely correlated.

• Therefore, the autocorrelation function of typical images generally decreases away from the origin.

• The power spectrum of an image is the Fourier transform of its autocorrelation function, so we can argue that the power spectrum of an image generally decreases with frequency.

• Typical noise sources have either a flat power spectrum or one that decreases with frequency more slowly than typical image power spectra.

• Therefore, the expected situation is for the signal to dominate the spectrum at low frequencies, while the noise dominates at high frequencies.

The Wiener filter in the frequency domain is

$$W(f_1,f_2) = \frac{H^*(f_1,f_2)\,S_{xx}(f_1,f_2)}{|H(f_1,f_2)|^2\,S_{xx}(f_1,f_2) + S_{\eta\eta}(f_1,f_2)}$$

where Sxx(f1,f2) and Sηη(f1,f2) are respectively the power spectra of the original image and the additive noise, and H(f1,f2) is the blurring filter.

The Wiener filter has two separate parts, an inverse filtering part and a noise smoothing part. It performs the deconvolution by inverse filtering (highpass filtering) and removes the noise with a compression operation (lowpass filtering).

Wiener Filter Formulation

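Degradation model

$$g(x,y) = f(x,y) * h(x,y) + \eta(x,y)$$

$$G(u,v) = F(u,v)\,H(u,v) + N(u,v)$$

Wiener filter

$$\hat{F}(u,v) = \left[\frac{1}{H(u,v)}\,\frac{|H(u,v)|^2}{|H(u,v)|^2 + P_n(u,v)/P_f(u,v)}\right] G(u,v) \approx \left[\frac{1}{H(u,v)}\,\frac{|H(u,v)|^2}{|H(u,v)|^2 + K}\right] G(u,v)$$

where $P_n$ and $P_f$ are the power spectra of the noise and of the undegraded image, and the constant K approximates their ratio.

Least mean square filter:

$$\hat{F}(u,v) = \frac{H^*(u,v)\,G(u,v)}{|H(u,v)|^2 + S_n(u,v)/S_x(u,v)}$$

In practice:

$$\hat{F}(u,v) = \frac{H^*(u,v)\,G(u,v)}{|H(u,v)|^2 + K}$$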
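The practical form with a constant K can be sketched as follows, assuming circular convolution and additive white noise. The function name `wiener_filter`, the 5 x 5 box blur, the noise level, and K = 0.01 are assumptions for illustration only.

```python
import numpy as np


def wiener_filter(g, h, K=0.01):
    """Restore g with the practical Wiener filter H* G / (|H|^2 + K)."""
    H = np.fft.fft2(h, s=g.shape)   # zero-pad h to the image size
    G = np.fft.fft2(g)
    F_hat = np.conj(H) * G / (np.abs(H) ** 2 + K)
    return np.real(np.fft.ifft2(F_hat))


# Demo: blur with a 5 x 5 box, add noise, compare K = 0 (inverse) with K > 0.
rng = np.random.default_rng(0)
f = rng.random((32, 32))
h = np.ones((5, 5)) / 25.0
g = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h, s=f.shape)))
g_noisy = g + 0.05 * rng.standard_normal(f.shape)

err_inverse = np.mean((wiener_filter(g_noisy, h, K=0.0) - f) ** 2)
err_wiener = np.mean((wiener_filter(g_noisy, h, K=0.01) - f) ** 2)
```

Setting K = 0 reduces the filter to the plain inverse filter, which amplifies noise wherever |H| is small. A positive K bounds the gain at those frequencies.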

Wiener filtering - problems

• The power spectra of the undegraded image and noise must be known.

• It weights all errors equally regardless of their location in the image, while the eye is considerably more tolerant of errors in the dark areas and high-gradient areas of the image.

• In minimizing the mean square error, the Wiener filter also smooths the image more than the eye would prefer.

Assumptions

o The original image and noise are statistically independent

o The power spectral densities of the original image and noise are known

o Both the original image and noise are zero mean

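Wiener Filtering: Special Cases

Balancing between two jobs for deblurring a noisy image:

– HPF for de-blurring (undoing the H distortion)

– LPF for suppressing noise

$$G_{\text{wiener}}(\omega_1,\omega_2) = \frac{H^*(\omega_1,\omega_2)}{|H(\omega_1,\omega_2)|^2 + S_{\eta\eta}(\omega_1,\omega_2)/S_{uu}(\omega_1,\omega_2)}$$

Noiseless case ~ $S_{\eta\eta} = 0$

– The Wiener filter becomes the pseudo-inverse filter as $S_{\eta\eta} \to 0$:

$$G(\omega_1,\omega_2)\big|_{S_{\eta\eta} \to 0} = \lim_{S_{\eta\eta} \to 0} \frac{H^*}{|H|^2 + S_{\eta\eta}/S_{uu}} = \begin{cases} \dfrac{1}{H(\omega_1,\omega_2)}, & \text{if } |H(\omega_1,\omega_2)| \ne 0 \\ 0, & \text{if } |H(\omega_1,\omega_2)| = 0 \end{cases}$$

No-blur case ~ H = 1 (Wiener smoothing filter)

– Zero-phase filter that attenuates noise according to the SNR at each frequency:

$$G_{\text{wiener}}(\omega_1,\omega_2)\big|_{H=1} = \frac{S_{uu}(\omega_1,\omega_2)}{S_{uu}(\omega_1,\omega_2) + S_{\eta\eta}(\omega_1,\omega_2)} = \frac{\mathrm{SNR}(\omega_1,\omega_2)}{\mathrm{SNR}(\omega_1,\omega_2) + 1}$$

where $\mathrm{SNR} = S_{uu}/S_{\eta\eta}$.

In practice, we often approximate the SNR as a constant to work around the need to estimate the image power spectral density.

Note that the phase response of the Wiener filter is the same as that of the inverse filter 1/H(ω1, ω2); that is, it does not compensate for phase distortion due to noise.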

Comparisons:

Fig: Wiener filter characteristics (the Wiener filter compared with the thresholded inverse filter, which chooses G(ω1, ω2) = 1/H(ω1, ω2) only where |H(ω1, ω2)| is not too small)

5. What is constrained least squares restoration? Explain.

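Only the mean and variance of the noise are required.

The degradation model in vector-matrix form

$$\underset{MN \times 1}{g} = \underset{MN \times MN}{H}\;\underset{MN \times 1}{f} + \underset{MN \times 1}{\eta}$$

The objective function is a measure of image smoothness based on the Laplacian, minimized subject to the constraint imposed by the degradation model.

The solution

$$\hat{F}(u,v) = \left[\frac{H^*(u,v)}{|H(u,v)|^2 + \gamma\,|P(u,v)|^2}\right] G(u,v)$$

where $\gamma$ is a parameter adjusted so that the constraint is satisfied and P(u,v) is the Fourier transform of the Laplacian operator

$$p(x,y) = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$$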
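A frequency-domain sketch of the constrained least squares solution is given below. It assumes circular convolution and uses the standard form H* G / (|H|^2 + gamma |P|^2) with the Laplacian mask; the function name `cls_filter` and the fixed value of gamma are assumptions, since in the full method gamma is adjusted until the noise constraint is met.

```python
import numpy as np


def cls_filter(g, h, gamma=0.01):
    """Constrained least squares restoration with a Laplacian smoothness term."""
    p = np.array([[0.0, -1.0, 0.0],
                  [-1.0, 4.0, -1.0],
                  [0.0, -1.0, 0.0]])
    H = np.fft.fft2(h, s=g.shape)   # blur transfer function
    P = np.fft.fft2(p, s=g.shape)   # Laplacian transfer function
    G = np.fft.fft2(g)
    F_hat = np.conj(H) * G / (np.abs(H) ** 2 + gamma * np.abs(P) ** 2)
    return np.real(np.fft.ifft2(F_hat))


# Demo on a noise-free blurred image.
rng = np.random.default_rng(0)
f = rng.random((16, 16))
h = np.array([[0.0, 0.1, 0.0],
              [0.1, 0.6, 0.1],
              [0.0, 0.1, 0.0]])
g = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h, s=f.shape)))

f_exact = cls_filter(g, h, gamma=0.0)    # reduces to inverse filtering
f_smooth = cls_filter(g, h, gamma=0.1)   # trades detail for smoothness
```

Because P(0,0) = 0, the mean gray level passes through unchanged, while larger gamma attenuates high frequencies more strongly.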

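In the unconstrained case we seek a solution that minimizes the function

$$M(f) = \|y - Hf\|^2$$

A necessary condition for M(f) to have a minimum is that its gradient with respect to f is equal to zero. This gradient is

$$\Phi(f) = \frac{\partial M(f)}{\partial f} = -2(H^T y - H^T H f)$$

Using a steepest-descent type of optimization we can formulate an iterative rule as follows:

$$f_0 = \beta H^T y$$

$$f_{k+1} = f_k - \frac{\beta}{2}\,\frac{\partial M(f_k)}{\partial f_k} = f_k + \beta H^T (y - H f_k) = \beta H^T y + (I - \beta H^T H) f_k$$

In constrained least squares restoration the criterion is

$$\min\; C = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\nabla^2 f(x,y)]^2$$

subject to

$$\|g - H\hat{f}\|^2 = \|\eta\|^2$$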

Constrained least squares iteration

In this method we attempt to solve the problem of constrained restoration iteratively. As already mentioned, the following functional is minimized:

$$M(f,\alpha) = \|y - Hf\|^2 + \alpha \|Cf\|^2$$

The necessary condition for a minimum is that the gradient of $M(f,\alpha)$ is equal to zero. That gradient is

$$\Phi(f) = \nabla_f M(f,\alpha) = 2[(H^T H + \alpha C^T C) f - H^T y]$$

The initial estimate and the updating rule for obtaining the restored image are now given by

$$f_0 = \beta H^T y$$

$$f_{k+1} = f_k + \beta [H^T y - (H^T H + \alpha C^T C) f_k]$$

It can be proved that the above iteration (known as iterative CLS or the Tikhonov-Miller method) converges if

$$0 < \beta < \frac{2}{|\lambda_{\max}|}$$

where $\lambda_{\max}$ is the maximum eigenvalue of the matrix $(H^T H + \alpha C^T C)$.

If the matrices H and C are block-circulant the iteration can be implemented in the frequency domain.
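The iteration and its convergence condition can be tried on a small 1-D problem. In the sketch below the circulant blur H, the second-difference operator C, the value of alpha, the step size, and the iteration count are all assumptions chosen for illustration. The result is compared with the direct solution of (H^T H + alpha C^T C) f = H^T y.

```python
import numpy as np


def circulant(first_col):
    """Circulant matrix whose k-th column is first_col shifted by k."""
    n = len(first_col)
    return np.stack([np.roll(first_col, k) for k in range(n)], axis=1)


n = 16
rng = np.random.default_rng(2)

h = np.zeros(n)
h[[0, 1, n - 1]] = [0.6, 0.2, 0.2]      # blur kernel
c = np.zeros(n)
c[[0, 1, n - 1]] = [2.0, -1.0, -1.0]    # 1-D Laplacian (second difference)
H, C = circulant(h), circulant(c)

f_true = rng.random(n)
y = H @ f_true + 0.01 * rng.standard_normal(n)

alpha = 0.01
A = H.T @ H + alpha * C.T @ C
lam_max = np.linalg.eigvalsh(A).max()
beta = 1.0 / lam_max                    # lies inside (0, 2 / lam_max)

f = beta * H.T @ y                      # initial estimate f_0
for _ in range(2000):
    f = f + beta * (H.T @ y - A @ f)    # update rule

f_direct = np.linalg.solve(A, H.T @ y)  # fixed point of the iteration
```

Choosing beta outside the stated interval makes the iterates grow without bound, which is a quick way to see why the eigenvalue condition matters.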

6. What is meant by iterative restoration? Explain.

In general, iterative restoration refers to any technique that attempts to minimize a function of the form M(f) using an updating rule for the partially restored image.

Motivation of iterative method

• The Wiener filter needs prior knowledge of the power spectral density of the original image, which is often unavailable.

• The challenge is to estimate the power spectral density of the original image from a single copy of the degraded image.

Rationale of iterative method

• Use the restored image as an improved prototype of the original image, estimate its power spectral density, and construct the Wiener filter iteratively.

Basic iterative algorithm

• The degraded image is used as an initial estimate of the original image, and a restored image is obtained from the corresponding Wiener filter.

• The restored image is used as an updated estimate of the original image and leads to a new restoration.

• The iterations continue until the estimate converges.

Additive iterative algorithm

• It can be proved that in the basic iterative algorithm the estimate converges, but not to its true value.

• A correction term is added in each iteration.

Implementation

• The power spectral density is estimated using the periodogram.

• The degradation model is designed to be a low-pass filter (a circulant matrix).

Iterative Procedure

They refer to a class of iterative procedures that successively use the Wiener-filtered signal as an improved prototype to update the covariance estimates of the original image, as follows.

Step 0: Initial estimate of $R_{ff}$

$$R_{ff}(0) = R_{yy} = E\{y y^T\}$$

Step 1: Construct the $i$th restoration filter

$$W(i+1) = R_{ff}(i)\, H^T (H R_{ff}(i) H^T + R_{nn})^{-1}$$

Step 2: Obtain the $(i+1)$th estimate of the restored image

$$\hat{f}(i+1) = W(i+1)\, y$$

Step 3: Use $\hat{f}(i+1)$ to compute an improved estimate of $R_{ff}$ given by

$$R_{ff}(i+1) = E\{\hat{f}(i+1)\, \hat{f}^T(i+1)\}$$

Step 4: Increase i and repeat steps 1, 2 and 3.
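A frequency-domain sketch of the basic iterative procedure is shown below. It swaps the covariance matrices of the steps above for their spectral counterparts: the image power spectrum is estimated with a periodogram of the current estimate, and the noise is assumed white with known variance. The function name `iterative_wiener`, the blur kernel, the noise level, and the iteration count are assumptions.

```python
import numpy as np


def iterative_wiener(y, h, noise_var, n_iter=10):
    """Basic iterative Wiener restoration with periodogram spectrum estimates."""
    H = np.fft.fft2(h, s=y.shape)
    Y = np.fft.fft2(y)
    F_hat = Y.copy()                              # step 0: start from the degraded image
    for _ in range(n_iter):
        S_ff = np.abs(F_hat) ** 2 / y.size        # periodogram of the current estimate
        W = np.conj(H) * S_ff / (np.abs(H) ** 2 * S_ff + noise_var)   # step 1
        F_hat = W * Y                             # step 2: new restoration
    return np.real(np.fft.ifft2(F_hat))


# Demo on a blurred, noisy random image.
rng = np.random.default_rng(0)
f = rng.random((32, 32))
h = np.ones((3, 3)) / 9.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h, s=f.shape)))
sigma = 0.02
y_obs = blurred + sigma * rng.standard_normal(f.shape)
restored = iterative_wiener(y_obs, h, noise_var=sigma**2)
```

As the notes state, this basic version converges but not to the true image, which is the motivation for the additive variant with a correction term.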

7. Explain (i) minimum distance classifier (ii) matching by correlation.

i) Minimum distance classifier

Suppose that we define the prototype of each pattern class to be the mean vector of the patterns of that class:

$$m_j = \frac{1}{N_j} \sum_{x \in \omega_j} x, \quad j = 1, 2, \ldots, W \qquad (1)$$

where $N_j$ is the number of pattern vectors from class $\omega_j$ and the summation is taken over these vectors. As before, W is the number of pattern classes. One way to determine the class membership of an unknown pattern vector x is to assign it to the class of its closest prototype. Using the Euclidean distance to determine closeness reduces the problem to computing the distance measures

$$D_j(x) = \|x - m_j\|, \quad j = 1, 2, \ldots, W \qquad (2)$$

We then assign x to class $\omega_i$ if $D_i(x)$ is the smallest distance. That is, the smallest distance implies the best match in this formulation. It is not difficult to show that selecting the smallest distance is equivalent to evaluating the functions

$$d_j(x) = x^T m_j - \frac{1}{2} m_j^T m_j, \quad j = 1, 2, \ldots, W \qquad (3)$$

and assigning x to class $\omega_i$ if $d_i(x)$ yields the largest numerical value. This formulation agrees with the concept of a decision function. From Eq. (3), the decision boundary between classes $\omega_i$ and $\omega_j$ for a minimum distance classifier is

$$d_{ij}(x) = d_i(x) - d_j(x) = x^T (m_i - m_j) - \frac{1}{2} (m_i - m_j)^T (m_i + m_j) = 0 \qquad (4)$$

The surface given by Eq. (4) is the perpendicular bisector of the line segment joining $m_i$ and $m_j$. For n = 2 the perpendicular bisector is a line, for n = 3 it is a plane, and for n > 3 it is called a hyperplane.

The minimum distance classifier works well when the distance between means is large compared to the spread or randomness of each class with respect to its mean. It yields optimum performance when the distribution of each class about its mean is in the form of a spherical “hypercloud” in n-dimensional pattern space. The simultaneous occurrence of large mean separations and relatively small class spread occurs seldom in practice unless the system designer controls the nature of the input. An excellent example is provided by systems designed to read stylized character fonts, such as the familiar American Bankers Association font character set.

This particular font set consists of 14 characters that were purposely designed on a 9 × 7 grid in order to facilitate their reading. The characters usually are printed in ink that contains finely ground magnetic material. Prior to being read, the ink is subjected to a magnetic field, which accentuates each character to simplify detection. In other words, the segmentation problem is solved by artificially highlighting the key characteristics of each character.
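A minimal sketch of the minimum distance classifier follows. It uses the class means as prototypes and the decision functions d_j(x) = x^T m_j - (1/2) m_j^T m_j, whose largest value corresponds to the nearest mean. The function names and the toy 2-D training data are assumptions.

```python
import numpy as np


def fit_prototypes(X, labels):
    """Return the class labels and the mean vector (prototype) of each class."""
    classes = np.unique(labels)
    means = np.stack([X[labels == c].mean(axis=0) for c in classes])
    return classes, means


def classify(x, classes, means):
    """Assign x (one vector or a batch of rows) to the class with the largest d_j(x)."""
    d = x @ means.T - 0.5 * np.sum(means**2, axis=1)
    return classes[np.argmax(d, axis=-1)]


# Toy data: two well-separated clusters in the plane.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
classes, means = fit_prototypes(X, labels)

near_origin = classify(np.array([0.2, 0.1]), classes, means)
near_far = classify(np.array([4.8, 5.3]), classes, means)
```

Evaluating d_j(x) avoids computing square roots and gives the same decision as picking the smallest Euclidean distance.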

ii) Matching by correlation

Let w(x,y) be a subimage of size J×K within the image f(x,y) of size M×N, where we assume that J ≤ M and K ≤ N. Although the correlation approach can be expressed in vector form, working directly with an image or subimage format is more intuitive. In its simplest form, the correlation between f(x,y) and w(x,y) is

$$c(x,y) = \sum_s \sum_t f(s,t)\, w(x+s, y+t) \qquad (5)$$

for x = 0, 1, 2, ..., M-1 and y = 0, 1, 2, ..., N-1, where the summation is taken over the image region where w and f overlap. Compared with the general definition of correlation, it is implicitly assumed that the functions are real quantities, and the MN constant is left out. The reason is that we are going to use a normalized function in which these constants cancel out, and the definition given in Eq. (5) is used commonly in practice. The symbols s and t are used in Eq. (5) to avoid confusion with m and n.

The figure illustrates the procedure, where we assume that the origin of f is at its top left and the origin of w is at its center. For one value of (x,y), say (x0,y0) inside f, application of Eq. (5) yields one value of c. As x and y are varied, w moves around the image area, giving the function c(x,y). The maximum value(s) of c indicate the position(s) where w best matches f. Note that accuracy is lost for values of x and y near the edges of f, with the amount of error in the correlation being proportional to the size of w. The correlation function given in Eq. (5) has the disadvantage of being sensitive to changes in the amplitude of f and w. For example, doubling all the values of f doubles the value of c(x,y). An approach frequently used to overcome this difficulty is to perform matching via the correlation coefficient, which is defined as

$$\gamma(x,y) = \frac{\sum_s \sum_t [f(s,t) - \bar{f}(s,t)]\,[w(x+s,y+t) - \bar{w}]}{\left\{ \sum_s \sum_t [f(s,t) - \bar{f}(s,t)]^2 \sum_s \sum_t [w(x+s,y+t) - \bar{w}]^2 \right\}^{1/2}}$$

where x = 0, 1, 2, ..., M-1, y = 0, 1, 2, ..., N-1, $\bar{w}$ is the average value of the pixels in w (computed only once), $\bar{f}$ is the average value of f in the region coincident with the current location of w, and the summations are taken over the coordinates common to both f and w. The correlation coefficient γ(x,y) is scaled in the range -1 to 1, independent of scale changes in the amplitude of f and w.

Although the correlation function can be normalized for amplitude changes through the correlation coefficient, obtaining normalization for changes in size and rotation can be difficult. Normalizing for size involves spatial scaling, a process that in itself adds a significant amount of computation. Normalizing for rotation is even more difficult. If a clue regarding rotation can be extracted from f(x,y), then w(x,y) can simply be rotated so that it aligns with the degree of rotation in f(x,y). However, if the nature of the rotation is unknown, looking for the best match requires exhaustive rotations of w(x,y). This procedure is impractical and, as a consequence, correlation seldom is used in cases when arbitrary or unconstrained rotation is present.

The correlation also can be carried out in the frequency domain through the FFT. If f and w are the same size, this approach can be more efficient than direct implementation of correlation in the spatial domain. Equation (5) is used when w is much smaller than f.
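Matching by the correlation coefficient can be sketched directly in the spatial domain. The version below makes two simplifying assumptions that differ from the description above: the origin of w is taken at its top-left corner, and only positions where w lies fully inside f are evaluated. The function name `correlation_coefficient` and the test data are hypothetical.

```python
import numpy as np


def correlation_coefficient(f, w):
    """Correlation coefficient of template w at every full-overlap position in f."""
    M, N = f.shape
    J, K = w.shape
    w0 = w - w.mean()                 # template mean is computed only once
    w_energy = np.sum(w0**2)
    gamma = np.zeros((M - J + 1, N - K + 1))
    for x in range(M - J + 1):
        for y in range(N - K + 1):
            patch = f[x:x + J, y:y + K]
            p0 = patch - patch.mean() # local mean of f under the template
            denom = np.sqrt(np.sum(p0**2) * w_energy)
            gamma[x, y] = np.sum(p0 * w0) / denom if denom > 0 else 0.0
    return gamma


# Demo: cut a template out of a random image and find it again.
rng = np.random.default_rng(3)
f = rng.random((20, 20))
w = f[5:9, 7:12].copy()
gamma = correlation_coefficient(f, w)
best = np.unravel_index(np.argmax(gamma), gamma.shape)
```

Scaling the amplitude of f leaves gamma unchanged, which is the property that plain correlation lacks.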

8. What are (i) optimal statistical classifiers (ii) Bayes classifier for

Gaussian pattern classes? Explain.

i) Optimal statistical classifiers

Probability considerations arise in most fields that deal with measuring and interpreting physical events, and they are important in pattern recognition because of the randomness under which pattern classes normally are generated. It is possible to derive a classification approach that is optimal in the sense that, on average, its use yields the lowest probability of committing classification errors.

Foundation:

The probability that a particular pattern x comes from class wj is denoted p(wj/x). If the pattern classifier decides that x came from wj when it actually came from wi, it incurs a loss, denoted Lij. As pattern x may belong to any of the W classes under consideration, the average loss incurred in assigning x to class wj is

$$r_j(x) = \sum_{k=1}^{W} L_{kj}\, p(w_k/x) \qquad (1)$$

This equation often is called the conditional average risk or loss in decision theory terminology.

From basic probability theory, we know that p(A/B) = [p(A)p(B/A)]/p(B). Using this expression, we write Eq. (1) in the form

$$r_j(x) = \frac{1}{p(x)} \sum_{k=1}^{W} L_{kj}\, p(x/w_k)\, P(w_k) \qquad (2)$$

where p(x/wk) is the probability density function of the patterns from class wk and P(wk) is the probability of occurrence of class wk. Because 1/p(x) is positive and common to all the rj(x), j = 1, 2, ..., W, it can be dropped from Eq. (2) without affecting the relative order of these functions from the smallest to the largest value. The expression for the average loss then reduces to

$$r_j(x) = \sum_{k=1}^{W} L_{kj}\, p(x/w_k)\, P(w_k) \qquad (3)$$

The classifier has W possible classes to choose from for any given unknown pattern. If it computes r1(x), r2(x), ..., rW(x) for each pattern x and assigns the pattern to the class with the smallest loss, the total average loss with respect to all decisions will be minimum.

The classifier that minimizes the total average loss is called the Bayes classifier. Thus the Bayes classifier assigns an unknown pattern x to class wi if ri(x) < rj(x) for j = 1, 2, ..., W; j ≠ i. In other words, x is assigned to class wi if

$$\sum_{k=1}^{W} L_{ki}\, p(x/w_k)\, P(w_k) < \sum_{q=1}^{W} L_{qj}\, p(x/w_q)\, P(w_q) \qquad (4)$$

for all j, j ≠ i. The loss for a correct decision generally is assigned a value of zero, and the loss for any incorrect decision usually is assigned the same nonzero value (say, 1). Under these conditions, the loss function becomes

$$L_{ij} = 1 - \delta_{ij} \qquad (5)$$

where δij = 1 if i = j and δij = 0 if i ≠ j. Equation (5) indicates a loss of unity for incorrect decisions and a loss of zero for correct decisions. Substituting Eq. (5) into Eq. (3) yields

$$r_j(x) = \sum_{k=1}^{W} (1 - \delta_{kj})\, p(x/w_k)\, P(w_k) = p(x) - p(x/w_j)\, P(w_j) \qquad (6)$$

The Bayes classifier then assigns a pattern x to class wi if, for all j ≠ i,

$$p(x) - p(x/w_i)\, P(w_i) < p(x) - p(x/w_j)\, P(w_j) \qquad (7)$$

or, equivalently, if

$$p(x/w_i)\, P(w_i) > p(x/w_j)\, P(w_j), \quad j = 1, 2, \ldots, W;\; j \ne i \qquad (8)$$

We see that the Bayes classifier for a 0-1 loss function is nothing more than the computation of decision functions of the form

$$d_j(x) = p(x/w_j)\, P(w_j), \quad j = 1, 2, \ldots, W \qquad (9)$$

where a pattern vector x is assigned to the class whose decision function yields the largest numerical value.

The decision functions given in Eq. (9) are optimal in the sense that they minimize the average loss in misclassification. For this optimality to hold, however, the probability density functions of the patterns in each class, as well as the probability of occurrence of each class, must be known.

The latter requirement usually is not a problem. For instance, if all classes are equally likely to occur, then P(wj) = 1/W. Even if this condition is not true, these probabilities generally can be inferred from knowledge of the problem. Estimation of the probability density functions p(x/wj) is another matter: p(x/wj) is a function of n variables which, if its form is not known, requires methods from multivariate probability theory for its estimation.

These methods are difficult to apply in practice, especially if the number of representative patterns from each class is not large or if the underlying form of the probability density functions is not well behaved. For these reasons, use of the Bayes classifier generally is based on the assumption of an analytic expression for the various density functions and then an estimation of the necessary parameters from sample patterns from each class. By far the most prevalent form assumed for p(x/wj) is the Gaussian probability density function. The closer this assumption is to reality, the closer the Bayes classifier approaches the minimum average loss in classification.

ii) Bayes classifier for Gaussian pattern classes

Let us consider a one-dimensional problem (n = 1) involving two pattern classes (W = 2) governed by Gaussian densities, with means m1 and m2 and standard deviations σ1 and σ2, respectively. The Bayes decision functions have the form

$$d_j(x) = p(x/w_j)\, P(w_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left[-\frac{(x - m_j)^2}{2\sigma_j^2}\right] P(w_j), \quad j = 1, 2 \qquad (1)$$

where the patterns are now scalars, denoted by x. The figure shows a plot of the probability density functions for the two classes.

Fig: Probability density functions for two 1-D pattern classes. The point x0 shown is the decision boundary if the two classes are equally likely to occur.

The boundary between the two classes is a single point, denoted x0, such that d1(x0) = d2(x0). If the two classes are equally likely to occur, then P(w1) = P(w2) = ½ and the decision boundary is the value of x0 for which p(x0/w1) = p(x0/w2).

This point is the intersection of the two probability density functions, as shown in the figure. Any pattern to the right of x0 is classified as belonging to class w1, and any pattern to the left as belonging to class w2. When the classes are not equally likely to occur, x0 moves to the left if class w1 is more likely to occur or, conversely, to the right if class w2 is more likely to occur. This result is to be expected, because the classifier is trying to minimize the loss of misclassification. For instance, in the extreme case, if class w2 never occurs, the classifier would never make a mistake by always assigning all patterns to class w1.

In the n-dimensional case, the Gaussian density of the vectors in the jth pattern class has the form

$$p(x/w_j) = \frac{1}{(2\pi)^{n/2} |C_j|^{1/2}} \exp\left[-\frac{1}{2} (x - m_j)^T C_j^{-1} (x - m_j)\right] \qquad (2)$$

where each density is specified completely by its mean vector mj and covariance matrix Cj, which are defined as

$$m_j = E_j\{x\} \qquad (3)$$

and

$$C_j = E_j\{(x - m_j)(x - m_j)^T\} \qquad (4)$$

where Ej{.} denotes the expected value of the argument over the patterns of class wj. In Eq. (2), n is the dimensionality of the pattern vectors, and |Cj| is the determinant of the matrix Cj. Approximating the expected value Ej by the average value of the quantities in question yields an estimate of the mean vector and covariance matrix:

$$m_j = \frac{1}{N_j} \sum_{x \in w_j} x \qquad (5)$$

and

$$C_j = \frac{1}{N_j} \sum_{x \in w_j} x x^T - m_j m_j^T \qquad (6)$$

where Nj is the number of pattern vectors from class wj and the summation is taken over these vectors.

According to Eq. (1), the Bayes decision function of class wj is dj(x) = p(x/wj)P(wj). However, because of the exponential form of the Gaussian density, working with the natural logarithm of this decision function is more convenient. In other words, we can use the form

$$d_j(x) = \ln[p(x/w_j)\, P(w_j)] = \ln p(x/w_j) + \ln P(w_j) \qquad (7)$$

This expression is equivalent to Eq. (1) in terms of classification performance because the logarithm is a monotonically increasing function. In other words, the numerical order of the decision functions in Eq. (1) and Eq. (7) is the same. Substituting Eq. (2) into Eq. (7) yields

$$d_j(x) = \ln P(w_j) - \frac{n}{2} \ln 2\pi - \frac{1}{2} \ln |C_j| - \frac{1}{2} (x - m_j)^T C_j^{-1} (x - m_j) \qquad (8)$$

The term (n/2) ln 2π is the same for all classes, so it can be eliminated from Eq. (8), which then becomes

$$d_j(x) = \ln P(w_j) - \frac{1}{2} \ln |C_j| - \frac{1}{2} (x - m_j)^T C_j^{-1} (x - m_j) \qquad (9)$$

for j = 1, 2, ..., W. Equation (9) represents the Bayes decision functions for Gaussian pattern classes under the condition of a 0-1 loss function.

The decision functions in Eq. (9) are hyperquadrics because no terms higher than the second degree in the components of x appear in the equation. Clearly, then, the best that a Bayes classifier for Gaussian patterns can do is to place a general second-order decision surface between each pair of pattern classes.

If the pattern populations are truly Gaussian, however, no other surface would yield a lesser average loss in classification.

If all covariance matrices are equal, then Cj = C for j = 1, 2, ..., W. By expanding Eq. (9) and dropping all terms independent of j we obtain

$$d_j(x) = \ln P(w_j) + x^T C^{-1} m_j - \frac{1}{2} m_j^T C^{-1} m_j \qquad (10)$$

which are linear decision functions for j = 1, 2, ..., W.

If, in addition, C = I, where I is the identity matrix, and also P(wj) = 1/W for j = 1, 2, ..., W, then

$$d_j(x) = x^T m_j - \frac{1}{2} m_j^T m_j, \quad j = 1, 2, \ldots, W \qquad (11)$$

These are the decision functions of the minimum distance classifier.
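The Gaussian Bayes classifier can be sketched in a few lines: estimate the mean vector and covariance matrix of each class from samples, then evaluate d_j(x) = ln P(w_j) - (1/2) ln|C_j| - (1/2)(x - m_j)^T C_j^{-1} (x - m_j) and pick the largest. The function names, the small 3-D data set, and the choice of estimating the priors from class frequencies are assumptions made for this illustration.

```python
import numpy as np


def fit_gaussian_classes(X, labels):
    """Estimate (mean, covariance, prior) for every class from sample patterns."""
    classes = np.unique(labels)
    params = []
    for c in classes:
        Xc = X[labels == c]
        m = Xc.mean(axis=0)
        C = (Xc.T @ Xc) / len(Xc) - np.outer(m, m)   # sample covariance estimate
        prior = len(Xc) / len(X)                     # assumed: priors from frequencies
        params.append((m, C, prior))
    return classes, params


def decision_functions(x, params):
    """Evaluate the log-form Bayes decision function of every class at x."""
    d = []
    for m, C, prior in params:
        diff = x - m
        _, logdet = np.linalg.slogdet(C)
        d.append(np.log(prior) - 0.5 * logdet
                 - 0.5 * diff @ np.linalg.solve(C, diff))
    return np.array(d)


def classify(x, classes, params):
    return classes[np.argmax(decision_functions(x, params))]


# Two classes of binary 3-D patterns with equal covariance matrices.
X = np.array([[0, 0, 0], [1, 0, 0], [1, 0, 1], [1, 1, 0],
              [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 1, 1]], dtype=float)
labels = np.array([1, 1, 1, 1, 2, 2, 2, 2])
classes, params = fit_gaussian_classes(X, labels)
predicted = np.array([classify(x, classes, params) for x in X])
```

In this data set both classes share the same covariance matrix, so the quadratic terms cancel and the boundary between them is a plane, as the equal-covariance case predicts.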

9. Explain perceptron for two pattern classes.

10. Explain multilayer feedforward networks in pattern recognition.
