mathankumar-subramaniam
Unit IV: Image Restoration
Two mark Questions
1. What is image restoration?
Restoration is the process of reconstructing or recovering an image that has been degraded, using a priori knowledge of the degradation phenomenon. Restoration techniques are therefore oriented towards modeling the degradation and applying the inverse process in order to recover the original image.
2. What is meant by unconstrained restoration?
In the absence of any knowledge about the noise n, a meaningful criterion is to seek an estimate f^ such that H f^ approximates g in a least squares sense, assuming the noise term is as small as possible.
It is also known as the least squares error approach.
n = g - Hf
To estimate the original image, the noise norm ||n||^2 = ||g - H f^||^2 is minimized. When H is square and invertible this gives f^ = H^-1 g, written symbolically as f^ = g/H.
Where,
H = system operator. f^ = estimated input image. g = degraded image.
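3. What is meant by constrained restoration?
Constrained restoration is a least squares approach in which a criterion of the form ||C f^||^2, where C is a linear operator on f, is minimized subject to the constraint ||g - H f^||^2 = ||n||^2. The constraint ties the estimate to the degradation model n = g - Hf, and the criterion (for example a smoothness measure) selects one solution among those consistent with the noise level.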
4. What is inverse filtering?
The simplest approach to restoration is direct inverse filtering: an estimate F^(u,v) of the transform of the original image is computed simply by dividing the transform of the degraded image, G(u,v), by the degradation function H(u,v).
F^(u,v) = G(u,v)/H(u,v)
Inverse filtering is the process of recovering the input of the system from its output.
5. What is iterative restoration?
In general, iterative restoration refers to any technique that attempts to minimize a function of the form M(f) using an updating rule for the partially restored image.
6. What is a pattern?
Pattern is a quantitative or structural description of an object or some other entity of interest in an image. It is formed by one or more descriptors.
7. What is a pattern class?
A pattern class is a family of patterns that share some common properties. Pattern classes are denoted w1, w2, w3, ..., wM, where M is the number of classes.
8. What are optimal statistical classifiers?
In most fields that deal with measuring and interpreting physical events, probability considerations play a part. They are especially important in pattern recognition because of the randomness under which pattern classes normally are generated. It is possible to derive a classification approach that is optimal in the sense that, on average, its use yields the lowest probability of committing classification errors.
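9. Give the mathematical form of the Bayes decision function.
dj(x) = p(x/wj) P(wj), j = 1, 2, ..., W
where p(x/wj) is the probability density function of the patterns from class wj and P(wj) is the probability of occurrence of class wj.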
10. What are artificial neural networks?
11. What is a multilayer feedforward neural network?
It consists of layers of structurally identical computing nodes (neurons) arranged so that the output of every neuron in one layer feeds into the input of every neuron in the next layer. The number of neurons in the first layer, called layer A, is NA. Often NA = n, the dimensionality of the input pattern vectors. The number of neurons in the output layer, called layer Q, is denoted NQ. NQ equals W, the number of pattern classes that the neural network has been trained to recognize. The network recognizes a pattern vector x as belonging to class wi if the ith output of the network is "high" while all other outputs are "low".
12. What is meant by "training" in artificial neural network?
Training means setting the weights so that the output error is minimized, for example with the Levenberg-Marquardt algorithm (a multidimensional descent method). Training is computationally very expensive. If there are x input nodes, t output nodes and p hidden nodes, then the number of weights is (x+t)p.
13. What is the concept behind algebraic approach to restoration?
The concept of the algebraic approach is to seek an estimate of the original image that minimizes a predefined criterion of performance.
14. What are the methods for estimating the degradation function?
The three methods of estimating the degradation function are:
Observation
Experimentation
Mathematical modeling
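15. How is the blur caused by uniform linear motion removed?
An image f(x,y) undergoes planar motion in the x and y directions, and x0(t) and y0(t) are the time varying components of motion. The total exposure at any point of the recording medium (digital memory) is obtained by integrating the instantaneous exposure over the time interval T during which the imaging system shutter is open:
$$g(x,y) = \int_0^T f\big(x - x_0(t),\, y - y_0(t)\big)\,dt$$
The Fourier transform of this expression gives the degradation function H(u,v) of the motion, which can then be used in a restoration filter to remove the blur.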
16. What is meant by least mean square filter?
The limitation of the inverse and pseudo-inverse filters is that they are very sensitive to noise. Wiener (least mean square) filtering is a method of restoring images in the presence of blur as well as noise.
17. Give the difference between Enhancement and Restoration.
An enhancement technique is based primarily on the pleasing aspects it might present to the viewer, for example contrast stretching. Removal of image blur by applying a deblurring function, by contrast, is considered a restoration technique.
18. What are the properties of Linear Operator?
Additivity
Homogeneity
19. What are the methods of algebraic approach?
Unconstrained restoration approach
Constrained restoration approach
20. What are the types of noise models?
Gaussian noise
Rayleigh noise
Erlang noise
Exponential noise
Uniform noise
Impulse noise
21. What is meant by noise probability density function?
The spatial noise descriptor is the statistical behavior of gray level values in the noise component of the model.
22. What is meant by blind image restoration?
Degradation may be difficult to measure or may be time varying in an unpredictable manner. In such cases information about the degradation must be extracted from the observed image either explicitly or implicitly. This task is called blind image restoration.
23. What are the approaches for blind image restoration?
Direct measurement
Indirect estimation
24. What is blur impulse response and noise levels?
Blur impulse response: This parameter is measured by isolating an image of a suspected object within a picture.
Noise levels: The noise of an observed image can be estimated by measuring the image covariance over a region of constant background luminance.
25. What is meant by indirect estimation?
Indirect estimation method employs temporal or spatial averaging to either obtain a restoration or to obtain key elements of an image restoration algorithm.
Twelve mark Questions
1. Explain degradation model for (i) continuous function (ii) discrete formulation
• Restoration attempts to reconstruct or recover an image that has been degraded by using a priori knowledge of the degradation phenomenon.
• Restoration techniques are oriented toward modeling the degradation and applying the inverse process in order to recover the original image.
Degradation models:
Many types of degradation can be approximated by linear, space invariant processes. These can take advantage of the mature techniques developed for linear systems.
Non-linear and space variant models are more accurate, but they are difficult to solve or unsolvable.
Estimating degradation function
Estimation by image observation
The degradation system H is completely characterized by its impulse response.
Select a small section from the degraded image, $g_s(x,y)$.
Reconstruct an unblurred image of the same size, $\hat{f}_s(x,y)$.
The degradation function can be estimated by
$$H_s(u,v) = \frac{G_s(u,v)}{\hat{F}_s(u,v)}$$
By ignoring the noise term, G(u,v) = F(u,v)H(u,v). If F(u,v) is the Fourier transform of a point source (impulse), then G(u,v) approximates H(u,v).
Fig: A model of the image degradation / restoration process
Continuous degradation model
Motion blur. It occurs when there is relative motion between the object and the camera during exposure.
$$h(i) = \begin{cases} \dfrac{1}{L}, & \text{if } -\dfrac{L}{2} \le i \le \dfrac{L}{2} \\ 0, & \text{otherwise} \end{cases}$$
Atmospheric turbulence. It is due to random variations in the refractive index of the medium between the object and the imaging system, and it occurs in the imaging of astronomical objects.
$$h(i,j) = K \exp\left(-\frac{i^2 + j^2}{2\sigma^2}\right)$$
Uniform out of focus blur
$$h(i,j) = \begin{cases} \dfrac{1}{\pi R^2}, & \text{if } \sqrt{i^2 + j^2} \le R \\ 0, & \text{otherwise} \end{cases}$$
Uniform 2-D blur
$$h(i,j) = \begin{cases} \dfrac{1}{L^2}, & \text{if } -\dfrac{L}{2} \le i, j \le \dfrac{L}{2} \\ 0, & \text{otherwise} \end{cases}$$
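The four blur models above can be sampled on a discrete grid. The following is a minimal sketch, assuming NumPy; the function names, the odd kernel sizes, and the choice to normalize each kernel so its samples sum to 1 (instead of using the continuous constants K or 1/(πR²)) are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def motion_blur_1d(L):
    """1-D uniform motion blur: L equal taps of height 1/L (assumption: L taps)."""
    return np.full(L, 1.0 / L)

def turbulence_psf(size, sigma):
    """Gaussian PSF exp(-(i^2 + j^2) / (2 sigma^2)), with K chosen so the sum is 1."""
    r = np.arange(size) - size // 2
    i, j = np.meshgrid(r, r, indexing="ij")
    h = np.exp(-(i**2 + j**2) / (2.0 * sigma**2))
    return h / h.sum()

def out_of_focus_psf(size, R):
    """Uniform disc of radius R; normalized by pixel count rather than pi R^2."""
    r = np.arange(size) - size // 2
    i, j = np.meshgrid(r, r, indexing="ij")
    h = (i**2 + j**2 <= R**2).astype(float)
    return h / h.sum()

def uniform_2d_psf(L):
    """Uniform L x L square blur with every tap equal to 1/L^2."""
    return np.full((L, L), 1.0 / L**2)
```

Normalizing to unit sum keeps the mean brightness of the blurred image unchanged, which is why H(0,0) = 1 for such kernels.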
Two dimensional discrete degradation model. Circular convolution
Suppose we have a two-dimensional discrete signal $f(i,j)$ of size $A \times B$ samples which undergoes a degradation process. The degradation can be modeled by a two-dimensional discrete impulse response $h(i,j)$ of size $C \times D$ samples. We form the extended (zero-padded) versions of $f(i,j)$ and $h(i,j)$, both of size $M \times N$, where $M \ge A + C - 1$ and $N \ge B + D - 1$, and periodic with period $M \times N$. These can be denoted as $f_e(i,j)$ and $h_e(i,j)$. For a space invariant degradation process we obtain
$$y_e(i,j) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f_e(m,n)\, h_e(i-m,\, j-n) + n_e(i,j)$$
Using matrix notation we can write the following form
$$\mathbf{y} = \mathbf{H}\mathbf{f} + \mathbf{n}$$
where $\mathbf{f}$ and $\mathbf{y}$ are $MN$-dimensional column vectors that represent the lexicographic ordering of the images $f_e(i,j)$ and $y_e(i,j)$ respectively.
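The matrix $\mathbf{H}$ is block circulant:
$$\mathbf{H} = \begin{bmatrix} \mathbf{H}_0 & \mathbf{H}_{M-1} & \cdots & \mathbf{H}_1 \\ \mathbf{H}_1 & \mathbf{H}_0 & \cdots & \mathbf{H}_2 \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{H}_{M-1} & \mathbf{H}_{M-2} & \cdots & \mathbf{H}_0 \end{bmatrix}$$
where each block $\mathbf{H}_j$ is an $N \times N$ circulant matrix built from row $j$ of $h_e$:
$$\mathbf{H}_j = \begin{bmatrix} h_e(j,0) & h_e(j,N-1) & \cdots & h_e(j,1) \\ h_e(j,1) & h_e(j,0) & \cdots & h_e(j,2) \\ \vdots & \vdots & \ddots & \vdots \\ h_e(j,N-1) & h_e(j,N-2) & \cdots & h_e(j,0) \end{bmatrix}$$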
The analysis of the diagonalisation of H is a straightforward extension of the one-dimensional case. In that case we end up with the following set of $M \times N$ scalar problems.
$$Y(u,v) = MN\, H(u,v)\, F(u,v) + N(u,v)$$
$$u = 0, 1, \ldots, M-1, \quad v = 0, 1, \ldots, N-1$$
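The equivalence between the circular convolution sum and the set of scalar frequency-domain problems can be checked numerically. The sketch below assumes NumPy; `circular_convolve2d` and the array sizes are hypothetical choices for illustration. Note that NumPy's forward DFT carries no 1/(MN) factor, so with this convention the noise-free relation reads Y = H·F; the MN factor in the text corresponds to a DFT definition that places 1/(MN) on the forward transform.

```python
import numpy as np

def circular_convolve2d(f_e, h_e):
    """Direct 2-D circular convolution of two equally sized periodic arrays."""
    M, N = f_e.shape
    y = np.zeros((M, N))
    for m in range(M):
        for n in range(N):
            # h_e shifted by (m, n) with wrap-around, weighted by f_e(m, n)
            y += f_e[m, n] * np.roll(h_e, (m, n), axis=(0, 1))
    return y

rng = np.random.default_rng(0)
f = rng.random((4, 5))          # A x B image
h = rng.random((3, 3))          # C x D impulse response
M = f.shape[0] + h.shape[0] - 1
N = f.shape[1] + h.shape[1] - 1

# Zero-pad both arrays to the common size M x N
f_e = np.zeros((M, N)); f_e[:4, :5] = f
h_e = np.zeros((M, N)); h_e[:3, :3] = h

y_direct = circular_convolve2d(f_e, h_e)
y_fft = np.real(np.fft.ifft2(np.fft.fft2(f_e) * np.fft.fft2(h_e)))
```

With the padding sizes chosen as above, the circular result also equals the ordinary (linear) convolution of f and h, which is the reason for extending both arrays.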
2. Explain algebraic approach to (i) unconstrained restoration and (ii) constrained restoration.
(i) unconstrained restoration
In the absence of any knowledge about the noise n, a meaningful criterion is to seek an estimate f^ such that H f^ approximates g in a least squares sense, assuming the noise term is as small as possible.
It is also known as the least squares error approach.
n = g - Hf
To estimate the original image, the noise norm ||n||^2 = ||g - H f^||^2 is minimized. When H is square and invertible this gives f^ = H^-1 g, written symbolically as f^ = g/H.
Where,
H = system operator. f^ = estimated input image. g = degraded image.
(ii) constrained restoration
The set-based approach described previously can be generalized so that any number of prior constraints can be imposed, as long as the constraint sets are closed and convex. If the constraint sets have a non-empty intersection, then a solution that belongs to the intersection set can be found by the method of projections onto convex sets (POCS). Any solution in the intersection set is consistent with the a priori constraints and therefore it is a feasible solution. Let $Q_1, Q_2, \ldots, Q_m$ be closed convex sets in a finite dimensional vector space, with $P_1, P_2, \ldots, P_m$ their respective projectors. The iterative procedure
$$\mathbf{f}_{k+1} = P_1 P_2 \cdots P_m\, \mathbf{f}_k$$
converges to a vector that belongs to the intersection of the sets $Q_i$, $i = 1, 2, \ldots, m$, for any starting vector $\mathbf{f}_0$. An iteration of the form
$$\mathbf{f}_{k+1} = P_1 P_2\, \mathbf{f}_k$$
can be applied in the problem described previously, where we seek an image which lies in the intersection of the two ellipsoids defined by
$$Q_{f|y} = \{\mathbf{f} \mid \|\mathbf{y} - \mathbf{H}\mathbf{f}\|^2 \le E^2\}$$
and
$$Q_f = \{\mathbf{f} \mid \|\mathbf{C}\mathbf{f}\|^2 \le \varepsilon^2\}$$
The respective projections $P_1\mathbf{f}$ and $P_2\mathbf{f}$ are defined by
$$P_1\mathbf{f} = \mathbf{f} + \lambda_1 \left(\mathbf{I} + \lambda_1 \mathbf{H}^T\mathbf{H}\right)^{-1} \mathbf{H}^T(\mathbf{y} - \mathbf{H}\mathbf{f})$$
$$P_2\mathbf{f} = \left[\mathbf{I} - \lambda_2 \left(\mathbf{I} + \lambda_2 \mathbf{C}^T\mathbf{C}\right)^{-1} \mathbf{C}^T\mathbf{C}\right]\mathbf{f}$$
3. What is the Inverse Filtering method for restoration of images? Explain.
Inverse filtering:
Degradation model (where * denotes convolution)
$$g(x,y) = f(x,y) * h(x,y) + \eta(x,y)$$
$$G(u,v) = F(u,v)\,H(u,v) + N(u,v)$$
This expression tells us that even if we know the degradation
function we cannot recover the undegraded image exactly because of
the random noise, whose Fourier transform is not known. Image
restoration is an ill-posed problem. When H (u, v) is very small, the
noise term dominates the restoration result.
Inverse filter
$$\hat{F}(u,v) = \frac{G(u,v)}{H(u,v)} = F(u,v) + \frac{N(u,v)}{H(u,v)}$$
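Assume h is known (low-pass filter).
In an alternative notation, with x the original image, y the observation and G the restoration filter:
Inverse filter: G(u,v) = 1 / H(u,v)
Problems with inverse filtering: H(u,v) = 0 for some (u,v). In the noisy case,
$$\hat{X}(u,v) = Y(u,v)\,G(u,v), \qquad y = x * h + n$$
n: additive noise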
The simplest approach to restoration is direct inverse filtering, where we compute an estimate, F^(u,v), of the transform of the original image simply by dividing the transform of the degraded image, G(u,v), by the degradation function:
$$\hat{F}(u,v) = \frac{G(u,v)}{H(u,v)} = F(u,v) + \frac{N(u,v)}{H(u,v)}$$
This is an interesting expression. It tells us that even if we know the degradation function we cannot recover the undegraded image exactly, because N(u,v) is the transform of a random function and is not known. There is more bad news. If the degradation function has zero or very small values, then the ratio N(u,v)/H(u,v) could easily dominate the estimate F^(u,v). This in fact is frequently the case. One approach to get around the zero or small value problem is to limit the filter frequencies to values near the origin.
From the definition of the Fourier transform we know that H(0,0) is equal to the average value of h(x,y), and that this is usually the highest value of H(u,v) in the frequency domain. Thus by limiting the analysis to frequencies near the origin, we reduce the probability of encountering zero values.
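The objective is to minimize
$$J(\mathbf{f}) = \|\mathbf{n}(\mathbf{f})\|^2 = \|\mathbf{y} - \mathbf{H}\mathbf{f}\|^2$$
We set the first derivative of the cost function equal to zero
$$\frac{\partial J(\mathbf{f})}{\partial \mathbf{f}} = 0 \;\Rightarrow\; -2\mathbf{H}^T(\mathbf{y} - \mathbf{H}\mathbf{f}) = 0 \;\Rightarrow\; \mathbf{f} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{y}$$
If $M = N$ and $\mathbf{H}^{-1}$ exists then
$$\mathbf{f} = \mathbf{H}^{-1}\mathbf{y}$$
According to the previous analysis, if $\mathbf{H}$ (and therefore $\mathbf{H}^{-1}$) is block circulant, the above problem can be solved as a set of $M \times N$ scalar problems as follows
$$F(u,v) = \frac{H^*(u,v)\,Y(u,v)}{|H(u,v)|^2} \;\Rightarrow\; f(i,j) = \mathcal{F}^{-1}\left[\frac{H^*(u,v)\,Y(u,v)}{|H(u,v)|^2}\right]$$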
Computational issues concerning inverse filtering
(I) Suppose first that the additive noise $n(i,j)$ is negligible. A problem arises if $H(u,v)$ becomes very small or zero for some point $(u,v)$ or for a whole region in the $(u,v)$ plane. In that region inverse filtering cannot be applied. Note that in most real applications $H(u,v)$ drops off rapidly as a function of distance from the origin. The solution is that if these points are known they can be neglected in the computation of $F(u,v)$.
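(II) In the presence of external noise, $Y(u,v) = H(u,v)F(u,v) + N(u,v)$, so the true transform satisfies
$$F(u,v) = \frac{H^*(u,v)\,\big(Y(u,v) - N(u,v)\big)}{|H(u,v)|^2} = \frac{H^*(u,v)\,Y(u,v)}{|H(u,v)|^2} - \frac{H^*(u,v)\,N(u,v)}{|H(u,v)|^2}$$
and the inverse filter estimate is therefore
$$\hat{F}(u,v) = \frac{H^*(u,v)\,Y(u,v)}{|H(u,v)|^2} = F(u,v) + \frac{N(u,v)}{H(u,v)}$$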
If $H(u,v)$ becomes very small, the term $N(u,v)/H(u,v)$ dominates the result. The solution is again to carry out the restoration process in a limited neighborhood about the origin where $H(u,v)$ is not very small. This procedure is called pseudoinverse filtering. In that case we set
$$\hat{F}(u,v) = \begin{cases} \dfrac{H^*(u,v)\,Y(u,v)}{|H(u,v)|^2}, & |H(u,v)| \ge T \\ 0, & |H(u,v)| < T \end{cases}$$
The threshold T is defined by the user. In general, the noise may very well possess large components at high frequencies $(u,v)$, while $H(u,v)$ and $Y(u,v)$ normally will be dominated by low frequency components.
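The thresholded rule above maps directly onto array operations in the frequency domain. This is a minimal sketch, assuming NumPy, a periodic (circular) blur model, and a PSF array `h_e` already zero-padded to the image size; the function name and arguments are illustrative, not taken from the notes.

```python
import numpy as np

def pseudoinverse_filter(y, h_e, T):
    """Pseudoinverse restoration: invert H only where |H(u,v)| >= T, else output 0."""
    Y = np.fft.fft2(y)
    H = np.fft.fft2(h_e)
    keep = np.abs(H) >= T                 # frequencies considered safe to invert
    F_hat = np.zeros_like(Y)              # complex array, zero where |H| < T
    F_hat[keep] = np.conj(H[keep]) * Y[keep] / np.abs(H[keep]) ** 2
    return np.real(np.fft.ifft2(F_hat))
```

In the noise-free case with a PSF whose transform never falls below T, this reduces to exact inverse filtering. Raising T trades resolution for noise suppression, since more high-frequency components are discarded.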
4. Discuss the Wiener Filter method for restoration of images.
Wiener filter:
It removes the additive noise and inverts the blurring simultaneously.
The Wiener filtering is optimal in terms of the mean square error.
In most images, adjacent pixels are highly correlated, while the gray
level of widely separated pixels are only loosely correlated.
Therefore, the autocorrelation function of typical images generally
decreases away from the origin.
Power spectrum of an image is the Fourier transform of its
autocorrelation function, therefore we can argue that the power
spectrum of an image generally decreases with frequency.
Typical noise sources have either a flat power spectrum or one that
decreases with frequency more slowly than typical image power
spectrum.
Therefore, the expected situation is for the signal to dominate the
spectrum at low frequencies, while the noise dominates the high
frequencies.
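The Wiener filter in the frequency domain is
$$W(f_1,f_2) = \frac{H^*(f_1,f_2)\,S_{xx}(f_1,f_2)}{|H(f_1,f_2)|^2\,S_{xx}(f_1,f_2) + S_{\eta\eta}(f_1,f_2)}$$
where Sxx(f1,f2) and Sηη(f1,f2) are respectively the power spectra of the original image and the additive noise, and H(f1,f2) is the blurring filter.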
Wiener filter has two separate parts, an inverse filtering part and a noise smoothing part.
It performs the deconvolution by inverse filtering (highpass filtering) and removes the noise with a compression operation (lowpass filtering).
Wiener Filter Formulation
Degradation model (where * denotes convolution)
$$g(x,y) = f(x,y) * h(x,y) + \eta(x,y)$$
$$G(u,v) = F(u,v)\,H(u,v) + N(u,v)$$
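Wiener filter
$$\hat{F}(u,v) = \left[\frac{1}{H(u,v)} \cdot \frac{|H(u,v)|^2}{|H(u,v)|^2 + P_n(u,v)/P_f(u,v)}\right] G(u,v) \approx \left[\frac{1}{H(u,v)} \cdot \frac{|H(u,v)|^2}{|H(u,v)|^2 + K}\right] G(u,v)$$
where $P_n$ and $P_f$ are the power spectra of the noise and of the undegraded image, and K is a constant.
Least Mean Square Filter :-
$$\hat{F}(u,v) = \frac{H^*(u,v)\,G(u,v)}{|H(u,v)|^2 + S_n(u,v)/S_x(u,v)}$$
In practice :-
$$\hat{F}(u,v) = \frac{H^*(u,v)\,G(u,v)}{|H(u,v)|^2 + K}$$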
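The practical form with a constant K in place of the noise-to-signal power ratio is short enough to sketch directly. The code below assumes NumPy, circular blur, and a PSF `h_e` zero-padded to the image size; `wiener_filter` and its arguments are hypothetical names, and picking K is left to the user (it is usually tuned by inspection).

```python
import numpy as np

def wiener_filter(g, h_e, K):
    """Parametric Wiener restoration: F_hat = conj(H) G / (|H|^2 + K)."""
    G = np.fft.fft2(g)
    H = np.fft.fft2(h_e)
    F_hat = np.conj(H) * G / (np.abs(H) ** 2 + K)
    return np.real(np.fft.ifft2(F_hat))
```

Setting K = 0 gives the plain inverse filter (valid only when H has no zeros), while for an identity blur (H = 1) the filter simply scales the image by 1/(1 + K), which illustrates the smoothing part acting on its own.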
Wiener filtering - problems
The power spectra of the undegraded image and noise must be
known.
Weights all errors equally regardless of their location in the image,
while the eye is considerably more tolerant of errors in the dark areas
and high-gradient areas in the image.
In minimizing the mean square error, the Wiener filter also smooths the
image more than the eye would prefer.
Assumptions
o The original image and noise are statistically independent
o The power spectral density of the original image and noise are
known
o Both the original image and noise are zero mean
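Wiener Filtering: Special Cases
Balancing between two jobs for deblurring a noisy image
– HPF filter for de-blurring (undo H distortion)
– LPF for suppressing noise
$$G_{\text{wiener}}(\omega_1,\omega_2) = \frac{H^*(\omega_1,\omega_2)}{|H(\omega_1,\omega_2)|^2 + \dfrac{S_{\eta\eta}(\omega_1,\omega_2)}{S_{uu}(\omega_1,\omega_2)}}$$
where $S_{uu}$ and $S_{\eta\eta}$ are the power spectra of the original image and of the noise.
Noiseless case ~ $S_{\eta\eta} = 0$
– Wiener filter becomes the pseudo-inverse filter for $S_{\eta\eta} \to 0$
$$G(\omega_1,\omega_2) = \lim_{S_{\eta\eta} \to 0} \frac{H^*}{|H|^2 + S_{\eta\eta}/S_{uu}} = \begin{cases} \dfrac{1}{H(\omega_1,\omega_2)}, & \text{if } |H(\omega_1,\omega_2)| \ne 0 \\ 0, & \text{if } |H(\omega_1,\omega_2)| = 0 \end{cases}$$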
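No-blur case ~ H = 1 (Wiener Smoothing Filter)
– Zero-phase filter to attenuate noise according to SNR at each frequency
$$G_{\text{wiener}}(\omega_1,\omega_2)\Big|_{H=1} = \frac{S_{uu}(\omega_1,\omega_2)}{S_{uu}(\omega_1,\omega_2) + S_{\eta\eta}(\omega_1,\omega_2)} = \frac{\mathrm{SNR}(\omega_1,\omega_2)}{\mathrm{SNR}(\omega_1,\omega_2) + 1}, \qquad \mathrm{SNR} = \frac{S_{uu}}{S_{\eta\eta}}$$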
In practice, we often approximate the SNR as a constant to work around the need to estimate the image power spectral density.
Note that the phase response of the Wiener filter is the same as that of the inverse filter 1/H, i.e. it does not compensate for phase distortion due to noise.
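Comparisons:
Fig : Wiener Filter Characteristics
Wiener filter:
$$G_{\text{wiener}}(\omega_1,\omega_2) = \frac{H^*(\omega_1,\omega_2)}{|H(\omega_1,\omega_2)|^2 + \dfrac{S_{\eta\eta}(\omega_1,\omega_2)}{S_{uu}(\omega_1,\omega_2)}}$$
Pseudo-inverse filter: choose G as
$$G(\omega_1,\omega_2) = \frac{1}{H(\omega_1,\omega_2)} \quad \text{for } |H(\omega_1,\omega_2)| \ge \delta, \text{ and } 0 \text{ otherwise}$$
where $\delta$ is a small threshold.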
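5. What is constrained least squares restoration? Explain.
Only the mean and variance of the noise are required.
The degradation model in vector-matrix form
$$\underset{MN \times 1}{\mathbf{g}} = \underset{MN \times MN}{\mathbf{H}}\;\underset{MN \times 1}{\mathbf{f}} + \underset{MN \times 1}{\boldsymbol{\eta}}$$
The objective function
$$\min\; C = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[\nabla^2 f(x,y)\right]^2 \quad \text{subject to} \quad \|\mathbf{g} - \mathbf{H}\hat{\mathbf{f}}\|^2 = \|\boldsymbol{\eta}\|^2$$
The solution
$$\hat{F}(u,v) = \left[\frac{H^*(u,v)}{|H(u,v)|^2 + \gamma\,|P(u,v)|^2}\right] G(u,v)$$
where $\gamma$ is a parameter adjusted so that the constraint is satisfied, and $P(u,v)$ is the Fourier transform of the Laplacian operator
$$p(x,y) = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$$
If no smoothness constraint is imposed, we seek a solution that minimizes the function
$$M(\mathbf{f}) = \|\mathbf{y} - \mathbf{H}\mathbf{f}\|^2$$
A necessary condition for $M(\mathbf{f})$ to have a minimum is that its gradient with respect to $\mathbf{f}$ is equal to zero. This gradient is given below
$$\nabla_{\mathbf{f}} M(\mathbf{f}) = \frac{\partial M(\mathbf{f})}{\partial \mathbf{f}} = -2\left(\mathbf{H}^T\mathbf{y} - \mathbf{H}^T\mathbf{H}\mathbf{f}\right)$$
By using the steepest descent type of optimization we can formulate an iterative rule as follows (the factor 2 is absorbed into the step size $\beta$):
$$\mathbf{f}_0 = \beta\,\mathbf{H}^T\mathbf{y}$$
$$\mathbf{f}_{k+1} = \mathbf{f}_k - \beta\,\frac{\partial M(\mathbf{f}_k)}{\partial \mathbf{f}_k} = \mathbf{f}_k + \beta\,\mathbf{H}^T(\mathbf{y} - \mathbf{H}\mathbf{f}_k) = \beta\,\mathbf{H}^T\mathbf{y} + (\mathbf{I} - \beta\,\mathbf{H}^T\mathbf{H})\,\mathbf{f}_k$$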
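The closed-form constrained least squares solution can be evaluated in the frequency domain with the Laplacian kernel as the smoothness operator. The following is a rough sketch, assuming NumPy, circular boundary handling, an image of at least 3×3 pixels, and a fixed user-supplied γ (the notes adjust γ until the noise constraint is met, which is omitted here); `cls_filter` is a hypothetical name.

```python
import numpy as np

def cls_filter(g, h_e, gamma):
    """Constrained least squares: F_hat = conj(H) G / (|H|^2 + gamma |P|^2)."""
    M, N = g.shape
    # Laplacian kernel p(x, y) centred on index (0, 0) with wrap-around neighbours
    p = np.zeros((M, N))
    p[0, 0] = 4.0
    p[0, 1] = p[1, 0] = p[0, -1] = p[-1, 0] = -1.0
    G = np.fft.fft2(g)
    H = np.fft.fft2(h_e)
    P = np.fft.fft2(p)
    F_hat = np.conj(H) * G / (np.abs(H) ** 2 + gamma * np.abs(P) ** 2)
    return np.real(np.fft.ifft2(F_hat))
```

Because P(0,0) = 0, the mean brightness is untouched by the penalty, while larger γ increasingly suppresses high frequencies where |P| is large. The denominator is assumed to be non-zero, which holds whenever H(0,0) ≠ 0 and γ > 0.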
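Constrained least squares iteration
In this method we attempt to solve the problem of constrained restoration iteratively. As already mentioned, the following functional is minimized
$$M(\mathbf{f}, \alpha) = \|\mathbf{y} - \mathbf{H}\mathbf{f}\|^2 + \alpha\,\|\mathbf{C}\mathbf{f}\|^2$$
The necessary condition for a minimum is that the gradient of $M(\mathbf{f}, \alpha)$ is equal to zero. That gradient is
$$\nabla_{\mathbf{f}} M(\mathbf{f}, \alpha) = 2\left[(\mathbf{H}^T\mathbf{H} + \alpha\,\mathbf{C}^T\mathbf{C})\,\mathbf{f} - \mathbf{H}^T\mathbf{y}\right]$$
The initial estimate and the updating rule for obtaining the restored image are now given by
$$\mathbf{f}_0 = \beta\,\mathbf{H}^T\mathbf{y}$$
$$\mathbf{f}_{k+1} = \mathbf{f}_k + \beta\left[\mathbf{H}^T\mathbf{y} - (\mathbf{H}^T\mathbf{H} + \alpha\,\mathbf{C}^T\mathbf{C})\,\mathbf{f}_k\right]$$
It can be proved that the above iteration (known as Iterative CLS or Tikhonov-Miller Method) converges if
$$0 < \beta < \frac{2}{|\lambda_{\max}|}$$
where $\lambda_{\max}$ is the maximum eigenvalue of the matrix
$$\mathbf{H}^T\mathbf{H} + \alpha\,\mathbf{C}^T\mathbf{C}$$
If the matrices H and C are block-circulant the iteration can be implemented in the frequency domain.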
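The updating rule can be tried on a small one-dimensional problem where the matrices fit in memory. This sketch assumes NumPy; the helper `circulant`, the 5-tap blur, the second-difference operator C and the values of α and β are all illustrative assumptions chosen so that the convergence condition holds. Setting α = 0 gives the unconstrained steepest descent iteration.

```python
import numpy as np

def iterative_cls(H, C, y, alpha, beta, n_iter):
    """Tikhonov-Miller iteration: f <- f + beta * (H^T y - (H^T H + alpha C^T C) f)."""
    A = H.T @ H + alpha * (C.T @ C)
    b = H.T @ y
    f = beta * b                      # initial estimate f_0 = beta * H^T y
    for _ in range(n_iter):
        f = f + beta * (b - A @ f)
    return f

def circulant(c):
    """Circulant matrix whose first column is c (hypothetical helper)."""
    c = np.asarray(c, dtype=float)
    return np.array([np.roll(c, k) for k in range(len(c))]).T

H = circulant([0.6, 0.2, 0.0, 0.0, 0.2])      # symmetric circular blur
C = circulant([2.0, -1.0, 0.0, 0.0, -1.0])    # circular second difference
rng = np.random.default_rng(0)
y = H @ rng.random(5)                          # noise-free observation

alpha = 0.01
A = H.T @ H + alpha * (C.T @ C)
lam_max = np.linalg.eigvalsh(A).max()
beta = 1.0 / lam_max                           # satisfies 0 < beta < 2 / lam_max

f_iter = iterative_cls(H, C, y, alpha, beta, n_iter=500)
f_direct = np.linalg.solve(A, H.T @ y)         # fixed point of the iteration
```

The iterate approaches the solution of (HᵀH + αCᵀC) f = Hᵀy, and the number of iterations needed grows as the smallest eigenvalue of that matrix shrinks.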
6. What is meant by iterative restoration? Explain.
In general, iterative restoration refers to any technique that attempts to minimize a function of the form M(f) using an updating rule for the partially restored image.
Motivation of iterative method
Wiener filter needs prior knowledge of power spectral density of
original image, which is often unavailable
The challenge is to estimate power spectral density of original image
from a single copy of degraded image
Rationale of iterative method
Use the restored image as an improved prototype of the original
image, estimate its power spectral density, and construct Wiener
filter iteratively.
Basic iterative algorithm
The degraded image is used as an initial estimate of original
image, and a restored image is attained from the corresponding
Wiener filter.
The restored image is used as an updated estimate of the
original image and leads to a new restoration.
The iterations continue until the estimate converges.
Additive iterative algorithm
It can be proved that in basic iterative algorithm the estimate
converges, but not to its true value.
Correction item is added in each iteration.
Implementation
Power spectral density is estimated using periodogram
Degradation model is designed to be a low pass filter (a circulant
matrix)
Iterative Procedure
These methods refer to a class of iterative procedures that successively use the Wiener filtered signal as an improved prototype to update the covariance estimates of the original image, as follows.
Step 0: Initial estimate of $\mathbf{R}_{ff}$
$$\mathbf{R}_{ff}(0) = \mathbf{R}_{yy} = E\{\mathbf{y}\mathbf{y}^T\}$$
Step 1: Construct the $i$-th restoration filter
$$\mathbf{W}(i+1) = \mathbf{R}_{ff}(i)\,\mathbf{H}^T\left(\mathbf{H}\,\mathbf{R}_{ff}(i)\,\mathbf{H}^T + \mathbf{R}_{nn}\right)^{-1}$$
Step 2: Obtain the $(i+1)$-th estimate of the restored image
$$\hat{\mathbf{f}}(i+1) = \mathbf{W}(i+1)\,\mathbf{y}$$
Step 3: Use $\hat{\mathbf{f}}(i+1)$ to compute an improved estimate of $\mathbf{R}_{ff}$ given by
$$\mathbf{R}_{ff}(i+1) = E\{\hat{\mathbf{f}}(i+1)\,\hat{\mathbf{f}}^T(i+1)\}$$
Step 4: Increase i and repeat steps 1, 2 and 3.
7. Explain (i) minimum distance classifier (ii) matching by correlation.
i) Minimum distance classifier
Suppose that we define the prototype of each pattern class to be the mean vector of the patterns of the class:
$$\mathbf{m}_j = \frac{1}{N_j} \sum_{\mathbf{x} \in \omega_j} \mathbf{x}, \qquad j = 1, 2, \ldots, W \qquad (1)$$
where Nj is the number of pattern vectors from class ωj and the summation is taken over these vectors. As before, W is the number of pattern classes. One way to determine the class membership of an unknown pattern vector x is to assign it to the class of its closest prototype, as noted previously. Using the Euclidean distance to determine closeness reduces the problem to computing the distance measures:
$$D_j(\mathbf{x}) = \|\mathbf{x} - \mathbf{m}_j\|, \qquad j = 1, 2, \ldots, W \qquad (2)$$
We then assign x to class ωi if Di(x) is the smallest distance. That is, the smallest distance implies the best match in this formulation. It is not difficult to show that selecting the smallest distance is equivalent to evaluating the functions
$$d_j(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_j - \tfrac{1}{2}\,\mathbf{m}_j^T\mathbf{m}_j, \qquad j = 1, 2, \ldots, W \qquad (3)$$
and assigning x to class ωi if di(x) yields the largest numerical value. This formulation agrees with the concept of a decision function. From Eq. (3), the decision boundary between classes ωi and ωj for a minimum distance classifier is
$$d_{ij}(\mathbf{x}) = d_i(\mathbf{x}) - d_j(\mathbf{x}) = \mathbf{x}^T(\mathbf{m}_i - \mathbf{m}_j) - \tfrac{1}{2}(\mathbf{m}_i - \mathbf{m}_j)^T(\mathbf{m}_i + \mathbf{m}_j) = 0 \qquad (4)$$
The surface given by Eq. (4) is the perpendicular bisector of the line segment joining mi and mj. For n = 2 the perpendicular bisector is a line, for n = 3 it is a plane, and for n > 3 it is called a hyperplane.
The minimum distance classifier works well when the distance between means is large compared to the spread or randomness of each class with respect to its mean. It yields optimum performance when the distribution of each class about its mean is in the form of a spherical "hypercloud" in n-dimensional pattern space. The simultaneous occurrence of large mean separations and relatively small class spread occurs seldom in practice unless the system designer controls the nature of the input. An excellent example is provided by systems designed to read stylized character fonts, such as the familiar American Bankers Association font character set.
This particular font set consists of 14 characters that were purposely designed on a 9 × 7 grid in order to facilitate their reading. The characters usually are printed in ink that contains finely ground magnetic material. Prior to being read, the ink is subjected to a magnetic field, which accentuates each character to simplify detection. In other words, the segmentation problem is solved by artificially highlighting the key characteristics of each character.
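The prototype-based rule can be written in a few lines. This is a minimal sketch, assuming NumPy and integer class labels; `class_means` and `classify_min_distance` are illustrative names. It evaluates the linear decision functions dj(x) = xᵀmj − ½ mjᵀmj and picks the largest, which gives the same decision as choosing the smallest Euclidean distance to a class mean.

```python
import numpy as np

def class_means(X, labels):
    """Mean vector (prototype) of each class; X has one pattern per row."""
    classes = np.unique(labels)
    means = np.stack([X[labels == c].mean(axis=0) for c in classes])
    return classes, means

def classify_min_distance(x, classes, means):
    """Assign x to the class with the largest d_j(x) = x^T m_j - 0.5 m_j^T m_j."""
    d = means @ x - 0.5 * np.sum(means * means, axis=1)
    return classes[np.argmax(d)]

# Two small, well separated training classes (made-up data for illustration)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
classes, means = class_means(X, labels)
```

A call such as `classify_min_distance(np.array([1.0, 1.0]), classes, means)` then returns the label of the nearest prototype.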
ii) Matching by correlation
Let w(x,y) be a subimage of size J×K within the image f(x,y) of size M×N, where we assume that J≤M and K≤N. Although the correlation approach can be expressed in vector form, working directly with an image or subimage format is more intuitive. In its simplest form, the correlation between f(x,y) and w(x,y) is
$$c(x,y) = \sum_{s} \sum_{t} f(s,t)\, w(x+s,\, y+t) \qquad (5)$$
for x = 0, 1, 2, ..., M-1, y = 0, 1, 2, ..., N-1, where the summation is taken over the image region where w and f overlap. Compared with the general definition of correlation, Eq. (5) implicitly assumes that the functions are real quantities, and the MN constant has been left out. The reason is that we are going to use a normalized function in which these constants cancel out, and the definition given in Eq. (5) is commonly used in practice. The symbols s and t are used in Eq. (5) to avoid confusion with m and n.
The figure illustrates the procedure, where we assume that the origin of f is at its top left and the origin of w is at its center. For one value of (x,y), say (x0,y0) inside f, application of Eq. (5) yields one value of c. As x and y are varied, w moves around the image area, giving the function c(x,y). The maximum value(s) of c indicate the position(s) where w best matches f. Note that accuracy is lost for values of x and y near the edges of f, with the amount of error in the correlation being proportional to the size of w. The correlation function given in Eq. (5) has the disadvantage of being sensitive to changes in the amplitude of f and w. For example, doubling all the values of f doubles the value of c(x,y). An approach frequently used to overcome this difficulty is to perform matching via the correlation coefficient, which is defined as
$$\gamma(x,y) = \frac{\sum_{s}\sum_{t} \left[f(s,t) - \bar{f}(s,t)\right]\left[w(x+s,\, y+t) - \bar{w}\right]}{\left\{\sum_{s}\sum_{t} \left[f(s,t) - \bar{f}(s,t)\right]^2 \sum_{s}\sum_{t} \left[w(x+s,\, y+t) - \bar{w}\right]^2\right\}^{1/2}}$$
where x = 0, 1, 2, ..., M-1, y = 0, 1, 2, ..., N-1, w‾ is the average value of the pixels in w (computed only once), f‾ is the average value of f in the region coincident with the current location of w, and the summations are taken over the coordinates common to both f and w. The correlation coefficient γ(x,y) is scaled in the range -1 to 1, independent of scale changes in the amplitude of f and w.
Although the correlation function can be normalized for amplitude changes through the correlation coefficient, obtaining normalization for changes in size and rotation can be difficult. Normalizing for size involves spatial scaling, a process that in itself adds a significant amount of computation. Normalizing for rotation is even more difficult. If a clue regarding rotation can be extracted from f(x,y), then w(x,y) can simply be rotated so that it aligns with the degree of rotation in f(x,y). However, if the nature of rotation is unknown, looking for the best match requires exhaustive rotations of w(x,y). This procedure is impractical and, as a consequence, correlation seldom is used in cases when arbitrary or unconstrained rotation is present. The correlation also can be carried out in the frequency domain through the FFT. If f and w are the same size, this approach can be more efficient than direct implementation of correlation in the spatial domain. Equation (5) is used when w is much smaller than f.
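Matching by correlation coefficient can be sketched with a direct spatial-domain loop. The code assumes NumPy and, as a simplification relative to the text, anchors the template at its top-left corner and evaluates only positions where w lies fully inside f, so edge effects are avoided; `correlation_coefficient_map` is a hypothetical name.

```python
import numpy as np

def correlation_coefficient_map(f, w):
    """Correlation coefficient between template w and every full-overlap patch of f."""
    M, N = f.shape
    J, K = w.shape
    w0 = w - w.mean()                        # template mean removed once
    w_energy = np.sum(w0 ** 2)
    gamma = np.zeros((M - J + 1, N - K + 1))
    for x in range(M - J + 1):
        for y in range(N - K + 1):
            patch = f[x:x + J, y:y + K]
            p0 = patch - patch.mean()        # local mean of f under the template
            denom = np.sqrt(np.sum(p0 ** 2) * w_energy)
            # flat patches or flat templates have undefined correlation; use 0
            gamma[x, y] = np.sum(p0 * w0) / denom if denom > 0 else 0.0
    return gamma

rng = np.random.default_rng(0)
f = rng.random((12, 12))
w = f[3:6, 4:8].copy()                        # template cut from a known location
gamma = correlation_coefficient_map(f, w)
best = np.unravel_index(np.argmax(gamma), gamma.shape)
```

Because each patch and the template are mean-removed and normalized, scaling the amplitude of f leaves the map unchanged, unlike the raw correlation c(x,y).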
8. What are (i) optimal statistical classifiers (ii) Bayes classifier for
Gaussian pattern classes? Explain.
i) Optimal statistical classifiers
In most of the fields measuring and interpreting of physical events,
probability considerations are dealt with and it has become much
important in pattern recognition because of the randomness under which
pattern classes normally are generated. It is possible to derive a
classification approach that is optimal in the sense that, on average, its use
yields the lowest probability of committing classification errors.
Foundation:
The probability that a particular pattern x comes from class ωj is denoted p(ωj/x). If the pattern classifier decides that x came from ωj when it actually came from ωi, it incurs a loss, denoted Lij. As pattern x may belong to any of the W classes under consideration, the average loss incurred in assigning x to class ωj is
$$r_j(\mathbf{x}) = \sum_{k=1}^{W} L_{kj}\, p(\omega_k/\mathbf{x}) \qquad (1)$$
This equation often is called the conditional average risk or loss in decision theory terminology.
From basic probability theory, we know that p(A/B) = [p(A)p(B/A)]/p(B). Using this expression, we write Eq. (1) in the form
$$r_j(\mathbf{x}) = \frac{1}{p(\mathbf{x})} \sum_{k=1}^{W} L_{kj}\, p(\mathbf{x}/\omega_k)\, P(\omega_k) \qquad (2)$$
where p(x/ωk) is the probability density function of the patterns from class ωk and P(ωk) is the probability of occurrence of class ωk. Because 1/p(x) is positive and common to all the rj(x), j = 1, 2, ..., W, it can be dropped from Eq. (2) without affecting the relative order of these functions from the smallest to the largest value. The expression for the average loss then reduces to
$$r_j(\mathbf{x}) = \sum_{k=1}^{W} L_{kj}\, p(\mathbf{x}/\omega_k)\, P(\omega_k) \qquad (3)$$
The classifier has W possible classes to choose from for any given unknown pattern. If it computes r1(x), r2(x), ..., rW(x) for each pattern x and assigns the pattern to the class with the smallest loss, the total average loss with respect to all decisions will be minimum.
The classifier that minimizes the total average loss is called the Bayes classifier. Thus the Bayes classifier assigns an unknown pattern x to class ωi if ri(x) < rj(x) for j = 1, 2, ..., W; j ≠ i. In other words, x is assigned to class ωi if
$$\sum_{k=1}^{W} L_{ki}\, p(\mathbf{x}/\omega_k)\, P(\omega_k) < \sum_{q=1}^{W} L_{qj}\, p(\mathbf{x}/\omega_q)\, P(\omega_q) \qquad (4)$$
for all j; j ≠ i. The "loss" for a correct decision generally is assigned a value of zero, and the loss for any incorrect decision usually is assigned the same nonzero value (say, 1). Under these conditions, the loss function becomes
$$L_{ij} = 1 - \delta_{ij} \qquad (5)$$
where δij = 1 if i = j and δij = 0 if i ≠ j. Equation (5) indicates a loss of unity for incorrect decisions and a loss of zero for correct decisions. Substituting Eq. (5) into Eq. (3) yields
$$r_j(\mathbf{x}) = \sum_{k=1}^{W} (1 - \delta_{kj})\, p(\mathbf{x}/\omega_k)\, P(\omega_k) = p(\mathbf{x}) - p(\mathbf{x}/\omega_j)\, P(\omega_j) \qquad (6)$$
The Bayes classifier then assigns a pattern x to class ωi if, for all j ≠ i,
$$p(\mathbf{x}) - p(\mathbf{x}/\omega_i)\, P(\omega_i) < p(\mathbf{x}) - p(\mathbf{x}/\omega_j)\, P(\omega_j) \qquad (7)$$
or, equivalently, if
$$p(\mathbf{x}/\omega_i)\, P(\omega_i) > p(\mathbf{x}/\omega_j)\, P(\omega_j), \qquad j = 1, 2, \ldots, W;\; j \ne i \qquad (8)$$
We see that the Bayes classifier for a 0-1 loss function is nothing more than computation of decision functions of the form
$$d_j(\mathbf{x}) = p(\mathbf{x}/\omega_j)\, P(\omega_j), \qquad j = 1, 2, \ldots, W \qquad (9)$$
where a pattern vector x is assigned to the class whose decision function yields the largest numerical value.
The decision functions given in Eq. (9) are optimal in the sense that they minimize the average loss in misclassification. For this optimality to hold, however, the probability density functions of the patterns in each class, as well as the probability of occurrence of each class, must be known.
The latter requirement usually is not a problem. For instance, if all classes are equally likely to occur, then P(ωj) = 1/W. Even if this condition is not true, these probabilities generally can be inferred from knowledge of the problem. Estimation of the probability density functions p(x/ωj) is another matter. If the pattern vectors x are n-dimensional, then p(x/ωj) is a function of n variables which, if its form is not known, requires methods from multivariate probability theory for its estimation.
These methods are difficult to apply in practice, especially if the number of representative patterns from each class is not large or if the underlying form of the probability density functions is not well behaved. For these reasons, use of the Bayes classifier generally is based on the assumption of an analytic expression for the various density functions, followed by an estimation of the necessary parameters from sample patterns from each class. By far the most prevalent form assumed for p(x/ωj) is the Gaussian probability density function. The closer this assumption is to reality, the closer the Bayes classifier approaches the minimum average loss in classification.
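The conditional average loss and the resulting decision rule can be illustrated for a single pattern once the class likelihoods are known. The sketch below assumes NumPy; `bayes_decide` and the numeric values are made up for illustration. The loss matrix follows the convention that entry L[k, j] is the loss for deciding class j when the true class is k, and the default is the 0-1 loss.

```python
import numpy as np

def bayes_decide(likelihoods, priors, loss=None):
    """Return the index j minimizing r_j = sum_k L[k, j] * p(x|w_k) * P(w_k)."""
    likelihoods = np.asarray(likelihoods, dtype=float)
    priors = np.asarray(priors, dtype=float)
    W = len(priors)
    if loss is None:
        loss = 1.0 - np.eye(W)            # 0-1 loss: L_ij = 1 - delta_ij
    joint = likelihoods * priors          # p(x|w_k) P(w_k) for each class k
    risk = np.asarray(loss).T @ joint     # r_j for each candidate decision j
    return int(np.argmin(risk))

# Hypothetical values of p(x|w_k) at one observed x, and the class priors
likelihoods = [0.2, 0.5]
priors = [0.7, 0.3]
decision_01 = bayes_decide(likelihoods, priors)
# Making it 10 times costlier to decide class 1 when class 0 is true
decision_skewed = bayes_decide(likelihoods, priors, loss=[[0.0, 10.0], [1.0, 0.0]])
```

With the 0-1 loss the rule reduces to picking the largest p(x/ωj)P(ωj); an asymmetric loss matrix can move the decision to the other class even though the joint probabilities are unchanged.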
ii) Bayes classifier for Gaussian pattern classes
Let us consider a one-dimensional problem (n = 1) involving two pattern classes (W = 2) governed by Gaussian densities, with means m1 and m2 and standard deviations σ1 and σ2, respectively. The Bayes decision functions have the form
$$d_j(x) = p(x/\omega_j)\, P(\omega_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left[-\frac{(x - m_j)^2}{2\sigma_j^2}\right] P(\omega_j), \qquad j = 1, 2 \qquad (1)$$
where the patterns are now scalars, denoted by x. The figure shows a plot of the probability density functions for the two classes.
Fig : Probability density functions for 1-D pattern classes. The point x0 shown is the decision boundary if the two classes are equally likely to occur.
The boundary between the two classes is a single point, denoted x0, such that d1(x0) = d2(x0). If the two classes are equally likely to occur, then P(ω1) = P(ω2) = ½ and the decision boundary is the value of x0 for which p(x0/ω1) = p(x0/ω2).
This point is the intersection of the two probability density functions, as shown in the figure. Any pattern to the right of x0 is classified as belonging to class ω2, and any pattern to the left as belonging to class ω1. When the classes are not equally likely to occur, x0 moves to the left if class ω2 is more likely to occur or, conversely, to the right if class ω1 is more likely to occur. This result is to be expected, because the classifier is trying to minimize the loss of misclassification. For instance, in the extreme case, if class ω2 never occurs, the classifier would never make a mistake by always assigning all patterns to class ω1.
In the n-dimensional case, the Gaussian density of the vectors in the jth pattern class has the form
$$p(\mathbf{x}/\omega_j) = \frac{1}{(2\pi)^{n/2}\,|\mathbf{C}_j|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x} - \mathbf{m}_j)^T \mathbf{C}_j^{-1} (\mathbf{x} - \mathbf{m}_j)\right] \qquad (2)$$
where each density is specified completely by its mean vector mj and covariance matrix Cj, which are defined as
$$\mathbf{m}_j = E_j\{\mathbf{x}\} \qquad (3)$$
and
$$\mathbf{C}_j = E_j\{(\mathbf{x} - \mathbf{m}_j)(\mathbf{x} - \mathbf{m}_j)^T\} \qquad (4)$$
where Ej{.} denotes the expected value of the argument over the patterns of class ωj. In Eq. (2), n is the dimensionality of the pattern vectors, and |Cj| is the determinant of the matrix Cj. Approximating the expected value Ej by the average value of the quantities in question yields an estimate of the mean vector and covariance matrix:
$$\mathbf{m}_j = \frac{1}{N_j} \sum_{\mathbf{x} \in \omega_j} \mathbf{x} \qquad (5)$$
and
$$\mathbf{C}_j = \frac{1}{N_j} \sum_{\mathbf{x} \in \omega_j} \mathbf{x}\mathbf{x}^T - \mathbf{m}_j\mathbf{m}_j^T \qquad (6)$$
where Nj is the number of pattern vectors from class ωj and the summation is taken over these vectors.
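According to Eq. (1), the Bayes decision function of class ωj is dj(x) = p(x/ωj)P(ωj). However, because of the exponential form of the Gaussian density, working with the natural logarithm of this decision function is more convenient. In other words, we can use the form
$$d_j(\mathbf{x}) = \ln\left[p(\mathbf{x}/\omega_j)\, P(\omega_j)\right] = \ln p(\mathbf{x}/\omega_j) + \ln P(\omega_j) \qquad (7)$$
This expression is equivalent to Eq. (1) in terms of classification performance because the logarithm is a monotonically increasing function. In other words, the numerical order of the decision functions in Eq. (1) and Eq. (7) is the same. Substituting Eq. (2) into Eq. (7) yields
$$d_j(\mathbf{x}) = \ln P(\omega_j) - \frac{n}{2}\ln 2\pi - \frac{1}{2}\ln|\mathbf{C}_j| - \frac{1}{2}\left[(\mathbf{x} - \mathbf{m}_j)^T \mathbf{C}_j^{-1} (\mathbf{x} - \mathbf{m}_j)\right] \qquad (8)$$
The term (n/2) ln 2π is the same for all classes, so it can be eliminated from Eq. (8), which then becomes
$$d_j(\mathbf{x}) = \ln P(\omega_j) - \frac{1}{2}\ln|\mathbf{C}_j| - \frac{1}{2}\left[(\mathbf{x} - \mathbf{m}_j)^T \mathbf{C}_j^{-1} (\mathbf{x} - \mathbf{m}_j)\right] \qquad (9)$$
for j = 1, 2, ..., W. Equation (9) represents the Bayes decision functions for Gaussian pattern classes under the condition of a 0-1 loss function.
The decision functions in Eq. (9) are hyperquadrics, because no terms higher than the second degree in the components of x appear in the equation. Clearly, then, the best that a Bayes classifier for Gaussian patterns can do is to place a general second order decision surface between each pair of pattern classes.
If the pattern populations are truly Gaussian, however, no other surface would yield a lesser average loss in classification. If all covariance matrices are equal, then Cj = C for j = 1, 2, ..., W. By expanding Eq. (9) and dropping all terms independent of j we obtain
$$d_j(\mathbf{x}) = \ln P(\omega_j) + \mathbf{x}^T \mathbf{C}^{-1} \mathbf{m}_j - \frac{1}{2}\,\mathbf{m}_j^T \mathbf{C}^{-1} \mathbf{m}_j \qquad (10)$$
which are linear decision functions for j = 1, 2, ..., W.
If, in addition, C = I, where I is the identity matrix, and also P(ωj) = 1/W for j = 1, 2, ..., W, then
$$d_j(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_j - \frac{1}{2}\,\mathbf{m}_j^T\mathbf{m}_j, \qquad j = 1, 2, \ldots, W \qquad (11)$$
which is the decision function of the minimum distance classifier.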
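The sample estimates of the mean vector and covariance matrix, and the log-form decision function with the common (n/2) ln 2π term dropped, can be sketched as follows. The code assumes NumPy and a non-singular covariance matrix; `fit_gaussian_class` and `gaussian_decision` are hypothetical names and the training patterns are made up.

```python
import numpy as np

def fit_gaussian_class(X):
    """Estimate m_j and C_j from the patterns of one class (one pattern per row)."""
    m = X.mean(axis=0)
    C = (X.T @ X) / X.shape[0] - np.outer(m, m)   # (1/N) sum x x^T - m m^T
    return m, C

def gaussian_decision(x, m, C, prior):
    """d_j(x) = ln P(w_j) - 0.5 ln|C_j| - 0.5 (x - m_j)^T C_j^{-1} (x - m_j)."""
    diff = x - m
    _, logdet = np.linalg.slogdet(C)              # log-determinant, C assumed positive definite
    return np.log(prior) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(C, diff)

# Four training patterns whose sample covariance is exactly the identity
X1 = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
m1, C1 = fit_gaussian_class(X1)
m2, C2 = fit_gaussian_class(X1 + 4.0)             # same shape, shifted mean

x = np.array([1.5, 2.0])
d1 = gaussian_decision(x, m1, C1, prior=0.5)
d2 = gaussian_decision(x, m2, C2, prior=0.5)
```

The pattern is assigned to the class with the larger decision value. With equal priors and identity covariances, as in this toy data, the ordering of d1 and d2 matches the minimum distance classifier.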
9. Explain perceptron for two pattern classes.
10. Explain multilayer feedforward networks in pattern recognition.