Multisource classification using ICM and Dempster-Shafer theory

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 51, NO. 2, APRIL 2002 277

Multisource Classification Using ICM and Dempster–Shafer Theory

Samuel Foucher, Member, IEEE, Mickaël Germain, Student Member, IEEE, Jean-Marc Boucher, Member, IEEE, and Goze Bertin Bénié, Member, IEEE

Abstract—We propose to use evidential reasoning in order to relax the Bayesian decisions given by a Markovian classification algorithm (ICM). The Dempster–Shafer rule of combination enables us to fuse decisions within a local spatial neighborhood, and we further extend this fusion to the multisource case. This approach makes the fusion of information more direct. Applied to the classification of very noisy images, the method gives promising results.

Index Terms—Data fusion, Dempster–Shafer theory, ICM algorithm, multisource classification, remote sensing.

I. INTRODUCTION

FOR the past few years, image processing research has focused on the problem of merging several images in order to increase information content. Image fusion can be done at different levels of representation: pixel level, feature level, or decision level. The present paper deals with the fusion of decisions (classes), commonly called multisource classification. Traditional methods, such as maximum likelihood (ML), rely on a multivariate Gaussian pdf to model the data set statistically. Whereas this is suitable for multispectral data, such a model fails when the sources of information are highly heterogeneous, e.g., a combination of radar and optical images. Moreover, the performance of ML methods degrades rapidly as the number of images increases, and the quality of the training becomes critical. In order to overcome these limits, fusion methods try to deal with the following issues: heterogeneity of the sources and of their representation formats, a large number of sources, imprecision in the data, non-Gaussian sources, etc. Fusion methods fall into two main approaches: the statistical approach, using a classical Bayesian framework, and methods using an artificial intelligence framework, such as possibility theory or Dempster–Shafer theory. The aim of this article is twofold. First, we propose a modification of the multiscale iterated conditional mode (ICM) algorithm using a local relaxation of the Bayesian decision based on Dempster–Shafer theory. Second, we extend this approach to the multisource case. The final method produces encouraging results on the classification of radar images and on the fusion of an optical (SPOT) image and a SAR (RADARSAT) image.

Manuscript received December 12, 2001; revised January 21, 2002.

S. Foucher and G. B. Bénié are with the Centre d’Applications et de Recherche en Télédétection, Université de Sherbrooke, Sherbrooke, QC, Canada (e-mail: [email protected]).

M. Germain and J.-M. Boucher are with the École Nationale Supérieure des Télécommunications de Bretagne, Brest, France.

Publisher Item Identifier S 0018-9456(02)04315-2.

II. PRINCIPLE

A. ICM Classification

Markovian methods of classification try to estimate the maximum a posteriori (MAP) solution for the class field of the image. Annealing methods, such as the Gibbs sampler or the Metropolis algorithm, ensure convergence toward a global energy minimum, but their computational burden is high. Deterministic methods such as ICM are much faster but remain suboptimal, since finding a global minimum is not guaranteed [1]. The ICM method estimates a local MAP solution for each label by minimizing the sum of the local likelihood and Gibbs energies. The image data from a sensor are assumed to consist of vectors $y = \{y_s, s \in S\}$, where $S$ is the set of pixels in the image. The classification process estimates the class labels of the scene, $x = \{x_s, s \in S\}$, where each $x_s$ is chosen in the class set $\Omega = \{\omega_1, \ldots, \omega_K\}$. The ICM algorithm [1] is one solution to this problem.

The ICM algorithm is based on maximizing $P(x_s \mid y, x_{S \setminus \{s\}})$ with respect to $x_s$, where $x_{S \setminus \{s\}}$ denotes the contextual labels, i.e., the current labels of all the other pixels. At each iteration of the algorithm, the label chosen at pixel $s$ is the class that maximizes this conditional probability, given $y$ and the current class labels elsewhere. Convergence is fast, but it is toward a local maximum rather than toward the global MAP solution sought by methods such as simulated annealing.
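The ICM update above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are ours, not the paper's: a Gaussian class-conditional likelihood with known means and a common standard deviation `sigma`, and a Potts prior with a hypothetical interaction weight `beta` over the 8-neighborhood. The function name `icm` and its parameters are illustrative.

```python
def icm(image, means, sigma, beta, n_iter=10):
    """Iterated conditional modes on a 2-D image given as a list of lists of floats.

    Illustrative assumptions: class-conditional Gaussian likelihood N(means[k], sigma**2)
    and a Potts prior adding `beta` to the energy for each 8-neighbor whose current
    label differs from the candidate label. Returns a list of lists of class indices.
    """
    h, w = len(image), len(image[0])
    classes = range(len(means))

    def data_energy(y, k):
        # Negative Gaussian log-likelihood, up to an additive constant.
        return (y - means[k]) ** 2 / (2.0 * sigma ** 2)

    # Initial labels: pixelwise maximum likelihood (no spatial context).
    labels = [[min(classes, key=lambda k: data_energy(image[i][j], k))
               for j in range(w)] for i in range(h)]

    for _ in range(n_iter):
        changed = 0
        for i in range(h):
            for j in range(w):
                neighbors = [labels[a][b]
                             for a in range(max(0, i - 1), min(h, i + 2))
                             for b in range(max(0, j - 1), min(w, j + 2))
                             if (a, b) != (i, j)]

                def local_energy(k):
                    # Local likelihood energy plus Gibbs (Potts) energy.
                    return data_energy(image[i][j], k) + beta * sum(l != k for l in neighbors)

                best = min(classes, key=local_energy)
                if best != labels[i][j]:
                    labels[i][j] = best  # in-place update: later pixels see the new label
                    changed += 1
        if changed == 0:  # no label changed: a local minimum of the energy is reached
            break
    return labels
```

With `beta = 0` the sweep reduces to pixelwise maximum likelihood; increasing `beta` strengthens the contextual (Gibbs) term. Because labels are updated in place, each pixel sees the labels already revised in the current sweep, which is what makes ICM a coordinate-wise descent on the energy.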

B. Dempster–Shafer Theory Basics

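Dempster–Shafer theory is a mathematical framework in which nonadditive probability models enable us to model imprecision in beliefs [6]. The hypothesis set, called the frame of discernment, represents a set of mutually exclusive and exhaustive propositions. In our classification problem, the frame of discernment is the class set, $\Theta = \{\omega_1, \ldots, \omega_K\}$. Evidence on a subset $A \subseteq \Theta$ is represented with a basic probability assignment (bpa) $m : 2^{\Theta} \to [0, 1]$. Subsets with nonnull bpa are called focal elements and compose the kernel

$$\mathcal{K}(m) = \{A \subseteq \Theta : m(A) > 0\},$$

and the bpa has the following properties:

$$m(\emptyset) = 0 \tag{1}$$

$$0 \le m(A) \le 1, \quad \forall A \subseteq \Theta \tag{2}$$

$$\sum_{A \subseteq \Theta} m(A) = 1. \tag{3}$$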

The belief function $\mathrm{Bel}(A)$ gives the amount of evidence that implies the observation of $A$. This function is defined on the frame of discernment by the relation

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B). \tag{4}$$

0018-9456/02$17.00 © 2002 IEEE


Fig. 1. (a) Consonant distribution, (b) partially consonant distribution, and (c) dissonant distribution.

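The belief function satisfies

$$\mathrm{Bel}(\emptyset) = 0 \tag{5}$$

$$\mathrm{Bel}(\Theta) = 1. \tag{6}$$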

The plausibility function $\mathrm{Pl}(A)$ can be seen as the amount of evidence that does not refute $A$:

$$\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B). \tag{7}$$

This function can be expressed in terms of the belief function:

$$\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\bar{A}), \tag{8}$$

where $\bar{A} = \Theta \setminus A$.
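Relations (4), (7), and (8) can be checked numerically. The sketch below stores a bpa as a Python dict mapping `frozenset` focal elements to masses; the three-class frame and the mass values are hypothetical.

```python
from itertools import combinations


def belief(m, A):
    """Bel(A), relation (4): total mass of the nonempty focal elements contained in A."""
    A = frozenset(A)
    return sum(v for B, v in m.items() if B and B <= A)


def plausibility(m, A):
    """Pl(A), relation (7): total mass of the focal elements that intersect A."""
    A = frozenset(A)
    return sum(v for B, v in m.items() if B & A)


def subsets(theta):
    """All subsets of the frame of discernment, as frozensets."""
    items = sorted(theta)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


# Illustrative consonant bpa on a hypothetical three-class frame.
theta = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, theta: 0.2}

for A in subsets(theta):
    # Relation (8): Pl(A) = 1 - Bel(complement of A), up to rounding.
    assert abs(plausibility(m, A) - (1.0 - belief(m, theta - A))) < 1e-12
```

Here $\mathrm{Bel}(\{a, b\}) = 0.8$ while $\mathrm{Pl}(\{b\}) = 0.5$, showing the gap between the two functions when focal elements are not singletons.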

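Total ignorance is represented when $\Theta$ is the only focal element, i.e., $m(\Theta) = 1$. Conversely, when the focal elements are all singletons, we obtain a Bayesian representation in which $\mathrm{Bel}$ and $\mathrm{Pl}$ are equal and equivalent to a probability measure on $\Theta$. When we observe the outcome of a statistical experiment, Shafer proposes an approach to assess our evidence concerning $\Theta$ provided by the statistical observation [4], [6]. The Dempster rule combines pieces of evidence from independent sources. If $m_1$ and $m_2$ are two bpa's on $\Theta$, their orthogonal sum $m = m_1 \oplus m_2$ is given by

$$m(\emptyset) = 0 \tag{9}$$

$$m(A) = \frac{1}{1 - k} \sum_{B \cap C = A} m_1(B)\, m_2(C), \quad A \neq \emptyset \tag{10}$$

$$k = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C), \tag{11}$$

where $k$ measures the conflict between the two sources.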
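A direct, unoptimized implementation of Dempster's rule, following the form of (9)-(11), may help fix ideas. It assumes the same dict-of-`frozenset` representation of a bpa; `dempster_combine` is an illustrative name, and raising an exception on total conflict is our choice.

```python
from collections import defaultdict


def dempster_combine(m1, m2):
    """Orthogonal sum of two bpa's stored as {frozenset: mass}.

    Returns (m, k): the normalized combined bpa and the conflict mass k.
    Raises ValueError when k == 1 (total conflict), where the rule is undefined.
    """
    raw = defaultdict(float)
    conflict = 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            inter = B & C
            if inter:
                raw[inter] += v1 * v2    # product mass goes to the intersection
            else:
                conflict += v1 * v2      # empty intersection: contributes to k
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Normalization by 1 - k restores a total mass of 1 with m(empty set) = 0.
    return {A: v / (1.0 - conflict) for A, v in raw.items()}, conflict
```

Because the rule is commutative and associative, a whole neighborhood can be combined by folding this function over the sites; the cost grows with the number of focal-element pairs, which is why imposing a structure on the kernel helps.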

The Dempster rule is difficult to apply when kernels have nonsingleton focal elements. The different mass representations are a way to reduce complexity by imposing a structure on the kernel. These representations can be described by three distributions (Fig. 1).

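A mathematical framework for the consonant distribution has been described by Shafer [6]. This theory has been generalized by Walley [7] with the definition of the partially consonant belief. A belief function $\mathrm{Bel}$ is called "partially consonant" on $\Theta$ if it is "consonant" on each element of a partition $\{\Theta_1, \ldots, \Theta_p\}$ of $\Theta$:

$$\Theta = \bigcup_{i=1}^{p} \Theta_i, \qquad \Theta_i \cap \Theta_j = \emptyset \quad (i \neq j). \tag{12}$$

Every focal element then lies within a single element of the partition, and the focal elements contained in the same $\Theta_i$ are nested:

$$m(A) > 0 \;\Rightarrow\; \exists\, i : A \subseteq \Theta_i. \tag{13}$$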

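Note that for a partition with $p = 1$ (i.e., $\Theta_1 = \Theta$), the partially consonant belief becomes a consonant belief as defined by Shafer, whose focal elements are nested:

$$A_1 \subset A_2 \subset \cdots \subset A_n. \tag{14}$$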

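Conversely, a partition with $p = |\Theta|$, in which each $\Theta_i$ is a singleton, yields a dissonant belief defined by a set of Bayesian masses:

$$m(\{\omega\}) \ge 0, \qquad \sum_{\omega \in \Theta} m(\{\omega\}) = 1. \tag{15}$$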
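The three structures of Fig. 1 can be told apart mechanically. The sketch assumes the usual reading of these definitions: focal elements that intersect must be nested, and groups of nested focal elements are pairwise disjoint. `kernel_structure` is a hypothetical helper, and kernels that fit none of the three structures are reported as `"general"`.

```python
def kernel_structure(m):
    """Classify the kernel of a bpa {frozenset: mass}; zero masses are ignored."""
    focal = [A for A, v in m.items() if v > 0]
    if all(len(A) == 1 for A in focal):
        return "dissonant"  # only singletons: a Bayesian bpa
    # Group focal elements into connected components of the "intersects" relation.
    groups = []
    for A in focal:
        touching = [g for g in groups if any(A & B for B in g)]
        touching_ids = {id(g) for g in touching}
        merged = [A] + [B for g in touching for B in g]
        groups = [g for g in groups if id(g) not in touching_ids] + [merged]
    # Each group must be a chain of nested sets for the kernel to be (partially) consonant.
    for g in groups:
        chain = sorted(g, key=len)
        if any(not chain[i] <= chain[i + 1] for i in range(len(chain) - 1)):
            return "general"
    return "consonant" if len(groups) == 1 else "partially consonant"
```

Components of the "intersects" relation coincide with the groups of the partition: nested nonempty sets always intersect, and sets in different groups never do.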

In Section II-C, we give closed algebraic formulations for these three bpa distributions (one consonant, one partially consonant, and one dissonant).

C. Local Relaxation of the Bayesian Decision With the Dempster Rule

In order to relax the first decision made by one ICM iteration, we make the following hypothesis: in a 3 × 3 neighborhood $V_s$, the labels around the central pixel $s$ are elements of evidence that determine our belief in the value of the label $x_s$ of the central pixel.

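1) Elementary Mass Distributions and Local Combination Rule: Following one iteration of the ICM algorithm, the labels attached to each pixel $r$ can be ordered by decreasing probability value:

$$P_r(\omega_{r,1}) \ge P_r(\omega_{r,2}) \ge \cdots \ge P_r(\omega_{r,K}), \quad \text{with } \omega_{r,k} \in \Theta \text{ and } \sum_{k=1}^{K} P_r(\omega_{r,k}) = 1,$$

where $K = |\Theta|$ is the number of classes.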

Three different types of mass distributions are used, reflecting different ways to distribute our belief on $\Theta$.

a) Consonant distribution: The elementary masses of evidence are determined by the results of the previous Bayesian decision in the ICM iteration. The choice of the elementary mass distribution on the frame of discernment $\Theta$ is critical because it models our prior knowledge. Following Shafer [6] and Denoeux [3], we choose to distribute the elementary belief in a consonant way, as follows:

(16)

The local combination using the Dempster rule is

$$m_s = \bigoplus_{r \in V_s} m_r, \tag{17}$$

where $V_s$ is the 3 × 3 neighborhood of pixel $s$ and $m_r$ is the elementary mass of site $r$. In the neighborhood $V_s$, the evidence supporting a given class can be split into two sets: the set of sites whose focal elements support that class and the set of the remaining sites. This split enables us to identify possible factorizations in relation (17), and the mass combinations in the two sets can be done separately:

(18)

The second term is trivial, and the first one can be simplified by observing the following relation, which expresses the sum of all the combinations of products of binary terms $a_i$, with $i$ ranging over a set of indexes $I$:

(19)

Taking the binary terms to be the elementary masses of the corresponding sites, we obtain the first term of relation (18):

(20)
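The identity behind (19) can be illustrated numerically. Assuming (19) takes the common form of such a relation, namely that summing $\prod_{j \in J} a_j \prod_{j \in I \setminus J} (1 - a_j)$ over all nonempty subsets $J \subseteq I$ gives $1 - \prod_{i \in I} (1 - a_i)$, the sketch compares brute-force enumeration with this closed form. Both function names are hypothetical.

```python
from itertools import product
from math import prod


def sum_of_products(a):
    """Brute force: sum over nonempty J of prod_{j in J} a_j * prod_{j not in J} (1 - a_j)."""
    total = 0.0
    for choice in product((0, 1), repeat=len(a)):
        if any(choice):  # skip the empty subset
            total += prod(x if c else 1.0 - x for x, c in zip(a, choice))
    return total


def closed_form(a):
    """1 - prod_i (1 - a_i): one minus the term of the empty subset."""
    return 1.0 - prod(1.0 - x for x in a)
```

The enumeration has $2^{|I|}$ terms, whereas the closed form is linear in $|I|$; this is the kind of saving that makes the neighborhood combination tractable.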

In what follows, including Sections II-C1b and II-C1c, we use the nonnormalized versions of the elementary mass and of the Dempster rule. Using relations (18) and (19), we obtain the following nonnormalized masses:

(21)

(22)

The normalization constant is

(23)
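To illustrate how grouping sites by their focal sets yields closed forms, the sketch below uses a deliberately simple model that is our assumption, not a transcription of (16) or (21)-(23): in the spirit of Denoeux [3], each neighbor $r$ commits a mass $\alpha_r$ to the singleton of its ICM label and $1 - \alpha_r$ to $\Theta$. Under this model, with $P_q = \prod_{r : x_r = \omega_q} (1 - \alpha_r)$, the nonnormalized masses are $(1 - P_q) \prod_{q' \neq q} P_{q'}$ for $\{\omega_q\}$ and $\prod_q P_q$ for $\Theta$. The code checks this against repeated pairwise application of Dempster's rule; all function names are hypothetical.

```python
from collections import defaultdict
from math import prod


def dempster(m1, m2):
    """Normalized Dempster rule for bpa's stored as {frozenset: mass}."""
    raw, conflict = defaultdict(float), 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            if B & C:
                raw[B & C] += v1 * v2
            else:
                conflict += v1 * v2
    return {A: v / (1.0 - conflict) for A, v in raw.items()}


def brute_force_neighborhood(labels, alphas, theta):
    """Fold Dempster's rule over simple support functions, one per neighbor."""
    theta = frozenset(theta)
    m = {theta: 1.0}  # vacuous bpa: neutral element of Dempster's rule
    for q, a in zip(labels, alphas):
        m = dempster(m, {frozenset([q]): a, theta: 1.0 - a})
    return m


def closed_form_neighborhood(labels, alphas, theta):
    """Closed-form combination obtained by grouping neighbors by label."""
    P = defaultdict(lambda: 1.0)  # P[q] = prod over neighbors labeled q of (1 - alpha_r)
    for q, a in zip(labels, alphas):
        P[q] *= 1.0 - a
    unnorm = {frozenset([q]): (1.0 - P[q]) * prod(P[p] for p in P if p != q) for q in P}
    unnorm[frozenset(theta)] = prod(P.values())
    norm = sum(unnorm.values())  # equals 1 - conflict
    if norm <= 0.0:
        raise ValueError("total conflict in the neighborhood")
    return {A: v / norm for A, v in unnorm.items()}
```

The closed form needs one pass over the neighbors and one product per class, instead of a combinatorial number of focal-element intersections.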

b) Partially consonant distribution: The focal elements follow the partially consonant structure described in Section II-B, with the following mass distribution:

(24)

The Dempster product is simplified in the same way as in the consonant case, by grouping pixels according to their focal sets when these have a nonnull intersection. Consequently, the nonnormalized Dempster product gives

(25)

Terms in this relation can be simplified using relation (20), which gives

(26)

We obtain two simple algebraic relations

(27)


c) Dissonant distribution: The focal elements are the singletons $\{\omega\}$, $\omega \in \Theta$. In that case, the focal elements have the following mass distribution:

(28)

As for the partially consonant distribution, we obtain two simple algebraic relations of the form (27).
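For the dissonant case, where every focal element is a singleton, a standard consequence of Dempster's rule is that only identical classes have nonempty intersections, so the combined mass of each class is the normalized product of the individual masses. The sketch shows this reduction under the assumption that each site supplies a Bayesian bpa over the classes; it is not a transcription of (28) or of the paper's relations (27), and `combine_bayesian` is a hypothetical name.

```python
from math import prod


def combine_bayesian(bpas):
    """Dempster combination of Bayesian bpa's given as {class: mass} dicts.

    With singleton focal elements, {a} & {b} is empty unless a == b, so the
    rule reduces to a per-class product followed by normalization.
    """
    classes = set().union(*bpas)
    raw = {c: prod(m.get(c, 0.0) for m in bpas) for c in classes}
    total = sum(raw.values())  # equals 1 - conflict
    if total == 0.0:
        raise ValueError("total conflict: no class is supported by all sources")
    return {c: v / total for c, v in raw.items()}
```

The cost is linear in the number of sites and classes, which is consistent with the simple algebraic form expected in this case.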

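2) Decision Rule: There are many possible decision rules, the most straightforward being the maximum of belief rule:

$$\hat{x}_s = \arg\max_{\omega \in \Theta} \mathrm{Bel}_s(\{\omega\}), \tag{29}$$

where $\mathrm{Bel}_s$ is the belief function of the combined mass at pixel $s$.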

When the conflict between the decisions in the neighborhood is total (the conflict mass equals 1), the rule of combination is no longer defined. When such a conflict occurs, we propose to take the decision in the neighborhood that has the best confidence level, that is,

(30)
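The decision rule (29) and the conflict fallback (30) can be sketched together. For simplicity the sketch assumes dissonant (Bayesian) elementary masses, so the belief of a singleton equals its mass, and it takes a neighbor's confidence level to be its largest class probability; both are illustrative assumptions, and `decide` is a hypothetical name.

```python
from math import prod


def decide(neighbor_bpas):
    """Maximum-of-belief decision with a fallback for total conflict.

    neighbor_bpas: one Bayesian bpa {class: mass} per site of the neighborhood.
    """
    classes = set().union(*neighbor_bpas)
    raw = {c: prod(m.get(c, 0.0) for m in neighbor_bpas) for c in classes}
    if sum(raw.values()) > 0.0:
        # Bel({c}) = m({c}) for singletons; normalization does not change the argmax.
        return max(raw, key=raw.get)
    # Total conflict: keep the neighborhood decision with the best confidence level.
    best = max(neighbor_bpas, key=lambda m: max(m.values()))
    return max(best, key=best.get)
```

Skipping the normalization before the argmax is safe because dividing all masses by the same positive constant preserves their order.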

D. Extension to the Multisource Case

We consider a set of $N$ images $Y^{(1)}, \ldots, Y^{(N)}$. Each information source $n$ has its own class set, denoted $\Omega^{(n)}$. The fusion process at the decision level aims to focus the decisions from the sources onto the set of information classes of interest, $\Lambda$. As a result, we obtain the multisource classification.

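1) Decision Space Mapping: The projection of the class set $\Omega^{(n)}$ of source $n$ onto the information set $\Lambda = \{\lambda_1, \ldots, \lambda_J\}$ is obtained from a priori knowledge by defining the matrix $B^{(n)} = [\beta^{(n)}_{kj}]$, where $\beta^{(n)}_{kj}$ is our belief that the class $\omega^{(n)}_k$ contributes to the information class $\lambda_j$. Consequently, for a site $s$, the bpa's in the information set are calculated from the source bpa's using this prior belief in the following manner: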

(31)

(32)

After projection, information fusion is achieved by combining the bpa's in the following multisource neighborhood:

(33)

As in Section II-C, in case of total conflict, we take the decision in the multisource neighborhood that has the maximum bpa.
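The projection step and the multisource combination can be sketched as follows. The sketch does not reproduce (31)-(32); it assumes a simple mapping in which each source class passes a fraction $\beta_{kj}$ of its probability to the information class $\lambda_j$, with the uncommitted remainder going to the whole information set as ignorance. The class names, belief values, and function names are made up for illustration.

```python
from collections import defaultdict


def project(probs, beta, info_classes):
    """Hypothetical mapping of one source's class probabilities onto the information set.

    probs: {source_class: probability} at one site.
    beta:  {source_class: {info_class: prior belief}}, each row summing to at most 1.
    Mass not committed to a single information class goes to the whole set.
    """
    theta = frozenset(info_classes)
    m = defaultdict(float)
    for k, p in probs.items():
        for j, b in beta.get(k, {}).items():
            m[frozenset([j])] += b * p
    committed = sum(m.values())
    if committed > 1.0 + 1e-9:
        raise ValueError("prior beliefs commit more than a unit mass")
    m[theta] += max(0.0, 1.0 - committed)  # ignorance
    return dict(m)


def dempster(m1, m2):
    """Dempster's rule of combination for bpa's stored as {frozenset: mass}."""
    raw, conflict = defaultdict(float), 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            if B & C:
                raw[B & C] += v1 * v2
            else:
                conflict += v1 * v2
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict")
    return {A: v / (1.0 - conflict) for A, v in raw.items()}


# Hypothetical two-sensor example at one site.
info = {"forest", "soil", "laterite"}
spot = project({"dense": 0.8, "bare": 0.2},
               {"dense": {"forest": 0.9}, "bare": {"soil": 0.6, "laterite": 0.3}}, info)
radar = project({"bright": 0.7, "dark": 0.3},
                {"bright": {"forest": 0.8}, "dark": {"laterite": 0.9}}, info)
fused = dempster(spot, radar)
decision = max((A for A in fused if len(A) == 1), key=fused.get)
```

In this made-up example both sensors favor the forest class, so the fused bpa concentrates on it; the mass left on the whole set records how much each sensor declined to commit.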

Fig. 2. (a) Noisy image, (b) truth, (c) ICM method without Dempster–Shafer, (d) proposed method with consonant distribution, (e) proposed method with partially consonant distribution, and (f) proposed method with dissonant distribution.

III. RESULTS

A. Algorithm Implementation

To compare the different mass distributions, we use an artificial image simulated from a Gibbs field, for which initialization is performed with the SEM (stochastic expectation-maximization) algorithm.

B. Classification of Artificial Noisy Image

The proposed algorithm is applied to a 256 × 256 artificial image (Fig. 2) corrupted by simulated Gaussian noise. The ground truth contains four classes [Fig. 2(b)]. A simple ICM algorithm gives 89.9% correct classification [Fig. 2(c)]. We obtained good results with Dempster–Shafer (around 93% correct classification), despite the strong noise and without filtering the image. The consonant and partially consonant distributions give comparable results (91.2% and 93.2% correct classification, respectively), whereas the dissonant distribution gives better results (94.7%).
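The classification rates quoted above are the percentage of pixels whose label matches the ground truth. A minimal helper, assuming the labels of the result have already been matched to the ground-truth classes (an unsupervised result may need a relabeling step first); `percent_correct` is a hypothetical name.

```python
def percent_correct(labels, truth):
    """Percentage of pixels whose label equals the ground truth (2-D lists of equal shape)."""
    if len(labels) != len(truth) or any(len(a) != len(b) for a, b in zip(labels, truth)):
        raise ValueError("label map and ground truth must have the same shape")
    pairs = [(a, b) for row_l, row_t in zip(labels, truth) for a, b in zip(row_l, row_t)]
    if not pairs:
        raise ValueError("empty image")
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)
```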

C. Fusion of Optical and Radar Images

SPOT images are very sensitive to vegetation cover density: dense vegetation appears dark, whereas bare soils are bright [Fig. 3(a)]. RADARSAT images give reliable information about lateritic soil, which appears dark on the image, whereas dense vegetation is very bright. The proposed method is used with the following projection matrix:

In this case, we use the multiresolution ICM algorithm proposed by Boucher [2].

In the five-class classification, we preserved the information from the radar (white), whereas information from both sensors is used to determine the vegetation classes (black: dense vegetation; dark gray: average cover; gray: low cover; light gray: bare soil). Despite the very noisy radar image, we are able to extract information without filtering. For comparison, we give the classification results obtained by a clustering algorithm (c-means) and by an unsupervised fusion method based on Dempster–Shafer theory [5].

Fig. 3. (a) SPOT image, (b) radar image, (c) multisource classification, (d) truth, (e) c-means, and (f) UDS method [5].

IV. CONCLUSION

The proposed approach takes into account the local imprecision of a previous Bayesian classification in order to make a second decision based on the Dempster–Shafer rule of combination. We have extended this local relaxation to incorporate multisource information. The results show good robustness to noise.

REFERENCES

[1] J. Besag, “On the statistical analysis of dirty pictures,” J. R. Statist. Soc. B, vol. 48, no. 3, pp. 259–302, 1986.

[2] J. M. Boucher, G. B. Bénié, R. Fau, and S. Plehiers, “Local and global multiscale image classification,” Proc. SPIE, vol. 2303, pp. 485–493, 1994.

[3] T. Denoeux, “A k-nearest neighbor classification rule based on Dempster–Shafer theory,” IEEE Trans. Syst., Man, Cybern., vol. 25, pp. 805–813, May 1995.

[4] H. Kim and P. H. Swain, “Evidential reasoning approach to multisource-data classification,” IEEE Trans. Geosci. Remote Sensing, vol. 33, pp. 1257–1265, Oct. 1995.

[5] S. Le Hégarat-Mascle, I. Bloch, and D. Vidal-Madjar, “Application of Dempster–Shafer evidence theory to unsupervised classification in multisource remote sensing,” IEEE Trans. Geosci. Remote Sensing, vol. 35, pp. 1018–1031, Apr. 1997.

[6] G. Shafer, A Mathematical Theory of Evidence. Princeton, NJ: Princeton Univ. Press, 1976.

[7] P. Walley, “Belief function representation of statistical evidence,” Ann. Statist., vol. 15, no. 4, pp. 1439–1465, 1987.

Samuel Foucher (M’02) was born in Nantes, France, in 1969. He received the B.S. degree in physics from the University of Nantes in 1989, the telecommunication engineering degree from the Ecole Nationale Supérieure des Télécommunications de Bretagne, Brest, France, the M.S. degree in image processing from the University of Rennes, Rennes, France, in December 1996, and the Ph.D. degree, for work on radar filtering and segmentation, in September 2001.

Mickaël Germain (S’01) was born in Bressuire, France, in 1974. He received the telecommunication engineering degree from the Ecole Nationale Supérieure des Télécommunications de Bretagne, Brest, France, and the M.S. degree in image processing from the University of Rennes, Rennes, France, in November 1998. He is currently pursuing the Ph.D. degree at the University of Sherbrooke, Sherbrooke, QC, Canada.

His research interests include multispectral image fusion and segmentation.

Jean-Marc Boucher (M’83) was born in 1952. He received the engineering degree in telecommunications from the Ecole Nationale Supérieure des Télécommunications, Paris, France, in 1975, and the Habilitation à Diriger des Recherches degree in 1995 from the University of Rennes 1, Rennes, France.

He is currently Professor with the Department of Signal and Communications, Ecole Nationale Supérieure des Télécommunications de Bretagne, Brest, France, where he is also Education Deputy Director. His current research interests include estimation theory, Markov models and Gibbs fields, blind deconvolution, wavelets and multiscale image analysis with applications to radar and sonar image filtering and classification, multisensor seismic signal deconvolution, electrocardiographic signal processing, and speech coding. He has published 100 technical articles in these areas in international journals and conferences.

Goze Bertin Bénié (M’01) was born in Daloa, Ivory Coast. From 1977 to 1987, he received the B.A.Sc. degree in surveying and the M.Sc. and Ph.D. degrees in photogrammetry and remote sensing from Université Laval, Sainte-Foy, QC, Canada.

He was a Postdoctoral Fellow at the Canada Centre for Remote Sensing, Digim, Inc., Lavalin, Montreal, QC, and at Intera Information Technologies, Inc., Calgary, AB, Canada, from 1987 to 1990. In 1990, he joined the Department of Geography and Remote Sensing and the Centre d’applications et de recherches en télédétection (CARTEL) of the Université de Sherbrooke, Sherbrooke, QC, Canada, as an Assistant Professor. He was the head of CARTEL from 1995 to 2000. He is currently Full Professor in image processing and geomatics. His research interests include image filtering, segmentation and classification methodology, and spatial modeling in GIS.
