Image fusion and unsupervised joint segmentation

using a HMM and MCMC algorithms

Olivier Feron† and Ali Mohammad-Djafari†

†Laboratoire des signaux et systemes (LSS), UMR8506 (CNRS-Supelec-UPS)

Supelec, plateau de Moulon, 3 rue Joliot Curie

91192 Gif sur Yvette, France

Abstract

In this paper we propose a Bayesian framework for unsupervised image fusion and

joint segmentation. More specifically we consider the case where we have observed

images of the same object through different imaging processes or through different

spectral bands (multi or hyper spectral images). The objective of this work is then to

propose a coherent approach to combine these images and obtain a joint segmenta-

tion which can be considered as the fusion result of these observations.

The proposed approach is based on a Hidden Markov Modeling (HMM) of the im-

ages where the hidden variables represent the common classification or segmenta-

tion labels. These label variables are modeled by the Potts Markov Random Field

(PMRF). We propose two particular models for the pixels in each segment (iid. or

Markovian) and develop appropriate Markov Chain Monte Carlo (MCMC) algorithms

for their implementation. Finally we present some simulation results to show

the relative performances of these models and mention the potential applications of

the proposed methods in medical imaging and survey and security imaging systems.

Key words : Data fusion, Segmentation, Markov random field, multispectral images,

HMM, MCMC, Gibbs sampling.

1 Introduction

Data fusion and multi-source information has become a very active area of research in

many domains : industrial non destructive testing and evaluation ([1]), industrial inspec-

tion ([2]), and medical imaging ([3, 4, 5, 6, 7]). In all these domains the main objective

of image fusion schemes is to extract all the useful information from the source images,

which will be represented in a single image.

There is a large literature describing techniques of image fusion which use different ap-

proaches :

• Pixel-based approach : Those methods are the simplest and work directly on the

pixels of the source images ([8]). For example the very intuitive method of averag-

ing consists in constructing a pixel of the fused image by averaging the correspond-

ing pixels of the source images. These methods can be used if different images

represent the same physical quantity (luminance for example) with the same scale.

The main limitation of these methods is the fact that very often different images do

not represent the same physical quantity.

• Feature-based and Transform domain approach : The main idea here is to ex-

tract some particular features of the images (contours, regions) which are more

robust to pixel values scaling and variations and then use data fusion techniques

to obtain common features. This domain is more developed in the literature of fu-

sion scheme and considers that the fused image must preserve all the features of

the source images. To extract those features, typical methods use pyramid transforms

(wavelet, Laplacian, gradient, ...) ([9, 10]), which were developed in particular

because they give information on contours and contrast changes, to which human

vision is particularly sensitive. In these methods the coefficients in the transform

domain represent the characteristics of the source images. The fusion then consists

in selecting the dominant coefficients of the source images according to some criterion,

constructing a fused image in the transform domain, and finally applying the inverse

transform to obtain the resulting fused image.

• Image fusion after PCA or ICA : When the number of images to fuse becomes

large (hyperspectral images) it may be necessary to extract the principal

(Principal Component Analysis PCA [11]) or independent (Independent Compo-

nent Analysis ICA [12, 13]) components first and then use image fusion techniques

on these components.

• Probabilistic model-based approach : This type of approach consists in intro-

ducing a model which represents a relationship between the observed images and

the source images or some particular features of them ([5, 7, 8, 14]). The model

can also take into account noise and unknown parameters of the model such as the

registration parameters of the images. These methods may be supervised or not.

In the supervised case a training step, or more generally a pre-processing step, is used to

estimate the parameters of the image model ([15]). In the unsupervised case these

parameters are estimated from the data themselves.

These approaches are neither exhaustive nor mutually exclusive, and they can be combined

in hybrid methods. Across all these methods there are two different objectives :

• to obtain an image which represents all the information of the sources. Because

human vision is very sensitive to contrast changes in the image, this objective is

often reduced to constructing a segmentation in which all the regions and contours of

the different sources are represented.

• to reconstruct an image by using complementary information

present in other data sets.

The method presented in this work belongs to the probabilistic model-based approaches,

and our objective is to obtain a common segmentation and to perform the reconstruction

at the same time. The main problem is how to combine the information contents of

different sets of data gi(r). Very often the data sets gi, and corresponding images fi,

do not represent the same quantities. A general model for these problems can be the

following :

\[
g_i(r) = [H_i f_i](r) + \varepsilon_i(r), \quad i = 1, \ldots, M \tag{1}
\]

where Hi are the functional operators of the measuring systems, or registration operators

if the observations have to be registered. We may note that estimating fi given each set of

data gi is an inverse problem by itself.

Figure 1: Examples of images for data fusion and joint segmentation. (a) T1-weighted, T2-weighted and T1-weighted with contrast agent transversal slices of 3D brain MR images. (b) Two observations from transmission and backscattering X-rays in security systems (with the permission of American Science and Engineering, Inc., 2003).

In this paper we consider the case where the measuring data systems can be assumed

almost perfect and the observations are registered, which means that we can write :

\[
g_i(r) = f_i(r) + \varepsilon_i(r), \quad i = 1, \ldots, M \tag{2}
\]

for $r \in \mathbb{Z}^2$. Note that for images the pixels $r$ belong to a finite lattice $\mathcal{S}$,

and we denote by $S$ the number of pixels of this lattice. In the following we also use the

notations :

\[
g_i = f_i + \varepsilon_i \quad \text{or} \quad g = f + \varepsilon \tag{3}
\]

where $g_i = \{g_i(r), r \in \mathcal{S}\}$ and $g = \{g_i, i = 1, \ldots, M\}$.

Figure 1 shows two examples of image fusion problems. The first data set consists of noisy

multispectral images of transversal slices of 3D brain MR images. The second example

shows a multimodal case with transmission and backscattering X-ray acquisitions of a

suitcase. As we can see in the observed images gi, the only thing these images really

have in common is their anatomy (contours and regions).

In this work we introduce a label variable z(r) for the regions and consider the region

labels as the common feature between all images. Data fusion then becomes the

estimation of the joint segmentation labels z = {z(r), r ∈ S}.

Segmentation is a long-standing problem in computer vision. Recent

works in medical imaging propose methods to construct a segmentation from multispectral

images ([4, 6, 15]), which can be considered as a fusion problem.

Probabilistic frameworks for unsupervised segmentation are a very active area and have

shown effective results in many domains. In [16] and [17] the authors propose a Markov

Chain Monte Carlo (MCMC) method for image segmentation, using a Bayesian framework

and a Markov random field prior. In this paper we propose to use these types of

methods in the case of multiple source images.

The Bayesian approach we propose models the observed data through p(gi|fi), the im-

ages through p(fi|z) and the classification labels z through P (z). When these priors are

appropriately assigned we obtain the expression of the a posteriori p(f ,z|g) from which

we infer not only on z but also on f . Our aim is then to obtain a common segmentation

of M observations and to reconstruct fi, i = 1, . . . ,M at the same time.

This paper is organized as follows : In section 2 we introduce the common feature z,

model the relation between the images fi to it through p(fi|z) and its proper characteris-

tics through a prior law P (z). In section 3 we give detailed expressions of the a posteriori

laws. Section 4 gives the general structure of the MCMC algorithm we used to estimate

f and z. In section 5 we introduce a more complex model accounting for a spatial

dependency of f |z in order to reduce the noise of the observations. In section 6 we

present some simulation results to show the performances of the proposed methods and

their potential applications in medical imaging and security imaging systems. Finally in

section 7 we discuss the estimation of the number of labels.

2 Modeling for Bayesian data fusion

Within the observation model (3) the expression of the posterior law p(f ,z|g) is given

by the relation :

\[
p(f, z \mid g) \propto p(g \mid f)\, p(f \mid z)\, P(z). \tag{4}
\]

We then need to give precise expressions of p(g|f), p(f |z) and P (z) according to appropriate

hypotheses on the noise model, the image model and the label model.

2.1 Observation noise model and the likelihood

Assuming independent noises εi among the different observations we have

\[
p(g \mid f) = \prod_{i=1}^{M} p(g_i \mid f_i) = \prod_{i=1}^{M} p_{\varepsilon_i}(g_i - f_i)
\]

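Assuming εi centered, white and Gaussian, $p(\varepsilon_i) = \mathcal{N}(0, \sigma_{\varepsilon_i}^2 I)$, and $S$ the number of

pixels of an image, we have :

\[
p(g_i \mid f_i) = \mathcal{N}(f_i, \sigma_{\varepsilon_i}^2 I)
= \left( \frac{1}{2\pi\sigma_{\varepsilon_i}^2} \right)^{S/2}
\exp\left\{ -\frac{1}{2\sigma_{\varepsilon_i}^2} \, \| g_i - f_i \|^2 \right\}
\]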

2.2 Hidden Markov modeling of images

As we want to reconstruct an image with statistically homogeneous regions, it is natural

to introduce a hidden variable $z = (z(1), \ldots, z(S)) \in \{1, \ldots, K\}^S$ which represents

a common classification of the images fi. The problem is now to estimate the set of

variables (f ,z) using the Bayesian approach :

\[
p(f, z \mid g) = p(f \mid z, g)\, P(z \mid g) \tag{5}
\]

Thus to be able to give an expression for p(f ,z|g) using the Bayes formula, we need to

define p(gi|fi) and p(fi|z) for p(f |z, g), and p(gi|z) and P (z) for P (z|g).

To assign p(fi|z) we first define the sets of pixels which are in the same class :

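\[
R_k = \{ r : z(r) = k \}, \quad |R_k| = n_k
\]

\[
f_{ik} = \{ f_i(r) : z(r) = k \}
\]

In this paper, in a first step, we assume that all the pixels $f_{ik}$ of an image $f_i$ which are in

the same class $k$ are characterized by a mean $m_{ik}$ and a variance $\sigma_{ik}^2$ :

\[
p(f_i(r) \mid z(r) = k) = \mathcal{N}(m_{ik}, \sigma_{ik}^2) \quad \forall r \in \mathcal{S}
\]

With these notations we have :

\[
p(f_{ik}) = \mathcal{N}(m_{ik} \mathbf{1}, \sigma_{ik}^2 I) \tag{6}
\]

and thus

\[
p(f_i \mid z) = \prod_{k=1}^{K} \mathcal{N}(m_{ik} \mathbf{1}, \sigma_{ik}^2 I)
= \prod_{k=1}^{K} \left( \frac{1}{\sqrt{2\pi\sigma_{ik}^2}} \right)^{n_k}
\exp\left\{ -\frac{1}{2\sigma_{ik}^2} \, \| f_{ik} - m_{ik} \mathbf{1} \|^2 \right\},
\quad i = 1, \ldots, M \tag{7}
\]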

where 1 is a vector with all components equal to 1. As we will see in section 5, we will

extend this model to the case where the pixels in different regions are assumed indepen-

dent but inside any homogeneous region we account for their local correlation by using a

Gauss-Markov model.

2.3 Potts-Markov modeling of labels

Finally we have to assign P (z). As we introduced the hidden variable z for finding

statistically homogeneous regions in images, it is natural to define a spatial dependency

on these labels. The simplest model to account for this desired local spatial dependency

is a Potts Markov Random Field model :

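\[
P(z) = \frac{1}{T(\alpha)} \exp\left\{ \alpha \sum_{r \in \mathcal{S}} \sum_{s \in \mathcal{V}(r)} \delta(z(r) - z(s)) \right\}, \tag{8}
\]

where $\mathcal{S}$ is the set of pixels, $\delta(0) = 1$, $\delta(t) = 0$ if $t \neq 0$, $\mathcal{V}(r)$ denotes the neighborhood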

of the pixel r (here we consider a neighborhood of 4 pixels), T (α) is the partition function

or the normalization constant and α represents the degree of the spatial dependency of the

variable z. There are many studies on the influences of this parameter. In [18], D. Higdon

showed that there exists a critical value αc which depends on the size of the images and

the number of classes. For values α < αc the Potts model realizations are strongly noisy

with a great number of small regions. For values α > αc, the realizations consist mainly

of a few large regions, which quickly become prevalent, and a homogeneous background. The

Potts model therefore appears ill-suited to segmenting small regions. However it is

used and gives satisfactory results in the case of images with a homogeneous background.

In practice we fix the value of α well above the critical point αc in order to enforce

the spatial dependency.
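To make the exponent of (8) concrete, the following sketch evaluates it for a label image with the 4-pixel neighborhood. It is only an illustration: the function name `potts_log_prior_unnormalized`, the free boundary handling and the toy images are assumptions of ours, and the partition function T(α) is deliberately left out.

```python
import numpy as np

def potts_log_prior_unnormalized(z, alpha):
    """Return alpha * sum_r sum_{s in V(r)} delta(z(r) - z(s)) for a 4-neighborhood.

    Each unordered pair of neighbors is counted twice, which matches the
    double sum over r and s in V(r). The partition function T(alpha) is not
    computed, so this is log P(z) up to an additive constant.
    Assumption: pixels on the image border simply have fewer neighbors.
    """
    z = np.asarray(z)
    horizontal = np.sum(z[:, 1:] == z[:, :-1])  # equal left/right pairs
    vertical = np.sum(z[1:, :] == z[:-1, :])    # equal up/down pairs
    return alpha * 2.0 * (horizontal + vertical)

# Two toy label images (hypothetical data): one homogeneous, one chess board.
uniform = np.zeros((4, 4), dtype=int)
checker = np.indices((4, 4)).sum(axis=0) % 2

print(potts_log_prior_unnormalized(uniform, 1.5))
print(potts_log_prior_unnormalized(checker, 1.5))
```

The homogeneous image maximizes the exponent while the chess board pattern gives zero, which is the sense in which a large α favors large regions.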

We have now all the necessary prior laws p(gi|fi), p(fi|z), p(gi|z) and P (z) and then

we can give an expression for p(f ,z|g). However these probability laws have in general

unknown parameters such as $\sigma_{\varepsilon_i}^2$ in p(gi|fi) or $m_{ik}$ and $\sigma_{ik}^2$ in p(fi|z). In a full Bayesian

approach, we have to assign prior laws to these “hyperparameters”.

2.4 Conjugate priors for the hyperparameters

Let $m_i = (m_{ik})_{k=1,\ldots,K}$ and $\sigma_i^2 = (\sigma_{ik}^2)_{k=1,\ldots,K}$ be the means and the variances of the

pixels in different regions of the images fi as defined before. We define θi as the set of

all the parameters which must be estimated :

\[
\theta_i = (\sigma_{\varepsilon_i}^2, m_i, \sigma_i^2), \quad i = 1, \ldots, M
\]

and we note θ = (θi)i=1,...,M . The choice of prior laws for the hyperparameters is still an

open problem. In [19] the authors used differential geometry tools to construct particular

priors which contain as particular cases the entropic and conjugate priors. In this paper we

choose the latter.

Applying the particular priors of [19] to our case, we find the following conjugate

priors :

• Inverse Gamma $\mathcal{IG}(\alpha_{\varepsilon_i 0}, \beta_{\varepsilon_i 0})$ and $\mathcal{IG}(\alpha_{i0}, \beta_{i0})$ respectively for the variances $\sigma_{\varepsilon_i}^2$ and $\sigma_{ik}^2$,

• Gaussian $\mathcal{N}(m_{i0}, \sigma_{i0}^2)$ for the means $m_{ik}$.

The hyper-hyperparameters $\alpha_{i0}$, $\beta_{i0}$, $m_{i0}$ and $\sigma_{i0}^2$ are fixed and the results are in

general not very sensitive to their exact values. However in the case of noisy images we can

impose a small value on $\sigma_{i0}^2$ in order to force the reconstruction of homogeneous regions.

3 A posteriori distributions for the Gibbs algorithm

The Bayesian approach consists now in estimating the whole set of variables (f ,z,θ)

following the joint a posteriori distribution p(f ,z,θ|g). It is difficult to simulate a joint

sample (f , z, θ) directly from its joint a posteriori distribution. However we can note

that considering the prior laws defined before, we are able to simulate the conditional a

posteriori laws p(f ,z|g,θ) and p(θ|g,f ,z). That is the main reason to propose a Gibbs

algorithm to estimate (f , z, θ), splitting first this set of variables into two subsets, (f ,z)

and (θ), and then into three subsets f , z and θ using the following relation :

\[
p(f, z \mid g, \theta) = p(f \mid z, g, \theta)\, P(z \mid g, \theta), \tag{9}
\]

Then the sampling of the joint distribution p(f ,z|g,θ) is obtained by sampling first

P (z|g,θ) and then sampling p(f |z, g,θ). We will now define the conditional a pos-

teriori distributions we use for the Gibbs algorithm.

Sampling z using P (z|g,θ) :

For this step we have :

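\[
P(z \mid g, \theta) \propto p(g \mid z, \theta)\, P(z) = \prod_{i=1}^{M} p(g_i \mid z, \theta_i)\, P(z)
\]

where, using the relation (3) and the laws p(fi|z) and p(εi), we obtain

\[
p(g_i \mid z, \theta_i) = \prod_{r \in \mathcal{S}} p(g_i(r) \mid z(r), \theta_i)
\]

and

\[
p(g_i(r) \mid z(r) = k) = \mathcal{N}(m_{ik}, \sigma_{ik}^2 + \sigma_{\varepsilon_i}^2) \tag{10}
\]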

As we chose a Potts Markov Random Field model for the labels z, we may note that

an exact sampling of the a posteriori distribution P (z|g,θ) is still impossible. However

we may note that P (z|g,θ) is still a PMRF where the probabilities are weighted by the

likelihood p(g|z,θ). We use this fact to propose in section 4 a parallel implementation of

a Gibbs sampling for this PMRF.
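The step from the pixel model to (10) can be made explicit. The short derivation below is a sketch which assumes, as in sections 2.1 and 2.2, that $f_i(r)$ and $\varepsilon_i(r)$ are independent Gaussian variables given $z(r) = k$; writing it as an integral over $f_i(r)$ is our own presentation.

```latex
\begin{align*}
p(g_i(r) \mid z(r) = k)
  &= \int p(g_i(r) \mid f_i(r))\, p(f_i(r) \mid z(r) = k)\, \mathrm{d}f_i(r) \\
  &= \int \mathcal{N}\!\left(g_i(r);\, f_i(r),\, \sigma_{\varepsilon_i}^2\right)
          \mathcal{N}\!\left(f_i(r);\, m_{ik},\, \sigma_{ik}^2\right) \mathrm{d}f_i(r) \\
  &= \mathcal{N}\!\left(g_i(r);\, m_{ik},\, \sigma_{ik}^2 + \sigma_{\varepsilon_i}^2\right).
\end{align*}
```

In other words $g_i(r)$ is the sum of two independent Gaussian variables, so the means and the variances add.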

Sampling fi using p(fi|gi,z,θi) :

We can write the a posteriori law p(fi(r)|gi(r), z(r),θi) as follows :

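\[
p(f_i(r) \mid g_i(r), z(r) = k, \theta_i) = \mathcal{N}\!\left(m_{ik}^{\mathrm{apost}}, \sigma_{ik}^{2,\mathrm{apost}}\right)
\]

where

\[
m_{ik}^{\mathrm{apost}} = \sigma_{ik}^{2,\mathrm{apost}} \left( \frac{g_i(r)}{\sigma_{\varepsilon_i}^2} + \frac{m_{ik}}{\sigma_{ik}^2} \right),
\qquad
\sigma_{ik}^{2,\mathrm{apost}} = \left( \frac{1}{\sigma_{\varepsilon_i}^2} + \frac{1}{\sigma_{ik}^2} \right)^{-1}
\]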
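The pixel-wise Gaussian update above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `sample_f_given_g_z`, the argument layout and the use of labels 0, ..., K−1 instead of 1, ..., K are assumptions of ours.

```python
import numpy as np

def sample_f_given_g_z(g, z, m, var, var_eps, rng):
    """Draw f_i(r) ~ N(m_apost, var_apost) independently at every pixel.

    g       : (H, W) observed image g_i
    z       : (H, W) integer labels in 0..K-1 (assumed indexing)
    m, var  : (K,) class means m_ik and variances sigma_ik^2
    var_eps : scalar noise variance sigma_eps_i^2
    Returns the sample together with the posterior mean and variance maps.
    """
    m_r = m[z]        # class mean at each pixel
    var_r = var[z]    # class variance at each pixel
    var_post = 1.0 / (1.0 / var_eps + 1.0 / var_r)
    mean_post = var_post * (g / var_eps + m_r / var_r)
    f = mean_post + np.sqrt(var_post) * rng.standard_normal(g.shape)
    return f, mean_post, var_post
```

When the noise variance equals the class variance, the posterior mean is simply the average of the observation and the class mean, which gives a quick sanity check of the formulas.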

Sampling θi using p(θi|fi, gi,z) :

We have the following relation :

\[
p(\theta_i \mid f_i, g_i, z) \propto p(m_i, \sigma_i^2 \mid f_i, z)\; p(\sigma_{\varepsilon_i}^2 \mid f_i, g_i)
\]

For the first term $p(m_i, \sigma_i^2 \mid f_i, z)$ we have to use a Gibbs algorithm and then sample

following the conditional distributions $p(m_i \mid \sigma_i^2, f_i, z)$ and $p(\sigma_i^2 \mid m_i, f_i, z)$. Using again

the Bayes formula, the a posteriori distributions are calculated from the priors

chosen before and we have

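• $m_{ik} \mid f_i, z, \sigma_{ik}^2, m_{i0}, \sigma_{i0}^2 \sim \mathcal{N}(\mu_{ik}, v_{ik}^2)$, with

\[
\mu_{ik} = v_{ik}^2 \left( \frac{m_{i0}}{\sigma_{i0}^2} + \frac{1}{\sigma_{ik}^2} \sum_{r \in R_k} f_i(r) \right),
\qquad
v_{ik}^2 = \left( \frac{n_k}{\sigma_{ik}^2} + \frac{1}{\sigma_{i0}^2} \right)^{-1}
\]

• $\sigma_{ik}^2 \mid f_i, z, m_{ik}, \alpha_{i0}, \beta_{i0} \sim \mathcal{IG}(\alpha_{ik}, \beta_{ik})$, with

\[
\alpha_{ik} = \alpha_{i0} + \frac{n_k}{2},
\qquad
\beta_{ik} = \beta_{i0} + \frac{1}{2} \sum_{r \in R_k} (f_i(r) - m_{ik})^2
\]

• $\sigma_{\varepsilon_i}^2 \mid f_i, g_i \sim \mathcal{IG}(\alpha_i, \beta_i)$, with

\[
\alpha_i = \frac{S}{2} + \alpha_{\varepsilon_i 0}, \quad S = \text{number of pixels},
\qquad
\beta_i = \frac{1}{2} \| g_i - f_i \|^2 + \beta_{\varepsilon_i 0}
\]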
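The three conditional draws listed above can be sketched as follows. This is a minimal illustration under our own assumptions: the function names, the argument names and the 0-based class indices are hypothetical, and the inverse gamma law is taken with density proportional to $x^{-\alpha-1} e^{-\beta/x}$, so that a draw is obtained as the reciprocal of a gamma variable.

```python
import numpy as np

def sample_inverse_gamma(shape, scale, rng):
    # If X ~ Gamma(shape, rate=scale), then 1/X ~ IG(shape, scale).
    return 1.0 / rng.gamma(shape, 1.0 / scale)

def sample_theta_i(f, g, z, K, var_k, m0, var0, a0, b0, a_eps0, b_eps0, rng):
    """One draw of (m_i, sigma_i^2, sigma_eps_i^2) given f_i, g_i and z.

    var_k holds the current class variances sigma_ik^2 used to draw the means;
    the class variances are then drawn given the new means.
    """
    m_new = np.empty(K)
    var_new = np.empty(K)
    for k in range(K):
        fk = f[z == k]                       # pixels of class k (the set R_k)
        nk = fk.size
        v2 = 1.0 / (nk / var_k[k] + 1.0 / var0)
        mu = v2 * (m0 / var0 + fk.sum() / var_k[k])
        m_new[k] = mu + np.sqrt(v2) * rng.standard_normal()
        alpha = a0 + nk / 2.0
        beta = b0 + 0.5 * np.sum((fk - m_new[k]) ** 2)
        var_new[k] = sample_inverse_gamma(alpha, beta, rng)
    S = f.size                               # total number of pixels
    alpha_eps = S / 2.0 + a_eps0
    beta_eps = 0.5 * np.sum((g - f) ** 2) + b_eps0
    var_eps = sample_inverse_gamma(alpha_eps, beta_eps, rng)
    return m_new, var_new, var_eps
```

With many pixels per class the draws concentrate around the empirical class means and variances, so the fixed values of the hyper-hyperparameters matter little, in line with the remark of section 2.4.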

4 Parallel implementation of the sampling of p(z|g,θ)

As we saw in the previous section, to generate samples from p(f ,z,θ|g) we generate

alternately samples z from P (z|g,θ), then f from p(f |g,z,θ) and finally θ from

p(θ|f , g,z). The second step is easy because p(f |z,θ, g) is Gaussian. The last step

is also easy because we have to generate samples from either a Gaussian or an Inverse

Gamma distribution. The first step, i.e. sampling z from P (z|g,θ), is not easy and by

itself needs a Gibbs sampler. However, as we chose a first order neighborhood system

for the a priori PMRF of the labels P (z), the a posteriori is still a PMRF with the same

neighborhood. We can then decompose the whole set of pixels into two subsets (odd and

even positions) forming a chess board (see figure 2). In this case if we fix the black (respectively

white) labels, then the white (respectively black) labels become conditionally independent. This

decomposition reduces the complexity of the Gibbs algorithm because we can simulate

the whole set of labels in only two steps.

Figure 2: Chess board decomposition of the labels z (black labels and white labels)

The parallel Gibbs algorithm we implemented is then the following : given an initial state

$(\theta_1, \theta_2, z)^{(0)}$,

Parallel Gibbs sampling : repeat until convergence

1. simulate $z_B^{(n)} \sim p(z \mid z_W^{(n-1)}, g, \theta^{(n-1)})$

   simulate $z_W^{(n)} \sim p(z \mid z_B^{(n)}, g, \theta^{(n-1)})$

   simulate $f_i^{(n)} \sim p(f_i \mid g_i, z^{(n)}, \theta_i^{(n-1)})$

2. simulate $\theta_i^{(n)} \sim p(\theta_i \mid f_i^{(n)}, z^{(n)}, g_i)$
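The label update of step 1 can be sketched as below, using the chess board decomposition and the likelihood (10). This is a hedged illustration rather than the implementation used for the experiments: the function names, the 0-based labels, the free boundary handling and the vectorized inverse-CDF draw are assumptions of ours. Since the double sum in (8) counts each neighbor pair twice, the local conditional of a label carries the factor 2α.

```python
import numpy as np

def neighbor_counts(z, K):
    """counts[k, r] = number of 4-neighbors of pixel r with label k."""
    H, W = z.shape
    onehot = (z[None, :, :] == np.arange(K)[:, None, None]).astype(float)
    counts = np.zeros((K, H, W))
    counts[:, 1:, :] += onehot[:, :-1, :]   # neighbor above
    counts[:, :-1, :] += onehot[:, 1:, :]   # neighbor below
    counts[:, :, 1:] += onehot[:, :, :-1]   # neighbor to the left
    counts[:, :, :-1] += onehot[:, :, 1:]   # neighbor to the right
    return counts

def gaussian_loglik(g, m, var, var_eps):
    """loglik[k, r] = sum_i log N(g_i(r); m[i, k], var[i, k] + var_eps[i])."""
    total = var + var_eps[:, None]                    # (M, K)
    diff = g[:, None, :, :] - m[:, :, None, None]     # (M, K, H, W)
    ll = -0.5 * (np.log(2.0 * np.pi * total)[:, :, None, None]
                 + diff ** 2 / total[:, :, None, None])
    return ll.sum(axis=0)                             # (K, H, W)

def checkerboard_sweep(z, g, m, var, var_eps, alpha, rng):
    """Update all labels of one color at once, then all labels of the other."""
    K = m.shape[1]
    H, W = z.shape
    loglik = gaussian_loglik(g, m, var, var_eps)
    color = np.indices((H, W)).sum(axis=0) % 2        # chess board pattern
    for c in (0, 1):
        # Neighbors of a color-c pixel all have the other color, which is fixed.
        logp = loglik + 2.0 * alpha * neighbor_counts(z, K)
        logp -= logp.max(axis=0, keepdims=True)       # numerical stability
        p = np.exp(logp)
        p /= p.sum(axis=0, keepdims=True)
        cdf = np.cumsum(p, axis=0)
        u = rng.random((H, W))
        draw = np.minimum((u[None] > cdf).sum(axis=0), K - 1)
        z = np.where(color == c, draw, z)
    return z
```

Here `g` has shape (M, H, W), `m` and `var` have shape (M, K) and `var_eps` has shape (M,). A full iteration would follow this sweep with the draws of the images fi and of the hyperparameters θi.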

5 Accounting for local spatial dependency inside regions

We now want to introduce a local dependency between pixels of fik which are in the same

homogeneous region k. In the previous sections we assumed that these pixels are independent

even if they share the same mean and variance. In this section we want to relax

this hypothesis by accounting for possible local correlation. Our aim is to improve the

reconstructed images and then (because our algorithm is iterative) improve the quality of

our classification. We will now describe this new model and the modifications it

implies.

5.1 New model for the images fi

We now consider that pixels fi(r) inside a same region are locally dependent. However

pixels being in different regions stay independent. Note that this is our a priori hypothesis.

All the pixels either inside a given region or in different regions are a posteriori interde-

pendent. To be able to distinguish between the pixels in different regions we introduce a

hidden “contour” variable q = {q(r), r ∈ S} as follows :

\[
q(r) =
\begin{cases}
0 & \text{if all the pixels } s \in \mathcal{V}(r) \text{ are in the same region as } r, \\
1 & \text{otherwise.}
\end{cases}
\]

We may note that when z is given, q is obtained in a deterministic way. So, q(r) is related

to z(r) and then the distribution of q is related to the distribution of z by the following

relation :

\[
P(q(r) = 1 \mid z) = 1 - \prod_{s \in \mathcal{V}(r)} \delta(z(r) - z(s)) \tag{11}
\]

Then we have :

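\[
p(f_i \mid z, q, \theta_i) = \prod_{k=1}^{K} p(f_{ik} \mid z, q, \theta_i)
\]

Let us denote $f_{i\mathcal{V}(r)} = \{ f_i(s), s \in \mathcal{V}(r) \}$, where $\mathcal{V}(r)$ stands for the neighborhood of $r$ and

$|\mathcal{V}|$ is its size (the number of pixels of the neighborhood system, which is 4 here). Then

we can write :

\[
p(f_i(r) \mid z(r) = k, q(r), f_{i\mathcal{V}(r)}, \theta_i) =
\begin{cases}
\mathcal{N}(\mu_k, \sigma_k^2) & \text{if } q(r) = 1 \\[4pt]
\mathcal{N}\!\left( \dfrac{1}{4} \displaystyle\sum_{s \in \mathcal{V}(r)} f_i(s), \; \dfrac{\sigma_k^2}{4} \right) & \text{if } q(r) = 0
\end{cases}
\tag{12}
\]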

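where $\frac{1}{4} \sum_{s \in \mathcal{V}(r)} f_i(s)$ is the mean value of the four neighboring pixels around the pixel

position r, and $\mu_k$, $\sigma_k^2$ stand for the class mean and variance of the image fi (denoted $m_{ik}$ and $\sigma_{ik}^2$ in section 2.2). Note also that we can group these two cases together by noting

\[
m_{f_i}(r) = q(r)\, \mu_k + (1 - q(r))\, \frac{1}{4} \sum_{s \in \mathcal{V}(r)} f_i(s)
\]

\[
\sigma_{f_i}^2(r) = q(r)\, \sigma_k^2 + (1 - q(r))\, \frac{\sigma_k^2}{4}
\]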

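With these notations we can write the likelihood $p(g_i(r) \mid z(r) = k, q(r), f_{i\mathcal{V}(r)}, \theta_i)$

as in section 2 :

\[
p(f_i(r) \mid z(r) = k, q(r), f_{i\mathcal{V}(r)}, \theta_i) = \mathcal{N}\!\left( m_{f_i}(r), \sigma_{f_i}^2(r) \right),
\]

and

\[
p(g_i(r) \mid z(r) = k, q(r), f_{i\mathcal{V}(r)}, \theta_i) = \mathcal{N}\!\left( m_{f_i}(r), \sigma_{f_i}^2(r) + \sigma_{\varepsilon_i}^2 \right)
\]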
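The contour variable of relation (11) and the grouped mean and variance of the Markovian pixel model can be computed directly from z and fi, as in the sketch below. The function names are hypothetical, and the treatment of the image border (neighbors outside the image are ignored and the mean is taken over the available neighbors) is an assumption of ours, since the boundary case is not specified here.

```python
import numpy as np

def contour_variable(z):
    """q(r) = 0 if every 4-neighbor of r has the label z(r), and 1 otherwise."""
    q = np.zeros(z.shape, dtype=bool)
    q[1:, :] |= z[1:, :] != z[:-1, :]    # differs from the neighbor above
    q[:-1, :] |= z[:-1, :] != z[1:, :]   # differs from the neighbor below
    q[:, 1:] |= z[:, 1:] != z[:, :-1]    # differs from the left neighbor
    q[:, :-1] |= z[:, :-1] != z[:, 1:]   # differs from the right neighbor
    return q.astype(int)

def neighbor_mean(f):
    """Mean of the available 4-neighbors (fewer than four on the border)."""
    f = np.asarray(f, dtype=float)
    total = np.zeros_like(f)
    count = np.zeros_like(f)
    total[1:, :] += f[:-1, :]; count[1:, :] += 1
    total[:-1, :] += f[1:, :]; count[:-1, :] += 1
    total[:, 1:] += f[:, :-1]; count[:, 1:] += 1
    total[:, :-1] += f[:, 1:]; count[:, :-1] += 1
    return total / count

def prior_moments(f, z, mu, var):
    """Return q, m_f(r) and sigma_f^2(r) for class means mu and variances var."""
    q = contour_variable(z)
    m_f = q * mu[z] + (1 - q) * neighbor_mean(f)
    var_f = q * var[z] + (1 - q) * var[z] / 4.0
    return q, m_f, var_f
```

Pixels on a contour fall back to the class mean and variance, while interior pixels are predicted from their neighbors with a variance divided by four.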

5.2 A posteriori distributions

As we chose a spatial dependency between pixels fi(r) with a neighborhood system of 4

pixels, we have the same problem as for the labels. Then we have to decompose the set of

variables fi into two subsets, $f_{iW}$ and $f_{iB}$, which represent respectively the odd numbered

positions (labeled white) and the even numbered positions (labeled black) of the pixels of the

image fi. Let us also denote $f_W = \{ f_{iW} \}_{i=1,\ldots,M}$ and $f_B = \{ f_{iB} \}_{i=1,\ldots,M}$.

For this case we propose then to decompose directly the set of variables into three subsets :

$(f_W, z_W)$, $(f_B, z_B)$ and $\theta$, and then we have to sample them from their conditional a

posteriori distributions. For the first two subsets we can use the same decomposition as in

(9) :

\[
p(f_W, z_W \mid f_B, z_B, g, \theta, q) = p(f_W \mid f_B, z, g, \theta, q)\; P(z_W \mid f_B, z_B, g, \theta, q)
\]

\[
p(f_B, z_B \mid f_W, z_W, g, \theta, q) = p(f_B \mid f_W, z, g, \theta, q)\; P(z_B \mid f_W, z_W, g, \theta, q)
\]

and we have also

\[
p(f_W \mid f_B, z, g, \theta, q) = \prod_{i=1}^{M} p(f_{iW} \mid f_{iB}, z, g_i, \theta_i, q)
\]

\[
p(f_B \mid f_W, z, g, \theta, q) = \prod_{i=1}^{M} p(f_{iB} \mid f_{iW}, z, g_i, \theta_i, q)
\]

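Sampling $f_{iB}$ and $f_{iW}$ using $p(f_{iB} \mid f_{iW}, z, g_i, \theta_i, q)$ and $p(f_{iW} \mid f_{iB}, z, g_i, \theta_i, q)$

With this decomposition we have the following relations :

\[
p(f_{iB} \mid g_i, f_{iW}, z, q, \theta_i) = \prod_{r \text{ black}} p(f_i(r) \mid g_i(r), f_{i\mathcal{V}(r)}, z(r), q(r), \theta_i)
\]

\[
p(f_{iW} \mid g_i, f_{iB}, z, q, \theta_i) = \prod_{r \text{ white}} p(f_i(r) \mid g_i(r), f_{i\mathcal{V}(r)}, z(r), q(r), \theta_i),
\]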

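and with the same method as in section 3, we obtain the a posteriori distribution :

\[
p(f_i(r) \mid g_i(r), z(r), q(r), f_{i\mathcal{V}(r)}, \theta_i) = \mathcal{N}(m^{\mathrm{apost}}, \sigma^{2,\mathrm{apost}}),
\]

with

\[
m^{\mathrm{apost}} = \sigma^{2,\mathrm{apost}} \left( \frac{g_i(r)}{\sigma_{\varepsilon_i}^2} + \frac{m_{f_i}(r)}{\sigma_{f_i}^2(r)} \right),
\qquad
\sigma^{2,\mathrm{apost}} = \left( \frac{1}{\sigma_{\varepsilon_i}^2} + \frac{1}{\sigma_{f_i}^2(r)} \right)^{-1}
\]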

Sampling $z_B$ and $z_W$ using $P(z_B \mid f_W, z_W, g, \theta, q)$ and $P(z_W \mid f_B, z_B, g, \theta, q)$

Using the Bayes rule we have

\[
P(z_B \mid z_W, g, f_W, q, \theta) \propto p(g \mid z, f_W, q, \theta)\; p(f_W \mid z, q, \theta)\; P(z_B \mid z_W) \tag{13}
\]

Due to the term $p(f_W \mid z, q, \theta)$ in the right hand side of (13), we cannot obtain an explicit

expression for the a posteriori distribution $P(z_B \mid z_W, g, f_W, q, \theta)$. We propose then, for

this step, two different approximations. The first one is to approximate $p(f_W \mid z, q, \theta)$ by

its expected value with respect to $z_B$ :

\[
p(f_W \mid z, q, \theta) \approx \sum_{z_B} p(f_W \mid z, q, \theta)\, P(z_B \mid z_W) = p(f_W \mid z_W, q, \theta), \tag{14}
\]

which becomes a constant with respect to $z_B$. Indeed this approximation can be interpreted

as a mean field approximation method ([20]). The approximated a posteriori distribution

we propose to use for this step is :

\[
\begin{aligned}
P(z_B \mid z_W, g, f_W, q, \theta) &\propto p(g \mid z, f_W, q, \theta)\; P(z_B \mid z_W) \\
&\propto P(z_B \mid z_W) \prod_{i=1}^{M} \prod_{r \text{ black}} p(g_i(r) \mid z(r), f_{i\mathcal{V}(r)}, q(r), \theta_i)
\end{aligned}
\tag{15}
\]

We also have the symmetric relation

\[
P(z_W \mid z_B, g, f_B, q, \theta) \propto P(z_W \mid z_B) \prod_{i=1}^{M} \prod_{r \text{ white}} p(g_i(r) \mid z(r), f_{i\mathcal{V}(r)}, q(r), \theta_i)
\tag{16}
\]

Note that the likelihood function $p(g_i(r) \mid z(r), f_{i\mathcal{V}(r)}, q(r), \theta_i) = \mathcal{N}(m_{f_i}(r), \sigma_{f_i}^2(r) + \sigma_{\varepsilon_i}^2)$

is different from $p(g_i(r) \mid z(r) = k, \theta_i) = \mathcal{N}(m_{ik}, \sigma_{ik}^2 + \sigma_{\varepsilon_i}^2)$ in section 3 and more expensive in

computer time. The second approximation we propose is then to use the second expression

in place of the first one in this step.

Updating q

As we mentioned before, given z, q is obtained deterministically and is updated

using the current variable z and the relation (11).

Sampling θi|z, gi,fi, q

We still use the same method to obtain the a posteriori distributions of the parameters of

θi. However we have here to decompose the set $R_k$ into two subsets as follows :

\[
R_k = R_k^0 \cup R_k^1
\]

with $R_k^j = \{ r : z(r) = k, q(r) = j \}$, $j \in \{0, 1\}$. Let us also denote $n_k^j = |R_k^j|$. With this decomposition we

can calculate the a posteriori distributions of θi :

• $m_{ik} \mid f_i, z, q, \sigma_{ik}^2, m_{i0}, \sigma_{i0}^2 \sim \mathcal{N}(\mu_{ik}, v_{ik}^2)$, with

\[
\mu_{ik} = v_{ik}^2 \left( \frac{m_{i0}}{\sigma_{i0}^2} + \frac{1}{\sigma_{ik}^2} \sum_{r \in R_k^1} f_i(r) \right),
\qquad
v_{ik}^2 = \left( \frac{n_k^1}{\sigma_{ik}^2} + \frac{1}{\sigma_{i0}^2} \right)^{-1}
\]

An approximation of these equations is to replace $R_k^1$ by the whole $R_k$ in the

determination of $\mu_{ik}$. Indeed even if we have changed the model by introducing

spatial dependency on fi, we still have in mind that pixels fi(r) which are in a

same homogeneous region must have the same mean. Replacing $R_k^1$ by $R_k$ thus increases

the number of pixels used for calculating $\mu_{ik}$.

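• $\sigma_{ik}^2 \mid f_i, z, q, m_{ik}, \alpha_{i0}, \beta_{i0} \sim \mathcal{IG}(\alpha_{ik}, \beta_{ik})$, with

\[
\alpha_{ik} = \alpha_{i0} + \frac{n_k}{2}
\]

\[
\beta_{ik} = \beta_{i0} + \frac{1}{2} \sum_{r \in R_k^1} (f_i(r) - m_{ik})^2
+ 2 \sum_{r \in R_k^0} \left( f_i(r) - \frac{1}{4} \sum_{s \in \mathcal{V}(r)} f_i(s) \right)^2
\tag{17}
\]

• $\sigma_{\varepsilon_i}^2 \mid f_i, g_i \sim \mathcal{IG}(\nu_i, \Sigma_i)$, with

\[
\nu_i = \frac{S}{2} + \alpha_{\varepsilon_i 0}, \quad S = \text{total number of pixels},
\qquad
\Sigma_i = \frac{1}{2} \| g_i - f_i \|^2 + \beta_{\varepsilon_i 0}
\]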

5.3 New Gibbs algorithm

The difference from the algorithm of section 4 lies in the decomposition of the set of

variables. The Gibbs algorithm we have implemented is then :

Parallel Gibbs sampling : repeat until convergence

1. simulate $z_W^{(n)} \sim P(z_W \mid z_B^{(n-1)}, f_B^{(n-1)}, g, \theta^{(n-1)}, q^{(n-1)})$

   simulate $f_{iW}^{(n)} \sim p(f_{iW} \mid f_{iB}^{(n-1)}, g_i, z^{(n-1)}, \theta_i^{(n-1)}, q^{(n-1)})$

2. simulate $z_B^{(n)} \sim P(z_B \mid z_W^{(n)}, f_W^{(n)}, g, \theta^{(n-1)}, q^{(n-1)})$

   simulate $f_{iB}^{(n)} \sim p(f_{iB} \mid f_{iW}^{(n)}, g_i, z^{(n)}, \theta_i^{(n-1)}, q^{(n-1)})$

3. compute $q^{(n)}$ using $z^{(n)}$

4. simulate $\theta_i^{(n)} \sim p(\theta_i \mid f_i^{(n)}, z^{(n)}, g_i)$

6 Simulation and results

In this section we present results of our two models in different cases. First we test our

methods on fully simulated data sets to evaluate the different performances in case of

presence of noise. Then we present results on MRI images, but with the addition of an

artificial noise to compare the two methods. This second test permits us to have noisy

registered data sets of the same objects. Then those images are also considered here as

test images to compare the two proposed methods. Finally we test our algorithm in a

real application of security system using two X-ray images of the same object : X-ray

in transmission and in backscattering. These data sets are courtesy of

American Science and Engineering, Inc., 2003 (www.as-e.com). In the following we denote

by “HMMI” the first method described in this paper, and by “HMMC” the second method

where we have introduced a local spatial correlation on the pixels of the images fi.

6.1 Simulated data

Here we have constructed two (256 × 256) normalized images, denoted f1 and f2, with

individual and common regions (fig. 3-a). We then added independent Gaussian noise

to obtain the noisy images g1 and g2 (fig. 3-b), which we used as data for our proposed

data fusion methods. The performances of these methods are evaluated using the

following measure between two images u and v :

\[
d(u, v) = \frac{\| u - v \|^2}{\| u \|^2}
\]
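The measure d is a relative squared error and can be computed directly, as in this small sketch (the function name `relative_error` and the toy vectors are ours).

```python
import numpy as np

def relative_error(u, v):
    """Return d(u, v) = ||u - v||^2 / ||u||^2 for two images of equal shape."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.sum((u - v) ** 2) / np.sum(u ** 2)

# Toy example: ||u - v||^2 = 4 and ||u||^2 = 25.
print(relative_error([3.0, 4.0], [3.0, 2.0]))
```

Note that d is not symmetric in its arguments: the reference image u sets the normalization.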

We then compared the performances of these methods as a function of the variance of

noise in the observations. The estimated images are respectively denoted $\hat{f}_i^{\mathrm{HMMI}}$ and $\hat{f}_i^{\mathrm{HMMC}}$.

σ²εi     d(f1,g1)   d(f1,f̂1 HMMI)   d(f1,f̂1 HMMC)   d(f2,g2)   d(f2,f̂2 HMMI)   d(f2,f̂2 HMMC)
0        0          0.0000          0.0000          0          0.0000          0.0000
0.001    0.0126     0.0053          0.0014          0.0013     0.0006          0.0001
0.002    0.0258     0.0109          0.0016          0.0027     0.0011          0.0002
0.005    0.0644     0.0279          0.0021          0.0067     0.0029          0.0003
0.01     0.1288     0.0568          0.0058          0.0134     0.0064          0.0011
0.02     0.2575     0.1239          0.0094          0.0268     0.0142          0.0019
0.05     0.6438     0.3215          0.0279          0.0670     0.0367          0.0049
0.1      1.2876     0.6805          0.0628          0.1340     0.0766          0.0105

Table 1: Comparison of the two methods with noisy data

σ²εi     estimate for ε1 (HMMI)   estimate for ε2 (HMMI)   estimate for ε1 (HMMC)   estimate for ε2 (HMMC)
0        0.0003                   0.0003                   0.0002                   0.0002
0.001    0.0007                   0.0007                   0.0010                   0.0010
0.002    0.0011                   0.0011                   0.0019                   0.0018
0.005    0.0020                   0.0019                   0.0048                   0.0046
0.01     0.0040                   0.0038                   0.0094                   0.0088
0.02     0.0068                   0.0069                   0.0193                   0.0180
0.05     0.0155                   0.0146                   0.0486                   0.0453
0.1      0.0310                   0.0254                   0.0966                   0.0937

Table 2: Estimation of the noise variance

Results of reconstruction and segmentation : Figures 3 and 4 show some results with

different values of the noise variance. When the observations are noise-free, both

methods give perfect segmentation results. However we can note that in this case the

first method converges faster than the second.

In the presence of noise, the degradation of the segmentation appears in the smallest

regions. Indeed the results of segmentation show the loss of small regions, especially with

the first method. This is due to the fact that even if the Gibbs algorithm asymptotically

ensures the convergence to the global minimum, it can be locked in a local minimum.

This particular case appears when observations are too noisy. We can also remark that

the first method does not significantly increase the quality of the reconstructed images.

Figure 3: Results of data fusion with high SNR ($\sigma_{\varepsilon_i}^2 = 0.001$) : (a) original images f1 and f2 and (b) their corresponding observations g1 and g2. (c) Results of data fusion (7 labels) with the first model (from right to left : $\hat{f}_1^{\mathrm{HMMI}}$, $\hat{f}_2^{\mathrm{HMMI}}$ and $\hat{z}^{\mathrm{HMMI}}$). (d) Results of data fusion (7 labels) with the second model (from right to left : $\hat{f}_1^{\mathrm{HMMC}}$, $\hat{f}_2^{\mathrm{HMMC}}$ and $\hat{z}^{\mathrm{HMMC}}$).

The algorithm seems to reconstruct the data exactly, without removing the noise. However

the second method gives denoised images and a better segmentation. We can also note

that individual regions of the first data set g1 (resp. g2) appear in the reconstructed image

f2 (resp. f1). This is due to the model we have chosen, in which we consider a unique

segmentation and reconstruct both images from it, which means that the two images

are assumed to consist of the same objects.

Figure 4: Results of data fusion with low SNR ($\sigma_{\varepsilon_i}^2 = 0.01$) : (a) original images f1 and f2 and (b) their corresponding observations g1 and g2. (c) Results of data fusion with the first model (from right to left : $\hat{f}_1^{\mathrm{HMMI}}$, $\hat{f}_2^{\mathrm{HMMI}}$ and $\hat{z}^{\mathrm{HMMI}}$). (d) Results of data fusion with the second model (from right to left : $\hat{f}_1^{\mathrm{HMMC}}$, $\hat{f}_2^{\mathrm{HMMC}}$ and $\hat{z}^{\mathrm{HMMC}}$).

Noise estimation : Table 1 summarizes the performances of the second method. Indeed

the denoising part of this method reduces the measure d by a factor of 20

relative to the initial measure between the real data and the noisy observations. The first

method reduces this measure only by a factor of 2. The gain in performance of

the second method is thus significant when the observations are noisy. Also in table 2

we can see that the noise is better estimated by the second algorithm, which confirms the

better quality of the denoising step. However even if the estimated images are better, the

common segmentation is not changed relative to the first method.

6.2 Medical imaging

Figure 5: Data fusion of medical images : (a) original data f1, f2, f3. (b) Noisy observations g1, g2, g3 with a Gaussian noise of variance 0.005. (c) Estimation with the first method (7 labels). (d) Estimation with the second method (7 labels).

Here we illustrate an example of noisy MRI images : T1-weighted, T2-weighted and

T1-weighted with contrast agent slices of a MR brain image, which are (289 × 236) images.

Here we used these images as the test images f1, f2 and f3. Then we added Gaussian

iid noise to them to obtain the simulated observations g1, g2 and g3, according to the

observation model (3).

Figure 5 shows the reconstruction and joint segmentation results of our algorithms. This

confirms the remarks made on simulated data : the reconstructions are markedly better with

the second method. However we can see that the segmentation results are almost the

same except in the central part where some single pixels are badly classified with the

first method. In case of high signal-to-noise ratio it is not necessary to use the second

method. Finally we have to note that we did not introduce any physiological information

on particular tissues or on the characteristics of the MRI images. Those data sets are

only used as test images. We can expect for better results if we study more in detail the

particularities of MRI images.

6.3 Imaging in security systems

Here we test our algorithms on two images (transmission and backscattering X-ray data) of a suitcase, each of size 141 × 198. We then compared our two fusion methods with some classical algorithms provided by a Matlab fusion toolbox ([21]): Average, Principal Component Analysis (PCA), Laplacian pyramid and Shift Invariant Discrete Wavelet Transform (SIDWT).
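For orientation, the two simplest baselines in this list can be sketched as follows. This is a textbook-style illustration and not the toolbox implementation of [21]: the function names are hypothetical, and the PCA rule shown (weights proportional to the components of the leading eigenvector of the 2 × 2 covariance matrix of the two images, taken in absolute value) is one common convention that is assumed here.

```python
from math import sqrt

def average_fusion(g1, g2):
    """Pixel-wise mean of two registered images of the same size."""
    return [[0.5 * (a + b) for a, b in zip(r1, r2)] for r1, r2 in zip(g1, g2)]

def pca_weights(g1, g2):
    """Fusion weights from the leading eigenvector of the 2 x 2 covariance."""
    x = [p for row in g1 for p in row]
    y = [p for row in g2 for p in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((u - mx) ** 2 for u in x) / n                    # var(g1)
    c = sum((v - my) ** 2 for v in y) / n                    # var(g2)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n   # cov(g1, g2)
    lam = 0.5 * (a + c) + sqrt((0.5 * (a - c)) ** 2 + b * b)  # largest eigenvalue
    # (lam - c, b) is an eigenvector for lam; absolute values keep weights >= 0.
    v1, v2 = abs(lam - c), abs(b)
    if v1 + v2 == 0.0:
        # Uncorrelated images: equal variances, or g2 carries all the variance.
        return (0.5, 0.5) if a == c else (0.0, 1.0)
    return v1 / (v1 + v2), v2 / (v1 + v2)

def pca_fusion(g1, g2):
    """Weighted sum of the two images using the PCA weights."""
    w1, w2 = pca_weights(g1, g2)
    return [[w1 * p + w2 * q for p, q in zip(r1, r2)] for r1, r2 in zip(g1, g2)]

# Tiny hypothetical example: g2 is exactly twice g1.
g1 = [[0.0, 1.0], [2.0, 3.0]]
g2 = [[0.0, 2.0], [4.0, 6.0]]
fused_avg = average_fusion(g1, g2)
fused_pca = pca_fusion(g1, g2)
```

Both baselines return a single blended intensity image, whereas the HMM-based methods return a label image, which is why the comparison in Figure 6 is between images of a different nature.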

Figure 6 shows the results of the different fusion methods. With all the methods of the Matlab fusion toolbox the right gun is not detected, because its contrast changes are too weak relative to the details present at the same location in the backscattering X-ray image. Because our algorithms produce a segmentation, the right gun appears clearly after convergence. In particular, our first method detects the two guns well. However, we can expect better results on these images if we implement a texture classification. This may be possible if we extend our second model by treating the neighborhood of a pixel fi(r) differently than by computing its mean. This remains an open problem.

Figure 6: Data fusion in X-ray security system images: (a) original data g1 and g2. Fusion results of different methods: (b) Average, (c) PCA-transform, (d) Laplacian pyramid, (e) SIDWT, (f) our first method (8 labels), (g) our second method (8 labels).

7 Estimation of K

The proposed model requires a fairly precise idea of the number K of labels. Indeed, for the simulated data we chose K = 7 even though the theoretically perfect segmentation consists of only 6 regions and thus 6 labels. Both algorithms eliminated one label during the iterations and produced a good segmentation. In the simulations of section 6.3 we likewise obtained quite similar results with K ∈ {8, 9, 10}. However, our algorithms need a fixed value of this parameter, and a fully unsupervised joint segmentation requires estimating K. There is a large body of work on the estimation of K. In [17], K can vary along the iterations of the algorithm using a Reversible Jump MCMC method, but this solution is too expensive to be implemented in real applications. A more tractable solution ([22, 23]) consists in estimating this parameter in a preprocessing step, using prior information or bounding the value of K. In particular, the authors of [22] propose to use the minimum description length (MDL) as a function of K in the case of a Finite Normal Mixture (FNM) model. In our case the MDL function can be written as follows:

$$\mathrm{MDL}(K) = -\log L(\hat{\Theta}) + 0.5\, K_a \log(S),$$

where $\hat{\Theta}$ is the ML estimate of $\Theta = \{\{P(z(r) = k)\}_{r \in S,\, k = 1, \ldots, K}, \boldsymbol{\theta}\}$, $L(\hat{\Theta})$ is the likelihood of the model parameters, and $K_a$ is the number of degrees of freedom of the model. Considering the HMM model and the assumptions of section 3 we have:

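$$
\begin{aligned}
p(g_1(r), \ldots, g_M(r) \mid \Theta)
&= \sum_{k=1}^{K} P(z(r) = k)\, p(g_1(r), \ldots, g_M(r) \mid z(r) = k, \boldsymbol{\theta}) \\
&= \sum_{k=1}^{K} P(z(r) = k) \prod_{i=1}^{M} p(g_i(r) \mid z(r) = k, \boldsymbol{\theta}_i)
\end{aligned}
$$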

Because we chose a PMRF model for the labels, the computation of P(z(r) = k) is practically intractable. For this preprocessing step we therefore propose to approximate the labels as independent, P(z(r) = k) = πk, which is the case of the FNM model, and we can write:

$$
p(g_1(r), \ldots, g_M(r) \mid \Theta) = \sum_{k=1}^{K} \pi_k \prod_{i=1}^{M} p(g_i(r) \mid z(r) = k, \boldsymbol{\theta}_i)
$$

The ML estimate $\hat{\Theta}$ of $\Theta = \{\pi_1, \ldots, \pi_K, \boldsymbol{\theta}\}$ is computed by a hybrid method using Expectation-Maximization (EM) and Classification-Maximization (CM) algorithms ([22]).
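A minimal sketch of this preprocessing step is given below, under several assumptions that are not taken from the paper: the class-conditional densities are Gaussian with one mean and one variance per class and per image, the likelihood is the product of the per-pixel mixture densities, S is the number of pixels, $K_a = (K - 1) + 2KM$, and a plain EM iteration replaces the hybrid EM/CM method of [22]. The function names `fit_fnm` and `mdl` are hypothetical.

```python
import math
import random

def log_gauss(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def fit_fnm(pixels, K, n_iter=50, var_floor=1e-6):
    """Plain EM for a K-class mixture with independent Gaussian channels.

    pixels: list of M-tuples (g_1(r), ..., g_M(r)).
    Returns the log-likelihood of the fitted parameters.
    """
    S, M = len(pixels), len(pixels[0])
    # Deterministic start: sort by the first channel and hard-assign K chunks.
    order = sorted(range(S), key=lambda r: pixels[r][0])
    resp = [[0.0] * K for _ in range(S)]
    for rank, r in enumerate(order):
        resp[r][min(K - 1, rank * K // S)] = 1.0
    loglik = -math.inf
    for _ in range(n_iter):
        # M-step: weights pi_k, means and variances per class and channel.
        weights, means, variances = [], [], []
        for k in range(K):
            nk = sum(resp[r][k] for r in range(S)) + 1e-12
            mk = [sum(resp[r][k] * pixels[r][i] for r in range(S)) / nk
                  for i in range(M)]
            vk = [max(var_floor,
                      sum(resp[r][k] * (pixels[r][i] - mk[i]) ** 2
                          for r in range(S)) / nk)
                  for i in range(M)]
            weights.append(nk / S)
            means.append(mk)
            variances.append(vk)
        # E-step: responsibilities and log-likelihood (log-sum-exp for stability).
        loglik = 0.0
        for r in range(S):
            logs = [math.log(weights[k])
                    + sum(log_gauss(pixels[r][i], means[k][i], variances[k][i])
                          for i in range(M))
                    for k in range(K)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += norm
            resp[r] = [math.exp(v - norm) for v in logs]
    return loglik

def mdl(pixels, K):
    """MDL(K) = -log L + 0.5 * K_a * log(S), with the assumed K_a and S."""
    S, M = len(pixels), len(pixels[0])
    k_a = (K - 1) + 2 * K * M
    return -fit_fnm(pixels, K) + 0.5 * k_a * math.log(S)

# Hypothetical data: M = 2 channels, two well-separated classes of 200 pixels.
rng = random.Random(0)
pixels = [(rng.gauss(c, 0.1), rng.gauss(c, 0.1))
          for c in (0.0, 1.0) for _ in range(200)]
scores = {K: mdl(pixels, K) for K in (1, 2, 3)}
```

The estimate of K is then the key of `scores` with the smallest value. On data such as these, where the classes are well separated, the penalty term matters little; it becomes decisive when adding a class improves the likelihood only marginally.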

K                      3      4      5      6       7       8       9     estimated K
MDL (without noise)  -6037  -8305  -9236  -10026  -10024  -10021  -10019       6
MDL (with noise)     -1225  -1213  -1199   -1239   -1227   -1214   -1202       6

Table 3: Estimation of K for the simulated data

K           5       6       7       8       9       10      estimated K
MDL(g1)   -13359  -13587  -13648  -13715  -13717  -1.3695        9
MDL(g2)   -16149  -16599  -16816  -17026  -17134  -17175        10

Table 4: Estimation of K for the suitcases taken independently

[Plot: MDL (× 10^4, vertical axis from −4.5 to −3.7) against K (horizontal axis from 5 to 30)]

Figure 7: Estimation of joint K for the suitcases

Table 3 shows the results of the estimation of K for the simulated data. The method seems to give good estimates for both noise-free and noisy data. In the case of inspection imaging this method also gives a reasonable estimate of K if we take each image independently (Table 4). However, in the case of the joint segmentation problem the MDL objective function did not reach a minimum (Figure 7) and seems to be nearly constant between 20 and 30 labels. These results are due to the fact that both images contain many small regions with different grey-scale values. The rough approximation of the likelihood of the HMM model by the likelihood of the FNM model therefore seems inadequate. Our future work will address a criterion that takes the HMM model into account.
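The selection rule itself is just the minimizer of MDL(K) over the tested values. As a small worked example, the snippet below applies it to the values reported in Table 3; the helper names are hypothetical, and the margin to the second-best value is added only to show how flat the criterion is around its minimum.

```python
ks = [3, 4, 5, 6, 7, 8, 9]
mdl_without_noise = [-6037, -8305, -9236, -10026, -10024, -10021, -10019]
mdl_with_noise = [-1225, -1213, -1199, -1239, -1227, -1214, -1202]

def select_k(ks, mdl_values):
    """Return the tested K with the smallest MDL value."""
    return min(zip(mdl_values, ks))[1]

def margin(mdl_values):
    """Gap between the smallest and the second smallest MDL values."""
    best, runner_up = sorted(mdl_values)[:2]
    return runner_up - best

k_without_noise = select_k(ks, mdl_without_noise)  # 6
k_with_noise = select_k(ks, mdl_with_noise)        # 6
gap_without_noise = margin(mdl_without_noise)      # 2
gap_with_noise = margin(mdl_with_noise)            # 12
```

A small gap relative to the magnitude of the criterion, as in the noise-free row, means that neighbouring values of K are almost equally well supported.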

Acknowledgement

The authors would like to thank the referees for their useful remarks and suggestions

which improved the content of this paper.

8 Conclusion

We proposed a Bayesian approach for data fusion of images using a hierarchical Markov model, which yields a joint segmentation of these images as the fusion result. The labels are modeled by a Potts MRF. We then proposed two particular models for the pixels of the images in each segment: the first model considers these pixels independent and the second introduces a local spatial dependency between them. We developed appropriate Gibbs samplers for the two models and illustrated how joint segmentation and reconstruction can be obtained on simulated data sets. We then showed how denoising and fusion can be obtained at the same time with an MCMC algorithm. We also showed that our approach gives better fusion results than classical methods in the case of X-ray inspection images. However, we assume for the moment that the sensor images are registered. We think that our model is a promising basis for introducing a registration and blur operator Hi and then performing the common segmentation, deblurring and registration at the same time. This remains an open problem and is the subject of our future work.

References

[1] S. Gautier, G. Le Besnerais, A. Mohammad-Djafari, and B. Lavayssiere, “Data

fusion in the field of non destructive testing,” in Maximum Entropy and Bayesian

Methods. Kluwer Academic Publ., Santa Fe, NM, K. Hanson edition, 1995.

[2] T. Bass, “Intrusion detection systems and multisensor data fusion,” in Comm. of the

ACM, vol. 43, April 2000, pp. 99–105.

[3] G. Matsopoulos, S. Marshall, and J. Brunt, “Multiresolution morphological fusion

of MR and CT images of the human brain,” in IEEE Proceedings on Vision, Image

and Signal Processing, vol.141 Issue : 3, Seattle, USA, 1994, pp. 137–142.

[4] B. Johnston and B. Mackiewitch, “Segmentation of multiple sclerosis lesions in

intensity corrected multispectral MRI,” IEEE Trans. on medical imaging, pp. 154–

169, April 1996.

[5] Chuin-Mu Wang and Clayton Chi-Chang Chen et al., “Detection of spectral sig-

natures in multispectral MR images for classification,” IEEE Trans. on medical

imaging, pp. 50–61, January 2003.

[6] E. Reddick and J.O. Glass et al., “Automated segmentation and classification of

multispectral Magnetic Resonance Images of brain using artificial neural networks,”

IEEE Trans. on medical imaging, pp. 911–918, December 1997.

[7] M.N. Ahmed and M. Yamany et al., “A modified fuzzy c-means algorithm for bias

field estimation and segmentation of MRI data,” IEEE Trans. on medical imaging,

pp. 193–199, March 2002.

[8] R.K. Sharma, Probabilistic model-based multisensor image fusion, Ph.D. thesis,

Graduate Institute of Science and Technology, Oregon, USA, 1999.

[9] Du Yong et al., “Satellite image fusion with multiscale wavelet analysis for marine

applications : preserving spatial information and minimizing artifacts (PSIMA),” J.

Remote Sensing, Vol.29, No 1, pp. 14–23, 2003.

[10] Ramesh Chaveli et al., “Fusion performance measures and a lifting wavelet trans-

form based algorithm for image fusion,” in Information Fusion, Proc. of the 5th int.

conf. on, July 2002, pp. 317–320.

[11] P.S. Chavez and A.Y. Kwarteng, “Extracting spectral contrast in Landsat Thematic Mapper image data using Principal Component Analysis,” in PE and RS(55), 1989, pp. 339–348.

[12] G. Simone and F.C. Morabito, “ICA-NN based data fusion approach in ECT signal

restoration,” in Neural Networks, Proceeding of the IEEE-INNS-ENNS International

Joint Conference on, Vol 5, July 2000, pp. 59–64.

[13] C.H. Chen and Z. Xiaouhui, “On the roles of PCA and ICA in data fusion,” in Geo-

science and Remote Sensing Symposium, IEEE International Proceedings IGARSS,

vol 6, July 2000, pp. 2620–2622.

[14] K. Held and E.R. Kops et al., “Markov Random Field segmentation of brain MR

images,” IEEE Trans. on medical imaging, pp. 878–886, December 1997.

[15] L. Aurdal, Analysis of Multi-Image Magnetic Resonance Acquisitions for Segmen-

tation and Quantification of Cerebral Pathologies, Ph.D. thesis, Ecole Nationale

Superieure des Telecommunications, ENST, Paris, France, 1997.

[16] Tu Zhuowen and Zhu Song-Chun, “Image segmentation by data-driven Markov

Chain Monte Carlo,” IEEE Trans. on pattern analysis and machine intelligence, pp.

657–673, May 2002.

[17] Z. Kato, “Bayesian color image segmentation using Reversible Jump Markov Chain

Monte Carlo,” Tech. Rep., ERCIM (European Research Consortium for Informatics

and Mathematics), february 1999.

[18] D. Higdon, Spatial Applications of Markov Chain Monte Carlo for Bayesian Infer-

ence, Ph.D. thesis, University of Washington, 1994.

[19] H. Snoussi and A. Mohammad-Djafari, “Fast joint separation and segmentation of

mixed images,” Journal of Electronic Imaging, vol 13(2), April 2004.

[20] D. Chandler, Introduction to modern statistical mechanics, Oxford university press,

1987.

[21] O. Rockinger and T. Fechner, “Pixel-level image fusion : the case of image sequences,” in Proc. SPIE, vol. 3374, february 1998, pp. 378–398.

[22] Tianhu Lei and Wilfred Sewchand, “Statistical approach to X-ray CT imaging and

its applications in image analysis–part ii : a new stochastic model-based image seg-

mentation technique for X-ray CT image,” IEEE Trans. on medical imaging, vol.

11, no. 1, pp. 62–69, March 1992.

[23] Tianhu Lei and Jayaram K. Udupa, “Performance evaluation of finite normal mixture model-based image segmentation techniques,” IEEE Trans. on image processing, vol. 12, no. 10, pp. 1153–1169, October 2003.

[24] F. Samadzadegan, “Fusion techniques in remote sensing,” in Com. IV Joint work-

shop on challenges in geospatial analysis integration and visualisation II, Stuttgart,

Germany, September 2003.

[25] G. Gindi, M. Lee, A. Rangarajan, and I. George Zubal, “Bayesian reconstruction

of functional images using anatomical information as priors,” IEEE Transaction on

medical imaging, vol. 12, no. 4, pp. 670–680, 1993.

[26] T. Hebert and R. Leahy, “A generalized EM algorithm for 3-D Bayesian reconstruction from Poisson data using Gibbs priors,” IEEE Transaction on medical imaging, vol. 8, no. 2, pp. 194–202, June 1989.

[27] S. Gautier, J. Idier, A. Mohammad-Djafari, and B. Lavayssiere, “X-ray and ultra-

sound data fusion,” in Proceeding of the International Conference on Image Pro-

cessing, Chicago, USA, October 1998, pp. 366–369.

[28] C. Robert, Méthodes de Monte Carlo par Chaînes de Markov, Economica, Paris, France, 1996.

List of Figures

1 Examples of images for data fusion and joint segmentation. a) T1-weighted, T2-weighted and T1-weighted with contrast agent transversal slices of a 3D brain MR image. b) Two observations from transmission and backscattering X-rays in security systems (with the permission of American Science and Engineering, Inc., 2003)

2 Chess board decomposition of the labels z

3 Results of data fusion with high SNR ($\sigma^2_{\varepsilon_i} = 0.001$): (a) original images f1 and f2 and (b) their corresponding observations g1 and g2. (c) Results of data fusion (7 labels) with the first model (from right to left: $f_1^{\mathrm{HMMI}}$, $f_2^{\mathrm{HMMI}}$ and $z^{\mathrm{HMMI}}$). (d) Results of data fusion (7 labels) with the second model (from right to left: $f_1^{\mathrm{HMMC}}$, $f_2^{\mathrm{HMMC}}$ and $z^{\mathrm{HMMC}}$)

4 Results of data fusion with low SNR ($\sigma^2_{\varepsilon_i} = 0.01$): (a) original images f1 and f2 and (b) their corresponding observations g1 and g2. (c) Results of data fusion with the first model (from right to left: $f_1^{\mathrm{HMMI}}$, $f_2^{\mathrm{HMMI}}$ and $z^{\mathrm{HMMI}}$). (d) Results of data fusion with the second model (from right to left: $f_1^{\mathrm{HMMC}}$, $f_2^{\mathrm{HMMC}}$ and $z^{\mathrm{HMMC}}$)

5 Data fusion of medical images: (a) original data. (b) Noisy observations with Gaussian noise of variance 0.005. (c) Estimation with the first method (7 labels). (d) Estimation with the second method (7 labels)

6 Data fusion in X-ray security system images: (a) original data. Fusion results of different methods: (b) Average, (c) PCA-transform, (d) Laplacian pyramid, (e) SIDWT, (f) our first method (8 labels), (g) our second method (8 labels)

7 Estimation of joint K for the suitcases

List of Tables

1 Comparison of the two methods with noisy data

2 Estimation of the noise variance

3 Estimation of K for the simulated data

4 Estimation of K for the suitcases taken independently
