Image fusion and unsupervised joint segmentation
using a HMM and MCMC algorithms
Olivier Feron† and Ali Mohammad-Djafari†
†Laboratoire des signaux et systèmes (LSS), UMR8506 (CNRS-Supelec-UPS)
Supelec, plateau de Moulon, 3 rue Joliot Curie
91192 Gif sur Yvette, France
Abstract
In this paper we propose a Bayesian framework for unsupervised image fusion and
joint segmentation. More specifically we consider the case where we have observed
images of the same object through different imaging processes or through different
spectral bands (multi or hyper spectral images). The objective of this work is then to
propose a coherent approach to combine these images and obtain a joint segmenta-
tion which can be considered as the fusion result of these observations.
The proposed approach is based on a Hidden Markov Modeling (HMM) of the im-
ages where the hidden variables represent the common classification or segmenta-
tion labels. These label variables are modeled by the Potts Markov Random Field
(PMRF). We propose two particular models for the pixels in each segment (i.i.d. or
Markovian) and develop appropriate Markov Chain Monte Carlo (MCMC) algorithms for
their implementation. Finally we present some simulation results to show
the relative performances of these models and mention the potential applications of
the proposed methods in medical imaging and survey and security imaging systems.
Keywords: data fusion, segmentation, Markov random field, multispectral images,
HMM, MCMC, Gibbs sampling.
1 Introduction
Data fusion of multi-source information has become a very active area of research in
many domains: industrial non-destructive testing and evaluation ([1]), industrial inspection
([2]), and medical imaging ([3, 4, 5, 6, 7]). In all these domains the main objective
of image fusion schemes is to extract all the useful information from the source images
and to represent it in a single image.
There is a large literature on image fusion techniques, which follow different approaches:
• Pixel-based approach: these methods are the simplest and work directly on the
pixels of the source images ([8]). For example, the very intuitive averaging method
constructs each pixel of the fused image by averaging the corresponding pixels of the
source images. These methods can be used if the different images represent the same
physical quantity (luminance, for example) on the same scale. Their main limitation
is that, very often, different images do not represent the same physical quantity.
• Feature-based and transform-domain approach: the main idea here is to extract
some particular features of the images (contours, regions) which are more robust to
scaling and variations of pixel values, and then to use data fusion techniques to
obtain common features. This approach is the most developed in the image fusion
literature; it considers that the fused image must preserve all the features of the
source images. To extract these features, typical methods use pyramid transforms
(wavelet, Laplacian, gradient, ...) ([9, 10]), which were developed because they give
information on contours and contrast changes, to which human vision is particularly
sensitive. In these methods the coefficients in the transform domain represent the
characteristics of the source images. The fusion then consists in selecting the dominant
coefficients of the sensor images according to given criteria, constructing a fused image
in the transform domain, and finally applying the inverse transform to obtain the fused image.
• Image fusion after PCA or ICA: when the number of images to fuse becomes large
(hyperspectral images), it may be necessary to first extract the principal (Principal
Component Analysis, PCA [11]) or independent (Independent Component Analysis, ICA
[12, 13]) components and then apply image fusion techniques to these components.
• Probabilistic model-based approach: this type of approach introduces a model
representing the relationship between the observed images and the source images, or
some particular features of them ([5, 7, 8, 14]). The model can also account for noise
and for unknown parameters such as the registration parameters of the images. These
methods may be supervised or not. In the supervised case, a training step, or more
generally a pre-processing step, is used to estimate the parameters of the image model
([15]). In the unsupervised case, these parameters are estimated from the data themselves.
These approaches are neither exhaustive nor independent, and they can be combined
in hybrid methods. In all these methods there are two different objectives:
• to obtain an image which represents all the information of the sources. Because
human vision is very sensitive to contrast changes in the image, this objective often
reduces to constructing a segmentation in which all the regions and contours of the
different sources are represented;
• to improve the reconstruction of an image by using complementary information
present in other data sets.
The method presented in this work can be classified as a probabilistic model-based approach,
and our objective is to obtain a common segmentation and to perform the reconstruction
at the same time. The main problem is how to combine the information content of the
different data sets $g_i(r)$. Very often the data sets $g_i$, and the corresponding images $f_i$,
do not represent the same quantities. A general model for these problems is the following:

$$g_i(r) = [H_i f_i](r) + \epsilon_i(r), \quad i = 1, \dots, M \qquad (1)$$

where $H_i$ are the functional operators of the measuring systems, or registration operators
if the observations have to be registered. Note that estimating $f_i$ from each data set
$g_i$ is an inverse problem in itself.
Figure 1: Examples of images for data fusion and joint segmentation. (a) T1-weighted, T2-weighted and T1-weighted with contrast agent transversal slices of 3D brain MR images. (b) Two observations from transmission and backscattering X-rays in security systems (with the permission of American Science and Engineering, Inc., 2003).
In this paper we consider the case where the measuring systems can be assumed
almost perfect and the observations are registered, which means that we can write:

$$g_i(r) = f_i(r) + \epsilon_i(r), \quad i = 1, \dots, M \qquad (2)$$

for $r \in \mathbb{Z}^2$. For images, the pixels $r$ belong to a finite lattice $\mathcal{S}$,
and we denote by $S$ the number of pixels of this lattice. In the following we also use the
notation:

$$g_i = f_i + \epsilon_i \quad \text{or} \quad g = f + \epsilon \qquad (3)$$

where $g_i = \{g_i(r), r \in \mathcal{S}\}$ and $g = \{g_i, i = 1, \dots, M\}$.
Figure 1 shows two examples of image fusion problems. The first data set consists of
multispectral noisy images of transversal slices of 3D brain MR images. The second example
shows a multimodal case with transmission and backscattering X-ray acquisitions of a
suitcase. As we can see in the observed images $g_i$, the only thing these images really
have in common is their underlying structure (contours and regions).
In this work we introduce a label variable $z(r)$ for the regions and consider the region
labels as the feature common to all images. Data fusion then becomes the estimation of the
joint segmentation labels $z = \{z(r), r \in \mathcal{S}\}$.
Segmentation is a long-standing problem in computer vision. Recent works in medical
imaging propose methods to construct a segmentation from multispectral images
([4, 6, 15]), which can be considered as a fusion problem.
Probabilistic frameworks for unsupervised segmentation are a very active area and have
already shown effective results in many domains. In [16] and [17] the authors propose a
Markov Chain Monte Carlo (MCMC) method for image segmentation, using a Bayesian
framework and a Markov field prior. In this paper we propose to use this type of
method in the case of multiple source images.
The Bayesian approach we propose models the observed data through $p(g_i|f_i)$, the
images through $p(f_i|z)$ and the classification labels $z$ through $P(z)$. When these priors
are appropriately assigned, we obtain the expression of the posterior $p(f, z|g)$, from which
we infer not only $z$ but also $f$. Our aim is then to obtain a common segmentation
of the $M$ observations and to reconstruct $f_i$, $i = 1, \dots, M$, at the same time.
This paper is organized as follows. In section 2 we introduce the common feature $z$,
model the relation of the images $f_i$ to it through $p(f_i|z)$, and model its own
characteristics through a prior law $P(z)$. In section 3 we give detailed expressions of the
posterior laws. Section 4 gives the general structure of the MCMC algorithm we use to
estimate $f$ and $z$. In section 5 we introduce a more complex model accounting for a spatial
dependency of $f|z$ in order to reduce the noise of the observations. In section 6 we
present some simulation results to show the performance of the proposed methods and
their potential applications in medical imaging and security imaging systems. Finally, in
section 7 we discuss the estimation of the number of labels.
2 Modeling for Bayesian data fusion
Within the observation model (3), the posterior law $p(f, z|g)$ is given by:

$$p(f, z|g) \propto p(g|f)\, p(f|z)\, P(z). \qquad (4)$$

We then need to give precise expressions of $p(g|f)$, $p(f|z)$ and $P(z)$ according to
appropriate hypotheses on the noise model, the image model and the label model.
2.1 Observation noise model and the likelihood
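Assuming independent noises $\epsilon_i$ among the different observations, we have

$$p(g|f) = \prod_{i=1}^M p(g_i|f_i) = \prod_{i=1}^M p_{\epsilon_i}(g_i - f_i)$$

Assuming $\epsilon_i$ centered, white and Gaussian, $p(\epsilon_i) = \mathcal{N}(0, \sigma^2_{\epsilon_i} I)$, and denoting by $S$ the number of
pixels of an image, we have:

$$p(g_i|f_i) = \mathcal{N}(f_i, \sigma^2_{\epsilon_i} I) = \left(\frac{1}{2\pi\sigma^2_{\epsilon_i}}\right)^{S/2} \exp\left\{-\frac{1}{2\sigma^2_{\epsilon_i}} \|g_i - f_i\|^2\right\}$$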
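As a concrete check of this expression, here is a minimal NumPy sketch of $\log p(g_i|f_i)$ for one image, assuming $g_i$ and $f_i$ are stored as float arrays of the same shape; the function name `log_likelihood` is hypothetical. Working with the logarithm avoids the numerical underflow of the product over $S$ pixels.

```python
import numpy as np


def log_likelihood(g_i, f_i, sigma2_eps):
    """log p(g_i | f_i) for white Gaussian noise of variance sigma2_eps."""
    S = g_i.size                                  # number of pixels
    sq_norm = np.sum((g_i - f_i) ** 2)            # ||g_i - f_i||^2
    return -0.5 * S * np.log(2 * np.pi * sigma2_eps) - sq_norm / (2 * sigma2_eps)
```

When $f_i = g_i$ only the normalization term $-\frac{S}{2}\log(2\pi\sigma^2_{\epsilon_i})$ remains.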
2.2 Hidden Markov modeling of images
As we want to reconstruct images with statistically homogeneous regions, it is natural
to introduce a hidden variable $z = (z(1), \dots, z(S)) \in \{1, \dots, K\}^S$ which represents
a common classification of the images $f_i$. The problem is now to estimate the set of
variables $(f, z)$ using the Bayesian approach:

$$p(f, z|g) = p(f|z, g)\, P(z|g) \qquad (5)$$

Thus, to give an expression for $p(f, z|g)$ using the Bayes formula, we need to
define $p(g_i|f_i)$ and $p(f_i|z)$ for $p(f|z, g)$, and $p(g_i|z)$ and $P(z)$ for $P(z|g)$.
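To assign $p(f_i|z)$, we first define the sets of pixels which are in the same class:

$$R_k = \{r : z(r) = k\}, \quad |R_k| = n_k, \qquad f_{ik} = \{f_i(r) : z(r) = k\}$$

In this paper, as a first step, we assume that all the pixels $f_{ik}$ of an image $f_i$ which are in
the same class $k$ are characterized by a mean $m_{ik}$ and a variance $\sigma^2_{ik}$:

$$p(f_i(r)|z(r) = k) = \mathcal{N}(m_{ik}, \sigma^2_{ik}), \quad \forall r \in \mathcal{S}$$

With these notations we have:

$$p(f_{ik}) = \mathcal{N}(m_{ik}\mathbf{1}, \sigma^2_{ik} I) \qquad (6)$$

and thus

$$p(f_i|z) = \prod_{k=1}^K \mathcal{N}(m_{ik}\mathbf{1}, \sigma^2_{ik} I) = \prod_{k=1}^K \left(\frac{1}{\sqrt{2\pi\sigma^2_{ik}}}\right)^{n_k} \exp\left\{-\frac{1}{2\sigma^2_{ik}} \|f_{ik} - m_{ik}\mathbf{1}\|^2\right\}, \quad i = 1, \dots, M \qquad (7)$$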
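The product over classes in (7) can be sketched as a loop over the regions $R_k$. This is an illustrative NumPy version, assuming labels stored as integers $0, \dots, K-1$ (0-based, unlike the text) and per-class parameters in arrays `m_i` and `s2_i`; the function name is hypothetical.

```python
import numpy as np


def log_p_f_given_z(f_i, z, m_i, s2_i):
    """log p(f_i | z) of eq. (7); z holds 0-based labels 0..K-1."""
    total = 0.0
    for k in range(len(m_i)):
        f_ik = f_i[z == k]                        # pixels of f_i in region R_k
        n_k = f_ik.size                           # n_k = |R_k|
        total += (-0.5 * n_k * np.log(2 * np.pi * s2_i[k])
                  - np.sum((f_ik - m_i[k]) ** 2) / (2 * s2_i[k]))
    return total
```

An empty class contributes nothing, which matches (7) with $n_k = 0$.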
where $\mathbf{1}$ is a vector with all components equal to 1. In section 5 we extend this
model to the case where the pixels in different regions are assumed independent but,
inside any homogeneous region, we account for their local correlation by using a
Gauss-Markov model.
2.3 Potts-Markov modeling of labels
Finally we have to assign P (z). As we introduced the hidden variable z for finding
statistically homogeneous regions in images, it is natural to define a spatial dependency
on these labels. The simplest model to account for this desired local spatial dependency
is a Potts Markov Random Field model :
$$P(z) = \frac{1}{T(\alpha)} \exp\left\{\alpha \sum_{r \in \mathcal{S}} \sum_{s \in \mathcal{V}(r)} \delta(z(r) - z(s))\right\}, \qquad (8)$$
where $\mathcal{S}$ is the set of pixels, $\delta(0) = 1$ and $\delta(t) = 0$ if $t \neq 0$, $\mathcal{V}(r)$ denotes the neighborhood
of the pixel $r$ (here we consider a neighborhood of 4 pixels), $T(\alpha)$ is the partition function
or normalization constant, and $\alpha$ represents the degree of spatial dependency of the
variable $z$. There are many studies on the influence of this parameter. In [18], D. Higdon
showed that there exists a critical value $\alpha_c$ which depends on the size of the images and
the number of classes. For values $\alpha < \alpha_c$, the Potts model realizations are very noisy,
with a large number of small regions. For values $\alpha > \alpha_c$, the realizations consist mainly
of a few large regions, which quickly become prevalent, and a homogeneous background. The
Potts model therefore appears inappropriate for segmenting small regions. However, it is
used and gives satisfactory results in the case of images with a homogeneous background.
In practice we fix $\alpha$ well above the critical value $\alpha_c$ in order to enforce
the spatial dependency.
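Because the partition function $T(\alpha)$ is intractable, only the exponent of (8) is computed in practice. A minimal NumPy sketch of that unnormalized log-prior for the 4-pixel neighborhood, assuming free boundaries (pixels outside the image are not neighbors); the boundary rule and the function name are assumptions, not taken from the text:

```python
import numpy as np


def potts_log_prior_unnormalized(z, alpha):
    """alpha * sum_r sum_{s in V(r)} delta(z(r) - z(s)) on a 4-neighbourhood.

    log T(alpha) is omitted; pixels outside the image are ignored.
    """
    horiz = np.sum(z[:, 1:] == z[:, :-1])         # equal horizontal neighbour pairs
    vert = np.sum(z[1:, :] == z[:-1, :])          # equal vertical neighbour pairs
    # Each unordered pair {r, s} appears twice in the double sum of eq. (8).
    return alpha * 2 * (horiz + vert)
```

A constant label field maximizes this term and a checkerboard field gives zero, which is why a large $\alpha$ favors large homogeneous regions.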
We now have all the necessary prior laws $p(g_i|f_i)$, $p(f_i|z)$, $p(g_i|z)$ and $P(z)$, and
we can give an expression for $p(f, z|g)$. However, these probability laws in general have
unknown parameters, such as $\sigma^2_{\epsilon_i}$ in $p(g_i|f_i)$ or $m_{ik}$ and $\sigma^2_{ik}$ in $p(f_i|z)$. In a full Bayesian
approach, we have to assign prior laws to these "hyperparameters".
2.4 Conjugate priors for the hyperparameters
Let $m_i = (m_{ik})_{k=1,\dots,K}$ and $\sigma^2_i = (\sigma^2_{ik})_{k=1,\dots,K}$ be the means and variances of the
pixels in the different regions of the images $f_i$ as defined before. We define $\theta_i$ as the set of
all the parameters which must be estimated:

$$\theta_i = (\sigma^2_{\epsilon_i}, m_i, \sigma^2_i), \quad i = 1, \dots, M$$

and we write $\theta = (\theta_i)_{i=1,\dots,M}$. The choice of prior laws for the hyperparameters is still an
open problem. In [19] the authors used differential geometry tools to construct particular
priors which include the entropic and conjugate priors as special cases. In this paper we
choose the latter.
Applying the priors of [19] to our case, we obtain the following conjugate priors:
• Inverse Gamma $\mathcal{IG}(\alpha_{\epsilon_i 0}, \beta_{\epsilon_i 0})$ and $\mathcal{IG}(\alpha_{i0}, \beta_{i0})$ for the variances $\sigma^2_{\epsilon_i}$ and $\sigma^2_{ik}$ respectively,
• Gaussian $\mathcal{N}(m_{i0}, \sigma^2_{i0})$ for the means $m_{ik}$.
The hyper-hyperparameters $\alpha_{i0}$, $\beta_{i0}$, $m_{i0}$ and $\sigma^2_{i0}$ are fixed, and the results are in
general not very sensitive to their exact values. However, in the case of noisy images we can
impose a small value of $\sigma^2_{i0}$ in order to force the reconstruction of homogeneous regions.
3 A posteriori distributions for the Gibbs algorithm
The Bayesian approach now consists in estimating the whole set of variables $(f, z, \theta)$
from the joint posterior distribution $p(f, z, \theta|g)$. It is difficult to simulate a joint
sample $(f, z, \theta)$ directly from this joint posterior distribution. However, with the
prior laws defined above, we are able to simulate the conditional posterior laws
$p(f, z|g, \theta)$ and $p(\theta|g, f, z)$. This is the main reason for proposing a Gibbs
algorithm to estimate $(f, z, \theta)$, first splitting this set of variables into two subsets, $(f, z)$
and $\theta$, and then into three subsets $f$, $z$ and $\theta$, using the following relation:

$$p(f, z|g, \theta) = p(f|z, g, \theta)\, P(z|g, \theta). \qquad (9)$$

Sampling the joint distribution $p(f, z|g, \theta)$ is then obtained by first sampling
$P(z|g, \theta)$ and then sampling $p(f|z, g, \theta)$. We now define the conditional posterior
distributions used in the Gibbs algorithm.
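Sampling $z$ using $P(z|g, \theta)$:
For this step we have:

$$P(z|g, \theta) \propto p(g|z, \theta)\, P(z) = \prod_{i=1}^M p(g_i|z, \theta_i)\, P(z)$$

where, using relation (3) and the laws $p(f_i|z)$ and $p(\epsilon_i)$, we obtain

$$p(g_i|z, \theta_i) = \prod_{r \in \mathcal{S}} p(g_i(r)|z(r), \theta_i)$$

and

$$p(g_i(r)|z(r) = k) = \mathcal{N}(m_{ik}, \sigma^2_{ik} + \sigma^2_{\epsilon_i}) \qquad (10)$$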
As we chose a Potts Markov random field model for the labels $z$, exact sampling of the
posterior distribution $P(z|g, \theta)$ is not possible. However, $P(z|g, \theta)$ is still a PMRF
whose probabilities are weighted by the likelihood $p(g|z, \theta)$. We use this fact in section 4
to propose a parallel implementation of Gibbs sampling for this PMRF.
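The likelihood weights of this PMRF follow directly from (10). The sketch below computes, for every pixel and every class, $\sum_i \log p(g_i(r)|z(r)=k, \theta_i)$ with NumPy broadcasting. It assumes the $M$ observations are stacked in an array of shape `(M, H, W)` and the class parameters in arrays of shape `(M, K)`; the layout and the function name are hypothetical.

```python
import numpy as np


def class_log_likelihood(g, m, s2, s2_eps):
    """Per-pixel sum over images of log p(g_i(r) | z(r)=k, theta_i), eq. (10).

    g: (M, H, W) observations; m, s2: (M, K) class means and variances;
    s2_eps: (M,) noise variances. Returns an array of shape (H, W, K).
    """
    var = s2 + s2_eps[:, None]                    # (M, K): sigma2_ik + sigma2_eps_i
    var_b = var[:, None, None, :]                 # broadcast to (M, 1, 1, K)
    diff = g[..., None] - m[:, None, None, :]     # (M, H, W, K)
    ll = -0.5 * np.log(2 * np.pi * var_b) - diff ** 2 / (2 * var_b)
    return ll.sum(axis=0)                         # product over images -> sum of logs
```

Summing over the image axis is what makes the segmentation joint: one label per pixel must explain all $M$ observations at once.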
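Sampling $f_i$ using $p(f_i|g_i, z, \theta_i)$:
We can write the posterior law $p(f_i(r)|g_i(r), z(r), \theta_i)$ as follows:

$$p(f_i(r)|g_i(r), z(r) = k, \theta_i) = \mathcal{N}(m^{\text{apost}}_{ik}, \sigma^{2,\text{apost}}_{ik})$$

where

$$m^{\text{apost}}_{ik} = \sigma^{2,\text{apost}}_{ik}\left(\frac{g_i(r)}{\sigma^2_{\epsilon_i}} + \frac{m_{ik}}{\sigma^2_{ik}}\right), \qquad \sigma^{2,\text{apost}}_{ik} = \left(\frac{1}{\sigma^2_{\epsilon_i}} + \frac{1}{\sigma^2_{ik}}\right)^{-1}$$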
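Since these posteriors are independent across pixels, the whole image $f_i$ can be drawn in one vectorized step. A sketch under the same assumptions as before (NumPy arrays, 0-based labels; hypothetical function names), using a NumPy `Generator` for the random draws:

```python
import numpy as np


def posterior_f_params(g_i, z, m_i, s2_i, s2_eps_i):
    """Per-pixel posterior mean and variance of f_i(r) given g_i(r), z(r)=k, theta_i."""
    prior_mean = m_i[z]                           # m_ik at each pixel
    prior_var = s2_i[z]                           # sigma2_ik at each pixel
    post_var = 1.0 / (1.0 / s2_eps_i + 1.0 / prior_var)
    post_mean = post_var * (g_i / s2_eps_i + prior_mean / prior_var)
    return post_mean, post_var


def sample_f(g_i, z, m_i, s2_i, s2_eps_i, rng):
    """Draw f_i from the independent per-pixel Gaussian posteriors."""
    mean, var = posterior_f_params(g_i, z, m_i, s2_i, s2_eps_i)
    return mean + np.sqrt(var) * rng.standard_normal(g_i.shape)
```

The posterior mean is a precision-weighted average of the observation $g_i(r)$ and the class mean $m_{ik}$, so a small $\sigma^2_{ik}$ pulls $f_i(r)$ strongly toward its class mean.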
Sampling $\theta_i$ using $p(\theta_i|f_i, g_i, z)$:
We have the following relation:

$$p(\theta_i|f_i, g_i, z) \propto p(m_i, \sigma^2_i|f_i, z)\, p(\sigma^2_{\epsilon_i}|f_i, g_i)$$

For the first term $p(m_i, \sigma^2_i|f_i, z)$ we use a Gibbs algorithm, sampling successively from
the conditional distributions $p(m_i|\sigma^2_i, f_i, z)$ and $p(\sigma^2_i|m_i, f_i, z)$. Using again
the Bayes formula, these posterior distributions are calculated from the priors selected
before, and we have
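• $m_{ik}|f_i, z, \sigma^2_{ik}, m_{i0}, \sigma^2_{i0} \sim \mathcal{N}(\mu_{ik}, v^2_{ik})$, with

$$\mu_{ik} = v^2_{ik}\left(\frac{m_{i0}}{\sigma^2_{i0}} + \frac{1}{\sigma^2_{ik}} \sum_{r \in R_k} f_i(r)\right), \qquad v^2_{ik} = \left(\frac{n_k}{\sigma^2_{ik}} + \frac{1}{\sigma^2_{i0}}\right)^{-1}$$

• $\sigma^2_{ik}|f_i, z, m_{ik}, \alpha_{i0}, \beta_{i0} \sim \mathcal{IG}(\alpha_{ik}, \beta_{ik})$, with

$$\alpha_{ik} = \alpha_{i0} + \frac{n_k}{2}, \qquad \beta_{ik} = \beta_{i0} + \frac{1}{2} \sum_{r \in R_k} (f_i(r) - m_{ik})^2$$

• $\sigma^2_{\epsilon_i}|f_i, g_i \sim \mathcal{IG}(\alpha_i, \beta_i)$, with

$$\alpha_i = \frac{S}{2} + \alpha_{\epsilon_i 0}, \qquad \beta_i = \frac{1}{2}\|g_i - f_i\|^2 + \beta_{\epsilon_i 0}$$

where $S$ is the number of pixels.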
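One pass through these three conditionals can be sketched as follows. This is an illustration, not the authors' code: the dictionary `hyper` and its keys (`m0`, `s2_0`, `a0`, `b0`, `a_eps0`, `b_eps0`, standing for $m_{i0}$, $\sigma^2_{i0}$, $\alpha_{i0}$, $\beta_{i0}$, $\alpha_{\epsilon_i 0}$, $\beta_{\epsilon_i 0}$) are hypothetical. An $\mathcal{IG}(a, b)$ draw is obtained as the reciprocal of a Gamma draw with shape $a$ and rate $b$; NumPy's `Generator.gamma` takes a scale, hence `1.0 / b`.

```python
import numpy as np


def sample_theta(f_i, g_i, z, s2_i, K, hyper, rng):
    """One Gibbs pass over theta_i = (sigma2_eps_i, m_i, sigma2_i); 0-based labels."""
    m_new = np.empty(K)
    s2_new = np.empty(K)
    for k in range(K):
        f_k = f_i[z == k]
        n_k = f_k.size
        # m_ik | f_i, z, sigma2_ik ~ N(mu_ik, v2_ik), using the current sigma2_ik
        v2 = 1.0 / (n_k / s2_i[k] + 1.0 / hyper["s2_0"])
        mu = v2 * (hyper["m0"] / hyper["s2_0"] + f_k.sum() / s2_i[k])
        m_new[k] = rng.normal(mu, np.sqrt(v2))
        # sigma2_ik | f_i, z, m_ik ~ IG(a0 + n_k/2, b0 + 0.5 * sum (f - m_ik)^2)
        a = hyper["a0"] + n_k / 2
        b = hyper["b0"] + 0.5 * np.sum((f_k - m_new[k]) ** 2)
        s2_new[k] = 1.0 / rng.gamma(a, 1.0 / b)
    # sigma2_eps_i | f_i, g_i ~ IG(S/2 + a_eps0, 0.5 ||g_i - f_i||^2 + b_eps0)
    a_e = g_i.size / 2 + hyper["a_eps0"]
    b_e = 0.5 * np.sum((g_i - f_i) ** 2) + hyper["b_eps0"]
    s2_eps_new = 1.0 / rng.gamma(a_e, 1.0 / b_e)
    return m_new, s2_new, s2_eps_new
```

Drawing $m_{ik}$ first and then $\sigma^2_{ik}$ given the new mean is one valid ordering of the inner Gibbs step; the text does not fix the order.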
4 Parallel implementation of the sampling of p(z|g,θ)
As we saw in the previous section, to generate samples from $p(f, z, \theta|g)$ we alternately
generate samples of $z$ from $P(z|g, \theta)$, then $f$ from $p(f|g, z, \theta)$, and finally $\theta$ from
$p(\theta|f, g, z)$. The second step is easy because $p(f|z, \theta, g)$ is Gaussian. The last step
is also easy because we only have to generate samples from Gaussian or Inverse Gamma
distributions. The first step, i.e. sampling $z$ from $P(z|g, \theta)$, is not easy and itself
requires a Gibbs sampler. However, since we chose a first-order neighborhood system
for the PMRF prior $P(z)$ of the labels, the posterior is still a PMRF with the same
neighborhood. We can then decompose the whole set of pixels into two subsets (odd and
even positions) forming a chess board (see figure 2). In this case, if we fix the black
(respectively white) labels, the white (respectively black) labels become independent. This
decomposition reduces the complexity of the Gibbs algorithm, because the whole set of
labels can be simulated in only two steps.
Figure 2: Chess board decomposition of the labels z (black labels and white labels).
The parallel Gibbs algorithm we implemented is then the following. Given an initial state
$(\theta_1, \theta_2, z)^{(0)}$,

Parallel Gibbs sampling: repeat until convergence
1. simulate $z_B^{(n)} \sim P(z_B|z_W^{(n-1)}, g, \theta^{(n-1)})$
   simulate $z_W^{(n)} \sim P(z_W|z_B^{(n)}, g, \theta^{(n-1)})$
   simulate $f_i^{(n)} \sim p(f_i|g_i, z^{(n)}, \theta_i^{(n-1)})$
2. simulate $\theta_i^{(n)} \sim p(\theta_i|f_i^{(n)}, z^{(n)}, g_i)$
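The label update of step 1 can be sketched in vectorized form: all sites of one color are drawn at once from their full conditionals, then the other color is drawn given the new labels. In this sketch the coloring rule ("black" where row + column is even), the free boundary, and the function names are assumptions; `loglik[r, k]` is the summed per-pixel log-likelihood of (10). The factor $2\alpha$ comes from eq. (8), where each neighbor pair appears twice in the double sum.

```python
import numpy as np


def neighbour_counts(z, K):
    """counts[r, k] = number of 4-neighbours s of r with z(s) == k (free boundary)."""
    onehot = np.eye(K)[z]                         # (H, W, K) one-hot labels
    counts = np.zeros(onehot.shape)
    counts[1:, :] += onehot[:-1, :]               # neighbour above
    counts[:-1, :] += onehot[1:, :]               # neighbour below
    counts[:, 1:] += onehot[:, :-1]               # neighbour on the left
    counts[:, :-1] += onehot[:, 1:]               # neighbour on the right
    return counts


def chessboard_sweep(z, loglik, alpha, rng):
    """Update all black sites in parallel, then all white sites; returns a new z."""
    z = z.copy()
    H, W, K = loglik.shape
    rows, cols = np.indices((H, W))
    for colour in (0, 1):                         # 0 = "black", 1 = "white"
        mask = (rows + cols) % 2 == colour
        # Full conditional of z(r): likelihood term + Potts term 2*alpha*(agreeing neighbours).
        logp = (loglik + 2 * alpha * neighbour_counts(z, K))[mask]
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        u = rng.random((p.shape[0], 1))
        # Inverse-CDF draw of one label per site of this colour.
        z[mask] = np.minimum((np.cumsum(p, axis=1) < u).sum(axis=1), K - 1)
    return z
```

Within one color no two sites are neighbors, so drawing them simultaneously is exact for the 4-pixel neighborhood; this is what reduces a full label sweep to two vectorized steps.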
5 Accounting for local spatial dependency inside regions
We now want to introduce a local dependency between the pixels of $f_{ik}$ which are in the same
homogeneous region $k$. In the previous sections we assumed that these pixels are independent,
even though they share the same mean and variance. In this section we relax
this hypothesis by accounting for possible local correlation. Our aim is to improve the
reconstructed images and thereby (because our algorithm is iterative) to improve the quality of
our classification. We now describe this new model and the modifications it implies.
5.1 New model for the images fi
We now consider that the pixels $f_i(r)$ inside the same region are locally dependent, whereas
pixels in different regions remain independent. Note that this is our prior hypothesis:
a posteriori, all the pixels, whether inside a given region or in different regions, are
interdependent. To distinguish between pixels in different regions we introduce a
hidden "contour" variable $q = \{q(r), r \in \mathcal{S}\}$ as follows:

$$q(r) = \begin{cases} 0 & \text{if all the labels } \{z(s), s \in \mathcal{V}(r)\} \text{ are equal to } z(r), \\ 1 & \text{otherwise.} \end{cases}$$

Note that when $z$ is given, $q$ is obtained deterministically. Thus $q(r)$ is related
to $z(r)$, and the distribution of $q$ is related to that of $z$ by:

$$P(q(r) = 1|z) = 1 - \prod_{s \in \mathcal{V}(r)} \delta(z(r) - z(s)) \qquad (11)$$
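Since $q$ is a deterministic function of $z$, computing it amounts to comparing each label with its four neighbors. A minimal NumPy sketch, assuming that neighbors outside the image are simply ignored (the text does not specify the boundary rule) and using a hypothetical function name:

```python
import numpy as np


def contour_variable(z):
    """q(r) = 1 if some 4-neighbour s of r has z(s) != z(r), else 0 (eq. 11).

    Neighbours outside the image are ignored (free boundary, an assumption).
    """
    q = np.zeros(z.shape, dtype=bool)
    q[1:, :] |= z[1:, :] != z[:-1, :]             # differs from neighbour above
    q[:-1, :] |= z[:-1, :] != z[1:, :]            # differs from neighbour below
    q[:, 1:] |= z[:, 1:] != z[:, :-1]             # differs from left neighbour
    q[:, :-1] |= z[:, :-1] != z[:, 1:]            # differs from right neighbour
    return q.astype(int)
```

For a single pixel of class 1 inside a class-0 background, the pixel and its four neighbors are marked as contour pixels, while the diagonal corners are not.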
Then we have:

$$p(f_i|z, q, \theta_i) = \prod_{k=1}^K p(f_{ik}|z, q, \theta_i)$$

Let $f_{i\mathcal{V}(r)} = \{f_i(s), s \in \mathcal{V}(r)\}$, where $\mathcal{V}(r)$ stands for the neighborhood of $r$ and
$|\mathcal{V}|$ is its size (the number of pixels of the neighborhood system, which is 4 here). Then
we can write:
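$$p(f_i(r)|z(r) = k, q(r), f_{i\mathcal{V}(r)}, \theta_i) = \begin{cases} \mathcal{N}(m_{ik}, \sigma^2_{ik}) & \text{if } q(r) = 1 \\ \mathcal{N}\left(\frac{1}{4}\sum_{s \in \mathcal{V}(r)} f_i(s), \frac{\sigma^2_{ik}}{4}\right) & \text{if } q(r) = 0 \end{cases} \qquad (12)$$

where $\frac{1}{4}\sum_{s \in \mathcal{V}(r)} f_i(s)$ is the mean value of the four neighboring pixels around the pixel
position $r$. We can group these two cases together by writing

$$m_{f_i}(r) = q(r)\, m_{ik} + (1 - q(r))\, \frac{1}{4}\sum_{s \in \mathcal{V}(r)} f_i(s), \qquad \sigma^2_{f_i}(r) = q(r)\, \sigma^2_{ik} + (1 - q(r))\, \frac{\sigma^2_{ik}}{4}$$

With these notations we can write the prior and the likelihood as in section 2:

$$p(f_i(r)|z(r) = k, q(r), f_{i\mathcal{V}(r)}, \theta_i) = \mathcal{N}(m_{f_i}(r), \sigma^2_{f_i}(r)),$$

and

$$p(g_i(r)|z(r) = k, q(r), f_{i\mathcal{V}(r)}, \theta_i) = \mathcal{N}(m_{f_i}(r), \sigma^2_{f_i}(r) + \sigma^2_{\epsilon_i})$$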
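The grouped prior mean $m_{f_i}(r)$ and variance $\sigma^2_{f_i}(r)$ can be computed for the whole image at once. The sketch below is an illustration under stated assumptions: border pixels are handled by replicating the edge (`numpy.pad` with `mode="edge"`), which the text does not specify, labels are 0-based, and the function name is hypothetical.

```python
import numpy as np


def local_prior(f_i, z, q, m_i, s2_i):
    """Per-pixel prior mean m_fi(r) and variance s2_fi(r) of Section 5.1."""
    p = np.pad(f_i, 1, mode="edge")               # assumption: replicate border pixels
    # (1/4) * (above + below + left + right)
    nb_mean = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
    m_fi = q * m_i[z] + (1 - q) * nb_mean
    s2_fi = q * s2_i[z] + (1 - q) * s2_i[z] / 4.0
    return m_fi, s2_fi
```

Contour pixels ($q = 1$) fall back to the class parameters, while interior pixels are pulled toward the mean of their neighbors with a four times smaller variance; adding $\sigma^2_{\epsilon_i}$ to `s2_fi` gives the variance of the likelihood above.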
5.2 A posteriori distributions
As we chose a spatial dependency between the pixels $f_i(r)$ with a neighborhood system of 4
pixels, we face the same problem as for the labels. We therefore decompose the set of
variables $f_i$ into two subsets, $f_{iW}$ and $f_{iB}$, which represent respectively the odd-numbered
positions (labeled white) and the even-numbered positions (labeled black) of the
image $f_i$. We also write $f^W = \{f_{iW}\}_{i=1,\dots,M}$ and $f^B = \{f_{iB}\}_{i=1,\dots,M}$.
For this case we propose to decompose the set of variables directly into three subsets,
$(f^W, z_W)$, $(f^B, z_B)$ and $\theta$, and to sample them from their conditional posterior
distributions. For the first two subsets we can use the same decomposition as in (9):
$$p(f^W, z_W|f^B, z_B, g, \theta, q) = p(f^W|f^B, z, g, \theta, q)\, P(z_W|f^B, z_B, g, \theta, q)$$

$$p(f^B, z_B|f^W, z_W, g, \theta, q) = p(f^B|f^W, z, g, \theta, q)\, P(z_B|f^W, z_W, g, \theta, q)$$

and we also have

$$p(f^W|f^B, z, g, \theta, q) = \prod_{i=1}^M p(f_{iW}|f_{iB}, z, g_i, \theta_i, q)$$

$$p(f^B|f^W, z, g, \theta, q) = \prod_{i=1}^M p(f_{iB}|f_{iW}, z, g_i, \theta_i, q)$$
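Sampling $f_{iB}$ and $f_{iW}$ using $p(f_{iB}|f_{iW}, z, g_i, \theta_i, q)$ and $p(f_{iW}|f_{iB}, z, g_i, \theta_i, q)$:
With this decomposition we have the following relations:

$$p(f_{iB}|g_i, f_{iW}, z, q, \theta_i) = \prod_{r\ \text{black}} p(f_i(r)|g_i(r), f_{i\mathcal{V}(r)}, z(r), q(r), \theta_i)$$

$$p(f_{iW}|g_i, f_{iB}, z, q, \theta_i) = \prod_{r\ \text{white}} p(f_i(r)|g_i(r), f_{i\mathcal{V}(r)}, z(r), q(r), \theta_i),$$

and, with the same method as in section 4, we obtain the posterior distribution:

$$p(f_i(r)|g_i(r), z(r), q(r), f_{i\mathcal{V}(r)}, \theta_i) = \mathcal{N}(m^{\text{apost}}, \sigma^{2,\text{apost}}),$$

with

$$m^{\text{apost}} = \sigma^{2,\text{apost}}\left(\frac{g_i(r)}{\sigma^2_{\epsilon_i}} + \frac{m_{f_i}(r)}{\sigma^2_{f_i}(r)}\right), \qquad \sigma^{2,\text{apost}} = \left(\frac{1}{\sigma^2_{\epsilon_i}} + \frac{1}{\sigma^2_{f_i}(r)}\right)^{-1}$$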
Sampling $z_B$ and $z_W$ using $P(z_B|f^W, z_W, g, \theta, q)$ and $P(z_W|f^B, z_B, g, \theta, q)$:
Using the Bayes rule we have

$$P(z_B|z_W, g, f^W, q, \theta) \propto p(g|z, f^W, q, \theta)\, p(f^W|z, q, \theta)\, P(z_B|z_W) \qquad (13)$$

Because of the term $p(f^W|z, q, \theta)$ on the right-hand side of (13), we cannot obtain an explicit
expression for the posterior distribution $P(z_B|z_W, g, f^W, q, \theta)$. We therefore propose two
different approximations for this step. The first one is to approximate $p(f^W|z, q, \theta)$ by
its expected value with respect to $z_B$:

$$p(f^W|z, q, \theta) \approx \sum_{z_B} p(f^W|z, q, \theta)\, P(z_B|z_W) = p(f^W|z_W, q, \theta), \qquad (14)$$

which is constant with respect to $z_B$. This approximation can be interpreted as a mean
field approximation ([20]). The approximate posterior distribution we propose to use for
this step is:

$$P(z_B|z_W, g, f^W, q, \theta) \propto p(g|z, f^W, q, \theta)\, P(z_B|z_W) \propto P(z_B|z_W) \prod_{i=1}^M \prod_{r\ \text{black}} p(g_i(r)|z(r), f_{i\mathcal{V}(r)}, q(r), \theta_i) \qquad (15)$$
We also have the symmetric relation

$$P(z_W|z_B, g, f^B, q, \theta) \propto P(z_W|z_B) \prod_{i=1}^M \prod_{r\ \text{white}} p(g_i(r)|z(r), f_{i\mathcal{V}(r)}, q(r), \theta_i) \qquad (16)$$

Note that the likelihood $p(g_i(r)|z(r), f_{i\mathcal{V}(r)}, q(r), \theta_i) = \mathcal{N}(m_{f_i}(r), \sigma^2_{f_i}(r) + \sigma^2_{\epsilon_i})$
differs from $p(g_i(r)|z(r), \theta_i) = \mathcal{N}(m_{ik}, \sigma^2_{ik} + \sigma^2_{\epsilon_i})$ of section 3 and is more expensive to
compute. The second approximation we propose is therefore to use the latter expression in
place of the former in this step.
Updating $q$
As mentioned before, $q$ is determined deterministically given $z$, and it is updated
from the current value of $z$ using relation (11).
Sampling $\theta_i|z, g_i, f_i, q$:
We still use the same method to obtain the posterior distributions of the parameters
$\theta_i$. However, we now have to decompose the set $R_k$ into two subsets as follows:

$$R_k = R^0_k \cup R^1_k, \quad \text{with } R^j_k = \{r : z(r) = k,\ q(r) = j\},\ j \in \{0, 1\}.$$

We also write $n^j_k = |R^j_k|$. With this decomposition we can calculate the posterior distributions of $\theta_i$:
• $m_{ik}|f_i, z, q, \sigma^2_{ik}, m_{i0}, \sigma^2_{i0} \sim \mathcal{N}(\mu_{ik}, v^2_{ik})$, with

$$\mu_{ik} = v^2_{ik}\left(\frac{m_{i0}}{\sigma^2_{i0}} + \frac{1}{\sigma^2_{ik}} \sum_{r \in R^1_k} f_i(r)\right), \qquad v^2_{ik} = \left(\frac{n^1_k}{\sigma^2_{ik}} + \frac{1}{\sigma^2_{i0}}\right)^{-1}$$

An approximation of these equations is to replace $R^1_k$ by the whole set $R_k$ when
computing $\mu_{ik}$. Indeed, even though we have changed the model by introducing a
spatial dependency on $f_i$, we still assume that the pixels $f_i(r)$ in the same homogeneous
region must have the same mean. Replacing $R^1_k$ by $R_k$ thus increases the number of
pixels used to compute $\mu_{ik}$.
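• $\sigma^2_{ik}|f_i, z, q, m_{ik}, \alpha_{i0}, \beta_{i0} \sim \mathcal{IG}(\alpha_{ik}, \beta_{ik})$, with

$$\alpha_{ik} = \alpha_{i0} + \frac{n_k}{2}, \qquad \beta_{ik} = \beta_{i0} + \frac{1}{2} \sum_{r \in R^1_k} (f_i(r) - m_{ik})^2 + 2 \sum_{r \in R^0_k} \left(f_i(r) - \frac{1}{4}\sum_{s \in \mathcal{V}(r)} f_i(s)\right)^2 \qquad (17)$$

• $\sigma^2_{\epsilon_i}|f_i, g_i \sim \mathcal{IG}(\nu_i, \Sigma_i)$, with

$$\nu_i = \frac{S}{2} + \alpha_{\epsilon_i 0}, \qquad \Sigma_i = \frac{1}{2}\|g_i - f_i\|^2 + \beta_{\epsilon_i 0}$$

where $S$ is the total number of pixels.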
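The factor 2 in (17) comes from the interior pixels having prior variance $\sigma^2_{ik}/4$: their contribution to the exponent is $(f_i(r) - \bar f)^2 / (2\sigma^2_{ik}/4) = 2(f_i(r) - \bar f)^2/\sigma^2_{ik}$, where $\bar f$ is the neighbor mean. A sketch computing $(\alpha_{ik}, \beta_{ik})$, assuming edge replication at the image border (not specified in the text), 0-based labels, and hypothetical function names:

```python
import numpy as np


def neighbour_mean(f_i):
    """(1/4) * sum of the 4 neighbours, replicating border pixels (assumption)."""
    p = np.pad(f_i, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0


def sigma2_ik_ig_params(f_i, z, q, k, m_ik, a0, b0):
    """Parameters (alpha_ik, beta_ik) of the inverse-gamma update in eq. (17)."""
    in_k = z == k
    r1 = in_k & (q == 1)                          # R^1_k: class-k contour pixels
    r0 = in_k & (q == 0)                          # R^0_k: class-k interior pixels
    nb = neighbour_mean(f_i)
    alpha_ik = a0 + in_k.sum() / 2                # every pixel of R_k counts once
    beta_ik = (b0 + 0.5 * np.sum((f_i[r1] - m_ik) ** 2)
               + 2.0 * np.sum((f_i[r0] - nb[r0]) ** 2))
    return alpha_ik, beta_ik
```

A draw of $\sigma^2_{ik}$ then follows as `1.0 / rng.gamma(alpha_ik, 1.0 / beta_ik)` with a NumPy `Generator`.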
5.3 New Gibbs algorithm
The difference from the algorithm of section 4 lies in the decomposition of the set of
variables. The Gibbs algorithm we implemented is then:
Parallel Gibbs sampling: repeat until convergence
1. simulate $z_W^{(n)} \sim P(z_W|z_B^{(n-1)}, f^{B(n-1)}, g, \theta^{(n-1)}, q^{(n-1)})$
   simulate $f_{iW}^{(n)} \sim p(f_{iW}|f_{iB}^{(n-1)}, g_i, z^{(n-1)}, \theta_i^{(n-1)}, q^{(n-1)})$
2. simulate $z_B^{(n)} \sim P(z_B|z_W^{(n)}, \hat f^{W(n)}, g, \theta^{(n-1)}, q^{(n-1)})$
   simulate $f_{iB}^{(n)} \sim p(f_{iB}|f_{iW}^{(n)}, g_i, z^{(n)}, \theta_i^{(n-1)}, q^{(n-1)})$
3. compute $q^{(n)}$ using $z^{(n)}$
4. simulate $\theta_i^{(n)} \sim p(\theta_i|f_i^{(n)}, z^{(n)}, g_i)$
6 Simulation and results
In this section we present results of our two models in different cases. First we test our
methods on fully simulated data sets to evaluate their performance in the presence of
noise. Then we present results on MRI images to which artificial noise has been added, in
order to compare the two methods. This second test gives us noisy, registered data sets of
the same objects; these images are therefore also used here as test images to compare the
two proposed methods. Finally, we test our algorithm in a real security application using
two X-ray images of the same object: X-ray in transmission and in backscattering. These
data sets are courtesy of American Science and Engineering, Inc., 2003 (www.as-e.com).
In the following we denote by "HMMI" the first method described in this paper, and by
"HMMC" the second method, where we introduced a local spatial correlation on the pixels
of the images $f_i$.
6.1 Simulated data
Here we constructed two normalized (256 × 256) images, denoted $f_1$ and $f_2$, with
individual and common regions (fig. 3-a). We then added independent Gaussian noise
to obtain the noisy images $g_1$ and $g_2$ (fig. 3-b), which we used as data for our proposed
data fusion methods. The performance of these methods is evaluated using the
following measure between two images $u$ and $v$:

$$d(u, v) = \frac{\|u - v\|^2}{\|u\|^2}$$
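For reference, this relative squared error takes one line of NumPy; the function name is hypothetical and the norms are assumed to be taken over all pixels.

```python
import numpy as np


def relative_error(u, v):
    """d(u, v) = ||u - v||^2 / ||u||^2, with norms taken over all pixels."""
    return float(np.sum((u - v) ** 2) / np.sum(u ** 2))
```

Note that $d$ is not symmetric: it is normalized by the energy of its first argument, which is the reference image $f_i$ in the comparisons below. Because the noise is additive, $d(f_i, g_i)$ grows roughly linearly with $\sigma^2_{\epsilon_i}$.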
We then compared the performance of these methods as a function of the noise variance in
the observations. The estimated images are denoted $f_i^{HMMI}$ and $f_i^{HMMC}$ respectively.

Table 1: Comparison of the two methods with noisy data

σ²_εi  d(f1,g1)  d(f1,f1^HMMI)  d(f1,f1^HMMC)  d(f2,g2)  d(f2,f2^HMMI)  d(f2,f2^HMMC)
0      0         0.0000         0.0000         0         0.0000         0.0000
0.001  0.0126    0.0053         0.0014         0.0013    0.0006         0.0001
0.002  0.0258    0.0109         0.0016         0.0027    0.0011         0.0002
0.005  0.0644    0.0279         0.0021         0.0067    0.0029         0.0003
0.01   0.1288    0.0568         0.0058         0.0134    0.0064         0.0011
0.02   0.2575    0.1239         0.0094         0.0268    0.0142         0.0019
0.05   0.6438    0.3215         0.0279         0.0670    0.0367         0.0049
0.1    1.2876    0.6805         0.0628         0.1340    0.0766         0.0105

Table 2: Estimation of the noise variance (estimated σ²_ε1 and σ²_ε2 for each method)

σ²_εi  σ²_ε1 (HMMI)  σ²_ε2 (HMMI)  σ²_ε1 (HMMC)  σ²_ε2 (HMMC)
0      0.0003        0.0003        0.0002        0.0002
0.001  0.0007        0.0007        0.0010        0.0010
0.002  0.0011        0.0011        0.0019        0.0018
0.005  0.0020        0.0019        0.0048        0.0046
0.01   0.0040        0.0038        0.0094        0.0088
0.02   0.0068        0.0069        0.0193        0.0180
0.05   0.0155        0.0146        0.0486        0.0453
0.1    0.0310        0.0254        0.0966        0.0937
Results of reconstruction and segmentation: Figures 3 and 4 show some results for
different values of the noise variance. When the observations are noise-free, both
methods give perfect segmentation results. However, in this case the first method
converges faster than the second.
In the presence of noise, the degradation of the segmentation appears in the smallest
regions. Indeed, the segmentation results show the loss of small regions, especially with
the first method. This is because, even though the Gibbs algorithm asymptotically ensures
convergence to the global minimum, in practice it can remain trapped in a local minimum.
This happens when the observations are too noisy. We can also remark that
the first method does not significantly increase the quality of the reconstructed images.
Figure 3: Results of data fusion with high SNR ($\sigma^2_{\epsilon_i} = 0.001$): (a) original images $f_1$ and $f_2$ and (b) their corresponding observations $g_1$ and $g_2$. (c) Results of data fusion (7 labels) with the first model (from right to left: $f_1^{HMMI}$, $f_2^{HMMI}$ and $z^{HMMI}$). (d) Results of data fusion (7 labels) with the second model (from right to left: $f_1^{HMMC}$, $f_2^{HMMC}$ and $z^{HMMC}$).
The first algorithm appears to reproduce the data exactly, without removing the noise.
The second method, however, gives denoised images and a better segmentation. We can also note
that the individual regions of the first data set $g_1$ (resp. $g_2$) appear in the reconstructed image
$f_2$ (resp. $f_1$). This is due to the model we have chosen, in which we consider a unique
segmentation and reconstruct both images from it, which amounts to assuming that the two
images consist of the same objects.
Noise estimation: Table 1 summarizes the performance of the second method. Its denoising
step reduces the measure $d$ by a factor of about 20 relative to the initial measure between
the real data and the noisy observations, whereas the first method reduces this measure only
by a factor of about 2. The gain in performance of the second method is therefore significant
when the observations are noisy. Table 2 also shows that the noise variance is better estimated
by the second algorithm, which confirms the better quality of its denoising step. However,
even though the estimated images are better, the common segmentation is essentially
unchanged compared with the first method.

Figure 4: Results of data fusion with low SNR ($\sigma^2_{\epsilon_i} = 0.01$): (a) original images $f_1$ and $f_2$ and (b) their corresponding observations $g_1$ and $g_2$. (c) Results of data fusion with the first model (from right to left: $f_1^{HMMI}$, $f_2^{HMMI}$ and $z^{HMMI}$). (d) Results of data fusion with the second model (from right to left: $f_1^{HMMC}$, $f_2^{HMMC}$ and $z^{HMMC}$).
6.2 Medical imaging
Figure 5: Data fusion of medical images: (a) original data $f_1$, $f_2$, $f_3$. (b) Noisy observations $g_1$, $g_2$, $g_3$ with a Gaussian noise of variance 0.005. (c) Estimation with the first method (7 labels). (d) Estimation with the second method (7 labels).
Here we illustrate an example with noisy MRI images: T1-weighted, T2-weighted and
T1-weighted with contrast agent slices of an MR brain image, which are (289 × 236) images.
We used these images as the test images $f_1$, $f_2$ and $f_3$, and added i.i.d. Gaussian
noise to them to obtain the simulated observations $g_1$, $g_2$ and $g_3$, according to the
observation model (3).
Figure 5 shows the reconstruction and joint segmentation results of our algorithms. It
confirms the remarks made on the simulated data: the reconstructions are much better with
the second method. However, the segmentation results are almost the same, except in the
central part, where some isolated pixels are misclassified by the first method. In the case of
a high signal-to-noise ratio, it is therefore not necessary to use the second method. Finally,
we note that we did not introduce any physiological information on particular tissues or on
the characteristics of MRI images; these data sets are only used as test images. Better
results can be expected from a more detailed study of the particularities of MRI images.
6.3 Imaging in security systems
Here we test our algorithms on two images of a suitcase (transmission and backscattering X-ray data), each of size 141 × 198 pixels. We compared our two fusion methods with some classical algorithms provided by a Matlab fusion toolbox ([21]): Average, Principal Component Analysis (PCA), Laplacian pyramid and Shift Invariant Discrete Wavelet Transform (SIDWT).
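To illustrate what the two simplest baselines compute, here is a minimal sketch of average fusion and of a common formulation of PCA fusion, in which the weights come from the principal eigenvector of the 2 × 2 covariance of the pixel values. This is not necessarily identical to the toolbox of [21]; taking absolute values of the eigenvector is our simplification to keep the weights non-negative, and the function names are hypothetical.

```python
import numpy as np

def fuse_average(g1, g2):
    # Pixel-wise mean of the two registered images.
    return 0.5 * (g1 + g2)

def fuse_pca(g1, g2):
    # Weights from the principal eigenvector of the 2 x 2 pixel covariance.
    data = np.stack([g1.ravel(), g2.ravel()])   # shape (2, n_pixels)
    cov = np.cov(data)                          # 2 x 2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    v = np.abs(eigvecs[:, -1])                  # principal component (sign-free)
    w = v / v.sum()                             # normalise weights to sum to 1
    return w[0] * g1 + w[1] * g2, w

# Stand-ins for the (141 x 198) transmission and backscattering images.
rng = np.random.default_rng(0)
g1 = rng.random((141, 198))
g2 = 0.5 * g1 + 0.5 * rng.random((141, 198))
fused_avg = fuse_average(g1, g2)
fused_pca, weights = fuse_pca(g1, g2)
print(weights)
```

Both baselines combine intensities pixel by pixel, which is consistent with the observation that a weak-contrast object can be washed out by strong details in the other image.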
Figure 6 shows the results of the different fusion methods. None of the methods of the Matlab fusion toolbox detects the right gun, because its contrast is too weak compared with the details present at the same location in the backscattering X-ray image. Because our algorithms produce a segmentation, the right gun appears clearly after convergence. In particular, our first method detects both guns well. However, better results can be expected on these images if a texture classification is implemented. This could be achieved by extending our second model so that the neighborhood of a pixel fi(r) is taken into account in a way other than through its mean. This remains an open problem.

Figure 6: Data fusion in X-ray security system images: (a) original data g1 and g2. Fusion results of different methods: (b) Average, (c) PCA transform, (d) Laplacian pyramid, (e) SIDWT, (f) our first method, HMMI (8 labels), (g) our second method, HMMC (8 labels).
7 Estimation of K
In the proposed model we need a fairly precise idea of the number K of labels. For the simulated data, for instance, we chose K = 7 although the ideal segmentation consists of only 6 regions, and thus 6 labels. Both algorithms eliminated one label during the iterations and still produced a good segmentation. In the simulations of Section 6.3, we also obtained quite similar results with K ∈ {8, 9, 10}. However, our algorithms require a fixed value of this parameter, whereas a fully unsupervised joint segmentation must estimate K. Many works address the estimation of K. In [17], K can vary along the iterations through a Reversible Jump MCMC method, but this solution is too expensive for real applications. A more tractable solution ([22, 23]) consists in estimating this parameter in a preprocessing step, using prior information or a bounded range of values for K. In particular, the authors of [22] propose to use the minimum description length (MDL) as a function of K for a Finite Normal Mixture (FNM) model. In our case, the MDL function can be written as:
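$$\mathrm{MDL}(K) = -\log L(\hat\Theta) + \tfrac{1}{2}\,K_a \log|S|,$$
where $\hat\Theta$ is the ML estimate of $\Theta = \{\{P(z(r)=k)\}_{r\in S,\,k=1,\dots,K},\ \boldsymbol\theta\}$, $L(\hat\Theta)$ is the likelihood of the model parameters, $K_a$ is the number of degrees of freedom of the model and $|S|$ is the number of pixels.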
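To make the selection rule concrete, here is a minimal sketch of the MDL criterion and of the choice of K as its minimiser. The helper names `mdl` and `select_k` are hypothetical, and we read S in the penalty term as the number of pixels. As a check, it is fed the MDL values that Table 3 reports for the noise-free simulated data.

```python
import math

def mdl(log_likelihood, n_free_params, n_pixels):
    """MDL(K) = -log L(Theta_hat) + 0.5 * K_a * log(number of pixels)."""
    return -log_likelihood + 0.5 * n_free_params * math.log(n_pixels)

def select_k(mdl_by_k):
    """Return the number of labels K with the smallest MDL value."""
    return min(mdl_by_k, key=mdl_by_k.get)

# MDL values reported in Table 3 for the noise-free simulated data.
table3_no_noise = {3: -6037, 4: -8305, 5: -9236, 6: -10026,
                   7: -10024, 8: -10021, 9: -10019}
print(select_k(table3_no_noise))  # 6
```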
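Considering the HMM model and the assumptions of Section 3, we have:
$$
\begin{aligned}
p(g_1(r),\dots,g_M(r)\mid\Theta) &= \sum_{k=1}^{K} P(z(r)=k)\; p(g_1(r),\dots,g_M(r)\mid z(r)=k,\boldsymbol\theta)\\
&= \sum_{k=1}^{K} P(z(r)=k) \prod_{i=1}^{M} p(g_i(r)\mid z(r)=k,\boldsymbol\theta_i).
\end{aligned}
$$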
Because we chose a PMRF model for the labels, computing P(z(r) = k) is practically intractable. For this preprocessing step we therefore approximate the labels as independent, $P(z(r)=k)=\pi_k$, as in the FNM model, which gives:
$$p(g_1(r),\dots,g_M(r)\mid\Theta) = \sum_{k=1}^{K} \pi_k \prod_{i=1}^{M} p(g_i(r)\mid z(r)=k,\boldsymbol\theta_i).$$
The ML estimate $\hat\Theta = \{\pi_1,\dots,\pi_K,\boldsymbol\theta\}$ is computed by a hybrid method combining the Expectation-Maximization (EM) and Classification-Maximization (CM) algorithms ([22]).
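The sketch below is a plain EM algorithm (not the hybrid EM/CM scheme of [22]) for the FNM likelihood above. It assumes Gaussian class-conditional densities p(g_i(r) | z(r)=k, θ_i) = N(m_ik, σ²_ik) and counts K_a = (K − 1) + 2MK free parameters; these choices, the deterministic initialisation and the function names (`fit_fnm_em`, `n_free_params`) are our assumptions, not the authors' implementation. It returns the maximised log-likelihood that enters MDL(K).

```python
import numpy as np

def log_joint(g, pi, means, variances):
    """log(pi_k) + sum_i log N(g_i(r); m_ik, s2_ik) for every pixel r and label k."""
    diff2 = (g[:, None, :] - means[None, :, :]) ** 2           # (N, K, M)
    log_gauss = -0.5 * (np.log(2.0 * np.pi * variances)[None, :, :]
                        + diff2 / variances[None, :, :])
    return np.log(pi)[None, :] + log_gauss.sum(axis=2)          # (N, K)

def log_sum_exp_rows(a):
    """Numerically stable log(sum(exp(a), axis=1)), keeping a column shape."""
    m = a.max(axis=1, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))

def fit_fnm_em(g, K, n_iter=200, var_floor=1e-6):
    """Plain EM for an FNM whose classes factorise over the M channels.

    g has shape (N, M): row r holds g_1(r), ..., g_M(r).
    Returns (pi, means, variances, log_likelihood).
    """
    N, M = g.shape
    # Initialisation (our choice): pixels at evenly spaced quantiles of the
    # channel sum serve as initial class means.
    order = np.argsort(g.sum(axis=1))
    means = g[order[((np.arange(K) + 0.5) * N / K).astype(int)]].astype(float)
    variances = np.tile(g.var(axis=0), (K, 1)) + var_floor
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        lj = log_joint(g, pi, means, variances)                 # E-step
        resp = np.exp(lj - log_sum_exp_rows(lj))                # P(z(r)=k | g(r))
        nk = resp.sum(axis=0) + 1e-12                           # M-step
        pi = nk / nk.sum()
        means = resp.T @ g / nk[:, None]
        variances = np.maximum(resp.T @ g**2 / nk[:, None] - means**2, var_floor)
    log_likelihood = float(log_sum_exp_rows(log_joint(g, pi, means, variances)).sum())
    return pi, means, variances, log_likelihood

def n_free_params(K, M):
    # Hypothetical count: K-1 weights, plus a mean and a variance per class and channel.
    return (K - 1) + 2 * M * K

# Synthetic two-channel data with three well-separated classes.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
g = np.concatenate([c + 0.5 * rng.standard_normal((1000, 2)) for c in centres])
N, M = g.shape
scores = {}
for K in range(1, 6):
    ll = fit_fnm_em(g, K)[3]
    scores[K] = -ll + 0.5 * n_free_params(K, M) * np.log(N)
print(min(scores, key=scores.get))
```

The factorised class densities mirror the product over i in the FNM likelihood; a full covariance per class would be a different model with more free parameters.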
K                      3      4      5       6       7       8       9      K_MDL
MDL (without noise)   -6037  -8305  -9236  -10026  -10024  -10021  -10019   6
MDL (with noise)      -1225  -1213  -1199  -1239   -1227   -1214   -1202    6

Table 3: Estimation of K for the simulated data
K          5       6       7       8       9       10      K_MDL
MDL(g1)   -13359  -13587  -13648  -13715  -13717  -13695   9
MDL(g2)   -16149  -16599  -16816  -17026  -17134  -17175   10

Table 4: Estimation of K for the suitcase images taken independently
[Plot: MDL (×10^4) versus K, for K from 5 to 30]
Figure 7: Estimation of joint K for the suitcase images
Table 3 shows the results of the estimation of K for the simulated data. The method gives good estimates for both noise-free and noisy data: in both cases the MDL is minimal for K = 6, the number of regions of the ideal segmentation. For inspection imaging, this method also gives a reasonable estimate of K when each image is taken independently (Table 4). However, for the joint segmentation problem the MDL objective function does not reach a minimum (Figure 7) and appears roughly constant between 20 and 30 labels. This is because both images contain many small regions with different grey-level values. Approximating the likelihood of the HMM model by that of the FNM model therefore appears too rough in this case. Our future work will thus address a criterion that takes the HMM model into account.
Acknowledgement
The authors would like to thank the referees for their useful remarks and suggestions
which improved the content of this paper.
8 Conclusion
We proposed a Bayesian approach for the fusion of images using a hierarchical Markov model, which yields a joint segmentation of these images as the fusion result. The labels are modeled by a Potts MRF. We then proposed two particular models for the pixels of the images in each segment: the first model assumes these pixels to be independent, while the second introduces a local spatial dependency between them. We developed appropriate Gibbs samplers for the two models and illustrated on simulated data sets how joint segmentation and reconstruction can be obtained. We also showed how denoising and fusion can be performed simultaneously with an MCMC algorithm, and that our approach gives better fusion results than classical methods on X-ray inspection images. However, we currently assume that the sensor images are registered. We think that our model is a promising basis for introducing a registration and blur operator Hi, and thus for performing common segmentation, deblurring and registration simultaneously. This remains an open problem and the subject of our future work.
References
[1] S. Gautier, G. Le Besnerais, A. Mohammad-Djafari, and B. Lavayssiere, “Data
fusion in the field of non destructive testing,” in Maximum Entropy and Bayesian
Methods. Kluwer Academic Publ., Santa Fe, NM, K. Hanson edition, 1995.
[2] T. Bass, “Intrusion detection systems and multisensor data fusion,” in Comm. of the
ACM, vol. 43, April 2000, pp. 99–105.
[3] G. Matsopoulos, S. Marshall, and J. Brunt, “Multiresolution morphological fusion
of MR and CT images of the human brain,” in IEEE Proceedings on Vision, Image
and Signal Processing, vol.141 Issue : 3, Seattle, USA, 1994, pp. 137–142.
[4] B. Johnston and B. Mackiewitch, “Segmentation of multiple sclerosis lesions in
intensity corrected multispectral MRI,” IEEE Trans. on medical imaging, pp. 154–
169, April 1996.
[5] Chuin-Mu Wang and Clayton Chi-Chang Chen et al., “Detection of spectral sig-
natures in multispectral MR images for classification,” IEEE Trans. on medical
imaging, pp. 50–61, January 2003.
[6] E. Reddick and J.O. Glass et al., “Automated segmentation and classification of
multispectral Magnetic Resonance Images of brain using artificial neural networks,”
IEEE Trans. on medical imaging, pp. 911–918, December 1997.
[7] M.N. Ahmed and M. Yamany et al., “A modified fuzzy c-means algorithm for bias
field estimation and segmentation of MRI data,” IEEE Trans. on medical imaging,
pp. 193–199, March 2002.
[8] R.K. Sharma, Probabilistic model-based multisensor image fusion, Ph.D. thesis,
Graduate Institute of Science and Technology, Oregon, USA, 1999.
[9] Du Yong et al., “Satellite image fusion with multiscale wavelet analysis for marine
applications : preserving spatial information and minimizing artifacts (PSIMA),” J.
Remote Sensing, Vol.29, No 1, pp. 14–23, 2003.
[10] Ramesh Chaveli et al., “Fusion performance measures and a lifting wavelet trans-
form based algorithm for image fusion,” in Information Fusion, Proc. of the 5th int.
conf. on, July 2002, pp. 317–320.
[11] P.S. Chavez and A.Y. Kwarteng, “Extracting spectral contrast in Landsat thermal
mapper image data using Principal Component Analysis,” in PE and RS(55), 1989,
pp. 339–348.
[12] G. Simone and F.C. Morabito, “ICA-NN based data fusion approach in ECT signal
restoration,” in Neural Networks, Proceeding of the IEEE-INNS-ENNS International
Joint Conference on, Vol 5, July 2000, pp. 59–64.
[13] C.H. Chen and Z. Xiaouhui, “On the roles of PCA and ICA in data fusion,” in Geo-
science and Remote Sensing Symposium, IEEE International Proceedings IGARSS,
vol 6, July 2000, pp. 2620–2622.
[14] K. Held and E.R. Kops et al., “Markov Random Field segmentation of brain MR
images,” IEEE Trans. on medical imaging, pp. 878–886, December 1997.
[15] L. Aurdal, Analysis of Multi-Image Magnetic Resonance Acquisitions for Segmentation and Quantification of Cerebral Pathologies, Ph.D. thesis, École Nationale Supérieure des Télécommunications, ENST, Paris, France, 1997.
[16] Tu Zhuowen and Zhu Song-Chun, “Image segmentation by data-driven Markov
Chain Monte Carlo,” IEEE Trans. on pattern analysis and machine intelligence, pp.
657–673, May 2002.
[17] Z. Kato, “Bayesian color image segmentation using Reversible Jump Markov Chain
Monte Carlo,” Tech. Rep., ERCIM (European Research Consortium for Informatics
and Mathematics), February 1999.
[18] D. Higdon, Spatial Applications of Markov Chain Monte Carlo for Bayesian Infer-
ence, Ph.D. thesis, University of Washington, 1994.
[19] H. Snoussi and A. Mohammad-Djafari, “Fast joint separation and segmentation of
mixed images,” Journal of Electronic Imaging, vol 13(2), April 2004.
[20] D. Chandler, Introduction to modern statistical mechanics, Oxford university press,
1987.
[21] O. Rockinger and T. Feshner, “Pixel-level image fusion: the case of image sequences,” in Proc. SPIE, vol. 3374, February 1998, pp. 378–398.
[22] Tianhu Lei and Wilfred Sewchand, “Statistical approach to X-ray CT imaging and
its applications in image analysis–part ii : a new stochastic model-based image seg-
mentation technique for X-ray CT image,” IEEE Trans. on medical imaging, vol.
11, no. 1, pp. 62–69, March 1992.
[23] Tianhu Lei and Jayaram K. Udupa, “Performance evaluation of finite normal mixture model-based image segmentation techniques,” IEEE Trans. on image processing, vol. 12, no. 10, pp. 1153–1169, October 2003.
[24] F. Samadzadegan, “Fusion techniques in remote sensing,” in Com. IV Joint work-
shop on challenges in geospatial analysis integration and visualisation II, Stuttgart,
Germany, September 2003.
[25] G. Gindi, M. Lee, A. Rangarajan, and I. George Zubal, “Bayesian reconstruction
of functional images using anatomical information as priors,” IEEE Transaction on
medical imaging, vol. 12, no. 4, pp. 670–680, 1993.
[26] T. Hebert and R. Leahy, “A generalized EM algorithm for 3-D Bayesian reconstruction from Poisson data using Gibbs priors,” IEEE Transaction on medical imaging, vol. 8, no. 2, pp. 194–202, June 1989.
[27] S. Gautier, J. Idier, A. Mohammad-Djafari, and B. Lavayssiere, “X-ray and ultra-
sound data fusion,” in Proceeding of the International Conference on Image Pro-
cessing, Chicago, USA, October 1998, pp. 366–369.
[28] C. Robert, Méthodes de Monte Carlo par Chaînes de Markov, Economica, Paris, France, 1996.
List of Figures
1 Examples of images for data fusion and joint segmentation. a) T1-weighted, T2-weighted and T1-weighted with contrast agent transversal slices of a 3D brain MR image. b) Two observations from transmission and backscattering X-rays in security systems (with the permission of American Science and Engineering, Inc., 2003)
2 Chess board decomposition of the labels z
3 Results of data fusion with high SNR ($\sigma^2_{\epsilon_i} = 0.001$): (a) original images f1 and f2 and (b) their corresponding observations g1 and g2. (c) results of data fusion (7 labels) with the first model (from right to left: $\hat f_1^{HMMI}$, $\hat f_2^{HMMI}$ and $\hat z^{HMMI}$). (d) results of data fusion (7 labels) with the second model (from right to left: $\hat f_1^{HMMC}$, $\hat f_2^{HMMC}$ and $\hat z^{HMMC}$).
4 Results of data fusion with low SNR ($\sigma^2_{\epsilon_i} = 0.01$): (a) original images f1 and f2 and (b) their corresponding observations g1 and g2. (c) results of data fusion with the first model (from right to left: $\hat f_1^{HMMI}$, $\hat f_2^{HMMI}$ and $\hat z^{HMMI}$). (d) results of data fusion with the second model (from right to left: $\hat f_1^{HMMC}$, $\hat f_2^{HMMC}$ and $\hat z^{HMMC}$).
5 Data fusion of medical images: (a) original data. (b) noisy observations with a Gaussian noise of variance 0.005. (c) Estimation with the first method (7 labels). (d) Estimation with the second method (7 labels).
6 Data fusion in X-ray security system images: (a) original data. Fusion results of different methods: (b) Average, (c) PCA transform, (d) Laplacian pyramid, (e) SIDWT, (f) our first method (8 labels), (g) our second method (8 labels).
7 Estimation of joint K for the suitcase images
List of Tables
1 Comparison of the two methods with noisy data
2 Estimation of the noise variance
3 Estimation of K for the simulated data
4 Estimation of K for the suitcase images taken independently