
Image Segmentation and Classification Based on a 2D Distributed Hidden Markov Model

Xiang Ma, Dan Schonfeld and Ashfaq Khokhar

Department of Electrical and Computer Engineering, University of Illinois at Chicago,

851 South Morgan Street, Chicago, IL, U.S.A

ABSTRACT

In this paper, we propose a two-dimensional distributed hidden Markov model (2D-DHMM), in which the state transition probability may depend on any state as long as causality is preserved. The proposed 2D-DHMM model is the result of a novel solution to a more general non-causal two-dimensional hidden Markov model (2D-HMM) that we propose. Our proposed models can capture, for example, dependency among diagonal states, which can be critical in many image processing applications such as image segmentation. A new set of basic image patterns is designed to enrich the variability of states, which in turn largely improves the accuracy of state estimation and the segmentation performance. We provide three algorithms for the training and classification of our proposed model. A new Expectation-Maximization (EM) algorithm suitable for estimation of the new model is derived, in which a novel General Forward-Backward (GFB) algorithm is proposed for recursive estimation of the model parameters. A new conditionally independent subset-state sequence structure decomposition of state sequences is proposed for the 2D Viterbi algorithm. Application to aerial image segmentation shows the superiority of our model compared to existing models.

Keywords: Image classification, Hidden Markov models, Image segmentation.

1. INTRODUCTION

Hidden Markov Models (HMMs) have received tremendous attention in recent years due to their wide applicability in diverse areas such as speech recognition,1 gesture and motion trajectory recognition,2,3 image classification4 and retrieval,5 musical score following,6 genome data analysis,7 etc. Most of the previous research has focused on the classical one-dimensional HMM developed in the 1960s by Baum et al.,8 where the states of the system form a single one-dimensional Markov chain. However, the one-dimensional structure of this model limits its applicability to more complex data elements such as images and videos.

Early efforts to extend 1D-HMMs to higher dimensions were based on pseudo 2D-HMMs.9,10 The model is called "pseudo 2D" in the sense that it is not a fully connected 2D-HMM. The basic assumption is that there exists a set of "superstates" that are Markovian and that within each superstate there is a set of simple Markovian states. To illustrate this model for higher-dimensional systems, let us consider a two-dimensional image. The transition between superstates is modeled as a first-order Markov chain and each superstate is used to represent an entire row of the image; a simple Markov chain is then used to generate observations in the column, as depicted in Fig. 1(a). Thus, superstates relate to rows while simple states relate to columns of the image. Later efforts to represent 2D data using 1D HMMs were based on coupled HMMs (CHMMs).11,12 In this framework, each state of the 1D HMM is used as a meta-state to represent a collection of states, as depicted in Fig. 1(b). For example, an image representation based on CHMMs would rely on a 1D HMM where each state represents an entire column of the image. In certain applications, these models perform better than the classical 1D-HMM.9 However, the performance of pseudo 2D-HMMs and CHMMs remains limited since these models capture only part of the two-dimensional hidden state information.
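To make the pseudo 2D-HMM structure concrete, the following minimal sketch samples a hidden state grid from such a model: a first-order Markov chain over row superstates and, within each row, a simple Markov chain over columns. All parameter values and shapes here are hypothetical illustrations, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pseudo 2D-HMM: 2 superstates (one per row), 3 simple states.
super_init = np.array([0.5, 0.5])            # initial superstate distribution
super_trans = np.array([[0.9, 0.1],          # superstate transitions (row to row)
                        [0.2, 0.8]])
state_init = np.array([[0.7, 0.2, 0.1],      # per-superstate initial simple states
                       [0.1, 0.3, 0.6]])
state_trans = np.array([[[0.8, 0.1, 0.1],    # simple-state transitions, superstate 0
                         [0.2, 0.6, 0.2],
                         [0.3, 0.3, 0.4]],
                        [[0.4, 0.3, 0.3],    # simple-state transitions, superstate 1
                         [0.1, 0.8, 0.1],
                         [0.2, 0.2, 0.6]]])

def sample_states(n_rows, n_cols):
    """Sample the hidden state grid of a pseudo 2D-HMM row by row."""
    grid = np.zeros((n_rows, n_cols), dtype=int)
    s = rng.choice(2, p=super_init)              # superstate of the first row
    for i in range(n_rows):
        if i > 0:
            s = rng.choice(2, p=super_trans[s])  # Markov chain over superstates
        grid[i, 0] = rng.choice(3, p=state_init[s])
        for j in range(1, n_cols):               # simple chain within the row
            grid[i, j] = rng.choice(3, p=state_trans[s, grid[i, j - 1]])
    return grid

print(sample_states(4, 6))
```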

Genuine efforts to develop true 2D-HMM models were presented by Devijver.13,14 In this model, Markov meshes, in particular second- and third-order Markov meshes, were used to characterize the state process along with hidden observations to represent images, thus proposing a true 2D-HMM model. Although this general 2D-HMM model is very powerful, analytic solutions for the training and classification algorithms needed to determine the maximum a posteriori classification were not provided.


Figure 1. Various two-dimensional hidden Markov models: (a) pseudo 2D-HMM; (b) coupled HMM (CHMM); (c) causal 2D-HMM4 with two nearest neighbors in the vertical and horizontal directions; (d) proposed general 2D-HMM.

Nonetheless, suboptimal classification algorithms have been proposed for 2D-HMMs, namely, the deterministic relaxation algorithm.13 The use of suboptimal algorithms in various applications served to demonstrate the importance of the information embedded in two-dimensional models. For example, Levin and Pieraccini15 developed an algorithm for character recognition based on a 2D-HMM, while Park and Miller16 constructed a 2D-HMM-based image decoding system over noisy channels.

The first analytic solution to true two-dimensional HMMs was presented by Li, Najmi and Gray.4 They proposed a causal two-dimensional HMM and presented its application to image classification. In this model, the state transition probability for each node is conditioned on the states of the nearest neighboring nodes in the horizontal and vertical directions, as depicted in Fig. 1(c). The limitation of this approach is that the state dependence of a specific node may arise from any direction and from any of its neighbors. Thus, the analytic solution to the two-dimensional model presented in4 will only capture partial information. In particular, the training and classification algorithms presented in4 rely on the causality of the model. Hence, a direct extension of these algorithms to general 2D-HMMs, which can represent state dependencies from neighbors in all directions, is not possible since such a model is inherently non-causal.

In this paper, we propose a novel two-dimensional distributed hidden Markov model (2D-DHMM) framework. We first provide a solution for non-causal, two-dimensional HMMs by distributing the non-causal model into multiple distributed causal HMMs. We approximate the simultaneous solution of multiple distributed HMMs on a sequential processor by an alternate updating scheme; one possible alternate updating scheme is depicted in Fig. 2, where the numbers {1, 2, 3, 4, ...} denote the order in which the model parameters are updated. Subsequently, we extend the training and classification algorithms presented in4 to a general causal model. A new Expectation-Maximization (EM) algorithm for estimation of the new model is derived, in which a novel General Forward-Backward (GFB) algorithm is proposed for recursive estimation of the model parameters. A new conditionally independent subset-state sequence structure decomposition of state sequences is proposed for the 2D Viterbi algorithm. The new model can be applied to many problems in pattern analysis and classification.

2. TWO-DIMENSIONAL DISTRIBUTED HIDDEN MARKOV MODEL

In traditional block-based image segmentation algorithms, feature vectors are generated for each image block, and segmentation decisions are made independently for each block based on its feature information. The performance of such algorithms is limited since context information between blocks is lost. J. Li et al.4 proposed a two-dimensional hidden Markov model for image segmentation.


Figure 2. Sequential alternate updating scheme of multiple distributed HMMs.

In their model, the state transition probability for each block is conditioned on the states of the nearest neighboring blocks in the horizontal and vertical directions, as depicted in Fig. 1(c). However, the context information that a block depends on may arise from any direction and from any of its neighbors, as depicted in Fig. 3(a). Thus, the existing two-dimensional model captures only partial context information. Generalizing the hidden Markov model (HMM) framework to represent state dependencies from all directions and all neighbors makes the model non-causal. A non-causal hidden Markov model with arbitrary directions of state transitions (or state dependencies) is capable of characterizing the intrinsic state transition structure and behavior of complex systems involving multi-dimensional system states. For example, for the analysis of multiple object motion trajectories, a non-causal model is needed that captures the representation of the motion trajectories of multiple objects while: (i) not isolating the motion trajectories to individual objects and thus losing their interaction; and (ii) avoiding costly semantic analysis that would perform classification based on heuristics rather than the inherent probabilistic model used for multiple trajectory representation. This goal can only be achieved by providing an algorithm based on a non-causal HMM.

The enormous challenge confronting us is that no solution has yet been proposed for general non-causal multi-dimensional HMM models. Even in the simplest case, where the dimensionality is reduced to two, the task remains extremely hard due to the non-causality. Since almost all existing methods rely on the causality assumption, and there is currently no analytic solution to the non-causal problem, one has to find ways to break the non-causality.

We propose a novel solution to the arbitrary non-causal two-dimensional hidden Markov model: we distribute it into multiple causal distributed hidden Markov models and process them simultaneously. For simplicity, we discuss in detail the novel solution to arbitrary non-causal two-dimensional hidden Markov models; our scheme can be easily extended to the higher-dimensional case.

Consider an arbitrary two-dimensional hidden Markov model with $N^2$ state nodes on the two-dimensional state transition diagram. If every dimension of the model is non-causal, we could solve the model by allocating $N^2$ processors, one per node: if the $N^2$ processors could be perfectly synchronized and deadlocks of concurrent state dependencies could be resolved, we could estimate the parameters of the non-causal model by setting all $N^2$ processors to work simultaneously in perfect synchrony. However, this is usually impractical in reality. We therefore propose to distribute the non-causal model into $N^2$ distributed causal models by focusing on the state dependencies of each node one at a time, while ignoring the other nodes. Similarly, for an arbitrary $M$-dimensional hidden Markov model, we can distribute the non-causal model into $N^M$ distributed causal HMMs in the same manner.
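The following is a rough sketch of the resulting procedure on a sequential processor, under stated assumptions: `em_step` is a hypothetical placeholder for one re-estimation step of a single distributed causal HMM, and the models are visited in the fixed cyclic order of Fig. 2.

```python
import numpy as np

def em_step(params, observations):
    """Hypothetical placeholder for one re-estimation step of a single
    distributed causal HMM; here it simply returns the parameters unchanged."""
    return params

def alternate_update(models, observations, n_sweeps=10):
    """Approximate the simultaneous solution of the distributed causal HMMs
    on a sequential processor by updating them one at a time, in a fixed
    cyclic order (1, 2, 3, 4, ... as in Fig. 2)."""
    for _ in range(n_sweeps):
        for idx in range(len(models)):           # one update per model per sweep
            models[idx] = em_step(models[idx], observations)
    return models

# Usage: N distributed causal models (one causal dimension assumed).
N = 4
models = [{"A": np.full((2, 2), 0.5)} for _ in range(N)]
models = alternate_update(models, observations=None)
```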


Figure 3. Comparison of state transition diagrams: (a) general non-causal 2D-HMM; (b) causal 2D-HMM; (c) special case of causal 2D-HMM. (Dashed square: image block; dot: state; arrow: direction of state transition; shaded blocks: neighboring blocks of block B(i, j).)

Moreover, if we assume that one dimension of the non-causal multi-dimensional model is causal, we can dramatically reduce the computational complexity as well as the resources needed for estimation of the model parameters. For example, for the two-dimensional non-causal model depicted in Fig. 4(b), if we assume that one dimension, e.g., the horizontal dimension, is causal, we only need $N$ parallel processors instead of the $N^2$ originally needed. Furthermore, in the distribution phase, there are only $N$ distributed causal hidden Markov models instead of $N^2$. Generally speaking, for an arbitrary $M$-dimensional non-causal hidden Markov model with $P$ ($\le M - 1$) causal dimensions, we can solve the non-causal model by distributing it into $N^{M-P}$ distributed causal hidden Markov models and processing them in parallel. After this distribution, the general problem of the non-causal model is transformed into problems of causal models, which are much easier to solve.

Figure 4. Comparison of state transition diagrams: (a) two-dimensional hidden Markov model in4 and (b) proposed non-causal two-dimensional hidden Markov model; (c), (d) proposed distributed causal two-dimensional hidden Markov models. (Dashed square: image block; dot: state; arrow: direction of state transition.)

Suppose the original image is divided into image blocks. For each image block $(i, j)$, $i \in \{1, 2, \dots, I\}$, $j \in \{1, 2, \dots, J\}$, where $I$ and $J$ are the numbers of row and column blocks in the original image, the feature vector is $o(i, j)$, the corresponding hidden state is $s(i, j)$, and the class of the block is $c(i, j)$. The two basic assumptions of our 2D-DHMM model are: (i) the transition probability of a state depends on its adjacent neighboring states in the vertical, horizontal, and diagonal directions; and (ii) the feature vector of each image block follows a Gaussian mixture distribution given its corresponding state, and is independent of the other feature vectors and their corresponding states.
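A minimal sketch of how these two assumptions translate into a parameterization, with purely illustrative shapes and a single Gaussian component standing in for the assumed Gaussian mixture:

```python
import numpy as np

M, D = 4, 9                        # illustrative: M states, D-dimensional features

# Assumption (i): a(m, n, k, l) = P(s(i,j)=l | vertical=m, diagonal=n, horizontal=k),
# stored as a table indexed by the three neighboring states.
A = np.full((M, M, M, M), 1.0 / M)

# Assumption (ii): Gaussian emission per state (one mixture component for brevity).
mu = np.zeros((M, D))              # per-state means
Sigma = np.stack([np.eye(D)] * M)  # per-state covariances

def log_emission(o, m):
    """log p(o | s = m) for a single Gaussian component."""
    d = o - mu[m]
    _, logdet = np.linalg.slogdet(Sigma[m])
    quad = d @ np.linalg.solve(Sigma[m], d)
    return -0.5 * (D * np.log(2 * np.pi) + logdet + quad)

print(log_emission(np.zeros(D), 0))
```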

In previous work,4 image classification decisions are made for each block based on context information from its neighbors; each block is classified as either a man-made region or a natural region, based on its corresponding hidden states. However, image blocks are not necessarily totally man-made or totally natural. In reality, most image blocks are mixtures of man-made and natural regions, no matter how we change the block size.


Figure 5. Proposed 16 basic image block patterns. (White: man-made regions; gray: natural regions.)

Based on this observation, we propose to define 16 basic image block patterns that cover all possible variabilities of image blocks, as depicted in Fig. 5. An image block can be totally man-made, totally natural, or a mixture of man-made and natural regions. Each pattern has several corresponding hidden states, which enriches the variability of possible states within the model and improves the accuracy of state estimation. Choosing more image block patterns may further improve the classification accuracy; however, the computational complexity would grow considerably as a result. Later we will see that the 16 basic image block patterns yield relatively high accuracy with low computational complexity.
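Since Fig. 5 is not reproduced here, the sketch below shows one plausible way to enumerate 16 block patterns, assuming each block is divided into a 2 x 2 grid of sub-regions labeled man-made (1) or natural (0); the actual patterns of Fig. 5 may be arranged differently.

```python
import numpy as np

def block_patterns():
    """Enumerate 16 binary block patterns: every labeling of a 2x2 grid of
    sub-regions as man-made (1) or natural (0). Patterns 0 and 15 are the
    all-natural and all-man-made blocks; the remaining 14 are mixtures."""
    patterns = []
    for code in range(16):
        bits = [(code >> b) & 1 for b in range(4)]
        patterns.append(np.array(bits).reshape(2, 2))
    return patterns

for p in block_patterns()[:3]:   # show the first few patterns
    print(p, end="\n\n")
```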

3. 2D-DHMM TRAINING AND CLASSIFICATION

3.1. Expectation-Maximization (EM) Algorithm

We propose a newly derived Expectation-Maximization (EM) algorithm suitable for the estimation of the parameters of the proposed 2D-DHMM model, which is analogous to, but different from, the EM algorithm for the 1D HMM.17,18 Define the observed feature vector set $O = \{o(i, j),\ i = 1, 2, \dots, I;\ j = 1, 2, \dots, J\}$ and the corresponding hidden state set $S = \{s(i, j),\ i = 1, 2, \dots, I;\ j = 1, 2, \dots, J\}$. The model parameters are defined as a set $\Theta = \{\Pi, A, B\}$, where $\Pi = \{\pi_m\}$ is the set of initial probabilities of states; $A = \{a_{m,n,k,l}\}$ is the set of state transition probabilities, $m, n, k, l \in \{1, 2, \dots, M\}$; and $B$ is the set of probability density functions (PDFs) of the observed feature vectors given their corresponding states.

Define $F^{(p)}_{m,n,k,l}(i, j)$ as the probability that the state corresponding to observation $o(i-1, j)$ is state $m$, the state corresponding to observation $o(i-1, j-1)$ is state $n$, the state corresponding to observation $o(i, j-1)$ is state $k$, and the state corresponding to observation $o(i, j)$ is state $l$, given the observations and model parameters,

$$F^{(p)}_{m,n,k,l}(i, j) = P\big(m = s(i-1, j),\ n = s(i-1, j-1),\ k = s(i, j-1),\ l = s(i, j) \,\big|\, O, \Theta^{(p)}\big), \quad (1)$$

and define $G^{(p)}_m(i, j)$ as the probability that the state corresponding to observation $o(i, j)$ is state $m$:

$$G^{(p)}_m(i, j) = P\big(s(i, j) = m \,\big|\, O, \Theta^{(p)}\big). \quad (2)$$

We can now obtain the iterative updating formulas for the parameters of the proposed model:

$$\pi^{(p+1)}_m = P\big(s(1, 1) = m \,\big|\, O, \Theta^{(p)}\big) = G^{(p)}_m(1, 1), \quad (3)$$

$$a^{(p+1)}_{m,n,k,l} = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} F^{(p)}_{m,n,k,l}(i, j)}{\sum_{l=1}^{M} \sum_{i=1}^{I} \sum_{j=1}^{J} F^{(p)}_{m,n,k,l}(i, j)}, \quad (4)$$

$$\mu^{(p+1)}_m = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} G^{(p)}_m(i, j)\, o(i, j)}{\sum_{i=1}^{I} \sum_{j=1}^{J} G^{(p)}_m(i, j)}, \quad (5)$$

$$\Sigma^{(p+1)}_m = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} G^{(p)}_m(i, j)\, \big(o(i, j) - \mu^{(p+1)}_m\big)\big(o(i, j) - \mu^{(p+1)}_m\big)^T}{\sum_{i=1}^{I} \sum_{j=1}^{J} G^{(p)}_m(i, j)}. \quad (6)$$

In Eqns. (1)-(6), $p$ is the iteration step number. $F^{(p)}_{m,n,k,l}(i, j)$ and $G^{(p)}_m(i, j)$ are unknown in the above formulas; next, we propose a General Forward-Backward (GFB) algorithm for their estimation.
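Assuming the posteriors $F^{(p)}_{m,n,k,l}(i, j)$ and $G^{(p)}_m(i, j)$ have been estimated (by the GFB algorithm below) and stored as arrays `F[i, j, m, n, k, l]` and `G[i, j, m]`, the updates (3)-(6) can be written compactly. A minimal numpy sketch, using a single Gaussian per state in place of the full mixture:

```python
import numpy as np

def m_step(F, G, O):
    """One M-step of the 2D-DHMM EM algorithm, Eqns. (3)-(6).
    F: (I, J, M, M, M, M) joint neighbor-state posteriors F[i,j,m,n,k,l].
    G: (I, J, M) single-state posteriors G[i,j,m].
    O: (I, J, D) observed feature vectors."""
    I, J, M = G.shape
    D = O.shape[-1]

    pi = G[0, 0]                                  # Eq. (3): posterior at block (1,1)

    num = F.sum(axis=(0, 1))                      # sum over i, j -> (M, M, M, M)
    den = num.sum(axis=-1, keepdims=True)         # extra sum over l in Eq. (4)
    A = num / np.maximum(den, 1e-12)              # Eq. (4)

    w = G.reshape(I * J, M)                       # per-block state posteriors
    X = O.reshape(I * J, D)
    wsum = np.maximum(w.sum(axis=0), 1e-12)       # (M,)
    mu = (w.T @ X) / wsum[:, None]                # Eq. (5): weighted means

    Sigma = np.zeros((M, D, D))
    for m in range(M):                            # Eq. (6): weighted covariances
        d = X - mu[m]
        Sigma[m] = (w[:, m, None] * d).T @ d / wsum[m]
    return pi, A, mu, Sigma
```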


Figure 6. (a) State transition diagram of the proposed 2D-DHMM and (b) its decomposed subset-state sequences.

3.2. General Forward-Backward (GFB) Algorithm

The Forward-Backward algorithm was first proposed by Baum et al.8 for the 1D hidden Markov model and later modified by Li et al.4 Here, we generalize the Forward-Backward algorithm so that it can be applied to any HMM; the proposed algorithm is called the General Forward-Backward (GFB) algorithm. The GFB algorithm can be applied to any HMM whose state sequence satisfies the following property: the probability of the all-state sequence $S$ can be decomposed as a product of probabilities of conditionally independent subset-state sequences $U_0, U_1, \dots$, i.e., $P(S) = P(U_0) P(U_1 | U_0) \cdots P(U_i | U_{i-1}) \cdots$, where $U_0, U_1, \dots, U_i, \dots$ are subsets of the all-state sequence of the HMM system, which we call subset-state sequences. Define the observation sequence corresponding to each subset-state sequence $U_i$ as $O_i$. The subset-state sequences for our model are shown in Fig. 6(b). This structure enables us to use the General Forward-Backward (GFB) algorithm to estimate the model parameters.
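Figure 6(b) is not reproduced here; for causal 2D-HMMs a standard choice of conditionally independent subset-state sequences is the anti-diagonals of the block grid, since each anti-diagonal depends only on the previous one. The sketch below enumerates such subsets under that assumption, which may differ in detail from the decomposition of Fig. 6(b).

```python
def diagonal_subsets(I, J):
    """Group block indices (i, j) of an I x J grid into anti-diagonals
    U_0, U_1, ...; blocks with equal i + j form one subset-state sequence."""
    return [[(i, u - i) for i in range(I) if 0 <= u - i < J]
            for u in range(I + J - 1)]

# For a 3 x 4 grid: U_0 = [(0, 0)], U_1 = [(0, 1), (1, 0)], ...
for u, U in enumerate(diagonal_subsets(3, 4)):
    print(f"U_{u}: {U}")
```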

3.2.1. Forward and Backward Probability

Define the forward probability $\alpha_{U_u}(u)$, $u = 1, 2, \dots$, as the probability of observing the observation sequences $O_v$ ($v \le u$) corresponding to the subset-state sequences $U_v$ ($v \le u$) and having the state sequence of the $u$-th product component in the decomposition formula be $U_u$, given the model parameters $\Theta$, i.e., $\alpha_{U_u}(u) = P\{S(u) = U_u,\ O_v,\ v \le u \,|\, \Theta\}$; and define the backward probability $\beta_{U_u}(u)$, $u = 1, 2, \dots$, as the probability of observing the observation sequences $O_v$ ($v > u$) corresponding to the subset-state sequences $U_v$ ($v > u$), given that the state sequence of the $u$-th product component is $U_u$ and the model parameters $\Theta$, i.e., $\beta_{U_u}(u) = P(O_v,\ v > u \,|\, S(u) = U_u, \Theta)$.

The recursive updating formulas for the forward and backward probabilities can be obtained as

$$\alpha_{U_u}(u) = \Big[ \sum_{U_{u-1}} \alpha_{U_{u-1}}(u-1)\, P\{U_u \,|\, U_{u-1}, \Theta\} \Big] P\{O_u \,|\, U_u, \Theta\}, \quad (7)$$

$$\beta_{U_u}(u) = \sum_{U_{u+1}} P(U_{u+1} \,|\, U_u, \Theta)\, P(O_{u+1} \,|\, U_{u+1}, \Theta)\, \beta_{U_{u+1}}(u+1). \quad (8)$$

Then, the estimation formulas for $F_{m,n,k,l}(i, j)$ and $G_m(i, j)$ are:

$$G_m(i, j) = \frac{\alpha_{U_u}(u)\, \beta_{U_u}(u)}{\sum_{u:\, U_u(i, j) = m} \alpha_{U_u}(u)\, \beta_{U_u}(u)}, \quad (9)$$

$$F_{m,n,k,l}(i, j) = \frac{\alpha_{U_{u-1}}(u-1)\, P(U_u \,|\, U_{u-1}, \Theta)\, P(O_u \,|\, U_u, \Theta)\, \beta_{U_u}(u)}{\sum_{U_u} \sum_{U_{u-1}} \big[ \alpha_{U_{u-1}}(u-1)\, P(U_u \,|\, U_{u-1}, \Theta)\, P(O_u \,|\, U_u, \Theta)\, \beta_{U_u}(u) \big]}. \quad (10)$$
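Once each subset-state sequence $U_u$ is treated as a single super-node with a finite set of candidate configurations, Eqns. (7)-(8) reduce to an ordinary chain forward-backward recursion, and the products $\alpha \beta$ appearing in Eqns. (9)-(10) follow directly. A minimal sketch, assuming the transition probabilities $P(U_u | U_{u-1}, \Theta)$ and observation likelihoods $P(O_u | U_u, \Theta)$ have already been tabulated per step:

```python
import numpy as np

def gfb(trans, emis):
    """General Forward-Backward over subset-state sequences, Eqns. (7)-(8).
    trans[u]: array (n_{u-1}, n_u) with P(U_u | U_{u-1}); trans[0] is the
              initial distribution over configurations of U_0, shape (n_0,).
    emis[u]:  array (n_u,) with P(O_u | U_u).
    Returns lists of forward (alpha) and backward (beta) vectors."""
    T = len(emis)
    alpha = [trans[0] * emis[0]]                       # base case at u = 0
    for u in range(1, T):                              # Eq. (7)
        alpha.append((alpha[u - 1] @ trans[u]) * emis[u])
    beta = [None] * T
    beta[T - 1] = np.ones_like(emis[T - 1])            # base case at the last step
    for u in range(T - 2, -1, -1):                     # Eq. (8)
        beta[u] = trans[u + 1] @ (emis[u + 1] * beta[u + 1])
    return alpha, beta

# Toy usage: 3 super-nodes with 2 candidate configurations each.
trans = [np.array([0.5, 0.5]),
         np.array([[0.9, 0.1], [0.2, 0.8]]),
         np.array([[0.7, 0.3], [0.4, 0.6]])]
emis = [np.array([0.3, 0.7])] * 3
alpha, beta = gfb(trans, emis)
print(alpha[-1].sum())                                 # P(O | Theta)
```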


3.3. 2D Viterbi Algorithm

For classification, we employ a two-dimensional Viterbi algorithm19 to search for the combination of states with the maximum a posteriori probability and map each block to a class. This process is equivalent to searching for the state of each block using an extension of the variable-state Viterbi algorithm presented in,4 based on the new structure in Fig. 6(b). If we were to search over all combinations of states, then with $w(u)$ states in each subset-state sequence $U_u$, the number of possible sequences of states at every position would be $M^{w(u)}$, which is computationally infeasible. To reduce the computational complexity, we only keep the $N$ sequences of states with the highest likelihoods out of the $M^{w(u)}$ possible sequences.
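A minimal sketch of this pruning strategy in the same super-node view as the GFB sketch above: at each position $u$, only the $N$ highest-scoring configuration hypotheses survive, rather than all $M^{w(u)}$; the `trans`/`emis` layout is the same hypothetical one used there.

```python
import numpy as np

def pruned_viterbi(trans, emis, N=2):
    """Beam-pruned Viterbi over subset-state sequences: keep the N most
    likely configuration hypotheses of U_u at every step.
    Returns the best configuration index per step."""
    T = len(emis)
    scores = np.log(trans[0]) + np.log(emis[0])          # log-scores at u = 0
    beam = np.argsort(scores)[-N:]                       # surviving hypotheses
    back = []
    for u in range(1, T):
        # Scores of every config of U_u reached from every survivor of U_{u-1}.
        cand = (scores[beam, None] + np.log(trans[u][beam, :])
                + np.log(emis[u])[None, :])
        best_prev = cand.argmax(axis=0)                  # best predecessor per config
        scores = cand[best_prev, np.arange(cand.shape[1])]
        back.append((beam.copy(), best_prev))
        beam = np.argsort(scores)[-N:]                   # prune to N hypotheses
    # Backtrack from the best surviving configuration.
    path = [int(beam[np.argmax(scores[beam])])]
    for prev_beam, best_prev in reversed(back):
        path.append(int(prev_beam[best_prev[path[-1]]]))
    return path[::-1]
```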

4. EXPERIMENTAL RESULTS: IMAGE SEGMENTATION

In this section, we test our general 2D-DHMM model for the segmentation of man-made and natural regions in six aerial images of the San Francisco Bay area provided by TRW (formerly ESL, Inc.). One of the six images used is shown in Fig. 7(a) and its hand-labeled truth image is depicted in Fig. 7(b). The images are divided into non-overlapping blocks, and feature vectors are extracted for each block. The feature vector consists of nine features, of which six are intra-block features, as defined in,4 and three are inter-block features defined as the differences of the average intensity of block $(i, j)$ with those of its vertical, horizontal, and diagonal neighboring blocks. Let the average intensity of block $(i, j)$ be $I(i, j)$; then the three features are $f_7 = I(i, j) - I(i-1, j)$, $f_8 = I(i, j) - I(i-1, j-1)$, and $f_9 = I(i, j) - I(i, j-1)$.
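A minimal sketch of the three inter-block features, assuming `img` is a grayscale image array and `b` is the block side length; the handling of boundary blocks that lack a given neighbor (set to zero here) is an assumption the paper does not specify.

```python
import numpy as np

def interblock_features(img, b):
    """Compute f7, f8, f9: differences of the average intensity of block
    (i, j) with its vertical, diagonal, and horizontal neighbors."""
    H, W = img.shape
    I, J = H // b, W // b
    # Average intensity of each b x b block.
    Ibar = img[:I * b, :J * b].reshape(I, b, J, b).mean(axis=(1, 3))
    f7 = np.zeros((I, J)); f8 = np.zeros((I, J)); f9 = np.zeros((I, J))
    f7[1:, :] = Ibar[1:, :] - Ibar[:-1, :]      # f7 = I(i,j) - I(i-1,j)
    f8[1:, 1:] = Ibar[1:, 1:] - Ibar[:-1, :-1]  # f8 = I(i,j) - I(i-1,j-1)
    f9[:, 1:] = Ibar[:, 1:] - Ibar[:, :-1]      # f9 = I(i,j) - I(i,j-1)
    return np.stack([f7, f8, f9], axis=-1)

feats = interblock_features(np.random.rand(64, 64), b=8)
print(feats.shape)   # (8, 8, 3)
```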

We first train our model on the training images, estimating the model parameters from the training feature vectors and their corresponding truth classes. We then perform image classification on a test image using the trained model. Feature vectors are generated for each block of the test image in the same way as in training. To test the model, six-fold cross-validation is used: for each fold, one image is used as the test image and the other five serve as training images. 2D-HMM models with different numbers of states and different block sizes were evaluated. We found that the model with 6 states for the natural class and 8 states for the man-made class yields the best result. A comparison for one of the classified images is shown in Figs. 7(c) and 7(d); we can see that the proposed model largely reduces the segmentation error rate.
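The evaluation protocol is a standard leave-one-image-out loop; a minimal sketch, with `train_model` and `segment` as hypothetical stand-ins for the training (Secs. 3.1-3.2) and classification (Sec. 3.3) procedures:

```python
def cross_validate(images, truths, train_model, segment):
    """Six-fold cross-validation: each image is the test image once,
    with the remaining five used for training. Returns per-fold error rates."""
    errors = []
    for t in range(len(images)):
        train_imgs = [im for k, im in enumerate(images) if k != t]
        train_tru = [tr for k, tr in enumerate(truths) if k != t]
        model = train_model(train_imgs, train_tru)   # EM + GFB training
        pred = segment(model, images[t])             # 2D Viterbi decoding
        err = (pred != truths[t]).mean()             # block-level error rate
        errors.append(err)
    return errors
```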

Figure 7. Comparison of the classification results of the proposed general 2D-HMM model and the model presented in4: (a) an original aerial image; (b) hand-labeled truth image; (c) classification results using the model presented in4 (error rate 13.39%); and (d) classification results using the proposed general model with 16 basic image block patterns (error rate 8.25%). (White: man-made regions; gray: natural regions.)

5. CONCLUSION

In this paper, a novel two-dimensional distributed hidden Markov model (2D-DHMM) has been proposed. The proposed DHMM model provides an analytic solution to the non-causal two-dimensional hidden Markov model by decomposing it into multiple distributed causal multi-dimensional hidden Markov models (HMMs). An alternate update process has been introduced to provide an approximate solution of the distributed scheme on a sequential processor. Moreover, the analytic solution for the training and classification algorithms has been extended to general causal two-dimensional HMMs. Specifically, we extended the following algorithms to general causal two-dimensional systems: (a) the Expectation-Maximization (EM) algorithm; (b) the General Forward-Backward (GFB) algorithm; and (c) the Viterbi algorithm. The proposed DHMM framework provides an analytic solution to a more general multi-dimensional HMM model that can more accurately represent the state dependencies and yet provides a solution to the maximum a posteriori (MAP) classification.

REFERENCES

1. L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE 77, pp. 257–286, 1989.
2. T. Starner and A. Pentland, "Real-time American Sign Language recognition from video using hidden Markov models," Technical Report 375, MIT Media Lab, Perceptual Computing Group, 1995.
3. F. I. Bashir, A. A. Khokhar, and D. Schonfeld, "Object trajectory-based activity classification and recognition using hidden Markov models," IEEE Trans. on Image Processing 16, pp. 1912–1919, 2007.
4. J. Li, A. Najmi, and R. M. Gray, "Image classification by a two-dimensional hidden Markov model," IEEE Trans. on Signal Processing 48, pp. 517–533, 2000.
5. H. C. Lin, L. L. Wang, and S. N. Yang, "Color image retrieval based on hidden Markov models," IEEE Transactions on Image Processing, pp. 332–339.
6. C. Raphael, "Automatic segmentation of acoustic musical signals using hidden Markov models," IEEE Transactions on Pattern Analysis and Machine Intelligence 21, pp. 360–370, 1999.
7. A. V. Lukashin and M. Borodovsky, "GeneMark.hmm: new solutions for gene finding," Nucleic Acids Research 26, pp. 1107–1115, 1998.
8. L. E. Baum, T. Petrie, G. Soules, and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," Annals of Mathematical Statistics 41, pp. 164–171, 1970.
9. S. S. Kuo and O. E. Agazzi, "Machine vision for keyword spotting using pseudo 2D hidden Markov models," Proceedings of the International Conference on Acoustics, Speech and Signal Processing 5, pp. 81–84, 1993.
10. C. C. Yen and S. S. Kuo, "Degraded documents recognition using pseudo 2D hidden Markov models in gray-scale images," Proceedings of SPIE 2277, pp. 180–191, 1994.
11. M. Brand, "Coupled hidden Markov models for modeling interacting processes," Technical Report 405, MIT Media Lab, Perceptual Computing Group, 1997.
12. M. Brand, N. Oliver, and A. Pentland, "Coupled hidden Markov models for complex action recognition," IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'97), p. 994, 1997.
13. P. A. Devijver, "Probabilistic labeling in a hidden second order Markov mesh," Pattern Recognition in Practice II, pp. 113–123, 1985.
14. P. A. Devijver, "Modeling of digital images using hidden Markov mesh random fields," Signal Processing IV: Theories and Applications (Proc. EUSIPCO-88), pp. 23–28, 1988.
15. E. Levin and R. Pieraccini, "Dynamic planar warping for optical character recognition," Proceedings of the International Conference on Acoustics, Speech and Signal Processing 3, pp. 149–152, 1992.
16. M. Park and D. J. Miller, "Image decoding over noisy channels using minimum mean-squared estimation and a Markov mesh," Proceedings of the International Conference on Image Processing 3, pp. 594–597, 1997.
17. A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society: Series B 39, pp. 1–38, 1977.
18. J. A. Bilmes, "A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models," Technical Report TR-97-021, Dept. of EECS, U.C. Berkeley, 1998.
19. D. Schonfeld and N. Bouaynaya, "A new method for multidimensional optimization and its application in image and video processing," IEEE Signal Processing Letters 13, pp. 485–488, 2006.