

Stochastic Approximation Algorithms for Partition Function Estimation of Gibbs Random Fields

Gerasimos Potamianos, Member, IEEE, and John Goutsias, Senior Member, IEEE

Abstract—We present an analysis of recently proposed Monte Carlo algorithms for estimating the partition function of a Gibbs random field. We show that this problem reduces to estimating one or more expectations of suitable functionals of the Gibbs states with respect to properly chosen Gibbs distributions. As expected, the resulting estimators are consistent. Certain generalizations are also provided. We study computational complexity with respect to grid size and show that Monte Carlo partition function estimation algorithms can be classified into two categories: E-Type algorithms, which are of exponential complexity, and P-Type algorithms, which are of polynomial complexity, Turing reducible to the problem of sampling from the Gibbs distribution. E-Type algorithms require estimating a single expectation, whereas P-Type algorithms require estimating a number of expectations with respect to Gibbs distributions which are chosen to be sufficiently "close" to each other. In the latter case, the required number of expectations is of polynomial order with respect to grid size. We compare computational complexity by using both theoretical results and simulation experiments. We determine the most efficient E-Type and P-Type algorithms and conclude that P-Type algorithms are more appropriate for partition function estimation. We finally suggest a practical and efficient P-Type algorithm for this task.

Index Terms—Computational complexity, Gibbs random fields, importance sampling, Monte Carlo simulations, partition function estimation, stochastic approximation.

I. INTRODUCTION

ALTHOUGH Gibbs random fields (GRF's) constitute a popular class of statistical models [1]–[3], a number of theoretical and computational problems are associated with these models, the most prominent one being the computation of the partition function. With the exception of some restrictive cases [4], no exact solution is known for this problem.

Lack of a closed-form expression for partition function calculation has imposed restrictions on a number of statistical problems of interest. Although GRF parameter estimation techniques exist that do not require knowledge of the partition function (e.g., see [1], [5]–[7]), this is not the case for certain Bayesian inference, optimal model selection, and hypothesis testing problems. These problems require knowledge of the

Manuscript received November 14, 1995; revised December 19, 1996. This work was supported by the Office of Naval Research, Mathematical, Computer, and Information Sciences Division, under ONR Grant N00014-90-J-1345. The material in this paper was presented in part at the IEEE International Conference on Acoustics, Speech, and Signal Processing, Minneapolis, MN, April 27–30, 1993, and the 28th Conference on Information Sciences and Systems, Princeton, NJ, March 16–18, 1994.

G. Potamianos is with AT&T Labs–Research, Murray Hill, NJ 07974 USA. J. Goutsias is with the Image Analysis and Communications Laboratory, Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD 21218 USA.

Publisher Item Identifier S 0018-9448(97)06714-X.

likelihood function, which is not available analytically but only in terms of an integral, or summation, of a given functional over a large state space (i.e., a partition function). For example, as explained in [8], we may be interested in the problem of optimally fitting a parametric statistical model to given data. If the number of parameters to be estimated is small (relative to the data size), this problem can be solved by means of traditional maximum likelihood, in which case the techniques proposed in [1] and [5]–[7] will be relevant. If, however, the number of parameters is large, a penalized likelihood method may be used instead, by means of Akaike's information criterion for example (see [8]–[12]). This approach naturally leads to calculating partition functions similar to the ones considered in this paper. Similarly, GRF hypothesis testing problems (i.e., the problem of testing whether or not given data come from a particular GRF model) naturally lead to the problem of evaluating log-likelihood ratios which are given as functions of partition function ratios (e.g., see [9, pp. 241–242]).

As an alternative to exact calculations, a number of analytical techniques have been proposed for partition function approximation (e.g., see [4], [11], and [13]). These techniques are often unreliable and limited to special cases. In this paper, we focus our attention on stochastic approximation (i.e., Monte Carlo) techniques for partition function estimation. Although these techniques are computationally intensive, their use is justified by their high accuracy and their applicability to general GRF models.

A respectable number of Monte Carlo partition function estimation algorithms have been proposed in the literature (see [8]–[12], [14], and [15]). However, many issues related to these algorithms need investigation. For example, statistical properties of the resulting estimators are often not determined, whereas computational complexity is hardly ever analyzed. In addition, certain algorithms can be extended into new ones, capable of more efficient partition function estimation. Clearly, there is a need for a unified presentation, rigorous analysis, and comparative study of all these methods. Addressing these issues is the main theme of this contribution.

The paper is organized as follows. Section II establishes the required background and notation. Sections III and IV are devoted to studying fundamental properties of the Monte Carlo partition function estimation algorithms proposed in [8], [10]–[12], [14], and [15]. It is shown that these algorithms share common characteristics in terms of the computational complexity required to achieve a predefined level of partition function estimation accuracy and confidence.


TABLE I: A SUMMARY OF THE PARTITION FUNCTION ESTIMATORS CONSIDERED IN THIS PAPER

They are subsequently classified into two categories: E-Type algorithms (discussed in Section III) that are of exponential complexity with respect to grid size, and P-Type algorithms (discussed in Section IV) that are of polynomial complexity, Turing reducible to the problem of sampling from the Gibbs distribution.¹

More specifically, Section III-A discusses the algorithm proposed in [14], whereas Section III-B presents an analysis of the algorithm proposed in [10]. Theorem 1 and (37)–(40) provide original results regarding the computational complexity of these algorithms. In order to facilitate further comparison, additional results at certain "extreme" temperatures are reported. Section IV presents two partition function estimation algorithms that generalize existing techniques. In particular, Section IV-A generalizes the technique proposed in [8], [11], and [12], whereas Section IV-B generalizes the technique proposed in [15]. Theorems 3–5 constitute original results concerning the computational complexity of these methods. Section V provides comparisons between the algorithms discussed in this paper, based on the theoretical results of Sections III and IV and a representative number

¹ If P₁ and P₂ are two problems, then we say that problem P₁ is of polynomial complexity, Turing reducible to problem P₂, if the number of steps required for solving problem P₁ is bounded by a polynomial in the problem size, provided that the solution to problem P₂ is readily available.

of simulation experiments. Finally, Section VI draws our conclusions. For convenience, Table I summarizes all partition function estimators considered in this paper.

II. GIBBS RANDOM FIELDS

Consider a collection of $MN$ points in $\mathbb{Z}^2$, where $\mathbb{Z}$ is the set of all integers, given by the rectangular grid

$$\mathcal{L}_{M,N} = \{(i,j)\colon 1 \le i \le M,\ 1 \le j \le N\}.$$

A discrete-valued random variable $H_{i,j}$ is assigned at each point $(i,j)$ of the grid, taking values from a finite state-space $E$, which contains $R$ distinct values. The resulting random field

$$\mathbf{H} = \{H_{i,j},\ (i,j) \in \mathcal{L}_{M,N}\}$$

can take any one of the $R^{MN}$ possible realizations (states)

$$\mathbf{h} = \{h_{i,j},\ (i,j) \in \mathcal{L}_{M,N}\}$$

in the Cartesian product $\Omega = E^{MN}$, with probability mass function $\pi(\mathbf{h})$. We restrict $\mathbf{H}$ to be a GRF whose probability mass function is given by the Gibbs distribution

$$\pi(\mathbf{h}) = \frac{1}{Z}\, Q(\mathbf{h}) \tag{1}$$


where

$$Q(\mathbf{h}) = \exp\left\{-\frac{1}{T}\, U(\mathbf{h})\right\} \tag{2}$$

and

$$Z = \sum_{\mathbf{h} \in \Omega} Q(\mathbf{h}). \tag{3}$$

In (1)–(3), $Z$ is a normalizing constant known as the partition function,² $T$ is a positive parameter known as the temperature, and $U$ is the energy function, which is independent of $T$. We define quantities $U_{\min}$ and $U_{\max}$ by

$$U_{\min} = \min_{\mathbf{h} \in \Omega} U(\mathbf{h}) \qquad\text{and}\qquad U_{\max} = \max_{\mathbf{h} \in \Omega} U(\mathbf{h}) \tag{4}$$

which exist and are both finite. Let us denote the neighborhood of a point $(i,j) \in \mathcal{L}_{M,N}$, induced by the Gibbs distribution (1)–(3), by $\partial(i,j)$ [1]–[4]. Let $\mathbf{h}_{\partial(i,j)}$ collect the values of $\mathbf{h}$ on $\partial(i,j)$, and let $J = |\partial(i,j)|$, independent of $(i,j)$, assuming a homogeneous neighborhood [1]–[4], where $|\cdot|$ denotes the cardinality of a set. Then, (2) can be written as [9], [16]

$$Q(\mathbf{h}) = \prod_{(i,j) \in \mathcal{L}_{M,N}} \Phi\bigl(h_{i,j},\, \mathbf{h}_{\partial(i,j)}\bigr). \tag{5}$$

In (5), $\Phi$ is the local transfer function (LTF), which depends on $T$ and is positive and finite for every argument. The LTF needs to be modified at the boundary points of $\mathcal{L}_{M,N}$, depending on the type of boundary condition assumed (e.g., free or toroidal). In this paper, we assume that the LTF is homogeneous, i.e., independent of $(i,j)$.³ Similarly to (4), we define $\Phi_{\min}$ and $\Phi_{\max}$ by

$$\Phi_{\min} = \min_{x_0, x_1, \dots, x_J \in E} \Phi(x_0, x_1, \dots, x_J) \tag{6}$$

$$\Phi_{\max} = \max_{x_0, x_1, \dots, x_J \in E} \Phi(x_0, x_1, \dots, x_J) \tag{7}$$

which clearly satisfy $0 < \Phi_{\min} \le \Phi_{\max} < \infty$, for every $T$.

As an example of (5), consider the two-dimensional Ising model with a toroidal boundary condition (e.g., see [4, ch. 7]). For such a GRF, $E = \{-1, +1\}$, $R = 2$, $J = 4$, and the LTF is given by

$$\Phi(x_0, x_1, x_2, x_3, x_4) = \exp\left\{a\, x_0 + b\, x_0 \sum_{k=1}^{4} x_k\right\} \tag{8}$$

where $a$ and $b$ are two real-valued parameters.

² Generally speaking, most of the quantities used here depend on $M$, $N$, and $T$ (or on the LTF $\Phi$, see (5)). This dependence is often suppressed in order to simplify notation. When necessary, however, it appears as a subscript or an argument.

³ The more general case of a nonhomogeneous LTF is treated in [9].
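To make the definitions above concrete, the following minimal sketch evaluates $Z$ for a tiny Ising model by brute-force summation over all $R^{MN}$ states, assuming the convention that each site contributes an external-field term $a\,x_0$ and a pair interaction with its right and lower neighbors on the torus (the exact parameterization of (8) is an assumption here; any equivalent convention only rescales $a$ and $b$):

```python
import itertools
import math

def ising_partition_function(M, N, a, b, T=1.0):
    """Brute-force evaluation of Z by summing exp{-U(h)/T} over all
    2**(M*N) states of a small M x N Ising grid, with a toroidal
    boundary so every site has a full neighborhood."""
    Z = 0.0
    for state in itertools.product((-1, +1), repeat=M * N):
        h = [state[i * N:(i + 1) * N] for i in range(M)]
        energy = 0.0
        for i in range(M):
            for j in range(N):
                # External-field term plus right and down pair interactions
                # (each pair counted once thanks to the toroidal wrap-around).
                energy -= a * h[i][j]
                energy -= b * h[i][j] * (h[i][(j + 1) % N] + h[(i + 1) % M][j])
        Z += math.exp(-energy / T)
    return Z

# Feasible only for tiny grids: a 4 x 4 grid already has 2**16 = 65536 states.
print(ising_partition_function(3, 3, a=0.0, b=0.4))
```

Even at this size the cost is exponential in $MN$, which is precisely why the stochastic estimators studied in this paper are needed.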

Equations (1)–(3) provide an energy–temperature formulation for the Gibbs distribution, whereas (1), (3), and (5) provide an LTF formulation. Both are equivalent notations. The former will be used in Section III and characterizes a Gibbs distribution as a one-parameter exponential family with respect to $1/T$. In Section IV, however, we exclusively consider the LTF formulation, which characterizes a Gibbs distribution as a multiparameter exponential family with respect to the LTF values [9]. Notice that $Z \to R^{MN}$, as $T \to \infty$, whereas $Z\, e^{U_{\min}/T} \to q_{\min}(M,N)$, as $T \to 0$, where $q_{\min}(M,N)$ is the number of minimum-energy states. This is due to the fact that (e.g., see [16])

$$Q(\mathbf{h}) = e^{-U_{\min}/T}\, e^{-V(\mathbf{h})/T}$$

for some finite functional $V \ge 0$ on $\Omega$. Usually, $Z$ grows exponentially fast with increasing grid size (i.e., as $MN \to \infty$). It is then more appropriate to consider the quantity

$$f(Z) = \frac{1}{MN} \ln Z \tag{9}$$

known as the pressure. In the case of a GRF with a homogeneous LTF, the pressure enjoys a finite limit, as $M, N \to \infty$,⁴ at any temperature (or, equivalently, for any positive and finite LTF) [17], [18]; i.e.,

$$\lim_{M,N \to \infty} f(Z) < \infty. \tag{10}$$

An important property of a GRF is its ability to mathematically describe phase transitions. If

$$\lim_{M,N \to \infty} C(T_c) = \infty \tag{11}$$

with $C(T)$ being the specific heat at temperature $T$, defined by

$$C(T) = \frac{1}{MN\, T^2}\, \mathrm{Var}_\pi\bigl[U(\mathbf{H})\bigr] \tag{12}$$

where $\mathrm{Var}_\pi$ denotes variance with respect to probability mass function $\pi$, then we say that the GRF is in phase transition at critical temperature $T_c$. We refer to temperatures well above $T_c$ (i.e., $T \gg T_c$) as "high" temperatures and to temperatures close to or below $T_c$ (i.e., $T \lesssim T_c$) as "low" temperatures. The problem of partition function estimation is more challenging in the second case, as theory and simulations demonstrate.

A useful special case of a GRF is a mutually compatible Gibbs random field (MC-GRF) [16], [19]. This random field is characterized by a probability mass function $\nu(\mathbf{h})$, $\mathbf{h} \in \Omega$, of the form

$$\nu(\mathbf{h}) = \prod_{(i,j) \in \mathcal{L}_{M,N}} \tilde\Phi\bigl(h_{i,j},\, \mathbf{h}_{\partial(i,j)}\bigr) \tag{13}$$

where the LTF $\tilde\Phi$ is positive, finite, and satisfies

$$\sum_{x_0 \in E} \tilde\Phi(x_0, x_1, \dots, x_J) = 1, \qquad \text{for all } x_1, \dots, x_J \in E. \tag{14}$$

In this case, the partition function equals one, whereas MC-GRF samples can be drawn lexicographically in exactly $O(MN)$ time [16]. Both of these properties are instrumental in developing the importance sampling Monte Carlo partition function estimation procedure suggested in [14] and further discussed in Section III-A. In the following, we denote the class of probability mass functions that satisfy (13) and (14) by $\mathcal{P}$, and the class of LTF's that satisfy (14) by $\mathcal{F}$.

⁴ Throughout the paper, and when $M \ne N$, $M, N \to \infty$ in the sense of van Hove [17].

III. E-TYPE PARTITION FUNCTION ESTIMATION ALGORITHMS

In this section, we discuss two stochastic approximation techniques for partition function estimation proposed by us in [14] and by Ogata and Tanemura in [10]. Both techniques are based on estimating a single expectation of a certain functional of $\mathbf{H}$ and are of exponential complexity with respect to grid size.

A. MC-GRF-Based Monte Carlo Estimator

Consider an MC-GRF on $\mathcal{L}_{M,N}$ with probability mass function $\nu$ given by (13) and (14). From (3), observe that

$$Z = E_\nu\!\left[\frac{Q(\mathbf{H})}{\nu(\mathbf{H})}\right] \tag{15}$$

where $E_\nu$ denotes expectation with respect to probability mass function $\nu$, and (see (5) and (13))

$$\frac{Q(\mathbf{h})}{\nu(\mathbf{h})} = \prod_{(i,j) \in \mathcal{L}_{M,N}} \frac{\Phi(h_{i,j},\, \mathbf{h}_{\partial(i,j)})}{\tilde\Phi(h_{i,j},\, \mathbf{h}_{\partial(i,j)})}. \tag{16}$$

In this case,

$$\widehat{Z}_n(\tilde\Phi) = \frac{1}{n} \sum_{i=1}^{n} \frac{Q(\mathbf{H}_i)}{\nu(\mathbf{H}_i)} \tag{17}$$

is an unbiased and consistent Monte Carlo estimator of the partition function $Z$, provided that $\nu(\mathbf{h}) > 0$, for every $\mathbf{h} \in \Omega$ [14] (see also [20]). In (17), $\{\mathbf{H}_1, \mathbf{H}_2, \dots, \mathbf{H}_n\}$ is a collection of MC-GRF's that are statistically independent and distributed according to $\nu$.
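A minimal sketch of estimator (17) follows; `sample_mcgrf` (an exact $O(MN)$ MC-GRF sampler), `log_Q` (the logarithm of the $Q$-function (2)), and `log_nu` (the logarithm of the MC-GRF mass function (13)) are hypothetical interfaces, not part of the paper:

```python
import math

def importance_sampling_log_Z(n, sample_mcgrf, log_Q, log_nu):
    """Monte Carlo estimator (17): Z = E_nu[Q(H)/nu(H)], averaged over n
    independent MC-GRF samples drawn from nu. Returns ln(Z-hat), computed
    with a log-sum-exp so the MN-fold products never overflow."""
    log_w = []
    for _ in range(n):
        h = sample_mcgrf()                  # exact draw, O(MN) per sample
        log_w.append(log_Q(h) - log_nu(h))  # log importance weight
    m = max(log_w)
    return m + math.log(sum(math.exp(v - m) for v in log_w) / n)
```

Working in logs is essential: $Q(\mathbf{h})/\nu(\mathbf{h})$ is a product of $MN$ local factors and can be astronomically large or small.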

The main focus of our work in [14] was to choose the appropriate probability mass function $\nu \in \mathcal{P}$ or, equivalently, the LTF $\tilde\Phi \in \mathcal{F}$. Since $\widehat{Z}_n(\tilde\Phi)$ is an unbiased and consistent estimator of $Z$, we have concentrated our effort on finding a $\tilde\Phi$ such that the variance of the importance weight $Q(\mathbf{H})/\nu(\mathbf{H})$ is as small as possible, for every $n$, in an effort to achieve importance sampling [20]. This is clearly equivalent to minimizing, with respect to $\tilde\Phi \in \mathcal{F}$, an Ali–Silvey type of "distance" of a probability mass function $\nu$ from the Gibbs distribution $\pi$, given by

$$d(\nu, \pi) = \frac{1}{MN} \ln\left\{1 + \mathrm{Var}_\nu\!\left[\frac{Q(\mathbf{H})}{Z\, \nu(\mathbf{H})}\right]\right\}. \tag{18}$$

A solution to this minimization problem, however, may not exist, and we are merely left with the problem of "cleverly" choosing a $\tilde\Phi$ that approximately satisfies our minimum variance requirement.

A "naive" choice for $\nu$ can be obtained by means of the uniform LTF

$$\tilde\Phi(x_0, x_1, \dots, x_J) = \frac{1}{R}. \tag{19}$$

This choice fails, however, to provide efficient estimators of $Z$ by means of (17), as has already been discussed in [14] and is further demonstrated by (28) and (29) later.

Two alternative choices for $\tilde\Phi$ have been proposed in [14]. Given the LTF $\Phi$ of the original GRF, we choose two LTF's that satisfy (14), given by

(20)

and

(21)

In (20), the count statistic denotes the number of occurrences of a local configuration $(x_0, x_1, \dots, x_J)$ in a realization of a GRF $\mathbf{H}$, whereas its expectations with respect to $\pi$ can be estimated by means of a Markov-Chain Monte Carlo (MCMC) scheme, such as the Gibbs sampler [3] (see also [21]).

We are now interested in studying the computational complexity of calculating a "reliable" estimate of the partition function, by means of (17), as a function of the number $MN$ of points in $\mathcal{L}_{M,N}$. We assume that the LTF of the MC-GRF has been computed off-line. Since our approach is stochastic, we are interested in the minimum size $n_{\min}$ of the Monte Carlo sample sufficient to obtain Monte Carlo estimators that satisfy

$$\Pr\left\{\left|\frac{\widehat{Z}_n(\tilde\Phi)}{Z} - 1\right| \le \epsilon\right\} \ge 1 - \delta \tag{22}$$

for a given accuracy $\epsilon$ and confidence $1 - \delta$. The following Theorem 1 shows that such an integer $n_{\min}(\epsilon, \delta)$ exists and that the computational complexity of the previously discussed methods is at most exponential with respect to $MN$.⁵

⁵ The proofs of all theorems can be found in the Appendix.
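Under the Chebyshev bound used in the proof of Theorem 1, a sufficient sample size scales like the variance of the importance weight divided by $\epsilon^2 \delta$; a small sketch follows, where the exponential form $e^{MN d} - 1$ for that variance is our reading of (18) and (70) rather than a quoted formula:

```python
import math

def chebyshev_sample_size(d, M, N, eps, delta):
    """Sufficient Monte Carlo sample size n_min(eps, delta) for (22) via
    Chebyshev's inequality, taking the variance of the importance weight
    to be exp{M*N*d} - 1, with d the per-point complexity coefficient (18).
    Illustrates the exponential growth in MN asserted by Theorem 1."""
    return math.ceil((math.exp(M * N * d) - 1.0) / (eps ** 2 * delta))

# Even a modest coefficient d is hopeless on a large grid:
print(chebyshev_sample_size(0.01, 64, 64, eps=0.1, delta=0.05))  # astronomically large
```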


Theorem 1: For any Monte Carlo estimator $\widehat{Z}_n(\tilde\Phi)$ of the partition function $Z$, given by (17), there exists an integer $n_{\min}(\epsilon, \delta)$ such that $\widehat{Z}_n(\tilde\Phi)$ satisfies (22), for every $n \ge n_{\min}(\epsilon, \delta)$. Furthermore,

$$\lim_{M,N \to \infty} \frac{1}{MN} \ln\bigl[\epsilon^2\, \delta\; n_{\min}(\epsilon, \delta)\bigr] = D(\tilde\Phi) \tag{23}$$

at any temperature $T$, where

$$D(\tilde\Phi) = \lim_{M,N \to \infty} d(\nu, \pi). \tag{24}$$

In addition, if $\tilde\Phi$ is such that

$$\lim_{T \to \infty} \tilde\Phi(x_0, x_1, \dots, x_J) = \frac{1}{R}, \qquad \text{for all } x_0, x_1, \dots, x_J \in E \tag{25}$$

then

$$\lim_{T \to \infty} D(\tilde\Phi) = 0. \tag{26}$$

In practice, and in the case of a large grid, Theorem 1 shows that good estimates of $Z$ can be obtained in reasonable computational time only when $D(\tilde\Phi) \approx 0$ which, for example, would be the case at infinite temperature (it is easy to show that the choices (19)–(21) satisfy (25) and, hence, (26)). Extensive simulation experiments in [9] and [14] have shown that the choices (20) and (21) both provide "reliable" partition function estimates at "high" temperatures. However, only the choice (20) may provide "reliable" estimates of $Z$ at "low" temperatures. This is partly explained by the empirical observation that its complexity coefficient is the smaller of the two,

(27)

for virtually all values of $T$ and for a number of GRF models at different temperatures (see also Section V-A).

It has been shown in [9, Lemma 4.1, pp. 119–120] that

$$\lim_{T \to 0} D(\tilde\Phi) = \ln R - \frac{1}{MN} \ln q_{\min}(M, N) \tag{28}$$

for the "naive" choice (19), where

$$q_{\min}(M, N) = \bigl|\{\mathbf{h} \in \Omega : U(\mathbf{h}) = U_{\min}\}\bigr|$$

is the number of minimum-energy states of a GRF. For some GRF's (e.g., the Ising model), $q_{\min}(M, N)$ is constant, independent of the grid size. In such cases, and for sufficiently large grid sizes,⁶

$$\lim_{T \to 0} D(\tilde\Phi) \approx \ln R. \tag{29}$$

Equations (18), (28), and (29) demonstrate the fact that the estimator based on (19) is an inefficient estimator of $Z$ at "low" temperatures. In fact, as $T \to 0$, it becomes equivalent to a brute-force summation of the partition function by means of (3), as shown by our simulations in Section V-A (see also (65) later).

⁶ In fact, $\lim_{M,N \to \infty} \lim_{T \to 0} D(\tilde\Phi) = \ln R$, provided that $\ln q_{\min}(M, N) = o(MN)$ [9, Lemma 4.1, pp. 119–120].

B. Ogata–Tanemura Partition Function Estimator

The estimator proposed in [10] is based on the identity (see also (1))

$$E_\pi\!\left[\frac{1}{Q(\mathbf{H})}\right] = \frac{R^{MN}}{Z}$$

provided that $Q(\mathbf{h}) > 0$, for all states $\mathbf{h} \in \Omega$. Then

$$\widehat{Z}_n^{OT} = R^{MN} \left[\frac{1}{n} \sum_{i=1}^{n} \frac{1}{Q(\mathbf{H}_i)}\right]^{-1} \tag{30}$$

is a Monte Carlo estimator for $Z$, where $\{\mathbf{H}_1, \mathbf{H}_2, \dots, \mathbf{H}_n\}$ is a collection of GRF's obtained by means of MCMC such that they are (approximately) distributed according to $\pi$, for every $i$. It can be shown that estimator $\widehat{Z}_n^{OT}$ is asymptotically unbiased and consistent, with

(31)

where

(32)

and where an asymptotic term reflects the effect of correlation between the random fields $\mathbf{H}_1, \mathbf{H}_2, \dots, \mathbf{H}_n$ on the rate of convergence of the mean-square estimation error to zero [21], [22]. The following theorem reveals an interesting property associated with the variance of $\widehat{Z}_n^{OT}$.

Theorem 2: At any temperature $T$ such that

(33)

we have that

(34)

provided that $n \to \infty$ and $M, N \to \infty$.

The left-hand side of (34) is the asymptotic efficiency of estimator (30) relative to the "naive" estimator given by (17) and (19) (see [23]). Theorem 2 shows that (30) is less efficient than the "naive" estimator at sufficiently "low" temperatures and is, therefore, an inefficient estimator of $Z$ at these temperatures.
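On a finite state space, the identity behind (30) can be written as $E_\pi[1/Q(\mathbf{H})] = R^{MN}/Z$; the following sketch is our reading of the resulting estimator (the MCMC `samples` and the `log_Q` callable are hypothetical interfaces), returning $\ln \widehat{Z}$ in a numerically safe way:

```python
import math

def ogata_tanemura_log_Z(samples, log_Q, R, M, N):
    """Sketch of estimator (30), assuming E_pi[1/Q(H)] = R**(M*N) / Z,
    valid when Q(h) > 0 for all states. `samples` are (approximately)
    pi-distributed GRF's from an MCMC run; `log_Q` evaluates the
    Q-function (2). Returns an estimate of ln Z."""
    n = len(samples)
    vals = [-log_Q(h) for h in samples]
    # Stable log of (1/n) * sum of 1/Q(h_i), via log-sum-exp.
    m = max(vals)
    log_mean_inv_Q = m + math.log(sum(math.exp(v - m) for v in vals)) - math.log(n)
    return M * N * math.log(R) - log_mean_inv_Q
```

As Theorem 2 indicates, the weights $1/Q(\mathbf{H}_i)$ are dominated by rarely visited high-energy states at "low" temperatures, which is exactly when this estimator degrades.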

In analogy with Theorem 1, we now examine the asymptotic (as $M, N \to \infty$) computational complexity of the partition function approximation scheme (30). However, due to the asymptotic nature of (31) and the presence of the correlation term, such a task seems impossible. To ameliorate this problem, we shall assume here that there exists an unbiased and consistent estimator of $1/Z$ with variance

(35)

If the correlation term is positive (which is frequently the case [22]), the convergence properties of (30) will be worse than the ones of this idealized estimator; therefore, studying the latter is the "best case" scenario. Theorem 1 turns out to be applicable here, by defining the corresponding "distance" $d_{OT}$ as (compare with (18), and see also (9), (32), and (35))

(36)

where $f$ is the pressure. Then (compare with (23))

$$\lim_{M,N \to \infty} \frac{1}{MN} \ln\bigl[\epsilon^2\, \delta\; n_{\min}(\epsilon, \delta)\bigr] = D_{OT} \tag{37}$$

at any temperature $T$, where (compare with (24))

$$D_{OT} = \lim_{M,N \to \infty} d_{OT}. \tag{38}$$

Furthermore, it can be easily shown that [9] (compare with (26) and (28))

$$\lim_{T \to \infty} D_{OT} = 0 \tag{39}$$

whereas

$$\lim_{T \to 0} D_{OT} = \infty. \tag{40}$$

Equations (23), (25), (26), (37), and (39) suggest that, at "high" enough temperatures, estimator (30) becomes computationally comparable to the estimators based on (19)–(21). Furthermore, (37) and (40) suggest that estimator (30) becomes computationally inefficient as the temperature approaches zero, and is in fact worse than a brute-force summation of the partition function by means of (3) (see also (65) later).

IV. P-TYPE PARTITION FUNCTION ESTIMATION ALGORITHMS

We now propose two stochastic approximation schemes for partition function estimation that generalize the two methods reported in [8], [11], [12], and [15]. These algorithms are based on estimating expectations of suitable functionals of $\mathbf{H}$ with respect to Gibbs distributions that are chosen to be sufficiently "close" to each other. They are both of polynomial complexity with respect to grid size, Turing reducible to the problem of sampling from the Gibbs distribution.

A. Ogata–Tanemura Partition Function Estimator

Let us assume that we are given the partition function $Z(\Phi_0)$ of a GRF with LTF $\Phi_0$ and we are interested in calculating the partition function $Z(\Phi_1)$ of a GRF with LTF $\Phi_1$. Let

$$\Phi_\lambda = \Phi_0^{1-\lambda}\, \Phi_1^{\lambda}, \qquad 0 \le \lambda \le 1. \tag{41}$$

It can be easily shown that

$$Z(\Phi_\lambda) = \sum_{\mathbf{h} \in \Omega} Q_0^{1-\lambda}(\mathbf{h})\, Q_1^{\lambda}(\mathbf{h}) \tag{42}$$

where $Q_0$ and $Q_1$ are the $Q$-functions (5) associated with $\Phi_0$ and $\Phi_1$, respectively. By differentiating the logarithm of $Z(\Phi_\lambda)$ with respect to $\lambda$, we obtain

$$G(\lambda) = \frac{d}{d\lambda} \ln Z(\Phi_\lambda) = E_{\pi_\lambda}\bigl[W(\mathbf{H})\bigr] \tag{43}$$

where $\pi_\lambda$ is the Gibbs distribution with LTF $\Phi_\lambda$ and

$$W(\mathbf{h}) = \sum_{(i,j) \in \mathcal{L}_{M,N}} \ln \frac{\Phi_1(h_{i,j},\, \mathbf{h}_{\partial(i,j)})}{\Phi_0(h_{i,j},\, \mathbf{h}_{\partial(i,j)})}. \tag{44}$$

The function $\ln Z(\Phi_\lambda)$ is continuously differentiable with respect to $\lambda$, for all finite $M$ and $N$, and, therefore (see (9), (41), and (43)),

$$\ln \frac{Z(\Phi_1)}{Z(\Phi_0)} = \int_0^1 G(\lambda)\, d\lambda. \tag{45}$$

An equivalent form of (41)–(45) appears in [11] and [12] for the case of spatial point processes. A more general form of (41)–(45) appears in [8].

Similarly to the approach suggested by Ogata and Tanemura in [11] and [12], and in order to calculate the integral in (45), we may first estimate $G(\lambda)$ at a number of points $\lambda_0, \lambda_1, \dots, \lambda_\kappa$, and then approximate the function $G(\lambda)$ by means of a polynomial fit approach based on least mean-square-error estimation [11], or cubic B-splines [12]. The resulting approximate function can then be integrated in order to provide an estimate of (45). Notice that we can obtain a Monte Carlo estimator for (43) by means of

$$\widehat{G}(\lambda) = \frac{1}{n} \sum_{m=1}^{n} W(\mathbf{H}_m^{(\lambda)}) \tag{46}$$

where $\{\mathbf{H}_1^{(\lambda)}, \mathbf{H}_2^{(\lambda)}, \dots, \mathbf{H}_n^{(\lambda)}\}$ is a collection of GRF's obtained by means of MCMC such that they are (approximately) distributed according to $\pi_\lambda$, for every $m$.

In general, the polynomial fit approach suffers from approximation errors that do not necessarily converge to zero as the degree of the interpolating polynomial is increased [24]. In the spline approach of [12], the error in approximating the internal energy approaches zero as the number of sampling points increases to infinity [24], provided that the Monte Carlo estimation error at each sampling point also approaches zero (which is the case when the number of MCMC iterations approaches infinity). It is not clear, however, how to statistically analyze the resulting algorithm. In [9], we have suggested employing Simpson's integration rule [24] for numerically computing (45), an idea similar to the approach suggested in [8].⁷ The resulting algorithm is amenable to a statistical analysis whose main points are now summarized.

Let us consider $\kappa + 1$ equally spaced points $\lambda_0, \lambda_1, \dots, \lambda_\kappa$, where

$$\lambda_k = \frac{k}{\kappa}, \qquad \text{for } k = 0, 1, \dots, \kappa \tag{47}$$

with $\kappa$ being even. A simple and effective way to numerically calculate the integral in (45) is by means of Simpson's integration rule, in which case

$$\int_0^1 G(\lambda)\, d\lambda \approx \sum_{k=0}^{\kappa} w_k\, G(\lambda_k) \tag{48}$$

where [24]

$$w_k = \frac{c_k}{3\kappa} \tag{49}$$

with

$$c_k = \begin{cases} 1, & \text{for } k = 0 \text{ or } k = \kappa\\ 4, & \text{for } k \text{ odd}\\ 2, & \text{for } k \text{ even},\ 0 < k < \kappa. \end{cases} \tag{50}$$

A Monte Carlo estimator of the partition function ratio $Z(\Phi_1)/Z(\Phi_0)$ is now given by (see (45)–(50))

$$F_{OT} = \exp\left\{\sum_{k=0}^{\kappa} w_k\, \widehat{G}(\lambda_k)\right\}. \tag{51}$$

In (51), $n_k$, $k = 0, 1, \dots, \kappa$, is the number of samples associated with the Monte Carlo estimation of $G(\lambda_k)$ by means of (46), whereas $\{\mathbf{H}_1^{(k)}, \dots, \mathbf{H}_{n_k}^{(k)}\}$ are GRF's obtained by means of MCMC such that they are (approximately) distributed according to $\pi_{\lambda_k}$. To facilitate the derivation of Theorem 3 below, and similarly to the statistical analysis in [15] and in Section IV-B, we assume here that these GRF's are independent and identically distributed (i.i.d.), obtained by means of "many short runs" MCMC with "burn-in" equal to $\tau$ (e.g., see [21]). In this case, identical and statistically independent ergodic Markov chains are generated that approximately converge to probability $\pi_{\lambda_k}$ after $\tau$ steps. The first $\tau$ samples are discarded and the $(\tau+1)$st sample is kept. GRF $\mathbf{H}_i^{(k)}$ will then be the $(\tau+1)$st state of the $i$th chain. Estimator (51) can be easily shown to be asymptotically unbiased and consistent, as $n_k \to \infty$ and $\kappa \to \infty$, for all $k$. However, for all finite $\kappa$, estimator (51) will be inconsistent since (48)–(50) introduce a "systematic" error in approximating the integral in (45).

⁷ In [9, ch. 5], a more general approach has been employed that uses the Romberg integration procedure of order $2s$ (see also [24]). When $s = 1$, this leads to the well-known trapezoidal integration rule that has been suggested as a useful tool for partition function estimation in [8]. When $s = 2$, we obtain the Simpson's integration rule employed here. For the particular problem at hand, we have noticed that the Simpson's integration rule constitutes an improvement over the simpler trapezoidal rule, with the former often requiring only 25% of the sampling points of the latter [9, Sec. 5.3]. Furthermore, little benefit is achieved by going from a Romberg integration rule with $s = 2$ to integration rules with $s > 2$, unless $M$ and $N$ are large [9].
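The numerical part of the estimator is just a fixed linear combination of the ergodic averages (46); a minimal sketch of the Simpson weights (48)–(50) follows, with `G_hat[k]` standing for a Monte Carlo estimate of the derivative (43) at $\lambda_k = k/\kappa$ (assumed already computed by some MCMC routine):

```python
def simpson_log_Z_ratio(G_hat):
    """Combine Monte Carlo estimates G_hat[k] of the derivative (43) at
    kappa+1 equally spaced points lambda_k = k/kappa (kappa even) into an
    estimate of ln[Z(Phi_1)/Z(Phi_0)] via Simpson's rule (48)-(50)."""
    kappa = len(G_hat) - 1
    assert kappa >= 2 and kappa % 2 == 0, "Simpson's rule needs an even kappa"
    delta = 1.0 / kappa
    total = G_hat[0] + G_hat[kappa]
    for k in range(1, kappa):
        # Interior weights alternate 4 (odd k) and 2 (even k).
        total += (4 if k % 2 else 2) * G_hat[k]
    return total * delta / 3.0
```

Exponentiating the returned value gives the ratio estimate (51); the Simpson weights contribute a deterministic $O(\kappa^{-4})$ bias on top of the statistical error of each $\widehat{G}(\lambda_k)$.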

The purpose of (51) is to approximate $Z(\Phi_1)/Z(\Phi_0)$ in the sense of (22). We, therefore, seek sufficient values for $\kappa$, $n_k$, and $\tau$, such that estimator (51) satisfies

$$\Pr\left\{\left|\frac{F_{OT}}{Z(\Phi_1)/Z(\Phi_0)} - 1\right| \le \epsilon\right\} \ge 1 - \delta \tag{52}$$

for given accuracy $\epsilon$ and confidence $1 - \delta$. We have the following theorem.

Theorem 3: For calculating $Z(\Phi_1)/Z(\Phi_0)$, for given $\epsilon$ and $\delta$, by means of the Monte Carlo estimation scheme (51) such that (52) is satisfied, it suffices to use $\kappa + 1$ equally spaced points $\lambda_k$, given by (47), with $\kappa \ge \kappa_{\min}$, where

(53)

and i.i.d. samples $\mathbf{H}_1^{(k)}, \mathbf{H}_2^{(k)}, \dots, \mathbf{H}_{n_k}^{(k)}$, with $n_k$ sufficiently large, obtained by means of "many short runs" MCMC with "burn-in" $\tau_k$, for $k = 0, 1, \dots, \kappa$, where the required $\tau_k$ is controlled by $\varrho_k$, the maximum (in absolute value) nonunit eigenvalue of the transition probability matrix of the underlying MCMC.

According to Theorem 3, the computational cost of satisfying (52), by means of (51), is clearly $\sum_{k=0}^{\kappa} (\tau_k + 1)\, n_k$ Monte Carlo iterations, and is of polynomial order with respect to grid size, with the order determined by the quantity given by [9]

(54)

Equation (51) provides a stochastic approximation algorithm for estimating the ratio of two partition functions. The problem of estimating the partition function $Z(\Phi)$ can be addressed as a special case of (51) by replacing $\Phi_1$ by $\Phi$ and $\Phi_0$ by a $\tilde\Phi \in \mathcal{F}$. As we have already discussed in Section II, $Z(\tilde\Phi) = 1$ and, therefore, $F_{OT}$ itself becomes a partition function estimator, with $f(F_{OT})$ the corresponding estimator for the pressure $f(Z)$.

Theorem 3 shows that the computational complexity of estimator $F_{OT}$ depends on the "distance" between $\Phi$ and $\tilde\Phi$. It is, therefore, natural to seek a $\tilde\Phi$ that


minimizes this "distance." In Section III, we have proposed three choices for $\tilde\Phi$. These choices can be used in order to provide three estimators for $f(Z)$. In Section V-B, we demonstrate that the last two estimators are clearly superior to the first. As an additional remark, it is not difficult to confirm that

(55)

for all LTF's (see (21) and (53)). Furthermore, there exist LTF's for which inequality (55) is strict. Inequality (55), together with the fact that the choice (21) is simple to compute, provides a theoretical justification for preferring the estimator based on (21) over the estimator based on the "naive" choice (19).

B. Jerrum–Sinclair Partition Function Estimator

We now discuss the stochastic approximation algorithm proposed by Jerrum and Sinclair in [15] for estimating the partition function of a special class of GRF's, and used by Geyer and Thompson in [25] and Geyer in [26] for the maximum-likelihood estimation of GRF parameters. Although our presentation closely resembles that in [15], we extend Jerrum and Sinclair's exposition to the problem of partition function estimation of general GRF models. It will soon become apparent that the MC-GRF Monte Carlo estimation methods, discussed in Section III, constitute a special case of this generalization.

The idea proposed in [15] is to use the identity (see also (1), (3))

$$\frac{Z(\Phi_1)}{Z(\Phi_0)} = E_{\pi_0}\!\left[\frac{Q_1(\mathbf{H})}{Q_0(\mathbf{H})}\right] \tag{56}$$

as a basis for computing $Z(\Phi_1)$, given $Z(\Phi_0)$. A Monte Carlo estimator for $Z(\Phi_1)/Z(\Phi_0)$ is then given by

$$\frac{1}{n} \sum_{i=1}^{n} \frac{Q_1(\mathbf{H}_i)}{Q_0(\mathbf{H}_i)} \tag{57}$$

where $\{\mathbf{H}_1, \mathbf{H}_2, \dots, \mathbf{H}_n\}$ is a collection of GRF's obtained by means of MCMC such that they are (approximately) distributed according to $\pi_0$, for every $i$.⁸

Estimator (57) provides "reliable" estimates of $Z(\Phi_1)/Z(\Phi_0)$ only if the Gibbs distributions $\pi_0$ and $\pi_1$, or equivalently the LTF's $\Phi_0$ and $\Phi_1$, are "close enough" to each other. Indeed, let $d_{JS}(\Phi_1, \Phi_0)$ denote a measure of "distance" between the two LTF's $\Phi_1$ and $\Phi_0$, based on the extreme values of the ratio $\Phi_1/\Phi_0$ [9, Sec. 5.2]. By virtue of the Central Limit Theorem [23], the convergence rate of (57) to $Z(\Phi_1)/Z(\Phi_0)$ is determined by the relative variance of the ratio $Q_1(\mathbf{H})/Q_0(\mathbf{H})$. This relative variance is bounded from above by [9, Sec. 5.2]

(58)

Since we are interested in the computational complexity of the method, as $M, N \to \infty$, we would like to upper-bound the relative variance by a bound which is not exponential with respect to grid size. If $d_{JS}(\Phi_1, \Phi_0)$ was of $O(1/MN)$, then the ratio in (58) would be upper-bounded by a constant, as $M, N \to \infty$. This, however, is not possible in general,⁹ since $\Phi_0$ and $\Phi_1$, and thus $d_{JS}(\Phi_1, \Phi_0)$, are independent of $M$ and $N$. In practice, we need to estimate the partition function ratio (56) by means of a product of a sufficiently large number of partition function ratios, as we explain next.

⁸ Two interesting alternatives to estimating the ratio in (56) have been suggested in [27] and [28]. In both cases, (56) is estimated as a ratio of two Monte Carlo estimators. In [27], the expected values of two parametric functionals with respect to the Gibbs distributions $\pi_0$ and $\pi_1$ are estimated. The "optimal" choice for these functionals, however, demands a priori knowledge of the ratio (56). In [28], the expected values of two functionals with respect to a parametric Gibbs distribution that includes, as special cases, both $\pi_0$ and $\pi_1$ are estimated. The "optimal" choice for this distribution is unclear. In any case, (57) constitutes a special case of the methods suggested in [27] and [28], with the advantage that it is amenable to a rigorous statistical analysis.

Let us consider $\kappa + 1$ LTF's $\Phi_{\lambda_k}$, given by

$$\Phi_{\lambda_k} = \Phi_0^{1 - \lambda_k}\, \Phi_1^{\lambda_k}, \qquad \lambda_k = \frac{k}{\kappa} \tag{59}$$

$k = 0, 1, \dots, \kappa$, and the associated Gibbs distributions $\pi_{\lambda_k}$, $k = 0, 1, \dots, \kappa$. By using (5), (9), (56), and (59), we can easily show that

$$\frac{Z(\Phi_1)}{Z(\Phi_0)} = \prod_{k=0}^{\kappa - 1} \frac{Z(\Phi_{\lambda_{k+1}})}{Z(\Phi_{\lambda_k})}. \tag{60}$$

Notice that, if

$$d_{JS}(\Phi_{\lambda_{k+1}}, \Phi_{\lambda_k}) = O\!\left(\frac{1}{MN}\right), \qquad \text{for } k = 0, 1, \dots, \kappa - 1$$

then (compare with (58))

(61)

The upper bound in (61) will therefore remain constant, independent of the lattice size, if, for example, $\kappa$ is taken proportional to $MN$.

⁹ Unless $d_{JS}(\Phi_1, \Phi_0) = 0$, in which case

$$\frac{Z(\Phi_1)}{Z(\Phi_0)} = \left[\max\left\{\frac{\Phi_1(x_0, x_1, \dots, x_J)}{\Phi_0(x_0, x_1, \dots, x_J)} : x_0, x_1, \dots, x_J \in E\right\}\right]^{MN}.$$


A Monte Carlo estimator of $Z(\Phi_1)/Z(\Phi_0)$ is now given by (see (56), (57), and (60))

$$F_{JS} = \prod_{k=0}^{\kappa - 1} \left[\frac{1}{n_k} \sum_{i=1}^{n_k} \frac{Q_{\lambda_{k+1}}(\mathbf{H}_i^{(k)})}{Q_{\lambda_k}(\mathbf{H}_i^{(k)})}\right]. \tag{62}$$

In (62), $Q_{\lambda_k}$ is the $Q$-function associated with the LTF $\Phi_{\lambda_k}$, whereas $\mathbf{H}_i^{(k)}$, $i = 1, 2, \dots, n_k$, are GRF's obtained by means of MCMC such that they are (approximately) distributed according to $\pi_{\lambda_k}$. To facilitate the derivation of Theorem 4 below, and similarly to Theorem 3 and the analysis in [15], we assume here that these GRF's are i.i.d., obtained by means of a "many short runs" MCMC with "burn-in" equal to $\tau_k$. Estimator (62) can be easily shown to be asymptotically unbiased and consistent, as $n_k \to \infty$, for all $k$, independent of the choice of $\kappa$.
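A minimal sketch of (62), with hypothetical interfaces: `sample_from(k)` returns the i.i.d. "many short runs" MCMC draws targeting the Gibbs distribution with LTF $\Phi_{\lambda_k}$, and `log_weight(h, k)` evaluates $\ln\bigl[Q_{\lambda_{k+1}}(\mathbf{h})/Q_{\lambda_k}(\mathbf{h})\bigr]$:

```python
import math

def jerrum_sinclair_log_ratio(kappa, sample_from, log_weight):
    """Telescoping estimator (62): ln[Z(Phi_1)/Z(Phi_0)] as the sum of
    kappa estimated log-ratios along the interpolating LTF's (59).
    Each factor Z(Phi_{k+1})/Z(Phi_k) is a plain sample mean of the
    importance weights Q_{k+1}(H)/Q_k(H) under the k-th Gibbs distribution."""
    total = 0.0
    for k in range(kappa):
        w = [math.exp(log_weight(h, k)) for h in sample_from(k)]
        total += math.log(sum(w) / len(w))
    return total
```

Because consecutive LTF's are "close," each weight has bounded fluctuations, which is what keeps the bound (61) independent of the lattice size.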

Similarly to Theorem 3 and (52), we seek sufficient values for $\kappa$, $n_k$, and $\tau_k$ such that estimator (62) satisfies

$$\Pr\left\{\left|\frac{F_{JS}}{Z(\Phi_1)/Z(\Phi_0)} - 1\right| \le \epsilon\right\} \ge 1 - \delta \tag{63}$$

for given $\epsilon$ and $\delta$. We have the following theorem.

Theorem 4: For calculating $Z(\Phi_1)/Z(\Phi_0)$, for given $\epsilon$ and $\delta$, by means of the Monte Carlo estimation scheme (62) such that (63) is satisfied, it suffices to use $\kappa + 1$ LTF's given by (59), with $\kappa$ depending only on $MN$ and the "distance" $d_{JS}(\Phi_1, \Phi_0)$, and i.i.d. samples $\mathbf{H}_1^{(k)}, \dots, \mathbf{H}_{n_k}^{(k)}$, with $n_k$ sufficiently large, obtained by means of "many short runs" MCMC with "burn-in" $\tau_k$, for $k = 0, 1, \dots, \kappa - 1$, where the required $\tau_k$ is controlled by $\varrho_k$, the maximum (in absolute value) nonunit eigenvalue of the transition probability matrix of the underlying MCMC.

According to Theorem 4, the computational cost of satisfying (63), by means of (62), is clearly $\sum_{k=0}^{\kappa-1} (\tau_k + 1)\, n_k$ Monte Carlo iterations, and is of polynomial order with respect to grid size, with the order determined by the quantity given by (54). Therefore, estimator $F_{JS}$ is expected to become slightly faster than estimator $F_{OT}$, as $M, N \to \infty$. The difference in computational complexity of the two estimators can be further reduced by approximating integral (45) with a higher order numerical integration scheme (see [9] and [24]). From the proofs of Theorems 3 and 4, notice that, although the number of intermediate points $\kappa$ associated with $F_{OT}$ depends on the required accuracy $\epsilon$, this is not true for the $\kappa$ associated with $F_{JS}$, which only depends on $MN$ and the "distance" $d_{JS}(\Phi_1, \Phi_0)$.

Equation (62) provides a stochastic approximation algorithm for estimating the ratio of two partition functions. Clearly, estimating the partition function $Z(\Phi)$ of a Gibbs distribution can be addressed as a special case of (62) by replacing $\Phi_1$ by $\Phi$ and $\Phi_0$ by a $\tilde\Phi \in \mathcal{F}$. This leads to the estimator $f(F_{JS})$ for the pressure $f(Z)$.

By comparing (15)–(17), (59), and (62), it immediately becomes clear that estimator (17) is a special case of estimator (62), when $\kappa = 1$. Although $\pi$ and $\nu$ may be "separated" by a large "distance," the LTF's $\Phi_{\lambda_k}$ and $\Phi_{\lambda_{k+1}}$ may be taken to be "close" to each other (for large enough $\kappa$). Therefore, estimator (62) is expected to achieve better importance sampling performance, when $\kappa > 1$, since it estimates intermediate ratios $Z(\Phi_{\lambda_{k+1}})/Z(\Phi_{\lambda_k})$, as compared to directly estimating the ratio $Z(\Phi)/Z(\tilde\Phi)$. This is further demonstrated in Section V-C.

Theorem 4 shows that the computational complexity of $F_{JS}$ depends on the "distance" $d_{JS}(\Phi, \tilde\Phi)$. It is, therefore, natural to seek a $\tilde\Phi$ that minimizes this "distance." The three choices for $\tilde\Phi$ suggested in Section III can be used here in order to provide three estimators for $f(Z)$, by means of (62). In Section V-B, we demonstrate that the last two are clearly superior to the first. As an additional remark, notice that

(64)

for all LTF's [9]. Furthermore, there exist LTF's for which inequality (64) is strict. Inequality (64), together with the fact that the choice (21) is simple to compute, provides some theoretical justification for preferring the estimator based on (21) over the estimator based on the "naive" choice (19).

We now conclude our discussion with a theorem concerning the computational complexity of both estimators discussed in this section.

Theorem 5: The problem of calculating the partition function of a GRF can be solved in polynomial time on the grid size, Turing reducible to the problem of sampling from the Gibbs distribution, by means of either $F_{OT}$ or $F_{JS}$. When the underlying MCMC produces (approximate) samples from the Gibbs distribution in $O((MN)^r)$ time, for some integer $r$, calculation of the partition function can be achieved in polynomial time on the grid size by means of these estimators.

Theorem 5 is the first result of this type regarding estimator $F_{OT}$. It also generalizes similar results obtained by Jerrum and Sinclair in [15]. Notice that the problem of designing an MCMC algorithm that samples from the Gibbs distribution in $O((MN)^r)$ time, for some integer $r$, is an important open research problem. A limited solution to this problem has already appeared in [15]. However, it has also been shown in [15] that a general solution to such a problem cannot be achieved under standard complexity-theoretic assumptions. Thus, and as a direct consequence of Theorem 5, one cannot in general hope to achieve polynomial complexity by using the previously discussed P-Type estimators.

V. EXPERIMENTAL RESULTS AND COMPARISONS

So far, we have considered a number of stochastic approximation algorithms suitable for estimating the partition function of a general GRF. In this section, we provide an experimental study of their computational complexity, necessary to achieve a "good" partition function estimate, in the sense of (22), (52), or (63). Due to lack of space, we only discuss a few representative simulation experiments, limited to an Ising model with LTF given by (8). In this case, the partition function is known analytically, for any finite $M$, $N$ and any temperature $T$ [4]. This enables us to compare estimates of the partition function to its true value. In Section V-A, we consider an Ising model with fixed parameters $a$ and $b$, at various temperatures; its critical temperature $T_c$ follows from (11) and (12). Additional simulation experiments, for the case of more general GRF's, may be found in [9].

The E- and P-Type estimators discussed in Sections III and IV are different in nature. Therefore, we first limit our comparisons within each class. We subsequently compare the best E-Type to the best P-Type estimators, and we finally demonstrate the accuracy and effectiveness of a simple P-Type estimator, by applying it to the problem of calculating the likelihood of a fully and a partially observed GRF. The comparisons presented here are based on the same number of Monte Carlo iterations; thus, they approximately require the same amount of computations.¹⁰

A. Comparison of Estimators

Theorem 1, (18), (28), (29), and (36)–(40) provide a way of comparing E-Type estimators by means of the complexity coefficients $D(\tilde\Phi)$ and $D_{OT}$. Monte Carlo estimates of these coefficients can be obtained by estimating the pressures involved with a P-Type estimator, for example, where, in each case, we substitute the appropriate LTF.

We now consider the Ising model described above over a range of temperatures. Fig. 1(a) depicts the estimated complexity coefficients $D(\tilde\Phi)$, for the choices (19)–(21) of $\tilde\Phi$, and $D_{OT}$, as a function of temperature. We have also plotted the complexity coefficient

$$D_{bf} = \ln R \tag{65}$$

associated with calculating the partition function by means of a brute-force summation (since, in this case, $R^{MN}$ summations are required). As expected (see (26) and (39)), all of these coefficients approach zero as $T \to \infty$. However (see (29), (40), and (65)), $D_{OT}$ becomes large at "low" temperatures, whereas the coefficient of the "naive" choice (19) asymptotically approaches $\ln R$, as $T \to 0$. On the other hand, the coefficients of the choices (20) and (21) are always close to zero, even at "low" temperatures. Notice that (27) is verified experimentally, with the coefficient of (20) being closer to zero than that of (21), at virtually all temperatures.

¹⁰ Two main operations contribute to the computational cost of a partition function estimator: generating Monte Carlo samples and calculating the functionals associated with each ergodic average. In our simulations, we generate independent MC-GRF's for (17) and use a "single long run" MCMC (based on the Gibbs sampler with lexicographic site updating [3]) for estimating the expectations in (20), and for (30), (51), and (62). In both cases, the cost of generating each Monte Carlo sample is of $O(MN)$ and, given a Monte Carlo sample, the cost of calculating the functionals in (17), (20), (30), (51), and (62) is of $O(MN)$ as well.

Fig. 1. (a) Monte Carlo estimates of the complexity coefficients associated with E-Type partition function estimators as a function of temperature. (b) E-Type estimates of the pressure $f(Z)$ as a function of temperature.

Fig. 1(b) compares pressure estimates, obtained by means of the four E-Type algorithms under consideration, to the exact pressure. The comparison considers the estimators based on (19)–(21) and the Ogata–Tanemura estimator (30), for the same total number of Monte Carlo iterations. In the case of the choice (20), and for the sake of a "fair" comparison, part of the Monte Carlo iterations are allocated for estimating the expectations in (20) (and, thus, the LTF), with the subsequent iterations allocated for calculating the estimate itself. Clearly, and as expected from Fig. 1(a), the estimator based on (20) is the most "reliable" one among all E-Type partition function estimators. Notice, however, that the simpler to implement estimator based on (21) is sufficiently accurate at "high" temperatures as well, and should therefore be preferred at such temperatures.

We now compare the Ogata–Tanemura P-Type estimators for various choices of $\tilde\Phi$. The first row of Fig. 2 depicts the exact pressure, as well as the estimates $f(F_{OT})$ based on the choices (19)–(21), as functions of $\kappa$, with the Ising model being at a "high" temperature $T = 3.5$ and a "low" temperature $T = 3.0$, respectively. In all cases, the same total number of Monte Carlo iterations is used, with the exception of the estimator based on (20), where part of the iterations are used for estimating the expectations in (20).

Fig. 2. Comparison of the Ogata–Tanemura (first row) and the Jerrum–Sinclair (second row) P-Type estimators, as a function of $\kappa$, for various choices of $\tilde\Phi$ and at a "high" temperature $T = 3.5$ and a "low" temperature $T = 3.0$.

The same experiment is repeated for the Jerrum–Sinclair estimator. The second row of Fig. 2 depicts the exact pressure, as well as the estimates $f(F_{JS})$ based on the choices (19)–(21),¹¹ as functions of $\kappa$, at temperatures $T = 3.5$ and $T = 3.0$. As expected, the results show that the accuracy of all P-Type estimators increases as a function of $\kappa$. The estimators based on (20) and (21) are clearly superior to the rest, since they consistently produce more accurate results for the same value of $\kappa$. At "high" temperatures, however, the simpler to implement estimators perform well and should, therefore, be preferred at such temperatures.

It is rather difficult to draw a general conclusion regarding the relative merits of the Ogata–Tanemura versus the Jerrum–Sinclair estimators, due primarily to the different nature of the associated (dominant) errors (i.e., integration versus statistical error). Our simulation experience shows, however, that the Ogata–Tanemura estimators are more "reliable" than the corresponding Jerrum–Sinclair estimators for small values of the $\kappa$'s, with the Jerrum–Sinclair estimators being slightly better than the corresponding Ogata–Tanemura estimators for large values of the $\kappa$'s. We believe that this is due to the fact that, for small values of the $\kappa$'s, the variance of the ergodic averages in (51) is smaller than the variance of the ergodic averages in (62), whereas, for large values of the $\kappa$'s, the integration error due to (48) becomes the main source of error in (51).

¹¹ For the sake of a "fair" comparison, and since the Ogata–Tanemura estimators require estimation of $\kappa + 1$ ergodic averages, we consider $\kappa + 1$ terms in (62).

To conclude this subsection, we demonstrate the fact that the (20)- and (21)-based P-Type estimators are superior to the corresponding E-Type estimators. Towards this goal, we experimentally compare the exact value of the pressure to the E-Type estimates and the P-Type estimates $f(F_{OT})$ and $f(F_{JS})$, as functions of the number of Monte Carlo iterations. These comparisons are depicted in Fig. 3. In all cases, the Ising model is taken to be at a "low" temperature, and all estimators require the same number of computations. In all cases, the P-Type estimators are more "reliable" than the corresponding E-Type estimators, which converge to the exact pressure slowly. Our experience indicates that, for sufficiently large $\kappa$, any P-Type estimator eventually outperforms the best E-Type estimator, for the same number of Monte Carlo iterations.

To summarize, if one is willing to estimate the expectations in (20), one should proceed by employing estimator $F_{OT}$ or estimator $F_{JS}$ based on the choice (20). Otherwise, the corresponding estimators based on (21) should be used instead.

B. Likelihood Function Estimation

We now present simulation experiments that demonstrate the effectiveness of the (20)- and (21)-based P-Type partition function estimators for estimating the likelihood function of a fully, as well as partially, observed GRF. As we have briefly stated in Section I, this calculation is relevant to the problems of Bayesian statistical inference and hypothesis testing.

Fig. 3. E-Type versus P-Type estimators for the choices (20) and (21) of $\tilde\Phi$. The vertical scales are different in each plot so as to enhance detail.

The (per point) log-likelihood function of a fully observed sample $\mathbf{h}^0$ of a GRF is given by (see also (1) and (9))

$$L_{\mathbf{h}^0} = \frac{1}{MN} \ln Q(\mathbf{h}^0) - f(Z). \tag{66}$$

Since $L_{\mathbf{h}^0}$ depends on the pressure, its analytical calculation is not feasible in general. We can, however, obtain a Monte Carlo estimator of the log-likelihood function by replacing $f(Z)$ in (66) with its corresponding E-Type or P-Type partition function estimator. Clearly, the theoretical results and algorithms presented in Sections III and IV are directly relevant to this problem.
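For the fully observed case, (66) reduces likelihood evaluation to one pressure estimate per parameter value; a sketch of the resulting Monte Carlo maximum-likelihood grid search follows (the `log_Q` and `pressure_estimate` callables are hypothetical stand-ins for (2) and for any of the E- or P-Type pressure estimators):

```python
def mcmle_grid_search(h0, M, N, thetas, log_Q, pressure_estimate):
    """Monte Carlo maximum-likelihood over a finite parameter grid, using
    the per-point log-likelihood (66): L(theta) = ln Q_theta(h0)/(M*N) - f-hat.
    `pressure_estimate(theta)` returns a Monte Carlo estimate of the
    pressure f(Z) under parameter value theta."""
    def log_lik(theta):
        return log_Q(h0, theta) / (M * N) - pressure_estimate(theta)
    return max(thetas, key=log_lik)
```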

In certain applications, we may need to assume that the GRF $\mathbf{H}$ is not fully observed but instead is transformed into an observable random field

$$\mathbf{Y} = \{Y_{i,j},\ (i,j) \in \mathcal{L}_{M,N}\}$$

defined on grid $\mathcal{L}_{M,N}$. Random field $\mathbf{Y}$ can take any one of the possible realizations

$$\mathbf{y} = \{y_{i,j},\ (i,j) \in \mathcal{L}_{M,N}\}$$

where $y_{i,j} \in E'$, and $E'$ is a finite state space that contains $R'$ distinct values. The transformation is mathematically described by means of conditional probabilities $\Pr\{\mathbf{Y} = \mathbf{y} \mid \mathbf{H} = \mathbf{h}\}$, for all $\mathbf{y}$ and $\mathbf{h}$. A partially observed GRF is then a realization of $\mathbf{Y}$, drawn from the probability mass function

$$\Pr\{\mathbf{Y} = \mathbf{y}\} = \sum_{\mathbf{h} \in \Omega} \Pr\{\mathbf{Y} = \mathbf{y} \mid \mathbf{H} = \mathbf{h}\}\, \pi(\mathbf{h}).$$

The (per point) log-likelihood function of a partially observed sample $\mathbf{y}^0$ is now given by

$$L_{\mathbf{y}^0} = f(Z_{\mathbf{y}^0}) - f(Z) \tag{67}$$

where

$$Z_{\mathbf{y}^0} = \sum_{\mathbf{h} \in \Omega} \Pr\{\mathbf{Y} = \mathbf{y}^0 \mid \mathbf{H} = \mathbf{h}\}\, Q(\mathbf{h}). \tag{68}$$

Under certain restrictions on the conditional probability $\Pr\{\mathbf{Y} = \mathbf{y} \mid \mathbf{H} = \mathbf{h}\}$ (see [9, Proposition 2.3]), $Z_{\mathbf{y}^0}$ is the partition function of a GRF whose $Q$-function (2) is given by $\Pr\{\mathbf{Y} = \mathbf{y}^0 \mid \mathbf{H} = \mathbf{h}\}\, Q(\mathbf{h})$. Both partition functions in (67) can then be estimated by means of an E- or P-Type estimator, and a Monte Carlo estimator of $L_{\mathbf{y}^0}$ can thus be obtained.

We restrict our simulations to an Ising model with LTF given by (8). We first generate a sample $\mathbf{h}^0$ by assuming "true" parameter values. Sample $\mathbf{h}^0$ is depicted in Fig. 4(a), and has been obtained by means of 50 000 iterations of the Gibbs sampler. We then "degrade" $\mathbf{h}^0$ by means of the stochastic degradation process

$$\Pr\{Y_{i,j} = y \mid H_{i,j} = h\} = \begin{cases} 1 - p, & \text{if } y = h\\ p, & \text{if } y \ne h \end{cases} \tag{69}$$

applied independently at each point $(i,j) \in \mathcal{L}_{M,N}$, where $0 \le p \le 1/2$. Fig. 4(b)–(d) depicts three degraded realizations $\mathbf{y}^0$, for $p = 0.15, 0.25, 0.35$, respectively. Estimates of $f(Z)$ and $f(Z_{\mathbf{y}^0})$ are obtained by means of estimator $F_{JS}$. Notice that the LTF associated with the second partition function is inhomogeneous (see also (8), (68), and (69)). The pressure estimates lead to the likelihood function surfaces depicted in Fig. 5. Fig. 5(a) depicts the estimated log-likelihood function $L_{\mathbf{h}^0}$, whereas Fig. 5(b)–(d) depicts the estimated log-likelihood function $L_{\mathbf{y}^0}$, for $p = 0.15, 0.25, 0.35$, respectively.

Fig. 4. Samples (a) $\mathbf{h}^0$; (b) $\mathbf{y}^0$, for $p = 0.15$; (c) $\mathbf{y}^0$, for $p = 0.25$; and (d) $\mathbf{y}^0$, for $p = 0.35$, used in the simulation experiments of Section V-B.

A Monte Carlo maximum-likelihood estimate (MCMLE) of the parameters can be obtained as the value that maximizes the estimated log-likelihood function over the parameter space. From the surfaces depicted in Fig. 5, we obtain MCMLE's close to the "true" parameter values, given $\mathbf{h}^0$, as well as given $\mathbf{y}^0$, for $p = 0.15, 0.25, 0.35$, respectively. The accuracy of these results demonstrates the fact that the likelihood function has been adequately estimated by means of estimator $F_{JS}$. Since this estimator is the most slowly convergent estimator among the four (20)- and (21)-based P-Type estimators discussed in this paper, the remaining three are expected to give similar or even more accurate results, for the same number of Monte Carlo iterations. Notice, finally, that as $p$ increases the log-likelihood function becomes "flatter" around the "true" parameter values, which are then more difficult to locate on the likelihood surface.

VI. CONCLUSION

In this paper, we have proposed a unified approach to stochastic simulation algorithms for partition function estimation. We have focused our attention on the computational complexity of these algorithms and on determining the statistical properties of the resulting estimators. We have been able to classify the algorithms into two categories: E-Type algorithms, of exponential complexity with respect to grid size, and P-Type algorithms, of polynomial complexity, Turing reducible to the problem of sampling from the Gibbs distribution. The P-Type algorithms have been introduced in a more general setting than the one reported in the literature. We have suggested the use of the choices (20) and (21) as means of initializing estimators $F_{OT}$ and $F_{JS}$, as opposed to the original suggestion of setting $\tilde\Phi$ to the "naive" choice (19). It is clear from Fig. 2 that this leads to significantly faster algorithms. All partition function estimators presented in this paper are consistent, as the number of Monte Carlo iterations grows to infinity, with the exception of estimator $F_{OT}$, whose Simpson-rule integration introduces a systematic error for finite $\kappa$. Among the consistent partition function estimators, only estimator (17) is unbiased (for any value of $n$).

Theoretical analysis and supporting simulations lead us to the following conclusions (see also Table I):

1) The estimator based on the choice (20) is the most "reliable" E-Type estimator of the pressure, at all temperatures, with the estimator based on (21) being sufficiently accurate at "high" temperatures (see Fig. 1(b)).

2) Estimators $F_{OT}$ and $F_{JS}$, based on (20) or (21), are the most "reliable" P-Type estimators and enjoy similar estimation performance (see Fig. 2).

3) The P-Type estimators are more "reliable" than the corresponding E-Type estimators (see Fig. 3). Moreover, and for sufficiently large $\kappa$, any P-Type estimator eventually outperforms all E-Type estimators, for the same number of Monte Carlo iterations.

4) Either $F_{OT}$ or $F_{JS}$, based on (20) or (21), can be successfully employed for estimating the likelihood function of a fully or a partially observed GRF.

It should be emphasized here, however, that our experimental results depend on the particular GRF at hand. As is clear from (18) and Theorems 3 and 4, the relative performance of the various partition function estimation techniques depends on the "distance" of the probability mass functions associated with the choices (20) and (21) from the Gibbs distribution $\pi$. It is possible that both choices achieve a good approximation to $\pi$, which is clearly the case at "high" enough temperatures, or that both achieve a poor approximation to $\pi$. In either case, there is no significant advantage in employing the computationally more demanding (20)-based estimator, and the computationally simpler (21)-based estimator should be used instead.

APPENDIX

PROOFS OF THEOREMS

Proof of Theorem 1: From Chebyshev's inequality [23], we have that (see also (15) and (17))

$$\Pr\left\{\left|\frac{\widehat{Z}_n(\tilde\Phi)}{Z} - 1\right| > \epsilon\right\} \le \frac{1}{n\, \epsilon^2}\, \mathrm{Var}_\nu\!\left[\frac{Q(\mathbf{H})}{Z\, \nu(\mathbf{H})}\right].$$

It is now clear that (22) is satisfied if $n \ge n_{\min}(\epsilon, \delta)$, where $n_{\min}(\epsilon, \delta)$ is given by

$$n_{\min}(\epsilon, \delta) = \left\lceil \frac{1}{\epsilon^2\, \delta}\, \mathrm{Var}_\nu\!\left[\frac{Q(\mathbf{H})}{Z\, \nu(\mathbf{H})}\right] \right\rceil. \tag{70}$$

Fig. 5. Estimated (per point) log-likelihood functions of an Ising model by means of estimator $F_{JS}$. (a) Estimated $L_{\mathbf{h}^0}$. (b) Estimated $L_{\mathbf{y}^0}$, for $p = 0.15$. (c) Estimated $L_{\mathbf{y}^0}$, for $p = 0.25$. (d) Estimated $L_{\mathbf{y}^0}$, for $p = 0.35$.

We now establish the existence and finiteness of $D(\tilde\Phi)$, given by (24), at any temperature $T$. Notice that

$$E_\nu\!\left[\left(\frac{Q(\mathbf{H})}{\nu(\mathbf{H})}\right)^{\!2}\right] = \sum_{\mathbf{h} \in \Omega} \frac{Q^2(\mathbf{h})}{\nu(\mathbf{h})}$$

which is clearly the partition function of a GRF on grid $\mathcal{L}_{M,N}$, with LTF $\Phi^2/\tilde\Phi$, at any $T$. In this case, [18, Theorem D.2.1] is applicable and, therefore,

(71)

From (10), (18), and (71), we conclude that $D(\tilde\Phi)$ exists and is finite for every $T$, being the difference of two finite limits. From (18), observe that $d(\nu, \pi) \ge 0$, for all integers $M$, $N$ and all $T$, and, therefore, $D(\tilde\Phi) \ge 0$.

We now show (23). First, notice (see also (18)) that the variance in (70) equals $e^{MN\, d(\nu, \pi)} - 1$. Therefore, (70) yields

$$\frac{1}{\epsilon^2 \delta}\left(e^{MN\, d(\nu, \pi)} - 1\right) \le n_{\min}(\epsilon, \delta) \le \frac{1}{\epsilon^2 \delta}\, e^{MN\, d(\nu, \pi)} + 1$$

which, together with (18) and (24), results in

$$\limsup_{M,N \to \infty} \frac{1}{MN} \ln\bigl[\epsilon^2 \delta\, n_{\min}(\epsilon, \delta)\bigr] \le D(\tilde\Phi). \tag{72}$$

We next show that

$$\liminf_{M,N \to \infty} \frac{1}{MN} \ln\bigl[\epsilon^2 \delta\, n_{\min}(\epsilon, \delta)\bigr] \ge D(\tilde\Phi). \tag{73}$$

Notice that (73) holds trivially when $D(\tilde\Phi) = 0$. We, therefore, need only show (73) for the case when $D(\tilde\Phi) > 0$. From (18) and (24) we have that

$$\lim_{M,N \to \infty} e^{-MN\, d(\nu, \pi)} = 0. \tag{74}$$

From (70), and by applying the lower bound in the preceding display, we obtain

$$\frac{1}{MN} \ln\bigl[\epsilon^2 \delta\, n_{\min}(\epsilon, \delta)\bigr] \ge d(\nu, \pi) + \frac{1}{MN} \ln\left(1 - e^{-MN\, d(\nu, \pi)}\right)$$

which gives (73) (see also (18), (24), and (74)). Clearly, (72) and (73) yield (23).

It now remains to show (26). First, notice that (13) and (25) yield

$$\lim_{T \to \infty} \nu(\mathbf{h}) = R^{-MN}, \qquad \text{for all } \mathbf{h} \in \Omega$$

whereas (1)–(3) yield

$$\lim_{T \to \infty} \pi(\mathbf{h}) = R^{-MN}, \qquad \text{for all } \mathbf{h} \in \Omega.$$

These limits, together with (18), yield

$$\lim_{T \to \infty} d(\nu, \pi) = 0.$$

Notice next that (see (1), (3), (5)–(7), (13), and (18))

(75)

where the bound involves the extreme values (6) and (7) of the LTF's. Due to (25) and the fact that this bound is uniform in $M$ and $N$, (75) yields a limit that holds uniformly in the grid size, which in turn gives

$$\lim_{T \to \infty} D(\tilde\Phi) = 0.$$

This completes the proof.

We should point out here that, similarly to the proof of [15, Lemma 3], we can "relax" the $O(1/\delta)$ dependence of $n_{\min}(\epsilon, \delta)$ (see (70)) to an $O(\ln(1/\delta))$ dependence. This can be accomplished by independently employing an odd number of estimators (17) and by taking their median. This is also applicable to Theorems 3 and 4.

Proof of Theorem 2: From (31) and (32), we have that

(76)

provided that $n \to \infty$, since the higher-order terms vanish in the limit. Notice now that (see (3)–(5), and (33))

(77)

at any $T$ which satisfies (33). From (15), (16), (19), (76), and (77) we obtain

(78)

at any temperature $T$ which satisfies (33). By taking limits on both sides of (78), as $M, N \to \infty$, we show (34). This completes the proof.

Proof of Theorem 3:Let us define random variablesandby

(79)

for (80)

where , and the numerical integration error in (48)–(50) by

(81)

Let us consider an accuracy and confidence such that

(82)

and

Page 16: Stochastic approximation algorithms for partition function estimation of Gibbs random fields

POTAMIANOS AND GOUTSIAS: ALGORITHMS FOR PARTITION FUNCTION ESTIMATION OF GIBBS RANDOM FIELDS 1963

for (83)

where , , for , and . By using the statistical independence of the random variables , and the facts that (see (48)–(51) and (79)–(81))

and

we obtain (see also (82) and (83))

since , for every integer and every . For (52) to be satisfied, it suffices to

• choose such that (82) is satisfied;
• given this choice, choose and such that (83) is satisfied.

The numerical integration error is bounded by [24]

for some (84)
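For reference, the classical composite Simpson bound [24], which exhibits the fourth-derivative dependence exploited in (85)–(88), reads

\left| \int_a^b f(t)\,dt \;-\; S_m(f) \right|
  \;\le\; \frac{(b-a)\,h^{4}}{180}\,\max_{a \le \xi \le b}\bigl| f^{(4)}(\xi) \bigr|,
  \qquad h = \frac{b-a}{2m},

where S_m(f) denotes the composite Simpson approximation on 2m subintervals (the notation is generic and need not match (84) exactly).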

By differentiating (43) four times with respect to the integration variable, and by using the equality

for all integers (which can be easily shown by means of (42)–(44)), we have that

(85)

where and . From (44), (53), and the fact that

we obtain

(86)

for every integer , which, together with (85), yields

(87)

Clearly, (82), (84), and (87) lead to

(88)

where is the smallest positive even integer for which the bound holds.

Notice now that (83) is equivalent to (see also (80))

(89)

for all . By using Chebyshev’s inequality, it is not difficult to see that (89) is satisfied if

(90)

The left-hand side of (90) is the mean-square error of estimator (46), under the indicated substitutions. It has been shown in [9, Sec. 3.5] that the mean-square error of the estimator

of

(where , for all , and is a collection of i.i.d. GRF’s obtained by means of “many short runs” MCMC with “burn-in” equal to ), satisfies

(91)

where , with

Clearly, for any , the mean-square error (91) will be bounded above by , if

and

(92)
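Before turning to the conditions that guarantee (92), we sketch the “many short runs” scheme just described (our own illustrative code for a small two-dimensional Ising model; the grid size, sweep counts, and all function names are assumptions):

import numpy as np

rng = np.random.default_rng(2)

def gibbs_sweep(x, beta):
    # One raster-scan Gibbs-sampler sweep over an Ising grid, free boundaries.
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            s = 0
            if i > 0: s += x[i - 1, j]
            if i < n - 1: s += x[i + 1, j]
            if j > 0: s += x[i, j - 1]
            if j < m - 1: s += x[i, j + 1]
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * s))
            x[i, j] = 1 if rng.random() < p_plus else -1
    return x

def many_short_runs(f, beta, shape=(16, 16), n_chains=200, burn_in=50):
    # Estimate E_pi[f(X)] by running n_chains independent chains, discarding
    # `burn_in` sweeps from each, and keeping a single sample per chain, so
    # that the retained samples are (approximately) i.i.d.
    samples = []
    for _ in range(n_chains):
        x = rng.choice([-1, 1], size=shape)
        for _ in range(burn_in):
            gibbs_sweep(x, beta)
        samples.append(f(x))
    return np.mean(samples)

# Example: mean interaction energy per site at beta = 0.3.
energy = lambda x: np.sum(x[:-1, :] * x[1:, :]) + np.sum(x[:, :-1] * x[:, 1:])
print(many_short_runs(lambda x: energy(x) / x.size, 0.3))

One sample is retained per chain, buying independence at the cost of re-paying the burn-in for each sample; this trade-off is what the bound (91) quantifies.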



In order for the second inequality in (92) to be satisfied, it suffices to choose such that

(93)

or, since (see (1), (3), and (5)–(7))

it suffices to require that satisfies

or, equivalently,

(94)

In order for the first inequality in (92) to be satisfied, it suffices to choose such that

(95)

If we now substitute , , , , for , and

in (95), we obtain

(96)

In addition, if we set , for , in (94), and if we consider the fact that (see (41) and (47))

we derive

(97)

Finally, (93) becomes

(98)

under the same substitutions.

Notice now that, if , then there exists an such that

Clearly (see also (44)),

for all integers . Therefore (see also (98)), there exist integers and such that , for every , , and for all . This fact, together with (86), (88), and (96), yields , for all . It is finally obvious from (97) that

This completes the proof.

Proof of Theorem 4: Notice first that

is clearly of . We now proceed to show that

(99)

and that the “burn-in” is given by (97), with

(100)

for every .

Let . Consider random variables

for

such that

for (101)

For (63) to be satisfied, it suffices to choose and such that (101) is satisfied (see also (1), (60), and (62)). From Chebyshev’s inequality, we have that

where . For (101) to be satisfied, it therefore suffices that

for

where . Clearly, (91)–(95) are applicable in our case, if we substitute , , , , and . With these substitutions, and , in (91), (92), and (95), become

(102)

and

Page 18: Stochastic approximation algorithms for partition function estimation of Gibbs random fields

POTAMIANOS AND GOUTSIAS: ALGORITHMS FOR PARTITION FUNCTION ESTIMATION OF GIBBS RANDOM FIELDS 1965

(103)

respectively. Clearly, (93) and (95), with the previous substitutions, and (102), (103), give (99) and (100). Notice now that, for given by

for every integer , which, together with (61) and the left-hand side equality in (99), yields . This completes the proof.

Proof of Theorem 5: Theorem 5 is a straightforward extension of Theorems 3 and 4 (see also (54) and the fact that the computation of the functionals in (51) and in (62) is of the required complexity).

REFERENCES

[1] J. Besag, “Spatial interaction and the statistical analysis of lattice systems (with discussion),” J. Roy. Statist. Soc., Ser. B, vol. 36, pp. 192–236, 1974.

[2] R. Kindermann and J. L. Snell, Markov Random Fields and Their Applications, vol. 1 of Contemporary Mathematics. Providence, RI: Amer. Math. Soc., 1980.

[3] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 721–741, 1984.

[4] R. J. Baxter, Exactly Solved Models in Statistical Mechanics. London, England: Academic, 1982.

[5] J. Besag, “Statistical analysis of non-lattice data,” Statistician, vol. 24, pp. 179–195, 1975.

[6] L. Younes, “Estimation and annealing for Gibbsian fields,” Ann. Inst. Henri Poincaré, vol. 24, pp. 269–294, 1988.

[7] ——, “Parametric inference for imperfectly observed Gibbsian fields,” Prob. Theory Related Fields, vol. 82, pp. 625–645, 1989.

[8] Y. Ogata, “A Monte Carlo method for an objective Bayesian procedure,” Ann. Inst. Statist. Math., vol. 42, pp. 403–433, 1990.

[9] G. Potamianos, “Stochastic simulation algorithms for partition function estimation of Markov random field images,” Ph.D. dissertation, The Johns Hopkins Univ., Dept. Elec. Comput. Eng., Baltimore, MD, 1994.

[10] Y. Ogata and M. Tanemura, “Estimation of interaction potentials of spatial point patterns through the maximum likelihood procedure,” Ann. Inst. Statist. Math., vol. 33, pp. 315–338, 1981.

[11] ——, “Likelihood analysis of spatial point patterns,” J. Roy. Statist. Soc., Ser. B, vol. 46, pp. 496–518, 1984.

[12] ——, “Likelihood estimation of soft-core interaction potentials for Gibbsian point patterns,” Ann. Inst. Statist. Math., vol. 41, pp. 583–600, 1989.

[13] A. Penttinen, “Modelling interactions in spatial point patterns: Parameter estimation by the maximum likelihood method,” Jyväskylä Studies in Comput. Sci., Economics and Statist., vol. 7, pp. 1–107, 1984.

[14] G. G. Potamianos and J. K. Goutsias, “Partition function estimation of Gibbs random field images using Monte Carlo simulations,” IEEE Trans. Inform. Theory, vol. 39, pp. 1322–1332, 1993.

[15] M. Jerrum and A. Sinclair, “Polynomial-time approximation algorithms for the Ising model,” SIAM J. Comput., vol. 22, pp. 1087–1116, 1993.

[16] J. K. Goutsias, “Mutually compatible Gibbs random fields,” IEEE Trans. Inform. Theory, vol. 35, pp. 1233–1249, 1989.

[17] D. Ruelle, Statistical Mechanics: Rigorous Results. Reading, MA: Addison-Wesley, 1983.

[18] R. S. Ellis, Entropy, Large Deviations, and Statistical Mechanics. New York: Springer-Verlag, 1985.

[19] J. Goutsias, “Unilateral approximation of Gibbs random field images,” Comput. Vis., Graph., and Image Process.: Graphical Models and Image Process., vol. 53, pp. 240–257, 1991.

[20] M. H. Kalos and P. A. Whitlock, Monte Carlo Methods. Volume I: Basics. New York: Wiley, 1986.

[21] C. J. Geyer, “Practical Markov chain Monte Carlo (with discussion),” Statist. Sci., vol. 7, pp. 473–511, 1992.

[22] P. H. Peskun, “Optimum Monte-Carlo sampling using Markov chains,” Biometrika, vol. 60, pp. 607–612, 1973.

[23] P. J. Bickel and K. A. Doksum, Mathematical Statistics. Oakland, CA: Holden-Day, 1977.

[24] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis. New York: Springer-Verlag, 1980.

[25] C. J. Geyer and E. A. Thompson, “Constrained Monte Carlo maximum likelihood for dependent data (with discussion),” J. Roy. Statist. Soc., Ser. B, vol. 54, pp. 657–699, 1992.

[26] C. J. Geyer, “On the convergence of Monte Carlo maximum likelihood calculations,” J. Roy. Statist. Soc., Ser. B, vol. 56, pp. 261–274, 1994.

[27] C. H. Bennett, “Efficient estimation of free energy differences from Monte Carlo data,” J. Computat. Phys., vol. 22, pp. 245–268, 1976.

[28] G. M. Torrie and J. P. Valleau, “Nonphysical sampling distributions in Monte Carlo free energy estimation: Umbrella sampling,” J. Computat. Phys., vol. 23, pp. 187–199, 1977.