
Computerized Medical Imaging and Graphics xxx (2010) xxx–xxx


Revisiting stopping rules for iterative methods used in emission tomography

Hongbin Guo a,∗, Rosemary A. Renaut b

a InstaRecon, Inc., 60 Hazelwood Dr., Champaign, IL 61820, USA
b Arizona State University, School of Mathematical and Statistical Sciences, Tempe, AZ 85287-1804, USA

∗ Corresponding author. E-mail address: [email protected] (H. Guo).

Article info

Article history:
Received 13 February 2010
Received in revised form 29 September 2010
Accepted 23 November 2010

PACS: 87.57.s; 87.57.nf; 87.57.uk; 02.70.Uu; 87.57.U

Keywords: PET; Expectation maximization; Image reconstruction; Stopping rule

Abstract

The expectation maximization algorithm is commonly used to reconstruct images obtained from positron emission tomography sinograms. For images with acceptable signal to noise ratios, iterations are terminated prior to convergence. A new quantitative and reproducible stopping rule is designed and validated on simulations using a Monte-Carlo generated transition matrix with a Poisson noise distribution on the sinogram data. Iterations are terminated at the solution which yields the most probable estimate of the emission densities while matching the sinogram data. It is more computationally efficient and more accurate than the standard stopping rule based on the Pearson's $\chi^2$ test.

© 2010 Elsevier Ltd. All rights reserved.

0895-6111/$ – see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.compmedimag.2010.11.011

1. Introduction

Many image reconstruction methods have been proposed for emission tomography (ET). They can be classed as deterministic or statistical, dependent on the underlying mathematical model. Deterministic algebraic models, which include the filtered back projection method as well as variations of the algebraic reconstruction technique [1], do not model the noise. On the other hand, the classical approach for reconstructing ET image data uses a statistical method which models the Poisson distribution of the recorded data [2], and yields the maximum likelihood (ML) model with the expectation maximization (EM) algorithm. It has the potential to produce high quality images [2], and its extension in the form of the ordered subsets EM (OSEM) [3,4], has been adopted by a number of manufacturers [5]. Still many important issues, including the rule for stopping the iteration, need to be addressed. While both theory and experiment indicate the monotonic increase of the likelihood function in the EM method, it is well-known that stopping at a smaller likelihood value, i.e. not at the maximum for the likelihood, can give a higher quality solution [6–9]. Stopping the iteration before convergence in this way is necessary but appears to conflict with the spirit of maximizing the likelihood function. Here a new statistically based test for determining the optimal iteration at which to stop the EM algorithm for reconstructing positron emission tomography (PET) images is presented.

The outline of the paper is as follows. Section 2 provides an overview of the PET model, including revisiting the mathematical derivation of the EM algorithm for reconstructing the sinogram data, and a discussion of the impact of low count data on the EM algorithm. This leads naturally to the presented termination test in Section 3. Numerical experiments and the results are discussed in Section 4, where results are also contrasted with those obtained using a standard Pearson's $\chi^2$ test to terminate the iteration. It is concluded in Section 5 that the presented termination rule is not only viable, but also more computationally efficient and accurate, particularly for low count data, than the Pearson's $\chi^2$ test.

2. The maximum likelihood model and the expectation maximization algorithm

Although the ML model based iterative methods can result in the same solution as the EM algorithm, ML and EM are indeed different concepts. For completeness, an overview of the ML model and application of the EM algorithm in the context of PET image reconstruction is provided. First necessary information and appropriate notation for PET imaging is provided. Further details can be found in many standard texts, for example [10].

2.1. Background

For simplicity, assume that the PET scanner is used for two-dimensional brain tomography. The brain is surrounded by an array of detectors in a plane, for which each pair of detectors forms a tube which crosses the plane. After a positron is emitted and encounters a nearby electron, annihilation occurs and a pair of $\gamma$-rays in (nearly) opposite directions is created. These emissions are assumed to be random but uniformly distributed in space. Coincident arrivals of a $\gamma$-ray pair of sufficient energy at both ends of a tube constitute a count for that tube. Assuming that the two-dimensional pixel data are represented by a one-dimensional vector with components $x_j$ ($j = 1, \ldots, n$), where $x_j$ represents the true number of coincidences at pixel $j$, then $p_{ij}$ represents the probability that a photon pair emitted at pixel $j$ will be detected by tube $i$ ($i = 1, \ldots, m$), and $b_i$ is the measured count of all coincidences detected by tube $i$. This vector of counts is the sinogram for the image data $x$. The detection probability matrix, or transition matrix, is given by $P \in \mathbb{R}_+^{m \times n}$, $m > n$, and the unobserved intensity vector by $x \in \mathbb{R}_+^{n}$. Without loss of generality, assume that all emissions in the measured plane can be captured by the ring of detectors in that slice. In other words, the sum of entries in each column of $P$ should be one,

$$\sum_i p_{ij} = 1, \quad j = 1, \ldots, n, \quad \text{i.e. } P^T e_m = e_n. \tag{2.1}$$

A zero row in $P$ implies that the associated tube does not detect any of the positron emissions. Clearly, such tubes do not contribute to the reconstruction and should be removed from the formulation. Thus

$$\sum_j p_{ij} > 0, \quad i = 1, \ldots, m, \quad \text{i.e. } P e_n > 0.$$

Above, and throughout, standard notation is adopted: for vectors, $a > (\ge)\, b$ implies the elementwise relationship, i.e. $a_i > (\ge)\, b_i$ for all $i$, and $.*$ and $./$ indicate component-wise multiplication and division, respectively, for vectors. $P_i$ denotes row $i$ of $P$ and $e_l$ is a vector of length $l$ where each entry is 1. $y \sim \mathcal{N}(0, 1)$ denotes that $y$ is sampled from a standard normal distribution, mean 0 and variance 1, and similar notation is used for other statistical distributions, including Poisson denoted by $\mathcal{P}$, binomial by $\mathcal{B}$, and $\chi^2$.

2.2. The ML model

Let $\lambda_i$ represent the expected number (statistically) of counts detected by tube $i$. Then $\lambda_i = P_i x$. Because of the Poisson nature of the positron emission process, the counts detected by tube $i$, denoted by $\nu_i$, obey a Poisson distribution with expectation $\lambda_i$, $\nu_i \sim \mathcal{P}(\lambda_i)$. Thus the conditional probability is

$$p(\nu_i = b_i \mid x) = \frac{e^{-\lambda_i} \lambda_i^{b_i}}{b_i!}. \tag{2.2}$$

The likelihood function over all tubes is

$$L(x) = \prod_{i=1}^{m} \frac{e^{-\lambda_i} \lambda_i^{b_i}}{b_i!}.$$

Dropping the constant term, the negated log-likelihood function is given by

$$h(x) = \sum_{i=1}^{m} (\lambda_i - b_i \log \lambda_i). \tag{2.3}$$

It is easy to verify that the gradient and Hessian of $h(x)$ are given by

$$\nabla h = P^T (e_m - b./(Px)), \quad \text{and} \quad H = P^T \operatorname{diag}\!\left(\frac{b_i}{(Px)_i^2}\right) P. \tag{2.4}$$

Based on the Kuhn–Tucker (KT) conditions for minimizing $h(x)$ with constraint $x \ge 0$ one may derive many iterative algorithms which can be used to reconstruct the image data, for example see a standard text such as [11].
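As an illustration of (2.3) and (2.4), the following minimal sketch evaluates the negated log-likelihood and its gradient for a small dense system. The helper names (`matvec`, `neg_log_likelihood`, `gradient`) and the toy values of `P`, `b` and `x` are assumptions made for this example only; they are not taken from the paper.

```python
import math

def matvec(P, x):
    """Return P x for a dense matrix P stored as a list of rows."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]

def neg_log_likelihood(P, b, x):
    """h(x) = sum_i (lambda_i - b_i * log(lambda_i)) with lambda = P x, as in (2.3)."""
    lam = matvec(P, x)
    return sum(l - bi * math.log(l) for l, bi in zip(lam, b))

def gradient(P, b, x):
    """grad h = P^T (e_m - b ./ (P x)), as in (2.4)."""
    lam = matvec(P, x)
    r = [1.0 - bi / l for bi, l in zip(b, lam)]
    return [sum(P[i][j] * r[i] for i in range(len(P))) for j in range(len(x))]

# Hypothetical toy system: 3 tubes, 2 pixels, each column of P sums to one.
P = [[0.5, 0.2],
     [0.3, 0.3],
     [0.2, 0.5]]
b = [30.0, 24.0, 26.0]
x = [40.0, 40.0]
g = gradient(P, b, x)
```

A central finite difference of `neg_log_likelihood` is a convenient check that the analytic gradient has been coded consistently.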

2.3. The EM algorithm

In 1982, Shepp and Vardi [2] proposed the use of the EM algorithm [12] for PET image reconstruction. The solution of the EM algorithm, which is the same as the Richardson–Lucy algorithm for this problem [13,14], is the minimum point of $h(x)$, and can be derived as in the following, which partially adapts the derivation in [15].

Denote the number of emissions by pixel $j$ detected in tube $i$ by the random variable $\zeta_{ij}(x)$ or simply $\zeta_{ij}$, the so-called complete-data in the EM method [12]. Due to the $\gamma$-ray emission process, $\zeta_{ij} \sim \mathcal{P}(p_{ij} x_j)$. To make it clear that zero count tubes are not considered, introduce $\Gamma = \{i : b_i > 0\}$. Then, with the condition $\sum_j \zeta_{ij} = b_i$, $\zeta_{ij}$ is binomially distributed, $\zeta_{ij} \sim \mathcal{B}\left(b_i,\, p_{ij} x_j / \left(\sum_s p_{is} x_s\right)\right)$. Thus, for a given $x$ we can estimate $\zeta_{ij}$. Moreover, for a given $\zeta_{ij}$, $x$ can be estimated by maximization of the likelihood function

$$L_c(x^{(k+1)}) = \prod_{i,j} \frac{e^{-p_{ij} x_j^{(k+1)}} \left(p_{ij} x_j^{(k+1)}\right)^{\zeta_{ij}}}{\zeta_{ij}!}.$$

Differentiating $\log(L_c(x))$ with respect to $x_j$, $j = 1, \ldots, n$, and setting to zero yields

$$\sum_{i=1}^{m} p_{ij} - \sum_{i \in \Gamma} \frac{\zeta_{ij}}{x_j} = 0, \quad j = 1, \ldots, n, \tag{2.5}$$

$$1 = \sum_{i \in \Gamma} \frac{\zeta_{ij}}{x_j}, \quad j = 1, \ldots, n, \tag{2.6}$$

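where the second equation follows from (2.1). The EM algorithm thus follows by iterating over the E and M iterations:

E-iteration (expectation iteration): given $x = x^{(k)}$ estimate

$$\zeta_{ij} = E\Big(\zeta_{ij} \,\Big|\, \sum_j \zeta_{ij} = b_i\Big) = \begin{cases} \dfrac{b_i\, p_{ij}\, x_j^{(k)}}{\sum_{s=1}^{n} p_{is}\, x_s^{(k)}}, & i \in \Gamma, \\[2ex] 0, & i \notin \Gamma. \end{cases} \tag{2.7}$$

M-iteration (maximization iteration): given $\zeta_{ij}$, estimate $x^{(k+1)}$

$$x_j^{(k+1)} = \sum_{i \in \Gamma} \zeta_{ij} = \sum_{i \in \Gamma} \frac{b_i\, p_{ij}\, x_j^{(k)}}{\sum_{s=1}^{n} x_s^{(k)} p_{is}} = x_j^{(k)} \left(P^T (b./(Px^{(k)}))\right)_j. \tag{2.8}$$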

In these iterations, one only uses the rows of $P$ for which measured tube counts are nonzero. Zero counts clearly result for tubes which do not cross any active pixels (those where emissions occur), but zero counts may also occur if the mean number of emissions for intersected active pixels is low. Removing rows of $P$ and the corresponding measurements $b_i$ for which the measured counts are negligible as determined by some threshold, and using the notation $\mathring{P}$ and $\mathring{b}$ to denote variables with the negligible data removed, the M iteration (2.8) is given by

$$x^{(k+1)} = x^{(k)} .* \mathring{P}^T\left(\mathring{b}./(\mathring{P}x^{(k)})\right). \tag{2.9}$$
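The multiplicative update can be sketched in a few lines. The function below is a minimal illustration rather than the authors' implementation: it assumes a dense matrix stored as a list of rows, and it treats the negligible tubes as exactly those with zero count (the index set $\{i : b_i > 0\}$). The name `em_step` and the toy data are hypothetical.

```python
def em_step(P, b, x):
    """One EM update x <- x .* P^T (b ./ (P x)), restricted to tubes with b_i > 0."""
    m, n = len(P), len(x)
    rows = [i for i in range(m) if b[i] > 0]          # tubes kept in the iteration
    lam = {i: sum(P[i][j] * x[j] for j in range(n)) for i in rows}
    ratio = {i: b[i] / lam[i] for i in rows}          # b ./ (P x) on the kept tubes
    return [x[j] * sum(P[i][j] * ratio[i] for i in rows) for j in range(n)]

# Hypothetical toy system: 3 tubes, 2 pixels, each column of P sums to one.
P = [[0.5, 0.2],
     [0.3, 0.3],
     [0.2, 0.5]]
b = [30.0, 24.0, 26.0]

x = [1.0, 1.0]                # strictly positive initial image
for _ in range(50):
    x = em_step(P, b, x)
```

Starting from a strictly positive image keeps every iterate positive, and each update redistributes the counts of the retained tubes over the pixels, so the image total equals the sum of the retained counts.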

Remark 2.1. In reducing $P$ to $\mathring{P}$, the column sum identity (2.1) is no longer guaranteed, but, without loss of generality, each column is still nonzero, $\mathring{P}^T e > 0$. Equivalently, the probability of any emitted photon from the image to be captured by one of the reduced tubes is still nonzero and the EM iteration does not break down.

Remark 2.2. The removal of the negligible count data does not completely delete the information from small measurements. This data is still included in the first term of (2.5) but can be totally removed from (2.6) if this equation, and consistently also (2.9), is divided by $\sum_{i=1}^{m} p_{ij}$ on the right hand side.

The gradient and Hessian of $h(x)$ are given in terms of the reduced variables by immediately replacing all occurrences of $P$ and $b$ in (2.4) by $\mathring{P}$ and $\mathring{b}$. The convergence of the EM algorithm for the reduced variables still holds provided $x^{(0)} > 0$, $x^{(0)} \in \Omega$, where the feasible region is given by $\Omega = \{x \mid x \ge 0 \text{ and } \mathring{P}x > 0\}$.

The negated log-likelihood function $h(x)$ has a unique minimum point $x_{ML}$ in region $\Omega$. Moreover the iterative values obtained from the EM algorithm converge to $x_{ML}$, i.e. $x_{EM} = x_{ML}$.

2.4. Significance of tubes with low count data

What is the interpretation of a tube with a low count? Are the data from low count tubes reliable? Recall that in the ideal case $b_i \sim \mathcal{P}(\lambda_i)$. This implies that $p(b_i = \nu) = (\lambda_i^{\nu}/\nu!)\,e^{-\lambda_i}$. But how likely is it that the reported values will be close to the mean when $\lambda$ is small? Table 1 provides the probability that $\nu$ is close to the mean $\lambda$ for $\lambda = 1, \ldots, 20$, where close here means within 20% of the expected value. The values shown in Table 1 indicate that for low mean counts, $\lambda_i \le 20$, the relative deviation of $b_i$ from the mean is, with significant probability, greater than 20%.

Table 1
Probability $p(|\nu - \lambda| \le 0.2\lambda)$ for $\lambda = 1, \ldots, 20$.

| $\lambda$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $p$ | .360 | .270 | .224 | .190 | .497 | .459 | .428 | .403 | .382 | .571 |

| $\lambda$ | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| $p$ | .549 | .530 | .512 | .459 | .635 | .619 | .605 | .591 | .578 | .687 |
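The probabilities of this kind can be recomputed directly from the Poisson probability mass function. The sketch below is one way to do so for integer means; the function names are chosen for this example, and the inequality is evaluated in integer arithmetic as $5|\nu - \lambda| \le \lambda$ to avoid floating-point comparisons against $0.2\lambda$.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing k counts when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def prob_within_20_percent(lam):
    """p(|nu - lam| <= 0.2*lam) for nu ~ Poisson(lam), lam a positive integer."""
    return sum(poisson_pmf(k, lam)
               for k in range(0, 2 * lam + 1)
               if 5 * abs(k - lam) <= lam)   # same as |k - lam| <= 0.2*lam

probabilities = {lam: prob_within_20_percent(lam) for lam in range(1, 21)}
```

For $\lambda = 2$, 5 and 10 this evaluates to roughly 0.271, 0.497 and 0.571, in line with the corresponding tabulated entries.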

To assess the relevance of this information, a set of FDG-PET brain data acquired on a PET scanner, for which details are provided in Section 4, was examined. After filtering out tubes with zero counts, there are still 33,191 tubes with counts ranging from 1 to 877. Of these there are 2615 tubes with counts less than 20. Though the measured counts are not the actual expected values for each tube, it is safe to conclude that, with significant probability, the deviation for about 7.9% of the sinogram measurements is greater than 20%. It is thus appropriate to combine data from tubes with low counts, as is used in the statistical argument which leads to the termination rule for the iteration described in Section 3.

2.5. Significance of removing the negligible count data for the EM algorithm

Suppose that the EM algorithm is initialized with an everywhere non-zero image $x^{(0)} > 0$. It is then easy to verify from (2.8) that $x^{(k)} > 0$, the total number of counts is preserved ($\sum_{j=1}^{n} x_j^{(k)} = \sum_{i=1}^{m} b_i$) and the negated log-likelihood function decreases as the iterations proceed ($h(x^{(k)}) < h(x^{(k-1)})$). Moreover, because $x^{(k)} > 0$ and all entries of $P$ are nonnegative, $Px^{(k)} > 0$. Thus the EM algorithm will not break down. However, suppose that some of the entries of $Px^{(k)}$ are extremely small, i.e. suppose $(Px^{(k)})_i$ is very small, then $(b./(Px^{(k)}))_i$ will be large, which implies $P^T(b./(Px^{(k)}))$ will include large uncertainty. Practically, a very small value of $(Px^{(k)})_i$ indicates that the $i$th tube has a very low count, and such tubes contain data that is quite unreliable. Thus such data should be removed from the reconstruction. Fig. 1 shows that the entries of $Px$ associated with nonzero tubes are greater than one, while those for zero tubes may be as low as $10^{-40}$.

Fig. 1. Illustrations of $Px$ for a representative case. In (a) $Px^{(68)}$ is illustrated for 256 tubes for each direction, over all 192 directions. In (b) $Px^{(68)}$ is given on a log scale against tube number, for those 19824 tubes with positive counts.

3. Stopping rule

It is well-known that it is necessary to terminate the EM iteration early in order to achieve an acceptable image reconstruction which is not overly contaminated by noise. Finding a suitable statistical stopping rule has been addressed previously, including by Veklerov and Llacer in 1987, who used Pearson's $\chi^2$ test [16], and Coakley in 1991, who used cross-validation [17].

• To apply the Pearson's $\chi^2$ test, the cumulative distribution function (CDF) for each density function $\mathcal{P}([Px^{(k)}]_i)$ is split into a number, $N$, of segments, say $N = 20$ [16], such that the area for each segment is equal to $1/N$. The sinogram data (vector $b$) is then divided into groups based on the location of each component $b_i$ for the corresponding Poisson density function $\mathcal{P}([Px^{(k)}]_i)$. Let $g_i$ be the number of sinogram entries which belong to group $i$, then the statistic $G$, given by $G^{(k)} = \sum_{i=1}^{N} \{(g_i - m/N)^2/(m/N)\}$, is used for the Pearson's $\chi^2$ test. For a given significance level this test then provides a range of iterations for which the corresponding images are acceptable.

• To implement cross-validation, the sinogram is split into at least two groups. EM is performed for the first group and stopped at the iteration for which the likelihood function for the remaining sinogram data is maximized. The same procedure is repeated for other groups to generate images for each group of data, and the final image is the average of the images generated by each group. As noted by the authors, cross-validation critically depends on the proportion of data in each of the groups, which, as noted by Johnson, is a limitation of the approach [18].
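The statistic $G^{(k)}$ of the Pearson's $\chi^2$ test can be sketched as follows. This is only an illustration under a simplifying assumption: each measurement is assigned to one of the $N$ equal-probability classes from the value of its Poisson CDF, and the ties caused by the discreteness of the Poisson distribution are ignored, which may differ from the treatment in [16]. The function names are hypothetical, and the direct CDF sum is suitable only for moderate means since `exp(-lam)` underflows for very large ones.

```python
import math

def poisson_cdf(k, lam):
    """F(k | lam) = exp(-lam) * sum_{i=0}^{k} lam^i / i!  for a nonnegative integer k."""
    term, total = math.exp(-lam), 0.0
    for i in range(k + 1):
        total += term
        term *= lam / (i + 1)
    return min(total, 1.0)

def pearson_G(b, lam, N=20):
    """Return (G, g) with G = sum_i (g_i - m/N)^2 / (m/N) and g the class counts."""
    m = len(b)
    g = [0] * N
    for bi, li in zip(b, lam):
        # Assumption: the class index is taken from the CDF value at b_i.
        u = poisson_cdf(int(bi), li)
        g[min(int(u * N), N - 1)] += 1
    expected = m / N
    return sum((gi - expected) ** 2 / expected for gi in g), g

# Hypothetical data: two tubes with mean 10, one very low and one very high count.
G, g = pearson_G([0, 100], [10.0, 10.0], N=2)
```

A value of $G$ near zero indicates that the measurements are spread evenly over the classes, which is what is expected when the Poisson means $[Px^{(k)}]_i$ describe the data well.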

Here a new termination test is derived based on stopping the iterations at the solution $x^{(k)}$ which yields the most probable estimate of the emissions, i.e. $\mathring{b}_i \sim \mathcal{P}([\mathring{P}x^{(k)}]_i)$ for all $i$. Because a Poisson distribution can only be approximated by a Gaussian when the mean is significantly large, say 20, tubes with small counts are combined so that $(Px)_i \ge 20$ for all aggregated tubes $i$. Then the Poisson distribution $\mathcal{P}([Px]_i)$ can be approximated as Gaussian $\mathcal{N}([Px]_i, \sqrt{[Px]_i})$ when $[Px]_i \ge 20$, and $\chi^2(m)$ can be approximated by $\mathcal{N}(m, \sqrt{2m})$. Moreover, if $x_i \sim \mathcal{N}(\mu, \sigma)$, $i = 1, 2, \ldots, n$, then $\sqrt{n}\left((1/n)\sum_{i=1}^{n} x_i - \mu\right)/\sigma$ and $\sum_{i=1}^{n} (x_i - \mu)^2/\sigma^2$ follow $\mathcal{N}(0, 1)$ and $\chi^2(n)$ distributions, respectively. This leads to the following algorithm for assessing whether to stop the EM iteration.

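For iteration $k = 1, 2, \ldots$

1. Update $x^{(k)}$ using (2.9).

2. Bin equations with small counts such that all measurements have means of at least 20, i.e. $\lambda_i^{(k)} = [Px^{(k)}]_i \ge 20$ for $i = 1, \ldots, m$. Here $m$ is the number of tubes after binning.

3. Normalize the data through $y_i^{(k)} = (b_i - \lambda_i^{(k)})/\sqrt{\lambda_i^{(k)}}$, yielding $y_i^{(k)} \sim \mathcal{N}(0, 1)$ because $b_i \sim \mathcal{N}(\lambda_i^{(k)}, \sqrt{\lambda_i^{(k)}})$, $i = 1, \ldots, m$.

4. Calculate $\alpha^{(k)} = \sqrt{m}\,\bar{y}^{(k)}$, where $\bar{y}^{(k)}$ is the mean of $\{y_i^{(k)}, i = 1, \ldots, m\}$.

5. Calculate $\beta^{(k)} = \sum_{i=1}^{m} (y_i^{(k)})^2$.

6. Calculate the likelihood

$$l_k = p_N(\alpha^{(k)})\, p_N(\beta^{(k)}) = \frac{1}{2\pi\sqrt{2m}}\, e^{-\left((\alpha^{(k)})^2/2\right) - \left((\beta^{(k)} - m)^2/4m\right)},$$

where $p_N(\alpha^{(k)})$ is the Gaussian density $\mathcal{N}(0, 1)$ at the point $\alpha^{(k)}$ and $p_N(\beta^{(k)})$ is the Gaussian density $\mathcal{N}(m, \sqrt{2m})$ at the point $\beta^{(k)}$.

7. Update $\hat{k} = \arg\max_{j \le k} l_j$.

8. If $l_k < l_{\hat{k}}$, stop and return solution $x^{(\hat{k})}$.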
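The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: `em_update` and `bin_fn` are hypothetical callbacks supplied by the caller, where `em_update(x)` performs one EM update and `bin_fn(x)` returns the binned counts together with the binned means $[Px]_i$, all assumed to be at least 20.

```python
import math

def normal_pdf(t, mean, std):
    """Density of a Gaussian with the given mean and standard deviation at t."""
    return math.exp(-0.5 * ((t - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def stopping_likelihood(b, lam):
    """l_k = p_N(alpha) * p_N(beta) computed from binned counts b and binned means lam."""
    m = len(b)
    y = [(bi - li) / math.sqrt(li) for bi, li in zip(b, lam)]   # normalized residuals
    alpha = math.sqrt(m) * (sum(y) / m)                         # ~ N(0, 1)
    beta = sum(v * v for v in y)                                # ~ chi^2(m) ~ N(m, sqrt(2m))
    return normal_pdf(alpha, 0.0, 1.0) * normal_pdf(beta, m, math.sqrt(2.0 * m))

def run_with_stopping(em_update, bin_fn, x0, max_iter=500):
    """Iterate until l_k falls below its running maximum; return (best image, best k)."""
    x, best_x, best_k, best_l = x0, None, 0, -1.0
    for k in range(1, max_iter + 1):
        x = em_update(x)
        b_binned, lam_binned = bin_fn(x)
        l_k = stopping_likelihood(b_binned, lam_binned)
        if l_k >= best_l:
            best_x, best_k, best_l = x, k, l_k
        else:
            break            # l_k < l at the running maximizer: stop
    return best_x, best_k
```

Only the iterate at the running maximizer needs to be stored, so the rule adds a few vector operations and two Gaussian density evaluations per iteration in this sketch.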

Remark 3.1. Zero tube data are used to keep the complete information. But data associated with zero rows of $P$ are removed, otherwise the artificially increased freedom for the relevant distributions introduces inaccuracy.

Remark 3.2. The Gaussian probability density can be easily calculated using the MATLAB statistics toolbox function pdf: pdf('norm', x, $\mu$, $\sigma$) computes the normal density with mean $\mu$ and standard deviation $\sigma$ at x.

Remark 3.3. The number 20 is not a totally arbitrary choice for the minimal mean. It is selected based on the Poisson nature of the data and does not depend on the density of the radioactivity. The binning is performed to reduce the standard deviation for the binned tubes with low counts and to guarantee that the use of the Gaussian is a good approximation for the Poissonian. Using a smaller threshold would result in a large uncertainty for tubes with low counts. On the other hand, using a larger threshold will yield a larger bin size but may degrade the accuracy due to the loss of useful information about details in the reconstruction. Moreover, a larger bin size may lead to a situation in which the model for the reconstruction is incompletely specified.
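The binning itself is not spelled out in detail, so the following is only one possible realization: tubes whose mean is below the threshold are merged greedily, in order, until the accumulated mean reaches the threshold. Because a sum of independent Poisson variables is again Poisson with the summed mean, the merged counts keep the same statistical model. The function name and the handling of a leftover partial bin are assumptions of this sketch.

```python
def bin_low_counts(lam, b, threshold=20.0):
    """Merge tubes with mean below threshold; return (binned means, binned counts)."""
    lam_out, b_out = [], []
    acc_lam = acc_b = 0.0
    for li, bi in zip(lam, b):
        if li >= threshold:
            lam_out.append(li)            # large-mean tubes are kept as they are
            b_out.append(bi)
        else:
            acc_lam += li                 # accumulate low-mean tubes
            acc_b += bi
            if acc_lam >= threshold:
                lam_out.append(acc_lam)
                b_out.append(acc_b)
                acc_lam = acc_b = 0.0
    if acc_lam > 0.0:
        # Assumption: a leftover partial bin is folded into the last entry
        # (or kept on its own if nothing else exists).
        if lam_out:
            lam_out[-1] += acc_lam
            b_out[-1] += acc_b
        else:
            lam_out.append(acc_lam)
            b_out.append(acc_b)
    return lam_out, b_out

# Hypothetical means and counts for six tubes.
lam_binned, b_binned = bin_low_counts([25.0, 5.0, 8.0, 9.0, 30.0, 3.0],
                                      [24.0, 4.0, 9.0, 10.0, 31.0, 2.0])
```

Apart from the leftover handling, every merged bin built this way has a mean of at least 20 and below 40, since each tube added to a bin contributes less than 20.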


4. Numerical experiments

The transition probability matrix is generated by the Monte Carlo method based on the geometry of the scanner ECAT 951/31 (CTI, Knoxville, TN). A 2D PET scanner of radius 300 mm with 256 parallel detector tubes per angle and a total of 192 angles was modeled. The 128 × 128 pixels are of size 1.8776 mm × 1.8776 mm. 10 million coincidences are uniformly assigned to 128 × 128 pixels. The system matrix is determined by Monte-Carlo random simulation for uniformly distributed radiation angles. Assume $N_j$ lines are associated with pixel $j$ according to a uniform angular distribution to represent $\gamma$-ray pairs emitted from the pixel $j$. Suppose that $N_i$ is the number of these lines which lie in the detector tube $i$, then the probability $p_{ij}$ is given by the expression $p_{ij} = N_i/N_j$.

Fig. 2. The skull removed un-scaled Shepp–Logan phantom $I_0$. Regions 1–4 have densities 0.1, 0.2, 0.3 and 0.4, respectively, and the background has zero density. In the tests $I_0$ is scaled to yield reference images with total counts of 0.5, 1, 6.4 and 12.8 million, respectively.

The skull removed (there is no radioactivity in the bone) 128 × 128 two-dimensional Shepp–Logan phantom, denoted by $I_0$, is used for simulations, Fig. 2. The regions with density 0.1, 0.2, 0.3 and 0.4 are labeled by 1, 2, 3 and 4, respectively. The corresponding numbers of pixels for these regions are 24, 5351, 701 and 14. The remaining pixels have zero density. $I_0$ is scaled so that the total counts are 0.5, 1, 6.4 and 12.8 million, respectively. The scaled true images are denoted by $x^*$. The measured sinograms are set to be Poisson samples from $\mathcal{P}(Px^*)$, i.e. $b \sim \mathcal{P}(Px^*)$, using the MATLAB function poissrnd(Px*). By assuming that the sinogram follows a true Poisson distribution it is assumed that the detector efficiency equals 1 and the effects of attenuation and scattering are corrected. Each case is simulated with 100 realizations of the sinogram.

The initial image, $x^{(0)}$, used in the EM algorithm has uniform density and the total number of initial counts is set to 0.5, 1, 6.4 and 12.8 million accordingly. The bin size for the low count sinograms is set to 20 counts. Thus the size of the merged tubes will be greater than or equal to 20 and less than 40. Veklerov and Llacer's implementation of the Pearson's $\chi^2$ test for determining the terminating iteration in this situation is also tested for comparison [16]. As in [16], the number of classes is set to 20, but rather than using all nonzero sinogram data, only those data with means greater than or equal to 1, $Px^{(k)} \ge 1$, are used in the calculation of the $\chi^2$ test. In these simulations about 35.6% of the nonzero entries are less than 1, $Px^{(k)} < 1$, and 33.3% are even less than 0.01. If all the nonzero entries of the sinograms are included for the calculation of the $\chi^2$ test, the generated value for $G^{(k)}$ for all iterations is far higher (at least 10 times greater) than the critical value suggested by the $\chi^2$ test. Although the Pearson's $\chi^2$ test is assumed to provide a range of iterations which will provide an acceptable image, there are some cases for which no $k$ can be found such that $G^{(k)}$ is less than the critical value. To avoid this issue the terminating $k$ is chosen to minimize $G^{(k)}$, i.e. $k$ is chosen with respect to the optimal value of $G^{(k)}$ and thus no significance level is needed.

Fig. 3. Representative EM reconstructed images for a simulation (a)–(c) and objective function $h(x^{(k)})$ decrease against number of EM iterations with 12.8 million total counts. The best iteration to terminate the iteration, as measured by the minimum error, is 89.

4.1. Results

The difference between the reconstructed image at iteration $k$, $x^{(k)}$, and the true image $x^*$ is measured using the normalized percentage root mean square error

$$\mathrm{err} = \frac{\|x^{(k)} - x^*\|}{\|x^*\|}\, 100\%. \tag{4.10}$$

For each region of interest (ROI) $\mathrm{ROI}_i$, $i = 1, \ldots, 4$, $x^{(i)}$ and $R_i$ denote the assigned counts and number of pixels of each region, respectively. $k_{pred}$ denotes the predicted iteration at which a stopping rule indicates that the optimal iteration has been reached and $k_{best}$ denotes the iteration at which the corresponding image is closest to the true image as measured through the 2-norm distance

$$k_{best} = \arg\min_k \|x^{(k)} - x^*\|.$$

Differences $dk_{new}$ and $dk_{\chi^2}$ are calculated via $dk = k_{pred} - k_{best}$, for the new termination test and the Pearson's $\chi^2$ test [16], respectively.

The deviation of the estimated value for realization $l$ at pixel $j \in \mathrm{ROI}_i$ to $x^{(i)}$ is $d_{jl} = (x_j^{(k_{pred},l)} - x^{(i)})$. The mean and absolute deviation of $d_{jl}$ over the 100 realizations for each region are

$$\mu_i = \frac{1}{100 R_i} \sum_{l=1}^{100} \sum_{j \in \mathrm{ROI}_i} d_{jl}, \tag{4.11}$$

$$d_i = \frac{1}{100 R_i} \sum_{l=1}^{100} \sum_{j \in \mathrm{ROI}_i} |d_{jl}|. \tag{4.12}$$

Throughout, the measurements for the normalized bias $\mu_i/x^{(i)}$ and deviation $d_i/x^{(i)}$ are presented as percentages.
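These error measures are simple to compute once the iterates and the true image are available. The sketch below assumes images stored as flat lists of pixel values and an ROI given as a list of pixel indices; the function names and this data layout are choices made for the example, not something specified in the paper.

```python
import math

def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))

def percent_error(x, x_true):
    """err = ||x - x*|| / ||x*|| * 100, the normalized percentage error."""
    return 100.0 * norm2([a - c for a, c in zip(x, x_true)]) / norm2(x_true)

def best_iteration(iterates, x_true):
    """k_best = argmin_k ||x^(k) - x*||, where iterates[0] holds x^(1)."""
    errs = [percent_error(x, x_true) for x in iterates]
    return 1 + min(range(len(errs)), key=errs.__getitem__)

def roi_bias_and_deviation(images, roi, level):
    """Mean and mean absolute value of d_jl = x_j - level over ROI pixels and realizations."""
    d = [img[j] - level for img in images for j in roi]
    count = len(images) * len(roi)
    return sum(d) / count, sum(abs(t) for t in d) / count

# Hypothetical two-pixel example with three stored iterates.
x_true = [3.0, 4.0]
k_best = best_iteration([[0.0, 0.0], [3.0, 3.5], [3.0, 5.0]], x_true)
```

Dividing the two values returned by `roi_bias_and_deviation` by the assigned level of the region and multiplying by 100 gives the normalized percentages.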

The EM reconstructed images at various stopping iterations for a representative realization with total counts 12.8 million are presented in Fig. 3(a)–(c) and the convergence of the likelihood functions is illustrated in (d). The equivalent error curves calculated using (4.10), and the iterations to the minima, are presented in Fig. 4(a). As the negated log-likelihood function decreases to convergence the images eventually deteriorate and the percentage error increases. Equivalently, while $x^{(k)}$ converges to the ML solution $x_{ML}$, the image series eventually diverges from the true image $x^*$. The curves (functions $l_k$ and $G^{(k)}$) used to determine the iteration at which to stop the iteration using the presented rule and the use of the $\chi^2$ test are illustrated in Fig. 4(b) and (c).




[Figure: panels (a) percentage error, (b) $l_k$ and (c) $G^{(k)}$ against step $k$, for 12.8 M, 6.4 M, 1 M and 500 K counts.]

Fig. 4. Representative data, err (a), $l_k$ (b), and $G^{(k)}$ (c), for four simulations. The best stopping iterations are $k_{best} = [89, 64, 28, 20]$ for 12.8, 6.4, 1 million and 500 K counts, respectively (a). The predicted stopping iterations by the max $l_k$ based new rule and the min $G^{(k)}$ based $\chi^2$ rule are $k_{new} = [99, 70, 21, 16]$ (b) and $k_{\chi^2} = [95, 70, 21, 16]$ (c), respectively.

Table 2
The bias, standard deviation and worst case of predicted iteration errors for $dk_{new}$ and $dk_{\chi^2}$ over 100 realizations. Data are presented as the triples bias (std, worst).

| | 500 K | 1 M | 6.4 M | 12.8 M |
|---|---|---|---|---|
| New | −0.42 (1.80, 5) | 0.18 (1.89, 6) | 5.34 (3.77, 15) | 6.06 (5.12, 21) |
| $\chi^2$ | −3.10 (2.35, −6) | −5.65 (2.77, −12) | 0.24 (5.91, 21) | −0.11 (6.75, 20) |

Table 2 contrasts the estimates of the optimal stopping iteration over simulations with 100 realizations in each case. Given are the bias, standard deviation and worst case for $dk_{new}$ and $dk_{\chi^2}$, respectively. The histograms in Fig. 5 complement the statistics for this comparison. It can be concluded that the overall ability of the presented rule to accurately find the best stopping iteration is better for simulations with fewer counts, namely 0.5, 1 and 6.4 million counts. For 12.8 million counts the results are comparable. The variance in $dk_{new}$ is less than that of $dk_{\chi^2}$, although the bias may be larger.

[Figure: histograms of frequency against $dk$ for the new rule and the $\chi^2$ rule; panels (a) 500 K counts, (b) 1 M counts, (c) 6.4 M counts, (d) 12.8 M counts.]

Fig. 5. Histograms of predicted iteration errors $dk_{new}$ and $dk_{\chi^2}$ over 100 realizations.





0 50 100

0

50

100

150 New, 25(5)χ2, 15(−6)True

0 50 100

0

50

100

150

200

250

300 New, 34(6)χ2, 16(−12)True

0 50 100

0

500

1000

1500

2000 New, 77(15)χ2, 84(21)True

0 50 100

0

1000

2000

3000

4000 New, 112(21)χ2, 109(20)True

a 500K counts b 1M counts

c 6.4M counts d 12.8 M counts

F ationsi on (de

HFtFwwOiemapfiIacnti

nsT

�2 test this additional cost is 34.18 s as compared to just 0.01 forthe presented stopping rule. While the stopping rule based on the�2 test needs to evaluate a value associated with a Poisson CDF,F(x|) = e−

∑floor(x)i=0 i/i!, for every measurement bi at each itera-

Table 3Normalized bias �i/x(i) for each ROI for the two methods “new (�2)”.

ig. 6. Profile for the 70th row of the worst images in terms of err from 100 realizteration and deviation to the best stopping iteration are given in the legend, iterati

igher count cases need more iterations to reach the best image.rom Fig. 4(b) it is apparent that for a realization with more countshe optimal stopping iteration is higher, but it is also clear fromig. 4(a) that the iteration can be terminated sooner than optimumhen there is a high number of counts while still obtaining an imagehich is improved as compared to the low count case (err < 20%).n the other hand using more counts the image quality can be

mproved so that err < 10% is achieved. These graphs also show thatrror curves for realizations with fewer counts are sharper at theinimum than those with higher counts, where the curves are rel-

tively flat at the minimum. This means that a small change in theredicted value to stop the iteration may have a much larger impactor low count than large count cases. Hence larger bias and variancen dknew and dk�2 can be tolerated for the cases with large counts.n other words, the performance of a stopping rule for images gener-ted from low count realizations is much more important than for highount situations. This observation supports the conclusion that theew stopping rule outperforms the stopping rule detection usinghe �2 test. Results using the worst case estimates of the stopping


teration are illustrated in Fig. 6.In contrasting the methods it is also pertinent to examine the

umber of counts of the reconstructed image at the predictedtopping iteration. �i/x(i) and di/x(i) are reported for each ROI inables 2 and 3, respectively. The differences between the two stop-

for the new presented stopping rule and the �2 test. The corresponding stoppingviation).

ping rules are significant for the low count case (0.5 and 1 million)but not for the higher count cases (12.8 and 6.4 million). �i/x(i) isillustrated for the low count case for four ROIs in Fig. 7. The newstopping rule may perform marginally better in generating smallregions than the �2 test, as is illustrated for ROI 4 in which thereis a higher percentage of counts which is underestimated whengenerated using the �2 test (Table 4).

The computational cost for one EM iteration iteration is about0.16 s for a 2D image of size 128 × 128 on a PC with 1 GHz CPUand MATLAB code. The stopping rule calculation requires additionaltime per iteration above and beyond the actual EM iteration. For the


500 K 1 M 6.4 M 12.8 M

ROI 1 15.71 (18.21) 9.97 (13.56) −0.55 (0.15) −2.90 (−2.52)ROI 2 −6.13 (−6.83) −4.94 (−5.74) −2.91 (−3.05) −2.38 (−2.48)ROI 3 −7.88 (−8.74) −6.07 (−7.32) −2.89 (−3.09) −2.25 (−2.36)ROI 4 −20.98 (−24.39) −13.41 (−18.36) −4.75 (−5.05) −3.82 (−3.95)




8 H. Guo, R.A. Renaut / Computerized Medical Imaging and Graphics xxx (2010) xxx–xxx

tion the new approach requires the evaluation of only one Gaussian density at each iteration. Overall, for the 105 EM iterations required for this 2D image, the total cost of using the $\chi^2$ test to estimate the optimal iteration is about one hour, whereas the presented rule uses just 17.9 s.

[Figure: four histogram panels, ROI 1 to ROI 4, comparing the new rule with the $\chi^2$ test.]
Fig. 7. Histograms of �i/x(i) for each ROI over 100 realizations of the 1 million count case.

Table 4. Normalized deviation d_i/x(i) for each ROI for the two methods, reported as “new ($\chi^2$)”.

         500 K           1 M             6.4 M           12.8 M
ROI 1    24.35 (24.73)   22.80 (22.19)   26.37 (25.64)   28.73 (28.13)
ROI 2    14.92 (14.34)   13.53 (12.66)   10.29 (10.06)   9.13 (8.97)
ROI 3    13.58 (13.35)   11.88 (11.63)   8.12 (7.97)     7.07 (6.95)
ROI 4    21.42 (24.58)   14.60 (18.58)   8.14 (8.07)     7.85 (7.74)

Fig. 8. Illustrating the slow decrease in the maximum value of the KT condition for convergence. (a) Illustrates the slow decrease of the maximum value for iterations 1–500 and (b) the actual pixel values at iteration 500.

Fig. 9. Illustrating absolute values of the partial derivatives of the EM solution at iteration $k_{\mathrm{best}} = 95$. In (b) with the background set to zero.

4.2. Discussion

4.2.1. Observations for Kuhn–Tucker conditions and gradients

As the iteration number increases, the EM solution gradually tends to a solution of the KT equation, as shown in Fig. 8(a) for a realization with 12.8 million counts. But $|x^{(500)} .* \nabla h(x^{(500)})|$ is still quite large at active pixels, those with positive emissions, even after 500 iterations, Fig. 8(b). This occurs because emissions $x$ are usually large. Considering $|\nabla h(x)|$, on the other hand, clearly the partial derivatives are already small (<0.016) at iteration $k_{\mathrm{best}} = 95$, Fig. 9, and the EM algorithm converges slowly. This is true for any algorithm which seeks to minimize $h(x)$ when $h(x)$ becomes very flat. In this case, when $x$ is close to $X_{\mathrm{ML}}$, the term in the Hessian $b ./ (P x^{(k)})^2$ is quite small when the scale of $P x^{(k)}$ is comparable with that of $b$. The exact EM (i.e. ML) solution is, however, not of interest because it is not the true solution. An image at an early EM iteration may be closer to the true image than is $X_{\mathrm{EM}}$.
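The two quantities compared in this subsection, the Kuhn–Tucker product and the plain gradient, can be monitored during EM iterations with a small sketch like the one below. It assumes that h(x) is the Poisson negative log-likelihood, h(x) = sum(Px - b .* log(Px)), which is consistent with the Hessian term b./(Px)^2 quoted above but is not defined in this section. The 3 x 2 matrix `P`, the data `b`, and all function names are hypothetical toy choices, not the authors' implementation or data.

```python
# Minimal sketch (assumption: h(x) = sum(Px - b*log(Px)), the Poisson
# negative log-likelihood). P and b are hypothetical toy values.

def forward(P, x):
    """Forward projection Px for a dense matrix stored as a list of rows."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]

def gradient(P, b, x):
    """Gradient of h: P^T (1 - b./(Px))."""
    r = [1.0 - bi / yi for bi, yi in zip(b, forward(P, x))]
    return [sum(P[i][j] * r[i] for i in range(len(P))) for j in range(len(x))]

def em_step(P, b, x):
    """One EM update: x_j <- x_j / s_j * sum_i P_ij * b_i / (Px)_i."""
    ratio = [bi / yi for bi, yi in zip(b, forward(P, x))]
    new_x = []
    for j in range(len(x)):
        sens = sum(row[j] for row in P)                       # s_j, column sum
        back = sum(row[j] * ri for row, ri in zip(P, ratio))  # backprojection
        new_x.append(x[j] * back / sens)
    return new_x

def kt_residual(P, b, x):
    """Maximum over pixels of |x .* grad h(x)|."""
    return max(abs(xj * gj) for xj, gj in zip(x, gradient(P, b, x)))

P = [[0.6, 0.2], [0.3, 0.5], [0.1, 0.3]]  # toy transition matrix
b = [68.0, 58.0, 24.0]                    # toy noisy sinogram
x = [1.0, 1.0]                            # positive initial image

history = []
for k in range(200):
    x = em_step(P, b, x)
    history.append(kt_residual(P, b, x))

max_gradient = max(abs(g) for g in gradient(P, b, x))
print(history[0], history[-1], max_gradient)
```

Because the product is scaled by the pixel values, it can remain large at active pixels even when the partial derivatives themselves are already small, which is the effect described in the text for Fig. 8(b) and Fig. 9.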

4.2.2. Stopping rule for the ordered subset expectation maximization algorithm

The presented stopping rule for the EM algorithm can also be easily implemented in conjunction with OSEM, which has been adopted by many scanner manufacturers and is used in many PET centers. In the OSEM method, the complete sinogram is divided into subsets which are chosen to maximize the associated geometry angles between subsets, with the intent that each subset contributes maximum new information. One standard EM iteration is performed for each subset, using as the initial image the image obtained from the previous subset iteration. A single OSEM iteration processes all subsets, or has multiple sub-iterations, and updates the image multiple times. But the computational time is still comparable to a single standard EM iteration, which updates the image once only. Because the data in each subset are still Poisson distributed, the presented stopping rule can be used after each sub-iteration without any changes.
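The OSEM structure described above can be sketched as follows. This is an illustrative outline only. The subset partition, the toy matrix `P`, the data `b`, and the `on_subiteration` hook are all hypothetical. The hook merely marks the point after each sub-iteration where a stopping test could be evaluated; the stopping rule itself is not reproduced here.

```python
# Minimal OSEM sketch with toy values. Names and data are assumptions made
# for illustration, not the authors' code.

def em_update(P, b, x, rows):
    """EM update of image x using only the sinogram rows listed in `rows`."""
    y = [sum(P[i][j] * x[j] for j in range(len(x))) for i in rows]
    ratio = [b[i] / yi for i, yi in zip(rows, y)]
    new_x = []
    for j in range(len(x)):
        sens = sum(P[i][j] for i in rows)                     # subset sensitivity
        back = sum(P[i][j] * r for i, r in zip(rows, ratio))  # subset backprojection
        new_x.append(x[j] * back / sens)
    return new_x

def osem_iteration(P, b, x, subsets, on_subiteration=None):
    """One OSEM iteration: one EM update per subset, each starting from the
    image produced by the previous subset."""
    for s, rows in enumerate(subsets):
        x = em_update(P, b, x, rows)
        if on_subiteration is not None:
            on_subiteration(s, x)  # a stopping test could be evaluated here
    return x

P = [[0.4, 0.1], [0.3, 0.2], [0.2, 0.3], [0.1, 0.4]]  # toy transition matrix
b = [47.0, 38.0, 36.0, 29.0]                          # toy noisy sinogram
subsets = [[0, 2], [1, 3]]                            # two interleaved subsets

calls = []
x = osem_iteration(P, b, [1.0, 1.0], subsets,
                   on_subiteration=lambda s, img: calls.append(s))
print(x, calls)
```

With a single subset containing every row, `osem_iteration` reduces to one standard EM update, which matches the description that each sub-iteration is an ordinary EM step on a subset of the data.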

4.2.3. Notes for clinical use

When applied to image reconstruction from data with a relatively low number of total counts, the presented stopping rule performs better than accepted reconstruction approaches. In practice the optimal iteration at which to stop the iteration, $k_{\mathrm{best}}$, is unknown, but the results suggest that the estimate for $k_{\mathrm{best}}$ which is obtained using the presented stopping rule is to be trusted. Moreover, phantom data can be used for a given scanning environment to provide a reference value for $k_{\mathrm{best}}$ as a function of the total number of counts for a scan under the same environment.

5. Conclusions

This article presents a new rule by which it can be determined when to stop the iterations of the EM algorithm when applied for PET image reconstruction. The iteration should stop at the solution which best explains statistically the Poisson nature of the detector measurements. This new stopping rule was justified on simulated data by comparing with images obtained when the iteration is stopped using the $\chi^2$ test. These simulations show that the new rule accurately provides a good estimate of the iteration at which to stop the iteration and outperforms the Pearson’s $\chi^2$ test in terms of both accuracy (particularly for low count cases) and computational cost. Moreover, the presented approach can be easily adopted for OSEM reconstruction.


Acknowledgements

This work was supported by Arizona Center for Alzheimer’s Disease Research, funded by the Arizona Department of Health Services, NIH grant EB 2553301, NSF DMS 0652833, 0513214, 0937737 and 0966270. The authors thank Dr. Chi-Chuan Chen for providing the transition matrix, generated by the Monte Carlo method, and Dr. Yoram Bresler for his constructive suggestion on the experimental design.


