
Mat-2.108 Independent Research Project in Applied Mathematics

Perfusion Deconvolution via EM Algorithm

27th January 2004

Helsinki University of Technology
Department of Engineering Physics and Mathematics
Systems Analysis Laboratory

Helsinki Brain Research Center
Functional Brain Imaging Unit

Tero Tuominen
51687J


Contents

List of abbreviations and symbols

1 Introduction

2 Perfusion Model and Problem Description
2.1 Discretization: 0th order approximation
2.2 SVD Solution to Deconvolution
2.3 Discretization: 1st order approximation

3 EM Algorithm
3.1 Overview of EM Algorithm
3.2 EM Algorithm applied to Perfusion Deconvolution
3.2.1 Lange's Method in PET image reconstruction
3.2.2 Vonken's Method

4 Improved application of EM

5 This Work
5.1 Overview
5.2 Detailed description and the parameters used

6 Results

7 Conclusions

References


List of abbreviations and symbols

MRI   Magnetic Resonance Imaging
fMRI  Functional Magnetic Resonance Imaging
PWI   Perfusion Weighted Imaging
EM    Expectation Maximization
MLE   Maximum Likelihood Estimate
MTT   Mean Transit Time
CBV   Cerebral Blood Volume
CBF   Cerebral Blood Flow
SNR   Signal-to-Noise Ratio
TR    Time-to-Repeat
TE    Time-to-Echo
EPI   Echo-Planar Imaging
AIF, a(t)  Arterial Input Function
TCC, c(t)  Tissue Concentration Curve

r(t)   Residue Function
Ψ(t)   Impulse Response; Ψ(t) = CBF · r(t)
a      (boldface) vector or matrix, a ∈ ℝ^{n×m}, n, m > 1
a      scalar, a ∈ ℝ
A      random variable
a      a realization of a random variable
A      (boldface) random vector or matrix
a      (boldface) a realization of a random vector or matrix


1 Introduction

Since its introduction in 1988, perfusion weighted fMRI has gained widespread interest in the field of medical imaging. It offers an easy and, most importantly, non-invasive method for monitoring brain perfusion, and even its minor changes, in vivo. The general principles of perfusion weighted imaging (PWI) were introduced by Villringer et al. in 1988 [1] and further developed by Rosen et al. in 1989 [2]. By injecting a bolus of an intravascular paramagnetic contrast agent and observing its first-passage concentration-time curves in the brain, they were able to gain valuable insight into the functioning of the living organ.

The theory of the kinetics of intravascular tracers was developed by Meier and Zierler in 1954 [3]. To gain all the knowledge methodologically possible, one must recover the so-called impulse response function for each volume of interest. This function characterises the local perfusion properties. According to the work of Meier and Zierler, however, in order to recover this function one must solve an integral equation of the form

c(t) = \int_0^t a(\tau)\, \Psi(t - \tau)\, d\tau.

This is a typical member of the class of equations known as Fredholm integral equations. The integral is a convolution; solving this kind of equation is therefore widely known as deconvolution.

Deconvolution belongs to the class of inversion problems. That is, the theory of Meier and Zierler (the equation above) describes how the input function a changes as it experiences the effects of the properties of the vasculature and of the local perfusion (characterized by the impulse response Ψ). The result is a new function c. The inverse of this problem emerges when one measures the input function a and its counterpart c and asks from what kind of mechanism these changes originate, i.e. what the impulse response Ψ is.

Several methods have been proposed to solve the inversion problem. Traditional methods such as Fourier and Laplace techniques fail in this case due to the significant amount of noise present in the measurements. The noisy data and the form of the problem, a typically hard-to-solve Fredholm equation, impose an additional requirement on the solution method: the solution has to be recovered so that the effect of the noise is either cancelled out or in some other way ignored, because an exact solution computed directly from the noisy data is heavily biased and physiologically meaningless. This fact highlights the significance of the physical model on which the solution method is based.

The current standard method is based on an algebraic decomposition known as Singular Value Decomposition (SVD). It requires discretization of the equation and then regularises the ill-conditioned system of equations by cutting off the smallest singular values. The method was introduced to the field by Østergaard et al. [4].

An alternative methodology for the inversion is based on a probabilistic formulation of the model and on solving it in terms of maximum likelihood. Such a method was first introduced by Vonken et al. in 1999 [5]. It is based on the Expectation-Maximization (EM) algorithm developed by Dempster et al. in 1977 [6]. The EM algorithm was introduced to the field of medical imaging independently by Shepp and Vardi in 1982 [7] and Lange and Carson in 1984 [8], and further developed by Vardi, Shepp and Kaufman [9]. Vonken's work relies heavily on that of Lange.

There are four goals for this work. First, there is no comprehensive description of EM-based perfusion deconvolution; Vonken's paper is very dense and brief when it comes to the theory. In some parts it is even inaccurate and falsely justified. Here we therefore try to offer a comprehensive and thorough description of the EM algorithm and its application. We take particular care to formulate our presentation in a mathematically fluent form.

Secondly, Vonken tries to base his version of the algorithm on the physical model but fails to some extent. He simplifies at the expense of the physical model by borrowing one result directly from Lange. The problem is that the result is derived assuming a Poisson distribution for random variates that in reality follow a normal distribution. In this work we correct this assumption, and also the other inaccurate parts of Vonken's work, and see whether the results are affected.

Third, we try to repeat Vonken's results, and for this purpose a computer program had to be created. We also implement the proposed changes and compare their effects. These programs are created in such a manner that they can later serve as research tools at the Helsinki Brain Research Center; the HBRC currently lacks such tools. The comparison of the methods is carried out by Monte Carlo simulations. Since the main interest of this report, however, is in the theoretical aspects of the EM application, we do not concentrate too much on the simulations and they are not meant to fully cover the subject.

The fourth and last goal of this report is to fulfill the requirements of the course Mat-2.108 Independent Research Project in Applied Mathematics at Helsinki University of Technology, Systems Analysis Laboratory.

This report is organized as follows. First, in chapter 2 the perfusion model and the problem description are presented. The SVD solution method and the discretization are also dealt with. Chapter 3 then describes the general EM algorithm. It is followed by an introductory example of the use of EM in a typical problem; that is, the EM complete-data embedding derived and used by Lange [8] and later adopted by Vonken [5] is revisited. The aim is to offer a simple example and to lay the grounds for the later developments and for the presentation of Vonken's work. Such a derivation is not present even in Lange's original article. Chapter 4 is entirely devoted to the derivation of the corrected probabilistic model and the EM algorithm based on it. Since the simplifications used by Vonken are omitted, the derivation is tedious.

The later chapters include the description of the simulations and their results. The last chapter gives the conclusions.


2 Perfusion Model and Problem Description

Villringer and Rosen introduced the general principles of MR perfusion imaging in 1988 and 1989 ([1], [2]). Using a paramagnetic intravascular contrast agent, they were able to detect a measurable change in the time series of the MR signal S(t). Assuming a linear relationship between the concentration of a contrast agent c(t) and the change in the transverse relaxation rate ∆R2, the concentration as a function of time can be characterized as

c(t) \propto \Delta R_2 = -\frac{1}{TE} \ln\frac{S(t)}{S_0},    (1)

where S0 is the baseline intensity of the signal.

For intravascular tracers, i.e. tracers that remain strictly inside the vasculature, the theoretical framework for mathematical analysis was developed by Meier and Zierler in 1954 [3]. According to their work, the concentration of a contrast agent in the vasculature as a function of time can be represented as

c(t) = F \int_0^t a(\tau)\, r(t - \tau)\, d\tau,    (2)

where a(t) is the concentration in a large artery (also called the Arterial Input Function, AIF) feeding the volume of interest (VOI). c(t) on the left-hand side of equation 2 typically refers to the concentration further in the tissue and is thus also called the Tissue Concentration Curve or TCC. r(t) is the so-called residue function, which is the fraction of tracer remaining in the system at time t. Formally it is defined as

r(t) = 1 - \int_0^t h(s)\, ds,    (3)

where h(t) is the distribution of transit times, i.e. of the time a plasma particle takes to travel through the capillary vasculature detectable by dynamic susceptibility contrast MRI (DSC-MRI). That is, h(t) is a probability density function. Hence r(t) has the following properties: r(0) = 1 and r(∞) = 0. In practice it is also possible that the TCC is delayed by some time td, due to the non-zero distance from where the AIF is measured to where the TCC is measured. In theory, this shifts r(t) to the right. Hence, a more general form of the residue function is

r_d(t) = \begin{cases} 0 & t < t_d \\ r(t - t_d) & t \geq t_d \end{cases}    (4)

From now on we will use the more general rd(t) without explicit statement and denote it simply by r(t).

In perfusion weighted imaging the TCC c(t) and the AIF a(t) are measured. The goal is to find the solution of the integral equation 2, i.e. to find the impulse response Ψ(t) = F · r(t). This impulse response characterizes the properties of the underlying vasculature to the extent that is methodologically possible.

In practical PWI the main interest, however, is in the parameters MTT and CBF, whose interdependence is characterized by the Central Volume Theorem [3]

CBV = MTT · CBF (5)

MTT is the so-called Mean Transit Time, i.e. the expectation of h(t), and CBF is the Cerebral Blood Flow, that is, F in equation 2. The CBV is simply the area under the c(t) curve. In this work we concentrate on recovering only the CBF. Nevertheless, for this purpose the whole impulse response has to be recovered.
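To make the roles of the parameters concrete, the following sketch shows how the perfusion parameters could be read off once the impulse response has been recovered. The function name, the trapezoidal area for the CBV and the use of the maximum of Ψ as the CBF estimate are illustrative assumptions, not prescriptions of this report.

import numpy as np

def perfusion_parameters(psi, c, dt):
    # Hypothetical helper: derive CBF, CBV and MTT from a recovered impulse response.
    # psi : discrete impulse response Psi(t) = CBF * r(t)
    # c   : measured tissue concentration curve c(t)
    # dt  : sampling interval (TR)
    cbf = psi.max()              # since r(0) = 1, the peak of Psi estimates F = CBF
    cbv = np.trapz(c, dx=dt)     # area under the TCC, as stated in the text
    mtt = cbv / cbf              # Central Volume Theorem, eq. 5
    return cbf, cbv, mtt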

2.1 Discretization: 0th order approximation

The measurements of a(t) and c(t) are made at discrete time points t0, t1, t2, . . . , tn, where the time between consecutive measurements is ∆t = TR. This represents a natural discretization of problem 2. Traditionally eq. 2 is discretized directly with the assumption that both a(t) and c(t) are constant over the time interval ∆t [4].

This zeroth order (step function) approximation of the convolution integral 2 leads to the following linear formulation of the problem

c(t_j) = c_j = \int_0^{t_j} a(\tau)\, \Psi(t_j - \tau)\, d\tau \approx \Delta t \sum_{i=0}^{j} a_i \Psi_{j-i}    (6)

where a(ti) = ai and Ψ(tj) = Ψj. By defining the matrix a0 ∈ ℝ^{n×n} as

a_0 = \Delta t \begin{pmatrix} a_0 & 0 & \cdots & 0 \\ a_1 & a_0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ a_n & a_{n-1} & \cdots & a_0 \end{pmatrix}    (7)

and the discrete versions of Ψ(t) and c(t) as column vectors Ψ ∈ ℝ^{n×1} and c ∈ ℝ^{n×1}, it is possible to rewrite the approximated eq. 6 briefly as

c = a_0 \cdot \Psi    (8)

In practice, however, TR is of the magnitude of seconds and a(t) varies in magnitude from 10 to 30 within a few seconds. This naturally gives rise to discretization errors.
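As an illustration of how the operator of eq. 7 could be assembled in practice, the following sketch builds the lower triangular matrix a0 from sampled AIF values; the function name and the NumPy-based implementation are assumptions made for illustration only.

import numpy as np

def conv_matrix_0(a, dt):
    # Zeroth order operator of eq. 7: element (i, j) equals dt * a_{i-j} for i >= j,
    # and zero above the diagonal.
    n = len(a)
    a0 = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            a0[i, j] = a[i - j]
    return dt * a0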


2.2 SVD Solution to Deconvolution

Traditionally in perfusion fMRI the equation 8 is solved via Singular Value Decomposition (SVD) [4]. This regularises the typically ill-conditioned system of linear equations 8. In general, the SVD of a matrix a ∈ ℝ^{m×n} is

a = U \cdot D \cdot V^\top    (9)

where U ∈ ℝ^{m×m} and V ∈ ℝ^{n×n} are orthogonal so that U^⊤ · U = V^⊤ · V = I, where I is an identity matrix. D is a diagonal matrix with the same dimensionality as a, and its elements are the so-called singular values σi, i = 1, . . . , n, i.e. D = diag{σi}.

The regularizing properties of the SVD come up simply when inverting the decomposed matrix a. From 9 it is easy to see that

a^{-1} = V \cdot \mathrm{diag}\{1/\sigma_i\} \cdot U^\top    (10)

Now, if some singular values are very small, i.e. σi ≪ 1, the inversion becomes unstable as the corresponding elements of the diagonal grow. Hence a pseudo-inversion is performed in the case of small singular values; that is, the large elements 1/σi corresponding to small singular values σi are simply set to zero. In practice this requires a threshold below which singular values are ignored. In the case of perfusion inversion this threshold has been shown to be 0.2 × the largest singular value [4].

The SVD solution (pseudo-inverse) is not suitable for the approximation presented in the next subsection, because the trapezoidal approximation weights separate elements of a differently.
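A minimal sketch of the truncated pseudo-inverse described above is given below, assuming NumPy, the 0.2 relative threshold of [4] and a hypothetical function name.

import numpy as np

def svd_deconvolution(a0, c, rel_threshold=0.2):
    # Truncated-SVD pseudo-inverse of eqs. 9-10: singular values below
    # rel_threshold times the largest one are ignored.
    U, s, Vt = np.linalg.svd(a0)
    inv_s = np.zeros_like(s)
    keep = s >= rel_threshold * s.max()
    inv_s[keep] = 1.0 / s[keep]
    psi = Vt.T @ np.diag(inv_s) @ U.T @ c   # estimate of the impulse response
    return psi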

2.3 Discretization: 1st order approximation

The first order (trapezoidal) approximation of the convolution integral 2 is adopted from Jacquez [10]. The measurements of a(t) and c(t) are made at the discrete time points t0, t1, t2, . . . , tn. Now 2 at time tj is approximated as

c_j \approx \frac{\Delta t}{2} \sum_{i=1}^{j} \left( a_{j-i} \Psi_i + a_{j-i+1} \Psi_{i-1} \right)    (11)

Assuming a0 = 0 and defining a1 as

a_1 = \frac{\Delta t}{2} \begin{pmatrix} a_1 & 0 & \cdots & 0 \\ a_2 & 2a_1 & \cdots & 0 \\ a_3 & 2a_2 & 2a_1 & 0 \\ \vdots & & \ddots & \vdots \\ a_n & 2a_{n-1} & \cdots & 2a_1 \end{pmatrix}    (12)


we can write 11 briefly in vector notation as

c = a1 ·Ψ (13)

This does not help in the case of the SVD solution, but it might be of assistance where a direct discrete convolution is needed. EM is one such case.
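A corresponding sketch for the trapezoidal operator of eq. 12 is given below, again assuming NumPy and a hypothetical function name; the sampled AIF is assumed to start from a_0 = 0 as stated above, so the first row of the product reproduces c_0 = 0.

import numpy as np

def conv_matrix_1(a, dt):
    # First order (trapezoidal) operator of eq. 12: the first column holds a_i,
    # the remaining sub-diagonal entries hold 2 * a_{i-j}; a[0] is assumed to be 0.
    n = len(a)
    a1 = np.zeros((n, n))
    for i in range(n):
        a1[i, 0] = a[i]
        for j in range(1, i + 1):
            a1[i, j] = 2.0 * a[i - j]
    return 0.5 * dt * a1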


3 EM Algorithm

McLachlan encapsulates the essence of the EM algorithm as follows [11]:

The Expectation-Maximization (EM) algorithm is a broadly applicable approach to the iterative computation of maximum likelihood (ML) estimates, useful in a variety of incomplete-data problems [. . . ] On each iteration of the EM algorithm, there are two steps – called the expectation step or the E-step and the maximization step or the M-step. [. . . ] The notion of 'incomplete data' includes the conventional sense of missing data, but it also applies to situations where the complete data represents what would be available from some hypothetical experiment. [. . . ] even when a problem does not at first appear to be an incomplete-data one, computation of the MLE is often greatly facilitated by artificially formulating it to be as such.

The first general treatment of the EM algorithm was published by Dempster et al. in 1977 [6]. Since then it has been applied in numerous different fields. In perfusion fMRI it was first used by Vonken et al. in 1999 [5]. Vonken's work relies heavily on that of Lange from 1984 [8]. Lange, however, applied EM to PET image reconstruction.

In this chapter a brief overview of the EM algorithm is offered first. It culminates in the statement of both the E- and M-steps in eqs. 18 and 19. This is followed by an introductory overview of Lange's method [8], which is meant to offer a comprehensive example of the use of EM in a typical problem. Next, Vonken's method [5] is introduced. Particular care has been taken to formulate the assumptions made in a mathematically fluent form.

3.1 Overview of EM Algorithm

Here we offer a brief recap of the EM theory, following McLachlan's book [11].

Let Y be the random vector corresponding to the observed data y, that is, y is Y's realization. Y has the probability density function (pdf) g(y;Ψ), where Ψ is the vector containing the unknown parameters to be estimated. The complete-data random vector will respectively be denoted by X and its realization by x. X has the pdf f(x;Ψ).

The complete-data log likelihood function that could be formed for Ψ if x were fully observable is

ln L(Ψ) = ln f(x;Ψ) (14)

Define h as a many-to-one mapping from the complete-data sample space X to the incomplete-data sample space Y

h : X → Y (15)


Now we do not observe the complete data x in X but instead the incomplete data y = h(x) in Y. Thus,

g(y;\Psi) = \int_{\mathcal{X}(y)} f(x;\Psi)\, dx,    (16)

where X(y) is the subset of the complete-data sample space X determined by the equation y = h(x).

Eq. 16 in discrete form is

g(y;\Psi) = \sum_{x:\, h(x) = y} f(x;\Psi)    (17)

The problem here is the maximization of the incomplete-data (observable-data) log likelihood. The main idea of EM is to solve it in terms of the complete-data representation L(Ψ) = f(x;Ψ). As this is unobservable, it is replaced by its conditional expectation given y and the current fit for Ψ, which at iteration n is denoted by Ψ(n). In other words, the entire likelihood function is replaced by its conditional expectation, not merely the complete-data variates.

To crystallize the heuristic EM approach into concrete steps we have the following.

First, choose an initial value (guess) Ψ(0) for the iteration to begin with.

Next, carry out the E-step, i.e. calculate the conditional expectation of the log likelihood function given the current parameter estimate Ψ(n) and the observations y

Q(Ψ;Ψ(n)) = EΨ(n) [ ln L(Ψ) | y,Ψ(n) ] (18)

Finally, the M-step: maximize Q(Ψ;Ψ(n)) with respect to the parameters Ψ

\Psi^{(n+1)} = \arg\max_{\Psi} Q(\Psi;\Psi^{(n)})    (19)

Now, if there are terms independent of Ψ in eq. 19, they do not contribute to the new Ψ(n+1), because they drop out in the differentiation (i.e. maximization) with respect to Ψ. In some cases this eases the derivation.
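Schematically, the two steps amount to the following loop. This is only a generic sketch of eqs. 18-19; the callables e_step and m_step are hypothetical placeholders for the problem-specific expectation and maximization.

def em_iterate(psi0, e_step, m_step, n_iter=100):
    # Generic EM loop: alternate the E-step (form Q) and the M-step (maximize Q).
    psi = psi0
    for _ in range(n_iter):
        q = e_step(psi)      # E-step, eq. 18: Q as a function of the parameters
        psi = m_step(q)      # M-step, eq. 19: argmax of Q
    return psi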

3.2 EM Algorithm applied to Perfusion Deconvolution

3.2.1 Lange’s Method in PET image reconstruction

Here we review Lange's derivation of his version of the physically based EM algorithm. It is meant to serve as an introductory example and to clarify the use of EM in practice.

The idea in PET is to recover the values of the emission intensity Ψj when one observes only the sum of the emissions over a finite time interval. Let the number of emissions from pixel j during projection i be the random variate Xij

Xij ∼ Poisson(cijΨj) (20)


where the cij are assumed to be known constants. Next, define the observable quantity, i.e. their sum, the number of emissions recorded for projection i, as the random variate Yi

Y_i = \sum_j X_{ij}    (21)

Hence

Y_i \sim \mathrm{Poisson}\Big(\sum_j c_{ij}\Psi_j\Big)    (22)

From 20 it follows that

P[X_{ij} = x_{ij}] = \frac{(c_{ij}\Psi_j)^{x_{ij}}}{x_{ij}!} e^{-c_{ij}\Psi_j}    (23)

and so

f(x;\Psi) = \prod_i \prod_j P[X_{ij} = x_{ij}]    (24)

Thus with 14 we have

\ln L(\Psi) = \sum_i \sum_j \big( x_{ij} \ln(c_{ij}\Psi_j) - c_{ij}\Psi_j - \ln x_{ij}! \big)    (25)

and, based on the linearity of the expectation, eq. 18 yields

Q(\Psi;\Psi^{(n)}) = E_{\Psi^{(n)}}[\, \ln L(\Psi) \mid y, \Psi^{(n)} \,]
 = \sum_i \sum_j \big( E[\, X_{ij} \mid y, \Psi^{(n)} \,] \ln(c_{ij}\Psi_j) - c_{ij}\Psi_j \big) + R    (26)

R does not depend on Ψ. It includes the term E[ ln Xij! | y, Ψ(n) ], which would be difficult to calculate.

The conditional expectation can be derived as follows

E[\, X_{ij} \mid y, \Psi^{(n)} \,] = \sum_{k=0}^{y_i} k \cdot P[\, X_{ij} = k \mid y, \Psi^{(n)} \,]    (27)

where

P[\, X_{ij} = k \mid y, \Psi^{(n)} \,]
 = \frac{P[X_{ij} = k,\; Y_i = y_i]}{P[Y_i = y_i]}
 = \frac{P[X_{ij} = k,\; \sum_{p \setminus j} X_{ip} = y_i - k]}{P[Y_i = y_i]}
 = \binom{y_i}{k} \frac{(c_{ij}\Psi_j^{(n)})^k \, \big(\sum_{p \setminus j} c_{ip}\Psi_p^{(n)}\big)^{y_i - k}}{\big(\sum_p c_{ip}\Psi_p^{(n)}\big)^{y_i}}    (28)


because Ψ(n) is a parameter vector and Xij is independent of all Yj except the Yi to which it itself contributes. Substituting this into eq. 27 and using

\sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} = (a + b)^n    (29)

and

\binom{y_i}{k} k = y_i \binom{y_i - 1}{k - 1}, \qquad y_i \geq k \geq 1    (30)

we finally get the conditional expectation of Xij, which we denote by Nij

N_{ij} = E[\, X_{ij} \mid y, \Psi^{(n)} \,] = \frac{y_i c_{ij} \Psi_j^{(n)}}{\sum_p c_{ip} \Psi_p^{(n)}}    (31)

Now, if the initial guess Ψ(0) is positive, then the Nij are all positive. Hence the E-step is complete and yields

Q(\Psi;\Psi^{(n)}) = \sum_i \sum_j \big( N_{ij} \ln(c_{ij}\Psi_j) - c_{ij}\Psi_j \big) + R    (32)

Now the M-step is performed by differentiating eq. 32 with respect to Ψ and equating its derivatives to zero. Differentiation yields

\frac{\partial}{\partial \Psi_j} Q(\Psi;\Psi^{(n)}) = \sum_i \frac{N_{ij}}{\Psi_j} - \sum_i c_{ij}    (33)

and setting it to zero and solving for Ψj yields the new estimate Ψj(n+1)

\Psi_j^{(n+1)} = \frac{\sum_i N_{ij}}{\sum_i c_{ij}} = \frac{\Psi_j^{(n)}}{\sum_i c_{ij}} \sum_i \frac{y_i c_{ij}}{\sum_p c_{ip} \Psi_p^{(n)}}    (34)

This solution truly maximizes Q, which can be seen as follows. The second derivative of Q is

\frac{\partial^2}{\partial \Psi_j^2} Q(\Psi;\Psi^{(n)}) = -\sum_i \frac{N_{ij}}{\Psi_j^2}    (35)

and the mixed second derivatives are zero. Thus the quadratic form Ψ^⊤ H(Ψ) Ψ, where H denotes the Hessian matrix of Q, is strictly negative for all Ψ with Ψj > 0. That is, eq. 34 represents the point of a concave function where its gradient is equal to the zero vector.
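The update of eq. 34 is simple to implement as a multiplicative fixed-point iteration. The following sketch (NumPy, hypothetical function name) assumes a strictly positive initial guess and strictly positive column sums, as required above.

import numpy as np

def lange_em_update(c_mat, y, psi0, n_iter=50):
    # ML-EM iteration of eq. 34:
    # psi_j <- psi_j / (sum_i c_ij) * sum_i y_i c_ij / (sum_p c_ip psi_p)
    psi = np.asarray(psi0, dtype=float).copy()
    col_sums = c_mat.sum(axis=0)              # sum_i c_ij
    for _ in range(n_iter):
        y_est = c_mat @ psi                   # sum_p c_ip psi_p for each projection i
        psi = psi / col_sums * (c_mat.T @ (y / y_est))
    return psi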


3.2.2 Vonken’s Method

Here we review the application of the EM algorithm to perfusion weighted fMRI published by Vonken in 1999 [5]. First the article is briefly summarized and then some of its flaws are pointed out. The notation is changed to correspond to this document, but no changes beyond this have been made. In the next section we try to offer a more exact and thorough treatment of the subject and to correct the contradictions in Vonken's work.

Vonken starts by defining the convolution operator a as a square matrix whose elements are defined as

a_{ij} = \begin{cases} A_{i-j} & \text{if } i - j \geq 0 \\ 0 & \text{otherwise} \end{cases}    (36)

where Ai−j denotes the AIF at time ti − tj, i.e. A(ti − tj). Thus the operator a corresponds to the zeroth order approximation of the convolution integral, i.e. eq. 8.

The next two steps are responsible for cleverly formulating the complete-data embedding. For this purpose Vonken assumes two distributions, one for the complete and one for the observed data. The first one has the pdf f(X;Ψ) and is assumed to follow the normal distribution. The observed data is also assumed to follow the normal distribution; its pdf is g(C;Ψ). These normality assumptions are satisfactorily justified; especially the normality of C is treated thoroughly. First Vonken defines the elements of the complete-data matrix as

xij = aijΨj (37)

and then naturally the linkage to the observed (incomplete) data as

c_i = \sum_k x_{ik} = \sum_k a_{ik} \Psi_k    (38)

The current estimate of ci based on the current estimate Ψ(n) is denoted by

\tilde{c}_i = \sum_k a_{ik} \Psi_k^{(n)}    (39)

Next Vonken moves onwards to define the complete-data log likelihood function, based on the assumption that the complete-data xij are distributed normally, i.e. Xij ∼ N(aijΨj, σij²). The variances σij² are later taken to be equal, and in the M-step they cancel out anyway. He says:

" . . . using Eq. 38 and the expectancy E[Xij|c,Ψ(n)] = ci·aijΨ(n)j /

∑j aijΨ

(n)j

= ci/ci ≡ Nij . This gives

E[ln f(X;Ψ)|c,Ψ(n)] =∑

i

∑j

ln P [Xij] = −∑

i

∑j

(aijΨ(n)j −Nij)

2/2σ2ij

12

Page 16: Perfusion deconvolution via em algorithm

with P [Xij] the probability of Xij and σij the standard deviation in thecomplete-data representation."

From this Vonken proceeds to the M-step. He takes the derivative of the conditional expectation above and equates it to zero. This yields a set of equations

\sum_i a_{ij} (a_{ij}\Psi_j^{(n)} - N_{ij}) = 0    (40)

i.e. an equation for each Ψj(n+1). To finish, Vonken states: "A program has been implemented that numerically solves Eq. 40 using a Newton-Raphson scheme."

The above summary is not meant to be a complete description of Vonken's article; rather, it tries to describe the essential points of his derivation in order to illustrate the facts that are to be changed here. The following are the points that seem to need changes.

First, Vonken's notation could be more exact. He does not make a notational difference between random variates and their realizations. This might be a consequence of Lange's work being the reference point throughout his work.

Secondly, a more explicit statement of the assumptions used might clarify the derivation. In particular, even though Vonken leads the reader to believe that the entire derivation is faithfully based on the normality assumptions, there is one point where this is not the case. Namely, when Vonken takes the conditional expectation E[Xij | c, Ψ(n)] he does not mention its origins. In fact it is taken directly from Lange [8]. The result, however, is derived based on the assumption of a Poisson distribution, Xij ∼ Poisson(aijΨj(n)). This may serve as a satisfactory approximation but is clearly incorrect and ungrounded here. Vonken's obvious goal is to ground his work on the physical model like Lange, but here he deviates from this without any explanation.

Finally, the calculation of the log likelihood of the complete data is questionable. In EM theory the conditional expectation is taken of the entire log likelihood function ln L(Ψ), as stated in eq. 18. If the log likelihood function is linear in x in the terms containing the parameter Ψj, the result looks just as if the xij had simply been replaced by their conditional expectations. For an example see eq. 25. Here, however, the normality assumption leads to the non-linear term (aijΨj(n) − xij)², whose conditional expectation, with the notation E[Xij | c, Ψ(n)] ≡ Nij, is not (aijΨj(n) − Nij)² as derived by Vonken. This might be the explanation for the fast and sometimes unstable convergence of the algorithm.


4 Improved application of EM

Vonken's reasoning in the complete-data embedding is adopted, and the convolution operator a is defined as

a_{ij} = \begin{cases} A_{i-j} & \text{if } i - j \geq 0 \\ 0 & \text{otherwise} \end{cases}    (41)

where Ai−j denotes the AIF at time ti − tj, i.e. A(ti − tj). Thus a represents a zeroth order approximation of the convolution integral, as eq. 8 shows.

The distribution of the measured values of the time series of the AIF and the TCC is assumed to be normal, as Vonken argued. This is also intuitively appealing, as the values at issue are measured values of a physical quantity after an almost linear transformation.

Now A refers both to the convolution matrix, which is treated as a random variate (matrix), and to the random vector of AIF values. After the measurement, A is realized as a: first as the AIF and then, after the transformation 41, also as the convolution operator a. These two differ only at the notational level: aj refers to an element of the AIF whereas aij is an element of the operator 41.

Based on the previous reasoning, the AIF values Aij are assumed to be normally distributed around their means, which will be denoted here by the parameters µij, i.e. E[Aij] = µij. Later, when the actual measurements are made and the developed algorithm is used to recover the residue, this parameter will be replaced by the measured aij, i.e. Aij's realization. The variance associated with the parameter is naturally σAIF². Explicitly,

A_{ij} \sim N(\mu_{ij}, \sigma_{AIF}^2)    (42)

From this the distribution of the complete-data elements Xij can be easily derived. They are defined as Xij = AijΨj and thus

X_{ij} \sim N(\mu_{ij}\Psi_j, (\Psi_j\sigma_{AIF})^2)    (43)

Thus the complete-data pdf is of the familiar exponential form and is from now on denoted by fX(x;Ψ).

Now, as the observed data are defined as

C_i = \sum_k X_{ik}    (44)

we have

C_i \sim N\Big(\sum_k \mu_{ik}\Psi_k,\; \sum_k (\Psi_k \sigma_{AIF})^2\Big)    (45)

From now on the pdf of the random observed-data vector C is denoted by gC(c;Ψ).


From eq. 43 one can easily formulate the complete-data log likelihood function, which is needed in the E-step

\ln L(\Psi) = \sum_i \sum_j \left[ -\ln\!\left(\sqrt{2\pi}\sqrt{(\Psi_j\sigma_{AIF})^2}\right) - \frac{(\mu_{ij}\Psi_j - x_{ij})^2}{2(\Psi_j\sigma_{AIF})^2} \right]    (46)

Expanding the square and denoting by R the terms independent of Ψ, the conditional expectation of the log likelihood can be written as

Q(\Psi;\Psi^{(n)}) = E_{\Psi^{(n)}}[\, \ln L(\Psi) \mid c, \Psi^{(n)} \,]
 = \sum_i \sum_j \left[ -\ln\!\left(\sqrt{2\pi}\sqrt{(\Psi_j\sigma_{AIF})^2}\right) + \frac{\mu_{ij}}{\Psi_j \sigma_{AIF}^2} E[\, X_{ij} \mid c, \Psi^{(n)} \,] - \frac{1}{2(\Psi_j\sigma_{AIF})^2} E[\, X_{ij}^2 \mid c, \Psi^{(n)} \,] \right] + R    (47)

From eq. 47 it is clear that two different conditional expectations are needed:

E[\, X_{ij} \mid c, \Psi^{(n)} \,] = \int_{\mathbb{R}} x_{ij}\, f_{X|C,\Psi^{(n)}}(x_{ij} \mid c_i, \Psi^{(n)})\, dx_{ij}    (48)

E[\, X_{ij}^2 \mid c, \Psi^{(n)} \,] = \int_{\mathbb{R}} x_{ij}^2\, f_{X|C,\Psi^{(n)}}(x_{ij} \mid c_i, \Psi^{(n)})\, dx_{ij}    (49)

where f_{X|C,\Psi^{(n)}}(x_{ij} \mid c_i, \Psi^{(n)}) refers to the current conditional pdf of Xij given c and Ψ(n). This can be found using a basic property familiar from probability theory (Bayes' rule)

f_{X|C,\Psi^{(n)}}(x_{ij} \mid c_i, \Psi^{(n)}) = \frac{g_{C|X,\Psi^{(n)}}(c_i \mid x_{ij}, \Psi^{(n)})\, f_{X|\Psi^{(n)}}(x_{ij} \mid \Psi^{(n)})}{g_{C|\Psi^{(n)}}(c_i \mid \Psi^{(n)})}    (50)

where g_{C|X,\Psi^{(n)}}(c_i \mid x_{ij}, \Psi^{(n)}) refers respectively to the conditional pdf of Ci given xij and the current Ψ(n), while fX(xij) and gC(ci) are merely the pdfs of Xij and Ci. The functions in eq. 50 expressed explicitly are:

f_{X|\Psi^{(n)}}(x_{ij} \mid \Psi^{(n)}) = \frac{1}{\sqrt{2\pi}\sqrt{(\Psi_j^{(n)}\sigma_{AIF})^2}} \exp\!\left( -\frac{(\mu_{ij}\Psi_j^{(n)} - x_{ij})^2}{2(\Psi_j^{(n)}\sigma_{AIF})^2} \right)    (51)

and

g_{C|\Psi^{(n)}}(c_i \mid \Psi^{(n)}) = \frac{1}{\sqrt{2\pi}\sqrt{\sum_k (\Psi_k^{(n)}\sigma_{AIF})^2}} \exp\!\left( -\frac{\big(\sum_k \mu_{ik}\Psi_k^{(n)} - c_i\big)^2}{2\sum_k (\Psi_k^{(n)}\sigma_{AIF})^2} \right)    (52)

and

g_{C|X,\Psi^{(n)}}(c_i \mid x_{ij}, \Psi^{(n)}) = \frac{1}{\sqrt{2\pi}\sqrt{\sum_{k\setminus j} (\Psi_k^{(n)}\sigma_{AIF})^2}} \exp\!\left( -\frac{\big(\sum_{k\setminus j} \mu_{ik}\Psi_k^{(n)} + x_{ij} - c_i\big)^2}{2\sum_{k\setminus j} (\Psi_k^{(n)}\sigma_{AIF})^2} \right)    (53)


The notation \sum_{k\setminus j} means that the sum is taken over all k except j, in other words \sum_{k\setminus j} z_k = \sum_k z_k - z_j.

From equations 50 through 53 it is obvious that the results will get messy. Therefore we define the following short-hand notations

\mu_{ij}\Psi_j^{(n)} = \gamma_{ij}
(\Psi_j^{(n)}\sigma_{AIF})^2 = \alpha_j
\sum_k (\Psi_k^{(n)}\sigma_{AIF})^2 = \sum\alpha
\sum_{k\setminus j} (\Psi_k^{(n)}\sigma_{AIF})^2 = \sum\beta_j
\sum_k \mu_{ik}\Psi_k^{(n)} = \sum\gamma_i
\sum_{k\setminus j} \mu_{ik}\Psi_k^{(n)} = \sum\delta_{ij}

One must not confuse Ψ and Ψ(n), because the maximization in the M-step requires differentiation of Q with respect to each Ψj, while the iterates Ψj(n) are treated as constant parameters. Since eq. 52 has no dependency on xij, it will from now on be denoted merely by gC(ci).

With the defined notation the conditional expectations become

E[\, X_{ij} \mid c, \Psi^{(n)} \,] = \frac{c_i\alpha_j + \gamma_{ij}\sum\beta_j - \alpha_j\sum\delta_{ij}}{\sqrt{2\pi}\, g_C(c_i)\, (\sum\alpha)^{3/2}} \exp\!\left( -\frac{(\sum\gamma_i - c_i)^2}{2\sum\alpha} \right)    (54)

and

E[\, X_{ij}^2 \mid c, \Psi^{(n)} \,] = \frac{1}{\sqrt{2\pi}\, g_C(c_i)\, (\sum\alpha)^{5/2}} \Big[ (c_i\alpha_j)^2 + (\gamma_{ij}\sum\beta_j)^2 + 2 c_i\alpha_j \big(\gamma_{ij}\sum\beta_j - \alpha_j\sum\delta_{ij}\big) + \alpha_j\sum\beta_j \big(\sum\beta_j - 2\gamma_{ij}\sum\delta_{ij}\big) + \alpha_j^2 \big(\sum\beta_j + (\sum\delta_{ij})^2\big) \Big] \exp\!\left( -\frac{(\sum\gamma_i - c_i)^2}{2\sum\alpha} \right)    (55)

Substituting these into Q (eq. 47), we have completed the E-step.

Now, Q is of the form

Q(\Psi;\Psi^{(n)}) = \sum_i \sum_j K_{ij}(\Psi_j)    (56)

and thus differentiation with respect to each Ψj yields

\frac{\partial Q(\Psi;\Psi^{(n)})}{\partial \Psi_j} = \sum_i \frac{\partial K_{ij}(\Psi_j)}{\partial \Psi_j}    (57)


where the derivative of Kij(Ψj) can be written as

\frac{\partial K_{ij}(\Psi_j)}{\partial \Psi_j} = \Lambda_{ij}\Psi_j^{-3} - \Omega_{ij}\Psi_j^{-2} - \Psi_j^{-1}    (58)

where we have again defined the following short-hand notations

\Lambda_{ij} = \frac{1}{\sqrt{2\pi}\,\sigma_{AIF}^2\, g_C(c_i)\, (\sum\alpha)^{5/2}} \Big[ (c_i\alpha_j)^2 + (\gamma_{ij}\sum\beta_j)^2 + 2 c_i\alpha_j \big(\gamma_{ij}\sum\beta_j - \alpha_j\sum\delta_{ij}\big) + \alpha_j\sum\beta_j \big(\sum\beta_j - 2\gamma_{ij}\sum\delta_{ij}\big) + \alpha_j^2 \big(\sum\beta_j + (\sum\delta_{ij})^2\big) \Big] \exp\!\left( -\frac{(\sum\gamma_i - c_i)^2}{2\sum\alpha} \right)    (59)

and

\Omega_{ij} = \frac{\mu_{ij} \big( c_i\alpha_j + \gamma_{ij}\sum\beta_j - \alpha_j\sum\delta_{ij} \big)}{\sqrt{2\pi}\,\sigma_{AIF}^2\, g_C(c_i)\, (\sum\alpha)^{3/2}} \exp\!\left( -\frac{(\sum\gamma_i - c_i)^2}{2\sum\alpha} \right)    (60)

Hence, after summing over i, setting the derivative in eq. 57 to zero and multiplying by Ψj³, we have the equation for the root of the derivative

\Psi_j^2 \sum_i 1 + \Psi_j \sum_i \Omega_{ij} - \sum_i \Lambda_{ij} = 0    (61)

This second-degree equation can easily be solved for Ψj. Choosing the positive root we have

\Psi_j^{(n+1)} = \Psi_j = \frac{ -\sum_i \Omega_{ij} + \sqrt{ \big(\sum_i \Omega_{ij}\big)^2 + 4 \sum_i 1 \sum_i \Lambda_{ij} } }{ 2 \sum_i 1 }    (62)
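The complete iteration of the corrected algorithm can be sketched as follows. This is only an illustrative NumPy implementation of eqs. 59-62 under the assumptions of this chapter (µij replaced by the measured aij, a common σAIF, and a strictly positive initial guess); the function name and the fixed iteration count are hypothetical.

import numpy as np

def improved_em_deconvolution(a_mat, c, sigma_aif, psi0, n_iter=100):
    # Iterates the M-step update of eq. 62, with Lambda_ij and Omega_ij of eqs. 59-60.
    # a_mat : (n, n) convolution operator (zeroth or first order approximation)
    # c     : (n,) measured tissue concentration curve
    # psi0  : (n,) positive initial guess for the impulse response
    n = len(c)
    mu = a_mat                      # mu_ij is replaced by the measured a_ij (Section 4)
    psi = np.asarray(psi0, dtype=float).copy()
    for _ in range(n_iter):
        alpha = (psi * sigma_aif) ** 2                 # alpha_j
        sum_alpha = alpha.sum()                        # sum(alpha)
        sum_beta = sum_alpha - alpha                   # sum(beta_j), over k != j
        gamma = mu * psi                               # gamma_ij = mu_ij * psi_j
        sum_gamma = gamma.sum(axis=1)                  # sum(gamma_i)
        sum_delta = sum_gamma[:, None] - gamma         # sum(delta_ij), over k != j
        # g_C(c_i) of eq. 52 and the exponential shared by eqs. 59-60; if the
        # exponential underflows, g_c becomes zero and the update breaks down,
        # as discussed in Section 5.
        expo = np.exp(-(sum_gamma - c) ** 2 / (2.0 * sum_alpha))
        g_c = expo / np.sqrt(2.0 * np.pi * sum_alpha)
        shared = expo / (np.sqrt(2.0 * np.pi) * sigma_aif ** 2 * g_c)
        ci_alpha = c[:, None] * alpha[None, :]
        num = ci_alpha + gamma * sum_beta - alpha[None, :] * sum_delta
        bracket = (ci_alpha ** 2 + (gamma * sum_beta) ** 2
                   + 2.0 * ci_alpha * (gamma * sum_beta - alpha[None, :] * sum_delta)
                   + alpha[None, :] * sum_beta * (sum_beta - 2.0 * gamma * sum_delta)
                   + alpha[None, :] ** 2 * (sum_beta + sum_delta ** 2))
        lam = shared[:, None] * bracket / sum_alpha ** 2.5       # eq. 59
        omega = shared[:, None] * mu * num / sum_alpha ** 1.5    # eq. 60
        s_lam = lam.sum(axis=0)
        s_omega = omega.sum(axis=0)
        psi = (-s_omega + np.sqrt(s_omega ** 2 + 4.0 * n * s_lam)) / (2.0 * n)  # eq. 62
    return psi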


5 This Work

5.1 Overview

The two main goals of this report are to describe the EM-based deconvolution method published by Vonken [5] and to try to improve it, and then to evaluate the changes made by simulations. For this purpose both Vonken's method and the new method were implemented on the MATLAB platform. Both methods were also implemented using both the 0th and the 1st order approximations of the convolution integral. Therefore in total four different methods were to be evaluated.

As stated in the introduction, however, this report concentrates mainly on the theoretical aspects and a full evaluation is not included. Instead, only the reproducibility of the CBF was studied.

The methods are evaluated using Monte Carlo simulation. For this purpose the true values of the AIF, the TCC and the impulse response Ψ have to be known. This was achieved by creating a numerical integrator which computes the "true" TCC based on a given AIF and impulse response using eq. 2. This avoids the effect of discretization errors arising, for example, from the discretized eq. 8. This method also enables us to easily change all the parameters affecting the impulse response; most importantly, the delay is not bound to multiples of TR.

After the true functions are known, Gaussian noise is added to the TCC using eq. 1. This noisy TCC is then used when performing the deconvolution with the methods to be tested. The numerical values used in this work were S0 = 300 and k = 1. The signal-to-noise ratio was set to the clinically interesting value of SNR = 35.
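One possible way to realize this noise model is sketched below. The exact signal equation (the proportionality constant k absorbs the echo time) and the definition of the SNR as S0 divided by the noise standard deviation are assumptions made here for illustration, since the report does not spell them out.

import numpy as np

def add_signal_noise(c_true, s0=300.0, k=1.0, snr=35.0, rng=None):
    # Map the true TCC to an MR signal via the inverse of eq. 1, add Gaussian noise
    # in the signal domain, and map back to a noisy concentration curve.
    rng = np.random.default_rng() if rng is None else rng
    s = s0 * np.exp(-k * c_true)                       # assumed signal model
    s_noisy = s + rng.normal(0.0, s0 / snr, size=s.shape)
    return -np.log(s_noisy / s0) / k                   # eq. 1 with the same k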

Vonken reported difficulties in deciding the optimal number of iterations needed. In his clinical experiments he used four iterations. This number is adopted here as well, and it is used without any further investigation for both the zeroth and the first order approximations.

However, the convergence properties of the algorithm change dramatically when the proposed changes are implemented. Empirically (by trial and error) the following iteration numbers were found: the zeroth order approximation was iterated 100 times, whereas in the case of the first order approximation the maximum number of iterations was set to 400.

Another problematic area not described by Vonken was the tendency of the recovered impulse response to "raise its tail". In other words, the convergence nearly always produced an impulse response whose last and sometimes even second-to-last elements were clearly and incorrectly large. This, however, did not seem to affect the preceding elements. The same was observed in the case of the new EM version. This may result in an erroneously determined CBF if the tail rises higher than the true maximum of the impulse response. To overcome this difficulty in the CBF estimation, the last four elements of the estimated impulse response were simply set to zero.

The new algorithm was found to suffer from minor numerical instabilities. The values of eq. 52 are typically very small, and in the case of an initial guess that differs greatly from the measured data they become too small for the available accuracy. Therefore a good initial guess is needed. To guarantee equal treatment of all methods, a common initial guess was set to a constant function of value 0.02. The insufficient numerical accuracy, however, was in some cases so severe that occasionally (very rarely; in the present simulations the occurrence frequency was 5 times out of 13 · 512 = 6656 simulations) the algorithm could not proceed, and in such cases it was set to produce a NaN (Not a Number) result. The means and standard deviations of the estimates were calculated ignoring these NaN values.

5.2 Detailed description and the parameters used

There were two different sets of simulations: one with zero delay (td = 0) and the other with a 2.7 second delay, i.e. td = 2.7 in eq. 4. Both were carried out in a similar manner. The CBF was varied between 0.01 and 0.13 [arbitrary units] in intervals of 0.01. At each flow level 512 different noisy TCCs were generated and each of them was deconvolved with every method. The average CBF estimate and its standard deviation were then calculated. The original residue function (see eq. 2) was generated from an h(t) of the form

h(t) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} (t_1 - t_0)^{1 - \alpha - \beta} (t - t_0)^{\alpha - 1} (t_1 - t)^{\beta - 1}

which empirically seems to be a reasonable model for h(t) [12]. The numerical values were set to t0 = 0, t1 = 8, α = 2.3 and β = 3.8, corresponding to a physiologically typical MTT ≈ 3 s. The AIF was modeled as a gamma-variate function of the form

AIF(t) = a (t - t_0)^b e^{-(t - t_0)/c}

where now a = 2, b = 4 and c = 1.1. Throughout, TR was kept at 1.5 s. For comparison, the SVD solution was also calculated.
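For concreteness, the ground-truth curves described above could be generated along the following lines; the time grid, the crude cumulative integration of h(t), and the function names are assumptions made for illustration.

import numpy as np
from math import gamma as gamma_fn

def residue_function(t, t0=0.0, t1=8.0, alpha=2.3, beta=3.8):
    # r(t) = 1 - integral of h from 0 to t, with the h(t) used in the simulations;
    # t is assumed to be a fine, uniformly spaced grid.
    h = np.zeros_like(t)
    inside = (t > t0) & (t < t1)
    h[inside] = (gamma_fn(alpha + beta) / (gamma_fn(alpha) * gamma_fn(beta))
                 * (t1 - t0) ** (1 - alpha - beta)
                 * (t[inside] - t0) ** (alpha - 1) * (t1 - t[inside]) ** (beta - 1))
    dt = t[1] - t[0]
    return 1.0 - np.cumsum(h) * dt

def aif(t, a=2.0, b=4.0, c=1.1, t0=0.0):
    # Gamma-variate arterial input function of the form used in Section 5.2.
    out = np.zeros_like(t)
    late = t >= t0
    out[late] = a * (t[late] - t0) ** b * np.exp(-(t[late] - t0) / c)
    return out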

The simulations were very time consuming. Each of the two sets described above took nearly two days to complete on a 2.4 GHz AMD platform.

6 Results

The simulation results are depicted in figures 1 and 2. The first one depicts the normal case, whereas in the latter one the TCC is delayed by 2.7 seconds. The corresponding results for the standard SVD deconvolution method are shown in figure 3. There are four pictures corresponding to the four different versions of the EM-based deconvolution: the first two (upper row) depict the performance of the new EM algorithm using the zeroth order and the first order approximation of the convolution integral, respectively. The lower row respectively depicts the performance of Vonken's EM algorithm in the zeroth and the first order cases.

The two most eye-catching features are the enormous standard deviation of the traditional Vonken EM-based CBF estimate and the tendency of Vonken's original algorithm to yield dramatically overestimated CBF estimates at low CBF values. Standard deviations of such magnitude were not reported in Vonken's original paper, and neither was the obviously incorrect convergence at low CBF values. Since the last elements of the impulse responses recovered here were set to zero, this huge variation in the CBF estimates has to originate from the physically meaningful part of the impulse responses.

The principal differences between the results obtained by the different methods are as follows. In the case where no delay is present, Vonken's original algorithm seems to provide results equal to those of the new zeroth order version developed here. Despite the major difference in the standard deviation, the means of the results seem equal. The simultaneous appearance of a huge change in the standard deviation and a smaller change in the mean CBF value may indicate the existence of a few major outliers.

The first order approximation of the convolution integral results in a more faithful estimate of the CBF. In both cases (traditional and new EM deconvolution) the estimated CBF seems to follow the true value well. The new version, however, is prone to overestimation. The original version of the EM deconvolution equipped with the more accurate approximation, on the other hand, yields very good results. The constant overestimation of the new algorithm may be a result of a poor selection of the maximum number of iterations.

The presence of a 2.7 second delay in general deteriorates the performance of both methods. The standard deviations are not affected, but the CBF estimates are lower throughout the range than before. The new algorithm with the higher order approximation (upper right corner in figure 2), however, gives extremely good results with a modest standard deviation. Still, the biased estimation in the no-delay situation and the behaviour of the original algorithm with the higher order approximation suggest that here one bias is compensated by another.


[Figure 1: four panels (new EM-d0, new EM-d1, trad. EM-d0, trad. EM-d1) of estimated CBF [arb. units] versus true CBF [arb. units].]

Figure 1: Simulation results in the case of no delay (td = 0). The pictures in the upper row correspond to the new version of the EM algorithm, whereas the lower row corresponds to Vonken's original version. The left pictures are computed with the original zeroth order approximation of the convolution integral; in the right ones the linear approximation is used. The thicker lines give the mean of the deconvolved CBF estimate vs. the true CBF. The dashed lines are the mean of the CBF ± its standard deviation. The dotted line corresponds to a perfect match.

[Figure 2: four panels (new EM-d0, new EM-d1, trad. EM-d0, trad. EM-d1) of estimated CBF [arb. units] versus true CBF [arb. units].]

Figure 2: Simulation results in the case of a 2.7 second delay (td = 2.7). The pictures in the upper row correspond to the new version of the EM algorithm, whereas the lower row corresponds to Vonken's original version. The left pictures are computed with the original zeroth order approximation of the convolution integral; in the right ones the linear approximation is used. The thicker lines give the mean of the deconvolved CBF estimate vs. the true CBF. The dashed lines are the mean of the CBF ± its standard deviation. The dotted line corresponds to a perfect match.

[Figure 3: one panel (SVD-CBF, normal and delayed) of estimated CBF [arb. units] versus true CBF [arb. units].]

Figure 3: SVD results with and without delay. The dashed line is the delayed case.

7 Conclusions

In this work the EM-based deconvolution method developed by Vonken et al. [5] was reviewed. Some theoretical background was also given, and attention was paid especially to the discretization accuracy. Some flaws of Vonken's article were pointed out and corrected. This resulted in an entirely new version of the EM deconvolution algorithm.

The new EM-based algorithm was tedious to derive. The first major change with respect to Vonken's work was the implementation of the more natural and better grounded normality assumption concerning the distribution of the complete-data variates. The second, more fundamental change was made by amending Vonken's conditional expectation of the complete-data log likelihood function. This is likely to be the source of the different convergence properties of the new algorithm.

After implementing the first order approximation and the new version of the algorithm, there were four different versions of the algorithm to be tested. Simulations were carried out with and without delay between the AIF and the TCC. For comparison, traditional SVD deconvolution was also carried out.

The results were surprising. First of all, the strange behaviour of Vonken's original algorithm is in contrast with that reported in his original article. It seems to be prone to dramatically overestimating low CBF values, and in addition it suffers from a large standard deviation. These are likely to originate from the incorrectly derived equation 40.

The new version of the algorithm converges much more slowly and hence requires more iterations. Neither the optimal number of iterations nor the initial guess were studied here. Regardless of that, the results were promising. The standard deviation was of the same magnitude as that of the SVD. In fact, the zeroth order approximation yielded almost identical results to the SVD. The use of the first order approximation resulted in a minor overestimation of the CBF, but notably the magnitude of the bias does not change as the CBF does. The higher order approximation, however, results in a somewhat greater standard deviation of the estimate. The absence of an improvement from the higher order approximation in the case of a delayed TCC, though, suggests that the excellent performance of the new algorithm with the higher order approximation results from one bias being compensated by another.

All in all, the developments described in this report seem promising. They were able to guarantee nearly certain convergence with a modest spread of the CBF estimates. A clear improvement with respect to Vonken's original algorithm was recorded. The price paid was slower convergence and a longer computation time. Further research still has to be carried out. The reproducibility of the full impulse response is of great importance in some applications. The effects of different delays and especially of different shapes of the residue function also remain to be investigated.


References

[1] A. Villringer, B. Rosen, J. Belliveau, J. Ackerman, R. Lauffer, R. Buxton, Y. Chao, V. Wedeen, and T. Brady, "Dynamic imaging with lanthanide chelates in normal brain: contrast due to magnetic susceptibility effects," Magnetic Resonance in Medicine, vol. 6, no. 2, pp. 164–174, 1988.

[2] B. R. Rosen, J. W. Belliveau, and D. Chien, "Perfusion Imaging by Nuclear Magnetic Resonance," Magnetic Resonance Quarterly, vol. 5, no. 4, pp. 263–281, 1989.

[3] P. Meier and K. L. Zierler, "On the Theory of the Indicator-Dilution Method for Measurement of Blood Flow and Volume," Journal of Applied Physiology, vol. 6, no. 12, pp. 731–744, 1954.

[4] L. Østergaard, R. M. Weisskoff, D. A. Chesler, C. Gyldensted, and B. R. Rosen, "High Resolution Measurement of Cerebral Blood Flow using Intravascular Tracer Bolus Passages. Part 1: Mathematical Approach and Statistical Analysis," Magnetic Resonance in Medicine, vol. 36, pp. 715–725, 1996.

[5] E.-J. P. Vonken, F. J. Beekman, C. J. Bakker, and M. A. Viergever, "Maximum Likelihood Estimation of Cerebral Blood Flow in Dynamic Susceptibility Contrast MRI," Magnetic Resonance in Medicine, vol. 41, pp. 343–350, 1999.

[6] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.

[7] L. A. Shepp and Y. Vardi, "Maximum Likelihood Reconstruction for Emission Tomography," IEEE Transactions on Medical Imaging, vol. 1, pp. 113–122, 1982.

[8] K. Lange and R. Carson, "EM Reconstruction Algorithms for Emission and Transmission Tomography," Journal of Computer Assisted Tomography, vol. 8, no. 2, pp. 306–316, 1984.

[9] Y. Vardi, L. A. Shepp, and L. Kaufman, "A Statistical Model for Positron Emission Tomography," Journal of the American Statistical Association, vol. 80, no. 389, pp. 8–20, 1985.

[10] J. A. Jacquez, Compartmental Analysis in Biology and Medicine. The University of Michigan Press, 2nd ed., 1985.


[11] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions. Wiley Series in Probability and Statistics, Wiley, 1997.

[12] L. Østergaard, D. A. Chesler, R. M. Weisskoff, A. G. Sorensen, and B. R. Rosen, "Modeling Cerebral Blood Flow and Flow Heterogeneity From Magnetic Resonance Residue Data," Journal of Cerebral Blood Flow and Metabolism, vol. 19, pp. 690–699, 1999.
