
Convergence rate for the Bayesian inversion theory

Andreas Neubauer1 and Hanna K. Pikkarainen2

1 Industrial Mathematics Institute, Johannes Kepler University Linz, Altenbergerstrasse 69, A-4040 Linz, Austria.
2 Johann Radon Institute for Computational and Applied Mathematics, Austrian Academy of Sciences, Altenbergerstrasse 69, A-4040 Linz, Austria.

Recently, the metrics of Ky Fan and Prokhorov were introduced as a tool for studying convergence of regularization methods for stochastic ill-posed problems. In this work, we examine the Bayesian approach to linear inverse problems in this new framework. We consider the finite-dimensional case where the measurements are disturbed by additive normal noise and the prior distribution is normal. A convergence rate result for the posterior distribution is obtained when the covariance matrices are proportional to the identity matrix.

1 Introduction

We are interested in solving the linear inverse problem

y = Ax (1)

where A ∈ Rm×n is a known matrix, x ∈ Rn and y ∈ Rm. Given the exact data y, the least squares minimum norm solution x† of problem (1) is defined as follows: x† minimizes the residual ‖Ax − y‖ and among all minimizers it has the minimal Euclidean norm. For the linear problem (1) the least squares minimum norm solution is x† = A†y where A† is the Moore–Penrose inverse of the matrix A.
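
As a concrete illustration (our own sketch, not part of the original paper), x† can be computed with NumPy's pseudoinverse; the matrix A and the data y below are placeholder values.

```python
import numpy as np

# Illustrative, rank-deficient forward matrix and exact data (placeholder values).
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])
x_true = np.array([1.0, -1.0])
y = A @ x_true

# Least squares minimum norm solution x† = A† y via the Moore–Penrose inverse.
x_dagger = np.linalg.pinv(A) @ y

# Equivalent: np.linalg.lstsq also returns the minimum norm least squares solution.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(x_dagger, x_lstsq)
```
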

We assume that the measurements are disturbed by additive noise. If the matrix A is ill-conditioned, the observed inexact data ydata cannot be used directly to infer an approximate solution A†ydata of (1) but some regularization technique must be applied. In this work the Bayesian inversion theory is utilized to obtain a regularized solution of (1).

In the Bayesian approach the solution of an inverse problem is obtained via the Bayes formula. The prior information about the quantities of primary interest is presented in the form of the prior distribution. The likelihood function is given by the model for the indirect measurements. The solution of the inverse problem after performing the measurements is the posterior distribution of the random variables of interest. By the Bayes formula the posterior probability density is proportional to the product of the prior probability density and the likelihood function. For a comprehensive introduction to the Bayesian inversion theory see [1].

We assume that indirect measurements are described by a linear model with additive noise

Y = AX + E

where X, Y, and E are random variables from a probability space (Ω, F, P) with values in Rn, Rm, and Rm, respectively. Let x0 ∈ Rn and γ, σ > 0. We suppose that X and E are mutually independent normal random variables with distributions N(x0, γ²I) and N(0, σ²I), respectively. By the Bayes formula (cf. [1, Theorem 3.7]) the posterior distribution µpost of X with the data ydata is the normal distribution N(xpost, Γpost) where

xpost = (A^T A + (σ²/γ²) I)⁻¹ (A^T ydata + (σ²/γ²) x0)   and   Γpost = σ² (A^T A + (σ²/γ²) I)⁻¹.
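
For readers who want to experiment, the following NumPy sketch (our own illustration, with arbitrary example dimensions and values for A, x0, γ, and σ) assembles xpost and Γpost exactly as in the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 5, 3
A = rng.standard_normal((m, n))   # placeholder forward matrix
x0 = np.zeros(n)                  # prior mean
gamma, sigma = 1.0, 0.1           # prior and noise standard deviations

x_true = rng.standard_normal(n)
y_data = A @ x_true + sigma * rng.standard_normal(m)   # noisy measurement

# Posterior N(x_post, Gamma_post) for the prior N(x0, gamma^2 I) and noise N(0, sigma^2 I).
reg = (sigma / gamma) ** 2
M = A.T @ A + reg * np.eye(n)
x_post = np.linalg.solve(M, A.T @ y_data + reg * x0)
Gamma_post = sigma ** 2 * np.linalg.inv(M)

print(x_post)
print(Gamma_post)
```
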

The data ydata is a realization of the random variable y + E. Thus the posterior mean xpost is also a realization of a random variable, namely the random variable

Xpost(ω) = (A^T A + (σ²/γ²) I)⁻¹ (A^T (y + E(ω)) + (σ²/γ²) x0).

The posterior covariance matrix Γpost is deterministic. Hence the posterior distribution µpost is a realization of the random variable

Mpost : (Ω, F, P) → (M(Rn), ρP), ω ↦ N(Xpost(ω), Γpost)

where M(Rn) is the set of all Borel measures in Rn and ρP is the Prokhorov metric in M(Rn) defined as follows:


Definition 1.1 Let µ1 and µ2 be Borel measures in a metric space (X, dX). The distance between µ1 and µ2 in the Prokhorov metric is defined by

ρP(µ1, µ2) := inf{ε > 0 : µ1(B) ≤ µ2(Bε) + ε ∀B ∈ B(X)}

where B(X) is the Borel σ-algebra in X. The set Bε is the ε-neighbourhood of B, i.e., Bε := {x ∈ X : inf_{z∈B} dX(x, z) < ε}.

We are interested in the limit of the random variable Mpost when the noise E tends to the zero random variable. These convergence issues were studied in [2, 3]. In this work these results are improved in the case where the covariance matrices are proportional to the identity matrix.

2 Convergence rate for the posterior distribution

We want to quantify the convergence in probability for M(Rn)-valued random variables. This can be achieved via the Ky Fan metric, which measures distances between random variables with values in a metric space:

Definition 2.1 Let ξ1 and ξ2 be random variables in a probability space (Ω, F, P) with values in a metric space (X, dX). The distance between ξ1 and ξ2 in the Ky Fan metric is defined by

ρK(ξ1, ξ2) := inf{ε > 0 : P(dX(ξ1(ω), ξ2(ω)) > ε) < ε}.
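
To make the definition concrete, the following sketch (our own illustration, not taken from [2]) estimates the Ky Fan distance from Monte Carlo samples of the distance dX(ξ1, ξ2); the empirical scan below only approximates the infimum.

```python
import numpy as np

def ky_fan_estimate(dist_samples):
    """Approximate inf{eps > 0 : P(d(xi1, xi2) > eps) < eps} from sampled distances."""
    d = np.sort(np.asarray(dist_samples))
    N = d.size
    for k in range(N):
        eps = d[k]
        # Empirical probability that the distance exceeds eps (ties ignored).
        if (N - k - 1) / N < eps:
            return eps
    return d[-1]

# Example: distance between E ~ N(0, sigma^2 I_m) and the zero random variable.
rng = np.random.default_rng(1)
m, sigma = 4, 0.05
samples = sigma * rng.standard_normal((100_000, m))
distances = np.linalg.norm(samples, axis=1)
print(ky_fan_estimate(distances))
```
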

Convergence rates are essentially preserved when they are lifted from a metric space to a space of random variables equipped with the Ky Fan metric. Hence we utilize the Ky Fan metric also for the noise E.

Lemma 2.2 [2, Lemma 12], [4, Proposition 2.5] Let the distribution of E be N(0, σ²I) for some σ > 0. Then

(i) ρK(E, 0) → 0 ⇐⇒ σ → 0,

(ii) ρK(E, 0) ≤ min{1, σ√(2(m − ln−(am σ²)))} = O(σ√(1 + |ln σ|))

where am = 2πm²(e/2)^m and ln−(x) := min{0, ln(x)}.
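
As a quick numerical illustration (our own sketch, assuming SciPy is available), ρK(E, 0) can be computed exactly from the fact that ‖E‖²/σ² has a χ²m distribution and then compared with the rate σ√(1 + |ln σ|) appearing in the lemma:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def ky_fan_noise(sigma, m):
    """Exact Ky Fan distance between E ~ N(0, sigma^2 I_m) and 0:
    the unique eps with P(||E|| > eps) = eps, found by bisection."""
    f = lambda eps: (1.0 - chi2.cdf((eps / sigma) ** 2, df=m)) - eps
    return brentq(f, 1e-12, 10.0)

m = 4
for sigma in [1e-1, 1e-2, 1e-3, 1e-4]:
    rho = ky_fan_noise(sigma, m)
    rate = sigma * np.sqrt(1.0 + abs(np.log(sigma)))
    print(f"sigma={sigma:.0e}  rho_K={rho:.3e}  sigma*sqrt(1+|ln sigma|)={rate:.3e}")
```
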

By the above lemma, the proper question is to what limit the random variable Mpost converges as σ → 0. Let p be the number of positive singular values of A and let the matrix V2 ∈ Rn×(n−p) contain orthonormal eigenvectors of A^T A that span the null space of A. In addition, let P be the orthogonal projection onto the null space of A. We use the notations x†0 := x† + Px0 and µx†0 := N(x†0, γ²V2V2^T). If the null space of A is trivial, µx†0 = δx†.

Theorem 2.3 [4, Theorem 3.1] The distance between the posterior distribution Mpost and the constant random variable µx†0 is bounded in the Ky Fan metric by

ρK(Mpost, µx†0) ≤ max{ ρK(E, 0),  (σ²/(γ²λp² + σ²)) ‖x†0 − x0‖ + (2γ²σ²/(γ²λp² + σ²) · (p − ln−(ap γ²σ²/(γ²λp² + σ²))))^(1/2) + (γ max(σ, γλp)/(max(σ, γλp)² + σ²)) ρK(E, 0) } = O(σ√(1 + |ln σ|))

where λp is the minimal positive singular value of A and ap = 2πp²(e/2)^p.
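
The following self-contained sketch (our own illustration, not code from [4]) explores the theorem numerically for a small rank-deficient matrix A: it tracks the distance of the posterior mean from x†0 = x† + Px0 as σ decreases and prints the rate σ√(1 + |ln σ|) for comparison; all concrete values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

# Rank-deficient forward matrix: the second column is twice the first.
A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 1.0],
              [0.0, 0.0, 1.0],
              [1.0, 2.0, 1.0]])
m, n = A.shape
x_true = np.array([1.0, -0.5, 2.0])
y = A @ x_true
x0 = np.array([0.3, 0.2, -0.1])
gamma = 1.0

# x† = A† y and the orthogonal projection P onto the null space of A.
A_pinv = np.linalg.pinv(A)
x_dagger = A_pinv @ y
P_null = np.eye(n) - A_pinv @ A
x_dagger0 = x_dagger + P_null @ x0

for sigma in [1e-1, 1e-2, 1e-3, 1e-4]:
    y_data = y + sigma * rng.standard_normal(m)
    reg = (sigma / gamma) ** 2
    x_post = np.linalg.solve(A.T @ A + reg * np.eye(n), A.T @ y_data + reg * x0)
    err = np.linalg.norm(x_post - x_dagger0)
    rate = sigma * np.sqrt(1.0 + abs(np.log(sigma)))
    print(f"sigma={sigma:.0e}  ||x_post - x_dagger0||={err:.3e}  rate={rate:.3e}")
```
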

As expected for regularization in finite-dimensional spaces we obtain a convergence rate with the same order as the noise. The result can be generalized to the normal case where the covariance matrices are arbitrary positive definite symmetric matrices and to an infinite-dimensional setting (see [4]).

Acknowledgements This work has been supported by the Austrian National Science Foundation FWF through the project SFB F1308.

References

[1] J. P. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems (Springer-Verlag, Berlin, Germany, 2004).
[2] A. Hofinger and H. K. Pikkarainen, Inverse Problems 23, 2469–2484 (2007).
[3] A. Hofinger and H. K. Pikkarainen, submitted.
[4] A. Neubauer and H. K. Pikkarainen, in preparation.
