
    The Annals of Statistics

2008, Vol. 36, No. 1, 310–336. DOI: 10.1214/009053607000000721. © Institute of Mathematical Statistics, 2008

    NONLINEAR ESTIMATION FOR LINEAR INVERSE PROBLEMS

WITH ERROR IN THE OPERATOR¹

BY MARC HOFFMANN AND MARKUS REISS

University of Marne-la-Vallée and University of Heidelberg

We study two nonlinear methods for statistical linear inverse problems when the operator is not known. The two constructions combine Galerkin regularization and wavelet thresholding. Their performances depend on the underlying structure of the operator, quantified by an index of sparsity. We prove their rate-optimality and adaptivity properties over Besov classes.

    1. Introduction.

Linear inverse problems with error in the operator. We want to recover $f \in L^2(D)$, where $D$ is a domain in $\mathbb{R}^d$, from data
$$g_\varepsilon = Kf + \varepsilon \dot W,\tag{1.1}$$
where $K$ is an unknown linear operator $K : L^2(D) \to L^2(Q)$, $Q$ is a domain in $\mathbb{R}^q$, $\dot W$ is Gaussian white noise and $\varepsilon > 0$ is the noise level. We do not know $K$ exactly, but we have access to
$$K_\delta = K + \delta \dot B.\tag{1.2}$$
The process $K_\delta$ is a blurred version of $K$, polluted by a Gaussian operator white noise $\dot B$ with noise level $\delta > 0$. The operator $K$ acting on $f$ is unknown and treated as a nuisance parameter. However, preliminary statistical inference about $K$ is possible, with an accuracy governed by $\delta$. Another equivalent approach is to consider that for experimental reasons we never have access to $K$ in practice, but rather to $K_\delta$. The error level $\delta$ can be linked to the accuracy of supplementary experiments; see Efromovich and Koltchinskii [11] and the examples below. In most interesting cases $K^{-1}$ is not continuous and the estimation problem (1.1) is ill-posed (e.g., see Nussbaum and Pereverzev [16] and Engl, Hanke and Neubauer [12]).

The statistical model is thus given by the observation $(g_\varepsilon, K_\delta)$. Asymptotics are taken as $\varepsilon, \delta \to 0$ simultaneously. In probabilistic terms, observable quantities take the form
$$\langle g_\varepsilon, k\rangle := \langle Kf, k\rangle_{L^2(Q)} + \varepsilon\langle \dot W, k\rangle,\qquad k \in L^2(Q)$$

Received December 2004; revised April 2007.
¹Supported in part by the European research training network Dynstoch.
AMS 2000 subject classifications. 65J20, 62G07.
Key words and phrases. Statistical inverse problem, Galerkin projection method, wavelet thresholding, minimax rate, degree of ill-posedness, matrix compression.


    and

$$\langle K_\delta h, k\rangle := \langle Kh, k\rangle_{L^2(Q)} + \delta\langle \dot Bh, k\rangle,\qquad (h,k) \in L^2(D) \times L^2(Q).$$

The mapping $k \in L^2(Q) \mapsto \langle \dot W, k\rangle$ defines a centered Gaussian linear form, with covariance
$$\mathbb{E}\big[\langle \dot W, k_1\rangle\langle \dot W, k_2\rangle\big] = \langle k_1, k_2\rangle_{L^2(Q)},\qquad k_1, k_2 \in L^2(Q).$$
Likewise, $(h,k) \in L^2(D)\times L^2(Q) \mapsto \langle \dot Bh, k\rangle$ defines a centered Gaussian bilinear form with covariance
$$\mathbb{E}\big[\langle \dot Bh_1, k_1\rangle\langle \dot Bh_2, k_2\rangle\big] = \langle h_1, h_2\rangle_{L^2(D)}\langle k_1, k_2\rangle_{L^2(Q)}.$$
If $(h_i)_{i\ge1}$ and $(k_i)_{i\ge1}$ form orthonormal bases of $L^2(D)$ and $L^2(Q)$, respectively (in particular, we will consider hereafter wavelet bases), the infinite vector $(\langle \dot W, k_j\rangle)_{j\ge1}$ and the infinite matrix $(\langle \dot Bh_i, k_j\rangle)_{i,j\ge1}$ have i.i.d. standard Gaussian entries. Another description of the operator white noise is given by stochastic integration using a Brownian sheet, which can be interpreted as a white noise model for kernel observations; see Section 2 below.

Main results. The interplay between $\varepsilon$ and $\delta$ is crucial: if $\delta \ll \varepsilon$, one expects to recover model (1.1) with a known $K$. On the other hand, we will exhibit a different picture if $\delta \gtrsim \varepsilon$. Even when the error in the signal $g_\varepsilon$ dominates $\delta$, the assumption $\delta = 0$ has to be handled carefully. We restrict our attention to the case $Q = D$ and nonnegative operators $K$ on $L^2(D)$.

We first consider a linear estimator based on the Galerkin projection method. For functions in the $L^2$-Sobolev space $H^s$ and suitable approximation spaces, the linear estimator converges with the minimax rate $\max\{\varepsilon,\delta\}^{2s/(2s+2t+d)}$, where $t > 0$ is the degree of ill-posedness of $K$.

For spatially inhomogeneous functions, like smooth functions with jumps, linear estimators cannot attain optimal rates of convergence; see, for example, Donoho and Johnstone [10]. Therefore we propose two nonlinear methods by separating the two steps of Galerkin inversion (I) and adaptive smoothing (S), which provides two strategies:
$$\text{Nonlinear Estimation I:}\quad (g_\varepsilon, K_\delta) \xrightarrow{(I)} \hat f_{\mathrm{lin}} \xrightarrow{(S)} \hat f_{I},$$
$$\text{Nonlinear Estimation II:}\quad (g_\varepsilon, K_\delta) \xrightarrow{(S)} (\hat g_\varepsilon, \hat K_\delta) \xrightarrow{(I)} \hat f_{II},$$
where $\hat f_{\mathrm{lin}}$ is a preliminary and undersmoothed linear estimator. We use a Galerkin scheme on a high-dimensional space as inversion procedure (I) and wavelet thresholding as adaptive smoothing technique (S), with a level-dependent thresholding rule in Nonlinear Estimation I and a noise reduction in the operator by entrywise thresholding of the wavelet matrix representation in Nonlinear Estimation II. To our knowledge, thresholding for the operator is new in a statistical framework.


From both mathematical and numerical perspectives, the inversion step is critical: we cannot choose an arbitrarily large approximation space for the inversion, even in Nonlinear Estimation II. Nevertheless, both methods are provably rate-optimal (up to a log factor in some cases for the second method) over a wide range of (sparse) nonparametric classes, expressed in terms of Besov spaces $B^s_{p,p}$ with $p \le 2$.

Organization of the paper. Section 2 discusses related approaches. The theory of linear and nonlinear estimation is presented in Sections 3 to 5. Section 6 discusses the numerical implementation. The proofs of the main theorems are deferred to Section 7, and the Appendix provides technical results and some tools from approximation theory.

    2. Related approaches with error in the operator.

Perturbed singular values. Adhering to a singular-value decomposition approach, Cavalier and Hengartner [3] assume that the singular functions of $K$ are known, but not its singular values. Examples include convolution operators. By an oracle-inequality approach, they show how to reconstruct $f$ efficiently when $\delta \lesssim \varepsilon$.

Physical devices. We are given an integral equation $Kf = g$ on a closed boundary surface $\Gamma$, where the boundary integral operator
$$Kh(x) = \int_\Gamma k(x,y)h(y)\,\sigma(dy)$$
is of order $-t < 0$, that is, $K : H^{-t/2}(\Gamma) \to H^{t/2}(\Gamma)$, and is given by a kernel $k(x,y)$ that is smooth as a function of $x$ and $y$ off the diagonal, but typically singular on the diagonal. Such kernels arise, for instance, by applying a boundary integral formulation to second-order elliptic problems. Examples include the single-layer potential operator in Section 6.2 below or Abel-type operators with $k(x,y) = b(x,y)/|x-y|^\alpha$ on $\Gamma = [0,1]$ for some $\alpha > 0$ (see, e.g., Dahmen, Harbrecht and Schneider [6]). Assuming that $k$ is tractable only up to some experimental error, we postulate the knowledge of $dk_\delta(x,y) = dk(x,y) + \delta\,dB(x,y)$, where $B$ is a Brownian sheet. Assuming moreover that our data $g_\varepsilon$ is perturbed by measurement noise as in (1.1), we recover our abstract framework.

Statistical inference. The widespread econometric model of instrumental variables (e.g., Hall and Horowitz [13]) is given by i.i.d. observations $(X_i, Y_i, W_i)$ for $i = 1, \ldots, n$, where $(X_i, Y_i)$ follow a regression model
$$Y_i = g(X_i) + U_i$$


with the exception that $\mathbb{E}[U_i\,|\,X_i] \ne 0$, but under the additional information given by the instrumental variables $W_i$ that satisfy $\mathbb{E}[U_i\,|\,W_i] = 0$. Denoting by $f_{XW}$ the joint density of $X$ and $W$, we define
$$k(x,z) := \int f_{XW}(x,w)f_{XW}(z,w)\,dw,\qquad Kh(x) := \int k(x,z)h(z)\,dz.$$
To draw inference on $g$, we use the identity $Kg(x) = \mathbb{E}\big[\mathbb{E}[Y|W]f_{XW}(x,W)\big]$. The data easily allow estimation of the right-hand side and of the kernel function $k$. We face exactly an ill-posed inverse problem with errors in the operator, except for certain correlations between the two noise sources and for the fact that the noise is caused by a density estimation problem. Note that $K$ has a symmetric nonnegative kernel and is therefore self-adjoint and nonnegative on $L^2$. Hall and Horowitz [13] obtain in their Theorem 4.2 the linear rate of Section 3 when replacing their terms as follows: $\varepsilon = \delta = n^{-1/2}$, $t = \alpha$, $s = \beta + 1/2$, $d = 1$.

In other statistical problems random matrices or operators are of key importance or even the main subject of interest, for instance the linear response function in functional data analysis (e.g., Cai and Hall [2]) or the empirical covariance operator for stochastic processes (e.g., Reiss [17]).

Numerical discretization. Even if the operator is known, the numerical analyst is confronted with the same question of error in the operator under a different angle: up to which accuracy should the operator be discretized? Even more importantly, by not using all available information on the operator, the discretized objects typically have a sparse data structure and thus require far less memory and computation time; see Dahmen, Harbrecht and Schneider [6].

3. A linear estimation method. In the following, we write $a \lesssim b$ when $a \le cb$ holds for some constant $c > 0$, and $a \sim b$ when $a \lesssim b$ and $b \lesssim a$ hold simultaneously. The uniformity in $c$ will be obvious from the context.

3.1. The linear Galerkin estimator. We briefly study a linear projection estimator. Given $s > 0$ and $M > 0$, we first consider the Sobolev ball
$$W^s(M) := \{f \in H^s;\ \|f\|_{H^s} \le M\}$$
as parameter space for the unknown $f$. Pick some $j \ge 0$ and let $V_j = \operatorname{span}\{\psi_\lambda,\ |\lambda| \le j\}$ denote an approximation space associated with an $(s+1)$-regular multiresolution analysis $(V_j)$; see Appendix A.6. We look for an estimator $\hat f_{\varepsilon,\delta} \in V_j$, solution to
$$\langle K_\delta \hat f_{\varepsilon,\delta}, v\rangle = \langle g_\varepsilon, v\rangle\quad\text{for all } v \in V_j.\tag{3.1}$$


This only makes sense if $K_\delta$ restricted to $V_j$ is invertible. We introduce the Galerkin projection (or stiffness matrix) of an operator $T$ onto $V_j$ by setting $T_j := P_jT|_{V_j}$, where $P_j$ is the orthogonal projection onto $V_j$, and set formally
$$\hat f_{\varepsilon,\delta} := \begin{cases}K_{\delta,j}^{-1}P_jg_\varepsilon, & \text{if } \|K_{\delta,j}^{-1}\|_{V_j\to V_j} \le \rho\,2^{jt},\\ 0, & \text{otherwise},\end{cases}\tag{3.2}$$
where $\|T_j\|_{V_j\to V_j} = \sup_{v\in V_j, \|v\|_{L^2}=1}\|T_jv\|_{L^2}$ denotes the norm of the operator $T_j : (V_j, \|\cdot\|_{L^2}) \to (V_j, \|\cdot\|_{L^2})$. The estimator $\hat f_{\varepsilon,\delta}$ is specified by the level $j$ and the cut-off parameter $\rho > 0$ (and the choice of the multiresolution analysis).
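A minimal sketch of the linear Galerkin estimator (3.2) for a matrix discretization; the pseudo-inverse and the plain spectral norm stand in for the exact Galerkin algebra and are our simplifying assumptions, not the paper's implementation:

```python
import numpy as np

def linear_galerkin(K_delta_j, g_eps_j, j, t, rho):
    """Sketch of the linear Galerkin estimator (3.2) on V_j.

    K_delta_j : observed stiffness matrix of K_delta on V_j
    g_eps_j   : coefficient vector of P_j g_eps in an orthonormal basis of V_j
    Returns 0 when the inverse violates the cut-off rho * 2^{j t}.
    """
    K_inv = np.linalg.pinv(K_delta_j)
    if np.linalg.norm(K_inv, ord=2) <= rho * 2.0 ** (j * t):
        return K_inv @ g_eps_j
    return np.zeros_like(g_eps_j)   # cut-off case of (3.2)
```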

3.2. Result. The ill-posedness comes from the fact that $K^{-1}$ is not $L^2$-continuous: we quantify the smoothing action by a degree of ill-posedness $t > 0$, which indicates that $K$ behaves roughly like $t$-fold integration. This is precisely defined by the following ellipticity condition in terms of the $L^2$-Sobolev norm $\|\cdot\|_{H^s}$ of regularity $s \in \mathbb{R}$; see Appendix A.6.

ASSUMPTION 3.1. $K$ is self-adjoint on $L^2(D)$, $K : L^2 \to H^t$ is continuous and $\langle Kf, f\rangle \gtrsim \|f\|^2_{H^{-t/2}}$.

As proved in Appendix A.6, Assumption 3.1 implies that the following mapping constant of $K$ with respect to the given multiresolution analysis $(V_j)$ is finite:
$$Q(K) := \sup_{j\ge0} 2^{-jt}\,\|K_j^{-1}\|_{V_j\to V_j} < \infty.\tag{3.3}$$
Introduce the integrated mean square error
$$R(\hat f, f) := \mathbb{E}\big[\|\hat f - f\|^2_{L^2(D)}\big]$$
for an estimator $\hat f$ of $f$ and the rate exponent
$$r(s,t,d) := \frac{2s}{2s+2t+d}.$$

PROPOSITION 3.2. Let $Q > 0$. If the linear estimator $\hat f_{\varepsilon,\delta}$ is specified by $2^j \sim \max\{\varepsilon,\delta\}^{-2/(2s+2t+d)}$ and $\rho > Q$, then
$$\sup_{f\in W^s(M)} R(\hat f_{\varepsilon,\delta}, f) \lesssim \max\{\varepsilon,\delta\}^{2r(s,t,d)}$$
holds uniformly over $K$ satisfying Assumption 3.1 with $Q(K) \le Q$.

The normalized rate $\max\{\varepsilon,\delta\}^{r(s,t,d)}$ gives the explicit interplay between $\varepsilon$ and $\delta$ and is indeed optimal over operators $K$ satisfying Assumption 3.1 with $Q(K) \le Q$; see Section 5.2 below. Proposition 3.2 is essentially contained in Efromovich and Koltchinskii [11], but is proved in Section 7.1 as a central reference for the nonlinear results.
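For orientation, a concrete evaluation of the rate exponent with illustrative values $d = 1$, $t = 1$, $s = 2$:
$$r(s,t,d) = \frac{2s}{2s+2t+d} = \frac{4}{4+2+1} = \frac{4}{7},$$
so the linear estimator then converges at the rate $\max\{\varepsilon,\delta\}^{4/7}$ in root mean square.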


    4. Two nonlinear estimation methods.

4.1. Nonlinear Estimation I. For $x > 0$ and two resolution levels $0 \le j_0 < j_1$, define the level-dependent hard-thresholding operator $S_x$ acting on $L^2(D)$ by
$$S_x(h) := \sum_{|\lambda|\le j_1}\langle h, \psi_\lambda\rangle\,1_{\{|\langle h,\psi_\lambda\rangle| \ge \kappa 2^{|\lambda|t}x\sqrt{(|\lambda|-j_0)_+}\}}\,\psi_\lambda,\tag{4.1}$$
for some constant $\kappa > 0$, where $(\psi_\lambda)$ is a regular wavelet basis generating the multiresolution analysis $(V_j)$. Our first nonlinear estimator is defined by
$$\hat f_I := S_{\max\{\varepsilon,\delta\}}(\hat f_{\varepsilon,\delta}),\tag{4.2}$$
where $\hat f_{\varepsilon,\delta}$ is the linear estimator (3.2) specified by the level $j_1$ and $\rho > 0$.

The factor $2^{|\lambda|t}$ in the threshold takes into account the increase in the noise level after applying the operator $K_{\delta,j_1}^{-1}$. The additional term $\sqrt{(|\lambda|-j_0)_+}$ is chosen to attain the exact minimax rate in the spirit of Delyon and Juditsky [8]. Hence, the nonlinear estimator $\hat f_I$ is specified by $j_0$, $j_1$, $\kappa$ and $\rho$.
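The following sketch renders the level-dependent rule (4.1) for coefficient vectors indexed by their resolution level; the flat coefficient layout and the default for $\kappa$ are hypothetical simplifications:

```python
import numpy as np

def S_x(coeffs, levels, x, t, j0, kappa=1.0):
    """Level-dependent hard thresholding (4.1), sketched for d = 1.

    coeffs : wavelet coefficients of the preliminary estimator, shape (m,)
    levels : resolution level |lambda| of each coefficient, shape (m,)
    The factor 2^{|lambda| t} mirrors the noise amplification of the Galerkin
    inversion; sqrt((|lambda| - j0)_+) is the Delyon-Juditsky correction, so
    coefficients at levels <= j0 are always kept.
    """
    thresh = kappa * 2.0 ** (levels * t) * x * np.sqrt(np.maximum(levels - j0, 0.0))
    return np.where(np.abs(coeffs) >= thresh, coeffs, 0.0)
```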

4.2. Nonlinear Estimation II. Our second method is conceptually different: we use matrix compression to remove the operator noise by thresholding $K_\delta$ in a first step and then apply the Galerkin inversion on the smoothed data $\hat g_\varepsilon$. Let
$$\hat K_\delta := S^{\mathrm{op}}_\delta(K_{\delta,J}),\tag{4.3}$$
where $K_{\delta,J} = P_JK_\delta|_{V_J}$ is the Galerkin projection of the observed operator and $S^{\mathrm{op}}_\delta$ is a hard-thresholding rule applied to the entries in the wavelet representation of the operator:
$$T_J \mapsto S^{\mathrm{op}}_\delta(T_J) := \sum_{|\lambda|,|\lambda'|\le J}T_{\lambda,\lambda'}\,1_{\{|T_{\lambda,\lambda'}| \ge T(\delta)\}}\,\langle\,\cdot\,,\psi_{\lambda'}\rangle\,\psi_\lambda,\tag{4.4}$$
where $T(x) = \kappa x\sqrt{|\log x|}$ and $T_{\lambda,\lambda'} := \langle T\psi_{\lambda'}, \psi_\lambda\rangle$.

The estimator $\hat g_\varepsilon$ of the data is obtained by the classical hard-thresholding rule for noisy signals:
$$\hat g_\varepsilon := \sum_{|\lambda|\le J}\langle g_\varepsilon, \psi_\lambda\rangle\,1_{\{|\langle g_\varepsilon,\psi_\lambda\rangle| \ge T(\varepsilon)\}}\,\psi_\lambda.\tag{4.5}$$
After this preliminary step, we invert the linear system on the multiresolution space $V_J$ to obtain our second nonlinear estimator:
$$\hat f_{II} := \begin{cases}\hat K_\delta^{-1}\hat g_\varepsilon, & \text{if } \|\hat K_\delta^{-1}\|_{H^{-t}\to L^2} \le \rho,\\ 0, & \text{otherwise}.\end{cases}\tag{4.6}$$
The nonlinear estimator $\hat f_{II}$ is thus specified by $J$, $\kappa$ and $\rho$. Observe that this time we do not use level-dependent thresholds since we threshold the empirical coefficients directly.
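A compact sketch of the pipeline (4.3)-(4.6) in matrix form; the plain operator-norm cut-off below is a crude stand-in for the $H^{-t} \to L^2$ condition in (4.6), which would require the wavelet weighting:

```python
import numpy as np

def T(x, kappa=1.0):
    """Universal threshold T(x) = kappa * x * sqrt(|log x|) from (4.4)-(4.5)."""
    return kappa * x * np.sqrt(abs(np.log(x)))

def nonlinear_estimation_II(K_delta_J, g_eps_J, delta, eps, rho):
    """Sketch of (4.3)-(4.6): compress the operator, denoise the data, invert."""
    K_hat = np.where(np.abs(K_delta_J) >= T(delta), K_delta_J, 0.0)  # (4.3)-(4.4)
    g_hat = np.where(np.abs(g_eps_J) >= T(eps), g_eps_J, 0.0)        # (4.5)
    K_inv = np.linalg.pinv(K_hat)
    if np.linalg.norm(K_inv, ord=2) <= rho:    # simplified cut-off, cf. (4.6)
        return K_inv @ g_hat
    return np.zeros_like(g_hat)
```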


    5. Results for the nonlinear estimators.

5.1. The setting. The nonlinearity of our two estimators permits us to consider wider ranges of function classes for our target: we measure the smoothness $s$ of $f$ in $L^p$-norm, with $1 \le p \le 2$, in terms of Besov spaces $B^s_{p,p}$. The minimax rates of convergence are computed over Besov balls
$$V^s_p(M) := \{f \in B^s_{p,p};\ \|f\|_{B^s_{p,p}} \le M\}$$
with radius $M > 0$. We show that an elbow in the minimax rates is given by the critical line
$$\frac1p = \frac12 + \frac{s}{2t+d},\tag{5.1}$$
considering $t$ and $d$ as fixed by the model setting. Equation (5.1) is linked to the geometry of inhomogeneous sparse signals that can be recovered in $L^2$-error after the action of $K$; see Donoho [9]. We retrieve the framework of Section 3 using $H^s = B^s_{2,2}$.

We prove in Section 5.2 that the first nonlinear estimator $\hat f_I$ achieves the optimal rate over Besov balls $V^s_p(M)$. In Section 5.3 we further show that, under some mild restriction, the nonlinear estimator $\hat f_{II}$ is adaptive in $s$ and nearly rate-optimal, losing a logarithmic factor in some cases.
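As a concrete instance of the critical line (5.1), take the illustrative values $d = 1$, $t = 1$, $p = 1$:
$$\frac1p = \frac12 + \frac{s}{2t+d} \iff 1 = \frac12 + \frac{s}{3} \iff s = \frac32,$$
so over $V^s_1(M)$ the dense regime corresponds to $s > 3/2$ and the sparse regime to $s < 3/2$.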

5.2. Minimax rates of convergence. In the following, we fix $s_+ \in \mathbb{N}$ and pick a wavelet basis $(\psi_\lambda)$ associated with an $s_+$-regular multiresolution analysis $(V_j)$. The minimax rates of convergence are governed by the parameters $s \in (0, s_+)$, $p > 0$ and separate into two regions:
$$\text{dense region:}\quad P_{\mathrm{dense}} := \Big\{(s,p) : \frac1p < \frac12 + \frac{s}{2t+d}\Big\},$$
$$\text{sparse region:}\quad P_{\mathrm{sparse}} := \Big\{(s,p) : \frac1p > \frac12 + \frac{s}{2t+d}\Big\}.$$

THEOREM 5.1. Let $Q > 0$. Specify the first nonlinear estimator $\hat f_I$ by $2^{j_0} \sim \max\{\varepsilon,\delta\}^{-2/(2s+2t+d)}$, $2^{j_1} \sim \max\{\varepsilon,\delta\}^{-1/t}$, $\rho > Q$ and $\kappa > 0$ sufficiently large. For $(s,p) \in P_{\mathrm{dense}}$ and $p \ge 1$ we have
$$\sup_{f\in V^s_p(M)} R(\hat f_I, f) \lesssim \max\{\varepsilon,\delta\}^{2r(s,t,d)},$$
uniformly over $K$ satisfying Assumption 3.1 with $Q(K) \le Q$.


For $(s,p) \in P_{\mathrm{sparse}}$ and $p \ge 1$ we have
$$\sup_{f\in V^s_p(M)} R(\hat f_I, f) \lesssim \max\big\{\varepsilon\sqrt{|\log\varepsilon|},\ \delta\sqrt{|\log\delta|}\big\}^{2r(s,p,t,d)},$$
uniformly over $K$ satisfying Assumption 3.1 with $Q(K) \le Q$, where now
$$r(s,p,t,d) := \frac{s+d/2-d/p}{s+t+d/2-d/p}.$$
A sufficient value for $\kappa$ can be made explicit by a careful study of Lemma 7.2 together with the proof of Delyon and Juditsky [8]; see the proofs below.

The rate obtained is indeed optimal in a minimax sense. The lower bound in the case $\delta = 0$ is classical (Nussbaum and Pereverzev [16]) and will not decrease for increasing noise levels $\varepsilon$ or $\delta$, whence it suffices to provide the case $\varepsilon = 0$.

The following lower bound can be derived from Efromovich and Koltchinskii [11] for $s > 0$, $p \in [1, \infty]$:
$$\inf_{\hat f}\sup_{(f,K)\in\mathcal{F}_{s,p,t}} R(\hat f, f) \gtrsim \delta^{2r(s,t,d)},\tag{5.2}$$
where the nonparametric class $\mathcal{F}_{s,p,t} = \mathcal{F}_{s,p,t}(M,Q)$ takes the form
$$\mathcal{F}_{s,p,t} = V^s_p(M) \times \{K \text{ satisfying Assumption 3.1 with } Q(K) \le Q\}.$$

For $(s,p) \in P_{\mathrm{dense}}$ the lower bound matches the upper bound attained by $\hat f_I$. In Appendix A.5 we prove the following sparse lower bound:

THEOREM 5.2. For $(s,p) \in P_{\mathrm{sparse}}$ we have
$$\inf_{\hat f}\sup_{(K,f)\in\mathcal{F}_{s,p,t}} R(\hat f, f) \gtrsim \big(\delta\sqrt{|\log\delta|}\big)^{2r(s,p,t,d)},\tag{5.3}$$
and also the sparse rate of the first estimator $\hat f_I$ is optimal.

5.3. The adaptive properties of Nonlinear Estimation II. We first state a general result which gives separate estimates for the two error levels of $\hat f_{II}$ associated with $\varepsilon$ and $\delta$, respectively, leading to faster rates of convergence than in Theorem 5.1 in the case of sparse operator discretizations.

ASSUMPTION 5.3. $K : B^s_{p,p} \to B^{s+t}_{p,p}$ is continuous.

Furthermore, we state an ad hoc hypothesis on the sparsity of $K$. It is expressed in terms of the wavelet discretization of $K$ and is specified by parameters $(\bar s, \bar p)$.

ASSUMPTION 5.4. For parameters $\bar s \ge 0$ and $\bar p > 0$ we have, uniformly over all multi-indices $\lambda$,
$$\|K\psi_\lambda\|_{B^{\bar s+t}_{\bar p,\bar p}} \lesssim 2^{|\lambda|(\bar s+d/2-d/\bar p)}.$$


Observe that this hypothesis follows from Assumption 5.3 with $(\bar s, \bar p) = (s, p)$, $p \ge 1$, due to $\|\psi_\lambda\|_{B^s_{p,p}} \sim 2^{|\lambda|(s+d/2-d/p)}$. The case $\bar p < 1$, however, expresses high sparsity: if $K$ is diagonal in a regular wavelet basis with eigenvalues of order $2^{-|\lambda|t}$, then Assumption 5.4 holds for all $\bar s \ge 0$ and $\bar p > 0$. For a less trivial example of a sparse operator see Section 6.2. Technically, Assumption 5.4 will allow us to control the error when thresholding the operator; see Proposition 7.4.
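The diagonal case can be checked directly from the wavelet norming of Appendix A.6: if $K\psi_\lambda = 2^{-|\lambda|t}\psi_\lambda$, then
$$\|K\psi_\lambda\|_{B^{\bar s+t}_{\bar p,\bar p}} = 2^{-|\lambda|t}\,\|\psi_\lambda\|_{B^{\bar s+t}_{\bar p,\bar p}} \sim 2^{-|\lambda|t}\,2^{|\lambda|(\bar s+t+d/2-d/\bar p)} = 2^{|\lambda|(\bar s+d/2-d/\bar p)},$$
which is exactly the bound required by Assumption 5.4, for every $\bar s$ and $\bar p$.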

Finally, we need to specify a restriction on the linear approximation error, expressed in terms of the regularity $\sigma$ in $H^\sigma$:
$$\sigma \ge \frac{s(t+d)}{s+t+d/2}\,\min\Big\{\frac{\log\varepsilon}{\log\delta},\ 1\Big\}\quad\text{in the case } \varepsilon > \delta^{1+d/t}.\tag{5.4}$$
Then for $s \in (0, s_+)$, $p \ge 1$ and $\bar p > 0$ we obtain the following general result in the dense case.

THEOREM 5.5. Grant Assumptions 3.1, 5.3 and 5.4. Let $(s,p), (\bar s,\bar p) \in P_{\mathrm{dense}}$ satisfy
$$\frac{2\bar s + d - 2d/\bar p}{2\bar s + 2t + d} \ge \frac{2s - d}{2t + d}\quad\text{with strict inequality for } \bar p > 1,\tag{5.5}$$
and assume restriction (5.4) for $\sigma \ge 0$. Choose $\kappa > 0$ and $\rho > 0$ sufficiently large and specify $2^J \sim \min\{\varepsilon^{-1/t},\ (\delta|\log\delta|)^{-1/(t+d)}\}$. Then
$$\sup_{f\in V^s_p(M)\cap W^\sigma(M)} R(\hat f_{II}, f) \lesssim \big(\varepsilon\sqrt{|\log\varepsilon|}\big)^{2r(s,t,d)} + \big(\delta\sqrt{|\log\delta|}\big)^{2r(\bar s,t,d)}.$$

The constant in the specification of $2^J$ cannot be too large; see the proof of Proposition 7.5. While the bounds for $\kappa$ and $\rho$ are explicitly computable from upper bounds on constants involved in the assumptions on the operator, they are in practice much too conservative, as is well known in the signal detection case (e.g., Donoho and Johnstone [10]) or the classical inverse problem case (Abramovich and Silverman [1]).

COROLLARY 5.6. Grant Assumptions 3.1 and 5.3. Suppose $(s,p) \in P_{\mathrm{dense}}$ and $\sigma \ge 0$ satisfies (5.4). Then
$$\sup_{f\in V^s_p(M)\cap W^\sigma(M)} R(\hat f_{II}, f) \lesssim \max\big\{\varepsilon\sqrt{|\log\varepsilon|},\ \delta\sqrt{|\log\delta|}\big\}^{2r(s,t,d)}$$
follows from the smoothness restriction $s \ge \big(d^2 + 8(2t+d)(d-d/p)\big)^{1/2}/4$, in particular in the cases $p = 1$ or $s \ge d(1+2t/d)^{1/2}/2$.

If in addition $d/p \le d/2 + s(s-d/2)/(s+t+d/2)$ holds, then we get rid of the linear restriction:
$$\sup_{f\in V^s_p(M)} R(\hat f_{II}, f) \lesssim \max\big\{\varepsilon\sqrt{|\log\varepsilon|},\ \delta\sqrt{|\log\delta|}\big\}^{2r(s,t,d)}.$$


PROOF. Set $\bar s = s$ and $\bar p = p$ and use that Assumption 5.3 implies Assumption 5.4. Then the smoothness restriction implies (5.5) and Theorem 5.5 applies. The particular cases follow because $s$ and $p$ are in $P_{\mathrm{dense}}$.

By Sobolev embeddings, $B^s_{p,p} \subseteq H^\sigma$ holds for $\sigma \le s + d/2 - d/p$ and the last assertion follows by substituting $\sigma$ in (5.4).

We conclude that Nonlinear Estimation II attains the minimax rate up to a logarithmic factor in the dense case, provided the smoothness $s$ is not too small. For $(s,p) \in P_{\mathrm{sparse}}$ the rate with exponent $r(s,p,t,d)$ is obtained via the Sobolev embedding $B^s_{p,p} \subseteq B^\sigma_{\pi,\pi}$ with $s - d/p = \sigma - d/\pi$ such that $(\sigma,\pi) \in P_{\mathrm{dense}}$, and even exact rate-optimality follows in the sparse case.

    6. Numerical implementation.

    6.1. Specification of the method. While the mapping properties of the un-known operator K along the scale of Sobolev or Besov spaces allow a propermathematical theory and a general understanding, it is per se an asymptotic pointof view: it is governed by the decay rate of the eigenvalues. For finite samples onlythe eigenvalues in the Galerkin projection KJ matter, which will be close to thefirst 2J d eigenvalues ofK . Consequently, even if the degree of ill-posedness ofKis known in advance (as is the case, e.g., in Reiss [17]), optimizing the numerical

    performance should rather rely on the induced norm KJ := K1J L2 on VJand not on Ht.

    Another practical point is that the cut-off rule using in the definitions (3.2)and (4.6) is not reasonable given just one sample, but needed to handle possiblyhighly distorted observations. An obvious way out is to consider only indices J ofthe approximation space VJ which are so small that K,J remains invertible andnot too ill-conditioned. Then the cut-off rule can be abandoned and the parameter is obsolete.

The estimator $\hat f_{II}$ is therefore specified by choosing an approximation space $V_J$ and a thresholding constant $\kappa$. Since a thresholding rule is applied to both signal and operator, possibly different values of $\kappa$ can be used. In our experience, thresholds that are smaller than the theoretical bounds, but slightly larger than good choices in classical signal detection, work well; see Abramovich and Silverman [1] for a similar observation.

The main constraint for selecting the subspace $V_J$ is that $J$ must not be so large that $K_{\delta,J}$ is far away from $K_J$. By a glance at condition (7.6) in the proof of Theorem 5.5, working with $\|\cdot\|_{K_J}$ instead of $\|\cdot\|_{H^{-t}}$ and with the observed operator before thresholding, we want that
$$\|K_{\delta,J} - K_J\|_{(V_J,\|\cdot\|_{L^2})\to(V_J,\|\cdot\|_{K_J})} \le \gamma\,\big\|K_J^{-1}\big\|^{-1}_{(V_J,\|\cdot\|_{K_J})\to(V_J,\|\cdot\|_{L^2})}$$


with some $\gamma \in (0,1)$. This reduces to $\|(\mathrm{Id} - \delta K_{\delta,J}^{-1}B_J)^{-1} - \mathrm{Id}\|_{V_J\to V_J} \le \gamma$, which by Lemma 7.1 is satisfied with very high probability provided
$$\lambda_{\min}(K_{\delta,J}) \ge c\,\delta\sqrt{\dim(V_J)},\tag{6.1}$$
where $\lambda_{\min}(\cdot)$ denotes the minimal eigenvalue and $c > 0$ is a constant depending on $\gamma$ and the desired confidence. Based on these arguments we propose the following sequential data-driven rule to choose the parameter $J$:
$$J := \min\big\{j \ge 0\ \big|\ \lambda_{\min}(K_{\delta,j+1}) < c\,\delta\sqrt{\dim(V_{j+1})}\big\}.\tag{6.2}$$
This rule might be slightly too conservative since after thresholding $\hat K_\delta$ will be closer to $K_J$ than $K_{\delta,J}$. It is, however, faster to implement and the desired confidence can be better tuned. In addition, a conservative choice of $J$ will only affect the estimation of very sparse and irregular functions.
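In code, rule (6.2) can be sketched as follows, assuming a symmetric discretization so that the minimal eigenvalue is real; the nested family of stiffness matrices and the constant $c$ are supplied by the user:

```python
import numpy as np

def choose_J(K_delta_blocks, delta, c=5.0):
    """Sequential data-driven choice of J following rule (6.2), sketched.

    K_delta_blocks[j] is the observed stiffness matrix K_{delta,j} on V_j
    (a nested family of square matrices). Stops as soon as the smallest
    eigenvalue of the next block drops below c * delta * sqrt(dim V_{j+1}).
    """
    for j in range(len(K_delta_blocks) - 1):
        K_next = K_delta_blocks[j + 1]
        lam_min = np.linalg.eigvalsh(K_next).min()   # symmetric case assumed
        if lam_min < c * delta * np.sqrt(K_next.shape[0]):
            return j
    return len(K_delta_blocks) - 1
```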

6.2. A numerical example. We consider a single-layer logarithmic potential operator that relates the density of the electric charge on a cylinder of radius $r = 1/4$ to the induced potential on the same cylinder, when the cylinder is assumed to be infinitely long and homogeneous in that direction. Describing the angle by $e^{2\pi i x}$ with $x \in [0,1]$, the operator is given by
$$Kf(x) = \int_0^1 k(x,y)f(y)\,dy\quad\text{with } k(x,y) = -\log\Big(\frac12\big|\sin\big(\pi(x-y)\big)\big|\Big).$$
The single-layer potential operator is known to satisfy a degree of ill-posedness $t = 1$ because of its logarithmic singularity on the diagonal. In Cohen, Hoffmann and Reiss [5] this operator has been used to demonstrate different solution methods for inverse problems with known operator: the singular-value decomposition (SVD), the linear Galerkin method and a nonlinear Galerkin method which corresponds to Nonlinear Estimation II in the case $\delta = 0$.

The aim here is to compare the performance of the presented methods given that not $K$, but only a noisy version $K_\delta$ is available. Our focus is on the reconstruction properties under noise in the operator and we choose $\varepsilon = 10^{-3}$, $\delta = 10^{-5}$. As in Cohen, Hoffmann and Reiss [5] we consider the tent function
$$f(x) = \max\Big(1 - 30\Big|x - \frac12\Big|,\ 0\Big),\qquad x \in [0,1],$$
as object to be estimated. Its spike at $x = 1/2$ will be difficult to reconstruct.
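The following sketch discretizes this example by a midpoint quadrature rule (our own crude stand-in for the wavelet Galerkin discretization used in the paper; the diagonal treatment of the integrable log singularity is a hypothetical regularization):

```python
import numpy as np

def kernel(x, y):
    """Logarithmic single-layer kernel k(x,y) = -log((1/2)|sin(pi(x-y))|)."""
    return -np.log(0.5 * np.abs(np.sin(np.pi * (x - y))))

n = 256                                  # grid size (illustrative choice)
x = (np.arange(n) + 0.5) / n             # midpoint grid on [0, 1]
X, Y = np.meshgrid(x, x, indexing="ij")

K = np.zeros((n, n))
off = ~np.eye(n, dtype=bool)
K[off] = kernel(X[off], Y[off]) / n      # midpoint quadrature off the diagonal
np.fill_diagonal(K, kernel(0.0, 0.5 / n) / n)   # crude finite surrogate for the
                                                # integrable log singularity

f_tent = np.maximum(1.0 - 30.0 * np.abs(x - 0.5), 0.0)  # tent target function
g = K @ f_tent                           # noiseless data Kf on the grid
```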

For implementing the linear and the two nonlinear methods we use Daubechies wavelets of order 8 (with an extremal phase choice). We calculate the wavelet decomposition of $K_\delta$ and $f$ up to the scale $J_{\max} = 10$ by Mallat's pyramidal algorithm. For the nonlinear methods the large space $V_J$, on which the Galerkin inversion is performed, is determined by the rule (6.2) with $c = 5$. Figure 1(a) shows the modulus of the wavelet discretization $(|K_{\lambda,\lambda'}|)$ of the operator $K$ on $V_J$ with $J = 7$. Multi-indices $\lambda$ with the same resolution level $j$ are presented next to


FIG. 1. Wavelet representation of $K$ (a) and $K_\delta$ (b).

each other; the resolution level $j$ decreases from left to right and from bottom to top. The units are multiples of $\delta$. The finger-like structure, showing large coefficients for low resolution levels, along the diagonal and certain subdiagonals, is typical for wavelet representations of integral (Calderón–Zygmund) operators and is due to the support properties of the wavelets; see, for example, Dahmen, Harbrecht and Schneider [6]. In Figure 1(b) the modulus of the wavelet discretization of the noisy observation $K_\delta$ is shown. The structures off the main diagonal are hardly discernible.

The performance of the methods for this simulation setup is very stable over different noise realizations. In Figure 2(a) a typical linear estimation result for the choice $j = 5$ is shown along with the true function (dashed). Remark that

FIG. 2. Linear estimator (a) and Nonlinear II estimator (b).


because of $2^5/2^7 = 1/4$ the result is obtained by using only the values of $K_\delta$ that are depicted in the upper right quarter $[0.75, 1]^2$ of Figure 1(b). For the oracle choice $j = 5$ the root mean square error (RMSE) is minimal and evaluates to 0.029.

For the two nonlinear estimation methods, the approximation space $V_J$ (i.e., $V_{j_1}$ for Nonlinear Estimation I) chosen by the data-driven rule is $J = 7$ for all realizations. As is to be expected, the simulation results deviate only marginally for different choices of $c \in [1, 20]$, giving either $J = 6$ or (mostly) $J = 7$. An implementation of Nonlinear Estimation I is based on a level-dependent thresholding factor which is derived from the average decay of the observed eigenvalues of $K_{\delta,J}$, ignoring the Delyon–Juditsky correction $\sqrt{(|\lambda|-j_0)_+}$. With the threshold base level $\kappa = 0.4$ (oracle choice) Nonlinear Estimation I produces an RMSE of 0.033. It shows a smaller error than the linear estimator at the flat parts far off the spike, but has difficulties with too large fluctuations close to the spike. The main underlying problem is that after the inversion the noise in the coefficients is heterogeneous even on the same resolution level, which is not reflected by the thresholding rule.

Setting the base level $\kappa = 1.5$ for thresholding the operator and the data, the resulting estimator $\hat f_{II}$ of Nonlinear Estimation II is shown in Figure 2(b). It has by far the best performance among all three estimators with an RMSE of 0.022. The only artefacts, from an a posteriori perspective, are found next to the spike and stem from overlapping wavelets needed to reconstruct the spike itself.

In Cohen, Hoffmann and Reiss [5] simulations were performed for $\varepsilon = 2 \cdot 10^{-4}$ knowing the operator $K$ ($\delta = 0$). There the respective RMSE under oracle specifications is 0.024 (SVD), 0.023 (linear Galerkin), 0.019 (nonlinear Galerkin). In comparison we see that roughly the same accuracy is achieved in the case $\varepsilon = 10^{-3}$, $\delta = 10^{-5}$, which shows that the error in the operator is less severe than the error in the data. This observation is corroborated by further simulations for different values of $\varepsilon$ and $\delta$.

In order to understand why in this example the error in the operator is less severe and Nonlinear Estimation II performs particularly well, let us consider more generally the properties of thresholding a sparse operator representation as in Figure 1(a). This is exactly the point where Assumption 5.4 comes into play with $\bar p \in (0,1)$. To keep it simple, let us focus on the extreme case of an operator $K$ which is diagonalized by the chosen wavelet basis with eigenvalues $2^{-|\lambda|t}$. Then $K$ satisfies Assumption 5.4 for all $(\bar s, \bar p)$ and by Theorem 5.5, choosing $\bar p$ such that $(\bar s, \bar p) \in P_{\mathrm{dense}}$ and restriction (5.5) is satisfied with equality, we infer
$$\sup_{f\in V^s_p(M)\cap W^\sigma(M)} R(\hat f_{II}, f) \lesssim \big(\varepsilon\sqrt{|\log\varepsilon|}\big)^{2r(s,t,d)} + \big(\delta\sqrt{|\log\delta|}\big)^{\min\{(2s-d)/t,\,2\}}.$$
This rate is barely parametric in $\delta$ for not too small $s$. Hence, Nonlinear Estimation II can profit from the usually sparse wavelet representation of an operator, even without any specific tuning. This important feature is shared neither by Nonlinear Estimation I nor by the linear Galerkin method.


    7. Proofs.

7.1. Proof of Proposition 3.2. By definition, $R(\hat f_{\varepsilon,\delta}, f)$ is bounded by a constant times the sum of three terms I + II + III, where term III comes from $\hat f_{\varepsilon,\delta} = 0$ if $\|K_{\delta,j}^{-1}\|_{V_j\to V_j} > \rho 2^{jt}$:
$$\mathrm{I} := \|f - f_j\|^2_{L^2},$$
$$\mathrm{II} := \mathbb{E}\big[\|(K_{\delta,j}^{-1}P_jg_\varepsilon - f_j)\,1_{\{\|K_{\delta,j}^{-1}\|_{V_j\to V_j}\le\rho2^{jt}\}}\|^2_{L^2}\big],$$
$$\mathrm{III} := \|f\|^2_{L^2}\ \mathbb{P}\big(\|K_{\delta,j}^{-1}\|_{V_j\to V_j} > \rho 2^{jt}\big).$$

The term I. This bias term satisfies under Assumption 3.1
$$\|f - f_j\|^2_{L^2} \lesssim 2^{-2js} \sim \max\{\varepsilon,\delta\}^{4s/(2s+2t+d)}$$
by estimate (A.1) in the Appendix and thus has the right order.

The term III. For $\gamma \in (0,1)$ let us introduce the event
$$\Omega_{\gamma,\delta,j} = \big\{\delta\|K_j^{-1}B_j\|_{V_j\to V_j} \le \gamma\big\}.\tag{7.1}$$
On the event $\Omega_{\gamma,\delta,j}$ the operator $K_{\delta,j} = K_j(\mathrm{Id} + \delta K_j^{-1}B_j)$ is invertible with $\|K_{\delta,j}^{-1}\|_{V_j\to V_j} \le (1-\gamma)^{-1}\|K_j^{-1}\|_{V_j\to V_j}$ because
$$\|(\mathrm{Id} + \delta K_j^{-1}B_j)^{-1}\|_{V_j\to V_j} \le \sum_{m\ge0}\|\delta K_j^{-1}B_j\|^m_{V_j\to V_j} \le (1-\gamma)^{-1}\tag{7.2}$$
follows from the usual Neumann series argument. By (3.3), the choice $\gamma > 1 - Q/\rho$ in $(0,1)$ thus implies $\{\|K_{\delta,j}^{-1}\|_{V_j\to V_j} > \rho 2^{jt}\} \subseteq \Omega^c_{\gamma,\delta,j}$. For $\eta = 1 - (2t+d)/(2s+2t+d) > 0$ and sufficiently small $\delta$, we claim that
$$\mathbb{P}(\Omega^c_{\gamma,\delta,j}) \le \exp\big(-C\delta^{-2\eta}2^{jd}\big)\quad\text{for some } C > 0,\tag{7.3}$$
which implies that term III is of exponential order and hence negligible. To prove (7.3), we infer from (3.3)
$$\Omega^c_{\gamma,\delta,j} \subseteq \big\{2^{-jd/2}\|B_j\|_{V_j\to V_j} > \gamma\delta^{-1}\|K_j^{-1}\|^{-1}_{V_j\to V_j}2^{-jd/2}\big\} \subseteq \big\{2^{-jd/2}\|B_j\|_{V_j\to V_j} > \gamma\delta^{-1}Q^{-1}2^{-j(2t+d)/2}\big\}$$
and the claim (7.3) follows from $\delta^{-1}2^{-j(2t+d)/2} \gtrsim \delta^{-\eta}$ and the following classical bound for Gaussian random matrices:

LEMMA 7.1 ([7], Theorem II.4). There are constants $\xi_0, c, C > 0$ such that
$$\forall \xi \ge \xi_0:\quad \mathbb{P}\big(2^{-jd/2}\|B_j\|_{V_j\to V_j} \ge \xi\big) \le \exp(-c\,\xi^2 2^{jd}),$$
$$\forall \xi \ge \xi_0:\quad \mathbb{P}\big(2^{-jd/2}\|B_j\|_{V_j\to V_j} \ge \xi\big) \le (C\xi)^{-2^{jd}}.$$


The term II. Writing $P_jg_\varepsilon = P_jKf + \varepsilon P_j\dot W$ and using the independence of the event $\Omega^c_{\gamma,\delta,j}$ from $P_j\dot W$ (recall $\dot B$ and $\dot W$ are independent), we obtain
$$\mathbb{E}\big[\|K_{\delta,j}^{-1}P_jg_\varepsilon - f_j\|^2_{L^2}\,1_{\{\|K_{\delta,j}^{-1}\|_{V_j\to V_j}\le\rho2^{jt}\}}\,1_{\Omega^c_{\gamma,\delta,j}}\big] \lesssim 2^{2jt}\big(\|P_jKf\|^2_{L^2} + \varepsilon^2\mathbb{E}[\|P_j\dot W\|^2_{L^2}] + \|f_j\|^2_{L^2}\big)\,\mathbb{P}(\Omega^c_{\gamma,\delta,j}).$$
Because of $\|P_jKf\|^2_{L^2} + \|f_j\|^2_{L^2} \lesssim M^2$, $\mathbb{E}[\|P_j\dot W\|^2_{L^2}] \lesssim 2^{jd}$ and estimate (7.3), we infer that the above term is asymptotically negligible. Therefore, we are left with proving that $\mathbb{E}[\|K_{\delta,j}^{-1}P_jg_\varepsilon - f_j\|^2_{L^2}1_{\Omega_{\gamma,\delta,j}}]$ has the right order. On $\Omega_{\gamma,\delta,j}$, we consider the decomposition
$$K_{\delta,j}^{-1}P_jg_\varepsilon - f_j = \big((\mathrm{Id} + \delta K_j^{-1}B_j)^{-1} - \mathrm{Id}\big)f_j + \varepsilon(\mathrm{Id} + \delta K_j^{-1}B_j)^{-1}K_j^{-1}P_j\dot W.\tag{7.4}$$
As for the second term on the right-hand side of (7.4), we have
$$\mathbb{E}\big[\varepsilon^2\|(\mathrm{Id} + \delta K_j^{-1}B_j)^{-1}K_j^{-1}P_j\dot W\|^2_{L^2}1_{\Omega_{\gamma,\delta,j}}\big] \le \varepsilon^2\,\mathbb{E}\big[\|(\mathrm{Id} + \delta K_j^{-1}B_j)^{-1}\|^2_{V_j\to V_j}1_{\Omega_{\gamma,\delta,j}}\big]\,\|K_j^{-1}\|^2_{V_j\to V_j}\,\mathbb{E}[\|P_j\dot W\|^2_{L^2}] \lesssim \varepsilon^2 2^{2jt}2^{jd} \lesssim \max\{\varepsilon,\delta\}^{4s/(2s+2t+d)},$$
where we used again the independence of $\Omega_{\gamma,\delta,j}$ and $P_j\dot W$ and the bound (7.2). The first term on the right-hand side of (7.4) is treated by
$$\mathbb{E}\big[\|\delta K_j^{-1}B_j(\mathrm{Id} + \delta K_j^{-1}B_j)^{-1}f_j\|^2_{L^2}1_{\Omega_{\gamma,\delta,j}}\big] \le \delta^2\|K_j^{-1}\|^2_{V_j\to V_j}\|f_j\|^2_{L^2}\,\mathbb{E}\big[\|B_j\|^2_{V_j\to V_j}\|(\mathrm{Id}+\delta K_j^{-1}B_j)^{-1}\|^2_{V_j\to V_j}1_{\Omega_{\gamma,\delta,j}}\big] \lesssim \delta^2\|K_j^{-1}\|^2_{V_j\to V_j}\mathbb{E}[\|B_j\|^2_{V_j\to V_j}] \lesssim \delta^2 2^{2jt}2^{jd} \lesssim \max\{\varepsilon,\delta\}^{4s/(2s+2t+d)},$$
where we successively used the triangle inequality, $\lim_j\|f_j\|_{L^2} = \|f\|_{L^2}$ from (A.1), bound (7.2), Lemma 7.1 and (3.3).

7.2. Proof of Theorem 5.1.

The main decomposition. The error $R(\hat f_I, f)$ is bounded by a constant times the sum I + II + III with
$$\mathrm{I} := \|f - f_{j_1}\|^2_{L^2},$$
$$\mathrm{II} := \mathbb{E}\big[\|S_{\max\{\varepsilon,\delta\}}(\hat f_{\varepsilon,\delta}) - f_{j_1}\|^2_{L^2}\,1_{\{\|K_{\delta,j_1}^{-1}\|_{V_{j_1}\to V_{j_1}}\le\rho2^{j_1t}\}}\big],$$
$$\mathrm{III} := \|f\|^2_{L^2}\,\mathbb{P}\big(\|K_{\delta,j_1}^{-1}\|_{V_{j_1}\to V_{j_1}} > \rho 2^{j_1t}\big).$$


For the term I, we use the bias estimate (A.1) and the choice of $2^{j_1}$. The term III is analogous to the term III in the proof of Proposition 3.2; we omit the details. To treat the main term II, we establish sharp error bounds for the empirical wavelet coefficients $\langle \hat f_{\varepsilon,\delta}, \psi_\lambda\rangle$ for $|\lambda| \le j_1$.

The empirical wavelet coefficients. We consider again the event $\Omega_{\gamma,\delta,j_1}$ from (7.1) with $j_1$ in place of $j$. On that event, we have the decomposition
$$\hat f_{\varepsilon,\delta} = K_{\delta,j_1}^{-1}P_{j_1}g_\varepsilon = f_{j_1} - \delta K_{j_1}^{-1}B_{j_1}f_{j_1} + \varepsilon K_{j_1}^{-1}P_{j_1}\dot W + r^{(1)}_{\delta,j_1} + r^{(2)}_{\varepsilon,\delta,j_1},$$
with
$$r^{(1)}_{\delta,j_1} = \sum_{n\ge2}(-\delta K_{j_1}^{-1}B_{j_1})^nf_{j_1},\qquad r^{(2)}_{\varepsilon,\delta,j_1} = -\varepsilon\delta K_{j_1}^{-1}B_{j_1}(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}K_{j_1}^{-1}P_{j_1}\dot W.$$

In the Appendix we derive the following properties.

LEMMA 7.2. Let $|\lambda| \le j_1$ and $\gamma \in (0, 1 - Q/\rho)$. Under Assumption 3.1 the following decomposition holds on $\Omega_{\gamma,\delta,j_1}$:
$$\langle\delta K_{j_1}^{-1}B_{j_1}f_{j_1}, \psi_\lambda\rangle = \delta\,2^{|\lambda|t}\,\|f_{j_1}\|_{L^2}\,c_\lambda\zeta_\lambda,$$
$$\langle\varepsilon K_{j_1}^{-1}P_{j_1}\dot W, \psi_\lambda\rangle = \varepsilon\,2^{|\lambda|t}\,c'_\lambda\zeta'_\lambda,$$
$$\langle r^{(1)}_{\delta,j_1}, \psi_\lambda\rangle = \delta^2\,2^{|\lambda|t}\,\|f_{j_1}\|_{L^2}\,2^{j_1(t+d)}\,\zeta_{\lambda,j_1},$$
$$\langle r^{(2)}_{\varepsilon,\delta,j_1}, \psi_\lambda\rangle = \varepsilon\delta\,2^{|\lambda|t}\,2^{j_1(t+d/2)}\,\bar\zeta_{\lambda,j_1},$$
where $|c_\lambda|, |c'_\lambda| \lesssim 1$, $\zeta_\lambda$ and $\zeta'_\lambda$ are standard Gaussian variables and $\zeta_{\lambda,j_1}$, $\bar\zeta_{\lambda,j_1}$ are random variables satisfying
$$\max\Big(\mathbb{P}\big(\{|\zeta_{\lambda,j_1}| \ge \xi\}\cap\Omega_{\gamma,\delta,j_1}\big),\ \mathbb{P}\big(\{|\bar\zeta_{\lambda,j_1}| \ge \xi\}\cap\Omega_{\gamma,\delta,j_1}\big)\Big) \le \exp\big(-c\,\xi\,2^{2j_1d}\big)$$
for all $\xi \ge \xi_0$ with some (explicitly computable) constants $\xi_0, c > 0$.

From this explicit decomposition we shall derive the fundamental deviation bound
$$\mathbb{P}\Big(\big\{2^{-|\lambda|t}|\langle\hat f_{\varepsilon,\delta} - f_{j_1}, \psi_\lambda\rangle| \ge \kappa\max\{\varepsilon,\delta\}\big\}\cap\Omega_{\gamma,\delta,j_1}\Big) \le 4\exp\big(-C\kappa\min\{\kappa, 2^{j_1d}\}\big)\tag{7.5}$$
for all $|\lambda| \le j_1$ and some explicitly computable constant $C > 0$. Once this is achieved, we are in the standard signal detection setting with exponentially tight noise. The asserted bound for term II is then proved exactly following the lines in [8]; see also the heteroskedastic treatment in [14]. The only fine point is that we


estimate the Galerkin projection $f_{j_1}$, not $f$, but by estimate (A.2) in the Appendix $\|f_{j_1}\|_{B^s_{p,p}} \lesssim \|f\|_{B^s_{p,p}}$.

    It remains to establish the deviation bound (7.5). By Lemma 7.2, that probability

    is bounded by the sum of the four terms

    PI := Pfj1L2 c||

    4

    ,

    PII := P|c|

    4

    ,

    PIII := P

    2j1(t+d)fj1L2 ,j1

    4

    ,,j1

    ,

    PIV := P2j1(t+d/2),j1 4 ,,j1.We obtain the bounds PI exp(cI2), PII exp(cII2) with some constantscI, cII > 0 by Gaussian deviations. The large deviation bound on ,j1 in Lemma7.2 implies with a constant cIII > 0

    PIII expcIII2j1(t+d2d)1.

    Equally, PIV exp(cIV2j1(t+d/22d)1) follows, which proves (7.5) withsome C > 0 depending on cI to cIV since 1 2j1t by construction.

7.3. Proof of Theorem 5.5. The proof of Theorem 5.5 is a combination of a deviation bound for the hard-thresholding estimator in $H^{-t}$-loss together with an error estimate in operator norm. The following three estimates are the core of the proof and seem to be new. They are proved in the Appendix.

PROPOSITION 7.3 (Deviation in $H^{-t}$-norm). Assume $\kappa > 4\sqrt{t/d}$, $2^J \lesssim \varepsilon^{-1/t}$ and that $s \ge 0$, $p > 0$ are in $P_{\mathrm{dense}}$. Then there exist constants $c_0, \varepsilon_0, R_0 > 0$ such that for all functions $g \in B^{s+t}_{p,p}$ the hard-thresholding estimate $\hat g_\varepsilon$ in (4.5) satisfies with $m := \max\{\|P_Jg\|_{B^{s+t}_{p,p}},\ \|P_Jg\|^{p/2}_{B^{s+t}_{p,p}}\}$:
$$\forall \varepsilon \le \varepsilon_0:\quad \mathbb{P}\big(\|\hat g_\varepsilon - P_Jg\|_{H^{-t}} \ge m^{1-r(s,t,d)}\,T(\varepsilon)^{r(s,t,d)}\big) \le c_0\big(\varepsilon^2 + \varepsilon^{\kappa^2/8 - d/t}\big),$$
$$\forall R \ge R_0:\quad \mathbb{P}\big(\|\hat g_\varepsilon - P_Jg\|_{H^{-t}} \ge m + R\big) \le \varepsilon^{\kappa^2/16 - d/t}\,R^{-4}.$$

PROPOSITION 7.4 (Estimation in operator norm, $L^2$-bound). Suppose $\kappa^2 \ge 32\max\{d/t,\ 1 + d(2t+d)/(4t(t+d))\}$. Grant Assumption 5.4 with $\bar s > 0$, $\bar p > 0$ satisfying restriction (5.5). Then
$$\mathbb{E}\big[\|\hat K_\delta - K_J\|^2_{(V_J,\|\cdot\|_{B^s_{p,p}})\to H^{-t}}\big] \lesssim \big(\delta\sqrt{|\log\delta|}\big)^{2r(\bar s,t,d)}.$$


Second equality. We write $\langle K_{j_1}^{-1}P_{j_1}\dot W, \psi_\lambda\rangle = \langle\dot W, K_{j_1}^{-1}\psi_\lambda\rangle$, which is centered Gaussian with variance $\|K_{j_1}^{-1}\psi_\lambda\|^2_{L^2}$, and the foregoing arguments apply.

Third equality. On $\Omega_{\gamma,\delta,j_1}$ the term $|\langle r^{(1)}_{\delta,j_1}, \psi_\lambda\rangle|$ equals
$$\big|\big\langle(\delta K_{j_1}^{-1}B_{j_1})^2(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}f_{j_1}, \psi_\lambda\big\rangle\big| = \delta^2\big|\big\langle B_{j_1}K_{j_1}^{-1}B_{j_1}(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}f_{j_1},\ K_{j_1}^{-1}\psi_\lambda\big\rangle\big| \le \delta^2\,\|B_{j_1}\|^2_{V_{j_1}\to V_{j_1}}\,\|K_{j_1}^{-1}\|_{V_{j_1}\to V_{j_1}}\,\|(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}\|_{V_{j_1}\to V_{j_1}}\,\|f_{j_1}\|_{L^2}\,\|K_{j_1}^{-1}\psi_\lambda\|_{L^2} \lesssim \delta^2\,\|B_{j_1}\|^2_{V_{j_1}\to V_{j_1}}\,2^{j_1t}\,2^{|\lambda|t}\,\|f_{j_1}\|_{L^2},$$
where we successively applied the Cauchy–Schwarz inequality, (3.3), (7.2) on $\Omega_{\gamma,\delta,j_1}$ and (A.1), together with the same arguments as before to bound $\|K_{j_1}^{-1}\psi_\lambda\|_{L^2}$. Lemma 7.1 yields the result.

Fourth equality. Since $\dot W$ and $\dot B$ are independent, we have that, conditional on $\dot B$, the random variable $\langle r^{(2)}_{\varepsilon,\delta,j_1}, \psi_\lambda\rangle 1_{\Omega_{\gamma,\delta,j_1}}$ is centered Gaussian with conditional variance
$$\varepsilon^2\delta^2\,\big\|K_{j_1}^{-1}B_{j_1}(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}K_{j_1}^{-1}\psi_\lambda\big\|^2_{L^2}1_{\Omega_{\gamma,\delta,j_1}} \le \varepsilon^2\delta^2\,\big\|B_{j_1}(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}K_{j_1}^{-1}\big\|^2_{V_{j_1}\to V_{j_1}}\,\|K_{j_1}^{-1}\psi_\lambda\|^2_{L^2}\,1_{\Omega_{\gamma,\delta,j_1}} \lesssim \varepsilon^2\delta^2\,2^{2(|\lambda|+j_1)t}\,\|B_{j_1}\|^2_{V_{j_1}\to V_{j_1}}$$
by (3.3) and estimate (7.2), which is not affected when passing to the adjoint, up to an appropriate modification of $\Omega_{\gamma,\delta,j_1}$ incorporating $B^*_{j_1}$. We conclude by applying Lemma 7.1, which is also not affected when passing to the adjoint.

A.2. Proof of Proposition 7.3. Denote by $g_\lambda$ and $\hat g_\lambda$ the wavelet coefficients of $g$ and $\hat g_\varepsilon$. We have
$$\|\hat g_\varepsilon - P_Jg\|^2_{H^{-t}} \sim \sum_{|\lambda|\le J}2^{-2|\lambda|t}\big(\hat g_\lambda 1_{\{|\hat g_\lambda|\ge T(\varepsilon)\}} - g_\lambda\big)^2.$$
The usual decomposition yields a bound of the right-hand side by the sum of four terms I + II + III + IV with
$$\mathrm{I} := \sum_\lambda 2^{-2|\lambda|t}(\hat g_\lambda - g_\lambda)^2\,1_{\{|g_\lambda|\ge\frac12T(\varepsilon)\}},$$


$$\mathrm{II} := \sum_\lambda 2^{-2|\lambda|t}(\hat g_\lambda - g_\lambda)^2\,1_{\{|\hat g_\lambda - g_\lambda|>\frac12T(\varepsilon)\}},$$
$$\mathrm{III} := \sum_\lambda 2^{-2|\lambda|t}g_\lambda^2\,1_{\{|\hat g_\lambda - g_\lambda|>T(\varepsilon)\}},$$
$$\mathrm{IV} := \sum_\lambda 2^{-2|\lambda|t}g_\lambda^2\,1_{\{|g_\lambda|\le2T(\varepsilon)\}}.$$


By definition, each $\chi_j$ has a normalized (to expectation 1) $\chi^2$-distribution and so has any convex combination $\sum_j a_j\chi_j$. For the latter we infer $\mathbb{P}(\sum_j a_j\chi_j \ge \zeta^2) \le e^{-\zeta^2/2}$, $\zeta \ge 1$, by regarding the extremal case of only one degree of freedom. Consequently, we obtain $\mathbb{P}(c_1^{-1}2^{j(2t+d)}\mathrm{I} \ge \zeta^2) \le e^{-\zeta^2/2}$ with some constant $c_1 > 0$. Substituting for $j$, we conclude with another constant $c_2 > 0$ that
$$\mathbb{P}\Big(\mathrm{I} \ge \frac12\kappa^2\|P_Jg\|^p_{B^{s+t}_{p,p}}\big(T(\varepsilon)\|P_Jg\|^{-p/2}_{B^{s+t}_{p,p}}\big)^{2r(s,t,d)}\Big) \le \exp\big(-c_2\kappa^2|\log\varepsilon|\big) = \varepsilon^{c_2\kappa^2}.$$

The terms II and III. For these deviation terms we obtain by independence and a Gaussian tail estimate
$$\mathbb{P}\big(\{\mathrm{II}\ne0\}\cup\{\mathrm{III}\ne0\}\big) \le 1 - \mathbb{P}\Big(|\hat g_\lambda - g_\lambda| \le \frac12T(\varepsilon)\ \text{for all}\ |\lambda|\le J\Big) \le \exp\big(-\kappa^2|\log\varepsilon|/8\big)\,\#V_J.$$
Using $\#V_J \lesssim 2^{Jd} \lesssim \varepsilon^{-d/t}$, we derive $\mathbb{P}(\mathrm{II}+\mathrm{III}>0) \lesssim \varepsilon^{\kappa^2/8 - d/t}$.

The first assertion. We obtain for some $\varepsilon_0 \le 1$ and all $\varepsilon \le \varepsilon_0$:
$$\mathbb{P}\big(\|\hat g_\varepsilon - P_Jg\|_{H^{-t}} \ge m^{1-r(s,t,d)}T(\varepsilon)^{r(s,t,d)}\big) \le \mathbb{P}\Big(\mathrm{I} > \frac12\kappa^2\|P_Jg\|^{p(1-r(s,t,d))}_{B^{s+t}_{p,p}}T(\varepsilon)^{2r(s,t,d)}\Big) + \mathbb{P}(\mathrm{II}+\mathrm{III}>0) + \mathbb{P}\Big(\mathrm{IV} > \frac12\kappa^2\|P_Jg\|^{2-2r(s,t,d)}_{B^{s+t}_{p,p}}T(\varepsilon)^{2r(s,t,d)}\Big) \le \varepsilon^{c_2\kappa^2} + \varepsilon^{\kappa^2/8-d/t} + 0.$$

The second assertion. We show that the deviation terms are well bounded in probability. While obviously $\mathrm{III} \le \|P_Jg\|^2_{H^{-t}} \lesssim \|P_Jg\|^2_{B^{s+t}_{p,p}}$ holds,
$$\mathbb{E}[\mathrm{II}] \le \sum_{|\lambda|\le J}2^{-2|\lambda|t}\,\mathbb{E}\big[(\hat g_\lambda - g_\lambda)^4\big]^{1/2}\,\mathbb{P}\big(|\hat g_\lambda - g_\lambda| > T(\varepsilon)/2\big)^{1/2}$$
is bounded in order by $2^{J(2t+d)}\,\varepsilon^2\exp(-\kappa^2|\log\varepsilon|/8)^{1/2} \lesssim \varepsilon^{\kappa^2/16-d/t}$ due to $2^J \lesssim \varepsilon^{-1/t}$. In the same way we find
$$\mathrm{Var}[\mathrm{II}] \le \sum_{|\lambda|\le J}2^{-4|\lambda|t}\,\mathbb{E}\big[(\hat g_\lambda - g_\lambda)^8\big]^{1/2}\,\mathbb{P}\big(|\hat g_\lambda - g_\lambda| > T(\varepsilon)/2\big)^{1/2} \lesssim \varepsilon^{\kappa^2/16-d/t}.$$
By Chebyshev's inequality, we infer $\mathbb{P}(\mathrm{II} \ge R^2) \lesssim \varepsilon^{\kappa^2/16-d/t}R^{-4}$ for $R > 0$. Since the above estimates of the approximation terms yield superoptimal deviation bounds, the estimate follows for sufficiently large $R$.


A.3. Proof of Proposition 7.4. The wavelet characterization of Besov spaces (cf. Appendix A.6) together with Hölder's inequality for $p^{-1} + q^{-1} = 1$ yields
$$\|\hat K_\delta - K_J\|_{(V_J,\|\cdot\|_{B^s_{p,p}})\to H^{-t}} \sim \sup_{\|(a_\lambda)\|_{\ell^p}=1}\Big\|(\hat K_\delta - K_J)\sum_{|\lambda|\le J}2^{-|\lambda|(s+d/2-d/p)}a_\lambda\psi_\lambda\Big\|_{H^{-t}} \le \Big\|\Big(2^{-|\lambda|(s+d/2-d/p)}\|(\hat K_\delta - K_J)\psi_\lambda\|_{H^{-t}}\Big)_{|\lambda|\le J}\Big\|_{\ell^q} \le \Big\|\Big(2^{-|\lambda|(s+d/2-d/p)}\|K\psi_\lambda\|^{1-r(\bar s,t,d)}_{B^{\bar s+t}_{\bar p,\bar p}}\Big)_{|\lambda|\le J}\Big\|_{\ell^q}\ \sup_{|\lambda|\le J}\Big(\|K\psi_\lambda\|^{r(\bar s,t,d)-1}_{B^{\bar s+t}_{\bar p,\bar p}}\,\|(\hat K_\delta - K_J)\psi_\lambda\|_{H^{-t}}\Big).$$
Due to Assumption 5.4 the last $\ell^q$-norm can be estimated in order by $\big\|\big(2^{j((s-d/2)-(\bar s+d/2-d/\bar p)(1-r(\bar s,t,d)))}\big)_{j\le J}\big\|_{\ell^q}$, which is of order 1 whenever restriction (5.5) is fulfilled.

By construction, $\hat K_\delta\psi_\lambda$ is the hard-thresholding estimator for $K_J\psi_\lambda$ given the observation of $K_{\delta,J}\psi_\lambda$, which is $K_J\psi_\lambda$ corrupted by white noise of level $\delta$. Therefore Proposition 7.3 applied to $K\psi_\lambda$ and $\delta$ gives for any $|\lambda|\le J$ and $\delta \le \delta_0$:
$$\mathbb{P}\Big(\|K\psi_\lambda\|^{r(\bar s,t,d)-1}_{B^{\bar s+t}_{\bar p,\bar p}}\,\|(\hat K_\delta - K_J)\psi_\lambda\|_{H^{-t}} \ge T(\delta)^{r(\bar s,t,d)}\Big) \le c_0\big(\delta^2 + \delta^{\kappa^2/8-d/t}\big).$$
By estimating the probability of the supremum by the sum over the probabilities, we obtain from above with a constant $c_1 > 0$:
$$\mathbb{P}\Big(\|\hat K_\delta - K_J\|_{(V_J,\|\cdot\|_{B^s_{p,p}})\to H^{-t}} \ge c_1T(\delta)^{r(\bar s,t,d)}\Big) \lesssim 2^{Jd}\big(c_0\delta^2 + \delta^{\kappa^2/8-d/t}\big) \lesssim c_0\delta^{2-d/(t+d)} + \delta^{\kappa^2/8 - d(2t+d)/(t(t+d))}.$$
For a sufficiently large $\zeta_1 > 0$, depending only on $c_0$, $d$ and $t$, with $\beta := \kappa^2/8 - d(2t+d)/(t(t+d)) > 0$, we thus obtain
$$\mathbb{P}\Big(\|\hat K_\delta - K_J\|_{(V_J,\|\cdot\|_{B^s_{p,p}})\to H^{-t}} \ge \zeta_1T(\delta)^{r(\bar s,t,d)}\Big) \lesssim \delta^\beta.$$
By the above bound on the operator norm and Hölder's inequality for $\bar q := \beta/2 \ge 2$ and $\bar\pi^{-1} + \bar q^{-1} = 1$, together with the second estimate in Proposition 7.3, we find for some constant $R_0 > 0$:


$$\mathbb{E}\Big[\|\hat K_\delta - K_J\|^2_{(V_J,\|\cdot\|_{B^s_{p,p}})\to H^{-t}}\,1_{\{\|\hat K_\delta - K_J\|_{(V_J,\|\cdot\|_{B^s_{p,p}})\to H^{-t}} \ge \zeta_1T(\delta)^{r(\bar s,t,d)}\}}\Big] \le \mathbb{E}\Big[\|\hat K_\delta - K_J\|^{2\bar\pi}_{(V_J,\|\cdot\|_{B^s_{p,p}})\to H^{-t}}\Big]^{1/\bar\pi}\,\delta^{\beta/\bar q} \lesssim \Big(R_0^{2\bar\pi} + \int_{R_0}^\infty R^{2\bar\pi-1}\,2^{Jd}\,\delta^{\kappa^2/16-d/t}\,R^{-4}\,dR\Big)^{1/\bar\pi}\,\delta^2 \lesssim \max\big\{\delta^{(\kappa^2/16-2d/t)/\bar\pi},\ 1\big\}\,\delta^2,$$
which is of order $\delta^2$ by assumption on $\kappa$, and the assertion follows.

A.4. Proof of Proposition 7.5. For $|\lambda|, |\lambda'| \le J$ we have for the entries in the wavelet representation
$$|(\hat K_\delta)_{\lambda,\lambda'} - K_{\lambda,\lambda'}| = |K_{\lambda,\lambda'}|\,1_{\{|(K_{\delta,J})_{\lambda,\lambda'}|\le T(\delta)\}} + \delta|B_{\lambda,\lambda'}|\,1_{\{|(K_{\delta,J})_{\lambda,\lambda'}|>T(\delta)\}}.$$
A simple rough estimate yields
$$|(\hat K_\delta)_{\lambda,\lambda'} - K_{\lambda,\lambda'}| \le 2T(\delta) + |K_{\lambda,\lambda'}|\,1_{\{|(K_{\delta,J}-K)_{\lambda,\lambda'}|\ge T(\delta)\}} + \delta|B_{\lambda,\lambda'}|.$$
We bound the operator norm by the corresponding Hilbert–Schmidt norm and use $\|K\|_{L^2\to L^2} < \infty$ to obtain
$$\|\hat K_\delta - K_J\|^2_{(V_J,\|\cdot\|_{L^2})\to H^{-t}} \lesssim 2^{2J(t+d)}T(\delta)^2 + \#\big\{|(K_{\delta,J}-K)_{\lambda,\lambda'}| \ge T(\delta)\big\} + \delta^2 2^{2Jt}\sum_{|\lambda|,|\lambda'|\le J}B^2_{\lambda,\lambda'},$$
where the cardinality is taken over multi-indices $(\lambda,\lambda')$ such that $|\lambda|, |\lambda'| \le J$. The first term is of order $|\log\delta|^{-1}$. In view of $(K_{\delta,J}-K)_{\lambda,\lambda'} = \delta B_{\lambda,\lambda'}$, the second term is a binomial random variable with expectation $2^{2Jd}\,\mathbb{P}\big(|B_{\lambda,\lambda'}| \ge \kappa|\log\delta|^{1/2}\big) \lesssim \delta^{-2d/(t+d)+\kappa^2/2}$. An exponential moment bound for the binomial distribution yields
$$\mathbb{P}\Big(\#\big\{|B_{\lambda,\lambda'}| \ge \kappa|\log\delta|^{1/2}\big\} \ge |\log\delta|\Big) \lesssim \delta^{\kappa^2/2 - 2d/(t+d)}.$$
For the last term, we use an exponential bound for the deviations of a normalized $\chi^2$-distribution, as before, to infer from $2^{J(t+d)} \lesssim T(\delta)^{-1}$ that
$$\mathbb{P}\Big(\delta^2 2^{2Jt}\sum_{|\lambda|,|\lambda'|\le J}B^2_{\lambda,\lambda'} \ge \zeta\Big) \le \exp\big(-c\,\zeta\,\delta^{-2}2^{-2J(t+d)}\big)$$
holds, which gives the result.


A.5. Proof of Theorem 5.2. To avoid singularity of the underlying probability measures we only consider the subclass $\mathcal{F}_0$ of parameters $(K, f)$ such that $Kf = y_0$ for some fixed $y_0 \in L^2$, that is, $\mathcal{F}_0 := \{(K, f)\ |\ f = K^{-1}y_0,\ K \in \mathcal{K}\}$, where $\mathcal{K} = \mathcal{K}_t(C)$ abbreviates the class of operators under consideration. We shall henceforth keep $y_0$ fixed and refer to the parameter $(K, f)$ equivalently just by $K$.

The likelihood $\Lambda(\cdot)$ of $\mathbb{P}_{K_2}$ under the law $\mathbb{P}_{K_1}$ corresponding to the parameters $K_i$, $i = 1, 2$, is
$$\Lambda(K_2, K_1) = \exp\Big(\delta^{-1}\langle K_2 - K_1, \dot B\rangle_{HS} - \frac12\delta^{-2}\|K_1 - K_2\|^2_{HS}\Big)$$
in terms of the scalar product and norm of the Hilbert space $HS(L^2)$ of Hilbert–Schmidt operators on $L^2$ and with a Gaussian white noise operator $\dot B$. In particular, the Kullback–Leibler divergence between the two measures equals $\frac12\delta^{-2}\|K_1 - K_2\|^2_{HS}$ and the two models remain contiguous for $\delta \to 0$ as long as the Hilbert–Schmidt norm of the difference remains of order $\delta$.

Let us fix the parameter $f_0 = \psi_{(0,0)} \equiv 1$ and the operator $K_0$ which, in a wavelet basis $(\psi_\lambda)$, has diagonal form $K_0 = \mathrm{diag}(2^{-(|\lambda|+1)t})$. Then $K_0$ is ill-posed of degree $t$ and trivially obeys all the mapping properties imposed. Henceforth, $y_0 := K_0f_0$ remains fixed.

For any $k = 0, \ldots, 2^{Jd} - 1$, introduce the symmetric perturbation $H = (H_{\lambda,\lambda'})$ with vanishing coefficients except for $H_{(0,0),(J,k)} = 1$ and $H_{(J,k),(0,0)} = 1$. Put $K^\alpha := K_0 + \alpha H$ for some $\alpha > 0$. By setting $\alpha := \delta\sqrt{J}$ we enforce $\|K^\alpha - K_0\|_{HS} \sim \delta\sqrt{J}$. For $f^\alpha := (K^\alpha)^{-1}y_0$, we obtain
$$f^\alpha - f_0 = \big((K^\alpha)^{-1} - (K_0)^{-1}\big)y_0 = -\alpha(K^\alpha)^{-1}Hf_0 = -\alpha(K^\alpha)^{-1}\psi_{J,k}.$$
Now observe that $H$ trivially satisfies the conditions
$$\frac12|\langle Hf, f\rangle| \le 2^{Jt/2}\|f\|^2_{H^{-t/2}},\qquad \frac12\|H\|_{L^2\to H^t} \le 2^{Jt},\qquad \frac12\|H\|_{B^s_{p,p}\to B^{s+t}_{p,p}} \le 2^{J(t+s+d/2-d/p)}.$$
This implies that for $\alpha2^{J(t+s+d/2-d/p)}$ sufficiently small $K^\alpha$ inherits the mapping properties from $K_0$. Hence,
$$\|f^\alpha - f_0\|_{L^2} \sim \alpha\|\psi_{J,k}\|_{H^t} = \alpha2^{Jt},\qquad \|f^\alpha - f_0\|_{B^s_{p,p}} \sim \alpha\|\psi_{J,k}\|_{B^{s+t}_{p,p}} = \alpha2^{J(t+s+d/2-d/p)}$$
follows. In order to apply the classical lower bound proof in the sparse case ([15], Theorem 2.5.3) and thus to obtain the logarithmic correction, we nevertheless have to show that $f^\alpha - f_0$ is well localized. Using the fact that $((\alpha H)^2)_{\lambda,\lambda'} = \alpha^2$ holds for


coordinates $\lambda = \lambda' = (0,0)$ and $\lambda = \lambda' = (J,k)$, but vanishes elsewhere, we infer from the Neumann series representation
$$f^\alpha - f_0 = \sum_{m=1}^\infty(-\alpha K_0^{-1}H)^mf_0 = \sum_{n=1}^\infty\beta^{2n}f_0 - \sum_{n=0}^\infty\beta^{2n+1}\psi_{J,k} = \frac{\beta}{1-\beta^2}\,(\beta f_0 - \psi_{J,k}),$$
with $\beta$ proportional to $\alpha2^{Jt}$. Consequently, the asymptotics for $\delta \to 0$ are governed by the term $\psi_{J,k}$, which is well localized. The choice $2^J \lesssim \delta^{-1/(t+s+d/2-d/p)}$ ensures that $\|f^\alpha\|_{B^s_{p,p}}$ remains bounded and we conclude by usual arguments; see Chapter 2 in [15] or the lower bound in [17].

A.6. Some tools from approximation theory. The material gathered here is classical; see, for example, [4]. We call a multiresolution analysis on $L^2(D)$ an increasing sequence of subspaces $(V_J)_{J\ge0}$ generated by orthogonal wavelets $(\psi_\lambda)_{|\lambda|\le J}$, where the multi-index $\lambda = (j,k)$ comprises the resolution level $|\lambda| := j \ge 0$ and the $d$-dimensional location parameter $k$. We use the fact that for regular domains $\#V_J \sim 2^{Jd}$, and denote the $L^2$-orthogonal projection onto $V_J$ by $P_J$.

Given an $s_+$-regular multiresolution analysis, $s_+ \in \mathbb{N}$, an equivalent norming of the Besov space $B^s_{p,p}$, $s \in (-s_+, s_+)$, $p > 0$, is given in terms of weighted wavelet coefficients:
$$\|f\|_{B^s_{p,p}} \sim \Big(\sum_{j\ge0}2^{j(s+d/2-d/p)p}\sum_k|\langle f, \psi_{jk}\rangle|^p\Big)^{1/p}.$$
For $p < 1$ the Besov spaces are only quasi-Banach spaces, but still coincide with the corresponding nonlinear approximation spaces; see Section 30 in [4]. If $s$ is not an integer or if $p = 2$, the space $B^s_{p,p}$ equals the $L^p$-Sobolev space, which for $p = 2$ is denoted by $H^s$. The Sobolev embedding generalizes to $B^s_{p,p} \subseteq B^{s'}_{p',p'}$ for $s' \le s$ and $s - \frac{d}{p} \ge s' - \frac{d}{p'}$.

Direct and inverse estimates are the main tools in approximation theory. Using the equivalent norming, they are readily obtained for any $-s_+ < s' \le s < s_+$:
$$\inf_{h_j\in V_j}\|f - h_j\|_{B^{s'}_{p,p}} \lesssim 2^{-(s-s')j}\|f\|_{B^s_{p,p}},$$
$$\forall h_j \in V_j:\quad \|h_j\|_{B^s_{p,p}} \lesssim 2^{(s-s')j}\|h_j\|_{B^{s'}_{p,p}}.$$
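For illustration, the equivalent norming can be evaluated numerically from level-wise coefficient arrays; this sketch assumes the $d = 1$ indexing convention above and user-supplied wavelet coefficients:

```python
import numpy as np

def besov_norm(coeffs, s, p, d=1):
    """Equivalent Besov B^s_{p,p} norm from wavelet coefficients (sketch).

    coeffs[j] is the array of coefficients <f, psi_{j,k}> at resolution
    level j (coarsest first). Implements the weighted-coefficient norming
    (sum_j 2^{j(s + d/2 - d/p) p} sum_k |c_{jk}|^p)^{1/p}.
    """
    total = 0.0
    for j, c in enumerate(coeffs):
        total += 2.0 ** (j * (s + d / 2 - d / p) * p) * np.sum(np.abs(c) ** p)
    return total ** (1.0 / p)
```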


In [5] it is shown that under Assumption 3.1 $\|K_j^{-1}\|_{H^{-t}\to L^2} \lesssim 1$, and we infer from an inverse estimate $\|K_j^{-1}\|_{L^2\to L^2} \lesssim 2^{jt}$.

Let us finally bound $f - f_j$ and $f_j$ for diverse norms. By definition, $f_j$ is the orthogonal projection of $f$ onto $V_j$ with respect to the scalar product $\langle K\cdot,\cdot\rangle$, such that $\|K^{1/2}(f - f_j)\|_{L^2} \le \|K^{1/2}(\mathrm{Id}-P_j)f\|_{L^2}$ and by Assumption 3.1 $\|f - f_j\|_{H^{-t/2}} \lesssim \|(\mathrm{Id}-P_j)f\|_{H^{-t/2}}$. Using the equivalent (weighted) $\ell^p$-norms, we find $\|f - f_j\|_{B^{-t/2}_{p,p}} \lesssim \|(\mathrm{Id}-P_j)f\|_{B^{-t/2}_{p,p}}$ for any $p$. By a direct estimate, we obtain $\|(\mathrm{Id}-P_j)f\|_{B^{-t/2}_{p,p}} \lesssim 2^{-j(s+t/2)}\|f\|_{B^s_{p,p}}$ and
$$\|f - f_j\|_{B^{-t/2}_{p,p}} \lesssim 2^{-j(s+t/2)}\|f\|_{B^s_{p,p}},$$
hence
$$\|P_jf - f_j\|_{B^{-t/2}_{p,p}} \lesssim 2^{-j(s+t/2)}\|f\|_{B^s_{p,p}}.$$
An inverse estimate, applied to the latter inequality, yields together with the Sobolev embeddings ($p \le 2$)
$$\|f - f_j\|_{L^2} \le \|f - P_jf\|_{L^2} + \|P_jf - f_j\|_{L^2} \lesssim 2^{-j(s+d/2-d/p)}\|f\|_{B^s_{p,p}}.\tag{A.1}$$
Merely an inverse estimate yields the stability estimate
$$\|f_j\|_{B^s_{p,p}} \le \|f_j - P_jf\|_{B^s_{p,p}} + \|P_jf\|_{B^s_{p,p}} \lesssim \|f\|_{B^s_{p,p}}.\tag{A.2}$$

Acknowledgment. Helpful comments by two anonymous referees are gratefully acknowledged.

    REFERENCES

[1] ABRAMOVICH, F. and SILVERMAN, B. W. (1998). Wavelet decomposition approaches to statistical inverse problems. Biometrika 85 115–129. MR1627226

[2] CAI, T. and HALL, P. (2006). Prediction in functional linear regression. Ann. Statist. 34 2159–2179. MR2291496

[3] CAVALIER, L. and HENGARTNER, N. W. (2005). Adaptive estimation for inverse problems with noisy operators. Inverse Problems 21 1345–1361. MR2158113

[4] COHEN, A. (2000). Wavelet methods in numerical analysis. In Handbook of Numerical Analysis 7 417–711. North-Holland, Amsterdam. MR1804747

[5] COHEN, A., HOFFMANN, M. and REISS, M. (2004). Adaptive wavelet Galerkin methods for linear inverse problems. SIAM J. Numer. Anal. 42 1479–1501. MR2114287

[6] DAHMEN, W., HARBRECHT, H. and SCHNEIDER, R. (2005). Compression techniques for boundary integral equations: asymptotically optimal complexity estimates. SIAM J. Numer. Anal. 43 2251–2271. MR2206435

[7] DAVIDSON, K. R. and SZAREK, S. J. (2001). Local operator theory, random matrices and Banach spaces. In Handbook of the Geometry of Banach Spaces 1 (W. B. Johnson and J. Lindenstrauss, eds.) 317–366. North-Holland, Amsterdam. MR1863696

[8] DELYON, B. and JUDITSKY, A. (1996). On minimax wavelet estimators. Appl. Comput. Harmon. Anal. 3 215–228. MR1400080

[9] DONOHO, D. (1995). Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition. Appl. Comput. Harmon. Anal. 2 101–126. MR1325535

[10] DONOHO, D. L. and JOHNSTONE, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90 1200–1224. MR1379464

[11] EFROMOVICH, S. and KOLTCHINSKII, V. (2001). On inverse problems with unknown operators. IEEE Trans. Inform. Theory 47 2876–2894. MR1872847

[12] ENGL, H. W., HANKE, M. and NEUBAUER, A. (2000). Regularization of Inverse Problems. Kluwer Academic, Dordrecht. MR1408680

[13] HALL, P. and HOROWITZ, J. L. (2005). Nonparametric methods for inference in the presence of instrumental variables. Ann. Statist. 33 2904–2929. MR2253107

[14] KERKYACHARIAN, G. and PICARD, D. (2000). Thresholding algorithms, maxisets and well-concentrated bases. Test 9 283–344. MR1821645

[15] KOROSTELEV, A. and TSYBAKOV, A. B. (1993). Minimax Theory of Image Reconstruction. Springer, New York. MR1226450

[16] NUSSBAUM, M. and PEREVERZEV, S. V. (1999). The degree of ill-posedness in stochastic and deterministic models. Preprint No. 509, Weierstrass Institute (WIAS), Berlin.

[17] REISS, M. (2005). Adaptive estimation for affine stochastic delay differential equations. Bernoulli 11 67–102. MR2121456

CNRS-UMR 8050
UNIVERSITY OF MARNE-LA-VALLÉE
FRANCE
E-MAIL: [email protected]

INSTITUTE OF APPLIED MATHEMATICS
UNIVERSITY OF HEIDELBERG
69120 HEIDELBERG
GERMANY
E-MAIL: [email protected]
