Noisy Information: Optimality, Complexity, Tractability
Leszek Plaskota
University of Warsaw
Faculty of Mathematics, Informatics, and Mechanics
Warsaw, [email protected]
MCQMC 2012, 13 – 17 January 2012, Sydney, Australia
Leszek Plaskota Institute of Applied Mathematics
Noisy Information: Optimality, Complexity, Tractability
Outline
1 General framework (IBC)
2 Optimal algorithms
3 Information ε-complexity
4 Strong and Polynomial Tractability
General framework (IBC)
F - a normed space of (multivariate) functions
f : D → R,   D ⊂ R^d.
For f ∈ F , approximate S(f )
S : F → G   (G - normed).
Examples: Integration/Approximation
S = Int,   Int(f) = ∫_D f(x) dx
S = App, App(f ) = f
Algorithms
Approximation to S(f ) is obtained as
S(f ) ∼ ϕ(y)
where y is information about f
Information y is partial, noisy, and priced
Noisy information
y = [y_1, y_2, . . . , y_n]^T

nonadaptive: y_k = L_k(f) + x_k,   k = 1, 2, . . . , n

adaptive:

y_1 = L_1(f) + x_1
y_2 = L_2(f; y_1) + x_2
. . .
y_n = L_n(f; y_1, . . . , y_{n−1}) + x_n,   n = n(y)

L_k(·; y_1, . . . , y_{k−1}) are linear functionals, and x_k is the noise.
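The nonadaptive/adaptive distinction above can be sketched in a few lines of Python. Here the function f, the point-evaluation functionals L_k, and the node-selection rule are all hypothetical stand-ins; Gaussian noise x_k is added to each observation.

```python
import random

def gather_adaptive_info(f, choose_node, n, sigma, seed=0):
    """Collect adaptive noisy information y_k = L_k(f; y_1..y_{k-1}) + x_k,
    where each L_k is a point evaluation f(t_k) and the node t_k may depend
    on the observations made so far."""
    rng = random.Random(seed)
    y = []
    for k in range(n):
        t_k = choose_node(y)              # adaptive choice of the next node
        y.append(f(t_k) + rng.gauss(0.0, sigma))  # x_k ~ Gauss(0, sigma^2)
    return y

# Hypothetical adaptive rule: start at 0.5, then drift toward where the
# observed values were larger (a toy illustration of adaption only).
def choose_node(y):
    if not y:
        return 0.5
    return min(1.0, max(0.0, 0.5 + 0.1 * sum(y) / len(y)))

y = gather_adaptive_info(lambda t: t * t, choose_node, n=5, sigma=0.01)
```

A nonadaptive scheme is the special case where `choose_node` ignores its argument.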
Noise (nonadaptive information)
deterministic
y ∈ { z ∈ R^n : ‖z − N(f)‖_Y ≤ 1 } =: N(f)

where ‖·‖_Y is a norm on R^n, e.g.,

‖x‖_Y = max{ |x_k|/σ_k : 1 ≤ k ≤ n }   or   ‖x‖_Y = √(x^T Σ^{−1} x),   Σ = Σ^T > 0

non-deterministic

y ∼ π_f =: N(f)

where π_f is a probability distribution on R^n, e.g.,

π_f = Gauss(N(f), Σ),   Σ = Σ^T > 0
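As a minimal numeric sketch of the two noise norms above (assuming, for simplicity, a diagonal Σ in the second one; the general case needs a full matrix inverse):

```python
import math

def weighted_max_norm(x, sigmas):
    """||x||_Y = max_k |x_k| / sigma_k: deterministic noise bounded
    coordinate-wise by thresholds sigma_k."""
    return max(abs(xk) / sk for xk, sk in zip(x, sigmas))

def mahalanobis_norm_diag(x, variances):
    """||x||_Y = sqrt(x^T Sigma^{-1} x) for a diagonal Sigma (a simplifying
    assumption made here to avoid matrix inversion)."""
    return math.sqrt(sum(xk * xk / v for xk, v in zip(x, variances)))

# y is admissible deterministic-noise information for f iff ||y - N(f)||_Y <= 1.
Nf = [1.0, 2.0]
y = [1.05, 1.9]
x = [yi - ni for yi, ni in zip(y, Nf)]
admissible = weighted_max_norm(x, sigmas=[0.1, 0.2]) <= 1.0
```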
Error in different settings
Worst case: f ∈ B ⊆ F
e^{w-w}_B(S; N, ϕ) = sup_{f∈B} sup_{y∈N(f)} ‖S(f) − ϕ(y)‖

e^{w-a}_B(S; N, ϕ) = sup_{f∈B} ( ∫_Y ‖S(f) − ϕ(y)‖^2 π_f(dy) )^{1/2}

Average case: f ∼ µ - a zero-mean Gaussian measure on F

e^{a-a}_µ(S; N, ϕ) = ( ∫_F ∫_Y ‖S(f) − ϕ(y)‖^2 π_f(dy) µ(df) )^{1/2}

e^{a-w}_µ(S; N, ϕ) = ( ∫_F sup_{y∈N(f)} ‖S(f) − ϕ(y)‖^2 µ(df) )^{1/2}
Randomized setting
S(f ) is approximated by ϕω(y) where
y ∼ πf (·|ω) =: Nω(f )
and ω is a random parameter
Error:
e^{ran}_B(S; N, ϕ) = sup_{f∈B} ( E_ω ∫_{Y_ω} ‖S(f) − ϕ_ω(y)‖^2 π_f(dy|ω) )^{1/2}
Optimal algorithms: S = Int
Worst case with noise
‖y − N(f)‖_Y ≤ 1,

where N : F → R^n is linear.
Theorem (Smolyak’65, Micchelli & Rivlin’77, Sukharev’86, Magaril-Il’yaev & Osipenko’91)
If B is convex then there exists an affine ϕ*_aff that is optimal,

e^{w-w}_B(Int; N, ϕ*_aff) = inf_ϕ e^{w-w}_B(Int; N, ϕ).
If B is convex and balanced then ϕ∗aff is linear.
What if the worst case with noise
y − N(f ) ∼ Gauss(0,Σ) ?
Theorem (Ibragimov & Hasminski’84, Brown & Feldman’90, Donoho’94, P’96)
If B is convex then there is an affine ϕ*_aff that is close to optimal,

e^{w-a}_B(Int; N, ϕ*_aff) ≤ 1.11 · inf_ϕ e^{w-a}_B(Int; N, ϕ)
If B is convex and balanced then ϕ∗aff is linear.
Proof: non-constructive, via hardest one-dimensional subproblems.
Similar results hold for the average-average and average-worst settings, but not for the randomized setting.
Equivalence of different settings
µ - zero-mean Gaussian on F with correlation operator C_µ
H - closure of H_0 = C_µ(F*) with respect to the inner product

〈f_1, f_2〉_H = ∫_F L_1(f) L_2(f) µ(df),   C_µ L_i = f_i

Approximate Int(f) from y = N(f) + x (N linear).
Four settings:

w-w: f ∈ B, the unit ball of H, and √(x^T Σ^{−1} x) ≤ 1
w-a: f ∈ B, the unit ball of H, and x ∼ Gauss(0, Σ)
a-w: f ∼ µ, Gaussian on F, and √(x^T Σ^{−1} x) ≤ 1
a-a: f ∼ µ, Gaussian on F, and x ∼ Gauss(0, Σ)
Now

Int(f) = 〈f, f*〉_H

and

N(f) = ( 〈f, f_1〉_H, 〈f, f_2〉_H, . . . , 〈f, f_n〉_H ).

Algorithm:

ϕ*(y) = Σ_{k=1}^n z_k y_k,   where (Σ + M) z = N(f*),   M = ( 〈f_i, f_j〉_H )_{i,j=1}^n.

Note: Σ_{k=1}^n z_k f_k is "almost" the H-orthogonal projection onto span(f_1, . . . , f_n).
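The algorithm above can be sketched in plain Python: solve the linear system (Σ + M) z = N(f*) for the coefficients z, then evaluate ϕ*(y) = Σ_k z_k y_k. The matrices and right-hand side below are made-up toy values, not data from any real problem.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting
    (adequate for the small dense systems used here)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def phi_star(y, z):
    """phi*(y) = sum_k z_k y_k, the (nearly) optimal linear algorithm."""
    return sum(zk * yk for zk, yk in zip(z, y))

# Toy data: Sigma = noise covariance, Gram = M = (<f_i, f_j>_H),
# b = N(f*); all values are illustrative only.
Sigma = [[0.01, 0.0], [0.0, 0.01]]
Gram  = [[1.0, 0.3], [0.3, 1.0]]
A = [[Sigma[i][j] + Gram[i][j] for j in range(2)] for i in range(2)]
z = solve(A, [0.5, 0.2])
```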
e^sett ∈ { e^{w-w}_B, e^{w-a}_B, e^{a-w}_µ, e^{a-a}_µ }
Theorem (P’96)
Linear algorithm ϕ∗ is close to optimal,
e^sett(Int; N, ϕ*) = √( Int( f* − Σ_{k=1}^n z_k f_k ) ) ≤ κ_sett · inf_ϕ e^sett(Int; N, ϕ)

(κ_{w-w}, κ_{w-a}, κ_{a-w}, κ_{a-a}) = (1.43, 1.59, 2.13, 1.00).
Information Complexity
y = [y_1, . . . , y_n]^T - information about f

y_k = L_k(f; y_1, . . . , y_{k−1}) + x_k

x_k ∼ Gauss(0, σ_k^2) or |x_k| ≤ σ_k

Cost of information y:

cost(N, y) = Σ_{k=1}^n c(σ_k)
c : [0,∞)→ (0,∞] is a cost function
Example: c_γ(σ) = (1 + σ^{−1})^γ,   γ ≥ 0   (c_0 ≡ 1)
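The cost model can be transcribed directly; `cost_of_info` is a hypothetical helper name, and σ_k > 0 is assumed so that c_γ(σ_k) is finite.

```python
def cost_of_info(sigmas, gamma):
    """cost(N, y) = sum_k c_gamma(sigma_k) with c_gamma(s) = (1 + 1/s)^gamma.
    gamma = 0 gives unit cost per observation regardless of accuracy."""
    return sum((1.0 + 1.0 / s) ** gamma for s in sigmas)

cost_of_info([1.0, 1.0, 1.0], gamma=0)   # 3.0: three unit-cost observations
cost_of_info([0.1, 0.1], gamma=2)        # 2 * (1 + 10)^2 = 242.0
```

Sharper observations (smaller σ_k) are more expensive, and the exponent γ controls how fast the price grows.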
Information complexity: S = Int and the Hölder class
B - the class of d-variate functions f ∈ F = C^{r,α}([0,1]^d),

|f^{(r)}(t_1) − f^{(r)}(t_2)| ≤ ‖t_1 − t_2‖^α_∞,   ∀ |r| = r_1 + · · · + r_d ≤ r.
Standard information
N(f ) = [f (t1), f (t2), . . . , f (tn)]T
For exact information (γ = 0),

comp^{w-w}(Int; ε) ≍ ε^{−d/(r+α)}

(Bakhvalov’59)
What if information is noisy and c = c_γ with γ > 0?
Deterministic noise: y_i = f(t_i) + x_i,   |x_i| ≤ σ_i ∀i
Theorem (P)
comp^{w-w}(Int; ε) ≍ ε^{−(d/(r+α) + γ)}

Hint: use a uniform (nonadaptive) grid and σ_i ≍ ε.
Non-deterministic noise: y_i = f(t_i) + x_i,   x_i ∼ Gauss(0, σ_i^2) ∀i

With nonadaptive information the minimal cost is

≍ ε^{−2}   if d ≤ 2(r+α), γ ≥ 2

≍ ε^{−(d/(r+α) + γ(1 − d/(2(r+α))))}   if d ≤ 2(r+α), 0 < γ < 2

≍ ε^{−d/(r+α)}   if d > 2(r+α)
Adaption helps for γ < 2
Theorem (P)
comp^{w-a}(Int; ε) ≍ ε^{−2}   if γ ≥ 2

comp^{w-a}(Int; ε) ≍ ε^{−(d/(r+α+d/2) + γ(1 − d/(2(r+α)+d)))}   if γ < 2

Hint: use σ_i ≍ ε^{(r+α)/(r+α+d/2)} for γ < 2, and σ_i ≍ 1 for γ ≥ 2.
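The two-branch exponent in the theorem above can be transcribed as a small function. A quick consistency check is that the branches agree at γ = 2.

```python
def wa_int_cost_exponent(d, r, alpha, gamma):
    """Exponent e with comp^{w-a}(Int; eps) of order eps^{-e} for the Hoelder
    class, transcribed directly from the adaptive theorem (illustrative
    helper name)."""
    if gamma >= 2:
        return 2.0
    s = r + alpha
    return d / (s + d / 2.0) + gamma * (1.0 - d / (2.0 * s + d))
```

For example, with d = 1, r = 1, α = 0 the γ < 2 branch reads 2/3 + (2/3)γ, which indeed reaches 2 as γ → 2.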
This is much better than for exact information. Why?
Theorem (P’96)
worst-average case setting ≡ randomized setting

comp^{ran}(S; ε) ≤ comp^{w-a}(S; ε) ≤ comp^{ran}(S; ε) + 2
adaption + random noise = randomization
y_1 = f(t) + x_1,   y_2 = f(t) + x_2,   x_i ∼ Gauss(0, σ^2)

ω := y_1 − y_2 = x_1 − x_2 ∼ Gauss(0, 2σ^2)

For i = 3, 4, . . .

t_i = t_i(y_1, y_2, . . . , y_{i−1}) = t_i(ω, . . .)
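The trick on this slide, extracting a random element ω from pure noise and using it to drive "random" node choices, can be sketched as follows; all names are illustrative, and ω is (up to rounding) independent of f because f(t) cancels in y_1 − y_2.

```python
import random

def simulated_randomized_node_choice(f, t0, sigma, n, seed=0):
    """Observe f(t0) twice; the difference omega = y1 - y2 is pure noise,
    Gauss(0, 2 sigma^2), yet it can seed the 'random' choice of the
    remaining nodes t_3, t_4, ...  (a toy sketch of
    adaption + random noise = randomization)."""
    rng = random.Random(seed)
    y1 = f(t0) + rng.gauss(0.0, sigma)
    y2 = f(t0) + rng.gauss(0.0, sigma)
    omega = y1 - y2                  # carries (essentially) no information about f
    node_rng = random.Random(omega)  # use the noise as a random seed
    nodes = [node_rng.random() for _ in range(n - 2)]
    return omega, nodes
```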
Information complexity: S = App and the Hölder class
Uniform approximation:
App : F → L_∞
For exact information (γ = 0)
comp^{w-w}(App; ε) ≍ ε^{−d/(r+α)} ≍ comp^{ran}(App; ε)
(Bakhvalov’59, Novak’88)
Theorem (P)
comp^{w-a}(App; ε) ≍ comp^{ran}(App; ε) ≍ ε^{−(d/(r+α) + γ̄)} ln^{γ̄/2}(1/ε),   γ̄ = min(2, γ)
Tractability: average-average case approximation
For d = 1, 2, . . .

App_d : F_d → G_d

F_d - Banach with zero-mean Gaussian measure µ_d,   G_d - Hilbert
Approximate f ∼ µd from information y = [y1, . . . , yn]T
yk = Lk(f ; y1, . . . , yk−1) + xk , k = 1, 2, . . . , n(y)
xk ∼ Gauss(0, σ2k), σk = σk(y1, . . . , yk−1)
Permissible functionals:
‖L_k‖^2_{µ_d} := ∫_F L_k^2(f) µ_d(df) ≤ 1
Normalizing assumption:

e^{a-a}(App_d; 0, 0) = √( ∫_F ‖f‖^2_{G_d} µ_d(df) ) = 1
Polynomial tractability:
comp^sett(S_d; ε) ≤ C · d^q · ε^{−p}   ∀d, ∀ε ≤ ε_0
If q = 0 then strong polynomial tractability
Tractability for exact information ⟹ Novak & Woźniakowski [08, 10, 12?]
Theorem (P)
Polynomial tractability for exact information (γ = 0) with exponents p and q ⟹ polynomial tractability for noisy information (γ > 0) with exponents

p′ = γ̄,   q′ = γ̄ q/p   for γ̄ > p
p′ = p,   q′ = q   for γ̄ ≤ p

where γ̄ = min(2, γ).
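A direct transcription of the exponent transfer in the theorem above (`noisy_exponents` is a hypothetical helper name):

```python
def noisy_exponents(p, q, gamma):
    """Map polynomial-tractability exponents (p, q) for exact information
    to exponents (p', q') for noisy information with cost c_gamma,
    with gbar = min(2, gamma)."""
    gbar = min(2.0, gamma)
    if gbar > p:
        return gbar, gbar * q / p
    return p, q

noisy_exponents(1.0, 0.5, gamma=3.0)   # gbar = 2 > 1,   so (2.0, 1.0)
noisy_exponents(2.5, 1.0, gamma=3.0)   # gbar = 2 <= 2.5, so (2.5, 1.0)
```

Noise can only worsen the ε-exponent up to γ̄ ≤ 2; if the exact problem was already harder than that (p ≥ γ̄), the exponents are unchanged.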
Optimal information
C_{ν_d} : G_d → G_d - correlation operator of the measure ν_d on G_d induced by µ_d,

C_{ν_d} g = C_{µ_d}( 〈·, g〉_{G_d} ),   g ∈ G_d,

where C_{µ_d} : F_d* → G_d is the correlation operator of µ_d.

{ξ_{d,i}} - complete orthonormal eigenelements of C_{ν_d}:

C_{ν_d} ξ_{d,i} = λ_{d,i} ξ_{d,i}
λ_{d,1} ≥ λ_{d,2} ≥ λ_{d,3} ≥ . . .

K_{d,i} := λ_{d,i}^{−1/2} 〈·, ξ_{d,i}〉_{G_d},   i ≥ 1

For γ ≤ 2, algorithms use the K_{d,i} with variable σ_i, and

for γ > 2, some linear combinations of the K_{d,i} with fixed σ_i.
Thank you.