Noisy Information: Optimality, Complexity, Tractability
Leszek Plaskota
University of Warsaw
Faculty of Mathematics, Informatics, and Mechanics
Warsaw, [email protected]
MCQMC 2012, 13 – 17 January 2012, Sydney, Australia
Leszek Plaskota Institute of Applied Mathematics
Noisy Information: Optimality, Complexity, Tractability
Outline
1 General framework (IBC)
2 Optimal algorithms
3 Information ε-complexity
4 Strong and Polynomial Tractability
General framework (IBC)
F - a normed space of (multivariate) functions
f : D → R,   D ⊂ R^d.
For f ∈ F , approximate S(f )
S : F → G   (G - normed).
Examples: Integration/Approximation
S = Int,   Int(f) = ∫_D f(x) dx
S = App, App(f ) = f
Algorithms
Approximation to S(f ) is obtained as
S(f ) ∼ ϕ(y)
where y is information about f
Information y is partial, noisy, and priced
Noisy information
y = [y_1, y_2, . . . , y_n]^T

nonadaptive: y_k = L_k(f) + x_k,   k = 1, 2, . . . , n

adaptive:

y_1 = L_1(f) + x_1
y_2 = L_2(f; y_1) + x_2
. . .
y_n = L_n(f; y_1, . . . , y_{n−1}) + x_n,   n = n(y)

L_k(·; y_1, . . . , y_{k−1}) are linear functionals, and x_k is the noise.
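The nonadaptive/adaptive distinction above can be sketched in a few lines of Python. Here the function f, the point-evaluation functionals L_k, and the node-selection rule are all hypothetical stand-ins; Gaussian noise x_k is added to each observation.

```python
import random

def gather_adaptive_info(f, choose_node, n, sigma, seed=0):
    """Collect adaptive noisy information y_k = L_k(f; y_1..y_{k-1}) + x_k,
    where each L_k is a point evaluation f(t_k) and the node t_k may depend
    on the observations made so far."""
    rng = random.Random(seed)
    y = []
    for k in range(n):
        t_k = choose_node(y)              # adaptive choice of the next node
        y.append(f(t_k) + rng.gauss(0.0, sigma))  # x_k ~ Gauss(0, sigma^2)
    return y

# Hypothetical adaptive rule: start at 0.5, then drift toward where the
# observed values were larger (a toy illustration of adaption only).
def choose_node(y):
    if not y:
        return 0.5
    return min(1.0, max(0.0, 0.5 + 0.1 * sum(y) / len(y)))

y = gather_adaptive_info(lambda t: t * t, choose_node, n=5, sigma=0.01)
```

A nonadaptive scheme is the special case where `choose_node` ignores its argument.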
Noise (nonadaptive information)
deterministic
y ∈ { z ∈ R^n : ‖z − N(f)‖_Y ≤ 1 } =: N(f)

where ‖·‖_Y is a norm on R^n, e.g.,

‖x‖_Y = max{ |x_k|/σ_k : 1 ≤ k ≤ n }   or   ‖x‖_Y = √(x^T Σ^{−1} x),   Σ = Σ^T > 0

non-deterministic

y ∼ π_f =: N(f)

where π_f is a probability distribution on R^n, e.g.,

π_f = Gauss(N(f), Σ),   Σ = Σ^T > 0
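As a minimal numeric sketch of the two noise norms above (assuming, for simplicity, a diagonal Σ in the second one; the general case needs a full matrix inverse):

```python
import math

def weighted_max_norm(x, sigmas):
    """||x||_Y = max_k |x_k| / sigma_k: deterministic noise bounded
    coordinate-wise by thresholds sigma_k."""
    return max(abs(xk) / sk for xk, sk in zip(x, sigmas))

def mahalanobis_norm_diag(x, variances):
    """||x||_Y = sqrt(x^T Sigma^{-1} x) for a diagonal Sigma (a simplifying
    assumption made here to avoid matrix inversion)."""
    return math.sqrt(sum(xk * xk / v for xk, v in zip(x, variances)))

# y is admissible deterministic-noise information for f iff ||y - N(f)||_Y <= 1.
Nf = [1.0, 2.0]
y = [1.05, 1.9]
x = [yi - ni for yi, ni in zip(y, Nf)]
admissible = weighted_max_norm(x, sigmas=[0.1, 0.2]) <= 1.0
```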
Error in different settings
Worst case: f ∈ B ⊆ F
e^{w-w}_B(S; N, ϕ) = sup_{f∈B} sup_{y∈N(f)} ‖S(f) − ϕ(y)‖

e^{w-a}_B(S; N, ϕ) = sup_{f∈B} ( ∫_Y ‖S(f) − ϕ(y)‖^2 π_f(dy) )^{1/2}

Average case: f ∼ µ - a zero-mean Gaussian measure on F

e^{a-a}_µ(S; N, ϕ) = ( ∫_F ∫_Y ‖S(f) − ϕ(y)‖^2 π_f(dy) µ(df) )^{1/2}

e^{a-w}_µ(S; N, ϕ) = ( ∫_F sup_{y∈N(f)} ‖S(f) − ϕ(y)‖^2 µ(df) )^{1/2}
Randomized setting
S(f ) is approximated by ϕω(y) where
y ∼ πf (·|ω) =: Nω(f )
and ω is a random parameter
Error:
e^{ran}_B(S; N, ϕ) = sup_{f∈B} ( E_ω ∫_{Y_ω} ‖S(f) − ϕ_ω(y)‖^2 π_f(dy|ω) )^{1/2}
Optimal algorithms: S = Int
Worst case with noise
‖y − N(f)‖_Y ≤ 1,

where N : F → R^n is linear.
Theorem (Smolyak’65, Micchelli & Rivlin’77, Sukharev’86, Magaril-Il’yaev & Osipenko’91)
If B is convex then there exists an affine ϕ*_aff that is optimal,

e^{w-w}_B(Int; N, ϕ*_aff) = inf_ϕ e^{w-w}_B(Int; N, ϕ).
If B is convex and balanced then ϕ∗aff is linear.
What if the worst case with noise
y − N(f ) ∼ Gauss(0,Σ) ?
Theorem (Ibragimov & Hasminski’84, Brown & Feldman’90, Donoho’94, P’96)
If B is convex then there is an affine ϕ*_aff that is close to optimal,

e^{w-a}_B(Int; N, ϕ*_aff) ≤ 1.11 · inf_ϕ e^{w-a}_B(Int; N, ϕ)
If B is convex and balanced then ϕ∗aff is linear.
Proof: non-constructive, via hardest one-dimensional subproblems.
Similar results hold for the average-average and average-worst settings, but not for the randomized setting.
Equivalence of different settings
µ - zero-mean Gaussian on F with correlation operator C_µ
H - closure of H_0 = C_µ(F*) with respect to the inner product

〈f_1, f_2〉_H = ∫_F L_1(f) L_2(f) µ(df),   C_µ L_i = f_i

Approximate Int(f) from y = N(f) + x (N linear).
Four settings:

w-w: f ∈ B, the unit ball of H, and √(x^T Σ^{−1} x) ≤ 1
w-a: f ∈ B, the unit ball of H, and x ∼ Gauss(0, Σ)
a-w: f ∼ µ, Gaussian on F, and √(x^T Σ^{−1} x) ≤ 1
a-a: f ∼ µ, Gaussian on F, and x ∼ Gauss(0, Σ)
Now

Int(f) = 〈f, f*〉_H

and

N(f) = ( 〈f, f_1〉_H, 〈f, f_2〉_H, . . . , 〈f, f_n〉_H ).

Algorithm:

ϕ*(y) = Σ_{k=1}^n z_k y_k,   where (Σ + M) z = N(f*),   M = ( 〈f_i, f_j〉_H )_{i,j=1}^n.

Note: Σ_{k=1}^n z_k f_k is "almost" the H-orthogonal projection onto span(f_1, . . . , f_n).
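The algorithm above can be sketched in plain Python: solve the linear system (Σ + M) z = N(f*) for the coefficients z, then evaluate ϕ*(y) = Σ_k z_k y_k. The matrices and right-hand side below are made-up toy values, not data from any real problem.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting
    (adequate for the small dense systems used here)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def phi_star(y, z):
    """phi*(y) = sum_k z_k y_k, the (nearly) optimal linear algorithm."""
    return sum(zk * yk for zk, yk in zip(z, y))

# Toy data: Sigma = noise covariance, Gram = M = (<f_i, f_j>_H),
# b = N(f*); all values are illustrative only.
Sigma = [[0.01, 0.0], [0.0, 0.01]]
Gram  = [[1.0, 0.3], [0.3, 1.0]]
A = [[Sigma[i][j] + Gram[i][j] for j in range(2)] for i in range(2)]
z = solve(A, [0.5, 0.2])
```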
e^sett ∈ { e^{w-w}_B, e^{w-a}_B, e^{a-w}_µ, e^{a-a}_µ }
Theorem (P’96)
Linear algorithm ϕ∗ is close to optimal,
e^sett(Int; N, ϕ*) = √( Int( f* − Σ_{k=1}^n z_k f_k ) ) ≤ κ_sett · inf_ϕ e^sett(Int; N, ϕ)

(κ_{w-w}, κ_{w-a}, κ_{a-w}, κ_{a-a}) = (1.43, 1.59, 2.13, 1.00).
Information Complexity
y = [y_1, . . . , y_n]^T - information about f

y_k = L_k(f; y_1, . . . , y_{k−1}) + x_k

x_k ∼ Gauss(0, σ_k^2) or |x_k| ≤ σ_k

Cost of information y:

cost(N, y) = Σ_{k=1}^n c(σ_k)
c : [0,∞)→ (0,∞] is a cost function
Example: c_γ(σ) = (1 + σ^{−1})^γ,   γ ≥ 0   (c_0 ≡ 1)
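The cost model can be transcribed directly; `cost_of_info` is a hypothetical helper name, and σ_k > 0 is assumed so that c_γ(σ_k) is finite.

```python
def cost_of_info(sigmas, gamma):
    """cost(N, y) = sum_k c_gamma(sigma_k) with c_gamma(s) = (1 + 1/s)^gamma.
    gamma = 0 gives unit cost per observation regardless of accuracy."""
    return sum((1.0 + 1.0 / s) ** gamma for s in sigmas)

cost_of_info([1.0, 1.0, 1.0], gamma=0)   # 3.0: three unit-cost observations
cost_of_info([0.1, 0.1], gamma=2)        # 2 * (1 + 10)^2 = 242.0
```

Sharper observations (smaller σ_k) are more expensive, and the exponent γ controls how fast the price grows.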
Information complexity: S = Int and the Hölder class
B - the class of d-variate functions f ∈ F = C^{r,α}([0,1]^d),

|f^{(r)}(t_1) − f^{(r)}(t_2)| ≤ ‖t_1 − t_2‖^α_∞,   ∀ |r| = r_1 + · · · + r_d ≤ r.
Standard information
N(f ) = [f (t1), f (t2), . . . , f (tn)]T
For exact information (γ = 0),

comp^{w-w}(Int; ε) ≍ ε^{−d/(r+α)}

(Bakhvalov’59)
What if information is noisy and c = c_γ with γ > 0?
Deterministic noise: y_i = f(t_i) + x_i,   |x_i| ≤ σ_i ∀i
Theorem (P)
comp^{w-w}(Int; ε) ≍ ε^{−(d/(r+α) + γ)}

Hint: use a uniform (nonadaptive) grid and σ_i ≍ ε.
Non-deterministic noise: y_i = f(t_i) + x_i,   x_i ∼ Gauss(0, σ_i^2) ∀i

With nonadaptive information the minimal cost is

≍ ε^{−2}   if d ≤ 2(r+α), γ ≥ 2

≍ ε^{−(d/(r+α) + γ(1 − d/(2(r+α))))}   if d ≤ 2(r+α), 0 < γ < 2

≍ ε^{−d/(r+α)}   if d > 2(r+α)
Adaption helps for γ < 2
Theorem (P)
comp^{w-a}(Int; ε) ≍ ε^{−2}   if γ ≥ 2

comp^{w-a}(Int; ε) ≍ ε^{−(d/(r+α+d/2) + γ(1 − d/(2(r+α)+d)))}   if γ < 2

Hint: use σ_i ≍ ε^{(r+α)/(r+α+d/2)} for γ < 2, and σ_i ≍ 1 for γ ≥ 2.
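The two-branch exponent in the theorem above can be transcribed as a small function. A quick consistency check is that the branches agree at γ = 2.

```python
def wa_int_cost_exponent(d, r, alpha, gamma):
    """Exponent e with comp^{w-a}(Int; eps) of order eps^{-e} for the Hoelder
    class, transcribed directly from the adaptive theorem (illustrative
    helper name)."""
    if gamma >= 2:
        return 2.0
    s = r + alpha
    return d / (s + d / 2.0) + gamma * (1.0 - d / (2.0 * s + d))
```

For example, with d = 1, r = 1, α = 0 the γ < 2 branch reads 2/3 + (2/3)γ, which indeed reaches 2 as γ → 2.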
This is much better than for exact information. Why?
Theorem (P’96)
worst-average case setting ≡ randomized setting

comp^{ran}(S; ε) ≤ comp^{w-a}(S; ε) ≤ comp^{ran}(S; ε) + 2
adaption + random noise = randomization
y_1 = f(t) + x_1,   y_2 = f(t) + x_2,   x_i ∼ Gauss(0, σ^2)

ω := y_1 − y_2 = x_1 − x_2 ∼ Gauss(0, 2σ^2)

For i = 3, 4, . . .

t_i = t_i(y_1, y_2, . . . , y_{i−1}) = t_i(ω, . . .)
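The trick on this slide, extracting a random element ω from pure noise and using it to drive "random" node choices, can be sketched as follows; all names are illustrative, and ω is (up to rounding) independent of f because f(t) cancels in y_1 − y_2.

```python
import random

def simulated_randomized_node_choice(f, t0, sigma, n, seed=0):
    """Observe f(t0) twice; the difference omega = y1 - y2 is pure noise,
    Gauss(0, 2 sigma^2), yet it can seed the 'random' choice of the
    remaining nodes t_3, t_4, ...  (a toy sketch of
    adaption + random noise = randomization)."""
    rng = random.Random(seed)
    y1 = f(t0) + rng.gauss(0.0, sigma)
    y2 = f(t0) + rng.gauss(0.0, sigma)
    omega = y1 - y2                  # carries (essentially) no information about f
    node_rng = random.Random(omega)  # use the noise as a random seed
    nodes = [node_rng.random() for _ in range(n - 2)]
    return omega, nodes
```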
Information complexity: S = App and the Hölder class
Uniform approximation:
App : F → L_∞
For exact information (γ = 0)
comp^{w-w}(App; ε) ≍ ε^{−d/(r+α)} ≍ comp^{ran}(App; ε)
(Bakhvalov’59, Novak’88)
Theorem (P)
comp^{w-a}(App; ε) ≍ comp^{ran}(App; ε) ≍ ε^{−(d/(r+α) + γ̄)} ln^{γ̄/2}(1/ε),   γ̄ = min(2, γ)
Tractability: average-average case approximation
For d = 1, 2, . . .

App_d : F_d → G_d

F_d - Banach with zero-mean Gaussian measure µ_d,   G_d - Hilbert
Approximate f ∼ µd from information y = [y1, . . . , yn]T
yk = Lk(f ; y1, . . . , yk−1) + xk , k = 1, 2, . . . , n(y)
xk ∼ Gauss(0, σ2k), σk = σk(y1, . . . , yk−1)
Permissible functionals:
‖L_k‖^2_{µ_d} := ∫_F L_k^2(f) µ_d(df) ≤ 1
Normalizing assumption:

e^{a-a}(App_d; 0, 0) = √( ∫_F ‖f‖^2_{G_d} µ_d(df) ) = 1
Polynomial tractability:
comp^sett(S_d; ε) ≤ C · d^q · ε^{−p}   ∀d, ∀ε ≤ ε_0
If q = 0 then strong polynomial tractability
Tractability for exact information ⟹ Novak & Woźniakowski [08, 10, 12?]
Theorem (P)
Polynomial tractability for exact information (γ = 0) with exponents p and q ⟹ polynomial tractability for noisy information (γ > 0) with exponents

p′ = γ̄,   q′ = γ̄ q/p   for γ̄ > p
p′ = p,   q′ = q   for γ̄ ≤ p

where γ̄ = min(2, γ).
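A direct transcription of the exponent transfer in the theorem above (`noisy_exponents` is a hypothetical helper name):

```python
def noisy_exponents(p, q, gamma):
    """Map polynomial-tractability exponents (p, q) for exact information
    to exponents (p', q') for noisy information with cost c_gamma,
    with gbar = min(2, gamma)."""
    gbar = min(2.0, gamma)
    if gbar > p:
        return gbar, gbar * q / p
    return p, q

noisy_exponents(1.0, 0.5, gamma=3.0)   # gbar = 2 > 1,   so (2.0, 1.0)
noisy_exponents(2.5, 1.0, gamma=3.0)   # gbar = 2 <= 2.5, so (2.5, 1.0)
```

Noise can only worsen the ε-exponent up to γ̄ ≤ 2; if the exact problem was already harder than that (p ≥ γ̄), the exponents are unchanged.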
Optimal information
C_{ν_d} : G_d → G_d - correlation operator of the measure ν_d on G_d induced by µ_d,

C_{ν_d} g = C_{µ_d}( 〈·, g〉_{G_d} ),   g ∈ G_d,

where C_{µ_d} : F_d* → G_d is the correlation operator of µ_d.

{ξ_{d,i}} - complete orthonormal eigenelements of C_{ν_d}:

C_{ν_d} ξ_{d,i} = λ_{d,i} ξ_{d,i}
λ_{d,1} ≥ λ_{d,2} ≥ λ_{d,3} ≥ . . .

K_{d,i} := λ_{d,i}^{−1/2} 〈·, ξ_{d,i}〉_{G_d},   i ≥ 1

For γ ≤ 2, algorithms use the K_{d,i} with variable σ_i, and

for γ > 2, some linear combinations of the K_{d,i} with fixed σ_i.
Thank you.