Bregman Iterative Methods, Lagrangian Connections, Dual Interpretations, and Applications
Ernie Esser
UCLA
6-30-09
1
Outline
• Bregman Iteration Overview
  • Method for Constrained Optimization
  • Comparison to a Denoising Application
• Linearized Bregman for l1-Minimization
  • Derivation and Equivalent Forms
• Lagrangian Connections
  • Bregman Iteration / Method of Multipliers
  • Linearized Bregman / Uzawa Method
• Dual Interpretations
  • Proximal Point Algorithm
  • Gradient Ascent
2
Outline Continued
• Split Bregman Idea
  • TV-l2 Example
• More General Separable Convex Programs
  • Split Bregman Connection to ADMM
  • Convergence
  • Dual Interpretation
  • TV-l1 Minimization Example
• Decoupling Variables for More Explicit Algorithms
  • TV Deblurring Example
  • Compressive Sensing Example
• Connection to PDHG
  • Main Idea and Derivation
• Further Applications...
3
A Model Constrained Minimization Problem
min_u J(u)  s.t.  Ku = f

J closed proper convex, J : ℝ^m → (−∞, ∞], u ∈ ℝ^m, K ∈ ℝ^{s×m}, f ∈ ℝ^s

Examples:
• J(u) = ‖u‖1 (Basis Pursuit)
• J(u) = ‖u‖TV
4
Bregman Distance
D_J^{p^k}(u, u^k) = J(u) − J(u^k) − 〈p^k, u − u^k〉

where p^k ∈ ∂J(u^k).

By definition of the subdifferential, p^k ∈ ∂J(u^k) means

J(v) − J(u^k) − 〈p^k, v − u^k〉 ≥ 0 ∀v
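The subgradient inequality above makes the Bregman distance nonnegative. As a quick numeric sanity check (my own sketch, not from the slides), take the smooth choice J(u) = ½‖u‖², whose only subgradient at u^k is u^k itself; the distance then reduces to ½‖u − u^k‖²:

```python
import numpy as np

def bregman_dist(J, grad_J, u, uk):
    """Bregman distance D_J^{p_k}(u, u_k) with p_k = grad_J(u_k)."""
    pk = grad_J(uk)
    return J(u) - J(uk) - pk @ (u - uk)

J = lambda u: 0.5 * u @ u   # J(u) = (1/2)||u||^2
grad_J = lambda u: u        # its (sub)gradient at u is u itself

rng = np.random.default_rng(0)
u, uk = rng.standard_normal(5), rng.standard_normal(5)

d = bregman_dist(J, grad_J, u, uk)
# For this J the distance equals (1/2)||u - u_k||^2, hence is always >= 0.
```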
5
Bregman Iteration
u^{k+1} = argmin_u D_J^{p^k}(u, u^k) + (δ/2)‖Ku − f‖²

p^{k+1} = p^k − δK^T(Ku^{k+1} − f) ∈ ∂J(u^{k+1})

Equivalent form of the u^{k+1} update:

u^{k+1} = argmin_u J(u) − 〈p^k, u〉 + (δ/2)‖Ku − f‖²

Initialization: p^0 = 0, u^0 arbitrary

Ref: Yin, W., Osher, S., Goldfarb, D., Darbon, J., Bregman Iterative Algorithms for l1-Minimization with Applications to Compressed Sensing, UCLA CAM Report [07-37], 2007.
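The inner minimization has no closed form for general J, so as a runnable sketch (my own toy, not from the reference) take the smooth choice J(u) = ½‖u‖², for which the equivalent u^{k+1} update is an explicit linear solve; the iterates then enforce Ku = f in the limit:

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.standard_normal((2, 4))
f = K @ rng.standard_normal(4)   # consistent system, so Ku = f is attainable
delta = 1.0

p = np.zeros(4)                  # p0 = 0
I = np.eye(4)
for _ in range(1000):
    # u-update: argmin_u (1/2)||u||^2 - <p, u> + (delta/2)||Ku - f||^2
    u = np.linalg.solve(I + delta * K.T @ K, p + delta * K.T @ f)
    # p-update: p <- p - delta * K^T (Ku - f)
    p = p - delta * K.T @ (K @ u - f)

residual = np.linalg.norm(K @ u - f)   # driven to 0 by the iteration
```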
6
Denoising Example
min_u ‖u‖TV  s.t.  ‖u − f‖² ≤ σ²

Apply Bregman iteration to:

min_u ‖u‖TV  s.t.  u = f

⇒ u^{k+1} = argmin_u ‖u‖TV + (δ/2)‖u − f − p^k/δ‖²

p^{k+1} = p^k − δ(u^{k+1} − f)

‖u^k − f‖ → 0 monotonically

‖u^k − u*‖ is non-increasing while ‖u^k − f‖ ≥ ‖u* − f‖

⇒ Stop iterating when the constraint is satisfied
7
Linearized Bregman for l1-Minimization
Apply Bregman iteration to
min_u ‖u‖1  s.t.  Ku = f

but replace (δ/2)‖Ku − f‖² with 〈δK^T(Ku^k − f), u〉 + (1/2α)‖u − u^k‖²

⇒ u^{k+1} = argmin_u ‖u‖1 + (1/2α)‖u − u^k − αp^k + δαK^T(Ku^k − f)‖²

p^{k+1} = −u^{k+1}/α + u^k/α + p^k − δK^T(Ku^k − f)

Initialization: p^0 = 0, u^0 arbitrary

Ref: Osher, S., Mao, Y., Dong, B., and Yin, W., Fast Linearized Bregman Iteration for Compressive Sensing and Sparse Denoising, UCLA CAM Report [08-37], 2008.
8
Equivalent Form
Let v^k = p^{k+1} + u^{k+1}/α, v^0 = δK^T f.

Can rewrite the linearized Bregman steps as

u^{k+1} = argmin_u ‖u‖1 + (1/2α)‖u − αv^k‖²

v^{k+1} = v^k − δK^T(Ku^{k+1} − f)

Remark 1: The algorithm actually solves

min_u ‖u‖1 + (1/2α)‖u‖²  s.t.  Ku = f

Remark 2: In practice, use μ‖u‖1 instead of ‖u‖1 for numerical reasons.
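Since the u^{k+1} update is exactly the l1 prox (soft thresholding at level α, made explicit on the next slide), the equivalent form is two lines of code per iteration. A minimal sketch (sizes, seed, iteration count and tolerance are my own choices; δ is taken at 1/(α‖K‖²) so that the Uzawa / gradient ascent interpretation later in the talk guarantees convergence):

```python
import numpy as np

def shrink(z, t):
    """Soft thresholding: argmin_u ||u||_1 + (1/2t)||u - z||^2, componentwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(2)
K = rng.standard_normal((6, 12))
u_true = np.zeros(12); u_true[[2, 7]] = [1.5, -1.0]   # sparse signal
f = K @ u_true

alpha = 5.0
delta = 1.0 / (alpha * np.linalg.norm(K, 2) ** 2)     # gradient-ascent step size

v = delta * K.T @ f                    # v0 = delta * K^T f
for _ in range(50000):
    u = shrink(alpha * v, alpha)       # u-update: soft thresholding at alpha
    v = v - delta * K.T @ (K @ u - f)  # v-update

# the iterates approach a solution of min ||u||_1 + (1/2a)||u||^2 s.t. Ku = f
residual = np.linalg.norm(K @ u - f)
```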
9
Soft Thresholding
Explicit formula for

Sα(z) = argmin_u ‖u‖1 + (1/2α)‖u − z‖²₂
      = { z − α sign(z) if |z| > α
        { 0 otherwise

Can use the Moreau decomposition to reinterpret Sα(z) in terms of a projection:

Sα(z) = z − αΠ_{{p : ‖p‖∞ ≤ 1}}(z/α)

where Π(z) = z / max(|z|, 1) is the orthogonal projection onto {p : ‖p‖∞ ≤ 1}.
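Both forms are componentwise and easy to check against each other; a small sketch (the example values are arbitrary):

```python
import numpy as np

def shrink(z, alpha):
    """Soft thresholding S_alpha, applied componentwise."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

def proj_linf_ball(z):
    """Orthogonal projection onto {p : ||p||_inf <= 1}: clip each component."""
    return z / np.maximum(np.abs(z), 1.0)

z = np.array([3.0, -0.4, 1.2, -2.5])
alpha = 1.0

# Direct formula: z - alpha*sign(z) where |z| > alpha, else 0
direct = shrink(z, alpha)

# Moreau-decomposition form: S_alpha(z) = z - alpha * Pi(z / alpha)
via_projection = z - alpha * proj_linf_ball(z / alpha)

# both give [2, 0, 0.2, -1.5]
```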
10
Some Convex Optimization References
• Bertsekas, D., Constrained Optimization and Lagrange Multiplier Methods, Athena Scientific, 1996.
• Bertsekas, D., Nonlinear Programming, Athena Scientific, Second Edition, 1999.
• Bertsekas, D., and Tsitsiklis, J., Parallel and Distributed Computation, Prentice Hall, 1989.
• Boyd, S., and Vandenberghe, L., Convex Optimization, Cambridge University Press, 2004.
• Ekeland, I., and Temam, R., Convex Analysis and Variational Problems, SIAM, Classics in Applied Mathematics, 28, 1999.
• Rockafellar, R. T., Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
11
Legendre-Fenchel Transform
J*(p) = sup_w 〈p, w〉 − J(w)

Special case when J is a norm, J(w) = ‖w‖:

J*(p) = sup_w 〈p, w〉 − ‖w‖
      = { 0 if 〈p, w〉 ≤ ‖w‖ ∀w
        { ∞ otherwise
      = { 0 if sup_{‖w‖≤1} 〈p, w〉 ≤ 1
        { ∞ otherwise
      = { 0 if ‖p‖* ≤ 1 (by the dual norm definition)
        { ∞ otherwise
12
Moreau Decomposition
Let f ∈ ℝ^m and let J be a closed proper convex function on ℝ^m. Then:

f = [argmin_u J(u) + (1/2α)‖u − f‖²₂] + α[argmin_p J*(p) + (α/2)‖p − f/α‖²₂]

Sometimes written:

f = prox_{αJ}(f) + α prox_{J*/α}(f/α)

Ref: Combettes, P., and Wajs, V., Signal Recovery by Proximal Forward-Backward Splitting, Multiscale Modeling and Simulation, 2006.
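For J = ‖·‖1 the first argmin is soft thresholding and, since J* is the indicator of the ‖·‖∞ unit ball, the second argmin is a componentwise clipping; the identity can then be verified numerically (a sketch with arbitrary values):

```python
import numpy as np

alpha = 0.7
f = np.array([2.0, -0.3, 0.7, -1.4])

# prox_{alpha J} for J = ||.||_1 is soft thresholding at level alpha
prox_primal = np.sign(f) * np.maximum(np.abs(f) - alpha, 0.0)

# J* is the indicator of {p : ||p||_inf <= 1}, so the second argmin is the
# projection of f/alpha onto that ball, i.e. componentwise clipping
prox_dual = np.clip(f / alpha, -1.0, 1.0)

# Moreau decomposition: f = prox_{alpha J}(f) + alpha * (dual prox term)
recombined = prox_primal + alpha * prox_dual
```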
13
Bregman / Method of Multipliers
Bregman iteration for min_u J(u) s.t. Ku = f:

u^{k+1} = argmin_u J(u) − 〈p^k, u〉 + (δ/2)‖Ku − f‖²

p^{k+1} = p^k − δK^T(Ku^{k+1} − f), p^0 = 0

Equivalent to the method of multipliers:

u^{k+1} = argmin_u J(u) + 〈λ^k, Ku − f〉 + (δ/2)‖Ku − f‖²

λ^{k+1} = λ^k + δ(Ku^{k+1} − f), λ^0 = 0

with p^k = −K^T λ^k ∀k.

Ref: Yin, W., Osher, S., Goldfarb, D., Darbon, J., Bregman Iterative Algorithms for l1-Minimization with Applications to Compressed Sensing, UCLA CAM Report [07-37], 2007.
14
Linearized Bregman / Uzawa
Linearized Bregman iteration for min_u J(u) + (1/2α)‖u‖² s.t. Ku = f:

u^{k+1} = argmin_u J(u) + (1/2α)‖u − αv^k‖²

v^{k+1} = v^k − δK^T(Ku^{k+1} − f), v^0 = δK^T f

Equivalent to Uzawa's method:

u^{k+1} = argmin_u J(u) + (1/2α)‖u‖² + 〈λ^k, Ku − f〉

λ^{k+1} = λ^k + δ(Ku^{k+1} − f), λ^0 = −δf

with v^k = −K^T λ^k ∀k.

Ref: Cai, J.-F., Candès, E., and Shen, Z., A Singular Value Thresholding Algorithm for Matrix Completion, UCLA CAM Report [08-77], 2008.
15
Relevant Dual Functionals
The Lagrangian for min_u J(u) s.t. Ku = f is

L(u, λ) = J(u) + 〈λ, Ku − f〉

The dual functional is

q(λ) = inf_u L(u, λ) = −J*(−K^T λ) − 〈λ, f〉

Dual problem: max_λ q(λ)

Augmented Lagrangian:

L_δ(u, λ) = L(u, λ) + (δ/2)‖Ku − f‖²

q_δ(λ) = inf_u L_δ(u, λ)
16
Proximal Point Interpretation
L_δ(u, λ^k) = max_y L(u, y) − (1/2δ)‖y − λ^k‖²  ⇒  y* = λ^k + δ(Ku − f)

min_u max_y L(u, y) − (1/2δ)‖y − λ^k‖² is attained at (u^{k+1}, λ^{k+1})

⇒ max_y q(y) − (1/2δ)‖y − λ^k‖² is attained at λ^{k+1}

⇒ λ^{k+1} = argmax_y q(y) − (1/2δ)‖y − λ^k‖²

(Proximal point algorithm for maximizing q(λ))
17
Gradient Ascent Interpretation
q_δ(λ) = max_y q(y) − (1/2δ)‖y − λ‖² can be shown to be differentiable.

∇q_δ(λ^k) = −[λ^k − argmax_y (q(y) − (1/2δ)‖y − λ^k‖²)]/δ = (λ^{k+1} − λ^k)/δ

⇒ λ^{k+1} = λ^k + δ∇q_δ(λ^k)
18
Dual Functional for Linearized Bregman
Let J_LB(u) = J(u) + (1/2α)‖u‖².

The Lagrangian for min_u J_LB(u) s.t. Ku = f is

L_LB(u, λ) = J_LB(u) + 〈λ, Ku − f〉

The dual functional is

q_LB(λ) = −J*_LB(−K^T λ) − 〈λ, f〉

Remark: From the strict convexity of J_LB, J*_LB is differentiable and ∇q_LB is Lipschitz with constant α‖K‖².
19
Gradient Ascent Interpretation
From the optimality condition for the Lagrangian form of the u^{k+1} update,

u^{k+1} = argmin_u J(u) + (1/2α)‖u‖² + 〈λ^k, Ku − f〉,

0 ∈ ∂J_LB(u^{k+1}) + K^T λ^k.

Using the definitions of the Legendre transform and the subdifferential,

u^{k+1} = ∇J*_LB(−K^T λ^k),  so  ∇q_LB(λ^k) = Ku^{k+1} − f.

Can therefore interpret

λ^{k+1} = λ^k + δ(Ku^{k+1} − f)  as  λ^{k+1} = λ^k + δ∇q_LB(λ^k).

Ref: Yin, W., Analysis and Generalizations of the Linearized Bregman Method, UCLA CAM Report [09-42], May 2009.
20
Split Bregman Idea
Example: Total Variation Denoising
min_u ”‖∇u‖1” + (λ/2)‖u − f‖²

Reformulate as

min_{w,u} ”‖w‖1” + (λ/2)‖u − f‖²  s.t.  w = ∇u

Apply Bregman iteration to the constrained problem, but use alternating minimization with respect to w and u.

Ref: Goldstein, T., and Osher, S., The Split Bregman Algorithm for L1 Regularized Problems, UCLA CAM Report [08-29], April 2008.
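A runnable 1-D sketch of this idea (my own toy sizes and parameters, not from the reference; D is a 1-D forward difference standing in for ∇, and the alternating minimization is written in its ADMM form with multiplier p):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
f = np.concatenate([np.zeros(15), np.ones(15)]) + 0.05 * rng.standard_normal(n)

# 1-D forward difference matrix D ((n-1) x n): row i has -1 at i, +1 at i+1
D = -np.eye(n)[:-1] + np.eye(n, k=1)[:-1]

lam, alpha = 1.0, 1.0
u = f.copy()
w = D @ u
p = np.zeros(n - 1)

shrink = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
M = lam * np.eye(n) + alpha * D.T @ D   # system matrix for the u-update

for _ in range(2000):
    # w-update: argmin_w ||w||_1 + (alpha/2)||w - Du - p/alpha||^2
    w = shrink(D @ u + p / alpha, 1.0 / alpha)
    # u-update: argmin_u (lam/2)||u - f||^2 + (alpha/2)||Du - w + p/alpha||^2
    u = np.linalg.solve(M, lam * f + alpha * D.T @ (w - p / alpha))
    # multiplier update enforcing w = Du in the limit
    p = p + alpha * (D @ u - w)

gap = np.linalg.norm(D @ u - w, np.inf)   # splitting constraint violation
```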
21
Discrete TV Seminorm Notation
‖u‖TV = Σ_{p=1}^{Mr} Σ_{q=1}^{Mc} √((D₁⁺ u_{p,q})² + (D₂⁺ u_{p,q})²)

Vectorize the Mr × Mc matrix by stacking columns: the (p, q) element of the matrix ↔ the (q − 1)Mr + p element of the vector.

Define a grid-shaped graph with m nodes corresponding to the elements (p, q). Index the nodes by (q − 1)Mr + p and the edges arbitrarily. For each edge η with endpoint indices (i, j), i < j, define:

D_{η,k} = { −1 for k = i,
          {  1 for k = j,
          {  0 for k ≠ i, j.

Also define E ∈ ℝ^{e×m} such that

E_{η,k} = { 1 if D_{η,k} = −1,
          { 0 otherwise.
22
TV Notation (continued)
Define the norm ‖w‖E = Σ_{k=1}^{m} (√(E^T(w²)))_k.

Then the TV norm can be rewritten as

‖u‖TV = ‖Du‖E

The dual norm is defined by

‖p‖E* = ‖√(E^T(p²))‖∞
23
Convex Programs with Separable Structure
min_u J(u)  s.t.  Ku = f

J(u) = H(u) + Σ_{i=1}^{N} G_i(A_i u + b_i)

Rewrite as

min_{z,u} F(z) + H(u)  s.t.  Bz + Au = b

where F(z) = Σ_{i=1}^{N} G_i(z_i), z = [z_1; ...; z_N], B = [−I; 0],
A = [A_1; ...; A_N; K], and b = [−b_1; ...; −b_N; f] (blocks stacked vertically).
24
Application of Bregman Iteration
Apply Bregman iteration to:

min_{z,u} F(z) + H(u)  s.t.  Bz + Au = b

(z^{k+1}, u^{k+1}) = argmin_{z∈ℝ^n, u∈ℝ^m} F(z) − F(z^k) − 〈p_z^k, z − z^k〉 + H(u) − H(u^k) − 〈p_u^k, u − u^k〉 + (α/2)‖b − Au − Bz‖²

p_z^{k+1} = p_z^k + αB^T(b − Au^{k+1} − Bz^{k+1})

p_u^{k+1} = p_u^k + αA^T(b − Au^{k+1} − Bz^{k+1}).

Initialization: p_z^0 = 0, p_u^0 = 0
25
Augmented Lagrangian Form
The augmented Lagrangian is given by

L_α(z, u, λ) = F(z) + H(u) + 〈λ, Au + Bz − b〉 + (α/2)‖Au + Bz − b‖²

Then (z^{k+1}, u^{k+1}) can be equivalently updated by

(z^{k+1}, u^{k+1}) = argmin_{z,u} L_α(z, u, λ^k)

λ^{k+1} = λ^k + α(Au^{k+1} + Bz^{k+1} − b), λ^0 = 0,

which is the method of multipliers.

Equivalence to Bregman iteration again follows from p_z^k = −B^T λ^k and p_u^k = −A^T λ^k.
26
ADMM / Split Bregman

Alternate minimization with respect to u and z:

Theorem 1 (Eckstein, Bertsekas) Suppose B has full column rank and H(u) + ‖Au‖² is strictly convex. Let λ^0 and u^0 be arbitrary and let α > 0. Suppose we are also given sequences {μ_k} and {ν_k} such that μ_k ≥ 0, ν_k ≥ 0, Σ_{k=0}^∞ μ_k < ∞ and Σ_{k=0}^∞ ν_k < ∞. Suppose that

‖z^{k+1} − argmin_{z∈ℝ^n} F(z) + 〈λ^k, Bz〉 + (α/2)‖Au^k + Bz − b‖²‖ ≤ μ_k    (1)

‖u^{k+1} − argmin_{u∈ℝ^m} H(u) + 〈λ^k, Au〉 + (α/2)‖Au + Bz^{k+1} − b‖²‖ ≤ ν_k    (2)

λ^{k+1} = λ^k + α(Au^{k+1} + Bz^{k+1} − b).    (3)

If there exists a saddle point of L(z, u, λ), then z^k → z*, u^k → u* and λ^k → λ*, where (z*, u*, λ*) is such a saddle point. If no such saddle point exists, then at least one of the sequences {u^k} or {λ^k} must be unbounded.

Ref: Eckstein, J., and Bertsekas, D., On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming 55, North-Holland, 1992.
27
Dual Functional
q(λ) = inf_{z,u} F(z) + H(u) + 〈λ, Au + Bz − b〉 = −F*(−B^T λ) − 〈λ, b〉 − H*(−A^T λ)

λ* is optimal if

0 ∈ −B∂F*(−B^T λ*) + b − A∂H*(−A^T λ*)

Let Ψ(λ) = −B∂F*(−B^T λ) + b and φ(λ) = −A∂H*(−A^T λ).
28
Douglas-Rachford Splitting

Formally apply Douglas-Rachford splitting with α as the time step:

0 ∈ (r^{k+1} − λ^k)/α + Ψ(r^{k+1}) + φ(λ^k),

0 ∈ (λ^{k+1} − λ^k)/α + Ψ(r^{k+1}) + φ(λ^{k+1})

Remark: There are possibly many ways to satisfy the above iterations, but ADMM satisfies them in a particular way:

r^{k+1} = (I + αΨ)^{−1}(λ^k + αAu^k)

λ^{k+1} = (I + αφ)^{−1}(r^{k+1} − αAu^k)
29
Reformulation of DR Splitting
r^{k+1} = argmin_r F*(−B^T r) + 〈r, b〉 + (1/2α)‖r − λ^k + αq^k‖²

λ^{k+1} = argmin_λ H*(−A^T λ) + (1/2α)‖λ − r^{k+1} − αq^k‖²

q^{k+1} = q^k + (1/α)(r^{k+1} − λ^{k+1})

Remark: The 'full column rank' and 'strictly convex' assumptions are not needed to guarantee that λ^k converges to a solution of the dual problem.

Ref: Eckstein, J., Splitting Methods for Monotone Operators with Applications to Parallel Optimization, Ph.D. Thesis, Massachusetts Institute of Technology, Dept. of Civil Engineering, http://hdl.handle.net/1721.1/14356, 1989.
30
TV-l1 Example

min_u ‖u‖TV + β‖Ku − f‖1

Rewrite as

min_u ‖Du‖E + β‖Ku − f‖1

Let z = [w; v] = [Du; Ku − f], B = −I, A = [D; K], b = [0; f]

to put it in the form min_{z,u} F(z) + H(u) s.t. Bz + Au = b.

Introduce the dual variable λ = [p; q].

A solution exists assuming ker(D) ∩ ker(K) = {0}.
31
Augmented Lagrangian and ADMM Iterations
L(z, u, λ) = ‖w‖E + β‖v‖1 + 〈p, Du − w〉 + 〈q, Ku − f − v〉 + (α/2)‖w − Du‖² + (α/2)‖v − Ku + f‖²

The ADMM iterations are given by

w^{k+1} = argmin_w ‖w‖E + (α/2)‖w − Du^k − p^k/α‖²

v^{k+1} = argmin_v β‖v‖1 + (α/2)‖v − Ku^k + f − q^k/α‖²

u^{k+1} = argmin_u (α/2)‖Du − w^{k+1} + p^k/α‖² + (α/2)‖Ku − v^{k+1} − f + q^k/α‖²

p^{k+1} = p^k + α(Du^{k+1} − w^{k+1})

q^{k+1} = q^k + α(Ku^{k+1} − f − v^{k+1}),

where p^0 = q^0 = 0, u^0 is arbitrary and α > 0.
32
Explicit Iterations
The explicit formulas for w^{k+1}, v^{k+1} and u^{k+1} are given by

w^{k+1} = S̃_{1/α}(Du^k + p^k/α)

v^{k+1} = S_{β/α}(Ku^k − f + q^k/α)

u^{k+1} = (−Δ + K^T K)^{−1}(D^T w^{k+1} − D^T p^k/α + K^T(v^{k+1} + f) − K^T q^k/α)

        = (−Δ + K^T K)^{−1}(D^T w^{k+1} + K^T(v^{k+1} + f)),

where

S̃_c(f) = f − cΠ_{{p : ‖p‖E* ≤ 1}}(f/c),

Π_{{p : ‖p‖E* ≤ 1}}(p) = p / (E max(√(E^T(p²)), 1))
33
TV-l1 Results

(Figure: input f and result u for TV-l1 minimization of a 512 × 512 synthetic image)

Image Size | Iterations | Time
64 × 64    | 40         | 1s
128 × 128  | 51         | 5s
256 × 256  | 136        | 78s
512 × 512  | 359        | 836s

Iterations until ‖u^k − u^{k−1}‖∞ ≤ .5, ‖Du^k − w^k‖∞ ≤ .5 and ‖v^k − Ku^k + f‖∞ ≤ .5

β = .6, .3, .15 and .075; α = .02, .01, .005 and .0025 (for the four sizes)
34
Decoupling Variables

One can add additional proximal-like penalties to the ADMM iterations and obtain a more explicit algorithm that still converges.

Given a step of the ADMM algorithm of the form

u^{k+1} = argmin_u J(u) + 〈λ^k, Ku − f〉 + (α/2)‖Ku − f‖²,

modify the objective functional by adding

(1/2)〈u − u^k, (1/δ − αK^T K)(u − u^k)〉,

where δ is chosen such that 0 < δ < 1/(α‖K^T K‖).

The modified update is given by

u^{k+1} = argmin_u J(u) + 〈λ^k, Ku − f〉 + (1/2δ)‖u − u^k + αδK^T(Ku^k − f)‖².

Ref: Zhang, X., Burger, M., Bresson, X., Osher, S., Bregmanized Nonlocal Regularization for Deconvolution and Sparse Reconstruction, UCLA CAM Report [09-03], 2009.
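With J = ‖·‖1, completing the square shows the modified update is a single soft-thresholding step, so each iteration becomes fully explicit. A toy sketch for min ‖u‖1 s.t. Ku = f (sizes, seed, iteration count and tolerance are my own choices):

```python
import numpy as np

rng = np.random.default_rng(4)
K = rng.standard_normal((3, 8))
u_true = np.zeros(8); u_true[[1, 5]] = [1.0, -2.0]
f = K @ u_true

alpha = 1.0
# step-size condition from the slide: 0 < delta < 1/(alpha*||K^T K||)
delta = 0.9 / (alpha * np.linalg.norm(K.T @ K, 2))

shrink = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

u = np.zeros(8)
lam = np.zeros(3)
for _ in range(20000):
    # decoupled u-update: completing the square reduces it to soft thresholding
    u = shrink(u - delta * K.T @ (lam + alpha * (K @ u - f)), delta)
    # multiplier update
    lam = lam + alpha * (K @ u - f)

residual = np.linalg.norm(K @ u - f)
```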
35
Convex Constraints as Indicator Functions
Given a constraint of the form u ∈ S where S is convex, we can enforce the constraint by adding to the objective functional the indicator function for S,

H(u) = { 0 if u ∈ S
       { ∞ otherwise

One can then develop algorithms that project onto the constraint set at each iteration, or, in combination with the decoupling trick, handle the constraint in a more explicit manner.
36
Example Constraint
Suppose the constraint is ‖Ku − f‖ ≤ ε, so S = {u : ‖Ku − f‖ ≤ ε}.

Π_S(z) = (I − K†K)z + K† { Kz if ‖Kz − f‖ ≤ ε
                          { f + r (Kz − KK†f)/‖Kz − KK†f‖ otherwise,

where

r = √(ε² − ‖(I − KK†)f‖²₂)

By decoupling variables, the projection step can be simplified to

Π_{{z : ‖z − f‖₂ ≤ ε}}(z) = f + (z − f)/max(‖z − f‖₂/ε, 1).

Useful when K† is not easy to compute.
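The simplified projection is one line of code; a sketch checking both the interior and exterior cases (the values are arbitrary):

```python
import numpy as np

def proj_ball(z, f, eps):
    """Projection onto {z : ||z - f||_2 <= eps} (simplified decoupled form)."""
    return f + (z - f) / max(np.linalg.norm(z - f) / eps, 1.0)

f = np.array([1.0, 2.0])
eps = 0.5

inside = np.array([1.1, 2.2])    # already feasible: returned unchanged
outside = np.array([4.0, 6.0])   # infeasible: projected onto the sphere

p_in = proj_ball(inside, f, eps)    # equals `inside`
p_out = proj_ball(outside, f, eps)  # satisfies ||p_out - f|| = eps
```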
37
TV Deblurring Example
min_u ‖u‖TV  s.t.  ‖Ku − f‖ ≤ ε

Rewrite as

min_u ‖Du‖E + H(Ku)  where  H(z) = { 0 if ‖z − f‖ ≤ ε
                                    { ∞ otherwise.

Let T = {z : ‖z − f‖ ≤ ε} and X = {p : ‖p‖E* ≤ 1}.

Saddle point problem from the Lagrangian:

max_{p,q} inf_{u,w,z} ‖w‖E + 〈p, Du − w〉 + H(z) + 〈q, Ku − z〉
38
Split Inexact Uzawa Method for TV Deblurring
u^{k+1} = argmin_u (δ/2)‖Du − w^k‖² + (δ/2)‖Ku − z^k‖² + 〈p^k, Du〉 + 〈q^k, Ku〉 + (1/2)〈u − u^k, [(1/α − δD^T D) + (1/α − δK^T K)](u − u^k)〉

w^{k+1} = argmin_w ‖w‖E + (δ/2)‖w − Du^{k+1} − p^k/δ‖²

z^{k+1} = argmin_z H(z) + (δ/2)‖z − Ku^{k+1} − q^k/δ‖²

p^{k+1} = p^k + δ(Du^{k+1} − w^{k+1})

q^{k+1} = q^k + δ(Ku^{k+1} − z^{k+1})

If D ∼ ∇ and K is a normalized blurring operator, we just need 0 < α < 1/(4δ).

Ref: Zhang, X., A Unified Primal-Dual Algorithm Based on l1 and Bregman Iteration, (private communication), April 2009.
39
TV Deblurring Algorithm (continued)
Use two applications of the Moreau decomposition to rewrite the previous algorithm in terms of projections onto T and X:

u^{k+1} = u^k − (α/2)[D^T(2p^k − p^{k−1}) + K^T(2q^k − q^{k−1})]

p^{k+1} = Π_X(p^k + δDu^{k+1})

q^{k+1} = (q^k + δKu^{k+1}) − δΠ_T(q^k/δ + Ku^{k+1})

Remark: Can require more iterations than a more implicit algorithm, but has the advantage of only requiring matrix multiplications and simple projections.
40
Compressive Sensing Example
min_z ‖Ψz‖1  s.t.  ‖RΓz − f‖₂ ≤ ε,

where RΓ is the measurement matrix and we expect Ψz to be sparse.

Let J = ‖·‖1, A = Ψ, K = RΓ and

H(x) = { 0 if ‖x − f‖₂ ≤ ε
       { ∞ otherwise.

⇒ Just like the deblurring example

If Ψ^T Ψ = I (tight frame), can choose to handle A implicitly

If Γ is a discrete Fourier transform, can handle K implicitly too
41
Connections to PDHG
Since J** = J, J(Au) = J**(Au) = sup_p 〈p, Au〉 − J*(p).

Can therefore obtain the following saddle point problem from min_u J(Au) + H(u):

min_u sup_p −J*(p) + 〈p, Au〉 + H(u).

The Primal Dual Hybrid Gradient algorithm then alternates primal and dual proximal steps of the form:

p^{k+1} = argmax_p −J*(p) + 〈p, Au^k〉 − (1/2δ_k)‖p − p^k‖²₂

u^{k+1} = argmin_u H(u) + 〈A^T p^{k+1}, u〉 + (1/2α_k)‖u − u^k‖²₂

Ref: Zhu, M., and Chan, T., An Efficient Primal-Dual Hybrid Gradient Algorithm for Total Variation Image Restoration, UCLA CAM Report [08-34], May 2008.
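As a runnable sketch (my own toy, not from the reference), take the quadratic choices J = ½‖·‖² (so J* = ½‖·‖²) and H(u) = ½‖u − f‖², for which both proximal steps are explicit and the saddle point satisfies (AᵀA + I)u* = f; the fixed steps δ_k = α_k = 0.1 are my own choice:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3))
f = rng.standard_normal(3)

delta, alpha_step = 0.1, 0.1
p = np.zeros(3)
u = np.zeros(3)

for _ in range(20000):
    # dual prox step: argmax_p -||p||^2/2 + <p, Au> - (1/2delta)||p - p^k||^2
    p = (p + delta * A @ u) / (1.0 + delta)
    # primal prox step: argmin_u ||u-f||^2/2 + <A^T p, u> + (1/2alpha)||u - u^k||^2
    u = (u + alpha_step * (f - A.T @ p)) / (1.0 + alpha_step)

# closed-form minimizer of (1/2)||Au||^2 + (1/2)||u - f||^2 for comparison
u_star = np.linalg.solve(A.T @ A + np.eye(3), f)
```

At a fixed point, p = Au and u = f − Aᵀp, which recovers the optimality condition (AᵀA + I)u = f.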
42
Figure 1: PDHG-Related Algorithm Framework

(P) min_u F_P(u), F_P(u) = J(Au) + H(u)
(D) max_p F_D(p), F_D(p) = −J*(p) − H*(−A^T p)
(PD) min_u sup_p L_PD(u, p), L_PD(u, p) = 〈p, Au〉 − J*(p) + H(u)
(SPP) max_p inf_{u,w} L_P(u, w, p), L_P(u, w, p) = J(w) + H(u) + 〈p, Au − w〉
(SPD) max_u inf_{p,y} L_D(p, y, u), L_D(p, y, u) = J*(p) + H*(y) + 〈u, −A^T p − y〉

Connections shown in the diagram:
• AMA on (SPP) ⇔ PFBS on (D); AMA on (SPD) ⇔ PFBS on (P)
• Adding (δ/2)‖Au − w‖²₂ (on (SPP)) or (α/2)‖A^T p + y‖²₂ (on (SPD)) gives Relaxed AMA
• ADMM on (SPP) ⇔ Douglas-Rachford on (D); ADMM on (SPD) ⇔ Douglas-Rachford on (P)
• Adding the penalties (1/2α)‖u − u^k‖²₂ and (1/2δ)‖p − p^k‖²₂ gives primal-dual proximal point on (PD) ⇔ PDHG
• Adding (1/2)〈u − u^k, (1/α − δA^T A)(u − u^k)〉 gives Split Inexact Uzawa on (SPP) ⇔ PDHGMp; adding (1/2)〈p − p^k, (1/δ − αAA^T)(p − p^k)〉 gives Split Inexact Uzawa on (SPD) ⇔ PDHGMu
• The modified versions replace p^{k+1} → 2p^{k+1} − p^k or u^k → 2u^k − u^{k−1}

Legend: (P): Primal, (D): Dual, (PD): Primal-Dual, (SPP): Split Primal, (SPD): Split Dual
AMA: Alternating Minimization Algorithm (4.2.1); PFBS: Proximal Forward Backward Splitting (4.2.1); ADMM: Alternating Direction Method of Multipliers (4.2.2); PDHG: Primal Dual Hybrid Gradient (4.2); PDHGM: Modified PDHG (4.2.3)
⇒ Well Understood Convergence Properties
43
Types of Applications
• Convex programs that decompose into problems of the form

  min_{z,u} F(z) + H(u)  s.t.  Au + Bz = b

• Especially useful for problems involving convex constraints and separable l2 and l1-like terms

• l1-like terms include the TV seminorm, Besov norms and even the nuclear norm, which is the l1 norm of the singular values of a matrix

• Can also apply these algorithms to convex relaxations of non-convex problems
44
Sparse Approximation
These algorithms are useful for functionals involving multiple l1-like terms, which can arise when modeling signals as sums of sparse signals in different representations:

• TV-l1
• Cartoon / Texture Decomposition: sparse ∇ + sparse Fourier coefficients
• Background Video Detection:

  min ‖A‖nuclear + λ‖E‖1  s.t.  A + E ∼ original video

  Low rank (background) + sparse error (foreground)

Ref: Osher, S., Sole, A., Vese, L., Image Decomposition and Restoration Using Total Variation Minimization and the H−1 Norm, UCLA CAM Report [02-57].
Ref: Talk by John Wright
45
Nonlocal Total Variation
The graph definition of the discrete TV seminorm makes it straightforward to extend these algorithms to nonlocal TV minimization problems:

‖u‖TV = ‖Du‖E

Simply redefine the edge-node adjacency matrix D. Let A be the adjacency matrix for the new set of edges, redefine E accordingly, and let W be a diagonal matrix of precomputed nonnegative weights on the edges. Then

‖u‖NLTV = ‖√W Au‖E
46
Convexification of Image Segmentation
min_u ‖u‖TV + λ‖u(c1 − f)‖² + λ‖(1 − u)(c2 − f)‖²  s.t.  u binary

⇔

min_u ‖u‖TV + λ〈(c1 − f)² − (c2 − f)², u〉  s.t.  0 ≤ u ≤ 1

The convexification idea also extends to active contours and multiphase segmentation.

Ref: Burger, M., and Hintermüller, M., Projected Gradient Flows for BV / Level Set Relaxation, UCLA CAM Report [05-40], 2005.
Ref: Goldstein, T., Bresson, X., Osher, S., Geometric Applications of the Split Bregman Method: Segmentation and Surface Reconstruction, UCLA CAM Report [09-06], 2009.
47
Convexification of Image Registration
Given images u and φ, minimize

‖φ(x − v) − u(x)‖² + (γ/2)‖∇v1‖² + (γ/2)‖∇v2‖²

with respect to the displacement field v.

Obtain a convex relaxation by adding edges with unknown weights c_{i,j} such that

(v1^i, v2^i) = ((x1^i − Σ_{j∼i} c_{i,j} y1^j), (x2^i − Σ_{j∼i} c_{i,j} y2^j))

(Figure: images φ and u on grids with coordinates (x1, x2) and (y1, y2))

F(c) = ‖A_φ c − u‖² + (γ/2)‖D(A_{y1} c − x1)‖² + (γ/2)‖D(A_{y2} c − x2)‖²

such that c_{i,j} ≥ 0 and Σ_{j∼i} c_{i,j} = 1.
48