Constrained optimization on Hierarchies
B Ravi Kiran
Part III, ICIP 2014 Tutorial T9
Optimizations on Hierarchies of Partitions
B Ravi Kiran : Constrained Optimization on HOP 1/31
Outline of the lecture
1 Review on Constrained Optimization
    Optimally Pruned Decision Trees
    Decision Trees in Information Theory
2 Constrained optimization on Hierarchies of Partitions
    Moving to hierarchies
    Optimal Cuts on Hierarchies of Partitions
    Lagrangian Formulation
    λ-cuts are Upper Bounds
3 Conclusion
4 References
Review on Constrained Optimization
Decision Trees and Optimally Pruning
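[Figure: feature space with axes X1 (horizontal) and X2 (vertical), split by the thresholds t1, t2, t3 into regions R1, R2, R3, R4. The corresponding binary tree has root test t1 ≤ 2, with children t2 ≤ 1 (leaves R2, R1) and t3 ≤ 3 (leaves R3, R4).]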
A 2D feature space is recursively partitioned, producing a binary tree.
Grow the tree until each class contains a minimal number of points.
How to find a good classifier or regressor for f(X1, X2)?
Cost complexity Pruning [Breiman 1984]
Given a grown tree T, we can write a cost:
\[ R_\lambda(T) = \sum_{m=1}^{|T|} \sum_{x_i \in R_m} (y_i - \mu_{R_m})^2 + \lambda\,|T| \]
µ_{R_m}: mean value of the observed variable y in region R_m.
λ: parameter that governs the trade-off between tree size and fidelity to the data.
|T|: number of terminal nodes of tree T.
The error term for classification uses impurity measures such as the Gini coefficient.
Constrained optimization problem: trade-off between classifier complexity and error.
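The cost above can be evaluated directly once the responses are grouped by terminal region. The following is a minimal sketch, not taken from the slides: the function name `cost_complexity` and the sample responses are hypothetical and serve only to illustrate the formula.

```python
from statistics import fmean

def cost_complexity(regions, lam):
    """R_lambda(T): squared error of every leaf region around its mean, plus lam per leaf."""
    error = 0.0
    for ys in regions:
        mu = fmean(ys)                           # mean response mu_{R_m} in region R_m
        error += sum((y - mu) ** 2 for y in ys)  # fidelity term for that region
    return error + lam * len(regions)            # len(regions) plays the role of |T|

# Hypothetical responses y_i grouped by leaf region (illustrative numbers only).
leaves = [[1.0, 3.0], [2.0, 2.0, 5.0]]
print(cost_complexity(leaves, lam=0.5))
```

Raising `lam` leaves the error term unchanged and adds a fixed penalty per leaf, which is what makes smaller trees preferable at larger λ.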
Figure: Variation of Training Error vs Tree Cost.
Pruning Example
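[Figure: the same tree with its node costs for three values of λ, listed level by level (root | internal nodes | leaves).
λ = 0: 10 | 3, 5 | 1, 1, 1, 2, 1
λ = 0.5: 10.5 | 3.5, 5.5 | 1.5, 1.5, 1.5, 2.5, 1.5
λ = 1: 11 | 4, 6 | 2, 2, 2, 3, 2]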
Figure: Pruning example demonstrating cost-complexity pruning. The cost function is given for each node of the tree. The optimal pruned subtrees are shown for λ = 0 (left), 0.5 (center) and 1 (right). As λ increases, one gets shorter trees. The ideal value of λ is chosen by cross-validation, re-running the fit over k folds of the original data.
The value of λ at which a pruned parent node is kept in preference to its children is:
\[ \lambda(T) = \frac{\mathrm{Error}(T, t) - \mathrm{Error}(T)}{|T| - 1} \qquad (1) \]
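Equation (1) can be checked on the pruning example. The sketch below assumes that the node values of the λ = 0 tree are the node errors (a reading of the figure, not something the slides state), and the helper name `critical_lambda` is hypothetical.

```python
def critical_lambda(error_pruned, error_subtree, n_leaves):
    """Value of lambda at which collapsing a subtree into its root costs as much as keeping it."""
    if n_leaves < 2:
        raise ValueError("a subtree needs at least two leaves to be pruned")
    return (error_pruned - error_subtree) / (n_leaves - 1)

# Root error 10 and leaf errors 1, 1, 1, 2, 1, read from the lambda = 0 tree of the figure.
leaf_errors = [1, 1, 1, 2, 1]
lam = critical_lambda(10, sum(leaf_errors), len(leaf_errors))
print(lam)
```

At that value the root alone costs 10 + λ = 11 and the five leaves cost 6 + 5λ = 11, which agrees with the node values listed for λ = 1.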
Decision Trees in Information Theory
CART trees in Rate Distortion Minimization Framework:
\[ D(R) = \inf_{P_{Y|X}} \big\{ \mathbb{E}[\rho(X, Y)] \;\big|\; I(X; Y) \le R \big\} \]
[Chou, Lookabaugh & Gray 1989] Tree-structured source coding and modeling
Since then, CART trees have had diverse applications in information theory and classifier design:
[Ramchandran & Vetterli 1993] Best Wavelet Packet Bases in a Rate-Distortion Sense
[Donoho 1997] CART and Best-ortho-basis: A connection
[Wakin, Romberg, Choi & Baraniuk 2002] Rate-distortion optimized image compression using wedgelets
[Chiang & Boyd 2004] Geometric Programming Duals of Channel Capacity and Rate Distortion
[Shukla & Vetterli 2005] Tree-structured Compression for Piecewise Polynomial Images
Constrained optimization on Hierarchies of Partitions
Moving to Hierarchies
CART-motivated applications in hierarchies of segmentations:
[Salembier-Garrido 2000] Binary partition tree as an efficient representation for image processing, segmentation and information retrieval.
[Guigues 2003] Scale-Sets.
[Ballester, Caselles, Igual, Garrido 2006] Level Lines Selection with Variational Models for Segmentation and Encoding.
[Calderero-Marques 2010] Region merging techniques using information theory statistical measures.
A wide new domain of hierarchical processing:
[Sylvia Valero 2011] Hyperspectral data representation using Binary Partition Trees.
[Camille Kurtz 2012] Extraction of complex patterns from multiresolution remote sensing images.
[Xu et al. 2012] Morphological Filtering in Shape Spaces.
Binary Partition Trees [Salembier-Garrido 2000]
Problem
Given a Max-tree representation of a gray-scale image:
Calculate the partition with the least distortion under a given rate constraint.
Calculate the optimal trade-off parameter λ that achieves a given constrained bandwidth rate.
Algorithm
Inputs: distortion D; rate C; budget rate C0; Lagrange parameter λ

λl = 0;                                  \\ Compute D and C for a very low λ
∗ BottomUpAnalysis(Input: λl, output: C, D)
if C < C0 then { no solution; exit; }
Cl = C; Dl = D;
λh = 10^20;                              \\ Compute D and C for a very high λ
∗ BottomUpAnalysis(Input: λh, output: C, D)
if C > C0 then { no solution; exit; }
Ch = C; Dh = D;
do {                                     \\ Find the optimum λ value
    λ = (Dl − Dh) / (Ch − Cl);
    ∗ BottomUpAnalysis(Input: λ, output: C, D)
    if C < C0 then { Ch = C; Dh = D; }
    else { Cl = C; Dl = D; }
} until (C ≈ C0)
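The search above can be sketched in Python under a few assumptions that are not in the slides: `bottom_up_analysis` is replaced by a stand-in that minimises D + λC over a small list of illustrative (C, D) operating points, ties are broken towards the lower rate, and the loop stops when no new point appears between the two brackets (on a finite set, C ≈ C0 may never be reached).

```python
def bottom_up_analysis(points, lam):
    """Stand-in for the tree analysis: the (C, D) pair minimising D + lam * C.
    Ties are broken towards the lower rate C."""
    return min(points, key=lambda cd: (cd[1] + lam * cd[0], cd[0]))

def search_lambda(points, budget, max_iter=50):
    """Return (lam, C, D) for the best operating point found with C <= budget, or None."""
    c_l, d_l = bottom_up_analysis(points, 0.0)     # very low lambda: highest rate
    if c_l < budget:
        return None                                # mirrors "no solution; exit" above
    if c_l == budget:
        return 0.0, c_l, d_l
    lam = 1e20
    c_h, d_h = bottom_up_analysis(points, lam)     # very high lambda: lowest rate
    if c_h > budget:
        return None                                # even the cheapest point exceeds the budget
    for _ in range(max_iter):
        if c_h == c_l:
            break
        lam = (d_l - d_h) / (c_h - c_l)            # slope between the two brackets
        c, d = bottom_up_analysis(points, lam)
        if c == budget:
            return lam, c, d
        if (c, d) in ((c_l, d_l), (c_h, d_h)):
            break                                  # nothing left between the brackets
        if c < budget:
            c_h, d_h = c, d
        else:
            c_l, d_l = c, d
    return lam, c_h, d_h

# Illustrative (rate C, distortion D) operating points; the budget 7.5 falls between two rates.
points = [(9, 6), (8, 8), (6, 14), (4, 30)]
print(search_lambda(points, budget=7.5))
```

With a budget of 7.5 the search settles on the point with rate 6, since no operating point has a rate between 6 and 8. The rate constraint is then met with slack rather than with equality.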
Dynamic Program
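\[ \omega^*(\pi(S)) = \min\Big\{ \omega(\{S\}),\ \sum_{a \in \pi(S)} \omega(a) \Big\} \]
[Figure: a class S with children a, b, c, so that π(S) = a ⊔ b ⊔ c.]
\[ \pi^*(S) = \begin{cases} \{S\}, & \text{if } \omega(S) \le \sum_{a \in \pi(S)} \omega(a) \\ \pi(S), & \text{otherwise} \end{cases} \]
Here ω(S) = ωϕ(S) + λ · ω∂(S); we will see why.
For Salembier-Garrido, D plays the role of ωϕ and C that of ω∂.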
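Applied bottom-up, the rule above yields the minimal cut for a given λ: each class is compared with the best cut already found below it. The sketch below is one possible implementation, with a hypothetical function name `optimal_cut`; the hierarchy and the two energies are read from the Demonstration slide of this deck, naming the class {e, f} as i, following the dendrogram.

```python
def optimal_cut(node, children, w_phi, w_d, lam):
    """Return (energy, cut) of the minimal cut of the sub-hierarchy rooted at node."""
    own = w_phi[node] + lam * w_d[node]        # omega(S) = omega_phi(S) + lam * omega_d(S)
    kids = children.get(node, [])
    if not kids:
        return own, [node]                     # a leaf is its own cut
    sub = [optimal_cut(k, children, w_phi, w_d, lam) for k in kids]
    below = sum(e for e, _ in sub)             # energy of the best cut made of descendants
    if own <= below:                           # keep {S} on ties, as in the rule above
        return own, [node]
    return below, [c for _, cut in sub for c in cut]

# Hierarchy and energies read from the Demonstration slide.
children = {"E": ["j", "i"], "j": ["g", "h"], "g": ["a", "b"], "h": ["c", "d"], "i": ["e", "f"]}
w_phi = dict(E=30, j=20, i=4, g=5, h=5, a=1, b=1, c=1, d=1, e=1, f=1)
w_d = dict(E=4, j=3, i=2, g=2, h=2, a=1, b=2, c=1, d=2, e=1, f=2)

for lam in (0, 2.5, 4):
    print(lam, optimal_cut("E", children, w_phi, w_d, lam))
```

Each class is visited once, so the cost is linear in the number of classes of the hierarchy.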
Scale-Sets [Guigues 2003]
Extraction of a set of optimal cuts from a hierarchy, characterized by:
Energy functional/model: Mumford-Shah
A scale parameter λ
Energy formulation in Guigues' case:
ω(S, λ) = ωϕ(S) + λ ω∂(S)
Remark
Start from a hierarchy H and calculate the scale function
\[ \lambda(S) = -\frac{\Delta\omega_\varphi}{\Delta\omega_\partial} \]
for the classes S of H.
Calculate the indexed hierarchy (H, λ+) consisting of the minimal cuts for increasing λ's:
\[ \{\Pi(\lambda, H)\}_{\lambda \in \mathbb{R}^+} \rightarrow (H, \lambda^+) \]
Furthermore, minimization of an energy on Π(H, E) is NP-hard.
Instead, choose the minimal cuts corresponding to the scale parameter λ.
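The scale function can be recovered by equating the energy of a class with the total energy of its children. A short derivation follows, under the assumption (a sign convention chosen here so that the formula above is reproduced) that Δ denotes "parent minus children" and that Δω∂ ≠ 0:

```latex
\begin{aligned}
\omega_\varphi(S) + \lambda\,\omega_\partial(S)
  &= \sum_{a \in \pi(S)} \omega_\varphi(a) + \lambda \sum_{a \in \pi(S)} \omega_\partial(a) \\
\underbrace{\omega_\varphi(S) - \sum_{a \in \pi(S)} \omega_\varphi(a)}_{\Delta\omega_\varphi}
  + \lambda\, \underbrace{\Big(\omega_\partial(S) - \sum_{a \in \pi(S)} \omega_\partial(a)\Big)}_{\Delta\omega_\partial}
  &= 0 \\
\lambda(S) &= -\frac{\Delta\omega_\varphi}{\Delta\omega_\partial}
\end{aligned}
```

Below λ(S) the children have the lower energy, and above it the class S does, provided Δω∂ < 0.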
Guigues' Problems
We have a constrained optimization problem on hierarchies.
Problem
Which conditions on the objective function ωϕ and the constraint ω∂ yield a set of optimal cuts that is monotonically ordered with λ, and thus an indexed optimal hierarchy?
Which conditions on the energies ωϕ, ω∂ ensure a unique optimum for a given λ?
Problem Formulation
Given energies ωϕ, ω∂ : D(E )→ R
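\[ \min_{\pi \in \Pi(E,H)} \sum_{S \in \pi} \omega_\varphi(S) \quad \text{subject to} \quad \sum_{S \in \pi} \omega_\partial(S) \le C \]
\[ \min_{\pi \in \Pi(E,H)} \sum_{S \in \pi} \omega_\partial(S) \quad \text{subject to} \quad \sum_{S \in \pi} \omega_\varphi(S) \le K \]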
Level Line selection [Caselles et al. 2006]
Hierarchy: Tree of Shapes, an inclusion tree built from the upper and lower level sets of a scalar function.
Rate-distortion framework for the compression of images.
Distortion: |f(x) − µ(S)|², the quadratic error
Rate: length of the contour ∂S
Primal and Dual problems
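Lagrangian Primal problem:
\[ \min_{\pi \in \Pi(E,B)} \omega_\varphi(\pi) \quad \text{subject to} \quad \omega_\partial(\pi) \le C \]
Summed over the classes S of the cut π, the problem is written as:
\[ \min_{\pi \in \Pi(E,H)} \sum_{S \in \pi} \omega_\varphi(S) \quad \text{subject to} \quad \sum_{S \in \pi} \omega_\partial(S) \le C \]
The domain of the feasible cuts is now the subset Π′ of Π:
\[ \Pi' = \{ \pi \in \Pi : \omega_\partial(\pi) \le C \} \]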
Lagrangian Multipliers
Remark
For the constrained optimization problem, [Salembier, Guigues et al.] use the Lagrangian multiplier method to formulate an unconstrained optimization problem.
As we know from optimization theory, the Lagrangian is given by:
ω(π, λ) = ωϕ(π) + λ · ω∂(π)
Minimal cuts are the family of cuts with the least ω(π, λ) for a given λ.
Unconstrained minimization of Lagrangian
Remark
Guigues assumes a sub-additive constraint ω∂ and a super-additive objective ωϕ to extract λ-ordered cuts from the input hierarchy.
Salembier et al. propose a gradient-search-based method to find the λ which achieves the constraint rate C approximately, that is, ω∂(π(λ)) ≈ C.
Breiman, Salembier, Guigues and many others ensure uniqueness by choosing the smallest cut that satisfies C. This is basically the condition of uniqueness.
Demonstration
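[Figure: four trees over the same hierarchy, listed level by level from the root.
Dendrogram: E | j, i | g, h, i | a, b, c, d, e, f
ωϕ tree: 30 | 20, 4 | 5, 5, 4 | 1, 1, 1, 1, 1, 1
ω∂ tree: 4 | 3, 2 | 2, 2, 2 | 1, 2, 1, 2, 1, 2
λ-tree: 6 | 10, 2 | 3, 3, 2 | -, -, -, -, -, -
Cuts marked on the trees: π, π′, π1, π2, π3.]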
Figure: Bottom left: hierarchy H. Top row: the two energies (ωϕ, ω∂) for the corresponding classes. Bottom right: λ values obtained by equating parent and child energies, whose level sets give the minimal cuts w.r.t. ωλ. Scale-sets or λ-cuts are shown for λ = 2, 3, 4 as π2, π3, π4.
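The λ values of the figure can be reproduced by equating the energy of each class with that of its children. The sketch below is an illustration with a hypothetical helper name `scale_function`; it assumes that the constraint difference is never zero and uses the values of the ωϕ and ω∂ trees, naming the class {e, f} as i, following the dendrogram.

```python
def scale_function(children, w_phi, w_d):
    """lambda(S) = -(delta omega_phi) / (delta omega_d), with delta = parent minus children."""
    lam = {}
    for s, kids in children.items():
        d_phi = w_phi[s] - sum(w_phi[k] for k in kids)
        d_d = w_d[s] - sum(w_d[k] for k in kids)   # assumed non-zero in this sketch
        lam[s] = -d_phi / d_d
    return lam

# Hierarchy and energies read from the figure.
children = {"E": ["j", "i"], "j": ["g", "h"], "g": ["a", "b"], "h": ["c", "d"], "i": ["e", "f"]}
w_phi = dict(E=30, j=20, i=4, g=5, h=5, a=1, b=1, c=1, d=1, e=1, f=1)
w_d = dict(E=4, j=3, i=2, g=2, h=2, a=1, b=2, c=1, d=2, e=1, f=2)

print(scale_function(children, w_phi, w_d))
```

The computed values are not monotone along the hierarchy: the value for j exceeds the one for its parent E.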
λ-cuts are Upper Bounds
[Plot: ωϕ(π∗λ) and ω∂(π∗λ) against λ (horizontal axis, 0 to 5); vertical axis marks at 6, 8, 9 and 14; horizontal line at the cost C = 7.5.]
Figure: For 2 < λ < 3 the minimal cut is (a, b, c, d, i) and ω∂ = 8; for λ ≥ 3 the minimal cut is (g, h, i) and ω∂ = 6, i.e. ω∂ never equals the cost C = 7.5.
Remark
Lack of a Cost → Multiplier mapping: for a given cost ω∂ ≤ C, one is not assured of a corresponding multiplier λ.
Uniqueness is lost, even when ωϕ is strictly h-increasing.
π∗(λ∗) is only an upper bound of the constrained minimal cuts.
The error |ω∂(π∗(λ∗)) − C| gives no information about the error |ωϕ(π∗(λ∗)) − ωϕ(π)|, where π is a constrained minimal cut.
On the ω∂-tree, the structure of the solution space forms a lattice.
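These remarks can be checked by brute force on the demonstration hierarchy, which has few enough cuts to enumerate. The sketch below is illustrative only: the helper names are hypothetical, the class {e, f} is named i following the dendrogram, and exhaustive enumeration is used for checking, not as a practical method.

```python
from itertools import product

# Hierarchy and energies read from the Demonstration figure.
children = {"E": ["j", "i"], "j": ["g", "h"], "g": ["a", "b"], "h": ["c", "d"], "i": ["e", "f"]}
w_phi = dict(E=30, j=20, i=4, g=5, h=5, a=1, b=1, c=1, d=1, e=1, f=1)
w_d = dict(E=4, j=3, i=2, g=2, h=2, a=1, b=2, c=1, d=2, e=1, f=2)

def all_cuts(node):
    """Every cut of the sub-hierarchy rooted at node, each as a list of classes."""
    cuts = [[node]]
    kids = children.get(node, [])
    if kids:
        # A cut below node is one cut per child, concatenated.
        for combo in product(*(all_cuts(k) for k in kids)):
            cuts.append([c for part in combo for c in part])
    return cuts

def energy(cut, w):
    return sum(w[s] for s in cut)

C = 7.5
feasible = [cut for cut in all_cuts("E") if energy(cut, w_d) <= C]
best = min(energy(cut, w_phi) for cut in feasible)
minimisers = [cut for cut in feasible if energy(cut, w_phi) == best]
lambda_cut = ["g", "h", "i"]                    # feasible lambda-cut, with omega_d = 6
print(best, minimisers, energy(lambda_cut, w_phi))
```

On this example two different cuts reach the constrained minimum of ωϕ, so the optimum is not unique, and the feasible λ-cut (g, h, i) has a strictly larger ωϕ, so it only bounds the constrained minimum from above.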
Everett’s Theorem
Remark
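The family of cuts is an abstract set, and the energies ωϕ and ω∂ are neither differentiable, convex, nor smooth.
Given the multiplier λ ∈ R, consider
\[ \min_{\pi \in \Pi(E,H)} \Big\{ \sum_{S \in \pi} \omega_\varphi(S) + \lambda \sum_{S \in \pi} \omega_\partial(S) \Big\} \]
The solution π(λ) to this unconstrained minimization is also an optimal solution to the perturbed primal problem:
\[ \min_{\pi \in \Pi(E,H)} \sum_{S \in \pi} \omega_\varphi(S) \quad \text{subject to} \quad \sum_{S \in \pi} \omega_\partial(S) \le \sum_{S \in \pi(\lambda)} \omega_\partial(S) \]
This solution solves the constrained problem, where the constraint is λ-dependent.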
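The argument behind this statement takes two lines and needs no convexity or differentiability. The sketch below assumes λ ≥ 0 (the slide only states λ ∈ R) and uses the shorthand ωϕ(π) and ω∂(π) for the sums over the classes of π:

```latex
\begin{aligned}
&\text{For any cut } \pi \text{ with } \omega_\partial(\pi) \le \omega_\partial(\pi(\lambda)): \\
&\omega_\varphi(\pi(\lambda)) + \lambda\,\omega_\partial(\pi(\lambda))
   \le \omega_\varphi(\pi) + \lambda\,\omega_\partial(\pi)
   \qquad \text{(optimality of } \pi(\lambda) \text{ for the unconstrained problem)} \\
&\Rightarrow\quad \omega_\varphi(\pi(\lambda))
   \le \omega_\varphi(\pi) + \lambda\,\big(\omega_\partial(\pi) - \omega_\partial(\pi(\lambda))\big)
   \le \omega_\varphi(\pi).
\end{aligned}
```

The last inequality holds because λ ≥ 0 and the bracket is non-positive for every feasible π.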
Optimal λ
The two problems will be solved jointly by introducing
\[ \lambda^* = \inf\{ \lambda \mid \omega_\partial(\pi^*(\lambda)) \le 0 \}. \]
The constraint function ω∂ being h-increasing, we have
\[ 0 \le \lambda^* \le \lambda \ \Rightarrow\ \pi^*(\lambda^*) \preceq_\partial \pi^*(\lambda) \ \Rightarrow\ 0 \ge \omega_\partial(\pi^*(\lambda^*)) \ge \omega_\partial(\pi^*(\lambda)). \]
The domain of the feasible λ is therefore λ ≥ λ∗.
We can now set the minimization problem more precisely. Three conditions are needed:
1 Primal constraint qualification: the set Π′ is not empty,
2 Dual constraint qualification: λ∗ exists and is ≥ 0,
3 Multiplier based constraint: ω∂(π∗(λ∗)) = 0.
Conclusion
Figure: A brief overview on constrained optimization of Hierarchies.
Tree-structured constraints are predominantly used in the fields of coding theory, machine learning, image compression and segmentation.
Rate-distortion minimization, cost-complexity and minimum description length are various types of constrained optimization problems which have tree-structured counterparts.
Due to the discrete nature of the functions, we use Lagrangian multipliers and perturbation methods to reach a minimum.
Dual parameter searches can at best provide an upper bound on the minimum.
Uniqueness in most cases is ensured by singularity.
References
P. Salembier and L. Garrido. Binary partition tree as an efficient representation for image processing, segmentation and information retrieval. IEEE Transactions on Image Processing, 9:561–576, 2000.
Laurent Guigues, Jean Pierre Cocquerez, and Herve Le Men. Scale-sets image analysis. International Journal of Computer Vision, 68(3):289–317, 2006.
Coloma Ballester, Vicent Caselles, Laura Igual, and Luis Garrido. Level lines selection with variational models for segmentation and encoding. JMIV, 27(1):5–27, 2007.
Y. Shoham and A. Gersho. Efficient bit allocation for an arbitrary set of quantizers [speech coding]. IEEE Transactions on Acoustics, Speech and Signal Processing, 36(9):1445–1453, 1988.
Hugh Everett. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Operations Research, 11(3):399–417, 1963.
P. A. Chou, T. Lookabaugh, and R. M. Gray. Optimal pruning with applications to tree-structured source coding and modeling. IEEE Transactions on Information Theory, 35(2):299–315, Mar 1989. ISSN 0018-9448.
Yongchao Xu, T. Geraud, and L. Najman. Context-based energy estimator: Application to object segmentation on the tree of shapes. ICIP 2012.
B. Ravi Kiran. Energetic-Lattice Based optimization. PhD thesis, to be defended 31 Oct, ESIEE Paris.