
Constrained optimization on Hierarchies

B Ravi Kiran

Part III, ICIP 2014 Tutorial T9

Optimizations on Hierarchies of Partitions

B Ravi Kiran : Constrained Optimization on HOP 1/31

Outline of the lecture

1 Review on Constrained Optimization
    Optimally Pruned Decision Trees
    Decision Trees in Information Theory

2 Constrained optimization on Hierarchies of Partitions
    Moving to hierarchies
    Optimal Cuts on Hierarchies of Partitions
    Lagrangian Formulation
    λ-cuts are Upper Bounds

3 Conclusion

4 References


Review on Constrained Optimization

Decision Trees and Optimal Pruning

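[Figure: feature space with axes X1 (horizontal) and X2 (vertical) split into regions R1, R2, R3, R4 by thresholds t1, t2, t3, shown next to the corresponding binary tree with node tests t1 ≤ 2, t2 ≤ 1 (leaves R2, R1) and t3 ≤ 3 (leaves R3, R4).]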

A 2D feature space is recursively partitioned, producing a binary tree.

Grow the tree until each class contains a minimal number of points.

How do we find a good classifier or regressor for f(X1, X2)?



Cost complexity Pruning [Breiman 1984]

Given a grown tree T, we can write a cost:

$$R_\lambda(T) = \sum_{m=1}^{|\tilde{T}|} \sum_{x_i \in R_m} (y_i - \mu_{R_m})^2 + \lambda\,|\tilde{T}|.$$

$\mu_{R_m}$: mean value of the observed variable y in region $R_m$.

λ: parameter that governs the trade-off between tree size and fidelity to data.

$\tilde{T}$: terminal nodes of tree T.

The error term for classification uses impurity measures such as the Gini coefficient.

Constrained optimization problem: trade-off between classifier complexity and error.
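As a minimal sketch of the cost above, assume each terminal node is given simply as the list of y-values falling in its region. The function name `cost_complexity` and the toy data are illustrative and do not come from the tutorial.

```python
def cost_complexity(regions, lam):
    """Squared-error fit summed over terminal regions, plus lam per terminal node.

    `regions` is an assumed representation: one list of observed y-values
    per terminal node of the tree.
    """
    fit = 0.0
    for ys in regions:
        mu = sum(ys) / len(ys)                  # mean of y in this region
        fit += sum((y - mu) ** 2 for y in ys)   # within-region squared error
    return fit + lam * len(regions)


# Toy data: a two-leaf tree versus the same points merged into one leaf.
two_leaves = [[1.0, 3.0], [2.0, 2.0, 2.0]]
one_leaf = [[1.0, 3.0, 2.0, 2.0, 2.0]]
print(cost_complexity(two_leaves, 0.5))  # 3.0
print(cost_complexity(one_leaf, 0.5))    # 2.5
```

In this toy case the split does not reduce the squared error, so for any λ > 0 the single leaf is the cheaper tree.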



Cost complexity Pruning [Breiman 1984]

Figure: Variation of Training Error vs Tree Cost.



Pruning Example

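[Figure: the same tree shown three times with per-node costs, read level by level.]

λ = 0 (left):     root 10;   internal nodes 3, 5;     leaves 1, 1, 1, 2, 1
λ = 0.5 (center): root 10.5; internal nodes 3.5, 5.5; leaves 1.5, 1.5, 1.5, 2.5, 1.5
λ = 1 (right):    root 11;   internal nodes 4, 6;     leaves 2, 2, 2, 3, 2

Figure: Pruning example demonstrating cost-complexity pruning. The cost function is given for each node of the tree. For λ = 0 (left), 0.5 (center), 1 (right), the pruned optimal subtrees are shown. As λ increases one gets shorter trees. The ideal value of λ is chosen by cross-validation, by re-running the fit over k folds of the original data.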

The value of λ at which a pruned parent node t is kept w.r.t. its children is:

$$\lambda(T) = \frac{\mathrm{Error}(T, t) - \mathrm{Error}(T)}{|\tilde{T}| - 1} \qquad (1)$$
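A short numerical check of this formula, reading Error(T, t) as the error of the node once collapsed into a single leaf and Error(T) as the summed error of the leaves below it. That reading, and the helper name `critical_lambda`, are assumptions made for the sketch. The numbers are the root cost 10 and the five leaf costs 1, 1, 1, 2, 1 from the λ = 0 tree of the pruning example.

```python
def critical_lambda(node_error, leaf_errors):
    """Lambda at which collapsing a subtree costs the same as keeping its leaves."""
    n_leaves = len(leaf_errors)
    if n_leaves < 2:
        raise ValueError("need at least two leaves below the node")
    return (node_error - sum(leaf_errors)) / (n_leaves - 1)


root_error = 10.0
leaf_errors = [1.0, 1.0, 1.0, 2.0, 1.0]
lam = critical_lambda(root_error, leaf_errors)
print(lam)  # 1.0

# At that lambda the two penalised costs coincide (one leaf vs five leaves).
print(root_error + lam * 1, sum(leaf_errors) + lam * len(leaf_errors))  # 11.0 11.0
```

This agrees with the right-hand tree of the example, where at λ = 1 the root costs 11 and the five leaves sum to 2 + 2 + 2 + 3 + 2 = 11.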



Decision Trees in Information Theory

CART trees in Rate Distortion Minimization Framework:

$$D(R) = \inf_{P_{Y|X}} \{\, E[\rho(X, Y)] \mid I(X; Y) \le R \,\}$$

[Chou, Lookabaugh & Gray 1988] Tree structured source coding and modeling

CART trees have since had diverse applications in information theory and classifier design:

[Ramchandran & Vetterli 1988] Best Wavelet Packet Bases in a Rate-Distortion Sense

[Donoho 1997] CART and Best-ortho-basis: A connection

[Wakin, Romberg, Choi, & Baraniuk 2002] Rate-distortion optimized image compression using wedgelets

[Chiang & Boyd 2004] Geometric Programming Duals of Channel Capacity and Rate Distortion

[Shukla & Vetterli 2005] Tree structured Compression for Piecewise Polynomial Images


Constrained optimization on Hierarchies of Partitions

Moving to Hierarchies

CART-motivated applications in hierarchies of segmentations:

[Salembier-Garrido 2000] Binary partition tree as an efficient representation for image processing, segmentation and information retrieval.

[Guigues 2003] Scale-Sets.

[Ballester, Caselles, Igual, Garrido 2006] Level Lines Selection with Variational Models for Segmentation and Encoding.

[Calderero-Marques 2010] Region merging techniques using information theory statistical measures.

Wide new domain of hierarchical processing:

[Sylvia Valero 2011] Hyper-spectral data representation using Binary Partition trees.

[Camille Kurtz 2012] Extraction of complex patterns from multiresolution remote sensing images.

[Xu et al. 2012] Morphological Filtering in Shape Spaces.



Binary Partition Trees [Salembier-Garrido 2000]

Problem

Given a Max-tree representation of a gray-scale image:

Calculate the partition with the least distortion under a rate constraint.

Calculate the optimal trade-off parameter λ which achieves a given bandwidth (rate) constraint.



Binary Partition Trees [Salembier-Garrido 2000]

Algorithm

Inputs: Distortion D; Rate C; Budget Rate: C0; Lagrange parameter λ
λl = 0;   \\ Compute D and C for a very low λ
* BottomUpAnalysis(Input: λl, output: C, D)
if C < C0 then { no solution; exit; }
Cl = C; Dl = D;
λh = 10^20;   \\ Compute D and C for a very high λ
* BottomUpAnalysis(Input: λh, output: C, D)
if C > C0 then { no solution; exit; }
Ch = C; Dh = D;
do {   \\ Find the optimum λ value
    λ = (Dl − Dh) / (Ch − Cl);
    * BottomUpAnalysis(Input: λ, output: C, D)
    if C < C0 then { Ch = C; Dh = D; }
    else { Cl = C; Dl = D; }
} until (C ≈ C0)
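The search loop can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: `bottom_up_analysis` is passed in as a callback returning `(C, D)` for a given λ, the stand-in `toy_analysis` and its table `POINTS` are hypothetical, and an iteration cap plus a stall test are added because on a discrete set of operating points the "until C ≈ C0" condition may never be met.

```python
def search_lambda(bottom_up_analysis, budget, lam_high=1e20, tol=1e-9, max_iter=50):
    """Return (lam, C, D) for the last feasible operating point found, or None."""
    c_l, d_l = bottom_up_analysis(0.0)        # very low lambda: highest rate
    if c_l < budget:
        return None                           # "no solution" test of the slide
    c_h, d_h = bottom_up_analysis(lam_high)   # very high lambda: lowest rate
    if c_h > budget:
        return None
    lam = None
    for _ in range(max_iter):
        if c_h == c_l:
            break
        new_lam = (d_l - d_h) / (c_h - c_l)   # slope between the bracketing points
        if new_lam == lam:
            break                             # bracket stopped moving: C0 not attainable
        lam = new_lam
        c, d = bottom_up_analysis(lam)
        if abs(c - budget) <= tol:
            return lam, c, d
        if c < budget:
            c_h, d_h = c, d
        else:
            c_l, d_l = c, d
    return lam, c_h, d_h


POINTS = [(9, 6), (8, 8), (6, 14), (4, 30)]   # hypothetical (rate C, distortion D) pairs


def toy_analysis(lam):
    """Stand-in for BottomUpAnalysis: pick the pair minimising D + lam * C."""
    return min(POINTS, key=lambda p: p[1] + lam * p[0])


print(search_lambda(toy_analysis, budget=6))    # (4.8, 6, 14)
print(search_lambda(toy_analysis, budget=7.5))  # (3.0, 6, 14)
```

With a budget of 7.5 no operating point has that rate, so the sketch stops once the bracket no longer moves and returns the feasible side of it.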



Dynamic Program

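$$\omega^*(\pi(S)) = \min\Big\{\omega(\{S\}),\ \sum_{a \in \pi(S)} \omega(a)\Big\}$$

[Figure: node {S} with children a, b, c; π(S) = a ⊔ b ⊔ c]

$$\pi^*(S) = \begin{cases} \{S\}, & \text{if } \omega(S) \le \sum_{a \in \pi(S)} \omega(a) \\ \pi(S), & \text{otherwise} \end{cases}$$

Here ω(S) = ωϕ(S) + λ · ω∂(S); we will see why. For Salembier-Garrido, D ← ωϕ and C ← ω∂.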
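A minimal recursive sketch of this dynamic program, assuming the hierarchy is stored as a dictionary from each class to its list of children and the energy as a dictionary from class to value (both representations, and the name `optimal_cut`, are choices made for the example). Applied bottom-up, the value compared against ω(S) is the sum of the children's already optimised values, which reduces to the formula above when the children are leaves.

```python
def optimal_cut(node, children, energy):
    """Return (value, cut) minimising the summed energy over cuts below `node`."""
    own = energy[node]
    kids = children.get(node, [])
    if not kids:                      # a leaf can only be represented by itself
        return own, [node]
    below_value, below_cut = 0.0, []
    for kid in kids:                  # solve each child sub-hierarchy first
        value, cut = optimal_cut(kid, children, energy)
        below_value += value
        below_cut += cut
    if own <= below_value:            # ties keep the parent class {S}
        return own, [node]
    return below_value, below_cut


children = {"S": ["a", "b", "c"]}     # toy hierarchy: S with three leaf children
print(optimal_cut("S", children, {"S": 5.0, "a": 1.0, "b": 2.0, "c": 3.0}))
# (5.0, ['S'])
print(optimal_cut("S", children, {"S": 7.0, "a": 1.0, "b": 2.0, "c": 3.0}))
# (6.0, ['a', 'b', 'c'])
```

Each class is visited once, so the cost is linear in the number of classes of the hierarchy.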



Scale-Sets [Guigues 2003]

Extraction of a set of optimal cuts from a hierarchy characterized by

Energy functional/model: Mumford-Shah

A scale parameter λ



Scale-Sets [Guigues 2003]

Energy formulation in Guigues case:

ω(S , λ) = ωϕ(S) + λω∂(S)

Remark

Start from a hierarchy H and calculate the scale function

$$\lambda(S) = -\frac{\Delta\omega_\varphi}{\Delta\omega_\partial}$$

for the classes S in H.

Calculate the indexed hierarchy (H, λ+) consisting of minimal cuts for increasing λ's:

$$\{\Pi(\lambda, H)\}_{\lambda \in \mathbb{R}^+} \to (H, \lambda^+)$$

Furthermore, minimization of an energy on Π(H, E) is NP-hard.

Instead, choose the minimal cuts corresponding to the scale parameter λ.



Guigues' Problems

We have a constrained optimization problem on hierarchies.

Problem

Conditions on the objective function ωϕ and the constraint ω∂ to obtain a set of optimal cuts monotonically ordered with λ, and thus an indexed optimal hierarchy.

Conditions on the energies ωϕ, ω∂ which ensure a unique optimum for a given λ?



Problem Formulation

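Given energies ωϕ, ω∂ : D(E) → R

$$\operatorname*{minimize}_{\pi \in \Pi(E,H)} \sum_{S \in \pi} \omega_\varphi(S) \quad \text{subject to} \quad \sum_{S \in \pi} \omega_\partial(S) \le C$$

$$\operatorname*{minimize}_{\pi \in \Pi(E,H)} \sum_{S \in \pi} \omega_\partial(S) \quad \text{subject to} \quad \sum_{S \in \pi} \omega_\varphi(S) \le K$$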



Level Line selection [Caselles et al 2006]

Hierarchy: Tree of Shapes, which is an inclusion tree built from the upper and lower level sets of a scalar function

Rate-Distortion framework for compression of images.

Distortion: |f(x) − µ(S)|², quadratic error

Rate: ∂S, contour length



Dynamic Program

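$$\omega^*(\pi(S)) = \min\Big\{\omega(\{S\}),\ \sum_{a \in \pi(S)} \omega(a)\Big\}$$

[Figure: node {S} with children a, b, c; π(S) = a ⊔ b ⊔ c]

$$\pi^*(S) = \begin{cases} \{S\}, & \text{if } \omega(S) \le \sum_{a \in \pi(S)} \omega(a) \\ \pi(S), & \text{otherwise} \end{cases}$$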



Primal and Dual problems

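Lagrangian primal problem:

$$\operatorname*{minimize}_{\pi \in \Pi(E,B)} \ \omega_\varphi(\pi) \quad \text{subject to} \quad \omega_\partial(\pi) \le C.$$

Written out over the classes of the cut, this reads:

$$\operatorname*{minimize}_{\pi \in \Pi(E,B)} \sum_{S \in \pi} \omega_\varphi(S) \quad \text{subject to} \quad \sum_{S \in \pi} \omega_\partial(S) \le C.$$

Now the domain of the feasible cuts is the subset Π′ of Π:

$$\Pi' = \{\pi \in \Pi : \omega_\partial(\pi) \le C\}$$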



Lagrangian Multipliers

Remark

For the constrained optimization problem, [Salembier, Guigues et al.] use the Lagrangian multiplier method to formulate an unconstrained optimization problem.

As we know from optimization theory, the Lagrangian is given by:

ω(π, λ) = ωϕ(π) + λ · ω∂(π)

Minimal cuts are the family of cuts with the least ω(π, λ) for a given λ.



Unconstrained minimization of Lagrangian

Remark

Guigues assumes a sub-additive constraint ω∂ and a super-additive objective ωϕ to extract λ-ordered cuts from the input hierarchy.

Salembier et al. propose a gradient-search-based method to find the λ which achieves the constraint rate C approximately, that is, ω∂(π(λ)) ≈ C.

Breiman, Salembier, Guigues and many others ensure uniqueness by choosing the smallest cut that satisfies C. This is basically the condition of uniqueness.



Demonstration

[Figure: four trees over the same hierarchy, read level by level from the root.]

Level     Dendrogram     ωϕ tree        ω∂ tree        λ-tree
root      E              30             4              6
level 2   j i            20 4           3 2            10 2
level 3   g h i          5 5 4          2 2 2          3 3 2
leaves    a b c d e f    1 1 1 1 1 1    1 2 1 2 1 2    - - - - - -

Cut labels appearing in the figure: π, π′, π1, π2, π3.

Figure: Bottom left: hierarchy H. Top row: the two energies (ωϕ, ω∂) for the corresponding classes. Bottom right: λ values obtained by equating parent and child energies, whose level sets give the minimal cuts w.r.t. ωλ. Scale-sets or λ-cuts are shown for λ = 2, 3, 4 as π2, π3, π4.
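The λ-tree values can be re-derived from the two energy trees. The sketch below assumes the parent-child structure E = j ∪ i, j = g ∪ h, g = a ∪ b, h = c ∪ d, i = e ∪ f, which is read off the level-by-level layout of the dendrogram rather than stated explicitly. The dictionary names are illustrative.

```python
# Assumed structure of the demonstration hierarchy (parent -> children).
children = {"E": ["j", "i"], "j": ["g", "h"], "g": ["a", "b"],
            "h": ["c", "d"], "i": ["e", "f"]}

# Energies per class, copied from the two energy trees of the figure.
w_phi = {"E": 30, "j": 20, "i": 4, "g": 5, "h": 5,
         "a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "f": 1}
w_del = {"E": 4, "j": 3, "i": 2, "g": 2, "h": 2,
         "a": 1, "b": 2, "c": 1, "d": 2, "e": 1, "f": 2}


def scale(node):
    """lambda(S) = -(delta w_phi) / (delta w_del), with delta = parent minus children."""
    kids = children[node]
    d_phi = w_phi[node] - sum(w_phi[k] for k in kids)
    d_del = w_del[node] - sum(w_del[k] for k in kids)
    return -d_phi / d_del


lambdas = {node: scale(node) for node in children}
print(lambdas)  # {'E': 6.0, 'j': 10.0, 'g': 3.0, 'h': 3.0, 'i': 2.0}
```

Under that assumed structure the output reproduces the λ-tree of the figure (6; 10, 2; 3, 3, 2). Note that λ(j) = 10 exceeds λ(E) = 6, so the raw values are not monotone along the branch from j to the root.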



λ-cuts are Upper Bounds

[Plot: ωϕ(π∗λ) and ω∂(π∗λ) against λ, for λ from 0 to 5; marked values 6, 8, 9 and 14, with the constraint level C = 7.5.]

Figure: For 2 < λ < 3 the minimal cut is (a, b, c, d, k) and ω∂ = 8; for λ ≥ 3 the minimal cut is (g, h, k) and ω∂ = 6, i.e. ω∂ is never equal to the cost C = 7.5 at any time.
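The gap between λ-cuts and the constrained optimum can be checked by brute force on the demonstration hierarchy. This sketch assumes the structure E = j ∪ i, j = g ∪ h, g = a ∪ b, h = c ∪ d, i = e ∪ f, and assumes that the class written k in the caption is the class written i in the dendrogram. The function names are illustrative.

```python
from itertools import product

children = {"E": ["j", "i"], "j": ["g", "h"], "g": ["a", "b"],
            "h": ["c", "d"], "i": ["e", "f"]}          # assumed structure
w_phi = {"E": 30, "j": 20, "i": 4, "g": 5, "h": 5,
         "a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "f": 1}
w_del = {"E": 4, "j": 3, "i": 2, "g": 2, "h": 2,
         "a": 1, "b": 2, "c": 1, "d": 2, "e": 1, "f": 2}


def cuts(node):
    """All cuts of the sub-hierarchy rooted at `node`, as tuples of class names."""
    result = [(node,)]
    kids = children.get(node)
    if kids:
        for combo in product(*(cuts(k) for k in kids)):
            result.append(tuple(name for part in combo for name in part))
    return result


def energy(cut, w):
    return sum(w[s] for s in cut)


def lambda_cut(node, lam):
    """Bottom-up minimiser of w_phi + lam * w_del; ties keep the parent class."""
    own = w_phi[node] + lam * w_del[node]
    kids = children.get(node)
    if not kids:
        return own, (node,)
    parts = [lambda_cut(k, lam) for k in kids]
    below = sum(value for value, _ in parts)
    if own <= below:
        return own, (node,)
    return below, tuple(name for _, cut in parts for name in cut)


C = 7.5
feasible = [c for c in cuts("E") if energy(c, w_del) <= C]
best_phi = min(energy(c, w_phi) for c in feasible)
constrained = sorted(c for c in feasible if energy(c, w_phi) == best_phi)

print(lambda_cut("E", 2.5)[1])   # ('a', 'b', 'c', 'd', 'i')
print(lambda_cut("E", 3.5)[1])   # ('g', 'h', 'i')
print(best_phi, constrained)     # 11 [('a', 'b', 'h', 'i'), ('g', 'c', 'd', 'i')]
```

Under these assumptions the feasible λ-cut (g, h, i) has ωϕ = 14, while two different cuts with ω∂ = 7 ≤ 7.5 reach ωϕ = 11. The λ-cut therefore only bounds the constrained minimum from above, and the constrained minimiser is not unique.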



λ-cuts are Upper Bounds

Remark

Lack of Cost → Multiplier mapping: for a given cost ω∂ ≤ C, one is not assured a corresponding multiplier λ.

Uniqueness is lost, even when ωϕ is strictly h-increasing.

π∗(λ∗) is only an upper bound of the constrained minimal cuts.

The error |ω∂(π∗(λ∗)) − C| gives no information about the error |ωϕ(π∗(λ∗)) − ωϕ(π)|, where π is a constrained minimal cut.

On the ω∂-tree the structure of the solution space forms a lattice.



Everett’s Theorem

Remark

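The family of cuts is an abstract set, with energies ωϕ and ω∂ that are neither differentiable, convex, nor smooth.

Given the multiplier λ ∈ R,

$$\min_{\pi \in \Pi(E,H)} \Big\{ \sum_{S \in \pi} \omega_\varphi(S) + \lambda \sum_{S \in \pi} \omega_\partial(S) \Big\}$$

The solution π(λ) to this unconstrained minimization is also an optimal solution to the perturbed primal problem:

$$\min_{\pi \in \Pi(E,H)} \sum_{S \in \pi} \omega_\varphi(S) \quad \text{subject to} \quad \sum_{S \in \pi} \omega_\partial(S) \le \sum_{S \in \pi(\lambda)} \omega_\partial(S)$$

This solution solves the constrained problem, where the constraint is λ-dependent.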
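The argument behind this statement is short enough to write out. The derivation below is the standard one for an inequality constraint and assumes λ ≥ 0. It uses the shorthand ωϕ(π) and ω∂(π) for the sums over the classes of π.

```latex
% Assumes \lambda \ge 0 and that \pi(\lambda) minimises the Lagrangian over \Pi(E,H).
\begin{align*}
\omega_\varphi(\pi(\lambda)) + \lambda\,\omega_\partial(\pi(\lambda))
  &\le \omega_\varphi(\pi) + \lambda\,\omega_\partial(\pi)
  && \text{for every } \pi \in \Pi(E,H), \\
\omega_\varphi(\pi(\lambda))
  &\le \omega_\varphi(\pi)
     + \lambda\,\bigl(\omega_\partial(\pi) - \omega_\partial(\pi(\lambda))\bigr) \\
  &\le \omega_\varphi(\pi)
  && \text{whenever } \omega_\partial(\pi) \le \omega_\partial(\pi(\lambda)).
\end{align*}
```

No differentiability or convexity is used, which is why the result applies to an abstract family of cuts.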



Optimal λ

The two problems will be solved jointly by introducing

λ∗ = inf{λ | ω∂(π∗(λ)) ≤ 0}.

The constraint function ω∂ being h-increasing, and

0 ≤ λ∗ ≤ λ ⇒ π∗(λ∗) �∂ π∗(λ) ⇒ 0 ≥ ω∂(π∗(λ∗)) ≥ ω∂(π∗(λ)).

The domain of the feasible λ is therefore λ ≥ λ∗.

We can now set the minimization problem more precisely. Three conditions are needed:

1 Primal constraint qualification: the set Π′ is not empty,
2 Dual constraint qualification: λ∗ exists and is ≥ 0,
3 Multiplier based constraint: ω∂(π∗(λ∗)) = 0.


Conclusion


Figure: A brief overview on constrained optimization of Hierarchies.



Tree-structured constraints are predominantly used in the fields of coding theory, machine learning, image compression and segmentation.

Rate-distortion minimization, cost-complexity and minimum description length are various types of constrained optimization problems which have tree-structured counterparts.

Due to the discrete nature of the functions, we use Lagrangian multipliers and perturbation methods to reach a minimum.

Dual parameter searches can at best provide an upper bound on the minimum.

Uniqueness in most cases is ensured by singularity.


References


P. Salembier and L. Garrido. Binary partition tree as an efficient representation for image processing, segmentation and information retrieval. ITIP, 9:561–576, 2000.

Laurent Guigues, Jean Pierre Cocquerez, and Herve Le Men. Scale-sets image analysis. International Journal of Computer Vision, 68(3):289–317, 2006.

Coloma Ballester, Vicent Caselles, Laura Igual, and Luis Garrido. Level lines selection with variational models for segmentation and encoding. JMIV, 27(1):5–27, 2007.

Y. Shoham and A. Gersho. Efficient bit allocation for an arbitrary set of quantizers [speech coding]. Acoustics, Speech and Signal Processing, IEEE Transactions on, 36(9):1445–1453.



Hugh Everett. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Operations Research, 11(3):399–417, 1963.

P. A. Chou, T. Lookabaugh, and R. M. Gray. Optimal pruning with applications to tree-structured source coding and modeling. Information Theory, IEEE Transactions on, 35(2):299–315, Mar 1989. ISSN 0018-9448.

Yongchao Xu, T. Geraud, and L. Najman. Context-based energy estimator: Application to object segmentation on the tree of shapes. ICIP 2012.

B Ravi Kiran. Energetic-Lattice Based optimization. PhD Thesis, to be defended 31 Oct, ESIEE Paris.