Lecture 2
C25 Optimization Hilary 2013 A. Zisserman
• Discrete optimization
• Dynamic Programming
• Applications
• Generalizations …
The Optimization Tree

(figure: taxonomy of optimization problems)
Discrete Optimization
• Also called combinatorial optimization
• In continuous optimization each component, xi, of the vector x can vary continuously
• In discrete optimization each component, or variable, xi can only take a finite number of possible states
• The advantage is that, for special cases, the global optimum can be found for cost functions which are not convex when the variables are continuous
• These special cases occur often in real applications
• And the global optimum can be found efficiently
(figure: a continuous variable vs a discrete variable taking a finite set of values)
Dynamic programming
• Applies to problems where the cost function can be:
• decomposed into a sequence (ordering) of stages, and
• each stage depends on only a fixed number of previous stages
• The cost function need not be convex (if the variables were continuous)
• The name “dynamic” is historical
• Also called the “Viterbi” algorithm
Consider a cost function of the form

f(x) = \sum_{i=1}^{n} m_i(x_i) + \sum_{i=2}^{n} \phi_i(x_{i-1}, x_i),    f(x) : \mathbb{R}^n \to \mathbb{R}

where x_i can take one of h values, e.g. for h = 5, n = 6:

f(x) = m_1(x_1) + m_2(x_2) + m_3(x_3) + m_4(x_4) + m_5(x_5) + m_6(x_6)
     + \phi(x_1, x_2) + \phi(x_2, x_3) + \phi(x_3, x_4) + \phi(x_4, x_5) + \phi(x_5, x_6)

(figure: trellis of n = 6 columns x_1, ..., x_6 with h = 5 states each; minimizing f(x) amounts to finding the shortest path through the trellis)

Complexity of minimization:
• exhaustive search O(h^n)
• dynamic programming O(nh^2)
Example 1

f(x) = \sum_{i=1}^{n} (x_i - d_i)^2 + \sum_{i=2}^{n} \lambda^2 (x_i - x_{i-1})^2

This matches the standard form

f(x) = \sum_{i=1}^{n} m_i(x_i) + \sum_{i=2}^{n} \phi(x_{i-1}, x_i)

with m_i(x_i) = (x_i - d_i)^2 measuring closeness to the measurements, and \phi(x_{i-1}, x_i) = \lambda^2 (x_i - x_{i-1})^2 measuring smoothness.

(figure: measurements d_i and the fitted values x_i plotted against i)
Motivation: complexity of stereo correspondence
Objective: compute horizontal displacement for matches between left and right images
x_i is the spatial shift of the i-th pixel → h = 40
x covers all pixels in a row → n = 256

Complexity: O(h^n) = O(40^{256}) for exhaustive search vs O(nh^2) = O(256 × 40^2) for dynamic programming
Key idea: the optimization can be broken down into n sub-optimizations

f(x) = \sum_{i=1}^{n} m_i(x_i) + \sum_{i=2}^{n} \phi(x_{i-1}, x_i)

Step 1: For each value of x_2, determine the best value of x_1

• Compute

  S_2(x_2) = \min_{x_1} \{ m_2(x_2) + m_1(x_1) + \phi(x_1, x_2) \}
           = m_2(x_2) + \min_{x_1} \{ m_1(x_1) + \phi(x_1, x_2) \}

• Record the value of x_1 for which S_2(x_2) is a minimum

Computing this minimum for all x_2 involves O(h^2) operations
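Step 1 above can be sketched in a few lines of Python; the concrete unary and pairwise costs here (measurements d_1 = 2, d_2 = 3, quadratic smoothness) are illustrative choices, not from the slides:

```python
# Sketch of Step 1: compute S2(x2) and the best x1 for every state x2.
h = 5
states = range(h)

def m1(x):  # unary cost at node 1 (illustrative: distance to d1 = 2)
    return (x - 2) ** 2

def m2(x):  # unary cost at node 2 (illustrative: d2 = 3)
    return (x - 3) ** 2

def phi(xa, xb):  # pairwise smoothness cost (illustrative: lambda = 1)
    return (xb - xa) ** 2

S2, best_x1 = {}, {}
for x2 in states:                       # h outer iterations ...
    best = min(states, key=lambda x1: m1(x1) + phi(x1, x2))  # ... each an O(h) scan
    S2[x2] = m2(x2) + m1(best) + phi(best, x2)
    best_x1[x2] = best                  # record the minimizing x1
```

The two nested O(h) scans give the O(h^2) count quoted above.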
Step 2: For each value of x_3, determine the best values of x_2 and x_1

• Compute

  S_3(x_3) = m_3(x_3) + \min_{x_2} \{ S_2(x_2) + \phi(x_2, x_3) \}

• Record the value of x_2 for which S_3(x_3) is a minimum

Again, computing this minimum for all x_3 involves O(h^2) operations

Note: S_k(x_k) encodes the lowest-cost partial sum over all nodes up to k which take the value x_k at node k, i.e.

S_k(x_k) = \min_{x_1, x_2, ..., x_{k-1}} \left( \sum_{i=1}^{k} m_i(x_i) + \sum_{i=2}^{k} \phi(x_{i-1}, x_i) \right)
Viterbi Algorithm

Complexity O(nh^2)

• Initialize
  S_1(x_1) = m_1(x_1)

• For k = 2 : n
  S_k(x_k) = m_k(x_k) + \min_{x_{k-1}} \{ S_{k-1}(x_{k-1}) + \phi(x_{k-1}, x_k) \}
  b_k(x_k) = \arg\min_{x_{k-1}} \{ S_{k-1}(x_{k-1}) + \phi(x_{k-1}, x_k) \}

• Terminate
  x_n^* = \arg\min_{x_n} S_n(x_n)

• Backtrack
  x_{i-1}^* = b_i(x_i^*)
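The whole algorithm fits in a short Python sketch. The function name and the example data d are mine; the costs in the usage example are those of Example 1 earlier in the lecture, with lambda = 1:

```python
def viterbi(n, states, m, phi):
    """Minimize sum_i m(i, x_i) + sum_{i>=2} phi(x_{i-1}, x_i) in O(n h^2).

    m(i, x): unary cost of state x at node i (i = 1..n).
    phi(xa, xb): pairwise cost between consecutive nodes.
    Returns the minimizing sequence [x_1, ..., x_n] and its cost."""
    S = {x: m(1, x) for x in states}           # initialize: S_1(x_1) = m_1(x_1)
    back = []                                  # back[k-2][x_k] = b_k(x_k)
    for k in range(2, n + 1):
        S_new, b = {}, {}
        for xk in states:
            prev = min(states, key=lambda xp: S[xp] + phi(xp, xk))
            S_new[xk] = m(k, xk) + S[prev] + phi(prev, xk)
            b[xk] = prev                       # record the minimizing x_{k-1}
        S, back = S_new, back + [b]
    xn = min(states, key=lambda x: S[x])       # terminate
    path = [xn]
    for b in reversed(back):                   # backtrack: x_{k-1} = b_k(x_k)
        path.append(b[path[-1]])
    return path[::-1], S[xn]

# Example 1 with illustrative measurements d and lambda = 1:
d = [0, 0, 5, 5, 5, 0]
path, cost = viterbi(
    n=6, states=range(6),
    m=lambda i, x: (x - d[i - 1]) ** 2,
    phi=lambda xa, xb: (xb - xa) ** 2,
)
```

For this tiny problem the result can be checked against an exhaustive O(h^n) search; only the DP loop scales to realistic n and h.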
Example 2

f(x) = \sum_{i=1}^{n} (x_i - d_i)^2 + \sum_{i=2}^{n} g_{\alpha,\lambda}(x_i - x_{i-1})

where

g_{\alpha,\lambda}(\Delta) = \min(\lambda^2 \Delta^2, \alpha)
  = \lambda^2 \Delta^2   if |\Delta| < \sqrt{\alpha} / \lambda
  = \alpha               otherwise

Note: f(x) is not convex.

(figure: measurements d_i and the fitted values x_i plotted against i)
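A direct transcription of g_{\alpha,\lambda}, as a sketch (the function name is mine):

```python
def g(delta, alpha, lam):
    """Truncated quadratic g_{alpha,lambda}(delta) = min(lam^2 delta^2, alpha).

    Quadratic for small |delta| but flat (cost alpha) across large jumps,
    so a genuine discontinuity is not over-penalised.  It is not convex,
    but dynamic programming only needs the table of values phi(x_{i-1}, x_i),
    so convexity is never required."""
    return min(lam ** 2 * delta ** 2, alpha)
```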
Note

This type of cost function often arises in MAP estimation:

x^* = \arg\max_x p(x | y)         y : measurements
    = \arg\max_x p(y | x) p(x)    by Bayes' rule

e.g. for Gaussian measurement errors and first-order smoothness,

p(y | x) p(x) \sim \prod_i e^{-(x_i - y_i)^2 / 2\sigma^2} \, e^{-\beta^2 (x_i - x_{i-1})^2}

Take the negative log to obtain a cost function of the form

f(x) = \sum_{i=1}^{n} (x_i - y_i)^2 + \sum_{i=2}^{n} \lambda^2 (x_i - x_{i-1})^2

with the first term from the likelihood and the second from the prior.
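Spelling out the negative-log step (the identification \lambda^2 = 2\sigma^2\beta^2 is my annotation, not given on the slide):

```latex
-\log \prod_i e^{-\frac{(x_i - y_i)^2}{2\sigma^2}}\, e^{-\beta^2 (x_i - x_{i-1})^2}
  = \sum_i \frac{(x_i - y_i)^2}{2\sigma^2} + \sum_i \beta^2 (x_i - x_{i-1})^2
  \;\propto\; \sum_i (x_i - y_i)^2 + \sum_i 2\sigma^2\beta^2 (x_i - x_{i-1})^2
```

so the smoothness weight is \lambda^2 = 2\sigma^2\beta^2. Multiplying by the positive constant 2\sigma^2 does not change the minimizer, which is all MAP estimation needs.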
Where can DP be applied?
Example Applications:
1. Text processing: String edit distance
2. Speech recognition: Dynamic time warping
3. Computer vision: Stereo correspondence
4. Image manipulation: Image re-targeting
5. Bioinformatics: Gene alignment
6. Hidden Markov model (HMM)
Dynamic programming can be applied when there is a linear ordering on the cost function (so that partial minimizations can be computed).
Application I: string edit distance
The edit distance of two strings, s1 and s2, is the minimum number of single character mutations required to change s1 into s2, where a mutation is one of:
1. substitute a letter ( kat → cat )  cost = 1
2. insert a letter ( ct → cat )  cost = 1
3. delete a letter ( caat → cat )  cost = 1

Example: d( opimizateon, optimization )

op imizateon
|| |||||||||
optimization
||||||||||||
cciccccccscc

‘c’ = copy (cost = 0), ‘i’ = insert, ‘s’ = substitute, so d(s1, s2) = 2
Complexity
• for two strings of length m and n, exhaustive search has complexity O(3^{m+n})
• dynamic programming reduces this to O(mn)
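The O(mn) table computation is the classic Levenshtein dynamic program; a sketch in Python (the function name is mine):

```python
def edit_distance(s1, s2):
    """Edit distance by dynamic programming: O(mn) instead of O(3^(m+n)).

    D[i][j] = minimum number of mutations turning s1[:i] into s2[:j]."""
    m, n = len(s1), len(s2)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i                              # delete i letters
    for j in range(n + 1):
        D[0][j] = j                              # insert j letters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else 1   # copy or substitute
            D[i][j] = min(D[i - 1][j] + 1,       # delete
                          D[i][j - 1] + 1,       # insert
                          D[i - 1][j - 1] + sub)
    return D[m][n]
```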
Using string edit distance for spelling correction
1. Check if word w is in the dictionary D
2. If it is not, then find the word x in D that minimizes d(w, x)
3. Suggest x as the corrected spelling for w
Note: step 2 appears to require computing the edit distance to all words in D, but this is not required at run time because edit distance is a metric, and this allows efficient search.
Application II: Dynamic Time Warp (DTW)

Objective: temporal alignment of a sample and a template speech pattern

(figure: audio waveforms of the sample and template, and their log(STFT) spectrograms, frequency (Hz) against time, computed with the short-term Fourier transform)

warp to match ‘columns’ of the log(STFT) matrix
Application II: Dynamic Time Warp (DTW)

x_i is the time shift of the i-th column

f(x) = \sum_{i=1}^{n} m_i(x_i) + \sum_{i=2}^{n} \phi(x_{i-1}, x_i)

m_i(x_i): quality of match
\phi(x_{i-1}, x_i): cost of the allowed moves (1, 0), (0, 1), (1, 1)

(figure: alignment grid with the template and sample along the two axes)
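A minimal DTW sketch in Python. Real speech alignment compares log(STFT) columns; here the sequences are plain scalars and the match cost `dist` is passed in, so all names and data are illustrative:

```python
def dtw(sample, template, dist):
    """Minimal alignment cost between two sequences using the moves
    (1, 0), (0, 1), (1, 1), computed in O(mn) by dynamic programming.

    dist(a, b): match cost between a sample element and a template
    element (for speech, e.g. a distance between log(STFT) columns)."""
    m, n = len(sample), len(template)
    INF = float("inf")
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0                       # alignment starts at the first pair
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = dist(sample[i - 1], template[j - 1]) + min(
                D[i - 1][j],            # move (1, 0): advance sample only
                D[i][j - 1],            # move (0, 1): advance template only
                D[i - 1][j - 1],        # move (1, 1): advance both
            )
    return D[m][n]
```

A repeated element in the template costs nothing extra when the corresponding sample element matches it, which is exactly the temporal warping effect.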
Application III: stereo correspondence

Objective: compute horizontal displacement for matches between left and right images

x_i is the spatial shift (disparity) of the i-th pixel

f(x) = \sum_{i=1}^{n} m_i(x_i) + \sum_{i=2}^{n} \phi(x_{i-1}, x_i)

m_i(x_i): quality of match
\phi(x_{i-1}, x_i): uniqueness, smoothness

The matching cost is computed from the normalized cross-correlation (NCC) of square image regions at offset (disparity) x:

m(x) = \alpha (1 - NCC)^2

(figure: bands from the left and right images, and the NCC score, between 0 and 1, along the scanline)

• Arrange the raster intensities on the two sides of a grid
• Crossed dashed lines represent potential correspondences
• The curve shows the DP solution for the shortest path (with cost computed from f(x))
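A per-scanline sketch of the stereo DP. For brevity the matching cost here is an absolute intensity difference rather than the \alpha(1 - NCC)^2 of the slides, the smoothness term is |x_i - x_{i-1}|, and the rows are illustrative 1-D arrays:

```python
def scanline_disparity(left, right, max_disp, lam=1.0):
    """Per-scanline stereo by DP: choose a disparity x_i per pixel i
    minimizing sum_i m_i(x_i) + sum_i lam * |x_i - x_{i-1}|.

    Simplified sketch: absolute-difference matching cost instead of
    NCC; left/right are 1-D intensity rows."""
    n = len(left)
    disps = range(max_disp + 1)

    def m(i, d):  # matching cost; penalise shifts that fall off the row
        j = i - d
        return abs(left[i] - right[j]) if 0 <= j < len(right) else 1e9

    S = {d: m(0, d) for d in disps}
    back = []
    for i in range(1, n):                          # Viterbi over disparities
        S_new, b = {}, {}
        for d in disps:
            prev = min(disps, key=lambda dp: S[dp] + lam * abs(d - dp))
            S_new[d] = m(i, d) + S[prev] + lam * abs(d - prev)
            b[d] = prev
        S, back = S_new, back + [b]
    d = min(disps, key=lambda x: S[x])
    path = [d]
    for b in reversed(back):                       # backtrack
        path.append(b[path[-1]])
    return path[::-1]
```

With h = 40 disparities and n = 256 pixels this is the O(nh^2) cost quoted in the motivation slide.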
Pentagon example

(figures: left image, right image, and the computed range map)

Real-time application: background substitution

(figures: input left and right views; results with two substituted backgrounds)
Application IV: image re-targeting

• Remove image “seams” for an imperceptible aspect-ratio change

Seam Carving for Content-Aware Image Retargeting. Avidan and Shamir, SIGGRAPH, San Diego, 2007

(figures: a seam through an example image; comparison of plain scaling against seam removal)

Finding the optimal seam s:

E(I) = |\partial I / \partial x| + |\partial I / \partial y|

s^* = \arg\min_s E(s)

(figure: image I, its energy E(I), and the optimal seam s^*)
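The seam search is a vertical-path DP over the energy map, with each row's choice constrained to the three neighbours above; a pure-Python sketch (the energy values in the tests are illustrative):

```python
def vertical_seam(energy):
    """Minimum-energy vertical seam by dynamic programming, as in
    Avidan and Shamir's seam carving.

    energy: list of rows, e.g. values of |dI/dx| + |dI/dy| per pixel.
    Returns one column index per row, top to bottom; consecutive
    indices differ by at most 1, so the seam is 8-connected."""
    rows, cols = len(energy), len(energy[0])
    S = [list(energy[0])]              # S[r][c] = cheapest seam ending at (r, c)
    for r in range(1, rows):
        S.append([
            energy[r][c] + min(S[r - 1][max(c - 1, 0):c + 2])  # 3 cells above
            for c in range(cols)
        ])
    seam = [min(range(cols), key=lambda c: S[-1][c])]
    for r in range(rows - 2, -1, -1):  # backtrack, staying within +/-1 column
        c = seam[-1]
        lo = max(c - 1, 0)
        seam.append(min(range(lo, min(c + 2, cols)), key=lambda j: S[r][j]))
    return seam[::-1]
```

Removing the returned column index from each row shrinks the image by one column while deleting only low-energy pixels.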
Dynamic Programming on graphs

• Graph (V, E)
• Vertices v_i for i = 1, ..., n
• Edges e_{ij} connect v_i to other vertices v_j

f(x) = \sum_{v_i \in V} m_i(v_i) + \sum_{e_{ij} \in E} \phi(v_i, v_j)

e.g. (figure: a triangle graph on vertices 1, 2, 3)
• 3 vertices, each can take one of h values
• 3 edges
Dynamic Programming on graphs

So far we have considered chains:

(figure: a chain 1 - 2 - 3 - 4 - 5 - 6)

Can dynamic programming be applied to these configurations?

(figures: a fully connected graph on 6 vertices; a star structure; a tree structure)