Lecture 2
C25 Optimization Hilary 2013 A. Zisserman
• Discrete optimization
• Dynamic Programming
• Applications
• Generalizations …
The Optimization Tree

(figure: taxonomy of optimization problems)
Discrete Optimization
• Also called combinatorial optimization
• In continuous optimization each component, xi, of the vector x can vary continuously
• In discrete optimization each component, or variable, xi can only take a finite number of possible states
• The advantage is that, for special cases, the global optimum can be found for cost functions which are not convex when the variables are continuous
• These special cases occur often in real applications
• And the global optimum can be found efficiently
(figure: a continuous variable vs a discrete variable taking a finite set of values)
Dynamic programming
• Applies to problems where the cost function can be:
• decomposed into a sequence (ordering) of stages, and
• each stage depends on only a fixed number of previous stages
• The cost function need not be convex (if the variables were continuous)
• The name “dynamic” is historical
• Also called the “Viterbi” algorithm
Consider a cost function of the form

f(x) = \sum_{i=1}^{n} m_i(x_i) + \sum_{i=2}^{n} \phi_i(x_{i-1}, x_i),    f(x) : \mathbb{R}^n \to \mathbb{R}

where x_i can take one of h values, e.g. for h = 5, n = 6:

f(x) = m_1(x_1) + m_2(x_2) + m_3(x_3) + m_4(x_4) + m_5(x_5) + m_6(x_6)
     + \phi(x_1, x_2) + \phi(x_2, x_3) + \phi(x_3, x_4) + \phi(x_4, x_5) + \phi(x_5, x_6)

(figure: trellis of n = 6 columns x_1, ..., x_6 with h = 5 states each; minimizing f(x) amounts to finding the shortest path through the trellis)

Complexity of minimization:
• exhaustive search O(h^n)
• dynamic programming O(nh^2)
Example 1

f(x) = \sum_{i=1}^{n} (x_i - d_i)^2 + \sum_{i=2}^{n} \lambda^2 (x_i - x_{i-1})^2

This matches the standard form

f(x) = \sum_{i=1}^{n} m_i(x_i) + \sum_{i=2}^{n} \phi(x_{i-1}, x_i)

with m_i(x_i) = (x_i - d_i)^2 measuring closeness to the measurements, and \phi(x_{i-1}, x_i) = \lambda^2 (x_i - x_{i-1})^2 measuring smoothness.

(figure: measurements d_i and the fitted values x_i plotted against i)
Motivation: complexity of stereo correspondence
Objective: compute horizontal displacement for matches between left and right images
x_i is the spatial shift of the i-th pixel → h = 40
x covers all pixels in a row → n = 256

Complexity: O(h^n) = O(40^{256}) for exhaustive search vs O(nh^2) = O(256 × 40^2) for dynamic programming
Key idea: the optimization can be broken down into n sub-optimizations

f(x) = \sum_{i=1}^{n} m_i(x_i) + \sum_{i=2}^{n} \phi(x_{i-1}, x_i)

Step 1: For each value of x_2, determine the best value of x_1

• Compute

  S_2(x_2) = \min_{x_1} \{ m_2(x_2) + m_1(x_1) + \phi(x_1, x_2) \}
           = m_2(x_2) + \min_{x_1} \{ m_1(x_1) + \phi(x_1, x_2) \}

• Record the value of x_1 for which S_2(x_2) is a minimum

Computing this minimum for all x_2 involves O(h^2) operations
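Step 1 above can be sketched in a few lines of Python; the concrete unary and pairwise costs here (measurements d_1 = 2, d_2 = 3, quadratic smoothness) are illustrative choices, not from the slides:

```python
# Sketch of Step 1: compute S2(x2) and the best x1 for every state x2.
h = 5
states = range(h)

def m1(x):  # unary cost at node 1 (illustrative: distance to d1 = 2)
    return (x - 2) ** 2

def m2(x):  # unary cost at node 2 (illustrative: d2 = 3)
    return (x - 3) ** 2

def phi(xa, xb):  # pairwise smoothness cost (illustrative: lambda = 1)
    return (xb - xa) ** 2

S2, best_x1 = {}, {}
for x2 in states:                       # h outer iterations ...
    best = min(states, key=lambda x1: m1(x1) + phi(x1, x2))  # ... each an O(h) scan
    S2[x2] = m2(x2) + m1(best) + phi(best, x2)
    best_x1[x2] = best                  # record the minimizing x1
```

The two nested O(h) scans give the O(h^2) count quoted above.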
Step 2: For each value of x_3, determine the best values of x_2 and x_1

• Compute

  S_3(x_3) = m_3(x_3) + \min_{x_2} \{ S_2(x_2) + \phi(x_2, x_3) \}

• Record the value of x_2 for which S_3(x_3) is a minimum

Again, computing this minimum for all x_3 involves O(h^2) operations

Note: S_k(x_k) encodes the lowest-cost partial sum over all nodes up to k which take the value x_k at node k, i.e.

S_k(x_k) = \min_{x_1, x_2, ..., x_{k-1}} \left( \sum_{i=1}^{k} m_i(x_i) + \sum_{i=2}^{k} \phi(x_{i-1}, x_i) \right)
Viterbi Algorithm

Complexity O(nh^2)

• Initialize
  S_1(x_1) = m_1(x_1)

• For k = 2 : n
  S_k(x_k) = m_k(x_k) + \min_{x_{k-1}} \{ S_{k-1}(x_{k-1}) + \phi(x_{k-1}, x_k) \}
  b_k(x_k) = \arg\min_{x_{k-1}} \{ S_{k-1}(x_{k-1}) + \phi(x_{k-1}, x_k) \}

• Terminate
  x_n^* = \arg\min_{x_n} S_n(x_n)

• Backtrack
  x_{i-1}^* = b_i(x_i^*)
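The whole algorithm fits in a short Python sketch. The function name and the example data d are mine; the costs in the usage example are those of Example 1 earlier in the lecture, with lambda = 1:

```python
def viterbi(n, states, m, phi):
    """Minimize sum_i m(i, x_i) + sum_{i>=2} phi(x_{i-1}, x_i) in O(n h^2).

    m(i, x): unary cost of state x at node i (i = 1..n).
    phi(xa, xb): pairwise cost between consecutive nodes.
    Returns the minimizing sequence [x_1, ..., x_n] and its cost."""
    S = {x: m(1, x) for x in states}           # initialize: S_1(x_1) = m_1(x_1)
    back = []                                  # back[k-2][x_k] = b_k(x_k)
    for k in range(2, n + 1):
        S_new, b = {}, {}
        for xk in states:
            prev = min(states, key=lambda xp: S[xp] + phi(xp, xk))
            S_new[xk] = m(k, xk) + S[prev] + phi(prev, xk)
            b[xk] = prev                       # record the minimizing x_{k-1}
        S, back = S_new, back + [b]
    xn = min(states, key=lambda x: S[x])       # terminate
    path = [xn]
    for b in reversed(back):                   # backtrack: x_{k-1} = b_k(x_k)
        path.append(b[path[-1]])
    return path[::-1], S[xn]

# Example 1 with illustrative measurements d and lambda = 1:
d = [0, 0, 5, 5, 5, 0]
path, cost = viterbi(
    n=6, states=range(6),
    m=lambda i, x: (x - d[i - 1]) ** 2,
    phi=lambda xa, xb: (xb - xa) ** 2,
)
```

For this tiny problem the result can be checked against an exhaustive O(h^n) search; only the DP loop scales to realistic n and h.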
Example 2

f(x) = \sum_{i=1}^{n} (x_i - d_i)^2 + \sum_{i=2}^{n} g_{\alpha,\lambda}(x_i - x_{i-1})

where

g_{\alpha,\lambda}(\Delta) = \min(\lambda^2 \Delta^2, \alpha)
  = \lambda^2 \Delta^2   if |\Delta| < \sqrt{\alpha} / \lambda
  = \alpha               otherwise

Note: f(x) is not convex.

(figure: measurements d_i and the fitted values x_i plotted against i)
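A direct transcription of g_{\alpha,\lambda}, as a sketch (the function name is mine):

```python
def g(delta, alpha, lam):
    """Truncated quadratic g_{alpha,lambda}(delta) = min(lam^2 delta^2, alpha).

    Quadratic for small |delta| but flat (cost alpha) across large jumps,
    so a genuine discontinuity is not over-penalised.  It is not convex,
    but dynamic programming only needs the table of values phi(x_{i-1}, x_i),
    so convexity is never required."""
    return min(lam ** 2 * delta ** 2, alpha)
```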
Note

This type of cost function often arises in MAP estimation:

x^* = \arg\max_x p(x | y)         y : measurements
    = \arg\max_x p(y | x) p(x)    by Bayes' rule

e.g. for Gaussian measurement errors and first-order smoothness,

p(y | x) p(x) \sim \prod_i e^{-(x_i - y_i)^2 / 2\sigma^2} \, e^{-\beta^2 (x_i - x_{i-1})^2}

Take the negative log to obtain a cost function of the form

f(x) = \sum_{i=1}^{n} (x_i - y_i)^2 + \sum_{i=2}^{n} \lambda^2 (x_i - x_{i-1})^2

with the first term from the likelihood and the second from the prior.
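Spelling out the negative-log step (the identification \lambda^2 = 2\sigma^2\beta^2 is my annotation, not given on the slide):

```latex
-\log \prod_i e^{-\frac{(x_i - y_i)^2}{2\sigma^2}}\, e^{-\beta^2 (x_i - x_{i-1})^2}
  = \sum_i \frac{(x_i - y_i)^2}{2\sigma^2} + \sum_i \beta^2 (x_i - x_{i-1})^2
  \;\propto\; \sum_i (x_i - y_i)^2 + \sum_i 2\sigma^2\beta^2 (x_i - x_{i-1})^2
```

so the smoothness weight is \lambda^2 = 2\sigma^2\beta^2. Multiplying by the positive constant 2\sigma^2 does not change the minimizer, which is all MAP estimation needs.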
Where can DP be applied?
Example Applications:
1. Text processing: String edit distance
2. Speech recognition: Dynamic time warping
3. Computer vision: Stereo correspondence
4. Image manipulation: Image re-targeting
5. Bioinformatics: Gene alignment
6. Hidden Markov model (HMM)
Dynamic programming can be applied when there is a linear ordering on the cost function (so that partial minimizations can be computed).
Application I: string edit distance
The edit distance of two strings, s1 and s2, is the minimum number of single character mutations required to change s1 into s2, where a mutation is one of:
1. substitute a letter ( kat → cat )  cost = 1
2. insert a letter ( ct → cat )  cost = 1
3. delete a letter ( caat → cat )  cost = 1

Example: d( opimizateon, optimization )

op imizateon
|| |||||||||
optimization
||||||||||||
cciccccccscc

‘c’ = copy (cost = 0), ‘i’ = insert, ‘s’ = substitute, so d(s1, s2) = 2
Complexity
• for two strings of length m and n, exhaustive search has complexity O(3^{m+n})
• dynamic programming reduces this to O(mn)
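The O(mn) table computation is the classic Levenshtein dynamic program; a sketch in Python (the function name is mine):

```python
def edit_distance(s1, s2):
    """Edit distance by dynamic programming: O(mn) instead of O(3^(m+n)).

    D[i][j] = minimum number of mutations turning s1[:i] into s2[:j]."""
    m, n = len(s1), len(s2)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i                              # delete i letters
    for j in range(n + 1):
        D[0][j] = j                              # insert j letters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else 1   # copy or substitute
            D[i][j] = min(D[i - 1][j] + 1,       # delete
                          D[i][j - 1] + 1,       # insert
                          D[i - 1][j - 1] + sub)
    return D[m][n]
```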
Using string edit distance for spelling correction
1. Check if word w is in the dictionary D
2. If it is not, then find the word x in D that minimizes d(w, x)
3. Suggest x as the corrected spelling for w
Note: step 2 appears to require computing the edit distance to all words in D, but this is not required at run time because edit distance is a metric, and this allows efficient search.
Application II: Dynamic Time Warp (DTW)

Objective: temporal alignment of a sample and a template speech pattern

(figure: audio waveforms of the sample and template, and their log(STFT) spectrograms, frequency (Hz) against time, computed with the short-term Fourier transform)

warp to match ‘columns’ of the log(STFT) matrix
Application II: Dynamic Time Warp (DTW)

x_i is the time shift of the i-th column

f(x) = \sum_{i=1}^{n} m_i(x_i) + \sum_{i=2}^{n} \phi(x_{i-1}, x_i)

m_i(x_i): quality of match
\phi(x_{i-1}, x_i): cost of the allowed moves (1, 0), (0, 1), (1, 1)

(figure: alignment grid with the template and sample along the two axes)
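A minimal DTW sketch in Python. Real speech alignment compares log(STFT) columns; here the sequences are plain scalars and the match cost `dist` is passed in, so all names and data are illustrative:

```python
def dtw(sample, template, dist):
    """Minimal alignment cost between two sequences using the moves
    (1, 0), (0, 1), (1, 1), computed in O(mn) by dynamic programming.

    dist(a, b): match cost between a sample element and a template
    element (for speech, e.g. a distance between log(STFT) columns)."""
    m, n = len(sample), len(template)
    INF = float("inf")
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0                       # alignment starts at the first pair
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = dist(sample[i - 1], template[j - 1]) + min(
                D[i - 1][j],            # move (1, 0): advance sample only
                D[i][j - 1],            # move (0, 1): advance template only
                D[i - 1][j - 1],        # move (1, 1): advance both
            )
    return D[m][n]
```

A repeated element in the template costs nothing extra when the corresponding sample element matches it, which is exactly the temporal warping effect.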
Application III: stereo correspondence

Objective: compute horizontal displacement for matches between left and right images

x_i is the spatial shift (disparity) of the i-th pixel

f(x) = \sum_{i=1}^{n} m_i(x_i) + \sum_{i=2}^{n} \phi(x_{i-1}, x_i)

m_i(x_i): quality of match
\phi(x_{i-1}, x_i): uniqueness, smoothness

The matching cost is computed from the normalized cross-correlation (NCC) of square image regions at offset (disparity) x:

m(x) = \alpha (1 - NCC)^2

(figure: bands from the left and right images, and the NCC score, between 0 and 1, along the scanline)

• Arrange the raster intensities on the two sides of a grid
• Crossed dashed lines represent potential correspondences
• The curve shows the DP solution for the shortest path (with cost computed from f(x))
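A per-scanline sketch of the stereo DP. For brevity the matching cost here is an absolute intensity difference rather than the \alpha(1 - NCC)^2 of the slides, the smoothness term is |x_i - x_{i-1}|, and the rows are illustrative 1-D arrays:

```python
def scanline_disparity(left, right, max_disp, lam=1.0):
    """Per-scanline stereo by DP: choose a disparity x_i per pixel i
    minimizing sum_i m_i(x_i) + sum_i lam * |x_i - x_{i-1}|.

    Simplified sketch: absolute-difference matching cost instead of
    NCC; left/right are 1-D intensity rows."""
    n = len(left)
    disps = range(max_disp + 1)

    def m(i, d):  # matching cost; penalise shifts that fall off the row
        j = i - d
        return abs(left[i] - right[j]) if 0 <= j < len(right) else 1e9

    S = {d: m(0, d) for d in disps}
    back = []
    for i in range(1, n):                          # Viterbi over disparities
        S_new, b = {}, {}
        for d in disps:
            prev = min(disps, key=lambda dp: S[dp] + lam * abs(d - dp))
            S_new[d] = m(i, d) + S[prev] + lam * abs(d - prev)
            b[d] = prev
        S, back = S_new, back + [b]
    d = min(disps, key=lambda x: S[x])
    path = [d]
    for b in reversed(back):                       # backtrack
        path.append(b[path[-1]])
    return path[::-1]
```

With h = 40 disparities and n = 256 pixels this is the O(nh^2) cost quoted in the motivation slide.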
Pentagon example

(figures: left image, right image, and the computed range map)

Real-time application: background substitution

(figures: input left and right views; results with two substituted backgrounds)
Application IV: image re-targeting

• Remove image “seams” for an imperceptible aspect-ratio change

Seam Carving for Content-Aware Image Retargeting. Avidan and Shamir, SIGGRAPH, San Diego, 2007

(figures: a seam through an example image; comparison of plain scaling against seam removal)

Finding the optimal seam s:

E(I) = |\partial I / \partial x| + |\partial I / \partial y|

s^* = \arg\min_s E(s)

(figure: image I, its energy E(I), and the optimal seam s^*)
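The seam search is a vertical-path DP over the energy map, with each row's choice constrained to the three neighbours above; a pure-Python sketch (the energy values in the tests are illustrative):

```python
def vertical_seam(energy):
    """Minimum-energy vertical seam by dynamic programming, as in
    Avidan and Shamir's seam carving.

    energy: list of rows, e.g. values of |dI/dx| + |dI/dy| per pixel.
    Returns one column index per row, top to bottom; consecutive
    indices differ by at most 1, so the seam is 8-connected."""
    rows, cols = len(energy), len(energy[0])
    S = [list(energy[0])]              # S[r][c] = cheapest seam ending at (r, c)
    for r in range(1, rows):
        S.append([
            energy[r][c] + min(S[r - 1][max(c - 1, 0):c + 2])  # 3 cells above
            for c in range(cols)
        ])
    seam = [min(range(cols), key=lambda c: S[-1][c])]
    for r in range(rows - 2, -1, -1):  # backtrack, staying within +/-1 column
        c = seam[-1]
        lo = max(c - 1, 0)
        seam.append(min(range(lo, min(c + 2, cols)), key=lambda j: S[r][j]))
    return seam[::-1]
```

Removing the returned column index from each row shrinks the image by one column while deleting only low-energy pixels.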
Dynamic Programming on graphs

• Graph (V, E)
• Vertices v_i for i = 1, ..., n
• Edges e_{ij} connect v_i to other vertices v_j

f(x) = \sum_{v_i \in V} m_i(v_i) + \sum_{e_{ij} \in E} \phi(v_i, v_j)

e.g. (figure: a triangle graph on vertices 1, 2, 3)
• 3 vertices, each can take one of h values
• 3 edges
Dynamic Programming on graphs

So far we have considered chains:

(figure: a chain 1 - 2 - 3 - 4 - 5 - 6)

Can dynamic programming be applied to these configurations?

(figures: a fully connected graph on 6 vertices; a star structure; a tree structure)