Lecture 3 - University of Oxfordaz/lectures/opt/lect3.pdf · 2013. 2. 10. · Lecture 3 C25 Optimization Hilary 2013 A. Zisserman Dynamic Programming on graphs • Terminology: chains,

Lecture 3

C25 Optimization Hilary 2013 A. Zisserman

Dynamic Programming on graphs

• Terminology: chains, stars, trees, loops

• Application

• Message passing

• Shortest path - Dijkstra's algorithm

The Optimization Tree

Consider a cost function of the form

where xi can take one of h values

e.g. h=5, n=6

x1 x2 x3 x4 x5 x6

f(x) : IRn → IR

f(x) =

find shortest

path

Complexity of minimization:

• exhaustive search O(hn)

• dynamic programming O(nh2)

f(x) =nXi=1

mi(xi) +nXi=2

φi(xi−1, xi)

m1(x1) +m2(x2) +m3(x3) +m4(x4) +m5(x5) +m6(x6)

φ(x1, x2) + φ(x2, x3) + φ(x3, x4) + φ(x4, x5) + φ(x5, x6)

trellis

RECAP

x1 x2 x3 x4 x5 x6

Key idea: the optimization can be broken down into n sub-optimizations

f(x) =nXi=1

mi(xi) +nXi=2

φ(xi−1, xi)

Step 1: For each value of x2 determine the best value of x1

• Compute

S2(x2) = minx1{m2(x2) +m1(x1) + φ(x1, x2)}

= m2(x2) +minx1{m1(x1) + φ(x1, x2)}

• Record the value of x1 for which S2(x2) is a minimum

To compute this minimum for all x2 involves O(h2) operations

RECAP

x1 x2 x3 x4 x5 x6

Step 2: For each value of x3 determine the best value of x2 and x1

• Compute

S3(x3) = m3(x3) +minx2{S2(x2) + φ(x2, x3)}

• Record the value of x2 for which S3(x3) is a minimum

Again, to compute this minimum for all x3 involves O(h2) operations

Note Sk(xk) encodes the lowest cost partial sum for all nodes up to k

which have the value xk at node k, i.e.

Sk(xk) = minx1,x2,...,xk−1

kXi=1

mi(xi) +kXi=2

φ(xi−1, xi)

RECAP

Viterbi Algorithm

Complexity O(nh2)

• Initialize S1(x1) = m1(x1)

• For k = 2 : n

Sk(xk) = mk(xk) + minxk−1

{Sk−1(xk−1) + φ(xk−1, xk)}bk(xk) = argmin

xk−1{Sk−1(xk−1) + φ(xk−1, xk)}

• Terminatex∗n = argmin

xnSn(xn)

• Backtrackxi−1 = bi(xi)

RECAP


• Graph (V,E)

• Vertices vi for i = 1, . . . , n

• Edges eij connect vi to other vertices vj

f(x) =Xvi∈V

mi(vi) +Xeij∈E

φ(vi, vj)

3

1

2

e.g.

• 3 vertices, each can take one of h values

• 3 edges


So far have considered chains

Can dynamic programming be applied to these configurations?

(i.e. to reduce the optimization complexity from O(hn) to O(nh2))

54 61 2 3

1

3

4 5

6

2

Fully connected

1

3

4 5

6

2

Star structure

1

3

4

5

6

2

Tree structure

Terminlogy

simple (or open) chain

54 61 2 3

1

3

4 5

6

2

Graphs with loops

1

2

6

3

5

4

closed chain

1

3

4

5

6

2

1

3

4 5

6

2

Fully connected

Tree structure

Star structure

1

3

4

5

6

2

Terminlogy

1

3

4

5

6

2

degree = 1

degree (or valency) of a vertex

• is the number of edges incident to the vertex

degree = 3

Terminlogy for trees

root node0

1 2

3 4 5

leaf node

depth0

1

2

depth of node

• number of edges between node and root

children of node i

• are neighbouring vertices of depth di + 1

Dynamic programming on stars and trees

1

3

4 5

6

21

3

4

5

6

2

• for a value of the central node determine the best value of each of the other nodes in turn O(nh)

• repeat for all values of the central node O(h)

• final complexity O(nh2)

• order the nodes by their depth from the root, where d is max depth

• start with nodes at depth d-1, and compute best value for all (child) nodes at depth d, O(h2)

• decrease depth and repeat, O(n)


Dynamic programming on graphs with loops

• select one node and choose its value

• for the other nodes, the graph is then equivalent to an open chain and can be optimized with O(nh2) complexity

• repeat for all values of the selected node and choose lowest overall cost from these


1

2

6

3

5

4

Summary of different graph structures

1

3

4 5

6

2

Fully connected

O(hn)

1

3

4 5

6

2

Star structure

O(nh2)

1

3

4

5

6

2

Tree structure

O(nh2)

1

2

6

3

5

4Open chain O(nh2)

54 61 2 3

Closed chain O(nh3)

Application: facial feature detection in images

• Parts V= {v1, … vn}

• Connected by springs in star configuration to nose

• Quadratic cost for spring

Model

high spring cost 1 - NCC with appearance template

f(x) =Xvi∈V

mi(vi) +Xeij∈E

φ(vi, vj)

=Xvi∈V

mi(vi) +Xj

φ(v1, vj)

Spring extension

from v1 to vj

v1

v2 v3

v4

Fitting the model to an imageFind the configuration with the lowest energy

?

Model

v1

v2

v4

v3

h = 10^6 n = 4

?

Model

v1

v2

v4

v3


h = 10^6 n = 4

?

Model

v1

v2

v4

v3


h = 10^6 n = 4

Appearance templates and springs

f(x) =Xvi∈V

mi(vi) +Xj

φ(v1, vj)

x = (x1, y1, . . . , x4, y4)>

Each (xi, yi) ranges over h (x,y) positions in the image

NCC = 1−mi

Requires pair wise terms for correct detection

Example

1−mi

Example

Application of trees – computer vision example

Detect person layout

Message Passing

Example: counting people in a queue

How to count the number of people in a queue?

Method 1:

• ask them all to shout out their names and count these

Method 2:

• person at either end sends “1” to their neighbour

• if a person receives a number from their neighbour, add one, and pass total to neighbour on other side

• when anyone receives a message from both sides, they can compute the length of the queue by adding both messages and one (for themselves)

54 61 2 3

Counting vertices on a tree

How would the algorithm have to be modified to count vertices on a tree?

1

3

4

5

6

2

From David Mackay’s book

Viterbi Algorithm

Complexity O(nh2)

• Initialize S1(x1) = m1(x1)

• For k = 2 : n

Sk(xk) = mk(xk) + minxk−1

{Sk−1(xk−1) + φ(xk−1, xk)}bk(xk) = argmin

xk−1{Sk−1(xk−1) + φ(xk−1, xk)}

• Terminatex∗n = argmin

xnSn(xn)

• Backtrackxi−1 = bi(xi)

RECAP

Message passing for Dynamic Programming

• nodes send messages to their neighbours

• messages are vectors of dimension h

• notation: is message sent from p to q at time t

• also known as Belief Propagation

mtp→q(vi), i.e. the ith component of the vector,is low if vertex p “believes” that i is a good

(low cost) solution for vertex q

Non Examinable

minxf(x) =

Xvp∈V

ψp(vp) +Xepq∈E

φ(vp, vq)

Algorithm

• Initialize mtp→q(vq) = 0

• Repeat for each vertex

mtp→q(vq) = minvp{ψp(vp) + φ(vp, vq)

+X

s∈N(p)\qmt−1s→p(vp)}

Viterbi Algorithm using Message Passing

Non Examinable

Shortest Path Algorithms

The Shortest Path Problem

Given: a weighted directed graph, with a single source s

Goal: Find the shortest path from s to t

Length of path = Σ Length of arcs

Application

Minimize numberof stops(lengths = 1)

Minimize amountof time(positive lengths)

Dijkstra’s Algorithm (1959)

Given: a directed graph with non-negative edge costs, Find: the shortest path from s to every other vertex

• pick the unvisited vertex with the lowest distance to s

• calculate the distance through it to each unvisited neighbour, and update the neighbour's distance if smaller

• mark vertex as visited once all neighbours explored

Example: Dijkstra's Shortest Path Algorithm

Find shortest path from s to t.

s

3

t

2

6

7

45

24

18

2

9

14

155

30

20

44

16

11

6

19

6

Slide: Robert Sedgewick and Kevin Wayne

Dijkstra's Shortest Path Algorithm

s

3

t

2

6

7

45

24

18

2

9

14

155

30

20

44

16

11

6

19

6

∞

∞ ∞

∞

∞

∞

∞

0

distance label

S = { }PQ = { s, 2, 3, 4, 5, 6, 7, t }

Slide: Robert Sedgewick and Kevin Wayne


s

3

t

2

6

7

45

24

18

2

9

14

155

30

20

44

16

11

6

19

6

∞

∞ ∞

∞

∞

∞

∞

0

distance label

S = { }PQ = { s, 2, 3, 4, 5, 6, 7, t }

select


s

3

t

2

6

7

45

24

18

2

9

14

155

30

20

44

16

11

6

19

6

15

9 ∞

∞

∞

14

∞

0

distance label

S = { s }PQ = { 2, 3, 4, 5, 6, 7, t }

decrease

∞X

∞

∞X

X

compute distance through vertex to each unvisited neighbour, and update with min if smaller


s

3

t

2

6

7

45

24

18

2

9

14

155

30

20

44

16

11

6

19

6

15

9 ∞

∞

∞

14

∞

0

distance label

S = { s }PQ = { 2, 3, 4, 5, 6, 7, t }

∞X

∞

∞X

X

Select pick the unvisited vertex with the lowest distance to s


s

3

t

2

6

7

45

24

18

2

9

14

155

30

20

44

16

11

6

19

6

15

9 ∞

∞

∞

14

∞

0

S = { s, 2 }PQ = { 3, 4, 5, 6, 7, t }

∞X

∞

∞X

X


s

3

t

2

6

7

45

24

18

2

9

14

155

30

20

44

16

11

6

19

6

15

9 ∞

∞

∞

14

∞

0

S = { s, 2 }PQ = { 3, 4, 5, 6, 7, t }

∞X

∞

∞X

X

decrease

X 33

compute distance through vertex to each unvisited neighbour, and update with min if smaller


s

3

t

2

6

7

45

24

18

2

9

14

155

30

20

44

16

11

6

19

6

15

9 ∞

∞

∞

14

∞

0

S = { s, 2 }PQ = { 3, 4, 5, 6, 7, t }

∞X

∞

∞X

X

X 33

select

pick the unvisited vertex with the lowest distance to s


s

3

t

2

6

7

45

24

18

2

9

14

155

30

20

44

16

11

6

19

6

15

9 ∞

∞

∞

14

∞

0

S = { s, 2, 6 }PQ = { 3, 4, 5, 7, t }

∞X

∞

∞X

X

X 33

44X

X32

Dijkstra's Shortest Path Algorithm - solution

s

3

t

2

6

7

45

24

18

2

9

14

155

30

20

44

16

11

6

19

6

15

9

∞

∞

14

∞

0

S = { s, 2, 3, 4, 5, 6, 7, t }PQ = { }

∞X

∞

∞X

X

44X

35X

59 XX51

X 34

X50

X45

∞X 33X32

Applications of shortest path algorithms:

• plotting routes on Google maps

• robot path planning

• urban traffic planning (e.g. through congestion/road works)

• routing of communications messages

• and many more …

Shortest path using linear programming

Given: a weighted directed graph, with a single source s

Distance from s to v: length of the shortest part from s to v

Goal: Find distance (and shortest path) to every vertex

More …

• Applications:• Snakes in computer vision: dynamic programming on a

closed or open chain

• Detecting human limb layout in images: dynamic programming on trees

• David Mackay’s book for message passing

• More on web page: • http://www.robots.ox.ac.uk/~az/lectures/opt

Documents

Lecture 3 - University of Oxfordaz/lectures/opt/lect3.pdf · 2013. 2. 10. · Lecture 3 C25 Optimization Hilary 2013 A. Zisserman Dynamic Programming on graphs • Terminology: chains,