
Andrew Fitzgibbon Optimization


Second talk by Andrew Fitzgibbon at the Microsoft Computer Vision Summer School 2011.


Page 1: Andrew Fitzgibbon Optimization

The right solution to a nearby problem? Or a nearby solution to the right problem?

OPTIMIZATION

Page 2: Andrew Fitzgibbon Optimization

HOW TO GET IDEAS

Chase the high-hanging fruit.

Try to make stuff really work.

Look for things that confuse/annoy you.

And occasionally you’ll get a good idea.

Write about it well.


Page 3: Andrew Fitzgibbon Optimization

COMPUTER VISION AS OPTIMIZATION

Computing PCA, LDA, ....

Clustering, Gaussian mixture fitting, ....

Sometimes the optimization is easy...

...sometimes it’s hard

Sometimes the hard problem is the one you must solve

Page 4: Andrew Fitzgibbon Optimization

GOAL

Given a function $f(x): \mathbb{R}^d \mapsto \mathbb{R}$,

devise strategies for finding the $x$ which minimizes $f$.

Page 5: Andrew Fitzgibbon Optimization

CLASSES OF FUNCTIONS

Given a function $f(x): \mathbb{R}^d \mapsto \mathbb{R}$,

devise strategies for finding the $x$ which minimizes $f$.

[Figure: example functions of increasing difficulty: quadratic, convex, single-extremum, multi-extremum, noisy, horrible]

Page 6: Andrew Fitzgibbon Optimization

EXAMPLE

[Figure: example data, y plotted against x; x from 0 to 1, y from 0 to 7]

Page 7: Andrew Fitzgibbon Optimization

EXAMPLE

[Figure: the same example data, y plotted against x; x from 0 to 1, y from 0 to 7]

>> print -dmeta

Page 8: Andrew Fitzgibbon Optimization

EXAMPLE

>> print -dpdf % then go to the pdf and paste

Page 9: Andrew Fitzgibbon Optimization

EXAMPLE

>> set(gcf, 'paperUnits', 'centimeters', 'paperposition', [1 1 9 6.6])
>> print -dpdf % then go to the pdf and paste

Page 10: Andrew Fitzgibbon Optimization

switch to MATLAB…

Page 11: Andrew Fitzgibbon Optimization

ALTERNATION

[Figure: alternation on two example problems, labeled Easy and Hard]

Page 12: Andrew Fitzgibbon Optimization

WHAT TO DO?

Ignore the hard ones

If you search, you’ll find plenty of easy examples

But your customers won’t…

Or update more than one coordinate at once
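To make alternation concrete, here is a minimal MATLAB sketch of coordinate-wise minimization: each pass updates one coordinate at a time with the others held fixed. The function name alternation_min, the use of fminbnd, and the fixed search bracket are illustrative assumptions, not code from the talk.

% Minimal sketch of alternation (coordinate descent); illustration only.
% f is a function handle taking a d-by-1 column vector; x0 is the start point.
function x = alternation_min(f, x0, iters)
  x = x0(:);                                 % ensure a column vector
  d = numel(x);
  for k = 1:iters
    for j = 1:d                              % update one coordinate at a time
      fj = @(t) f([x(1:j-1); t; x(j+1:d)]);  % f as a function of coordinate j only
      x(j) = fminbnd(fj, x(j) - 10, x(j) + 10);  % 1-D minimization; crude bracket
    end
  end
end

For example, >> alternation_min(@(x) (1-x(1))^2 + 100*(x(2)-x(1)^2)^2, [-1; 1], 50) runs it on the Rosenbrock function, where the curved valley makes coordinate-wise progress painfully slow.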

Page 13: Andrew Fitzgibbon Optimization

GRADIENT DESCENT

Alternation is slow because valleys may not be axis aligned

So try gradient descent?

[Figure: contour plot of a quadratic valley that is not axis-aligned, with the optimization path overlaid; both axes run from -5 to 15]

Page 14: Andrew Fitzgibbon Optimization

GRADIENT DESCENT

Alternation is slow because valleys may not be axis aligned

So try gradient descent?

Page 15: Andrew Fitzgibbon Optimization

GRADIENT DESCENT

Alternation is slow because valleys may not be axis aligned

So try gradient descent?

Note that convergence proofs are available for both of the above

But so what?
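For comparison, a minimal fixed-step gradient-descent sketch; the name graddesc_min and the constant step size are illustrative assumptions (in practice a line search chooses the step).

% Minimal sketch of gradient descent with a fixed step size; illustration only.
% grad_f is a function handle returning the d-by-1 gradient at x.
function x = graddesc_min(grad_f, x0, step, iters)
  x = x0(:);
  for k = 1:iters
    x = x - step * grad_f(x);   % move downhill along the negative gradient
  end
end

On a valley that is not axis-aligned, the steepest-descent direction points across the valley rather than along it, so the iterates zig-zag, which is the slow behaviour these slides are pointing at.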

Page 16: Andrew Fitzgibbon Optimization

AND ON A HARD PROBLEM

Page 17: Andrew Fitzgibbon Optimization

USE A BETTER ALGORITHM

(Nonlinear) conjugate gradients

Uses 1st derivatives only

Avoids “undoing” previous work

Page 18: Andrew Fitzgibbon Optimization

USE A BETTER ALGORITHM

(Nonlinear) conjugate gradients

Uses 1st derivatives only

And avoids “undoing” previous work

101 iterations on this problem
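For concreteness, a compact nonlinear conjugate-gradient sketch (Polak-Ribiere direction with a backtracking line search). This is an assumed illustration of the idea, not the implementation behind the iteration count quoted above.

% Nonlinear conjugate gradients (sketch): 1st derivatives only, and each new
% search direction is kept conjugate to the previous one, so progress made in
% earlier iterations is not undone.
function x = ncg_min(f, grad_f, x0, iters)
  x = x0(:);
  g = grad_f(x);
  d = -g;                                        % first direction: steepest descent
  for k = 1:iters
    t = 1;                                       % backtracking (Armijo) line search
    while f(x + t*d) > f(x) + 1e-4 * t * (g' * d) && t > 1e-12
      t = t / 2;
    end
    x = x + t * d;
    gn = grad_f(x);
    beta = max(0, (gn' * (gn - g)) / (g' * g));  % Polak-Ribiere+ coefficient
    d = -gn + beta * d;                          % new conjugate direction
    g = gn;
  end
end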

Page 19: Andrew Fitzgibbon Optimization

but we can do better…

Page 20: Andrew Fitzgibbon Optimization

USE SECOND DERIVATIVES…

Starting with $x$, how can I choose $\delta$ so that $f(x + \delta)$ is better than $f(x)$?

So compute $\min_\delta f(x + \delta)$

But hang on, that’s the same problem we were trying to solve?

Page 21: Andrew Fitzgibbon Optimization

USE SECOND DERIVATIVES…

Starting with $x$, how can I choose $\delta$ so that $f(x + \delta)$ is better than $f(x)$?

So compute

$\min_\delta f(x + \delta) \approx \min_\delta \; f(x) + \delta^\top g(x) + \tfrac{1}{2}\,\delta^\top H(x)\,\delta$

where $g(x) = \nabla f(x)$ and $H(x) = \nabla\nabla^\top f(x)$.

Page 22: Andrew Fitzgibbon Optimization

USE SECOND DERIVATIVES…

How does it look?

$f(x) + \delta^\top g(x) + \tfrac{1}{2}\,\delta^\top H(x)\,\delta$

where $g(x) = \nabla f(x)$ and $H(x) = \nabla\nabla^\top f(x)$.

Page 23: Andrew Fitzgibbon Optimization

USE SECOND DERIVATIVES…

Choose $\delta$ so that $f(x + \delta)$ is better than $f(x)$.

Compute

$\min_{\delta}\; f + \delta^\top g + \tfrac{1}{2}\,\delta^\top H\,\delta$

[derive]

Page 24: Andrew Fitzgibbon Optimization

USE SECOND DERIVATIVES…

Choose $\delta$ so that $f(x + \delta)$ is better than $f(x)$.

Compute

$\min_{\delta}\; f + \delta^\top g + \tfrac{1}{2}\,\delta^\top H\,\delta$

Setting the gradient of this quadratic model to zero gives $g + H\delta = 0$, so

$\delta = -H^{-1} g$
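In MATLAB the resulting Newton update is essentially a one-liner; this sketch assumes handles grad_f and hess_f returning $g(x)$ and $H(x)$ (my names, not the talk's).

% One Newton step on the local quadratic model (sketch).
g = grad_f(x);              % g(x) = gradient of f at x
H = hess_f(x);              % H(x) = Hessian of f at x
x = x - H \ g;              % delta = -H^(-1) g, via a linear solve, not inv(H)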

Page 25: Andrew Fitzgibbon Optimization

IS THAT A GOOD IDEA?

demo_taylor_2d(0, 'newton', 'rosenbrock')

demo_taylor_2d(1, 'newton', 'rosenbrock')

demo_taylor_2d(1, 'newton', 'sqrt_rosenbrock')

Page 26: Andrew Fitzgibbon Optimization

USE SECOND DERIVATIVES…

Choose $\delta$ so that $f(x + \delta)$ is better than $f(x)$.

Updates:

$\delta_{\text{Newton}} = -H^{-1} g$

$\delta_{\text{GradientDescent}} = -\lambda g$

Page 27: Andrew Fitzgibbon Optimization

USE SECOND DERIVATIVES…

Updates: $\delta_{\text{Newton}} = -H^{-1} g$, $\quad \delta_{\text{GradientDescent}} = -\lambda g$

So combine them:

$\delta_{\text{DampedNewton}} = -\left(H + \lambda^{-1} I_d\right)^{-1} g = -\lambda\left(\lambda H + I_d\right)^{-1} g$

$\lambda$ small ⇒ conservative gradient step

$\lambda$ large ⇒ Newton step
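A sketch of the damped update exactly as parameterised on the slide (small lambda gives a short, cautious gradient step; large lambda approaches the Newton step); the variable names are mine. In a full implementation lambda is adapted between iterations depending on whether the proposed step actually reduces $f$.

% Damped Newton step (sketch), following the slide's parameterisation:
% delta = -(H + (1/lambda) I)^(-1) g = -lambda (lambda H + I)^(-1) g
d = numel(x);
delta = -((H + (1/lambda) * eye(d)) \ g);   % small lambda: short gradient step
x = x + delta;                              % large lambda: Newton step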

Page 28: Andrew Fitzgibbon Optimization

1ST DERIVATIVES AGAIN

Levenberg-Marquardt

Just damped Newton with approximate 𝐻

For a special form of 𝑓

$f(x) = \sum_i f_i(x)^2$

where the residuals $f_i(x)$ are zero-mean and small at the optimum

Page 29: Andrew Fitzgibbon Optimization

BACK TO FIRST DERIVATIVES

Levenberg-Marquardt

Just damped Newton with approximate 𝐻

For a special form of 𝑓

$f(x) = \sum_i f_i(x)^2$

$\nabla f(x) = \;?\qquad\quad \nabla\nabla^\top f(x) = \;?$

Page 30: Andrew Fitzgibbon Optimization

BACK TO FIRST DERIVATIVES

Levenberg-Marquardt

Just damped Newton with approximate 𝐻

For a special form of 𝑓

$f(x) = \sum_i f_i(x)^2$

$\nabla f(x) = \sum_i 2 f_i(x)\,\nabla f_i(x)$

$\nabla\nabla^\top f(x) = \sum_i 2\left[\, f_i(x)\,\nabla\nabla^\top f_i(x) + \nabla f_i(x)\,\nabla^\top f_i(x) \,\right]$
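Because the residuals $f_i$ are small at the optimum, the term $f_i(x)\,\nabla\nabla^\top f_i(x)$ can be dropped, leaving the first-derivative-only approximation $H \approx \sum_i 2\,\nabla f_i(x)\,\nabla^\top f_i(x) = 2 J^\top J$; that is the approximate Hessian Levenberg-Marquardt uses. Below is a minimal sketch of one damped Gauss-Newton step under that approximation; residuals_and_jacobian is an assumed helper returning the stacked residuals and their Jacobian, not part of the talk.

% One Levenberg-Marquardt style step (sketch, illustration only).
% residuals_and_jacobian(x) is assumed to return r (n-by-1) with r(i) = f_i(x)
% and J (n-by-d) with J(i,:) = gradient of f_i at x, transposed.
[r, J] = residuals_and_jacobian(x);
g = 2 * (J' * r);                           % gradient of sum_i f_i(x)^2
H = 2 * (J' * J);                           % Gauss-Newton approximation to the Hessian
d = numel(x);
delta = -((H + (1/lambda) * eye(d)) \ g);   % damped step, as on the earlier slides
x = x + delta;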

Page 31: Andrew Fitzgibbon Optimization

CONCLUSION: YMMV

Page 32: Andrew Fitzgibbon Optimization

CONCLUSION: YMMV

[Plot: GIRAFFE, 500 runs]

Page 33: Andrew Fitzgibbon Optimization

CONCLUSION: YMMV

[Plots: FACE, 1000 runs; DINOSAUR, 1000 runs; GIRAFFE, 500 runs]

Page 34: Andrew Fitzgibbon Optimization

TRUTH AND B𝑢T [∃𝑢 ∈ {H, I, L, S, T, U}]

On many problems, alternation is just fine; indeed, always start with a couple of alternation steps.

Computing 2nd derivatives is a pain (anyone with compiler/parser experience, please contact me).

Just alternation is fine... unless you're willing to problem-select.

Alternation has a convergence guarantee.

Inverting the Hessian is always $O(n^3)$.

There is a universal optimizer.