
Pose Tracking II

Gordon Wetzstein, Stanford University

EE 267 Virtual Reality

Lecture 12

stanford.edu/class/ee267/

WARNING

• this class will be dense!

• will learn how to use nonlinear optimization (Levenberg-Marquardt algorithm) for pose estimation

• why?

  • more accurate than homography method

  • can dial in lens distortion estimation and estimation of intrinsic parameters (beyond this lecture, see lecture notes)

  • LM is very common in 3D computer vision → camera-based tracking

Pose Estimation - Overview

• goal: estimate pose via nonlinear least squares optimization

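• minimize reprojection error

• pose p is 6-element vector with 3 Euler angles and translation of VRduino w.r.t. base station

$$\underset{\{p\}}{\text{minimize}} \; \left\| b - f\left(g\left(p\right)\right) \right\|_2^2$$

$$b = \begin{pmatrix} x_1^n \\ y_1^n \\ \vdots \\ x_4^n \\ y_4^n \end{pmatrix}, \qquad p = \begin{pmatrix} \theta_x \\ \theta_y \\ \theta_z \\ t_x \\ t_y \\ t_z \end{pmatrix}$$

$$\begin{pmatrix} x_i \\ y_i \\ 0 \end{pmatrix} \;\xrightarrow{\;\text{image formation}\;}\; \begin{pmatrix} x_i^n \\ y_i^n \end{pmatrix}$$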

Overview

• review: gradients, Jacobian matrix, chain rule, iterative optimization

• nonlinear optimization: Gauss-Newton, Levenberg-Marquardt

• pose estimation using LM

• pose estimation with VRduino using nonlinear optimization

Review

Review: Gradients

• gradient of a function that depends on multiple variables:

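$$\frac{\partial}{\partial x} f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right), \qquad f : \mathbb{R}^n \to \mathbb{R}$$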
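A minimal sketch of this definition, assuming a hypothetical test function `example_f` and a central-difference approximation (neither is from the lecture): each partial derivative is estimated by perturbing one variable at a time.

```python
def numerical_gradient(f, x, eps=1e-6):
    """Approximate (df/dx_1, ..., df/dx_n) with central differences."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * eps))
    return grad


def example_f(x):
    # Hypothetical scalar function f(x1, x2) = x1^2 + 3*x1*x2
    return x[0] ** 2 + 3.0 * x[0] * x[1]


grad = numerical_gradient(example_f, [1.0, 2.0])
# Analytic gradient is (2*x1 + 3*x2, 3*x1), i.e. (8, 3) at the point (1, 2).
```

Finite differences like this are mainly useful for checking hand-derived gradients.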

Review: The Jacobian Matrix

• gradient of a vector-valued function that depends on multiple variables:

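$$\frac{\partial}{\partial x} f(x) = J_f = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix}, \qquad f : \mathbb{R}^n \to \mathbb{R}^m, \quad J_f \in \mathbb{R}^{m \times n}$$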
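The same idea extends to vector-valued functions. The sketch below assumes a hypothetical function `example_f` from R^2 to R^3 and fills the m x n Jacobian column by column with central differences.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Return the m x n Jacobian of f: R^n -> R^m as a list of rows."""
    m = len(f(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        for i in range(m):
            # Column j holds the derivatives of all outputs w.r.t. x_j.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return J


def example_f(x):
    # Hypothetical f: R^2 -> R^3
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]


J = numerical_jacobian(example_f, [2.0, 3.0])
# Analytic Jacobian is [[x2, x1], [1, 1], [2*x1, 0]] = [[3, 2], [1, 1], [4, 0]].
```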

Review: The Chain Rule

• here’s how you’ve probably been using it so far:

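$$\frac{\partial}{\partial x} f\left(g\left(x\right)\right) = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial x}$$

• this rule applies when $f : \mathbb{R} \to \mathbb{R}$, $g : \mathbb{R} \to \mathbb{R}$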

Review: The Chain Rule

• here’s how it is applied in general:

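$$f : \mathbb{R}^o \to \mathbb{R}^m, \quad g : \mathbb{R}^n \to \mathbb{R}^o, \quad J_f \in \mathbb{R}^{m \times o}, \quad J_g \in \mathbb{R}^{o \times n}$$

$$\frac{\partial}{\partial x} f\left(g\left(x\right)\right) = J_f \cdot J_g = \begin{pmatrix} \frac{\partial f_1}{\partial g_1} & \cdots & \frac{\partial f_1}{\partial g_o} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial g_1} & \cdots & \frac{\partial f_m}{\partial g_o} \end{pmatrix} \cdot \begin{pmatrix} \frac{\partial g_1}{\partial x_1} & \cdots & \frac{\partial g_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_o}{\partial x_1} & \cdots & \frac{\partial g_o}{\partial x_n} \end{pmatrix}$$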
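To make the general chain rule concrete, the sketch below multiplies two hand-derived Jacobians and compares the product with a finite-difference Jacobian of the composite. The functions `f` and `g` here are hypothetical toy examples, not the image formation functions used later.

```python
def g(x):
    # Hypothetical inner function g: R^2 -> R^3
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]


def jacobian_g(x):
    # 3 x 2 Jacobian of g, derived by hand
    return [[x[1], x[0]], [1.0, 1.0], [0.0, 2.0 * x[1]]]


def f(u):
    # Hypothetical outer function f: R^3 -> R^2
    return [u[0] + u[1] * u[2], u[0] * u[2]]


def jacobian_f(u):
    # 2 x 3 Jacobian of f, derived by hand
    return [[1.0, u[2], u[1]], [u[2], 0.0, u[0]]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def numerical_jacobian(func, x, eps=1e-6):
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        hi, lo = list(x), list(x)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = func(hi), func(lo)
        for i in range(m):
            J[i][j] = (f_hi[i] - f_lo[i]) / (2.0 * eps)
    return J


x = [1.0, 2.0]
# Chain rule: J_f is evaluated at g(x), J_g at x.
J_chain = matmul(jacobian_f(g(x)), jacobian_g(x))
J_numeric = numerical_jacobian(lambda v: f(g(v)), x)
# J_chain is [[6, 17], [8, 12]] at x = (1, 2).
```

Note that J_f has to be evaluated at g(x), not at x; this is the same pattern as `get_jacobian_f(g(p))` in the pose tracking pseudo-code.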

Review: Minimizing a Function

• goal: find point x* that minimizes a nonlinear function f(x)

[figure: plot of f(x) over x, with the minimizer x* marked on the x axis]

Review: What is a Gradient?

• gradient of f at some point x0 is the slope at that point

[figure: plot of f(x) over x, with the point x0 marked on the x axis]

• extremum is where gradient is 0! (sometimes have to check 2nd derivative to see if it’s a minimum and not a maximum or saddle point)

[figure: plot of f(x) over x, with the extremum x* marked on the x axis]

Review: Optimization

• convex optimization: there is only a single global minimum

• non-convex optimization: multiple local minima

[figure: two plots of f(x) over x, labeled "convex" and "non-convex"]

• how to find where gradient is 0?

1. start with some initial guess x0, e.g. a random value

2. update guess by linearizing function and minimizing that

[figure: plot of f(x) over x, with the initial guess x0 marked on the x axis]
• how to linearize a function? → using Taylor expansion!

$$f\left(x_0 + \Delta x\right) \approx f\left(x_0\right) + J_f \Delta x + \varepsilon$$

• find minimum of linear function approximation

$$\underset{\Delta x}{\text{minimize}} \; \left\| b - f\left(x_0 + \Delta x\right) \right\|_2^2 \approx \left\| b - \left( f\left(x_0\right) + J_f \Delta x \right) \right\|_2^2$$


• find minimum of linear function approximation (gradient=0)

equate gradient to zero (normal equations):

$$0 = J_f^T J_f \Delta x - J_f^T \left( b - f\left(x_0\right) \right)$$

$$\Delta x = \left( J_f^T J_f \right)^{-1} J_f^T \left( b - f\left(x_0\right) \right)$$
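The step from the linearized objective to the normal equations can be written out as follows. The shorthand $r = b - f(x_0)$ for the current residual is introduced here only for readability and is not notation from the slides.

```latex
\begin{align*}
\left\| r - J_f \Delta x \right\|_2^2
  &= r^T r - 2\, \Delta x^T J_f^T r + \Delta x^T J_f^T J_f \Delta x \\
\frac{\partial}{\partial \Delta x} \left\| r - J_f \Delta x \right\|_2^2
  &= -2\, J_f^T r + 2\, J_f^T J_f \Delta x = 0 \\
\Rightarrow \quad J_f^T J_f \Delta x &= J_f^T r
  \quad \Rightarrow \quad
  \Delta x = \left( J_f^T J_f \right)^{-1} J_f^T r
\end{align*}
```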


[figure: plot of f(x) over x, with x0 and the updated point x0 + Δx marked on the x axis]

• take step and repeat procedure, will get there eventually

$$\Delta x = \left( J_f^T J_f \right)^{-1} J_f^T \left( b - f\left(x_0\right) \right)$$

Review: Optimization – Gauss-Newton

• results in an iterative algorithm

$$x_{k+1} = x_k + \Delta x, \qquad \Delta x = \left( J_f^T J_f \right)^{-1} J_f^T \left( b - f\left(x_k\right) \right)$$


• pseudo-code:

    x = rand()                          // initialize x0
    for k = 1 to max_iter
        f = eval_objective(x)
        J = eval_jacobian(x)
        x = x + inv(J'*J)*J'*(b-f)      // update x
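The Gauss-Newton loop above can be tried on a small problem. The sketch below assumes a hypothetical two-parameter model f_i(x) = x0 * exp(x1 * t_i), synthetic noise-free measurements `b`, and a fixed initial guess instead of `rand()`; the 2 x 2 normal equations are solved in closed form rather than with a matrix library.

```python
import math

T = [0.0, 1.0, 2.0, 3.0]      # hypothetical sample locations t_i
X_TRUE = [2.0, -0.5]          # parameters used to synthesize b


def eval_objective(x):
    """Model f(x): R^2 -> R^4 with f_i = x0 * exp(x1 * t_i)."""
    return [x[0] * math.exp(x[1] * t) for t in T]


def eval_jacobian(x):
    """Analytic 4 x 2 Jacobian of eval_objective."""
    return [[math.exp(x[1] * t), x[0] * t * math.exp(x[1] * t)] for t in T]


def gauss_newton_step(x, b):
    f = eval_objective(x)
    J = eval_jacobian(x)
    r = [bi - fi for bi, fi in zip(b, f)]
    # Entries of J^T J and J^T (b - f) for two parameters.
    a11 = sum(row[0] * row[0] for row in J)
    a12 = sum(row[0] * row[1] for row in J)
    a22 = sum(row[1] * row[1] for row in J)
    g1 = sum(row[0] * ri for row, ri in zip(J, r))
    g2 = sum(row[1] * ri for row, ri in zip(J, r))
    # Closed-form solution of the 2 x 2 normal equations.
    det = a11 * a22 - a12 * a12
    dx = [(a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det]
    return [x[0] + dx[0], x[1] + dx[1]]


b = eval_objective(X_TRUE)    # noise-free synthetic measurements
x = [1.8, -0.4]               # initial guess x0 (fixed instead of rand())
for k in range(10):
    x = gauss_newton_step(x, b)
```

With a starting point this close to the solution, the iterates settle on `X_TRUE` within a few steps; a poor initial guess can make plain Gauss-Newton diverge.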


• matrix $J^T J$ can be ill-conditioned (i.e. singular or nearly singular, so its inverse is unreliable)

$$\Delta x = \left( J_f^T J_f \right)^{-1} J_f^T \left( b - f\left(x\right) \right)$$

Review: Optimization – Levenberg

• add a diagonal matrix to make $J^T J$ invertible; this acts as damping

$$\Delta x = \left( J_f^T J_f + \lambda I \right)^{-1} J_f^T \left( b - f\left(x\right) \right)$$

Review: Optimization – Levenberg-Marquardt

• better: use $\text{diag}\left(J^T J\right)$ instead of $I$ as damping. This is LM!

$$\Delta x = \left( J_f^T J_f + \lambda \, \text{diag}\left( J_f^T J_f \right) \right)^{-1} J_f^T \left( b - f\left(x\right) \right)$$

Review: Optimization – Levenberg-Marquardt

• pseudo-code:

    x = rand()                                            // initialize x0
    for k = 1 to max_iter
        f = eval_objective(x)
        J = eval_jacobian(x)
        x = x + inv(J'*J + lambda*diag(J'*J))*J'*(b-f)    // update x
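A minimal sketch of the damped update, assuming the same kind of hypothetical exponential model as a test problem and a fixed, hand-picked `lam`; only the diagonal of J^T J changes compared with Gauss-Newton.

```python
import math

T = [0.0, 1.0, 2.0, 3.0]      # hypothetical sample locations t_i
X_TRUE = [2.0, -0.5]          # parameters used to synthesize b


def eval_objective(x):
    """Model f(x): R^2 -> R^4 with f_i = x0 * exp(x1 * t_i)."""
    return [x[0] * math.exp(x[1] * t) for t in T]


def eval_jacobian(x):
    """Analytic 4 x 2 Jacobian of eval_objective."""
    return [[math.exp(x[1] * t), x[0] * t * math.exp(x[1] * t)] for t in T]


def lm_step(x, b, lam):
    f = eval_objective(x)
    J = eval_jacobian(x)
    r = [bi - fi for bi, fi in zip(b, f)]
    a11 = sum(row[0] * row[0] for row in J)
    a12 = sum(row[0] * row[1] for row in J)
    a22 = sum(row[1] * row[1] for row in J)
    g1 = sum(row[0] * ri for row, ri in zip(J, r))
    g2 = sum(row[1] * ri for row, ri in zip(J, r))
    # Damping: J^T J + lam * diag(J^T J) scales only the diagonal entries.
    d11 = a11 * (1.0 + lam)
    d22 = a22 * (1.0 + lam)
    det = d11 * d22 - a12 * a12
    dx = [(d22 * g1 - a12 * g2) / det, (d11 * g2 - a12 * g1) / det]
    return [x[0] + dx[0], x[1] + dx[1]]


lam = 0.01                    # user-defined damping parameter (assumed value)
b = eval_objective(X_TRUE)    # noise-free synthetic measurements
x = [1.8, -0.4]               # initial guess x0 (fixed instead of rand())
for k in range(20):
    x = lm_step(x, b, lam)
```

Because the diagonal of J^T J is strictly positive whenever no column of J is zero, the damped matrix stays invertible even when J^T J itself is close to singular.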

Pose Estimation via Levenberg-Marquardt



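Pose Estimation - Objective Function

• objective function is sum of squares of reprojection error

$$\underset{\{p\}}{\text{minimize}} \; \left\| b - f\left(g\left(p\right)\right) \right\|_2^2$$

$$\left\| b - f\left(g\left(p\right)\right) \right\|_2^2 = \left( x_1^n - f_1\left(g\left(p\right)\right) \right)^2 + \left( y_1^n - f_2\left(g\left(p\right)\right) \right)^2 + \ldots + \left( x_4^n - f_7\left(g\left(p\right)\right) \right)^2 + \left( y_4^n - f_8\left(g\left(p\right)\right) \right)^2$$

$$\begin{pmatrix} x_i \\ y_i \\ 0 \end{pmatrix} \;\xrightarrow{\;\text{image formation}\;}\; \begin{pmatrix} x_i^n \\ y_i^n \end{pmatrix}$$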

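Image Formation

$$\begin{pmatrix} x_i \\ y_i \\ 0 \end{pmatrix} \;\xrightarrow{\;\text{image formation}\;}\; \begin{pmatrix} x_i^n \\ y_i^n \end{pmatrix}$$

1. transform 3D point into view space:

$$\begin{pmatrix} x_i^c \\ y_i^c \\ w_i^c \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \cdot \begin{pmatrix} r_{11} & r_{12} & t_x \\ r_{21} & r_{22} & t_y \\ r_{31} & r_{32} & t_z \end{pmatrix} \cdot \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix} = \begin{pmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{pmatrix} \cdot \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix}$$

2. perspective divide:

$$\begin{pmatrix} x_i^n \\ y_i^n \end{pmatrix} = \begin{pmatrix} x_i^c / w_i^c \\ y_i^c / w_i^c \end{pmatrix}$$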

Image Formation: g(p) and f(h)

• split up image formation into two functions

$$f\left(h\right) = f\left(g\left(p\right)\right), \qquad g : \mathbb{R}^6 \to \mathbb{R}^9, \quad f : \mathbb{R}^9 \to \mathbb{R}^8$$

Image Formation: f(h)

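• f(h) uses elements of homography matrix h to compute projected 2D coordinates as

$$f\left(h\right) = \begin{pmatrix} f_1\left(h\right) \\ f_2\left(h\right) \\ \vdots \\ f_7\left(h\right) \\ f_8\left(h\right) \end{pmatrix} = \begin{pmatrix} x_1^n \\ y_1^n \\ \vdots \\ x_4^n \\ y_4^n \end{pmatrix} = \begin{pmatrix} \dfrac{h_1 x_1 + h_2 y_1 + h_3}{h_7 x_1 + h_8 y_1 + h_9} \\[2ex] \dfrac{h_4 x_1 + h_5 y_1 + h_6}{h_7 x_1 + h_8 y_1 + h_9} \\ \vdots \\ \dfrac{h_1 x_4 + h_2 y_4 + h_3}{h_7 x_4 + h_8 y_4 + h_9} \\[2ex] \dfrac{h_4 x_4 + h_5 y_4 + h_6}{h_7 x_4 + h_8 y_4 + h_9} \end{pmatrix}$$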
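As a sketch of how f(h) could be evaluated in code, the function below applies the two fractions to each reference point in turn. The point coordinates and the test homography are hypothetical values chosen so the result is easy to check by hand.

```python
def f(h, points):
    """Project reference points (x_i, y_i) using homography elements h1..h9."""
    h1, h2, h3, h4, h5, h6, h7, h8, h9 = h
    out = []
    for x, y in points:
        w = h7 * x + h8 * y + h9          # shared denominator
        out.append((h1 * x + h2 * y + h3) / w)   # x_i^n
        out.append((h4 * x + h5 * y + h6) / w)   # y_i^n
    return out


# Hypothetical reference points, not the actual VRduino photodiode layout.
points = [(-1.0, 1.0), (1.0, 1.0), (1.0, -1.0), (-1.0, -1.0)]

# With h = (1,0,0, 0,1,0, 0,0,2) every coordinate is simply divided by 2.
projected = f([1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0], points)
# projected == [-0.5, 0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5]
```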

Jacobian Matrix of f(h)

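$$J_f = \begin{pmatrix} \frac{\partial f_1}{\partial h_1} & \cdots & \frac{\partial f_1}{\partial h_9} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_8}{\partial h_1} & \cdots & \frac{\partial f_8}{\partial h_9} \end{pmatrix}, \qquad f_1\left(h\right) = \frac{h_1 x_1 + h_2 y_1 + h_3}{h_7 x_1 + h_8 y_1 + h_9}$$

• first row of Jacobian matrix

$$\frac{\partial f_1}{\partial h_1} = \frac{x_1}{h_7 x_1 + h_8 y_1 + h_9}, \qquad \frac{\partial f_1}{\partial h_2} = \frac{y_1}{h_7 x_1 + h_8 y_1 + h_9}, \qquad \frac{\partial f_1}{\partial h_3} = \frac{1}{h_7 x_1 + h_8 y_1 + h_9}$$

$$\frac{\partial f_1}{\partial h_4} = 0, \qquad \frac{\partial f_1}{\partial h_5} = 0, \qquad \frac{\partial f_1}{\partial h_6} = 0$$

$$\frac{\partial f_1}{\partial h_7} = -\left( \frac{h_1 x_1 + h_2 y_1 + h_3}{\left( h_7 x_1 + h_8 y_1 + h_9 \right)^2} \right) x_1, \qquad \frac{\partial f_1}{\partial h_8} = -\left( \frac{h_1 x_1 + h_2 y_1 + h_3}{\left( h_7 x_1 + h_8 y_1 + h_9 \right)^2} \right) y_1, \qquad \frac{\partial f_1}{\partial h_9} = -\frac{h_1 x_1 + h_2 y_1 + h_3}{\left( h_7 x_1 + h_8 y_1 + h_9 \right)^2}$$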

Jacobian Matrix of f(h)

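• the remaining rows of the Jacobian can be derived with a similar pattern

• see course notes for a detailed derivation of the elements of this Jacobian matrix

$$J_f = \begin{pmatrix} \frac{\partial f_1}{\partial h_1} & \cdots & \frac{\partial f_1}{\partial h_9} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_8}{\partial h_1} & \cdots & \frac{\partial f_8}{\partial h_9} \end{pmatrix}$$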
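The sketch below fills all eight rows of J_f and checks them against central differences. The odd rows follow the first-row formulas given in the slides; the even rows (for the y coordinates) are worked out here by the same pattern and should be compared against the course notes. The point layout and the test values of h are hypothetical.

```python
def f(h, points):
    h1, h2, h3, h4, h5, h6, h7, h8, h9 = h
    out = []
    for x, y in points:
        d = h7 * x + h8 * y + h9
        out.append((h1 * x + h2 * y + h3) / d)
        out.append((h4 * x + h5 * y + h6) / d)
    return out


def jacobian_f(h, points):
    """8 x 9 Jacobian of f(h); rows alternate between x_i^n and y_i^n."""
    h1, h2, h3, h4, h5, h6, h7, h8, h9 = h
    J = []
    for x, y in points:
        d = h7 * x + h8 * y + h9
        nx = h1 * x + h2 * y + h3      # numerator of x_i^n
        ny = h4 * x + h5 * y + h6      # numerator of y_i^n
        J.append([x / d, y / d, 1.0 / d, 0.0, 0.0, 0.0,
                  -nx * x / d ** 2, -nx * y / d ** 2, -nx / d ** 2])
        # Same pattern for the y row (derived here, not copied from the slides).
        J.append([0.0, 0.0, 0.0, x / d, y / d, 1.0 / d,
                  -ny * x / d ** 2, -ny * y / d ** 2, -ny / d ** 2])
    return J


def numerical_jacobian_f(h, points, eps=1e-6):
    J = [[0.0] * 9 for _ in range(2 * len(points))]
    for j in range(9):
        hi, lo = list(h), list(h)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = f(hi, points), f(lo, points)
        for i in range(len(J)):
            J[i][j] = (f_hi[i] - f_lo[i]) / (2.0 * eps)
    return J


points = [(-1.0, 1.0), (1.0, 1.0), (1.0, -1.0), (-1.0, -1.0)]   # hypothetical
h = [0.9, -0.1, 0.2, 0.1, 1.1, -0.3, 0.05, -0.02, 2.0]          # hypothetical
J_analytic = jacobian_f(h, points)
J_numeric = numerical_jacobian_f(h, points)
max_diff = max(abs(a - n) for ra, rn in zip(J_analytic, J_numeric)
               for a, n in zip(ra, rn))
```

A small `max_diff` indicates that the analytic entries agree with the finite-difference estimate at this test point.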

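Image Formation: g(p)

• g(p) uses 6 pose parameters to compute elements of homography matrix h as

$$g\left(p\right) = \begin{pmatrix} g_1\left(p\right) \\ \vdots \\ g_9\left(p\right) \end{pmatrix} = \begin{pmatrix} h_1 \\ \vdots \\ h_9 \end{pmatrix}$$

definition of homography matrix:

$$\begin{pmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \cdot \begin{pmatrix} r_{11} & r_{12} & t_x \\ r_{21} & r_{22} & t_y \\ r_{31} & r_{32} & t_z \end{pmatrix}$$

rotation matrix from Euler angles: $R = R_z\left(\theta_z\right) \cdot R_x\left(\theta_x\right) \cdot R_y\left(\theta_y\right)$

$$\begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix} = \begin{pmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{pmatrix} \begin{pmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{pmatrix}$$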

Image Formation: g(p)

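• write as

$$\begin{aligned}
h_1 &= g_1\left(p\right) = \cos\theta_y \cos\theta_z - \sin\theta_x \sin\theta_y \sin\theta_z \\
h_2 &= g_2\left(p\right) = -\cos\theta_x \sin\theta_z \\
h_3 &= g_3\left(p\right) = t_x \\
h_4 &= g_4\left(p\right) = \cos\theta_y \sin\theta_z + \sin\theta_x \sin\theta_y \cos\theta_z \\
h_5 &= g_5\left(p\right) = \cos\theta_x \cos\theta_z \\
h_6 &= g_6\left(p\right) = t_y \\
h_7 &= g_7\left(p\right) = \cos\theta_x \sin\theta_y \\
h_8 &= g_8\left(p\right) = -\sin\theta_x \\
h_9 &= g_9\left(p\right) = -t_z
\end{aligned}$$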
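The closed-form entries of g(p) can be checked against the matrix definition. The sketch below evaluates g(p) both ways, once from the nine expressions and once from the product of the rotation matrices followed by the sign flip of the third row; the helper names and the test pose are hypothetical.

```python
import math


def g(p):
    """Closed-form h1..h9 from pose p = (theta_x, theta_y, theta_z, t_x, t_y, t_z)."""
    ax, ay, az, tx, ty, tz = p
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [cy * cz - sx * sy * sz, -cx * sz, tx,
            cy * sz + sx * sy * cz, cx * cz, ty,
            cx * sy, -sx, -tz]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def g_from_matrices(p):
    """Same quantity, built from R = Rz * Rx * Ry and the homography definition."""
    ax, ay, az, tx, ty, tz = p
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    Rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    Rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    Ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    R = matmul(matmul(Rz, Rx), Ry)
    # Keep the first two columns of R and append the translation.
    M = [[R[0][0], R[0][1], tx],
         [R[1][0], R[1][1], ty],
         [R[2][0], R[2][1], tz]]
    # Left-multiplying by diag(1, 1, -1) negates the third row.
    return M[0] + M[1] + [-v for v in M[2]]


p = [0.3, -0.2, 0.5, 0.1, -0.4, -2.0]   # hypothetical test pose
h_closed = g(p)
h_matrix = g_from_matrices(p)
max_diff = max(abs(a - b) for a, b in zip(h_closed, h_matrix))
```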

Jacobian Matrix of g(p)

$$p = \left( p_1, p_2, p_3, p_4, p_5, p_6 \right) = \left( \theta_x, \theta_y, \theta_z, t_x, t_y, t_z \right)$$

$$J_g = \begin{pmatrix} \frac{\partial g_1}{\partial p_1} & \cdots & \frac{\partial g_1}{\partial p_6} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_9}{\partial p_1} & \cdots & \frac{\partial g_9}{\partial p_6} \end{pmatrix}$$

Jacobian Matrix of g(p)

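• first row of Jacobian matrix

$$h_1 = g_1\left(p\right) = \cos\theta_y \cos\theta_z - \sin\theta_x \sin\theta_y \sin\theta_z, \qquad p = \left( p_1, \ldots, p_6 \right) = \left( \theta_x, \theta_y, \theta_z, t_x, t_y, t_z \right)$$

$$\begin{aligned}
\frac{\partial g_1}{\partial p_1} &= -\cos\theta_x \sin\theta_y \sin\theta_z \\
\frac{\partial g_1}{\partial p_2} &= -\sin\theta_y \cos\theta_z - \sin\theta_x \cos\theta_y \sin\theta_z \\
\frac{\partial g_1}{\partial p_3} &= -\cos\theta_y \sin\theta_z - \sin\theta_x \sin\theta_y \cos\theta_z \\
\frac{\partial g_1}{\partial p_4} &= 0, \qquad \frac{\partial g_1}{\partial p_5} = 0, \qquad \frac{\partial g_1}{\partial p_6} = 0
\end{aligned}$$

$$J_g = \begin{pmatrix} \frac{\partial g_1}{\partial p_1} & \cdots & \frac{\partial g_1}{\partial p_6} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_9}{\partial p_1} & \cdots & \frac{\partial g_9}{\partial p_6} \end{pmatrix}$$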

Jacobian Matrix of g(p)

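$$J_g = \begin{pmatrix} \frac{\partial g_1}{\partial p_1} & \cdots & \frac{\partial g_1}{\partial p_6} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_9}{\partial p_1} & \cdots & \frac{\partial g_9}{\partial p_6} \end{pmatrix}$$

• the remaining rows of the Jacobian can be derived with a similar pattern

• see course notes for a detailed derivation of the elements of this Jacobian matrix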

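Jacobian Matrices of f and g

• to get the Jacobian of f(g(p)), compute the two Jacobian matrices and multiply them

$$J = J_f \cdot J_g = \begin{pmatrix} \frac{\partial f_1}{\partial h_1} & \cdots & \frac{\partial f_1}{\partial h_9} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_8}{\partial h_1} & \cdots & \frac{\partial f_8}{\partial h_9} \end{pmatrix} \cdot \begin{pmatrix} \frac{\partial g_1}{\partial p_1} & \cdots & \frac{\partial g_1}{\partial p_6} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_9}{\partial p_1} & \cdots & \frac{\partial g_9}{\partial p_6} \end{pmatrix}$$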

Pose Tracking with LM

• LM then iteratively updates pose as

$$p^{(k+1)} = p^{(k)} + \left( J^T J + \lambda \, \text{diag}\left( J^T J \right) \right)^{-1} J^T \left( b - f\left( g\left( p^{(k)} \right) \right) \right)$$

• pseudo-code

    p = ...                                               // initialize p0
    for k = 1 to max_iter
        f = eval_objective(p)
        J = get_jacobian(p)
        p = p + inv(J'*J + lambda*diag(J'*J))*J'*(b-f)


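    value = function eval_objective(p)
        for i=1:4
            value(2*(i-1))   = ...
            value(2*(i-1)+1) = ...

with the two entries per point i given by

$$\frac{h_1 x_i + h_2 y_i + h_3}{h_7 x_i + h_8 y_i + h_9} \qquad \text{and} \qquad \frac{h_4 x_i + h_5 y_i + h_6}{h_7 x_i + h_8 y_i + h_9}$$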


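    J = function get_jacobian(p)
        Jf = get_jacobian_f(g(p))
        Jg = get_jacobian_g(p)
        J = Jf*Jg

$$J_f = \begin{pmatrix} \frac{\partial f_1}{\partial h_1} & \cdots & \frac{\partial f_1}{\partial h_9} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_8}{\partial h_1} & \cdots & \frac{\partial f_8}{\partial h_9} \end{pmatrix}, \qquad J_g = \begin{pmatrix} \frac{\partial g_1}{\partial p_1} & \cdots & \frac{\partial g_1}{\partial p_6} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_9}{\partial p_1} & \cdots & \frac{\partial g_9}{\partial p_6} \end{pmatrix}$$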
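Putting the pieces together, the sketch below runs the LM pose update on synthetic data. Several things are assumptions made for illustration only: the reference point layout, the "true" pose, the fixed `lam`, the perturbed starting pose that stands in for a homography-based initialization, and the use of central differences in `get_jacobian` in place of the analytic product Jf*Jg. A small Gaussian elimination routine replaces the matrix library.

```python
import math

# Hypothetical reference points (x_i, y_i) on the z = 0 plane of the tracked board.
POINTS = [(-0.5, 0.5), (0.5, 0.5), (0.5, -0.5), (-0.5, -0.5)]


def g(p):
    """Pose (theta_x, theta_y, theta_z, t_x, t_y, t_z) -> homography elements h1..h9."""
    ax, ay, az, tx, ty, tz = p
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [cy * cz - sx * sy * sz, -cx * sz, tx,
            cy * sz + sx * sy * cz, cx * cz, ty,
            cx * sy, -sx, -tz]


def eval_objective(p):
    """f(g(p)): the eight projected coordinates (x_1^n, y_1^n, ..., x_4^n, y_4^n)."""
    h1, h2, h3, h4, h5, h6, h7, h8, h9 = g(p)
    value = []
    for x, y in POINTS:
        d = h7 * x + h8 * y + h9
        value.append((h1 * x + h2 * y + h3) / d)
        value.append((h4 * x + h5 * y + h6) / d)
    return value


def get_jacobian(p, eps=1e-6):
    """8 x 6 Jacobian of f(g(p)); central differences stand in for Jf * Jg."""
    J = [[0.0] * 6 for _ in range(8)]
    for j in range(6):
        hi, lo = list(p), list(p)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = eval_objective(hi), eval_objective(lo)
        for i in range(8):
            J[i][j] = (f_hi[i] - f_lo[i]) / (2.0 * eps)
    return J


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [A[i][:] + [rhs[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lm_pose(b, p, lam=1e-3, max_iter=25):
    """Levenberg-Marquardt updates with a fixed, user-chosen lambda."""
    for _ in range(max_iter):
        f = eval_objective(p)
        J = get_jacobian(p)
        r = [bi - fi for bi, fi in zip(b, f)]
        JtJ = [[sum(J[k][i] * J[k][j] for k in range(8)) for j in range(6)]
               for i in range(6)]
        Jtr = [sum(J[k][i] * r[k] for k in range(8)) for i in range(6)]
        for i in range(6):
            JtJ[i][i] *= 1.0 + lam      # J^T J + lambda * diag(J^T J)
        dp = solve(JtJ, Jtr)
        p = [pi + di for pi, di in zip(p, dp)]
    return p


p_true = [0.10, -0.20, 0.05, 0.10, -0.05, -2.0]    # hypothetical pose
b = eval_objective(p_true)                          # synthetic, noise-free measurements
p_init = [0.08, -0.17, 0.03, 0.12, -0.04, -2.1]     # stand-in for the homography guess
p_est = lm_pose(b, p_init)
```

On a microcontroller the analytic Jacobians would normally be preferred over finite differences, since each finite-difference Jacobian costs twelve extra evaluations of the objective per iteration.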

Pose Tracking with LM on VRduino

• some more hints for implementation:

• let Arduino Matrix library compute matrix-matrix multiplications and also matrix inverses for you!

• run homography method and use that to initialize p for LM

• use something like 5-25 iterations of LM per frame for real-time performance

• user-defined parameter λ

• good luck!

Outlook: Camera Calibration

• camera calibration is one of the most fundamental problems in computer vision and imaging

• task: estimate intrinsic (lens distortion, focal length, principal point) & extrinsic (translation, rotation) camera parameters given images of planar checkerboards

• uses similar procedure as discussed today

http://www.vision.caltech.edu/bouguetj/calib_doc/

Outlook: Sensor Fusion with Extended Kalman Filter

• also desirable: estimate the bias of each IMU sensor

• also desirable: joint pose estimation from all IMU + photodiode measurements

• can do all of that with an Extended Kalman Filter - slightly too advanced for this class, but you can find a lot of literature in the robotic vision community

Outlook: Sensor Fusion with Extended Kalman Filter

• Extended Kalman filter: can be interpreted as a Bayesian framework for sensor fusion

• Hidden Markov Model (HMM)

[figure: HMM chain of states x0, x1, …, xt−1, xt with measurements z1, …, zt−1, zt; labels: "known initial state", "unknown, evolving states", "measurements"]

Must read: course notes on tracking!
