
UNIVERSIDAD POLITÉCNICA DE MADRID

FACULTAD DE INFORMÁTICA

TESIS DOCTORAL

Efficient Model-based 3D Tracking by Using Direct Image Registration

Submitted to the FACULTAD DE INFORMÁTICA of the UNIVERSIDAD POLITÉCNICA DE MADRID in fulfilment of the requirements for the degree of DOCTOR EN INFORMÁTICA

AUTHOR: Enrique Muñoz Corral
SUPERVISOR: Luis Baumela Molina

Madrid, 2012


Acknowledgements

The truth is that the ten years (ten!) I have taken to write this thesis give room for many stories, and if I had to thank everyone who has helped me along the way, I would need an entire chapter. First of all, I would like to thank Luis Baumela, a great thesis supervisor and an even better person, for awakening in me the research bug and, above all, for having enough patience to put up with my stubbornness. Luis, if it were not for you, I would never have joined the University and I would be in the private sector earning serious money. Yeah, thank you so much!

A thousand thanks to Javier de Lope, for endless discussions both technical and not so technical, and above all to José Miguel Buenaposada, who over all these years has put up with me, helped me, irritated me, joked with me, and even found me work. I cannot forget the good times at lunch with the "girls" from Statistics (Maribel, Arminda, Concha and Juan Antonio), who endured my interminable rants about the housing bubble and our national politicians. A fond memory also goes to all the colleagues who have passed through laboratory L-3202 during these years: "Javi's kids" (Javi, Juan, Bea and Yadira), Juan Bekios, the two "Pablos" (Márquez and Herrero), Antonio and Rubén.

I would also like to thank Lourdes Agapito for letting me take part in the project Automated facial expression analysis using computer vision, funded by the Royal Society of the United Kingdom. Thanks to this project I had the privilege of working with Lourdes and with Xavier Lladó, and above all of meeting that singular character called Alessio del Bue. I have no words to thank Alessio for being such a good sort and for stoically enduring how often we sponged off him. Nor can I forget the help provided by Professor Thomas Vetter and his group at the University of Basel (especially Brian Amberg and Pascal Paysan); they took the trouble to build a three-dimensional model of my face, including deformations and expressions. I would not like to close these acknowledgements without noting that part of the work in this thesis was carried out under project TIC2002-00591 of the Ministerio de Ciencia y Tecnología and project TIN2008-06815-C02-02 of the Ministerio de Ciencia e Innovación.

And last, but by no means least, I thank Susana for the patience she has shown during all these years (and there have been many) in which I have been tied up with this thesis. This one is for you, Susana!

January 2012


Contents

Resumen

Summary

Notations

1 Introduction
1.1 Motivation
1.2 Applications
1.3 Contributions of the Thesis

2 Literature Review
2.1 Image Registration vs. Tracking
2.2 Image Registration
2.3 Model-based 3D Tracking
2.3.1 Modelling assumptions
2.3.2 Rigid Objects
2.3.3 Nonrigid Objects
2.3.4 Facial Motion Capture

3 Efficient Direct Image Registration
3.1 Introduction
3.2 Modelling Assumptions
3.2.1 Imaging Geometry
3.2.2 Brightness Constancy Constraint
3.2.3 Image Registration by Optimization
3.2.4 Additive vs. Compositional
3.3 Additive Approaches
3.3.1 Lucas-Kanade Algorithm
3.3.2 Hager-Belhumeur Factorization Algorithm
3.4 Compositional Approaches
3.4.1 Forward Compositional Algorithm
3.4.2 Inverse Compositional Algorithm
3.5 Other Methods
3.6 Summary

4 Equivalence of Gradients
4.1 Image Gradients
4.1.1 Image Gradients in R^2
4.1.2 Image Gradients in P^2
4.1.3 Image Gradients in R^3
4.2 The Gradient Equivalence Equation
4.2.1 Relevance of the Gradient Equivalence Equation
4.2.2 General Approach to Gradient Replacement
4.3 Summary

5 Additive Algorithms
5.1 Gradient Replacement Requirements
5.2 Systematic Factorization
5.3 3D Rigid Motion
5.3.1 3D Textured Models
5.3.2 Shape-induced Homography
5.3.3 Change to the Reference Frame
5.3.4 Optimization Outline
5.3.5 Gradient Replacement
5.3.6 Systematic Factorization
5.4 3D Nonrigid Motion
5.4.1 Nonrigid Morphable Models
5.4.2 Nonrigid Shape-induced Homography
5.4.3 Change of Variables to the Reference Frame
5.4.4 Optimization Outline
5.4.5 Gradient Replacement
5.4.6 Systematic Factorization
5.5 Summary

6 Compositional Algorithms
6.1 Unravelling the Inverse Compositional Algorithm
6.1.1 Change of Variables in IC
6.1.2 The Efficient Forward Compositional Algorithm
6.1.3 Rationale of the Change of Variables in IC
6.1.4 Differences between IC and EFC
6.2 Requirements for Compositional Warps
6.2.1 Requirement on Warp Composition
6.2.2 Requirement on Gradient Equivalence
6.3 Other Compositional Algorithms
6.3.1 Generalized Inverse Compositional Algorithm
6.4 Summary

7 Computational Complexity
7.1 Complexity Measures
7.1.1 Number of Operations
7.1.2 Complexity of Matrix Operations
7.1.3 Comparing Algorithm Complexities
7.2 Algorithm Naming Conventions
7.2.1 Additive Algorithms
7.2.2 Compositional Algorithms
7.3 Complexity of Algorithms
7.3.1 Additive Algorithms
7.3.2 Compositional Algorithms
7.4 Summary

8 Experiments
8.1 Motivation
8.2 Features and Measures
8.2.1 Numerical Ranges for Features
8.3 Generation of Synthetic Experiments
8.3.1 Synthetic Datasets and Images
8.3.2 Generation of Result Plots
8.4 Implementation Details
8.4.1 Convergence Criteria
8.4.2 Visibility Management
8.4.3 Scale of Homographies
8.4.4 Minimization of Jacobian Operations
8.5 Additive Algorithms
8.5.1 Experimental Hypotheses
8.5.2 Experiments with Synthetic Rigid Data
8.5.3 Experiments with Synthetic Nonrigid Data
8.5.4 Experiments with a Nonrigid Sequence
8.5.5 Experiments with Real Rigid Data
8.5.6 Experiments with Real Nonrigid Data
8.6 Compositional Algorithms
8.6.1 Experimental Hypotheses
8.6.2 Experiments with Synthetic Rigid Data
8.7 Discussion

9 Conclusions and Future Work
9.1 Summary of Contributions
9.2 Conclusions
9.3 Future Work

A Gauss-Newton Optimization

B Plane-induced Homography

C Plane+Parallax-constrained Homography
C.1 Compositional Form

D Methodical Factorization
D.1 Basic Definitions
D.2 Lemmas that Re-organize Products of Matrices
D.3 Lemmas that Re-organize Kronecker Products
D.4 Lemmas that Re-organize Sums of Matrices

E Methodical Factorization of f3DTM

F Methodical Factorization of f3DMM (Partial Case)

G Methodical Factorization of f3DMM (Full Case)

H Detailed Complexity of Algorithms
H.1 Warp f3DTM
H.2 Warp f3DMM
H.3 Jacobian of Algorithm HB3DTM
H.4 Jacobian of Algorithm HB3DTMNF
H.5 Jacobian of Algorithm HB3DMMNF
H.6 Jacobian of Algorithm HB3DMMSF

List of Figures

1.1 Example of 3D rigid tracking.
1.2 3D nonrigid tracking.
1.3 Image registration.
1.4 Industrial applications of 3D tracking.
1.5 Motion capture in the film industry.
1.6 Markerless facial motion capture.

3.1 Imaging geometry.
3.2 Iterative gradient descent image registration.
3.3 Generic descent method for image registration.
3.4 Lucas-Kanade image registration.
3.5 Hager-Belhumeur image registration.
3.6 Forward compositional image registration.
3.7 Inverse compositional image registration.

4.1 Depiction of image gradients.
4.2 Image gradient in P^2.
4.3 Image gradient in R^3.
4.4 Comparison between BCC and GEE.
4.5 Gradients and convergence.
4.6 Open subsets in various domains.

5.1 3D Textured Model.
5.2 Shape-induced homographies.
5.3 Warp defined on the reference frame.
5.4 Reference frame advantages.
5.5 Nonrigid Morphable Models.
5.6 Nonrigid shape-induced homographies.
5.7 Deformable warp defined on the reference frame.

6.1 Change of variables in IC.
6.2 Forward compositional image registration.
6.3 Generalized inverse compositional image registration.

7.1 Complexity of additive algorithms.
7.2 Complexities of compositional algorithms.

8.1 Registration vs. tracking.
8.2 Algorithm initialization.
8.3 Accuracy and convergence.
8.4 Ground truth and noise variance.
8.5 Definition of datasets.
8.6 Example of synthetic datasets.
8.7 Experimental evaluation with synthetic data.
8.8 Visibility management.
8.9 Efficient solving of WLS.
8.10 The cube model.
8.11 The face model.
8.12 The tea box model.
8.13 Results from dataset DS1 for cube.
8.14 Results from dataset DS2 for cube.
8.15 Results from dataset DS3 for cube.
8.16 Results from dataset DS4 for cube.
8.17 Results from dataset DS5 for cube.
8.18 Results from dataset DS6 for cube.
8.19 The tea box sequence.
8.20 Results for the tea box sequence.
8.21 Estimated parameters from the tea box sequence.
8.22 Estimated parameters from the face sequence.
8.23 Good texture vs. bad texture.
8.24 The face-deform model.
8.25 Distribution of synthetic datasets.
8.26 Results from dataset DS1 for face-deform.
8.27 Results from dataset DS2 for face-deform.
8.28 Results from dataset DS3 for face-deform.
8.29 Results from dataset DS4 for face-deform.
8.30 Results from dataset DS5 for face-deform.
8.31 Results from dataset DS6 for face-deform.
8.32 The face-deform sequence.
8.33 Results from the face-deform sequence.
8.34 Estimated parameters from the face-deform sequence.
8.35 The cube-real model.
8.36 The cube-real sequence.
8.37 Results from the cube-real sequence.
8.38 Selected facial scans used to build the model.
8.39 Unfolded texture model.
8.40 The face-real sequence.
8.41 Anchor points in the model.
8.42 Results for the face-real sequence.
8.43 The plane model.
8.44 Distribution of synthetic datasets.
8.45 Results from dataset DS1 for plane.
8.46 Results from dataset DS2 for plane.
8.47 Results from dataset DS3 for plane.
8.48 Results from dataset DS4 for plane.
8.49 Results from dataset DS5 for plane.
8.50 Results from dataset DS6 for plane.
8.51 Average time per iteration.

9.1 Spiderweb plots for image registration algorithms.
9.2 Spherical harmonics-based illumination model.
9.3 Tracking by simultaneously using texture and edge information.
9.4 Efficient tracking using multiple views.

B.1 Plane-induced homography.

C.1 Plane+parallax-constrained homography.

List of Tables

4.1 Characteristics of the warps.

6.1 Relationship between compositional algorithms and warps.
6.2 Requirements for optimization algorithms.

7.1 Complexity of matrix operations.
7.2 Additive testing algorithms.
7.3 Compositional testing algorithms.
7.4 Complexity of Algorithm LK3DTM.
7.5 Complexity of Algorithm HB3DTM.
7.6 Complexity of Algorithm LK3DMM.
7.7 Complexity of Algorithm HB3DMMNF.
7.8 Complexity of Algorithm HB3DMM.
7.9 Complexity of Algorithm HB3DMMSF.
7.10 Complexities of additive algorithms.
7.11 Complexity of Algorithm LKH8.
7.12 Complexity of Algorithm ICH8.
7.13 Complexity of Algorithm HBH8.
7.14 Complexity of Algorithm GICH8.
7.15 Complexities of compositional algorithms.
7.16 Comparison of relative complexities for additive algorithms.
7.17 Comparison of relative complexities for compositional algorithms.

8.1 Registration vs. tracking in efficient methods.
8.2 Features and measures.
8.3 Numerical ranges for features.
8.4 Evaluated additive algorithms.
8.5 Ranges of parameters for cube experiments.
8.6 Average reprojection error vs. noise for cube.
8.7 Ranges of parameters for face-deform experiments.
8.8 Average reprojection error vs. noise for face-deform.
8.9 Evaluated compositional algorithms.
8.10 Ranges of motion parameters for each dataset.
8.11 Average reprojection error vs. noise for plane.

9.1 Classification of motion warps.

D.1 Lemmas used to re-arrange matrix products.
D.2 Lemmas used to re-arrange Kronecker matrix products.

List of Algorithms

1 Outline of the basic GN-based descent method for image registration.
2 Outline of the Lucas-Kanade algorithm.
3 Outline of the Hager-Belhumeur algorithm.
4 Outline of the Forward Compositional algorithm.
5 Outline of the Inverse Compositional algorithm.
6 Iterative factorization of the Jacobian matrix.
7 Outline of the HB3DTM algorithm.
8 Outline of the full-factorized HB3DMM algorithm.
9 Outline of the HB3DMMSF algorithm.
10 Outline of the Efficient Forward Compositional algorithm.
11 Outline of the Generalized Inverse Compositional algorithm.
12 Creating the synthetic datasets.
13 Outline of the GN algorithm.

Resumen

This thesis addresses the problem of efficiently tracking 3D objects in image sequences. We approach the 3D tracking problem by using direct image registration, a technique that aligns two images using their intensity values. Image registration is usually solved with iterative optimization methods in which the function to be minimized depends on the error in the intensity values. In this thesis we examine the most common image registration methods, with emphasis on those that use efficient optimization algorithms.

We investigate two forms of efficient registration. The first comprises the additive registration methods: the motion parameters are computed incrementally by means of a linear approximation of the error function. Within this class of algorithms we focus on the factorization method of Hager and Belhumeur. We introduce a necessary requirement that the factorization algorithm must fulfil in order to converge well. In addition, we propose an automatic factorization procedure that allows us to track both rigid and deformable 3D objects.

The second form comprises the so-called compositional registration methods, in which the error norm is rewritten using function composition. We study the most common compositional methods, with emphasis on the fastest registration method, the inverse compositional algorithm. We introduce a new compositional registration method, the Efficient Forward Compositional algorithm, which allows us to interpret the working mechanisms of the inverse compositional algorithm. Thanks to this novel interpretation, we state two fundamental requirements for efficient compositional algorithms.

Finally, we carry out a series of experiments with real and synthetic data to verify the theoretical claims. Moreover, we distinguish between the registration and tracking problems for efficient algorithms: those algorithms that fulfil their requirement(s) may be used for image registration, but not for tracking.


Abstract

This thesis deals with the problem of efficiently tracking 3D objects in sequences of images. We tackle the efficient 3D tracking problem by using direct image registration. This problem is posed as an iterative optimization procedure that minimizes a brightness error norm. We review the most popular iterative methods for image registration in the literature, turning our attention to those algorithms that use efficient optimization techniques.

Two forms of efficient registration algorithms are investigated. The first type comprises the additive registration algorithms: these algorithms incrementally compute the motion parameters by linearly approximating the brightness error function. We centre our attention on Hager and Belhumeur's factorization-based algorithm for image registration. We propose a fundamental requirement that factorization-based algorithms must satisfy to guarantee good convergence, and we introduce a systematic procedure that automatically computes the factorization. Finally, we also introduce two warp functions, for registering rigid and nonrigid 3D targets, that satisfy this requirement.

The second type comprises the compositional registration algorithms, where the brightness error function is rewritten by using function composition. We study the current approaches to compositional image alignment, and we emphasize the importance of the Inverse Compositional method, which is known to be the most efficient image registration algorithm. We introduce a new algorithm, the Efficient Forward Compositional image registration: this algorithm avoids the need to invert the warping function, and it provides a new interpretation of the working mechanisms of inverse compositional alignment. Using this insight, we propose two fundamental requirements that guarantee the convergence of compositional image registration methods.

Finally, we support our claims through extensive experimental testing with synthetic and real-world data. We propose a distinction between image registration and tracking when using efficient algorithms. We show that, depending on whether the fundamental requirements hold, some efficient algorithms are suitable for image registration but not for tracking.


Notations

Specific Sets and Constants

X : Set of target points, or target region.
Ω : Set of target points currently visible.
N : Number of points in the target region, i.e. N = ‖X‖.
NΩ : Number of visible target points, i.e. NΩ = ‖Ω‖.
P : Dimension of the parameter space.
C : Number of image channels.
K : Dimension of the deformation space.
F : Number of frames in the image sequence.

Vectors and Matrices

a : Lowercase bold letters denote vectors.
A_{m×n} : Uppercase monospace letters denote m×n matrices.
vec(A) : Vectorization of matrix A: if A is an m×n matrix, vec(A) is an mn×1 vector.
I_k ∈ M_{k×k} : k×k identity matrix.
I : 3×3 identity matrix.
0_k ∈ R^k : k×1 vector of zeros.
0_{m×n} ∈ M_{m×n} : m×n matrix of zeros.

Camera Model Notations

x ∈ R^2 : Pixel location in the image.
x ∈ P^2 : Location in projective space.
X ∈ R^3 : Point in Cartesian coordinates.
X_c ∈ R^3 : Point expressed in the camera reference system.
K ∈ M_{3×3} : 3×3 camera intrinsics matrix.
P ∈ M_{3×4} : 3×4 camera projection matrix.

Imaging Notations

T(x) ∈ R^C : Brightness value of the template image at pixel x.
I(x, t) ∈ R^C : Brightness value of the current image at pixel x at instant t.
I_t(x) : Alternative notation for I(x, t).
T, I_t : Vector forms of the functions T and I_t.
I[·] : Composition of I and p, i.e. I[x] = I(p(x)).

Optimization Notations

µ ∈ R^P : Column vector of motion parameters.
µ_0 ∈ R^P : Initial guess of the optimization.
µ_i ∈ R^P : Parameters at the i-th iteration of the optimization.
µ* ∈ R^P : Actual optimum of the optimization.
µ_t ∈ R^P : Parameters at image t.
µ_J ∈ R^P : Parameters at which the Jacobian is computed, in efficient algorithms.
δµ ∈ R^P : Incremental step at the current state of the optimization.
ℓ(δµ) : Linear model for the incremental step δµ.
L(δµ) : Local minimizer for the incremental step δµ.
r(µ) ∈ R^N : N×1 vector-valued residual function at parameters µ.
∇_x f(x) : Derivatives of the function f with respect to the variables x, instantiated at x.
J(µ) ∈ M_{N×P} : Jacobian matrix of the brightness dissimilarity at µ, i.e. J(µ) = ∇_µ D(X; µ).
H(µ) ∈ M_{P×P} : Hessian matrix of the brightness dissimilarity at µ, i.e. H(µ) = ∇²_µ D(X; µ).

Warp Function Notations

f(x; µ) : R^n × R^P → R^n : Motion model, or warp.
p : R^n → R^2 : Projection onto the Cartesian plane.
R ∈ M_{3×3} : 3×3 rotation matrix.
r_i ∈ R^3 : i-th column of the rotation matrix R, i.e. R = (r_1, r_2, r_3).
t ∈ R^3 : Translation vector in Euclidean space.
D : R^2 × R^P → R : Dissimilarity function.
U : R^P × R^P → R^P : Parameter update function.
ψ : R^P × R^P → R^P : Jacobian update function for algorithm GIC.

Factorization Notations

⊗ : Kronecker product.
⊙ : Row-wise Kronecker product.
S(x) : Constant matrix of the factorization method, computed from the target structure and the camera calibration.
M(µ) : Variable matrix of the factorization method, computed from the motion parameters.
W ∈ M_{P×P} : Weighting matrix for weighted least squares.
π(n) : Permutation of the set {1, . . . , n}.
P_{π(n)} ∈ M_{n×n} : Permutation matrix of the set {1, . . . , n}.
π(n, q) : Permutation of the set {1, . . . , n} with ratio q.
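The factorization methods referenced here (and developed in Chapter 5 and Appendix D) rest on standard vectorization and Kronecker-product identities, which allow a Jacobian to be split into a constant matrix S(x) and a parameter-dependent matrix M(µ). Below is a minimal NumPy check of the central identity vec(AXB) = (Bᵀ ⊗ A) vec(X); the matrix sizes are arbitrary, illustrative choices, not values from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
X = rng.normal(size=(4, 5))
B = rng.normal(size=(5, 2))

# Column-major vectorization, matching the vec(A) of this notation list.
vec = lambda M: M.flatten(order="F")

lhs = vec(A @ X @ B)              # vec(A X B)
rhs = np.kron(B.T, A) @ vec(X)    # (B' ⊗ A) vec(X)
assert np.allclose(lhs, rhs)
```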

3D Models Notations

F ⊂ R^2 : Reference frame for algorithm HB.
S : F → R^3 : Target shape function.
T : F → R^C : Target texture function.
u ∈ F : Target coordinates in the reference frame.
S ∈ M_{3×Nv} : Target 3D shape.
s ∈ R^3 : Shape coordinates in Euclidean space.
s_0 ∈ R^3 : Mean shape of the target generative model.
s_i ∈ R^3 : i-th deformation basis of the target generative model.
n^⊤ ∈ R^3 : Normal vector to a given triangle; n is normalized by the triangle depth (i.e. if x belongs to the triangle, then n^⊤ x = 1).
B_s ∈ M_{3×K} : Basis of deformations.
c ∈ R^K : Vector containing the K deformation coefficients.
H_A ∈ M_{3×3} : Affine warp between the image reference frame and F.
Ṙ_Δ : Derivative of the rotation matrix R with respect to the Euler angle Δ ∈ {α, β, γ}.
λ ∈ R : Homogeneous scale factor.
v ∈ R^3 : Change of variables defined as v = K⁻¹ H_A u.

Function Naming Conventions

f_H82D : P^2 → P^2 : 8-dof homography.
f_H6P : P^2 → P^2 : Plane-induced homography.
f_H6S : P^2 → P^2 : Shape-induced homography.
f_3DTM : P^2 → P^2 : 3D Textured Model motion model.
f_H6D : P^2 → P^2 : Deformable shape-induced homography.
f_3DMM : P^2 → P^2 : 3D Textured Morphable Model motion model.
ε : R^P → R : Reprojection error function.

Algorithms Naming Conventions

LK : Lucas-Kanade algorithm [Lucas and Kanade, 1981] (1).
HB : Hager-Belhumeur factorization algorithm [Hager and Belhumeur, 1998].
IC : Inverse Compositional algorithm [Baker and Matthews, 2004].
FC : Forward Compositional algorithm [Baker and Matthews, 2004].
GIC : Generalized Inverse Compositional algorithm [Brooks and Arbel, 2010].
EFC : Efficient Forward Compositional algorithm.
LKH8 : Lucas-Kanade algorithm for homographies.
LKH6 : Lucas-Kanade algorithm for plane-induced homographies.
LK3DTM : Lucas-Kanade algorithm for 3D Textured Models (rigid).
LK3DMM : Lucas-Kanade algorithm for 3D Morphable Models (deformable).
HB3DTR : Full-factorized HB algorithm for 6-dof motion in R^3 [Sepp, 2006].
HB3DTM : Full-factorized HB algorithm for 3D Textured Models (rigid).
HB3DMM : Full-factorized HB algorithm for 3D Morphable Models (deformable).
HB3DMMSF : Semi-factorized HB algorithm for 3D Morphable Models.
HB3DMMNF : HB algorithm for 3D Morphable Models without the factorization stage.
ICH8 : IC algorithm for homographies.
ICH6 : IC algorithm for plane-induced homographies.
GICH8 : GIC algorithm for homographies.
GICH6 : GIC algorithm for plane-induced homographies.
IC3DRT : IC algorithm for 6-dof motion in R^3 [Munoz et al., 2005].
FCH6PP : FC algorithm for plane+parallax homographies.

(1) We only show the most relevant citation for each algorithm.

Chapter 1

Introduction

This thesis deals with the problems of registration and tracking in sequences of images. Both problems are classical topics in Computer Vision and Image Processing that have been widely studied in the past. We summarize the subjects of this thesis in the dissertation title:

Efficient Model-based 3D Tracking by using Direct Image Registration

What is 3D Tracking? Let the target be a part of the scene, e.g. the cube in Figure 1.1. We define tracking as the process of repeatedly computing the target state in a sequence of images. When we describe this state as the relative 3D orientation and location of the target with respect to the coordinate system of the camera (or another arbitrary reference system), we refer to this process as 3D rigid tracking (see Figure 1.1). If we also include state parameters that describe the possible deformation of the object, we have 3D nonrigid or deformable tracking (see Figure 1.2). We use 3D tracking to refer to both the rigid and the nonrigid case.

What is Direct Image Registration? When the target is imaged by two cameras from different points of view, the resulting images differ even though they represent the same portion of the scene (see Figure 1.3). Image Registration, or Image Alignment, computes the geometric transformation that best aligns the coordinate systems of the two images, such that their pixel-wise differences are minimal (cf. Figure 1.3). We say that image registration is a direct method when we register the coordinate systems by using only the brightness differences of the images.
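In the notation introduced above, direct registration can be stated as an optimization problem. A representative form, assuming an L2 brightness error (the precise dissimilarity D is defined in Chapter 3), is:

```latex
% Direct image registration: find the warp parameters that minimize
% the squared brightness differences over the target region X.
\mu^{*} = \arg\min_{\mu}\, D(\mathcal{X};\mu),
\qquad
D(\mathcal{X};\mu) = \sum_{\mathbf{x}\in\mathcal{X}}
  \bigl\| I\bigl(f(\mathbf{x};\mu), t\bigr) - T(\mathbf{x}) \bigr\|^{2}.
```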

What is Model-based? We say that a technique is model-based when we restrict the information from the real world by using certain assumptions: on the target dynamics, on the target structure, on the camera sensing process, etc. For example, in Figure 1.1 we model the target with a cube structure and rigid-body dynamics.


Figure 1.1: Example of 3D rigid tracking. (Left) Selected frames of a scene containing a textured cube. We track the object and overlay its state in blue. (Right) The relative position of the camera (represented by a coloured pyramid) and the cube, computed from the estimated 3D parameters.

Figure 1.2: 3D nonrigid tracking. Selected frames from a sequence of a cushion under a bending motion. We track landmarks on the cushion through the sequence, and we plot the resulting triangular mesh for the selected frames. The motion of the landmarks is both global (translation of the mesh) and local (changes in the relative position of the mesh vertices due to the deformation). Source: Alessio del Bue.

And finally, what does Efficient mean? We say that a method is efficient if it substantially improves the computation time with respect to gold-standard techniques. In practical terms, efficient is equivalent to real-time, i.e. the tracking procedure operates at 25 frames per second.


Figure 1.3: Image registration. (Top row) Images of a portion of the scene from two distinct points of view. We have outlined the target in blue (top left) and green (top right). (Bottom) The left image is warped such that the coordinates of the target match up in both images. Source: Graffiti sequence, from the Oxford Visual Geometry Group.


1.1 Motivation

In less than thirty years, video tracking has gone from being confined to academic and military environments to enjoying widespread recognition, mainly thanks to the media.


Thus, video tracking is now a staple in sci-fi shows and films, where futuristic head-up displays (HUDs) work in a show-and-tell fashion, a camera surveillance system can locate an object or a person, and a robot can address people and even recognize their mood.

However, TV is, sad to say, years ahead of reality. Current video tracking systems are still at a primitive stage: they are inaccurate, sloppy, slow, and usually work only under laboratory conditions. Nevertheless, video tracking is progressing by leaps and bounds, and it will probably match some sci-fi standards soon.

We investigate the problem of efficiently tracking an object in a video sequence. Nowadays there exist several efficient optimization algorithms for video tracking and image registration. We study two of the fastest algorithms available: the Hager-Belhumeur factorization algorithm and the Baker-Matthews inverse compositional algorithm. Both algorithms, although very efficient for planar registration, present various problems for 3D tracking. This thesis studies which assumptions can be made with these algorithms, while underlining their limitations through extensive testing. Eventually, the objective is to provide a detailed description of each algorithm, pointing out its pros and cons, leading to a kind of Quick Guide to Efficient Tracking Algorithms.

1.2 Applications

Typical applications of 3D tracking include target localization for military operations; security and surveillance tasks such as person counting, face identification, people detection, determining people's activity, or detecting abandoned objects; and human-computer interaction for computer security, aids for disabled people, or even controlling video games. Tracking is also used for augmenting video sequences with additional information such as advertisements, expanding information about the scene, or adding or removing objects of the scene. We show some examples of actual industrial applications in Figure 1.4.

A tracking process that is widely used in the film industry is motion capture: we track the motion of the different parts of an actor's body using a suit equipped with reflective markers; then, we transfer the estimated motion to a computer-generated character (see Figure 1.5). Using this technique, we can animate a synthetic 3D character in a movie, such as Gollum in The Lord of the Rings trilogy (2001) or Jar-Jar Binks in the new Star Wars trilogy (1999). Other relevant movies that employ these techniques are Polar Express (2004), King Kong (2005), Beowulf (2007), A Christmas Carol (2009), and Avatar (2009). Furthermore, we can generate a completely computer-generated movie populated with characters animated through motion capture. Facial motion capture is of special interest to us: we animate a computer-generated facial expression by facial expression tracking (see Figure 1.5).

We turn our attention to markerless facial motion capture, that is, the process of recovering the facial expression and orientation without using fiducial markers. Markerless motion capture does not require special equipment (such as close-up cameras) or a complicated set-up on the actor's face (such as special reflective make-up or facial stickers).


Figure 1.4: Industrial applications of 3D tracking. (Top left) Augmented reality inserts virtual objects into the scene. (Top middle) Augmented reality shows additional information about tracked objects in the scene. Source: Hawk-Eye, Hawk-Eye Innovations Ltd., ©2008. (Top right) Tracking a pedestrian for video surveillance. Source: Martin Communications, ©1998-2007. (Bottom left) People-flow counting by tracking. Source: EasyCount, by Keeneo, ©2010. (Bottom middle) Car tracking detects possible traffic infractions or estimates car speed. Source: Fibridge. (Bottom right) Body tracking for interactive control of video games. Source: Kinect, Microsoft, ©2010.

In this thesis we propose a technique that captures facial expression motion by using only brightness information and prior knowledge of the deformation of the target (see Figure 1.6).

1.3 Contributions of the Thesis

We outline the remaining chapters of the thesis and their principal contributions as follows:

Chapter 2: Literature Review. We provide a detailed survey of the literature on techniques for both image registration and tracking.

Chapter 3: Efficient Image Registration. We review the state of the art in efficient methods. We introduce a taxonomy for efficient registration algorithms: an algorithm is classified as either additive or compositional.


Figure 1.5: Motion capture in the film industry. Facial and body motion capture from Avatar (top row) and Polar Express (bottom row). (Left column) The body motion and head pose are computed using reflective fiducial markers: the grey spheres of the motion-capture jumpsuit. For facial expression capture, many smaller markers and even close-up cameras are used. (Right column) The estimated motion is used to animate characters in the movie. Source: Avatar, 20th Century Fox, ©2009; Polar Express, Warner Bros. Pictures, ©2004.


Chapter 4: Equivalence of Gradients. We introduce the gradient equivalence equation constraint, and we show that satisfying this assumption has positive effects on the performance of the algorithms.

Chapter 5: Additive Algorithms. We review the constraints that determine the convergence of additive registration algorithms, especially the factorization approach. We provide a methodical procedure to factorize an algorithm in general form, and we state a basic set of theorems and lemmas that enable us to systematize the factorization. We introduce two tracking algorithms based on factorization: one for rigid 3D objects, and another for deformable 3D objects.


Figure 1.6: Markerless facial motion capture. (Top) Several frames in which the face modifies both its orientation (due to a rotation) and its shape (due to changes in facial expression). (Bottom) The tracking state vector includes both pose and deformation. Legend: blue, actual projection of the target shape using the estimated parameters; pink, highlighted projections corresponding to the profiles of the jaw, eyebrows, lips and nasolabial wrinkles.

Chapter 6: Compositional Algorithms. We review the basic inverse compositional algorithm. We introduce an alternative efficient compositional algorithm that is equivalent to the inverse compositional algorithm under certain assumptions. We show that if the gradient equivalence equation holds, then both efficient compositional methods converge.

Chapter 7: Computational Complexity. We study the resources used by the registration algorithms in terms of their computational complexity. We compare the theoretical complexities of efficient and non-efficient algorithms.

Chapter 8: Experiments. We devise a set of experimental tests that confirm our assumptions about the registration algorithms, that is, (1) the dependence of convergence on the algorithm constraints, and (2) the evaluation of the theoretical complexities against actual data.

Chapter 9: Conclusions and Future Work. Finally, we draw conclusions about where each technique is most suitable, and we provide insight into future work to improve the proposed methods.


Chapter 2

Literature Review

In this chapter we review the basic literature on tracking and image registration. First, we outline the similarities and differences between image registration and tracking. Then, we review the usual methods for each problem.

2.1 Image Registration vs. Tracking

The frontier between image registration and tracking is a bit fuzzy: tracking identifies the location of an object in a sequence of images, whereas registration finds the pixel-to-pixel correspondence between a pair of images. Note that in both cases we compute a geometric and photometric transformation between images: pairwise in the context of image registration, and among multiple images in the tracking case. Although we may use the terms registration and tracking interchangeably, we define the following subtle semantic differences between them:

• Image registration finds the best alignment between two images of the same scene. We use a geometric transformation to align the images from both cameras. We consider that image registration emphasizes finding the best alignment between two images in visual terms, not accurately recovering the parameters of the transformation; this is usually the case in, e.g., medical applications.

• Tracking finds the location of a target object in each frame of a sequence. We assume that the difference in object position between two consecutive frames is small. In tracking we are typically interested in recovering the parameters describing the state of the object rather than the coordinates of its location: we can describe an object using richer information than just its position (e.g. 3D orientation, modes of deformation, lighting changes, etc.). This is usually the case in robotics [Benhimane and Malis, 2007; Cobzas et al., 2009; Nick Molton, 2004], or augmented reality [Pilet et al., 2008; Simon et al., 2000; Zhu et al., 2006].


Also, image registration involves two images with an arbitrary baseline, whereas tracking usually operates on a sequence with a small inter-frame baseline. We assume that tracking is a higher-level problem than image registration. Furthermore, we propose a tracking-by-registration approach: we track an object through a sequence by iteratively registering pairs of consecutive images [Baker and Matthews, 2004]; however, we can perform tracking without any registration at all (e.g. tracking-by-detection [Viola and Jones, 2004], or tracking-by-classification [Vacchetti et al., 2004]).

2.2 Image Registration

Image registration is a classic topic in computer vision and numerous approaches have been proposed in the literature; two good surveys on the subject are [Brown, 1992] and [Zitova, 2003]. The process involves computing the pixel-to-pixel correspondence between the two images: that is, for each pixel in one image we find the corresponding pixel in the other image, such that both pixels are projections of the same actual point in the scene (cf. Figure 1.3). Applications include image mosaicing [Capel, 2004; Irani and Anandan, 1999; Shum and Szeliski, 2000], video stitching [Caspi and Irani, 2002], super-resolution [Capel, 2004; Irani and Peleg, 1991], region tracking [Baker and Matthews, 2004; Hager and Belhumeur, 1998; Lucas and Kanade, 1981], recovering scene/camera motion [Bartoli et al., 2003; Irani et al., 2002], and medical image analysis [Lester and Arridge, 1999].

Image registration methods commonly fall into one of the following two groups [Bartoli, 2008; Capel, 2004; Irani and Anandan, 1999]:

Direct methods A direct image registration method aligns two images by using only the colour values (or intensity values, for greyscale data) of the pixels that are common to both images (namely, the region of support). Direct methods minimize an error measure based on the image brightness over the region of support. Typical error measures include the L2-norm of the brightness difference [Irani and Anandan, 1999; Lucas and Kanade, 1981], normalized cross-correlation [Brooks and Arbel, 2010; Lewis, 1995], and mutual information [Dowson and Bowden, 2008; Viola and Wells, 1997] (see the sketch after this list).

Feature-based methods In feature-based methods, we align two images by computing the geometric transformation between a set of salient features detected in each image. The idea is to abstract distinct geometric image features that are more reliable than the raw intensity values; typically, these features are invariant to changes in the camera point of view, illumination conditions, scale, or orientation of the scene [Schmid et al., 2000]. Corners or interest points [Bay et al., 2008; Harris and Stephens, 1988; Lowe, 2004; Torr and Zisserman, 1999] are the classical features in the literature, although we can use other features such as edges [Bartoli et al., 2003] or extremal image regions [Matas et al., 2002].
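As a concrete illustration of the two families, the sketch below aligns two greyscale images with OpenCV: a direct method (findTransformECC, which maximizes an enhanced correlation coefficient, a close relative of the brightness-error criteria studied in this thesis) and a feature-based pipeline (SIFT interest points, ratio-test matching, and robust homography fitting with RANSAC). The function and parameter choices are ours, for illustration only; they are not the algorithms developed in later chapters.

```python
import cv2
import numpy as np

def register_direct(template, image):
    """Direct method: iteratively maximize the enhanced correlation
    coefficient (ECC) of the pixel intensities over the region of
    support, starting from an identity homography."""
    warp = np.eye(3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
    _, warp = cv2.findTransformECC(template, image, warp,
                                   cv2.MOTION_HOMOGRAPHY, criteria, None, 5)
    return warp

def register_features(template, image):
    """Feature-based method: SIFT interest points, ratio-test
    matching, and robust homography estimation with RANSAC."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(template, None)
    kp2, des2 = sift.detectAndCompute(image, None)
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    warp, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return warp
```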


Direct or feature-based methods? Choosing between direct and feature-based methods is not an easy task: we have to know the strong points of each method and the applications for which it is most suitable. A good comparison between the two types of methods is [Capel, 2004]. Feature-based methods typically show strong invariance to a wide range of photometric and geometric transformations of the image, and they are more robust to partial occlusions of the scene than their direct counterparts [Capel, 2004; Torr and Zisserman, 1999]. On the other hand, direct methods can align images with sub-pixel accuracy, estimate the dominant motion even when multiple motions are present, and provide a dense motion field in the case of 3D estimation [Irani and Anandan, 1999]. Moreover, direct methods do not require high-frequency textured surfaces (corners) to operate, but perform optimally with smooth grey-level transitions [Benhimane et al., 2007].

2.3 Model-based 3D Tracking

In this section we define model-based tracking, and we review the literature on 3D tracking of rigid and nonrigid objects. A special case of interest among nonrigid objects is the 3D tracking of human faces, or facial motion capture. We can recover the 3D orientation and position of the target with respect to the camera (or an arbitrary reference system), or the relative displacement and orientation of the camera with respect to the target (or another arbitrary reference system in the scene) [Sepp, 2008]. A good survey on the subject is [Lepetit and Fua, 2005].

2.3.1 Modelling assumptions

In model-based techniques we use a priori knowledge about the scene, the target, or the sensing device as the basis for the tracking procedure. We classify these assumptions about real-world information as follows:

Target model
The target model specifies how the information about the structure of the scene is represented in our algorithms. Template tracking, or template matching, simply represents the target as the pixel intensity values inside a region defined in one image: we call this region (or the image itself) the reference image or template. One of the first techniques proposed for template matching was [Lucas and Kanade, 1981], although it was initially devised for solving optical flow problems. The literature proposes numerous extensions to this technique [Baker and Matthews, 2004; Benhimane and Malis, 2007; Brooks and Arbel, 2010; Hager and Belhumeur, 1998; Jurie and Dhome, 2002a].

We may also allow the target to deform its shape: this deformation induces changes in the target's projected appearance. We model these changes in target texture by using generative models such as eigenimages [Black and Jepson, 1998; Buenaposada et al., 2009], Active Appearance Models (AAM) [Cootes et al., 2001], active blobs [Sclaroff and Isidoro, 2003], or subspace representations [Ross et al., 2004]. Instead of modelling brightness variations, we may represent target shape deformation by using a linear model of the locations of a set of feature points [Blanz and Vetter, 2003; Bregler et al., 2000; Del Bue et al., 2004], or finite element meshes [Pilet et al., 2005; Zhu et al., 2006]. Alternative approaches model the nonrigid motion of the target by using anthropometric data [Decarlo and Metaxas, 2000], or a probability distribution of the intensity values of the target region [Comaniciu et al., 2000; Zimmermann et al., 2009].

These techniques are suitable for tracking planar objects in the scene. If we add further knowledge about the scene, we can track more complex objects: with a proper model we are able to recover 3D information. Typically, we use a wireframe 3D model of the target, and tracking consists of finding the best alignment between the sensed image and the 3D model [Cipolla and Drummond, 1999; Kollnig and Nagel, 1997; Marchand et al., 1999]. We can augment this model by adding further texture priors, either from the image stream [Cobzas et al., 2009; Munoz et al., 2005; Sepp and Hirzinger, 2003; Vacchetti et al., 2004; Xiao et al., 2004a; Zimmermann et al., 2006], or from an external source (e.g. a 3D scanner or a texture mosaic) [Hong and Chung, 2007; La Cascia et al., 2000; Masson et al., 2004, 2005; Pressigout and Marchand, 2007; Romdhani and Vetter, 2003].
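The linear generative models mentioned above typically express the target shape as a mean shape plus a weighted combination of deformation bases. A generic form, written with the symbols of the notation list (mean shape s_0, bases s_i, basis matrix B_s, coefficients c) and intended only as a sketch of the idea (the exact morphable model is defined in Chapter 5), is:

```latex
% Linear generative shape model: a shape point s is the mean shape
% s_0 plus a linear combination of K deformation bases s_1, ..., s_K,
% weighted by the coefficient vector c = (c_1, ..., c_K)^T.
\mathbf{s}(\mathbf{c})
  = \mathbf{s}_0 + \sum_{i=1}^{K} c_i\,\mathbf{s}_i
  = \mathbf{s}_0 + \mathtt{B}_s\,\mathbf{c}.
```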

Motion model
The motion model describes the target kinematics, i.e. how the object modifies its position in the image or the scene. The motion model is tightly coupled to the target model: it is usually represented by a geometric transformation that maps the coordinates of the target model into a different set of coordinates. For a planar target, these geometric transformations are typically affine [Hager and Belhumeur, 1998], homographic [Baker and Matthews, 2004; Buenaposada and Baumela, 1999], or spline-based warps [Bartoli and Zisserman, 2004; Brunet et al., 2009; Lester and Arridge, 1999; Masson et al., 2005]. For actual 3D targets, the geometric warps compute the rotation and translation of the object using a 6 degree-of-freedom (dof) rigid-body transformation [Cipolla and Drummond, 1999; La Cascia et al., 2000; Marchand et al., 1999; Sepp and Hirzinger, 2003].

Camera model
The camera model specifies how the images are sensed by the camera. The pinhole camera models the imaging device as a projector of the coordinates of the scene [Hartley and Zisserman, 2004]. For tracking zoomed objects located far away, we may use orthographic projection [Brand and R. Bhotika, 2001; Del Bue et al., 2004; Tomasi and Kanade, 1992; Torresani et al., 2002]. Perspective projection accounts for perspective distortion, and it is more suitable for close-up views [Munoz et al., 2005, 2009]. The camera model may also account for model deviations such as lens distortion [Claus and Fitzgibbon, 2005; Tsai, 1987].
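Combining a 6-dof rigid motion model with a pinhole camera gives the kind of warp used throughout this thesis. The sketch below is a representative form written in the notation introduced earlier (projection p, intrinsics K, rotation R, translation t); the exact warps f3DTM and f3DMM are defined in Chapter 5.

```latex
% 6-dof rigid motion composed with pinhole projection: the 3D point X
% is rotated and translated by (R, t), mapped through the intrinsics K,
% and projected by p, which divides by the depth coordinate. The motion
% parameters are mu = (alpha, beta, gamma, t_x, t_y, t_z).
f(\mathbf{X};\mu) = \mathrm{p}\bigl(\mathtt{K}(\mathtt{R}\,\mathbf{X} + \mathbf{t})\bigr),
\qquad
\mathrm{p}\bigl((x, y, z)^{\top}\bigr) = (x/z,\; y/z)^{\top}.
```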


Other model assumptions
We can also model prior photometric knowledge about the target or the scene, such as illumination cues [La Cascia et al., 2000; Lagger et al., 2008; Romdhani and Vetter, 2003], or global colour [Bartoli, 2008].

2.3.2 Rigid Objects

We can follow two strategies to recover the 3D parameters of a rigid object:

2D Tracking The first group of methods involves a two-step process: first we compute the 2D motion of the object as a displacement of the target projection on the image; second, we recover the actual 3D parameters from the computed 2D displacements by using the scene geometry. A natural choice is to use optical flow: [Irani et al., 1997] computes the dominant 2D parametric motion between two frames to register the images; the residual displacement—the image regions that cannot be registered—is used to recover the 3D motion. When the object is a 3D plane, we can use a homographic transformation to compute plane-to-plane correspondences between two images; then we recover the actual 3D motion of the plane using the camera geometry [Buenaposada and Baumela, 2002; Lourakis and Argyros, 2006; Simon et al., 2000]. We can also compute the inter-frame displacements by using linear regressors or predictors, and then robustly adjust the projections to a target model—using RANSAC—to compute the 3D parameters [Zimmermann et al., 2009]. An alternative method is to compute pixel-to-pixel correspondences by using a classifier [Lepetit and Fua, 2006], and then recover the target 3D pose using POSIT [Dementhon and Davis, 1995], or equivalent methods [Lepetit et al., 2009].

3D Tracking These methods directly compute the actual 3D motion of the object from the image stream. They mainly use a 3D model of the target to compute the motion parameters; the 3D model contains a priori knowledge of the target that improves the estimation of the motion parameters (e.g. it removes projective ambiguities). The simplest way to represent a 3D target is using a texture model—a set of image patches sensed from one or several reference images—as in [Cobzas et al., 2009; Devernay et al., 2006; Jurie and Dhome, 2002b; Masson et al., 2004; Sepp and Hirzinger, 2003; Xu and Roy-Chowdhury, 2008]. The main drawback of these methods is their lack of robustness against changes in scene illumination and specular reflections. We can alternatively fit the projection of a 3D wireframe model (e.g. a cad model) to the edges of the image [Drummond and Cipolla, 2002]. However, these methods also have problems with cluttered backgrounds [Lepetit and Fua, 2005]. To gain robustness, we can use hybrid models of texture and contours [Marchand et al., 1999; Masson et al., 2003; Vacchetti et al., 2004], or simply use an additional model to deal with illumination [Romdhani and Vetter, 2003].


2.3.3 Nonrigid Objects

Tracking methods for nonrigid objects fall into the same categories as those we used for rigid objects. Point-to-point correspondences of the deformable target can recover the pose and/or deformation parameters using subspace methods [Del Bue, 2010; Torresani et al., 2008], or by fitting a deformable triangle mesh [Pilet et al., 2008; Salzmann et al., 2007]. We can alternatively fit the 2D silhouette of the target to a 3D skeletal deformable model of the object [Bowden et al., 2000].

Direct estimation of the 3D parameters unifies the processes of matching pixel correspondences and estimating the pose and deformation of the target. [Brand, 2001; Brand and R.Bhotika, 2001] constrain the optical flow by using a linear generative model to represent the deformation of the object. [Gay-Bellile et al., 2010] models the object's 3D deformations, including self-occlusions, by using a set of Radial Basis Functions (rbf).

2.3.4 Facial Motion Capture

Estimation of facial motion parameters is a challenging task; head 3D orientation was typically estimated by using fiducial markers to overcome the inherent difficulty of the problem [Bickel et al., 2007].

However, markerless methods have also been developed in recent years. Facial motion capture involves recovering head 3D orientation and/or face deformation due to changes in expression. We first review techniques for recovering head 3D pose; then we review techniques for recovering both pose and expression.

Head pose estimation There are numerous techniques to compute head pose or 3D orientation. In the following we review a number of them—a recent detailed survey on the subject is [Murphy-Chutorian and Trivedi, 2009]. The main difficulty of estimating head pose lies in the nonconvex structure of the human head. Classic 2D approaches such as [Black and Yacoob, 1997; Hager and Belhumeur, 1998] are only suitable for tracking motions of the head parallel to the image plane: the reason is that these methods only use information from a single reference image. To fully recover the 3D rotation parameters of the head we need additional information. [La Cascia et al., 2000] uses a texture map computed by cylindrical projection of images of the head from different points of view; [Baker et al., 2004a; Jang and Kanade, 2008] also use an analogous cylindrical model. In a similar fashion, we can use a 3D ellipsoid shape [An and Chung, 2008; Basu et al., 1996; Choi and Kim, 2008; Malciu and Preteux, 2000]. Instead of using a cylinder or an ellipsoid, we can have a detailed model of the head like a 3D Morphable Model (3dmm) [Blanz and Vetter, 2003; Munoz et al., 2009; Xu and Roy-Chowdhury, 2008], an aam coupled with a 3dmm [Faggian et al., 2006], or a triangular mesh model of the face [Vacchetti et al., 2004]. The latter is robustly tracked in [Strom et al., 1999] using an Extended Kalman Filter. We can also have a head model with reduced complexity, as in [B. Tordoff et al., 2002].


Face expression estimation A change of facial expression induces a deformation in the 3D structure of the face. The estimation of this deformation can be used for face expression recognition, expression detection, or facial motion transfer. Classic 2D approaches such as aams [Cootes et al., 2001; Matthews and Baker, 2004] are only suitable for recovering expressions from a frontal face. 3D aams are the three-dimensional extension of these 2D methods: they adjust a statistical model of 3D shape and texture—typically a PCA model—to the pixel intensities of the image [Chen and Wang, 2008; Dornaika and Ahlberg, 2006]. Hybrid methods that combine 2D and 3D aams show both real-time performance and actual 3D head pose estimation: we can use the 3D aams to simultaneously constrain the 2D aams motion and compute the 3D pose [Xiao et al., 2004b], or directly compute the facial motion from the 2D aams parameters [Zhu et al., 2006]. In contrast to pure 2D aams, 3D aams can recover actual 3D pose and expression from faces that are not frontal to the camera. However, the out-of-plane rotations that these methods can recover are typically smaller than those of a pure 3D model (e.g. a 3dmm). [Blanz and Vetter, 2003; Romdhani and Vetter, 2003] search for the best configuration of a 3dmm such that the differences between the rendered model and the image are minimal; both methods also show great performance in recovering strong facial deformations. Real-time alternatives using 3dmms include [Hiwada et al., 2003; Munoz et al., 2009]. [Pighin et al., 1999] estimates realistic facial expressions using a linear combination of 3D face models fitted to match the images. Finally, [Decarlo and Metaxas, 2000] derives an anthropometric, physically-based face model that may be adjusted to each individual face target; besides, they solve a dynamic system for the face pose and expression parameters by using optical flow constrained by the edges of the face.


Chapter 3

Efficient Direct Image Registration

3.1 Introduction

This chapter reviews the problem of efficiently registering two images. We define the Direct Image Alignment (dia) problem as the process of computing the transformation between two frames using only image brightness information. We organize the chapter as follows: Section 3.2 introduces basic registration notions; Section 3.3 reviews additive registration algorithms such as Lucas-Kanade and Hager-Belhumeur; Section 3.4 reviews compositional registration algorithms such as Baker and Matthews' Forward Compositional and Inverse Compositional; finally, other methods are reviewed in Section 3.5.

3.2 Modelling Assumptions

This section reviews the assumptions about the real world that we use to mathematically model the registration procedure. We introduce the notation for the imaging process through a pinhole camera. We establish the Brightness Constancy Assumption or Brightness Constancy Constraint (bcc) as the cornerstone of direct image registration techniques. We also pose the registration problem as an iterative optimization problem. Finally, we provide a classification of the existing direct registration algorithms.

3.2.1 Imaging Geometry

We represent points of the scene using Cartesian coordinates in R³ (e.g. X = (X, Y, Z)⊤). We represent points on the image with homogeneous coordinates, so that the pixel position x = (i, j)⊤ is represented using the notation for augmented points as x̄ = (i, j, 1)⊤. The homogeneous point x = (x₁, x₂, x₃)⊤ is conversely represented in Cartesian coordinates using the mapping p : P² → R², such that p(x) = (x₁/x₃, x₂/x₃)⊤. The scene is imaged through a perfect pin-hole camera [Hartley and Zisserman, 2004]; by abuse of notation, we define the perspective


Figure 3.1: Imaging geometry. An object of the scene is imaged through camera centres C1 and C2 onto two distinct images I1 and I2 (related by a rotation R and a translation t). The point X is projected to the points x1 = p(K[I | 0]X̄) and x2 = p(K[R | −Rt]X̄) in the two images.

projection p : R³ → R² that maps scene coordinates onto image points,

x = p(Xc) = ( k₁⊤Xc / k₃⊤Xc ,  k₂⊤Xc / k₃⊤Xc )⊤,

where K is the 3 × 3 matrix that contains the camera intrinsics, with rows k₁⊤, k₂⊤, k₃⊤ (cf. [Hartley and Zisserman, 2004]), and Xc = (Xc, Yc, Zc)⊤. We implicitly assume that Xc represents a point in the camera reference system. If the points to project are expressed in an arbitrary reference system of the scene, we need an additional mapping; hence, the perspective projection of a point X in the scene is

x̄ = K[R | −Rt] (X⊤, 1)⊤,

where R and t are the rotation and translation between the scene and the camera coordinate systems (see Figure 3.1). Our input is a smooth sequence of images—i.e. inter-frame differences are small—where It is the t-th frame of the sequence. We denote by T the reference image or template. Images are discrete matrices of brightness values, although we represent them as functions from R² to R^C, where C is the number of image channels (i.e. C = 3 for colour, and C = 1 for gray-scale images): It(x) is the brightness value at pixel x. For non-discrete pixel coordinates, we use bilinear interpolation. If X is a set of pixels, we collect the brightness values I(x), ∀x ∈ X, in a single column vector I(X)—i.e., I(X) = (I(x₁), . . . , I(x_N))⊤, with x₁, . . . , x_N ∈ X.


3.2.2 Brightness Constancy Constraint

The bcc relates brightness information between two frames of a sequence [Hager and Belhumeur, 1998; Irani and Anandan, 1999]. The reference image T is one arbitrary image of the sequence. We define the target region X as a set of pixel coordinates X = {x₁, . . . , x_N} defined on T (see Figure 3.2). We define the template as the image values of the target region, that is, T(X). Let us assume we know the transformation of the target region between T and another arbitrary image of the sequence, It. The motion model f defines this transformation as Xt = f(X; µt), where the set of coordinates Xt is the target region on It and µt are the motion parameters. The bcc states that the brightness values of the template T and those of the input image It warped by f with parameters µt should be equal:

T(X) = It(f(X; µt)). (3.1)

The direct conclusion from Equation 3.1 is that the brightness of the target does not depend on its motion—i.e., the relative position and orientation of the camera with respect to the target does not affect the brightness of the latter. However, we may augment the bcc to include appearance changes [Black and Jepson, 1998; Buenaposada et al., 2009; Matthews and Baker, 2004], and changes in illumination conditions due to ambient [Bartoli, 2008; Basri and Jacobs, 2003] or specular lighting [Blanz and Vetter, 2003].

3.2.3 Image Registration by Optimization

Direct image registration is usually posed as an optimization problem. We minimize an error function based on the pixel-wise brightness difference, parameterized by the motion variables:

µ* = argmin_µ D(X; µ)², (3.2)

where

D(X; µ) = T(X) − It(f(X; µ)) (3.3)

is a dissimilarity measure based on the bcc (Equation 3.1).

Descent Methods

Recovering these parameters is typically a non-linear problem, as the cost depends on image brightness—which is usually non-linearly related to the motion parameters. The usual approach is iterative gradient-based descent (GD): from a starting point µ0 in the search space, the method iteratively computes a series of partial solutions µ1, µ2, . . . , µk that, under certain conditions, converge to the local minimizer µ* [Madsen et al., 2004] (see Figure 3.2). We typically use Gauss-Newton (GN) methods for efficient registration because they provide good convergence without computing second derivatives (see Appendix A). Hence, the basic GN-based algorithm for image registration operates as we outline in Algorithm 1 and depict in Figure 3.3. We describe the four stages of the algorithm in the following:


Figure 3.2: Iterative gradient descent image registration. Top-left Template image for the registration. We highlight the target region as a green quadrangle. Top-right Image that we register against the template. We generate the image by rotating the image around its centre and translating it along the X-axis. We highlight the corresponding target region in yellow. We also display the initial guess for the optimization as a green quadrangle; notice that it exactly corresponds to the position of the target region in the template. Bottom-left Contour plot of the image brightness dissimilarity. The axes show the values of the search space: image rotation and translation. We show the successive iterations in the search space: we reach the solution in four steps—µ0 to µ4. Bottom-right We show the target region that corresponds to the parameters at each iteration. The colour of each quadrangle matches the colour of the parameters that generated it, as seen in the Bottom-left figure.


Dissimilarity measure The dissimilarity measure is a function of the image brightness error between two images. The usual measure for image registration is the Sum of Squared Differences (ssd), that is, the L2-norm of the difference of pixel brightness (Equation 3.3) [Brooks and Arbel, 2010; Hager and Belhumeur, 1998; Irani and Anandan, 1999; Lucas and Kanade, 1981]. However, we can use other measures such as normalized cross-correlation [Brooks and Arbel, 2010; Lewis, 1995], or mutual information [Brooks and Arbel, 2010; Dowson and Bowden, 2008; Viola and Wells, 1997].

Linearize the dissimilarity The next stage linearizes the brightness function about the current search parameters µ; this linearization enables us to transform the problem into a system of linear equations in the search variables. We typically approximate the function using a Taylor series expansion; depending on how many terms—derivatives—we compute, we obtain optimization methods like Gradient Descent [Amberg and Vetter, 2009], Newton-Raphson [Lucas and Kanade, 1981; Shi and Tomasi, 1994], Gauss-Newton [Baker and Matthews, 2004; Brooks and Arbel, 2010; Hager and Belhumeur, 1998] or even higher-order methods [Benhimane and Malis, 2007; Keller and Averbuch, 2004, 2008; Megret et al., 2008]. This is theoretically a good approximation when the dissimilarity is small [Irani and Anandan, 1999], although the estimation can be improved by using coarse-to-fine iterative methods [Irani and Anandan, 1999], or by selecting appropriate pixels [Benhimane et al., 2007]. Although Taylor series expansion is the usual approach to compute the coefficients of the system, other approaches such as linear regression [Cootes et al., 2001; Jurie and Dhome, 2002a] or numeric differentiation [Gleicher, 1997] may be used.

Compute the descent direction The descent direction is a vector δµ in the search space such that D(µ + δµ) < D(µ). In a GN-based algorithm, we solve the linear system of equations of the previous stage using least-squares [Baker and Matthews, 2004; Madsen et al., 2004]. Note that we do not perform a line search stage—i.e., we implicitly assume that the step size is α = 1, cf. Appendix A.

Update the search parameters Once we have determined the search direction δµ, we compute the next point in the series by using the update function U : R^P → R^P: µ1 = U(µ0, δµ). We compute the dissimilarity value at µ1 to check convergence: if the dissimilarity is below a given threshold, then µ1 is the minimizer µ*—i.e., µ* = µ1; otherwise, we repeat the whole process (i.e. µ1 becomes the current parameters µ) until we find a suitable minimizer.
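The four stages above map directly onto a short driver loop. The following Python sketch is our own illustration of this generic scheme, not code from the thesis; residual_fn, jacobian_fn and update_fn are hypothetical callables supplied per algorithm:

```python
import numpy as np

def register(residual_fn, jacobian_fn, update_fn, mu0, max_iters=30, tol=1e-6):
    """Generic GN-based descent loop for image registration (cf. Algorithm 1).

    residual_fn(mu) -> r       : dissimilarity residuals, e.g. Equation 3.3
    jacobian_fn(mu) -> J       : linearization of the residuals at mu
    update_fn(mu, delta) -> mu : update function U (additive or compositional)
    """
    mu = np.asarray(mu0, dtype=float)
    for _ in range(max_iters):
        r = residual_fn(mu)                          # stage 1: dissimilarity measure
        if r @ r < tol:                              # convergence check on D(mu)
            break
        J = jacobian_fn(mu)                          # stage 2: linearization
        delta = -np.linalg.solve(J.T @ J, J.T @ r)   # stage 3: GN descent direction
        mu = update_fn(mu, delta)                    # stage 4: parameter update
    return mu
```

The loop is deliberately agnostic to the warp and to the update rule: the additive and compositional algorithms of the following sections differ only in what they plug into jacobian_fn and update_fn.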

3.2.4 Additive vs. Compositional

We turn our attention to step 4 of Algorithm 1: how to compute the new estimate of the optimization parameters.


Algorithm 1 Outline of the basic GN-based descent method for image registration

On-line: Let µi = µ0 be the initial guess.
1: while no convergence do
2:   Compute the dissimilarity function D(µi).
3:   Compute the search direction: linearize the dissimilarity and compute the descent direction, δµi.
4:   Update the optimization parameters: µi+1 = U(µi, δµi).
5: end while

Figure 3.3: Generic descent method for image registration. We initialize the current parameter estimation at frame It+1 (µ = µ0) using the local minimizer at the previous frame It (µ0 = µ*_t). We compute the Dissimilarity Measure between the Image and the Template using µ (Equation 3.3). We linearize the dissimilarity measure to compute the descent direction of the search parameters (δµ). We update the search parameters using the search direction and we obtain an approximation to the minimum (µ1). We check whether µ1 is a local minimizer by using the brightness dissimilarity: if D is small enough, then µ1 is the local minimizer (µ* = µ1); otherwise, we repeat the process using µ1 as the current parameter estimation (µ = µ1).


In a GN optimization scheme, the new parameters are typically computed by adding the former optimization parameters to the search direction vector: µt+1 = µt + δµt (cf. Appendix A); this summation is a direct consequence of the definition of the Taylor series [Madsen et al., 2004]. We call additive approaches those methods that update the parameters by using addition [Hager and Belhumeur, 1998; Irani and Anandan, 1999; Lucas and Kanade, 1981]. Nonetheless, Baker and Matthews [Baker and Matthews, 2004] subsequently proposed a GN-based method that updates the parameters using composition—i.e., µt+1 = µt ∘ δµt. We call these methods compositional approaches [Baker and Matthews, 2004; Cobzas et al., 2009; Munoz et al., 2005; Romdhani and Vetter, 2003; Xu and Roy-Chowdhury, 2008].

3.3 Additive approaches

In this section we review some works that use an additive update. We introduce the Lucas-Kanade algorithm, the fundamental work on direct image registration: we show the basic algorithm as well as the common problems regarding the method. We also introduce the Hager-Belhumeur approach to image registration and point out its highlights.

3.3.1 Lucas-Kanade Algorithm

The Lucas-Kanade (LK) algorithm [Lucas and Kanade, 1981] solves the registration problem using a GN optimization scheme. The algorithm defines the residuals r of Equation 3.3 as

r(µ) ≡ T(x) − I(f(x; µ)). (3.4)

The corresponding linear model for these residuals is

r(µ + δµ) ≃ ℓ(δµ) ≡ r(µ) + r′(µ)δµ = r(µ) + J(µ)δµ, (3.5)

where

r(µ) ≡ T(x) − I(f(x; µ)), and J(µ) ≡ ∂I(f(x; µ̂))/∂µ̂ evaluated at µ̂ = µ. (3.6)

Hence, our optimization process now amounts to minimizing

δµ* = argmin_δµ ℓ(δµ)⊤ℓ(δµ) = argmin_δµ L(δµ). (3.7)

We compute the local minimizer of L(δµ) as follows:

0 = L′(δµ) = ∇_δµ ( r(µ)⊤r(µ) + 2δµ⊤J(µ)⊤r(µ) + δµ⊤J(µ)⊤J(µ)δµ ) = J(µ)⊤r(µ) + J(µ)⊤J(µ)δµ. (3.8)

Again, we obtain an approximation to the local minimum at

δµ = −( J(µ)⊤J(µ) )⁻¹ J(µ)⊤r(µ), (3.9)

which we iteratively refine until we find a suitable solution. We summarize the optimization process in Algorithm 2 and Figure 3.4.


Algorithm 2 Outline of the Lucas-Kanade algorithm.

On-line: Let µi = µ0 be the initial guess.
1: while no convergence do
2:   Compute the residuals r(µi) from Equation 3.4.
3:   Linearize the dissimilarity: J(µi) = ∇µ r(µi).
4:   Compute the search direction: δµi = −(J(µi)⊤J(µi))⁻¹ J(µi)⊤r(µi).
5:   Update the optimization parameters: µi+1 = µi + δµi.
6: end while
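As a concrete instance of Algorithm 2, the following sketch (our own, using numpy/scipy; the translation-only warp f(x; µ) = x + µ is chosen for brevity) registers a target region between a template and a frame:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def lucas_kanade_translation(T, I, coords, mu0, n_iters=50, tol=1e-4):
    """Algorithm 2 for a pure-translation warp f(x; mu) = x + mu.

    T, I   : 2D grayscale images (template and current frame).
    coords : (N, 2) array of (i, j) coordinates of the target region X in T.
    mu0    : initial guess for the 2D translation parameters.
    """
    mu = np.asarray(mu0, dtype=float).copy()
    t_vals = map_coordinates(T, coords.T, order=1)    # template brightness T(x)
    grad_i = np.gradient(I.astype(float), axis=0)     # frame derivatives; for LK
    grad_j = np.gradient(I.astype(float), axis=1)     # they change every frame
    for _ in range(n_iters):
        warped = coords + mu                          # f(x; mu)
        r = t_vals - map_coordinates(I, warped.T, order=1)  # residuals, Eq. (3.4)
        # J = grad of r w.r.t. mu: since r = T - I(f(x; mu)) and df/dmu = I_2,
        # each row is -(dI/di, dI/dj) sampled at the warped point.
        J = -np.stack([map_coordinates(grad_i, warped.T, order=1),
                       map_coordinates(grad_j, warped.T, order=1)], axis=1)
        delta = -np.linalg.solve(J.T @ J, J.T @ r)    # GN step, Eq. (3.9)
        mu += delta                                   # additive update
        if np.linalg.norm(delta) < tol:
            break
    return mu
```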

Figure 3.4: Lucas-Kanade image registration. We initialize the current parameter estimation at frame It+1 (µ ≡ µ0) using the local minimizer at the previous frame It (µ0 ≡ µ*_t). We compute the dissimilarity residuals between the Image and the Template using µ (Equation 3.4). We linearize the residuals at the current parameters µ, and we compute the descent direction of the search parameters (δµ). We additively update the search parameters using the search direction and we obtain an approximation to the minimum—i.e. µ1 = µ0 + δµ. We check whether µ1 is a local minimizer by using the brightness dissimilarity: if D is small enough, then µ1 is the local minimizer (µ* ≡ µ1); otherwise, we repeat the process using µ1 as the current parameter estimation (µ ≡ µ1).


Known Issues

The LK algorithm is one instance of a well-known technique for object tracking [Baker and Matthews, 2004]. The most remarkable feature of this algorithm is its robustness: given a suitable bcc, the LK algorithm typically ensures good convergence. However, the algorithm has a series of weaknesses that degrade the overall performance of the tracking:

Computational Cost The LK algorithm computes the Jacobian at each iteration of the optimization loop. Furthermore, the minimization cycle is repeated between each two consecutive frames of the video sequence. The consequence is that the Jacobian is computed F × L times, where F is the number of frames and L is the number of iterations in the optimization loop. The computational burden of these operations is very high if the Jacobian is large: we have to compute the derivatives at each point of the target region, and each point contributes a row to the Jacobian. As an example, Table 7.15—page 106—compares the computational complexity of the LK algorithm with respect to other efficient methods.

Local Minima The GN optimization scheme, which is the basis for the LK algorithm, is prone to getting trapped in local minima. The very essence of the minimization implies that the algorithm converges to the minimum closest to the starting point. So, we must choose the initial guess of the optimization very carefully to ensure convergence to the true optimum. The best way to guarantee that the starting point for tracking and the optimum are close enough is to impose that the differences between consecutive images are small. On the contrary, images with a large baseline will cause problems for LK, as falling into local minima becomes more likely, which leads to incorrect alignment. To solve this problem, common to all direct approaches, a pyramidal implementation of the optimization may be used [Bouguet, 2000].

3.3.2 Hager-Belhumeur Factorization Algorithm

We now review an efficient algorithm for determining the motion parameters of the target. The algorithm is similar to LK, but it uses a priori information about the target motion and structure to save computation time. The Hager-Belhumeur (HB) or factorization algorithm was first proposed by G. Hager and P. Belhumeur in [Hager and Belhumeur, 1998]. The authors noticed the high computational cost of linearizing the brightness error function in the LK algorithm: the dissimilarity depends on each new frame of the sequence, It. The method focuses on how to efficiently compute the Jacobian matrix of step 3 of the LK algorithm (see Algorithm 2). The computation of the Jacobian in the HB algorithm has two separate stages:

1. Gradient replacement
The key idea is to use the derivatives of the template T instead of computing the derivatives of the frame It when estimating J. Hager and Belhumeur dealt with


this issue in a very neat way: they noticed that, if the bcc (Equation 3.1) relates image and template brightness values, it could possibly also relate image and template derivatives—cf. [Hager and Belhumeur, 1998]. Differentiating both sides of Equation 3.1 with respect to the target region coordinates gives

∇x T(x) = ∇x It(f(x; µt)) = ∇x̂ It(x̂)|_{x̂ = f(x; µt)} ∇x f(x; µt), x ∈ X. (3.10)

On the other hand, we compute the Jacobian as

J = ∇µt It(f(x; µt)) = ∇x̂ It(x̂)|_{x̂ = f(x; µt)} ∇µt f(x; µt). (3.11)

We isolate the term ∇x̂ It(x̂) in Equations 3.10 and 3.11, and we equate the remaining terms as follows:

J = ∇x T(x) ∇x f(x; µt)⁻¹ ∇µt f(x; µt). (3.12)

Notice that in Equation 3.12 the Jacobian depends on the template derivatives, ∇x T(x), which are constant. Using template derivatives speeds up the whole process up to 10-fold (cf. Table 7.16, page 106).

2. Factorization
Equation 3.12 reveals the internal structure of the Jacobian: it comprises the product of three matrices: a matrix ∇x T(x) that depends on template brightness values, and two matrices, ∇x f(x; µ)⁻¹ and ∇µt f(x; µ), whose values depend on both the target shape coordinates and the motion parameters µt. The factorization stage re-arranges the Jacobian's internal structure so that we speed up the computation of this matrix product.

A word about factorization In the literature, matrix factorization or matrix decomposition refers to the process that expresses a matrix as the product of matrices of special types. One major example is to factorize a matrix A into the product of a lower triangular matrix L and an upper triangular matrix U, A = LU. This factorization is called lu decomposition and it allows us to solve the linear system Ax = b more efficiently: solving Ux = L⁻¹b requires fewer additions and multiplications than the original system [Golub and Van Loan, 1996]. Other famous examples of matrix factorization are spectral decomposition, Cholesky factorization, Singular Value Decomposition (svd) and qr factorization (see [Golub and Van Loan, 1996] for more information).

The key idea behind using factorization in this problem is the following:

Given a matrix product whose operands contain both constant and variable terms, we want to re-arrange the product such that one operand contains only constant values and the other contains only variable terms.


We write this idea in equation form as follows:

J = ∇x T(x) ∇x f(x; µ)⁻¹ ∇µt f(x; µ) = S(x) M(µ), (3.13)

where S(x) contains only target coordinate values and M(µ) contains only motion parameters. The process of decomposing the matrix J into the product S(x)M(µ) is generally ad hoc: we must gain insight into the analytic structure of the matrices ∇x f(x; µ)⁻¹ and ∇µt f(x; µ) to re-arrange their entries into S(x)M(µ) [Hager and Belhumeur, 1998]. This process is not obvious at all, and it has been a frequent source of criticism of the HB algorithm [Baker and Matthews, 2004]. However, we shall introduce procedures for systematic factorization in Chapter 5.
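To make Equation 3.13 concrete, here is a sketch of one possible factorization for the 2D affine warp f(x; µ) = Ax + t (our own worked example, not the thesis's derivation): since ∇x f = A and ∂f/∂µ = [g 0; 0 g] with g = (i, j, 1), each Jacobian row ∇T(x) A⁻¹ ∂f/∂µ rearranges into a constant row of S times a 6 × 6 matrix M(µ) = kron(A⁻¹, I₃):

```python
import numpy as np

def affine_S(grad_T, coords):
    """Constant factor S(x) of Eq. (3.13) for an affine warp f(x; mu) = A x + t.

    grad_T : (N, 2) template gradients [dT/di, dT/dj] at the target pixels.
    coords : (N, 2) pixel coordinates (i, j) of the target region.
    Returns S of shape (N, 6): per pixel [dT/di * g, dT/dj * g], g = (i, j, 1).
    """
    g = np.hstack([coords, np.ones((len(coords), 1))])           # (N, 3)
    return np.hstack([grad_T[:, :1] * g, grad_T[:, 1:2] * g])    # (N, 6)

def affine_M(A):
    """Parameter-dependent factor M(mu): the 6x6 block matrix kron(A^-1, I3)."""
    return np.kron(np.linalg.inv(A), np.eye(3))

# Per iteration, the Jacobian of Eq. (3.12) reduces to one small product:
#   J = affine_S(grad_T, coords) @ affine_M(A_of_mu)
```

Only the small matrix M(µ) is rebuilt at each iteration; the large N × 6 factor S(x) is computed once off-line, which is precisely the source of the HB speed-up.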

We outline the basic HB optimization in Algorithm 3; notice that the only difference from the LK algorithm lies in the Jacobian computation. We depict the differences more clearly in Figure 3.5: in the dissimilarity linearization stage we use the derivatives of the template instead of those of the frame.

Algorithm 3 Outline of the Hager-Belhumeur algorithm.

Off-line: Let µi = µ0 be the initial guess.
1: Compute S(x).
On-line:
2: while no convergence do
3:   Compute the residuals r(µi) from Equation 3.4.
4:   Compute the matrix M(µi).
5:   Compute the Jacobian: J(µi) = S(x)M(µi).
6:   Compute the search direction: δµi = −(J(µi)⊤J(µi))⁻¹ J(µi)⊤r(µi).
7:   Update the optimization parameters: µi+1 = µi + δµi.
8: end while

3.4 Compositional approaches

From Section 3.2.4 we recall the definition of a compositional method: a GN-like optimization method that updates the search parameters using function composition. We review two compositional algorithms: the Forward Compositional (FC) and the Inverse Compositional (IC) [Baker and Matthews, 2004].

A word about composition Function composition is usually defined as the application of the result of one function to another. Let f : X → Y and g : Y → Z be two functions. We define the composite function g ∘ f : X → Z as (g ∘ f)(x) = g(f(x)). In the literature on image registration the problem is posed as follows: let f : R² → R² be the target motion model parameterized by µ. We compose the target motion as


Figure 3.5: Hager-Belhumeur image registration. We initialize the current parameter estimation at frame It+1 (µ ≡ µ0) using the local minimizer at the previous frame It (µ0 ≡ µ*_t). We additionally create the matrix S(x), whose entries depend on the target values. We compute the dissimilarity residuals between the Image and the Template using µ (Equation 3.4). Instead of linearizing the residuals, we compute the Jacobian matrix at µ using Equation 3.12, and we solve for the descent direction using Equation 3.9. We additively update the search parameters using the search direction and we obtain an approximation to the minimum—i.e. µ1 = µ0 + δµ. We check whether µ1 is a local minimizer by using the brightness dissimilarity: if D is small enough, then µ1 is the local minimizer (µ* ≡ µ1); otherwise, we repeat the process using µ1 as the current parameter estimation (µ ≡ µ1).


z = f(f(x; µ1); µ2) = f(x; µ1 ∘ µ2) ≡ f(x; µ3); that is, the coordinates z are the result of mapping x onto y = f(x; µ1) and y onto z = f(y; µ2). We represent the composite parameters as µ3 = µ1 ∘ µ2 such that z = f(x; µ3).
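For warps that form a group, composing parameters reduces to multiplying the corresponding transformation matrices. A minimal sketch for homographic warps (the 8-vector parameterization is our own choice for the example):

```python
import numpy as np

def compose(mu1, mu2):
    """Compose homography warp parameters: f(f(x; mu1); mu2) = f(x; mu1 ∘ mu2).

    Each mu is the 8-vector of a 3x3 homography stored row-wise with H[2,2] = 1.
    The composite parameters mu3 = mu1 ∘ mu2 of the text correspond to the
    matrix product H2 @ H1 (the warp with mu1 is applied first).
    """
    H1 = np.append(mu1, 1.0).reshape(3, 3)
    H2 = np.append(mu2, 1.0).reshape(3, 3)
    H3 = H2 @ H1
    H3 /= H3[2, 2]                 # renormalize the free projective scale
    return H3.ravel()[:8]
```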

3.4.1 Forward Compositional Algorithm

The FC algorithm was first proposed in [Shum and Szeliski, 2000], although the terminology was introduced in [Baker and Matthews, 2001]: FC is an optimization algorithm, equivalent to the LK approach, that relies on a compositional update step. Compositional algorithms for image registration use a brightness dissimilarity function slightly different from Equation 3.3; we pose the image registration problem as the following optimization:

µ* = argmin_µ D(X; µ)², (3.14)

with

D(X; µ) = T(X) − It+1(f(f(X; µ); µt)), (3.15)

where µt comprises the optimal parameters at image It. Note that our search variables µ are the parameters that must be composed with the current estimation to yield the minimum. The residuals corresponding to Equation 3.15 are

r(µ) ≡ T(x) − It+1(f(f(x; µ); µt)). (3.16)

As in the LK algorithm, we compute the linear model of the residuals, but now at the point µ = 0 in the search space:

r(0 + δµ) ≃ ℓ(δµ) ≡ r(0) + r′(0)δµ = r(0) + J(0)δµ, (3.17)

where

r(0) ≡ T(x) − It+1(f(f(x; 0); µt)), and J(0) ≡ ∂It+1(f(f(x; µ̂); µt))/∂µ̂ evaluated at µ̂ = 0. (3.18)

Notice that, in this case, µt acts as a constant in the derivative. Again, the local minimizer is

δµ = −( J(0)⊤J(0) )⁻¹ J(0)⊤r(0). (3.19)

We iterate the above procedure until convergence. The next point in the iterative series is not computed as µt+1 = µt + δµ, but as µt+1 = µt ∘ δµ, to be coherent with Equation 3.16. Also notice that the Jacobian J(0) (Equation 3.18) is not constant, as it depends both on the image It+1 and on the parameters µt. Figure 3.6 shows a graphical depiction of the algorithm, which we outline in Algorithm 4.


Algorithm 4 Outline of the Forward Compositional algorithm.

On-line: Let µi = µ0 be the initial guess.
1: while no convergence do
2:   Compute the residuals r(µi) from Equation 3.16.
3:   Linearize the dissimilarity: J(0) = ∇µ r(0), using Equation 3.18.
4:   Compute the search direction: δµi = −(J(0)⊤J(0))⁻¹ J(0)⊤r(0).
5:   Update the optimization parameters: µi+1 = µi ∘ δµi.
6: end while

Figure 3.6: Forward compositional image registration. We initialize the current parameter estimation at frame It+1 (µ ≡ µ0) using the local minimizer at the previous frame It (µ0 ≡ µ*_t). We compute the dissimilarity residuals between the Image and the Template using µ (Equation 3.15). We linearize the residuals at µ = 0 and we compute the descent direction δµ using Equation 3.19. We update the parameters using function composition—i.e. µ1 = µ0 ∘ δµ. We check whether µ1 is a local minimizer by using the brightness dissimilarity: if D (Equation 3.15) is small enough, then µ1 is the local minimizer (µ* ≡ µ1); otherwise, we repeat the process using µ1 as the current parameter estimation (µ ≡ µ1).


3.4.2 Inverse Compositional Algorithm

The IC algorithm reinterprets the FC optimization scheme by changing the roles of the template and the image. The key feature of IC is that its GN Jacobian is constant: we compute the Jacobian using only template brightness values, so it does not change during tracking. Using a constant Jacobian speeds up the whole computation, as the linearization stage is the most time-consuming one. The IC algorithm receives its name because we reverse the roles of the template and the current frame (i.e. we compute the Jacobian on the template). We rewrite the residuals of FC (Equation 3.16) as follows:

r(µ) ≡ T(f(x; µ)) − It+1(f(x; µt)), (3.20)

yielding the residuals for IC. Notice that the template brightness values now depend on the search parameters µ. We linearize Equation 3.20 around the point µ = 0 in the search space:

r(0 + δµ) ≃ ℓ(δµ) ≡ r(0) + r′(0)δµ = r(0) + J(0)δµ, (3.21)

where

r(0) ≡ T(f(x; 0)) − It+1(f(x; µt)), and J(0) ≡ ∂T(f(x; µ̂))/∂µ̂ evaluated at µ̂ = 0. (3.22)

We compute the local minimizer of Equation 3.7 by differentiating it with respect to δµ and equating to zero:

0 = L′(δµ) = ∇_δµ ( r(0)⊤r(0) + 2δµ⊤J(0)⊤r(0) + δµ⊤J(0)⊤J(0)δµ ) = J(0)⊤r(0) + J(0)⊤J(0)δµ. (3.23)

Again, we obtain an approximation to the local minimum at

δµ = −( J(0)⊤J(0) )⁻¹ J(0)⊤r(0), (3.24)

which we iteratively refine until we find a suitable solution. We summarize the optimization process in Algorithm 5 and Figure 3.7.

Note that the Jacobian matrix J(0) is constant, as it is computed on the template image—which is fixed—at the point µ = 0 (cf. Equation 3.22). Notice that the crucial point of the derivation of the algorithm lies in the change of variables in Equation 3.20. Solving for the search direction only requires computing the IC residuals and the least-squares approximation (Equation 3.24). The Dissimilarity Linearization stage from Algorithm 1 is no longer required, which results in a boost in the performance of the algorithm.
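A minimal sketch of the off-line/on-line split in IC (our own illustration; the names and array layouts are assumptions): the constant Jacobian of Equation 3.22—and even the pseudo-inverse used in Equation 3.24—is computed once from template gradients, so each on-line iteration reduces to sampling residuals and one matrix product. The inverse compositional update of µ is warp-specific and is left to the caller.

```python
import numpy as np

def ic_offline(grad_T, dfdmu0):
    """Off-line IC stage: constant GN system from template data only.

    grad_T : (N, 2) template gradients at the target pixels.
    dfdmu0 : (N, 2, P) warp Jacobian df(x; mu)/dmu evaluated at mu = 0.
    Returns the (P, N) matrix (J^T J)^-1 J^T of Equation 3.24.
    """
    J = np.einsum('nc,ncp->np', grad_T, dfdmu0)   # rows = grad T(x) * df/dmu
    return np.linalg.solve(J.T @ J, J.T)

def ic_online_step(J_pinv, template_vals, warped_image_vals):
    """One on-line IC iteration: residuals of Eq. (3.20) at mu = 0, then delta."""
    r = template_vals - warped_image_vals
    return -J_pinv @ r            # delta mu; the caller applies mu <- mu ∘ delta^-1
```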


Algorithm 5 Outline of the Inverse Compositional algorithm.

Off-line: Compute J(0) = ∇µ r(0) using Equation 3.22.
On-line: Let µi = µ0 be the initial guess.
1: while no convergence do
2:   Compute the residuals r(µi) from Equation 3.20.
3:   Compute the search direction: δµi = −(J(0)⊤J(0))⁻¹ J(0)⊤r(0).
4:   Update the optimization parameters: µi+1 = µi ∘ δµi⁻¹.
5: end while

Figure 3.7: Inverse compositional image registration. We initialize the current parameter estimation at frame It+1 (µ ≡ µ0) using the local minimizer at the previous frame It (µ0 ≡ µ*_t). At this point we compute the Jacobian J(0) using Equation 3.22. We compute the dissimilarity residuals between the Image and the Template using µ (Equation 3.20). Using J(0) we compute the descent direction δµ (Equation 3.24). We update the parameters using inverse function composition—i.e. µ1 = µ0 ∘ δµ⁻¹. We check whether µ1 is a local minimizer by using the brightness dissimilarity: if D is small enough, then µ1 is the local minimizer (µ* ≡ µ1); otherwise, we repeat the process using µ1 as the current parameter estimation (µ ≡ µ1).


Relevance of IC

The IC algorithm is known to be the most efficient optimization technique for direct image registration [Baker and Matthews, 2004]. The algorithm was initially proposed for template tracking, although it was later improved to use aams [Matthews and Baker, 2004], register 3D Morphable Models [Romdhani and Vetter, 2003; Xu and Roy-Chowdhury, 2008], account for photometric changes [Bartoli, 2008], and allow for appearance variation [Gonzalez-Mora et al., 2009].

Some efficient algorithms using a constant residual Jacobian with additive increments have been proposed in the literature, but none shows reliable performance. In [Cootes et al., 2001] an iterative regression-based gradient scheme is proposed to align aams to frontal images of faces. The regression matrix (similar to our Jacobian matrix) is numerically computed off-line and remains constant during the Gauss-Newton optimisation. The method shows good performance because the solution does not depart far from the initial guess. The method is revisited in [Donner et al., 2006] using Canonical Correlation Analysis instead of numerical differentiation to achieve a better convergence rate and range. In [La Cascia et al., 2000] the authors propose a Gauss-Newton scheme with a constant Jacobian matrix for 6-dof 3D tracking of heads. The method needs regularisation constraints to improve the convergence of the optimisation.

Recently, [Brooks and Arbel, 2010] augmented the scope of the IC framework with the Generalized Inverse Compositional (GIC) image registration: they propose an additive update to the parameters that is equivalent to the compositional update of IC; therefore, they can adapt IC to optimization methods other than GN, such as Broyden-Fletcher-Goldfarb-Shanno (bfgs) [Press et al., 1992].

3.5 Other Methods

Iterative gradient-based optimization algorithms (see Figure 3.4) can improve their efficiency in two different ways: (1) by speeding up the linearization of the dissimilarity function, and (2) by reducing the number of iterations of the process. The algorithms that we have presented—i.e. HB and IC—belong to the first type. The second type of methods achieves efficiency by using a more involved linearization that converges faster to the minimum. [Averbuch and Keller, 2002] approximates the error function in both the template and the current image and averages the least-squares solutions to both; they show it converges in fewer iterations than LK, although the time per iteration is higher. Malis et al. [Benhimane and Malis, 2007] propose a similar method, called Efficient Second-Order Minimization (esm), which differs from the latter in using an efficient linearization on the template by means of Lie algebra properties. Recently, both methods have been revisited and reformulated in a common Bi-directional Framework in [Megret et al., 2008]. [Keller and Averbuch, 2008] derives a high-order approximation to the error function that leads to a faster algorithm with a wider convergence basin. Unfortunately—with the exception of esm—none of these algorithms is appropriate for real-time image registration.

3.6 Summary

We have introduced the basic concepts of direct image registration. We pose the registration problem as the result of gradient-descent optimization of a dissimilarity function based on brightness differences. We classify the direct image registration algorithms as either additive or compositional: in the former group we highlight the LK and HB algorithms, whereas the FC and IC algorithms belong to the latter.


Chapter 4

Equivalence of Gradients

In this chapter we introduce the concept of Equivalence of Gradients, that is, the process of replacing the gradient of a brightness function with an equivalent alternative. In Chapter 3 we showed that some efficient algorithms for direct image registration use a gradient replacement technique as the basis for their speed improvement: (1) the HB algorithm transforms the template derivatives using the target warp to yield the image derivatives; and (2) the IC algorithm replaces the image derivatives by the template derivatives without any modification, but changes the parameter update rule so that the GN-like optimization converges. We introduce a new constraint, the Gradient Equivalence Equation, and we show that this constraint is a necessary requirement for the high computational efficiency of both the HB and IC algorithms.

We organize the chapter as follows: Section 4.1 introduces the basic concepts on image gradients in R², and their extension to higher-dimensional spaces such as P² and R³; Section 4.2 introduces the Gradient Equivalence Equation, which shall subsequently be used to impose requirements on the registration algorithms.

4.1 Image Gradients

We introduce the concept of the gradient of a scalar function below. We consider images as functions in two dimensions that assign a brightness value to each image pixel position.

The Concept of Gradient The gradient of a scalar function f : Rⁿ → R at a point x ∈ Rⁿ is a vector ∇f(x) ∈ Rⁿ that points in the direction of greatest rate of increase of f(x). The length of the gradient vector, |∇f(x)|, is the greatest rate of change of the function.

Image Gradients Grayscale images are discrete scalar functions I : R² → R ranging from 0 (black) to 255 (white)—see Figure 4.1. We turn our attention to grayscale images, but we may deal with colour-channelled images (e.g. rgb images) by simply considering them as one grayscale image per colour plane. Grayscale images are discrete functions: we represent an image as a matrix whose elements I(i, j) are the brightness function values. We approximate the discrete function continuously by using interpolation (see Figure 4.1).

We introduce image gradients in the most common domains in Computer Vision—R², P², and R³. Image gradients are naturally defined in R², since images are functions defined on that domain. In some Computer Vision applications the domain D of x is not constrained to R², but extends to P² [Buenaposada and Baumela, 2002; Cobzas et al., 2009], or to R³ [Sepp, 2006; Xu and Roy-Chowdhury, 2008]. In the following, the target coordinates are expressed in a domain D ∈ {R³, P²}, so we need a projection function to map the target coordinates onto the image. We generically define the projection mapping as p : D → R².

The corresponding projectors are the homogeneous-to-Cartesian mapping, p : P² → R², and the perspective projection, p : R³ → R². Image gradients in domains other than R² are computed by using the chain rule with the projector p : Rⁿ → R²:

∇x (I ∘ p)(x) = ∇x I(p(x)) = ∇p̂ I(p̂)|_{p̂ = p(x)} ∇x p(x), x ∈ D ⊂ Rⁿ. (4.1)

Equation 4.1 represents image gradients in the domain D as the image gradient in R² lifted up onto the higher-dimensional space D by means of the Jacobian matrix ∇x p(x).

Notation We use the operator [ ] to denote the composite function I ∘ p, that is, I(p(x)) = I[x].

4.1.1 Image Gradients in R²

If the target and its kinematics are expressed in R², there is no need to use a projector, as both the target and the image share a common reference frame. The gradient of a grayscale image at point x = (i, j)⊤ is the vector

∇x I(x) = (∇i I(x), ∇j I(x)) = ( ∂I(x)/∂i, ∂I(x)/∂j ), (4.2)

which flows from the darker areas of the image to the brighter ones (see Figure 4.1). Moreover, the direction of the gradient vector at a point x ∈ R² is orthogonal to the level set of the brightness function at that point (see Figure 4.1).
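As a quick numerical counterpart to Equation 4.2 (our own sketch): central differences approximate the partial derivatives, and bilinear interpolation provides gradient vectors at non-integer pixel positions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def image_gradient(I, pts):
    """Sample the gradient (dI/di, dI/dj) at (possibly non-integer) points.

    I   : 2D grayscale image.
    pts : (N, 2) array of (i, j) coordinates.
    Returns an (N, 2) array of gradient vectors (Eq. 4.2).
    """
    gi, gj = np.gradient(I.astype(float))            # central differences
    return np.stack([map_coordinates(gi, pts.T, order=1),
                     map_coordinates(gj, pts.T, order=1)], axis=1)
```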


Figure 4.1: Depiction of Image Gradients. (Top-left) An image is a rectangular array where each element is a brightness value. (Top-right) Continuous representation of the image brightness values; we compute the values from the discrete array by interpolation. (Bottom-left) Image gradients are vectors from each image array element in the direction of maximum increase of brightness (compare to the top-right image). (Bottom-right) Gradient vectors are orthogonal to the brightness function contour curves. Legend: blue Gradient vectors. different colours Contour curves.


4.1.2 Image Gradients in P²

Projective warps map the projective plane onto itself, f : P² → P². We represent points in P² using homogeneous coordinates (see Section 3.2.1).

We compute the image derivatives on the projective plane by using derivatives on homogeneous coordinates. We compute the gradient of the composite function I ∘ p at the point x = (x, y, w)⊤ ∈ P² using the chain rule:

∇x I[x] = ∇p̂ I(p̂)|_{p̂ = p(x)} ∇x p(x)
        = (∇i I, ∇j I) [ 1/w   0   −x/w² ;   0   1/w   −y/w² ]
        = ( (1/w)∇i I,  (1/w)∇j I,  −(1/w²)(x∇i I + y∇j I) ). (4.3)

Geometric Interpretation The image brightness gradient in P² has a geometric interpretation. The following proposition defines the geometric locus of the image gradient in P².

Proposition 1. The image gradient at x ∈ P² is a projective line l incident to the point—i.e. l⊤x = 0.

Proof. The projective line l = ( (1/w)∇i I, (1/w)∇j I, −(1/w²)(x∇i I + y∇j I) )⊤ is incident to the projective point x = (x, y, w)⊤ since:

l⊤x = (x/w)∇i I + (y/w)∇j I − (w/w²)(x∇i I + y∇j I) = 0. (4.4)

Figure 4.2 depicts the image gradients in P² for the image T. This result may also be derived by using Euler's homogeneous function theorem¹: I[x] is a homogeneous function of degree 0 in x ∈ P²,

I[λx] = I[x] = λⁿ I[x],

with λ ≠ 0 and n = 0. Then, by Euler's theorem we have

∂I[λx]/∂λ = 0 = n · I[x] = x ∂I[x]/∂x + y ∂I[x]/∂y + w ∂I[x]/∂w.

¹ Let f : Rⁿ₊ → R be a continuously differentiable homogeneous function of degree n (i.e. f(λx) = λⁿ f(x)); then n f(x) = Σᵢ xᵢ ∂f/∂xᵢ.


Figure 4.2: Image Gradient in P². (Left) Coordinates in P² are equal up to scale, that is, x ∼ λx ∼ λ′x. Incidence is also preserved up to scale—i.e., l⊤x = λl⊤x = λ′l⊤x = 0. (Right) The image gradient in P², l, is tangent to the contour of the brightness function. Notice that the normal to l is parallel to the image gradient in R², ∇x I(x).

Corollary 1. The director vector of the image gradient at x ∈ P² is orthogonal to the image brightness contour at p(x).

Proof. The image gradient at p(x), ∇p̂(x) I(p(x)), is tangent to the contour of the brightness function at that point. Thus, its director vector, which is ( (1/w)∇i I, (1/w)∇j I ), is orthogonal to the brightness function contour curve—see Figure 4.2.

4.1.3 Image Gradients in R³

Often, the target is not defined on the whole of R³, but on a manifold in R³—e.g., a surface embedded in 3D space. Images defined on the whole R³ space provide 3D volumetric data instead of two-dimensional information [Baker et al., 2004b].

We assume that the target is defined on a manifold D ⊂ R³, and that the warp f : R³ → R³ defines the motion of the manifold D. In this case P : R³ → R² is a projector such that (x, y, z)⊤ ↦ (f x/z, f y/z)⊤—with f the camera focal length.

We compute the image derivatives by using the chain rule on the function I ∘ P


at the point x = (x, y, z)⊤ ∈ R³:

∇x I[x] = ∇p̂ I(p̂)|_{p̂ = P(x)} ∇x P(x)
        = (∇i I, ∇j I) [ f/z   0   −f x/z² ;   0   f/z   −f y/z² ]
        = ( (f/z)∇i I,  (f/z)∇j I,  −(f/z²)(x∇i I + y∇j I) ). (4.5)

Geometric Interpretation As in projective coordinates, image gradients in R³ can be geometrically represented. Sepp [Sepp, 2008] introduces the following proposition:

Proposition 2. The image gradient of a point x ∈ R³ is a plane through x and the origin.

Proof. A plane through a point x ∈ R³ and the origin o is a 3-element vector π such that π⊤x = 0. This is true in our case as:

π⊤x = (f/z)∇i I · x + (f/z)∇j I · y − (f/z²)(x∇i I + y∇j I) · z = 0. (4.6)

Thus, the point x belongs to the plane as π⊤x = 0. Besides, o = 0 trivially belongs to the plane, as π does not have an independent term (i.e. π⊤0 = 0). We show the geometry of the image gradient in R³ in Figure 4.3. As in P², this proposition is an immediate consequence of the fact that I[x] is a homogeneous function of degree 0 in x ∈ R³.

We also infer the following two corollaries from Proposition 2:

Corollary 2. The image gradient of a point x ∈ R³ is a plane π through the origin that also contains the projection of the point, P(x).

Proof. Let x̂ = (f x/z, f y/z, f)⊤ be the projection of x onto the image plane by means of P. The point x̂ belongs to the plane π defined by the target image gradient and the origin, since:

π⊤x̂ = (f/z)∇i I · (f x/z) + (f/z)∇j I · (f y/z) − (f/z²)(x∇i I + y∇j I) · f = 0. (4.7)


Figure 4.3: Image gradient in R³. (Left) A 3D model is imaged onto the camera. The gradient at the image of the target point x is a plane that contains both x and the origin (the camera centre). (Right) Close-up of the image at x. The plane π contains ∇x I(x)⊥, the tangent to the brightness function. Thus, the plane vector π is orthogonal to the brightness function contour.

This corollary could be immediately proved by noticing that the projection of the point x onto the image belongs to the line through o and x by definition of perspective projection, and this line is contained in the plane π (see Figure 4.3).

Corollary 3. The image gradient of a point x ∈ R³, π, is a plane through the origin whose director vector is orthogonal to the brightness contour at the image point P(x).

Proof. The image gradient at P(x) is tangent to the contour of the brightness function at that point; the normal vector is thus orthogonal to the brightness contour curve.

Note that P² can be interpreted as a Euclidean space where points are lines through the origin and lines are planes through the origin [Hartley and Zisserman, 2004]. Thus, the results for P² and R³ are equivalent.

4.2 The Gradient Equivalence Equation

We recall the brightness constancy constraint (bcc) from Equation 3.1:

T(X) = It(f(X; µt)).

The Gradient Equivalence Equation (GEE) relates the derivatives of the brightness function between two frames of the sequence:

∇x T(X) = ∇x It(f(X; µt)). (4.8)


We define the GEE to be the differential counterpart of the bcc: the bcc relates image brightness values between two images, whereas the GEE relates image gradients (see Figure 4.4). Note that image brightness values are scalars (in grayscale images), but image gradients are vectors.

4.2.1 Relevance of the Gradient Equivalence Equation

We verify whether we can substitute image derivatives by template derivatives by using the gradient equivalence equation: that is, if Equation 4.8 holds, then we can swap gradients. We shall show in Chapters 5 and 6 that swapping gradients is the cornerstone of the speed improvement of both the HB and IC algorithms:

• Additive algorithms such as HB rewrite the image gradient as a function of the template gradient, increasing the speed of the optimization.

• Compositional algorithms such as IC rely on a compositional formulation of the bcc to directly use the template gradient.

We shall show that if the GEE does not hold, the convergence of the algorithms worsens when using gradient swapping. The foundations of this statement are simple: the Jacobian of a GN-like optimization method—e.g. HB or IC—depends on the image derivatives. The Jacobian establishes the search direction towards the minimum in the optimization space. When we build the Jacobian from template derivatives—to gain efficiency—we must guarantee that the resulting Jacobian is equivalent to the one computed from image derivatives. If the GEE (Equation 4.8) holds, then the Jacobian matrices computed from template and image derivatives are equivalent. If the Jacobians are not equivalent, the iterative search directions may not converge to the optimum (see Figure 4.5). Hence, the GEE is directly related to the convergence of the algorithms.

4.2.2 General Approach to Gradient Replacement

Once we have acknowledged the importance of the GEE, one question still remains open: how do we verify that the GEE holds? Gradient equivalence (Equation 4.8) implies that the image gradient vectors in both T and It are equal at each point x ∈ X. Recall that two vectors are equal if both their directions and lengths match.

From basic calculus we recall the following lemma:

Lemma 3. Let f and g be two functions f, g : D → R that coincide in an open set Ω ∋ x₀. Then

∂f(x)/∂x = ∂g(x)/∂x, evaluated at x = x₀.


Figure 4.4: Comparison between BCC and GEE. (Top row) The image on the left (T) is rotated to generate the image on the right (It). The bcc states that the image values of the target regions of both images are equal despite their orientation. (Middle row) Gradients corresponding to both images; from left to right: ∇iT(x), ∇jT(x), ∇iIt(f(x; µt)), and ∇jIt(f(x; µt)). Notice that the relative values of ∇iT(x) and ∇iIt(f(x; µt)) are equal despite their orientation—ditto for ∇jT(x) and ∇jIt(f(x; µt)). (Bottom row) The GEE states that the gradient vectors of both images are coherent up to the warp that transforms one image into the other: if a point of an image undergoes a rotation, its gradient vector is rotated by the same amount.


The proof of this lemma is immediate from the definition of an open set and the equality of f and g in Ω—see Figure 4.5.

Corollary 4. Let T[x] be the reference texture image, It[f(x; µt)] the input image at time t warped by f, and x a pixel location. If the bcc holds in an open set Ω—i.e. T[x] = It[f(x; µt)]—then the GEE holds:

∇x T[x] = ∇x It[f(x; µt)], ∀x ∈ Ω.

The proof is immediate from Lemma 3.

We may assume that both T[x] and It[f(x; µt)] are continuous functions of x by using interpolation from neighbouring pixel locations. Then, the bcc holds in an open set Ω ⊂ R² or Ω ⊂ P², except maybe for those pixels at image borders or occlusion boundaries. The GEE consequently holds from Corollary 4.

Unfortunately, the GEE generally does not hold in R³. In the case of 2.5d tracking [Matthews et al., 2007]—a 2D surface embedded in 3D space—T[x] and It[x; µt] do not coincide in an open set, since in general T[x] ≠ It[x; µt] for points outside the surface. Thus, the GEE does not hold, as a consequence of Lemma 3—see Figure 4.6.

4.3 Summary

As we shall show later, the GEE is the cornerstone of the speed improvement in efficient image registration algorithms. If the image warping function is defined on R² or P², the GEE is satisfied. If the warping is defined on a 3D manifold D ⊂ R³—i.e., the case of 2.5D tracking—the GEE does not hold. In Table 4.1 we enumerate some warping functions and state whether they satisfy the GEE.


Figure 4.5: Gradients and Convergence. (Left) We optimize a quadratic form starting from the initial guess x0 = (0, 14) using two gradient-based iterative methods. Solid red line: Algorithm that uses a correct Jacobian to compute each iterative step xi towards the solution; by definition, the gradient is orthogonal to the isocurve of the function at each xi. The algorithm reaches the optimum x* after some iterations. Dotted black line: Algorithm that uses an incorrect Jacobian to compute the iterative steps xi; the computed gradients are no longer orthogonal to the function isocurves. The algorithm reaches a solution xn that is different from the actual optimum. (Right) Functions f and g, and their respective gradients (see figure legend). The grey region represents the open interval D = (3.5, 5.5).

Figure 4.6: Open Subsets in Various Domains. (Left) Open subset in R². The neighbourhood D ⊂ R² of the point is fully contained in R². (Right) Open subset for a point x ∈ R³ in a 3D manifold D ⊂ R³. The neighbourhood of the point x is not fully contained in the manifold D.



Table 4.1: Characteristics of the warps

Warp Name                                   Domain    dof   Allowed Motion                                GEE
2D Affine                                   R²        6     rotation, translation, scaling, shearing      YES
Homography                                  R², P²    8     rotation, translation, scaling, shearing,     YES
                                                            perspective deformation
Plane-induced homography (see Appendix B)   P²        6     3D rotation, 3D translation                   YES(1)
3D Rigid Body                               R³        6     3D rotation, 3D translation                   NO

(1) NO in the case of the Plane+Parallax-constrained homography (see Appendix C)



Chapter 5

Additive Algorithms

This chapter discusses efficient additive algorithms for image registration. We turn our attention to the HB algorithm, as it is known to be the most efficient additive algorithm for image registration—we do not consider GIC as additive, but compositional.

Open Issues with the Factorization Algorithm Although the HB algorithm beats LK performance and enables us to achieve real-time image registration (cf. [Hager and Belhumeur, 1998]), it is not free of criticism. Baker and Matthews analyse the algorithm in [Baker and Matthews, 2004], and we summarize their criticisms in the following two open questions:

Is every warp suitable for HB optimization?
The usual criticism of HB is that it works with only a limited number of motion models: pure translation, an affine model (scaling, translation, rotation and shearing), a restricted affine model (without shearing) and an "esoteric" (according to [Baker and Matthews, 2004]) non-linear model. [Baker and Matthews, 2004] also stated that the algorithm could not use homographic warps, although [Buenaposada and Baumela, 2002] subsequently solved the problem by using homogeneous coordinates. However, there is no evidence that the algorithm could work with other warps. In this chapter we introduce a requirement that determines whether a given warp works with the HB algorithm. We show that this requirement is related to the Gradient Replacement stage of the HB algorithm.

Can we systematize the factorization scheme?
The second quibble refers to stage two of the HB algorithm: the Factorization step. [Baker and Matthews, 2004] argue that the factorization step must be done using ad hoc techniques for each motion model. In this chapter and Appendix D we provide lemmas and theorems that systematize the factorization of any expression involving matrices.

This chapter provides some insight into the two stages of the HB algorithm: in Section 5.1 we study the requirements to perform the gradient replacement, and in Section 5.2 we study the process of methodical factorization. We subsequently apply this knowledge to register a 3D target model under a rigid body motion (Section 5.3): we provide a suitable warp that can be directly used with the HB algorithm, and we show the resulting HB optimization. Finally, Section 5.4 shows how to register morphable models in a similar fashion.

5.1 Gradient Replacement Requirements

In this section we show the necessary requirements to perform the gradient replacement operation: that is, we state the requirements on target motion and structure that ensure a proper convergence of the HB algorithm. We recall the scheme of the gradient replacement operation: we expand the GEE (Equation 4.8) using the chain rule,

∇xT[x] = ∇xI[x] ∇xf(x;µ), (5.1)

and we isolate the term ∇xI[x] as follows:

∇xI[x] = ∇xT[x] (∇xf(x;µ))⁻¹. (5.2)

Then, we insert Equation 5.2 into the equation of the Jacobian expanded using the chain rule (Equation 3.11),

J = ∇xT[x] (∇xf(x;µ))⁻¹ ∇µf(x;µ), (5.3)

which expresses the Jacobian matrix in terms of the template gradients. Notice that the GEE has a key role in Equation 5.3—it is the basis for the gradient replacement. Thus, we formulate our requirement using the GEE as follows:

Requirement 1. The gradient replacement operation within the HB algorithm is feasible if and only if the GEE holds.

We do not prove Requirement 1 as it is a direct consequence of the GEE. Requirement 1 is a rule that lets us establish whether we can use a warp with the HB optimization. We thoroughly studied the GEE in Chapter 4, and we summarized the relation between warps and the GEE in Table 4.1. Thus, those warps that do not satisfy the GEE do not satisfy Requirement 1, and consequently are not suitable for the HB algorithm.

It is important to note that we additionally require that an inverse exists for the derivative ∇xf(x;µ)—i.e., the warp f must be invertible. If this derivative is singular, we shall not be able to compute Equation 5.3.
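To make the gradient replacement concrete, the following minimal Python sketch builds one Jacobian row according to Equation 5.3 for a 2D affine warp; the template gradient value, the point x, and the affine parameterization are illustrative assumptions, not thesis data.

```python
import numpy as np

# Minimal sketch (assumed affine model): the Jacobian row of Equation 5.3 is
# built from the *template* gradient only, J = grad_T (df/dx)^{-1} (df/dmu).
def dwarp_dx(x, mu):
    # df/dx for f(x; mu) = A(mu) x + b(mu), a 2x2 matrix
    return np.array([[1 + mu[0], mu[1]],
                     [mu[2], 1 + mu[3]]])

def dwarp_dmu(x, mu):
    # df/dmu, a 2x6 matrix; mu = (a11, a12, a21, a22, tx, ty)
    return np.array([[x[0], x[1], 0, 0, 1, 0],
                     [0, 0, x[0], x[1], 0, 1]])

grad_T = np.array([[0.8, -0.3]])      # template gradient at x (1x2 row)
x, mu = np.array([2.0, 3.0]), np.zeros(6)

J_row = grad_T @ np.linalg.inv(dwarp_dx(x, mu)) @ dwarp_dmu(x, mu)
print(J_row)                          # 1x6 Jacobian row, no image gradients used
```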

5.2 Systematic Factorization

In this section we provide insight into the factorization process. We introduce a methodical procedure to perform the factorization. This is by and large a quite challenging task—as we shall show, the factorization is generally nonunique—but we provide a theoretical framework to support our claims.

Why use factorization? The efficiency of the HB algorithm depends on two factors: (1) the gradient replacement operation, and (2) the factorization of the Jacobian matrix. The improvement due to the gradient replacement is noticeable, as it avoids repeatedly computing image gradients. However, the improvement in speed due to the factorization stage is not obvious at all.

Let us suppose a chain of matrix products,

E_{n×r} = A_{n×m} B_{m×p} C_{p×q} D_{q×r}, (5.4)

where red matrices—A and D—depend on parameters x, and green matrices—B and C—depend on parameters y. In the factorization we group those matrices whose elements depend on the same parameters, that is,

E_{n×r} = A_{n×m} D′_{m×r′} B′_{r′×p′} C′_{p′×r} = X_{n×r′} Y_{r′×r}. (5.5)

We compute matrix D′ by reordering the elements of matrix D into a suitable matrix—likewise for matrices B′ and C′. Furthermore, matrix X in Equation 5.5 only depends on the parameters x, as we compute it from the matrices A and D′ (we compute matrix Y equivalently). Notice that although the matrices in Equation 5.4 are different from those in Equation 5.5, the final product E is identical. When some of the parameters are constant—and so are their associated matrices—the improvement in speed due to factorization is rather noticeable. For example, if we assume that the parameters x are constant, then we avoid repeatedly computing the product AD′ in Equation 5.5; hence, the key point of applying factorization to compute E is to spare spurious operations due to constant terms.
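The following toy sketch (with assumed shapes and random stand-ins, not the thesis implementation) illustrates why the grouping pays off: the x-dependent factor X = A D′ is computed once off-line, and only the y-dependent factor Y is rebuilt at each iteration.

```python
import numpy as np

# Hypothetical sketch of the factorization speed-up for E = A(x) B(y) C(y) D(x):
# when x is constant, the x-dependent product X = A D' is precomputed once and
# only the y-dependent factor Y = B' C' is rebuilt per iteration.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 5))          # x-dependent (constant here)
D_prime = rng.normal(size=(5, 6))    # reordered D, x-dependent as well

X = A @ D_prime                      # off-line: computed once

def E_of(y):
    B_prime = y * np.eye(6)          # illustrative y-dependent factor
    C_prime = np.full((6, 3), y)     # illustrative y-dependent factor
    Y = B_prime @ C_prime            # on-line: cheap per-iteration part
    return X @ Y

print(E_of(0.5).shape)               # -> (4, 3)
```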

Factorization via reversible operators There is still an open question left: how do we systematize the factorization? [Brand and R.Bhotika, 2001] introduce the following definition:

Definition 5.2.1. We may arbitrarily reorder a sequence of matrix operations (sums, products, reshapings and rearrangements) using matrix reversible operators.

[Brand and R.Bhotika, 2001] define matrix reversible operators as those that do not lose matrix information. They state that the Kronecker product, ⊗, or column vectorization, vec(A) [K. B. Petersen], are reversible, whereas matrix multiplication or division are not [Brand and R.Bhotika, 2001].

We also introduce the operator ⊙, which performs a row-wise Kronecker product of two matrices. In Appendix D we introduce a series of theorems and lemmas that, using the aforementioned reversible operators, rearrange the product and sum of matrices.
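As an illustration, a plausible numpy implementation of the row-wise Kronecker product ⊙ follows; the function name row_kron is our own, not from the thesis.

```python
import numpy as np

# Sketch of the row-wise Kronecker product "⊙": each row of the result is the
# Kronecker product of the matching rows of the two operands.
def row_kron(A, B):
    assert A.shape[0] == B.shape[0], "operands need the same number of rows"
    return np.einsum('ij,ik->ijk', A, B).reshape(A.shape[0], -1)

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6, 7],
              [8, 9, 10]])
print(row_kron(A, B))
# row 0 = kron([1, 2], [5, 6, 7])  -> [ 5  6  7 10 12 14]
# row 1 = kron([3, 4], [8, 9, 10]) -> [24 27 30 32 36 40]
```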



Uniqueness of the factorization In general, the factorization of Equation 5.5 is not unique: we can rearrange the matrices of Equation 5.4 in different ways such that the results of Equations 5.4 and 5.5 are identical. This situation is particularly noticeable when we have distributive products: we can either apply the distributive property and then factorize, or factorize first and then distribute.

Jacobian matrix factorization We apply the aforementioned procedures to decompose the Jacobian matrix (Equation 5.3). First, we represent the Jacobian matrix as a product of matrices whose elements are either shape or motion parameters. Notice that the terms ∇xf(x;µ)⁻¹ and ∇µf(x;µ) intermingle shape and motion elements such that the factorization is not obvious at all—on the other hand, ∇xT[x] only depends on shape terms. The objective is to represent J as a leftmost matrix whose elements are only shape terms—or constant values—and a rightmost matrix whose elements are only motion terms:

J = ∇xT[x] (∇xf(x;µ))⁻¹ ∇µf(x;µ) = S M. (5.6)

We use an iterative procedure to compute the factorization. We present this procedure in Algorithm 6.

Algorithm 6 Iterative factorization of the Jacobian matrix.

Off-line: Represent the Jacobian as a chain of matrix sums or products.
On-line:
1: Select two adjacent elements such that the right-hand one is a motion term and the left-hand one is a shape term.
2: Use one of the factorization lemmas to reverse the order of the terms.
3: Repeat until the Jacobian has the form of Equation 5.6.

Notation of the Factorization Process We consistently use the following notation to describe the factorization procedure:

• We represent a matrix whose terms are either constants or shape terms using a red box S.

• We represent a matrix whose terms are either constants or motion terms using a green box M.

• When we reverse a pair of shape-motion terms by using a lemma, we highlight the result using a blue box R. We represent the application of a given lemma by an arrow labelled with the name of the rule (see Equation 5.7).



E_{m×r} = A_{m×n} B_{n×p} C_{p×q} D_{q×r}
        —Lemma→ A_{m×n} B′_{n×p′} C′_{p′×q} D_{q×r}
        = A_{m×n} B′_{n×p′} C′_{p′×q} D_{q×r}. (5.7)

5.3 3D Rigid Motion

In this section we describe how to register images from a 3D target under a rigid body motion in R³ by using a HB optimization. The fundamental challenge lies in satisfying Requirement 1: according to Table 4.1, the usual rigid body warp does not verify the GEE. Hence, we demand a warp that both models the target dynamics and holds Requirement 1.

Previous Work The original HB algorithm was not specifically suited for 3D targets: [Hager and Belhumeur, 1998] only defined affine warps over 2D templates. Later, [Hager and Belhumeur, 1999] extended the HB algorithm to handle 3D motion; nonetheless, the algorithm seems to be limited, as it only handles small rotations—around 10 degrees. [Buenaposada and Baumela, 2002] extended the HB algorithm to handle homogeneous coordinates, so a full projective homography could be used. Using this homography, the authors effectively computed the 3D orientation of a plane in space. [Sepp and Hirzinger, 2003] proposed a HB algorithm to register complex 3D targets. Unfortunately, the results seem poor, as the algorithm handles limited out-of-plane rotations—less than 30°, cf. [Sepp and Hirzinger, 2003]. We may explain this flaw in performance as a direct result of a faulty gradient replacement: the rigid body transformation does not verify the GEE (cf. Chapter 4), hence Requirement 1 does not hold.

We define an algorithm that effectively registers 3D targets by using a family of homographies: one homography per plane/triangle of the model. We combine these homographies by considering that all the triangles of the target model are parameterized by the same rotation and translation [Munoz et al., 2009]. In the following we introduce the convention to represent target models, 3D Textured Models, and a new warp that parameterizes a family of homographies for a set of planes, the shape-induced homography.

5.3.1 3D Textured Models

We describe the target using 3D Textured Models (3dtm) [Blanz and Vetter, 2003]. We follow a convention similar to [Blanz and Vetter, 1999]: we model shape and colour separately, but both are defined on a common bi-dimensional space, F ⊂ R², that we call the Reference Frame. Target shape is a discrete function S : F ↦ R³ that maps ui = (xi, yi)⊤ ∈ F into si = (Xi, Yi, Zi)⊤ ∈ R³ for i = 1, . . . , N, where N is the number of vertices of the target model. Vertices are arranged in a discrete polygonal mesh that we define by a list of triangles. We provide continuity in the space F by interpolating among neighbouring vertices in the triangle list [Romdhani and Vetter, 2003]. The target colour or texture, T : F ↦ R^C, similarly maps the bi-dimensional space u ∈ F into the colour space—RGB if coloured or grey scale if monochrome (see Figure 5.1). Again, we achieve continuity in the colour space by interpolating the colour values at the vertices in the triangle list.

Figure 5.1: 3D Textured Model. (Top-left) The bi-dimensional reference frame F. We associate a colour triplet to each discrete point in F, resulting in a texture image. (Top-right) The shape of the model in R³. (Bottom-left) Close-up of F (green square in top-left image). The reference frame is discretized in triangles. (Bottom-right) Close-up of the shape s (green square in top-right image). The shape is a 3D triangular mesh. The bi-dimensional triangle (u1, u2, u3)⊤ is mapped into the 3D triangle (s1, s2, s3)⊤. An interior point u of the triangle is coherently mapped into its corresponding point on the shape by using barycentric coordinates.



Notation By abuse of notation, T denotes both the usual image function, T : R² ↦ R^C, and the texture defined in the reference space, T : F ↦ R^C.

5.3.2 Shape-induced Homography

We introduce a new warp that (1) represents the target rigid body motion in R³, and (2) holds Requirement 1. We base this warp on the plane-induced homography fh6—see Appendix B. The plane-induced homography relates the projections of a plane that rotates and translates in R³. We equivalently define the shape-induced homographies fh6s as a family of plane-induced homographies that relate the projections of a shape that rotates and translates in space,

fh6s(x, n; µ) = x′ = K(R − R t n⊤)K⁻¹ x, (5.8)

where x and x′ are the projections of a generic vertex s of the shape S (see Figure 5.2). We consider that vertex s is the centroid of the triangle with normal n, located at depth d from the origin—i.e. n⊤s = d. We normalize n by the triangle depth, n = n/d, such that n⊤s = 1. Vector µ contains a parameterization for R and t. Notice that µ is common to every point in S but n is not; hence, we have one plane-induced homography for each pair of projections x ↔ x′, but every homography shares the same R and t (see Figure 5.2).

5.3.3 Change to the Reference Frame

Equation 5.8 relates the projections of a shape vertex in two views. We describe now how to express Equation 5.8 in terms of F-coordinates. Let u1, u2, and u3 be the coordinates in the reference frame of the shape triangle s1, s2, and s3—i.e. si = S(ui). Let xi be the projection of si on a given image. Figure 5.3 shows the relationship among the triangles (s1, s2, s3)⊤, (x1, x2, x3)⊤, and (u1, u2, u3)⊤. We represent the transformation between vertices (u1, u2, u3)⊤ and (x1, x2, x3)⊤ using an affine warp HA,

xi = HA ūi = [ a11  a12  tx ;  a21  a22  ty ;  0  0  1 ] (ui, 1)⊤,  i = 1, 2, 3, (5.9)

where ūi ∈ P² is the augmented vector of ui. The affine transformation HA is explicitly defined by the three correspondences u1 ↔ x1, u2 ↔ x2, and u3 ↔ x3; the interior points of the triangles are coherently transformed, as the affinity is invariant to barycentric combinations [Hartley and Zisserman, 2004]. When we extend Equation 5.9 to the N vertices of S we obtain a piecewise affine transformation [Matthews and Baker, 2004] between F and the view (see Figure 5.3). If the affinity HA is not degenerate—i.e. det(HA) ≠ 0—then we can rewrite Equation 5.8 as follows:

f3DTM(u, n; µ) = u′ = K(R − R t n⊤)K⁻¹ HA u. (5.10)

The transformation f3dtm (Equation 5.10) relates the 3D motion of the shape to the reference frame F (see Figure 5.3).



Figure 5.2: Shape-induced homographies. (Top) We image the shape S using P = [I|0] (left view) and P′ = [R| − Rt] (right view). (Middle) Close-up of the 3D shape. We select three shape points s1, s2, s3 ∈ R³; each one belongs to a triangle located on a different plane in R³—whose normals are respectively π1, π2, π3 ∈ R³. (Bottom) Close-ups of the two views. We image the shape point si as xi on the left view, and as x′i on the right view, for i = 1, 2, 3. The point si on plane πi induces the homography Hi between xi and x′i. Note that H1, H2, and H3 are different homographies that share the R and t parameters (cf. Equation 5.8).



Figure 5.3: Warp defined on the reference frame. (Top-left) Shape triangle mesh and texture on the reference frame F. (Top-right) The image corresponding to the left view of Figure 5.2. This is the reference image. (Middle) Close-up of the 3D shape. We select three shape points s1, s2, s3 ∈ R³; each one belongs to a triangle located on a different plane in R³—whose normals are respectively π1, π2, π3 ∈ R³. (Bottom) Close-ups from the images of the top row: the reference frame (left) and the reference image (right). We image the shape point si as xi on the left view, and x′i on the right view, for i = 1, 2, 3. On the other hand, we know the relationship between the point si and its correspondence in the reference frame ui by means of the function S. Thus, there exists a correspondence ui ↔ xi by means of si. We compute such a correspondence as an affine transformation HiA between ui and xi—i.e. xi = HiA ui; the correspondence is a piecewise affine transformation. Note that the transformations HiA are different from each other since they depend on the correspondence ui ↔ xi.



Does f3DTM hold the GEE? Equation 5.10 is a homography resulting from chaining two nondegenerate homographies. Thus, according to Lemma 3, any homographic transformation such as f3dtm holds the GEE and, by extension, holds Requirement 1—see Table 4.1.

Advantages of the Reference Frame

Many previous approaches to 3D tracking, e.g. [Baker and Matthews, 2004; Buenaposada et al., 2004; Cobzas et al., 2009; Decarlo and Metaxas, 2000; Hager and Belhumeur, 1998; Sepp, 2006], use a reference template—or a selected frame of the sequence—to define the texture function T (see Figure 5.4). Using this texture information they define the brightness dissimilarity function that they subsequently optimize (e.g., by using Equation 3.1).

Figure 5.4: Reference frame advantages. (Top row) We rotate the shape model around the Y-axis by β = 0°, 30°, 60°, 90°. (Bottom row) Visible points for each view of the rotated shape. Green dots: hidden triangles due to self-occlusions. We consider that a shape triangle is visible in the image if the angle between the normal to the triangle and the camera ray is less than 70°.

This information is valid in the neighbourhood of the reference image only. The dissimilarity function uses the texture that is available from the imaged target in that reference image. Thus, there is no texture information from those parts of the object that are not imaged in the template. As the projected appearance changes due to the motion of the target, more and more uncertainty is introduced in the brightness dissimilarity function. This leads to a decrease in the performance of the tracker for large rotations [Sepp, 2006; Xu and Roy-Chowdhury, 2008] (see Figure 5.4).

We solve the problem by using the texture reference frame [Romdhani and Vetter, 2003]. With the texture reference frame we can define a continuous brightness dissimilarity function over the whole 3D target. Using this continuous texture we can define the brightness dissimilarity even in the case of large rotations of the target (see Figure 5.4). Another approach to this problem is to use several reference images [Vacchetti et al., 2004]. When the target is a 3D plane, one reference image suffices to provide texture information [Baker and Matthews, 2004; Buenaposada and Baumela, 2002; Hager and Belhumeur, 1998]. Although it does not suffer from self-occlusions, it may exhibit aliasing artefacts.

5.3.4 Optimization Outline

We define the brightness dissimilarity function (using Equation 5.10) as follows:

D(F; µ) = T[F] − It[f3dtm(F, n; µ)], (5.11)

where n ≡ n(u) is the normal vector to the plane that contains the point S(u), ∀u ∈ F—strictly speaking, u = (u⊤, 1)⊤ with u ∈ F. The dissimilarity function (Equation 5.11) is continuous over the reference space F—as is the normal function n(F). Notice that the dissimilarity function is defined over the texture function T instead of a single template—as in Equation 3.1. We rewrite Equation 5.11 in residuals form as:

r(µ) = T[u] − It+1[f3dtm(u; µ)], (5.12)

where we drop the parameter n of f3dtm for clarity. The corresponding linear model for the residuals of Equation 5.12 is

ℓ(δµ) ≡ r(µ̄) + J(µ̄)δµ, (5.13)

where

J(µ̄) = ∂It+1[f3dtm(u; µ)]/∂µ |µ=µ̄. (5.14)

5.3.5 Gradient Replacement

We rewrite Equation 5.14 using the gradient replacement equation (Equation 5.3) as follows:

J(µ) = (∇uT[u])⊤ (∇uf3dtm(u; µ))⁻¹ (∇µf3dtm(u; µ)). (5.15)

In the following, we individually analyze each term of Equation 5.15.

Template gradients on F The first term deals with the template derivatives on the reference frame F:

∇uT[u]⊤ = ( (1/w) ∇iT[u],  (1/w) ∇jT[u],  −(1/w)(u ∇iT[u] + v ∇jT[u]) )⊤. (5.16)



Warp gradients on target coordinates The second term handles the gradients of the warp f3dtm(u; µ) with respect to the target coordinates u. The target is defined on the projective plane P², so we trivially compute the gradient as:

∇uf3dtm(u, n; µ) = K(R − R t n⊤)K⁻¹ HA. (5.17)

The resulting homography matrix (see Equation 5.17) must be inverted for each point u in the reference frame. We directly invert Equation 5.17 as follows:

∇uf3dtm⁻¹ = (K(R − R t n⊤)K⁻¹ HA)⁻¹
          = HA⁻¹ K (R − R t n⊤)⁻¹ K⁻¹
          = HA⁻¹ K (I − t n⊤)⁻¹ R⊤ K⁻¹. (5.18)

We invert the term (I − t n⊤) using the Sherman-Morrison matrix inversion formula [K. B. Petersen]:

(I − t n⊤)⁻¹ = I + t n⊤ / (1 − n⊤t). (5.19)

Plugging Equation 5.19 into Equation 5.18 results in

∇uf3dtm⁻¹ = HA⁻¹ K (I + t n⊤ / (1 − n⊤t)) R⊤ K⁻¹
          = HA⁻¹ K (((1 − n⊤t) I + t n⊤) / (1 − n⊤t)) R⊤ K⁻¹
          = λ HA⁻¹ K (I − (n⊤t) I + t n⊤) R⊤ K⁻¹, (5.20)

where λ = 1/(1 − n⊤t) is a homogeneous scale factor that depends on each shape point.

Target motion gradients The third term computes the gradients of the warp f3dtm(u; µ) with respect to the motion parameters µ. The resulting Jacobian matrix has the following form:

∇µf3dtm(u; µ) = ( ∇R f3dtm(u; R)  ∇t f3dtm(u; t) ). (5.21)

The derivative of the warp with respect to each rotation parameter is computed as follows:

∇∆f3dtm(u; ∆) = K R∆ K⁻¹ HA u, (5.22)

where R∆ is the derivative of the rotation matrix R with respect to the Euler angle ∆ = α, β, γ. We trivially compute the derivatives of the warp with respect to the translation parameters t as follows:

∇tf3dtm(u; t) = K n⊤K⁻¹ HA u. (5.23)

Notice that Equation 5.23 only depends on the target shape—u, the target coordinates in F, and n, its corresponding world plane—but does not depend on the motion parameters anymore; hence, Equation 5.23 is constant. Plugging Equations 5.22 and 5.23 into Equation 5.21 we obtain the final form of the derivatives:

∇µf3dtm(u; µ) = ( K Rα K⁻¹HA u   K Rβ K⁻¹HA u   K Rγ K⁻¹HA u   K n⊤K⁻¹HA u ). (5.24)



Assemblage of the Jacobian Substituting Equations 5.16, 5.20, and 5.24 back into Equation 5.15 we obtain the analytic form of each row of the Jacobian matrix (Equation 5.14):

J⊤ = ( J1 J2 J3 J4 J5 J6 ), (5.25)

with

J1 = ∇uT[u]⊤ λ HA⁻¹ K (I − (n⊤t)I + t n⊤) R⊤Rα K⁻¹HA u,
J2 = ∇uT[u]⊤ λ HA⁻¹ K (I − (n⊤t)I + t n⊤) R⊤Rβ K⁻¹HA u,
J3 = ∇uT[u]⊤ λ HA⁻¹ K (I − (n⊤t)I + t n⊤) R⊤Rγ K⁻¹HA u,
J4 = ∇uT[u]⊤ λ HA⁻¹ K (I − (n⊤t)I + t n⊤) r1 n⊤ K⁻¹HA u,
J5 = ∇uT[u]⊤ λ HA⁻¹ K (I − (n⊤t)I + t n⊤) r2 n⊤ K⁻¹HA u,
J6 = ∇uT[u]⊤ λ HA⁻¹ K (I − (n⊤t)I + t n⊤) r3 n⊤ K⁻¹HA u, (5.26)

where ri for i = 1, . . . , 3 is the i-th column vector of the rotation matrix R—i.e. R = (r1, r2, r3).

5.3.6 Systematic Factorization

We proceed with the systematic factorization of Equation 5.25 using the theorems and lemmas from Appendix D. We attempt to rewrite the terms of the Jacobian such that we compute the products in Equation 5.15 more efficiently. Our first operation is a change of variables v = K⁻¹HA u that rewrites Equations 5.26 as follows:

J1 = ∇uT[u]⊤ λ HA⁻¹ K (I − (n⊤t)I + t n⊤) R⊤Rα v,
J2 = ∇uT[u]⊤ λ HA⁻¹ K (I − (n⊤t)I + t n⊤) R⊤Rβ v,
J3 = ∇uT[u]⊤ λ HA⁻¹ K (I − (n⊤t)I + t n⊤) R⊤Rγ v,
J4 = ∇uT[u]⊤ λ HA⁻¹ K (I − (n⊤t)I + t n⊤) r1 n⊤ v,
J5 = ∇uT[u]⊤ λ HA⁻¹ K (I − (n⊤t)I + t n⊤) r2 n⊤ v,
J6 = ∇uT[u]⊤ λ HA⁻¹ K (I − (n⊤t)I + t n⊤) r3 n⊤ v. (5.27)

We factorize each term of Equation 5.27 as we indicated in Section 5.2. Thus, we rewrite the i-th row of the Jacobian matrix J as

J(i)⊤ = S(i)⊤ M, (5.28)

where S(i)⊤ = ( S(i)⊤_1  S(i)⊤_2 ). We define the matrices S(i)⊤_1 and S(i)⊤_2 as follows:

S(i)⊤_1 = ∇uT[u]⊤(I3 ⊗ n(i)⊤)A + ⟨(I3 ⊗ n(i)⊤)(I9 ⊗ v(i)⊤) − (I3 ⊗ n(i)⊤)(Pπ(9:3) ⊗ v(i)⊤)⟩ B,
S(i)⊤_2 = S(i)⊤_1 ((I3 ⊗ n(i)) ⊗ I4). (5.29)



We build the motion matrix M as

M = [ ((1, t⊤)⊤ ⊗ I9) ⟨vec(R⊤Rα)  vec(R⊤Rβ)  vec(R⊤Rγ)⟩    0_{36×3} ;
      0_{12×3}                                              (I3 ⊗ (1, t⊤)⊤) R ]. (5.30)

The full derivation of the matrices from Equation 5.29 is presented in Appendix E. We assemble the Jacobian J by stacking the N rows J(i)⊤ into a single matrix. Since matrix M is the same for each row J(i)⊤, we can extract M as a common factor and write the Jacobian matrix as

J = S M, (5.31)

where S is the matrix that we compute by stacking the N entries S(i)⊤_1 and S(i)⊤_2:

S = [ S(1)⊤_1  S(1)⊤_2 ;  . . . ;  S(N)⊤_1  S(N)⊤_2 ]. (5.32)

Outline of Algorithm HB3DTM We define the HB algorithm for the warp f3dtm as hb3dtm. We use the factorization equations (Equations 5.29) as a basis for our algorithm; we show the outline of algorithm hb3dtm in Algorithm 7.

Algorithm 7 Outline of the HB3DTM algorithm.

Off-line: Let µi = µ0 be the initial guess.
1: for i = 1 to N do
2:   Compute S(i)_1 and S(i)_2 using Equation 5.29.
3: end for
4: Assemble matrix S using Equation 5.32.
On-line:
5: while no convergence do
6:   Compute the residual function r(µi) from Equation 5.12.
7:   Compute matrix M(µi) using Equation 5.30.
8:   Assemble the Jacobian: J(µi) = S M(µi) (Equation 5.31).
9:   Compute the search direction: δµi = −(J(µi)⊤J(µi))⁻¹ J(µi)⊤ r(µi).
10:  Additively update the optimization parameters: µi+1 = µi + δµi.
11: end while



5.4 3D Nonrigid Motion

In addition to the motion of the target in space, we allow deformations of the target itself. This nonrigid motion is caused by elastic deformations of the target (e.g. deformations of an elastic sheet, changes of facial expression), or by nonelastic motion of target portions (e.g. jaw rotation due to mouth opening).

5.4.1 Nonrigid Morphable Models

As in the rigid case (Section 5.3), we describe the target using nonrigid morphable models (3dmm) [Romdhani and Vetter, 2003]: we describe the shape deformation using a linear combination of modes of deformation that we have obtained by applying pca on a set of shape samples:

s = s0 + Σ_{k=1}^{K} ck sk,   s, s0, sk ∈ R³, (5.33)

where s0 ∈ R³ is the pca average sample shape, sk are the K modes of variation from pca, and ck are the coefficients of the linear combination (see Figure 5.5). Note that each shape point s has different s0 and sk, but all of them share the same K coefficients:

S_{3×N} = S0_{3×N} + Σ_{k=1}^{K} ck Sk_{3×N}, (5.34)

where S0_{3×N} and Sk_{3×N} are the matrices that we compute by joining the N shape averages s0 and modes sk.

Figure 5.5: Nonrigid Morphable Models. The 3dmm models the 3D shape as a linear combination of 3D shapes that represent the modes of variation.
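A toy numpy rendering of the linear shape model in Equation 5.34, with made-up sizes (K = 2 modes, N = 4 vertices):

```python
import numpy as np

# Sketch of the 3DMM linear shape model: every column of S is a vertex, and
# all vertices are deformed by the same K coefficients c.
rng = np.random.default_rng(3)
N, K = 4, 2
S0 = rng.normal(size=(3, N))              # PCA mean shape
modes = rng.normal(size=(K, 3, N))        # K modes of variation
c = np.array([0.7, -0.2])                 # deformation coefficients

S = S0 + np.tensordot(c, modes, axes=1)   # S = S0 + sum_k c_k S_k
print(S.shape)                            # (3, N)
```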

5.4.2 Nonrigid Shape-induced Homography

As for the rigid case, we model the target dynamics using shape-induced homographies (see Equation 5.8), but we must account for the target deformation. We equivalently define the nonrigid shape-induced homography fh6d as a family of plane-induced homographies that relate the projections of a shape that rotates, translates and deforms in space:

fh6d(x, n; µ) = x′ = K(R + R Bs c n⊤ − R t n⊤)K⁻¹ x, (5.35)

where x and x′ are the projections of a generic shape point s located on the plane π = (n⊤, 1)⊤ (see Figure 5.6), Bs = (s1, . . . , sK) is a 3×K matrix that contains the modes of variation, and c = (c1, . . . , cK)⊤ is the vector of deformation coefficients. Vector µ contains a parameterization for the rotation matrix R, the translation t and the deformation coefficients c.

Warp Rationale We project the generic shape point s onto the cameras

P = K[I|0],   P′ = K[R| − Rt]. (5.36)

We describe the point s by rewriting Equation 5.33 as:

s = s0 + Σ_{k=1}^{K} ck sk = s0 + Bs c,   s, s0, sk ∈ R³. (5.37)

We explicitly assume that the target does not deform in the first view; that is, we image s under P as:

x = K[I|0] ( s0 + Bs 0 ; 1 ). (5.38)

If we encode the deformation between the two views as c, then we image s under P′ as:

x′ = K[R| − Rt] ( s0 + Bs c ; 1 ) = K (R s0 + R Bs c − R t). (5.39)

The world plane π = (n⊤, 1)⊤ naturally satisfies n⊤s0 = 1; thus, we rewrite Equation 5.39 as follows:

x′ = K(R s0 + R Bs c n⊤s0 − R t n⊤s0) = K(R + R Bs c n⊤ − R t n⊤) s0. (5.40)

Using Equation 5.38 we rewrite Equation 5.40 as the nonrigid shape-induced homography between the projections x ↔ x′:

x′ = K(R + R Bs c n⊤ − R t n⊤)K⁻¹ x. (5.41)

5.4.3 Change of Variables to the Reference Frame

We can also express the target coordinates in terms of the reference frame F. As in the rigid case, there exists an affine transformation HA such that x = HA u—see Figure 5.7. Thus, we write the warp f3dmm that relates shape coordinates in F with the projections onto a view due to the target motion:

f3dmm = x′ = K(R + R Bs c n⊤ − R t n⊤)K⁻¹ HA u. (5.42)



Figure 5.6: Nonrigid shape-induced homographies. (Top-left) We image the average shape s1 using P = [I|0] onto the left view. (Top-right) We compute the deformed shape s′1 using Equation 5.33, and image it onto the right view by using P′ = [R| − Rt]. (Middle-left) Close-up of the average shape. Point s1 on the average shape lies on the plane with coordinates (π1, 1)⊤. (Middle-right) Close-up of the deformed shape. The plane on which s′1 lies differs from (π1, 1)⊤ by a rigid body transformation and a shape deformation. (Bottom-left) Close-up of the left view. We image the average shape point s1 as x1. (Bottom-right) Close-up of the right view. We image the deformed point s′1 as x′1. The correspondence s1 ↔ s′1 induces a homography H1 between x1 and x′1. Note that each shape induces a family of homographies that are parameterized by a common R and t (cf. Equation 5.40).



Figure 5.7: Deformable warp defined on the reference frame. (Top-left) Shape triangle mesh and texture on the reference frame F. (Top-right) The image corresponding to the left view of Figure 5.6—the reference image. (Middle) Close-up of the average shape (see Figure 5.6, middle-left). (Bottom-left) Close-up of the top-left image. We compute the point u1 on the reference frame that corresponds to the average shape point s1 by using the shape function S. (Bottom-right) Close-up of the top-right image. We image the point s1 as x1. Thus, there exists a correspondence u1 ↔ x1 by means of s1. We compute such a correspondence as an affine transformation H1A between u1 and x1. This transformation holds for all the points on the triangle that contains u1. There is a different transformation HiA for each i-th triangle in the shape. Hence, the mapping between the reference frame and the reference image is a piecewise affine transformation.



5.4.4 Optimization Outline

We define the brightness dissimilarity function using Equation 5.42 as follows:

D(F; µ) = T[F] − It[f3dmm(F, n(F); µ)], (5.43)

where n ≡ n(u) is the vector normal to the plane that contains the point S(u), ∀u ∈ F. As in the rigid case (Equation 5.11), Equation 5.43 is continuous over F. We rewrite Equation 5.43 in residuals form,

r(µ) = T[u] − It+1[f3dmm(u; µ)], (5.44)

where we drop the parameter n from f3dmm for clarity. The corresponding linear model for the residuals of Equation 5.44 is

ℓ(δµ) ≡ r(µ̄) + J(µ̄)δµ, (5.45)

where

J(µ̄) = ∂It+1[f3dmm(u; µ)]/∂µ |µ=µ̄. (5.46)

5.4.5 Gradient Replacement

We rewrite Equation 5.46 using the gradient replacement equation (Equation 5.3) as follows:

J(µ) = (∇uT[u])⊤ (∇uf3dmm(u; µ))⁻¹ (∇µf3dmm(u; µ)). (5.47)

In the following, we analyze each term of Equation 5.47 separately.

Template gradients on F The first term deals with the template derivatives on the reference frame F. These derivatives are identical to Equation 5.16, since they do not depend upon the target dynamics.

Warp gradients on target coordinates The second term handles the gradients of the warp f3dmm(u; µ) with respect to the target coordinates u. We compute the gradient as follows:

∇uf3dmm(u, n; µ) = K(R + R Bs c n⊤ − R t n⊤)K⁻¹ HA = K R (I + Bs c n⊤ − t n⊤)K⁻¹ HA. (5.48)

Equation 5.47 calls for the inverse form of Equation 5.48, thus

∇uf3dmm(u, n; µ)⁻¹ = HA⁻¹ K (I + Bs c n⊤ − t n⊤)⁻¹ R⊤ K⁻¹. (5.49)

Again, we analytically invert (I + Bs c n⊤ − t n⊤) by using the Sherman-Morrison inversion formula [K. B. Petersen]:

(I + (Bs c − t) n⊤)⁻¹ = I − (Bs c − t) n⊤ / (1 + n⊤(Bs c − t)). (5.50)



Plugging Equation 5.50 into Equation 5.49 results in

∇uf3dmm(u, n; µ)⁻¹ = HA⁻¹ K (I − (Bs c − t) n⊤ / (1 + n⊤(Bs c − t))) R⊤ K⁻¹
                   = HA⁻¹ K (((1 + n⊤(Bs c − t)) I − (Bs c − t) n⊤) / (1 + n⊤(Bs c − t))) R⊤ K⁻¹
                   = λ HA⁻¹ K (I + (n⊤Bs c) I − (n⊤t) I − Bs c n⊤ + t n⊤) R⊤ K⁻¹, (5.51)

where λ = 1/(1 + n⊤(Bs c − t)) is a homogeneous scale factor that depends on each target point.

Target motion gradients The third term computes the gradients of the warp f3dmm(u; µ) with respect to the motion parameters µ:

∇µf3dmm(u; µ) = ( ∇R f3dmm(u; R)  ∇t f3dmm(u; t)  ∇c f3dmm(u; c) ). (5.52)

We compute the derivatives of the warp with respect to each one of the rotation parameters as follows:

∇∆f3dmm(u; ∆) = K R∆ K⁻¹HA u + K R∆ Bs c n⊤ K⁻¹HA u = K R∆ (I + Bs c n⊤) K⁻¹HA u, (5.53)

where R∆ is the derivative of the rotation matrix R with respect to the Euler angle ∆ = α, β, γ. We trivially compute the derivatives of the warp with respect to the translation parameters t as follows:

∇tf3dmm(u; t) = K n⊤K⁻¹HA u. (5.54)

We additionally compute the derivatives of f3dmm with respect to the deformation parameters c:

∇ck f3dmm(u; ck) = K R Bk n⊤K⁻¹HA u, (5.55)

where Bk is the k-th column of the matrix Bs—i.e. Bk is the k-th mode of deformation.

Assemblage of the Jacobian Substituting Equations 5.53, 5.54, and 5.55 back into Equation 5.47 yields the analytic form of each row of the Jacobian matrix,

J⊤ = ( J1 J2 J3 J4 J5 J6 J7 · · · J6+K ), (5.56)



with

J1 = ∇uT[u]⊤ D Rα (I + Bs c n⊤) v,
J2 = ∇uT[u]⊤ D Rβ (I + Bs c n⊤) v,
J3 = ∇uT[u]⊤ D Rγ (I + Bs c n⊤) v,
J4 = ∇uT[u]⊤ D r1 n⊤ v,
J5 = ∇uT[u]⊤ D r2 n⊤ v,
J6 = ∇uT[u]⊤ D r3 n⊤ v,
J6+k = ∇uT[u]⊤ D Bk n⊤ v,   k = 1, . . . , K, (5.57)

where D is short for

D = λ HA⁻¹ K (I + (n⊤Bs c) I − (n⊤t) I − Bs c n⊤ + t n⊤) R⊤, (5.58)

v is short for v = K⁻¹HA u, and ri for i = 1, . . . , 3 is the i-th column vector of the rotation matrix R—i.e. R = (r1, r2, r3).

5.4.6 Systematic Factorization

In this section we introduce the factorization of Equation 5.57. As we will see in Chapter 7, the factorization of the nonrigid warp f3dmm does not increase the efficiency of the original model (Equation 5.57): the overhead of repeated operations between parameters is a computational burden. We solve the problem by introducing a partial factorization procedure: we only factorize and precompute those nonrepeated combinations of parameters, which is faster than computing the full factorization.

Full Factorization

We factorize Equation 5.57 using the theorems and lemmas from Appendix D. We present the full derivation of matrices S and M in Appendix G. As in the rigid case, we proceed with each row of the Jacobian matrix (Equation 5.56) by rewriting it as

J⊤_{1×(6+K)} = S⊤ M. (5.59)

We write the row vector S⊤ as

S⊤ = ( S⊤1, S⊤1, S⊤1, S⊤2, S⊤3 )_{1×(210+280K+72K²)}, (5.60)

where we define the vectors S⊤i for i = 1, 2, 3 as follows:

S1 = (  (D⊤(I3 ⊗ n⊤Bs)(I3 ⊗ v⊤))⊤
        (D⊤Bs(IK ⊗ n⊤)(I3 ⊗ v⊤))⊤
        (D⊤(I3 ⊗ n⊤Bs)(I3 ⊗ vec(B⊤s)v))⊤
       −(D⊤Bs(IK ⊗ n⊤)(I3 ⊗ vec(B⊤s)v))⊤
        (D⊤(I3 ⊗ v⊤))⊤
       −(D⊤(I3 ⊗ n⊤)(I3 ⊗ v⊤))⊤
        (D⊤(I3 ⊗ n⊤)(I3 ⊗ v⊤))⊤
        (D⊤(I3 ⊗ v⊤vec(B⊤s)⊤))⊤
       −(D⊤(I3 ⊗ n⊤)(I3 ⊗ v⊤vec(B⊤s)⊤))⊤
        (D⊤(I3 ⊗ n⊤)(I3 ⊗ v⊤vec(B⊤s)⊤))⊤ )_{1×(63+81K+18K²)},

S2 = (  (D⊤n⊤v(I3 ⊗ n⊤Bs))⊤
       −(D⊤n⊤v(Bs(IK ⊗ n⊤)))⊤
        (D⊤n⊤v)⊤
       −(D⊤n⊤v(I3 ⊗ n⊤))⊤
        (D⊤n⊤v(I3 ⊗ n⊤))⊤ )_{1×(21+6K)},

and

S3 = (  (D⊤(n⊤v)B(IK ⊗ (n⊤Bs)))⊤
       −(D⊤(n⊤v)Bs(IK ⊗ n⊤)(I3K ⊗ vec(B)⊤))⊤
        (D⊤B n⊤v)⊤
       −(D⊤(I3 ⊗ n⊤)(n⊤v)(I9 ⊗ vec(B)⊤))⊤
        (D⊤(I3 ⊗ n⊤)(n⊤v)(I9 ⊗ vec(B)⊤))⊤ )_{1×(31K+18K²)}. (5.61)

The matrix M comprises the motion terms; we define this matrix as the block-diagonal matrix

M = diag( M1, M2, M3, M4, M5 )_{(210+280K+72K²)×(6+K)}, (5.62)

where

M1 = (  vec(R⊤αR(IK ⊗ c⊤))
        vec(R⊤αR(I3 ⊗ c⊤))
        vec((IK ⊗ c)R(I3 ⊗ c⊤))
        vec((IK ⊗ c)R(I3 ⊗ c⊤))
        vec(R⊤αR)
        vec(R⊤αR(I3 ⊗ t⊤))
        vec(R⊤αR(t⊤ ⊗ I3))
        vec((I3 ⊗ c)R)
        vec((I3 ⊗ c)R⊤(I3 ⊗ t⊤))
        vec((I3 ⊗ c)R⊤(t ⊗ I3)) )_{(63+81K+18K²)×1},

M2 and M3 are defined analogously, with Rβ and Rγ respectively replacing Rα,

M4 = (  (IK ⊗ c)R⊤
        (I3 ⊗ c)R⊤
        R⊤
        (I3 ⊗ t)R⊤
        (t ⊗ I3)R⊤ )_{(21+6K)×3},

and

M5 = (  (I3K ⊗ c)(I3K ⊙ (I3 ⊗ c))
        IK
        ((I3 ⊗ t) ⊙ IK)
        ((t ⊗ I3) ⊙ IK) )_{(31K+18K²)×K}. (5.63)



We assemble the Jacobian matrix (Equation 5.47) as

J = S M, (5.64)

where we define S as the concatenation of the N rows S⊤ (Equation 5.60).

Partial Factorization

We introduce an alternate decomposition of the Jacobian matrix (Equation 5.47). The main feature of this decomposition is that it does not provide a full factorization: it does not completely separate structure and motion terms, but provides a partial separation instead. In the experiments we show that this partial factorization is more efficient than using no factorization at all or using the full factorization procedure. The partial factorization provides a speed improvement as it precomputes some operations among shape parameters.

We show the detailed derivation of the partial factorization of Equation 5.47 in Appendix F. The resulting elements for a row of the Jacobian are

J1 = D1 D2 R⊤Rαt (I3 + Bs c n⊤) v,
J2 = D1 D2 R⊤Rβt (I3 + Bs c n⊤) v,
J3 = D1 D2 R⊤Rγt (I3 + Bs c n⊤) v,
J4 = D1 D2 r1 n⊤ v,
J5 = D1 D2 r2 n⊤ v,
J6 = D1 D2 r3 n⊤ v,
J6+k = D1 D2 R⊤ Bk,   k = 1, . . . , K, (5.65)

where

D1 = ⟨I3 P′ + [s1 P + s2 Q] Q′⟩,  and  D2 = I3 ⊗ (1, t⊤, c⊤)⊤. (5.66)

Note that there are shape terms post-multiplying the motion term D2 (see Equation 5.65), so we cannot express the Jacobian as in the full factorization case—i.e. J = SM. We show that the partial factorization (Equations F.9) is far more efficient than (1) not using factorization at all, and (2) using the full factorization (Equation 5.61). The reason for the latter is that if we try to compute a full factorization from Equations 5.65, the computational cost increases due to the larger size of the inner product matrices. We give the theoretical foundations of this fact in Chapter 7.



Outline of Algorithm HB3DMM We define the full-factorization HB algorithm for the warp f3dmm as hb3dmm. We use the factorization equations (Equation 5.59) as a basis for our algorithm; we show the outline of algorithm hb3dmm in Algorithm 8.

Algorithm 8 Outline of the full-factorized HB3DMM algorithm.

Off-line: Let µi = µ0 be the initial guess.
1: for i = 1 to N do
2:   Compute S(i)_1, S(i)_2, and S(i)_3 using Equation 5.61.
3: end for
4: Assemble the matrix S.
On-line:
5: while no convergence do
6:   Compute the residual function r(µi) from Equation 5.44.
7:   Compute the matrix M(µi) using Equation 5.63.
8:   Assemble the Jacobian: J(µi) = S M(µi) (Equation 5.59).
9:   Compute the search direction: δµi = −(J(µi)⊤J(µi))⁻¹ J(µi)⊤ r(µi).
10:  Additively update the optimization parameters: µi+1 = µi + δµi.
11: end while

Outline of Algorithm HB3DMMSF In this section we define again the HB algorithm for the warp f3dmm. In this case we deal with the partial factorization; thus, we rename this algorithm hb3dmmsf to differentiate it from the full-factorized algorithm—i.e. the hb3dmm algorithm. We use the factorization equations (Equations 5.65) as a basis for our algorithm; we show the outline of algorithm hb3dmmsf in Algorithm 9.

5.5 Summary

• In this chapter we have analysed the HB factorization-based optimization in depth; we have shown that the efficiency of the method relies on (1) a gradient replacement procedure, and (2) a neat factorization of the Jacobian matrix.

• We have proposed a necessary requirement that constrains the motion model for the HB algorithm. We have addressed a fundamental criticism of the HB algorithm by proposing a systematic factorization framework.

• We have also introduced two motion/warping models that enable us to efficiently track 3D rigid targets—the shape-induced homography—and non-rigid targets—the non-rigid shape-induced homography—by using a factorization approach.



Algorithm 9 Outline of the HB3DMMSF algorithm.

Off-line: Let µi = µ0 be the initial guess.
1: for i = 1 to N do
2:   Compute D(i)_1 using Equation 5.66.
3: end for
On-line:
4: while no convergence do
5:   Compute the residual function r(µi) from Equation 5.44.
6:   Compute D2 using Equation 5.66 on the current parameters t and c.
7:   for i = 1 to N do
8:     Compute J(i)_1, . . . , J(i)_{6+K} using Equation 5.65.
9:   end for
10:  Assemble the Jacobian J(µi).
11:  Compute the search direction: δµi = −(J(µi)⊤J(µi))⁻¹ J(µi)⊤ r(µi).
12:  Additively update the optimization parameters: µi+1 = µi + δµi.
13: end while



Chapter 6

Compositional Algorithms

In this chapter we discuss compositional algorithms in greater depth than in Section 3.4. We organize the chapter as follows: in Section 6.1 we provide the basic assumptions on compositional image registration; besides, we give some insights into the workings of the IC algorithm, especially the reversed roles of template and image; moreover, we introduce the Efficient Forward Compositional algorithm, and we show that IC can be derived as a special case of this algorithm. Section 6.2 introduces two basic requirements that compositional algorithms must hold. Finally, Section 6.3 studies in detail other compositional methods such as the Generalized Inverse Compositional algorithm.

6.1 Unravelling the Inverse Compositional Algorithm

IC is known to be the fastest algorithm for image registration [Baker and Matthews, 2004; Buenaposada et al., 2009]. Although it is widely used [Brooks and Arbel, 2010; Dowson and Bowden, 2008; Guskov, 2004; Megret et al., 2008, 2006; Munoz et al., 2005; Papandreu and Maragos, 2008; Romdhani and Vetter, 2003; Tzimiropoulos et al., 2011; Xu and Roy-Chowdhury, 2008], it is still not well understood how the IC algorithm works in terms of traditional gradient descent algorithms. We summarize these questions in the following:

Convergence of compositional algorithms In GD optimization the convergence is guaranteed by construction: the algorithm looks for a set of parameters xk+1 ∈ R^N such that the value of the cost function F : R^N ↦ R decreases—i.e. F(xk+1) < F(xk). This problem is solved by expressing xk+1 as xk+1 = xk + h, for some unknown h ∈ R^N; notice that this is equivalent to the additive update step of the additive image registration algorithms (see Section 3.3). The values of h are computed by expanding F(xk+1) in a Taylor series (cf. [Madsen et al., 2004; Press et al., 1992]) as follows:

F(xk+1) ≃ F(xk) + h⊤F′(xk).

Vector h is a descent direction for F at xk if h⊤F′(xk) < 0; hence the requirement F(xk+1) < F(xk) holds, as F(xk+1) − F(xk) < 0. Then, the next iteration is computed by using xk+1 = xk + h.

In the case of compositional algorithms we cannot use the previous approach: the next iterate in the search space is not computed as xk+1 = xk + h but as xk+1 = Ψ(xk, h) for some composition function Ψ. The algorithm is not a GD method in the strict sense. Convergence is assured in GD methods by construction: the cost function value at the next step is always lower than the previous one (cf. [Madsen et al., 2004; Press et al., 1992]). However, such a statement cannot be made for the IC algorithm, as it is not possible to relate the values of the objective function between two steps due to the non-additive step.

Origins of inverse composition In Section 3.4.2 we showed that the crucial point in the improvement of efficiency of IC with respect to the FC algorithm is to rewrite the brightness error function: the FC brightness dissimilarity function,

D(X; δµ) = T(X) − It(f(f(X; δµ); µ)), (6.1)

is rewritten in the IC brightness dissimilarity as

D(X; δµ) = T(f⁻¹(X; δµ)) − It(f(X; µ)). (6.2)

The vector δµ comprises the optimization variables—µ is deemed constant. The local minimizer based on the residuals of Equation 6.2 has a constant Jacobian (cf. Section 3.4.2), unlike the minimizer based on Equation 6.1.

In the original formulation of the IC algorithm [Baker and Matthews, 2001, 2004], Baker and Matthews simply stated Equation 6.2 without any further explanation: they did not justify how to transform the FC dissimilarity (Equation 6.1) into the IC dissimilarity (Equation 6.2). Here we show that this transformation depends on a change of variables in Equation 6.1.

Can we always use Inverse Composition? [Baker and Matthews, 2004] state that we can reverse the roles of template and image provided that the following requirements on the warp f are satisfied:

1. The warp is closed under composition (i.e. f(x;µ′) = f(f(x; δµ);µ)).

2. The warp has an inverse f−1 such that x = f−1(f(x;µ);µ), ∀µ.

3. The warp identity is µ = 0 (i.e. f(x;0) = x).



These requirements imply that the warp f must form a group [Baker and Matthews, 2004]. However, the IC algorithm is not suitable for certain problems such as 2.5d tracking [Matthews et al., 2007]: even when the group constraint holds, the algorithm does not properly converge. We introduce two requirements that effectively constrain the warp.
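For homographies the three conditions above are easy to check numerically; the following sketch uses arbitrary parameters, with warp composition realized as a matrix product.

```python
import numpy as np

# Sketch: the three warp requirements, checked for homographies acting on P^2.
rng = np.random.default_rng(4)
H1 = np.eye(3) + 0.1 * rng.normal(size=(3, 3))   # warp with parameters mu
dH = np.eye(3) + 0.1 * rng.normal(size=(3, 3))   # incremental warp delta-mu

x = np.array([1.5, -0.7, 1.0])                   # homogeneous point

# 1. Closure: f(f(x; dmu); mu) is again a homography, namely H1 @ dH.
print(np.allclose(H1 @ (dH @ x), (H1 @ dH) @ x))        # True

# 2. Inverse: x = f^{-1}(f(x; mu); mu).
print(np.allclose(np.linalg.inv(H1) @ (H1 @ x), x))     # True

# 3. Identity: mu = 0 corresponds to H = I.
print(np.allclose(np.eye(3) @ x, x))                    # True
```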

6.1.1 Change of Variables in IC

We define a new variable U = f(X; δµ), such that X = f⁻¹(U; δµ) also holds. We substitute this variable in Equation 6.1 as follows:

D(U; δµ) = T(f⁻¹(U; δµ)) − It(f(U; µ)). (6.3)

Note that Equations 6.1 and 6.3 are equivalent by construction: we have just changed the domain in which the functions are defined (see Figure 6.1), but the coordinates X and f⁻¹(U; δµ) are exactly the same—ditto for U and f(X; δµ). Also note that Equation 6.3 is similar to the IC dissimilarity function—cf. Equation 6.2. The only difference is that both equations are defined in different domains: Equation 6.3 is defined in domain U and Equation 6.2 (IC) is defined in X. This difference is not trivial at all: X and U are only identical if δµ = 0 (see Figure 6.1); hence, the IC problem should be solved in the unknown domain U—we do not know the coordinates U as they depend on the unknown variables δµ. Thus, we face a chicken-and-egg situation: we have to know δµ to solve for δµ. [Baker and Matthews, 2004] simply ignore the problem and solve for δµ using Equation 6.2, which raises the following question:

How does the IC algorithm (Equation 6.2) converge, if it is defined in the wrong domain?

We shall show that this is not always true; we demonstrate this assertion by introducing a new FC algorithm that is equivalent to IC under certain assumptions only.

6.1.2 The Efficient Forward Compositional Algorithm

We define the Efficient Forward Compositional (EFC) algorithm as a FC algorithm with a constant Jacobian: the EFC is similar to the IC—both are GN-like methods with constant Jacobian. Their critical difference is that EFC does not reverse the roles of template and image: the brightness dissimilarity is linearized in the image—as in FC—and not in the template—as the IC algorithm does.

First, we rewrite Equation 6.1 such that the variable t appears in explicit form,

D(X; δµ, t+1) = T(X) − I(f(f(X; δµ); µt), t+1), (6.4)



Figure 6.1: Change of variables in IC. (Top-left) We overlay the target region X (yellow square) onto the template T. (Top-right) We transform the target region X by means of f(f(X; δµ); µt), and we overlay it onto the image It+1 (yellow square). (Bottom-left) We overlay the target region U = f(X; δµ) (green square) onto the template T. We also depict the transformed region f⁻¹(U; δµ) (dotted blue line). (Bottom-right) We transform the target region U by means of f(U; µt), and we overlay it onto the image It+1 (green square). Notice that the regions X and f⁻¹(U; δµ) delimit identical image areas—ditto for f(f(X; δµ); µt) and f(U; µt)—but X and U do not.



where the dissimilarity is now a two-variable function. Let tτ = t + τ be a time instant in the range t ≤ t + τ ≤ t + 1 such that its brightness constancy assumption,

I(f(X; µtτ); tτ) = T(X), (6.5)

holds for parameters µtτ. We rewrite Equation 6.4 as a residuals vector as follows:

rEFC(µtτ; µ̄, τ) ≡ T(x) − I(f(f(x; µ̄); µtτ), τ), (6.6)

where µ̄ are registration parameters such that

T(x) = I(f(f(x; µ̄); µtτ), τ). (6.7)

We approximate the residuals function rEFC(µtτ; 0 + δµ, τ) by using a first order Taylor expansion at µ̄ = 0 and τ = tτ,

rEFC(µtτ; δµ, t+1) ≡ rEFC(µtτ; 0, tτ) + ∇µrEFC(µtτ; 0, tτ) δµ + ∇τrEFC(µtτ; 0, tτ) ∆t + O(δµ, ∆t)², (6.8)

where ∆t = t + 1 − tτ,

rEFC(µtτ; 0, tτ) = T(x) − I(f(f(x; 0); µtτ); tτ) = T(x) − I(f(x; µtτ); tτ), (6.9)

∇µrEFC(µtτ; 0, tτ) = − ∂I(f(f(x; µ̄); µtτ), tτ)/∂µ̄ |µ̄=0, (6.10)

and

∇τrEFC(µtτ; 0, tτ) = − ∂I(f(f(x; 0); µtτ), τ)/∂τ |τ=tτ. (6.11)

This linear approximation is valid for any µtτ provided that δµ and ∆t are small enough. We can then make the additional approximation

∂I(f(f(x; 0); µtτ), τ)/∂τ |τ=tτ · ∆t ≈ I(f(f(x; 0); µtτ), t+1) − I(f(f(x; 0); µtτ), tτ) + O(∆t)². (6.12)

Inserting Equation 6.12 into Equation 6.8 we get

rEFC(µt+τ; δµ, t+1) ≃ ℓ(δµ) ≡ rEFC(µt+τ; 0, t+1) + JEFC(µt+τ; 0, tτ) δµ, (6.13)

where

rEFC(µt+τ; 0, t+1) = T(x) − I(f(f(x; 0); µt+τ), t+1), (6.14)

JEFC(µt+τ; 0, tτ) = ∂I(f(f(x; µ̄); µtτ), tτ)/∂µ̄ |µ̄=0
                  = [ ∂I(f(x; µtτ), tτ)/∂x |x=x̄ ]⊤ ∂f(x̄; µ̄)/∂µ̄ |µ̄=0. (6.15)

The first term of Equation 6.15 is the gradient of the warped image at time tτ. The second is the Jacobian of the warp at µ̄ = 0, which is constant. From Equation 6.5 we know that the image at time tτ warped by µtτ is identical to the template image.



Note that, actually, only the images associated with time instants t and t+1 are available. This is not a problem, since we are only interested in substituting the gradient of the warped image by that of the template.

Therefore, the gradient of the warped image should be equal to the gradient of the template:

∂I(f(x; µtτ); tτ)/∂x |x=x̄ = ∂T(x)/∂x |x=x̄. (6.16)

Equation 6.16 holds if the GEE does. In this case, we may rewrite Equation 6.15 as

JEFC(0) = [ ∂T(x)/∂x |x=x̄ ]⊤ ∂f(x̄; µ̄)/∂µ̄ |µ̄=0, (6.17)

which is constant by construction as it only depends on x̄ and µ̄ = 0¹—thus, we remove the dependencies on µt+τ and tτ from Equation 6.17.

Outline of the EFC Algorithm We compute the local minimizer of ℓEFC(δµ) by using:

δµ = −(JEFC(0)⊤JEFC(0))⁻¹ JEFC(0)⊤ rEFC(0), (6.18)

which is iterated until convergence using µt+1 = µt ∘ δµ as the update rule—the algorithm is still forward compositional. We outline the algorithm in Figure 6.2 and Algorithm 10.

Algorithm 10 Outline of the Efficient Forward Compositional algorithm.

Off-line:
1: Compute the constant Jacobian, JEFC(0), by using Equation 6.17.
On-line: Let µi = µ0 be the initial guess.
2: while no convergence do
3:   Compute the residual function rEFC(µi; 0, t+1) from Equation 6.6.
4:   Compute the search direction: δµi = −(JEFC(0)⊤JEFC(0))⁻¹ JEFC(0)⊤ rEFC(µi; 0, t+1).
5:   Update the optimization parameters: µi+1 = µi ∘ δµi.
6: end while

6.1.3 Rationale of the Change of Variables in IC

We now show that we may transform any EFC problem (Equation 6.1) into its corresponding IC equivalent (Equation 6.2), in the following proposition:

¹We thankfully acknowledge J. M. Buenaposada for rewriting the FC algorithm by using the GEE.



Figure 6.2: Forward compositional image registration. We compute the target region on frame t+1 (Image) using the parameters of frame t (µt). Using the target region at the Template we compute a Dissimilarity Measure. We linearize the dissimilarity measure around 0 and we compute the descent direction in the search space using Least-squares. We update the parameters using composition and we recompute the target region on frame t+1 using the new parameters. The process is iterated until convergence.



Proposition 4. The EFC problem is equivalent to the IC problem.

Proof. The GEE holds by definition of EFC. Thus, Corollary 4 holds up to a first order approximation. Let us assume there exists an open set Ω ∋ x0, and some δµ such that f⁻¹(x0; δµ) = x′ ∈ Ω; then the bcc holds both at x0 and x′,

T[x0] = It+1[f(f(x0; δµ); µt)], (6.19)

and

T[x′] = It+1[f(f(x′; δµ); µt)]. (6.20)

Thus, substituting x′ = f⁻¹(x0; δµ), we may rewrite Equation 6.20 as

T[f⁻¹(x0; δµ)] = It+1[f(f(f⁻¹(x0; δµ); δµ); µt)] = It+1[f(x0; µt)],

which is the IC equivalent formulation of Equation 6.19.

6.1.4 Differences between IC and EFC

Although IC and EFC are equivalent according to Proposition 4, there are subtle differences between them. In Section 6.1.1 we introduced the notion that the IC dissimilarity function,

D(X; δµ) = T(f⁻¹(X; δµ)) − It(f(X; µ)), (6.21)

is the result of a change of variables in the EFC dissimilarity function,

D(X; δµ) = T(X) − It(f(f(X; δµ); µ)).

However, the original IC formulation computes the warp function at the template, not its inverse (cf. [Baker and Matthews, 2004]),

D(X; δµ) = T(f(X; δµ)) − It(f(X; µ)). (6.22)

Equations 6.21 and 6.22 are equivalent, but they do not yield the same result: the parameters δµ computed from Equation 6.21 are different from those computed from Equation 6.22, yet both yield the same parameters µt+1—the update function for Equation 6.21 is µt+1 = µt ∘ δµ, whereas Equation 6.22 computes µt+1 as µt+1 = µt ∘ δµ⁻¹.

Thus, although equivalent, EFC has an immediate advantage over the original IC: efficiency in the EFC algorithm does not depend on any inversion, neither of the warp nor of the parameter update function. Note that inversion may pose a problem for warps such as 3DMM or AAM—as pointed out in [Romdhani and Vetter, 2003].

84

Page 107: Efficient Model-based 3D Tracking by Using Direct Image Registration

6.2 Requirements for Compositional Warps

In this section we state two requirements that every efficient compositional algorithmshould meet; note that we refer to efficient methods, that is, the EFC, IC, and GIC

algorithms. We intentionally leave out the FC algorithm, as it must verify only oneof the requirements.

6.2.1 Requirement on Warp Composition

The first requirement constrains the properties of the motion warp; [Baker andMatthews, 2004] states a similar property by requiring the warp f to be closedunder composition, that is

f(x;µt+1) = f(f(x; δµ);µt), (6.23)

for some parameters µt, µt+1, and δµ. We generalize this property by allowingthe composition between different warps: f ,g : X × P 7→ X are two warps map-ping domain X into itself, that are parameterized in the domain P ; we state therequirement as follows:

Requirement 2. The composition f g must be a warp f , that is, for any µ ∈ Pthere exist δµ ∈ P and µ′ ∈ P such that

f(X ;µ′) = f(g(X ; δµ);µ).

This generalization is useful for some warps—e.g. for plane+parallax homogra-phies h3dpp (see Section C)—although for most of the cases we can safely assumethat g ≡ f . Besides, there must exist the identity parameters µ0 such that

g(X ;µ0) = X .

This constraint is similar to the one proposed in [Baker and Matthews, 2004] whereµ0 = 0. Requirement 2 is mandatory to express the dissimilarity function

D(X ;µt+1) = T (X )− It+1(f(X ;µt+1)), (6.24)

into the equivalent error function

D(X ;µt+1) = T (X )− It+1(f(f(X ; δµ);µt)), (6.25)

which is intrinsic to every compositional algorithm—FC, EFC, IC, and GIC.

6.2.2 Requirement on Gradient Equivalence

The second requirement is absolutely necessary to achieve efficiency in compositionalalgorithms.

85

Page 108: Efficient Model-based 3D Tracking by Using Direct Image Registration

Requirement 3. A GN-like algorithm with constant Jacobian is feasible if thebrightness error GEE holds.

Requirement 3 lets us transform the FC algorithm into the EFC constant Jacobianalgorithm (see Section 6.1.2). Furthermore, Requirement 3 allows to effectively per-form the change of variables needed by IC algorithm (see Section 6.1.3). Notice thatRequirement 3 is similar to the Requirement 1 proposed for additive image registra-tion algorithms; although both requirements are identical, we distinguish betweenthem to have separate requirements for additive and compositional approaches.

6.3 Other Compositional Algorithms

[Baker et al., 2004b] introduced FC and IC as the basic algorithms for compositionalimage registration. However, other authors have also proposed modifications to thesealgorithms to extend their functionality. In this section we review the GeneralizedInverse Compositional, which extends the IC to use other optimization methodsthan GN.

6.3.1 Generalized Inverse Compositional Algorithm

We introduced the GIC algorithm [Brooks and Arbel, 2010] in Section 3.5: themotivation under the GIC was to create an efficient algorithm—i.e. with constantJacobian—that could be used with other optimization methods with additive updatedifferent than GN such as bgfs, etc.

We now review the GIC algorithm using the change-of-variable procedure thatwe applied to IC. We recall the IC residuals from Equation 3.20,

r(δµ) ≡ T(f(x; δµ))− It+1(f(x;µt)). (6.26)

We rewrite Equation 6.26 introducing a function ψ such that δµ = ψ(µt,µt+1); notethat we can always define ψ because µt+1 and µt are related through f(x;µt+1) =f(f(x; δµ);µt). Thus, we rewrite Equation 6.26 as follows:

r(δµ) ≡ T(f(x;ψ(µt,µt+1)))− It+1(f(x;µt)). (6.27)

Notice that Equation 6.27 does not explicitly depend on δµ, but on µt+1. However,the GIC algorithm implicitly defines this relationship as µt+1 = µt + δµ (as in theLK algorithm). Substituting this constrain in Equation 6.27 we have

r(µt + δµ) ≡ T(f(x;ψ(µt,µt + δµ)))− It+1(f(x;µt)). (6.28)

We linearize Equation 6.28 around µt by using Taylor series,

r(µt + δµ) ≃ ℓ(δµ) ≡ r(µt) + J(µt)δµ, (6.29)

86

Page 109: Efficient Model-based 3D Tracking by Using Direct Image Registration

where

r(µt) =T(f(x;ψ(µt,µt)))− It+1(f(x;µt)),

=T(f(x;0))− It+1(f(x;µt)),(6.30)

and

J(µt) =∂T(f(x;ψ(µt; µ)))

∂µ

∣∣∣∣µ=µt

. (6.31)

Notice that ψ(µt,µt) = 0 by definition. Unlike the Jacobian in the IC algorithm(Equation 3.22), the Jacobian in Equation 6.31 is not constant as it depends on µt.However, we can obtain a pseudo-constant Jacobian from Equation 6.31 by usingthe chain rule:

J(µt) =∂T(f(x;ψ(µt; µ)))

∂µ

∣∣∣∣µ=µt

,

=∂T(f(x; ψ))

∂ψ

∣∣∣∣∣ψ=ψ(µt,µt)

∂ψ(µt, µ)

∂µ

∣∣∣∣µ=µt

,

=∂T(f(x; ψ))

∂ψ

∣∣∣∣∣ψ=0

JIC(0)

∂ψ(µt, µ)

∂µ

∣∣∣∣µ=µt

,

=JIC(0)∂ψ(µt, µ)

∂µ

∣∣∣∣µ=µt

.

(6.32)

The Jacobian matrix J(µt) is not constant: JIC(0) is constant but ∇µψ(µt) is not—it depends on µt. However, computing J(µt) is efficient as we only need to compute∇µψ(µt), which is a square matrix of the size of the number of parameters, and theproduct matrix of Equation 6.32.

Again, we compute the local minimizer of Equation 6.29 using least-squares:

δµ = −(J(µt)

⊤J(µt))−1

J(µt)⊤r(µt). (6.33)

Unlike in the IC algorithm, where the update is compositional, the GIC algorithmadditively updates the current parameters with the descent direction. We outlinethe algorithm in Figure 6.3 and Algorithm 11.

Discussion of the GIC algorithm The GIC algorithm expresses the compositionaloptimization in terms of the usual gradient descent formulation. However, the wholeprocedure still depends upon the implicit change of variables of IC residuals (cf.Equations 6.26 and 6.27). Thus, GIC must comply the same requirements that IC,namely Requirements 2 and 3. This implication reduces the number of warps thatprovide good convergence for GIC. One could infer at a first glance that GIC is aslower copy of IC; nonetheless, we must take into account of the impact on thealgorithm performance due to the use of more powerful optimization schemes suchas bgfs or Quasi-Newton [Brooks and Arbel, 2010].

87

Page 110: Efficient Model-based 3D Tracking by Using Direct Image Registration

Algorithm 11 Outline of the Generalized Inverse Compositional algo-rithm.On-line: Let µi = µ0 be the initial guess.1: while no convergence do2: Compute the residual function at r(µi) from Equation 6.26.3: Linearize the dissimilarity: J = ∇µr(0)∇µψ(µt), using Equation 6.32.

4: Compute the search direction: δµi = −(J(µi)

⊤J(µi))−1

J(µi)⊤r(µi).

5: Update the optimization parameters:µi+1 = µi + δµi.6: end while

Figure 6.3: Generalized inverse compositional image registration. We computethe target region on the frame t+ 1 (Image) using the parameters of frame t (µt). Usingthe target region at the Template we compute a Dissimilarity Measure. We linearize thedissimilarity measure around 0 and we compute descent direction in the search space usingLeast-squares. We update the parameters using composition and we recompute the targetregion on frame t+1 using the new parameters. The process is iterated until convergence.

88

Page 111: Efficient Model-based 3D Tracking by Using Direct Image Registration

Table 6.1: Relationship between compositional algorithms and warps

Warp Name FC EFC IC GIC

2D Affine(2DAFF) YES YES YES YES

Homography(H8) YES YES YES YES

Plane-inducedHomography

(H6)NO(1) NO(1) NO(1) NO(1)

Plane+Parallax(H6PP) YES NO(2) NO(2) NO(2)

3D Rigid Body(3DRT) YES NO(2) NO(2) NO(2)

(1) Does not meet Requirement 2(2) Does not meet Requirement 3

6.4 Summary

• In this section we have analysed in detail the compositional image alignmentapproach, and introduced two requirements that a warp function must satisfyto be used within this paradigm.

• We have introduced the Efficient Forward Compositional (EFC), a new com-positional algorithm, and proved that it is equivalent to the well-known IC

algorithm. The EFC algorithm provides a new interpretation of IC that allowus to state a basic requirement such that the algorithm is valid.

• We have also reviewed the GIC image alignment algorithm, and proved thatits requirements for convergence are the same to those of IC.

Table 6.2 summarizes the requirements and the principal characteristics of the al-gorithms reviewed in Chapters 5 and 6. Table 6.1 compares each compositionalalgorithm to the warps introduced in Chapter 4. We consider whether a warp issuitable for an optimization algorithm or not—YES/NO in the table—in terms ofthe compliance of the warp with the algorithm requirements.

89

Page 112: Efficient Model-based 3D Tracking by Using Direct Image Registration

Table 6.2: Requirements for Optimization Algorithms

Warp Name Jacobian Update RuleWarp

Requirements

Lucas-Kanade(LK)

Variable Additive None

Hager-Belhumeur(HB)

Part-constant(1) Additive Requirement 1

ForwardCompositional

(FC)Variable Compositional Requirement 2

InverseCompositional

(IC)Constant Compositional

Requirement 2,and

Requirement 3

Efficient ForwardCompositional

(EFC)Constant Compositional

Requirement 2,and

Requirement 3

Generalized InverseCompositional

(GIC)Part-Constant(2) Additive

Requirement 2,and

Requirement 3

(1) The Jacobian is partially factorized(2) The Jacobian is post-multiplied by a nonconstant matrix

90

Page 113: Efficient Model-based 3D Tracking by Using Direct Image Registration

Chapter 7

Computational Complexity

In this chapter we study the resources that the registration algorithms require tosolve a problem—i.e. their computational complexity. We organize the chapter asfollows: Section 7.1 describes the measures and the criteria that we shall use tocompare the complexities of the algorithms. Section 7.2 introduces the algorithmsthat we shall experimentally evaluate in later chapters; we propose a naming con-vention for the algorithms and we define two sets of testing algorithms: additiveand compositional algorithms. Finally, in Section 7.3, we compute the theoreticalcomplexity for each algorithm, and we provide a some comparisons between them.

7.1 Complexity Measures

We can measure the complexity by either using (1) the time that an algorithm re-quires or time complexity, or (2) the computational resources—i.e. memory space—that the algorithm requires or computational complexity. Both measures are oftenexpressed as a function of the length of the input: the running time of an algorithmdepends on the size of the input (e.g. larger problems require more run-time orlarger memory). In analysis of algorithms, which provides theoretical estimates forthe resources needed by an algorithm, the big O notation describes the usage ofcomputational resources as a function of the problem size. For example, findingan item in an unsorted vector takes O(n) time, where n is the length of the array.Run-time is a function of the length, so although we increase the size of the vector,the time complexity of the algorithm would be still O(n).

7.1.1 Number of Operations

Despite that big O notation is the most usual measure in algorithm analysis, weprefer to define our own measure. The reason is that big O notation is not suitablefor fine-grain comparisons: for example, both IC and HB algorithms yield O(n)complexity while we know that the former is more efficient than the latter. Weprovide a fine-grained comparison by using the number of operations. We define thenumber of operations of an algorithm, Θ, as the total aggregate of multiplications

91

Page 114: Efficient Model-based 3D Tracking by Using Direct Image Registration

and additions for each step of the algorithm. The number of operations of somealgorithm alg, Θalg, is written as

Θalg =< number of multiplications > M+ < number of sums > A, (7.1)

where M and A respectively stand for multiplications and additions. Notice thatthe + operator is used only for the sake of notation: The operator indicates thatthe total number of operations is the aggregate of the number of multiplications andthe number of additions, but it is not an actual addition. As in big O notation,the number of operations of an algorithm depends on the problem size. We use twovariables to take into account the scale of the problem: NΩ represents the size of thevisible target region, and K represents the number of deformation basis.

7.1.2 Complexity of Matrix Operations

In this section we describe the number of operations for the most common matrixoperations: the dot product, the matrix product, and the matrix summation.

Vector Scalar Product : If a = (a1, . . . , an)⊤ and b = (b1 . . . , bn)

⊤ are n × 1vectors, we define their scalar or dot product as

a⊤b =(a1 · · · an

)

b1...bn

= a1 × b1 + . . .+ an × bn.

(7.2)

We compute the number of operations of the vector scalar product by countingthe number of products and sums in Equation 7.2,

Θa⊤b =< n > M+ < (n− 1) > A. (7.3)

The complexity depends on the number of elements of the vector, n.

Matrix Product : If A is a m × n matrix, and B is a n × p matrix, then matrixproduct AB is

AB =

a11 . . . a1n...

. . ....

am1 . . . amn

b11 . . . b1p...

. . ....

bn1 . . . anp

=

[

a⊤1

...... a⊤

m

]

[b1 · · · bp

]

=

a⊤1 b1 · · · a⊤

1 bp

.... . .

...a⊤mb1 · · · a⊤

mbp

,

(7.4)

92

Page 115: Efficient Model-based 3D Tracking by Using Direct Image Registration

Table 7.1: Complexity of matrix operations.

Operation Multiplications Additions

a⊤1×nbn×1 n (n− 1)

an×1b⊤1×n n2 0

An×mBm×p mpn mp(n− 1)

An×m + Bn×m 0 mn

where a⊤i = (ai1, . . . , ain) is the i-th row of matrix A, and bj = (b1j, . . . , bnj)

is the j-th column of matrix B. In the final statement of Equation 7.4 wereformulate the matrix product as m×p dot products of their columns. Hence,we compute the number of operations of the matrix product from the Θ of thescalar product:

ΘAB = (mp)Θa⊤b =< mpn > M+ < mp(n− 1) > A. (7.5)

Matrix Addition : If A and B are m× n matrices, then their sum is

A+ B =

a11 + b11 · · · a1n + b1n...

. . ....

am1 + bm1 · · · amn + bmn

. (7.6)

There are no multiplication operations in the definition, so the complexity ofsumming up two matrices is

ΘA+B =< mn > A. (7.7)

We summarize the complexities of matrix operations in table 7.1.

7.1.3 Comparing Algorithm Complexities

When computing the complexity of an algorithm—using the operations from Ta-ble 7.1—we shall compare two algorithms as fair as we can using the followingassumptions.

Only Count Non-Zero Operations If we compute the product of two matrices,we only take into account those operations that affect non-zero entries. A typicalexample is when a matrix multiplication involves a Kronecker product. Let a and b

93

Page 116: Efficient Model-based 3D Tracking by Using Direct Image Registration

be n×1 vectors and Im be the m×m identity matrix; the product (Im⊗a)(Im⊗b⊤)is

(Im ⊗ a)(Im ⊗ b⊤) =

a 0 · · · 00 a · · · 0...

.... . .

...0 0 · · · a

b⊤ 0⊤ · · · 0⊤

0⊤ b⊤ · · · 0⊤

......

. . ....

0⊤ 0⊤ · · · b⊤

=

ab⊤ 0 0 0

0 ab⊤ 0 0...

.... . .

...

0 0 0 ab⊤

,

(7.8)

where 0 is the 3 × 1 zero vector, and 0 is the 3 × 3 zero matrix. The complexityof this matrix product is Θmatrixp = (m3n2)m + (m2n2(m − 1))a. Nonetheless,the last statement of Equation 7.8 shows that many of these sums and productsoperate over zero entries of the matrices; hence, these operations can be spared. Infact, the non-zero operations are on the block-diagonal of the result matrix: thediagonal comprises m matrices whose complexities are ΘMATRIXP = (n2)m operationseach. Thus, the total number of non-zero operations for Equation 7.8 is Θmatrixp =(mn2)m.

Neglect Duplicated Operations If an operation has to be computed severaltimes, we shall count just once that operation. A typical example is a matrixproduct with repeated entries (as in the Kronecker product): in Equation 7.8, eachmatrix in the block-diagonal of the result matrix is the product ab⊤; we shall denotethe product complexity as Θmatrixp = (n2)m instead of (mn2)m.

Matrix Chain Multiplication As we showed in Section 5.3, the factorization ofthe Jacobian matrix is generally not unique. Furthermore, given a single factoriza-tion in form of a chain of matrix products, there are many ways to choose the orderin which we perform the multiplications. The Matrix chain multiplication or matrixparenthesization is a well known optimization problem [Cormen et al., 2001]. Wemay use dynamic programming [Neapolitan and Naimipour, 1996] to compute themost efficient multiplication order for our factorization.

7.2 Algorithm Naming Conventions

We define a testing algorithm as the combination of an optimization scheme anda warp. We write a given algorithm by using fixed size fonts—e.g. HB3DMM is theunion of the optimization algorithm HB and the warp 3DMM.

94

Page 117: Efficient Model-based 3D Tracking by Using Direct Image Registration

7.2.1 Additive Algorithms

We present the testing algorithms that use additive update in Table 7.2. For con-venience, we keep the naming convention for optimization algorithms that we usedin Chapter 3; warp names are accordingly taken from Table 4.1 and Chapter 5.

Table 7.2: Additive testing algorithms.

Algorithm Warp Optimization Commentaries

LK3DTM

3D Shape-inducedHomography

(f3dtm)

LK

(Algorithm 2)

We use the original GNalgorithm from [Lucasand Kanade, 1981].

HB3DTM

3D Shape-inducedHomography

(f3dtm)

HB

(Algorithm 3)We implement Algo-rithm 7 (see page 64).

HB3DRT3D Rigid Body

(f3drt)HB

(Algorithm 3)

We use the originalalgorithm from [Seppand Hirzinger, 2003].

LK3DMM

NonrigidShape-inducedHomography

(f3dmm)

LK

(Algorithm 2)

We use the original GNalgorithm from [Lucasand Kanade, 1981].

HB3DMM

NonrigidShape-inducedHomography

(f3dmm)

HB

(Algorithm 3)We implement Algo-rithm 8 (see page 75).

HB3DMMSF

NonrigidShape-inducedHomography

(f3dmm)

HB

(Algorithm 3)We implement Algo-rithm 9 (see page 76).

LKH8

CartesianHomography

(fh82d)

LK

(Algorithm 2)

We use the original GNalgorithm from [Lucasand Kanade, 1981].

LKH6

Plane-inducedHomography

(fh6p)

LK

(Algorithm 2)—

95

Page 118: Efficient Model-based 3D Tracking by Using Direct Image Registration

7.2.2 Compositional Algorithms

We present the testing algorithms that use compositional update in Table 7.3. Forconvenience, we keep the naming convention for optimization algorithms that weused in Chapter 3; warp names are accordingly taken from Table 4.1 and Chapter 5.

Table 7.3: Additive testing algorithms.

Algorithm Warp Optimization Commentaries

ICH8

CartesianHomography

(fh82d)

IC

(Algorithm 5)—

GICH8

CartesianHomography

(fh82d)

GIC

(Algorithm 11)—

ICH6

Plane-inducedHomography

(fh6p)

IC

(Algorithm 5)—

GICH6

Plane-inducedHomography

(fh6p)

GIC

(Algorithm 11)—

FCH6PP

Plane+ParallaxHomography

(fh6pp)

FC

(Algorithm 4)—

7.3 Complexity of Algorithms

In this section we show the computational complexity of several testing algorithms.We are specially interested in comparing the complexities for additive algorithms:we show that the extensions to the HB algorithm to track 3D targets— either rigidor nonrigid—are much more efficient that their LK counterparts.

A Word About Implementation The complexity of a testing algorithm is re-lated to an iteration of the optimization loop: the total complexity is the sum ofthe complexities of each step of the algorithm. We ensure the scalability of ourestimations of complexity by using the following variables:

96

Page 119: Efficient Model-based 3D Tracking by Using Direct Image Registration

1. NΩ: is the number of visible points in the current view—i.e. NΩ = ‖Ω‖,see Page 122. This variable measures how many times a given operation isrepeated (once per visible point in the target region).

2. K: is the number of deformation components of a morphable model.

Another implementation issue is how we deal with derivatives. We compute suchderivatives as image gradients and the Jacobian by using central differences [Presset al., 1992],

Jµi=

r(µi + δ)− r(µi − δ)

2δ, (7.9)

where Jµi= ∇µi

r(µ) is the i-th column of the Jacobian matrix of the iterativeoptimization. We also compute the derivatives of function ψ in GIC algorithm (seeEquation 6.32) by using numerical differentiation instead of explicit methods.

7.3.1 Additive Algorithms

Tables 7.4–7.9 show the complexities for the additive algorithms from Table 7.2. Webreak down every algorithm in its basic steps; we compute the number of operationsfor each step by using the conventions in Section 7.1. The detailed steps of thederivation are shown in Appendix H. The final complexity is the summation of thenumber of operations for each step of the algorithm.

Table 7.4: Complexity of Algorithm LK3DTM.

Step Action Multiplications Additions

1. Compute visibility set Ω. — —2. Compute J 894NΩ 685NΩ

Compute f3dtm using Equation 5.10. 74 51Compute J using Equation 7.9. 6× 149 6× 115

3. Compute J⊤J. 36NΩ −36+ 36NΩ4. Invert

(J⊤J

)— —

5. Compute r(µ) (Equation 5.12) 74NΩ 52NΩ6. Compute J⊤r(µ) 6NΩ −6+ 6NΩ7. Compute

(J⊤J

)−1J⊤r(µ) 36 30

TOTAL 36+ 1010NΩ −12+ 779NΩ

We summarize the total complexities for additive registration algorithms in Ta-ble 7.10. Direct comparison of values from Table 7.10 is difficult as the complexitiesdepend upon the variables NΩ and K. We ease the comparison by plotting thecomplexities for different values of NΩ and K in Figure 7.1.

97

Page 120: Efficient Model-based 3D Tracking by Using Direct Image Registration

Table 7.5: Complexity of Algorithm HB3DTM.

Step Action Multiplications Additions

1. Compute visibility set Ω. — —2. Compute J 81+ 75NΩ 54+ 66NΩ

Compute R⊤Rα,β,γ. 81 54Compute J using Equation 5.31. 75 66

3. Compute J⊤J. 36NΩ −36+ 36NΩ4. Invert

(J⊤J

)— —

5. Compute r(µ) (Equation 5.12) 74NΩ 52NΩ6. Compute J⊤r(µ) 6NΩ −6+ 6NΩ7. Compute

(J⊤J

)−1J⊤r(µ) 36 30

TOTAL 117+ 191NΩ 42+ 160NΩ

Table 7.6: Complexity of Algorithm LK3DMM.

Step Action Multiplications Additions

1. Compute visibility set Ω. — —2. Compute J (1002+203K+6K2)NΩ (762+175K+3K2)NΩ

Compute f3dmm using Equation 5.42. 83+3K 57+3K

3. Compute J⊤J. (36+12K+K2)NΩ(−36−12K−K2)

+(36+12K+K2)NΩ

4. Invert(J⊤J

)— —

5. Compute r(µ) from Equation 5.44. (83+3K)NΩ (58+3K)NΩ

6. Compute J⊤r(µ) (6+K)NΩ (−6−K)+(6+K)NΩ

7. Compute(J⊤J

)−1J⊤r(µ) 36+K2 30+K2

TOTAL(36+K2)

+(1127+207K+7K2)NΩ

(−12−K)

+(863+179K+9K2)NΩ

98

Page 121: Efficient Model-based 3D Tracking by Using Direct Image Registration

Table 7.7: Complexity of Algorithm HB3DMMNF.

Step Action Multiplications Additions

1. Compute visibility set Ω. — —2. Compute J 81+(219+24K)NΩ 54+(171+16K)NΩ

Compute R⊤Rα,β,γ 81 54

(−36−12K−K2)

+(36+12K+K2)NΩ

4. Invert(J⊤J

)— —

5. Compute r(µ) from Equation 5.44. (83+3K)NΩ (58+3K)NΩ

6. Compute J⊤r(µ) (6+K)NΩ (−6−K)+(6+K)NΩ

7. Compute(J⊤J

)−1J⊤r(µ) 36+K2 30+K2

TOTAL(117+K2)

+(344+40K+K2)NΩ

(42−13K)

+(271+32K+K2)NΩ

Results for 3D rigid targets show that HB roughly performs an 80% less operationsthan its LK counterpart (see Figure 7.1–(Top-left)). Results for nonrigid targetsare similar than those for rigid ones: HB algorithm that uses a semi-factorizationapproach (HB3DMMSF) is six times faster—84% less operations—than its LK equivalent(LK3DMM). The resulting complexities are similar for the three nonrigid cases—K =6, 9, 15—in terms of speed gain, although the absolute numbers change accordinglythe size of the deformation basis: the bigger the basis, the higher the number ofoperations.

99

Page 122: Efficient Model-based 3D Tracking by Using Direct Image Registration

Table 7.8: Complexity of Algorithm HB3DMM.

Step Action Multiplications Additions

1. Compute visibility set Ω. — —2. Compute J (210+280K+72K2)NΩ (204+280K+72K2)NΩ

Compute S⊤1 Mi, for i = 1, . . . , 3. 63+81K+18K2 62+81K+18K2

Compute S⊤2 M4. 21+6K 20+6K

Compute S⊤3 M5. 31K+18K2 −1+31K+18K2

3. Compute J⊤J. (36+12K+K2)NΩ(−36−12K−K2)

+(36+12K+K2)NΩ

4. Invert(J⊤J

)— —

5. Compute r(µ) from Equation 5.44. (83+3K)NΩ (58+3K)NΩ

6. Compute J⊤r(µ) (6+K)NΩ (−6−K)+(6+K)NΩ

7. Compute(J⊤J

)−1J⊤r(µ) 36+K2 30+K2

TOTAL(36+K2)

+(335+296K+73K2)NΩ

(−12−13K)

+(304+296K+73K2)NΩ

Figure 7.1 also points out the advantages of using a factorization procedure:we reduce the number of operations by using a semi-factorization approach by a30%—compare algorithms HB3DMMNF and HB3DMMSF in Figure 7.1. However, the full-factorization HB scheme (HB3DMM) is much slower than LK3DMM due the difficultiesof the factorization; completely separate motion and structural variables needs suchan amount of resources that renders unusable the advantages of using HB algorithm.

100

Page 123: Efficient Model-based 3D Tracking by Using Direct Image Registration

Table 7.9: Complexity of Algorithm HB3DMMSF.

Step Action Multiplications Additions

1. Compute visibility set Ω. — —2. Compute J 81+(60+18K)NΩ 54+(36+14K)NΩ

Compute R⊤Rα,β,γ 81 54

Compute J(i)1 , . . . , J

(i)6+k using

Equations 8.5. 60+18K 36+14K

3. Compute J⊤J. (36+12K+K2)NΩ(−36−12K−K2)

+(36+12K+K2)NΩ

4. Invert(J⊤J

)— —

5. Compute r(µ) from Equation 5.44. (83+3K)NΩ (58+3K)NΩ

6. Compute J⊤r(µ) (6+K)NΩ (−6−K)+(6+K)NΩ

7. Compute(J⊤J

)−1J⊤r(µ) 36+K2 30+K2

TOTAL(117+K2)

+(185+34K+K2)NΩ

(42−13K)

+(136+30K+K2)NΩ

Table 7.10: Complexities of Additive Algorithms.

Algorithm Multiplications Additionslk3dtm

(Table 7.4)36+1010NΩ −12+779NΩ

hb3dtm

(Table 7.5)117+191NΩ 42+160NΩ

lk3dmm

(Table 7.6)36+K2+(1127+207K+7K2)NΩ −12−K+(863+179K+9K2)NΩ

hb3dmmnf

(Table 7.7)117+K2+(344+40K+K2)NΩ 42−13K+(271+32K+K2)NΩ

hb3dmm

(Table 7.8)36+K2+(335+296K+73K2)NΩ −12−13K+(304+296K+73K2)NΩ

hb3dmmsf

(Table 7.9)117+K2+(185+35K+K2)NΩ 42−13K+(136+30K+K2)NΩ

101

Page 124: Efficient Model-based 3D Tracking by Using Direct Image Registration

Rigid case Nonrigid case (K = 6)

Nonrigid case (K = 9) Nonrigid case (K = 15)

Figure 7.1: Complexity of Additive Algorithms. (Top-Left) Number of operationsvs. target size for additive rigid registration algorithms: algorithm LK3DTM (blue line),and algorithm HB3DTM (red line). We also display the number of operations vs. targetsize for additive nonrigid registration algorithms: LK3DMM (red), HB3DMM (blue), HB3DMMNF(magenta), and HB3DMMSF (green). We compare the complexities for different numberof modes of deformation: K = 6 (Top-right), K = 9 (Bottom-left), and K = 15(Bottom-right).

102

Page 125: Efficient Model-based 3D Tracking by Using Direct Image Registration

7.3.2 Compositional Algorithms

Tables 7.11–7.14 show the complexities for some compositional registration algo-rithms. We show the detailed derivation in Appendix H. We register the number ofoperations of each algorithm by using a similar procedure to Section 7.3.1. Insteadof directly comparing the algorithms from Table 7.13, we contrast complexities byperforming an 8-dof homography. We also include here the additive algorithm LKH8

just for the sake of comparison with its compositional counterparts.

Table 7.11: Complexity of Algorithm LKH8.

Step Action Multiplications Additions

1. Compute J 160NΩ 104NΩCompute fH8. 11 6

Compute J using centraldifferences (Equation 7.9).

8×20 8×13

2. Compute J⊤J. 64NΩ −64+ 64NΩ3. Invert

(J⊤J

)— —

4. Compute r(µ) (Equation 3.3). 11NΩ 6NΩ5. Compute J⊤r(µ) 8NΩ −8+ 8NΩ6. Compute

(J⊤J

)−1J⊤r(µ) 64 56

TOTAL 64+ 243NΩ −16+ 182NΩ

Table 7.12: Complexity of Algorithm ICH8.

Step Action Multiplications Additions

1. Compute JIC — —2. Compute J⊤

ICJIC. — —

3. Invert(J⊤ICJIC)

— —4. Compute r(µ) (Equation 3.3). 11NΩ 6NΩ5. Compute J⊤

ICr(µ) 8NΩ −8+ 8NΩ

6. Compute(J⊤ICJIC)−1

J⊤ICr(µ) 64 56

TOTAL 64+ 19NΩ 48+ 14NΩ

103

Page 126: Efficient Model-based 3D Tracking by Using Direct Image Registration

We choose Lucas-Kanade (LKH8) to be the nonefficient algorithm, whereas theefficient algorithms are respectively Hager-Belhumeur (HBH8), Inverse CompositionalICH8, and Generalized Inverse Compositional (GICH8). We compute the complexityfor one iteration of the search loop for each algorithm. We show the results: inTables 7.11–7.14. As in the additive case, we consider negligible certain constantoperations (denoted as—) such as inverting the Hessian matrix, or computing anoffline Jacobian (such as JIC). We also consider the case where the Hessian of theoptimization in algorithm ICH8 is not constant—e.g., due to partial occlusion of thetarget; in this case, although the Jacobian JIC is constant, but we have to computethe matrix product J⊤

ICJIC (see Table 7.12–Step 2).

Table 7.13: Complexity of Algorithm HBH8.

Step Action Multiplications Additions

1. Compute J 24NΩ 16NΩCompute J = M0Σ(µ) using[Buenaposada and Baumela, 2002].

24 16

2. Compute J⊤J. 64NΩ−64+ 64NΩ3. Invert

(J⊤J

)— —

4. Compute r(µ) (Equation 3.3). 11NΩ 6NΩ5. Compute J⊤r(µ) 8NΩ −8+ 8NΩ6. Compute

(J⊤J

)−1J⊤r(µ) 64 56

TOTAL 64+ 107NΩ−16+ 94NΩ

Table 7.14: Complexity of Algorithm GICH8.

Step Action Multiplications Additions

1. Compute Jgic 448+ 64NΩ 288+ 56NΩCompute JIC. — —Compute ∇µψ(µ). 8× 56 8× 36Compute JGIC = JIC ×∇µψ(µ). 64NΩ 56NΩ

2. Compute J⊤GIC

JGIC. 64NΩ−64+ 64NΩ3. Invert

(J⊤GIC

JGIC)

— —4. Compute r(µ) (Equation 3.3). 11NΩ 6NΩ5. Compute J⊤

GICr(µ) 8NΩ −8+ 8NΩ

6. Compute(J⊤GIC

JGIC)−1

J⊤GIC

r(µ) 64 56

TOTAL 512+ 147NΩ 272+ 134NΩ

104

Page 127: Efficient Model-based 3D Tracking by Using Direct Image Registration

0 0.5 1 1.5 2

x 105

0

1

2

3

4

5

6

7

8

9x 10

7

Number of data

Num

ber

of o

pera

tions

LKH8HBH8ICH8GICH8ICH8 (var. Hessian)

Figure 7.2: Complexity of Compositional Algorithms. Number of operations vs.target size for compositional registration algorithms: LKH8 (red), HBH8 (blue), ICH8 (green),and GICH8 (magenta). We also include the ICH8 algorithm with variable Hessian (lightblue) for the sake of comparison.

We summarize the results in Table 7.15; direct inspection of these results showthat one iteration of the IC algorithm requires less operations than the remainingalgorithms—at least ten-times faster than the equivalent LK. We also plot the resultsin Figure 7.2 for ease of comparison—we plot the complexities for different valuesof NΩ.

The results show that the algorithms fall in three categories with IC being thefastest, HB and GIC being in the medium range, and LK being the slowest by aconsiderable difference.

7.4 Summary

This chapter computes the computational cost for those image registration algo-rithms that we shall experimentally evaluate later. We summarize the comparisonof complexities among the different algorithms in Tables 7.16 and 7.17; we read theTables as follows: the computational cost of the algorithm in the i-th row servesas the basis to compute the percentage of increase or decrease of cost with respectto the algorithms in the corresponding columns—e.g., in Table 7.16, in comparisonwith algorithm LK3DMM, the algorithm HB3DMMNF is 77% faster, algorithm HB3DMMSF

is 84% faster, and HB3DMM is a 93% slower. Thus, for the nonrigid case, a semi-factorization approach (HB3DMMSF) is more efficient than a proper full factorization(HB3DMM), and only slightly better than no factorization at all but swapping gra-dients (HB3DMMNF). For the rigid case, the HB algorithm is a 80% faster than the

105

Page 128: Efficient Model-based 3D Tracking by Using Direct Image Registration

Algorithm Mult. Add.

lkh8 (Table 7.11) 64+ 243NΩ −16+ 182NΩ

hbh8 (Table 7.13) 64+ 107NΩ −16+ 94NΩ

ich8 (Table 7.12) 64+ 19NΩ 18+ 14NΩ

ich8 (Table 7.12)(Variable Hessian)

64+ 83NΩ −48+ 78NΩ

gich8 (Table 7.14) 512+ 147NΩ 272+ 134NΩ

Table 7.15: Complexities of Compositional Algorithms.

corresponding LK.We summarize the results for compositional algorithms in Table 7.17. Results

show that IC is much more efficient than the usual LK—about ten-times speed-up—and much faster than efficient algorithms with nonconstant Jacobian—about fivetimes faster than HB and GIC. However, if the Jacobian of algorithm ICH8 is notconstant, IC algorithm is only a 62% faster than LK, and HB is only a 24% slower.

Table 7.16: Comparison of Relative Complexities for Additive Algorithms

HB3DTM HB3DMM HB3DMMNF HB3DMMSF

LK3DTM H 80.3800%LK3DMM N 93.5067% H 77.0790% H 84.0844%HB3DMM H 88.1550% H 91.7752%

HB3DMMNF H 30.5630%

Table 7.17: Comparison of Relative Complexities for Compositional Algo-rithms

ICH8 (Var. Hes.) HBH8 GICH8 ICH8

LKH8 H 62.1176% H 52.7058% H 33.8805% H 92.2350%ICH8 (Var. Hes.) N 24.8446% N 74.5387% H 79.5024%

HBH8 N 39.8047% H 83.5815%GICH8 H 88.2562%

106

Page 129: Efficient Model-based 3D Tracking by Using Direct Image Registration

Chapter 8

Experiments

In this chapter we describe the experiments that validate the theoretical resultsthat we introduced in Chapters 5 and 6. We demonstrate in our experiments (1)the influence of the Gradient Equivalent Equation (GEE) in the convergence of thealgorithm, and (2) the correctness of the hyphoteses about the complexity of theoptimization algorithms. We systematize the comparison of algorithms by using a setof standarized measures. These measures describe certain features of the algorithmssuch as efficiency, accuracy or robustness.

We organize the chapter as follows: Section 8.1 draws a distinction between regis-tration and tracking for efficient algorithms, and introduces basic hypotheses aboutthe algorithms; Section 8.2 introduces the qualitative features that our experimentsshould exploit together with the quantitative measures that we shall use to verifythe former; Section 8.3 describes the procedure that we use to generate the syn-thetic data needed by our experiments; Section 8.4 discusses some aspects relativeto the implementation of our algorithms; Section 8.5 describes the experiments us-ing additive algorithms, and Section 8.6 evaluates compositional algorithms; finally,Section 8.7 summarizes the results and provides some discussion about them.

NotationOur testing algorithms are iterative: from an initial estimate µ0 in the search spaceR

P we iterate until we find an optimum µ∗ ∈ RP . We optimize the cost function

by descending along the gradient steepest direction given by the Jacobian matrixJ. The Jacobian is constant for efficient optimization methods such as IC or IC; inthis case, we denote µ

J∈ R

P as those fixed parameters at which we compute thisconstant Jacobian.

8.1 Motivation

The purpose of the experiments is to test a set of hypotheses that describe func-tional characteristics of the algorithms. We aim to demonstrate different propertiesabout the algorithms such as convergence or efficiency. We informally present these

107

Page 130: Efficient Model-based 3D Tracking by Using Direct Image Registration

hypotheses in the following questions:

Efficient Registration and Tracking In Section 2.1 we stated the generic differ-ences between image registration and tracking. However, efficient registration/trackingalgorithms—such as HB, IC, GIC, or EFC—have subtle differences when used for reg-istration or tracking. By construction, efficient algorithms compute the Jacobianmatrix used in the iterative optimization process at a fixed location µ

J. This Ja-

cobian matrix may be either constant—as in IC or EFC algorithm—or partiallyconstant—as in HB or GIC. We show in the following the main differences betweenimage registration and tracking when using efficient methods:

Number of imagesEfficient registration involves an image and a template: the algorithm warpsthe image so that its texture and the template texture coincides (see Fig-ure 8.1). On the other hand, efficient template-based tracking involves a se-quence of images and a template: the tracking algorithm searches for thetemplate in each image of the sequence; besides, the template may not beeven an image of the sequence—see Figure 8.1.

Algorithm initializationDirect registration methods imply by definition that the template and theregistered/tracked image overlap to some extent: the error between templateand image is linearized into a gradient descent scheme whose outputs are thetarget parameters. Thus, it is critical for the algorithm performance to choosea proper initialization of the optimization procedure.

In registration problems, the template and the registered image must suf-ficiently overlap (see Figure 8.1). The registration algorithm is usually ini-tialized at µ∗

template, the location of the target region on the template—i.e.,µ0 = µ∗

template, see Figure 8.2. The initial guess must be close enough to µ∗,the actual target parameters at the registered image: the regions defined byµ0 and µ

∗ in the image must overlap so the image error can be linearized—cf.Figure 8.2–(Top-right).

In tracking problems, the template and the tracked image may be arbi-trarily different (see Figure 8.1). The tracking algorithm is not initialized atthe template—i.e, µ0 6= µ∗

template—but at the previous target location in thesequence: for the sequence frame captured at t + 1, we initialize the itera-tive tracking procedure at the optimum computed at the frame t, µ∗

t—i.e.,µ0 = µ∗

t . Again, the initial guess must be close enough to the actual targetlocation µ∗

t+1 so the error can be linearized: the regions defined by µ∗t and µ

∗t+1

must overlap, which is equivalent to say that the inter-frame differences mustbe small enough—see Figure 8.2–(Bottom-right). Notice that the image inFigure 8.2–(Bottom-right) can be tracked in a sequence but not registered,as the intersection of µ∗

template and µ∗t+1 is empty.

108

Page 131: Efficient Model-based 3D Tracking by Using Direct Image Registration

Figure 8.1: Registration vs. Tracking. (Top) The registration aligns regions the ontemplate and the image (green squares). The algorithm warps the image (pink square)such that the intensity values of image and template coincide. (Bottom) The trackingalgorithm searches for those regions in the sequence whose texture coincide with the tem-plate. The output is a vector containing the state the target region (position, orientation,scale, etc).

109

Page 132: Efficient Model-based 3D Tracking by Using Direct Image Registration

Figure 8.2: Algorithm initialization. (Top-left) Template image where the Jacobianmatrix is computed at the fixed parameters µ

J= µ∗

template (green square). (Top-right)Image to be registered to the template with the actual parameters µ∗ of the target region(green square). We initialize the registration procedure with the target location at thetemplate—i.e., µ0 is µ∗

template. (Bottom-left) Frame t in the image sequence. We showthe actual target parameters at time t, µ∗

t . (Bottom-right) Frame t + 1 in the imagesequence with the actual target parameters µ∗

t+1 (green square). The tracking algorithmis not initialized at µ∗

template (yellow square) but at the previous target location µ∗t (pink

square).

110

Page 133: Efficient Model-based 3D Tracking by Using Direct Image Registration

Table 8.1: Registration vs. tracking in efficient methods

Registration Tracking

The aim is to align image regionsThe aim is to recover the target state(position, orientation, velocity, etc.)

An image is registered against the tem-plate

A sequence of images is tracked againstthe template(1)

The template and the registered imagemust overlap to some extent

The template and the tracked imagemay not overlap

The Jacobian is computed at the initialguess of the optimization

The Jacobian is computed far awayfrom the initial guess of the optimiza-tion

(1) The template may not even be a part of the sequence.

Efficient JacobianEfficient algorithms—partially or totally—compute the Jacobian matrix of thebrightness error at the fixed the parameters µ

J. It is usually assumed that

efficient algorithms can be used for either registration or tracking. However,we show that the assumptions for efficient algorithms behave very differentlyin both problems.

In registration problems, the efficient Jacobian is computed at the locationof the template, which happens to be also the initial guess for the iterativeregistration procedure—i.e., µ

J= µ∗

template = µ0, see Figure 8.2. This is thecase of the experiments in [Baker and Matthews, 2004] or [Brooks and Arbel,2010]. The optimization procedure starts with the actual Jacobian at µ0,which remains constant—or partially constant—for the rest of the iterationsof the algorithm.

In tracking problems, the efficient Jacobian is computed at the location ofthe template, which is usually very different from the initial guess for theiterative tracking procedure—i.e., µ

J= µ∗

template 6= µ0, see Figure 8.2. This iswhat happens in the experiments in [Hager and Belhumeur, 1998] or [Cobzaset al., 2009]. In this case, the optimization procedure does not start with theactual Jacobian—i.e., the Jacobian computed at µ0—but with that computedat µ∗

template. This Jacobian must be somehow transformed by parameters µ∗t ,

so it can be used at frame t+ 1 (see Figure 8.2).

We summarize the differences between registration and tracking for efficient algo-rithms in Table 8.1. In this chapter we show that efficient algorithms that verifytheir requirements can be used either for tracking or registration. If the efficientalgorithms do not hold the requirements, then they can be eligible for registrationbut not for tracking.

111

Page 134: Efficient Model-based 3D Tracking by Using Direct Image Registration

Do Warp Requirements Influence Convergence? In previous sections westated the requirements that warps should meet to work with efficient registrationalgorithms (see Table 6.1–page 89). We are interested in studying how the com-pliance of these warp requirements affects algorithms convergence. Our hypothesisis that the optimization successfully converges if and only if the warp meets itsrequirements—e.g. any IC-based optimization algorithm shall converge if its warpholds Requirements 2 and 3 (cf. Table 6.1).

However, proving this hypothesis is not easy: an algorithm may converge closeto the optimum even when none of its Requirements are met, so we are not facing atrue-or-false situation. Recall that efficient algorithms substitute the actual gradientby an approximation provided by the GEE (see Section 5.1 and 6.2). Thus, ifthe approximated gradient is sufficiently similar to the actual one—that is, theone computed in each iteration—the optimization may converge close to the actualoptimum.

When the warp requirements are satisfied, we theoretically demonstrated thatthe approximation due to gradient swapping was accurate—which should lead togood convergence of the optimization. However, when the algorithm fails to meet therequirements, it is actually approximating the Jacobian. We show in our experimentsthat, in this case, the optimization converges when µ0 ≡ µ

J—i.e. in registration

problems. As µ0 becomes increasingly different from µJin the parameter space—as

is the case in tracking problems—the performance of the optimization degrades.

From how far do Algorithms Converge? The convergence in gradient-basedoptimization heavily depends on the choice of the initial guess [Press et al., 1992].Gradient-based optimization linearizes the cost function using a first order Tay-lor series expansion at the starting point. The accuracy of an approximation thatuses Taylor series depends on the order of the approximating polynomial, so a lin-ear approximation is rather coarse; hence, the initial guess for our gradient-basedoptimization must be close to the optimum.

If S ⊆ Rp is the search space, we define the basin of convergence Λ as the

neighbourhood around the optimum µ∗ where the algorithm converges— i.e. Λ =µ0|µ

∗ = limk→∞Υ(µ0), where Υ(µ0) = (µ0,µ1, . . . ,µk) is the iterative sequencethat begins at µ0. The basin of convergence is typically an open ball with centre µ∗

and radius rΛ. The radius rΛ describes the convergence properties of our algorithm:the larger the radius, the bigger the basin of convergence, so the algorithm convergesfar from the optimum. We show in our experiments that the basin of convergenceof a given algorithm strongly depends on the compliance of the warp requirements.

Do Theoretical and Empirical Complexities Match? The experimental testsshall also confirm the theoretical complexities that we obtained for our algorithmsin Chapter 7. We are specially interested in confirming the actual speed incrementin the factorization-based algorithms: we have only guessed their complexity in atheoretical fashion, and we want to compare it to other approaches such as LK.

112

Page 135: Efficient Model-based 3D Tracking by Using Direct Image Registration

8.2 Features and Measures

We answer the aforementioned questions by analyzing the experimental results of thealgorithms under review. We describe the qualitative properties of the algorithms byusing a collection of features. We quantify each feature by measuring certain outputvalues when testing the algorithms. In the following we present these features alongwith their corresponding measures.

AccuracyWe measure how accurate the algorithms are, that is, how close are their out-comes to the actual optimum. We measure the accuracy of our algorithmsusing the Reprojection Error: let µ ∈ R

P be the parameters estimatedby an algorithm, and let µ∗ ∈ R

P be actual optimum for a given configura-tion. We define the reprojection error ε(µ) as the average Euclidean distancebetween the projections of µ and µ∗,

ε(µ) =1

N‖p(f(x; µ))− p(f(x;µ∗))‖, ∀x ∈ X (8.1)

where N = |X |, f : R3 × RP 7→ R

3 is the warp function of our algorithm, andp : R3 7→ R

2 is the corresponding projection. Notice that we define the errorfunction in R

2 —the image space—instead of RP—the search space. Thus, theestimated parameters are accurate if they project the shape to coordinates thatare close enough to those projected by the actual optimum (see Figure 8.3).We could quantify the accuracy by directly comparing the parameters µ andµ∗. However, comparing motion parameters in R

P does not have a naturalgeometric meaning—e.g. it is difficult to compare homography parameters orrotation matrices.

EfficiencyThe efficiency feature is directly related to the computational complexity of thealgorithm. We theoretically computed the complexity of some algorithms inChapter 7, but we corroborate these estimations in the experiments. We break-down the computational burden for each algorithm in (1) the total number ofiterations of the optimization loop, and (2) the time per iteration measured inseconds.

RobustnessThe robustness feature measures the basin of convergence of the algorithm.Our fundamental assumption is that the initial guess µ0 is located in an smallneighbourhood of the actual optimum. We measure the robustness of the al-gorithm by successively increasing the radius of this neighbourhood and com-puting the frequency of convergence of the optimization.

We define the Frequency of Convergence as the percentage of successfullyconverged trials: we consider that an algorithm has successfully converged

113

Page 136: Efficient Model-based 3D Tracking by Using Direct Image Registration

Figure 8.3: Accuracy and convergence. (Left) The target projected by the ac-tual parameters µ∗. We overlay on the alpha channel the target projected by the es-timated parameters µ. (Right-Top row) Two snippets of the image that show theprojections of u = p(f(x;µ∗)) and u′ = p(f(x′;µ∗)), respectively, onto u = p(f(x; µ))and u′ = p(f(x′; µ)). (Right-Bottom row) The green circle represents the thresholdin the reprojection error: we consider that the estimated parameters are accurate if theestimated projection µ belongs to the circle.

when the reprojection error is below a given threshold (see Figure 8.3). Be-sides, we define the Rate of Convergence as the variation of reprojectionerror per iteration. We typically plot the rate of convergence as the reprojec-tion error versus the iteration of the optimization loop.

LocalnessLocalness measures the convergence of an efficient algorithm with respect tothe point where the gradient is computed. The localness feature helps us to dis-criminate between image registration and tracking: algorithms that are validfor registration have high localness, whereas algorithms devised for trackingshould have low localness.

We measure localness by comparing the convergence frequency for differentdatasets: algorithms with high localness should converge for registration-likeproblems; on the other hand, algorithms with low localness should equallyconverge for all the datasets. We shall show that those algorithms that do notsatisfy their requirements are more local than those who do satisfy.

GeneralityThe generality feature measures the ability of the optimization scheme todeal with different warp functions. Generality is a generic property of thealgorithms. Thus, we indirectly measure generality by comparing convergencerates and frequencies of the same algorithm with different warp functions.

We summarize the qualitative features that we expect in our algorithms versus theircorresponding quantitative measures in Table 8.2.

114

Page 137: Efficient Model-based 3D Tracking by Using Direct Image Registration

Table 8.2: Features and Measures.

Qualitative Features Quantitative MeasuresAccuracy Reprojection error

EfficiencyTime per iteration

Number of iterations

RobustnessRate of Convergence

Convergence Frequency

LocalnessRate of Convergence

Convergence Frequency

GeneralityRate of Convergence

Convergence Frequency

Table 8.3: Numerical Ranges for Features.

Quantitative MeasuresFreq. Convergence Reproj. Error Iterations

QualitativeFeatu

res

Convergencex ≥ 80% — —

40% < x < 80% — —x ≤ 40% — —

Accuracy— x ≤ 1.0px. —— 1.0px. < x < 5.0px. —— x ≥ 5.0px. —

Efficiency— — x ≤ 10— — 10 < x < 30— — x ≥ 30

8.2.1 Numerical Ranges for Features

We compare the algorithms by analyzing the values of their numerical measures.However, it is also interesting to compare the algorithms with respect to a fixedscale. We build such a three-fold scale by classifying the numerical outcomes of theexperiments in good, medium, and bad. We show the ranges for the numerical valuesof the measures in Table 8.3. Each feature is described by one or more measures, andeach measure is defined according to a given numerical range. Thus, we classify thefeature into good—green rows—medium—yellow rows—and bad—red rows. Noticethat we arbitrary define these ranges of values, so the final classification may besubject to interpretation.

115

Page 138: Efficient Model-based 3D Tracking by Using Direct Image Registration

8.3 Generation of Synthetic Experiments

We examine the registration/tracking algorithms by using a collection of syntheti-cally generated experiments. The importance of using synthetic data is to accuratelyverify the outcomes of the algorithms. We design our synthetic experiments to an-alyze the influence of the features introduced in Section 8.2.

Each experiment consists in registering a 3D target to an image. We syntheticallygenerate this image by rendering a 3D model of the target at parameters µ∗, theactual optimum. For efficient registration algorithms, we also render the templateimage at µ

J, the fixed parameters where we compute the Jacobian matrix. We

typically choose µJsuch that the rendered image best displays the texture template

so the derivatives can be computed as accurately as possible—i.e., the texture istotally visible, and it is neither distorted nor aliased, see Figure 8.6.

Once the image is generated, we iterate the registration algorithm starting fromthe initial guess µ0. Recall from appendix A that the performance of GN-like op-timization methods depends upon the shape of the cost function between µ0 andµ∗—which is usually convex when µ0 and µ∗ are close enough—and the accuracyof the linear approximation provided by the Jacobian at µ

J. Thus, by tuning pa-

rameters µ∗, µ0, and µJwe may study the behaviour of the registration algorithm.

In the following we show how to generate synthetic datasets that highlight thefeatures in in Section 8.2:

Accuracy: Synthetic Ground-truth Synthetic data naturally provides groundtruth for evaluating accuracy: µ∗ is known by definition, so computing the reprojec-tion error is straightforward from the algorithms outcomes. Moreover, results canbe easily compared by sharing the synthetic data among all the algorithms.

Robustness: Gaussian Noise The robustness of a gradient-based optimizationprocedure measures its convergence with respect to the difference between the initialguess µ0 and the actual optimum µ∗. We study the robustness of the algorithms bygenerating experiments whose initial guess increasingly diverges from the ground-truth optimum. We generate the initial guess for our experiments by corrupting theactual optimum µ∗ with Gaussian noise of increasing standard deviation σ.

µ0 ∼ N (µ∗, σ2). (8.2)

We analyze the convergence of the testing algorithms when varying the noise vari-ance: the higher the variance the greater the distance between the initial guess andthe optimum. We show the ground truth data and several initial guesses (gener-ated from different variances) in Figure 8.4. Although the variance is defined inthe parameter space, its effects are equivalent in R

2, the image coordinates. Ta-ble 8.11—page 166—shows the reprojection error for the initial parameters ε(µ0)(see Equation 8.1) for different noise values. Again, the bigger the variance thegreater the Euclidean distance in the image plane. We prefer to measure the error

116

Page 139: Efficient Model-based 3D Tracking by Using Direct Image Registration

σ = 0.5 σ = 1.0 σ = 1.5 σ = 2.0 σ = 2.5

σ = 3.0 σ = 3.5 σ = 4.0 σ = 4.5 σ = 5.0

Figure 8.4: Ground Truth and Noise Variance. We show the ground-truth values—textured cube—with three parameters samples each—µ1

0 in red, µ20 in blue, and µ3

0 ingreen—generated by using the corresponding noise variance σ. The noise ranges fromσ = 0.5 to σ = 5.0, and the successive samples increasingly depart from the ground-truth.

in the image to be comparable to previous works such as [Baker and Matthews,2004; Brooks and Arbel, 2010].

Localness: Multiple Datasets Localness is specially relevant in the case of ef-ficient algorithms such as HB or IC. Localness relates the convergence of an efficientalgorithm while the actual optimum µ∗ and the initial guess µ0 diverge from theparameters µ

Jwhere we compute the Jacobian. We study the influence of localness

in the convergence of the testing algorithms by building multiple datasets: eachdataset contains samples of µ∗ that are increasingly different from µ

J. The experi-

ments aim to demonstrate that efficient algorithms are local when they do not meetthe fundamental requirements: that is, the approximation of the gradients providedby the efficient algorithms, when they do not satisfy the requirements, is only validin a local neighbourhood of µ

J.

We design six datasets that increasingly minimize localness. We name thesedatasets from DS1 to DS6 (see Figure 8.5). For each dataset we randomly generate10, 000 samples of the target position µ∗. We control the divergence between µ∗

and µJby defining the ranges where we sample each target parameter. These

ranges define increasingly larger neighbourhoods centred in µJthat do not overlap

with each other. We show an example of datasets DS1–DS6 for parameters µ =α, β, γ, tx, ty, tz

⊤ in Table 8.5—page 129—and we graphically represent the valuescontained in the table in Figure 8.5. We choose those samples in dataset DS1 tobe equal to the Jacobian parameters—that is, µ∗ = µ

Jfor all the 10, 000 samples.

For the remaining datasets, we arbitrarily choose the parameter ranges dependingon each target—for example, we do not rotate planar targets more than 60o sothe texture may be accurately recovered from the image. For each interval rangeΨi ≡ [ai, bi] corresponding to parameter µi we randomly sample the parameter µ∗

i

from an Uniform distribution defined over the support sets [ai, bi] and [−ai,−bi]—

117

Page 140: Efficient Model-based 3D Tracking by Using Direct Image Registration

DS1 DS2 DS3

DS4 DS5 DS6

Figure 8.5: Definition of Datasets. Ranges of rotation parameters α, β, and γ fordatasets DS1 (Top-Left) to DS6 (Bottom-Right). For each parameter we display therange interval as a green square annulus). We represent the combination of the threeranges as a cubic shell—the region between two concentric cubes, plotted in blue. Insidethis region we plot the targets obtained from 8 random samples of the rotation parameterswithin their corresponding ranges. The plots show that the rotation of the target increasesfor each dataset.


8.3.1 Synthetic Datasets and Images

We generate our synthetic experimental data for a given target by simultaneously using multiple datasets and Gaussian noise. We present the procedure in Algorithm 12. Although the parameters and their ranges may change from one experiment to another, the procedure is similar in all cases.

The procedure generates six datasets DS1, . . . , DS6 with 10,000 samples each. We generate the corresponding initial guesses µ0 for the optimization by sampling 1,000 data points from a Normal distribution for each variance, with variances ranging from σ = 0.5 to σ = 5.0 in increments of 0.5 units—i.e. 1,000 ground-truth samples for each of the 10 noise values, 60,000 samples in total. We render one synthetic image per ground-truth sample: each image represents the projection of the target under the motion parameters µ∗ and the camera parameters K,

x = K [R(µ∗) | t(µ∗)] (X⊤, 1)⊤,   (8.3)


Algorithm 12 Creating the synthetic datasets.

1: for i = 1 to 6 do
2:     for σ = 0.5 to 5.0 step 0.5 do
3:         for j = 1 to 1,000 do
4:             Generate a ground-truth sample for range i, µ∗ = (µ1, . . . , µP)⊤, where µk ∼ U([−bk,−ak] ∪ [ak, bk]) for k = 1, . . . , P.
5:             Generate the initial guess with Gaussian noise, µ0 ∼ N(µ∗, σ).
6:         end for
7:     end for
8: end for

where X are the target shape coordinates, R and t are the rigid-body rotation and translation corresponding to the ground truth µ∗, and x are the target image projections.
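The sketch below illustrates steps 4–5 of Algorithm 12 together with the projection of Equation 8.3 for a single sample. It is a minimal Python/NumPy rendering of the procedure, not the thesis implementation; the camera matrix K and the rotation R are assumed to be given:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_ground_truth(a, b):
        # Step 4: draw each parameter from U([-b_k, -a_k] U [a_k, b_k]).
        signs = rng.choice([-1.0, 1.0], size=a.shape)
        return signs * rng.uniform(a, b)

    def sample_initial_guess(mu_star, sigma):
        # Step 5: corrupt the ground truth with isotropic Gaussian noise.
        return mu_star + rng.normal(0.0, sigma, size=mu_star.shape)

    def project(K, R, t, X):
        # Equation 8.3: x = K [R | t] (X^T, 1)^T for X of shape (N, 3).
        Xh = np.hstack([X, np.ones((X.shape[0], 1))])   # homogeneous coordinates
        xh = (K @ np.hstack([R, t.reshape(3, 1)]) @ Xh.T).T
        return xh[:, :2] / xh[:, 2:3]                   # perspective division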

We represent the target using a textured triangle mesh (see Chapter 5). We project the target using Equation 8.3 and render the texture on the image using POV-Ray, a free ray-tracing tool. The result is a collection of 60,000 images (see Figure 8.6).

Figure 8.6: Example of Synthetic Datasets. We select different samples from each dataset, from DS1 (top row) to DS6 (bottom row), according to Table 8.5. The top-left image represents the position where we compute the Jacobian for efficient methods. Notice that the successive samples increasingly depart from this location.


8.3.2 Generation of Result Plots

We use the synthetic data from Section 8.3.1 with a group of selected algorithms. Notice that each ground-truth sample µ∗j, its corresponding synthetic image, and its initial guess are common to all algorithms.

We register the target with the synthetic image by using each algorithm. From each optimization we collect (1) the reprojection error ε(µ) in pixels, (2) the total optimization time in seconds, and (3) the number of iterations of the optimization. We average the 1,000 values of reprojection error, optimization time, and number of iterations for each noise value σ. We consider that an optimization has successfully converged when its final reprojection error is below 5.0 pixels.

For each dataset we collect 10,000 outcomes per algorithm: 1,000 samples for each of the 10 levels of noise. Using the collected data of all the algorithms, we generate four plots for each dataset. We plot each algorithm using different colours and markers for ease of comparison (see, e.g., Figure 8.45, page 167). Each plot is computed as follows:

Reprojection Error  We plot the average reprojection error (in pixels) of those optimizations that have successfully converged for each algorithm. We independently average these values for each level of noise—see, e.g., Figure 8.45 (top-left). Notice that we are measuring the accuracy of the algorithms under ideal conditions, as we only use those optimizations that have successfully converged.

Percentage of Convergence  We plot the percentage of optimizations that have successfully converged for each level of noise—see, e.g., Figure 8.45 (top-right).

Convergence Time  We plot the average convergence time (in seconds) of those optimizations that have successfully converged for each algorithm—see, e.g., Figure 8.45 (bottom-left).

Number of Iterations  We plot the average number of iterations of those optimizations that have successfully converged for each algorithm—see, e.g., Figure 8.45 (bottom-right).

Finally, note that we are only averaging those results from optimizations that haveconverged: those plots concerning reprojection error or number of iterations onlyconsider the best outcomes of those algorithms. Thus, the average results alsodepend on the frequency of convergence—i.e., low reprojection error coupled withlow frequency of convergence is less meaningful than low reprojection error coupledwith high percentage of convergence. We summarize the process of generating andevaluating the experiments in Figure 8.7.
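A minimal sketch of this aggregation step, assuming each outcome is stored as a small record with the fields used below (the field names and data layout are hypothetical; only the 5.0-pixel threshold comes from the text):

    import numpy as np

    CONVERGED = 5.0  # success threshold on the final reprojection error (pixels)

    def summarize(outcomes, sigmas):
        """Per noise level: convergence rate, plus averages over converged runs."""
        stats = {}
        for s in sigmas:
            runs = [o for o in outcomes if o["sigma"] == s]
            ok = [o for o in runs if o["error"] < CONVERGED]
            avg = lambda key: float(np.mean([o[key] for o in ok])) if ok else float("nan")
            stats[s] = {
                "convergence": 100.0 * len(ok) / len(runs),
                "error": avg("error"),
                "time": avg("time"),
                "iterations": avg("iters"),
            }
        return stats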


Figure 8.7: Experimental Evaluation with Synthetic Data. We generate synthetic data for the 6 datasets. For each dataset we generate 10,000 parameters (green) using the given range (magenta). With these parameters we compute both the initial guess for the optimization (yellow) and the synthetic images (blue). Finally, each synthetic image–initialization pair is evaluated through the algorithms LK, HB, IC, FC and GIC, and the results are collected (red).


8.4 Implementation Details

In this section we give detailed information about the implementation of the registration/tracking algorithms: the criteria to decide when to stop the minimization, how to deal with self-occlusions of the model, and further improvements on the factorization.

8.4.1 Convergence Criteria

We implement the gradient-based optimization of all the algorithms using the Gauss-Newton scheme proposed by Madsen et al. [Madsen et al., 2004]. We use the following stopping criteria:

1. one based on the value of the gradient: ‖g(x)‖ ≤ ǫ1, with g(x) being the gradient evaluated at parameters x;

2. one based on the parameter increment: ‖x(n + 1) − x(n)‖ ≤ ǫ2(‖x‖ + ǫ2), with x(n + 1) and x(n) being the parameters at iterations n + 1 and n.

Both criteria depend upon the values of the constants ǫ1 and ǫ2. For the whole range of experiments we use the values ǫ1 = 10⁻⁸ and ǫ2 = 10⁻⁸. Finally, we define a safeguard against infinite loops by imposing an upper bound of 50 iterations on the optimisation scheme; the parameter values at that point are taken as the solution.

Notice that the usual Gauss-Newton algorithm recomputes the Jacobian matrix of the error at every iteration, whereas other methods such as IC do not. We deal with that issue by having separate functions to compute the Jacobian and the approximation to the Hessian while maintaining the overall scheme of the optimisation. Each algorithm was implemented using the MATLAB scripting language.
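A minimal sketch of this optimisation scheme (in Python rather than MATLAB) with both stopping criteria and the 50-iteration safeguard; the Jacobian is supplied as a callable so that constant-Jacobian methods such as IC can simply return a precomputed matrix. The functions residual and jacobian are placeholders:

    import numpy as np

    def gauss_newton(residual, jacobian, x0, eps1=1e-8, eps2=1e-8, max_iter=50):
        """Gauss-Newton iteration with the stopping criteria of Section 8.4.1."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):              # safeguard: at most 50 iterations
            r = residual(x)
            J = jacobian(x)                    # may be a precomputed constant (IC)
            g = J.T @ r
            if np.linalg.norm(g) <= eps1:      # criterion 1: small gradient
                break
            delta = np.linalg.solve(J.T @ J, -g)
            small = np.linalg.norm(delta) <= eps2 * (np.linalg.norm(x) + eps2)
            x = x + delta
            if small:                          # criterion 2: small increment
                break
        return x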

8.4.2 Visibility Management

Convex targets—i.e. nonplanar ones—typically suffer from self-occlusion: the target cannot be completely imaged, only a portion of it (see Figure 8.8). A triangle of the target mesh becomes self-occluded when (1) the triangle is covered by other triangles of the target that are closer to the camera, or (2) the normal vector of the triangle is orthogonal—or nearly orthogonal—to the camera projection rays. The set of occluded triangles dynamically depends on the relative orientation of target and camera: some triangles appear and others disappear from the image due to changes in rotation, translation, and even the deformation of the target.
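One common way to obtain the visible set is a per-triangle back-face test; the following sketch covers case (2) only, assuming a pinhole camera at the origin and counter-clockwise vertex winding (a depth buffer would be needed for case (1)):

    import numpy as np

    def backface_visible(vertices, faces):
        """Flag triangles whose outward normal faces the camera at the origin.

        vertices: (N, 3) array in camera coordinates; faces: (M, 3) vertex indices.
        """
        v0, v1, v2 = (vertices[faces[:, k]] for k in range(3))
        normals = np.cross(v1 - v0, v2 - v0)      # outward normals (CCW winding)
        centroids = (v0 + v1 + v2) / 3.0          # one viewing ray per triangle
        return np.einsum("ij,ij->i", normals, centroids) < 0.0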

We consequently restrict the brightness dissimilarity (Equation 3.1) to the visible vertices,

D(Ω; µ) = T(Ω) − I(f(Ω; µ)),   (8.4)

where Ω ⊂ X is the set of visible vertices in the current view I, which depends on µ—i.e. Ω = Ω(µ). Equation 8.4 actually holds only on the nonoccluded points: if we include any vertex outside Ω, its dissimilarity will be corrupted by the erroneous brightness of the imaged vertex.


Figure 8.8: Visibility management. (Top row) We rotate the shape model around the Y-axis with angles β = 0°, 30°, 60°, 90°. (Middle row) Block-structure plots of the Jacobian products: we compute the constant Jacobian J at β = 0°, together with the weighting matrices Wi, i = 1, . . . , 4, where Wi depends on the rotation angle βi; for each view we plot the matrix WiJJ⊤Wi⊤, whose non-zero entries appear as blue dots. (Bottom row) Block-structure plots of the Hessian matrices Hi = J⊤Wi⊤WiJ; different colour values represent different data entries.


We take the self-occluded points into account in the optimization scheme (LK, HB, etc.) by using Weighted Least-Squares (WLS) [Press et al., 1992]. We transform the ordinary least-squares problem,

(J⊤J) δµ = −J⊤r,


into the WLS problem,

(J⊤W⊤WJ) δµ = −J⊤W⊤Wr,

where W is the weight matrix. Matrix W is typically diagonal—the residuals areuncorrelated—and the i-th entry in the main diagonal indicates the importance ofresidual ri(µ)—the i-th entry of vector r. We choose the following weighting matrix:

W = diag(ω1,1, ω2,2, . . . , ωNΩ,NΩ),   (8.5)

where

ωi,i = 1 if xi ∈ Ω ⊂ X, and ωi,i = 0 if xi ∉ Ω ⊂ X,   (8.6)

and NΩ = ‖Ω‖. Matrix W in Equation 8.5 includes the i-th residual in the normal equations above if the corresponding i-th target point is visible (cf. Equation 8.6). Points not present in Ω do not affect the local minimizer, so we have effectively taken the self-occluded points into account.
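Since W is diagonal with binary entries, the WLS normal equations reduce to an ordinary least-squares problem over the visible rows only; a minimal sketch, assuming J, r, and a boolean visibility mask are given:

    import numpy as np

    def wls_step(J, r, visible):
        """Solve (J^T W^T W J) dmu = -J^T W^T W r with binary W = diag(w_i)."""
        Jv, rv = J[visible], r[visible]       # keep only the rows with w_i = 1
        delta, *_ = np.linalg.lstsq(Jv, -rv, rcond=None)
        return delta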

Constant Hessian and WLS  Note that the Hessian matrix of the WLS problem includes the inner product W⊤W. If W depends on µ, then the Hessian is no longer constant. Thus, the efficiency of algorithms that use a constant Hessian, such as IC, greatly diminishes, as the product (J⊤W⊤WJ) changes over time.

We can estimate the loss of efficiency of the IC algorithm using the examples from Chapter 7. The original implementation of algorithm ICH8 is about six times faster than HBH8 (see Table 7.17). However, if the Jacobian depends on the matrix W, the resulting IC algorithm with variable Hessian is slower—cf. Table 7.15, page 106.

Efficient Solution of WLS  We may alleviate the loss of efficiency of any algorithm that uses WLS by using a technique similar to [Gross et al., 2006]: the key idea is to keep as many precomputed values as possible. The method subdivides the tracking region of the object into P non-overlapping partitions Pi. The partitions are chosen such that the triangles inside each one have a consistent orientation. We precompute the Hessian matrix for each partition, HPi, and we compute the Hessian matrix for the optimization as

H = ∑i λi HPi,   i = 1, . . . , P,

where λi is a weight for each partition Pi (see Figure 8.9). We also adapt the method from [Gross et al., 2006] to the HB algorithm: we compute the matrix SPi for each partition Pi and we build the matrix S as

S = ∑i SPi⊤ SPi.
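A sketch of the partitioned precomputation under our own simplifying assumptions: the per-partition Hessians are built once from row-index sets, and at run time we only form the weighted sum (with λi ∈ {0, 1}, this simply drops the blocks of fully occluded partitions):

    import numpy as np

    def precompute_partition_hessians(J, partitions):
        # partitions: list of row-index arrays, one per non-overlapping region P_i.
        return [J[idx].T @ J[idx] for idx in partitions]

    def combined_hessian(H_parts, lambdas):
        # H = sum_i lambda_i * H_{P_i}: only this cheap sum is needed at run time.
        return sum(lam * H for lam, H in zip(lambdas, H_parts))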


Figure 8.9: Efficient solution of WLS. The texture in the reference space is subdivided into regions—green squares—whose Hessians are computed separately. The actual Hessian only takes the visible regions into account—blue overlay.

8.4.3 Scale of Homographies

In Section 5.3 we proved that Equation 5.31 represents a proper factorization, as S (Equation 5.32) only contains target shape elements. However, this is not entirely accurate, as matrix S also depends on the homogeneous scale factor λ = 1/(1 − n⊤t) (cf. Appendix E). Note that the scale λ depends on the translation vector t, and hence the matrix S cannot be constant.

However, we can still use the factorization with no efficiency loss by employing WLS—as in the case of self-occlusions. We redefine the weighting matrix W (Equation 8.5) as

W = diag(ω′1,1, ω′2,2, . . . , ω′NΩ,NΩ),   (8.7)

and we define each weight as ω′i,i = ωi,i λi, where ωi,i are the occlusion weights (Equation 8.6) and λi = 1/(1 − n(i)⊤t), with n(i) the normal of the plane supporting the i-th vertex.
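A one-line sketch of these combined weights, assuming the per-vertex plane normals are stacked as rows of an (N, 3) array:

    import numpy as np

    def homography_scale_weights(visible, normals, t):
        """w'_i = w_i * lambda_i with lambda_i = 1 / (1 - n(i)^T t) (cf. Equation 8.7)."""
        lam = 1.0 / (1.0 - normals @ t)      # per-vertex homogeneous scale lambda_i
        return visible.astype(float) * lam   # zero for occluded vertices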

The conclusion is that we can extract the homogeneous scale λ from matrix S and account for the factor when solving for the local minimizer in a WLS fashion. Thus, the matrix S is actually constant, and we do not lose efficiency, as the weighting process is unavoidable in any case.

8.4.4 Minimization of Jacobian Operations

We also benefit from the block structure of the factorized form of the Jacobian matrix. The idea was first proposed in [Sepp and Hirzinger, 2003], and we adapt it to our algorithms. In the following we show how to apply the idea to algorithm HB3DTM (see Algorithm 7).

We solve for the local minimizer of the normal equations by using the Jacobian matrix from the factorization (Equation 5.31) as follows:

δµ = −((SM)⊤(SM))⁻¹ (SM)⊤ r.   (8.8)

We turn our attention to the GN Hessian matrix of Equation 8.8, (SM)⊤(SM), and we rewrite it using Equations 5.32 and 5.30 as follows:

M⊤S⊤SM = [M1⊤ 0; 0 M2⊤] [S1⊤; S2⊤] [S1 S2] [M1 0; 0 M2]
        = [M1⊤ 0; 0 M2⊤] [S1⊤S1  S1⊤S2; S2⊤S1  S2⊤S2] [M1 0; 0 M2]
        = [M1⊤S1⊤S1M1  M1⊤S1⊤S2M2; M2⊤S2⊤S1M1  M2⊤S2⊤S2M2].   (8.9)

Notice that the matrix in Equation 8.9 is symmetric, so there is no need to compute the lower off-diagonal block—M2⊤S2⊤S1M1 in this case. Thus, we spare roughly 25% of the computations needed to form the Hessian matrix of Equation 8.9.
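A sketch of the symmetric assembly, computing only the upper-triangular blocks of Equation 8.9 and mirroring the off-diagonal one; S1, S2, M1, M2 are assumed to be given with compatible shapes:

    import numpy as np

    def block_hessian(S1, S2, M1, M2):
        """Assemble M^T S^T S M (Equation 8.9) exploiting its symmetry."""
        H11 = M1.T @ S1.T @ S1 @ M1
        H12 = M1.T @ S1.T @ S2 @ M2
        H22 = M2.T @ S2.T @ S2 @ M2
        # The lower-left block is the transpose of the upper-right one.
        return np.block([[H11, H12], [H12.T, H22]])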

8.5 Additive Algorithms

In this section we present the experiments that we conducted to evaluate additiveregistration algorithms. We organize this Section as follows: We introduce thealgorithms that we use to evaluate the hypotheses in Section 8.5.1; we generatesynthetic data for rigid and nonrigid targets that we evaluate using the algorithmsin Sections 8.5.2 and 8.5.3; we prove the robustness of our algorithms using real datain Sections 8.5.5 and 8.5.6.

8.5.1 Experimental Hypotheses

The purpose of the tests is to confirm theoretical hypotheses concerning the algorithms. We are especially interested in investigating the relationship between the fulfilment of certain requirements by an algorithm and its convergence.

We use the same naming convention for algorithms that we introduced in Chapter 7. We select four algorithms from Table 7.2 (see page 95): two for rigid targets—LK3DTM and HB3DTM—and two for nonrigid data—LK3DMM and HB3DMM. For the sake of comparison we also include the algorithm HB3DRT: 6-dof tracking with the HB algorithm [Sepp, 2006]. Table 8.4 summarizes some characteristics of the selected algorithms: description of the warp, constancy of the Jacobian matrix, and fulfilment of Requirement 1.

Table 8.4: Evaluated Additive Algorithms

Algorithm   Warp                                Constant Jacobian   Req. 1
LK3DTM      Shape-induced homography            No                  —
HB3DTM      Shape-induced homography            Partially           YES
LK3DMM      Nonrigid shape-induced homography   No                  —
HB3DMM      Nonrigid shape-induced homography   Partially           YES
HB3DRT      Rigid body                          Partially           NO

We use the algorithms from Table 8.4 to validate that (1) the convergence of HBdepends upon Requirement 1, (2) HB is accurate and robust, and (3) HB is efficient.

8.5.2 Experiments with Synthetic Rigid data

This set of experiments studies the convergence of the evaluated algorithms in a controlled environment. The synthetic datasets provide us with precise measurements of the outcomes of the algorithms.

Target Model  Our target is a 3D textured model (3DTM): a 3D triangle mesh and a texture image—both defined in a reference frame—constitute the target (cf. Section 5.3.1). We use three models in our experiments with rigid data: (1) a textured cube, (2) a human face, and (3) a textured rectangular box.

The cube model comprises 15,606 vertices and 30,000 triangles, and the texture image has 640 × 480 pixels (see Figure 8.10). We use the centroid of each triangle as a target vertex, adding up to 30,000 points. We compute the centroids of the triangle mesh in the reference frame using barycentric coordinates. The texture for each triangle centroid is computed by averaging the colour values of the triangle's vertices. We do not consider those vertices close to the edges of the cube; instead of removing them from the model, we mark them as "forbidden" and treat them as occluded points (see Section 8.4.2).
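A sketch of this preprocessing step—centroid positions and averaged centroid colours—assuming vertices, faces, and per-vertex colors arrays (hypothetical names):

    import numpy as np

    def triangle_centroids(vertices, faces, colors):
        """Per-triangle centroids and averaged vertex colours."""
        tri = vertices[faces]              # (M, 3, 3): three vertices per triangle
        centroids = tri.mean(axis=1)       # barycentric centre: equal weights 1/3
        centroid_colors = colors[faces].mean(axis=1)
        return centroids, centroid_colors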

The face model¹ comprises 5,866 vertices and 11,344 triangles, and the texture image has 500 × 500 pixels (see Figure 8.11). We use the centroid of each triangle as a target vertex, adding up to 11,344 points.

The tea box model comprises 61,206 vertices and 120,000 triangles, and the texture image has 575 × 750 pixels (see Figure 8.12). We use the centroid of each triangle as a target vertex, adding up to 120,000 points. We mark some of those vertices as "forbidden" and hence do not use them in our algorithms.

¹Generously provided by Prof. T. Vetter and S. Romdhani from the University of Basel.


Figure 8.10: The cube model. (Left) The model is a textured triangle mesh. We havedownsampled the triangle mesh for a proper visualization (blue lines). (Right) Imagecontaining the texture map of the model.

Figure 8.11: The face model. (Left) The model is a textured triangle mesh. We have downsampled the triangle mesh for a proper visualization (blue lines). (Right) Texture map of the model. The texture image is actually a cylindrical projection of the mesh colour data. Source: data provided by T. Vetter and S. Romdhani, University of Basel.

Experiments with cube model

We generate 60,000 synthetic experiments for a rotating cube model by using the procedure described in Section 8.3. Figure 8.6 shows a subset of selected samples from the aforementioned collection. We rotate the model around the Euler angles α, β, and γ, coupled with translations along the three axes tX, tY, and tZ. We show the ranges of the parameters that we have used to generate the experiments in Table 8.5. Notice that we include extreme rotations of the target to test the robustness of the algorithms. We register each target in the 60,000 experiments by using the algorithms LK3DTM, HB3DTM, and HB3DRT. Table 8.6 shows the average initial reprojection error for each dataset and level of noise. As expected, the higher the noise, the larger the reprojection error. We apply the algorithms to the generated experiments and show the results in Figures 8.13–8.18.


Figure 8.12: The tea box model. (Left) The model is a textured parallelepiped,representing a box of tea. We have downsampled the triangle mesh (blue lines) for aproper visualization. (Bottom) Texture image of the model. We obtained the texture byunfolding, then scanning the actual tea box.

Table 8.5: Ranges of parameters for cube experiments.

              Dataset   α          β          γ          tx        ty       tz
Registration  DS1       0          0          0          0         0        0
Tracking      DS2       [0,72]     [0,72]     [0,72]     [0,20]    [0,10]   [0,10]
              DS3       [72,144]   [72,144]   [72,144]   [20,40]   [10,20]  [10,20]
              DS4       [144,216]  [144,216]  [144,216]  [40,60]   [20,30]  [20,30]
              DS5       [216,288]  [216,288]  [216,288]  [60,80]   [30,40]  [30,40]
              DS6       [288,360]  [288,360]  [288,360]  [80,100]  [40,50]  [40,50]


Table 8.6: Average reprojection error vs. noise for cube.

σ     0.50  1.00  1.50  2.00  2.50  3.00   3.50   4.00   4.50   5.00
DS1   1.56  3.17  4.60  6.38  7.98  9.41   10.98  12.49  14.30  15.69
DS2   1.58  3.21  4.81  6.25  7.93  9.48   11.07  12.76  14.11  15.79
DS3   1.79  3.47  5.07  6.80  8.74  10.39  11.92  14.42  15.35  16.70
DS4   1.61  3.16  4.71  6.35  7.91  9.53   11.21  12.91  14.41  15.56
DS5   1.79  3.86  5.10  7.02  8.76  10.74  12.57  13.89  15.17  16.97
DS6   1.76  3.29  5.11  6.47  7.92  9.73   11.34  12.95  14.36  16.12


Figure 8.13: Results from Additive Rigid Dataset DS1 for cube. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.

Dataset DS1  This dataset contains those experiments closest to the registration problem (i.e. µ∗ = µJ). We show the results in Figure 8.13. The three algorithms converge for each experiment. The results for accuracy are similar for all the experiments, with the error remaining around 1.5 pixels. However, the number of iterations yields different results: the HB3DRT algorithm iterates approximately twice as much as the other algorithms—although its resulting optimization time is half that of LK3DTM due to a lower time per iteration.

Dataset DS2  We show the results for dataset DS2 in Figure 8.14. The accuracy plot shows that the average reprojection error has increased with respect to dataset DS1: about 0.5 pixels, some 25% more, for algorithms LK3DTM and HB3DTM; moreover, the reprojection error for HB3DRT is about 50% higher than for the remaining two algorithms. Results for percentage of convergence differ even more than for DS1: optimizations for both LK3DTM and HB3DTM converge around 90% of the time in the worst case, while convergence for algorithm HB3DRT drops to 70%. As in dataset DS1, the number of iterations needed by HB3DRT to converge approximately doubles that of LK3DTM and HB3DTM—again, algorithm HB3DRT is faster than LK3DTM due to a lower time per iteration.

Figure 8.14: Results from dataset DS2 for cube. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.


Dataset DS3  We show the results for dataset DS3 in Figure 8.15. The results for reprojection error, frequency of convergence, and number of iterations are similar to those of dataset DS2 for algorithms LK3DTM and HB3DTM. However, the results for algorithm HB3DRT significantly worsen: the average reprojection error lies in the range of 4.0 pixels, roughly 100% more than for the other two algorithms; also, the convergence of algorithm HB3DRT monotonically decreases with the level of noise, from 85% to roughly 25%. The number of iterations does not grow with respect to dataset DS2: the algorithm converges in fewer cases, but iterates fewer times in each convergence. Still, in those cases in which HB3DRT converges, its convergence time is better than LK3DTM's. Algorithm HB3DTM has the lowest convergence time.

Figure 8.15: Results from dataset DS3 for cube. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.


Datasets DS4–DS6  We show the results for these datasets in Figures 8.16–8.18. The accuracy results are similar to previous datasets: the reprojection error for algorithm HB3DRT is higher than for algorithms LK3DTM and HB3DTM—although the differences are small for dataset DS4. Algorithm HB3DRT also converges fewer times than the other two algorithms: its frequency of convergence approximately ranges from 80% down to 10%—although the frequency of convergence for dataset DS6 is slightly better than for datasets DS4 and DS5. The results for the number of iterations are similar to previous datasets: the number of iterations grows linearly with noise, although the results for algorithm HB3DRT double those of the other two algorithms.

Figure 8.16: Results from dataset DS4 for cube. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.


Discussion  The results clearly show that efficient algorithms that do not hold their requirements are not valid for tracking: the convergence of algorithm HB3DRT, which does not hold Requirement 1, is very poor—80% under ideal conditions—for datasets DS3–DS6, i.e., those that represent the tracking problem; however, the algorithm has good convergence for datasets DS1 and DS2, i.e., those whose samples represent the registration problem.

On the other hand, the efficient algorithm HB3DTM holds Requirement 1 and its results are valid for both registration and tracking: although the convergence of algorithm HB3DTM degrades from dataset DS1 to DS6, the optimizations converge more than 80% of the time even under the worst conditions. Moreover, notice that the results of algorithm HB3DTM are equivalent to those of algorithm LK3DTM—which does not assume any requirement. Thus, we conjecture that the degradation of the convergence is due to the difficulty of the experiments, not to problems with the efficient approximation.

Figure 8.17: Results from dataset DS5 for cube. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.


Algorithms HB3DTM and LK3DTM show similar accuracy for all datasets. Although the average accuracy seems low—it ranges from 1.5 to 2.2 pixels—the results are consistent between the two algorithms. The accuracy results for algorithm HB3DRT are worse than those of the other two algorithms—even in those cases where it successfully converged. Thus, neglecting Requirement 1 for the HB algorithm has a direct impact on the accuracy of the optimization.

Timing results are also affected by Requirement 1. Algorithm HB3DRT consistently iterates more times to converge than the other two algorithms: if Requirement 1 does not hold, the efficient Jacobian is incorrectly approximated, and the successive iterations take longer to reach the optimum.

In summary, the satisfaction of Requirement 1 affects the convergence of algorithm HB and, to a lesser extent, the accuracy of the optimization. Thus, Requirement 1 is mandatory for using the HB algorithm for tracking.

Figure 8.18: Results from dataset DS6 for cube. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.


Experiments with tea box model

We also test the tea box model. The purpose of these experiments is to verify the ability of our algorithm to track objects that rotate a full 360°. We show that our tracker naturally handles strong target rotations by unfolding the target texture.


Figure 8.19: tea box sequence. Selected frames from the sequence generated for the tea box target. We show approximately one of every fifteen frames. The target displays strong rotations about the three axes, and translations that take the target to the very borders of the image.


Experiments with the tea box model are different from the previous ones: we evaluate a continuous sequence of the target rotating and translating through the scene. We generate a 600-frame sequence in which we completely rotate the target around its vertical axis—i.e. β ∈ [0, 360]. Besides, we strongly rotate the target around the remaining axes—α ∈ [0, 120] and γ ∈ [0, 300]—and we move it near the borders of the image (see Figure 8.19). We continuously track the object through the scene using algorithm HB3DTM. We show the results in Figure 8.20. The algorithm consistently recovers the target position and orientation.


Figure 8.20: Results for the tea box sequence. We overlay the results of algorithm HB3DTM onto the frames from the tea box sequence. For each frame we project the target vertices using the resulting parameters of the optimization. We represent these projections using blue dots.


We confirm the results by plotting the estimated values against the frame number in Figure 8.21. Target rotation is accurately estimated: the three Euler angles from the estimation match the ground-truth values despite the extreme orientations. The translation parameters are also correctly estimated.

Figure 8.21: Estimated parameters from the tea box sequence. (Top row) Euler angles against frame number: ∆GT are the ground-truth Euler angles used to generate the sequence and ∆EST the Euler angles estimated using the shape-induced HB, for ∆ = α, β, γ. (Bottom row) Translation parameters against frame number: t∆GT are the ground-truth translation parameters used to generate the sequence and t∆EST the estimated translation parameters, for ∆ = x, y, z.


Good Texture to Track

The convergence of direct methods heavily depends on the texture of the target. Although the existence of texture corners is not strictly necessary for the tracking algorithms to work, the target texture influences the result [Benhimane et al., 2007].

The question now is: how do we classify a texture as good or bad? A classical reference on the subject, [Shi and Tomasi, 1994], claims that high-frequency textures—i.e. targets with high-contrast patterns and clearly defined borders—are the most suitable for tracking/registration. [Shi and Tomasi, 1994] demonstrate their claims by tracking Harris corner features using LK; as high-frequency texture patterns provide more stable estimations of Harris corners, they allegedly should improve tracking.

However, more recently [Benhimane et al., 2007] demonstrated that low-frequency textures—i.e. textures that change gradually, or do not have clearly defined borders—improve the convergence of gradient-based direct methods. They support their claims by showing that the solution to a least-squares problem involving image gradients is more accurate when the Jacobian is computed from smooth texture patterns. This claim apparently conflicts with the idea that a high-frequency texture pattern may provide a better estimation of the registration parameters.
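One way to make this discussion concrete is to compare the conditioning of the Gauss-Newton Hessian built from the image gradients of a sharp texture versus a smoothed one. This is a toy illustration under our own assumptions (a pure-translation warp), not the analysis of [Benhimane et al., 2007]:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def gn_conditioning(texture):
        """Condition number of J^T J for a pure-translation warp."""
        gy, gx = np.gradient(texture.astype(float))
        J = np.stack([gx.ravel(), gy.ravel()], axis=1)   # one row per pixel
        return np.linalg.cond(J.T @ J)

    # Compare a sharp texture with a smoothed, low-frequency version of it:
    # gn_conditioning(texture) vs. gn_conditioning(gaussian_filter(texture, 3.0))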

We study the relationship between convergence and texture in HB by performing an experiment with the face model. We compare the parameters estimated by HB using the same structure but (1) the usual texture of the face model, and (2) a texture of Gaussian gradients similar to the cube model (see Figure 8.10). We intend to isolate the influence of the target texture on the accuracy of the estimation from other sources such as the target structure or kinematics. Notice that this experiment with the face model also proves that our proposed algorithm for tracking shape-induced homographies with HB can deal with 3D models more complicated than "boxes"—such as the cube and tea box models.

We build up the experiment by rotating the face model 90° back and forth around the Y-axis, and estimating the parameters using HB—see Figure 8.23 (top rows). We also modify the texture of the face model using a pattern of Gaussian gradients—see Figure 8.23 (bottom rows)—and we again estimate the motion parameters using HB. The face model provides a high-frequency texture (especially in the eyebrows, lips, eyes, and ears), whereas the Gaussian gradients provide by definition a low-frequency texture. Moreover, the Gaussian texture is uniformly distributed over the face, whereas eyes, lips, and brows are mainly visible in frontal views—i.e., the texture on the sides of the face is ill-conditioned.

We also plot the ground-truth values that we used to generate the sequence side by side with the values estimated by HB with the usual and the Gaussian textures. Figure 8.22 shows the values of the target rotational parameters—i.e. Euler angles α, β, and γ—for the ground-truth data (αGT), the estimation from the face model texture (αTXT), and the estimation from the Gaussian texture (αGSS).


Results from Figure 8.22 show that HB with the usual texture cannot deal with extreme out-of-plane rotations: the estimation loses track at β ≈ 60° and the obtained solution cannot provide a reliable initial estimation for the remaining frames. However, HB with a Gaussian gradient texture provides an accurate estimation for every frame, even for the very extreme value of β = ±90°; this is possibly due to the fact that the Gaussian texture is uniformly distributed all over the model.

We conclude that texture is fundamental for an accurate estimation of the target motion: results may greatly diverge when using different textures on the same target structure. Furthermore, we show that the problem of tracking a human head under large rotations is rather difficult, as the texture of the face at large rotations is not entirely suitable for gradient-based registration.

Figure 8.22: Estimated parameters from the face sequence. Euler angles for ground-truth and estimated values plotted against frame number. ∆GT: ground-truth Euler angles used to generate the sequence; ∆TXT: Euler angles estimated from the usual texture; ∆GSS: Euler angles estimated from the Gaussian gradients texture; for ∆ = α, β, γ.



Figure 8.23: Good texture vs. bad texture. The face model with the usual texture (top rows) and with a Gaussian gradient texture (bottom rows). We project the target vertices onto each image using blue dots.


8.5.3 Experiments with Synthetic Nonrigid data

In this group of experiments we allow the target model to deform in addition to changing its position and orientation.

Target Model  We describe our target using a 3D morphable model (3DMM) [Blanz and Vetter, 2003]: the target comprises a set of linear deformation basis (including the mean sample) and a texture image, both defined in the reference frame. The face-deform model contains 9 modes of deformation.

Figure 8.24: The face-deform model. Panels show the mean shape x and the deformation basis b1, . . . , b9. Target deformation is encoded using the linear deformation basis bk, k = 1, . . . , 9, with mean x (cf. Equation 5.34).

We derive the face-deform model by adding 9 linear deformation basis to the mean represented by the face model. Each basis—and the mean—comprises 5,866 vertices distributed in 11,344 triangles. The texture image is 500 × 500 pixels in size. We compute the deformation basis by applying PCA to a distribution of face meshes. This distribution results from deforming the face model triangle mesh using a muscle-based system [Parke and Waters, 1996]: we attach 18 parametric muscles to certain vertices of the mesh such that modifying the muscles actually deforms the mesh². We generate 475 keyframes encompassing different values for our synthetic muscles. Each frame provides a sample mesh for the PCA procedure that estimates the linear deformation model. We show the resulting model in Figure 8.24.
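A minimal PCA sketch under our assumptions: the 475 deformed meshes are stacked as rows of a matrix, each row being the flattened vertex coordinates, and the deformation basis is taken from the leading right singular vectors:

    import numpy as np

    def pca_deformation_basis(meshes, k=9):
        """meshes: (475, 3N) flattened vertex coordinates; returns mean and k basis."""
        mean = meshes.mean(axis=0)
        U, s, Vt = np.linalg.svd(meshes - mean, full_matrices=False)
        return mean, Vt[:k]                # b_1, ..., b_k as rows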

²J.M. Buenaposada kindly provided the muscle-based animation.


Figure 8.25: Distribution of Synthetic Datasets. We select different samples from each dataset, from DS1 (top row) to DS6 (bottom row), according to Table 8.7. The top-left image represents the position where we compute the Jacobian for efficient methods. Notice that the successive samples increasingly depart from this location.

Figure 8.25 shows 36 selected samples from a total of 60,000 for the face-deform model—6 random samples per dataset. We randomly sample rotation and translation parameters from a Uniform distribution defined on the ranges displayed in Table 8.7 (using the procedure described in Section 8.3).

Besides pose and orientation, we deform the model polygonal mesh. We randomly select one of the 475 meshes from the shape distribution, and we compute its corresponding vector of deformation parameters c∗. The corresponding initial guess is more involved to compute than pose and orientation: we must carefully choose the variance of the Gaussian noise so that the resulting c0 produces a physically plausible shape. Thus, we compute the covariance matrix of the deformation coefficients, Λc, from the shape distribution. We generate the initial guess for deformation, c0, by corrupting the ground-truth value with Gaussian noise,

c0 ∼ N(c∗, Λc).
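A sketch of this step, assuming the 475 deformation-coefficient vectors are stacked as rows of a matrix C (a hypothetical name):

    import numpy as np

    def noisy_deformation_guess(c_star, C, rng=np.random.default_rng(0)):
        """Draw c0 ~ N(c*, Lambda_c), with Lambda_c estimated from the shape distribution."""
        cov = np.cov(C, rowvar=False)      # Lambda_c: covariance of the coefficients
        return rng.multivariate_normal(c_star, cov)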


Table 8.7: Ranges of parameters for face-deform experiments.

              Dataset   α        β        γ        tx       ty       tz
Registration  DS1       0        0        0        0        0        0
Tracking      DS2       [0,10]   [0,10]   [0,10]   [0,10]   [0,10]   [0,10]
              DS3       [10,20]  [10,20]  [10,20]  [10,20]  [10,20]  [10,20]
              DS4       [20,30]  [20,30]  [20,30]  [20,30]  [30,30]  [30,30]
              DS5       [30,40]  [30,40]  [30,40]  [30,30]  [30,30]  [30,30]
              DS6       [40,50]  [40,50]  [40,50]  [30,30]  [30,30]  [30,30]

We show the average initial reprojection error for the generated initializations in Table 8.8. Using these initialization values, we execute the optimization procedures for algorithms LK3DMM and HB3DMMSF. We show the resulting plots in Figures 8.26–8.31.

Table 8.8: Average reprojection error vs. noise for face-deform.

σ     0.50  1.00  1.50  2.00  2.50  3.00  3.50   4.00   4.50   5.00
DS1   1.50  3.09  4.47  6.07  7.58  8.99  10.71  12.29  13.50  15.51
DS2   1.54  2.95  4.62  6.08  7.63  9.14  10.41  12.18  13.60  15.05
DS3   1.53  3.05  4.46  6.03  7.64  9.32  10.70  12.09  13.73  15.07
DS4   1.53  2.98  4.59  6.11  7.50  9.20  10.55  12.01  13.64  15.47
DS5   1.60  3.12  4.72  6.26  7.90  9.34  11.06  12.41  14.10  15.66
DS6   1.58  3.08  4.71  6.25  7.76  9.18  11.09  12.74  13.85  15.63


Dataset DS1  In this dataset we study those experiments corresponding to the registration problem—i.e. µJ = µ0. We show the results in Figure 8.26. Results for accuracy show that the reprojection error grows linearly with the noise variance, although the average reprojection error for the LK algorithm is smaller than for HB. Results for frequency of convergence show that HB is less robust than LK: algorithm HB converges perfectly for low noise—i.e., σ ≤ 1.5—but its convergence monotonically decreases for higher noise variance. On the other hand, the HB algorithm is more efficient than LK: in those cases that converged, although both algorithms iterate the same number of times, HB is almost 10 times faster than LK.

Figure 8.26: Results from dataset DS1 for face-deform. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.


Datasets DS2–DS6  These datasets contain those experiments that represent the tracking problem—i.e., those experiments that are not initialized in the position where the efficient Jacobian is computed. We show the results in Figures 8.27–8.31. As these results are similar for all the datasets, we summarize their interpretation in the following.

Figure 8.27: Results from dataset DS2 for face-deform. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.


LK is consistently more accurate than HB across all datasets: the differences in reprojection error are small for low noise; however, for high noise variance, HB almost doubles the reprojection error of LK.

Figure 8.28: Results from dataset DS3 for face-deform. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.


Results for frequency of convergence show that LK is more robust than HB: convergence for the LK algorithm is around 100% even for high noise variance, whereas convergence for HB ranges from 60% in DS2 down to 10% in DS6. However, notice that the convergence of both algorithms is similar for low noise variance—i.e., σ ≤ 1.5.

Figure 8.29: Results from dataset DS4 for face-deform. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.


Efficiency comparisons are mostly favourable to the HB algorithm: the average time per iteration is 0.023 seconds for HB and 0.178 seconds for LK. The lower time per iteration compensates for the higher number of iterations of HB with respect to LK: even iterating 60% more to converge, the total time for the HB algorithm is typically seven times lower than for LK.

Figure 8.30: Results from dataset DS5 for face-deform. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.


Summarizing, these experiments confirm that the LK algorithm is more robust than HB for the face-deform model. We conjecture that the HB optimization with deformation parameters is more sensitive to noise than the HB algorithm for 6-dof tracking; we support this claim by confirming that the HB algorithm has perfect convergence for low noise in all datasets. On the other hand, HB is more efficient than LK in all the studied cases: on average, the HB algorithm is more than five times faster than LK.

Figure 8.31: Results from dataset DS6 for face-deform. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.


8.5.4 Experiments With Nonrigid Sequence

Results from the previous section showed that the HB algorithm was less accurate and less robust than LK in those cases where the optimum parameters were far from the initial guess of the optimization—i.e. for high levels of noise. At first glance, we might conclude from these results that the HB algorithm is not suitable for nonrigid tracking. We vindicate the HB algorithm by using a challenging experiment.


Figure 8.32: face-deform sequence. Selected frames from the sequence generated for the target face-deform. We show approximately one of every fifteen frames. The target displays strong rotations around the X- and Y-axes, and several deformations.


We generate a synthetic sequence using the face-deform model (see Figure 8.32). The sequence is 470 frames long, and it shows the model alternately rotating around the X- and Y-axes while performing several facial expressions: frowning, grimacing, grinning, raising the eyebrows, and opening the mouth.


Figure 8.33: Results from the face-deform sequence. Selected frames from the face-deform sequence processed using HB. Blue dots: the vertices of the model projected onto the image using the estimated parameters. Pink dots: estimated projections of selected regions of the face: eyebrows, lips, jaw, and nasolabial wrinkles.

We show the results from processing the sequence using HB in Figure 8.33. We overlay the model projection onto the frames using the estimated values; the results are visually accurate and no drift is noticeable. We confirm this perception by plotting the estimated values against the ground-truth values that we used to generate the sequence in Figure 8.34.


Figure 8.34: (Top row) Estimated vs. ground-truth rotation parameters. We denote ground-truth parameters as ∆GT and estimated parameters as ∆EST, where ∆ = α, β, γ. (Bottom row) Estimated vs. ground-truth deformation parameters. We show the parameters corresponding to the first four basis of deformation (i.e. b1 to b4). We denote ground-truth parameters as ciGT and estimated parameters as ciEST, i = 1, . . . , 4.

The estimated results for rotation accurately match the ground-truth values. We also compare some of the deformation parameters: we select the parameters c1, . . . , c4, that is, the coefficients of the first four deformation basis b1 to b4. We choose these parameters because they collect more than 80% of the energy in the PCA factorization. Quantitative results for the deformation parameters seem less accurate than those for rigid motion. However, the impact of this estimation error on facial motion is small.


8.5.5 Experiments with real Rigid data

This experiment demonstrates the suitability of our algorithm for tracking a rigid object in a real-world sequence. Real sequences are hard to process, as the essential assumptions on which we base our algorithms do not strictly hold: the BCC (Equation 3.1) is not a strict equality due to (1) lighting changes not included in the model, and (2) numerical inaccuracies induced by camera discretization, quantization noise, and aliasing.

Target Model  Modelling and initializing the target is also less accurate than in the synthetic case. Even for a very simple target—e.g. the cube used in this experiment—it is difficult to establish a one-to-one correspondence between the brightness values of our synthetic model and the images of the target in the sequence. We accordingly require the target model to be (1) easy to build synthetically, and (2) easy to put in correspondence with a single image of the sequence (the initialization procedure).

For these reasons we choose the target to be a textured cube. We build the cube by sticking six squares of rigid cardboard onto a wooden frame. Each cardboard side has a piece of textured fabric stuck on top of it. The advantage of using fabric for the texture pattern is that the material does not produce specular highlights due to changes in the target orientation. We also stick a calibration pattern on one side of the cube: we use this pattern to put the target image and the synthetic model in correspondence. We also attach an aluminium rod to the wooden frame to serve as a handle to freely rotate and translate the target object. We display the target object in Figure 8.35.

Figure 8.35: The cube-real model. (Left) The actual target made of cardboard andfabric. The calibration pattern is used to initialize the registration algorithm. (Right)Texture map corresponding to the unfolded cube target.

The cube-real model comprises 118,428 vertices arranged in 233,496 triangles. We encode the unwrapped texture in a 564 × 690 RGB image (see Figure 8.35). As in the synthetic cube model, we mark the vertices at the border of each side of the cube as "forbidden"—these points are not considered in the optimization.



Experiment Arrangement  We capture a 470-frame sequence in which we rotate and translate the cube device, using a handheld video camera (see Figure 8.36). We use a Panasonic NV-GS400 3CCD DV camera, and we disable features such as autofocus and automatic white balancing to avoid focal length or sudden lighting changes. We compute the camera intrinsics using a planar calibration pattern and Jean-Yves Bouguet's Camera Calibration Toolbox [Bouguet].

We capture the sequence by moving the cube across the scene using the rod handle. To demonstrate the advantages of using an unwrapped texture, we rotate and translate the cube about the three coordinate axes such that each side of the cube is visible in the sequence at least once. We deliberately rotate the cube by large amounts about the three axes to demonstrate the robustness of our method.

The initial guess for the algorithm is the set of rotation and translation parameters such that the associated projection of the model is perfectly aligned with the first image of the sequence. We compute this initialization by using Bouguet's calibration program with the calibration pattern of the cube. We compute the off-line matrices of algorithm HB3DTM with the initialization values (see Algorithm 7).

Results of the experiment

We show the results in Figure 8.37. For each frame of the sequence we obtain arotation matrix and a translation vector as a result of the algorithm HB3DTM. Wenormalize the brightness values of each side of the cube—in both the recovered pixelsfrom the sequence and the texture image—to minimize the effects of illuminationchanges.

We project the edges of the cube onto the image by using a projection matrix assembled from these parameters (see Figure 8.37). The algorithm accurately computes the position and orientation of the target; although the estimation degenerates in some frames—e.g., frames 40, 160, or 380—the algorithm produces accurate solutions for most of them. Besides, the algorithm is also able to recover from erroneous estimations in those frames where the target motion was too fast or abrupt—see, e.g., frames 260 or 380 in Figure 8.37. Note that we might improve the performance of the algorithm by using a pyramid-based optimization [Bouguet, 2000]; however, we choose not to use such an approach in order to better analyze the behaviour of the algorithm.



Figure 8.36: The cube-real sequence. Selected frames from the sequence cube-real. We translate the target whilst it rotates around the axis defined by the handle rod. This rotational motion involves the three axes of rotation of the object. Finally, when a substantial portion of the target is no longer visible in the image—i.e., the target leaves the camera field of view—the algorithm stops its execution.



Figure 8.37: Results from cube-real sequence. Selected frames from the sequencecube-real processed using HB (Algorithm 7). For each frame we compute the rotationmatrix and the translation vector that best registers the model to the image intensityvalues. We use these parameters to project the cube wireframe model onto the image(blue lines).


Figure 8.38: Selected facial scans used to build the model. Each scan is a three-dimensional textured mesh that represents a facial expression. These 3D meshes arecomputed from three views of the subject by using reconstruction algorithms based onstructured light.

8.5.6 Experiment with real Nonrigid data

In this experiment we show the performance of algorithm HB3DMM for registering a deforming human face as it changes its expression. The algorithm faces the same challenges as in the real rigid case—that is, deviations between the sequence and the model caused by illumination or camera quantization—plus those derived from the nonrigid nature of the target: it is remarkably difficult to accurately model the deformations of the nonrigid target.

Target Model  The face model should capture the maximum variability of the target structure for each facial expression—joy, disgust, fear, etc. We use PCA to provide a set of deformation basis that represent the information contained in facial motion. Professor Thomas Vetter and his team provided us with a complete face model with expressions³. The model was built from 88 structured-light 3D scans of the author's face performing different expressions—joy, sadness, surprise, anger, winking, etc.—as shown in Figure 8.38. These scans are aligned into a common reference system by using a semi-automatic procedure based on manually selected face landmarks. The basis of deformation are computed by applying a PCA procedure to the aligned scans.

³The author gratefully thanks Pascal Paysan and Brian Amberg for the scanning session and the construction of the models.


Figure 8.39: Unfolded texture model. (Left) Spherical coordinates (θ, φ, ρ) are used to project the 3D shape onto a two-dimensional reference space (θ, φ). (Right) The template of the registration algorithm comprises the actual RGB values of the target texture projected on the reference space (θ, φ).

The mean of the resulting PCA model comprises 97,577 vertices arranged in 195,040 triangles. The basis of deformation are the 88 principal components computed from PCA.

The original model has a number of physical details such as the tongue, the eyeballs and eye sockets, and the back of the head and neck. As we do not need such details for our algorithms, we strip them out by deleting their corresponding meshes from the model. We project the model colour by using a cylindrical projection to render the texture onto the reference space (see Figure 8.39).

Experiment Arrangement  We capture a sequence of the face performing both rigid and nonrigid motion: the rigid motion consists of rotating the head out of the image plane by about 60°, and then rotating the head inside the image plane; the nonrigid motion consists of the face opening the mouth and raising the eyebrows (see Figure 8.40). We capture the 210-frame sequence by using a Panasonic DV handheld camera mounted on a fixed tripod. We illuminate the scene by using two halogen lamps located above and below the face. We do not use any facial make-up or special lighting to avoid specular spots on the target. We provide an initial guess for the registration algorithm by fitting the morphable model to the first image of the sequence: we compute the rotation and translation of the morphable model such that the differences between its projected texture and the frame brightness are minimal. We ease the procedure by defining anchor points, that is, correspondences between some vertices of the model and some pixels of the initial frame. We define a total of 18 anchor points, including the corners of the eyes and mouth, the nostrils, the tip of the nose, and the tips of the ears (see Figure 8.41).


(Frame numbers, left to right and top to bottom: 1, 20, 38, 34, 45; 79, 89, 105, 118, 145; 159, 167, 169, 172, 175; 185, 195, 200, 203, 206.)
Figure 8.40: The face-real sequence. Selected frames from the sequence of the face performing rigid and nonrigid motion. The head rotates to its left around its vertical axis, then nods from left to right, and finally the mouth opens and the forehead wrinkles with the rise of the eyebrows.


Figure 8.41: Anchor points in the model. We plot the anchor points used to manually fit the model to the image (blue circles) on both the reference template (Left) and the model shape (Right).

of the nose, and the tips of the ears (see Figure 8.41).
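For illustration, a rigid initial guess could be derived from such 2D–3D anchor correspondences with a standard PnP solve; the thesis instead fits by minimizing texture differences, so this OpenCV-based sketch is only a hypothetical shortcut:

    import numpy as np
    import cv2

    def initial_pose_from_anchors(model_points, image_points, K):
        """Rotation/translation from (n, 3) model anchors and (n, 2) clicked pixels.

        K is the (3, 3) camera intrinsics matrix; the image is assumed undistorted.
        """
        ok, rvec, tvec = cv2.solvePnP(model_points.astype(np.float64),
                                      image_points.astype(np.float64), K, None)
        if not ok:
            raise RuntimeError("PnP initialization failed")
        R, _ = cv2.Rodrigues(rvec)   # axis-angle vector to rotation matrix
        return R, tvec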

Moreover, the illumination conditions drastically change the brightness of the face image with respect to the texture of the morphable model. Light sources—e.g., the ceiling lamp—create specular highlights on a glossy surface such as human skin, and attached (non-cast) shadows, as some parts of the face are more illuminated than others. However, the texture of the morphable model represents the light reflected by the skin surface, so shadows are not considered. We modify the texture colour of the morphable model by using the illumination bases provided by spherical harmonics [Basri and Jacobs, 2001; Ramamoorthi, 2002]. Using these illumination bases, we adjust the texture of the morphable model to be as similar as possible to the pixel values of the projected face.
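This adjustment reduces to a linear least-squares fit of the spherical-harmonic coefficients over the visible pixels; a minimal sketch with illustrative array names:

    import numpy as np

    def fit_illumination(sh_basis, frame_pixels):
        """Fit illumination coefficients c so that sh_basis @ c matches the frame.

        sh_basis: (n_pixels, n_basis) matrix; column i stacks the i-th spherical
        harmonic basis image over the visible template pixels.
        frame_pixels: (n_pixels,) brightness of the projected face in the frame.
        """
        c, *_ = np.linalg.lstsq(sh_basis, frame_pixels, rcond=None)
        return sh_basis @ c   # relit template texture, used instead of the raw one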

Our fitting process also considers visibility issues on the morphable model. We remove from the model those areas which may be problematic for the optimization, such as the lower jaw, the back of the head, the neck, the ear canal, and the nostrils. We also remove the eyes, as winking produces a sudden change of texture in those areas. We incorporate the aforementioned issues into our fitting process by using non-linear reweighted least-squares. The result is a projection matrix that best fits the morphable model to the image frame.
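A generic sketch of such a reweighted least-squares loop is shown below, with Huber weights standing in for the actual weighting of problematic areas (the residual and Jacobian callables are placeholders):

    import numpy as np

    def reweighted_least_squares(residual_fn, jacobian_fn, mu, n_iters=20, delta=1.0):
        """Gauss-Newton iterations with Huber reweighting of the residuals."""
        for _ in range(n_iters):
            r = residual_fn(mu)                  # (n,) brightness residuals
            J = jacobian_fn(mu)                  # (n, p) Jacobian at mu
            # Huber weights: 1 for small residuals, down-weight large ones.
            w = np.where(np.abs(r) <= delta, 1.0,
                         delta / np.maximum(np.abs(r), 1e-12))
            Jw = J * w[:, None]                  # row-weighted Jacobian
            step, *_ = np.linalg.lstsq(Jw.T @ J, -Jw.T @ r, rcond=None)
            mu = mu + step
        return mu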

Results of the Experiment We iteratively apply Algorithm 7 to the frames of the sequence. We show the results in Figure 8.42. The algorithm accurately recovers the face rotation around the Y-axis; notice that we have restricted the rotation to ≈ 60 degrees, as the experiment in Section 8.5.2 suggested that the face texture may


(Frame numbers, left to right and top to bottom: 1, 20, 38, 34, 45; 79, 89, 105, 118, 145; 159, 167, 169, 172, 175; 185, 195, 200, 203, 206.)
Figure 8.42: Results for the face-real sequence. We show selected frames from the face-real sequence processed with the HB algorithm (see Algorithm 7, page 64). For each frame of the sequence, the algorithm computes a rotation matrix, a translation vector, and a set of deformation coefficients. We use these parameters to project the shape model onto the image—blue dots.


degenerate for strong rotations. Nonetheless, the algorithm is able to track the morphable model. Finally, the algorithm recovers the nonrigid motion of the mouth and forehead due to the change of expression. In summary, the algorithm is able to recover the face motion and deformation with small drift.

8.6 Compositional Algorithms

In this section we present the experiments that we conducted to evaluate compositional registration algorithms. We organize this section as follows: we introduce the experimental hypotheses, together with the algorithms, in Section 8.6.1, and perform synthetic experiments involving a rigid 3D target in Section 8.6.2.

8.6.1 Experimental Hypotheses

As for the additive approach, we select some compositional algorithms to validate the experimental hypotheses. Again, we verify that those algorithms that fulfil Requirements 2 and 3 should have better convergence than those that do not. We select the following compositional algorithms: Inverse Compositional homography (ICH8), Generalized Inverse Compositional homography (GICH8), Inverse Compositional plane-induced homography (ICH6), Generalized Inverse Compositional plane-induced homography (GICH6), Inverse Compositional rigid body transformation (IC3DRT), and Forward Compositional plane+parallax plane-induced homography (FCH6PP). For the sake of comparison we evaluate two additive LK algorithms: one for 8-dof homographies and one for 6-dof homographies. We also include algorithm HB3DTM (see Table 8.4) to compare timing and convergence results. We summarize the evaluated algorithms in Table 8.9.

Experiments with the EFC algorithm Note that we do not include the EFC algorithm in the comparison. The reason is that the numerical results of EFC and IC are exactly identical for all the experiments—as we theoretically demonstrated in Chapter 6; thus, we do not plot the results of EFC for ease of visualization.

We test the compositional algorithms from Table 8.9 to validate the following hypothesis: the convergence of compositional algorithms depends on the compliance with their requirements. If the requirements hold, then the convergence shall be good; otherwise, problems in convergence shall arise.

8.6.2 Experiments with Synthetic Rigid data

This set of experiments studies the convergence of the evaluated algorithms in a controlled environment. The synthetic datasets provide us with precise measurements on the outcomes of the algorithms.


Table 8.9: Evaluated Compositional Algorithms

Algorithm   Warp                   Update          Constant    Req. 2   Req. 3
LKH8        8-dof homography       Additive        No          —        —
ICH8        8-dof homography       Compositional   Yes         YES      YES
GICH8       8-dof homography       Additive        Partially   YES      YES
LKH6        6-dof homography       Additive        No          —        —
ICH6        6-dof homography       Compositional   Yes         NO       YES
GICH6       6-dof homography       Compositional   Yes         NO       YES
FCH6PP      6-dof plane+parallax   Compositional   No          YES      —
IC3DRT      6-dof in R³            Compositional   Yes         YES      NO
HB3DTM      6-dof homography       Additive        Partially   NO       YES(1)

(1) Strictly speaking, HB3DTM does not hold Requirement 3, but Requirement 1; cf. Section 6.2.2, page 85.

Target Model For our experiments with compositional algorithms we use a textured model of a 3D plane. The key point of using a simple plane is that it can be registered by using either 2D or 3D warps.

The plane model comprises 10,201 vertices organized in 20,000 triangles (see Figure 8.43). We represent the target texture as a collection of RGB triplets, one per vertex. The texture depicts four Gaussian-based gradient patterns: smooth gradients are more suitable for direct image registration than high-frequency textures [Benhimane et al., 2007].

Figure 8.43: The plane model. (Left) The model is a textured triangle mesh. Blue lines: triangle mesh, downsampled here for proper visualization. (Right) Planar texture map of the model. The texture represents four gradient patterns based on a Gaussian distribution.

164

Page 187: Efficient Model-based 3D Tracking by Using Direct Image Registration

Experiments with plane model

We generate a collection of experiments using the procedure described in Section 8.3. We show the ranges of the motion parameters that we use to generate the datasets in Table 8.10.

Table 8.10: Ranges of motion parameters for each dataset.

                Dataset   α        β        γ        tx       ty       tz
Registration    DS1       0        0        0        0        0        0
Tracking        DS2       [0,10]   [0,10]   [0,10]   [0,10]   [0,10]   [0,10]
                DS3       [10,20]  [10,20]  [10,20]  [10,20]  [10,20]  [10,20]
                DS4       [20,30]  [20,30]  [20,30]  [20,30]  [20,30]  [20,30]
                DS5       [30,40]  [30,40]  [30,40]  [20,30]  [20,30]  [20,30]
                DS6       [40,50]  [40,50]  [40,50]  [20,30]  [20,30]  [20,30]
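A sketch of how one sample per dataset could be drawn, assuming independent uniform draws inside each range of Table 8.10 (the exact procedure, including sign conventions and units, is the one described in Section 8.3):

    import numpy as np

    # (angle range, translation range) per tracking dataset, from Table 8.10.
    RANGES = {"DS2": ((0, 10), (0, 10)),   "DS3": ((10, 20), (10, 20)),
              "DS4": ((20, 30), (20, 30)), "DS5": ((30, 40), (20, 30)),
              "DS6": ((40, 50), (20, 30))}

    def sample_motion(dataset, rng):
        """Draw one (alpha, beta, gamma, tx, ty, tz) parameter vector."""
        (a_lo, a_hi), (t_lo, t_hi) = RANGES[dataset]
        angles = rng.uniform(a_lo, a_hi, size=3)   # Euler angles, in degrees
        trans = rng.uniform(t_lo, t_hi, size=3)    # translation components
        return np.concatenate([angles, trans])

    mu = sample_motion("DS4", np.random.default_rng(0))   # one 6-dof sample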

Figure 8.44: Distribution of Synthetic Datasets. We select different samples from each dataset. Datasets range from DS1 (top) to DS6 (bottom), according to Table 8.10. The top-left image represents the position where we compute the Jacobian for efficient methods. Notice that the successive samples increasingly depart from this location.


Table 8.11: Average reprojection error vs. noise for plane.

σ      0.50   1.00   1.50   2.00   2.50   3.00   3.50   4.00   4.50   5.00
DS1    2.43   4.79   7.19   9.71   11.91  14.64  16.37  19.21  21.09  23.61
DS2    2.45   4.70   7.46   9.45   11.86  14.51  16.53  18.72  21.37  24.21
DS3    2.43   4.81   7.21   9.49   11.87  14.43  16.84  18.99  21.27  23.60
DS4    2.34   4.77   7.08   9.70   11.94  13.75  16.06  18.90  21.21  24.09
DS5    2.55   4.96   7.51   10.04  12.68  15.36  17.40  20.02  22.58  24.98
DS6    2.50   4.97   7.20   9.81   12.04  15.20  16.43  20.46  22.67  24.23

As for the experiments with the face model, we show the average initial reprojection error in Table 8.11. As expected, the higher the noise, the larger the reprojection error. Moreover, for a given level of noise, the average reprojection error is approximately the same across datasets. We present the results of the experiments in the following.
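The tabulated measure can be computed as sketched below, assuming the reprojection error is the mean image distance between the target vertices projected with the ground-truth parameters and with the noisy ones (the projection function is a stand-in for the warp in use):

    import numpy as np

    def reprojection_error(project, mu_true, mu_noisy, vertices):
        """Mean pixel distance between two projections of the same vertices.

        project(vertices, mu) -> (n, 2) image points under parameters mu.
        """
        diff = project(vertices, mu_true) - project(vertices, mu_noisy)
        return np.linalg.norm(diff, axis=1).mean()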

Dataset DS1 We present the results for dataset DS1 in Figure 8.45. The reprojection error plot shows that all the involved algorithms have similar accuracy. The frequency of convergence plot shows that algorithms with a constant Jacobian—IC- and GIC-based algorithms—converge less often than those algorithms that recompute the Jacobian, such as FC, LK, or HB: the plot shows that for noise with a standard deviation greater than σ = 3.5 (≈ 16 pixels of error), compositional-based methods converge below 80% of the time. Moreover, the convergence of those algorithms that do not hold Requirement 2—i.e., ICH6 and GICH6—is up to 20 percent worse than that of the other compositional algorithms. However, algorithm IC3DRT—which does not hold Requirement 3—has better convergence than algorithms ICH8 and GICH8—which hold both requirements. The relationship between timing results and noise behaves as expected: as the initialization error increases, the algorithms need to iterate more times to reach the optimum. Absolute timing results show that inverse compositional methods are faster than forward compositional or LK-based methods. However, inverse compositional algorithms that do not satisfy Requirement 2—ICH6 and GICH6—or Requirement 3—IC3DRT—spend more iterations than those that hold the requirements: GICH6 iterates up to twice as much as FCH6PP and ICH8, whereas the number of iterations for IC3DRT is just slightly higher. The plots also show an interesting result: LK-based methods iterate more times than compositional methods, especially in low-noise cases. We conjecture that for homography-based tracking composition is more natural and better conditioned. Thus, the descent directions of the Jacobian for additive algorithms show a 'zigzag' pattern near the optimum, and the algorithm needs more iterations to compute a local minimum with the desired accuracy. This is a known problem of gradient-descent methods on poorly conditioned convex problems. Notice that LKH8 needs more iterations, as its search space has more dimensions than that of LKH6.


Figure 8.45: Results from dataset DS1 for plane. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.

Dataset DS2 We present the results for dataset DS2 in Figure 8.46. All the algorithms show similar accuracy for those cases that converged—the maximum divergence is under 0.005. The frequency of convergence decreases with the noise variance for those algorithms that efficiently compute the Jacobian. The convergence worsens for those algorithms that do not hold Requirement 2, especially algorithm GICH6: for values of noise above σ = 4.0, the algorithm converges in 40–20% of the trials, against 60–40% in DS1. The convergence of ICH6, IC3DRT, ICH8, and GICH8 is similar to that in DS1. However, the timing results for the LK algorithms are more coherent than those in DS1: the computation time grows linearly with the noise variance.


Figure 8.46: Results from dataset DS2 for plane. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.

Dataset DS3 Figure 8.47 shows the results for dataset DS3. Again, the accuracy of the algorithms is very similar for those cases that converged. The frequency of convergence decreases with the noise variance, and shows noticeable differences between those algorithms that verify Requirements 2 and 3 and those that do not. Algorithm GICH6 practically does not converge for noise above σ = 4.0, and the convergence of algorithms ICH6 and IC3DRT decreases by up to 40%. The convergence of the remaining algorithms is very similar to that in the previous datasets. Even worse, in those cases where GICH6 converged, the optimization reached the maximum number of iterations on average. Timing results are consistent with the previous datasets.


Figure 8.47: Results from dataset DS3 for plane. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.

Dataset DS4 We present the results for dataset DS4 in Figure 8.48. The plot shows the gap between those algorithms that hold the requirements and those that do not. We leave algorithm GICH6 out of the comparison, as its convergence quickly degrades: the algorithm converges only 10% of the time for σ = 0.5, and it does not converge in any trial with noise σ ≥ 3.5. Algorithm ICH6 converges less than half as often as in the previous datasets—less than 20% in DS4 against around 40% in DS1–DS3. Convergence results for algorithm IC3DRT range from 80 to 15%. For these two algorithms, even for noise of σ = 0.5, the frequency of convergence is approximately 80%, in contrast to 100% convergence for the remaining algorithms. Accuracy results are worse than those of datasets DS1–DS3, although the results are similar among all the algorithms, with ICH6 and IC3DRT showing the lowest reprojection error. Notice that this estimation is not accurate, as their reprojection error is computed using 10% of the samples, instead of 100% as in the case of the LK algorithm. The convergence problems are also reflected in the number of iterations: algorithms ICH6 and IC3DRT iterate three times more than the other algorithms.


Figure 8.48: Results from dataset DS4 for plane. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.

Dataset DS5 We present the results for dataset DS5 in Figure 8.49. We leave algorithm GICH6 out of the comparison as it has bad convergence: the convergence is below 5% for the lowest noise, and it does not converge at all for σ ≥ 2.5. The convergence of algorithm IC3DRT (40 to 3% of the optimizations successfully converge) is approximately half that of dataset DS4—where convergence is 80 to 15%, cf. Figure 8.48. Also, the convergence of algorithm ICH6 degrades even more: the range of convergence reduces from 80–15 percent for dataset DS4 to 10–1 percent for dataset DS5. Moreover, the convergence of all the algorithms decreases in this dataset—even the LK and FC algorithms perform worse than in the previous datasets. The cause may be the difficulty of registering a plane that is skewed with respect to the camera (see Figure 8.44). Notice that the number of iterations the algorithms need to converge increases accordingly. Accuracy is also affected: the reprojection error increases by more than 100 percent with respect to dataset DS4.


Figure 8.49: Results from dataset DS5 for plane. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.

Dataset DS6 Figure 8.50 shows the results from the experiments for dataset DS6. We discard algorithms ICH6 and GICH6, as they do not converge in any of the 60,000 experiments—i.e., their reprojection error is always above 5.0 pixels. We also leave algorithm IC3DRT out of the comparison, as its convergence starts at 3% for the lowest noise values, and it does not converge at all for noise σ ≥ 2.0. The frequency of convergence of the remaining algorithms is worse than in DS5 due to the larger rotations of the plane with respect to the camera—see Figure 8.44. The accuracy of all the algorithms is similar to the results in dataset DS5. However, the timing results are worse: the number of iterations—together with the computation time—increases with respect to the previous datasets. The larger the noise variance, the higher the number of iterations that the algorithms require to converge.


Figure 8.50: Results from dataset DS6 for plane. (Top-left) Accuracy plot: average reprojection error against noise standard deviation. (Top-right) Robustness plot: average frequency of convergence against noise. (Bottom-left) Efficiency plot: average convergence time against noise standard deviation. (Bottom-right) Efficiency plot: average number of iterations against noise.

Summarizing, all algorithms show similar accuracy, although those with fewer parameters—e.g., LKH6 and FCH6PP—have less reprojection error. Efficient algorithms have lower convergence rates than algorithms that recompute the Jacobian at each iteration. Moreover, those algorithms that do not meet the compositional requirements—i.e., ICH6 and GICH6—or the gradient requirements—i.e., IC3DRT—systematically have lower convergence rates than those that hold Requirements 2 and 3. Finally, algorithms that recompute the Jacobian, such as LK or FC, have a higher convergence time than the others; although algorithm FC iterates few times, it has the highest time per iteration—cf. Figure 8.51.


8.7 Discussion

This section draws some conclusions from the results of the experiments with the cube, face-deform, and plane models.

Comparison to Previous Works The proposed experiments are related to relevant previous works such as [Baker and Matthews, 2004; Brooks and Arbel, 2010]. These works only consider the registration problem, whereas we have also studied the tracking problem. Typically, the template is a fixed square region of the image where the Jacobian is computed—that is, the quadrangle is the projection of the target at µ∗ = µ_J. Then, the initial guess for the optimization is computed by perturbing the corners of the quadrilateral by using Gaussian noise. [Baker and Matthews, 2004] modifies the corner locations of the ground-truth quadrangle with noise of variance up to σ = 10, which approximately represents an upper bound error in corner location of 30 pixels (≈ 3σ). [Brooks and Arbel, 2010] uses a similar procedure, although the initial reprojection error ranges from 2 to 80 pixels. In our experiments the Gaussian noise directly modifies the parameters µ∗, and not the target projections; thus, the noise variance σ in the two methods is not equivalent. Nonetheless, we may still compare both methods in the image coordinate space: our experiments have an average error of 25 pixels for the highest noise values—cf. Table 8.11.

By considering multiple datasets, we also study those cases where µ∗ may differ from µ_J—i.e., the tracking case. Results show that algorithms that do not hold their corresponding warp requirements are not suitable for tracking but may be eligible for image registration. Moreover, results also show that efficient algorithms with a constant Jacobian have worse convergence than those that recompute the Jacobian at each iteration. Using an experimental methodology similar to that in [Baker and Matthews, 2004; Brooks and Arbel, 2010] would not allow us to obtain such a result.

Convergence of Algorithm HB Depends upon Gradient Equivalence Algorithm HB approximates the actual Jacobian computed at each frame by another one that is semi-constant. Requirement 1 states that the quality of the Jacobian approximation is directly related to the GEE: we use the GEE to build the semi-constant Jacobian that speeds up the algorithm. Thus, if the GEE is satisfied, the approximated Jacobian shall be identical to the true one, and the convergence shall not be affected. However, if the GEE does not hold, the approximation may induce errors during the optimization—thus leading to poor convergence.

Algorithm HB3DRT does not hold Requirement 1, and its convergence deteriorates as a result: the convergence is good for dataset DS1, but it gradually degrades as the noise and the differences between µ_J and µ_0 increase—e.g., for datasets DS3–DS6 the algorithm HB3DRT approximately converges between 80–15 percent of the time. On the other hand, algorithm HB3DTM—which holds Requirement 1—converges at least 80% of the time in the worst case—i.e., DS6 with noise σ = 5.0. Thus, Requirement 1 is directly related to the convergence of HB-based algorithms.


Convergence of Algorithm IC Depends upon Gradient Equivalence and Composition Algorithm IC approximates the actual Jacobian by another one which is efficiently computed. As in the HB case, the quality of the approximation directly depends on the compliance with Requirements 2 and 3: if both requirements hold, then the approximation shall be accurate, and the optimization shall successfully converge. We confirm these hypotheses by using the results from the experiments.

Algorithms ICH8 and GICH8 hold both requirements and have similar behaviour. Accuracy is good for all datasets, and the convergence is medium for all datasets except DS6, for which it is bad. However, these results are consistently better than those of the algorithms that do not hold the requirements, such as IC3DRT, ICH6, and GICH6.

Algorithms ICH6 and GICH6 only hold Requirement 3, as plane-induced homographies are not closed under composition. Notice that, although the parameter update for GIC-based algorithms is additive and not compositional, we require a compositional warp in GICH6 by construction (the warp composition is used in the change of variables; see Section 6.1.1). Accuracy is good in both algorithms for all datasets, but the convergence differs between the two: the convergence of ICH6 is medium-bad, whereas the convergence of GICH6 can be described as bad for all datasets. Even for the registration case—dataset DS1—the convergence is borderline bad when high initialization noise is present (see Figure 8.45). Results show that the convergence of the IC and GIC algorithms is noticeably worse for the plane-induced homography than for the 8-dof homography. This result demonstrates that compliance with the warp requirements determines the convergence of the algorithm. Besides, there is a noticeable difference between the results for IC and GIC that may be explained by the numerical approximation to the matrix ∇_µψ(µ_t) of Equation 6.32—results in Figure 8.45 for low levels of noise appear to be good.

Algorithm IC3DRT holds only Requirement 2, as rigid body transformations in R³ do not verify the GEE. The algorithm shows good accuracy for those cases that converge. The convergence of the algorithm is medium for datasets DS1 and DS2: the results are similar to those of algorithms ICH8 and GICH8. However, the results for datasets DS3 and DS4 are significantly worse, and the algorithm practically does not converge for datasets DS5 and DS6; hence, IC3DRT is eligible for registration but not for tracking.

Finally, we examine algorithm FCH6PP, which holds only Requirement 2: plane+parallax-induced homographies do not verify the GEE (see Table 4.1). However, as FC only requires the warp to be closed under composition, the results for FCH6PP show good convergence and accuracy for all datasets. Moreover, FC converges in fewer iterations than the equivalent LK for the plane-induced homography—although the total optimization time is higher. We speculate that, for this model, composition is more natural and better conditioned than the usual GN approach.

Requirements Determine Behaviour for Efficient Algorithms We have shown that the performance of efficient algorithms depends upon the compliance with their requirements—Requirement 1 for the HB algorithm, and Requirements 2 and 3


for IC and GIC. We have also made a distinction between registration and tracking depending on the parameters µ_0 and µ_J (cf. Section 8.1).

For registration, the compliance with the requirements is not a determinant factor: algorithms ICH6 and GICH6 have good convergence for datasets DS1 and DS2 (at least for low noise) even though they do not hold Requirement 2—cf. Figures 8.45 and 8.46. Moreover, algorithm IC3DRT does not hold Requirement 3, and its convergence results are even better than those of algorithms ICH6 and GICH6—cf. Figures 8.45 and 8.46. Additive algorithms have a similar performance: algorithm HB3DRT does not hold Requirement 1, but it has good convergence for datasets DS1 and DS2—cf. Figures 8.13 and 8.14.

For tracking, the compliance with the requirements is fundamental for a proper convergence of the algorithms: ICH6 and GICH6 (which do not hold Requirement 2) and algorithm IC3DRT (which does not hold Requirement 3) have bad convergence for datasets DS3 to DS6—i.e., those that represent the tracking problem, cf. Figures 8.47–8.50. The convergence especially degrades for dataset DS6, where almost none of the optimizations successfully converged. The additive algorithms behave similarly: HB3DRT does not hold Requirement 1, and its convergence quickly degrades through datasets DS3 to DS6—cf. Figures 8.15–8.18. However, algorithms that hold their requirements—such as ICH8, GICH8, and HB3DTM—have better convergence, even for the challenging dataset DS6.

Efficient algorithms greatly depend on an approximation to the GN Jacobian matrix: we either substitute the actual Jacobian J(µ_t) by one fixed at µ_J, J(µ_J), or we approximate J(µ_t) by computing J(µ_t) = S M(µ_t). In registration problems we assume that µ_J ≡ µ_t, so the approximated Jacobian is similar to the actual one; in tracking problems, however, the difference between µ_J and µ_t may be arbitrarily large. The requirements determine the accuracy of the approximation to the actual Jacobian. If the requirements do not hold, the approximated Jacobian is still similar to the actual one in the registration case—i.e., J(µ_J) ≡ J(µ_t); however, in the tracking case the approximated Jacobian is quite different when the requirements do not hold, which results in a degraded convergence for those cases.
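To make the distinction concrete, the sketch below shows a single Gauss-Newton step and, as comments, the three ways of obtaining the Jacobian; jacobian, S, and M are placeholders for the quantities defined in the thesis:

    import numpy as np

    def gauss_newton_step(J, r):
        """Solve the normal equations (J^T J) step = -J^T r for one GN update."""
        step, *_ = np.linalg.lstsq(J.T @ J, -J.T @ r, rcond=None)
        return step

    # How J is obtained at the current parameters mu_t (illustrative):
    #   LK / FC:  J = jacobian(mu_t)   # recomputed every iteration (accurate, slow)
    #   IC / GIC: J = jacobian(mu_J)   # fixed once at mu_J (fast, registration only)
    #   HB:       J = S @ M(mu_t)      # constant S, cheap per-iteration factor M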

Timing Results Comparison Figure 8.51 shows the time per iteration for the algorithms used in the compositional experiments. For each algorithm we average the time per iteration over the 60,000 experiments. Results show that the IC algorithm is undoubtedly the fastest, with GIC ranking second by a small margin. The latter is a direct consequence of updating the constant IC Jacobian. The LK algorithms are about four times slower than their compositional counterparts—either 8 or 6-dof—as computing the Jacobian in each iteration is a costly process. Finally, FC is the slowest of the reviewed algorithms. The explanation to this fact is simple: the Jacobian computation involves more operations, because the warp is composed inside the function. We have also included the HB algorithm for the sake of comparison. Timing results for HB are comparable to those of the GIC algorithm: about twice the time per iteration of IC, but still much faster than LK or FC. Notice that the final computation time of an algorithm depends on (1) the number of iterations of the optimization loop, and (2) the time per iteration.

Figure 8.51: Average Time per Iteration. Colour bars show the average time per iteration over the 60,000 experiments for each algorithm.

Comparison Between IC and HB We now compare the two efficient registration techniques reviewed in this chapter. For a fairer comparison, we compare algorithms that hold their respective requirements: ICH8 verifies both the GEE and the warp composition, whereas HB3DTM holds the GEE. We evaluate two different homographic warps to register the plane target: an 8-dof homography and a 6-dof plane-induced homography. This would make a difference for accuracy or efficiency measures, as the number of parameters changes the shape of the optimization surface, but not for convergence.

Convergence for algorithm HB3DTM is good for datasets DS1–DS3 and medium for the rest, but it always converges at least 40% of the time. HB3DTM consistently performs better than ICH8 for all datasets, with the difference peaking at 15–20% for noise σ = 5.0. These results demonstrate that (1) HB is more robust than IC—convergence is better when increasing the noise—and (2) HB is less local than IC—convergence is better for the last datasets.


Efficiency is closely contested, although HB performs slightly better in both total time and number of iterations. Nonetheless, IC beats HB in time per iteration—0.009 vs. 0.016 seconds, cf. Figure 8.51.

Algorithm GIC has the same Requirements as IC Results also show that the behaviour of algorithms IC and GIC is exactly the same: the convergence of both algorithms is good if Requirements 2 and 3 hold. Note that, although the GIC algorithm additively updates the parameters, results from algorithm GICH6 show poor convergence, as the warp is not closed under composition—i.e., Requirement 2 does not hold. This may seem contradictory—an additive update needing a compositional warp—however, it is a direct consequence of the IC change of variables that is the basis of the GIC algorithm (cf. Section 6.3.1, Equation 6.26). Thus, if the warp does not hold Requirement 2, the change of variables of Equation 6.26 is no longer valid, and the algorithm would optimize an erroneous cost function. Nonetheless, it may be interesting to study the impact of using optimization methods like BFGS and Newton-Raphson when the requirements are not fulfilled.

On the Robustness of Efficient Algorithms Results show that efficient algorithms—i.e., IC and HB—are less robust than LK or FC, even if their corresponding requirements hold. In theory, the compliance with the requirements indicates that the efficient approximations of the Jacobian—the constant Jacobian for IC and the factorized Jacobian for HB—are accurate enough to provide a successful optimization.

However, the Jacobian matrix of both efficient algorithms is still an approximation to the actual gradient. Thus, when the initial guess is not within a small neighbourhood of the actual optimum, the procedure with the approximated Jacobian converges less often. We examine the robustness of the efficient algorithms by analyzing Figures 8.45–8.50. Results for algorithm ICH8 show that the frequency of convergence decreases as the initialization noise increases. Moreover, the convergence worsens for the successive datasets, as the initialization is increasingly different from the reference template. Algorithm HB3DTM shows a similar behaviour, although its frequency of convergence is better than in the ICH8 case. However, both algorithms display a worse frequency of convergence than LK due to the Jacobian approximation: over 90% for LK against 65–25% for ICH8 and 80–38% for HB3DTM.

We have tested the algorithm HB in both rigid and nonrigid real-world sequences. Results for both sequences are similar: they show good convergence, although the estimation of the parameters sometimes lacks accuracy (cf. Figures 8.42 and 8.37). We may explain the accuracy problems by analyzing the behaviour of the algorithm with respect to noise: the HB algorithm is sensitive to the initialization. Moreover, scene illumination poses a challenge to the brightness constancy assumption, which results in an inaccurate estimation of the target position or orientation.

A Proper Target Texture is Critical for Convergence The section Good Texture to Track demonstrated that the convergence of the HB algorithm heavily depends on


the texture of the target (cf. Section 8.5.2, page 139). The texture of the face model is especially troublesome in the case of rotations greater than 60° (cf. Figure 8.22). This may explain the lack of robustness in the face-deform sequence (see Figures 8.26–8.29).


Chapter 9

Conclusions and Future Work

This chapter summarizes the contributions of the thesis and hints at feasible lines of future work.

9.1 Summary of Contributions

We highlight the following contributions of the thesis:

Survey of Existing Methods We have analysed the existing additive and compositional image registration algorithms in depth.

Gradient Equivalence Equation We have introduced the GEE as a differential extension to the BCC. The GEE is crucial for the proper convergence of efficient registration algorithms.

Fundamental Requirements for Convergence We have proposed requirements on the motion model that efficient image registration algorithms must satisfy to guarantee accurate results. Motion warps have different requirements depending on the approach to the image registration: Requirement 1 for additive algorithms, and Requirements 2 and 3 for compositional approaches.

Distinction Between Registration and Tracking We have introduced a differentiation between registration and tracking for efficient algorithms: those efficient algorithms that do not hold their requirements are valid for registration but not for tracking.

Systematic Factorization Framework We have introduced lemmas and theorems that systematize the factorization stage of the HB algorithm.

Efficient 3D Tracking using HB We have proposed two homography-based warps for tracking 3D targets by using the HB algorithm: the shape-induced homography represents the rigid motion of a triangle mesh, and the nonrigid shape-induced homography models both the rigid and nonrigid motion of


a deforming 3D target. We have provided the HB factorization schemes for both warps by using the proposed systematic factorization procedure.

Efficient Forward Compositional Algorithm We have introduced a new compositional algorithm for image registration, the EFC, which is equivalent to IC. The EFC algorithm provides a new interpretation of IC that clearly explains the change of roles between the target and template derivatives. Moreover, EFC does not require the warping function to be invertible.

9.2 Conclusions

We summarize some thoughts from the results of the experiments in the following paragraphs.

Handbook of Motion Warps We introduce a classification of motion warps in terms of their suitability for efficient image registration/tracking. We gather information from Tables 4.1 and 6.1 to build the following classification—see Table 9.1:

• Warps in R². Motion warps in R²—affine, rotation-translation-scale, and homography—are suitable for every image registration algorithm.

• Warps in P². Homographies in P² are suitable warps for every image registration algorithm—efficient or not. However, 6-dof homographic warps—such as plane-induced and shape-induced homographies—do not comply with Requirement 2—i.e., these warps do not form a group. Thus, 6-dof homographies are not eligible for compositional algorithms. On the other hand, plane+parallax homographies do form a group, but they do not hold the GEE; thus, the plane+parallax homography is not eligible for efficient algorithms—either additive or compositional.

• Warps in R³. General rigid body motion does not hold the GEE—cf. Requirements 3 and 1; therefore, warps in R³ are generally not eligible for efficient algorithms.

Handbook of Efficient Algorithms We provide a classification of efficient image registration algorithms in terms of their suitability to solve problems. Figure 9.1 shows the qualitative features for algorithms LK, IC, HB, FC, and GIC; we estimate this qualitative information from the quantitative outcomes of the experiments in Chapter 8. Using the values depicted in Figure 9.1 we infer the following classification:

The most complete: The Lucas-Kanade algorithm is the one that achieves the best marks on almost every feature: it is the most robust and accurate, and it can be applied to any differentiable warp function. Also, it can be robustly used for both registration and tracking. However, its poor efficiency renders the algorithm unusable for real-time applications.


Table 9.1: Classification of Motion Warps. The table cross-references the motion warps—affine (R²), homography (R²), homography (P²), plane-induced homography (P²), shape-induced homography (P²), plane+parallax homography (P²), rigid body (R³), and camera rotation (R³)—against the LK, FC, HB, IC, and GIC algorithms, marking which warps are suitable for each algorithm according to the cases discussed above.

The most efficient: Inverse Compositional and Generalized Inverse Compositional are the fastest image registration algorithms around. If the warp function complies with the requirements of Chapter 6, these algorithms offer good convergence at the best speed. However, even in these cases, algorithms with a constant Jacobian lack robustness. Moreover, the IC algorithm cannot track nonplanar 3D targets—but it may handle 3D objects in registration cases.

The most balanced: The Hager-Belhumeur algorithm has a perfect trade-off between speed and accuracy: it is not as accurate and robust as LK, but it is obviously far more efficient; on the other hand, although HB is not as efficient as IC or GIC, it is more robust and can be applied to a wider range of motion warps—especially the plane-induced and shape-induced homographies for 3D tracking. Moreover, it converges better than IC for both registration and tracking problems.

Registration is not Tracking This thesis emphasizes the differences between registration and tracking: in registration the initial guess in the optimization space is close to the point where we compute the gradient, whereas in tracking the distance in optimization space between the initial guess and the gradient parameters can be arbitrarily large. We empirically show that efficient algorithms that do not hold their requirements are suitable for registration but not for tracking.

This restriction is obvious for algorithms with a constant Jacobian, such as inverse compositional and its extensions. The critical distinction between registration and tracking has been discussed here for the first time. Hence, inverse compositional algorithms have good convergence when the parameters of the target are similar to those of the reference image; however, the algorithm may drift—and accumulate reprojection error—as the target parameters diverge from the reference ones.


Figure 9.1: Spiderweb Plots for Several Image Registration Algorithms. (Top-left) Lucas-Kanade algorithm. (Top-right) Inverse Compositional algorithm. (Middle-left) Hager-Belhumeur algorithm. (Middle-right) Generalized Inverse Compositional algorithm. (Bottom) Forward Compositional algorithm. Legend: (A) Accuracy, (L) Localness, (G) Generality, (E) Efficiency, and (R) Robustness.


This behaviour is clearly shown when tracking faces using AAMs [Matthews and Baker, 2004]: the IC algorithm for AAMs does not hold Requirement 2, as AAMs do not form a group. The algorithm has good convergence if the face does not move and is frontal to the camera—i.e., the parameters of the face are similar to those used to compute the constant Jacobian; the convergence is also good when the face changes expression but is still frontal to the camera and centred; however, the convergence decreases when the face translates away from its central position.

3D Tracking Implies a Non-Constant Jacobian By definition, a 3D target is not totally visible from a single view—except, e.g., if the target is a plane. Efficient algorithms like IC cannot precompute the Jacobian and the Hessian of 3D targets: the IC algorithm computes the Jacobian at those target points visible at µ_J. However, some points appear and others disappear due to self-occlusions and the relative motion between the target and the camera; in these cases, the Jacobian must be partially recomputed to handle the newly visible points. Also notice that the efficiency of the IC algorithm greatly diminishes when the Jacobian is not constant (cf. Table 7.15, page 106).

In Defence of Additive Algorithms The Inverse Compositional algorithm has been synonymous with efficient registration/tracking since it was first published in [Baker and Matthews, 2001]. The rise of IC brought the fall of the existing efficient methods, especially additive ones such as [Hager and Belhumeur, 1998]. In this thesis we vindicate HB as the most balanced registration/tracking algorithm, as:

1. HB can handle a wider range of motion warps: HB is able to track 3D rigid and deformable objects (see Table 9.1), which is not possible when using the IC algorithm—we have shown that IC is not correct for 3D tracking in Section 4.2.2.

2. HB performs roughly as efficiently as IC when the Jacobian must be recomputed—as in the case of 3D tracking, cf. Table 7.15, page 106.

9.3 Future Work

We also suggest the following lines of investigation for future improvements of the algorithms.

Illumination Model In this thesis we have assumed that the BCC is purely Lambertian—i.e., the texture of the target depends neither on its position nor on its orientation. However, to be physically more accurate, the BCC should account for attached shadows: changes in texture due to the relative orientation of the target and the light source—i.e., side-lighting causes some facets of the object to be more illuminated than others. We use spherical harmonics [Basri and Jacobs, 2003; Ramamoorthi, 2002] to model attached shadows. We display the spherical harmonics of the model face in Figure 9.2.



Figure 9.2: Spherical Harmonics-based Illumination Model

We propose to handle the changes in illumination due to orientation by augmenting the BCC as follows:

    ∑_{i=0}^{9} B_i[x] = I_t[f(x; µ_t)],   ∀x ∈ X,

where B_0 = T, and B_i : R² → R is the bidimensional brightness function corresponding to the i-th spherical harmonic basis computed from [Basri and Jacobs, 2003]. This equation may also be factorized as the usual BCC by using the techniques proposed in the thesis—a similar problem involving the 2D homographic case was solved in [Buenaposada et al., 2009].

Combine Texture and Edges In this thesis we have only used texture information to perform the image registration. However, we could improve the registration/tracking by using features other than texture, such as edges [Decarlo and Metaxas, 2000; Marchand et al., 1999; Masson et al., 2003; Vacchetti et al., 2004], or even illumination cues [Lagger et al., 2008; Romdhani and Vetter, 2003]. Besides, we could devise a factorization that includes these terms by using the techniques proposed in the thesis.

Multi-view Registration/Tracking This thesis has presented only monocular tracking procedures. However, we can extend some of the procedures to work in multi-view environments. Using multiple cameras we could (1) estimate the parameters more robustly than by using just one of them, and (2) extend the field of


Figure 9.3: Tracking by simultaneously using texture and edge information

view, as several cameras may capture more information from the object (see Figure 9.4). Moreover, multi-view tracking using HB is still efficient. Let P_1, ..., P_v be v distinct cameras that capture a single target (see Figure 9.4). We assume that the target is expressed in the scene coordinate system; hence, the motion parameters are independent of each camera. We set up the following equation:

    [ S_1^⊤ ]       [ e_1 ]
    [   ⋮   ] M  =  [  ⋮  ]                    (9.1)
    [ S_v^⊤ ]       [ e_v ]

where S_1, ..., S_v and e_1, ..., e_v are the constant factorization matrices and the error vectors that depend on the cameras P_1, ..., P_v. Notice that the matrix M, which depends on the target motion, is common to all the views.
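A sketch of solving the stacked system (9.1) for the common motion matrix M in the least-squares sense (shapes and names are assumptions for illustration):

    import numpy as np

    def multiview_motion(S_list, e_list):
        """Least-squares solution of Equation 9.1 for the common matrix M.

        S_list: constant factorization matrices S_1, ..., S_v (one per camera);
        e_list: the corresponding error vectors e_1, ..., e_v.
        """
        A = np.vstack([S.T for S in S_list])   # stack S_1^T, ..., S_v^T
        b = np.concatenate(e_list)             # stack e_1, ..., e_v
        M, *_ = np.linalg.lstsq(A, b, rcond=None)
        return M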

Regularization of Gradient Equivalence Although fast, IC is especially limited in the choice of motion warp: no warp involving nonplanar 3D motion is allowed (cf. Table 9.1). IC places strict constraints on the target motion to (1) allow composition, and (2) hold the gradient equivalence. Non-compliance of the motion warp with either of these requirements leads to a poor convergence of the algorithm.


Figure 9.4: Efficient tracking using multiple views.

Nonetheless, we can achieve a good convergence even when the requirements are not met. [Amberg and Vetter, 2009] improves the IC registration of AAMs—which do not form a group—by using regularization. We could use a similar technique to enhance the convergence of IC for (1) those warps that do not hold Requirement 2—such as the plane-induced homography [Cobzas et al., 2009]—or (2) those warps that do not hold Requirements 1 or 3—such as the rigid body transformation [Munoz et al., 2005].

Quantization of Parameter Stability In Chapter 8 we showed that the convergence of efficient algorithms depends upon holding certain requirements. However, how does an algorithm behave when the requirements are not satisfied? We have only verified this result in an experimental way. We propose to analytically study the convergence of the algorithms when using an inaccurate approximation to the Jacobian matrix. The idea is to compute some statistics on the results of the optimization—e.g., confidence intervals on each parameter, convergence or accuracy measures, etc.—given numerical or analytic information about the approximated Jacobian: for example, we would like to know for which ranges of 6-dof parameters in R³ the IC algorithm for the rigid body transformation shall converge more than 90% of the time.

Automatic Factorization In Chapter 5 and Appendix D we introduced lemmas and rules to systematically solve the factorization problem: we have demonstrated that—under certain assumptions—the factorization is feasible. However, we have not demonstrated that the obtained factorization is the most efficient possible; recall that factorization is similar to the chain matrix multiplication problem (cf. Section 7.1.3). We propose to build an automatic procedure to compute the factorization: the input would be a chain of matrix operations, and the output would be another chain of matrix operations whose matrices are clearly separated; the resulting chain of matrices would be such that the number of operations is minimal. The optimum order of operations can be computed using dynamic programming triggered by the proposed rules of factorization.
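The optimum-ordering part of such a procedure is the classic matrix-chain dynamic program; a minimal sketch of the cost computation:

    from functools import lru_cache

    def matrix_chain_cost(dims):
        """Minimal scalar multiplications to evaluate a matrix chain.

        dims: n + 1 integers; matrix i has shape dims[i] x dims[i + 1].
        """
        @lru_cache(maxsize=None)
        def cost(i, j):
            if i == j:
                return 0
            return min(cost(i, k) + cost(k + 1, j)
                       + dims[i] * dims[k + 1] * dims[j + 1]
                       for k in range(i, j))
        return cost(0, len(dims) - 2)

    assert matrix_chain_cost([10, 30, 5, 60]) == 4500   # ((A1 A2) A3) is cheapest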

Alternative Computation of the Brightness Error Function In this thesis we have posed the registration/tracking problem as the minimization of the quadratic error function of the brightness differences between the template and the image—usually known as the Sum of Squared Differences, or SSD. However, there are alternative error norms in the Fourier domain [Navarathna et al., 2011], or based on maximizing the correlation of image gradient orientations [Tzimiropoulos et al., 2011]. We may improve the robustness of the HB algorithm by deriving factorization methods for such norms.

We may even go a step further by using Discriminative Tracking [Avidan, 2001; Liu, 2007; Lucey, 2008; Wang et al., 2010]: we maximize the classification score of the image of the target instead of minimizing the SSD error norm. We may search for those parameters that best categorize the target region into the "well-aligned" class—as opposed to the "bad-aligned" class. We propose to speed up existing discriminative tracking techniques by using factorization methods.


Bibliography

Amberg, B. and Vetter, T. (2009). On compositional image alignment, with an application to active appearance models. In Proc. of CVPR.

An, K. H. and Chung, M. J. (2008). 3D head tracking and pose-robust 2D texture map-based face recognition using a simple ellipsoid model. In IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS.

Averbuch, A. and Keller, Y. (2002). Fast motion estimation using bidirectional gradient methods. In Proc. International Conference on Acoustics, Speech, and Signal Processing.

Avidan, S. (2001). Support vector tracking. In IEEE Trans. on Pattern Analysis and Machine Intelligence, pages 184–191.

Tordoff, B., Mayol, W., de Campos, T., and Murray, D. (2002). Head pose estimation for wearable robot control. In Proc. 13th British Machine Vision Conference, Cardiff, September 2002, volume 2, pages 807–816.

Baker, S. and Matthews, I. (2001). Equivalence and efficiency of image alignment algorithms. In Proc. of CVPR, volume 1, pages 1090–1097. IEEE.

Baker, S. and Matthews, I. (2004). Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221–255.

Baker, S., Matthews, I., Xiao, J., Gross, R., Kanade, T., and Ishikawa, T. (2004a). Real-time non-rigid driver head tracking for driver mental state estimation. In 11th World Congress on Intelligent Transportation Systems.

Baker, S., Patil, R., Cheung, G., and Matthews, I. (2004b). Lucas-Kanade 20 years on: Part 5. Technical Report CMU-RI-TR-04-64, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.

Bartoli, A. (2008). Groupwise geometric and photometric direct image registration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(12):2098–2108.


Bartoli, A., Hartley, R., and Kahl, F. (2003). Motion from 3D line correspondences: Linear and non-linear solutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Madison, Wisconsin, USA, pages 477–484. IEEE CSP.

Bartoli, A. and Zisserman, A. (2004). Direct estimation of non-rigid registration. In British Machine Vision Conference.

Basri, R. and Jacobs, D. W. (2001). Lambertian reflectance and linear subspaces. In Proc. of ICCV, volume 2, pages 383–390.

Basri, R. and Jacobs, D. W. (2003). Lambertian reflectance and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2):218–233.

Basu, S., Essa, I., and Pentland, A. (1996). Motion regularization for model-based head tracking. In ICPR '96: Proceedings of the International Conference on Pattern Recognition (ICPR '96) Volume III-Volume 7276, page 611, Washington, DC, USA. IEEE Computer Society.

Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L. (2008). Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346–359.

Benhimane, S., Ladikos, A., Lepetit, V., and Navab, N. (2007). Linear and quadratic subsets for template-based tracking. In Proc. of CVPR.

Benhimane, S. and Malis, E. (2007). Homography-based 2D visual tracking and servoing. International Journal of Robotics Research, 26(7):661–676.

Bickel, B., Botsch, M., Angst, R., Matusik, W., Otaduy, M., Pfister, H., and Gross, M. (2007). Multi-scale capture of facial geometry and motion. ACM Trans. Graph., 26(3):33.

Black, M. J. and Jepson, A. D. (1998). Eigentracking: Robust matching and tracking of articulated objects using a view-based representation. International Journal of Computer Vision, 26(1):63–84.

Black, M. J. and Yacoob, Y. (1997). Recognizing facial expressions in image sequences using local parameterized models of image motion. International Journal of Computer Vision, 25(1):23–48.

Blanz, V. and Vetter, T. (1999). A morphable model for the synthesis of 3D faces. In Proc. of SIGGRAPH, pages 187–194. ACM Press.

Blanz, V. and Vetter, T. (2003). Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1–12.

Bouguet, J. Y. Camera calibration toolbox for Matlab.


Bouguet, J.-Y. (2000). Pyramidal implementation of the Lucas-Kanade feature tracker. Intel Corporation, Microprocessor Research Labs.

Bowden, R., Mitchell, T. A., and Sharhadi, M. (2000). Non-linear statistical models for the 3D reconstruction of human pose and motion from monocular image sequences. Image and Vision Computing, 9(18):729–737.

Brand, M. (2001). Morphable 3D models from video. Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, 2:456+.

Brand, M. and Bhotika, R. (2001). Flexible flow for 3D nonrigid tracking and shape recovery. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 315–322.

Bregler, C., Hertzmann, A., and Biermann, H. (2000). Recovering non-rigid 3D shape from image streams. In Proc. of CVPR, pages 690–696.

Brooks, R. and Arbel, T. (2010). Generalizing inverse compositional and ESM image alignment. International Journal of Computer Vision, 11(87):191–212.

Brown, L. G. (1992). A survey of image registration techniques. ACM Comput. Surv., 24:325–376.

Brunet, F., Bartoli, A., Navab, N., and Malgouyres, R. (2009). NURBS warps. In British Machine Vision Conference (BMVC), London.

Buenaposada, J., Muñoz, E., and Baumela, L. (2009). Efficient illumination independent appearance-based face tracking. Image and Vision Computing, 27(5):560–578.

Buenaposada, J. M. and Baumela, L. (1999). Seguimiento robusto del rostro humano mediante visión computacional. In Proc. Conferencia Asociación Española para la Inteligencia Artificial, volume I, pages 48–53. AEPIA.

Buenaposada, J. M. and Baumela, L. (2002). Real-time tracking and estimation of plane pose. In Proc. of ICPR, volume II, pages 697–700, Quebec, Canada. IEEE.

Buenaposada, J. M., Muñoz, E., and Baumela, L. (2004). Efficient appearance-based tracking. In Proc. CVPR Workshop on Nonrigid and Articulated Motion. IEEE.

Capel, D. (2004). Image Mosaicing and Super-Resolution (CPHC/BCS Distinguished Dissertations). Springer-Verlag.

Caspi, Y. and Irani, M. (2002). Spatio-temporal alignment of sequences. IEEE Trans. Pattern Anal. Mach. Intell., 24(11):1409–1424.


Chen, C.-W. and Wang, C.-C. (2008). 3D active appearance model for aligning faces in 2D images. In IEEE/RSJ International Conference on Robots and Systems (IROS), Nice, France.

Choi, S. and Kim, D. (2008). Robust head tracking using 3D ellipsoidal head model in particle filter. Pattern Recogn., 41(9):2901–2915.

Cipolla, R. and Drummond, T. W. (1999). Real-time tracking of complex structures with on-line camera calibration. In Proceedings of the 10th British Machine Vision Conference, BMVC, Nottingham, UK.

Claus, D. and Fitzgibbon, A. W. (2005). A rational function lens distortion model for general cameras. In CVPR '05: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1, pages 213–219, Washington, DC, USA. IEEE Computer Society.

Cobzas, D., Jagersand, M., and Sturm, P. (2009). 3D SSD tracking with estimated 3D planes. Image and Vision Computing, 27(1-2):69–79.

Comaniciu, D., Ramesh, V., and Meer, P. (2000). Real-time tracking of non-rigid objects using mean shift. In Proc. of CVPR, pages 142–149. IEEE.

Cootes, T., Edwards, G., and Taylor, C. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681–685.

Cormen, T. H., Stein, C., Rivest, R. L., and Leiserson, C. E. (2001). Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition.

Decarlo, D. and Metaxas, D. (2000). Optical flow constraints on deformable models with applications to face tracking. International Journal of Computer Vision, 38(2):99–127.

Del Bue, A. (2010). Adaptive metric registration of 3D models to non-rigid image trajectories. In Daniilidis, K., Maragos, P., and Paragios, N., editors, 11th European Conference on Computer Vision (ECCV 2010), Crete, Greece, volume 6313 of Lecture Notes in Computer Science, pages 87–100. Springer.

Del Bue, A., Smeraldi, F., and Agapito, L. (2004). Non-rigid structure from motion using non-parametric tracking and non-linear optimization. In Proc. CVPR Workshop on Nonrigid and Articulated Motion, volume 1. IEEE.

Dementhon, D. F. and Davis, L. S. (1995). Model-based object pose in 25 lines of code. Int. J. Comput. Vision, 15(1-2):123–141.

Devernay, F., Mateus, D., and Guilbert, M. (2006). Multi-camera scene flow by tracking 3-D points and surfels. In Proc. of CVPR, volume II, pages 2203–2212.

192

Page 215: Efficient Model-based 3D Tracking by Using Direct Image Registration

Donner, R., Reiter, M., Langs, G., Peloschek, P., and Bischof, H. (2006). Fast activeappearance models search using canonical correlation analysis. IEEE Transactionson Pattern Analysis and Machine Intelligence, 28(10):1690–1694.

Dornaika, F. and Ahlberg, J. (2006). Fitting 3d face models for tracking and activeappearance model training. Image and Vision Computing, 24:1010–1024.

Dowson, N. and Bowden, R. (2008). Mutual information for lucas-kanade tracking(milk): An inverse compositional formulation. IEEE Transactions on PatternAnalysis and Machine Intelligence, 30(1):180–185.

Drummond, T. and Cipolla, R. (2002). Real-time visual tracking of complex struc-tures. IEEE Trans. Pattern Anal. Mach. Intell., 24(7):932–946.

Faggian, N., Paplinski, A. P., and Sherrah, J. (2006). Active appearance modelsfor automatic fitting of 3d morphable models. In AVSS ’06: Proceedings of theIEEE International Conference on Video and Signal Based Surveillance, page 90,Washington, DC, USA. IEEE Computer Society.

Gay-Bellile, V., Bartoli, A., and Sayd, P. (2010). Direct estimation of nonrigidregistrations with image-based self-occlusion reasoning. IEEE Transactions onPattern Analysis and Machine Intelligence, 32(1):87–104.

Gleicher, M. (1997). Projective registration with difference decomposition. In Proc.of CVPR, pages 331–337.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (Johns HopkinsStudies in Mathematical Sciences)(3rd Edition). The Johns Hopkins UniversityPress, 3rd edition.

Gonzalez-Mora, J., Guil, N., and De la Torre, F. (2009). Efficient image alignmentusing linear appearance models. In Proc. of CVPR.

Gross, R., Matthews, I., and Baker, S. (2006). Active appearance models withocclusion. Image and Vision Computing, 24(6):593–604.

Guskov, I. (2004). Multiscaled inverse compositional image alingment for subdivisionsurface maps. In Proc. European Conference on Computer Vision.

Hager, G. and Belhumeur, P. (1998). Efficient region tracking with parametricmodels of geometry and illumination. IEEE Transactions on Pattern Analysisand Machine Intelligence, 20(10):1025–1039.

Hager, G. and Belhumeur, P. (1999). Tracking in 3d: Image variability decomposi-tion for recovering object pose and illumination. Pattern Analysis and Applica-tions.

Harris, C. and Stephens, M. (1988). A combined corner and edge detection. InProceedings of The Fourth Alvey Vision Conference, pages 147–151.

193

Page 216: Efficient Model-based 3D Tracking by Using Direct Image Registration

Hartley, R. and Zisserman, A. (2004). Multiple View Geometry in Computer Vision.Cambridge University Press, second edition.

Hiwada, K., Maki, A., and Nakashima, A. (2003). Mimicking video: real-timemorphable 3d model fitting. In VRST ’03: Proceedings of the ACM symposiumon Virtual reality software and technology, pages 132–139, New York, NY, USA.ACM.

Hong, H. S. and Chung, M. J. (2007). 3d pose and camera parameter trackingalgorithm based on lucas-kanade image alignment algorithm. In InternationalConference on Control, Automation and Systems, Seoul, Korea.

Irani, M. and Anandan, P. (1999). All about direct methods. In Triggs, W., Zis-serman, A., and Szeliski, R., editors, Vision Algorithms: Theory and practice.Springer-Verlag.

Irani, M., Anandan, P., and Cohen, M. (2002). Direct recovery of planar-parallaxfrom multiple frames. IEEE Transactions on Pattern Analysis and Machine In-telligence, 24(11):1528–1534.

Irani, M. and Peleg, S. (1991). Improving resolution by image registration. CVGIP:Graph. Models Image Process., 53(3):231–239.

Irani, M., Rousso, B., and Peleg, S. (1997). Recovery of ego-motion using regionalignment. IEEE Trans. Pattern Anal. Mach. Intell., 19(3):268–272.

Jang, J.-S. and Kanade, T. (2008). Robust 3d head tracking by online feature reg-istration. In The IEEE International Conference on Automatic Face and GestureRecognition.

Jurie, F. and Dhome, M. (2002a). Hyperplane approximation for template matching.IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):996–100.

Jurie, F. and Dhome, M. (2002b). Real time robust template matching. In Proc.BMVC, pages 123–132.

K. B. Petersen, M. P. The matrix cookbook.

Keller, Y. and Averbuch, A. (2004). Fast motion estimation using bidirectionalgradient methods. Trans. on IP, 13(8):1042–1054.

Keller, Y. and Averbuch, A. (2008). Global parametric image alignment via high-order approximation. Computer Vision and Image Understanding, 109(3):244–259.

Kollnig, H. and Nagel, H. H. (1997). 3d pose estimation by directly matchingpolyhedral models to gray value gradients. In International Journal of ComputerVision, volume 23, pages 283–302.

194

Page 217: Efficient Model-based 3D Tracking by Using Direct Image Registration

La Cascia, M., Sclaroff, S., and Athitsos, V. (2000). Fast, reliable head trackingunder varying illumination: An approach based on robust registration of texture-mapped 3d models. IEEE Transactions on Pattern Analysis and Machine Intel-ligence, 22(4):322–336.

Lagger, P., Salzmann, M., Lepetit, V., and Fua, P. (2008). 3d pose refinement fromreflections. In Computer Vision and Pattern Recognition.

Lepetit, V. and Fua, P. (2005). Monocular model-based 3d tracking of rigid objects.Found. Trends. Comput. Graph. Vis., 1(1):1–89.

Lepetit, V. and Fua, P. (2006). Keypoint recognition using randomized trees. IEEETransactions on Pattern Analysis and Machine Intelligence, 28(9):1465–1479.

Lepetit, V., Moreno-Noguer, F., and Fua, P. (2009). Epnp: An accurate o(n) solutionto the pnp problem. International Journal of Computer Vision, 81(2).

Lester, H. and Arridge, S. R. (1999). A survey of hierarchical non-linear medicalimage registration. Pattern Recognition, 32(1):129 – 149.

Lewis, J. P. (1995). Fast normalized cross-correlation. In Vision Interface, pages120–123. Canadian Image Processing and Pattern Recognition Society.

Liu, X. (2007). Generic face alignment using boosted appearance model. In in Proc.IEEE Computer Vision and Pattern Recognition, pages 1079–1088.

Lourakis, M. I. A. and Argyros, A. A. (2006). Chaining planar homographies forfast and reliable 3d plane tracking. In Proc. of ICPR, pages 582–586, Washington,DC, USA. IEEE Computer Society.

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. In-ternational Journal of Computer Vision, 2(60):91–110.

Lucas, B. D. and Kanade, T. (1981). An iterative image registration technique withan application to stereo vision. In Proc. of Int. Joint Conference on ArtificialIntelligence, pages 674–679.

Lucey, S. (2008). Enforcing non-positive weights for stable support vector tracking.In IEEE International Conference on Computer Vision and Pattern Recognition(CVPR).

Madsen, K., Nielsen, H., and Tingleff, O. (2004). Methods for non-linear leastsquares problems. Informatiks and Mathematical Modelling, Technical Universityof Denmark, second edition.

Malciu, M. and Preteux, F. (2000). A robust model-based approach for 3d headtracking in video sequences. In FG ’00: Proceedings of the Fourth IEEE Inter-national Conference on Automatic Face and Gesture Recognition 2000, page 169,Washington, DC, USA. IEEE Computer Society.

195

Page 218: Efficient Model-based 3D Tracking by Using Direct Image Registration

Marchand, E., Bouthemy, P., Chaumette, F., and Moreau, V. (1999). Robust real-time visual tracking using 2d-3d model-based approach. In In International Con-ference on Computer Vision, ICCV, Corfu, Greece.

Masson, L., Dhome, M., and Jurie, F. (2004). Robust real time tracking of 3dobjects. In ICPR ’04: Proceedings of the Pattern Recognition, 17th InternationalConference on (ICPR’04) Volume 4, pages 252–255, Washington, DC, USA. IEEEComputer Society.

Masson, L., Dhome, M., and Jurie, F. (2005). Tracking 3d objects using flexiblemodels. BMVC, 2005.

Masson, L., Jurie, F., and Dhome, M. (2003). Contour/texture approach for visualtracking. In SCIA’03: Proceedings of the 13th Scandinavian conference on Imageanalysis, pages 661–668, Berlin, Heidelberg. Springer-Verlag.

Matas, J., Chum, O., Martin, U., and Pajdla, T. (2002). Robust wide baselinestereo from maximally stable extremal regions. In Proceedings of British MachineVision Conference, volume 1, pages 384–393, London.

Matthews, I. and Baker, S. (2004). Active appearance models revisited. Interna-tional Journal of Computer Vision, 60(2):135–164.

Matthews, I., Xiao, J., and Baker, S. (2007). 2d vs. 3d deformable face models:Representational power, construction, and real-time fitting. International Journalof Computer Vision, 75(1):93–113.

Megret, R., Authesserre, J., and Berthoumieu, Y. (2008). The bi-directional frame-work for unifying parametric image alignment approaches. In Proc. EuropeanConference on Computer Vision, pages 400–411.

Megret, R., Mikram, M., and Berthoumieu, Y. (2006). Inverse composition formulti-kernel tracking.

Munoz, E., Buenaposada, J. M., and Baumela, L. (2005). Efficient model-based3d tracking of deformable objects. In Proc. of ICCV, volume I, pages 877–882,Beijing, China.

Munoz, E., Buenaposada, J. M., and Baumela, L. (2009). A direct approach forefficiently tracking with 3d morphable models. In Proc. of ICCV, volume I, Kyoto,Japan.

Murphy-Chutorian, E. and Trivedi, M. M. (2009). Head pose estimation in computervision: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 31(4):607–626.

Navarathna, R., Sridharan, S., and Lucey, S. (2011). Fourier active appearancemodels. In Proceedings of IEEE International Conference on Computer Vision(ICCV 2011).

196

Page 219: Efficient Model-based 3D Tracking by Using Direct Image Registration

Neapolitan, R. and Naimipour, K. (1996). Foundations of algorithms. D. C. Heathand Company, Lexington, MA, USA.

Nick Molton, Andrew Davison, I. R. (2004). Parameterisation and probability inimage alignment. In Proc. of Asian Conference on Computer Vision.

Papandreu, G. and Maragos, P. (2008). Adaptative and constrained algorithms forinverse compositional active appearance models fitting. In Proc. of CVPR.

Parke, F. I. and Waters, K. (1996). Computer Facial Animation. AK Peters Ltd.

Pighin, F., Salesin, D. H., and Szeliski, R. (1999). Resynthesizing facial animationthrough 3d model-based tracking. In In International Conference on ComputerVision, ICCV, Corfu, Greece.

Pilet, J., Lepetit, V., and Fua, P. (2005). Real-time non-rigid surface detection. InProc. of CVPR. IEEE.

Pilet, J., Lepetit, V., and Fua, P. (2008). Fast non-rigid surface detection, registra-tion and realistic augmentation. Int. J. Comput. Vision, 76(2):109–122.

Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. (1992).Numerical Recipes: The Art of Scientific Computing. Cambridge University Press,Cambridge (UK) and New York, 2nd edition.

Pressigout, M. and Marchand, E. (2007). Real-time hybrid tracking using edge andtexture information. Int. Journal of Robotics Research, IJRR, 26(7):689–713.

Ramamoorthi, R. (2002). Analytic pca construction for theoretical analysis of light-ing variability in images of a lambertian object. IEEE Trans. Pattern Analysisand Machine Intelligence, 24:1322–1333.

Romdhani, S. and Vetter, T. (2003). Efficient, robust and accurate fitting of a 3dmorphable model. In Proc. of ICCV, volume 1, pages 59–66.

Ross, D., Lim, J., and Yang, M.-H. (2004). Adaptive probabilistic visual trackingwith incremental subspace update. In Proc. European Conference on ComputerVision, volume LNCS 3022, pages 470–482. Springer-Verlag.

Salzmann, M., J.Pilet, S.Ilic, and P.Fua (2007). Surface deformation models for non-rigid 3–d shape recovery. IEEE Transactions on Pattern Analysis and MachineIntelligence, 29(8):1481–1487.

Schmid, C., Mohr, R., and Bauckhage, C. (2000). Evaluation of interest pointdetectors. International Journal of Computer Vision, 37(2):151–172.

Sclaroff, S. and Isidoro, J. (2003). Active blobs: region-based, deformable appear-ance models. Comput. Vis. Image Underst., 89(2-3):197–225.

197

Page 220: Efficient Model-based 3D Tracking by Using Direct Image Registration

Sepp, W. (2006). Efficient tracking in 6-dof based on the image-constancy assump-tion in 3-d. In Proc. of ICPR.

Sepp, W. (2008). Visual Servoing of Textured Free-Form Objects in 6 Degreesof Freedom. PhD thesis, Institut fr Datenverarbeitung, Technische UniversittMnchen.

Sepp, W. and Hirzinger, G. (2003). Real-time texture-based 3-d tracking. In Proc. ofDeutsche Arbeitsgemeinschaft fur Mustererkennung e.V., volume 2781 of LNCS,pages 330–337. Springer.

Shi, J. and Tomasi, C. (1994). Good features to track. In 1994 IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR’94), pages 593 – 600.

Shum, H.-Y. and Szeliski, R. (2000). Construction of panoramic image mosaics withglobal and local alignment. International Journal of Computer Vision, 36(2):101–130.

Simon, G., Fitzgibbon, A. W., and Zisserman, A. (2000). Markerless tracking usingplanar structures in the scene. In International Symposium on Augmented Reality,pages 120–128.

Strom, J., Jebara, T., Basu, S., and Pentland, A. (1999). Real time tracking andmodeling of faces: An EKF-based analysis by synthesis approach. In Proceed-ings of the Modelling People Workshop at the 1999 International Conference onComputer Vision.

Tomasi, C. and Kanade, T. (1992). Shape and motion from image streams under or-thography: A factorization approach. International Journal of Computer Vision,9(2):137–154.

Torr, P. H. S. and Zisserman, A. (1999). Feature based methods for structure andmotion estimation. In Triggs, W., Zisserman, A., and Szeliski, R., editors, VisionAlgorithms: Theory and practice, pages 278–295. Springer-Verlag.

Torresani, L., Hertzmann, A., and Bregler, C. (2008). Nonrigid structure-from-motion: Estimating shape and motion with hierarchical priors. IEEE Trans.Pattern Anal. Mach. Intell., 30(5):878–892.

Torresani, L., Yang, D., Alexander, G., and Bregler, C. (2002). Tracking and mod-elling non-rigid objects with rank constraints. In Proc. of CVPR. IEEE.

Tsai, R. (1987). A versatile camera calibration technique for high-accuracy 3d ma-chine vision metrology using off-the-shelf tv cameras and lenses. Robotics andAutomation, IEEE Journal of, 3(4):323–344.

Tzimiropoulos, G., Zafeiriou, S., and Pantic, M. (2011). Robust and efficient para-metric face alignment. In Proceedings of IEEE International Conference on Com-puter Vision (ICCV 2011), pages 1847–1854. Oral.

198

Page 221: Efficient Model-based 3D Tracking by Using Direct Image Registration

Vacchetti, L., Lepetit, V., and Fua, P. (2004). Stable real-time 3d tracking using on-line and offline information. IEEE Transactions on Pattern Analysis and MachineIntelligence, 26(10):1385–1391.

Viola, P. and Jones, M. J. (2004). Robust real-time face detection. InternationalJournal of Computer Vision, 57(2):137–154.

Viola, P. and Wells, III, W. M. (1997). Alignment by maximization of mutualinformation. Int. J. Comput. Vision, 24(2):137–154.

Wang, X., Hua, G., and Han, T. X. (2010). Discriminative tracking by metriclearning. In ECCV (3), pages 200–214.

Xiao, J., Baker, S., Matthews, I., and Kanade, T. (2004a). Real-time combined2d+3d active appearance models. In Proc. of CVPR, Washington, D.C. IEEE.

Xiao, J., Baker, S., Matthews, I., and Kanade, T. (2004b). Real-time combined2d+3d active appearance models. In Proc. of CVPR, volume 2, pages 535 – 542.

Xu, Y. and Roy-Chowdhury, A. K. (2008). Inverse compositional estimation of 3dpose and lighting in dynamic scenes. IEEE Transactions on Pattern Analysis andMachine Intelligence, 30(7):1300 – 1307.

Zhu, J., Hoi, S. C., and Lyu, M. R. (2006). Real-time non-rigid shape recoveryvia active appearance models for augmented reality. In Proceedings 9th EuropeanConference on Computer Vision (ECCV2006), Graz, Austria.

Zimmermann, K., Matas, J., and Svoboda, T. (2009). Tracking by an optimalsequence of linear predictors. IEEE Trans. Pattern Anal. Mach. Intell., 31(4):677–692.

Zimmermann, K., Svoboda, T., and Matas, J. (2006). Multiview 3d tracking with anincrementally constructed 3d model. In 3DPVT ’06: Proceedings of the Third In-ternational Symposium on 3D Data Processing, Visualization, and Transmission(3DPVT’06), pages 488–495, Washington, DC, USA. IEEE Computer Society.

Zitova, B. (2003). Image registration methods: a survey. Image and Vision Com-puting, 21(11):977–1000.

199

Page 222: Efficient Model-based 3D Tracking by Using Direct Image Registration

200

Page 223: Efficient Model-based 3D Tracking by Using Direct Image Registration

Appendix A

Gauss-Newton Optimization

Let f be a vector function, f : Rn 7→ Rm with m ≥ n. We want to find the minimum

of f , i.e. minimise ‖f(x)‖, so we cast the problem as

x∗ = argminx

F(x),

where

F(x) =1

2‖f(x)‖2 =

1

2f(x)⊤f(x).

We assume that x∗ = x+h, where h ∈ Rn is an arbitrary vector such that F(x+h)

is a local minimizer.We find h by linearizing the function f at x using a truncatedTaylor series:

f(x+ h) ≃ ℓ(h) ≡ f(x) + f ′(x)h,

where f ′(x) is the first derivative of function f at point x. We redefine the problemas

F(x+ h) ≃ L(h) ≡1

2ℓ(h)⊤ℓ(h)

=1

2f(x)⊤f(x) + h⊤J(x)⊤f(x) +

1

2h⊤J(x)⊤J(x)h

with J(x) = f ′(x). Matrix J is usually referred as the Jacobian. The Gauss-Newtonstep h minimises the linear model L:

h = argminh

L(h).

If L(h) is a local minimizer then we assume that L′(h) = 0. The first derivative ofthe linear model is

L′(h) = J(x)⊤f(x) + J(x)⊤J(x)h.

We equal this derivative to zero, yielding the normal equations :

J(x)⊤J(x)h = −J(x)⊤f(x)

201

Page 224: Efficient Model-based 3D Tracking by Using Direct Image Registration

Finally, the GN descent direction is computed in closed-form as

h = −(J(x)⊤J(x)

)−1J(x)⊤f(x).

We use a line search strategy to compute a step size α alont the descent directiontowards the optimun—i.e., α = argminα F(x+ αh). Typically, F(x+αh) is not thetrue local minimizer so we repeat the process from the point x′ = x+ h. Again, welinearize F(x′ + h) and we compute a new GN step. We iterate the process untilconvergence. We outline the whole process in Algorithm 13.

Algorithm 13 Outline of the GN algorithm.

On-line: Let xi be the starting point, with i=0.1: while no convergence do2: Compute the Jacobian, J(xi).

3: Compute the GN step, hi = −(J(xi)

⊤J(xi))−1

J(xi)⊤f(xi).

4: Update xi+1 = xi + hi, i = i+ 15: end while

202

Page 225: Efficient Model-based 3D Tracking by Using Direct Image Registration

Appendix B

Plane-induced Homography

A plane-induced homography relates the image of a plane in two views, or the imageof two planes in a single view(see Figure B.1). Suppose that the camera matrices of

Figure B.1: (Left) The plane π induces a homography H between the imaged planeon views C and C′. The plane-induced homography H depends on the relative motionbetween C and C′—which depends on R and t. (Right) The plane-induced homographyalternatively represents the motion of a plane in a single view. In this case, the plane-induced homography depends on the relative motion between the planes π and π′.

the two views are those of a calibrated stereo rig,

P = K[I |0] P′ = K[R | t],

such that the world origin is at the first camera P. The world plane π has coordinatesπ = (n⊤

π , d)⊤ such that n⊤

πxπ + d = 0, for every point xπ on the plane. The pointxπ = (xπ, yπ, zπ)

⊤ is sensed in both views as

x = K[I |0]xπ = xπ, (B.1)

203

Page 226: Efficient Model-based 3D Tracking by Using Direct Image Registration

andx′ = K[R | t]xπ = K (Rxπ + t) . (B.2)

We rewrite Equation B.2 using −n⊤πxπ/d = 1 as

x′ = K[R | t]xπ = K(Rxπ + tn⊤xπ

), (B.3)

where n⊤ = −n⊤π/d. Inserting Equation B.1 into Equation B.2 results in an equation

that relates plane projections x and x′,

x′ = K(R+ tn⊤

)K−1x. (B.4)

We define the plane-induced homographic motion as a function fh6p : P2 7→ P

2 suchthat

x′ = fh6p(x;µ) = H6x, (B.5)

where µ = (α, β, γ, t)⊤, and H6 = K(R+ tn⊤

)K−1 is a 6-dof homography. This

homography is parameterized by the rotation R— described by the Euler angles α,β, and γ—and the translation t.

204

Page 227: Efficient Model-based 3D Tracking by Using Direct Image Registration

Appendix C

Plane+Parallax-constrainedHomography

The homography induced by a plane generates a virtual parallax due to the motionof the plane (cf. [Hartley and Zisserman, 2004]–p.335). Let us suppose we have acalibrated stereo rig with camera matrices

P = K[R | − Rt] P′ = K[R′ | − R′t′].

The world plane π has coordinates π = (n⊤π , d)

⊤ such that n⊤πxπ + d = 0, for every

point on the plane xπ. The plane-induced homography H relates the images of thepoint xπ on the two views, x and x′ (see Appendix B). However, this statementis not longer true when the plane’s coordinates change. Let π′ = (n

′⊤π , d′)⊤ be the

resulting plane from applying a rigid body transformation with parameters δR andδt to π (see Figure C.1).

We image the point xπ′ as x′′ on the right view. We relate the two image pointsx and x′′ as

x′′ = Hx+ ρe′ (C.1)

where he scalar ρ is the parallax displacement relative to the plane-induced homog-raphy H, and e′ is the projection of the epipole under P′.

We demonstrate that there exists a closed-form of ρ when we know both R′ andt′. First, we respectively express xπ′ in the reference system of cameras C and C′

as follows:

x =Rxπ′ − Rt,

x′ =R′xπ′ − R′t′,(C.2)

that is, x is xπ′ expressed in coordinates of C, and x′ is xπ′ expressed in coordinatesof C′.

205

Page 228: Efficient Model-based 3D Tracking by Using Direct Image Registration

Figure C.1: Plane+Parallax-constrained homograpy.

We compute the transformation that relates x and x′ by combining Equations C.2as follows:

x′ = R′R⊤x+ R′(t− t′). (C.3)

Notice that x′′ = Kx′ as x′ is already expressed in coordinates of C′.We express x by chaining three transformations: (1) from x to xπ,

xπ =(R⊤tn⊤

πR⊤/(d− n⊤

πt))Kx, (C.4)

(2) from xπ to xπ′ ,xπ′ =

(δR− δRδtn⊤

π/d)xπ, (C.5)

and (3) from xπ′ to x,

x =(R− Rtn⊤

πδR⊤/(d− n⊤

πδt))xπ′ . (C.6)

Inserting Equations C.4–C.6 into Equation C.1, and projecting using K leads to anexpression relating x and x′′,

x′′ = KR′R⊤(

R− Rtn⊤πδR

(d− n⊤πδt)

)(

δR− δRδtn⊤π

d

)(

R⊤tn⊤πR

(d− n⊤πt)

)

Kx+ KR′(t− t′)

(C.7)We rewrite Equation C.7 as

x′′ = Hx+ e′, (C.8)

206

Page 229: Efficient Model-based 3D Tracking by Using Direct Image Registration

where

H = KR′R⊤(

R− Rtn⊤πδR

(d− n⊤πδt)

)(

δR− δRδtn⊤π

d

)(

R⊤tn⊤πR

(d− n⊤πt)

)

K−1, (C.9)

ande′ = KR′(t− t′). (C.10)

Matrix H is the plane induced homography between x and x′, and e′ is the epipolein the left view. Note that e′ ∈ P

2 is a point in general projective form— i.e.e′ = (x, y, w)⊤. However, if we express e′ as an augmented point on the Euclideanplane—i.e. e′ = (x/w, y/w, 1)⊤— then we rewrite Equation C.8 as

x′′ = Hx+ ρe′, (C.11)

where ρ = w is the projective depth of the epipole.We define the Plane+Parallax Constrained Homography, fH6PP, using Equa-

tion C.9,x′′ = fH6PP(x;µ) = Hx+ ρe′. (C.12)

C.1 Compositional Form

We may rewrite the warp fH6PP (Equation C.12) as a composition of two functions,so it can be directly used in compositional algorithms (see Chapter 6). We recastEquation C.12 as

x′′ = fH6PP(x;µ) = h(g(x; δR, δt); R, R′, t, t′), (C.13)

where we define the functions h and g as

h(x; R, R′, t, t′) = KR′R⊤x− KR′(t− t′),

g(x; δR, δt) =

(

R− Rtn⊤πδR

(d− n⊤πδt)

)(

δR− δRδtn⊤π

d

)(

R⊤tn⊤πR

(d− n⊤πt)

)

Kx.

(C.14)

Notice that the actual optimization parameters are δR and δt—the parameters R

and t are fixed thourough the process, and R′ and t′ depend upon δR and δt.

207

Page 230: Efficient Model-based 3D Tracking by Using Direct Image Registration
Page 231: Efficient Model-based 3D Tracking by Using Direct Image Registration

Appendix D

Methodical Factorization

The main goal of the factorization algorithm is to re-organise the chain of matrixproducts of the Jacobian matrix due to gradient replacement. Frequently, this ar-rangement is done using ad hoc techniques, [Hager and Belhumeur, 1998]. In thissection we propose a method to sistematically carry out the factorization step. Weuse the following theorems and their corresponding corollaries as a basis for thistechnique.

D.1 Basic Definitions

Definition D.1.1. The vec Operator: If A is a m× n matrix with values

A =

a11 · · · a1n...

. . ....

am1 · · · amn

,

the vec operator stacks the matrix columns into a vector

vec(A) =

a11...

am1...

a1n...

amn

.

Definition D.1.2. Kronecler Product: If A is a m× n matrix and B is a p× q

209

Page 232: Efficient Model-based 3D Tracking by Using Direct Image Registration

matrix, then the Kronecker product A⊗ B is the (mp)× (nq) block matrix

A⊗ B =

a11B · · · a1nB...

......

am1B · · · amnB

.

For more properties of the Kronecker product we recommend the reading of [K. B. Pe-tersen].

Definition D.1.3. Kronecker Row-product: If A is a m× n matrix and B is ap× q matrix, we define the Kronecker row-product A⊙ B as the (mp)× q matrix

A⊙ B =

A⊗ b⊤1

...

A⊗ b⊤p

,

where b⊤i are the p rows of matrix B.

We define the concepts of permutation and permutation matrix bellow. Weshall use these definitions to re-organize Kronecker products:

Definition D.1.4. Permutation: Given a set 1, . . . ,m, a permutation, π, ofthe set is a bijective map of that set onto itself: π : 1, . . . ,m 7→ 1, . . . ,m. A lessformal defition would enunciate the permutation of a set as a reordering of the setelements. We annotate the permutation of set m, π(m), as

π(m) =

(1 2 · · · m

π(1) π(2) · · · π(m)

)

.

For example, given the set 1, 2, 3, a valid permutation of that set is

(1 2 3

π(1) π(2) π(m)

)

=

(1 2 32 3 1

)

.

Definition D.1.5. Permutation Matrix: If π(n) is a permutation of the set1, . . . , n, then we define the n× n permutation matrix, Pπ(n), as

Pπ(n) =

e⊤π(1)

e⊤π(2)...

e⊤π(n)

,

where ei ∈ Rn is the i-th unit vector: i.e. the vector that is zero in all entries except

the i-th where it is 1.

210

Page 233: Efficient Model-based 3D Tracking by Using Direct Image Registration

The permutation matrix enable us to re-order the rows and collumns of a matrixor vector. We shall use this property in Theorem 6. Additionally, we define thepermutation with ratio as a special sub-class of permutations

Definition D.1.6. The permutation of the set 1, . . . ,m with ratio q, π(m : p),is the permutation that verifies

(1 2 3

π(1) π(2) π(m)

)

For example, the permutation π(9 : 3) is

π(9 : 3) =

(1 2 3 4 5 6 7 8 9

π(1) π(2) π(3) π(4) π(5) π(6) π(7) π(8) π(9)

)

=

(1 2 3 4 5 6 7 8 91 4 7 2 5 8 3 6 9

)

D.2 Lemmas that Re-organize Product of Matri-

ces

Using the above definitions we enunciate the theorem that let us to re-arrange themultiplication of two matrices whereas it keeps result of the product the same.

Theorem 5. Let A and B be m × n and n × p matrices respectively. We can rewritetheir product AB as

(Im ⊗ vec(B)⊤

) (Ip ⊙ A

).

Proof. The product AB can be alternatively written as a row-wise times collumn-wisevector product,

Am×nBn×p =

a⊤11×n

...a⊤m1×n

[b1n×1

· · · bpn×1

],

=

a⊤1 b1 · · · a⊤

1 bp

.... . .

...a⊤mb1 · · · a⊤

mbp,

, =

b⊤1 a1 · · · b⊤

p a1...

. . ....

b⊤1 am · · · b⊤

p am,

211

Page 234: Efficient Model-based 3D Tracking by Using Direct Image Registration

after some basic matrix manipulations. The result can be re-arranged as product ofa m×mnp matrix times a mnp× p matrix,

Am×nBn×p =

b⊤1 b

⊤2 · · ·b⊤

p 0⊤ · · · 0⊤

0⊤ b⊤1 b

⊤2 · · ·b⊤

p · · · 0⊤

......

. . ....

0⊤ · · · 0⊤ b⊤1 b

⊤2 · · ·b⊤

p

a1 · · · 0...

. . ....

0 · · · a1

a2 · · · 0...

. . ....

0 · · · a2...

......

am · · · 0...

. . ....

0 · · · am

which can be compacly rewritten using Kronecker product and row-product and vec

operator as

Am×nBn×p =(Im ⊗ vec(B)⊤

) (Ip ⊙ A

).

Following this theorem we define four corollaries that deal with the most commoncases.

Corollary 5. If A be a m× n matrix and b a n× 1 column vector, then the productAb can be rewritten as

(Im ⊗ b⊤

)vec(A⊤).

Proof. If we apply theorem 5 using the current matrix dimensions, we get

Am×nbn×1 =(Im ⊗ b⊤

)(I1 ⊙ A)

=(Im ⊗ b⊤

)

a1

a2...am

=(Im ⊗ b⊤

)vec(A⊤)

We can reach the same result by just rewriting the matrix A row-wise so the productis

Ab =

a⊤1...a⊤m

b =

a⊤1 b...

a⊤mb

=

b⊤a1...

b⊤am

The resulting transposed product can be rewriten as a matrix multiplication in the

212

Page 235: Efficient Model-based 3D Tracking by Using Direct Image Registration

form

Ab =

b⊤ · · · 01×n

......

...

01×n · · · b⊤

a1...am

=(Im ⊗ b⊤

)vec(A⊤),

compactly written using Kronecker product and vec operator.

Corollary 6. If b a m× 1 column vector and A be a m× n matrix then the productb⊤A can be rewritten as vec(A)⊤ (In ⊗ b).

Proof. We rewrite the product straightforward by using theorem 5, but changingthe dimensions accordingly,

b⊤1×nAm×n =

(I1 ⊗ vec(A)⊤

) (In ⊙ b⊤

)

= vec(A)⊤ (In ⊗ b) .

Corollary 7. If a is a m×1 column vector and b⊤ is a 1×n row vector, the resultingm× n matrix ab⊤ can be rewritten as

(Im ⊗ b⊤

)(a⊗ In)

Proof. We use the theorem 5 to rewrite the product ab⊤ accordingly to the vectorsizes,

am×1b⊤1×n =

(Im ⊗ vec(b⊤)⊤

)(In ⊙ a)

=(Im ⊗ b⊤

)(a⊗ In) .

Notice that we can re-organize the row-collumn product, a⊤b, in a direct way by justrewriting it as b⊤a. However, we will the following corollary several times duringour factorisation.

Corollary 8. Let a and b be two n × 1 collumn vectors. The product (a⊤b)Im canbe rewritten as (Im ⊗ a⊤)(Im ⊗ b).

Proof. The initial product is compactly rewritten as:

(a⊤b)Im =

a⊤b · · · 0...

......

0 · · · a⊤b

=

a⊤ · · · 01×n

......

...

01×n · · · b⊤

b · · · 0n×1...

......

0n×1 · · · b

,

= (Im ⊗ a⊤)(Im ⊗ b).

213

Page 236: Efficient Model-based 3D Tracking by Using Direct Image Registration

Lemmas Input Output

Theorem 5A

(m×n)

B

(n×p)

(Im ⊗ vec(B)⊤

)

m×(np)

(Ip ⊙ A

)

(np)×p

Corollary 5A

(m×n)

b(n×1)

(Im ⊗ b⊤)

m×(mn)

vec(A⊤)

(mn)×1

Corollary 6b⊤

(1×m)

A

(m×n)

vec(A)⊤

1×(mn)

(In ⊗ b))

(mn)×n

Corollary 7a

(m×1)

b⊤

(1×n)

(Im ⊗ b⊤)

m×(mn)

(a⊗ In)

(mn)×n

Corollary 8( a⊤

(1×n)

b(1×n)

) Im

(m×m)

(Im ⊗ a⊤)

m×(mn)

(Im ⊗ b)

(mn)×m

Table D.1: Lemmas used to re-arrange matrices product.

214

Page 237: Efficient Model-based 3D Tracking by Using Direct Image Registration

D.3 Lemmas that Re-organize Kronecker Prod-

ucts

In this section we shall derive some theorems and corollaries that we shall use tore-organize those products involving Kronecker matrices (see definition D.1.2).

The first theorem is used to re-order the operands of a Kronecker product. Noticethat, generally, the Kronecker product is not conmutative.

Theorem 6. If A is a m × n matrix and B is a p × q matrix, then their Kroneckerproduct is permutation equivalent, i.e., there exist (mp)× (mp) permutation matricesP and Q such that

A⊗ B = P(B⊗ A)Q

.

The following theorem and its corollaries re-organize a Kronecker product whereone of the operands is an identity matrix.

Theorem 7. If Im is a m×m identity matrix and A is a n× p matrix, then we canre-organize the (mn)× (mp) Kronecker product as

Im ⊗ A = Pπ(p) (A⊗ Im) ,

where Pπ((mn):p) permutation matrix of the set 1, . . . , (mn) with ratio p (see defini-tion D.1.6).

From this theorem we derive three corollaries that we shall use directly in ourderivations.

Corollary 9. If Im is the m×m identity matrix, a and b are n× 1 vectors, we canrewrite the (mn)× (mn) product (Ip ⊗ a)

(Ip ⊗ b⊤

)as

(Im ⊗ a)(

Im ⊗ b⊤)

= P⊤π((mn):n)

(

Im ⊗ b⊤)

Pπ((mn2):n) (Im ⊗ a) .

Corollary 10. If Im is the m×m identity matrix, a and b are n× 1 vectors, we canrewrite the m×m product

(Im ⊗ a⊤

)(Im ⊗ b) as

(

Im ⊗ a⊤)

(Im ⊗ b) =(

b⊤ ⊗ Im

)

(a⊗ Im)

215

Page 238: Efficient Model-based 3D Tracking by Using Direct Image Registration

Lemmas Input Output

Theorem 7Im ⊗ A

(mn)×(mp)

Pπ((mn):p)

(mn)×(mn)

(A⊗ Im)

(mn)×(mp)

Corollary 9(Im ⊗ a)

m×(mn)

(Im ⊗ b⊤)

(mn)×n

P⊤π((mn):n)

m×m

(Im ⊗ b⊤

)

m×(mn)

Pπ((mn2):n)

(mn)×(mn)

(Im ⊗ a)

(mn)×m

Corollary 10(Im ⊗ a⊤)

m×(mn)

(Im ⊗ b)

(mn)×m

(b⊤ ⊗ Im)

m×(mn)

(a⊗ Im)

(mn)×m

Corollary 11(a⊗ Im)

(mn)×m

(Im ⊗ b⊤)

m×(mn)

(I(mn) ⊗ b⊤)

(mn)×(mn2)

(a⊗ I(mn))

(mn2)×(mn)

Table D.2: Lemmas used to re-arrange Kronecker matrix products.

Corollary 11. If Im is the m × m identity matrix, and a and b are n × 1 vectors,we can rewrite the (mn)× (mn) product (a⊗ Im)

(Im ⊗ b⊤

)as

(a⊗ Im)(

Im ⊗ b⊤)

=(

I(mn) ⊗ b⊤) (

a⊗ I(mn)

).

In addition to this, we shall use the following properties of the Kronecker productin our derivations. If Im is the m×m identity matrix, and a and b are n×1 vectors,then

(Im ⊗ ab⊤

)= (Im ⊗ a)

(Im ⊗ b⊤

),

(Im ⊗ a)⊤ =(Im ⊗ a⊤

),

(a⊗ Im)⊤ =

(a⊤ ⊗ Im

).

D.4 Lemmas that Re-organize Sums of Matrices

In this section we shall define some propositions that re-arrange the distributiveproduct of matrices.

Proposition 8. If A is a m×p matrix, B is a m× (np) matrix, Ip is the p×p identitymatrix, and a is a n× 1 vector, then there exist m× (m+ np) matrices P and Q such

216

Page 239: Efficient Model-based 3D Tracking by Using Direct Image Registration

that we can rewrite the distributive product A+ B (Ip ⊗ a) as

A+ B (Ip ⊗ a) = (AP+ BQ)

(

Ip ⊗

[1a

])

.

The matrices P and Q are respectively in the form,

P =[e1 0m×n e2 0m×n · · · ei 0m×n · · · 0m×n em

],

and

Q =

0n×1 In 0n×1 0n · · · 0n×1 0n · · · 0n×1 0n

0n×1 0n 0n×1 In · · · 0n×1 0n · · · 0n×1 0n...

......

.... . .

......

......

...0n×1 0n 0n×1 0n · · · 0n×1 In · · · 0n×1 0n...

......

.... . .

......

. . ....

...0n×1 0n 0n×1 0n · · · 0n×1 0n · · · 0n×1 In

,

where ei is the i-the unit vector (see Definition D.1.5), 0n×1 is the n×1 zeroes vector,and 0n is the n× n matrix whose all entries are zero.

217

Page 240: Efficient Model-based 3D Tracking by Using Direct Image Registration
Page 241: Efficient Model-based 3D Tracking by Using Direct Image Registration

Appendix E

Methodical Factorization of f3DTM

The goal of this section is to show how to factorize the Jacobian matrix J in asysthematic way by just using the lemmas of Appendix D. We attempt to obtainthe most efficient factorization (i.e. the factorization that employs the least num-ber of operations). We separately factorize each element of the Jacobian matrix J

(Equations 5.27).

Decomposition of J1

J1 =∇uT [u]⊤λH−1A

(I3 − (n⊤t)I3 + tn⊤

)

R⊤Rα v

Cor. 5−−−→D⊤

(

I3 − ( n⊤ t )I3 + tn⊤)

(I⊗ v⊤)vec(R⊤αR)

Cor. 8−−−→D⊤

(

I3 − (I3 ⊗ n⊤)(I3 ⊗ t) + t n⊤

)

(I3 ⊗ v⊤)vec(R⊤αR)

Cor. 7−−−→D⊤

(

I3 − (I3 ⊗ n⊤(I3 ⊗ t) + (I3 ⊗ n⊤)(t⊗ I3))

(I3 ⊗ v⊤)vec(R⊤αR),

(E.1)

where D⊤ = ∇uT [u]⊤λH−1A .We continue with the factorization process by eliminat-

ing the distributive product of Equation E.1. First, we insert the term I3 ⊗ v⊤ intothe sum of the distributive product,

J1 =D⊤(I3 − (I3 ⊗ n⊤(I3 ⊗ t) + (I3 ⊗ n⊤)(t⊗ I3)

)(I3 ⊗ v⊤)vec(R

⊤αR)

=D⊤((I3 ⊗ v⊤)− (I3 ⊗ n⊤(I3 ⊗ t)(I3 ⊗ v⊤) + (I3 ⊗ n⊤)(t⊗ I3)(I3 ⊗ v⊤)

)vec(R

⊤αR).

(E.2)

Second, we reorganize the two terms of the expression inside the parentheses ofEquation E.2 containing translations, such that we scroll the operands containing tto the right side of the product. We reorganize the term (I3⊗t)(I3⊗v⊤) as follows:

(I3 ⊗ n(i)⊤) (I3 ⊗ t) (I3 ⊗ v(i)⊤)

Cor. 5−−−−→(I3 ⊗ n(i)⊤) Pπ(9:3)(I9 ⊗ v(i)⊤)(I9 ⊗ v(i)⊤)Pπ(27:3)(I9 ⊗ t) .

(E.3)

Notice two important facts about Equation E.3: (1) we simplify the term Pπ(9:3)(I9⊗v⊤) using a basic property of Kronecker product [K. B. Petersen], and (2) we use

219

Page 242: Efficient Model-based 3D Tracking by Using Direct Image Registration

the Corollary 11 to reorder the Kronecker product Pπ(27:3)(I9⊗ t). If we apply thesetwo properties, we obtain the following results:

(I3 ⊗ n⊤(I3 ⊗ t)(I3 ⊗ v⊤) =(I3 ⊗ n⊤) Pπ(9:3)(I9 ⊗ v⊤) Pπ(27:3)(I9 ⊗ t)

−→(I3 ⊗ n⊤) (Pπ(9:3) ⊗ v⊤) Pπ(27:3)(I9 ⊗ t)

Cor. 11−−−−−→(I3 ⊗ n⊤)(Pπ(9:3) ⊗ v⊤) (t⊗ I9) .

(E.4)

We rearrange the second term in a similar way: we apply the Corollary 11 to theterm (t⊗ I3)(I3 ⊗ v⊤), so we obtain:

(I3 ⊗ n⊤) (t⊗ I3) (I3 ⊗ v⊤)Cor. 11−−−−→ (I3 ⊗ n⊤) (I9 ⊗ v⊤)(t⊗ I9) . (E.5)

Notice the common factor (t ⊗ I9) in both Equations E.4 and E.5; we can nowreorganize the summation part of the distributivity by applying Proposition 8:

(I3 − (I3 ⊗ n⊤(I3 ⊗ t) + (I3 ⊗ n⊤)(t⊗ I3)

)(I3 ⊗ v⊤)

Prop. 8−−−−→

[(I3 ⊗ n⊤)A+

⟨(I3 ⊗ n⊤)(I9 ⊗ v⊤)− (I3 ⊗ n⊤)(Pπ(9:3) ⊗ v⊤)

⟩B]([

1t

]

⊗ I9

)

,

(E.6)

where the matrices A and B are in the following form (see Theorem 7 for furtherdetails):

A =

1 0 0 0 0 0 0 0 0 0 0 00 0 0 0 1 0 0 0 0 0 0 00 0 0 0 0 0 0 0 1 0 0 0

,

and

B =

03×1 I3 03×1 03 03×1 0303×1 03 03×1 I3 03×1 0303×1 03 03×1 03 03×1 I3

.

(E.7)

Finally, we rename the product of D⊤ and the highlighted portion of the Equa-tion E.6 as S⊤

1 :

S⊤1 = D⊤

[(I3 ⊗ n⊤)A+

⟨(I3 ⊗ n⊤)(I9 ⊗ v⊤)− (I3 ⊗ n⊤)(Pπ(9:3) ⊗ v⊤)

⟩B]. (E.8)

Therefore, we write the factorized version of J1 as:

J1 = S⊤1

([1t

]

⊗ I9

)

vec(R⊤αtR). (E.9)

Notice that the Equation E.9 represents a proper factorization (according to therules of Section 5.3): the term S⊤

1 is constant—only made of shape terms—and the

term

([1t

]

⊗ I9

)

vec(R⊤αtR) depends on only motion-variable terms.

220

Page 243: Efficient Model-based 3D Tracking by Using Direct Image Registration

Decomposition of J2 and J3 We reorganize the terms J2 and J3 in the same waywe did with J1. The only difference lies on the parameter of the rotational derivative:we use α for J1, β for J2, and γ for J3. Hence, we show the final factorized formsfor J2 and J3 in the next equations:

J2 =S⊤1

([1t

]

⊗ I9

)

vec(R⊤β R),

and

J3 =S⊤1

([1t

]

⊗ I9

)

vec(R⊤γ R).

(E.10)

Decomposition of J4 The matrix decomposition for J4, J5, and J6 is slightydifferent from the three previous elements as the former elements do not involve arotational derivative, that is:

J4 = D⊤(R⊤ − (n⊤t)R⊤ + tn⊤R⊤

)r1n

⊤v. (E.11)

Alghough we could deliver a completely different routine to rearrange J4, we optto reuse the most part of the factorization process of J1. We reorder the terms ofEquation E.11 as follows:

J4 = D⊤(R⊤ − (n⊤t)R⊤ + tn⊤R⊤

)r1 n⊤v

Cor. 5−−−−→

(R⊤ − (n⊤t)R⊤ + tn⊤R⊤

)

(I3 ⊗ v⊤n)vec(r⊤1 ) ,(E.12)

where r1 is the first collumn of matrix R— i.e. R = (r1, r2, r3). We decompose(I3 ⊗ v⊤n) into (I3 ⊗ v⊤)(I3 ⊗ n) by using Corollary 8,

J4 = D⊤(I3 − (n⊤t)I3 + tn⊤I3

)(I3 ⊗ v⊤n)vec(r⊤1 )

Cor. 8−−−−→D⊤

(I3 − (n⊤t)I3 + tn⊤I3

)

(I3 ⊗ v⊤)(I3 ⊗ n) vec(r⊤1 ).(E.13)

Now we can reorder Equation E.13 using the result from Equation E.8 and Corol-lary 11; we show the process in the next equations:

D⊤(I3 − (n⊤t)I3 + tn⊤I3

)(I3 ⊗ v⊤) (I3 ⊗ n)vec(r⊤1 )

Eq. E.8−−−−−→S1

([1t

]

⊗ I9

)

(I3 ⊗ n) vec(r⊤1 )

Cor. 11−−−−−→S1 ((I3 ⊗ n)⊗ I4)

(

I3 ⊗

[1t

])

vec(r⊤1 )

(E.14)

We write the expression S⊤2 as

S⊤2 = S⊤

1 ((I3 ⊗ n)⊗ I4) , (E.15)

so we concisely write the term J4 as:

J4 = S⊤2

(

I3 ⊗

[1t

])

r1. (E.16)

221

Page 244: Efficient Model-based 3D Tracking by Using Direct Image Registration

Decomposition of J5 and J6 We reorganize the terms J5 and J6 in the sameway we did with J4. The only difference lies on the columns of matrix R that areinvolved: we use r1 for J4, r2 for J5, and r3 for J6. We show the decompositionexpressions fo J5 and J6 in the following equations:

J5 = S⊤2

(

I3 ⊗

[1t

])

r2, (E.17)

and

J6 = S⊤2

(

I3 ⊗

[1t

])

r3. (E.18)

Summary of results for J If we gather Equations E.9, E.10, E.16, E.17 and E.18,we can rewrite Equation 5.27 as follows:

J⊤ =[S⊤1 S⊤

2

]M, (E.19)

where we define the matrix M as follows:

M =

([1t

]

⊗ I9

)⟨

vec(R⊤αtR) vec(R

⊤βtR) vec(R

⊤γtR)⟩

036×3

012×3

(

I3 ⊗

[1t

])

R

. (E.20)

222

Page 245: Efficient Model-based 3D Tracking by Using Direct Image Registration

Appendix F

Methodical Factorization off3DMM (Partial case)

The process of decomposing Equations 5.57 is slightly different from the full-factorizationcase. First, we completely factorize the expression D (see Equation 5.58). We groupthose motion terms that are common in D

( n⊤Bs c )I3 − Bs c n⊤Cor. 5−−−→ (I3 ⊗ n⊤Bs)(I3 ⊗ c) − Bs (I6 ⊗ n⊤)(c⊗ I3) ,

Cor. 5−−−→(I3 ⊗ n⊤Bs)(I3 ⊗ c)− Bs(I6 ⊗ n⊤) Pπ(9:3)(I3 ⊗ c) ,

Cor. 5−−−→

[(I3 ⊗ n⊤Bs)− Bs(I6 ⊗ n⊤)Pπ(9:3)

](I3 ⊗ c).

(F.1)

and

t n⊤ − ( n⊤ t )I3Cor. 5−−−→ (I3 ⊗ n⊤)(t⊗ I3) − (I3 ⊗ n⊤)(I3 ⊗ t) ,

I3Cor. 5−−−→(I3 ⊗ n⊤) Pπ(9:3)(I3 ⊗ t) − (I3 ⊗ n⊤)(I3 ⊗ t),

I3Cor. 5−−−→

[(I3 ⊗ n⊤)Pπ(9:3) − (I3 ⊗ n⊤)

](I3 ⊗ t).

(F.2)

Using Equations F.1 and F.2 we rewrite D as

D = I3 + s1(I3 ⊗ t) + s2(I3 ⊗ c), (F.3)

where

s1 =[(I3 ⊗ n⊤)Pπ(9:3) − (I3 ⊗ n⊤)

], and

s2 =[(I3 ⊗ n⊤Bs)− Bs(I6 ⊗ n⊤)Pπ(9:3)

].

(F.4)

Notice that we can re-organize the summation terms in Equation F.3 more compactlyby using Proposition 8, as we show here:

D =I3 + [s1P+ s2Q]

(

I3 ⊗

[tc

])

H = 〈I3P′ + [s1P+ s2Q] Q

′〉

I3 ⊗

1tc

,

(F.5)

223

Page 246: Efficient Model-based 3D Tracking by Using Direct Image Registration

where

P =

I3 03 03 03 03 03 03 03 0303 03 03 I3 03 03 03 03 0303 03 03 03 03 03 I3 03 03

,

Q =

06×3 I6 06×3 06 06×3 0606×3 06 06×3 I6 06×3 0606×3 06 06×3 06 06×3 I6

,

P′ =[e1 03×9 e2 03×9e3 03×9

],

and

Q′ =

09×1 I9 09×1 09 09×1 0909×1 09 09×1 I9 09×1 0909×1 09 09×1 09 09×1 I9

.

(F.6)

We rewrite Equation F.5 more compactly by using

D1 = 〈I3P′ + [s1P+ s2Q] Q

′〉 ,

D2 =

I3 ⊗

1tc

,(F.7)

so the resulting representation of D is

D = D1D2. (F.8)

Notice that Equation F.8 has two differently parts: D1 depends only on structureparameters whereas D2 solely depends on motion. The key idea of the partial fac-torization is to leave untouched those parts of Equation 5.57 whose factorizationprocess could slow down the computation; thus, we only decompose the term D1 sowe rewrite the elements of the Jacobian (Equation 5.57) as follows:

J1 =D1D2R⊤Rαt

(I3 + Bscn

⊤)v,

J2 =D1D2R⊤Rβt

(I3 + Bscn

⊤)v,

J3 =D1D2R⊤Rγt

(I3 + Bscn

⊤)v,

J4 =D1D2r1n⊤v,

J5 =D1D2r2n⊤v,

J6 =D1D2r3n⊤v,

and

Jk =D1D2R⊤Bk.

(F.9)

224

Page 247: Efficient Model-based 3D Tracking by Using Direct Image Registration

Appendix G

Methodical Factorization off3DMM (Full case)

The goal of this section is to show how to factorize the Jacobian matrix J that spanEquations 5.57 by using the lemmas of Appendix D. We attempt to obtain themost efficient factorization (i.e. the factorization that employs the least number ofoperations). We separately factorize each element of the Jacobian matrix J in thefollowing.

Decomposition of J1 We show the expanded version of J1 from Equation 5.57as follows:

J1 = D⊤(I+ (n⊤Bsct)I− (n⊤t)I− Bscn

⊤ + tn⊤)R⊤Rα (I+ Bsc)v, (G.1)

where D⊤ = ∇uT [u]⊤λH−1A K. We split Equation G.1 in four chunks by applying the

distributive property as follows:

J(1)1 =D⊤

((n⊤Bsct)I− Bscn

⊤)R⊤Rαv,

J(2)1 =D⊤

((n⊤Bsct)I− Bscn

⊤)R⊤Bscv,

J(3)1 =D⊤

(I− (n⊤t)I+ tn⊤

)R⊤Rαv,

J(4)1 =D⊤

(I− (n⊤t)I+ tn⊤

)R⊤Bscv.

(G.2)

We separately re-arrange the terms J(1)1 , J

(2)1 , J

(3)1 , and J

(4)1 from Equations G.2.

We re-organize each term of J1 by using the Lemmas in Appendix D. We show the

225

Page 248: Efficient Model-based 3D Tracking by Using Direct Image Registration

re-arranging process for J(1)1 in the following:

J(1)1 =D⊤

(

( n⊤Bs ct )I− Bscn⊤)

R⊤Rαv,

Cor. 8−−−→D⊤

(

(I3 ⊗ n⊤Bs)(IK ⊗ c)− Bs c n⊤

)

R⊤Rαv,

Cor. 8−−−→D⊤

(

(I3 ⊗ n⊤Bs)(IK ⊗ c)− Bs (IK ⊗ n⊤)(I3 ⊗ c))

R⊤Rαv,

=D⊤(I3 ⊗ n⊤Bs) (IK ⊗ c)R⊤Rα v

−D⊤(Bs(IK ⊗ n⊤)(I3 ⊗ c)

)R⊤Rαv,

Cor. 5−−−→D⊤(I3 ⊗ n⊤Bs) (I3 ⊗ v⊤)vec(R

⊤αR(IK ⊗ c⊤)

−D⊤Bs(IK ⊗ n⊤) (I3 ⊗ c)R⊤Rα v ,

Cor. 5−−−→D⊤(I3 ⊗ n⊤Bs)(I3 ⊗ v⊤)vec(R

⊤αR(IK ⊗ c⊤))

−D⊤Bs(IK ⊗ n⊤) (I3 ⊗ v⊤)vec(R⊤αR(I3 ⊗ c⊤)) ,

(G.3)

Note that we can rewrite Equation G.3 as a product of two matrices,

J(1)1 =

(D⊤(I3 ⊗ n⊤Bs)(I3 ⊗ v⊤),D⊤Bs(IK ⊗ n⊤)(I3 ⊗ v⊤)

)

(

vec(R⊤αR(IK ⊗ c⊤))

vec(R⊤αR(I3 ⊗ c⊤))

)

(G.4)

We now proceed with the remaining terms J(2)1 , J

(3)1 , and J

(4)1 in the following:

J(2)1 =D⊤

(

( n⊤Bs ct )I− Bscn⊤)

R⊤Bscv,

Cor. 8−−−→D⊤

(

(I3 ⊗ n⊤Bs)(IK ⊗ c)− Bs c n⊤

)

R⊤Bscv,

Cor. 8−−−→D⊤

(

(I3 ⊗ n⊤Bs)(IK ⊗ c)− Bs (IK ⊗ n⊤)(I3 ⊗ c))

R⊤Bscv,

=D⊤(I3 ⊗ n⊤Bs)(IK ⊗ c)R⊤ Bs c v

−D⊤Bs(IK ⊗ n⊤)(I3 ⊗ c)R⊤ Bs c v,

Cor. 6−−−→D⊤(I3 ⊗ n⊤Bs)(IK ⊗ c)R⊤ (I3 ⊗ c⊤)vec(B⊤

s ) v

−D⊤Bs(IK ⊗ n⊤)(I3 ⊗ c)R⊤ (I3 ⊗ c⊤)vec(B⊤s ) v,

=D⊤(I3 ⊗ n⊤Bs) (IK ⊗ c)R⊤(I3 ⊗ c⊤)vec (B⊤s )v

−D⊤Bs(IK ⊗ n⊤)(I3 ⊗ c)R⊤(I3 ⊗ c⊤)vec(B⊤s )v,

Cor. 6−−−→D⊤(I3 ⊗ n⊤Bs)

(I3 ⊗ vec(B⊤

s )v)vec((IK ⊗ c)R(I3 ⊗ c⊤))

−D⊤Bs(IK ⊗ n⊤) (I3 ⊗ c)R⊤(I3 ⊗ c⊤)vec (B⊤s )v ,

Cor. 6−−−→D⊤(I3 ⊗ n⊤Bs)

(I3 ⊗ vec(B⊤

s )v)vec((IK ⊗ c)R(I3 ⊗ c⊤))

−D⊤Bs(IK ⊗ n⊤)(I3 ⊗ vec(B⊤

s )v)vec((IK ⊗ c)R(I3 ⊗ c⊤)) ,

(G.5)

226

Page 249: Efficient Model-based 3D Tracking by Using Direct Image Registration

J(3)1 =D⊤

(

I− ( n⊤ t )I+ tn⊤)

R⊤Rαv,

Cor. 8−−−→D⊤

(

I− (I3 ⊗ n⊤)(I3 ⊗ t) + t n⊤

)

R⊤Rαv,

Cor. 8−−−→D⊤

(

I− (I3 ⊗ n⊤)(I3 ⊗ t) + (I3 ⊗ n⊤)(t⊗ I3))

R⊤Rαv,

=D⊤R⊤Rα v − (I3 ⊗ n⊤)(I3 ⊗ t)R⊤Rαv

+ (I3 ⊗ n⊤)(t⊗ I3)R⊤Rαv,

Cor. 5−−−→D⊤

(I3 ⊗ v⊤)vec(R⊤αR) − (I3 ⊗ n⊤) (I3 ⊗ t)R⊤Rα v

+ (I3 ⊗ n⊤)(t⊗ I3)R⊤Rαv,

Cor. 5−−−→D⊤(I3 ⊗ v⊤)vec(R

⊤αR)− (I3 ⊗ n⊤) (I3 ⊗ v⊤)vec(R

⊤αR(I3 ⊗ t⊤))

+ (I3 ⊗ n⊤) (t⊗ I3)R⊤Rα v ,

Cor. 5−−−→D⊤(I3 ⊗ v⊤)vec(R

⊤αR)− (I3 ⊗ n⊤)(I3 ⊗ v⊤)vec(R

⊤αR(I3 ⊗ t⊤))

+ (I3 ⊗ n⊤) (I3 ⊗ v⊤)vec(R⊤αR(t

⊤ ⊗ I3)) .

(G.6)

J(4)1 =D⊤

(

I− ( n⊤ t )I+ tn⊤)

R⊤Bscv,

Cor. 8−−−→D⊤

(

I− (I3 ⊗ n⊤)(I3 ⊗ t) + t n⊤

)

R⊤Bscv,

Cor. 8−−−→D⊤

(

I− (I3 ⊗ n⊤)(I3 ⊗ t) + (I3 ⊗ n⊤)(t⊗ I3))

R⊤Bscv,

=D⊤R⊤ Bs c v − (I3 ⊗ n⊤)(I3 ⊗ t)R⊤Bscv

+ (I3 ⊗ n⊤)(t⊗ I3)R⊤Bscv,

Cor. 5−−−→D⊤R⊤ (I3 ⊗ c⊤)vec(B⊤s ) v − (I3 ⊗ n⊤)(I3 ⊗ t)R⊤Bscv

+ (I3 ⊗ n⊤)(t⊗ I3)R⊤Bscv,

=D⊤R⊤(I3 ⊗ c⊤) vec(B⊤s )v − (I3 ⊗ n⊤)(I3 ⊗ t)R⊤Bscv

+ (I3 ⊗ n⊤)(t⊗ I3)R⊤Bscv,

Cor. 5−−−→D⊤ (I3 ⊗ v⊤vec(B⊤s )

⊤)vec ((I3 ⊗ c)R) − (I3 ⊗ n⊤)(I3 ⊗ t)R⊤ Bs c v

+ (I3 ⊗ n⊤)(t⊗ I3)R⊤Bscv,

Cor. 5−−−→D⊤(I3 ⊗ v⊤vec(B⊤s )

⊤)vec ((I3 ⊗ c)R)− (I3 ⊗ n⊤)(I3 ⊗ t)R⊤ (I3 ⊗ c⊤)vec(B⊤s ) v

+ (I3 ⊗ n⊤)(t⊗ I3)R⊤Bscv,

(G.7)

227

Page 250: Efficient Model-based 3D Tracking by Using Direct Image Registration

=D⊤(I3 ⊗ v⊤vec(B⊤s )⊤)vec ((I3 ⊗ c)R)− (I3 ⊗ n⊤) (I3 ⊗ t)R⊤(I3 ⊗ c⊤) vec(B⊤s )v

+ (I3 ⊗ n⊤)(t⊗ I3)R⊤Bscv,

Cor. 5−−−→D⊤(I3 ⊗ v⊤vec(B⊤s )

⊤)vec ((I3 ⊗ c)R)

− (I3 ⊗ n⊤)(I3 ⊗ v⊤vec(B⊤s )

⊤)vec((I3 ⊗ c)R⊤(I3 ⊗ t⊤))

+ (I3 ⊗ n⊤)(t⊗ I3)R⊤ Bs c v,

Cor. 5−−−→D⊤(I3 ⊗ v⊤vec(B⊤s )

⊤)vec ((I3 ⊗ c)R)

− (I3 ⊗ n⊤)(I3 ⊗ v⊤vec(B⊤s )

⊤)vec((I3 ⊗ c)R⊤(I3 ⊗ t⊤))

+ (I3 ⊗ n⊤)(t⊗ I3)R⊤ (I3 ⊗ c⊤)vec(B⊤s ) ,

=D⊤(I3 ⊗ v⊤vec(B⊤s )⊤)vec ((I3 ⊗ c)R)

− (I3 ⊗ n⊤)(I3 ⊗ v⊤vec(B⊤s )

⊤)vec((I3 ⊗ c)R⊤(I3 ⊗ t⊤))

+ (I3 ⊗ n⊤) (t⊗ I3)R⊤(I3 ⊗ c⊤) vec(B⊤s ) ,

Cor. 5−−−→D⊤(I3 ⊗ v⊤vec(B⊤s )

⊤)vec ((I3 ⊗ c)R)

− (I3 ⊗ n⊤)(I3 ⊗ v⊤vec(B⊤s )

⊤)vec((I3 ⊗ c)R⊤(I3 ⊗ t⊤))

+ (I3 ⊗ n⊤)(I3 ⊗ v⊤vec(B⊤s )

⊤)vec((I3 ⊗ c)R⊤(t⊗ I3)) .

We rewrite Equations G.5, G.6, and G.7 in the say way that Equation G.4 as follows:

J(2)1 =

( (D⊤(I3 ⊗ n⊤Bs)

(I3 ⊗ vec(B⊤

s )v))⊤

−(D⊤Bs(IK ⊗ n⊤)

(I3 ⊗ vec(B⊤

s )v))⊤

)⊤(vec((IK ⊗ c)R(I3 ⊗ c⊤))vec((IK ⊗ c)R(I3 ⊗ c⊤))

)

,

J(3)1 =

(D⊤(I3 ⊗ v⊤)

)⊤

−(D⊤(I3 ⊗ n⊤)(I3 ⊗ v⊤)

)⊤

(D⊤(I3 ⊗ n⊤)(I3 ⊗ v⊤)

)⊤

vec(R⊤αR)

vec(R⊤αR(I3 ⊗ t⊤))

vec(R⊤αR(t

⊤ ⊗ I3))

,

J(4)1 =

(D⊤(I3 ⊗ v⊤vec(B⊤s )

⊤))⊤

−(D⊤(I3 ⊗ n⊤)

(I3 ⊗ v⊤vec(B⊤s )

⊤))⊤

(D⊤(I3 ⊗ n⊤)

(I3 ⊗ v⊤vec(B⊤s )

⊤))⊤

vec((I3 ⊗ c)R)vec((I3 ⊗ c)R⊤(I3 ⊗ t⊤))vec((I3 ⊗ c)R⊤(t⊗ I3))

(G.8)

Combining Equations G.4 and G.8 we rewrite the Jacobian element J1 (Equa-tion G.1) as

J1 = S⊤1 M1, (G.9)

228

Page 251: Efficient Model-based 3D Tracking by Using Direct Image Registration

where

S1 =

(D⊤(I3 ⊗ n⊤Bs)(I3 ⊗ v⊤)

)⊤

(D⊤Bs(IK ⊗ n⊤)(I3 ⊗ v⊤)

)⊤

(D⊤(I3 ⊗ n⊤Bs)

(I3 ⊗ vec(B⊤

s )v))⊤

−(D⊤Bs(IK ⊗ n⊤)

(I3 ⊗ vec(B⊤

s )v))⊤

(D⊤(I3 ⊗ v⊤)

)⊤

−(D⊤(I3 ⊗ n⊤)(I3 ⊗ v⊤)

)⊤

(D⊤(I3 ⊗ n⊤)(I3 ⊗ v⊤)

)⊤

(D⊤(I3 ⊗ v⊤vec(B⊤s )

⊤))⊤

−(D⊤(I3 ⊗ n⊤)

(I3 ⊗ v⊤vec(B⊤s )

⊤))⊤

(D⊤(I3 ⊗ n⊤)

(I3 ⊗ v⊤vec(B⊤s )

⊤))⊤

1×(63+81K+18K2)

,

and

M1 =

vec(R⊤αR(IK ⊗ c⊤))

vec(R⊤αR(I3 ⊗ c⊤))

vec((IK ⊗ c)R(I3 ⊗ c⊤))vec((IK ⊗ c)R(I3 ⊗ c⊤))

vec(R⊤αR)

vec(R⊤αR(I3 ⊗ t⊤))

vec(R⊤αR(t

⊤ ⊗ I3))vec((I3 ⊗ c)R)

vec((I3 ⊗ c)R⊤(I3 ⊗ t⊤))vec((I3 ⊗ c)R⊤(t⊗ I3))

(63+81K+18K2)×1

(G.10)

Decomposition of J2 and J3 Decomposing J2 is exactly identical to the aboveprocedure but changing the Euler angle—β instead of α—of the rotational derivative.Hence, we decompose J2 as the product J2 = S1M2 where

M2 =

vec(R⊤β R(IK ⊗ c⊤))

vec(R⊤β R(I3 ⊗ c⊤))

vec((IK ⊗ c)R(I3 ⊗ c⊤))vec((IK ⊗ c)R(I3 ⊗ c⊤))

vec(R⊤β R)

vec(R⊤β R(I3 ⊗ t⊤))

vec(R⊤β R(t

⊤ ⊗ I3))vec((I3 ⊗ c)R)

vec((I3 ⊗ c)R⊤(I3 ⊗ t⊤))vec((I3 ⊗ c)R⊤(t⊗ I3))

(63+81K+18K2)×1

. (G.11)

Note that there is no need to compute a matrix S2 as the shape elements (vertices,normals and basis shapes) do not change with respect to J1. We equivalentely

229

Page 252: Efficient Model-based 3D Tracking by Using Direct Image Registration

decompose J3 as J3 = S1M3, where

M3 =

vec(R⊤γ R(IK ⊗ c⊤))

vec(R⊤γ R(I3 ⊗ c⊤))

vec((IK ⊗ c)R(I3 ⊗ c⊤))vec((IK ⊗ c)R(I3 ⊗ c⊤))

vec(R⊤γ R)

vec(R⊤γ R(I3 ⊗ t⊤))

vec(R⊤γ R(t

⊤ ⊗ I3))vec((I3 ⊗ c)R)

vec((I3 ⊗ c)R⊤(I3 ⊗ t⊤))vec((I3 ⊗ c)R⊤(t⊗ I3))

(63+81K+18K2)×1

. (G.12)

Decomposition of J4 We expand the term J4 from Equation 5.57 as follows:

J4 = D⊤(I+ (n⊤Bsct)I− (n⊤t)I− Bscn

⊤ + tn⊤)R⊤n⊤v, (G.13)

where D⊤ = ∇uT [u]⊤λH−1A K. We split Equation G.13 in two chunks by applying

the distributive property,

J(1)4 = D⊤

(I+ (n⊤Bsct)− Bscn

⊤)R⊤n⊤v,

and

J(2)4 = D⊤

(I− (n⊤t)I+ tn⊤

)R⊤n⊤v.

(G.14)

We separaterly re-arrange the terms J(1)4 and J

(2)4 from Equation G.14. We show

the process for J(1)4 in the following:

J(1)4 =D⊤

(

( n⊤Bs ct )I− Bscn⊤)

R⊤n⊤v,

Cor. 8−−−→D⊤

(

(I3 ⊗ n⊤Bs)(IK ⊗ c)− Bs c n⊤

)

R⊤n⊤v,

Cor. 8−−−→D⊤

(

(I3 ⊗ n⊤Bs)(IK ⊗ c)− Bs (IK ⊗ n⊤)(I3 ⊗ c))

R⊤n⊤v,

=D⊤(I3 ⊗ n⊤Bs)(IK ⊗ c)R⊤n⊤v

−D⊤(Bs(IK ⊗ n⊤)(I3 ⊗ c)

)r1n

⊤v,

=D⊤n⊤v(I3 ⊗ n⊤Bs)(IK ⊗ c)R⊤

−D⊤n⊤v(Bs(IK ⊗ n⊤)(I3 ⊗ c)

)R⊤.

(G.15)

230

Page 253: Efficient Model-based 3D Tracking by Using Direct Image Registration

Notice that we can place the scalar n⊤1×3v

⊤3×1 anywhere in Equation G.15 without

using any of the Lemmas of Appendix D. We similarly re-organize the element J(2)4 ,

J(2)4 =D⊤

(

I− ( n⊤ t )I+ tn⊤)

R⊤n⊤v,

Cor. 8−−−→D⊤

(

I− (I3 ⊗ n⊤)(I3 ⊗ t) + t n⊤

)

R⊤n⊤v,

Cor. 8−−−→D⊤

(

I− (I3 ⊗ n⊤)(I3 ⊗ t) + (I3 ⊗ n⊤)(t⊗ I3))

R⊤n⊤v,

=D⊤R⊤n⊤v − (I3 ⊗ n⊤)(I3 ⊗ t)R⊤n⊤v

+ (I3 ⊗ n⊤)(t⊗ I3)R⊤n⊤v,

=D⊤n⊤vR⊤ −D⊤n⊤v(I3 ⊗ n⊤)(I3 ⊗ t)R⊤

+D⊤n⊤v(I3 ⊗ n⊤)(t⊗ I3)R⊤.

(G.16)

Using Equations G.15 and G.16 we rewrite the Jacobian element J4 as J4 = S⊤2 M4,

where

S2 =

(D⊤n⊤v(I3 ⊗ n⊤Bs)

)⊤

−(D⊤n⊤v

(Bs(IK ⊗ n⊤)

))⊤

(D⊤n⊤v

)⊤

−(D⊤n⊤v(I3 ⊗ n⊤)

)⊤

(D⊤n⊤v(I3 ⊗ n⊤)

)⊤

1×(21+6K)

,

and

M4 =

(IK ⊗ c)R⊤

(I3 ⊗ c)R⊤

R⊤

(I3 ⊗ t)R⊤

(t⊗ I3)R⊤

(21+6K)×3

.

(G.17)

Decomposition of JK We expand the term JK from Equation 5.57 as follows:

JK = D⊤(I+ (n⊤Bsct)I− (n⊤t)I− Bscn

⊤ + tn⊤)Bn⊤v, (G.18)

where D⊤ = ∇uT [u]⊤λH−1A K. We split Equation G.17 in two chunks by applying

the distributive property,

J(1)K = D⊤

(+(n⊤Bsct)I− Bscn

⊤)Bn⊤v,

J(2)K = D⊤

(I− (n⊤t)I+ tn⊤

)Bn⊤v.

(G.19)

231

Page 254: Efficient Model-based 3D Tracking by Using Direct Image Registration

We separately re-arrange the terms J(1)K and J

(2)K from Equation G.19. We show the

process for J(1)K in the following:

J(1)K =D⊤

(

(n⊤Bsct)I− Bs c n⊤

)

Bn⊤v,

Cor. 8−−−→D⊤

(

(n⊤Bsct)I− Bs (IK ⊗ n⊤)(I3 ⊗ c))

Bn⊤v,

=D⊤ (n⊤Bsct)I Bn⊤v

−D⊤Bs(IK ⊗ n⊤)(I3 ⊗ c)Bn⊤v,

Cor. ??−−−−→D⊤ (n⊤v)B(n⊤Bsct)IK

−D⊤Bs(IK ⊗ n⊤) (I3 ⊗ c) Bn⊤v ,

Cor. 5−−−→D⊤(n⊤v)B( n⊤Bs ct )IK

−D⊤(n⊤v)Bs(IK ⊗ n⊤) (I3K ⊗ vec(B)⊤)(I3K ⊙ (I3 ⊗ c)) ,

Cor. 5−−−→D⊤(n⊤v)B (IK ⊗ (n⊤Bs))(I3K ⊗ c)

−D⊤(n⊤v)Bs(IK ⊗ n⊤)(I3K ⊗ vec(B)⊤)(I3K ⊙ (I3 ⊗ c))

(G.20)

We similarly re-arrange the term J(2)K ,

J(2)K =D⊤

(

I− ( n⊤ t )I+ tn⊤)

Bn⊤v,

Cor. 5−−−→D⊤

(

I− (I3 ⊗ n⊤)(I3 ⊗ t) + t n⊤

)

Bn⊤v,

Cor. 5−−−→D⊤

(

I− (I3 ⊗ n⊤)(I3 ⊗ t) + (I3 ⊗ n⊤)(t⊗ I3))

Bn⊤v,

=D⊤Bn⊤v −D⊤(I3 ⊗ n⊤) (I3 ⊗ t) Bn⊤v

+D⊤(I3 ⊗ n⊤)(t⊗ I3)Bn⊤v

Cor. 5−−−→D⊤Bn⊤v −D⊤(I3 ⊗ n⊤) (n⊤v)(I9 ⊗ vec(B)⊤)((I3 ⊗ t)⊙ IK)

+D⊤(I3 ⊗ n⊤) (t⊗ I3) Bn⊤v ,

Cor. 5−−−→D⊤Bn⊤v −D⊤(I3 ⊗ n⊤)(n⊤v)(I9 ⊗ vec(B)⊤)((I3 ⊗ t)⊙ IK)

+D⊤(I3 ⊗ n⊤) (n⊤v)(I9 ⊗ vec(B)⊤)((t⊗ I3)⊙ IK) .

(G.21)

232

Page 255: Efficient Model-based 3D Tracking by Using Direct Image Registration

Using Equations G.20 and G.21 we rewrite the element JK (Equation G.18) asJK = S⊤

3 M5, where

S3 =

(D⊤(n⊤v)B(IK ⊗ (n⊤Bs))

)⊤

−(D⊤(n⊤v)Bs(IK ⊗ n⊤)(I3K ⊗ vec(B)⊤)

)⊤

(D⊤Bn⊤v

)⊤

−(D⊤(I3 ⊗ n⊤)(n⊤v)(I9 ⊗ vec(B)⊤)

)⊤

(D⊤(I3 ⊗ n⊤)(n⊤v)(I9 ⊗ vec(B)⊤)

)⊤

1×(31K+18K2)

,

and

M5 =

(I3K ⊗ c)(I3K ⊙ (I3 ⊗ c))

IK((I3 ⊗ t)⊙ IK)((t⊗ I3)⊙ IK)

(31K+18K2)×K

.

(G.22)

Summarizing the results for J⊤ We rewrite Equation 5.57 by gathering Equa-tions G.10, Equations G.11, Equations G.12, Equations G.13, Equations G.17, andEquations G.22,

J⊤ = S⊤M, (G.23)

where

S =(S⊤1 ,S

⊤1 ,S

⊤1 ,S

⊤2 ,S

⊤3

)

1×(210+280K+72K2),

and

M =

M1 0 0 0 0

0 M2 0 0 0

0 0 M3 0 0

0 0 0 M4 0

0 0 0 0 M5

(210+280K+72K2)×(6+K)

.

(G.24)

233

Page 256: Efficient Model-based 3D Tracking by Using Direct Image Registration
Page 257: Efficient Model-based 3D Tracking by Using Direct Image Registration

Appendix H

Detailed Complexity ofAlgorithms

In this section we provide detailed descriptions of the computation of the num-ber of operations for certain stages of the algorithms under review in Chapter 7.First, we separately compute the complexities of warps f3DTM and f3DMM. Usingthese results, we subsequentely comput the complexities of algorithms HB3DTM,HB3DTMNF, HB3DMM, HB3DMMNF, and HB3DMMSF (see Chapter 7for a detailed description of these algorithms).

H.1 Warp f3DTM

We compute the number of operations that we need to perform each time we use thewarp f3DTM. We only compute the operations directly related to the warp; thus,we obviate operations that are commmon to every warp and algorithm such thatthe image operation—I or T —or the operation that extracts R or t from µ.

We recall the warp definition from Equation 5.10:

f3DTM(u,n;µ) = K′(R+ tn⊤

)Ku′, (H.1)

where K′ = H−1A K, and u′ = HAu. We display the dimensions for each term of

Equation H.1 in the following:

f3dmmr(u,n;µ) = K′

3×3

(

R

3×3

+ t

3×1

n⊤

1×3

)

K

3×3

u′

3×1

. (H.2)

We compute the complexity of Equation H.2 step by step:

f3dmmr(u,n;µ) = K′

3×3

(

R

3×3

+ t

3×1

n⊤

1×3

< 9 >M

)

K

3×3

u′

3×1

,

235

Page 258: Efficient Model-based 3D Tracking by Using Direct Image Registration

= K′

3×3

(

R

3×3

+ tn⊤

3×3

< 9 >A

)

K

3×3

u′

3×1

,

= K′

3×3

(

R+ tn⊤)

3×3

< 27 >M+< 18 >A

K

3×3

u′

3×1

,

= K′(

R+ tn⊤)

3×3

K

3×3

< 27 >M+< 18 >A

u′

3×1

,

= K′(

R+ tn⊤)

K

3×3

u′

3×1

< 9 >M+< 6 >A

.

We sum up all the partial complexities to compute the total complexity for the warp:

Θ3dmmr =< 74 > M+ < 51 > A. (H.3)

Notice that we have added an 2 extra multiplications in Equation H.3 due to thehomogeneous to Cartesian coordinates mapping.

H.2 Warp f3DMM

We show now how to compute the number of operations of the warp f3dmm. Werecall the warp structure from Equation 5.42 and we show its dimensions as follows:

f3dmm = K

3×3

(

R

3×3

+ R

3×3

(

Bs

3×K

c

K×1

− t

3×1

)

n⊤

1×3

)

K−1

3×3

u′

3×1

, (H.4)

where we define u′ as in Equation H.1. We show the step-by-step complexity in thefollowing:

    Bsc                           (3×K times K×1):  < 3K >M + < 3K − 3 >A
    Bsc − t                       (3×1 minus 3×1):  < 3 >A
    R(Bsc − t)                    (3×3 times 3×1):  < 9 >M + < 6 >A
    [R(Bsc − t)]n⊤                (3×1 times 1×3):  < 9 >M
    R + R(Bsc − t)n⊤              (3×3 plus 3×3):   < 9 >A
    K[R + R(Bsc − t)n⊤]           (3×3 times 3×3):  < 27 >M + < 18 >A
    [K(R + R(Bsc − t)n⊤)]K⁻¹      (3×3 times 3×3):  < 27 >M + < 18 >A
    [K(R + R(Bsc − t)n⊤)K⁻¹]u′    (3×3 times 3×1):  < 9 >M + < 6 >A

Summing up these partial complexities, and including 2 extra multiplications due to the mapping from P² to R², we compute the complexity of the warp f3DMM as:

    Θ3DMM = < 83 + 3K >M + < 57 + 3K >A.                (H.5)
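
For quick reference, the two closed-form counts of Equations H.3 and H.5 can be evaluated with a small helper (ours) for any number K of deformation modes:

    def theta_3dtm():
        return {"mults": 74, "adds": 51}                   # Equation H.3

    def theta_3dmm(K):
        return {"mults": 83 + 3 * K, "adds": 57 + 3 * K}   # Equation H.5

    print(theta_3dtm(), theta_3dmm(10))  # e.g. K = 10: 113 mults, 87 adds per pixel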

H.3 Jacobian of Algorithm HB3DTM

We calculate the Jacobian matrix by separately computing each column element from Equation 5.31. Notice that the expression S1((1, t⊤)⊤ ⊗ I9) is common to all the elements of the row of the Jacobian, so we compute it just once to avoid including repeated operations.

    J1 = S1 ((1, t⊤)⊤ ⊗ I9) vec(Rαt⊤ R):
         S1 · ((1, t⊤)⊤ ⊗ I9)           (1×36 times 36×9):  < 27 >M + < 27 >A  (common to the whole row)
         Rαt⊤ R                         (3×3 times 3×3):    < 27 >M + < 18 >A
         final product                  (1×9 times 9×1):    < 9 >M + < 8 >A

    J2 = S1 ((1, t⊤)⊤ ⊗ I9) vec(Rβt⊤ R):
         Rβt⊤ R                         (3×3 times 3×3):    < 27 >M + < 18 >A
         final product                  (1×9 times 9×1):    < 9 >M + < 8 >A

    J3 = S1 ((1, t⊤)⊤ ⊗ I9) vec(Rγt⊤ R):
         Rγt⊤ R                         (3×3 times 3×3):    < 27 >M + < 18 >A
         final product                  (1×9 times 9×1):    < 9 >M + < 8 >A

    J4 = S1 ((1, t⊤)⊤ ⊗ I9)(I3 ⊗ n) R⊤(1, 0, 0)⊤:
         [S1((1, t⊤)⊤ ⊗ I9)] · (I3 ⊗ n) (1×9 times 9×3):    < 9 >M + < 6 >A
         final product                  (1×3 times 3×1):    < 3 >M + < 2 >A

    J5 = S1 ((1, t⊤)⊤ ⊗ I9)(I3 ⊗ n) R⊤(0, 1, 0)⊤:           < 9 >M + < 6 >A, then < 3 >M + < 2 >A

    J6 = S1 ((1, t⊤)⊤ ⊗ I9)(I3 ⊗ n) R⊤(0, 0, 1)⊤:           < 9 >M + < 6 >A, then < 3 >M + < 2 >A

Summing up all the partial complexities of Section H.3 we have the resulting complexity for computing the Jacobian matrix JHB3DTM:

    ΘJHB3DTM = < 81 + 75Nv >M + < 54 + 66Nv >A.         (H.6)

Notice that some operations, e.g. the product R⊤RΔ, are computed only once, whereas the rest of the complexities are computed Nv times.
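
A direct transcription of this scheme is shown in the following numpy sketch (ours; names are hypothetical): the factor S1((1, t⊤)⊤ ⊗ I9) is computed once per row, and in a full implementation the products Rαt⊤R, Rβt⊤R, Rγt⊤R would also be hoisted out of the per-pixel loop:

    import numpy as np

    def jacobian_row_hb3dtm(S1, t, R, R_at, R_bt, R_gt, n):
        # S1: 1x36 per-pixel block; t: translation; R: rotation;
        # R_at, R_bt, R_gt: the derivative matrices of R; n: plane normal.
        one_t = np.concatenate(([1.0], t))                    # (1, t)^T
        common = S1 @ np.kron(one_t.reshape(4, 1), np.eye(9)) # shared factor, once per row
        J_rot = [common @ (Rd.T @ R).reshape(-1, order="F")   # vec(R_d^T R), once per frame
                 for Rd in (R_at, R_bt, R_gt)]
        base = common @ np.kron(np.eye(3), n.reshape(3, 1))   # 1x3 factor shared by J4..J6
        J_trans = [base @ R.T[:, i] for i in range(3)]        # R^T(1,0,0)^T etc. pick columns of R^T
        return np.array(J_rot + J_trans)                      # the six entries J1..J6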

H.4 Jacobian of Algorithm HB3DTMNF

In the following we compute the number of operations associated with Equation 5.26, that is, we compute the complexity of the gradient replacement stage, not the factorization. Again, we compute the complexity of Equation 5.26 in the most efficient way: first we compute the summation (I3 − (n⊤t)I3 + tn⊤), and then we compute the remaining products:

    J1 = D⊤ (I3 − (n⊤t)I3 + tn⊤) R⊤Rαt v:
         (n⊤t)I3                                      :  < 6 >M + < 2 >A
         tn⊤                        (3×1 times 1×3):     < 9 >M
         R⊤Rαt                      (3×3 times 3×3):     < 27 >M + < 18 >A
         D⊤ · (I3 − (n⊤t)I3 + tn⊤)  (1×3 times 3×3):     < 9 >M + < 6 >A  (common to the whole row)
         (R⊤Rαt) v                  (3×3 times 3×1):     < 9 >M + < 6 >A
         final product              (1×3 times 3×1):     < 3 >M + < 2 >A

    J2 = D⊤ (I3 − (n⊤t)I3 + tn⊤) R⊤Rβt v:
         R⊤Rβt                      (3×3 times 3×3):     < 27 >M + < 18 >A
         (R⊤Rβt) v                  (3×3 times 3×1):     < 9 >M + < 6 >A
         final product              (1×3 times 3×1):     < 3 >M + < 2 >A

    J3 = D⊤ (I3 − (n⊤t)I3 + tn⊤) R⊤Rγt v:
         R⊤Rγt                      (3×3 times 3×3):     < 27 >M + < 18 >A
         (R⊤Rγt) v                  (3×3 times 3×1):     < 9 >M + < 6 >A
         final product              (1×3 times 3×1):     < 3 >M + < 2 >A

    J4 = D⊤ (I3 − (n⊤t)I3 + tn⊤) R⊤(1, 0, 0)⊤ n⊤v:
         [R⊤(1, 0, 0)⊤] n⊤          (3×1 times 1×3):     < 9 >M
         final product              (1×3 times 3×1):     < 3 >M + < 2 >A

    J5 = D⊤ (I3 − (n⊤t)I3 + tn⊤) R⊤(0, 1, 0)⊤ n⊤v:       < 9 >M, then < 3 >M + < 2 >A

    J6 = D⊤ (I3 − (n⊤t)I3 + tn⊤) R⊤(0, 0, 1)⊤ n⊤v:       < 9 >M, then < 3 >M + < 2 >A

Summing up all the partial complexities of Section H.4 we have the resulting complexity for computing the Jacobian matrix JHB3DTMNF:

    ΘJHB3DTMNF = < 81 + 96Nv >M + < 54 + 56Nv >A.       (H.7)

Notice that some operations, e.g. the product R⊤RΔ, are computed only once, whereas the rest of the complexities are computed Nv times.
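
The saving just counted comes from hoisting the bracketed summation out of the six products. A minimal numpy sketch (ours, with a hypothetical function name) of that shared per-pixel term:

    import numpy as np

    def common_term_hb3dtmnf(n, t):
        # The summation (I3 - (n^T t) I3 + t n^T) of Equation 5.26,
        # computed once per pixel and then reused by all six elements J1..J6.
        return np.eye(3) - (n @ t) * np.eye(3) + np.outer(t, n)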

H.5 Jacobian of Algorithm HB3DMMNF

In the following we compute the number of operations associated with computing the Jacobian of algorithm HB3DMMNF (Equation 5.57); notice that this algorithm only uses the gradient replacement, not the factorization stage. We compute the complexity of Equation 5.57 in the most efficient way: we first compute the common term D⊤ (Equation 5.58), that is,

    D⊤ = ∇uT[u]⊤ (I3 + (n⊤Bsc)I3 − (n⊤t)I3 − Bscn⊤ + tn⊤),     (H.8)

with ∇uT[u]⊤ of size 1×3, I3 of size 3×3, n⊤ of size 1×3, Bs of size 3×K, c of size K×1, and t of size 3×1.


We break down Equation H.8 to easily display the number of operations as follows:

    Bsc and (n⊤Bsc)I3                         :  < 3K + 6 >M + < 3(K − 1) + 2 >A
    (n⊤t)I3                                   :  < 6 >M + < 2 >A
    Bscn⊤                                     :  < 3K + 9 >M + < 3(K − 1) >A
    tn⊤                                       :  < 9 >M
    I3 + (n⊤Bsc)I3 − (n⊤t)I3 − Bscn⊤ + tn⊤    :  < 9 >A + < 9 >A + < 9 >A + < 9 >A
    ∇uT[u]⊤ · (3×3 matrix)   (1×3 times 3×3):    < 9 >M + < 6 >A

The complexity of computing D⊤ is the sum of all these partial complexities, that is,

    ΘD⊤ = < 39 + 6K >M + < 40 + 6K >A.                  (H.9)
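
A direct transcription of Equation H.8 into numpy follows (ours; the function name and argument layout are hypothetical):

    import numpy as np

    def d_transpose(grad_T, n, Bs, c, t):
        # grad_T: 3-vector, the template gradient (grad_u T[u])^T at this pixel;
        # n: 3-vector normal; Bs: 3xK basis; c: K-vector; t: 3-vector.
        Bsc = Bs @ c                       # Bs c
        M = (np.eye(3)
             + (n @ Bsc) * np.eye(3)       # (n^T Bs c) I3
             - (n @ t) * np.eye(3)         # (n^T t) I3
             - np.outer(Bsc, n)            # Bs c n^T
             + np.outer(t, n))             # t n^T
        return grad_T @ M                  # 1x3 row, reused by every Jacobian element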

Once we have computed D⊤ there is no need to compute it again. We proceed to compute each term of the Jacobian as follows:

    J1 = D⊤ R⊤Rα (I3 + Bscn⊤) v:
         R⊤Rα                       (3×3 times 3×3):          < 27 >M + < 18 >A
         I3 + Bscn⊤                 (3×3 plus 3×3):           < 9 >A
         (R⊤Rα)(I3 + Bscn⊤)         (3×3 times 3×3):          < 27 >M + < 18 >A
         final products             (1×3 times 3×3 times 3×1): < 12 >M + < 8 >A

    J2 = D⊤ R⊤Rβ (I3 + Bscn⊤) v:
         R⊤Rβ                       (3×3 times 3×3):          < 27 >M + < 18 >A
         (R⊤Rβ)(I3 + Bscn⊤)         (3×3 times 3×3):          < 27 >M + < 18 >A
         final products             (1×3 times 3×3 times 3×1): < 12 >M + < 8 >A

    J3 = D⊤ R⊤Rγ (I3 + Bscn⊤) v:
         R⊤Rγ                       (3×3 times 3×3):          < 27 >M + < 18 >A
         (R⊤Rγ)(I3 + Bscn⊤)         (3×3 times 3×3):          < 27 >M + < 18 >A
         final products             (1×3 times 3×3 times 3×1): < 12 >M + < 8 >A

    J4 = D⊤ R⊤(1, 0, 0)⊤ n⊤v:
         [R⊤(1, 0, 0)⊤] n⊤          (3×1 times 1×3):          < 9 >M
         [R⊤(1, 0, 0)⊤ n⊤] v                                : < 12 >M + < 8 >A
         final product              (1×3 times 3×1):          < 3 >M + < 2 >A

    J5 = D⊤ R⊤(0, 1, 0)⊤ n⊤v:       < 9 >M, then < 12 >M + < 8 >A, then < 3 >M + < 2 >A

    J6 = D⊤ R⊤(0, 0, 1)⊤ n⊤v:       < 9 >M, then < 12 >M + < 8 >A, then < 3 >M + < 2 >A

    Jk = D⊤ R⊤Bk n⊤v,  k = 1, ..., K:
         R⊤Bk                       (3×3 times 3×1):          < 9 >M + < 6 >A
         n⊤v                        (1×3 times 3×1):          < 3 >M + < 2 >A
         (R⊤Bk)(n⊤v)                                        : < 3 >M
         final product              (1×3 times 3×1):          < 3 >M + < 2 >A

Summing up all the partial complexities of Section H.5 and Equation H.9 we have the resulting complexity for computing the Jacobian matrix JHB3DMMNF:

    ΘJHB3DMMNF = < 81 + (24K + 219)Nv >M + < 54 + (16K + 171)Nv >A.    (H.10)

Notice that some operations, e.g. the product R⊤RΔ, are computed only once, whereas the rest of the complexities are computed Nv times.


H.6 Jacobian of Algorithm HB3DMMSF

In the following we compute the number of operations associated with computing the Jacobian of algorithm HB3DMMSF (Equation 8.5) by means of a partial factorization. We compute the complexity of Equation 8.5 in the most efficient way, that is, we avoid computing repeated operations more than once. We show the complexities associated with each element of the row of the Jacobian in the following:

    J1 = S1 (I3 ⊗ (1, t⊤, c⊤)⊤) R⊤Rαt (v(i) + B(i)c n(i)⊤v(i)):
         S1 · (I3 ⊗ (1, t⊤, c⊤)⊤)  (1×3(K+4) times 3(K+4)×3):  < 3K + 5 >M + < 3K + 3 >A  (common to the whole row)
         R⊤Rαt                     (3×3 times 3×3):            < 27 >M + < 18 >A
         n(i)⊤v(i)                 (1×3 times 3×1):            < 3 >M + < 2 >A
         B(i)c · (n(i)⊤v(i))                                 :  < 3K + 3 >M + < 3K − 3 >A
         v(i) + B(i)c n(i)⊤v(i)    (3×1 plus 3×1):             < 3 >A
         final products            (1×3 times 3×3 times 3×1):  < 12 >M + < 8 >A

    J2 = S1 (I3 ⊗ (1, t⊤, c⊤)⊤) R⊤Rβt (v(i) + B(i)c n(i)⊤v(i)):
         R⊤Rβt                     (3×3 times 3×3):            < 27 >M + < 18 >A
         final products            (1×3 times 3×3 times 3×1):  < 12 >M + < 8 >A

    J3 = S1 (I3 ⊗ (1, t⊤, c⊤)⊤) R⊤Rγt (v(i) + B(i)c n(i)⊤v(i)):
         R⊤Rγt                     (3×3 times 3×3):            < 27 >M + < 18 >A
         final products            (1×3 times 3×3 times 3×1):  < 12 >M + < 8 >A

    J4 = S1 (I3 ⊗ (1, t⊤, c⊤)⊤) (n(i)⊤v(i)) R⊤(1, 0, 0)⊤:
         scaling by n(i)⊤v(i)                                :  < 3 >M
         final product             (1×3 times 3×1):            < 3 >M + < 2 >A

    J5 = S1 (I3 ⊗ (1, t⊤, c⊤)⊤) (n(i)⊤v(i)) R⊤(0, 1, 0)⊤:      < 3 >M, then < 3 >M + < 2 >A

    J6 = S1 (I3 ⊗ (1, t⊤, c⊤)⊤) (n(i)⊤v(i)) R⊤(0, 0, 1)⊤:      < 3 >M, then < 3 >M + < 2 >A

    Jk = S1 (I3 ⊗ (1, t⊤, c⊤)⊤) (n⊤v) R⊤Bk,  k = 1, ..., K:
         scaling by n⊤v                                      :  < 3 >M
         R⊤Bk                      (3×3 times 3×1):            < 9 >M + < 6 >A
         final product             (1×3 times 3×1):            < 3 >M + < 2 >A

Summing up all the partial complexities of Section H.6 we obtain the resulting complexity for computing the Jacobian matrix JHB3DMMSF:

    ΘJHB3DMMSF = < 81 + (18K + 60)Nv >M + < 54 + (14K + 36)Nv >A.      (H.11)

Notice that some operations, e.g. the product R⊤RΔ, are computed only once, whereas the rest of the complexities are computed Nv times.
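
Gathering Equations H.6, H.7, H.10, and H.11 in one place, the following helper (ours) evaluates the per-frame Jacobian cost of each algorithm for a given number of vertices Nv and modes K:

    def jacobian_cost(algorithm, Nv, K=0):
        # Multiplication/addition counts from Equations H.6, H.7, H.10 and H.11.
        costs = {
            "HB3DTM":   (81 + 75 * Nv,             54 + 66 * Nv),
            "HB3DTMNF": (81 + 96 * Nv,             54 + 56 * Nv),
            "HB3DMMNF": (81 + (24 * K + 219) * Nv, 54 + (16 * K + 171) * Nv),
            "HB3DMMSF": (81 + (18 * K + 60) * Nv,  54 + (14 * K + 36) * Nv),
        }
        return costs[algorithm]  # (multiplications, additions)

    # For instance, with Nv = 1000 vertices and K = 10 modes, HB3DMMSF needs
    # 240081 multiplications per Jacobian versus 459081 for HB3DMMNF.
    for alg in ("HB3DMMNF", "HB3DMMSF"):
        print(alg, jacobian_cost(alg, Nv=1000, K=10))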
